<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/japhy/scienceReplication.artiswrong.com/paper_files/arxiv/2006.02802/latex_extracted"?>
<?latexml class="article" options="10pt,letterpaper"?>
<?latexml package="cogsci"?>
<?latexml package="booktabs"?>
<?latexml package="amsfonts"?>
<?latexml package="nicefrac"?>
<?latexml package="microtype"?>
<?latexml package="bbm"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<?latexml package="pslatex"?>
<?latexml package="apacite"?>
<?latexml package="float"?>
<?latexml package="graphicx"?>
<?latexml package="subcaption"?>
<?latexml package="xcolor" options="dvipsnames"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML" class="ltx_authors_1line">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <title>A Computational Model of Early Word Learning from the Infant’s Point of View</title>
  <creator role="author">
    <personname>
<break/><text font="bold" fontsize="120%">Satoshi Tsutsui<Math mode="inline" tex="{}^{1}" text="^1" xml:id="m1">
          <XMath>
            <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
              <XMTok font="medium" fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
            </XMApp>
          </XMath>
        </Math>, Arjun Chandrasekaran<Math mode="inline" tex="{}^{3}" text="^3" xml:id="m2">
          <XMath>
            <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
              <XMTok font="medium" fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
            </XMApp>
          </XMath>
        </Math>, Md Alimoor Reza<Math mode="inline" tex="{}^{1}" text="^1" xml:id="m3">
          <XMath>
            <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
              <XMTok font="medium" fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
            </XMApp>
          </XMath>
        </Math>, David Crandall<Math mode="inline" tex="{}^{1}" text="^1" xml:id="m4">
          <XMath>
            <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
              <XMTok font="medium" fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
            </XMApp>
          </XMath>
        </Math>, Chen Yu<Math mode="inline" tex="{}^{2}" text="^2" xml:id="m5">
          <XMath>
            <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
              <XMTok font="medium" fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
            </XMApp>
          </XMath>
        </Math><break/></text>{stsutsui,mdreza,djcran,chenyu}@iu.edu, arjun.chandrasekaran@tuebingen.mpg.de
<break/><Math mode="inline" tex="{}^{1}" text="^1" xml:id="m6">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
          </XMApp>
        </XMath>
      </Math> Luddy School of Informatics, Computing, and Engineering, Indiana University, USA <break/><Math mode="inline" tex="{}^{2}" text="^2" xml:id="m7">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
          </XMApp>
        </XMath>
      </Math> Department of Psychological and Brain Sciences, Indiana University, USA <break/><Math mode="inline" tex="{}^{3}" text="^3" xml:id="m8">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
          </XMApp>
        </XMath>
      </Math> Max Planck Institute for Intelligent Systems, Germany
<break/></personname>
  </creator>
  <abstract name="Abstract">
    <p>Human infants have the remarkable ability to learn the associations
between object names and visual objects from inherently ambiguous
experiences. Researchers in cognitive science and developmental
psychology have built formal models that implement in-principle
learning algorithms, and then used pre-selected and pre-cleaned
datasets to test the abilities of the models to find statistical
regularities in the input data. In contrast to previous modeling
approaches, the present study used egocentric video and gaze data
collected from infant learners during natural toy play with their parents.
This allowed us to capture the learning
environment from the learner’s own point of
view. We then used a Convolutional Neural Network (CNN) model to
process sensory data from the infant’s point of view and learn
name-object associations from scratch. As the first model that takes raw
egocentric video to simulate infant word learning, the present study
provides a proof of principle that the problem of early word learning
can be solved using actual visual data perceived by infant
learners. Moreover, we conducted simulation experiments to
systematically determine how visual, perceptual, and attentional
properties of infants’ sensory experiences may affect word
learning.</p>
    <p><text font="bold">Keywords:</text>
Word Learning, Computational Modeling, Eye Tracking and Visual Attention, Parent-Child Social Interaction</p>
  </abstract>
  <section inlist="toc" xml:id="S1">
    <tags>
      <tag>1</tag>
      <tag role="refnum">1</tag>
      <tag role="typerefnum">§1</tag>
    </tags>
    <title><tag close=" ">1</tag>Introduction</title>
    <para xml:id="S1.p1">
      <p>Infants show knowledge of their first words as early as 6 months of age
and produce their first words at around one year.
Learning object names (a major component of their early vocabularies) in everyday contexts requires young
learners not only to find and recognize visual objects in view but also to map
them to heard names. In such contexts, infants seem to be able to learn from a
sea of data relevant to object names and their referents because
parents interact with and talk to their infants on various occasions,
from toy play, to picture book reading, to family meal time <cite class="ltx_citemacro_cite"><bibref bibrefs="yu2012embodied" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.</p>
    </para>
    <para xml:id="S1.p2">
      <p>However, if we take the young learner’s point of view, we see that the
task of word learning is quite challenging. Imagine an infant and
parent playing with several toys jumbled together as shown in
Figure <ref labelref="LABEL:fig:env"/>. When the parent names a particular toy at a
particular moment, the infant perceives 2-dimensional images on the
retina from a first-person point of view, as shown in
Figure <ref labelref="LABEL:fig:overview"/>. These images usually contain multiple
objects in view. Since the learner does not yet know the name of the
toy, how do they recognize all the toys in view and then infer the
target to which the parent is referring? This <text font="italic">referential
uncertainty</text> <cite class="ltx_citemacro_cite"><bibref bibrefs="quine1960word" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite> is the classic puzzle of early
word learning:
because real-life learning situations are replete with objects
and events, a challenge for young word learners is to recognize and
identify the correct referent from many possible candidates at a given
naming moment. Despite many experimental studies on
infants <cite class="ltx_citemacro_cite"><bibref bibrefs="golinkoff2000becoming" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite> and much computational work on simulating early
word learning <cite class="ltx_citemacro_cite"><bibref bibrefs="yu2007unified,frank2009" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>, how young children solve this problem
remains an open question.</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:env" placement="tb!" xml:id="S1.F1">
      <tags>
        <tag><text fontsize="90%">Figure 1</text></tag>
        <tag role="refnum">1</tag>
        <tag role="typerefnum">Figure 1</tag>
      </tags>
      <graphics candidates="fig1.jpg" class="ltx_centering" graphic="./fig1.jpg" options="width=325.215pt" xml:id="S1.F1.g1"/>
      <toccaption class="ltx_centering"><tag close=" ">1</tag>An infant and parent play with a set of toys in a
free-flowing joint play session. Both participants wore
head-mounted cameras and eye trackers to record egocentric video and gaze
data from their own perspectives. </toccaption>
      <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 1</text></tag><text fontsize="90%">An infant and parent play with a set of toys in a
free-flowing joint play session. Both participants wore
head-mounted cameras and eye trackers to record egocentric video and gaze
data from their own perspectives. </text></caption>
    </figure>
    <para xml:id="S1.p3">
      <p>Decades of research in developmental psychology and cognitive science
have attempted to resolve this mystery. Researchers have designed
human laboratory experiments by creating experimental training
datasets and testing the abilities of human learners to learn from
them <cite class="ltx_citemacro_cite"><bibref bibrefs="golinkoff2000becoming" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. In computational studies,
researchers have built models that implement in-principle learning
algorithms, and created training sets to test the abilities of the
models to find statistical regularities in the input data. Some
work in modeling word learning has used sensory data collected
from adult learners or robots <cite class="ltx_citemacro_cite"><bibref bibrefs="Roy2002,yu2007unified,Rasanen2019" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>, while many models take symbolic data
or simplified
inputs <cite class="ltx_citemacro_cite"><bibref bibrefs="frank2009,kachergis2017,smith2011,fazly2010,yu2007unified" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
Little is known about whether these models can scale up to address the
same problems faced by infants in real-world learning. As recently
pointed out in <cite class="ltx_citemacro_cite"><bibref bibrefs="dupoux2018" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>, the research field of cognitive
modeling needs to move toward using realistic data as input because
all the learning processes in human cognitive systems are sensitive to
the input signals <cite class="ltx_citemacro_cite"><bibref bibrefs="smith2018" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. If our ultimate goal is to
understand how infants learn language in the real world, not in
laboratories or in simulated environments, we should model internal
learning processes with the natural statistics of the learning
environment. This paper takes a step towards this goal and uses
data collected from infants as they naturally play with toys and interact with parents.</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:overview" placement="htb!" xml:id="S1.F2">
      <tags>
        <tag><text fontsize="90%">Figure 2</text></tag>
        <tag role="refnum">2</tag>
        <tag role="typerefnum">Figure 2</tag>
      </tags>
      <graphics candidates="fig2.png" class="ltx_centering" graphic="./fig2.png" options="width=433.62pt" xml:id="S1.F2.g1"/>
      <toccaption class="ltx_centering"><tag close=" ">2</tag>Overview of our approach. The training data were created by extracting egocentric image frames around the moments when parents named objects in free-flowing interaction. The data were fed into deep learning (ResNet) models to find and associate visual objects in view with names in parent speech. As a result, the models built associations between heard labels and visual presentations of target objects.
 </toccaption>
      <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 2</text></tag><text fontsize="90%">Overview of our approach. The training data were created by extracting egocentric image frames around the moments when parents named objects in free-flowing interaction. The data were fed into deep learning (ResNet) models to find and associate visual objects in view with names in parent speech. As a result, the models built associations between heard labels and visual presentations of target objects.
 </text></caption>
    </figure>
    <para xml:id="S1.p4">
      <p>Recent advances in computational and sensing techniques (deep
learning, wearable sensors, etc.) could
revolutionize the study of cognitive modeling. In the field of machine
learning, Convolutional Neural Networks (CNNs) have achieved
impressive learning results and even outperform humans on some specific
tasks <cite class="ltx_citemacro_cite"><bibref bibrefs="silver2016mastering,he2015delving" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. In the field of computer vision,
small wearable cameras have been used to capture an
approximation of the visual field of their human wearer. Video from
this egocentric point of view provides a unique perspective of the
visual world that is inherently human-centric, giving a level of
detail and ubiquity that may well exceed what is possible from
environmental cameras with a third-person point of view. Recently,
head-mounted cameras and eye trackers have been used in developmental
psychology to collect fine-grained information about what infants are
seeing and doing in real
time <cite class="ltx_citemacro_cite"><bibref bibrefs="he2015delving,silver2016mastering" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. These new
technologies make it feasible to build
computational models using inputs that are very close to infants’ actual sensory
experiences, and thus to examine the rich complexity of
the sensory input available for word learning.
</p>
    </para>
    <para xml:id="S1.p5">
      <p>In the present study, we collect egocentric video and gaze data from
infant learners as they and their parents naturally play with a set of
toys. This allows us to capture the learning environment from the
learner’s own point of view. We then build a
computational system that processes this infant sensory data
to learn name-object associations from scratch. As the
first model taking raw egocentric video to simulate infant word
learning, the present study has two primary goals. The first aim is to
provide a proof of principle that the problem of early word learning
can be solved using raw data. The second aim is to systematically
determine the computational roles of visual, perceptual, and
attentional properties that may influence word learning. This
examination allows us to generate quantitative predictions which can
be further tested in future experimental studies.</p>
    </para>
  </section>
  <section inlist="toc" xml:id="S2">
    <tags>
      <tag>2</tag>
      <tag role="refnum">2</tag>
      <tag role="typerefnum">§2</tag>
    </tags>
    <title><tag close=" ">2</tag>Method</title>
    <subsection inlist="toc" xml:id="S2.SS1">
      <tags>
        <tag>2.1</tag>
        <tag role="refnum">2.1</tag>
        <tag role="typerefnum">§2.1</tag>
      </tags>
      <title><tag close=" ">2.1</tag>Data Collection</title>
      <para xml:id="S2.SS1.p1">
        <p>To closely approximate the input perceived by infants, we collected
visual and audio data from everyday toy play — a context in which
infants naturally learn about objects and their names. We developed
and used an experimental setup in which we placed a camera on the
infant’s head to collect egocentric video of their field of view,
as shown in
Figure <ref labelref="LABEL:fig:env"/>. We also used a head-mounted eye gaze tracker to
record their visual attention.
Additionally, we collected synchronized video and gaze data from the
parent during the same play session.</p>
      </para>
      <para xml:id="S2.SS1.p2">
        <p>Thirty-four child-parent dyads participated in our study. Each dyad
was brought into a room with 24 toys (the same as
in <cite class="ltx_citemacro_cite"><bibref bibrefs="bambach2018toddler" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>) scattered on the floor. Children and
parents were told to play with the toys, without more specific
directions. The children ranged in age from 15.2 to 24.2
months (<Math mode="inline" tex="\mu" text="mu" xml:id="S2.SS1.p2.m1">
            <XMath>
              <XMTok font="italic" name="mu" role="UNKNOWN">μ</XMTok>
            </XMath>
          </Math>=19.4 months, <Math mode="inline" tex="\sigma" text="sigma" xml:id="S2.SS1.p2.m2">
            <XMath>
              <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
            </XMath>
          </Math>=2.2 months). We collected five
synchronized videos per dyad (head camera and eye camera for child,
head camera and eye camera for parent, and a third-person view camera;
see Figure <ref labelref="LABEL:fig:env"/>). The final dataset contains 212
minutes of synchronized video, with each dyad contributing different amounts of
data ranging from 3.4 minutes to 11.6 minutes (<Math mode="inline" tex="\mu" text="mu" xml:id="S2.SS1.p2.m3">
            <XMath>
              <XMTok font="italic" name="mu" role="UNKNOWN">μ</XMTok>
            </XMath>
          </Math>=7.5 minutes,
<Math mode="inline" tex="\sigma" text="sigma" xml:id="S2.SS1.p2.m4">
            <XMath>
              <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
            </XMath>
          </Math>=2.3 minutes). The head-mounted eye trackers recorded video
at 30 frames per second and 480 <Math mode="inline" tex="\times" text="*" xml:id="S2.SS1.p2.m5">
            <XMath>
              <XMTok meaning="times" role="MULOP">×</XMTok>
            </XMath>
          </Math> 640 pixels per frame, with a
horizontal field of view of about 70 degrees. We followed validated
best practices for mounting the head cameras so as to best approximate
participants’ actual first-person views, and for calibrating the eye
trackers <cite class="ltx_citemacro_cite"><bibref bibrefs="slone2018gaze" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S2.SS2">
      <tags>
        <tag>2.2</tag>
        <tag role="refnum">2.2</tag>
        <tag role="typerefnum">§2.2</tag>
      </tags>
      <title><tag close=" ">2.2</tag>Training Data</title>
      <para xml:id="S2.SS2.p1">
        <p>Parents’ speech during toy play was fully transcribed and divided into
spoken utterances, each defined as a string of speech between two
periods of silence lasting at least
400ms <cite class="ltx_citemacro_cite"><bibref bibrefs="yu2012embodied" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>. Spoken utterances containing the name of
one of the objects were marked as “naming utterances” (e.g. “that’s
a helmet”). For each naming utterance, trained coders annotated the
intended referent object. On average, parents produced 15.51
utterances per minute (<Math mode="inline" tex="\sigma" text="sigma" xml:id="S2.SS2.p1.m1">
            <XMath>
              <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
            </XMath>
          </Math>=4.56), 4.82 of which were referential
(<Math mode="inline" tex="\sigma" text="sigma" xml:id="S2.SS2.p1.m2">
            <XMath>
              <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
            </XMath>
          </Math>=2.09). In total, the entire training dataset contains 1,459
naming utterances.</p>
      </para>
      <para xml:id="S2.SS2.p2">
        <p>Recent studies on infant word learning show that the moments during
and after hearing a word are critical for young learners to associate
seen objects with heard words <cite class="ltx_citemacro_cite"><bibref bibrefs="yu2012embodied" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>. In light of this,
we temporally aligned speech data with video data, and used a 3-second
temporal window starting from the onset of each naming utterance.
Given that each naming utterance lasted about 1.5 to 2 seconds, a 3-second
window captured both the moments when infants heard the target name in
parent speech and the moments after hearing the name. For each
temporal window, a total of 90 image frames (3 seconds at 30 frames per second) were extracted.
To summarize, the final training dataset consists of all the naming instances
in parent-child joint play, with each instance containing a target name
and a set of 90 image frames from the child’s first-person camera that
co-occur with the naming utterance. As shown in Figure <ref labelref="LABEL:fig:overview"/>,
each image typically contains multiple visual objects
and the named object may or may not be in view.</p>
      </para>
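        <para>
          <p>As a minimal sketch of the windowing step described above, the following Python snippet computes which video frames fall inside the 3-second window that starts at a naming onset. Only the window length and the frame rate come from the text; the helper name <text font="typewriter">window_frame_indices</text>, the example onset time, and the rounding of the onset to the nearest frame are assumptions made for illustration.</p>
          <verbatim font="typewriter"><![CDATA[
```python
FPS = 30            # frame rate reported for the head-mounted cameras
WINDOW_SEC = 3.0    # window length, starting at the naming-utterance onset


def window_frame_indices(onset_sec, fps=FPS, window_sec=WINDOW_SEC):
    """Return the frame indices covered by the window starting at onset_sec.

    Hypothetical helper: rounding the onset to the nearest frame is an
    assumption, not a detail given in the paper.
    """
    first = round(onset_sec * fps)
    n_frames = int(window_sec * fps)
    return list(range(first, first + n_frames))


# Example onset time (made up): a naming utterance starting 12.4 s into a session.
frames = window_frame_indices(12.4)
print(len(frames), frames[0], frames[-1])
```
]]></verbatim>
          <p>Each naming instance then pairs the target name with the frames at these indices in the child’s first-person video.</p>
        </para>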
      <figure inlist="lof" labels="LABEL:fig:test" placement="htb!" xml:id="S2.F3">
        <tags>
          <tag><text fontsize="90%">Figure 3</text></tag>
          <tag role="refnum">3</tag>
          <tag role="typerefnum">Figure 3</tag>
        </tags>
        <graphics candidates="figs/testsamples.jpg" graphic="./figs/testsamples.jpg" options="width=433.62pt" xml:id="S2.F3.g1"/>
        <toccaption><tag close=" ">3</tag>Testing images. We evaluate the models trained from egocentric images using systematically captured images from various views with a clean background.</toccaption>
        <caption><tag close=": "><text fontsize="90%">Figure 3</text></tag><text fontsize="90%">Testing images. We evaluate the models trained from egocentric images using systematically captured images from various views with a clean background.</text></caption>
      </figure>
    </subsection>
    <subsection inlist="toc" xml:id="S2.SS3">
      <tags>
        <tag>2.3</tag>
        <tag role="refnum">2.3</tag>
        <tag role="typerefnum">§2.3</tag>
      </tags>
      <title><tag close=" ">2.3</tag>Testing Data and Evaluation Metrics</title>
      <para xml:id="S2.SS3.p1">
        <p>To evaluate the result of word learning, we prepared a separate set of
clean canonical images for each of the 24 objects varying in camera
view and object size and orientation in a similar manner to previous
work <cite class="ltx_citemacro_cite"><bibref bibrefs="bambach2016active" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>. In particular, we took pictures of each
toy from eight different points of view (45-degree rotations around
the vertical axis), totaling 3,072 images, or 128 per object (see
Figure <ref labelref="LABEL:fig:test"/>). This test set allowed us to examine whether the
models generalized the learned names to new visual instances never
seen before. During testing, we presented one image at a time to a
trained model and checked whether the model generated the correct
label. We computed mean accuracy (i.e., the number of correctly
classified images divided by the total number of test images) as the
evaluation metric.</p>
      </para>
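        <para>
          <p>The evaluation metric can be illustrated with a short Python sketch. It also computes the chance level for a uniform guess over the 24 object labels, which is a useful baseline when reading the reported accuracies. The function name <text font="typewriter">mean_accuracy</text> and the toy labels are invented for illustration and are not taken from the study’s data.</p>
          <verbatim font="typewriter"><![CDATA[
```python
N_OBJECTS = 24  # number of toy categories in the study


def mean_accuracy(predicted, actual):
    """Fraction of test images whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Chance level for a uniform random guess over the 24 labels.
chance = 1 / N_OBJECTS

# Toy labels, invented for illustration only.
actual = ["helmet", "car", "cup", "car"]
predicted = ["helmet", "cup", "cup", "car"]
print(mean_accuracy(predicted, actual), round(chance, 4))
```
]]></verbatim>
        </para>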
    </subsection>
    <subsection inlist="toc" xml:id="S2.SS4">
      <tags>
        <tag>2.4</tag>
        <tag role="refnum">2.4</tag>
        <tag role="typerefnum">§2.4</tag>
      </tags>
      <title><tag close=" ">2.4</tag>Simulating Acuity</title>
      <para xml:id="S2.SS4.p1">
        <p>Egocentric video captured by head-mounted cameras provides a good
approximation of the field of view of the infant. However, the human
visual system’s contrast sensitivity varies systematically with retinal
eccentricity: the area centered around the gaze point (the fovea)
captures a high-resolution image, while the imagery in the periphery
is captured at dramatically lower resolution due to its lower
sensitivity to higher spatial frequencies. As a result, the human
visual system does not process all “pixels” in the first-person
image equally, but instead focuses more on the pixels around the
fovea. To closely approximate the visual signals that are “input” to
a learner’s learning system, we implemented the method of <cite class="ltx_citemacro_cite"><bibref bibrefs="perry" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>
to simulate the effect of foveated visual acuity on each frame. The
basic idea is to preserve the original high-resolution image at the
center of gaze while increasing blur progressively towards the periphery,
as shown in Figure <ref labelref="LABEL:fig:acuity"/>. This technique applies a model of
what is known about human visual acuity and has been validated with
human psychophysical studies <cite class="ltx_citemacro_cite"><bibref bibrefs="perry" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>.</p>
      </para>
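        <para>
          <p>The basic idea of gaze-contingent blur can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the cited acuity model: it replaces each pixel with the mean of a square neighbourhood whose radius grows linearly with distance from the gaze point. The function name <text font="typewriter">foveate</text> and the parameters <text font="typewriter">pixels_per_step</text> and <text font="typewriter">max_radius</text> are hypothetical and chosen only for illustration.</p>
          <verbatim font="typewriter"><![CDATA[
```python
import math


def foveate(image, gaze_row, gaze_col, pixels_per_step=2.0, max_radius=3):
    """Blur a grayscale image (list of lists) more strongly away from gaze.

    Simplified stand-in, NOT the cited acuity model: each output pixel is
    the mean of a square neighbourhood whose radius grows linearly with the
    distance from the gaze point. pixels_per_step and max_radius are
    made-up illustration parameters.
    """
    height, width = len(image), len(image[0])
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            dist = math.hypot(r - gaze_row, c - gaze_col)
            radius = min(max_radius, int(dist / pixels_per_step))
            # Clip the neighbourhood to the image borders.
            r0, r1 = max(0, r - radius), min(height - 1, r + radius)
            c0, c1 = max(0, c - radius), min(width - 1, c + radius)
            window = [image[i][j]
                      for i in range(r0, r1 + 1)
                      for j in range(c0, c1 + 1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


# Checkerboard test pattern: fine detail everywhere in the input.
board = [[float((r + c) % 2) for c in range(9)] for r in range(9)]
blurred = foveate(board, gaze_row=4, gaze_col=4)
print(blurred[4][4], blurred[0][0])
```
]]></verbatim>
          <p>In this sketch the pixel at the gaze point keeps its original value, while pixels far from the gaze point are averaged with their neighbours and lose fine detail.</p>
        </para>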
      <figure inlist="lof" labels="LABEL:fig:basedata" placement="t" xml:id="S2.F4">
        <tags>
          <tag><text fontsize="90%">Figure 4</text></tag>
          <tag role="refnum">4</tag>
          <tag role="typerefnum">Figure 4</tag>
        </tags>
        <figure align="center" inlist="lof" labels="LABEL:fig:original" placement="b" xml:id="S2.F3.sf1">
          <tags>
            <tag><text fontsize="90%">(a)</text></tag>
            <tag role="refnum">3(a)</tag>
          </tags>
          <graphics candidates="figs/sid1230frame4570oriGaze.jpg" graphic="./figs/sid1230frame4570oriGaze.jpg" options="width=433.62pt" xml:id="S2.F3.sf1.g1"/>
          <toccaption><tag close=" ">(a)</tag>Original</toccaption>
          <caption><tag close=" "><text fontsize="90%">(a)</text></tag><text fontsize="90%">Original</text></caption>
        </figure>
        <figure align="center" inlist="lof" labels="LABEL:fig:acuity" placement="b" xml:id="S2.F3.sf2">
          <tags>
            <tag><text fontsize="90%">(b)</text></tag>
            <tag role="refnum">3(b)</tag>
          </tags>
          <graphics candidates="figs/sid1230frame4570acuityGaze.jpg" graphic="./figs/sid1230frame4570acuityGaze.jpg" options="width=433.62pt" xml:id="S2.F3.sf2.g1"/>
          <toccaption><tag close=" ">(b)</tag>Acuity simulation</toccaption>
          <caption><tag close=" "><text fontsize="90%">(b)</text></tag><text fontsize="90%">Acuity simulation</text></caption>
        </figure>
        <toccaption class="ltx_centering"><tag close=" ">4</tag> We simulated foveated vision by applying an acuity filter to the original egocentric image, based on the eye gaze position (red crosshairs). </toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 4</text></tag><text fontsize="90%"> We simulated foveated vision by applying an acuity filter to the original egocentric image, based on the eye gaze position (red crosshairs). </text></caption>
      </figure>
    </subsection>
    <subsection inlist="toc" xml:id="S2.SS5">
      <tags>
        <tag>2.5</tag>
        <tag role="refnum">2.5</tag>
        <tag role="typerefnum">§2.5</tag>
      </tags>
      <title><tag close=" ">2.5</tag>Convolutional Neural Network Models</title>
      <para xml:id="S2.SS5.p1">
        <p>We used a state-of-the-art CNN model, ResNet50 <cite class="ltx_citemacro_cite"><bibref bibrefs="he2016resnet" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>,
trained with stochastic gradient descent (SGD). The network outputs a
softmax probability distribution over 24 object labels, so the
label with the highest probability is the predicted object. SGD
optimizes the CNN parameters to minimize the cross-entropy loss
between the predicted distribution and the ground truth (one-hot) distribution.
Before training, we initialized the parameters of ResNet50 with a model
pretrained on ImageNet <cite class="ltx_citemacro_cite"><bibref bibrefs="russakovsky2015imagenet" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>. Thus, the model
can reuse the visual filters learned on ImageNet to avoid having to
learn the low-level visual filters from scratch. The training
images were resized to <Math mode="inline" tex="224\times 224" text="224 * 224" xml:id="S2.SS5.p1.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="times" role="MULOP">×</XMTok>
                <XMTok meaning="224" role="NUMBER">224</XMTok>
                <XMTok meaning="224" role="NUMBER">224</XMTok>
              </XMApp>
            </XMath>
          </Math> pixels with bilinear interpolation. We
used SGD with batch size 128, momentum <Math mode="inline" tex="0.9" text="0.9" xml:id="S2.SS5.p1.m2">
            <XMath>
              <XMTok meaning="0.9" role="NUMBER">0.9</XMTok>
            </XMath>
          </Math>,
and initial learning rate 0.01. We decreased the learning rate by a
factor of 10 when the performance stopped improving, and ended
training when the learning rate reached 0.0001. Because training is
stochastic, results vary naturally across training runs; we
thus ran each of our experiments 10 times and report means and
standard deviations. Moreover, since our goal was to discover general
principles that lead to successful word learning and not to analyze
the results for individual objects, we applied mixed-effects
logistic regression with random effects of trial and object in each of
our analyses.</p>
      </para>
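      <para>
        <p>The learning-rate schedule described above can be illustrated with a minimal sketch. It is not the training code used in this work: the function name, the <text font="italic">patience</text> parameter, and the example validation scores are hypothetical, and we assume that training ends as soon as the rate has been reduced to 0.0001.</p>
        <verbatim>
```python
def plateau_schedule(scores, initial_lr=0.01, final_lr=0.0001, patience=2):
    """Return the learning rate used in each epoch.

    scores: validation accuracy after each epoch (hypothetical input).
    The rate is divided by 10 after `patience` epochs without a new best
    score; training ends once the rate has reached `final_lr`.
    """
    decays = 0                # number of times the rate was divided by 10
    best = float("-inf")      # best validation score seen so far
    stale = 0                 # epochs since the best score last improved
    history = []
    for score in scores:
        lr = round(initial_lr / 10 ** decays, 10)
        if final_lr >= lr:    # the rate has reached the stopping value
            break
        history.append(lr)
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                decays += 1
                stale = 0
    return history


# Hypothetical validation accuracies, one per epoch.
rates = plateau_schedule([0.10, 0.15, 0.15, 0.14, 0.16, 0.16, 0.16, 0.20])
print(rates)
```
</verbatim>
        <p>In this sketch the rate stays at 0.01 until the score fails to improve for two consecutive epochs, drops to 0.001, and training stops at the next plateau.</p>
      </para>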
    </subsection>
  </section>
  <section inlist="toc" xml:id="S3">
    <tags>
      <tag>3</tag>
      <tag role="refnum">3</tag>
      <tag role="typerefnum">§3</tag>
    </tags>
    <title><tag close=" ">3</tag>Experiments and Results</title>
    <subsection inlist="toc" xml:id="S3.SS1">
      <tags>
        <tag>3.1</tag>
        <tag role="refnum">3.1</tag>
        <tag role="typerefnum">§3.1</tag>
      </tags>
      <title><tag close=" ">3.1</tag>Study 1: Learning object names from raw egocentric video</title>
      <para xml:id="S3.SS1.p1">
        <p>The aim of Study 1 is to demonstrate that a state-of-the-art machine
learning model can be trained to associate object names with visual
objects using egocentric data that closely approximate the sensory
experiences of infant learners. We also evaluated models trained on
parent-view data in order to compare the informativeness of the
two views. Moreover, to examine how the amount of
training data affects learning, we created several simulation conditions by
sub-sampling the full set of 1459 naming events into seven subsets with different numbers
of naming events (50, 100, 200, 400, 600, 800, 1100). While we expected
that more naming instances would lead to better learning,
we sought to quantify this relationship.</p>
      </para>
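      <para>
        <p>The sub-sampling step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: we assume each subset is drawn independently and without replacement from the full set, and the integer event identifiers, the function name, and the seed are hypothetical.</p>
        <verbatim>
```python
import random

TOTAL_EVENTS = 1459
SUBSET_SIZES = [50, 100, 200, 400, 600, 800, 1100]


def make_subsets(event_ids, sizes, seed=0):
    """Draw one random subset of naming events for each requested size."""
    rng = random.Random(seed)   # fixed seed so the draw is reproducible
    return {n: rng.sample(event_ids, n) for n in sizes}


event_ids = list(range(TOTAL_EVENTS))   # hypothetical integer event IDs
subsets = make_subsets(event_ids, SUBSET_SIZES)
for n, ids in subsets.items():
    print(n, len(set(ids)))             # each subset has n distinct events
```
</verbatim>
      </para>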
      <figure inlist="lof" labels="LABEL:fig:study1" placement="t" xml:id="S3.F5">
        <tags>
          <tag><text fontsize="90%">Figure 5</text></tag>
          <tag role="refnum">5</tag>
          <tag role="typerefnum">Figure 5</tag>
        </tags>
        <graphics candidates="figs/study1num.pdf" class="ltx_centering" graphic="./figs/study1num.pdf" options="width=390.258pt" xml:id="S3.F5.g1"/>
        <toccaption><tag close=" ">5</tag>Results from models trained with infant data
improve with more naming instances, while the
models trained with the parent data show
no improvement. </toccaption>
        <caption><tag close=": "><text fontsize="90%">Figure 5</text></tag><text fontsize="90%">Results from models trained with infant data
improve with more naming instances, while the
models trained with the parent data show
no improvement. </text></caption>
      </figure>
      <para xml:id="S3.SS1.p2">
        <p>Figure <ref labelref="LABEL:fig:study1"/> reveals two notable differences between the
models trained on the infant data and the models trained on the parent
data. First, with 200 or more naming events,
models trained on infant data consistently outperformed
the same models trained on parent data (e.g., for 200 naming
events: <Math mode="inline" tex="M_{infant}=16.12\%,SE_{infant}=1.73\%;M_{parent}=8.32\%,SE_{parent}=1.27\%;%&#10;\beta=0.34,t=3.64,p&lt;0.001" text="formulae@(M _ (i * n * f * a * n * t) = 16.12percent, formulae@(S * E _ (i * n * f * a * n * t) = 1.73percent, formulae@(M _ (p * a * r * e * n * t) = 8.32percent, formulae@(S * E _ (p * a * r * e * n * t) = 1.27percent, formulae@(beta = 0.34, formulae@(t = 3.64, p less 0.001))))))" xml:id="S3.SS1.p2.m1">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="formulae"/>
                  <XMRef idref="S3.SS1.p2.m1.1"/>
                  <XMRef idref="S3.SS1.p2.m1.2"/>
                </XMApp>
                <XMWrap>
                  <XMApp xml:id="S3.SS1.p2.m1.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="16.12" role="NUMBER">16.12</XMTok>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMDual xml:id="S3.SS1.p2.m1.2">
                    <XMApp>
                      <XMTok meaning="formulae"/>
                      <XMRef idref="S3.SS1.p2.m1.2.1"/>
                      <XMRef idref="S3.SS1.p2.m1.2.2"/>
                    </XMApp>
                    <XMWrap>
                      <XMApp xml:id="S3.SS1.p2.m1.2.1">
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">S</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" role="UNKNOWN">E</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                            </XMApp>
                          </XMApp>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                          <XMTok meaning="1.73" role="NUMBER">1.73</XMTok>
                        </XMApp>
                      </XMApp>
                      <XMTok role="PUNCT">;</XMTok>
                      <XMDual xml:id="S3.SS1.p2.m1.2.2">
                        <XMApp>
                          <XMTok meaning="formulae"/>
                          <XMRef idref="S3.SS1.p2.m1.2.2.1"/>
                          <XMRef idref="S3.SS1.p2.m1.2.2.2"/>
                        </XMApp>
                        <XMWrap>
                          <XMApp xml:id="S3.SS1.p2.m1.2.2.1">
                            <XMTok meaning="equals" role="RELOP">=</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="italic" role="UNKNOWN">M</XMTok>
                              <XMApp>
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">p</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                              <XMTok meaning="8.32" role="NUMBER">8.32</XMTok>
                            </XMApp>
                          </XMApp>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMDual xml:id="S3.SS1.p2.m1.2.2.2">
                            <XMApp>
                              <XMTok meaning="formulae"/>
                              <XMRef idref="S3.SS1.p2.m1.2.2.2.1"/>
                              <XMRef idref="S3.SS1.p2.m1.2.2.2.2"/>
                            </XMApp>
                            <XMWrap>
                              <XMApp xml:id="S3.SS1.p2.m1.2.2.2.1">
                                <XMTok meaning="equals" role="RELOP">=</XMTok>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMTok font="italic" role="UNKNOWN">S</XMTok>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="italic" role="UNKNOWN">E</XMTok>
                                    <XMApp>
                                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">p</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                    </XMApp>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                                  <XMTok meaning="1.27" role="NUMBER">1.27</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMTok role="PUNCT">;</XMTok>
                              <XMDual xml:id="S3.SS1.p2.m1.2.2.2.2">
                                <XMApp>
                                  <XMTok meaning="formulae"/>
                                  <XMRef idref="S3.SS1.p2.m1.2.2.2.2.1"/>
                                  <XMRef idref="S3.SS1.p2.m1.2.2.2.2.2"/>
                                </XMApp>
                                <XMWrap>
                                  <XMApp xml:id="S3.SS1.p2.m1.2.2.2.2.1">
                                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                                    <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                                    <XMTok meaning="0.34" role="NUMBER">0.34</XMTok>
                                  </XMApp>
                                  <XMTok role="PUNCT">,</XMTok>
                                  <XMDual xml:id="S3.SS1.p2.m1.2.2.2.2.2">
                                    <XMApp>
                                      <XMTok meaning="formulae"/>
                                      <XMRef idref="S3.SS1.p2.m1.2.2.2.2.2.1"/>
                                      <XMRef idref="S3.SS1.p2.m1.2.2.2.2.2.2"/>
                                    </XMApp>
                                    <XMWrap>
                                      <XMApp xml:id="S3.SS1.p2.m1.2.2.2.2.2.1">
                                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                                        <XMTok font="italic" role="UNKNOWN">t</XMTok>
                                        <XMTok meaning="3.64" role="NUMBER">3.64</XMTok>
                                      </XMApp>
                                      <XMTok role="PUNCT">,</XMTok>
                                      <XMApp xml:id="S3.SS1.p2.m1.2.2.2.2.2.2">
                                        <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                                        <XMTok font="italic" role="UNKNOWN">p</XMTok>
                                        <XMTok meaning="0.001" role="NUMBER">0.001</XMTok>
                                      </XMApp>
                                    </XMWrap>
                                  </XMDual>
                                </XMWrap>
                              </XMDual>
                            </XMWrap>
                          </XMDual>
                        </XMWrap>
                      </XMDual>
                    </XMWrap>
                  </XMDual>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>). Second, as the
quantity of training data increased, the models trained on infant
data continued to improve while the models trained on the
parent data saturated. Taken together, these results
provide convincing evidence that the model can solve the name-object
mapping problem from raw video, and that the infant data have
properties that support better word learning. The finding that
infant data lead to better learning is consistent with recent results
reported on another topic in early development, visual object
recognition <cite class="ltx_citemacro_cite"><bibref bibrefs="bambach2018toddler" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS2">
      <tags>
        <tag>3.2</tag>
        <tag role="refnum">3.2</tag>
        <tag role="typerefnum">§3.2</tag>
      </tags>
      <title><tag close=" ">3.2</tag>Study 2: Examining the effects of different attentional strategies</title>
      <para xml:id="S3.SS2.p1">
        <p>Humans make approximately three eye movements per second on average, as
the visual system actively selects information
that is then fed into internal cognitive and learning
processes. Thus, during the 3-second window during and after a naming
utterance, an infant learner may look at several different objects
in view or, alternatively,
may sustain attention on one object.
The aim of Study 2 is to investigate whether
different attention strategies during naming events influence
word learning and, if so, how.</p>
      </para>
      <para xml:id="S3.SS2.p2">
        <p>To answer this question, we first assigned each naming
event to one of two categories:
<text font="italic">sustained attention</text> if the infant
attended to a single object for more than 60% of the
frames in the naming event, and <text font="italic">distributed attention</text> otherwise.
This split resulted in 750 sustained attention (SA) and 709
distributed attention (DA) events. In either case, the infant may or may
not attend to the named object because the definition is based on the
distribution of infant attention, <text font="italic">not</text> on which objects were
attended in a naming event. We trained two identical models, one on SA
instances and one on DA instances. The results in Figure <ref labelref="LABEL:fig:study3temporalAtt"/> reveal that the model trained with
sustained attention events (<Math mode="inline" tex="M_{sustained}=30.53\%,SE_{sustained}=2.08\%" text="formulae@(M _ (s * u * s * t * a * i * n * e * d) = 30.53percent, S * E _ (s * u * s * t * a * i * n * e * d) = 2.08percent)" xml:id="S3.SS2.p2.m1">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="formulae"/>
                  <XMRef idref="S3.SS2.p2.m1.1"/>
                  <XMRef idref="S3.SS2.p2.m1.2"/>
                </XMApp>
                <XMWrap>
                  <XMApp xml:id="S3.SS2.p2.m1.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">d</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="30.53" role="NUMBER">30.53</XMTok>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMApp xml:id="S3.SS2.p2.m1.2">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" role="UNKNOWN">S</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">E</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">d</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="2.08" role="NUMBER">2.08</XMTok>
                    </XMApp>
                  </XMApp>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>) outperformed the model trained with
distributed attention events (<Math mode="inline" tex="M_{distributed}=23.26\%,SE_{distributed}=1.78\%;\beta=0.20,t=2.65,p&lt;0.005" text="formulae@(M _ (d * i * s * t * r * i * b * u * t * e * d) = 23.26percent, formulae@(S * E _ (d * i * s * t * r * i * b * u * t * e * d) = 1.78percent, formulae@(beta = 0.20, formulae@(t = 2.65, p less 0.005))))" xml:id="S3.SS2.p2.m2">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="formulae"/>
                  <XMRef idref="S3.SS2.p2.m2.1"/>
                  <XMRef idref="S3.SS2.p2.m2.2"/>
                </XMApp>
                <XMWrap>
                  <XMApp xml:id="S3.SS2.p2.m2.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">d</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">b</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">d</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="23.26" role="NUMBER">23.26</XMTok>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMDual xml:id="S3.SS2.p2.m2.2">
                    <XMApp>
                      <XMTok meaning="formulae"/>
                      <XMRef idref="S3.SS2.p2.m2.2.1"/>
                      <XMRef idref="S3.SS2.p2.m2.2.2"/>
                    </XMApp>
                    <XMWrap>
                      <XMApp xml:id="S3.SS2.p2.m2.2.1">
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">S</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" role="UNKNOWN">E</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">d</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">b</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">d</XMTok>
                            </XMApp>
                          </XMApp>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                          <XMTok meaning="1.78" role="NUMBER">1.78</XMTok>
                        </XMApp>
                      </XMApp>
                      <XMTok role="PUNCT">;</XMTok>
                      <XMDual xml:id="S3.SS2.p2.m2.2.2">
                        <XMApp>
                          <XMTok meaning="formulae"/>
                          <XMRef idref="S3.SS2.p2.m2.2.2.1"/>
                          <XMRef idref="S3.SS2.p2.m2.2.2.2"/>
                        </XMApp>
                        <XMWrap>
                          <XMApp xml:id="S3.SS2.p2.m2.2.2.1">
                            <XMTok meaning="equals" role="RELOP">=</XMTok>
                            <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                            <XMTok meaning="0.20" role="NUMBER">0.20</XMTok>
                          </XMApp>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMDual xml:id="S3.SS2.p2.m2.2.2.2">
                            <XMApp>
                              <XMTok meaning="formulae"/>
                              <XMRef idref="S3.SS2.p2.m2.2.2.2.1"/>
                              <XMRef idref="S3.SS2.p2.m2.2.2.2.2"/>
                            </XMApp>
                            <XMWrap>
                              <XMApp xml:id="S3.SS2.p2.m2.2.2.2.1">
                                <XMTok meaning="equals" role="RELOP">=</XMTok>
                                <XMTok font="italic" role="UNKNOWN">t</XMTok>
                                <XMTok meaning="2.65" role="NUMBER">2.65</XMTok>
                              </XMApp>
                              <XMTok role="PUNCT">,</XMTok>
                              <XMApp xml:id="S3.SS2.p2.m2.2.2.2.2">
                                <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                                <XMTok font="italic" role="UNKNOWN">p</XMTok>
                                <XMTok meaning="0.005" role="NUMBER">0.005</XMTok>
                              </XMApp>
                            </XMWrap>
                          </XMDual>
                        </XMWrap>
                      </XMDual>
                    </XMWrap>
                  </XMDual>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>), suggesting that
sustained attention on a single object while hearing a name leads to
better learning.</p>
      </para>
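      <para>
        <p>The categorization rule above can be written as a short sketch. It assumes that each naming event is available as a list with one attended-object label per frame, with <text font="italic">None</text> marking frames in which no object was attended; this representation, the function name, and the example labels are hypothetical. As in the definition, the rule ignores which object was named.</p>
        <verbatim>
```python
from collections import Counter


def classify_event(frame_labels, threshold_percent=60):
    """Label one naming event as 'sustained' or 'distributed' attention.

    frame_labels: attended object per frame, or None if no object was
    attended (hypothetical representation of the gaze annotations).
    """
    counts = Counter(label for label in frame_labels if label is not None)
    if not counts:
        return "distributed"
    top_count = counts.most_common(1)[0][1]
    # Sustained only if one object covers MORE than 60% of all frames;
    # integer arithmetic avoids floating-point edge cases at exactly 60%.
    if top_count * 100 > threshold_percent * len(frame_labels):
        return "sustained"
    return "distributed"


print(classify_event(["car"] * 7 + ["cup"] * 3))
print(classify_event(["car"] * 6 + ["cup"] * 4))
```
</verbatim>
        <p>The first example event is classified as sustained attention (one object in 70% of the frames) and the second as distributed attention (exactly 60%, which does not exceed the threshold).</p>
      </para>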
      <figure inlist="lof" labels="LABEL:fig:study3temporalAtt" placement="t" xml:id="S3.F6">
        <tags>
          <tag><text fontsize="90%">Figure 6</text></tag>
          <tag role="refnum">6</tag>
          <tag role="typerefnum">Figure 6</tag>
        </tags>
        <graphics candidates="figs/StudyAtt.pdf" class="ltx_centering" graphic="./figs/StudyAtt.pdf" options="width=433.62pt" xml:id="S3.F6.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">6</tag>The model trained with sustained attention events
outperformed the model trained with distributed attention
events. Within the sustained attention events, the model
trained with on-target instances outperformed the model
trained with on-non-target
instances. </toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 6</text></tag><text fontsize="90%">The model trained with sustained attention events
outperformed the model trained with distributed attention
events. Within the sustained attention events, the model
trained with on-target instances outperformed the model
trained with on-non-target
instances. </text></caption>
      </figure>
      <para xml:id="S3.SS2.p3">
        <p>Of course, infants may or may not sustain attention on the
object actually named in parent speech.
In total,
infants attended to the target in 452 of the 750 SA events and
attended to a non-target object in the other 298 SA events. Sustained
attention on the target object should help learning,
while sustained attention on a non-target object should hinder
learning. To test this prediction,
we sub-sampled 298 of the 452 on-target SA events and compared
them with the 298 on-non-target SA events, so that both training sets were the same size. As shown in
Figure <ref labelref="LABEL:fig:study3temporalAtt"/>, the model trained with the on-target
<!--  %**** main.removed.tex Line 600 **** -->events (<Math mode="inline" tex="M_{target}=39.27\%,SE_{target}=2.30\%" text="formulae@(M _ (t * a * r * g * e * t) = 39.27percent, S * E _ (t * a * r * g * e * t) = 2.30percent)" xml:id="S3.SS2.p3.m1">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="formulae"/>
                  <XMRef idref="S3.SS2.p3.m1.1"/>
                  <XMRef idref="S3.SS2.p3.m1.2"/>
                </XMApp>
                <XMWrap>
                  <XMApp xml:id="S3.SS2.p3.m1.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="39.27" role="NUMBER">39.27</XMTok>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMApp xml:id="S3.SS2.p3.m1.2">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" role="UNKNOWN">S</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">E</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="2.30" role="NUMBER">2.30</XMTok>
                    </XMApp>
                  </XMApp>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>) achieved
significantly higher accuracy than the model trained on
on-non-target events (<Math mode="inline" tex="M_{non-target}=8.42\%,SE_{non-target}=1.20\%;\beta=0.98,t=14.52,p&lt;0.001" text="formulae@(M _ (n * o * n - t * a * r * g * e * t) = 8.42percent, formulae@(S * E _ (n * o * n - t * a * r * g * e * t) = 1.20percent, formulae@(beta = 0.98, formulae@(t = 14.52, p less 0.001))))" xml:id="S3.SS2.p3.m2">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="formulae"/>
                  <XMRef idref="S3.SS2.p3.m2.1"/>
                  <XMRef idref="S3.SS2.p3.m2.2"/>
                </XMApp>
                <XMWrap>
                  <XMApp xml:id="S3.SS2.p3.m2.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMApp>
                        <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="8.42" role="NUMBER">8.42</XMTok>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMDual xml:id="S3.SS2.p3.m2.2">
                    <XMApp>
                      <XMTok meaning="formulae"/>
                      <XMRef idref="S3.SS2.p3.m2.2.1"/>
                      <XMRef idref="S3.SS2.p3.m2.2.2"/>
                    </XMApp>
                    <XMWrap>
                      <XMApp xml:id="S3.SS2.p3.m2.2.1">
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">S</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" role="UNKNOWN">E</XMTok>
                            <XMApp>
                              <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                              <XMApp>
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                              </XMApp>
                            </XMApp>
                          </XMApp>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                          <XMTok meaning="1.20" role="NUMBER">1.20</XMTok>
                        </XMApp>
                      </XMApp>
                      <XMTok role="PUNCT">;</XMTok>
                      <XMDual xml:id="S3.SS2.p3.m2.2.2">
                        <XMApp>
                          <XMTok meaning="formulae"/>
                          <XMRef idref="S3.SS2.p3.m2.2.2.1"/>
                          <XMRef idref="S3.SS2.p3.m2.2.2.2"/>
                        </XMApp>
                        <XMWrap>
                          <XMApp xml:id="S3.SS2.p3.m2.2.2.1">
                            <XMTok meaning="equals" role="RELOP">=</XMTok>
                            <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                            <XMTok meaning="0.98" role="NUMBER">0.98</XMTok>
                          </XMApp>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMDual xml:id="S3.SS2.p3.m2.2.2.2">
                            <XMApp>
                              <XMTok meaning="formulae"/>
                              <XMRef idref="S3.SS2.p3.m2.2.2.2.1"/>
                              <XMRef idref="S3.SS2.p3.m2.2.2.2.2"/>
                            </XMApp>
                            <XMWrap>
                              <XMApp xml:id="S3.SS2.p3.m2.2.2.2.1">
                                <XMTok meaning="equals" role="RELOP">=</XMTok>
                                <XMTok font="italic" role="UNKNOWN">t</XMTok>
                                <XMTok meaning="14.52" role="NUMBER">14.52</XMTok>
                              </XMApp>
                              <XMTok role="PUNCT">,</XMTok>
                              <XMApp xml:id="S3.SS2.p3.m2.2.2.2.2">
                                <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                                <XMTok font="italic" role="UNKNOWN">p</XMTok>
                                <XMTok meaning="0.001" role="NUMBER">0.001</XMTok>
                              </XMApp>
                            </XMWrap>
                          </XMDual>
                        </XMWrap>
                      </XMDual>
                    </XMWrap>
                  </XMDual>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>).</p>
      </para>
      <para xml:id="S3.SS2.p4">
        <p>In everyday learning contexts such as toy play, young learners do not
passively perceive information from the environment; instead, the
visual input to internal learning processes is highly selective
moment-to-moment. The ability to sustain attention in such contexts
is critical for early development and has been linked to healthy
developmental outcomes <cite class="ltx_citemacro_cite"><bibref bibrefs="ruff2001attention" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>. The results from the
present study suggest a pathway through which sustained attention
during parent naming moments creates sensory experiences that
facilitate word learning.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS3">
      <tags>
        <tag>3.3</tag>
        <tag role="refnum">3.3</tag>
        <tag role="typerefnum">§3.3</tag>
      </tags>
      <title><tag close=" ">3.3</tag>Study 3: Examining the effects of visual properties of attended objects</title>
      <para xml:id="S3.SS3.p1">
        <p>One effect of sustained attention during a naming moment is to
consistently select a certain area in the egocentric view so that the
learning system can process the visual information in that
focused area to find the target object and link it with the heard
label. Moving from the attentional level to the sensory level, we
argue that associating object names with visual objects starts with
visual information selected in the infant’s egocentric view, and
therefore the factors that matter to word learning may be not just
which objects are attended but also the sensory information selected and
processed during naming moments.
Study 3 seeks to determine how visual
properties of attended objects influence word learning.</p>
      </para>
      <para xml:id="S3.SS3.p2">
        <p>Previous studies using head-mounted cameras and head-mounted eye
trackers showed that visual objects attended by infants tend to
possess certain visual properties — e.g., they tend to
be large in view, which provides a high-resolution image of the object <cite class="ltx_citemacro_cite"><bibref bibrefs="yu2012embodied" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>. In light of this, the present simulation
focused on object size.
The naming events were grouped into two subsets by a median split of
object size. The large subset contains naming instances in which named
objects are larger than the median size (6%) whereas the small subset
contains naming instances in which named objects are smaller than the
median. The same model was separately trained on the large set and the
small set. We found that the model trained with large objects achieved
significantly higher accuracy on the test dataset than that trained
with small objects (<Math mode="inline" tex="M_{large}=30.50\%,SE_{large}=2.20\%;M_{small}=18.81\%,SE_{small}=1.81\%;\beta=%&#10;0.29,t=4.12,p&lt;0.001" text="formulae@(M _ (l * a * r * g * e) = 30.50percent, formulae@(S * E _ (l * a * r * g * e) = 2.20percent, formulae@(M _ (s * m * a * l * l) = 18.81percent, formulae@(S * E _ (s * m * a * l * l) = 1.81percent, formulae@(beta = 0.29, formulae@(t = 4.12, p less 0.001))))))" xml:id="S3.SS3.p2.m1">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="formulae"/>
                  <XMRef idref="S3.SS3.p2.m1.1"/>
                  <XMRef idref="S3.SS3.p2.m1.2"/>
                </XMApp>
                <XMWrap>
                  <XMApp xml:id="S3.SS3.p2.m1.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="30.50" role="NUMBER">30.50</XMTok>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMDual xml:id="S3.SS3.p2.m1.2">
                    <XMApp>
                      <XMTok meaning="formulae"/>
                      <XMRef idref="S3.SS3.p2.m1.2.1"/>
                      <XMRef idref="S3.SS3.p2.m1.2.2"/>
                    </XMApp>
                    <XMWrap>
                      <XMApp xml:id="S3.SS3.p2.m1.2.1">
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">S</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" role="UNKNOWN">E</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                            </XMApp>
                          </XMApp>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                          <XMTok meaning="2.20" role="NUMBER">2.20</XMTok>
                        </XMApp>
                      </XMApp>
                      <XMTok role="PUNCT">;</XMTok>
                      <XMDual xml:id="S3.SS3.p2.m1.2.2">
                        <XMApp>
                          <XMTok meaning="formulae"/>
                          <XMRef idref="S3.SS3.p2.m1.2.2.1"/>
                          <XMRef idref="S3.SS3.p2.m1.2.2.2"/>
                        </XMApp>
                        <XMWrap>
                          <XMApp xml:id="S3.SS3.p2.m1.2.2.1">
                            <XMTok meaning="equals" role="RELOP">=</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="italic" role="UNKNOWN">M</XMTok>
                              <XMApp>
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">m</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                              <XMTok meaning="18.81" role="NUMBER">18.81</XMTok>
                            </XMApp>
                          </XMApp>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMDual xml:id="S3.SS3.p2.m1.2.2.2">
                            <XMApp>
                              <XMTok meaning="formulae"/>
                              <XMRef idref="S3.SS3.p2.m1.2.2.2.1"/>
                              <XMRef idref="S3.SS3.p2.m1.2.2.2.2"/>
                            </XMApp>
                            <XMWrap>
                              <XMApp xml:id="S3.SS3.p2.m1.2.2.2.1">
                                <XMTok meaning="equals" role="RELOP">=</XMTok>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMTok font="italic" role="UNKNOWN">S</XMTok>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="italic" role="UNKNOWN">E</XMTok>
                                    <XMApp>
                                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">m</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                                  <XMTok meaning="1.81" role="NUMBER">1.81</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMTok role="PUNCT">;</XMTok>
                              <XMDual xml:id="S3.SS3.p2.m1.2.2.2.2">
                                <XMApp>
                                  <XMTok meaning="formulae"/>
                                  <XMRef idref="S3.SS3.p2.m1.2.2.2.2.1"/>
                                  <XMRef idref="S3.SS3.p2.m1.2.2.2.2.2"/>
                                </XMApp>
                                <XMWrap>
                                  <XMApp xml:id="S3.SS3.p2.m1.2.2.2.2.1">
                                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                                    <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                                    <XMTok meaning="0.29" role="NUMBER">0.29</XMTok>
                                  </XMApp>
                                  <XMTok role="PUNCT">,</XMTok>
                                  <XMDual xml:id="S3.SS3.p2.m1.2.2.2.2.2">
                                    <XMApp>
                                      <XMTok meaning="formulae"/>
                                      <XMRef idref="S3.SS3.p2.m1.2.2.2.2.2.1"/>
                                      <XMRef idref="S3.SS3.p2.m1.2.2.2.2.2.2"/>
                                    </XMApp>
                                    <XMWrap>
                                      <XMApp xml:id="S3.SS3.p2.m1.2.2.2.2.2.1">
                                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                                        <XMTok font="italic" role="UNKNOWN">t</XMTok>
                                        <XMTok meaning="4.12" role="NUMBER">4.12</XMTok>
                                      </XMApp>
                                      <XMTok role="PUNCT">,</XMTok>
                                      <XMApp xml:id="S3.SS3.p2.m1.2.2.2.2.2.2">
                                        <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                                        <XMTok font="italic" role="UNKNOWN">p</XMTok>
                                        <XMTok meaning="0.001" role="NUMBER">0.001</XMTok>
                                      </XMApp>
                                    </XMWrap>
                                  </XMDual>
                                </XMWrap>
                              </XMDual>
                            </XMWrap>
                          </XMDual>
                        </XMWrap>
                      </XMDual>
                    </XMWrap>
                  </XMDual>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>).
</p>
      </para>
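      <para>
        <p>The median split described above can be sketched in Python as follows.
This is a minimal sketch under stated assumptions, not the authors' implementation:
the function name median_split, the field name object_size, and the example sizes
are hypothetical, and events whose size equals the median are left out of both
subsets, following the strict larger/smaller wording above.</p>
        <verbatim>
```python
from statistics import median


def median_split(events, size_key="object_size"):
    """Split naming events into large and small subsets around the median size.

    `events` is a list of dicts; `size_key` names the (hypothetical) field that
    holds the size of the named object in the egocentric view.
    """
    cutoff = median(e[size_key] for e in events)
    # Strictly larger than the median goes to the large subset.
    large = [e for e in events if e[size_key] > cutoff]
    # Strictly smaller than the median goes to the small subset.
    small = [e for e in events if cutoff > e[size_key]]
    return cutoff, large, small


# Hypothetical naming events; the sizes are illustrative values only.
events = [
    {"label": "car", "object_size": s} for s in [2.0, 4.0, 6.0, 9.0, 12.0]
]
cutoff, large, small = median_split(events)
print(cutoff, len(large), len(small))
```
        </verbatim>
      </para>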
      <para xml:id="S3.SS3.p3">
        <p>If the target object in a naming event is large in view, that object
is more likely to be attended by infants. Thus, infants’ sustained
attention on a target object is likely to co-vary with the size of the
object. If so, the difference in the learning results described
above could be due to sustained attention rather than object size. To
distinguish the effects on word learning between those two co-varying
factors, we divided naming events into sustained attention and
distributed attention as in Study 2, and examined the effects of
object size in those two situations. In each case, we
used a median split to further divide naming events into a large
subset and a small subset. As shown in Figure <ref labelref="LABEL:fig:sizesample"/>,
when infants showed sustained attention on named objects, the model
trained on large targets outperformed the same model trained
with small targets (<Math mode="inline" tex="M_{large}=24.27\%,SE_{large}=2.03\%;M_{small}=12.18\%,SE_{small}=1.5\%;\beta=0%&#10;.37,t=4.,p&lt;0.001" xml:id="S3.SS3.p3.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">M</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                </XMApp>
              </XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMTok meaning="24.27" role="NUMBER">24.27</XMTok>
              <XMTok meaning="percent" role="POSTFIX">%</XMTok>
              <XMTok role="PUNCT">,</XMTok>
              <XMTok font="italic" role="UNKNOWN">S</XMTok>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">E</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                </XMApp>
              </XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMTok meaning="2.03" role="NUMBER">2.03</XMTok>
              <XMTok meaning="percent" role="POSTFIX">%</XMTok>
              <XMTok role="PUNCT">;</XMTok>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">M</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">m</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                </XMApp>
              </XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMTok meaning="12.18" role="NUMBER">12.18</XMTok>
              <XMTok meaning="percent" role="POSTFIX">%</XMTok>
              <XMTok role="PUNCT">,</XMTok>
              <XMTok font="italic" role="UNKNOWN">S</XMTok>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">E</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">m</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                </XMApp>
              </XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMTok meaning="1.5" role="NUMBER">1.5</XMTok>
              <XMTok meaning="percent" role="POSTFIX">%</XMTok>
              <XMTok role="PUNCT">;</XMTok>
              <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMTok meaning="0.37" role="NUMBER">0.37</XMTok>
              <XMTok role="PUNCT">,</XMTok>
              <XMTok font="italic" role="UNKNOWN">t</XMTok>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMTok meaning="4" role="NUMBER">4</XMTok>
              <XMTok role="PERIOD">.</XMTok>
              <XMTok role="PUNCT">,</XMTok>
              <XMTok font="italic" role="UNKNOWN">p</XMTok>
              <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
              <XMTok meaning="0.001" role="NUMBER">0.001</XMTok>
            </XMath>
          </Math>). In
the case of naming events with distributed attention, the model trained
on events with large target objects again outperformed the model trained on those
with small targets (<Math mode="inline" tex="M_{large}=17.07\%,SE_{large}=1.60\%;M_{small}=12.88\%,SE_{small}=1.33\%;\beta=%&#10;0.20,t=2.02,p&lt;0.05" text="formulae@(M _ (l * a * r * g * e) = 17.07percent, formulae@(S * E _ (l * a * r * g * e) = 1.60percent, formulae@(M _ (s * m * a * l * l) = 12.88percent, formulae@(S * E _ (s * m * a * l * l) = 1.33percent, formulae@(beta = 0.20, formulae@(t = 2.02, p less 0.05))))))" xml:id="S3.SS3.p3.m2">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="formulae"/>
                  <XMRef idref="S3.SS3.p3.m2.1"/>
                  <XMRef idref="S3.SS3.p3.m2.2"/>
                </XMApp>
                <XMWrap>
                  <XMApp xml:id="S3.SS3.p3.m2.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                      <XMTok meaning="17.07" role="NUMBER">17.07</XMTok>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMDual xml:id="S3.SS3.p3.m2.2">
                    <XMApp>
                      <XMTok meaning="formulae"/>
                      <XMRef idref="S3.SS3.p3.m2.2.1"/>
                      <XMRef idref="S3.SS3.p3.m2.2.2"/>
                    </XMApp>
                    <XMWrap>
                      <XMApp xml:id="S3.SS3.p3.m2.2.1">
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">S</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" role="UNKNOWN">E</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">g</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">e</XMTok>
                            </XMApp>
                          </XMApp>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                          <XMTok meaning="1.60" role="NUMBER">1.60</XMTok>
                        </XMApp>
                      </XMApp>
                      <XMTok role="PUNCT">;</XMTok>
                      <XMDual xml:id="S3.SS3.p3.m2.2.2">
                        <XMApp>
                          <XMTok meaning="formulae"/>
                          <XMRef idref="S3.SS3.p3.m2.2.2.1"/>
                          <XMRef idref="S3.SS3.p3.m2.2.2.2"/>
                        </XMApp>
                        <XMWrap>
                          <XMApp xml:id="S3.SS3.p3.m2.2.2.1">
                            <XMTok meaning="equals" role="RELOP">=</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="italic" role="UNKNOWN">M</XMTok>
                              <XMApp>
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">m</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                              <XMTok meaning="12.88" role="NUMBER">12.88</XMTok>
                            </XMApp>
                          </XMApp>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMDual xml:id="S3.SS3.p3.m2.2.2.2">
                            <XMApp>
                              <XMTok meaning="formulae"/>
                              <XMRef idref="S3.SS3.p3.m2.2.2.2.1"/>
                              <XMRef idref="S3.SS3.p3.m2.2.2.2.2"/>
                            </XMApp>
                            <XMWrap>
                              <XMApp xml:id="S3.SS3.p3.m2.2.2.2.1">
                                <XMTok meaning="equals" role="RELOP">=</XMTok>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMTok font="italic" role="UNKNOWN">S</XMTok>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="italic" role="UNKNOWN">E</XMTok>
                                    <XMApp>
                                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">s</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">m</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                                  <XMTok meaning="1.33" role="NUMBER">1.33</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMTok role="PUNCT">;</XMTok>
                              <XMDual xml:id="S3.SS3.p3.m2.2.2.2.2">
                                <XMApp>
                                  <XMTok meaning="formulae"/>
                                  <XMRef idref="S3.SS3.p3.m2.2.2.2.2.1"/>
                                  <XMRef idref="S3.SS3.p3.m2.2.2.2.2.2"/>
                                </XMApp>
                                <XMWrap>
                                  <XMApp xml:id="S3.SS3.p3.m2.2.2.2.2.1">
                                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                                    <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                                    <XMTok meaning="0.20" role="NUMBER">0.20</XMTok>
                                  </XMApp>
                                  <XMTok role="PUNCT">,</XMTok>
                                  <XMDual xml:id="S3.SS3.p3.m2.2.2.2.2.2">
                                    <XMApp>
                                      <XMTok meaning="formulae"/>
                                      <XMRef idref="S3.SS3.p3.m2.2.2.2.2.2.1"/>
                                      <XMRef idref="S3.SS3.p3.m2.2.2.2.2.2.2"/>
                                    </XMApp>
                                    <XMWrap>
                                      <XMApp xml:id="S3.SS3.p3.m2.2.2.2.2.2.1">
                                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                                        <XMTok font="italic" role="UNKNOWN">t</XMTok>
                                        <XMTok meaning="2.02" role="NUMBER">2.02</XMTok>
                                      </XMApp>
                                      <XMTok role="PUNCT">,</XMTok>
                                      <XMApp xml:id="S3.SS3.p3.m2.2.2.2.2.2.2">
                                        <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                                        <XMTok font="italic" role="UNKNOWN">p</XMTok>
                                        <XMTok meaning="0.05" role="NUMBER">0.05</XMTok>
                                      </XMApp>
                                    </XMWrap>
                                  </XMDual>
                                </XMWrap>
                              </XMDual>
                            </XMWrap>
                          </XMDual>
                        </XMWrap>
                      </XMDual>
                    </XMWrap>
                  </XMDual>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>). Taken together, these results suggest that visual
properties of the target object during a naming event have a direct and
unique influence on word learning.</p>
      </para>
<!--  %**** main.removed.tex Line 675 **** -->      <figure inlist="lof" labels="LABEL:fig:sizesample LABEL:fig:study4size LABEL:fig:study4temp" placement="t" xml:id="S3.F7">
        <tags>
          <tag><text fontsize="90%">Figure 7</text></tag>
          <tag role="refnum">7</tag>
          <tag role="typerefnum">Figure 7</tag>
        </tags>
        <graphics candidates="figs/LargeSmall.jpg" class="ltx_centering" graphic="./figs/LargeSmall.jpg" options="width=433.62pt" xml:id="S3.F7.g1"/>
        <graphics candidates="figs/StudySize.pdf" class="ltx_centering" graphic="./figs/StudySize.pdf" options="width=433.62pt" xml:id="S3.F7.g2"/>
        <toccaption class="ltx_centering"><tag close=" ">7</tag>Effect of object size. Naming events
were divided by the size of the named object: in the large set the named
object appears large in view, whereas in the small set it appears small in view.</toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 7</text></tag><text fontsize="90%">Effect of object size. Naming events
were divided by the size of the named object: in the large set the named
object appears large in view, whereas in the small set it appears small in view.</text></caption>
      </figure>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S4">
    <tags>
      <tag>4</tag>
      <tag role="refnum">4</tag>
      <tag role="typerefnum">§4</tag>
    </tags>
    <title><tag close=" ">4</tag>General Discussion</title>
    <para xml:id="S4.p1">
      <p>Although the referential uncertainty problem in word learning was
originally proposed as a philosophical puzzle, infant learners must
solve it at the sensory level. From the infant’s point of
view, learning object names begins with hearing an object label while
perceiving a visual scene with multiple objects in view.
However, many computational models of language learning evaluate their
proposed learning mechanisms on simple data that have been pre-selected
and/or pre-cleaned. We argue that to
obtain a complete understanding of learning mechanisms, we need to
examine not only the mechanisms themselves but also the data on which
those mechanisms operate. For infant learners, the input to
their internal processes is whatever makes contact with their
sensory systems, so we capture that input with egocentric video and head-mounted eye tracking. Moreover, in contrast to prior studies of word learning from third-person images <cite class="ltx_citemacro_cite"><bibref bibrefs="chrupala2015learning" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>, the present study is the first, to our knowledge, to use actual
visual data from the infant’s point of view to reconstruct infants’
sensory experiences and to show how a computational model can solve the
referential uncertainty problem with the information available to
infant learners.</p>
    </para>
    <para xml:id="S4.p2">
      <p>The present paper makes three main contributions as a first
step toward using authentic data to model infant word learning. First, our
findings show that the information available from the infant’s point
of view is sufficient for a machine learning model to successfully
associate object names with visual objects. Second, our findings
provide a sensory account of the role of sustained attention in early
word learning. Previous research showed that infants’ sustained
attention at naming moments during joint play is a strong predictor of
later vocabulary <cite class="ltx_citemacro_cite"><bibref bibrefs="yu2019infant" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. The results here offer a
mechanistic explanation: moments of sustained attention during
parent naming provide better visual input for early word learning
than moments when infants show more distributed
attention. Finally, our findings provide quantitative evidence of how
in-moment properties of infants’ visual input influence early word
learning.</p>
    </para>
    <para xml:id="S4.p3">
      <p>The present study used only the naming utterances in parent speech
(e.g., the object names in those utterances), but parent speech during
parent-child interaction is richer in information. For example, studies
show that individual utterances in parent speech are usually
inter-connected, forming episodes of coherent discourse
that facilitate child language
learning <cite class="ltx_citemacro_cite"><bibref bibrefs="suanda2016,frank2013social" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. To better approximate
infants’ learning experiences, our future work will include
both object-naming utterances and other referential and
non-referential utterances as the speech input to computational
models. Including the full speech transcript will also allow us to
examine how infants learn not only object names but also other types
of words in their early vocabularies, such as action verbs. In
addition, social cues in parent-child interaction play a
critical role in shaping the input to infant learners. With egocentric
video and computational models, our future work will simulate and
analyze how young learners detect and use various kinds of social cues
from the infant’s point of view.</p>
    </para>
<!--  %**** main.removed.tex Line 750 **** -->  </section>
  <section inlist="toc" xml:id="S5">
    <tags>
      <tag>5</tag>
      <tag role="refnum">5</tag>
      <tag role="typerefnum">§5</tag>
    </tags>
    <title><tag close=" ">5</tag>Acknowledgment</title>
    <para xml:id="S5.p1">
      <p>This work was supported in part by the National Institute of Child
Health and Human Development (R01HD074601 and R01HD093792) and the
National Science Foundation (CAREER IIS-1253549), and by the Indiana
University Office of the Vice Provost for Research, the College of
Arts and Sciences, and the Luddy School of Informatics, Computing, and
Engineering through the Emerging Areas of Research Project
<emph font="italic">Learning: Brains, Machines and Children</emph>.</p>
    </para>
  </section>
  <bibliography xml:id="bib">
    <title>References</title>
    <biblist>
      <bibitem key="bambach2018toddler" xml:id="bib.bib1">
        <tags>
          <tag role="number">1</tag>
          <tag role="year">2018</tag>
          <tag role="authors">Bambach et al.</tag>
          <tag role="fullauthors">Bambach, Crandall, Smith, &amp; Yu</tag>
          <tag role="refnum">Bambach et al. (2018)</tag>
          <tag role="key">bambach2018toddler</tag>
        </tags>
        <bibblock>Bambach, S., Crandall, D., Smith, L., &amp; Yu, C. </bibblock>
        <bibblock>(2018).
</bibblock>
        <bibblock>Toddler-inspired visual object learning.
</bibblock>
        <bibblock>In Advances in Neural Information Processing Systems.
</bibblock>
      </bibitem>
      <bibitem key="bambach2016active" xml:id="bib.bib2">
        <tags>
          <tag role="number">2</tag>
          <tag role="year">2016</tag>
          <tag role="authors">Bambach et al.</tag>
          <tag role="fullauthors">Bambach, Crandall, Smith, &amp; Yu</tag>
          <tag role="refnum">Bambach et al. (2016)</tag>
          <tag role="key">bambach2016active</tag>
        </tags>
        <bibblock>Bambach, S., Crandall, D. J., Smith, L. B., &amp; Yu, C. </bibblock>
        <bibblock>(2016).
</bibblock>
        <bibblock>Active viewing in toddlers facilitates visual object learning: An egocentric vision approach.
</bibblock>
        <bibblock>In Annual Conference of the Cognitive Science Society.
</bibblock>
      </bibitem>
      <bibitem key="chrupala2015learning" xml:id="bib.bib3">
        <tags>
          <tag role="number">3</tag>
          <tag role="year">2015</tag>
          <tag role="authors">Chrupała et al.</tag>
          <tag role="fullauthors">Chrupała, Kádár, &amp; Alishahi</tag>
          <tag role="refnum">Chrupała et al. (2015)</tag>
          <tag role="key">chrupala2015learning</tag>
        </tags>
        <bibblock>Chrupała, G., Kádár, Á., &amp; Alishahi, A. </bibblock>
        <bibblock>(2015).
</bibblock>
        <bibblock>Learning language through pictures.
</bibblock>
        <bibblock>In Association for Computational Linguistics.
</bibblock>
      </bibitem>
      <bibitem key="dupoux2018" xml:id="bib.bib4">
        <tags>
          <tag role="number">4</tag>
          <tag role="year">2018</tag>
          <tag role="authors">Dupoux</tag>
          <tag role="fullauthors">Dupoux</tag>
          <tag role="refnum">Dupoux (2018)</tag>
          <tag role="key">dupoux2018</tag>
        </tags>
        <bibblock>Dupoux, E. </bibblock>
        <bibblock>(2018).
</bibblock>
        <bibblock>Cognitive science in the era of artificial intelligence: A roadmap for
reverse-engineering the infant language-learner.
</bibblock>
        <bibblock>Cognition, 173, 43–59.
</bibblock>
      </bibitem>
      <bibitem key="fazly2010" xml:id="bib.bib5">
        <tags>
          <tag role="number">5</tag>
          <tag role="year">2010</tag>
          <tag role="authors">Fazly et al.</tag>
          <tag role="fullauthors">Fazly, Alishahi, &amp; Stevenson</tag>
          <tag role="refnum">Fazly et al. (2010)</tag>
          <tag role="key">fazly2010</tag>
        </tags>
        <bibblock>Fazly, A., Alishahi, A., &amp; Stevenson, S. </bibblock>
        <bibblock>(2010).
</bibblock>
        <bibblock>A probabilistic computational model of cross-situational word learning.
</bibblock>
        <bibblock>Cognitive Science, 34(6), 1017–1063.
</bibblock>
      </bibitem>
      <bibitem key="frank2009" xml:id="bib.bib6">
        <tags>
          <tag role="number">6</tag>
          <tag role="year">2009</tag>
          <tag role="authors">Frank et al.</tag>
          <tag role="fullauthors">Frank, Goodman, &amp; Tenenbaum</tag>
          <tag role="refnum">Frank et al. (2009)</tag>
          <tag role="key">frank2009</tag>
        </tags>
        <bibblock>Frank, M. C., Goodman, N. D., &amp; Tenenbaum, J. B. </bibblock>
        <bibblock>(2009).
</bibblock>
        <bibblock>Using speakers’ referential intentions to
model early cross-situational word learning.
</bibblock>
        <bibblock>Psychological Science, 20(5), 578–585.
</bibblock>
      </bibitem>
      <bibitem key="frank2013social" xml:id="bib.bib7">
        <tags>
          <tag role="number">7</tag>
          <tag role="year">2013</tag>
          <tag role="authors">Frank et al.</tag>
          <tag role="fullauthors">Frank, Tenenbaum, &amp; Fernald</tag>
          <tag role="refnum">Frank et al. (2013)</tag>
          <tag role="key">frank2013social</tag>
        </tags>
        <bibblock>Frank, M. C., Tenenbaum, J. B., &amp; Fernald, A. </bibblock>
        <bibblock>(2013).
</bibblock>
        <bibblock>Social and discourse
contributions to the determination of reference in cross-situational word
learning.
</bibblock>
        <bibblock>Language Learning and Development, 9(1), 1–24.
</bibblock>
      </bibitem>
      <bibitem key="golinkoff2000becoming" xml:id="bib.bib8">
        <tags>
          <tag role="number">8</tag>
          <tag role="year">2000</tag>
          <tag role="authors">Golinkoff et al.</tag>
          <tag role="fullauthors">Golinkoff et al.</tag>
          <tag role="refnum">Golinkoff et al. (2000)</tag>
          <tag role="key">golinkoff2000becoming</tag>
        </tags>
        <bibblock>Golinkoff, R. M., Hirsh-Pasek, K., Bloom, L., Smith, L. B., Woodward, A. L., Akhtar, N., … Hollich, G. </bibblock>
        <bibblock>(2000).
</bibblock>
        <bibblock>Becoming a word learner: A debate on lexical acquisition.
</bibblock>
        <bibblock>Oxford University Press.
</bibblock>
      </bibitem>
      <bibitem key="he2015delving" xml:id="bib.bib9">
        <tags>
          <tag role="number">9</tag>
          <tag role="year">2015</tag>
          <tag role="authors">He et al.</tag>
          <tag role="fullauthors">He, Zhang, Ren, &amp; Sun</tag>
          <tag role="refnum">He et al. (2015)</tag>
          <tag role="key">he2015delving</tag>
        </tags>
        <bibblock>He, K., Zhang, X., Ren, S., &amp; Sun, J. </bibblock>
        <bibblock>(2015).
</bibblock>
        <bibblock>Delving deep into rectifiers:
Surpassing human-level performance on ImageNet classification.
</bibblock>
        <bibblock>In IEEE Conference on Computer Vision and Pattern
Recognition.
</bibblock>
      </bibitem>
      <bibitem key="he2016resnet" xml:id="bib.bib10">
        <tags>
          <tag role="number">10</tag>
          <tag role="year">2016</tag>
          <tag role="authors">He et al.</tag>
          <tag role="fullauthors">He, Zhang, Ren, &amp; Sun</tag>
          <tag role="refnum">He et al. (2016)</tag>
          <tag role="key">he2016resnet</tag>
        </tags>
        <bibblock>He, K., Zhang, X., Ren, S., &amp; Sun, J. </bibblock>
        <bibblock>(2016).
</bibblock>
        <bibblock>Deep residual learning for image recognition.
</bibblock>
        <bibblock>In IEEE Conference on Computer Vision and Pattern
Recognition.
</bibblock>
      </bibitem>
      <bibitem key="kachergis2017" xml:id="bib.bib11">
        <tags>
          <tag role="number">11</tag>
          <tag role="year">2017</tag>
          <tag role="authors">Kachergis &amp; Yu</tag>
          <tag role="fullauthors">Kachergis &amp; Yu</tag>
          <tag role="refnum">Kachergis &amp; Yu (2017)</tag>
          <tag role="key">kachergis2017</tag>
        </tags>
        <bibblock>Kachergis, G., &amp; Yu, C. </bibblock>
        <bibblock>(2017).
</bibblock>
        <bibblock>Observing and modeling
developing knowledge and uncertainty during cross-situational word
learning.
</bibblock>
        <bibblock>IEEE Transactions on Cognitive and Developmental
Systems, 10(2), 227–236.
</bibblock>
      </bibitem>
      <bibitem key="perry" xml:id="bib.bib12">
        <tags>
          <tag role="number">12</tag>
          <tag role="year">2002</tag>
          <tag role="authors">Perry &amp; Geisler</tag>
          <tag role="fullauthors">Perry &amp; Geisler</tag>
          <tag role="refnum">Perry &amp; Geisler (2002)</tag>
          <tag role="key">perry</tag>
        </tags>
        <bibblock>Perry, J. S., &amp; Geisler, W. S. </bibblock>
        <bibblock>(2002).
</bibblock>
        <bibblock>Gaze-contingent real-time simulation of arbitrary visual
fields.
</bibblock>
        <bibblock>In Human Vision
and Electronic Imaging.
</bibblock>
      </bibitem>
      <bibitem key="quine1960word" xml:id="bib.bib13">
        <tags>
          <tag role="number">13</tag>
          <tag role="year">1960</tag>
          <tag role="authors">Quine</tag>
          <tag role="fullauthors">Quine</tag>
          <tag role="refnum">Quine (1960)</tag>
          <tag role="key">quine1960word</tag>
        </tags>
        <bibblock>Quine, W. V. O. </bibblock>
        <bibblock>(1960).
</bibblock>
        <bibblock>Word and object.
</bibblock>
        <bibblock>MIT Press.
</bibblock>
      </bibitem>
      <bibitem key="Rasanen2019" xml:id="bib.bib14">
        <tags>
          <tag role="number">14</tag>
          <tag role="year">2019</tag>
          <tag role="authors">Rasanen &amp; Khorrami</tag>
          <tag role="fullauthors">Rasanen &amp; Khorrami</tag>
          <tag role="refnum">Rasanen &amp; Khorrami (2019)</tag>
          <tag role="key">Rasanen2019</tag>
        </tags>
        <bibblock>Rasanen, O., &amp; Khorrami, K. </bibblock>
        <bibblock>(2019).
</bibblock>
        <bibblock>A computational model of early
language acquisition from audiovisual experiences of young infants.
</bibblock>
        <bibblock>In INTERSPEECH.
</bibblock>
      </bibitem>
      <bibitem key="Roy2002" xml:id="bib.bib15">
        <tags>
          <tag role="number">15</tag>
          <tag role="year">2002</tag>
          <tag role="authors">Roy &amp; Pentland</tag>
          <tag role="fullauthors">Roy &amp; Pentland</tag>
          <tag role="refnum">Roy &amp; Pentland (2002)</tag>
          <tag role="key">Roy2002</tag>
        </tags>
        <bibblock>Roy, D. K., &amp; Pentland, A. P. </bibblock>
        <bibblock>(2002, January).
</bibblock>
        <bibblock>Learning words from sights and sounds: A computational model.
</bibblock>
        <bibblock>Cognitive Science, 26(1), 113–146.
</bibblock>
      </bibitem>
      <bibitem key="ruff2001attention" xml:id="bib.bib16">
        <tags>
          <tag role="number">16</tag>
          <tag role="year">2001</tag>
          <tag role="authors">Ruff &amp; Rothbart</tag>
          <tag role="fullauthors">Ruff &amp; Rothbart</tag>
          <tag role="refnum">Ruff &amp; Rothbart (2001)</tag>
          <tag role="key">ruff2001attention</tag>
        </tags>
        <bibblock>Ruff, H. A., &amp; Rothbart, M. K. </bibblock>
        <bibblock>(2001).
</bibblock>
        <bibblock>Attention in early development: Themes and variations.
</bibblock>
        <bibblock>Oxford University Press.
</bibblock>
      </bibitem>
      <bibitem key="russakovsky2015imagenet" xml:id="bib.bib17">
        <tags>
          <tag role="number">17</tag>
          <tag role="year">2015</tag>
          <tag role="authors">Russakovsky et al.</tag>
          <tag role="fullauthors">Russakovsky et al.</tag>
          <tag role="refnum">Russakovsky et al. (2015)</tag>
          <tag role="key">russakovsky2015imagenet</tag>
        </tags>
        <bibblock>Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. </bibblock>
        <bibblock>(2015).
</bibblock>
        <bibblock>ImageNet large scale visual recognition challenge.
</bibblock>
        <bibblock>International Journal of Computer Vision, 115(3), 211–252.
</bibblock>
      </bibitem>
      <bibitem key="silver2016mastering" xml:id="bib.bib18">
        <tags>
          <tag role="number">18</tag>
          <tag role="year">2016</tag>
          <tag role="authors">Silver et al.</tag>
          <tag role="fullauthors">Silver et al.</tag>
          <tag role="refnum">Silver et al. (2016)</tag>
          <tag role="key">silver2016mastering</tag>
        </tags>
        <bibblock>Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. </bibblock>
        <bibblock>(2016).
</bibblock>
        <bibblock>Mastering the game of Go with deep neural networks and tree
search.
</bibblock>
        <bibblock>Nature, 529(7587), 484.
</bibblock>
      </bibitem>
      <bibitem key="slone2018gaze" xml:id="bib.bib19">
        <tags>
          <tag role="number">19</tag>
          <tag role="year">2018</tag>
          <tag role="authors">Slone et al.</tag>
          <tag role="fullauthors">Slone et al.</tag>
          <tag role="refnum">Slone et al. (2018)</tag>
          <tag role="key">slone2018gaze</tag>
        </tags>
        <bibblock>Slone, L. K., Abney, D. H., Borjon, J. I., Chen, C.-h., Franchak, J. M., Pearcy, D., … Yu, C. </bibblock>
        <bibblock>(2018).
</bibblock>
        <bibblock>Gaze in action: Head-mounted eye tracking of children’s
dynamic visual attention during naturalistic behavior.
</bibblock>
        <bibblock>Journal of Visualized Experiments, 141, e58496.
</bibblock>
      </bibitem>
      <bibitem key="smith2011" xml:id="bib.bib20">
        <tags>
          <tag role="number">20</tag>
          <tag role="year">2011</tag>
          <tag role="authors">K. Smith et al.</tag>
          <tag role="fullauthors">K. Smith, Smith, &amp; Blythe</tag>
          <tag role="refnum">K. Smith et al. (2011)</tag>
          <tag role="key">smith2011</tag>
        </tags>
        <bibblock>Smith, K., Smith, A. D., &amp; Blythe, R. A. </bibblock>
        <bibblock>(2011).
</bibblock>
        <bibblock>Cross-situational learning: An experimental study of
word-learning mechanisms.
</bibblock>
        <bibblock>Cognitive Science, 35(3), 480–498.
</bibblock>
      </bibitem>
      <bibitem key="smith2018" xml:id="bib.bib21">
        <tags>
          <tag role="number">21</tag>
          <tag role="year">2018</tag>
          <tag role="authors">L. B. Smith et al.</tag>
          <tag role="fullauthors">L. B. Smith, Jayaraman, Clerkin, &amp; Yu</tag>
          <tag role="refnum">L. B. Smith et al. (2018)</tag>
          <tag role="key">smith2018</tag>
        </tags>
        <bibblock>Smith, L. B., Jayaraman, S., Clerkin, E., &amp; Yu, C. </bibblock>
        <bibblock>(2018).
</bibblock>
        <bibblock>The developing infant creates a curriculum for
statistical learning.
</bibblock>
        <bibblock>Trends in Cognitive Sciences, 22(4), 325–336.
</bibblock>
      </bibitem>
      <bibitem key="suanda2016" xml:id="bib.bib22">
        <tags>
          <tag role="number">22</tag>
          <tag role="year">2016</tag>
          <tag role="authors">Suanda et al.</tag>
          <tag role="fullauthors">Suanda, Smith, &amp; Yu</tag>
          <tag role="refnum">Suanda et al. (2016)</tag>
          <tag role="key">suanda2016</tag>
        </tags>
        <bibblock>Suanda, S. H., Smith, L. B., &amp; Yu, C. </bibblock>
        <bibblock>(2016).
</bibblock>
        <bibblock>The multisensory nature of verbal discourse in
parent–toddler interactions.
</bibblock>
        <bibblock>Developmental Neuropsychology, 41(5-8), 324–341.
</bibblock>
      </bibitem>
      <bibitem key="yu2007unified" xml:id="bib.bib23">
        <tags>
          <tag role="number">23</tag>
          <tag role="year">2007</tag>
          <tag role="authors">Yu &amp; Ballard</tag>
          <tag role="fullauthors">Yu &amp; Ballard</tag>
          <tag role="refnum">Yu &amp; Ballard (2007)</tag>
          <tag role="key">yu2007unified</tag>
        </tags>
        <bibblock>Yu, C., &amp; Ballard, D. H. </bibblock>
        <bibblock>(2007).
</bibblock>
        <bibblock>A unified model of early word learning: Integrating
statistical and social cues.
</bibblock>
        <bibblock>Neurocomputing, 70(13-15), 2149–2165.
</bibblock>
      </bibitem>
      <bibitem key="yu2012embodied" xml:id="bib.bib24">
        <tags>
          <tag role="number">24</tag>
          <tag role="year">2012</tag>
          <tag role="authors">Yu &amp; Smith</tag>
          <tag role="fullauthors">Yu &amp; Smith</tag>
          <tag role="refnum">Yu &amp; Smith (2012)</tag>
          <tag role="key">yu2012embodied</tag>
        </tags>
        <bibblock>Yu, C., &amp; Smith, L. B. </bibblock>
        <bibblock>(2012).
</bibblock>
        <bibblock>Embodied attention and word learning by toddlers.
</bibblock>
        <bibblock>Cognition, 125(2), 244–262.
</bibblock>
      </bibitem>
      <bibitem key="yu2019infant" xml:id="bib.bib25">
        <tags>
          <tag role="number">25</tag>
          <tag role="year">2019</tag>
          <tag role="authors">Yu et al.</tag>
          <tag role="fullauthors">Yu, Suanda, &amp; Smith</tag>
          <tag role="refnum">Yu et al. (2019)</tag>
          <tag role="key">yu2019infant</tag>
        </tags>
        <bibblock>Yu, C., Suanda, S. H., &amp; Smith, L. B. </bibblock>
        <bibblock>(2019).
</bibblock>
        <bibblock>Infant sustained attention but not joint attention to
objects at 9 months predicts vocabulary at 12 and 15 months.
</bibblock>
        <bibblock>Developmental Science, 22(1), e12735.
</bibblock>
      </bibitem>
    </biblist>
  </bibliography>
</document>
