<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/japhy/scienceReplication.artiswrong.com/paper_files/arxiv/2012.14842/latex_extracted"?>
<?latexml class="article"?>
<?latexml package="inputenc" options="utf8"?>
<?latexml package="graphicx"?>
<?latexml package="subcaption"?>
<?latexml package="textcomp,gensymb"?>
<?latexml package="neurips_2020" options="final, nonatbib"?>
<?latexml package="natbib" options="square,sort,comma,numbers"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML" class="ltx_authors_1line">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <title>Modeling Social Interaction for Baby in Simulated Environment for Developmental Robotics</title>
  <creator role="author">
    <personname>
Md Ashaduzzaman Rubel Mondol, Aishwarya Pothula, Deokgun Park<break/>Computer Science and Engineering<break/>University of Texas at Arlington<break/>Arlington, Texas, USA<break/><text font="typewriter">{mdashaduzzaman.mondol, aishwarya.pothula}@mavs.uta.edu, deokgun.park@uta.edu</text><break/></personname>
  </creator>
  <date role="creation">October 2020</date>
  <abstract name="Abstract">
    <p>Task-specific AI agents show remarkable performance across many domains. Modeling general AI agents with human-like intelligence, however, will require more than current datasets or purely reward-based environments, which do not include the experiences an infant gathers during its earliest stages. In this paper, we present the Simulated Environment for Developmental Robotics (SEDRo). It simulates the environments that a human baby experiences from the prenatal fetus stage to 12 months after birth. SEDRo also includes a mother character that provides social interaction with the agent. To evaluate the agent’s developmental milestones, SEDRo incorporates experiments from developmental psychology.</p>
  </abstract>
  <section inlist="toc" xml:id="S1">
    <tags>
      <tag>1</tag>
      <tag role="refnum">1</tag>
      <tag role="typerefnum">§1</tag>
    </tags>
    <title><tag close=" ">1</tag>Introduction</title>
    <para xml:id="S1.p1">
      <p>Developing an Artificial Intelligence agent that can perform diverse tasks is still far from reality. Although task-specific AI agents outperform humans in many fields, no single agent can perform all the tasks that one human can. Building such an agent by combining current task-specific models is a great challenge, because training time and dataset requirements grow exponentially with the number of tasks.</p>
    </para>
    <para xml:id="S1.p2">
      <p>A human child is born with no experience of the world, yet over time learns to perform many tasks that require complex sequences of steps. The brain appears to follow a universal algorithm across all of these diverse tasks. To build a single agent that performs diverse tasks, we need to discover such a universal learning mechanism. Many researchers use physical robots to study and test such mechanisms <cite class="ltx_citemacro_cite"><bibref bibrefs="johansson2020epi,metta2008icub,gouaillier2009mechatronic" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. Building such an agent requires an environment that provides the elements necessary for longitudinal learning. Training this kind of agent in the real world is costly in both time and money. Moreover, it may not be possible to reproduce every scenario. Computer-simulated environments that provide realistic experiences have become a common way to mitigate these problems.</p>
    </para>
    <para xml:id="S1.p3">
      <p>The surrounding environment plays an important role in an infant’s learning. Studies suggest that social interaction influences cognitive development. An infant’s social interaction begins at birth with its family members, and the mother usually plays a vital role. Motherese, or Infant Directed Speech (IDS), has a significant impact on infants’ cognitive development and language acquisition <cite class="ltx_citemacro_citep">(<bibref bibrefs="catherine2013MothereseInteraction" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>.</p>
    </para>
    <para xml:id="S1.p4">
      <p>Many researchers have used computer-simulated environments to study individual abilities of human babies, such as vision <cite class="ltx_citemacro_citep">(<bibref bibrefs="fuke2007facePerceptionSimulation" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite> and motor skills <cite class="ltx_citemacro_citep">(<bibref bibrefs="savastano2012reachingSimulation" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>, or to model curiosity and intrinsic motivation <cite class="ltx_citemacro_citep">(<bibref bibrefs="barto2004IMSimulation,Fiore2008ratSimulationIM" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>. There are also simulated robot platforms such as iCub <cite class="ltx_citemacro_citep">(<bibref bibrefs="Tikhanoff2012iCub" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite> and Webots <cite class="ltx_citemacro_citep">(<bibref bibrefs="Michel2004Webots" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>, as well as simulators of the fetus environment <cite class="ltx_citemacro_citep">(<bibref bibrefs="kuniyoshi2007fetusSimulator,kuniyoshi2010fetusSimulator" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>. Simulated computer games such as ViZDoom <cite class="ltx_citemacro_citep">(<bibref bibrefs="kempka2016VizDoom" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite> and the Obstacle Tower Challenge <cite class="ltx_citemacro_citep">(<bibref bibrefs="juliani2019obstacleTower" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite> have also been used as agent environments. However, none of these platforms provides the environments that a baby experiences throughout its first year, including social interaction with a mother or other family members.</p>
    </para>
    <para xml:id="S1.p5">
      <p>In this paper, we present our ongoing work on building the Simulated Environment for Developmental Robotics (SEDRo), which is intended to facilitate the development of general intelligence in a baby agent <cite class="ltx_citemacro_cite"><bibref bibrefs="pothula2020sedro" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. A mother character is added to interact with the baby agent. The environment will have several stages corresponding to different ages of an infant after birth, as well as one stage that simulates the womb. Each stage will build on what was learned in the previous stages to support incremental development. Figure <ref labelref="LABEL:fig:sedro"/> shows screenshots of SEDRo.</p>
    </para>
  </section>
  <section inlist="toc" xml:id="S2">
    <tags>
      <tag>2</tag>
      <tag role="refnum">2</tag>
      <tag role="typerefnum">§2</tag>
    </tags>
    <title><tag close=" ">2</tag>Proposed Environment</title>
    <para xml:id="S2.p1">
      <p>SEDRo will simulate the minimal experience of a baby from the fetus stage to 12 months after birth. The key parts of SEDRo will be the body of the baby agent and a surrounding environment for the agent to interact with. Another important part will be a caregiving character, which will provide social interaction with the agent. There will also be other interactive objects in the room, such as furniture and toys, with which the agent may interact. A model of the agent interacts with the environment through an interface that extends the OpenAI Gym API <cite class="ltx_citemacro_cite"><bibref bibrefs="brockman2016openai" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. There will be four developmental stages across two environments (Fetus and After-birth) to mimic different stages of the baby: 1) Fetus, 2) Immobile, 3) Crawling, and 4) Walking. Each stage will provide a different experience for the agent and unlock new capabilities.</p>
    </para>
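    <para>
      <p>As a rough illustration of how an agent model might drive a Gym-style interface, the following Python sketch uses a toy stand-in environment. It is not the SEDRo API: the class ToyBabyEnv, the observation keys, the episode length, and the choice of one action value per degree of freedom are all assumptions made for this example. Only the vector sizes (2110 touch sensors, 469 proprioceptive values, 64 degrees of freedom) come from the descriptions in Section 2.1, and the reset/step pattern follows the classic Gym convention.</p>
    </para>

```python
import random


class ToyBabyEnv:
    """Hypothetical stand-in with a Gym-like reset/step interface (not SEDRo itself)."""

    NUM_JOINT_VALUES = 64   # degrees of freedom stated for the agent body
    NUM_PROPRIO = 469       # proprioceptive values stated in the paper
    NUM_TOUCH = 2110        # tactile sensors stated in the paper

    def __init__(self, episode_length=10):
        self.episode_length = episode_length  # assumed, for illustration only
        self.steps_taken = 0

    def _observe(self):
        # Placeholder observation: all sensors at rest, stomach full.
        return {
            "proprioception": [0.0] * self.NUM_PROPRIO,
            "touch": [0] * self.NUM_TOUCH,
            "food_level": 100.0,
        }

    def reset(self):
        self.steps_taken = 0
        return self._observe()

    def step(self, action):
        if len(action) != self.NUM_JOINT_VALUES:
            raise ValueError("expected one action value per degree of freedom")
        self.steps_taken += 1
        done = self.steps_taken >= self.episode_length
        reward = 0.0  # this sketch defines no task reward
        return self._observe(), reward, done, {}


def run_random_agent(env, seed=0):
    """Drive the environment with random actions and return the number of steps taken."""
    rng = random.Random(seed)
    env.reset()
    done = False
    steps = 0
    while not done:
        action = [rng.uniform(-1.0, 1.0) for _ in range(env.NUM_JOINT_VALUES)]
        _observation, _reward, done, _info = env.step(action)
        steps += 1
    return steps
```

    <para>
      <p>Calling run_random_agent(ToyBabyEnv(episode_length=5)) runs the loop for five steps. A learning agent would replace the random action with the output of its model.</p>
    </para>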
    <figure inlist="lof" labels="LABEL:fig:sedro" xml:id="S2.F1">
      <tags>
        <tag><text fontsize="90%">Figure 1</text></tag>
        <tag role="refnum">1</tag>
        <tag role="typerefnum">Figure 1</tag>
      </tags>
      <figure align="center" inlist="lof" labels="LABEL:fig:fetus_env" placement="b" xml:id="S2.F0.sf1">
        <tags>
          <tag><text fontsize="90%">(a)</text></tag>
          <tag role="refnum">0(a)</tag>
        </tags>
        <graphics candidates="figures/fetus_env.jpg" class="ltx_centering" graphic="figures/fetus_env.jpg" options="width=433.62pt" xml:id="S2.F0.sf1.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(a)</tag></toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(a)</text></tag></caption>
      </figure>
      <figure align="center" inlist="lof" labels="LABEL:fig:after_birth_env" placement="b" xml:id="S2.F0.sf2">
        <tags>
          <tag><text fontsize="90%">(b)</text></tag>
          <tag role="refnum">0(b)</tag>
        </tags>
        <graphics candidates="figures/after_birth_env.jpg" class="ltx_centering" graphic="figures/after_birth_env.jpg" options="width=433.62pt" xml:id="S2.F0.sf2.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(b)</tag></toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(b)</text></tag></caption>
      </figure>
      <figure align="center" inlist="lof" labels="LABEL:fig:mother_feeding" placement="b" xml:id="S2.F0.sf3">
        <tags>
          <tag><text fontsize="90%">(c)</text></tag>
          <tag role="refnum">0(c)</tag>
        </tags>
        <graphics candidates="figures/Mother_feeding.jpg" class="ltx_centering" graphic="figures/Mother_feeding.jpg" options="scale=0.09" xml:id="S2.F0.sf3.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(c)</tag></toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(c)</text></tag></caption>
      </figure>
      <figure align="center" inlist="lof" labels="LABEL:fig:mother_toy" placement="b" xml:id="S2.F0.sf4">
        <tags>
          <tag><text fontsize="90%">(d)</text></tag>
          <tag role="refnum">0(d)</tag>
        </tags>
        <graphics candidates="figures/mother_showing_toy.jpg" class="ltx_centering" graphic="figures/mother_showing_toy.jpg" options="scale=0.115" xml:id="S2.F0.sf4.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(d)</tag></toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(d)</text></tag></caption>
      </figure>
      <toccaption class="ltx_centering"><tag close=" ">1</tag><text fontsize="90%">Screenshots from SEDRo environments. (a) Fetus environment: a nearly dark space in which the baby has no visual capability. (b) After-birth house environment with a mother character and some toys. (c) The mother feeding the baby. (d) The mother showing a toy to the baby.</text></toccaption>
      <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 1</text></tag><text fontsize="90%">Screenshots from SEDRo environments. (a) Fetus environment: a nearly dark space in which the baby has no visual capability. (b) After-birth house environment with a mother character and some toys. (c) The mother feeding the baby. (d) The mother showing a toy to the baby.</text></caption>
    </figure>
    <subsection inlist="toc" xml:id="S2.SS1">
      <tags>
        <tag>2.1</tag>
        <tag role="refnum">2.1</tag>
        <tag role="typerefnum">§2.1</tag>
      </tags>
      <title><tag close=" ">2.1</tag>The Agent</title>
      <para xml:id="S2.SS1.p1">
        <p>The agent body has been developed with the capabilities to crawl, walk, grasp objects, and follow the mother’s attention, but these capabilities will unfold gradually across the stages. Initially, the maximum torque values for joint motion are too small to support walking; they will increase over time to enable it. The agent has 64 degrees of freedom for body-part movement.</p>
      </para>
      <paragraph inlist="toc" xml:id="S2.SS1.SSS0.Px1">
        <title>Vision</title>
        <para xml:id="S2.SS1.SSS0.Px1.p1">
          <p>The agent has a binocular vision system with two eyes. The eyes share 3 degrees of freedom: one for horizontal rotation of the eyeballs, one for vertical rotation, and one for adjusting focus. Each eye contains two cameras that simulate the central (8°) and peripheral (100°) vision of the human eye. One more camera, placed on the head between the eyes, generates a combined visual image of both eyes; it is optional and provided for debugging purposes. A depth-of-field effect is applied to all cameras to simulate nearsighted focusing, since young infants cannot focus beyond arm’s length. Figure <ref labelref="LABEL:fig:agent_vision"/> shows the different visual inputs for the agent.</p>
        </para>
        <figure inlist="lof" labels="LABEL:fig:agent_vision" placement="ht" xml:id="S2.F2">
          <tags>
            <tag><text fontsize="90%">Figure 2</text></tag>
            <tag role="refnum">2</tag>
            <tag role="typerefnum">Figure 2</tag>
          </tags>
          <figure align="center" inlist="lof" labels="LABEL:fig:agent_vision_near_peri" placement="b" xml:id="S2.F1.sf1">
            <tags>
              <tag><text fontsize="90%">(a)</text></tag>
              <tag role="refnum">1(a)</tag>
            </tags>
            <graphics candidates="figures/vision/agent_vision_near_peripheral.jpg" class="ltx_centering" graphic="figures/vision/agent_vision_near_peripheral.jpg" options="width=433.62pt" xml:id="S2.F1.sf1.g1"/>
            <toccaption class="ltx_centering"><tag close=" ">(a)</tag></toccaption>
            <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(a)</text></tag></caption>
          </figure>
          <figure align="center" inlist="lof" labels="LABEL:fig:agent_vision_far_peri" placement="b" xml:id="S2.F1.sf2">
            <tags>
              <tag><text fontsize="90%">(b)</text></tag>
              <tag role="refnum">1(b)</tag>
            </tags>
            <graphics candidates="figures/vision/agent_vision_far_peri.jpg" class="ltx_centering" graphic="figures/vision/agent_vision_far_peri.jpg" options="width=433.62pt" xml:id="S2.F1.sf2.g1"/>
            <toccaption class="ltx_centering"><tag close=" ">(b)</tag></toccaption>
            <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(b)</text></tag></caption>
          </figure>
          <figure align="center" inlist="lof" labels="LABEL:fig:agent_vision_in_focus" placement="b" xml:id="S2.F1.sf3">
            <tags>
              <tag><text fontsize="90%">(c)</text></tag>
              <tag role="refnum">1(c)</tag>
            </tags>
            <graphics candidates="figures/vision/agent_vision_central_in_focus.jpg" class="ltx_centering" graphic="figures/vision/agent_vision_central_in_focus.jpg" options="width=433.62pt" xml:id="S2.F1.sf3.g1"/>
            <toccaption class="ltx_centering"><tag close=" ">(c)</tag></toccaption>
            <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(c)</text></tag></caption>
          </figure>
          <figure align="center" inlist="lof" labels="LABEL:fig:agent_vision_out_of_focus" placement="b" xml:id="S2.F1.sf4">
            <tags>
              <tag><text fontsize="90%">(d)</text></tag>
              <tag role="refnum">1(d)</tag>
            </tags>
            <graphics candidates="figures/vision/agent_vision_central_out_focus.jpg" class="ltx_centering" graphic="figures/vision/agent_vision_central_out_focus.jpg" options="width=433.62pt" xml:id="S2.F1.sf4.g1"/>
            <toccaption class="ltx_centering"><tag close=" ">(d)</tag></toccaption>
            <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(d)</text></tag></caption>
          </figure>
          <figure align="center" inlist="lof" labels="LABEL:fig:agent_vision_left_peri" placement="b" xml:id="S2.F1.sf5">
            <tags>
              <tag><text fontsize="90%">(e)</text></tag>
              <tag role="refnum">1(e)</tag>
            </tags>
            <graphics candidates="figures/vision/left_peripheral.jpg" class="ltx_centering" graphic="figures/vision/left_peripheral.jpg" options="width=433.62pt" xml:id="S2.F1.sf5.g1"/>
            <toccaption class="ltx_centering"><tag close=" ">(e)</tag></toccaption>
            <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(e)</text></tag></caption>
          </figure>
          <figure align="center" inlist="lof" labels="LABEL:fig:agent_vision_left_central" placement="b" xml:id="S2.F1.sf6">
            <tags>
              <tag><text fontsize="90%">(f)</text></tag>
              <tag role="refnum">1(f)</tag>
            </tags>
            <graphics candidates="figures/vision/left_central.jpg" class="ltx_centering" graphic="figures/vision/left_central.jpg" options="width=433.62pt" xml:id="S2.F1.sf6.g1"/>
            <toccaption class="ltx_centering"><tag close=" ">(f)</tag></toccaption>
            <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(f)</text></tag></caption>
          </figure>
          <figure align="center" inlist="lof" labels="LABEL:fig:agent_vision_right_peripheral" placement="b" xml:id="S2.F1.sf7">
            <tags>
              <tag><text fontsize="90%">(g)</text></tag>
              <tag role="refnum">1(g)</tag>
            </tags>
            <graphics candidates="figures/vision/right_peripheral.jpg" class="ltx_centering" graphic="figures/vision/right_peripheral.jpg" options="width=433.62pt" xml:id="S2.F1.sf7.g1"/>
            <toccaption class="ltx_centering"><tag close=" ">(g)</tag></toccaption>
            <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(g)</text></tag></caption>
          </figure>
          <figure align="center" inlist="lof" labels="LABEL:fig:agent_vision_right_central" placement="b" xml:id="S2.F1.sf8">
            <tags>
              <tag><text fontsize="90%">(h)</text></tag>
              <tag role="refnum">1(h)</tag>
            </tags>
            <graphics candidates="figures/vision/right_central.jpg" class="ltx_centering" graphic="figures/vision/right_central.jpg" options="width=433.62pt" xml:id="S2.F1.sf8.g1"/>
            <toccaption class="ltx_centering"><tag close=" ">(h)</tag></toccaption>
            <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(h)</text></tag></caption>
          </figure>
          <toccaption class="ltx_centering"><tag close=" ">2</tag><text fontsize="90%">Baby agent’s vision. (a) Nearsighted peripheral vision. (b) Farsighted peripheral vision. (c) Central vision in focus. (d) Central vision out of focus. (e) Left peripheral vision. (f) Left central vision. (g) Right peripheral vision. (h) Right central vision.</text></toccaption>
          <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 2</text></tag><text fontsize="90%">Baby agent’s vision. (a) Nearsighted peripheral vision. (b) Farsighted peripheral vision. (c) Central vision in focus. (d) Central vision out of focus. (e) Left peripheral vision. (f) Left central vision. (g) Right peripheral vision. (h) Right central vision.</text></caption>
        </figure>
      </paragraph>
      <paragraph inlist="toc" xml:id="S2.SS1.SSS0.Px2">
        <title>Tactile Sensitivity</title>
        <para xml:id="S2.SS1.SSS0.Px2.p1">
          <p>For tactile sensitivity, the agent is equipped with touch sensors of varying density, since the perception of touch differs across the body <cite class="ltx_citemacro_citep">(<bibref bibrefs="rochelle2014TouchDensity" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
                <bibrefphrase>, </bibrefphrase>
              </bibref>)</cite>. A total of 2110 tactile sensors are placed across the body, about half of them on the head. The sensors detect touch through the collision detection mechanism. Each sensor outputs 1 when a touch occurs and 0 otherwise. The states of all sensors are collected into a sparse status vector, which is sent as part of the observations.</p>
        </para>
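        <para>
          <p>The binary status vector described above could be assembled roughly as in the following Python sketch. The function name tactile_vector and the idea of passing in the indices of colliding sensors are assumptions for illustration; only the sensor count and the 0/1 encoding come from the text.</p>
        </para>

```python
NUM_TACTILE_SENSORS = 2110  # total sensor count stated in the text


def tactile_vector(touched_indices, num_sensors=NUM_TACTILE_SENSORS):
    """Return one 0/1 value per sensor; 1 marks a sensor currently in collision.

    touched_indices is a hypothetical input: the indices of sensors for which
    the collision detection reported a contact in the current frame.
    """
    vector = [0] * num_sensors
    for index in touched_indices:
        if index not in range(num_sensors):
            raise ValueError(f"sensor index out of range: {index}")
        vector[index] = 1
    return vector
```

        <para>
          <p>Because only a few sensors are typically in contact at once, most entries are 0, which is why the vector is sparse.</p>
        </para>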
      </paragraph>
      <paragraph inlist="toc" xml:id="S2.SS1.SSS0.Px3">
        <title>Proprioception</title>
        <para xml:id="S2.SS1.SSS0.Px3.p1">
          <p>To learn the association between spatial locations and body-part movements, the baby requires its current joint positions along with visual information. In SEDRo, the current positions and rotations of all body joints are included in the agent’s observations. In total, 469 continuous values ranging from -1 to 1 will be provided. This vector also includes each joint’s velocity and angular velocity to help the agent perceive its current body movements.</p>
        </para>
      </paragraph>
      <paragraph inlist="toc" xml:id="S2.SS1.SSS0.Px4">
        <title>Interoception</title>
        <para xml:id="S2.SS1.SSS0.Px4.p1">
          <p>The baby’s stomach food level is also included in the observation vector as an internal bodily sense. It represents the current percentage of food in the stomach. The food level decreases over time. When it falls below a threshold that represents hunger, the baby needs to cry to get food. Whenever the baby cries, the mother feeds it and the stomach food level increases.</p>
        </para>
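        <para>
          <p>The hunger cycle described above can be sketched in Python as follows. The decay rate, the hunger threshold, and the amount restored by one feeding are assumed values chosen only for illustration; the text does not specify them. The class and function names are likewise hypothetical.</p>
        </para>

```python
from dataclasses import dataclass


@dataclass
class Stomach:
    food_level: float = 100.0        # percentage of food in the stomach
    decay_per_step: float = 0.5      # assumed decrease per time step
    hunger_threshold: float = 30.0   # assumed level below which the baby is hungry

    def step(self):
        """Advance one time step: the food level drops, but never below 0."""
        self.food_level = max(0.0, self.food_level - self.decay_per_step)

    def is_hungry(self):
        return self.hunger_threshold > self.food_level

    def feed(self, amount=50.0):
        """Feeding raises the food level, capped at 100 percent (amount is assumed)."""
        self.food_level = min(100.0, self.food_level + amount)


def simulate(stomach, num_steps):
    """Count feedings over num_steps, assuming the baby cries as soon as it is hungry."""
    feedings = 0
    for _ in range(num_steps):
        stomach.step()
        if stomach.is_hungry():
            # The baby cries and the mother responds by feeding it.
            stomach.feed()
            feedings += 1
    return feedings
```

        <para>
          <p>With these assumed values, simulate(Stomach(), 300) yields two feedings. In SEDRo itself the agent has to learn to cry, so the immediate crying in this sketch is a simplification.</p>
        </para>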
      </paragraph>
    </subsection>
    <subsection inlist="toc" xml:id="S2.SS2">
      <tags>
        <tag>2.2</tag>
        <tag role="refnum">2.2</tag>
        <tag role="typerefnum">§2.2</tag>
      </tags>
      <title><tag close=" ">2.2</tag>Modeling the Motherese</title>
      <para xml:id="S2.SS2.p1">
        <p>We implemented a mother character to interact with the baby. Both the baby and the mother are inside a small house. The baby is placed in the crib or on the floor, depending on the situation and the baby’s age.</p>
      </para>
      <paragraph inlist="toc" xml:id="S2.SS2.SSS0.Px1">
        <title>Building the Mother</title>
        <para xml:id="S2.SS2.SSS0.Px1.p1">
          <p>One challenging part of facilitating the baby agent’s intellectual development is building a mother that can interact with the baby. We tackle this challenge by limiting the experience to the first year after birth, because most interaction in this period does not require open-ended back-and-forth exchanges.
The mother is equipped with a set of pre-programmed actions for interacting with the baby.
We are building a library of the mother’s actions based on real-life mother-child interaction.
Currently, we build the scenarios manually, and we plan to analyze video recordings from homes with newborn babies.
To make the behaviors realistic, we will create the mother’s movements from pre-recorded motion capture (Mocap) animations based on that footage. The resulting library will contain several different responses of the mother to the same type of baby action.</p>
        </para>
      </paragraph>
      <paragraph inlist="toc" xml:id="S2.SS2.SSS0.Px2">
        <title>Interaction with baby</title>
        <para xml:id="S2.SS2.SSS0.Px2.p1">
          <p>As a concrete example of a social scenario, feeding the baby will be one of the mother’s regular interactions. The mother will feed the baby at pre-scheduled times of the day. In addition, when the baby cries because its stomach food level is below the threshold, the mother will begin the feeding scenario. To feed the baby, the mother walks to it and starts the feeding animation. While walking and feeding, the mother can avoid obstacles and adjust her body position during the feeding animation according to the baby’s current location.</p>
        </para>
        <para xml:id="S2.SS2.SSS0.Px2.p2">
          <p>Providing Infant Directed Speech (IDS) is another significant role of mothers that helps a child’s development. For IDS in SEDRo, the mother character will talk to the baby in short words, accompanied by physical expressions such as nodding her head while looking at the baby and moving her arms. One limitation in implementing the mother’s voice is that sound cannot yet be added directly to the observations of the agent model. To work around this limitation, the initial version of SEDRo will use a one-hot encoded vector of length 26 to represent one English character at every time frame.</p>
        </para>
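        <para>
          <p>A minimal Python sketch of this character encoding is shown below. The function names, the lowercasing of input, and the decision to skip characters outside a to z (such as spaces) are assumptions for illustration; the text only specifies a one-hot vector of length 26 per time frame.</p>
        </para>

```python
import string

ALPHABET = string.ascii_lowercase  # 26 letters, matching the vector length in the text


def one_hot(char):
    """Encode a single English letter as a one-hot list of length 26."""
    if len(char) != 1 or char.lower() not in ALPHABET:
        raise ValueError(f"not a single English letter: {char!r}")
    vector = [0] * len(ALPHABET)
    vector[ALPHABET.index(char.lower())] = 1
    return vector


def encode_utterance(text):
    """Return one vector per time frame, skipping characters outside a-z (an assumption)."""
    return [one_hot(c) for c in text if c.lower() in ALPHABET]
```

        <para>
          <p>For example, encode_utterance("Hi baby") produces six frames, one per letter, each with exactly one entry set to 1.</p>
        </para>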
        <para xml:id="S2.SS2.SSS0.Px2.p3">
          <p>For joint attention, the mother will hold different objects, e.g. toys, in front of the baby, look at the object, and name it in short words. At a more developed stage, if the baby grabs or touches an object, the mother will describe it. Figures <ref labelref="LABEL:fig:mother_feeding"/> and <ref labelref="LABEL:fig:mother_toy"/> show the mother’s interaction with the baby.</p>
        </para>
<!--  %To identify and respond to baby’s actions, its random actions will be classified to match with some of the pre-determined action categories and a random response of that category will be selected as mother’s response. -->      </paragraph>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S3">
    <tags>
      <tag>3</tag>
      <tag role="refnum">3</tag>
      <tag role="typerefnum">§3</tag>
    </tags>
    <title><tag close=" ">3</tag>Evaluation of Development</title>
    <para xml:id="S3.p1">
      <p>Developmental psychology offers many experiments for evaluating and tracking the cognitive, motor, and visual development of human babies <cite class="ltx_citemacro_cite"><bibref bibrefs="cangelosi2015developmental" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>. Similarly, SEDRo will provide different experiments to evaluate the developmental progress of the agent. Figure <ref labelref="LABEL:fig:paper_rod_exp"/> shows one such experiment, in which a moving rod partly occluded by a box is shown to the baby. Babies younger than three months perceive it as two separate rods, whereas older babies perceive a single rod <cite class="ltx_citemacro_citep">(<bibref bibrefs="slater1990evaluation" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>. This test evaluates the unity perception of babies. More experiments from developmental psychology will be added to SEDRo in the same way.</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:evaluation" placement="t" xml:id="S3.F3">
      <tags>
        <tag><text fontsize="90%">Figure 3</text></tag>
        <tag role="refnum">3</tag>
        <tag role="typerefnum">Figure 3</tag>
      </tags>
      <figure align="center" inlist="lof" labels="LABEL:fig:paper_rod_exp" placement="b" xml:id="S3.F2.sf1">
        <tags>
          <tag><text fontsize="90%">(a)</text></tag>
          <tag role="refnum">2(a)</tag>
        </tags>
        <graphics candidates="figures/rod_and_box.jpg" class="ltx_centering" graphic="figures/rod_and_box.jpg" options="width=433.62pt" xml:id="S3.F2.sf1.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(a)</tag></toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(a)</text></tag></caption>
      </figure>
      <figure align="center" inlist="lof" labels="LABEL:fig:paper_rod_sedro" placement="b" xml:id="S3.F2.sf2">
        <tags>
          <tag><text fontsize="90%">(b)</text></tag>
          <tag role="refnum">2(b)</tag>
        </tags>
        <graphics candidates="figures/paper_rod.jpg" class="ltx_centering" graphic="figures/paper_rod.jpg" options="width=433.62pt" xml:id="S3.F2.sf2.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(b)</tag></toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(b)</text></tag></caption>
      </figure>
      <toccaption class="ltx_centering"><tag close=" ">3</tag><text fontsize="90%">Evaluation experiments. (a) Paper rod experiment to evaluate unity perception <cite class="ltx_citemacro_citep">(<bibref bibrefs="slater1990evaluation" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>. (b) Paper rod experiment simulation in SEDRo.</text></toccaption>
      <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 3</text></tag><text fontsize="90%">Evaluation experiments. (a) Paper rod experiment to evaluate unity perception <cite class="ltx_citemacro_citep">(<bibref bibrefs="slater1990evaluation" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>. (b) Paper rod experiment simulation in SEDRo.</text></caption>
    </figure>
  </section>
  <section inlist="toc" xml:id="S4">
    <tags>
      <tag>4</tag>
      <tag role="refnum">4</tag>
      <tag role="typerefnum">§4</tag>
    </tags>
    <title><tag close=" ">4</tag>Discussion</title>
    <para xml:id="S4.p1">
      <p>This paper has presented our work in progress. SEDRo is currently being implemented with the Unity 3D game engine. It will be improved over time as we add more social interaction scenarios between the mother character and the agent.</p>
    </para>
    <para xml:id="S4.p2">
      <p>In this version, we simulate the motherese voice with character-sequence observations. Representing voice as text has a limitation: it cannot convey variations in speech, whereas a prosodic voice requires variations in the tone and length of sounds. The prosody of motherese plays an important role in babies’ language development <cite class="ltx_citemacro_citep">(<bibref bibrefs="catherine2013MothereseInteraction" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>. In a future version, we plan to include vocal audio data directly in the observations for the agent model.</p>
    </para>
    <pagination role="newpage"/>
  </section>
  <bibliography citestyle="numbers" files="references.bib" xml:id="bib">
    <title>References</title>
  </bibliography>
</document>
