<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/japhy/scienceReplication.artiswrong.com/paper_files/arxiv/2405.07816/latex_extracted"?>
<?latexml class="article"?>
<?latexml package="neurips_2023" options="final"?>
<?latexml package="inputenc" options="utf8"?>
<?latexml package="fontenc" options="T1"?>
<?latexml package="hyperref"?>
<?latexml package="url"?>
<?latexml package="booktabs"?>
<?latexml package="amsfonts"?>
<?latexml package="nicefrac"?>
<?latexml package="microtype"?>
<?latexml package="xcolor"?>
<?latexml package="comment"?>
<?latexml package="graphicx"?>
<?latexml package="subcaption"?>
<?latexml package="amsmath"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML" class="ltx_authors_1line">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <title>Quick and Accurate Affordance Learning</title>
  <creator role="author">
    <personname>Fedor Scholz <break/>Computer Science, Cognitive Modeling <break/>University of Tübingen <break/><text font="typewriter">fedor.scholz@uni-tuebingen.de</text></personname>
  </creator>
  <creator role="author">
    <personname>Erik Ayari <break/>Computer Science, Cognitive Modeling <break/>University of Tübingen</personname>
  </creator>
  <creator role="author">
    <personname>Johannes Bertram <break/>Computer Science, Cognitive Modeling <break/>University of Tübingen</personname>
  </creator>
  <creator role="author">
    <personname>Martin V. Butz <break/>Computer Science, Cognitive Modeling <break/>University of Tübingen</personname>
  </creator>
  <abstract name="Abstract">
    <p>Infants learn actively in their environments, shaping their own learning curricula.
<!--  %They␣selectively␣pay␣attention,␣orient␣themselves,␣and␣later␣move␣to␣particular␣stimuli␣and␣entities. 
     %As␣a␣result,␣infants␣master␣progressively␣more␣complex␣tasks,␣and␣exhibit␣physical␣and␣agentive␣knowledge.-->They learn about their environments’ affordances, that is, how local circumstances determine how their behavior can affect the environment.
Here we model this type of behavior by means of a deep learning architecture.
The architecture mediates between global cognitive map exploration and local affordance learning.
Inference processes actively move the simulated agent towards regions where affordance-related knowledge gain is expected.
We contrast three measures of uncertainty to guide this exploration: predicted uncertainty of a model, standard deviation between the means of several models (SD), and the Jensen-Shannon Divergence (JSD) between several models.
We show that the first measure gets fooled by aleatoric uncertainty inherent in the environment, while the two other measures focus learning on epistemic uncertainty.
JSD exhibits the most balanced exploration strategy.
<!--  %However,␣SD␣yields␣behavior␣that␣is␣attracted␣by␣extreme␣behavior␣regimes,␣while␣JSD␣shows␣the␣most␣balanced␣epistemic␣exploration␣strategy. 
     %While␣our␣brain␣may␣have␣other␣means␣to␣implement␣the␣proposed␣processes,␣on␣computational␣levels␣our␣model␣suggests␣three␣key␣ingredients␣for␣coordinating␣the␣active␣generation␣of␣learning␣curricula:-->From a computational perspective, our model suggests three key ingredients for coordinating the active generation of learning curricula:
(1) Navigation behavior needs to be coordinated with local motor behavior for enabling active affordance learning.
(2) Affordances need to be encoded locally for acquiring generalized knowledge.
(3) Effective active affordance learning mechanisms should use density comparison techniques for estimating expected knowledge gain.
<!--  %We␣furthermore␣exhibit␣the␣possibility␣to␣couple␣the␣uncovered␣techniques␣with␣other␣motivations␣to␣explore␣the␣possibility␣to␣learning␣about␣affordances␣even␣more␣directly. -->Future work may seek collaborations with developmental psychology to model active play in children in more realistic scenarios.</p>
  </abstract>
  <section inlist="toc" xml:id="S1">
    <tags>
      <tag>1</tag>
      <tag role="autoref">section 1</tag>
      <tag role="refnum">1</tag>
      <tag role="typerefnum">§1</tag>
    </tags>
    <title><tag close=" ">1</tag>Introduction</title>
<!--  %Affordances -->    <para xml:id="S1.p1">
      <p>Humans learn internal models of their environment in order to interact with it in a flexible, adaptive, and context-dependent manner <cite class="ltx_citemacro_cite"><bibref bibrefs="Butz2016" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
Which models are suitable at a certain point in time depends on the current state of the environment:
In order to be able to prepare a cup of tea, both a cup and tea must be available and within reach <cite class="ltx_citemacro_cite"><bibref bibrefs="Kuperberg2021" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
This observation is captured by the psychological concept of affordances as introduced by <cite class="ltx_citemacro_citeauthor"><bibref bibrefs="Gibson1986" separator=";" show="Authors" yyseparator=","/></cite>:
affordances encode which behaviors are possible in a given world state <cite class="ltx_citemacro_cite"><bibref bibrefs="Gibson1986" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
While it may be said that, on the level of motor commands, any bodily action is executable at any point in time, its outcome clearly depends on the current context.
For example, in the absence of a cup and tea, ‘preparing a cup of tea’ actions will at best result in pantomime.
Therefore, we define affordances—slightly more general than <cite class="ltx_citemacro_citeauthor"><bibref bibrefs="Gibson1986" separator=";" show="Authors" yyseparator=","/></cite>—as any factors in the environment that locally influence the outcome or success of an agent’s actions, that is, that afford particular interactions and prohibit others.</p>
    </para>
<!--  %Allocentric␣vs␣egocentric -->    <para xml:id="S1.p2">
      <p>But what if an agent wants to learn how to prepare a cup of tea in a situation where neither a cup nor tea is within reach?
In order to actively search for a location where critical preconditions are met, an allocentric map that relates coordinates to locally available affordances is needed.
Such a map enables an agent to search allocentric space for locations where it can satisfy its personal motivations.
While learning, it can use the map to actively explore affordances by maximizing expected information gain.
By focusing on aspects that can be learned and disregarding aspects that cannot be learned, we emulate curiosity and boredom, and thereby let the agent create its own learning curriculum <cite class="ltx_citemacro_cite"><bibref bibrefs="Smith2018" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
<!--  %As␣we␣will␣demonstrate,␣affordances␣should␣be␣encoded␣\emph{egocentrically}␣to␣allow␣generalization␣to␣similar␣environments. --></p>
    </para>
    <para xml:id="S1.p3">
      <p>Neuroscientific evidence suggests that the brain indeed learns such cognitive maps.
Such maps have been shown to enable not only navigation in allocentric, world-centered spaces but also the instantiation of local circumstances for reasoning and planning as well as for reflecting on the past and imagining potential futures <cite class="ltx_citemacro_cite"><bibref bibrefs="Bottini:2020,Buckner:2007,Tolman:1948,O'Keefe:1978" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
To date it remains unclear, though, how such dual-use maps may be learned.
Here we assume the availability of an allocentric cognitive map for navigation and for providing sensory cues about local circumstances.
We focus entirely on studying how to navigate the environment to effectively learn about the affordances the environment offers.</p>
    </para>
<!--  %The␣map␣thus␣enables␣the␣agent␣to␣focsu␣on␣learning␣affordances 
     %and␣thus␣provide␣the␣ability␣to␣navigate␣and␣plan␣in␣an␣allocentric␣map␣that␣enables␣the␣activation␣of␣egocentric␣observations␣given␣an␣allocentric␣location.
     %With␣the␣help␣of␣this␣map,␣the␣simulated␣agent␣becomes␣able␣to␣focus␣on␣learning␣about␣affordances
     %,␣that␣is,␣about␣encodings␣that␣specify␣how␣egocentric␣observations␣signal␣modifications␣in␣the␣effects␣of␣unfolding␣behavior.
     %by␣navigating␣allocentric␣space␣in␣the␣search␣for␣learning␣opportunities.
     %Simulation-->    <para xml:id="S1.p4">
      <p>We showcase our reasoning in an artificial world with a simulated agent.
The agent perceives its environment via sensors and is able to imagine navigating it utilizing the provided cognitive map.
The environment it lives in is confined by borders and contains terrains that influence the behavioral dynamics of the agent in distinct manners:
obstacles block the passage; <!--  %and␣may␣trigger␣negative␣reward␣dependent␣on␣the␣impact␣speed␣and␣the␣softness␣of␣the␣boundary; -->force fields accelerate the agent in a certain direction;
fog fields corrupt the sensory signals with noise, mimicking an aleatoric uncertainty region.
Our study shows how navigation behavior may target affordance-related knowledge gain, that affordances should be encoded egocentrically, and that expected knowledge gain may best be computed by means of information-theoretic belief density comparisons.
</p>
    </para>
  </section>
  <section inlist="toc" xml:id="S2">
    <tags>
      <tag>2</tag>
      <tag role="autoref">section 2</tag>
      <tag role="refnum">2</tag>
      <tag role="typerefnum">§2</tag>
    </tags>
    <title><tag close=" ">2</tag>Background</title>
    <para xml:id="S2.p1">
      <p>The problem setting we are concerned with can be described as a Markov Decision Process (MDP), where an agent receives observations from an environment and, based on these, executes actions that presumably lead to a desired state.
To this end, world models that encode visual information for planning have been introduced previously.
<cite class="ltx_citemacro_citeauthor"><bibref bibrefs="Ha2018a" separator=";" show="Authors" yyseparator=","/></cite> trained a vision model to produce codes that aid a controller in action selection <cite class="ltx_citemacro_cite"><bibref bibrefs="Ha2018a" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
Since their vision model was trained as an autoencoder, though, it did not specifically produce codes that facilitate the controller’s performance.
Therefore, we do not regard the outputs of their vision model as <emph font="italic">affordance</emph> codes:
They do not necessarily extract behavior-relevant information, but are produced to reconstruct the visual input.
<cite class="ltx_citemacro_citeauthor"><bibref bibrefs="Qi2020" separator=";" show="Authors" yyseparator=","/></cite> went one step further by training a neural network to encode behavior-relevant information <cite class="ltx_citemacro_cite"><bibref bibrefs="Qi2020" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
An agent was put into an environment to gather information about regions of harm and no harm.
The experiences were backpropagated onto the input of the visual system, producing affordance maps.
Subsequently, they trained a convolutional neural network to generate these maps from the visual input.
Therefore, the architecture was not trained in an end-to-end fashion.
As a result, the codes produced by the neural network were not optimized for behavioral control, which was in any case performed by a hard-coded A* algorithm.</p>
    </para>
    <para xml:id="S2.p2">
      <p>In contrast to these studies, our work focuses on learning and exploration of a mapping from positions to affordances while a fixed, allocentric world model is provided, namely the world itself.
This is in line with <cite class="ltx_citemacro_citet"><bibref bibrefs="Epstein2015" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase/>
            <bibrefphrase/>
          </bibref></cite>, where the authors present an architecture that learns actual spatial affordances for navigation <cite class="ltx_citemacro_cite"><bibref bibrefs="Epstein2015" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
Their space of affordances, however, is limited to three predefined affordances specifically designed for navigation.
Similarly, their action selection algorithm is based on handcrafted advisors and does not plan into the future.</p>
    </para>
    <subsection inlist="toc" xml:id="S2.SS1">
      <tags>
        <tag>2.1</tag>
        <tag role="autoref">subsection 2.1</tag>
        <tag role="refnum">2.1</tag>
        <tag role="typerefnum">§2.1</tag>
      </tags>
      <title><tag close=" ">2.1</tag>Affordance architecture</title>
      <para xml:id="S2.SS1.p1">
        <p>The first end-to-end trained affordance architecture that produces codes that are explicitly optimized for behavioral control was introduced in <cite class="ltx_citemacro_citet"><bibref bibrefs="Scholz2022" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase/>
              <bibrefphrase/>
            </bibref></cite>.
In this case, the world model consists of a look-up map <Math mode="inline" tex="\omega" text="omega" xml:id="S2.SS1.p1.m1">
            <XMath>
              <XMTok font="italic" name="omega" role="UNKNOWN">ω</XMTok>
            </XMath>
          </Math>, an affordance model <Math mode="inline" tex="a_{M}" text="a _ M" xml:id="S2.SS1.p1.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">a</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math>, and a transition model <Math mode="inline" tex="t_{M}" text="t _ M" xml:id="S2.SS1.p1.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">t</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math> (see Figure <ref labelref="LABEL:fig:architecture"/>).
Given a position <Math mode="inline" tex="\mathbf{p}_{t}" text="p _ t" xml:id="S2.SS1.p1.m4">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">p</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math> at time step <Math mode="inline" tex="t" text="t" xml:id="S2.SS1.p1.m5">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">t</XMTok>
            </XMath>
          </Math>, the hard-coded look-up map <Math mode="inline" tex="\omega" text="omega" xml:id="S2.SS1.p1.m6">
            <XMath>
              <XMTok font="italic" name="omega" role="UNKNOWN">ω</XMTok>
            </XMath>
          </Math> produces a sensory representation <Math mode="inline" tex="\mathbf{v}_{t}" text="v _ t" xml:id="S2.SS1.p1.m7">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">v</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math> of the environment at that position.
The affordance model is a convolutional neural network (CNN) that computes a context code <Math mode="inline" tex="\mathbf{c}_{t}" text="c _ t" xml:id="S2.SS1.p1.m8">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">c</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math> based on <Math mode="inline" tex="\mathbf{v}_{t}" text="v _ t" xml:id="S2.SS1.p1.m9">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">v</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math>.
The transition model <Math mode="inline" tex="t_{M}" text="t _ M" xml:id="S2.SS1.p1.m10">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">t</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math>—a multi-layer perceptron (MLP)—utilizes <Math mode="inline" tex="\mathbf{c}_{t}" text="c _ t" xml:id="S2.SS1.p1.m11">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">c</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math> as an additional input to predict the parameters of a probability distribution over positional changes <Math mode="inline" tex="(\mu_{\Delta\tilde{\mathbf{p}}^{t+1}},\sigma_{\Delta\tilde{\mathbf{p}}^{t+1}})" text="open-interval@(mu _ (Delta * (tilde@(p)) ^ (t + 1)), sigma _ (Delta * (tilde@(p)) ^ (t + 1)))" xml:id="S2.SS1.p1.m12">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="open-interval"/>
                  <XMRef idref="S2.SS1.p1.m12.1"/>
                  <XMRef idref="S2.SS1.p1.m12.2"/>
                </XMApp>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S2.SS1.p1.m12.1">
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" name="mu" role="UNKNOWN">μ</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok fontsize="70%" name="Delta" role="UNKNOWN">Δ</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                        <XMApp>
                          <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                          <XMTok font="bold" fontsize="70%" role="UNKNOWN">p</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok fontsize="50%" meaning="plus" role="ADDOP">+</XMTok>
                          <XMTok font="italic" fontsize="50%" role="UNKNOWN">t</XMTok>
                          <XMTok fontsize="50%" meaning="1" role="NUMBER">1</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMApp xml:id="S2.SS1.p1.m12.2">
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok fontsize="70%" name="Delta" role="UNKNOWN">Δ</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                        <XMApp>
                          <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                          <XMTok font="bold" fontsize="70%" role="UNKNOWN">p</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok fontsize="50%" meaning="plus" role="ADDOP">+</XMTok>
                          <XMTok font="italic" fontsize="50%" role="UNKNOWN">t</XMTok>
                          <XMTok fontsize="50%" meaning="1" role="NUMBER">1</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> given the last change in position <Math mode="inline" tex="\Delta\mathbf{p}^{t}" text="Delta * p ^ t" xml:id="S2.SS1.p1.m13">
            <XMath>
              <XMApp>
                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                <XMTok name="Delta" role="UNKNOWN">Δ</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">p</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> and the executed action <Math mode="inline" tex="\mathbf{a}_{t}" text="a _ t" xml:id="S2.SS1.p1.m14">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">a</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math>.</p>
      </para>
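      <para>
        <p>To make the training signal for the transition model concrete, the following minimal sketch computes a negative log-likelihood for a predicted mean and standard deviation of a positional change. It assumes an independent Gaussian per positional dimension, which is only one possible reading of the predicted distribution parameters; the function name, variable names, and numeric values are hypothetical and chosen for illustration only.</p>
        <verbatim font="typewriter">
```python
import math


def gaussian_nll(mu, sigma, observed):
    """Negative log-likelihood of `observed` under independent Gaussians.

    Assumption: one Gaussian N(mu_i, sigma_i^2) per positional dimension.
    """
    nll = 0.0
    for m, s, x in zip(mu, sigma, observed):
        variance = s * s
        nll += 0.5 * math.log(2.0 * math.pi * variance)
        nll += (x - m) ** 2 / (2.0 * variance)
    return nll


# Hypothetical 2-D example: predicted change vs. observed change.
predicted_mu = [0.10, -0.05]
predicted_sigma = [0.02, 0.02]
loss_close = gaussian_nll(predicted_mu, predicted_sigma, [0.11, -0.05])
loss_far = gaussian_nll(predicted_mu, predicted_sigma, [0.30, -0.05])
print(loss_close, loss_far)  # the closer observation gives the lower loss
```
        </verbatim>
        <p>In this sketch, a larger predicted standard deviation lowers the penalty for a large prediction error but raises the loss for accurate predictions, so a model trained this way has an incentive to report uncertainty only where its errors are actually large.</p>
      </para>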
      <figure inlist="lof" labels="LABEL:fig:architecture" xml:id="S2.F1">
        <tags>
          <tag><text fontsize="90%">Figure 1</text></tag>
          <tag role="autoref">Figure 1</tag>
          <tag role="refnum">1</tag>
          <tag role="typerefnum">Figure 1</tag>
        </tags>
        <graphics candidates="figures/architecture.pdf" class="ltx_centering" graphic="figures/architecture.pdf" options="width=346.896pt" xml:id="S2.F1.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">1</tag>
The overall architecture.
The look-up map <Math mode="inline" tex="\omega" text="omega" xml:id="S2.F1.m1">
            <XMath>
              <XMTok font="italic" name="omega" role="UNKNOWN">ω</XMTok>
            </XMath>
          </Math> provides visual representations <Math mode="inline" tex="v_{t}" text="v _ t" xml:id="S2.F1.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">v</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math> of the environment at positions <Math mode="inline" tex="p_{t}" text="p _ t" xml:id="S2.F1.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">p</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math>.
The affordance model <Math mode="inline" tex="a_{M}" text="a _ M" xml:id="S2.F1.m4">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">a</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math> translates these representations into context codes <Math mode="inline" tex="c_{t}" text="c _ t" xml:id="S2.F1.m5">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">c</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math>, which are utilized by the transition model <Math mode="inline" tex="t_{M}" text="t _ M" xml:id="S2.F1.m6">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">t</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math> for the generation of predictions in the form of expected positional changes <Math mode="inline" tex="(\mu_{\Delta\tilde{p}^{t+1}},\sigma_{\Delta\tilde{p}^{t+1}})" text="open-interval@(mu _ (Delta * (tilde@(p)) ^ (t + 1)), sigma _ (Delta * (tilde@(p)) ^ (t + 1)))" xml:id="S2.F1.m7">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="open-interval"/>
                  <XMRef idref="S2.F1.m7.1"/>
                  <XMRef idref="S2.F1.m7.2"/>
                </XMApp>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S2.F1.m7.1">
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" name="mu" role="UNKNOWN">μ</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok fontsize="70%" name="Delta" role="UNKNOWN">Δ</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                        <XMApp>
                          <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">p</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok fontsize="50%" meaning="plus" role="ADDOP">+</XMTok>
                          <XMTok font="italic" fontsize="50%" role="UNKNOWN">t</XMTok>
                          <XMTok fontsize="50%" meaning="1" role="NUMBER">1</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMApp xml:id="S2.F1.m7.2">
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok fontsize="70%" name="Delta" role="UNKNOWN">Δ</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                        <XMApp>
                          <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">p</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok fontsize="50%" meaning="plus" role="ADDOP">+</XMTok>
                          <XMTok font="italic" fontsize="50%" role="UNKNOWN">t</XMTok>
                          <XMTok fontsize="50%" meaning="1" role="NUMBER">1</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>.
During training, the negative log-likelihood loss between predicted and observed <Math mode="inline" tex="\Delta p^{t+1}" text="Delta * p ^ (t + 1)" xml:id="S2.F1.m8">
            <XMath>
              <XMApp>
                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                <XMTok name="Delta" role="UNKNOWN">Δ</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">p</XMTok>
                  <XMApp>
                    <XMTok fontsize="70%" meaning="plus" role="ADDOP">+</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                    <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> is backpropagated to <Math mode="inline" tex="t_{M}" text="t _ M" xml:id="S2.F1.m9">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">t</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math> (red arrows) and further to <Math mode="inline" tex="a_{M}" text="a _ M" xml:id="S2.F1.m10">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">a</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math> (orange arrows), training both subcomponents end-to-end.
During control, potential behavioral interactions are evaluated via a reward function that combines estimates of epistemic knowledge gain with estimates of goal state proximity.
In this paper we fully focus on affordance learning and thus on epistemic knowledge gain.
</toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 1</text></tag><text fontsize="90%">
The overall architecture.
The look-up map <Math mode="inline" tex="\omega" text="omega" xml:id="S2.F1.m11">
              <XMath>
                <XMTok font="italic" name="omega" role="UNKNOWN">ω</XMTok>
              </XMath>
            </Math> provides visual representations <Math mode="inline" tex="v_{t}" text="v _ t" xml:id="S2.F1.m12">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">v</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                </XMApp>
              </XMath>
            </Math> of the environment at positions <Math mode="inline" tex="p_{t}" text="p _ t" xml:id="S2.F1.m13">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">p</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                </XMApp>
              </XMath>
            </Math>.
The affordance model <Math mode="inline" tex="a_{M}" text="a _ M" xml:id="S2.F1.m14">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">a</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
                </XMApp>
              </XMath>
            </Math> translates these representations into context codes <Math mode="inline" tex="c_{t}" text="c _ t" xml:id="S2.F1.m15">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">c</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                </XMApp>
              </XMath>
            </Math>, which are utilized by the transition model <Math mode="inline" tex="t_{M}" text="t _ M" xml:id="S2.F1.m16">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">t</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
                </XMApp>
              </XMath>
            </Math> for the generation of predictions in the form of expected positional changes <Math mode="inline" tex="(\mu_{\Delta\tilde{p}^{t+1}},\sigma_{\Delta\tilde{p}^{t+1}})" text="open-interval@(mu _ (Delta * (tilde@(p)) ^ (t + 1)), sigma _ (Delta * (tilde@(p)) ^ (t + 1)))" xml:id="S2.F1.m17">
              <XMath>
                <XMDual>
                  <XMApp>
                    <XMTok meaning="open-interval"/>
                    <XMRef idref="S2.F1.m17.1"/>
                    <XMRef idref="S2.F1.m17.2"/>
                  </XMApp>
                  <XMWrap>
                    <XMTok role="OPEN" stretchy="false">(</XMTok>
                    <XMApp xml:id="S2.F1.m17.1">
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" name="mu" role="UNKNOWN">μ</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok fontsize="70%" name="Delta" role="UNKNOWN">Δ</XMTok>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                          <XMApp>
                            <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">p</XMTok>
                          </XMApp>
                          <XMApp>
                            <XMTok fontsize="50%" meaning="plus" role="ADDOP">+</XMTok>
                            <XMTok font="italic" fontsize="50%" role="UNKNOWN">t</XMTok>
                            <XMTok fontsize="50%" meaning="1" role="NUMBER">1</XMTok>
                          </XMApp>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMTok role="PUNCT">,</XMTok>
                    <XMApp xml:id="S2.F1.m17.2">
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok fontsize="70%" name="Delta" role="UNKNOWN">Δ</XMTok>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                          <XMApp>
                            <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">p</XMTok>
                          </XMApp>
                          <XMApp>
                            <XMTok fontsize="50%" meaning="plus" role="ADDOP">+</XMTok>
                            <XMTok font="italic" fontsize="50%" role="UNKNOWN">t</XMTok>
                            <XMTok fontsize="50%" meaning="1" role="NUMBER">1</XMTok>
                          </XMApp>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMTok role="CLOSE" stretchy="false">)</XMTok>
                  </XMWrap>
                </XMDual>
              </XMath>
            </Math>.
During training, the negative log-likelihood loss between predicted and observed <Math mode="inline" tex="\Delta p^{t+1}" text="Delta * p ^ (t + 1)" xml:id="S2.F1.m18">
              <XMath>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok name="Delta" role="UNKNOWN">Δ</XMTok>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="plus" role="ADDOP">+</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                      <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> observations is backpropagated to <Math mode="inline" tex="t_{M}" text="t _ M" xml:id="S2.F1.m19">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">t</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
                </XMApp>
              </XMath>
            </Math> (red arrows) and further to <Math mode="inline" tex="a_{M}" text="a _ M" xml:id="S2.F1.m20">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">a</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
                </XMApp>
              </XMath>
            </Math> (orange arrows), training both subcomponents end-to-end.
During control, potential behavioral interactions are evaluated via a reward function that combines estimates of epistemic knowledge gain with estimates of goal state proximity.
In this paper, we focus entirely on affordance learning and thus on epistemic knowledge gain.
</text></caption>
      </figure>
<!--  %Training -->      <para xml:id="S2.SS1.p2">
        <p>Given sequences of position and action pairs, the model is trained end-to-end via backpropagation through time.
The loss is given by the negative log-likelihood of the observed change in position <Math mode="inline" tex="\Delta\mathbf{p}^{t+1}" text="Delta * p ^ (t + 1)" xml:id="S2.SS1.p2.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                <XMTok name="Delta" role="UNKNOWN">Δ</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">p</XMTok>
                  <XMApp>
                    <XMTok fontsize="70%" meaning="plus" role="ADDOP">+</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                    <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> under the predicted distribution.
This way, the affordance model <Math mode="inline" tex="a_{M}" text="a _ M" xml:id="S2.SS1.p2.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">a</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math> tends to produce context codes <Math mode="inline" tex="\mathbf{c}_{t}" text="c _ t" xml:id="S2.SS1.p2.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">c</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math> that facilitate accurate predictions by the transition model <Math mode="inline" tex="t_{M}" text="t _ M" xml:id="S2.SS1.p2.m4">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">t</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math>.
The motor commands for data generation were selected randomly with a bias towards maintaining the same motor command for a few time steps.
We selected a bias that ensures comprehensive coverage of the environment, emulating exploration that is approximately uniformly distributed over positions.</p>
      </para>
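      <para>
        <p>As a minimal sketch of this loss, assuming independent normal distributions per coordinate (the text does not state the covariance structure), the negative log-likelihood of an observed positional change can be computed as follows. The function name <text font="typewriter">gaussian_nll</text> and all numerical values are hypothetical and serve only as illustration.</p>
        <verbatim font="typewriter">
```python
import math


def gaussian_nll(observed, mu, sigma):
    """Negative log-likelihood of `observed` under independent normals N(mu, sigma^2)."""
    nll = 0.0
    for x, m, s in zip(observed, mu, sigma):
        if not s > 0.0:
            raise ValueError("sigma must be positive")
        # Per-coordinate term: log-normalizer plus scaled squared error.
        nll += 0.5 * math.log(2.0 * math.pi * s * s) + (x - m) ** 2 / (2.0 * s * s)
    return nll


# Hypothetical 2-D positional change and two candidate predictions.
delta_p = [0.10, -0.05]
good = gaussian_nll(delta_p, mu=[0.10, -0.05], sigma=[0.05, 0.05])
bad = gaussian_nll(delta_p, mu=[0.30, 0.20], sigma=[0.05, 0.05])
```
        </verbatim>
        <p>In this sketch, the prediction centered on the observation receives the smaller loss, which is the signal that is backpropagated through both subcomponents.</p>
      </para>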
<!--  %Goal-directed␣control -->      <para xml:id="S2.SS1.p3">
        <p>Given a state and a policy, i.e., a sequence of actions, the world model enables the agent to imagine how the interaction with its environment unfolds over time.
In order to predict multiple time steps into the future, the predicted mean of the positional change is fed back into the model as the observed change in position.
The agent then probes the look-up map <Math mode="inline" tex="\omega" text="omega" xml:id="S2.SS1.p3.m1">
            <XMath>
              <XMTok font="italic" name="omega" role="UNKNOWN">ω</XMTok>
            </XMath>
          </Math> with the anticipated position, thereby retrieving the local sensory representation at that position.
We employ the cross-entropy method <cite class="ltx_citemacro_cite"><bibref bibrefs="Rubinstein1999" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>, an evolutionary optimization algorithm, to infer the behavior that is expected to maximize reward and perform goal-directed control.</p>
      </para>
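      <para>
        <p>The planning loop described above can be sketched as follows. This is an illustrative sketch under several assumptions, not the implementation used in this work: <text font="typewriter">toy_model</text> is a hypothetical stand-in for the learned world model, actions are simplified to two dimensions, the reward is a hypothetical negative distance to a goal, and all population sizes and iteration counts are arbitrary.</p>
        <verbatim font="typewriter">
```python
import math
import random


def toy_model(position, action):
    """Hypothetical stand-in for the world model: (mean, std) of the positional change.

    A learned model would look up the local representation at `position` here.
    """
    return [0.1 * a for a in action], [0.01, 0.01]


def rollout(position, actions, model):
    """Imagine a trajectory by feeding each predicted mean back in as the observed change."""
    trajectory = []
    for action in actions:
        mu, _sigma = model(position, action)
        position = [p + d for p, d in zip(position, mu)]
        trajectory.append(position)
    return trajectory


def cem_plan(start, reward_fn, model, horizon=5, population=64, elites=8,
             iterations=10, seed=0):
    """Cross-entropy method over action sequences with a diagonal Gaussian proposal."""
    rng = random.Random(seed)
    mean = [[0.0, 0.0] for _ in range(horizon)]
    std = [[1.0, 1.0] for _ in range(horizon)]
    for _ in range(iterations):
        candidates = []
        for _ in range(population):
            # Sample one action sequence and clip each component to [-1, 1].
            plan = [[max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(ms, ss)]
                    for ms, ss in zip(mean, std)]
            score = sum(reward_fn(p) for p in rollout(start, plan, model))
            candidates.append((score, plan))
        candidates.sort(key=lambda c: c[0], reverse=True)
        best = [plan for _, plan in candidates[:elites]]
        # Refit the proposal distribution to the elite sequences.
        for t in range(horizon):
            for d in range(2):
                values = [plan[t][d] for plan in best]
                m = sum(values) / elites
                var = sum((v - m) ** 2 for v in values) / elites
                mean[t][d] = m
                std[t][d] = math.sqrt(var) + 1e-3
    return mean


goal = [0.5, 0.0]


def reward(position):
    """Hypothetical goal-proximity reward: negative Euclidean distance to the goal."""
    return -math.hypot(position[0] - goal[0], position[1] - goal[1])


plan = cem_plan([0.0, 0.0], reward, toy_model)
final = rollout([0.0, 0.0], plan, toy_model)[-1]
```
        </verbatim>
        <p>Replacing the hypothetical <text font="typewriter">reward</text> with an estimate of epistemic uncertainty would turn the same loop into an exploration planner.</p>
      </para>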
    </subsection>
    <subsection inlist="toc" xml:id="S2.SS2">
      <tags>
        <tag>2.2</tag>
        <tag role="autoref">subsection 2.2</tag>
        <tag role="refnum">2.2</tag>
        <tag role="typerefnum">§2.2</tag>
      </tags>
      <title><tag close=" ">2.2</tag>Exploration</title>
      <para xml:id="S2.SS2.p1">
        <p>The model was trained on previously generated sequences of observation and action pairs.
As mentioned before, a heuristic was used that led to sequences which covered the whole environment.
During the development of this heuristic, it became apparent that its exact implementation heavily influences both the model’s final performance and how quickly it learns.
With suboptimal heuristics, the agent gets stuck in corners and thereby neglects other parts of the environment.</p>
      </para>
      <para xml:id="S2.SS2.p2">
        <p>We conclude that it would be helpful for the agent to actively explore its environment based on estimates of potential information gain.
The agent should realize where its model is not able to generate accurate predictions and should thus explore those areas to improve its knowledge.
The affordance maps described above allow the agent to take local environmental conditions into account when planning, which lets it direct its exploration towards affordance learning.
To the best of our knowledge, such affordance-driven learning has not been explored before.</p>
      </para>
<!--  %Uncertainty-based␣exploration -->      <para xml:id="S2.SS2.p3">
        <p>Active exploration can be guided by uncertainty, because reducing uncertainty translates into a more accurate world model.
The mechanism should choose actions that produce high uncertainty in order to learn their effects in the current context and reduce uncertainty in the long run.
However, usually not all uncertainty can be reduced.
As is often done in the machine learning community, we distinguish between two kinds of uncertainty: epistemic and aleatoric uncertainty <cite class="ltx_citemacro_cite"><bibref bibrefs="Kiureghian2009,Hüllermeier2021" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>.
Epistemic uncertainty is inherent in the model.
It arises due to incompleteness or inaccuracy and can often be reduced by learning.
In contrast, aleatoric uncertainty is inherent in the environment and cannot be reduced by learning.
An example is the roll of a die, whose outcome is practically unpredictable.
To learn affordances quickly, the exploration mechanism should disregard aleatoric uncertainty, as there is nothing to be learned from it.
Instead, it should choose actions that lead to high epistemic uncertainty.
In this way, the agent will choose actions that have the highest learning potential.
It is thus necessary to distinguish between aleatoric and epistemic uncertainties, to enable the active exploration of epistemic uncertainty while avoiding aleatoric uncertainty <cite class="ltx_citemacro_cite"><bibref bibrefs="Vlastelica2021" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>.</p>
      </para>
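      <para>
        <p>The die example can be made concrete with a small calculation. The sketch below assumes a fair six-sided die and uses the standard error of the estimated mean as a simple analogue of epistemic uncertainty; this analogy and the name <text font="typewriter">standard_error</text> are illustrative choices, not part of the method.</p>
        <verbatim font="typewriter">
```python
import math

# A fair six-sided die: outcomes 1..6 with equal probability (assumption).
faces = [1, 2, 3, 4, 5, 6]
mean = sum(faces) / 6
# Spread of a single roll: this stays the same no matter how much data is seen.
outcome_std = math.sqrt(sum((f - mean) ** 2 for f in faces) / 6)


def standard_error(n_rolls):
    """Uncertainty about the die's mean after observing n_rolls outcomes."""
    return outcome_std / math.sqrt(n_rolls)
```
        </verbatim>
        <p>The spread of a single roll is fixed, whereas the uncertainty about the mean shrinks with the square root of the number of observed rolls, so only the latter rewards further data collection.</p>
      </para>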
<!--  %Exploration␣vs␣exploitation 
     %Existing␣RL-systems␣have␣unstructured␣maps␣\cite{Butz:2010a}.
     %and␣also␣from␣\cite{Scholz2022}:
     %While␣certainly␣possible,␣it␣is␣not␣straight-
     %forward␣how␣this␣would␣be␣implemented␣in␣a␣classical␣RL␣agent.␣Classical␣RL␣agents␣do␣not␣predict
     %positional␣changes␣which␣are␣necessary␣for␣the␣look-up.␣Furthermore,␣it␣was␣shown␣that␣RL␣agents
     %struggle␣with␣offline␣learning␣[Levine␣et␣al.,␣2020]␣and␣generalization␣to␣similar␣environments␣[Cobbe
     %et␣al.,␣2019]-->    </subsection>
  </section>
  <section inlist="toc" xml:id="S3">
    <tags>
      <tag>3</tag>
      <tag role="autoref">section 3</tag>
      <tag role="refnum">3</tag>
      <tag role="typerefnum">§3</tag>
    </tags>
    <title><tag close=" ">3</tag>Methods</title>
<!--  %Environment -->    <para xml:id="S3.p1">
      <p>The environment of our MDP is a 2-dimensional physics-based simulation.
The agent is represented by a circular, inert vehicle that is able to glide around by sending motor commands to its four rocket jets, which cause accelerations in four diagonal directions.
Observations consist of the last change in position <Math mode="inline" tex="\Delta\mathbf{p}_{t}" text="Delta * p _ t" xml:id="S3.p1.m1">
          <XMath>
            <XMApp>
              <XMTok meaning="times" role="MULOP">⁢</XMTok>
              <XMTok name="Delta" role="UNKNOWN">Δ</XMTok>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">p</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMApp>
          </XMath>
        </Math> and a visual representation of the environment <Math mode="inline" tex="\mathbf{v}_{t}" text="v _ t" xml:id="S3.p1.m2">
          <XMath>
            <XMApp>
              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
              <XMTok font="bold" role="UNKNOWN">v</XMTok>
              <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
            </XMApp>
          </XMath>
        </Math> at the currently considered position <Math mode="inline" tex="\mathbf{p}_{t}" text="p _ t" xml:id="S3.p1.m3">
          <XMath>
            <XMApp>
              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
              <XMTok font="bold" role="UNKNOWN">p</XMTok>
              <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
            </XMApp>
          </XMath>
        </Math>.
Actions determine the extent to which the four jets are activated.
Their activities directly translate into accelerations.
</p>
    </para>
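    <para>
      <p>A minimal sketch of such dynamics is given below. It assumes unit direction vectors along the four diagonals, a point mass without friction, and a simple Euler update; the jet ordering, <text font="typewriter">gain</text>, and <text font="typewriter">dt</text> are hypothetical, since the simulation details are not specified here.</p>
      <verbatim font="typewriter">
```python
import math

S = math.sqrt(0.5)
# Assumed unit vectors for the four diagonal acceleration directions.
DIAGONALS = [(S, S), (-S, S), (-S, -S), (S, -S)]


def step(position, velocity, action, dt=1.0, gain=1.0):
    """Advance a gliding point mass by one time step.

    `action` holds the four jet activations; the velocity persists between
    steps, which models the inertia of the vehicle.
    """
    ax = gain * sum(a * d[0] for a, d in zip(action, DIAGONALS))
    ay = gain * sum(a * d[1] for a, d in zip(action, DIAGONALS))
    velocity = (velocity[0] + ax * dt, velocity[1] + ay * dt)
    new_position = (position[0] + velocity[0] * dt, position[1] + velocity[1] * dt)
    delta_p = (new_position[0] - position[0], new_position[1] - position[1])
    return new_position, velocity, delta_p


# Activating the two jets that share a positive x-component moves the agent along x only.
pos, vel, delta_p = step((0.0, 0.0), (0.0, 0.0), [1.0, 0.0, 0.0, 1.0])
# Activating all four jets equally cancels out.
_, vel_zero, _ = step((0.0, 0.0), (0.0, 0.0), [1.0, 1.0, 1.0, 1.0])
```
      </verbatim>
    </para>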
<!--  %Change␣in␣architecture:␣image-based␣-&gt;␣distance-based -->    <para xml:id="S3.p2">
      <p>In <cite class="ltx_citemacro_citet"><bibref bibrefs="Scholz2022" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase/>
            <bibrefphrase/>
          </bibref></cite>, the visual representation was given by a low-resolution image centered around the given position.
Due to the discrete nature of pixels, this approach introduced uncertainty:
the agent was not able to know exactly where, e.g., the borders of the environment were.
Therefore, in this work, the visual representation is given by distances to surrounding entities in eight directions.
Accordingly, we replace the CNN in the affordance model with an MLP.</p>
    </para>
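    <para>
      <p>To illustrate a distance-based representation, the sketch below computes the distances from a position to the borders of a rectangular environment along eight directions. It is a simplified assumption: only the outer borders are considered, obstacles and other terrains are ignored, and the direction ordering and the name <text font="typewriter">border_distances</text> are hypothetical.</p>
      <verbatim font="typewriter">
```python
import math

# Eight directions in 45-degree steps, starting east and going counterclockwise (assumed order).
DIRECTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]
EPS = 1e-12  # treats floating-point residue such as cos(pi / 2) as zero


def border_distances(position, width, height):
    """Distances from `position` to the borders of [0, width] x [0, height] along DIRECTIONS."""
    x, y = position
    distances = []
    for dx, dy in DIRECTIONS:
        hits = []
        if dx > EPS:
            hits.append((width - x) / dx)   # right border
        elif -dx > EPS:
            hits.append(x / -dx)            # left border
        if dy > EPS:
            hits.append((height - y) / dy)  # top border
        elif -dy > EPS:
            hits.append(y / -dy)            # bottom border
        # The ray stops at whichever border it reaches first.
        distances.append(min(hits))
    return distances


v_t = border_distances((1.0, 1.0), width=4.0, height=2.0)
```
      </verbatim>
      <p>Unlike a low-resolution image, these continuous distances locate a border exactly, which removes the discretization uncertainty described above.</p>
    </para>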
    <para xml:id="S3.p3">
      <p>All hyperparameters were optimized empirically, which led to the following configuration.
The affordance model <Math mode="inline" tex="a_{M}" text="a _ M" xml:id="S3.p3.m1">
          <XMath>
            <XMApp>
              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
              <XMTok font="italic" role="UNKNOWN">a</XMTok>
              <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
            </XMApp>
          </XMath>
        </Math> is given by two linear layers with hidden sizes <Math mode="inline" tex="64" text="64" xml:id="S3.p3.m2">
          <XMath>
            <XMTok meaning="64" role="NUMBER">64</XMTok>
          </XMath>
        </Math> and <Math mode="inline" tex="32" text="32" xml:id="S3.p3.m3">
          <XMath>
            <XMTok meaning="32" role="NUMBER">32</XMTok>
          </XMath>
        </Math>, each followed by a ReLU activation function.
Its output, representing affordance codes, is produced by a linear layer that maps onto size <Math mode="inline" tex="5" text="5" xml:id="S3.p3.m4">
          <XMath>
            <XMTok meaning="5" role="NUMBER">5</XMTok>
          </XMath>
        </Math> with the tanh activation function.
The transition model <Math mode="inline" tex="t_{M}" text="t _ M" xml:id="S3.p3.m5">
          <XMath>
            <XMApp>
              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
              <XMTok font="italic" role="UNKNOWN">t</XMTok>
              <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
            </XMApp>
          </XMath>
        </Math> consists of a linear layer that maps onto size <Math mode="inline" tex="64" text="64" xml:id="S3.p3.m6">
          <XMath>
            <XMTok meaning="64" role="NUMBER">64</XMTok>
          </XMath>
        </Math>, followed by a ReLU activation function, followed by two parallel linear layers, one for predicting <Math mode="inline" tex="\mu_{\Delta\tilde{p}^{t+1}}" text="mu _ (Delta * (tilde@(p)) ^ (t + 1))" xml:id="S3.p3.m7">
          <XMath>
            <XMApp>
              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
              <XMTok font="italic" name="mu" role="UNKNOWN">μ</XMTok>
              <XMApp>
                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                <XMTok fontsize="70%" name="Delta" role="UNKNOWN">Δ</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                  <XMApp>
                    <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">p</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok fontsize="50%" meaning="plus" role="ADDOP">+</XMTok>
                    <XMTok font="italic" fontsize="50%" role="UNKNOWN">t</XMTok>
                    <XMTok fontsize="50%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMApp>
          </XMath>
        </Math> without an activation function and one for predicting <Math mode="inline" tex="\sigma_{\Delta\tilde{p}^{t+1}}" text="sigma _ (Delta * (tilde@(p)) ^ (t + 1))" xml:id="S3.p3.m8">
          <XMath>
            <XMApp>
              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
              <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
              <XMApp>
                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                <XMTok fontsize="70%" name="Delta" role="UNKNOWN">Δ</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                  <XMApp>
                    <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">p</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok fontsize="50%" meaning="plus" role="ADDOP">+</XMTok>
                    <XMTok font="italic" fontsize="50%" role="UNKNOWN">t</XMTok>
                    <XMTok fontsize="50%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMApp>
          </XMath>
        </Math> with the exponential activation function.
We use Adam as our optimizer <cite class="ltx_citemacro_cite"><bibref bibrefs="Kingma2014" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
For each experiment, we train <Math mode="inline" tex="5" text="5" xml:id="S3.p3.m9">
          <XMath>
            <XMTok meaning="5" role="NUMBER">5</XMTok>
          </XMath>
        </Math> different model instances based on different weight initializations.</p>
    </para>
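    <para>
      <p>The layer configuration above can be sketched as a plain forward pass. The sketch uses only the Python standard library and random, untrained weights. The composition of the transition model’s input (context code, four jet activations, and the last two-dimensional positional change) and the weight initialization are assumptions made for illustration; only the layer sizes and activation functions are taken from the description above.</p>
      <verbatim font="typewriter">
```python
import math
import random


def linear(n_in, n_out, rng):
    """Create a randomly initialized dense layer as (weights, biases)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    """Apply a dense layer to the input vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


class AffordanceModel:
    """a_M: 8 distances -> 64 -> 32 (ReLU each) -> 5 (tanh) context code."""

    def __init__(self, rng, n_in=8):
        self.l1 = linear(n_in, 64, rng)
        self.l2 = linear(64, 32, rng)
        self.out = linear(32, 5, rng)

    def __call__(self, v):
        h = relu(dense(self.l1, v))
        h = relu(dense(self.l2, h))
        return [math.tanh(z) for z in dense(self.out, h)]


class TransitionModel:
    """t_M: input -> 64 (ReLU) -> parallel heads for mu (linear) and sigma (exp)."""

    def __init__(self, rng, n_in, n_out=2):
        self.hidden = linear(n_in, 64, rng)
        self.mu_head = linear(64, n_out, rng)
        self.sigma_head = linear(64, n_out, rng)

    def __call__(self, x):
        h = relu(dense(self.hidden, x))
        mu = dense(self.mu_head, h)
        # The exponential keeps the predicted standard deviation positive.
        sigma = [math.exp(z) for z in dense(self.sigma_head, h)]
        return mu, sigma


rng = random.Random(0)
a_m = AffordanceModel(rng)
# Assumed input: context code (5) + jet activations (4) + last positional change (2).
t_m = TransitionModel(rng, n_in=5 + 4 + 2)

v_t = [0.5] * 8                      # hypothetical distance readings
c_t = a_m(v_t)                       # context code
mu, sigma = t_m(c_t + [0.0, 1.0, 0.0, 0.0] + [0.0, 0.0])
```
      </verbatim>
    </para>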
<!--  %Environment␣and␣its␣affordances -->    <para xml:id="S3.p4">
      <p>The environment is confined by borders and contains obstacles, both of which block the way.
Other terrains in the environment locally alter the sensorimotor dynamics of the vehicle.
Force fields accelerate the agent to the left or to the right, and fog fields corrupt the observed position with Gaussian noise.
Borders, obstacles, and force fields produce affordances that can be learned by the agent, resulting in epistemic uncertainty until they are learned.
The uncertainty produced by fog fields cannot be reduced by learning and is therefore an instance of aleatoric uncertainty.
We use two different environments in our experiments, one for training and one for validation (see Figure <ref labelref="LABEL:fig:env"/>).</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:env" placement="htb!" xml:id="S3.F2">
      <tags>
        <tag><text fontsize="90%">Figure 2</text></tag>
        <tag role="autoref">Figure 2</tag>
        <tag role="refnum">2</tag>
        <tag role="typerefnum">Figure 2</tag>
      </tags>
      <figure inlist="lof" labels="LABEL:fig:env_train" placement="t" xml:id="S3.F1.sf1">
        <tags>
          <tag><text fontsize="90%">(a)</text></tag>
          <tag role="autoref">(a)</tag>
          <tag role="refnum">1(a)</tag>
        </tags>
        <graphics candidates="figures/environments/training_environment.pdf" graphic="figures/environments/training_environment.pdf" options="width=433.62pt" xml:id="S3.F1.sf1.g1"/>
        <toccaption><tag close=" ">(a)</tag></toccaption>
        <caption><tag close=" "><text fontsize="90%">(a)</text></tag></caption>
      </figure>
<!--  %maximize␣horizontal␣separation -->      <figure inlist="lof" labels="LABEL:fig:env_val" placement="t" xml:id="S3.F1.sf2">
        <tags>
          <tag><text fontsize="90%">(b)</text></tag>
          <tag role="autoref">(b)</tag>
          <tag role="refnum">1(b)</tag>
        </tags>
        <graphics candidates="figures/environments/validation_environment.pdf" graphic="figures/environments/validation_environment.pdf" options="width=433.62pt" xml:id="S3.F1.sf2.g1"/>
        <toccaption><tag close=" ">(b)</tag></toccaption>
        <caption><tag close=" "><text fontsize="90%">(b)</text></tag></caption>
      </figure>
      <toccaption><tag close=" ">2</tag>
Environments used in our experiments.
A small circular agent (black) navigates its environment using four diagonally attached rocket jets (orange).
Fog fields are depicted in gray, obstacles in black.
Green and yellow force fields accelerate the agent to the right and to the left, respectively;
(a) depicts the environment used during training and (b) depicts the environment used during validation.
When focusing on affordance learning, we evaluate the model’s performance only while the agent is within the red rectangle.
</toccaption>
      <caption><tag close=": "><text fontsize="90%">Figure 2</text></tag><text fontsize="90%">
Environments used in our experiments.
A small circular agent (black) navigates its environment using four diagonally attached rocket jets (orange).
Fog fields are depicted in gray, obstacles in black.
Green and yellow force fields accelerate the agent to the right and to the left, respectively;
(a) depicts the environment used during training and (b) depicts the environment used during validation.
When focusing on affordance learning, we evaluate the model’s performance only while the agent is within the red rectangle.
</text></caption>
    </figure>
<!--  %Uncertainty␣estimates -->    <subsection inlist="toc" xml:id="S3.SS1">
      <tags>
        <tag>3.1</tag>
        <tag role="autoref">subsection 3.1</tag>
        <tag role="refnum">3.1</tag>
        <tag role="typerefnum">§3.1</tag>
      </tags>
      <title><tag close=" ">3.1</tag>Uncertainty estimation</title>
      <para xml:id="S3.SS1.p1">
        <p>As shown in Figure <ref labelref="LABEL:fig:architecture"/>, a single model instance predicts the parameters of a normal distribution <Math mode="inline" tex="P=\mathcal{N}(\mu,\sigma^{2})" text="P = N * open-interval@(mu, sigma ^ 2)" xml:id="S3.SS1.p1.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMTok font="italic" role="UNKNOWN">P</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="caligraphic" role="UNKNOWN">N</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="open-interval"/>
                      <XMRef idref="S3.SS1.p1.m1.1"/>
                      <XMRef idref="S3.SS1.p1.m1.2"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="false">(</XMTok>
                      <XMTok font="italic" name="mu" role="UNKNOWN" xml:id="S3.SS1.p1.m1.1">μ</XMTok>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMApp xml:id="S3.SS1.p1.m1.2">
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                        <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                      </XMApp>
                      <XMTok role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> over positional changes.
This way, the model performs uncertainty estimation.
However, it cannot distinguish between epistemic and aleatoric uncertainty.
In order to be able to efficiently guide exploration towards learnable interactions, the two types need to be disentangled.
We achieve this with ensembles of <Math mode="inline" tex="i" text="i" xml:id="S3.SS1.p1.m2">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">i</XMTok>
            </XMath>
          </Math> models that predict probability distributions <Math mode="inline" tex="P_{i}" text="P _ i" xml:id="S3.SS1.p1.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">P</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
              </XMApp>
            </XMath>
          </Math>.
Since perfect models produce matching predictions, a measure of the disagreement among an ensemble’s members can act as a proxy for epistemic uncertainty.
We investigate the following uncertainty measures:</p>
        <itemize xml:id="S3.I1">
          <item xml:id="S3.I1.i1">
            <tags>
              <tag>•</tag>
              <tag role="autoref">item </tag>
              <tag role="typerefnum">1st item</tag>
            </tags>
            <para xml:id="S3.I1.i1.p1">
              <p>Aleatoric uncertainty as the mean of the predicted standard deviations <Math mode="inline" tex="\text{AU}=\mu(\mathbf{\sigma})" text="[AU] = mu * sigma" xml:id="S3.I1.i1.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMText>AU</XMText>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" name="mu" role="UNKNOWN">μ</XMTok>
                        <XMDual>
                          <XMRef idref="S3.I1.i1.p1.m1.1"/>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMTok font="italic" name="sigma" role="UNKNOWN" xml:id="S3.I1.i1.p1.m1.1">σ</XMTok>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math></p>
            </para>
          </item>
          <item xml:id="S3.I1.i2">
            <tags>
              <tag>•</tag>
              <tag role="autoref">item </tag>
              <tag role="typerefnum">2nd item</tag>
            </tags>
            <para xml:id="S3.I1.i2.p1">
              <p>Epistemic uncertainty as the standard deviation of the predicted means <Math mode="inline" tex="\text{EU}_{\text{SD}}=\sigma(\mathbf{\mu})" text="[EU] _ [SD] = sigma * mu" xml:id="S3.I1.i2.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMText>EU</XMText>
                        <XMText><text fontsize="70%">SD</text></XMText>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                        <XMDual>
                          <XMRef idref="S3.I1.i2.p1.m1.1"/>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMTok font="italic" name="mu" role="UNKNOWN" xml:id="S3.I1.i2.p1.m1.1">μ</XMTok>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math></p>
            </para>
          </item>
          <item xml:id="S3.I1.i3">
            <tags>
              <tag>•</tag>
              <tag role="autoref">item </tag>
              <tag role="typerefnum">3rd item</tag>
            </tags>
            <para xml:id="S3.I1.i3.p1">
              <p>Epistemic uncertainty as the Jensen-Shannon divergence between predicted distributions <Math mode="inline" tex="\text{EU}_{\text{JSD}}=\frac{1}{n}\sum_{i}D_{\text{KL}}(P_{i}||M)," xml:id="S3.I1.i3.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMText>EU</XMText>
                      <XMText><text fontsize="70%">JSD</text></XMText>
                    </XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok mathstyle="text" meaning="divide" role="FRACOP"/>
                      <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok mathstyle="text" meaning="sum" role="SUMOP" scriptpos="post">∑</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">D</XMTok>
                      <XMText><text fontsize="70%">KL</text></XMText>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="false">(</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">P</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                      </XMApp>
                      <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                      <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMTok role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                    <XMTok role="PUNCT">,</XMTok>
                  </XMath>
                </Math> where <Math mode="inline" tex="M=\frac{1}{n}\sum_{i}P_{i}" text="M = (1 / n) * (sum _ i)@(P _ i)" xml:id="S3.I1.i3.p1.m2">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMTok font="italic" role="UNKNOWN">M</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMApp>
                          <XMTok mathstyle="text" meaning="divide" role="FRACOP"/>
                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok mathstyle="text" meaning="sum" role="SUMOP" scriptpos="post">∑</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                          </XMApp>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" role="UNKNOWN">P</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                          </XMApp>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math> and <Math mode="inline" tex="D_{\text{KL}}" text="D _ [KL]" xml:id="S3.I1.i3.p1.m3">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" role="UNKNOWN">D</XMTok>
                      <XMText><text fontsize="70%">KL</text></XMText>
                    </XMApp>
                  </XMath>
                </Math> denotes the Kullback-Leibler divergence</p>
            </para>
          </item>
        </itemize>
        <p>Using one of these uncertainty measures as the objective function during action selection allows the agent to perform exploration.
The difference between <Math mode="inline" tex="\text{EU}_{\text{SD}}" text="[EU] _ [SD]" xml:id="S3.SS1.p1.m4">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText>EU</XMText>
                <XMText><text fontsize="70%">SD</text></XMText>
              </XMApp>
            </XMath>
          </Math> and <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S3.SS1.p1.m5">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText>EU</XMText>
                <XMText><text fontsize="70%">JSD</text></XMText>
              </XMApp>
            </XMath>
          </Math> is that the former does not consider the disagreement in predicted standard deviations.
In contrast, the JSD takes the full predicted distributions into account by extending the Kullback-Leibler divergence to potentially more than two distributions.
The JSD possesses additional advantageous properties, namely being bounded and symmetric for all distributions <cite class="ltx_citemacro_cite"><bibref bibrefs="briet2008" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase>(</bibrefphrase>
              <bibrefphrase>)</bibrefphrase>
            </bibref></cite>.
Therefore, we anticipate <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S3.SS1.p1.m6">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText>EU</XMText>
                <XMText><text fontsize="70%">JSD</text></XMText>
              </XMApp>
            </XMath>
          </Math> to provide the most precise measurements of epistemic uncertainty, leading to faster and more accurate affordance learning.</p>
      </para>
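      <para>
        <p>The three measures listed above can be sketched for a single predicted quantity as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes that each of the n ensemble members predicts a univariate Gaussian, and it integrates the Kullback-Leibler terms numerically on a grid because the mixture M is not Gaussian. The names gaussian_pdf, uncertainty_measures, and grid_size are hypothetical.</p>
        <verbatim font="typewriter">
```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution evaluated at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))


def uncertainty_measures(means, stds, grid_size=4001):
    """Return (AU, EU_SD, EU_JSD) for n univariate Gaussian predictions."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)

    au = stds.mean()      # AU: mean of the predicted standard deviations
    eu_sd = means.std()   # EU_SD: standard deviation of the predicted means

    # EU_JSD: (1/n) * sum_i KL(P_i || M) with M = (1/n) * sum_i P_i.
    # Assumption of this sketch: integrate on a grid covering all members.
    lo = (means - 8.0 * stds).min()
    hi = (means + 8.0 * stds).max()
    x = np.linspace(lo, hi, grid_size)
    dx = x[1] - x[0]
    p = gaussian_pdf(x[None, :], means[:, None], stds[:, None])  # (n, grid)
    m = p.mean(axis=0)                                           # mixture M
    tiny = 1e-300  # guards log(0) where densities underflow
    kl = np.sum(p * (np.log(p + tiny) - np.log(m + tiny)), axis=1) * dx
    eu_jsd = kl.mean()
    return au, eu_sd, eu_jsd


# Two members that agree on the mean but disagree on the spread:
au, eu_sd, eu_jsd = uncertainty_measures([0.0, 0.0], [0.5, 2.0])
# eu_sd is zero here, whereas eu_jsd is positive.
```
        </verbatim>
        <p>In this example the predicted means coincide, so the standard-deviation-based measure reports no disagreement, while the JSD-based measure still registers the differing predicted standard deviations.</p>
      </para>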
<!--  %Exploration -->      <para xml:id="S3.SS1.p2">
        <p>The training data is initialized with randomly generated sequences of observation and action pairs as in <cite class="ltx_citemacro_citet"><bibref bibrefs="Scholz2022" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
              <bibrefphrase/>
              <bibrefphrase/>
            </bibref></cite>.
If an uncertainty-based exploration mechanism is employed, we gradually replace a subset of the training data with new sequences after each epoch.
These new sequences are generated by the agent itself via goal-directed control, based on behavior that is expected to maximize one of the above uncertainty measures.
Validation is performed on a dataset generated with the same heuristic that is used to initialize the training data.</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S4">
    <tags>
      <tag>4</tag>
      <tag role="autoref">section 4</tag>
      <tag role="refnum">4</tag>
      <tag role="typerefnum">§4</tag>
    </tags>
    <title><tag close=" ">4</tag>Experiments</title>
    <para xml:id="S4.p1">
      <p>We first investigate how globally vs locally informative sensory information results in different generalization capabilities.
Subsequently, we compare the different uncertainty measures with regard to their suitability for affordance learning.</p>
    </para>
    <subsection inlist="toc" xml:id="S4.SS1">
      <tags>
        <tag>4.1</tag>
        <tag role="autoref">subsection 4.1</tag>
        <tag role="refnum">4.1</tag>
        <tag role="typerefnum">§4.1</tag>
      </tags>
      <title><tag close=" ">4.1</tag>Globally vs locally informative sensory information</title>
      <para xml:id="S4.SS1.p1">
        <p>The affordance model <Math mode="inline" tex="a_{M}" text="a _ M" xml:id="S4.SS1.p1.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">a</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">M</XMTok>
              </XMApp>
            </XMath>
          </Math> allows the agent to perceive its environment.
As input it receives distances to the nearest obstacle or terrain in each of eight directions.
In our initial experiment, we compare an agent’s affordance learning capabilities with globally informative sensory information vs locally informative sensory information.
In the latter case, the distance sensors are limited in range, such that the agent is not able to perceive obstacles or terrains that are further away than a certain threshold.
We choose the threshold such that the agent is always able to perceive obstacles or terrains that could influence its dynamics in the next time step.
Globally informative sensory information is generated by sensors that are not limited in range.
Here, the perceived distances to the borders encode the current position of the agent in the environment.
For both cases, five model instances with differently initialized weights are trained on randomly generated sequences without any exploration taking place.</p>
      </para>
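      <para>
        <p>The difference between the two sensor configurations can be illustrated with a short sketch. It assumes that a range-limited sensor saturates at its maximum range, which is one plausible encoding and not necessarily the one used here; the function limit_sensor_range, the threshold of 0.3, and the example readings are hypothetical.</p>
        <verbatim font="typewriter">
```python
import numpy as np


def limit_sensor_range(distances, max_range):
    """Clip distance readings so that anything beyond max_range reads as max_range."""
    return np.minimum(np.asarray(distances, dtype=float), max_range)


# Unlimited readings in eight directions at two different positions.
# Both positions have one nearby obstacle in the third direction.
global_a = np.array([0.9, 1.3, 0.2, 0.7, 1.1, 1.6, 0.8, 0.5])
global_b = np.array([0.4, 1.0, 0.2, 0.9, 1.5, 0.6, 0.8, 0.5])

local_a = limit_sensor_range(global_a, max_range=0.3)
local_b = limit_sensor_range(global_b, max_range=0.3)

# The unlimited readings differ and thus reveal the position,
# whereas the range-limited readings are identical.
same_local_view = np.array_equal(local_a, local_b)
```
        </verbatim>
        <p>With limited range, two positions that share the same nearby structure produce the same input, so whatever the model learns for one of them transfers to the other.</p>
      </para>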
      <subsubsection inlist="toc" xml:id="S4.SS1.SSS1">
        <tags>
          <tag>4.1.1</tag>
          <tag role="autoref">subsubsection 4.1.1</tag>
          <tag role="refnum">4.1.1</tag>
          <tag role="typerefnum">§4.1.1</tag>
        </tags>
        <title><tag close=" ">4.1.1</tag>Results</title>
        <figure inlist="lof" labels="LABEL:fig:results_exp1" xml:id="S4.F3">
          <tags>
            <tag><text fontsize="90%">Figure 3</text></tag>
            <tag role="autoref">Figure 3</tag>
            <tag role="refnum">3</tag>
            <tag role="typerefnum">Figure 3</tag>
          </tags>
          <graphics candidates="figures/results/exp1_locality/loss_new.pdf" graphic="figures/results/exp1_locality/loss_new.pdf" options="width=433.62pt" xml:id="S4.F3.g1"/>
<!--  %****␣neurips_2023.tex␣Line␣375␣**** -->          <toccaption><tag close=" ">3</tag>
Losses for globally vs locally informative sensory information aggregated over <Math mode="inline" tex="5" text="5" xml:id="S4.F3.m1">
              <XMath>
                <XMTok meaning="5" role="NUMBER">5</XMTok>
              </XMath>
            </Math> seeds.
Shaded areas indicate standard deviations.
Solid lines represent validation losses, dashed lines show training losses.
The agent performs significantly better in the validation environment if equipped with distance sensors that are limited in range.
With distance sensors that are not limited, slight overfitting is observed.
</toccaption>
          <caption><tag close=": "><text fontsize="90%">Figure 3</text></tag><text fontsize="90%">
Losses for globally vs locally informative sensory information aggregated over <Math mode="inline" tex="5" text="5" xml:id="S4.F3.m2">
                <XMath>
                  <XMTok meaning="5" role="NUMBER">5</XMTok>
                </XMath>
              </Math> seeds.
Shaded areas indicate standard deviations.
Solid lines represent validation losses, dashed lines show training losses.
The agent performs significantly better in the validation environment if equipped with distance sensors that are limited in range.
With distance sensors that are not limited, slight overfitting is observed.
</text></caption>
        </figure>
        <para xml:id="S4.SS1.SSS1.p1">
          <p>We find that the model generalizes significantly better to the validation environment if equipped with locally informative sensory information (see Figure <ref labelref="LABEL:fig:results_exp1"/>).
Globally informative sensory information allows the agent to learn the environment “by heart”, relating absolute coordinates to affordances, and thereby harming generalization capabilities:
The trained agent expects the obstacles and terrains to always be at certain absolute positions, an instance of overfitting.
With locally informative sensory information, however, the agent is able to learn only egocentrically encoded knowledge.
This kind of knowledge is applicable anywhere in the environment where the learned egocentric code applies.
This is the case in our validation environment, where the obstacles and terrains have the same effects on the agent but are positioned differently in the environment.</p>
        </para>
        <para xml:id="S4.SS1.SSS1.p2">
          <p>In order to gain an understanding of the model’s inner workings, we visualize the produced affordance codes as maps.
To do so, we probe the environment at regularly distributed positions and feed the corresponding visual representations <Math mode="inline" tex="\mathbf{v}_{t}" text="v _ t" xml:id="S4.SS1.SSS1.p2.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">v</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                </XMApp>
              </XMath>
            </Math> into the affordance model.
A principal component analysis reduces the dimensionality from <Math mode="inline" tex="5" text="5" xml:id="S4.SS1.SSS1.p2.m2">
              <XMath>
                <XMTok meaning="5" role="NUMBER">5</XMTok>
              </XMath>
            </Math> to <Math mode="inline" tex="3" text="3" xml:id="S4.SS1.SSS1.p2.m3">
              <XMath>
                <XMTok meaning="3" role="NUMBER">3</XMTok>
              </XMath>
            </Math>, allowing us to visualize the encoded affordances by RGB values.
These <emph font="italic">affordance maps</emph> represent local behavioral possibilities, such as whether it is possible to move to the right.
Figure <ref labelref="LABEL:fig:affmaps_exp1"/> shows the effects of the large-range distance sensors in a test environment with four obstacles:
While globally informative sensory information yields distorted affordance maps, locally informative sensory information yields maps that reflect local behavioral constraints, indicating that the learned codes generalize.</p>
        </para>
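        <para>
          <p>The mapping from affordance codes to colors described above can be sketched as follows. The sketch assumes that the codes are collected in an array with one row per probed position and that each principal component is rescaled to the unit interval independently; the rescaling, the function name codes_to_rgb, and the random stand-in codes are assumptions made for illustration.</p>
          <verbatim font="typewriter">
```python
import numpy as np


def codes_to_rgb(codes):
    """Project codes of shape (n, d) onto 3 principal components, scaled to [0, 1]."""
    codes = np.asarray(codes, dtype=float)
    centered = codes - codes.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T
    # Assumption of this sketch: min-max scaling per component.
    lo = projected.min(axis=0)
    span = projected.max(axis=0) - lo
    span[span == 0.0] = 1.0  # avoid division by zero for constant components
    return (projected - lo) / span


# Stand-in for 5-dimensional model outputs at a 20 x 20 grid of positions.
rng = np.random.default_rng(0)
codes = rng.normal(size=(20 * 20, 5))
affordance_map = codes_to_rgb(codes).reshape(20, 20, 3)
```
          </verbatim>
          <p>The resulting array can be displayed directly as an RGB image, with each pixel corresponding to one probed position.</p>
        </para>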
        <para xml:id="S4.SS1.SSS1.p3">
          <p>These results indicate that encoding affordances in a local, egocentric manner is advantageous for generalization.
Therefore, we use locally informative sensory information in the following experiments.</p>
        </para>
        <figure inlist="lof" labels="LABEL:fig:affmaps_exp1" xml:id="S4.F4">
          <tags>
            <tag><text fontsize="90%">Figure 4</text></tag>
            <tag role="autoref">Figure 4</tag>
            <tag role="refnum">4</tag>
            <tag role="typerefnum">Figure 4</tag>
          </tags>
          <figure inlist="lof" labels="LABEL:fig:affmap_exp1_global" placement="t" xml:id="S4.F3.sf1">
            <tags>
              <tag><text fontsize="90%">(a)</text></tag>
              <tag role="autoref">(a)</tag>
              <tag role="refnum">3(a)</tag>
            </tags>
            <graphics candidates="figures/affordance_maps/exp1_global.png" graphic="figures/affordance_maps/exp1_global.png" options="width=433.62pt" xml:id="S4.F3.sf1.g1"/>
            <toccaption><tag close=" ">(a)</tag></toccaption>
            <caption><tag close=" "><text fontsize="90%">(a)</text></tag></caption>
          </figure>
          <figure inlist="lof" labels="LABEL:fig:affmap_exp1_local" placement="t" xml:id="S4.F3.sf2">
            <tags>
              <tag><text fontsize="90%">(b)</text></tag>
              <tag role="autoref">(b)</tag>
              <tag role="refnum">3(b)</tag>
            </tags>
            <graphics candidates="figures/affordance_maps/exp1_local.png" graphic="figures/affordance_maps/exp1_local.png" options="width=433.62pt" xml:id="S4.F3.sf2.g1"/>
            <toccaption><tag close=" ">(b)</tag></toccaption>
            <caption><tag close=" "><text fontsize="90%">(b)</text></tag></caption>
          </figure>
          <toccaption><tag close=" ">4</tag>
Affordance maps of an environment with four obstacles generated by a model with (a) globally informative sensory information vs (b) locally informative sensory information.
The maps are produced by feeding visual representations of the environment at regularly distributed positions into the affordance model and mapping the produced context code onto RGB space via principal component analysis.
True affordance maps, i.e., mappings from perceptual information to local behavioral constraints, only emerge in the latter case. Note how the obstacles’ edges present the same local constraints as the borders, so matching edges and borders show identical colors.</toccaption>
          <caption><tag close=": "><text fontsize="90%">Figure 4</text></tag><text fontsize="90%">
Affordance maps of an environment with four obstacles generated by a model with (a) globally informative sensory information vs (b) locally informative sensory information.
The maps are produced by feeding visual representations of the environment at regularly distributed positions into the affordance model and mapping the produced context code onto RGB space via principal component analysis.
True affordance maps, i.e., mappings from perceptual information to local behavioral constraints, only emerge in the latter case. Note how the obstacles’ edges present the same local constraints as the borders, so matching edges and borders show identical colors.</text></caption>
        </figure>
      </subsubsection>
    </subsection>
    <subsection inlist="toc" xml:id="S4.SS2">
      <tags>
        <tag>4.2</tag>
        <tag role="autoref">subsection 4.2</tag>
        <tag role="refnum">4.2</tag>
        <tag role="typerefnum">§4.2</tag>
      </tags>
      <title><tag close=" ">4.2</tag>Uncertainty-guided exploration</title>
<!--  %****␣neurips_2023.tex␣Line␣450␣**** -->      <para xml:id="S4.SS2.p1">
        <p>We now investigate how affordances can be explored more efficiently, focusing on locally informative sensory information only.
Here, the agent replaces the <Math mode="inline" tex="5\%" text="5percent" xml:id="S4.SS2.p1.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="percent" role="POSTFIX">%</XMTok>
                <XMTok meaning="5" role="NUMBER">5</XMTok>
              </XMApp>
            </XMath>
          </Math> oldest training sequences with new sequences that are generated by the exploration mechanism in each epoch.
We compare the different uncertainty measures <Math mode="inline" tex="\text{AU},\text{EU}_{\text{SD}}" text="list@([AU], [EU] _ [SD])" xml:id="S4.SS2.p1.m2">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="list"/>
                  <XMRef idref="S4.SS2.p1.m2.1"/>
                  <XMRef idref="S4.SS2.p1.m2.2"/>
                </XMApp>
                <XMWrap>
                  <XMText xml:id="S4.SS2.p1.m2.1">AU</XMText>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMApp xml:id="S4.SS2.p1.m2.2">
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMText>EU</XMText>
                    <XMText><text fontsize="70%">SD</text></XMText>
                  </XMApp>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>, and <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S4.SS2.p1.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText>EU</XMText>
                <XMText><text fontsize="70%">JSD</text></XMText>
              </XMApp>
            </XMath>
          </Math>.
The uncertainty measures are used as the objective that is to be maximized by the behavior inference mechanism.
As a baseline, we compare against the <emph font="italic">random</emph> behavioral policy, in which the agent does not actively explore its environment but is trained on a static training set.
We evaluate each condition using <Math mode="inline" tex="5" text="5" xml:id="S4.SS2.p1.m4">
            <XMath>
              <XMTok meaning="5" role="NUMBER">5</XMTok>
            </XMath>
          </Math> different seeds, with each seed generating an ensemble of size <Math mode="inline" tex="5" text="5" xml:id="S4.SS2.p1.m5">
            <XMath>
              <XMTok meaning="5" role="NUMBER">5</XMTok>
            </XMath>
          </Math>.</p>
      </para>
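      <para>
        <p>The replacement of the oldest training sequences can be sketched with a first-in, first-out buffer. This is an illustration under the assumption that the training set keeps a fixed size; the names make_buffer, refresh, and generate_sequence are hypothetical, and generate_sequence merely stands in for the exploration mechanism.</p>
        <verbatim font="typewriter">
```python
from collections import deque


def make_buffer(initial_sequences):
    """FIFO buffer whose capacity equals the size of the initial training set."""
    return deque(initial_sequences, maxlen=len(initial_sequences))


def refresh(buffer, generate_sequence, fraction=0.05):
    """Replace the oldest `fraction` of the buffer with newly generated sequences."""
    n_new = round(len(buffer) * fraction)
    for _ in range(n_new):
        # Appending to a full deque with maxlen discards the oldest entry.
        buffer.append(generate_sequence())
    return n_new


# Integers stand in for training sequences; larger values are newer.
training_set = make_buffer(list(range(100)))
new_ids = iter(range(1000, 2000))
replaced = refresh(training_set, lambda: next(new_ids))
```
        </verbatim>
        <p>Calling refresh once per epoch keeps the training set at a constant size while the share of self-generated sequences grows over time.</p>
      </para>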
      <subsubsection inlist="toc" xml:id="S4.SS2.SSS1">
        <tags>
          <tag>4.2.1</tag>
          <tag role="autoref">subsubsection 4.2.1</tag>
          <tag role="refnum">4.2.1</tag>
          <tag role="typerefnum">§4.2.1</tag>
        </tags>
        <title><tag close=" ">4.2.1</tag>Results</title>
        <para xml:id="S4.SS2.SSS1.p1">
          <p>First, we examine the velocities that the agent exhibits in the different cases, where velocity is defined as the distance traveled between two consecutive time steps (see Figure <ref labelref="LABEL:fig:exp2_speeds"/>).
We find that the agent produces significantly higher velocities if no active exploration takes place, i.e., in the <emph font="italic">random</emph> condition.
This puts the other cases at a disadvantage, because the high velocities that would be present in the validation data are never encountered during their training.
We therefore modify the validation set by adjusting the heuristic to produce lower velocities.
Further, we focus on learnable affordances rather than on areas with high sensory uncertainty by restricting the validation set to data points where the agent is within the red rectangle in Figure <ref labelref="LABEL:fig:env"/>.</p>
        </para>
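        <para>
          <p>Both steps, computing velocities and restricting the validation set to a rectangular region, can be sketched as follows. The sketch assumes two-dimensional positions and an axis-aligned rectangle; the function names and the rectangle coordinates are hypothetical and do not correspond to the actual region in the environment.</p>
          <verbatim font="typewriter">
```python
import numpy as np


def step_velocities(positions):
    """Distance traveled between consecutive time steps for positions of shape (T, 2)."""
    positions = np.asarray(positions, dtype=float)
    return np.linalg.norm(np.diff(positions, axis=0), axis=1)


def inside_rectangle(positions, lower, upper):
    """Boolean mask of positions within the axis-aligned rectangle [lower, upper]."""
    positions = np.asarray(positions, dtype=float)
    above_lower = positions >= np.asarray(lower, dtype=float)
    below_upper = np.asarray(upper, dtype=float) >= positions
    return np.all(np.logical_and(above_lower, below_upper), axis=1)


trajectory = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [6.0, 8.0]])
velocities = step_velocities(trajectory)   # one value per transition
mask = inside_rectangle(trajectory, lower=(1.0, 1.0), upper=(5.0, 5.0))
restricted = trajectory[mask]              # data points kept for validation
```
          </verbatim>
        </para>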
        <figure inlist="lof" labels="LABEL:fig:exp2_speeds" xml:id="S4.F5">
          <tags>
            <tag><text fontsize="90%">Figure 5</text></tag>
            <tag role="autoref">Figure 5</tag>
            <tag role="refnum">5</tag>
            <tag role="typerefnum">Figure 5</tag>
          </tags>
          <graphics candidates="figures/results/exp2_intrinsic_only/speeds_new.png" graphic="figures/results/exp2_intrinsic_only/speeds_new.png" options="width=433.62pt" xml:id="S4.F5.g1"/>
          <toccaption><tag close=" ">5</tag>
Boxplots of the velocities the agent exhibits during training with the different exploration mechanisms, taken over all epochs across the entire environment.
No uncertainty estimate produces velocities as high as the <emph font="italic">random</emph> heuristic.
</toccaption>
          <caption><tag close=": "><text fontsize="90%">Figure 5</text></tag><text fontsize="90%">
Boxplots of the velocities the agent exhibits during training with the different exploration mechanisms, taken over all epochs across the entire environment.
No uncertainty estimate produces velocities as high as the <emph font="italic">random</emph> heuristic.
</text></caption>
        </figure>
        <para xml:id="S4.SS2.SSS1.p2">
          <p>The validation loss on the modified and restricted validation set is shown in Figure <ref labelref="LABEL:fig:exp2_loss"/>.
<!--  %****␣neurips_2023.tex␣Line␣475␣**** -->An agent that focuses on aleatoric uncertainty during exploration indeed performs worst when confronted with the learnable affordances.
<Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S4.SS2.SSS1.p2.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText>EU</XMText>
                  <XMText><text fontsize="70%">JSD</text></XMText>
                </XMApp>
              </XMath>
            </Math>-based behavioral inference learns the fastest and produces the lowest validation loss overall.</p>
        </para>
        <figure inlist="lof" labels="LABEL:fig:exp2_loss" xml:id="S4.F6">
          <tags>
            <tag><text fontsize="90%">Figure 6</text></tag>
            <tag role="autoref">Figure 6</tag>
            <tag role="refnum">6</tag>
            <tag role="typerefnum">Figure 6</tag>
          </tags>
          <graphics candidates="figures/results/exp2_intrinsic_only/loss_new.pdf" graphic="figures/results/exp2_intrinsic_only/loss_new.pdf" options="width=433.62pt" xml:id="S4.F6.g1"/>
          <toccaption><tag close=" ">6</tag>
Losses for uncertainty-guided exploration aggregated over <Math mode="inline" tex="5" text="5" xml:id="S4.F6.m1">
              <XMath>
                <XMTok meaning="5" role="NUMBER">5</XMTok>
              </XMath>
            </Math> seeds in the adapted, affordance-focussed validation set.
Shaded areas indicate standard deviations.
The best performance is achieved if the agent explores its environment based on the <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S4.F6.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText>EU</XMText>
                  <XMText><text fontsize="70%">JSD</text></XMText>
                </XMApp>
              </XMath>
            </Math> uncertainty measure.
It also allows the agent to learn affordances the quickest.
</toccaption>
          <caption><tag close=": "><text fontsize="90%">Figure 6</text></tag><text fontsize="90%">
Losses for uncertainty-guided exploration aggregated over <Math mode="inline" tex="5" text="5" xml:id="S4.F6.m3">
                <XMath>
                  <XMTok meaning="5" role="NUMBER">5</XMTok>
                </XMath>
              </Math> seeds in the adapted, affordance-focussed validation set.
Shaded areas indicate standard deviations.
The best performance is achieved if the agent explores its environment based on the <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S4.F6.m4">
                <XMath>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMText>EU</XMText>
                    <XMText><text fontsize="70%">JSD</text></XMText>
                  </XMApp>
                </XMath>
              </Math> uncertainty measure.
It also allows the agent to learn affordances the quickest.
</text></caption>
        </figure>
        <para xml:id="S4.SS2.SSS1.p3">
          <p>We also monitor the agent’s positions and generate heatmaps to see which parts of the environment the agent explores the most with the different uncertainty estimates (see Figure <ref labelref="LABEL:fig:exp2_heatmaps"/>).
Training based on sequences generated by the random heuristic results in relatively uniform coverage of the environment by design.
We find that, of all approaches, exploration based on aleatoric uncertainty leads to the agent exploring the fog field the most.
With <Math mode="inline" tex="\text{EU}_{\text{SD}}" text="[EU] _ [SD]" xml:id="S4.SS2.SSS1.p3.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText>EU</XMText>
                  <XMText><text fontsize="70%">SD</text></XMText>
                </XMApp>
              </XMath>
            </Math>, the agent is more focused on other affordances than the fog, partially avoiding regions of high aleatoric uncertainty.
The most profound effect, however, can be seen when <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S4.SS2.SSS1.p3.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText>EU</XMText>
                  <XMText><text fontsize="70%">JSD</text></XMText>
                </XMApp>
              </XMath>
            </Math> is employed: the agent avoids the fog field to the greatest extent while exploring the rest of the environment relatively uniformly.
We conclude that <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S4.SS2.SSS1.p3.m3">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText>EU</XMText>
                  <XMText><text fontsize="70%">JSD</text></XMText>
                </XMApp>
              </XMath>
            </Math> is the most suitable uncertainty measure for quick and accurate affordance learning.</p>
        </para>
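        <para>
          <p>A visitation heatmap of the kind described above can be sketched by binning the recorded positions. The sketch assumes two-dimensional positions and a rectangular environment; the function visit_heatmap, the bin count, and the extent are hypothetical choices.</p>
          <verbatim font="typewriter">
```python
import numpy as np


def visit_heatmap(positions, bins, extent):
    """Fraction of visits per cell for positions of shape (N, 2).

    extent is ((x_min, x_max), (y_min, y_max)); at least one position
    must fall inside it, otherwise the normalization divides by zero.
    """
    positions = np.asarray(positions, dtype=float)
    (x_min, x_max), (y_min, y_max) = extent
    counts, _, _ = np.histogram2d(
        positions[:, 0],
        positions[:, 1],
        bins=bins,
        range=[[x_min, x_max], [y_min, y_max]],
    )
    return counts / counts.sum()


visited = np.array([[0.1, 0.1], [0.1, 0.2], [0.9, 0.9], [0.8, 0.9]])
heatmap = visit_heatmap(visited, bins=2, extent=((0.0, 1.0), (0.0, 1.0)))
```
          </verbatim>
          <p>Comparing such maps across exploration mechanisms shows which regions each mechanism visits most often.</p>
        </para>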
      </subsubsection>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S5">
    <tags>
      <tag>5</tag>
      <tag role="autoref">section 5</tag>
      <tag role="refnum">5</tag>
      <tag role="typerefnum">§5</tag>
    </tags>
    <title><tag close=" ">5</tag>Discussion</title>
<!--  %Objective -->    <para xml:id="S5.p1">
      <p>This study investigated how an agent can learn affordances in a quick and accurate manner.
An artificial agent was put into a simulated environment and equipped with a cognitive map, which maps positions to sensory signals. The agent learned a predictive world model in the form of a neural network, predicting action consequences conditioned on local sensory information.
The agent explored its environment and learned about affordances that inform it about environmental aspects that locally influence its behavior.
Our results indicate that cognitive agents that actively learn about affordances should integrate three key ingredients in this process.
First, search for novel affordances should be pursued within world-centered cognitive maps, which allow the activation of local views at particular positions within the map.
Second, the learning of affordances should focus on local, body-relative sensory encodings.
Third, divergence measures between a small collection of model-predictive densities are best suited for identifying regions that support further model learning.</p>
    </para>
<!--  %Locality 
     %****␣neurips_2023.tex␣Line␣525␣****-->    <para xml:id="S5.p2">
      <p>We found that locality of perception is an important ingredient to allow for generalization, which is in accordance with the literature <cite class="ltx_citemacro_cite"><bibref bibrefs="Epstein2015" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
If sensors are too globally informative, then local, generalizable affordances are hard to learn.
This is similar to infants, who are born with low visual acuity <cite class="ltx_citemacro_cite"><bibref bibrefs="Smith2018" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>, which may indeed be helpful to learn about global outlines and otherwise focus on local visual information, such as faces, hands, and objects.
Note that locality need not be spatial; it could also be temporal or defined in other conceptual spaces <cite class="ltx_citemacro_cite"><bibref bibrefs="Gaerdenfors:2014" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:exp2_heatmaps" xml:id="S5.F7">
      <tags>
        <tag><text fontsize="90%">Figure 7</text></tag>
        <tag role="autoref">Figure 7</tag>
        <tag role="refnum">7</tag>
        <tag role="typerefnum">Figure 7</tag>
      </tags>
      <figure inlist="lof" labels="LABEL:fig:exp2_heatmap_random" placement="t" xml:id="S5.F6.sf1">
        <tags>
          <tag><text fontsize="90%">(a)</text></tag>
          <tag role="autoref">(a)</tag>
          <tag role="refnum">6(a)</tag>
        </tags>
        <graphics candidates="figures/results/exp2_intrinsic_only/Heatmap_random.png" graphic="figures/results/exp2_intrinsic_only/Heatmap_random.png" options="width=433.62pt" xml:id="S5.F6.sf1.g1"/>
        <toccaption><tag close=" ">(a)</tag>Random heuristic</toccaption>
        <caption><tag close=" "><text fontsize="90%">(a)</text></tag><text fontsize="90%">Random heuristic</text></caption>
      </figure>
<!--  %maximize␣horizontal␣separation -->      <figure inlist="lof" labels="LABEL:fig:exp2_heatmap_aleatoric" placement="t" xml:id="S5.F6.sf2">
        <tags>
          <tag><text fontsize="90%">(b)</text></tag>
          <tag role="autoref">(b)</tag>
          <tag role="refnum">6(b)</tag>
        </tags>
        <graphics candidates="figures/results/exp2_intrinsic_only/Heatmap_aleatoric.png" graphic="figures/results/exp2_intrinsic_only/Heatmap_aleatoric.png" options="width=433.62pt" xml:id="S5.F6.sf2.g1"/>
        <toccaption><tag close=" ">(b)</tag>Aleatoric <text class="ltx_markedasmath">AU</text></toccaption>
        <caption><tag close=" "><text fontsize="90%">(b)</text></tag><text fontsize="90%">Aleatoric <text class="ltx_markedasmath">AU</text></text></caption>
      </figure>
      <break/>
<!--  %more␣vertical␣separation -->      <figure inlist="lof" labels="LABEL:fig:exp2_heatmap_sd" placement="t" xml:id="S5.F6.sf3">
        <tags>
          <tag><text fontsize="90%">(c)</text></tag>
          <tag role="autoref">(c)</tag>
          <tag role="refnum">6(c)</tag>
        </tags>
        <graphics candidates="figures/results/exp2_intrinsic_only/Heatmap_sd.png" graphic="figures/results/exp2_intrinsic_only/Heatmap_sd.png" options="width=433.62pt" xml:id="S5.F6.sf3.g1"/>
        <toccaption><tag close=" ">(c)</tag>Epistemic <Math mode="inline" tex="\text{EU}_{\text{SD}}" text="[EU] _ [SD]" xml:id="S5.F6.sf3.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText>EU</XMText>
                <XMText><text fontsize="70%">SD</text></XMText>
              </XMApp>
            </XMath>
          </Math></toccaption>
        <caption><tag close=" "><text fontsize="90%">(c)</text></tag><text fontsize="90%">Epistemic <Math mode="inline" tex="\text{EU}_{\text{SD}}" text="[EU] _ [SD]" xml:id="S5.F6.sf3.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText>EU</XMText>
                  <XMText><text fontsize="70%">SD</text></XMText>
                </XMApp>
              </XMath>
            </Math></text></caption>
      </figure>
<!--  %maximize␣horizontal␣separation -->      <figure inlist="lof" labels="LABEL:fig:exp2_heatmap_jsd" placement="t" xml:id="S5.F6.sf4">
        <tags>
          <tag><text fontsize="90%">(d)</text></tag>
          <tag role="autoref">(d)</tag>
          <tag role="refnum">6(d)</tag>
        </tags>
        <graphics candidates="figures/results/exp2_intrinsic_only/Heatmap_jsd.png" graphic="figures/results/exp2_intrinsic_only/Heatmap_jsd.png" options="width=433.62pt" xml:id="S5.F6.sf4.g1"/>
        <toccaption><tag close=" ">(d)</tag>Epistemic <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S5.F6.sf4.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText>EU</XMText>
                <XMText><text fontsize="70%">JSD</text></XMText>
              </XMApp>
            </XMath>
          </Math></toccaption>
        <caption><tag close=" "><text fontsize="90%">(d)</text></tag><text fontsize="90%">Epistemic <Math mode="inline" tex="\text{EU}_{\text{JSD}}" text="[EU] _ [JSD]" xml:id="S5.F6.sf4.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText>EU</XMText>
                  <XMText><text fontsize="70%">JSD</text></XMText>
                </XMApp>
              </XMath>
            </Math></text></caption>
      </figure>
      <toccaption><tag close=" ">7</tag>
Positional heatmaps based on all sequences the agent sees in the corresponding conditions during training.
(a) The <emph font="italic">random</emph> heuristic sees all locations in an approximately uniformly distributed manner. As the training set does not change over the course of training, the heatmap is less dense.
(b-d) When the agent actively explores the environment via the different uncertainty-based exploration strategies, the heatmaps indicate the most attractive sub-regions.
</toccaption>
      <caption><tag close=": "><text fontsize="90%">Figure 7</text></tag><text fontsize="90%">
Positional heatmaps based on all sequences the agent sees in the corresponding conditions during training.
(a) The <emph font="italic">random</emph> heuristic sees all locations in an approximately uniformly distributed manner. As the training set does not change over the course of training, the heatmap is less dense.
(b-d) When the agent actively explores the environment via the different uncertainty-based exploration strategies, the heatmaps indicate the most attractive sub-regions.
</text></caption>
    </figure>
<!--  %Affordance␣maps␣and␣the␣hippocampus -->    <para xml:id="S5.p3">
      <p>The learned affordances can be depicted by relating positions in the environment to affordance codes.
The resulting affordance maps are related to cognitive maps and may reflect part of what the hippocampus computes.
Cognitive maps in the hippocampus appear to be essential for successful navigation and other environment-centered tasks.
Meanwhile, the strong interconnectivity with neocortical areas indicates that particular local codes in the hippocampus may trigger views of the local surroundings <cite class="ltx_citemacro_cite"><bibref bibrefs="Mallot:2014,Roehrich:2014" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
Moreover, the mapping between allocentric positions and egocentric perceptions is probably mediated by interactions between the hippocampal loop and the rest of the default mode network <cite class="ltx_citemacro_cite"><bibref bibrefs="Bottini:2020,Buckner:2008,Zacks:2020,Stawarczyk:2021" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite>.
It is this mapping that our agents explored and learned in this study.</p>
    </para>
<!--  %Future␣work -->    <para xml:id="S5.p4">
      <p>In the future, we plan to extend our work to other problems that are not necessarily based on spatial navigation, such as following a construction plan to build furniture from individual parts.
Moreover, affordances should also be grounded directly in objects, not only in local circumstances; a kettle, for example, affords boiling water for tea.
With regard to developmental psychology, further explorations with respect to curriculum generation <cite class="ltx_citemacro_cite"><bibref bibrefs="Smith2018" separator=";" show="Authors Phrase1YearPhrase2" yyseparator=",">
            <bibrefphrase>(</bibrefphrase>
            <bibrefphrase>)</bibrefphrase>
          </bibref></cite> might be interesting:
How do children generate their curriculum in comparison to our approaches?
How do they decide which experiences should never be forgotten, such as touching a hotplate?
The way we replace sequences in the training set can certainly be improved.
The challenge is to create a good set of training data while avoiding the forgetting of important experiences.
Even though we did not observe catastrophic forgetting in the above experiments, it would be interesting to examine whether the learning process exhibits self-stabilizing behavior if disturbed by impactful experiences.
<!--  %Neurology␣might␣ask␣how␣uncertainty␣could␣be␣estimated,␣and␣maybe␣disentangled,␣in␣the␣brain. -->In all of these cases, our study suggests that it will be advantageous to have local information available and to explore objects, entities, and locations in an active, epistemically driven manner.</p>
    </para>
  </section>
  <section xml:id="Sx1">
    <title>Acknowledgements</title>
    <para xml:id="Sx1.p1">
      <p>This research was funded by the German Research Foundation (DFG) under Germany’s Excellence Strategy, EXC number 2064/1, project number 390727645, as well as within priority program SPP 2134, project “DeepSelf: Emergence of Event-Predictive Agency in Robots” (BU1335/14-1).
The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Fedor Scholz.</p>
    </para>
  </section>
  <bibliography citestyle="authoryear" files="neurips_2023" xml:id="bib">
    <title>References</title>
  </bibliography>
</document>
