<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/japhy/scienceReplication.artiswrong.com/paper_files/arxiv/2503.10241/latex_extracted"?>
<?latexml class="article" options="11pt"?>
<?latexml package="rldmsubmit,palatino"?>
<?latexml package="graphicx"?>
<?latexml package="amsmath,amssymb"?>
<?latexml package="listings"?>
<?latexml package="algorithm"?>
<?latexml package="algorithmic"?>
<?latexml package="tikz"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML" class="ltx_authors_1line">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <resource src="ltx-listings.css" type="text/css"/>
  <title>SCOOP: A Framework for Proactive Collaboration and Social Continual Learning through Natural Language Interaction and Causal Reasoning</title>
  <creator role="author">
    <personname>
Dimitri Ognibene, Sabrina Patania, Luca Annese, Cansu Koyuturk, Franca Garzotto, Giuseppe Vizzari <break/>University of Milan-Bicocca<break/>Milan, Italy<break/><text font="typewriter">dimitri.ognibene@unimib.it</text> <break/>&amp;Azzurra Ruggeri <break/>TUM School of Social Sciences and Technology <break/>Munich, Germany <break/><text font="typewriter">a.ruggeri@tum.de</text>
&amp;Simone Colombani <break/>Oversonic Robotics <break/>Carate Brianza, Italy <break/></personname>
  </creator>
  <abstract name="Abstract">
    <p>Multimodal information-gathering settings, where users collaborate with AI in dynamic environments, are increasingly common. These scenarios involve complex processes with textual and multimodal interaction (e.g., house refurbishment plans) and often require accessing additional structural information (e.g., regulations) via cost-incurring requests. Moreover, AI helpers lack access to users’ true goals, beliefs, and preferences and struggle to integrate diverse information effectively.</p>
    <p>We propose a social continual learning framework for causal knowledge acquisition and collaborative decision-making. It focuses on autonomous agents learning through dialogues, question-asking, and interaction in open, partially observable environments. A key component is a natural language oracle that answers the agent’s queries about environmental mechanisms and states, refining causal understanding while balancing exploration (learning) and exploitation (using knowledge).</p>
    <p>Evaluation tasks, inspired by developmental psychology, emphasize causal reasoning and question-asking skills, complementing benchmarks by assessing the agent’s ability to identify knowledge gaps, generate meaningful queries, and incrementally update reasoning. The framework also evaluates how the cost of acquiring knowledge is amortized across tasks in the same environment.</p>
    <p>We propose two architectures: (1) a system combining Large Language Models (LLMs) with the ReAct framework and question-generation, and (2) an advanced system with a causal world model (symbolic, graph-based, or subsymbolic) for reasoning and decision-making. The latter builds a causal knowledge graph for efficient inference and adaptability under constraints. Challenges include integrating causal reasoning into ReAct and optimizing exploration and question-asking in error-prone scenarios. Beyond applications, this framework models developmental processes combining causal reasoning, question generation, and social learning.</p>
    <p><text font="bold">Keywords:</text> social learning, question generation, collaborative AI, LLM, planning, continual learning, cognitive development.</p>
  </abstract>
  <para xml:id="p2">
    <p><text font="bold">Acknowledgements.</text> We acknowledge support from the Volkswagen Foundation for the project <text font="italic">Developing an Artificial Social Childhood (ASC) to improve AI causal reasoning, information gathering and decision making</text>, Ref.: 9E530.</p>
  </para>
  <pagination role="newpage"/>
  <section inlist="toc" xml:id="S1">
    <tags>
      <tag>1</tag>
      <tag role="refnum">1</tag>
      <tag role="typerefnum">§1</tag>
    </tags>
    <title><tag close=" ">1</tag>Introduction</title>
    <para xml:id="S1.p1">
      <p>With the advent of advanced generative AI, tasks involving rich multimodal and natural language information—where users collaborate with AI helpers—are becoming increasingly common. These settings often involve:</p>
    </para>
    <para xml:id="S1.p2">
      <enumerate xml:id="S1.I1">
        <item xml:id="S1.I1.i1">
          <tags>
            <tag>1.</tag>
            <tag role="refnum">1</tag>
            <tag role="typerefnum">item 1</tag>
          </tags>
          <para xml:id="S1.I1.i1.p1">
            <p><text font="bold">Sequential Collaborative Interactions</text>: performing the task requires planning and executing sequences of joint interactions and of independent actions by the user or the agent (e.g., collecting different elements and tools), while accounting for their consequences over multiple steps.</p>
          </para>
        </item>
        <item xml:id="S1.I1.i2">
          <tags>
            <tag>2.</tag>
            <tag role="refnum">2</tag>
            <tag role="typerefnum">item 2</tag>
          </tags>
          <para xml:id="S1.I1.i2.p1">
            <p><text font="bold">Costly Social Information Access</text>: users and agents must manage limited resources when accessing information provided by other agents (e.g., mechanism descriptions, regulations, or user preferences and beliefs <cite class="ltx_citemacro_cite">[<bibref bibrefs="Bianco2020" separator="," yyseparator=","/>]</cite>).</p>
          </para>
        </item>
        <item xml:id="S1.I1.i3">
          <tags>
            <tag>3.</tag>
            <tag role="refnum">3</tag>
            <tag role="typerefnum">item 3</tag>
          </tags>
          <para xml:id="S1.I1.i3.p1">
            <p><text font="bold">Multimodal Complexity</text>: combining textual descriptions, visual representations, and structural information (e.g., house refurbishment plans).</p>
          </para>
        </item>
        <item xml:id="S1.I1.i4">
          <tags>
            <tag>4.</tag>
            <tag role="refnum">4</tag>
            <tag role="typerefnum">item 4</tag>
          </tags>
          <para xml:id="S1.I1.i4.p1">
            <p><text font="bold">Dynamic Environments</text>: evolving tasks with shifting states and requirements (e.g., bureaucratic processes, repair scenarios) <cite class="ltx_citemacro_cite">[<bibref bibrefs="taniguchi2023world" separator="," yyseparator=","/>]</cite>.</p>
          </para>
        </item>
      </enumerate>
      <p>The simultaneous presence and interaction of these conditions motivate our work to formalize such settings and develop effective solutions for them. Moreover, these conditions reflect important aspects of human developmental processes <cite class="ltx_citemacro_cite">[<bibref bibrefs="cangelosi2010integration,ruggeri2016sources" separator="," yyseparator=","/>]</cite>.</p>
    </para>
    <subsection inlist="toc" xml:id="S1.SS1">
      <tags>
        <tag>1.1</tag>
        <tag role="refnum">1.1</tag>
        <tag role="typerefnum">§1.1</tag>
      </tags>
      <title><tag close=" ">1.1</tag>Example Scenarios</title>
      <para xml:id="S1.SS1.p1">
        <p><text font="bold">Example 1: Repairing and Shipping Items</text></p>
      </para>
      <para xml:id="S1.SS1.p2">
        <p>A robot agent assists in repairing and shipping items to various international destinations. Tasks include adhering to specific repair requirements, customs regulations, packaging standards, and climate conditions. The agent collaborates with the user to clarify repair strategies and queries a natural language oracle for regulatory or technical information. During downtime, the agent explores tools and equipment to refine its understanding of repair techniques, improving future performance.</p>
      </para>
      <para xml:id="S1.SS1.p3">
        <p><text font="bold">Example 2: Compiling Documents for Official Requests</text></p>
      </para>
      <para xml:id="S1.SS1.p4">
        <p>An agent aids a user in compiling documents for official applications, such as visas or tax submissions. Tasks involve identifying required documents, extracting relevant information, and validating regulations via a natural language oracle. When ambiguities arise, the agent interacts with the user for clarification. In idle periods, it autonomously explores document templates and regulatory archives to improve performance across multiple problem instances.</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S2">
    <tags>
      <tag>2</tag>
      <tag role="refnum">2</tag>
      <tag role="typerefnum">§2</tag>
    </tags>
    <title><tag close=" ">2</tag>Related Work</title>
    <para xml:id="S2.p1">
      <p><text font="bold">ReAct Framework:</text> The ReAct (Reason + Act) framework <cite class="ltx_citemacro_cite">[<bibref bibrefs="yao2023" separator="," yyseparator=","/>]</cite> integrates reasoning and acting capabilities into LLMs, enabling contextual decision-making and blending question generation with action-taking in collaborative environments.</p>
    </para>
    <para xml:id="S2.p2">
      <p><text font="bold">Causal Reasoning in AI:</text> Techniques like causal discovery <cite class="ltx_citemacro_cite">[<bibref bibrefs="peters2017elements" separator="," yyseparator=","/>]</cite> and inference frameworks such as DoWhy <cite class="ltx_citemacro_cite">[<bibref bibrefs="sharma2020dowhy" separator="," yyseparator=","/>]</cite> provide essential tools for reasoning in dynamic, partially observable environments.</p>
    </para>
    <para xml:id="S2.p3">
      <p><text font="bold">Integrating Causal Graphs with LLMs:</text> Recent research integrates causal graphs with LLMs to enhance reasoning and decision-making. Embedding causal reasoning into LLM workflows enables systems to interpret environments, predict outcomes, and optimize actions. Frameworks combining LLMs with causal world models <cite class="ltx_citemacro_cite">[<bibref bibrefs="gkountouras2024languageagentsmeetcausality" separator="," yyseparator=","/>]</cite> and studies on automating causal discovery <cite class="ltx_citemacro_cite">[<bibref bibrefs="long2024largelanguagemodelsbuild" separator="," yyseparator=","/>]</cite> highlight the potential of causal representation learning for dynamically constructing world models.</p>
    </para>
    <para xml:id="S2.p4">
      <p><text font="bold">LLMs and Planning Systems:</text> Combining LLMs with planning systems, such as symbolic planners or reinforcement learning frameworks, supports structured decision-making. Hybrid approaches integrate LLMs for language understanding and planners for task execution, excelling in complex scenarios like multi-agent collaboration and robotics <cite class="ltx_citemacro_cite">[<bibref bibrefs="colombani2024one,kambhampati2024llms" separator="," yyseparator=","/>]</cite>.</p>
    </para>
    <para xml:id="S2.p5">
      <p><text font="bold">Causal Discovery and Question Generation:</text> Integrating LLMs with causal discovery tools enhances reasoning in dynamic environments. For instance, PyWhy-LLM supports causal analysis <cite class="ltx_citemacro_cite">[<bibref bibrefs="kiciman2023causal" separator="," yyseparator=","/>]</cite>, while DoWhy-GCM facilitates inference in graphical causal models <cite class="ltx_citemacro_cite">[<bibref bibrefs="blobaum2024dowhy" separator="," yyseparator=","/>]</cite>. These tools enable LLMs to refine causal graphs and resolve ambiguities, closing the loop between knowledge acquisition and action.</p>
    </para>
    <para xml:id="S2.p6">
      <p><text font="bold">Symbolic and Subsymbolic Integration:</text> Hybrid architectures bridge symbolic causal reasoning with the subsymbolic capabilities of LLMs, combining graph-based reasoning with unstructured data processing for robust decision-making agents <cite class="ltx_citemacro_cite">[<bibref bibrefs="hitzler2022neuro,Ibrahim2024" separator="," yyseparator=","/>]</cite>.</p>
    </para>
    <para xml:id="S2.p7">
      <p><text font="bold">Active Information Gathering:</text> Research on active learning and information gain <cite class="ltx_citemacro_cite">[<bibref bibrefs="patania2024large,bertolazzi2023chatgpt,friston2015active,ognibene2019proactive,masiero2024search" separator="," yyseparator=","/>]</cite> informs strategies for querying to maximize utility while balancing exploration and exploitation in interactive systems.</p>
    </para>
    <para xml:id="S2.p8">
      <p><text font="bold">Human-in-the-Loop Systems:</text> Human feedback integration highlights the value of interactive learning, enabling agents to dynamically query users and adapt to preferences <cite class="ltx_citemacro_cite">[<bibref bibrefs="amershi2014power" separator="," yyseparator=","/>]</cite>.</p>
    </para>
    <para xml:id="S2.p9">
      <p><text font="bold">Developmental Psychology Insights:</text> Insights from children’s causal learning <cite class="ltx_citemacro_cite">[<bibref bibrefs="gopnik2004theory,legare2014curiosity" separator="," yyseparator=","/>]</cite> inspire benchmarks to evaluate AI agents’ abilities in question generation and causal inference.</p>
    </para>
    <para xml:id="S2.p10">
      <p>By synthesizing these elements, our framework advances adaptive, interactive AI systems at the intersection of causal reasoning, social learning, and multimodal interaction.</p>
    </para>
  </section>
  <section inlist="toc" xml:id="S3">
    <tags>
      <tag>3</tag>
      <tag role="refnum">3</tag>
      <tag role="typerefnum">§3</tag>
    </tags>
    <title><tag close=" ">3</tag>The SCOOP framework: Social Continual Object-Oriented POMDP </title>
    <paragraph inlist="toc" xml:id="S3.SS0.SSS0.Px1">
      <title>Formal Framework.</title>
      <para xml:id="S3.SS0.SSS0.Px1.p1">
        <p>We formalize our rich interactive setting as an <emph font="italic">object-oriented</emph> partially observable Markov decision process (OO-POMDP), extended to incorporate a <emph font="italic">linguistic world descriptor</emph> that generates natural language observations (e.g., the initial task description), multiple problem instances, a <emph font="italic">user</emph> with problem-specific objectives, and a <emph font="italic">natural language oracle</emph> that provides causal information. Let</p>
        <equation xml:id="S3.Ex1">
          <Math mode="display" tex="\mathcal{D}=\bigl{(}\mathcal{T}_{O},\mathcal{F}_{O},\mathcal{R}_{O},\mathcal{W}_{O}\bigr{)}" text="D = vector@(T _ O, F _ O, R _ O, W _ O)" xml:id="S3.Ex1.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMTok font="caligraphic" role="UNKNOWN">D</XMTok>
                <XMDual>
                  <XMApp>
                    <XMTok meaning="vector"/>
                    <XMRef idref="S3.Ex1.m1.1"/>
                    <XMRef idref="S3.Ex1.m1.2"/>
                    <XMRef idref="S3.Ex1.m1.3"/>
                    <XMRef idref="S3.Ex1.m1.4"/>
                  </XMApp>
                  <XMWrap>
                    <XMTok fontsize="120%" role="OPEN" stretchy="false">(</XMTok>
                    <XMApp xml:id="S3.Ex1.m1.1">
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">T</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                    <XMTok role="PUNCT">,</XMTok>
                    <XMApp xml:id="S3.Ex1.m1.2">
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">F</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                    <XMTok role="PUNCT">,</XMTok>
                    <XMApp xml:id="S3.Ex1.m1.3">
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">R</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                    <XMTok role="PUNCT">,</XMTok>
                    <XMApp xml:id="S3.Ex1.m1.4">
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">W</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                    <XMTok fontsize="120%" role="CLOSE" stretchy="false">)</XMTok>
                  </XMWrap>
                </XMDual>
              </XMApp>
            </XMath>
          </Math>
        </equation>
        <p>denote a <emph font="italic">domain specification</emph>, where:</p>
        <itemize xml:id="S3.I1">
          <item xml:id="S3.I1.i1">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">1st item</tag>
            </tags>
            <para xml:id="S3.I1.i1.p1">
              <p><Math mode="inline" tex="\mathcal{T}_{O}" text="T _ O" xml:id="S3.I1.i1.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">T</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                  </XMath>
                </Math> is a set of object types (e.g., boxes, tools, containers);</p>
            </para>
          </item>
          <item xml:id="S3.I1.i2">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">2nd item</tag>
            </tags>
            <para xml:id="S3.I1.i2.p1">
              <p><Math mode="inline" tex="\mathcal{F}_{O}" text="F _ O" xml:id="S3.I1.i2.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">F</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                  </XMath>
                </Math> is the set of allowed features or predicates relevant to the domain (e.g., “contains(x,y)” or “isOpenable(x)”);</p>
            </para>
          </item>
          <item xml:id="S3.I1.i3">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">3rd item</tag>
            </tags>
            <para xml:id="S3.I1.i3.p1">
              <p><Math mode="inline" tex="\mathcal{R}_{O}" text="R _ O" xml:id="S3.I1.i3.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">R</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                  </XMath>
                </Math> is an evolving set of <emph font="italic">causal rules</emph> capturing how these features and object types interact (e.g., whether opening a container allows access to its contents), some of which may be unknown or partially specified;</p>
            </para>
          </item>
          <item xml:id="S3.I1.i4">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">4th item</tag>
            </tags>
            <para xml:id="S3.I1.i4.p1">
              <p><Math mode="inline" tex="\mathcal{W}_{O}" text="W _ O" xml:id="S3.I1.i4.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">W</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                  </XMath>
                </Math> is the family of possible world configurations consistent with <Math mode="inline" tex="\mathcal{T}_{O},\mathcal{F}_{O}," text="list@(T _ O, F _ O)" xml:id="S3.I1.i4.p1.m2">
                  <XMath>
                    <XMDual>
                      <XMRef idref="S3.I1.i4.p1.m2.1"/>
                      <XMWrap>
                        <XMDual xml:id="S3.I1.i4.p1.m2.1">
                          <XMApp>
                            <XMTok meaning="list"/>
                            <XMRef idref="S3.I1.i4.p1.m2.1.1"/>
                            <XMRef idref="S3.I1.i4.p1.m2.1.2"/>
                          </XMApp>
                          <XMWrap>
                            <XMApp xml:id="S3.I1.i4.p1.m2.1.1">
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="caligraphic" role="UNKNOWN">T</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                            </XMApp>
                            <XMTok role="PUNCT">,</XMTok>
                            <XMApp xml:id="S3.I1.i4.p1.m2.1.2">
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="caligraphic" role="UNKNOWN">F</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                            </XMApp>
                          </XMWrap>
                        </XMDual>
                        <XMTok role="PUNCT">,</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMath>
                </Math> and <Math mode="inline" tex="\mathcal{R}_{O}" text="R _ O" xml:id="S3.I1.i4.p1.m3">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">R</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                  </XMath>
                </Math>.</p>
            </para>
          </item>
        </itemize>
      </para>
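      <para>
        <p>As an illustration only, the domain specification above can be sketched as a small Python data structure. Everything named here (<text font="typewriter">DomainSpec</text>, <text font="typewriter">CausalRule</text>, <text font="typewriter">admits</text>, and the predicates <text font="typewriter">isOpen</text> and <text font="typewriter">accessible</text>) is a hypothetical choice made for this sketch, not part of the framework definition. It assumes that each causal rule links one cause predicate to one effect predicate over the same objects, and it represents the family of world configurations implicitly through a membership test rather than by enumeration.</p>
        <verbatim font="typewriter"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# A ground fact: (predicate name, names of the objects it applies to).
Fact = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class CausalRule:
    """Hypothetical rule format: if `cause` holds for some objects, so does `effect`."""
    cause: str
    effect: str


@dataclass
class DomainSpec:
    object_types: Set[str]                                # T_O
    predicates: Set[str]                                  # F_O
    rules: Set[CausalRule] = field(default_factory=set)   # R_O, grows over time

    def add_rule(self, rule: CausalRule) -> None:
        """Extend R_O, rejecting rules that mention predicates outside F_O."""
        unknown = {rule.cause, rule.effect} - self.predicates
        if unknown:
            raise ValueError(f"unknown predicates: {sorted(unknown)}")
        self.rules.add(rule)

    def admits(self, objects: Dict[str, str], facts: Set[Fact]) -> bool:
        """Membership test standing in for W_O: is this configuration consistent?"""
        if any(t not in self.object_types for t in objects.values()):
            return False
        for name, args in facts:
            if name not in self.predicates or any(a not in objects for a in args):
                return False
        # Every known rule must be respected by the facts that hold.
        return all((r.effect, args) in facts
                   for r in self.rules
                   for name, args in facts if name == r.cause)


domain = DomainSpec({"box", "tool"}, {"isOpen", "accessible"})
domain.add_rule(CausalRule("isOpen", "accessible"))
objects = {"boxA": "box"}
ok = domain.admits(objects, {("isOpen", ("boxA",)), ("accessible", ("boxA",))})
bad = domain.admits(objects, {("isOpen", ("boxA",))})
```
]]></verbatim>
        <p>In this sketch <text font="typewriter">ok</text> is true and <text font="typewriter">bad</text> is false, because the second configuration violates the single known rule. Rules that are unknown or only partially specified are simply absent from the rule set until the agent acquires them.</p>
      </para>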
      <para xml:id="S3.SS0.SSS0.Px1.p2">
        <p>A <emph font="italic">problem instance</emph> <Math mode="inline" tex="\theta" text="theta" xml:id="S3.SS0.SSS0.Px1.p2.m1">
            <XMath>
              <XMTok font="italic" name="theta" role="UNKNOWN">θ</XMTok>
            </XMath>
          </Math> refines <Math mode="inline" tex="\mathcal{D}" text="D" xml:id="S3.SS0.SSS0.Px1.p2.m2">
            <XMath>
              <XMTok font="caligraphic" role="UNKNOWN">D</XMTok>
            </XMath>
          </Math> with a concrete set of objects, an initial state, and a <emph font="italic">user objective</emph> (defined by a problem-specific reward function <Math mode="inline" tex="r_{\theta}" text="r _ theta" xml:id="S3.SS0.SSS0.Px1.p2.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">r</XMTok>
                <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
              </XMApp>
            </XMath>
          </Math> that encodes the user’s goals). The <emph font="italic">helper agent</emph> interacts with both the user (asking questions to clarify the user’s goals or preferences, presenting results and suggestions, or taking other interactive actions) and a <emph font="italic">natural language oracle</emph> (queried to acquire missing causal rules or environment states) across potentially many problem instances <Math mode="inline" tex="\{\theta_{i}\}" text="set@(theta _ i)" xml:id="S3.SS0.SSS0.Px1.p2.m4">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="set"/>
                  <XMRef idref="S3.SS0.SSS0.Px1.p2.m4.1"/>
                </XMApp>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">{</XMTok>
                  <XMApp xml:id="S3.SS0.SSS0.Px1.p2.m4.1">
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" name="theta" role="UNKNOWN">θ</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">}</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>. The oracle responds in multiple formats:</p>
        <enumerate xml:id="S3.I2">
          <item xml:id="S3.I2.i1">
            <tags>
              <tag>1.</tag>
              <tag role="refnum">1</tag>
              <tag role="typerefnum">item 1</tag>
            </tags>
            <para xml:id="S3.I2.i1.p1">
              <p><emph font="italic">Language-based descriptions</emph> of environment dynamics, such as “box A must be opened before retrieving item B”;</p>
            </para>
          </item>
          <item xml:id="S3.I2.i2">
            <tags>
              <tag>2.</tag>
              <tag role="refnum">2</tag>
              <tag role="typerefnum">item 2</tag>
            </tags>
            <para xml:id="S3.I2.i2.p1">
              <p><emph font="italic">Formal causal chunks</emph>, where the oracle directly provides rules or graph fragments (e.g., “node <text class="ltx_markedasmath" font="typewriter">Open(Box)</text> causes <text class="ltx_markedasmath" font="typewriter">Accessible(Item)</text>”);</p>
            </para>
          </item>
          <item xml:id="S3.I2.i3">
            <tags>
              <tag>3.</tag>
              <tag role="refnum">3</tag>
              <tag role="typerefnum">item 3</tag>
            </tags>
            <para xml:id="S3.I2.i3.p1">
              <p><emph font="italic">Observation-like feedback</emph>, akin to sensor readings or state confirmations.</p>
            </para>
          </item>
        </enumerate>
      </para>
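      <para>
        <p>The three response formats listed above can be sketched as a tagged union that the agent dispatches on. This is a minimal sketch under stated assumptions: the class names, the <text font="typewriter">AgentKnowledge</text> container, and the <text font="typewriter">integrate</text> function are all hypothetical, and language-based descriptions are only queued for later parsing (for example by an LLM) rather than interpreted here.</p>
        <verbatim font="typewriter"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class LanguageDescription:
    text: str      # free-text statement about the environment dynamics


@dataclass(frozen=True)
class CausalChunk:
    cause: str     # e.g. "Open(Box)"
    effect: str    # e.g. "Accessible(Item)"


@dataclass(frozen=True)
class ObservationFeedback:
    variable: str  # state variable being confirmed
    value: bool


OracleResponse = Union[LanguageDescription, CausalChunk, ObservationFeedback]


@dataclass
class AgentKnowledge:
    """Hypothetical store for what the agent has learned so far."""
    edges: Set[Tuple[str, str]] = field(default_factory=set)  # causal graph edges
    state: Dict[str, bool] = field(default_factory=dict)      # confirmed state
    pending_text: List[str] = field(default_factory=list)     # text awaiting parsing


def integrate(knowledge: AgentKnowledge, response: OracleResponse) -> None:
    """Route one oracle response to the part of the knowledge it updates."""
    if isinstance(response, CausalChunk):
        knowledge.edges.add((response.cause, response.effect))
    elif isinstance(response, ObservationFeedback):
        knowledge.state[response.variable] = response.value
    elif isinstance(response, LanguageDescription):
        knowledge.pending_text.append(response.text)
    else:
        raise TypeError(f"unsupported response: {type(response).__name__}")


knowledge = AgentKnowledge()
for reply in (
    LanguageDescription("box A must be opened before retrieving item B"),
    CausalChunk("Open(Box)", "Accessible(Item)"),
    ObservationFeedback("Open(Box)", False),
):
    integrate(knowledge, reply)
```
]]></verbatim>
        <p>Keeping the three formats as distinct types makes their different costs of use explicit: a formal chunk can enter the causal graph directly, whereas a language-based description still requires an interpretation step before it can inform planning.</p>
      </para>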
      <para xml:id="S3.SS0.SSS0.Px1.p3">
        <p>The overall action space <Math mode="inline" tex="\mathcal{A}=\mathcal{A}^{a}\cup\mathcal{A}^{u}" text="A = A ^ a union A ^ u" xml:id="S3.SS0.SSS0.Px1.p3.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                <XMApp>
                  <XMTok meaning="union" name="cup" role="ADDOP">∪</XMTok>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> is composed of the agent’s <emph font="italic">action space</emph> <Math mode="inline" tex="\mathcal{A}^{a}" text="A ^ a" xml:id="S3.SS0.SSS0.Px1.p3.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
              </XMApp>
            </XMath>
          </Math> and the user’s <emph font="italic">action space</emph> <Math mode="inline" tex="\mathcal{A}^{u}" text="A ^ u" xml:id="S3.SS0.SSS0.Px1.p3.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
              </XMApp>
            </XMath>
          </Math>.
The agent’s <emph font="italic">action space</emph> is further split as <Math mode="inline" tex="\mathcal{A}^{a}=\mathcal{A}^{a}_{\mathrm{act}}\cup\mathcal{A}^{a}_{\mathrm{query}}" text="A ^ a = (A ^ a) _ act union (A ^ a) _ query" xml:id="S3.SS0.SSS0.Px1.p3.m4">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                </XMApp>
                <XMApp>
                  <XMTok meaning="union" name="cup" role="ADDOP">∪</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                    </XMApp>
                    <XMTok fontsize="70%" role="UNKNOWN">act</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                    </XMApp>
                    <XMTok fontsize="70%" role="UNKNOWN">query</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>, where <Math mode="inline" tex="\mathcal{A}^{a}_{\mathrm{act}}" text="(A ^ a) _ act" xml:id="S3.SS0.SSS0.Px1.p3.m5">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                </XMApp>
                <XMTok fontsize="70%" role="UNKNOWN">act</XMTok>
              </XMApp>
            </XMath>
          </Math> includes environment-oriented actions (e.g., open, pick, place) and <Math mode="inline" tex="\mathcal{A}^{a}_{\mathrm{query}}" text="(A ^ a) _ query" xml:id="S3.SS0.SSS0.Px1.p3.m6">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                </XMApp>
                <XMTok fontsize="70%" role="UNKNOWN">query</XMTok>
              </XMApp>
            </XMath>
          </Math> includes queries to the oracle or the user.
The agent also observes the user’s feedback or clarifications regarding the task objectives and environment states. We denote the user’s <emph font="italic">action space</emph> by <Math mode="inline" tex="\mathcal{A}^{u}=\mathcal{A}^{u}_{\mathrm{act}}\cup\mathcal{A}^{u}_{\mathrm{%&#10;query}}" text="A ^ u = (A ^ u) _ act union (A ^ u) _ query" xml:id="S3.SS0.SSS0.Px1.p3.m7">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                </XMApp>
                <XMApp>
                  <XMTok meaning="union" name="cup" role="ADDOP">∪</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                    </XMApp>
                    <XMTok fontsize="70%" role="UNKNOWN">act</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                    </XMApp>
                    <XMTok fontsize="70%" role="UNKNOWN">query</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>, where <Math mode="inline" tex="\mathcal{A}^{u}_{\mathrm{act}}" text="(A ^ u) _ act" xml:id="S3.SS0.SSS0.Px1.p3.m8">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                </XMApp>
                <XMTok fontsize="70%" role="UNKNOWN">act</XMTok>
              </XMApp>
            </XMath>
          </Math> includes environment-oriented actions (e.g., open, pick, place) and <Math mode="inline" tex="\mathcal{A}^{u}_{\mathrm{query}}" text="(A ^ u) _ query" xml:id="S3.SS0.SSS0.Px1.p3.m9">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="caligraphic" role="UNKNOWN">A</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                </XMApp>
                <XMTok fontsize="70%" role="UNKNOWN">query</XMTok>
              </XMApp>
            </XMath>
          </Math> includes queries to the agent. Formally, each problem instance is modeled as:</p>
        <equation xml:id="S3.Ex2">
          <Math mode="display" tex="(\mathcal{S}_{\theta},\mathcal{A},\Omega_{\theta},\mathcal{T}_{\theta},%&#10;\mathcal{O}_{\theta},r^{u}_{\theta},r^{a}_{\theta},\gamma,\beta)," text="vector@(S _ theta, A, Omega _ theta, T _ theta, O _ theta, (r ^ u) _ theta, (r ^ a) _ theta, gamma, beta)" xml:id="S3.Ex2.m1">
            <XMath>
              <XMDual>
                <XMRef idref="S3.Ex2.m1.4"/>
                <XMWrap>
                  <XMDual xml:id="S3.Ex2.m1.4">
                    <XMApp>
                      <XMTok meaning="vector"/>
                      <XMRef idref="S3.Ex2.m1.4.1"/>
                      <XMRef idref="S3.Ex2.m1.1"/>
                      <XMRef idref="S3.Ex2.m1.4.2"/>
                      <XMRef idref="S3.Ex2.m1.4.3"/>
                      <XMRef idref="S3.Ex2.m1.4.4"/>
                      <XMRef idref="S3.Ex2.m1.4.5"/>
                      <XMRef idref="S3.Ex2.m1.4.6"/>
                      <XMRef idref="S3.Ex2.m1.2"/>
                      <XMRef idref="S3.Ex2.m1.3"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="false">(</XMTok>
                      <XMApp xml:id="S3.Ex2.m1.4.1">
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="caligraphic" role="UNKNOWN">S</XMTok>
                        <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                      </XMApp>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMTok font="caligraphic" role="UNKNOWN" xml:id="S3.Ex2.m1.1">A</XMTok>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMApp xml:id="S3.Ex2.m1.4.2">
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok name="Omega" role="UNKNOWN">Ω</XMTok>
                        <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                      </XMApp>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMApp xml:id="S3.Ex2.m1.4.3">
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="caligraphic" role="UNKNOWN">T</XMTok>
                        <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                      </XMApp>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMApp xml:id="S3.Ex2.m1.4.4">
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="caligraphic" role="UNKNOWN">O</XMTok>
                        <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                      </XMApp>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMApp xml:id="S3.Ex2.m1.4.5">
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" role="UNKNOWN">r</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                        </XMApp>
                        <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                      </XMApp>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMApp xml:id="S3.Ex2.m1.4.6">
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" role="UNKNOWN">r</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                        </XMApp>
                        <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                      </XMApp>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMTok font="italic" name="gamma" role="UNKNOWN" xml:id="S3.Ex2.m1.2">γ</XMTok>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMTok font="italic" name="beta" role="UNKNOWN" xml:id="S3.Ex2.m1.3">β</XMTok>
                      <XMTok role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                  </XMDual>
                  <XMTok role="PUNCT">,</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>
        </equation>
        <p>mirroring a POMDP with the following modifications:</p>
        <itemize xml:id="S3.I3">
          <item xml:id="S3.I3.i1">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">1st item</tag>
            </tags>
            <para xml:id="S3.I3.i1.p1">
              <p><Math mode="inline" tex="\mathcal{S}_{\theta}" text="S _ theta" xml:id="S3.I3.i1.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">S</XMTok>
                      <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                    </XMApp>
                  </XMath>
                </Math> embeds the object-based states from <Math mode="inline" tex="\theta" text="theta" xml:id="S3.I3.i1.p1.m2">
                  <XMath>
                    <XMTok font="italic" name="theta" role="UNKNOWN">θ</XMTok>
                  </XMath>
                </Math> and any partial knowledge of the causal rules <Math mode="inline" tex="\mathcal{R}_{O}" text="R _ O" xml:id="S3.I3.i1.p1.m3">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" role="UNKNOWN">R</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                    </XMApp>
                  </XMath>
                </Math>;</p>
            </para>
          </item>
          <item xml:id="S3.I3.i2">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">2nd item</tag>
            </tags>
            <para xml:id="S3.I3.i2.p1">
              <p><Math mode="inline" tex="\Omega_{\theta}" text="Omega _ theta" xml:id="S3.I3.i2.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok name="Omega" role="UNKNOWN">Ω</XMTok>
                      <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                    </XMApp>
                  </XMath>
                </Math> is the space of possible observations, spanning both <emph font="italic">environmental signals</emph> (e.g., sensor readings, gripper state, environment map chunks) and <emph font="italic">language-based</emph> responses from the user or oracle;</p>
            </para>
          </item>
          <item xml:id="S3.I3.i3">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">3rd item</tag>
            </tags>
            <para xml:id="S3.I3.i3.p1">
              <p><Math mode="inline" tex="r^{u}_{\theta}" text="(r ^ u) _ theta" xml:id="S3.I3.i3.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">r</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                      </XMApp>
                      <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                    </XMApp>
                  </XMath>
                </Math> encodes the <emph font="italic">user’s objectives</emph> for instance <Math mode="inline" tex="\theta" text="theta" xml:id="S3.I3.i3.p1.m2">
                  <XMath>
                    <XMTok font="italic" name="theta" role="UNKNOWN">θ</XMTok>
                  </XMath>
                </Math>, which the helper agent aims to optimize;</p>
            </para>
          </item>
          <item xml:id="S3.I3.i4">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">4th item</tag>
            </tags>
            <para xml:id="S3.I3.i4.p1">
              <p><Math mode="inline" tex="r^{a}_{\theta}" text="(r ^ a) _ theta" xml:id="S3.I3.i4.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">r</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                      </XMApp>
                      <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                    </XMApp>
                  </XMath>
                </Math> encodes the <emph font="italic">helper agent’s action costs</emph> for instance <Math mode="inline" tex="\theta" text="theta" xml:id="S3.I3.i4.p1.m2">
                  <XMath>
                    <XMTok font="italic" name="theta" role="UNKNOWN">θ</XMTok>
                  </XMath>
                </Math>, which the helper agent aims to optimize;</p>
            </para>
          </item>
          <item xml:id="S3.I3.i5">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">5th item</tag>
            </tags>
            <para xml:id="S3.I3.i5.p1">
              <p><Math mode="inline" tex="\beta(\cdot)" text="beta * cdot" xml:id="S3.I3.i5.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                      <XMDual>
                        <XMRef idref="S3.I3.i5.p1.m1.1"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMTok name="cdot" role="MULOP" xml:id="S3.I3.i5.p1.m1.1">⋅</XMTok>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMath>
                </Math> defines the <emph font="italic">cost of querying</emph> (in time, resources, or complexity) either the oracle, to obtain new causal information, or the user, to learn about the current objectives and preferences, i.e., <Math mode="inline" tex="r^{u}_{\theta}" text="(r ^ u) _ theta" xml:id="S3.I3.i5.p1.m2">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">r</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                      </XMApp>
                      <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                    </XMApp>
                  </XMath>
                </Math>.</p>
            </para>
          </item>
        </itemize>
      </para>
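      <para>
        <p>As a minimal sketch of the ingredients listed above, the following Python fragment represents the two action spaces and part of the problem-instance tuple as plain data classes. The class names (ActionSpace, ProblemInstance), the query labels (ask_oracle, ask_user, ask_agent) and all numeric values are illustrative assumptions rather than part of the framework; costs are assumed to be encoded as non-positive rewards.</p>
        <verbatim font="typewriter">
```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class ActionSpace:
    """One party's action space, split into A_act and A_query."""
    act: FrozenSet[str]    # environment-oriented actions (open, pick, place)
    query: FrozenSet[str]  # language queries to another party

    def union(self) -> FrozenSet[str]:
        # A = A_act union A_query
        return self.act | self.query

    def is_query(self, action: str) -> bool:
        return action in self.query


@dataclass(frozen=True)
class ProblemInstance:
    """Hypothetical container for part of the per-instance tuple."""
    agent_actions: ActionSpace            # A^a
    user_actions: ActionSpace             # A^u
    r_user: Callable[[str, str], float]   # r^u_theta(s, a^u)
    r_agent: Callable[[str, str], float]  # r^a_theta(s, a^a)
    beta: Callable[[str], float]          # query cost beta(a^a)
    gamma: float = 0.95                   # discount factor (assumed value)


# Toy instantiation; every label and number below is an assumption.
agent_space = ActionSpace(
    act=frozenset({"open", "pick", "place"}),
    query=frozenset({"ask_oracle", "ask_user"}),
)
user_space = ActionSpace(
    act=frozenset({"open", "pick", "place"}),
    query=frozenset({"ask_agent"}),
)


def query_cost(action: str) -> float:
    # Only query actions incur the cost beta; environment actions do not.
    return -0.1 if agent_space.is_query(action) else 0.0


instance = ProblemInstance(
    agent_actions=agent_space,
    user_actions=user_space,
    r_user=lambda state, action: 1.0 if state == "goal" else 0.0,
    r_agent=lambda state, action: -0.01,
    beta=query_cost,
)
```
</verbatim>
        <p>Keeping the query actions in a separate set makes it straightforward to charge the query cost only when the agent addresses the oracle or the user.</p>
      </para>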
      <para xml:id="S3.SS0.SSS0.Px1.p4">
        <p>Crucially, the agent can <emph font="italic">explore</emph> the domain outside active tasks to refine <Math mode="inline" tex="\mathcal{R}_{O}" text="R _ O" xml:id="S3.SS0.SSS0.Px1.p4.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="caligraphic" role="UNKNOWN">R</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
              </XMApp>
            </XMath>
          </Math> (e.g., by performing experiments or asking domain-level questions). Any information gleaned is <emph font="italic">amortized</emph> across future tasks <Math mode="inline" tex="\theta_{j}" text="theta _ j" xml:id="S3.SS0.SSS0.Px1.p4.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" name="theta" role="UNKNOWN">θ</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">j</XMTok>
              </XMApp>
            </XMath>
          </Math>. This design enables <text font="bold">continual learning</text> of domain mechanics: as the agent accumulates causal knowledge (e.g., “a certain box can contain items of type <Math mode="inline" tex="T" text="T" xml:id="S3.SS0.SSS0.Px1.p4.m3">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">T</XMTok>
            </XMath>
          </Math>”), it improves performance in subsequent problem instances. More formally, this follows from assuming that, for each instance <Math mode="inline" tex="\theta_{j}" text="theta _ j" xml:id="S3.SS0.SSS0.Px1.p4.m4">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" name="theta" role="UNKNOWN">θ</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">j</XMTok>
              </XMApp>
            </XMath>
          </Math>, the specific instantiations of <Math mode="inline" tex="\mathcal{R}_{O}" text="R _ O" xml:id="S3.SS0.SSS0.Px1.p4.m5">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="caligraphic" role="UNKNOWN">R</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
              </XMApp>
            </XMath>
          </Math>, <Math mode="inline" tex="r^{u}_{\theta}" text="(r ^ u) _ theta" xml:id="S3.SS0.SSS0.Px1.p4.m6">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">r</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                </XMApp>
                <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
              </XMApp>
            </XMath>
          </Math> and <Math mode="inline" tex="r^{a}_{\theta}" text="(r ^ a) _ theta" xml:id="S3.SS0.SSS0.Px1.p4.m7">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">r</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                </XMApp>
                <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
              </XMApp>
            </XMath>
          </Math> are drawn from the same distribution.</p>
      </para>
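      <para>
        <p>The amortization argument can be illustrated with a small, purely hypothetical sketch: a store of causal rules that pays the query cost only the first time a rule is requested and reuses the answer in later task instances. The class CausalKnowledgeStore, the toy oracle and the cost value are assumptions made for illustration and do not describe an actual SCOOP component.</p>
        <verbatim font="typewriter">
```python
from typing import Callable, Dict


class CausalKnowledgeStore:
    """Hypothetical cache of causal rules shared across task instances."""

    def __init__(self, ask_oracle: Callable[[str], str], query_cost: float):
        self._ask_oracle = ask_oracle
        self._query_cost = query_cost
        self._rules: Dict[str, str] = {}
        self.total_cost = 0.0

    def lookup(self, question: str) -> str:
        # Pay the query cost only when the rule is not yet known.
        if question not in self._rules:
            self._rules[question] = self._ask_oracle(question)
            self.total_cost += self._query_cost
        return self._rules[question]


def toy_oracle(question: str) -> str:
    # Stand-in for the oracle; a real oracle would answer in natural language.
    known = {"what fits in the box?": "items of type T"}
    return known.get(question, "unknown")


store = CausalKnowledgeStore(toy_oracle, query_cost=0.5)

# Three task instances that all need the same causal rule.
answers = [store.lookup("what fits in the box?") for _ in range(3)]
cost_per_task = store.total_cost / len(answers)
```
</verbatim>
        <p>In this toy setting the oracle is queried once, so the cost attributed to each task shrinks as more instances reuse the rule.</p>
      </para>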
      <para xml:id="S3.SS0.SSS0.Px1.p5">
        <p>Ultimately, the helper agent’s objective function is:
<Math mode="inline" tex="\sum_{\theta}\sum_{t}^{T(\theta)}\gamma^{t+T(-\theta)}(r^{u}_{\theta}(s_{t},a^%&#10;{u}_{t})+r^{a}_{\theta}(s_{t},a^{a}_{t})+\beta(a^{a}_{t}))" text="(sum _ theta)@(((sum _ t) ^ (T * theta))@(gamma ^ (t + T * (- theta)) * ((r ^ u) _ theta * open-interval@(s _ t, (a ^ u) _ t) + (r ^ a) _ theta * open-interval@(s _ t, (a ^ a) _ t) + beta * (a ^ a) _ t)))" xml:id="S3.SS0.SSS0.Px1.p5.m1">
            <XMath>
              <XMApp>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok mathstyle="text" meaning="sum" role="SUMOP" scriptpos="post">∑</XMTok>
                  <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                </XMApp>
                <XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok mathstyle="text" meaning="sum" role="SUMOP" scriptpos="post">∑</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">T</XMTok>
                      <XMDual>
                        <XMRef idref="S3.SS0.SSS0.Px1.p5.m1.1"/>
                        <XMWrap>
                          <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                          <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN" xml:id="S3.SS0.SSS0.Px1.p5.m1.1">θ</XMTok>
                          <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" name="gamma" role="UNKNOWN">γ</XMTok>
                      <XMApp>
                        <XMTok fontsize="70%" meaning="plus" role="ADDOP">+</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">T</XMTok>
                          <XMDual>
                            <XMRef idref="S3.SS0.SSS0.Px1.p5.m1.2"/>
                            <XMWrap>
                              <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S3.SS0.SSS0.Px1.p5.m1.2">
                                <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                                <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                              </XMApp>
                              <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMDual>
                      <XMRef idref="S3.SS0.SSS0.Px1.p5.m1.3"/>
                      <XMWrap>
                        <XMTok role="OPEN" stretchy="false">(</XMTok>
                        <XMApp xml:id="S3.SS0.SSS0.Px1.p5.m1.3">
                          <XMTok meaning="plus" role="ADDOP">+</XMTok>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                                <XMTok font="italic" role="UNKNOWN">r</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                              </XMApp>
                              <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                            </XMApp>
                            <XMDual>
                              <XMApp>
                                <XMTok meaning="open-interval"/>
                                <XMRef idref="S3.SS0.SSS0.Px1.p5.m1.3.1"/>
                                <XMRef idref="S3.SS0.SSS0.Px1.p5.m1.3.2"/>
                              </XMApp>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">(</XMTok>
                                <XMApp xml:id="S3.SS0.SSS0.Px1.p5.m1.3.1">
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                  <XMTok font="italic" role="UNKNOWN">s</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                </XMApp>
                                <XMTok role="PUNCT">,</XMTok>
                                <XMApp xml:id="S3.SS0.SSS0.Px1.p5.m1.3.2">
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="italic" role="UNKNOWN">a</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">u</XMTok>
                                  </XMApp>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                                <XMTok font="italic" role="UNKNOWN">r</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                              </XMApp>
                              <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                            </XMApp>
                            <XMDual>
                              <XMApp>
                                <XMTok meaning="open-interval"/>
                                <XMRef idref="S3.SS0.SSS0.Px1.p5.m1.3.3"/>
                                <XMRef idref="S3.SS0.SSS0.Px1.p5.m1.3.4"/>
                              </XMApp>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">(</XMTok>
                                <XMApp xml:id="S3.SS0.SSS0.Px1.p5.m1.3.3">
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                  <XMTok font="italic" role="UNKNOWN">s</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                </XMApp>
                                <XMTok role="PUNCT">,</XMTok>
                                <XMApp xml:id="S3.SS0.SSS0.Px1.p5.m1.3.4">
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="italic" role="UNKNOWN">a</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                  </XMApp>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                            <XMDual>
                              <XMRef idref="S3.SS0.SSS0.Px1.p5.m1.3.5"/>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">(</XMTok>
                                <XMApp xml:id="S3.SS0.SSS0.Px1.p5.m1.3.5">
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="italic" role="UNKNOWN">a</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">a</XMTok>
                                  </XMApp>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                        </XMApp>
                        <XMTok role="CLOSE" stretchy="false">)</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>.
Balancing <emph font="italic">exploration</emph> (question-asking, active experimentation) and <emph font="italic">exploitation</emph> (leveraging current knowledge to solve tasks efficiently) is thus a central challenge in this social continual learning framework.</p>
      </para>
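      <para>
        <p>A minimal sketch of how this objective could be evaluated on logged trajectories is given below. It rests on several assumptions that the text does not fix: the time index starts at zero within each instance, the exponent offset is read as the number of steps consumed by earlier instances, and the action and query costs are already encoded as non-positive numbers so that the three terms can simply be added. The function name helper_objective and the example values are hypothetical.</p>
        <verbatim font="typewriter">
```python
from typing import List, Tuple

# One logged step: (r_user, r_agent, beta), each already evaluated
# at the state and actions of that time step.
Step = Tuple[float, float, float]


def helper_objective(episodes: List[List[Step]], gamma: float) -> float:
    """Discounted sum over task instances and over steps within each one."""
    total = 0.0
    elapsed = 0  # assumed offset: steps used by all earlier instances
    for steps in episodes:
        for t, (r_user, r_agent, beta) in enumerate(steps):
            total += gamma ** (t + elapsed) * (r_user + r_agent + beta)
        elapsed += len(steps)
    return total


# Two toy instances: the agent queries once (cost -0.5), then succeeds.
episodes = [
    [(0.0, -0.1, -0.5), (1.0, -0.1, 0.0)],
    [(1.0, -0.1, 0.0)],
]
value = helper_objective(episodes, gamma=0.9)
```
</verbatim>
        <p>Because the discount keeps accumulating across instances, a query paid early is weighed against rewards that arrive in later tasks, which is where the exploration-exploitation tension shows up numerically.</p>
      </para>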
    </paragraph>
  </section>
  <section inlist="toc" xml:id="S4">
    <tags>
      <tag>4</tag>
      <tag role="refnum">4</tag>
      <tag role="typerefnum">§4</tag>
    </tags>
    <title><tag close=" ">4</tag>Developmental Psychology-Inspired Tasks for Evaluating Causal Reasoning and Question-Making</title>
    <para xml:id="S4.p1">
      <p>Drawing from developmental psychology, we design tasks to evaluate causal reasoning and question-making skills in collaborative AI systems. Inspired by children’s learning behaviors, these tasks assess the agent’s ability to:</p>
    </para>
    <para xml:id="S4.p2">
      <p><text font="italic">Explore-Exploit Tradeoff</text>: Balance directed exploration against the use of known information to reduce uncertainty and achieve goals efficiently <cite class="ltx_citemacro_cite">[<bibref bibrefs="meder2021development" separator="," yyseparator=","/>]</cite>.
<text font="italic">“Why” and “What If” Questions</text>: Formulate meaningful hypotheses and evaluate counterfactual scenarios to refine causal models <cite class="ltx_citemacro_cite">[<bibref bibrefs="walker2020asking" separator="," yyseparator=","/>]</cite>.
<text font="italic">Epistemic Question Formulation</text>: Construct precise, goal-directed queries to address knowledge gaps efficiently <cite class="ltx_citemacro_cite">[<bibref bibrefs="ronfard2018question" separator="," yyseparator=","/>]</cite>.
<text font="italic">Causal Inference and Learning</text>: Infer causal relationships from incomplete sequences of observed events. For example, after observing that certain components drive a machine, the agent predicts outcomes without direct trial and error <cite class="ltx_citemacro_cite">[<bibref bibrefs="gopnik2001causal,shavlik2022contributions" separator="," yyseparator=","/>]</cite>.
<text font="italic">Generating Hypotheses from Confounded Evidence</text>: Generate interventions that are informative for resolving the structure of an ambiguous causal system <cite class="ltx_citemacro_cite">[<bibref bibrefs="gweon2008stretching" separator="," yyseparator=","/>]</cite>.
These tasks provide a multidimensional framework to benchmark AI systems, focusing on cognitive, linguistic, and social reasoning capabilities essential for dynamic, real-world collaboration.</p>
    </para>
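    <para>
      <p>One common way to make "addressing knowledge gaps efficiently" concrete is to score candidate questions by expected information gain. The following is a minimal sketch under stated assumptions, not a metric defined in this paper: hypotheses are discrete, answers are deterministic given the true hypothesis, and the names <text font="typewriter">entropy</text> and <text font="typewriter">expected_information_gain</text> are hypothetical.</p>
    </para>
    <para>
      <verbatim font="typewriter">
```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, partition):
    """Expected entropy reduction from asking one question.

    prior:     dict mapping hypothesis -> probability (sums to 1)
    partition: list of sets of hypotheses, one set per possible answer
               (assumption: the answer is deterministic given the hypothesis)
    """
    prior_entropy = entropy(prior.values())
    expected_posterior_entropy = 0.0
    for group in partition:
        mass = sum(prior[h] for h in group)  # probability of this answer
        if mass > 0:
            posterior = [prior[h] / mass for h in group]
            expected_posterior_entropy += mass * entropy(posterior)
    return prior_entropy - expected_posterior_entropy


# Toy example: four equally likely candidate causes (hypothetical names).
prior = {"h1": 0.25, "h2": 0.25, "h3": 0.25, "h4": 0.25}

# "Is it h1?" splits the hypotheses 1 vs 3.
eig_single = expected_information_gain(prior, [{"h1"}, {"h2", "h3", "h4"}])
# "Is it h1 or h2?" splits the hypotheses 2 vs 2.
eig_half = expected_information_gain(prior, [{"h1", "h2"}, {"h3", "h4"}])

print(round(eig_single, 3), round(eig_half, 3))
```
</verbatim>
    </para>
    <para>
      <p>In this toy setting the question that splits the hypotheses in half yields one bit, while the question that targets a single hypothesis yields roughly 0.81 bits, so an agent scored this way would be rewarded for the constraint-seeking question.</p>
    </para>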
  </section>
  <section inlist="toc" xml:id="S5">
    <tags>
      <tag>5</tag>
      <tag role="refnum">5</tag>
      <tag role="typerefnum">§5</tag>
    </tags>
    <title><tag close=" ">5</tag>Reference architectures</title>
    <subsection inlist="toc" xml:id="S5.SS1">
      <tags>
        <tag>5.1</tag>
        <tag role="refnum">5.1</tag>
        <tag role="typerefnum">§5.1</tag>
      </tags>
      <title><tag close=" ">5.1</tag>Base: Oracle-Aided ReAct</title>
      <para xml:id="S5.SS1.p1">
        <p>The base architecture extends the ReAct framework with actions that query the current state, ask the user about their preferences and objectives, and ask the oracle about environment mechanics. However, as noted in the literature, vanilla LLMs appear to struggle with complex planning <cite class="ltx_citemacro_cite">[<bibref bibrefs="katz2024thought" separator="," yyseparator=","/>]</cite> and with using declarative knowledge for decision making and execution <cite class="ltx_citemacro_cite">[<bibref bibrefs="arabi2024habit" separator="," yyseparator=","/>]</cite>.</p>
      </para>
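      <para>
        <p>The control flow of such an oracle-aided loop can be sketched as follows. This is an illustrative sketch only: the action names (<text font="typewriter">ask_user</text>, <text font="typewriter">ask_oracle</text>, <text font="typewriter">act</text>, <text font="typewriter">finish</text>), the <text font="typewriter">Step</text> record, and the scripted policy standing in for the LLM are all assumptions, not part of the ReAct framework or of the authors' implementation.</p>
      </para>
      <para>
        <verbatim font="typewriter">
```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str   # free-text reasoning produced before acting
    action: str    # "ask_user", "ask_oracle", "act", or "finish"
    argument: str  # question text or environment command


def run_episode(policy, tools, max_steps=10):
    """Alternate reasoning and acting until the policy emits "finish"."""
    trace = []  # list of (Step, observation) pairs fed back to the policy
    for _ in range(max_steps):
        step = policy(trace)
        if step.action == "finish":
            trace.append((step, "done"))
            break
        observation = tools[step.action](step.argument)
        trace.append((step, observation))
    return trace


def scripted_policy(trace):
    """Stand-in for the LLM: replays a fixed script, one step per call."""
    script = [
        Step("The goal is unknown.", "ask_user", "What should I achieve?"),
        Step("The mechanism is unknown.", "ask_oracle", "What powers the lamp?"),
        Step("Switch B should work.", "act", "press switch B"),
        Step("The lamp is on.", "finish", ""),
    ]
    return script[len(trace)]


# Toy user, oracle, and environment (all hypothetical).
tools = {
    "ask_user": lambda question: "I want the lamp on.",
    "ask_oracle": lambda question: "Switch B powers the lamp.",
    "act": lambda command: "lamp on" if command == "press switch B" else "no change",
}

trace = run_episode(scripted_policy, tools)
for step, observation in trace:
    print(step.action, "|", observation)
```
</verbatim>
      </para>
      <para>
        <p>In a real agent the policy would be an LLM call conditioned on the trace. The point of the sketch is only that user and oracle queries are ordinary actions in the same loop as environment actions.</p>
      </para>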
    </subsection>
    <subsection inlist="toc" xml:id="S5.SS2">
      <tags>
        <tag>5.2</tag>
        <tag role="refnum">5.2</tag>
        <tag role="typerefnum">§5.2</tag>
      </tags>
      <title><tag close=" ">5.2</tag>Advanced: ReAct Framework with Oracle-Aided Causal Reasoning</title>
      <para xml:id="S5.SS2.p1">
        <p>Building on the base architecture, the advanced ReAct (Reason + Act) <cite class="ltx_citemacro_cite">[<bibref bibrefs="yao2023" separator="," yyseparator=","/>]</cite> framework introduces information-gathering actions and extends its functionality with a specialized action, <text font="typewriter">CausalRefinementAndAction</text>. This action is invoked by the Large Language Model (LLM) when complex reasoning tasks are required, specifically for:
</p>
      </para>
      <para xml:id="S5.SS2.p2">
        <itemize xml:id="S5.I1">
          <item xml:id="S5.I1.i1">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">1st item</tag>
            </tags>
            <para xml:id="S5.I1.i1.p1">
              <p><text font="bold">Refining or updating</text> knowledge about user needs and the world’s mechanisms and states (causal model), or</p>
            </para>
          </item>
          <item xml:id="S5.I1.i2">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">2nd item</tag>
            </tags>
            <para xml:id="S5.I1.i2.p1">
              <p><text font="bold">Planning and executing</text> steps to achieve a specified goal.</p>
            </para>
          </item>
        </itemize>
      </para>
      <para xml:id="S5.SS2.p3">
        <p><text font="typewriter">CausalRefinementAndAction</text> integrates iterative causal knowledge management, utilizing both external oracle support (e.g., a domain expert or automated simulator) and established causal inference libraries such as <text font="italic">causal-learn</text>, <text font="italic">DoWhy</text>, and <text font="italic">Tetrad</text> <cite class="ltx_citemacro_cite">[<bibref bibrefs="zheng2024causal,sharma2020dowhy,tetrad" separator="," yyseparator=","/>]</cite>. Given a user’s prompt, current goal, and contextual information, the LLM initially maps relevant knowledge to a causal graph, which may be incomplete.</p>
      </para>
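      <para>
        <p>An incomplete causal graph of this kind can be represented by tracking, for each ordered pair of variables, whether a direct causal link is known to be present, known to be absent, or still unknown. The sketch below is a hypothetical data structure for illustration; it is not the graph representation used by causal-learn, DoWhy, or Tetrad, and the class and method names are assumptions.</p>
      </para>
      <para>
        <verbatim font="typewriter">
```python
from itertools import permutations


class PartialCausalGraph:
    """Directed graph whose edges may be present, absent, or unknown."""

    def __init__(self, variables):
        self.variables = list(variables)
        # Every ordered pair starts as "unknown" until evidence arrives.
        self.status = {
            pair: "unknown" for pair in permutations(self.variables, 2)
        }

    def set_edge(self, cause, effect, present):
        """Record an answer from the user, the oracle, or an intervention."""
        self.status[(cause, effect)] = "present" if present else "absent"

    def open_questions(self):
        """Ordered pairs that are still candidates for a refinement query."""
        return [pair for pair, s in self.status.items() if s == "unknown"]

    def parents(self, effect):
        """Known direct causes of `effect`."""
        return [
            cause
            for (cause, eff), s in self.status.items()
            if eff == effect and s == "present"
        ]


# Toy example with hypothetical variable names.
graph = PartialCausalGraph(["switch_a", "switch_b", "lamp"])
graph.set_edge("switch_b", "lamp", present=True)   # e.g. stated by the oracle
graph.set_edge("lamp", "switch_b", present=False)  # no reverse influence

print(graph.parents("lamp"))
print(len(graph.open_questions()))
```
</verbatim>
      </para>
      <para>
        <p>The list returned by <text font="typewriter">open_questions</text> is what a refinement step would draw its candidate queries from.</p>
      </para>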
      <para xml:id="S5.SS2.p4">
        <p>The agent estimates the expected value and cost of potential actions to refine its knowledge, using metrics such as Value of Information (VoI) or robust optimization criteria. Refinement actions include querying the user about preferences and goals, asking the oracle about specific causal links or effect sizes, or performing interventions. If the refinement is deemed beneficial (i.e., cost is below a threshold), the suggested strategy is executed. Based on responses from the user or oracle, or results of interventions, causal inference libraries update the graph and determine whether additional refinements are necessary.</p>
      </para>
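      <para>
        <p>The decision of whether to refine can be illustrated with the expected value of perfect information. This is a sketch under simplifying assumptions, not the authors' procedure: hypotheses are discrete, the queried party answers truthfully and completely, and the query is issued when its value exceeds its cost (one possible reading of the threshold rule above). All function and variable names are hypothetical.</p>
      </para>
      <para>
        <verbatim font="typewriter">
```python
def expected_utility(belief, utility, action):
    """Expected utility of `action` under the current belief."""
    return sum(p * utility[action][h] for h, p in belief.items())


def value_of_perfect_information(belief, utility):
    """Gain from learning the true hypothesis before choosing an action."""
    # Best the agent can do now: commit to one action for all hypotheses.
    best_now = max(expected_utility(belief, utility, a) for a in utility)
    # Best it can do after a perfect answer: pick the best action per hypothesis.
    best_informed = sum(
        p * max(utility[a][h] for a in utility) for h, p in belief.items()
    )
    return best_informed - best_now


def should_ask(belief, utility, cost):
    """Ask only if the information is worth more than it costs."""
    return value_of_perfect_information(belief, utility) > cost


# Toy problem: which switch powers the lamp? (hypothetical names)
utility = {
    "press_A": {"A_powers_lamp": 1.0, "B_powers_lamp": 0.0},
    "press_B": {"A_powers_lamp": 0.0, "B_powers_lamp": 1.0},
}
uncertain = {"A_powers_lamp": 0.5, "B_powers_lamp": 0.5}
confident = {"A_powers_lamp": 0.9, "B_powers_lamp": 0.1}

voi_uncertain = value_of_perfect_information(uncertain, utility)
ask_when_uncertain = should_ask(uncertain, utility, cost=0.2)
ask_when_confident = should_ask(confident, utility, cost=0.2)
print(voi_uncertain, ask_when_uncertain, ask_when_confident)
```
</verbatim>
      </para>
      <para>
        <p>With a uniform belief the answer is worth 0.5 utility units, so a query costing 0.2 is issued. Once the belief is 0.9 versus 0.1, the answer is worth only about 0.1 and the agent acts without asking.</p>
      </para>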
      <para xml:id="S5.SS2.p5">
        <p>Once the causal graph is sufficiently refined, the ReAct agent invokes planning routines, using libraries such as <text font="italic">PyCID</text> or a robust Markov Decision Process (MDP) solver, to derive policies or action sequences that maximize the likelihood of achieving the user’s goals under the current causal knowledge. This combination of LLM-driven reasoning, causal knowledge management, and decision-making ensures that advanced reasoning and information-gathering capabilities are activated when necessary, while maintaining the flexibility to handle diverse scenarios typical of LLMs.</p>
      </para>
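      <para>
        <p>To illustrate the planning step, the sketch below runs plain value iteration on a two-state MDP whose transition probabilities stand in for the refined causal knowledge. It is neither a robust MDP solver nor PyCID, and the states, actions, probabilities, and function names are hypothetical.</p>
      </para>
      <para>
        <verbatim font="typewriter">
```python
def q_value(state, action, values, transition, reward, gamma):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(
        prob * (reward(state, action, nxt) + gamma * values[nxt])
        for nxt, prob in transition(state, action).items()
    )


def value_iteration(states, actions, transition, reward, gamma=0.9, sweeps=100):
    """Synchronous value iteration followed by greedy policy extraction."""
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        values = {
            s: max(q_value(s, a, values, transition, reward, gamma) for a in actions)
            for s in states
        }
    policy = {
        s: max(actions, key=lambda a: q_value(s, a, values, transition, reward, gamma))
        for s in states
    }
    return values, policy


# Assumed causal knowledge: probability that each switch turns the lamp on.
SUCCESS = {"press_A": 0.1, "press_B": 0.9}


def transition(state, action):
    if state == "on":  # goal state is absorbing
        return {"on": 1.0}
    p = SUCCESS[action]
    return {"on": p, "off": 1.0 - p}


def reward(state, action, nxt):
    # Reward is given once, when the lamp switches from off to on.
    return 1.0 if (state == "off" and nxt == "on") else 0.0


values, policy = value_iteration(
    ["off", "on"], ["press_A", "press_B"], transition, reward
)
print(policy["off"], round(values["off"], 3))
```
</verbatim>
      </para>
      <para>
        <p>The resulting policy presses the switch that the causal model considers more likely to work. A robust solver would instead optimize against a set of plausible transition models rather than a single point estimate.</p>
      </para>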
    </subsection>
  </section>
  <bibliography bibstyle="plain" citestyle="numbers" files="bib-short" xml:id="bib">
    <title>References</title>
  </bibliography>
</document>
