SCOOP: A Framework for Proactive Collaboration and Social Continual Learning through Natural Language Interaction and Causal Reasoning

1 Introduction

With the advent of advanced generative AI, tasks in which users collaborate with AI helpers over rich multimodal and natural language information are becoming increasingly common. These settings often involve:

1. Sequential Collaborative Interactions: performing the task requires planning and executing sequences of joint interactions and independent actions by the user or the agent (e.g., collecting different elements and tools), while accounting for their consequences over multiple steps.

2. Costly Social Information Access: users and agents must manage limited resources when accessing information provided by other agents (e.g., descriptions of mechanisms, regulations, or user preferences and beliefs []).

3. Multimodal Complexity: tasks combine textual descriptions, visual representations, and structural information (e.g., house refurbishment plans).

4. Dynamic Environments: tasks evolve, with shifting states and requirements (e.g., bureaucratic processes, repair scenarios) [].

The simultaneous presence and interaction of these conditions motivate our work to formalize this setting and develop effective solutions. Moreover, these conditions reflect important aspects of human developmental processes [].

1.1 Example Scenarios

Example 1: Repairing and Shipping Items

A robot agent assists in repairing and shipping items to various international destinations. Tasks include adhering to specific repair requirements, customs regulations, packaging standards, and climate conditions. The agent collaborates with the user to clarify repair strategies and queries a natural language oracle for regulatory or technical information. During downtime, the agent explores tools and equipment to refine its understanding of repair techniques, improving future performance.

Example 2: Compiling Documents for Official Requests

An agent aids a user in compiling documents for official applications, such as visas or tax submissions. Tasks involve identifying required documents, extracting relevant information, and validating regulations via a natural language oracle. When ambiguities arise, the agent interacts with the user for clarification. In idle periods, it autonomously explores document templates and regulatory archives to improve performance across multiple problem instances.

2 Related Work

ReAct Framework: The ReAct (Reason + Act) framework [] integrates reasoning and acting capabilities into LLMs, enabling contextual decision-making and blending question generation with action-taking in collaborative environments.

Causal Reasoning in AI: Techniques like causal discovery [] and inference frameworks such as DoWhy [] provide essential tools for reasoning in dynamic, partially observable environments.

Integrating Causal Graphs with LLMs: Recent research integrates causal graphs with LLMs to enhance reasoning and decision-making. Embedding causal reasoning into LLM workflows enables systems to interpret environments, predict outcomes, and optimize actions. Frameworks combining LLMs with causal world models [] and studies on automating causal discovery [] highlight the potential of causal representation learning for dynamically constructing world models.

LLMs and Planning Systems: Combining LLMs with planning systems, such as symbolic planners or reinforcement learning frameworks, supports structured decision-making. Hybrid approaches integrate LLMs for language understanding and planners for task execution, excelling in complex scenarios like multi-agent collaboration and robotics [].

Causal Discovery and Question Generation: Integrating LLMs with causal discovery tools enhances reasoning in dynamic environments. For instance, PyWhy-LLM supports causal analysis [], while DoWhy-GCM facilitates inference in graphical causal models []. These tools enable LLMs to refine causal graphs and resolve ambiguities, closing the loop between knowledge acquisition and action.

Symbolic and Subsymbolic Integration: Hybrid architectures bridge symbolic causal reasoning with the subsymbolic capabilities of LLMs, combining graph-based reasoning with unstructured data processing for robust decision-making agents [].

Active Information Gathering: Research on active learning and information gain [] informs strategies for querying to maximize utility while balancing exploration and exploitation in interactive systems.

Human-in-the-Loop Systems: Human feedback integration highlights the value of interactive learning, enabling agents to dynamically query users and adapt to preferences [].

Developmental Psychology Insights: Insights from children’s causal learning [] inspire benchmarks to evaluate AI agents’ abilities in question generation and causal inference.

By synthesizing these elements, our framework advances adaptive, interactive AI systems at the intersection of causal reasoning, social learning, and multimodal interaction.

3 The SCOOP Framework: Social Continual Object-Oriented POMDP Formal Framework

We formalize our rich interactive setting as an object-oriented partially observable Markov decision process (OO-POMDP), extended to incorporate a linguistic world descriptor that generates natural language observations (e.g., the initial task description in natural language), multiple problem instances, a user with problem-specific objectives, and a natural language oracle that provides causal information. Let

$$D = (T_O, F_O, R_O, W_O)$$

denote a domain specification, where:

• $T_O$ is a set of object types (e.g., boxes, tools, containers);

• $F_O$ is the set of allowed features or predicates relevant to the domain (e.g., “contains(x,y)” or “isOpenable(x)”);

• $R_O$ is an evolving set of causal rules capturing how these features and object types interact (e.g., whether opening a container allows access to its contents), some of which may be unknown or partially specified;

• $W_O$ is the family of possible world configurations consistent with $T_O$, $F_O$, and $R_O$.
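As a minimal sketch of how a domain specification $D$ might be represented in code, the Python below stores object types, allowed predicates, and ground causal rules, and treats $W_O$ as a membership test rather than an explicit set. The class names, the string encoding of facts such as `Open(BoxA)`, and the restriction to ground (non-lifted) rules are illustrative assumptions, not part of the framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CausalRule:
    """Hypothetical ground rule: if every fact in `causes` holds, `effect` holds."""
    causes: frozenset[str]
    effect: str
    known: bool = True  # False marks a rule that is only hypothesized (partially specified R_O)


@dataclass
class DomainSpec:
    object_types: set[str]                                 # T_O
    predicates: set[str]                                   # F_O (predicate names)
    rules: list[CausalRule] = field(default_factory=list)  # R_O, may grow over time

    def is_consistent(self, world: set[str]) -> bool:
        """Membership test for W_O: `world` is the set of ground facts that hold."""
        # Every fact must use an allowed predicate, e.g. "contains(BoxA,ItemB)" -> "contains".
        if any(fact.split("(")[0] not in self.predicates for fact in world):
            return False
        # Every known rule whose causes all hold must also have its effect hold.
        return all(
            not (rule.known and rule.causes <= world and rule.effect not in world)
            for rule in self.rules
        )


domain = DomainSpec(
    object_types={"box", "item"},
    predicates={"Open", "Accessible", "contains"},
    rules=[CausalRule(frozenset({"Open(BoxA)", "contains(BoxA,ItemB)"}), "Accessible(ItemB)")],
)
print(domain.is_consistent({"Open(BoxA)", "contains(BoxA,ItemB)", "Accessible(ItemB)"}))  # True
print(domain.is_consistent({"Open(BoxA)", "contains(BoxA,ItemB)"}))  # False
```

Treating $W_O$ as a predicate keeps the sketch finite even when the number of consistent worlds is combinatorially large.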

A problem instance $\theta$ refines $D$ with a concrete set of objects, an initial state, and a user objective (defined by a problem-specific reward function $r_\theta$ that encodes the user’s goals). Across potentially many problem instances $\{\theta_i\}$, the helper agent interacts with both the user (asking questions to clarify the user’s goals or preferences, presenting results and suggestions, or performing other interactive actions) and a natural language oracle (to acquire missing causal rules or environment states). The oracle responds in multiple formats:

1. Language-based descriptions of environment dynamics, such as “box A must be opened before retrieving item B”;

2. Formal causal chunks, where the oracle may directly provide rules or graph fragments (e.g., “node Open(Box) causes Accessible(Item)”);

3. Observation-like feedback, akin to sensor readings or state confirmations.
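One way to keep the three response formats apart is a small tagged union, sketched below in Python: free text is kept for the LLM to interpret, while causal chunks and observation-like feedback update structured stores. The type names and this routing policy are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class LanguageDescription:  # format 1: free-text dynamics
    text: str


@dataclass
class CausalChunk:  # format 2: a fragment of the causal graph
    edges: list[tuple[str, str]]  # (cause, effect) pairs


@dataclass
class ObservationFeedback:  # format 3: state confirmations
    facts: dict[str, bool]


OracleResponse = Union[LanguageDescription, CausalChunk, ObservationFeedback]


@dataclass
class AgentKnowledge:
    notes: list[str] = field(default_factory=list)            # text kept for the LLM context
    edges: set[tuple[str, str]] = field(default_factory=set)  # structured causal links
    state: dict[str, bool] = field(default_factory=dict)      # confirmed state facts


def integrate(knowledge: AgentKnowledge, response: OracleResponse) -> None:
    """Route each oracle response format to the matching part of the agent's knowledge."""
    if isinstance(response, LanguageDescription):
        knowledge.notes.append(response.text)  # left for the LLM to interpret
    elif isinstance(response, CausalChunk):
        knowledge.edges.update(response.edges)
    elif isinstance(response, ObservationFeedback):
        knowledge.state.update(response.facts)
    else:
        raise TypeError(f"unsupported oracle response: {response!r}")


k = AgentKnowledge()
integrate(k, LanguageDescription("box A must be opened before retrieving item B"))
integrate(k, CausalChunk([("Open(Box)", "Accessible(Item)")]))
integrate(k, ObservationFeedback({"Open(BoxA)": False}))
```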

The overall action space $A = A^a \cup A^u$ is composed of the agent’s action space $A^a$ and the user’s action space $A^u$. The agent’s action space is $A^a = A^a_{\text{act}} \cup A^a_{\text{query}}$, where $A^a_{\text{act}}$ includes environment-oriented actions (e.g., open, pick, place) and $A^a_{\text{query}}$ includes queries to the oracle or the user. The agent also observes the user’s feedback or clarifications regarding the task objectives and environment states. Similarly, the user’s action space is $A^u = A^u_{\text{act}} \cup A^u_{\text{query}}$, where $A^u_{\text{act}}$ includes environment-oriented actions (e.g., open, pick, place) and $A^u_{\text{query}}$ includes queries to the agent. Formally, each problem instance is modeled as:

$$(S_\theta, A, \Omega_\theta, T_\theta, O_\theta, r^u_\theta, r^a_\theta, \gamma, \beta),$$

mirroring a POMDP with the following modifications:

• $S_\theta$ embeds the object-based states from $\theta$ and any partial knowledge of the causal rules $R_O$;

• $\Omega_\theta$ is the space of possible observations, spanning both environmental signals (e.g., sensor readings, gripper state, environment map chunks) and language-based responses from the user or oracle;

• $r^u_\theta$ encodes the user’s objectives for instance $\theta$, which the helper agent aims to optimize;

• $r^a_\theta$ encodes the helper agent’s action costs for instance $\theta$, which the helper agent also aims to optimize;

• $\beta(\cdot)$ defines the cost (in time, resources, or complexity) of querying the oracle for new causal information, or the user about their current objectives and preferences, i.e., $r^u_\theta$.
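The instance tuple can be sketched as a Python container with a single `step` method that applies a joint agent/user action. Two assumptions go beyond the text: the agent and the user act simultaneously at each step, and $\beta$ and the agent's action costs are expressed as negative numbers so that all reward terms are simply summed. Names such as `ProblemInstance` and `step` are hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Callable, Hashable, Optional

State = Hashable
Action = Optional[str]  # None stands for "no action this step"
Obs = Hashable
Dist = dict  # maps outcomes to probabilities


def sample(dist: Dist, rng: random.Random):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


@dataclass
class ProblemInstance:
    transition: Callable[[State, Action, Action], Dist]  # T_theta(s' | s, a^a, a^u)
    observe: Callable[[State, Action], Dist]             # O_theta(o | s', a^a)
    user_reward: Callable[[State, Action], float]        # r^u_theta(s, a^u)
    agent_reward: Callable[[State, Action], float]       # r^a_theta(s, a^a)
    query_cost: Callable[[Action], float]                # beta(a^a), as a negative value
    gamma: float

    def step(self, s: State, a_agent: Action, a_user: Action, rng: random.Random):
        """Return (next state, observation, summed reward) for one joint step."""
        reward = self.user_reward(s, a_user) + self.agent_reward(s, a_agent) + self.query_cost(a_agent)
        s_next = sample(self.transition(s, a_agent, a_user), rng)
        obs = sample(self.observe(s_next, a_agent), rng)
        return s_next, obs, reward


# Toy instance: a box that the agent can open; the user is rewarded once it is open.
box = ProblemInstance(
    transition=lambda s, aa, au: {"open": 1.0} if aa == "open" else {s: 1.0},
    observe=lambda s, aa: {s: 1.0},  # fully observable, for brevity
    user_reward=lambda s, au: 1.0 if s == "open" else 0.0,
    agent_reward=lambda s, aa: -0.1 if aa == "open" else 0.0,
    query_cost=lambda aa: -0.5 if aa == "ask_oracle" else 0.0,
    gamma=0.95,
)
print(box.step("closed", "open", None, random.Random(0)))
```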

Crucially, the agent can explore the domain outside active tasks to refine $R_O$ (e.g., by performing experiments or asking domain-level questions). Any information gleaned is amortized across future tasks $\theta_j$. This design enables continual learning of domain mechanics: as the agent accumulates causal knowledge (e.g., “a certain box can contain items of type $T$”), it improves its performance in subsequent problem instances. More formally, this follows from assuming that, for each instance $\theta_j$, the specific instantiations of $R_O$, $r^u_\theta$, and $r^a_\theta$ are drawn from the same distribution.
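Why amortization pays off can be illustrated with a toy count of query costs: if every instance needs a subset of rules from the same shared pool, each rule is paid for at most once. This is a sketch under simplifying assumptions (one fixed cost per rule, perfectly reliable oracle answers, rules that do not change); the rule names are hypothetical, and the optional `known` argument stands in for rules learned during idle-time exploration.

```python
import random


def query_costs_per_instance(instances, cost_per_rule, known=None):
    """Charge a query only for rules not yet known; knowledge persists across instances."""
    known = set() if known is None else set(known)
    costs = []
    for required_rules in instances:
        missing = required_rules - known
        costs.append(len(missing) * cost_per_rule)
        known |= missing  # rules learned in one instance are reused in later ones
    return costs


# Instances draw their required rules from one shared (hypothetical) rule pool.
rng = random.Random(42)
rule_pool = ["open_before_pick", "fragile_needs_padding", "battery_not_airmail", "customs_form_eu"]
instances = [set(rng.sample(rule_pool, 2)) for _ in range(6)]
print(query_costs_per_instance(instances, cost_per_rule=1.0))
```

Because the total can never exceed the size of the pool, the per-instance cost shrinks as more instances are solved.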

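Ultimately, the helper agent’s objective function is

$$\sum_{\theta} \sum_{t}^{T(\theta)} \gamma^{\,t + T(\theta^{-})} \left( r^u_\theta(s_t, a^u_t) + r^a_\theta(s_t, a^a_t) + \beta(a^a_t) \right),$$

where $T(\theta)$ is the number of steps in instance $\theta$ and $T(\theta^{-})$ is the number of steps taken in the instances preceding $\theta$. Balancing exploration (question-asking, active experimentation) and exploitation (leveraging current knowledge to solve tasks efficiently) is thus a central challenge in this social continual learning framework.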
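Under the reading of the objective in which the discount keeps running across instances, the objective can be computed from logged per-step terms as below. Starting each instance at $t = 0$, and storing the query cost as a negative number so that the three terms are added, are assumptions of this sketch.

```python
def social_continual_return(instances, gamma):
    """Discounted return over a sequence of problem instances.

    `instances` is a list of trajectories; each trajectory is a list of
    (r_user, r_agent, beta) tuples, one per time step t = 0, 1, ...
    The discount exponent continues across instances: it is t plus the
    number of steps already taken in earlier instances.
    """
    total = 0.0
    elapsed = 0  # steps taken in earlier instances
    for trajectory in instances:
        for t, (r_user, r_agent, beta) in enumerate(trajectory):
            total += gamma ** (t + elapsed) * (r_user + r_agent + beta)
        elapsed += len(trajectory)
    return total


episodes = [
    [(1.0, -0.1, 0.0), (0.0, 0.0, -0.5)],  # instance 1: act, then a costly query
    [(2.0, 0.0, 0.0)],                     # instance 2: discounted by gamma**2
]
print(social_continual_return(episodes, gamma=0.5))  # ≈ 1.15
```

Running the discount across instances means that time spent querying in early instances also delays, and therefore discounts, the rewards of later ones.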

4 Developmental Psychology-Inspired Tasks for Evaluating Causal Reasoning and Question-Making

Drawing from developmental psychology, we design tasks to evaluate causal reasoning and question-making skills in collaborative AI systems. Inspired by children’s learning behaviors, these tasks assess the agent’s ability to:

• Explore-Exploit Tradeoff: Balance directed exploration against the use of known information to reduce uncertainty and achieve goals efficiently [].

• “Why” and “What If” Questions: Formulate meaningful hypotheses and evaluate counterfactual scenarios to refine causal models [].

• Epistemic Question Formulation: Construct precise, goal-directed queries that address knowledge gaps efficiently [].

• Causal Inference and Learning: Observe incomplete sequences of events and infer the causal relationships behind them; for example, after observing that certain components drive a machine, the agent predicts outcomes without direct trial-and-error [].

• Generating Hypotheses from Confounded Evidence: Generate interventions that are informative for resolving the structure of an ambiguous causal system [].

These tasks provide a multidimensional framework to benchmark AI systems, focusing on the cognitive, linguistic, and social reasoning capabilities essential for dynamic, real-world collaboration.
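The confounded-evidence task rewards interventions that discriminate between candidate causal structures. A common way to score this, used here as an illustrative assumption rather than the benchmark's prescribed metric, is expected information gain over a discrete set of hypotheses. The sketch assumes deterministic predictions; the hypotheses and outcome labels are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, predictions, intervention):
    """Expected entropy reduction over hypotheses after running `intervention`.

    prior: {hypothesis: probability}
    predictions: {hypothesis: {intervention: predicted outcome}} (deterministic)
    """
    by_outcome = defaultdict(dict)  # outcome -> {hypothesis: prior mass}
    for h, p in prior.items():
        by_outcome[predictions[h][intervention]][h] = p
    expected_posterior_entropy = 0.0
    for masses in by_outcome.values():
        p_outcome = sum(masses.values())
        posterior = [p / p_outcome for p in masses.values()]
        expected_posterior_entropy += p_outcome * entropy(posterior)
    return entropy(prior.values()) - expected_posterior_entropy


# Correlated A and B: A causes B, B causes A, or a hidden confounder C drives both.
prior = {"A->B": 1 / 3, "B->A": 1 / 3, "C->A,B": 1 / 3}
predictions = {
    "A->B":   {"observe": "correlated", "do(A)": "B moves", "do(B)": "A still"},
    "B->A":   {"observe": "correlated", "do(A)": "B still", "do(B)": "A moves"},
    "C->A,B": {"observe": "correlated", "do(A)": "B still", "do(B)": "A still"},
}
scores = {i: expected_information_gain(prior, predictions, i) for i in ("observe", "do(A)", "do(B)")}
best = max(scores, key=scores.get)
print(scores, best)
```

Passive observation scores zero because every hypothesis predicts the same correlation, whereas either intervention separates one hypothesis from the other two.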

5 Reference Architectures

5.1 Base: Oracle-Aided ReAct

The base architecture simply extends the ReAct framework by introducing actions that query the user about their preferences and objectives, and the oracle about environment mechanics and states. However, as noted in the literature, complex planning [] and the use of declarative knowledge for decision-making and execution [] appear to be difficult for vanilla LLMs.
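A minimal sketch of this base loop treats `ask_user` and `ask_oracle` as ordinary tools alongside environment actions. The `policy` function stands in for prompting an LLM and parsing its thought and action; the tool names and the scripted responses are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, str, str]  # (thought, action, argument, observation)
Policy = Callable[[List[Step]], Tuple[str, str, str]]


def oracle_aided_react(policy: Policy, tools: Dict[str, Callable[[str], str]],
                       max_steps: int = 10) -> Tuple[Optional[str], List[Step]]:
    """Thought -> action -> observation loop in which asking the user or the
    oracle is just another tool call. Returns (final answer or None, history)."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(history)  # stands in for an LLM call
        if action == "finish":
            return argument, history
        observation = tools[action](argument)
        history.append((thought, action, argument, observation))
    return None, history


# Scripted stand-in for the LLM, for illustration only.
def scripted_policy(history):
    if not history:
        return ("I do not know how to reach item B.", "ask_oracle", "How do I retrieve item B?")
    if len(history) == 1:
        return ("The oracle says box A must be opened first.", "act", "open box A")
    return ("Box A is open; item B is reachable.", "finish", "item B retrieved")


tools = {
    "ask_oracle": lambda q: "Box A must be opened before retrieving item B.",
    "ask_user": lambda q: "Please prioritize fragile items.",
    "act": lambda a: f"done: {a}",
}
answer, trace = oracle_aided_react(scripted_policy, tools)
print(answer, [step[1] for step in trace])
```

In this base design nothing beyond the LLM itself decides when a query is worth its cost, which is the gap the advanced architecture targets.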

5.2 Advanced: ReAct Framework with Oracle-Aided Causal Reasoning

Building on the base architecture, the advanced ReAct (Reason + Act) [] framework introduces information-gathering actions and extends its functionality with a specialized action, CausalRefinementAndAction. This action is invoked by the Large Language Model (LLM) when complex reasoning tasks are required, specifically for:

• Refining or updating knowledge about user needs and the world’s mechanisms and states (the causal model), or

• Planning and executing steps to achieve a specified goal.

CausalRefinementAndAction integrates iterative causal knowledge management, utilizing both external oracle support (e.g., a domain expert or automated simulator) and established causal inference libraries such as causal-learn, DoWhy, and Tetrad []. Given a user’s prompt, current goal, and contextual information, the LLM initially maps relevant knowledge to a causal graph, which may be incomplete.
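The possibly incomplete causal graph that the LLM starts from can be represented with three edge states, so that unknown edges double as the list of candidate refinements. This is a library-free sketch rather than the causal-learn, DoWhy, or Tetrad data structures; the class and method names are hypothetical.

```python
from itertools import permutations


class PartialCausalGraph:
    """Causal graph in which each directed edge is present, absent, or unknown.

    Edges missing from `self.edges` are unknown and are candidates for
    refinement (a query to the user or oracle, or an intervention).
    """

    def __init__(self, variables):
        self.variables = list(variables)
        self.edges = {}  # (cause, effect) -> True (present) or False (absent)

    def set_edge(self, cause, effect, present=True):
        self.edges[(cause, effect)] = present

    def unknown_edges(self):
        return [pair for pair in permutations(self.variables, 2) if pair not in self.edges]

    def parents(self, variable):
        return [c for (c, e), present in self.edges.items() if e == variable and present]


g = PartialCausalGraph(["Open(Box)", "Accessible(Item)", "Retrieved(Item)"])
g.set_edge("Open(Box)", "Accessible(Item)")         # e.g., from an oracle causal chunk
g.set_edge("Accessible(Item)", "Open(Box)", False)  # ruled out
print(len(g.unknown_edges()))  # 4 of the 6 possible directed edges remain unknown
```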

The agent estimates the expected value and cost of potential actions to refine its knowledge, using metrics such as Value of Information (VoI) or robust optimization criteria. Refinement actions include querying the user about preferences and goals, asking the oracle about specific causal links or effect sizes, or performing interventions. If a refinement is deemed beneficial (i.e., its cost is below a threshold), the suggested strategy is executed. Based on the responses from the user or oracle, or the results of interventions, causal inference libraries update the graph and determine whether additional refinements are necessary.
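Taking the VoI as the threshold against which a refinement's cost is compared (one reasonable instantiation, not necessarily the intended one), the decision reduces to computing the expected value of perfect information over the agent's current belief. The sketch assumes a discrete belief over hypotheses and a known utility table; the hypotheses and numbers are hypothetical.

```python
def expected_value_of_perfect_information(belief, utility):
    """EVPI for resolving an uncertain hypothesis before choosing an action.

    belief:  {hypothesis: probability}
    utility: {action: {hypothesis: value of taking action if hypothesis is true}}
    """
    # Best expected value if the agent must act now, without asking.
    act_now = max(sum(belief[h] * values[h] for h in belief) for values in utility.values())
    # Expected value if the agent first learns the true hypothesis, then acts.
    act_informed = sum(p * max(values[h] for values in utility.values()) for h, p in belief.items())
    return act_informed - act_now


def should_refine(belief, utility, query_cost):
    """Refine only when the query costs less than the information is worth."""
    return query_cost < expected_value_of_perfect_information(belief, utility)


belief = {"box_closed": 0.5, "box_open": 0.5}
utility = {
    "open_then_pick": {"box_closed": 1.0, "box_open": 0.6},  # wastes effort if already open
    "pick_directly":  {"box_closed": 0.0, "box_open": 1.0},  # fails if the box is closed
}
print(expected_value_of_perfect_information(belief, utility))  # ≈ 0.2
print(should_refine(belief, utility, query_cost=0.1))  # True
print(should_refine(belief, utility, query_cost=0.3))  # False
```

When the belief is already certain, the EVPI drops to zero and no query is worth paying for.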

Once the causal graph is sufficiently refined, the ReAct agent invokes planning routines, using libraries such as PyCID or a robust Markov decision process (MDP) solver, to derive policies or action sequences that maximize the likelihood of achieving the user’s goals under the current causal knowledge. This combination of LLM-driven reasoning, causal knowledge management, and decision-making ensures that advanced reasoning and information-gathering capabilities are activated when necessary, while preserving the flexibility of LLMs in handling diverse scenarios.
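As a stand-in for the planning step, the sketch below runs plain value iteration on a small MDP whose transitions encode the learned rule that opening the box makes the item accessible. It is neither PyCID nor a robust MDP solver (which would optimize against a set of plausible models); the states, actions, and rewards are hypothetical.

```python
def value_iteration(states, actions, transition, reward, gamma=0.95, tol=1e-9):
    """Plain value iteration for a finite MDP.

    actions(s) -> non-empty list of actions available in s
    transition(s, a) -> {next_state: probability}
    reward(s, a) -> float
    Returns (V, policy).
    """
    def q(s, a, V):
        return reward(s, a) + gamma * sum(p * V[s2] for s2, p in transition(s, a).items())

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a, V) for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    policy = {s: max(actions(s), key=lambda a: q(s, a, V)) for s in states}
    return V, policy


# Toy MDP built from the learned rule Open(Box) -> Accessible(Item).
ACTIONS = {"closed": ["open", "pick"], "open": ["pick"], "done": ["stay"]}
NEXT = {("closed", "open"): "open", ("closed", "pick"): "closed",  # picking fails while closed
        ("open", "pick"): "done", ("done", "stay"): "done"}
REWARD = {("open", "pick"): 1.0, ("closed", "open"): -0.1}

V, policy = value_iteration(
    states=list(ACTIONS),
    actions=lambda s: ACTIONS[s],
    transition=lambda s, a: {NEXT[(s, a)]: 1.0},
    reward=lambda s, a: REWARD.get((s, a), 0.0),
)
print(policy)  # {'closed': 'open', 'open': 'pick', 'done': 'stay'}
```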

References