<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/japhy/scienceReplication.artiswrong.com/paper_files/arxiv/2601.08848/latex_extracted"?>
<?latexml class="article" options="11pt"?>
<?latexml package="amsmath,amssymb"?>
<?latexml package="acl" options="final"?>
<?latexml package="adjustbox"?>
<?latexml package="times"?>
<?latexml package="latexsym"?>
<?latexml package="booktabs"?>
<?latexml package="siunitx"?>
<?latexml package="enumitem"?>
<?latexml package="tikz"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<?latexml package="fontenc" options="T1"?>
<?latexml package="inputenc" options="utf8"?>
<?latexml package="graphicx"?>
<?latexml package="microtype"?>
<?latexml package="svg"?>
<?latexml package="inconsolata"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML" class="ltx_authors_1line">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <title>PediaMind-R1: A Temperament-Aware Language Model for Personalized Early Childhood Care
Reasoning via Cognitive Modeling and Preference Alignment</title>
  <creator role="author">
    <personname>
Zihe Zhang<sup>1</sup>,
Can Zhang<sup>1</sup>,
Yanheng Xu<sup>2</sup>,
Xin Hu<sup>1</sup>,
Jichao Leng<sup>1</sup> <break/><sup>1</sup>School of Future Information and Innovation, Fudan University, Shanghai, China <break/><sup>2</sup>Corporate Research, Bosch (China) Investment Ltd., Shanghai, China <break/></personname>
  </creator>
  <abstract name="Abstract">
    <p>This paper presents PediaMind-R1, a domain-specialized large language model designed to achieve active personalization in intelligent parenting scenarios. Unlike conventional systems that provide generic suggestions, PediaMind-R1 draws on insights from developmental psychology. It introduces temperament theory from the Thomas–Chess framework and builds a temperament knowledge graph for infants and toddlers (0–3 years). Our two-stage training pipeline first uses supervised fine-tuning to teach structured chain-of-thought reasoning, and then applies a GRPO-based alignment stage to reinforce logical consistency, domain expertise, and empathetic caregiving strategies. We further design an evaluation framework comprising temperament-sensitive multiple-choice tests and human assessments. The results demonstrate that PediaMind-R1 can accurately interpret early childhood temperament profiles and proactively engage in individualized reasoning. This work highlights the value of integrating vertical-domain modeling with psychological theory. It offers a novel approach to developing user-centered LLMs that advance the practice of active personalization in sensitive caregiving contexts.</p>
  </abstract>
  <section inlist="toc" xml:id="S1">
    <tags>
      <tag>1</tag>
      <tag role="autoref">section 1</tag>
      <tag role="refnum">1</tag>
      <tag role="typerefnum">§1</tag>
    </tags>
    <title><tag close=" ">1</tag>Introduction</title>
    <para xml:id="S1.p1">
      <p>Large language models (LLMs) have shown strong general performance across diverse tasks. However, most are designed for generic usage and lack the ability to adapt to individual users. Both active and passive personalization, whether guided by user input or inferred from interaction history, remain underdeveloped, with limited ability to condition responses on structured user characteristics.</p>
    </para>
    <para xml:id="S1.p2">
      <p>Personalization is especially critical in domains such as parenting and infant care, where individual needs vary widely and generic suggestions may be insufficient. Developmental psychology has long emphasized that caregiving strategies tailored to a child’s temperament, including traits such as adaptability and emotional intensity, can significantly impact long-term developmental outcomes. The Thomas–Chess temperament model, for example, categorizes infants into structured types such as “easy,” “difficult,” and “slow-to-warm-up,” offering a psychologically grounded basis for individualization <cite class="ltx_citemacro_citep">(<bibref bibrefs="thomas1977temperament,carey2004temperament" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>.</p>
    </para>
    <para xml:id="S1.p3">
      <p>In this work, we propose PediaMind-R1, a domain-specialized LLM for active personalization in early childhood care. Building on the Thomas–Chess framework, we construct a temperament knowledge graph and condition model outputs on temperament labels to deliver individualized caregiving strategies. Our two-stage training pipeline combines supervised fine-tuning (SFT) <cite class="ltx_citemacro_citep">(<bibref bibrefs="hu2022lora" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite> for structured reasoning with Group Relative Policy Optimization (GRPO) <cite class="ltx_citemacro_citep">(<bibref bibrefs="zhang2025r1vl" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite>, which reinforces responses that score above their group average. To evaluate effectiveness, we design a temperament-sensitive framework with scenario-based multiple-choice tests and expert assessments; the results confirm the model’s ability to interpret temperament and provide personalized, empathetic recommendations.</p>
    </para>
    <para xml:id="S1.p4">
      <p>Our contributions are threefold:</p>
      <itemize xml:id="S1.I1">
        <item xml:id="S1.I1.i1">
          <tags>
            <tag>•</tag>
            <tag role="autoref">item </tag>
            <tag role="typerefnum">1st item</tag>
          </tags>
          <para xml:id="S1.I1.i1.p1">
            <p><text font="bold">Activating LLM Personalization via Psychological Temperament Modeling:</text>
We leverage temperament traits from the Thomas–Chess framework to explicitly model psychological profiles, thereby activating personalized reasoning in early childhood care and aligning LLM outputs with children’s unique developmental needs.
</p>
          </para>
        </item>
        <item xml:id="S1.I1.i2">
          <tags>
            <tag>•</tag>
            <tag role="autoref">item </tag>
            <tag role="typerefnum">2nd item</tag>
          </tags>
          <para xml:id="S1.I1.i2.p1">
            <p><text font="bold">Modeling Temperament-Aware Reasoning via SFT and GRPO:</text>
We apply SFT followed by GRPO to embed temperament-sensitive reasoning into the LLM, combining structured logic with preference alignment grounded in developmental psychology.</p>
          </para>
        </item>
        <item xml:id="S1.I1.i3">
          <tags>
            <tag>•</tag>
            <tag role="autoref">item </tag>
            <tag role="typerefnum">3rd item</tag>
          </tags>
          <para xml:id="S1.I1.i3.p1">
            <p><text font="bold">Temperament-Sensitive Evaluation:</text>
We propose an evaluation scheme using multiple-choice benchmarks and expert assessments to capture both factual accuracy and psychological appropriateness.</p>
          </para>
        </item>
      </itemize>
    </para>
    <para xml:id="S1.p5">
      <p>Although newer temperament frameworks exist, we adopt the classical Thomas–Chess model <cite class="ltx_citemacro_citep">(<bibref bibrefs="thomas1977temperament" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite> as a widely recognized baseline to validate our methodology.</p>
    </para>
  </section>
  <section inlist="toc" xml:id="S2">
    <tags>
      <tag>2</tag>
      <tag role="autoref">section 2</tag>
      <tag role="refnum">2</tag>
      <tag role="typerefnum">§2</tag>
    </tags>
    <title><tag close=" ">2</tag>Related Work &amp; Motivation</title>
    <subsection inlist="toc" xml:id="S2.SS1">
      <tags>
        <tag>2.1</tag>
        <tag role="autoref">subsection 2.1</tag>
        <tag role="refnum">2.1</tag>
        <tag role="typerefnum">§2.1</tag>
      </tags>
      <title><tag close=" ">2.1</tag>Infant Temperament as a Personalization Signal</title>
      <para xml:id="S2.SS1.p1">
        <p>Most personalization strategies in artificial intelligence assume that users can explicitly articulate their needs. However, in domains such as infant care, the end user—the infant—lacks communicative agency, necessitating proxy-driven personalization. Among psychological frameworks, the temperament theory proposed by Thomas and Chess <cite class="ltx_citemacro_citep">(<bibref bibrefs="thomas1977temperament" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite> is particularly influential. It categorizes infants based on observable traits such as adaptability, activity level, and emotional intensity.</p>
      </para>
      <para xml:id="S2.SS1.p2">
        <p>These temperament classifications have demonstrated predictive value for long-term developmental outcomes and are widely used to inform parenting decisions <cite class="ltx_citemacro_citep">(<bibref bibrefs="carey2004temperament" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>. Instruments like the Infant Temperament Questionnaire (ITQ) offer a structured way for caregivers to assess these traits. In this work, we use these traits as personalization signals, conditioning LLM reasoning on temperament profiles to support individualized parenting strategies.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S2.SS2">
      <tags>
        <tag>2.2</tag>
        <tag role="autoref">subsection 2.2</tag>
        <tag role="refnum">2.2</tag>
        <tag role="typerefnum">§2.2</tag>
      </tags>
      <title><tag close=" ">2.2</tag>Personalizing LLMs via Human-Guided Reward Alignment</title>
      <para xml:id="S2.SS2.p1">
        <p>Prior efforts in LLM personalization include user embedding approaches <cite class="ltx_citemacro_citep">(<bibref bibrefs="madotto2019personalizing" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>, in-context learning paradigms <cite class="ltx_citemacro_citep">(<bibref bibrefs="khorashadizadeh2023incontext" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>, and retrieval-augmented methods <cite class="ltx_citemacro_citep">(<bibref bibrefs="liu2020personalization" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>. However, these approaches typically rely on explicit user feedback or long-term interaction histories, which are unavailable in non-verbal, high-stakes domains such as infant care.</p>
      </para>
      <para xml:id="S2.SS2.p2">
        <p>To address this, we adopt Group Relative Policy Optimization (GRPO), a reinforcement learning method that compares multiple candidate outputs for the same prompt and computes a group-relative advantage. Unlike Direct Preference Optimization (DPO) <cite class="ltx_citemacro_citep">(<bibref bibrefs="rafailov2023dpo" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>, which depends on binary preference pairs, GRPO scores each output relative to the other outputs sampled for the same prompt, enabling stable optimization without requiring rewards to be calibrated on an absolute scale <cite class="ltx_citemacro_citep">(<bibref bibrefs="zhang2025r1vl" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>.</p>
      </para>
      <para xml:id="S2.SS2.p3">
        <p>This strategy is well-suited for temperament-sensitive reasoning, where correctness is graded across dimensions such as logical consistency, psychological alignment, safety, and empathy. Our approach thus combines psychological profiling, curated supervision, and structured reward design to realize active personalization in infant care, drawing inspiration from reasoning-focused models such as DeepSeek-R1 <cite class="ltx_citemacro_citep">(<bibref bibrefs="guo2025deepseekr1,liu2024deepseekv3" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
              <bibrefphrase>, </bibrefphrase>
            </bibref>)</cite>.</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S3">
    <tags>
      <tag>3</tag>
      <tag role="autoref">section 3</tag>
      <tag role="refnum">3</tag>
      <tag role="typerefnum">§3</tag>
    </tags>
    <title><tag close=" ">3</tag>Methodology</title>
    <para xml:id="S3.p1">
      <p>We adopt a streamlined two-stage training framework to develop a temperament-sensitive LLM for infant care, as shown in Figure <ref labelref="LABEL:fig:pipeline"/>. Our approach consists of (1) temperament-aware supervised fine-tuning using LoRA and (2) Group Relative Policy Optimization (GRPO). Detailed training configurations are provided in Appendix B.</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:pipeline" placement="htbp" xml:id="S3.F1">
      <tags>
        <tag>Figure 1</tag>
        <tag role="autoref">Figure 1</tag>
        <tag role="refnum">1</tag>
        <tag role="typerefnum">Figure 1</tag>
      </tags>
      <graphics candidates="Procedure.png" class="ltx_centering" graphic="Procedure.png" options="width=433.62pt" xml:id="S3.F1.g1"/>
      <toccaption class="ltx_centering"><tag close=" ">1</tag>PediaMind-R1 training pipeline: temperament-aware supervised fine-tuning, followed by GRPO alignment.</toccaption>
      <caption class="ltx_centering"><tag close=": ">Figure 1</tag>PediaMind-R1 training pipeline: temperament-aware supervised fine-tuning, followed by GRPO alignment.</caption>
    </figure>
    <subsection inlist="toc" xml:id="S3.SS1">
      <tags>
        <tag>3.1</tag>
        <tag role="autoref">subsection 3.1</tag>
        <tag role="refnum">3.1</tag>
        <tag role="typerefnum">§3.1</tag>
      </tags>
      <title><tag close=" ">3.1</tag>Temperament-Aware Supervised Fine-Tuning</title>
      <subsubsection inlist="toc" xml:id="S3.SS1.SSS1">
        <tags>
          <tag>3.1.1</tag>
          <tag role="autoref">subsubsection 3.1.1</tag>
          <tag role="refnum">3.1.1</tag>
          <tag role="typerefnum">§3.1.1</tag>
        </tags>
        <title><tag close=" ">3.1.1</tag>Dataset Construction</title>
        <para xml:id="S3.SS1.SSS1.p1">
          <p>Our supervised fine-tuning dataset comprises 1,215 caregiver queries, each annotated with a temperament label from the Thomas–Chess framework and paired with a structured chain-of-thought response. Responses were generated with DeepSeek-R1, guided by a curated temperament–strategy knowledge graph (see Appendix C), and 10% of them were reviewed by experts for factual and psychological validity. A representative example is shown in Figure <ref labelref="LABEL:fig:dataset-sample"/>.</p>
        </para>
        <figure inlist="lof" labels="LABEL:fig:dataset-sample" placement="h" xml:id="S3.F2">
          <tags>
            <tag>Figure 2</tag>
            <tag role="autoref">Figure 2</tag>
            <tag role="refnum">2</tag>
            <tag role="typerefnum">Figure 2</tag>
          </tags>
          <graphics candidates="sample.png" class="ltx_centering" graphic="sample.png" options="width=208.1376pt" xml:id="S3.F2.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">2</tag>Representative dataset sample combining the temperament knowledge graph with a structured response.</toccaption>
          <caption class="ltx_centering"><tag close=": ">Figure 2</tag>Representative dataset sample combining the temperament knowledge graph with a structured response.</caption>
        </figure>
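        <para>
          <p>To make the record layout concrete, the following is a minimal sketch of how one training example could be represented and serialized. The class name, field names, and prompt template are hypothetical and chosen only for illustration; the label set assumes the three Thomas–Chess categories named in the Introduction, and the actual schema may differ.</p>
          <verbatim font="typewriter">
```python
from dataclasses import dataclass

# Assumed label set: the three Thomas-Chess categories from the Introduction.
TEMPERAMENT_LABELS = ("easy", "difficult", "slow-to-warm-up")


@dataclass(frozen=True)
class SFTExample:
    """One supervised example; all field names are hypothetical."""

    query: str        # caregiver question
    temperament: str  # temperament label attached to the query
    reasoning: str    # structured chain-of-thought
    answer: str       # final caregiving recommendation

    def __post_init__(self):
        # Reject labels outside the assumed label set.
        if self.temperament not in TEMPERAMENT_LABELS:
            raise ValueError(f"unknown temperament label: {self.temperament!r}")


def to_training_text(example: SFTExample) -> str:
    """Serialize an example with an assumed prompt/response template."""
    return (
        f"[Temperament: {example.temperament}]\n"
        f"Question: {example.query}\n"
        f"Reasoning: {example.reasoning}\n"
        f"Answer: {example.answer}"
    )
```
</verbatim>
          <p>Validating the label at construction time keeps malformed records out of the training set; placing the temperament label at the start of the serialized text is one simple way to condition the response on it.</p>
        </para>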
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S3.SS1.SSS2">
        <tags>
          <tag>3.1.2</tag>
          <tag role="autoref">subsubsection 3.1.2</tag>
          <tag role="refnum">3.1.2</tag>
          <tag role="typerefnum">§3.1.2</tag>
        </tags>
        <title><tag close=" ">3.1.2</tag>Supervised Fine-Tuning</title>
        <para xml:id="S3.SS1.SSS2.p1">
          <p>Using this dataset, we fine-tune the base model with LoRA adaptation, enabling it to ground multi-step reasoning and recommendations in explicit temperament profiles. The model produces structured, explainable responses aligned with psychological theory and practical caregiving, enhancing personalization while ensuring transparent chain-of-thought reasoning.</p>
        </para>
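        <para>
          <p>For readers unfamiliar with LoRA, the sketch below illustrates the low-rank update it adds to a frozen weight matrix, written in dependency-free Python for a single input vector. The function names, shapes, and the scaling factor alpha / r follow the standard LoRA formulation and are illustrative assumptions; they are not the training configuration used here, which is given in Appendix B.</p>
          <verbatim font="typewriter">
```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha):
    """Apply a frozen weight W plus a low-rank LoRA update to input x.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.
    The effective weight is W + (alpha / r) * B A; only A and B
    would be trained, while W stays frozen.
    """
    r = len(A)                          # LoRA rank
    scale = alpha / r                   # standard LoRA scaling
    base = matvec(W, x)                 # frozen path: W x
    low_rank = matvec(B, matvec(A, x))  # adapter path: B (A x)
    return [b + scale * u for b, u in zip(base, low_rank)]


def init_lora_b(d_out, r):
    """Zero-initialize B so the adapter starts as a no-op."""
    return [[0.0] * r for _ in range(d_out)]
```
</verbatim>
          <p>Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and fine-tuning only has to learn the low-rank correction.</p>
        </para>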
      </subsubsection>
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS2">
      <tags>
        <tag>3.2</tag>
        <tag role="autoref">subsection 3.2</tag>
        <tag role="refnum">3.2</tag>
        <tag role="typerefnum">§3.2</tag>
      </tags>
      <title><tag close=" ">3.2</tag>Group Relative Policy Optimization</title>
      <subsubsection inlist="toc" xml:id="S3.SS2.SSS1">
        <tags>
          <tag>3.2.1</tag>
          <tag role="autoref">subsubsection 3.2.1</tag>
          <tag role="refnum">3.2.1</tag>
          <tag role="typerefnum">§3.2.1</tag>
        </tags>
        <title><tag close=" ">3.2.1</tag>GRPO Algorithm</title>
        <para xml:id="S3.SS2.SSS1.p1">
          <p>During this phase, we adopt the GRPO algorithm to update the model based on group-relative advantage. For each scenario, <Math mode="inline" tex="G" text="G" xml:id="S3.SS2.SSS1.p1.m1">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">G</XMTok>
              </XMath>
            </Math> candidate responses are sampled from the old policy <Math mode="inline" tex="\pi_{\text{old}}" text="pi _ [old]" xml:id="S3.SS2.SSS1.p1.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" name="pi" role="UNKNOWN">π</XMTok>
                  <XMText><text fontsize="70%">old</text></XMText>
                </XMApp>
              </XMath>
            </Math>, each assigned a reward <Math mode="inline" tex="r_{i}" text="r _ i" xml:id="S3.SS2.SSS1.p1.m3">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">r</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                </XMApp>
              </XMath>
            </Math>. The group-relative advantage for the <Math mode="inline" tex="i" text="i" xml:id="S3.SS2.SSS1.p1.m4">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">i</XMTok>
              </XMath>
            </Math>-th output is calculated as:</p>
          <equation xml:id="S3.E1">
            <tags>
              <tag><text fontsize="90%">(1)</text></tag>
              <tag role="autoref"><text fontsize="90%">Equation 1</text></tag>
              <tag role="refnum"><text fontsize="90%">1</text></tag>
            </tags>
            <Math mode="display" tex="A_{i}=\frac{r_{i}-\mu_{\{r\}}}{\sigma_{\{r\}}}" text="A _ i = (r _ i - mu _ (set@(r))) / sigma _ (set@(r))" xml:id="S3.E1.m1">
              <XMath>
                <XMApp>
                  <XMTok fontsize="90%" meaning="equals" role="RELOP">=</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" fontsize="90%" role="UNKNOWN">A</XMTok>
                    <XMTok font="italic" fontsize="63%" role="UNKNOWN">i</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                    <XMApp>
                      <XMTok fontsize="90%" meaning="minus" role="ADDOP">-</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                        <XMTok font="italic" fontsize="90%" role="UNKNOWN">r</XMTok>
                        <XMTok font="italic" fontsize="63%" role="UNKNOWN">i</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                        <XMTok font="italic" fontsize="90%" name="mu" role="UNKNOWN">μ</XMTok>
                        <XMDual>
                          <XMApp>
                            <XMTok meaning="set"/>
                            <XMRef idref="S3.E1.m1.1"/>
                          </XMApp>
                          <XMWrap>
                            <XMTok fontsize="63%" role="OPEN" stretchy="false">{</XMTok>
                            <XMTok font="italic" fontsize="63%" role="UNKNOWN" xml:id="S3.E1.m1.1">r</XMTok>
                            <XMTok fontsize="63%" role="CLOSE" stretchy="false">}</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                      <XMTok font="italic" fontsize="90%" name="sigma" role="UNKNOWN">σ</XMTok>
                      <XMDual>
                        <XMApp>
                          <XMTok meaning="set"/>
                          <XMRef idref="S3.E1.m1.2"/>
                        </XMApp>
                        <XMWrap>
                          <XMTok fontsize="63%" role="OPEN" stretchy="false">{</XMTok>
                          <XMTok font="italic" fontsize="63%" role="UNKNOWN" xml:id="S3.E1.m1.2">r</XMTok>
                          <XMTok fontsize="63%" role="CLOSE" stretchy="false">}</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
          </equation>
        </para>
        <para xml:id="S3.SS2.SSS1.p2">
          <p>where <Math mode="inline" tex="r_{i}" text="r _ i" xml:id="S3.SS2.SSS1.p2.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">r</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                </XMApp>
              </XMath>
            </Math> is the reward of the <Math mode="inline" tex="i" text="i" xml:id="S3.SS2.SSS1.p2.m2">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">i</XMTok>
              </XMath>
            </Math>-th output, and <Math mode="inline" tex="\mu_{\{r\}}" text="mu _ (set@(r))" xml:id="S3.SS2.SSS1.p2.m3">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" name="mu" role="UNKNOWN">μ</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="set"/>
                      <XMRef idref="S3.SS2.SSS1.p2.m3.1"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok fontsize="70%" role="OPEN" stretchy="false">{</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.SS2.SSS1.p2.m3.1">r</XMTok>
                      <XMTok fontsize="70%" role="CLOSE" stretchy="false">}</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMath>
            </Math>, <Math mode="inline" tex="\sigma_{\{r\}}" text="sigma _ (set@(r))" xml:id="S3.SS2.SSS1.p2.m4">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="set"/>
                      <XMRef idref="S3.SS2.SSS1.p2.m4.1"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok fontsize="70%" role="OPEN" stretchy="false">{</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.SS2.SSS1.p2.m4.1">r</XMTok>
                      <XMTok fontsize="70%" role="CLOSE" stretchy="false">}</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMath>
            </Math> are the mean and standard deviation of reward values in the group. The policy is optimized to maximize:</p>
        </para>
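        <para>
          <p>As a minimal sketch of the group-relative advantage in Equation (1), the function below standardizes the rewards of one group of sampled responses. The function name, the use of the population standard deviation, and the small constant eps that avoids division by zero when all rewards are equal are assumptions added for illustration and are not specified in the text.</p>
          <verbatim font="typewriter">
```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: A_i = (r_i - mean) / std.

    `eps` guards against division by zero when every reward in the
    group is identical; it is an added assumption, not part of Eq. (1).
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = fmean(rewards)      # group mean
    sigma = pstdev(rewards)  # population standard deviation (assumed)
    return [(r - mu) / (sigma + eps) for r in rewards]
```
</verbatim>
          <p>For a group with rewards 1.0, 2.0, and 3.0, the middle response receives an advantage of zero, while the first and third receive negative and positive advantages of equal magnitude. Responses are therefore reinforced or penalized only relative to the other samples drawn for the same scenario.</p>
        </para>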
        <para xml:id="S3.SS2.SSS1.p3">
          <equationgroup class="ltx_eqn_align" xml:id="A4.EGx1">
            <equation xml:id="S3.Ex1">
              <MathFork>
                <Math tex="\displaystyle\mathcal{J}_{\text{GRPO}}(\theta)=\mathbb{E}_{S}\Bigg{[}\frac{1}{%&#10;G}\sum_{i=1}^{G}\min\Big{(}r_{i}A_{i},\,\mathrm{clip}(r_{i},1-\epsilon,1+%&#10;\epsilon)A_{i}\Big{)}" xml:id="S3.Ex1.m3">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="caligraphic" fontsize="70%" role="UNKNOWN">J</XMTok>
                      <XMText><text fontsize="49%">GRPO</text></XMText>
                    </XMApp>
                    <XMWrap>
                      <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                      <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN" xml:id="S3.Ex1.m3.1">θ</XMTok>
                      <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                    <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="blackboard" fontsize="70%" role="UNKNOWN">E</XMTok>
                      <XMTok font="italic" fontsize="49%" role="UNKNOWN">S</XMTok>
                    </XMApp>
                    <XMWrap>
                      <XMTok fontsize="260%" role="OPEN" stretchy="false">[</XMTok>
                      <XMApp>
                        <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                        <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">G</XMTok>
                      </XMApp>
                      <XMApp scriptpos="mid">
                        <XMTok role="SUPERSCRIPTOP" scriptpos="mid1"/>
                        <XMApp scriptpos="mid">
                          <XMTok role="SUBSCRIPTOP" scriptpos="mid1"/>
                          <XMTok fontsize="70%" mathstyle="display" meaning="sum" role="SUMOP" scriptpos="mid">∑</XMTok>
                          <XMApp>
                            <XMTok fontsize="49%" meaning="equals" role="RELOP">=</XMTok>
                            <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                            <XMTok fontsize="49%" meaning="1" role="NUMBER">1</XMTok>
                          </XMApp>
                        </XMApp>
                        <XMTok font="italic" fontsize="49%" role="UNKNOWN">G</XMTok>
                      </XMApp>
                      <XMTok fontsize="70%" meaning="minimum" role="OPFUNCTION" scriptpos="mid" xml:id="S3.Ex1.m3.2">min</XMTok>
                      <XMWrap>
                        <XMTok fontsize="160%" role="OPEN" stretchy="false">(</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                          <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">A</XMTok>
                          <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                        </XMApp>
                        <XMTok fontsize="70%" role="PUNCT" rpadding="1.7pt">,</XMTok>
                        <XMTok fontsize="70%" role="UNKNOWN">clip</XMTok>
                        <XMWrap>
                          <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                            <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                          </XMApp>
                          <XMTok fontsize="70%" role="PUNCT">,</XMTok>
                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                          <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                          <XMTok font="italic" fontsize="70%" name="epsilon" role="UNKNOWN">ϵ</XMTok>
                          <XMTok fontsize="70%" role="PUNCT">,</XMTok>
                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                          <XMTok fontsize="70%" meaning="plus" role="ADDOP">+</XMTok>
                          <XMTok font="italic" fontsize="70%" name="epsilon" role="UNKNOWN">ϵ</XMTok>
                          <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">A</XMTok>
                          <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                        </XMApp>
                        <XMTok fontsize="160%" role="CLOSE" stretchy="false">)</XMTok>
                      </XMWrap>
                    </XMWrap>
                  </XMath>
                </Math>
                <MathBranch>
                  <td align="right"><Math mode="inline" tex="\displaystyle\mathcal{J}_{\text{GRPO}}(\theta)=" text="J _ [GRPO] * theta = absent" xml:id="S3.Ex1.m1">
                      <XMath>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="caligraphic" fontsize="70%" role="UNKNOWN">J</XMTok>
                              <XMText><text fontsize="49%">GRPO</text></XMText>
                            </XMApp>
                            <XMDual>
                              <XMRef idref="S3.Ex1.m1.1"/>
                              <XMWrap>
                                <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                                <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN" xml:id="S3.Ex1.m1.1">θ</XMTok>
                                <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                          <XMTok meaning="absent"/>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\mathbb{E}_{S}\Bigg{[}\frac{1}{G}\sum_{i=1}^{G}\min\Big{(}r_{i}A_%&#10;{i},\,\mathrm{clip}(r_{i},1-\epsilon,1+\epsilon)A_{i}\Big{)}" xml:id="S3.Ex1.m2">
                      <XMath>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="blackboard" fontsize="70%" role="UNKNOWN">E</XMTok>
                          <XMTok font="italic" fontsize="49%" role="UNKNOWN">S</XMTok>
                        </XMApp>
                        <XMWrap>
                          <XMTok fontsize="260%" role="OPEN" stretchy="false">[</XMTok>
                          <XMApp>
                            <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                            <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">G</XMTok>
                          </XMApp>
                          <XMApp scriptpos="mid">
                            <XMTok role="SUPERSCRIPTOP" scriptpos="mid1"/>
                            <XMApp scriptpos="mid">
                              <XMTok role="SUBSCRIPTOP" scriptpos="mid1"/>
                              <XMTok fontsize="70%" mathstyle="display" meaning="sum" role="SUMOP" scriptpos="mid">∑</XMTok>
                              <XMApp>
                                <XMTok fontsize="49%" meaning="equals" role="RELOP">=</XMTok>
                                <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                                <XMTok fontsize="49%" meaning="1" role="NUMBER">1</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMTok font="italic" fontsize="49%" role="UNKNOWN">G</XMTok>
                          </XMApp>
                          <XMTok fontsize="70%" meaning="minimum" role="OPFUNCTION" scriptpos="mid" xml:id="S3.Ex1.m2.1">min</XMTok>
                          <XMWrap>
                            <XMTok fontsize="160%" role="OPEN" stretchy="false">(</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                            </XMApp>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">A</XMTok>
                              <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                            </XMApp>
                            <XMTok fontsize="70%" role="PUNCT" rpadding="1.7pt">,</XMTok>
                            <XMTok fontsize="70%" role="UNKNOWN">clip</XMTok>
                            <XMWrap>
                              <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                              </XMApp>
                              <XMTok fontsize="70%" role="PUNCT">,</XMTok>
                              <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                              <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                              <XMTok font="italic" fontsize="70%" name="epsilon" role="UNKNOWN">ϵ</XMTok>
                              <XMTok fontsize="70%" role="PUNCT">,</XMTok>
                              <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                              <XMTok fontsize="70%" meaning="plus" role="ADDOP">+</XMTok>
                              <XMTok font="italic" fontsize="70%" name="epsilon" role="UNKNOWN">ϵ</XMTok>
                              <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">A</XMTok>
                              <XMTok font="italic" fontsize="49%" role="UNKNOWN">i</XMTok>
                            </XMApp>
                            <XMTok fontsize="160%" role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMWrap>
                      </XMath>
                    </Math></td>
                </MathBranch>
              </MathFork>
            </equation>
            <equation xml:id="S3.E2">
              <tags>
                <tag><text fontsize="70%">(2)</text></tag>
                <tag role="autoref"><text fontsize="70%">Equation 2</text></tag>
                <tag role="refnum"><text fontsize="70%">2</text></tag>
              </tags>
              <MathFork>
                <Math tex="\displaystyle\qquad-\,\beta D_{\mathrm{KL}}(\pi_{\theta}\|\pi_{\text{ref}})%&#10;\Bigg{]}" xml:id="S3.E2.m3">
                  <XMath>
                    <XMTok fontsize="70%" role="UNKNOWN">  </XMTok>
                    <XMTok fontsize="70%" meaning="minus" role="ADDOP" rpadding="1.7pt">-</XMTok>
                    <XMTok font="italic" fontsize="70%" name="beta" role="UNKNOWN">β</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">D</XMTok>
                      <XMTok fontsize="49%" role="UNKNOWN">KL</XMTok>
                    </XMApp>
                    <XMWrap>
                      <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" fontsize="70%" name="pi" role="UNKNOWN">π</XMTok>
                        <XMTok font="italic" fontsize="49%" name="theta" role="UNKNOWN">θ</XMTok>
                      </XMApp>
                      <XMTok fontsize="70%" meaning="parallel-to" name="||" role="VERTBAR">∥</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" fontsize="70%" name="pi" role="UNKNOWN">π</XMTok>
                        <XMText><text fontsize="49%">ref</text></XMText>
                      </XMApp>
                      <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                    <XMTok fontsize="260%" role="CLOSE" stretchy="false">]</XMTok>
                  </XMath>
                </Math>
                <MathBranch>
                  <td align="right"/>
                  <td align="left"><Math mode="inline" tex="\displaystyle\qquad-\,\beta D_{\mathrm{KL}}(\pi_{\theta}\|\pi_{\text{ref}})%&#10;\Bigg{]}" xml:id="S3.E2.m2">
                      <XMath>
                        <XMTok fontsize="70%" role="UNKNOWN">  </XMTok>
                        <XMTok fontsize="70%" meaning="minus" role="ADDOP" rpadding="1.7pt">-</XMTok>
                        <XMTok font="italic" fontsize="70%" name="beta" role="UNKNOWN">β</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">D</XMTok>
                          <XMTok fontsize="49%" role="UNKNOWN">KL</XMTok>
                        </XMApp>
                        <XMWrap>
                          <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" fontsize="70%" name="pi" role="UNKNOWN">π</XMTok>
                            <XMTok font="italic" fontsize="49%" name="theta" role="UNKNOWN">θ</XMTok>
                          </XMApp>
                          <XMTok fontsize="70%" meaning="parallel-to" name="||" role="VERTBAR">∥</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" fontsize="70%" name="pi" role="UNKNOWN">π</XMTok>
                            <XMText><text fontsize="49%">ref</text></XMText>
                          </XMApp>
                          <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                        <XMTok fontsize="260%" role="CLOSE" stretchy="false">]</XMTok>
                      </XMath>
                    </Math></td>
                </MathBranch>
              </MathFork>
            </equation>
          </equationgroup>
        </para>
        <para xml:id="S3.SS2.SSS1.p4">
          <p>where <Math mode="inline" tex="D_{\mathrm{KL}}" text="D _ KL" xml:id="S3.SS2.SSS1.p4.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">D</XMTok>
                  <XMTok fontsize="70%" role="UNKNOWN">KL</XMTok>
                </XMApp>
              </XMath>
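            </Math> is the KL divergence between the current policy and the reference policy, which serves as a regularization penalty. Because each output is compared against the others sampled in its group, the objective encourages the model to generate outputs whose rewards exceed the group average, which is intended to make preference alignment more robust and stable.</p>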
        </para>
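        <para>
          <p>The clipped objective above can be illustrated with a minimal Python sketch for a single prompt. It is not the authors' implementation. It assumes that r_i is the probability ratio between the current policy and the policy that sampled the group, and that the advantage A_i is the group-normalized reward (reward minus group mean, divided by group standard deviation), as is common in GRPO implementations. The function names, the small stabilizing constant, and the way the KL estimate is passed in are hypothetical.</p>
          <verbatim font="typewriter">
```python
import math


def group_advantages(rewards):
    """Group-relative advantages (assumed form: mean/std normalization)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # Hypothetical small constant: avoids division by zero when all
    # rewards in the group are identical.
    return [(r - mean) / (std + 1e-8) for r in rewards]


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def grpo_objective(ratios, advantages, kl, epsilon, beta):
    """Single-prompt value of the clipped surrogate minus the KL penalty.

    ratios:     assumed probability ratios r_i, one per sampled output
    advantages: advantages A_i, one per sampled output
    kl:         an estimate of KL(pi_theta || pi_ref), supplied by the caller
    epsilon:    clipping range
    beta:       KL penalty weight
    """
    g = len(ratios)
    surrogate = sum(
        min(r * a, clip(r, 1.0 - epsilon, 1.0 + epsilon) * a)
        for r, a in zip(ratios, advantages)
    ) / g
    return surrogate - beta * kl
```
</verbatim>
          <p>In this sketch, an output whose reward is above the group mean receives a positive advantage, so raising its probability ratio increases the objective only until the ratio reaches 1 + ϵ; beyond that point the clipped term is the minimum and the gain saturates.</p>
        </para>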
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S3.SS2.SSS2">
        <tags>
          <tag>3.2.2</tag>
          <tag role="autoref">subsubsection 3.2.2</tag>
          <tag role="refnum">3.2.2</tag>
          <tag role="typerefnum">§3.2.2</tag>
        </tags>
        <title><tag close=" ">3.2.2</tag>Reward Design</title>
        <para xml:id="S3.SS2.SSS2.p1">
          <p>Each candidate output <Math mode="inline" tex="y" text="y" xml:id="S3.SS2.SSS2.p1.m1">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">y</XMTok>
              </XMath>
            </Math> is scored by a composite reward function that sums three components, covering output format, temperament alignment, and knowledge relevance:</p>
          <equation xml:id="S3.Ex2">
            <Math mode="display" tex="\mathcal{R}(y)=R_{\text{fmt}}(y)+R_{\text{temp}}(y)+R_{\text{know}}(y)" text="R * y = R _ [fmt] * y + R _ [temp] * y + R _ [know] * y" xml:id="S3.Ex2.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMApp>
                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                    <XMTok font="caligraphic" role="UNKNOWN">R</XMTok>
                    <XMDual>
                      <XMRef idref="S3.Ex2.m1.1"/>
                      <XMWrap>
                        <XMTok role="OPEN" stretchy="false">(</XMTok>
                        <XMTok font="italic" role="UNKNOWN" xml:id="S3.Ex2.m1.1">y</XMTok>
                        <XMTok role="CLOSE" stretchy="false">)</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="plus" role="ADDOP">+</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">R</XMTok>
                        <XMText><text fontsize="70%">fmt</text></XMText>
                      </XMApp>
                      <XMDual>
                        <XMRef idref="S3.Ex2.m1.2"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S3.Ex2.m1.2">y</XMTok>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">R</XMTok>
                        <XMText><text fontsize="70%">temp</text></XMText>
                      </XMApp>
                      <XMDual>
                        <XMRef idref="S3.Ex2.m1.3"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S3.Ex2.m1.3">y</XMTok>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">R</XMTok>
                        <XMText><text fontsize="70%">know</text></XMText>
                      </XMApp>
                      <XMDual>
                        <XMRef idref="S3.Ex2.m1.4"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S3.Ex2.m1.4">y</XMTok>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
          </equation>
          <p>where</p>
          <equationgroup class="ltx_eqn_align" xml:id="A4.EGx2">
            <equation xml:id="S3.E3">
              <tags>
                <tag><text fontsize="70%">(3)</text></tag>
                <tag role="autoref"><text fontsize="70%">Equation 3</text></tag>
                <tag role="refnum"><text fontsize="70%">3</text></tag>
              </tags>
              <MathFork>
                <Math tex="\displaystyle R_{\text{fmt}}(y)=\begin{cases}1&amp;\text{if output strictly %&#10;matches constructed format}\\&#10;0&amp;\text{otherwise}\end{cases}" text="R _ [fmt] * y = cases@(1, [if output strictly matches constructed format], 0, [otherwise])" xml:id="S3.E3.m3">
                  <XMath>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">R</XMTok>
                          <XMText><text fontsize="49%">fmt</text></XMText>
                        </XMApp>
                        <XMDual>
                          <XMRef idref="S3.E3.m3.1"/>
                          <XMWrap>
                            <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.E3.m3.1">y</XMTok>
                            <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                      <XMDual>
                        <XMApp>
                          <XMTok meaning="cases"/>
                          <XMRef idref="S3.E3.m2.1.mf"/>
                          <XMRef idref="S3.E3.m2.2.mf"/>
                          <XMRef idref="S3.E3.m2.3.mf"/>
                          <XMRef idref="S3.E3.m2.4.mf"/>
                        </XMApp>
                        <XMWrap>
                          <XMTok fontsize="70%" role="OPEN" stretchy="true">{</XMTok>
                          <XMArray>
                            <XMRow>
                              <XMCell align="left">
                                <XMTok fontsize="70%" meaning="1" role="NUMBER" xml:id="S3.E3.m2.1.mf">1</XMTok>
                              </XMCell>
                              <XMCell align="left">
                                <XMText xml:id="S3.E3.m2.2.mf"><text fontsize="70%">if output strictly matches constructed format</text></XMText>
                              </XMCell>
                            </XMRow>
                            <XMRow>
                              <XMCell align="left">
                                <XMTok fontsize="70%" meaning="0" role="NUMBER" xml:id="S3.E3.m2.3.mf">0</XMTok>
                              </XMCell>
                              <XMCell align="left">
                                <XMText xml:id="S3.E3.m2.4.mf"><text fontsize="70%">otherwise</text></XMText>
                              </XMCell>
                            </XMRow>
                          </XMArray>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMath>
                </Math>
                <MathBranch>
                  <td align="right"><Math mode="inline" tex="\displaystyle R_{\text{fmt}}(y)" text="R _ [fmt] * y" xml:id="S3.E3.m1">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">R</XMTok>
                            <XMText><text fontsize="49%">fmt</text></XMText>
                          </XMApp>
                          <XMDual>
                            <XMRef idref="S3.E3.m1.1"/>
                            <XMWrap>
                              <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.E3.m1.1">y</XMTok>
                              <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle=\begin{cases}1&amp;\text{if output strictly matches constructed %&#10;format}\\&#10;0&amp;\text{otherwise}\end{cases}" text="absent = cases@(1, [if output strictly matches constructed format], 0, [otherwise])" xml:id="S3.E3.m2">
                      <XMath>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                          <XMTok meaning="absent"/>
                          <XMDual>
                            <XMApp>
                              <XMTok meaning="cases"/>
                              <XMRef idref="S3.E3.m2.1"/>
                              <XMRef idref="S3.E3.m2.2"/>
                              <XMRef idref="S3.E3.m2.3"/>
                              <XMRef idref="S3.E3.m2.4"/>
                            </XMApp>
                            <XMWrap>
                              <XMTok fontsize="70%" role="OPEN" stretchy="true">{</XMTok>
                              <XMArray>
                                <XMRow>
                                  <XMCell align="left">
                                    <XMTok fontsize="70%" meaning="1" role="NUMBER" xml:id="S3.E3.m2.1">1</XMTok>
                                  </XMCell>
                                  <XMCell align="left">
                                    <XMText xml:id="S3.E3.m2.2"><text fontsize="70%">if output strictly matches constructed format</text></XMText>
                                  </XMCell>
                                </XMRow>
                                <XMRow>
                                  <XMCell align="left">
                                    <XMTok fontsize="70%" meaning="0" role="NUMBER" xml:id="S3.E3.m2.3">0</XMTok>
                                  </XMCell>
                                  <XMCell align="left">
                                    <XMText xml:id="S3.E3.m2.4"><text fontsize="70%">otherwise</text></XMText>
                                  </XMCell>
                                </XMRow>
                              </XMArray>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </MathBranch>
              </MathFork>
            </equation>
            <equation xml:id="S3.E4">
              <tags>
                <tag><text fontsize="70%">(4)</text></tag>
                <tag role="autoref"><text fontsize="70%">Equation 4</text></tag>
                <tag role="refnum"><text fontsize="70%">4</text></tag>
              </tags>
              <MathFork>
                <Math tex="\displaystyle R_{\text{temp}}(y)=\begin{cases}1&amp;\text{if reasoning aligns with%&#10; temperament knowledge}\\&#10;0&amp;\text{otherwise}\end{cases}" text="R _ [temp] * y = cases@(1, [if reasoning aligns with temperament knowledge], 0, [otherwise])" xml:id="S3.E4.m3">
                  <XMath>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">R</XMTok>
                          <XMText><text fontsize="49%">temp</text></XMText>
                        </XMApp>
                        <XMDual>
                          <XMRef idref="S3.E4.m3.1"/>
                          <XMWrap>
                            <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.E4.m3.1">y</XMTok>
                            <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                      <XMDual>
                        <XMApp>
                          <XMTok meaning="cases"/>
                          <XMRef idref="S3.E4.m2.1.mf"/>
                          <XMRef idref="S3.E4.m2.2.mf"/>
                          <XMRef idref="S3.E4.m2.3.mf"/>
                          <XMRef idref="S3.E4.m2.4.mf"/>
                        </XMApp>
                        <XMWrap>
                          <XMTok fontsize="70%" role="OPEN" stretchy="true">{</XMTok>
                          <XMArray>
                            <XMRow>
                              <XMCell align="left">
                                <XMTok fontsize="70%" meaning="1" role="NUMBER" xml:id="S3.E4.m2.1.mf">1</XMTok>
                              </XMCell>
                              <XMCell align="left">
                                <XMText xml:id="S3.E4.m2.2.mf"><text fontsize="70%">if reasoning aligns with temperament knowledge</text></XMText>
                              </XMCell>
                            </XMRow>
                            <XMRow>
                              <XMCell align="left">
                                <XMTok fontsize="70%" meaning="0" role="NUMBER" xml:id="S3.E4.m2.3.mf">0</XMTok>
                              </XMCell>
                              <XMCell align="left">
                                <XMText xml:id="S3.E4.m2.4.mf"><text fontsize="70%">otherwise</text></XMText>
                              </XMCell>
                            </XMRow>
                          </XMArray>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMath>
                </Math>
                <MathBranch>
                  <td align="right"><Math mode="inline" tex="\displaystyle R_{\text{temp}}(y)" text="R _ [temp] * y" xml:id="S3.E4.m1">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">R</XMTok>
                            <XMText><text fontsize="49%">temp</text></XMText>
                          </XMApp>
                          <XMDual>
                            <XMRef idref="S3.E4.m1.1"/>
                            <XMWrap>
                              <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.E4.m1.1">y</XMTok>
                              <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle=\begin{cases}1&amp;\text{if reasoning aligns with temperament %&#10;knowledge}\\&#10;0&amp;\text{otherwise}\end{cases}" text="absent = cases@(1, [if reasoning aligns with temperament knowledge], 0, [otherwise])" xml:id="S3.E4.m2">
                      <XMath>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                          <XMTok meaning="absent"/>
                          <XMDual>
                            <XMApp>
                              <XMTok meaning="cases"/>
                              <XMRef idref="S3.E4.m2.1"/>
                              <XMRef idref="S3.E4.m2.2"/>
                              <XMRef idref="S3.E4.m2.3"/>
                              <XMRef idref="S3.E4.m2.4"/>
                            </XMApp>
                            <XMWrap>
                              <XMTok fontsize="70%" role="OPEN" stretchy="true">{</XMTok>
                              <XMArray>
                                <XMRow>
                                  <XMCell align="left">
                                    <XMTok fontsize="70%" meaning="1" role="NUMBER" xml:id="S3.E4.m2.1">1</XMTok>
                                  </XMCell>
                                  <XMCell align="left">
                                    <XMText xml:id="S3.E4.m2.2"><text fontsize="70%">if reasoning aligns with temperament knowledge</text></XMText>
                                  </XMCell>
                                </XMRow>
                                <XMRow>
                                  <XMCell align="left">
                                    <XMTok fontsize="70%" meaning="0" role="NUMBER" xml:id="S3.E4.m2.3">0</XMTok>
                                  </XMCell>
                                  <XMCell align="left">
                                    <XMText xml:id="S3.E4.m2.4"><text fontsize="70%">otherwise</text></XMText>
                                  </XMCell>
                                </XMRow>
                              </XMArray>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </MathBranch>
              </MathFork>
            </equation>
            <equation xml:id="S3.E5">
              <tags>
                <tag><text fontsize="70%">(5)</text></tag>
                <tag role="autoref"><text fontsize="70%">Equation 5</text></tag>
                <tag role="refnum"><text fontsize="70%">5</text></tag>
              </tags>
              <MathFork>
                <Math tex="\displaystyle R_{\text{know}}(y)=\begin{cases}1&amp;\text{if answer is fully %&#10;relevant to query/reference}\\&#10;0.5&amp;\text{if answer is partially relevant to query/reference}\\&#10;0&amp;\text{otherwise}\end{cases}" text="R _ [know] * y = cases@(1, [if answer is fully relevant to query/reference], 0.5, [if answer is partially relevant to query/reference], 0, [otherwise])" xml:id="S3.E5.m3">
                  <XMath>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">R</XMTok>
                          <XMText><text fontsize="49%">know</text></XMText>
                        </XMApp>
                        <XMDual>
                          <XMRef idref="S3.E5.m3.1"/>
                          <XMWrap>
                            <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.E5.m3.1">y</XMTok>
                            <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                      <XMDual>
                        <XMApp>
                          <XMTok meaning="cases"/>
                          <XMRef idref="S3.E5.m2.1.mf"/>
                          <XMRef idref="S3.E5.m2.2.mf"/>
                          <XMRef idref="S3.E5.m2.3.mf"/>
                          <XMRef idref="S3.E5.m2.4.mf"/>
                          <XMRef idref="S3.E5.m2.5.mf"/>
                          <XMRef idref="S3.E5.m2.6.mf"/>
                        </XMApp>
                        <XMWrap>
                          <XMTok fontsize="70%" role="OPEN" stretchy="true">{</XMTok>
                          <XMArray>
                            <XMRow>
                              <XMCell align="left">
                                <XMTok fontsize="70%" meaning="1" role="NUMBER" xml:id="S3.E5.m2.1.mf">1</XMTok>
                              </XMCell>
                              <XMCell align="left">
                                <XMText xml:id="S3.E5.m2.2.mf"><text fontsize="70%">if answer is fully relevant to query/reference</text></XMText>
                              </XMCell>
                            </XMRow>
                            <XMRow>
                              <XMCell align="left">
                                <XMTok fontsize="70%" meaning="0.5" role="NUMBER" xml:id="S3.E5.m2.3.mf">0.5</XMTok>
                              </XMCell>
                              <XMCell align="left">
                                <XMText xml:id="S3.E5.m2.4.mf"><text fontsize="70%">if answer is partially relevant to query/reference</text></XMText>
                              </XMCell>
                            </XMRow>
                            <XMRow>
                              <XMCell align="left">
                                <XMTok fontsize="70%" meaning="0" role="NUMBER" xml:id="S3.E5.m2.5.mf">0</XMTok>
                              </XMCell>
                              <XMCell align="left">
                                <XMText xml:id="S3.E5.m2.6.mf"><text fontsize="70%">otherwise</text></XMText>
                              </XMCell>
                            </XMRow>
                          </XMArray>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMath>
                </Math>
                <MathBranch>
                  <td align="right"><Math mode="inline" tex="\displaystyle R_{\text{know}}(y)" text="R _ [know] * y" xml:id="S3.E5.m1">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">R</XMTok>
                            <XMText><text fontsize="49%">know</text></XMText>
                          </XMApp>
                          <XMDual>
                            <XMRef idref="S3.E5.m1.1"/>
                            <XMWrap>
                              <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.E5.m1.1">y</XMTok>
                              <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle=\begin{cases}1&amp;\text{if answer is fully relevant to query/%&#10;reference}\\&#10;0.5&amp;\text{if answer is partially relevant to query/reference}\\&#10;0&amp;\text{otherwise}\end{cases}" text="absent = cases@(1, [if answer is fully relevant to query/reference], 0.5, [if answer is partially relevant to query/reference], 0, [otherwise])" xml:id="S3.E5.m2">
                      <XMath>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                          <XMTok meaning="absent"/>
                          <XMDual>
                            <XMApp>
                              <XMTok meaning="cases"/>
                              <XMRef idref="S3.E5.m2.1"/>
                              <XMRef idref="S3.E5.m2.2"/>
                              <XMRef idref="S3.E5.m2.3"/>
                              <XMRef idref="S3.E5.m2.4"/>
                              <XMRef idref="S3.E5.m2.5"/>
                              <XMRef idref="S3.E5.m2.6"/>
                            </XMApp>
                            <XMWrap>
                              <XMTok fontsize="70%" role="OPEN" stretchy="true">{</XMTok>
                              <XMArray>
                                <XMRow>
                                  <XMCell align="left">
                                    <XMTok fontsize="70%" meaning="1" role="NUMBER" xml:id="S3.E5.m2.1">1</XMTok>
                                  </XMCell>
                                  <XMCell align="left">
                                    <XMText xml:id="S3.E5.m2.2"><text fontsize="70%">if answer is fully relevant to query/reference</text></XMText>
                                  </XMCell>
                                </XMRow>
                                <XMRow>
                                  <XMCell align="left">
                                    <XMTok fontsize="70%" meaning="0.5" role="NUMBER" xml:id="S3.E5.m2.3">0.5</XMTok>
                                  </XMCell>
                                  <XMCell align="left">
                                    <XMText xml:id="S3.E5.m2.4"><text fontsize="70%">if answer is partially relevant to query/reference</text></XMText>
                                  </XMCell>
                                </XMRow>
                                <XMRow>
                                  <XMCell align="left">
                                    <XMTok fontsize="70%" meaning="0" role="NUMBER" xml:id="S3.E5.m2.5">0</XMTok>
                                  </XMCell>
                                  <XMCell align="left">
                                    <XMText xml:id="S3.E5.m2.6"><text fontsize="70%">otherwise</text></XMText>
                                  </XMCell>
                                </XMRow>
                              </XMArray>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </MathBranch>
              </MathFork>
            </equation>
          </equationgroup>
          <p>This multi-dimensional reward structure promotes outputs that are standardized in format, consistent with temperament logic, and professionally grounded. Optimizing against these rewards with GRPO steers the model toward psychologically sound, domain-aligned recommendations. To support this stage, we built a dataset of 2,646 temperament-sensitive scenarios from a DeepSeek-V3–assisted parenting encyclopedia and a temperament knowledge graph; pediatric and psychology experts reviewed 15% of the scenarios for reliability.</p>
        </para>
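        <para>
          <p>The discrete scoring rules of Equations 4 and 5 can be sketched as two small functions. This is a minimal illustration rather than the authors' implementation: the function names, the label strings <text font="typewriter">full</text> and <text font="typewriter">partial</text>, and the idea that an external judge supplies those labels are assumptions of this example, since the text specifies only the resulting scores.</p>
          <verbatim>
```python
# Hypothetical judge labels; only the scores 1 / 0.5 / 0 come from Equation 5.
KNOWLEDGE_SCORES = {"full": 1.0, "partial": 0.5}


def temperament_reward(aligned: bool) -> float:
    """Binary rule of Equation 4: 1 if the reasoning aligns with temperament knowledge, else 0."""
    return 1.0 if aligned else 0.0


def knowledge_reward(relevance: str) -> float:
    """Three-level rule of Equation 5: 1 (fully relevant), 0.5 (partially relevant), 0 (otherwise)."""
    return KNOWLEDGE_SCORES.get(relevance, 0.0)


# Example: a rollout whose reasoning fits the temperament but only partly answers the query.
print(temperament_reward(True), knowledge_reward("partial"))
```
          </verbatim>
          <p>The sketch covers only the per-component scoring rules; it makes no assumption about how the components are weighted or combined.</p>
        </para>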
      </subsubsection>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S4">
    <tags>
      <tag>4</tag>
      <tag role="autoref">section 4</tag>
      <tag role="refnum">4</tag>
      <tag role="typerefnum">§4</tag>
    </tags>
    <title><tag close=" ">4</tag>Experiments</title>
    <subsection inlist="toc" xml:id="S4.SS1">
      <tags>
        <tag>4.1</tag>
        <tag role="autoref">subsection 4.1</tag>
        <tag role="refnum">4.1</tag>
        <tag role="typerefnum">§4.1</tag>
      </tags>
      <title><tag close=" ">4.1</tag>Benchmark Evaluation</title>
      <para xml:id="S4.SS1.p1">
        <p>We evaluated PediaMind-R1 on 200 temperament-sensitive multiple-choice questions combining infant profiles, caregiving challenges, and candidate strategies. Comparisons were made against the untuned Qwen2.5-7B-Instruct baseline, with ablations for (i) supervised fine-tuning (SFT) and (ii) SFT + GRPO alignment to assess incremental training benefits.</p>
      </para>
      <para xml:id="S4.SS1.p2">
        <p>Each model was prompted in a zero-shot multiple-choice format and required to select the single best answer per scenario. An illustrative example is provided below:</p>
      </para>
      <para xml:id="S4.SS1.p3">
        <quote>
          <p><text font="bold">Scenario:</text>
When guests visit the home, my child immediately hides in their room and refuses to come out for a long time. Temperament: <text font="italic">slow-to-warm-up</text>
<break/>
<text font="bold">Question:</text>
Which of the following parenting strategies is most appropriate for this scenario?
<break/></p>
          <enumerate xml:id="S4.I1">
            <item xml:id="S4.I1.i1">
              <tags>
                <tag>1.</tag>
                <tag role="autoref">item 1</tag>
                <tag role="refnum">1</tag>
                <tag role="typerefnum">item 1</tag>
              </tags>
              <para xml:id="S4.I1.i1.p1">
                <p>Wait for the child to adjust and gently invite them to join when comfortable. <text font="italic">(Best fit: slow-to-warm-up temperament)</text>
</p>
              </para>
            </item>
            <item xml:id="S4.I1.i2">
              <tags>
                <tag>2.</tag>
                <tag role="autoref">item 2</tag>
                <tag role="refnum">2</tag>
                <tag role="typerefnum">item 2</tag>
              </tags>
              <para xml:id="S4.I1.i2.p1">
                <p>Insist that the child come out right away to face social situations directly.</p>
              </para>
            </item>
            <item xml:id="S4.I1.i3">
              <tags>
                <tag>3.</tag>
                <tag role="autoref">item 3</tag>
                <tag role="refnum">3</tag>
                <tag role="typerefnum">item 3</tag>
              </tags>
              <para xml:id="S4.I1.i3.p1">
                <p>Leave the child alone in their room until they decide to come out.</p>
              </para>
            </item>
          </enumerate>
        </quote>
      </para>
      <table inlist="lot" labels="LABEL:tab:ablation" placement="htbp" xml:id="S4.T1">
        <tags>
          <tag><text fontsize="70%">Table 1</text></tag>
          <tag role="autoref"><text fontsize="70%">Table 1</text></tag>
          <tag role="refnum"><text fontsize="70%">1</text></tag>
          <tag role="typerefnum"><text fontsize="70%">Table 1</text></tag>
        </tags>
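<!--  %Moderately increase the row spacing 
     %Column spacing slightly larger than the default
     %Keep the small font size, in line with academic paper style-->        <tabular class="ltx_centering ltx_guessed_headers" colsep="10.0pt" rowsep="3.0pt" vattach="middle">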
          <thead>
            <tr>
              <td align="left" border="tt" thead="column row"><text font="bold" fontsize="70%">Model</text></td>
              <td align="center" border="tt" thead="column"><text font="bold" fontsize="70%">Size</text></td>
              <td align="center" border="tt" thead="column"><text font="bold" fontsize="70%">Accuracy (%)</text></td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="t" thead="row"><text fontsize="70%">Qwen2.5-7B-Instruct (untuned)</text></td>
              <td align="center" border="t"><text fontsize="70%">7B</text></td>
              <td align="center" border="t"><text fontsize="70%">55.0</text></td>
            </tr>
            <tr>
              <td align="left" thead="row"><text fontsize="70%">PediaMind-R1 (SFT only)</text></td>
              <td align="center"><text fontsize="70%">7B</text></td>
              <td align="center"><text fontsize="70%">62.0</text></td>
            </tr>
            <tr>
              <td align="left" border="bb" thead="row"><text fontsize="70%">PediaMind-R1 (SFT+GRPO)</text></td>
              <td align="center" border="bb"><text fontsize="70%">7B</text></td>
              <td align="center" border="bb"><text font="bold" fontsize="70%">67.0</text></td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" "><text fontsize="70%">1</text></tag><text fontsize="70%">Accuracy on temperament-sensitive multiple-choice benchmark (top-1 selection rate) across the Qwen2.5-7B-Instruct baseline and PediaMind-R1 ablation settings.</text></toccaption>
        <caption class="ltx_centering" fontsize="70%"><tag close=": ">Table 1</tag>Accuracy on temperament-sensitive multiple-choice benchmark (top-1 selection rate) across the Qwen2.5-7B-Instruct baseline and PediaMind-R1 ablation settings.</caption>
      </table>
      <para xml:id="S4.SS1.p4">
        <p>As shown in Table <ref labelref="LABEL:tab:ablation"/>, temperament-aware supervised fine-tuning (SFT) raised accuracy from 55.0% for the untuned baseline to 62.0%, consistent with its role in instilling structured reasoning and embedding fundamental temperament knowledge. However, we still observed occasional mismatches between behavioral cues and recommended strategies, which we attribute to the limited breadth of training scenarios.</p>
      </para>
      <para xml:id="S4.SS1.p5">
        <p>Subsequent GRPO alignment raised accuracy by a further 5.0 points, to 67.0%, by rewarding outputs more closely aligned with developmental psychology principles. Although this absolute gain is modest, it reflects improved logical consistency and psychological appropriateness across the benchmark scenarios. Overall, these findings suggest that SFT provides a solid foundation for temperament-sensitive reasoning, while GRPO consolidates robustness and yields more reliable personalization.</p>
      </para>
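      <para>
        <p>Because the benchmark contains 200 questions, each accuracy in Table 1 corresponds to a whole number of correct answers, and a binomial confidence interval gives a rough sense of the sampling uncertainty. The sketch below is illustrative only: the function name and the choice of a 95% Wilson score interval are assumptions of this example, not an analysis reported by the authors.</p>
        <verbatim>
```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


TOTAL = 200  # benchmark size stated in the text
ACCURACY = {  # accuracies (%) taken from Table 1
    "Qwen2.5-7B-Instruct (untuned)": 55.0,
    "PediaMind-R1 (SFT only)": 62.0,
    "PediaMind-R1 (SFT+GRPO)": 67.0,
}

for model, acc in ACCURACY.items():
    correct = round(acc / 100 * TOTAL)  # implied number of correct answers
    low, high = wilson_interval(correct, TOTAL)
    print(f"{model}: {correct}/{TOTAL} correct, 95% CI {low:.3f}-{high:.3f}")
```
        </verbatim>
        <p>At this sample size the intervals extend several percentage points on each side of the point estimate, which is in line with the caveat on significance testing in the Limitations section.</p>
      </para>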
    </subsection>
    <subsection inlist="toc" xml:id="S4.SS2">
      <tags>
        <tag>4.2</tag>
        <tag role="autoref">subsection 4.2</tag>
        <tag role="refnum">4.2</tag>
        <tag role="typerefnum">§4.2</tag>
      </tags>
      <title><tag close=" ">4.2</tag>Human Assessment</title>
      <para xml:id="S4.SS2.p1">
        <p>We designed 100 scenario-based queries covering diverse infant temperament types and caregiving situations. This evaluation complements the accuracy benchmark by capturing qualitative aspects beyond correctness, such as psychological alignment and caregiving suitability. Three domain experts (developmental psychology PhD, pediatric nursing MSc, and artificial intelligence MSc) conducted a blinded evaluation in which anonymized outputs from the three compared models (the untuned baseline and the two PediaMind-R1 variants) were presented side-by-side in randomized order. Each answer was independently rated on (a) knowledge correctness, (b) psychological alignment, and (c) caregiving suitability, using a 0–1 scale. Final scores were computed by averaging the three expert ratings for each dimension. Inter-rater agreement among the three experts reached 0.81 (Cohen’s <Math mode="inline" tex="\kappa" text="kappa" xml:id="S4.SS2.p1.m1">
            <XMath>
              <XMTok font="italic" name="kappa" role="UNKNOWN">κ</XMTok>
            </XMath>
          </Math>), indicating substantial consistency and reliability of the evaluation process.</p>
      </para>
      <table inlist="lot" labels="LABEL:tab:expert_ablation" placement="htbp" xml:id="S4.T2">
        <tags>
          <tag><text fontsize="50%">Table 2</text></tag>
          <tag role="autoref"><text fontsize="50%">Table 2</text></tag>
          <tag role="refnum"><text fontsize="50%">2</text></tag>
          <tag role="typerefnum"><text fontsize="50%">Table 2</text></tag>
        </tags>
<!--  %Reduce the overall font size 
     %Reduce the column spacing
     %Loosen the row spacing slightly-->        <tabular class="ltx_centering ltx_guessed_headers" rowsep="0.5pt" vattach="middle">
          <thead>
            <tr>
              <td align="left" border="tt" thead="column row"><text font="bold" fontsize="50%">Model</text></td>
              <td align="center" border="tt" thead="column"><text font="bold" fontsize="50%">Knowledge</text></td>
              <td align="center" border="tt" thead="column"><text font="bold" fontsize="50%">Psych. Align.</text></td>
              <td align="center" border="tt" thead="column"><text font="bold" fontsize="50%">Caregiving</text></td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="t" thead="row"><text fontsize="50%">Qwen2.5-7B-Instruct (untuned)</text></td>
              <td align="center" border="t"><text fontsize="50%">0.68</text></td>
              <td align="center" border="t"><text fontsize="50%">0.68</text></td>
              <td align="center" border="t"><text fontsize="50%">0.75</text></td>
            </tr>
            <tr>
              <td align="left" thead="row"><text fontsize="50%">PediaMind-R1 (SFT only)</text></td>
              <td align="center"><text fontsize="50%">0.66</text></td>
              <td align="center"><text fontsize="50%">0.88</text></td>
              <td align="center"><text fontsize="50%">0.83</text></td>
            </tr>
            <tr>
              <td align="left" border="bb" thead="row"><text fontsize="50%">PediaMind-R1 (SFT+GRPO)</text></td>
              <td align="center" border="bb"><text font="bold" fontsize="50%">0.72</text></td>
              <td align="center" border="bb"><text font="bold" fontsize="50%">0.92</text></td>
              <td align="center" border="bb"><text font="bold" fontsize="50%">0.88</text></td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" "><text fontsize="50%">2</text></tag><text fontsize="50%">Expert evaluation results on a 0–1 scale across 100 scenario-based queries, covering knowledge correctness, psychological alignment, and caregiving suitability.</text></toccaption>
        <caption class="ltx_centering" fontsize="50%"><tag close=": ">Table 2</tag>Expert evaluation results on a 0–1 scale across 100 scenario-based queries, covering knowledge correctness, psychological alignment, and caregiving suitability.</caption>
      </table>
      <para xml:id="S4.SS2.p2">
        <p>As shown in Table <ref labelref="LABEL:tab:expert_ablation"/>, SFT improved the model’s temperament awareness, raising psychological alignment from 0.68 to 0.88 and caregiving suitability from 0.75 to 0.83. Knowledge correctness, however, dipped slightly (0.68 to 0.66): some answers had weak query relevance, suggesting an overemphasis on structured formats. GRPO addressed this limitation by rewarding content fidelity, lifting knowledge correctness to 0.72 and further improving psychological alignment (0.92) and caregiving suitability (0.88).</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S5">
    <tags>
      <tag>5</tag>
      <tag role="autoref">section 5</tag>
      <tag role="refnum">5</tag>
      <tag role="typerefnum">§5</tag>
    </tags>
    <title><tag close=" ">5</tag>Conclusion</title>
    <para xml:id="S5.p1">
      <p>Our study shows that combining psychological profiling with reward-guided training enables effective personalization in infant care. PediaMind-R1 gained temperament reasoning via SFT, while GRPO refined robustness and psychological alignment. This two-stage pipeline offers a reliable framework for extending temperament-sensitive personalization to domains such as healthcare and education. Beyond infancy, the approach highlights the potential of integrating cognitive modeling with reinforcement-based alignment to support other sensitive, user-centered applications requiring both accuracy and empathy.</p>
    </para>
  </section>
  <section xml:id="Sx1">
    <title>Limitations</title>
    <para xml:id="Sx1.p1">
      <p>Our approach relies on caregiver-provided temperament assessments, which may vary in accuracy due to potential reporting bias. While this study focuses on temperament reasoning accuracy, the supervised dataset remains relatively small, and we employ only the classical Thomas–Chess framework, which may limit coverage of modern temperament models. Moreover, our reward design uses largely discrete signals, and no formal significance testing was conducted due to the modest benchmark size, leaving both finer-grained optimization and broader validation for future work.</p>
    </para>
  </section>
  <section xml:id="Sx2">
    <title>Acknowledgments</title>
    <para xml:id="Sx2.p1">
      <p>This research was supported by Fudan University and Bosch China, whose funding and expert guidance were essential to this work.</p>
      <pagination role="newpage"/>
    </para>
  </section>
  <bibliography citestyle="authoryear" files="custom" xml:id="bib">
    <title>References</title>
  </bibliography>
  <pagination role="newpage"/>
  <appendix inlist="toc" labels="LABEL:sec:appendixA" xml:id="A1">
    <tags>
      <tag>Appendix A</tag>
      <tag role="autoref">Appendix A</tag>
      <tag role="refnum">A</tag>
      <tag role="typerefnum">Appendix A</tag>
    </tags>
    <title><tag close=" ">Appendix A</tag>Background on the Thomas–Chess Temperament Framework</title>
    <toctitle><tag close=" ">A</tag>Background on the Thomas–Chess Temperament Framework</toctitle>
    <para xml:id="A1.p1">
      <p>The Thomas–Chess framework <cite class="ltx_citemacro_citep">(<bibref bibrefs="thomas1977temperament" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite> is a seminal model in developmental psychology based on the New York Longitudinal Study (NYLS), initiated in the 1950s, which tracked infants over time to identify stable temperament profiles. From nine behavioral dimensions (including activity level, adaptability, mood quality, and attention persistence), Thomas and Chess distilled three predominant temperament categories (Easy, Difficult, and Slow-to-Warm-Up):</p>
    </para>
    <para xml:id="A1.p2">
      <itemize xml:id="A1.I1">
        <item xml:id="A1.I1.i1">
          <tags>
            <tag>•</tag>
            <tag role="autoref">item </tag>
            <tag role="typerefnum">1st item</tag>
          </tags>
          <para xml:id="A1.I1.i1.p1">
            <p><text font="bold">Easy</text>: Regular biological rhythms, adaptable responses to environmental changes, and predominantly positive affective tone.</p>
          </para>
        </item>
        <item xml:id="A1.I1.i2">
          <tags>
            <tag>•</tag>
            <tag role="autoref">item </tag>
            <tag role="typerefnum">2nd item</tag>
          </tags>
          <para xml:id="A1.I1.i2.p1">
            <p><text font="bold">Difficult</text>: Irregular biological patterns, low adaptability, high withdrawal or negative emotionality.</p>
          </para>
        </item>
        <item xml:id="A1.I1.i3">
          <tags>
            <tag>•</tag>
            <tag role="autoref">item </tag>
            <tag role="typerefnum">3rd item</tag>
          </tags>
          <para xml:id="A1.I1.i3.p1">
            <p><text font="bold">Slow-to-Warm-Up</text>: Low activity levels, initial withdrawal when confronted with novelty, but gradual habituation with repeated exposure.</p>
          </para>
        </item>
      </itemize>
    </para>
    <para xml:id="A1.p3">
      <p>Importantly, approximately 35% of infants did not fit neatly into these three clusters and were categorized as exhibiting a Mixed temperament, that is, a combination of traits crossing multiple dimensions without dominance of any single profile.</p>
    </para>
    <para xml:id="A1.p4">
      <p>Subsequent work by Carey <cite class="ltx_citemacro_citep">(<bibref bibrefs="carey2004temperament" separator=";" show="AuthorsPhrase1Year" yyseparator=",">
            <bibrefphrase>, </bibrefphrase>
          </bibref>)</cite> enhanced the clinical and pediatric application of this model, emphasizing its utility for individualized parenting strategies. The Thomas–Chess typology has also demonstrated predictive validity for socio-emotional and behavioral outcomes later in childhood.</p>
    </para>
    <para xml:id="A1.p5">
      <p>From a computational perspective, the Thomas–Chess framework provides a structured taxonomy of temperament-relevant traits that can be directly operationalized as features or labels in machine learning systems. In PediaMind-R1, we encode temperament profiles—including mixed-type assessments—as conditioning signals, thereby enabling the model to deliver tailored, psychologically grounded recommendations that maintain interpretability and theoretical rigor.</p>
    </para>
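    <para>
      <p>One simple way to realize such a conditioning signal is to attach the temperament label to the scenario text, mirroring the "Scenario ... Temperament: slow-to-warm-up" layout of the benchmark example in Section 4.1. The enum, the function name, and the prompt template below are hypothetical; the paper does not specify the actual encoding.</p>
      <verbatim>
```python
from enum import Enum


class Temperament(Enum):
    """Thomas-Chess categories plus the Mixed profile described above."""

    EASY = "easy"
    DIFFICULT = "difficult"
    SLOW_TO_WARM_UP = "slow-to-warm-up"
    MIXED = "mixed"


def build_conditioned_prompt(scenario: str, temperament: Temperament) -> str:
    """Append the temperament label to the caregiver's scenario as plain text."""
    return f"Scenario: {scenario.strip()} Temperament: {temperament.value}"


prompt = build_conditioned_prompt(
    "When guests visit the home, my child hides in their room.",
    Temperament.SLOW_TO_WARM_UP,
)
print(prompt)
```
      </verbatim>
      <p>Keeping the label as readable text, rather than an opaque identifier, is one way to preserve the interpretability mentioned above.</p>
    </para>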
  </appendix>
  <appendix inlist="toc" xml:id="A2">
    <tags>
      <tag>Appendix B</tag>
      <tag role="autoref">Appendix B</tag>
      <tag role="refnum">B</tag>
      <tag role="typerefnum">Appendix B</tag>
    </tags>
    <title><tag close=" ">Appendix B</tag>Details of Training Setup</title>
    <toctitle><tag close=" ">B</tag>Details of Training Setup</toctitle>
    <para xml:id="A2.p1">
      <p>We provide detailed training configurations for both the Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) phases of PediaMind-R1. During the SFT stage, we fine-tuned the base model (Qwen2.5-7B-Instruct) using LoRA adaptation. In the RL stage, we employed Group Relative Policy Optimization (GRPO) with a rollout group size of 4, enabling the model to compare multiple candidate outputs for each prompt and optimize relative to the group average.</p>
    </para>
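    <para>
      <p>The group-relative comparison can be sketched as follows. The example assumes the commonly used GRPO formulation in which each rollout's reward is centered on the group mean and scaled by the group standard deviation; the function name, the <text font="typewriter">eps</text> term, and the sample rewards are illustrative, and the exact normalization used for PediaMind-R1 is not stated in the text.</p>
      <verbatim>
```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Center each reward on the group mean and scale by the group standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the rollout group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for the 4 rollouts sampled for one prompt.
rewards = [1.0, 0.5, 0.0, 0.5]
print(group_relative_advantages(rewards))
```
      </verbatim>
      <p>Rollouts scoring above the group mean receive positive advantages and those below receive negative ones; if all four rollouts score the same, every advantage is zero and the prompt contributes no policy-gradient signal.</p>
    </para>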
    <para xml:id="A2.p2">
      <p>All experiments were conducted on an 8<Math mode="inline" tex="\times" text="*" xml:id="A2.p2.m1">
          <XMath>
            <XMTok meaning="times" role="MULOP">×</XMTok>
          </XMath>
        </Math>80GB NVIDIA A100 GPU platform. Key hyperparameters for both SFT and GRPO stages are summarized in Table <ref labelref="LABEL:tab:training_params"/>.</p>
    </para>
    <table inlist="lot" labels="LABEL:tab:training_params" placement="htbp" xml:id="A2.T3">
      <tags>
        <tag>Table 3</tag>
        <tag role="autoref">Table 3</tag>
        <tag role="refnum">3</tag>
        <tag role="typerefnum">Table 3</tag>
      </tags>
      <block align="center" depth="0.0pt" width="346.9pt">
        <tabular vattach="middle">
          <tbody>
            <tr>
              <td align="left" border="tt"><text font="bold">Parameter</text></td>
              <td align="center" border="tt"><text font="bold">SFT</text></td>
              <td align="center" border="tt"><text font="bold">RL (GRPO)</text></td>
            </tr>
            <tr>
              <td align="left" border="t">Batch Size (per device)</td>
              <td align="center" border="t">2</td>
              <td align="center" border="t">4</td>
            </tr>
            <tr>
              <td align="left">Gradient Accumulation</td>
              <td align="center">2</td>
              <td align="center">8</td>
            </tr>
            <tr>
              <td align="left">Global Batch Size</td>
              <td align="center">32</td>
              <td align="center">256</td>
            </tr>
            <tr>
              <td align="left">Epochs</td>
              <td align="center">5</td>
              <td align="center">3</td>
            </tr>
            <tr>
              <td align="left">Learning Rate</td>
              <td align="center">2.0e-5</td>
              <td align="center">1.0e-6</td>
            </tr>
            <tr>
              <td align="left">Warmup Ratio</td>
              <td align="center">–</td>
              <td align="center">0.03</td>
            </tr>
            <tr>
              <td align="left">Max Prompt Length</td>
              <td align="center">–</td>
              <td align="center">512</td>
            </tr>
            <tr>
              <td align="left">Max Completion Length</td>
              <td align="center">–</td>
              <td align="center">1024</td>
            </tr>
            <tr>
              <td align="left">Max Sequence Length</td>
              <td align="center">1024</td>
              <td align="center">1024</td>
            </tr>
            <tr>
              <td align="left">Optimizer</td>
              <td align="center">AdamW</td>
              <td align="center">AdamW</td>
            </tr>
            <tr>
              <td align="left">Adam <Math mode="inline" tex="\beta_{1}" text="beta _ 1" xml:id="A2.T3.m1">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                      <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                    </XMApp>
                  </XMath>
                </Math> / <Math mode="inline" tex="\beta_{2}" text="beta _ 2" xml:id="A2.T3.m2">
                  <XMath>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                      <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                    </XMApp>
                  </XMath>
                </Math></td>
              <td align="center">–</td>
              <td align="center">0.9 / 0.99</td>
            </tr>
            <tr>
              <td align="left">Weight Decay</td>
              <td align="center">0.01</td>
              <td align="center">0.1</td>
            </tr>
            <tr>
              <td align="left">LR Scheduler</td>
              <td align="center">Cosine</td>
              <td align="center">Cosine</td>
            </tr>
            <tr>
              <td align="left">Gradient Checkpointing</td>
              <td align="center">True</td>
              <td align="center">–</td>
            </tr>
            <tr>
              <td align="left">Evaluation / Logging Steps</td>
              <td align="center">10</td>
              <td align="center">10</td>
            </tr>
            <tr>
              <td align="left">Save Steps</td>
              <td align="center">100</td>
              <td align="center">20</td>
            </tr>
            <tr>
              <td align="left">Save Total Limit</td>
              <td align="center">3</td>
              <td align="center">–</td>
            </tr>
            <tr>
              <td align="left">Max Grad Norm</td>
              <td align="center">–</td>
              <td align="center">0.5</td>
            </tr>
            <tr>
              <td align="left">Temperature</td>
              <td align="center">–</td>
              <td align="center">1.0</td>
            </tr>
            <tr>
              <td align="left">Rollout Generations</td>
              <td align="center">–</td>
              <td align="center">4</td>
            </tr>
            <tr>
              <td align="left">KL Coefficient (<Math mode="inline" tex="\beta" text="beta" xml:id="A2.T3.m3">
                  <XMath>
                    <XMTok font="italic" name="beta" role="UNKNOWN">β</XMTok>
                  </XMath>
                </Math>)</td>
              <td align="center">–</td>
              <td align="center">0.005</td>
            </tr>
            <tr>
              <td align="left" border="bb">Precision</td>
              <td align="center" border="bb">bfloat16</td>
              <td align="center" border="bb">bfloat16</td>
            </tr>
          </tbody>
        </tabular>
      </block>
      <toccaption class="ltx_centering"><tag close=" ">3</tag>Training hyperparameters for the SFT and GRPO stages. Both stages were run on 8<Math mode="inline" tex="\times" text="*" xml:id="A2.T3.m4">
          <XMath>
            <XMTok meaning="times" role="MULOP">×</XMTok>
          </XMath>
        </Math>80GB NVIDIA A100 GPUs.</toccaption>
      <caption class="ltx_centering"><tag close=": ">Table 3</tag>Training hyperparameters for the SFT and GRPO stages. Both stages were run on 8<Math mode="inline" tex="\times" text="*" xml:id="A2.T3.m5">
          <XMath>
            <XMTok meaning="times" role="MULOP">×</XMTok>
          </XMath>
        </Math>80GB NVIDIA A100 GPUs.</caption>
    </table>
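    <para>
      <p>To make the Rollout Generations row concrete: in GRPO, each prompt is sampled several times and every sampled response is scored relative to the other responses in its group. The plain-Python sketch below normalizes rewards within one group of four rollouts. The reward values, the function name <text font="typewriter">group_relative_advantages</text>, and the use of the population standard deviation are assumptions made for illustration, not details reported in this paper.</p>
      <verbatim>
```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one rollout group (GRPO-style sketch)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see lead-in
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical reward scores, matching Rollout Generations = 4.
    rewards = [1.0, 0.0, 0.5, 0.5]
    print(group_relative_advantages(rewards))
```
      </verbatim>
      <p>Responses that score above the group mean receive a positive advantage and those below it a negative one, so the signal depends only on how a response compares with the other rollouts for the same prompt.</p>
    </para>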
  </appendix>
  <appendix inlist="toc" labels="LABEL:sec:appendixC" xml:id="A3">
    <tags>
      <tag>Appendix C</tag>
      <tag role="autoref">Appendix C</tag>
      <tag role="refnum">C</tag>
      <tag role="typerefnum">Appendix C</tag>
    </tags>
    <title><tag close=" ">Appendix C</tag>Temperament Knowledge Graph</title>
    <toctitle><tag close=" ">C</tag>Temperament Knowledge Graph</toctitle>
    <para xml:id="A3.p1">
      <p>To provide structured support for temperament-aware reasoning, we summarize the Thomas–Chess temperament framework as a knowledge graph (see Table <ref labelref="LABEL:tab:temperament-graph"/>). The design follows the principle of Goodness of Fit: parenting success depends not on changing the child’s temperament, but on adapting caregiving practices and environments to match it.</p>
    </para>
    <table inlist="lot" labels="LABEL:tab:temperament-graph" placement="htbp" xml:id="A3.T4">
      <tags>
        <tag><text fontsize="70%">Table 4</text></tag>
        <tag role="autoref"><text fontsize="70%">Table 4</text></tag>
        <tag role="refnum"><text fontsize="70%">4</text></tag>
        <tag role="typerefnum"><text fontsize="70%">Table 4</text></tag>
      </tags>
      <tabular class="ltx_centering ltx_guessed_headers" colsep="5.0pt" rowsep="3.5pt" vattach="middle">
        <thead>
          <tr>
            <td align="justify" border="tt" thead="column" width="91.1pt"><text class="ltx_wrap" font="bold" fontsize="70%">Easy Child (<Math mode="inline" tex="\approx 40\%" text="absent approximately-equals 40percent" xml:id="A3.T4.m1">
                  <XMath>
                    <XMApp>
                      <XMTok font="medium" meaning="approximately-equals" name="approx" role="RELOP">≈</XMTok>
                      <XMTok meaning="absent"/>
                      <XMApp>
                        <XMTok font="medium" meaning="percent" role="POSTFIX">%</XMTok>
                        <XMTok font="medium" meaning="40" role="NUMBER">40</XMTok>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math>)</text></td>
            <td align="justify" border="tt" thead="column" width="91.1pt"><text class="ltx_wrap" font="bold" fontsize="70%">Difficult Child (<Math mode="inline" tex="\approx 10\%" text="absent approximately-equals 10percent" xml:id="A3.T4.m2">
                  <XMath>
                    <XMApp>
                      <XMTok font="medium" meaning="approximately-equals" name="approx" role="RELOP">≈</XMTok>
                      <XMTok meaning="absent"/>
                      <XMApp>
                        <XMTok font="medium" meaning="percent" role="POSTFIX">%</XMTok>
                        <XMTok font="medium" meaning="10" role="NUMBER">10</XMTok>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math>)</text></td>
            <td align="justify" border="tt" thead="column" width="91.1pt"><text class="ltx_wrap" font="bold" fontsize="70%">Slow-to-Warm-Up (<Math mode="inline" tex="\approx 15\%" text="absent approximately-equals 15percent" xml:id="A3.T4.m3">
                  <XMath>
                    <XMApp>
                      <XMTok font="medium" meaning="approximately-equals" name="approx" role="RELOP">≈</XMTok>
                      <XMTok meaning="absent"/>
                      <XMApp>
                        <XMTok font="medium" meaning="percent" role="POSTFIX">%</XMTok>
                        <XMTok font="medium" meaning="15" role="NUMBER">15</XMTok>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math>)</text></td>
            <td align="justify" border="tt" thead="column" width="91.1pt"><text class="ltx_wrap" font="bold" fontsize="70%">Mixed Type (<Math mode="inline" tex="\approx 35\%" text="absent approximately-equals 35percent" xml:id="A3.T4.m4">
                  <XMath>
                    <XMApp>
                      <XMTok font="medium" meaning="approximately-equals" name="approx" role="RELOP">≈</XMTok>
                      <XMTok meaning="absent"/>
                      <XMApp>
                        <XMTok font="medium" meaning="percent" role="POSTFIX">%</XMTok>
                        <XMTok font="medium" meaning="35" role="NUMBER">35</XMTok>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math>)</text></td>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="justify" border="t" width="91.1pt"><text font="bold" fontsize="70%">Traits:</text><text fontsize="70%"> Regular biological rhythms, quick adaptability, moderate reaction intensity, generally cheerful. Children usually adjust smoothly to changes in routines or environments.</text></td>
            <td align="justify" border="t" width="91.1pt"><text font="bold" fontsize="70%">Traits:</text><text fontsize="70%"> Irregular biological rhythms, low adaptability, high withdrawal, intense reactions, more frequent negative mood. These children are easily upset by unfamiliar events or transitions.</text></td>
            <td align="justify" border="t" width="91.1pt"><text font="bold" fontsize="70%">Traits:</text><text fontsize="70%"> Low activity level, initial avoidance of novelty, slow adaptability, mild emotional expressions. They often observe cautiously before joining new activities.</text></td>
            <td align="justify" border="t" width="91.1pt"><text font="bold" fontsize="70%">Traits:</text><text fontsize="70%"> A mixture of traits from the other three types, with context-dependent reactions. No single temperament trait dominates, making responses less predictable.</text></td>
          </tr>
          <tr>
            <td align="justify" border="bb" width="91.1pt"><text font="bold" fontsize="70%">Advice:</text><text fontsize="70%">
1. Do not neglect needs due to “easy” behavior.
2. Attend to subtle emotional signals.
</text><text class="ltx_wrap" font="italic" fontsize="70%">Example: Even if the child plays quietly, schedule regular check-ins for comfort and engagement.</text></td>
            <td align="justify" border="bb" width="91.1pt"><text font="bold" fontsize="70%">Advice:</text><text fontsize="70%">
1. Maintain calm, consistent parental emotions.
2. Create a predictable, structured daily routine.
</text><text class="ltx_wrap" font="italic" fontsize="70%">Example: Use a visual daily schedule chart to help the child know what comes next and reduce anxiety.</text></td>
            <td align="justify" border="bb" width="91.1pt"><text font="bold" fontsize="70%">Advice:</text><text fontsize="70%">
1. Avoid forcing or rushing into new situations.
2. Act as a “secure base” by modeling positive interaction.
3. Reframe traits positively (e.g., “cautious” instead of “shy”).
</text><text class="ltx_wrap" font="italic" fontsize="70%">Example: Introduce new environments gradually—first observing with a parent, then gently participating.</text></td>
            <td align="justify" border="bb" width="91.1pt"><text font="bold" fontsize="70%">Advice:</text><text fontsize="70%">
1. Apply a situational “deconstruction” approach: assess traits per context.
2. Flexibly switch strategies to achieve dynamic fit.
</text><text class="ltx_wrap" font="italic" fontsize="70%">Example: If the child shows Easy-type reactions at home but Slow-to-Warm-Up at school, adjust parenting accordingly.</text></td>
          </tr>
        </tbody>
      </tabular>
      <toccaption class="ltx_centering"><tag close=" "><text fontsize="70%">4</text></tag><text fontsize="70%">Knowledge graph of infant temperament categories and caregiving strategies, based on the Thomas–Chess framework. Each type is illustrated with traits, tailored advice, and practical examples, all emphasizing the principle of Goodness of Fit.</text></toccaption>
      <caption class="ltx_centering" fontsize="70%"><tag close=": ">Table 4</tag>Knowledge graph of infant temperament categories and caregiving strategies, based on the Thomas–Chess framework. Each type is illustrated with traits, tailored advice, and practical examples, all emphasizing the principle of Goodness of Fit.</caption>
    </table>
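    <para>
      <p>As a rough illustration of how the entries in Table <ref labelref="LABEL:tab:temperament-graph"/> could be held in machine-readable form, the following Python sketch stores each temperament type with its approximate share and advice items, and looks up advice by type. It is a minimal sketch assuming a plain dictionary representation; <text font="typewriter">TEMPERAMENT_GRAPH</text> and <text font="typewriter">advice_for</text> are hypothetical names introduced here for illustration.</p>
      <verbatim>
```python
# Illustrative sketch only: a dictionary view of the temperament table.
# TEMPERAMENT_GRAPH and advice_for are hypothetical names.
TEMPERAMENT_GRAPH = {
    "Easy Child": {
        "share": 0.40,  # approximate share listed in the table header
        "advice": [
            "Do not neglect needs due to 'easy' behavior.",
            "Attend to subtle emotional signals.",
        ],
    },
    "Difficult Child": {
        "share": 0.10,
        "advice": [
            "Maintain calm, consistent parental emotions.",
            "Create a predictable, structured daily routine.",
        ],
    },
    "Slow-to-Warm-Up": {
        "share": 0.15,
        "advice": [
            "Avoid forcing or rushing into new situations.",
            "Act as a 'secure base' by modeling positive interaction.",
            "Reframe traits positively (e.g., 'cautious' instead of 'shy').",
        ],
    },
    "Mixed Type": {
        "share": 0.35,
        "advice": [
            "Apply a situational 'deconstruction' approach: assess traits per context.",
            "Flexibly switch strategies to achieve dynamic fit.",
        ],
    },
}


def advice_for(temperament):
    """Return the advice list for a temperament type (KeyError if unknown)."""
    return TEMPERAMENT_GRAPH[temperament]["advice"]


if __name__ == "__main__":
    for line in advice_for("Difficult Child"):
        print(line)
```
      </verbatim>
      <p>Keeping the advice keyed by temperament type mirrors the column layout of the table, so a lookup for one type never mixes in strategies intended for another.</p>
    </para>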
  </appendix>
  <appendix inlist="toc" labels="LABEL:sec:appendixD" xml:id="A4">
    <tags>
      <tag>Appendix D</tag>
      <tag role="autoref">Appendix D</tag>
      <tag role="refnum">D</tag>
      <tag role="typerefnum">Appendix D</tag>
    </tags>
    <title><tag close=" ">Appendix D</tag>Example of PediaMind-R1 Response</title>
    <toctitle><tag close=" ">D</tag>Example of PediaMind-R1 Response</toctitle>
    <para xml:id="A4.p1">
      <p>To illustrate the practical use of our system, Figure <ref labelref="LABEL:fig:example"/> presents an example query and the corresponding response from PediaMind-R1 (fine-tuned with SFT and GRPO). The example shows how the temperament profile (here, <text font="italic">Difficult Child</text>) guides the reasoning process.</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:example" placement="htbp" xml:id="A4.F3">
      <tags>
        <tag>Figure 3</tag>
        <tag role="autoref">Figure 3</tag>
        <tag role="refnum">3</tag>
        <tag role="typerefnum">Figure 3</tag>
      </tags>
      <graphics candidates="example.png" class="ltx_centering" graphic="example.png" options="width=390.258pt" xml:id="A4.F3.g1"/>
      <toccaption class="ltx_centering"><tag close=" ">3</tag>PediaMind-R1 response example for a child with a difficult temperament.</toccaption>
      <caption class="ltx_centering"><tag close=": ">Figure 3</tag>PediaMind-R1 response example for a child with a difficult temperament.</caption>
    </figure>
  </appendix>
</document>
