<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/japhy/scienceReplication.artiswrong.com/paper_files/arxiv/2512.12208/latex_extracted"?>
<?latexml class="IEEEtran" options="lettersize,journal"?>
<?latexml package="amsmath,amsfonts"?>
<?latexml package="algorithmicx"?>
<?latexml package="algcompatible"?>
<?latexml package="algorithm"?>
<?latexml package="array"?>
<?latexml package="subfig" options="caption=false,font=normalsize,labelfont=sf,textfont=sf"?>
<?latexml package="textcomp"?>
<?latexml package="stfloats"?>
<?latexml package="url"?>
<?latexml package="verbatim"?>
<?latexml package="graphicx"?>
<?latexml package="inputenc" options="utf8"?>
<?latexml package="newunicodechar"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<?latexml package="setspace"?>
<?latexml package="cite"?>
<?latexml package="enumitem"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML" class="ltx_authors_1line">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <title>A Hybrid Deep Learning Framework for Emotion Recognition in Children with Autism During NAO Robot-Mediated Interaction</title>
  <creator role="author">
    <personname>
Indranil Bhattacharjee<sup>1</sup>,
Vartika Narayani Srinet<sup>2</sup>,
Anirudha Bhattacharjee<sup>2</sup>,
Braj Bhushan<sup>2</sup>,
Bishakh Bhattacharya<sup>2,*</sup> <break/><sup>1</sup>Department of Information Technology, School of Engineering,
Cochin University of Science and Technology, Kochi, Kerala, India <break/><sup>2</sup>Indian Institute of Technology Kanpur, Uttar Pradesh, India <break/>Emails: indranil@ug.cusat.ac.in, vartikana23@iitk.ac.in, anirub@iitk.ac.in, brajb@iitk.ac.in, *bishakh@iitk.ac.in
</personname>
  </creator>
  <abstract name="Abstract">
    <p>Understanding emotional responses in children with Autism Spectrum Disorder (ASD) during social interaction remains a critical challenge in both developmental psychology and human-robot interaction. This study presents a novel deep learning pipeline for emotion recognition in autistic children in response to a name-calling event by a humanoid robot (NAO), under controlled experimental settings. The dataset comprises around 50,000 facial frames extracted from video recordings of 15 children with ASD. A hybrid model combines a fine-tuned ResNet-50-based Convolutional Neural Network (CNN) with a three-layer Graph Convolutional Network (GCN) and is trained on both visual features and geometric features extracted from MediaPipe FaceMesh landmarks. Emotions were probabilistically labeled using a weighted ensemble of two models, DeepFace and FER, each contributing to soft-label generation across seven emotion classes. Final classification leveraged a fused embedding optimized via Kullback-Leibler divergence. The proposed method demonstrates robust performance in modeling subtle affective responses and offers significant promise for affective profiling of children with ASD in clinical and therapeutic human-robot interaction contexts, as the pipeline effectively captures micro-emotional cues in neurodivergent children, addressing a major gap in autism-specific HRI research. This work represents the first such large-scale, real-world dataset and pipeline from India on autism-focused emotion analysis using social robotics, contributing an essential foundation for future personalized assistive technologies.</p>
  </abstract>
  <keywords>Autism, NAO, Child-Robot Interaction, Emotion analysis, ResNet-50, GCN, DeepFace, Mini-Xception, FER.</keywords>
<!--  %**** main.tex Line 50 **** -->  <section inlist="toc" xml:id="S1">
    <tags>
      <tag>I</tag>
      <tag role="refnum">I</tag>
      <tag role="typerefnum">§I</tag>
    </tags>
    <title><tag close=" ">I</tag><text font="smallcaps">Introduction</text></title>
    <para xml:id="S1.p1">
      <p>The NAO humanoid robot, developed by SoftBank Robotics and standing 58 cm tall with 25 degrees of freedom, is widely used in educational and therapeutic environments because of its semi-anthropomorphic appearance and programmable capabilities. Globally, NAO has been applied in diverse contexts ranging from children’s education to autism interventions, yet its deployment in India remains sparse, which presents a significant opportunity for strengthening socio-cognitive support through technology-enhanced methods.</p>
    </para>
    <para xml:id="S1.p2">
      <p>Amid growing concerns that excessive screen time and digital media consumption may impact children’s attentional capacities, there is increasing interest in robot-mediated interventions as proactive tools to foster engagement and learning. In children with Autism Spectrum Disorder (ASD), one of the hallmark early markers is a delayed or absent response to name-calling, a clinical indicator frequently used in diagnostic assessments. Evidence indicates that children with ASD demonstrate heightened responsiveness and engagement when interacting with robotic agents <cite class="ltx_citemacro_cite">[<bibref bibrefs="Rudovic2018" separator="," yyseparator=","/>]</cite>, positioning Socially Assistive Robots (SARs) like NAO as promising platforms for eliciting measurable socio-behavioral responses.</p>
    </para>
    <para xml:id="S1.p3">
      <p>While response-to-name (RTN) paradigms have previously been explored within ASD diagnostic protocols, their integration with robust, deep-learning-based emotion detection, especially approaches that combine facial appearance with geometric landmark data, has not been fully realized. Conventional approaches tend to rely on either texture-based convolutional models, which may miss subtle expressions, or landmark sequences, which fail to account for global affective context, as discussed in <cite class="ltx_citemacro_cite">[<bibref bibrefs="li2018microexpression" separator="," yyseparator=","/>]</cite>.</p>
    </para>
    <para xml:id="S1.p4">
      <p>To address this limitation, we propose a novel hybrid CNN–GCN architecture, named Fusion-N, capable of extracting and fusing multi-scale emotional cues from both RGB imagery and facial landmarks simultaneously, as shown in Fig. 1. Our pipeline leverages ensemble-derived soft labels from the DeepFace and FER models, enabling probabilistic training that models emotion ambiguity and anticipates ASD-specific expression patterns. We evaluated this approach on a dataset comprising almost 50,000 high-resolution frames obtained from 15 children with ASD during NAO-mediated RTN tasks and demonstrated its efficacy in accurately classifying nuanced emotion states, including fear and disgust, which are typically underrepresented in ASD datasets. This methodology contributes to the fields of affective computing, human-robot interaction, and computational neuro-psychology by introducing a multimodal framework for assessing emotion recognition in vulnerable developmental cohorts.</p>
    </para>
  </section>
  <section inlist="toc" xml:id="S2">
    <tags>
      <tag>II</tag>
      <tag role="refnum">II</tag>
      <tag role="typerefnum">§II</tag>
    </tags>
    <title><tag close=" ">II</tag><text font="smallcaps">Related Work</text></title>
    <para xml:id="S2.p1">
      <p>Facial expression recognition (FER) has long been a cornerstone in affective computing and human-computer interaction. Among the most widely adopted face detection pipelines is the Multi-task Cascaded Convolutional Neural Network (MTCNN) framework by Zhang et al. <cite class="ltx_citemacro_cite">[<bibref bibrefs="zhang2016joint" separator="," yyseparator=","/>]</cite>, which remains a benchmark for real-time face detection and alignment due to its efficiency in bounding-box regression and landmark localization.</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:unimodal_pipeline" placement="h" xml:id="S2.F1">
      <tags>
        <tag>Fig. 1</tag>
        <tag role="refnum">1</tag>
        <tag role="typerefnum">Fig. 1</tag>
      </tags>
      <graphics candidates="cnn-gcn.pdf" class="ltx_centering" graphic="cnn-gcn.pdf" options="width=252.945pt" xml:id="S2.F1.g1"/>
      <toccaption class="ltx_centering"><tag close=" ">1</tag>A top-level view of our multimodal pipeline, Fusion-N, a novel hybrid framework built from ResNet-50 and a GCN.</toccaption>
      <caption class="ltx_centering"><tag close=": ">Fig. 1</tag>A top-level view of our multimodal pipeline, Fusion-N, a novel hybrid framework built from ResNet-50 and a GCN.</caption>
    </figure>
<!--  %**** main.tex Line 75 **** -->    <para xml:id="S2.p2">
      <p>For facial landmark extraction, Lugaresi et al. <cite class="ltx_citemacro_cite">[<bibref bibrefs="lugaresi2019mediapipe" separator="," yyseparator=","/>]</cite> introduced MediaPipe FaceMesh, which provides dense 3D landmark detection (468 key points), forming a strong basis for extracting geometric and relational facial features and facilitating a more nuanced understanding of facial structure and microexpressions. Graph Convolutional Networks (GCNs) have become another cornerstone in modeling structured data by combining node features with graph topology. A seminal work by Kipf and Welling <cite class="ltx_citemacro_cite">[<bibref bibrefs="kipf2017gcn" separator="," yyseparator=","/>]</cite> introduced the modern GCN architecture, which efficiently performs semi-supervised node classification via layer-wise propagation based on graph Laplacians.
To label emotional states, researchers have increasingly moved beyond single-label supervision to probabilistic soft labels that account for ambiguity and class overlap. The DeepFace library <cite class="ltx_citemacro_cite">[<bibref bibrefs="serengil2020deepface" separator="," yyseparator=","/>]</cite>, with robust backbones such as VGG-Face <cite class="ltx_citemacro_cite">[<bibref bibrefs="parkhi2015deepface" separator="," yyseparator=","/>]</cite> and FaceNet <cite class="ltx_citemacro_cite">[<bibref bibrefs="schroff2015facenet" separator="," yyseparator=","/>]</cite>, has been widely adopted for face recognition, especially on facial datasets characterized by real-world variability. Similarly, Mini-Xception architectures trained on FER-2013 <cite class="ltx_citemacro_cite">[<bibref bibrefs="arriaga2017real" separator="," yyseparator=","/>]</cite> have demonstrated competitive performance with lower computational overhead, making them well suited to ensemble frameworks. These models are particularly helpful in analyzing common human expressions. A recent system, SENSES-ASD <cite class="ltx_citemacro_cite">[<bibref bibrefs="abu2024senses" separator="," yyseparator=","/>]</cite>, used Mini-Xception (trained on FER-2013) for facial emotion recognition in autistic adults and achieved a validation accuracy of approximately 60%.
Integrating DeepFace (Mini-Xception) and FER-based predictions through weighted averaging forms a non-obvious soft-label calibration method that is better suited to neurodivergent datasets, where emotional ambiguity is prevalent.</p>
    </para>
    <para xml:id="S2.p3">
      <p>The increasing use of GCNs has also led to hybrid models that combine image-based CNN features with graph-based structural information. Bin Li and Lima <cite class="ltx_citemacro_cite">[<bibref bibrefs="li2021facial" separator="," yyseparator=","/>]</cite> implemented a ResNet-50-based architecture for facial expression recognition, showcasing its robustness across benchmark datasets. Our model, Fusion-N, integrates a ResNet-50 variant for global semantic extraction and a topology-aware GCN over facial landmarks to generate spatial embeddings. This hybrid architecture demonstrates higher accuracy and better generalization, especially when analyzing subtle or masked emotions such as fear or disgust, which are often underrepresented and harder to detect.</p>
    </para>
    <para xml:id="S2.p4">
      <p>While many studies have focused on emotion recognition in typical populations, relatively few have addressed the unique challenges posed by children with ASD. Prior work <cite class="ltx_citemacro_cite">[<bibref bibrefs="guillon2014emotion" separator="," yyseparator=","/>]</cite> underscored the importance of developing systems that can support or augment emotion recognition capabilities. The role of assistive technologies, particularly humanoid robots such as NAO, has grown significantly in autism research. Robins et al. <cite class="ltx_citemacro_cite">[<bibref bibrefs="robins2004robots" separator="," yyseparator=","/>]</cite> were among the first to demonstrate the potential of robots in engaging children with ASD through structured interactions. Rudovic et al. <cite class="ltx_citemacro_cite">[<bibref bibrefs="Rudovic2018" separator="," yyseparator=","/>]</cite> expanded this domain by introducing personalized machine learning algorithms that enabled robots to adapt to individual emotional patterns in children with ASD.</p>
    </para>
    <para xml:id="S2.p5">
      <p>Studies show that NAO robot interventions have the potential to significantly enhance emotional expressiveness and social engagement in children with ASD. Robot therapy promotes communication in minimally verbal children, increases social engagement through imitation activities, and stimulates better classroom participation than conventional settings <cite class="ltx_citemacro_cite">[<bibref bibrefs="Feil-Seifer2011,Tapus2007,Dautenhahn2005" separator="," yyseparator=","/>]</cite>. This is particularly relevant to name-calling tests, in which a child’s reaction to their own name offers insight into social awareness, attention, and affective states, all of which are important indicators in the early diagnosis of autism. Costescu et al. <cite class="ltx_citemacro_cite">[<bibref bibrefs="costescu2015comparison" separator="," yyseparator=","/>]</cite> similarly showed that children with ASD were more socially responsive when the NAO robot was engaged in imitative play and joint-attention exercises. These results strongly support combining NAO-based interaction paradigms with computationally sophisticated emotion-analysis pipelines that bring together soft-label supervision, dense facial-geometry modeling, and robot-mediated data collection. Such an integration provides a solid framework for studying affective behavior in autistic children in ethically approved, ecologically valid experimental environments.</p>
    </para>
    <figure inlist="lof" labels="LABEL:Figure_1" placement="h" xml:id="S2.F2">
      <tags>
        <tag>Fig. 2</tag>
        <tag role="refnum">2</tag>
        <tag role="typerefnum">Fig. 2</tag>
      </tags>
      <graphics candidates="experimental_setup.pdf" class="ltx_centering" graphic="experimental_setup.pdf" options="width=216.81pt" xml:id="S2.F2.g1"/>
      <toccaption class="ltx_centering"><tag close=" ">2</tag>This figure illustrates the setup of an autistic child engaging in free play in an unbiased environment with NAO and a facilitator seated nearby. </toccaption>
      <caption class="ltx_centering"><tag close=": ">Fig. 2</tag>This figure illustrates the setup of an autistic child engaging in free play in an unbiased environment with NAO and a facilitator seated nearby. </caption>
    </figure>
    <table inlist="lot" labels="LABEL:tab:face_preprocessing" placement="htbp" xml:id="S2.T1">
      <tags>
        <tag>TABLE I</tag>
        <tag role="refnum">I</tag>
        <tag role="typerefnum">TABLE I</tag>
      </tags>
<!--  %**** main.tex Line 100 **** -->      <toccaption class="ltx_centering"><tag close=" ">I</tag>Data Specifications</toccaption>
      <caption class="ltx_centering"><tag close=": ">TABLE I</tag>Data Specifications</caption>
      <tabular class="ltx_centering ltx_guessed_headers" rowsep="2.0pt" vattach="middle">
        <tbody>
          <tr>
            <td align="left" border="l r t" thead="row"><text font="bold">Parameter</text></td>
            <td align="left" border="r t"><text font="bold">Value</text></td>
          </tr>
          <tr>
            <td align="left" border="l r t" thead="row">Subjects</td>
            <td align="left" border="r t">15 children with ASD</td>
          </tr>
          <tr>
            <td align="left" border="l r t" thead="row">Videos</td>
            <td align="left" border="r t">15 (1 per child)</td>
          </tr>
          <tr>
            <td align="left" border="l r t" thead="row">Duration</td>
            <td align="left" border="r t">3–5 minutes per child</td>
          </tr>
          <tr>
            <td align="left" border="l r t" thead="row">Name Called</td>
            <td align="left" border="r t">12 times (randomly spread)</td>
          </tr>
          <tr>
            <td align="left" border="l r t" thead="row">FPS for Processing</td>
            <td align="left" border="r t">15</td>
          </tr>
          <tr>
            <td align="left" border="l r t" thead="row">Frames Extracted</td>
            <td align="left" border="r t">48,891</td>
          </tr>
          <tr>
            <td align="left" border="b l r t" thead="row">Label Distribution</td>
            <td align="left" border="b r t">Balanced across 7 emotions</td>
          </tr>
        </tbody>
      </tabular>
    </table>
<!--  %**** main.tex Line 125 **** -->  </section>
  <section inlist="toc" xml:id="S3">
    <tags>
      <tag>III</tag>
      <tag role="refnum">III</tag>
      <tag role="typerefnum">§III</tag>
    </tags>
    <title><tag close=" ">III</tag><text font="smallcaps">Methodology</text></title>
    <para xml:id="S3.p1">
      <p>The proposed emotion recognition pipeline for autistic children is a modular, multi-stage architecture designed to capture and interpret subtle affective cues from video data. The control flow of the pipeline is shown in Fig. 3. Its stages are as follows:</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:fer_pipeline" placement="t" xml:id="S3.F3">
      <tags>
        <tag>Fig. 3</tag>
        <tag role="refnum">3</tag>
        <tag role="typerefnum">Fig. 3</tag>
      </tags>
      <graphics candidates="flowchart.pdf" class="ltx_centering" graphic="flowchart.pdf" options="width=252.945pt" xml:id="S3.F3.g1"/>
      <toccaption class="ltx_centering"><tag close=" ">3</tag>Flowchart of the facial emotion recognition pipeline. The process begins with dataset creation through video collection, followed by face detection. Detected faces are validated, aligned, and then passed to the facial landmark extraction module. These features, along with the cropped face images, are fed into our novel hybrid model (Fusion-N) to generate emotion probability predictions.</toccaption>
      <caption class="ltx_centering"><tag close=": ">Fig. 3</tag>Flowchart of the facial emotion recognition pipeline. The process begins with dataset creation through video collection, followed by face detection. Detected faces are validated, aligned, and then passed to the facial landmark extraction module. These features, along with the cropped face images, are fed into our novel hybrid model (Fusion-N) to generate emotion probability predictions.</caption>
    </figure>
    <subsection inlist="toc" xml:id="S3.SS1">
      <tags>
        <tag>III-A</tag>
        <tag role="refnum">III-A</tag>
        <tag role="typerefnum">§III-A</tag>
      </tags>
      <title><tag close=" ">III-A</tag><text font="italic">Experimental data acquisition</text></title>
      <para xml:id="S3.SS1.p1">
        <p>After approval by the Institutional Ethics Committee of the Indian Institute of Technology, Kanpur, and by the center head, and with consent from the parents, the psychological analysis reports of the children were obtained to finalize our selection criteria: children with mild to moderate autism, aged 6 to 10 years.</p>
      </para>
      <para xml:id="S3.SS1.p2">
        <p>Sessions were conducted in a carefully curated environment to ensure the child’s comfort, with a trusted psychologist present and strict confidentiality maintained throughout.</p>
      </para>
      <para xml:id="S3.SS1.p3">
        <p>Each child participated in a semi-structured interaction session lasting 3–5 minutes in a familiar and relaxed environment, with toys and play materials provided to minimize stress and improve ecological validity. In this free-play setting, the NAO robot performed a pre-programmed name-calling procedure, uttering the child’s name 12 times at randomly spaced moments. The experimental configuration is shown in Fig. 2, and dataset information is given in Table I.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS2">
      <tags>
        <tag>III-B</tag>
        <tag role="refnum">III-B</tag>
        <tag role="typerefnum">§III-B</tag>
      </tags>
      <title><tag close=" ">III-B</tag><text font="italic">Face Extraction</text></title>
      <para xml:id="S3.SS2.p1">
        <p>Face detection is performed using the Multi-task Cascaded Convolutional Neural Network (MTCNN), which jointly handles face localization and bounding-box regression. To ensure clean inputs, frames are filtered for blur and validity and then verified a second time, to reduce false positives, using Dlib’s CNN/HOG detector <cite class="ltx_citemacro_cite">[<bibref bibrefs="king2017dlibface" separator="," yyseparator=","/>]</cite> via <text font="typewriter">face_recognition.face_locations</text> (both detector variants gave the same results). To address MTCNN’s over-cropping, temporary dynamic padding is applied during validation, though only unpadded images are retained for downstream processing. Verified bounding boxes are used to extract 468 3D facial landmarks via MediaPipe Face Mesh, capturing dense anatomical regions (e.g., brows, lips, jawline). Landmarks are normalized using min-max scaling relative to the nose tip for scale, rotation, and translation invariance. The resulting data is exported in CSV format for graph-based modeling.
<!--  %**** main.tex Line 150 **** --></p>
      </para>
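      <para>
        <p>The landmark normalization step can be illustrated with a minimal sketch. It assumes that landmarks arrive as a list of (x, y, z) tuples and that index 1 marks the nose tip; the index, the function name <text font="typewriter">normalize_landmarks</text>, and the choice of one shared scale factor are illustrative assumptions, not details taken from the pipeline itself. The sketch covers only translation (centering on the nose tip) and scale (dividing by the largest axis range); it does not attempt rotation alignment.</p>
        <verbatim font="typewriter">
```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Assumption: landmark index taken to be the nose tip in the 468-point mesh.
NOSE_TIP_INDEX = 1


def normalize_landmarks(points: Sequence[Point],
                        nose_index: int = NOSE_TIP_INDEX) -> List[Point]:
    """Center landmarks on the nose tip, then scale by the largest axis range."""
    nx, ny, nz = points[nose_index]
    # Translation: express every landmark relative to the nose tip.
    centered = [(x - nx, y - ny, z - nz) for x, y, z in points]

    # Min-max range along each axis of the centered cloud.
    ranges = []
    for axis in range(3):
        values = [p[axis] for p in centered]
        ranges.append(max(values) - min(values))

    # One shared scale keeps the face proportions intact;
    # fall back to 1.0 if all points coincide.
    scale = max(ranges) or 1.0
    return [(x / scale, y / scale, z / scale) for x, y, z in centered]
```
</verbatim>
        <p>Under these assumptions, shifting every landmark by a constant offset or multiplying all coordinates by a positive constant leaves the output unchanged, which is the property the graph branch relies on when faces appear at different positions and distances from the camera.</p>
      </para>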
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS3">
      <tags>
        <tag>III-C</tag>
        <tag role="refnum">III-C</tag>
        <tag role="typerefnum">§III-C</tag>
      </tags>
      <title><tag close=" ">III-C</tag><text font="italic">Probabilistic Soft Label Generation</text></title>
      <para xml:id="S3.SS3.p1">
        <p>To accommodate the ambiguity of expressions common in ASD, we employed a soft-labeling mechanism using ensemble fusion. Emotion probabilities are computed by aggregating predictions from two independently trained models:</p>
      </para>
      <para xml:id="S3.SS3.p2">
        <itemize xml:id="S3.I1">
          <item xml:id="S3.I1.i1">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">1st item</tag>
            </tags>
            <para xml:id="S3.I1.i1.p1">
              <p><text font="bold">DeepFace:</text> A Mini-Xception model trained on FER-2013  <cite class="ltx_citemacro_cite">[<bibref bibrefs="arriaga2017real" separator="," yyseparator=","/>]</cite>, providing semantic emotion embeddings.</p>
            </para>
          </item>
          <item xml:id="S3.I1.i2">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">2nd item</tag>
            </tags>
            <para xml:id="S3.I1.i2.p1">
              <p><text font="bold">FER:</text> A custom CNN-based model by Shenk <cite class="ltx_citemacro_cite">[<bibref bibrefs="shenk2020fer" separator="," yyseparator=","/>]</cite>, also trained on FER-2013, outputting 7-class softmax distributions.</p>
            </para>
          </item>
        </itemize>
      </para>
      <para xml:id="S3.SS3.p3">
        <p>The final distribution <Math mode="inline" tex="\mathbf{y}_{\text{final}}\in\mathbb{R}^{7}" text="y _ [final] element-of R ^ 7" xml:id="S3.SS3.p3.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">y</XMTok>
                  <XMText><text fontsize="70%">final</text></XMText>
                </XMApp>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                  <XMTok fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> is obtained as a weighted average:</p>
      </para>
      <para xml:id="S3.SS3.p4">
        <equation xml:id="S3.Ex1">
          <Math mode="display" tex="\mathbf{y}_{\text{final}}=\frac{1}{3}\cdot\mathbf{y}_{\text{DeepFace}}+\frac{2%&#10;}{3}\cdot\mathbf{y}_{\text{FER}}" text="y _ [final] = (1 / 3) cdot y _ [DeepFace] + (2 / 3) cdot y _ [FER]" xml:id="S3.Ex1.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">y</XMTok>
                  <XMText><text fontsize="70%">final</text></XMText>
                </XMApp>
                <XMApp>
                  <XMTok meaning="plus" role="ADDOP">+</XMTok>
                  <XMApp>
                    <XMTok name="cdot" role="MULOP">⋅</XMTok>
                    <XMApp>
                      <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                      <XMTok meaning="1" role="NUMBER">1</XMTok>
                      <XMTok meaning="3" role="NUMBER">3</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="bold" role="UNKNOWN">y</XMTok>
                      <XMText><text fontsize="70%">DeepFace</text></XMText>
                    </XMApp>
                  </XMApp>
                  <XMApp>
                    <XMTok name="cdot" role="MULOP">⋅</XMTok>
                    <XMApp>
                      <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                      <XMTok meaning="2" role="NUMBER">2</XMTok>
                      <XMTok meaning="3" role="NUMBER">3</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="bold" role="UNKNOWN">y</XMTok>
                      <XMText><text fontsize="70%">FER</text></XMText>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>
        </equation>
      </para>
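      <para>
        <p>The weighted average above can be sketched in a few lines. This is a minimal illustration, assuming each model's per-class scores have already been collected into a dictionary keyed by emotion label; the label strings and the helper names <text font="typewriter">normalize</text> and <text font="typewriter">fuse_soft_labels</text> are hypothetical. Each input is rescaled to sum to 1 before fusion, so that scores reported on different scales (for example, percentages versus probabilities) are comparable.</p>
        <verbatim font="typewriter">
```python
from typing import Dict, Mapping

# Seven emotion classes; the exact label strings are an assumption.
EMOTIONS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")


def normalize(scores: Mapping[str, float]) -> Dict[str, float]:
    """Rescale raw per-class scores so that they sum to 1."""
    total = sum(scores.get(e, 0.0) for e in EMOTIONS)
    if total == 0:
        raise ValueError("scores must have a nonzero total")
    return {e: scores.get(e, 0.0) / total for e in EMOTIONS}


def fuse_soft_labels(p_deepface: Mapping[str, float],
                     p_fer: Mapping[str, float],
                     w_deepface: float = 1.0 / 3.0,
                     w_fer: float = 2.0 / 3.0) -> Dict[str, float]:
    """Weighted average of two 7-class distributions (weights 1/3 and 2/3)."""
    a = normalize(p_deepface)
    b = normalize(p_fer)
    # Because the weights sum to 1 and both inputs are distributions,
    # the fused vector is again a valid probability distribution.
    return {e: w_deepface * a[e] + w_fer * b[e] for e in EMOTIONS}
```
</verbatim>
        <p>For instance, if the first model puts all of its mass on "happy" and the second puts all of its mass on "sad", the fused soft label assigns 1/3 to "happy" and 2/3 to "sad", reflecting the larger weight given to the second model.</p>
      </para>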
      <para xml:id="S3.SS3.p5">
        <p>The FER model was trained and tested largely on low-quality images. In our validation tests, FER consistently produced lower error rates than DeepFace in low-resolution scenarios <cite class="ltx_citemacro_cite">[<bibref bibrefs="delovski2023emotion" separator="," yyseparator=","/>]</cite>. We therefore assign FER the greater weight, so that the ensemble relies more on the model that performs better under the real conditions of our data (Table IV). Both models are trained on tightly cropped, aligned face images from FER-2013. Although they include their own detectors, we supplied preprocessed face crops to minimize issues such as failed detection, incorrect scale, or orientation, thereby improving prediction robustness. This ensemble strategy mitigates model-specific bias and enhances reliability across diverse visual inputs, as summarized in Table II. The full soft-labeling workflow is illustrated in Fig. 5.</p>
      </para>
      <table inlist="lot" labels="LABEL:tab:emotion_model_comparison" placement="ht" xml:id="S3.T2">
        <tags>
          <tag>TABLE II</tag>
          <tag role="refnum">II</tag>
          <tag role="typerefnum">TABLE II</tag>
        </tags>
        <toccaption class="ltx_centering"><tag close=" ">II</tag>Comparison of Emotion Detection Models and Fusion Strategy Used in the Proposed Pipeline</toccaption>
        <caption class="ltx_centering"><tag close=": ">TABLE II</tag>Comparison of Emotion Detection Models and Fusion Strategy Used in the Proposed Pipeline</caption>
        <tabular class="ltx_centering" rowsep="2.0pt" vattach="middle">
          <tbody>
            <tr>
              <td align="justify" border="l r t" width="42.7pt"><text class="ltx_wrap" font="bold">Model Source</text></td>
              <td align="justify" border="r t" width="62.6pt"><text class="ltx_wrap" font="bold">Architecture</text></td>
              <td align="justify" border="r t" width="71.1pt"><text class="ltx_wrap" font="bold">Output Type</text></td>
              <td align="justify" border="r t" width="42.7pt"><text class="ltx_wrap" font="bold">Fusion Weight</text></td>
              <td align="justify" border="r t" width="156.5pt"><text class="ltx_wrap" font="bold">Rationale</text></td>
            </tr>
            <tr>
              <td align="justify" border="l r t" width="42.7pt"><text class="ltx_wrap" font="bold">DeepFace</text></td>
              <td align="justify" border="r t" width="62.6pt">Mini-Xception</td>
              <td align="justify" border="r t" width="71.1pt">7-class probability distribution</td>
              <td align="justify" border="r t" width="42.7pt"><Math mode="inline" tex="1/3" text="1 / 3" xml:id="S3.T2.m1">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="divide" role="MULOP">/</XMTok>
                      <XMTok meaning="1" role="NUMBER">1</XMTok>
                      <XMTok meaning="3" role="NUMBER">3</XMTok>
                    </XMApp>
                  </XMath>
                </Math></td>
              <td align="justify" border="r t" width="156.5pt">Lightweight CNN pretrained on FER-2013, efficient for real-time inference</td>
            </tr>
            <tr>
              <td align="justify" border="l r t" width="42.7pt"><text class="ltx_wrap" font="bold">FER</text></td>
              <td align="justify" border="r t" width="62.6pt">Custom CNN (<text font="typewriter">fer</text> library)</td>
              <td align="justify" border="r t" width="71.1pt">7-class probability distribution</td>
              <td align="justify" border="r t" width="42.7pt"><Math mode="inline" tex="2/3" text="2 / 3" xml:id="S3.T2.m2">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="divide" role="MULOP">/</XMTok>
                      <XMTok meaning="2" role="NUMBER">2</XMTok>
                      <XMTok meaning="3" role="NUMBER">3</XMTok>
                    </XMApp>
                  </XMath>
                </Math></td>
              <td align="justify" border="r t" width="156.5pt">Accurate and fast, empirically better on subtle emotions</td>
            </tr>
            <tr>
              <td align="justify" border="b l r t" width="42.7pt"><text class="ltx_wrap" font="bold">Ensemble Logic</text></td>
              <td align="justify" border="b r t" width="62.6pt">Weighted average</td>
              <td align="justify" border="b r t" width="71.1pt">Final 7-class soft probabilities</td>
              <td align="justify" border="b r t" width="42.7pt">–</td>
              <td align="justify" border="b r t" width="156.5pt">Reduces neutral bias using penalty regularization and sharpens predictions via temperature scaling</td>
            </tr>
          </tbody>
        </tabular>
      </table>
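      <para>
        <p>The weighted average in the Ensemble Logic row can be sketched as follows. This is a minimal illustration, assuming that both models score the same seven classes in the same order; the class order, the function name <text font="typewriter">ensemble_soft_label</text>, and the example probabilities are hypothetical. The neutral-bias penalty and temperature scaling are left out because their parameters are not specified in the table.</p>
        <verbatim font="typewriter"><![CDATA[
```python
# Assumed class order shared by both models (hypothetical, for illustration only).
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def ensemble_soft_label(p_deepface, p_fer, w_deepface=1 / 3, w_fer=2 / 3):
    """Weighted average of two class distributions, renormalized to sum to 1."""
    if len(p_deepface) != len(p_fer):
        raise ValueError("both models must score the same classes")
    mixed = [w_deepface * a + w_fer * b for a, b in zip(p_deepface, p_fer)]
    total = sum(mixed)
    if total <= 0:
        raise ValueError("ensemble produced no probability mass")
    return [m / total for m in mixed]


# Illustrative per-frame outputs (made-up numbers, not from the dataset).
p_df = [0.05, 0.00, 0.05, 0.10, 0.10, 0.10, 0.60]
p_fer = [0.02, 0.01, 0.02, 0.40, 0.05, 0.20, 0.30]
soft_label = ensemble_soft_label(p_df, p_fer)
```
]]></verbatim>
        <p>With weights of 1/3 and 2/3, the FER output contributes twice as much as the DeepFace output to every class probability in the resulting soft label.</p>
      </para>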
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS4">
      <tags>
        <tag>III-D</tag>
        <tag role="refnum">III-D</tag>
        <tag role="typerefnum">§III-D</tag>
      </tags>
      <title><tag close=" ">III-D</tag><text font="italic">Hybrid CNN-GCN Classification (Fusion-N)</text></title>
      <para xml:id="S3.SS4.p1">
        <p>We introduced <text font="italic">Fusion-N</text>, a dual-branch architecture that jointly processes pixel-level and geometric information. A schematic diagram of Fusion-N is shown in Fig. 4.</p>
      </para>
      <subsubsection inlist="toc" xml:id="S3.SS4.SSS1">
        <tags>
          <tag>III-D1</tag>
          <tag role="refnum">III-D1</tag>
          <tag role="typerefnum">§III-D1</tag>
        </tags>
        <title><tag close=" ">III-D1</tag>CNN Branch</title>
        <figure inlist="lof" labels="LABEL:Figure_3" placement="t" xml:id="S3.F4">
          <tags>
            <tag>Fig. 4</tag>
            <tag role="refnum">4</tag>
            <tag role="typerefnum">Fig. 4</tag>
          </tags>
          <graphics candidates="Fusion_N.png" class="ltx_centering" graphic="Fusion_N.png" options="width=195.129pt, trim= 20 0 180 0, clip=true" xml:id="S3.F4.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">4</tag>Simplified architecture of the proposed <text font="italic">Fusion-N</text> model.
The network consists of two parallel branches: a CNN-based global feature extractor (left) that uses ResNet-50 with channel-wise attention to produce the global descriptor <Math mode="inline" tex="\mathbf{F}_{\text{CNN}}\in\mathbb{R}^{2048}" text="F _ [CNN] element-of R ^ 2048" xml:id="S3.F4.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">CNN</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2048" role="NUMBER">2048</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>, and a GCN-based geometric branch (right) that encodes 3D facial landmarks into <Math mode="inline" tex="\mathbf{F}_{\text{GCN}}" text="F _ [GCN]" xml:id="S3.F4.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">GCN</text></XMText>
                </XMApp>
              </XMath>
            </Math> via a stack of GCN layers and mean pooling. The two feature streams are fused via simple concatenation after intra-branch attention refinement, resulting in the final representation <Math mode="inline" tex="\mathbf{F}_{\text{fused}}\in\mathbb{R}^{2176}" text="F _ [fused] element-of R ^ 2176" xml:id="S3.F4.m3">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">fused</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2176" role="NUMBER">2176</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>.</toccaption>
          <caption class="ltx_centering"><tag close=": ">Fig. 4</tag>Simplified architecture of the proposed <text font="italic">Fusion-N</text> model.
The network consists of two parallel branches: a CNN-based global feature extractor (left) that uses ResNet-50 with channel-wise attention to produce the global descriptor <Math mode="inline" tex="\mathbf{F}_{\text{CNN}}\in\mathbb{R}^{2048}" text="F _ [CNN] element-of R ^ 2048" xml:id="S3.F4.m4">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">CNN</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2048" role="NUMBER">2048</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>, and a GCN-based geometric branch (right) that encodes 3D facial landmarks into <Math mode="inline" tex="\mathbf{F}_{\text{GCN}}" text="F _ [GCN]" xml:id="S3.F4.m5">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">GCN</text></XMText>
                </XMApp>
              </XMath>
            </Math> via a stack of GCN layers and mean pooling. The two feature streams are fused via simple concatenation after intra-branch attention refinement, resulting in the final representation <Math mode="inline" tex="\mathbf{F}_{\text{fused}}\in\mathbb{R}^{2176}" text="F _ [fused] element-of R ^ 2176" xml:id="S3.F4.m6">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">fused</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2176" role="NUMBER">2176</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>.</caption>
        </figure>
        <figure inlist="lof" labels="LABEL:fig:unimodal_pipeline_2" placement="t" xml:id="S3.F5">
          <tags>
            <tag>Fig. 5</tag>
            <tag role="refnum">5</tag>
            <tag role="typerefnum">Fig. 5</tag>
          </tags>
          <graphics candidates="Untitled.pdf" class="ltx_centering" graphic="Untitled.pdf" options="width=368.577pt" xml:id="S3.F5.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">5</tag>Segmented architecture of the pipeline, illustrating the phases of face detection using MTCNN, face validation via <text font="typewriter">face_recognition</text>, landmark extraction using MediaPipe FaceMesh, and the creation of soft labels for training the Fusion-N model.</toccaption>
          <caption class="ltx_centering"><tag close=": ">Fig. 5</tag>Segmented architecture of the pipeline, illustrating the phases of face detection using MTCNN, face validation via <text font="typewriter">face_recognition</text>, landmark extraction using MediaPipe FaceMesh, and the creation of soft labels for training the Fusion-N model.</caption>
        </figure>
        <para xml:id="S3.SS4.SSS1.p1">
          <p>Aligned RGB face images of size <Math mode="inline" tex="224\times 224\times 3" text="224 * 224 * 3" xml:id="S3.SS4.SSS1.p1.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">×</XMTok>
                  <XMTok meaning="224" role="NUMBER">224</XMTok>
                  <XMTok meaning="224" role="NUMBER">224</XMTok>
                  <XMTok meaning="3" role="NUMBER">3</XMTok>
                </XMApp>
              </XMath>
            </Math> are passed through a ResNet-50 backbone, with the first 44 parameter tensors frozen and the rest fine-tuned. The output feature vector <Math mode="inline" tex="\mathbf{f}_{\text{img}}\in\mathbb{R}^{2048}" text="f _ [img] element-of R ^ 2048" xml:id="S3.SS4.SSS1.p1.m2">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">f</XMTok>
                    <XMText><text fontsize="70%">img</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2048" role="NUMBER">2048</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> captures global semantic information and is refined by an attention module.</p>
        </para>
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S3.SS4.SSS2">
        <tags>
          <tag>III-D2</tag>
          <tag role="refnum">III-D2</tag>
          <tag role="typerefnum">§III-D2</tag>
        </tags>
        <title><tag close=" ">III-D2</tag>GCN Branch</title>
        <para xml:id="S3.SS4.SSS2.p1">
          <p>Facial graphs are constructed from 468 landmarks with edges defined by facial geometry (jawline, eyebrows, eyes, mouth).
A 3-layer Graph Convolutional Network (GCN) extracts relational features, and the pooled 128-dimensional embedding <Math mode="inline" tex="\mathbf{f}_{\text{geom}}\in\mathbb{R}^{128}" text="f _ [geom] element-of R ^ 128" xml:id="S3.SS4.SSS2.p1.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">f</XMTok>
                    <XMText><text fontsize="70%">geom</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="128" role="NUMBER">128</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> is further refined with attention.</p>
        </para>
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S3.SS4.SSS3">
        <tags>
          <tag>III-D3</tag>
          <tag role="refnum">III-D3</tag>
          <tag role="typerefnum">§III-D3</tag>
        </tags>
        <title><tag close=" ">III-D3</tag>Fusion and Classification</title>
        <para xml:id="S3.SS4.SSS3.p1">
          <p>The concatenated feature vector <Math mode="inline" tex="\mathbf{f}_{\text{joint}}=[\mathbf{f}_{\text{img}}\|\mathbf{f}_{\text{geom}}]%&#10;\in\mathbb{R}^{2176}" text="f _ [joint] = delimited-[]@(conditional@(f _ [img], f _ [geom])) element-of R ^ 2176" xml:id="S3.SS4.SSS3.p1.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="multirelation"/>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">f</XMTok>
                    <XMText><text fontsize="70%">joint</text></XMText>
                  </XMApp>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="delimited-[]"/>
                      <XMRef idref="S3.SS4.SSS3.p1.m1.1"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="false">[</XMTok>
                      <XMApp xml:id="S3.SS4.SSS3.p1.m1.1">
                        <XMTok meaning="conditional" name="||" role="MODIFIEROP">∥</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">f</XMTok>
                          <XMText><text fontsize="70%">img</text></XMText>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">f</XMTok>
                          <XMText><text fontsize="70%">geom</text></XMText>
                        </XMApp>
                      </XMApp>
                      <XMTok role="CLOSE" stretchy="false">]</XMTok>
                    </XMWrap>
                  </XMDual>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2176" role="NUMBER">2176</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> is passed through a series of dense layers with dropout and LayerNorm. Emotion class probabilities are predicted using a softmax layer.</p>
        </para>
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S3.SS4.SSS4">
        <tags>
          <tag>III-D4</tag>
          <tag role="refnum">III-D4</tag>
          <tag role="typerefnum">§III-D4</tag>
        </tags>
        <title><tag close=" ">III-D4</tag>Loss Function</title>
        <para xml:id="S3.SS4.SSS4.p1">
          <p>Model training minimizes the KL divergence between the predicted class probabilities <Math mode="inline" tex="\mathbf{s}_{\theta}" text="s _ theta" xml:id="S3.SS4.SSS4.p1.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">s</XMTok>
                  <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN">θ</XMTok>
                </XMApp>
              </XMath>
            </Math> and calibrated targets <Math mode="inline" tex="\tilde{\mathbf{y}}" text="tilde@(y)" xml:id="S3.SS4.SSS4.p1.m2">
              <XMath>
                <XMApp>
                  <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                  <XMTok font="bold" role="UNKNOWN">y</XMTok>
                </XMApp>
              </XMath>
            </Math>:</p>
          <equation xml:id="S3.E1">
            <tags>
              <tag>(1)</tag>
              <tag role="refnum">1</tag>
            </tags>
            <Math mode="display" tex="\mathcal{L}_{\text{KL}}=\sum_{i}\tilde{y}_{i}\log\left(\frac{\tilde{y}_{i}}{s_%&#10;{\theta,i}}\right)" text="L _ [KL] = (sum _ i)@((tilde@(y)) _ i * logarithm@((tilde@(y)) _ i / s _ (list@(theta, i))))" xml:id="S3.E1.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="caligraphic" role="UNKNOWN">L</XMTok>
                    <XMText><text fontsize="70%">KL</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMApp scriptpos="mid">
                      <XMTok role="SUBSCRIPTOP" scriptpos="mid1"/>
                      <XMTok mathstyle="display" meaning="sum" role="SUMOP" scriptpos="mid">∑</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMApp>
                          <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                          <XMTok font="italic" role="UNKNOWN">y</XMTok>
                        </XMApp>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                      </XMApp>
                      <XMDual>
                        <XMApp>
                          <XMRef idref="S3.E1.m1.3"/>
                          <XMRef idref="S3.E1.m1.4"/>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="logarithm" role="OPFUNCTION" xml:id="S3.E1.m1.3">log</XMTok>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="true">(</XMTok>
                            <XMApp xml:id="S3.E1.m1.4">
                              <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                <XMApp>
                                  <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                  <XMTok font="italic" role="UNKNOWN">y</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                <XMTok font="italic" role="UNKNOWN">s</XMTok>
                                <XMDual>
                                  <XMApp>
                                    <XMTok meaning="list"/>
                                    <XMRef idref="S3.E1.m1.1"/>
                                    <XMRef idref="S3.E1.m1.2"/>
                                  </XMApp>
                                  <XMWrap>
                                    <XMTok font="italic" fontsize="70%" name="theta" role="UNKNOWN" xml:id="S3.E1.m1.1">θ</XMTok>
                                    <XMTok fontsize="70%" role="PUNCT">,</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S3.E1.m1.2">i</XMTok>
                                  </XMWrap>
                                </XMDual>
                              </XMApp>
                            </XMApp>
                            <XMTok role="CLOSE" stretchy="true">)</XMTok>
                          </XMWrap>
                        </XMApp>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
          </equation>
          <p>where <Math mode="inline" tex="i\in\{1,\dots,C\}" text="i element-of set@(1, dots, C)" xml:id="S3.SS4.SSS4.p1.m3">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMTok font="italic" role="UNKNOWN">i</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="set"/>
                      <XMRef idref="S3.SS4.SSS4.p1.m3.1"/>
                      <XMRef idref="S3.SS4.SSS4.p1.m3.2"/>
                      <XMRef idref="S3.SS4.SSS4.p1.m3.3"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="false">{</XMTok>
                      <XMTok meaning="1" role="NUMBER" xml:id="S3.SS4.SSS4.p1.m3.1">1</XMTok>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMTok name="dots" role="ID" xml:id="S3.SS4.SSS4.p1.m3.2">…</XMTok>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMTok font="italic" role="UNKNOWN" xml:id="S3.SS4.SSS4.p1.m3.3">C</XMTok>
                      <XMTok role="CLOSE" stretchy="false">}</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMath>
            </Math> indexes emotion classes.</p>
        </para>
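        <para>
          <p>As a worked illustration of Eq. (1), the loss for a single frame can be computed as in the sketch below. It assumes that both vectors are valid probability distributions over the same classes; the function name <text font="typewriter">kl_divergence</text>, the clamp <text font="typewriter">eps</text> that guards against a zero predicted probability, and the example vectors are assumptions made for illustration and are not taken from the training code.</p>
          <verbatim font="typewriter"><![CDATA[
```python
import math


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions over the same classes."""
    if len(target) != len(predicted):
        raise ValueError("distributions must have the same number of classes")
    loss = 0.0
    for y, s in zip(target, predicted):
        if y > 0.0:  # a class with zero target mass contributes nothing
            loss += y * math.log(y / max(s, eps))
    return loss


# Illustrative soft target and a uniform prediction over C = 7 classes.
y_tilde = [0.02, 0.01, 0.02, 0.30, 0.05, 0.20, 0.40]
s_theta = [1 / 7] * 7

loss_uniform = kl_divergence(y_tilde, s_theta)  # positive: prediction differs from target
loss_perfect = kl_divergence(y_tilde, y_tilde)  # zero: prediction equals target
```
]]></verbatim>
          <p>The loss is zero only when the predicted distribution matches the soft target exactly, which is why it suits training on probabilistic labels rather than one-hot classes.</p>
        </para>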
      </subsubsection>
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS5">
      <tags>
        <tag>III-E</tag>
        <tag role="refnum">III-E</tag>
        <tag role="typerefnum">§III-E</tag>
      </tags>
      <title><tag close=" ">III-E</tag><text font="italic">Framework Used</text></title>
      <para xml:id="S3.SS5.p1">
        <p>Face detection and pre-processing were performed using MTCNN, followed by validation with the <text font="typewriter">face_recognition</text> library, which is built on Dlib <cite class="ltx_citemacro_cite">[<bibref bibrefs="king2009dlib" separator="," yyseparator=","/>]</cite>. Blurry frames were removed by thresholding the variance of the Laplacian, and geometric normalization was applied to keep face alignment consistent.</p>
      </para>
      <para xml:id="S3.SS5.p2">
        <p>For pose-invariant facial landmark extraction, we used the Face Mesh solution provided by MediaPipe <cite class="ltx_citemacro_cite">[<bibref bibrefs="lugaresi2019mediapipe" separator="," yyseparator=","/>]</cite>. The 3D coordinates were normalized prior to further processing.</p>
      </para>
      <para xml:id="S3.SS5.p3">
        <p>To generate soft emotion labels, the DeepFace <cite class="ltx_citemacro_cite">[<bibref bibrefs="serengil2020lightface" separator="," yyseparator=","/>]</cite> and FER <cite class="ltx_citemacro_cite">[<bibref bibrefs="shenk2020fer" separator="," yyseparator=","/>]</cite> libraries were employed. Their outputs were wrapped with the PyTorch <text font="typewriter">Dataset</text> API to form a triplet input pipeline consisting of face images, landmarks, and the corresponding soft labels.</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S4">
    <tags>
      <tag>IV</tag>
      <tag role="refnum">IV</tag>
      <tag role="typerefnum">§IV</tag>
    </tags>
    <title><tag close=" ">IV</tag><text font="smallcaps">Optimization and Training Framework</text></title>
    <para xml:id="S4.p1">
      <p>Training uses the AdamW optimizer <cite class="ltx_citemacro_cite">[<bibref bibrefs="loshchilov2019decoupled" separator="," yyseparator=","/>]</cite>, with discriminative learning rates of <Math mode="inline" tex="3\times 10^{-6}" text="3 * 10 ^ (- 6)" xml:id="S4.p1.m1">
          <XMath>
            <XMApp>
              <XMTok meaning="times" role="MULOP">×</XMTok>
              <XMTok meaning="3" role="NUMBER">3</XMTok>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok meaning="10" role="NUMBER">10</XMTok>
                <XMApp>
                  <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                  <XMTok fontsize="70%" meaning="6" role="NUMBER">6</XMTok>
                </XMApp>
              </XMApp>
            </XMApp>
          </XMath>
        </Math> and <Math mode="inline" tex="1\times 10^{-5}" text="1 * 10 ^ (- 5)" xml:id="S4.p1.m2">
          <XMath>
            <XMApp>
              <XMTok meaning="times" role="MULOP">×</XMTok>
              <XMTok meaning="1" role="NUMBER">1</XMTok>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok meaning="10" role="NUMBER">10</XMTok>
                <XMApp>
                  <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                  <XMTok fontsize="70%" meaning="5" role="NUMBER">5</XMTok>
                </XMApp>
              </XMApp>
            </XMApp>
          </XMath>
        </Math> for the pretrained CNN backbone and classifier head, respectively, with a global <Math mode="inline" tex="L_{2}" text="L _ 2" xml:id="S4.p1.m3">
          <XMath>
            <XMApp>
              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
              <XMTok font="italic" role="UNKNOWN">L</XMTok>
              <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
            </XMApp>
          </XMath>
        </Math> weight decay of <Math mode="inline" tex="5\times 10^{-4}" text="5 * 10 ^ (- 4)" xml:id="S4.p1.m4">
          <XMath>
            <XMApp>
              <XMTok meaning="times" role="MULOP">×</XMTok>
              <XMTok meaning="5" role="NUMBER">5</XMTok>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok meaning="10" role="NUMBER">10</XMTok>
                <XMApp>
                  <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                  <XMTok fontsize="70%" meaning="4" role="NUMBER">4</XMTok>
                </XMApp>
              </XMApp>
            </XMApp>
          </XMath>
        </Math> to prevent overfitting <cite class="ltx_citemacro_cite">[<bibref bibrefs="krogh1992simple" separator="," yyseparator=","/>]</cite>. The main criterion is the label-smoothed KL divergence (smoothing factor <Math mode="inline" tex="=0.1" text="absent = 0.1" xml:id="S4.p1.m5">
          <XMath>
            <XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMTok meaning="absent"/>
              <XMTok meaning="0.1" role="NUMBER">0.1</XMTok>
            </XMApp>
          </XMath>
        </Math>), ensuring robust learning with softened target distributions. Training stability is maintained through gradient clipping (L2 norm limit <Math mode="inline" tex="=1.0" text="absent = 1.0" xml:id="S4.p1.m6">
          <XMath>
            <XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMTok meaning="absent"/>
              <XMTok meaning="1.0" role="NUMBER">1.0</XMTok>
            </XMApp>
          </XMath>
        </Math>), while effective exploration of the loss landscape is facilitated by a cosine annealing learning rate schedule with warm restarts (<Math mode="inline" tex="T_{0}=10" text="T _ 0 = 10" xml:id="S4.p1.m7">
          <XMath>
            <XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">T</XMTok>
                <XMTok fontsize="70%" meaning="0" role="NUMBER">0</XMTok>
              </XMApp>
              <XMTok meaning="10" role="NUMBER">10</XMTok>
            </XMApp>
          </XMath>
        </Math>, <Math mode="inline" tex="T_{m}=2" text="T _ m = 2" xml:id="S4.p1.m8">
          <XMath>
            <XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">T</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">m</XMTok>
              </XMApp>
              <XMTok meaning="2" role="NUMBER">2</XMTok>
            </XMApp>
          </XMath>
        </Math>, <Math mode="inline" tex="\eta_{\min}=1\times 10^{-5}" text="eta _ minimum = 1 * 10 ^ (- 5)" xml:id="S4.p1.m9">
          <XMath>
            <XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" name="eta" role="UNKNOWN">η</XMTok>
                <XMTok fontsize="70%" meaning="minimum" role="OPFUNCTION" scriptpos="post">min</XMTok>
              </XMApp>
              <XMApp>
                <XMTok meaning="times" role="MULOP">×</XMTok>
                <XMTok meaning="1" role="NUMBER">1</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok meaning="10" role="NUMBER">10</XMTok>
                  <XMApp>
                    <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                    <XMTok fontsize="70%" meaning="5" role="NUMBER">5</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMApp>
          </XMath>
        </Math>). The evaluation metrics include per-class precision, recall, F1 score, and overall accuracy, following recommended practices for balanced and robust evaluation, especially in the presence of minority classes <cite class="ltx_citemacro_cite">[<bibref bibrefs="powers2020evaluation" separator="," yyseparator=","/>]</cite>.
</p>
    </para>
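    <para>
      <p>The cosine annealing schedule with warm restarts can be sketched as below, using the standard per-epoch formulation. The sketch assumes that the second schedule parameter is the cycle-length multiplier (written <text font="typewriter">t_mult</text> here); the function name <text font="typewriter">warm_restart_lr</text> and the learning-rate values in the usage lines are illustrative and are not the values used in training.</p>
      <verbatim font="typewriter"><![CDATA[
```python
import math


def warm_restart_lr(epoch, base_lr, eta_min, t0=10, t_mult=2):
    """Learning rate at an integer epoch under cosine annealing with warm restarts."""
    t_i, t_cur = t0, epoch
    while t_cur >= t_i:  # skip completed cycles; each cycle is t_mult times longer
        t_cur -= t_i
        t_i *= t_mult
    cosine = math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Illustrative values only: with t0 = 10 and t_mult = 2, restarts fall at epochs 10, 30, 70, ...
schedule = [warm_restart_lr(e, base_lr=1e-3, eta_min=1e-5) for e in range(31)]
```
]]></verbatim>
      <p>Within each cycle the rate decays from the base value toward the floor and then jumps back to the base value at the restart, so successive cycles explore for 10, 20, and 40 epochs.</p>
    </para>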
  </section>
  <section inlist="toc" xml:id="S5">
    <tags>
      <tag>V</tag>
      <tag role="refnum">V</tag>
      <tag role="typerefnum">§V</tag>
    </tags>
    <title><tag close=" ">V</tag><text font="smallcaps">Techniques Involved</text></title>
    <para xml:id="S5.p1">
      <p>This section presents a detailed computational framework for multimodal emotion recognition specifically designed for subjects with Autism Spectrum Disorder (ASD).</p>
    </para>
    <subsection inlist="toc" xml:id="S5.SS1">
      <tags>
        <tag>V-A</tag>
        <tag role="refnum">V-A</tag>
        <tag role="typerefnum">§V-A</tag>
      </tags>
      <title><tag close=" ">V-A</tag><text font="italic">Hierarchical Facial Region-of-Interest Detection</text></title>
      <para xml:id="S5.SS1.p1">
        <p>To localize facial regions precisely, we implemented a two-step face verification strategy. First, a Multi-task Cascaded Convolutional Network (MTCNN) detector localized candidate facial regions.</p>
      </para>
      <para xml:id="S5.SS1.p2">
        <p>To ensure high-quality face inputs, all images were first filtered for blur (Laplacian variance threshold = 25) and low-confidence detections (MTCNN score <Math mode="inline" tex="&lt;" text="less" xml:id="S5.SS1.p2.m1">
            <XMath>
              <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
            </XMath>
          </Math> 70%). A secondary validation using the Dlib-based <text font="typewriter">face_recognition</text> library (CNN or HOG backend) filtered out non-facial or corrupted frames; both backends yielded comparable results, and only clean, centered faces were retained. Faces smaller than 30<Math mode="inline" tex="\times" text="*" xml:id="S5.SS1.p2.m2">
            <XMath>
              <XMTok meaning="times" role="MULOP">×</XMTok>
            </XMath>
          </Math>30 were discarded, and accepted crops were resized to 224<Math mode="inline" tex="\times" text="*" xml:id="S5.SS1.p2.m3">
            <XMath>
              <XMTok meaning="times" role="MULOP">×</XMTok>
            </XMath>
          </Math>224 pixels.</p>
      </para>
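      <para>
        <p>The threshold checks described above can be sketched as a single acceptance test. This is a dependency-free illustration, assuming that a frame is rejected when the variance of its Laplacian response falls below the blur threshold and that the detector score lies in the range 0 to 1; the function names, the 4-neighbour kernel, and the synthetic test images are assumptions, and the <text font="typewriter">face_recognition</text> validation step is omitted.</p>
        <verbatim font="typewriter"><![CDATA[
```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels of a 2D grayscale grid."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = [
        gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1] - 4 * gray[r][c]
        for r in range(1, h - 1)
        for c in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def keep_face(gray, det_score, blur_thresh=25.0, min_score=0.70, min_side=30):
    """Return True if a crop passes the size, detector-confidence, and blur checks."""
    h, w = len(gray), len(gray[0])
    if min(h, w) < min_side:
        return False  # face smaller than 30x30
    if det_score < min_score:
        return False  # low-confidence detection
    return laplacian_variance(gray) >= blur_thresh  # low variance means a blurry crop


# Synthetic examples: a flat (featureless) crop and a high-contrast checkerboard crop.
flat = [[128] * 32 for _ in range(32)]
checker = [[255 if (r + c) % 2 == 0 else 0 for c in range(32)] for r in range(32)]
```
]]></verbatim>
        <p>The flat crop has zero Laplacian variance and is rejected as blurry, while the checkerboard crop passes when its detector score is at least 0.70.</p>
      </para>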
      <para xml:id="S5.SS1.p3">
        <p>To compensate for MTCNN’s tight cropping, temporary padding was applied during verification (the padded images were not saved), preserving undistorted facial features. Verified crops were aligned by reusing the MTCNN bounding boxes and forwarded to landmark detection.
MediaPipe Face Mesh then extracted 468 normalized 3D landmarks per face, which were saved as CSV features and provide pose-invariant, topology-aware inputs for robust graph modeling of neurodivergent expressions.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S5.SS2">
      <tags>
        <tag>V-B</tag>
        <tag role="refnum">V-B</tag>
        <tag role="typerefnum">§V-B</tag>
      </tags>
      <title><tag close=" ">V-B</tag><text font="italic">Confidence-Calibrated Label Incorporation</text></title>
      <para xml:id="S5.SS2.p1">
        <p>Several interactive facial emotion recognition tools targeting autistic individuals have been proposed. For instance, Abu‑Nowar et al. (2024) introduced SENSES‑ASD, a web/mobile platform built on a compact Mini‑Xception CNN (approximately 60K parameters) trained on FER‑2013 (35,887 grayscale images across seven emotions). The system initially achieved approximately 60% validation accuracy, which improved to approximately 66% after tuning, with training accuracy reaching approximately 71% <cite class="ltx_citemacro_cite">[<bibref bibrefs="abu2024senses" separator="," yyseparator=","/>]</cite>.
To account for the semantic ambiguity and inter-class overlap prevalent in ASD expression datasets, we proposed a novel confidence-aware soft-labeling mechanism based on ensemble modeling. This approach jointly leverages the high representational capacity of DeepFace (Mini-Xception) and the robustness of the FER network.</p>
      </para>
      <subsubsection xml:id="S5.SS2.SSSx1">
        <title>Dual-Model Ensemble</title>
        <paragraph inlist="toc" xml:id="S5.SS2.SSSx1.Px1">
          <title>DeepFace Backbone</title>
          <para xml:id="S5.SS2.SSSx1.Px1.p1">
            <p>We used the Mini-Xception model from DeepFace <cite class="ltx_citemacro_cite">[<bibref bibrefs="arriaga2017real" separator="," yyseparator=","/>]</cite>, a lightweight CNN trained on FER-2013, producing softmax outputs <Math mode="inline" tex="\mathbf{p}_{\text{DF}}\in\Delta^{C}" text="p _ [DF] element-of Delta ^ C" xml:id="S5.SS2.SSSx1.Px1.p1.m1">
                <XMath>
                  <XMApp>
                    <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="bold" role="UNKNOWN">p</XMTok>
                      <XMText><text fontsize="70%">DF</text></XMText>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                      <XMTok name="Delta" role="UNKNOWN">Δ</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">C</XMTok>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math> across <Math mode="inline" tex="C=7" text="C = 7" xml:id="S5.SS2.SSSx1.Px1.p1.m2">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">C</XMTok>
                    <XMTok meaning="7" role="NUMBER">7</XMTok>
                  </XMApp>
                </XMath>
              </Math> emotion classes. These predictions contribute to our ensemble fusion strategy. Despite its small size, Mini-Xception has been reported to reach accuracy comparable to human-level performance on benchmark datasets.</p>
          </para>
        </paragraph>
        <paragraph inlist="toc" xml:id="S5.SS2.SSSx1.Px2">
          <title>FER Supplement</title>
          <para xml:id="S5.SS2.SSSx1.Px2.p1">
            <p>To enhance robustness against occlusions and low-resolution inputs, we incorporate a parallel FER branch (Shenk <cite class="ltx_citemacro_cite">[<bibref bibrefs="shenk2020fer" separator="," yyseparator=","/>]</cite>) via the <text font="typewriter">fer</text> library. It outputs <Math mode="inline" tex="\mathbf{p}_{\text{FER}}\in\Delta^{C}" text="p _ [FER] element-of Delta ^ C" xml:id="S5.SS2.SSSx1.Px2.p1.m1">
                <XMath>
                  <XMApp>
                    <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="bold" role="UNKNOWN">p</XMTok>
                      <XMText><text fontsize="70%">FER</text></XMText>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                      <XMTok name="Delta" role="UNKNOWN">Δ</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">C</XMTok>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>; this network is also trained on FER-2013 but uses a deeper CNN than Mini-Xception.</p>
          </para>
        </paragraph>
        <paragraph inlist="toc" xml:id="S5.SS2.SSSx1.Px3">
          <title>Weighted Fusion</title>
          <para xml:id="S5.SS2.SSSx1.Px3.p1">
            <p>The final ensemble prediction is computed as:</p>
            <equation xml:id="S5.E2">
              <tags>
                <tag>(2)</tag>
                <tag role="refnum">2</tag>
              </tags>
              <Math mode="display" tex="\mathbf{p}_{\text{ens}}=\frac{2}{3}\cdot\mathbf{p}_{\text{FER}}+\frac{1}{3}%&#10;\cdot\mathbf{p}_{\text{DF}}" text="p _ [ens] = (2 / 3) cdot p _ [FER] + (1 / 3) cdot p _ [DF]" xml:id="S5.E2.m1">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="bold" role="UNKNOWN">p</XMTok>
                      <XMText><text fontsize="70%">ens</text></XMText>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="plus" role="ADDOP">+</XMTok>
                      <XMApp>
                        <XMTok name="cdot" role="MULOP">⋅</XMTok>
                        <XMApp>
                          <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                          <XMTok meaning="2" role="NUMBER">2</XMTok>
                          <XMTok meaning="3" role="NUMBER">3</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">p</XMTok>
                          <XMText><text fontsize="70%">FER</text></XMText>
                        </XMApp>
                      </XMApp>
                      <XMApp>
                        <XMTok name="cdot" role="MULOP">⋅</XMTok>
                        <XMApp>
                          <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                          <XMTok meaning="1" role="NUMBER">1</XMTok>
                          <XMTok meaning="3" role="NUMBER">3</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">p</XMTok>
                          <XMText><text fontsize="70%">DF</text></XMText>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
            </equation>
            <p>Emotion classifiers often over-predict the <text font="italic">neutral</text> class.
To mitigate this bias, we apply a multiplicative penalty:</p>
            <equation xml:id="S5.E3">
              <tags>
                <tag>(3)</tag>
                <tag role="refnum">3</tag>
              </tags>
              <Math mode="display" tex="\tilde{p}_{\text{neutral}}=\gamma\cdot p_{\text{fuse,neutral}},\quad\gamma=0.7," text="formulae@((tilde@(p)) _ [neutral] = gamma cdot p _ [fuse,neutral], gamma = 0.7)" xml:id="S5.E3.m1">
                <XMath>
                  <XMDual>
                    <XMRef idref="S5.E3.m1.1"/>
                    <XMWrap>
                      <XMDual xml:id="S5.E3.m1.1">
                        <XMApp>
                          <XMTok meaning="formulae"/>
                          <XMRef idref="S5.E3.m1.1.1"/>
                          <XMRef idref="S5.E3.m1.1.2"/>
                        </XMApp>
                        <XMWrap>
                          <XMApp xml:id="S5.E3.m1.1.1">
                            <XMTok meaning="equals" role="RELOP">=</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                              <XMApp>
                                <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                <XMTok font="italic" role="UNKNOWN">p</XMTok>
                              </XMApp>
                              <XMText><text fontsize="70%">neutral</text></XMText>
                            </XMApp>
                            <XMApp>
                              <XMTok name="cdot" role="MULOP">⋅</XMTok>
                              <XMTok font="italic" name="gamma" role="UNKNOWN">γ</XMTok>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                <XMTok font="italic" role="UNKNOWN">p</XMTok>
                                <XMText><text fontsize="70%">fuse,neutral</text></XMText>
                              </XMApp>
                            </XMApp>
                          </XMApp>
                          <XMTok role="PUNCT" rpadding="10.0pt">,</XMTok>
                          <XMApp xml:id="S5.E3.m1.1.2">
                            <XMTok meaning="equals" role="RELOP">=</XMTok>
                            <XMTok font="italic" name="gamma" role="UNKNOWN">γ</XMTok>
                            <XMTok meaning="0.7" role="NUMBER">0.7</XMTok>
                          </XMApp>
                        </XMWrap>
                      </XMDual>
                      <XMTok role="PUNCT">,</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMath>
              </Math>
            </equation>
            <p>where <Math mode="inline" tex="p_{\text{fuse}}" text="p _ [fuse]" xml:id="S5.SS2.SSSx1.Px3.p1.m1">
                <XMath>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMText><text fontsize="70%">fuse</text></XMText>
                  </XMApp>
                </XMath>
              </Math> denotes the fused distribution over emotion classes, i.e., the ensemble output of Eq. (2),
and <Math mode="inline" tex="\gamma" text="gamma" xml:id="S5.SS2.SSSx1.Px3.p1.m2">
                <XMath>
                  <XMTok font="italic" name="gamma" role="UNKNOWN">γ</XMTok>
                </XMath>
              </Math> is a clinically validated scaling factor.
The adjusted vector <Math mode="inline" tex="\tilde{p}" text="tilde@(p)" xml:id="S5.SS2.SSSx1.Px3.p1.m3">
                <XMath>
                  <XMApp>
                    <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                  </XMApp>
                </XMath>
              </Math> is re-normalized to ensure a valid probability distribution:</p>
            <equation xml:id="S5.E4">
              <tags>
                <tag>(4)</tag>
                <tag role="refnum">4</tag>
              </tags>
              <Math mode="display" tex="\hat{p}=\text{softmax}(\tilde{p})." text="hat@(p) = [softmax] * tilde@(p)" xml:id="S5.E4.m1">
                <XMath>
                  <XMDual>
                    <XMRef idref="S5.E4.m1.2"/>
                    <XMWrap>
                      <XMApp xml:id="S5.E4.m1.2">
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                        <XMApp>
                          <XMTok name="hat" role="OVERACCENT" stretchy="false">^</XMTok>
                          <XMTok font="italic" role="UNKNOWN">p</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMText>softmax</XMText>
                          <XMDual>
                            <XMRef idref="S5.E4.m1.1"/>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S5.E4.m1.1">
                                <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                <XMTok font="italic" role="UNKNOWN">p</XMTok>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMApp>
                      <XMTok role="PERIOD">.</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMath>
              </Math>
            </equation>
            <p>Here, <Math mode="inline" tex="\hat{p}" text="hat@(p)" xml:id="S5.SS2.SSSx1.Px3.p1.m4">
                <XMath>
                  <XMApp>
                    <XMTok name="hat" role="OVERACCENT" stretchy="false">^</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                  </XMApp>
                </XMath>
              </Math> represents the probability distribution across emotion
classes after neutral adjustment.</p>
          </para>
          <para xml:id="S5.SS2.SSSx1.Px3.p2">
            <p>Temperature scaling (<Math mode="inline" tex="T=0.7" text="T = 0.7" xml:id="S5.SS2.SSSx1.Px3.p2.m1">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">T</XMTok>
                    <XMTok meaning="0.7" role="NUMBER">0.7</XMTok>
                  </XMApp>
                </XMath>
              </Math>) is applied via <text font="typewriter">np.power(final_vector, 1.0/T)</text> followed by normalization; because the exponent 1/T exceeds 1, this sharpens the distribution. The fusion balances speed and sensitivity: Mini-Xception favors real-time applications, while FER is more responsive to subtle expressions.</p>
          </para>
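          <para>
            <p>The labeling steps in Eqs. (2) to (4), together with the temperature step, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name <text font="typewriter">soft_label</text>, the class ordering, the example probability vectors, and the placement of temperature scaling after the softmax are all hypothetical choices made for the example.</p>
          </para>

```python
import numpy as np

# Assumed class order; the two labeling libraries may order classes differently.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
NEUTRAL = EMOTIONS.index("neutral")


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def soft_label(p_fer, p_df, gamma=0.7, temperature=0.7):
    """Fuse two 7-class probability vectors into one soft label."""
    p_fer = np.asarray(p_fer, dtype=float)
    p_df = np.asarray(p_df, dtype=float)
    p_ens = (2.0 / 3.0) * p_fer + (1.0 / 3.0) * p_df   # Eq. (2): weighted fusion
    p_tilde = p_ens.copy()
    p_tilde[NEUTRAL] *= gamma                          # Eq. (3): neutral penalty
    p_hat = softmax(p_tilde)                           # Eq. (4): re-normalization
    sharpened = np.power(p_hat, 1.0 / temperature)     # temperature step (assumed order)
    return sharpened / sharpened.sum()


# Made-up model outputs for a single frame (each sums to 1).
p_fer = np.array([0.05, 0.02, 0.08, 0.30, 0.10, 0.05, 0.40])
p_df = np.array([0.10, 0.05, 0.05, 0.20, 0.15, 0.05, 0.40])
label = soft_label(p_fer, p_df)
```

          <para>
            <p>Calling <text font="typewriter">soft_label</text> with <text font="typewriter">gamma=1.0</text> disables the neutral penalty, which makes it easy to inspect how much probability mass the penalty moves away from the neutral class.</p>
          </para>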
        </paragraph>
      </subsubsection>
    </subsection>
    <subsection inlist="toc" xml:id="S5.SS3">
      <tags>
        <tag>V-C</tag>
        <tag role="refnum">V-C</tag>
        <tag role="typerefnum">§V-C</tag>
      </tags>
      <title><tag close=" ">V-C</tag><text font="italic">Primary Model Architecture: Fusion-N</text></title>
      <para xml:id="S5.SS3.p1">
        <p>We introduced Fusion-N, a hybrid deep neural network that combines a Convolutional Neural Network (a fine-tuned ResNet-50) with a Graph Convolutional Network (GCN) to integrate global appearance features and localized relational (landmark) geometry. The architecture of Fusion-N is shown in Fig. 6.</p>
      </para>
    </subsection>
    <subsection xml:id="S5.SSx1">
      <title>a. Attention on CNN feature vector</title>
      <para xml:id="S5.SSx1.p1">
        <equation xml:id="S5.E5">
          <tags>
            <tag>(5)</tag>
            <tag role="refnum">5</tag>
          </tags>
          <Math mode="display" tex="\mathbf{F}_{\text{CNN}}^{\mathrm{attn}}=\mathbf{A}_{\text{CNN}}\odot\mathbf{F}%&#10;_{\text{CNN}}" text="(F _ [CNN]) ^ attn = A _ [CNN] direct-product F _ [CNN]" xml:id="S5.E5.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">CNN</text></XMText>
                  </XMApp>
                  <XMTok fontsize="70%" role="UNKNOWN">attn</XMTok>
                </XMApp>
                <XMApp>
                  <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">A</XMTok>
                    <XMText><text fontsize="70%">CNN</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">CNN</text></XMText>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>
        </equation>
      </para>
      <para xml:id="S5.SSx1.p2">
        <p>where <Math mode="inline" tex="\odot" text="direct-product" xml:id="S5.SSx1.p2.m1">
            <XMath>
              <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
            </XMath>
          </Math> denotes the element-wise (Hadamard) product <cite class="ltx_citemacro_cite">[<bibref bibrefs="wiki:hadamard,holt2013elementwise" separator="," yyseparator=","/>]</cite>, <Math mode="inline" tex="\mathbf{A}_{\text{CNN}}" text="A _ [CNN]" xml:id="S5.SSx1.p2.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">A</XMTok>
                <XMText><text fontsize="70%">CNN</text></XMText>
              </XMApp>
            </XMath>
          </Math> is the learned attention weight vector, and <Math mode="inline" tex="\mathbf{F}_{\text{CNN}}^{\text{attn}}" text="(F _ [CNN]) ^ [attn]" xml:id="S5.SSx1.p2.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
                <XMText><text fontsize="70%">attn</text></XMText>
              </XMApp>
            </XMath>
          </Math> is the refined CNN feature vector used downstream.</p>
      </para>
    </subsection>
    <subsection xml:id="S5.SSx2">
      <title>b. Aggregated GCN Features</title>
      <para xml:id="S5.SSx2.p1">
        <equation xml:id="S5.E6">
          <tags>
            <tag>(6)</tag>
            <tag role="refnum">6</tag>
          </tags>
          <Math mode="display" tex="\mathbf{F}_{\text{GCN}}=\frac{1}{N}\sum_{i=1}^{N}\mathbf{H}_{i}^{(3)}" text="F _ [GCN] = (1 / N) * ((sum _ (i = 1)) ^ N)@((H _ i) ^ 3)" xml:id="S5.E6.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">GCN</text></XMText>
                </XMApp>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMApp>
                    <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                    <XMTok meaning="1" role="NUMBER">1</XMTok>
                    <XMTok font="italic" role="UNKNOWN">N</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMApp scriptpos="mid">
                      <XMTok role="SUPERSCRIPTOP" scriptpos="mid1"/>
                      <XMApp scriptpos="mid">
                        <XMTok role="SUBSCRIPTOP" scriptpos="mid1"/>
                        <XMTok mathstyle="display" meaning="sum" role="SUMOP" scriptpos="mid">∑</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                        </XMApp>
                      </XMApp>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">H</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                      </XMApp>
                      <XMDual>
                        <XMRef idref="S5.E6.m1.1"/>
                        <XMWrap>
                          <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                          <XMTok fontsize="70%" meaning="3" role="NUMBER" xml:id="S5.E6.m1.1">3</XMTok>
                          <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>
        </equation>
      </para>
      <para xml:id="S5.SSx2.p2">
        <p><Math mode="inline" tex="\mathbf{F}_{\text{GCN}}" text="F _ [GCN]" xml:id="S5.SSx2.p2.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">F</XMTok>
                <XMText><text fontsize="70%">GCN</text></XMText>
              </XMApp>
            </XMath>
          </Math> denotes the aggregated node representation after three GCN layers,
<Math mode="inline" tex="\mathbf{H}_{i}^{(3)}" text="(H _ i) ^ 3" xml:id="S5.SSx2.p2.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">H</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                </XMApp>
                <XMDual>
                  <XMRef idref="S5.SSx2.p2.m2.1"/>
                  <XMWrap>
                    <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                    <XMTok fontsize="70%" meaning="3" role="NUMBER" xml:id="S5.SSx2.p2.m2.1">3</XMTok>
                    <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                  </XMWrap>
                </XMDual>
              </XMApp>
            </XMath>
          </Math> is the output feature vector of the third GCN layer for the <Math mode="inline" tex="i^{\text{th}}" text="i ^ [th]" xml:id="S5.SSx2.p2.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">i</XMTok>
                <XMText><text fontsize="70%">th</text></XMText>
              </XMApp>
            </XMath>
          </Math> node and
<Math mode="inline" tex="N" text="N" xml:id="S5.SSx2.p2.m4">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">N</XMTok>
            </XMath>
          </Math> represents the number of nodes (here, the facial landmarks).
<Math mode="inline" tex="\sum_{i=1}^{N}\mathbf{H}_{i}^{(3)}" text="((sum _ (i = 1)) ^ N)@((H _ i) ^ 3)" xml:id="S5.SSx2.p2.m5">
            <XMath>
              <XMApp>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok mathstyle="text" meaning="sum" role="SUMOP" scriptpos="post">∑</XMTok>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                      <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                    </XMApp>
                  </XMApp>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                </XMApp>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">H</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                  </XMApp>
                  <XMDual>
                    <XMRef idref="S5.SSx2.p2.m5.1"/>
                    <XMWrap>
                      <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                      <XMTok fontsize="70%" meaning="3" role="NUMBER" xml:id="S5.SSx2.p2.m5.1">3</XMTok>
                      <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> is the sum of the third-layer output features over all nodes; dividing it by the number of nodes gives the mean used in Eq. (6).</p>
      </para>
      <para xml:id="S5.SSx2.p3">
        <p>Equation (6) summarizes the GCN branch: mean pooling over the landmark node embeddings from the third GCN layer yields a single global feature vector per face.</p>
      </para>
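      <para>
        <p>A minimal NumPy sketch of Eq. (6) follows. The text specifies three GCN layers and mean pooling over the landmark nodes; the propagation rule used here (symmetrically normalized adjacency with self-loops, followed by ReLU), the layer widths, and the ring-shaped edge list are assumptions chosen only to make the example runnable.</p>
      </para>

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """Return D^(-1/2) (A + I) D^(-1/2) for an undirected edge list."""
    a = np.eye(num_nodes)                      # self-loops
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_mean_pool(x, a_hat, weights):
    """Apply one ReLU graph convolution per weight matrix, then mean-pool nodes."""
    h = x
    for w in weights:
        h = np.maximum(a_hat @ h @ w, 0.0)
    return h.mean(axis=0)                      # Eq. (6): average of H_i^(3) over N nodes


rng = np.random.default_rng(0)
num_nodes = 468                                # MediaPipe Face Mesh landmarks
x = rng.random((num_nodes, 3))                 # placeholder (x, y, z) coordinates
edges = [(i, (i + 1) % num_nodes) for i in range(num_nodes)]  # hypothetical ring graph
dims = [3, 64, 128, 256]                       # hypothetical layer widths
weights = [rng.normal(scale=0.1, size=(dims[k], dims[k + 1])) for k in range(3)]

a_hat = normalized_adjacency(edges, num_nodes)
f_gcn = gcn_mean_pool(x, a_hat, weights)       # one global vector per face
```

      <para>
        <p>In practice the edge list would come from the face-mesh topology rather than a ring; the pooling step itself is unchanged by that choice.</p>
      </para>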
    </subsection>
    <subsection xml:id="S5.SSx3">
      <title>c. Feature Fusion</title>
      <para xml:id="S5.SSx3.p1">
        <equation xml:id="S5.E7">
          <tags>
            <tag>(7)</tag>
            <tag role="refnum">7</tag>
          </tags>
          <Math mode="display" tex="\mathbf{F}_{\text{fused}}=\left[\mathbf{F}^{\text{attn}}_{\text{CNN}}\,||\,%&#10;\mathbf{F}_{\text{GCN}}\right]" xml:id="S5.E7.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">F</XMTok>
                <XMText><text fontsize="70%">fused</text></XMText>
              </XMApp>
              <XMTok meaning="equals" role="RELOP">=</XMTok>
              <XMWrap>
                <XMTok role="OPEN" stretchy="true">[</XMTok>
                <XMApp rpadding="1.7pt">
                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">attn</text></XMText>
                  </XMApp>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
                <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                <XMTok role="VERTBAR" rpadding="1.7pt" stretchy="false">|</XMTok>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                  <XMTok font="bold" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">GCN</text></XMText>
                </XMApp>
                <XMTok role="CLOSE" stretchy="true">]</XMTok>
              </XMWrap>
            </XMath>
          </Math>
        </equation>
      </para>
      <para xml:id="S5.SSx3.p2">
        <p>where <Math mode="inline" tex="\mathbf{F}_{\text{fused}}" text="F _ [fused]" xml:id="S5.SSx3.p2.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">F</XMTok>
                <XMText><text fontsize="70%">fused</text></XMText>
              </XMApp>
            </XMath>
          </Math> is the final fused feature representation obtained by concatenating <Math mode="inline" tex="\mathbf{F}_{\text{CNN}}^{\text{attn}}" text="(F _ [CNN]) ^ [attn]" xml:id="S5.SSx3.p2.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
                <XMText><text fontsize="70%">attn</text></XMText>
              </XMApp>
            </XMath>
          </Math> (attention-weighted CNN features) and <Math mode="inline" tex="\mathbf{F}_{\text{GCN}}" text="F _ [GCN]" xml:id="S5.SSx3.p2.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold" role="UNKNOWN">F</XMTok>
                <XMText><text fontsize="70%">GCN</text></XMText>
              </XMApp>
            </XMath>
          </Math> (aggregated GCN features), denoted by the concatenation operator <Math mode="inline" tex="\left[\,\|\,\right]" xml:id="S5.SSx3.p2.m4">
            <XMath>
              <XMTok role="OPEN" rpadding="1.7pt" stretchy="true">[</XMTok>
              <XMTok meaning="parallel-to" name="||" role="VERTBAR" rpadding="1.7pt">∥</XMTok>
              <XMTok role="CLOSE" stretchy="true">]</XMTok>
            </XMath>
          </Math>.</p>
      </para>
      <para xml:id="S5.SSx3.p3">
        <p>Equation (7) thus concatenates the CNN features (with channel-wise attention) and the pooled GCN features into a unified representation that combines appearance and geometric information; this <text font="bold">fused vector</text> is forwarded to the classification head.
<break/></p>
      </para>
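      <para>
        <p>The fusion in Eq. (7) amounts to a single concatenation, sketched below in NumPy. The abstract states that the fused embedding is optimized with Kullback-Leibler divergence against the soft labels; the single linear classification head, the 256-dimensional GCN vector, and the random inputs used here are assumptions for illustration only.</p>
      </para>

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array."""
    z = z - np.max(z)
    return z - np.log(np.exp(z).sum())


def fuse_and_score(f_cnn_attn, f_gcn, w_head, b_head, soft_target):
    """Concatenate both branches and compute KL(soft_target || prediction)."""
    f_fused = np.concatenate([f_cnn_attn, f_gcn])    # Eq. (7)
    log_q = log_softmax(w_head @ f_fused + b_head)   # hypothetical linear head
    mask = soft_target > 0                           # zero-mass classes add nothing
    kl = np.sum(soft_target[mask] * (np.log(soft_target[mask]) - log_q[mask]))
    return f_fused, kl


rng = np.random.default_rng(0)
f_cnn_attn = rng.normal(size=2048)                   # attention-weighted CNN vector
f_gcn = rng.normal(size=256)                         # hypothetical GCN vector size
w_head = rng.normal(scale=0.01, size=(7, 2048 + 256))
b_head = np.zeros(7)
soft_target = rng.dirichlet(np.ones(7))              # stand-in for a soft label

f_fused, kl = fuse_and_score(f_cnn_attn, f_gcn, w_head, b_head, soft_target)
```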
      <subsubsection inlist="toc" xml:id="S5.SSx3.SSS1">
        <tags>
          <tag>V-C1</tag>
          <tag role="refnum">V-C1</tag>
          <tag role="typerefnum">§V-C1</tag>
        </tags>
        <title><tag close=" ">V-C1</tag> CNN-Based Global Feature Extraction</title>
        <para xml:id="S5.SSx3.SSS1.p1">
          <p>We leverage a pre-trained ResNet-50 backbone <cite class="ltx_citemacro_cite">[<bibref bibrefs="he2016deep" separator="," yyseparator=","/>]</cite> to extract high-level features from facial images. The standard architecture comprises four residual stages with bottleneck blocks and uses Batch Normalization, ReLU activations, and identity skip connections within its residual blocks to facilitate residual learning. In our architecture, we additionally apply a Layer Normalization step after the attention module to stabilize the reweighted feature distribution before fusion with the GCN branch.
The final fully connected (FC) layer is removed, and the rest of the network is retained up to the Global Average Pooling (GAP) layer. This turns ResNet-50 into a pure feature extractor whose GAP layer produces a 2048-dimensional feature vector for each input image.</p>
        </para>
        <para xml:id="S5.SSx3.SSS1.p2">
          <p>We adopt partial fine-tuning: the first 44 parameter tensors are frozen while the remaining tensors are fine-tuned, which enables the network to learn domain-specific features relevant to autism-oriented emotion data.</p>
        </para>
        <para xml:id="S5.SSx3.SSS1.p3">
          <p>To further enhance the discriminative capacity of the extracted features, a lightweight attention module is appended after ResNet-50. This module comprises two fully connected layers, the first followed by a ReLU activation and the second by a Sigmoid activation. Its output is a learned attention weight vector that reweights the 2048-dimensional features, emphasizing the most informative components.</p>
        </para>
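          <para>
            <p>The attention module described above, combined with the element-wise reweighting of Eq. (5), can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the hidden width of 128, the random weights, and the omission of bias terms are hypothetical choices, since the text does not give these details.</p>
          </para>

```python
import numpy as np


def channel_attention(f_cnn, w1, w2):
    """Two-layer gating: sigmoid(W2 . ReLU(W1 . f)), then element-wise reweighting."""
    hidden = np.maximum(w1 @ f_cnn, 0.0)             # first FC layer + ReLU
    a_cnn = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # second FC layer + Sigmoid
    return a_cnn * f_cnn, a_cnn                      # Eq. (5): Hadamard product


rng = np.random.default_rng(0)
feat_dim, hidden_dim = 2048, 128                     # hidden_dim is hypothetical
w1 = rng.normal(scale=0.01, size=(hidden_dim, feat_dim))
w2 = rng.normal(scale=0.01, size=(feat_dim, hidden_dim))
f_cnn = rng.normal(size=feat_dim)                    # stand-in for the GAP output

f_cnn_attn, a_cnn = channel_attention(f_cnn, w1, w2)
```

          <para>
            <p>Because every attention weight lies strictly between 0 and 1, the module can only attenuate individual feature components, never amplify them.</p>
          </para>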
        <para xml:id="S5.SSx3.SSS1.p4">
          <p>The feature vector <Math mode="inline" tex="\mathbf{F}_{\text{CNN}}\in\mathbb{R}^{2048}" text="F _ [CNN] element-of R ^ 2048" xml:id="S5.SSx3.SSS1.p4.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">CNN</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2048" role="NUMBER">2048</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> is refined by the attention module, which computes the weight vector as:</p>
          <equation xml:id="S5.E8">
            <tags>
              <tag>(8)</tag>
              <tag role="refnum">8</tag>
            </tags>
            <Math mode="display" tex="\mathbf{A}_{\text{CNN}}=\sigma(W_{2}\cdot\text{ReLU}(W_{1}\cdot\mathbf{F}_{%&#10;\text{CNN}}))\quad" text="A _ [CNN] = sigma * (W _ 2 cdot [ReLU]) * (W _ 1 cdot F _ [CNN])" xml:id="S5.E8.m1">
              <XMath>
                <XMDual>
                  <XMRef idref="S5.E8.m1.1"/>
                  <XMWrap>
                    <XMApp xml:id="S5.E8.m1.1">
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">A</XMTok>
                        <XMText><text fontsize="70%">CNN</text></XMText>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                        <XMDual>
                          <XMRef idref="S5.E8.m1.1.1"/>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMApp xml:id="S5.E8.m1.1.1">
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok name="cdot" role="MULOP">⋅</XMTok>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                  <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                  <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                                </XMApp>
                                <XMText>ReLU</XMText>
                              </XMApp>
                              <XMDual>
                                <XMRef idref="S5.E8.m1.1.1.1"/>
                                <XMWrap>
                                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                                  <XMApp xml:id="S5.E8.m1.1.1.1">
                                    <XMTok name="cdot" role="MULOP">⋅</XMTok>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                      <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                      <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                      <XMTok font="bold" role="UNKNOWN">F</XMTok>
                                      <XMText><text fontsize="70%">CNN</text></XMText>
                                    </XMApp>
                                  </XMApp>
                                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMApp>
                    <XMTok font="italic" name="quad" role="PUNCT"> </XMTok>
                  </XMWrap>
                </XMDual>
              </XMath>
            </Math>
          </equation>
          <p>Here, <Math mode="inline" tex="\mathbf{F}_{\text{CNN}}" text="F _ [CNN]" xml:id="S5.SSx3.SSS1.p4.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
              </XMath>
            </Math> is the 2048‑dimensional raw feature vector from the last ResNet layer, <Math mode="inline" tex="W_{1}" text="W _ 1" xml:id="S5.SSx3.SSS1.p4.m3">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">W</XMTok>
                  <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                </XMApp>
              </XMath>
            </Math> and <Math mode="inline" tex="W_{2}" text="W _ 2" xml:id="S5.SSx3.SSS1.p4.m4">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">W</XMTok>
                  <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                </XMApp>
              </XMath>
            </Math> are learned fully‑connected weight matrices, ReLU is the rectified linear activation, <Math mode="inline" tex="\sigma" text="sigma" xml:id="S5.SSx3.SSS1.p4.m5">
              <XMath>
                <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
              </XMath>
            </Math> is the element‑wise sigmoid function (mapping each value into the interval (0,1)), and <Math mode="inline" tex="\mathbf{A}_{\text{CNN}}" text="A _ [CNN]" xml:id="S5.SSx3.SSS1.p4.m6">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">A</XMTok>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
              </XMath>
            </Math> is the attention weight vector (the same size as <Math mode="inline" tex="\mathbf{F}_{\text{CNN}}" text="F _ [CNN]" xml:id="S5.SSx3.SSS1.p4.m7">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
              </XMath>
            </Math>).</p>
        </para>
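        <para>
          <p>The gating in Eq. (8) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy dimensions, the hidden width r, the random weights, and the helper names are all hypothetical, bias terms are omitted, and the final element-wise product of the attention weights with the features is an assumption about how the reweighting is carried out.</p>
          <verbatim font="typewriter">
```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def attention_gate(F, W1, W2):
    """Eq. (8): A = sigmoid(W2 . ReLU(W1 . F)); returns (A, gated features)."""
    A = sigmoid(matvec(W2, relu(matvec(W1, F))))
    # Assumption: "reweighting" means an element-wise product of A and F.
    gated = [a * f for a, f in zip(A, F)]
    return A, gated


# Toy sizes: the paper uses d = 2048; the hidden width r is not stated there.
d, r = 8, 4
rng = random.Random(0)
W1 = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(r)]
W2 = [[rng.uniform(-1.0, 1.0) for _ in range(r)] for _ in range(d)]
F = [rng.uniform(0.0, 2.0) for _ in range(d)]

A, F_attn = attention_gate(F, W1, W2)
print(len(A), len(F_attn))
```
</verbatim>
          <p>Because every attention weight lies between 0 and 1, the gate in this sketch can only attenuate a feature component, never amplify it.</p>
        </para>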
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S5.SSx3.SSS2">
        <tags>
          <tag>V-C2</tag>
          <tag role="refnum">V-C2</tag>
          <tag role="typerefnum">§V-C2</tag>
        </tags>
        <title><tag close=" ">V-C2</tag> GCN-Based Landmark Encoding</title>
        <para xml:id="S5.SSx3.SSS2.p1">
          <p>We represent each face as a fixed-topology graph <Math mode="inline" tex="\mathcal{G}=(V,E)" text="G = open-interval@(V, E)" xml:id="S5.SSx3.SSS2.p1.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMTok font="caligraphic" role="UNKNOWN">G</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="open-interval"/>
                      <XMRef idref="S5.SSx3.SSS2.p1.m1.1"/>
                      <XMRef idref="S5.SSx3.SSS2.p1.m1.2"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="false">(</XMTok>
                      <XMTok font="italic" role="UNKNOWN" xml:id="S5.SSx3.SSS2.p1.m1.1">V</XMTok>
                      <XMTok role="PUNCT">,</XMTok>
                      <XMTok font="italic" role="UNKNOWN" xml:id="S5.SSx3.SSS2.p1.m1.2">E</XMTok>
                      <XMTok role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMath>
            </Math> where <Math mode="inline" tex="|V|=468" text="absolute-value@(V) = 468" xml:id="S5.SSx3.SSS2.p1.m2">
              <XMath>
                <XMApp>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="absolute-value"/>
                      <XMRef idref="S5.SSx3.SSS2.p1.m2.1"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                      <XMTok font="italic" role="UNKNOWN" xml:id="S5.SSx3.SSS2.p1.m2.1">V</XMTok>
                      <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                    </XMWrap>
                  </XMDual>
                  <XMTok meaning="468" role="NUMBER">468</XMTok>
                </XMApp>
              </XMath>
            </Math>, and edges are manually constructed based on facial geometry (jawline, eyebrows, eyes, and mouth), partially following the MediaPipe topology (i.e., the edge index). A three-layer GCN computes node embeddings:</p>
          <equation xml:id="S5.E9">
            <tags>
              <tag>(9)</tag>
              <tag role="refnum">9</tag>
            </tags>
            <Math mode="display" tex="\mathbf{H}^{(l+1)}=\text{ReLU}(\text{GCNConv}(\mathbf{H}^{(l)},E)),\quad%&#10;\mathbf{H}^{(0)}=X" text="formulae@(H ^ (l + 1) = [ReLU] * [GCNConv] * open-interval@(H ^ l, E), H ^ 0 = X)" xml:id="S5.E9.m1">
              <XMath>
                <XMDual>
                  <XMApp>
                    <XMTok meaning="formulae"/>
                    <XMRef idref="S5.E9.m1.5"/>
                    <XMRef idref="S5.E9.m1.6"/>
                  </XMApp>
                  <XMWrap>
                    <XMApp xml:id="S5.E9.m1.5">
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">H</XMTok>
                        <XMDual>
                          <XMRef idref="S5.E9.m1.1"/>
                          <XMWrap>
                            <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                            <XMApp xml:id="S5.E9.m1.1">
                              <XMTok fontsize="70%" meaning="plus" role="ADDOP">+</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                            </XMApp>
                            <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMText>ReLU</XMText>
                        <XMDual>
                          <XMRef idref="S5.E9.m1.5.1"/>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMApp xml:id="S5.E9.m1.5.1">
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMText>GCNConv</XMText>
                              <XMDual>
                                <XMApp>
                                  <XMTok meaning="open-interval"/>
                                  <XMRef idref="S5.E9.m1.5.1.1"/>
                                  <XMRef idref="S5.E9.m1.4"/>
                                </XMApp>
                                <XMWrap>
                                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                                  <XMApp xml:id="S5.E9.m1.5.1.1">
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="bold" role="UNKNOWN">H</XMTok>
                                    <XMDual>
                                      <XMRef idref="S5.E9.m1.2"/>
                                      <XMWrap>
                                        <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                                        <XMTok font="italic" fontsize="70%" role="UNKNOWN" xml:id="S5.E9.m1.2">l</XMTok>
                                        <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                                      </XMWrap>
                                    </XMDual>
                                  </XMApp>
                                  <XMTok role="PUNCT">,</XMTok>
                                  <XMTok font="italic" role="UNKNOWN" xml:id="S5.E9.m1.4">E</XMTok>
                                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMApp>
                    <XMTok role="PUNCT" rpadding="10.0pt">,</XMTok>
                    <XMApp xml:id="S5.E9.m1.6">
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">H</XMTok>
                        <XMDual>
                          <XMRef idref="S5.E9.m1.3"/>
                          <XMWrap>
                            <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                            <XMTok fontsize="70%" meaning="0" role="NUMBER" xml:id="S5.E9.m1.3">0</XMTok>
                            <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                      <XMTok font="italic" role="UNKNOWN">X</XMTok>
                    </XMApp>
                  </XMWrap>
                </XMDual>
              </XMath>
            </Math>
          </equation>
          <p>Here, <Math mode="inline" tex="\mathbf{H}^{(\ell)}" text="H ^ ell" xml:id="S5.SSx3.SSS2.p1.m3">
              <XMath>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">H</XMTok>
                  <XMDual>
                    <XMRef idref="S5.SSx3.SSS2.p1.m3.1"/>
                    <XMWrap>
                      <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                      <XMTok fontsize="70%" name="ell" role="UNKNOWN" xml:id="S5.SSx3.SSS2.p1.m3.1">ℓ</XMTok>
                      <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMath>
            </Math> is the node-feature matrix output by layer <Math mode="inline" tex="\ell" text="ell" xml:id="S5.SSx3.SSS2.p1.m4">
              <XMath>
                <XMTok name="ell" role="UNKNOWN">ℓ</XMTok>
              </XMath>
            </Math>, <Math mode="inline" tex="E" text="E" xml:id="S5.SSx3.SSS2.p1.m5">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">E</XMTok>
              </XMath>
            </Math> denotes the graph’s edge list (equivalently, its adjacency matrix), and the <text class="ltx_markedasmath">GCNConv</text> operator, which originates from Kipf and Welling’s GCN model <cite class="ltx_citemacro_cite">[<bibref bibrefs="kipf2016gcn" separator="," yyseparator=","/>]</cite> and is implemented in PyTorch Geometric <cite class="ltx_citemacro_cite">[<bibref bibrefs="pyg_gcnconv" separator="," yyseparator=","/>]</cite>, performs the graph convolution. <Math mode="inline" tex="X" text="X" xml:id="S5.SSx3.SSS2.p1.m7">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">X</XMTok>
              </XMath>
            </Math> is the initial <Math mode="inline" tex="468\times 3" text="468 * 3" xml:id="S5.SSx3.SSS2.p1.m8">
              <XMath>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">×</XMTok>
                  <XMTok meaning="468" role="NUMBER">468</XMTok>
                  <XMTok meaning="3" role="NUMBER">3</XMTok>
                </XMApp>
              </XMath>
            </Math> matrix of landmark coordinates. ReLU activation is applied in the first two GCN layers, while the third layer produces the final 128-D node embeddings.</p>
        </para>
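        <para>
          <p>As an illustration of the propagation in Eq. (9), the following plain-Python sketch applies three graph-convolution steps using the symmetric normalization with self-loops from Kipf and Welling's formulation. It is a toy under stated assumptions, not the paper's PyTorch Geometric code: the five-node path graph, the layer widths, the random weights, and all function names are hypothetical, and bias terms are omitted. Following the text, ReLU is applied after the first two layers only.</p>
          <verbatim font="typewriter">
```python
import math
import random


def normalized_adjacency(num_nodes, edges):
    """Dense D^(-1/2) (A + I) D^(-1/2) for an undirected edge list."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    for i in range(num_nodes):
        A[i][i] = 1.0  # self-loop
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer(A_hat, H, W, apply_relu):
    """One propagation step: A_hat . H . W, optionally followed by ReLU."""
    Z = matmul(matmul(A_hat, H), W)
    if apply_relu:
        Z = [[max(0.0, z) for z in row] for row in Z]
    return Z


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy path graph 0-1-2-3-4 standing in for the 468-landmark face graph.
num_nodes = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
A_hat = normalized_adjacency(num_nodes, edges)

X = rand_matrix(num_nodes, 3)   # (x, y, z) per node, as in H^(0) = X
dims = [3, 8, 8, 6]             # hypothetical widths; the paper ends at 128
weights = [rand_matrix(dims[k], dims[k + 1]) for k in range(3)]

H = X
for k, W in enumerate(weights):
    H = gcn_layer(A_hat, H, W, apply_relu=(k != 2))  # no ReLU on the last layer

# Three propagation steps reach nodes at most three hops away.
reach = matmul(A_hat, matmul(A_hat, A_hat))
print(len(H), len(H[0]), reach[0][3] != 0.0, reach[0][4] != 0.0)
```
</verbatim>
          <p>In this toy graph, node 0 influences node 3 (three hops away) after three layers but not node 4 (four hops away), which is the receptive-field limit of a three-layer GCN.</p>
        </para>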
        <figure inlist="lof" labels="LABEL:fig:wide_image" placement="t" xml:id="S5.F6">
          <tags>
            <tag>Fig. 6</tag>
            <tag role="refnum">6</tag>
            <tag role="typerefnum">Fig. 6</tag>
          </tags>
          <graphics candidates="architecture.pdf" class="ltx_centering" graphic="architecture.pdf" options="width=411.939pt" xml:id="S5.F6.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">6</tag>Architecture of the proposed Fusion-N model for facial emotion recognition. The framework comprises two branches: (i) a global feature extractor using a pre-trained ResNet-50 with an attention module applied on the 2048-D feature vector (<Math mode="inline" tex="F_{\text{CNN}}" text="F _ [CNN]" xml:id="S5.F6.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
              </XMath>
            </Math>), and (ii) a geometric branch processing 3D facial landmarks through stacked GCN layers with mean pooling, followed by an attention module to refine the global landmark embedding (<Math mode="inline" tex="F_{\text{GCN}}" text="F _ [GCN]" xml:id="S5.F6.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">GCN</text></XMText>
                </XMApp>
              </XMath>
            </Math>). The features are fused via concatenation, forming a joint descriptor passed through fully connected layers with layer normalization, ReLU activation, and dropout. The final dense layer outputs emotion class probabilities using softmax activation.</toccaption>
          <caption class="ltx_centering"><tag close=": ">Fig. 6</tag>Architecture of the proposed Fusion-N model for facial emotion recognition. The framework comprises two branches: (i) a global feature extractor using a pre-trained ResNet-50 with an attention module applied on the 2048-D feature vector (<Math mode="inline" tex="F_{\text{CNN}}" text="F _ [CNN]" xml:id="S5.F6.m3">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
              </XMath>
            </Math>), and (ii) a geometric branch processing 3D facial landmarks through stacked GCN layers with mean pooling, followed by an attention module to refine the global landmark embedding (<Math mode="inline" tex="F_{\text{GCN}}" text="F _ [GCN]" xml:id="S5.F6.m4">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="italic" role="UNKNOWN">F</XMTok>
                  <XMText><text fontsize="70%">GCN</text></XMText>
                </XMApp>
              </XMath>
            </Math>). The features are fused via concatenation, forming a joint descriptor passed through fully connected layers with layer normalization, ReLU activation, and dropout. The final dense layer outputs emotion class probabilities using softmax activation.</caption>
        </figure>
        <para xml:id="S5.SSx3.SSS2.p2">
          <p>Stacking three GCN layers lets each landmark aggregate information from nodes up to three hops away in the graph. A <text font="typewriter">try-except</text> block handles cases where the GCN branch fails; in such cases, a 128-dimensional zero vector is substituted so that the embedding size stays consistent.</p>
        </para>
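        <para>
          <p>The fallback behaviour can be sketched as follows. This is an illustrative pattern rather than the authors' code: the function names, the toy encoder, and the returned success flag are hypothetical additions, and the paper does not specify which exceptions are caught.</p>
          <verbatim font="typewriter">
```python
def encode_or_zeros(encoder, landmarks, dim=128):
    """Return (embedding, ok). On any encoder error, use a zero vector instead."""
    try:
        return list(encoder(landmarks)), True
    except Exception:
        # Zero placeholder keeps the embedding length fixed for later fusion.
        return [0.0] * dim, False


def toy_encoder(landmarks):
    """Hypothetical stand-in for the GCN branch; rejects empty input."""
    if not landmarks:
        raise ValueError("no landmarks detected")
    return [1.0] * 128


good, ok_good = encode_or_zeros(toy_encoder, [(0.1, 0.2, 0.3)])
bad, ok_bad = encode_or_zeros(toy_encoder, [])
print(ok_good, ok_bad, len(bad))
```
</verbatim>
          <p>Returning a flag alongside the embedding is one possible way to count how many frames fell back to the zero vector, since such frames carry no geometric information into the fused feature.</p>
        </para>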
        <para xml:id="S5.SSx3.SSS2.p3">
          <p>The node embeddings are mean-pooled and then refined by an attention module, which yields:</p>
          <equation xml:id="S5.E10">
            <tags>
              <tag>(10)</tag>
              <tag role="refnum">10</tag>
            </tags>
            <Math mode="display" tex="\mathbf{F}_{\text{GCN}}=\text{Attn}\left(\frac{1}{N}\sum_{i=1}^{N}\mathbf{H}_{%&#10;i}^{(3)}\right)" text="F _ [GCN] = [Attn] * (1 / N) * ((sum _ (i = 1)) ^ N)@((H _ i) ^ 3)" xml:id="S5.E10.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">GCN</text></XMText>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                    <XMText>Attn</XMText>
                    <XMDual>
                      <XMRef idref="S5.E10.m1.2"/>
                      <XMWrap>
                        <XMTok role="OPEN" stretchy="true">(</XMTok>
                        <XMApp xml:id="S5.E10.m1.2">
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMApp>
                            <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                            <XMTok meaning="1" role="NUMBER">1</XMTok>
                            <XMTok font="italic" role="UNKNOWN">N</XMTok>
                          </XMApp>
                          <XMApp>
                            <XMApp scriptpos="mid">
                              <XMTok role="SUPERSCRIPTOP" scriptpos="mid2"/>
                              <XMApp scriptpos="mid">
                                <XMTok role="SUBSCRIPTOP" scriptpos="mid2"/>
                                <XMTok mathstyle="display" meaning="sum" role="SUMOP" scriptpos="mid">∑</XMTok>
                                <XMApp>
                                  <XMTok fontsize="70%" meaning="equals" role="RELOP">=</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                                  <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                            </XMApp>
                            <XMApp>
                              <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold" role="UNKNOWN">H</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              </XMApp>
                              <XMDual>
                                <XMRef idref="S5.E10.m1.1"/>
                                <XMWrap>
                                  <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                                  <XMTok fontsize="70%" meaning="3" role="NUMBER" xml:id="S5.E10.m1.1">3</XMTok>
                                  <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                          </XMApp>
                        </XMApp>
                        <XMTok role="CLOSE" stretchy="true">)</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
          </equation>
        </para>
        <para xml:id="S5.SSx3.SSS2.p4">
          <p>Here, <Math mode="inline" tex="\mathbf{H}_{i}^{(3)}" text="(H _ i) ^ 3" xml:id="S5.SSx3.SSS2.p4.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">H</XMTok>
                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                  </XMApp>
                  <XMDual>
                    <XMRef idref="S5.SSx3.SSS2.p4.m1.1"/>
                    <XMWrap>
                      <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                      <XMTok fontsize="70%" meaning="3" role="NUMBER" xml:id="S5.SSx3.SSS2.p4.m1.1">3</XMTok>
                      <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMath>
            </Math> denotes the 128‑D embedding of landmark <Math mode="inline" tex="i" text="i" xml:id="S5.SSx3.SSS2.p4.m2">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">i</XMTok>
              </XMath>
            </Math> after three GCN layers, <Math mode="inline" tex="\mathrm{Attn}(\cdot)" text="Attn * cdot" xml:id="S5.SSx3.SSS2.p4.m3">
              <XMath>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok role="UNKNOWN">Attn</XMTok>
                  <XMDual>
                    <XMRef idref="S5.SSx3.SSS2.p4.m3.1"/>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="false">(</XMTok>
                      <XMTok name="cdot" role="MULOP" xml:id="S5.SSx3.SSS2.p4.m3.1">⋅</XMTok>
                      <XMTok role="CLOSE" stretchy="false">)</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMApp>
              </XMath>
            </Math> is a small fully‑connected attention module applied to the pooled global embedding, and <Math mode="inline" tex="N" text="N" xml:id="S5.SSx3.SSS2.p4.m4">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">N</XMTok>
              </XMath>
            </Math> is the total number of landmarks (468). Layer normalization is applied prior to fusion.</p>
        </para>
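        <para>
          <p>A small numeric sketch of the pooling step in Eq. (10) is given here, in plain Python. The inner mean over nodes follows the equation directly. The gate used in place of Attn and the parameter-free layer normalization applied afterwards are hypothetical stand-ins, since the text does not give the internal form of the attention module or the exact placement of the normalization.</p>
          <verbatim font="typewriter">
```python
import math


def mean_pool(H):
    """Average node embeddings (rows of H) into one graph-level vector."""
    n = len(H)
    return [sum(col) / n for col in zip(*H)]


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def sigmoid_gate(x):
    """Hypothetical stand-in for Attn(.): scale each entry by its own sigmoid."""
    return [v / (1.0 + math.exp(-v)) for v in x]


# Toy node embeddings: N = 4 landmarks, 3 channels (the paper has N = 468, 128 channels).
H3 = [[1.0, 0.0, 2.0],
      [3.0, 1.0, 0.0],
      [0.0, 2.0, 2.0],
      [0.0, 1.0, 4.0]]

pooled = mean_pool(H3)                     # inner term of Eq. (10)
F_gcn = layer_norm(sigmoid_gate(pooled))   # assumed order: Attn, then layer norm
print(pooled)
```
</verbatim>
          <p>Mean pooling makes the graph-level vector independent of the number and ordering of nodes, so the output dimension depends only on the channel width of the last GCN layer.</p>
        </para>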
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S5.SSx3.SSS3">
        <tags>
          <tag>V-C3</tag>
          <tag role="refnum">V-C3</tag>
          <tag role="typerefnum">§V-C3</tag>
        </tags>
        <title><tag close=" ">V-C3</tag>Feature Fusion and Classification</title>
        <para xml:id="S5.SSx3.SSS3.p1">
          <p>The attention-refined CNN features and the GCN features are concatenated, and the fused representation <Math mode="inline" tex="\left[\mathbf{F}^{\text{attn}}_{\text{CNN}}\,\|\,\mathbf{F}_{\text{GCN}}\right%&#10;]\in\mathbb{R}^{2176}" text="delimited-[]@(conditional@((F ^ [attn]) _ [CNN], F _ [GCN])) element-of R ^ 2176" xml:id="S5.SSx3.SSS3.p1.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="delimited-[]"/>
                      <XMRef idref="S5.SSx3.SSS3.p1.m1.1"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="true">[</XMTok>
                      <XMApp xml:id="S5.SSx3.SSS3.p1.m1.1">
                        <XMTok meaning="conditional" name="||" role="MODIFIEROP" rpadding="1.7pt">∥</XMTok>
                        <XMApp rpadding="1.7pt">
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMApp>
                            <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                            <XMTok font="bold" role="UNKNOWN">F</XMTok>
                            <XMText><text fontsize="70%">attn</text></XMText>
                          </XMApp>
                          <XMText><text fontsize="70%">CNN</text></XMText>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold" role="UNKNOWN">F</XMTok>
                          <XMText><text fontsize="70%">GCN</text></XMText>
                        </XMApp>
                      </XMApp>
                      <XMTok role="CLOSE" stretchy="true">]</XMTok>
                    </XMWrap>
                  </XMDual>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2176" role="NUMBER">2176</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> is passed through the classification head. Both the CNN and GCN branches contribute to the final prediction.</p>
        </para>
        <para xml:id="S5.SSx3.SSS3.p2">
          <equation xml:id="S5.E11">
            <tags>
              <tag>(11)</tag>
              <tag role="refnum">11</tag>
            </tags>
            <Math mode="display" tex="\mathbf{F}_{\text{fused}}=[\mathbf{F}_{\text{CNN}}^{\text{attn}}\parallel%&#10;\mathbf{F}_{\text{GCN}}]\in\mathbb{R}^{2176}" text="F _ [fused] = delimited-[]@(conditional@((F _ [CNN]) ^ [attn], F _ [GCN])) element-of R ^ 2176" xml:id="S5.E11.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="multirelation"/>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">fused</text></XMText>
                  </XMApp>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMDual>
                    <XMApp>
                      <XMTok meaning="delimited-[]"/>
                      <XMRef idref="S5.E11.m1.1"/>
                    </XMApp>
                    <XMWrap>
                      <XMTok role="OPEN" stretchy="false">[</XMTok>
                      <XMApp xml:id="S5.E11.m1.1">
                        <XMTok meaning="conditional" name="parallel" role="MODIFIEROP">∥</XMTok>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                          <XMApp>
                            <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                            <XMTok font="bold" role="UNKNOWN">F</XMTok>
                            <XMText><text fontsize="70%">CNN</text></XMText>
                          </XMApp>
                          <XMText><text fontsize="70%">attn</text></XMText>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">F</XMTok>
                          <XMText><text fontsize="70%">GCN</text></XMText>
                        </XMApp>
                      </XMApp>
                      <XMTok role="CLOSE" stretchy="false">]</XMTok>
                    </XMWrap>
                  </XMDual>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMTok fontsize="70%" meaning="2176" role="NUMBER">2176</XMTok>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
          </equation>
        </para>
        <para xml:id="S5.SSx3.SSS3.p3">
          <equationgroup class="ltx_eqn_align" xml:id="S8.EGx1">
            <equation xml:id="S5.E12">
              <tags>
                <tag>(12)</tag>
                <tag role="refnum">12</tag>
              </tags>
              <MathFork>
                <Math tex="\displaystyle\mathbf{h}_{1}=\text{ReLU}(\text{LN}(W_{1}\cdot\mathbf{F}_{\text{%&#10;fused}}))" text="h _ 1 = [ReLU] * [LN] * (W _ 1 cdot F _ [fused])" xml:id="S5.E12.m3">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">h</XMTok>
                        <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMText>ReLU</XMText>
                        <XMDual>
                          <XMRef idref="S5.E12.m3.1"/>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMApp xml:id="S5.E12.m3.1">
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMText>LN</XMText>
                              <XMDual>
                                <XMRef idref="S5.E12.m3.1.1"/>
                                <XMWrap>
                                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                                  <XMApp xml:id="S5.E12.m3.1.1">
                                    <XMTok name="cdot" role="MULOP">⋅</XMTok>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                      <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                      <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                      <XMTok font="bold" role="UNKNOWN">F</XMTok>
                                      <XMText><text fontsize="70%">fused</text></XMText>
                                    </XMApp>
                                  </XMApp>
                                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math>
                <MathBranch>
                  <td align="right"><Math mode="inline" tex="\displaystyle\mathbf{h}_{1}" text="h _ 1" xml:id="S5.E12.m1">
                      <XMath>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">h</XMTok>
                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle=\text{ReLU}(\text{LN}(W_{1}\cdot\mathbf{F}_{\text{fused}}))" text="absent = [ReLU] * [LN] * (W _ 1 cdot F _ [fused])" xml:id="S5.E12.m2">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="equals" role="RELOP">=</XMTok>
                          <XMTok meaning="absent"/>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMText>ReLU</XMText>
                            <XMDual>
                              <XMRef idref="S5.E12.m2.1"/>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">(</XMTok>
                                <XMApp xml:id="S5.E12.m2.1">
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMText>LN</XMText>
                                  <XMDual>
                                    <XMRef idref="S5.E12.m2.1.1"/>
                                    <XMWrap>
                                      <XMTok role="OPEN" stretchy="false">(</XMTok>
                                      <XMApp xml:id="S5.E12.m2.1.1">
                                        <XMTok name="cdot" role="MULOP">⋅</XMTok>
                                        <XMApp>
                                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                          <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                                        </XMApp>
                                        <XMApp>
                                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                          <XMTok font="bold" role="UNKNOWN">F</XMTok>
                                          <XMText><text fontsize="70%">fused</text></XMText>
                                        </XMApp>
                                      </XMApp>
                                      <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                    </XMWrap>
                                  </XMDual>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </MathBranch>
              </MathFork>
            </equation>
            <equation xml:id="S5.E13">
              <tags>
                <tag>(13)</tag>
                <tag role="refnum">13</tag>
              </tags>
              <MathFork>
                <Math tex="\displaystyle\mathbf{h}_{2}=\text{ReLU}(\text{LN}(W_{2}\cdot\mathbf{h}_{1}))" text="h _ 2 = [ReLU] * [LN] * (W _ 2 cdot h _ 1)" xml:id="S5.E13.m3">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">h</XMTok>
                        <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMText>ReLU</XMText>
                        <XMDual>
                          <XMRef idref="S5.E13.m3.1"/>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMApp xml:id="S5.E13.m3.1">
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMText>LN</XMText>
                              <XMDual>
                                <XMRef idref="S5.E13.m3.1.1"/>
                                <XMWrap>
                                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                                  <XMApp xml:id="S5.E13.m3.1.1">
                                    <XMTok name="cdot" role="MULOP">⋅</XMTok>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                      <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                      <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                      <XMTok font="bold" role="UNKNOWN">h</XMTok>
                                      <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math>
                <MathBranch>
                  <td align="right"><Math mode="inline" tex="\displaystyle\mathbf{h}_{2}" text="h _ 2" xml:id="S5.E13.m1">
                      <XMath>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">h</XMTok>
                          <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle=\text{ReLU}(\text{LN}(W_{2}\cdot\mathbf{h}_{1}))" text="absent = [ReLU] * [LN] * (W _ 2 cdot h _ 1)" xml:id="S5.E13.m2">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="equals" role="RELOP">=</XMTok>
                          <XMTok meaning="absent"/>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMText>ReLU</XMText>
                            <XMDual>
                              <XMRef idref="S5.E13.m2.1"/>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">(</XMTok>
                                <XMApp xml:id="S5.E13.m2.1">
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMText>LN</XMText>
                                  <XMDual>
                                    <XMRef idref="S5.E13.m2.1.1"/>
                                    <XMWrap>
                                      <XMTok role="OPEN" stretchy="false">(</XMTok>
                                      <XMApp xml:id="S5.E13.m2.1.1">
                                        <XMTok name="cdot" role="MULOP">⋅</XMTok>
                                        <XMApp>
                                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                          <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                          <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                                        </XMApp>
                                        <XMApp>
                                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                          <XMTok font="bold" role="UNKNOWN">h</XMTok>
                                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                                        </XMApp>
                                      </XMApp>
                                      <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                    </XMWrap>
                                  </XMDual>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </MathBranch>
              </MathFork>
            </equation>
            <equation xml:id="S5.E14">
              <tags>
                <tag>(14)</tag>
                <tag role="refnum">14</tag>
              </tags>
              <MathFork>
                <Math tex="\displaystyle\hat{\mathbf{y}}=\text{Softmax}(W_{3}\cdot\mathbf{h}_{2})" text="hat@(y) = [Softmax] * (W _ 3 cdot h _ 2)" xml:id="S5.E14.m3">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok name="hat" role="OVERACCENT" stretchy="false">^</XMTok>
                        <XMTok font="bold" role="UNKNOWN">y</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMText>Softmax</XMText>
                        <XMDual>
                          <XMRef idref="S5.E14.m3.1"/>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMApp xml:id="S5.E14.m3.1">
                              <XMTok name="cdot" role="MULOP">⋅</XMTok>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                <XMTok font="bold" role="UNKNOWN">h</XMTok>
                                <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math>
                <MathBranch>
                  <td align="right"><Math mode="inline" tex="\displaystyle\hat{\mathbf{y}}" text="hat@(y)" xml:id="S5.E14.m1">
                      <XMath>
                        <XMApp>
                          <XMTok name="hat" role="OVERACCENT" stretchy="false">^</XMTok>
                          <XMTok font="bold" role="UNKNOWN">y</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle=\text{Softmax}(W_{3}\cdot\mathbf{h}_{2})" text="absent = [Softmax] * (W _ 3 cdot h _ 2)" xml:id="S5.E14.m2">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="equals" role="RELOP">=</XMTok>
                          <XMTok meaning="absent"/>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMText>Softmax</XMText>
                            <XMDual>
                              <XMRef idref="S5.E14.m2.1"/>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">(</XMTok>
                                <XMApp xml:id="S5.E14.m2.1">
                                  <XMTok name="cdot" role="MULOP">⋅</XMTok>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                    <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                                    <XMTok font="bold" role="UNKNOWN">h</XMTok>
                                    <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </MathBranch>
              </MathFork>
            </equation>
          </equationgroup>
        </para>
        <para xml:id="S5.SSx3.SSS3.p4">
          <p>Here, <Math mode="inline" tex="W_{1}\in\mathbb{R}^{512\times 2176}" text="W _ 1 element-of R ^ (512 * 2176)" xml:id="S5.SSx3.SSS3.p4.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" role="UNKNOWN">W</XMTok>
                    <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="times" role="MULOP">×</XMTok>
                      <XMTok fontsize="70%" meaning="512" role="NUMBER">512</XMTok>
                      <XMTok fontsize="70%" meaning="2176" role="NUMBER">2176</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> and <Math mode="inline" tex="W_{2}\in\mathbb{R}^{256\times 512}" text="W _ 2 element-of R ^ (256 * 512)" xml:id="S5.SSx3.SSS3.p4.m2">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" role="UNKNOWN">W</XMTok>
                    <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="times" role="MULOP">×</XMTok>
                      <XMTok fontsize="70%" meaning="256" role="NUMBER">256</XMTok>
                      <XMTok fontsize="70%" meaning="512" role="NUMBER">512</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> are learned weight matrices, <Math mode="inline" tex="W_{3}\in\mathbb{R}^{7\times 256}" text="W _ 3 element-of R ^ (7 * 256)" xml:id="S5.SSx3.SSS3.p4.m3">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="italic" role="UNKNOWN">W</XMTok>
                    <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="times" role="MULOP">×</XMTok>
                      <XMTok fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                      <XMTok fontsize="70%" meaning="256" role="NUMBER">256</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> is the final linear projection, <Math mode="inline" tex="\mathbf{h}_{1}" text="h _ 1" xml:id="S5.SSx3.SSS3.p4.m4">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">h</XMTok>
                  <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                </XMApp>
              </XMath>
            </Math> and <Math mode="inline" tex="\mathbf{h}_{2}" text="h _ 2" xml:id="S5.SSx3.SSS3.p4.m5">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">h</XMTok>
                  <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                </XMApp>
              </XMath>
            </Math> are the intermediate 512-dimensional and 256-dimensional hidden vectors, respectively. ReLU is the rectified linear activation function, LN denotes layer normalization as introduced by Ba et al. <cite class="ltx_citemacro_cite">[<bibref bibrefs="ba2016layernorm" separator="," yyseparator=","/>]</cite>, <Math mode="inline" tex="\mathbf{F}^{\text{attn}}_{\text{CNN}}" text="(F ^ [attn]) _ [CNN]" xml:id="S5.SSx3.SSS3.p4.m6">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">F</XMTok>
                    <XMText><text fontsize="70%">attn</text></XMText>
                  </XMApp>
                  <XMText><text fontsize="70%">CNN</text></XMText>
                </XMApp>
              </XMath>
            </Math> is the 2048-dimensional attention-refined CNN feature vector and <Math mode="inline" tex="\hat{\mathbf{y}}" text="hat@(y)" xml:id="S5.SSx3.SSS3.p4.m7">
              <XMath>
                <XMApp>
                  <XMTok name="hat" role="OVERACCENT" stretchy="false">^</XMTok>
                  <XMTok font="bold" role="UNKNOWN">y</XMTok>
                </XMApp>
              </XMath>
            </Math> is the predicted probability vector over the seven emotion classes. The CNN and GCN branches contribute complementary information to the fused representation.
This process is illustrated in Fig. <ref labelref="LABEL:fig:ffcp"/>.</p>
        </para>
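        <para class="ltx_noindent">
          <p>As a minimal sketch of Eqs. (11) to (14), the NumPy snippet below pushes a single fused vector through the classification head. It is an illustration under stated assumptions, not the authors' implementation: the weights are random stand-ins for trained parameters, layer normalization is applied without a learned gain or bias, and bias terms and dropout are left out because the equations do not show them. The function and variable names are hypothetical.</p>
        </para>

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def relu(x):
    """Rectified linear activation."""
    return np.maximum(x, 0.0)


def softmax(z):
    """Softmax over a vector, shifted by its maximum for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()


def classifier_head(f_fused, W1, W2, W3):
    """Apply Eqs. (12)-(14) to one fused feature vector."""
    h1 = relu(layer_norm(W1 @ f_fused))  # Eq. (12): 2176 to 512
    h2 = relu(layer_norm(W2 @ h1))       # Eq. (13): 512 to 256
    return softmax(W3 @ h2)              # Eq. (14): 256 to 7


rng = np.random.default_rng(0)

# Random stand-ins for the attention-refined CNN feature and the GCN embedding.
f_cnn = rng.standard_normal(2048)
f_gcn = rng.standard_normal(128)
f_fused = np.concatenate([f_cnn, f_gcn])  # Eq. (11): 2048 + 128 = 2176

# Random stand-ins for the learned weight matrices (assumption: no biases).
W1 = rng.standard_normal((512, 2176)) * 0.02
W2 = rng.standard_normal((256, 512)) * 0.02
W3 = rng.standard_normal((7, 256)) * 0.02

y_hat = classifier_head(f_fused, W1, W2, W3)
print(f_fused.shape, y_hat.shape, float(y_hat.sum()))
```

        <para class="ltx_noindent">
          <p>The sketch uses the column-vector convention of the equations, so each weight matrix has shape (output dimension, input dimension) and multiplies the feature vector from the left.</p>
        </para>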
        <figure inlist="lof" labels="LABEL:alg:classifier_head_corrected LABEL:fig:ffcp" placement="h" xml:id="S5.F7">
          <tags>
            <tag>Fig. 7</tag>
            <tag role="refnum">7</tag>
            <tag role="typerefnum">Fig. 7</tag>
          </tags>
          <graphics candidates="ffcp.pdf" class="ltx_centering" graphic="ffcp.pdf" options="width=252.945pt" xml:id="S5.F7.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">7</tag>The attention-refined CNN feature vector (2048-D) is concatenated with the pooled GCN embedding (128-D) to form a 2176-D fused representation. This vector is passed through a classification head with two fully connected layers, each followed by layer normalization, ReLU activation, and dropout for regularization. The final dense layer projects to the number of emotion classes, producing logits that a softmax function converts into predicted class probabilities. The fusion combines the global appearance features of the CNN with the localized geometric cues of the GCN for facial emotion recognition.</toccaption>
          <caption class="ltx_centering"><tag close=": ">Fig. 7</tag>The attention-refined CNN feature vector (2048-D) is concatenated with the pooled GCN embedding (128-D) to form a 2176-D fused representation. This vector is passed through a classification head with two fully connected layers, each followed by layer normalization, ReLU activation, and dropout for regularization. The final dense layer projects to the number of emotion classes, producing logits that a softmax function converts into predicted class probabilities. The fusion combines the global appearance features of the CNN with the localized geometric cues of the GCN for facial emotion recognition.</caption>
        </figure>
        <para class="ltx_noindent" xml:id="S5.SSx3.SSS3.p5">
          <p>Inputs of Fusion-N:</p>
        </para>
        <para xml:id="S5.SSx3.SSS3.p6">
          <enumerate xml:id="S5.I1">
            <item xml:id="S5.I1.i1">
              <tags>
                <tag>1.</tag>
                <tag role="refnum">1</tag>
                <tag role="typerefnum">item 1</tag>
              </tags>
              <para xml:id="S5.I1.i1.p1">
                <p>Images of shape <Math mode="inline" tex="[B,3,H,W]" text="list@(B, 3, H, W)" xml:id="S5.I1.i1.p1.m1">
                    <XMath>
                      <XMDual>
                        <XMApp>
                          <XMTok meaning="list"/>
                          <XMRef idref="S5.I1.i1.p1.m1.1"/>
                          <XMRef idref="S5.I1.i1.p1.m1.2"/>
                          <XMRef idref="S5.I1.i1.p1.m1.3"/>
                          <XMRef idref="S5.I1.i1.p1.m1.4"/>
                        </XMApp>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">[</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S5.I1.i1.p1.m1.1">B</XMTok>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMTok meaning="3" role="NUMBER" xml:id="S5.I1.i1.p1.m1.2">3</XMTok>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S5.I1.i1.p1.m1.3">H</XMTok>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S5.I1.i1.p1.m1.4">W</XMTok>
                          <XMTok role="CLOSE" stretchy="false">]</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMath>
                  </Math>, where <Math mode="inline" tex="B" text="B" xml:id="S5.I1.i1.p1.m2">
                    <XMath>
                      <XMTok font="italic" role="UNKNOWN">B</XMTok>
                    </XMath>
                  </Math> is the batch size,
3 refers to RGB channels, and <Math mode="inline" tex="H\times W" text="H * W" xml:id="S5.I1.i1.p1.m3">
                    <XMath>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">×</XMTok>
                        <XMTok font="italic" role="UNKNOWN">H</XMTok>
                        <XMTok font="italic" role="UNKNOWN">W</XMTok>
                      </XMApp>
                    </XMath>
                  </Math> is the spatial resolution.</p>
              </para>
            </item>
            <item xml:id="S5.I1.i2">
              <tags>
                <tag>2.</tag>
                <tag role="refnum">2</tag>
                <tag role="typerefnum">item 2</tag>
              </tags>
              <para xml:id="S5.I1.i2.p1">
                <p>Landmarks of shape <Math mode="inline" tex="[B,468,3]" text="list@(B, 468, 3)" xml:id="S5.I1.i2.p1.m1">
                    <XMath>
                      <XMDual>
                        <XMApp>
                          <XMTok meaning="list"/>
                          <XMRef idref="S5.I1.i2.p1.m1.1"/>
                          <XMRef idref="S5.I1.i2.p1.m1.2"/>
                          <XMRef idref="S5.I1.i2.p1.m1.3"/>
                        </XMApp>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">[</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S5.I1.i2.p1.m1.1">B</XMTok>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMTok meaning="468" role="NUMBER" xml:id="S5.I1.i2.p1.m1.2">468</XMTok>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMTok meaning="3" role="NUMBER" xml:id="S5.I1.i2.p1.m1.3">3</XMTok>
                          <XMTok role="CLOSE" stretchy="false">]</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMath>
                  </Math>, where <Math mode="inline" tex="B" text="B" xml:id="S5.I1.i2.p1.m2">
                    <XMath>
                      <XMTok font="italic" role="UNKNOWN">B</XMTok>
                    </XMath>
                  </Math> is the batch size,
468 is the number of landmarks (from MediaPipe Face Mesh), and
3 denotes <Math mode="inline" tex="(x,y,z)" text="vector@(x, y, z)" xml:id="S5.I1.i2.p1.m3">
                    <XMath>
                      <XMDual>
                        <XMApp>
                          <XMTok meaning="vector"/>
                          <XMRef idref="S5.I1.i2.p1.m3.1"/>
                          <XMRef idref="S5.I1.i2.p1.m3.2"/>
                          <XMRef idref="S5.I1.i2.p1.m3.3"/>
                        </XMApp>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S5.I1.i2.p1.m3.1">x</XMTok>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S5.I1.i2.p1.m3.2">y</XMTok>
                          <XMTok role="PUNCT">,</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S5.I1.i2.p1.m3.3">z</XMTok>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMath>
                  </Math> coordinates.</p>
              </para>
            </item>
          </enumerate>
        </para>
        <para class="ltx_noindent" xml:id="S5.SSx3.SSS3.p7">
          <p>Output of Fusion-N: Logits of shape <Math mode="inline" tex="[B,\text{num\_classes}]" text="closed-interval@(B, [num_classes])" xml:id="S5.SSx3.SSS3.p7.m1">
              <XMath>
                <XMDual>
                  <XMApp>
                    <XMTok meaning="closed-interval"/>
                    <XMRef idref="S5.SSx3.SSS3.p7.m1.1"/>
                    <XMRef idref="S5.SSx3.SSS3.p7.m1.2"/>
                  </XMApp>
                  <XMWrap>
                    <XMTok role="OPEN" stretchy="false">[</XMTok>
                    <XMTok font="italic" role="UNKNOWN" xml:id="S5.SSx3.SSS3.p7.m1.1">B</XMTok>
                    <XMTok role="PUNCT">,</XMTok>
                    <XMText xml:id="S5.SSx3.SSS3.p7.m1.2">num_classes</XMText>
                    <XMTok role="CLOSE" stretchy="false">]</XMTok>
                  </XMWrap>
                </XMDual>
              </XMath>
            </Math>, i.e., raw scores before softmax.</p>
        </para>
        <para class="ltx_noindent" xml:id="S5.SSx3.SSS3.p8">
          <p>Feature dimensions: The model computes a 2048-dimensional attention-refined CNN feature vector and a 128-dimensional GCN embedding. CNN and GCN features are concatenated, and the fused 2176-dimensional vector is passed through the classification head for final emotion prediction.</p>
        </para>
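        <para class="ltx_noindent">
          <p>The batch-level shape contract described above can be checked with a short NumPy sketch. It is only illustrative: the CNN and GCN encoders are replaced by random feature matrices, and a single hypothetical linear map (W_head) stands in for the whole classification head so that the logit and probability shapes can be verified. None of these names come from the original implementation.</p>
        </para>

```python
import numpy as np

NUM_CLASSES = 7  # seven emotion classes, as stated in the text


def fuse(cnn_feat, gcn_feat):
    """Concatenate per-sample CNN and GCN features along the feature axis."""
    if cnn_feat.shape[0] != gcn_feat.shape[0]:
        raise ValueError("CNN and GCN batches must have the same size")
    return np.concatenate([cnn_feat, gcn_feat], axis=1)


def softmax_rows(logits):
    """Convert logits of shape [B, num_classes] into row-wise probabilities."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


B = 4  # example batch size
rng = np.random.default_rng(1)

# Random stand-ins for the encoder outputs (the real encoders are not modeled).
cnn_feat = rng.standard_normal((B, 2048))  # attention-refined CNN features
gcn_feat = rng.standard_normal((B, 128))   # pooled GCN embeddings

fused = fuse(cnn_feat, gcn_feat)           # shape [B, 2176]

# Hypothetical single linear map standing in for the full classification head.
W_head = rng.standard_normal((2176, NUM_CLASSES)) * 0.01
logits = fused @ W_head                    # shape [B, num_classes], raw scores
probs = softmax_rows(logits)               # each row sums to 1

print(fused.shape, logits.shape, probs.sum(axis=1))
```

        <para class="ltx_noindent">
          <p>Here the batch dimension comes first, so features are row vectors and weight matrices multiply from the right, with shape (input dimension, output dimension).</p>
        </para>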
        <para xml:id="S5.SSx3.SSS3.p9">
          <p><text class="ltx_caption">Classifier Head Pseudo‑Algorithm</text>

<text font="bold">Require:</text> <Math mode="inline" tex="\mathbf{X}\in\mathbb{R}^{B\times 2176}" text="X element-of R ^ (B * 2176)" xml:id="S5.SSx3.SSS3.p9.m1">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMTok font="bold" role="UNKNOWN">X</XMTok>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="times" role="MULOP">×</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">B</XMTok>
                      <XMTok fontsize="70%" meaning="2176" role="NUMBER">2176</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> Fused feature matrix (batch size <Math mode="inline" tex="B" text="B" xml:id="S5.SSx3.SSS3.p9.m2">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">B</XMTok>
              </XMath>
            </Math>), with layer weights and biases
<Math mode="inline" tex="\mathbf{W}_{1}\in\mathbb{R}^{2176\times 512},\ \mathbf{b}_{1}\in\mathbb{R}^{512}" text="formulae@(W _ 1 element-of R ^ (2176 * 512), b _ 1 element-of R ^ 512)" xml:id="S5.SSx3.SSS3.p9.m3">
              <XMath>
                <XMDual>
                  <XMApp>
                    <XMTok meaning="formulae"/>
                    <XMRef idref="S5.SSx3.SSS3.p9.m3.1"/>
                    <XMRef idref="S5.SSx3.SSS3.p9.m3.2"/>
                  </XMApp>
                  <XMWrap>
                    <XMApp xml:id="S5.SSx3.SSS3.p9.m3.1">
                      <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">W</XMTok>
                        <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="times" role="MULOP">×</XMTok>
                          <XMTok fontsize="70%" meaning="2176" role="NUMBER">2176</XMTok>
                          <XMTok fontsize="70%" meaning="512" role="NUMBER">512</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMTok role="PUNCT" rpadding="5.0pt">,</XMTok>
                    <XMApp xml:id="S5.SSx3.SSS3.p9.m3.2">
                      <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">b</XMTok>
                        <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                        <XMTok fontsize="70%" meaning="512" role="NUMBER">512</XMTok>
                      </XMApp>
                    </XMApp>
                  </XMWrap>
                </XMDual>
              </XMath>
            </Math>
<Math mode="inline" tex="\mathbf{W}_{2}\in\mathbb{R}^{512\times 256},\ \mathbf{b}_{2}\in\mathbb{R}^{256}" text="formulae@(W _ 2 element-of R ^ (512 * 256), b _ 2 element-of R ^ 256)" xml:id="S5.SSx3.SSS3.p9.m4">
              <XMath>
                <XMDual>
                  <XMApp>
                    <XMTok meaning="formulae"/>
                    <XMRef idref="S5.SSx3.SSS3.p9.m4.1"/>
                    <XMRef idref="S5.SSx3.SSS3.p9.m4.2"/>
                  </XMApp>
                  <XMWrap>
                    <XMApp xml:id="S5.SSx3.SSS3.p9.m4.1">
                      <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">W</XMTok>
                        <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="times" role="MULOP">×</XMTok>
                          <XMTok fontsize="70%" meaning="512" role="NUMBER">512</XMTok>
                          <XMTok fontsize="70%" meaning="256" role="NUMBER">256</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMTok role="PUNCT" rpadding="5.0pt">,</XMTok>
                    <XMApp xml:id="S5.SSx3.SSS3.p9.m4.2">
                      <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">b</XMTok>
                        <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                        <XMTok fontsize="70%" meaning="256" role="NUMBER">256</XMTok>
                      </XMApp>
                    </XMApp>
                  </XMWrap>
                </XMDual>
              </XMath>
            </Math>
<!--  %**** main.tex Line 450 **** --><Math mode="inline" tex="\mathbf{W}_{3}\in\mathbb{R}^{256\times 7},\ \mathbf{b}_{3}\in\mathbb{R}^{7}" text="formulae@(W _ 3 element-of R ^ (256 * 7), b _ 3 element-of R ^ 7)" xml:id="S5.SSx3.SSS3.p9.m5">
              <XMath>
                <XMDual>
                  <XMApp>
                    <XMTok meaning="formulae"/>
                    <XMRef idref="S5.SSx3.SSS3.p9.m5.1"/>
                    <XMRef idref="S5.SSx3.SSS3.p9.m5.2"/>
                  </XMApp>
                  <XMWrap>
                    <XMApp xml:id="S5.SSx3.SSS3.p9.m5.1">
                      <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">W</XMTok>
                        <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="times" role="MULOP">×</XMTok>
                          <XMTok fontsize="70%" meaning="256" role="NUMBER">256</XMTok>
                          <XMTok fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                    <XMTok role="PUNCT" rpadding="5.0pt">,</XMTok>
                    <XMApp xml:id="S5.SSx3.SSS3.p9.m5.2">
                      <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">b</XMTok>
                        <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                        <XMTok fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                      </XMApp>
                    </XMApp>
                  </XMWrap>
                </XMDual>
              </XMath>
            </Math>
<break/><text font="bold">Ensure:</text> <Math mode="inline" tex="\mathbf{logits}\in\mathbb{R}^{B\times 7}" text="logits element-of R ^ (B * 7)" xml:id="S5.SSx3.SSS3.p9.m6">
              <XMath>
                <XMApp>
                  <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                  <XMTok font="bold" role="UNKNOWN">logits</XMTok>
                  <XMApp>
                    <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                    <XMApp>
                      <XMTok fontsize="70%" meaning="times" role="MULOP">×</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">B</XMTok>
                      <XMTok fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math> Pre-softmax scores for each emotion class
<break/><text font="bold">for</text> <Math mode="inline" tex="i\leftarrow 1" text="i leftarrow 1" xml:id="S5.SSx3.SSS3.p9.m7">
              <XMath>
                <XMApp>
                  <XMTok name="leftarrow" role="ARROW">←</XMTok>
                  <XMTok font="italic" role="UNKNOWN">i</XMTok>
                  <XMTok meaning="1" role="NUMBER">1</XMTok>
                </XMApp>
              </XMath>
            </Math> to <Math mode="inline" tex="B" text="B" xml:id="S5.SSx3.SSS3.p9.m8">
              <XMath>
                <XMTok font="italic" role="UNKNOWN">B</XMTok>
              </XMath>
            </Math>
<break/><text font="bold">FC1:</text> <Math mode="inline" tex="\mathbf{Z}_{1}\leftarrow\mathbf{X}[i]\mathbf{W}_{1}+\mathbf{b}_{1}" text="Z _ 1 leftarrow X * delimited-[]@(i) * W _ 1 + b _ 1" xml:id="S5.SSx3.SSS3.p9.m9">
              <XMath>
                <XMApp>
                  <XMTok name="leftarrow" role="ARROW">←</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">Z</XMTok>
                    <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="plus" role="ADDOP">+</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="bold" role="UNKNOWN">X</XMTok>
                      <XMDual>
                        <XMApp>
                          <XMTok meaning="delimited-[]"/>
                          <XMRef idref="S5.SSx3.SSS3.p9.m9.1"/>
                        </XMApp>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">[</XMTok>
                          <XMTok font="italic" role="UNKNOWN" xml:id="S5.SSx3.SSS3.p9.m9.1">i</XMTok>
                          <XMTok role="CLOSE" stretchy="false">]</XMTok>
                        </XMWrap>
                      </XMDual>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">W</XMTok>
                        <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="bold" role="UNKNOWN">b</XMTok>
                      <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
<break/><text font="bold">LN1:</text> <Math mode="inline" tex="\mathbf{N}_{1}\leftarrow\mathrm{LayerNorm}(\mathbf{Z}_{1})" text="N _ 1 leftarrow LayerNorm * Z _ 1" xml:id="S5.SSx3.SSS3.p9.m10">
              <XMath>
                <XMApp>
                  <XMTok name="leftarrow" role="ARROW">←</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">N</XMTok>
                    <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                    <XMTok role="UNKNOWN">LayerNorm</XMTok>
                    <XMDual>
                      <XMRef idref="S5.SSx3.SSS3.p9.m10.1"/>
                      <XMWrap>
                        <XMTok role="OPEN" stretchy="false">(</XMTok>
                        <XMApp xml:id="S5.SSx3.SSS3.p9.m10.1">
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">Z</XMTok>
                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                        </XMApp>
                        <XMTok role="CLOSE" stretchy="false">)</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
<break/><text font="bold">ReLU1:</text> <Math mode="inline" tex="\mathbf{A}_{1}\leftarrow\mathrm{ReLU}(\mathbf{N}_{1})" text="A _ 1 leftarrow ReLU * N _ 1" xml:id="S5.SSx3.SSS3.p9.m11">
              <XMath>
                <XMApp>
                  <XMTok name="leftarrow" role="ARROW">←</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">A</XMTok>
                    <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                    <XMTok role="UNKNOWN">ReLU</XMTok>
                    <XMDual>
                      <XMRef idref="S5.SSx3.SSS3.p9.m11.1"/>
                      <XMWrap>
                        <XMTok role="OPEN" stretchy="false">(</XMTok>
                        <XMApp xml:id="S5.SSx3.SSS3.p9.m11.1">
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">N</XMTok>
                          <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                        </XMApp>
                        <XMTok role="CLOSE" stretchy="false">)</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
<break/><text font="bold">Drop1:</text> <Math mode="inline" tex="\mathbf{D}_{1}\leftarrow\mathrm{Dropout}(\mathbf{A}_{1},\ p=0.325)" xml:id="S5.SSx3.SSS3.p9.m12">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">D</XMTok>
                  <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                </XMApp>
                <XMTok name="leftarrow" role="ARROW">←</XMTok>
                <XMTok role="UNKNOWN">Dropout</XMTok>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">A</XMTok>
                    <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                  </XMApp>
                  <XMTok role="PUNCT" rpadding="5.0pt">,</XMTok>
                  <XMTok font="italic" role="UNKNOWN">p</XMTok>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMTok meaning="0.325" role="NUMBER">0.325</XMTok>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMath>
            </Math>
<break/><text font="bold">FC2:</text> <Math mode="inline" tex="\mathbf{Z}_{2}\leftarrow\mathbf{D}_{1}\mathbf{W}_{2}+\mathbf{b}_{2}" text="Z _ 2 leftarrow D _ 1 * W _ 2 + b _ 2" xml:id="S5.SSx3.SSS3.p9.m13">
              <XMath>
                <XMApp>
                  <XMTok name="leftarrow" role="ARROW">←</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">Z</XMTok>
                    <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="plus" role="ADDOP">+</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">D</XMTok>
                        <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">W</XMTok>
                        <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="bold" role="UNKNOWN">b</XMTok>
                      <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
<break/><text font="bold">LN2:</text> <Math mode="inline" tex="\mathbf{N}_{2}\leftarrow\mathrm{LayerNorm}(\mathbf{Z}_{2})" text="N _ 2 leftarrow LayerNorm * Z _ 2" xml:id="S5.SSx3.SSS3.p9.m14">
              <XMath>
                <XMApp>
                  <XMTok name="leftarrow" role="ARROW">←</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">N</XMTok>
                    <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                    <XMTok role="UNKNOWN">LayerNorm</XMTok>
                    <XMDual>
                      <XMRef idref="S5.SSx3.SSS3.p9.m14.1"/>
                      <XMWrap>
                        <XMTok role="OPEN" stretchy="false">(</XMTok>
                        <XMApp xml:id="S5.SSx3.SSS3.p9.m14.1">
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">Z</XMTok>
                          <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                        </XMApp>
                        <XMTok role="CLOSE" stretchy="false">)</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
<break/><text font="bold">ReLU2:</text> <Math mode="inline" tex="\mathbf{A}_{2}\leftarrow\mathrm{ReLU}(\mathbf{N}_{2})" text="A _ 2 leftarrow ReLU * N _ 2" xml:id="S5.SSx3.SSS3.p9.m15">
              <XMath>
                <XMApp>
                  <XMTok name="leftarrow" role="ARROW">←</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">A</XMTok>
                    <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                    <XMTok role="UNKNOWN">ReLU</XMTok>
                    <XMDual>
                      <XMRef idref="S5.SSx3.SSS3.p9.m15.1"/>
                      <XMWrap>
                        <XMTok role="OPEN" stretchy="false">(</XMTok>
                        <XMApp xml:id="S5.SSx3.SSS3.p9.m15.1">
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="bold" role="UNKNOWN">N</XMTok>
                          <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                        </XMApp>
                        <XMTok role="CLOSE" stretchy="false">)</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
<break/><text font="bold">Drop2:</text> <Math mode="inline" tex="\mathbf{D}_{2}\leftarrow\mathrm{Dropout}(\mathbf{A}_{2},\ p=0.275)" xml:id="S5.SSx3.SSS3.p9.m16">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold" role="UNKNOWN">D</XMTok>
                  <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                </XMApp>
                <XMTok name="leftarrow" role="ARROW">←</XMTok>
                <XMTok role="UNKNOWN">Dropout</XMTok>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold" role="UNKNOWN">A</XMTok>
                    <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                  </XMApp>
                  <XMTok role="PUNCT" rpadding="5.0pt">,</XMTok>
                  <XMTok font="italic" role="UNKNOWN">p</XMTok>
                  <XMTok meaning="equals" role="RELOP">=</XMTok>
                  <XMTok meaning="0.275" role="NUMBER">0.275</XMTok>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMath>
            </Math>
<break/><text font="bold">FC3:</text> <Math mode="inline" tex="\mathbf{logits}[i]\leftarrow\mathbf{D}_{2}\mathbf{W}_{3}+\mathbf{b}_{3}" text="logits * delimited-[]@(i) leftarrow D _ 2 * W _ 3 + b _ 3" xml:id="S5.SSx3.SSS3.p9.m17">
              <XMath>
                <XMApp>
                  <XMTok name="leftarrow" role="ARROW">←</XMTok>
                  <XMApp>
                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                    <XMTok font="bold" role="UNKNOWN">logits</XMTok>
                    <XMDual>
                      <XMApp>
                        <XMTok meaning="delimited-[]"/>
                        <XMRef idref="S5.SSx3.SSS3.p9.m17.1"/>
                      </XMApp>
                      <XMWrap>
                        <XMTok role="OPEN" stretchy="false">[</XMTok>
                        <XMTok font="italic" role="UNKNOWN" xml:id="S5.SSx3.SSS3.p9.m17.1">i</XMTok>
                        <XMTok role="CLOSE" stretchy="false">]</XMTok>
                      </XMWrap>
                    </XMDual>
                  </XMApp>
                  <XMApp>
                    <XMTok meaning="plus" role="ADDOP">+</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">D</XMTok>
                        <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                      </XMApp>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="bold" role="UNKNOWN">W</XMTok>
                        <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
                      </XMApp>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                      <XMTok font="bold" role="UNKNOWN">b</XMTok>
                      <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
                    </XMApp>
                  </XMApp>
                </XMApp>
              </XMath>
            </Math>
<break/><text font="bold">end for</text></p>
        </para>
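        <para class="ltx_noindent">
          <p>The classifier head steps above can be sketched in NumPy as follows. This is an illustrative forward pass under stated assumptions, not the authors' code: the layer normalisation has no learned gain or bias, the epsilon of 1e-5 and the random weight initialisation are placeholders, and dropout is only active when a random generator is passed in. The whole batch is processed at once rather than in a per-sample loop, which gives the same result because every step acts on each row independently.</p>
          <verbatim>
```python
import numpy as np


def layer_norm(z, eps=1e-5):
    # Normalise each row to zero mean and unit variance (no learned gain or bias, by assumption).
    mean = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)


def dropout(a, p, rng=None):
    # Inverted dropout; with rng=None (inference) this is the identity.
    if rng is None:
        return a
    keep = rng.binomial(1, 1.0 - p, size=a.shape)
    return a * keep / (1.0 - p)


def classifier_head(x, params, rng=None):
    # FC1, LN1, ReLU1, Drop1, FC2, LN2, ReLU2, Drop2, FC3 in the order of the pseudo-algorithm.
    w1, b1, w2, b2, w3, b3 = params
    d1 = dropout(np.maximum(layer_norm(x @ w1 + b1), 0.0), 0.325, rng)
    d2 = dropout(np.maximum(layer_norm(d1 @ w2 + b2), 0.0), 0.275, rng)
    return d2 @ w3 + b3  # pre-softmax logits, shape (B, 7)


def init_params(seed=0):
    # Placeholder initialisation (assumption): small Gaussian weights, zero biases.
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in [(2176, 512), (512, 256), (256, 7)]:
        params.append(rng.normal(0.0, 0.02, size=(fan_in, fan_out)))
        params.append(np.zeros(fan_out))
    return params


x = np.random.default_rng(1).normal(size=(4, 2176))  # stand-in fused features
logits = classifier_head(x, init_params())
print(logits.shape)  # (4, 7)
```
          </verbatim>
        </para>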
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S5.SSx3.SSS4">
        <tags>
          <tag>V-C4</tag>
          <tag role="refnum">V-C4</tag>
          <tag role="typerefnum">§V-C4</tag>
        </tags>
        <title><tag close=" ">V-C4</tag>Rationale for Hybridization</title>
        <para xml:id="S5.SSx3.SSS4.p1">
          <p>While CNNs excel at modeling texture and color, they capture geometric expressiveness poorly, especially for ambiguous or flattened affect. GCNs, while geometrically robust, miss texture semantics. Fusion-N combines both modalities, with the aim of improving generalizability and interpretability in real-world ASD settings.</p>
        </para>
        <table inlist="lot" placement="htbp" xml:id="S5.T3">
          <tags>
            <tag>TABLE III</tag>
            <tag role="refnum">III</tag>
            <tag role="typerefnum">TABLE III</tag>
          </tags>
          <toccaption class="ltx_centering"><tag close=" ">III</tag>Fusion-N Architecture Comparison</toccaption>
          <caption class="ltx_centering"><tag close=": ">TABLE III</tag>Fusion-N Architecture Comparison</caption>
          <tabular class="ltx_centering ltx_guessed_headers" rowsep="4.0pt" vattach="middle">
            <thead>
              <tr>
                <td align="justify" border="l r t" thead="column" width="56.9pt"><text class="ltx_wrap" font="bold">Characteristic</text></td>
                <td align="justify" border="r t" thead="column" width="62.6pt"><text class="ltx_wrap" font="bold">CNN</text></td>
                <td align="justify" border="r t" thead="column" width="62.6pt"><text class="ltx_wrap" font="bold">GCN</text></td>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td align="justify" border="l r t" width="56.9pt">Input</td>
                <td align="justify" border="r t" width="62.6pt">RGB facial images</td>
                <td align="justify" border="r t" width="62.6pt">Facial landmarks as a graph</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" width="56.9pt">Backbone</td>
                <td align="justify" border="r t" width="62.6pt">Pre-trained ResNet-50</td>
                <td align="justify" border="r t" width="62.6pt">3-layer Graph Convolutional Network</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" width="56.9pt">Feature Representation</td>
                <td align="justify" border="r t" width="62.6pt">Deep feature representation (<Math mode="inline" tex="F_{\text{CNN}}" text="F _ [CNN]" xml:id="S5.T3.m1">
                    <XMath>
                      <XMApp>
                        <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">F</XMTok>
                        <XMText><text fontsize="70%">CNN</text></XMText>
                      </XMApp>
                    </XMath>
                  </Math>)</td>
                <td align="justify" border="r t" width="62.6pt">Graph representation (<Math mode="inline" tex="H^{(3)}" text="H ^ 3" xml:id="S5.T3.m2">
                    <XMath>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok font="italic" role="UNKNOWN">H</XMTok>
                        <XMDual>
                          <XMRef idref="S5.T3.m2.1"/>
                          <XMWrap>
                            <XMTok fontsize="70%" role="OPEN" stretchy="false">(</XMTok>
                            <XMTok fontsize="70%" meaning="3" role="NUMBER" xml:id="S5.T3.m2.1">3</XMTok>
                            <XMTok fontsize="70%" role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                    </XMath>
                  </Math>)</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" width="56.9pt">Attention Module</td>
                <td align="justify" border="r t" width="62.6pt">Channel-wise attention</td>
                <td align="justify" border="r t" width="62.6pt">Attention after mean-pooling</td>
              </tr>
              <tr>
                <td align="justify" border="b l r t" width="56.9pt">Output Dimension</td>
                <td align="justify" border="b r t" width="62.6pt"><Math mode="inline" tex="F_{\text{CNN\_attn}}\in\mathbb{R}^{2048}" text="F _ [CNN_attn] element-of R ^ 2048" xml:id="S5.T3.m3">
                    <XMath>
                      <XMApp>
                        <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" role="UNKNOWN">F</XMTok>
                          <XMText><text fontsize="70%">CNN_attn</text></XMText>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                          <XMTok fontsize="70%" meaning="2048" role="NUMBER">2048</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMath>
                  </Math></td>
                <td align="justify" border="b r t" width="62.6pt"><Math mode="inline" tex="F_{\text{GCN}}\in\mathbb{R}^{128}" text="F _ [GCN] element-of R ^ 128" xml:id="S5.T3.m4">
                    <XMath>
                      <XMApp>
                        <XMTok meaning="element-of" name="in" role="RELOP">∈</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="italic" role="UNKNOWN">F</XMTok>
                          <XMText><text fontsize="70%">GCN</text></XMText>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                          <XMTok font="blackboard" role="UNKNOWN">R</XMTok>
                          <XMTok fontsize="70%" meaning="128" role="NUMBER">128</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMath>
                  </Math></td>
              </tr>
            </tbody>
          </tabular>
        </table>
      </subsubsection>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S6">
    <tags>
      <tag>VI</tag>
      <tag role="refnum">VI</tag>
      <tag role="typerefnum">§VI</tag>
    </tags>
    <title><tag close=" ">VI</tag><text font="smallcaps">Results</text></title>
    <subsection inlist="toc" xml:id="S6.SS1">
      <tags>
        <tag>VI-A</tag>
        <tag role="refnum">VI-A</tag>
        <tag role="typerefnum">§VI-A</tag>
      </tags>
      <title><tag close=" ">VI-A</tag><text font="italic">Performance Comparison with Prior Work</text></title>
      <subsubsection inlist="toc" xml:id="S6.SS1.SSS1">
        <tags>
          <tag>VI-A1</tag>
          <tag role="refnum">VI-A1</tag>
          <tag role="typerefnum">§VI-A1</tag>
        </tags>
        <title><tag close=" ">VI-A1</tag>Soft Label Generation via Ensemble Prediction</title>
        <para xml:id="S6.SS1.SSS1.p1">
          <p>To validate our ensemble-based emotion labeling framework in an ASD context, we used an external dataset of autistic children curated by Dr. Fatma M. Talaat <cite class="ltx_citemacro_cite">[<bibref bibrefs="talaat2023dataset" separator="," yyseparator=","/>]</cite>. A representative subset of 100 images was selected to balance the emotion classes, match the age range of our cohort, and maximize ethnic diversity, reflecting the cross-cultural variance emphasized in <cite class="ltx_citemacro_cite">[<bibref bibrefs="rhue2021racial,fan2023addressing" separator="," yyseparator=","/>]</cite>.</p>
        </para>
        <para xml:id="S6.SS1.SSS1.p2">
          <p>Each image was annotated by a licensed clinical psychologist. Images that were difficult to label reliably were removed to avoid ambiguity, leaving 61 images for analysis; their expert labels were compared against predictions from our ensemble fusion pipeline, which integrates multiple pre-trained models. The approach achieved 90.16% accuracy relative to expert labels, demonstrating high reliability and reducing the annotation burden typical in ASD datasets.<break/>Compared to DeepFace alone (67.07%), FER with Mini-Xception (71.95%), and their average-fused variant (73.17%), our weighted ensemble achieved higher accuracy (Table IV), reinforcing its robustness and suitability for real-world clinical deployment.</p>
        </para>
        <table inlist="lot" labels="LABEL:tab:accuracy_summary" placement="h!" xml:id="S6.T4">
          <tags>
            <tag>TABLE IV</tag>
            <tag role="refnum">IV</tag>
            <tag role="typerefnum">TABLE IV</tag>
          </tags>
          <toccaption class="ltx_centering"><tag close=" ">IV</tag>Accuracy comparison of individual models and ensemble methods.</toccaption>
          <caption class="ltx_centering"><tag close=": ">TABLE IV</tag>Accuracy comparison of individual models and ensemble methods.</caption>
          <tabular class="ltx_centering ltx_guessed_headers" rowsep="3.0pt" vattach="middle">
            <thead>
              <tr>
                <td align="justify" border="l r t" thead="column row" width="170.7pt"><text class="ltx_wrap" font="bold">Model</text></td>
                <td align="center" border="r t" thead="column"><text font="bold">Accuracy (%)</text></td>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">DeepFace only</td>
                <td align="center" border="r t">67.07</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">Mini-Xception (FER)</td>
                <td align="center" border="r t">71.95</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">Average Fusion (DF + FER)</td>
                <td align="center" border="r t">73.17</td>
              </tr>
              <tr>
                <td align="justify" border="b l r t" thead="row" width="170.7pt"><text class="ltx_wrap" font="bold">Ensemble Method (Weighted Average)</text></td>
                <td align="center" border="b r t"><text font="bold">90.16</text></td>
              </tr>
            </tbody>
          </tabular>
        </table>
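        <para>
          <p>As an illustration of the weighted-average fusion compared in Table IV, the following minimal Python sketch combines two seven-class probability vectors into one soft label. The function names, the class ordering, the example probabilities, and the weights (0.4 and 0.6) are assumptions made purely for illustration; they are not the values used in our pipeline.</p>
        </para>
        <para>
          <verbatim font="typewriter">
```python
# Illustrative class order (an assumption, not taken from either library).
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def fuse_soft_labels(p_deepface, p_fer, w_deepface=0.5, w_fer=0.5):
    """Weighted average of two probability vectors, renormalised to sum to 1."""
    if len(p_deepface) != len(EMOTIONS) or len(p_fer) != len(EMOTIONS):
        raise ValueError("each input must hold one probability per emotion class")
    fused = [w_deepface * a + w_fer * b for a, b in zip(p_deepface, p_fer)]
    total = sum(fused)
    if total == 0:
        raise ValueError("fused scores sum to zero")
    return [x / total for x in fused]


def hard_label(soft):
    """Return the emotion with the highest fused probability."""
    return EMOTIONS[max(range(len(soft)), key=soft.__getitem__)]


# Hypothetical per-frame outputs of the two pre-trained models.
p_deepface = [0.05, 0.00, 0.05, 0.60, 0.10, 0.10, 0.10]
p_fer = [0.10, 0.00, 0.00, 0.30, 0.10, 0.10, 0.40]

# Hypothetical weights; equal weights would give plain average fusion.
fused = fuse_soft_labels(p_deepface, p_fer, w_deepface=0.4, w_fer=0.6)
print(hard_label(fused))
```
          </verbatim>
        </para>
        <para>
          <p>With equal weights the same function reduces to the average-fusion baseline, so the two ensemble rows of Table IV differ only in the choice of weights under this sketch.</p>
        </para>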
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S6.SS1.SSS2">
        <tags>
          <tag>VI-A2</tag>
          <tag role="refnum">VI-A2</tag>
          <tag role="typerefnum">§VI-A2</tag>
        </tags>
        <title><tag close=" ">VI-A2</tag>Hybrid Model Training and Optimization</title>
        <para xml:id="S6.SS1.SSS2.p1">
          <p>Several prior works have explored emotion recognition models tailored for autistic children. Alhakbani <cite class="ltx_citemacro_cite">[<bibref bibrefs="alhakbani2024" separator="," yyseparator=","/>]</cite> developed a CNN trained on ASD facial images across five emotion classes, achieving 75% accuracy, reflecting the challenges of affect recognition in this population. Smitha and Vinod <cite class="ltx_citemacro_cite">[<bibref bibrefs="smitha2015" separator="," yyseparator=","/>]</cite> proposed a PCA-based system deployed on FPGA; though it reached 94.1% on JAFFE, performance dropped to 82.3% on real-world ASD data, underscoring domain-specific limitations. Wang et al. <cite class="ltx_citemacro_cite">[<bibref bibrefs="wang2025" separator="," yyseparator=","/>]</cite> introduced a multimodal CVT architecture combining facial and speech inputs, where the facial-only branch achieved 79.12% and the fused model reached 90%, highlighting the benefits of cross-modal integration.</p>
        </para>
        <para xml:id="S6.SS1.SSS2.p2">
          <p>These unimodal facial expression systems (75%, 82.3%, 79.12%) offer directly comparable baselines for evaluating our model, as summarized in Table V. In contrast, our architecture, built on ResNet-50 and GCN backbones, was trained exclusively on an in-house ASD-specific dataset and achieved 96.2% accuracy. This improvement demonstrates the advantage of residual feature fusion for capturing subtle affective cues often missed by traditional CNNs or hand-crafted methods.</p>
        </para>
        <table inlist="lot" labels="LABEL:tab:asd_models" placement="h!" xml:id="S6.T5">
          <tags>
            <tag>TABLE V</tag>
            <tag role="refnum">V</tag>
            <tag role="typerefnum">TABLE V</tag>
          </tags>
          <toccaption class="ltx_centering"><tag close=" ">V</tag>Comparison of unimodal facial-expression models evaluated on ASD datasets and their limitations.</toccaption>
          <caption class="ltx_centering"><tag close=": ">TABLE V</tag>Comparison of unimodal facial-expression models evaluated on ASD datasets and their limitations.</caption>
          <tabular class="ltx_centering ltx_guessed_headers" rowsep="3.0pt" vattach="middle">
            <thead>
              <tr>
                <td align="justify" border="l r t" thead="column" width="79.7pt"><text class="ltx_wrap" font="bold">Study</text></td>
                <td align="justify" border="r t" thead="column" width="39.8pt"><text class="ltx_wrap" font="bold">Accuracy (%)</text></td>
                <td align="justify" border="r t" thead="column" width="88.2pt"><text class="ltx_wrap" font="bold">Limitations</text></td>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td align="justify" border="l r t" width="79.7pt">Alhakbani (2024) <cite class="ltx_citemacro_cite">[<bibref bibrefs="alhakbani2024" separator="," yyseparator=","/>]</cite></td>
                <td align="justify" border="r t" width="39.8pt"><Math mode="inline" tex="\sim" text="similar-to" xml:id="S6.T5.m1">
                    <XMath>
                      <XMTok meaning="similar-to" name="sim" role="RELOP">∼</XMTok>
                    </XMath>
                  </Math>75.0</td>
                <td align="justify" border="r t" width="88.2pt">Small and demographically narrow dataset with limited generalization.</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" width="79.7pt">Smitha &amp; Vinod (2015) <cite class="ltx_citemacro_cite">[<bibref bibrefs="smitha2015" separator="," yyseparator=","/>]</cite></td>
                <td align="justify" border="r t" width="39.8pt">82.3</td>
                <td align="justify" border="r t" width="88.2pt">Low-resolution PCA features that lack geometric cues and real-time support.</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" width="79.7pt">Wang et al. (2025) <cite class="ltx_citemacro_cite">[<bibref bibrefs="wang2025" separator="," yyseparator=","/>]</cite></td>
                <td align="justify" border="r t" width="39.8pt">79.1</td>
                <td align="justify" border="r t" width="88.2pt">Confusion between similar emotions; no temporal modeling or ablation.</td>
              </tr>
              <tr>
                <td align="justify" border="b l r t" width="79.7pt">Our Model (2025)</td>
                <td align="justify" border="b r t" width="39.8pt">96.2</td>
                <td align="justify" border="b r t" width="88.2pt">Not real-time; possible latency in live deployment.</td>
              </tr>
            </tbody>
          </tabular>
        </table>
      </subsubsection>
    </subsection>
    <subsection inlist="toc" xml:id="S6.SS2">
      <tags>
        <tag>VI-B</tag>
        <tag role="refnum">VI-B</tag>
        <tag role="typerefnum">§VI-B</tag>
      </tags>
      <title><tag close=" ">VI-B</tag><text font="italic">Experimental results</text></title>
      <subsubsection inlist="toc" xml:id="S6.SS2.SSS1">
        <tags>
          <tag>VI-B1</tag>
          <tag role="refnum">VI-B1</tag>
          <tag role="typerefnum">§VI-B1</tag>
        </tags>
        <title><tag close=" ">VI-B1</tag>Face pre-processing outcomes</title>
        <para xml:id="S6.SS2.SSS1.p1">
          <p>Our preprocessing component analyzed 48,891 frames from NAO-mediated child–robot interaction videos, recorded in a naturalistic, unconstrained environment without head fixation or behavioral restrictions. Of these, 1,600 were discarded due to blurriness and 20,170 because no face was detected. Our two-stage pipeline yielded 19,322 valid face crops, corresponding to a 39.5% face detection success rate. The comparatively low yield is consistent with the free-play setup, in which the NAO robot called the child’s name 12 times across sessions involving toys and spontaneous movement. The total preprocessing duration was 40,453.52 seconds (<Math mode="inline" tex="\approx" text="approximately-equals" xml:id="S6.SS2.SSS1.p1.m1">
              <XMath>
                <XMTok meaning="approximately-equals" name="approx" role="RELOP">≈</XMTok>
              </XMath>
            </Math> 11.2 hours). A summary of these statistics is provided in Table VI.</p>
        </para>
        <table inlist="lot" labels="LABEL:tab:face_preprocessing" placement="h!" xml:id="S6.T6">
          <tags>
            <tag>TABLE VI</tag>
            <tag role="refnum">VI</tag>
            <tag role="typerefnum">TABLE VI</tag>
          </tags>
          <toccaption class="ltx_centering"><tag close=" ">VI</tag>Summary of face preprocessing statistics</toccaption>
          <caption class="ltx_centering"><tag close=": ">TABLE VI</tag>Summary of face preprocessing statistics</caption>
          <tabular class="ltx_centering ltx_guessed_headers" rowsep="3.0pt" vattach="middle">
            <thead>
              <tr>
                <td align="justify" border="l r t" thead="column row" width="170.7pt"><text class="ltx_wrap" font="bold">Metric</text></td>
                <td align="center" border="r t" thead="column"><text font="bold">Value</text></td>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">Total images found</td>
                <td align="center" border="r t">48,891</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">Valid images</td>
                <td align="center" border="r t">48,886</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">Blurry images skipped</td>
                <td align="center" border="r t">1,600</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">Images with no faces</td>
                <td align="center" border="r t">20,170</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">Total faces extracted</td>
                <td align="center" border="r t">19,322</td>
              </tr>
              <tr>
                <td align="justify" border="l r t" thead="row" width="170.7pt">Success rate</td>
                <td align="center" border="r t">39.5%</td>
              </tr>
              <tr>
                <td align="justify" border="b l r t" thead="row" width="170.7pt">Processing time (seconds)</td>
                <td align="center" border="b r t">40,453.52</td>
              </tr>
            </tbody>
          </tabular>
        </table>
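        <para>
          <p>The derived figures in Table VI can be reproduced from the raw counts. The short Python sketch below assumes that the success rate is the number of extracted faces divided by the total number of frames; the variable names are illustrative, and the per-frame time is a derived quantity not reported in the table.</p>
        </para>
        <para>
          <verbatim font="typewriter">
```python
# Raw values copied from Table VI.
total_frames = 48_891
faces_extracted = 19_322
processing_seconds = 40_453.52

# Assumed definition: extracted faces as a share of all frames.
success_rate = 100 * faces_extracted / total_frames
processing_hours = processing_seconds / 3600
seconds_per_frame = processing_seconds / total_frames

print(f"success rate: {success_rate:.1f}%")          # 39.5%
print(f"processing time: {processing_hours:.1f} h")  # 11.2 h
print(f"mean time per frame: {seconds_per_frame:.2f} s")
```
          </verbatim>
        </para>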
      </subsubsection>
      <subsubsection inlist="toc" xml:id="S6.SS2.SSS2">
        <tags>
          <tag>VI-B2</tag>
          <tag role="refnum">VI-B2</tag>
          <tag role="typerefnum">§VI-B2</tag>
        </tags>
        <title><tag close=" ">VI-B2</tag>Emotion distribution throughout the experiment</title>
        <para xml:id="S6.SS2.SSS2.p1">
          <p>Each child participated in a 200-second interaction session, with video recorded at 15 frames per second, yielding a high number of frames per participant. These were processed through our facial landmark extraction and hybrid deep learning classification pipeline.</p>
        </para>
        <para xml:id="S6.SS2.SSS2.p2">
          <p>Fig. 8 presents the distribution of emotion labels obtained via our weighted ensemble method. Most frames were classified as <text font="italic">neutral</text> (8,969) and <text font="italic">happy</text> (5,309), suggesting a predominance of non-negative affective states during the interaction. Moderate representation was observed for <text font="italic">angry</text> (1,822), <text font="italic">surprise</text> (1,605), and <text font="italic">sad</text> (1,386), while <text font="italic">disgust</text> (152) and <text font="italic">fear</text> (79) were rare, likely due to the controlled experimental setting.</p>
        </para>
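        <para>
          <p>As a consistency check on the counts reported above, the following Python sketch sums the per-class frame counts and converts them to percentage shares. Only the counts come from the text; the variable names are illustrative.</p>
        </para>
        <para>
          <verbatim font="typewriter">
```python
# Per-class frame counts as reported for the weighted ensemble labels.
counts = {
    "neutral": 8969,
    "happy": 5309,
    "angry": 1822,
    "surprise": 1605,
    "sad": 1386,
    "disgust": 152,
    "fear": 79,
}

total = sum(counts.values())
shares = {emotion: 100 * n / total for emotion, n in counts.items()}

print(total)  # 19322, the number of extracted face crops
for emotion, share in shares.items():
    print(f"{emotion:9s} {share:5.1f}%")
```
          </verbatim>
        </para>
        <para>
          <p>The counts sum to the 19,322 extracted face crops, and <text font="italic">neutral</text> and <text font="italic">happy</text> together account for roughly three quarters of all labeled frames.</p>
        </para>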
        <figure inlist="lof" labels="LABEL:Figure_5" placement="H" xml:id="S6.F8">
          <tags>
            <tag>Fig. 8</tag>
            <tag role="refnum">8</tag>
            <tag role="typerefnum">Fig. 8</tag>
          </tags>
          <graphics candidates="emotion19k.pdf" class="ltx_centering" graphic="emotion19k.pdf" options="width=252.945pt" xml:id="S6.F8.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">8</tag>Bar-chart representation of emotion distribution.</toccaption>
          <caption class="ltx_centering"><tag close=": ">Fig. 8</tag>Bar-chart representation of emotion distribution.</caption>
        </figure>
      </subsubsection>
    </subsection>
    <subsection inlist="toc" xml:id="S6.SS3">
      <tags>
        <tag>VI-C</tag>
        <tag role="refnum">VI-C</tag>
        <tag role="typerefnum">§VI-C</tag>
      </tags>
      <title><tag close=" ">VI-C</tag><text font="italic">Prediction Analysis</text></title>
      <para xml:id="S6.SS3.p1">
        <p>To quantitatively assess our ensemble-based emotion recognition system on the responses of children with ASD, a multi-layered visual and statistical analysis was conducted across seven emotion categories: <text font="italic">happy</text>, <text font="italic">sad</text>, <text font="italic">angry</text>, <text font="italic">fear</text>, <text font="italic">disgust</text>, <text font="italic">surprise</text>, and <text font="italic">neutral</text>. Emotion-wise softmax scores of the Fusion-N model were examined for prediction confidence, distribution shape, and class separability. From <text font="typewriter">emotion_descriptive_stats.csv</text>, mean confidence values suggested <text font="italic">happy</text> (<Math mode="inline" tex="M=0.1459" text="M = 0.1459" xml:id="S6.SS3.p1.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMTok font="italic" role="UNKNOWN">M</XMTok>
                <XMTok meaning="0.1459" role="NUMBER">0.1459</XMTok>
              </XMApp>
            </XMath>
          </Math>), <text font="italic">sad</text> (<Math mode="inline" tex="M=0.1443" text="M = 0.1443" xml:id="S6.SS3.p1.m2">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMTok font="italic" role="UNKNOWN">M</XMTok>
                <XMTok meaning="0.1443" role="NUMBER">0.1443</XMTok>
              </XMApp>
            </XMath>
          </Math>), and <text font="italic">surprise</text> (<Math mode="inline" tex="M=0.1434" text="M = 0.1434" xml:id="S6.SS3.p1.m3">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMTok font="italic" role="UNKNOWN">M</XMTok>
                <XMTok meaning="0.1434" role="NUMBER">0.1434</XMTok>
              </XMApp>
            </XMath>
          </Math>) to be the highest, with <text font="italic">neutral</text> the lowest (<Math mode="inline" tex="M=0.1386" text="M = 0.1386" xml:id="S6.SS3.p1.m4">
            <XMath>
              <XMApp>
                <XMTok meaning="equals" role="RELOP">=</XMTok>
                <XMTok font="italic" role="UNKNOWN">M</XMTok>
                <XMTok meaning="0.1386" role="NUMBER">0.1386</XMTok>
              </XMApp>
            </XMath>
          </Math>). Low model uncertainty is indicated by narrow standard deviations for all classes (<Math mode="inline" tex="\sigma\approx 0.001\text{--}0.003" text="sigma approximately-equals 0.001 * [–] * 0.003" xml:id="S6.SS3.p1.m5">
            <XMath>
              <XMApp>
                <XMTok meaning="approximately-equals" name="approx" role="RELOP">≈</XMTok>
                <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok meaning="0.001" role="NUMBER">0.001</XMTok>
                  <XMText>–</XMText>
                  <XMTok meaning="0.003" role="NUMBER">0.003</XMTok>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>).</p>
      </para>
      <figure inlist="lof" labels="LABEL:fig:kde" placement="ht" xml:id="S6.F9">
        <tags>
          <tag>Fig. 9</tag>
          <tag role="refnum">9</tag>
          <tag role="typerefnum">Fig. 9</tag>
        </tags>
        <graphics candidates="KDE.pdf" class="ltx_centering" graphic="KDE.pdf" options="width=195.129pt" xml:id="S6.F9.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">9</tag>Smoothed KDE Curves for Emotion Scores.</toccaption>
        <caption class="ltx_centering"><tag close=": ">Fig. 9</tag>Smoothed KDE Curves for Emotion Scores.</caption>
      </figure>
      <para xml:id="S6.SS3.p2">
        <p>The boxplot (Fig. <ref labelref="LABEL:fig:box"/>) indicated a higher median and a wider outlier spread for <text font="italic">happy</text>, with most scores concentrated in <Math mode="inline" tex="[0.145,0.155]" text="closed-interval@(0.145, 0.155)" xml:id="S6.SS3.p2.m1">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="closed-interval"/>
                  <XMRef idref="S6.SS3.p2.m1.1"/>
                  <XMRef idref="S6.SS3.p2.m1.2"/>
                </XMApp>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">[</XMTok>
                  <XMTok meaning="0.145" role="NUMBER" xml:id="S6.SS3.p2.m1.1">0.145</XMTok>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMTok meaning="0.155" role="NUMBER" xml:id="S6.SS3.p2.m1.2">0.155</XMTok>
                  <XMTok role="CLOSE" stretchy="false">]</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>, while <text font="italic">neutral</text> was confined to <Math mode="inline" tex="[0.138,0.140]" text="closed-interval@(0.138, 0.140)" xml:id="S6.SS3.p2.m2">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="closed-interval"/>
                  <XMRef idref="S6.SS3.p2.m2.1"/>
                  <XMRef idref="S6.SS3.p2.m2.2"/>
                </XMApp>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">[</XMTok>
                  <XMTok meaning="0.138" role="NUMBER" xml:id="S6.SS3.p2.m2.1">0.138</XMTok>
                  <XMTok role="PUNCT">,</XMTok>
                  <XMTok meaning="0.140" role="NUMBER" xml:id="S6.SS3.p2.m2.2">0.140</XMTok>
                  <XMTok role="CLOSE" stretchy="false">]</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>. KDE smoothing (Fig. 9) indicated a right-skewed peak for <text font="italic">happy</text> (<Math mode="inline" tex="\approx 0.148" text="absent approximately-equals 0.148" xml:id="S6.SS3.p2.m3">
            <XMath>
              <XMApp>
                <XMTok meaning="approximately-equals" name="approx" role="RELOP">≈</XMTok>
                <XMTok meaning="absent"/>
                <XMTok meaning="0.148" role="NUMBER">0.148</XMTok>
              </XMApp>
            </XMath>
          </Math>), while overlapping distributions for <text font="italic">sad</text>, <text font="italic">fear</text>, and <text font="italic">angry</text> reflect difficulties in distinguishing among these emotions due to their subtle expressivity in ASD.</p>
      </para>
      <para xml:id="S6.SS3.p3">
        <p>Additionally, to examine the overall emotional tendencies of the autistic children, we grouped the emotions observed during the name-calling event into two categories: positive (happy, surprise) and negative (sad, angry, disgust, fear). Fig. 11 (pie chart) shows that the majority of children, i.e., 73.3% (11 out of 15), exhibited predominantly positive emotions, while the remaining 26.7% (4 out of 15) were dominated by negative emotions. This observation aligns with prior work showing that robot-based interactive interventions can foster engagement and elicit positive responses in children with ASD <cite class="ltx_citemacro_cite">[<bibref bibrefs="Alarcon2021" separator="," yyseparator=","/>]</cite>.</p>
      </para>
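      <para>
        <p>The per-child grouping described above can be sketched in Python as follows. The positive and negative sets follow the caption of Fig. 11; excluding <text font="italic">neutral</text> frames, breaking ties toward the positive group, the function name, and the example counts are all assumptions made for illustration and do not describe any participant.</p>
      </para>
      <para>
        <verbatim font="typewriter">
```python
POSITIVE = {"happy", "surprise"}
NEGATIVE = {"sad", "angry", "disgust", "fear"}


def dominant_valence(frame_counts):
    """Label one child as 'positive' or 'negative' from per-emotion frame counts.

    Neutral frames are ignored and ties go to 'positive' (both are assumptions).
    """
    positive = sum(frame_counts.get(e, 0) for e in POSITIVE)
    negative = sum(frame_counts.get(e, 0) for e in NEGATIVE)
    return "positive" if positive >= negative else "negative"


# Made-up frame counts for a single hypothetical child.
example_child = {"happy": 420, "surprise": 95, "sad": 130, "angry": 88, "neutral": 610}
print(dominant_valence(example_child))  # positive (515 vs. 218 frames)

# Share of children in each group as reported: 11 and 4 out of 15.
share_positive = 100 * 11 / 15
share_negative = 100 * 4 / 15
print(f"{share_positive:.1f}% positive, {share_negative:.1f}% negative")
```
        </verbatim>
      </para>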
      <figure inlist="lof" labels="LABEL:fig:box" placement="H" xml:id="S6.F10">
        <tags>
          <tag>Fig. 10</tag>
          <tag role="refnum">10</tag>
          <tag role="typerefnum">Fig. 10</tag>
        </tags>
        <graphics candidates="box_whisker.pdf" class="ltx_centering" graphic="box_whisker.pdf" options="width=195.129pt" xml:id="S6.F10.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">10</tag>Box-Whisker Plot for Emotion Confidence Scores.</toccaption>
        <caption class="ltx_centering"><tag close=": ">Fig. 10</tag>Box-Whisker Plot for Emotion Confidence Scores.</caption>
      </figure>
      <figure inlist="lof" labels="LABEL:fig:kde" placement="ht" xml:id="S6.F11">
        <tags>
          <tag>Fig. 11</tag>
          <tag role="refnum">11</tag>
          <tag role="typerefnum">Fig. 11</tag>
        </tags>
        <graphics candidates="dominant_emotion_pie.pdf" class="ltx_centering" graphic="dominant_emotion_pie.pdf" options="width=160.4394pt" xml:id="S6.F11.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">11</tag>Pie chart showing the distribution of positive vs. negative emotions during the name-calling event. The teal shade represents positive emotions (happy, surprise) and the coral shade represents negative emotions (sad, angry, disgust, fear).</toccaption>
        <caption class="ltx_centering"><tag close=": ">Fig. 11</tag>Pie chart showing the distribution of positive vs. negative emotions during the name-calling event. The teal shade represents positive emotions (happy, surprise) and the coral shade represents negative emotions (sad, angry, disgust, fear).</caption>
      </figure>
    </subsection>
    <subsection inlist="toc" xml:id="S6.SS4">
      <tags>
        <tag>VI-D</tag>
        <tag role="refnum">VI-D</tag>
        <tag role="typerefnum">§VI-D</tag>
      </tags>
      <title><tag close=" ">VI-D</tag><text font="italic">Statistical Significance Testing</text></title>
      <para xml:id="S6.SS4.p1">
        <p>One-way ANOVA and Kruskal–Wallis tests across the seven emotion classes confirmed significant variation in model confidence:</p>
        <itemize xml:id="S6.I1">
          <item xml:id="S6.I1.i1">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">1st item</tag>
            </tags>
            <para xml:id="S6.I1.i1.p1">
              <p><text font="bold">ANOVA:</text> <Math mode="inline" tex="F(6,N)=202.00" text="F * open-interval@(6, N) = 202.00" xml:id="S6.I1.i1.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" role="UNKNOWN">F</XMTok>
                        <XMDual>
                          <XMApp>
                            <XMTok meaning="open-interval"/>
                            <XMRef idref="S6.I1.i1.p1.m1.1"/>
                            <XMRef idref="S6.I1.i1.p1.m1.2"/>
                          </XMApp>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMTok meaning="6" role="NUMBER" xml:id="S6.I1.i1.p1.m1.1">6</XMTok>
                            <XMTok role="PUNCT">,</XMTok>
                            <XMTok font="italic" role="UNKNOWN" xml:id="S6.I1.i1.p1.m1.2">N</XMTok>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                      <XMTok meaning="202.00" role="NUMBER">202.00</XMTok>
                    </XMApp>
                  </XMath>
                </Math>, <Math mode="inline" tex="p&lt;1.0\times 10^{-180}" text="p less 1.0 * 10 ^ (- 180)" xml:id="S6.I1.i1.p1.m2">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                      <XMTok font="italic" role="UNKNOWN">p</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">×</XMTok>
                        <XMTok meaning="1.0" role="NUMBER">1.0</XMTok>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                          <XMTok meaning="10" role="NUMBER">10</XMTok>
                          <XMApp>
                            <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                            <XMTok fontsize="70%" meaning="180" role="NUMBER">180</XMTok>
                          </XMApp>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math></p>
            </para>
          </item>
          <item xml:id="S6.I1.i2">
            <tags>
              <tag>•</tag>
              <tag role="typerefnum">2nd item</tag>
            </tags>
            <para xml:id="S6.I1.i2.p1">
              <p><text font="bold">Kruskal–Wallis:</text> <Math mode="inline" tex="H(6)=692.18" text="H * 6 = 692.18" xml:id="S6.I1.i2.p1.m1">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="equals" role="RELOP">=</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" role="UNKNOWN">H</XMTok>
                        <XMDual>
                          <XMRef idref="S6.I1.i2.p1.m1.1"/>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMTok meaning="6" role="NUMBER" xml:id="S6.I1.i2.p1.m1.1">6</XMTok>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMApp>
                      <XMTok meaning="692.18" role="NUMBER">692.18</XMTok>
                    </XMApp>
                  </XMath>
                </Math>, <Math mode="inline" tex="p&lt;3.0\times 10^{-146}" text="p less 3.0 * 10 ^ (- 146)" xml:id="S6.I1.i2.p1.m2">
                  <XMath>
                    <XMApp>
                      <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                      <XMTok font="italic" role="UNKNOWN">p</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">×</XMTok>
                        <XMTok meaning="3.0" role="NUMBER">3.0</XMTok>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                          <XMTok meaning="10" role="NUMBER">10</XMTok>
                          <XMApp>
                            <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                            <XMTok fontsize="70%" meaning="146" role="NUMBER">146</XMTok>
                          </XMApp>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMath>
                </Math></p>
            </para>
          </item>
        </itemize>
      </para>
      <para xml:id="S6.SS4.p2">
        <p>Post-hoc Tukey HSD tests indicated that <text font="italic">neutral</text> was consistently separable, with significantly lower confidence than <text font="italic">happy</text>, <text font="italic">sad</text>, <text font="italic">angry</text>, and <text font="italic">disgust</text> (<Math mode="inline" tex="p&lt;0.001" text="p less 0.001" xml:id="S6.SS4.p2.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                <XMTok font="italic" role="UNKNOWN">p</XMTok>
                <XMTok meaning="0.001" role="NUMBER">0.001</XMTok>
              </XMApp>
            </XMath>
          </Math>). Both <text font="italic">happy</text> and <text font="italic">sad</text> achieved significantly higher confidence than <text font="italic">neutral</text> and <text font="italic">disgust</text>, demonstrating their salience in the ensemble’s predictions.</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S7">
    <tags>
      <tag>VII</tag>
      <tag role="refnum">VII</tag>
      <tag role="typerefnum">§VII</tag>
    </tags>
    <title><tag close=" ">VII</tag><text font="smallcaps">Conclusion</text></title>
    <subsection inlist="toc" xml:id="S7.SS1">
      <tags>
        <tag>VII-A</tag>
        <tag role="refnum">VII-A</tag>
        <tag role="typerefnum">§VII-A</tag>
      </tags>
      <title><tag close=" ">VII-A</tag><text font="italic">Ensemble-based labeling framework</text></title>
      <para xml:id="S7.SS1.p1">
        <p>The proposed framework integrates predictions from pre-trained models (DeepFace and FER) using a consensus strategy tailored to the expressive variability of autistic children. Given the inconsistent performance of off-the-shelf models on neurodiverse datasets, our ensemble was optimized to enhance robustness on ASD-specific facial data.</p>
      </para>
      <para xml:id="S7.SS1.p2">
        <p>To assess generalizability, we evaluated the ensemble on a publicly available ASD dataset <cite class="ltx_citemacro_cite">[<bibref bibrefs="talaat2023dataset" separator="," yyseparator=","/>]</cite>, annotated by a certified clinical psychologist. The model achieved 90.16% accuracy relative to expert labels (Table IV), demonstrating strong clinical concordance and adaptability to unseen data. Our results support ensemble learning as a scalable, clinically aligned alternative to manual annotation in resource-constrained settings.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S7.SS2">
      <tags>
        <tag>VII-B</tag>
        <tag role="refnum">VII-B</tag>
        <tag role="typerefnum">§VII-B</tag>
      </tags>
      <title><tag close=" ">VII-B</tag><text font="italic">Predictive hypothesis</text></title>
      <para xml:id="S7.SS2.p1">
        <p>We compared emotion predictions for 15 children with autism across seven basic emotions during human–robot interaction facilitated by the NAO robot. Descriptive statistics, visual distribution plots, and inferential statistical analyses were applied to characterize emotional expressivity and inter-individual variability.</p>
      </para>
      <para xml:id="S7.SS2.p2">
        <p>Mean and standard deviation values were calculated for each emotion per child. <text font="italic">Happy</text>, <text font="italic">sad</text> and <text font="italic">surprise</text> exhibited higher mean scores across most participants, whereas <text font="italic">neutral</text>, <text font="italic">disgust</text>, and <text font="italic">angry</text> remained at lower and relatively stable levels. Standard deviation patterns indicated greater variability in <text font="italic">happy</text>, <text font="italic">sad</text>, and <text font="italic">fear</text>, while <text font="italic">disgust</text> and <text font="italic">neutral</text> were more consistent.</p>
      </para>
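      <para>
        <p>The per-child descriptive statistics can be sketched as follows. This is a minimal illustration that assumes each frame is stored as a dictionary of emotion scores grouped by child; the data layout, the function name <text font="typewriter">summarize</text>, the sample values, and the choice of the population standard deviation are assumptions, not details taken from the study.</p>
        <verbatim font="typewriter">
```python
from statistics import mean, pstdev


def summarize(frames_by_child):
    """Return {child: {emotion: (mean, std)}} from per-frame score dicts.

    Uses the population standard deviation; the paper does not state
    which estimator was used, so this choice is an assumption.
    """
    summary = {}
    for child, frames in frames_by_child.items():
        if not frames:
            continue  # skip children with no usable frames
        emotions = sorted(frames[0])
        summary[child] = {
            e: (mean([f[e] for f in frames]), pstdev([f[e] for f in frames]))
            for e in emotions
        }
    return summary


# Hypothetical scores for two frames of one child (not real data).
example = {
    "Participant-1": [
        {"happy": 0.6, "sad": 0.2},
        {"happy": 0.8, "sad": 0.4},
    ]
}
stats = summarize(example)
happy_mean, happy_std = stats["Participant-1"]["happy"]
```
        </verbatim>
      </para>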
      <para xml:id="S7.SS2.p3">
        <p><text font="italic">Participant-8</text>, <text font="italic">Participant-9</text> and <text font="italic">Participant-10</text> demonstrated a higher prevalence of <text font="italic">happy</text> and <text font="italic">sad</text> predictions, consistent with the theory of emotional salience in autism spectrum disorder (ASD) <cite class="ltx_citemacro_cite">[<bibref bibrefs="Cassel2019" separator="," yyseparator=","/>]</cite>. The emotion <text font="italic">fear</text> was more dominant in some children, reinforcing prior findings that ASD individuals often exhibit elevated anxiety or hyperarousal in novel contexts such as robot interaction <cite class="ltx_citemacro_cite">[<bibref bibrefs="Costa2018" separator="," yyseparator=","/>]</cite>.</p>
      </para>
      <para xml:id="S7.SS2.p4">
        <p>The emotions <text font="italic">happy</text>, <text font="italic">sad</text> and <text font="italic">surprise</text> exhibited broader confidence intervals and denser distributions, suggesting richer expressivity. The box-and-whisker plots confirmed this with larger inter-quartile ranges. These emotions also showed several outliers, indicating transient emotional bursts, a known characteristic of affect dysregulation in ASD <cite class="ltx_citemacro_cite">[<bibref bibrefs="Macari2022" separator="," yyseparator=","/>]</cite>. This aligns with the known heterogeneity in affective displays among individuals on the autism spectrum, where emotional responses can range from subdued to highly exaggerated depending on context, sensory sensitivity, or individual traits.</p>
      </para>
      <subsubsection xml:id="S7.SS2.SSSx1">
        <title>Implications and Literature Alignment</title>
        <para xml:id="S7.SS2.SSSx1.p1">
          <p>Our results are consistent with psychological research on emotion expression in ASD, according to which children with developmental or emotional difficulties possess an innate bias toward positive expressions in interactive and observational situations. In our dataset, 73.3% of the children (11 of 15) exhibited positive emotional dominance, represented by happy and surprise. A notable minority (26.7%; participants 2, 5, 6, and 7), however, showed negative dominance, namely sad, disgust, and angry. This diversity highlights the importance of individualized, emotion-sensitive interventions, since children with predominantly negative affect may benefit from specialized support in affective learning environments. Furthermore, these findings support the viability of using robotic stimuli such as NAO to examine and perhaps augment autistic children’s emotional expressivity, and demonstrate the potential of emotion-aware robotics as a tool in affective computing and autism therapy.</p>
        </para>
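        <para>
          <p>The reported shares follow directly from the participant counts, as the short sketch below shows. The grouping of emotions into positive and negative sets follows the text above; the rule used in <text font="typewriter">dominance</text> (comparing summed mean scores, with ties counted as positive) and the sample profile are assumptions for illustration, since the exact criterion is not restated here.</p>
          <verbatim font="typewriter">
```python
POSITIVE = ("happy", "surprise")
NEGATIVE = ("sad", "disgust", "angry")


def dominance(mean_scores):
    """Label a child by comparing summed positive and negative mean scores.

    This comparison rule is an assumed criterion, used only for illustration.
    """
    pos = sum(mean_scores.get(e, 0.0) for e in POSITIVE)
    neg = sum(mean_scores.get(e, 0.0) for e in NEGATIVE)
    return "positive" if pos >= neg else "negative"


# Participants with negative dominance, as reported in the text.
negative_ids = {2, 5, 6, 7}
n_children = 15

labels = {
    pid: ("negative" if pid in negative_ids else "positive")
    for pid in range(1, n_children + 1)
}
n_positive = sum(1 for v in labels.values() if v == "positive")
positive_share = round(100 * n_positive / n_children, 1)
negative_share = round(100 * (n_children - n_positive) / n_children, 1)

# Hypothetical mean-score profile for one child (not real data).
sample_label = dominance({"happy": 0.4, "surprise": 0.2, "sad": 0.1})
```
          </verbatim>
        </para>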
      </subsubsection>
    </subsection>
    <subsection inlist="toc" xml:id="S7.SS3">
      <tags>
        <tag>VII-C</tag>
        <tag role="refnum">VII-C</tag>
        <tag role="typerefnum">§VII-C</tag>
      </tags>
      <title><tag close=" ">VII-C</tag><text font="italic">Future Scope and Discussions</text></title>
      <para xml:id="S7.SS3.p1">
        <p>While the current system performs reliably in offline conditions, its application in real-time scenarios remains a key area for enhancement. At present, NAO is used only as a facilitator; the primary limitation is the latency introduced by sequential modules, particularly face detection and preprocessing.</p>
      </para>
      <para xml:id="S7.SS3.p2">
        <p>Future efforts can focus on optimizing the pipeline for real-time deployment by prioritising low-latency, adaptive, and hardware-efficient implementations to extend real-world applicability.</p>
      </para>
      <para xml:id="S7.SS3.p3">
        <p>Adaptive learning with reference to personal emotional profiles can improve performance across various ASD settings by detecting nuanced differences in affective expressions. Tested and validated using a geographically representative dataset, our ResNet-50 + three-layer GCN architecture presents strong, generalizable capability for ASD emotion analysis in real-world scenarios.</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S8">
    <tags>
      <tag>VIII</tag>
      <tag role="refnum">VIII</tag>
      <tag role="typerefnum">§VIII</tag>
    </tags>
    <title><tag close=" ">VIII</tag><text font="smallcaps">Acknowledgement</text></title>
    <para xml:id="S8.p1">
      <p>The authors thank the Smart Materials, Structures and Systems Laboratory of the Department of Mechanical Engineering and the Psychology Laboratory of the Department of Humanities and Social Sciences, IIT Kanpur, for the infrastructural facilities and support extended in conducting this research work. Special thanks go to Mr. Rohit Kumar Tiwari, a specialised clinical psychologist (Rehabilitation Psychology) at the Pushpa Khanna Memorial Centre, for his guidance in behavioral assessment, labelling of our global dataset, and support for dataset annotation. We also appreciate the cooperation and provision of logistic support from the Amrita Rehabilitation Centre and Pushpa Khanna Memorial Centre, both situated in Kanpur, India. Our sincere appreciation extends to the parents for trusting us and to the children for their voluntary participation. We lastly acknowledge all personnel who assisted in the process of data collection at partner centers and in our laboratory.</p>
    </para>
  </section>
  <bibliography bibstyle="IEEEtran" citestyle="numbers" files="references" xml:id="bib">
    <title>References</title>
  </bibliography>
</document>
