<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/japhy/scienceReplication.artiswrong.com/paper_files/arxiv/1912.09913/latex_extracted"?>
<?latexml class="IEEEtran" options="journal"?>
<?latexml package="cite"?>
<?latexml package="graphicx" options="pdftex"?>
<?latexml graphicspath="/home/japhy/scienceReplication.artiswrong.com/paper_files/arxiv/1912.09913/latex_extracted/figures/}{../pdf/}{../jpeg"?>
<?latexml package="inputenc" options="utf8"?>
<?latexml package="babel" options="english"?>
<?latexml package="CJKutf8"?>
<?latexml package="amsmath"?>
<?latexml package="array"?>
<?latexml package="bm"?>
<?latexml package="booktabs"?>
<?latexml package="dblfloatfix"?>
<?latexml package="float"?>
<?latexml package="subcaption"?>
<?latexml package="tipa"?>
<?latexml package="xcolor"?>
<?latexml package="url"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML" class="ltx_authors_1line" xml:lang="en">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <title>Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin using Recursive Neural Networks</title>
  <creator role="author">
    <personname>Minh Nguyen,
Gia H. Ngo,
and Nancy F. Chen</personname>
    <contact role="thanks">
© 2019 IEEE.
Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.<break/>Minh Nguyen is currently with University of California - Davis, but part of the work was done at the Institute for Infocomm Research, A<sup>*</sup>STAR.<break/>Gia H. Ngo is currently with Cornell University, but part of this work was done at the Institute for Infocomm Research, A<sup>*</sup>STAR.<break/>Nancy F. Chen is currently with the Institute for Infocomm Research, A<sup>*</sup>STAR.<break/>Nancy F. Chen is the corresponding author (nancychen@alum.mit.edu).</contact>
  </creator>
  <abstract name="Abstract">
    <p>Logographs (Chinese characters) have recursive structures (i.e., hierarchies of sub-units) that contain phonological and semantic information; developmental psychology literature suggests that native speakers leverage these structures when learning to read.
Exploiting these structures could potentially lead to better embeddings that can benefit many downstream tasks.
We propose building hierarchical logograph (character) embeddings from logograph recursive structures using treeLSTM, a recursive neural network.
Using a recursive neural network imposes a prior on the mapping from logographs to embeddings, since the network must read in the sub-units of a logograph in the order specified by its recursive structure.
Based on human behavior in language learning and reading, we hypothesize that modeling logographs’ structures using a recursive neural network should be beneficial.
To verify this claim, we consider two tasks: (1) predicting logographs’ Cantonese pronunciation from logographic structures and (2) language modeling.
Empirical results show that the proposed hierarchical embeddings outperform baseline approaches.
Diagnostic analysis suggests that hierarchical embeddings constructed using treeLSTM are less sensitive to distractors and are thus more robust, especially on complex logographs.</p>
  </abstract>
  <keywords>
recursive structure, morphology, logograph, embeddings, neural networks.
</keywords>
  <section inlist="toc" labels="LABEL:sec:intro" xml:id="S1">
    <tags>
      <tag>I</tag>
      <tag role="refnum">I</tag>
      <tag role="typerefnum">§I</tag>
    </tags>
    <title><tag close=" ">I</tag><text font="smallcaps">Introduction</text></title>
    <para xml:id="S1.p1">
      <p>Logographic structures contain phonological and semantic information about the logographs <cite class="ltx_citemacro_cite">[<bibref bibrefs="hsiao2006analysis" separator="," yyseparator=","/>]</cite>.
Language learners often exploit logographic structures to learn logographs’ pronunciation by focusing on salient sub-units that hint at pronunciation <cite class="ltx_citemacro_cite">[<bibref bibrefs="ho1997phonological" separator="," yyseparator=","/>]</cite>.
Being able to focus on sub-units of logographs might explain how humans can remember the pronunciations and meanings of thousands of distinct characters.
Figure <ref labelref="LABEL:fig:example"/> shows how logographic structures encode phonological and semantic information.
The 氶 sub-unit (position 6) hints at the nucleus and coda of the logographs’ pronunciations.
In addition, the 火 sub-unit<note mark="1" role="footnote" xml:id="footnote1"><tags>
            <tag>1</tag>
            <tag role="refnum">1</tag>
            <tag role="typerefnum">footnote 1</tag>
          </tags>火 is written as 灬 when it is at the bottom position.</note>
(position 5) suggests that the logographs containing this sub-unit are related to fire.
For the four logographs 蒸, 烝, 丞, and 氶, the structure of each subsequent logograph is nested within that of the one before it.
For example, 烝 is nested within 蒸.
Modeling this hierarchy should allow models to pick out 氶 as the most relevant sub-unit for determining the pronunciation of 蒸, 烝, 丞, and 氶, and 火 as the most relevant sub-unit for determining the meaning of 蒸 and 烝.</p>
    </para>
    <figure inlist="lof" labels="LABEL:fig:example" placement="ht" xml:id="S1.F1">
      <tags>
        <tag><text fontsize="90%">Figure 1</text></tag>
        <tag role="refnum">1</tag>
        <tag role="typerefnum">Figure 1</tag>
      </tags>
      <graphics class="ltx_centering" graphic="example" options="width=433.62pt" xml:id="S1.F1.g1"/>
      <toccaption class="ltx_centering"><tag close=" ">1</tag>
<text font="italic">An example of logographic structure.
The left panel shows a binary tree representing the logograph 蒸.
The leaf nodes (positions 2, 5, 6, and 7) are sub-units forming the logograph (analogous to letters forming English words).
The inner nodes (positions 1, 3, and 4) are composition operators (such as vertical stacking) applied to their child nodes.
The logograph 蒸 is formed by composing all the nodes in the tree in a bottom-up fashion.
The sub-trees rooted at positions 3, 4, 5, 6, and 7 also form logographs (烝, 丞, 氶, 火, 一).
The right table shows the logographs’ meanings and their pronunciations in Cantonese.
</text></toccaption>
      <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 1</text></tag><text fontsize="90%">
<text font="italic">An example of logographic structure.
The left panel shows a binary tree representing the logograph 蒸.
The leaf nodes (positions 2, 5, 6, and 7) are sub-units forming the logograph (analogous to letters forming English words).
The inner nodes (positions 1, 3, and 4) are composition operators (such as vertical stacking) applied to their child nodes.
The logograph 蒸 is formed by composing all the nodes in the tree in a bottom-up fashion.
The sub-trees rooted at positions 3, 4, 5, 6, and 7 also form logographs (烝, 丞, 氶, 火, 一).
The right table shows the logographs’ meanings and their pronunciations in Cantonese.
</text></text></caption>
    </figure>
    <para xml:id="S1.p2">
      <p>Given the link between logographic structures and the logographs’ phonology and semantics, we investigate methods to construct logograph (character) embeddings that are useful for different downstream tasks.
We consider two tasks: (1) predicting logographs’ Cantonese pronunciation from logographic structures and (2) language modeling.
The pronunciation prediction task requires the embeddings to contain phonological information, while language modeling requires the embeddings to contain semantic information.
We propose constructing hierarchical logograph (character) embeddings from the logographs’ recursive structures using treeLSTM <cite class="ltx_citemacro_cite">[<bibref bibrefs="tai2015improved,zhu2015long" separator="," yyseparator=","/>]</cite>.
The treeLSTM model exploits structures explicitly, since it must read in the sub-units of a logograph in the order specified by its recursive structure.
We compare hierarchical embeddings against two approaches that are commonly used to construct embeddings.
The first approach is standard embeddings <cite class="ltx_citemacro_cite">[<bibref bibrefs="mikolov2013distributed" separator="," yyseparator=","/>]</cite>, in which logographs are mapped to representations without utilizing the logographs’ structures.
The second approach is to construct logograph embeddings from linearized structures using LSTM <cite class="ltx_citemacro_cite">[<bibref bibrefs="graves2013generating" separator="," yyseparator=","/>]</cite>.
The second approach only exploits structures implicitly, since the structural information is in the input data and not in the model.
Without abundant training data, this approach is prone to overfitting and to learning solutions that may not generalize well.</p>
    </para>
    <para xml:id="S1.p3">
      <p>Modeling structures is expected to help models generalize better, especially when training data is limited <cite class="ltx_citemacro_cite">[<bibref bibrefs="ngo2014minimal,ngo2015phonology,ngo2019phonology" separator="," yyseparator=","/>]</cite>.
Modeling structures has led to improvements in multiple tasks such as machine translation <cite class="ltx_citemacro_cite">[<bibref bibrefs="yamada2001syntax,eriguchi2016tree" separator="," yyseparator=","/>]</cite>, sentiment analysis <cite class="ltx_citemacro_cite">[<bibref bibrefs="tai2015improved,miyazaki2017japanese" separator="," yyseparator=","/>]</cite>, natural language inference <cite class="ltx_citemacro_cite">[<bibref bibrefs="bowman2016fast" separator="," yyseparator=","/>]</cite>, and parsing <cite class="ltx_citemacro_cite">[<bibref bibrefs="dyer2016recurrent,zhang2016top" separator="," yyseparator=","/>]</cite>.
Despite these successes, there are also cases in which modeling structures yields little improvement <cite class="ltx_citemacro_cite">[<bibref bibrefs="li2015tree,lan2018toolkit" separator="," yyseparator=","/>]</cite>.
The lack of improvement could be because either (1) the models cannot exploit the structures effectively or (2) the structures do not provide information relevant to the tasks.
Thus, to improve overall task performance, it is important to ensure both the high quality of structure annotations and the ability of models to exploit structures effectively.
However, ensuring consistently high-quality annotation is not simple, especially for complex tasks where multiple annotations are plausible.
The quality of structure annotation may vary between training and test sets, or even among examples within the training set.
Variation among the annotations of training samples may arise from disagreement between human annotators.
Variation between the annotations of training and test samples may arise when models are trained on annotations provided by humans but tested on annotations provided by parsers that were trained to mimic human annotators.
In contrast, logographic structure annotations are consistent, since they are constructed automatically using a rule-based parser.
The rules <cite class="ltx_citemacro_cite">[<bibref bibrefs="morioka2008chise" separator="," yyseparator=","/>]</cite> are defined by human experts from the Ideographic Rapporteur Group, a committee that advises the Unicode Consortium on logographs; therefore, the annotations should be of reasonably high quality<note mark="2" role="footnote" xml:id="footnote2"><tags>
            <tag>2</tag>
            <tag role="refnum">2</tag>
            <tag role="typerefnum">footnote 2</tag>
          </tags>Kyoto University’s CHaracter Information Service Environment (CHISE) project: <ref class="ltx_url" font="typewriter" href="http://www.chise.org/">http://www.chise.org/</ref></note>.
Hence, compared to other tasks that utilize structures, tasks involving logographs could benefit more from effective modeling of structures.</p>
    </para>
    <para xml:id="S1.p4">
      <p>In Section <ref labelref="LABEL:sec:model"/>, we introduce the model used to construct the hierarchical embeddings.
We apply the proposed hierarchical character embeddings to two distinct tasks:
(1) pronunciation prediction (Section <ref labelref="LABEL:sec:exp_ph"/>), a case study designed to isolate the effects of modeling recursive structures, and
(2) language modeling (Section <ref labelref="LABEL:sec:exp_lm"/>), a useful auxiliary task that characterizes many aspects of language beyond semantics (including syntactic structure and discourse processing) and can be used to pretrain models for many other tasks <cite class="ltx_citemacro_cite">[<bibref bibrefs="ramachandran2017unsupervised,peters2018deep,radford2018improving,howard2018universal,devlin2019bert" separator="," yyseparator=","/>]</cite>;
thus, it has many downstream applications.
However, due to the multifaceted nature of the language modeling task, its results are hard to analyze qualitatively.
Section <ref labelref="LABEL:sec:related"/> discusses our work in relation to other work.</p>
    </para>
  </section>
  <section inlist="toc" labels="LABEL:sec:model" xml:id="S2">
    <tags>
      <tag>II</tag>
      <tag role="refnum">II</tag>
      <tag role="typerefnum">§II</tag>
    </tags>
    <title><tag close=" ">II</tag><text font="smallcaps">Model</text></title>
    <subsection inlist="toc" labels="LABEL:ssec:parser" xml:id="S2.SS1">
      <tags>
        <tag>II-A</tag>
        <tag role="refnum">II-A</tag>
        <tag role="typerefnum">§II-A</tag>
      </tags>
      <title><tag close=" ">II-A</tag><text font="italic">Rule-based Parser</text></title>
      <para xml:id="S2.SS1.p1">
        <p>Decomposing logographs into sub-units is necessary to locate the sub-units that hint at phonetic or semantic information.
Logographs are decomposed recursively using a rule-based parser.
The substitution rules used by the parser are defined by human experts from the Ideographic Rapporteur Group.
A substitution rule maps a logograph to its sub-units and a geometric operator (an Ideographic Description Character) that denotes the relative positions of the sub-units.
The output of the parser is a binary tree, as shown in Figure <ref labelref="LABEL:fig:parser"/>.</p>
      </para>
      <para xml:id="S2.SS1.p2">
        <p>At the start, the tree contains only the root node, which is the logograph itself.
The parser extends the tree by recursively replacing nodes with sub-trees defined by the substitution rules.
The root of each sub-tree is a geometric operator, and its children are the sub-units.
This process is repeated until no node in the tree can be decomposed further.</p>
      </para>
      <figure inlist="lof" labels="LABEL:fig:parser" placement="ht" xml:id="S2.F2">
        <tags>
          <tag><text fontsize="90%">Figure 2</text></tag>
          <tag role="refnum">2</tag>
          <tag role="typerefnum">Figure 2</tag>
        </tags>
        <graphics class="ltx_centering" graphic="grammar" options="width=433.62pt" xml:id="S2.F2.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">2</tag>
<text font="italic">Construction of logographic recursive structure using the rule-based parser.
In this example, only four rules are used; they are shown at the bottom.
The rule used at each decomposition step is in red.
</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 2</text></tag><text fontsize="90%">
<text font="italic">Construction of logographic recursive structure using the rule-based parser.
In this example, only four rules are used; they are shown at the bottom.
The rule used at each decomposition step is in red.
</text></text></caption>
      </figure>
      <para xml:id="S2.SS1.p3">
        <p>Figure <ref labelref="LABEL:fig:parser"/> shows how the structure (represented as a binary tree) is constructed for the logograph 仕.
At step 1, there is a root node 仕 with no children.
At step 2, using the rule in red, the node 仕 is replaced by a geometric operator (horizontal stacking) and two child nodes.
At step 3, using the rules in red, the nodes are further decomposed into 人, 十, and 一.
The process terminates at step 4, when there are four leaf nodes with three distinct values (人, 丨, and 一) that cannot be decomposed further.
In total, there are 505 sub-units that can be leaf nodes.
These sub-units are not hand-picked; whether the representation of a sub-unit carries phonetic or semantic information is learned automatically during training.
Hence, the hierarchical embeddings can be used in different tasks.
The phonetic and semantic sub-units can be at depth 1 (children of the root node) or they can reside deeper in the trees.</p>
      </para>
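      <para>
        <p>The decomposition procedure above can be sketched in Python as follows. This is a minimal illustration, not the parser used in this work: the dictionary RULES and the functions parse and leaves are hypothetical names, and the three binary rules (仕 → ⿰人士, 士 → ⿱十一, 十 → ⿻一丨) are an assumed simplification chosen to reproduce the four leaves described for 仕; the CHISE rules and the four rules in Figure <ref labelref="LABEL:fig:parser"/> may differ. Each rule maps a logograph to an Ideographic Description Character and two sub-units, and the recursion stops at characters for which no rule exists.</p>
```python
# Hypothetical substitution rules: logograph -> (operator, first sub-unit, second sub-unit).
# The operators are Ideographic Description Characters:
#   U+2FF0 (left to right), U+2FF1 (above to below), U+2FFB (overlaid).
# These three rules are an illustrative assumption, not the CHISE rule set.
RULES = {
    "仕": ("⿰", "人", "士"),
    "士": ("⿱", "十", "一"),
    "十": ("⿻", "一", "丨"),
}


def parse(logograph, rules=RULES):
    """Start from the root node and recursively replace every node that has a
    substitution rule with the sub-tree (operator, left, right) the rule defines.

    Nodes without a rule are leaves and are returned as bare characters.
    Assumes the rule set is acyclic; otherwise the recursion never terminates.
    """
    if logograph not in rules:
        return logograph
    operator, left, right = rules[logograph]
    return (operator, parse(left, rules), parse(right, rules))


def leaves(tree):
    """Collect the leaf sub-units of a parsed tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)


tree = parse("仕")
print(tree)          # ('⿰', '人', ('⿱', ('⿻', '一', '丨'), '一'))
print(leaves(tree))  # ['人', '一', '丨', '一']
```
        <p>The result has four leaves with three distinct values, matching the description of step 4. The sketch builds the whole tree in one recursive call rather than step by step, which yields the same final tree.</p>
      </para>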
      <para xml:id="S2.SS1.p4">
        <p>To construct logograph embeddings from trees, one can use bag-of-words models, sequence models, or tree-structured models <cite class="ltx_citemacro_cite">[<bibref bibrefs="tai2015improved,zhu2015long" separator="," yyseparator=","/>]</cite>.
Since the ordering of sub-units within logographs is important in determining the logographs’ pronunciation and meaning, order-agnostic models such as bag-of-words models are sub-optimal for constructing logograph embeddings.
Since sequence models and tree-structured models are sensitive to the ordering of sub-units, they can be used to construct logograph embeddings.
Sequence models such as recurrent neural networks (RNNs), in particular LSTM <cite class="ltx_citemacro_cite">[<bibref bibrefs="graves2013generating" separator="," yyseparator=","/>]</cite>, can be used with tree inputs by first linearizing the trees into sequences.
In contrast, recursive neural networks, such as treeLSTM, can consume tree inputs directly to yield the logograph embeddings.
We compare LSTM and bi-directional LSTM (biLSTM), which are structure-agnostic, with treeLSTM, which is inherently suited to modeling tree structures.</p>
      </para>
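      <para>
        <p>Linearizing a tree for the sequence models can be sketched as follows. The text does not specify the traversal order, so pre-order traversal (operator first, then the left and right sub-trees) is an assumption here; the names linearize, delinearize, and BINARY_OPERATORS are hypothetical, and the operator set is restricted to three binary Ideographic Description Characters. Because every operator in this sketch has exactly two children, the pre-order sequence can be decoded back into the original tree, which illustrates how the structural information remains, implicitly, in the input of a sequence model.</p>
```python
from typing import List, Tuple, Union

# Assumed tree encoding: (operator, left, right) for inner nodes, a character for leaves.
Tree = Union[str, Tuple[str, "Tree", "Tree"]]

# Hypothetical operator set, restricted to binary Ideographic Description Characters.
BINARY_OPERATORS = frozenset("⿰⿱⿻")


def linearize(tree: Tree) -> List[str]:
    """Pre-order traversal: emit the operator, then the left and right sub-trees."""
    if isinstance(tree, str):
        return [tree]
    operator, left, right = tree
    return [operator] + linearize(left) + linearize(right)


def delinearize(tokens: List[str], operators=BINARY_OPERATORS) -> Tree:
    """Rebuild the tree from a pre-order sequence; each operator takes two children."""
    stream = iter(tokens)

    def build() -> Tree:
        token = next(stream)
        if token in operators:
            left = build()
            right = build()
            return (token, left, right)
        return token

    return build()


tree = ("⿰", "人", ("⿱", ("⿻", "一", "丨"), "一"))
tokens = linearize(tree)
print(tokens)  # ['⿰', '人', '⿱', '⿻', '一', '丨', '一']
assert delinearize(tokens) == tree
```
      </para>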
    </subsection>
    <subsection inlist="toc" labels="LABEL:ssec:lstm" xml:id="S2.SS2">
      <tags>
        <tag>II-B</tag>
        <tag role="refnum">II-B</tag>
        <tag role="typerefnum">§II-B</tag>
      </tags>
      <title><tag close=" ">II-B</tag><text font="italic">Constructing Embeddings Using LSTM</text></title>
      <para xml:id="S2.SS2.p1">
        <p>At each position <Math mode="inline" tex="t" text="t" xml:id="S2.SS2.p1.m1">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">t</XMTok>
            </XMath>
          </Math> in a linearized tree of length <Math mode="inline" tex="T" text="T" xml:id="S2.SS2.p1.m2">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">T</XMTok>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{x}}_{t}" text="x _ t" xml:id="S2.SS2.p1.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{c}}_{t}" text="c _ t" xml:id="S2.SS2.p1.m4">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{h}}_{t}" text="h _ t" xml:id="S2.SS2.p1.m5">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math> are the input, cell state, and hidden state of the LSTM, respectively.
The last hidden state, <Math mode="inline" tex="{\bm{h}}_{T}" text="h _ T" xml:id="S2.SS2.p1.m6">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">T</XMTok>
              </XMApp>
            </XMath>
          </Math>, is used as the logograph embedding.
Figure <ref labelref="LABEL:subfig:lstm_model"/> shows the LSTM model.</p>
      </para>
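      <para>
        <p>A minimal NumPy sketch of this encoder follows, assuming the standard LSTM gate equations with stacked weight matrices. The class name LSTMEncoder, the dimensions, the random initialization, and the token vocabulary are illustrative assumptions rather than the configuration used in the experiments. The encoder looks up an input vector x_t for each token of the linearized tree, updates c_t and h_t, and returns the final hidden state h_T as the logograph embedding.</p>
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LSTMEncoder:
    """Reads a linearized tree and returns the last hidden state h_T.

    Sizes and initialization are illustrative assumptions, not the paper's settings.
    """

    def __init__(self, vocab, input_dim=8, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.index = {token: i for i, token in enumerate(vocab)}
        self.embed = rng.normal(0.0, 0.1, (len(vocab), input_dim))
        # Rows are stacked as input gate, forget gate, output gate, candidate cell.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x_t, h_prev, c_prev):
        z = self.W @ x_t + self.U @ h_prev + self.b
        i, f, o, g = np.split(z, 4)
        c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # cell state c_t
        h_t = sigmoid(o) * np.tanh(c_t)                      # hidden state h_t
        return h_t, c_t

    def __call__(self, tokens):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for token in tokens:
            x_t = self.embed[self.index[token]]  # input x_t
            h, c = self.step(x_t, h, c)
        return h  # h_T, used as the logograph embedding


vocab = ["⿰", "⿱", "⿻", "人", "一", "丨"]
encoder = LSTMEncoder(vocab)
embedding = encoder(["⿰", "人", "⿱", "⿻", "一", "丨", "一"])
print(embedding.shape)  # (16,)
```
        <p>Because h_T is produced by reading the tokens in order, reversing the sequence generally yields a different embedding, unlike an order-agnostic bag-of-words model.</p>
      </para>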
    </subsection>
    <subsection inlist="toc" labels="LABEL:ssec:bilstm" xml:id="S2.SS3">
      <tags>
        <tag>II-C</tag>
        <tag role="refnum">II-C</tag>
        <tag role="typerefnum">§II-C</tag>
      </tags>
      <title><tag close=" ">II-C</tag><text font="italic">Constructing Embeddings Using Bi-directional LSTM</text></title>
      <para xml:id="S2.SS3.p1">
        <p>The biLSTM consists of two LSTMs, a forward LSTM and a backward LSTM, which read the linearized trees in opposite directions.
The logograph embedding, <Math mode="inline" tex="{\bm{h}}_{T}" text="h _ T" xml:id="S2.SS3.p1.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">T</XMTok>
              </XMApp>
            </XMath>
          </Math>, is formed by concatenating the last hidden states of the backward and forward LSTMs, i.e. <Math mode="inline" tex="{\bm{h}}^{b}_{T}" text="(h ^ b) _ T" xml:id="S2.SS3.p1.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">b</XMTok>
                </XMApp>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">T</XMTok>
              </XMApp>
            </XMath>
          </Math> and <Math mode="inline" tex="{\bm{h}}^{f}_{T}" text="(h ^ f) _ T" xml:id="S2.SS3.p1.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                </XMApp>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">T</XMTok>
              </XMApp>
            </XMath>
          </Math>.</p>
      </para>
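      <para>
        <p>The concatenation can be sketched as follows, again assuming standard LSTM equations. The helper names init_lstm, run_lstm, and bilstm_embedding and all sizes are hypothetical. The forward LSTM reads x_1, ..., x_T, the backward LSTM reads x_T, ..., x_1, and the embedding places the backward state before the forward state, following the order in which they are listed above.</p>
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(input_dim, hidden_dim, rng):
    """Random parameters for one LSTM (gate rows stacked as i, f, o, g)."""
    return {
        "W": rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim)),
        "U": rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }


def run_lstm(params, xs):
    """Run an LSTM over the rows of xs and return its last hidden state."""
    hidden_dim = params["U"].shape[1]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x_t in xs:
        z = params["W"] @ x_t + params["U"] @ h + params["b"]
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def bilstm_embedding(forward, backward, xs):
    """Concatenate the last hidden states of the backward and forward LSTMs."""
    h_f = run_lstm(forward, xs)         # reads x_1 ... x_T
    h_b = run_lstm(backward, xs[::-1])  # reads x_T ... x_1
    return np.concatenate([h_b, h_f])


rng = np.random.default_rng(0)
xs = rng.normal(size=(7, 8))  # random stand-ins for 7 input vectors of a linearized tree
fwd = init_lstm(8, 16, rng)
bwd = init_lstm(8, 16, rng)
print(bilstm_embedding(fwd, bwd, xs).shape)  # (32,)
```
      </para>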
    </subsection>
    <subsection inlist="toc" labels="LABEL:ssec:bilstm" xml:id="S2.SS4">
      <tags>
        <tag>II-D</tag>
        <tag role="refnum">II-D</tag>
        <tag role="typerefnum">§II-D</tag>
      </tags>
      <title><tag close=" ">II-D</tag><text font="italic">Constructing Embeddings Using CNN</text></title>
      <para xml:id="S2.SS4.p1">
        <p>The model architecture is similar to that of <cite class="ltx_citemacro_cite">[<bibref bibrefs="li2018subword" separator="," yyseparator=","/>]</cite>.
The inputs to the model are also sequences formed by linearizing the logographs’ trees.
The CNN model consists of seven parallel 1D convolutional layers with kernel sizes from 1 to 7.
Each convolutional layer has 200 filters.
The convolutional layers are followed by max-pooling layers.
After that, the outputs are concatenated and fed through a fully-connected layer.
The output of the fully-connected layer is the logograph embedding.</p>
      </para>
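      <para>
        <p>The CNN encoder can be sketched in NumPy as follows. The kernel sizes 1 to 7, the 200 filters per layer, the max-pooling, the concatenation, and the fully-connected output layer follow the description above. The ReLU activation, max-pooling over all positions, zero-padding of short sequences to the widest kernel, the input and output dimensions, and the names conv1d and CNNEncoder are assumptions made for illustration.</p>
```python
import numpy as np


def conv1d(xs, kernels, bias):
    """Valid 1D convolution: xs (T, d), kernels (F, k, d) -> (T - k + 1, F)."""
    _, k, _ = kernels.shape
    windows = np.stack([xs[t:t + k] for t in range(xs.shape[0] - k + 1)])
    return np.einsum("tkd,fkd->tf", windows, kernels) + bias


class CNNEncoder:
    """Parallel 1D convolutions (kernel sizes 1..max_kernel), max-pooling,
    concatenation, and a fully-connected layer producing the embedding."""

    def __init__(self, input_dim=8, num_filters=200, max_kernel=7, output_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.max_kernel = max_kernel
        self.convs = [
            (rng.normal(0.0, 0.1, (num_filters, k, input_dim)), np.zeros(num_filters))
            for k in range(1, max_kernel + 1)
        ]
        self.fc_W = rng.normal(0.0, 0.1, (output_dim, num_filters * max_kernel))
        self.fc_b = np.zeros(output_dim)

    def __call__(self, xs):
        # Assumption: zero-pad short sequences so the widest kernel fits.
        pad = max(0, self.max_kernel - xs.shape[0])
        xs = np.vstack([xs, np.zeros((pad, xs.shape[1]))])
        pooled = []
        for kernels, bias in self.convs:
            feature_map = np.maximum(conv1d(xs, kernels, bias), 0.0)  # ReLU (assumed)
            pooled.append(feature_map.max(axis=0))  # max-pooling over positions
        return self.fc_W @ np.concatenate(pooled) + self.fc_b


rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 8))  # stand-in for 5 token vectors of a linearized tree
print(CNNEncoder()(xs).shape)  # (64,)
```
        <p>With 200 filters in each of the seven layers, the concatenated pooled vector has 1,400 dimensions before the fully-connected layer.</p>
      </para>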
    </subsection>
    <subsection inlist="toc" labels="LABEL:ssec:treelstm" xml:id="S2.SS5">
      <tags>
        <tag>II-E</tag>
        <tag role="refnum">II-E</tag>
        <tag role="typerefnum">§II-E</tag>
      </tags>
      <title><tag close=" ">II-E</tag><text font="italic">Constructing Hierarchical Embeddings Using treeLSTM</text></title>
      <para xml:id="S2.SS5.p1">
        <p>At each node <Math mode="inline" tex="n" text="n" xml:id="S2.SS5.p1.m1">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">n</XMTok>
            </XMath>
          </Math> of the binary tree with two children <Math mode="inline" tex="l" text="l" xml:id="S2.SS5.p1.m2">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">l</XMTok>
            </XMath>
          </Math> and <Math mode="inline" tex="r" text="r" xml:id="S2.SS5.p1.m3">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">r</XMTok>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{x}}_{n}" text="x _ n" xml:id="S2.SS5.p1.m4">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
              </XMApp>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{c}}_{n}" text="c _ n" xml:id="S2.SS5.p1.m5">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
              </XMApp>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{h}}_{n}" text="h _ n" xml:id="S2.SS5.p1.m6">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
              </XMApp>
            </XMath>
          </Math> are the input, cell state, and hidden state of the treeLSTM, respectively.
<Math mode="inline" tex="{\bm{i}}" text="i" xml:id="S2.SS5.p1.m7">
            <XMath>
              <XMTok font="bold italic" role="UNKNOWN">i</XMTok>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{f}}_{l}" text="f _ l" xml:id="S2.SS5.p1.m8">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
              </XMApp>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{f}}_{r}" text="f _ r" xml:id="S2.SS5.p1.m9">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
              </XMApp>
            </XMath>
          </Math>, <Math mode="inline" tex="{\bm{o}}" text="o" xml:id="S2.SS5.p1.m10">
            <XMath>
              <XMTok font="bold italic" role="UNKNOWN">o</XMTok>
            </XMath>
          </Math> are the input gate, left forget gate, right forget gate, and output gate, respectively.
The forward pass of a treeLSTM unit is given by:</p>
      </para>
      <para xml:id="S2.SS5.p2">
        <equationgroup class="ltx_eqn_eqnarray" xml:id="Sx1.EGx1">
          <equation xml:id="S2.Ex1">
            <MathFork>
              <Math tex="\displaystyle{\bm{i}}=\sigma({\bm{U}}_{l}^{i}{\bm{h}}_{l}+{\bm{U}}_{r}^{i}{\bm%&#10;{h}}_{r}+{\bm{V}}^{i}{\bm{x}}_{n}+{\bm{V}}_{l}^{i}{\bm{x}}_{l}+{\bm{V}}_{r}^{i%&#10;}{\bm{x}}_{r})" text="i = sigma * ((U _ l) ^ i * h _ l + (U _ r) ^ i * h _ r + V ^ i * x _ n + (V _ l) ^ i * x _ l + (V _ r) ^ i * x _ r)" xml:id="S2.Ex1.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="bold italic" role="UNKNOWN">i</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                      <XMDual>
                        <XMRef idref="S2.Ex1.m4.1"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMApp xml:id="S2.Ex1.m4.1">
                            <XMTok meaning="plus" role="ADDOP">+</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                          </XMApp>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle{\bm{i}}" text="i" xml:id="S2.Ex1.m1">
                      <XMath>
                        <XMTok font="bold italic" role="UNKNOWN">i</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S2.Ex1.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\sigma({\bm{U}}_{l}^{i}{\bm{h}}_{l}+{\bm{U}}_{r}^{i}{\bm{h}}_{r}+%&#10;{\bm{V}}^{i}{\bm{x}}_{n}+{\bm{V}}_{l}^{i}{\bm{x}}_{l}+{\bm{V}}_{r}^{i}{\bm{x}}%&#10;_{r})" text="sigma * ((U _ l) ^ i * h _ l + (U _ r) ^ i * h _ r + V ^ i * x _ n + (V _ l) ^ i * x _ l + (V _ r) ^ i * x _ r)" xml:id="S2.Ex1.m3">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                          <XMDual>
                            <XMRef idref="S2.Ex1.m3.1"/>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S2.Ex1.m3.1">
                                <XMTok meaning="plus" role="ADDOP">+</XMTok>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">i</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation xml:id="S2.Ex2">
            <MathFork>
              <Math tex="\displaystyle{\bm{f}}_{l}=\sigma({\bm{U}}_{l}^{f_{l}}{\bm{h}}_{l}+{\bm{U}}_{r}%&#10;^{f_{l}}{\bm{h}}_{r}+{\bm{V}}^{f_{l}}{\bm{x}}_{n}+{\bm{V}}_{l}^{f_{l}}{\bm{x}}%&#10;_{l}+{\bm{V}}_{r}^{f_{l}}{\bm{x}}_{r})" text="f _ l = sigma * ((U _ l) ^ (f _ l) * h _ l + (U _ r) ^ (f _ l) * h _ r + V ^ (f _ l) * x _ n + (V _ l) ^ (f _ l) * x _ l + (V _ r) ^ (f _ l) * x _ r)" xml:id="S2.Ex2.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                      <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                      <XMDual>
                        <XMRef idref="S2.Ex2.m4.1"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMApp xml:id="S2.Ex2.m4.1">
                            <XMTok meaning="plus" role="ADDOP">+</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                          </XMApp>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle{\bm{f}}_{l}" text="f _ l" xml:id="S2.Ex2.m1">
                      <XMath>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S2.Ex2.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\sigma({\bm{U}}_{l}^{f_{l}}{\bm{h}}_{l}+{\bm{U}}_{r}^{f_{l}}{\bm{%&#10;h}}_{r}+{\bm{V}}^{f_{l}}{\bm{x}}_{n}+{\bm{V}}_{l}^{f_{l}}{\bm{x}}_{l}+{\bm{V}}%&#10;_{r}^{f_{l}}{\bm{x}}_{r})" text="sigma * ((U _ l) ^ (f _ l) * h _ l + (U _ r) ^ (f _ l) * h _ r + V ^ (f _ l) * x _ n + (V _ l) ^ (f _ l) * x _ l + (V _ r) ^ (f _ l) * x _ r)" xml:id="S2.Ex2.m3">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                          <XMDual>
                            <XMRef idref="S2.Ex2.m3.1"/>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S2.Ex2.m3.1">
                                <XMTok meaning="plus" role="ADDOP">+</XMTok>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation xml:id="S2.Ex3">
            <MathFork>
              <Math tex="\displaystyle{\bm{f}}_{r}=\sigma({\bm{U}}_{l}^{f_{r}}{\bm{h}}_{l}+{\bm{U}}_{r}%&#10;^{f_{r}}{\bm{h}}_{r}+{\bm{V}}^{f_{r}}{\bm{x}}_{n}+{\bm{V}}_{l}^{f_{r}}{\bm{x}}%&#10;_{l}+{\bm{V}}_{r}^{f_{r}}{\bm{x}}_{r})" text="f _ r = sigma * ((U _ l) ^ (f _ r) * h _ l + (U _ r) ^ (f _ r) * h _ r + V ^ (f _ r) * x _ n + (V _ l) ^ (f _ r) * x _ l + (V _ r) ^ (f _ r) * x _ r)" xml:id="S2.Ex3.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                      <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                      <XMDual>
                        <XMRef idref="S2.Ex3.m4.1"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMApp xml:id="S2.Ex3.m4.1">
                            <XMTok meaning="plus" role="ADDOP">+</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                  <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                          </XMApp>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle{\bm{f}}_{r}" text="f _ r" xml:id="S2.Ex3.m1">
                      <XMath>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S2.Ex3.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\sigma({\bm{U}}_{l}^{f_{r}}{\bm{h}}_{l}+{\bm{U}}_{r}^{f_{r}}{\bm{%&#10;h}}_{r}+{\bm{V}}^{f_{r}}{\bm{x}}_{n}+{\bm{V}}_{l}^{f_{r}}{\bm{x}}_{l}+{\bm{V}}%&#10;_{r}^{f_{r}}{\bm{x}}_{r})" text="sigma * ((U _ l) ^ (f _ r) * h _ l + (U _ r) ^ (f _ r) * h _ r + V ^ (f _ r) * x _ n + (V _ l) ^ (f _ r) * x _ l + (V _ r) ^ (f _ r) * x _ r)" xml:id="S2.Ex3.m3">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                          <XMDual>
                            <XMRef idref="S2.Ex3.m3.1"/>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S2.Ex3.m3.1">
                                <XMTok meaning="plus" role="ADDOP">+</XMTok>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">f</XMTok>
                                      <XMTok font="italic" fontsize="50%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation xml:id="S2.Ex4">
            <MathFork>
              <Math tex="\displaystyle{\bm{o}}=\sigma({\bm{U}}_{l}^{o}{\bm{h}}_{l}+{\bm{U}}_{r}^{o}{\bm%&#10;{h}}_{r}+{\bm{V}}^{o}{\bm{x}}_{n}+{\bm{V}}_{l}^{o}{\bm{x}}_{l}+{\bm{V}}_{r}^{o%&#10;}{\bm{x}}_{r})" text="o = sigma * ((U _ l) ^ o * h _ l + (U _ r) ^ o * h _ r + V ^ o * x _ n + (V _ l) ^ o * x _ l + (V _ r) ^ o * x _ r)" xml:id="S2.Ex4.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="bold italic" role="UNKNOWN">o</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                      <XMDual>
                        <XMRef idref="S2.Ex4.m4.1"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMApp xml:id="S2.Ex4.m4.1">
                            <XMTok meaning="plus" role="ADDOP">+</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                          </XMApp>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle{\bm{o}}" text="o" xml:id="S2.Ex4.m1">
                      <XMath>
                        <XMTok font="bold italic" role="UNKNOWN">o</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S2.Ex4.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\sigma({\bm{U}}_{l}^{o}{\bm{h}}_{l}+{\bm{U}}_{r}^{o}{\bm{h}}_{r}+%&#10;{\bm{V}}^{o}{\bm{x}}_{n}+{\bm{V}}_{l}^{o}{\bm{x}}_{l}+{\bm{V}}_{r}^{o}{\bm{x}}%&#10;_{r})" text="sigma * ((U _ l) ^ o * h _ l + (U _ r) ^ o * h _ r + V ^ o * x _ n + (V _ l) ^ o * x _ l + (V _ r) ^ o * x _ r)" xml:id="S2.Ex4.m3">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" name="sigma" role="UNKNOWN">σ</XMTok>
                          <XMDual>
                            <XMRef idref="S2.Ex4.m3.1"/>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S2.Ex4.m3.1">
                                <XMTok meaning="plus" role="ADDOP">+</XMTok>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation xml:id="S2.Ex5">
            <MathFork>
              <Math tex="\displaystyle\tilde{{\bm{c}}}=\tanh({\bm{U}}_{l}^{\tilde{c}}{\bm{h}}_{l}+{\bm{%&#10;U}}_{r}^{\tilde{c}}{\bm{h}}_{r}+{\bm{V}}^{\tilde{c}}{\bm{x}}_{n}+{\bm{V}}_{l}^%&#10;{\tilde{c}}{\bm{x}}_{l}+{\bm{V}}_{r}^{\tilde{c}}{\bm{x}}_{r})" text="tilde@(c) = hyperbolic-tangent@((U _ l) ^ (tilde@(c)) * h _ l + (U _ r) ^ (tilde@(c)) * h _ r + V ^ (tilde@(c)) * x _ n + (V _ l) ^ (tilde@(c)) * x _ l + (V _ r) ^ (tilde@(c)) * x _ r)" xml:id="S2.Ex5.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                      <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                    </XMApp>
                    <XMDual>
                      <XMApp>
                        <XMRef idref="S2.Ex5.m4.1"/>
                        <XMRef idref="S2.Ex5.m4.2"/>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="hyperbolic-tangent" role="TRIGFUNCTION" xml:id="S2.Ex5.m4.1">tanh</XMTok>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMApp xml:id="S2.Ex5.m4.2">
                            <XMTok meaning="plus" role="ADDOP">+</XMTok>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                <XMApp>
                                  <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMApp>
                                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                <XMApp>
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                </XMApp>
                                <XMApp>
                                  <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                </XMApp>
                              </XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                              </XMApp>
                            </XMApp>
                          </XMApp>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMApp>
                    </XMDual>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle\tilde{{\bm{c}}}" text="tilde@(c)" xml:id="S2.Ex5.m1">
                      <XMath>
                        <XMApp>
                          <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                          <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S2.Ex5.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\tanh({\bm{U}}_{l}^{\tilde{c}}{\bm{h}}_{l}+{\bm{U}}_{r}^{\tilde{c%&#10;}}{\bm{h}}_{r}+{\bm{V}}^{\tilde{c}}{\bm{x}}_{n}+{\bm{V}}_{l}^{\tilde{c}}{\bm{x%&#10;}}_{l}+{\bm{V}}_{r}^{\tilde{c}}{\bm{x}}_{r})" text="hyperbolic-tangent@((U _ l) ^ (tilde@(c)) * h _ l + (U _ r) ^ (tilde@(c)) * h _ r + V ^ (tilde@(c)) * x _ n + (V _ l) ^ (tilde@(c)) * x _ l + (V _ r) ^ (tilde@(c)) * x _ r)" xml:id="S2.Ex5.m3">
                      <XMath>
                        <XMDual>
                          <XMApp>
                            <XMRef idref="S2.Ex5.m3.1"/>
                            <XMRef idref="S2.Ex5.m3.2"/>
                          </XMApp>
                          <XMApp>
                            <XMTok meaning="hyperbolic-tangent" role="TRIGFUNCTION" xml:id="S2.Ex5.m3.1">tanh</XMTok>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S2.Ex5.m3.2">
                                <XMTok meaning="plus" role="ADDOP">+</XMTok>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">U</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                    <XMApp>
                                      <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMApp>
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMApp>
                                    <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                    <XMApp>
                                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                    </XMApp>
                                    <XMApp>
                                      <XMTok fontsize="70%" name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">c</XMTok>
                                    </XMApp>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                                  </XMApp>
                                </XMApp>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMApp>
                        </XMDual>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation xml:id="S2.Ex6">
            <MathFork>
              <Math tex="\displaystyle{\bm{c}}_{n}={\bm{i}}\odot\tilde{{\bm{c}}}+{\bm{f}}_{l}\odot{\bm{%&#10;c}}_{l}+{\bm{f}}_{r}\odot{\bm{c}}_{r}" text="c _ n = i direct-product tilde@(c) + f _ l direct-product c _ l + f _ r direct-product c _ r" xml:id="S2.Ex6.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                      <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="plus" role="ADDOP">+</XMTok>
                      <XMApp>
                        <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                        <XMTok font="bold italic" role="UNKNOWN">i</XMTok>
                        <XMApp>
                          <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                          <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                        </XMApp>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                        </XMApp>
                      </XMApp>
                      <XMApp>
                        <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle{\bm{c}}_{n}" text="c _ n" xml:id="S2.Ex6.m1">
                      <XMath>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S2.Ex6.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle{\bm{i}}\odot\tilde{{\bm{c}}}+{\bm{f}}_{l}\odot{\bm{c}}_{l}+{\bm{%&#10;f}}_{r}\odot{\bm{c}}_{r}" text="i direct-product tilde@(c) + f _ l direct-product c _ l + f _ r direct-product c _ r" xml:id="S2.Ex6.m3">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="plus" role="ADDOP">+</XMTok>
                          <XMApp>
                            <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                            <XMTok font="bold italic" role="UNKNOWN">i</XMTok>
                            <XMApp>
                              <XMTok name="tilde" role="OVERACCENT" stretchy="false">~</XMTok>
                              <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                            </XMApp>
                          </XMApp>
                          <XMApp>
                            <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                              <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                            </XMApp>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                              <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">l</XMTok>
                            </XMApp>
                          </XMApp>
                          <XMApp>
                            <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                              <XMTok font="bold italic" role="UNKNOWN">f</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                            </XMApp>
                            <XMApp>
                              <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                              <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                            </XMApp>
                          </XMApp>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation labels="LABEL:eq:treeLSTM" xml:id="S2.E1">
            <tags>
              <tag>(1)</tag>
              <tag role="refnum">1</tag>
            </tags>
            <MathFork>
              <Math tex="\displaystyle{\bm{h}}_{n}={\bm{o}}\odot\tanh({\bm{c}}_{n})" text="h _ n = o direct-product hyperbolic-tangent@(c _ n)" xml:id="S2.E1.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                      <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                      <XMTok font="bold italic" role="UNKNOWN">o</XMTok>
                      <XMDual>
                        <XMApp>
                          <XMRef idref="S2.E1.m4.1"/>
                          <XMRef idref="S2.E1.m4.2"/>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="hyperbolic-tangent" role="TRIGFUNCTION" xml:id="S2.E1.m4.1">tanh</XMTok>
                          <XMWrap>
                            <XMTok role="OPEN" stretchy="false">(</XMTok>
                            <XMApp xml:id="S2.E1.m4.2">
                              <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                              <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                              <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                            </XMApp>
                            <XMTok role="CLOSE" stretchy="false">)</XMTok>
                          </XMWrap>
                        </XMApp>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle{\bm{h}}_{n}" text="h _ n" xml:id="S2.E1.m1">
                      <XMath>
                        <XMApp>
                          <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                          <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S2.E1.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle{\bm{o}}\odot\tanh({\bm{c}}_{n})" text="o direct-product hyperbolic-tangent@(c _ n)" xml:id="S2.E1.m3">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="direct-product" name="odot" role="MULOP">⊙</XMTok>
                          <XMTok font="bold italic" role="UNKNOWN">o</XMTok>
                          <XMDual>
                            <XMApp>
                              <XMRef idref="S2.E1.m3.1"/>
                              <XMRef idref="S2.E1.m3.2"/>
                            </XMApp>
                            <XMApp>
                              <XMTok meaning="hyperbolic-tangent" role="TRIGFUNCTION" xml:id="S2.E1.m3.1">tanh</XMTok>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">(</XMTok>
                                <XMApp xml:id="S2.E1.m3.2">
                                  <XMTok role="SUBSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="bold italic" role="UNKNOWN">c</XMTok>
                                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">n</XMTok>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMApp>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
        </equationgroup>
        <p>The hidden state of the root node, <Math mode="inline" tex="{\bm{h}}_{root}" text="h _ (r * o * o * t)" xml:id="S2.SS5.p2.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">r</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">o</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>, is taken as the representation of the entire tree.
<!--  %**** taslp2019.tex Line 225 **** -->Figure <ref labelref="LABEL:subfig:treelstm_model"/> shows the treeLSTM model.</p>
      </para>
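      <para>
        <p>The following NumPy sketch illustrates the node update above. It computes one treeLSTM node from its two children, assuming the candidate-state, cell-state, and hidden-state equations as written. It is a minimal sketch, not the authors' implementation. We assume that the gates i, f_l, f_r and o use the same five-term linear form as the candidate cell state, followed by a logistic sigmoid. Biases are omitted because the candidate-state equation shows none. The parameter layout (a dictionary keyed by gate name) and the names init_params, linear and tree_lstm_node are hypothetical.</p>
        <verbatim><![CDATA[
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical gate names; "c_tilde" is the candidate cell state.
GATES = ("i", "f_l", "f_r", "o", "c_tilde")


def init_params(hid_dim, in_dim, rng, scale=0.1):
    """Per gate: U_l, U_r act on child hidden states; V, V_l, V_r act on inputs."""
    params = {}
    for g in GATES:
        params[g] = {
            "U_l": rng.normal(0.0, scale, (hid_dim, hid_dim)),
            "U_r": rng.normal(0.0, scale, (hid_dim, hid_dim)),
            "V": rng.normal(0.0, scale, (hid_dim, in_dim)),
            "V_l": rng.normal(0.0, scale, (hid_dim, in_dim)),
            "V_r": rng.normal(0.0, scale, (hid_dim, in_dim)),
        }
    return params


def linear(p, h_l, h_r, x_n, x_l, x_r):
    # U_l h_l + U_r h_r + V x_n + V_l x_l + V_r x_r, as in the candidate-state equation
    return (p["U_l"] @ h_l + p["U_r"] @ h_r
            + p["V"] @ x_n + p["V_l"] @ x_l + p["V_r"] @ x_r)


def tree_lstm_node(params, h_l, c_l, h_r, c_r, x_n, x_l, x_r):
    """Return (h_n, c_n) for a binary node given its children's states and inputs."""
    z = {g: linear(params[g], h_l, h_r, x_n, x_l, x_r) for g in GATES}
    # Assumed gate form: sigmoid of the same five-term linear combination.
    i, f_l, f_r, o = (sigmoid(z[g]) for g in ("i", "f_l", "f_r", "o"))
    c_tilde = np.tanh(z["c_tilde"])
    c_n = i * c_tilde + f_l * c_l + f_r * c_r  # element-wise products
    h_n = o * np.tanh(c_n)
    return h_n, c_n
```
]]></verbatim>
        <p>A quick sanity check of the wiring: with all weights set to zero, every gate equals 0.5 and the candidate state is 0. The cell state then reduces to 0.5 c_l + 0.5 c_r, and the hidden state to 0.5 tanh(c_n).</p>
      </para>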
      <figure inlist="lof" labels="LABEL:fig:models" xml:id="S2.F3">
        <tags>
          <tag><text fontsize="90%">Figure 3</text></tag>
          <tag role="refnum">3</tag>
          <tag role="typerefnum">Figure 3</tag>
        </tags>
        <figure align="center" inlist="lof" labels="LABEL:subfig:lstm_model" xml:id="S2.F2.sf1">
          <tags>
            <tag><text fontsize="90%">(a)</text></tag>
            <tag role="refnum">2(a)</tag>
          </tags>
          <graphics class="ltx_centering" graphic="lstm" options="width=390.258pt" xml:id="S2.F2.sf1.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">(a)</tag>LSTM</toccaption>
          <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(a)</text></tag><text fontsize="90%">LSTM</text></caption>
        </figure>
        <figure align="center" inlist="lof" labels="LABEL:subfig:treelstm_model" xml:id="S2.F2.sf2">
          <tags>
            <tag><text fontsize="90%">(b)</text></tag>
            <tag role="refnum">2(b)</tag>
          </tags>
          <graphics class="ltx_centering" graphic="recursive" options="width=390.258pt" xml:id="S2.F2.sf2.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">(b)</tag>treeLSTM</toccaption>
          <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(b)</text></tag><text fontsize="90%">treeLSTM</text></caption>
        </figure>
        <toccaption class="ltx_centering"><tag close=" ">3</tag>
<text font="italic">LSTM and treeLSTM models.
The <text font="bold">last</text> hidden state <Math mode="inline" tex="\bf{h}_{7}" text="h _ 7" xml:id="S2.F3.m1">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold upright" role="UNKNOWN">h</XMTok>
                  <XMTok font="bold upright" fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                </XMApp>
              </XMath>
            </Math> of LSTM is the logograph embedding.
The <text font="bold">root</text> hidden state <Math mode="inline" tex="\bf{h}_{7}" text="h _ 7" xml:id="S2.F3.m2">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMTok font="bold upright" role="UNKNOWN">h</XMTok>
                  <XMTok font="bold upright" fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                </XMApp>
              </XMath>
            </Math> of treeLSTM is the hierarchical logograph embedding.
</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 3</text></tag><text fontsize="90%">
<text font="italic">LSTM and treeLSTM models.
The <text font="bold">last</text> hidden state <Math mode="inline" tex="\bf{h}_{7}" text="h _ 7" xml:id="S2.F3.m3">
                <XMath>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold upright" role="UNKNOWN">h</XMTok>
                    <XMTok font="bold upright" fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                  </XMApp>
                </XMath>
              </Math> of LSTM is the logograph embedding.
The <text font="bold">root</text> hidden state <Math mode="inline" tex="\bf{h}_{7}" text="h _ 7" xml:id="S2.F3.m4">
                <XMath>
                  <XMApp>
                    <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                    <XMTok font="bold upright" role="UNKNOWN">h</XMTok>
                    <XMTok font="bold upright" fontsize="70%" meaning="7" role="NUMBER">7</XMTok>
                  </XMApp>
                </XMath>
              </Math> of treeLSTM is the hierarchical logograph embedding.
</text></text></caption>
      </figure>
    </subsection>
    <subsection inlist="toc" xml:id="S2.SS6">
      <tags>
        <tag>II-F</tag>
        <tag role="refnum">II-F</tag>
        <tag role="typerefnum">§II-F</tag>
      </tags>
      <title><tag close=" ">II-F</tag><text font="italic">Implementation Details</text></title>
<!--  %**** taslp2019.tex Line 250 **** -->      <para xml:id="S2.SS6.p1">
        <p>Training tree-structured models is very slow <cite class="ltx_citemacro_cite">[<bibref bibrefs="irsoy2014deep" separator="," yyseparator=","/>]</cite>.
Training samples are hard to batch because they may have different tree shapes <cite class="ltx_citemacro_cite">[<bibref bibrefs="eriguchi2016tree" separator="," yyseparator=","/>]</cite>.
As a result, a batch size of one is common when training tree-structured models <cite class="ltx_citemacro_cite">[<bibref bibrefs="neubig2017fly" separator="," yyseparator=","/>]</cite>.
This setting cannot exploit parallel computation and therefore slows training.
Instead, we used dynamic batching to speed up training and inference.
Dynamic batching in PyTorch<note mark="3" role="footnote" xml:id="footnote3"><tags>
              <tag>3</tag>
              <tag role="refnum">3</tag>
              <tag role="typerefnum">footnote 3</tag>
            </tags><ref class="ltx_url" font="typewriter" href="https://devblogs.nvidia.com/recursive-neural-networks-pytorch/">https://devblogs.nvidia.com/recursive-neural-networks-pytorch/</ref></note> has been used to build batches of nodes on the fly, which speeds up training and inference of the SPINN model <cite class="ltx_citemacro_cite">[<bibref bibrefs="bowman2016fast" separator="," yyseparator=","/>]</cite>.
In our experiments, a batch size of 128 made training and inference more than 10 times faster.
In addition, we considered only binary trees and converted each ternary node (a node with three children) into two nested binary nodes.
This reduces the number of parameters the model has to learn and thereby improves learning efficiency.
The tree representation is sensitive to the order of child nodes.
Swapping the left and right children of a node can yield a character with a different meaning and pronunciation.
Thus, each child position needs its own weight matrices.
Modeling both binary and ternary nodes would therefore require 3 to 5 weight matrices, depending on whether the two node types share matrices for corresponding child positions.
By contrast, modeling binary trees requires only 2.
Since the amount of data is limited, we preferred models with fewer parameters and therefore converted all ternary nodes to binary nodes.</p>
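        <p>The sketch below shows one simple way to batch nodes on the fly: group the nodes of all trees in a batch by their height above the leaves. A node of height k depends only on nodes of lower height, so each height level can be computed with a single batched matrix product. This is not necessarily the scheme used in the paper or in the PyTorch SPINN example cited above. To keep the batching logic visible, the sketch replaces the treeLSTM cell with a hypothetical composition h = tanh(W_l h_l + W_r h_r). The tuple-based tree encoding and the function names are also assumptions.</p>
        <verbatim><![CDATA[
```python
import numpy as np

# Hypothetical tree encoding: a leaf is an int indexing a row of the sub-unit
# embedding table; an internal node is a (left, right) tuple.


def flatten(tree, nodes):
    """Append nodes in post-order; each entry is (leaf_index, left_id, right_id, height)."""
    if isinstance(tree, int):
        nodes.append((tree, -1, -1, 0))
        return len(nodes) - 1
    left = flatten(tree[0], nodes)
    right = flatten(tree[1], nodes)
    height = 1 + max(nodes[left][3], nodes[right][3])
    nodes.append((-1, left, right, height))
    return len(nodes) - 1


def batched_forward(trees, emb, w_l, w_r):
    """Compute root states for a batch of trees, one matrix product per height level."""
    nodes = []
    roots = [flatten(tree, nodes) for tree in trees]
    h = np.zeros((len(nodes), emb.shape[1]))
    max_height = max(node[3] for node in nodes)
    for level in range(max_height + 1):
        # All nodes at this level, across every tree, are processed together.
        ids = [i for i, node in enumerate(nodes) if node[3] == level]
        if level == 0:
            h[ids] = emb[[nodes[i][0] for i in ids]]
        else:
            h_left = h[[nodes[i][1] for i in ids]]
            h_right = h[[nodes[i][2] for i in ids]]
            # Row-wise form of tanh(W_l h_l + W_r h_r) for the whole level.
            h[ids] = np.tanh(h_left @ w_l.T + h_right @ w_r.T)
    return h[roots]


def recursive_forward(tree, emb, w_l, w_r):
    """Unbatched reference that processes one node at a time."""
    if isinstance(tree, int):
        return emb[tree]
    h_l = recursive_forward(tree[0], emb, w_l, w_r)
    h_r = recursive_forward(tree[1], emb, w_l, w_r)
    return np.tanh(w_l @ h_l + w_r @ h_r)
```
]]></verbatim>
        <p>The number of sequential steps equals the height of the tallest tree in the batch rather than the total number of nodes, and the batched result matches the one-node-at-a-time reference.</p>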
      </para>
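      <para>
        <p>The ternary-to-binary conversion can be sketched as a small recursive rewrite. The nesting order is an assumption: the sketch groups the first two children, so a node with children (a, b, c) becomes ((a, b), c). This section does not state which grouping was used. Node labels such as layout operators are omitted. Trees are hypothetical nested tuples with string leaves, and the function names are illustrative.</p>
        <verbatim><![CDATA[
```python
def binarize(tree):
    """Rewrite every ternary node (a, b, c) as ((a, b), c); leaves are strings."""
    if isinstance(tree, str):
        return tree
    children = [binarize(child) for child in tree]
    if len(children) == 2:
        return (children[0], children[1])
    if len(children) == 3:
        # Assumed grouping: nest the first two children.
        return ((children[0], children[1]), children[2])
    raise ValueError(f"expected 2 or 3 children, got {len(children)}")


def is_binary(tree):
    """True if every internal node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


def leaves(tree):
    """Leaves in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]
```
]]></verbatim>
        <p>The rewrite preserves the left-to-right order of the leaves, so the child-order sensitivity described above is kept. Only the grouping of the children changes.</p>
      </para>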
    </subsection>
  </section>
  <section inlist="toc" labels="LABEL:sec:exp_ph" xml:id="S3">
    <tags>
      <tag>III</tag>
      <tag role="refnum">III</tag>
      <tag role="typerefnum">§III</tag>
    </tags>
    <title><tag close=" ">III</tag><text font="smallcaps">Experiments — Pronunciation Prediction</text></title>
    <para xml:id="S3.p1">
      <p>We compared embeddings produced by treeLSTM against those produced by LSTM and biLSTM.
treeLSTM operates directly on the tree form of the logograph so that it can exploit the recursive structure of logographs most effectively.
In contrast, LSTM and biLSTM receive this structural information only implicitly, in the form of linearized trees.
Standard embeddings ignore logographic structure and treat every logograph as a distinct symbol.
As a result, they cannot learn similarities between logographs that share sub-units.
Hence, we did not compare the hierarchical embeddings against standard embeddings.</p>
    </para>
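    <para>
      <p>The sketch below illustrates the difference between tree input and linearized input. It flattens a logograph tree into a prefix token sequence, of the kind an LSTM or biLSTM would read, and then parses it back. It assumes a pre-order linearization in the style of Unicode Ideographic Description Sequences, where each ideographic description character (for example, the left-right operator) precedes its components. The exact linearization used in the experiments may differ, and the function names are hypothetical. Because every operator has a fixed arity, the sequence still determines the tree. The sequence model, however, has to infer that structure rather than receive it explicitly.</p>
      <verbatim><![CDATA[
```python
# Arity of a subset of Unicode ideographic description characters (IDCs).
ARITY = {"⿰": 2, "⿱": 2, "⿲": 3, "⿳": 3}


def linearize(tree):
    """Pre-order traversal: a node is (operator, child, ...); a leaf is a component string."""
    if isinstance(tree, str):
        return [tree]
    op, *children = tree
    tokens = [op]
    for child in children:
        tokens.extend(linearize(child))
    return tokens


def delinearize(tokens):
    """Rebuild the tree from a prefix sequence using the known operator arities."""
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token not in ARITY:
            return token  # a component (leaf)
        return (token, *[parse() for _ in range(ARITY[token])])

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after a complete tree")
    return tree
```
]]></verbatim>
      <p>For example, 湖 can be described as a left-right composition of 氵 and 胡, where 胡 is itself a left-right composition of 古 and 月. Its prefix sequence is therefore ⿰ 氵 ⿰ 古 月.</p>
    </para>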
    <subsection inlist="toc" xml:id="S3.SS1">
      <tags>
        <tag>III-A</tag>
        <tag role="refnum">III-A</tag>
        <tag role="typerefnum">§III-A</tag>
      </tags>
      <title><tag close=" ">III-A</tag><text font="italic">Data</text></title>
      <para xml:id="S3.SS1.p1">
        <p>The data was extracted from the UniHan database<note mark="4" role="footnote" xml:id="footnote4"><tags>
              <tag>4</tag>
              <tag role="refnum">4</tag>
              <tag role="typerefnum">footnote 4</tag>
            </tags><ref class="ltx_url" font="typewriter" href="https://www.unicode.org/charts/unihan.html">https://www.unicode.org/charts/unihan.html</ref></note>, which provides pronunciations of the characters of Han logographic languages.
<!--  %**** taslp2019.tex Line 275 **** -->Each entry consists of a character and its pronunciations in various languages such as Cantonese and Mandarin.
For entries with multiple pronunciations, we picked one variant at random because the dominant pronunciation is not indicated.
For this task, the input is the logographic character and the output is its Cantonese pronunciation.
The pronunciation includes onset, nucleus, and coda.
To our knowledge, lexical tones are not directly determined by logographic structure, so we did not include them as prediction targets.</p>
      </para>
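      <para>
        <p>The target preparation described above might look roughly like the following Python sketch. The text does not confirm any of these details, so they are assumptions: the Cantonese readings are taken to be Jyutping strings such as those in the Unihan kCantonese field (for example jyu5), with several readings separated by spaces. The onset and nucleus inventories and the helper names pick_reading and split_syllable are hypothetical, and the paper's exact segmentation may differ.</p>
      </para>
```python
import random
import re

# Hypothetical inventories for splitting a Jyutping syllable; they are an
# assumption for illustration, not the paper's exact unit definitions.
ONSETS = ["gw", "kw", "ng", "b", "p", "m", "f", "d", "t", "n", "l",
          "g", "k", "h", "w", "z", "c", "s", "j"]  # two-letter onsets first
NUCLEI = ["aa", "oe", "eo", "yu", "a", "e", "i", "o", "u"]  # longest first


def pick_reading(field, rng):
    """Pick one of several space-separated readings uniformly at random."""
    return rng.choice(field.split())


def split_syllable(syllable):
    """Return (onset, nucleus, coda); the tone digit is discarded."""
    base = re.sub(r"[1-6]$", "", syllable.lower())
    onset = next((o for o in ONSETS if base.startswith(o)), "")
    rest = base[len(onset):]
    if not rest:  # syllabic nasal such as "m" or "ng": no onset, no coda
        return "", base, ""
    # Fall back to the whole remainder if no listed nucleus matches.
    nucleus = next((v for v in NUCLEI if rest.startswith(v)), rest)
    return onset, nucleus, rest[len(nucleus):]


print(split_syllable("gwok3"))  # ('gw', 'o', 'k')
print(split_syllable("jyu5"))   # ('j', 'yu', '')
print(split_syllable("ng5"))    # ('', 'ng', '')
reading = pick_reading("hong4 hang4", random.Random(0))
print(split_syllable(reading))
```
      <para>
        <p>Ordering the inventories longest-first makes prefix matching prefer gw over g and ng over n. Seeding the random generator keeps the choice among variant readings reproducible.</p>
      </para>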
      <para xml:id="S3.SS1.p2">
        <p>Two types of logographs are used to write Cantonese: traditional and simplified characters.
Simplified characters, as the name implies, are derived from their traditional counterparts by removing some complex sub-units or replacing them with simpler ones.
As a result, simplified and traditional characters differ considerably in their inventories of sub-units and in their complexity.
Non-simplified characters comprise the traditional characters together with the characters that are identical in the traditional and simplified scripts.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS2">
      <tags>
        <tag>III-B</tag>
        <tag role="refnum">III-B</tag>
        <tag role="typerefnum">§III-B</tag>
      </tags>
      <title><tag close=" ">III-B</tag><text font="italic">Setup</text></title>
      <para xml:id="S3.SS2.p1">
        <p>A common weakness of deep learning models is that they often merely memorize patterns and generalize poorly to unseen data <cite class="ltx_citemacro_cite">[<bibref bibrefs="jia2017adversarial" separator="," yyseparator=","/>]</cite>.
LSTMs share this weakness: they perform well when training data are abundant and the test distribution matches the training distribution <cite class="ltx_citemacro_cite">[<bibref bibrefs="lake2018generalization" separator="," yyseparator=","/>]</cite>, but less well when the two distributions differ.
Strong generalization requires models to extrapolate to out-of-distribution data points rather than merely interpolate between in-distribution data points <cite class="ltx_citemacro_cite">[<bibref bibrefs="mitchell2018extrapolation" separator="," yyseparator=","/>]</cite>.</p>
      </para>
      <para xml:id="S3.SS2.p2">
        <p>To test the generalizability of standard LSTM and treeLSTM, we split the Unihan data into training, validation, and test sets under the three scenarios summarized in Table <ref labelref="LABEL:tbl:ph_data"/>.
In the first scenario, the training and test sets had the same distribution: both contained traditional and simplified characters.
In the second scenario, the test set contained only simplified characters, while the training set contained only non-simplified characters.
In the third scenario, the distributions differed and the training data were limited: the test set contained only simplified characters, while the training set contained only the corresponding traditional characters.</p>
      </para>
      <table inlist="lot" labels="LABEL:tbl:ph_data" placement="ht" xml:id="S3.T1">
        <tags>
          <tag><text fontsize="90%">Table I</text></tag>
          <tag role="refnum">I</tag>
          <tag role="typerefnum">Table I</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <thead>
            <tr>
              <td align="left" thead="column row">Scenario</td>
              <td align="right" thead="column">Training</td>
              <td align="right" thead="column">Validation</td>
              <td align="right" thead="column">Test</td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="t" thead="row">1. Tr, Sp<Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S3.T1.m1">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Tr, Sp</td>
              <td align="right" border="t">16000</td>
              <td align="right" border="t">2400</td>
              <td align="right" border="t">2400</td>
            </tr>
            <tr>
              <td align="left" thead="row">2. Non-Sp <Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S3.T1.m2">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Sp</td>
              <td align="right">16000</td>
              <td align="right">2400</td>
              <td align="right">2400</td>
            </tr>
            <tr>
              <td align="left" thead="row">3. Tr <Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S3.T1.m3">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Sp</td>
              <td align="right">2302</td>
              <td align="right">200</td>
              <td align="right">2400</td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">I</tag>
<text font="italic">Number of characters (logographs) used for training, validation, and testing in each scenario. Tr: Traditional, Sp: Simplified</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table I</text></tag><text fontsize="90%">
<text font="italic">Number of characters (logographs) used for training, validation, and testing in each scenario. Tr: Traditional, Sp: Simplified</text></text></caption>
      </table>
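      <para>
        <p>Scenarios 2 and 3 can be sketched from a traditional-to-simplified character mapping. This is a hypothetical illustration, not the authors' pipeline: the mapping, the function names, and the sampling procedure are all assumptions. Scenario 1 is omitted because it is an ordinary random split. Scenario 2 returns candidate pools from which fixed-size sets would then be drawn.</p>
      </para>
```python
import random


def simplified_set(trad_to_simp):
    """Characters that are simplified forms differing from their source."""
    return {s for t, s in trad_to_simp.items() if s != t}


def scenario2_pools(chars, trad_to_simp):
    """Non-Sp -> Sp: train on non-simplified characters, test on simplified."""
    simp = simplified_set(trad_to_simp)
    train_pool = [c for c in chars if c not in simp]  # traditional + shared
    test_pool = [c for c in chars if c in simp]
    return train_pool, test_pool


def scenario3_split(trad_to_simp, test_size, seed=0):
    """Tr -> Sp: sample simplified test characters, then train only on
    their traditional counterparts. One simplified character may come
    from several traditional ones, so the two sets can differ in size."""
    rng = random.Random(seed)
    simp = sorted(simplified_set(trad_to_simp))
    test = set(rng.sample(simp, min(test_size, len(simp))))
    train = sorted(t for t, s in trad_to_simp.items() if t != s and s in test)
    return train, sorted(test)


# Toy mapping: 發 and 髮 both simplify to 发; 人 is shared by both scripts.
mapping = {"語": "语", "發": "发", "髮": "发", "馬": "马", "人": "人"}
chars = ["語", "语", "發", "髮", "发", "馬", "马", "人"]
print(scenario2_pools(chars, mapping))
print(scenario3_split(mapping, test_size=3))
```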
      <para xml:id="S3.SS2.p3">
        <p>The third scenario is inspired by the observation that humans can predict the pronunciation of a simplified character from its traditional counterpart, although they may also rely on word context.
Since humans perform this task well, it should in principle be possible for models to generalize to simplified characters even when trained solely on traditional characters.
Contrasting scenarios 1 and 2 indicates whether the models merely memorize patterns or, like humans, learn the underlying rules for predicting pronunciation, since models that merely memorize patterns would do well in scenario 1 but not in scenario 2.
In addition, contrasting scenarios 2 and 3 hints at how models perform in low-resource settings with limited training data, and at whether the bias induced by the logographic structure improves generalization.
Scenario 1 is the ideal case, in which data are collected carefully and normalized; if data are collected indiscriminately, one can end up in scenario 2.</p>
      </para>
    </subsection>
    <subsection inlist="toc" labels="LABEL:ssec:task_layer" xml:id="S3.SS3">
      <tags>
        <tag>III-C</tag>
        <tag role="refnum">III-C</tag>
        <tag role="typerefnum">§III-C</tag>
      </tags>
      <title><tag close=" ">III-C</tag><text font="italic">Task-specific Layer</text></title>
      <para xml:id="S3.SS3.p1">
        <p>The task-specific layer uses the logograph embedding <Math mode="inline" tex="{\bm{h}}" text="h"><XMath><XMTok font="bold italic" role="UNKNOWN">h</XMTok></XMath></Math> to predict the logograph’s pronunciation, which consists of the onset (ON), nucleus (NU), and coda (CD).
The probability distribution over each sub-syllabic unit is given by:
</p>
        <equationgroup class="ltx_eqn_eqnarray" xml:id="Sx1.EGx2">
          <equation xml:id="S3.Ex7">
            <MathFork>
              <Math tex="\displaystyle CD=\mbox{softmax}(W^{CD}{\bm{h}})," text="C * D = [softmax] * W ^ (C * D) * h" xml:id="S3.Ex7.m4">
                <XMath>
                  <XMDual>
                    <XMRef idref="S3.Ex7.m4.1"/>
                    <XMWrap>
                      <XMApp xml:id="S3.Ex7.m4.1">
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">C</XMTok>
                          <XMTok font="italic" role="UNKNOWN">D</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMText>softmax</XMText>
                          <XMDual>
                            <XMRef idref="S3.Ex7.m4.1.1"/>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S3.Ex7.m4.1.1">
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMApp>
                                  <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                  <XMApp>
                                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">C</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">D</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMApp>
                      <XMTok role="PUNCT">,</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle CD" text="C * D" xml:id="S3.Ex7.m1">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">C</XMTok>
                          <XMTok font="italic" role="UNKNOWN">D</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S3.Ex7.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\mbox{softmax}(W^{CD}{\bm{h}})," text="[softmax] * W ^ (C * D) * h" xml:id="S3.Ex7.m3">
                      <XMath>
                        <XMDual>
                          <XMRef idref="S3.Ex7.m3.1"/>
                          <XMWrap>
                            <XMApp xml:id="S3.Ex7.m3.1">
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMText>softmax</XMText>
                              <XMDual>
                                <XMRef idref="S3.Ex7.m3.1.1"/>
                                <XMWrap>
                                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                                  <XMApp xml:id="S3.Ex7.m3.1.1">
                                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                    <XMApp>
                                      <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                      <XMApp>
                                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">C</XMTok>
                                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">D</XMTok>
                                      </XMApp>
                                    </XMApp>
                                    <XMTok font="bold italic" role="UNKNOWN">h</XMTok>
                                  </XMApp>
                                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                            <XMTok role="PUNCT">,</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation xml:id="S3.Ex8">
            <MathFork>
              <Math tex="\displaystyle NU=\mbox{softmax}(W^{NU}[{\bm{h}},CD])," text="N * U = [softmax] * W ^ (N * U) * closed-interval@(h, C * D)" xml:id="S3.Ex8.m4">
                <XMath>
                  <XMDual>
                    <XMRef idref="S3.Ex8.m4.2"/>
                    <XMWrap>
                      <XMApp xml:id="S3.Ex8.m4.2">
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">N</XMTok>
                          <XMTok font="italic" role="UNKNOWN">U</XMTok>
                        </XMApp>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMText>softmax</XMText>
                          <XMDual>
                            <XMRef idref="S3.Ex8.m4.2.1"/>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S3.Ex8.m4.2.1">
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMApp>
                                  <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                  <XMApp>
                                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">U</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMDual>
                                  <XMApp>
                                    <XMTok meaning="closed-interval"/>
                                    <XMRef idref="S3.Ex8.m4.1"/>
                                    <XMRef idref="S3.Ex8.m4.2.1.1"/>
                                  </XMApp>
                                  <XMWrap>
                                    <XMTok role="OPEN" stretchy="false">[</XMTok>
                                    <XMTok font="bold italic" role="UNKNOWN" xml:id="S3.Ex8.m4.1">h</XMTok>
                                    <XMTok role="PUNCT">,</XMTok>
                                    <XMApp xml:id="S3.Ex8.m4.2.1.1">
                                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                      <XMTok font="italic" role="UNKNOWN">C</XMTok>
                                      <XMTok font="italic" role="UNKNOWN">D</XMTok>
                                    </XMApp>
                                    <XMTok role="CLOSE" stretchy="false">]</XMTok>
                                  </XMWrap>
                                </XMDual>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMApp>
                      <XMTok role="PUNCT">,</XMTok>
                    </XMWrap>
                  </XMDual>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle NU" text="N * U" xml:id="S3.Ex8.m1">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">N</XMTok>
                          <XMTok font="italic" role="UNKNOWN">U</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S3.Ex8.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\mbox{softmax}(W^{NU}[{\bm{h}},CD])," text="[softmax] * W ^ (N * U) * closed-interval@(h, C * D)" xml:id="S3.Ex8.m3">
                      <XMath>
                        <XMDual>
                          <XMRef idref="S3.Ex8.m3.2"/>
                          <XMWrap>
                            <XMApp xml:id="S3.Ex8.m3.2">
                              <XMTok meaning="times" role="MULOP">⁢</XMTok>
                              <XMText>softmax</XMText>
                              <XMDual>
                                <XMRef idref="S3.Ex8.m3.2.1"/>
                                <XMWrap>
                                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                                  <XMApp xml:id="S3.Ex8.m3.2.1">
                                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                    <XMApp>
                                      <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                      <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                      <XMApp>
                                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">U</XMTok>
                                      </XMApp>
                                    </XMApp>
                                    <XMDual>
                                      <XMApp>
                                        <XMTok meaning="closed-interval"/>
                                        <XMRef idref="S3.Ex8.m3.1"/>
                                        <XMRef idref="S3.Ex8.m3.2.1.1"/>
                                      </XMApp>
                                      <XMWrap>
                                        <XMTok role="OPEN" stretchy="false">[</XMTok>
                                        <XMTok font="bold italic" role="UNKNOWN" xml:id="S3.Ex8.m3.1">h</XMTok>
                                        <XMTok role="PUNCT">,</XMTok>
                                        <XMApp xml:id="S3.Ex8.m3.2.1.1">
                                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                          <XMTok font="italic" role="UNKNOWN">C</XMTok>
                                          <XMTok font="italic" role="UNKNOWN">D</XMTok>
                                        </XMApp>
                                        <XMTok role="CLOSE" stretchy="false">]</XMTok>
                                      </XMWrap>
                                    </XMDual>
                                  </XMApp>
                                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                            <XMTok role="PUNCT">,</XMTok>
                          </XMWrap>
                        </XMDual>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation xml:id="S3.Ex9">
            <MathFork>
              <Math tex="\displaystyle ON=\mbox{softmax}(W^{ON}[{\bm{h}},CD,NU])" text="O * N = [softmax] * W ^ (O * N) * list@(h, C * D, N * U)" xml:id="S3.Ex9.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" role="UNKNOWN">O</XMTok>
                      <XMTok font="italic" role="UNKNOWN">N</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMText>softmax</XMText>
                      <XMDual>
                        <XMRef idref="S3.Ex9.m4.2"/>
                        <XMWrap>
                          <XMTok role="OPEN" stretchy="false">(</XMTok>
                          <XMApp xml:id="S3.Ex9.m4.2">
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMApp>
                              <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                              <XMTok font="italic" role="UNKNOWN">W</XMTok>
                              <XMApp>
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                                <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                              </XMApp>
                            </XMApp>
                            <XMDual>
                              <XMApp>
                                <XMTok meaning="list"/>
                                <XMRef idref="S3.Ex9.m4.1"/>
                                <XMRef idref="S3.Ex9.m4.2.1"/>
                                <XMRef idref="S3.Ex9.m4.2.2"/>
                              </XMApp>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">[</XMTok>
                                <XMTok font="bold italic" role="UNKNOWN" xml:id="S3.Ex9.m4.1">h</XMTok>
                                <XMTok role="PUNCT">,</XMTok>
                                <XMApp xml:id="S3.Ex9.m4.2.1">
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMTok font="italic" role="UNKNOWN">C</XMTok>
                                  <XMTok font="italic" role="UNKNOWN">D</XMTok>
                                </XMApp>
                                <XMTok role="PUNCT">,</XMTok>
                                <XMApp xml:id="S3.Ex9.m4.2.2">
                                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                  <XMTok font="italic" role="UNKNOWN">N</XMTok>
                                  <XMTok font="italic" role="UNKNOWN">U</XMTok>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">]</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                          <XMTok role="CLOSE" stretchy="false">)</XMTok>
                        </XMWrap>
                      </XMDual>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle ON" text="O * N" xml:id="S3.Ex9.m1">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">O</XMTok>
                          <XMTok font="italic" role="UNKNOWN">N</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S3.Ex9.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle\mbox{softmax}(W^{ON}[{\bm{h}},CD,NU])" text="[softmax] * W ^ (O * N) * list@(h, C * D, N * U)" xml:id="S3.Ex9.m3">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMText>softmax</XMText>
                          <XMDual>
                            <XMRef idref="S3.Ex9.m3.2"/>
                            <XMWrap>
                              <XMTok role="OPEN" stretchy="false">(</XMTok>
                              <XMApp xml:id="S3.Ex9.m3.2">
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMApp>
                                  <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                                  <XMTok font="italic" role="UNKNOWN">W</XMTok>
                                  <XMApp>
                                    <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                                  </XMApp>
                                </XMApp>
                                <XMDual>
                                  <XMApp>
                                    <XMTok meaning="list"/>
                                    <XMRef idref="S3.Ex9.m3.1"/>
                                    <XMRef idref="S3.Ex9.m3.2.1"/>
                                    <XMRef idref="S3.Ex9.m3.2.2"/>
                                  </XMApp>
                                  <XMWrap>
                                    <XMTok role="OPEN" stretchy="false">[</XMTok>
                                    <XMTok font="bold italic" role="UNKNOWN" xml:id="S3.Ex9.m3.1">h</XMTok>
                                    <XMTok role="PUNCT">,</XMTok>
                                    <XMApp xml:id="S3.Ex9.m3.2.1">
                                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                      <XMTok font="italic" role="UNKNOWN">C</XMTok>
                                      <XMTok font="italic" role="UNKNOWN">D</XMTok>
                                    </XMApp>
                                    <XMTok role="PUNCT">,</XMTok>
                                    <XMApp xml:id="S3.Ex9.m3.2.2">
                                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                      <XMTok font="italic" role="UNKNOWN">N</XMTok>
                                      <XMTok font="italic" role="UNKNOWN">U</XMTok>
                                    </XMApp>
                                    <XMTok role="CLOSE" stretchy="false">]</XMTok>
                                  </XMWrap>
                                </XMDual>
                              </XMApp>
                              <XMTok role="CLOSE" stretchy="false">)</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
        </equationgroup>
        <p>where <Math mode="inline" tex="W^{CD}" text="W ^ (C * D)" xml:id="S3.SS3.p1.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">W</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">C</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">D</XMTok>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>, <Math mode="inline" tex="W^{NU}" text="W ^ (N * U)" xml:id="S3.SS3.p1.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">W</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">U</XMTok>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> and <Math mode="inline" tex="W^{ON}" text="W ^ (O * N)" xml:id="S3.SS3.p1.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">W</XMTok>
                <XMApp>
                  <XMTok meaning="times" role="MULOP">⁢</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">O</XMTok>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">N</XMTok>
                </XMApp>
              </XMApp>
            </XMath>
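          </Math> are the weights of the fully connected layers for the coda, nucleus, and onset, respectively, and square brackets denote vector concatenation. The coda is thus predicted first, the nucleus is conditioned on the predicted coda distribution, and the onset is conditioned on both.</p>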
      </para>
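      <para>
        <p>A minimal NumPy sketch of the three equations above follows. Here h stands for the logograph embedding, whose size is assumed to equal the hidden size of 256. The inventory sizes and random weights are placeholders, not trained values. The point is the cascade: the coda distribution feeds the nucleus layer, and both feed the onset layer. As in the equations, no bias terms are used.</p>
      </para>
```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()


def predict_units(h, W_cd, W_nu, W_on):
    """Cascaded prediction: coda, then nucleus given the coda
    distribution, then onset given both."""
    cd = softmax(W_cd @ h)
    nu = softmax(W_nu @ np.concatenate([h, cd]))
    on = softmax(W_on @ np.concatenate([h, cd, nu]))
    return on, nu, cd


rng = np.random.default_rng(0)
d, n_cd, n_nu, n_on = 256, 9, 10, 20  # inventory sizes are placeholders
h = rng.standard_normal(d)
W_cd = rng.standard_normal((n_cd, d)) * 0.01
W_nu = rng.standard_normal((n_nu, d + n_cd)) * 0.01
W_on = rng.standard_normal((n_on, d + n_cd + n_nu)) * 0.01
on, nu, cd = predict_units(h, W_cd, W_nu, W_on)
print(on.shape, nu.shape, cd.shape)  # (20,) (10,) (9,)
```
      <para>
        <p>Because each later layer receives the earlier distributions as extra input, its weight matrix is wider by the size of those inventories.</p>
      </para>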
      <para xml:id="S3.SS3.p2">
        <p>The setup for treeLSTM to predict a logograph’s pronunciation using hierarchical embeddings is shown in Figure <ref labelref="LABEL:fig:ph_model"/>.</p>
      </para>
      <figure inlist="lof" labels="LABEL:fig:ph_model" placement="ht" xml:id="S3.F4">
        <tags>
          <tag><text fontsize="90%">Figure 4</text></tag>
          <tag role="refnum">4</tag>
          <tag role="typerefnum">Figure 4</tag>
        </tags>
        <graphics class="ltx_centering" graphic="fig_ph_hier" options="width=433.62pt" xml:id="S3.F4.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">4</tag>
<text font="italic">Phonological prediction model using hierarchical embeddings.
(A) The input logograph is decomposed into its logographic structure by the rule-based parser.
(B) treeLSTM constructs the hierarchical embedding from the structure.
(C) The embedding is then used to predict the pronunciation.</text>
</toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 4</text></tag><text fontsize="90%">
<text font="italic">Phonological prediction model using hierarchical embeddings.
(A) The input logograph is decomposed into its logographic structure by the rule-based parser.
(B) treeLSTM constructs the hierarchical embedding from the structure.
(C) The embedding is then used to predict the pronunciation.</text>
</text></caption>
      </figure>
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS4">
      <tags>
        <tag>III-D</tag>
        <tag role="refnum">III-D</tag>
        <tag role="typerefnum">§III-D</tag>
      </tags>
      <title><tag close=" ">III-D</tag><text font="italic">Metrics</text></title>
      <para xml:id="S3.SS4.p1">
        <p>We evaluated model performance using string error rate (SER) and token error rate (TER).
A wrongly predicted sub-syllabic unit (onset, nucleus, or coda) was counted as one token error.
An output containing at least one token error was counted as one string error.
We used a modified Obuchowski statistical test <cite class="ltx_citemacro_cite">[<bibref bibrefs="yang2010note" separator="," yyseparator=","/>]</cite> to assess whether differences in predictive performance between models were statistically significant.</p>
      </para>
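      <para>
        <p>One reading of these definitions is sketched below in Python. It assumes that each prediction and reference is an (onset, nucleus, coda) triple and that TER divides the number of wrong units by the total number of units. The function name error_rates is hypothetical, and the Obuchowski-based significance test is not shown.</p>
      </para>
```python
def error_rates(preds, refs):
    """Return (SER, TER) for lists of (onset, nucleus, coda) triples."""
    if len(preds) != len(refs):
        raise ValueError("predictions and references differ in length")
    token_errors = string_errors = n_tokens = 0
    for pred, ref in zip(preds, refs):
        errors = sum(p != r for p, r in zip(pred, ref))  # wrong units
        token_errors += errors
        string_errors += errors > 0  # any wrong unit makes a string error
        n_tokens += len(ref)
    return string_errors / len(refs), token_errors / n_tokens


refs = [("g", "o", "k"), ("j", "yu", ""), ("h", "o", "u")]
preds = [("g", "o", "k"), ("z", "yu", ""), ("h", "a", "i")]
ser, ter = error_rates(preds, refs)
print(ser, ter)  # two of three strings and three of nine units are wrong
```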
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS5">
      <tags>
        <tag>III-E</tag>
        <tag role="refnum">III-E</tag>
        <tag role="typerefnum">§III-E</tag>
      </tags>
      <title><tag close=" ">III-E</tag><text font="italic">Hyperparameters</text></title>
      <para xml:id="S3.SS5.p1">
        <p>The hidden layer size was fixed at 256.
We applied dropout <cite class="ltx_citemacro_cite">[<bibref bibrefs="srivastava14dropout" separator="," yyseparator=","/>]</cite> to the input and hidden layers to prevent overfitting.
We optimized the models using the Adam <cite class="ltx_citemacro_cite">[<bibref bibrefs="kingma2015adam" separator="," yyseparator=","/>]</cite> optimizer.
The batch size was 128.
For each model, we selected the learning rate and dropout rate by grid search.
The learning rate ranged from <Math mode="inline" tex="3\times 10^{-2}" text="3 * 10 ^ (- 2)" xml:id="S3.SS5.p1.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="times" role="MULOP">×</XMTok>
                <XMTok meaning="3" role="NUMBER">3</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok meaning="10" role="NUMBER">10</XMTok>
                  <XMApp>
                    <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                    <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> to <Math mode="inline" tex="1\times 10^{-4}" text="1 * 10 ^ (- 4)" xml:id="S3.SS5.p1.m2">
            <XMath>
              <XMApp>
                <XMTok meaning="times" role="MULOP">×</XMTok>
                <XMTok meaning="1" role="NUMBER">1</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok meaning="10" role="NUMBER">10</XMTok>
                  <XMApp>
                    <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                    <XMTok fontsize="70%" meaning="4" role="NUMBER">4</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>.
The dropout rate ranged from 0.0 to 0.5.</p>
      </para>
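      <para>
        <p>The grid search can be sketched in Python. This is a minimal sketch under stated assumptions: the text gives only the ranges, so the grid points below are hypothetical, the selection criterion (lowest development-set TER, as in Section III-F) is assumed, and train_and_eval is a hypothetical caller-supplied function standing in for actual model training.</p>
      </para>
```python
import itertools
import math

# Hypothetical grid points: the text only gives the ranges
# (learning rate 3e-2 to 1e-4, dropout 0.0 to 0.5).
LEARNING_RATES = [3e-2, 1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
DROPOUT_RATES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]


def grid_search(train_and_eval):
    """Return (dev_ter, learning_rate, dropout) with the lowest dev-set TER.

    train_and_eval(lr, dropout) is a hypothetical, caller-supplied function
    that trains one model (hidden size 256, Adam, batch size 128) and
    returns its development-set TER.
    """
    results = [
        (train_and_eval(lr, p), lr, p)
        for lr, p in itertools.product(LEARNING_RATES, DROPOUT_RATES)
    ]
    # Tuples compare element-wise, so min() picks the lowest TER first.
    return min(results)


def toy_objective(lr, dropout):
    # Stand-in for real training: smallest at lr = 1e-3, dropout = 0.3.
    return abs(math.log10(lr) + 3) + abs(dropout - 0.3)


print(grid_search(toy_objective))
```
      <para>
        <p>An exhaustive grid over 6 learning rates and 6 dropout rates costs 36 training runs per model; the toy objective only checks that the search returns the intended minimum.</p>
      </para>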
    </subsection>
    <subsection inlist="toc" labels="LABEL:ssection:linearize" xml:id="S3.SS6">
      <tags>
        <tag>III-F</tag>
        <tag role="refnum">III-F</tag>
        <tag role="typerefnum">§III-F</tag>
      </tags>
      <title><tag close=" ">III-F</tag><text font="italic">Linearization Order</text></title>
      <para xml:id="S3.SS6.p1">
        <p>Since a tree can be linearized into a sequence in multiple ways, in this section we investigated which linearization order works best for the sequence models.
We compared three schemes: pre-order, in-order, and post-order linearization.
We paired each of the five sequence models (1-layer and 2-layer LSTM, 1-layer and 2-layer biLSTM, and CNN) with each of the three schemes, which resulted in fifteen combinations.
For each combination, we conducted a hyperparameter search on the development set.
The lowest TER for each combination is reported in Table <ref labelref="LABEL:tbl:lin_order_res"/>.</p>
      </para>
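      <para>
        <p>The three linearization schemes can be sketched in Python. This is a hypothetical illustration, not the authors’ parser output: we assume that a logographic structure is a tree whose inner nodes are composition operators, written here as Unicode Ideographic Description Characters (⿰ left-right, ⿱ top-bottom), and whose leaves are sub-units. For operators with more than two children, placing the operator after the first child in the in-order scheme is also an assumption.</p>
      </para>
```python
from typing import List, Tuple, Union

# A tree is either a leaf sub-unit (str) or (operator, [children]).
Tree = Union[str, Tuple[str, list]]


def pre_order(t: Tree) -> List[str]:
    """Operator first, then each child subtree."""
    if isinstance(t, str):
        return [t]
    op, children = t
    out = [op]
    for child in children:
        out += pre_order(child)
    return out


def in_order(t: Tree) -> List[str]:
    """First child, then the operator, then the remaining children.

    For n-ary operators, placing the operator after the first child is an
    assumption; in-order traversal is only unambiguous for binary trees.
    """
    if isinstance(t, str):
        return [t]
    op, children = t
    out = in_order(children[0]) + [op]
    for child in children[1:]:
        out += in_order(child)
    return out


def post_order(t: Tree) -> List[str]:
    """Each child subtree first, then the operator."""
    if isinstance(t, str):
        return [t]
    op, children = t
    out: List[str] = []
    for child in children:
        out += post_order(child)
    return out + [op]


# 語 decomposed as ⿰(言, ⿱(五, 口)), written with Ideographic Description Characters.
yu = ("⿰", ["言", ("⿱", ["五", "口"])])
print(pre_order(yu))   # ['⿰', '言', '⿱', '五', '口']
print(in_order(yu))    # ['言', '⿰', '五', '⿱', '口']
print(post_order(yu))  # ['言', '五', '口', '⿱', '⿰']
```
      <para>
        <p>Only the positions of the operator tokens differ between the schemes; the left-to-right order of the sub-units is the same in all three.</p>
      </para>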
      <table inlist="lot" labels="LABEL:tbl:lin_order_res" placement="ht" xml:id="S3.T2">
        <tags>
          <tag><text fontsize="90%">Table II</text></tag>
          <tag role="refnum">II</tag>
          <tag role="typerefnum">Table II</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <thead>
            <tr>
              <td thead="row"/>
              <td align="right" thead="column">Pre-order</td>
              <td align="right" thead="column">Post-order</td>
              <td align="right" thead="column">In-order</td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="t" thead="row">LSTM 1-layer</td>
              <td align="right" border="t"><text font="bold">34.14</text></td>
              <td align="right" border="t">34.60</td>
              <td align="right" border="t">34.58</td>
            </tr>
            <tr>
              <td align="left" thead="row">LSTM 2-layer</td>
              <td align="right"><text font="bold">33.69</text></td>
              <td align="right">34.00</td>
              <td align="right">33.76</td>
            </tr>
            <tr>
              <td align="left" thead="row">biLSTM 1-layer</td>
              <td align="right"><text font="bold">34.46</text></td>
              <td align="right">35.04</td>
              <td align="right">34.90</td>
            </tr>
            <tr>
              <td align="left" thead="row">biLSTM 2-layer</td>
              <td align="right"><text font="bold">33.88</text></td>
              <td align="right">34.17</td>
              <td align="right">33.94</td>
            </tr>
            <tr>
              <td align="left" thead="row">CNN</td>
              <td align="right"><text font="bold">36.54</text></td>
              <td align="right">36.95</td>
              <td align="right">37.02</td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">II</tag>
<text font="italic">Lowest TER (%) on the development set for each model and linearization scheme</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table II</text></tag><text fontsize="90%">
<text font="italic">Lowest TER (%) on the development set for each model and linearization scheme</text></text></caption>
      </table>
      <para xml:id="S3.SS6.p2">
        <p>For all models, the differences in performance between linearization schemes are small (less than 0.6% absolute TER).
Nevertheless, pre-order linearization yields the lowest TER for every model.
Hence, we use pre-order linearization to convert trees into sequences in all subsequent experiments.</p>
      </para>
    </subsection>
    <subsection inlist="toc" labels="LABEL:ssection:results" xml:id="S3.SS7">
      <tags>
        <tag>III-G</tag>
        <tag role="refnum">III-G</tag>
        <tag role="typerefnum">§III-G</tag>
      </tags>
      <title><tag close=" ">III-G</tag><text font="italic">Results</text></title>
      <para xml:id="S3.SS7.p1">
        <p>Table <ref labelref="LABEL:tbl:result"/> shows the prediction results of the LSTM, biLSTM, CNN, and treeLSTM models for the three experimental scenarios listed in Table <ref labelref="LABEL:tbl:ph_data"/>.
In scenarios 1 and 2, biLSTM had slightly higher SER than LSTM and comparable TER, so we only compared LSTM against treeLSTM.
In scenario 1, where the training and test distributions were the same, treeLSTM yields 1.8% <Math mode="inline" tex="(p=2\times 10^{-4})" text="p = 2 * 10 ^ (- 4)" xml:id="S3.SS7.p1.m1">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p1.m1.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p1.m1.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">×</XMTok>
                      <XMTok meaning="2" role="NUMBER">2</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok meaning="10" role="NUMBER">10</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                          <XMTok fontsize="70%" meaning="4" role="NUMBER">4</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> and 2.0% <Math mode="inline" tex="(p=6\times 10^{-5})" text="p = 6 * 10 ^ (- 5)" xml:id="S3.SS7.p1.m2">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p1.m2.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p1.m2.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">×</XMTok>
                      <XMTok meaning="6" role="NUMBER">6</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok meaning="10" role="NUMBER">10</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                          <XMTok fontsize="70%" meaning="5" role="NUMBER">5</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> lower absolute TER (5.4% and 6.0% relative TER) than the 1-layer and 2-layer LSTM, respectively.
treeLSTM also yields 1.6% <Math mode="inline" tex="(p=0.06)" text="p = 0.06" xml:id="S3.SS7.p1.m3">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p1.m3.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p1.m3.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMTok meaning="0.06" role="NUMBER">0.06</XMTok>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> and 0.6% <Math mode="inline" tex="(p=0.4)" text="p = 0.4" xml:id="S3.SS7.p1.m4">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p1.m4.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p1.m4.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMTok meaning="0.4" role="NUMBER">0.4</XMTok>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> lower absolute SER (2.7% and 1.0% relative SER) than the 1-layer and 2-layer LSTM, respectively.
The trends are similar when individual output units (i.e., onset, nucleus, and coda) are considered.
This improvement is unlikely to be due to a higher model capacity of treeLSTM, since the 2-layer LSTM has more parameters than treeLSTM.</p>
      </para>
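      <para>
        <p>The absolute and relative reductions quoted above follow from the error rates in Table III: the absolute reduction is the baseline error rate minus the treeLSTM error rate, and the relative reduction divides that difference by the baseline error rate. A short Python check (the helper name is hypothetical):</p>
      </para>
```python
def reductions(baseline: float, model: float):
    """Absolute (percentage points) and relative (%) error-rate reduction."""
    absolute = baseline - model
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


# Scenario 1, values from Table III (treeLSTM vs. LSTM baselines).
print(reductions(33.1, 31.3))  # TER vs. 1-layer LSTM: (1.8, 5.4)
print(reductions(33.3, 31.3))  # TER vs. 2-layer LSTM: (2.0, 6.0)
print(reductions(58.5, 56.9))  # SER vs. 1-layer LSTM: (1.6, 2.7)
print(reductions(57.5, 56.9))  # SER vs. 2-layer LSTM: (0.6, 1.0)
```
      <para>
        <p>Recomputing from the rounded table entries can differ by 0.1 from the reported relative values; for example, reductions(48.5, 43.8) gives (4.7, 9.7) for scenario 2 TER against the 1-layer LSTM, whereas 9.6% is reported, possibly because the reported figures were computed from unrounded error rates.</p>
      </para>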
      <table inlist="lot" labels="LABEL:tbl:result" placement="ht" xml:id="S3.T3">
        <tags>
          <tag><text fontsize="90%">Table III</text></tag>
          <tag role="refnum">III</tag>
          <tag role="typerefnum">Table III</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <tbody>
            <tr>
              <td thead="row"/>
              <td align="right">SER</td>
              <td align="right">TER</td>
              <td align="right">On.</td>
              <td align="right">Nu.</td>
              <td align="right">Cd.</td>
            </tr>
            <tr>
              <td align="left" border="t" colspan="6" thead="row">Scenario 1: Tr, Sp <Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S3.T3.m1">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Tr, Sp</td>
            </tr>
            <tr>
              <td align="left" border="t" thead="row">LSTM 1-layer</td>
              <td align="right" border="t">58.5</td>
              <td align="right" border="t">33.1</td>
              <td align="right" border="t">42.8</td>
              <td align="right" border="t">37.5</td>
              <td align="right" border="t">19.0</td>
            </tr>
            <tr>
              <td align="left" thead="row">LSTM 2-layer</td>
              <td align="right">57.5</td>
              <td align="right">33.3</td>
              <td align="right">42.8</td>
              <td align="right">38.3</td>
              <td align="right">18.9</td>
            </tr>
            <tr>
              <td align="left" thead="row">biLSTM 1-layer</td>
              <td align="right">59.1</td>
              <td align="right">33.4</td>
              <td align="right">43.7</td>
              <td align="right">37.2</td>
              <td align="right">19.3</td>
            </tr>
            <tr>
              <td align="left" thead="row">biLSTM 2-layer</td>
              <td align="right">57.8</td>
              <td align="right">32.9</td>
              <td align="right">42.5</td>
              <td align="right">36.9</td>
              <td align="right">19.2</td>
            </tr>
            <tr>
              <td align="left" thead="row">CNN</td>
              <td align="right">62.1</td>
              <td align="right">35.9</td>
              <td align="right">45.0</td>
              <td align="right">41.3</td>
              <td align="right">21.4</td>
            </tr>
            <tr>
              <td align="left" thead="row"><text font="bold">treeLSTM</text></td>
              <td align="right"><text font="bold">56.9</text></td>
              <td align="right"><text font="bold">31.3</text></td>
              <td align="right"><text font="bold">40.9</text></td>
              <td align="right"><text font="bold">35.7</text></td>
              <td align="right"><text font="bold">17.3</text></td>
            </tr>
            <tr>
              <td align="left" border="t" colspan="6" thead="row">Scenario 2: Non-Sp <Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S3.T3.m2">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Sp</td>
            </tr>
            <tr>
              <td align="left" border="t" thead="row">LSTM 1-layer</td>
              <td align="right" border="t">73.5</td>
              <td align="right" border="t">48.5</td>
              <td align="right" border="t">57.3</td>
              <td align="right" border="t">53.0</td>
              <td align="right" border="t">35.3</td>
            </tr>
            <tr>
              <td align="left" thead="row">LSTM 2-layer</td>
              <td align="right">71.3</td>
              <td align="right">45.8</td>
              <td align="right">55.5</td>
              <td align="right">50.0</td>
              <td align="right">32.0</td>
            </tr>
            <tr>
              <td align="left" thead="row">biLSTM 1-layer</td>
              <td align="right">74.1</td>
              <td align="right">48.4</td>
              <td align="right">57.2</td>
              <td align="right">53.0</td>
              <td align="right">35.0</td>
            </tr>
            <tr>
              <td align="left" thead="row">biLSTM 2-layer</td>
              <td align="right">71.5</td>
              <td align="right">47.0</td>
              <td align="right">56.0</td>
              <td align="right">50.9</td>
              <td align="right">34.0</td>
            </tr>
            <tr>
              <td align="left" thead="row">CNN</td>
              <td align="right">79.1</td>
              <td align="right">52.1</td>
              <td align="right">62.4</td>
              <td align="right">56.9</td>
              <td align="right">37.1</td>
            </tr>
            <tr>
              <td align="left" thead="row"><text font="bold">treeLSTM</text></td>
              <td align="right"><text font="bold">69.6</text></td>
              <td align="right"><text font="bold">43.8</text></td>
              <td align="right"><text font="bold">51.8</text></td>
              <td align="right"><text font="bold">48.6</text></td>
              <td align="right"><text font="bold">31.0</text></td>
            </tr>
            <tr>
              <td align="left" border="t" colspan="6" thead="row">Scenario 3: Tr <Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S3.T3.m3">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Sp</td>
            </tr>
            <tr>
              <td align="left" border="t" thead="row">LSTM 1-layer</td>
              <td align="right" border="t">77.2</td>
              <td align="right" border="t">55.5</td>
              <td align="right" border="t">62.2</td>
              <td align="right" border="t">59.5</td>
              <td align="right" border="t">44.8</td>
            </tr>
            <tr>
              <td align="left" thead="row">LSTM 2-layer</td>
              <td align="right">77.4</td>
              <td align="right">57.7</td>
              <td align="right">65.2</td>
              <td align="right">61.3</td>
              <td align="right">46.4</td>
            </tr>
            <tr>
              <td align="left" thead="row">biLSTM 1-layer</td>
              <td align="right">73.5</td>
              <td align="right">51.6</td>
              <td align="right">57.9</td>
              <td align="right">55.2</td>
              <td align="right">41.8</td>
            </tr>
            <tr>
              <td align="left" thead="row">biLSTM 2-layer</td>
              <td align="right">75.7</td>
              <td align="right">55.4</td>
              <td align="right">62.0</td>
              <td align="right">60.5</td>
              <td align="right">43.7</td>
            </tr>
            <tr>
              <td align="left" thead="row">CNN</td>
              <td align="right">70.5</td>
              <td align="right">48.1</td>
              <td align="right">54.1</td>
              <td align="right">49.7</td>
              <td align="right">40.5</td>
            </tr>
            <tr>
              <td align="left" thead="row"><text font="bold">treeLSTM</text></td>
              <td align="right"><text font="bold">68.8</text></td>
              <td align="right"><text font="bold">47.7</text></td>
              <td align="right"><text font="bold">53.7</text></td>
              <td align="right"><text font="bold">50.7</text></td>
              <td align="right"><text font="bold">38.9</text></td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">III</tag>
<text font="italic">Cantonese phoneme prediction error rates (%). On.: onset, Nu.: nucleus, Cd.: coda. Tr: Traditional, Sp: Simplified</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table III</text></tag><text fontsize="90%">
<text font="italic">Cantonese phoneme prediction error rates (%). On.: onset, Nu.: nucleus, Cd.: coda. Tr: Traditional, Sp: Simplified</text></text></caption>
      </table>
      <para xml:id="S3.SS7.p2">
        <p>When the training and test distributions differ (scenario 2), models with better inductive biases are expected to generalize better <cite class="ltx_citemacro_cite">[<bibref bibrefs="haussler1988quantifying" separator="," yyseparator=","/>]</cite>.
For example, the convolution operation in a convolutional neural network (CNN) has a translation-equivariance bias <cite class="ltx_citemacro_cite">[<bibref bibrefs="Goodfellow-et-al-2016" separator="," yyseparator=","/>]</cite>.
This bias ensures that an object is detected by the same features regardless of its position in an image.
As a result, CNNs generalize better and require fewer training samples than fully-connected neural networks on image data.
For logographs, the analogous inductive bias is that sub-units interact locally in space.
The treeLSTM model enforces this bias, since a child node only interacts with its siblings.
Consequently, the hierarchical embeddings should be more data-efficient than the LSTM.
The results in Table <ref labelref="LABEL:tbl:result"/> indicate that treeLSTM generalizes better than the LSTM models even when the test set contains out-of-distribution samples.
treeLSTM yields 4.7% <Math mode="inline" tex="(p&lt;1\times 10^{-12})" text="p less 1 * 10 ^ (- 12)" xml:id="S3.SS7.p2.m1">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p2.m1.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p2.m1.1">
                    <XMTok meaning="less-than" role="RELOP">&lt;</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">×</XMTok>
                      <XMTok meaning="1" role="NUMBER">1</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok meaning="10" role="NUMBER">10</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                          <XMTok fontsize="70%" meaning="12" role="NUMBER">12</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> and 2.0% <Math mode="inline" tex="(p=6\times 10^{-4})" text="p = 6 * 10 ^ (- 4)" xml:id="S3.SS7.p2.m2">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p2.m2.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p2.m2.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">×</XMTok>
                      <XMTok meaning="6" role="NUMBER">6</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok meaning="10" role="NUMBER">10</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                          <XMTok fontsize="70%" meaning="4" role="NUMBER">4</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> lower absolute TER (9.6% and 4.3% relative TER) than the 1-layer and 2-layer LSTM, respectively.
In addition, treeLSTM yields 3.9% <Math mode="inline" tex="(p=3\times 10^{-6})" text="p = 3 * 10 ^ (- 6)" xml:id="S3.SS7.p2.m3">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p2.m3.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p2.m3.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">×</XMTok>
                      <XMTok meaning="3" role="NUMBER">3</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok meaning="10" role="NUMBER">10</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                          <XMTok fontsize="70%" meaning="6" role="NUMBER">6</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> and 1.7% <Math mode="inline" tex="(p=3\times 10^{-2})" text="p = 3 * 10 ^ (- 2)" xml:id="S3.SS7.p2.m4">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p2.m4.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p2.m4.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">×</XMTok>
                      <XMTok meaning="3" role="NUMBER">3</XMTok>
                      <XMApp>
                        <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                        <XMTok meaning="10" role="NUMBER">10</XMTok>
                        <XMApp>
                          <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                          <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> lower absolute SER (5.3% and 2.3% relative SER) than the 1-layer and 2-layer LSTM, respectively.
The trends are similar when individual sub-syllabic classes (i.e., onset, nucleus, coda) are considered.</p>
      </para>
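      <para>
        <p>The locality bias described above can be illustrated with a minimal N-ary Tree-LSTM cell in the style of Tai et al. (2015), written with NumPy. This is a sketch under stated assumptions, not the model used in the paper: the dimensions, the random untrained parameters, the maximum arity of three, and the choice to feed each inner node its operator embedding as input (loosely analogous to the operator-dependent V terms mentioned in Section III-H) are all hypothetical. Each node combines only the states of its own children, so sub-units interact only through their shared ancestors.</p>
      </para>
```python
import numpy as np

D = 8  # hypothetical embedding/hidden size for the demo (the paper uses 256)
N = 3  # assumed maximum number of children per operator
rng = np.random.default_rng(0)

# Randomly initialized, untrained parameters (illustration only).
W = {g: rng.normal(scale=0.1, size=(D, D)) for g in "ifou"}
U = {g: [rng.normal(scale=0.1, size=(D, D)) for _ in range(N)] for g in "iou"}
U_f = [[rng.normal(scale=0.1, size=(D, D)) for _ in range(N)] for _ in range(N)]
b = {g: np.zeros(D) for g in "ifou"}
# Hypothetical embeddings for operators (inner nodes) and sub-units (leaves).
emb = {s: rng.normal(size=D) for s in ["⿰", "⿱", "言", "五", "口"]}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cell(x, children):
    """N-ary Tree-LSTM cell; children is an ordered list of (h, c) pairs."""
    if len(children) > N:
        raise ValueError("too many children for this cell")
    hs = [h for h, _ in children]

    def mix(mats):
        # Position-specific matrices: the node sees only its own children,
        # and swapping two children changes the result.
        return sum((m @ h for m, h in zip(mats, hs)), np.zeros(D))

    i = sigmoid(W["i"] @ x + mix(U["i"]) + b["i"])
    o = sigmoid(W["o"] @ x + mix(U["o"]) + b["o"])
    u = np.tanh(W["u"] @ x + mix(U["u"]) + b["u"])
    c = i * u
    for k, (_, c_k) in enumerate(children):
        # One forget gate per child position.
        f_k = sigmoid(W["f"] @ x + mix(U_f[k]) + b["f"])
        c = c + f_k * c_k
    return o * np.tanh(c), c


def encode(tree):
    """Bottom-up encoding; a tree is a leaf str or (operator, [children])."""
    if isinstance(tree, str):
        return cell(emb[tree], [])
    op, kids = tree
    return cell(emb[op], [encode(k) for k in kids])


h, c = encode(("⿰", ["言", ("⿱", ["五", "口"])]))
h_swapped, _ = encode(("⿰", ["言", ("⿱", ["口", "五"])]))
print(h.shape)                    # (8,)
print(np.allclose(h, h_swapped))  # False: child order matters
```
      <para>
        <p>Because each child position has its own matrices, the root embedding depends on both the grouping and the order of the sub-units, while no node ever reads the state of a node outside its own subtree.</p>
      </para>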
      <para xml:id="S3.SS7.p3">
        <p>When the training and test distributions differ and the amount of training data is limited, good inductive biases are even more important for generalization.
Comparing scenarios 2 and 3, treeLSTM is less affected than LSTM by the limited training data.
In this limited-data regime, the 2-layer LSTM clearly overfits, with higher SER and TER than both the 1-layer LSTM and treeLSTM.
Interestingly, the CNN is the most competitive baseline in scenario 3, although it performs worse than the LSTM and biLSTM models when more training data are available (scenarios 1 and 2).
However, compared to the CNN, the treeLSTM still has lower SER <Math mode="inline" tex="(p=0.07)" text="p = 0.07" xml:id="S3.SS7.p3.m1">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p3.m1.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p3.m1.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMTok meaning="0.07" role="NUMBER">0.07</XMTok>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math> and TER <Math mode="inline" tex="(p=0.5)" text="p = 0.5" xml:id="S3.SS7.p3.m2">
            <XMath>
              <XMDual>
                <XMRef idref="S3.SS7.p3.m2.1"/>
                <XMWrap>
                  <XMTok role="OPEN" stretchy="false">(</XMTok>
                  <XMApp xml:id="S3.SS7.p3.m2.1">
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMTok font="italic" role="UNKNOWN">p</XMTok>
                    <XMTok meaning="0.5" role="NUMBER">0.5</XMTok>
                  </XMApp>
                  <XMTok role="CLOSE" stretchy="false">)</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
          </Math>, although neither difference is statistically significant at the 0.05 level.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS8">
      <tags>
        <tag>III-H</tag>
        <tag role="refnum">III-H</tag>
        <tag role="typerefnum">§III-H</tag>
      </tags>
      <title><tag close=" ">III-H</tag><text font="italic">Ablation</text></title>
      <para xml:id="S3.SS8.p1">
        <p>We conducted ablation experiments to measure how much each model depends on the composition operators.
Without the operators, the LSTM, biLSTM, and CNN cannot discern the hierarchical grouping of sub-units.
In contrast, even without the composition operators, the treeLSTM model still receives structural information from the tree itself, i.e., the grouping and ordering of the sub-units.</p>
      </para>
      <table inlist="lot" labels="LABEL:tbl:ablation" placement="ht" xml:id="S3.T4">
        <tags>
          <tag><text fontsize="90%">Table IV</text></tag>
          <tag role="refnum">IV</tag>
          <tag role="typerefnum">Table IV</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <thead>
            <tr>
              <td border="r" thead="column row"/>
              <td align="center" border="r" colspan="2" thead="column">With operators</td>
              <td align="center" colspan="2" thead="column">Without operators</td>
            </tr>
            <tr>
              <td align="left" border="r t" thead="column row">Model</td>
              <td align="right" border="t" thead="column">SER</td>
              <td align="right" border="r t" thead="column">TER</td>
              <td align="right" border="t" thead="column">SER</td>
              <td align="right" border="t" thead="column">TER</td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="r t" thead="row">LSTM 1-layer</td>
              <td align="right" border="t">58.5</td>
              <td align="right" border="r t">33.1</td>
              <td align="right" border="t">62.0</td>
              <td align="right" border="t">35.5</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">LSTM 2-layer</td>
              <td align="right">57.5</td>
              <td align="right" border="r">33.3</td>
              <td align="right">59.4</td>
              <td align="right">34.3</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">biLSTM 1-layer</td>
              <td align="right">59.1</td>
              <td align="right" border="r">33.4</td>
              <td align="right">63.8</td>
              <td align="right">36.5</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">biLSTM 2-layer</td>
              <td align="right">57.8</td>
              <td align="right" border="r">32.9</td>
              <td align="right">61.3</td>
              <td align="right">35.5</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">CNN</td>
              <td align="right">62.1</td>
              <td align="right" border="r">35.9</td>
              <td align="right">67.2</td>
              <td align="right">40.2</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">treeLSTM</td>
              <td align="right">56.9</td>
              <td align="right" border="r">31.3</td>
              <td align="right">57.3</td>
              <td align="right">32.0</td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">IV</tag>
<text font="italic">Ablation results (% error rate) on the scenario 1 test set, with and without composition operators</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table IV</text></tag><text fontsize="90%">
<text font="italic">Ablation results (% error rate) on the scenario 1 test set, with and without composition operators</text></text></caption>
      </table>
      <para xml:id="S3.SS8.p2">
        <p>In order to implement the case where there is no composition operators in the input,
for the LSTM, biLSTM, and CNN models, the operators were removed from the input sequences.
For the treeLSTM model, all the <Math mode="inline" tex="{\bm{V}}" text="V" xml:id="S3.SS8.p2.m1">
            <XMath>
              <XMTok font="bold italic" role="UNKNOWN">V</XMTok>
            </XMath>
          </Math> terms were removed from the inner-node equations.
We searched for the best hyperparameters for each model using the development set of scenario 1.
We chose scenario 1 because it is the standard split, i.e., the most common way of dividing data into training, validation, and test sets.
The results are shown in Table <ref labelref="LABEL:tbl:ablation"/>.
Removing the composition operators increases the error rates of every model, which indicates that the operators provide salient information for the task.
However, the treeLSTM degrades only slightly, suggesting that it learns to compose the sub-units chiefly from the tree structure rather than from the operators.</p>
      </para>
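      <para>
        <p>For the sequential models, the ablation amounts to filtering operator tokens out of each character's decomposition sequence. The sketch below is illustrative only and is not the authors' preprocessing code: it assumes, hypothetically, that decompositions are written as Ideographic Description Sequences, whose composition operators lie in the Unicode block U+2FF0 to U+2FFF. The function names <text font="typewriter">strip_operators</text> and <text font="typewriter">is_composition_operator</text> and the example decomposition are our own.</p>
```python
from typing import List

# Assumption: composition operators are Ideographic Description Characters,
# i.e. code points in the Unicode block U+2FF0..U+2FFF.
IDC_BLOCK = range(0x2FF0, 0x3000)


def is_composition_operator(token: str) -> bool:
    """True if `token` is a single Ideographic Description Character."""
    return len(token) == 1 and ord(token) in IDC_BLOCK


def strip_operators(sequence: List[str]) -> List[str]:
    """Drop composition operators, keeping the sub-unit tokens in order."""
    return [tok for tok in sequence if not is_composition_operator(tok)]


# Example: a left-right decomposition of the character "good" into two sub-units.
with_ops = ["\u2ff0", "女", "子"]   # left-right operator, then its two sub-units
print(strip_operators(with_ops))   # ['女', '子']
```
        <p>The filter leaves the sub-units untouched; for the treeLSTM the ablation is applied inside the model instead, by dropping the operator terms from the inner-node equations, which this sketch does not cover.</p>
      </para>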
    </subsection>
    <subsection inlist="toc" xml:id="S3.SS9">
      <tags>
        <tag>III-I</tag>
        <tag role="refnum">III-I</tag>
        <tag role="typerefnum">§III-I</tag>
      </tags>
      <title><tag close=" ">III-I</tag><text font="italic">Prediction Order of Output Phonemes</text></title>
      <para xml:id="S3.SS9.p1">
        <p>The phonetic subunits in Chinese characters usually predict the nucleus and coda more reliably than the onset.
This trend can be seen in Figure <ref labelref="LABEL:fig:example"/>, where the first four characters, which share the same phonetic subunit, all have the same nucleus and coda.
However, the most effective ordering of inputs and outputs in machine learning may not align with human intuition.
For example, reversing the order of the input sentence boosted the performance of machine translation <cite class="ltx_citemacro_cite">[<bibref bibrefs="sutskever2014sequence" separator="," yyseparator=","/>]</cite>, while swapping the order of onsets and nuclei in Thai syllables boosted the performance of English-to-Thai transliteration <cite class="ltx_citemacro_cite">[<bibref bibrefs="nguyen2016regulating" separator="," yyseparator=","/>]</cite>.
In this paper, we adopted the Coda-Nucleus-Onset prediction order described in Section <ref labelref="LABEL:ssec:task_layer"/>.
We also experimented with the Onset-Nucleus-Coda order: we replaced the task-specific layer of the proposed model accordingly, searched for the optimal hyperparameters, and applied the model with the best hyperparameters to the test set.
Empirically, the two orders performed similarly (Table <ref labelref="LABEL:tbl:out_order"/>), with Coda-Nucleus-Onset yielding slightly lower SER and TER (56.9 and 31.3 versus 57.3 and 32.0).</p>
      </para>
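      <para>
        <p>Switching between the two orders changes only the sequence in which the three syllable components are emitted, so targets can be permuted before training and restored after decoding. A minimal sketch, assuming each target syllable is stored as an (onset, nucleus, coda) triple; the helper names <text font="typewriter">to_prediction_order</text> and <text font="typewriter">from_prediction_order</text> and the sample syllable are hypothetical.</p>
```python
from typing import Sequence, Tuple

COMPONENTS = ("onset", "nucleus", "coda")      # storage order of a target syllable
CODA_NUCLEUS_ONSET = ("coda", "nucleus", "onset")
ONSET_NUCLEUS_CODA = ("onset", "nucleus", "coda")


def to_prediction_order(syllable: Tuple[str, str, str],
                        order: Sequence[str]) -> Tuple[str, ...]:
    """Permute an (onset, nucleus, coda) triple into the order the model emits it."""
    parts = dict(zip(COMPONENTS, syllable))
    return tuple(parts[name] for name in order)


def from_prediction_order(predicted: Sequence[str],
                          order: Sequence[str]) -> Tuple[str, str, str]:
    """Map components emitted in `order` back to (onset, nucleus, coda)."""
    parts = dict(zip(order, predicted))
    return parts["onset"], parts["nucleus"], parts["coda"]


syllable = ("m", "a", "n")                         # hypothetical target syllable
target = to_prediction_order(syllable, CODA_NUCLEUS_ONSET)
print(target)                                      # ('n', 'a', 'm')
print(from_prediction_order(target, CODA_NUCLEUS_ONSET))  # ('m', 'a', 'n')
```
        <p>If the task-specific layer conditions each component on those already emitted, the Coda-Nucleus-Onset order places the components that the phonetic subunit predicts most reliably first; this is one plausible reading of the design, not a claim made by the experiments above.</p>
      </para>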
      <table inlist="lot" labels="LABEL:tbl:out_order" placement="ht" xml:id="S3.T5">
        <tags>
          <tag><text fontsize="90%">Table V</text></tag>
          <tag role="refnum">V</tag>
          <tag role="typerefnum">Table V</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <thead>
            <tr>
              <td align="left" thead="column">Output order</td>
              <td align="right" thead="column">SER</td>
              <td align="right" thead="column">TER</td>
              <td align="right" thead="column">Onset</td>
              <td align="right" thead="column">Nucleus</td>
              <td align="right" thead="column">Coda</td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="t">Coda-Nucleus-Onset</td>
              <td align="right" border="t">56.9</td>
              <td align="right" border="t">31.3</td>
              <td align="right" border="t">40.9</td>
              <td align="right" border="t">35.7</td>
              <td align="right" border="t">17.3</td>
            </tr>
            <tr>
              <td align="left">Onset-Nucleus-Coda</td>
              <td align="right">57.3</td>
              <td align="right">32.0</td>
              <td align="right">42.1</td>
              <td align="right">36.8</td>
              <td align="right">17.2</td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">V</tag>
<text font="italic">Comparing different orders of predicting output phonemes. treeLSTM results on the test set of scenario 1</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table V</text></tag><text fontsize="90%">
<text font="italic">Comparing different orders of predicting output phonemes. treeLSTM results on the test set of scenario 1</text></text></caption>
      </table>
    </subsection>
  </section>
  <section inlist="toc" labels="LABEL:sec:exp_lm" xml:id="S4">
    <tags>
      <tag>IV</tag>
      <tag role="refnum">IV</tag>
      <tag role="typerefnum">§IV</tag>
    </tags>
    <title><tag close=" ">IV</tag><text font="smallcaps">Experiments — Language Modeling</text></title>
    <para xml:id="S4.p1">
      <p>We evaluated how much hierarchical embeddings improve Chinese language modeling.
Because hierarchical embeddings incorporate semantic information from the sub-units while standard embeddings do not, comparing the two quantifies the usefulness of sub-unit semantic information.</p>
    </para>
    <subsection inlist="toc" xml:id="S4.SS1">
      <tags>
        <tag>IV-A</tag>
        <tag role="refnum">IV-A</tag>
        <tag role="typerefnum">§IV-A</tag>
      </tags>
      <title><tag close=" ">IV-A</tag><text font="italic">Data</text></title>
      <para xml:id="S4.SS1.p1">
        <p>Because the characters (logographs) in the output of a language model are not independent, it is difficult to design meaningful statistical tests to evaluate the effectiveness of our proposed approach.
Instead, we chose five diverse datasets: three using simplified characters (Chinese Penn Treebank (CTB) Version 5.1 <cite class="ltx_citemacro_cite">[<bibref bibrefs="xue2005penn" separator="," yyseparator=","/>]</cite>, Peking University (PKU) dataset <cite class="ltx_citemacro_cite">[<bibref bibrefs="emerson2005second" separator="," yyseparator=","/>]</cite>, and Microsoft Research (MSR) dataset <cite class="ltx_citemacro_cite">[<bibref bibrefs="emerson2005second" separator="," yyseparator=","/>]</cite>) and two using traditional characters (City University of Hong Kong (CITYU) dataset <cite class="ltx_citemacro_cite">[<bibref bibrefs="emerson2005second" separator="," yyseparator=","/>]</cite> and Academia Sinica (AS) dataset <cite class="ltx_citemacro_cite">[<bibref bibrefs="emerson2005second" separator="," yyseparator=","/>]</cite>).
Consistent improvements across these datasets would indicate that the proposed hierarchical embeddings are effective.
Table <ref labelref="LABEL:tbl:lm_data"/> shows the data split for each dataset.
The data splits for the CTB and PKU datasets were taken from <cite class="ltx_citemacro_cite">[<bibref bibrefs="kawakami2018unsupervised" separator="," yyseparator=","/>]</cite>.<note mark="5" role="footnote" xml:id="footnote5"><tags>
              <tag>5</tag>
              <tag role="refnum">5</tag>
              <tag role="typerefnum">footnote 5</tag>
            </tags><ref class="ltx_url" font="typewriter" href="https://s3.eu-west-2.amazonaws.com/k-kawakami/seg.zip">https://s3.eu-west-2.amazonaws.com/k-kawakami/seg.zip</ref></note></p>
      </para>
      <table inlist="lot" labels="LABEL:tbl:lm_data" placement="ht" xml:id="S4.T6">
        <tags>
          <tag><text fontsize="90%">Table VI</text></tag>
          <tag role="refnum">VI</tag>
          <tag role="typerefnum">Table VI</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <thead>
            <tr>
              <td align="left" thead="column row">Dataset</td>
              <td align="right" thead="column">Training</td>
              <td align="right" thead="column">Validation</td>
              <td align="right" thead="column">Test</td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="t" thead="row">CTB (Simplified)</td>
              <td align="right" border="t">50,734</td>
              <td align="right" border="t">349</td>
              <td align="right" border="t">345</td>
            </tr>
            <tr>
              <td align="left" thead="row">PKU (Simplified)</td>
              <td align="right">17,149</td>
              <td align="right">1,841</td>
              <td align="right">1,790</td>
            </tr>
            <tr>
              <td align="left" thead="row">MSR (Simplified)</td>
              <td align="right">83,000</td>
              <td align="right">3,924</td>
              <td align="right">3,985</td>
            </tr>
            <tr>
              <td align="left" thead="row">CITYU (Traditional)</td>
              <td align="right">51,000</td>
              <td align="right">2,019</td>
              <td align="right">1,493</td>
            </tr>
            <tr>
              <td align="left" thead="row">AS (Traditional)</td>
              <td align="right">690,000</td>
              <td align="right">18,953</td>
              <td align="right">14,431</td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">VI</tag>
<text font="italic">Number of sentences in the training, validation, and test sets in each of the datasets.</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table VI</text></tag><text fontsize="90%">
<text font="italic">Number of sentences in the training, validation, and test sets in each of the datasets.</text></text></caption>
      </table>
    </subsection>
    <subsection inlist="toc" xml:id="S4.SS2">
      <tags>
        <tag>IV-B</tag>
        <tag role="refnum">IV-B</tag>
        <tag role="typerefnum">§IV-B</tag>
      </tags>
      <title><tag close=" ">IV-B</tag><text font="italic">Setup</text></title>
      <para xml:id="S4.SS2.p1">
        <p>We used the AWD-LSTM (ASGD Weight-Dropped LSTM) model <cite class="ltx_citemacro_cite">[<bibref bibrefs="merity2017regularizing" separator="," yyseparator=","/>]</cite> as the core of the language modeling experiments.
The input to the AWD-LSTM is either hierarchical embeddings (Figure <ref labelref="LABEL:fig:lm_model_hier"/>) or standard character embeddings (Figure <ref labelref="LABEL:fig:lm_model_bl"/>).
Standard character embeddings serve as the baseline.
We trained each model on the training set for a fixed number of epochs and used the validation set to select the best model.
After training finished, the selected model was evaluated on the test set.</p>
      </para>
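      <para>
        <p>The training protocol above (a fixed epoch budget, model selection on the validation set, and a single final test evaluation) can be sketched in a framework-agnostic way. This is a hedged sketch, not code from the AWD-LSTM release: <text font="typewriter">train_and_select</text>, <text font="typewriter">train_one_epoch</text>, and <text font="typewriter">evaluate</text> are hypothetical callbacks, and lower validation loss (e.g., BPC) is assumed to be the selection criterion.</p>
```python
import copy
from typing import Callable, Tuple


def train_and_select(model,
                     train_one_epoch: Callable[[object], None],
                     evaluate: Callable[[object, str], float],
                     num_epochs: int) -> Tuple[object, float]:
    """Train for a fixed number of epochs, keep the snapshot with the lowest
    validation loss, and score only that snapshot on the test set."""
    if not num_epochs > 0:
        raise ValueError("num_epochs must be positive")
    best_model, best_val = None, float("inf")
    for _ in range(num_epochs):
        train_one_epoch(model)
        val_loss = evaluate(model, "validation")
        if best_val > val_loss:            # strictly better on validation
            best_val = val_loss
            best_model = copy.deepcopy(model)
    return best_model, evaluate(best_model, "test")


# Toy usage: the "model" is a dict whose epoch counter indexes a made-up loss curve.
val_curve = [5.0, 4.6, 4.4, 4.5, 4.7]      # fictitious validation BPC per epoch


def fake_train(m):
    m["epoch"] += 1


def fake_eval(m, split):
    loss = val_curve[m["epoch"] - 1]
    return loss if split == "validation" else loss + 0.1


best, test_loss = train_and_select({"epoch": 0}, fake_train, fake_eval, 5)
print(best["epoch"], round(test_loss, 2))   # 3 4.5
```
        <p>Copying the best snapshot, rather than the final one, means the test score reflects the epoch chosen on validation data only.</p>
      </para>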
      <figure inlist="lof" placement="ht" xml:id="S4.F5">
        <tags>
          <tag><text fontsize="90%">Figure 5</text></tag>
          <tag role="refnum">5</tag>
          <tag role="typerefnum">Figure 5</tag>
        </tags>
        <figure align="center" inlist="lof" labels="LABEL:fig:lm_model_bl" xml:id="S4.F4.sf1">
          <tags>
            <tag><text fontsize="90%">(a)</text></tag>
            <tag role="refnum">5(a)</tag>
          </tags>
          <graphics class="ltx_centering" graphic="fig_lm_bl" options="width=433.62pt" xml:id="S4.F4.sf1.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">(a)</tag>
<text font="italic">Standard embeddings (baseline)
</text></toccaption>
          <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(a)</text></tag><text fontsize="90%">
<text font="italic">Standard embeddings (baseline)
</text></text></caption>
        </figure>
        <figure align="center" inlist="lof" labels="LABEL:fig:lm_model_hier" xml:id="S4.F4.sf2">
          <tags>
            <tag><text fontsize="90%">(b)</text></tag>
            <tag role="refnum">5(b)</tag>
          </tags>
          <graphics class="ltx_centering" graphic="fig_lm_hier" options="width=433.62pt" xml:id="S4.F4.sf2.g1"/>
          <toccaption class="ltx_centering"><tag close=" ">(b)</tag>
<text font="italic">Hierarchical embeddings (proposed)
</text></toccaption>
          <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(b)</text></tag><text fontsize="90%">
<text font="italic">Hierarchical embeddings (proposed)
</text></text></caption>
        </figure>
        <toccaption class="ltx_centering"><tag close=" ">5</tag><text font="italic">Language model (LM)</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 5</text></tag><text font="italic" fontsize="90%">Language model (LM)</text></caption>
      </figure>
    </subsection>
    <subsection inlist="toc" xml:id="S4.SS3">
      <tags>
        <tag>IV-C</tag>
        <tag role="refnum">IV-C</tag>
        <tag role="typerefnum">§IV-C</tag>
      </tags>
      <title><tag close=" ">IV-C</tag><text font="italic">Metrics</text></title>
      <para xml:id="S4.SS3.p1">
        <p>We evaluated model performance using perplexity (PPL) and bits per character (BPC).
BPC is a standard evaluation metric for character-level LMs <cite class="ltx_citemacro_cite">[<bibref bibrefs="graves2013generating" separator="," yyseparator=","/>]</cite>.</p>
      </para>
      <para xml:id="S4.SS3.p2">
        <equationgroup class="ltx_eqn_eqnarray" xml:id="Sx1.EGx3">
          <equation xml:id="S4.Ex10">
            <MathFork>
              <Math tex="\displaystyle BPC=-\frac{1}{|\bm{x}|}\sum{\log_{2}{p(x_{t}|\bm{x}_{&lt;t})}}" text="B * P * C = - (1 / absolute-value@(x)) * sum@((logarithm _ 2)@(p) * conditional@(x _ t, x _ (absent less t)))" xml:id="S4.Ex10.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" role="UNKNOWN">B</XMTok>
                      <XMTok font="italic" role="UNKNOWN">P</XMTok>
                      <XMTok font="italic" role="UNKNOWN">C</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok meaning="minus" role="ADDOP">-</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMApp>
                          <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                          <XMTok meaning="1" role="NUMBER">1</XMTok>
                          <XMDual>
                            <XMApp>
                              <XMTok meaning="absolute-value"/>
                              <XMRef idref="S4.Ex10.m4.1"/>
                            </XMApp>
                            <XMWrap>
                              <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                              <XMTok font="bold italic" role="UNKNOWN" xml:id="S4.Ex10.m4.1">x</XMTok>
                              <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                            </XMWrap>
                          </XMDual>
                        </XMApp>
                        <XMApp>
                          <XMTok mathstyle="display" meaning="sum" role="SUMOP" scriptpos="mid">∑</XMTok>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMApp>
                              <XMApp>
                                <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                <XMTok meaning="logarithm" role="OPFUNCTION">log</XMTok>
                                <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                              </XMApp>
                              <XMTok font="italic" role="UNKNOWN">p</XMTok>
                            </XMApp>
                            <XMDual>
                              <XMRef idref="S4.Ex10.m4.2"/>
                              <XMWrap>
                                <XMTok role="OPEN" stretchy="false">(</XMTok>
                                <XMApp xml:id="S4.Ex10.m4.2">
                                  <XMTok meaning="conditional" role="MODIFIEROP" stretchy="false">|</XMTok>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post4"/>
                                    <XMTok font="italic" role="UNKNOWN">x</XMTok>
                                    <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                  </XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post4"/>
                                    <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                    <XMApp>
                                      <XMTok fontsize="70%" meaning="less-than" role="RELOP">&lt;</XMTok>
                                      <XMTok meaning="absent"/>
                                      <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                    </XMApp>
                                  </XMApp>
                                </XMApp>
                                <XMTok role="CLOSE" stretchy="false">)</XMTok>
                              </XMWrap>
                            </XMDual>
                          </XMApp>
                        </XMApp>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle BPC" text="B * P * C" xml:id="S4.Ex10.m1">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">B</XMTok>
                          <XMTok font="italic" role="UNKNOWN">P</XMTok>
                          <XMTok font="italic" role="UNKNOWN">C</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S4.Ex10.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle-\frac{1}{|\bm{x}|}\sum{\log_{2}{p(x_{t}|\bm{x}_{&lt;t})}}" text="- (1 / absolute-value@(x)) * sum@((logarithm _ 2)@(p) * conditional@(x _ t, x _ (absent less t)))" xml:id="S4.Ex10.m3">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="minus" role="ADDOP">-</XMTok>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMApp>
                              <XMTok mathstyle="display" meaning="divide" role="FRACOP"/>
                              <XMTok meaning="1" role="NUMBER">1</XMTok>
                              <XMDual>
                                <XMApp>
                                  <XMTok meaning="absolute-value"/>
                                  <XMRef idref="S4.Ex10.m3.1"/>
                                </XMApp>
                                <XMWrap>
                                  <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                                  <XMTok font="bold italic" role="UNKNOWN" xml:id="S4.Ex10.m3.1">x</XMTok>
                                  <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                                </XMWrap>
                              </XMDual>
                            </XMApp>
                            <XMApp>
                              <XMTok mathstyle="display" meaning="sum" role="SUMOP" scriptpos="mid">∑</XMTok>
                              <XMApp>
                                <XMTok meaning="times" role="MULOP">⁢</XMTok>
                                <XMApp>
                                  <XMApp>
                                    <XMTok role="SUBSCRIPTOP" scriptpos="post3"/>
                                    <XMTok meaning="logarithm" role="OPFUNCTION">log</XMTok>
                                    <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
                                  </XMApp>
                                  <XMTok font="italic" role="UNKNOWN">p</XMTok>
                                </XMApp>
                                <XMDual>
                                  <XMRef idref="S4.Ex10.m3.2"/>
                                  <XMWrap>
                                    <XMTok role="OPEN" stretchy="false">(</XMTok>
                                    <XMApp xml:id="S4.Ex10.m3.2">
                                      <XMTok meaning="conditional" role="MODIFIEROP" stretchy="false">|</XMTok>
                                      <XMApp>
                                        <XMTok role="SUBSCRIPTOP" scriptpos="post4"/>
                                        <XMTok font="italic" role="UNKNOWN">x</XMTok>
                                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                      </XMApp>
                                      <XMApp>
                                        <XMTok role="SUBSCRIPTOP" scriptpos="post4"/>
                                        <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                                        <XMApp>
                                          <XMTok fontsize="70%" meaning="less-than" role="RELOP">&lt;</XMTok>
                                          <XMTok meaning="absent"/>
                                          <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                                        </XMApp>
                                      </XMApp>
                                    </XMApp>
                                    <XMTok role="CLOSE" stretchy="false">)</XMTok>
                                  </XMWrap>
                                </XMDual>
                              </XMApp>
                            </XMApp>
                          </XMApp>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
          <equation xml:id="S4.Ex11">
            <MathFork>
              <Math tex="\displaystyle PPL=2^{BPC}" text="P * P * L = 2 ^ (B * P * C)" xml:id="S4.Ex11.m4">
                <XMath>
                  <XMApp>
                    <XMTok meaning="equals" role="RELOP">=</XMTok>
                    <XMApp>
                      <XMTok meaning="times" role="MULOP">⁢</XMTok>
                      <XMTok font="italic" role="UNKNOWN">P</XMTok>
                      <XMTok font="italic" role="UNKNOWN">P</XMTok>
                      <XMTok font="italic" role="UNKNOWN">L</XMTok>
                    </XMApp>
                    <XMApp>
                      <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                      <XMTok meaning="2" role="NUMBER">2</XMTok>
                      <XMApp>
                        <XMTok meaning="times" role="MULOP">⁢</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">B</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">P</XMTok>
                        <XMTok font="italic" fontsize="70%" role="UNKNOWN">C</XMTok>
                      </XMApp>
                    </XMApp>
                  </XMApp>
                </XMath>
              </Math>
              <MathBranch>
                <tr>
                  <td align="right"><Math mode="inline" tex="\displaystyle PPL" text="P * P * L" xml:id="S4.Ex11.m1">
                      <XMath>
                        <XMApp>
                          <XMTok meaning="times" role="MULOP">⁢</XMTok>
                          <XMTok font="italic" role="UNKNOWN">P</XMTok>
                          <XMTok font="italic" role="UNKNOWN">P</XMTok>
                          <XMTok font="italic" role="UNKNOWN">L</XMTok>
                        </XMApp>
                      </XMath>
                    </Math></td>
                  <td align="center"><Math mode="inline" tex="\displaystyle=" text="=" xml:id="S4.Ex11.m2">
                      <XMath>
                        <XMTok meaning="equals" role="RELOP">=</XMTok>
                      </XMath>
                    </Math></td>
                  <td align="left"><Math mode="inline" tex="\displaystyle 2^{BPC}" text="2 ^ (B * P * C)" xml:id="S4.Ex11.m3">
                      <XMath>
                        <XMApp>
                          <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                          <XMTok meaning="2" role="NUMBER">2</XMTok>
                          <XMApp>
                            <XMTok meaning="times" role="MULOP">⁢</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">B</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">P</XMTok>
                            <XMTok font="italic" fontsize="70%" role="UNKNOWN">C</XMTok>
                          </XMApp>
                        </XMApp>
                      </XMath>
                    </Math></td>
                </tr>
              </MathBranch>
            </MathFork>
          </equation>
        </equationgroup>
        <p>where <Math mode="inline" tex="\bm{x}" text="x" xml:id="S4.SS3.p2.m1">
            <XMath>
              <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
            </XMath>
          </Math> is the whole corpus, <Math mode="inline" tex="x_{t}" text="x _ t" xml:id="S4.SS3.p2.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="italic" role="UNKNOWN">x</XMTok>
                <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
              </XMApp>
            </XMath>
          </Math> is the character at position <Math mode="inline" tex="t" text="t" xml:id="S4.SS3.p2.m3">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">t</XMTok>
            </XMath>
          </Math>, and <Math mode="inline" tex="|\bm{x}|" text="absolute-value@(x)" xml:id="S4.SS3.p2.m4">
            <XMath>
              <XMDual>
                <XMApp>
                  <XMTok meaning="absolute-value"/>
                  <XMRef idref="S4.SS3.p2.m4.1"/>
                </XMApp>
                <XMWrap>
                  <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                  <XMTok font="bold italic" role="UNKNOWN" xml:id="S4.SS3.p2.m4.1">x</XMTok>
                  <XMTok role="VERTBAR" stretchy="false">|</XMTok>
                </XMWrap>
              </XMDual>
            </XMath>
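          </Math> is the length of the corpus; the sum runs over all positions in the corpus, and <Math mode="inline" tex="\bm{x}_{&lt;t}" text="x _ (absent less t)" xml:id="S4.SS3.p2.m5">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMTok font="bold italic" role="UNKNOWN">x</XMTok>
                <XMApp>
                  <XMTok fontsize="70%" meaning="less-than" role="RELOP">&lt;</XMTok>
                  <XMTok meaning="absent"/>
                  <XMTok font="italic" fontsize="70%" role="UNKNOWN">t</XMTok>
                </XMApp>
              </XMApp>
            </XMath>
          </Math> denotes the characters preceding position <Math mode="inline" tex="t" text="t" xml:id="S4.SS3.p2.m6">
            <XMath>
              <XMTok font="italic" role="UNKNOWN">t</XMTok>
            </XMath>
          </Math>.</p>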
      </para>
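      <para>
        <p>Both metrics follow directly from the probability a model assigns to each observed character of the evaluation corpus. The sketch below implements the two formulas above in plain Python; the function names and the toy probabilities are illustrative, and a real evaluation would accumulate the log-probabilities produced by the language model rather than take a precomputed list.</p>
```python
import math
from typing import Sequence


def bits_per_character(char_probs: Sequence[float]) -> float:
    """BPC = -(1/|x|) * sum over t of log2 p(x_t | x_1 ... x_(t-1)).

    `char_probs[t]` is the probability the model assigned to the character
    actually observed at position t, given all preceding characters.
    """
    if not char_probs:
        raise ValueError("the corpus must contain at least one character")
    total_bits = -sum(math.log2(p) for p in char_probs)
    return total_bits / len(char_probs)


def perplexity(bpc: float) -> float:
    """PPL = 2 ** BPC (base 2 because BPC uses log2)."""
    return 2.0 ** bpc


probs = [0.5, 0.25, 0.125, 0.5]       # toy per-character probabilities
bpc = bits_per_character(probs)
print(bpc)                  # 1.75
print(perplexity(bpc))      # about 3.364
```
        <p>As a consistency check against Table <ref labelref="LABEL:tbl:lm_result"/>, <text font="typewriter">perplexity(4.259)</text> is about 19.15, in line with the 19.14 reported for the CTB baseline once the rounding of BPC to three decimals is taken into account.</p>
      </para>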
    </subsection>
    <subsection inlist="toc" xml:id="S4.SS4">
      <tags>
        <tag>IV-D</tag>
        <tag role="refnum">IV-D</tag>
        <tag role="typerefnum">§IV-D</tag>
      </tags>
      <title><tag close=" ">IV-D</tag><text font="italic">Hyperparameters</text></title>
      <para xml:id="S4.SS4.p1">
        <p>The same hyperparameters were used for all datasets.
We optimized the models using the Adam <cite class="ltx_citemacro_cite">[<bibref bibrefs="kingma2015adam" separator="," yyseparator=","/>]</cite> optimizer for 300 epochs.
The learning rate was set to 0.002 and divided by 10 after 250 epochs.
The embedding size was fixed at 200.
The AWD-LSTM has three hidden layers with sizes of 1000, 1000, and 200, respectively.
To prevent overfitting, we applied dropout <cite class="ltx_citemacro_cite">[<bibref bibrefs="srivastava14dropout" separator="," yyseparator=","/>]</cite> to the input, hidden, and output layers of the AWD-LSTM, with rates of 0.1, 0.1, and 0.25, respectively.
L2 weight decay was set to <Math mode="inline" tex="1.2\times 10^{-6}" text="1.2 * 10 ^ (- 6)" xml:id="S4.SS4.p1.m1">
            <XMath>
              <XMApp>
                <XMTok meaning="times" role="MULOP">×</XMTok>
                <XMTok meaning="1.2" role="NUMBER">1.2</XMTok>
                <XMApp>
                  <XMTok role="SUPERSCRIPTOP" scriptpos="post1"/>
                  <XMTok meaning="10" role="NUMBER">10</XMTok>
                  <XMApp>
                    <XMTok fontsize="70%" meaning="minus" role="ADDOP">-</XMTok>
                    <XMTok fontsize="70%" meaning="6" role="NUMBER">6</XMTok>
                  </XMApp>
                </XMApp>
              </XMApp>
            </XMath>
          </Math>.
The weight dropout rate was set to 0.5, and the batch size was 100.
To speed up training, only the embeddings of characters appearing in the current training batch were updated.
During testing, the embeddings were constructed once and then cached, so inference with hierarchical embeddings was nearly as fast as with standard embeddings.
This caching technique is similar to that of <cite class="ltx_citemacro_cite">[<bibref bibrefs="ling2015finding" separator="," yyseparator=","/>]</cite>.</p>
      </para>
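      <para>
        <p>The reported settings, together with the per-batch construction used in training and the cache used at test time, can be summarized in a short sketch. The numeric values are those stated in this section, but the field names, the 0-indexed epoch convention of the schedule, and the <text font="typewriter">compose</text> callback (a hypothetical stand-in for the recursive network that builds a hierarchical embedding) are our own assumptions; this is not the authors' implementation.</p>
```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Compose = Callable[[str], object]  # hypothetical: character -> embedding vector


@dataclass(frozen=True)
class LMConfig:
    """Settings reported in this section; the field names are illustrative."""
    epochs: int = 300
    base_lr: float = 0.002
    lr_decay_epoch: int = 250       # divide the learning rate by 10 from here on
    lr_decay_factor: float = 10.0
    embedding_size: int = 200
    layer_sizes: tuple = (1000, 1000, 200)
    dropout_input: float = 0.1
    dropout_hidden: float = 0.1
    dropout_output: float = 0.25
    weight_dropout: float = 0.5
    weight_decay: float = 1.2e-6
    batch_size: int = 100


def learning_rate(cfg: LMConfig, epoch: int) -> float:
    """Step schedule over 0-indexed epochs: base_lr for epochs 0-249, then base_lr / 10."""
    if epoch >= cfg.lr_decay_epoch:
        return cfg.base_lr / cfg.lr_decay_factor
    return cfg.base_lr


def batch_embeddings(batch: Sequence[str], compose: Compose) -> List[list]:
    """Training: compose each distinct character in the batch exactly once,
    so only those characters' embeddings take part in this update."""
    table = {ch: compose(ch) for ch in set("".join(batch))}
    return [[table[ch] for ch in sentence] for sentence in batch]


class EmbeddingCache:
    """Testing: build every character embedding once up front, then only look up."""

    def __init__(self, vocabulary: Sequence[str], compose: Compose):
        self._table: Dict[str, object] = {ch: compose(ch) for ch in set(vocabulary)}

    def lookup(self, sentence: str) -> list:
        # Raises KeyError for characters outside the cached vocabulary.
        return [self._table[ch] for ch in sentence]


# Toy usage with a counting stand-in for the recursive composition network.
calls = []


def toy_compose(ch: str) -> int:
    calls.append(ch)
    return ord(ch)


batch_embeddings(["好好学习", "学习"], toy_compose)
print(len(calls))   # 3: each distinct character is composed once for the batch
```
        <p>Because the cache is filled before decoding starts, each test-time lookup is a dictionary access, which is why inference cost approaches that of a standard embedding table.</p>
      </para>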
    </subsection>
    <subsection inlist="toc" labels="LABEL:ssection:results" xml:id="S4.SS5">
      <tags>
        <tag>IV-E</tag>
        <tag role="refnum">IV-E</tag>
        <tag role="typerefnum">§IV-E</tag>
      </tags>
      <title><tag close=" ">IV-E</tag><text font="italic">Results</text></title>
      <table inlist="lot" labels="LABEL:tbl:lm_result" placement="ht" xml:id="S4.T7">
        <tags>
          <tag><text fontsize="90%">Table VII</text></tag>
          <tag role="refnum">VII</tag>
          <tag role="typerefnum">Table VII</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <tbody>
            <tr>
              <td align="left" thead="row">Model</td>
              <td align="right">Perplexity</td>
              <td align="right">BPC</td>
            </tr>
            <tr>
              <td align="left" border="t" colspan="3" thead="row">Dataset: CTB (Simplified)</td>
            </tr>
            <tr>
              <td align="left" border="t" thead="row">LSTM <cite class="ltx_citemacro_cite">[<bibref bibrefs="kawakami2018unsupervised" separator="," yyseparator=","/>]</cite></td>
              <td align="right" border="t">30.78</td>
              <td align="right" border="t">4.944</td>
            </tr>
            <tr>
              <td align="left" thead="row">Segmental Neural LM <cite class="ltx_citemacro_cite">[<bibref bibrefs="kawakami2018unsupervised" separator="," yyseparator=","/>]</cite></td>
              <td align="right">28.56</td>
              <td align="right">4.836</td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, baseline</td>
              <td align="right">19.14</td>
              <td align="right">4.259</td>
            </tr>
            <tr>
              <td align="left" thead="row"><text font="bold">AWD-LSTM, hier-emb</text></td>
              <td align="right"><text font="bold">18.71</text></td>
              <td align="right"><text font="bold">4.226</text></td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, hier-emb, ext</td>
              <td align="right">18.85</td>
              <td align="right">4.237</td>
            </tr>
            <tr>
              <td align="left" border="t" colspan="3" thead="row">Dataset: PKU (Simplified)</td>
            </tr>
            <tr>
              <td align="left" border="t" thead="row">LSTM <cite class="ltx_citemacro_cite">[<bibref bibrefs="kawakami2018unsupervised" separator="," yyseparator=","/>]</cite></td>
              <td align="right" border="t">73.66</td>
              <td align="right" border="t">6.203</td>
            </tr>
            <tr>
              <td align="left" thead="row">Segmental Neural LM <cite class="ltx_citemacro_cite">[<bibref bibrefs="kawakami2018unsupervised" separator="," yyseparator=","/>]</cite></td>
              <td align="right">59.01</td>
              <td align="right">5.883</td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, baseline</td>
              <td align="right">55.42</td>
              <td align="right">5.792</td>
            </tr>
            <tr>
              <td align="left" thead="row"><text font="bold">AWD-LSTM, hier-emb</text></td>
              <td align="right"><text font="bold">53.96</text></td>
              <td align="right"><text font="bold">5.754</text></td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, hier-emb, ext</td>
              <td align="right">56.09</td>
              <td align="right">5.810</td>
            </tr>
            <tr>
              <td align="left" border="t" colspan="3" thead="row">Dataset: MSR (Simplified)</td>
            </tr>
            <tr>
              <td align="left" border="t" thead="row">GRU <cite class="ltx_citemacro_cite">[<bibref bibrefs="dai2017glyph" separator="," yyseparator=","/>]</cite></td>
              <td align="right" border="t">47.53</td>
              <td align="right" border="t">5.571</td>
            </tr>
            <tr>
              <td align="left" thead="row">GRU, glyph-emb <cite class="ltx_citemacro_cite">[<bibref bibrefs="dai2017glyph" separator="," yyseparator=","/>]</cite></td>
              <td align="right">47.75</td>
              <td align="right">5.577</td>
            </tr>
            <tr>
              <td align="left" thead="row">GRU, reimplemented</td>
              <td align="right">34.27</td>
              <td align="right">5.099</td>
            </tr>
            <tr>
              <td align="left" thead="row">GRU, glyph-emb, reimplemented</td>
              <td align="right">34.76</td>
              <td align="right">5.119</td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, baseline</td>
              <td align="right">22.28</td>
              <td align="right">4.478</td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, glyph-emb</td>
              <td align="right">22.52</td>
              <td align="right">4.493</td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, hier-emb</td>
              <td align="right">22.64</td>
              <td align="right">4.501</td>
            </tr>
            <tr>
              <td align="left" thead="row"><text font="bold">AWD-LSTM, hier-emb, ext</text></td>
              <td align="right"><text font="bold">22.25</text></td>
              <td align="right"><text font="bold">4.476</text></td>
            </tr>
            <tr>
              <td align="left" border="t" colspan="3" thead="row">Dataset: CITYU (Traditional)</td>
            </tr>
            <tr>
              <td align="left" border="t" thead="row">AWD-LSTM, baseline</td>
              <td align="right" border="t">70.48</td>
              <td align="right" border="t">6.139</td>
            </tr>
            <tr>
              <td align="left" thead="row"><text font="bold">AWD-LSTM, hier-emb</text></td>
              <td align="right"><text font="bold">68.47</text></td>
              <td align="right"><text font="bold">6.097</text></td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, hier-emb, ext</td>
              <td align="right">68.93</td>
              <td align="right">6.107</td>
            </tr>
            <tr>
              <td align="left" border="t" colspan="3" thead="row">Dataset: AS (Traditional)</td>
            </tr>
            <tr>
              <td align="left" border="t" thead="row">AWD-LSTM, baseline</td>
              <td align="right" border="t">45.99</td>
              <td align="right" border="t">5.523</td>
            </tr>
            <tr>
              <td align="left" thead="row">AWD-LSTM, hier-emb</td>
              <td align="right">46.88</td>
              <td align="right">5.551</td>
            </tr>
            <tr>
              <td align="left" thead="row"><text font="bold">AWD-LSTM, hier-emb, ext</text></td>
              <td align="right"><text font="bold">45.91</text></td>
              <td align="right"><text font="bold">5.521</text></td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">VII</tag>
<text font="italic">Language modeling performance on test sets from different datasets.
<text font="bold">hier-emb</text>: hierarchical embeddings, <text font="bold">glyph-emb</text>: glyph embeddings, <text font="bold">baseline</text>: standard embeddings, <text font="bold">ext</text>: additional bias term in the treeLSTM. Lower is better for both metrics; the best result on each dataset is in bold. Rows with citations (LSTM, Segmental Neural LM, GRU, and GRU with glyph embeddings) report results taken from the original papers.
We also reimplemented the glyph embeddings for a fairer comparison.
</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table VII</text></tag><text fontsize="90%">
<text font="italic">Language modeling performance on test sets from different datasets.
<text font="bold">hier-emb</text>: hierarchical embeddings, <text font="bold">glyph-emb</text>: glyph embeddings, <text font="bold">baseline</text>: standard embeddings, <text font="bold">ext</text>: additional bias term in the treeLSTM. Lower is better for both metrics; the best result on each dataset is in bold. Rows with citations (LSTM, Segmental Neural LM, GRU, and GRU with glyph embeddings) report results taken from the original papers.
We also reimplemented the glyph embeddings for a fairer comparison.
</text></text></caption>
      </table>
      <para xml:id="S4.SS5.p1">
        <p>Table <ref labelref="LABEL:tbl:lm_result"/> reports the test-set perplexity and bits per character (BPC) of each model.
We also report results on the CTB and PKU datasets from <cite class="ltx_citemacro_cite">[<bibref bibrefs="kawakami2018unsupervised" separator="," yyseparator=","/>]</cite> and results on the MSR dataset from <cite class="ltx_citemacro_cite">[<bibref bibrefs="dai2017glyph" separator="," yyseparator=","/>]</cite>.
The results from <cite class="ltx_citemacro_cite">[<bibref bibrefs="kawakami2018unsupervised" separator="," yyseparator=","/>]</cite> are comparable with ours because they were evaluated on the same data splits.
However, the comparison is not entirely fair to <cite class="ltx_citemacro_cite">[<bibref bibrefs="kawakami2018unsupervised" separator="," yyseparator=","/>]</cite> because our models are larger than theirs.
The results from <cite class="ltx_citemacro_cite">[<bibref bibrefs="dai2017glyph" separator="," yyseparator=","/>]</cite> are not directly comparable with ours because their data split is not publicly available, so our splits differ from theirs.
Thus, we reimplemented the glyph embeddings for a fairer comparison.
The architecture of our glyph embedding model is similar to that used in the original paper <cite class="ltx_citemacro_cite">[<bibref bibrefs="dai2017glyph" separator="," yyseparator=","/>]</cite>.
We include the Segmental Neural LM results only for reference and did not reimplement this model because it relies on multitask training, unlike the other models.
Our results agree with the conclusion of <cite class="ltx_citemacro_cite">[<bibref bibrefs="dai2017glyph" separator="," yyseparator=","/>]</cite> that glyph embeddings are slightly worse than standard embeddings regardless of the underlying model (GRU or AWD-LSTM).
On every dataset, at least one hierarchical-embedding variant outperformed the standard embeddings, regardless of whether the dataset uses simplified or traditional characters; on MSR and AS, however, only the variant with the additional bias term (ext) did so, and by small margins (e.g., 22.25 vs. 22.28 perplexity on MSR).
</p>
      </para>
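      <para>
        <p>The two metrics in Table VII are linked: the reported values are consistent with a per-character perplexity of 2^BPC, i.e., BPC = log2(perplexity). The Python sketch below assumes this relation (it is inferred from the numbers in the table rather than stated in the text), checks it on a few rows, and computes the relative perplexity reduction of the hierarchical embeddings over the standard embeddings on CTB. The row labels and function names are illustrative.</p>
      </para>
```python
import math

# (row label, perplexity, BPC) copied from Table VII.
TABLE_VII_ROWS = [
    ("CTB, LSTM", 30.78, 4.944),
    ("CTB, AWD-LSTM baseline", 19.14, 4.259),
    ("CTB, AWD-LSTM hier-emb", 18.71, 4.226),
    ("PKU, AWD-LSTM hier-emb", 53.96, 5.754),
    ("MSR, AWD-LSTM hier-emb ext", 22.25, 4.476),
    ("AS, AWD-LSTM hier-emb ext", 45.91, 5.521),
]


def bpc_from_perplexity(ppl):
    """Bits per character, assuming per-character perplexity = 2 ** BPC."""
    return math.log2(ppl)


def relative_reduction(baseline, model):
    """Fractional perplexity reduction of `model` with respect to `baseline`."""
    return (baseline - model) / baseline


for label, ppl, bpc in TABLE_VII_ROWS:
    # Reported BPC and log2(perplexity) should agree to about three decimals.
    print(f"{label}: reported {bpc:.3f}, log2(ppl) {bpc_from_perplexity(ppl):.3f}")

# CTB, AWD-LSTM: hierarchical embeddings vs. standard embeddings.
print(f"CTB relative perplexity reduction: {relative_reduction(19.14, 18.71):.1%}")
```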
    </subsection>
  </section>
  <section inlist="toc" labels="LABEL:sec:related" xml:id="S5">
    <tags>
      <tag>V</tag>
      <tag role="refnum">V</tag>
      <tag role="typerefnum">§V</tag>
    </tags>
    <title><tag close=" ">V</tag><text font="smallcaps">Relation to Other Work</text></title>
    <subsection inlist="toc" xml:id="S5.SS1">
      <tags>
        <tag>V-A</tag>
        <tag role="refnum">V-A</tag>
        <tag role="typerefnum">§V-A</tag>
      </tags>
      <title><tag close=" ">V-A</tag><text font="italic">Exploiting Recursive Structures</text></title>
      <para xml:id="S5.SS1.p1">
        <p>Exploiting recursive structures has been shown to be beneficial in many NLP tasks such as sentiment analysis <cite class="ltx_citemacro_cite">[<bibref bibrefs="irsoy2014deep,tai2015improved,zhu2015long" separator="," yyseparator=","/>]</cite>, text simplification <cite class="ltx_citemacro_cite">[<bibref bibrefs="siddharthan2014hybrid" separator="," yyseparator=","/>]</cite>, and machine translation <cite class="ltx_citemacro_cite">[<bibref bibrefs="quirk2005dependency,eriguchi2016tree,nakazawa2016insertion,chen2017improved" separator="," yyseparator=","/>]</cite>.
These models are usually trained on human-annotated structures but may be tested on structures annotated automatically by parsers when human annotation is not available.
This mismatch in annotation quality could degrade the performance of these models and could partially explain why exploiting structures in NLP tasks has not always led to better results.
For example, recursive models <cite class="ltx_citemacro_cite">[<bibref bibrefs="socher2011semi,socher2012semantic,socher2013recursive" separator="," yyseparator=","/>]</cite> were not as good as the biLSTM on the sentiment analysis task <cite class="ltx_citemacro_cite">[<bibref bibrefs="tai2015improved" separator="," yyseparator=","/>]</cite>.
To address the mismatch in annotation quality, new models that can both produce and exploit structures have been introduced <cite class="ltx_citemacro_cite">[<bibref bibrefs="bowman2016fast,eriguchi2017learning,yogatama2017learning" separator="," yyseparator=","/>]</cite>.
In our case, the annotation quality is consistent across the training and test sets; thus, better modeling of structures led to better results.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S5.SS2">
      <tags>
        <tag>V-B</tag>
        <tag role="refnum">V-B</tag>
        <tag role="typerefnum">§V-B</tag>
      </tags>
      <title><tag close=" ">V-B</tag><text font="italic">Building Logographic Embeddings</text></title>
      <para xml:id="S5.SS2.p1">
        <p>In languages such as Mandarin, Japanese, or Cantonese, logographs are characters, and the number of characters is in the thousands.
In contrast, alphabetic languages usually have far fewer characters (e.g., 26 letters in English).
The large number of characters in languages of logographic origin makes character-level modeling inefficient and worsens the problem of out-of-vocabulary words and characters.
However, alphabetic languages and languages of logographic origin are often treated the same way, disregarding their marked intrinsic differences <cite class="ltx_citemacro_cite">[<bibref bibrefs="zhang2018neural" separator="," yyseparator=","/>]</cite>.
Modeling logograph sub-units can alleviate these issues since there are fewer sub-units and they can be used to construct out-of-vocabulary words and characters.
This is consistent with how learners of languages with logographic origin can comprehend the meaning or pronunciation of a logograph from its constituent sub-units <cite class="ltx_citemacro_cite">[<bibref bibrefs="ho1997phonological" separator="," yyseparator=","/>]</cite>.
Hence, leveraging structures of logographs can be useful in capturing semantic <cite class="ltx_citemacro_cite">[<bibref bibrefs="su2017learning,song2018joint" separator="," yyseparator=","/>]</cite> or phonological information <cite class="ltx_citemacro_cite">[<bibref bibrefs="nguyen2018multimodal" separator="," yyseparator=","/>]</cite>.</p>
      </para>
      <para xml:id="S5.SS2.p2">
        <p>There is much prior work on building embeddings of logographs.
The first approach is to apply convolutional neural networks (CNNs) to the visual rendering of logographs <cite class="ltx_citemacro_cite">[<bibref bibrefs="dai2017glyph,liu2017learning,toyama2017utilizing,su2017learning" separator="," yyseparator=","/>]</cite>.
The second approach is to combine sub-unit embeddings with the logograph embeddings.
Sub-unit embeddings can be learned independently of logograph embeddings <cite class="ltx_citemacro_cite">[<bibref bibrefs="shi2015radical,peng2017radical,yu2017joint" separator="," yyseparator=","/>]</cite> using Skip-Gram or CBOW models <cite class="ltx_citemacro_cite">[<bibref bibrefs="mikolov2013distributed" separator="," yyseparator=","/>]</cite> or learned jointly with logograph embeddings <cite class="ltx_citemacro_cite">[<bibref bibrefs="yin2016multi,ke2017radical,karpinska2018subcharacter" separator="," yyseparator=","/>]</cite>.
The third approach is to apply CNNs or RNNs to the sequence of sub-units <cite class="ltx_citemacro_cite">[<bibref bibrefs="dong2016character,han2017dual,zhuang2017natural,cao2018cw2vec,li2018subword" separator="," yyseparator=","/>]</cite>.</p>
      </para>
      <para xml:id="S5.SS2.p3">
        <p>Our work is most similar to the third approach.
However, while our approach exploits the recursive structures of logographs, most work in this area ignores these structures or considers them only implicitly.</p>
      </para>
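      <para>
        <p>To make the distinction between ignoring structure, encoding it implicitly, and exploiting it recursively concrete, the Python sketch below represents a hypothetical logograph decomposition as a tree whose internal nodes are Ideographic Description Characters (U+2FF0 for left-right, U+2FF1 for top-bottom) and whose leaves A, B, and C are placeholder sub-units; it is not the decomposition data used in this paper, and the class and function names are illustrative. Keeping only the leaves discards the structure, a prefix (pre-order) sequence with operators retains it only implicitly, and a recursive network composes node states bottom-up along the tree.</p>
      </para>
```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical internal node: a layout operator applied to its children."""
    op: str  # e.g. U+2FF0 (left-right) or U+2FF1 (top-bottom)
    children: List[Union["Node", str]] = field(default_factory=list)


def leaves(tree):
    """Leaf sub-units only: a structure-free sequence."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree.children:
        out.extend(leaves(child))
    return out


def prefix_sequence(tree):
    """Pre-order sequence with operators: the tree is encoded only implicitly."""
    if isinstance(tree, str):
        return [tree]
    out = [tree.op]
    for child in tree.children:
        out.extend(prefix_sequence(child))
    return out


def render(tree):
    """Readable nested form, e.g. op(child, child)."""
    if isinstance(tree, str):
        return tree
    return tree.op + "(" + ", ".join(render(c) for c in tree.children) + ")"


def bottom_up_order(tree):
    """Internal nodes in the order a recursive (tree) network composes them."""
    if isinstance(tree, str):
        return []
    order = []
    for child in tree.children:
        order.extend(bottom_up_order(child))
    order.append(render(tree))
    return order


# Two hypothetical trees whose leaves appear in the same order A, B, C.
nested_right = Node("\u2ff0", ["A", Node("\u2ff1", ["B", "C"])])
nested_left = Node("\u2ff1", [Node("\u2ff0", ["A", "B"]), "C"])

for tree in (nested_right, nested_left):
    print(leaves(tree), prefix_sequence(tree), bottom_up_order(tree))
```
      <para>
        <p>In the first tree, the top-bottom component of B and C is composed before it is combined with A, whereas a sequential model reads A first regardless of how the sub-units are nested.</p>
      </para>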
    </subsection>
    <subsection inlist="toc" xml:id="S5.SS3">
      <tags>
        <tag>V-C</tag>
        <tag role="refnum">V-C</tag>
        <tag role="typerefnum">§V-C</tag>
      </tags>
      <title><tag close=" ">V-C</tag><text font="italic">Incorporating Morphology into Embeddings</text></title>
      <para xml:id="S5.SS3.p1">
        <p>In languages like English, popular models for learning word embeddings assign a distinct vector to each word, ignoring word morphology (how characters, the sub-units of words, form a word).
This approach learns the embeddings solely from the context surrounding each word, which may be a limitation in languages with large vocabularies and many rare words, since the context may be insufficient to learn good embeddings.
Building logograph (character) embeddings in languages of logographic origin faces the same difficulty, since there are many logographs (characters) and many of them are rare.</p>
      </para>
      <para xml:id="S5.SS3.p2">
        <p>To incorporate morphology into word embedding learning, <cite class="ltx_citemacro_cite">[<bibref bibrefs="bojanowski2017enriching,zhao2018generalizing" separator="," yyseparator=","/>]</cite> proposed building word embeddings by averaging bags of character n-grams.
This method may be agnostic to the order of characters if the n-gram length is short.
Others have used RNNs <cite class="ltx_citemacro_cite">[<bibref bibrefs="ling2015finding,li2018subword" separator="," yyseparator=","/>]</cite> or CNNs <cite class="ltx_citemacro_cite">[<bibref bibrefs="kim2016character,papay2018addressing,li2018subword" separator="," yyseparator=","/>]</cite> to better incorporate word morphology into word embeddings.
Unlike English words, which are linear sequences of characters, logographs are recursive structures of sub-units.
Hence, models that operate on sequences, such as RNNs or LSTMs, may not be optimal.</p>
      </para>
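      <para>
        <p>The order-insensitivity of short n-grams can be seen in a small Python sketch of a bag-of-character-n-grams word embedding that averages n-gram vectors, following the description above. The n-gram vectors are deterministic pseudo-random stand-ins for learned parameters, and the ^ and $ boundary markers, the vector dimension, and all function names are assumptions made for illustration; this is not the implementation of the cited works. With unigrams, the anagrams "stop" and "pots" receive identical embeddings; with trigrams they do not.</p>
      </para>
```python
import random
from collections import Counter


def char_ngrams(word, n, bos="^", eos="$"):
    """Bag (multiset) of character n-grams, with assumed boundary markers."""
    padded = bos + word + eos
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_vector(gram, dim=4):
    """Deterministic pseudo-random stand-in for a learned n-gram embedding."""
    rng = random.Random(gram)  # string seeds are reproducible across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def word_vector(word, n, dim=4):
    """Average of the n-gram vectors in the word's bag of n-grams."""
    bag = char_ngrams(word, n)
    total = [0.0] * dim
    count = 0
    # Sum in sorted order so identical bags yield bit-identical vectors.
    for gram in sorted(bag):
        vec = ngram_vector(gram, dim)
        for k in range(dim):
            total[k] += bag[gram] * vec[k]
        count += bag[gram]
    return [x / count for x in total]


for n in (1, 3):
    same = word_vector("stop", n) == word_vector("pots", n)
    print(f"n={n}: 'stop' and 'pots' share an embedding: {same}")
```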
      <para xml:id="S5.SS3.p3">
        <p>Rare word/character embeddings can be improved by leveraging similarity in morphology between rare words and common words.
<cite class="ltx_citemacro_cite">[<bibref bibrefs="pinter2017mimicking,kim2018learning,schick2019attentive" separator="," yyseparator=","/>]</cite> proposed building embeddings of new words from pre-trained embeddings by learning a mapping from characters to embeddings.
However, in this line of work, the embeddings are fixed, which may be a limitation for tasks that require information not captured by embeddings pre-trained via unsupervised language modeling.
In <cite class="ltx_citemacro_cite">[<bibref bibrefs="ling2015finding,kim2016character,li2018subword" separator="," yyseparator=","/>]</cite>, the embeddings are learned jointly with the task models so that they contain information useful for the task.
Our hierarchical embeddings can also be trained on task-specific data, making them potentially useful for many different tasks.</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" labels="LABEL:sec:discussion" xml:id="S6">
    <tags>
      <tag>VI</tag>
      <tag role="refnum">VI</tag>
      <tag role="typerefnum">§VI</tag>
    </tags>
    <title><tag close=" ">VI</tag><text font="smallcaps">Discussion</text></title>
    <figure inlist="lof" labels="LABEL:subfig:ex1" placement="ht" xml:id="S6.F6">
      <tags>
        <tag><text fontsize="90%">Figure 6</text></tag>
        <tag role="refnum">6</tag>
        <tag role="typerefnum">Figure 6</tag>
      </tags>
      <figure align="center" inlist="lof" labels="LABEL:subfig:ex1_slstm" xml:id="S6.F5.sf1">
        <tags>
          <tag><text fontsize="90%">(a)</text></tag>
          <tag role="refnum">5(a)</tag>
        </tags>
        <graphics class="ltx_centering" graphic="rn2266-0b" options="width=411.939pt" xml:id="S6.F5.sf1.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(a)</tag>LSTM prediction</toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(a)</text></tag><text fontsize="90%">LSTM prediction</text></caption>
      </figure>
      <figure align="center" inlist="lof" labels="LABEL:subfig:ex1_tlstm" xml:id="S6.F5.sf2">
        <tags>
          <tag><text fontsize="90%">(b)</text></tag>
          <tag role="refnum">5(b)</tag>
        </tags>
        <graphics class="ltx_centering" graphic="rv2266-0b" options="width=411.939pt" xml:id="S6.F5.sf2.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(b)</tag>treeLSTM prediction</toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(b)</text></tag><text fontsize="90%">treeLSTM prediction</text></caption>
      </figure>
      <toccaption class="ltx_centering"><tag close=" ">6</tag>
<text font="italic">Visualizing the construction of the logograph embedding for
賄 (<emph font="upright">bribery</emph>) by LSTM (a) and treeLSTM (b).
The central panels show the hidden states <Math mode="inline" tex="\textbf{h}_{i}" text="[h] _ i" xml:id="S6.F6.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" role="UNKNOWN">i</XMTok>
              </XMApp>
            </XMath>
          </Math>.
The left columns show the input sub-units.
The right columns show the predicted pronunciations using the hidden states <Math mode="inline" tex="\textbf{h}_{i}" text="[h] _ i" xml:id="S6.F6.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" role="UNKNOWN">i</XMTok>
              </XMApp>
            </XMath>
          </Math>.
The bottom rows of the right columns are the predicted pronunciations for the logographs (“f ui #” for both LSTM and treeLSTM).
Ground-truth pronunciation is “f ui #”.
</text></toccaption>
      <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 6</text></tag><text fontsize="90%">
<text font="italic">Visualizing the construction of the logograph embedding for
賄 (<emph font="upright">bribery</emph>) by LSTM (a) and treeLSTM (b).
The central panels show the hidden states <Math mode="inline" tex="\textbf{h}_{i}" text="[h] _ i" xml:id="S6.F6.m3">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText><text font="bold">h</text></XMText>
                  <XMTok fontsize="70%" role="UNKNOWN">i</XMTok>
                </XMApp>
              </XMath>
            </Math>.
The left columns show the input sub-units.
The right columns show the predicted pronunciations using the hidden states <Math mode="inline" tex="\textbf{h}_{i}" text="[h] _ i" xml:id="S6.F6.m4">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText><text font="bold">h</text></XMText>
                  <XMTok fontsize="70%" role="UNKNOWN">i</XMTok>
                </XMApp>
              </XMath>
            </Math>.
The bottom rows of the right columns are the predicted pronunciations for the logographs (“f ui #” for both LSTM and treeLSTM).
Ground-truth pronunciation is “f ui #”.
</text></text></caption>
    </figure>
    <figure inlist="lof" labels="LABEL:subfig:ex2" placement="hb" xml:id="S6.F7">
      <tags>
        <tag><text fontsize="90%">Figure 7</text></tag>
        <tag role="refnum">7</tag>
        <tag role="typerefnum">Figure 7</tag>
      </tags>
      <figure align="center" inlist="lof" labels="LABEL:subfig:ex2_slstm" xml:id="S6.F6.sf1">
        <tags>
          <tag><text fontsize="90%">(a)</text></tag>
          <tag role="refnum">6(a)</tag>
        </tags>
        <graphics class="ltx_centering" graphic="fig7a_cropped" options="width=411.939pt" xml:id="S6.F6.sf1.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(a)</tag>LSTM prediction</toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(a)</text></tag><text fontsize="90%">LSTM prediction</text></caption>
      </figure>
      <figure align="center" inlist="lof" labels="LABEL:subfig:ex2_tlstm" xml:id="S6.F6.sf2">
        <tags>
          <tag><text fontsize="90%">(b)</text></tag>
          <tag role="refnum">6(b)</tag>
        </tags>
        <graphics class="ltx_centering" graphic="rv70-0" options="width=411.939pt" xml:id="S6.F6.sf2.g1"/>
        <toccaption class="ltx_centering"><tag close=" ">(b)</tag>treeLSTM prediction</toccaption>
        <caption class="ltx_centering"><tag close=" "><text fontsize="90%">(b)</text></tag><text fontsize="90%">treeLSTM prediction</text></caption>
      </figure>
      <toccaption class="ltx_centering"><tag close=" ">7</tag>
<text font="italic">Visualizing the construction of the logograph embedding for
鴽 (<emph font="upright">quail</emph>) by LSTM (a) and treeLSTM (b).
The central panels show the hidden states <Math mode="inline" tex="\textbf{h}_{i}" text="[h] _ i" xml:id="S6.F7.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" role="UNKNOWN">i</XMTok>
              </XMApp>
            </XMath>
          </Math>.
The left columns show the input sub-units.
The right columns show the predicted pronunciations using the hidden states <Math mode="inline" tex="\textbf{h}_{i}" text="[h] _ i" xml:id="S6.F7.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" role="UNKNOWN">i</XMTok>
              </XMApp>
            </XMath>
          </Math>.
The bottom rows of the right columns are the predicted pronunciations for the logographs (“m u #” for LSTM and “j yu #” for treeLSTM).
Ground-truth pronunciation is “j yu #”.
While LSTM made a mistake, treeLSTM predicted the correct pronunciation.
</text></toccaption>
      <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Figure 7</text></tag><text fontsize="90%">
<text font="italic">Visualizing the construction of the logograph embedding for
鴽 (<emph font="upright">quail</emph>) by LSTM (a) and treeLSTM (b).
The central panels show the hidden states <Math mode="inline" tex="\textbf{h}_{i}" text="[h] _ i" xml:id="S6.F7.m3">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText><text font="bold">h</text></XMText>
                  <XMTok fontsize="70%" role="UNKNOWN">i</XMTok>
                </XMApp>
              </XMath>
            </Math>.
The left columns show the input sub-units.
The right columns show the predicted pronunciations using the hidden states <Math mode="inline" tex="\textbf{h}_{i}" text="[h] _ i" xml:id="S6.F7.m4">
              <XMath>
                <XMApp>
                  <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                  <XMText><text font="bold">h</text></XMText>
                  <XMTok fontsize="70%" role="UNKNOWN">i</XMTok>
                </XMApp>
              </XMath>
            </Math>.
The bottom rows of the right columns are the predicted pronunciations for the logographs (“m u #” for LSTM and “j yu #” for treeLSTM).
Ground-truth pronunciation is “j yu #”.
While LSTM made a mistake, treeLSTM predicted the correct pronunciation.
</text></text></caption>
    </figure>
    <subsection inlist="toc" xml:id="S6.SS1">
      <tags>
        <tag>VI-A</tag>
        <tag role="refnum">VI-A</tag>
        <tag role="typerefnum">§VI-A</tag>
      </tags>
      <title><tag close=" ">VI-A</tag><text font="italic">Left-right Bias in Pronunciation Prediction</text></title>
      <para xml:id="S6.SS1.p1">
        <p>More than 80% of frequently used Han logographs are semantic-phonetic compounds <cite class="ltx_citemacro_cite">[<bibref bibrefs="li1993analysis" separator="," yyseparator=","/>]</cite>.
These compounds consist of sub-units that might contain phonetic or semantic information <cite class="ltx_citemacro_cite">[<bibref bibrefs="hsiao2006analysis" separator="," yyseparator=","/>]</cite>.
Pronunciation of these compounds could conceivably be predicted from the phonetic sub-units.
Amongst semantic-phonetic compounds, logographs with the left-right arrangement (in which the semantic sub-unit is on the left and the phonetic sub-unit is on the right) are the most common.
For logographs with the left-right arrangement, a good model of logograph pronunciation should prefer the right child of the root node (the likely phonetic sub-unit) when predicting pronunciation.
To check whether the hierarchical embeddings prefer the left child or the right child, we compared the norm of the left forget gate against the norm of the right forget gate.
The right child is preferred if the norm of the right forget gate is larger.</p>
      </para>
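      <para>
        <p>The forget-gate comparison can be sketched in Python as follows. The sketch assumes a binary treeLSTM in the style of the N-ary Tree-LSTM of Tai et al., in which the parent memory cell adds each child's memory cell multiplied elementwise by that child's own forget gate, and it assumes the L2 norm; the gate values are toy numbers rather than outputs of the trained model, and the function names are hypothetical.</p>
      </para>
```python
import math


def l2_norm(vec):
    """Euclidean norm of a gate activation vector."""
    return math.sqrt(sum(x * x for x in vec))


def prefers_right(f_left, f_right):
    """True if the right child's forget gate has the larger norm.

    Assumed cell update (elementwise): c = i * u + f_left * c_left + f_right * c_right,
    so a larger right forget gate passes more of the right child's memory upward.
    """
    return l2_norm(f_right) > l2_norm(f_left)


def right_preference_rate(gate_pairs):
    """Fraction of (f_left, f_right) pairs in which the right child is preferred."""
    hits = sum(prefers_right(f_left, f_right) for f_left, f_right in gate_pairs)
    return hits / len(gate_pairs)


# Toy forget-gate activations (sigmoid outputs in (0, 1)) at three root nodes.
toy_gates = [
    ([0.2, 0.3, 0.1], [0.9, 0.8, 0.7]),  # right clearly preferred
    ([0.6, 0.5, 0.4], [0.7, 0.6, 0.5]),  # right slightly preferred
    ([0.9, 0.9, 0.9], [0.1, 0.2, 0.3]),  # left preferred
]
print(f"right child preferred in {right_preference_rate(toy_gates):.0%} of cases")
```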
      <table inlist="lot" labels="LABEL:tbl:lr_bias" placement="ht" xml:id="S6.T8">
        <tags>
          <tag><text fontsize="90%">Table VIII</text></tag>
          <tag role="refnum">VIII</tag>
          <tag role="typerefnum">Table VIII</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <thead>
            <tr>
              <td align="left" thead="column">Scenario</td>
              <td align="center" thead="column">Left-Right</td>
              <td align="center" thead="column">Prefer Right</td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="t">Tr., Sp. <Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S6.T8.m1">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Tr., Sp.</td>
              <td align="center" border="t">1657</td>
              <td align="center" border="t">1543 (93%)</td>
            </tr>
            <tr>
              <td align="left">Non-Sp. <Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S6.T8.m2">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Sp.</td>
              <td align="center">1686</td>
              <td align="center">1589 (94%)</td>
            </tr>
            <tr>
              <td align="left">Tr. <Math mode="inline" tex="\rightarrow" text="rightarrow" xml:id="S6.T8.m3">
                  <XMath>
                    <XMTok name="rightarrow" role="ARROW">→</XMTok>
                  </XMath>
                </Math> Sp.</td>
              <td align="center">1686</td>
              <td align="center">1643 (97%)</td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">VIII</tag>
<text font="italic">Number of logographs with the left-right arrangement for which the model using hierarchical embeddings prefers the right child (the likely phonetic sub-unit), i.e., the norm of the right forget gate is larger than that of the left forget gate.
The scenarios were described in Table <ref labelref="LABEL:tbl:ph_data"/>.
Tr: Traditional, Sp: Simplified.
</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table VIII</text></tag><text fontsize="90%">
<text font="italic">Number of logographs with the left-right arrangement for which the model using hierarchical embeddings prefers the right child (the likely phonetic sub-unit), i.e., the norm of the right forget gate is larger than that of the left forget gate.
The scenarios were described in Table <ref labelref="LABEL:tbl:ph_data"/>.
Tr: Traditional, Sp: Simplified.
</text></text></caption>
      </table>
      <para xml:id="S6.SS1.p2">
        <p>In Table <ref labelref="LABEL:tbl:lr_bias"/>, the second column shows the number of logographs following the left-right arrangement for different scenarios.
The third column shows the number of logographs following the left-right arrangement in which the right child is preferred over the left child.
The hierarchical embeddings prefer the right child in the large majority of cases (93% to 97%) in all three scenarios.
Thus, the learned hierarchical embeddings consider the right sub-units to be more relevant for pronunciation prediction for the majority of compound logographs with the left-right arrangement.
This is consistent with human intuition.
Since humans rely on this intuition to infer pronunciation and it seems to work well, the hierarchical embeddings may have learned a similarly general solution.</p>
      </para>
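      <para>
        <p>The percentages in Table VIII follow directly from its counts; the short Python check below recomputes them. The scenario labels are plain-text copies of the table rows, and the function name is illustrative.</p>
      </para>
```python
# (scenario, logographs with the left-right arrangement, right child preferred),
# copied from Table VIII.
TABLE_VIII = [
    ("Tr., Sp. -> Tr., Sp.", 1657, 1543),
    ("Non-Sp. -> Sp.", 1686, 1589),
    ("Tr. -> Sp.", 1686, 1643),
]


def preference_percentages(rows):
    """Map each scenario to the rounded percentage of right-preferring logographs."""
    return {scenario: round(100 * right / total) for scenario, total, right in rows}


for scenario, percent in preference_percentages(TABLE_VIII).items():
    print(f"{scenario}: {percent}%")
```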
    </subsection>
    <subsection inlist="toc" xml:id="S6.SS2">
      <tags>
        <tag>VI-B</tag>
        <tag role="refnum">VI-B</tag>
        <tag role="typerefnum">§VI-B</tag>
      </tags>
      <title><tag close=" ">VI-B</tag><text font="italic">Robustness to Distractors in Pronunciation Prediction</text></title>
      <para xml:id="S6.SS2.p1">
        <p>Statistical models can perform well on aggregate metrics by overfitting to common patterns at the expense of more difficult, infrequent samples that require deeper understanding <cite class="ltx_citemacro_cite">[<bibref bibrefs="jia2017adversarial" separator="," yyseparator=","/>]</cite>.
A common pattern useful for predicting pronunciation is that phonetic sub-units usually occur at the end of the linearized sequences.
A model that generalizes well should be able to locate the phonetic sub-units wherever they occur in the sequences.
A model that only attends to the end of sequences would make wrong predictions when the phonetic sub-units are not at the end.</p>
      </para>
      <para xml:id="S6.SS2.p2">
        <p>To examine how the models arrive at their predictions, we visualize the hidden states of LSTM and treeLSTM.
The visualization for biLSTM is not shown since it performed worse than LSTM.
For both LSTM and treeLSTM, the last hidden state (e.g., <Math mode="inline" tex="\textbf{h}_{15}" text="[h] _ 15" xml:id="S6.SS2.p2.m1">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" meaning="15" role="NUMBER">15</XMTok>
              </XMApp>
            </XMath>
          </Math> in Figure <ref labelref="LABEL:subfig:ex1"/>) is taken as the logograph embedding.
The intermediate embeddings (e.g., <Math mode="inline" tex="\textbf{h}_{1}" text="[h] _ 1" xml:id="S6.SS2.p2.m2">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
              </XMApp>
            </XMath>
          </Math> to <Math mode="inline" tex="\textbf{h}_{14}" text="[h] _ 14" xml:id="S6.SS2.p2.m3">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" meaning="14" role="NUMBER">14</XMTok>
              </XMApp>
            </XMath>
          </Math>) are embeddings of subsequences of sub-units for LSTM and
embeddings of subtrees of sub-units for treeLSTM.
As more sub-units are observed, the hidden states (embeddings) generally accumulate more phonetic information, as indicated by their increasing magnitude (darker bands).
When the magnitude of a hidden state is small (fainter bands), the hidden state does not carry enough information to predict the pronunciation confidently.
We also obtained the prediction corresponding to each hidden state by feeding the hidden states (<Math mode="inline" tex="\textbf{h}_{1}" text="[h] _ 1" xml:id="S6.SS2.p2.m4">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
              </XMApp>
            </XMath>
          </Math> to <Math mode="inline" tex="\textbf{h}_{15}" text="[h] _ 15" xml:id="S6.SS2.p2.m5">
            <XMath>
              <XMApp>
                <XMTok role="SUBSCRIPTOP" scriptpos="post1"/>
                <XMText><text font="bold">h</text></XMText>
                <XMTok fontsize="70%" meaning="15" role="NUMBER">15</XMTok>
              </XMApp>
            </XMath>
          </Math>) to the task-specific layer, in order to determine at which step the embeddings contained enough phonetic information to make the correct pronunciation prediction.</p>
      </para>
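      <para>
        <p>Feeding every intermediate hidden state to the task-specific layer can be sketched as follows. As a simplifying assumption, the task-specific layer is modeled as one linear map with weights W and bias b followed by an argmax over C pronunciation classes; the actual output layer may differ (for example, by predicting each pronunciation component separately), and the function names are hypothetical. The L2 norm of each hidden state is used as a scalar proxy for the magnitude visualized in the figures.</p>
      </para>
```python
import numpy as np


def probe_hidden_states(H: np.ndarray, W: np.ndarray, b: np.ndarray):
    """Probe hidden states h_1..h_T with a shared (assumed linear) output layer.

    H: (T, d) hidden states; W: (d, C) weights; b: (C,) bias.
    Returns per-step L2 norms and per-step predicted class indices.
    """
    norms = np.linalg.norm(H, axis=1)  # scalar magnitude of each hidden state
    logits = H @ W + b                 # the same layer is applied at every step
    preds = logits.argmax(axis=1)      # softmax is monotonic, so argmax of logits suffices
    return norms, preds


def first_correct_step(preds: np.ndarray, target: int):
    """1-based index of the first step whose prediction equals target, or None."""
    hits = np.flatnonzero(preds == target)
    return int(hits[0]) + 1 if hits.size else None


# Toy example with T=3 steps, d=2 dimensions and C=2 classes.
H = np.array([[0.1, 0.0], [0.2, 0.5], [0.0, 2.0]])
norms, preds = probe_hidden_states(H, np.eye(2), np.zeros(2))
print(norms.round(3), preds, first_correct_step(preds, target=1))
```
      <para>
        <p>A prediction that is correct at an intermediate step but wrong at the last step would appear in preds as a correct class followed by incorrect ones.</p>
      </para>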
      <para xml:id="S6.SS2.p3">
        <p>Figure <ref labelref="LABEL:subfig:ex1"/> shows how the models predict the pronunciation of the logograph 賄 (<emph font="italic">bribery</emph>).
This is a typical example, as the phonetic sub-units are on the right (corresponding to the end of the linearized sequence).
Although both models predict correctly, they use the structural representation of the logograph differently.
LSTM had to observe the whole sequence to predict correctly, as suggested by the build-up in magnitude of the embeddings until the end of the sequence.
For treeLSTM, the pattern of the embeddings’ magnitude is consistent with the hierarchical structure of the input logograph, which has two subtrees, 貝 and 有.
Specifically, not only was the final pronunciation prediction of 賄 correct (“f ui #”), but the pronunciations of the subtrees (貝 and 有) were also correct (“b ui #” and “j au #”, respectively).</p>
      </para>
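      <para>
        <p>To make the bottom-up composition concrete, the following is a minimal NumPy sketch of a binary (N-ary with N = 2) Tree-LSTM cell in the style of <cite class="ltx_citemacro_cite">[<bibref bibrefs="tai2015improved" separator="," yyseparator=","/>]</cite>, applied to a toy tree for 賄 with the two subtrees 貝 and 有. It is not the authors' implementation: the weights are random and untrained, the two subtrees are treated as leaves for brevity, and feeding an embedding of the arrangement type (the hypothetical token LR, for left-right) to each internal node is an assumption made for illustration. Every subtree root receives its own hidden state, which is what makes it possible to probe the pronunciation of each subtree.</p>
      </para>
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class BinaryTreeLSTM:
    """Minimal binary Tree-LSTM cell (N-ary form, N=2) with random, untrained weights."""

    def __init__(self, x_dim: int, h_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.h_dim = h_dim
        # Five row blocks: input gate, left forget, right forget, output gate, candidate.
        self.W = rng.normal(scale=0.1, size=(5 * h_dim, x_dim))
        self.U = rng.normal(scale=0.1, size=(5 * h_dim, 2 * h_dim))
        self.b = np.zeros(5 * h_dim)

    def cell(self, x, left, right):
        """Combine node input x with the (h, c) states of the left and right children."""
        (h_l, c_l), (h_r, c_r) = left, right
        z = self.W @ x + self.U @ np.concatenate([h_l, h_r]) + self.b
        i, f_l, f_r, o, u = np.split(z, 5)
        c = sigmoid(i) * np.tanh(u) + sigmoid(f_l) * c_l + sigmoid(f_r) * c_r
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def encode(self, tree, emb, states):
        """Post-order encoding. A tree is a leaf label or (arrangement, left, right).

        Appends (label, h) for every subtree root to `states` and returns (h, c).
        """
        zero = (np.zeros(self.h_dim), np.zeros(self.h_dim))
        if isinstance(tree, str):  # leaf: a sub-unit with no children
            h, c = self.cell(emb[tree], zero, zero)
            label = tree
        else:  # internal node: arrangement token plus two subtrees
            label, left, right = tree
            left_state = self.encode(left, emb, states)
            right_state = self.encode(right, emb, states)
            h, c = self.cell(emb[label], left_state, right_state)
        states.append((label, h))
        return h, c


rng = np.random.default_rng(1)
emb = {name: rng.normal(size=4) for name in ["貝", "有", "LR"]}
model = BinaryTreeLSTM(x_dim=4, h_dim=3)
states = []
model.encode(("LR", "貝", "有"), emb, states)
for label, h in states:
    print(label, np.round(np.linalg.norm(h), 3))
```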
      <para xml:id="S6.SS2.p4">
        <p>Figure <ref labelref="LABEL:subfig:ex2"/> shows a rare example in which the phonetic sub-units are not at the end of the sequence.
LSTM made the correct prediction after observing the relevant parts (up to the second-to-last input token) but then lost it, possibly because it focuses more on the end of the sequence.
This mistake indicates that LSTM might have learned a heuristic rather than a general strategy.
On the other hand, treeLSTM predicted the pronunciation correctly, seemingly by focusing on the relevant part (如) of the logograph and ignoring the less relevant tokens.
Thus, imposing a prior on the mapping from logographs to embeddings by using a recursive network seems to lead to a solution that may generalize better to more challenging cases.</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S6.SS3">
      <tags>
        <tag>VI-C</tag>
        <tag role="refnum">VI-C</tag>
        <tag role="typerefnum">§VI-C</tag>
      </tags>
      <title><tag close=" ">VI-C</tag><text font="italic">Infrequent Characters’ Embeddings in Language Modeling</text></title>
      <table inlist="lot" labels="LABEL:tbl:lm_rare" placement="ht" xml:id="S6.T9">
        <tags>
          <tag><text fontsize="90%">Table IX</text></tag>
          <tag role="refnum">IX</tag>
          <tag role="typerefnum">Table IX</tag>
        </tags>
        <tabular class="ltx_centering ltx_guessed_headers" vattach="middle">
          <thead>
            <tr>
              <td align="left" border="r" thead="column row">Character</td>
              <td align="left" border="r" thead="column">Standard Embedding</td>
              <td align="left" thead="column">Hierarchical Embedding</td>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" border="r t" thead="row"><graphics graphic="rare/1_spider" options="width=17.3448pt" xml:id="S6.T9.g1"/>,</td>
              <td align="left" border="r t"><graphics graphic="rare/1_plant" options="width=17.3448pt" xml:id="S6.T9.g2"/>, <emph font="italic">a plant</emph>, “ch a ng”</td>
              <td align="left" border="t"><graphics graphic="rare/1_cricket" options="width=17.3448pt" xml:id="S6.T9.g3"/>, <emph font="italic">cricket</emph>, “q u #”</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row"><emph font="italic">spider</emph>,</td>
              <td align="left" border="r"><graphics graphic="rare/1_drawer" options="width=17.3448pt" xml:id="S6.T9.g4"/>, <emph font="italic">drawer</emph>, “t i #”</td>
              <td align="left"><graphics graphic="rare/1_clam" options="width=17.3448pt" xml:id="S6.T9.g5"/>, <emph font="italic">ark clam</emph>, “q u #”</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">“zh u #”</td>
              <td align="left" border="r"><graphics graphic="rare/1_scold" options="width=17.3448pt" xml:id="S6.T9.g6"/>, <emph font="italic">scold</emph>, “ch i #”</td>
              <td align="left"><graphics graphic="rare/1_louse" options="width=17.3448pt" xml:id="S6.T9.g7"/>, <emph font="italic">louse</emph>, “y a #”</td>
            </tr>
            <tr>
              <td align="left" border="r t" thead="row"><graphics graphic="rare/2_fort" options="width=17.3448pt" xml:id="S6.T9.g8"/>,</td>
              <td align="left" border="r t"><graphics graphic="rare/2_omit" options="width=17.3448pt" xml:id="S6.T9.g9"/>, <emph font="italic">omit</emph>, “sh e ng”</td>
              <td align="left" border="t"><graphics graphic="rare/2_firewood" options="width=17.3448pt" xml:id="S6.T9.g10"/>, <emph font="italic">firewood</emph>, “ch ai #”</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row"><emph font="italic">fort</emph>,</td>
              <td align="left" border="r"><graphics graphic="rare/2_south" options="width=17.3448pt" xml:id="S6.T9.g11"/>, <emph font="italic">south</emph>, “n a n”</td>
              <td align="left"><graphics graphic="rare/2_purple" options="width=17.3448pt" xml:id="S6.T9.g12"/>, <emph font="italic">purple</emph>, “z i #”</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">“zh ai #”</td>
              <td align="left" border="r"><graphics graphic="rare/2_blanket" options="width=17.3448pt" xml:id="S6.T9.g13"/>, <emph font="italic">blanket</emph>, “b ei #”</td>
              <td align="left"><graphics graphic="rare/2_female" options="width=17.3448pt" xml:id="S6.T9.g14"/>, <emph font="italic">female</emph>, “c i #”</td>
            </tr>
            <tr>
              <td align="left" border="r t" thead="row"><graphics graphic="rare/3_jade" options="width=17.3448pt" xml:id="S6.T9.g15"/>,</td>
              <td align="left" border="r t"><graphics graphic="rare/3_this" options="width=17.3448pt" xml:id="S6.T9.g16"/>, <emph font="italic">this</emph>, “c i #”</td>
              <td align="left" border="t"><graphics graphic="rare/3_pendant" options="width=17.3448pt" xml:id="S6.T9.g17"/>, <emph font="italic">pendant</emph>, “p ei #”</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row"><emph font="italic">jade belt</emph>,</td>
              <td align="left" border="r"><graphics graphic="rare/3_army" options="width=17.3448pt" xml:id="S6.T9.g18"/>, <emph font="italic">army</emph>, “j u n”</td>
              <td align="left"><graphics graphic="rare/3_pearl" options="width=17.3448pt" xml:id="S6.T9.g19"/>, <emph font="italic">imperfect pearl</emph>, “j i #”</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">“p ei #”</td>
              <td align="left" border="r"><graphics graphic="rare/3_gate" options="width=17.3448pt" xml:id="S6.T9.g20"/>, <emph font="italic">gate</emph>, “m e n”</td>
              <td align="left"><graphics graphic="rare/3_can" options="width=17.3448pt" xml:id="S6.T9.g21"/>, <emph font="italic">watering can</emph>, “g ua n”</td>
            </tr>
            <tr>
              <td align="left" border="r t" thead="row"><graphics graphic="rare/4_celery" options="width=17.3448pt" xml:id="S6.T9.g22"/>,</td>
              <td align="left" border="r t"><graphics graphic="rare/4_powerful" options="width=17.3448pt" xml:id="S6.T9.g23"/>, <emph font="italic">powerful</emph>, “z a ng”</td>
              <td align="left" border="t"><graphics graphic="rare/4_axe" options="width=17.3448pt" xml:id="S6.T9.g24"/>, <emph font="italic">axe</emph>, “f u #”</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row"><emph font="italic">celery</emph>,</td>
              <td align="left" border="r"><graphics graphic="rare/4_note" options="width=17.3448pt" xml:id="S6.T9.g25"/>, <emph font="italic">note</emph>, “zh a #”</td>
              <td align="left"><graphics graphic="rare/4_fragrance" options="width=17.3448pt" xml:id="S6.T9.g26"/>, <emph font="italic">fragrance</emph>, “x u n”</td>
            </tr>
            <tr>
              <td align="left" border="r" thead="row">“q i n”</td>
              <td align="left" border="r"><graphics graphic="rare/4_not" options="width=17.3448pt" xml:id="S6.T9.g27"/>, <emph font="italic">not</emph>, “m a #”</td>
              <td align="left"><graphics graphic="rare/4_lush" options="width=17.3448pt" xml:id="S6.T9.g28"/>, <emph font="italic">lush</emph>, “m ao #”</td>
            </tr>
          </tbody>
        </tabular>
        <toccaption class="ltx_centering"><tag close=" ">IX</tag>
<text font="italic">Nearest neighbors of infrequent characters in the embedding space.
The meaning and Mandarin pronunciation are shown next to the characters.
The common sub-units between the logographs and their neighbors in the embedding space are color-coded.
Red sub-units carry semantic information. Blue sub-units carry phonetic information.
</text></toccaption>
        <caption class="ltx_centering"><tag close=": "><text fontsize="90%">Table IX</text></tag><text fontsize="90%">
<text font="italic">Nearest neighbors of infrequent characters in the embedding space.
The meaning and Mandarin pronunciation are shown next to the characters.
The common sub-units between the logographs and their neighbors in the embedding space are color-coded.
Red sub-units carry semantic information. Blue sub-units carry phonetic information.
</text></text></caption>
      </table>
      <para xml:id="S6.SS3.p1">
        <p>Hierarchical embeddings could learn better representations of infrequent characters than standard embeddings, since the latter ignore the internal structure (morphology) of characters.
Using the embeddings learned in the language modeling experiments, we looked for the characters that are most similar (nearest neighbors) to the infrequent characters in the embedding space.
If the nearest neighbors are semantically or phonologically related, we can be more confident that the learned embeddings are sensible.
Similarity between embedding vectors is measured with cosine similarity.
Table <ref labelref="LABEL:tbl:lm_rare"/> shows that, with hierarchical embeddings, the infrequent characters and their nearest neighbors are relatively closely related in meaning or pronunciation.</p>
      </para>
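      <para>
        <p>The nearest-neighbor lookup can be sketched in a few lines of NumPy. The function name, the toy vocabulary, and the toy embedding matrix are hypothetical; in the experiments the matrix would hold the learned standard or hierarchical embeddings. Rows are L2-normalized so that a dot product equals cosine similarity, and the query character is excluded from its own neighbors.</p>
      </para>
```python
import numpy as np


def nearest_neighbors(query: str, vocab: list[str], E: np.ndarray, k: int = 3):
    """Return the k characters most cosine-similar to `query`, with their similarities.

    vocab: characters aligned with the rows of the (V, d) embedding matrix E.
    """
    E_unit = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-length rows
    sims = E_unit @ E_unit[vocab.index(query)]             # cosine similarity to every row
    order = np.argsort(-sims)                              # most similar first
    return [(vocab[j], float(sims[j])) for j in order if vocab[j] != query][:k]


# Toy 2-D embeddings: "b" points almost the same way as "a", "d" the opposite way.
vocab = ["a", "b", "c", "d"]
E = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(nearest_neighbors("a", vocab, E, k=2))
```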
      <para xml:id="S6.SS3.p2">
        <p>For standard embeddings, infrequent characters and their neighbors are generally unrelated.
For example, <emph font="italic">spider</emph> is unrelated to <emph font="italic">a plant</emph>, <emph font="italic">drawer</emph>, or <emph font="italic">scold</emph>.
It is possible that, with little training data, the embeddings of infrequent characters stay close to their random initial values and hence remain far from related characters in the embedding space.
For hierarchical embeddings, infrequent characters are more closely related to their neighbors.
The relatedness between infrequent characters and their neighbors can be semantic or phonological.
For example, the first row in Table <ref labelref="LABEL:tbl:lm_rare"/> shows characters (<text fontsize="144%">蛛,蛐,蚶,蚜</text>) that share the same semantic sub-units (shown in red).
Accordingly, <emph font="italic">spider</emph> is semantically related to <emph font="italic">cricket</emph>, <emph font="italic">ark clam</emph>, and <emph font="italic">louse</emph>, since they all denote small creatures (the shared radical 虫 conventionally marks insects, worms, and other small animals).
The second row in Table <ref labelref="LABEL:tbl:lm_rare"/> shows another example, in which the characters (<text fontsize="144%">砦,柴,紫,雌</text>) share the same phonetic sub-units (shown in blue).
Correspondingly, “zh ai #” is phonologically related (has a similar pronunciation) to “ch ai #”, “z i #”, and “c i #”.</p>
      </para>
      <para xml:id="S6.SS3.p3">
        <p>However, the hierarchical embeddings are not always accurate, and the last row of Table <ref labelref="LABEL:tbl:lm_rare"/> shows an interesting failure.
The cosine similarity suggests that (<emph font="italic">celery</emph>, “q i n”) and (<emph font="italic">axe</emph>, “f u #”) are related.
Although both characters have a common sub-unit, that sub-unit carries phonological information (color-coded in blue) in (<emph font="italic">celery</emph>, “q i n”) but semantic information (color-coded in red) in (<emph font="italic">axe</emph>, “f u #”).</p>
      </para>
    </subsection>
    <subsection inlist="toc" xml:id="S6.SS4">
      <tags>
        <tag>VI-D</tag>
        <tag role="refnum">VI-D</tag>
        <tag role="typerefnum">§VI-D</tag>
      </tags>
      <title><tag close=" ">VI-D</tag><text font="italic">Automated Feature Granularities Selection</text></title>
      <para xml:id="S6.SS4.p1">
        <p>The granularity of the input features derived from logographs could have a major impact on model performance.
The input features could be as fine-grained as individual strokes, which results in a small vocabulary.
Combining strokes into larger units such as ideographs yields more distinct tokens and expands the vocabulary.
The choice of vocabulary has a major impact on sequential models such as RNNs: a big vocabulary slows down training and makes it hard for the model to generalize.
On the other hand, a small vocabulary leads to longer sequences, which are harder for models to learn from.
<cite class="ltx_citemacro_cite">[<bibref bibrefs="nguyen2017sub" separator="," yyseparator=","/>]</cite> showed that a big vocabulary yields lower perplexity for language modeling of Japanese, and with a big vocabulary each token tends to be a meaningful unit that carries semantic information <cite class="ltx_citemacro_cite">[<bibref bibrefs="karpinska2018subcharacter" separator="," yyseparator=","/>]</cite>.
Moreover, different sub-unit granularities might be more suitable for different logographic languages.
For example, ideographs are more suitable as input tokens for Chinese, while individual strokes are better suited for Japanese <cite class="ltx_citemacro_cite">[<bibref bibrefs="zhang2018neural" separator="," yyseparator=","/>]</cite>.
<cite class="ltx_citemacro_cite">[<bibref bibrefs="su2017learning" separator="," yyseparator=","/>]</cite> chose to extract visual features of logographs instead of symbolic features to avoid specifying the level of granularity when decomposing a character.</p>
      </para>
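      <para>
        <p>The trade-off between vocabulary size and sequence length can be illustrated with a toy example. The characters A to D and their sub-units x, y, and z below are placeholders, not real decompositions; with a real decomposition table, strokes or ideographs would play the role of the sub-units. Splitting characters into shared sub-units shrinks the vocabulary but lengthens every sequence.</p>
      </para>
```python
def granularity_stats(text: str, decomposition: dict) -> dict:
    """Compare (vocabulary size, sequence length) at character vs sub-unit granularity.

    decomposition maps each character to its list of sub-units (placeholders here).
    """
    char_seq = list(text)
    sub_seq = [part for ch in char_seq for part in decomposition[ch]]
    return {
        "character": (len(set(char_seq)), len(char_seq)),
        "sub-unit": (len(set(sub_seq)), len(sub_seq)),
    }


# Hypothetical decomposition: four characters built from three shared sub-units.
decomposition = {"A": ["x", "y"], "B": ["x", "z"], "C": ["y", "z"], "D": ["x", "y", "z"]}
print(granularity_stats("ABCDAB", decomposition))
```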
      <para xml:id="S6.SS4.p2">
        <p>In Figures <ref labelref="LABEL:subfig:ex1_slstm"/> and <ref labelref="LABEL:subfig:ex2_slstm"/>, LSTM treats all input features as roughly equally important, as evidenced by relatively high activation values across most hidden states.
In contrast, the structural constraints imposed by treeLSTM result in a more automated selection of input features, in which high activations concentrate mostly at the hidden states of subtree roots.
In other words, treeLSTM seems to have learned to build task-relevant representations at the right level of granularity.
Learning the right features through structure, rather than through delicate feature engineering, is an advantage that should be explored further for RNN models.</p>
      </para>
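      <para>
        <p>One way to quantify where high activations concentrate is to ask what fraction of the largest hidden-state magnitudes fall on subtree roots. The sketch below assumes that each node's activation is summarized by one scalar (for example, the L2 norm of its hidden state) and that a boolean flag marks subtree roots; both the measure and the toy values are assumptions for illustration, not numbers taken from the figures.</p>
      </para>
```python
import numpy as np


def root_share_of_top_activations(norms, is_root, k: int) -> float:
    """Fraction of the k largest activation magnitudes that occur at subtree roots."""
    norms = np.asarray(norms, dtype=float)
    is_root = np.asarray(is_root, dtype=bool)
    top = np.argsort(-norms)[:k]  # indices of the k largest magnitudes
    return float(is_root[top].mean())


# Toy magnitudes for six hidden states; indices 2 and 5 are subtree roots.
norms = [0.1, 0.2, 0.9, 0.15, 0.3, 1.2]
is_root = [False, False, True, False, False, True]
print(root_share_of_top_activations(norms, is_root, k=2))  # both top states are roots
print(root_share_of_top_activations(norms, is_root, k=3))
```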
    </subsection>
    <subsection inlist="toc" xml:id="S6.SS5">
      <tags>
        <tag>VI-E</tag>
        <tag role="refnum">VI-E</tag>
        <tag role="typerefnum">§VI-E</tag>
      </tags>
      <title><tag close=" ">VI-E</tag><text font="italic">Intuitive Exploitation of Input Structures</text></title>
      <para xml:id="S6.SS5.p1">
        <p>Prior work suggests that incorporating syntactic structure is tricky and does not always improve results.
For example, in subject-verb agreement modeling, a model can easily ignore syntactic information in the input data, so syntactic constraints must be explicitly injected into the model’s architecture <cite class="ltx_citemacro_cite">[<bibref bibrefs="kuncoro2018lstms" separator="," yyseparator=","/>]</cite>.
Doing so makes it easier for the model to discern the relationship of interest (subject-verb agreement) by shortening the path between the relevant sub-units (the subject and the verb in a sentence) <cite class="ltx_citemacro_cite">[<bibref bibrefs="kuncoro2018lstms,bjorne2009extracting" separator="," yyseparator=","/>]</cite>.
Hence, modeling structure explicitly may be the key to obtaining performance gains.
Similarly, our work shows that modeling structure explicitly (using treeLSTM) yields better model performance than modeling it implicitly (using LSTM).</p>
      </para>
      <para xml:id="S6.SS5.p2">
        <p>Models that learn task-specific trees from data could be better than models that use conventional parsers to obtain the trees <cite class="ltx_citemacro_cite">[<bibref bibrefs="yogatama2017learning,choi2018learning" separator="," yyseparator=","/>]</cite>.
However, the learned trees are usually shallow and hard to interpret <cite class="ltx_citemacro_cite">[<bibref bibrefs="williams2018latent" separator="," yyseparator=","/>]</cite>.
Shallow trees make the paths between related tokens shorter but they do not always result in better performance.
For the binary trees of logographs, tree depth is unlikely to account for the improved performance, because these trees are not balanced (a balanced binary tree is the shallowest binary tree over a given number of leaves).
The improved performance is more likely due to the inductive bias introduced by the logographic structures.
We showed that by exploiting structures in a way that mirrors human intuition, treeLSTM could arrive at the general and correct solution more data-efficiently and effectively for pronunciation prediction and language modeling tasks concerning logographs (Chinese characters).
Better interpretability, resulting from the model following human intuition, provides some confidence that the model is general and is not exploiting statistical biases in the data.</p>
      </para>
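      <para>
        <p>The point about tree depth can be checked with a short sketch: over the same n leaves, a balanced binary tree has depth ceil(log2 n), whereas a maximally unbalanced, chain-shaped binary tree has depth n - 1. The helper names are hypothetical and the trees are abstract; the sketch does not reproduce the actual logograph trees.</p>
      </para>
```python
import math


def depth(tree) -> int:
    """Edges on the longest root-to-leaf path; a non-tuple leaf has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree)


def balanced(leaves):
    """Balanced binary tree over the leaves: the shallowest possible binary tree."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return (balanced(leaves[:mid]), balanced(leaves[mid:]))


def chain(leaves):
    """Maximally unbalanced, right-branching binary tree over the same leaves."""
    if len(leaves) == 1:
        return leaves[0]
    return (leaves[0], chain(leaves[1:]))


# Columns: n, balanced depth, ceil(log2 n), chain depth.
for n in (2, 4, 5, 8):
    leaves = list(range(n))
    print(n, depth(balanced(leaves)), math.ceil(math.log2(n)), depth(chain(leaves)))
```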
    </subsection>
    <subsection inlist="toc" xml:id="S6.SS6">
      <tags>
        <tag>VI-F</tag>
        <tag role="refnum">VI-F</tag>
        <tag role="typerefnum">§VI-F</tag>
      </tags>
      <title><tag close=" ">VI-F</tag><text font="italic">Potential Applications and Extensions</text></title>
      <para xml:id="S6.SS6.p1">
        <p>To tackle the out-of-vocabulary problem, it is common to apply pre-processing steps such as replacing infrequent characters, or characters unseen during training, with the UNK token.
However, these pre-processing steps could remove information stemming from the usage of the infrequent characters.
Hierarchical embeddings enable modeling Chinese text directly without these pre-processing steps.
By treating Chinese characters as recursive structures of common sub-units instead of independent tokens, hierarchical embeddings make it possible for a model to have a much bigger vocabulary.
Hierarchical embeddings also make learning representations of infrequent characters easier by leveraging the structural similarity between infrequent characters and common characters.
Thus, models that use hierarchical embeddings may be able to capture the intention behind the usage of infrequent characters.
Furthermore, hierarchical embeddings can also be used to model Japanese Kanji, which are logographs created using the same principles as Chinese logographs.</p>
      </para>
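      <para>
        <p>The contrast between UNK replacement and structure-based representations can be sketched as follows. Averaging sub-unit vectors is a deliberately simple stand-in for the treeLSTM composition, the frequency threshold and vectors are arbitrary, and only the top-level split of 蛛 and 蛐 into 虫 plus a second component is used. After UNK replacement the two rare characters become indistinguishable, whereas the composed vectors stay distinct while still sharing the 虫 component.</p>
      </para>
```python
from collections import Counter

import numpy as np


def replace_rare(tokens: list[str], min_count: int, unk: str = "UNK") -> list[str]:
    """Conventional pre-processing: characters seen fewer than min_count times become UNK."""
    counts = Counter(tokens)
    return [t if counts[t] >= min_count else unk for t in tokens]


def compose(char: str, decomposition: dict, sub_emb: dict) -> np.ndarray:
    """Stand-in for treeLSTM composition: average the vectors of a character's sub-units."""
    parts = decomposition.get(char, [char])
    return np.mean([sub_emb[p] for p in parts], axis=0)


tokens = ["的", "的", "的", "蛛", "蛐"]
print(replace_rare(tokens, min_count=2))  # both rare characters collapse to UNK

rng = np.random.default_rng(0)
sub_emb = {s: rng.normal(size=4) for s in ["虫", "朱", "曲"]}
decomposition = {"蛛": ["虫", "朱"], "蛐": ["虫", "曲"]}
v1, v2 = compose("蛛", decomposition, sub_emb), compose("蛐", decomposition, sub_emb)
print(np.allclose(v1, v2))  # False: the composed vectors remain distinguishable
```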
      <para xml:id="S6.SS6.p2">
        <p>We used hierarchical embeddings in the pronunciation prediction task and language modeling task.
However, other NLP tasks may also benefit from hierarchical embeddings, as previous work exploiting logograph structures has shown promising results in tasks such as machine translation <cite class="ltx_citemacro_cite">[<bibref bibrefs="karpinska2018subcharacter,zhang2018neural,zhang2019chinese" separator="," yyseparator=","/>]</cite> or textual error detection <cite class="ltx_citemacro_cite">[<bibref bibrefs="chen2015probabilistic" separator="," yyseparator=","/>]</cite>.
In particular, hierarchical embeddings may be useful in named entity recognition (NER), where infrequent characters may be used in names, or in poetry generation, where characters need to rhyme.</p>
      </para>
      <para xml:id="S6.SS6.p3">
        <p>The current work can be extended in a few different ways.
Although treeLSTM was used to construct the hierarchical embeddings, self-attention models such as the Transformer <cite class="ltx_citemacro_cite">[<bibref bibrefs="vaswani2017attention" separator="," yyseparator=","/>]</cite> might lead to even better performance, as they could learn patterns in trees that the current models cannot.
However, since self-attention models usually have many parameters, they may overfit the small amount of training data.
It would be interesting to see how well a bigger and more powerful model such as the Transformer can model logographic structures given limited data.
For language modeling, using hierarchical embeddings does not constrain the choice of the language model.
The AWD-LSTM can be replaced by a more powerful model such as the Transformer <cite class="ltx_citemacro_cite">[<bibref bibrefs="vaswani2017attention" separator="," yyseparator=","/>]</cite> or BERT <cite class="ltx_citemacro_cite">[<bibref bibrefs="devlin2019bert" separator="," yyseparator=","/>]</cite>.
It is possible that hierarchical embeddings may have little benefit for models such as the Transformer or BERT, which can exploit contextual information to deduce the representations of infrequent characters.
However, exploiting both contextual information and structural similarity to obtain better embeddings for infrequent characters should, in principle, be better than relying on contextual information alone.</p>
      </para>
    </subsection>
  </section>
  <section inlist="toc" xml:id="S7">
    <tags>
      <tag>VII</tag>
      <tag role="refnum">VII</tag>
      <tag role="typerefnum">§VII</tag>
    </tags>
    <title><tag close=" ">VII</tag><text font="smallcaps">Conclusion</text></title>
    <para xml:id="S7.p1">
      <p>Exploiting the recursive structures of logographs to build logographic embeddings can lead to embeddings that yield both better results and better interpretability.
We showed both quantitative and qualitative evidence that exploiting recursive structures boosts accuracy in logographs’ pronunciation prediction: the hierarchical embeddings are better than the embeddings constructed by LSTM, biLSTM, and CNN.
Hierarchical embeddings also consistently outperformed standard embeddings in language modeling on five different datasets.
Inspecting the inner workings of the models also revealed that treeLSTM conceivably resembles how humans perform reading tasks, suggesting that exploiting structures not only improves performance but might also help us develop more interpretable models.
Although this paper only considers two tasks, building better logographic (character) embeddings by exploiting recursive structures can potentially benefit other tasks such as machine translation.</p>
    </para>
  </section>
  <section xml:id="Sx1">
    <title>Acknowledgment</title>
    <para xml:id="Sx1.p1">
      <p>The authors would like to thank the anonymous reviewers for their constructive feedback to improve the paper.
In addition, the delightful discussions with Ai Ti Aw and Ed Hovy are also much appreciated.</p>
    </para>
  </section>
  <bibliography xml:id="bib">
    <title>References</title>
    <biblist>
      <bibitem key="hsiao2006analysis" xml:id="bib.bib1">
        <tags>
          <tag>[1]</tag>
          <tag role="refnum">1</tag>
        </tags>
        <bibblock>
J. H.-w. Hsiao and R. Shillcock, “Analysis of a Chinese phonetic compound
database: Implications for orthographic processing,” <emph font="italic">Journal of
Psycholinguistic Research</emph>, vol. 35, no. 5, 2006.
</bibblock>
      </bibitem>
      <bibitem key="ho1997phonological" xml:id="bib.bib2">
        <tags>
          <tag>[2]</tag>
          <tag role="refnum">2</tag>
        </tags>
        <bibblock>
C. S.-H. Ho and P. Bryant, “Phonological skills are important in learning to
read Chinese,” <emph font="italic">Developmental Psychology</emph>, vol. 33, no. 6, 1997.
</bibblock>
      </bibitem>
      <bibitem key="tai2015improved" xml:id="bib.bib3">
        <tags>
          <tag>[3]</tag>
          <tag role="refnum">3</tag>
        </tags>
        <bibblock>
K. S. Tai, R. Socher, and C. Manning, “Improved semantic representations from
tree-structured long short-term memory networks,” in <emph font="italic">Proceedings of
ACL</emph>, 2015.
</bibblock>
      </bibitem>
      <bibitem key="zhu2015long" xml:id="bib.bib4">
        <tags>
          <tag>[4]</tag>
          <tag role="refnum">4</tag>
        </tags>
        <bibblock>
X. Zhu, P. Sobhani, and H. Guo, “Long short-term memory over recursive
structures,” in <emph font="italic">Proceedings of ICML</emph>, 2015.
</bibblock>
      </bibitem>
      <bibitem key="mikolov2013distributed" xml:id="bib.bib5">
        <tags>
          <tag>[5]</tag>
          <tag role="refnum">5</tag>
        </tags>
        <bibblock>
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed
representations of words and phrases and their compositionality,” in
<emph font="italic">Proceedings of NIPS</emph>, 2013.
</bibblock>
      </bibitem>
      <bibitem key="graves2013generating" xml:id="bib.bib6">
        <tags>
          <tag>[6]</tag>
          <tag role="refnum">6</tag>
        </tags>
        <bibblock>
A. Graves, “Generating Sequences With Recurrent Neural Networks,”
<emph font="italic">CoRR</emph>, vol. abs/1308.0850, 2013.
</bibblock>
      </bibitem>
      <bibitem key="ngo2014minimal" xml:id="bib.bib7">
        <tags>
          <tag>[7]</tag>
          <tag role="refnum">7</tag>
        </tags>
        <bibblock>
H. G. Ngo, N. F. Chen, S. Sivadas, B. Ma, and H. Li, “A Minimal-Resource
Transliteration Framework for Vietnamese,” in <emph font="italic">Proceedings of
INTERSPEECH</emph>, 2014.
</bibblock>
      </bibitem>
      <bibitem key="ngo2015phonology" xml:id="bib.bib8">
        <tags>
          <tag>[8]</tag>
          <tag role="refnum">8</tag>
        </tags>
        <bibblock>
H. G. Ngo, N. F. Chen, M. Nguyen, B. Ma, and H. Li, “Phonology-augmented
statistical transliteration for low-resource languages,” in
<emph font="italic">Proceedings of INTERSPEECH</emph>, 2015.
</bibblock>
      </bibitem>
      <bibitem key="ngo2019phonology" xml:id="bib.bib9">
        <tags>
          <tag>[9]</tag>
          <tag role="refnum">9</tag>
        </tags>
        <bibblock>
H. G. Ngo, M. Nguyen, and N. F. Chen, “Phonology-augmented statistical
framework for machine transliteration using limited linguistic resources,”
<emph font="italic">IEEE/ACM Transactions on Audio, Speech and Language Processing
(TASLP)</emph>, vol. 27, no. 1, 2019.
</bibblock>
      </bibitem>
      <bibitem key="yamada2001syntax" xml:id="bib.bib10">
        <tags>
          <tag>[10]</tag>
          <tag role="refnum">10</tag>
        </tags>
        <bibblock>
K. Yamada and K. Knight, “A syntax-based statistical translation model,” in
<emph font="italic">Proceedings of ACL</emph>, 2001.
</bibblock>
      </bibitem>
      <bibitem key="eriguchi2016tree" xml:id="bib.bib11">
        <tags>
          <tag>[11]</tag>
          <tag role="refnum">11</tag>
        </tags>
        <bibblock>
A. Eriguchi, K. Hashimoto, and Y. Tsuruoka, “Tree-to-sequence attentional
neural machine translation,” in <emph font="italic">Proceedings of ACL</emph>, 2016.
</bibblock>
      </bibitem>
      <bibitem key="miyazaki2017japanese" xml:id="bib.bib12">
        <tags>
          <tag>[12]</tag>
          <tag role="refnum">12</tag>
        </tags>
        <bibblock>
R. Miyazaki and M. Komachi, “Japanese sentiment classification using a
tree-structured Long Short-Term Memory with attention,” in
<emph font="italic">Proceedings of PACLIC</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="bowman2016fast" xml:id="bib.bib13">
        <tags>
          <tag>[13]</tag>
          <tag role="refnum">13</tag>
        </tags>
        <bibblock>
S. R. Bowman, J. Gauthier, A. Rastogi, R. Gupta, C. Manning, and C. Potts, “A
fast unified model for parsing and sentence understanding,” in
<emph font="italic">Proceedings of ACL</emph>, 2016.
</bibblock>
      </bibitem>
      <bibitem key="dyer2016recurrent" xml:id="bib.bib14">
        <tags>
          <tag>[14]</tag>
          <tag role="refnum">14</tag>
        </tags>
        <bibblock>
C. Dyer, A. Kuncoro, M. Ballesteros, and N. A. Smith, “Recurrent neural
network grammars,” in <emph font="italic">Proceedings of NAACL-HLT</emph>, 2016.
</bibblock>
      </bibitem>
      <bibitem key="zhang2016top" xml:id="bib.bib15">
        <tags>
          <tag>[15]</tag>
          <tag role="refnum">15</tag>
        </tags>
        <bibblock>
X. Zhang, L. Lu, and M. Lapata, “Top-down Tree Long Short-Term Memory
Networks,” in <emph font="italic">Proceedings of NAACL-HLT</emph>, 2016.
</bibblock>
      </bibitem>
      <bibitem key="li2015tree" xml:id="bib.bib16">
        <tags>
          <tag>[16]</tag>
          <tag role="refnum">16</tag>
        </tags>
        <bibblock>
J. Li, T. Luong, D. Jurafsky, and E. Hovy, “When are tree structures
necessary for deep learning of representations?” in <emph font="italic">Proceedings of
EMNLP</emph>, 2015.
</bibblock>
      </bibitem>
      <bibitem key="lan2018toolkit" xml:id="bib.bib17">
        <tags>
          <tag>[17]</tag>
          <tag role="refnum">17</tag>
        </tags>
        <bibblock>
W. Lan and W. Xu, “Neural network models for paraphrase identification,
semantic textual similarity, natural language inference, and question
answering,” in <emph font="italic">Proceedings of COLING</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="morioka2008chise" xml:id="bib.bib18">
        <tags>
          <tag>[18]</tag>
          <tag role="refnum">18</tag>
        </tags>
        <bibblock>
T. Morioka, “CHISE: Character processing based on character ontology,” in
<emph font="italic">International Conference on Large-Scale Knowledge Resources</emph>. Springer, 2008.
</bibblock>
      </bibitem>
      <bibitem key="ramachandran2017unsupervised" xml:id="bib.bib19">
        <tags>
          <tag>[19]</tag>
          <tag role="refnum">19</tag>
        </tags>
        <bibblock>
P. Ramachandran, P. Liu, and Q. Le, “Unsupervised pretraining for sequence to
sequence learning,” in <emph font="italic">Proceedings of EMNLP</emph>, 2017.
</bibblock>
      </bibitem>
      <bibitem key="peters2018deep" xml:id="bib.bib20">
        <tags>
          <tag>[20]</tag>
          <tag role="refnum">20</tag>
        </tags>
        <bibblock>
M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and
L. Zettlemoyer, “Deep contextualized word representations,” in
<emph font="italic">Proceedings of NAACL-HLT</emph>, 2018, pp. 2227–2237.
</bibblock>
      </bibitem>
      <bibitem key="radford2018improving" xml:id="bib.bib21">
        <tags>
          <tag>[21]</tag>
          <tag role="refnum">21</tag>
        </tags>
        <bibblock>
A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language
understanding by generative pre-training,” 2018.
</bibblock>
      </bibitem>
      <bibitem key="howard2018universal" xml:id="bib.bib22">
        <tags>
          <tag>[22]</tag>
          <tag role="refnum">22</tag>
        </tags>
        <bibblock>
J. Howard and S. Ruder, “Universal language model fine-tuning for text
classification,” in <emph font="italic">Proceedings of ACL</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="devlin2019bert" xml:id="bib.bib23">
        <tags>
          <tag>[23]</tag>
          <tag role="refnum">23</tag>
        </tags>
        <bibblock>
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep
bidirectional transformers for language understanding,” in
<emph font="italic">Proceedings of NAACL-HLT</emph>, 2019, pp. 4171–4186.
</bibblock>
      </bibitem>
      <bibitem key="li2018subword" xml:id="bib.bib24">
        <tags>
          <tag>[24]</tag>
          <tag role="refnum">24</tag>
        </tags>
        <bibblock>
B. Li, A. Drozd, T. Liu, and X. Du, “Subword-level composition functions for
learning word embeddings,” in <emph font="italic">Proceedings of the Second Workshop on
Subword and Character Level Models in NLP</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="irsoy2014deep" xml:id="bib.bib25">
        <tags>
          <tag>[25]</tag>
          <tag role="refnum">25</tag>
        </tags>
        <bibblock>
O. Irsoy and C. Cardie, “Deep recursive neural networks for compositionality
in language,” in <emph font="italic">Proceedings of NIPS</emph>, 2014, pp. 2096–2104.
</bibblock>
      </bibitem>
      <bibitem key="neubig2017fly" xml:id="bib.bib26">
        <tags>
          <tag>[26]</tag>
          <tag role="refnum">26</tag>
        </tags>
        <bibblock>
G. Neubig, Y. Goldberg, and C. Dyer, “On-the-fly operation batching in
dynamic computation graphs,” in <emph font="italic">Proceedings of NIPS</emph>, 2017, pp.
3971–3981.
</bibblock>
      </bibitem>
      <bibitem key="jia2017adversarial" xml:id="bib.bib27">
        <tags>
          <tag>[27]</tag>
          <tag role="refnum">27</tag>
        </tags>
        <bibblock>
R. Jia and P. Liang, “Adversarial examples for evaluating reading
comprehension systems,” in <emph font="italic">Proceedings of EMNLP</emph>, 2017, pp.
2021–2031.
</bibblock>
      </bibitem>
      <bibitem key="lake2018generalization" xml:id="bib.bib28">
        <tags>
          <tag>[28]</tag>
          <tag role="refnum">28</tag>
        </tags>
        <bibblock>
B. Lake and M. Baroni, “Generalization without systematicity: On the
compositional skills of sequence-to-sequence recurrent networks,” in
<emph font="italic">Proceedings of ICML</emph>, 2018, pp. 2879–2888.
</bibblock>
      </bibitem>
      <bibitem key="mitchell2018extrapolation" xml:id="bib.bib29">
        <tags>
          <tag>[29]</tag>
          <tag role="refnum">29</tag>
        </tags>
        <bibblock>
J. Mitchell, P. Stenetorp, P. Minervini, and S. Riedel, “Extrapolation in
NLP,” in <emph font="italic">Proceedings of the Workshop on Generalization in the Age of
Deep Learning</emph>, 2018, pp. 28–33.
</bibblock>
      </bibitem>
      <bibitem key="yang2010note" xml:id="bib.bib30">
        <tags>
          <tag>[30]</tag>
          <tag role="refnum">30</tag>
        </tags>
        <bibblock>
Z. Yang, X. Sun, and J. W. Hardin, “A note on the tests for clustered
matched-pair binary data,” <emph font="italic">Biometrical Journal</emph>, vol. 52, no. 5,
2010.
</bibblock>
      </bibitem>
      <bibitem key="srivastava14dropout" xml:id="bib.bib31">
        <tags>
          <tag>[31]</tag>
          <tag role="refnum">31</tag>
        </tags>
        <bibblock>
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov,
“Dropout: A Simple Way to Prevent Neural Networks from Overfitting,”
<emph font="italic">JMLR</emph>, vol. 15, 2014.
</bibblock>
      </bibitem>
      <bibitem key="kingma2015adam" xml:id="bib.bib32">
        <tags>
          <tag>[32]</tag>
          <tag role="refnum">32</tag>
        </tags>
        <bibblock>
D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in
<emph font="italic">Proceedings of ICLR</emph>, 2015.
</bibblock>
      </bibitem>
      <bibitem key="haussler1988quantifying" xml:id="bib.bib33">
        <tags>
          <tag>[33]</tag>
          <tag role="refnum">33</tag>
        </tags>
        <bibblock>
D. Haussler, “Quantifying inductive bias: AI learning algorithms and
Valiant’s learning framework,” <emph font="italic">Artificial Intelligence</emph>, vol. 36,
no. 2, 1988.
</bibblock>
      </bibitem>
      <bibitem key="Goodfellow-et-al-2016" xml:id="bib.bib34">
        <tags>
          <tag>[34]</tag>
          <tag role="refnum">34</tag>
        </tags>
        <bibblock>
I. Goodfellow, Y. Bengio, and A. Courville, <emph font="italic">Deep Learning</emph>. MIT Press, 2016.
</bibblock>
      </bibitem>
      <bibitem key="sutskever2014sequence" xml:id="bib.bib35">
        <tags>
          <tag>[35]</tag>
          <tag role="refnum">35</tag>
        </tags>
        <bibblock>
I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with
neural networks,” in <emph font="italic">Proceedings of NIPS</emph>, 2014.
</bibblock>
      </bibitem>
      <bibitem key="nguyen2016regulating" xml:id="bib.bib36">
        <tags>
          <tag>[36]</tag>
          <tag role="refnum">36</tag>
        </tags>
        <bibblock>
M. Nguyen, H. G. Ngo, and N. F. Chen, “Regulating Orthography-Phonology
Relationship for English to Thai Transliteration,” in <emph font="italic">Proceedings of
the Sixth Named Entity Workshop</emph>, 2016, pp. 83–87.
</bibblock>
      </bibitem>
      <bibitem key="xue2005penn" xml:id="bib.bib37">
        <tags>
          <tag>[37]</tag>
          <tag role="refnum">37</tag>
        </tags>
        <bibblock>
N. Xue, F. Xia, F.-D. Chiou, and M. Palmer, “The Penn Chinese TreeBank:
Phrase structure annotation of a large corpus,” <emph font="italic">Natural Language
Engineering</emph>, vol. 11, no. 2, 2005.
</bibblock>
      </bibitem>
      <bibitem key="emerson2005second" xml:id="bib.bib38">
        <tags>
          <tag>[38]</tag>
          <tag role="refnum">38</tag>
        </tags>
        <bibblock>
T. Emerson, “The second International Chinese Word Segmentation Bakeoff,”
in <emph font="italic">Proceedings of the Fourth SIGHAN Workshop on Chinese Language
Processing</emph>, 2005.
</bibblock>
      </bibitem>
      <bibitem key="kawakami2018unsupervised" xml:id="bib.bib39">
        <tags>
          <tag>[39]</tag>
          <tag role="refnum">39</tag>
        </tags>
        <bibblock>
K. Kawakami, C. Dyer, and P. Blunsom, “Unsupervised Word Discovery with
Segmental Neural Language Models,” <emph font="italic">CoRR</emph>, vol. abs/1811.09353, 2018.
</bibblock>
      </bibitem>
      <bibitem key="merity2017regularizing" xml:id="bib.bib40">
        <tags>
          <tag>[40]</tag>
          <tag role="refnum">40</tag>
        </tags>
        <bibblock>
S. Merity, N. S. Keskar, and R. Socher, “Regularizing and optimizing LSTM
language models,” in <emph font="italic">Proceedings of ICLR</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="ling2015finding" xml:id="bib.bib41">
        <tags>
          <tag>[41]</tag>
          <tag role="refnum">41</tag>
        </tags>
        <bibblock>
W. Ling, C. Dyer, A. W. Black, I. Trancoso, R. Fermandez, S. Amir, L. Marujo,
and T. Luís, “Finding function in form: Compositional character models for
open vocabulary word representation,” in <emph font="italic">Proceedings of EMNLP</emph>, 2015.
</bibblock>
      </bibitem>
      <bibitem key="dai2017glyph" xml:id="bib.bib42">
        <tags>
          <tag>[42]</tag>
          <tag role="refnum">42</tag>
        </tags>
        <bibblock>
F. Z. Dai and Z. Cai, “Glyph-aware Embedding of Chinese Characters,” in
<emph font="italic">Proceedings of the First Workshop on Subword and Character Level Models
in NLP</emph>, 2017, pp. 64–69.
</bibblock>
      </bibitem>
      <bibitem key="siddharthan2014hybrid" xml:id="bib.bib43">
        <tags>
          <tag>[43]</tag>
          <tag role="refnum">43</tag>
        </tags>
        <bibblock>
A. Siddharthan and A. Mandya, “Hybrid text simplification using synchronous
dependency grammars with hand-written and automatically harvested rules,”
in <emph font="italic">Proceedings of EACL</emph>, 2014.
</bibblock>
      </bibitem>
      <bibitem key="quirk2005dependency" xml:id="bib.bib44">
        <tags>
          <tag>[44]</tag>
          <tag role="refnum">44</tag>
        </tags>
        <bibblock>
C. Quirk, A. Menezes, and C. Cherry, “Dependency treelet translation:
Syntactically informed phrasal SMT,” in <emph font="italic">Proceedings of ACL</emph>, 2005.
</bibblock>
      </bibitem>
      <bibitem key="nakazawa2016insertion" xml:id="bib.bib45">
        <tags>
          <tag>[45]</tag>
          <tag role="refnum">45</tag>
        </tags>
        <bibblock>
T. Nakazawa, J. Richardson, and S. Kurohashi, “Insertion position selection
model for flexible non-terminals in dependency tree-to-tree machine
translation,” in <emph font="italic">Proceedings of EMNLP</emph>, 2016.
</bibblock>
      </bibitem>
      <bibitem key="chen2017improved" xml:id="bib.bib46">
        <tags>
          <tag>[46]</tag>
          <tag role="refnum">46</tag>
        </tags>
        <bibblock>
H. Chen, S. Huang, D. Chiang, and J. Chen, “Improved neural machine
translation with a syntax-aware encoder and decoder,” in <emph font="italic">Proceedings
of ACL</emph>, 2017.
</bibblock>
      </bibitem>
      <bibitem key="socher2011semi" xml:id="bib.bib47">
        <tags>
          <tag>[47]</tag>
          <tag role="refnum">47</tag>
        </tags>
        <bibblock>
R. Socher, J. Pennington, E. H. Huang, A. Ng, and C. Manning,
“Semi-supervised recursive autoencoders for predicting sentiment
distributions,” in <emph font="italic">Proceedings of EMNLP</emph>, 2011.
</bibblock>
      </bibitem>
      <bibitem key="socher2012semantic" xml:id="bib.bib48">
        <tags>
          <tag>[48]</tag>
          <tag role="refnum">48</tag>
        </tags>
        <bibblock>
R. Socher, B. Huval, C. Manning, and A. Ng, “Semantic compositionality
through recursive matrix-vector spaces,” in <emph font="italic">Proceedings of EMNLP</emph>,
2012.
</bibblock>
      </bibitem>
      <bibitem key="socher2013recursive" xml:id="bib.bib49">
        <tags>
          <tag>[49]</tag>
          <tag role="refnum">49</tag>
        </tags>
        <bibblock>
R. Socher, A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng, and C. Potts,
“Recursive deep models for semantic compositionality over a sentiment
treebank,” in <emph font="italic">Proceedings of EMNLP</emph>, 2013.
</bibblock>
      </bibitem>
      <bibitem key="eriguchi2017learning" xml:id="bib.bib50">
        <tags>
          <tag>[50]</tag>
          <tag role="refnum">50</tag>
        </tags>
        <bibblock>
A. Eriguchi, Y. Tsuruoka, and K. Cho, “Learning to parse and translate
improves neural machine translation,” in <emph font="italic">Proceedings of ACL</emph>, 2017,
pp. 72–78.
</bibblock>
      </bibitem>
      <bibitem key="yogatama2017learning" xml:id="bib.bib51">
        <tags>
          <tag>[51]</tag>
          <tag role="refnum">51</tag>
        </tags>
        <bibblock>
D. Yogatama, P. Blunsom, C. Dyer, E. Grefenstette, and L. Wang, “Learning to
compose words into sentences with reinforcement learning,” in
<emph font="italic">Proceedings of ICLR</emph>, 2017.
</bibblock>
      </bibitem>
      <bibitem key="zhang2018neural" xml:id="bib.bib52">
        <tags>
          <tag>[52]</tag>
          <tag role="refnum">52</tag>
        </tags>
        <bibblock>
L. Zhang and M. Komachi, “Neural machine translation of logographic language
using sub-character level information,” in <emph font="italic">Proceedings of the Third
Conference on Machine Translation: Research Papers</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="su2017learning" xml:id="bib.bib53">
        <tags>
          <tag>[53]</tag>
          <tag role="refnum">53</tag>
        </tags>
        <bibblock>
T.-R. Su and H.-Y. Lee, “Learning Chinese word representations from glyphs of
characters,” in <emph font="italic">Proceedings of EMNLP</emph>, 2017.
</bibblock>
      </bibitem>
      <bibitem key="song2018joint" xml:id="bib.bib54">
        <tags>
          <tag>[54]</tag>
          <tag role="refnum">54</tag>
        </tags>
        <bibblock>
Y. Song, S. Shi, and J. Li, “Joint learning embeddings for Chinese words
and their components via ladder structured networks,” in <emph font="italic">Proceedings
of IJCAI</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="nguyen2018multimodal" xml:id="bib.bib55">
        <tags>
          <tag>[55]</tag>
          <tag role="refnum">55</tag>
        </tags>
        <bibblock>
M. Nguyen, H. G. Ngo, and N. F. Chen, “Multimodal neural pronunciation
modeling for spoken languages with logographic origin,” in
<emph font="italic">Proceedings of EMNLP</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="liu2017learning" xml:id="bib.bib56">
        <tags>
          <tag>[56]</tag>
          <tag role="refnum">56</tag>
        </tags>
        <bibblock>
F. Liu, H. Lu, C. Lo, and G. Neubig, “Learning character-level
compositionality with visual features,” in <emph font="italic">Proceedings of ACL</emph>, 2017,
pp. 2059–2068.
</bibblock>
      </bibitem>
      <bibitem key="toyama2017utilizing" xml:id="bib.bib57">
        <tags>
          <tag>[57]</tag>
          <tag role="refnum">57</tag>
        </tags>
        <bibblock>
Y. Toyama, M. Miwa, and Y. Sasaki, “Utilizing Visual Forms of Japanese
Characters for Neural Review Classification,” in <emph font="italic">Proceedings of
IJCNLP</emph>, vol. 2, 2017.
</bibblock>
      </bibitem>
      <bibitem key="shi2015radical" xml:id="bib.bib58">
        <tags>
          <tag>[58]</tag>
          <tag role="refnum">58</tag>
        </tags>
        <bibblock>
X. Shi, J. Zhai, X. Yang, Z. Xie, and C. Liu, “Radical embedding: Delving
deeper to Chinese radicals,” in <emph font="italic">Proceedings of ACL</emph>, vol. 2, 2015.
</bibblock>
      </bibitem>
      <bibitem key="peng2017radical" xml:id="bib.bib59">
        <tags>
          <tag>[59]</tag>
          <tag role="refnum">59</tag>
        </tags>
        <bibblock>
H. Peng, E. Cambria, and X. Zou, “Radical-based hierarchical embeddings for
Chinese sentiment analysis at sentence level,” in <emph font="italic">Proceedings of the 30th
International FLAIRS Conference, Marco Island</emph>, 2017, pp. 347–352.
</bibblock>
      </bibitem>
      <bibitem key="yu2017joint" xml:id="bib.bib60">
        <tags>
          <tag>[60]</tag>
          <tag role="refnum">60</tag>
        </tags>
        <bibblock>
J. Yu, X. Jian, H. Xin, and Y. Song, “Joint embeddings of Chinese words,
characters, and fine-grained subcharacter components,” in <emph font="italic">Proceedings
of EMNLP</emph>, 2017.
</bibblock>
      </bibitem>
      <bibitem key="yin2016multi" xml:id="bib.bib61">
        <tags>
          <tag>[61]</tag>
          <tag role="refnum">61</tag>
        </tags>
        <bibblock>
R. Yin, Q. Wang, P. Li, R. Li, and B. Wang, “Multi-granularity Chinese word
embedding,” in <emph font="italic">Proceedings of EMNLP</emph>, 2016.
</bibblock>
      </bibitem>
      <bibitem key="ke2017radical" xml:id="bib.bib62">
        <tags>
          <tag>[62]</tag>
          <tag role="refnum">62</tag>
        </tags>
        <bibblock>
Y. Ke and M. Hagiwara, “Radical-level ideograph encoder for RNN-based
sentiment analysis of Chinese and Japanese,” in <emph font="italic">Asian Conference on
Machine Learning</emph>, 2017, pp. 561–573.
</bibblock>
      </bibitem>
      <bibitem key="karpinska2018subcharacter" xml:id="bib.bib63">
        <tags>
          <tag>[63]</tag>
          <tag role="refnum">63</tag>
        </tags>
        <bibblock>
M. Karpinska, B. Li, A. Rogers, and A. Drozd, “Subcharacter information in
Japanese embeddings: When is it worth it?” in <emph font="italic">Proceedings of the
Workshop on the Relevance of Linguistic Structure in Neural Architectures for
NLP</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="dong2016character" xml:id="bib.bib64">
        <tags>
          <tag>[64]</tag>
          <tag role="refnum">64</tag>
        </tags>
        <bibblock>
C. Dong, J. Zhang, C. Zong, M. Hattori, and H. Di, “Character-based LSTM-CRF
with radical-level features for Chinese named entity recognition,” in
<emph font="italic">Natural Language Understanding and Intelligent Applications</emph>. Springer, 2016.
</bibblock>
      </bibitem>
      <bibitem key="han2017dual" xml:id="bib.bib65">
        <tags>
          <tag>[65]</tag>
          <tag role="refnum">65</tag>
        </tags>
        <bibblock>
H. Han, X. Yang, L. Wu, H. Yan, Z. Gao, Y. Feng, and G. Townsend, “Dual long
short-term memory networks for sub-character representation learning,”
<emph font="italic">CoRR</emph>, vol. abs/1712.08841, 2017.
</bibblock>
      </bibitem>
      <bibitem key="zhuang2017natural" xml:id="bib.bib66">
        <tags>
          <tag>[66]</tag>
          <tag role="refnum">66</tag>
        </tags>
        <bibblock>
H. Zhuang, C. Wang, C. Li, Q. Wang, and X. Zhou, “Natural Language Processing
Service Based on Stroke-Level Convolutional Networks for Chinese Text
Classification,” in <emph font="italic">Proceedings of the IEEE International
Conference on Web Services (ICWS)</emph>, 2017.
</bibblock>
      </bibitem>
      <bibitem key="cao2018cw2vec" xml:id="bib.bib67">
        <tags>
          <tag>[67]</tag>
          <tag role="refnum">67</tag>
        </tags>
        <bibblock>
S. Cao, W. Lu, J. Zhou, and X. Li, “cw2vec: Learning Chinese word embeddings
with stroke n-grams,” in <emph font="italic">Proceedings of the AAAI Conference on
Artificial Intelligence</emph>, 2018, pp. 5053–5061.
</bibblock>
      </bibitem>
      <bibitem key="bojanowski2017enriching" xml:id="bib.bib68">
        <tags>
          <tag>[68]</tag>
          <tag role="refnum">68</tag>
        </tags>
        <bibblock>
P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors
with subword information,” <emph font="italic">Transactions of ACL</emph>, vol. 5, 2017.
</bibblock>
      </bibitem>
      <bibitem key="zhao2018generalizing" xml:id="bib.bib69">
        <tags>
          <tag>[69]</tag>
          <tag role="refnum">69</tag>
        </tags>
        <bibblock>
J. Zhao, S. Mudgal, and Y. Liang, “Generalizing word embeddings using bag of
subwords,” in <emph font="italic">Proceedings of EMNLP</emph>, 2018, pp. 601–606.
</bibblock>
      </bibitem>
      <bibitem key="kim2016character" xml:id="bib.bib70">
        <tags>
          <tag>[70]</tag>
          <tag role="refnum">70</tag>
        </tags>
        <bibblock>
Y. Kim, Y. Jernite, D. Sontag, and A. Rush, “Character-aware neural language
models,” in <emph font="italic">Proceedings of AAAI</emph>, 2016.
</bibblock>
      </bibitem>
      <bibitem key="papay2018addressing" xml:id="bib.bib71">
        <tags>
          <tag>[71]</tag>
          <tag role="refnum">71</tag>
        </tags>
        <bibblock>
S. Papay, S. Padó, and N. T. Vu, “Addressing low-resource scenarios with
character-aware embeddings,” in <emph font="italic">Proceedings of the Second Workshop on
Subword and Character Level Models in NLP</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="pinter2017mimicking" xml:id="bib.bib72">
        <tags>
          <tag>[72]</tag>
          <tag role="refnum">72</tag>
        </tags>
        <bibblock>
Y. Pinter, R. Guthrie, and J. Eisenstein, “Mimicking word embeddings using
subword RNNs,” in <emph font="italic">Proceedings of EMNLP</emph>, 2017.
</bibblock>
      </bibitem>
      <bibitem key="kim2018learning" xml:id="bib.bib73">
        <tags>
          <tag>[73]</tag>
          <tag role="refnum">73</tag>
        </tags>
        <bibblock>
Y. Kim, K.-M. Kim, J.-M. Lee, and S. Lee, “Learning to generate word
representations using subword information,” in <emph font="italic">Proceedings of
COLING</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="schick2019attentive" xml:id="bib.bib74">
        <tags>
          <tag>[74]</tag>
          <tag role="refnum">74</tag>
        </tags>
        <bibblock>
T. Schick and H. Schütze, “Attentive Mimicking: Better word embeddings by
attending to informative contexts,” in <emph font="italic">Proceedings of NAACL-HLT</emph>,
2019.
</bibblock>
      </bibitem>
      <bibitem key="li1993analysis" xml:id="bib.bib75">
        <tags>
          <tag>[75]</tag>
          <tag role="refnum">75</tag>
        </tags>
        <bibblock>
Y. Li and J. Kang, “Analysis of phonetics of the ideophonetic characters in
Modern Chinese,” <emph font="italic">Information analysis of usage of characters in
modern Chinese</emph>, pp. 84–98, 1993.
</bibblock>
      </bibitem>
      <bibitem key="nguyen2017sub" xml:id="bib.bib76">
        <tags>
          <tag>[76]</tag>
          <tag role="refnum">76</tag>
        </tags>
        <bibblock>
V. Nguyen, J. Brooke, and T. Baldwin, “Sub-character neural language
modelling in Japanese,” in <emph font="italic">Proceedings of the First Workshop on
Subword and Character Level Models in NLP</emph>, 2017.
</bibblock>
      </bibitem>
      <bibitem key="kuncoro2018lstms" xml:id="bib.bib77">
        <tags>
          <tag>[77]</tag>
          <tag role="refnum">77</tag>
        </tags>
        <bibblock>
A. Kuncoro, C. Dyer, J. Hale, D. Yogatama, S. Clark, and P. Blunsom, “LSTMs
can learn syntax-sensitive dependencies well, but modeling structure makes
them better,” in <emph font="italic">Proceedings of ACL</emph>, 2018.
</bibblock>
      </bibitem>
      <bibitem key="bjorne2009extracting" xml:id="bib.bib78">
        <tags>
          <tag>[78]</tag>
          <tag role="refnum">78</tag>
        </tags>
        <bibblock>
J. Björne, J. Heimonen, F. Ginter, A. Airola, T. Pahikkala, and
T. Salakoski, “Extracting complex biological events with rich graph-based
feature sets,” in <emph font="italic">Proceedings of the Workshop on Current Trends in
Biomedical Natural Language Processing: Shared Task</emph>, 2009, pp. 10–18.
</bibblock>
      </bibitem>
      <bibitem key="choi2018learning" xml:id="bib.bib79">
        <tags>
          <tag>[79]</tag>
          <tag role="refnum">79</tag>
        </tags>
        <bibblock>
J. Choi, K. M. Yoo, and S.-G. Lee, “Learning to compose task-specific tree
structures,” in <emph font="italic">Proceedings of AAAI</emph>, 2018, pp. 5094–5101.
</bibblock>
      </bibitem>
      <bibitem key="williams2018latent" xml:id="bib.bib80">
        <tags>
          <tag>[80]</tag>
          <tag role="refnum">80</tag>
        </tags>
        <bibblock>
A. Williams, A. Drozdov, and S. R. Bowman, “Do latent tree learning models
identify meaningful structure in sentences?” <emph font="italic">Transactions of ACL</emph>,
vol. 6, 2018.
</bibblock>
      </bibitem>
      <bibitem key="zhang2019chinese" xml:id="bib.bib81">
        <tags>
          <tag>[81]</tag>
          <tag role="refnum">81</tag>
        </tags>
        <bibblock>
L. Zhang and M. Komachi, “Chinese-Japanese Unsupervised Neural Machine
Translation Using Sub-character Level Information,” <emph font="italic">CoRR</emph>, vol.
abs/1903.00149, 2019.
</bibblock>
      </bibitem>
      <bibitem key="chen2015probabilistic" xml:id="bib.bib82">
        <tags>
          <tag>[82]</tag>
          <tag role="refnum">82</tag>
        </tags>
        <bibblock>
K.-Y. Chen, H.-M. Wang, and H.-H. Chen, “A probabilistic framework for
Chinese spelling check,” <emph font="italic">ACM Transactions on Asian and Low-Resource
Language Information Processing (TALLIP)</emph>, vol. 14, no. 4, pp. 15:1–15:17,
2015.
</bibblock>
      </bibitem>
      <bibitem key="vaswani2017attention" xml:id="bib.bib83">
        <tags>
          <tag>[83]</tag>
          <tag role="refnum">83</tag>
        </tags>
        <bibblock>
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in
<emph font="italic">Proceedings of NIPS</emph>, 2017, pp. 5998–6008.
</bibblock>
      </bibitem>
    </biblist>
  </bibliography>
</document>
