
Dependency Parsing

This post mainly introduces dependency parsing, covering: the differences between dependency parsing and constituency parsing and their respective advantages; the two main approaches to dependency parsing (transition-based and MST); and dependency parsing with RNNs.

Source material

[1] Jurafsky and Martin Ch. 14 (3rd edition)

[2] Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations (Eliyahu Kiperwasser, Yoav Goldberg, 2016)

The main text below is in English.

Intro

Constituents vs. Dependencies

  • Syntax is often described in terms of constituency
  • Traditional grammars model constituent structure: they capture the configurational patterns of sentences.
  • In other languages (e.g., German), there is little evidence for the existence of a VP constituent.
  • Constituent structures leave out much semantic information.
  • A dependency structure consists of dependency relations, which are binary and asymmetric.
  • Dependency syntax is closer to semantics
  • Dependency syntax is still (usually) tree-like: connected, single headed, acyclic.
  • Dependency trees can be non-projective; a non-projective dependency grammar is not context-free (see the projectivity sketch after this list).
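
Concretely, projectivity means that no two dependency arcs cross when drawn above the sentence. Here is a minimal sketch of such a check, assuming words are 1-indexed with 0 denoting ROOT; the helper name and the example head arrays are illustrative, not from the original sources.

```python
def is_projective(heads):
    """Check whether a dependency tree is projective, i.e., no two arcs cross.
    heads[d-1] is the head of word d (words are 1-indexed); head 0 is ROOT."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for idx, (i, j) in enumerate(arcs):
        for (k, l) in arcs[idx + 1:]:
            # Two arcs cross if exactly one endpoint of one arc lies
            # strictly inside the span of the other.
            if i < k < j < l or k < i < l < j:
                return False
    return True

print(is_projective([2, 0, 2]))     # "She ate fish": no crossing arcs -> True
print(is_projective([3, 0, 2, 1]))  # hypothetical heads with crossing arcs -> False
```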

Parsers

Dependency Parsing

  • Why treat dependency parsing as a distinct topic, separate from constituency parsing:

    • context-free parsing algorithms base their decisions on adjacency;
    • Long-distance dependency (non-adjacency): in a dependency structure, a dependent need not be adjacent to its head (even if the structure is projective);
    • we need new parsing algorithms to deal with non-adjacency (and with non-projectivity if present).
  • Two types of dependency parsers

    • graph-based dependency parsing, based on maximum spanning trees (MST parser)
    • transition-based dependency parsing, an extension of shift-reduce parsing (MALT parser)

Maximum Spanning Trees (MST)

  • Goal: find the highest scoring dependency tree $\mathbf{y}$ in the space of all possible trees for a sentence $\mathbf{x}$.
  • Formally:

    • The score of a dependency edge $(i,j)$ is a function $s(i,j)$.
    • Then the score of a dependency tree $\mathbf{y}$ for sentence $\mathbf{x}$ is the sum of its edge scores: $s(\mathbf{x}, \mathbf{y}) = \sum_{(i,j) \in \mathbf{y}} s(i,j)$.
  • Process:

    • start with a totally connected graph $G$, i.e., assume a directed edge between every pair of words;
    • assume you have a scoring function that assigns a score $s(i,j)$ to every edge $(i,j)$;
    • find the maximum spanning tree (MST) of $G$, i.e., the directed tree with the highest overall score that includes all nodes of $G$; the highest-scoring parse is the MST of $G$ (a brute-force toy version of this search is sketched after this list);
    • this is possible in $O(n^2)$ time using the Chu-Liu-Edmonds algorithm (more generally $O(mn)$, where $m$ is the number of edges and $n$ the number of nodes);
    • the algorithm finds an MST which is not guaranteed to be projective.
  • Scoring edges with a neural network

    • An effective formulation scores the edge from head $w_j$ to dependent $w_i$ as $s(j, i) = g(\mathbf{a}_j, \mathbf{a}_i)$, where:
    • we get $\mathbf{a}_i$ by concatenating the hidden states of a forward and a backward RNN at position $i$;
    • the function $g(\mathbf{a}_j, \mathbf{a}_i)$ computes an association score telling us how much word $w_i$ prefers word $w_j$ as its head.
    • Association scores are a useful way to select from a dynamic group of candidates, and they underlie the idea of attention.
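
As a toy, end-to-end illustration of the graph-based formulation: in the sketch below, a random score matrix stands in for the learned scorer $g(\mathbf{a}_j, \mathbf{a}_i)$, the score of a tree is the sum of its edge scores, and the best tree is found by brute-force search over all head assignments rather than by Chu-Liu-Edmonds. This is only feasible for very short sentences, and all function names are hypothetical.

```python
import itertools
import numpy as np

def tree_score(heads, scores):
    """Score of a dependency tree: the sum of its edge scores s(head, dependent)."""
    return sum(scores[h, d] for d, h in enumerate(heads, start=1))

def is_tree(heads):
    """Check that every word reaches ROOT (node 0) without running into a cycle."""
    n = len(heads)
    for d in range(1, n + 1):
        seen, node = set(), d
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree_bruteforce(scores):
    """Exhaustively search all head assignments; only feasible for tiny n."""
    n = scores.shape[0] - 1            # scores is (n+1) x (n+1); index 0 is ROOT
    best, best_heads = -np.inf, None
    for heads in itertools.product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = tree_score(heads, scores)
        if total > best:
            best, best_heads = total, heads
    return best_heads, best

# Toy example: 3 words plus ROOT, random edge scores standing in for g(a_j, a_i).
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
heads, score = best_tree_bruteforce(scores)
print("predicted heads:", heads, "tree score:", round(float(score), 3))
```

A real parser would replace the random matrix with BiLSTM-derived edge scores and the brute-force search with Chu-Liu-Edmonds.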

Transition-based dependency parsing

  • for a given parse state, the transition system defines a set of actions which the parser can take.
  • if more than one action is applicable, a classifier (e.g., an SVM) is used to decide which action to take.
  • This also requires a mechanism to compute scores over a set of (possibly dynamic) candidates: in the paper [2], an MLP is used to calculate the scores.
  • Make sure you understand the step-by-step process of transition-based parsing (a toy sketch follows this list).
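
To make the transition process concrete, here is a minimal sketch of the arc-standard system (one common transition set; arc-eager and arc-hybrid are alternatives), driven by a static oracle that consults the gold heads instead of a trained classifier. In a real parser, the choice among SHIFT, LEFT-ARC and RIGHT-ARC would come from the classifier (e.g., an MLP over BiLSTM features). The sketch assumes a projective gold tree; words are 1-indexed, 0 denotes ROOT, and all names are illustrative.

```python
def arc_standard_oracle_parse(gold_heads):
    """Parse a sentence with arc-standard transitions, choosing each action
    with a static oracle that looks at the gold heads (1-indexed words,
    head 0 = ROOT). Returns the predicted head of every word."""
    n = len(gold_heads)
    stack = [0]                        # start with ROOT on the stack
    buffer = list(range(1, n + 1))     # words still to be processed
    pred_heads = [None] * (n + 1)      # pred_heads[d] = head assigned to word d

    def has_pending_children(h):
        # h still has a gold child that has not been attached yet
        return any(gold_heads[d - 1] == h and pred_heads[d] is None
                   for d in range(1, n + 1))

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            # LEFT-ARC: the stack top is the gold head of the word below it
            if below != 0 and gold_heads[below - 1] == top:
                pred_heads[below] = top
                stack.pop(-2)
                continue
            # RIGHT-ARC: the word below is the gold head of the top,
            # and the top has already collected all of its children
            if gold_heads[top - 1] == below and not has_pending_children(top):
                pred_heads[top] = below
                stack.pop()
                continue
        # otherwise SHIFT the next buffer word onto the stack
        stack.append(buffer.pop(0))

    return pred_heads[1:]

# Toy example: "She ate fish" with gold arcs ate->She, ROOT->ate, ate->fish
print(arc_standard_oracle_parse([2, 0, 2]))   # expected [2, 0, 2]
```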

Comparison

Comparing MST and transition-based parsers:

  • the MST parser selects the globally optimal tree, given a set of edges with scores;
  • it can naturally handle projective and non-projective trees;
  • a transition-based parser makes a sequence of local decisions about the best parse action;
  • it can be extended to non-projective dependency trees by changing the transition set;
  • accuracies are similar, but transition-based is faster;
  • both require classifiers over (possibly dynamic) sets of candidates, and these can be implemented using neural networks conditioned on bidirectional RNN encodings of the sentence.

References

[1] Jurafsky, D., & Martin, J. H. (2014). Speech and language processing (Vol. 3). London: Pearson.

[2] Kiperwasser, E., & Goldberg, Y. (2016). Simple and accurate dependency parsing using bidirectional LSTM feature representations. arXiv preprint arXiv:1603.04351.

Modified 18th-May-2018 12:00