
Linguistically-Informed Self-Attention for Semantic Role Labeling

1.

LISA: Linguistically-Informed Self-Attention
for Semantic Role Labeling
Emma Strubell¹, Patrick Verga¹, Daniel Andor², David Weiss², Andrew McCallum¹

2.

Want fast, accurate, robust NLP
( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken))
(, ,)
(ADJP (NML (CD 61) (NNS years))
(JJ old))
(, ,))
(VP (MD will)
(VP (VB join)
(NP (DT the) (NN board))
(PP-CLR (IN as)
(NP (DT a) (JJ nonexecutive) (NN director)))
(NP-TMP (NNP Nov.) (CD 29))))
(. .)))
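For reference, a bracketing like the one above can be loaded and traversed with NLTK (a small sketch; assumes the nltk package is installed):

```python
# Load the Penn Treebank bracketing above as a tree and inspect it.
from nltk import Tree

s = """(S (NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,)
        (ADJP (NML (CD 61) (NNS years)) (JJ old)) (, ,))
      (VP (MD will) (VP (VB join) (NP (DT the) (NN board))
        (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
        (NP-TMP (NNP Nov.) (CD 29)))) (. .))"""
tree = Tree.fromstring(s)
print(tree.label())    # 'S'
print(tree.leaves())   # ['Pierre', 'Vinken', ',', '61', 'years', 'old', ...]
```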

3.

SRL: Who did what to whom?
[Figure: dependency parse of "Committee awards Nobel to Strickland who advanced optics" (arcs: root, nsubj, dobj, prep, pobj, rcmod, nsubj). Semantic roles for predicate "awards": Committee = agent, Nobel = theme, Strickland = beneficiary. "advanced" is a second predicate.]

4.

SRL: Who did what to whom?
[Figure: same sentence; roles for the second predicate "advanced" added: who = agent, optics = theme.]

5.

SRL: Who did what to whom?
[Figure: same sentence, redrawn per predicate: "awards" with Committee = agent, Nobel = theme, Strickland = beneficiary; "advanced" with who = agent (referring back to Strickland), optics = theme.]

6.

PropBank SRL: Who did what to whom?
[Figure: the same two predicate-centric parses, now with PropBank labels: agent = ARG0, theme = ARG1, beneficiary = ARG2, predicate = V; "who" is R-ARG0, a reference back to ARG0 Strickland.]
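Concretely, the PropBank annotation amounts to one frame per predicate. As plain-Python data (illustrative only, not a PropBank API):

```python
# The two frames for "Committee awards Nobel to Strickland who advanced optics".
frames = [
    {"predicate": "awards",
     "args": {"ARG0": "Committee",     # agent
              "ARG1": "Nobel",         # theme
              "ARG2": "Strickland"}},  # beneficiary
    {"predicate": "advanced",
     "args": {"ARG0": "Strickland",    # agent ("who" is tagged R-ARG0,
              "ARG1": "optics"}},      #  a reference back to ARG0)
]
```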

7.

10 years of PropBank SRL
[Figure: plot of in-domain F1 vs. year for a decade of PropBank SRL systems. Syntax-based: [Punyakanok, Roth & Yih], [Toutanova, Haghighi & Manning], [Täckström, Ganchev & Das], [FitzGerald, Täckström, Ganchev & Das]. End-to-end deep NN: [Zhou & Xu], [He, Lee, Lewis & Zettlemoyer], [He, Lee, Levy & Zettlemoyer], [Tan, Wang, Xie, Chen & Shi].]

8.

10 years of PropBank SRL
[Figure: same F1 vs. year plot, now with an out-of-domain panel, comparing [Punyakanok et al.], [Toutanova et al.], [Zhou & Xu], [Täckström et al.], [FitzGerald et al.], [He et al.], and [Tan et al.], with [LISA] at the top of both panels.]

9.

Linguistically-Informed Self-Attention
Multi-task learning, single-pass inference:
– Part-of-speech tagging
– Labeled dependency parsing
– Predicate detection
– Semantic role spans & labeling
Syntactically-informed self-attention:
– Multi-head self-attention supervised by syntax
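Schematically, single-pass multi-task training sums per-task losses computed from one shared encoding of the sentence (a sketch; the uniform weighting and function name below are illustrative assumptions, not LISA's exact recipe):

```python
# One forward pass supplies representations for all four tasks;
# their losses are combined into a single training objective.
def multi_task_loss(pos_loss, parse_loss, predicate_loss, srl_loss,
                    w_pos=1.0, w_parse=1.0, w_pred=1.0, w_srl=1.0):
    return (w_pos * pos_loss + w_parse * parse_loss
            + w_pred * predicate_loss + w_srl * srl_loss)
```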

10.

Outline
Want fast, accurate, robust NLU
PropBank SRL: Who did what to whom?
10 years of PropBank SRL
LISA: Linguistically-informed self-attention
– Multi-head self-attention [Vaswani et al. 2017]
– Syntactically-informed self-attention
– Multi-task learning, single-pass inference
Experimental results & error analysis

11.

Self-attention
[Vaswani et al. 2017]
[Figure: at layer p, each token of "Nobel committee awards Strickland who advanced optics" is projected to query (Q), key (K), and value (V) vectors.]

12.

Self-attention
[Vaswani et al. 2017]
[Figure: same diagram (animation step).]

13.

Self-attention
[Vaswani et al. 2017]
[Figure: every token's query is scored against every token's key, comparing all pairs of tokens.]

14.

Self-attention
[Vaswani et al. 2017]
[Figure: the pairwise scores are softmax-normalized into an attention matrix A.]

15.

Self-attention
[Vaswani et al. 2017]
[Figure: same diagram (animation step).]

16.

Self-attention
[Vaswani et al. 2017]
[Figure: same diagram (animation step).]

17.

Self-attention
[Vaswani et al. 2017]
[Figure: the attention weights A are applied to the values V, producing new token representations M.]

18.

Self-attention
[Vaswani et al. 2017]
[Figure: same diagram (animation step).]

19.

[Vaswani et al. 2017]
Multi-head self-attention
[Figure: multiple heads attend in parallel, each with its own Q, K, V projections, producing outputs M1 ... MH.]

20.

[Vaswani et al. 2017]
Multi-head self-attention
[Figure: same diagram (animation step).]

21.

[Vaswani et al. 2017]
Multi-head self-attention
[Figure: the head outputs M1 ... MH are concatenated and passed through a position-wise feed-forward network at every token, producing the layer p+1 representations.]

22.

[Vaswani et al. 2017]
Multi-head self-attention
[Figure: the resulting layer p+1 representations over "Nobel committee awards Strickland who advanced optics".]

23.

[Vaswani et al. 2017]
Multi-head self-attention
Layer J
Multi-head self-attention + feed forward
Layer p
Multi-head self-attention + feed forward
Layer 1
Multi-head self-attention + feed forward
Nobel committee awards Strickland who advanced optics
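A minimal numpy sketch of one such layer (dimensions and initialization are illustrative, and residual connections and layer norm are omitted for brevity):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    # X: [n_tokens, d_model]; all weight matrices: [d_model, d_model]
    n, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        q, k, v = (M[:, h * dh:(h + 1) * dh] for M in (Q, K, V))
        A = softmax(q @ k.T / np.sqrt(dh))  # [n, n] attention weights
        heads.append(A @ v)                 # weighted sum of values
    return np.concatenate(heads, axis=-1) @ Wo

# Toy usage: 7 tokens ("Nobel committee awards Strickland who advanced optics")
rng = np.random.default_rng(0)
X = rng.normal(size=(7, 64))
Wq, Wk, Wv, Wo = (rng.normal(size=(64, 64)) * 0.1 for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads=8)  # [7, 64]
```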

24.

Outline
Want fast, accurate, robust NLU
PropBank SRL: Who did what to whom?
10 years of PropBank SRL
LISA: Linguistically-informed self-attention
– Multi-head self-attention [Vaswani et al. 2017]
– Syntactically-informed self-attention
– Multi-task learning, single-pass inference
Experimental results & error analysis

25.

How to incorporate syntax?
Prior approaches:
– Multi-task learning [Caruana 1993; Collobert et al. 2011]
– Dependency path embeddings [Roth & Lapata 2016]
– Graph CNN over parse [Marcheggiani & Titov 2017]
Drawbacks:
– Overfits to the training domain, like a single-task end-to-end NN.
– Must re-train the SRL model to leverage new (improved) syntax.
– Restricted context: path to predicate or fixed-width window.
Syntactically-informed self-attention:
– In one head, each token attends to its likely syntactic parent(s).
– Global context: in the next layer, tokens observe all other parents.
– At test time: can use the model's own predicted parse, OR supply syntax to improve the SRL model without re-training (see the sketch below).
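A minimal sketch of the core idea (illustrative names, not the LISA code): one head's attention weights come from parent scores rather than standard query-key dot products, and at test time an external parse can be injected directly.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Train time: parent_scores[i, j] scores token j as the syntactic parent of
# token i (e.g. from an internal biaffine parser, trained on gold parses);
# row-normalizing yields the head's attention weights.
def syntax_attention_from_scores(parent_scores):
    return softmax(parent_scores, axis=-1)       # [n, n]

# Test time: optionally inject an external (better) parse as one-hot rows,
# improving SRL without re-training.
def syntax_attention_from_parse(heads):
    n = len(heads)
    A = np.zeros((n, n))
    A[np.arange(n), heads] = 1.0
    return A

# "Nobel committee awards Strickland who advanced optics":
# parent indices per token (root "awards" points to itself for simplicity)
heads = [1, 2, 2, 2, 5, 3, 5]
A_syn = syntax_attention_from_parse(heads)       # one head's attention
```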

26.

Syntactically-informed self-attention
[Figure: at layer p, one attention head's weights are produced by a biaffine parser [Dozat and Manning 2017] that scores each token's candidate syntactic parents; the remaining heads and the feed-forward sublayer are unchanged.]
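A minimal sketch of biaffine arc scoring in the spirit of Dozat and Manning (2017); LISA row-normalizes such scores into that head's attention weights (dimensions and names here are illustrative):

```python
import numpy as np

def biaffine_arc_scores(X, W_dep, W_head, U, b):
    # X: [n, d] token representations; W_dep, W_head: [d, k]; U: [k, k]; b: [k]
    H_dep = np.tanh(X @ W_dep)    # "dependent" view of each token  [n, k]
    H_head = np.tanh(X @ W_head)  # "head" view of each token       [n, k]
    # scores[i, j] = H_dep[i] @ U @ H_head[j] + b @ H_head[j]
    return H_dep @ U @ H_head.T + H_head @ b

rng = np.random.default_rng(1)
X = rng.normal(size=(7, 64))
scores = biaffine_arc_scores(
    X,
    W_dep=rng.normal(size=(64, 32)) * 0.1,
    W_head=rng.normal(size=(64, 32)) * 0.1,
    U=rng.normal(size=(32, 32)) * 0.1,
    b=rng.normal(size=32) * 0.1,
)
# softmax over each row of `scores` gives that token's distribution
# over candidate syntactic parents.
```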

27.

Syntactically-informed self-attention
[Figure: the resulting syntactically-informed attention over "Nobel committee awards Strickland who advanced optics".]

28.

Syntactically-informed self-attention
Layer J
Multi-head self-attention + feed forward
Layer p
Syntactically-informed self-attention
Layer 1
Multi-head self-attention + feed forward
Nobel committee awards Strickland who advanced optics

29.

Outline
Want fast, accurate, robust NLU
PropBank SRL: Who did what to whom?
10 years of PropBank SRL
LISA: Linguistically-informed self-attention
– Multi-head self-attention
– Syntactically-informed self-attention
– Multi-task learning, single-pass inference
Experimental results & error analysis

30.

LISA: Linguistically-Informed Self-Attention
[Figure: the LISA stack over "Nobel committee awards Strickland who advanced optics": multi-head self-attention + feed-forward layers from layer 1 up to layer J, with a syntactically-informed self-attention layer in between; at layer r, token representations are supervised to predict part-of-speech tags and predicates (NNP, NN, VBZ/PRED, NNP, WP, VBN/PRED, NN).]

31.

LISA: Linguistically-Informed Self-Attention
[Figure: same stack; at layer p, the syntactically-informed head is supervised with the labeled dependency parse (arcs nn, nsubj, root, dobj, rcmod, nsubj, dobj over the sentence).]

32.

LISA: Linguistically-Informed Self-Attention
[Figure: same as the previous slide (animation step).]

33.

LISA: Linguistically-Informed Self-Attention
[Figure: same stack; on top of layer J, a bilinear function scores each (predicate, argument) pair of token representations.]

34.

LISA: Linguistically-Informed Self-Attention
[Figure: same as the previous slide (animation step).]

35.

LISA: Linguistically-Informed Self-Attention
[Figure: same stack; the bilinear scorer emits its first SRL tag, B-ARG0.]

36.

LISA: Linguistically-Informed Self-Attention
[Figure: same as the previous slide (animation step).]

37.

LISA: Linguistically-Informed Self-Attention
[Figure: same stack; the full tag row for predicate "awards": B-ARG0 I-ARG0 B-V B-ARG1 O O O.]

38.

LISA: Linguistically-Informed Self-Attention
[Figure: same as the previous slide (animation step).]

39.

LISA: Linguistically-Informed Self-Attention
[Figure: same as the previous slide (animation step).]

40.

LISA: Linguistically-Informed Self-Attention
[Figure: same as the previous slide (animation step).]

41.

LISA: Linguistically-Informed Self-Attention
[Figure: same stack; the complete per-predicate BIO tag rows:

           Nobel   committee  awards  Strickland  who       advanced  optics
awards:    B-ARG0  I-ARG0     B-V     B-ARG1      O         O         O
advanced:  O       O          O       B-ARG0      B-R-ARG0  B-V       B-ARG1]
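The final step pairs every detected predicate with every token: a bilinear function between predicate and token representations yields role-label scores, decoded as BIO tags. A sketch (shapes and names are illustrative assumptions, not LISA's exact parameterization):

```python
import numpy as np

def bilinear_srl_scores(preds, tokens, U):
    # preds:  [n_preds, d]   predicate representations
    # tokens: [n, d]         token (argument) representations
    # U:      [d, L, d]      one bilinear form per role label
    # scores[p, t, l] = preds[p] @ U[:, l, :] @ tokens[t]
    return np.einsum('pd,dle,te->ptl', preds, U, tokens)

rng = np.random.default_rng(2)
n, d, L = 7, 64, 5                  # e.g. labels B-ARG0, I-ARG0, B-V, B-ARG1, O
tokens = rng.normal(size=(n, d))
preds = tokens[[2, 5]]              # "awards" and "advanced" as predicates
U = rng.normal(size=(d, L, d)) * 0.01
scores = bilinear_srl_scores(preds, tokens, U)  # [2, 7, 5]
bio_tags = scores.argmax(-1)        # greedy per-token decode per predicate
```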

42.

Outline
Want fast, accurate, robust NLU
PropBank SRL: Who did what to whom?
10 years of PropBank SRL
LISA: Linguistically-informed self-attention
– Multi-head self-attention
– Syntactically-informed self-attention
– Multi-task learning, single-pass inference
Experimental results & error analysis

43.

Experimental results
                 CoNLL-2005                       CoNLL-2012
domains          Train, dev: news                 Train, dev, test: 7 domains
                 Test: news, novels               (news, telephone, bible, ...)
word embeddings  GloVe [Pennington et al. 2014]   GloVe [Pennington et al. 2014]
                 ELMo [Peters et al. 2018]        ELMo [Peters et al. 2018]
predicates       predicted; gold                  predicted
baselines        He et al. 2017, He et al. 2018,  He et al. 2018
                 Tan et al. 2018
our models       SA, LISA, LISA+D&M, LISA+Gold    SA, LISA, LISA+D&M, LISA+Gold

44.

Experimental results: CoNLL-2005
(test F1)

                GloVe                        ELMo
                in-domain   out-of-domain    in-domain   out-of-domain
He et al. 2017  82.7        70.1             ---         ---
He et al. 2018  82.5        70.8             86.0        76.1
SA              83.72       71.51            86.09       76.35
LISA            83.61       71.91            86.55       78.05
+D&M            84.99       74.66            86.90       78.25
+Gold           ?

45.

Experimental results: CoNLL-2005
(dev F1)

                GloVe             ELMo
                in-domain (dev)   in-domain (dev)
He et al. 2017  81.5              ---
He et al. 2018  81.6              85.3
SA              82.39             85.26
LISA            82.24             85.35
+D&M            83.58             85.17
+Gold           86.81             87.63

46.

Experimental results: Analysis

47.

Experimental results: Analysis
[Figure: analysis of argument boundary mistakes.]

48.

Summary
LISA: multi-task learning + multi-head self-attention,
trained to attend to syntactic parents
– Achieves state-of-the-art F1 on PropBank SRL
– Linguistic structure improves generalization
– Fast: encodes each sequence only once to predict predicates,
  parts of speech, labeled dependency parse, and SRL
– Everyone wants to run NLP on the entire web:
  accuracy, robustness, computational efficiency
Models & Code: https://github.com/strubell/LISA
I am on the academic job market this spring!