Erratum: “Improving Topic Models with Latent Feature Word
Representations”
Dat Quoc Nguyen, Richard Billingsley, Lan Du and Mark Johnson
Abstract
Change in clustering and classification results
due to the DMM and LF-DMM bugs.
4.3 Document clustering evaluation
FROM (in the original published article): For example
with 40 topics on the TMNtitle dataset, the DMM
achieves about 6% higher Purity and NMI scores
than LDA.
TO: For example with 80 topics on the TMNtitle
dataset, the DMM achieves about 7+% higher Purity
and NMI scores than LDA.
FROM (in the original published article): on the
short text TMN and TMNtitle datasets we obtain
3.6% and 3.0% higher Purity at T = 80.
TO: on the short text TMN and TMNtitle datasets
we obtain 6.1% and 2.5% higher Purity at T = 80.
4.4 Document classification evaluation
FROM (in the original published article): In addition,
our w2v-DMM model achieves 3.6% and 3.4%
higher F1 score than the DMM model on short TMN
and TMNtitle datasets with T = 80, respectively.
TO: In addition, our w2v-DMM model achieves
5.4% and 2.9% higher F1 score than the DMM model
on short TMN and TMNtitle datasets with T = 80,
respectively.
FROM (a part of Table 10 in the original published article): F1 scores for TMN and TMNtitle datasets.
λ = 0.6
Data      Method     T=7            T=20           T=40           T=80
TMN       DMM        0.605 ± 0.023  0.724 ± 0.016  0.738 ± 0.008  0.741 ± 0.005
          w2v-DMM    0.619 ± 0.033  0.744 ± 0.009  0.759 ± 0.005  0.777 ± 0.005
          glove-DMM  0.624 ± 0.025  0.757 ± 0.009  0.761 ± 0.005  0.774 ± 0.010
          Improve.   0.019          0.033          0.023          0.036
TMNtitle  DMM        0.570 ± 0.022  0.650 ± 0.011  0.654 ± 0.008  0.646 ± 0.008
          w2v-DMM    0.562 ± 0.022  0.670 ± 0.012  0.677 ± 0.006  0.680 ± 0.003
          glove-DMM  0.592 ± 0.017  0.674 ± 0.016  0.683 ± 0.006  0.679 ± 0.009
          Improve.   0.022          0.024          0.029          0.034
TO: F1 scores for TMN and TMNtitle datasets.
λ = 0.6
Data      Method     T=7            T=20           T=40           T=80
TMN       DMM        0.607 ± 0.040  0.694 ± 0.026  0.712 ± 0.014  0.721 ± 0.008
          w2v-DMM    0.607 ± 0.019  0.736 ± 0.025  0.760 ± 0.011  0.771 ± 0.005
          glove-DMM  0.621 ± 0.042  0.750 ± 0.011  0.759 ± 0.006  0.775 ± 0.006
          Improve.   0.014          0.056          0.048          0.054
TMNtitle  DMM        0.500 ± 0.021  0.600 ± 0.015  0.630 ± 0.016  0.652 ± 0.005
          w2v-DMM    0.528 ± 0.028  0.663 ± 0.008  0.682 ± 0.008  0.681 ± 0.006
          glove-DMM  0.565 ± 0.022  0.680 ± 0.011  0.684 ± 0.009  0.681 ± 0.004
          Improve.   0.065          0.080          0.054          0.029
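The Improve. values printed for the corrected Table 10 are consistent with taking, for each number of topics, the best mean F1 among the latent feature variants and subtracting the DMM mean. The following is a minimal Python sketch under that assumption; the helper name improve_row is hypothetical and does not come from the article, and the inputs are the corrected TMN means for T = 7, 20, 40, 80.

```python
def improve_row(baseline, *variants):
    """Per-column gain of the best variant mean over the baseline mean.

    Assumption: "Improve." = max over latent feature models minus DMM,
    rounded to three decimals as in the printed tables.
    """
    return [
        round(max(column) - base, 3)
        for base, column in zip(baseline, zip(*variants))
    ]


# Mean F1 on TMN from the corrected Table 10, columns T=7, 20, 40, 80.
dmm = [0.607, 0.694, 0.712, 0.721]
w2v_dmm = [0.607, 0.736, 0.760, 0.771]
glove_dmm = [0.621, 0.750, 0.759, 0.775]

tmn_improve = improve_row(dmm, w2v_dmm, glove_dmm)
print(tmn_improve)  # [0.014, 0.056, 0.048, 0.054]
```

The same helper can be applied to the TMNtitle rows, or to the Purity and NMI means, by swapping in the corresponding lists.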
FROM (a part of Table 11 in the original published article): F1 scores for Twitter dataset.
λ = 0.6
Data      Method     T=4            T=20           T=40           T=80
Twitter   DMM        0.505 ± 0.023  0.614 ± 0.012  0.634 ± 0.013  0.656 ± 0.011
          w2v-DMM    0.541 ± 0.035  0.636 ± 0.015  0.648 ± 0.011  0.670 ± 0.010
          glove-DMM  0.539 ± 0.024  0.638 ± 0.017  0.645 ± 0.012  0.666 ± 0.009
          Improve.   0.036          0.024          0.014          0.014
TO: F1 scores for Twitter dataset.
λ = 0.6
Data      Method     T=4            T=20           T=40           T=80
Twitter   DMM        0.469 ± 0.014  0.600 ± 0.021  0.645 ± 0.009  0.665 ± 0.014
          w2v-DMM    0.539 ± 0.016  0.649 ± 0.016  0.656 ± 0.007  0.676 ± 0.012
          glove-DMM  0.536 ± 0.027  0.654 ± 0.019  0.657 ± 0.008  0.680 ± 0.009
          Improve.   0.070          0.054          0.012          0.015
Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00245/1566804/tacl_a_00245.pdf by guest on 08 September 2023
FROM (a part of Table 7 in the original published article): Purity and NMI results on the TMN and TMNtitle datasets
with the mixture weight λ = 0.6.
Purity
Data      Method     T=7            T=20           T=40           T=80
TMN       DMM        0.632 ± 0.025  0.719 ± 0.020  0.735 ± 0.010  0.742 ± 0.005
          w2v-DMM    0.639 ± 0.024  0.741 ± 0.011  0.759 ± 0.006  0.778 ± 0.005
          glove-DMM  0.646 ± 0.022  0.757 ± 0.009  0.763 ± 0.005  0.775 ± 0.011
          Improve.   0.014          0.038          0.028          0.036
TMNtitle  DMM        0.598 ± 0.018  0.650 ± 0.011  0.657 ± 0.007  0.651 ± 0.008
          w2v-DMM    0.583 ± 0.020  0.665 ± 0.012  0.674 ± 0.006  0.681 ± 0.003
          glove-DMM  0.601 ± 0.021  0.670 ± 0.016  0.680 ± 0.005  0.679 ± 0.008
          Improve.   0.003          0.020          0.023          0.030
NMI
Data      Method     T=7            T=20           T=40           T=80
TMN       DMM        0.445 ± 0.017  0.426 ± 0.010  0.397 ± 0.006  0.364 ± 0.002
          w2v-DMM    0.437 ± 0.018  0.429 ± 0.004  0.402 ± 0.003  0.377 ± 0.002
          glove-DMM  0.445 ± 0.023  0.443 ± 0.008  0.404 ± 0.003  0.378 ± 0.004
          Improve.   0.000          0.017          0.007          0.014
TMNtitle  DMM        0.353 ± 0.012  0.317 ± 0.007  0.287 ± 0.004  0.257 ± 0.004
          w2v-DMM    0.324 ± 0.013  0.329 ± 0.007  0.300 ± 0.003  0.277 ± 0.003
          glove-DMM  0.354 ± 0.013  0.333 ± 0.009  0.301 ± 0.003  0.278 ± 0.003
          Improve.   0.001          0.016          0.014          0.021
TO: Purity and NMI results on the TMN and TMNtitle datasets with the mixture weight λ = 0.6.
Purity
Data      Method     T=7            T=20           T=40           T=80
TMN       DMM        0.637 ± 0.029  0.699 ± 0.015  0.707 ± 0.014  0.715 ± 0.009
          w2v-DMM    0.623 ± 0.020  0.737 ± 0.018  0.760 ± 0.010  0.772 ± 0.005
          glove-DMM  0.641 ± 0.042  0.749 ± 0.011  0.758 ± 0.008  0.776 ± 0.006
          Improve.   0.004          0.050          0.053          0.061
TMNtitle  DMM        0.558 ± 0.015  0.600 ± 0.010  0.634 ± 0.011  0.658 ± 0.006
          w2v-DMM    0.552 ± 0.022  0.653 ± 0.012  0.678 ± 0.007  0.682 ± 0.005
          glove-DMM  0.586 ± 0.019  0.672 ± 0.013  0.679 ± 0.009  0.683 ± 0.004
          Improve.   0.028          0.072          0.045          0.025
NMI
Data      Method     T=7            T=20           T=40           T=80
TMN       DMM        0.445 ± 0.024  0.422 ± 0.007  0.393 ± 0.009  0.364 ± 0.006
          w2v-DMM    0.426 ± 0.015  0.428 ± 0.009  0.405 ± 0.006  0.378 ± 0.003
          glove-DMM  0.449 ± 0.028  0.441 ± 0.008  0.408 ± 0.005  0.381 ± 0.003
          Improve.   0.004          0.019          0.015          0.017
TMNtitle  DMM        0.338 ± 0.012  0.327 ± 0.006  0.304 ± 0.004  0.271 ± 0.002
          w2v-DMM    0.314 ± 0.016  0.325 ± 0.006  0.305 ± 0.004  0.282 ± 0.003
          glove-DMM  0.343 ± 0.015  0.339 ± 0.007  0.307 ± 0.004  0.282 ± 0.002
          Improve.   0.005          0.012          0.003          0.011
FROM (a part of Table 8 in the original published article): Purity and NMI results on the Twitter dataset with the
mixture weight λ = 0.6.
Purity
Data      Method     T=4            T=20           T=40           T=80
Twitter   DMM        0.552 ± 0.020  0.624 ± 0.010  0.647 ± 0.009  0.675 ± 0.009
          w2v-DMM    0.581 ± 0.019  0.641 ± 0.013  0.660 ± 0.010  0.687 ± 0.007
          glove-DMM  0.580 ± 0.013  0.644 ± 0.016  0.657 ± 0.008  0.684 ± 0.006
          Improve.   0.029          0.020          0.013          0.012
NMI
Data      Method     T=4            T=20           T=40           T=80
Twitter   DMM        0.194 ± 0.017  0.186 ± 0.006  0.184 ± 0.005  0.190 ± 0.003
          w2v-DMM    0.230 ± 0.015  0.195 ± 0.007  0.193 ± 0.004  0.199 ± 0.005
          glove-DMM  0.232 ± 0.010  0.201 ± 0.010  0.191 ± 0.006  0.195 ± 0.005
          Improve.   0.038          0.015          0.009          0.009
TO: Purity and NMI results on the Twitter dataset with the mixture weight λ = 0.6.
Purity
Data      Method     T=4            T=20           T=40           T=80
Twitter   DMM        0.523 ± 0.011  0.619 ± 0.015  0.660 ± 0.008  0.684 ± 0.010
          w2v-DMM    0.589 ± 0.017  0.655 ± 0.015  0.668 ± 0.008  0.694 ± 0.009
          glove-DMM  0.583 ± 0.023  0.661 ± 0.019  0.667 ± 0.009  0.697 ± 0.009
          Improve.   0.066          0.042          0.008          0.013
NMI
Data      Method     T=4            T=20           T=40           T=80
Twitter   DMM        0.222 ± 0.013  0.213 ± 0.011  0.198 ± 0.008  0.196 ± 0.004
          w2v-DMM    0.243 ± 0.014  0.215 ± 0.009  0.203 ± 0.005  0.204 ± 0.006
          glove-DMM  0.250 ± 0.020  0.223 ± 0.014  0.201 ± 0.006  0.206 ± 0.005
          Improve.   0.028          0.010          0.005          0.010
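The erratum reports Purity and NMI without restating how they are computed. As a reading aid, here is a minimal Python sketch of the two scores. It assumes the common textbook definitions: purity as the fraction of documents carrying the majority gold label of their cluster, and NMI as mutual information divided by the mean of the two entropies. The function names and the choice of normalisation are assumptions for illustration; they are not taken from the article or its code.

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Fraction of documents that carry the majority gold label of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)


def entropy(items):
    """Shannon entropy (natural log) of the empirical distribution of items."""
    n = len(items)
    return -sum((k / n) * math.log(k / n) for k in Counter(items).values())


def nmi(clusters, labels):
    """Mutual information normalised by the mean of the two entropies.

    Assumption: arithmetic-mean normalisation; returns 0.0 by convention
    when both partitions are trivial (zero entropy).
    """
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    pc, py = Counter(clusters), Counter(labels)
    mi = sum(
        (k / n) * math.log(k * n / (pc[c] * py[y]))
        for (c, y), k in joint.items()
    )
    denom = (entropy(clusters) + entropy(labels)) / 2
    return mi / denom if denom > 0 else 0.0


# Toy example: six documents, two clusters, two gold labels.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "a"]
toy_purity = purity(toy_clusters, toy_labels)  # 4 of 6 documents match their cluster majority
toy_nmi = nmi(toy_clusters, toy_labels)
```

In the tables above, each cell is a mean and standard deviation of such a score, with the inferred document topics playing the role of clusters.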