Errata: “Improving Topic Models with Latent Feature Word Representations”

Dat Quoc Nguyen, Richard Billingsley, Lan Du and Mark Johnson

Abstract

Change in clustering and classification results due to the DMM and LF-DMM bugs.

4.3 Document clustering evaluation

FROM (in the original published article): For example with 40 topics on the TMNtitle dataset, the DMM achieves about 6% higher Purity and NMI scores than LDA.
TO: For example with 80 topics on the TMNtitle dataset, the DMM achieves about 7+% higher Purity and NMI scores than LDA.

FROM (in the original published article): on the short text TMN and TMNtitle datasets we obtain 3.6% and 3.0% higher Purity at T = 80.
TO: on the short text TMN and TMNtitle datasets we obtain 6.1% and 2.5% higher Purity at T = 80.

4.4 Document classification evaluation

FROM (in the original published article): In addition, our w2v-DMM model achieves 3.6% and 3.4% higher F1 score than the DMM model on short TMN and TMNtitle datasets with T = 80, respectively.
TO: In addition, our w2v-DMM model achieves 5.4% and 2.9% higher F1 score than the DMM model on short TMN and TMNtitle datasets with T = 80, respectively.

FROM (a part of Table 10 in the original published article): F1 scores for TMN and TMNtitle datasets.

                     λ = 0.6
Data      Method     T=7            T=20           T=40           T=80
TMN       DMM        0.605 ± 0.023  0.724 ± 0.016  0.738 ± 0.008  0.741 ± 0.005
          w2v-DMM    0.619 ± 0.033  0.744 ± 0.009  0.759 ± 0.005  0.777 ± 0.005
          glove-DMM  0.624 ± 0.025  0.757 ± 0.009  0.761 ± 0.005  0.774 ± 0.010
          Improve.   0.019          0.033          0.023          0.036
TMNtitle  DMM        0.570 ± 0.022  0.650 ± 0.011  0.654 ± 0.008  0.646 ± 0.008
          w2v-DMM    0.562 ± 0.022  0.670 ± 0.012  0.677 ± 0.006  0.680 ± 0.003
          glove-DMM  0.592 ± 0.017  0.674 ± 0.016  0.683 ± 0.006  0.679 ± 0.009
          Improve.   0.022          0.024          0.029          0.034

TO: F1 scores for TMN and TMNtitle datasets.

                     λ = 0.6
Data      Method     T=7            T=20           T=40           T=80
TMN       DMM        0.607 ± 0.040  0.694 ± 0.026  0.712 ± 0.014  0.721 ± 0.008
          w2v-DMM    0.607 ± 0.019  0.736 ± 0.025  0.760 ± 0.011  0.771 ± 0.005
          glove-DMM  0.621 ± 0.042  0.750 ± 0.011  0.759 ± 0.006  0.775 ± 0.006
          Improve.   0.014          0.056          0.048          0.054
TMNtitle  DMM        0.500 ± 0.021  0.600 ± 0.015  0.630 ± 0.016  0.652 ± 0.005
          w2v-DMM    0.528 ± 0.028  0.663 ± 0.008  0.682 ± 0.008  0.681 ± 0.006
          glove-DMM  0.565 ± 0.022  0.680 ± 0.011  0.684 ± 0.009  0.681 ± 0.004
          Improve.   0.065          0.08           0.054          0.029
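The errata does not define the "Improve." rows. However, the printed values are consistent with one reading: the better of the two latent-feature models (w2v-DMM or glove-DMM) minus DMM, column by column. The minimal sketch below assumes that reading and recomputes the rows of the TO version of Table 10 from the mean F1 scores. The dictionary layout and the function name `improvement_row` are illustrative choices, not part of the original work.

```python
# Mean F1 scores from the corrected (TO) Table 10, lambda = 0.6.
# Column order: T=7, T=20, T=40, T=80.
TOPICS = [7, 20, 40, 80]
F1_TO = {
    "TMN": {
        "DMM":       [0.607, 0.694, 0.712, 0.721],
        "w2v-DMM":   [0.607, 0.736, 0.760, 0.771],
        "glove-DMM": [0.621, 0.750, 0.759, 0.775],
    },
    "TMNtitle": {
        "DMM":       [0.500, 0.600, 0.630, 0.652],
        "w2v-DMM":   [0.528, 0.663, 0.682, 0.681],
        "glove-DMM": [0.565, 0.680, 0.684, 0.681],
    },
}


def improvement_row(scores):
    """Assumed meaning of "Improve.": best latent-feature model minus DMM."""
    best_lf = [max(w, g) for w, g in zip(scores["w2v-DMM"], scores["glove-DMM"])]
    # Round to 3 decimals, the precision printed in the tables.
    return [round(b - d, 3) for b, d in zip(best_lf, scores["DMM"])]


for dataset, scores in F1_TO.items():
    print(dataset, dict(zip(TOPICS, improvement_row(scores))))
```

This prints 0.014, 0.056, 0.048 and 0.054 for TMN, and 0.065, 0.08, 0.054 and 0.029 for TMNtitle. These values match the printed "Improve." rows. The T = 80 entries, 0.054 and 0.029, coincide with the 5.4% and 2.9% quoted in the corrected Section 4.4 sentence.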

FROM (a part of Table 11 in the original published article): F1 scores for Twitter dataset.

                     λ = 0.6
Data      Method     T=4            T=20           T=40           T=80
Twitter   DMM        0.505 ± 0.023  0.614 ± 0.012  0.634 ± 0.013  0.656 ± 0.011
          w2v-DMM    0.541 ± 0.035  0.636 ± 0.015  0.648 ± 0.011  0.670 ± 0.010
          glove-DMM  0.539 ± 0.024  0.638 ± 0.017  0.645 ± 0.012  0.666 ± 0.009
          Improve.   0.036          0.024          0.014          0.014

TO: F1 scores for Twitter dataset.

                     λ = 0.6
Data      Method     T=4            T=20           T=40           T=80
Twitter   DMM        0.469 ± 0.014  0.600 ± 0.021  0.645 ± 0.009  0.665 ± 0.014
          w2v-DMM    0.539 ± 0.016  0.649 ± 0.016  0.656 ± 0.007  0.676 ± 0.012
          glove-DMM  0.536 ± 0.027  0.654 ± 0.019  0.657 ± 0.008  0.680 ± 0.009
          Improve.   0.07           0.054          0.012          0.015
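Section 4.4 and Tables 10 and 11 report classification F1 scores, but the errata does not state how per-class scores are combined. Purely as an illustration, the sketch below computes macro-averaged F1: per-class F1, then an unweighted mean over classes. Macro averaging is an assumption. Micro or support-weighted averaging are equally common, and the errata does not say which one the experiments used. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    classes = set(gold) | set(predicted)
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    pred_sizes, gold_sizes = Counter(predicted), Counter(gold)
    total = 0.0
    for c in classes:
        precision = true_pos[c] / pred_sizes[c] if pred_sizes[c] else 0.0
        recall = true_pos[c] / gold_sizes[c] if gold_sizes[c] else 0.0
        if precision + recall > 0:
            total += 2 * precision * recall / (precision + recall)
    return total / len(classes)


# Hypothetical gold and predicted category labels for five short documents.
gold = ["sport", "sport", "business", "business", "world"]
pred = ["sport", "business", "business", "business", "world"]
print(round(macro_f1(gold, pred), 3))  # 0.822
```

The averaging choice matters when comparing numbers. For single-label multi-class data, micro-averaged F1 reduces to plain accuracy, while macro-averaged F1 gives small classes the same weight as large ones.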

Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00245/1566804/tacl_a_00245.pdf by guest on 08 September 2023

FROM (a part of Table 7 in the original published article): Purity and NMI results on the TMN and TMNtitle datasets with the mixture weight λ = 0.6.

                     Purity                                                        NMI
Data      Method     T=7            T=20           T=40           T=80           T=7            T=20           T=40           T=80
TMN       DMM        0.632 ± 0.025  0.719 ± 0.020  0.735 ± 0.010  0.742 ± 0.005  0.445 ± 0.017  0.426 ± 0.010  0.397 ± 0.006  0.364 ± 0.002
          w2v-DMM    0.639 ± 0.024  0.741 ± 0.011  0.759 ± 0.006  0.778 ± 0.005  0.437 ± 0.018  0.429 ± 0.004  0.402 ± 0.003  0.377 ± 0.002
          glove-DMM  0.646 ± 0.022  0.757 ± 0.009  0.763 ± 0.005  0.775 ± 0.011  0.445 ± 0.023  0.443 ± 0.008  0.404 ± 0.003  0.378 ± 0.004
          Improve.   0.014          0.038          0.028          0.036          0.0            0.017          0.007          0.014
TMNtitle  DMM        0.598 ± 0.018  0.650 ± 0.011  0.657 ± 0.007  0.651 ± 0.008  0.353 ± 0.012  0.317 ± 0.007  0.287 ± 0.004  0.257 ± 0.004
          w2v-DMM    0.583 ± 0.020  0.665 ± 0.012  0.674 ± 0.006  0.681 ± 0.003  0.324 ± 0.013  0.329 ± 0.007  0.300 ± 0.003  0.277 ± 0.003
          glove-DMM  0.601 ± 0.021  0.670 ± 0.016  0.680 ± 0.005  0.679 ± 0.008  0.354 ± 0.013  0.333 ± 0.009  0.301 ± 0.003  0.278 ± 0.003
          Improve.   0.003          0.02           0.023          0.03           0.001          0.016          0.014          0.021

TO: Purity and NMI results on the TMN and TMNtitle datasets with the mixture weight λ = 0.6.

                     Purity                                                        NMI
Data      Method     T=7            T=20           T=40           T=80           T=7            T=20           T=40           T=80
TMN       DMM        0.637 ± 0.029  0.699 ± 0.015  0.707 ± 0.014  0.715 ± 0.009  0.445 ± 0.024  0.422 ± 0.007  0.393 ± 0.009  0.364 ± 0.006
          w2v-DMM    0.623 ± 0.020  0.737 ± 0.018  0.760 ± 0.010  0.772 ± 0.005  0.426 ± 0.015  0.428 ± 0.009  0.405 ± 0.006  0.378 ± 0.003
          glove-DMM  0.641 ± 0.042  0.749 ± 0.011  0.758 ± 0.008  0.776 ± 0.006  0.449 ± 0.028  0.441 ± 0.008  0.408 ± 0.005  0.381 ± 0.003
          Improve.   0.004          0.05           0.053          0.061          0.004          0.019          0.015          0.017
TMNtitle  DMM        0.558 ± 0.015  0.600 ± 0.010  0.634 ± 0.011  0.658 ± 0.006  0.338 ± 0.012  0.327 ± 0.006  0.304 ± 0.004  0.271 ± 0.002
          w2v-DMM    0.552 ± 0.022  0.653 ± 0.012  0.678 ± 0.007  0.682 ± 0.005  0.314 ± 0.016  0.325 ± 0.006  0.305 ± 0.004  0.282 ± 0.003
          glove-DMM  0.586 ± 0.019  0.672 ± 0.013  0.679 ± 0.009  0.683 ± 0.004  0.343 ± 0.015  0.339 ± 0.007  0.307 ± 0.004  0.282 ± 0.002
          Improve.   0.028          0.072          0.045          0.025          0.005          0.012          0.003          0.011

FROM (a part of Table 8 in the original published article): Purity and NMI results on the Twitter dataset with the mixture weight λ = 0.6.

                     Purity                                                        NMI
Data      Method     T=4            T=20           T=40           T=80           T=4            T=20           T=40           T=80
Twitter   DMM        0.552 ± 0.020  0.624 ± 0.010  0.647 ± 0.009  0.675 ± 0.009  0.194 ± 0.017  0.186 ± 0.006  0.184 ± 0.005  0.190 ± 0.003
          w2v-DMM    0.581 ± 0.019  0.641 ± 0.013  0.660 ± 0.010  0.687 ± 0.007  0.230 ± 0.015  0.195 ± 0.007  0.193 ± 0.004  0.199 ± 0.005
          glove-DMM  0.580 ± 0.013  0.644 ± 0.016  0.657 ± 0.008  0.684 ± 0.006  0.232 ± 0.010  0.201 ± 0.010  0.191 ± 0.006  0.195 ± 0.005
          Improve.   0.029          0.02           0.013          0.012          0.038          0.015          0.009          0.009

TO: Purity and NMI results on the Twitter dataset with the mixture weight λ = 0.6.

                     Purity                                                        NMI
Data      Method     T=4            T=20           T=40           T=80           T=4            T=20           T=40           T=80
Twitter   DMM        0.523 ± 0.011  0.619 ± 0.015  0.660 ± 0.008  0.684 ± 0.010  0.222 ± 0.013  0.213 ± 0.011  0.198 ± 0.008  0.196 ± 0.004
          w2v-DMM    0.589 ± 0.017  0.655 ± 0.015  0.668 ± 0.008  0.694 ± 0.009  0.243 ± 0.014  0.215 ± 0.009  0.203 ± 0.005  0.204 ± 0.006
          glove-DMM  0.583 ± 0.023  0.661 ± 0.019  0.667 ± 0.009  0.697 ± 0.009  0.250 ± 0.020  0.223 ± 0.014  0.201 ± 0.006  0.206 ± 0.005
          Improve.   0.066          0.042          0.008          0.013          0.028          0.01           0.005          0.01
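Tables 7 and 8 report Purity and NMI, which the errata does not redefine. The sketch below assumes the usual definitions. Purity credits each cluster with its most frequent gold label. NMI divides the mutual information between clusters and gold labels by the arithmetic mean of their entropies. This sketch also assumes that each document's cluster is its most probable topic; the toy data and the function names are hypothetical.

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Share of documents whose cluster's majority gold label equals their own label."""
    per_cluster = {}
    for k, c in zip(clusters, labels):
        per_cluster.setdefault(k, Counter())[c] += 1
    majority_total = sum(cnt.most_common(1)[0][1] for cnt in per_cluster.values())
    return majority_total / len(labels)


def _entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its group sizes."""
    return -sum((m / n) * math.log(m / n) for m in counts.values())


def nmi(clusters, labels):
    """Mutual information normalised by the arithmetic mean of the two entropies."""
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    k_sizes, c_sizes = Counter(clusters), Counter(labels)
    mi = sum(
        (m / n) * math.log(n * m / (k_sizes[k] * c_sizes[c]))
        for (k, c), m in joint.items()
    )
    denom = (_entropy(k_sizes, n) + _entropy(c_sizes, n)) / 2
    # Both partitions trivial (one group each): treat them as identical.
    return mi / denom if denom > 0 else 1.0


# Toy example: hypothetical most-probable-topic assignments vs. gold categories.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(round(purity(clusters, labels), 3), round(nmi(clusters, labels), 3))  # 0.833 0.479
```

The logarithm base does not affect NMI, because it cancels between the numerator and the denominator. Purity, in contrast, does not penalise splitting a class across many clusters, which is one reason both scores are usually reported together.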
