Multi-class text classification cross-bench

HuffPost dataset.


Figures: Documents, Classes, Vocabulary, Commons2, Json

Scores

2020-05-09 17:48:35,403 : INFO : HuffPost dataset  : [200843] size, [18750] train, [6250] test, [41] classes, [62812] vocabulary,
[[('the', 28230), ('to', 16388), ('a', 12865), ('of', 12288), ('in', 9986), ('and', 9201), ('for', 7004), ('is', 6718), ('on', 5405), ('trump', 4260)]] common words,
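The statistics in the log line above (split sizes, vocabulary size, most common words) can be reproduced in spirit with a few lines of Python. This is a minimal sketch, not the benchmark's own code: `corpus_stats`, the lowercase whitespace tokenizer, and the 25 % test fraction are assumptions. The 25 % figure is inferred from 18750 train / 6250 test; the log does not say how those 25000 documents relate to the reported size of 200843.

```python
from collections import Counter


def corpus_stats(texts, test_fraction=0.25, top_n=10):
    """Return split sizes, vocabulary size, and the most frequent tokens."""
    # Assumption: lowercase + whitespace tokenization. The tokenizer used to
    # produce the log above is not stated.
    counts = Counter(token for text in texts for token in text.lower().split())
    n_test = int(len(texts) * test_fraction)
    return {
        "size": len(texts),
        "train": len(texts) - n_test,
        "test": n_test,
        "vocabulary": len(counts),
        "common_words": counts.most_common(top_n),
    }


# Made-up headlines, for illustration only.
docs = [
    "Trump signs the bill",
    "The best travel tips for the summer",
    "How to save for the holidays",
    "The senate votes on the bill",
]
stats = corpus_stats(docs, top_n=3)
print(stats)
```

As in the real log, stop words such as "the" dominate the common-word list unless they are filtered out before counting.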
Dataset HuffPost,
MRR=0.7155570160588818
Accuracy=0.8325267371691462,

38:TRAVEL          -> 38:TRAVEL, 40:WELLNESS
38:TRAVEL          -> 38:TRAVEL, 30:CULTURE & ARTS
6:STYLE & BEAUTY   -> 6:STYLE & BEAUTY, 10:ENTERTAINMENT
17:BUSINESS        -> 20:POLITICS, 10:ENTERTAINMENT
24:WOMEN           -> 24:WOMEN, 20:POLITICS
22:COMEDY          -> 22:COMEDY, 20:POLITICS
40:WELLNESS        -> 40:WELLNESS, 3:PARENTING
35:DIVORCE         -> 35:DIVORCE, 10:ENTERTAINMENT
20:POLITICS        -> 20:POLITICS, 12:GREEN
5:MONEY            -> 5:MONEY, 39:HOME & LIVING
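Each row of the listing above shows a true class followed by two ranked predictions. The sketch below shows how MRR and top-k accuracy are commonly computed from such rankings. The function names and the sample rows are illustrative, and the source does not state which definitions it uses. Under these definitions MRR can never be lower than top-1 accuracy, so the reported Accuracy (0.83) exceeding MRR (0.72) suggests that a hit in either of the two listed predictions may be counted. That reading is an assumption.

```python
def reciprocal_rank(true_label, ranked_labels):
    """1 / rank of the true label in the ranked predictions, 0 if absent."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == true_label:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(true_labels, rankings):
    """Average reciprocal rank over all examples."""
    total = sum(reciprocal_rank(t, r) for t, r in zip(true_labels, rankings))
    return total / len(true_labels)


def top_k_accuracy(true_labels, rankings, k=1):
    """Fraction of examples whose true label is among the first k predictions."""
    hits = sum(t in r[:k] for t, r in zip(true_labels, rankings))
    return hits / len(true_labels)


# Four rows in the "true -> first, second" shape; values are made up.
true_labels = ["TRAVEL", "BUSINESS", "WOMEN", "POLITICS"]
rankings = [
    ["TRAVEL", "WELLNESS"],         # hit at rank 1
    ["POLITICS", "ENTERTAINMENT"],  # miss
    ["POLITICS", "WOMEN"],          # hit at rank 2
    ["POLITICS", "GREEN"],          # hit at rank 1
]

mrr = mean_reciprocal_rank(true_labels, rankings)
top1 = top_k_accuracy(true_labels, rankings, k=1)
top2 = top_k_accuracy(true_labels, rankings, k=2)
print(f"MRR={mrr}, top-1={top1}, top-2={top2}")
```

With rankings cut off at two entries, MRR always falls between top-1 and top-2 accuracy, which matches the ordering of the two reported numbers.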
Dataset HuffPost , Model 0 ,
Test score: 1.3521850590515136 ,
Test accuracy: 0.6449599862098694

5:HEALTHY LIVING   -> 4:CRIME, 25:GOOD NEWS
16:POLITICS        -> 12:COMEDY, 4:CRIME
19:WORLD NEWS      -> 15:WOMEN, 12:COMEDY
23:ENTERTAINMENT   -> 18:TRAVEL, 8:SPORTS
5:HEALTHY LIVING   -> 12:COMEDY, 15:WOMEN
39:BUSINESS        -> 12:COMEDY, 27:TECH
40:WELLNESS        -> 4:CRIME, 13:ARTS & CULTURE
8:SPORTS           -> 11:GREEN, 4:CRIME
28:PARENTING       -> 0:MONEY, 23:ENTERTAINMENT
15:WOMEN           -> 12:COMEDY, 11:GREEN
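The "Test score" / "Test accuracy" pair above resembles the output of an evaluation call that returns a loss followed by an accuracy. Assuming the score is a mean categorical cross-entropy (the source does not say), the sketch below shows how the two numbers relate. The helper name and the probabilities are hypothetical.

```python
import math


def cross_entropy_and_accuracy(true_ids, probabilities):
    """Mean negative log-likelihood and top-1 accuracy over a batch."""
    losses = []
    hits = 0
    for true_id, probs in zip(true_ids, probabilities):
        # Loss grows without bound as the probability of the true class shrinks.
        losses.append(-math.log(probs[true_id]))
        predicted = max(range(len(probs)), key=probs.__getitem__)
        hits += predicted == true_id
    return sum(losses) / len(losses), hits / len(true_ids)


# Three made-up examples over three classes.
true_ids = [0, 2, 1]
probabilities = [
    [0.5, 0.25, 0.25],  # correct
    [0.25, 0.25, 0.5],  # correct
    [0.5, 0.25, 0.25],  # wrong: true class 1 gets probability 0.25
]

score, accuracy = cross_entropy_and_accuracy(true_ids, probabilities)
print(f"Test score: {score}, Test accuracy: {accuracy}")
```

Because the loss is unbounded, a score above 1 alongside an accuracy well above 50 % is not contradictory: a few confident mistakes can dominate the mean.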


Accuracy on the 6250 test articles: 60 %

40:TRAVEL          -> 6 HOME & LIVING
15:COMEDY          -> 9 LATINO VOICES
30:ENTERTAINMENT   -> 20 FIFTY
14:POLITICS        -> 8 BLACK VOICES
4:WELLNESS         -> 8 BLACK VOICES
21:STYLE & BEAUTY  -> 20 FIFTY
6:HOME & LIVING    -> 13 CULTURE & ARTS
34:WEIRD NEWS      -> 24 WORLD NEWS
40:TRAVEL          -> 1 HEALTHY LIVING
6:HOME & LIVING    -> 8 BLACK VOICES

20news dataset.


Figures: Documents, Classes, Vocabulary, Commons, Commons

Scores

2020-05-09 17:59:13,973 : INFO : 20news dataset  : [1764] size, [1323] train, [441] test, [3] classes, [82339] vocabulary, 
[[('', 103146), ('the', 15829), ('of', 7906), ('to', 7812), ('a', 7676), ('and', 6606), ('in', 4973), ('is', 4606), ('for', 3658), ('i', 3444)]] common words,
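The most frequent "word" in the log line above is the empty string. One common cause is splitting on a single literal space, which yields an empty token for every repeated or trailing separator. Whether that is what happened in this pipeline is an assumption; the sketch below only demonstrates the effect on a made-up sentence.

```python
from collections import Counter

# Made-up text with repeated and trailing spaces.
text = "the  orbit of   the moon "

# Splitting on a literal space keeps an empty token per extra separator.
naive_counts = Counter(text.lower().split(" "))

# split() with no argument collapses whitespace runs and drops empty tokens.
clean_counts = Counter(text.lower().split())

print(naive_counts.most_common(1))
print(clean_counts.most_common(1))
```

Filtering empty tokens before counting would also reduce the vocabulary size by one and keep the common-word list comparable across datasets.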
Dataset 20news,
MRR=0.9833711262282692
Accuracy=1.0,

0:comp.graphics      -> 0:comp.graphics
1:rec.sport.baseball -> 1:rec.sport.baseball
0:comp.graphics      -> 0:comp.graphics
0:comp.graphics      -> 0:comp.graphics
2:sci.space          -> 2:sci.space
2:sci.space          -> 2:sci.space
0:comp.graphics      -> 0:comp.graphics
2:sci.space          -> 2:sci.space
0:comp.graphics      -> 0:comp.graphics
2:sci.space          -> 2:sci.space
Dataset 20news , Model 0 ,
Test score: 0.0912760134845499 ,
Test accuracy: 0.9727891087532043

1:rec.sport.baseball -> 1:rec.sport.baseball,
2:sci.space          -> 2:sci.space,
0:comp.graphics      -> 0:comp.graphics,
1:rec.sport.baseball -> 1:rec.sport.baseball,
0:comp.graphics      -> 0:comp.graphics,
2:sci.space          -> 2:sci.space,
1:rec.sport.baseball -> 1:rec.sport.baseball,
1:rec.sport.baseball -> 1:rec.sport.baseball,
0:comp.graphics      -> 0:comp.graphics,
0:comp.graphics      -> 0:comp.graphics,


Accuracy on the 441 test articles: 95 %

2:sci.space          -> 2 sci.space
1:rec.sport.baseball -> 1 rec.sport.baseball
0:comp.graphics      -> 0 comp.graphics
1:rec.sport.baseball -> 1 rec.sport.baseball
1:rec.sport.baseball -> 1 rec.sport.baseball
2:sci.space          -> 2 sci.space
2:sci.space          -> 2 sci.space
0:comp.graphics      -> 0 comp.graphics
2:sci.space          -> 1 rec.sport.baseball
1:rec.sport.baseball -> 1 rec.sport.baseball

Reuters dataset.


Figures: Documents, Classes, Vocabulary, Commons, Commons

Scores

2020-05-09 18:09:39,163 : INFO : reuters dataset  : [11218] size, [8413] train, [2805] test, [46] classes, [30979] vocabulary, 
[[('the', 82723), ('of', 42393), ('to', 40350), ('in', 33157), ('said', 29978), ('and', 29956), ('a', 29581), ('mln', 20141), ('3', 16668), ('for', 15224)]] common words,
Dataset reuters,
MRR=0.8862745098039214
Accuracy=0.9372549019607843, 

9:coffee         -> 9:coffee, 1:grain
10:sugar         -> 10:sugar, 3:earn
9:coffee         -> 9:coffee, 4:acq
11:trade         -> 11:trade, 4:acq
4:acq            -> 4:acq, 16:crude
3:earn           -> 3:earn, 4:acq
3:earn           -> 3:earn, 19:money-fx
4:acq            -> 4:acq, 3:earn
4:acq            -> 4:acq, 3:earn
11:trade         -> 11:trade, 19:money-fx
Dataset reuters , Model 0 ,
Test score: 0.7079413878513955 ,
Test accuracy: 0.8363636136054993

3:earn           -> 3:earn, 20:interest
3:earn           -> 3:earn, 1:grain
3:earn           -> 3:earn, 19:money-fx
3:earn           -> 3:earn, 4:acq
3:earn           -> 3:earn, 20:interest
1:grain          -> 1:grain, 28:livestock
4:acq            -> 4:acq, 3:earn
39:pet-chem      -> 4:acq, 3:earn
16:crude         -> 3:earn, 19:money-fx
3:earn           -> 3:earn, 20:interest


Accuracy on the 2805 test articles: 78 %

25:gold          -> 0 cocoa
3:earn           -> 3 earn
38:orange        -> 1 grain
4:acq            -> 4 acq
3:earn           -> 3 earn
4:acq            -> 4 acq
20:interest      -> 20 interest
4:acq            -> 4 acq
3:earn           -> 3 earn
3:earn           -> 3 earn

IMDb dataset.


Figures: Documents, Classes, Vocabulary, Commons, Json
TODO

#30@2020.05-15k#l