Text classification corpora used to examine the effect of varying class number on the classifiers' performance
Source: 庞观松, University of Technology Sydney, Australia

In many classification problems, such as text classification, the discrepancy between classes tends to shrink as the number of classes increases. This makes it harder to apply dimension reduction and classification techniques effectively.


We recently used the 20 Newsgroups collection to create 8 datasets with varying numbers of classes, in order to examine how the number of classes affects the performance of various classifiers.
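For readers who want to build comparable subsets themselves, here is a minimal sketch of one way to derive datasets with different class counts from a labelled corpus. It is not the procedure used for the downloadable datasets: the function name make_varying_class_datasets, the nested random choice of classes, and the toy class counts are all assumptions made for illustration. With scikit-learn installed, the documents and labels could be loaded with sklearn.datasets.fetch_20newsgroups instead of the toy lists used here.

```python
import random
from typing import Dict, List, Sequence, Tuple


def make_varying_class_datasets(
    docs: Sequence[str],
    labels: Sequence[str],
    class_counts: Sequence[int],
    seed: int = 0,
) -> Dict[int, Tuple[List[str], List[str]]]:
    """Build one sub-corpus per requested class count (hypothetical helper).

    The class order is shuffled once, and the sub-corpus for k classes keeps
    the first k classes of that order, so smaller datasets are nested inside
    larger ones. This nesting is an assumption, not the authors' method.
    """
    classes = sorted(set(labels))
    random.Random(seed).shuffle(classes)

    datasets: Dict[int, Tuple[List[str], List[str]]] = {}
    for k in class_counts:
        if not 1 <= k <= len(classes):
            raise ValueError(f"class count {k} is outside 1..{len(classes)}")
        keep = set(classes[:k])
        # Keep only the documents whose label is among the selected classes.
        pairs = [(d, y) for d, y in zip(docs, labels) if y in keep]
        datasets[k] = ([d for d, _ in pairs], [y for _, y in pairs])
    return datasets


# Toy corpus standing in for 20 Newsgroups (4 classes, 2 documents each).
toy_docs = [
    "gpu drivers", "monitor cable",
    "goalie save", "penalty shot",
    "orbit launch", "moon probe",
    "engine oil", "brake pads",
]
toy_labels = [
    "comp.graphics", "comp.graphics",
    "rec.sport.hockey", "rec.sport.hockey",
    "sci.space", "sci.space",
    "rec.autos", "rec.autos",
]

toy = make_varying_class_datasets(toy_docs, toy_labels, class_counts=[2, 3, 4])
for k, (subset_docs, subset_labels) in toy.items():
    print(k, len(subset_docs), sorted(set(subset_labels)))
```

Each resulting subset can then be passed to the same classifier pipeline, so that the number of classes is the only factor that changes between runs.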

The datasets were used in our recent DMKD paper, available at http://link.springer.com/article/10.1007/s10618-014-0358-x. If you want to conduct similar experiments, you can download the datasets below.

Note: The original 20 Newsgroups corpus is a collection of nearly 20,000 newsgroup documents, organized evenly into 20 different newsgroups. It is available at http://qwone.com/~jason/20Newsgroups/.


