Take your research further with SCHOLAT data
SROAD (SCHOLAT Research Open-Access Dataset) is designed to help academics and independent researchers advance their research on academic social networks, the SCHOLAT knowledge graph, smart education, and other AI-related fields.


SCHOLAT is an emerging vertical social networking system designed and built specifically for scholars, learners, and course instructors. Its main goal is to enhance collaboration and social interaction around scholarly and learning discourse within the scholarly community. Beyond its social networking capabilities, SCHOLAT provides modules that encourage collaborative, interactive discussion, such as chat, email, events, and news posts.

SCHOLAT Open-Access Dataset

Name                     Nodes    Edges     Description
SCHOLAT Social Network   16,007   202,248   User links and attributes for community detection

* More SCHOLAT open-access datasets will be released soon.

Download Notice

To obtain a dataset, log in to SCHOLAT and fill out the application form.

Copyright Notice

  1. Respect the privacy of personal information from the original source.
  2. The original copyright of all datasets belongs to "SCHOLAT Lab", which collects, organizes, filters, and cleans the data.
  3. If you use a dataset for in-depth study, acknowledge the data provider, "SCHOLAT Lab", in your results.
  4. Each dataset is provided only to the specified applicant or study group, for research purposes. It may not be used for any commercial purpose without permission.
  5. If these terms change, the latest online version prevails.

SCHOLAT Social Network

Communities are implicit structures in social networks. In academic social networks, users with similar or identical research interests are more likely to belong to the same community, where they share close links and similar attributes. Accurate community detection results can then support user analytics and user recommendation.
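To make the idea of "close links and similar attributes" concrete, the sketch below runs a plain weighted label propagation in which each link is weighted by 1 plus the Jaccard similarity of the two users' attribute word sets. This is a generic textbook technique chosen for illustration, not the algorithm from the paper cited on this page; the function names `jaccard` and `label_propagation`, the weighting rule, and the input shapes (a dict of user ID to neighbor set, a dict of user ID to attribute word set) are all assumptions.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two attribute sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_propagation(adjacency, attributes, max_iter=20):
    """Hypothetical attribute-weighted label propagation (illustrative only).

    adjacency:  dict user -> set of linked users (assumed undirected/symmetric)
    attributes: dict user -> set of attribute words
    Each link gets weight 1 + Jaccard(attributes of both ends), so linked
    users with similar attributes pull harder toward the same community.
    """
    labels = {u: u for u in adjacency}  # every user starts in its own community
    for _ in range(max_iter):
        changed = False
        for u in sorted(adjacency):  # fixed visiting order keeps runs reproducible
            scores = defaultdict(float)
            for v in adjacency[u]:
                w = 1.0 + jaccard(attributes.get(u, set()), attributes.get(v, set()))
                scores[labels.get(v, v)] += w
            if not scores:
                continue  # an isolated user keeps its own label
            best = max(scores.values())
            # Break ties on the smallest label so results are deterministic.
            new_label = min(label for label, s in scores.items() if s == best)
            if new_label != labels[u]:
                labels[u] = new_label
                changed = True
        if not changed:
            break  # converged: no user changed community in this pass
    return labels
```

On a toy graph of two triangles joined by a single bridge, where each triangle's users share attribute words, the two triangles end up with different labels. The attribute weight only biases the vote; real community detection on this data would need a more principled fusion of topology and attributes.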

This dataset is designed for community detection that fuses user links and attributes. It covers 16,007 users and 202,248 links and consists of three parts:
(1) The "attribute" directory contains 16,007 files holding user attributes, one file per user, named by user ID.
(2) "links.txt" contains 202,248 lines. Each line records one link between two users (friendship, membership in the same research team, or enrollment in the same course), given as two user IDs separated by a TAB.
(3) "lexicons.txt" contains 25,817 words (15,790 Chinese and 10,027 English), which together make up the vocabulary of all user attributes.
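The three parts described above can be loaded with a few lines of Python. This is a minimal sketch under assumptions this page does not confirm: all files are UTF-8 text, each attribute file holds whitespace-separated words, "lexicons.txt" lists one word per line, attribute file names may or may not carry an extension, and links are treated as undirected. The function names `load_links`, `load_attributes`, and `load_lexicon` are hypothetical.

```python
import os
from collections import defaultdict


def load_links(path):
    """Parse links.txt: one link per line, two user IDs separated by a TAB.

    Assumption: links are undirected, so each one is stored in both directions.
    """
    adjacency = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            u, v = line.split("\t")[:2]
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def load_attributes(directory):
    """Read one attribute file per user; the file name (minus any extension) is the user ID.

    Assumption: attribute words inside each file are whitespace-separated.
    """
    attributes = {}
    for name in os.listdir(directory):
        user_id = os.path.splitext(name)[0]
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            attributes[user_id] = set(f.read().split())
    return attributes


def load_lexicon(path):
    """Assumption: lexicons.txt lists one word per line; blank lines are ignored."""
    with open(path, encoding="utf-8") as f:
        return [w.strip() for w in f if w.strip()]
```

After loading, `sum(len(n) for n in adjacency.values()) // 2` counts distinct undirected links and `len(load_lexicon(...))` counts lexicon entries, which can be compared against the figures above as a sanity check. Duplicate or reversed link lines would be merged by the set-based adjacency, so the link count may come out lower if such lines exist.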

The previous version of this dataset was used in the first ChineseCSCW Big Data Analytics Competition (ChineseCSCW Cup) at the 15th Chinese Conference on Computer Supported Cooperative Work and Social Computing (ChineseCSCW 2020). Details are available at https://www.scholat.com/confweb/CCSCW2020/big_data_competiton.jsp. Compared with the competition version, this version includes additional user links.

How to cite:
This website accompanies our paper:

  1. Qing Xu, Lunjie Qiu, Ronghua Lin, Yong Tang, Chaobo He, and Chengzhe Yuan, "An Improved Community Detection Algorithm via Fusing Topology and Attribute Information." CSCWD 2021, [in press].

Copyright ©2009-2021. SCHOLAT.com