Title | Comparing writing style feature-based classification methods for estimating user reputations in social media |
---|---|
Published in | SpringerPlus, March 2016 |
DOI | 10.1186/s40064-016-1841-1 |
Pubmed ID | |
Authors | Jong Hwan Suh |
Abstract | In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the quality of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of state-of-the-art methods: four writing style features, i.e. lexical, syntactic, structural, and content-specific, and eight classification techniques, i.e. four base learners (C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)) and four Random Subspace (RS) ensemble methods built on those base learners. With Daum Agora, a South Korean Web forum, as the test bed, the experimental results show that the full feature set, which includes content-specific features, combined with RS-SVM (RS applied to SVM) gives the best classification accuracy when the test bed poster reputations are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations from the literature review: first, the feature set that adds content-specific features outperforms the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputations, i.e. like, dislike, sum, and portfolio, the results show that the portfolio approach gives the highest accuracy. |
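
The Random Subspace idea named in the abstract can be illustrated with a minimal sketch: each base learner is trained on a random subset of the feature columns, and the ensemble predicts by majority vote. This is not the paper's implementation. As an assumption made to keep the example dependency-free, a simple nearest-centroid learner stands in for SVM, and the feature values, function names, and parameters (`n_learners`, `subspace_size`, `seed`) are all hypothetical.

```python
import random
from collections import Counter


def train_centroid(rows, labels, feature_idx):
    """Fit a nearest-centroid learner using only the chosen feature columns."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(feature_idx))
        for k, j in enumerate(feature_idx):
            acc[k] += row[j]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row, feature_idx):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(feature_idx, centroids[label]))
    return min(sorted(centroids), key=dist)


def train_random_subspace(rows, labels, n_learners=11, subspace_size=2, seed=0):
    """Train n_learners base learners, each on a random subset of feature columns."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    ensemble = []
    for _ in range(n_learners):
        feature_idx = sorted(rng.sample(range(n_features), subspace_size))
        ensemble.append((feature_idx, train_centroid(rows, labels, feature_idx)))
    return ensemble


def predict_random_subspace(ensemble, row):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(predict_centroid(c, row, idx) for idx, c in ensemble)
    return votes.most_common(1)[0][0]


# Made-up writing style feature vectors (hypothetical values, not from the paper).
rows = [
    [0.9, 0.8, 0.7, 0.9],
    [0.8, 0.9, 0.8, 0.7],
    [0.1, 0.2, 0.3, 0.1],
    [0.2, 0.1, 0.2, 0.3],
]
labels = ["Good", "Good", "Bad", "Bad"]

ensemble = train_random_subspace(rows, labels)
print(predict_random_subspace(ensemble, [0.85, 0.75, 0.9, 0.8]))
```

Because every learner sees only part of the feature set, the ensemble's members disagree more than identical learners trained on all features would, which is the diversity that ensemble voting relies on. Swapping in a different base learner only requires replacing the two `*_centroid` functions.
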
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Lebanon | 1 | 2% |
Unknown | 41 | 98% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Master | 8 | 19% |
Student > Bachelor | 6 | 14% |
Student > Ph. D. Student | 6 | 14% |
Student > Doctoral Student | 4 | 10% |
Researcher | 4 | 10% |
Other | 6 | 14% |
Unknown | 8 | 19% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 14 | 33% |
Engineering | 4 | 10% |
Business, Management and Accounting | 4 | 10% |
Linguistics | 4 | 10% |
Medicine and Dentistry | 2 | 5% |
Other | 5 | 12% |
Unknown | 9 | 21% |