
Comparing writing style feature-based classification methods for estimating user reputations in social media

Overview of attention for article published in SpringerPlus, March 2016

Mentioned by

1 Facebook page

Citations

7 Dimensions

Readers on

42 Mendeley
Published in
SpringerPlus, March 2016
DOI 10.1186/s40064-016-1841-1
Pubmed ID
Authors

Jong Hwan Suh

Abstract

In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media and to ensure the quality of users and their posts. To address this, this study designs and examines an automatic approach that uses writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputation based on the collected data, it evaluates the classification performance of state-of-the-art methods: four types of writing style features (lexical, syntactic, structural, and content-specific) and eight classification techniques, namely the four base learners C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB), plus four Random Subspace (RS) ensemble methods built on those base learners. With the South Korean Web forum Daum Agora as the test bed, the experimental results show that the full feature set including content-specific features, combined with RS-SVM (RS applied to SVM), gives the best classification accuracy when poster reputations in the test bed are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations drawn from the literature review: first, the feature set that adds content-specific features outperforms the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputation (like, dislike, sum, and portfolio), the portfolio approach gives the highest accuracy.
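The Random Subspace (RS) method named in the abstract trains each base learner on a randomly chosen subset of the feature columns and then combines the learners' predictions, here by majority vote. The following is a minimal, stdlib-only sketch under loud assumptions: a toy nearest-centroid classifier stands in for the C4.5, NN, SVM, or NB base learners used in the study, and the function names, ensemble size, subspace size, and toy feature vectors are all hypothetical rather than the author's implementation.

```python
import random
from collections import Counter

# Hypothetical stand-in base learner: a nearest-centroid classifier.
# The study used C4.5, NN, SVM, and NB; this is only for illustration.
def train_centroid(X, y, features):
    """Compute the per-class mean vector over the chosen feature indices."""
    sums = {}
    counts = Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return features, centroids


def predict_centroid(model, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    features, centroids = model

    def sq_dist(label):
        return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[label]))

    return min(centroids, key=sq_dist)


def train_random_subspace(X, y, n_learners=11, subspace_size=2, seed=0):
    """Train each base learner on its own random subset of feature columns."""
    rng = random.Random(seed)
    n_features = len(X[0])
    models = []
    for _ in range(n_learners):
        features = rng.sample(range(n_features), subspace_size)
        models.append(train_centroid(X, y, features))
    return models


def predict_random_subspace(models, row):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Toy data: 4 hypothetical writing-style feature values per poster.
X = [[5, 4, 6, 5], [6, 5, 5, 6], [5, 6, 6, 5],
     [1, 2, 1, 0], [0, 1, 2, 1], [1, 0, 1, 2]]
y = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]

models = train_random_subspace(X, y)
print(predict_random_subspace(models, [5, 5, 5, 5]))  # Good
print(predict_random_subspace(models, [1, 1, 1, 1]))  # Bad
```

Swapping in a real SVM for the nearest-centroid stand-in would give an RS-SVM style ensemble; the subspace sampling and voting steps stay the same.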
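The pairwise t tests on accuracy compare two methods scored on the same data splits by testing whether the mean of their per-split accuracy differences is zero. A minimal sketch, assuming accuracies are paired by split; the function name and the accuracy values are hypothetical and invented for illustration, not results from the study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic for two methods scored on the same data splits."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 splits")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    # t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Invented per-split accuracies, for illustration only.
rs_svm_acc = [0.82, 0.80, 0.85, 0.83, 0.81]
svm_acc = [0.78, 0.79, 0.80, 0.80, 0.77]

t_stat, df = paired_t_statistic(rs_svm_acc, svm_acc)
print(f"t = {t_stat:.3f}, df = {df}")
```

With 4 degrees of freedom, a two-sided test at the 5% level rejects equal mean accuracy when |t| exceeds about 2.776.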

Mendeley readers

The data shown below were compiled from readership statistics for 42 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Lebanon 1 2%
Unknown 41 98%

Demographic breakdown

Readers by professional status Count As %
Student > Master 8 19%
Student > Bachelor 6 14%
Student > Ph. D. Student 6 14%
Student > Doctoral Student 4 10%
Researcher 4 10%
Other 6 14%
Unknown 8 19%
Readers by discipline Count As %
Computer Science 14 33%
Engineering 4 10%
Business, Management and Accounting 4 10%
Linguistics 4 10%
Medicine and Dentistry 2 5%
Other 5 12%
Unknown 9 21%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 03 March 2016.
All research outputs: #20,311,744 of 22,852,911 outputs
Outputs from SpringerPlus: #1,459 of 1,849 outputs
Outputs of similar age: #252,105 of 298,624 outputs
Outputs of similar age from SpringerPlus: #132 of 161 outputs
Altmetric has tracked 22,852,911 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 1,849 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.7. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 298,624 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 161 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.