Report for: Comparing writing style feature-based classification methods for estimating user reputations in social media

Title	Comparing writing style feature-based classification methods for estimating user reputations in social media
Published in	SpringerPlus, March 2016
DOI	10.1186/s40064-016-1841-1
Pubmed ID	27006870
Authors	Jong Hwan Suh
Abstract	In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the quality of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of the state-of-the-art methods: four writing style features, i.e. lexical, syntactic, structural, and content-specific, and eight classification techniques, i.e. four base learners (C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)) and four Random Subspace (RS) ensemble methods based on the four base learners. When South Korea's Web forum, Daum Agora, was selected as a test bed, the experimental results show that the configuration of the full feature set containing content-specific features and RS-SVM combining RS and SVM gives the best accuracy for classification if the test bed poster reputations are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations coming from the literature reviews: first, the feature set adding content-specific features outperforms the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputations, i.e. like, dislike, sum, and portfolio, the results show that the portfolio approach gives the highest accuracy.


Mendeley readers

The data shown below were compiled from readership statistics for 42 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Lebanon	1	2%
Unknown	41	98%

Demographic breakdown

Readers by professional status	Count	As %
Student > Master	8	19%
Student > Bachelor	6	14%
Student > Ph. D. Student	6	14%
Student > Doctoral Student	4	10%
Researcher	4	10%
Other	6	14%
Unknown	8	19%

Readers by discipline	Count	As %
Computer Science	14	33%
Engineering	4	10%
Business, Management and Accounting	4	10%
Linguistics	4	10%
Medicine and Dentistry	2	5%
Other	5	12%
Unknown	9	21%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 03 March 2016.

Comparison group	Rank	Out of
All research outputs	#20,311,744	22,852,911 outputs
Outputs from SpringerPlus	#1,459	1,849 outputs
Outputs of similar age	#252,105	298,624 outputs
Outputs of similar age from SpringerPlus	#132	161 outputs

Altmetric has tracked 22,852,911 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.

So far Altmetric has tracked 1,849 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.7. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 298,624 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 161 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
