IRJET-SOCIAL SUMMARIZATION FOR E-COMMERCE SITE BY EVALUATING THE COMMENTS AND RATINGS

Published on June 2016 | Categories: Types, Presentations

International Research Journal of Engineering and Technology (IRJET)
Volume: 02 Issue: 07 | Oct-2015

www.irjet.net

e-ISSN: 2395 -0056
p-ISSN: 2395-0072

SOCIAL SUMMARIZATION FOR E-COMMERCE SITE BY EVALUATING
THE COMMENTS AND RATINGS
B. Bhanu Vidya Kiran1, M. Sampath Kumar2
1 M.Tech, Department of Computer Science & Systems Engineering, Andhra University, India
2 Associate Professor, Department of Computer Science & Systems Engineering, Andhra University, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - The Internet is widely applied in various fields, and e-commerce is one of the most popular. To gain buyers' trust, e-commerce sites use reputation-based trust models that aggregate feedback ratings into reputation trust scores for sellers. One key issue for any e-commerce application is the "all good reputation" problem: feedback ratings are overwhelmingly positive, so it becomes very difficult to select trustworthy sellers. Buyers, however, express their opinions openly and genuinely in feedback comments, which are observed by the sellers and by the e-commerce site management. In this paper we propose CommTrust, an approach that combines dependency relation analysis, a tool recently developed in natural language processing, with opinion mining from feedback comments. We further propose an algorithm based on dependency relation analysis and Latent Dirichlet Allocation (LDA) topic modelling to cluster expressions into dimensions and then compute aggregated dimension ratings and weights.

Key Words: e-commerce, text mining, natural language processing, opinion mining, latent Dirichlet allocation

1. INTRODUCTION

It is very difficult to find accurate trust values for sellers and their products. Reputation reporting systems have been implemented in e-commerce systems such as eBay and Amazon, where the overall reputation scores for sellers are computed by aggregating feedback ratings. The main issue with the eBay reputation management system is the "all good reputation" problem: the ratings show a strong positive bias, so it is not easy for buyers to select sellers based on them. The detailed seller ratings (DSRs) are aggregated as rating scores on a 1 to 5 star scale. One reason for the lack of negative ratings at e-commerce sites is that users who leave negative feedback ratings can damage their own reputation.

In this paper, we propose a comment-based multi-dimensional trust model, CommTrust, that mines e-commerce feedback comments. With CommTrust, comprehensive trust profiles are computed for sellers, which include dimension reputation scores and weights, as well as the overall trust score obtained by aggregating the dimension reputation scores. CommTrust combines dependency relation analysis (DRA), a tool recently developed in natural language processing, with lexicon-based opinion mining techniques to extract opinion expressions from feedback comments. We further propose an algorithm based on DRA and LDA topic modelling techniques to cluster expressions into dimensions and compute the dimension ratings and weights.

Individual-level trust models aim to compute the reliability of peers and assist buyers in their decision making, whereas system-level models aim to regulate the behaviour of peers, prevent fraud and ensure system security. Rating aggregation algorithms for computing individual reputation scores range from the simple positive feedback percentage and the average of individual star ratings, as used on Amazon and other e-commerce systems, to models such as Kalman inference [20], which also compute the variance and confidence level of the trust score. Over time, more factors have come into play in feedback ratings, reputation models and comment-based trust values.

2. RELATED WORK

[1] Extracting product features and opinions from reviews

This paper introduces OPINE, an unsupervised information extraction system that embodies a solution to four opinion mining tasks: identifying product features, identifying opinions regarding product features, determining the polarity of opinions, and ranking opinions based on their strength. Its output is a set of product features. OPINE is compared with the most relevant previous review mining systems to find whether its precision is better on the same data sets. The other systems are mostly used to identify the polarity of whole documents, whereas OPINE also addresses opinion phrase extraction and opinion phrase polarity determination.

© 2015, IRJET

ISO 9001:2008 Certified Journal

Page 295
[2] Mining and summarizing customer reviews
With the rapid growth of e-commerce, more and more products are sold on the Web. To satisfy customers and give them a good shopping experience, online merchants collect reviews from customers, and mining is done on this feedback. Popular products can attract many reviews, while others receive few. Summarizing the set of customer reviews of a product involves three subtasks:

a. Identifying the features of the product on which customers have expressed their opinions.
b. For each feature, identifying the review sentences that give positive or negative opinions.
c. Producing a summary using the discovered information.

This work is related to review classification, but rather than classifying each review as a whole it classifies individual sentences, so that positive opinions are reviewed at one time and negative opinions at another. An important part of information-gathering behaviour is finding out what other people think. To facilitate future work, a discussion is also made of available resources, datasets and the evaluation of customer reviews.

3. SYSTEM ARCHITECTURE:
The figure shows the scheme of the CommTrust framework. Opinion expressions and their associated ratings are first extracted from feedback comments. Dimension trust scores together with their weights are then computed by clustering aspect expressions into dimensions and aggregating the ratings within each dimension.

3.1 COMMTRUST COMMENT BASED MULTI DIMENSIONAL TRUST EVALUATION

Buyers express their opinions openly and honestly in feedback comments. Our analysis of feedback comments on e-commerce sites reveals that even when a buyer gives a positive rating for a transaction, the comment sometimes carries mixed opinions. To know the exact opinion of a buyer, sellers therefore look at both the feedback comments and the ratings. For example, a buyer may give a positive rating and still mention bad communication or late shipping in the comment. From such salient aspects the seller can judge whether shipping, customer service or product delivery is lacking, and the e-commerce site management can then take care of these factors. Whether a product falls short of buyers' satisfaction can also be seen from its ratings. The overall trust score T for a seller is the weighted aggregation of the dimension trust scores for the seller,

$$T = \sum_{d=1}^{m} w_d \, t_d$$

where t_d and w_d represent the trust score and weight for dimension d (d = 1, ..., m).

Fig. 1 plots the trust score t_d by Equation 3 in relation to different settings of the total number of ratings n and the pseudo counts m (here m denotes the pseudo-count setting, not the number of dimensions), where y is the number of observed positive ratings. The figure is plotted for y/n = 0.8, and similar trends are observed for other values of y/n. It shows that when the total number of observed ratings n is large (n ≥ 300), t_d is not very sensitive to the setting of m and converges to the observed positive rating frequency of 0.8. When there is a limited number of observed ratings, that is n < 300, an observed high positive rating frequency y/n is very likely an overestimation, and so m is set to regulate the estimated value for t_d. With m = 2, t_d ≈ 0.8 when n ≥ 50. On the other hand, with m = 20, t_d ≈ 0.8 only when n ≈ 300. From our experiments, settings of m = 6..20 typically give stable results. By default, we set m = 6.

We first describe our approach based on typed dependency analysis for extracting aspect opinion expressions and identifying their associated ratings. We then propose an algorithm based on LDA for clustering dimension expressions into dimensions and computing dimension weights.

3.2 EXTRACTING ASPECT EXPRESSIONS AND RATINGS BY TYPED DEPENDENCY ANALYSIS
Typed dependency analysis is an NLP tool that represents the grammatical relationships in a sentence: the sentence is parsed into a set of relations between pairs of words, each relation linking a head and a dependent. Take the example "super quick shipping product work excellent". The phrase "super quick shipping" is represented as three dependency relations. The adjective modifier relations amod(shipping-3, super-1) and amod(shipping-3, quick-2) indicate that "super" modifies "shipping" and "quick" modifies "shipping"; the number after each word is its position in the sentence. Words are also annotated with POS tags such as noun (NN), verb (VB), adjective (JJ) and adverb (RB). The modifying relations can thus be denoted as (modifier, head) pairs. The ratings of dimension expressions for the head terms are obtained by identifying the prior polarity of the modifier terms with SentiWordNet. In SentiWordNet the prior polarity of a term is positive, negative or neutral, which corresponds to a rating of +1, -1 or 0.
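The rating step of Section 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dependency relations are assumed to have been produced already by a parser, the function name rate_expressions is hypothetical, and the small PRIOR_POLARITY dictionary is an invented stand-in for a SentiWordNet lookup.

```python
# Hypothetical stand-in for SentiWordNet prior polarities (illustrative values only).
PRIOR_POLARITY = {
    "super": 1, "quick": 1, "excellent": 1, "good": 1,
    "late": -1, "bad": -1, "slow": -1,
}


def rate_expressions(relations):
    """Turn amod relations into (modifier, head, rating) triples.

    relations: iterable of (relation_name, head, modifier) tuples,
    as a dependency parser might emit them.
    """
    rated = []
    for name, head, modifier in relations:
        if name != "amod":
            continue  # only adjective-modifier relations are used here
        # Modifiers missing from the lexicon are treated as neutral (0).
        rating = PRIOR_POLARITY.get(modifier, 0)
        rated.append((modifier, head, rating))
    return rated


# Relations for "super quick shipping" plus one negative expression.
relations = [
    ("amod", "shipping", "super"),
    ("amod", "shipping", "quick"),
    ("amod", "communication", "bad"),
]
rated = rate_expressions(relations)
```

Each triple keeps the (modifier, head) pair together with its +1, -1 or 0 rating, which is the form the later clustering and aggregation steps consume.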

Sample Comments on eBay

No    Comment                                          eBay rating
C1    Beautiful item!                                  1
C2    This phone is simply awesome                     1
C3    I'm not satisfied with the delivery of product   1.2
C4    Best seller! Thank you                           1.8
C5    Wrong colour was sent. Item was damaged          1.3
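The trust computation of Section 3.1 can be sketched as below. Equation 3 is not reproduced in this text, so the dimension score here assumes an m-estimate with a neutral prior of 0.5, t_d = (y + 0.5 m) / (n + m); this form is an assumption that happens to match the behaviour described for Fig. 1 (for y/n = 0.8 the score approaches 0.8 as n grows, and larger m pulls small samples towards 0.5). The function names and the example counts and weights are hypothetical.

```python
def dimension_trust(positive, total, m=6, prior=0.5):
    """Assumed m-estimate of the positive rating rate for one dimension.

    positive: number of positive ratings y; total: number of ratings n;
    m: pseudo counts (default 6, as in the text); prior: assumed to be 0.5.
    """
    return (positive + m * prior) / (total + m)


def overall_trust(scores, weights):
    """Weighted aggregation T = sum over dimensions d of w_d * t_d."""
    return sum(weights[d] * scores[d] for d in scores)


# Hypothetical (positive, total) rating counts per dimension.
counts = {"shipping": (240, 300), "item": (45, 50)}
scores = {d: dimension_trust(y, n) for d, (y, n) in counts.items()}
weights = {"shipping": 0.5, "item": 0.5}  # hypothetical weights summing to 1
T = overall_trust(scores, weights)
```

With no ratings at all, dimension_trust returns the prior, which is the regularising effect the pseudo counts are meant to provide.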

4. CLUSTERING DIMENSION EXPRESSIONS INTO DIMENSIONS
In order to cluster aspect expressions into semantically coherent categories, we use the Lexical-LDA algorithm. Unlike the conventional topic modelling approach, in which LDA takes a document-by-term matrix as input, we use two types of lexical knowledge to supervise the clustering of dimension expressions into dimensions, so as to form meaningful clusters:

a. As comments are short, the co-occurrence of head terms in comments is not very informative. We therefore use the co-occurrence of dimension expressions with respect to the same modifier across comments, which gives a more meaningful context.
b. In some feedback comments we observe that the same aspect of an e-commerce transaction is commented on more than once.

Using this shallow lexical knowledge from dependency relations, the clustering problem is formulated as topic modelling as follows: the input to Lexical-LDA is the set of dependency relations for dimension expressions, in the form of (modifier, head) pairs or their negations, such as (fast, shipping) or (not-good, seller). For LDA, Gibbs sampling has been proposed for approximate inference, and is described as follows. Let M, K and V denote the number of documents, the number of topics and the number of terms in the vocabulary, and let \vec{\alpha} and \vec{\beta} be the hyperparameters on the mixing proportions and the mixture components of topics. Gibbs sampling uses the distribution of a topic k for a word token w_i, where i = (m, n) denotes the n-th word in the m-th document, \vec{w} = {w_i = t, \vec{w}_{\neg i}}, \vec{z} = {z_i = k, \vec{z}_{\neg i}}, and n denotes a count.
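As an illustration of the Gibbs sampling notation above, here is a minimal collapsed Gibbs sampler for standard LDA with symmetric priors. It is a sketch only: it does not include the lexical constraints of Lexical-LDA, the function name gibbs_lda and the toy input are hypothetical, and each "document" is assumed to be the list of head-term ids observed with one modifier.

```python
import random


def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (symmetric alpha and beta).

    docs: list of documents, each a list of term ids in range(V).
    Returns the topic assignment of every token and the topic-term counts.
    """
    rng = random.Random(seed)
    n_mk = [[0] * K for _ in docs]      # tokens in document m assigned to topic k
    n_kt = [[0] * V for _ in range(K)]  # tokens of term t assigned to topic k
    n_k = [0] * K                       # all tokens assigned to topic k
    z = []
    for m, doc in enumerate(docs):      # random initial assignment
        z_m = []
        for t in doc:
            k = rng.randrange(K)
            z_m.append(k)
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1
        z.append(z_m)

    for _ in range(iterations):
        for m, doc in enumerate(docs):
            for n, t in enumerate(doc):
                k = z[m][n]
                # Remove token i = (m, n) from the counts.
                n_mk[m][k] -= 1
                n_kt[k][t] -= 1
                n_k[k] -= 1
                # p(z_i = j | rest) is proportional to
                # (n_kt + beta) / (n_k + V * beta) * (n_mk + alpha).
                weights = [
                    (n_kt[j][t] + beta) / (n_k[j] + V * beta) * (n_mk[m][j] + alpha)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[m][n] = k
                n_mk[m][k] += 1
                n_kt[k][t] += 1
                n_k[k] += 1
    return z, n_kt


# Hypothetical toy input: three modifiers, four head terms (ids 0..3).
docs = [[0, 1, 0], [2, 3, 2], [0, 1, 1]]
z, n_kt = gibbs_lda(docs, K=2, V=4, iterations=50)
```

After sampling, the rows of n_kt give, for each topic, how often each head term was assigned to it; head terms can then be grouped by their dominant topic.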

5. EXPERIMENT
Experiments were conducted on two e-commerce datasets and one restaurant review dataset in order to evaluate various aspects of CommTrust, including the trust model and the Lexical-LDA algorithm for clustering dimension expressions. The restaurant review dataset, which is outside e-commerce, is used to demonstrate the generality of Lexical-LDA.

5.1 DATASETS
Ten eBay sellers are taken, with two sellers taken from each of four categories. All the categories are listed together with the sellers' products, and the feedback profile is then extracted for each seller:

a. The feedback score is the total number of positive ratings for a seller.
b. The positive feedback percentage of a seller is calculated as (positive ratings) / (positive ratings + negative ratings).

Likewise, another shopping site is taken and two items are evaluated for each of four categories. Note that each item illustrates the feedback of the sellers based on their ratings.

eBay seller product information

5.2 EVALUATE METRICS
The purpose of trust evaluation for an e-commerce application is to rank sellers and point users to trustworthy sellers. It also helps sellers in their business. Here a large number of sellers are taken from various categories of products. The feedback comments are evaluated in any order and the results are then arranged from high to low. The average rating is calculated for each product by clustering the head terms. The accuracy is computed as

$$Acc(H) = \frac{\sum_{i=1}^{k} N_i}{|V|}$$
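The two feedback-profile quantities defined in Section 5.1 can be computed as in the sketch below. The encoding of ratings as +1 (positive), 0 (neutral) and -1 (negative) and the function name feedback_profile are assumptions made for illustration.

```python
def feedback_profile(ratings):
    """Return (feedback score, positive feedback percentage) for one seller.

    ratings: iterable of +1 (positive), 0 (neutral) or -1 (negative);
    this encoding is an assumption, not taken from the datasets.
    """
    ratings = list(ratings)
    positive = sum(1 for r in ratings if r > 0)
    negative = sum(1 for r in ratings if r < 0)
    rated = positive + negative
    # Neutral ratings are ignored in the percentage, as in the formula.
    percentage = positive / rated if rated else None
    return positive, percentage


score, percentage = feedback_profile([1, 1, 0, -1, 1])
```

A seller with no positive or negative ratings gets a percentage of None rather than a division by zero.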

The ultimate goal of trust evaluation for e-commerce
applications is to rank sellers and help users select
trustworthy sellers to transact with. In this respect, in
addition to absolute trust scores, relative rankings are
more important for evaluating the performance of
different trust models. To this end, we employ Kendall’s τ
[50] to measure the correlation between two rankings
based on the number of pairwise swaps that are needed to
transform one ranking into another. τ falls in [−1, 1]: a
positive value indicates positive correlation, zero
represents independence and a negative value indicates
negative correlation. τ is the standard metric for
comparing information retrieval systems, and it is
generally considered that τ ≥ 0.9 for a correlation test
suggests two system rankings are equivalent. A large value
for |τ| with p ≤ 0.05 suggests that two rankings are
correlated.
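A minimal sketch of Kendall's τ for two rankings of the same sellers is given below. It assumes there are no ties (the tau-a variant) and does not compute the p-value; the function name and the example rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {item: rank} dicts.

    Assumes both dicts rank the same items and contain no tied ranks.
    """
    items = list(rank_a)
    concordant = discordant = 0
    for i, j in combinations(items, 2):
        # A pair is concordant when both rankings order it the same way.
        product = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / pairs


ranking_x = {"s1": 1, "s2": 2, "s3": 3, "s4": 4}
ranking_y = {"s1": 1, "s2": 3, "s3": 2, "s4": 4}  # one adjacent swap
tau = kendall_tau(ranking_x, ranking_y)
```

Identical rankings give τ = 1 and fully reversed rankings give τ = −1; each pairwise swap lowers τ by 2 divided by the number of pairs.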

5.3 USER STUDY
A user study was conducted to elicit users' rankings of sellers from reading feedback comments, which were also used as the ground truth for evaluating the CommTrust multi-dimensional trust evaluation model. Inspired by evaluation techniques from the Information Retrieval community [51], experiment participants were asked to judge differences rather than make absolute ratings. For ten sellers, each seller is paired with every other seller, forming 45 pairs. The order of pairs and of sellers within pairs was randomised to avoid any presentational bias. Each pair was judged by five users, and a seller preferred by at least three users received one vote. The total number of preference votes from the 45 pairs for each seller was used as the preference score to rank sellers.

It is infeasible to ask participants to read all comments for two sellers and choose a preferred seller. We therefore generated summaries of comments for sellers. The comment summaries for each pair of sellers were presented side by side to elicit users' preference judgements. For a seller, we generated opinionated phrases for four dimensions, where positive and negative phrases for each dimension are ordered by decreasing frequency. The three most frequent positive and negative phrases for each dimension formed the summary for a seller.

Under the column heading "Comment rank" is the ranking of sellers by user preferences after participants read the comment summaries for sellers. The correlation between rankings is measured by Kendall’s τ. The rank difference between two ranking vectors is defined as

$$\frac{1}{N} \sum_{i=1}^{N} \left| rank(i) - rank'(i) \right|$$

where rank(i) and rank'(i) are respectively the ranks for seller i by the two ranking methods, and N = 10. The low Kendall’s τ values (0.1111 and 0.4222) and high p-values (0.7275 and 0.1083) suggest that on eBay and Amazon, user preference rankings after reading comment summaries are not strongly correlated with the rankings by the respective eBay and Amazon reputation systems. This suggests that the comments contain distinct information for users to rank sellers.
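The voting scheme and the rank difference of the user study can be sketched as follows. The majority threshold of three out of five judges comes from the text; the rank difference is assumed to be the mean absolute difference of ranks, which is consistent with its later reading as an average difference in ranks. Function names, seller labels and judgements are hypothetical.

```python
from itertools import combinations


def preference_scores(sellers, judgements, majority=3):
    """Count one vote per pair for the seller preferred by a majority.

    judgements maps each pair (a, b) to the list of sellers that the
    individual judges preferred for that pair.
    """
    scores = {s: 0 for s in sellers}
    for pair in combinations(sellers, 2):
        votes = judgements[pair]
        for seller in pair:
            if votes.count(seller) >= majority:
                scores[seller] += 1
    return scores


def mean_rank_difference(rank_a, rank_b):
    """Assumed rank difference: mean of |rank(i) - rank'(i)| over sellers."""
    return sum(abs(rank_a[s] - rank_b[s]) for s in rank_a) / len(rank_a)


sellers = ["A", "B", "C"]
judgements = {
    ("A", "B"): ["A", "A", "B", "A", "A"],
    ("A", "C"): ["C", "C", "C", "A", "C"],
    ("B", "C"): ["B", "C", "C", "C", "B"],
}
scores = preference_scores(sellers, judgements)
difference = mean_rank_difference({"A": 2, "B": 3, "C": 1}, {"A": 1, "B": 2, "C": 3})
```

With ten sellers the same loop runs over the 45 pairs, and sorting sellers by their scores yields the comment-based ranking.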

5.4 Evaluation of Lexical-LDA
Informal language expressions are widely used in
feedback comments. Some pre-processing was first
performed: Spelling correction was applied. Informal
expressions like A+++ and thankx were replaced with AAA
and thanks. The Stanford dependency relation parser was
then applied to produce the dependency relation
representation of comments and dimension expressions
were extracted. The dimension expressions were then
clustered to dimensions by the Lexical-LDA algorithm.
The ranking difference of 3 for the ten eBay sellers between the
rankings obtained by reading comments and by the eBay reputation
system suggests that on average there is a difference of 3
ranks for sellers between the two approaches. Similarly, for
Amazon sellers there is a difference of 1.8 ranks on average.
Our user study suggests that the content of comments can be
used to reliably evaluate the trustworthiness of sellers.
To evaluate Lexical-LDA, the ground truth for clustering
was first established. Dimension expressions are (modifier,
head) pairs, and to remove noise only those pairs with
support for head terms of at least 0.1% or three comments
(whichever is larger) were considered for manual
clustering. Some head terms resulted from parsing errors
that do not appear to be an aspect were discarded.
Examples of such terms include thanks, ok and A+++. In the
end a maximum of 100 head terms were manually
clustered based on the inductive approach to analysing
qualitative data [52]. We first grouped head terms into
categories according to their conceptual meaning – some
head terms may belong to more than one category, and
some orphan words were discarded. We then combined
some categories with overlapping head terms into a
broader category, until some level of agreement was
reached between annotators. As a result of this manual
labelling process for the eBay and Amazon datasets,
seven clusters were finally obtained for the feedback
comments. A strength of CommTrust is that the relative
weights that users have placed on different dimensions in
their feedback comments can be inferred. However, it is
hard to elicit the weights from users when they write the
feedback comments. We therefore evaluate our dimension
weight prediction indirectly. To verify the effectiveness of
the dimension weights in the overall trust score, we
compute the unweighted overall trust scores for sellers,
and compare the ranking of sellers by unweighted overall
trust scores with the ground truth ranking by users.
Lexical-LDA was implemented based on the Mallet topic
modelling toolkit [53]. With aspect expressions in the form
of (modifier, head) pairs, the modifier term by head term
matrix formed the input for Lexical-LDA. In constructing
the cannot-link head term list for a head term ( c.f. Section
4.2), only head terms appearing together with the head
term in at least 0.1% of or three (whichever is larger)
comments were considered. The purpose was to remove
the otherwise many spurious cannot-link head terms. The
Lexical-LDA parameter settings were: prior pseudo counts
for topics and terms were set as α_k = 0.1 and β_t = 0.01 (see
Equation (5)), the number of topics was K = 4, 7, 10 for
evaluating the trust model, and the number of iterations was
set to 1000.
We evaluate Lexical-LDA against standard LDA for
clustering and against the human clustering result. As
there are seven categories by human clustering, K = 7 for
Lexical-LDA.
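The pre-processing and noise-filtering steps described in this section can be sketched as below. Only the two replacements named in the text (A+++ and thankx) are included, spelling correction and parsing are left out, and the function names and sample data are hypothetical. The support threshold follows the stated rule of at least 0.1% of comments or three comments, whichever is larger.

```python
from collections import Counter

# The two informal expressions named in the text; a real list would be longer.
INFORMAL = {"a+++": "AAA", "thankx": "thanks"}


def normalise(comment):
    """Replace known informal tokens in a whitespace-tokenised comment."""
    return " ".join(INFORMAL.get(tok.lower(), tok) for tok in comment.split())


def supported_heads(comment_heads, per_mille=1, min_comments=3):
    """Keep head terms supported by at least max(0.1% of comments, 3) comments.

    comment_heads: one list of extracted head terms per comment.
    """
    total = len(comment_heads)
    # Count each head term at most once per comment.
    support = Counter(h for heads in comment_heads for h in set(heads))
    return {
        h for h, c in support.items()
        # Integer form of c >= 0.001 * total, to avoid float rounding.
        if c >= min_comments and c * 1000 >= per_mille * total
    }


cleaned = normalise("A+++ seller thankx")
comment_heads = [
    ["shipping", "item"], ["shipping"], ["shipping", "seller"],
    ["item"], ["item", "item"],
]
kept = supported_heads(comment_heads)
```

For small datasets the three-comment floor dominates; the 0.1% rule only takes over once there are more than 3000 comments.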

6. CONCLUSION
With the rapid growth of e-commerce sites in today’s world, it has become difficult for users to trust sellers' products. The same product may be presented differently on different sites, depending on the sellers' opinions. In order to improve their reputation, site management needs to focus on certain factors. The most highly reputed products are ranked based on certain factors. In addition, feedback ratings are taken, in which positive and negative comments are evaluated for accuracy. In this paper we proposed a multi-dimensional trust profile for sellers, computed from the ratings in feedback comments. Amazon and eBay are among the fast-growing sites in this trend. A few dimensions are obtained by clustering the expressions, and NLP, opinion mining and summarisation techniques are involved.

7. REFERENCES
[1] P. Resnick, K. Kuwabara, R. Zeckhauser, and E. Friedman, “Reputation systems: Facilitating trust in internet interactions,” Commun. ACM, vol. 43, no. 12, pp. 45–48, 2000.
[2] P. Resnick and R. Zeckhauser, “Trust among strangers in internet transactions: Empirical analysis of eBay’s reputation system,” Econ. Internet E-Commerce, vol. 11, no. 11, pp. 127–157, Nov. 2002.
[3] J. O’Donovan, B. Smyth, V. Evrim, and D. McLeod, “Extracting and visualizing trust relationships from online auction feedback comments,” in Proc. IJCAI, San Francisco, CA, USA, 2007, pp. 2826–2831.
[4] M. De Marneffe, B. MacCartney, and C. Manning, “Generating typed dependency parses from phrase structure parses,” in Proc. LREC, vol. 6. 2006, pp. 449–454.
[5] M. De Marneffe and C. Manning, “The Stanford typed dependencies representation,” in Proc. CrossParser, Stroudsburg, PA, USA, 2008.
[6] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Found. Trends Inf. Ret., vol. 2, no. 1–2, pp. 1–135, Jan. 2008.
[7] B. Liu, Sentiment Analysis and Opinion Mining. San Rafael, CA, USA: Morgan & Claypool Publishers, 2012.
[8] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Jan. 2003.
[9] T. Hofmann, “Probabilistic latent semantic indexing,” in Proc. 22nd ACM SIGIR, New York, NY, USA, 1999, pp. 50–57.
[10] Y. Lu, C. Zhai, and N. Sundaresan, “Rated aspect summarization of short comments,” in Proc. 18th Int. Conf. WWW, New York, NY, USA, 2009.
[11] H. Wang, Y. Lu, and C. Zhai, “Latent aspect rating analysis without aspect keyword supervision,” in Proc. 17th ACM SIGKDD Int. Conf. KDD, San Diego, CA, USA, 2011, pp. 618–626.
[12] H. Wang, Y. Lu, and C. Zhai, “Latent aspect rating analysis on review text data: A rating regression approach,” in Proc. 16th ACM SIGKDD Int. Conf. KDD, New York, NY, USA, 2010, pp. 783–792.
[13] S. Ramchurn, D. Huynh, and N. Jennings, “Trust in multi-agent systems,” Knowl. Eng. Rev., vol. 19, no. 1, pp. 1–25, 2004.
[14] B. Yu and M. P. Singh, “Distributed reputation management for electronic commerce,” Comput. Intell., vol. 18, no. 4, pp. 535–549, Nov. 2002.
[15] M. Schillo, P. Funk, and M. Rovatsos, “Using trust for detecting deceptive agents in artificial societies,” Appl. Artif. Intell., vol. 14, no. 8, pp. 825–848, 2000.


BIOGRAPHIES
B. Bhanu Vidya Kiran received the M.Tech degree in computer science and systems engineering from Andhra University, Visakhapatnam (2013-2015). His research mainly focuses on network security and system security, with particular interest in security issues in RFID and NFC security communication protocols. Topics include: mutual authentication protocols, secure ownership transfer protocols, polymorphic worms, tracing mobile attackers.
M. Sampath Kumar is working as Associate Professor in the Department of Computer Science & Systems Engineering at Andhra University, Visakhapatnam. His research mainly focuses on network security and system security, with particular interest in security issues in RFID and NFC security communication protocols. Topics include: mutual authentication protocols, secure ownership transfer protocols, cryptography, network security, interconnection networks, design and analysis of algorithms, and digital home.
