SENTIMENT ANALYSIS

For the last few days I have been learning and implementing text analysis and sentiment analysis. I used a few reviews of specific firms from Indeed and restaurant reviews from different sources, all saved as .txt files. I mainly used R for the analysis; Semantria for Excel can also be used. The main problem I faced was dealing with neutral reviews. The sentiment package for R has been removed from CRAN, so I had to find a different one. I then found the syuzhet package, which came in very handy. I am sharing my code, which can be used for sentiment and text analysis of any .txt file.

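# Packages used below (install once with install.packages() if missing)

library(tm)

library(SnowballC)

library(wordcloud)

library(RColorBrewer)

library(syuzhet)

library(ggplot2)

# Path to the .txt file of reviews

a="indel.txt"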

a1=readLines(a)

length(a1)

head(a1)

tail(a1)

d=VectorSource(a1)

d1=Corpus(d)

summary(d1)

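# Clean the text: lower-case, drop punctuation, numbers and stop words, stem, tidy spaces

d1=tm_map(d1, content_transformer(tolower))

d1=tm_map(d1, removePunctuation)

d1=tm_map(d1, removeNumbers)

d1=tm_map(d1, removeWords, stopwords("english"))

d1=tm_map(d1, stemDocument)

d1=tm_map(d1, stripWhitespace)

# Keep terms of any length (the tm default drops terms shorter than 3 characters)

t=TermDocumentMatrix(d1, control=list(wordLengths=c(1,Inf)))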

t

inspect(t[1:50,1:50])

t1= DocumentTermMatrix(d1)

inspect(t1[1:10,1:10])

inspect(t1[1:50,1:50])

mat=as.matrix(t1)

# t1 has documents in rows and terms in columns, so term totals are column sums

v=sort(colSums(mat),decreasing = TRUE)
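The term totals computed at this step are never printed, so here is a minimal sketch of how they could be turned into a small frequency table. It uses a made-up matrix in place of the real one; the names toy, freq and top_terms and all the counts are assumptions for illustration only.

```r
# Toy document-term matrix: rows are documents, columns are terms (made-up counts)
toy <- matrix(c(2, 0, 1,
                1, 3, 0,
                0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("doc1", "doc2", "doc3"),
                              c("work", "manag", "pay")))

# Column sums give the total count of each term across all documents
freq <- sort(colSums(toy), decreasing = TRUE)

# Arrange the totals as a data frame, most frequent term first
top_terms <- data.frame(term = names(freq), count = as.vector(freq))

print(top_terms)
```

With the real data, the same two steps could be applied to the matrix built from the document-term matrix above.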



findFreqTerms(t, 50)

findAssocs(t, "work", 0.5)

# sparse = 0.1 keeps only terms that occur in more than 90% of the documents

t2= removeSparseTerms(t1, 0.1)

dim(t2)

t3=as.matrix(t2)

t3

wordcloud(d1, min.freq = 8,

     max.words=1500, rot.per=0.35, 

     colors=brewer.pal(8, "Dark2"), scale = c(3,0.5))


senti=get_nrc_sentiment(a1)

txt=cbind(a1,senti)

totsenti=data.frame(colSums(txt[,c(2:11)]))

names(totsenti)="count"

totsenti=cbind("sentiment"=rownames(totsenti),totsenti)

rownames(totsenti)=NULL


ggplot(data = totsenti,aes(x=sentiment,y=count))+geom_bar(aes(fill=sentiment),stat="identity")+

 theme(legend.position = "none")+xlab("sentiment")+ylab("total count")+ggtitle("total sentiment score")

The get_nrc_sentiment function implements Saif Mohammad’s NRC Emotion Lexicon. According to Mohammad, “the NRC emotion lexicon is a list of words and their associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive)” (see http://www.purl.org/net/NRCemotionlexicon). The function returns a data frame in which each row corresponds to one element of the input vector, here one line of the original file. The columns include one for each emotion type as well as the positive and negative sentiment valence.
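To make the shape of that data frame concrete, here is a small sketch on two made-up review lines. It also shows one possible way to flag neutral reviews by comparing the positive and negative columns. This labelling rule is an assumption for illustration, not something the syuzhet package provides, and the names reviews, nrc and label are hypothetical.

```r
library(syuzhet)

# Two made-up review lines, used only for illustration
reviews <- c("I love the friendly team and great pay",
             "The office is on the second floor")

nrc <- get_nrc_sentiment(reviews)

# One row per input element; ten columns: eight emotions plus negative and positive
print(dim(nrc))
print(names(nrc))

# Hypothetical rule: compare the positive and negative counts for each review,
# and call the review neutral when the two counts are equal
label <- ifelse(nrc$positive > nrc$negative, "positive",
         ifelse(nrc$positive < nrc$negative, "negative", "neutral"))

print(data.frame(review = reviews, label = label))
```

A tie between the two counts, including the case where both are zero, is labelled neutral here. A stricter or looser threshold may suit other data better.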


More articles by Tirthankar Goon
