SENTIMENT ANALYSIS
For the last few days I have been learning and implementing text analysis and sentiment analysis. I collected a few reviews of specific firms from Indeed and restaurant reviews from different sources, saved them all as .txt files, and started to analyse them. I mainly used R for the analysis; Semantria in Excel can also be used. The main problem I faced was how to deal with neutral reviews. The sentiment package for R has been removed, so I had to find a different one. I then found the syuzhet package, which came in very handy. I am sharing my code, which can be used for sentiment and text analysis of any .txt file.
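library(tm)            # corpus handling and term matrices
library(SnowballC)     # stemmer used by stemDocument
library(wordcloud)     # word cloud plot
library(RColorBrewer)  # brewer.pal colour palettes
library(syuzhet)       # get_nrc_sentiment
library(ggplot2)       # bar chart of sentiment counts
a="indel.txt"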
a1=readLines(a)
length(a1)
head(a1)
tail(a1)
d=VectorSource(a1)
d1=Corpus(d)
summary(d1)
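# Word-length bounds; pass as control=list(wordLengths=wordLengths) when building a term matrix to keep short words
wordLengths=c(0,Inf)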
d1=tm_map(d1, content_transformer(tolower))
d1=tm_map(d1, removePunctuation)
d1=tm_map(d1, removeNumbers)
d1=tm_map(d1, removeWords, stopwords("english"))
d1=tm_map(d1, stemDocument)
d1=tm_map(d1, stripWhitespace)
t=TermDocumentMatrix(d1)
t
inspect(t[1:50,1:50])
t1= DocumentTermMatrix(d1)
inspect(t1[1:10,1:10])
inspect(t1[1:50,1:50])
mat=as.matrix(t1)
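# mat comes from a DocumentTermMatrix (rows = documents, columns = terms), so term totals are column sums
v=sort(colSums(mat),decreasing = TRUE)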
findFreqTerms(t, 50)
findAssocs(t, "work", 0.5)
# sparse = 0.1 is strict: only terms that appear in more than 90% of the documents are kept
t2= removeSparseTerms(t1, 0.1)
dim(t2)
t3=as.matrix(t2)
t3
wordcloud(d1, min.freq = 8,
max.words=1500, rot.per=0.35,
colors=brewer.pal(8, "Dark2"), scale = c(3,0.5))
senti=get_nrc_sentiment(a1)
txt=cbind(a1,senti)
totsenti=data.frame(colSums(txt[,c(2:11)]))
names(totsenti)="count"
totsenti=cbind("sentiment"=rownames(totsenti),totsenti)
rownames(totsenti)=NULL
ggplot(data = totsenti,aes(x=sentiment,y=count))+geom_bar(aes(fill=sentiment),stat="identity")+
theme(legend.position = "none")+xlab("sentiment")+ylab("tot count")+ggtitle("tot senti score")
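One thing that is easy to mix up in the script above is the orientation of the two matrices: a TermDocumentMatrix has terms as rows, while a DocumentTermMatrix has documents as rows. Here is a small base-R sketch with a made-up matrix (the names dtm_toy, tdm_toy and the counts are only for illustration, they are not produced by tm) showing which sums give the term frequencies in each case:

```r
# Toy document-term matrix: rows are documents, columns are terms.
# The counts are invented for illustration only.
dtm_toy <- matrix(c(2, 0, 1,
                    1, 4, 0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("doc1", "doc2"),
                                  c("work", "pay", "team")))

# Term frequencies from a document-term layout: sum over columns.
term_freq <- sort(colSums(dtm_toy), decreasing = TRUE)
print(term_freq)

# The transposed (term-document) layout needs row sums instead.
tdm_toy <- t(dtm_toy)
term_freq_tdm <- sort(rowSums(tdm_toy), decreasing = TRUE)
print(term_freq_tdm)
```

Both calls should give the same ranking of terms; only the direction of the sum changes with the layout.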
The get_nrc_sentiment function implements Saif Mohammad’s NRC Emotion lexicon. According to Mohammad, “the NRC emotion lexicon is a list of words and their associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive)” (see http://www.purl.org/net/NRCemotionlexicon). The function returns a data frame in which each row corresponds to one element of the input vector (here, one line of the .txt file). The columns include one for each emotion type as well as the positive and negative sentiment valence.
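Since neutral reviews were my main problem, one possible way to handle them is to compare the positive and negative columns row by row and call a review neutral when the two balance out. This is only a sketch under my own assumptions: senti_toy is a hand-made stand-in with the same positive and negative column names that get_nrc_sentiment returns (the real output also has the eight emotion columns), and label_valence and its margin argument are hypothetical helpers, not part of syuzhet.

```r
# Toy stand-in for the output of get_nrc_sentiment(); the values are made up.
# Only the two valence columns are included here.
senti_toy <- data.frame(
  negative = c(0, 3, 1),
  positive = c(2, 0, 1)
)

# Hypothetical helper: label each row by net valence (positive - negative).
# Rows whose net score lies within +/- margin are treated as neutral.
label_valence <- function(senti, margin = 0) {
  net <- senti$positive - senti$negative
  ifelse(net > margin, "positive",
         ifelse(net < -margin, "negative", "neutral"))
}

labels <- label_valence(senti_toy)
print(table(labels))
```

With real data the same helper could be applied to the senti data frame from the script. A larger margin makes the neutral band wider, which is a judgment call that depends on the reviews.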