Sentiment Analysis in R: coding problem when switching raw data

I am trying to run code to complete a sentiment analysis. My main goal is to make a word cloud and possibly a sentiment score. I am reusing a script and swapping in my own raw data, but I don't really understand what the outputs mean, and I get stuck trying to create a TDM (term-document matrix). I am new to programming, so apologies if this is not clear.
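The script's outputs make more sense once each step is explicit, so here is a small base-R sketch of the two steps the question mentions. The documents, the stop-word list and the tiny positive/negative lexicon are all hypothetical. The term-document matrix has one row per word and one column per document, with counts in the cells. Its row sums are the word frequencies a word cloud draws, and subtracting negative-word counts from positive-word counts gives a crude sentiment score per document. The tm package's `TermDocumentMatrix(Corpus(VectorSource(docs)))` builds the same structure as a sparse object, and tidytext's `get_sentiments("bing")` provides a real lexicon in place of the toy one.

```r
# Hypothetical example documents; swap in your own character vector
docs <- c(doc1 = "I love this product, it is great",
          doc2 = "Terrible service and a bad experience",
          doc3 = "Great value, love it")

# Tokenise: lower-case, replace punctuation with spaces, split on whitespace
tokens <- lapply(docs, function(d) {
  d <- gsub("[[:punct:]]", " ", tolower(d))
  w <- strsplit(d, "[[:space:]]+")[[1]]
  w[nzchar(w)]
})

# Drop a small, hypothetical stop-word list
stop_words <- c("i", "this", "it", "is", "and", "a")
tokens <- lapply(tokens, function(w) w[!w %in% stop_words])

# Term-document matrix: one row per term, one column per document
vocab <- sort(unique(unlist(tokens)))
tdm <- sapply(tokens, function(w) table(factor(w, levels = vocab)))

# Total frequency of each term: this is what a word cloud plots
freq <- sort(rowSums(tdm), decreasing = TRUE)

# Toy lexicon (hypothetical); score = positive hits minus negative hits
pos_words <- c("love", "great", "good")
neg_words <- c("terrible", "bad")
score <- colSums(tdm[rownames(tdm) %in% pos_words, , drop = FALSE]) -
         colSums(tdm[rownames(tdm) %in% neg_words, , drop = FALSE])

print(freq)
print(score)
```

With the wordcloud package installed, `wordcloud::wordcloud(names(freq), freq)` should draw the cloud from these frequencies.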

Related

Error message in R: "data set 'X' not found" when trying to do topic modeling, although I have already used that data for other techniques

I am doing a lyrical analysis of Paramore's discography using data from the Genius API. I have done most of my analysis after data wrangling: I was able to create word clouds and bar charts based on sentiment analysis for each album. Now I am trying to fit a topic model for one of these albums (Riot). To do this, the data has to be a document-term matrix.
At the very first step, an error message comes up when I try to start the topic model:
data(riottoken)
Error message: "data set 'riottoken' not found"
This happens even though I have already used 'riottoken' for the word clouds and sentiment analysis.
I tried to turn 'riottoken' (my data) into a corpus and a document-term matrix using different code and failed at this too. I will leave two examples below. Any help would be greatly appreciated.
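`data()` only looks up datasets that ship with installed packages, so it cannot find an object you created yourself. If `riottoken` exists in your session you can use it directly; if it does not (for example in a fresh session or a knitted R Markdown document), the code that creates it has to be re-run first. The sketch below assumes, hypothetically, that `riottoken` is a tidy data frame with one row per word and a `song` column identifying the document, and counts words per song into a plain document-term matrix with base R. If the tidytext and topicmodels packages are in use, tidytext's `cast_dtm()` turns a per-document word-count table into the `DocumentTermMatrix` that `topicmodels::LDA()` expects.

```r
# Hypothetical stand-in for 'riottoken': one row per word token, tagged by song
riottoken <- data.frame(
  song = c("s1", "s1", "s1", "s2", "s2"),
  word = c("love", "riot", "love", "riot", "fire"),
  stringsAsFactors = FALSE
)

# data() searches package datasets only; an object you created yourself is
# already in the workspace, so check that it exists and use it directly
stopifnot(exists("riottoken"))

# Count each word per song and store the counts as a document-term matrix:
# rows = documents (songs), columns = terms (words)
tab <- table(riottoken$song, riottoken$word)
dtm <- matrix(as.integer(tab), nrow = nrow(tab),
              dimnames = list(rownames(tab), colnames(tab)))
print(dtm)
```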

RStudio: computing a crosstab and creating a table

I am new to programming and R. My R experience so far consists of Udemy's Beginner and Intermediate R courses.
My data analysis background is mostly Excel and SPSS, so I am trying to carry those skills over and find equivalent analysis strategies in R.
I am trying to compute a crosstab that outputs the frequencies for the sets of 'character' data I am analyzing.
Below is the code I used to create a crosstab:
crosstab(Survey, row.vars = c("testcode","outcome"), col.vars = "svy1", type = "j")
I can see the output in the Console, but I cannot put it into its own matrix-like table in the Environment; the purpose is to create matrix-like tables for reporting. I am sure there is an easy fix I am overlooking, but any help is appreciated.
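The `crosstab()` in the question is not base R, so what it returns is uncertain; in general, output appears only in the Console when a result is printed rather than assigned, and `ct <- crosstab(...)` would keep it as an object. As a base-R alternative, this sketch builds the same kind of table with `table()` and `prop.table()` and converts it to data frames, which do show up in the Environment. It assumes that `type = "j"` means joint percentages (each cell as a share of the grand total); that reading is an assumption. The `Survey` data below is made up.

```r
# Hypothetical stand-in for the 'Survey' data frame
Survey <- data.frame(
  testcode = c("A", "A", "B", "B", "B"),
  outcome  = c("pass", "fail", "pass", "pass", "fail"),
  svy1     = c("yes", "no", "yes", "no", "yes")
)

# Rows = testcode/outcome combinations, columns = svy1 answers
counts <- table(paste(Survey$testcode, Survey$outcome, sep = "_"), Survey$svy1)

# Joint percentages: each cell as a share of the grand total
joint_pct <- round(100 * prop.table(counts), 1)

# Assigning with <- is what makes the result appear in the Environment;
# as.data.frame.matrix() keeps the wide, crosstab-like layout
counts_df <- as.data.frame.matrix(counts)
joint_df  <- as.data.frame.matrix(joint_pct)
print(joint_df)
```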

How to run a dynamic linear regression in R?

I am new to using R, as I usually use Stata. I want to estimate a state-space model on some time series data with time-varying coefficients. From what I have gathered, this is not possible in Stata.
I have installed the dlm package in R and I am trying to use the dlmModReg function to regress my dependent variable on a single explanatory variable. I would like to allow both the intercept and the beta coefficient to vary over time.
If anyone could show me an example of the code I want to run I think that would be enough for me to work out how to do this. The examples I have found online are vague or use terminology that I am not familiar with as a new R user. Any help or comments are greatly appreciated.
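To see what `dlmModReg()` sets up, it can help to write the model out by hand. With an intercept, it is the regression y[t] = a[t] + b[t] * x[t] + v[t], where the intercept a[t] and the slope b[t] each follow a random walk. The sketch below runs a Kalman filter for that model in base R so that every step is visible; `tvp_filter()` is a hypothetical helper, the observation variance `V` and state variances `W` are assumed known, and the data are simulated. With dlm itself, I believe the equivalent is `mod <- dlmModReg(x, addInt = TRUE, dV = 0.01, dW = c(1e-4, 1e-3))` followed by `dlmFilter(y, mod)$m` (whose first row is the prior) or `dlmSmooth(y, mod)`, and in practice the variances would be estimated with `dlmMLE()` rather than fixed.

```r
# Kalman filter for y[i] = a[i] + b[i] * x[i] + v[i],  v[i] ~ N(0, V),
# where (a[i], b[i]) follow random walks with covariance matrix W.
tvp_filter <- function(y, x, V, W, m0 = c(0, 0), C0 = diag(1e6, 2)) {
  n <- length(y)
  est <- matrix(NA_real_, n, 2, dimnames = list(NULL, c("intercept", "beta")))
  m <- m0
  C <- C0
  for (i in seq_len(n)) {
    Ft <- c(1, x[i])                     # regressors at time i
    P  <- C + W                          # prior state covariance (random walk)
    f  <- sum(Ft * m)                    # one-step-ahead forecast of y[i]
    Q  <- drop(t(Ft) %*% P %*% Ft) + V   # forecast variance
    K  <- drop(P %*% Ft) / Q             # Kalman gain
    m  <- m + K * (y[i] - f)             # filtered state mean
    C  <- P - tcrossprod(K) * Q          # filtered state covariance
    est[i, ] <- m
  }
  est
}

# Simulated data (hypothetical): intercept fixed at 1, slope drifting 0.5 to 2
set.seed(1)
n <- 200
x <- rnorm(n)
beta_true <- seq(0.5, 2, length.out = n)
y <- 1 + beta_true * x + rnorm(n, sd = 0.1)

est <- tvp_filter(y, x, V = 0.01, W = diag(c(1e-4, 1e-3)))
tail(est)
```

Larger entries in `W` let a coefficient move faster; setting one to zero makes that coefficient constant.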

How to export multivariate forecast results from R to Excel

I'm very new to R, so I apologize if there's a way to do this with a slight variation of existing code or a package.
I've created yearly forecasts of a variable (student enrollment) for 129 countries using predict(), and then I bound them together. I've done this because I'm forecasting using a multivariate regression.
Here's what I'm doing, in case it helps:
```r
# Log-linear model of enrollment (y) on the explanatory variables
fm1 = lm(log(y + 1) ~ Var.Ind)
XNew = data.frame(Var.Ind)
# Residual standard error, used for the log-normal back-transformation
rse = summary(fm1)$sigma
yhat1 = exp(predict(fm1, XNew) + rse * rse / 2) - 1
# Rows for 2014 provide the predictor values for the 2015 forecast
pos2014 = which(Var.Ind[, 1] == 2014)
Var.Ind.2015 = model.matrix(~ as.matrix(Imp.Data4[pos2014, -2]) - 1)
head(Var.Ind.2015)
Var.Ind.2015 = data.frame(Var.Ind.2015)
Var.Ind.2015.Ord = as.data.frame(Var.Ind.2015[order(Var.Ind.2015[, 3], Var.Ind.2015[, 1]), ])
head(Var.Ind.2015.Ord)
X.New.New = data.frame(cbind(model.matrix(~ as.matrix(Var.Ind.2015.Ord))))
head(X.New.New)
ColNames.N = ColNames[-2]
colnames(X.New.New) = c("Int", ColNames.N, "Lag1", "Lag2")
head(X.New.New)
Beta.Coef = matrix(as.numeric(fm1$coefficients), ncol = 1)
Beta.Coef
# Forecast = exp(X %*% beta + rse^2 / 2) - 1, paired with the country column
Pred2015 = as.data.frame(cbind(X.New.New[, 3], exp(as.matrix(X.New.New) %*% Beta.Coef + rse * rse / 2) - 1))
dim(Pred2015)
colnames(Pred2015) = c("country", "Yhat")
# ...and so on for subsequent years until 2030
cbind(Pred2015, Pred2016, Pred2017, Pred2018, Pred2019)
```
I need to figure out if there is a way to make sense of these results:
a) how to export the forecast results to Excel
b) alternatively, whether I could put these results into a table in R.
Also, these results do not appear in the Global Environment, only in the Console output, which is why I am not asking how to export data in general, but rather these specific results.
As mentioned, my coding knowledge is limited to one week of experience with R (I usually work with Stata).
Any help would be greatly appreciated!
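Results only appear in the Global Environment when they are assigned to a name; a bare `cbind(...)` call is printed and then discarded. A minimal sketch on made-up data (`train`, the predictor `x`, the three countries and the future `x` values are all hypothetical) that keeps each year's predictions in a list, stacks them into one long table, reshapes it to one column per year, and writes a CSV file that Excel can open. It uses the same back-transformation as the question, `exp(prediction + rse^2 / 2) - 1`. For a native .xlsx file, a package such as writexl (`write_xlsx()`) could replace `write.csv()`.

```r
# Hypothetical stand-in data: 3 countries, 5 observations each, one predictor x
set.seed(42)
train <- data.frame(country = rep(c("A", "B", "C"), each = 5),
                    x = runif(15, 1, 10))
train$y <- exp(0.3 * train$x + rnorm(15, sd = 0.1)) - 1

fm1 <- lm(log(y + 1) ~ x, data = train)
rse <- summary(fm1)$sigma

# Keep every year's predictions in a list instead of only printing them
preds <- list()
for (yr in 2015:2019) {
  # Hypothetical future values of the predictor for each country
  newdata <- data.frame(country = c("A", "B", "C"),
                        x = c(2, 5, 8) + 0.1 * (yr - 2015))
  yhat <- exp(predict(fm1, newdata) + rse^2 / 2) - 1
  preds[[as.character(yr)]] <- data.frame(country = newdata$country,
                                          year = yr, Yhat = unname(yhat))
}

# Long table (one row per country-year), then wide (one column per year)
forecasts <- do.call(rbind, preds)
rownames(forecasts) <- NULL
wide <- reshape(forecasts, idvar = "country", timevar = "year",
                direction = "wide")

# CSV files open directly in Excel
out_file <- file.path(tempdir(), "forecasts.csv")
write.csv(wide, out_file, row.names = FALSE)
print(wide)
```

Stacking with `rbind` avoids the repeated `country` columns that `cbind` of per-year tables produces, and `View(wide)` shows the result as a table inside RStudio.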

How to use LSA for dimension reduction in text analytics with R

I am a beginner at data science, and I am working on a text analytics/sentiment analysis project with tweets.
What I have been trying to do is perform dimension reduction on my training set of tweets, feed the reduced training set into a NaiveBayes learner, and use the trained model to predict the sentiment of the tweets in the test set.
I have been following the steps in this article:
http://www.analyticskhoj.com/data-mining/text-analytics-part-iv-cluster-analysis-on-terms-and-documents-using-r/
Their explanation is too brief for a beginner like me.
I have used lsa() to create an object that RStudio labels "Large LSAspace (3 elements)". Following their example, I've created 3 more data frames:
lsa.train.tk = as.data.frame(lsa.train$tk)
lsa.train.dk = as.data.frame(lsa.train$dk)
lsa.train.sk = as.data.frame(lsa.train$sk)
When I view the lsa.train.tk data, it looks like this (lsa.train.dk looks very similar):
and my lsa.train.sk looks like this:
My question is: how do I interpret this information?
How can I use it to create something I can feed into my NaiveBayes learner? I tried using only lsa.train.sk for the NaiveBayes learner, but I cannot think of a good explanation that justifies what I tried. Any help would be much appreciated!
EDIT:
What I've done so far:
Converted everything into a term-document matrix
Passed the matrix into the NaiveBayes learner
Predicted using the trained model
My problems are:
Accuracy is only 50%, and I realized that it labels everything as positive sentiment (so I would have gotten 0% accuracy if my test set contained only negative tweets).
The current code is not scalable. Since it uses large matrices, I can only handle up to 3.5k rows of data; with more than that, my computer crashes. So I wanted to do dimension reduction so that I can handle more data (such as 10k or 100k rows of tweets).
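In an LSA space the three pieces play different roles: `tk` has one row per term (term coordinates in the k latent dimensions), `dk` has one row per document (document coordinates), and `sk` holds the k singular values, i.e. how much each dimension matters. Because `sk` is just k numbers for the whole corpus, it cannot describe individual tweets; the per-tweet features for a classifier are the rows of `dk`. Test tweets should be projected ("folded in") with the training `tk` and `sk`, using d = t(q) %*% tk %*% diag(1 / sk), rather than given their own decomposition. The sketch below shows this with base R's `svd()` on a made-up 5-term, 4-tweet matrix; `fold_in_docs()` is a hypothetical helper, not a function from the lsa package.

```r
# Tiny hypothetical term-document matrix: rows = terms, columns = tweets
tdm <- matrix(c(2, 1, 0, 0,
                1, 2, 0, 0,
                0, 0, 1, 2,
                0, 1, 2, 1,
                1, 0, 0, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("happy", "love", "sad", "hate", "day"),
                              paste0("tweet", 1:4)))

k <- 2                      # number of LSA dimensions to keep
s <- svd(tdm)
tk <- s$u[, 1:k]            # term vectors (terms x k)
dk <- s$v[, 1:k]            # document vectors (documents x k)
sk <- s$d[1:k]              # singular values: weight of each dimension

# Project documents (columns over the same terms) into the k-dim space
fold_in_docs <- function(new_tdm, tk, sk) {
  t(new_tdm) %*% tk %*% diag(1 / sk, nrow = length(sk))
}

# Training features: one row per tweet, k numeric columns
train_features <- as.data.frame(dk, row.names = colnames(tdm))

# A new tweet containing "sad" and "day" once each, folded into the same space
new_tweet <- matrix(c(0, 0, 1, 0, 1), ncol = 1,
                    dimnames = list(rownames(tdm), "new"))
test_features <- as.data.frame(fold_in_docs(new_tweet, tk, sk))
print(train_features)
print(test_features)
```

Folding the training tweets back in reproduces `dk` exactly, which is a quick check that train and test features live in the same space. These data frames could then go to a classifier such as `e1071::naiveBayes(train_features, labels)`, which fits a Gaussian per class for numeric features. For 10k to 100k tweets, keeping the term matrix sparse (Matrix package) and computing only the first k singular vectors, for example with `irlba::irlba(A, nv = k)`, should avoid building the full dense decomposition.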
