Sentiment Analysis Of A Dataset With Multiple NewsPaper Articles - r

I'm trying to call get_nrc_sentiment in R but getting the following error:
Error in get_nrc_sentiment(Test) : Data must be a character vector.
Can anyone see what I'm doing wrong?
library("RDSTK")
library("readr")
library("qdap")
library("syuzhet")
library("ggplot2")
library(readxl)
Test <- read_excel("Test.xlsx")
View(Test)
scores = get_nrc_sentiment(Test) # throws the error

read_excel() returns a data frame (a tibble), not a character vector, even when Test.xlsx has only one column. Passing that data frame to get_nrc_sentiment() causes the error. You can confirm this with class(Test), which shows what kind of R object Test is. Pass the column that holds the article text instead, as a character vector.
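As a minimal sketch of that check, the snippet below builds a small stand-in for the object read from Test.xlsx and extracts one column as a character vector. The column name `text` and the two sample sentences are assumptions for illustration; the real file's column names are not shown in the question. The call to get_nrc_sentiment() is left commented out so the snippet runs without syuzhet installed.

```r
# Hypothetical stand-in for the result of read_excel("Test.xlsx").
# The column name `text` and its contents are assumptions.
Test <- data.frame(
  text = c("Markets rallied today.", "Storm damages homes."),
  stringsAsFactors = FALSE
)

# A data frame is not a character vector, which is what the function expects.
print(class(Test))
print(is.character(Test))

# Extract the text column as a plain character vector.
articles <- as.character(Test$text)
print(is.character(articles))

# With syuzhet installed, the vector can then be scored:
# scores <- syuzhet::get_nrc_sentiment(articles)
```

If the spreadsheet has several text columns, each one would need to be extracted and scored separately in the same way.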

Related

Creating a comparative object in R from two dataframes for comparative phylogenetics

I'm trying to read two dataframes into a comparative object so I can analyse them using pgls.
I'm not sure what the error being returned means, and how to go about getting rid of it.
My code:
library(ape)
library(geiger)
library(caper)
taxatree <- read.nexus("taxonomyforzeldospecies.nex")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")
LWEVIYRcombodataPGLS <-data.frame(LWEVIYRcombodata$Sum.of.percentage,OGT=LWEVIYRcombodata$OGT, Species=LWEVIYRcombodata$Species)
comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, "Species")
Returns error:
> comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, 'Species')
Error in if (tabulate(phy$edge[, 1])[ntips + 1] > 2) FALSE else TRUE :
missing value where TRUE/FALSE needed
This might come from your data set and your phylogeny having some discrepancies that comparative.data struggles to handle (by the look of the error message).
You can try cleaning both the data set and the tree using dispRity::clean.data:
library(dispRity)
## Reading the data
taxatree <- read.nexus("taxonomyforzeldospecies.nex")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")
LWEVIYRcombodataPGLS <- data.frame(LWEVIYRcombodata$Sum.of.percentage,OGT=LWEVIYRcombodata$OGT, Species=LWEVIYRcombodata$Species)
## Cleaning the data
cleaned_data <- clean.data(LWEVIYRcombodataPGLS, taxatree)
## Preparing the comparative data object
comp.dat <- comparative.data(cleaned_data$tree, cleaned_data$data, "Species")
However, as @MrFlick suggests, it's hard to know if that solves the problem without a reproducible example.
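One way to look for the kind of discrepancy described above, before cleaning anything, is to compare the tree's tip labels with the species names in the data. The sketch below uses two made-up character vectors as stand-ins; in a real session the tip labels would come from taxatree$tip.label and the species names from the Species column. All names in the vectors are hypothetical.

```r
# Hypothetical stand-ins: in practice use taxatree$tip.label and
# LWEVIYRcombodataPGLS$Species instead of these made-up names.
tree_tips <- c("Thermus_aquaticus", "Escherichia_coli", "Bacillus_subtilis")
data_species <- c("Thermus aquaticus", "Escherichia_coli",
                  "Bacillus_subtilis", "Sulfolobus_solfataricus")

# Names present in the tree but missing from the data.
only_in_tree <- setdiff(tree_tips, data_species)

# Names present in the data but missing from the tree.
only_in_data <- setdiff(data_species, tree_tips)

print(only_in_tree)
print(only_in_data)
```

Here the underscore/space mismatch in the first name makes it show up in both lists, which is a common reason for a tree and a data set failing to line up.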
The error here is that I was using a nexus file. Although ?comparative.data does not specify which phylo objects it accepts, newick trees seem to work fine, whereas nexus files do not.

R NaiveBayes issue with numeric variables

Even though the NaiveBayes() help says that numeric variables can be passed in the first parameter 'x', I am not able to run it successfully. Without the numeric variable (resale) it works fine. Here is the script:
library(readr)
library(klaR)
### load dataset
Dataset <- read_csv("D:/sampledata.csv")
### converting 'model' and 'type' to factor
Dataset$model <- factor(Dataset$model)
Dataset$type <- factor(Dataset$type)
### Executing NaiveBayes with numeric 'resale'
NaiveBayesModel1 <- NaiveBayes(model~type+mylogical+resale,data=Dataset,na.action =na.omit)
### now removing resale. Following works as expected.
NaiveBayesModel1 <- NaiveBayes(model~type+mylogical,data=Dataset,na.action =na.omit)
'model' and 'type' are factors,
'mylogical' is a logical and
'resale' is a numeric variable.
Since I cannot attach my data file, I am pasting a few rows here. Copy these rows and save them as a sampledata.csv file on your drive. Modify read_csv() in the above script to point to this csv file.
"model","sales","resale","type","mylogical"
"Integra",16.919,16.36,"Automobile",TRUE
"TL",39.384,19.875,"Automobile",FALSE
"Camry",247.994,13.245,"Automobile",FALSE
"Avalon",63.849,18.14,"Automobile",TRUE
"Celica",33.269,15.445,"Automobile",TRUE
"Tacoma",84.087,9.575,"Truck",TRUE
"RAV4",25.106,13.325,"Truck",FALSE
"4Runner",68.411,19.425,"Truck",FALSE
"Land Cruiser",9.835,34.08,"Truck",TRUE
"Golf",9.761,11.425,"Automobile",FALSE
"Jetta",83.721,13.24,"Automobile",FALSE
"Passat",51.102,16.725,"Automobile",TRUE
"Cabrio",9.569,16.575,"Automobile",FALSE
"GTI",5.596,13.76,"Automobile",FALSE
I get the following error if I run NaiveBayes with "resale".
Error in if (any(temp)) stop("Zero variances for at least one class in variables: ", :
missing value where TRUE/FALSE needed
R help ( help(NaiveBayes) ) says I can use numeric. I don't understand what is wrong. Please help.
Regards,
SG
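The error comes from the check on the variance of resale within each class of model. Most likely your training set contains a single record for each distinct value of model, as the sample rows do. With only one observation per class, the standard deviation cannot be computed and comes out as NA, so the zero-variance test receives NA instead of TRUE or FALSE, which produces the "missing value where TRUE/FALSE needed" message.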
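The following base R sketch illustrates this using four of the sample rows. It only mimics the kind of per-class check that the error message suggests; it is not klaR's actual code, and the variable names `rows_per_class`, `sd_per_class`, and `zero_variance` are made up for the illustration.

```r
# Four rows shaped like the sample CSV: every `model` appears exactly once.
Dataset <- data.frame(
  model  = factor(c("Integra", "TL", "Camry", "Tacoma")),
  resale = c(16.36, 19.875, 13.245, 9.575)
)

# Number of training rows available for each class of the outcome.
rows_per_class <- table(Dataset$model)
print(rows_per_class)

# Per-class standard deviation of the numeric predictor.
# sd() of a single value is NA, so every entry here is NA.
sd_per_class <- tapply(Dataset$resale, Dataset$model, sd)
print(sd_per_class)

# A zero-variance test on NA values yields NA rather than TRUE/FALSE,
# and if (NA) fails with "missing value where TRUE/FALSE needed".
zero_variance <- any(sd_per_class == 0)
print(zero_variance)
```

A numeric predictor needs at least two observations per class with differing values before a per-class variance is defined and non-zero.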

eqmcc function in R QCA package exiting with error

When I attempt to call eqmcc() against a truthTable object, the result is this error message:
Error: The outcome's length should be the same as the number of rows in the data.
Here's my script:
library(QCA); library (psych); library(readr)
gamson <- read_csv("/path/to/Gamson.csv", col_names = TRUE)
is.na(gamson)
ttACP2 <- truthTable(data=gamson, outcome = "ACP", conditions = "BUR, LOW, DIS, HLP", n.cut=3, incl.cut=0.750, sort.by="incl, n", complete=FALSE, show.cases=TRUE)
ttACP2
csACP2 <- eqmcc(ttACP2, details=TRUE, show.cases=TRUE, row.dom=TRUE, all.sol=FALSE, use.tilde=FALSE)
The is.na() function shows that there are no missing values in my data set. The data set contains 54 rows, of which the first is the column names. The truth table is generated according to expectations. But the minimization of the selected causal conditions fails.
I found a chunk of source code that matches the error message on line 90 here:
https://github.com/cran/QCApro/blob/master/R/pof.R
But I'm not competent enough in programming to understand what conditions lead to the error message being thrown.
This is because your dataset is a tibble instead of a dataframe. After loading the dataset, and before finding the truth table, do this:
gamson <- as.data.frame(gamson)
It should work after that. (The latest version of the eqmcc function is now called minimize.)

Error in is.data.frame(x) : object '' not found, how could I fix this?

I'm trying to run correlations in R.
This is my code so far:
library("foreign")
mydata<-read.csv(" ",header=FALSE)
options(max.print=1000000)
attach(mydata)
cor(as.numeric(agree_election),as.numeric(agree_party))
Then it gives me an error saying that the object "agree_election" is not found.
However, agree_election is just one of the column headers in my Excel spreadsheet. How do I fix this?
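Check the names in your data frame! Does it contain a variable named agree_election?
Note that with header = FALSE the first row of the file is read as data and the columns are named V1, V2, and so on, so agree_election will not exist as a column name. Use header = TRUE if the first row holds the column headers.
Please avoid the attach function. It could be fine with just one data frame, but it can make a mess if you have several data frames attached.
This code should be fine, if the variable names are correct.
mydata <- read.csv("...", header = TRUE)
names(mydata)
str(mydata)
cor(as.numeric(mydata$agree_election), as.numeric(mydata$agree_party))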
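To see how the header argument affects column names, here is a small self-contained sketch that reads the same CSV text both ways. The four data rows are made up for illustration, and `csv_text`, `no_header`, and `with_header` are hypothetical names; only the two column headers come from the question.

```r
# Made-up CSV content; only the two header names come from the question.
csv_text <- "agree_election,agree_party\n1,2\n2,4\n3,6\n4,8"

# header = FALSE: the header row is treated as data, columns become V1, V2.
no_header <- read.csv(text = csv_text, header = FALSE)
print(names(no_header))

# header = TRUE: the first row supplies the column names.
with_header <- read.csv(text = csv_text, header = TRUE)
print(names(with_header))

# The columns can now be referenced by name without attach().
r <- cor(with_header$agree_election, with_header$agree_party)
print(r)
```

Referencing columns as with_header$agree_election keeps it explicit which data frame each variable comes from.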

colnames intgroup argument of arrayQualityMetrics package of Biobase

I am using a package from Biobase, arrayQualityMetrics, to create plots for visualizing microarray data.
My data is stored in an ExpressionSet.
One of the columns of phenoData(ExpressionSet) is named "Tissue", but when I run the following command:
arrayQualityMetrics(ExpressionSet,intgroup = "Tissue")
it gives me this error:
Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
all elements of 'intgroup' should match column names of 'pData(expressionset)'.
I don't understand why I am getting this error, since my ExpressionSet contains a column named "Tissue" in its phenoData.
It's been a while since you asked this question, but this is likely due to arrayQualityMetrics() having to trim down the data frame in your pData() slot to a limited number of fields for display in the metadata table at the beginning of the report.
Try something like:
tmp <- pData(ExpressionSet)
pData(ExpressionSet) <- tmp[,c("Tissue", "SomeOtherInterestingField")] # swap out
arrayQualityMetrics(ExpressionSet,intgroup="Tissue")
pData(ExpressionSet) <- tmp # replace with your original full pData() data frame
