eqmcc function in R QCA package exiting with error

When I attempt to call eqmcc() against a truthTable object, the result is this error message:
Error: The outcome's length should be the same as the number of rows in the data.
Here's my script:
library(QCA); library(psych); library(readr)
gamson <- read_csv("/path/to/Gamson.csv", col_names = TRUE)
is.na(gamson)
ttACP2 <- truthTable(data=gamson, outcome = "ACP", conditions = "BUR, LOW, DIS, HLP", n.cut=3, incl.cut=0.750, sort.by="incl, n", complete=FALSE, show.cases=TRUE)
ttACP2
csACP2 <- eqmcc(ttACP2, details=TRUE, show.cases=TRUE, row.dom=TRUE, all.sol=FALSE, use.tilde=FALSE)
The is.na() function shows that there are no missing values in my data set. The data set contains 54 rows, of which the first is the column names. The truth table is generated according to expectations. But the minimization of the selected causal conditions fails.
I found a chunk of source code that matches the error message on line 90 here:
https://github.com/cran/QCApro/blob/master/R/pof.R
But I'm not competent enough in programming to understand what conditions lead to the error message being thrown.

This is because your dataset is a tibble instead of a dataframe. After loading the dataset, and before finding the truth table, do this:
gamson <- as.data.frame(gamson)
It should work after that. (In the latest version of the package, the eqmcc function is called minimize.)
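As a minimal sketch of the conversion, the snippet below uses a small made-up stand-in for the Gamson data (the values are hypothetical; only the column names come from the question). It also shows one difference between the two classes that is a likely, though unconfirmed, reason for the length error: selecting a single column from a plain data frame yields a vector, whereas the same selection on a tibble stays a one-column tibble.

```r
# Stand-in for the Gamson data: column names follow the question,
# the values are made up for illustration.
gamson <- data.frame(BUR = c(1, 0, 1), LOW = c(0, 0, 1), ACP = c(1, 0, 1))

# If the tibble package is installed, turn the stand-in into a tibble,
# which is the class that read_csv() returns.
if (requireNamespace("tibble", quietly = TRUE)) {
  gamson <- tibble::as_tibble(gamson)
}
class(gamson)  # a tibble also carries the classes "tbl_df" and "tbl"

# Convert to a plain data frame before calling truthTable()
gamson <- as.data.frame(gamson)
class(gamson)

# On a plain data frame, a single-column selection drops to a vector
# whose length equals the number of rows.
outcome <- gamson[, "ACP"]
length(outcome) == nrow(gamson)
```

Running class(gamson) right after read_csv() is a quick way to confirm whether the conversion is needed in the first place.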

Related

Sentiment Analysis Of A Dataset With Multiple NewsPaper Articles

I'm trying to call get_nrc_sentiment in R but getting the following error:
Error in get_nrc_sentiment(Test) : Data must be a character vector.
Can anyone see what I'm doing wrong?
library("RDSTK")
library("readr")
library("qdap")
library("syuzhet")
library("ggplot2")
library(readxl)
Test <- read_excel("Test.xlsx")
View(Test)
scores = get_nrc_sentiment(Test) # throwing error
I suspect that the Test.xlsx file you are reading in has multiple columns. In any case, read_excel() returns a data frame (a tibble), so the Test object is not a character vector. Passing the data frame to get_nrc_sentiment() causes the error. You can check with class(Test) what kind of R object it is.
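A minimal sketch of the fix, assuming the article text sits in one column of the sheet. The data frame below is a stand-in for what read_excel("Test.xlsx") returns, and the column names "headline" and "text" are hypothetical; substitute the real column name from your file.

```r
# Stand-in for the data frame that read_excel("Test.xlsx") returns;
# the column names "headline" and "text" are hypothetical.
Test <- data.frame(
  headline = c("First article", "Second article"),
  text = c("Markets rallied on good news.", "Storm damage worries residents."),
  stringsAsFactors = FALSE
)

class(Test)  # a data frame, not a character vector

# Pull out the column that holds the article text as a character vector
articles <- as.character(Test$text)
is.character(articles)

# Score the text only if the syuzhet package is installed
if (requireNamespace("syuzhet", quietly = TRUE)) {
  scores <- syuzhet::get_nrc_sentiment(articles)
}
```

The result should have one row of sentiment counts per element of the character vector, so each article keeps its own scores.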

R NaiveBayes issue with numeric variables

Even though the NaiveBayes() help says that numeric variables can be passed in the first parameter 'x', I am not able to run it successfully. Without the numeric variable (resale) it works fine. Here is the script:
library(readr)
library(klaR)
### load dataset
Dataset <- read_csv("D:/sampledata.csv")
### converting 'model' and 'type' to factor
Dataset$model <- factor(Dataset$model)
Dataset$type <- factor(Dataset$type)
### Executing NaiveBayes with numeric 'resale'
NaiveBayesModel1 <- NaiveBayes(model~type+mylogical+resale,data=Dataset,na.action =na.omit)
### now removing resale. Following works as expected.
NaiveBayesModel1 <- NaiveBayes(model~type+mylogical,data=Dataset,na.action =na.omit)
'model' and 'type' are factors,
'mylogical' is a logical and
'resale' is a numeric variable.
Since I cannot attach my data file, I am pasting a few rows here. Copy these rows and save them as a sampledata.csv file on your drive. Modify read_csv() in the above script to point to this csv file.
"model","sales","resale","type","mylogical"
"Integra",16.919,16.36,"Automobile",TRUE
"TL",39.384,19.875,"Automobile",FALSE
"Camry",247.994,13.245,"Automobile",FALSE
"Avalon",63.849,18.14,"Automobile",TRUE
"Celica",33.269,15.445,"Automobile",TRUE
"Tacoma",84.087,9.575,"Truck",TRUE
"RAV4",25.106,13.325,"Truck",FALSE
"4Runner",68.411,19.425,"Truck",FALSE
"Land Cruiser",9.835,34.08,"Truck",TRUE
"Golf",9.761,11.425,"Automobile",FALSE
"Jetta",83.721,13.24,"Automobile",FALSE
"Passat",51.102,16.725,"Automobile",TRUE
"Cabrio",9.569,16.575,"Automobile",FALSE
"GTI",5.596,13.76,"Automobile",FALSE
I get the following error if I run NaiveBayes with "resale".
Error in if (any(temp)) stop("Zero variances for at least one class in variables: ", :
missing value where TRUE/FALSE needed
R help ( help(NaiveBayes) ) says I can use numeric. I don't understand what is wrong. Please help.
Regards,
SG
The error comes from the check on the variance of resale within each class of the outcome model. Most likely your training set contains a single training record for each distinct value of model, as in the sample rows you posted, so a per-class variance of resale cannot be estimated.
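The point can be checked in base R without klaR. The sketch below uses a few rows from the sample data in the question and computes the standard deviation of resale within each class. How NaiveBayes() computes its internal check is an assumption here; what the sketch does show is that sd() of a single value is NA, which would explain a "missing value where TRUE/FALSE needed" message.

```r
# A few rows from the question's sample data
Dataset <- data.frame(
  model  = factor(c("Integra", "TL", "Camry", "Tacoma", "RAV4")),
  resale = c(16.36, 19.875, 13.245, 9.575, 13.325),
  type   = factor(c("Automobile", "Automobile", "Automobile", "Truck", "Truck"))
)

# Number of training records per class of the outcome 'model'
table(Dataset$model)

# Standard deviation of 'resale' within each class of 'model':
# every class has one record, and sd() of a single value is NA
per_class_sd <- tapply(Dataset$resale, Dataset$model, sd)
per_class_sd

# With 'type' as the grouping variable each class has several records,
# so the per-class standard deviations are defined
per_type_sd <- tapply(Dataset$resale, Dataset$type, sd)
per_type_sd
```

If the full data set has the same shape, a numeric predictor can only be used once each class of the outcome has at least two records with differing values.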

How do I fix the following error code: Error in `$<-.data.frame`(`*tmp*`, "cases", value =

When I give the following code:
ttE1B <- truthTable(data=O4S2, outcome = "FSUMSUST", conditions = "CSUSCOM,
CESGCOMP, FWOM, CENERGY, CMAT, CINDUST, CCDISC, CFINSECT, CENVSPOL, CTELE,
CUTIL, CGRBLDGPOL, CSPKGPOL, FTOTPOL, CGRI, CUN, FESGD, FSOCD, FENVD, FGOVD",
sort.by="incl, n", show.cases=TRUE)
I repeatedly get the following error code:
Error in `$<-.data.frame`(`*tmp*`, "cases", value = c("99,148,155,175", :
replacement has 77 rows, data has 167
I am trying to obtain truth tables for my outcome and conditions (truthTable command). I have tried various combinations but I continue to get the above noted error when I include some (but not all) of the conditions. I assume that the error relates to the cases mentioned (each time I run it different cases are mentioned depending on which conditions I have used in the command).
I can successfully run the command with some of the conditions but I need to run it with all of the conditions. I tried removing the data (cases) referenced to see if that would help but it has not.
Thanks in advance.

RecordLinkage Package and RLBigDataLinkage-Class Objects

I am attempting to use R package RecordLinkage, and am using two articles by the package authors as usage guides, in addition to the package documentation.
I am using 2 large datasets (100k+ rows), which I hope to link, and so I am using those elements of the package which are built around S4 class RLBigDataLinkage.
I begin by running the following lines in R:
>library('RecordLinkage')
>data1 <- as.data.frame(#source)
>data2 <- as.data.frame(#source)
>rpairs <- RLBigDataLinkage(data1, data2, strcmp = 2:8, exclude = 9:10)
This works fine (though it takes some time), and writes the necessary .ff files to deal with the large data sets.
If I then try:
>rpairs <- epiWeights(rpairs)
Or:
>rpairs <- epiWeights(rpairs, e = 0.01, f = getFrequencies(rpairs))
Then when I run:
>summary(rpairs)
I get the error message:
Error in dbGetQuery(object@con, "select count(*) from data1") :
error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery': Error: no slot of name "con" for this object of class "RLBigDataLinkage"
If, on the other hand, I run:
>result <- epiClassify(rpairs, 0.5)
>getTable(result)
I get the error message:
Error in table.ff(object@data@pairs$is_match, object@prediction, useNA = "ifany") :
Only vmodes integer currently allowed - are you sure ... contains only factors or integers?
I'm clearly missing something about how these objects need to be handled. Does anyone have any experience with this package that sees my error? Thanks kindly.
When the type of 'rpairs' is 'RLBigDataLinkage', use print(rpairs) and you will get the summary of rpairs.

colnames intgroup argument of arrayQualityMetrics package of Biobase

I am using a package from Biobase, arrayQualityMetrics, to create plots for visualizing microarray data.
My data is stored in an ExpressionSet.
One of the column names of phenoData(ExpressionSet) is "Tissue", but when I run the following command:
arrayQualityMetrics(ExpressionSet,intgroup = "Tissue")
It gives me this error:
Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
all elements of 'intgroup' should match column names of 'pData(expressionset)'.
I don't understand why I am getting this error although my ExpressionSet contains a column named "Tissue" in its phenoData.
It's been a while since you asked this question, but this is likely due to arrayQualityMetrics() having to trim down the data frame in your pData() slot to a limited number of fields for display in the metadata table at the beginning of the report.
Try something like:
tmp <- pData(ExpressionSet)
pData(ExpressionSet) <- tmp[,c("Tissue", "SomeOtherInterestingField")] # swap out
arrayQualityMetrics(ExpressionSet,intgroup="Tissue")
pData(ExpressionSet) <- tmp # replace with your original full pData() data frame

Resources