Shiny reactive unexpected behavior - R

I'm trying to create a reactive function that looks up the indices corresponding to the user's inputs in a dataframe referred to as df in the code below. Just to give you an idea, here's what the dataframe df looks like:
'data.frame': 87 obs. of 6 variables:
$ Job : Factor w/ 66 levels "Applications Engineer",..: 61 14 23 31 22 15 57 26 30 13 ...
$ Company : Factor w/ 102 levels "A10 Networks",..: 95 50 83 71 80 60 20 7 30 51 ...
$ Location: Factor w/ 64 levels "Ayr","Bangalore",..: 36 22 19 29 59 7 7 55 53 63 ...
$ Posted : num 2 3 2 3 1 1 2 5 4 1 ...
$ Source : Factor w/ 2 levels "Glassdoor","Indeed": 2 2 2 2 2 2 2 2 2 2 ...
$ url : chr "http://ca.indeed.com/rc/clk?jk=71f1abcd100850c6" "http://ca.indeed.com/rc/clk?jk=504724a4d74674fe" "http://ca.indeed.com/rc/clk?jk=d2e78fb67e8c86d6" "http://ca.indeed.com/rc/clk?jk=df790aa5fc7bdc3c" ...
The reactive function mostly uses the grep function to do a text search and find the respective indices. Here's the relevant chunk of the code from server.R:
#Create a reactive function to look up the indices corresponding to the inputs
index <- reactive({
  ind.j <- if (input$j == '') NULL else grep(input$j, df[, 'Job'], ignore.case = TRUE)
  ind.c <- {tmp <- lapply(input$c, function(x) {which(df[, 'Company'] == x)}); Reduce(union, tmp)}
  ind.l <- if (input$l == '') NULL else grep(input$l, df[, 'Location'], ignore.case = TRUE)
  ind.d <- which(df[, 'Posted'] <= input$d)
  ind.s <- {tmp <- lapply(input$s, function(x) {which(df[, 'Source'] == x)}); Reduce(union, tmp)}
  ind.all <- list(ind.j, ind.c, ind.l, ind.d, ind.s)
  ind <- if (is.null(ind.s)) NULL else {ind.null <- which(lapply(ind.all, is.null) == TRUE); Reduce(intersect, ind.all[-ind.null])}
})
I have printed the results of ind.j, ind.c, ind.l, ind.d, ind.s, and ind.all to the console, and they all produce the right results. However, when I test the result of ind, it is not quite what I expect, so I'm wondering whether it's the reactivity or that last line of code that doesn't work.
What ind is meant to do is take the list of all the looked-up indices, stored in ind.all, and apply the intersect function recursively to find the elements common to all the sublists in ind.all.
The index function works fine for individual filters. However, when I enter values for all the inputs, the function does not update to the correct list of indices as expected.

This question has been answered by jdharrison in this post. I'm going to reiterate his answer here:
The problem you have is with the which function:
> which(rep(FALSE, 5))
integer(0)
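To spell out why this only bites once every filter has a value: when none of the entries in ind.all are NULL, which() returns integer(0), subsetting with -integer(0) keeps nothing, and Reduce() over an empty list returns NULL. A minimal console sketch (not from the original answer, just illustrating the mechanism):
> ind.all <- list(1:3, 2:4, 3:5)   # no NULL entries, i.e. all filters set
> ind.null <- which(lapply(ind.all, is.null) == TRUE)
> ind.null
integer(0)
> ind.all[-ind.null]               # -integer(0) drops every element
list()
> Reduce(intersect, ind.all[-ind.null])
NULL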
You can change:
ind <- if (is.null(ind.s)) {
  NULL
} else {
  ind.null <- which(lapply(ind.all, is.null) == TRUE)
  Reduce(intersect, ind.all[-ind.null])
}
to
ind <- if (is.null(ind.s)) {
  NULL
} else {
  Reduce(intersect, ind.all[!sapply(ind.all, is.null)])
}
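Continuing the same sketch, the corrected subsetting keeps all non-NULL entries whether or not some filters are unset:
> Reduce(intersect, ind.all[!sapply(ind.all, is.null)])
[1] 3
> ind.all2 <- list(1:3, NULL, 2:4, NULL, 3:5)   # some filters left empty
> Reduce(intersect, ind.all2[!sapply(ind.all2, is.null)])
[1] 3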

Related

"Number of observations <= number of random effects" error

I am using a package called diagmeta for meta-analysis purposes. I can use this package with a built-in data set called Schneider2017. However, when I make my own database/data set, I get the following error:
Error: number of observations (=300) <= number of random effects (=3074) for term (Group * Cutoff | Study); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable
Another thread here on SO suggests the error is caused by the data format of one or more columns. I have made sure every column's data type matches that in the Schneider2017 dataset - no effect.
Link to the other thread
I have tried extracting all of the data from the Schneider2017 dataset into Excel and then importing the dataset from Excel through RStudio. This again makes no difference, which suggests to me that something in the data format could be different, although I can't see how.
diag2 <- diagmeta(tpos, fpos, tneg, fneg, cutpoint,
                  studlab = paste(author, year, group),
                  data = SRschneider,
                  model = "DIDS", log.cutoff = FALSE,
                  check.nobs.vs.nRE = "ignore")
The dataset itself is shown in the str() output below.
I expected the same successful execution and plotting as with the built-in data set, but I keep getting this error.
Result from doing str(mydataset):
> str(SRschneider)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 150 obs. of 10 variables:
$ ...1 : num 1 2 3 4 5 6 7 8 9 10 ...
$ study_id: num 1 1 1 1 1 1 1 1 1 1 ...
$ author : chr "Arora" "Arora" "Arora" "Arora" ...
$ year : num 2006 2006 2006 2006 2006 ...
$ group : chr NA NA NA NA ...
$ cutpoint: chr "6" "7.0" "8.0" "9.0" ...
$ tpos : num 133 131 130 127 119 115 113 110 102 98 ...
$ fneg : num 5 7 8 11 19 23 25 28 36 40 ...
$ fpos : num 34 33 31 30 28 26 25 21 19 19 ...
$ tneg : num 0 1 3 4 6 8 9 13 15 15 ...
Just a quick follow-up on Ben's detailed answer.
The statistical method implemented in diagmeta() expects that argument cutpoint is a continuous variable. We added a corresponding check for argument cutpoint (as well as arguments TP, FP, TN, and FN) in version 0.3-1 of R package diagmeta; see commit in GitHub repository for technical details.
Accordingly, the following R commands will result in a more informative error message:
data(Schneider2017)
diagmeta(tpos, fpos, tneg, fneg, as.character(cutpoint),
         studlab = paste(author, year, group), data = Schneider2017)
You said that you "have made sure every column's data type matches that in the Schneider2017 dataset", but that doesn't seem to be true. Besides differences between num (numeric) and int (integer) types (which actually aren't typically important), your data has
$ cutpoint: chr "6" "7.0" "8.0" "9.0" ...
while str(Schneider2017) has
$ cutpoint: num 6 7 8 9 10 11 12 13 14 15 ...
Having your cutpoint be a character rather than numeric means that R will try to treat it as a categorical variable (with many discrete levels). This is very likely the source of your problem.
The cutpoint variable is likely a character because R encountered some value in this column that can't be interpreted as numeric (something as simple as a typographic error). You can use SRschneider$cutpoint <- as.numeric(SRschneider$cutpoint) to convert the variable to numeric by brute force (values that can't be interpreted will be set to NA), but it would be better to go upstream and see where the problem is.
If you use tidyverse packages to load your data you should get a list of "parsing problems" that may be useful. You can also try cp <- SRschneider$cutpoint; cp[which(is.na(as.numeric(cp)))] to look at the values that can't be converted.
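Putting those suggestions together, a minimal cleanup sketch (assuming SRschneider is already loaded) could look like this:
cp <- SRschneider$cutpoint
cp[is.na(suppressWarnings(as.numeric(cp))) & !is.na(cp)]   # values that cannot be parsed as numbers

SRschneider$cutpoint <- as.numeric(SRschneider$cutpoint)   # unparseable values become NA
str(SRschneider$cutpoint)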

Choosing multiple columns and changing their classes using a lookup table in R?

Is it possible to use a lookup table to assign/change the classes of variables in a data frame in R? I have thousands of columns with messed-up classes in one data frame (my_df), and a list of what they should be in another data frame (my_lt). As pseudocode, I was thinking of something like: take my_lt$variable_name, grep() it on colnames(my_df), and pass the matching columns through as.numeric if my_lt$variable_class == "numeric", with some form of if..else. Any help would be much appreciated!
input - my data frame (my_df)
my_df = data.frame(q1_hight_1 = c(12,31,22,12),
                   q1_hight_2 = c(24,54,23,32),
                   q1_hight_3 = c(34,23,65,34),
                   q2_shoe_size_1 = c(2,2,3,4),
                   q2_shoe_size_2 = c(4,3,3,4))
input - my lookup table (my_lt)
my_lt = data.frame(variable_name=c("hight","shoe_size"),variable_class=c("numeric","integer"))
desired output (when checking classes)
$q1_hight_1
[1] "numeric"

$q1_hight_2
[1] "numeric"

$q1_hight_3
[1] "numeric"

$q2_shoe_size_1
[1] "integer"

$q2_shoe_size_2
[1] "integer"
This does the trick, given that there's no trap in the names you give to your variables (I use a very naïve grep).
library(dplyr)
library(purrr)

map2(as.character(my_lt$variable_name),
     as.character(my_lt$variable_class),
     function(nam, cl) {
       map(grep(nam, names(my_df)), function(i) { class(my_df[[i]]) <<- cl })
     })
str(my_df)
# 'data.frame': 4 obs. of 5 variables:
# $ q1_hight_1 : num 12 31 22 12
# $ q1_hight_2 : num 24 54 23 32
# $ q1_hight_3 : num 34 23 65 34
# $ q2_shoe_size_1: int 2 2 3 4
# $ q2_shoe_size_2: int 4 3 3 4
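For reference, a base-R-only sketch of the same idea (the same naive grep matching on column names; my_df and my_lt as defined above):
for (k in seq_len(nrow(my_lt))) {
  nam <- as.character(my_lt$variable_name[k])
  cl  <- as.character(my_lt$variable_class[k])
  for (i in grep(nam, names(my_df))) {
    class(my_df[[i]]) <- cl   # same coercion the purrr version relies on
  }
}
sapply(my_df, class)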

Subset dataframe using logicals in R

I'm trying to subset a dataframe using logical operators on the day of the year, and I wonder why the following doesn't work.
num <- c(11,22,33,44)
day.of.yr <- c(31,32,33,34)
dframe <- data.frame(num,day.of.yr)
num day.of.yr
1 11 31
2 22 32
3 33 33
4 44 34
target.days <- c(32,34)
# works
test1 <- dframe[(day.of.yr==target.days[1] | day.of.yr==target.days[2]),]
num day.of.yr
2 22 32
4 44 34
# doesn't work
test2 <- dframe[day.of.yr==target.days,]
num day.of.yr
4 44 34
When I try it on a real dataset, R also outputs just a subset of what I want it to output, with this warning message:
Warning message:
In dframe$day.of.yr == target.days :
longer object length is not a multiple of shorter object length
It would be nice to have a short-cut way of specifying multiple rows of a dataframe based on the values in one column. I've tried a few different ways, but no luck yet.
Use %in%, like so:
subset(dframe, day.of.yr %in% target.days)
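For context, == compares the two vectors element by element and recycles the shorter one, which is why only the accidental alignment 34 == 34 matched, while %in% checks each value for membership anywhere in target.days. A quick illustration with the vectors from the question:
day.of.yr == target.days                      # 31==32, 32==34, 33==32, 34==34 -> FALSE FALSE FALSE TRUE
day.of.yr %in% target.days                    # FALSE TRUE FALSE TRUE
dframe[dframe$day.of.yr %in% target.days, ]   # same rows as the subset() call above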

Remove duplicates in R without converting to numeric

I have 2 variables in a data frame with 300 observations.
$ imagelike: int 3 27 4 5370 ...
$ user: Factor w/ 24915 levels "\"0.1gr\"","\"008bla\"", ..
I then tried to remove the duplicates (for example, "- " appears 2 times):
testclean <- data1[!duplicated(data1), ]
This gives me the warning message:
In Ops.factor(left): "-" not meaningful for factors
I have then converted it to a matrix:
data2 <- data.matrix(data1)
testclean2 <- data2[!duplicated(data2), ]
This does the trick; however, it converts the userName values to numeric.
=========================================================================
I am new, but I have tried looking at previous posts on this topic (including the one below) and it did not work out:
Convert data.frame columns from factors to characters
Some sample data, from your image (please don't post images of data!):
data1 <- data.frame(imageLikeCount = c(3, 27, 4, 4, 16, 103),
                    userName = c("\"testblabla\"", "test_00", "frenchfries", "frenchfries", "test.inc", "\"parmezan_pizza\""))
str(data1)
# 'data.frame': 6 obs. of 2 variables:
# $ imageLikeCount: num 3 27 4 4 16 103
# $ userName : Factor w/ 5 levels "\"parmezan_pizza\"",..: 2 5 3 3 4 1
To fix the problem with factors as well as the embedded quotes:
data1$userName <- gsub('"', '', as.character(data1$userName))
str(data1)
# 'data.frame': 6 obs. of 2 variables:
# $ imageLikeCount: num 3 27 4 4 16 103
# $ userName : chr "testblabla" "test_00" "frenchfries" "frenchfries" ...
Like @DanielWinkler suggested, if you can change how the data is read in or defined, you might choose to include stringsAsFactors = FALSE (this argument is accepted in many functions, including read.csv, read.table, and most data.frame functions including as.data.frame and rbind):
data1 <- data.frame(imageLikeCount = c(3, 27, 4, 4, 16, 103),
                    userName = c("\"testblabla\"", "test_00", "frenchfries", "frenchfries", "test.inc", "\"parmezan_pizza\""),
                    stringsAsFactors = FALSE)
str(data1)
# 'data.frame': 6 obs. of 2 variables:
# $ imageLikeCount: num 3 27 4 4 16 103
# $ userName : chr "\"testblabla\"" "test_00" "frenchfries" "frenchfries" ...
(Note that this still has embedded quotes, so you'll still need something like data1$userName <- gsub('"', '', data1$userName).)
Now, we have data that looks like this:
data1
# imageLikeCount userName
# 1 3 testblabla
# 2 27 test_00
# 3 4 frenchfries
# 4 4 frenchfries
# 5 16 test.inc
# 6 103 parmezan_pizza
and your need to remove duplicates works:
data1[! duplicated(data1), ]
# imageLikeCount userName
# 1 3 testblabla
# 2 27 test_00
# 3 4 frenchfries
# 5 16 test.inc
# 6 103 parmezan_pizza
Try
data$userName <- as.character(data$userName)
And then
data <- unique(data)
You could also pass the argument stringsAsFactors = FALSE when reading the data. This is usually a good idea.
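For example, when reading from a CSV file (the file name here is just a placeholder):
data <- read.csv("your_file.csv", stringsAsFactors = FALSE)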

C5.0 decision tree - c50 code called exit with value 1

I am getting the following error
c50 code called exit with value 1
I am doing this on the Titanic data available from Kaggle.
# Importing datasets
train <- read.csv("train.csv", sep=",")
# this is the structure
str(train)
Output:
'data.frame': 891 obs. of 12 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
$ Embarked : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
Then I tried using a C5.0 decision tree.
# Trying with C5.0 decision tree
library(C50)
#C5.0 models require a factor outcome otherwise error
train$Survived <- factor(train$Survived)
new_model <- C5.0(train[-2],train$Survived)
So running the above lines gives me this error
c50 code called exit with value 1
I'm not able to figure out what's going wrong. I was using similar code on a different dataset and it was working fine. Any ideas on how I can debug my code?
Thanks.
For anyone interested, the data can be found here: http://www.kaggle.com/c/titanic-gettingStarted/data. I think you need to be registered in order to download it.
Regarding your problem, first of all I think you meant to write
new_model <- C5.0(train[,-2],train$Survived)
Next, notice the structure of the Cabin and Embarked columns. These two factors have an empty string as a level name (check with levels(train$Embarked)). This is the point where C50 falls over. If you modify your data such that
levels(train$Cabin)[1] = "missing"
levels(train$Embarked)[1] = "missing"
your algorithm will now run without an error.
Just in case: you can take a look at the error with
summary(new_model)
Also, this error occurs when there are special characters in the name of a variable. For example, one will get this error if there is a "я" character (from the Russian alphabet) in the name of a variable.
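If renaming columns by hand is impractical, one possible way to strip non-ASCII characters from all column names is sketched below (my_data is a hypothetical data frame; iconv() replaces every byte it cannot convert with "_"):
names(my_data) <- iconv(names(my_data), to = "ASCII", sub = "_")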
Here is what finally worked. I got this idea after reading this post:
library(C50)
test$Survived <- NA
combinedData <- rbind(train,test)
combinedData$Survived <- factor(combinedData$Survived)
# fixing empty character level names
levels(combinedData$Cabin)[1] = "missing"
levels(combinedData$Embarked)[1] = "missing"
new_train <- combinedData[1:891,]
new_test <- combinedData[892:1309,]
new_model <- C5.0(new_train[,-2],new_train$Survived)
new_model_predict <- predict(new_model,new_test)
submitC50 <- data.frame(PassengerId=new_test$PassengerId, Survived=new_model_predict)
write.csv(submitC50, file="c50dtree.csv", row.names=FALSE)
The intuition behind this is that in this way both the train and test data set will have consistent factor levels.
I had the same error, but I was using a numeric dataset without missing values.
After a long time, I discovered that my dataset had a predictor attribute called "outcome", and C5.0Control uses this name; this was the cause of the error.
My solution was to change the column name. Another way would be to create a C5.0Control object, change the value of its label attribute, and then pass this object as a parameter to the C5.0 method.
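A minimal sketch of both options (my_data, predictor_cols, and target_col are hypothetical placeholders, not from the original answer):
library(C50)

# Option 1: rename the clashing predictor column
names(my_data)[names(my_data) == "outcome"] <- "outcome_var"

# Option 2: keep the column and give the model's target a different internal label
ctrl  <- C5.0Control(label = "target")
model <- C5.0(x = my_data[, predictor_cols], y = my_data$target_col, control = ctrl)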
I also struggled for some hours with the same problem (return code "1") when building a model as well as when predicting.
With the hint from Marco's answer, I have written two small functions to replace any factor level equal to "" in a data frame or vector; see the code below. However, since R does not pass arguments by reference, you have to use the result of the function (it cannot change the original data frame):
removeBlankLevelsInDataFrame <- function(dataframe) {
  for (i in 1:ncol(dataframe)) {
    levels <- levels(dataframe[, i])
    if (!is.null(levels) && levels[1] == "") {
      levels(dataframe[, i])[1] <- "?"
    }
  }
  dataframe
}

removeBlankLevelsInVector <- function(vector) {
  levels <- levels(vector)
  if (!is.null(levels) && levels[1] == "") {
    levels(vector)[1] <- "?"
  }
  vector
}
Call of the functions may look like this:
trainX = removeBlankLevelsInDataFrame(trainX)
trainY = removeBlankLevelsInVector(trainY)
model = C50::C5.0.default(trainX,trainY)
However, it seems that C50 has a similar problem with character columns containing an empty cell, so you will probably have to extend this to handle character attributes as well, if you have some.
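A possible extension along the same lines for character columns (a sketch, assuming empty strings should become "?" just like the blank factor levels above):
replaceBlankCharacters <- function(dataframe) {
  for (i in seq_along(dataframe)) {
    if (is.character(dataframe[[i]])) {
      blank <- which(dataframe[[i]] == "")
      dataframe[[i]][blank] <- "?"
    }
  }
  dataframe
}

trainX = replaceBlankCharacters(trainX)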
I also got the same error, but it was because of some illegal characters in the factor levels of one of the columns.
I used the make.names function and corrected the factor levels:
levels(FooData$BarColumn) <- make.names(levels(FooData$BarColumn))
Then the problem was resolved.
