Trouble converting list into factor in R - r

I am having problems creating a boxplot of my data, because one of my variables is in the form of a list.
I am trying to create a boxplot:
boxplot(dist~species, data=out)
and received the following error:
Error in model.frame.default(formula = dist ~ species, data = out) :
invalid type (list) for variable 'species'
I have been unsuccessful in forcing 'species' into the form of a factor:
out[species]<- as.factor(out[[out$species]])
and receive the following error:
Error in .subset2(x, i, exact = exact) : invalid subscript type 'list'
How can I convert my 'species' column into a factor which I can then use to create a boxplot? Thanks.
EDIT:
str(out)
'data.frame': 4570 obs. of 6 variables:
$ GridRef : chr "NT73" "NT80" "NT85" "NT86" ...
$ pred : num 154 71 81 85 73 99 113 157 92 85 ...
$ pred_bin : int 0 0 0 0 0 0 0 0 0 0 ...
$ dist : num 20000 10000 9842 14144 22361 ...
$ years_since_1990: chr "21" "16" "21" "20" ...
$ species :List of 4570
..$ : chr "C.splendens"
..$ : chr "C.splendens"
..$ : chr "C.splendens"
.. [list output truncated]

It's hard to imagine how you got the data into this form in the first place, but it looks like
out <- transform(out,species=unlist(species))
should solve your problem.
set.seed(101)
f <- as.list(sample(letters[1:5],replace=TRUE,size=100))
## need I() to make a wonky data frame ...
d <- data.frame(y=runif(100),f=I(f))
## 'data.frame': 100 obs. of 2 variables:
## $ y: num 0.125 0.0233 0.3919 0.8596 0.7183 ...
## $ f:List of 100
## ..$ : chr "b"
## ..$ : chr "a"
boxplot(y~f,data=d) ## invalid type (list) ...
d2 <- transform(d,f=unlist(f))
boxplot(y~f,data=d2)
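To get an actual factor rather than a character vector, the same idea can be sketched as below. This is a minimal example on made-up data: the toy `out` data frame and the species name "C.virgo" are assumptions for illustration, not values from the question. It first checks that every list element holds exactly one value, because unlist() would otherwise silently change the number of rows.

```r
# Toy data frame with a list column, mimicking the structure shown by str(out).
# The values (and the species "C.virgo") are hypothetical.
out <- data.frame(dist = c(20000, 10000, 9842, 14144))
out$species <- list("C.splendens", "C.splendens", "C.virgo", "C.virgo")

# unlist() only gives one value per row if every element has length 1.
stopifnot(all(lengths(out$species) == 1))

# Flatten the list column and convert it to a factor in one step.
out$species <- factor(unlist(out$species))

str(out)
```

After this, boxplot(dist ~ species, data = out) should accept the column, since it is now an ordinary factor.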

Related

Error in inherits(data, "data.frame") : argument "data" is missing, with no default

I keep running into this error and can't find any apparent problem in the code.
library(tidyverse)
library(ggplot2)
require(data.table)
data <- as.data.frame(fread("MyND_merged.tsv"))
g <- ggplot(data = data, aes(x = study, y = value)) +
geom_boxplot() +
facet_wrap(facets = ~type, scale ='free') +
ggpubr::compare_means()
The error says
Error in inherits(data, "data.frame") :
argument "data" is missing, with no default
As far as I can tell, data is defined in the code. Could someone please help me solve this error?
Thank you
> str(data)
'data.frame': 4266 obs. of 9 variables:
$ V1 : int 0 1 2 3 4 5 6 7 8 9 ...
$ sample : chr "AANDS0002-01" "AANDS0002-01" ...
$ type : chr "index1" "index 2" ...
$ value : num 0.0122 0.9729 ...
$ donor_id: chr "AANDS0002" "AANDS0002" ...
$ gender : chr "M" "M" "M" "M" ...
$ age : int 80 80 80 80 80 80 75 75 75 75 ...
$ disease : chr "name1" "name2" ...
$ study : chr "mynd_2" "mynd_2" "mynd_2" "mynd_2" ...

Find AUC with tree package - binary response

I am trying to get the ROC curve and AUC for a CART decision tree built with the "tree" package.
Here is the structure of my tree:
> str(pruned.tree7)
'data.frame': 13 obs. of 6 variables:
$ var : Factor w/ 15 levels "","Age",..: 15 10 1 11 11 5 1 1 15 1 ...
$ n : num 383 158 29 129 110 38 20 18 72 7 ...
$ dev : num 461.1 218.6 29.6 174 141.8 ...
$ yval : Factor w/ 2 levels "Negative","Positive": 2 2 1 2 2 1 2 1 2 1 ...
$ splits: chr [1:13, 1:2] "<19.5" "<81.5" "" "<65" ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "cutleft" "cutright"
$ yprob : num [1:13, 1:2] 0.29 0.475 0.793 0.403 0.345 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "Negative" "Positive"
Referencing the above structure, I have written (many variations of) the following code:
> preds <- prediction(pruned.tree7$frame$yprob, dimnames(pruned.tree7$frame$yprob))
Error in prediction(pruned.tree7$frame$yprob, dimnames(pruned.tree7$frame$yprob)) :
Number of predictions in each run must be equal to the number of labels for each run.
> preds <- prediction(pruned.tree7$frame$yprob, dimnames)
Error in prediction(pruned.tree7$frame$yprob, dimnames) :
Format of labels is invalid.
> preds <- prediction(pruned.tree7$frame$yprob, "dimnames")
Error in prediction(pruned.tree7$frame$yprob, "dimnames") :
Number of cross-validation runs must be equal for predictions and labels.
> preds <- prediction(pruned.tree7$frame$yprob, names(yprob))
Error in is.data.frame(labels) : object 'yprob' not found
> preds <- prediction(pruned.tree7$frame$yprob, names(pruned.tree7$frame$yprob))
Error in prediction(pruned.tree7$frame$yprob, names(pruned.tree7$frame$yprob)) :
Format of labels is invalid.
I have searched and found this link: ROCR Package Documentation
It mentions the topic of cross-validation. However, it does not make sense to me.
Thank you in advance!!

cramer.test: NAs introduced by coercion

I know there is a lot of information in Google about this problem, but I could not solve it.
I have a data frame:
> str(myData)
'data.frame': 1199456 obs. of 7 variables:
$ A: num 3064 82307 4431998 1354 193871 ...
$ B: num 6067 403916 2709997 2743 203434 ...
$ C: num 299 11752 33282 170 2748 ...
$ D: num 105 6676 7065 20 1593 ...
$ E: num 8 572 236 3 170 ...
$ F: num 0 21 95 0 13 ...
$ G: num 583 18512 961328 348 42728 ...
Then I convert it to a matrix in order to apply the Cramer-von Mises test from "cramer" library:
> myData = as.matrix(myData)
> str(myData)
num [1:1199456, 1:7] 3064 82307 4431998 1354 193871 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:1199456] "8" "32" "48" "49" ...
..$ : chr [1:7] "A" "B" "C" "D" ...
After that, if I apply a "cramer.test(myData[x1:y1,], myData[x2:y2,])" I get the following error:
Error in rep(0, (RVAL$m + RVAL$n)^2) : invalid 'times' argument
In addition: Warning message:
In matrix(rep(0, (RVAL$m + RVAL$n)^2), ncol = (RVAL$m + RVAL$n)) :
NAs introduced by coercion
I also tried to convert the data frame to a matrix like this, but the error is the same:
> myData = as.matrix(sapply(myData, as.numeric))
> str(myData)
num [1:1199456, 1:7] 3064 82307 4431998 1354 193871 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:7] "A" "B" "C" "D" ...
Your problem is that your data set is too large for the algorithm that cramer.test is using (at least the way it's coded). The code tries to create a lookup table according to
lookup <- matrix(rep(0, (RVAL$m + RVAL$n)^2),
ncol = (RVAL$m + RVAL$n))
where RVAL$m and RVAL$n are the numbers of rows in the two samples. The standard maximum length of an R vector is 2^31-1 on a 32-bit platform. Since your samples have equal numbers of rows N, you would be trying to create a vector of length (2*N)^2, which in your case is 5.754779e+12, probably too big even if R would let you create the vector.
You may have to look for another implementation of the test, or another test.
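The arithmetic behind that estimate can be checked with a few lines of R. This is only a sketch: it assumes both samples use all 1199456 rows reported by str(myData), and the variable names are made up for illustration.

```r
# Assumption: both samples have all rows of the data set (m = n = n_rows).
n_rows <- 1199456

# Number of cells in the (m + n) x (m + n) lookup matrix.
cells <- (2 * n_rows)^2

# Compare with the largest value an R integer can hold (2^31 - 1).
exceeds_limit <- cells > .Machine$integer.max

# Rough memory needed for that many doubles, in TiB (8 bytes per double).
size_tib <- cells * 8 / 2^40

# Coercing a count this large to integer yields NA (with a warning),
# which is consistent with the "NAs introduced by coercion" message.
times_arg <- suppressWarnings(as.integer(cells))

print(cells)
print(exceeds_limit)
print(size_tib)
```

Even on a build that supports longer vectors, a matrix of that size would need tens of terabytes of memory, so testing on much smaller subsamples is the more realistic option.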

How to get fitted values from ar() method model in R

I want to retrieve the fitted values from an ar() function output model in R. When using Arima() method, I get them using fitted(model.object) function, but I cannot find its equivalent for ar().
An ar object does not store a fitted vector, but it does store the residuals. Since fitted = observed - residual, you can reconstruct the fitted values from the original data:
data(WWWusage)
arf <- ar(WWWusage)
str(arf)
#====================
List of 14
$ order : int 3
$ ar : num [1:3] 1.175 -0.0788 -0.1544
$ var.pred : num 117
$ x.mean : num 137
$ aic : Named num [1:21] 258.822 5.787 0.413 0 0.545 ...
..- attr(*, "names")= chr [1:21] "0" "1" "2" "3" ...
$ n.used : int 100
$ order.max : num 20
$ partialacf : num [1:20, 1, 1] 0.9602 -0.2666 -0.1544 -0.1202 -0.0715 ...
$ resid : Time-Series [1:100] from 1 to 100: NA NA NA -2.65 -4.19 ...
$ method : chr "Yule-Walker"
$ series : chr "WWWusage"
$ frequency : num 1
$ call : language ar(x = WWWusage)
$ asy.var.coef: num [1:3, 1:3] 0.01017 -0.01237 0.00271 -0.01237 0.02449 ...
- attr(*, "class")= chr "ar"
#===================
str(WWWusage)
# Time-Series [1:100] from 1 to 100: 88 84 85 85 84 85 83 85 88 89 ...
png(); plot(WWWusage)
lines(seq(WWWusage),WWWusage - arf$resid, col="red"); dev.off()
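As a cross-check on the residual approach, the fitted values can also be rebuilt from the estimated coefficients. The sketch below assumes the default ar() settings used above (Yule-Walker, demeaned series, no intercept) and a selected order of at least 1; the names fitted_vals and manual are made up for illustration.

```r
fit <- ar(WWWusage)

# Fitted values from the stored residuals: observed minus residual.
fitted_vals <- WWWusage - fit$resid

# Rebuild the same values from the AR coefficients:
# fitted_t = mean + sum_i ar_i * (x_{t-i} - mean)
p <- fit$order
centered <- as.numeric(WWWusage) - fit$x.mean
lagged <- embed(centered, p + 1)   # columns: lag 0, lag 1, ..., lag p
manual <- c(rep(NA, p),
            fit$x.mean + lagged[, -1, drop = FALSE] %*% fit$ar)

# The first p values are NA in both versions, since they have no full history.
all.equal(as.numeric(fitted_vals), manual)
```

The first p entries stay NA in both versions because those observations lack the p preceding values the model needs.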
The simplest way to get the fits from an AR(p) model would be to use auto.arima() from the forecast package, which does have a fitted() method. If you really want a pure AR model, you can constrain the differencing via the d parameter and the MA order via the max.q parameter.
> library(forecast)
> fitted(auto.arima(WWWusage,d=0,max.q=0))
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 91.68778 86.20842 82.13922 87.60576 ...

Transform to numeric a column with "NULL" values

I've imported a dataset into R in which a column that should contain numeric values also contains the string NULL. This makes R set the column class to character or factor, depending on whether you use the stringsAsFactors argument.
To give you an idea, this is the structure of the dataset.
> str(data)
'data.frame': 1016 obs. of 10 variables:
$ Date : Date, format: "2014-01-01" "2014-01-01" "2014-01-01" "2014-01-01" ...
$ Name : chr "Chi" "Chi" "Chi" "Chi" ...
$ Impressions: chr "229097" "3323" "70171" "1359" ...
$ Revenue : num 533.78 11.62 346.16 3.36 1282.28 ...
$ Clicks : num 472 13 369 1 963 161 1 7 317 21 ...
$ CTR : chr "0.21" "0.39" "0.53" "0.07" ...
$ PCC : chr "32" "2" "18" "0" ...
$ PCOV : chr "3470.52" "94.97" "2176.95" "0" ...
$ PCROI : chr "6.5" "8.17" "6.29" "NULL" ...
$ Dimension : Factor w/ 11 levels "100x72","1200x627",..: 1 3 4 5 7 8 9 10 11 1 ...
I would like to convert the PCROI column to numeric, but the NULL values make this harder.
I tried to get around the issue by setting the value to 0 for all observations where the current value is NULL, but I got the following error message:
> data$PCROI[which(data$PCROI == "NULL"), ] <- 0
Error in data$PCROI[which(data$PCROI == "NULL"), ] <- 0 :
incorrect number of subscripts on matrix
My idea was to change to 0 all the NULL observations and afterwards transform all the column to numeric using the as.numeric function.
The problem is the indexing: data$PCROI is a vector, so it takes a single subscript with no comma:
data$PCROI[which(data$PCROI == "NULL"), ] <- 0 # will not work
data$PCROI[which(data$PCROI == "NULL")] <- 0 # will work
By the way, you can simply write:
data$PCROI = as.numeric(data$PCROI)
which converts each "NULL" to NA automatically (with a coercion warning).
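A small self-contained sketch of both routes follows. The CSV text and the names raw, clean and f are made up for illustration. It also shows the na.strings argument of read.csv(), which avoids the problem at import time, and the extra as.character() step needed if the column was read as a factor.

```r
# Hypothetical miniature of the data set, with "NULL" in a numeric column.
csv_text <- "Name,PCROI\nChi,6.5\nChi,8.17\nChi,NULL\n"

# Route 1: fix the column after import.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
raw$PCROI[raw$PCROI == "NULL"] <- NA      # single subscript: PCROI is a vector
raw$PCROI <- as.numeric(raw$PCROI)

# Route 2: tell read.csv() that "NULL" means missing, so the column
# is parsed as numeric straight away.
clean <- read.csv(text = csv_text, na.strings = "NULL")

# If the column is a factor, convert through character first;
# as.numeric() on a factor returns the internal level codes instead.
f <- factor(c("6.5", "8.17", "NULL"))
from_factor <- suppressWarnings(as.numeric(as.character(f)))

str(raw)
str(clean)
```

Whether NA or 0 is the right replacement depends on what a missing PCROI means in the data; NA keeps missing values out of sums and means unless you ask for them.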
