Subscript out of bounds error in R

I am using the createFolds function in R to create folds, and the call itself succeeds. But when I loop over the folds to perform a calculation on each one, I get the error below.
The code is:
set.seed(1000)
k <- 10
folds <- createFolds(train_data,k=k,list = TRUE, returnTrain = FALSE)
str(folds)
This is giving output as:
List of 10
$ Fold01: int [1:18687] 1 8 10 21 22 25 26 29 34 35 ...
$ Fold02: int [1:18685] 5 11 14 32 40 46 50 52 56 58 ...
$ Fold03: int [1:18685] 16 20 39 47 49 77 78 83 84 86 ...
$ Fold04: int [1:18685] 3 15 30 38 41 44 51 53 54 55 ...
$ Fold05: int [1:18685] 7 9 17 18 23 37 42 67 75 79 ...
$ Fold06: int [1:18686] 6 31 36 48 72 74 90 113 114 121 ...
$ Fold07: int [1:18686] 2 33 59 61 100 103 109 123 137 161 ...
$ Fold08: int [1:18685] 24 64 68 87 88 101 110 130 141 152 ...
$ Fold09: int [1:18684] 4 27 28 66 70 85 97 105 112 148 ...
$ Fold10: int [1:18684] 12 13 19 43 65 91 94 108 134 138 ...
However, the code below gives me an error:
for( i in 1:k ){
testData <- train_data[folds[[i]], ]
trainData <- train_data[(-folds[[i]]), ]
}
Error is:
> for( i in 1:k ){
+ testData <- train_data[folds[[i]], ]
+ trainData <- train_data[(-folds[[i]]), ]
+ }
Error in train_data[folds[[i]], ] : subscript out of bounds
I tried different seed values, but I get the same error.
Any help is appreciated.
Thank you!

As far as I understand, the problem arises because you pass the whole data frame train_data to createFolds. Folds should be generated over samples, i.e., the rows of the dataset, so pass a single vector with one element per row (typically the outcome column).
For instance:
data(spam) # from package kernlab
dim(spam) #has 4601 rows/samples
folds <- createFolds(y=spam$type, k=10, list=TRUE, returnTrain = TRUE)
# Here, only one column , spam$type, is used
# and indeed
max(unlist(folds)) #4601
#and these can be used as row indices
head( spam[folds[[4]], ] )
Passing the whole data frame is very similar to passing a matrix. A matrix is first converted to a vector, so an 8x10 matrix becomes an 80-element vector, and the values in folds are indices into that vector. If you then use these values as row indices for your data frame, they overshoot the number of rows:
r <- 8
c <- 10
m0 <- matrix(rnorm(r*c), r, c)
features<-apply(m0, c(1,2), function(x) sample(c(0,1),1))
features
folds<-createFolds(features,4)
folds
max(unlist(folds))
m0[folds[[2]],] # Error in m0[folds[[2]], ] : subscript out of bounds
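The overshoot can also be seen without caret. The following is a minimal base-R sketch, assuming a small 8 x 10 matrix as a stand-in for the real data. The split() call is only a simple substitute for createFolds() (it does no stratification), used here to show that folds built from row numbers stay within bounds.

```r
# Assumption: a small 8 x 10 matrix standing in for the real data set.
set.seed(1)
m <- matrix(rnorm(80), nrow = 8, ncol = 10)

length(m)  # 80: one position per cell, not per row
nrow(m)    # 8: the largest valid row index

# Base-R stand-in for createFolds(): shuffle the row numbers, then deal
# them into k groups. This is not caret's algorithm (no stratification).
k <- 4
row_folds <- split(sample(seq_len(nrow(m))),
                   rep(seq_len(k), length.out = nrow(m)))

# Every index is a valid row number, so row subsetting is safe.
max(unlist(row_folds)) <= nrow(m)
test_rows  <- m[row_folds[[2]], , drop = FALSE]
train_rows <- m[-row_folds[[2]], , drop = FALSE]
```

With caret itself, the equivalent idea is to pass one column (for example the outcome) rather than the whole data frame, as in the spam example above.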

Related

Why am I getting an "undefined column error"

I am getting this error when I run my script.
Script
# Analyze another data set with row / col data using SpATS
library(plantbreeding) # load library
data(dataset)
head(dataset)
str(dataset)
dataset$genotypes <- as.factor(dataset$genotypes)
m1 <- SpATS(response = "yield", spatial = ~ SAP(columns, rows), genotype = "genotypes", data = dataset)
plot(m1, all.in.one = TRUE) # see all plots in a common plot
Dataset is arranged as follows.
Variables are as follows.
> str(dataset)
tibble [87 x 4] (S3: tbl_df/tbl/data.frame)
 $ genotypes: Factor w/ 87 levels "1","2","3","4",..: 66 51 64 62 77 2 86 69 21 74 ...
 $ Columns  : num [1:87] 47 47 47 47 47 47 47 47 47 47 ...
 $ Rows     : num [1:87] 55 57 59 61 63 65 67 69 71 73 ...
 $ yield    : num [1:87] 235 NA 152 119 146 ...
Error that I am getting.
Error in [.data.frame(data, , model.terms) : undefined columns selected
I have tried taking the quotes off the variable names, but that doesn't solve the problem.
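One possible reading of the error, judging only from the str() output above: the model call refers to columns and rows, while the data frame's columns are named Columns and Rows, and R names are case-sensitive. The sketch below reproduces that mismatch in base R with a hypothetical three-row stand-in for the data set. It does not call SpATS, and lower-casing the names is only one way to make them line up.

```r
# Hypothetical stand-in for the real data (first values from the str() output).
dataset <- data.frame(
  genotypes = factor(c(66, 51, 64)),
  Columns   = c(47, 47, 47),
  Rows      = c(55, 57, 59),
  yield     = c(235, NA, 152)
)

# The names the model call asks for, as written in the script.
wanted <- c("yield", "genotypes", "columns", "rows")

# Names requested but not present: the case differs.
missing_cols <- setdiff(wanted, names(dataset))
missing_cols

# Selecting them fails; capture the message instead of stopping.
attempt <- tryCatch(dataset[, wanted], error = function(e) conditionMessage(e))
attempt

# One way to align the names: lower-case them, then select again.
names(dataset) <- tolower(names(dataset))
fixed <- dataset[, wanted]
```

Changing the formula to SAP(Columns, Rows) instead of renaming would be the other obvious option.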

Nonlinear model in r

What is the problem with the following R code? I get an error.
nonlinear <- function(G,Q,T) {
Y=G+Q*X^T
}
Model <- nls(nonlinear, start = list(G=0.4467, Q=-0.0020537, T=1), data=sample1)
Error: object of type 'closure' is not subsettable
nls() expects a model formula as its first argument, not a function. Taking the data from your other question, "Nonlinear modelling starting values", and the code from @Roland, this works:
sample1 <- read.table(header=TRUE, text=
"X Y Z
135 -0.171292376 85
91 0.273954718 54
171 -0.288513438 107
88 -0.17363066 54
59 -1.770852012 50
1 0 37
1 0 32
1 0.301029996 36
2 -0.301029996 39
1 1.041392685 30
11 -0.087150176 42
9 0.577236408 20
34 -0.355387658 28
15 0.329058719 17
32 -0.182930683 24
21 0.196294645 21
33 0.114954516 91
43 -0.042403849 111
39 -0.290034611 88
20 -0.522878746 76
6 -0.301029995 108
3 0.477121254 78
9 0 63
9 0.492915522 51
28 -0.243038048 88
16 -0.028028724 17
15 -0.875061263 29
2 -0.301029996 44
1 0 52
1 1.531478917 65")
nonlinear<-function(X,G,Q,T) G+Q*X^T
nls(Y ~ nonlinear(X,G,Q,T), start=list(G=-0.4, Q=0.2, T=-1), data=sample1)
Depending on the data, I had to change the starting values.

Error in plotting atomic vector

I want to plot a log return graph. I imported my data file in CSV format. My code after that is below, along with the errors. For more information, all the variables do exist in the table.
t <- read.csv("~/Documents/FYP/Log return 1.csv")
View(t)
df<-ts(t)
plot.ts(df$Year,df$IND)
Error in df$Year : $ operator is invalid for atomic vectors
plot.ts(df[Time],df[CHN])
Error in NextMethod("[") : object 'Time' not found
plot.ts(df[Year],df[CHN])
Error in NextMethod("[") : object 'Year' not found
plot.ts(df[[Year]],df[[CHN]])
Error in NCOL(x) : object 'Year' not found
Given a data frame input t, df <- ts(t) gives you a matrix rather than a data frame, so using $ is invalid. To access a column of a matrix, you need matrix indexing, for example df[, "Time"].
As an example, let's use R's built-in dataset cars. Originally it is a data frame with two columns: speed and dist, while x <- ts(cars) gives a matrix:
class(x)
# [1] "mts" "ts" "matrix"
head(x)
# speed dist
#[1,] 4 2
#[2,] 4 10
#[3,] 7 4
#[4,] 7 22
#[5,] 8 16
#[6,] 9 10
The error you saw can be reproduced by
x$dist
# Error in x$dist : $ operator is invalid for atomic vectors
Instead, we want
x[, "dist"]
#Time Series:
#Start = 1
#End = 50
#Frequency = 1
# [1] 2 10 4 22 16 10 18 26 34 17 28 14 20 24 28 26 34 34 46
#[20] 26 36 60 80 20 26 54 32 40 32 40 50 42 56 76 84 36 46 68
#[39] 32 48 52 56 64 66 54 70 92 93 120 85
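Applied to plotting, the same indexing can be sketched as follows, again using the built-in cars data as a stand-in for the imported CSV. The second plot() call is an alternative that skips ts() altogether and is not part of the approach above.

```r
# Built-in data as a stand-in for the imported CSV.
x <- ts(cars)

# A multivariate ts is a matrix underneath, so columns are picked with [ , ].
is.matrix(x)
dist_series <- x[, "dist"]
class(dist_series)

# Plot one column as a time series.
plot.ts(dist_series, ylab = "dist")

# Alternative: keep the data frame, where $ does work.
plot(cars$speed, cars$dist, type = "l", xlab = "speed", ylab = "dist")
```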

Error on simple chi-square test

I am trying to carry out a chi-square test to see whether there is a significant difference in disease proportion between regions, but I end up with an error in R. Any suggestions on how to correct it?
Data:
E NE NW SE SW EM WM YH
Cases 11 37 54 30 114 44 31 39
Non.cases 28 73 116 68 211 80 78 92
d=read.csv(file.choose(),header=T)
attach(d)
chisq.test(d)
Error in chisq.test(d) :
all entries of 'x' must be nonnegative and finite
Your problem must be somewhere upstream of the chi-squared test, i.e. the data are getting mangled somehow when being read in.
d <- read.table(header=TRUE,text="
E NE NW SE SW EM WM YH
Cases 11 37 54 30 114 44 31 39
Non.cases 28 73 116 68 211 80 78 92")
However you read the data, results should look like this:
str(d)
## 'data.frame': 2 obs. of 8 variables:
## $ E : int 11 28
## $ NE: int 37 73
## ... etc.
chisq.test(d)
## Pearson's Chi-squared test
## data: d
## X-squared = 3.3405, df = 7, p-value = 0.8518
(attach() is not necessary, and usually actually harmful/confusing ...)
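One way the data could get mangled, as a sketch only: if the CSV keeps the row labels in its first column, read.csv() reads them as a character column and the data frame is no longer all counts. The file layout and the column name Status below are assumptions, since the actual file is not shown.

```r
# Assumption: the CSV stores the row labels in a first column named "Status".
csv_text <- "Status,E,NE,NW,SE,SW,EM,WM,YH
Cases,11,37,54,30,114,44,31,39
Non.cases,28,73,116,68,211,80,78,92"

# Read naively: the label column comes in as text, not counts.
raw <- read.csv(text = csv_text, header = TRUE)
sapply(raw, is.numeric)

# Read with the first column as row names: only counts remain.
d <- read.csv(text = csv_text, header = TRUE, row.names = 1)
str(d)

result <- chisq.test(d)
result
```

Checking str(d) before running the test shows right away whether every column is numeric.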

Confusion matrix output error

I tried to build a predictive model in R using a decision tree with this code:
library(rpart)
library(caret)
DataYesNo<-read.csv('DataYesNo.csv',header=T)
worktrain<- sample(1:50,40)
worktest <- setdiff(1:50,worktrain)
M <- ncol(DataYesNo)
input <- names(DataYesNo)[1:(M-1)]
target <- "ICUtransfer"
tree<- rpart(ICUtransfer~Temperature+RespiratoryRate+HeartRate+SystolicBP+OxygenSaturations,
data=DataYesNo[worktrain, c(input,target)],
method="class",
parms=list(split="information"),
control=rpart.control(usesurrogate=0, maxsurrogate=0))
fitted <- predict(tree, DataYesNo[worktest, c(input,target)])
cmatrix <- confusionMatrix(fitted, worktest$ICUtransfer)
print(cmatrix)
tree
plot(tree)
text(tree)
I got an error at cmatrix <- confusionMatrix(fitted, worktest$ICUtransfer):
"$ operator is invalid for atomic vectors"
Please help me solve this.
Regards,
DataYesNo[worktest,]
   Temperature RespiratoryRate HeartRate SystolicBP OxygenSaturations ICUtransfer
11        36.3              26        65        140                97          no
15        37.3              20        80        129                99          no
21        36.9              20        72        154                95          no
26        36.0              28        56        199                97          no
30        36.9              20        72        150                96          no
34        36.6              16        97        118                95         yes
36        36.0              20        77        145                97         yes
38        36.0              20        77        145                97         yes
43        36.3              28        98        116                95         yes
47        36.0              20        77        145                97         yes
I tried this line:
cmatrix <- confusionMatrix(fitted, DataYesNo[worktest,]$ICUtransfer)
but I got this error:
Error in confusionMatrix.default(fitted, DataYesNo[worktest, ]$ICUtransfer) :
the data and reference factors must have the same number of levels
Can anyone help me?
You're getting the first error because worktest has no column called ICUtransfer. worktest is just a numeric vector of row indices, so $ cannot be used on it. You want the subset of your data corresponding to the worktest indices.
It's impossible to know exactly what else needs to be done, because I can't see into the data structures you're using.
Instead of worktest$ICUtransfer, try DataYesNo[worktest, target], which returns the ICUtransfer column for the test rows.
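The indexing and the "same number of levels" requirement can be sketched in base R as below. The data frame reuses a few values from the rows printed in the question. The worktest indices and the predicted labels are hypothetical stand-ins, and table() is used in place of caret's confusionMatrix(). For a classification tree, predict() on an rpart model returns class probabilities unless type = "class" is requested, which is a likely source of the second error.

```r
# Small stand-in built from values printed in the question.
DataYesNo <- data.frame(
  HeartRate   = c(65, 80, 72, 56, 72, 97, 77, 77, 98, 77),
  ICUtransfer = factor(c("no", "no", "no", "no", "no",
                         "yes", "yes", "yes", "yes", "yes"))
)

# Hypothetical test indices: a plain integer vector, so `$` cannot work on it.
worktest <- c(2L, 5L, 7L, 9L)
is.atomic(worktest)

# Index the data frame with the row numbers to get the reference labels.
reference <- DataYesNo[worktest, "ICUtransfer"]

# Hypothetical predicted labels, standing in for
# predict(tree, newdata, type = "class"); levels are aligned with the reference.
predicted <- factor(c("no", "no", "no", "yes"), levels = levels(reference))

# Base-R cross-tabulation of predictions against the reference.
tab <- table(Predicted = predicted, Reference = reference)
tab
```

Once both vectors are factors with identical levels, the same pair can be passed to confusionMatrix().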
