I'm trying to run a block bootstrapping function on some time series data (monthly interest rates for ~15 years).
My data is in a csv file with no header, all comprising one column and going down by row.
I installed the package bootstrap because tsboot wouldn't work for me.
Here is my code:
testFile = read.csv("\\Users\\unori/sample_data.csv")
theta <- function(x){mean(x)}
results = bootstrap(testFile,100,theta)
It tells me there are at least 50 errors. All of them say "In mean.default(x) : argument is not numeric or logical: returning NA"
What to do? It runs when I use the example in the documentation. I think it must be how my data is stored/imported?
Thanks in advance.
Try to supply a working, minimal example that reproduces your problem! Check here to see how to make a minimal reproducible example.
The error messages tells you that the thing you want to calculate the mean of, is not a number! So R will just return NA.
Suggestions for debugging:
Does the object 'testFile' exist?
What is the output of
str(testFile)
This works for me:
library(bootstrap)
testFile <- cars[,1]
theta <- function(x){mean(x)}
results = bootstrap(testFile,100,theta)
Related
I am trying to fit a model with the SuperLearner package. However, I can't even get past the stage of playing with the package to get comfortable with it....
I use the following code:
superlearner<-SuperLearner::SuperLearner(Y=y, X=as.data.frame(data_train[1:30]), family =binomial(), SL.library = list("SL.glmnet"), obsWeights = weights)
y is a numeric vector of the same length as my dataframe "data_train", containing the correct labels with 9 different classes. The dataframe "data_train" contains 30 columns with numeric data.
When i run this, i get the Error:
Error in get(library$screenAlgorithm[s], envir = env) :
Objekt 'All' not found
I don't really know what the problem could be and i can't really wrap my head around the source code. Please note that the variable obsWeights in the function contains a numeric vector of the same length as my data with weights i calculated for the model. This shouldn't be the problem, as it doesn't work either way.
Unfortunately i can't really share my data on here, but maybe someone had this error before...
Thanks!
this seems to happen if you do not attach SuperLearner, you can fix via library(SuperLearner)
Possibly it's a stupid question (but be patient, I'm a beginner in R's word)... I'm working with ImpulseDE2, a package designed to RNAseq data analysis along different times (see article for more information).
The running function (runImpulseDE2) requires a matrix counts and a annotation data frame. I've created both but it appears this error message:
Error in checkCounts(matCountData, "matCountData"): ERROR: matCountData contains non-integer elements. Requires count data.
I have tried some solutions and nothing seems to work (and I've not found any solution in the Internet)...
as.matrix(data)
(data + 1) > and there isn't NAs nor zero values that originate this error ($ which(is.na(data)) and $ which(data < 1), but both results are integer(0))
as.numeric(data) > and appears another error: ERROR: [Rownames of matCountData] was not given as input.
I think that's something I'm not realizing, but I'm totally locked. Every tip will be welcome!
And here is the (silly) solution! This function seems not to accept float numbers... so applying a simple round is enough to solve this error.
Thanks for your help!
Im trying to use dput() to create a reproducible example with a large database. The database needs to be large as the reproducible example involves moving averages. The way I've found to do this involves the function reproduce, shared here How to make a great R reproducible example? by #Ricardo Saporta. reproduce is based on dput() (code here https://github.com/rsaporta/pubR/blob/gitbranch/reproduce.R).
library(data.table)
library(devtools)
source_url("https://raw.github.com/rsaporta/pubR/gitbranch/reproduce.R")
data <- read.table("http://pastebin.com/raw/xP1Zd0sC")
setDF(data)
reproduce(data, rows = c(1:100))
That code creates data dataframe, and then provides a dput() output for it. It uses the rows argument to output the full dataframe. Yet if I use such output to recreate the dataframe, it fails.
Trying to allocate the dput() output to a new dataframe results in incomplete code, requiring me to add three parentheses manually at the end. And after doing so, I get the following error message: "Error in View : arguments imply differing number of rows: 100, 61".
Please not that the dput() output from reproduce without the rows = c(1:100) argument works fine. It just does not output the full dataframe, but rather a sample of it.
#This works fine
reproduce(data)
Please also note that I used the pastebin method to create this reproducible example. That method does not replace the dput() method for my purposes because it fails whenever trying to import data where some columns have spaces between the words (e.g. dataframes with datetime stamps).
EDIT: after some further troubleshooting discovered that reproduce fails as described above when the rows argument is used together with a dataframe containing 4 or more columns. Will have to find an alternative.
If anyone is interested in testing this, run the code above with the following links, all containing different number of columns:
1) 100x5: http://pastebin.com/raw/xP1Zd0sC
2) 100x4: http://pastebin.com/raw/YZtetfne
3) 100x4: http://pastebin.com/raw/63Ap2bh5
4) 100x3: http://pastebin.com/raw/1vMMcMtx
5) 100x3: http://pastebin.com/raw/ziM1bYQt
6) 100x1: http://pastebin.com/raw/qxtQs5u4
If you are just trying to dput() the first 100 rows of a data set, then you can simply subset the data just prior to running dput(). There doesn't seem to be a need to use the linked function.
dput(droplevels(head(data, 100))) ## or dput(droplevels(data[1:100,]))
should do it.
It is, however, peculiar that your try on reproduce() did not work. I would file an issue on the github page for that. You will likely get a more constructive answer there.
Thanks to #David Arenburg for reminding me about droplevels(). It is useful on this operation if we have factor columns. "Leftover" levels will be dropped.
I was trying to run some entropy() calculations on Force Platform data and i get a warning message:
> library(entropy)
> d2 <- read.csv("c:/users/SLA9DI/Documents/data2.csv")
> entropy(d2$CoPy, method="MM")
[1] 10.98084
> entropy(d2$CoPx, method="MM")
[1] 391.2395
Warning message:
In log(freqs) : NaNs produced
I am sure it is because the entropy() is trying to take the log of a negative number. I also know R can do complex numbers using complex(), however i have not been successful in getting it to work with my data. I did not get this error on my CoPy data, only the CoPx data, since a force platform gets Center of Pressure data in 2 dimensions. Does anyone have any suggestions on getting complex() to work on my data set or is there another function that would work better to try and get a proper entropy calculation? Entropy shouldn't be that much greater in CoPx compared to CoPy. I also tried it with some more data sets from other subjects and the same thing was popping up, CoPx entropy measures were giving me warning messages and CoPy measurements were not. I am attaching a data set link so anyone can try it out for themselves and see if they can figure it out, as the data is a little long to just post into here.
Data
Edit: Correct Answer
As suggested, i tried the table(...) function and received no warning/error message and the entropy output was also in the expected range as well. However, i apparently overlooked a function in the package discretize() and that is what you are supposed to use to correctly setup the data for entropy calculation.
I think there's no point in applying the entropy function on your data. According to ?entropy, it
estimates the Shannon entropy H of the random variable Y from the corresponding observed counts y
(emphasis mine). This means that you need to convert your data (which seems to be continuous) to count data first, for instance by binning it.
I am conducting a sensitivity study using the Sensitivity package. When trying to calculate the sensitivity indices with the output data of the external model I get the error specified in the titel.
The output is a three column table stored in a csv file which I read in as follows:
day1 <- read.csv("day_1_outputs.csv",header=FALSE)
Now when I try to calculate sensitivity indices with the ouput of the first column:
tell(sob.pars,day1[,1])
I get:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
invalid 'row.names' length
At first I thought I should use a matrix like object because in another study I conducted I generated the ouput from a raster image read in as a matrix which worked fine, but that didn't help.
The help page for tell states using a vector for the model results but even if I store the column of the dataframe before using tell the problem persists.
I guess my main problem is that I don't understand the error message in conjunction with the tell function, sob.pars is a list returned by sensitivity analyses objects constructors from the same package so I don't know to which rownames of that object the message is refering.
Any hint is appreciated.
Finally found out what the problem was. The error is kind of missleading.
The problem was not the row names since these were identical, that's what irritated me in the first place. There was obviously nothing wrong with them.
The actual problem was the column names in sob.pars. These were missing. Once I added these everything worked fine. Thanks rawr anyways (I just only now noticed someone had commented on the question, I thought I would be notified when this happens, but I guess not).