ImpulseDE2, matrix counts contains non-integer elements - r

Possibly it's a stupid question (but be patient, I'm a beginner in R's world)... I'm working with ImpulseDE2, a package designed for analysing RNA-seq data across time points (see article for more information).
The main function (runImpulseDE2) requires a count matrix and an annotation data frame. I've created both, but I get this error message:
Error in checkCounts(matCountData, "matCountData"): ERROR: matCountData contains non-integer elements. Requires count data.
I have tried several things and nothing seems to work (and I haven't found a solution on the Internet):
as.matrix(data)
(data + 1) > there are no NAs or zero values that could cause this error (which(is.na(data)) and which(data < 1) both return integer(0))
as.numeric(data) > this gives another error: ERROR: [Rownames of matCountData] was not given as input.
I think there's something I'm not seeing, but I'm totally stuck. Any tip is welcome!

And here is the (silly) solution! This function does not seem to accept floating-point (non-integer) values, so simply rounding the matrix is enough to fix this error.
Thanks for your help!
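A minimal sketch of the rounding fix, using only base R. The toy matrix and its gene and sample names are made up for illustration; the real matCountData would take their place. It also shows why as.numeric() led to the row-name error: it returns a plain vector and drops the dimensions and dimnames, whereas round() keeps them.

```r
# Hypothetical toy matrix standing in for the real count data
mat <- matrix(c(10.0, 3.7, 0.2, 25.4, 8.0, 1.49), nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Locate the entries that are not whole numbers
non_int <- mat != round(mat)
print(which(non_int, arr.ind = TRUE))

# as.numeric() flattens the matrix, so row names are lost
flat <- as.numeric(mat)

# round() keeps dim and dimnames; storage.mode makes the type integer
mat_counts <- round(mat)
storage.mode(mat_counts) <- "integer"
print(mat_counts)
```

Whether rounding is appropriate depends on where the fractional values come from; this sketch only assumes they are meant to be counts.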

Related

R: pwfdtest (plm package) produces error message

I used a fixed effects and a first difference estimation. To decide which is more efficient Wooldridge proposes a specific test that is incorporated in the plm package via the following function:
pwfdtest(Y~X1+X2+..., data=...)
However, running this results in an error message for me stating:
> pwfdtest(DepVar~ExplVar1+ExplVar2, data = data)
Error in `$<-.data.frame`(`*tmp*`, "FDres", value = c(-1.18517291896221, :
replacement has 521293 rows, data has 621829
In addition: Warning message:
In id - c(NA, id[1:(N - 1)]) :
longer object length is not a multiple of shorter object length
I tried to look up if anyone has experienced this error before posting, but I couldn't find the answer.
Sometimes, I came across people asking for a minimum working example, but mine is not working at all. However, the example from the plm package does work.
Please note that this is my first research conducted as well as the first time I have used R. So bear with me.
Best wishes
Alex
EDIT:
I read that the traceback() function might be somewhat useful. However, it mostly just spat out a long list of numbers, so long that I cannot even scroll back to the top (?). Anyway,
the last lines of the output are:
-1.65868856541809, 2.89084861854684, -1.68650260853188, 0.655681663187397,
-0.677329685017227, 0.993684102310348, 1.33441048058398, -2.0526651614649,
-1.64392358708552, 2.58673448155514, 0.952616064091869, -0.909754051474562,
0.815593306056627, -0.0542364686765445, 0.0184515528912868))
2: pwfdtest.panelmodel(fd1)
1: pwfdtest(fd1)
EDIT2:
My first guess was that NAs might be causing trouble, so I reduced my panel to only the dependent variable and one explanatory variable. Beforehand, I checked whether there were any NAs, and there were none. Yet I got a similar error message:
Error in `$<-.data.frame`(`*tmp*`, FDres, value = c(-1.18517291896221, :
replacement has 521293 rows, data has 621829
In addition: Warning message:
In id - c(NA, id[1:(N - 1)]) :
longer object length is not a multiple of shorter object length
EDIT3:
I think I might have found the problem: an unbalanced panel. And it makes some sense, I guess... Yet there does not seem to be a solution for it in the usual sense; it simply does not work.
So, if anyone is interested, here is what I did:
I further reduced my panel to only 300 individuals and fewer years. I named the individuals 1-300 and (drumroll) it worked. However, after changing some of the individuals' names to, for example, 555 or 556, it gave me the same error as before.
I am not very proficient with these things, but my uneducated guess is that the test simply does not work on unbalanced panels.
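The two observations above (unbalanced panel, non-consecutive individual ids) can be checked with base R before calling the test. This is only a diagnostic sketch under stated assumptions: the column names id, year and y and the toy values are hypothetical, and it does not call plm at all.

```r
# Hypothetical toy panel; 'id', 'year' and 'y' are assumed column names
panel <- data.frame(
  id   = c(555, 555, 555, 556, 556, 900, 900, 900),
  year = c(2001, 2002, 2003, 2001, 2002, 2001, 2002, 2003),
  y    = c(1.2, 1.5, 1.1, 0.7, 0.9, 2.0, 2.2, 2.1)
)

# A panel is balanced when every individual has the same number of rows
obs_per_id  <- table(panel$id)
is_balanced <- length(unique(as.vector(obs_per_id))) == 1
print(obs_per_id)
print(is_balanced)

# Recode arbitrary ids (555, 556, ...) to consecutive integers 1..n
panel$id_seq <- as.integer(factor(panel$id))

# Keep only individuals observed in every period
n_periods    <- length(unique(panel$year))
complete_ids <- names(obs_per_id)[obs_per_id == n_periods]
balanced     <- panel[as.character(panel$id) %in% complete_ids, ]
print(balanced)
```

Dropping incomplete individuals changes the sample, so this is a way to test the guess rather than a recommended fix.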

Correctly setting up Shannon's Entropy Calculation in R

I was trying to run some entropy() calculations on force platform data and I get a warning message:
> library(entropy)
> d2 <- read.csv("c:/users/SLA9DI/Documents/data2.csv")
> entropy(d2$CoPy, method="MM")
[1] 10.98084
> entropy(d2$CoPx, method="MM")
[1] 391.2395
Warning message:
In log(freqs) : NaNs produced
I am sure it is because entropy() is trying to take the log of a negative number. I also know R can handle complex numbers using complex(), but I have not been able to get it to work with my data. I did not get this warning on my CoPy data, only on the CoPx data (a force platform records Center of Pressure data in 2 dimensions). Does anyone have suggestions for getting complex() to work on my data set, or is there another function that would be better suited to a proper entropy calculation? Entropy shouldn't be that much greater in CoPx than in CoPy.
I also tried it with data sets from other subjects and the same thing happened: the CoPx entropy calculations gave warning messages and the CoPy calculations did not. I am attaching a link to a data set so anyone can try it for themselves, as the data is a little long to post here.
Data
Edit: Correct Answer
As suggested, I tried the table(...) function and received no warning or error message, and the entropy output was in the expected range. However, I had apparently overlooked discretize(), a function in the package, which is what you are supposed to use to set up the data correctly for the entropy calculation.
I think there's no point in applying the entropy function on your data. According to ?entropy, it
estimates the Shannon entropy H of the random variable Y from the corresponding observed counts y
(emphasis mine). This means that you need to convert your data (which seems to be continuous) to count data first, for instance by binning it.
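As a rough illustration of the binning step, the sketch below turns a continuous signal into counts with base R's cut() and table() and then applies the plug-in Shannon formula H = -sum(p * log(p)). It does not use the entropy package; binned_entropy, the bin count and the simulated signal are all hypothetical stand-ins for the real CoPx column.

```r
# Plug-in Shannon entropy (in nats) of a continuous vector after binning
binned_entropy <- function(x, n_bins = 10) {
  x <- x[!is.na(x)]
  counts <- table(cut(x, breaks = n_bins))   # continuous values -> bin counts
  p <- counts[counts > 0] / sum(counts)      # empty bins contribute nothing
  -sum(p * log(p))
}

# Simulated stand-in for a CoP signal that includes negative values
set.seed(1)
cop_x <- rnorm(1000, mean = -2, sd = 1)

h <- binned_entropy(cop_x, n_bins = 20)
print(h)
```

Because the counts are never negative, log() is only ever applied to positive proportions, so the NaN warning does not arise. The result depends on the number of bins, so the same binning should be used when comparing CoPx with CoPy.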

R Error in `row.names<-.data.frame`(`*tmp*`, value = value) while using tell of the sensitivity package

I am conducting a sensitivity study using the sensitivity package. When trying to calculate the sensitivity indices from the output data of the external model, I get the error given in the title.
The output is a three-column table stored in a csv file, which I read in as follows:
day1 <- read.csv("day_1_outputs.csv",header=FALSE)
Now when I try to calculate sensitivity indices with the output of the first column:
tell(sob.pars,day1[,1])
I get:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
invalid 'row.names' length
At first I thought I should use a matrix-like object, because in another study I generated the output from a raster image read in as a matrix, which worked fine, but that didn't help.
The help page for tell says to use a vector for the model results, but even if I store the data frame column in a separate vector before calling tell, the problem persists.
I guess my main problem is that I don't understand the error message in connection with the tell function. sob.pars is a list returned by one of the sensitivity analysis object constructors from the same package, so I don't know which row names of that object the message is referring to.
Any hint is appreciated.
Finally found out what the problem was. The error is rather misleading.
The problem was not the row names, since these were identical; that is what confused me in the first place. There was obviously nothing wrong with them.
The actual problem was the column names in sob.pars. These were missing. Once I added them, everything worked fine. Thanks rawr anyway (I only just noticed that someone had commented on the question; I thought I would be notified when that happens, but I guess not).
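A small base-R sketch of the check described above. How sob.pars was built is not shown in the question, so this only assumes that the design samples were passed in as matrices or data frames without column names; ensure_colnames and the prefix "X" are hypothetical.

```r
# Give a matrix or data frame default column names if it has none
ensure_colnames <- function(x, prefix = "X") {
  if (is.null(colnames(x))) {
    colnames(x) <- paste0(prefix, seq_len(ncol(x)))
  }
  x
}

# Hypothetical design sample without column names
set.seed(42)
X1 <- matrix(runif(12), ncol = 3)
print(is.null(colnames(X1)))   # no names yet

X1 <- ensure_colnames(X1)
print(colnames(X1))
```

Running such a check on the inputs before constructing the sensitivity object would surface the missing names earlier than the error from tell.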

R ffdf Error: is.null(names(x)) is not TRUE

I'm trying to understand what a particular error message means in R.
I'm using an ff data frame (ffdf) to build an rpart model. That works fine.
However, when I try to apply the predict function (or any function) using ffdfdply, I get a cryptic error message that I can't seem to crack. I'm hoping someone here can shed light on its meaning.
PredictedData<-ffdfdply(x=TrainingData,split=TrainingData$id,
FUN=function(x) {x$Predicted<-predict(Model1,newdata=x)
x})
If I've understood this correctly, ffdfdply will take the TrainingData table, split it into chunks based on TrainingData$id, then apply the predict function to each chunk using the model Model1. Each call returns the chunk (it's labelled x in the FUN argument), and the chunks are combined back together into the table PredictedData. PredictedData should be the same as TrainingData, except with an additional column called "Predicted".
However, when I run this, I get a rather unhelpful error message.
2014-07-16 21:16:17, calculating split sizes
2014-07-16 21:16:36, building up split locations
2014-07-16 21:17:02, working on split 1/30, extracting data in RAM of 32 split elements, totalling, 0.07934 GB, while max specified data specified using BATCHBYTES is 0.07999 GB
Error: is.null(names(x)) is not TRUE
In addition: Warning message:
In ffdfdply(x = TrainingData, split = TrainingData$id, FUN = function(x) { :
split needs to be an ff factor, converting using as.character.ff to an ff factor
Yes, every column has a name. Those names contain just alphanumeric characters plus periods. But the error message makes me think that the columns should not have names? I guess I'm confused about what this means exactly.
I appreciate any hints anyone can provide and I'll be happy to provide more detail.
I think I found the solution to this.
Turns out I had periods in my column names. When I removed those periods, this worked perfectly.
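A minimal sketch of stripping periods from column names, shown on an ordinary data frame rather than an ffdf. The helper clean_names and the example columns are hypothetical; fixed = TRUE matters because "." is a regular-expression wildcard otherwise.

```r
# Remove literal periods from names and keep the results unique
clean_names <- function(nms, replacement = "") {
  cleaned <- gsub(".", replacement, nms, fixed = TRUE)
  make.unique(cleaned, sep = "_")
}

# Hypothetical data frame with periods in its column names
df <- data.frame(cust.id = 1:3, total.spend = c(10, 20, 30))
names(df) <- clean_names(names(df))
print(names(df))
```

make.unique() guards against two names collapsing to the same string once the periods are gone, for example "a.b" and "ab".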

R error when calculating log rates of returns on portfolios (ETFs)

I have tried to calculate the log rate of return for a world index portfolio (labelled acw) and a country index portfolio (labelled chi). The world index works, but the country index gives me the error "only 0's may be mixed with negative subscripts". I have combed the data and there are no zeros or negative numbers in it. I'm going insane trying to work it out. I am learning R for the first time, so it is probably a very basic problem, but I can't find an answer anywhere online. Here is the code:
> data <- read.table("C:/Documents and Settings/Emma/My Documents/data.csv",header=T,sep=",")
> data <- data.frame(data)
> td<-length(data$date)
> t<-td-1
> acwr<-250*log(data$acw[2:td]/data$acw[1:(td-1)])
> chir<-250*log(data$chi[2:td]/data$chi[1:(td-1)])
Error in data$chi[1:(td - 1)] :
only 0's may be mixed with negative subscripts
> traceback()
No traceback available
Any advice or help would be much appreciated!
This is all based on guesswork of course...
But I think....
Your data has no date column. Your fundamental failure is not checking this.
Hence td is zero. Your second failure is not checking this.
You then try and subscript data$acw[1:(td-1)]. This works, you say, in that it doesn't return an error. But you haven't checked this. I guess it returns NULL because your data HAS NO acw column. So data$acw is NULL, and R doesn't care how you try and subset NULL. You get back NULL. Your third failure is not checking this.
You then try and subscript data$chi[1:(td-1)]. This fails, because data$chi exists, and:
> 1:(td-1)
[1] 1 0 -1
so your subscript has +1 and -1 in it.
And in R subscripts, this is saying to get the element 1 and not element 1 (the zero is irrelevant). So it fails.
All this would have been obvious if you'd done summary(data) or shown us the data file. Currently it is just speculation until you show us these things, and it would have saved me twenty minutes.
Try and break down your R into elements and check they are all what you expect them to be. As an interpreted language, R makes this easy for you.
What you should really do of course, is not check the length of an element to get the number of rows of a dataframe as you try with data$date. There's an nrow function for that.
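The failure mode described in this answer, and the nrow-based alternative, can be sketched as follows. The price values and the helper log_returns are made up for illustration; the scale factor of 250 is taken from the question.

```r
# Hypothetical prices; note there is deliberately no 'date' column
prices <- data.frame(acw = c(100, 101, 99.5, 102),
                     chi = c(50, 50.5, 51, 49.8))

# The pitfall: a missing column is NULL, and length(NULL) is 0
td <- length(prices$date)
bad_index <- 1:(td - 1)      # counts down: 1, 0, -1
print(bad_index)

# Safer: take the row count from the data frame and let diff() do the lagging
log_returns <- function(x, scale = 250) {
  stopifnot(is.numeric(x), length(x) >= 2)
  scale * diff(log(x))       # same as scale * log(x[2:n] / x[1:(n - 1)])
}

n    <- nrow(prices)
acwr <- log_returns(prices$acw)
chir <- log_returns(prices$chi)
print(acwr)
print(chir)
```

The stopifnot() call makes a missing or mistyped column fail immediately with a clear message instead of producing a confusing subscript error later.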

Resources