Missing values and NaN's not allowed if 'na.rm' is FALSE - r

I am just starting out using R, and I am trying to create a False Discovery Rate plot. However, there's an error message that I keep running into that seems very basic. I've search everywhere for a solution but haven't been able to find one.
Here are the inputs and outputs:
> melanoma.data <- pamr.from.excel("MelanomaData.txt", 10, sample.labels=FALSE)
pamr.menu(melanoma.data)
1: pamr.train
2: pamr.cv
3: pamr.plotcv
...(other selections)
Selection: 1
Warning: a class contains only 1 sampleError in quantile.default(sd, offset.percent/100) :
missing values and NaN's not allowed if 'na.rm' is FALSE
Error during wrapup: cannot open the connection
I converted "MelanomaData" from an excel file into a text file. I've searched through the data to make sure that there are no NA's or blank values.
I'm at a lost as to what the problem could be, and any help would be greatly appreciated!

I am seeing two helpful messages here.
The first is Warning: a class contains only 1 sample.
The second says Error in quantile.default(sd, offset.percent/100).
The first argument of quantile.default sounds like 'standard deviation'.
What about this possibility: Your code computes the sd of a one-element vector, which is NA, which quantile.default cannot use because nobody told it to ignore it?
I suggest you go and find out what produces the warning.

Related

Undefined columns selected error in R

I apologize in advance because I'm extremely new to coding and was thrust into it just a few days ago by my boss for a project.
My data set is called s1. S1 has 123 variables and 4 of them have some form of "QISSUE" in their name. I want to take these four variables and duplicate them all, adding "Rec" to the end of each one (That way I can freely play with the new variables, while still maintaining the actual ones).
Running this line of code keeps giving me an error:
b<- llply(s1[,
str_c(names(s1)
[str_detect(names(s1), fixed("QISSUE"))],
"Rec")],table)
The error is as such:
Error in `[.data.frame`(s1, , str_c(names(s1)[str_detect(names(s1), fixed("QISSUE")) & :
undefined columns selected
Thank you!
Use this to get the subset. Of course there is other ways to do that with simpler code
b<- llply(s1[,
names(s1)[str_detect(names(s1), fixed("QISSUE"))]
],c)
nwnam=str_c(names(s1)[str_detect(names(s1), fixed("QISSUE"))],"Rec")
ndf=data.frame(do.call(cbind,b));colnames(ndf)=nwnam
ndf
# of course you can do
cbind(s1,ndf)

window function R code

fine people of stackoverflow. I have become trapped on a rather simple part of my program and was wondering if you guys could help me.
library(nonlinearTseries)
tt<-c(0,500,1000)
mm<-rep(0,2)
for (j in 1:2){mm[j]=estimateEmbeddingDim(window(rnorm(1000), start=tt[j],end=tt[j+1]), number.points=(tt[j+1]-tt[j]),do.plot=FALSE)}
Warning message:
In window.default(rnorm(1000), start = tt[j], end = tt[j + 1]) :
'start' value not changed
If I plug in the values directly (tt[1], tt[2], tt[3]), it works but I also get a warning
estimateEmbeddingDim(window(rnorm(1000), start=tt[1],end=tt[2]), number.points=(tt[2]-tt[1]),do.plot=FALSE)
[1] 9
Warning message:
In window.default(rnorm(1000), start = tt[1], end = tt[2]) :
'start' value not changed
Thanks, Matt.
The problem seems to be with the
window(rnorm(1000), start=tt[j],end=tt[j+1])
lines. First of all, window is only meant to be used with a time series object (class=="ts"). In this case, rnorm(1000) simply returns a numeric vector, there are no dates associated with this object. So i'm not sure what you think this function does. Did you only want to extract the values that were between 0-500 and 500-1000? If so that seems a bit because with a standard normal variable, the max of 1000 samples isn't likely to be much over 4 let alone 500.
So be sure to use a proper "ts" object with dates and everything to get this to work.

Scilab Error 10000

Hi I am new to scilab and don't have much mathematical background.
I have been following code for another example and am being shown error 10000 for the following code:
function [z]=f(x,y)
z=0.026*(1.0-(y/ym))*y;
endfunction;
ym=12000;
x0=1950;y0=2555;xn=5;h=10;
x=[x0:h:xn];
y=ode("rk",y0,x0,x,f);
disp("x y")
disp("--------")
disp([x'y']);
function z=fe(x)
z=ym/(1-(1-ym/y0)*e^(-k*(t-t0)));
endfunction;
xe=(x0:h/10:xn);
n=length(xe)
for i=1:n
ye(i)=fe(xe(i));
end;
plot (x,y,'ro',xe, ye,'-b');legend ('rk4','Exact',3);
xtitle('solving dy/dx=k(1-y/ym)y','x','y');
I have worked through several other error messages. I am lost and don't know if the problem is in the code or the way I set up the problem. The following is the current error message:
!--error 10000
plot: Wrong size for input argument #2: A non empty matrix expected.
at line 57 of function checkXYPair called by :
at line 235 of function plot called by :
plot (x,y,'ro',xe, ye,'-b');legend ('rk4','Exact',3);
at line 25 of exec file called by :
I would appreciate any help.
Thanks
Start by adding clear as first statement, this will erase all variables before running your function. In the above script you don't declare ye.
Also the statement x=[x0:h:xn]; is strange with those values for x0,h and xn. You are now trying to get a list of x-values starting at 1950 and with positive steps of 10 up until 5 is reached.
I would recommend to try each line and see if the outcome is as expected. You do not need to know everything about the code, but probaly x and y should contain at least some values. The error is telling you that it expected a non-empty matrix for argument 2. This is y, so essentially it is telling you y is empty.

glm probit "cannot find valid starting values" error message

I am trying to use R for the first time to do some probit analaysis.
I get the following error:
Error in if (!(validmu(mu) && valideta(eta))) stop("cannot find valid starting values: please specify some", :
missing value where TRUE/FALSE needed
This is the command I am using:
m1=glm(Good~Stg.Days+Dev.Deployments+Check.Ins+NoOfDevelopers,family=poisson(link = "probit"),data=deploy[1:4,])
My data deploy[1:4,] are as loaded in from a CSV file follows:
Good,Application Type,Project,Start Date,End Date,Stg Days,Dev Deployments,Check Ins,NoOfDevelopers
1,DocumentPlatform,ZCP,11/08/2010,11/11/2010,0.6,0,12,4
1,DocumentPlatform,ZCP,11/11/2010,09/12/2010,0.4,0,4,1
0,DocumentPlatform,ZCP,09/12/2010,07/03/2011,10,0,7,3
1,FactsheetPlatform,Process.ARCH,28/06/2010,09/03/2011,7.1,0,18,2
deploy is in reality a much bigger vector than 1:4 I am just using a subset of the data to help determine the problem.
Any ideas what is wrong?
As i commented: Using ?glm I found tha the poisoon family supports the following links: log, identity, and sqrt.
Testing on another link:
test <- data.frame('Good'=c(1,1,0,1),'Stg Days'=c(0.6,0.4,10,7.1),'Dev Deployments'=c(0,0,0,0),'Check Ins'=c(12,4,7,18),'NoOfDevelopers'=c(4,1,3,2))
m1=glm(Good~ . ,family=poisson(link = "log"),data=test)
Gives no errors. So I think your link = "probit" is the problem.

`data.table` error: "reorder received irregular lengthed list" in setkey

I have a fairly basic data.table in R, with 250k rows and 90 columns. I am trying to key the data.table on one of the columns which is of class character. When I call:
setkey(my.dt,my.column)
I receive the following cryptic error message:
"Error in setkeyv(x, cols, verbose=verbose) :
reorder received irregular lengthed list"
I have found a source-code commit with this message, but can't quite decipher what it means. My key column contains no NA or blank values, seems perfectly reasonable to look at (it contains stock tickers), and behaves well with the default order() command.
Even more frustrating, the following code completes correctly:
first.dt <- my.dt[1:100000]
setkey(first.dt,my.column)
second.dt <- my.dt[100001:nrow(my.dt]
setkey(second.dt,my.column)
I have no idea what could be going on here. Any tips?
Edit 1: I have confirmed every value in the key fits a fairly standard format:
> length(grep("[A-Z]{3,4}\\.[A-Z]{2}",my.dt$my.column)) == nrow(my.dt)
[1] TRUE
Edit 2: My system info is below (note that I'm actually using Windows 7). I am using data.table version 1.8.
> Sys.info()
sysname release version nodename machine login
"Windows" "Server 2008 x64" "build 7600" "WIN-9RH28AH0CKG" "x86-64" "Administrator"
user effective_user
"Administrator" "Administrator"
Please run :
sapply(my.dt, length)
I suspect that one or more columns have a different length to the first column, and that's an invalid data.table. It won't be one of the first 5 because your .Internal(inspect(my.dt)) (thanks) shows those and they're ok.
If so, there is this bug fix in v1.8.1 :
o rbind() of DT with an irregular list() now recycles the list items
correctly, #2003. Test added.
Any chance there's an rbind() at an earlier point to create my.dt together with an irregular lengthed list? If not, please step through your code running the sapply(my.dt,length) to see where the invalidly lengthed column is being created. Armed with that we can make a work around and also fix the potential bug. Thanks.
EDIT :
The original cryptic error message is now improved in v1.8.1, as follows :
DT = list(a=6:1,b=4:1)
setattr(DT,"class",c("data.table","data.frame"))
setkey(DT,a)
Error in setkeyv(x, cols, verbose = verbose) :
Column 2 is length 4 which differs from length of column 1 (6). Invalid
data.table. Check NEWS link at top of ?data.table for latest bug fixes. If
not already reported and fixed, please report to datatable-help.
NB: This method to create a data.table is not recommended because it lets you create an invalid data.table. Unless, you are really sure the list is regular and you really do need speed (i.e. for speed you want to avoid the checks that as.data.table() and data.table() do), or you need to demonstrate an invalid data.table, as I'm doing here.

Resources