Error in if statement with NA in data in R [duplicate] - r

This question already has answers here:
Error in if/while (condition) {: missing Value where TRUE/FALSE needed
(4 answers)
Closed 1 year ago.
I have a problem with an if statement. I get the error message "missing value where TRUE/FALSE needed". I am trying to calculate a new variable using an if statement inside a for loop, but the data has NA values and the loop stops as soon as it reaches one.
These are the variables I am using to create the new variable:
x=c(3,3,3,2,NA,2,3,NA,3,NA)
y=c(3,6,5,4,NA,3,2,NA,3,NA)
h=c(1,2,1.6666667,2,NA,1.5,0.6666667,NA,1,NA)
This is the code that fails on the NA values:
z=rep(NA,length(y))
for(i in 1:length(x)){
if((x[i]==0 & y[i]>=3) | h[i]>=3){
z[i]=1
} else if((x[i]==0 & y[i]<3) | h[i]<3){
z[i]=0
}
}
Can you tell me how I could handle the NA values in the if statement, or what I should do instead?
Thanks for your reply.

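We can handle the NA values by adding !is.na() checks to each condition. Using x[i] %in% 0 instead of x[i] == 0 also helps, because %in% returns FALSE rather than NA when x[i] is missing: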
for(i in 1:length(x)){
if((x[i] %in% 0 & y[i]>=3 & !is.na(y[i])) | h[i]>=3 & !is.na(h[i])){
z[i]=1
} else if((x[i] %in% 0 & y[i]<3 & !is.na(y[i])) | h[i]<3 & !is.na(h[i])){
z[i]=0
}
}

You can check with !is.na(). This operation is also vectorized, so you don't need a for loop.
inds <- x == 0 & y >= 3 | h >= 3
as.integer(inds & !is.na(inds))
#[1] 0 0 0 0 0 0 0 0 0 0
None of the values match the condition here.
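If the goal is to keep NA in z wherever the inputs are missing, as the original loop seems to intend, a vectorized sketch along these lines may help. It assumes the x, y and h vectors from the question and relies on which() dropping positions where the condition is NA; the names one and zero are only illustrative.

```r
x <- c(3, 3, 3, 2, NA, 2, 3, NA, 3, NA)
y <- c(3, 6, 5, 4, NA, 3, 2, NA, 3, NA)
h <- c(1, 2, 1.6666667, 2, NA, 1.5, 0.6666667, NA, 1, NA)

# Evaluate both branch conditions for every element at once (may contain NA)
one  <- (x == 0 & y >= 3) | h >= 3
zero <- (x == 0 & y < 3) | h < 3

# Start from NA; which() keeps only the positions where the condition is TRUE
z <- rep(NA_real_, length(x))
z[which(zero)] <- 0
z[which(one)]  <- 1   # assigned last so the first branch takes precedence
z
```

With this data every non-missing row falls into the second branch, so z is 0 there and NA elsewhere.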

Related

if else multiple conditions comparing rows

I am struggling with this loop. I want to get "6" in the second row of the column "newcolumn", but I get the following error:
Error in if (mydata$type_name[i] == "a" && mydata$type_name[i - :
missing value where TRUE/FALSE needed.
My data (with the newcolumn values I want) and the code that I created:
id  type_name  name  score  newcolumn
1   a          Car   2      2
1   a          van   2      6
1   b          Car   2      2
1   b          Car   2      2
mydata$newcolumn <-c(0)
for (i in 1:length(mydata$id)){
if ((mydata$type_name [i] == "a") && (mydata$type_name[i-1] == "a") && ((mydata$name[i]) != (mydata$name[i-1]))){
mydata$newcolumn[i]=mydata$score[i]*3 }
else {
mydata$newcolumn[i]=mydata$score[i]*1
}
}
Thank you very much in advance
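Indexing starts at 1 in R, but your loop starts at i = 1 and uses i-1, so the first iteration reads mydata$type_name[0]. That is a zero-length vector, so the comparison cannot produce TRUE or FALSE and the if condition ends up as a missing value. Start the loop at the second row and handle the first row separately.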
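A minimal sketch of that fix, assuming the data shown in the question is rebuilt as a data frame with character columns. The first row has no predecessor, so it simply keeps its score.

```r
# Data reconstructed from the question (stringsAsFactors = FALSE keeps plain strings)
mydata <- data.frame(
  id = c(1, 1, 1, 1),
  type_name = c("a", "a", "b", "b"),
  name = c("Car", "van", "Car", "Car"),
  score = c(2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Default for every row: score * 1
mydata$newcolumn <- mydata$score

# Start at row 2 so that i - 1 is always a valid index
for (i in seq_len(nrow(mydata))[-1]) {
  if (mydata$type_name[i] == "a" &&
      mydata$type_name[i - 1] == "a" &&
      mydata$name[i] != mydata$name[i - 1]) {
    mydata$newcolumn[i] <- mydata$score[i] * 3
  }
}
mydata$newcolumn
```

Only the second row meets all three conditions, so it gets 6 and the others stay at 2.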

Making new variable through mutate

I want to make a new variable "churned" by taking into account five variables :
Include in churn
A-Churn
B-Churn
C-Churn
D-Churn
My condition is: if the variable "Include in churn" is 1 and any one of the other variables is 1, then my new variable "Churned" should be 1, otherwise 0. I am new to the mutate function.
Please help me create this new variable with 'mutate'.
If I understand your condition correctly, you want
mutate(data, Churned = Include.in.Churn == 1 & (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1))
This will make Churned a logical. If you really need an integer, as.integer will produce 1 for TRUE and 0 for FALSE.
If all the variables mentioned are either 1 or 0, you can also use the possibly faster
mutate(data, Churned = Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1)
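To check that the two formulations agree, here is a small sketch in base R (so it runs without dplyr) on hypothetical toy data. The column names follow the answer above, and the same expressions can be placed inside mutate().

```r
# Hypothetical toy data; every column is 0 or 1
data <- data.frame(
  Include.in.Churn = c(1, 1, 0, 0),
  A.Churn = c(0, 1, 1, 0),
  B.Churn = c(0, 0, 1, 0),
  C.Churn = c(0, 0, 0, 0),
  D.Churn = c(0, 1, 0, 0)
)

# Logical formulation: included AND at least one churn flag set
churned_lgl <- with(data, Include.in.Churn == 1 &
  (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1))

# Arithmetic formulation, valid only when every column is 0 or 1
churned_num <- with(data, Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1)

# Store the result as 1/0 instead of TRUE/FALSE
data$Churned <- as.integer(churned_lgl)
data$Churned
```

Only the second row is both included and flagged, so it is the only one marked as churned.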

How to use AND in R to modify dataframe

I have a data matrix of 1200 rows (sample names) by 20000 columns (gene names). I want to delete the rows in which all 5 of my genes of interest have zero values.
The command I used for a single gene:
allexp <-preallexp[preallexp$GZMB > 0, ]
But I want to use AND in the command above, like this:
allexp <-preallexp[preallexp$GZMB && preallexp$TP53 && preallexp$EGFR && preallexp$BRAF && preallexp$VGEF > 0, ]
But this command doesn't work. How can I use AND in the command above?
EDIT: in response to OP.
I'm sure there's a much more efficient way to code this, but this is what you're after:
allexp <-preallexp[preallexp$GZMB + preallexp$TP53 + preallexp$EGFR +
preallexp$BRAF + preallexp$VGEF > 0, ]
Unless you have negative expression values, I would have thought mkt's answer should work. But here is mine. It removes the rows where each of the 5 genes has a value of 0. Note that the element-wise & is needed here; && only compares single values.
which(preallexp$GZMB == 0 & preallexp$TP53 == 0 &
preallexp$EGFR == 0 & preallexp$BRAF == 0 & preallexp$VGEF == 0)
This gives the rows where all 5 genes have a value of zero.
We can then remove these rows from the data frame as follows (negating the logical condition rather than using -which(...), which would drop every row if nothing matched):
allexp <- preallexp[
!(preallexp$GZMB == 0 & preallexp$TP53 == 0 &
preallexp$EGFR == 0 & preallexp$BRAF == 0 & preallexp$VGEF == 0), ]
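The same filter can be written more compactly with rowSums(). The sketch below uses a hypothetical three-sample data frame in place of the real 1200 x 20000 matrix; the gene names are taken from the question and the helper names genes and all_zero are illustrative.

```r
# Hypothetical toy data: rows are samples, columns are genes
preallexp <- data.frame(
  GZMB = c(0, 2, 0),
  TP53 = c(0, 0, 0),
  EGFR = c(0, 1, 0),
  BRAF = c(0, 0, 3),
  VGEF = c(0, 0, 0)
)
genes <- c("GZMB", "TP53", "EGFR", "BRAF", "VGEF")

# TRUE for samples in which every gene of interest is exactly zero
all_zero <- rowSums(preallexp[, genes] != 0) == 0

# Keep only the samples with at least one non-zero gene
allexp <- preallexp[!all_zero, ]
nrow(allexp)
```

Unlike summing the expression values, counting non-zero entries also behaves correctly if some values are negative.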

Error in lis[[i]] : attempt to select less than one element

This code is meant to compute the total distance of some given coordinates, but I don't know why it's not working.
The error is: Error in lis[[i]] : attempt to select less than one element.
Here is the code:
distant<-function(a,b)
{
return(sqrt((a[1]-b[1])^2+(a[2]-b[2])^2))
}
totdistance<-function(lis)
{
totdis=0
for(i in 1:length(lis)-1)
{
totdis=totdis+distant(lis[[i]],lis[[i+1]])
}
totdis=totdis+distant(lis[[1]],lis[[length(lis)]])
return(totdis)
}
liss1<-list()
liss1[[1]]<-c(12,12)
liss1[[2]]<-c(18,23)
liss1[[4]]<-c(29,25)
liss1[[5]]<-c(31,52)
liss1[[3]]<-c(24,21)
liss1[[6]]<-c(36,43)
liss1[[7]]<-c(37,14)
liss1[[8]]<-c(42,8)
liss1[[9]]<-c(51,47)
liss1[[10]]<-c(62,53)
liss1[[11]]<-c(63,19)
liss1[[12]]<-c(69,39)
liss1[[13]]<-c(81,7)
liss1[[14]]<-c(82,18)
liss1[[15]]<-c(83,40)
liss1[[16]]<-c(88,30)
Output:
> totdistance(liss1)
Error in lis[[i]] : attempt to select less than one element
> distant(liss1[[2]],liss1[[3]])
[1] 6.324555
Let me reproduce your error in a simple way
> list1 = list()
> list1[[0]] = list(a = c("a"))
Error in list1[[0]] = list(a = c("a")) :
attempt to select less than one element
So the next question is: where are you accessing index 0 of the list?
(Indexing of lists starts at 1 in R.)
As Molx indicated in a previous post, "the : operator is evaluated before the subtraction", so 1:length(lis)-1 starts at 0 and the loop accesses lis[[0]].
For example:
> 1:10-1
[1] 0 1 2 3 4 5 6 7 8 9
> 1:(10-1)
[1] 1 2 3 4 5 6 7 8 9
So replace the loop in your code with the following:
for(i in 1:(length(lis)-1))
{
totdis=totdis+distant(lis[[i]],lis[[i+1]])
}
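Put together, the corrected function could look like the sketch below. It uses seq_len() instead of 1:(n-1), a substitution not in the original answer, so that a list with a single point produces an empty loop rather than counting down from 1 to 0.

```r
# Euclidean distance between two 2-D points
distant <- function(a, b) sqrt((a[1] - b[1])^2 + (a[2] - b[2])^2)

totdistance <- function(lis) {
  totdis <- 0
  # seq_len() yields 1..n-1, or an empty sequence when the list has one point
  for (i in seq_len(length(lis) - 1)) {
    totdis <- totdis + distant(lis[[i]], lis[[i + 1]])
  }
  # Close the path by returning from the last point to the first
  totdis + distant(lis[[1]], lis[[length(lis)]])
}

# A 3-4-5 right triangle: 3 + 4 + 5
totdistance(list(c(0, 0), c(3, 0), c(3, 4)))  # 12
```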

Check if a column has a value; if so, write true or false to the column next to it

I was wondering how to check whether the column Lair in the data
is below or above a certain threshold: say, below 0.5 is called LOH and
above is called imbalance. The calls LOH and INBALANCE should be written in a new column. I tried something like the code below.
detection<-function(assay,method,thres){
if(method=="threshold"){
idx<-ifelse(segmenten["intensity"]<1.1000000 & segmenten["intensity"]>0.900000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
if(method=="cnloh"){
idx<-ifelse(segmenten["intensity"]<1.1000000 & segmenten["intensity"]>0.900000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="gain"){
idx<-ifelse(segmenten["intensity"]>1.1000000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="loss"){
idx<-ifelse(segmenten["intensity"]<0.900000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="bloss"){
idx<-ifelse(segmenten["intensity"]<0.900000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
if(method=="bgain"){
idx<-ifelse(segmenten["intensity"]>1.100000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
return(idx)
}
After this part, the next step is to write the data from the function to the existing table.
Does anyone have an idea?
Since your desired result is not entirely clear, I made some assumptions and wrote something that may or may not be useful.
First of all, your function uses an object segmenten that is never defined; I suppose this is the data set supplied as input. You also used ifelse so that it returns TRUE or FALSE, whereas you want either LOH or INBALANCE when certain conditions are met.
You want INBALANCE when ... & segmenten["Lair"]>thres and LOH otherwise (here ... stands for the other part of the condition). This gives a vector, but you want it in the main data set as an additional column, don't you? So maybe this could be a new starting point for improving your code.
detection <- function(assay, method=c('threshold', 'cnloh', 'gain', 'loss', 'bloss', 'bgain'),
thres=0.5){
x <- assay
idx <- switch(match.arg(method),
threshold = ifelse(x["intensity"]<1.1 & x["intensity"]>0.9 & x["Lair"]>thres, 'INBALANCE', 'LOH'),
cnloh = ifelse(x["intensity"]<1.1 & x["intensity"]>0.9 & x["Lair"]<thres, 'LOH', 'INBALANCE'),
gain = ifelse(x["intensity"]>1.1 & x["Lair"]<thres, 'LOH', 'INBALANCE'),
loss = ifelse(x["intensity"]<0.9 & x["Lair"]<thres,'LOH', 'INBALANCE'),
bloss = ifelse(x["intensity"]<0.9 & x["Lair"]>thres, 'INBALANCE', 'LOH'),
bgain = ifelse(x["intensity"]>1.1 & x["Lair"]>thres, 'INBALANCE', 'LOH'))
colnames(idx) <- 'Checking'
return(cbind(x, as.data.frame(idx)))
}
Example:
Data <- read.csv("japansegment data.csv", header=T)
result <- detection(Data, method='threshold', thres=0.5) # 'threshold' is the default value for method
head(result)
SNP_NAME x0 x1 y pos.start pos.end chrom count copynumber intensity allele.B Lair uncertain sample_id
1 SNP_A-1656705 0 0 0 836727 27933161 1 230 2 1.0783 1 0.9218 FALSE GSM288035
2 SNP_A-1677548 0 0 0 28244579 246860994 1 4408 2 0.9827 1 0.9236 FALSE GSM288035
3 SNP_A-1669537 0 0 0 100819 159783145 2 3480 2 0.9806 1 0.9193 FALSE GSM288035
4 SNP_A-1758569 0 0 0 159783255 159791136 2 5 2 1.7244 1 0.9665 FALSE GSM288035
5 SNP_A-1662168 0 0 0 159817465 168664268 2 250 2 0.9786 1 0.9197 FALSE GSM288035
6 SNP_A-1723506 0 0 0 168721411 168721920 2 2 2 1.8027 -4 NA FALSE GSM288035
Checking
1 INBALANCE
2 INBALANCE
3 INBALANCE
4 LOH
5 INBALANCE
6 LOH
Using match.arg and switch functions will help you to avoid a lot of if statements.
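As a smaller illustration of the match.arg and switch pattern, here is a sketch with only two of the methods and a hypothetical three-row data frame. The function name label_rows and the toy values are assumptions made for this example, not part of the answer above.

```r
# Hypothetical helper: label each row according to the chosen method
label_rows <- function(df, method = c("threshold", "cnloh"), thres = 0.5) {
  # match.arg() rejects unknown methods and picks the first choice by default
  method <- match.arg(method)
  normal <- df$intensity < 1.1 & df$intensity > 0.9
  # switch() evaluates only the branch that matches the method name
  df$Checking <- switch(method,
    threshold = ifelse(normal & df$Lair > thres, "INBALANCE", "LOH"),
    cnloh     = ifelse(normal & df$Lair < thres, "LOH", "INBALANCE")
  )
  df
}

toy <- data.frame(intensity = c(1.0, 1.0, 1.8), Lair = c(0.9, 0.2, 0.9))
label_rows(toy)$Checking
```

Calling label_rows(toy, "foo") stops with an error from match.arg, which is usually preferable to silently returning nothing.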
