if else multiple conditions comparing rows - r

I am struggling with this loop. I want to get "6" in the second row of the column "newcolumn". I get the following error:
Error in if (mydata$type_name[i] == "a" && mydata$type_name[i - :
missing value where TRUE/FALSE needed.
My data (showing the desired newcolumn) and the code that I wrote:
id type_name name score newcolumn
1 a Car 2 2
1 a van 2 6
1 b Car 2 2
1 b Car 2 2
mydata$newcolumn <- c(0)
for (i in 1:length(mydata$id)) {
  if ((mydata$type_name[i] == "a") && (mydata$type_name[i-1] == "a") && ((mydata$name[i]) != (mydata$name[i-1]))) {
    mydata$newcolumn[i] = mydata$score[i]*3
  } else {
    mydata$newcolumn[i] = mydata$score[i]*1
  }
}
Thank you very much in advance

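Indexing in R starts at 1, but your loop starts at i = 1 and looks at row i-1, so on the first iteration you index position 0. mydata$type_name[0] is a zero-length vector, so the comparison cannot produce a TRUE or FALSE, and if() fails with "missing value where TRUE/FALSE needed".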
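A minimal sketch of one way to avoid the out-of-range index, assuming the data frame from the question (rebuilt by hand here): fill newcolumn with score first, then loop from the second row so that i - 1 is always a valid index.

```r
# Data from the question, rebuilt by hand (without the desired newcolumn)
mydata <- data.frame(
  id = c(1, 1, 1, 1),
  type_name = c("a", "a", "b", "b"),
  name = c("Car", "van", "Car", "Car"),
  score = c(2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Default case: newcolumn equals score (score * 1)
mydata$newcolumn <- mydata$score

# Start at row 2 so that row i - 1 always exists;
# seq_len(n)[-1] is empty when there is only one row
for (i in seq_len(nrow(mydata))[-1]) {
  if (mydata$type_name[i] == "a" &&
      mydata$type_name[i - 1] == "a" &&
      mydata$name[i] != mydata$name[i - 1]) {
    mydata$newcolumn[i] <- mydata$score[i] * 3
  }
}

mydata$newcolumn
```

The first row has no previous row to compare with, so it simply keeps its score.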

Related

How can I solve this error when using case_when?

I'm using this code:
ovabonnement <- ovabonnement %>%
mutate(c12_ovabonnement_type_con_voor = case_when(s2_ovabonnement_type_voor_anders == 1 ~ NA,
s2_ovabonnement_type_voor_1 == 1 |
s2_ovabonnement_type_voor_13 == 1 ~ "Basis",
s2_ovabonnement_type_voor_2 == 1 |
s2_ovabonnement_type_voor_3 == 1 |
s2_ovabonnement_type_voor_4 == 1 |
s2_ovabonnement_type_voor_9 == 1 |
s2_ovabonnement_type_voor_11 == 1 ~ "Voordeel",
s2_ovabonnement_type_voor_5 == 1 |
s2_ovabonnement_type_voor_6 == 1 |
s2_ovabonnement_type_voor_7 == 1 |
s2_ovabonnement_type_voor_8 == 1 |
s2_ovabonnement_type_voor_10 == 1 |
s2_ovabonnement_type_voor_12 == 1 |
s2_ovabonnement_type_voor_14 == 1 ~ "Vrij"))
So I have these 15 variables that represent whether a person has that subscription added onto their public transport membership. Because it was a multiple choice questionnaire people could select multiple choices, which is why they are different variables.
I want to make these into one variable that takes NA if people answered "other", "Basis" if people answered 1 or 13, "Voordeel" if people answered 2,3,4,9 or 11 and "Vrij" if people answered 5,6,7,8,10,12 or 14.
If people answered 2, there will be a 1 in s2_ovabonnement_type_voor_2. People can have answered multiple of these, which makes it a bit tricky. I want later codes to take precedence: for example, if a person answered 2 AND 10, it should choose the 10, because that code comes later, but I'm not sure if that is how case_when works.
I get this error:
Error in `mutate()`:
! Problem while computing `c12_ovabonnement_type_con_voor = case_when(...)`.
Caused by error in `names(message) <- `*vtmp*``:
! 'names' attribute [1] must be the same length as the vector [0]
Run `rlang::last_error()` to see where the error occurred.
case_when/if_else are type sensitive, i.e. all the expressions should return the same type. In the OP's code, the first expression returns NA, which is logical by default, while all the others return character. We need NA_character_ to match the type of the others:
ovabonnement <- ovabonnement %>%
mutate(c12_ovabonnement_type_con_voor = case_when(s2_ovabonnement_type_voor_anders == 1 ~ NA_character_,
s2_ovabonnement_type_voor_1 == 1 |
s2_ovabonnement_type_voor_13 == 1 ~ "Basis",
s2_ovabonnement_type_voor_2 == 1 |
s2_ovabonnement_type_voor_3 == 1 |
s2_ovabonnement_type_voor_4 == 1 |
s2_ovabonnement_type_voor_9 == 1 |
s2_ovabonnement_type_voor_11 == 1 ~ "Voordeel",
s2_ovabonnement_type_voor_5 == 1 |
s2_ovabonnement_type_voor_6 == 1 |
s2_ovabonnement_type_voor_7 == 1 |
s2_ovabonnement_type_voor_8 == 1 |
s2_ovabonnement_type_voor_10 == 1 |
s2_ovabonnement_type_voor_12 == 1 |
s2_ovabonnement_type_voor_14 == 1 ~ "Vrij"))
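On the ordering part of the question: case_when() uses the first condition that is TRUE for each row, so a person who ticked both 2 and 10 ends up in the earlier branch. A small sketch on made-up data, where the short column names other, v1, v2 and v10 are hypothetical stand-ins for the s2_ovabonnement_type_voor_* variables:

```r
library(dplyr)

# Hypothetical toy data; row 4 ticked both option 2 and option 10
toy <- data.frame(
  other = c(1, 0, 0, 0),
  v1    = c(0, 1, 0, 0),
  v2    = c(0, 0, 1, 1),
  v10   = c(0, 0, 0, 1)
)

toy <- toy %>%
  mutate(type = case_when(
    other == 1 ~ NA_character_,  # typed NA so every branch is character
    v1 == 1    ~ "Basis",
    v2 == 1    ~ "Voordeel",     # the first TRUE condition wins for row 4
    v10 == 1   ~ "Vrij"
  ))

toy$type
```

If the later codes should take precedence instead, the conditions would need to be listed in reverse order, with the "Vrij" group before "Voordeel" and "Basis".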

Error in if statement with NA in data in R [duplicate]

I have a problem with an if statement. I get an error message saying "missing value where TRUE/FALSE needed". I am trying to calculate a new variable using an if statement and a for loop, but the data has NA values and the loop stops as soon as it reaches an NA value.
These are the variables I am using to create the new variable:
x=c(3,3,3,2,NA,2,3,NA,3,NA)
y=c(3,6,5,4,NA,3,2,NA,3,NA)
h=c(1,2,1.6666667,2,NA,1.5,0.6666667,NA,1,NA)
This is the code I am using that has the problem with NA values:
z=rep(NA,length(y))
for(i in 1:length(x)){
if((x[i]==0 & y[i]>=3) | h[i]>=3){
z[i]=1
} else if((x==0 & y[i]<3) | h[i]<3){
z[i]=0
}
}
Can you tell me how I could include the NA values in the if statement, or what I should do instead?
Thanks for your reply.
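We can handle the NA values by adding !is.na() checks and by using %in% for x, which returns FALSE instead of NA when x[i] is NA:
for (i in 1:length(x)) {
  # every comparison is guarded, so the condition is never NA
  if ((x[i] %in% 0 & y[i] >= 3 & !is.na(y[i])) | h[i] >= 3 & !is.na(h[i])) {
    z[i] = 1
  } else if ((x[i] %in% 0 & y[i] < 3 & !is.na(y[i])) | h[i] < 3 & !is.na(h[i])) {
    z[i] = 0
  }
}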
You can check with !is.na(). Also, this operation is vectorized, so you don't need a for loop.
inds <- x == 0 & y >= 3 | h >= 3
as.integer(inds & !is.na(inds))
#[1] 0 0 0 0 0 0 0 0 0 0
None of the values match the condition here.
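If the NA rows should stay NA in the result, as they do in the loop, the vectorized idea can be extended. This is only a sketch using the vectors from the question; cond1 and cond0 are names introduced here for the two conditions.

```r
x <- c(3, 3, 3, 2, NA, 2, 3, NA, 3, NA)
y <- c(3, 6, 5, 4, NA, 3, 2, NA, 3, NA)
h <- c(1, 2, 1.6666667, 2, NA, 1.5, 0.6666667, NA, 1, NA)

# The two conditions from the loop, evaluated for all rows at once
cond1 <- (x == 0 & y >= 3) | h >= 3
cond0 <- (x == 0 & y < 3) | h < 3

# `%in% TRUE` turns NA into FALSE, so rows with missing data
# fall through both tests and end up as NA
z <- ifelse(cond1 %in% TRUE, 1,
            ifelse(cond0 %in% TRUE, 0, NA))
z
```

With this data every complete row has h < 3, so those rows get 0 and the three rows with missing values stay NA.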

Making new variable through mutate

I want to make a new variable "churned" by taking into account five variables :
Include in churn
A-Churn
B-Churn
C-Churn
D-Churn
My condition is: if the variable "Include in churn" is 1 and at least one of the other four variables is 1, then my new variable "Churned" should be 1, else 0. I am a newbie at using the mutate function.
Please help me create this new variable with the mutate function.
If I understand your formulation logically, you want
mutate(data, Churned = Include.in.Churn == 1 & (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1))
This will make Churned a logical. If you really need an integer, as.integer will produce 1 for TRUE and 0 for FALSE.
If all the variables mentioned are either 1 or 0, you can also use the possibly faster
mutate(data, Churned = Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1)
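A quick check of both variants on made-up data. The values below are invented for illustration, and the column names assume the dotted names used in the answer.

```r
library(dplyr)

# Hypothetical 0/1 data using the dotted column names from the answer
data <- data.frame(
  Include.in.Churn = c(1, 1, 0, 1),
  A.Churn = c(1, 0, 1, 0),
  B.Churn = c(0, 0, 1, 0),
  C.Churn = c(0, 0, 0, 1),
  D.Churn = c(0, 0, 0, 1)
)

result <- data %>%
  mutate(
    # logical version, converted to 1/0
    Churned = as.integer(
      Include.in.Churn == 1 &
        (A.Churn == 1 | B.Churn == 1 | C.Churn == 1 | D.Churn == 1)
    ),
    # arithmetic version, only valid when every column is 0 or 1
    Churned2 = as.integer(
      Include.in.Churn * (A.Churn + B.Churn + C.Churn + D.Churn) >= 1
    )
  )

result$Churned
```

Both columns agree on this data: rows 1 and 4 are churned, row 2 has no churn flag set, and row 3 is excluded because "Include in churn" is 0.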

How to use AND in R to modify dataframe

I have a data matrix of 1200 rows (sample names) by 20000 columns (gene names). I want to delete a row when all 5 of my genes of interest have zero values in that sample.
The command I used for a single gene:
allexp <-preallexp[preallexp$GZMB > 0, ]
but I want to use AND in the above command, like this:
allexp <-preallexp[preallexp$GZMB && preallexp$TP53 && preallexp$EGFR && preallexp$BRAF && preallexp$VGEF > 0, ]
but this command doesn't work. Please, I need your help: how do I use AND in the above command?
EDIT: in response to OP.
I'm sure there's a much more efficient way to code this, but this is what you're after:
allexp <-preallexp[preallexp$GZMB + preallexp$TP53 + preallexp$EGFR +
preallexp$BRAF + preallexp$VGEF > 0, ]
Unless you have negative expression values, I would have thought mkt's answer should work. But here is mine. It removes the rows where each of the 5 genes has a value of 0. Note the vectorized & rather than &&, which does not compare element by element:
all_zero <- preallexp$GZMB == 0 & preallexp$TP53 == 0 &
  preallexp$EGFR == 0 & preallexp$BRAF == 0 & preallexp$VGEF == 0
which(all_zero)
This gives the rows where all 5 genes have a value of zero.
So we can remove these rows from the data frame as follows (negating the logical vector, unlike -which(), also works when no row matches):
allexp <- preallexp[!all_zero, ]
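The same filter can be written without spelling out every gene. A small sketch on invented data, assuming no missing values; the three samples s1 to s3 and their values are made up, and only the gene names come from the question.

```r
# Hypothetical expression data: 3 samples (rows) by 5 genes (columns)
preallexp <- data.frame(
  GZMB = c(0, 2, 0),
  TP53 = c(0, 0, 0),
  EGFR = c(0, 1, 0),
  BRAF = c(0, 0, 3),
  VGEF = c(0, 0, 0),
  row.names = c("s1", "s2", "s3")
)

genes <- c("GZMB", "TP53", "EGFR", "BRAF", "VGEF")

# Count the non-zero genes per sample and keep samples with at least one
keep <- rowSums(preallexp[, genes] != 0) > 0
allexp <- preallexp[keep, ]

rownames(allexp)
```

Counting non-zero entries rather than summing the values also behaves correctly if negative values are present, since they cannot cancel out positive ones.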

Check if a column has a value and, if so, write true or false to the column next to it

I was wondering how to check whether the column Lair in the data is below or above a certain threshold. Let's say below 0.5 is called LOH and above is called imbalance. The calls LOH and INBALANCE should be written to a new column. I tried something like the code below.
detection<-function(assay,method,thres){
if(method=="threshold"){
idx<-ifelse(segmenten["intensity"]<1.1000000 & segmenten["intensity"]>0.900000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
if(method=="cnloh"){
idx<-ifelse(segmenten["intensity"]<1.1000000 & segmenten["intensity"]>0.900000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="gain"){
idx<-ifelse(segmenten["intensity"]>1.1000000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="loss"){
idx<-ifelse(segmenten["intensity"]<0.900000 & segmenten["Lair"]<thres,TRUE,FALSE)
}
if(method=="bloss"){
idx<-ifelse(segmenten["intensity"]<0.900000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
if(method=="bgain"){
idx<-ifelse(segmenten["intensity"]>1.100000 & segmenten["Lair"]>thres,TRUE,FALSE)
}
return(idx)
}
After this part, the next step is to write the result of the function to the existing table.
Does anyone have an idea?
Since your desired result is not entirely clear, I made some assumptions and wrote something that may or may not be useful.
First of all, inside your function there is an object segmenten which is not defined; I suppose this is the data set supplied as input. You also used ifelse so that the results are TRUE or FALSE, but you want either LOH or INBALANCE when some conditions are met.
You want INBALANCE when ... & segmenten["Lair"]>thres and LOH otherwise (here ... stands for the other part of the condition). This gives a vector, but you want it in the main data set as an additional column, don't you? So maybe this could be a new starting point for you to improve your code.
detection <- function(assay, method=c('threshold', 'cnloh', 'gain', 'loss', 'bloss', 'bgain'),
thres=0.5){
x <- assay
idx <- switch(match.arg(method),
threshold = ifelse(x["intensity"]<1.1 & x["intensity"]>0.9 & x["Lair"]>thres, 'INBALANCE', 'LOH'),
cnloh = ifelse(x["intensity"]<1.1 & x["intensity"]>0.9 & x["Lair"]<thres, 'LOH', 'INBALANCE'),
gain = ifelse(x["intensity"]>1.1 & x["Lair"]<thres, 'LOH', 'INBALANCE'),
loss = ifelse(x["intensity"]<0.9 & x["Lair"]<thres,'LOH', 'INBALANCE'),
bloss = ifelse(x["intensity"]<0.9 & x["Lair"]>thres, 'INBALANCE', 'LOH'),
bgain = ifelse(x["intensity"]>1.1 & x["Lair"]>thres, 'INBALANCE', 'LOH'))
colnames(idx) <- 'Checking'
return(cbind(x, as.data.frame(idx)))
}
Example:
Data <- read.csv("japansegment data.csv", header=T)
result <- detection(Data, method='threshold', thres=0.5) # 'threshold' is the default value for method
head(result)
SNP_NAME x0 x1 y pos.start pos.end chrom count copynumber intensity allele.B Lair uncertain sample_id
1 SNP_A-1656705 0 0 0 836727 27933161 1 230 2 1.0783 1 0.9218 FALSE GSM288035
2 SNP_A-1677548 0 0 0 28244579 246860994 1 4408 2 0.9827 1 0.9236 FALSE GSM288035
3 SNP_A-1669537 0 0 0 100819 159783145 2 3480 2 0.9806 1 0.9193 FALSE GSM288035
4 SNP_A-1758569 0 0 0 159783255 159791136 2 5 2 1.7244 1 0.9665 FALSE GSM288035
5 SNP_A-1662168 0 0 0 159817465 168664268 2 250 2 0.9786 1 0.9197 FALSE GSM288035
6 SNP_A-1723506 0 0 0 168721411 168721920 2 2 2 1.8027 -4 NA FALSE GSM288035
Checking
1 INBALANCE
2 INBALANCE
3 INBALANCE
4 LOH
5 INBALANCE
6 LOH
Using match.arg and switch functions will help you to avoid a lot of if statements.
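For the last step in the question, writing the call into the existing table, assigning to a new column is enough. A minimal sketch for the 'threshold' rule only, on a small invented data frame; the three rows are made up, and balanced is a helper name introduced here.

```r
# Hypothetical segments table with just the two columns the rule needs
segmenten <- data.frame(
  intensity = c(1.0783, 1.7244, 0.9786),
  Lair      = c(0.9218, 0.9665, 0.3)
)
thres <- 0.5

# 'threshold' rule: intensity within (0.9, 1.1) and Lair above the threshold
balanced <- segmenten$intensity > 0.9 & segmenten$intensity < 1.1

# Assigning to a new column name adds the call to the existing table
segmenten$Checking <- ifelse(balanced & segmenten$Lair > thres,
                             "INBALANCE", "LOH")
segmenten
```

Using $ on the columns gives plain vectors, so the result of ifelse is a character vector that can be stored directly as a column.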

Resources