Adding two variables that contain NA values in R

Let's say the data is 'ab':
a <- c(1,2,3,NA,5,NA)
b <- c(5,NA,4,NA,NA,6)
ab <-c(a,b)
I would like a new variable that is the sum of the two, staying NA only where both values are NA:
Desired output:
ab$c <- c(6,2,7,NA,5,6)
So a number plus NA should equal the number.
I tried the following, but it does not work as desired:
ab$c <- a+b
gives me: 6 NA 7 NA NA NA
I also don't know how to include "na.rm=TRUE", which I was trying to use.
I would also like to create a third, categorical variable based on a cutoff: if the value is <= 4 then event = 1, otherwise 0:
Desired output:
ab$d <- c(1,1,1,NA,0,0)
I tried:
ab$d =ifelse(ab$a<=4|ab$b<=4,1,0)
print(ab$d)
gives me logical(0)
Thanks!

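a <- c(1,2,3,NA,5,NA)
b <- c(5,NA,4,NA,NA,6)
dfd <- data.frame(a,b)
# row-wise sum, treating NA as 0
dfd$c <- rowSums(dfd, na.rm = TRUE)
# restore NA where both inputs are missing
dfd$c <- ifelse(is.na(dfd$a) & is.na(dfd$b), NA_real_, dfd$c)
# flag rows whose sum is at least 4
dfd$d <- ifelse(dfd$c >= 4, 1, 0)
dfd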
   a  b  c  d
1  1  5  6  1
2  2 NA  2  0
3  3  4  7  1
4 NA NA NA NA
5  5 NA  5  1
6 NA  6  6  1
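The d column above flags sums of at least 4, which differs from the desired output in the question (1,1,1,NA,0,0). A minimal sketch that reproduces the desired vectors, assuming the intended rule is "1 if any available value of a or b is <= 4, NA if both are missing" (this reading is an assumption inferred from the desired output):

```r
a <- c(1, 2, 3, NA, 5, NA)
b <- c(5, NA, 4, NA, NA, 6)
ab <- data.frame(a, b)

# rows where both inputs are missing should stay NA
both_na <- is.na(ab$a) & is.na(ab$b)

# sum with NA treated as 0, then restore NA for the all-missing rows
ab$c <- ifelse(both_na, NA, rowSums(ab[c("a", "b")], na.rm = TRUE))

# pmin(..., na.rm = TRUE) returns the smaller available value per row,
# and NA only when both are missing (assumed reading of the cutoff rule)
ab$d <- as.integer(pmin(ab$a, ab$b, na.rm = TRUE) <= 4)

ab
```

Comparing the row minimum against the cutoff avoids the NA that `ab$a <= 4 | ab$b <= 4` would produce whenever one side is missing and the other is above 4.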

Related

How to find whether at least one column satisfies a certain condition, with NAs

I have a dataframe with multiple columns: I need to identify those rows in which there is at least one outlier among some of the columns, but I do not know how to deal with NAs.
An example of dataframe (different from mine):
#  X atq ME.BE.crsp X2
#  1  10        0.5  4
# NA   2        1.3  5
#  3  NA          5  2
# NA  NA         NA NA
#  2   4         NA  3
I'm doing the following:
data = data %>%
  mutate(outlier = as.numeric(
    atq > quantile(atq, 0.99, na.rm = T) |
    atq < quantile(atq, 0.01, na.rm = T) |
    ME.BE.crsp > quantile(ME.BE.crsp, 0.99, na.rm = T) |
    ME.BE.crsp < quantile(ME.BE.crsp, 0.01, na.rm = T)
  ))
My expected result is (I'm making up the outliers, the point is about NAs):
#  X atq ME.BE.crsp X2 outlier
#  1  10        0.5  4       1
# NA   2        1.3  5       0
#  3  NA          5  2       0
# NA  NA         NA NA      NA
#  2   4         NA  3       1
What I get instead is:
#  X atq ME.BE.crsp X2 outlier
#  1  10        0.5  4       1
# NA   2        1.3  5       0
#  3  NA          5  2      NA
# NA  NA         NA NA      NA
#  2   4         NA  3      NA
So it seems that as soon as as.numeric finds an NA in either data$atq or data$ME.BE.crsp, it assigns NA to data$outlier, while I would like it to consider the non-NA value and assign 0 or 1 based on that one.
Any suggestions? Thanks!
If the result should be NA only when both 'atq' and 'ME.BE.crsp' are NA, use a condition with case_when:
library(dplyr)
data %>%
  mutate(outlier = case_when(
    # both columns missing: nothing to judge, so return NA
    is.na(atq) & is.na(ME.BE.crsp) ~ NA_real_,
    # otherwise a missing value counts as "not an outlier"
    TRUE ~ as.numeric(
      (atq > quantile(atq, 0.99, na.rm = TRUE)) & !is.na(atq) |
      (atq < quantile(atq, 0.01, na.rm = TRUE)) & !is.na(atq) |
      (ME.BE.crsp > quantile(ME.BE.crsp, 0.99, na.rm = TRUE)) & !is.na(ME.BE.crsp) |
      (ME.BE.crsp < quantile(ME.BE.crsp, 0.01, na.rm = TRUE)) & !is.na(ME.BE.crsp)
    )
  ))
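The same rule can also be sketched in base R with a small helper. `is_outlier` is a hypothetical name introduced here for illustration, and the sample data frame is made up to mirror the NA pattern in the question:

```r
# hypothetical helper: TRUE where x lies outside the [lo, hi] quantile band,
# FALSE where x is inside the band or missing
is_outlier <- function(x, lo = 0.01, hi = 0.99) {
  q <- quantile(x, c(lo, hi), na.rm = TRUE)
  out <- x < q[1] | x > q[2]
  out[is.na(out)] <- FALSE   # a missing value is not counted as an outlier
  out
}

# made-up data with the same NA layout as the example in the question
data <- data.frame(atq        = c(10, 2, NA, NA, 4),
                   ME.BE.crsp = c(0.5, 1.3, 5, NA, NA))

flag <- is_outlier(data$atq) | is_outlier(data$ME.BE.crsp)

# only rows where every checked column is missing become NA
flag[is.na(data$atq) & is.na(data$ME.BE.crsp)] <- NA

data$outlier <- as.numeric(flag)
data
```

Keeping the NA handling inside the helper means extra columns can be added to the check with one more `| is_outlier(...)` term.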

Using mapply to set values based on values in other columns

Based on my previous question, I need help with using the mapply function correctly.
x <- data.frame(a = seq(1,3), b = seq(2,4), c = seq(3,5), d = seq(4,6), b2 = seq(5,7), c2 = seq(6,8), d2 = seq(7,9))
# a b c d b2 c2 d2
# 1 2 3 4  5  6  7
# 2 3 4 5  6  7  8
# 3 4 5 6  7  8  9
My goal is to look at the columns b2 to d2 and, based on their values, change the values in columns b to d respectively. I can do this for a single column quite easily:
x[which(x$b2 == 7),]["b"] <- NA_real_
My problem is that I want this applied across all my columns but I don't know how to convert this single column formula to work on multiple columns. I tried:
onez <- c(2:4)
twoz <- c(5:7)
f <- function(df, ones, twos) {
  df[which(df[,twos] == 7),][ones] <- NA_real_
}
mapply(f, df = x, ones = onez, twos = twoz)
But I'm getting error messages (incorrect dimensions, etc.). I can see that my function is messy, but I don't know how to fix it.
One way to do it is to tell it to:
1) Get the subset of the data frame with columns 5, 6, 7: x[5:7]
2) Check from that subset which values satisfy your condition: x[5:7] == 7
3) Replace those values with NA: ... <- NA
This gives the following:
x[5:7][x[5:7] == 7] <- NA
x
#   a b c d b2 c2 d2
# 1 1 2 3 4  5  6 NA
# 2 2 3 4 5  6 NA  8
# 3 3 4 5 6 NA  8  9
If you want the NAs to be replaced at x[2:4], then you can do,
x[2:4][x[5:7] == 7] <- NA
x
#   a  b  c  d b2 c2 d2
# 1 1  2  3 NA  5  6  7
# 2 2  3 NA  5  6  7  8
# 3 3 NA  5  6  7  8  9
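If an explicit column-by-column version is preferred (closer to what the mapply attempt was aiming for), a plain loop over paired column names is one option. This is a sketch; `targets` and `sources` are names introduced here for illustration:

```r
x <- data.frame(a = 1:3, b = 2:4, c = 3:5, d = 4:6,
                b2 = 5:7, c2 = 6:8, d2 = 7:9)

targets <- c("b", "c", "d")      # columns to overwrite
sources <- c("b2", "c2", "d2")   # columns holding the condition

for (i in seq_along(targets)) {
  hit <- which(x[[sources[i]]] == 7)   # rows where the paired column equals 7
  x[[targets[i]]][hit] <- NA           # blank the matching target cells
}

x
```

The loop modifies `x` in place, which avoids the problem in the original function, where the change was made to a local copy of `df` and never returned.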

Replace values within a range in a data frame in R

I have ranked the rows of a data frame based on the values in each column, with ranks from 1 to 10 (not every column is shown).
I have code that replaces values to NA or 1. But I can't figure out how to replace range of numbers, e.g. 3-6 with 1 and then replace the rest (1-2 and 7-10) with NA.
lag.rank <- as.matrix(lag.rank)
lag.rank[lag.rank > n] <- NA
lag.rank[lag.rank <= n] <- 1
At the moment it only replaces numbers above or under n. Any suggestions? I figure it should be fairly simple?
Is this what you are trying to accomplish?
> x <- sample(1:10,20, TRUE)
> x
[1] 1 2 8 2 6 4 9 1 4 8 6 1 2 5 8 6 9 4 7 6
> x <- ifelse(x %in% c(3:6), 1, NA)
> x
[1] NA NA NA NA 1 1 NA NA 1 NA 1 NA NA 1 NA 1 NA 1 NA 1
If your data aren't integers but numeric, you can use between() from the dplyr package:
library(dplyr)
x <- ifelse(between(x,3,6), 1, NA)
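Applied to a matrix like lag.rank from the question, the same idea can be written in base R with a logical mask. The matrix values below are made up for illustration, and the bounds 3 and 6 are taken from the question's example:

```r
# made-up ranks standing in for the question's lag.rank matrix
lag.rank <- matrix(c(1, 4, 7, 10, 3, 6, 2, 5, 9), nrow = 3)

# compute the mask once, before any value is overwritten
in_range <- lag.rank >= 3 & lag.rank <= 6

lag.rank[in_range]  <- 1    # ranks 3 to 6 become 1
lag.rank[!in_range] <- NA   # everything else becomes NA

lag.rank
```

Building the mask first matters: replacing values in two steps without it would make the second comparison run against already-modified data.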

Conditionals calculations across rows R

First, I'm brand new to R and am making the switch from SAS. I have a dataset that is 1000 rows by 24 columns, where the columns are different treatments. For each row of the dataset shown below, I want to count the number of values that meet a criterion.
            Gene        A       B        C         D
1         AARS_3       NA      NA 4.168365        NA
2 AASDHPPT_21936       NA      NA       NA -3.221287
3     AATF_26432       NA      NA       NA        NA
4       ABCC2_22 4.501518 3.17992       NA        NA
5    ABCC2_26620       NA      NA       NA        NA
I was trying to create column vectors that counted
1) Number of NAs
2) Number of columns <0
3) Number of columns >0
I would then use cbind to add these to my large dataset
I solved the first one with :
NA.Count <- (apply(b01,MARGIN=1,FUN=function(x) length(x[is.na(x)])))
I tried to modify this to keep only the non-NA values and then count how many of them are less than zero:
lt0 <- (apply(b01,MARGIN=1,FUN=function(x) ifelse(x[!is.na(x)],count(x[x<0]))))
which didn't work at all.
I tried a dozen ways to get dplyr mutate to work with this and did not succeed.
What I want are the last two columns below; and if you had a cleaner version of the NA.Count I did, that would also be greatly appreciated.
            Gene        A       B        C         D NA.Count lt0 gt0
1         AARS_3       NA      NA 4.168365        NA        3   0   1
2 AASDHPPT_21936       NA      NA       NA -3.221287        3   1   0
3     AATF_26432       NA      NA       NA        NA        4   0   0
4       ABCC2_22 4.501518 3.17992       NA        NA        2   0   2
5    ABCC2_26620       NA      NA       NA        NA        4   0   0
Here is one way to do it, taking advantage of the fact that TRUE counts as 1 when summed in R.
# test data frame
lil_df <- data.frame(Gene = c("AAR3", "ABCDE"),
                     A = c(NA, 3),
                     B = c(2, NA),
                     C = c(-1, -2),
                     D = c(NA, NA))
# number of NAs per row
NA.count <- rowSums(is.na(lil_df[,-1]))
# less than zero
lt0 <- rowSums(lil_df[,-1]<0, na.rm = TRUE)
# more than zero
mt0 <- rowSums(lil_df[,-1]>0, na.rm = TRUE)
# cbind to data frame
larger_df <- cbind(lil_df, NA.count, lt0, mt0 )
larger_df
   Gene  A  B  C  D NA.count lt0 mt0
1  AAR3 NA  2 -1 NA        2   1   1
2 ABCDE  3 NA -2 NA        2   1   1
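As a check, the same rowSums() approach can be run on the five example rows from the question. The data frame name b01 and the column names NA.Count, lt0 and gt0 follow the question; the values are retyped from its table:

```r
b01 <- data.frame(
  Gene = c("AARS_3", "AASDHPPT_21936", "AATF_26432", "ABCC2_22", "ABCC2_26620"),
  A = c(NA, NA, NA, 4.501518, NA),
  B = c(NA, NA, NA, 3.17992, NA),
  C = c(4.168365, NA, NA, NA, NA),
  D = c(NA, -3.221287, NA, NA, NA)
)

vals <- b01[, -1]   # treatment columns only, without Gene

b01$NA.Count <- rowSums(is.na(vals))             # missing values per row
b01$lt0      <- rowSums(vals < 0, na.rm = TRUE)  # values below zero per row
b01$gt0      <- rowSums(vals > 0, na.rm = TRUE)  # values above zero per row

b01
```

Taking `vals` before adding the count columns keeps the new columns out of their own calculations.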

ifelse rows the same in R

I have a dataframe that looks like this:
> df<-data.frame(A=c(NA,1,2,3,4),B=c(NA,5,NA,3,4),C=c(NA,NA,NA,NA,4))
> df
   A  B  C
1 NA NA NA
2  1  5 NA
3  2 NA NA
4  3  3 NA
5  4  4  4
I am trying to create a "D" column based on the row values in df, where D gets an NA if the values in the row are different (i.e. row 2) or all NAs (i.e. row 1), and the value in the row if the values in that row are the same, excluding NAs (i.e. rows 3, 4, 5). This would produce a vector and dataframe that looks like this:
> df$D<-c(NA,NA,2,3,4)
> df
   A  B  C  D
1 NA NA NA NA
2  1  5 NA NA
3  2 NA NA  2
4  3  3 NA  3
5  4  4  4  4
Thank you in advance for your suggestions.
You can use apply() to process each row, combined with unique() and !is.na(). !is.na() selects the values that are not NA, unique() reduces them to the distinct values, and length() counts those. If the count is 1, take the first non-NA value; otherwise return NA.
df$D <- apply(df, 1, function(x)
  ifelse(length(unique(x[!is.na(x)])) == 1, x[!is.na(x)][1], NA))
Here is one possible approach:
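FUN <- function(x) {
  no.na <- x[!is.na(x)]             # drop missing values
  len <- length(no.na)
  if (len == 0) return(NA)          # row is all NA
  if (len == 1) return(no.na)       # a single value is trivially "the same"
  runs <- rle(no.na)[[2]]           # values of the consecutive runs
  if (length(runs) > 1) return(NA)  # more than one run means differing values
  runs
}
df$D <- apply(df, 1, FUN)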
## > df
##    A  B  C  D
## 1 NA NA NA NA
## 2  1  5 NA NA
## 3  2 NA NA  2
## 4  3  3 NA  3
## 5  4  4  4  4
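For numeric columns, a vectorised alternative that avoids apply() is to compare the row-wise minimum and maximum: if they are equal, all non-NA values in the row are the same. This is a sketch of a different technique from the two answers above, not a restatement of them:

```r
df <- data.frame(A = c(NA, 1, 2, 3, 4),
                 B = c(NA, 5, NA, 3, 4),
                 C = c(NA, NA, NA, NA, 4))

# row-wise min and max, ignoring NA; both are NA when the whole row is NA
lo <- do.call(pmin, c(df, na.rm = TRUE))
hi <- do.call(pmax, c(df, na.rm = TRUE))

# equal min and max means a single distinct value in the row;
# an all-NA row gives an NA test, which ifelse() passes through as NA
df$D <- ifelse(lo == hi, lo, NA)

df
```

This only applies to columns that can be ordered (numeric here), since it relies on pmin() and pmax().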
