In R, I would like to sum across rows but keep NA's as NA if the whole row is NA. My data contains 0's and I want to count them as such. E.g.:
colA colB colC Total
1 NA 2 3
NA NA NA NA
0 NA NA 0
3 0 NA 3
I used the code below and got 0s for the all-NA rows. If I change na.rm to F, I get NA all the way down. I would like to get NA only in the all-NA rows.
Total <- as.data.frame(rowSums(df[,1:3], na.rm = T))
Thanks!
You could simply change the results in a second pass:
dat <- data.frame(colA=c(1,NA,0,3), colB=c(NA,NA,NA,0), colC=c(2,NA,NA,NA))
dat
colA colB colC
1 1 NA 2
2 NA NA NA
3 0 NA NA
4 3 0 NA
res <- rowSums(dat,na.rm=T)
res
[1] 3 0 0 3
res[rowSums(is.na(dat))==3] <- NA
res
[1] 3 NA 0 3
And if you want to save it back in your data:
dat$total <- res
You can do this in one line using a trick with NA.
rowSums(df, na.rm=TRUE) * NA^(rowSums(is.na(df)) == length(df))
[1] 3 NA 0 3
Here, the first rowSums computes the sums while removing NAs. This is then multiplied by NA^(rowSums(is.na(df)) == length(df)). The exponent is TRUE only when every element of the row is NA. In R, NA^0 (and NA^FALSE) is 1, while NA^1 (and NA^TRUE) is NA, so the product is NA for all-NA rows and the ordinary sum whenever at least one element of the row is non-NA.
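The exponent trick can be checked directly in a console. This sketch rebuilds the example data from the question as `df` and shows the intermediate "all NA" mask; the name `all_na` is just an illustrative variable.

```r
# Anything raised to the power 0 is 1 in R, even NA; NA^1 stays NA
NA^0      # 1
NA^1      # NA
NA^FALSE  # 1  (FALSE is treated as 0)
NA^TRUE   # NA (TRUE is treated as 1)

df <- data.frame(colA = c(1, NA, 0, 3), colB = c(NA, NA, NA, 0), colC = c(2, NA, NA, NA))

# TRUE only for rows where every column is NA
all_na <- rowSums(is.na(df)) == length(df)

# Multiply the NA-removed sums by 1 (keep) or NA (blank out)
res <- rowSums(df, na.rm = TRUE) * NA^all_na
res
```

Because the multiplier is 1 for every row that has at least one observed value, a genuine 0 (row 3) survives unchanged.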
Use this to get the total and then cbind it with your data frame:
apply(df, 1, function(x) {
  if (sum(is.na(x)) == length(x)) {
    return(NA)
  } else {
    sum(x, na.rm = TRUE)
  }
})
In two steps, like the answer above (but shorter):
sums <- rowSums(df, na.rm=TRUE)
allna <- apply(df,1, function(x)all(is.na(x)))
sums[allna] <- NA
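The all-NA test does not need apply(): counting the non-NA cells per row with rowSums(!is.na(...)) is vectorised and, unlike a hard-coded `== 3`, works for any number of columns. A minimal sketch, wrapped in a helper whose name `row_sums_keep_na` is hypothetical:

```r
dat <- data.frame(colA = c(1, NA, 0, 3), colB = c(NA, NA, NA, 0), colC = c(2, NA, NA, NA))

# Hypothetical helper: row sums that stay NA when a row has no observed value
row_sums_keep_na <- function(d) {
  sums <- rowSums(d, na.rm = TRUE)
  # rowSums(!is.na(d)) counts the observed cells in each row
  sums[rowSums(!is.na(d)) == 0] <- NA
  sums
}

row_sums_keep_na(dat)
```

The result is named by row (as rowSums output is); wrap it in unname() if you want a bare vector.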
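Using dplyr (in one step):
library(dplyr)
t1 <- data.frame(A = c(1, NA, 0, 3),
                 B = c(NA, 5, NA, 0),
                 C = c(2, NA, NA, NA))
# sum(..., na.rm = TRUE) alone returns 0 for an all-NA row, so test for that first
t1 <- t1 %>%
  rowwise() %>%
  mutate(Total = if (all(is.na(c(A, B, C)))) NA_real_ else sum(A, B, C, na.rm = TRUE))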
Related
Is there any way to avoid 0 as the result when all values in a row are NA while computing rowSums with na.rm = TRUE?
In the example below, the row sum is zero when all values in the row are NA, but I need NA in the result. I can't skip na.rm = TRUE because other rows may contain some NAs.
I am specifically looking for a data.table solution.
library(data.table)
df <- data.table::fread("X Y
2 26
3 NA
0 0
NA NA
4 5
", header = TRUE)
df[, Sum := rowSums(.SD, na.rm = TRUE), .SDcols = names(df)]
There are a lot of ways you can do this. Here is one:
df[, Sum:= rowSums(df, na.rm = TRUE) * NA ^ (rowSums(!is.na(df)) == 0)]
Will return:
X Y Sum
1: 2 26 28
2: 3 NA 3
3: 0 0 0
4: NA NA NA
5: 4 5 9
Another way is to call rowSums only on the rows that you don't want to be NA. The assignment by reference then fills the remaining rows with NA:
df[, .SD
][Reduce('*', lapply(df, is.na)) == 0, sum := rowSums(.SD, na.rm=T)
][]
X Y sum
1: 2 26 28
2: 3 NA 3
3: 0 0 0
4: NA NA NA
5: 4 5 9
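The row filter in the second data.table answer relies on Reduce('*', lapply(df, is.na)): multiplying the per-column is.na() vectors gives 1 only where every column is NA. The same logic can be sketched in base R on a plain data.frame (an assumption made here so the example runs without data.table), computing sums only for the rows that pass the filter:

```r
df <- data.frame(X = c(2, 3, 0, NA, 4), Y = c(26, NA, 0, NA, 5))

# Product of 0/1 indicators is 1 only if every column in the row is NA
all_na <- Reduce(`*`, lapply(df, is.na))   # 0 0 0 1 0

# Start with NA everywhere, then fill in the rows that have data
df$Sum <- NA_real_
keep <- all_na == 0
df$Sum[keep] <- rowSums(df[keep, c("X", "Y")], na.rm = TRUE)
df
```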
I have a dataframe with multiple columns and I want to replace NAs in one column if they are between two rows with an identical number. Here is my data:
v1 v2
1 2
NA 3
NA 2
1 1
NA 7
NA 2
3 1
Basically, I want to start from the beginning of the data frame and replace NAs in column v1 with the previous non-NA value if the next non-NA value matches it. In other words, I want the result to look like this:
v1 v2
1 2
1 3
1 2
1 1
NA 7
NA 2
3 1
As you can see, rows 2 and 3 are replaced with 1 because rows 1 and 4 have the same value, but rows 5 and 6 stay NA because the non-NA values in rows 4 and 7 are not identical. I have been tweaking this a lot but so far with no luck. Thanks
Here is an idea using the zoo package. We fill NAs in both directions and set to NA the values that differ between the two directions.
library(zoo)
ind1 <- na.locf(df$v1, fromLast = TRUE)
df$v1 <- na.locf(df$v1)
df$v1[df$v1 != ind1] <- NA
which gives,
v1 v2
1 1 2
2 1 3
3 1 2
4 1 1
5 NA 7
6 NA 2
7 3 1
Here is a similar approach in the tidyverse using fill:
library(tidyverse)
df1 %>%
mutate(vNew = v1) %>%
fill(vNew, .direction = 'up') %>%
fill(v1) %>%
mutate(v1 = replace(v1, v1 != vNew, NA)) %>%
select(-vNew)
# v1 v2
#1 1 2
#2 1 3
#3 1 2
#4 1 1
#5 NA 7
#6 NA 2
#7 3 1
Here is a base R solution; the logic is almost the same as Sotos's:
replace_na <- function(x){
f <- function(x) ave(x, cumsum(!is.na(x)), FUN = function(x) x[1])
y <- f(x)
yp <- rev(f(rev(x)))
ifelse(!is.na(y) & y == yp, y, x)
}
df$v1 <- replace_na(df$v1)
test:
> replace_na(c(1, NA, NA, 1, NA, NA, 3))
[1] 1 1 1 1 NA NA 3
I was able to do this with the na.locf function. Basically, I use na.locf from the zoo package to replace each NA with the latest previous non-NA value and store the result in one column. Using the same function with fromLast = TRUE, NAs are replaced with the next non-NA value, which I store in another column. I then compare the two columns, and in every row where they don't match I set the value to NA.
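The two-column idea described above can also be sketched without zoo. The helpers `fill_forward` and `fill_backward` below are hypothetical names, not zoo functions; fill_forward uses cummax() over the positions of non-NA values to find, for each element, the most recent observed value, and the backward fill is the same operation on the reversed vector.

```r
# Hypothetical helper: carry the last non-NA value forward (base R only)
fill_forward <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the latest non-NA so far
  idx[idx == 0] <- NA                       # leading NAs have nothing to carry
  x[idx]
}

# Backward fill is a forward fill of the reversed vector
fill_backward <- function(x) rev(fill_forward(rev(x)))

v1 <- c(1, NA, NA, 1, NA, NA, 3)
prev_val <- fill_forward(v1)   # 1 1 1 1 1 1 3
next_val <- fill_backward(v1)  # 1 1 1 1 3 3 3

# Keep the filled value only where both directions agree
v1_new <- ifelse(!is.na(prev_val) & prev_val == next_val, prev_val, v1)
v1_new
```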
First, I'm brand new to R and am making the switch from SAS. I have a dataset that is 1000 rows by 24 columns, where the columns are different treatments. I want to count, for each row of the dataset shown below, the number of columns that meet a criterion.
Gene A B C D
1 AARS_3 NA NA 4.168365 NA
2 AASDHPPT_21936 NA NA NA -3.221287
3 AATF_26432 NA NA NA NA
4 ABCC2_22 4.501518 3.17992 NA NA
5 ABCC2_26620 NA NA NA NA
I was trying to create column vectors that count:
1) Number of NAs
2) Number of columns <0
3) Number of columns >0
I would then use cbind to add these to my large dataset.
I solved the first one with:
NA.Count <- (apply(b01,MARGIN=1,FUN=function(x) length(x[is.na(x)])))
I tried to modify this to evaluate !is.na and then count the number of times the value was less than zero:
lt0 <- (apply(b01,MARGIN=1,FUN=function(x) ifelse(x[!is.na(x)],count(x[x<0]))))
which didn't work at all.
I tried a dozen ways to get dplyr mutate to work with this and did not succeed.
What I want are the last two columns below; and if you had a cleaner version of the NA.Count I did, that would also be greatly appreciated.
Gene A B C D NA.Count lt0 gt0
1 AARS_3 NA NA 4.168365 NA 3 0 1
2 AASDHPPT_21936 NA NA NA -3.221287 3 1 0
3 AATF_26432 NA NA NA NA 4 0 0
4 ABCC2_22 4.501518 3.17992 NA NA 2 0 2
5 ABCC2_26620 NA NA NA NA 4 0 0
Here is one way to do it, taking advantage of the fact that TRUE counts as 1 when summed in R.
# test data frame
lil_df <- data.frame(Gene = c("AAR3", "ABCDE"),
A = c(NA, 3),
B = c(2, NA),
C = c(-1, -2),
D = c(NA, NA))
# is.na
NA.count <- rowSums(is.na(lil_df[,-1]))
# less than zero
lt0 <- rowSums(lil_df[,-1]<0, na.rm = TRUE)
# more than zero
mt0 <- rowSums(lil_df[,-1]>0, na.rm = TRUE)
# cbind to data frame
larger_df <- cbind(lil_df, NA.count, lt0, mt0 )
larger_df
Gene A B C D NA.count lt0 mt0
1 AAR3 NA 2 -1 NA 2 1 1
2 ABCDE 3 NA -2 NA 2 1 1
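As a check, the same rowSums approach applied to the five rows from the question should reproduce the NA.Count, lt0 and gt0 columns the asker listed. This sketch re-types that data as `genes` (a name chosen here) and adds the columns directly instead of using cbind:

```r
genes <- data.frame(
  Gene = c("AARS_3", "AASDHPPT_21936", "AATF_26432", "ABCC2_22", "ABCC2_26620"),
  A = c(NA, NA, NA, 4.501518, NA),
  B = c(NA, NA, NA, 3.17992, NA),
  C = c(4.168365, NA, NA, NA, NA),
  D = c(NA, -3.221287, NA, NA, NA)
)

vals <- genes[, -1]                           # treatment columns only (drop Gene)
genes$NA.Count <- rowSums(is.na(vals))        # each TRUE counts as 1
genes$lt0 <- rowSums(vals < 0, na.rm = TRUE)  # NA comparisons are skipped
genes$gt0 <- rowSums(vals > 0, na.rm = TRUE)
genes[, c("Gene", "NA.Count", "lt0", "gt0")]
```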
I am trying to replace all the groups of elements in a vector that sum up to zero with NAs.
The size of each group is 3. For instance:
a = c(0,0,0,0,2,3,1,0,2,0,0,0,0,1,2,0,0,0)
should be finally:
c(NA,NA,NA,0,2,3,1,0,2,NA,NA,NA,0,1,2,NA,NA,NA)
Until now, I have managed to find the groups having the sum equal to zero via:
b = which(tapply(a,rep(1:(length(a)/3),each=3),sum) == 0)
which yields c(1,4,6)
I then calculate the starting indexes of the groups in the vector via: b <- b*3-2.
Probably there is a more elegant way, but this is what I've stitched together so far.
Now I am stuck at "expanding" the vector of start indexes, to generate a sequence of the elements to be replaced. For instance, if vector b now contains c(1,10,16), I will need a sequence c(1,2,3,10,11,12,16,17,18) which are the indexes of the elements to replace by NAs.
If you have any idea for a solution without a for loop, or an even simpler or more elegant solution for the whole problem, I would appreciate it. Thank you.
Marius
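For the specific "expanding" step asked about above, a minimal base R sketch: repeat each start index three times and add the offsets 0:2, which R recycles across the repeated vector. It reuses the question's own tapply() step; the names `idx` and the group size 3 are taken from or assumed for this example.

```r
a <- c(0,0,0,0,2,3,1,0,2,0,0,0,0,1,2,0,0,0)

# Start index of each zero-sum group, computed as in the question
b <- which(tapply(a, rep(1:(length(a)/3), each = 3), sum) == 0)
b <- unname(b * 3 - 2)          # 1 10 16

# Expand each start into start, start + 1, start + 2
idx <- rep(b, each = 3) + 0:2   # 1 2 3 10 11 12 16 17 18

a[idx] <- NA
a
```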
You can use something like this:
a[as.logical(ave(a, 0:(length(a)-1) %/% 3,
FUN = function(x) sum(x) == 0))] <- NA
a
# [1] NA NA NA 0 2 3 1 0 2 NA NA NA 0 1 2 NA NA NA
The 0:(length(a)-1) %/% 3 creates groups of your desired length (in this case, 3) and ave is used to check whether those groups add to 0 or not.
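To see the intermediate steps, print the grouping vector and the raw ave() output. Because ave() writes FUN's results back into a copy of the numeric vector `a`, the TRUE/FALSE answers come back as 1/0, which is why the answer wraps the call in as.logical():

```r
a <- c(0,0,0,0,2,3,1,0,2,0,0,0,0,1,2,0,0,0)

# Integer division maps positions 0,1,2 -> 0; 3,4,5 -> 1; and so on
grp <- 0:(length(a) - 1) %/% 3
grp

# One result per element: 1 where the element's group sums to 0, else 0
zero_grp <- ave(a, grp, FUN = function(x) sum(x) == 0)
zero_grp
```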
To assign the values to groups, turn your vector into a three-row matrix, so that each column holds one group. You can then calculate the column sums and compare them with 0. The rest is simple.
a <- c(0,0,0,0,2,3,1,0,2,0,0,0,0,1,2,0,0,0)
a <- as.integer(a)
is.na(a) <- rep(colSums(matrix(a, 3L)) == 0L, each = 3L)
a
#[1] NA NA NA 0 2 3 1 0 2 NA NA NA 0 1 2 NA NA NA
Note that I compare with integers on purpose: if your vector is not of integer type, you need to consider this FAQ.
Or using gl, ave and all:
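For a double vector, group sums that should cancel can differ from 0 by a tiny rounding error, so an exact == 0 test is fragile. A common workaround, sketched here with made-up data `x`, is to compare against a small tolerance; sqrt(.Machine$double.eps) is one conventional choice, not the only one.

```r
# Made-up non-integer data: the first group cancels only up to rounding
x <- c(0.1, 0.2, -0.3, 1.5, 0, 0)

grp_sums <- colSums(matrix(x, 3L))   # one column per group of 3

# Tolerance-based test instead of == 0
is_zero <- abs(grp_sums) < sqrt(.Machine$double.eps)

is.na(x) <- rep(is_zero, each = 3L)
x
```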
n <- length(a)
a[ave(!a, gl(n, 3, n), FUN=all)] <- NA
a
#[1] NA NA NA 0 2 3 1 0 2 NA NA NA 0 1 2 NA NA NA
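The answers above are not strictly equivalent. Testing sum(x) == 0 marks any group whose values cancel out, while all(!a) marks only groups that are entirely zero; for the non-negative data in the question they agree. A small sketch with made-up data `x` that contains a negative value shows the difference:

```r
x <- c(-1, 1, 0, 0, 0, 0)
n <- length(x)

# gl(n, 3, n) labels positions 1 1 1 2 2 2
by_sum <- as.logical(ave(x, gl(n, 3, n), FUN = function(g) sum(g) == 0))
by_all <- ave(!x, gl(n, 3, n), FUN = all)

by_sum  # first group counts as "zero" because -1 + 1 + 0 == 0
by_all  # only the all-zero group is marked
```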
This question already has answers here: ifelse matching vectors in r (2 answers). Closed 9 years ago.
I have a dataframe that looks like this:
> df<-data.frame(A=c(NA,1,2,3,4),B=c(NA,5,NA,3,4),C=c(NA,NA,NA,NA,4))
> df
A B C
1 NA NA NA
2 1 5 NA
3 2 NA NA
4 3 3 NA
5 4 4 4
I am trying to create a "D" column based on the row values in df, where D gets NA if the values in the row are different (e.g. row 2) or all NA (e.g. row 1), and the row's value if the non-NA values in that row are all the same (e.g. rows 3, 4 and 5). This would produce a vector and data frame that look like this:
> df$D<-c(NA,NA,2,3,4)
> df
A B C D
1 NA NA NA NA
2 1 5 NA NA
3 2 NA NA 2
4 3 3 NA 3
5 4 4 4 4
Thank you in advance for your suggestions.
You can use apply() to do the calculation for each row, combined with unique() and !is.na(). !is.na() selects the values that are not NA, unique() keeps the distinct values, and length() gives the number of distinct values. If that number is 1, use the first non-NA value; otherwise return NA.
df$D<-apply(df,1,function(x)
ifelse(length(unique(x[!is.na(x)]))==1,x[!is.na(x)][1],NA))
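Since each row produces a single value, a plain if/else does the same job as ifelse() here (ifelse is designed for vectorised conditions). A minimal sketch of the same logic with a named helper; the name `same_or_na` is hypothetical:

```r
df <- data.frame(A = c(NA, 1, 2, 3, 4), B = c(NA, 5, NA, 3, 4), C = c(NA, NA, NA, NA, 4))

# Hypothetical helper: the common non-NA value of a row, or NA
same_or_na <- function(x) {
  vals <- unique(x[!is.na(x)])          # distinct non-NA values in the row
  if (length(vals) == 1) vals else NA   # zero or several distinct values -> NA
}

df$D <- apply(df, 1, same_or_na)
df
```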
Here is one possible approach:
FUN <- function(x) {
no.na <- x[!is.na(x)]
len <- length(no.na)
if (len == 0) return(NA)
if (len == 1) return(no.na)
runs <- rle(no.na)[[2]]
if(length(runs) > 1) return(NA)
runs
}
df$D <- apply(df, 1, FUN)
## > df
## A B C D
## 1 NA NA NA NA
## 2 1 5 NA NA
## 3 2 NA NA 2
## 4 3 3 NA 3
## 5 4 4 4 4