Replacing 0 values with NA in data.frame conditionally - r

dat <- data.frame(A=c("name1", "name2", "name3"),
B=c(0,1,0), C=c(0,0,5), D= c(4,4,0), E=c(1,0,0), F=c(4,0,0) )
desiredresult <- data.frame(A=c("name1", "name2", "name3"),
B=c(NA,1,NA), C=c(NA,0,5), D= c(4,4,0), E=c(1,0,NA), F=c(4,NA,NA))
I want to replace 0 values with NA in every row until a positive value is encountered (there are no negative values in the dataset). In addition, if a row ends in a run of zeros, I want to replace those trailing zeros with NA as well, leaving only the first 0 after the last positive value in place, e.g. 5,0,0,0 -> 5,0,NA,NA
The example data and desiredresult are provided above. I was trying an approach like the one below, but it would need 5+ conditions to cover every case. Is there a better way to do this? Maybe with data.table?
dat$B[dat$B == 0 & (dat$C!=0 | dat$D!=0)] <- NA
dat$C[dat$C == 0 & dat$D!=0 & is.na(dat$B)] <- NA

Using the data.table-package, you could approach this as follows:
cols <- names(dat)[2:6]
library(data.table)
setDT(dat)[, (cols) := {x <- unlist(.SD);
x[cumsum(x)==0] <- NA;
l <- c(tail(cumsum(rev(x)),-1),1) == 0;
x[rev(l)] <- NA;
names(x) <- cols;
as.list(x)},
by = A]
you get:
> dat
A B C D E F
1: name1 NA NA 4 1 4
2: name2 1 0 4 0 NA
3: name3 NA 5 0 NA NA
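To see what the reversed, shifted cumulative sum does, it can help to trace it on a single row. The sketch below is only an illustration in base R, using row name2 of the example data; the variable names cs, shifted and drop are my own and not part of the answer.

```r
# One row of the example data (name2), as a plain numeric vector
x <- c(1, 0, 4, 0, 0)

# Rule 1: leading zeros have a cumulative sum of 0
x[cumsum(x) == 0] <- NA        # nothing changes here, the row starts with 1

# Rule 2: work from the right-hand side
cs      <- cumsum(rev(x))      # 0 0 4 4 5
shifted <- c(tail(cs, -1), 1)  # 0 4 4 5 1
drop    <- rev(shifted == 0)   # FALSE FALSE FALSE FALSE TRUE
x[drop] <- NA

x                              # 1 0 4 0 NA
```

Dropping the first element of the reversed cumulative sum is what spares the first zero after the last positive value.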
The same kind of thinking, but then with base R:
dl <- as.data.frame(t(dat[,-1]))
idx1 <- cumsum(dl) == 0
idx2 <- sapply(dl, function(x) {
l <- c(tail(cumsum(rev(x)),-1),1) == 0
l[is.na(l)] <- FALSE
rev(l)
})
dl[idx1 | idx2] <- NA
dat[,-1] <- t(dl)
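which gives the same result on the question's data. The output below comes from the new example data listed after it, where row name2 (1, 0, 0, 4, 0) needs no replacement: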
> dat
A B C D E F
1 name1 NA NA 4 1 4
2 name2 1 0 0 4 0
3 name3 NA 5 0 NA NA
New example data:
dat <- data.frame(A=c("name1", "name2", "name3"),
B=c(0,1,0), C=c(0,0,5), D=c(4,0,0), E=c(1,4,0), F=c(4,0,0) )

This should work:
# Apply the first rule: convert 0 to NA until we find a positive value
res1 <- t(apply(dat[,-1], 1, function(x) {
  xc <- cumsum(x)    # cumulative sum
  x[xc == 0] <- NA   # NA where the cumulative sum is 0
  x
}))
# Apply the second rule: convert trailing zeros to NA, keeping the first zero after the last positive value
res2 <- t(apply(res1, 1, function(x) {
  xc <- cumsum(rev(x))      # cumulative sum of the reversed vector
  xc <- c(tail(xc, -1), 1)  # shift the sum by one position
  res <- rev(x)             # reverse the vector
  res[xc == 0] <- NA
  rev(res)
}))
#Reconstruct the data frame
cbind(data.frame(name=dat[,1]),res2)
# name B C D E F
#1 name1 NA NA 4 1 4
#2 name2 1 0 4 0 NA
#3 name3 NA 5 0 NA NA
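If the cumulative-sum trick feels opaque, both rules can also be written in terms of the positions of the first and last positive value. This is a rough sketch under the question's assumption that there are no negative values; mask_zeros is a hypothetical helper name, not something taken from the answers above.

```r
# Hypothetical helper: applies both rules to one numeric vector
mask_zeros <- function(x) {
  pos <- which(x > 0)
  if (length(pos) == 0) {        # no positive value at all: everything becomes NA
    x[] <- NA
    return(x)
  }
  first <- pos[1]
  last  <- pos[length(pos)]
  x[seq_len(first - 1)] <- NA    # rule 1: zeros before the first positive value
  if (last + 1 < length(x)) {    # rule 2: keep one zero after the last positive value
    x[(last + 2):length(x)] <- NA
  }
  x
}

dat <- data.frame(A = c("name1", "name2", "name3"),
                  B = c(0, 1, 0), C = c(0, 0, 5), D = c(4, 4, 0),
                  E = c(1, 0, 0), F = c(4, 0, 0))

# Apply the helper row by row and write the result back
dat[, -1] <- t(apply(dat[, -1], 1, mask_zeros))

mask_zeros(c(5, 0, 0, 0))        # 5 0 NA NA
```

Zeros between two positive values are left alone, which matches what the cumulative-sum versions do.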

Related

R dataframe: combine conditions by processing

I have to find all columns that contain only NA values. In columns that are not entirely NA, I have to replace the NAs with 0.
My solution is:
NA_check <- colSums(is.na(frame)) == nrow(frame) #True or False - all NA or not
frame[is.na(frame) & which(names(frame) %in% names(NA_check)[which(NA_check == FALSE, arr.ind=T)])] <- 0
These conditions work separately, but they don't work together or I get some errors combining them. How can I solve my problem?
P.S. This modification also doesn't work if NA_check is not all FALSE:
frame[is.na(frame[which(names(frame) %in% names(NA_check)[which(NA_check == FALSE, arr.ind=T)])])] <- 0
You can find the columns that have at least one non-NA value (i.e. not all values are NA) and replace NA with 0 in that subset only.
not_all_NA <- colSums(!is.na(frame)) > 0
frame[not_all_NA][is.na(frame[not_all_NA])] <- 0
We can check this with an example :
frame <- data.frame(a = c(NA, NA, 3, 4), b = NA, c = c(NA, 1:3), d = NA)
frame
# a b c d
#1 NA NA NA NA
#2 NA NA 1 NA
#3 3 NA 2 NA
#4 4 NA 3 NA
not_all_NA <- colSums(!is.na(frame)) > 0
frame[not_all_NA][is.na(frame[not_all_NA])] <- 0
frame
# a b c d
#1 0 NA 0 NA
#2 0 NA 1 NA
#3 3 NA 2 NA
#4 4 NA 3 NA
We can also do this with dplyr :
library(dplyr)
frame %>% mutate(across(where(~any(!is.na(.))), tidyr::replace_na, 0))
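As for why the combined condition in the question misbehaves: which() returns column positions, and `&` coerces those integers to TRUE and recycles them over the logical matrix, so the column filter is silently lost. The sketch below illustrates this on the example frame and shows one possible fix; attempt and col_ok are names I made up for the illustration.

```r
frame <- data.frame(a = c(NA, NA, 3, 4), b = NA, c = c(NA, 1:3), d = NA)
NA_check <- colSums(is.na(frame)) == nrow(frame)

# The question's condition: which() gives positions 1 and 3 here;
# `&` turns them into TRUE and recycles them, so nothing is filtered
attempt <- is.na(frame) & which(!NA_check)
all(attempt == is.na(frame))   # TRUE

# One possible fix: expand the per-column flag to the shape of the frame
col_ok <- matrix(!NA_check, nrow(frame), ncol(frame), byrow = TRUE)
frame[is.na(frame) & col_ok] <- 0
frame
```

Columns b and d stay entirely NA, while the NAs in a and c become 0.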

Conditionally replace columns with NA [duplicate]

here is an example of my data:
m <- data.frame(swim = c(0,1,0,0), time1 = c(1,2,3,4), time2 = c(2,3,4,5))
I want to replace all numbers in columns time1 and time2 with NA after the row where there is a 1 in m$swim. It should look like this:
n <- data.frame(swim = c(0,1,0,0), time1 = c(1,2,NA,NA), time2 = c(2,3,NA,NA))
Thank you!
In dplyr you can do :
library(dplyr)
m %>%
mutate(across(starts_with('time'),
~replace(., row_number() > match(1, swim), NA)))
A base R option, however, would likely be more efficient:
cols <- grep('time', names(m))
inds <- match(1, m$swim)
m[(inds + 1):nrow(m), cols] <- NA
m
# swim time1 time2
#1 0 1 2
#2 1 2 3
#3 0 NA NA
#4 0 NA NA
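One caveat with (inds + 1):nrow(m): if the 1 sits in the last row, the sequence counts down (for example 5:4) and includes a row that should stay untouched, and if there is no 1 at all, match() returns NA. A logical mask avoids both cases. This is a minimal sketch; blank_after is a hypothetical helper name.

```r
# Hypothetical helper: set the given columns to NA in every row after the first 1 in swim
blank_after <- function(m, cols) {
  hit   <- match(1, m$swim)                       # NA if swim never equals 1
  after <- !is.na(hit) & seq_len(nrow(m)) > hit   # all FALSE when there is nothing to blank
  m[cols] <- lapply(m[cols], function(v) replace(v, after, NA))
  m
}

m  <- data.frame(swim = c(0, 1, 0, 0), time1 = c(1, 2, 3, 4), time2 = c(2, 3, 4, 5))
m2 <- data.frame(swim = c(0, 0, 0, 1), time1 = c(1, 2, 3, 4), time2 = c(2, 3, 4, 5))

res  <- blank_after(m,  grep("time", names(m)))   # rows 3 and 4 become NA
res2 <- blank_after(m2, grep("time", names(m2)))  # unchanged, the 1 is in the last row
```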
A base R solution would be:
#Data
m <- data.frame(swim = c(0,1,0,0), time1 = c(1,2,3,4), time2 = c(2,3,4,5))
#Detect position
index <- min(which(m$swim==1))
#Replace
m[(index+1):dim(m)[1],-1] <- NA
Output:
swim time1 time2
1 0 1 2
2 1 2 3
3 0 NA NA
4 0 NA NA
Using data.table, you could do it as follows:
library(data.table)
setDT(m)
# First row after the one where swim == 1
start <- which(m$swim == 1)[1] + 1
these_rows <- seq(start, nrow(m))
# := updates by reference, so no reassignment is needed
m[these_rows, c("time1", "time2") := NA_real_]

How to efficiently replace first row's NA with 0 by group with R

Is there a better way to replace the first row's NAs with 0 by group?
Here is an example. Thanks.
library(data.table)
x <- matrix(c(NA,NA,2,3,NA,4,NA,NA,6,NA,NA,7),nrow=4)
x <- as.data.table(x)
names(x) <- c("a","b","c")
name <- rep(c("P-1","P-2"),each=2)
x <- cbind(name,x)
x[!duplicated(x$name),] <- replace(x[!duplicated(x$name),],sapply(x[!duplicated(x$name),],is.na),0)
We can replace the NA values in the first row of each group for all columns.
Using data.table, that can be done as follows:
library(data.table)
x[, lapply(.SD, function(x) replace(x, seq_along(x) == 1 & is.na(x), 0)), name]
# name a b c
#1: P-1 0 0 6
#2: P-1 NA 4 NA
#3: P-2 2 0 0
#4: P-2 3 NA 7
Or with dplyr :
library(dplyr)
x %>%
group_by(name) %>%
mutate_at(vars(-group_cols()), ~replace(., row_number() == 1 & is.na(.), 0))
You can store !duplicated(x$name) and there is no need for sapply. A base solution to replace first row's NA with 0 by group:
i <- !duplicated(x$name)
x[i,] <- replace(x[i,], is.na(x[i,]), 0)
x
# name a b c
#1 P-1 0 0 6
#2 P-1 NA 4 NA
#3 P-2 2 0 0
#4 P-2 3 NA 7
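For reference, here is a self-contained variant of the same idea on a plain data.frame that leaves the grouping column out of the replacement. It is only a sketch; first and num_cols are illustrative names.

```r
x <- data.frame(name = rep(c("P-1", "P-2"), each = 2),
                a = c(NA, NA, 2, 3),
                b = c(NA, 4, NA, NA),
                c = c(6, NA, NA, 7))

first    <- !duplicated(x$name)        # TRUE FALSE TRUE FALSE: first row of each group
num_cols <- setdiff(names(x), "name")  # every column except the grouping column

# Replace NA with 0, but only in the first row of each group
x[first, num_cols][is.na(x[first, num_cols])] <- 0
x
```

Like !duplicated() in the answer above, this treats the first occurrence of each name as the group's first row, whether or not the groups are contiguous.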
Another data.table option is:
x[name!=shift(name, fill=""), c("a","b","c") := {
s <- copy(.SD)
s[is.na(.SD)] <- 0
s
}, .SDcols=a:c]

Dispatch values in list column to separate columns

I have a data.table with a list column "c":
df <- data.table(a = 1:3, c = list(1L, 1:2, 1:3))
df
a c
1: 1 1
2: 2 1,2
3: 3 1,2,3
I want to create separate columns for the values in "c".
I create a set of new columns F_1, F_2, F_3:
mmax <- max(df$a)
flux <- paste("F", 1:mmax, sep = "_")
df[, (flux) := 0]
df
a c F_1 F_2 F_3
1: 1 1 0 0 0
2: 2 1,2 0 0 0
3: 3 1,2,3 0 0 0
I want to dispatch values in "c" to columns F_1, F_2, F_3 like this:
df
a c F_1 F_2 F_3
1: 1 1 1 0 0
2: 2 1,2 1 2 0
3: 3 1,2,3 1 2 3
What I have tried:
comp_vect <- function(vec, mmax){
vec <- vec %>% unlist()
n <- length(vec)
answr <- c(vec, rep(0, l = mmax -n))
}
df[ , ..flux := mapply(comp_vect, c, mmax)]
The expected data.table is :
> df
a c F_1 F_2 F_3
1: 1 1 1 0 0
2: 2 1,2 1 2 0
3: 3 1,2,3 1 2 3
I took a radically different approach: I rbind the list column by group and then dcast the result to wide format, which gives the desired output. The last step is to set the names.
library(data.table)
df <- data.table(a = 1:3, d = list(1L, c(1L, 2L), c(1L, 2L, 3L)))
df2 <- df[, rbind(d), by = a][, dcast(.SD, a ~ V1, fill = 0)]
setnames(df2, 2:4, flux)[]
a F_1 F_2 F_3
1: 1 1 0 0
2: 2 1 2 0
3: 3 1 2 3
where flux is the variable of names that you defined in your question.
Please note that I avoided using the column name c, as it may be confused with the function c().
Solution:
for(idx in seq(max(sapply(df$c, length)))){ # maximum number of values according to all the elements of the list
set(x = df,
i = NULL,
j = paste0("F_",idx), # column's name
value = sapply(df$c, function(x){
if(is.na(x[idx])){
return(0) # 0 instead of NA
} else {
return(x[idx])
}
})
)
}
Explanation:
We can extract the values from a list like this :
sapply(df$c, function(ll) return(ll[1])) # first value
[1] 1 1 1
sapply(df$c, function(ll) return(ll[2])) # second value
[1] NA 2 2
sapply(df$c, function(ll) return(ll[3])) # third value
[1] NA NA 3
We see that where there is no value at that position, we get an NA.
We need an iterator to extract all values at the position idx. For that, we'll find the number of values in each element of df$c (the list) and keep the maximum.
max(sapply(df$c, length))
[1] 3
If we want zeros instead of NAs, we need to create a function in the sapply to convert them :
vec <- c(NA, 5, 1, NA)
sapply(vec, function(x) if(is.na(x)) return(0) else return(x))
[1] 0 5 1 0
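The padding idea from the question (fill each vector with zeros up to a common length) can also be done in one pass with vapply. The following is a base R sketch on a plain list; vals and padded are illustrative names, and the step of attaching the columns to the data.table is left out.

```r
vals <- list(1L, 1:2, 1:3)          # same content as the list column
mmax <- max(lengths(vals))          # longest element, 3 here

# Pad every element with zeros up to length mmax; vapply returns one column per element
padded <- t(vapply(vals, function(v) {
  out <- numeric(mmax)
  out[seq_along(v)] <- v
  out
}, numeric(mmax)))
colnames(padded) <- paste("F", seq_len(mmax), sep = "_")

padded
```

Each row of padded corresponds to one element of the list, so the matrix can be bound to the original table column by column.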

Combine of information based on two dataframes in R

This is my sample data
> data.frame
a b c d
W_1_N NA NA NA NA
W_1_E 2 2 2 4
W_1_C 4 2 2 4
W_1_D NA NA NA NA
First, I had to find pairs of column names such that, within the same row, one column holds 4 and the other holds 2.
The result looks like this:
W_1_E.1 d a
W_1_E.2 d b
W_1_E.3 d c
W_1_C.1 a b
W_1_C.2 a c
W_1_C.3 d b
W_1_C.4 d c
I wanted only pairs where one element is 4 and the other is 2 in the same row. W_1_N and W_1_D contain only NA, so they were omitted. W_1_E appears in 3 rows because there are 3 (4, 2) pairs in that row of the sample data. W_1_C has 4 pairs.
This is the code:
lst=data.frame(df) %>%
rownames_to_column("rn") %>%
drop_na() %>%
gather(key, value, -rn) %>%
group_by(rn, value) %>%
summarise(l = list(unique(key))) %>%
split(.$rn)
pair=do.call("rbind", lapply(lst, function(x) expand.grid(x$l[[1]],
x$l[[2]])))
It works perfectly, but now I have second data.frame:
a b c d
W_1_N 0 1 1 1
W_1_E 1 1 0 0
W_1_C 1 1 1 0
W_1_D 1 0 1 1
Here is my problem: I want to keep only those pairs where both elements of the pair have the value 1 in the second data.frame. For example, the first pair of my result, W_1_E.1 d a, should be eliminated because d has the value 0 in the W_1_E row of the second data.frame.
The output should be:
W_1_C.1 a b
W_1_C.2 a c
d has the value 0 in the W_1_E row, so all rows with W_1_E in my result data.frame were eliminated (all of its pairs involved d). The last two rows were eliminated because d is also 0 in the W_1_C row of the second data.frame.
Thanks for your help
How's this?
x <- "N a b c d
W_1_N NA NA NA NA
W_1_E 2 2 2 4
W_1_C 4 2 2 4
W_1_D NA NA NA NA "
x1 <- read.table(text = x, header = TRUE)
x <- "N a b c d
W_1_N 0 1 1 1
W_1_E 1 1 0 0
W_1_C 1 1 1 0
W_1_D 1 0 1 1 "
x2 <- read.table(text = x, header = TRUE)
df <- merge(x1, x2, by="N")
df$a <- ifelse(df$a.y == 0,NA,df$a.x)
df$b <- ifelse(df$b.y == 0,NA,df$b.x)
df$c <- ifelse(df$c.y == 0,NA,df$c.x)
df$d <- ifelse(df$d.y == 0,NA,df$d.x)
df <- df[ , c(1,10:13)]
library(tidyr)
df_all <- df %>%
gather(key = key1, value, 2:5)
df2 <- df_all[!is.na(df_all$value) & df_all$value == 2,]
df4 <- df_all[!is.na(df_all$value) & df_all$value == 4,]
merge(df2[,1:2], df4[1:2], by = "N", all.x = FALSE, all.y = FALSE)
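The same filtering can be sketched without reshaping, by treating both tables as matrices and building the pairs row by row. This is only an illustration under the assumption that both tables share the same row and column names; ok, fours, twos and pairs are names I chose.

```r
rn <- c("W_1_N", "W_1_E", "W_1_C", "W_1_D")
cn <- c("a", "b", "c", "d")
x1 <- matrix(c(NA, NA, NA, NA,  2, 2, 2, 4,  4, 2, 2, 4,  NA, NA, NA, NA),
             nrow = 4, byrow = TRUE, dimnames = list(rn, cn))
x2 <- matrix(c(0, 1, 1, 1,  1, 1, 0, 0,  1, 1, 1, 0,  1, 0, 1, 1),
             nrow = 4, byrow = TRUE, dimnames = list(rn, cn))

ok <- x2 == 1   # columns allowed by the second table

per_row <- lapply(rn, function(r) {
  fours <- cn[which(x1[r, ] == 4 & ok[r, ])]   # which() drops the NA comparisons
  twos  <- cn[which(x1[r, ] == 2 & ok[r, ])]
  if (length(fours) == 0 || length(twos) == 0) return(NULL)
  data.frame(row = r,
             expand.grid(four = fours, two = twos, stringsAsFactors = FALSE),
             stringsAsFactors = FALSE)
})

# Drop rows that produced no pairs, then stack the rest
pairs <- do.call(rbind, Filter(Negate(is.null), per_row))
pairs
```

On the sample data only W_1_C survives, with the pairs (a, b) and (a, c).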
