Custom data frame in R

I have the data frame below:
df <- data.frame(a = c(1,3,4,5,8,9), b = c("","",0,0,"",""))
df$b <- as.numeric(df$b)
df
a b
1 1 NA
2 3 NA
3 4 0
4 5 0
5 8 NA
6 9 NA
Is there a way to fill in column b with the value from column a, but only at specific positions?
Example: in the expected output below, the NA cell immediately before the 0s in column b and the NA cell immediately after them are filled with the value from column a.
df1
a b
1 1 NA
2 3 3
3 4 0
4 5 0
5 8 8
6 9 NA

I think the following solution will help you:
library(dplyr)
df %>%
  # & binds tighter than |: fill from a where b is NA and the next or previous b is 0
  mutate(b = ifelse(is.na(b) & lead(b) == 0 | is.na(b) & lag(b) == 0, a, b))
a b
1 1 NA
2 3 3
3 4 0
4 5 0
5 8 8
6 9 NA
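For comparison, the same idea can be sketched in base R without dplyr, assuming b is numeric with NA in the empty cells. The shifted vectors prev_b and next_b (names introduced here for illustration) play the role of lag() and lead(), and `%in% 0` is used so that a missing neighbour counts as FALSE instead of NA:

```r
# Example data from the question, with the empty cells already NA
df <- data.frame(a = c(1, 3, 4, 5, 8, 9), b = c(NA, NA, 0, 0, NA, NA))

# Shift b by one row in each direction, padding the ends with NA
prev_b <- c(NA, head(df$b, -1))
next_b <- c(tail(df$b, -1), NA)

# Fill only cells that are NA and sit directly next to a 0;
# `%in% 0` gives FALSE (not NA) when the neighbour is NA
fill <- is.na(df$b) & (prev_b %in% 0 | next_b %in% 0)
df$b[fill] <- df$a[fill]
df
```

Because fill never contains NA, rows that should stay unchanged are left alone explicitly rather than relying on an NA condition falling back to NA.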

Related

Conditionally replace NAs in Certain Columns Based on Row Values

For a data frame like the one below, I am trying to selectively replace the NAs in columns a, b, and c with 0 using R, but only when at least one of those columns has a non-missing value in that row.
For example, I would want to replace the NAs in rows 1, 2, and 5, but leave row 4 (where a, b, and c are all NA) alone, and not replace the NA in column d.
Sample data:
df <- data.frame(a = c(1,NA,2,NA,3,4),
b = c(NA,5,6,NA,7,8),
c = c(9,NA,10,NA,NA,11),
d = c("Alpha","Beta","Charlie","Delta",NA,"Foxtrot"))
> df
a b c d
1 1 NA 9 Alpha
2 NA 5 NA Beta
3 2 6 10 Charlie
4 NA NA NA Delta
5 3 7 NA <NA>
6 4 8 11 Foxtrot
Desired outcome:
> df_naReplaced
a b c d
1 1 0 9 Alpha
2 0 5 0 Beta
3 2 6 10 Charlie
4 NA NA NA Delta
5 3 7 0 <NA>
6 4 8 11 Foxtrot
The solutions I have found so far only work on conditions by column, not by row, or would require removing those columns from their context (in this example, separating them from d).
I have tried using ifelse and an if statement like the one below, but I was unable to make it as selective as I would like, because it replaces every NA in those columns.
if(df %>% select(a:c) %>% any(!is.na(.))){
df<- df %>% replace_na(list(a= 0,
b= 0,
c= 0)
)
}
Thank you for whatever help you are able to offer!
Here's a base R solution:
> df[,-4][(is.na(df[, -4]) & rowSums(is.na(df[, -4])) < 3)] <- 0
> df
a b c d
1 1 0 9 Alpha
2 0 5 0 Beta
3 2 6 10 Charlie
4 NA NA NA Delta
5 3 7 0 <NA>
6 4 8 11 Foxtrot
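The same rule can be written without hard-coding the column position (-4) or the column count (3). This is a sketch assuming the target columns are listed by name in a vector cols (a name introduced here for illustration); a logical matrix index on a data frame replaces only the flagged cells, column by column:

```r
df <- data.frame(a = c(1, NA, 2, NA, 3, 4),
                 b = c(NA, 5, 6, NA, 7, 8),
                 c = c(9, NA, 10, NA, NA, 11),
                 d = c("Alpha", "Beta", "Charlie", "Delta", NA, "Foxtrot"))

cols <- c("a", "b", "c")                  # hypothetical: columns eligible for replacement
miss <- is.na(df[cols])                   # logical matrix: TRUE where a target cell is NA
keep_row <- rowSums(miss) < length(cols)  # FALSE only when every target column is NA
# keep_row has one entry per row, so it recycles down each column of the matrix
df[cols][miss & keep_row] <- 0
df
```

Adding a column to cols (or removing one) then needs no change to the condition itself.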

How to select and remove rows based on position for a specific range in R

Suppose I have two data frames like this:
df1 <- data.frame(a = c(1,2,4,0,0),
b = c(0,3,5,5,0),
c = c(0,0,6,7,6))
df2 <- data.frame(a = c(3,6,8,0,0),
b = c(0,9,10,4,0),
c = c(0,0,1,4,9))
Then I join them:
library(dplyr)
df3 <- full_join(df1, df2)
print(df3)
a b c
1 1 0 0
2 2 3 0
3 4 5 6
4 0 5 7
5 0 0 6
6 3 0 0
7 6 9 0
8 8 10 1
9 0 4 4
10 0 0 9
Note that the pattern is always the same: there are zeros in rows 1 and 2 and in rows 9 and 10, and there are also zeros between rows 4 and 7.
I want to remove only the zeros between rows 4 and 7, so that the remaining values in each column move up.
I can solve it like this:
df3[4,1] <- NA
df3[5,1] <- NA
df3[5,2] <- NA
df3[6,2] <- NA
df3[6,3] <- NA
df3[7,3] <- NA
new.df3 <- as.data.frame(lapply(df3, na.omit))
print(new.df3)
a b c
1 1 0 0
2 2 3 0
3 4 5 6
4 3 5 7
5 6 9 6
6 8 10 1
7 0 4 4
8 0 0 9
But it is not elegant, and it is very time-consuming.
Any thoughts? I really appreciate it, thanks in advance.
Best!
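library(dplyr)
df3 %>%
  mutate(rn = between(row_number(), 4, 7)) %>%  # flag rows 4 to 7
  # drop zeros in the flagged rows; reframe() needs dplyr >= 1.1.0 (older versions: summarise())
  reframe(across(-rn, ~ .x[!(.x == 0 & rn)]))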
a b c
1 1 0 0
2 2 3 0
3 4 5 6
4 3 5 7
5 6 9 6
6 8 10 1
7 0 4 4
8 0 0 9
First, find which cells are zero between rows 4 and 7:
# apply() returns a list named by row number here, because the rows contain different numbers of zeros
to_remove <- apply(df3[4:7, ], 1, function(x) which(x == 0))
Then, replace them with NAs:
for (i in seq_along(to_remove)) {
  df3[as.numeric(names(to_remove))[i], to_remove[[i]]] <- NA
}
Finally, drop them:
new.df3 <- as.data.frame(lapply(df3, na.omit))
print(new.df3)
Here's a different approach:
mask <- !(seq(nrow(df3)) %in% 4:7 & df3 == 0)
df.lst <- lapply(1:3, function(x) df3[mask[, x], x])
sapply(df.lst, length)
# [1] 8 8 8 # Check to make sure the columns are the same length
names(df.lst) <- colnames(df3)
(new.df3 <- as.data.frame(df.lst))
# a b c
# 1 1 0 0
# 2 2 3 0
# 3 4 5 6
# 4 3 5 7
# 5 6 9 6
# 6 8 10 1
# 7 0 4 4
# 8 0 0 9
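If the row range changes, the mask idea can be wrapped in a small function. drop_zeros_in_range() is a hypothetical helper, not part of any package. It stops with an error if the columns would end up with different lengths, since as.data.frame() would otherwise recycle shorter columns or fail. rbind() is used here in place of full_join() only to keep the sketch free of dependencies; with this data no row appears in both inputs, so the rows come out the same:

```r
drop_zeros_in_range <- function(df, rows) {
  in_range <- seq_len(nrow(df)) %in% rows
  # For each column, keep values that are non-zero or lie outside the range
  kept <- lapply(df, function(col) col[!(in_range & col == 0)])
  if (length(unique(lengths(kept))) != 1) {
    stop("columns would end up with different lengths")
  }
  as.data.frame(kept)
}

df1 <- data.frame(a = c(1, 2, 4, 0, 0), b = c(0, 3, 5, 5, 0), c = c(0, 0, 6, 7, 6))
df2 <- data.frame(a = c(3, 6, 8, 0, 0), b = c(0, 9, 10, 4, 0), c = c(0, 0, 1, 4, 9))
df3 <- rbind(df1, df2)

new.df3 <- drop_zeros_in_range(df3, 4:7)
new.df3
```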

Ifelse with NA values in columns

I am trying to apply an ifelse() statement to columns that contain NA, and I would like the else value to be returned when an NA is present. Instead, I just get NA. My actual case uses multiple columns, which makes it difficult for me to find a solution (e.g., I can't convert NAs to 0 because some cases are missing across all columns).
Data:
df <- data.frame(a=c(NA, 1:3, NA) , b=c(NA,4:6,NA), c=c(5,10,15,20,25))
a b c
1 NA NA 5
2 1 4 10
3 2 5 15
4 3 6 20
5 NA NA 25
Attempt:
df2 <- df %>% mutate(check=ifelse((a<=2&b>4)|c==25,1,0))
Result:
a b c check
1 NA NA 5 NA
2 1 4 10 0
3 2 5 15 1
4 3 6 20 0
5 NA NA 25 1
Desired output:
a b c check
1 NA NA 5 **0**
2 1 4 10 0
3 2 5 15 1
4 3 6 20 0
5 NA NA 25 1
You can deal with the NAs in a separate step before computing check:
library(dplyr)
df2 <- df %>%
  # replace NA with 0 in a, b and c (note: df2 then holds 0 instead of NA in these columns)
  mutate(across(c(a, b, c), ~ replace(.x, is.na(.x), 0))) %>%
  mutate(check = ifelse((a <= 2 & b > 4) | c == 25, 1, 0))
Alternatively, compute check first and then set the NAs in check to 0:
library(dplyr)
df <- data.frame(a=c(NA, 1:3, NA) , b=c(NA,4:6,NA), c=c(5,10,15,20,25))
df2 <- df %>% mutate(check=ifelse((a<=2&b>4)|c==25,1,0))
# rows where check is NA (an NA in a or b made the condition unknown) become 0
df2$check[is.na(df2$check)] <- 0
My answer is not exactly what you want, but if you want to replace all NA values in the data frame with 0, you can try this:
df[is.na(df)] <- 0
Output
a b c
1 0 0 5
2 1 4 10
3 2 5 15
4 3 6 20
5 0 0 25
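Another option, sketched here in base R, leaves a, b and c untouched and only decides what an NA condition means: any row whose condition evaluates to NA is treated as not matching. (In dplyr, case_when() behaves similarly, treating an NA condition like FALSE.)

```r
df <- data.frame(a = c(NA, 1:3, NA), b = c(NA, 4:6, NA), c = c(5, 10, 15, 20, 25))

# Can be NA when a or b is missing (unless c == 25 makes the | TRUE)
cond <- (df$a <= 2 & df$b > 4) | df$c == 25
# Treat NA as FALSE: `cond & !is.na(cond)` is FALSE wherever cond is NA
df$check <- as.numeric(cond & !is.na(cond))
df
```

Row 5 still gets 1, because NA | TRUE is TRUE in R, so only genuinely undecided rows fall back to 0.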

How to replace 0 or missing value with NA in R [duplicate]

This question already has answers here:
Replace all 0 values to NA
(11 answers)
Closed 4 years ago.
This is what I have done so far (data is of numeric type):
if (is.na(data) || attribute==0){replace(data,NA)}
It gives me this error message:
Error in replace(attribute, NA) : argument "values" is missing, with no default
With mutate_all:
library(dplyr)
df %>%
mutate_all(~replace(., . == 0, NA))
or with mutate_if, which only touches numeric columns:
df %>%
mutate_if(is.numeric, ~replace(., . == 0, NA))
Note that there is no need to check for NAs, because we are replacing with NA anyway.
Output:
> df %>%
+ mutate_all(~replace(., . == 0, NA))
X Y Z
1 1 5 <NA>
2 4 4 2
3 2 3 2
4 5 5 2
5 5 3 <NA>
6 NA 4 <NA>
7 3 3 1
8 5 3 2
9 3 1 1
10 2 NA 5
11 5 5 <NA>
12 2 5 2
13 4 4 4
14 3 4 <NA>
15 NA NA 3
16 5 2 1
17 1 4 <NA>
18 NA 1 4
19 1 1 5
20 5 1 2
> df %>%
+ mutate_if(is.numeric, ~replace(., . == 0, NA))
X Y Z
1 1 5 0
2 4 4 2
3 2 3 2
4 5 5 2
5 5 3 0
6 NA 4 0
7 3 3 1
8 5 3 2
9 3 1 1
10 2 NA 5
11 5 5 0
12 2 5 2
13 4 4 4
14 3 4 0
15 NA NA 3
16 5 2 1
17 1 4 0
18 NA 1 4
19 1 1 5
20 5 1 2
Data:
set.seed(123)
df <- data.frame(X = sample(0:5, 20, replace = TRUE),
Y = sample(0:5, 20, replace = TRUE),
Z = as.character(sample(0:5, 20, replace = TRUE)))
You could just use replace() without any additional function or package:
data <- replace(data, data == 0, NA)
This assumes that data is your data frame.
Otherwise, you can target a single column, e.g. if your data frame is df and the column is named data:
df$data <- replace(df$data, df$data == 0, NA)
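Assuming that data is a data frame, you could also apply replace() to each column with lapply (sapply would simplify the result to a matrix, which can coerce mixed column types to character). Note that replace() needs the replacement value as its third argument:
new.data <- as.data.frame(lapply(data, function(x) replace(x, is.na(x) | x == 0, NA)))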
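A base R counterpart of the mutate_if() answer, restricted to numeric columns, can be sketched as follows. The toy data frame is hypothetical; each call passes replace() its third argument, the replacement value, which is what was missing in the question's replace(data, NA):

```r
# Hypothetical toy data: two numeric columns and one character column
df <- data.frame(X = c(1, 0, 3), Y = c(0, NA, 2), Z = c("0", "a", "b"),
                 stringsAsFactors = FALSE)

# Find the numeric columns, then call replace(x, positions, value) on each
num <- vapply(df, is.numeric, logical(1))
df[num] <- lapply(df[num], function(x) replace(x, x == 0, NA))
df
```

The character column Z keeps its "0", matching the mutate_if() output above rather than the mutate_all() one.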

R: Change values in a data frame according to values in another column of the same row

Let's say I have this kind of data frame:
df <- data.frame(
t=rep(seq(0,2),6),
no=rep(c(1,2,3,4,5,6),each=3),
value=rnorm(18),g=rep(c("nc","c1", NA),each=3)
)
t no value g
1 0 1 0.5022163 nc
2 1 1 0.5687227 nc
3 2 1 -0.2922622 nc
4 0 2 -0.3587089 c1
5 1 2 -0.9028012 c1
6 2 2 0.1926774 c1
7 0 3 0.6771236 NA
8 1 3 0.3752632 NA
9 2 3 0.2795892 NA
10 0 4 -0.4565521 nc
11 1 4 -0.1241807 nc
12 2 4 -1.2603695 nc
13 0 5 -0.6323118 c1
14 1 5 -0.6283850 c1
15 2 5 -0.2052317 c1
16 0 6 1.5996913 NA
17 1 6 -0.4802057 NA
18 2 6 -0.4255056 NA
I want to set the values in df$value to NA whenever df$g is NA in the same row.
Similarly, I want to set df$value to NA if df$no is, e.g., 1 or 5.
I was fooling around with for loops, but I could not get it right.
Any help will be much appreciated.
Thanks
With a for loop:
for (i in seq_len(nrow(df))) {
  if (df$no[i] == 1 || df$no[i] == 5 || is.na(df$g[i])) {
    df$value[i] <- NA
  }
}
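The loop can also be replaced by a single vectorized assignment. A sketch, using set.seed() (added here only so the example is reproducible) and %in% to test several values of no at once:

```r
set.seed(1)  # added for reproducibility; not in the question
df <- data.frame(
  t = rep(seq(0, 2), 6),
  no = rep(c(1, 2, 3, 4, 5, 6), each = 3),
  value = rnorm(18),
  g = rep(c("nc", "c1", NA), each = 3)
)

# One logical index per row: TRUE if g is missing or no is 1 or 5
df$value[is.na(df$g) | df$no %in% c(1, 5)] <- NA
df
```

Extending the rule to more values of no only means adding them to the c(1, 5) vector.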
