Conditionally replace NAs in Certain Columns Based on Row Values - r

For a dataframe like the one below, I am trying to selectively replace the NAs in columns a, b, and c with 0 using R, but only when at least one of those columns has a non-missing value in that row.
For example, I would want to replace the NAs in rows 1, 2, and 5, but leave row 4 alone (all three columns are NA there), and not replace the NA in column d.
sample data
df <- data.frame(a = c(1,NA,2,NA,3,4),
b = c(NA,5,6,NA,7,8),
c = c(9,NA,10,NA,NA,11),
d = c("Alpha","Beta","Charlie","Delta",NA,"Foxtrot"))
> df
a b c d
1 1 NA 9 Alpha
2 NA 5 NA Beta
3 2 6 10 Charlie
4 NA NA NA Delta
5 3 7 NA <NA>
6 4 8 11 Foxtrot
Desired outcome
> df_naReplaced
a b c d
1 1 0 9 Alpha
2 0 5 0 Beta
3 2 6 10 Charlie
4 NA NA NA Delta
5 3 7 0 <NA>
6 4 8 11 Foxtrot
The solutions that I have found so far only work on conditions by column, but not by row, or would require actively removing those columns from their context (in this example separating it from d).
I have tried using ifelse and an if statement like below but was unable to get it to work as selectively as I would like, as it replaces all NA in that column.
if(df %>% select(a:c) %>% any(!is.na(.))){
df<- df %>% replace_na(list(a= 0,
b= 0,
c= 0)
)
}
Thank you for whatever help you are able to offer!

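Here's a base R solution. It replaces an NA only in rows where fewer than three of the columns a, b, and c are missing (column d is excluded with -4):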
> df[,-4][(is.na(df[, -4]) & rowSums(is.na(df[, -4])) < 3)] <- 0
> df
a b c d
1 1 0 9 Alpha
2 0 5 0 Beta
3 2 6 10 Charlie
4 NA NA NA Delta
5 3 7 0 <NA>
6 4 8 11 Foxtrot
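The one-liner above can also be unpacked into named steps, which makes the row condition easier to see. This is only a sketch: it selects the columns by name instead of by position, and `cols`, `na_mat` and `all_missing` are names introduced here for illustration.

```r
df <- data.frame(a = c(1, NA, 2, NA, 3, 4),
                 b = c(NA, 5, 6, NA, 7, 8),
                 c = c(9, NA, 10, NA, NA, 11),
                 d = c("Alpha", "Beta", "Charlie", "Delta", NA, "Foxtrot"))

cols <- c("a", "b", "c")                        # columns to patch (illustrative name)
na_mat <- is.na(df[cols])                       # logical matrix: TRUE where a value is missing
all_missing <- rowSums(na_mat) == length(cols)  # rows where a, b and c are all NA

# Keep only the NA positions in rows that have at least one observed value
na_mat[all_missing, ] <- FALSE
df[cols][na_mat] <- 0
df
```

Column d is never touched because it is not in `cols`.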

Related

Custom data frame in R

I have a below data frame
df <- data.frame(a = c(1,3,4,5,8,9), b = c("","",0,0,"",""))
df$b <- as.numeric(df$b)
df
a b
1 1 NA
2 3 NA
3 4 0
4 5 0
5 8 NA
6 9 NA
Is there a way to populate column b with the value from column a, but only at specific positions?
Example (expected output): the cell just before and the cell just after the run of 0s in column b should be filled with the value from column a.
df1
a b
1 1 NA
2 3 3
3 4 0
4 5 0
5 8 8
6 9 NA
I think the following solution will help you:
library(dplyr)
df %>%
mutate(b = ifelse(is.na(b) & lead(b) == 0 | is.na(b) & lag(b) == 0, a, b))
a b
1 1 NA
2 3 3
3 4 0
4 5 0
5 8 8
6 9 NA
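The same idea can be written in base R with the NA handling made explicit. This is a sketch rather than a drop-in replacement: `prev_b`, `next_b` and `fill` are names introduced here, and the data is typed in with b already numeric.

```r
df <- data.frame(a = c(1, 3, 4, 5, 8, 9), b = c(NA, NA, 0, 0, NA, NA))

n <- nrow(df)
prev_b <- c(NA, df$b[-n])   # value of b one row up (like dplyr::lag)
next_b <- c(df$b[-1], NA)   # value of b one row down (like dplyr::lead)

# %in% treats NA as "no match", so the condition is never NA
fill <- is.na(df$b) & (prev_b %in% 0 | next_b %in% 0)
df$b[fill] <- df$a[fill]
df
```

Because `fill` contains only TRUE or FALSE, rows whose neighbours are also NA are simply left unchanged.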

Ifelse with NA values in columns

I am trying to apply an ifelse statement on columns that have NA and would like the else condition to be given when NA is present. Instead, I just get NA. My actual case uses multiple columns making it difficult for me to find a solution (e.g., I can't convert NA's to 0 because there are some cases that are missing across all columns).
Data:
df <- data.frame(a=c(NA, 1:3, NA) , b=c(NA,4:6,NA), c=c(5,10,15,20,25))
a b c
1 NA NA 5
2 1 4 10
3 2 5 15
4 3 6 20
5 NA NA 25
Attempt:
df2 <- df %>% mutate(check=ifelse((a<=2&b>4)|c==25,1,0))
Result:
a b c check
1 NA NA 5 NA
2 1 4 10 0
3 2 5 15 1
4 3 6 20 0
5 NA NA 25 1
Desired output:
a b c check
1 NA NA 5 **0**
2 1 4 10 0
3 2 5 15 1
4 3 6 20 0
5 NA NA 25 1
You can deal with the NAs in a separate step (note that this also overwrites the NAs in a, b and c with 0):
df2 <- df %>%
#mutate_at(vars("a", "b", "c"), ~if_else(is.na(.x), 0.0, as.double(.x))) %>% # double?
mutate_at(vars("a", "b", "c"), ~if_else(is.na(.x), 0L, as.integer(.x))) %>% # or integer
mutate(check=ifelse((a<=2&b>4)|c==25,1,0))
Let's combine the previous comment into the script:
library(dplyr)
df <- data.frame(a=c(NA, 1:3, NA) , b=c(NA,4:6,NA), c=c(5,10,15,20,25))
df2 <- df %>% mutate(check=ifelse((a<=2&b>4)|c==25,1,0))
# where check is NA, set it to 0
df2$check[is.na(df2$check)] <- 0
My answer is not exactly what you want, but if you want to replace NA values, you can try this one
df[is.na(df)] <- 0
Output
a b c
1 0 0 5
2 1 4 10
3 2 5 15
4 3 6 20
5 0 0 25
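Another option is to leave the NAs in a and b untouched and only make the condition itself NA-safe. A minimal sketch in base R, assuming an NA condition should count as the "else" case; `cond` is a name introduced here.

```r
df <- data.frame(a = c(NA, 1:3, NA), b = c(NA, 4:6, NA), c = c(5, 10, 15, 20, 25))

cond <- (df$a <= 2 & df$b > 4) | df$c == 25   # TRUE, FALSE or NA per row
df$check <- as.integer(cond %in% TRUE)        # NA is not TRUE, so it becomes 0
df
```

Row 5 still gets 1 because `NA | TRUE` is TRUE in R, while row 1 gets 0 because `NA | FALSE` is NA.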

R: Row-wise expansion of data frame with consecutive integers

I have measured some positions pos, e.g.:
library(dplyr)
set.seed(8)
data <-
data.frame(id=LETTERS[1:5],
pos=c(0,round(runif(4, 1, 10),0))) %>%
arrange(pos)
> data
id pos
1 A 0
2 C 3
3 B 5
4 E 7
5 D 8
How can I expand a data frame like data so that it contains every possible pos (0, 1, 2, ..., n), where n is max(data$pos) (i.e. 8 in this example)? I would like to get something like:
id pos
1 A 0
2 NA 1
3 NA 2
4 C 3
5 NA 4
6 B 5
7 NA 6
8 E 7
9 D 8
You can do this a number of ways, but one way, in base R, is by using merge:
merge(data.frame(pos = 0:8), data, all.x = TRUE)
Or, using dplyr, it's:
data.frame(pos = 0:8) %>% left_join(data, by = "pos")
We can try
library(data.table)
setDT(data)[data.table(pos=0:8), on='pos']
# id pos
#1: A 0
#2: NA 1
#3: NA 2
#4: C 3
#5: NA 4
#6: B 5
#7: NA 6
#8: E 7
#9: D 8
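The answers above hard-code 0:8. A base R sketch that derives the range from the data and uses match() instead of a join is shown below; it assumes the pos values are unique, non-negative whole numbers. The data is typed in from the printed example, and `all_pos` and `expanded` are names introduced here.

```r
# Data typed in from the printed example above
data <- data.frame(id = c("A", "C", "B", "E", "D"),
                   pos = c(0, 3, 5, 7, 8))

all_pos <- seq(0, max(data$pos))   # 0, 1, ..., n

# match() gives the row of data for each position, or NA where none was measured
expanded <- data.frame(id = data$id[match(all_pos, data$pos)],
                       pos = all_pos)
expanded
```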

Selecting values in a dataframe based on a priority list

I am new to R, so I am still getting my head around the way it works. My problem is as follows: I have a data frame and a prioritised list of columns (pl), and I need to:
1. Find the maximum value from the columns in pl for each row and create a new column with this value (df$max).
2. Using the priority list, subtract this maximum value from the priority value (the first non-NA column in pl), ignoring NAs and returning the absolute difference.
Probably better with an example:
My priority list is
pl <- c("E","D","A","B")
and the data frame is:
A B C D E F G
1 15 5 20 9 NA 6 1
2 3 2 NA 5 1 3 2
3 NA NA 3 NA NA NA NA
4 0 1 0 7 8 NA 6
5 1 2 3 NA NA 1 6
So for the first line the maximum is from column A (15) and the priority value is from column D (9) since E is a NA. The answer I want should look like this.
A B C D E F G MAX MAX-PR
1 15 5 20 9 NA 6 1 15 6
2 3 2 NA 5 1 3 2 5 4
3 NA NA 3 NA NA NA NA NA NA
4 0 1 0 7 8 NA 6 8 0
5 1 2 3 NA NA 1 6 2 1
How about this?
df$MAX <- apply(df[,pl], 1, max, na.rm = TRUE)
df$MAX_PR <- df$MAX - apply(df[,pl], 1, function(x) x[!is.na(x)][1])
df$MAX[is.infinite(df$MAX)] <- NA
> df
# A B C D E F G MAX MAX_PR
# 1 15 5 20 9 NA 6 1 15 6
# 2 3 2 NA 5 1 3 2 5 4
# 3 NA NA 3 NA NA NA NA NA NA
# 4 0 1 0 7 8 NA 6 8 0
# 5 1 2 3 NA NA 1 6 2 1
Example:
df <- data.frame(A=c(1,NA,2,5,3,1),B=c(3,5,NA,6,NA,10),C=c(NA,3,4,5,1,4))
pl <- c("B","A","C")
#now we find the maximum per row, ignoring NAs
max.per.row <- apply(df,1,max,na.rm=T)
#and the first element according to the priority list, ignoring NAs
#(there may be a more efficient way to do this)
first.per.row <- apply(df[,pl],1, function(x) as.vector(na.omit(x))[1])
#and finally compute the difference
max.less.first.per.row <- max.per.row - first.per.row
Note that for any row that is all NA, max(..., na.rm = TRUE) returns -Inf with a warning. There is no check against that.
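One way to add the missing check is to test for all-NA rows before calling max(). A sketch under the assumption that such rows should yield NA; the extra all-NA row in the data is added here purely for illustration.

```r
df <- data.frame(A = c(1, NA, 2, 5, 3, 1, NA),
                 B = c(3, 5, NA, 6, NA, 10, NA),
                 C = c(NA, 3, 4, 5, 1, 4, NA))   # last row is entirely NA
pl <- c("B", "A", "C")

# Maximum per row over the priority columns; NA when the row has no values
max.per.row <- apply(df[, pl], 1, function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
})

# First non-missing value in priority order; [1] on an empty vector gives NA
first.per.row <- apply(df[, pl], 1, function(x) x[!is.na(x)][1])

max.less.first.per.row <- max.per.row - first.per.row
max.less.first.per.row
```

The all-NA row now produces NA without a warning, and the other rows are computed as before.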
Here is a simple version. First, I take only the pl columns; then, for each row, I remove the NAs and compute the max and its difference from the first remaining value.
df <- dat[,pl]
cbind(dat, t(apply(df, 1, function(x) {
  x <- na.omit(x)
  c(max(x), max(x) - x[1])
})))
A B C D E F G 1 2
1 15 5 20 9 NA 6 1 15 6
2 3 2 NA 5 1 3 2 5 4
3 NA NA 3 NA NA NA NA -Inf NA
4 0 1 0 7 8 NA 6 8 0
5 1 2 3 NA NA 1 6 2 1

R dataframe filtering

I have a dataframe df as follows:
A B C
NA 1 2
2 NA 3
4 5 6
7 8 9
What I want to do is remove all the rows that have an NA.
If I use
apply(df,1,function(row) all(!is.na(row)))
I get a logical vector with TRUE for each row that does not contain an NA and FALSE for each row that does.
But how do I get the row names, so that I can create something like
df2<-df[-c(list of rows that contains NA),]
which will give me all the new dataframe with NA in rows.
Thanks in advance.
Assuming you have a dataframe that looks like this:
A B C
1 NA 1 2
2 2 NA 3
3 4 5 6
4 7 8 9
Then try:
df1[apply(df1,1,function(x) !any(is.na(x))), ]
A B C
3 4 5 6
4 7 8 9
It doesn't use rownames but rather a logical vector. I guess Joshua and I read your question differently, but we used the same method.
Joshua's suggestion is more compact:
> na.omit(df1)
A B C
3 4 5 6
4 7 8 9
And it reminds me that I should have used:
> df1[complete.cases(df1), ]
A B C
3 4 5 6
4 7 8 9
You can use the logical vector from your apply call to index your data.frame.
> Data[!apply(Data,1,function(row) all(!is.na(row))),]
A B C
1 NA 1 2
2 2 NA 3
> # or like this:
> Data[apply(Data,1,function(row) any(is.na(row))),]
A B C
1 NA 1 2
2 2 NA 3
is.na on a data.frame returns a matrix, which is a better candidate for apply:
df <- read.table(textConnection(" A B C
NA 1 2
2 NA 3
4 5 6
7 8 9
"), header = TRUE)
## a matrix
is.na(df)
## logical for selecting rows that are all NA
apply(df, 1, function(x) all(is.na(x)))
## one liner
df[!apply(df, 1, function(x) all(is.na(x))), ]
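Since is.na() on a data frame already returns a logical matrix, rowSums() can count the NAs per row without apply(). A minimal sketch with the data typed in from the question; `has_na`, `rows_without_na` and `rows_with_na` are names introduced here.

```r
df <- data.frame(A = c(NA, 2, 4, 7),
                 B = c(1, NA, 5, 8),
                 C = c(2, 3, 6, 9))

has_na <- rowSums(is.na(df)) > 0   # TRUE for rows with at least one NA

rows_without_na <- df[!has_na, ]   # same rows as df[complete.cases(df), ]
rows_with_na    <- df[has_na, ]
which(has_na)                      # indices of the rows that contain an NA
```

Using `rowSums(is.na(df)) == ncol(df)` instead would select only the rows that are entirely NA.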
