Replace NA using a vector of column names - r

I have a data frame with columns containing NAs which I replace using replace_na. The problem is these column names can change in the future so I would like to put these column names in a vector and then use the vector in the replace_na function. I don't want to change the entire data frame in one go, just specified columns. When I try this as below, the code runs but it doesn't change the data frame. Can anyone suggest any edits to the code?
library(tidyverse)
col1<-c(9,NA,25,26,NA,51)
col2<-c(9,5,25,26,NA,51)
col3<-c(NA,3,25,26,NA,51)
col4<-c(9,1,NA,26,NA,51)
data<-data.frame(col1,col2,col3,col4, stringsAsFactors = FALSE)
columns<-c(col1,col2)
data<-data%>%
replace_na(list(columns=0))

A dplyr option:
columns <- c("col1" ,"col2")
dplyr::mutate(data, across(all_of(columns), ~ replace_na(.x, 0)))
Returns:
col1 col2 col3 col4
1 9 9 NA 9
2 0 5 3 1
3 25 25 25 NA
4 26 26 26 26
5 0 0 NA NA
6 51 51 51 51

Another option is to use coalesce inside map_at:
The at argument of map_at can be a character vector of the column names you would like to modify.
coalesce then supplies the replacement value for the NAs in those columns.
library(dplyr)
library(purrr)
data %>%
map_at(c("col1","col2"), ~ coalesce(.x, 0)) %>%
bind_cols()
# A tibble: 6 x 4
col1 col2 col3 col4
<dbl> <dbl> <dbl> <dbl>
1 9 9 NA 9
2 0 5 3 1
3 25 25 25 NA
4 26 26 26 26
5 0 0 NA NA
6 51 51 51 51

columns should hold the column names as strings; you can then use is.na:
columns<-c("col1","col2")
data[columns][is.na(data[columns])] <- 0
data
# col1 col2 col3 col4
#1 9 9 NA 9
#2 0 5 3 1
#3 25 25 25 NA
#4 26 26 26 26
#5 0 0 NA NA
#6 51 51 51 51
Or using tidyverse -
library(dplyr)
library(tidyr)
data <- data %>% mutate(across(all_of(columns), ~ replace_na(.x, 0)))
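A further sketch, closer to the original attempt. As far as I can tell, list(columns=0) asks replace_na to fill a column literally named "columns", which does not exist, so nothing changes. Assuming tidyr is installed, one way around this is to build the named list from the vector of names. The names replacements and data_filled are only illustrative.

```r
library(tidyr)

data <- data.frame(
  col1 = c(9, NA, 25, 26, NA, 51),
  col2 = c(9, 5, 25, 26, NA, 51),
  col3 = c(NA, 3, 25, 26, NA, 51),
  col4 = c(9, 1, NA, 26, NA, 51)
)

# Column names as strings, not the column vectors themselves
columns <- c("col1", "col2")

# Build list(col1 = 0, col2 = 0) from the vector of names
# (replacements is a hypothetical helper name)
replacements <- setNames(as.list(rep(0, length(columns))), columns)

# Only the columns named in the list are filled; col3 and col4 keep their NAs
data_filled <- replace_na(data, replacements)
data_filled
```

Changing the contents of columns is then enough to change which columns get filled.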

Related

Use a dynamically created variable to select column in mutate

I am trying to use the value of vector_of_names[position] in the code below to dynamically select the column of data that supplies the value of "age" in mutate.
vector_of_names <- c("one","two","three")
id <- c(1,2,3,4,5,6)
position <- c(1,1,2,2,1,1)
one <- c(32,34,56,77,87,98)
two <- c(45,67,87,NA,33,56)
three <- c(NA,NA,NA,NA,NA,60)
data <- data.frame(id,position,one,two,three)
attempt <- data %>%
mutate(age=vector_of_names[position])
I see a similar question here, but the various answers fail because I am using a variable within the data, "position", to select the column from the vector of names. It is never recognised, I suspect because it is being looked up outside of the data.
I am taking this approach because the number of columns "one", "two" and "three" is not known beforehand, but the vector of their names is, so they need to be selected dynamically.
You could do:
data %>%
rowwise() %>%
mutate(age = c_across(all_of(vector_of_names))[position])
id position one two three age
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 32 45 NA 32
2 2 1 34 67 NA 34
3 3 2 56 87 NA 87
4 4 2 77 NA NA NA
5 5 1 87 33 NA 87
6 6 1 98 56 60 98
If you want to be more explicit about what values should be returned:
named_vector_of_names <- setNames(seq_along(vector_of_names), vector_of_names)
data %>%
rowwise() %>%
mutate(age = get(names(named_vector_of_names)[match(position, named_vector_of_names)]))
Base R vectorized option using matrix subsetting.
data$age <- data[vector_of_names][cbind(1:nrow(data), data$position)]
data
# id position one two three age
#1 1 1 32 45 NA 32
#2 2 1 34 67 NA 34
#3 3 2 56 87 NA 87
#4 4 2 77 NA NA NA
#5 5 1 87 33 NA 87
#6 6 1 98 56 60 98
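To unpack the matrix-subsetting answer: indexing a matrix with a two-column matrix picks one cell per row of the index, read as (row, column) pairs. A minimal base R sketch of the same idea follows; candidates and index are illustrative names, not part of the original answer.

```r
vector_of_names <- c("one", "two", "three")
data <- data.frame(
  id = 1:6,
  position = c(1, 1, 2, 2, 1, 1),
  one = c(32, 34, 56, 77, 87, 98),
  two = c(45, 67, 87, NA, 33, 56),
  three = c(NA, NA, NA, NA, NA, 60)
)

# Candidate columns, in the order given by vector_of_names
candidates <- as.matrix(data[vector_of_names])

# One (row, column) pair per row: row i takes the column numbered position[i]
index <- cbind(row = seq_len(nrow(candidates)), col = data$position)

# Matrix indexing returns one value per pair
age <- candidates[index]
age
```

Because the columns are selected with data[vector_of_names] first, position refers to the order of vector_of_names rather than to the column order of data.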

Create "row" from first non-NA value in an R data frame

I want to create a "row" containing the first non-NA value that appears in a data frame. So for example, given this test data frame:
test.df <- data.frame(a=c(11,12,13,14,15,16),b=c(NA,NA,23,24,25,26), c=c(31,32,33,34,35,36), d=c(NA,NA,NA,NA,45,46))
test.df
a b c d
1 11 NA 31 NA
2 12 NA 32 NA
3 13 23 33 NA
4 14 24 34 NA
5 15 25 35 45
6 16 26 36 46
I know that I can detect the first appearance of a non-NA like this:
first.appearance <- as.numeric(sapply(test.df, function(col) min(which(!is.na(col)))))
first.appearance
[1] 1 3 1 5
This tells me that the first element in column 1 is not NA, the third element in column 2 is not NA, the first element in column 3 is not NA, and the fifth element in column 4 is not NA. But when I put the pieces together, it yields this (which is logical, but not what I want):
> test.df[first.appearance,]
a b c d
1 11 NA 31 NA
3 13 23 33 NA
1.1 11 NA 31 NA
5 15 25 35 45
I would like the output to be the first non-NA in each column. What is a base or dplyr way to do this? I am not seeing it. Thanks in advance.
a b c d
1 11 23 31 45
We can use
library(dplyr)
test.df %>%
slice(first.appearance) %>%
summarise_all(~ first(.[!is.na(.)]))
# a b c d
#1 11 23 31 45
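Or, because the values in every column here are increasing (so the minimum is also the first non-NA value), it can be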
test.df %>%
summarise_all(~ min(na.omit(.)))
# a b c d
#1 11 23 31 45
Or with colMins (a column minimum, which matches the first non-NA value only because each column is increasing)
library(matrixStats)
colMins(as.matrix(test.df), na.rm = TRUE)
#[1] 11 23 31 45
You can use:
library(tidyverse)
test.df %>% fill(everything(), .direction = "up") %>% head(1)
a b c d
<dbl> <dbl> <dbl> <dbl>
1 11 23 31 45
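A base R sketch that takes the first non-NA value by position rather than by size, which matters when a column is not sorted. first_non_na is a hypothetical helper name.

```r
# Return the first non-NA element of a vector (NA if there is none)
first_non_na <- function(col) {
  col[which(!is.na(col))[1]]
}

test.df <- data.frame(
  a = c(11, 12, 13, 14, 15, 16),
  b = c(NA, NA, 23, 24, 25, 26),
  c = c(31, 32, 33, 34, 35, 36),
  d = c(NA, NA, NA, NA, 45, 46)
)

# Named vector with one entry per column
firsts <- sapply(test.df, first_non_na)
firsts

# On unsorted data the first non-NA value and the minimum differ
unsorted <- c(NA, 5, 2)
first_non_na(unsorted)       # 5
min(unsorted, na.rm = TRUE)  # 2
```

Wrapping the result in as.data.frame(as.list(firsts)) gives a one-row data frame if that shape is preferred.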

How to match and replace elements between two dataframes

I need to replace the values in one data frame with values looked up from another data frame.
For example:
df1:
id value
0 1 10
1 2 12
2 3 54
3 4 21
df2:
col1 col2 col3
0 1 2 3
1 1 1 3
2 1 3 4
3 1 1 5
Expected Output:
replaced values from df1 and applied to df2.
col1 col2 col3
0 10 12 54
1 10 10 54
2 10 54 21
3 10 10 5
How can I do this in R?
I would solve this problem in pandas like below,
dic=df1.set_index('id')['value'].to_dict()
print df2.replace(dic)
But I'm stuck in R.
Please help me solve this problem.
We can loop over the columns of df2 with lapply, match each value against the id column of df1, replace the values that have a match using ifelse, and keep the remaining values as they are.
df2[] <- lapply(df2, function(x) {
inds <- match(x, df1$id)
ifelse(is.na(inds),x, df1$value[inds])
})
df2
# col1 col2 col3
#0 10 12 54
#1 10 10 54
#2 10 54 21
#3 10 10 5
We could do this using a named vector after creating a copy of the second dataset.
df3 <- df2
df3[] <- setNames(df1$value, df1$id)[as.matrix(df2)]
i1 <- is.na(df3)
df3[i1] <- df2[i1]
df3
# col1 col2 col3
#0 10 12 54
#1 10 10 54
#2 10 54 21
#3 10 10 5
What you can do:
Make a copy of df2:
df3 <- df2 # in R this makes a copy, unlike in Python
df3[] <- df1$value[match(as.matrix(df2), df1$id)] # look up each value of df2 in df1$id
df3[is.na(df3)] <- df2[is.na(df3)] # restore the original value where there was no match
df3
col1 col2 col3
0 10 12 54
1 10 10 54
2 10 54 21
3 10 10 5
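For completeness, a self-contained sketch that rebuilds the two example data frames and applies the same match-based lookup without ifelse. replace_from_lookup and result are hypothetical names.

```r
df1 <- data.frame(id = 1:4, value = c(10, 12, 54, 21))
df2 <- data.frame(
  col1 = c(1, 1, 1, 1),
  col2 = c(2, 1, 3, 1),
  col3 = c(3, 3, 4, 5)
)

# Replace each element of x that appears in keys with the matching value;
# elements without a match are left unchanged
replace_from_lookup <- function(x, keys, values) {
  idx <- match(x, keys)
  hit <- !is.na(idx)
  x[hit] <- values[idx[hit]]
  x
}

result <- df2
result[] <- lapply(df2, replace_from_lookup, keys = df1$id, values = df1$value)
result
```

The 5 in col3 has no entry in df1$id, so it is carried over as is, matching the expected output in the question.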

Split all columns in one data frame and create two data frames in R

I have a single data frame (let's call it df) that looks like this:
col1 <- c("1/10", "2/30", "1/40", "3/23", "0/17", "7/14")
col2 <- c("2/44", "0/13", "4/55", "6/43", "0/19", "2/34")
col3 <- c("0/36", "0/87", "3/11", "2/12", "4/33", "0/12")
col4 <- c("1/76", "2/65", "2/21", "5/0", "2/26", "1/52")
df <- data.frame(col1,col2,col3,col4)
GOAL: In each cell there are two numbers separated by a "/". Create two data frames: one data frame with the LEFT number and another data frame with the RIGHT number.
The end result would ideally look like this:
df.left.numbers:
col1 col2 col3 col4
1 2 0 1
2 0 0 2
1 4 3 2
3 6 2 5
0 0 4 2
7 2 0 1
df.right.numbers:
col1 col2 col3 col4
10 44 36 76
30 13 87 65
40 55 11 21
23 43 12 0
17 19 33 26
14 34 12 52
I've used strsplit() but that is for 1 column splitting into two within ONE data frame. I also tried the separate() function in the tidyr package however that requires the name of a given column. I am iterating through all of them. I suppose I could write a loop, however I was wondering if anyone had an easier way of making this happen!
Thanks!!
Try this:
require(data.table)
lapply(split(unlist(
lapply(df,tstrsplit,"/"),recursive=FALSE),c("Left","Right")),
as.data.frame)
#$Right
# col12 col22 col32 col42
#1 10 44 36 76
#2 30 13 87 65
#3 40 55 11 21
#4 23 43 12 0
#5 17 19 33 26
#6 14 34 12 52
#$Left
# col11 col21 col31 col41
#1 1 2 0 1
#2 2 0 0 2
#3 1 4 3 2
#4 3 6 2 5
#5 0 0 4 2
#6 7 2 0 1
Not very elegant, but it is short and it works...
col1 <- c("1/10", "2/30", "1/40", "3/23", "0/17", "7/14")
col2 <- c("2/44", "0/13", "4/55", "6/43", "0/19", "2/34")
col3 <- c("0/36", "0/87", "3/11", "2/12", "4/33", "0/12")
col4 <- c("1/76", "2/65", "2/21", "5/0", "2/26", "1/52")
df <- data.frame(col1,col2,col3,col4,stringsAsFactors = FALSE)
dfLeft <- as.data.frame(lapply(df,function(x) gsub("\\/.+","",x)))
dfRight <- as.data.frame(lapply(df,function(x) gsub(".+\\/","",x)))
Another option with purrr package:
library(data.table)
library(purrr)
df %>%
map(tstrsplit, split="/") %>%
transpose() %>% map(as.data.frame) %>%
set_names(c("left", "right"))
#$left
# col1 col2 col3 col4
#1 1 2 0 1
#2 2 0 0 2
#3 1 4 3 2
#4 3 6 2 5
#5 0 0 4 2
#6 7 2 0 1
#$right
# col1 col2 col3 col4
#1 10 44 36 76
#2 30 13 87 65
#3 40 55 11 21
#4 23 43 12 0
#5 17 19 33 26
#6 14 34 12 52
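A base R sketch without extra packages that also converts the pieces to numbers (the gsub version above leaves them as character strings). It assumes every cell has exactly one "/"; left and right are illustrative names.

```r
df <- data.frame(
  col1 = c("1/10", "2/30", "1/40", "3/23", "0/17", "7/14"),
  col2 = c("2/44", "0/13", "4/55", "6/43", "0/19", "2/34"),
  col3 = c("0/36", "0/87", "3/11", "2/12", "4/33", "0/12"),
  col4 = c("1/76", "2/65", "2/21", "5/0", "2/26", "1/52"),
  stringsAsFactors = FALSE
)

# Drop everything from the "/" onwards, then convert to numeric
left <- as.data.frame(lapply(df, function(x) as.numeric(sub("/.*$", "", x))))

# Drop everything up to and including the "/", then convert to numeric
right <- as.data.frame(lapply(df, function(x) as.numeric(sub("^.*/", "", x))))

left
right
```

Both results keep the original column names, since lapply over a data frame preserves them.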

Determine the number of rows with NAs

I have a data frame as follows:
col1 col2 col3
1 23 17 NA
2 55 NA NA
3 24 12 13
4 34 23 12
I'm interested in finding the number of rows in col2 and col3 with NAs.
I was surprised that the following code only gave me 4 instead of 2:
numNAs <- rowSums(is.na(all[,2:3]))
Please help.
Another short solution:
> sum(!complete.cases(dat[-1]))
[1] 2
where dat is the name of your data frame.
DF <- read.table(text=" col1 col2 col3
1 23 17 NA
2 55 NA NA
3 24 12 13
4 34 23 12", header=TRUE)
This gives the number of rows that contain any NA values in column 2 or 3:
sum(rowSums(is.na(DF[,2:3])) > 0)
[1] 2
Another solution:
data <- read.table(text='col1 col2 col3
23 17 NA
55 NA NA
24 12 13
34 23 12', header=T)
sum(apply(is.na(data[, -1]), 1, any))
test <- read.table(textConnection(" col1 col2 col3
1 23 17 NA
2 55 NA NA
3 24 12 13
4 34 23 12"))
> table(test$col2,useNA="ifany")
12 17 23 <NA>
1 1 1 1
> table(test$col3,useNA="ifany")
12 13 <NA>
1 1 2
Another solution adding columns 2 and 3:
> sum(is.na(all[,"col2"] + all[,"col3"]))
[1] 2
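On why the original attempt did not return 2: rowSums gives one count per row, so the result is a vector with four elements, not a single number. A short sketch, using illustrative names na_per_row and rows_with_na:

```r
DF <- read.table(text = "col1 col2 col3
1 23 17 NA
2 55 NA NA
3 24 12 13
4 34 23 12", header = TRUE)

# Number of NAs in col2 and col3, counted separately for each row
na_per_row <- rowSums(is.na(DF[, c("col2", "col3")]))
na_per_row

# Number of rows with at least one NA in those columns
rows_with_na <- sum(na_per_row > 0)
rows_with_na
```

Summing the logical vector na_per_row > 0 counts the TRUE entries, which is the number of affected rows.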
