Say I have this data frame in R.
df <- data.frame( col1 = c(3,4,'NA','NA'), col2 = c('NA','NA',1,5))
col1 col2
1 3 NA
2 4 NA
3 NA 1
4 NA 5
I would like to have new column like this
col1 col2 col3
1 3 NA 3
2 4 NA 4
3 NA 1 1
4 NA 5 5
How shall I do that?
At the moment your df does not contain true NA values but rather the string 'NA'. You probably want true NA, as per @G5W's comment.
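One way to get there is sketched below. It assumes the columns arrived as character (or factor) vectors holding the literal text 'NA'; the helper name to_true_na is made up for illustration.

```r
# Data as posted: mixing numbers with 'NA' turns each column into text
df <- data.frame(col1 = c(3, 4, 'NA', 'NA'), col2 = c('NA', 'NA', 1, 5))

# Hypothetical helper: swap the text "NA" for a real NA, then go numeric
to_true_na <- function(x) {
  x <- as.character(x)   # also covers factor columns from older R versions
  x[x == "NA"] <- NA     # the string "NA" becomes a missing value
  as.numeric(x)
}

# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, to_true_na)
str(df)
```

If the data come from a file, passing na.strings = "NA" to read.csv or read.table (it is the default) avoids the problem at import time.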
Once we have true NA we can use:
df$col3 <- ifelse(is.na(df$col1), df$col2, df$col1)
or, with dplyr:
library(dplyr)
df$col3 <- coalesce(df$col1, df$col2)
We can also use pmax or pmin from base R; this works here because each row has only one non-NA value.
df$col3 <- do.call(pmax, c(df, na.rm=TRUE))
df$col3
#[1] 3 4 1 5
data
df <- structure(list(col1 = c(3L, 4L, NA, NA), col2 = c(NA, NA, 1L,
5L)), .Names = c("col1", "col2"), class = "data.frame", row.names = c("1",
"2", "3", "4"))
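When a row can hold a value in more than one column, pmax returns the largest value rather than the first one. A base R sketch of "first non-NA across columns" follows; the helper first_non_na and the value 7 in row 2 are made up for illustration.

```r
# Made-up example where row 2 has values in both columns
df <- data.frame(col1 = c(3L, 4L, NA, NA), col2 = c(NA, 7L, 1L, 5L))

# Hypothetical helper: walk the columns left to right and keep
# the first value that is not NA
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

cols <- unname(as.list(df))
first   <- do.call(first_non_na, cols)            # 4 wins in row 2
largest <- do.call(pmax, c(cols, na.rm = TRUE))   # 7 wins in row 2

df$col3 <- first
df
```

The two results differ only in row 2, which is why pmax is safe for the data in the question.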
Related
Hello coding community,
If my data frame looks like:
ID Col1 Col2 Col3 Col4
Per1 1 2 3 4
Per2 2 NA NA NA
Per3 NA NA 5 NA
Is there any syntax to delete the row associated with ID = Per2, on the basis that Col2, Col3, AND Col4 = NA? I am hoping for code that will allow me to delete a row on the basis that three specific columns (Col2, Col3, and Col4) ALL are NA. This code would NOT delete the row ID = Per3, even though there are three NAs.
Please note that I know how to delete a specific row, but my data frame is big so I do not want to manually sort through all rows/columns.
Big thanks!
Test for NA with is.na, then use rowSums to drop the rows whose NA count equals the number of columns tested.
dat[rowSums(is.na(dat[c('Col2', 'Col3', 'Col4')])) != 3, ]
# ID Col1 Col2 Col3 Col4
# 1 Per1 1 2 3 4
# 3 Per3 NA NA 5 NA
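The same idea can be written without hard-coding the column count. This is a sketch only; the names cols, all_na and res are illustrative.

```r
dat <- data.frame(ID = c("Per1", "Per2", "Per3"),
                  Col1 = c(1L, 2L, NA), Col2 = c(2L, NA, NA),
                  Col3 = c(3L, NA, 5L), Col4 = c(4L, NA, NA))

# Columns that must ALL be NA for a row to be dropped
cols <- c("Col2", "Col3", "Col4")

# TRUE for rows in which every tested column is NA
all_na <- rowSums(is.na(dat[cols])) == length(cols)

res <- dat[!all_na, ]
res
```

Comparing against length(cols) keeps the filter correct if columns are later added to or removed from cols.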
You can use if_all
library(dplyr)
filter(df, !if_all(c(Col2, Col3, Col4), ~ is.na(.)))
# ID Col1 Col2 Col3 Col4
# 1 Per1 1 2 3 4
# 2 Per3 NA NA 5 NA
data
df <- structure(list(ID = c("Per1", "Per2", "Per3"), Col1 = c(1L, 2L,
NA), Col2 = c(2L, NA, NA), Col3 = c(3L, NA, 5L), Col4 = c(4L,
NA, NA)), class = "data.frame", row.names = c(NA, -3L))
Using if_any
library(dplyr)
df %>%
filter(if_any(Col2:Col4, complete.cases))
ID Col1 Col2 Col3 Col4
1 Per1 1 2 3 4
2 Per3 NA NA 5 NA
Given a data frame in R, how do I determine the number of non-blank values per row?
col1 col2 col3 rowCounts
   1    3              2
        1    6         2
             1         1
                       0
This is how I did it in python:
df['rowCounts'] = df.apply(lambda x: x.count(), axis=1)
What is the R Code for this?
In base R, assuming that blank means NA, rowSums gives a vectorized option: !is.na(df) is a logical matrix in which TRUE (counted as 1) marks a non-NA value, and rowSums adds these up for each row.
df$rowCounts <- rowSums(!is.na(df))
-output
df
# col1 col2 col3 rowCounts
#1 1 3 NA 2
#2 NA 1 6 2
#3 NA NA 1 1
#4 NA NA NA 0
If the blank is ""
df$rowCounts <- rowSums(df != "", na.rm = TRUE)
Or with apply and MARGIN = 1 as a similar syntax to Python (though it will be slower compared to rowSums)
df$rowCounts <- apply(df, 1, function(x) sum(!is.na(x)))
data
df <- structure(list(col1 = c(1L, NA, NA, NA), col2 = c(3L, 1L, NA,
NA), col3 = c(NA, 6L, 1L, NA)), class = "data.frame", row.names = c(NA,
-4L))
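If the blanks are a mix of NA, empty strings and whitespace-only strings, the two tests can be combined. A sketch with made-up character data; the name blank is illustrative.

```r
# Made-up data mixing NA, "" and whitespace-only cells
df <- data.frame(col1 = c("1", " ", NA, ""),
                 col2 = c("3", "1", "", NA),
                 col3 = c(NA, "6", "1", "  "),
                 stringsAsFactors = FALSE)

# One logical column per data column: TRUE where the cell is blank.
# is.na() is checked first, so NA cells come out as TRUE, not NA.
blank <- sapply(df, function(x) is.na(x) | trimws(x) == "")

df$rowCounts <- rowSums(!blank)
df
```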
Suppose I have a data.table with the following data:
colA colB colC result
1 2 3 231
1 NA 2 123
NA 3 NA 345
11 NA NA 754
How would I use dplyr and magrittr to only select the following rows:
colA colB colC result
NA 3 NA 345
11 NA NA 754
The selection criterion is: only 1 non-NA value for columns A-C (i.e. colA, colB, colC)
I have been unable to find a similar question; guessing this is an odd situation.
A base R option would be
df[apply(df, 1, function(x) sum(!is.na(x)) == 1), ]
# colA colB colC
#3 NA 3 NA
#4 11 NA NA
A dplyr option is
df %>% filter(rowSums(!is.na(.)) == 1)
Update
In response to your comment, you can do
df[apply(df[, -ncol(df)], 1, function(x) sum(!is.na(x)) == 1), ]
# colA colB colC result
#3 NA 3 NA 345
#4 11 NA NA 754
Or the same in dplyr
df %>% filter(rowSums(!is.na(.[-length(.)])) == 1)
This assumes that the last column is the one you'd like to ignore.
Sample data
df <- read.table(text = "colA colB colC
1 2 3
1 NA 2
NA 3 NA
11 NA NA", header = TRUE)
Sample data for update
df <- read.table(text =
"colA colB colC result
1 2 3 231
1 NA 2 123
NA 3 NA 345
11 NA NA 754
", header = T)
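If the column to ignore is not guaranteed to be last, the columns to test can be named explicitly instead. A base R sketch using rowSums rather than apply; the names cols and res are illustrative.

```r
df <- data.frame(colA = c(1L, 1L, NA, 11L), colB = c(2L, NA, 3L, NA),
                 colC = c(3L, 2L, NA, NA),
                 result = c(231L, 123L, 345L, 754L))

# Only these columns take part in the NA count
cols <- c("colA", "colB", "colC")

# Keep rows with exactly one non-NA value among the tested columns
res <- df[rowSums(!is.na(df[cols])) == 1, ]
res
```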
Another option is filter with map
library(dplyr)
library(purrr)
df %>%
filter(map(select(., starts_with('col')), ~ !is.na(.)) %>%
reduce(`+`) == 1)
# colA colB colC result
#1 NA 3 NA 345
#2 11 NA NA 754
Or another option is to use transmute_at
df %>%
transmute_at(vars(starts_with('col')), ~ !is.na(.)) %>%
reduce(`+`) %>%
magrittr::equals(1) %>% filter(df, .)
# colA colB colC result
#1 NA 3 NA 345
#2 11 NA NA 754
data
df <- structure(list(colA = c(1L, 1L, NA, 11L), colB = c(2L, NA, 3L,
NA), colC = c(3L, 2L, NA, NA), result = c(231L, 123L, 345L, 754L
)), class = "data.frame", row.names = c(NA, -4L))
I think this would be possible with filter_at, but I was not able to make it work. Here is one attempt with filter and pmap_lgl; in select you can give a range of columns, their positions, or any other tidyselect helper.
library(dplyr)
library(purrr)
df %>%
filter(pmap_lgl(select(., colA:colC), ~sum(!is.na(c(...))) == 1))
# colA colB colC result
#1 NA 3 NA 345
#2 11 NA NA 754
data
df <- structure(list(colA = c(1L, 1L, NA, 11L), colB = c(2L, NA, 3L,
NA), colC = c(3L, 2L, NA, NA), result = c(231L, 123L, 345L, 754L
)), class = "data.frame", row.names = c(NA, -4L))
I am trying to do rowSums but I got zero for the last row and I need it to be "NA".
My df is
a b c sum
1 1 4 7 12
2 2 NA 8 10
3 3 5 NA 8
4 NA NA NA NA
I used this code, based on the question "Sum of two Columns of Data Frame with NA Values":
df$sum<-rowSums(df[,c("a", "b", "c")], na.rm=T)
Any advice will be greatly appreciated
For each row check if it is all NA and if so return NA; otherwise, apply sum. We have selected columns a, b and c even though that is all the columns because the poster indicated that there might be additional ones.
sum_or_na <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
transform(df, sum = apply(df[c("a", "b", "c")], 1, sum_or_na))
giving:
a b c sum
1 1 4 7 12
2 2 NA 8 10
3 3 5 NA 8
4 NA NA NA NA
Note
df in reproducible form is assumed to be:
df <- structure(list(a = c(1L, 2L, 3L, NA), b = c(4L, NA, 5L, NA),
c = c(7L, 8L, NA, NA)),
row.names = c("1", "2", "3", "4"), class = "data.frame")
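For larger data, a vectorized variant avoids calling a function once per row. This is a sketch of the same rule (all NA gives NA, otherwise sum ignoring NA); the names cols and s are illustrative.

```r
df <- data.frame(a = c(1L, 2L, 3L, NA), b = c(4L, NA, 5L, NA),
                 c = c(7L, 8L, NA, NA))

cols <- c("a", "b", "c")

# Row sums ignoring NA; an all-NA row gives 0 at this point
s <- rowSums(df[cols], na.rm = TRUE)

# Overwrite the sum with NA where no tested column has a value
s[rowSums(!is.na(df[cols])) == 0] <- NA

df$sum <- s
df
```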
I have a data frame that looks like this:
cat df1 df2 df3
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 2 1 NA NA
5 2 2 NA NA
6 2 3 NA NA
I want to populate df3 so that when cat = 1, df3 = df2 and when cat = 2, df3 = df1. However, I am getting a few different error messages.
My current code looks like this:
df$df3[df$cat == 1] <- df$df2
df$df3[df$cat == 2] <- df$df1
Try this code, which uses the same row filter on both sides of each assignment:
df[df$cat==1,"df3"]<-df[df$cat==1,"df2"]
df[df$cat==2,"df3"]<-df[df$cat==2,"df1"]
The output:
df
cat df1 df2 df3
1 1 NA 1 1
2 1 NA 2 2
3 1 NA 3 3
4 2 1 NA 1
5 2 2 NA 2
6 2 3 NA 3
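The attempt in the question goes wrong because the left side selects three rows while the right side supplies all six values. Base R then warns that the number of items to replace is not a multiple of the replacement length and uses only the first three values. A minimal sketch of the fix with logical masks; the names is1 and is2 are illustrative.

```r
df <- data.frame(cat = c(1L, 1L, 1L, 2L, 2L, 2L),
                 df1 = c(NA, NA, NA, 1L, 2L, 3L),
                 df2 = c(1L, 2L, 3L, NA, NA, NA),
                 df3 = NA_integer_)

# Build each mask once and apply it to BOTH sides,
# so source and target always have the same length
is1 <- df$cat == 1
is2 <- df$cat == 2

df$df3[is1] <- df$df2[is1]
df$df3[is2] <- df$df1[is2]
df
```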
You can try
ifelse(df$cat == 1, df$df2, df$df1)
[1] 1 2 3 1 2 3
# saving
df$df3 <- ifelse(df$cat == 1, df$df2, df$df1)
# if there are other values than 1 and 2 you can try a nested ifelse
# that is setting other values to NA
ifelse(df$cat == 1, df$df2, ifelse(df$cat == 2, df$df1, NA))
# or you can try a tidyverse solution.
library(tidyverse)
df %>%
mutate(df3=case_when(cat == 1 ~ df2,
cat == 2 ~ df1))
cat df1 df2 df3
1 1 NA 1 1
2 1 NA 2 2
3 1 NA 3 3
4 2 1 NA 1
5 2 2 NA 2
6 2 3 NA 3
# data
df <- structure(list(cat = c(1L, 1L, 1L, 2L, 2L, 2L), df1 = c(NA, NA,
NA, 1L, 2L, 3L), df2 = c(1L, 2L, 3L, NA, NA, NA), df3 = c(NA,
NA, NA, NA, NA, NA)), .Names = c("cat", "df1", "df2", "df3"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))