Remove NA values in R [duplicate] - r

This question already has answers here:
Remove rows with all or some NAs (missing values) in data.frame
(18 answers)
Closed 6 years ago.
How can I remove rows which include NA values in a subset of columns?
For the following data.frame, I want to remove rows which have NA in both columns ID1 and ID2:
name ID1 ID2
a NA NA
b NA 2
c 3 NA
I want to get this result:
name ID1 ID2
b NA 2
c 3 NA

mydf[!(is.na(mydf$ID1) & is.na(mydf$ID2)),]

subset(mydf, !(is.na(ID1) & is.na(ID2)))
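Either one-liner can be checked against the example from the question. The sketch below rebuilds that data frame and generalises the test to a vector of column names; `id_cols`, `all_missing` and `result` are names made up for illustration, not part of the original answers.

```r
# Example data from the question
mydf <- data.frame(
  name = c("a", "b", "c"),
  ID1  = c(NA, NA, 3),
  ID2  = c(NA, 2, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: the columns to check for missing values
id_cols <- c("ID1", "ID2")

# Count the NAs per row within those columns only,
# then flag rows where every checked column is NA
all_missing <- rowSums(is.na(mydf[id_cols])) == length(id_cols)

# Keep the rows that have at least one non-NA value in the checked columns
result <- mydf[!all_missing, ]
print(result)
```

With more than two ID columns, only `id_cols` would need to change.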

Related

R - Select all rows that have one NA value at most? [duplicate]

This question already has answers here:
How to delete rows from a dataframe that contain n*NA
(4 answers)
Closed 3 days ago.
I'm trying to impute my data and keep as many observations as I can. I want to select observations that have at most one NA value from the PimaIndiansDiabetes2 dataset in the mlbench package, loaded with data(PimaIndiansDiabetes2, package = "mlbench").
For example:
Var1 Var2 Var3
1 NA NA
2 34 NA
3 NA NA
4 NA 55
5 NA NA
6 40 28
What I would like returned:
Var1 Var2 Var3
2 34 NA
4 NA 55
6 40 28
The code below returns the rows that contain NA values. I know that I could use merge() to join the observations with exactly one NA value to the observations without NA values, but I'm not sure how to extract those.
na_rows <- df[!complete.cases(df), ]
A base R solution:
df[rowSums(is.na(df)) <= 1, ]
Its dplyr equivalent:
library(dplyr)
df %>%
filter(rowSums(is.na(pick(everything()))) <= 1)
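To see what the base R version does, here is a small sketch on the example table from the question. It assumes Var1 is an ordinary column rather than a row index; `na_count` and `kept` are illustrative names.

```r
# Example data from the question (Var1 treated as a regular column)
df <- data.frame(
  Var1 = 1:6,
  Var2 = c(NA, 34, NA, NA, NA, 40),
  Var3 = c(NA, NA, NA, 55, NA, 28)
)

# is.na(df) is a logical matrix; rowSums counts the TRUEs, i.e. NAs per row
na_count <- rowSums(is.na(df))

# Keep rows with zero or one missing value
kept <- df[na_count <= 1, ]
print(kept)
```

Changing the threshold `1` to another number keeps rows with at most that many missing values.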

Removing Rows that Are Exclusively NA [duplicate]

This question already has answers here:
Remove rows in R matrix where all data is NA [duplicate]
(2 answers)
Closed 1 year ago.
Currently I have data that looks like this (just with significantly more NA filled columns and significantly more rows that are exclusively NAs):
Column 1   Column 2   Column 3
NA         NA         NA
Texas      Oklahoma   NA
NA         Florida    Florida
NA         NA         NA
I'd like for it to look like this
Column 1   Column 2   Column 3
Texas      Oklahoma   NA
NA         Florida    Florida
I don't want to get rid of all rows that have an NA value, I just want to get rid of all rows that have nothing but NA values.
Thanks in advance.
You can use the janitor package:
janitor::remove_empty(dat, which = "rows")
If you want to get rid of rows that contain only NA but keep rows that have some NA values, you could combine is.na and rowSums:
dat <- dat[rowSums(!is.na(dat)) > 0, ]
!is.na(dat) returns TRUE for each non-missing value and FALSE for each NA. rowSums treats these as 1 and 0, so it counts the non-missing values in each row. Any row with at least one value that is not NA has a sum greater than 0 and is kept.
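A minimal sketch of the rowSums approach on the example from the question. The column names `Column1` to `Column3` and the variables `non_missing` and `cleaned` are assumptions for illustration.

```r
# Example data from the question; rows 1 and 4 contain only NA
dat <- data.frame(
  Column1 = c(NA, "Texas", NA, NA),
  Column2 = c(NA, "Oklahoma", "Florida", NA),
  Column3 = c(NA, NA, "Florida", NA),
  stringsAsFactors = FALSE
)

# Number of non-missing values in each row
non_missing <- rowSums(!is.na(dat))

# Keep rows that have at least one non-missing value
cleaned <- dat[non_missing > 0, ]
print(cleaned)
```

Rows that are only partly NA survive, which is the difference from na.omit or complete.cases.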

How can I show non-NA values when selecting columns? [duplicate]

This question already has answers here:
Remove rows with all or some NAs (missing values) in data.frame
(18 answers)
Omit rows containing specific column of NA
(10 answers)
Closed 1 year ago.
I am checking a list of ID numbers from one data frame against another to see if they exist, and I want to only see the values from two columns in the target data frame.
Thus:
df1[!is.na(df1$id %in% df2$id), c("D2", "D8")]
This returns:
D2 D8
1 2021-03-16T13:58:22.4700000Z NA
2 2021-03-22T08:10:12.3190000Z NA
3 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
4 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
5 2021-03-09T08:07:45.4410000Z NA
6 2021-03-23T12:03:46.4720000Z NA
7 2021-04-05T09:01:55.9520000Z NA
8 2021-03-29T12:57:58.5500000Z NA
9 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
10 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
I want to know if there is a way to exclude NA values in either of the columns so that the returned results look like this:
D2 D8
1 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
2 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
3 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
4 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
You can drop the rows that have NA in either column by calling na.omit on the selected columns:
na.omit(df1[, c("D2", "D8")])
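If the other columns of the data frame should stay in the result, one alternative is to build a row filter with complete.cases on just the two columns. This is only a sketch: the data below is a shortened, made-up stand-in for the data in the question, and `keep` and `both_present` are illustrative names.

```r
# Made-up stand-in for df1 (timestamps shortened to dates)
df1 <- data.frame(
  id = 1:4,
  D2 = c("2021-03-16", "2021-03-08", "2021-03-09", "2021-03-23"),
  D8 = c(NA, "2021-03-10", "2021-03-15", NA),
  stringsAsFactors = FALSE
)

# TRUE for rows where neither D2 nor D8 is NA
keep <- complete.cases(df1[, c("D2", "D8")])

# Filter the rows but keep every column, including id
both_present <- df1[keep, ]
print(both_present)
```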

R - counting with NA in dataframe [duplicate]

This question already has answers here:
ignore NA in dplyr row sum
(6 answers)
Closed 4 years ago.
Let's say that I have this data frame in R:
df <- read.table(text="
id a b c
1 42 3 2 NA
2 42 NA 6 NA
3 42 1 NA 7", header=TRUE)
I'd like to sum columns a, b and c into a new column d, so the result should look like this:
id a b c d
1 42 3 2 NA 5
2 42 NA 6 NA 6
3 42 1 NA 7 8
My code below doesn't work because of the NA values. Please note that I have to choose which columns to sum, since my real data frame has some columns that I don't want to include.
df %>%
mutate(d = a + b + c)
You can use rowSums for this, which has an na.rm parameter to drop NA values:
df %>% mutate(d=rowSums(tibble(a,b,c), na.rm=TRUE))
Or, without dplyr, using just base R:
df$d <- rowSums(subset(df, select=c(a,b,c)), na.rm=TRUE)
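One thing to be aware of: with na.rm = TRUE, a row in which every selected column is NA sums to 0, not NA. The sketch below reuses the data from the question and shows one possible way to restore NA for such rows; `cols` and `all_na` are illustrative names, and whether 0 or NA is the right answer depends on your data.

```r
df <- read.table(text = "
id a b c
1 42 3 2 NA
2 42 NA 6 NA
3 42 1 NA 7", header = TRUE)

# Columns to add up (chosen explicitly, as the question requires)
cols <- c("a", "b", "c")

# Row-wise sum that ignores missing values
df$d <- rowSums(df[cols], na.rm = TRUE)

# Rows where all selected columns are NA would get 0 above;
# set them back to NA (no such row exists in this example)
all_na <- rowSums(!is.na(df[cols])) == 0
df$d[all_na] <- NA

print(df)
```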

missing values filling in R [duplicate]

This question already has answers here:
How to implement coalesce efficiently in R
(9 answers)
nested ifelse() is the worst; what's the best? [duplicate]
(3 answers)
Closed 5 years ago.
I would like help filling a new column, col4, in R.
col4 should equal col1. If col1 is NA, then col4 should equal col2. If both col1 and col2 are NA, then col4 should equal col3.
id col1 col2 col3
1 10 NA NA
2 NA 12 NA
3 NA NA 13
4 NA NA 1
5 2 3 NA
Expected result:
id col4
1 10
2 12
3 13
4 1
5 2
This is easily done with coalesce from dplyr. This solution works for any number of columns:
library(dplyr)
data %>%
mutate(col4 = coalesce(!!!data[-1]))
Result:
id col1 col2 col3 col4
1 1 10 NA NA 10
2 2 NA 12 NA 12
3 3 NA NA 13 13
4 4 NA NA 1 1
5 5 2 3 NA 2
Data:
data = read.table(text = "id col1 col2 col3
1 10 NA NA
2 NA 12 NA
3 NA NA 13
4 NA NA 1
5 2 3 NA", header = TRUE)
Notes:
!!! shouldn't be confused with the negation operator ! (an understandable confusion). It is the splice operator from rlang, which dplyr and the rest of the tidyverse support, and it enables explicit splicing.
What this means is that instead of inputting the entire data frame into coalesce (coalesce(data[-1])), I am separating the columns of data[-1] (or elements of the list) and have each element as an input to coalesce. So this:
coalesce(!!!data[-1])
is actually equivalent to this:
coalesce(col1, col2, col3)
The advantage of writing it this way is that you don't have to know the column names nor how many columns there are to begin with.
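For comparison, the same "first non-NA value across columns" idea can be sketched in base R without dplyr. This is not the answer's method, just an illustration: `coalesce2` is a hypothetical two-argument helper defined here, and Reduce applies it across the columns from left to right.

```r
data <- read.table(text = "id col1 col2 col3
1 10 NA NA
2 NA 12 NA
3 NA NA 13
4 NA NA 1
5 2 3 NA", header = TRUE)

# Hypothetical helper: take x where it is present, otherwise fall back to y
coalesce2 <- function(x, y) ifelse(is.na(x), y, x)

# Fold the helper over all columns except id, left to right
data$col4 <- Reduce(coalesce2, data[-1])

print(data)
```

As with the spliced version, this does not need the column names or the number of columns in advance.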
Using dplyr::coalesce, or any of the answers at How to implement coalesce in R?:
xx$col4 = with(xx, coalesce(col1, col2, col3))
