How can I show non-NA values when selecting columns? [duplicate] - r

I am checking a list of ID numbers from one data frame against another to see if they exist, and I want to only see the values from two columns in the target data frame.
Thus:
df1[!is.na(df1$id %in% df2$id), c("D2", "D8")]
This returns:
D2 D8
1 2021-03-16T13:58:22.4700000Z NA
2 2021-03-22T08:10:12.3190000Z NA
3 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
4 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
5 2021-03-09T08:07:45.4410000Z NA
6 2021-03-23T12:03:46.4720000Z NA
7 2021-04-05T09:01:55.9520000Z NA
8 2021-03-29T12:57:58.5500000Z NA
9 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
10 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
I want to know if there is a way to exclude NA values in either of the columns so that the returned results look like this:
D2 D8
1 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
2 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
3 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
4 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z

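You can drop the rows that contain NA in either column by wrapping the subset in na.omit(). Note that %in% never returns NA, so the !is.na() wrapper in the question selects every row of df1; use the %in% result itself as the row filter:
na.omit(df1[df1$id %in% df2$id, c("D2", "D8")])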
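A minimal sketch of the same idea on made-up data. The ids and dates below are hypothetical stand-ins for the question's df1 and df2, not the asker's real values. It filters on the id match first, then drops rows with NA in either column, with complete.cases() shown as an alternative to na.omit() that keeps the same rows:

```r
# Hypothetical data standing in for the question's df1 and df2
df1 <- data.frame(
  id = c(1, 2, 3, 4),
  D2 = c("2021-03-16", "2021-03-08", "2021-03-09", "2021-03-23"),
  D8 = c(NA, "2021-03-10", "2021-03-15", NA)
)
df2 <- data.frame(id = c(1, 2, 3))

# Rows of df1 whose id also appears in df2, restricted to the two columns
matched <- df1[df1$id %in% df2$id, c("D2", "D8")]

# Drop rows with NA in either column; both approaches keep the same rows
res_omit <- na.omit(matched)
res_cc <- matched[complete.cases(matched), ]
print(res_omit)
```

na.omit() also attaches an "na.action" attribute recording which rows were dropped, while the complete.cases() form returns a plain subset; which one is preferable probably depends on whether that bookkeeping is wanted.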

Related

R - Select all rows that have one NA value at most? [duplicate]

I'm trying to impute my data and keep as many observations as I can. I want to select the observations that have at most one NA value from the PimaIndiansDiabetes2 data set in the mlbench package, loaded with data(PimaIndiansDiabetes2, package = "mlbench").
For example:
Var1 Var2 Var3
1 NA NA
2 34 NA
3 NA NA
4 NA 55
5 NA NA
6 40 28
What I would like returned:
Var1 Var2 Var3
2 34 NA
4 NA 55
6 40 28
The code below returns the rows that contain NA values. I know I could use merge() to join the observations with one NA value to the observations without NA values, but I'm not sure how to extract the former.
na_rows <- df[!complete.cases(df), ]
A base R solution:
df[rowSums(is.na(df)) <= 1, ]
Its dplyr equivalent:
library(dplyr)
df %>%
filter(rowSums(is.na(pick(everything()))) <= 1)
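To see why the rowSums() condition works, here is a short sketch that rebuilds the example table from the question (treating Var1 as an ordinary column, which is an assumption about how the table was meant to be read) and inspects the per-row NA counts before filtering:

```r
# Example data from the question, with Var1 assumed to be a regular column
df <- data.frame(
  Var1 = 1:6,
  Var2 = c(NA, 34, NA, NA, NA, 40),
  Var3 = c(NA, NA, NA, 55, NA, 28)
)

# is.na(df) is a logical matrix; rowSums() counts the TRUE cells per row
na_count <- rowSums(is.na(df))

# Keep only the rows with at most one missing value
kept <- df[na_count <= 1, ]
print(kept)
```

Changing the threshold on na_count gives the "at most n NA values" variant without any other change.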

Removing Rows that Are Exclusively NA [duplicate]

Currently I have data that looks like this (just with significantly more NA filled columns and significantly more rows that are exclusively NAs):
Column 1   Column 2   Column 3
NA         NA         NA
Texas      Oklahoma   NA
NA         Florida    Florida
NA         NA         NA
I'd like for it to look like this
Column 1   Column 2   Column 3
Texas      Oklahoma   NA
NA         Florida    Florida
I don't want to get rid of all rows that have an NA value, I just want to get rid of all rows that have nothing but NA values.
Thanks in advance.
You can use the janitor package:
janitor::remove_empty(dat, which = "rows")
If you want to get rid of rows that contain only NA but keep rows that have some NA values, you can combine is.na and rowSums:
dat <- dat[rowSums(!is.na(dat)) > 0, ]
!is.na returns TRUE for every cell that is not NA and FALSE otherwise, and rowSums treats these as 1 and 0. Any row with at least one non-NA value therefore has a sum greater than 0 and is kept.
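As a rough check of the rowSums approach, the sketch below rebuilds the question's example (the column names Column1 to Column3 are a syntactic stand-in for "Column 1" to "Column 3") and shows the per-row counts of non-NA cells:

```r
# Example data from the question; column names adapted to valid R names
dat <- data.frame(
  Column1 = c(NA, "Texas", NA, NA),
  Column2 = c(NA, "Oklahoma", "Florida", NA),
  Column3 = c(NA, NA, "Florida", NA)
)

# Number of non-missing cells in each row
non_na <- rowSums(!is.na(dat))

# Keep rows that have at least one non-missing cell
cleaned <- dat[non_na > 0, ]
print(cleaned)
```

Rows 1 and 4 have a count of 0 and are dropped; rows 2 and 3 keep their partial NA values.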

R get every column with NA [duplicate]

I need a function in R that does the following.
I have a data frame with some data:
mydata <- data.frame (matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3))
mydata
X1 X2 X3
1 1 NA NA
2 2 2 NA
3 3 3 2
Now I want to check every column of this data frame for NA values and create a vector that stores 0 if the column contains an NA and 1 if it does not.
So check column X1: if NA is in this column write 0 to the vector, if not write 1 to the vector. Then check the next column and so on.
After checking mydata the vector should look like this
(1 0 0)
with
colnames(mydata)[colSums(is.na(mydata)) > 0]
I get the column names which have NA. But how can I use this function to create the vector?
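colSums(is.na(mydata)) counts the NA values in each column. Negating the counts with ! gives TRUE for columns with no NA and FALSE otherwise, and as.integer converts that logical vector to 1 and 0: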
as.integer(!colSums(is.na(mydata)))
#[1] 1 0 0
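An alternative sketch, if it helps to keep the column names attached to the result: loop over the columns with sapply() and test each one with anyNA(). This is one possible variant, not the answerer's code; the name flags is made up for illustration.

```r
# Data from the question
mydata <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))

# For each column: 1 if it contains no NA, 0 otherwise (names are kept)
flags <- sapply(mydata, function(col) as.integer(!anyNA(col)))
print(flags)

# The unnamed form from the answer above gives the same values
same <- as.integer(!colSums(is.na(mydata)))
```

The named vector makes it easier to see which column each flag refers to; use unname(flags) if a bare vector is needed.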

Remove NA values in R [duplicate]

How can I remove rows which include NA values in a subset of columns?
For the following data.frame, I want to remove rows which have NA in both columns ID1 and ID2:
name ID1 ID2
a NA NA
b NA 2
c 3 NA
I want to receive this one:
name ID1 ID2
b NA 2
c 3 NA
Keep the rows where ID1 and ID2 are not both NA:
mydf[!(is.na(mydf$ID1) & is.na(mydf$ID2)), ]
or, equivalently, with subset():
subset(mydf, !(is.na(ID1) & is.na(ID2)))
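When more than two columns are involved, chaining is.na() calls gets unwieldy. A possible generalization, sketched here with a hypothetical helper vector id_cols, counts the missing cells across the chosen columns and drops the rows where every one of them is NA:

```r
# Example data from the question
mydf <- data.frame(
  name = c("a", "b", "c"),
  ID1 = c(NA, NA, 3),
  ID2 = c(NA, 2, NA)
)

# Columns to check (hypothetical helper; extend as needed)
id_cols <- c("ID1", "ID2")

# TRUE for rows where all of the chosen columns are NA
all_missing <- rowSums(is.na(mydf[id_cols])) == length(id_cols)

# Keep the rows that have a value in at least one chosen column
result <- mydf[!all_missing, ]
print(result)
```

For two columns this should match the explicit is.na(ID1) & is.na(ID2) test; the only difference is that the column list lives in one place.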

Merging data frames of different row length in R [duplicate]

Hello I have been looking for a solution for quite some time. I'm sure the answer is easy but I've been pulling my hair out here!
I have two data frames that are similar (in fact one represents a more complete dataset). They both have two columns, one containing string values as a factor and one containing numerical values.
df.A looks like this:
Category Number
A 1
B 2
C 3
D 4
and df.B looks like this
Category Number
A 5
B 6
C 7
These categories (ABCD) are common between the two dataframes. In trying to get df.B to have a category D with a NA or 0 value (I am working with percentages so either NA or 0 is fine), my code looks like this:
proto <- df.A
proto$number <- NULL
df.B <- rbind.fill(proto,df.B)
My thought is this would add the fourth row for category D and give NA value but instead results in
Category Number
A NA
B NA
C NA
D NA
NA 5
NA 6
NA 7
I tried removing the factor class from category on both df.A and df.B, tried using rbind.fill.matrix instead...to be honest I am pretty new to R and this is giving me a lot of trouble. How do I get R to recognize that ABCD are the same factor across dataframes?
You can achieve the desired result by using merge with all = TRUE, which keeps the categories found in either data frame:
merge(df.A, df.B, by = 'Category', all = TRUE)
which will produce the following output:
# Category Number.x Number.y
#1 A 1 5
#2 B 2 6
#3 C 3 7
#4 D 4 NA
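If the goal is a version of df.B that simply gains the missing category (rather than a combined table with both Number columns), one option is a left join against only the Category column of df.A. The sketch below assumes the two data frames look like the ones in the question; the names df.B.full and df.B.zero are made up for illustration.

```r
# Data frames as described in the question
df.A <- data.frame(Category = c("A", "B", "C", "D"), Number = 1:4)
df.B <- data.frame(Category = c("A", "B", "C"), Number = 5:7)

# Left join: keep every category from df.A, take Number from df.B
df.B.full <- merge(df.A["Category"], df.B, by = "Category", all.x = TRUE)

# Optional: use 0 instead of NA for the categories missing from df.B
df.B.zero <- df.B.full
df.B.zero$Number[is.na(df.B.zero$Number)] <- 0
print(df.B.zero)
```

Using all.x = TRUE instead of all = TRUE restricts the result to the categories present in df.A, which matters only if df.B could contain categories that df.A lacks.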

Resources