Removing Rows that Are Exclusively NA [duplicate] - r

This question already has answers here:
Remove rows in R matrix where all data is NA [duplicate]
(2 answers)
Closed 1 year ago.
Currently I have data that looks like this (just with significantly more NA filled columns and significantly more rows that are exclusively NAs):
Column 1   Column 2   Column 3
NA         NA         NA
Texas      Oklahoma   NA
NA         Florida    Florida
NA         NA         NA
I'd like for it to look like this:
Column 1   Column 2   Column 3
Texas      Oklahoma   NA
NA         Florida    Florida
I don't want to get rid of all rows that have an NA value, I just want to get rid of all rows that have nothing but NA values.
Thanks in advance.

You can use the janitor package:
janitor::remove_empty(dat, which = "rows")

If you want to drop rows that contain only NA but keep rows that have some NA values, you can combine is.na and rowSums:
dat <- dat[rowSums(!is.na(dat)) > 0, ]
!is.na(dat) is TRUE wherever a value is present, and rowSums treats TRUE as 1 and FALSE as 0. Any row with at least one value that is not NA therefore has a sum greater than 0 and is kept.
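As a minimal sketch of the rowSums approach, the snippet below rebuilds the question's table as a toy data frame (the column names Column1 to Column3 and the object names non_missing and cleaned are assumptions made for illustration) and filters it:

```r
# Toy data mirroring the table in the question (illustrative only).
dat <- data.frame(
  Column1 = c(NA, "Texas", NA, NA),
  Column2 = c(NA, "Oklahoma", "Florida", NA),
  Column3 = c(NA, NA, "Florida", NA),
  stringsAsFactors = FALSE
)

# Number of non-NA cells in each row.
non_missing <- rowSums(!is.na(dat))

# Keep only the rows that have at least one observed value.
cleaned <- dat[non_missing > 0, ]
print(cleaned)
```

Rows that are only partly NA survive because their count of non-NA cells is still above zero.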

Related

R - Select all rows that have one NA value at most? [duplicate]

This question already has answers here:
How to delete rows from a dataframe that contain n*NA
(4 answers)
Closed 3 days ago.
I'm trying to impute my data and keep as many observations as I can. I want to select observations that have at most 1 NA value from the PimaIndiansDiabetes2 data set in the mlbench package, loaded with data(PimaIndiansDiabetes2, package = "mlbench").
For example:
Var1 Var2 Var3
1 NA NA
2 34 NA
3 NA NA
4 NA 55
5 NA NA
6 40 28
What I would like returned:
Var1 Var2 Var3
2 34 NA
4 NA 55
6 40 28
The code below returns the rows that contain NA values, and I know that I could use merge() to join the observations with 1 NA value to the observations without NA values. I'm not sure how to extract those, though.
na_rows <- df[!complete.cases(df), ]
A base R solution:
df[rowSums(is.na(df)) <= 1, ]
Its dplyr equivalent:
library(dplyr)
df %>%
  filter(rowSums(is.na(pick(everything()))) <= 1)
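To make the threshold explicit, here is a small base R sketch run on the question's example data. The helper name keep_rows_max_na and its max_na argument are hypothetical, introduced only for this illustration:

```r
# Example data from the question.
df <- data.frame(
  Var1 = 1:6,
  Var2 = c(NA, 34, NA, NA, NA, 40),
  Var3 = c(NA, NA, NA, 55, NA, 28)
)

# Hypothetical helper: keep rows with at most `max_na` missing values.
keep_rows_max_na <- function(data, max_na = 1) {
  na_count <- rowSums(is.na(data))  # missing values per row
  data[na_count <= max_na, , drop = FALSE]
}

kept <- keep_rows_max_na(df, max_na = 1)
print(kept)
```

Setting max_na = 0 would reduce this to the behaviour of complete.cases().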

Move data from small data frame to columns in large dataframe with R [duplicate]

This question already has answers here:
Combine two data frames by rows (rbind) when they have different sets of columns
(14 answers)
Closed last year.
I have two data frames in R. There is not an ID of any sort in DF1 to use to map the rows to - I just need the entire column copied over for a data migration.
DF1 has 1349 named columns, and empty rows.
DF2 has 10 named columns and 2990 rows of sample data.
I made a small scale example:
DF1 <- data.frame(matrix(ncol = 10, nrow = 0))
colnames(DF1) <- c('one','two','three','four','five','six','seven','eight','nine','ten')
one <- c(1,54,7,3,6,3)
seven <- c('MLS','Marshall','AAE','JC','AAA','EXE')
DF2 <- data.frame(one,seven)
The column names are the same, but they are not blocked together in DF1 - they are randomly dispersed.
I want to find an efficient way of mapping the 10 columns and all of the rows from DF2 to DF1 without needing to type in each column name, as I will also need to do this with a much larger data frame later.
I expect every column in DF1 other than the 'imported' columns from DF2 to be blank/null after the import, and this is okay. Is there an easy way to do this?
Thanks!
dplyr has a nice utility for this:
dplyr::bind_rows(DF1, DF2)
# one two three four five six seven eight nine ten
# 1 1 NA NA NA NA NA MLS NA NA NA
# 2 54 NA NA NA NA NA Marshall NA NA NA
# 3 7 NA NA NA NA NA AAE NA NA NA
# 4 3 NA NA NA NA NA JC NA NA NA
# 5 6 NA NA NA NA NA AAA NA NA NA
# 6 3 NA NA NA NA NA EXE NA NA NA
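If you would rather avoid the dplyr dependency, one possible base R sketch (not the method from the answer above) is to build an all-NA frame with DF1's column layout and then copy DF2's columns in by name. The object name out is an assumption for illustration:

```r
# Small-scale example from the question.
DF1 <- data.frame(matrix(ncol = 10, nrow = 0))
colnames(DF1) <- c('one','two','three','four','five','six','seven','eight','nine','ten')
one <- c(1, 54, 7, 3, 6, 3)
seven <- c('MLS', 'Marshall', 'AAE', 'JC', 'AAA', 'EXE')
DF2 <- data.frame(one, seven, stringsAsFactors = FALSE)

# All-NA frame with DF1's columns and as many rows as DF2.
out <- as.data.frame(
  matrix(NA, nrow = nrow(DF2), ncol = ncol(DF1),
         dimnames = list(NULL, names(DF1)))
)

# Copy every DF2 column into the column of the same name.
out[names(DF2)] <- DF2
print(out)
```

This assumes every column name in DF2 also exists in DF1; a name that is missing from DF1 would be appended as a new column instead.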

How can I show non-NA values when selecting columns? [duplicate]

This question already has answers here:
Remove rows with all or some NAs (missing values) in data.frame
(18 answers)
Omit rows containing specific column of NA
(10 answers)
Closed 1 year ago.
I am checking a list of ID numbers from one data frame against another to see if they exist, and I want to only see the values from two columns in the target data frame.
Thus:
df1[!is.na(df1$id %in% df2$id), c("D2", "D8")]
This returns:
D2 D8
1 2021-03-16T13:58:22.4700000Z NA
2 2021-03-22T08:10:12.3190000Z NA
3 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
4 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
5 2021-03-09T08:07:45.4410000Z NA
6 2021-03-23T12:03:46.4720000Z NA
7 2021-04-05T09:01:55.9520000Z NA
8 2021-03-29T12:57:58.5500000Z NA
9 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
10 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
I want to know if there is a way to exclude NA values in either of the columns so that the returned results look like this:
D2 D8
1 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
2 2021-03-08T09:17:35.0650000Z 2021-03-10T15:36:53.2090000Z
3 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
4 2021-03-09T11:54:34.6420000Z 2021-03-15T12:50:52.6690000Z
You can drop the rows that have an NA in either of the two columns with na.omit:
na.omit(df1[, c("D2", "D8")])
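A minimal sketch of what na.omit does here, using made-up, shortened timestamps (the data values and the object names both, keep, and same are assumptions for illustration):

```r
# Illustrative data; the timestamps are shortened, made-up values.
df1 <- data.frame(
  id = 1:4,
  D2 = c("2021-03-16", "2021-03-08", "2021-03-09", "2021-03-09"),
  D8 = c(NA, "2021-03-10", NA, "2021-03-15"),
  stringsAsFactors = FALSE
)

# Drop every row that has an NA in D2 or D8.
both <- na.omit(df1[, c("D2", "D8")])

# The same rows, selected with a logical index instead.
keep <- complete.cases(df1[, c("D2", "D8")])
same <- df1[keep, c("D2", "D8")]

print(both)
```

Both results keep the original row names (2 and 4 here). Running rownames(both) <- NULL renumbers them from 1, as in the desired output.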

find the row with highest number of NA value in R

I have a data frame, df:
1 a c NA NA
2 a a a NA
3 c NA NA NA
First, I want to find which row has the highest number of NA values. I am also interested in finding the rows that have more than 2 NA values.
How can I do it in R?
na_rows = rowSums(is.na(df)) gives the count of NA by row. You can then look at which.max(na_rows) and which(na_rows > 2).
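A short sketch on the question's example shows the pieces together. The column names V1 to V4 are assumptions, since the question does not show any:

```r
# Example data from the question (column names are assumed).
df <- data.frame(
  V1 = c("a", "a", "c"),
  V2 = c("c", "a", NA),
  V3 = c(NA, "a", NA),
  V4 = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Count of NA values in each row.
na_rows <- rowSums(is.na(df))

# Index of the row with the most NAs (the first one if there is a tie).
most_na <- which.max(na_rows)

# Indices of all rows with more than 2 NAs.
over_two <- which(na_rows > 2)

# All rows tied for the maximum, in case there is more than one.
tied <- which(na_rows == max(na_rows))
```

Note that which.max reports only the first maximum, so the tied variant is the safer choice when several rows can share the highest count.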

Data.frame copy and paste values based on a condition

I have a data frame with the following structure/values and would like to go through the data frame (by row) and paste the values from the first column ("One") into the cells of the other columns only if they are not NA:
My data:
One Two Three Four
1 Bar_2_Foo NA NA 1
2 Mur_4_Doo 1 NA 2
3 Bur_3_Hoo NA 1 NA
What I would like to achieve:
One Two Three Four
1 Bar_2_Foo NA NA Bar_2_Foo_1
2 Mur_4_Doo Mur_4_Doo_1 NA Mur_4_Doo_2
3 Bur_3_Hoo NA Bur_3_Hoo_1 NA
Any ideas how to achieve this would be great. Thanks.
Is this what you're looking for?
library(dplyr)
# vars() is required to select the columns; the ~ lambda can see column One
mutate_at(data, vars(Two:Four),
  ~ ifelse(!is.na(.), paste0(One, "_", .), .))
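For comparison, the same transformation can be sketched in base R without dplyr. The name targets is an assumption for illustration, and the data is rebuilt from the question's example:

```r
# Example data from the question.
data <- data.frame(
  One   = c("Bar_2_Foo", "Mur_4_Doo", "Bur_3_Hoo"),
  Two   = c(NA, 1, NA),
  Three = c(NA, NA, 1),
  Four  = c(1, 2, NA),
  stringsAsFactors = FALSE
)

# Columns that should receive the prefix from column One.
targets <- c("Two", "Three", "Four")

# For each target column: keep NA as NA, otherwise paste One, "_", value.
data[targets] <- lapply(data[targets], function(col) {
  ifelse(is.na(col), NA_character_, paste0(data$One, "_", col))
})

print(data)
```

The target columns become character vectors, since they now hold the pasted labels.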
