How to remove rows with Inf from a dataframe in R

I have a very large dataframe (df) with approximately 35-45 columns (variables) and more than 300 rows. Some of the rows contain NA, NaN, Inf, or -Inf values in one or more variables. I have used
na.omit(df) to remove rows with NA and NaN, but I can't remove rows with Inf and -Inf values using the na.omit function.
While searching I came across the thread Remove rows with Inf and NaN in R and used the modified code df[is.finite(df)], but it does not remove the rows with Inf and -Inf, and it also gives this error:
Error in is.finite(df) : default method not implemented for type
'list'
EDIT: Remove the entire row even if only one or several of the columns contain Inf or -Inf.

To remove the rows with +/-Inf I'd suggest the following:
df <- df[!is.infinite(rowSums(df)),]
or, equivalently,
df <- df[is.finite(rowSums(df)),]
The second option (the one with is.finite() and without the negation) also removes rows containing NA values, in case that has not already been done.
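For instance, a minimal sketch with made-up data (one row containing Inf, another containing NA) to show the difference:
df <- data.frame(x = c(1, 2, Inf, 4), y = c(5, NA, 7, 8))
df[!is.infinite(rowSums(df)), ] # drops only the Inf row; the NA row stays (its rowSums() is NA, not Inf)
df[is.finite(rowSums(df)), ]    # drops both the Inf row and the NA row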

Depending on the data, there are a couple of options using scoped variants of dplyr::filter() and is.finite() or is.infinite() that might be useful:
library(dplyr)
# sample data (tibble() supersedes the older data_frame())
df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
# across all columns:
df %>%
  filter_all(all_vars(!is.infinite(.)))
# note that is.finite() does not work with NA or strings:
df %>%
  filter_all(all_vars(is.finite(.)))
# checking only numeric columns:
df %>%
  filter_if(~is.numeric(.), all_vars(!is.infinite(.)))
# checking only selected columns, in this case a through c:
df %>%
  filter_at(vars(a:c), all_vars(!is.infinite(.)))

is.finite() works on vectors, not on data.frame objects. So we can loop through the data.frame with lapply and keep only the 'finite' values.
lapply(df, function(x) x[is.finite(x)])
If the number of Inf/-Inf values differs between columns, the above code returns a list whose elements have unequal lengths. So it may be better to leave it as a list; if we want a data.frame, all columns must have equal lengths.
If we want to remove rows containing any NA or Inf/-Inf values:
df[Reduce(`&`, lapply(df, function(x) !is.na(x) & is.finite(x))),]
Or a more compact option by @nicola:
df[Reduce(`&`, lapply(df, is.finite)),]
If we are ready to use a package, a compact option is NaRV.omit:
library(IDPmisc)
NaRV.omit(df)
data
set.seed(24)
df <- as.data.frame(matrix(sample(c(1:5, NA, -Inf, Inf),
20*5, replace=TRUE), ncol=5))

To keep the rows without Inf we can do:
df[apply(df, 1, function(x) all(is.finite(x))), ]
Rows containing NA or NaN are also removed, because is.finite() returns FALSE for both.
set.seed(24)
df <- as.data.frame(matrix(sample(c(0:9, NA, -Inf, Inf, NaN), 20*5, replace=TRUE), ncol=5))
df2 <- df[apply(df, 1, function(x) all(is.finite(x))), ]
Here are the results of the different is.* functions:
x <- c(42, NA, NaN, Inf)
is.finite(x)
# [1] TRUE FALSE FALSE FALSE
is.na(x)
# [1] FALSE TRUE TRUE FALSE
is.nan(x)
# [1] FALSE FALSE TRUE FALSE

df[!is.infinite(df$x),]
where x is the column of df that contains the infinite values. The first answer posted relies on rowSums(), but for my own problem the df had columns that could not be added together.
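As a small sketch of that situation (hypothetical data where one column is character, so rowSums() would fail):
df <- data.frame(x = c(1, Inf, 3), label = c("a", "b", "c"))
df[!is.infinite(df$x), ] # keeps the rows where x is not +/-Inf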

It took me a while to work this out for dplyr 1.0.0, so I thought I would put up the new version of @sbha's solutions using c_across(), since filter_all() and filter_if() are superseded.
library(dplyr)
df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 2 Inf 10 b
# 3 3 8 Inf c
# 4 NA 8 11 d
df %>%
  rowwise() %>%
  filter(!all(is.infinite(c_across(where(is.numeric)))))
# # A tibble: 4 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 2 Inf 10 b
# 3 3 8 Inf c
# 4 NA 8 11 d
df %>%
  rowwise() %>%
  filter(!any(is.infinite(c_across(where(is.numeric)))))
# # A tibble: 2 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 NA 8 11 d
df %>%
  rowwise() %>%
  filter(!any(is.infinite(c_across(a:c))))
# # A tibble: 2 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 NA 8 11 d
To be honest, I think @sbha's answer is simpler!

I had this problem and none of the above solutions worked for me. I used the following to remove rows with +/-Inf in columns 15 and 16 of my dataframe.
d <- subset(c, c[, 15:16] != "-Inf")
e <- subset(d, d[, 15:16] != "Inf")

I consider myself new to coding and I couldn't get the recommendations above to work with my code.
I found a less complicated way to reduce a dataframe in two lines: first replace Inf with NA, then keep only the rows with complete data:
Df[sapply(Df, is.infinite)] <- NA
Df <- Df[complete.cases(Df), ]

Related

How do I add a column to a data frame consisting of minimum values from other columns?

How do I add a column to a data frame consisting of the minimum values from other columns? So in this case, to create a third column that will have the values 1, 2 and 2?
df = data.frame(A = 1:3, B = 4:2)
You can use the apply() function to do this. See below.
df$C <- apply(df, 1, min)
The second argument lets you choose the dimension over which min is applied; here 1 applies min across the columns of each row separately.
You can choose specific columns from the dataframe, as follows:
df$newCol <- apply(df[c('A','B')], 1, min)
You can call the parallel minimum function pmin() with do.call() to apply it across all your columns:
df$C <- do.call(pmin, df)
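Note that pmin() is vectorized, so this avoids looping over rows; it does assume all columns of df are comparable (here, all numeric).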
df %>%
  rowwise() %>%
  mutate(C = min(A, B))
# A tibble: 3 × 3
# Rowwise:
A B C
<int> <int> <int>
1 1 4 1
2 2 3 2
3 3 2 2
Using input where A and B are equal in one of the rows:
df = data.frame(A = 1:10, B = 11:2)
df %>%
  rowwise() %>%
  mutate(C = min(A, B))
# A tibble: 10 × 3
# Rowwise:
A B C
<int> <int> <int>
1 1 11 1
2 2 10 2
3 3 9 3
4 4 8 4
5 5 7 5
6 6 6 6
7 7 5 5
8 8 4 4
9 9 3 3
10 10 2 2
You can simply do:
df$C <- apply(FUN = min, MARGIN = 1, X = df)
Or:
df[, "C"] <- apply(FUN = min, MARGIN = 1, X = df)
or:
df["C"] <- apply(FUN = min, MARGIN = 1, X = df)
Instead of apply, you could also use data.frame(t(df)), where t transposes df: sapply traverses a data frame column-wise when applying the given function, so the rows must first be made columns. Since t always outputs a matrix, you need to make it a data.frame() again.
df$C <- sapply(data.frame(t(df)), min)
Or one could use the fact that ifelse is vectorized:
df$C <- with(df, ifelse(A<B,A,B))
Or:
df$C <- ifelse(df$A < df$B, df$A, df$B)
matrixStats
# install.packages("matrixStats")
matrixStats::rowMins(as.matrix(df))
According to this SO answer, this is the fastest option; apply-type functions work through lists and are always quite slow.
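A minimal usage sketch, assuming df is all-numeric as in the question:
df$C <- matrixStats::rowMins(as.matrix(df))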
You can use transform() to add the min column as the output of pmin(A, B), accessing the columns of df without indexing:
df <- transform(df, min = pmin(A, B))
Or, in data.table:
library(data.table)
DT = data.table(a = 1:3, b = 4:2)
DT[, min := pmin(a, b)]

Find last of several columns that is not NA (tidyverse)

Not sure what I'm doing wrong, but I'm struggling to get, per row, the index of the last column (among several columns) that is not NA.
Using tidyverse and across, I'm getting as many output columns as input columns, where I'd expect one single output column with the index of the respective column.
dat <- data.frame(id = c(1, 2, 3),
x = c(1, NA, NA),
y = c(NA, NA, NA),
z = c(3, 1, NA))
I tried the following (among others, inspired by this one: Return last data frame column which is not NA):
dat %>%
  mutate(last = across(-id, ~ max.col(!is.na(.x), ties.method = "last")))
Expected outcome would be:
id x y z last
1 1 1 NA 3 3
2 2 NA NA 1 3
3 3 NA NA NA NA
The problems with your current flow:
across passes one column at a time to the function/expression; your code needs a row or a matrix/frame, so across is not appropriate here.
Your desired output of NA for the last row is inconsistent with the logic: !is.na(.x) returns c(F, F, F), which still has a max. Your logic therefore requires a custom function, since that case needs to be handled differently.
Try this adaptation of max.col into a custom function:
max.col.notna <- function(m, ties.method = c("random", "first", "last")) {
  ties.method <- match.arg(ties.method)
  tieM <- which(ties.method == eval(formals()[["ties.method"]]))
  out <- .Internal(max.col(as.matrix(m), tieM))
  m[] <- !m %in% c(0, NA) # 'm[] <-' is required to maintain the matrix shape
  replace(out, rowSums(m) == 0, NA_integer_)
}
dat %>%
  mutate(last = max.col.notna(!is.na(select(., -id)), ties.method = "last"))
# id x y z last
# 1 1 1 NA 3 3
# 2 2 NA NA 1 3
# 3 3 NA NA NA NA
Note: I've edited/changed the function several times, trying to ensure a consistent API to the intent of this custom function. As it stands now, the notna in the function name to me reflects a sense of "emptiness" (either 0 or NA). With this logic, the function is usable with logical (as here) and numeric data. Perhaps it's overkill, but I prefer APIs that operate consistently/predictably across input classes.
tidyverse isn't really suitable for row-wise operations. Most of the time, reshaping the data into long format (as shown in @Rui Barradas' answer) is a good approach.
Here is one way using rowwise() while keeping the data wide.
library(dplyr)
dat %>%
  rowwise() %>%
  mutate(last = {ind = which(!is.na(c_across(x:z)))
                 if (length(ind)) tail(ind, 1) else NA})
# id x y z last
# <dbl> <dbl> <lgl> <dbl> <int>
#1 1 1 NA 3 3
#2 2 NA NA 1 3
#3 3 NA NA NA NA
An R base solution, returning the index of the last non-NA value per row (or NA if the whole row is NA):
dat$last <- apply(dat[, 2:4], 1,
                  FUN = function(x) if (all(is.na(x))) NA else max(which(!is.na(x))))
dat
# id x y z last
# 1 1 1 NA 3 3
# 2 2 NA NA 1 3
# 3 3 NA NA NA NA
You want to use c_across() and rowwise() to do this. rowwise() works similarly to group_by_all(), except it is more explicit. c_across() creates flat vectors out of columns (whereas across() creates tibbles).
If we first define a function separately to pull out the index of the last non-NA value, or return NA if there is none:
get_last <- function(x){
  y <- c(NA, which(!is.na(x)))
  y[length(y)]
}
We can then apply that function to c_across() over the variables we need, after converting to a rowwise_df using rowwise():
dat %>%
  rowwise() %>%
  mutate(last = get_last(c_across(x:z)))
base R
df <- data.frame(id = c(1, 2, 3),
x = c(1, NA, NA),
y = c(NA, NA, NA),
z = c(3, 1, NA))
df$last <- apply(df[-1], 1, function(x) max(as.vector(!is.na(x)) * seq_len(length(x))))
df$last[df$last == 0] <- NA
df
#> id x y z last
#> 1 1 1 NA 3 3
#> 2 2 NA NA 1 3
#> 3 3 NA NA NA NA
Created on 2020-12-29 by the reprex package (v0.3.0)
Starting with a vector of NAs, you can step through each column; whenever a given element passes your check_fun (returns TRUE), assign the index of that column to that element. The difference from the other answers here is that this does not check the condition row-wise or create a matrix from the data. I'm not sure whether creating two new temporary vectors per column is better or worse than converting the entire data to a matrix first, though.
library(tidyverse) # purrr and dplyr
last_matching_ind <- function(dat, check_fun){
  check_fun <- as_mapper(check_fun)
  reduce2(dat, seq_along(dat), .init = NA_integer_,
          function(prev, dat, ind) if_else(check_fun(dat), ind, prev))
}
dat %>%
  mutate(last = last_matching_ind(dat[-1], ~ !is.na(.x)))
# id x y z last
# 1 1 1 NA 3 3
# 2 2 NA NA 1 3
# 3 3 NA NA NA NA

R - Replace specific value contents with NA [duplicate]

This question already has answers here:
Replacing character values with NA in a data frame
(7 answers)
Closed 3 years ago.
I have a fairly large data frame in which multiple "-" entries represent missing data. The data frame was assembled from multiple Excel files, so I could not use na.strings = (or a similar import option) and had to read the "-" values in as-is.
How can I replace all "-" in the data frame with NA / missing values? The data frame consists of 200 columns of characters, factors, and integers.
So far I have tried:
sum(df %in c("-"))
returns: [1] 0
df[df=="-"] <-NA #does not do anything
library(plyr)
df <- revalue(df, c("-",NA))
returns: Error in revalue(tmp, c("-", NA)) :
x is not a factor or a character vector.
library(anchors)
df <- replace.value(df,colnames(df),"-",as.character(NA))
Error in charToDate(x) :
character string is not in a standard unambiguous format
The data frame consists of 200 columns of characters, factors, and integers, so I can see why the last two do not work correctly. Any help would be appreciated.
Since you're already using tidyverse functions, you can easily use na_if from dplyr within your pipes.
For example, I have a dataset where 999 is used to fill in a non-answer:
df <- tibble(
alpha = c("a", "b", "c", "d", "e"),
val1 = c(1, 999, 3, 8, 999),
val2 = c(2, 8, 999, 1, 2))
If I wanted to change val1 so 999 is NA, I could do:
df %>%
mutate(val1 = na_if(val1, 999))
In your case, it sounds like you want to replace a value across multiple variables, so using across() for multiple columns is more appropriate:
df %>%
  mutate(across(c(val1, val2), ~ na_if(.x, 999))) # or val1:val2
This replaces all instances of 999 in both val1 and val2 with NA; the data now looks like this:
# A tibble: 5 x 3
alpha val1 val2
<chr> <dbl> <dbl>
1 a 1. 2.
2 b NA 8.
3 c 3. NA
4 d 8. 1.
5 e NA 2.
I believe the simplest solution is with the base R function is.na<-. It's meant to solve precisely this issue.
First, make up some data. Then set the required values to NA.
set.seed(247) # make the results reproducible
df <- data.frame(X = 1:10, Y = sample(c("-", letters[1:2]), 10, TRUE))
is.na(df) <- df == "-"
df
# X Y
#1 1 a
#2 2 b
#3 3 b
#4 4 a
#5 5 <NA>
#6 6 b
#7 7 a
#8 8 <NA>
#9 9 b
#10 10 a
Here's a solution that will do it (note that str_replace() needs stringr, which is not attached by dplyr):
> library(dplyr)
> library(stringr)
> test <- tibble(x = c('100', '20.56', '0.003', '-', ' -'), y = 5:1)
> makeNA <- function(x) str_replace(x, '-', NA_character_)
> mutate_all(test, funs(makeNA))
# A tibble: 5 x 2
x y
<chr> <chr>
1 100 5
2 20.56 4
3 0.003 3
4 NA 2
5 NA 1

How to replace NA values in a table for selected columns

There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following:
x[is.na(x)]<-0
But what if I want to restrict it to only certain columns? Let me show you an example.
First, let's start with a dataset.
set.seed(1234)
x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
b=sample(c(1,2,NA), 10, replace=T),
c=sample(c(1:5,NA), 10, replace=T))
Which gives:
a b c
1 1 NA 2
2 2 2 2
3 2 1 1
4 2 NA 1
5 NA 1 2
6 2 NA 5
7 1 1 4
8 1 1 NA
9 2 1 5
10 2 1 1
Ok, so I only want to restrict the replacement to columns 'a' and 'b'. My attempts were:
x[is.na(x), 1:2] <- 0
and:
x[is.na(x[1:2])] <- 0
Neither works.
My data.table attempt, where y <- data.table(x), was obviously never going to work:
y[is.na(y[, list(a, b)]), ]
I want to pass columns inside the is.na argument, but that obviously wouldn't work.
I would like to do this in a data.frame and a data.table. My end goal is to recode the values 1:2 to 0:1 in 'a' and 'b' while keeping 'c' the way it is, since it is not a logical variable. I have a bunch of columns, so I don't want to do it one by one. And I'd just like to know how to do this.
Do you have any suggestions?
You can do:
x[, 1:2][is.na(x[, 1:2])] <- 0
or better (IMHO), use the variable names:
x[c("a", "b")][is.na(x[c("a", "b")])] <- 0
In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.
Building on @Robert McDonald's tidyr::replace_na() answer, here are some dplyr options for controlling which columns have their NAs replaced:
library(tidyverse)
# by column type:
x %>%
  mutate_if(is.numeric, ~replace_na(., 0))
# select columns defined in vars(col1, col2, ...):
x %>%
  mutate_at(vars(a, b, c), ~replace_na(., 0))
# all columns:
x %>%
  mutate_all(~replace_na(., 0))
Edit 2020-06-15
Since data.table 1.12.4 (Oct 2019), data.table gains two functions to facilitate this: nafill and setnafill.
nafill operates on columns:
cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]
setnafill operates on tables (the replacements happen by-reference/in-place)
setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]
This will also be more efficient than the other options; see ?nafill for more, including the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) versions of NA imputation for time series.
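A small sketch of the LOCF variant mentioned above, on a plain vector:
library(data.table)
nafill(c(5, NA, NA, 7, NA), type = "locf")
# [1] 5 5 5 7 7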
This will work for your data.table version:
for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]
Alternatively, as David Arenburg points out below, you can use set (side benefit - you can use it either on data.frame or data.table):
for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
This is now trivial in tidyr with replace_na(). The function appears to work for data.tables as well as data.frames:
tidyr::replace_na(x, list(a=0, b=0))
Not sure if this is more concise, but this function will also find and allow replacement of NAs (or any value you like) in selected columns of a data.table:
update.mat <- function(dt, cols, criteria) {
  require(data.table)
  x <- as.data.frame(which(criteria == TRUE, arr.ind = TRUE))
  y <- as.matrix(subset(x, x$col %in% which((names(dt) %in% cols), arr.ind = TRUE)))
  y
}
To apply it:
y[update.mat(y, c("a", "b"), is.na(y))] <- 0
The function creates a matrix of the selected columns and rows (cell coordinates) that meet the input criteria (in this case is.na == TRUE).
We can solve it the data.table way with the tidyr::replace_na function and lapply:
library(data.table)
library(tidyr)
setDT(df)
df[,c("a","b","c"):=lapply(.SD,function(x) replace_na(x,0)),.SDcols=c("a","b","c")]
In this way, we can also handle pasting columns that contain NA: first we replace_na(x, ""), then we can use stringr::str_c to combine the columns!
Starting from the data.table y, you can just write:
y[, (cols):=lapply(.SD, function(i){i[is.na(i)] <- 0; i}), .SDcols = cols]
Don't forget to library(data.table) before creating y and running this command.
This needed a bit extra for dealing with NAs in factors.
Found a useful function here, which you can then use with mutate_at or mutate_if:
replace_factor_na <- function(x){
  x <- as.character(x)
  x <- if_else(is.na(x), 'NONE', x)
  x <- as.factor(x)
}
df <- df %>%
  mutate_at(
    vars(vector_of_column_names),
    replace_factor_na
  )
Or apply to all factor columns:
df <- df %>%
mutate_if(is.factor, replace_factor_na)
For a specific column, there is an alternative with sapply:
DF <- data.frame(A = letters[1:5],
                 B = letters[6:10],
                 C = c(2, 5, NA, 8, NA))
DF_NEW <- sapply(seq_len(nrow(DF)),
                 function(i) ifelse(is.na(DF[i, 3]), 0, DF[i, 3]))
DF[, 3] <- DF_NEW
DF
For completeness, building on @sbha's answer, here is the tidyverse version with the across() function, available in dplyr since version 1.0 (which supersedes the *_at() variants, among others):
# random data
set.seed(1234)
x <- data.frame(a = sample(c(1, 2, NA), 10, replace = T),
b = sample(c(1, 2, NA), 10, replace = T),
c = sample(c(1:5, NA), 10, replace = T))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
# with the magrittr pipe
x %>% mutate(across(1:2, ~ replace_na(.x, 0)))
#> a b c
#> 1 2 2 5
#> 2 2 2 2
#> 3 1 0 5
#> 4 0 2 2
#> 5 1 2 NA
#> 6 1 2 3
#> 7 2 2 4
#> 8 2 1 4
#> 9 0 0 3
#> 10 2 0 1
# with the native pipe (since R 4.1)
x |> mutate(across(1:2, ~ replace_na(.x, 0)))
#> a b c
#> 1 2 2 5
#> 2 2 2 2
#> 3 1 0 5
#> 4 0 2 2
#> 5 1 2 NA
#> 6 1 2 3
#> 7 2 2 4
#> 8 2 1 4
#> 9 0 0 3
#> 10 2 0 1
Created on 2021-12-08 by the reprex package (v2.0.1)
It's quite handy with data.table and stringr:
library(data.table)
library(stringr)
x[, lapply(.SD, function(xx) {str_replace_na(xx, 0)})]
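Note that str_replace_na() returns character vectors, so any numeric columns will come back as character after this call.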

Removing empty rows of a data file in R

I have a dataset with empty rows. I would like to remove them:
myData<-myData[-which(apply(myData,1,function(x)all(is.na(x)))),]
It works OK. But now I would like to add a column in my data and initialize the first value:
myData$newCol[1] <- -999
Error in `$<-.data.frame`(`*tmp*`, "newCol", value = -999) :
replacement has 1 rows, data has 0
Unfortunately it doesn't work, and I don't really understand why; I can't solve this.
It worked when I removed one line at a time using:
TgData = TgData[2:nrow(TgData),]
or anything similar.
It also worked when I used only the first 13,000 rows.
But it doesn't work with my actual data, with 32,000 rows.
What did I do wrong? It seems to make no sense to me.
I assume you want to remove rows that are all NAs. Then, you can do the following :
data <- rbind(c(1,2,3), c(1, NA, 4), c(4,6,7), c(NA, NA, NA), c(4, 8, NA)) # sample data
data
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 NA 4
[3,] 4 6 7
[4,] NA NA NA
[5,] 4 8 NA
data[rowSums(is.na(data)) != ncol(data),]
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 NA 4
[3,] 4 6 7
[4,] 4 8 NA
If you want to remove rows that have at least one NA, just change the condition :
data[rowSums(is.na(data)) == 0,]
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 6 7
If you have empty rows, not NAs, you can do:
data[!apply(data == "", 1, all),]
To remove both (NAs and empty):
data <- data[!apply(is.na(data) | data == "", 1, all),]
Here are some dplyr options:
# sample data
df <- data.frame(a = c('1', NA, '3', NA), b = c('a', 'b', 'c', NA), c = c('e', 'f', 'g', NA))
library(dplyr)
# remove rows where all values are NA:
df %>% filter_all(any_vars(!is.na(.)))
df %>% filter_all(any_vars(complete.cases(.)))
# remove rows where only some values are NA:
df %>% filter_all(all_vars(!is.na(.)))
df %>% filter_all(all_vars(complete.cases(.)))
# or more succinctly:
df %>% filter(complete.cases(.))
df %>% na.omit
# dplyr and tidyr:
library(tidyr)
df %>% drop_na
An alternative solution for rows of NAs, using the janitor package:
library(janitor)
myData %>% remove_empty("rows")
This is similar to some of the above answers, but here you can specify whether to remove rows whose share of missing values is greater than or equal to a given percentage (the pct argument); see the usage sketch after the function:
drop_rows_all_na <- function(x, pct=1) x[!rowSums(is.na(x)) >= ncol(x)*pct,]
where x is a dataframe and pct is the threshold of NA-filled data you want to get rid of.
pct = 1 removes rows where 100% of the values are NA.
pct = .5 removes rows where at least half the values are NA.
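A usage sketch, applied to the sample df from @sbha's answer above:
drop_rows_all_na(df, pct = 1)  # drop rows where every value is NA
drop_rows_all_na(df, pct = .5) # drop rows where at least half the values are NA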
Using dplyr's if_all/if_any
Drop rows with any NA OR Select rows with no NA value.
df %>% filter(!if_any(a:c, is.na))
# a b c
#1 1 a e
#2 3 c g
#Also
df %>% filter(if_all(a:c, Negate(is.na)))
Drop rows with all NA values or select rows with at least one non-NA value.
df %>% filter(!if_all(a:c, is.na))
# a b c
#1 1 a e
#2 <NA> b f
#3 3 c g
#Also
df %>% filter(if_any(a:c, Negate(is.na)))
data
Using data from @sbha -
df <- data.frame(a = c('1', NA, '3', NA),
b = c('a', 'b', 'c', NA),
c = c('e', 'f', 'g', NA))
Here's yet another answer if you just want a handy function wrapper. Note that many of the above solutions remove a row with ANY NAs, whereas this one only removes rows that are ALL NAs.
data <- rbind(c(1,2,3), c(1, NA, 4), c(4,6,7), c(NA, NA, NA), c(4, 8, NA)) # sample data
data
rmNArows <- function(d){
  goodRows <- apply(d, 1, function(x) sum(is.na(x)) != ncol(d))
  d[goodRows, ]
}
rmNArows(data)
