Find max and min value on a dataframe, ignoring NAs - r

I need to find the max and min value in a dataframe "df" like this:
col1 col2 col3
7 4 5
2 NA 6
3 2 4
NA NA 1
The result should be: min = 1 and max = 7.
I have used this function:
min <- min(df, na.rm=TRUE)
max <- max(df, na.rm=TRUE)
but it gives me the following error:
Error in FUN(X[[i]], ...) :
only defined on a data frame with all numeric variables
So I have converted all the values as.numeric in this way:
df <- as.numeric(as.character(df))
but it introduces NAs by coercion and now the results are:
min = Inf and max = -Inf
How can I operate on the df ignoring NAs?

If the columns are not numeric, convert them with type.convert
df <- type.convert(df, as.is = TRUE)
Or force the conversion by going through a matrix
df[] <- as.numeric(as.matrix(df))
Or with lapply
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
With R 4.1.0 or later we can also do
sapply(df, \(x) as.numeric(as.character(x))) |>
range(na.rm = TRUE)
#[1] 1 7
Once the columns are numeric, the functions work as expected
min(df, na.rm = TRUE)
#[1] 1
max(df, na.rm = TRUE)
#[1] 7
Note that as.character/as.numeric require a vector as input, not a data.frame
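The note above can be illustrated with a minimal sketch. It assumes the question's columns are stored as character (the original post does not show their actual class), and the names bad and rng are only illustrative.

```r
# Sample data from the question, stored as character columns
# (an assumption about why min()/max() failed in the first place).
df <- data.frame(
  col1 = c("7", "2", "3", NA),
  col2 = c("4", NA, "2", NA),
  col3 = c("5", "6", "4", "1"),
  stringsAsFactors = FALSE
)

# as.character() on a whole data frame deparses each column into one string,
# so as.numeric() can only return NA for every column.
bad <- suppressWarnings(as.numeric(as.character(df)))
all(is.na(bad))
#[1] TRUE

# Converting column by column keeps the values.
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
rng <- range(unlist(df), na.rm = TRUE)
rng
#[1] 1 7
```

With every value turned into NA, na.rm = TRUE leaves nothing to compare, which is why the question ends up with infinite results.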

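We could use the minMax function from the dataMaid package (it handles NAs)
library(dataMaid)
minMax(df, maxDecimals = 2)
Output:
Min. and max.: 2; 7
Note that 2 and 7 are the range of col1 only; the overall minimum of the sample data is 1.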
data:
library(tibble) # provides tribble()
df <- tribble(
~col1, ~col2, ~col3,
7, 4, 5,
2, NA, 6,
3, 2, 4,
NA, NA, 1)

Another base R option
> range(na.omit(as.numeric(unlist(df))))
[1] 1 7
If the columns are of class factor, you should use (thanks to @akrun's comment)
as.numeric(as.character(unlist(df)))

Related

To vector with names from dataframe with rownames

I have the following vector with names:
myvec <- c(`C1-C` = 3, `C2-C` = 1, `C3-C` = NA, `C4-C` = 5, `C5-C` = NA)
C1-C C2-C C3-C C4-C C5-C
3 1 NA 5 NA
I would like to convert it into a data frame/tibble, keeping the names of the elements as row names.
The best way I found was:
mynames <- names(myvec)
myvec <- myvec %>%
as_tibble() %>%
mutate(rownames = mynames) %>%
column_to_rownames("rownames")
How can I do this in a more efficient way?
Thanks all
as.data.frame(myvec)
myvec
C1-C 3
C2-C 1
C3-C NA
C4-C 5
C5-C NA
Or
data.frame(myvec)
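As a minimal base R sketch of the same idea, naming the column explicitly (the column name value and the object name out are illustrative choices, not from the original answer):

```r
myvec <- c(`C1-C` = 3, `C2-C` = 1, `C3-C` = NA, `C4-C` = 5, `C5-C` = NA)

# data.frame() uses the names of a named vector as row names,
# so no separate rownames step is needed.
out <- data.frame(value = myvec)
rownames(out)
#[1] "C1-C" "C2-C" "C3-C" "C4-C" "C5-C"
```

This avoids the round trip through as_tibble(), which drops the names and forces them to be re-attached afterwards.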

Replace NA in all columns: argument is not numeric or logical

I want to loop through a lot of columns in an R data frame and replace NA with the column mean.
I can get a mean for columns like this
mean(df$col20, na.rm = TRUE)
But this gets the warning: argument is not numeric or logical: returning NA
mean(df[ , 20], na.rm = TRUE)
I tried the above syntax with a small dummy df including some NA and it works fine. Any idea what else to look for to fix this?
ps. head(df[20]) tells me it's a dbl and str(df) says it's num.
(and [ , 20] is an example; I actually get lots of warnings because it really sits in a for loop - but I have executed the line by itself as a test)
1) na.aggregate Create a logical vector ok which is TRUE for each numeric column and FALSE for other columns. Then use na.aggregate on the numeric columns.
library(zoo)
df <- data.frame(a = c(1, NA, 2), b = c("a", NA, "b")) # test data
ok <- sapply(df, is.numeric)
replace(df, ok, na.aggregate(df[ok]))
giving:
a b
1 1.0 a
2 1.5 <NA>
3 2.0 b
2) dplyr/tidyr Alternatively, use dplyr and tidyr. df is from above and the output is the same.
library(dplyr)
library(tidyr)
df %>%
mutate(across(where(is.numeric), ~ replace_na(., mean(., na.rm = TRUE))))
3) collapse We could alternatively use ftransformv from collapse.
library(collapse)
library(zoo)
ftransformv(df, is.numeric, na.aggregate)
4) base A base solution would be:
fill_na <- function(x) {
if (!is.numeric(x) || all(is.na(x))) x
else replace(x, is.na(x), mean(x, na.rm = TRUE))
}
replace(df, TRUE, lapply(df, fill_na))
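One plausible cause of the warning in the question, assuming df is a tibble (head() reporting dbl suggests this, but the post does not confirm it): indexing a tibble with df[, 20] returns a one-column tibble rather than a vector, and mean() does not accept a data frame. The sketch below reproduces the same behaviour with a plain data.frame by passing drop = FALSE; the sample data and object names are made up for illustration.

```r
df <- data.frame(a = c(1, NA, 2), b = c(4, 5, NA))

# A one-column data frame, which is what tibble-style df[, 1] returns.
one_col <- df[, 1, drop = FALSE]

# mean() on a data frame warns "argument is not numeric or logical"
# and returns NA; the warning is suppressed here to keep the output clean.
res_df <- suppressWarnings(mean(one_col, na.rm = TRUE))
res_df
#[1] NA

# Extracting the column as a vector with [[ works for both
# data frames and tibbles.
res_vec <- mean(df[[1]], na.rm = TRUE)
res_vec
#[1] 1.5
```

Inside a for loop, df[[j]] is therefore the safer way to pull out column j.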

How to replace NA values with different values based on column in R dataframe?

I am trying to replace NA values by column with values predetermined from a vector. For example, I have vector containing the values (1,5,3) and a dataframe df, and want to replace all NA values from column one of df with 1, column two NA's with 5, and column three NA's with 3.
I tried a formula I saw that took
df[is.na(df)] = vector
but it didn't seem to work due to "wrong length". The length of the vector is the same as the number of columns in df.
You can use which to get the row/column indices of the NA values and replace them directly (vec is the replacement vector, as in the data shown further down).
mat <- which(is.na(df), arr.ind = TRUE)
df[mat] <- vec[mat[, 2]]
We can use Map to pair each column of the dataset with the corresponding value in the vector and replace the NAs in a single, concise step
df[] <- Map(function(x, y) replace(x, is.na(x), y), df, vec)
df
# col1 col2 col3
#1 1 5 2
#2 3 2 3
#3 1 5 3
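Or another option is to make the lengths the same, and then use pmax (this relies on the non-missing values being non-negative, because the filler matrix is 0 wherever a value is present)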
df[] <- pmax(as.matrix(df), is.na(df) * vec[col(df)], na.rm = TRUE)
or another option with replace
df <- replace(df, is.na(df), rep(vec, colSums(is.na(df))))
NOTE: All the solutions above are one-liners
Or using data.table with set
library(data.table)
setDT(df)
for(j in seq_along(df)) set(df, i = which(is.na(df[[j]])), j = j, value = vec[j])
data
df <- data.frame(col1 = c(1, 3, NA), col2 = c(NA, 2, NA), col3 = c(2, NA, NA))
vec <- c(1, 5, 3)
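As a check of the index-matrix approach on the sample data above, here is a minimal sketch. It works on a copy named filled (an illustrative name) so the original df stays untouched.

```r
df <- data.frame(col1 = c(1, 3, NA), col2 = c(NA, 2, NA), col3 = c(2, NA, NA))
vec <- c(1, 5, 3)

# One row per NA cell; the second column holds the column index of that cell.
idx <- which(is.na(df), arr.ind = TRUE)

# Look up the replacement for each NA cell by its column index.
filled <- df
filled[idx] <- vec[idx[, 2]]
filled
#  col1 col2 col3
#1    1    5    2
#2    3    2    3
#3    1    5    3
```

The original attempt df[is.na(df)] = vector fails because there are five NA cells but only three replacement values; indexing vec by column number expands it to one value per NA cell.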

NA as result of operation when all elements are NA

This might be slightly silly, but I would appreciate a better way to deal with this problem. I have a data frame like the following
a <- matrix(1,5,3)
a[1:2,2] <- NA
a[1,c(1,3)] <- NA
a[3:5,2] <- 2
a[2:5,3] <- 3
a <- data.frame(a)
colnames(a) = c("First", "Second", "Third")
I want to sum only some of the columns, but I would like to keep NA when all elements in the summed columns are NA. In short, if I sum the First and Second columns I want to get something like
mySum <- c(NA, 1, 3, 3, 3)
Neither of the two options below provides what I want
rowSums(a[, c("First", "Second")])
rowSums(a[, c("First", "Second")], na.rm=TRUE)
but on the positive side I have resolved this by using a combination of is.na and all
mySum <- rowSums(a[, c("First", "Second")], na.rm=TRUE)
iNA = apply(a[, c("First", "Second")], 2, is.na)
iAllNA = apply(iNA, 1, all)
mySum[iAllNA] = NA
This feels slightly awkward though so I was wondering if there is a smarter way to handle this.
Use apply with MARGIN = 1 to process every row: if all elements of the row are NA, return NA; otherwise return their sum.
apply(a[c("First", "Second")], 1, function(x)
ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE)))
#[1] NA 1 3 3 3
Or, vectorized with rowSums and replace:
mycols <- c("First", "Second")
replace(x = rowSums(a[mycols], na.rm = TRUE),
list = rowSums(is.na(a[mycols])) == length(mycols),
values = NA)
#[1] NA 1 3 3 3
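The rowSums idea can be wrapped in a small reusable helper. This is only a sketch; row_sum_keep_na is a hypothetical name, not a function from base R or any package.

```r
# Sample data from the question.
a <- matrix(1, 5, 3)
a[1:2, 2] <- NA
a[1, c(1, 3)] <- NA
a[3:5, 2] <- 2
a[2:5, 3] <- 3
a <- data.frame(a)
colnames(a) <- c("First", "Second", "Third")

# Hypothetical helper: sum each row ignoring NAs, but return NA
# for rows that contain no non-missing value at all.
row_sum_keep_na <- function(d) {
  s <- rowSums(d, na.rm = TRUE)
  s[rowSums(!is.na(d)) == 0] <- NA
  s
}

res <- unname(row_sum_keep_na(a[c("First", "Second")]))
res
#[1] NA  1  3  3  3
```

Counting the non-missing values per row with rowSums(!is.na(d)) avoids the row-wise apply call, which matters mostly for data frames with many rows.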

How do I change NA into Column median [duplicate]

A data frame contains 123 columns,
and each column has at least one NA value.
I want these NA values to be replaced with the column median.
Because there are so many columns, I cannot write code using each column name,
so I tried to use 'apply' to solve this, but it didn't work.
data2[-1]<-lapply(data2[-1],function(x)x - median(x,na.rm=TRUE))
It reports an error because the input is a data frame, not numeric.
We can use na.aggregate
library(zoo)
j1 <- sapply(df1, is.numeric)
df1[j1] <- na.aggregate(df1[j1], FUN = median)
We can use map2_df
library(purrr)
df <- data.frame(a = c(1, 2, 3), b = c(2, NA, 9), c = c(NA, 3, 5), d = c(0, 4, NA))
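# dmap() was moved out of purrr (to the purrrlyr package); map() gives the per-column medians as a list
purrr::map2_df(df, purrr::map(df, median, na.rm = TRUE), function(x, y) ifelse(is.na(x), y, x))
Or with a simple for loop
for (i in seq_len(ncol(df))) {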
df[is.na(df[,i]), i] <- median(df[,i], na.rm = TRUE)
}
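For comparison with the attempt in the question, here is a minimal base R sketch. The question's function computes x - median(x), which subtracts the median from every value instead of filling the gaps. The sample data and the helper name fill_median are made up for illustration, and the first column is assumed to be a non-numeric id column, as data2[-1] in the question suggests.

```r
data2 <- data.frame(
  id = c("a", "b", "c"),
  x  = c(1, NA, 3),
  y  = c(NA, 4, 10)
)

# Hypothetical helper: replace only the NA entries with the column median.
fill_median <- function(x) replace(x, is.na(x), median(x, na.rm = TRUE))

# Apply to every column except the first, keeping the data frame structure.
data2[-1] <- lapply(data2[-1], fill_median)
data2
#  id x  y
#1  a 1  7
#2  b 2  4
#3  c 3 10
```

If some of the remaining columns are not numeric, the helper would need an is.numeric() guard like the one used with na.aggregate above.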

Resources