na_if() function in R started giving an error recently


I was using the following code to get rid of empty cells in my data frame:
df %>%
  # recode empty strings "" as NAs
  na_if("") %>%
  # remove NAs
  na.omit()
It was working fine until recently, but now I am getting the following error:
Error in na_if():
! Can't convert y to match type of x <tbl_df>.
Run rlang::last_error() to see where the error occurred.
rlang::last_error()
<error/vctrs_error_cast>
Error in na_if():
! Can't convert y to match type of x <tbl_df>.
I am using R version 4.1.3 and dplyr 1.1.0.
Note: I get the same error when using
df %>% mutate_all(~na_if(., "")) %>%
  na.omit()

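library(tidyverse)
set.seed(2023)
df <- data.frame(values = sample(c(letters[1:3], ""), 30, TRUE))
# dplyr >= 1.1.0: na_if() expects a vector, so apply it to each character column
no_na_df <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  na.omit()
map_dbl(list(df, no_na_df), nrow) # print number of rows of each data set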
Output:
[1] 30 22
If I may suggest an easier base R version (it can also be used inside mutate()):
replace(df$values,which(df$values==""),NA)
df %>% mutate(values_no_na=replace(values,which(values==""),NA)) %>% view()
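To replace empty strings in every column at once without dplyr, a base R sketch in the same spirit is below. The mixed-type data frame is made up for illustration; the numeric column is left untouched because none of its values compare equal to "".

```r
# Made-up example: one character column with empty strings, one numeric column
df <- data.frame(
  values = c("a", "", "b", "c", ""),
  n = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# `df == ""` gives a logical matrix; numeric values are compared as text,
# so only the empty strings in `values` are TRUE
df[df == ""] <- NA

# drop every row that now contains an NA
no_na_df <- na.omit(df)
nrow(no_na_df)
# [1] 3
```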

Related

How can I use group_by in text mining with R?

I am wondering why I can't use group_by() on corpus text.
I tried some other packages too, but nothing worked.
I also tried converting to a tibble.
My code:
data <- data %>%
  group_by(Title) %>%
  mutate(line = row_number()) %>%
  ungroup()
The output:
Error:
! All columns in a tibble must be vectors.
✖ Column `text` is a `corpus_text` object.
Run `rlang::last_error()` to see where the error occurred.
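The error says tibble requires every column to be a vector, and the `corpus_text` column is not one. A minimal sketch of one possible workaround, assuming the text can be turned into a plain character vector with `as.character()` before grouping (this is not verified against the corpus package; the titles and text below are hypothetical):

```r
library(dplyr)

# Hypothetical toy data; in the question, `text` would be the corpus_text column
data <- data.frame(
  Title = c("Book A", "Book A", "Book B"),
  text = c("first line", "second line", "only line"),
  stringsAsFactors = FALSE
)

# Assumption: as.character() yields a plain character vector for the text column
data$text <- as.character(data$text)

# number the lines within each title
data <- data %>%
  group_by(Title) %>%
  mutate(line = row_number()) %>%
  ungroup()

data$line
# [1] 1 2 1
```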

How to remove duplicate rows in R within Arrow?

I work with an Arrow dataset to reduce RAM usage, but I ran into the following problem.
I need to remove duplicate rows. With dplyr I can do this using distinct(), but this function isn't supported in Arrow.
Any ideas?
Following the recommendations, I wrote the following code:
Sales_2021 <- Sales_2021 %>%
  group_by(`Cust-Item-Loc`) %>%
  arrange(desc(SBINDT)) %>%
  distinct(`Cust-Item-Loc`, .keep_all = TRUE) %>%
  collect()
and got the error message
Error: `distinct()` with `.keep_all = TRUE` not supported in Arrow
How can I slice the first rows?
The suggestion to use filter(!duplicated()) does not work either:
Sales_2021 <- Sales_2021 %>%
  group_by(`Cust-Item-Loc`) %>%
  arrange(desc(SBINDT)) %>%
  filter(!duplicated(`Cust-Item-Loc`)) %>%
  collect()
Error message
Error: Filter expression not supported for Arrow Datasets: !duplicated(`Cust-Item-Loc`)
Call collect() first to pull data into R.
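Since the Arrow backend rejects both `distinct(.keep_all = TRUE)` and `duplicated()` here, one alternative technique is to compute the latest `SBINDT` per key with `group_by()`/`summarise()` and keep the matching rows with `inner_join()`. The sketch below runs on a small in-memory tibble with made-up values; whether each step is evaluated inside Arrow rather than requiring `collect()` first depends on the installed arrow version, which is an assumption to check. Ties on the latest date would leave more than one row per key.

```r
library(dplyr)

# Made-up stand-in for Sales_2021 (a plain tibble, not an Arrow Dataset)
sales <- tibble::tibble(
  `Cust-Item-Loc` = c("A", "A", "B", "B", "C"),
  SBINDT = c(20210105, 20210301, 20210120, 20210210, 20210115),
  qty = c(1, 2, 3, 4, 5)
)

# latest date per key
latest <- sales %>%
  group_by(`Cust-Item-Loc`) %>%
  summarise(SBINDT = max(SBINDT))

# keep only the rows matching the latest date for their key
deduped <- sales %>%
  inner_join(latest, by = c("Cust-Item-Loc", "SBINDT")) %>%
  arrange(`Cust-Item-Loc`)

deduped$qty
# [1] 2 4 5
```

With an Arrow Dataset, `sales` would be the dataset object, and a final `collect()` would pull the result into R.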

How to return sorted distinct values in console?

I would like to return a list in the R console of all unique values for a dataframe column. However, I also wanted the list to be sorted but I'm unable to do this.
df %>% distinct(var)
This works fine, but when I try doing:
df %>% sort(distinct(var))
It gives me this error message
Error in distinct(home_street) : object 'home_street' not found

You can keep the unique values, then sort by the variable column, then use pull to get just the vector.
library(tidyverse)
mtcars %>%
  distinct(cyl) %>%
  arrange(cyl) %>%
  pull()
#[1] 4 6 8
Or in base R:
sort(unique(mtcars$cyl))
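The original attempt fails because of how the pipe rewrites the call: `x %>% f(y)` becomes `f(x, y)`, so `df %>% sort(distinct(var))` runs as `sort(df, distinct(var))`. There, `distinct(var)` is evaluated on its own with `var` as its data argument, and no object of that name exists outside the data frame. A pipe-friendly sketch that extracts the column first:

```r
library(dplyr)

sorted_cyl <- mtcars %>%
  pull(cyl) %>%  # extract the column as a plain vector
  unique() %>%   # keep each value once
  sort()         # sort ascending

sorted_cyl
# [1] 4 6 8
```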

How do I convert a column from character to double in R?

I am trying to group a dataset on a certain value and then sum a column based on this grouped value.
UN.surface.area.share <- left_join(countries, UN.surface.area, by = 'country') %>%
  drop_na() %>%
  rename('surface.area' = 'Surface.area..km2.') %>%
  group_by(region) %>% summarise(total.area = sum(surface.area))
When I run this I get this error:
Error: Problem with `summarise()` input `total.area`.
x invalid 'type' (character) of argument
i Input `total.area` is `sum(surface.area)`.
i The error occurred in group 1: region = "Africa".
I think the problem is that the 'surface.area' column is of the character type and therefore the sum function doesn't work. I tried adding %>% as.numeric('surface.area') to the previous code:
UN.surface.area.share <- left_join(countries, UN.surface.area, by = 'country') %>%
  drop_na() %>%
  rename('surface.area' = 'Surface.area..km2.') %>% as.numeric('surface.area') %>%
  group_by(region) %>% summarise(total.area = sum(surface.area))
But this gives the following error:
Error in group_by(., region) :
'list' object cannot be coerced to type 'double'
I think this problem can be solved by changing the 'surface.area' column to a numeric datatype but I am not sure how to do this. I checked the column and it only consists of numbers.

Use dplyr::mutate()
So instead of:
... %>% as.numeric('surface.area') %>%...
do:
...%>% mutate(surface.area = as.numeric(surface.area)) %>%...
mutate() changes one or more variables within a data frame. When you pipe into as.numeric(), as you're currently doing, you're effectively asking R to run
as.numeric(data.frame.you.piped.in, 'surface.area')
as.numeric then tries to convert the data frame into a number, which it can't do since the data frame is a list object. Hence your error. It's also running with two arguments, which will cause a crash regardless of the structure of the first argument.
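A self-contained sketch of the suggested fix, run on a tiny made-up stand-in for the joined data (the regions and areas below are hypothetical, not the UN data):

```r
library(dplyr)

# Hypothetical stand-in for the joined data, with the area stored as character
joined <- tibble::tibble(
  region = c("Africa", "Africa", "Europe"),
  surface.area = c("1000", "2500", "400")
)

UN.surface.area.share <- joined %>%
  # convert the column inside the data frame, not the data frame itself
  mutate(surface.area = as.numeric(surface.area)) %>%
  group_by(region) %>%
  summarise(total.area = sum(surface.area))

UN.surface.area.share
```

If some values contain thousands separators such as "1,000", as.numeric() turns them into NA with a warning; readr::parse_number() is one alternative that handles that case.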

How to select_if in dplyr, where the logical condition is negated

I want to select all numeric columns from a dataframe, and then to select all the non-numeric columns. An obvious way to do this is the following :-
mtcars %>%
  select_if(is.numeric) %>%
  head()
This works exactly as I expect.
mtcars %>%
  select_if(!is.numeric) %>%
  head()
This doesn't, and produces the error message Error in !is.numeric : invalid argument type
Looking at another way to do the same thing :-
mtcars %>%
  select_if(sapply(., is.numeric)) %>%
  head()
works perfectly, but
mtcars %>%
  select_if(sapply(., !is.numeric)) %>%
  head()
fails with the same error message. (purrr::keep behaves exactly the same way).
In both cases using - to drop the undesired columns fails too, with the same error as above for the is.numeric version, and this error message for the sapply version Error: Can't convert an integer vector to function.
The help page for is.numeric says
is.numeric is an internal generic primitive function: you can write methods to handle specific classes of objects, see InternalMethods. ... Methods for is.numeric should only return true if the base type of the class is double or integer and values can reasonably be regarded as numeric (e.g., arithmetic on them makes sense, and comparison should be done via the base type).
The help page for ! says
Value
For !, a logical or raw vector(for raw x) of the same length as x: names, dims and dimnames are copied from x, and all other attributes (including class) if no coercion is done.
Looking at the useful question Negation ! in a dplyr pipeline %>% I can see some of the reasons why this doesn't work, but neither of the solutions suggested there works.
mtcars %>%
  select_if(not(is.numeric())) %>%
  head()
gives the reasonable error Error in is.numeric() : 0 arguments passed to 'is.numeric' which requires 1.
mtcars %>%
  select_if(not(is.numeric(.))) %>%
  head()
Fails with this error :-
Error in tbl_if_vars(.tbl, .predicate, caller_env(), .include_group_vars = TRUE) : length(.p) == length(tibble_vars) is not TRUE.
This behaviour definitely violates the principle of least surprise. It's not of great consequence to me now, but it suggests I am failing to understand some more fundamental point.
Any thoughts?

Negating a predicate function can be done with the dedicated Negate() or purrr::negate() functions (rather than the ! operator, which negates a vector):
library(dplyr)
mtcars %>%
  mutate(foo = "bar") %>%
  select_if(Negate(is.numeric)) %>%
  head()
# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar
Or with purrr::negate() (lower case), which behaves slightly differently; see the respective help pages:
library(purrr)
library(dplyr)
mtcars %>%
  mutate(foo = "bar") %>%
  select_if(negate(is.numeric)) %>%
  head()
# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar

You could define your own "is not numeric" function and then use that instead:
is_not_num <- function(x) !is.numeric(x)
mtcars %>%
  select_if(is_not_num) %>%
  head()

mtcars %>%
  select_if(~ !is.numeric(.)) %>%
  head()
does the same.
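Since dplyr 1.0.0, select_if() has been superseded by select() combined with where(). A short sketch of the same selection in that style, adding the `foo` column as in the answers above:

```r
library(dplyr)

df <- mtcars %>% mutate(foo = "bar")

# `!` inside select() takes the complement of a selection
non_numeric <- df %>% select(!where(is.numeric))

# equivalently, negate the predicate inside a lambda
non_numeric2 <- df %>% select(where(~ !is.numeric(.x)))

names(non_numeric)
# [1] "foo"
```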
