Error when using na.rm = T for ntile() in R

Summary
I am using dplyr's ntile() to assign observations to deciles.
I want NAs to be ignored.
However, when I add na.rm = T, the function no longer works.
Code
group_by(date) %>%
mutate(aggregate_ranking = ntile(average_ranking, 10)) %>%
ungroup()
Other
What's most important to me is to assign data into deciles. If it cannot be done via ntile(), is there another function that does this where NAs are ignored?
Error message
Error: Problem with `mutate()` input `aggregate_ranking`.
x unused argument (na.rm = T)
ℹ Input `aggregate_ranking` is `ntile(average_ranking, 10, na.rm = T)`.
ℹ The error occurred in group 1: date = Jan 1993.
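A minimal sketch, assuming dplyr is installed and using hypothetical toy data (toy, with made-up values) in place of the real data frame. ntile() has no na.rm argument, which is why na.rm = T triggers the "unused argument" error. It does not need one: missing values are left as NA, and the buckets are computed from the non-missing values only. The example uses 2 buckets so the result is easy to check by eye; use 10 for deciles.

```r
library(dplyr)

# Hypothetical toy data standing in for the real data frame
toy <- tibble(
  date = rep(c("Jan 1993", "Feb 1993"), each = 6),
  average_ranking = c(5, NA, 1, 3, NA, 2,
                      10, 40, NA, 30, 20, NA)
)

# No na.rm argument: NAs stay NA and are left out of the bucketing
res <- toy %>%
  group_by(date) %>%
  mutate(aggregate_ranking = ntile(average_ranking, 2)) %>%  # use 10 for deciles
  ungroup()

res$aggregate_ranking  # 2 NA 1 2 NA 1 1 2 NA 2 1 NA
```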

Related

Error: Problem with `mutate()` column `nested.col`. i `nested.col = purrr::map(...)`. x no applicable method for 'tk_make_future_timeseries'

I tried to run the code below and I'm getting an error message.
Here is the code:
full_data_tbl <- merger %>%
select(outcome, date.y, crime_numbers, sentiment) %>%
# Apply Group-wise Time Series Manipulations
group_by(outcome) %>%
future_frame(
.date_var = date,
.length_out = FORECAST_HORIZON,
.bind_data = TRUE
) %>%
ungroup() %>%
# Consolidate IDs
mutate(outcome = fct_drop(outcome))
Error message:
Error: Problem with mutate() column nested.col.
i nested.col = purrr::map(...).
x no applicable method for 'tk_make_future_timeseries' applied to an object of class "character"
i The error occurred in group 1: outcome = AntiSocial.
Can someone please help?
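The message says that tk_make_future_timeseries() (which, according to the error, future_frame() ends up calling) received a character vector. This suggests the date column is stored as text rather than as a Date. Note also that the code keeps date.y in select() but passes .date_var = date to future_frame(). Below is a minimal base-R sketch of checking and converting the column. It assumes the dates are strings in YYYY-MM-DD form; merger_demo and its values are hypothetical stand-ins for merger.

```r
# Hypothetical stand-in for `merger`, with the date stored as text
merger_demo <- data.frame(
  outcome = c("AntiSocial", "AntiSocial"),
  date.y  = c("2021-01-31", "2021-02-28"),
  stringsAsFactors = FALSE
)

class(merger_demo$date.y)  # "character": no tk_make_future_timeseries() method for this

# Convert the text to Date; change `format` to match how the dates are written
merger_demo$date.y <- as.Date(merger_demo$date.y, format = "%Y-%m-%d")

class(merger_demo$date.y)  # "Date"
```

After converting, .date_var = date.y would point future_frame() at the Date column. Adjust the format string if your dates are written differently.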

Choose a function with apply() based on some condition

With apply(), I need to supply a function. In my case, though, which function to use depends on another condition. Below is an example:
library(dplyr)
Function = TRUE
as.data.frame(matrix(1:12, 4)) %>%
mutate(Res = apply(as.matrix(.), 1, ifelse(Function, ~mean, ~sd), na.rm = TRUE))
However, with this I get the following error:
Error: Problem with `mutate()` column `Res`.
ℹ `Res = apply(as.matrix(.), 1, ifelse(Function, ~mean, ~sd), na.rm = TRUE)`.
✖ attempt to replicate an object of type 'language'
Run `rlang::last_error()` to see where the error occurred.
Can you please show me the right way to choose a function based on a condition?
This should work:
library(dplyr)
Function = TRUE
as.data.frame(matrix(1:12, 4)) %>%
mutate(Res = apply(as.matrix(.), 1, if (Function) mean else sd, na.rm = TRUE))
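ifelse() is vectorized. It takes a logical vector and returns a vector of the same length, holding the corresponding element of the yes argument wherever the condition is TRUE and of the no argument wherever it is FALSE. To build that result it recycles yes and no with rep(), which fails for a formula such as ~mean (an object of type 'language'), hence the error. The separate if/else control-flow construct evaluates a single condition and can return any object, including a function, which is what apply() needs here. Sometimes the two are interchangeable and sometimes they're not.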
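A small base-R sketch of the difference, using a plain matrix instead of the data frame above (m, use_mean and stat_name are illustrative names). It shows three things: if/else hands apply() a real function, ifelse() fails on the formulas, and switch() is one way to pick among more than two functions.

```r
m <- matrix(1:12, nrow = 4)  # rows: (1,5,9) (2,6,10) (3,7,11) (4,8,12)
use_mean <- TRUE

# `if` evaluates one condition and can return any object, including a function
f <- if (use_mean) mean else sd
row_stats <- apply(m, 1, f, na.rm = TRUE)
row_stats  # 5 6 7 8

# ifelse() recycles its `yes`/`no` arguments with rep(), which fails for formulas
ifelse_result <- tryCatch(ifelse(use_mean, ~mean, ~sd), error = function(e) e)
conditionMessage(ifelse_result)

# switch() picks one of several functions by name
stat_name <- "sd"
g <- switch(stat_name, mean = mean, sd = sd, median = median)
row_sd <- apply(m, 1, g, na.rm = TRUE)
row_sd  # 4 4 4 4
```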

Replace NA in selected columns with replace_na

I have a dataset that contains the columns hh_c22j, hh_r02a and hh_r02b. I want to replace the NAs in these columns with 0. The command below works, but it is redundant, because I have to specify the replacement value 0 for each column separately.
df %>% select(case_id, hh_c22j, hh_r02a, hh_r02b) %>% replace_na(list(hh_c22j=0, hh_r02a=0, hh_r02b=0))
I would like to pass the columns together as a vector, like below.
df %>% select(case_id, hh_c22j, hh_r02a, hh_r02b) %>% replace_na(c(hh_c22j, hh_r02a, hh_r02b), 0)
But I get an error. The error message is:
Error in is_list(replace) : object 'hh_c22j' not found
Error: 1 components of `...` were not used.
We detected these problematic arguments:
* `..1`
Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlib_error_dots_unused>
1 components of `...` were not used.
We detected these problematic arguments:
* `..1`
Did you misspecify an argument?
Backtrace:
1. `%>%`(...)
5. ellipsis:::action_dots(...)
Run `rlang::last_trace()` to see the full context.
Assuming you have other columns in the data as well but want to change just the three columns, you can do this:
library(dplyr)
df %>% mutate_at(vars(hh_c22j, hh_r02a, hh_r02b), list(~ replace(., which(is.na(.)), 0)))
# Alternatively, using replace_na() from tidyr
library(tidyr)
df %>% mutate_at(vars(hh_c22j, hh_r02a, hh_r02b), list(~ replace_na(., 0)))
Just for future reference, a small reproducible sample would go a long way to get better answers!
One option to do this cleanly is to use the mutate_all() function and pass it the function to apply to each column. For example, here I create a dataset similar to yours and replace the NA values with 0s:
library(dplyr)
library(tidyr)
data <- data.frame(hh_c22j = sample(c(NA, 1), size = 5, replace = TRUE),
hh_r02a = sample(c(NA, 1), size = 5, replace = TRUE),
hh_r02b = sample(c(NA, 1), size = 5, replace = TRUE))
data %>%
mutate_all(replace_na, 0)
If you only want to perform this operation on some columns, mutate_at is a similar option where you can specify which column(s) to use this on.
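In dplyr 1.0.0 and later, the mutate_at()/mutate_all() family is superseded by across(). Below is a sketch with hypothetical data, assuming dplyr and tidyr are installed, that replaces NAs only in the three named columns. It also shows how the vector of names can be turned into the named list that replace_na() expects, which is what the second attempt in the question was aiming for.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: three target columns plus two that must stay untouched
df <- data.frame(
  case_id = 1:4,
  hh_c22j = c(1, NA, 3, NA),
  hh_r02a = c(NA, 2, NA, 4),
  hh_r02b = c(5, 6, NA, 8),
  other   = c(NA, 1, 2, 3)
)

cols <- c("hh_c22j", "hh_r02a", "hh_r02b")

# across() applies replace_na() to the listed columns only
res <- df %>%
  mutate(across(all_of(cols), ~ replace_na(.x, 0)))

# Same result: build list(hh_c22j = 0, hh_r02a = 0, hh_r02b = 0) from the names
res2 <- df %>%
  replace_na(setNames(as.list(rep(0, length(cols))), cols))
```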

summarize percent calculation invalid 'type' error

I have the following code that was working, but now it throws an error. I think a package update may have broken it.
scorecard_data %>%
select (STABBR, HBCU, MENONLY, WOMENONLY) %>%
filter (str_detect(STABBR, "OH|PA|WV|KY|IN|MI")) %>%
group_by (STABBR) %>%
summarize (prcntHBCU = (sum(HBCU, na.rm = TRUE)/length(HBCU[!is.na(HBCU)])*100),
prcntMEN = (sum(MENONLY, na.rm = TRUE)/length(MENONLY[!is.na(MENONLY)])*100),
prcntWOMEN = (sum(WOMENONLY, na.rm = TRUE)/length(WOMENONLY[!is.na(WOMENONLY)])*100)) %>%
gather(key = 'Type.prcnt', value = 'Prcnt', prcntHBCU:prcntWOMEN) %>%
ggplot (aes (x = STABBR, y = Prcnt, fill = Type.prcnt)) +
geom_col(stat = "identity", position = "dodge") +
ggtitle ("% of HBCUs, Men Only, and Women Only Institutions - by OH and Neighboring States") +
xlab ("State") +
ylab ("Percent of Institutions")
And here is the error RStudio gives when I run it:
Error: Problem with `summarise()` input `prcntHBCU`.
x invalid 'type' (character) of argument
i Input `prcntHBCU` is `(sum(HBCU, na.rm = TRUE)/length(HBCU[!is.na(HBCU)]) * 100)`.
i The error occurred in group 1: STABBR = "IN".
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/dplyr_error>
Problem with `summarise()` input `prcntHBCU`.
x invalid 'type' (character) of argument
i Input `prcntHBCU` is `(sum(HBCU, na.rm = TRUE)/length(HBCU[!is.na(HBCU)]) * 100)`.
i The error occurred in group 1: STABBR = "IN".
Backtrace:
1. dplyr::select(., STABBR, HBCU, MENONLY, WOMENONLY)
1. dplyr::filter(., str_detect(STABBR, "OH|PA|WV|KY|IN|MI"))
1. dplyr::group_by(., STABBR)
2. dplyr::summarize(...)
14. dplyr:::h(simpleError(msg, call))
Can anyone help debug this and tell me why it isn't working?
Building on the comments by @gregmacfarlane and @Calumn_You, the cause is most likely that sum() is being applied to a character vector, i.e. one of HBCU, MENONLY or WOMENONLY is stored as text.
One easy way to check the types of your variables is summary(scorecard_data). Numeric variables show the min, quartiles, median, mean and max. Character variables only report that their class is character. Factor variables show a count for each level.
You can convert characters to numeric with as.numeric(), assuming the strings contain numbers. If the variable is a factor, convert it to character first and then to numeric, i.e. as.numeric(as.character(x)). Applied directly to a factor, as.numeric() returns the internal level codes rather than the values shown in the labels.
So you are probably looking for a solution like:
scorecard_data %>%
mutate(HBCU = as.numeric(as.character(HBCU)), # if HBCU is of type factor
MENONLY = as.numeric(MENONLY), # if MENONLY is of type character
WOMENONLY = as.numeric(WOMENONLY)) %>% # if WOMENONLY is of type character
# leave STABBR alone: it holds state abbreviations such as "OH", which as.numeric() would turn into NA
# the rest of your code follows here
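A short base-R sketch of the points above, using toy vectors rather than the scorecard data. It shows that sum() rejects character input with the same "invalid 'type'" error and that as.numeric() on a factor returns level codes. As a side note, for a 0/1 indicator, mean(x, na.rm = TRUE) * 100 gives the same percentage as the sum/length expression used in the question.

```r
# sum() rejects character input: the "invalid 'type' (character)" error
chr <- c("1", "0", "1")
sum_error <- tryCatch(sum(chr), error = function(e) conditionMessage(e))
sum_error

# as.numeric() on a factor returns the level codes, not the labels
f <- factor(c("10", "5", "10"))       # levels: "10", "5"
codes  <- as.numeric(f)               # 1 2 1
values <- as.numeric(as.character(f)) # 10 5 10

# For a 0/1 indicator, both expressions give the same percentage
x <- c(1, 0, NA, 1)
pct_sum  <- sum(x, na.rm = TRUE) / length(x[!is.na(x)]) * 100
pct_mean <- mean(x, na.rm = TRUE) * 100
```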

R: dplyr::lag throws error when trying to lag characters in tibble

I'm getting the following error in R when I try to use the lag function (from the dplyr library) on a column of characters in a tibble:
Error in mutate_impl(.data, dots) : Expecting a single string value: [type=logical; extent=1].
This error does not occur for a column of characters in a data frame. I also don't get the error for a column of numbers in either a tibble or a data frame.
Does anyone know why I'm getting this discrepancy in the lag function for data frames versus tibbles? Thanks!
Here is some sample code that reproduces the error. I have examples of both when lag works and when it doesn't. I have tried updating the tidyverse and dplyr libraries on my machine but I'm still getting the same error.
tib = data_frame(x = c('a','b','c'), y = 1:3)
# lagging column of characters in tibble throws error
res = tib %>%
mutate(lag_n = lag(x, n=1, default = NA))
# lagging column of numbers in tibble does NOT throw error
res = tib %>%
mutate(lag_c = lag(y, n=1, default = NA))
df = data.frame(x = c('a','b','c'), y = 1:3)
# lagging column of characters in data frame does NOT throw error
res = df %>%
mutate(lag_n = lag(x, n=1, default = NA))
# lagging column of numbers in data frame does NOT throw error
res = df %>%
mutate(lag_c = lag(y, n=1, default = NA))
You're running into this error because dplyr and tibble are strict about types: the default value you pass to lag() must have the same type as the column being lagged, and a bare NA is of type logical (that is what [type=logical; extent=1] in the error refers to). For a character column you need NA_character_, like so:
res = tib %>%
mutate(lag_n = lag(x, n=1, default = NA_character_))
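Here is a self-contained version of the fix, assuming dplyr is installed. dplyr re-exports tibble(), which replaces the deprecated data_frame(). Passing a typed missing value keeps default in line with the column's type. Leaving default out altogether also pads the first row with a missing value.

```r
library(dplyr)

tib <- tibble(x = c("a", "b", "c"), y = 1:3)

res <- tib %>%
  mutate(
    lag_x = lag(x, n = 1, default = NA_character_),  # typed NA matches the character column
    lag_y = lag(y, n = 1)                            # no default: padded with a missing value
  )
```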

Resources