I have read "Programming with dplyr" and have succeeded in writing my first functions using dplyr pipes and bare variable names.
For the sake of readability, and so that I can use non-dplyr functions via do(), I rename the columns at the beginning of the function, perform calculations, and return the data frame with an additional calculated variable. The problem arises when I try to restore the original variable names.
require(dplyr)
require(rlang)
myfun <- function(df, var1) {
var1 <- enquo(var1)
# Rename column of interest
df <- df %>% rename(tempname = UQ(var1))
# Calculate a new variable (twice the column of interest)
df <- df %>% mutate(calc = tempname*2)
# Rename column of interest back to original name
df <- df %>% rename(UQ(var1) = tempname)
}
test <- myfun(mtcars, cyl)
This is the error thrown:
Error: unexpected '=' in:
" # Rename column of interest back to original name
df <- df %>% rename(UQ(var1) ="
> }
Error: unexpected '}' in "}"
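One possible way around this parse error, offered as a sketch rather than something taken from the original post: R's parser does not accept a call such as UQ(var1) on the left-hand side of =, but dplyr verbs accept := there, with the name unquoted via !!. The example assumes rlang's as_name() is used to turn the captured column into a string; orig_name is an illustrative variable name.

```r
library(dplyr)
library(rlang)

myfun <- function(df, var1) {
  var1 <- enquo(var1)
  # Keep the original column name as a string (orig_name is illustrative)
  orig_name <- as_name(var1)
  df %>%
    rename(tempname = !!var1) %>%      # temporary, readable name
    mutate(calc = tempname * 2) %>%    # calculation on the renamed column
    rename(!!orig_name := tempname)    # := allows an unquoted name on the left
}

test <- myfun(mtcars, cyl)
```

Because rename() keeps column positions, the result should have the original columns of mtcars in their original order, followed by calc.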
I have been reading this SO post on how to work with string references to variables in dplyr.
I would like to mutate an existing column based on string input:
var <- 'vs'
my_mtcars <- mtcars %>%
mutate(get(var) = factor(get(var)))
Error: unexpected '=' in:
"my_mtcars <- mtcars %>%
mutate(get(var) ="
Also tried:
my_mtcars <- mtcars %>%
mutate(!! rlang::sym(var) = factor(!! rlang::symget(var)))
This resulted in the exact same error message.
How can I do the following based on passing string 'vs' within var variable to mutate?
# works
my_mtcars <- mtcars %>%
mutate(vs = factor(vs))
This can be done with the := operator: unquote the string with !! on the left-hand side to set the column name, and on the right-hand side convert the string to a symbol with rlang::sym() and unquote it with !!:
library(dplyr)
my_mtcars <- mtcars %>%
mutate(!! var := factor(!! rlang::sym(var)))
class(my_mtcars$vs)
#[1] "factor"
Alternatively, use mutate_at, which accepts strings in vars() and applies the function of interest:
my_mtcars2 <- mtcars %>%
mutate_at(vars(var), factor)
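In dplyr 1.0 and later, scoped verbs such as mutate_at are superseded by across(). A minimal sketch of the same operation, assuming dplyr >= 1.0 and using all_of() to select the columns named in a character vector (my_mtcars3 is an illustrative name):

```r
library(dplyr)

var <- "vs"

# all_of() selects the columns whose names are stored in `var`
my_mtcars3 <- mtcars %>%
  mutate(across(all_of(var), factor))

class(my_mtcars3$vs)
```

The same call should also work when var holds several column names.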
I have a data frame in which every column consists of a number followed by text, e.g. 533 234r/r.
The following code to get rid of the text works well:
my_data <- my_data %>%
mutate(column1 = str_extract(column1, '.+?(?=[a-z])'))
I would like to do it for multiple columns:
col_names <- names(my_data)
for (i in 1:length(col_names)) {
my_data <- my_data%>%
mutate(col_names[i] = str_extract(col_names[i], '.+?(?=[a-z])'))
}
But it returns an error:
Error: unexpected '=' in:
" my_data <- my_data %>%
mutate(col_names[i] ="
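I think mutate_all() wouldn't work either, because str_extract() requires a column name as an argument.
If the column names are strings, convert each one to a symbol and unquote it (!!) on the right-hand side, and use := for the assignment so that the name on the left-hand side can be unquoted too: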
library(dplyr)
library(stringr)
col_names <- names(my_data)
for (i in seq_along(col_names)) {
my_data <- my_data %>%
mutate(!! col_names[i] :=
str_extract(!!rlang::sym(col_names[i]), '.+?(?=[a-z])'))
}
In the tidyverse, this can be done with across() instead of a for loop (dplyr version >= 1.0):
my_data <- my_data %>%
mutate(across(everything(), ~ str_extract(., '.+?(?=[a-z])')))
With an older dplyr version, use mutate_all:
my_data <- my_data %>%
mutate_all(~ str_extract(., '.+?(?=[a-z])'))
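To see what the pattern does, here is a small sketch on made-up data in the format described in the question (the tibble contents and the name cleaned are assumptions for illustration). Note that the columns stay character after extraction, and that str_extract() returns NA for any value containing no lowercase letter, since the lookahead then never matches.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: a number followed by text in every column
my_data <- tibble(
  column1 = c("533 234r/r", "12kg"),
  column2 = c("7 001m", "45cm")
)

# Keep everything before the first lowercase letter in every column
cleaned <- my_data %>%
  mutate(across(everything(), ~ str_extract(.x, ".+?(?=[a-z])")))

cleaned
```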
With the new release of dplyr I am refactoring quite a lot of code and removing functions that are now retired or deprecated. I had a function that is as follows:
processingAggregatedLoad <- function (df) {
defined <- ls()
passed <- names(as.list(match.call())[-1])
if (any(!defined %in% passed)) {
stop(paste("Missing values for the following arguments:", paste(setdiff(defined, passed), collapse=", ")))
}
df_isolated_load <- df %>% select(matches("snsr_val")) %>% mutate(global_demand = rowSums(.)) # we get isolated load
df_isolated_load_qlty <- df %>% select(matches("qlty_good_ind")) # we get isolated quality
df_isolated_load_qlty <- df_isolated_load_qlty %>% mutate_all(~ factor(.), colnames(df_isolated_load_qlty)) %>%
mutate_each(funs(as.numeric(.)), colnames(df_isolated_load_qlty)) # we convert the qlty to factors and then to numeric
df_isolated_load_qlty[df_isolated_load_qlty[]==1] <- 1 # 1 is bad
df_isolated_load_qlty[df_isolated_load_qlty[]==2] <- 0 # 0 is good we mask to calculate the global index quality
df_isolated_load_qlty <- df_isolated_load_qlty %>% mutate(global_quality = rowSums(.)) %>% select(global_quality)
df <- bind_cols(df, df_isolated_load, df_isolated_load_qlty)
return(df)
}
Basically, the function does the following:
1. It selects all of the values of a pivoted data frame and aggregates them.
2. It selects the quality indicators (character) of a pivoted data frame.
3. It converts the quality characters to factors and then to numeric to get the 2 levels (1 or 2).
4. It replaces the numeric values in each of the individual columns with 0 or 1 depending on the level.
5. It row-sums the individual quality columns: the result is 0 if all of the values are good, otherwise the global quality is bad.
The problem is that I am getting the following messages:
1: `funs()` is deprecated as of dplyr 0.8.0.
Please use a list of either functions or lambdas:
# Simple named list:
list(mean = mean, median = median)
# Auto named with `tibble::lst()`:
tibble::lst(mean, median)
# Using lambdas
list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
2: `mutate_each_()` is deprecated as of dplyr 0.7.0.
Please use `across()` instead.
I tried several variants, for instance:
df_isolated_load_qlty %>% mutate(across(.fns = ~ as.factor(), .names = colnames(df_isolated_load_qlty)))
Error: Problem with `mutate()` input `..1`.
x All unnamed arguments must be length 1
ℹ Input `..1` is `across(.fns = ~as.factor(), .names = colnames(df_isolated_load_qlty))`.
But I am still a bit confused about the new dplyr syntax. Would someone be able to guide me a little bit around the right way of doing this?
mutate_each has long been deprecated and was replaced by mutate_all.
mutate_all is in turn replaced by across.
across uses everything() as its default .cols, so unless columns are specified explicitly it behaves like mutate_all (as it does here).
You can apply multiple functions in the same mutate call, so here factor and as.numeric can be combined.
Considering all this, you can change your existing function to:
library(dplyr)
processingAggregatedLoad <- function (df) {
defined <- ls()
passed <- names(as.list(match.call())[-1])
if (any(!defined %in% passed)) {
stop(paste("Missing values for the following arguments:",
paste(setdiff(defined, passed), collapse=", ")))
}
df_isolated_load <- df %>%
select(matches("snsr_val")) %>%
mutate(global_demand = rowSums(.))
df_isolated_load_qlty <- df %>% select(matches("qlty_good_ind"))
df_isolated_load_qlty <- df_isolated_load_qlty %>%
mutate(across(.fns = ~as.numeric(factor(.))))
df_isolated_load_qlty[df_isolated_load_qlty ==1] <- 1
df_isolated_load_qlty[df_isolated_load_qlty==2] <- 0
df_isolated_load_qlty <- df_isolated_load_qlty %>%
mutate(global_quality = rowSums(.)) %>%
select(global_quality)
df <- bind_cols(df, df_isolated_load, df_isolated_load_qlty)
return(df)
}
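One caveat with the factor-based conversion, plus a possible alternative sketch: factor() assigns integer codes according to the sorted values present in each column, so a column containing only one distinct value gets code 1 regardless of what that value is. The example below avoids the codes by comparing against the "good" value directly. It assumes the indicators are coded "Y" (good) and "N" (bad), which the original post does not state, and it uses made-up column names and data.

```r
library(dplyr)

# Hypothetical toy data; "Y" = good, "N" = bad is an assumed coding
df <- tibble(
  a_snsr_val      = c(1, 2, 3),
  b_snsr_val      = c(10, 20, 30),
  a_qlty_good_ind = c("Y", "Y", "N"),
  b_qlty_good_ind = c("Y", "Y", "Y")   # single-valued column
)

result <- df %>%
  mutate(
    # Sum of all sensor values per row
    global_demand  = rowSums(across(matches("snsr_val"))),
    # Number of bad indicators per row: 0 means every reading is good
    global_quality = rowSums(across(matches("qlty_good_ind"),
                                    ~ as.integer(.x != "Y")))
  )

result
```

This keeps the original columns and appends the two aggregates in one step, so no bind_cols() call is needed.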
I am trying to set up a function in R which prepares data in a specific format to be fed into a correlogram. When manipulating datasets I tend to use dplyr due to its clarity and ease of use, but I am running into problems trying to pass a dataset and specified column names into this function while using dplyr.
This is the set up, included here (in slightly abbreviated form) for clarity. I have not encountered any errors with this and before posting this I confirmed corrData is set up properly:
library(corrplot)
library(tidyverse)
library(stringr)
table2a <- table2 %>%
mutate(example_index = str_c(country,year, sep="."))
Here is the actual function:
prepCorr <- function(dtable, x2, index2) {
practice <- dtable %>%
select(index2, x2) %>%
mutate(count=1) %>%
complete(index2, x2)
practice$count[is.na(practice$count)] <- 0
practice <- spread(practice, key = x2, value = count)
M <- cor(practice)
return(M)
}
prepCorr(table2a, type, example_index)
Whenever I run this function I get:
Error in overscope_eval_next(overscope, expr) : object 'example_index' not found
I have also tried to take advantage of quosures to fix this, but receive a different error when I do so. When I run the following modified code:
prepCorr <- function(dtable, x2, index2) {
x2 <- enquo(x2)
index2 <- enquo(index2)
practice <- dtable %>%
select(!!index2, !!x2) %>%
mutate(count=1) %>%
complete(!!index2, !!x2)
practice$count[is.na(practice$count)] <- 0
practice <- spread(practice, key = !!x2, value = count)
return(cor(practice))
}
prepCorr(table2a, type, example_index)
I get:
Error in !index2 : invalid argument type
What am I doing wrong here, and how can I fix this? For clarification, I believe I am using dplyr 0.7.
UPDATE: replaced old example with reproducible example.
Look at this example
library(dplyr)
myfun <- function(df, col1, col2) {
col1 <- enquo(col1) # need to quote
col2 <- enquo(col2)
df1 <- df %>%
select(!!col1, !!col2) #!! unquotes
return(df1)
}
myfun(mtcars, cyl, gear)
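With rlang 0.4.0 or later, the enquo() and !! pair can be written more compactly with the curly-curly operator {{ }}. A minimal sketch of the same function in that style (myfun2 and res are illustrative names, and a sufficiently recent dplyr and rlang are assumed):

```r
library(dplyr)

myfun2 <- function(df, col1, col2) {
  # {{ }} captures the bare column name and unquotes it in one step
  df %>%
    select({{ col1 }}, {{ col2 }})
}

res <- myfun2(mtcars, cyl, gear)
```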
You can learn more about NSE vs. SE here (link).
dplyr::group_by() fails to group the variables of the following data.frame contained in a pc-axis file:
library("pacman")
pacman::p_load(pxR, dplyr, janitor)
px_file <- "https://www.pxweb.bfs.admin.ch/DownloadFile.aspx?file=px-x-1502040100_131"
pxR::read.px(base::url(px_file))$DATA$value %>% # the data.frame
janitor::clean_names() %>%
dplyr::select (student_level = studienstufe,
year = jahr,
counts = value) %>% # dplyr::rename() also fails
dplyr::group_by (year, student_level) %>% # not grouping!
dplyr::summarise(totals = sum (counts))
I believe it could be due to an encoding issue, but I cannot find the problem. Any ideas? Thanks.
The only fault I could find was that you use select instead of rename. You wrote that rename didn't work for you. This worked for me:
library("pacman")
library("dplyr")
library("janitor")
# Loading your data
pacman::p_load(pxR, dplyr, janitor)
px_file <- "https://www.pxweb.bfs.admin.ch/DownloadFile.aspx?file=px-x-1502040100_131"
px <- pxR::read.px(base::url(px_file))$DATA$value
# Cleaning the column names
px1 <- px %>% janitor::clean_names()
# Rename the columns
px2 <- px1 %>%
dplyr::rename (student_level = studienstufe,
sex = geschlecht,
year = jahr,
counts = value)
# Grouping data
px3 <- px2 %>%
dplyr::group_by (year, student_level) %>%
dplyr::summarise(totals = sum (counts))
I split every step into its own data frame to inspect the intermediate results. This is not necessary.
If this doesn't work, you may upload your session info.
P.S. I also renamed the column geschlecht :)