How can I use dplyr's "Select helpers" in paste()?

This works, but it is tedious because every column has to be typed out.
> library(dplyr)
> mutate(iris, a = paste( Petal.Width, Petal.Length) ) %>% head
Sepal.Length Sepal.Width Petal.Length Petal.Width Species a
1 5.1 3.5 1.4 0.2 setosa 0.2 1.4
2 4.9 3.0 1.4 0.2 setosa 0.2 1.4
3 4.7 3.2 1.3 0.2 setosa 0.2 1.3
4 4.6 3.1 1.5 0.2 setosa 0.2 1.5
5 5.0 3.6 1.4 0.2 setosa 0.2 1.4
6 5.4 3.9 1.7 0.4 setosa 0.4 1.7
How can I use dplyr's "Select helpers" inside paste() instead? All of the following attempts fail:
> mutate(iris, a = paste( starts_with("Petal") ))
Error in mutate_impl(.data, dots) :
wrong result size (0), expected 150 or 1
> mutate_(iris, a = paste( starts_with("Petal") ))
Error in parse(text = x)[[1]] : subscript out of bounds
> mutate_(iris, a = paste( starts_with(Petal) ))
Error in is.string(match) : object 'Petal' not found
> mutate(iris, a = paste( grep("Petal", names(iris), value=T) ))
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 150 or 1
This did not work either:
> mutate(iris, a = paste( names(iris)[base::startsWith(names(iris),"Petal")] ))
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 150 or 1
I wrote a rather clumsy function that builds the call as a string and evaluates it. It works, so I may use it unless I find something simpler.
> paste.colprefix <- function(DFNAME, PREFIX){
+ TMP <- eval(parse(text= paste0("grep(\"", PREFIX, "\",names(", DFNAME, "), v=T)")))
+ TMP <- paste0(DFNAME, "$",TMP)
+ TMP <- paste0(TMP, collapse = ",")
+ eval(parse(text= paste0( "paste(", TMP, ")")))
+ }
>
> iris$PetalPaste <- paste.colprefix("iris", "Petal")
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species PetalPaste
1 5.1 3.5 1.4 0.2 setosa 1.4 0.2
2 4.9 3.0 1.4 0.2 setosa 1.4 0.2
3 4.7 3.2 1.3 0.2 setosa 1.3 0.2
4 4.6 3.1 1.5 0.2 setosa 1.5 0.2
5 5.0 3.6 1.4 0.2 setosa 1.4 0.2
6 5.4 3.9 1.7 0.4 setosa 1.7 0.4
>

You cannot use select helpers such as starts_with() inside paste(); they only work where dplyr expects a column selection.
The trick is to pick out the matching column names yourself and pass those columns to paste().
This also explains the grep() and startsWith() errors in the question: there, paste() receives the two column names, not the 150 values stored in those columns.
Either of the following techniques gives you the column names.
base::startsWith(character vector, prefix)
cn <- names(iris)[base::startsWith(names(iris), "Petal")]
stringr::str_detect(character vector, regular expression)
cn <- names(iris)[stringr::str_detect(names(iris), "^Petal")]
Both return the vector of column names that start with "Petal".
You can then build the pasted column like this:
iris$a <- do.call(paste, iris[cn])
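As a rough base-R sketch of the same idea, the two steps can be wrapped in a small helper. The name paste_cols_with_prefix and its sep argument are made up for this illustration; they are not part of dplyr or base R.

```r
# Hypothetical helper: paste together, row by row, all columns of `df`
# whose names start with `prefix`.
paste_cols_with_prefix <- function(df, prefix, sep = " ") {
  cn <- names(df)[startsWith(names(df), prefix)]
  if (length(cn) == 0) {
    stop("no column name starts with '", prefix, "'")
  }
  # A data frame is a list of columns, so do.call() hands each selected
  # column to paste() as a separate argument. unname() keeps column names
  # from being mistaken for paste() arguments such as `sep`.
  do.call(paste, c(unname(as.list(df[cn])), sep = sep))
}

petal_paste <- paste_cols_with_prefix(iris, "Petal")
head(petal_paste, 3)
# "1.4 0.2" "1.4 0.2" "1.3 0.2"
```

Unlike the eval(parse()) version in the question, this takes the data frame itself rather than its name as a string, so it also works on data frames created inside other functions.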

Related

way to customize zScore function with r

I am new to R and have a question.
Create a function, zScore, that takes a vector of numbers (x) and converts it to a vector of z-scaled numbers (see code below). (Don't worry about NAs.)
#This creates the z-scaled numbers for sepal lengths
(iris$Sepal.Length - mean(iris$Sepal.Length))/sd(iris$Sepal.Length)
#This creates the z-scaled numbers for sepal widths
(iris$Sepal.Width - mean(iris$Sepal.Width))/sd(iris$Sepal.Width)
Write a zScore function that is flexible enough to work on any numeric vector.
Thank you for any help you can provide.

You can use the following code:
# Z-score function
zscore <- function(x) {
  (x - mean(x)) / sd(x)
}
library(tidyverse)
iris %>%
  mutate(zscore_sepal.length = zscore(Sepal.Length)) %>%
  mutate(zscore_sepal.width = zscore(Sepal.Width))
Output (first five rows; z-scores rounded to three decimals):
Sepal.Length Sepal.Width Petal.Length Petal.Width Species zscore_sepal.length zscore_sepal.width
1 5.1 3.5 1.4 0.2 setosa -0.898 1.016
2 4.9 3.0 1.4 0.2 setosa -1.139 -0.132
3 4.7 3.2 1.3 0.2 setosa -1.381 0.327
4 4.6 3.1 1.5 0.2 setosa -1.501 0.098
5 5.0 3.6 1.4 0.2 setosa -1.018 1.245
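One way to sanity-check such a function, sketched here in base R only: the result should have mean 0 and standard deviation 1, and it should agree with the built-in scale(). The na.rm argument is an addition for illustration; the question says NAs can be ignored.

```r
# z-score with an optional (illustrative) na.rm switch
zscore <- function(x, na.rm = FALSE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

z <- zscore(iris$Sepal.Length)
abs(mean(z)) < 1e-12                                # TRUE: centred on 0
all.equal(sd(z), 1)                                 # TRUE: unit standard deviation
all.equal(z, as.numeric(scale(iris$Sepal.Length)))  # TRUE: matches scale()

# "Flexible" use: apply it to every numeric column at once
numeric_cols <- vapply(iris, is.numeric, logical(1))
z_all <- as.data.frame(lapply(iris[numeric_cols], zscore))
```

If the parentheses around x - mean(x) are dropped, the first check fails immediately, which makes it a cheap guard against that mistake.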

R - Logical operator in subset() is a string

I feel I have a simple question, but I cannot get my code to work. In short, I want the condition statement in a subset() function to be a string. This mostly works, except for the logical operator. So I would want something like this;
my.string = "gender == female"
Subsequently I would run;
myData = subset(myData, my.string)
I have tried things like;
myData = subset(myData, parse(text = my.string))
myData = subset(myData, eval(parse(text = my.string)))
But to no avail. The main reason I want to do this is that I want to be able to define the filter conditions up front in the code, like this:
filter.variable[[1]] = "gender"
filter.condition[[1]] = "==" # or %in%
filter.value[[1]] = "female"
i = 1
my.string = paste(filter.variable[[i]],filter.condition[[i]],filter.value[[i]])
This way I do not have to hardwire any filters in R.
Any suggestions are much appreciated,
Alex

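The value 'female' needs quotes inside the string; otherwise R looks for an object named female. dQuote() with its second argument set to FALSE adds plain ASCII double quotes:
my.string <- paste0('gender == ', dQuote('female', FALSE))
Or write the double quotes directly inside a single-quoted string:
my.string = 'gender == "female"'
Then pass the string to subset() through eval(parse(text = ...)).
Using a reproducible example: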
my.string <- paste0('Species == ', dQuote('setosa', FALSE))
subset(iris, eval(parse(text = my.string)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
#7 4.6 3.4 1.4 0.3 setosa
#8 5.0 3.4 1.5 0.2 setosa
# ...
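If the variable, operator and value are already stored separately, as in the question, a sketch that avoids building and parsing a string is to look the operator up as a function with match.fun(). The helper name filter_by is hypothetical, and this is an alternative technique rather than the eval(parse()) approach above.

```r
# Hypothetical helper: filter `df` using a column name, an operator name
# and a comparison value that are all supplied as ordinary R values.
filter_by <- function(df, variable, operator, value) {
  compare <- match.fun(operator)            # "==" and "%in%" are ordinary functions
  keep <- compare(df[[variable]], value)
  df[!is.na(keep) & keep, , drop = FALSE]   # like subset(), drop rows where the test is NA
}

setosa <- filter_by(iris, "Species", "==", "setosa")
nrow(setosa)
# 50

two_species <- filter_by(iris, "Species", "%in%", c("setosa", "virginica"))
nrow(two_species)
# 100
```

Because the value is passed as data rather than pasted into code, no extra quoting is needed.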

R: Transforming variables over many columns

I want to transform multiple columns in a large data.frame at once using across.
As an example I want to make this transformation
library(tidyverse)
iris %>% mutate(Sepal.Length2 = (Sepal.Length^4-min(Sepal.Length^4)) / (max(Sepal.Length^4) - min(Sepal.Length^4)))
but for all columns starting with "Sepal".
I think I can use this command, but I can't figure out how to add my function.
iris %>% mutate(across(starts_with("Sepal")), ... )
Sorry if this is too trivial, but I don't know what to search for to find useful pages.

We can pass a lambda to across() and use .names to add new columns next to the originals:
library(dplyr)
iris1 <- iris %>%
  mutate(across(starts_with("Sepal"),
    ~ (.^4-min(.^4)) / (max(.^4) - min(.^4)), .names = '{.col}2'))

Alternatively, define the transformation as a named function and pass it to across(). Without .names, the Sepal columns are overwritten in place:
my_function <- function(x) {
  x4 <- x^4
  # min-max rescaling of the fourth power
  (x4 - min(x4)) / (max(x4) - min(x4))
}
iris %>%
  mutate(across(starts_with("Sepal"), my_function))
Output (first six rows; transformed columns rounded to four decimals):
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 0.0942 0.3736 1.4 0.2 setosa
2 0.0660 0.1812 1.4 0.2 setosa
3 0.0411 0.2476 1.3 0.2 setosa
4 0.0298 0.2128 1.5 0.2 setosa
5 0.0797 0.4235 1.4 0.2 setosa
6 0.1431 0.6002 1.7 0.4 setosa
.....
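For comparison, here is a minimal base-R sketch of the same transformation that adds new columns with a "2" suffix instead of overwriting the originals. The names rescale_pow4 and iris2 are chosen for this illustration only.

```r
# Fourth power, then min-max rescaling to the interval [0, 1]
rescale_pow4 <- function(x) {
  x4 <- x^4
  (x4 - min(x4)) / (max(x4) - min(x4))
}

# Select the columns by prefix, then add one transformed copy per column
sepal_cols <- names(iris)[startsWith(names(iris), "Sepal")]
iris2 <- iris
iris2[paste0(sepal_cols, "2")] <- lapply(iris[sepal_cols], rescale_pow4)

names(iris2)
range(iris2$Sepal.Length2)
# 0 1
```

A quick check that the result spans exactly 0 to 1 catches misplaced parentheses in the formula.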

Tidyeval evaluate case_when

Somewhat related to Tidy evaluation programming with dplyr::case_when and Making tidyeval function inside case_when, I want to create strings (using a shiny app) to be parsed later inside a case_when function. Here's an example:
library(tidyverse)
# simulated shiny inputs
new_column = sym("COL_NAME")
number_of_categories = 3
col1_text = "Big"
col1_min = 7.0
col1_max = 8.0
col2_text = "Medium"
col2_min = 5.0
col2_max = 6.9
col3_text = "Small"
col3_max = 4.9
col3_min = 4.0
columninput = sym("Sepal.Length")
DESIRED OUTPUT
iris %>%
mutate(new_column =
case_when(
!!columninput >= col1_min & !!columninput <= col1_max ~ col1_text,
!!columninput >= col2_min & !!columninput <= col2_max ~ col2_text,
!!columninput >= col3_min & !!columninput <= col3_max ~ col3_text
)
)
Because the only thing changing between functions is the index, I was thinking we can use the general pattern to create a string
# create single string
my_string <-function(i) {
paste0("!!", columninput, " >= col", i, "_min & ", "!!", columninput, " <= col", i, "_max ~ col", i, "_text")
}
Then repeat the string for the dynamic number of cases
mega_string <- map_chr(1:number_of_categories, ~ my_string(.x))
TODO:
This is the part I can't quite piece together: using those strings as the arguments within a case_when.
# evaluate somehow?
iris %>%
mutate(
new_column = case_when(
# tidyeval mega_string?
paste(mega_string, collapse = "," )
)
)
Is this even the right approach? How else would you go about solving this - any help high level or otherwise is greatly appreciated!

We could paste the strings into a single case_when() call, parse it into an expression, and evaluate it:
library(dplyr)
library(stringr)
iris %>%
mutate(new_column = eval(rlang::parse_expr(str_c('case_when(',
str_c(mega_string, collapse=","), ')'))))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_column
#1 5.1 3.5 1.4 0.2 setosa Medium
#2 4.9 3.0 1.4 0.2 setosa Small
#3 4.7 3.2 1.3 0.2 setosa Small
#4 4.6 3.1 1.5 0.2 setosa Small
#5 5.0 3.6 1.4 0.2 setosa Medium
#6 5.4 3.9 1.7 0.4 setosa Medium
#7 4.6 3.4 1.4 0.3 setosa Small
#8 5.0 3.4 1.5 0.2 setosa Medium
#9 4.4 2.9 1.4 0.2 setosa Small
#10 4.9 3.1 1.5 0.1 setosa Small
# ...
Or parse each string separately and splice the resulting expressions into case_when() with !!!:
library(purrr)
iris %>%
mutate(new_column = case_when(!!! map(mega_string, rlang::parse_expr)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_column
#1 5.1 3.5 1.4 0.2 setosa Medium
#2 4.9 3.0 1.4 0.2 setosa Small
#3 4.7 3.2 1.3 0.2 setosa Small
#4 4.6 3.1 1.5 0.2 setosa Small
#5 5.0 3.6 1.4 0.2 setosa Medium
#6 5.4 3.9 1.7 0.4 setosa Medium
#7 4.6 3.4 1.4 0.3 setosa Small
#8 5.0 3.4 1.5 0.2 setosa Medium
#...

Thanks for the nice question and answer.
I'm using this in the same context (Shiny).
I'd like to mention another approach that suits my needs better and whose logic I find easier to read: rather than referring to variables in the string to be evaluated, you put the values directly into the string, taking them from a tibble with str_glue_data():
mega <- tribble(
  ~min, ~max, ~size,
  7, 8, "Big",
  5, 6.9, "Medium",
  4, 4.9, "Small"
) %>%
  str_glue_data("Sepal.Length >= {min} & Sepal.Length <= {max} ~ '{size}'")
iris %>%
  mutate(new_column = case_when(!!! map(mega, rlang::parse_expr)))
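If parsing strings feels fragile, the same lookup can be sketched in base R without generating any code. This is a different technique from the case_when() answers above; the names bounds and label_by_range are hypothetical.

```r
# One row per category: inclusive lower bound, inclusive upper bound, label
bounds <- data.frame(
  min  = c(7, 5, 4),
  max  = c(8, 6.9, 4.9),
  size = c("Big", "Medium", "Small"),
  stringsAsFactors = FALSE
)

# Assign each value the label of the first range that contains it,
# mirroring the first-match-wins behaviour of case_when()
label_by_range <- function(x, bounds) {
  out <- rep(NA_character_, length(x))
  for (i in seq_len(nrow(bounds))) {
    hit <- is.na(out) & !is.na(x) & x >= bounds$min[i] & x <= bounds$max[i]
    out[hit] <- bounds$size[i]
  }
  out
}

label_by_range(c(5.1, 4.9, 7.2, 3.0), bounds)
# "Medium" "Small" "Big" NA

iris$new_column <- label_by_range(iris$Sepal.Length, bounds)
```

The bounds table could be filled from the Shiny inputs directly, so the number of categories can change without touching the function.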

accessing variables in data frame in R

I am trying to open all the csv files in my working directory and read them into a large list of data frames. I found a similar solution on Stack Overflow, and it works. The code is:
load_data <- function(path)
{
files <- dir(path, pattern = '\\.csv', full.names = TRUE)
tables <- lapply(files, read.csv)
do.call(rbind, tables)
}
pollutantmean <- load_data("specdata")
However, I am confused about some of the steps. If I omit do.call(rbind, tables), I cannot access a column by calling tables[index]$variable; it returns NULL in the console. When I print tables[index], I do not see the column names in the first row of the table. Can someone explain why the column names seem to be missing and why I get NULL?

To see why you are getting NULL let's create a reproducible example:
df1 <- head(mtcars)
df2 <- head(iris)
my_list <- list(df1, df2)
Test the subsetting with one bracket and two:
my_list[2]$Species
NULL
my_list[[2]]$Species
[1] setosa setosa setosa setosa setosa setosa
Levels: setosa versicolor virginica
Subsetting with two brackets produces the desired output.
Further Explanation
Why doesn't one bracket work?
> my_list[2]
# [[1]]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
> my_list[[2]]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
It would be easy to miss the difference between the two outputs, because they look alike. There is one small but important difference: single brackets return a list, while double brackets return the data frame itself. Notice the [[1]] on the first line of the output of my_list[2]; it shows that the result is a list of length one that contains the data frame. That list has no element called Species, so $Species returns NULL. Use double brackets to get the data frame back.
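The difference can also be checked directly instead of being read off the printed output. A small sketch using the same example list:

```r
my_list <- list(head(mtcars), head(iris))

class(my_list[2])            # "list": a sub-list of length one
class(my_list[[2]])          # "data.frame": the element itself
length(my_list[2])           # 1

is.null(my_list[2]$Species)  # TRUE: the sub-list has no element named Species
levels(my_list[[2]]$Species) # "setosa" "versicolor" "virginica"

# Extracting the element from the sub-list gives the data frame back
identical(my_list[2][[1]], my_list[[2]])  # TRUE
```

The same applies to the tables list in the question: tables[[index]]$variable reaches the column, while tables[index]$variable does not.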
