Constructing lists using tidyeval tools (like `!!` and `:=`) - r

I am looking for an easy way to construct lists using R's tidyeval framework, as defined in the rlang package.
Below is what I want to achieve:
a <- "item_name"
b <- "item_value"
identical(
list(!!a := !!b), # list(!!a := b) is of course also fine
list(item_name = "item_value")
)
What I can obtain at the moment is:
list(!!a := !!b)
# output
[[1]]
`:=`(!(!a), !(!b))
Alternatively, wrapping the call in a quosure gets a little closer:
quo(list(!!a := !!b))
# output
<quosure: global>
~list(`:=`("item_name", "item_value"))
Unfortunately I have no idea how to proceed further from here.
In other words, I would like an effect similar to what the dplyr package provides:
mutate(iris, !!a := b)
# first few rows
Sepal.Length Sepal.Width Petal.Length Petal.Width Species item_name
1 5.1 3.5 1.4 0.2 setosa item_value
2 4.9 3.0 1.4 0.2 setosa item_value
3 4.7 3.2 1.3 0.2 setosa item_value
4 4.6 3.1 1.5 0.2 setosa item_value
5 5.0 3.6 1.4 0.2 setosa item_value
6 5.4 3.9 1.7 0.4 setosa item_value

You can use rlang::list2(), which supports name-unquoting with := and splicing with !!!.
Note that you shouldn't unquote the argument itself, since list2() is not a quoting function. It is just like list() with a few more syntactic features:
a <- "item_name"
b <- "item_value"
list2(!!a := b)
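As a minimal sketch of the two features mentioned above (the `extra` list is a made-up example, not part of the question), name-unquoting and splicing can be combined in one call:

```r
library(rlang)

a <- "item_name"
b <- "item_value"

# Name-unquoting: the left-hand side of := is evaluated to get the name.
res <- list2(!!a := b)

# Splicing: the elements of `extra` (hypothetical) become separate arguments.
extra <- list(x = 1, y = 2)
combined <- list2(!!a := b, !!!extra)

identical(res, list(item_name = "item_value"))
names(combined)
```

The value `b` is passed as an ordinary argument, which is why it needs no `!!`.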

Related

Why doesn't passing names(.) to the formula in rename_with work?

I'm not sure why the first line below throws an error while the second one works. My understanding was that using names(.) in a formula tells R to use the data from before the pipe operator. That seems to work for the .cols argument but not for the formula.
iris%>%rename_with(~gsub("Petal","_",names(.)),all_of(names(.)))
iris%>%rename_with(~gsub("Petal","_",names(iris)),all_of(names(.)))
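rename_with applies a function to the names of the passed data frame. The function should be one that, given the vector of names, returns the altered names. Inside the formula, . (or .x) is therefore that character vector of names rather than the data frame, so names(.) is NULL and the first call fails. The syntax is much simpler than you are trying to make it: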
iris %>%
rename_with(~ gsub("Petal", "_", .x))
#> Sepal.Length Sepal.Width _.Length _.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#... etc
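If only some columns should be renamed, the second argument of rename_with can restrict the selection. The following is a small sketch under that assumption, where `starts_with("Petal")` is just one possible selector:

```r
library(dplyr)

# The formula receives only the selected column names as .x
renamed <- iris %>%
  rename_with(~ gsub("Petal", "_", .x), starts_with("Petal"))

names(renamed)
```

Here the function only ever sees "Petal.Length" and "Petal.Width", so the other names pass through untouched.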

Is it possible to combine parameters to a subset function that is generated programmatically in R?

Before my question, here is a little background.
I am creating a general purpose data shaping and charting library for plotting survey data of a particular format.
As part of my scripts, I am using the subset function on my data frame. The way I am working is that I have a parameter file where I can pass this subsetting criteria into my functions (so I don't need to directly edit my main library). The way I do this is as follows:
subset_criteria <- expression(variable1 != "" & variable2 == TRUE)
(where variable1 and variable2 are columns in my data frame, for example).
Then in my function, I call this as follows:
my.subset <- subset(my.data, eval(subset_criteria))
This part works exactly as I want it to work. But now I want to augment that subsetting criteria inside the function, based on some other calculations that can only be performed inside the function. So I am trying to find a way to combine together these subsetting expressions.
Imagine inside my function I create some new column in my data frame automatically, and then I want to add a condition to my subsetting that says that this additional column must be TRUE.
Essentially, I do the following:
my.data$newcolumn <- with(my.data, ifelse(...some condition..., TRUE, FALSE))
Then I want my subsetting to end up being:
my.subset <- subset(my.data, eval(subset_criteria & newcolumn == TRUE))
But simply doing what I list above does not seem to be valid: I get the wrong result. So I'm looking for a way of combining these expressions using expression and eval so that I essentially get the combination of all the conditions.
Thanks for any pointers. It would be great if I can do this without having to rewrite how I do all my expressions, but I understand that might be what is needed...
Bob
You should probably avoid two things: using subset in a non-interactive setting (see the warning in its help page) and eval(parse()). That said, here is an approach that uses both.
You can convert the expression to a string and append whatever you want to it. The trick is to convert the string back into an expression, which is where the aforementioned parse comes in.
sub1 <- expression(Species == "setosa")
subset(iris, eval(sub1))
sub2 <- paste(sub1, '&', 'Petal.Width > 0.2')
subset(iris, eval(parse(text = sub2))) # your case
> subset(iris, eval(parse(text = sub2)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
22 5.1 3.7 1.5 0.4 setosa
24 5.1 3.3 1.7 0.5 setosa
27 5.0 3.4 1.6 0.4 setosa
32 5.4 3.4 1.5 0.4 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
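As an alternative sketch that avoids both subset and eval(parse()), the two conditions can be combined as language objects with bquote and evaluated against the data frame directly. The names `extra`, `combined`, `rows` and `result` are only illustrative:

```r
sub1  <- expression(Species == "setosa")
extra <- quote(Petal.Width > 0.2)

# Build the call `<sub1> & <extra>` without going through strings.
combined <- bquote(.(sub1[[1]]) & .(extra))

# Evaluate the combined condition with the data frame's columns in scope.
rows   <- eval(combined, iris)
result <- iris[rows & !is.na(rows), ]

nrow(result)
```

Because the pieces are joined as calls rather than pasted text, each condition stays intact as an operand of `&`, whatever operators it contains.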

Is it possible to include >=, <= operators when using setkey with data.table in R?

I'm looking at this brief tutorial for data.table
https://www.r-bloggers.com/r-data-table-tutorial-with-50-examples/
but I get stuck when the author talks about setkey().
Here is my example. I use the iris dataset so it can be easily replicated:
mydata <- as.data.table(iris)
#Change variable names
mydata <- setnames(mydata, c("Sepal.Length","Sepal.Width", "Petal.Length", "Petal.Width", "Species"),
c("sepal_length", "sepal_width", "petal_length", "petal_width", "species"))
Now I will use a factor variable and a numeric variable as keys:
setkey(mydata, species, petal_length)
Using this works perfectly:
> mydata[.("setosa", 1.4)]
sepal_length sepal_width petal_length petal_width species
1: 5.1 3.5 1.4 0.2 setosa
2: 4.9 3.0 1.4 0.2 setosa
3: 5.0 3.6 1.4 0.2 setosa
4: 4.6 3.4 1.4 0.3 setosa
5: 4.4 2.9 1.4 0.2 setosa
6: 4.8 3.0 1.4 0.1 setosa
7: 5.1 3.5 1.4 0.3 setosa
8: 5.2 3.4 1.4 0.2 setosa
9: 5.5 4.2 1.4 0.2 setosa
10: 4.9 3.6 1.4 0.1 setosa
11: 4.8 3.0 1.4 0.3 setosa
12: 4.6 3.2 1.4 0.2 setosa
13: 5.0 3.3 1.4 0.2 setosa
But this throws an error:
mydata[.("setosa", <1.4)]
Error: unexpected '<' in "mydata[.("setosa", <"
So my question is whether it is possible to include >, <, >=, <= when searching with keys set by setkey, since that function is supposed to work on variables of any type. If so, what is the correct way to write something like mydata[.("setosa", <1.4)]?
I looked at:
R data.table setkey with numeric column
R data.table 1.9.2 issue on setkey
but found nothing useful to answer my question.
I also read data.table documentation but there are no useful examples.
Any comment will be much appreciated.
It looks like you are subsetting rather than extracting identical matches. The syntax below feels more natural
mydata[species=="setosa" & petal_length < 1.4]
or a non-equi join like this
mydata[.(species="setosa", i.petal_length=1.4), on=.(species, petal_length < i.petal_length)]
I found something that can be useful, using the seq function.
Suppose I want to retrieve the setosa observations with petal_length between 1.4 and 2.
Following the example in my original question, we can use:
na.omit(mydata[.("setosa", seq(1.4,2, 0.1))])
which returns the observations we wanted.
seq(1.4, 2, 0.1)
returns a sequence from 1.4 to 2 in steps of 0.1. The join looks up each of these values in the data.table and produces NA rows for the values with no match (1.6, 1.8 and 1.9 here). That is why the result is wrapped in na.omit.
Hope this is useful to somebody.
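For range conditions like these, a hedged alternative to enumerating values with seq is to filter on the columns directly. The sketch below rebuilds the table from the question (the renaming one-liner is my shortcut, not the original code) and uses data.table's between():

```r
library(data.table)

mydata <- as.data.table(iris)
# Shortcut for the renaming in the question: lower case, dots to underscores.
setnames(mydata, tolower(gsub(".", "_", names(mydata), fixed = TRUE)))
setkey(mydata, species, petal_length)

# Strict inequality on the second key column.
short <- mydata[species == "setosa" & petal_length < 1.4]

# Closed interval [1.4, 2]; no NA rows are produced, so na.omit is not needed.
ranged <- mydata[species == "setosa" & between(petal_length, 1.4, 2)]
```

This does not depend on the looked-up values matching the stored doubles exactly, which a join on seq(1.4, 2, 0.1) does.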

When trying to call an object with get() within group_by and mutate, it brings up the entire object and not the grouped object. How do I fix this?

Here is my code:
data(iris)
spec<-names(iris[1:4])
iris$Size<-factor(ifelse(iris$Sepal.Length>5,"A","B"))
for(i in spec){
attach(iris)
output<-iris %>%
group_by(Size)%>%
mutate(
out=mean(get(i)))
detach(iris)
}
The for loop is written around some graphing and report writing that uses object 'i' in various parts. I am using dplyr and plyr.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Size out
1 5.1 3.5 1.4 0.2 setosa A 1.199333
2 4.9 3.0 1.4 0.2 setosa B 1.199333
3 4.7 3.2 1.3 0.2 setosa B 1.199333
4 4.6 3.1 1.5 0.2 setosa B 1.199333
5 5.0 3.6 1.4 0.2 setosa B 1.199333
Notice how that variable 'out' has the same mean, which is the mean of the entire dataset instead of the grouped mean.
> tapply(iris$Petal.Width,iris$Size,mean)
A B
1.432203 0.340625
> mean(iris$Petal.Width)
[1] 1.199333
Using get() and attach() doesn't fit well with dplyr because it muddles the environments in which the functions are evaluated. It would be better to use the standard-evaluation equivalent of mutate here, as described in the NSE vignette (vignette("nse", package="dplyr")):
for(i in spec){
output<-iris %>%
group_by(Size)%>%
mutate_(.dots=list(out=lazyeval::interp(~mean(x), x=as.name(i))))
# print(output)
}
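mutate_() and lazyeval have since been deprecated in dplyr. A sketch of the same loop for more recent dplyr versions, assuming the `.data` pronoun is available, might look like this (`iris2` and `results` are names introduced here for illustration):

```r
library(dplyr)

iris2 <- iris
iris2$Size <- factor(ifelse(iris2$Sepal.Length > 5, "A", "B"))
spec <- names(iris2)[1:4]

results <- list()
for (i in spec) {
  results[[i]] <- iris2 %>%
    group_by(Size) %>%
    # .data[[i]] looks up the column named by the string i within each group
    mutate(out = mean(.data[[i]])) %>%
    ungroup()
}
```

No attach() or get() is needed, because the column is resolved inside the grouped data rather than on the search path.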

splitting a data.table, then modifying by reference

I have a use-case where I need to split a data.table, then apply different modify-by-reference operations to each partition. However, splitting forces copying of each table.
Here's a toy example on the iris dataset:
#split the data
DT <- data.table(iris)
out <- split(DT, DT$Species)
#assign partitions to global environment
NAMES <- as.character(unique(DT$Species))
lapply(seq_along(out), function(x) {
assign(NAMES[x], out[[x]], envir=.GlobalEnv)})
#modify by reference, same function applied to different columns for different partitions
#would do this programmatically in real use case
virginica[ ,summ:=sum(Petal.Length)]
setosa[ ,summ:=sum(Petal.Width)]
#rbind all (again, programmatic)
do.call(rbind, list(virginica, setosa))
Then I get the following warning:
Warning message:
In `[.data.table`(out$virginica, , `:=`(cumPedal, cumsum(Petal.Width))) :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table so that := can add this new column by reference.
I know this is related to putting data.tables in lists. Is there any workaround for this use case, or a way to avoid using split? Note that in the real case, I want to modify by reference programmatically, so hardcoding a solution won't work.
Here's an example of using .EACHI to achieve what it sounds like you're trying to do:
## Create a data.table that indicates the pairs of keys to columns
New <- data.table(
Species = c("virginica", "setosa", "versicolor"),
FunCol = c("Petal.Length", "Petal.Width", "Sepal.Length"))
## Set the key of your original data.table
setkey(DT, Species)
## Now use .EACHI
DT[New, temp := cumsum(get(FunCol)), by = .EACHI][]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species temp
# 1: 5.1 3.5 1.4 0.2 setosa 0.2
# 2: 4.9 3.0 1.4 0.2 setosa 0.4
# 3: 4.7 3.2 1.3 0.2 setosa 0.6
# 4: 4.6 3.1 1.5 0.2 setosa 0.8
# 5: 5.0 3.6 1.4 0.2 setosa 1.0
# ---
# 146: 6.7 3.0 5.2 2.3 virginica 256.9
# 147: 6.3 2.5 5.0 1.9 virginica 261.9
# 148: 6.5 3.0 5.2 2.0 virginica 267.1
# 149: 6.2 3.4 5.4 2.3 virginica 272.5
# 150: 5.9 3.0 5.1 1.8 virginica 277.6
## Basic verification
head(cumsum(DT["setosa", ]$Petal.Width), 5)
# [1] 0.2 0.4 0.6 0.8 1.0
tail(cumsum(DT["virginica", ]$Petal.Length), 5)
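The question computes a per-partition sum() rather than a cumsum(). Assuming the same lookup-table idea, a sketch of that variant follows. It uses `on =` instead of setkey, which is my substitution, and the column name `summ` is taken from the question:

```r
library(data.table)

DT <- data.table(iris)

# Lookup table: which column to summarise for each species.
New <- data.table(
  Species = c("virginica", "setosa", "versicolor"),
  FunCol  = c("Petal.Length", "Petal.Width", "Sepal.Length"))

# One group per row of New; summ is added to DT by reference, with no split().
DT[New, summ := sum(get(FunCol)), on = "Species", by = .EACHI]

DT[, .(summ = unique(summ)), by = Species]
```

Since DT is never split into a list of tables, the .internal.selfref warning from the question should not arise.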
