Passing columns in a variable (dynamically) when aggregating in data.table [duplicate] - r

This question already has answers here:
In R data.table, how do I pass variable parameters to an expression?
(1 answer)
Select / assign to data.table when variable names are stored in a character vector
(6 answers)
How to use data.table within functions and loops?
(2 answers)
Closed 5 years ago.
I need to aggregate a data.table and create a table with counts, means and other statistics for several variables. The format of the output table should always be the same, but I need to group by different variables. How can I define the output columns and aggregate statistics once and reuse them for different by= choices?
# Create data.table
library(data.table)
DT <- data.table(iris)
# This works, but is long and needs to be updated in multiple
# places whenever I update the output format
DT[, list(theCount = .N,
          meanSepalWidth = mean(Sepal.Width),
          meanPetalWidth = mean(Petal.Width)),
   by = Species]
# This does not work. How could I achieve what I'm trying to do here?
col.list <- list(theCount=.N,
meanSepalWidth=mean(Sepal.Width),
meanPetalWidth=mean(Petal.Width))
DT[,col.list, by=Species]
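One possible approach, sketched here rather than taken from an answer in this thread: the attempt above fails because list(...) is evaluated immediately, outside the data.table, where .N and Sepal.Width have no meaning. Storing the expression unevaluated with quote() and evaluating it inside j with eval() defers the evaluation to each group. The names res and overall are illustrative only.

```r
library(data.table)

DT <- data.table(iris)

# Store the j expression unevaluated, so .N and the column names are
# only looked up later, inside the data.table
col.list <- quote(list(theCount = .N,
                       meanSepalWidth = mean(Sepal.Width),
                       meanPetalWidth = mean(Petal.Width)))

# Reuse the same output definition with different groupings
res <- DT[, eval(col.list), by = Species]
overall <- DT[, eval(col.list)]

print(res)
print(overall)
```

With this layout the output format is defined in one place, and only the by= argument changes between calls.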

Related

dplyr mutate, dynamic variable [duplicate]

This question already has answers here:
Use dynamic name for new column/variable in `dplyr`
(10 answers)
Use variable names in functions of dplyr
(9 answers)
Closed 11 months ago.
I am having difficulty using a dynamic variable in dplyr (specifically, using dplyr::mutate to pick out a specific column). There are quite a few posts online about dynamic variables in dplyr, but they have left me confused. The usual solution seems to be to write a function and then apply it in the dplyr pipe.
Is there a way to directly use a dynamic variable in a dplyr::mutate, either as a new column name and/or to reference a column that we want to apply the calculation to?
Below is an example (where I am trying to use two variables; newcol as the name of the new column I'm trying to create, and col as the name of the column I'm trying to apply the calculation to).
#libraries
library(dplyr)
#make dataframe and variables for name
testdf <- data.frame(cola=c(1,2,3),colb=c(4,5,6),colc=c(7,8,9))
col <- "cola"
newcol <- "newcolname"
#mutate: new column (with variable column name & variable column which I'm using to calculate)
testdf %>%
dplyr::mutate(newcol=col*5)
Thanks for your help!
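A minimal sketch of one way this could be done directly inside mutate, assuming a dplyr version that supports tidy evaluation (0.7 or later): the .data pronoun looks up a column by its string name, and !! together with := sets the new column's name from a variable. The name result is illustrative only.

```r
library(dplyr)

testdf <- data.frame(cola = c(1, 2, 3), colb = c(4, 5, 6), colc = c(7, 8, 9))
col <- "cola"
newcol <- "newcolname"

# .data[[col]] reads the column named by the string in `col`;
# !!newcol := ... names the new column after the string in `newcol`
result <- testdf %>%
  mutate(!!newcol := .data[[col]] * 5)

print(result)
```

No wrapper function is needed here; the same two constructs also work inside a function if the column names arrive as string arguments.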

How to change a vector name in a function? [duplicate]

This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 1 year ago.
I want to produce graphical outputs such as boxplots or graphs using a function, so that I can plot several data frames, changing only the column name each time.
For example:
boxplot_func = function(column){
boxplot(dataframe1$column, dataframe2$column)}
boxplot_func(mean)
boxplot_func(max)
etc.
But R doesn't seem to compute mean or max in the function. Do you know a way to do it?
One option would be to pass the column as a character string and use [[ to access the column in your function:
A simple example using mtcars:
boxplot_func = function(column) {
  boxplot(mtcars[[column]], mtcars[[column]])
}
boxplot_func("mpg")

Save in a for loop all variables even with NA [duplicate]

This question already has answers here:
Combine two data frames by rows (rbind) when they have different sets of columns
(14 answers)
Closed 1 year ago.
I am trying to run some API calls in a loop.
I have a data frame that accumulates the data from every iteration.
However, some iterations are missing a specific column.
Is there an easy way to save those rows with NA, without needing to know in each iteration which variable doesn't exist?
This is what I use to save the data:
dfall <- rbind(dfall, dfiteration)
Use dplyr::bind_rows which will automatically add NA for columns which are not present.
dfall <- dplyr::bind_rows(dfall, dfiteration)
We can also use rbindlist from data.table with fill = TRUE:
library(data.table)
dfall <- rbindlist(list(dfall, dfiteration), fill = TRUE)
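To see the fill behaviour on a small case, here is a sketch with made-up data: the contents of dfall and dfiteration below are hypothetical stand-ins for the API results, and the column names id and score are assumptions for illustration.

```r
library(data.table)

# Hypothetical accumulated results and one iteration lacking the `score` column
dfall <- data.frame(id = 1:2, score = c(0.5, 0.7))
dfiteration <- data.frame(id = 3L)

# fill = TRUE pads columns missing from any input with NA
combined <- rbindlist(list(dfall, dfiteration), fill = TRUE)

print(combined)
```

Note that rbindlist returns a data.table; wrapping the result in as.data.frame() converts it back if a plain data frame is needed.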

how to generate multiple columns with certain pattern in dplyr and purrr instead of using looping? [duplicate]

This question already has answers here:
Mutate multiple / consecutive columns (with dplyr or base R)
(5 answers)
Mutating multiple columns in a data frame using dplyr
(4 answers)
Closed 4 years ago.
I want to generate multiple new columns based on an existing variable in a data.frame (see the example data below). I can generate the variables with a loop using the script listed below. Is it possible to create them concisely with mutate in dplyr, for example by combining mutate with a function, an apply function, or a loop?
My requirement seems different from what is achieved in "Mutating multiple columns in a data frame using dplyr" and "Mutate multiple / consecutive columns (with dplyr or base R)", as I want to create dozens of new columns following a certain pattern based on one original variable.
data=data.frame(value=1:11,EV=seq(100,150,5))
q<-seq(-0.5,0.5,0.1)
data
EV=rep(NA,11)
for (i in 1:11){
  value=assign(paste0('EV',i),data[,'EV']*(1+q[i]))
  EV=cbind(EV,value)
}
EV=data.frame(EV[,2:12])
names(EV)<-paste0('EV',1:11)
EV
data.frame(data,EV)
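As a sketch of a loop-free alternative, the same columns can be built in one step with lapply over the multipliers. This uses base R rather than the dplyr/purrr approach the question asks about, and the names new_cols and result are illustrative only.

```r
data <- data.frame(value = 1:11, EV = seq(100, 150, 5))
q <- seq(-0.5, 0.5, 0.1)

# One scaled copy of data$EV per element of q, returned as a list of vectors
new_cols <- lapply(q, function(qi) data$EV * (1 + qi))
names(new_cols) <- paste0("EV", seq_along(q))

# Bind the original columns and the generated ones side by side
result <- cbind(data, as.data.frame(new_cols))

print(names(result))
```

Because the column names are derived from seq_along(q), changing the length of q changes the number of generated columns without further edits.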

data.table subsetting in i with column number [duplicate]

This question already has answers here:
data.table in r : subset using column index
(2 answers)
Closed 5 years ago.
Is it possible to subset a data.table in i, referencing the column not by its name (e.g. by number/position)?
Example:
library(data.table)
dt <- data.table(A=1:18, Name=c('A','B','C'))
dt2 <- data.table(A=2:19, Username=c('A','B','C'))
#stuff happens and eventually I end up with either dt or dt2 copied to a final dt
#depending on which is there, I want to get only "A"s
final[Name=='A']
final[Username=='A']
But I want a way that I can subset both data.tables with the same call despite the different column names. One potential solution is to set the key for each dt as Name and Username, then subset like this: final['A'] but I am wondering if there is another way.
I can't change the column names because they are going into a table in a shiny app.
If this is based on position, we can extract the column by its numeric index with [[, compare it to the value to get a logical vector, and subset the rows with that vector:
final[final[[2]]=="A"]
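A runnable sketch of the positional approach applied to both tables. The helper subset_by_position is hypothetical, introduced only for illustration, and dt2 uses A = 2:19 here so that the three-element name vector recycles evenly.

```r
library(data.table)

dt  <- data.table(A = 1:18, Name = c("A", "B", "C"))
dt2 <- data.table(A = 2:19, Username = c("A", "B", "C"))

# Hypothetical helper: keep rows where the column at position `pos` equals `value`
subset_by_position <- function(x, pos, value) {
  x[x[[pos]] == value]
}

# The same call works regardless of what the second column is named
r1 <- subset_by_position(dt, 2, "A")
r2 <- subset_by_position(dt2, 2, "A")

print(r1)
print(r2)
```

This relies on the column order being the same in both tables; if the order can vary, a positional index is no safer than a name.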
