This question already has answers here:
Use dynamic name for new column/variable in `dplyr`
(10 answers)
Use variable names in functions of dplyr
(9 answers)
Closed 11 months ago.
I am having difficulties using a dynamic variable in dplyr (specifically, using dplyr::mutate to pick out a specific column). There seem to be quite a few posts online about using dynamic variables in dplyr, but they have left me confused. It seems that to use a dynamic variable, you need to write a function and then apply it in your dplyr pipe.
Is there a way to use a dynamic variable directly in dplyr::mutate, either as a new column name and/or to reference a column that we want to apply the calculation to?
Below is an example, where I am trying to use two variables: newcol as the name of the new column I'm trying to create, and col as the name of the column I'm trying to apply the calculation to.
#libraries
library(dplyr)
#make dataframe and variables for name
testdf <- data.frame(cola=c(1,2,3),colb=c(4,5,6),colc=c(7,8,9))
col <- "cola"
newcol <- "newcolname"
#mutate: new column (with variable column name & variable column which I'm using to calculate)
testdf %>%
dplyr::mutate(newcol=col*5)
Thanks for your help!
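Here is a minimal sketch of one way to do this directly in the pipe, without writing a function. It assumes dplyr 0.7 or later. The `.data` pronoun looks up a column whose name is stored in a string, and `!!` together with `:=` lets a string supply the name of the new column.

```r
library(dplyr)

testdf <- data.frame(cola = c(1, 2, 3), colb = c(4, 5, 6), colc = c(7, 8, 9))
col <- "cola"
newcol <- "newcolname"

# .data[[col]] refers to the column whose name is stored in `col`;
# !!newcol := uses the string stored in `newcol` as the new column name
result <- testdf %>%
  dplyr::mutate(!!newcol := .data[[col]] * 5)

print(result)
```

With dplyr 1.0 or later, the glue-style form `"{newcol}" := .data[[col]] * 5` should work as well.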
This question already has answers here:
Opposite of %in%: exclude rows with values specified in a vector
(13 answers)
Closed last year.
How can I filter out (exclude) rows based on the values in a single column called "record"? I would like to exclude rows where record is one of 1, 2, 3, 6, 8, 10, 15, 16. The dataset name is "sample". Sorry for the simple question; I am brand new to R.
sample data set below
The dplyr package from the tidyverse is very helpful for this type of problem:
library(dplyr)
# keep only the rows whose record value is NOT in the vector
df_filtered <- sample %>%
  filter(!(record %in% c(1, 2, 3, 6, 8, 10, 15, 16)))
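For comparison, the same filter can be written in base R with no packages. This is a sketch that uses a small made-up data frame (`sample_df`, hypothetical) in place of the poster's `sample` data, which was not shown.

```r
# Hypothetical stand-in for the poster's `sample` data set
sample_df <- data.frame(record = 1:20, value = seq(10, 200, by = 10))

excluded <- c(1, 2, 3, 6, 8, 10, 15, 16)

# %in% gives TRUE for rows to drop; ! flips it to keep everything else
filtered <- sample_df[!(sample_df$record %in% excluded), ]

print(filtered$record)
```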
This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 1 year ago.
I want to produce graphical output, such as boxplots or other graphs, using a function, so that I can plot several data frames while changing only the column name each time.
For example :
boxplot_func = function(column){
boxplot(dataframe1$column, dataframe2$column)}
boxplot_func(mean)
boxplot_func(max)
etc.
But R doesn't seem to use the mean or max column inside the function. Do you know a way to do it?
One option would be to pass the column as a character string and use [[ to access the column in your function:
A simple example using mtcars:
boxplot_func = function(column) {
boxplot(mtcars[[column]], mtcars[[column]])
}
boxplot_func("mpg")
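Applied to the two data frames from the question, a sketch might look like the following. `dataframe1` and `dataframe2` are filled with made-up values here, and the extra `plot` argument is an assumption added so the summary statistics can be inspected without drawing. The original version fails because `$` does not evaluate its right-hand side: `dataframe1$column` looks for a column literally named "column".

```r
# Hypothetical data frames standing in for the poster's dataframe1 / dataframe2
dataframe1 <- data.frame(mean = c(1, 2, 3, 4, 5), max = c(2, 4, 6, 8, 10))
dataframe2 <- data.frame(mean = c(2, 3, 4, 5, 6), max = c(3, 6, 9, 12, 15))

# `column` is a string; [[ ]] looks up the column with that name in each data frame
boxplot_func <- function(column, plot = TRUE) {
  boxplot(dataframe1[[column]], dataframe2[[column]],
          names = c("dataframe1", "dataframe2"),
          main = column, plot = plot)
}

# plot = FALSE returns the boxplot statistics instead of drawing
bp <- boxplot_func("mean", plot = FALSE)
```

Calling `boxplot_func("mean")` or `boxplot_func("max")` then draws one boxplot per data frame for that column.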
This question already has answers here:
Mutate multiple / consecutive columns (with dplyr or base R)
(5 answers)
Mutating multiple columns in a data frame using dplyr
(4 answers)
Closed 4 years ago.
I want to generate multiple new columns based on an existing variable in the data.frame (e.g. data here). I can generate the variables with a loop using the script listed below, but is it possible to create them concisely with the mutate command in dplyr? For example, by using mutate together with a function, an apply function, or a loop?
The requirement seems different from what has been achieved in
Mutating multiple columns in a data frame using dplyr and Mutate multiple / consecutive columns (with dplyr or base R), as I want to create dozens of new columns following a certain pattern based on one original variable.
data=data.frame(value=1:11,EV=seq(100,150,5))
q<-seq(-0.5,0.5,0.1)
data
EV=rep(NA,11)
for (i in 1:11){
value=assign(paste0('EV',i),data[,'EV']*(1+q[i]))
EV=cbind(EV,value)
}
EV=data.frame(EV[,2:12])
names(EV)<-paste0('EV',1:11)
EV
data.frame(data,EV)
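One possible sketch with mutate, assuming dplyr 0.7 or later: build a named list with one vector per multiplier in `q`, then splice it into `mutate()` with `!!!`, so all eleven columns are added in a single call. Unlike the loop, this does not create the EV1 ... EV11 objects in the global environment via `assign()`.

```r
library(dplyr)

data <- data.frame(value = 1:11, EV = seq(100, 150, 5))
q <- seq(-0.5, 0.5, 0.1)

# One vector per multiplier in q, named EV1 ... EV11
new_cols <- setNames(
  lapply(q, function(m) data$EV * (1 + m)),
  paste0("EV", seq_along(q))
)

# !!! splices the named list into mutate(), adding every column at once
result <- data %>% mutate(!!!new_cols)

print(result)
```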
This question already has answers here:
In R data.table, how do I pass variable parameters to an expression?
(1 answer)
Select / assign to data.table when variable names are stored in a character vector
(6 answers)
How to use data.table within functions and loops?
(2 answers)
Closed 5 years ago.
I need to aggregate a data.table and create a table with counts, means, and other statistics for several variables. The format of the output table should always be the same, but I need to aggregate by various groupings. How can I define the output columns and aggregate statistics once and reuse them for different by= choices?
# Create data.table
library(data.table)
DT <- data.table(iris)
# This works, but is long and needs to be updated in multiple
# places whenever I update the output format
DT[,list(theCount=.N,
meanSepalWidth=mean(Sepal.Width),
meanPetalWidth=mean(Petal.Width)),
by=Species]
# This does not work. How could I achieve what I'm trying to do here?
col.list <- list(theCount=.N,
meanSepalWidth=mean(Sepal.Width),
meanPetalWidth=mean(Petal.Width))
DT[,col.list, by=Species]
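A commonly used approach is to store the j-expression unevaluated with `quote()` and evaluate it inside the brackets with `eval()`. Building `col.list` with `list(...)` fails because `.N`, `Sepal.Width`, and `Petal.Width` are evaluated immediately, outside the data.table. The sketch below reuses one stored expression for two groupings; the second grouping (`longSepal`) is a made-up example.

```r
library(data.table)

DT <- data.table(iris)

# quote() keeps the expression unevaluated, so .N and the column names
# are only resolved inside DT[...] when eval() runs
col.list <- quote(list(theCount = .N,
                       meanSepalWidth = mean(Sepal.Width),
                       meanPetalWidth = mean(Petal.Width)))

by_species <- DT[, eval(col.list), by = Species]

# The same stored expression reused with a different (hypothetical) grouping
by_long_sepal <- DT[, eval(col.list), by = .(longSepal = Sepal.Length > 5.8)]

print(by_species)
```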
This question already has answers here:
dplyr::group_by_ with character string input of several variable names
(2 answers)
Group by multiple columns in dplyr, using string vector input
(10 answers)
Closed 5 years ago.
My code needs to group by column names. The problem is that the code adds or removes columns in the data.frame automatically, so typing the column names by hand is not a good solution.
Is there a workaround for this problem? I tried solutions like this, but obviously it doesn't work. In addition, the data frame stretches to over 100 columns.
myDataFrame1 <- myDataFrame %>% group_by( colnames(myDataFrame) )
How can I pass the column names into group_by() automatically?
Thanks for the help!
We can use group_by_() when there are many columns (note that group_by_() and the other underscore verbs have been deprecated since dplyr 0.7.0). Suppose we want the first three columns as the grouping variables:
library(dplyr)
myDataFrame %>%
group_by_(.dots = names(myDataFrame)[1:3])
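In current dplyr (1.0 or later), a non-deprecated sketch of the same idea uses `across(all_of(...))` to select grouping columns from a character vector. The contents of `myDataFrame` below are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for myDataFrame
myDataFrame <- data.frame(
  a = c("x", "x", "y", "y"),
  b = c(1, 1, 2, 2),
  c = c(TRUE, TRUE, FALSE, TRUE),
  value = c(10, 20, 30, 40)
)

group_cols <- names(myDataFrame)[1:3]

# all_of() selects the columns named in the character vector
grouped <- myDataFrame %>% group_by(across(all_of(group_cols)))

result <- grouped %>%
  summarise(total = sum(value), .groups = "drop")

print(result)
```

To group by every column, as in the attempted `group_by(colnames(myDataFrame))`, `group_by(across(everything()))` should work.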