This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 1 year ago.
I want to produce graphical output such as boxplots or other graphs using a function, so that I can plot several data frames, changing only the column name each time.
For example:
boxplot_func = function(column){
boxplot(dataframe1$column, dataframe2$column)}
boxplot_func(mean)
boxplot_func(max)
etc.
But R doesn't seem to pick up the mean or max column inside the function. Is there a way to do this?
One option would be to pass the column as a character string and use [[ to access the column in your function:
A simple example using mtcars:
boxplot_func = function(column) {
boxplot(mtcars[[column]], mtcars[[column]])
}
boxplot_func("mpg")
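The same idea extends to comparing one column across two data frames, as in the question. The sketch below is only an illustration: dataframe1 and dataframe2 are made-up stand-ins built from subsets of mtcars, and passing them as default arguments is my own assumption rather than anything from the original code.

```r
# Hypothetical stand-ins for the asker's two data frames
dataframe1 <- mtcars[mtcars$am == 0, ]  # automatic transmission
dataframe2 <- mtcars[mtcars$am == 1, ]  # manual transmission

# `column` is a character string; [[ looks the column up by that string
boxplot_func <- function(column, df1 = dataframe1, df2 = dataframe2) {
  boxplot(df1[[column]], df2[[column]],
          names = c("dataframe1", "dataframe2"),
          main = column)
}

boxplot_func("mpg")
boxplot_func("hp")
```

Because the column name is an ordinary string, it can also come from a loop or a vector of names, e.g. for (col in c("mpg", "hp")) boxplot_func(col).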
This question already has answers here:
Create a numeric vector with names in one statement?
(6 answers)
Closed 10 months ago.
How can I assign values to strings when the data is very large?
Currently I assign values to the elements of a character vector manually, as illustrated below. When the amount of data is very large, doing this by hand becomes tedious. Is there a function that does it for me?
c("a" = 100, "b" = 200, "c"=300, ..., "aaaaaa" = n)
Is there any particular meaning to your numbering? If you just need numerical values for strings, you can use as.factor and as.numeric as outlined in this post:
R: Encode character variables into numeric
However, if you need a specific encoding, you will have to specify the associated levels and labels yourself. There isn't enough information in your question to help with this further, but the documentation is here:
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/factor
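As a rough sketch of the two cases above, and of building a named vector without typing each pair by hand: the vectors x, keys and values below are invented for illustration, and setNames is my own suggestion rather than something named in the answer.

```r
x <- c("b", "a", "c", "a")

# Default encoding: levels are sorted, so a = 1, b = 2, c = 3
codes <- as.numeric(as.factor(x))
codes
# 2 1 3 1

# Specific encoding: spell out the level order yourself
custom <- as.numeric(factor(x, levels = c("c", "b", "a")))
custom
# 2 3 1 3

# Named numeric vector built from two existing vectors
keys <- c("a", "b", "c")
values <- c(100, 200, 300)
named <- setNames(values, keys)
named[["b"]]
# 200
```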
This question already has answers here:
Use dynamic name for new column/variable in `dplyr`
(10 answers)
Use variable names in functions of dplyr
(9 answers)
Closed 11 months ago.
I am having difficulty using a dynamic variable in dplyr (specifically, using dplyr::mutate to pick out a specific column). There are quite a few posts online about dynamic variables in dplyr, but they have left me confused. It seems that to use a dynamic variable you need to write a function and then apply it in your dplyr pipe.
Is there a way to directly use a dynamic variable in a dplyr::mutate, either as a new column name and/or to reference a column that we want to apply the calculation to?
Below is an example (where I am trying to use two variables; newcol as the name of the new column I'm trying to create, and col as the name of the column I'm trying to apply the calculation to).
#libraries
library(dplyr)
#make dataframe and variables for name
testdf <- data.frame(cola=c(1,2,3),colb=c(4,5,6),colc=c(7,8,9))
col <- "cola"
newcol <- "newcolname"
#mutate: new column (with variable column name & variable column which I'm using to calculate)
testdf %>%
dplyr::mutate(newcol=col*5)
Thanks for your help!
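One way this is commonly done directly inside mutate, without writing a wrapper function, is sketched below. It assumes a reasonably recent dplyr (1.0 or later, which supports the "{name}" := syntax); result is a name I introduced for illustration.

```r
library(dplyr)

testdf <- data.frame(cola = c(1, 2, 3), colb = c(4, 5, 6), colc = c(7, 8, 9))
col <- "cola"
newcol <- "newcolname"

# .data[[col]] looks up the column named by the string in `col`;
# "{newcol}" := ... uses the string in `newcol` as the new column's name
result <- testdf %>%
  dplyr::mutate("{newcol}" := .data[[col]] * 5)

result
```

With older versions, `!!newcol := .data[[col]] * 5` is the usual equivalent spelling.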
This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 2 years ago.
I have a dataset called df, which has columns a and b with three integers each. I want to write a function for the mean (obviously this already exists; I want to write a larger function and this appears to be where problems are occurring). However, this function returns NA:
mean_function <- function(x) {
mean(df$x)
}
mean_function(a) returns NA, while mean(df$a) returns 2. Is there something I'm missing about how R functions handle datasets, or another problem?
We need [[ instead of $, because df$x looks literally for a column named x rather than using the value of x. With [[, pass the column name as a string:
mean_function <- function(x) {mean(df[[x]])}
mean_function("a")
If we need to pass an unquoted column name, capture it with substitute and convert it to a character string with deparse:
mean_function <- function(x) {
x <- deparse(substitute(x))
mean(df[[x]])
}
mean_function(a)
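For a version that can be run as-is, here is a small sketch with a made-up df matching the question's description. Passing the data frame as an argument, and the names mean_by_name and mean_by_symbol, are my own choices for illustration.

```r
df <- data.frame(a = 1:3, b = 4:6)

# String version: `col` holds the column name
mean_by_name <- function(data, col) {
  mean(data[[col]])
}

# Unquoted version: capture the symbol the caller typed and turn it into a string
mean_by_symbol <- function(data, col) {
  col <- deparse(substitute(col))
  mean(data[[col]])
}

mean_by_name(df, "a")
# 2
mean_by_symbol(df, b)
# 5
```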
This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 4 years ago.
I have a data frame with column names 1990.x, ..., 2000.x and 1990.y, ..., 2000.y. I want to replace the NAs in the variables ending in ".x" with values computed from the ".y" variable of the corresponding year. The computation is element by element, using a formula such as 1990.x = 0.5+0.2*log(1990.y)
I wanted to do something like this:
for (v in colnames(df[ ,grepl(".x",names(df))])) {
print(v)
df$v <- ifelse(is.na(df$v), ols$coefficients[1]+ols$coefficients[2]*log(df$gsub(".x",".y",v)), df$v)
}
but this does not work. How can I make this loop work, or is there a better solution?
Thanks
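The $ operator is a convenience that takes its right-hand side literally, so df$v looks for a column named v rather than the column whose name is stored in v. When the column name changes on each iteration, as in your for loop, use the [ operator (square brackets) instead. The same applies to the ".y" column, which also has to be looked up with brackets; anchoring the pattern as "\\.x$" keeps it from matching anything other than a literal ".x" suffix:
df[, v] <- ifelse(is.na(df[, v]), ols$coefficients[1] + ols$coefficients[2] * log(df[, gsub("\\.x$", ".y", v)]), df[, v])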
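A self-contained sketch of the whole loop on toy data follows. The data frame, its values and the coefs vector (a stand-in for ols$coefficients) are all invented for illustration, and [[ is used in place of [, ] purely as a matter of taste.

```r
# Toy data; the values and coefficients are made up for illustration
df <- data.frame(
  `1990.x` = c(NA, 2),
  `1991.x` = c(3, NA),
  `1990.y` = c(1, 10),
  `1991.y` = c(5, exp(1)),
  check.names = FALSE
)
coefs <- c(0.5, 0.2)  # stand-in for ols$coefficients

for (v in grep("\\.x$", names(df), value = TRUE)) {
  y_col <- sub("\\.x$", ".y", v)                     # matching ".y" column name
  fitted <- coefs[1] + coefs[2] * log(df[[y_col]])   # element-by-element formula
  df[[v]] <- ifelse(is.na(df[[v]]), fitted, df[[v]]) # fill only the NAs
}

df
```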
This question already has answers here:
define $ right parameter with a variable in R [duplicate]
(3 answers)
Closed 7 years ago.
I'm trying to filter through a list of 100 lists with four columns of data to pull out individual columns and operate on them.
The columns are: Date/Time, Measurement 1, Measurement 2, Identity Variable
filepull <- list of 100 lists
column_name <- "foo"
meanoflist <- NULL
for (i in 1:100) {
holder_variable<-filepull[[i]]
meanoflist[i]<-mean(na.omit(holder_variable$column_name))
}
holder_variable$"foo" gives me what I need, but holder_variable$column_name gives me NULL. What gives?
Thx for the help!
When you use the $ operator, its right-hand side doesn't get evaluated; it is used as-is. So holder_variable$column_name actually tries to get the column literally named column_name. If you want the column whose name is stored in a variable, use holder_variable[, column_name] (assuming holder_variable is a data.frame, for instance) or holder_variable[[column_name]], which works for both lists and data frames.
Take a look at this example, to better understand the difference.
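A minimal illustration of the difference, using a made-up two-element filepull in place of the asker's 100 lists (the data values and the vapply rewrite of the loop are my own assumptions):

```r
# Hypothetical stand-in for the list of 100 data sets
filepull <- list(
  data.frame(foo = c(1, NA, 3)),
  data.frame(foo = c(10, 20, NA))
)
column_name <- "foo"

first <- filepull[[1]]
first$column_name      # NULL: there is no column literally called "column_name"
first[[column_name]]   # the foo column, looked up via the string in column_name

# Mean of the chosen column in every element, ignoring NAs
meanoflist <- vapply(filepull,
                     function(x) mean(x[[column_name]], na.rm = TRUE),
                     numeric(1))
meanoflist
# 2 15
```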