I would like to manually define the name of the object/output of a function. A very simple example of what I have is:
x <- data.frame(name = c("A", "B", "C"),
value = c(50, 20, 100))
statistics <- function(data, name){
total <- data %>% mutate(New = value +50)
assign(paste0(name), data)
}
statistics(x, "NewName")
I would like to run this function and define what data to use and the name of the output. The idea is to create a uniquely named output for each dataset used.
Thanks!
One way is to return the data from the function and assign the result with <-, instead of calling assign() inside the function. Alternatively, you can use assign(), but you have to specify the envir in which the object should be created. If envir is left out, the object is created in the function's own environment and disappears when the function returns.
A word of caution on using assign() in a function: it will silently overwrite any object in the global environment that has the same name, so be careful with your object names.
x <- data.frame(name = c("A", "B", "C"),
value = c(50, 20, 100))
statistics <- function(data){
data %>% mutate(New = value +50)
}
Newname <- statistics(x)
statistics2 <- function(data, name){
total <- data %>% mutate(New = value +50)
assign(paste0(name), total, envir = .GlobalEnv)
}
statistics2(x, "NewName2")
A slight aside: in your code, the assign() call should be given total, not data.
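If the goal is one uniquely named output per dataset, a named list can hold all of them without creating global objects from inside a function. The following is only a sketch: the second dataset x2 and the list names are hypothetical, and the mutate() step is replaced by a base R equivalent so that the example has no dependencies.

```r
# Base R stand-in for data %>% mutate(New = value + 50)
statistics <- function(data) {
  data$New <- data$value + 50
  data
}

x <- data.frame(name = c("A", "B", "C"), value = c(50, 20, 100))
x2 <- data.frame(name = c("D", "E"), value = c(1, 2))  # hypothetical second dataset

# One named list element per dataset; the names play the role of "NewName"
results <- lapply(list(NewName = x, NewName2 = x2), statistics)

results$NewName
```

Each result is then reached by name (results$NewName, results[["NewName2"]]), and nothing in the global environment is overwritten by accident.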
Say I have these data:
data <- data.frame(a = c(1,2,3))
And a simple function that creates a data frame. It also creates a new variable based on a simple function; this new variable takes the name that is passed in with varname. Here is my attempt (the assign line is wrong):
fun <- function(varname) {
data <- data.frame(a = c(1,2,3))
assign(paste0("data$", varname), sqrt(data$a))
data
}
fun("newvar")
Base R or tidyverse solutions are both great.
You were close! There are multiple ways of subsetting data frames, including the [[ ]] notation (e.g., data[["var"]]). Simply assigning a value to a column that does not exist yet creates that column.
fun <- function(varname) {
data <- data.frame(a = c(1,2,3))
data[[varname]] <- sqrt(data$a)
data
}
fun("newvar")
tidyverse
If you want to pass the variable name as a string then a tidyverse method would be:
library(dplyr)
fun <- function(varname) {
data.frame(a = c(1,2,3)) %>%
mutate(!! varname := sqrt(a))
}
fun("newvar")
Alternatively, you could use tidyeval so you don't have to quote the variable name:
library(dplyr)
fun <- function(varname) {
varname <- rlang::enquo(varname)
data.frame(a = c(1,2,3)) %>%
mutate(!! varname := sqrt(a))
}
fun(newvar)
base R
If you want to use base R, I would recommend the solution posted by @Noah, but another, fairly obscure, base R option would be:
fun <- function(varname) {
data.frame(a = c(1,2,3)) |>
within(eval(substitute(x <- sqrt(a), list(x = as.name(varname)))))
}
fun("newvar")
I'm trying to make my code general: I only want to change the YEAR variable, without having to change everything else in the code.
YEAR = 1970
y <- data.frame(col1 = c(1:5))
f <- function(y) {
  summarize(y, column_YEAR = sum(col1))
}
f(y)
#Right now this gives
column_YEAR
1 15
#I would like this function to output this (so col1 is changed to column_1970)
column_1970
1 15
or for example this
df <- list("a_YEAR" = anotherdf)
#I would like to have a list with a df with the name a_1970
I tried things like
df <- list(assign(paste0(a_, YEAR), anotherdf))
But it does not work, does somebody have any advice? Thanks in advance :)
rlang provides a flexible way to defuse R expressions. You can use that functionality to create dynamic column names within a dplyr pipeline. In this example, the dynamic column name is built from the suffix argument passed to a wrapper function around dplyr's summarise().
library("tidyverse")
YEAR = 1970
y <- data.frame(col1 = c(1:5))
function (y) {
summarize(column_YEAR = sum(col1))
}
my_summarise <- function(.data, suffix, sum_col) {
var_name <- paste0("column_", suffix)
summarise(.data,
{{var_name}} := sum({{sum_col}}))
}
my_summarise(.data = y, suffix = YEAR, sum_col = col1)
Results
my_summarise(.data = y, suffix = YEAR, sum_col = col1)
# column_1970
# 1 15
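The same column name can probably be built without the intermediate var_name variable by using glue-style names on the left-hand side of :=. This is a sketch that assumes dplyr 1.0.0 or later; my_summarise_glue is a hypothetical name used only for illustration.

```r
library(dplyr)

# Hypothetical variant of my_summarise(); "{suffix}" is interpolated into the name
my_summarise_glue <- function(.data, suffix, sum_col) {
  summarise(.data, "column_{suffix}" := sum({{ sum_col }}))
}

y <- data.frame(col1 = 1:5)
res <- my_summarise_glue(y, suffix = 1970, sum_col = col1)
res
```

Single braces interpolate an ordinary value such as suffix, while double braces are reserved for column arguments such as sum_col.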
You can also read the value directly from the global environment, but this is a poorer solution for readability, as it is not immediately clear where the function gets the suffix from.
my_summarise_two <- function(.data, sum_col) {
var_name <- paste0("column_", YEAR)
summarise(.data,
{{var_name}} := sum({{sum_col}}))
}
my_summarise_two(.data = y, sum_col = col1)
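For the second part of the question (a list element named a_1970), assign() is not needed: a list can be given a computed name with setNames() or with [[<-. A minimal base R sketch, where anotherdf is a hypothetical stand-in for the data frame from the question:

```r
YEAR <- 1970
anotherdf <- data.frame(col1 = 1:5)  # hypothetical stand-in

# Build the element name from YEAR, then attach it to a one-element list
df <- setNames(list(anotherdf), paste0("a_", YEAR))

# Equivalent: add an element to an existing list under a computed name
df2 <- list()
df2[[paste0("a_", YEAR)]] <- anotherdf

names(df)
```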
I am trying to create a function that performs several statistical tests on specific columns in a dataframe. Some of the tests require more than one level. I would like to test how many levels are in a specific column, but can't seem to get it right.
In my actual code this section would be followed by an ifelse that returns a string saying 'only one level' if single, or continues to the statistical test if > 1.
require("dplyr")
df <- data.frame(A = c("a", "b", "c"), B = c("a", "a", "a"), C = c("a", "b", "b")) %>%
mutate(A = factor(A)) %>%
mutate(B = factor(B)) %>%
mutate(C = factor(C))
my_funct <- function(data_f, column){
n_fact <- paste("data_f", column, sep = "$")
n_levels <- do.call("nlevels",
list(x = as.name(n_fact)))
print(n_levels)
}
Then I call my function with the dataframe and column
my_funct(df, "A")
I get the following error:
Error in levels(x) : object 'data_f$A' not found
If I remove the as.name() wrapper it returns a value of 0.
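One reason your code is not working is that data_f$A is not the name of any object available to the function; as.name() turns the whole string into a single symbol instead of an expression that extracts column A.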
But I would recommend you don't even try to parse code as strings. It's the wrong way to do it. All you need is double bracket indexing [[. So the body of your function can be the following single line:
nlevels(data_f[[column]])
And for all the columns:
sapply(data_f, nlevels)
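Put together with the check described in the question, the function might look like the sketch below. The returned message and the placeholder where the statistical test would go are assumptions, since the actual tests are not shown.

```r
df <- data.frame(
  A = factor(c("a", "b", "c")),
  B = factor(c("a", "a", "a")),
  C = factor(c("a", "b", "b"))
)

my_funct <- function(data_f, column) {
  n_levels <- nlevels(data_f[[column]])
  if (n_levels < 2) {
    return("only one level")
  }
  # Placeholder: the real function would run its statistical test here
  n_levels
}

my_funct(df, "A")
my_funct(df, "B")
```

A plain if is used rather than ifelse() because the condition is a single value, not a vector.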
I am a novice R programmer. I am wondering how to lapply over a data frame while skipping certain columns.
# Some dummy dataframe
df <- data.frame(
grp = c("A", "B", "C", "D"),
trial = as.factor(c(1,1,2,2)),
mean = as.factor(c(44,33,22,11)),
sd = as.factor(c(3,4,1,.5)))
df <- lapply(df, function (x) {as.numeric(as.character(x))})
However, the method I used introduces NAs by coercion.
Would there be a way to selectively (or deselectively) lapply over the data frame while maintaining its integrity?
In other words, would there be a way to convert only mean and sd to numerics? (In general form)
Thank you
Try doing this:
df[,3:4] <- lapply(df[,3:4], function (x) {as.numeric(as.character(x))})
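You are simply applying the function to the specified columns. You can also use a condition to select a subset of your columns, for example by excluding the ones you don't want to convert. Use %in% for this rather than !=, which compares element by element with recycling and only gives the right answer by coincidence of column order.
col = names(df)[!names(df) %in% c("grp","trial")]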
df[,col] <- lapply(df[,col], function (x) {as.numeric(as.character(x))})
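When several columns are excluded by name, the difference between a set-membership test and != matters as soon as the column order changes. A small sketch with a hypothetical column order shows what each expression selects:

```r
nms <- c("trial", "grp", "mean", "sd")  # hypothetical column order
drop <- c("grp", "trial")

# != compares position by position, recycling drop to the length of nms
kept_neq <- nms[nms != drop]

# %in% and setdiff() test membership, independent of order
kept_in <- nms[!nms %in% drop]
kept_setdiff <- setdiff(nms, drop)

kept_neq
kept_in
```

Here kept_neq still contains all four names, because no element happens to equal the recycled element at the same position, whereas kept_in and kept_setdiff contain only mean and sd.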
Well as you might have guessed, there are many ways. Since you seem to be doing in place substitution, actually, a for loop would be suitable.
df <- data.frame(
grp = c("A", "B", "C", "D"),
trial = as.factor(c(1,1,2,2)),
mean = as.factor(c(44,33,22,11)),
sd = as.factor(c(3,4,1,.5)))
my_cols <- c("trial", "mean", "sd")
for(mc in my_cols) {
df[[mc]] <- as.numeric(as.character(df[[mc]]))
}
If you want to convert selectively by column names:
library(dplyr)
df %>%
mutate_if(names(.) %in% c("mean", "sd"),
function(x) as.numeric(as.character(x)))
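In more recent dplyr versions the scoped verbs such as mutate_if() are superseded by across(). A sketch of the same conversion, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

df <- data.frame(
  grp = c("A", "B", "C", "D"),
  trial = as.factor(c(1, 1, 2, 2)),
  mean = as.factor(c(44, 33, 22, 11)),
  sd = as.factor(c(3, 4, 1, .5))
)

# all_of() selects columns from a character vector of names
out <- df %>%
  mutate(across(all_of(c("mean", "sd")), ~ as.numeric(as.character(.x))))

str(out)
```

Only mean and sd are converted; grp and trial keep their original types.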
I am trying to create a data.frame and then add some columns to it.
I tried the following code, but it does not work:
test.dim <- as.data.frame(matrix(nrow=0, ncol=4))
names <- c("A", "B", "C", "D")
colnames(test.dim) <- names
for (i in 1:4) {
name = names[i]
# do some calculations, at last get another data.frame named x.data
mean.data <- apply(x.data, 1, mean)
test.dim[, name] <- mean.data
}
Usually one would already have a data.frame (call it df) and simply add columns by calling df$newColName = values or df[,newColNames] = frame_of_values.
Your question indicates that you are separating the creation of your values from putting them in the data frame (which I do not recommend). But if you really want to start from an empty (zero-row) frame with predefined column names, here are some options:
colnamesToAdd = LETTERS[1:4]
test.dim = data.frame( matrix(NA, nrow=1, ncol=length(colnamesToAdd)) )
colnames(test.dim) = colnamesToAdd
test.dim = test.dim[-1,]
Another option:
colnamesToAdd = LETTERS[1:4]
test.dim = data.frame("USELESS" = NA)
test.dim[,colnamesToAdd] = NA
test.dim = test.dim[-1,-1]
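The loop in the question most likely fails because a vector of row means cannot be assigned into a frame that has zero rows. One way to avoid the empty frame altogether is to compute each column first and build the data.frame at the end. This is only a sketch: the calculation that produces x.data is not shown in the question, so a random matrix is used as a hypothetical stand-in.

```r
col_names <- c("A", "B", "C", "D")

set.seed(1)
col_list <- lapply(col_names, function(nm) {
  # Hypothetical stand-in for the calculations that produce x.data
  x.data <- matrix(runif(15), nrow = 5)
  rowMeans(x.data)  # same result as apply(x.data, 1, mean)
})

# Name the list elements, then convert the list to a data.frame in one step
test.dim <- as.data.frame(setNames(col_list, col_names))
dim(test.dim)
```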
If you are looking to add a mean to your table and repeat it for every factor:
library(data.table);
test.dim = data.table("FACTOR" = sample(letters[1:4],100,replace=TRUE), "VALUE" = runif(100), "MEAN" = NA)
means = test.dim[,list(AVG=mean(VALUE)),by="FACTOR"]
# without data.table: by(test.dim$VALUE, test.dim$FACTOR, mean)
# normally I would use the foreach package instead of this last for loop
for(x in 1:nrow(means)) {
  test.dim$MEAN[test.dim$FACTOR==means$FACTOR[x]] = means$AVG[x]
}
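The group mean can also be repeated for every row without an explicit loop. The sketch below uses base R's ave() on a plain data.frame with the same column names; the seed is an arbitrary choice for reproducibility.

```r
set.seed(42)
test.dim <- data.frame(
  FACTOR = sample(letters[1:4], 100, replace = TRUE),
  VALUE = runif(100)
)

# ave() returns a vector as long as VALUE, holding each row's group mean
test.dim$MEAN <- ave(test.dim$VALUE, test.dim$FACTOR, FUN = mean)

# With data.table, the grouped assignment is usually written as
# test.dim[, MEAN := mean(VALUE), by = FACTOR]
head(test.dim)
```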