Piping variable name into dataframe columns - r

I've been trying to create a function that can pipe the name of a variable into the names of columns for a data.frame that will be created in that function.
For example:
#Create variable
Var1 <- c(1,1,2,2,3,3)
#Create function that dummy codes the variable and renames them
rename_dummies <- function(x){
m <- model.matrix(~factor(x))
colnames(m)[1] <- "Dummy1"
colnames(m)[2] <- "Dummy2"
colnames(m)[3] <- "Dummy3"
m <<- data.frame(m)
}
rename_dummies(Var1)
Now, what can I add to this function that that "Var1" is automatically placed in front of "Dummy" in each of the variable names? Ideally I would end up with 3 variables that look like this...
> names(m)
[1] "Var1_Dummy1" "Var1_Dummy2" "Var1_Dummy3"

Try the below code. The key is in deparse(substitute). I also modified your function to not use the global assignment operator <<-, which is poor practice.
Var1 <- c(1,1,2,2,3,3)
#Create function that dummy codes the variable and renames them
rename_dummies <- function(x){
nm = deparse(substitute(x))
m <- model.matrix(~factor(x))
colnames(m)[1] <- "Dummy1"
colnames(m)[2] <- "Dummy2"
colnames(m)[3] <- "Dummy3"
m <- data.frame(m)
names(m) <- paste(nm, names(m), sep = "_")
m
}
rename_dummies(Var1)

Related

undefined columns selected and cannot xtfrm data frame error

I am trying to write a code that checks for outliers based on IQR and change those respective values to "NA". So I wrote this:
dt <- rnorm(200)
dg <- rnorm(200)
dh <- rnorm(200)
l <- c(1,3) #List of relevant columns
df <- data.frame(dt,dg,dh)
To check if the column contains any outliers and change their value to NA:
vector.is.empty <- function(x) return(length(x) ==0)
#Checks for empty values in vector and returns booleans.
for (i in 1:length(l)){
IDX <- l[i]
BP <- boxplot.stats(df[IDX])
OutIDX <- which(df[IDX] %in% BP$out)
if (vector.is.empty(OutIDX)==FALSE){
for (u in 1:length(OutIDX)){
IDX2 <- OutIDX[u]
df[IDX2,IDX] <- NA
}
}
}
So, when I run this code, I get these error messages:
I've tried to search online for any good answers. but I'm not sure why they claim that the column is unspecified. Any clues here?
I would do something like that in order to replace the outliers:
# Set a seed (to make the example reproducible)
set.seed(31415)
# Generate the data.frame
df <- data.frame(dt = rnorm(100), dg = rnorm(100), dh = rnorm(100))
# A list to save the result of boxplot.stats()
l <- list()
for (i in 1:ncol(df)){
l[[i]] <- boxplot.stats(df[,i])
df[which(df[,i]==l[[i]]$out),i] <- NA
}
# Which values have been replaced?
lapply(l, function(x) x$out)

split list into atomic character vectors by name [duplicate]

A post on here a day back has me wondering how to assign values to multiple objects in the global environment from within a function. This is my attempt using lapply (assign may be safer than <<- but I have never actually used it and am not familiar with it).
#fake data set
df <- data.frame(
x.2=rnorm(25),
y.2=rnorm(25),
g=rep(factor(LETTERS[1:5]), 5)
)
#split it into a list of data frames
LIST <- split(df, df$g)
#pre-allot 5 objects in R with class data.frame()
V <- W <- X <- Y <- Z <- data.frame()
#attempt to assign the data frames in the LIST to the objects just created
lapply(seq_along(LIST), function(x) c(V, W, X, Y, Z)[x] <<- LIST[[x]])
Please feel free to shorten any/all parts of my code to make this work (or work better/faster).
Update of 2018-10-10:
The most succinct way to carry out this specific task is to use list2env() like so:
## Create an example list of five data.frames
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
## Assign them to the global environment
list2env(LIST, envir = .GlobalEnv)
## Check that it worked
ls()
## [1] "A" "B" "C" "D" "df" "E" "LIST"
Original answer, demonstrating use of assign()
You're right that assign() is the right tool for the job. Its envir argument gives you precise control over where assignment takes place -- control that is not available with either <- or <<-.
So, for example, to assign the value of X to an object named NAME in the the global environment, you would do:
assign("NAME", X, envir = .GlobalEnv)
In your case:
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
NAMES <- c("V", "W", "X", "Y", "Z")
lapply(seq_along(LIST),
function(x) {
assign(NAMES[x], LIST[[x]], envir=.GlobalEnv)
}
)
ls()
[1] "df" "LIST" "NAMES" "V" "W" "X" "Y" "Z"
I think this question can have a nice crossover with this one: Can lists be created that name themselves based on input object names?
Say you want to do the same modification to a set of objects on the fly. But list2env() requires a named list, and you don't want to copy and paste them again. Borrowing the namedList function, and combining it with
Josh O'Brien anwser:
> namedList <- function(...) {
+ L <- list(...)
+ snm <- sapply(substitute(list(...)), deparse)[-1]
+ if (is.null(nm <- names(L))) nm <- snm
+ if (any(nonames <- nm=="")) nm[nonames] <- snm[nonames]
+ setNames(L ,nm)
+ }
>
> df_1 <- data.frame(x = 1)
> df_2 <- data.frame(x = 2)
> df_3 <- data.frame(x = 3)
>
> list2env(lapply(namedList(df_1, df_2, df_3), function(x) {
+ x <- cbind.data.frame(x, y = "B")
+ }), envir = .GlobalEnv)
<environment: R_GlobalEnv>
>
> df_1
x y
1 1 B
> df_2
x y
1 2 B
> df_3
x y
1 3 B
If you have a list of object names and file paths you can also use mapply:
object_names <- c("df_1", "df_2", "df_3")
file_paths <- list.files({path}, pattern = ".csv", full.names = T)
mapply(function(df_name, file)
assign(df_name, read.csv(file), envir=.GlobalEnv),
object_names,
file_paths)
I used list.files() to construct a vector of all the .csv files in a
specific directory. But file_paths could be written or constructed in any way.
If the files you want to read in are in the current working
directory, then file_paths could be replaced with a character vector of
file names.
In the code above, you need to replace {path} with a
string of the desired directory's path.
This demonstrates how to split out a nested dataframe into objects in the global environment with tidyverse functions:
library(tidyverse)
library(palmerpenguins)
penguins %>%
group_nest(species) %>%
deframe() %>%
list2env(.GlobalEnv)

looping over variables of a data.frame leading one final data.frame in R

I have written a function to change any one variable (i.e., column) in a data.frame to its unique levels and return the changed data.frame.
I wonder how to change multiple variables at once using my function and get one final data.frame with all the changes?
I have tried the following, but this gives multiple data.frames while only the last data.frame is the desired output:
data <- data.frame(sid = c(33,33, 41), pid = c('Bob', 'Bob', 'Jim'))
#== My function for ONE variable:
f <- function(data, what){
data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
return(data)
}
# Looping over `what`:
what <- c('sid', 'pid')
lapply(seq_along(what), function(i) f(data, what[i]))
In the function, we could change to return the data[[what]]
f <- function(data, what){
data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
data[[what]]
}
data[what] <- lapply(seq_along(what), function(i) f(data, what[i]))
Or do
data[what] <- lapply(what, function(x) f(data, x))
Or simply
data[what] <- lapply(what, f, data = data)

Store a single dataframe in environment from function in R [duplicate]

A post on here a day back has me wondering how to assign values to multiple objects in the global environment from within a function. This is my attempt using lapply (assign may be safer than <<- but I have never actually used it and am not familiar with it).
#fake data set
df <- data.frame(
x.2=rnorm(25),
y.2=rnorm(25),
g=rep(factor(LETTERS[1:5]), 5)
)
#split it into a list of data frames
LIST <- split(df, df$g)
#pre-allot 5 objects in R with class data.frame()
V <- W <- X <- Y <- Z <- data.frame()
#attempt to assign the data frames in the LIST to the objects just created
lapply(seq_along(LIST), function(x) c(V, W, X, Y, Z)[x] <<- LIST[[x]])
Please feel free to shorten any/all parts of my code to make this work (or work better/faster).
Update of 2018-10-10:
The most succinct way to carry out this specific task is to use list2env() like so:
## Create an example list of five data.frames
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
## Assign them to the global environment
list2env(LIST, envir = .GlobalEnv)
## Check that it worked
ls()
## [1] "A" "B" "C" "D" "df" "E" "LIST"
Original answer, demonstrating use of assign()
You're right that assign() is the right tool for the job. Its envir argument gives you precise control over where assignment takes place -- control that is not available with either <- or <<-.
So, for example, to assign the value of X to an object named NAME in the the global environment, you would do:
assign("NAME", X, envir = .GlobalEnv)
In your case:
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
NAMES <- c("V", "W", "X", "Y", "Z")
lapply(seq_along(LIST),
function(x) {
assign(NAMES[x], LIST[[x]], envir=.GlobalEnv)
}
)
ls()
[1] "df" "LIST" "NAMES" "V" "W" "X" "Y" "Z"
I think this question can have a nice crossover with this one: Can lists be created that name themselves based on input object names?
Say you want to do the same modification to a set of objects on the fly. But list2env() requires a named list, and you don't want to copy and paste them again. Borrowing the namedList function, and combining it with
Josh O'Brien anwser:
> namedList <- function(...) {
+ L <- list(...)
+ snm <- sapply(substitute(list(...)), deparse)[-1]
+ if (is.null(nm <- names(L))) nm <- snm
+ if (any(nonames <- nm=="")) nm[nonames] <- snm[nonames]
+ setNames(L ,nm)
+ }
>
> df_1 <- data.frame(x = 1)
> df_2 <- data.frame(x = 2)
> df_3 <- data.frame(x = 3)
>
> list2env(lapply(namedList(df_1, df_2, df_3), function(x) {
+ x <- cbind.data.frame(x, y = "B")
+ }), envir = .GlobalEnv)
<environment: R_GlobalEnv>
>
> df_1
x y
1 1 B
> df_2
x y
1 2 B
> df_3
x y
1 3 B
If you have a list of object names and file paths you can also use mapply:
object_names <- c("df_1", "df_2", "df_3")
file_paths <- list.files({path}, pattern = ".csv", full.names = T)
mapply(function(df_name, file)
assign(df_name, read.csv(file), envir=.GlobalEnv),
object_names,
file_paths)
I used list.files() to construct a vector of all the .csv files in a
specific directory. But file_paths could be written or constructed in any way.
If the files you want to read in are in the current working
directory, then file_paths could be replaced with a character vector of
file names.
In the code above, you need to replace {path} with a
string of the desired directory's path.
This demonstrates how to split out a nested dataframe into objects in the global environment with tidyverse functions:
library(tidyverse)
library(palmerpenguins)
penguins %>%
group_nest(species) %>%
deframe() %>%
list2env(.GlobalEnv)

Assign multiple objects to .GlobalEnv from within a function

A post on here a day back has me wondering how to assign values to multiple objects in the global environment from within a function. This is my attempt using lapply (assign may be safer than <<- but I have never actually used it and am not familiar with it).
#fake data set
df <- data.frame(
x.2=rnorm(25),
y.2=rnorm(25),
g=rep(factor(LETTERS[1:5]), 5)
)
#split it into a list of data frames
LIST <- split(df, df$g)
#pre-allot 5 objects in R with class data.frame()
V <- W <- X <- Y <- Z <- data.frame()
#attempt to assign the data frames in the LIST to the objects just created
lapply(seq_along(LIST), function(x) c(V, W, X, Y, Z)[x] <<- LIST[[x]])
Please feel free to shorten any/all parts of my code to make this work (or work better/faster).
Update of 2018-10-10:
The most succinct way to carry out this specific task is to use list2env() like so:
## Create an example list of five data.frames
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
## Assign them to the global environment
list2env(LIST, envir = .GlobalEnv)
## Check that it worked
ls()
## [1] "A" "B" "C" "D" "df" "E" "LIST"
Original answer, demonstrating use of assign()
You're right that assign() is the right tool for the job. Its envir argument gives you precise control over where assignment takes place -- control that is not available with either <- or <<-.
So, for example, to assign the value of X to an object named NAME in the the global environment, you would do:
assign("NAME", X, envir = .GlobalEnv)
In your case:
df <- data.frame(x = rnorm(25),
g = rep(factor(LETTERS[1:5]), 5))
LIST <- split(df, df$g)
NAMES <- c("V", "W", "X", "Y", "Z")
lapply(seq_along(LIST),
function(x) {
assign(NAMES[x], LIST[[x]], envir=.GlobalEnv)
}
)
ls()
[1] "df" "LIST" "NAMES" "V" "W" "X" "Y" "Z"
I think this question can have a nice crossover with this one: Can lists be created that name themselves based on input object names?
Say you want to do the same modification to a set of objects on the fly. But list2env() requires a named list, and you don't want to copy and paste them again. Borrowing the namedList function, and combining it with
Josh O'Brien anwser:
> namedList <- function(...) {
+ L <- list(...)
+ snm <- sapply(substitute(list(...)), deparse)[-1]
+ if (is.null(nm <- names(L))) nm <- snm
+ if (any(nonames <- nm=="")) nm[nonames] <- snm[nonames]
+ setNames(L ,nm)
+ }
>
> df_1 <- data.frame(x = 1)
> df_2 <- data.frame(x = 2)
> df_3 <- data.frame(x = 3)
>
> list2env(lapply(namedList(df_1, df_2, df_3), function(x) {
+ x <- cbind.data.frame(x, y = "B")
+ }), envir = .GlobalEnv)
<environment: R_GlobalEnv>
>
> df_1
x y
1 1 B
> df_2
x y
1 2 B
> df_3
x y
1 3 B
If you have a list of object names and file paths you can also use mapply:
object_names <- c("df_1", "df_2", "df_3")
file_paths <- list.files({path}, pattern = ".csv", full.names = T)
mapply(function(df_name, file)
assign(df_name, read.csv(file), envir=.GlobalEnv),
object_names,
file_paths)
I used list.files() to construct a vector of all the .csv files in a
specific directory. But file_paths could be written or constructed in any way.
If the files you want to read in are in the current working
directory, then file_paths could be replaced with a character vector of
file names.
In the code above, you need to replace {path} with a
string of the desired directory's path.
This demonstrates how to split out a nested dataframe into objects in the global environment with tidyverse functions:
library(tidyverse)
library(palmerpenguins)
penguins %>%
group_nest(species) %>%
deframe() %>%
list2env(.GlobalEnv)

Resources