Save a dataframe name and then reference that object in subsequent code - r

Would like to reference a dataframe name stored in an object, such as:
dfName <- 'mydf1'
dfName <- data.frame(c(x = 5)) #want dfName to resolve to 'mydf1', not create a dataframe named 'dfName'
mydf1
Instead, I get: Error: object 'mydf1' not found
CORRECTED SCENARIO:
olddf <- data.frame(c(y = 8))
mydf1 <- data.frame(c(x = 5))
assign('dfName', mydf1)
dfName <- olddf #why isn't this the same as doing "mydf1 <- olddf"?
I don't want to reference an actual dataframe named "dfName", rather "mydf1".
UPDATE
I have found a clunky workaround for what I wanted to do. The code is:
olddf <- data.frame(x = 8)
olddfName <- 'olddf'
newdfName <- 'mydf1'
statement <- paste(newdfName, "<-", olddfName, sep = " ")
writeLines(statement, "mycode.R")
source("mycode.R")
Anyone have a more elegant way, especially without resorting to a write/source?

I am guessing you want to store multiple data frames in a loop or similar. In that case it is more efficient and cleaner to store them in a named list. However, you can achieve your goal with assign():
assign('mydf1', data.frame(x = 5))
mydf1
x
1 5
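To address the "CORRECTED SCENARIO" part of the question as well, here is a minimal sketch, assuming both names are held as strings: get() looks up the object whose name is stored in a string, and assign() binds a value to the name stored in a string, so no write/source round trip is needed. The variable dfs is a hypothetical name used only to illustrate the named-list alternative.

```r
olddf <- data.frame(x = 8)
olddfName <- "olddf"   # name of the existing data frame, as a string
newdfName <- "mydf1"   # name the copy should get, as a string

# Equivalent to typing: mydf1 <- olddf
assign(newdfName, get(olddfName))
mydf1$x

# Named-list alternative: keep the data frames as list elements
# and index them by the string instead of creating new variables.
dfs <- list()
dfs[[newdfName]] <- get(olddfName)
dfs[["mydf1"]]$x
```

Writing dfName <- olddf always rebinds the variable literally called dfName; R never treats the string it used to hold as the assignment target, which is why assign() is needed.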

Related

Error: Can't subset columns that don't exist. x Columns `Q2`, `Q2`, `Q2`, `Q2`, `Q2`, etc. don't exist. in R

Can someone explain why I get an error saying I am trying to subset columns that don't exist when I am actually not trying to subset that column?
The error is this one:
Error: Can't subset columns that don't exist.
x Columns Q2, Q2, Q2, Q2, Q2, etc. don't exist.
But 'Q2' are values of a column I have created before and not variable names.
Below the details:
#load libraries
library(tidyverse)
library(stargazer)
library(lubridate)
library(data.table)
library(dplyr)
#Reading data
raw_data <- read.csv(file= "data/nyc_open_data_vehicle_collisions.csv")
#Inspecting the structure of the dataset
names(raw_data)
tail(raw_data, 3)
str(raw_data)
Sys.getlocale()
Sys.setlocale("LC_ALL", "English")
table(raw_data$CRASH.DATE)
table(raw_data$CRASH.DATE)
#Giving it format as date to CRASH.DATE
raw_data$CRASH.DATE <- as.Date(raw_data$CRASH.DATE, format="%m/ %d/ %Y")
#Creating a new variable that identify the quarter of the crash
raw_data$CRASH.QUARTER <- quarters(raw_data$CRASH.DATE)
#Creating a new variable that identify the year of the crash
raw_data$CRASH.YEAR <- year(raw_data$CRASH.DATE)
#cheking if it make sense
head(raw_data[,c("CRASH.DATE","CRASH.QUARTER", "CRASH.YEAR")],10)
#Creating new dataset:
panel_df <- raw_data %>%
select (ZIPCODE <- raw_data$ZIP.CODE,
YEAR <- raw_data$CRASH.YEAR,
QUARTER <- raw_data$CRASH.QUARTER,
NUMBER.OF.CRASHES <- "1",
TOTAL.NUMBER.OF.PERSONS.INJURIED <- raw_data$NUMBER.OF.PERSON.INJURED,
TOTAL.NUMBER.OF.PERSONS.KILLED <- raw_data$NUMBER.OF.PERSON.KILLED,
NUMBER.OF.CRASHES.WITH.INJURIES <- ifelse(raw_data$NUMBER.OF.PERSONS.INJURIED > "0" , 1, 0),
NUMBER.OF.PERDESTRIANS.INJURED <- raw_data$NUMBER.OF.PEDESTRIANS.INJURED,
NUMBER.OF.PEDESTRIANS.KILLED <- raw_data$NUMBER.OF.PEDESTRIANS.KILLED,
NUMBER.OF.CYCLIST.INJURED <- raw_data$NUMBER.OF.CYCLIST.INJURED,
NUMBER.OF.CYCLIST.KILLED <- raw_data$NUMBER.OF.CYCLIST.KILLED,
NUMBER.OF.MOTORIST.INJURED <- raw_data$NUMBER.OF.MOTORIST.INJURED,
NUMBER.OF.MOTORIST.KILLED <- raw_data$NUMBER.OF.MOTORIST.KILLED)
In which line does the error occur?
A few things might help:
Replace the "<-" in the select statement with "=".
You don't need to specify the data again after raw_data %>%, so drop the "raw_data$" prefixes for readability.
"1" is a character, whereas 1 is numeric. You probably want one (1) crash.
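You should get something like the following. Note that select() only picks or renames existing columns (inside select(), NUMBER.OF.CRASHES = 1 would mean "column 1, renamed"), so transmute() is used here to create the constant while keeping only the listed columns:
panel_df <- raw_data %>%
  transmute(ZIPCODE = ZIP.CODE,
            YEAR = CRASH.YEAR,
            QUARTER = CRASH.QUARTER,
            NUMBER.OF.CRASHES = 1,
            ...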
Rerun your code, and see if you get the error.
Although not wrong, long variable names such as "Number.of.motorist.injured" quickly become a pain to type and make statements long and hard to read. Shorter names are usually easier to work with, something like "mot_inj".
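A likely explanation for the original message: with "<-" inside select(), each argument is evaluated as an ordinary assignment and its value is handed to select() as the selection, so strings such as "Q2" get looked up as column names. Below is a small sketch of the corrected pattern on a toy stand-in for the collisions data; the three rows of values are made up for illustration, and only a few of the columns are shown.

```r
library(dplyr)

# Toy stand-in for raw_data (values are hypothetical)
raw_data <- data.frame(
  ZIP.CODE = c(10001, 10002, 10001),
  CRASH.QUARTER = c("Q2", "Q2", "Q3"),
  NUMBER.OF.PERSONS.INJURED = c(0, 2, 1)
)

# transmute() creates the listed columns and drops everything else.
# Column names are used bare, without the raw_data$ prefix.
panel_df <- raw_data %>%
  transmute(
    ZIPCODE = ZIP.CODE,
    QUARTER = CRASH.QUARTER,
    NUMBER.OF.CRASHES = 1,
    NUMBER.OF.CRASHES.WITH.INJURIES = ifelse(NUMBER.OF.PERSONS.INJURED > 0, 1, 0)
  )

panel_df
```

The comparison uses the number 0 rather than the string "0", so it is numeric rather than alphabetical.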

Refer to a variable by pasting strings then make changes and see them reflected in the original variable

my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars
for(i in 1:3) {get(paste0('my_mtcars_', i))$blah <- 1}
Error in get(paste0("my_mtcars_", i))$blah <- 1 :
target of assignment expands to non-language object
I would like each of my 3 data frames to have a new field called blah that has a value of 1.
How can I iterate over a range of numbers in a loop and refer to DFs by name by pasting the variable name into a string and then edit the df in this way?
These three options all assume you want to modify the data frames and keep them in the environment.
If they must stay separate data frames (in your environment and modified in a loop), you could do something like this:
for(i in 1:3) {
obj_name = paste0('my_mtcars_', i)
obj = get(obj_name)
obj$blah = 1
assign(obj_name, obj, envir = .GlobalEnv) # Send back to global environment
}
I agree with @Duck that a list is a better format (and preferred to the above loop). So, if you use a list and need the results in your environment, use what Duck suggested together with list2env() and send everything back to the .GlobalEnv, i.e. (in one ugly line):
list2env(lapply(mget(ls(pattern = "my_mtcars_")), function(x) {x[["blah"]] = 1; x}), .GlobalEnv)
Or, if you are amenable to working with data.table, you could use the set() function to add columns:
library(data.table)
# assuming my_mtcars_* is already a data.table
for(i in 1:3) {
set(get(paste0('my_mtcars_', i)), NULL, "blah", 1)
}
As a suggestion, it is better to manage the data inside a list and use lapply() instead of a loop:
#List
List <- list(my_mtcars_1 = mtcars,
             my_mtcars_2 = mtcars,
             my_mtcars_3 = mtcars)
#Add the new column to every element
List2 <- lapply(List, function(x) {x$blah <- 1; return(x)})
It is also easy to collect data frames that already exist into a list with code like this:
#List
List <- mget(ls(pattern = 'my_mt'))
So there is no need to define each dataset individually.
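Putting the list approach together, here is a short end-to-end sketch. The name dfs is hypothetical; the point is that once the data frames live in a named list, the pasted string is used as an index rather than as a variable name, so get() is not needed for the lookup.

```r
my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars

# Collect the existing objects into a named list
dfs <- mget(paste0("my_mtcars_", 1:3))

# Add the column to every element; the function must return x
dfs <- lapply(dfs, function(x) {
  x$blah <- 1
  x
})

# Refer to one data frame by a constructed name
i <- 2
head(dfs[[paste0("my_mtcars_", i)]]$blah)
```

If the individual variables really have to be updated too, list2env(dfs, envir = .GlobalEnv) writes each element back under its own name.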
We can also use the tidyverse:
library(dplyr)
library(purrr)
map(mget(ls(pattern = '^my_mtcars_\\d+$')), ~ .x %>%
mutate(blah = 1)) %>%
list2env(.GlobalEnv)

Parsing colnames text string as expression in R

I am trying to create a large number of data frames in a for loop using the "assign" function in R. I want to use the colnames function to set the column names in the data frame. The code I am trying to emulate is the following:
county_tmax_min_df <- data.frame(array(NA,c(length(days),67)))
colnames(county_tmax_min_df) <- c('Date',sd_counties$NAME)
county_tmax_min_df$Date <- days
The code I have so far in the loop looks like this:
file_vars = c('file1','file2')
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), "days")
f = 1
for (f in 1:2){
assign(paste0('county_',file_vars[f]),data.frame(array(NA,c(length(days),67))))
}
I need to be able to set the column names similar to how I did in the above statement. How do I do this? I think it needs to be something like this, but I am unsure what goes in the text portion. The end result I need is just a bunch of data frames. Any help would be wonderful. Thank you.
expression(parse(text = ))
You can set the names within assign(), like this:
file_vars = c('file1', 'file2')
days <- seq.Date(from = as.Date("1979-01-01"), to = as.Date("1979-01-02"), by = "days")
for (f in seq_along(file_vars)) {
assign(x = paste0('county_', file_vars[f]),
value = {
df <- data.frame(array(NA, c(length(days), 67)))
colnames(df) <- paste0("fancy_column_",
sample(LETTERS, size = ncol(df), replace = TRUE))
df
})
}
Inside the {} block you can use colnames(df) or setNames() to assign column names in any manner desired. Your first piece of code refers to an sd_counties object that is not available here, but the general idea should work for you.
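As a variation on the same idea, here is a sketch that builds the data frames in a named list with setNames() instead of assign(). The county_names vector is a made-up stand-in for sd_counties$NAME, and make_county_df is a hypothetical helper name.

```r
file_vars <- c("file1", "file2")
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), by = "day")

# Hypothetical stand-in for sd_counties$NAME
county_names <- c("Aurora", "Beadle", "Bennett")

# Build one empty data frame: a Date column plus one column per county
make_county_df <- function(days, county_names) {
  df <- data.frame(array(NA, c(length(days), length(county_names) + 1)))
  df <- setNames(df, c("Date", county_names))
  df$Date <- days
  df
}

# One data frame per file, stored under the name county_<file>
county_dfs <- lapply(
  setNames(file_vars, paste0("county_", file_vars)),
  function(f) make_county_df(days, county_names)
)

names(county_dfs)
str(county_dfs$county_file1)
```

No parse() or expression() is needed: the names are ordinary strings the whole way through.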

New data frame after function is empty

I wrote a function that builds a temporary data frame, but when I apply this function to my existing data frame, the temporary data frame is empty. How can I solve this?
I tried this code :
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat["vname"]
locci_1 <- sample(dat["loc1"], replace = F)
locci_2 <- sample(dat["loc2"], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= "data_a",vname="pop",loc1="PA1",loc2="PA2")
I've tried to convert the data_a with
data_a <- as.matrix(data_a)
and
popu <- sample(dat[,1], replace = F)
but neither of them worked.
Thanks :)
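There may be several issues. First, be aware of how your data frame is created: before R 4.0.0 the data.frame family of functions converted strings to factors by default (stringsAsFactors = TRUE), and cbind() on mixed vectors builds a character matrix first, so PA1 and PA2 are stored as strings rather than numbers. That may not be what you want.
Then @NURAIMIAZIMAH is right, your function needs a data frame, not the name of one, to work properly, so: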
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
is a good start.
Moreover, you pass values for the arguments vname, loc1 and loc2, but inside the function you use the literal strings "vname", "loc1" and "loc2" instead of the arguments, because you forgot to remove the quotation marks.
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[loc1], replace = F)
locci_2 <- sample(dat[loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
Now your function should run, but maybe not in the way you would like, because there won't be any permutations in your data_3 table. If you look carefully, dat[loc1] returns a one-column data frame, and sample() then shuffles its columns, not its rows. You want a vector to permute your data, so you have to subset your data frame like this: dat[,loc1].
This code below should do what you expect.
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[,loc1], replace = F)
locci_2 <- sample(dat[,loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
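For comparison, here is a sketch of the same function under two assumptions of mine: the data is built with data.frame() directly so that PA1 and PA2 stay numeric, and columns are extracted with [[ ]], which always returns a vector.

```r
data_a <- data.frame(pop = c("a1", "b2", "c3", "d4", "d5"),
                     PA1 = c(1, 40, 430, 4330, 43330),
                     PA2 = c(2, 50, 530, 5330, 53330))

loc1 <- "PA1"
class(data_a[loc1])    # "data.frame": single brackets keep the data frame
class(data_a[[loc1]])  # "numeric": double brackets return the column vector

perm_all <- function(dat, vname, loc1, loc2) {
  data.frame(
    popu    = dat[[vname]],
    locci_1 = sample(dat[[loc1]]),  # sample() permutes a vector by default
    locci_2 = sample(dat[[loc2]])
  )
}

data_3 <- perm_all(dat = data_a, vname = "pop", loc1 = "PA1", loc2 = "PA2")
data_3
```

The permuted columns keep the same values as the originals, only in a random order.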
See you.

cbind dataframe in R with placeholders

Imagine I have three dataframes:
data.frame1 <- data.frame(x=c(1:10))
data.frame2 <- data.frame(x=c(11:20))
data.frame3 <- data.frame(x=c(21:30))
I could bind them together by explicitly naming each of them:
res.data.frame <- cbind(data.frame1, data.frame2, data.frame3)
However, I am looking for more dynamic ways to do so, e.g. with placeholders.
This saves somehow the three dataframes in a new dataframe, but not in a usable format:
res.data.frame1 <- as.data.frame(mapply(get, grep("^data.frame.$", ls(), value=T)))
This command would only save the three names:
res.data.frame2 <- grep(pattern = "^data.frame.$", ls(), value=T)
This one only gives an error message:
res.data.frame3 <- do.call(cbind, lapply(ls(pattern = "^data.frame.$")), get)
Does anyone know the right way to do this?
Something like this maybe?
Assuming ls()
# [1] "data.frame1" "data.frame2" "data.frame3"
as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i))))
Based on @akrun's comment, this can be simplified to
as.data.frame(Reduce("cbind", mget(ls())))
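If the workspace contains other objects as well, a sketch along these lines may be safer. It restricts ls() with a pattern (the dot is escaped, since an unescaped "." matches any character) and passes the list to cbind() through do.call(), which is what the third attempt in the question was aiming for. The names df_names and res are hypothetical.

```r
data.frame1 <- data.frame(x = 1:10)
data.frame2 <- data.frame(x = 11:20)
data.frame3 <- data.frame(x = 21:30)

# Names of the objects to bind, matched by pattern
df_names <- ls(pattern = "^data\\.frame[0-9]+$")

# mget() fetches all of them as a list; do.call() spreads the list
# over cbind()'s arguments
res <- do.call(cbind, unname(mget(df_names)))

# All three columns were called x, so rename them after their source
names(res) <- df_names
head(res, 3)
```

Renaming at the end avoids ending up with three columns that all share the name x.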

Resources