create list of dataframes matching a pattern - r

This is a very simple question, however I can't seem to come up w/ an answer. I would like to create a list of data frames matching a pattern, then rm these from the global environment.
The pattern to match is 'water_land_by_owntype_*'
This is what I have tried, but it doesn't work...I think b/c it doesn't know where to search for the string.
rm (matches <- list(
grep('water_land_by_owntype_*')))
-al

You can do it like this:
# Create some data.frame
water_land_by_owntype_1 <- mtcars
water_land_by_owntype_2 <- mtcars
water_land_by_owntype_3 <- mtcars
water_land_by_owntype_4 <- mtcars
water_land_by_owntype_5 <- mtcars
# Put them in a list
water_land_by_owntype <- lapply(ls(pattern = "water_land_by_owntype_.*"), get)
# or more directly
water_land_by_owntype <- mget(ls(pattern = "water_land_by_owntype_.*"))
# Delete them
rm(list = ls(pattern = "water_land_by_owntype_.*"))
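One caveat with the snippet above: the pattern passed to ls() is a regular expression and matches anywhere in the name, so an unanchored pattern would also pick up something like old_water_land_by_owntype_1. Below is a minimal sketch with an anchored pattern. It runs in a scratch environment (env, a name made up for this demo) so that it does not touch your workspace; in an interactive session you would drop the envir arguments.

```r
# Scratch environment standing in for the global environment (demo assumption)
env <- new.env()
for (i in 1:3) assign(paste0("water_land_by_owntype_", i), mtcars, envir = env)
assign("old_water_land_by_owntype_1", mtcars, envir = env)  # should NOT match

# "^" anchors the match at the start of the object name
nms <- ls(envir = env, pattern = "^water_land_by_owntype_")

# Collect the matching data frames into a named list, then remove the originals
water_land_by_owntype <- mget(nms, envir = env)
rm(list = nms, envir = env)

names(water_land_by_owntype)  # the three matched names
ls(envir = env)               # only the non-matching object is left
```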

This might be the simplest way to do it.
1. Extract variable names with ls()
2. Detect the pattern (returns a logical vector)
3. Locate and subset
4. Remove
library(stringr)
a = ls()
index = which(str_detect(a, "water_land_by_owntype_"))
b = a[index]
rm(list = b)  # rm(b) would only remove the vector b itself
Hope this helps,
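The same four steps can be sketched in base R with grepl(), without stringr. The point to watch is the last step: rm() called with a bare variable name deletes that variable, while rm(list = ...) deletes the objects whose names are stored in it. The environment env and the object names are made up for the demo.

```r
# Scratch environment so the demo does not delete anything in your workspace
env <- new.env()
assign("water_land_by_owntype_a", data.frame(x = 1), envir = env)
assign("water_land_by_owntype_b", data.frame(x = 2), envir = env)
assign("unrelated", 42, envir = env)

all_names <- ls(envir = env)                                # 1. extract names
is_match  <- grepl("^water_land_by_owntype_", all_names)    # 2. detect pattern
to_remove <- all_names[is_match]                            # 3. subset

# 4. remove: `list =` takes a character vector of object names
rm(list = to_remove, envir = env)
remaining <- ls(envir = env)
```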

Related

Function does return empty data frame

my first question on Stack Overflow so bear with me ;-)
I wrote a function to row-bind all objects whose names meet a regex criterion into a dataframe.
Curiously, if I run the lines out of the function, it works perfectly. But within the function, an empty data frame is returned.
Reproducible example:
offers_2022_05 <- data.frame(x = 3)
offers_2022_06 <- data.frame(x = 6)
bind_multiple_dates <- function(prefix) {
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix))
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
return(data)
}
bind_multiple_dates("offers")
# A tibble: 0 × 0
However, this works:
prefix <- "offers"
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix))
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
data
month x
1 offers_2022_05 3
2 offers_2022_06 6
I suppose it has something to do with the environment, but I can't really figure it out. Is there a better way to do this? I would like to keep the code as a function.
Thanks in advance :-)
By default, ls() looks in the current environment when searching for variables. Inside a function, the current environment is the function's own frame, and those data.frame variables are not defined there. You can point ls() at the calling environment explicitly with the envir= parameter. For example
bind_multiple_dates <- function(prefix) {
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix), envir=parent.frame())
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
return(data)
}
The "better" way to do this is to not create a bunch of separate variables like offers_2022_05 and offers_2022_06 in the first place. Variables should not have data or indexes in their name. It would be better to create the data frames in a list directly from the beginning. Often this is easily accomplished with a call to lapply or purrr::map. See this existing question for more info
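As a rough sketch of the list-first approach: build the data frames into a named list from the start and combine them from there. The function load_offers() is a hypothetical stand-in for however each month's data is actually produced, and the values it returns are made up. The sketch uses base R only; with dplyr loaded, bind_rows(offers, .id = "month") would give the same kind of result.

```r
# Hypothetical loader: returns a one-row data frame for a month string like "2022_05"
load_offers <- function(month) {
  data.frame(x = as.integer(substr(month, 6, 7)))
}

months <- c("2022_05", "2022_06")

# One named list instead of separate offers_2022_05, offers_2022_06 variables
offers <- lapply(months, load_offers)
names(offers) <- paste0("offers_", months)

# Add the list name as a "month" column to each piece, then stack the pieces
combined <- do.call(rbind, Map(function(df, nm) cbind(month = nm, df),
                               offers, names(offers)))
rownames(combined) <- NULL
combined
```

Because nothing here depends on ls(), the code behaves the same inside and outside a function.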

Refer to a variable by pasting strings then make changes and see them reflected in the original variable

my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars
for(i in 1:3) {get(paste0('my_mtcars_', i))$blah <- 1}
Error in get(paste0("my_mtcars_", i))$blah <- 1 :
target of assignment expands to non-language object
I would like each of my 3 data frames to have a new field called blah that has a value of 1.
How can I iterate over a range of numbers in a loop and refer to DFs by name by pasting the variable name into a string and then edit the df in this way?
These three options all assume you want to modify them and keep them in the environment.
So, if they must stay as separate data frames in your environment and be modified in a loop, you could do something like this:
for(i in 1:3) {
obj_name = paste0('my_mtcars_', i)
obj = get(obj_name)
obj$blah = 1
assign(obj_name, obj, envir = .GlobalEnv) # Send back to global environment
}
I agree with @Duck that a list is a better format (and preferable to the above loop). So, if you use a list and need the results in your environment, use what Duck suggested with list2env() and send everything back to .GlobalEnv, i.e. (in one ugly line):
list2env(lapply(mget(ls(pattern = "my_mtcars_")), function(x) {x[["blah"]] = 1; x}), .GlobalEnv)
Or, if you are amenable to working with data.table, you could use the set() function to add columns:
library(data.table)
# assuming my_mtcars_* is already a data.table
for(i in 1:3) {
set(get(paste0('my_mtcars_', i)), NULL, "blah", 1)
}
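The get()/assign() round trip from the first option can also be wrapped in a small helper so that the target environment is explicit. This is only a sketch: add_column_to_all() and its arguments are names invented here, and the demo uses a scratch environment rather than .GlobalEnv.

```r
# Hypothetical helper: add a constant column to every data frame whose name
# matches `pattern` in `envir` (defaults to the caller's environment)
add_column_to_all <- function(pattern, column, value, envir = parent.frame()) {
  nms <- ls(envir = envir, pattern = pattern)
  for (nm in nms) {
    obj <- get(nm, envir = envir)      # fetch a copy of the object
    if (!is.data.frame(obj)) next      # skip matches that are not data frames
    obj[[column]] <- value             # modify the copy
    assign(nm, obj, envir = envir)     # write it back under the same name
  }
  invisible(nms)
}

# Demo in a scratch environment
env <- new.env()
for (i in 1:3) assign(paste0("my_mtcars_", i), mtcars, envir = env)
add_column_to_all("^my_mtcars_[0-9]+$", "blah", 1, envir = env)
head(get("my_mtcars_2", envir = env)$blah)
```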
As a suggestion, it is better to manage the data inside a list and use lapply() instead of a loop:
#List
List <- list(my_mtcars_1 = mtcars,
my_mtcars_2 = mtcars,
my_mtcars_3 = mtcars)
#Variable
List2 <- lapply(List,function(x) {x$blah <- 1;return(x)})
And it is easy to collect your existing data frames into a list using code like this:
#List
List <- mget(ls(pattern = 'my_mt'))
So there is no need to define each dataset individually.
We can also use the tidyverse:
library(dplyr)
library(purrr)
map(mget(ls(pattern = '^my_mtcars_\\d+$')), ~ .x %>%
mutate(blah = 1)) %>%
list2env(.GlobalEnv)

cbind dataframe in R with placeholders

Imagine I have three dataframes:
data.frame1 <- data.frame(x=c(1:10))
data.frame2 <- data.frame(x=c(11:20))
data.frame3 <- data.frame(x=c(21:30))
I could bind them together by explicitly naming each of them:
res.data.frame <- cbind(data.frame1, data.frame2, data.frame3)
However, I am looking for more dynamic ways to do so, e.g. with placeholders.
This somehow saves the three data frames in a new data frame, but not in a usable format:
res.data.frame1 <- as.data.frame(mapply(get, grep("^data.frame.$", ls(), value=T)))
This command would only save the three names:
res.data.frame2 <- grep(pattern = "^data.frame.$", ls(), value=T)
This one only gives an error message:
res.data.frame3 <- do.call(cbind, lapply(ls(pattern = "^data.frame.$")), get)
Does anyone know the right way to do this?
Something like this maybe?
Assuming ls()
# [1] "data.frame1" "data.frame2" "data.frame3"
as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i))))
Based on @akrun's comment, this can be simplified to
as.data.frame(Reduce("cbind", mget(ls())))
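For reference, the third attempt in the question fails because get is passed to do.call() rather than to lapply() (the closing parenthesis is one position too early). Here is a hedged sketch of a corrected version that also restricts the match to the intended objects instead of everything ls() returns. It runs in a scratch environment (env, a demo assumption) and renames the columns afterwards, since all three inputs have a column called x.

```r
# Scratch environment holding the three example data frames
env <- new.env()
assign("data.frame1", data.frame(x = 1:10),  envir = env)
assign("data.frame2", data.frame(x = 11:20), envir = env)
assign("data.frame3", data.frame(x = 21:30), envir = env)

# "\\." matches a literal dot; "[0-9]+" also allows multi-digit suffixes
nms <- ls(envir = env, pattern = "^data\\.frame[0-9]+$")

# mget() fetches all objects at once; do.call() passes them to cbind()
res <- do.call(cbind, unname(mget(nms, envir = env)))

# Use the source object names as column names to keep them distinct
names(res) <- nms
str(res)
```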

globbing dataframes or other objects in R

This should be a simple one, I hope. I have several data frames loaded into the workspace, labelled df01 to df100, with not all numbers represented. I'd like to plot a specific column across all datasets, for example in a box plot. How do I refer to all objects starting with df, using globbing, i.e.:
boxplot(df00$col1, df02$col1, df04$col1)
=
boxplot(df*$col1)
The idiomatic approach is to work with lists, or to use a separate environment.
You can create this list using ls and pattern
df.names <- ls(pattern = '^df')
# note
# ls(pattern ='^df[[:digit:]]{2,}')
# may be safer if there are objects starting with df you don't want
df.list <- mget(df.names)
# note if you are using a version of R prior to R 3.0.0
# you will need `envir = parent.frame()`
# mget(ls(pattern = 'df'), envir = parent.frame())
# use `lapply` to extract the relevant columns
df.col1 <- lapply(df.list, '[[', 'col1')
# call boxplot
boxplot(df.col1)
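The separate-environment option mentioned at the top of this answer might look roughly like the sketch below. The environment name dfs_env and the data in it are made up for illustration; the idea is that if the data frames are created in their own environment, everything in it is of interest and no name pattern is needed.

```r
# Dedicated environment for the data frames (name chosen for this demo)
dfs_env <- new.env()

# Made-up data; not every number is present, as in the question
for (i in c(0, 2, 4)) {
  assign(sprintf("df%02d", i), data.frame(col1 = i + 1:5), envir = dfs_env)
}

# ls() returns the names sorted, so the list comes out in name order
df.list <- mget(ls(envir = dfs_env), envir = dfs_env)

# Pull the same column out of every data frame
df.col1 <- lapply(df.list, `[[`, "col1")

# boxplot(df.col1)  # uncomment to draw one box per data frame
```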
Try this:
nums <- sprintf("%02d", 0:100)
dfs.names <- Filter(exists, paste0("df", nums))
dfs.obj <- lapply(dfs.names, get)
dfs.col1 <- lapply(dfs.obj, `[[`, "col1")
do.call(boxplot, dfs.col1)

Split the dataframe into subset dataframes and naming them on-the-fly (for loop)

I have 9880 records in a data frame. I am trying to split it into 9 groups of 1000 each, plus a last group of 880 records, and name them accordingly. I used a for loop for the first 9 groups and handled the last 880 records manually, but I am sure there are better ways to achieve this:
library(sqldf)
for (i in 0:8)
{
assign(paste("test",i,sep="_"),as.data.frame(final_9880[((1000*i)+1):(1000*(i+1)), (1:53)]))
}
test_9<- num_final_9880[9001:9880,1:53]
I am also unable to append all the parts in one for loop!
#append all parts
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
Any help is appreciated, thanks!
A small variation on this solution
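parts <- split(final_9880, rep(0:9, each = 1000, length.out = 9880)) # edited to Roman's suggestion; named parts to avoid masking ls()
for(i in 1:10) assign(paste("test",i,sep="_"), parts[[i]])
Your command for binding should work, provided it lists test_1 to test_10 (the names created here) rather than test_0 to test_9.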
Edit
If you have many dataframes you can use a parse-eval combo. I use the package gsubfn for readability.
library(gsubfn)
nms <- paste("test", 1:10, sep="_", collapse=",")
eval(fn$parse(text='do.call(rbind, list($nms))'))
How does this work? First I create a string containing the comma-separated list of the dataframes
> paste("test", 1:10, sep="_", collapse=",")
[1] "test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10"
Then I use this string to construct the list
list(test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10)
using parse and eval with string interpolation.
eval(fn$parse(text='list($nms)'))
String interpolation is implemented via the fn$ prefix of parse; its effect is to intercept $nms and substitute it with the string contained in the variable nms. Parsing and evaluating the string "list($nms)" creates the list needed. In the solution, the rbind is included in the parse-eval combo.
EDIT 2
You can collect all variables with a certain pattern, put them in a list and bind them by rows.
do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))
ls finds all variables with a pattern "test_"
sapply retrieves all those variables and stores them in a list
do.call flattens the list row-wise.
No for loop required -- use split
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))
splitter <- (data$a-1) %/% 1000
.list <- split(data, splitter)
lapply(0:9, function(i){
assign(paste('test',i,sep='_'), .list[[(i+1)]], envir = .GlobalEnv)
return(invisible())
})
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
identical(all_9880,data)
## [1] TRUE
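If the pieces do not have to exist as separate workspace variables, the list returned by split() can be used directly, which avoids assign() altogether. A minimal sketch under that assumption follows; the names parts, chunk and sizes are chosen here, and the data is random filler as in the answer above.

```r
set.seed(1)  # only so the filler data is reproducible
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))

# Chunk index 0..9: rows 1-1000 -> 0, rows 1001-2000 -> 1, ..., rows 9001-9880 -> 9
chunk <- (seq_len(nrow(data)) - 1) %/% 1000
parts <- split(data, chunk)
names(parts) <- paste0("test_", names(parts))

# Nine pieces of 1000 rows and a last piece of 880 rows
sizes <- vapply(parts, nrow, integer(1))

# Recombine; unname() stops rbind from prefixing row names with the list names
all_9880 <- do.call(rbind, unname(parts))
```

Individual pieces remain available as parts$test_0, parts[["test_9"]], and so on.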
