set dataframe name inside a function using lapply - r

Let's say, for the sake of the example, that I have a list of departments. Each one is stored in a separate table named after the department ("departmentName"), so I created a list this way:
depts <- c("financial","sales",.....)
and then I iterate over it with a function to get the members:
get.employees <- function(tablename) {
con <- DBI::dbConnect(connectiondata....)
query <- glue::glue("select name,position,area from {tablename}")
assign(tablename,
dplyr::tbl(con, sql(query)) %>% collect())
}
lapply(depts,get.employees)
It runs, but it returns a list of data frames whose elements have no names, which is not what I expected.
I need every data frame to be named after its department.

1) Simplifying the example so that get.employees and depts are as defined in the Note at the end, we can use Map instead of lapply:
L <- Map(get.employees, depts)
names(L)
## [1] "finance" "sales"
2) This also works:
L2 <- sapply(depts, get.employees, simplify = FALSE)
names(L2)
## [1] "finance" "sales"
Note
Simplified example:
get.employees <- function(x) BOD
depts <- c("finance", "sales")
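To see why the original lapply() call comes back unnamed, here is a minimal sketch. It assumes a stand-in get.employees that builds a small data frame instead of querying a database (the columns are made up for illustration). lapply() only carries over names that its input already has, so an unnamed character vector yields an unnamed list; setNames() adds the names afterwards. Note also that assign() inside a function creates the variable in that function's own environment, so it does not name anything in the returned list.

```r
# Hypothetical stand-in for the database query: returns a small data frame.
get.employees <- function(tablename) {
  data.frame(name = c("Ann", "Bob"), dept = tablename)
}
depts <- c("finance", "sales")

# lapply() on an unnamed vector gives an unnamed list.
unnamed <- lapply(depts, get.employees)
names(unnamed)

# Attaching the names explicitly gives one named element per department.
named <- setNames(lapply(depts, get.employees), depts)
names(named)
named$sales
```

Map() and sapply(..., simplify = FALSE) do the naming step automatically when the input is a character vector, which is why the two answers above work.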

You can also try:
> ls <- mapply(get.employees, depts, SIMPLIFY = FALSE)
> names(ls)
[1] "finance" "sales"
Note: the input data is taken from the answer provided by @G. Grothendieck.

Related

create new dataframes from a master database in R

I have a database of different notifiable diseases.
I want to extract a data frame for each disease in that database so that I can make an automated report from a template in R Markdown.
I created a function for creating the data frame.
NMC is the master database.
The database lists all conditions reported.
I created a list of those conditions:
conditions <- list(unique(NMC$Condition))
I then created a function to create a new dataframe based on the condition
newdf <- function(data, var){
var <- data %>% filter(data$Condition %in% paste0(var))
var
}
Now I want to run my function to create a number of new dataframes from the master database. I thought of doing a for loop:
for (df in conditions){
df <- newdf(NMC, "df")
}
Which runs but doesn't give me anything.
So I found split(), but this hasn't fully solved my problem, as I still need to type out all the conditions to get each data frame to apply to the R template.
NMC <- split(NMC, factor(NMC$Condition), drop= FALSE)
# then to get a specific df (which is laborious)
rubella <- NMC$congenitalrubellasyndrome
# How can I get the data frames per condition into my environment, or access them easily, maybe with a %>% function?
My end goal is to then apply an R template to each data frame so that I have a standard epicurve/descriptive stats for each disease.
Thanks
> df <- data.frame(a = rep(letters[1:10], each = 3), x = 1:30)
> for (i in df$a) {
+ assign(i, df[df$a == i, ])
+ }
> ls()
[1] "a" "b" "c" "d" "df" "e" "f" "g" "h" "i" "j"
> a
a x
1 a 1
2 a 2
3 a 3
But see my comment above.
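If the goal is one report per condition, the split() result can be used directly, without copying each piece into the environment. A minimal sketch, assuming a made-up NMC table with a Condition column and a hypothetical Cases column:

```r
# Made-up stand-in for the master database (Cases is a hypothetical column).
NMC <- data.frame(
  Condition = c("measles", "rubella", "measles", "cholera"),
  Cases = c(3, 1, 5, 2)
)

# One data frame per condition, named after the condition.
by_condition <- split(NMC, NMC$Condition)
names(by_condition)

# Apply the same summary to every piece; the names carry through.
summaries <- lapply(by_condition, function(d) {
  data.frame(n_reports = nrow(d), total_cases = sum(d$Cases))
})
summaries$measles
```

The function passed to lapply() could just as well render the report template for each piece, with the list name supplying the condition.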

Refer to a variable by pasting strings, then make changes and see them reflected in the original variable

my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars
for(i in 1:3) {get(paste0('my_mtcars_', i))$blah <- 1}
Error in get(paste0("my_mtcars_", i))$blah <- 1 :
target of assignment expands to non-language object
I would like each of my 3 data frames to have a new field called blah that has a value of 1.
How can I iterate over a range of numbers in a loop and refer to DFs by name by pasting the variable name into a string and then edit the df in this way?
These three options all assume you want to modify the data frames and keep them in the environment.
So, if they must be data frames (in your environment and in a loop), you could do something like this:
for(i in 1:3) {
obj_name = paste0('my_mtcars_', i)
obj = get(obj_name)
obj$blah = 1
assign(obj_name, obj, envir = .GlobalEnv) # Send back to global environment
}
I agree with @Duck that a list is a better format (and preferable to the loop above). So, if you use a list and need the results in your environment, use what Duck suggested with list2env() and send everything back to .GlobalEnv, i.e. (in one ugly line):
list2env(lapply(mget(ls(pattern = "my_mtcars_")), function(x) {x[["blah"]] = 1; x}), .GlobalEnv)
Or, if you are amenable to working with data.table, you could use the set() function to add columns:
library(data.table)
# assuming my_mtcars_* is already a data.table
for(i in 1:3) {
set(get(paste0('my_mtcars_', i)), NULL, "blah", 1)
}
As a suggestion, it is better to manage the data inside a list and use lapply() instead of a loop:
#List
List <- list(my_mtcars_1 = mtcars,
my_mtcars_2 = mtcars,
my_mtcars_3 = mtcars)
#Variable
List2 <- lapply(List,function(x) {x$blah <- 1;return(x)})
And it is easy to collect your existing data frames into a list with code like this:
#List
List <- mget(ls(pattern = 'my_mt'))
So no need of defining each dataset individually.
We can use tidyverse
library(dplyr)
library(purrr)
map(mget(ls(pattern = '^my_mtcars_\\d+$')), ~ .x %>%
mutate(blah = 1)) %>%
list2env(.GlobalEnv)
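The list-based approach from the answers above can be tried out safely in a scratch environment. This is only a sketch: the environment env and the helper vector nms are names introduced here for illustration, and the real use case would target .GlobalEnv instead.

```r
# Scratch environment so the example does not touch the global workspace.
env <- new.env()
for (i in 1:3) assign(paste0("my_mtcars_", i), mtcars, envir = env)

# Collect the data frames into a named list, add the column, write them back.
nms <- ls(envir = env, pattern = "^my_mtcars_[0-9]+$")
updated <- lapply(mget(nms, envir = env), function(x) {
  x$blah <- 1
  x
})
list2env(updated, envir = env)

# Check that every data frame in env now has the new column.
has_blah <- vapply(mget(nms, envir = env),
                   function(x) "blah" %in% names(x), logical(1))
has_blah
```

Because mget() returns a named list and list2env() assigns by name, the modified copies replace the originals under the same names.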

Replacing values in a column using a named list

Let's say one of the columns in my dataframe refers to the name of a city. The city names are expressed as "longformA", "longformB", and I'd like to replace them all with "shrtfrmA", "shrtfrmB". Each "longform" name has an associated "shrtfrm" name with which it should be replaced.
I've got a solution involving a named list and purrr bouncing around in my head, but I can't quite conceptualize it. The named list would have this structure:
city_names_short <- list("ANA" = "Anaheim", "BOS" = "Boston")
And so on, and so forth.
example_df$city[example_df$city == "Anaheim"] <- "ANA"
example_df$city[example_df$city == "Boston"] <- "BOS"
I could of course replace them one by one, as per the above, but I'd like to be a little more elegant.
Any and all advice is greatly appreciated!
I suggest unlisting your list to a named vector and then using match to create the shortform names:
city_names_short <- unlist(city_names_short)
df$shortname <- names(city_names_short)[match(df$city, city_names_short)]
Method 1
You can loop over your city column using sapply:
df$city <- sapply(df$city, function(city) {
names(city_names_short)[city_names_short == city]
})
The function in sapply finds the name (i.e. the shortened city name) of the list item that matches each city name.
Method 2
You can create a map by inverting the city_names_short list:
city_map <- names(city_names_short)
names(city_map) <- city_names_short
df$city <- city_map[df$city]
There is a function setNames in base R:
map = setNames(c("ANA","BOS"),c("Anaheim","Boston"))
df$city_short = map[df$city_long]

cbind dataframe in R with placeholders

Imagine I have three dataframes:
data.frame1 <- data.frame(x=c(1:10))
data.frame2 <- data.frame(x=c(11:20))
data.frame3 <- data.frame(x=c(21:30))
I could bind them together by explicitly naming each of them:
res.data.frame <- cbind(data.frame1, data.frame2, data.frame3)
However, I am looking for more dynamic ways to do so, e.g. with placeholders.
This somehow saves the three data frames in a new data frame, but not in a usable format:
res.data.frame1 <- as.data.frame(mapply(get, grep("^data.frame.$", ls(), value=T)))
This command would only save the three names:
res.data.frame2 <- grep(pattern = "^data.frame.$", ls(), value=T)
This one only gives an error message:
res.data.frame3 <- do.call(cbind, lapply(ls(pattern = "^data.frame.$")), get)
Does anyone know the right way to do this?
Something like this maybe?
Assuming ls()
# [1] "data.frame1" "data.frame2" "data.frame3"
as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i))))
Based on @akrun's comment, this can be simplified to
as.data.frame(Reduce("cbind", mget(ls())))
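Calling ls() without a pattern picks up every object in the workspace, not only the data frames of interest. A minimal sketch that restricts the lookup with a pattern and uses do.call(cbind, ...), run in a scratch environment (env is a name introduced here for illustration):

```r
# Scratch environment holding the three data frames plus an unrelated object.
env <- new.env()
env$data.frame1 <- data.frame(x = 1:10)
env$data.frame2 <- data.frame(x = 11:20)
env$data.frame3 <- data.frame(x = 21:30)
env$other <- "not a data frame"

# Select only the objects whose names match the pattern.
nms <- ls(envir = env, pattern = "^data\\.frame[0-9]+$")

# Bind them column-wise; every input column is called x, so rename afterwards.
res <- do.call(cbind, mget(nms, envir = env))
names(res) <- nms
dim(res)
```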

Split the dataframe into subset dataframes and naming them on-the-fly (for loop)

I have 9880 records in a data frame. I am trying to split it into 9 groups of 1000 records each plus a last group of 880 records, and to name them accordingly. I used a for loop for the first nine groups and handled the last 880 records manually, but I am sure there are better ways to achieve this:
library(sqldf)
for (i in 0:8)
{
assign(paste("test",i,sep="_"),as.data.frame(final_9880[((1000*i)+1):(1000*(i+1)), (1:53)]))
}
test_9<- num_final_9880[9001:9880,1:53]
I am also unable to append all the parts in one for loop!
#append all parts
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
Any help is appreciated, thanks!
A small variation on this solution
ls <- split(final_9880, rep(0:9, each = 1000, length.out = 9880)) # edited to Roman's suggestion
for(i in 1:10) assign(paste("test",i,sep="_"), ls[[i]])
Your command for binding should work, provided it lists the names this loop creates (test_1 to test_10 rather than test_0 to test_9).
Edit
If you have many dataframes you can use a parse-eval combo. I use the package gsubfn for readability.
library(gsubfn)
nms <- paste("test", 1:10, sep="_", collapse=",")
eval(fn$parse(text='do.call(rbind, list($nms))'))
How does this work? First I create a string containing the comma-separated list of the dataframes
> paste("test", 1:10, sep="_", collapse=",")
[1] "test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10"
Then I use this string to construct the list
list(test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10)
using parse and eval with string interpolation.
eval(fn$parse(text='list($nms)'))
String interpolation is implemented via the fn$ prefix of parse; its effect is to intercept and substitute $nms with the string contained in the variable nms. Parsing and evaluating the string "list($nms)" creates the list needed. In the solution, the rbind is included in the parse-eval combo.
EDIT 2
You can collect all variables with a certain pattern, put them in a list and bind them by rows.
do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))
ls finds all variables whose names match the pattern "test_".
sapply retrieves all those variables and, with simplify = FALSE, stores them in a list.
do.call passes the list elements to rbind, which binds them row-wise.
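One caveat with the pattern-based approach: ls() returns names sorted as strings, so test_10 comes before test_2 and the rows may be bound in a different order than the chunks were created. A minimal sketch of building the names explicitly instead, using tiny made-up data frames in a scratch environment (env and the part column are assumptions for illustration):

```r
# Scratch environment with ten tiny data frames named test_1 ... test_10.
env <- new.env()
for (i in 1:10) assign(paste0("test_", i), data.frame(part = i), envir = env)

# ls() sorts names as strings, so test_10 lands right after test_1.
sorted_names <- ls(envir = env, pattern = "^test_")
sorted_names

# Building the names explicitly keeps the numeric order.
ordered <- mget(paste0("test_", 1:10), envir = env)
combined <- do.call(rbind, unname(ordered))
combined$part
```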
No for loop required -- use split
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))
splitter <- (data$a-1) %/% 1000
.list <- split(data, splitter)
lapply(0:9, function(i){
assign(paste('test',i,sep='_'), .list[[(i+1)]], envir = .GlobalEnv)
return(invisible())
})
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
identical(all_9880,data)
## [1] TRUE
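The same split can also be kept as a single named list rather than ten separate variables, which makes the append step a one-liner. A minimal sketch on made-up data of the same size (chunk_id and chunks are names introduced here):

```r
# Made-up data with 9880 rows.
data <- data.frame(a = 1:9880, b = rep(letters, length.out = 9880))

# Integer division assigns rows 1-1000 to chunk 0, 1001-2000 to chunk 1, etc.
chunk_id <- (seq_len(nrow(data)) - 1) %/% 1000
chunks <- split(data, chunk_id)
names(chunks) <- paste("test", names(chunks), sep = "_")

# Nine chunks of 1000 rows and a final chunk of 880 rows.
chunk_sizes <- sapply(chunks, nrow)
chunk_sizes

# Append all parts again in one call.
all_rows <- do.call(rbind, unname(chunks))
identical(all_rows$a, data$a)
```

Individual chunks are then available as chunks$test_0, chunks[["test_9"]] and so on, without assign().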

Resources