How to iterate with the get() function in R

I have a few data frames named df_JANUARY 2020, df_FEBRUARY 2020, etc. (I know spaces in variable names are bad practice, but the names come from a SQL query.) I would like to build a function that iterates through the months of these data frames. The purpose is to have the function (not written below) clean each df the same way.
date <- c("JANUARY 2020", "FEBRUARY 2020")
x <- function(date) {
y <- get(paste0("df_", date))
}
for(i in seq_along(date)) {
z <- date[i]
assign(paste0("dfclean_", date[i]), x(z))
}
The problem is that when I use the get() function, it pushes the whole list through rather than one element at a time. Is there a way to avoid this problem with this methodology, or is there a better way to approach it? Any help is greatly appreciated.

We can convert the matrix to a data.frame and then use $. The $ operator does not work on a matrix, whose columns are extracted with [ instead.
x <- function(daten) {
y <- as.data.frame(get(paste0("df_", daten)))
y[grep("Enterprise", y$AcctType), ]
}
for(i in seq_along(date)) {
z <- date[i]
assign(paste0("dfclean_", date[i]), x(z))
}
We can also use mget
lst1 <- mget(paste0("df_", date))
lst1 <- lapply(lst1, function(x) subset(as.data.frame(x),
grepl("Enterprise",AcctType)))
names(lst1) <- sub("_", "clean_", names(lst1))
list2env(lst1, .GlobalEnv)
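For reference, here is a self-contained sketch of the mget() approach with toy data. The two data frames, their AcctType values, and the separate environment env are assumptions made up for illustration; the answer above works in the global environment instead.

```r
# Toy stand-ins for the real query results (made-up data)
env <- new.env()
assign("df_JANUARY 2020",
       data.frame(AcctType = c("Enterprise", "Basic"), n = 1:2),
       envir = env)
assign("df_FEBRUARY 2020",
       data.frame(AcctType = c("Basic", "Enterprise Plus"), n = 3:4),
       envir = env)

date <- c("JANUARY 2020", "FEBRUARY 2020")

# mget() looks up several names at once and returns a named list
lst1 <- mget(paste0("df_", date), envir = env)

# Clean every data frame the same way
lst1 <- lapply(lst1, function(d) d[grepl("Enterprise", d$AcctType), , drop = FALSE])

# Rename df_* to dfclean_* and write the results back
names(lst1) <- sub("_", "clean_", names(lst1), fixed = TRUE)
list2env(lst1, envir = env)

ls(env)
```

Because each name is looked up individually, every data frame is processed one at a time, spaces in the names included.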

I know you didn't ask for this, but how about just renaming all of the data frames with _ instead of a space?
The first line assigns all of the objects in the global environment with df in the name to be elements of a list named mydfs.
The second line replaces space with _ in the names.
The third line assigns all of the list elements into the global environment.
mydfs <- mget(ls(pattern = "df"), globalenv())
names(mydfs) <- gsub(" ","_",names(mydfs))
list2env(mydfs, envir = globalenv())
Or, option two, you could just use lapply on mydfs.
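Option two could look roughly like this. The toy list stands in for mydfs, and clean_df() is a hypothetical placeholder for whatever cleaning is actually needed.

```r
# Toy list standing in for mydfs (made-up data)
mydfs <- list(
  "df_JANUARY 2020"  = data.frame(AcctType = c("Enterprise", "Basic")),
  "df_FEBRUARY 2020" = data.frame(AcctType = c("Basic", "Enterprise"))
)

# Hypothetical cleaning step: keep only Enterprise accounts
clean_df <- function(d) d[d$AcctType == "Enterprise", , drop = FALSE]

# Apply the same cleaning to every element; list names are preserved
cleaned <- lapply(mydfs, clean_df)

# Elements can be reached by name, spaces and all
cleaned[["df_JANUARY 2020"]]
```

Keeping the data frames in a list avoids get() and assign() altogether.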

Related

Refer to a variable by pasting strings, then make changes and see them reflected in the original variable

my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars
for(i in 1:3) {get(paste0('my_mtcars_', i))$blah <- 1}
Error in get(paste0("my_mtcars_", i))$blah <- 1 :
target of assignment expands to non-language object
I would like each of my 3 data frames to have a new field called blah that has a value of 1.
How can I iterate over a range of numbers in a loop and refer to DFs by name by pasting the variable name into a string and then edit the df in this way?
These three options all assume you want to modify them and keep them in the environment.
So, if they must be data frames (in your environment and in a loop), you could do something like this:
for(i in 1:3) {
obj_name = paste0('my_mtcars_', i)
obj = get(obj_name)
obj$blah = 1
assign(obj_name, obj, envir = .GlobalEnv) # Send back to global environment
}
I agree with @Duck that a list is a better format (and preferred to the above loop). So, if you use a list and need it in your environment, use what Duck suggested with list2env() and send everything back to the .GlobalEnv, i.e. (in one ugly line):
list2env(lapply(mget(ls(pattern = "my_mtcars_")), function(x) {x[["blah"]] = 1; x}), .GlobalEnv)
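The one-liner can be unpacked into three steps. This is only a sketch: it uses a separate environment env (an assumption for illustration) so that running it does not touch .GlobalEnv.

```r
# Work in a separate environment so the sketch leaves .GlobalEnv alone
env <- new.env()
for (i in 1:3) assign(paste0("my_mtcars_", i), mtcars, envir = env)

# 1. Collect the data frames into a named list
dfs <- mget(ls(env, pattern = "^my_mtcars_"), envir = env)

# 2. Add the column to each copy and return the modified copy
dfs <- lapply(dfs, function(x) {
  x[["blah"]] <- 1
  x
})

# 3. Write the modified copies back under their original names
list2env(dfs, envir = env)

names(get("my_mtcars_2", envir = env))
```

get() only returns a copy of the value, so the changes have to be written back explicitly in step 3.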
Or, if you are amenable to working with data.table, you could use the set() function to add columns:
library(data.table)
# assuming my_mtcars_* is already a data.table
for(i in 1:3) {
set(get(paste0('my_mtcars_', i)), NULL, "blah", 1)
}
As a suggestion, it is better to manage the data inside a list and use lapply() instead of a loop:
#List
List <- list(my_mtcars_1 = mtcars,
my_mtcars_2 = mtcars,
my_mtcars_3 = mtcars)
#Add the new column to every element
List2 <- lapply(List, function(x) {x$blah <- 1; return(x)})
And it is easy to collect your existing data frames into a list with code like this:
#List
List <- mget(ls(pattern = 'my_mt'))
So there is no need to define each dataset individually.
We can use tidyverse
library(dplyr)
library(purrr)
map(mget(ls(pattern = '^my_mtcars_\\d+$')), ~ .x %>%
mutate(blah = 1)) %>%
list2env(.GlobalEnv)

Changing column names of many dataframes in a loop

I have three data frames: EC_Data, ED_Data, and ST_Data.
All of them have the same column names. Specifically, the columns after the 4th
are named by year, from 2006 to 2015.
So I create a new list that has all three dataframes:
Alldata = list(EC_Data, ED_Data, ST_Data)
So I tried to rename all the columns in a for loop like below...
for(x in seq_along(Alldata))
{
for(j in seq_along(Alldata[[x]]))
{
if(j>4)
{
names(colnames(Alldata[[x]][j])) <- paste("X", substr(colnames(Alldata[[x]][j]), start = 1, stop = 5),sep="")
print(colnames(Alldata[[x]][j]))
}
}
}
But nothing happens...
I cannot understand why, because when I try to call the names of every list, for example with
view(colnames(Alldata[[2]]))
the names seem to be exactly what I want to see.
Can someone help me to understand the reason that this loop doesn't work and what can I use instead of this?
Thank you
If we want to rename all the columns, use lapply to loop over the list, paste "X" onto the substr of the existing column names, and assign the result with setNames
Alldata <- lapply(Alldata, function(x)
setNames(x, paste0("X", substr(colnames(x), 1, 5))))
Or using a for loop
for(i in seq_along(Alldata)) {
  Alldata[[i]] <- setNames(Alldata[[i]],
    paste0("X", substr(colnames(Alldata[[i]]), 1, 5)))
}
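The question asks to rename only the columns after the 4th, so a variant restricted to those columns might look like the sketch below. The toy data frames built by make_df() are made up for illustration. The loop in the question appears to set the names attribute of the column-name vector rather than the column names themselves, which would explain why nothing changes.

```r
# Toy data frames: four ID columns followed by year columns (made-up data)
make_df <- function() {
  data.frame(a = 1, b = 2, c = 3, d = 4,
             `2006` = 5, `2007` = 6, check.names = FALSE)
}
Alldata <- list(EC_Data = make_df(), ED_Data = make_df(), ST_Data = make_df())

Alldata <- lapply(Alldata, function(x) {
  idx <- seq_along(x) > 4                       # TRUE from column 5 onwards
  names(x)[idx] <- paste0("X", substr(names(x)[idx], 1, 5))
  x                                             # return the renamed copy
})

names(Alldata$ED_Data)
```

The first four column names are left untouched and only the year columns get the X prefix.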

How to use mapply to mutate columns in a list of dataframes

I have a list of data frames, and some data frames in this list require their columns to be converted into date columns. I was wondering if it is possible to do this with mapply.
Here is my attempt (files1 is the list of data frames, c("data, data1") are the names of data frames within files1, and c("adfFlowDate","datedate") are the names of the columns within the respective data frames):
files2 <- repair_dates(files1, c("data, data1"), c("adfFlowDate","datedate"))
The function that does not work:
repair_dates <- function(data, df_list, col_list) {
mapply(function(n, i) data[[n]] <<- data[[n]] %>% mutate(i = as.Date(i, origin = "1970-01-01")), df_list, col_list)
return(data)
}
Your set-up is fairly complex here, calling an anonymous function inside an mapply inside another function, which takes three parameters, all relating to a single nested object.
Personally, I wouldn't add to this complexity by accommodating the non-standard evaluation required to get mutate to work here (though it is possible). Perhaps something like this (though difficult to tell without any reproducible data) -
repair_dates <- function(data, df_list, col_list)
{
mapply(function(n, i) {
data[[n]][[i]] <- as.Date(data[[n]][[i]], origin = "1970-01-01")
return(data[[n]])
}, df_list, col_list, SIMPLIFY = FALSE)
}
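A possible way to call this version is sketched below, with made-up input data. Note two assumptions: the data frame names are passed as separate strings, c("data", "data1"), and since the function returns only the repaired data frames, they are merged back into the full list afterwards.

```r
# Definition repeated from the answer above so the sketch runs on its own
repair_dates <- function(data, df_list, col_list) {
  mapply(function(n, i) {
    data[[n]][[i]] <- as.Date(data[[n]][[i]], origin = "1970-01-01")
    data[[n]]
  }, df_list, col_list, SIMPLIFY = FALSE)
}

# Made-up input: dates stored as days since 1970-01-01
files1 <- list(
  data  = data.frame(adfFlowDate = c(0, 18262)),
  data1 = data.frame(datedate = c(1, 2)),
  other = data.frame(x = 1:2)
)

fixed <- repair_dates(files1, c("data", "data1"), c("adfFlowDate", "datedate"))

# Only the repaired data frames come back, so merge them into the full list
files2 <- files1
files2[names(fixed)] <- fixed

class(files2$data$adfFlowDate)
```

mapply() names its result after the first character vector, which is what makes the merge by name work.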

R new variable assignment

I made a loop that assigns the result of a function to a newly created variable. After that, this variable is used to create another one.
This second step fails to produce the expected result.
library(stringr)
for (i in 1:length(Ids)){
nam <- paste("data", Ids[i], sep = "_")
assign(nam, GetReportData(query, token,paginate_query = F))
newvar=paste(nam,"contentid",sep="$")
originStr=paste(nam,"pagePath",sep="$")
assign(newvar,str_extract(originStr,"&id=[0-9]+"))
}
Don't create a bunch of variables; store related values in named lists to make it easier to retrieve them. You didn't supply any input to test with, but I'm guessing this does the same thing.
library(stringr)
mydata <- lapply(seq_along(Ids), function(i) {
  dd <- GetReportData(query, token, paginate_query = FALSE)
  dd$contentid <- str_extract(dd$pagePath, "&id=[0-9]+")
  dd
})
This will return a list of data.frames. You can access them with mydata[[1]], mydata[[2]], etc rather than data_1, data_2, etc
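If access by name is preferred over position, the list can be named after the IDs. The sketch below is self-contained only because it makes several assumptions: GetReportData() is replaced by a hypothetical stub, query, token and Ids are placeholders, and base R regmatches() is swapped in for str_extract() to avoid the package dependency.

```r
# Hypothetical stub standing in for the real GetReportData() call
GetReportData <- function(query, token, paginate_query = FALSE) {
  data.frame(pagePath = c("/page?x=1&id=123", "/home"),
             stringsAsFactors = FALSE)
}
query <- NULL; token <- NULL      # placeholders, not real credentials
Ids <- c("a1", "b2")              # made-up IDs

mydata <- lapply(Ids, function(id) {
  dd <- GetReportData(query, token, paginate_query = FALSE)
  # Base R stand-in for str_extract(): first match, or NA if none
  m <- regexpr("&id=[0-9]+", dd$pagePath)
  dd$contentid <- NA_character_
  dd$contentid[m > 0] <- regmatches(dd$pagePath, m)
  dd
})
names(mydata) <- paste("data", Ids, sep = "_")

mydata$data_a1$contentid
```

Each element is then available as mydata$data_a1, mydata$data_b2, and so on, without creating separate variables.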
If you absolutely insist on creating a bunch of variables, just make sure to do all your transformations on an actual object, and then save that object when you are done. You can never use assign with names that contain $ or [, as described in the help page: "assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc." For example
for(i in seq_along(Ids)) {
  dd <- GetReportData(query, token, paginate_query = FALSE)
  dd$contentid <- str_extract(dd$pagePath, "&id=[0-9]+")
  assign(paste("data", i, sep="_"), dd)
}

Function to extract data frames from a list does not provide any output

I am not sure why my function is not working. It loads into the environment, but it returns nothing when I use it. I think the problem may be that I have not specified the "return" argument, but I am not sure where or how it should be placed? Thank you for any help.
Here is the function that I want to use to extract data frames from a list, after I have used the split function to subset a larger data frame.
extractDF<- function(x) {
dfname<-names(x)
n<-length(x)
for (i in 1:n) {
nam <- dfname[i]
assign (nam, data.frame(x[i]))
}
}
When I use this loop outside of a function it works just as it should. Here is the working loop...
dfname<-names(SubPop)
n<-length(dfname)
for (i in 1:n) {
nam <- dfname[i]
assign (nam, data.frame(SubPop[i]))
}
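The function seems to prefix the column names of the datasets in the list with the names of the list elements. It appears to return nothing because assign() inside a function creates the objects in the function's own environment by default. To create them in the global environment, use envir=.GlobalEnv in the assign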
extractDF<- function(x) {
dfname<-names(x)
n<-length(x)
for (i in 1:n) {
nam <- dfname[i]
assign (nam, data.frame(x[i]), envir=.GlobalEnv)
}
}
extractDF(SubPop)
Now, check the df1 and df2. The colnames are changed.
colnames(df1)
#[1] "df1.Species8" "df1.Species4" "df1.Species1" "df1.Species5" "df1.Species7"
#[6] "df1.Species6"
colnames(df2)
#[1] "df2.Species4" "df2.Species7" "df2.Species5" "df2.Species10"
#[5] "df2.Species1" "df2.Species2" "df2.Species6" "df2.Species8"
Update
If you don't want to modify the colnames, and assuming that the function was just meant to extract the modified datasets within the list (from some other analysis), you can do:
list2env(setNames(SubPop, ls(pattern="^df\\d+")), envir=.GlobalEnv)
Here, I am using setNames so as to name the SubPop list elements (in case it is unnamed).
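Another option, sketched here under assumptions, is to keep the function but extract each element with [[ so that data.frame() never gets a chance to prefix the column names. The envir argument and the target environment are additions for illustration and are not part of the original function; the toy SubPop list is made up.

```r
# Sketch: extract list elements into an environment without renaming columns
extractDF <- function(x, envir = parent.frame()) {
  for (nam in names(x)) {
    # x[[nam]] returns the data frame itself; x[nam] would return a
    # one-element list, and data.frame() would then prefix the column names
    assign(nam, x[[nam]], envir = envir)
  }
  invisible(names(x))
}

# Made-up example list
SubPop <- list(df1 = data.frame(Species1 = 0:1),
               df2 = data.frame(Species2 = 1:0))

target <- new.env()
extractDF(SubPop, envir = target)
colnames(get("df1", envir = target))
```

With the default envir = parent.frame(), the objects are created wherever the function is called from, which is the global environment when it is called at the top level.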
data
set.seed(22)
df1 <- as.data.frame(matrix(sample(0:1, 10*6, replace=TRUE), ncol=6,
dimnames=list(NULL, sample(paste0("Species", 1:10), 6, replace=FALSE))))
set.seed(35)
df2 <- as.data.frame(matrix(sample(0:1, 10*8, replace=TRUE), ncol=8,
dimnames=list(NULL, sample(paste0("Species", 1:10),8 , replace=FALSE))))
SubPop <- list(df1, df2)
names(SubPop) <- c('df1', 'df2')
dfname<-names(SubPop)
