How to modify a dataframe that can't be called directly? - r

I'm creating a function that creates several data frames automatically. How can I refer to those data frames in order to modify them?
For example, say I created a vector in which each item is meant to become a data frame, like so:
assign(paste0("d","f"),c("tree","fox","river"))
Then I take an item from the list and use it to name a dataframe.
assign(paste(get(paste0("d","f"))[1]),as.data.frame(c(1,2,3)))
so that now if I do:
get(paste(get(paste0("d","f"))[1]))
it returns a data frame with 1,2,3
Here's my problem: I want to be able to modify those items, with something like
get(paste(get(paste0("d","f"))[1]))[1] <- 4
#So that now if i do
get(paste(get(paste0("d","f"))[1]))
it returns a data frame with 4,2,3

It is better not to create multiple objects in the global environment. If they have already been created, load them into a list and do all the changes, transformations, mutations etc. in the list. Reading and writing is easier with a list than with separate objects floating around in the global environment.
lapply(mget(paste0("df", 1:3)), function(x) {x[[1]] <- 4; x})
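As a minimal sketch of the list-based approach, assuming three hypothetical names (tree, fox, river) and a single column called value (neither is fixed by the question), the data frames can live in one named list and be modified by name:

```r
# Hypothetical names; each one becomes an element of a single named list.
df_names <- c("tree", "fox", "river")

# Build one data frame per name and keep them together in a list.
dfs <- lapply(df_names, function(nm) data.frame(value = c(1, 2, 3)))
names(dfs) <- df_names

# Modify one cell of one data frame, addressed by a name held in a variable.
target <- df_names[1]
dfs[[target]][1, "value"] <- 4

dfs[[target]]$value
```

Note that x[[1]] <- 4 in the lapply call above replaces the entire first column of every data frame, whereas indexing with [1, "value"] changes a single cell.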

Related

Efficient way of extracting names of a large number of variables in R

It could be a very easy question, given that I am very unfamiliar with R. I know normally one can use deparse(substitute(.)) to extract the name of a variable. However, if I have a long list of variables (let's say it's built without names), how can I extract the name of each variable efficiently? I was thinking about using loops, but the deparse(substitute(.)) method would obviously generate the 'general' variable name we used to denote every item.
Sample code:
countries <- list(austria,belgium,czech,denmark,france,germany,italy,luxemberg,netherlands,poland,swiss)
Suppose I want countryNames to be equal to list("austria","belgium",...,"swiss"). How should I code that? I tried generating the list using countries <- list(countryA = countryA, countryB = countryB, ...), but it was extremely tedious, and in some cases I might only have an unnamed input list from elsewhere.
countries only holds the values of the individual objects (austria, belgium etc.), not their names. To access the names you need to create a named list when creating countries, which can be done like this:
countries <- list(austria = austria,belgium = belgium....)
However, if this is very tedious you can use tibble::lst which creates the names automatically without explicitly mentioning them.
countries <- tibble::lst(austria,belgium....)
In both cases you can access the names using names(countries).
If the country objects are the only ones loaded in the global environment, we can do this easily with ls and mget to return a named list of values
countries <- mget(ls())
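A small sketch of the mget approach, using two hypothetical stand-in objects with a made-up gdp column. Passing the names explicitly (instead of ls()) is an assumption made here so that unrelated objects in the workspace are not swept into the list:

```r
# Hypothetical stand-ins for the country objects.
austria <- data.frame(gdp = 1)
belgium <- data.frame(gdp = 2)

# mget() looks the objects up by name and returns a named list.
# envir = environment() means "look in the current environment".
countries <- mget(c("austria", "belgium"), envir = environment())

# The names come along for free.
countryNames <- names(countries)
countryNames
```

If the input list is already unnamed when it arrives from elsewhere, the original object names cannot be recovered from it; they have to be captured at the point where the list is built.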

Split subsets of a list to new variables

I have a list of data frames and I'm looking for a way to assign each subset (each data frame) to a new variable (with a dynamic name, corresponding to the name of the subset) that I could manipulate.
Is there a specific function for it?
Thank you
Here is how you can do this:
for(counter in seq_along(var_list)){
assign(paste0('var_', counter), var_list[[counter]])
}
However, as mentioned in the comments, it's not a good idea to pollute the environment.

Converting a list of data frames into individual data frames in R [duplicate]

I have been searching high and low for what I think is an easy solution.
I have a large data frame that I split by factors.
eqRegions <- split(eqDataAll, eqDataAll$SeismicRegion)
This now creates a list object of the data frames by region; there are 8 in total. I would like to loop through the list to make individual data frames using another name.
I can execute the following to convert the list items to individual data frames, but I am thinking that there is a loop mechanism that is fast if I have many factors.
testRegion1 <- eqRegions[[1]]
testRegion3 <- eqRegions[[3]]
I can manually perform the above and it handles it nicely, but if I have many regions it's not efficient. What I would like to do is the equivalent of the following:
for (i in 1:length(eqRegions)) {
region[i] <- as.data.frame(eqRegions[[i]])
}
I think the key is to define region before the loop, but it keeps overwriting itself instead of incrementing. Many thanks.
Try
list2env(eqRegions,envir=.GlobalEnv)
This should work. The name of the data.frames created will be equal to the names within eqDataAll$SeismicRegion. Anyways, this practice of populating individual data.frames is not recommended. The more I work with R, the more I love/use list.
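A sketch of what list2env does with the result of split, assuming a small made-up eqDataAll (the region labels and the magnitude column are hypothetical). A fresh environment is used here so the example does not touch the workspace; passing envir = .GlobalEnv instead creates the objects globally, as in the answer:

```r
# Hypothetical stand-in for eqDataAll.
eqDataAll <- data.frame(
  SeismicRegion = c("north", "south", "north", "east"),
  magnitude = c(4.1, 5.3, 3.8, 6.0)
)

eqRegions <- split(eqDataAll, eqDataAll$SeismicRegion)

# Copy each list element into an environment under its list name.
region_env <- list2env(eqRegions, envir = new.env())

ls(region_env)     # one object per region
region_env$north   # the rows for the "north" region
```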
lapply(names(eqRegions), function(x) assign(x, eqRegions[[x]], envir = .GlobalEnv))
edit: Use list2env solution posted. Was not aware of list2env function.
attach(eqRegions) should be enough. But I recommend working with them in list form using lapply. I guarantee it will result in simpler code.
list2env returns data frames to the global environment whose names are the names in the list. An alternative, if you want to have the same name for the data frames but identified by i from a loop:
for (i in 1:length(eqRegions)) {
assign(paste0("eqRegions", i), as.data.frame(eqRegions[[i]]))
}
This can be slow if the list gets too long.
As an alternative, a "best practice" when splitting data like this is to keep the data.frames within a list, as provided by split. To process it, you use either one of sapply or lapply (many factors) and capture the output back in a list. For instance:
eqRegionsProcessed <- lapply(eqRegions, function(df) {
## do something meaningful here
})
This obviously only works if you are doing the same thing to each data.frame.
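To make the pattern concrete, here is a sketch with a made-up eqDataAll and a hypothetical magnitude column. The function passed to lapply has to return the modified data frame as its last expression, otherwise the result is lost:

```r
# Hypothetical stand-in for eqDataAll.
eqDataAll <- data.frame(
  SeismicRegion = c("north", "south", "north", "east"),
  magnitude = c(4.0, 5.0, 3.0, 6.0)
)
eqRegions <- split(eqDataAll, eqDataAll$SeismicRegion)

# lapply: same transformation on every region, result is again a list.
eqRegionsProcessed <- lapply(eqRegions, function(df) {
  df$aboveMean <- df$magnitude > mean(df$magnitude)
  df  # return the modified data frame
})

# sapply: one summary value per region, simplified to a named vector.
meanMagnitude <- sapply(eqRegions, function(df) mean(df$magnitude))
meanMagnitude
```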
If you really must break them out and deal with each data.frame uniquely, then @MatthewPlourde's and @MaratTalipov's answers will work.

Performing column select over multiple dataframes

I have looked around a lot for this answer, they get close but no cigar. I am trying to perform a selection of columns over multiple dataframes. I can do this and return a list, but I wish to preserve the dataframes in the global environment. I want to keep the dataframes separate for ease of use and visibility in Rstudio. For example I am selecting columns based on their name as so, for one dataframe:
E07 <- E07[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
I have x amount of data frames listed in dflist so I have written this function:
columnselect<-function(df){df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")];df}
I then wish to apply this over the dflist as so:
lapply(X=dflist,FUN=columnselect)
This applies the function over dflist, but the data frames themselves remain unchanged. How do I apply the function over multiple data frames without getting them back in a list?
Many thanks
M
Your function returns the data frames unchanged because this is the last thing evaluated in your function. Instead of:
columnselect<-function(df){
df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
df}
It should be:
columnselect<-function(df){
df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
}
Having the last df in your function simply returned the full df that you passed in the function.
As for the second question that you would like to have the data.frames in the global environment rather than in the list (which is bad practice just so you know; it is always better to keep those in the list) you need the list2env function i.e.:
mylist <- lapply(X=dflist,FUN=columnselect)
list2env(mylist, envir = globalenv())
Using this the data.frames in the global environment will be updated.
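A sketch of the whole round trip, using two hypothetical data frames and a shortened column set. Note that list2env needs a named list, so dflist is built with names here. A fresh environment stands in for the workspace; with envir = globalenv(), as in the answer, the original E07 and E08 would be overwritten instead:

```r
# Hypothetical data frames and a shortened list of columns to keep.
keep <- c("Block", "Name")
E07 <- data.frame(Block = 1:2, Name = c("a", "b"), Extra = c(0.1, 0.2))
E08 <- data.frame(Block = 3:4, Name = c("c", "d"), Extra = c(0.3, 0.4))

# The list must be named: list2env uses the names as object names.
dflist <- list(E07 = E07, E08 = E08)

# The subset is the last (and only) expression, so it is what gets returned.
columnselect <- function(df) df[, keep, drop = FALSE]

mylist <- lapply(dflist, columnselect)

# Write the trimmed data frames out under their list names.
target_env <- list2env(mylist, envir = new.env())
names(target_env$E07)
```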

r create and address variable in for loop

I have multiple csv-files in one folder. I want to load each csv-file in this folder into one separate data frame. Next, I want to extract certain elements from this data frame into a matrix and calculate the mean of all these matrixes.
setwd("D:\\data")
group_1<-list.files()
a<-length(group_1)
mferg_mean<-data.frame
for(i in 1:a)
{
assign(paste0("mferg_",i),read.csv(group_1[i],header=FALSE,sep=";",quote="",dec=",",col.names=1:90))
}
As there are 11 csv-files in the folder I now have the data frames
mferg_1
to
mferg_11
How can I address each data frame in this loop? As mentioned, I want to extract certain elements from each data frame to a matrix. I would imagine it something like this:
assign(paste0("mferg_matrix_",i),mferg_i[1:5,1:10])
But this obviously does not work because R does not recognize mferg_i in the loop. How can I address this data frame?
This is probably not something you should be using assign for in the first place. Working with a bunch of different data.frames in R is a mess, but working with a list of data.frames is much easier. Try reading your data with
group_1<-list.files()
mferg <- lapply(group_1, function(filename) {
read.csv(filename,header=FALSE,sep=";",quote="",dec=",",col.names=1:90)
})
and you get each value with mferg[[1]], mferg[[2]], etc. And then you can create a list of extractions with
mferg_matrix <- lapply(mferg, function(x) x[1:5, 1:10])
This is the more R-like way to do things.
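The question also asks for the mean of all the extracted matrices. One possible sketch, assuming the extracted blocks are all numeric and have the same dimensions, sums them element-wise with Reduce and divides by their count. The in-memory data frames below are hypothetical stand-ins for the files read with read.csv:

```r
# Hypothetical stand-ins for the data frames read from the CSV files.
mferg <- list(
  data.frame(matrix(1, nrow = 6, ncol = 12)),
  data.frame(matrix(2, nrow = 6, ncol = 12)),
  data.frame(matrix(3, nrow = 6, ncol = 12))
)

# Extract the same block from each data frame as a numeric matrix.
mferg_matrix <- lapply(mferg, function(x) as.matrix(x[1:5, 1:10]))

# Element-wise mean: add the matrices up, then divide by how many there are.
mferg_mean <- Reduce(`+`, mferg_matrix) / length(mferg_matrix)

dim(mferg_mean)
```

Because everything stays in a list, the same code works unchanged for 11 files or for 100.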
But technically you can use get to retrieve values like you use assign to create them. For example
assign(paste0("mferg_matrix_",i),get(paste0("mferg_",i))[1:5,1:10])
but again, this is probably not a smart strategy in the long run.

Resources