Use variable to hold dataframe name - r

I have four dataframes, df1, df2, df3, and df4. They are all formatted identically.
I would like to be able to store the dataframe name in a variable, and access that dataframe later. I can do the below, but this just copies the entire dataframe. Is there a way to do this without copying the whole dataframe?
chart.df <- df1
plot(chart.df$x, chart.df$y)
Note that this is just an example. I would like to do other things aside from just plotting.

In some circumstances, you can store the names of the data.frames in a character vector and then use get() to access the objects. In my experience, @Joran's solution is more flexible, as you can loop (or apply) through the list items either by name or by position, depending on your application.
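As a minimal sketch of both ideas (looking an object up by its stored name with get(), and keeping the data frames in a named list), assuming two small made-up data frames and a hypothetical variable chart.name:

```r
# Two small example data frames (made-up data for illustration)
df1 <- data.frame(x = 1:3, y = c(2, 4, 6))
df2 <- data.frame(x = 4:6, y = c(8, 10, 12))

# Store only the name of the data frame, not the data itself
chart.name <- "df2"

# get() looks the object up by name at the point of use
chart.df <- get(chart.name)

# Alternative: keep the data frames in a named list and index it
dfs <- list(df1 = df1, df2 = df2)
by.name <- dfs[[chart.name]]  # access by name
by.pos  <- dfs[[2]]           # access by position
```

Note that R only duplicates a data frame when one of the copies is modified, so an assignment such as chart.df <- df1 is cheap on its own.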

Related

Renaming data frame columns using symbolic names from stored variables?

I am building a function that requires me to rename columns in a data frame, where the original column names are stored in a variable from user input.
#self$options$DV is a string identifying the column of interest, such as 'chosenvarname':
DVlabel <- self$options$DV
However, the plyr::rename function doesn't work when using this variable or its symbolic link:
df1 <-plyr::rename(df1,c(DVlabel='DV'))
df1 <-plyr::rename(df1,c(self$options$DV='DV'))
Even when DVlabel is set to equal a valid column name in the data frame, it still doesn't work.
It only works properly when using the actual column name, which makes me think that this function doesn't work with symbolic links:
df1 <-plyr::rename(df1,c(OriginalColumnName='DV'))
Is there another way to use the column name identified in self$options$DV as the basis for renaming that same column to something else?
Put differently, is there any way to rename a column using symbolic links to the column name that don't otherwise exist in the data?
Alternatively, is there some way to construct a column reference, such as data$var1, where the "var1" component is extracted from some other variable (e.g., DVlabel or self$options$DV)?
I found a way around this by using the following to rename the columns:
colnames(df1)[colnames(df1) == self$options$DV] <- 'DV'
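A short base R sketch of why the c(DVlabel='DV') form fails and how a column can be referenced through a variable. The data frame contents are made up, and DVlabel here is a plain string standing in for self$options$DV:

```r
# Made-up data; "chosenvarname" plays the role of the user-selected column
df1 <- data.frame(chosenvarname = 1:3, other = c("a", "b", "c"))
DVlabel <- "chosenvarname"  # hypothetical stand-in for self$options$DV

# c(DVlabel = "DV") uses the literal text "DVlabel" as the element name;
# the variable is never evaluated
literal <- c(DVlabel = "DV")

# setNames() evaluates the variable, giving a vector named "chosenvarname".
# A vector of this shape is what plyr::rename expects as its second argument.
lookup <- setNames("DV", DVlabel)

# [[ ]] accepts a string, so this is the variable-driven form of data$var1
values <- df1[[DVlabel]]

# Rename the column by matching on the stored name
names(df1)[names(df1) == DVlabel] <- "DV"
```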

How can I extract a single element from a list into a data frame?

I compiled a list of ~60 data frames to keep my RStudio environment tidy.
I will need to occasionally extract a single element into a data frame so that I can work on it before putting it back into the list - how can this extract be achieved?
I am aware that I can manipulate the list element directly, but that isn't ideal and being able to extract the data frame would serve me better for my needs.
If dflist is your list of dataframes, then the easiest way to work on element n would be something like
df <- dflist[[n]]
#...work on df...then
dflist[[n]] <- df

R approach for iterative querying

This is a question about general approach in R. I'm trying to find my way into the R language, but the data types and loop approaches (apply, sapply, etc.) are a bit unclear to me.
My goal:
Query data from an API using a config list with multiple parameters, and return the data as one aggregated data.frame.
First I want to define a list of multiple vectors (columns):
site        segment     id
google.com  Googleuser  123
bing.com    Binguser    456
How do I manage such a list of value groups (row by row)? data.frames are column-focused; you can't write a data.frame row by row in an R script. So the only way I found to define this initial config table is a CSV file, which is an approach I would rather avoid, but I can't find a more elegant way.
Now I want to query my data, let's say with this function:
query.data <- function(site, segment, id) {
  config <- define_request(site, segment, id)
  result <- query_api(config)
  return(result)
}
This will give me a data.frame as a result, and every query returns the same columns. So my result should be one big data.frame, not a list of similar data.frames.
Now sapply allows one parameter list plus multiple static parameters. mapply works, but it gives me my data in some crazy output that I can't handle or even understand.
In principle the list of data.frames is OK and the data is correct, but it feels cumbersome to me.
Which core concepts of R have I not understood yet? What would be the approach?
If you have an lapply/sapply solution that returns a list of dataframes with identical columns, you can easily get a single large dataframe with do.call(). do.call() passes the items of a list as the arguments to another function, allowing you to do things such as
big.df <- do.call(rbind, list.of.dfs)
which would append the component dataframes into a single large dataframe.
In general, do.call(rbind, something) is a good trick to keep in your back pocket when working with R, since often the most efficient way to do something will be some kind of apply function that leaves you with a list of elements when you really want a single matrix/vector/dataframe/etc.
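Putting the pieces together, here is a minimal sketch that defines the config table inline (no CSV), calls a query function once per row, and binds the results. The query.data body below is a hypothetical stand-in that fabricates a one-row data.frame instead of calling a real API, and the visits column is invented for illustration. Map() is used because it always returns a list, whereas mapply() simplifies its result by default, which is likely what produced the hard-to-read output described in the question.

```r
# Config table defined inline, one vector per parameter
config <- data.frame(
  site    = c("google.com", "bing.com"),
  segment = c("Googleuser", "Binguser"),
  id      = c(123, 456),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the real API call: returns a one-row data.frame
query.data <- function(site, segment, id) {
  data.frame(site = site, segment = segment, id = id,
             visits = id * 2,  # made-up value in place of real API data
             stringsAsFactors = FALSE)
}

# Map() calls query.data once per config row and returns a list of data.frames
list.of.dfs <- Map(query.data, config$site, config$segment, config$id)

# Bind the per-row results into one data.frame
big.df <- do.call(rbind, list.of.dfs)
rownames(big.df) <- NULL  # drop the row names inherited from the list names
```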

R naming convention/tricks for many columns in data.table / data.frame

I have a list of, say, n=10 data.tables (or data.frames).
Performing names(myList) returns the unique table names.
Performing names(myList[[i]]) (for i in 1:n) returns identical output for each value of i - i.e. each data.table has identical column names.
I need to merge all the data tables into one large data table, but would like to preserve the name of the list data.table for each column somehow, in order to keep an overview of where each column originated from.
Is there a trick to doing this, such as giving the columns keys? Or must one just prepend the table name to each of the columns in the final result? This would make the names pretty long in my case.
I want to avoid having to remember (or think about) which columns belong to which table. Just for comparison's sake, I'd like to run str(myBigTable) or summary(myBigTable) and see something like Excel shows here [but vertically displayed in R]:

Performing column select over multiple dataframes

I have looked around a lot for this answer; the ones I found get close, but no cigar. I am trying to perform a selection of columns over multiple dataframes. I can do this and return a list, but I wish to preserve the dataframes in the global environment. I want to keep the dataframes separate for ease of use and visibility in RStudio. For example, I am selecting columns based on their names like so, for one dataframe:
E07 <- E07[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
I have x data frames listed in dflist, so I have written this function:
columnselect<-function(df){df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")];df}
I then wish to apply this over the dflist as so:
lapply(X=dflist,FUN=columnselect)
This applies the function over dflist, but the data frames remain unchanged. How do I apply the function over multiple dataframes without returning them in a list?
Many thanks
M
Your function returns the data frames unchanged because df is the last expression evaluated in your function, and a function returns the value of its last expression. Instead of:
columnselect<-function(df){
df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
df}
It should be:
columnselect<-function(df){
df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
}
The trailing df in your function simply returned the full data frame that you passed in, discarding the subset computed on the line before it.
As for your second question, having the data.frames in the global environment rather than in the list (which is bad practice, just so you know; it is always better to keep them in the list): for that you need the list2env function, which requires the list elements to be named, i.e.:
mylist <- lapply(X=dflist,FUN=columnselect)
list2env(mylist, envir = globalenv())
Using this, the data.frames in the global environment will be updated.
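The whole round trip can be sketched as follows, assuming two made-up data frames with one surplus column each. E08, the Extra column, the keep vector and the make.df helper are hypothetical names introduced only for this example.

```r
# Columns to keep, as in the question
keep <- c("Block", "Name", "F635.Mean", "F532.Mean", "B635.Mean", "B532")

# Hypothetical helper that builds a small data frame with one extra column
make.df <- function() {
  data.frame(Block = 1:2, Name = c("a", "b"),
             F635.Mean = c(10, 20), F532.Mean = c(11, 21),
             B635.Mean = c(1, 2), B532 = c(3, 4),
             Extra = c(TRUE, FALSE))
}
E07 <- make.df()
E08 <- make.df()

# The list must be named: list2env() uses the names as object names
dflist <- list(E07 = E07, E08 = E08)

# The subset is the last (and only) expression, so it is what gets returned
columnselect <- function(df) df[, keep]

# lapply() keeps the list names, so each result maps back to its data frame
mylist <- lapply(dflist, columnselect)

# Overwrite the objects of the same names in the global environment
invisible(list2env(mylist, envir = globalenv()))
```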

Resources