create many subsets at once - r

I have a large dataset based on some medical records so I cannot post a sample due to privacy restrictions, but I am trying to subset the one data frame into many. The goal is for each unique facility to be its own data frame so I can identify efficiency rates for each facility. I have tried the following code where df is the name of the data frame, Name is the name I will give to the subset, Location is the value of interest from the variable "Facility" from the original dataframe:
ratefunct <- function(df, Name, Facility) {
  Name <- subset(df, Facility == "Location")
  Name <- within(Name, {rate <- cumsum(Complete) / cumsum(Complete + Incomplete)})
}
but I don't seem to get any results in my environment.

Based on your comment, it sounds like you're trying to store the results of split as separate data frames.
You can do so like this, using assign:
dfL <- split(iris, iris$Species)
for (i in seq_along(dfL)) {
  # dfL[[i]] extracts the data frame itself; dfL[i] would be a one-element list
  assign(paste0("df_", names(dfL)[i]), dfL[[i]])
  # added the print line so you can see the names of the objects that are created
  print(paste0("df_", names(dfL)[i]))
}
[1] "df_setosa"
[1] "df_versicolor"
[1] "df_virginica"
This creates the data frames df_setosa, df_versicolor, and df_virginica.
Alternatively, if you're happy with the current object names, you could simply use:
list2env(dfL,envir=.GlobalEnv)
Which will save each list item as an object, using the object's name in the list. So instead of having the df_ prefix, you would just have setosa, virginica, and versicolor objects.
Edit: as a simpler way to assign custom names to each created object, directly specifying the names of dfL is a nice clean solution:
names(dfL) <- paste0("df_",names(dfL))
list2env(dfL,envir=.GlobalEnv)
This way you avoid having to write the for loop, and still get object names with a useful prefix.
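If the goal is a per-facility calculation rather than separate objects in the workspace, the list returned by split can also be processed in place. The following is only a minimal sketch: the data frame `records` and its values are made up, and only the column names Facility, Complete, and Incomplete are taken from the question.

```r
# Hypothetical stand-in for the private records (values are invented)
records <- data.frame(
  Facility   = c("A", "A", "B", "B"),
  Complete   = c(3, 5, 2, 6),
  Incomplete = c(1, 1, 2, 2)
)

# One data frame per facility, kept together in a named list
by_facility <- split(records, records$Facility)

# Add the cumulative completion rate to every subset
by_facility <- lapply(by_facility, function(d) {
  d$rate <- cumsum(d$Complete) / cumsum(d$Complete + d$Incomplete)
  d
})

# Inspect the result for one facility
by_facility$A
```

Each element can then be reached by name (for example by_facility$A), or the whole list can be passed to list2env as shown above if separate objects are still wanted.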

Related

How to dynamically create and name data frames in a for loop

I am trying to generate data frame subsets for each respondent in a data frame using a for loop.
I have a large data frame with columns titled "StandardCorrect", "NameProper", "StartTime", "EndTime", "AScore", and "StandardScore" and several thousand rows.
I want to make a subset data frame for each person's name so I can generate statistics for each respondent.
I tried using a for loop
for(name in 1:length(NamesList)){ name <- DigiNONA[DigiNONA$NameProper == NamesList[name], ] }
NamesList is just a list containing all the levels of NameProper (which is a factor variable).
All I want the loop to do on each iteration is generate a new data frame named after "NamesList[name]", containing the subset of the main data frame where NameProper matches the name in the list for that iteration.
This seems like it should be simple, but I can't figure out how to get R to dynamically generate data frames with a different name on each iteration.
Any advice would be appreciated, thank you.
The advice to use assign for this purpose is technically feasible, but incorrect in the sense that it is widely deprecated by experienced users of R. Instead what should be done is to create a single list with named elements each of which contains the data from a single individual. That way you don't need to keep a separate data object with the names of the resulting objects for later access.
named_Dlist <- setNames( split( DigiNONA, DigiNONA$NameProper),
NamesList)
This would allow you to access individual dataframes within the named_Dlist object:
named_Dlist[[ NamesList[1] ]] # The dataframe with the first person in that NamesList vector.
It's probably better to use the term list only for true R lists and not for atomic character vectors.
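With the data held in a named list, per-respondent statistics can be computed in one pass. A minimal sketch, assuming a made-up DigiNONA that contains only two of the columns mentioned in the question:

```r
# Invented example data; only the column names come from the question
DigiNONA <- data.frame(
  NameProper = factor(c("Ann", "Ann", "Bob")),
  AScore     = c(10, 20, 30)
)

# split() already names each element after the factor level
named_Dlist <- split(DigiNONA, DigiNONA$NameProper)

# One summary value per respondent, returned as a named numeric vector
mean_scores <- sapply(named_Dlist, function(d) mean(d$AScore))
mean_scores
```

Because the result is a named vector, mean_scores[["Ann"]] retrieves a single respondent's value without any dynamically named objects.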

Is there a way to extract a data frame from a list, and assign the data frame to an object with a dynamic name?

I have a list containing many named data frames. I am trying to find a way to extract each data frame from this list. Ultimately, the goal is to assign each data frame in the list to an object according to the name that it has in the list, allowing me to reference the data frames directly instead of through the list (eg. dataframe instead of LIST[[dataframe]])
Here is an example similar to what I am working with.
library(googlesheets4)
install.packages("dplyr")
library(dplyr)
gs4_deauth()
TABLES <- list("Test1", "Test2")
readTable <- function(TABLES){
TABLES <- range_read(as_sheets_id("SHEET ID"),sheet = TABLES)
TABLES <-as.data.frame(TABLES)
TABLES <- TABLES %>%
transmute(Column1= as.character(Column1), Column2 = as.character(Column2 ))
return(TABLES)}
LIST <- lapply(TABLES, readTable)
names(LIST) <- TABLES
I know that this could be done manually, but I'm trying to find a way to automate this process. Any advice would be helpful. Thanks in advance.
If named_dfs is a named list where each element is a dataframe you can use the assign function to achieve your goal.
Map(assign, names(named_dfs), named_dfs, pos = 1)
For each name, it assigns (equivalent to <- operator) the corresponding dataframe object.
Map(function(x, y) assign(x, y, envir = globalenv()), names(named_dfs), named_dfs)
Should also work.
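A small self-contained check of this approach, using a made-up named_dfs. The separate environment `target` is an assumption made here so that the sketch does not clutter the global workspace; replace it with globalenv() to get the behaviour described above.

```r
# Invented list of data frames standing in for the sheets read in the question
named_dfs <- list(
  Test1 = data.frame(Column1 = "a", Column2 = "b"),
  Test2 = data.frame(Column1 = "c", Column2 = "d")
)

# Hypothetical target environment for the new objects
target <- new.env()

# Assign each data frame to an object named after its list element
invisible(Map(function(x, y) assign(x, y, envir = target),
              names(named_dfs), named_dfs))

# Names of the objects that were created
ls(target)
```

list2env(named_dfs, envir = target) would have the same effect in a single call.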

How can I create a simple dataframe from nested, JSON format API content

Using a JSON format file pulled from the SeatGeek API, I'd like to convert the data into a data frame. I've managed to create a frame with all variables + data using the function below:
library(httr)
library(jsonlite)
vpg <- GET("https://api.seatgeek.com/2/venues?country=US&per_page=5000&page=1&client_id=NTM2MzE3fDE1NzM4NTExMTAuNzU&client_secret=77264dfa5a0bc99095279fa7b01c223ff994437433c214c8b9a08e6de10fddd6")
vpgc <- content(vpg)
vpgcv <- (vpgc$venues)
json_file <- sapply(vpgcv, function(x) {
x[sapply(x, is.null)] <- NA
unlist(x)
as.data.frame(t(x))
})
From this point, I can create a data frame using:
venues.dataframe <- as.data.frame(t(json_file), flatten = TRUE)
But my resulting data is a data frame with the correct number of 23 variables and 5000 rows, but each entry is a list rather than just a value. How can I pull the value out of each list?
I've also attempted to pull the values out using data tables in the following code:
library(data.table)
data.table::rbindlist(json_file, fill= TRUE)
But the output data frame flows almost diagonally, placing 1 stored variable + 22 NULL values per row. While all the data exists here, Rows 1-23 (and 24-46, and so on) should be a single row.
Of these two dead ends, which is the easiest/cleanest solution to produce my desired data frame output of [5000 observations, in simple value form of 23 variables]?
Your URL points directly at the JSON file, so there is no need for the GET function. The jsonlite library can handle the download directly.
library(jsonlite)
output<-fromJSON("https://api.seatgeek.com/2/venues?country=US&per_page=5000&page=1&client_id=NTM2MzE3fDE1NzM4NTExMTAuNzU&client_secret=77264dfa5a0bc99095279fa7b01c223ff994437433c214c8b9a08e6de10fddd6")
df<-output$venues
flatdf<-flatten(df)
#remove first column of empty lists
flatdf<-flatdf[,-1]
The variable "output" is a list built from the JSON object, and it contains data frames. Use "$" to retrieve the part of interest.
df does have some embedded data frames; to flatten them, use the flatten function from the jsonlite package.
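To see what flatten does without calling the API, here is a minimal sketch on a small inline JSON string. The field names and values are invented for illustration and are not the real SeatGeek schema.

```r
library(jsonlite)

# Invented two-venue payload with one nested object per venue
txt <- '{"venues":[
  {"name":"A","location":{"lat":1.5,"lon":2.5}},
  {"name":"B","location":{"lat":3.5,"lon":4.5}}
]}'

output <- fromJSON(txt)
df <- output$venues

# "location" is a data frame nested inside df
str(df)

# flatten() promotes the nested columns to top-level columns
flatdf <- jsonlite::flatten(df)
names(flatdf)
```

The nested columns come out with dotted names built from the parent and child field names, so every cell holds a simple value rather than a list.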

How to convert all factor variables into numeric variables (in multiple data frames at once)?

I have n data frames, each corresponding to data from a city.
There are 3 variables per data frame and currently they are all factor variables.
I want to transform all of them into numeric variables.
I have started by creating a vector with the names of all the data frames in order to use in a for loop.
cities <- as.vector(objects())
for ( i in cities){
i <- as.data.frame(lapply(i, function(x) as.numeric(levels(x))[x]))
}
Although the code runs without any error, I don't see any changes to my data frames: all three variables remain factor variables.
The strangest thing is that when doing them one by one (as below) it works:
df <- as.data.frame(lapply(df, function(x) as.numeric(levels(x))[x]))
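What you're essentially trying to do is convert each column to a numeric type if it is a factor. One approach using purrr, which needs the data frames themselves rather than their names (hence mget), would be:
library(purrr)
map(mget(cities), ~ modify_if(.x, is.factor, function(col) as.numeric(as.character(col))))
Note that modify() is like lapply() but it doesn't change the underlying data structure of the objects you are modifying (in this case, dataframes). modify_if() simply takes a predicate as an additional argument. The conversion goes through as.character() because as.numeric() applied directly to a factor returns its internal integer codes. The call returns a named list of converted data frames; it does not modify the originals.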
For anyone who's interested in my question, I worked out the answer:
for ( i in cities){
assign(i, as.data.frame(lapply(get(i), function(x) as.numeric(levels(x))[x])))
}
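The as.numeric(levels(x))[x] idiom matters because as.numeric() applied directly to a factor returns the internal integer codes, not the printed values. A minimal base R sketch with a made-up list of data frames (the names city_dfs, paris, and rome are illustrative); keeping the data frames in a list also removes the need for get and assign:

```r
# Invented data: one factor column per city
city_dfs <- list(
  paris = data.frame(temp = factor(c("10.5", "20", "10.5"))),
  rome  = data.frame(temp = factor(c("30", "25")))
)

# Direct conversion yields the level codes, not the temperatures
codes <- as.numeric(city_dfs$paris$temp)

# Convert every factor column in every data frame via its levels
city_dfs <- lapply(city_dfs, function(d) {
  d[] <- lapply(d, function(x) {
    if (is.factor(x)) as.numeric(levels(x))[x] else x
  })
  d
})

city_dfs$paris$temp
```

Assigning into d[] keeps each element a data frame while replacing its columns.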

Storing dataframes in a list

I'm trying to store a bunch of dataframes in a list, and each of these dataframes has column names that are important (they are stock names, which are different for each dataframe).
I'm storing them in a list because this way it can be done with a foreach loop, which will allow me to run this beforehand, then use the list as a database of information.
right now I have:
Y.matrices <- foreach(i = (1:600)) %dopar% {
df = data.frame(data)
return(df)
}
The issue with this is once I store them, I'm not sure how to get the data frames back. If I do:
unlist(Y.matrices[1])
I get a long numeric vector that has lost the column names. Is there some other way to store these data frames (ie, perhaps not in a list) that would enable me to preserve the formats?
Thanks!
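To access one individual dataframe, use Y.matrices[[#]], where # is the index of the dataframe you want. Double brackets return the dataframe itself, whereas Y.matrices[1] returns a one-element list, which is why unlist flattened it. If the result needs to be one merged dataframe containing all 600 dataframes, you can use: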
library(dplyr)
df1 <- bind_rows(Y.matrices, .id = "df")
The .id fills in the number of the data.frame, or if they are named in the list, the name of the dataframe.
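The difference between single and double brackets can be checked on a small example. A minimal sketch with a made-up two-element list (the stock tickers are placeholders):

```r
# Invented stand-in for the list produced by foreach
Y.matrices <- list(
  data.frame(AAPL = c(1, 2), MSFT = c(3, 4)),
  data.frame(GOOG = c(5, 6))
)

# Single brackets keep the list wrapper
class(Y.matrices[1])

# Double brackets return the stored data frame, column names intact
class(Y.matrices[[1]])
colnames(Y.matrices[[1]])
```

Storing the data frames in a list therefore loses nothing; only the extraction step needs to use [[ ]].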
