Using a loop to create multiple data frames in R

I have a function that pulls JSON data from the NBA stats website. It takes the game ID of a given game and returns a data frame of the halftime box score for that game.
library(RJSONIO)

getstats <- function(game) {
  for (i in game) {
    url <- paste0("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10",
                  "&EndRange=14400&GameID=", i,
                  "&RangeType=2&Season=2015-16&SeasonType=Regular+Season",
                  "&StartPeriod=1&StartRange=0000")
    json_data <- fromJSON(paste(readLines(url), collapse = ""))
    df <- data.frame(json_data$resultSets[1, "rowSet"])
    names(df) <- unlist(json_data$resultSets[1, "headers"])
  }
  return(df)
}
So what I would like to do with this function is take a vector of several game IDs and create a separate data frame for each one. For example:
gameids<- as.character(c(0021500580:0021500593))
I want to take the vector "gameids" and create fourteen data frames, one per game. If anyone knows how I would go about doing this, it would be greatly appreciated! Thanks!

You can save your data.frames into a list by setting up the function as follows:
getstats <- function(games) {
  listofdfs <- list()            # Create a list in which you intend to save your df's
  for (i in 1:length(games)) {   # Loop through the indices of the IDs instead of the IDs
    # Use games[i] instead of i to get the ID
    url <- paste0("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10",
                  "&EndRange=14400&GameID=", games[i],
                  "&RangeType=2&Season=2015-16&SeasonType=Regular+Season",
                  "&StartPeriod=1&StartRange=0000")
    json_data <- fromJSON(paste(readLines(url), collapse = ""))
    df <- data.frame(json_data$resultSets[1, "rowSet"])
    names(df) <- unlist(json_data$resultSets[1, "headers"])
    listofdfs[[i]] <- df         # Save each data frame into the list
  }
  return(listofdfs)              # Return the list of data frames
}
gameids<- as.character(c(0021500580:0021500593))
getstats(games = gameids)
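As a small usage note (not part of the original answer), you can assign the returned list to a variable and name its elements by the game IDs, which makes it easy to pull out a single game's box score later:

all_games <- getstats(games = gameids)
names(all_games) <- gameids      # label each element with its game ID
first_game <- all_games[[1]]     # box score data frame for the first game in the vector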
Please note that I could not test this because the URLs do not seem to be working properly. I get the connection error below:
Error in file(con, "r") : cannot open the connection

Adding to Abdou's answer, you could create dynamic data frames to hold the results for each game ID using the assign() function.
for (i in 1:length(games)) {   # Loop through the indices of the IDs instead of the IDs
  # Use games[i] instead of i to get the ID
  url <- paste0("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10",
                "&EndRange=14400&GameID=", games[i],
                "&RangeType=2&Season=2015-16&SeasonType=Regular+Season",
                "&StartPeriod=1&StartRange=0000")
  json_data <- fromJSON(paste(readLines(url), collapse = ""))
  df <- data.frame(json_data$resultSets[1, "rowSet"])
  names(df) <- unlist(json_data$resultSets[1, "headers"])
  # Create a separate data frame in the global environment to hold each result
  assign(paste('X', i, sep = ''), df)
}
The assign() function will create as many data frames as there are game IDs. They will be labelled X1, X2, X3, ..., Xn. Hope this helps.
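If you later want to gather those dynamically created X1 ... Xn data frames back into a single list, a minimal sketch (not part of the original answer) could use mget():

# Assumes X1 ... Xn were created by assign() in the global environment
collected <- mget(paste0("X", 1:length(games)), envir = .GlobalEnv)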

Use lapply (or sapply) to apply a function over a list (or vector) and get the results back as a list. So if you have a vector of several game IDs and a function that does what you want, you can use lapply to get a list of data frames (since your function returns a df).
I haven't been able to test your code (I got an error with the function you provided), but something like this should work:
library(RJSONIO)

getstats <- function(game) {
  url <- paste0("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=14400&GameID=",
                game,
                "&RangeType=2&Season=2015-16&SeasonType=Regular+Season&StartPeriod=1&StartRange=0000")
  json_data <- fromJSON(paste(readLines(url), collapse = ""))
  df <- data.frame(json_data$resultSets[1, "rowSet"])
  names(df) <- unlist(json_data$resultSets[1, "headers"])
  return(df)
}

gameids <- as.character(c(0021500580:0021500593))
df_list <- lapply(gameids, getstats)
df_list will contain one data frame per ID you provided in gameids.
Just use lapply again for additional data processing, including saving the data frames to disk.
data.table is a nice package if you have to deal with a ton of data. In particular, rbindlist lets you rbind all the data.tables (or data.frames) contained in a list into a single one if needed (split will do the reverse).
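For instance, a minimal sketch (not from the original answer) that collapses df_list into one big table with data.table:

library(data.table)
# Stack all per-game data frames into a single table; idcol records which game each row came from
all_games_dt <- rbindlist(df_list, idcol = "game_index")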

Related

R create a loop for a list of data frames

I currently have a list made up of around 80+ data frames. What I would like to do is loop a chunk of code over each individual data frame within the list, without naming each one individually or splitting them into separate data frames to work on.
Currently I split the list into each individual data frame using the below code:
dat5split <- setNames(split(dat5, dat5$CODE), paste0("df", unique(dat5$CODE)))
list2env(dat5split, globalenv())
I then work through each data frame individually:
# call in SPC function and write to 'results10000'
results10000 <- SPC_XBAR(df10000, vol_n, seasonality)
results10000 <- results10000 %>%
  cbind(Spec = df10000$CODE) %>%
  subset(`table_n` == 1)
results10000 <- results10000[order(results10000$tpd), ]
results10000$Date <- as.Date(cbind(Date = df10000$CENSUS_DATE))
# call in SPC function and write to 'results10001'
results10001 <- SPC_XBAR(df10001, vol_n, seasonality)
results10001 <- results10001 %>%
  cbind(Spec = df10001$CODE) %>%
  subset(`table_n` == 1)
results10001 <- results10001[order(results10001$tpd), ]
results10001$Date <- as.Date(cbind(Date = df10001$CENSUS_DATE))
Currently I call the function 'SPC_XBAR', where vol_n and seasonality are set earlier in the code. The script passes the values to the function, which assigns the results to 'results10000', 'results10001', etc. I then do a small bit of data wrangling on each newly created data frame before feeding the results back into SQL Server at the end.
As you can see each one is being individually hard coded which is not efficient.
What I would like to do is to loop a chunk of code for each individual data frame within the list, without naming each one individually.
I believe a loop would solve this issue, but I am a little inexperienced when it comes to writing one. Any advice would be much appreciated.
Cheers
Have you considered using lapply instead of a loop to go through the list? Check it here...
EDIT: I'll try to elaborate a bit more. What happens if you do this:
myFunction <- function(x) {
  results <- SPC_XBAR(x, vol_n, seasonality)
  results <- results %>%
    cbind(Spec = x$CODE) %>%
    subset(`table_n` == 1)
  results <- results[order(results$tpd), ]
  results$Date <- as.Date(cbind(Date = x$CENSUS_DATE))
  results
}
lapply(dat5split, myFunction)
I would expect it to return a list of the resulting datasets.
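If you then want one combined table rather than a list, a small follow-up sketch (not part of the original answer) binds the pieces in base R:

# Capture the list returned by lapply, then stack it into one data frame
results_list <- lapply(dat5split, myFunction)
combined <- do.call(rbind, results_list)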

R: Doing the same steps on many data frames with their names stored in a vector

I have several .RData files, each of which has letters and numbers in its name, e.g. m22.RData. Each of these contains a single data.frame object with the same name as the file, e.g. m22.RData contains a data.frame object named "m22".
I can generate the file names easily enough with something like datanames <- paste0(c("m","n"),seq(1,100)) and then use load() on those, which will leave me with a few hundred data.frame objects named m1, m2, etc. What I am not sure of is how to do the next step -- prepare and merge each of these dataframes without having to type out all their names.
I can make a function that accepts a data frame as input and does all the processing. But if I pass it datanames[22] as input, I am passing it the string "m22", not the data frame object named m22.
My end goal is to repeatedly do the same steps on a bunch of different data frames without manually typing out "prepdata(m1) prepdata(m2) ... prepdata(n100)". I can think of two ways to do it, but I don't know how to implement either of them:
Get from a vector of the names of the data frames to a list containing the actual data frames.
Modify my "prepdata" function so that it can accept the name of the data frame, but then still somehow be able to do things to the data frame itself (possibly by way of "assign"? But the last step of the function will be to merge the prepared data to a bigger data frame, and I'm not sure if there's a method that uses "assign" that can do that...)
Can anybody advise on how to implement either of the above methods, or another way to make this work?
See this answer and the corresponding R FAQ
Basically:
temp1 <- c(1,2,3)
save(temp1, file = "temp1.RData")
x <- c()
x[1] <- load("temp1.RData")
get(x[1])
#> [1] 1 2 3
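Extending that idea (a sketch that assumes datanames matches the .RData file names, not part of the linked answer), you can load every file, collect the objects into a named list with mget(), and then lapply() your prepdata over it:

# Load each file (the objects land in the global environment), then gather them into a list
invisible(lapply(paste0(datanames, ".RData"), load, envir = .GlobalEnv))
df_list <- mget(datanames, envir = .GlobalEnv)
prepared <- lapply(df_list, prepdata)   # prepdata is the question's own function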
Assuming all your data exists in the same folder, you can create an R object with all the paths, then create a function that takes a path to an RData file, reads it, and calls "prepdata". Finally, using the purrr package you can apply the same function to an input vector.
Something like this should work:
library(purrr)

rdata_paths <- list.files(path = "path/to/your/files", full.names = TRUE)

read_rdata <- function(path) {
  # load() returns the *name* of the loaded object, so use get() to fetch the object itself
  data <- get(load(path))
  return(data)
}

prepdata <- function(data) {
  ### your prepdata implementation
}

master_function <- function(path) {
  data <- read_rdata(path)
  result <- prepdata(data)
  return(result)
}

merged_rdatas <- map_df(rdata_paths, master_function) # This creates one dataset, merging all together

Using a loop to cycle though API calls in R

I have created a simple script to collect a YouTube channel's statistics. I'm just wondering how I could loop through a list of channel IDs instead of having to manually change the channel ID each time and re-run the script? I struggle to understand how to write loops in R.
library(jsonlite)

key <- 'MyKey'
channel_id1 <- 'UCLSWNf28X3mVTxTT3_nLCcw'
url <- 'https://www.googleapis.com/youtube/v3/channels?part=statistics'
y <- paste0(url, '&id=', channel_id1, '&key=', key)
yt_channel1 <- fromJSON(txt = y)
yt_d_channel1 <- as.data.frame(do.call(c, unlist(yt_channel1, recursive = FALSE)))
Is there any way to store all the channel IDs of interest in a list or vector and then loop through them, storing the results into a new (or the same) data frame?
i.e.
channels <- c('UCLSWNf28X3mVTxTT3_nLCcw', 'UCLSW467236VTxTT3_nLCcw', 'UHJKHS328787_ndncp')
for (i in 1:3) {
  # channels[i] ...
  # do stuff
}
Any help is greatly appreciated.
Yes, store the channel IDs in a column in a data frame. Assuming you have a data frame called my_data_frame with a column ID that contains the IDs, you can loop through the IDs like this:
key <- 'MyKey'
url <- 'https://www.googleapis.com/youtube/v3/channels?part=statistics'
for (i in 1:nrow(my_data_frame)) {
  y <- paste0(url, '&id=', my_data_frame$ID[i], '&key=', key)
  yt_channel1 <- fromJSON(txt = y)
  yt_d_channel1 <- as.data.frame(do.call(c, unlist(yt_channel1, recursive = FALSE)))
}
Notice how the ID is referenced using an index i which will count from 1 until the number of rows in your data frame.
Note that this code will not work as-is: you will still need to come up with a way to store the results from each iteration (as written, yt_d_channel1 is overwritten on every pass through the loop).
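One possible way to do that (a sketch, not part of the original answer) is to accumulate each channel's statistics in a list and bind them at the end, assuming every call returns the same columns:

results <- vector("list", nrow(my_data_frame))
for (i in 1:nrow(my_data_frame)) {
  y <- paste0(url, '&id=', my_data_frame$ID[i], '&key=', key)
  yt_channel <- fromJSON(txt = y)
  results[[i]] <- as.data.frame(do.call(c, unlist(yt_channel, recursive = FALSE)))
}
all_channels <- do.call(rbind, results)  # one row of statistics per channel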

Creating a data frame using a for loop for string data in R

I have a CSV file with the following data.
I want to put this data in a data frame called "dfSubClass".
After that, I will find the unique subject list as "uniquesubject" and the unique class list as "uniqueclass" from "dfSubClass".
Using "uniquesubject", "uniqueclass" and a for loop, I want to create all subject and class combinations, as in the expected data.
I tried the following but it's not working.
dfSubClass <- read.csv("SubjectClass.csv", header = TRUE)
uniquesubject = unique(planningItems["Subject"])
uniqueclass = unique(planningItems["Class"])
newDF <- data.frame()
for (Subject in 1:nrow(uniquesubject)) {
  for (Class in 1:nrow(uniqueclass)) {
    newDF = rbind(newDF, c(uniquesubject[Subject, ], uniqueclass[Class, ]))
  }
}
This is not giving me the desired output. Please help.
I would suggest using the function expand.grid, which will automatically generate all the combinations.
Also, in your code, unique(planningItems["Subject"]) returns a data frame, which is not ideal for this case; a vector would be better.
Here is my code:
uniquesubject = unique(dfSubClass$Subject)
uniqueclass = unique(dfSubClass$Class)
newDF = expand.grid(uniquesubject, uniqueclass)
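As a small follow-up (not part of the original answer): expand.grid names its columns Var1 and Var2 by default, so you may want to rename them to match the expected output:

names(newDF) <- c("Subject", "Class")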
If using for loops, the main issue in your code is about the rbind function. Here is my code:
uniquesubject = unique(dfSubClass$Subject)
uniqueclass = unique(dfSubClass$Class)
newDF = data.frame()
for (Subject in 1:length(uniquesubject)) {
  for (Class in 1:length(uniqueclass)) {
    newDF = rbind(newDF, data.frame("Subject" = uniquesubject[Subject], "Class" = uniqueclass[Class]))
  }
}
I think the main difference from your code is that I created a data frame inside rbind() instead of creating a vector using c(). This makes sure the result keeps a data frame structure instead of becoming a matrix.

R Apply function to list and create new dataframe

I am wanting to retrieve data from several webpages that is in the same place on all the pages and put it all in one data frame.
I have the following code attempt:
library(XML)
library(plyr)
## the urls
raceyears <- list(url2013, url2012, url2011)

## function that is not producing what I want
raceyearfunction <- function(x) {
  page <- readLines(x)
  stats <- page[10:19]
  y <- read.table(textConnection(stats))
  run <- data.frame(y$V1, y$V2)
  colnames(run) <- c("Country", "Participants")
  rbind.fill(run)
}
data <- llply(raceyears, raceyearfunction)
This places all the data in multiple columns (two columns for each webpage), but I want all the data in just two columns (Country, Participants) in a single data frame, not many columns in one data frame.
I haven't found a question quite like this already on the site but am open to follow a link. Thank you in advance.
You need to use rbindlist (or a similar binding step) outside of raceyearfunction. Let it return(run) without rbind.fill(run).
You can use ldply instead of llply; it will then return an already-bound data.frame.
library(XML)
library(plyr)

raceyears <- list(url2013, url2012, url2011)

raceyearfunction <- function(x) {
  page <- readLines(x)
  stats <- page[10:19]
  y <- read.table(textConnection(stats))
  run <- data.frame(y$V1, y$V2)
  colnames(run) <- c("Country", "Participants")
  return(run)
}

data <- ldply(raceyears, raceyearfunction)
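Equivalently, in base R (a sketch, not part of the original answer), you could lapply over the list and bind the pieces yourself:

results <- lapply(raceyears, raceyearfunction)
data <- do.call(rbind, results)   # single data frame with Country and Participants columns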
