Create empty data tables in R - r

Let's say I need to create empty data tables with the following names (each one should be created only if it doesn't already exist in my environment):
# names of datatables that should be created
dt_list <- c("results_1",
"results_2",
"final_results",
"model_results")
That is, I need to get empty (no columns) data tables: results_1, results_2, final_results, model_results (in reality, I have a much longer list of data tables that should be created if they don't exist).
I read the thread but didn't find a suitable solution.
I tried something like this, but it doesn't work:
# create an empty data.table if not exists
for(dt in 1:length(dt_list)){
if(!exists(dt_list[dt])){
dt_list[dt] <- data.table()
}
}
Error in dt_list[dt] <- data.table() : replacement has length zero
I would be grateful for any help!

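The loop in the question tries to store a data.table inside the character vector dt_list, which is what triggers the error. Use assign() instead, which creates a variable whose name is given as a string:
library(data.table)
# create an empty data.table if not exists
for (dt in seq_along(dt_list)) {
  if (!exists(dt_list[dt])) {
    assign(dt_list[dt], data.table())
  }
}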
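As a minimal sketch of the same idea, the loop can iterate over the names directly, and a single named list is a common alternative to many separate variables. The list name `tables` is an assumption introduced here for illustration; `inherits = FALSE` limits the existence check to the current environment so that objects of the same name in attached packages are not counted.

```r
library(data.table)

dt_list <- c("results_1", "results_2", "final_results", "model_results")

# Option 1: one variable per name, created only when it is missing.
# inherits = FALSE checks the current environment only.
for (nm in dt_list) {
  if (!exists(nm, inherits = FALSE)) {
    assign(nm, data.table())
  }
}

# Option 2 (hypothetical name `tables`): keep all tables in one named list.
tables <- setNames(lapply(dt_list, function(nm) data.table()), dt_list)
```

With the list, a table is reached as tables[["results_1"]], and looping over all of them does not need get() or assign().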

Related

Looping through variables to produce tables of percentages

I am very new to R and would appreciate any advice. I come from a Stata background, so I am still learning to think in R. I am trying to produce tables of percentages for my 20 binary variables. I have tried a for loop, but I am not sure where I am going wrong, as there is no warning message.
for (i in 1:ncol(MAAS1r[varbinary])) {
varprop<- varbinary[i]
my.table<-table(MAAS1r[varprop])
my.prop<-prop.table(my.table)
cbind(my.table, my.prop)
}
Many thanks
I made an example with two columns extracted from mtcars.
These are two binary variables (0 or 1), called vs and am:
mtcarsBivar <- mtcars[, c(8, 9)]
Get the names of the columns:
varbinary <- colnames(mtcarsBivar)
Use dplyr to do it:
library(dplyr)
Make an empty list to populate:
Binary_table <- list()
Now fill it with the loop (indexing column i, so that each variable gets its own value):
for (i in seq_along(varbinary)) {
  Binary_table[[i]] <- summarise(mtcarsBivar, percent_1 = sum(mtcarsBivar[, i] == 1) / nrow(mtcarsBivar))
}
Transform it to a data frame:
Binary_table <- do.call("cbind", Binary_table)
Give the columns the names stored in varbinary:
colnames(Binary_table) <- varbinary
This only works if all your variables are binary.
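One likely reason the loop in the question shows nothing is that values are not auto-printed inside a for loop, so the result of cbind() is computed and then discarded. A base R sketch of the count-and-percentage tables, using the vs and am columns of mtcars as stand-ins for the 20 binary variables (the names `bin_df` and `prop_tables` are assumptions made for this example):

```r
# Stand-in data: two binary columns from mtcars
bin_df <- mtcars[, c("vs", "am")]

# One count/percentage table per column, collected in a named list
prop_tables <- lapply(bin_df, function(x) {
  counts <- table(x)
  cbind(count = counts, percent = 100 * prop.table(counts))
})

# Inside a loop, results must be printed explicitly
for (nm in names(prop_tables)) {
  cat("\n", nm, "\n")
  print(prop_tables[[nm]])
}
```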

Displaying data from a list in R without dynamically changing variable names

I'm writing some code in R that builds a list of data frames. While it runs, it needs to display each of the data frames it creates in a separate tab. The data frames and the list are both created by several nested for loops, along the lines of:
df.list <- vector("list", length(e))
i <- 1
for (...){
data <- as.data.frame(stuff)
j <- 1
for (...){
for (...){
[loop stuff]
data[j,] <- [more stuff]
}
}
df.list[[i]] <- data
i <- i + 1
}
The question is where to put the "View" function. If I add a second loop at the end that runs through the list and displays the data frames, then they all get named "df.list". If I put View(data) right before df.list[[i]] <- data then they all get named "data". Having them all have the same name is not an acceptable situation for this context. Ideally, I would be able to name them whatever string I want, but I would settle for anything that is reasonably understandable and distinguishable from the other data frames.
I know I can solve this by dynamically changing the variable name to be datai where i is the list index, but that's almost always the wrong way to do things.
I thought I'd never post an answer using eval(parse()), but it's the only way I can think to make this work:
# sample data
df.list = list(mtcars, iris)
# name your list however you want the tabs to be named
names(df.list) = c("mtcars data", "this is iris")
for (i in seq_along(df.list)) eval(parse(text = sprintf("View(df.list[['%s']])", names(df.list)[i])))
This might be what you meant by "dynamically changing the variable name to be datai where i is the list index", and I agree that it's almost always wrong. In this case, though, it may be by far the most expedient way to do it.
Posting the solution from the comments so I can close this:
The View() function takes a title as an optional second argument! View(data, title) will display data and use title as the name of the tab.
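Applied to the list from the earlier answer, this could look like the sketch below. It assumes an interactive session with a data viewer (for example RStudio), which is why the calls are wrapped in interactive(); the list names are used as tab titles.

```r
# Sample data: a named list of data frames
df.list <- list("mtcars data" = mtcars, "this is iris" = iris)

# View() opens a viewer window, so only call it in an interactive session
if (interactive()) {
  for (nm in names(df.list)) {
    View(df.list[[nm]], title = nm)
  }
}
```

The same call works inside the loop that builds the list, for example View(data, title = paste("data", i)) just before df.list[[i]] <- data.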

Using a loop to create multiple data frames in R

I have this function that returns a data frame of JSON data from the NBA stats website. The function takes in the game ID of a certain game and returns a data frame of the halftime box score for that game.
getstats<- function(game=x){
for(i in game){
url<- paste("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&
EndRange=14400&GameID=",i,"&RangeType=2&Season=2015-16&SeasonType=
Regular+Season&StartPeriod=1&StartRange=0000",sep = "")
json_data<- fromJSON(paste(readLines(url), collapse=""))
df<- data.frame(json_data$resultSets[1, "rowSet"])
names(df)<-unlist(json_data$resultSets[1,"headers"])
}
return(df)
}
So what I would like to do with this function is take a vector of several game ID's and create a separate data frame for each one. For example:
gameids<- as.character(c(0021500580:0021500593))
I would want to take the vector "gameids", and create fourteen data frames. If anyone knew how I would go about doing this it would be greatly appreciated! Thanks!
You can save your data.frames into a list by setting up the function as follows:
getstats <- function(games) {
  listofdfs <- list() # Create a list in which you intend to save your df's.
  for (i in seq_along(games)) { # Loop through the numbers of ID's instead of the ID's
    # You are going to use games[i] instead of i to get the ID.
    # The URL is pasted together from pieces so that no line break ends up inside it.
    url <- paste("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&",
                 "EndRange=14400&GameID=", games[i], "&RangeType=2&Season=2015-16&SeasonType=",
                 "Regular+Season&StartPeriod=1&StartRange=0000", sep = "")
    json_data <- fromJSON(paste(readLines(url), collapse = ""))
    df <- data.frame(json_data$resultSets[1, "rowSet"])
    names(df) <- unlist(json_data$resultSets[1, "headers"])
    listofdfs[[i]] <- df # save your dataframes into the list
  }
  return(listofdfs) # Return the list of dataframes.
}
gameids <- as.character(c(0021500580:0021500593))
getstats(games = gameids)
Please note that I could not test this because the URLs do not seem to be working properly. I get the connection error below:
Error in file(con, "r") : cannot open the connection
Adding to Abdou's answer, you could create data frames dynamically to hold the results for each game ID, using the assign() function:
# games is the vector of game IDs
for (i in seq_along(games)) { # Loop through the numbers of ID's instead of the ID's
  # You are going to use games[i] instead of i to get the ID.
  # The URL is pasted together from pieces so that no line break ends up inside it.
  url <- paste("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&",
               "EndRange=14400&GameID=", games[i], "&RangeType=2&Season=2015-16&SeasonType=",
               "Regular+Season&StartPeriod=1&StartRange=0000", sep = "")
  json_data <- fromJSON(paste(readLines(url), collapse = ""))
  df <- data.frame(json_data$resultSets[1, "rowSet"])
  names(df) <- unlist(json_data$resultSets[1, "headers"])
  # create a data frame to hold results
  assign(paste('X', i, sep = ''), df)
}
The assign function will create as many data frames as there are game IDs. They will be named X1, X2, X3, ..., Xn. Hope this helps.
Use lapply (or sapply) to apply a function to each element of a list or vector and get the results back as a list. So if you have a vector of several game IDs and a function that does what you want for a single ID, you can use lapply to get a list of data frames (since your function returns df).
I haven't been able to test your code (I got an error with the function you provided), but something like this should work:
library(RJSONIO)
# define the function for a single game ID before calling it
getstats <- function(game) {
  url <- paste0("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=14400&GameID=",
                game,
                "&RangeType=2&Season=2015-16&SeasonType=Regular+Season&StartPeriod=1&StartRange=0000")
  json_data <- fromJSON(paste(readLines(url), collapse = ""))
  df <- data.frame(json_data$resultSets[1, "rowSet"])
  names(df) <- unlist(json_data$resultSets[1, "headers"])
  return(df)
}
gameids <- as.character(c(0021500580:0021500593))
# one call per game ID, results collected in a list
df_list <- lapply(gameids, getstats)
df_list will contain one data frame per ID you provided in gameids.
Just use lapply again for additional data processing, including saving the data frames to disk.
data.table is a nice package if you have to deal with a lot of data. In particular, rbindlist lets you row-bind all the data.tables (or data frames) contained in a list into a single one if needed (split does the reverse).
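The lapply pattern can be tried without network access. In the sketch below, `fake_getstats` is a hypothetical stand-in for getstats() that returns a one-row data frame, and the PTS column is a placeholder. Note that as.character(0021500580:0021500593) drops the leading zeros, because the IDs are parsed as numbers first; if the endpoint expects ten-digit IDs (an assumption), sprintf() can restore them. Base rbind is used here; data.table::rbindlist would do the same job on larger data.

```r
# Hypothetical stand-in for getstats(): no network call, just a tiny data frame
fake_getstats <- function(game) {
  data.frame(GAME_ID = game, PTS = NA_integer_, stringsAsFactors = FALSE)
}

# Zero-padded, ten-character IDs ("0021500580", ...)
gameids <- sprintf("%010d", 21500580:21500593)

# One data frame per game ID, named by the ID it came from
df_list <- lapply(gameids, fake_getstats)
names(df_list) <- gameids

# Stack everything into a single data frame if needed
all_games <- do.call(rbind, df_list)
```

A single game is then available as df_list[["0021500580"]].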

updating column values when table name is a variable

First question here, and very new to R as well.
I have a loop creating data frames according to a list of studytables. I can read all the CSVs fine, but I would like to get the field "Subject" and add the variable "study" before what is currently in the field. My trouble is with the 2nd "assign" line, I can't get R to assign the new value to "Subject".
Thanks for all your help.
study <- 'study10'
studytables <- list('ae', 'subject')
studypath <- 'C:/mypath/'
for(table in studytables) {
destinframe <- paste(table,study, sep='')
file <- paste(studypath, table, '.CSV', sep='' )
assign(destinframe, read.csv(file)) # create all dataframes
assign(destinframe['Subject'], rep('testing', nrow(get(destinframe))))
}
Using assign like that really isn't a great idea. And as you can see it doesn't work well when you try to add columns to a data.frame. It's better to add the columns before you do the assign. So replace
assign(destinframe, read.csv(file))
assign(destinframe['Subject'], rep('testing', nrow(get(destinframe))))
with
dd <- read.csv(file)
dd$Subject <- paste(study, dd$Subject)
assign(destinframe, dd)
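If separate variables are not strictly required, a named list avoids assign() altogether. This is a sketch under stated assumptions: a temporary folder with two toy CSV files stands in for 'C:/mypath/', the Subject values are made up, and `frames` is a name introduced for the example.

```r
study <- "study10"
studytables <- c("ae", "subject")

# Assumption: toy CSV files in a temporary folder replace the real study files
studypath <- tempfile("study")
dir.create(studypath)
for (tbl in studytables) {
  write.csv(data.frame(Subject = c("001", "002")),
            file.path(studypath, paste0(tbl, ".CSV")), row.names = FALSE)
}

# Read each table, prefix Subject with the study name, and return the data frame
frames <- lapply(studytables, function(tbl) {
  dd <- read.csv(file.path(studypath, paste0(tbl, ".CSV")),
                 colClasses = "character")
  dd$Subject <- paste(study, dd$Subject)
  dd
})
names(frames) <- paste0(studytables, study)
```

Each table is then reached by name, for example frames[["aestudy10"]].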

R extract variable from multiple dataframe in loop

I have a lot of results from a parametric study to analyze. Fortunately, there is an index file that records where the output files are saved. I need to keep the name of each file. I used this routine:
IndexJobs <- read.csv("C:/Users/.../File versione7.1/IndexJobs.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
dir<-IndexJobs$WORKDIR
Dir<-gsub("\\\\","/",dir)
Dir1<-gsub(" C","C",Dir)
Now I use a for loop to read each CSV and create a separate data frame:
for(i in Dir1){
filepath <- file.path(paste(i,"eplusout.csv",sep=""))
dat<-NULL
dat<-read.table(filepath,header=TRUE,sep=",")
filenames <- substr(filepath,117,150)
names <-substr(filenames,1,21)
assign(names, dat)
}
Now I want to extract selected variables from each data frame and put each variable, across all data frames, together in a separate data frame. I would also like to combine the variable name with the name of the source data frame, so that the result is clearly labeled for analysis. I tried something, but with bad results.
I tried adding a few more lines to the for loop:
for(i in Dir1){
filepath <- file.path(paste(i,"eplusout.csv",sep=""))
dat<-NULL
dat<-read.table(filepath,header=TRUE,sep=",")
filenames <- substr(filepath,117,150)
names <-substr(filenames,1,21)
assign(names, dat)
datTest<-dat$X5EC132.Surface.Outside.Face.Temperature..C..TimeStep.
nameTest<-paste(names,"_Test",sep="")
assign(nameTest,datTest)
DFtest=c[,nameTest]
}
But on each iteration DFtest is overwritten, and only the column from the last data frame remains.
Any suggestions? Thanks
Maybe it will work if you replace DFtest=c[,nameTest] with
DFtest[nameTest] <- get(nameTest)
or, alternatively,
DFtest[nameTest] <- datTest
This procedure assumes the object DFtest exists before you run the loop.
An alternative way is to create an empty list before running the loop:
DFtest <- list()
In the loop, you can use the following command:
DFtest[[nameTest]] <- datTest
After the loop, all values in the list DFtest can be combined using
do.call("cbind", DFtest)
Note that this will only work if all vectors in the list DFtest have the same length.
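The list approach can be sketched end to end with toy data. Here `runs`, its element names, and the `temp` column are assumptions standing in for the data frames read from each eplusout.csv and for the temperature column extracted in the question.

```r
# Toy stand-ins for the data frames read in the loop (assumed names and values)
runs <- list(
  job_A = data.frame(temp = c(20.1, 20.5, 21.0)),
  job_B = data.frame(temp = c(18.7, 19.2, 19.9))
)

# Collect one vector per run, keyed by "<run name>_Test"
DFtest <- list()
for (nm in names(runs)) {
  nameTest <- paste(nm, "_Test", sep = "")
  DFtest[[nameTest]] <- runs[[nm]]$temp
}

# Bind the vectors column-wise; all of them must have the same length
combined <- as.data.frame(do.call("cbind", DFtest))
```

The column names of combined are taken from the list names, so each column stays labeled with the run it came from.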
