I have some code that runs a model in a loop. Each iteration of the loop runs a slightly different model, and the results are stored in a variable. What is a good way to store these objects so I can access them after the loop terminates? I thought about something like this:
fit.list <- list(n)
for (i in 1:n) {
  fit <- glm(......)
  fit.list[i] <- fit
}
But then I want to access each model's results, for example summary(fit.list[4]) or plot(fit.list[15]), but that doesn't seem to work.
Try
plot(fit.list[[15]])
The single [ function extracts a list with the requested component(s), even if that list is of length 1.
The double [[ function extracts the single stated component and returns it unwrapped; i.e. you get the component itself, not a list containing that component.
Here is an illustration:
> mylist <- list(a = 1, b = "A", c = data.frame(X = 1:5, Y = 6:10))
> str(mylist)
List of 3
$ a: num 1
$ b: chr "A"
$ c:'data.frame': 5 obs. of 2 variables:
..$ X: int [1:5] 1 2 3 4 5
..$ Y: int [1:5] 6 7 8 9 10
> str(mylist["c"])
List of 1
$ c:'data.frame': 5 obs. of 2 variables:
..$ X: int [1:5] 1 2 3 4 5
..$ Y: int [1:5] 6 7 8 9 10
> str(mylist[["c"]])
'data.frame': 5 obs. of 2 variables:
$ X: int 1 2 3 4 5
$ Y: int 6 7 8 9 10
Notice the difference in the last two command outputs. str(mylist["c"]) says "List of 1" whilst str(mylist[["c"]]) says "'data.frame':".
With your plot(fit.list[15]) you were asking R to plot a list object, not the model contained in that element of the list.
Also, maybe try:
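fit.list <- list()
# Example data from the ?glm help page; it does not change between iterations,
# so every element of fit.list is the same model. In real use, vary the model per i.
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
for (i in 1:5) {
  glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(), data = d.AD)
  fit.list[[i]] <- glm.D93
}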
Note the fit.list[[i]] rather than fit.list[i], as you have.
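A minimal sketch of the whole pattern, assuming the built-in mtcars data and a hypothetical list of formulas in place of the elided glm(......) call. It also touches on the question's first line: list(n) creates a list of length 1 that holds the value n, whereas vector("list", n) preallocates n empty slots.

```r
# Hypothetical models: one formula per iteration, fitted to the built-in mtcars data
formulas <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)
n <- length(formulas)

# vector("list", n) preallocates n empty slots (list(n) would be a length-1 list holding n)
fit.list <- vector("list", n)
for (i in seq_len(n)) {
  fit.list[[i]] <- glm(formulas[[i]], data = mtcars)  # [[ stores the model itself
}

class(fit.list[2])     # "list": single [ keeps the list wrapper
class(fit.list[[2]])   # the glm model object itself
summary(fit.list[[2]])

# Equivalent without an explicit loop: lapply returns the list directly
fit.list2 <- lapply(formulas, glm, data = mtcars)
```

With lapply there is no index bookkeeping at all, which is usually the simpler choice when each iteration only produces one object.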
I am using the following code, which works fine (improvement suggestions very much welcome):
library(sqldf)
library(broom)  # provides tidy()
WeeklySlopes <- function(Year, Week){
  DynamicQuery <- paste('select DayOfYear, Week, Year, Close from SourceData where year =', Year, 'and week =', Week, 'order by DayOfYear')
  SubData = sqldf(DynamicQuery)
  SubData$X <- as.numeric(rownames(SubData))
  lmfit <- lm(Close ~ X, data = SubData)
  lmfit <- tidy(lmfit)
  Slope <- as.numeric(sqldf("select estimate from lmfit where term = 'X'"))
  e <- globalenv()
  e$WeeklySlopesDf[nrow(e$WeeklySlopesDf) + 1,] = c(Year,Week, Slope)
}
WeeklySlopesDf <- data.frame(Year = integer(), Week = integer(), Slope = double())
WeeklySlopes(2017, 15)
WeeklySlopes(2017, 14)
head(WeeklySlopesDf)
Is there really no other way to append a row to my existing data frame? I seem to need to access globalenv(). On the other hand, why can sqldf 'see' the global data frame SourceData?
dfrm <- data.frame(a=1:10, b=letters[1:10]) # reproducible example
myfunc <- function(new_a=20){ g <- globalenv(); g$dfrm[3,1] <- new_a; cat(dfrm[3,1])}
myfunc()
20
dfrm
a b
1 1 a
2 2 b
3 20 c
So your strategy might work, although it's unconventional.
Now try to extend the data frame outside a function:
dfrm[11, ] <- c(a=20,b="c")
An occult disaster (silent conversion of the numeric column to character):
str(dfrm)
'data.frame': 11 obs. of 2 variables:
$ a: chr "1" "2" "20" "4" ...
$ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
So use a list to avoid occult coercion:
dfrm <- data.frame(a=1:10, b=letters[1:10]) # start over
dfrm[11, ] <- list(a=20,b="c")
str(dfrm)
'data.frame': 11 obs. of 2 variables:
$ a: num 1 2 3 4 5 6 7 8 9 10 ...
$ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
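The str() outputs here show column b as a factor because they come from R versions before 4.0.0; since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so b is character and new strings are accepted. A small sketch contrasting the two behaviours, assuming R >= 4.0.0 (the value "zz" is just an illustrative new level):

```r
# R >= 4.0.0 default: strings stay character, so any new value is accepted
dfrm <- data.frame(a = 1:10, b = letters[1:10])
dfrm[11, ] <- list(a = 20, b = "zz")
str(dfrm)

# Requesting factors explicitly reproduces the older pitfall
dfrm_f <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = TRUE)
dfrm_f[11, ] <- list(a = 20, b = "zz")  # warning: invalid factor level, NA generated
dfrm_f$b[11]                            # NA
```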
Now try within a function:
myfunc <- function(new_a=20, new_b="ZZ"){ g <- globalenv(); g$dfrm[nrow(dfrm)+1, ] <- list(a=new_a,b=new_b)}
myfunc()
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "ZZ") :
invalid factor level, NA generated
str(dfrm)
'data.frame': 12 obs. of 2 variables:
$ a: num 1 2 3 4 5 6 7 8 9 10 ...
$ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
So it succeeds, but if there are any factor columns, values that are not existing levels get turned into NA (with a warning). Your method of using named access to objects in the global environment is rather unconventional, but there is a set of tested methods you might want to examine: look at the R6 package (?R6). Other options are <<- and assign(), which lets you specify the environment in which the assignment occurs.
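A sketch of the more conventional pattern, offered as an alternative rather than the answerer's method: the function returns a one-row data frame and the caller binds the rows together once, so nothing in globalenv() is modified. To stay self-contained it swaps the sqldf query for base R subsetting and replaces tidy() plus the second sqldf call with coef(). SourceData here is made-up data, and WeeklySlope is a hypothetical name.

```r
set.seed(1)
# Made-up stand-in for the question's SourceData table (illustration only)
SourceData <- data.frame(
  Year = 2017L,
  Week = rep(14:15, each = 5),
  DayOfYear = 92:101,
  Close = 100 + cumsum(rnorm(10))
)

# Returns a one-row data frame instead of writing into a global data frame
WeeklySlope <- function(data, Year, Week) {
  sub <- data[data$Year == Year & data$Week == Week, ]
  sub <- sub[order(sub$DayOfYear), ]
  sub$X <- seq_len(nrow(sub))             # 1, 2, ... in day order
  fit <- lm(Close ~ X, data = sub)
  data.frame(Year = Year, Week = Week, Slope = unname(coef(fit)["X"]))
}

# The caller decides where the rows go: collect them, then bind once
WeeklySlopesDf <- do.call(rbind, lapply(c(15, 14), function(w) WeeklySlope(SourceData, 2017L, w)))
WeeklySlopesDf
```

Binding once at the end also avoids growing a data frame row by row, which gets slow as it grows.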
I have several dataframes:
toto1_1 <- data.frame(x=1:3)
toto1_2 <- data.frame(x=1:3)
titi1_1 <- data.frame(x=1:3)
titi1_2 <- data.frame(x=1:3)
What is the best way to concatenate these tables using 2 different patterns?
Thank you.
The mget function returns a list of data objects when given a character vector of their names:
totoList <- mget(paste0(rep(c("toto", "titi"), each = 2), c("1_1", "1_2")))
str(totoList)
List of 4
$ toto1_1:'data.frame': 3 obs. of 1 variable:
..$ x: int [1:3] 1 2 3
$ toto1_2:'data.frame': 3 obs. of 1 variable:
..$ x: int [1:3] 1 2 3
$ titi1_1:'data.frame': 3 obs. of 1 variable:
..$ x: int [1:3] 1 2 3
$ titi1_2:'data.frame': 3 obs. of 1 variable:
..$ x: int [1:3] 1 2 3
If the goal were a single data frame, then that list could be an intermediate result on the way to:
do.call("rbind", totoList) # do.call() accepts the function name as a string and looks up rbind
x
toto1_1.1 1
toto1_1.2 2
toto1_1.3 3
toto1_2.1 1
toto1_2.2 2
toto1_2.3 3
titi1_1.1 1
titi1_1.2 2
titi1_1.3 3
titi1_2.1 1
titi1_2.2 2
titi1_2.3 3
Please see some suggestions below.
1. Concatenated with the rbind() function: the rows are stacked into one data frame.
2. Concatenated with the c() function: the result is a single list that holds each table's x column as a separate element.
3. Perhaps more useful, because it preserves the names of the original tables for later analysis and sorting: put the tables into a named list, create a data.frame pairing each table name with its data, and then combine these into a single object with rbind().
toto1_1 <- data.frame(x=1:3)
toto1_2 <- data.frame(x=1:3)
titi1_1 <- data.frame(x=1:3)
titi1_2 <- data.frame(x=1:3)
#1
rbind(toto1_1,toto1_2,titi1_1,titi1_2)
#2
c(toto1_1,toto1_2,titi1_1,titi1_2)
#3
l <- list(toto1_1=toto1_1,toto1_2=toto1_2,titi1_1=titi1_1,titi1_2=titi1_2)
do.call(rbind,lapply(names(l),FUN=function(x) { data.frame(table_name=x,table_data=l[[x]]) }))
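If "2 different patterns" means one combined table per name prefix (an assumption; the question does not say), ls(pattern = ...) can find the matching names and mget() can fetch them, as in the first answer. The regular expressions and the names patterns, combined and e are illustrative choices, not from either answer.

```r
toto1_1 <- data.frame(x = 1:3)
toto1_2 <- data.frame(x = 1:3)
titi1_1 <- data.frame(x = 1:3)
titi1_2 <- data.frame(x = 1:3)

e <- environment()  # where the tables live (the global environment at top level)

# One regular expression per group; each matches names like toto1_1, toto1_2, ...
patterns <- c(toto = "^toto\\d+_\\d+$", titi = "^titi\\d+_\\d+$")

combined <- lapply(patterns, function(p) {
  tables <- mget(ls(envir = e, pattern = p), envir = e)  # named list of data frames
  do.call(rbind, tables)                                 # stack them into one table
})
combined$toto
combined$titi
```

Anchoring the patterns with ^ and $ keeps other objects, such as totoList from the first answer, out of the groups.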
Using aggregate, R creates a list Z that can be indexed in the form a$Z$`1.2`, where the first number refers to the corresponding element of X, and likewise for Y. In addition, if X or Y has 10+ elements, the form changes to a$Z$`01.02` (and presumably 001.002 for 100+ elements).
Instead of having to index Z with the zero-padded index values of X and Y, how can I index it with the actual X and Y values (e.g. a$Z$`52.60`), which seems much more intuitive?
df = data.frame(X=c(50, 52, 50), Y=c(60, 60, 60), Z=c(4, 5, 6))
a = aggregate(Z ~ X + Y, df, c)
str(a)
'data.frame': 2 obs. of 3 variables:
$ X: num 50 52
$ Y: num 60 60
$ Z:List of 2
..$ 1.1: num 4 6
..$ 1.2: num 5
You can easily do this after aggregate:
names(a$Z) <- paste(a$X, a$Y, sep=".")
Then check it out
str(a)
'data.frame': 2 obs. of 3 variables:
$ X: num 50 52
$ Y: num 60 60
$ Z:List of 2
..$ 50.60: num 4 6
..$ 52.60: num 5
1) Try tapply instead:
ta <- tapply(df[[3]], df[-3], c)
ta[["50", "60"]]
## [1] 4 6
ta[["52", "60"]]
## [1] 5
2) subset: Consider not using aggregate at all and instead use subset to retrieve the values:
subset(df, X == 50 & Y == 60)$Z
## [1] 4 6
3) data.table: Subsetting is even easier with data.table:
library(data.table)
dt <- data.table(df, key = "X,Y")
dt[.(50, 60), Z]
## [1] 4 6
Note: If you are not actually starting with the df shown in the question but rather a is the result of a series of complex transformations then we can recover df like this:
df <- tidyr::unnest(a, cols = Z)  # tidyr >= 1.0.0 expects the list column(s) in cols
at which point any of the above could be used.
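If tidyr is not available, a base R sketch can do the same recovery, assuming (as in the question) that a$Z is a list column because the groups have different numbers of values: repeat each group's X and Y once per value, then flatten Z. The names n_per_group and df2 are illustrative.

```r
df <- data.frame(X = c(50, 52, 50), Y = c(60, 60, 60), Z = c(4, 5, 6))
a <- aggregate(Z ~ X + Y, df, c)

# Repeat each group's keys once per value in its list element, then flatten Z
n_per_group <- lengths(a$Z)
df2 <- data.frame(
  X = rep(a$X, n_per_group),
  Y = rep(a$Y, n_per_group),
  Z = unlist(a$Z, use.names = FALSE)
)
df2
```

The rows come back grouped in aggregate's order (X = 50 first), not in the original row order of df.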
I have a wide data.frame that is all character vectors (df1). I have a separate vector (vec1) that contains the column classes I'd like to assign to each of the columns in df1.
If I were using read.csv(), I'd use the colClasses argument and set it equal to vec1, but there doesn't appear to be a similar option for an existing data.frame.
Any suggestions for a fast way to do this besides a loop?
I don't know if it will help, but I have run into the same need many times and created a function for it:
reclass <- function(df, vec){
  df[] <- Map(function(x, f){
    #switch below shows the accepted values in the vector
    #you can modify it and/or add more
    f <- switch(f,
                as.is = 'force',
                factor = 'as.factor',
                num = 'as.numeric',
                char = 'as.character')
    #takes the name of the function and fetches the function
    f <- get(f)
    #apply the function
    f(x)
  },
  df,
  vec)
  df
}
It uses Map to pair each column of the data frame with the corresponding element of the class vector, so the number of columns and the length of the vector need to be the same.
I also use switch so the class names are shorter to type. Use as.is to keep a column's class unchanged; the rest are self-explanatory, I think.
Small example:
df1 <- data.frame(1:10, letters[1:10], runif(50))
> str(df1)
'data.frame': 50 obs. of 3 variables:
$ X1.10 : int 1 2 3 4 5 6 7 8 9 10 ...
$ letters.1.10.: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ runif.50. : num 0.0969 0.1957 0.8283 0.1768 0.9821 ...
And after the function:
df1 <- reclass(df1, c('num','as.is','char'))
> str(df1)
'data.frame': 50 obs. of 3 variables:
$ X1.10 : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
$ letters.1.10.: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ runif.50. : chr [1:50] "0.0968757788650692" "0.19566105119884" "0.828283685725182" "0.176784737734124" ...
Map is still a loop internally (it is a wrapper around mapply), but it should be fast enough for a typical number of columns.
Maybe you could try this function, which does the same job:
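A variant of the same Map idea, sketched under the assumption that vec1 uses read.csv-style colClasses names such as "integer", "factor" or "logical": paste "as." onto each class name and look the converter up with match.fun(). The function name convert_cols and the sample df1/vec1 are hypothetical. It only covers classes that have a matching as.* function (as.integer, as.numeric, as.character, as.logical, as.factor, ...); anything else needs its own handling.

```r
# Hypothetical inputs: an all-character data frame and read.csv-style class names
df1 <- data.frame(a = c("1", "2", "3"), b = c("x", "y", "x"),
                  c = c("TRUE", "FALSE", "TRUE"), stringsAsFactors = FALSE)
vec1 <- c("integer", "factor", "logical")

# For each column, call as.<class>() on it; df[] <- keeps the data.frame structure
convert_cols <- function(df, classes) {
  stopifnot(length(classes) == ncol(df))
  df[] <- Map(function(x, cls) match.fun(paste0("as.", cls))(x), df, classes)
  df
}

df2 <- convert_cols(df1, vec1)
str(df2)
```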
reclass <- function (df, vec_types) {
  for (i in seq_len(ncol(df))) {
    type <- vec_types[i]
    # class<- coerces basic types such as "integer" or "character"
    class(df[ , i]) <- type
  }
  return(df)
}
And this is an example of vec_types (a vector of types):
vec_types <- c('character', rep('integer', 3), rep('character', 2))
You can test the function (reclass) with this table:
table <- data.frame(matrix(sample(1:10, 30, replace = TRUE), nrow = 5, ncol = 6))
str(table) # original column types
# apply the function
table <- reclass(table, vec_types)
str(table) # new column types
I have ~40K data frames in a list. Each data frame has 7 variables, 3 factors and 4 numeric. For reference, here is the first data frame:
$ a:'data.frame': 4 obs. of 7 variables:
..$ x1 : Factor w/ 1 level "a": 1 1 1 1
..$ x2 : Factor w/ 4 levels "12345678901234",..: 1 2 3 4
..$ x3 : Factor w/ 4 levels "SAMPLE",..: 1 2 3 4
..$ x4 : int [1:4] 1 2 3 4
..$ x5 : num [1:4] 10 20 30 40
..$ x6: int [1:4] 50 60 70 80
..$ x7 : num [1:4] 0.5 0.7 0.35 1
I'm trying to merge these into a single ginormous data frame, using:
Reduce(function(...) merge(..., all=T), df_list)
As recommended here: Simultaneously merge multiple data.frames in a list.
If I take the first 1000 items, i.e.
Reduce(function(...) merge(..., all=T), df_list[1:1000])
This produces the desired result (merges the individual data frames into a single one) and completes in 37 seconds.
However, running Reduce() on the entire 40K-element list of data frames takes an inordinate amount of time: I've let it run for more than 5 hours and it doesn't appear to finish.
Are there any tricks that I can use to improve the performance of Reduce(), or is there a better alternative?
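If you really needed merge and not just rbind, you could first merge them two by two (1 and 2, 3 and 4, 5 and 6, etc.), then merge the resulting data.frames two by two, and so on, until there is only one remaining data.frame. With Reduce, every step merges an ever-growing accumulated data frame with one small one, so the total work grows roughly quadratically with the number of data frames; merging pairwise keeps the intermediate results balanced in size.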
# One step: merge neighbouring pairs (1+2, 3+4, ...); an odd last element is carried over
merge_some <- function(l, ...) {
  n <- length(l)
  k <- floor(n/2)
  result <- list()
  for (i in seq_len(k)) {
    result[[i]] <- merge(l[[2*i-1]], l[[2*i]], ...)
  }
  if (2*k < n) {
    result[[k+1]] <- l[[n]]
  }
  result
}
# Sample data
d <- lapply(1:1000, function(i) {
  r <- data.frame(id = sample(1:100, 3), v = rnorm(3))
  names(r)[[2]] <- paste0("v", i)
  r
})
# Iterate until there is only one data.frame left
while (length(d) > 1) {
  d <- merge_some(d, by = "id", all = TRUE)
}
# Result
head(d[[1]])
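The same pairwise strategy can also be written recursively: split the list in half, merge each half, then merge the two results, which is about log2(n) levels deep. This is a sketch of an equivalent formulation, not the answerer's code, and its speed on the 40K-element list is untested; merge_tree is a hypothetical name. The check at the end compares it with the sequential Reduce on a small sample.

```r
# Divide and conquer: merge each half recursively, then merge the two halves
merge_tree <- function(l, ...) {
  if (length(l) == 1) return(l[[1]])
  mid <- length(l) %/% 2
  merge(merge_tree(l[seq_len(mid)], ...),
        merge_tree(l[(mid + 1):length(l)], ...), ...)
}

set.seed(42)
d <- lapply(1:8, function(i) {
  r <- data.frame(id = sample(1:20, 3), v = rnorm(3))
  names(r)[[2]] <- paste0("v", i)
  r
})

tree <- merge_tree(d, by = "id", all = TRUE)
seq_result <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE), d)
all.equal(tree, seq_result)  # same rows (sorted by id) and columns id, v1, ..., v8
```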