R - store a matrix into a single dataframe cell


I'm trying to store an entire matrix/array into a single cell of a data frame, but can't quite remember how to do it.
Now before you say it can't be done, I'm sure I remember someone asking a question on SO where it was done, although that wasn't the point of the question so I can't find it again.
For example, you can store matrices into a single cell of a matrix like so:
myMat <- array(list(), dim=c(2, 2))
myMat[[1, 1]] <- 1:5
myMat[[1, 2]] <- 6:10
# [,1] [,2]
#[1,] Integer,5 Integer,5
#[2,] NULL NULL
The trick was in using the double brackets [[]].
Now I just can't work out how to do it for a data frame (or if you can):
# attempt to make a dataframe like above (except if I use list() it gets
# interpreted to mean the `m` column doesn't exist)
myDF <- data.frame(i=1:5, m=NA)
myDF[[1, 'm']] <- 1:5
# Error in `[[<-.data.frame`(`*tmp*`, 1, "m", value = 1:5) :
# more elements supplied than there are to replace
# this seems to work but I have to do myDF$m[[1]][[1]] to get the 1:5,
# whereas I just want to do myDF$m[[1]].
myDF[[1, 'm']] <- list(1:5)
I think I'm almost there. With that last attempt I can do myDF[[1, 'm']] to retrieve list(1:5) and hence myDF[[1, 'm']][[1]] to get 1:5, but I'd prefer to just do myDF[[1, 'm']] and get 1:5.

I think I worked it out. It is important to initialise the data frame such that the column is ready to accept matrices.
To do this you give it a list data type. Note the I() that protects the list() so that data.frame() keeps it as a single list column.
myDF <- data.frame(i=integer(), m=I(list()))
Then you can add rows as usual
myDF[1, 'i'] <- 1
and then add the matrix in with [[]] notation
myDF[[1, 'm']] <- matrix(rnorm(9), 3, 3)
Access with [[]] notation:
> myDF$m[[1]]
[,1] [,2] [,3]
[1,] 0.3307403 -0.2031316 1.5995385
[2,] 0.4588922 0.1631086 -0.2754463
[3,] 0.0568791 1.0358552 -0.1623794
To initialise with a non-zero number of rows you can do the following (note the I() to protect the list, and vector('list', 5) to initialise a list of length 5 whose elements are all NULL, so no placeholder values are allocated):
myDF <- data.frame(i=1:5, m=I(vector('list', 5)))
myDF$m[[1]] <- matrix(rnorm(9), 3, 3)
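A minimal, self-contained sketch of the approach above, using made-up values rather than the asker's data: the list column is created with I(vector("list", n)), each cell is filled with [[row, col]] assignment, and reading a cell back returns the matrix itself rather than a one-element list wrapping it.

```r
# List column protected by I(), preallocated with NULL cells
myDF <- data.frame(i = 1:3, m = I(vector("list", 3)))

# [[row, col]] assignment stores the whole matrix in one cell
myDF[[1, "m"]] <- matrix(1:4, 2, 2)
myDF[[2, "m"]] <- diag(3)

# Retrieval gives the stored object directly, with no extra [[1]] needed
m1 <- myDF[[1, "m"]]
dim(m1)               # 2 2
is.matrix(myDF$m[[2]]) # TRUE
is.null(myDF$m[[3]])   # TRUE: untouched cells stay NULL
```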

I think the trick may be to insert it in as a list:
set.seed(123)
dat <- data.frame(women, m=I(replicate(nrow(women), matrix(rnorm(4), 2, 2),
simplify=FALSE)))
str(dat)
'data.frame': 15 obs. of 3 variables:
$ height: num 58 59 60 61 62 63 64 65 66 67 ...
$ weight: num 115 117 120 123 126 129 132 135 139 142 ...
$ m :List of 15
..$ : num [1:2, 1:2] -0.5605 -0.2302 1.5587 0.0705
..$ : num [1:2, 1:2] 0.129 1.715 0.461 -1.265
...
..$ : num [1:2, 1:2] -1.549 0.585 0.124 0.216
..- attr(*, "class")= chr "AsIs"
dat[[1, "m"]]
[,1] [,2]
[1,] -0.5604756 1.55870831
[2,] -0.2301775 0.07050839
dat[[2, "m"]]
[,1] [,2]
[1,] 0.1292877 0.4609162
[2,] 1.7150650 -1.2650612
EDIT: So the question really is about initialising and then assigning. Given that, you should be able to define a data.frame like the one in your question like so:
dat <- data.frame(i=1:5, m=I(vector(mode="list", length=5)))
You can then assign to it like so:
dat[[2, "m"]] <- matrix(rnorm(9), 3, 3)
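Once cells have been assigned, the list column can be processed like any other list. This sketch (sizes chosen arbitrarily for illustration) fills two cells with matrices of different shapes, then uses vapply() to find which cells are filled and how many rows each stored matrix has.

```r
set.seed(1)
dat <- data.frame(i = 1:5, m = I(vector(mode = "list", length = 5)))
dat[[2, "m"]] <- matrix(rnorm(9), 3, 3)
dat[[4, "m"]] <- matrix(rnorm(4), 2, 2)

# TRUE where a cell holds something other than NULL
filled <- !vapply(dat$m, is.null, logical(1))
filled   # FALSE TRUE FALSE TRUE FALSE

# Row count of each stored matrix, NA for empty cells
n_rows <- vapply(dat$m, function(x) if (is.null(x)) NA_integer_ else nrow(x),
                 integer(1))
n_rows   # NA 3 NA 2 NA
```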

Related

Apply na.locf to multiple datasets

I have multiple datasets (e.g. data01, data02, ...). In all these datasets, I want to apply na.locf to var1, and create a new variable 'var2' from the locf applied 'var1'. I tried using the following code:
L=list(data01,data02)
for (i in L){i$var2 <- na.locf(i$var1)}
However, when I try to read the locf column using code:
head(data01$var2)
The result given is NULL.

There are a few problems:
In the question, i is a copy of each data frame, so L is not changed. Index into L to ensure that it is the data frame in L that is changed.
Use na.locf0, or equivalently na.locf(..., na.rm = FALSE), to ensure that the output is the same length as the input.
The data01 and data02 in L are copies of data01 and data02, and modifying one does not modify the other. That is why you get NULL.
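The copy semantics in the first and third points can be seen with base R alone. In this sketch the data frames are made up and a constant 0 stands in for na.locf0(), so no packages are needed.

```r
data01 <- data.frame(var1 = c(1, NA, 3))
data02 <- data.frame(var1 = c(NA, 5, NA))
L <- list(data01, data02)

# for (d in L) binds d to a copy of each element, so L is not modified
for (d in L) d$var2 <- 0
copied_unchanged <- is.null(L[[1]]$var2)   # TRUE

# Indexing into L modifies the list's own elements
for (i in seq_along(L)) L[[i]]$var2 <- 0
L[[1]]$var2                                # 0 0 0

# The objects outside the list are still untouched
is.null(data01$var2)                       # TRUE
```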
Using the built-in BOD data frame to construct sample input:
library(zoo)
# construct sample input
BOD1 <- BOD2 <- BOD
BOD1$Time[c(1, 3)] <- BOD2$Time[c(3, 5)] <- NA
L <- list(BOD1, BOD2)
for(i in seq_along(L)) L[[i]]$Time2 <- na.locf0(L[[i]]$Time)
giving:
str(L)
List of 2
$ :'data.frame': 6 obs. of 3 variables:
..$ Time : num [1:6] NA 2 NA 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..$ Time2 : num [1:6] NA 2 2 4 5 7
..- attr(*, "reference")= chr "A1.4, p. 270"
$ :'data.frame': 6 obs. of 3 variables:
..$ Time : num [1:6] 1 2 NA 4 NA 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..$ Time2 : num [1:6] 1 2 2 4 4 7
..- attr(*, "reference")= chr "A1.4, p. 270"
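To see why the loop uses na.locf0 rather than plain na.locf, compare their output lengths on a small made-up vector with a leading NA. With the default na.rm = TRUE, na.locf drops the leading NA, so its result no longer fits the data frame's rows.

```r
library(zoo)

x <- c(NA, 2, NA, 4, NA)
a <- na.locf(x)                 # leading NA removed: 2 2 4 4 (length 4)
b <- na.locf0(x)                # same length as x: NA 2 2 4 4
d <- na.locf(x, na.rm = FALSE)  # equivalent to na.locf0(x)

# Assigning the shorter result to a 5-row data frame fails
df <- data.frame(x = x)
res <- tryCatch({ df$y <- na.locf(x); "ok" }, error = function(e) "error")
res   # "error"
```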
Any of these would also work and instead of modifying L produce a new list:
L2 <- lapply(L, function(x) { x$Time2 <- na.locf0(x$Time); x })
L3 <- lapply(L, transform, Time2 = na.locf0(Time))
If your aim is to modify BOD1 and BOD2, as opposed to creating a list with the modified BOD1 and BOD2, then the following would do that (although if you intend to iterate over objects it is usually better to organize them in a list rather than leave them loose in the global environment).
nms <- c("BOD1", "BOD2")
for(nm in nms) assign(nm, transform(get(nm), Time2 = na.locf0(Time)))
or
nms <- c("BOD1", "BOD2")
for(nm in nms) .GlobalEnv[[nm]]$Time2 <- na.locf0(.GlobalEnv[[nm]]$Time)
or other variations.

looping through a named vector

I am trying to figure out how to loop through a named vector of regression coefficients. I want to loop through the vector and detect whether or not a coefficient name contains the string 'country'. If it does, I want to append the corresponding value to an empty vector. I already solved this using dplyr tools, but I also want to do it using a for loop.
This is what my data looks like:
str(co2_per_cap_model$coefficients)
Named num [1:164] -0.0511 0.3289 1.2352 3.0743 0.8654 ...
- attr(*, "names")= chr [1:164] "(Intercept)" "time" "countryAlbania" "countryAlgeria" ...
This is the loop I've been tinkering with. Any advice? Thank you in advance.
storage <- c()
for(coeff in co2_per_cap_model$coefficients){
if(str_detect(names(co2_per_cap_model$coefficients), 'country')){
storage <- c(coeff, storage)
}
}

We need to create some reproducible data. Then just use grep:
set.seed(42)
coef <- 1:25
names(coef) <- sample(LETTERS[1:5], 25, replace=TRUE)
str(coef)
# Named int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, "names")= chr [1:25] "A" "E" "A" "A" ...
idx <- grep("A", names(coef))
coef[idx]
# A A A A A A A
# 1 3 4 9 11 17 18
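Since the question also asks for a for loop: the original loop iterates over the values, so it has no access to each value's own name, and its if() receives one logical per name instead of a single TRUE/FALSE (an error since R 4.2.0). A sketch that loops over the names instead, using a small hypothetical vector in place of co2_per_cap_model$coefficients and base grepl() in place of stringr::str_detect():

```r
# Hypothetical stand-in for co2_per_cap_model$coefficients
coefs <- c("(Intercept)" = -0.05, time = 0.33,
           countryAlbania = 1.24, countryAlgeria = 3.07)

storage <- c()
for (nm in names(coefs)) {
  # grepl() on a single name returns exactly one TRUE/FALSE for if()
  if (grepl("country", nm, fixed = TRUE)) {
    storage <- c(storage, coefs[[nm]])  # [[ ]] drops the name
  }
}
storage    # 1.24 3.07

# Vectorised equivalent without a loop
storage2 <- unname(coefs[grepl("country", names(coefs), fixed = TRUE)])
```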

Looping apply function over list of dataframes

I have looked through various Stack Overflow pages with similar questions (some linked) but haven't found anything that seems to help with this complicated task.
I have a series of data frames in my workspace and I would like to loop the same function (rollmean or some version of that) over all of them, then save the results to new data frames.
I have written a couple of lines of code to generate a list of all data frames and a for loop that should iterate an apply statement over each data frame; however, I'm having problems trying to accomplish everything I'm hoping to achieve (my code and some sample data are included below):
1) I would like to restrict the rollmean function to all columns, except the 1st (or first several), so that the column(s) 'info' does not get averaged. I would also like to add this column(s) back to the output data frame.
2) I want to save the output as a new data frame (with a unique name). I do not care if it is saved to the workspace or exported as an xlsx, as I already have batch import codes written.
3) Ideally, I would like the resultant data frame to have the same number of observations as the input, whereas rollmean shrinks your data. I also do not want these to become NA, so I don't want to use fill = NA. This could be accomplished by writing a new function, passing type = "partial" in rollmean (though that still shrinks my data by 1 in my hands), or by starting the rolling mean on the nth+2 term and binding the non-averaged nth and nth+1 terms to the resulting data frame. Any way is fine.
(See the picture for detail; it illustrates what the latter would look like.)
My code only accomplishes parts of these things and I cannot get the for loop to work together but can get parts to work if I run them on single data frames.
Any input is greatly appreciated because I'm out of ideas.
#reproducible data frames
a = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
b = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
c = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
colnames(a) = c("info", 1:20)
colnames(b) = c("info", 1:20)
colnames(c) = c("info", 1:20)
#identify all dataframes for looping rollmean
dflist = as.list(ls()[sapply(mget(ls(), .GlobalEnv), is.data.frame)])
#for loop to create rolling average and save as new dataframe
for (j in 1:length(dflist)){
list = as.list(ls()[sapply(mget(ls(), .GlobalEnv), is.data.frame)])
new.names = as.character(unique(list))
smoothed = as.data.frame(
apply(
X = names(list), MARGIN = 1, FUN = rollmean, k = 3, align = 'right'))
assign(new.names[i], smoothed)
}
I also tried a nested apply approach but couldn't get it to call the rollmean/rollapply function similar to issue here so I went back to for loops but if someone can make this work with nested applies, I'm down!
Picture is ideal output: Top is single input dataframe with colored boxes demonstrating a rolling average across all columns, to be iterated over each column; bottom is ideal output with colors reflecting the location of output for each colored window above

To approach this, think about one column, then one frame (which is just a list of columns), then a list of frames.
(My data used is at the bottom of the answer.)
One Column
If you don't like the reduction of zoo::rollmean, then write your own:
myrollmean <- function(x, k, ..., type=c("normal","rollin","keep"), na.rm=FALSE) {
type <- match.arg(type)
out <- zoo::rollmean(x, k, ...)
aug <- c()
if (type == "rollin") {
# effectively:
#   c(mean(x[1]), mean(x[1:2]), ..., mean(x[1:j]))
# for the j=k-1 elements that precede the first value from rollmean,
# whose own output then continues as:
#   c(mean(x[1:k]), mean(x[2:(k+1)]), ...)
aug <- sapply(seq_len(k-1), function(i) mean(x[seq_len(i)], na.rm=na.rm))
} else if (type == "keep") {
aug <- x[seq_len(k-1)]
}
out <- c(aug, out)
out
}
myrollmean(1:8, k=3) # "normal", default behavior
# [1] 2 3 4 5 6 7
myrollmean(1:8, k=3, type="rollin")
# [1] 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.0
myrollmean(1:8, k=3, type="keep")
# [1] 1 2 2 3 4 5 6 7
I caution that this implementation is naïve at best and may need fixing for your use. Make sure that you understand what it is doing when you pick a type other than "normal" (which will not work for you; it simply reproduces the default zoo::rollmean behavior). The same approach could easily be applied to other zoo::roll* functions.
On one column of the data:
rbind(
dflist[[1]][,2], # for comparison
myrollmean(dflist[[1]][,2], k=3, type="keep")
)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1.865352 0.4047481 0.1466527 1.7307097 0.08952618 0.6668976 1.0743669 1.511629 1.314276 0.1565303
# [2,] 1.865352 0.4047481 0.8055844 0.7607035 0.65562952 0.8290445 0.6102636 1.084298 1.300091 0.9941452
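If you would rather not depend on zoo for the "keep" behaviour, a base-R alternative (a swapped-in technique, not part of the answer above) is stats::filter() with equal weights and sides = 1, which gives a right-aligned moving average with NA for the first k-1 positions; those positions are then refilled with the original values. Note that any NA inside a window propagates to the result, and that dplyr masks filter(), hence the explicit stats:: prefix.

```r
# Right-aligned rolling mean of width k, keeping the first k-1 raw values
rollmean_keep <- function(x, k = 3) {
  # sides = 1: each output averages the current value and the k-1 before it
  out <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
  head_idx <- seq_len(k - 1)
  out[head_idx] <- x[head_idx]  # replace the leading NAs with the inputs
  out
}

rollmean_keep(1:8, k = 3)   # 1 2 2 3 4 5 6 7, same as type = "keep"
```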
One "frame"
Simple use of lapply, omitting the first column:
str(dflist[[1]][1:4, 1:3])
# 'data.frame': 4 obs. of 3 variables:
# $ info: num 1 2 3 4
# $ 1 : num 1.865 0.405 0.147 1.731
# $ 2 : num 0.745 1.243 0.674 1.59
dflist[[1]][-1] <- lapply(dflist[[1]][-1], myrollmean, k=3, type="keep")
str(dflist[[1]][1:4, 1:3])
# 'data.frame': 4 obs. of 3 variables:
# $ info: num 1 2 3 4
# $ 1 : num 1.865 0.405 0.806 0.761
# $ 2 : num 0.745 1.243 0.887 1.169
(For validation, column $ 1 matches the second row in the "one column" example above.)
List of "frames"
(I reset the data to what it was before I modified it above ... see the "data" code at the bottom of the answer.)
We nest the previous technique into another lapply:
dflist2 <- lapply(dflist, function(ldf) {
ldf[-1] <- lapply(ldf[-1], myrollmean, k=3, type="keep")
ldf
})
str(lapply(dflist2, function(a) a[1:4, 1:3]))
# List of 3
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ info: num [1:4] 1 2 3 4
# ..$ 1 : num [1:4] 1.865 0.405 0.806 0.761
# ..$ 2 : num [1:4] 0.745 1.243 0.887 1.169
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ info: num [1:4] 1 2 3 4
# ..$ 1 : num [1:4] 0.271 3.611 2.36 3.095
# ..$ 2 : num [1:4] 0.127 0.722 0.346 0.73
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ info: num [1:4] 1 2 3 4
# ..$ 1 : num [1:4] 1.278 0.346 1.202 0.822
# ..$ 2 : num [1:4] 0.341 1.296 1.244 1.528
(Again, for simple validation, see that the first frame's $ 1 row shows the same rolled means as the second row of the "one column" example, above.)
PS:
If you need to skip more than just the first column, then inside the outer lapply use ldf[-(1:n)] <- lapply(ldf[-(1:n)], myrollmean, k=3, type="keep") instead, to skip the first n columns.
To use a window function other than zoo::rollmean, you'll want to change the special cases of myrollmean, though it should be straightforward enough given this example.
I use a concocted str(...) to shorten the output for display here. You should verify on all of your data that it does what you expect for the whole of each frame.
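The question also asks for each result to be saved under a unique name. One way, sketched here with small hypothetical frames and a base-R stand-in for myrollmean(type = "keep"), is to name the input list, derive new names for the output list, and optionally copy the elements into the global environment with list2env().

```r
# Hypothetical small inputs; the first column 'info' is not averaged
dflist <- list(
  a = data.frame(info = 1:5, x = c(1, 2, 3, 4, 5)),
  b = data.frame(info = 1:5, x = c(2, 4, 6, 8, 10))
)

# Base-R stand-in for myrollmean(..., type = "keep"), right-aligned width k
keep_rollmean <- function(x, k = 3) {
  out <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
  out[seq_len(k - 1)] <- x[seq_len(k - 1)]
  out
}

smoothed <- lapply(dflist, function(ldf) {
  ldf[-1] <- lapply(ldf[-1], keep_rollmean, k = 3)
  ldf
})
names(smoothed) <- paste0(names(dflist), "_smoothed")

# Optional: one object per frame (a_smoothed, b_smoothed) in the workspace
invisible(list2env(smoothed, envir = .GlobalEnv))
a_smoothed$x   # 1 2 2 3 4
```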
Reproducible Data
set.seed(2)
a = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
b = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
c = as.data.frame(cbind(info = 1:10, matrix(rexp(200), 10)))
colnames(a) = c("info", 1:20)
colnames(b) = c("info", 1:20)
colnames(c) = c("info", 1:20)
dflist <- list(a,b,c)
str(lapply(dflist, function(a) a[1:3, 1:4]))
# List of 3
# $ :'data.frame': 3 obs. of 4 variables:
# ..$ info: num [1:3] 1 2 3
# ..$ 1 : num [1:3] 1.865 0.405 0.147
# ..$ 2 : num [1:3] 0.745 1.243 0.674
# ..$ 3 : num [1:3] 0.356 0.689 0.833
# $ :'data.frame': 3 obs. of 4 variables:
# ..$ info: num [1:3] 1 2 3
# ..$ 1 : num [1:3] 0.271 3.611 3.198
# ..$ 2 : num [1:3] 0.127 0.722 0.188
# ..$ 3 : num [1:3] 1.99 2.74 4.78
# $ :'data.frame': 3 obs. of 4 variables:
# ..$ info: num [1:3] 1 2 3
# ..$ 1 : num [1:3] 1.278 0.346 1.981
# ..$ 2 : num [1:3] 0.341 1.296 2.094
# ..$ 3 : num [1:3] 1.1159 3.05877 0.00506

Below, dfnames holds the names of the data frames in env, the global environment. We have named it env in case you later want to change where they are located. Note that ls has a pattern= argument, so if the data frame names follow a distinct pattern then dfnames <- ls(pattern=whatever) could be used instead, where whatever is a suitable regular expression.
Now define make_new, which calls rollapplyr with a new mean function mean3 that returns the last value of its input if the input vector has length less than 3, and its mean otherwise. Then loop over the names, applying make_new (that is, rollapplyr with FUN=mean3 and partial=TRUE) to each data frame.
library(zoo)
env <- .GlobalEnv
dfnames <- Filter(function(x) is.data.frame(get(x, env)), ls(env))
# make_new - first version
mean3 <- function(x, k = 3) if (length(x) < k) tail(x, 1) else mean(x)
make_new <- function(df) replace(df, -1, rollapplyr(df[-1], 3, mean3, partial = TRUE))
for(nm in dfnames) env[[paste(nm, "new", sep = "_")]] <- make_new(get(nm, env))
Alternative version of make_new
An alternative to the first version of make_new shown above is the following second version. In the second version instead of defining mean3 we use just plain mean but specify a vector of widths w in rollapplyr such that w equals c(1, 1, 3, 3, ..., 3). Thus it takes the mean of just the last element for the first two input components and the mean of the 3 last elements for the rest. Note that now that we specify the widths explicitly we no longer need to specify partial= .
# make_new -- second version
make_new <- function(df) {
w <- replace(rep(3, nrow(df)), 1:2, 1)
replace(df, -1, rollapplyr(df[-1], w, mean))
}
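To make the width vector concrete, here is a base-R sketch (no zoo) of what a right-aligned window with per-position widths computes for one made-up column: position i averages the w[i] values ending at i. Under the answer's description, rollapplyr(x, w, mean) should give the same values.

```r
x <- c(1, 2, 3, 4, 5, 6)
w <- replace(rep(3, length(x)), 1:2, 1)   # c(1, 1, 3, 3, 3, 3)

# Window for position i covers x[(i - w[i] + 1):i]
res <- sapply(seq_along(x), function(i) mean(x[(i - w[i] + 1):i]))
res   # 1 2 2 3 4 5
```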
Note
Normally when writing R and manipulating a set of objects, one stores the objects in a list rather than leaving them loose in the global environment. We could create such a list L like this and then use lapply to create a second list L2 containing the new versions. Either version of make_new would work here.
L <- mget(dfnames, env)
L2 <- lapply(L, make_new)

R: How do you loop a linear model over a list of data frames?

I have a list of data frames called AllFramesCoeff. I want to generate a random number, have my list of data frames reference that number to refer to a random data frame and use a for loop over one of the 185 data frames in the list for two specific columns with an lm model. I want it to do 1000 random tests.
I also want to put the lm coefficient results in an object, probably a vector.
My plan is to later go back and create histograms, distributions and maybe plugin new columns to repeat it.
What I've tried:
m <- matrix(0, ncol = 2)
CorrResults<- as.data.frame(m)
for (i in length(WaFramesCoeff)) function() {
r <- sample(185, 1)
CorrLM <-lm( WaFramesCoeff[i]$ `Nights_&_Weekends_Min_Used` ~ WaFramesCoeff[i]$ `Taxes,_Surcharges_and_Fees` ,data=WaFramesCoeff[i] )
CorrResults[i,]<- CorrLM$Coeff
}
and then:
m <- matrix(0, ncol = 2)
CorrResults<- as.data.frame(m)
for (i in length(WaFramesCoeff)) {
r <- sample(185, 1)
function(x){
CorrLM <-lm( x$ `Nights_&_Weekends_Min_Used` ~ x$ `Taxes,_Surcharges_and_Fees` ,data=x )
}
CorrResults[i,]<- CorrLM$Coeff
}
I know this site prefers reproducible data so I apologize for the lack of it. I and a peer could not figure this out; I'm sure it's obvious but I've exhausted all my knowledge.
EDIT:
I came closer. But each of the 1000 only shows me the intercept. Also plot shows only a point so I obviously did not do this right.
CorrResults <- matrix(0, 1,1000)
for (i in 1:1000) {
d <- sample(WaFramesAll,1)
w <- sapply( d, TestLM )
CorrResults[i]<- w
}

Let's step through a different way to imagine doing this kind of thing.
First, know that for loops have their place, and when done properly they can be just as fast as an *apply function. Though your use of the loop is syntactically correct, there are different ways to use it that may make more sense. You are trying to run a series of commands or a function on multiple elements of a list. Imagine this simple plan: for each element in the list, take the first element and then double and square it:
invec <- list(c(21,22),c(23,24),c(25,26))
str(invec)
# List of 3
# $ : num [1:2] 21 22
# $ : num [1:2] 23 24
# $ : num [1:2] 25 26
outvec <- replicate(length(invec), NULL) # preallocate same size
for (i in seq_along(invec)) {
outvec[[i]] <- c(2*invec[[i]][1], invec[[i]][1]^2)
}
str(outvec)
# List of 3
# $ : num [1:2] 42 441
# $ : num [1:2] 46 529
# $ : num [1:2] 50 625
Seems simple enough. Now let's see how to do this same thing with an *apply function:
invec <- list(c(21,22),c(23,24),c(25,26))
outvec <- lapply(invec, function(a) c(2*a[1], a[1]^2))
str(outvec)
# List of 3
# $ : num [1:2] 42 441
# $ : num [1:2] 46 529
# $ : num [1:2] 50 625
The way to read the apply function is "take the list invec, call this function on each element, and capture the results into a list named outvec". The function can be "anonymous" (like it is here), or it can be a "named" function, such as
lapply(invec, max)
# [[1]]
# [1] 22
#
# [[2]]
# [1] 24
#
# [[3]]
# [1] 26
So how does this help your sampling problem? Let me diverge for another second.
Are you aware that you can index a vector or list arbitrarily? For instance:
str(invec[c(1,3,2,3,2,3)])
# List of 6
# $ : num [1:2] 21 22
# $ : num [1:2] 25 26
# $ : num [1:2] 23 24
# $ : num [1:2] 25 26
# $ : num [1:2] 23 24
# $ : num [1:2] 25 26
There are dupes, okay. Let's say we want to grab 1000 random samples from this very short list:
set.seed(3)
ind <- sample(length(invec), size=1000, replace=TRUE)
outvec <- lapply(invec[ind], function(a) 2*a[1])
str(outvec[1:4])
# List of 4
# $ : num 42
# $ : num 50
# $ : num 46
# $ : num 42
Okay, so we've sampled the original list 1000 times and done our processing of it (2*a[1]), and stored the results.
So let's apply this to your scenario. Since your data is sight-unseen, I'll make up some.
set.seed(2)
n <- 20
lst <- lapply(1:185, function(ign) data.frame(x=sample(100,size=n), y=sample(100,size=n)))
str(lst[1:2])
# List of 2
# $ :'data.frame': 20 obs. of 2 variables:
# ..$ x: int [1:20] 19 70 57 17 91 90 13 78 44 51 ...
# ..$ y: int [1:20] 67 39 83 15 34 47 97 96 89 13 ...
# $ :'data.frame': 20 obs. of 2 variables:
# ..$ x: int [1:20] 99 30 12 16 91 76 92 33 47 74 ...
# ..$ y: int [1:20] 78 88 62 26 83 42 37 43 21 7 ...
Now I have a list of 185 data.frames, each with the same two variables x and y. Let's apply your question to this data. Oh, and randomness can be time-consuming. (BTW: it is much faster to get 1000 random numbers once than 1 random number 1000 times.)
ind <- sample(185, size=1000, replace=TRUE)
Now, lst[ind] will be a list, 1000 elements long, each a random selection from the original list.
lms <- lapply(lst[ind], function(a) lm(y~x, data=a))
(The lm part can be whatever you need, as long as it is the same regression applied to all elements. The code in the function can be as long as you need, so perhaps think of it this way:
lms <- lapply(lst[ind], function(a) {
z <- lm(y~x, data=a)
return(z)
})
Does that make sense?) Okay, let's look at some of the output:
summary(lms[[1]])
# Call:
# lm(formula = y ~ x, data = a)
# Residuals:
# Min 1Q Median 3Q Max
# -53.944 -13.463 -1.239 15.473 44.430
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 80.4523 15.3577 5.239 5.56e-05 ***
# x -0.4217 0.2499 -1.687 0.109
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 25.23 on 18 degrees of freedom
# Multiple R-squared: 0.1366, Adjusted R-squared: 0.0886
# F-statistic: 2.847 on 1 and 18 DF, p-value: 0.1088
summary(lms[[2]])
# Call:
# lm(formula = y ~ x, data = a)
# Residuals:
# Min 1Q Median 3Q Max
# -55.108 -20.653 -0.465 18.827 42.747
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 60.7651 12.2366 4.966 1e-04 ***
# x -0.1898 0.2060 -0.922 0.369
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 27.37 on 18 degrees of freedom
# Multiple R-squared: 0.04506, Adjusted R-squared: -0.007996
# F-statistic: 0.8493 on 1 and 18 DF, p-value: 0.3689
"But I don't need the whole model, I just want the coefficients!" Sure, you're right. When you know you only need one thing, you can obviously just cut-to-the-chase and get that directly (such as coef(lm(y~x,data=a))). So, instead of me re-running the 1000 regressions of random samples, I can just do another lapply:
coefs <- lapply(lms, coef)
str(coefs[1:3])
# List of 3
# $ : Named num [1:2] 80.452 -0.422
# ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
# $ : Named num [1:2] 60.77 -0.19
# ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
# $ : Named num [1:2] 53.716 -0.189
# ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
In this case, I actually have a couple of options. I can either stick with this and "rbind" (row-bind) them together, with
head(do.call(rbind, coefs))
# (Intercept) x
# [1,] 80.45230 -0.42173749
# [2,] 60.76507 -0.18979726
# [3,] 53.71643 -0.18883933
# [4,] 49.51803 0.01494021
# [5,] 49.51803 0.01494021
# [6,] 68.25463 -0.25840920
Or I could have used sapply ("simple apply"), which by default tries to simplify the results into a vector or matrix. If any of the returned values differ in length from the others, it cannot simplify and returns a list instead. (Because of this, it might be more programmatically defensible not to simplify, do some sanity checks, and then rbind them.)
coefs2 <- t(sapply(lms, coef))
head(coefs2)
# (Intercept) x
# [1,] 80.45230 -0.42173749
# [2,] 60.76507 -0.18979726
# [3,] 53.71643 -0.18883933
# [4,] 49.51803 0.01494021
# [5,] 49.51803 0.01494021
# [6,] 68.25463 -0.25840920
Notice that I had to transpose the output: it's a little kooky and counter-intuitive in that the output (without t(...)) will have 2 rows (one for each regression coefficient) and 1000 columns. So we transpose it, since I for one naturally think of it as row-per-model. This is not required if you can handle it as column-per-model.
So bottom line, your for loop is not syntactically wrong per se, but if you think about doing ONE thing to a vector/list of MANY things in this fashion, you will get significant speed improvements (in this case) and, arguably, once you understand it, much more readable code.
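Putting the pieces together for the original task, here is a sketch with simulated data: x and y are stand-ins for the asker's columns. With the real non-syntactic column names, the formula would refer to them inside the data frame with backticks, e.g. lm(`Nights_&_Weekends_Min_Used` ~ `Taxes,_Surcharges_and_Fees`, data = a), rather than via WaFramesCoeff[i]$.... vapply() is used here as a stricter variant of sapply() that checks each fit returns exactly two numbers.

```r
set.seed(2)
# Hypothetical stand-in for the list of 185 data frames
lst <- lapply(1:185, function(ign) data.frame(x = rnorm(20), y = rnorm(20)))

# Draw all 1000 random indices at once
ind <- sample(length(lst), size = 1000, replace = TRUE)

# One regression per sampled frame; vapply errors if a result is not 2 numbers
coefs3 <- t(vapply(lst[ind], function(a) coef(lm(y ~ x, data = a)),
                   FUN.VALUE = numeric(2)))

dim(coefs3)   # 1000 2, one row per model
# e.g. hist(coefs3[, "x"]) for the distribution of slopes
```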

Extract multiple objects from list in R

I have some output from the vegan function specaccum. It is a list of 8 objects of varying lengths;
> str(SPECIES)
List of 8
$ call : language specaccum(comm = PRETEND.DATA, method = "rarefaction")
$ method : chr "rarefaction"
$ sites : num [1:5] 1 2 3 4 5
$ richness : num [1:5] 20.9 34.5 42.8 47.4 50
$ sd : num [1:5] 1.51 2.02 1.87 1.35 0
$ perm : NULL
$ individuals: num [1:5] 25 50 75 100 125
$ freq : num [1:50] 1 2 3 2 4 3 3 3 4 2 ...
- attr(*, "class")= chr "specaccum"
I want to extract three of the lists ('richness', 'sd' and 'individuals') and convert them to columns in a data frame. I have developed a workaround;
SPECIES.rich <- data.frame(SPECIES[["richness"]])
SPECIES.sd <- data.frame(SPECIES[["sd"]])
SPECIES.individuals <- data.frame(SPECIES[["individuals"]])
SPECIES.df <- cbind(SPECIES.rich, SPECIES.sd, SPECIES.individuals)
But this seems clumsy and protracted. I wonder if anyone could suggest a neater solution? (Should I be looking at something with lapply??) Thanks!
Example data to generate the specaccum output;
set.seed(100)
PRETEND.DATA <- matrix(sample(0:1, 250, replace = TRUE), 5, 50)
library(vegan)
SPECIES <- specaccum(PRETEND.DATA, method = "rarefaction")

We can concatenate the names in a vector and extract it
SPECIES.df <- data.frame(SPECIES[c("richness", "sd", "individuals")])
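A self-contained sketch of the same idea that does not need vegan: a plain list with hypothetical values shaped like the specaccum output. Single-bracket subsetting by a character vector returns a smaller list, and data.frame() turns each equal-length element into a column named after it.

```r
# Hypothetical stand-in for the specaccum object
SPECIES <- list(method = "rarefaction",
                richness = c(20.9, 34.5, 42.8, 47.4, 50),
                sd = c(1.51, 2.02, 1.87, 1.35, 0),
                individuals = c(25, 50, 75, 100, 125),
                freq = c(1, 2, 3, 2, 4, 3, 3))

SPECIES.df <- data.frame(SPECIES[c("richness", "sd", "individuals")])
str(SPECIES.df)   # 5 obs. of 3 variables: richness, sd, individuals
```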

Another alternative, similar to akrun's answer, is:
ctoc1 = as.data.frame(cbind(SPECIES$richness, SPECIES$sd, SPECIES$individuals))
Please note that in both cases (my answer and akrun's) you will get an error if the lengths of the columns do not match.
e.g.: SPECIES.df <- data.frame(SPECIES[c( "sd", "freq")])
Error in data.frame(richness = c(20.5549865665613, 33.5688503093388, 41.4708434700877, :
arguments imply differing number of rows:7, 47
If so, remember to use the length() replacement function:
length(SPECIES$sd) <- 47 # this will add NAs to increase the column length.
SPECIES.df <- data.frame(SPECIES[c("sd", "freq")])
SPECIES.df # data frame with 2 columns and 47 rows.
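Rather than hard-coding the target length, the padding can be generalised. This sketch uses a hypothetical helper, pad_to_longest(), that extends every element of a list to the length of the longest one with length<- (which fills with NA) before building the data frame.

```r
# Hypothetical helper: pad each vector with NA up to the longest length
pad_to_longest <- function(lst) {
  n <- max(lengths(lst))
  lapply(lst, function(v) { length(v) <- n; v })
}

parts <- list(sd = c(1.51, 2.02, 1.87), freq = c(1, 2, 3, 2, 4))
padded_df <- data.frame(pad_to_longest(parts))
padded_df$sd   # 1.51 2.02 1.87 NA NA
```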
