Transform a complicated list to a data frame - R

Sorry if this question has been asked before, but I am totally new and I tried what was suggested in other posts here with no result. For example, I tried:
do.call("rbind", lapply(MET1, as.data.frame))
But it said:
Error in as.data.frame.default(X[[i]], ...) : cannot coerce class ‘"mixEM"’ to a data.frame
I have this list of lists (MET1) with 7 elements (shown below) and I want to transform it into a simplified data frame. Each row of the data frame should be one of the elements, and I only need the lambda, mu and sigma values. So basically something like this:
LAMBDA1 LAMBDA2 MU1 MU2 SIGMA1 SIGMA2
0102-A451 0.822 0.178 1711 10850 249 14986
0102-A453 0.813 0.187 1491 4031 108 6877
...
My list of lists is:
str(MET1)
List of 7
$ 0102-A451:List of 9
..$ x : num [1:178] 2088 1579 1638 1507 1862 ...
..$ lambda : num [1:2] 0.822 0.178
..$ mu : num [1:2] 1711 10850
..$ sigma : num [1:2] 249 14986
..$ loglik : num -1440
..$ posterior : num [1:178, 1:2] 0.991 0.997 0.997 0.996 0.996 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "comp.1" "comp.2"
..$ all.loglik: num [1:16] -1703 -1518 -1472 -1450 -1442 ...
..$ restarts : num 0
..$ ft : chr "normalmixEM"
..- attr(*, "class")= chr "mixEM"
$ 0102-A453:List of 9
..$ x : num [1:663] 1414 1506 1399 1423 1421 ...
..$ lambda : num [1:2] 0.813 0.187
..$ mu : num [1:2] 1491 4031
..$ sigma : num [1:2] 108 6877
..$ loglik : num -4847
..$ posterior : num [1:663, 1:2] 0.996 0.997 0.995 0.996 0.996 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "comp.1" "comp.2"
..$ all.loglik: num [1:29] -5760 -4983 -4883 -4861 -4853 ...
..$ restarts : num 0
..$ ft : chr "normalmixEM"
..- attr(*, "class")= chr "mixEM"
...

Here's a tidyverse solution. First we extract the relevant elements of each list and convert them to a tibble (also adding row numbers). Then we bind the tibbles and do the standard tidyr gymnastics:
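library(tidyverse) # provides purrr, tibble and tidyr
MET1 %>%
  # keep only lambda, mu and sigma; one tibble per element, tagged with an id
  map_dfr(~as_tibble(.x[c("lambda","mu","sigma")]) %>% rownames_to_column,
          .id="id") %>%
  gather(,,-rowname,-id) %>%
  unite(key,key,rowname) %>%
  spread(key,value)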
# # A tibble: 2 x 7
# id lambda_1 lambda_2 mu_1 mu_2 sigma_1 sigma_2
# * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 0.822 0.178 1711 10850 249 14986
# 2 2 0.813 0.187 1491 4031 108 6877
data:
MET1 <- list(
list(lambda = c(0.822, 0.178),
mu = c(1711, 10850),
sigma = c(249, 14986),
something_else="whatever"),
list(lambda = c(0.813, 0.187),
mu = c(1491, 4031),
sigma = c(108, 6877),
something_else="whatever")
)

Overview
Use lapply() to transform each list within MET1 into a wide data frame. Then use rbind() within do.call() to collapse all the data frames within MET1 into one data frame. Thank you @Moody_Mudskipper for creating the reproducible data.
# load data
MET1 <-
list(
`0102-A451` = list(
lambda = c(0.822, 0.178),
mu = c(1711, 10850),
sigma = c(249, 14986),
something_else = "whatever"
)
, `0102-A453` = list(
lambda = c(0.813, 0.187),
mu = c(1491, 4031),
sigma = c(108, 6877),
something_else = "whatever"
)
)
# Transform MET1
# so that each list is a wide data frame
MET1 <-
lapply(
X = MET1
, FUN = function( i )
data.frame(
LAMBDA_1 = i[["lambda"]][1]
, LAMBDA_2 = i[["lambda"]][2]
, MU_1 = i[["mu"]][1]
, MU_2 = i[["mu"]][2]
, SIGMA_1 = i[["sigma"]][1]
, SIGMA_2 = i[["sigma"]][2]
)
)
# now transform MET1
# into one data frame
# one row for each data frame within MET1
MET1 <-
do.call(
what = "rbind"
, args = MET1
)
# view results
MET1
# LAMBDA_1 LAMBDA_2 MU_1 MU_2 SIGMA_1
# 0102-A451 0.822 0.178 1711 10850 249
# 0102-A453 0.813 0.187 1491 4031 108
# SIGMA_2
# 0102-A451 14986
# 0102-A453 6877
# end of script #
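Both answers above hard-code two mixture components. As a minimal base R sketch (not taken from either answer), the same table can be built for any number of components by unlisting the three elements of interest. The `fits` object below is a hypothetical stand-in for MET1 that only mimics the "mixEM" class attribute; unclass() is used so that the objects are treated as plain lists, which is what as.data.frame() could not do in the question.

```r
# Hypothetical stand-in for MET1: plain lists carrying the "mixEM" class
fits <- list(
  `0102-A451` = structure(
    list(x = c(2088, 1579), lambda = c(0.822, 0.178),
         mu = c(1711, 10850), sigma = c(249, 14986), ft = "normalmixEM"),
    class = "mixEM"
  ),
  `0102-A453` = structure(
    list(x = c(1414, 1506), lambda = c(0.813, 0.187),
         mu = c(1491, 4031), sigma = c(108, 6877), ft = "normalmixEM"),
    class = "mixEM"
  )
)

# Flatten lambda, mu and sigma of each fit into one named numeric vector,
# then transpose so that each fit becomes one row
params <- t(sapply(fits, function(fit) {
  unlist(unclass(fit)[c("lambda", "mu", "sigma")])
}))

params_df <- as.data.frame(params)
params_df
```

unlist() appends the component index to each name, so the columns come out as lambda1, lambda2, mu1, mu2, sigma1 and sigma2, and the row names are taken from the names of the list.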

Related

R Importing ARIMA model outputs to use in forecast

I have undertaken ARIMA modelling using the auto.arima function for 91 models. The outputs are sitting in a list of lists.
The structure of the outputs for one model looks like the following:
List of 19
$ coef : Named num [1:8] -3.17e-01 -3.78e-01 -8.02e-01 -5.39e+04 -1.33e+05 ...
..- attr(*, "names")= chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ sigma2 : num 6.37e+10
$ var.coef : num [1:8, 1:8] 1.84e-02 8.90e-03 -7.69e-03 -8.80e+02 2.83e+03 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ mask : logi [1:8] TRUE TRUE TRUE TRUE TRUE TRUE ...
$ loglik : num -1189
$ aic : num 2395
$ arma : int [1:7] 2 1 0 0 1 1 0
$ residuals: Time-Series [1:87] from 1 to 87: 1810 -59503 263294 240970 94842 ...
$ call : language auto.arima(y = x[, 2], stepwise = FALSE, approximation = FALSE, xreg = x[, 3:ncol(x)], x = list(x = c(1856264.57,| __truncated__ ...
$ series : chr "x[, 2]"
$ code : int 0
$ n.cond : int 0
$ nobs : int 86
$ model :List of 10
..$ phi : num [1:2] -0.317 -0.378
..$ theta: num -0.802
..$ Delta: num 1
..$ Z : num [1:3] 1 0 1
..$ a : num [1:3] -599787 284456 1887763
..$ P : num [1:3, 1:3] 0.00 0.00 -4.47e-23 0.00 3.33e-16 ...
..$ T : num [1:3, 1:3] -0.317 -0.378 1 1 0 ...
..$ V : num [1:3, 1:3] 1 -0.802 0 -0.802 0.643 ...
..$ h : num 0
..$ Pn : num [1:3, 1:3] 1.00 -8.02e-01 -1.83e-23 -8.02e-01 6.43e-01 ...
$ bic : num 2417
$ aicc : num 2398
$ xreg : Time-Series [1:87, 1:5] from 1 to 87: -0.866 -0.466 -1.383 -0.999 -0.383 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "Price.Diff" "Easter" "Christmas" "High.Week" ...
$ x : Time-Series [1:87] from 1 to 87: 1856265 1393925 2200962 2209996 2161707 ...
$ fitted : Time-Series [1:87] from 1 to 87: 1854455 1453429 1937668 1969026 2066864 ...
- attr(*, "class")= chr [1:3] "ARIMA" "forecast_ARIMA" "Arima"
When printed the output looks as follows:
Series: x[, 2]
Regression with ARIMA(2,1,1) errors
Coefficients:
ar1 ar2 ma1 Price.Diff Easter Christmas High.Week Low.Week
-0.3170 -0.3777 -0.8017 -53931.11 -133187.55 -53541.62 -347146.59 216202.71
s.e. 0.1356 0.1319 0.1069 28195.33 68789.25 23396.62 -74115.78 66881.15
sigma^2 estimated as 6.374e+10: log likelihood=-1188.69
AIC=2395.38 AICc=2397.75 BIC=2417.47
I have written the following to export my models to text file format:
# export model outputs to newly created folder
for(i in 1:length(ts_outputs)){
sink(paste0(names(ts_outputs[i]), ".txt"))
print(ts_outputs[i])
sink()
}
This works for viewing the model outputs themselves; however, I need to be able to import the model outputs back into R to use them to forecast my time series forward.
I am assuming that I need to put them back into the original structure once re-imported.
Is there a certain package that has already been written to do this?
Are text files the way to go for the original exporting?
I believe the following is the source code from the forecast package which writes the outputs (https://rdrr.io/github/ttnsdcn/forecast-package/src/R/arima.R):
if (length(x$coef) > 0) {
cat("\nCoefficients:\n")
coef <- round(x$coef, digits=digits)
if (se && nrow(x$var.coef)) {
ses <- rep(0, length(coef))
ses[x$mask] <- round(sqrt(diag(x$var.coef)), digits=digits)
coef <- matrix(coef, 1, dimnames=list(NULL, names(coef)))
coef <- rbind(coef, s.e.=ses)
}
print.default(coef, print.gap=2)
}
cm <- x$call$method
if (is.null(cm) || cm != "CSS")
{
cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": log likelihood=", format(round(x$loglik, 2)),"\n",sep="")
npar <- length(x$coef) + 1
nstar <- length(x$residuals) - x$arma[6] - x$arma[7]*x$arma[5]
bic <- x$aic + npar*(log(nstar) - 2)
aicc <- x$aic + 2*npar*(nstar/(nstar-npar-1) - 1)
cat("AIC=", format(round(x$aic, 2)), sep="")
cat(" AICc=", format(round(aicc, 2)), sep="")
cat(" BIC=", format(round(bic, 2)), "\n",sep="")
}
else cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": part log likelihood=", format(round(x$loglik, 2)),
"\n", sep="")
invisible(x)
}
Appreciate any direction/advice.
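One possible direction, offered as a sketch rather than a definitive answer: the printed text keeps only a summary of each fit, so it may be simpler to serialise the model objects themselves with base R's saveRDS() and read them back with readRDS(). The example below uses stats::arima() on the built-in lh series as a stand-in for the auto.arima() fits, and the file name is hypothetical.

```r
# Stand-in model: base R arima() on a built-in series (assumption, not the
# asker's auto.arima() fits)
fit <- arima(lh, order = c(1, 0, 0))

# Write the whole fitted object to disk (hypothetical file name)
model_file <- file.path(tempdir(), "model_1.rds")
saveRDS(fit, model_file)

# Later, or in another session: restore the object with its full structure
restored <- readRDS(model_file)

# The restored object can be used for prediction like the original
fc <- predict(restored, n.ahead = 5)
fc$pred
```

The models in the question include xreg regressors, so forecasting from a restored object would also need future values of those regressors.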

R - How to extract slope and intercept from lm.fit?

I need a faster way of doing linear regression than the lm() method. I found that lm.fit() is quite a bit faster but I'm wondering how to use the results. For example using this code:
x = 1:5
y = 5:1
regr = lm.fit(as.matrix(x), y)
str(regr)
Outputs:
List of 8
$ coefficients : Named num 0.636
..- attr(*, "names")= chr "x1"
$ residuals : num [1:5] 4.364 2.727 1.091 -0.545 -2.182
$ effects : Named num [1:5] -4.719 1.69 -0.465 -2.619 -4.774
..- attr(*, "names")= chr [1:5] "x1" "" "" "" ...
$ rank : int 1
$ fitted.values: num [1:5] 0.636 1.273 1.909 2.545 3.182
$ assign : NULL
$ qr :List of 5
..$ qr : num [1:5, 1] -7.416 0.27 0.405 0.539 0.674
..$ qraux: num 1.13
..$ pivot: int 1
..$ tol : num 1e-07
..$ rank : int 1
..- attr(*, "class")= chr "qr"
$ df.residual : int 4
I'm expecting intercept = 6 and slope = -1 but the result above doesn't contain anything near that. Also, does lm.fit() output R squared?
lm.fit() requires doing things much more manually, so, as @MrFlick commented, we must include the intercept ourselves by using cbind(1, x) as the design matrix. The R^2 is not provided, but we can easily compute it:
x <- 1:5
y <- 5:1 + rnorm(5)
regr <- lm.fit(cbind(1, x), y)
regr$coef
# x
# 5.2044349 -0.5535963
1 - var(regr$residuals) / var(y) # R^2
# [1] 0.3557227
1 - var(regr$residuals) / var(y) * (length(y) - 1) / regr$df.residual # Adj. R^2
# [1] 0.1409636
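For the noise-free data in the question, the same approach can be checked directly. This is a small sketch; the column names "intercept" and "slope" are labels chosen here for readability, not something lm.fit() requires.

```r
x <- 1:5
y <- 5:1

# Design matrix with an explicit column of ones for the intercept
X <- cbind(intercept = 1, slope = x)
fit <- lm.fit(X, y)

# Coefficients are named after the design matrix columns
coefs <- fit$coefficients
coefs

# R^2 as 1 - residual sum of squares / total sum of squares
r_squared <- 1 - sum(fit$residuals^2) / sum((y - mean(y))^2)
r_squared
```

With the intercept column included, the fit recovers the expected intercept of 6 and slope of -1 (up to floating-point error), and R^2 is 1 because the points lie exactly on a line.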

Using apply over two lists of different lengths

This question is related to my earlier question found here: https://stackoverflow.com/questions/33089532/r-accounting-for-a-factor-with-this-logistic-regression-function-replace-lappl
I realize that I didn't do a good job of asking the first question, so here is a simpler analog with actual data:
My data looks something like this:
# data look like this, but with a variable number of "y" columns
wk <- rep(1:50, 2)
X <- rnorm(100, 1)
y1 <- rnorm(100, 1)
y2 <- rnorm(100, 1)
df1<-as.data.frame(cbind(wk,X,y1,y2))
df1$hyst<-ifelse(df1$wk>=5 & df1$wk<32, "R", "F")
Y<-df1[, -which(colnames(df1) %in% c("wk"))] #this step makes more sense with my actual data since I have a bunch of columns to remove
l1<-length(Y)-1
lst1<-lapply(2:l1,function(x){colnames(Y[x])})
dflst<-c("Y",'Y[Y$hyst=="R",]','Y[Y$hyst=="F",]')
I want to run a model over all Y columns for the full data set (all data) and for two subsets, when the factor hyst=="R" and when hyst=="F".
To do this, I have nested two lapply functions, which sort of works, but I think it essentially doubles my results and is causing me all sorts of list headaches.
Here is the nested lapply code:
lms <- lapply(dflst, function(z){
lapply(lst1, function(y) {
form <- paste0(y, " ~ X")
lm(form, data=eval(parse(text=z)))
})
})
How can I replace or modify the nested lapply function to obtain a model run for each Y column for each data set (all, "R", and "F")?
Construct your DF list like
DFlst <- c(list(full=Y), split(Y, Y$hyst))
str(DFlst)
List of 3
$ full:'data.frame': 100 obs. of 4 variables:
..$ X : num [1:100] 1.792 3.192 0.367 1.632 1.388 ...
..$ y1 : num [1:100] 3.354 1.189 1.99 0.639 0.1 ...
..$ y2 : num [1:100] 0.864 2.415 0.437 1.069 1.368 ...
..$ hyst: chr [1:100] "F" "F" "F" "F" ...
$ F :'data.frame': 46 obs. of 4 variables:
..$ X : num [1:46] 1.792 3.192 0.367 1.632 0.707 ...
..$ y1 : num [1:46] 3.354 1.189 1.99 0.639 0.894 ...
..$ y2 : num [1:46] 0.864 2.415 0.437 1.069 1.213 ...
..$ hyst: chr [1:46] "F" "F" "F" "F" ...
$ R :'data.frame': 54 obs. of 4 variables:
..$ X : num [1:54] 1.388 2.296 0.409 1.494 0.943 ...
..$ y1 : num [1:54] 0.1002 0.6425 -0.0918 1.199 0.8767 ...
..$ y2 : num [1:54] 1.368 1.122 0.402 -0.237 1.518 ...
..$ hyst: chr [1:54] "R" "R" "R" "R" ...
Do some regressions:
res <- lapply(DFlst, function(DF) {
cols = grep("^y[0-9]+$",names(DF),value=TRUE)
lapply(setNames(cols,cols),
function(y) lm(paste(y,"~X"), data=DF))
})
str(res, list.len=2, give.attr=FALSE)
List of 3
$ full:List of 2
..$ y1:List of 12
.. ..$ coefficients : Named num [1:2] 0.903 0.111
.. ..$ residuals : Named num [1:100] 2.2509 -0.0698 1.046 -0.4464 -0.9578 ...
.. .. [list output truncated]
..$ y2:List of 12
.. ..$ coefficients : Named num [1:2] 1.423 -0.166
.. ..$ residuals : Named num [1:100] -0.2623 1.5213 -0.9253 -0.0837 0.1751 ...
.. .. [list output truncated]
$ F :List of 2
..$ y1:List of 12
.. ..$ coefficients : Named num [1:2] 0.9289 0.0769
.. ..$ residuals : Named num [1:46] 2.2871 0.0146 1.0332 -0.4157 -0.0889 ...
.. .. [list output truncated]
..$ y2:List of 12
.. ..$ coefficients : Named num [1:2] 1.4177 -0.0789
.. ..$ residuals : Named num [1:46] -0.413 1.25 -0.952 -0.22 -0.149 ...
.. .. [list output truncated]
[list output truncated]
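Once the models sit in a named nested list like this, the quantities of interest can be pulled out with another nested apply. The following is a self-contained sketch: it regenerates data in the same shape as the question (with a seed, so the numbers will differ from the output above) and collects the slope on X for every data set and response.

```r
set.seed(1)

# Data in the same shape as the question
wk <- rep(1:50, 2)
Y <- data.frame(X = rnorm(100, 1), y1 = rnorm(100, 1), y2 = rnorm(100, 1))
Y$hyst <- ifelse(wk >= 5 & wk < 32, "R", "F")

# Full data plus one data frame per level of hyst
DFlst <- c(list(full = Y), split(Y, Y$hyst))

# One model per response column per data set
res <- lapply(DFlst, function(DF) {
  cols <- grep("^y[0-9]+$", names(DF), value = TRUE)
  lapply(setNames(cols, cols), function(y) lm(paste(y, "~ X"), data = DF))
})

# Slopes: rows are responses (y1, y2), columns are data sets (full, F, R)
slopes <- sapply(res, function(models) {
  sapply(models, function(m) coef(m)[["X"]])
})
slopes
```

Because both levels of the list are named, the resulting matrix carries the response names as row names and the data set names as column names.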

how to extract data from a list in R

I think the question is correctly phrased, but I'm not sure.
I have a function which basically calculates an agreement statistic (kappa) between two columns in a series of data frames. The problem is that the output is a list of lists (I think), so I'm not sure how to get the values I want. Ideally I would like to plot value versus the list name (total..).
Here is the function
lst <- mget(ls(pattern='total\\d+'))
classify_cnv = function (column)
ifelse(column < 2, 1, ifelse(column > 2, 3, 2))
classify_all_cnvs = function (df) {
df$CopyNumber.x = classify_cnv(df$CopyNumber.x)
df$CopyNumber.y = classify_cnv(df$CopyNumber.y)
df
}
result = lapply(lst, classify_all_cnvs)
more<-lapply(result, function(kv){
kappa2(kv[,c(5,8)], "squared")})
the resulting output is
....
$total7
Cohen's Kappa for 2 Raters (Weights: squared)
Subjects = 601
Raters = 2
Kappa = 0.02
z = 0.624
p-value = 0.533
$total8
Cohen's Kappa for 2 Raters (Weights: squared)
Subjects = 620
Raters = 2
Kappa = 0.219
z = 7.27
p-value = 0.000000000000352
....
str(more) gives me
$ total7 :List of 8
..$ method : chr "Cohen's Kappa for 2 Raters (Weights: squared)"
..$ subjects : int 601
..$ raters : int 2
..$ irr.name : chr "Kappa"
..$ value : num 0.02
..$ stat.name: chr "z"
..$ statistic: num 0.624
..$ p.value : num 0.533
..- attr(*, "class")= chr "irrlist"
$ total8 :List of 8
..$ method : chr "Cohen's Kappa for 2 Raters (Weights: squared)"
..$ subjects : int 620
..$ raters : int 2
..$ irr.name : chr "Kappa"
..$ value : num 0.219
..$ stat.name: chr "z"
..$ statistic: num 7.27
..$ p.value : num 0.000000000000352
..- attr(*, "class")= chr "irrlist"
I'd like to end up with a simple dataframe with two columns, one for the name of the parent list (total..) and the other for the value.
I'm guessing the "value" you meant is the field value in your list.
df <- data.frame(name=names(more),
value=sapply(more, function(x) x$value))
This creates a data frame with the following content:
> df
name value
total7 total7 0.020
total8 total8 0.219
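A slightly stricter variant, sketched here with a hypothetical stand-in for `more` (plain lists with the "irrlist" class, so the irr package is not needed to run it), uses vapply() to guarantee one numeric value per element and drops the duplicated row names:

```r
# Hypothetical stand-in for the list returned by lapply(..., kappa2)
more <- list(
  total7 = structure(list(value = 0.02, statistic = 0.624, p.value = 0.533),
                     class = "irrlist"),
  total8 = structure(list(value = 0.219, statistic = 7.27, p.value = 3.52e-13),
                     class = "irrlist")
)

# vapply() fails loudly if any element does not yield exactly one number;
# unname() keeps the list names out of the row names
kappa_df <- data.frame(
  name  = names(more),
  value = unname(vapply(more, function(x) x$value, numeric(1)))
)
kappa_df
```

The resulting two-column data frame can then be passed to a plotting function with name on one axis and value on the other.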

Vegan NA values are breaking envfit, even with na.rm = T. Example images in post

I am trying to overlay envfit arrows onto an NMDS chart like this one (which is what I get when I replace missing values with fake numbers):
However, with our actual data, it doesn't draw the arrows but labels each point individually, like this:
Any suggestions would be appreciated.
Code:
# Make MDS
x.mds <- metaMDS(x_matrix, trace = FALSE)
# Extract point co-ordinates for putting into ggplot
NMDS <- data.frame(MDS1 = x.mds$points[,1], MDS2 = x.mds$points[,2])
p <- ggplot(NMDS, aes(MDS1, MDS2))
p + geom_point()
#environmental variables
ef <- envfit(x.mds ~ pH + Ammonia + DO, x.env)
ef <- envfit(x.mds ~ pH + Ammonia + DO, x.env, na.rm = TRUE) ##ALTERNATIVE
plot(ef)
Data:
Sample Region pH Ammonia Nitrate BOD DO
15 N 7.618 0.042 0.845 1 NA
34 N 7.911 0.04 7.41 8 5.62
42 SE 7.75 NA 3.82 1 21.629
........
> ef
***VECTORS
NMDS1 NMDS2 r2 Pr(>r)
pH 0.50849 -0.86107 0.0565 0.719
Ammonia 0.99050 -0.13751 0.0998 0.504
DO -0.88859 -0.45871 0.1640 0.319
P values based on 999 permutations.
1 observation deleted due to missingness
> str(ef)
List of 3
$ vectors :List of 4
..$ arrows : num [1:3, 1:2] 0.508 0.991 -0.889 -0.861 -0.138 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:3] "pH" "Ammonia" "DO"
.. .. ..$ : chr [1:2] "NMDS1" "NMDS2"
.. ..- attr(*, "decostand")= chr "normalize"
..$ r : Named num [1:3] 0.0565 0.0998 0.164
.. ..- attr(*, "names")= chr [1:3] "pH" "Ammonia" "DO"
..$ permutations: num 999
..$ pvals : num [1:3] 0.719 0.504 0.319
..- attr(*, "class")= chr "vectorfit"
$ factors : NULL
$ na.action:Class 'omit' int 17
- attr(*, "class")= chr "envfit"
