How to flatten nested list keeping inner structure - r

Let's say I have the following list (result), which is a nested list that captures information from a model: parameters (betas) and their standard errors (sd), additionally with some information regarding the global model (method) and the number of observations (n).
I want to flatten the lists betas and sd while distinguishing where each value of x1 and x2 comes from (i.e. if they are from betas or sd).
Please consider the following example:
result<- list(n = 100,
method = "tree",
betas = list(x1 = 1.47,
x2 = -2.85),
sd = list(x1 = 0.55,
x2 = 0.25))
str(result)
# List of 4
# $ n : num 100
# $ method : chr "tree"
# $ betas :List of 2
# ..$ x1: num 1.47
# ..$ x2: num -2.85
# $ sd :List of 2
# ..$ x1: num 0.55
# ..$ x2: num 0.25
First attempt: flatten(). [Spoiler(!): I lose track of where each value comes from]
## I can't distinguish between betas and sd.
flatten(result)
# $n
# [1] 100
#
# $method
# [1] "tree"
#
# $x1
# [1] 1.47
#
# $x2
# [1] -2.85
#
# $x1
# [1] 0.55
#
# $x2
# [1] 0.25
Second attempt: unlist(). [Spoiler(!), I need a list, not an atomic vector]
#I need a list
unlist(result)
# n method betas.x1 betas.x2 sd.x1 sd.x2
# "100" "tree" "1.47" "-2.85" "0.55" "0.25"
Desired Output.
list(n = 100,
method = "tree",
betas.x1 = 1.47,
betas.x2 = -2.85,
sd.x1 = 0.55,
sd.x2 = 0.25)
# List of 6
# $ n : num 100
# $ method : chr "tree"
# $ betas.x1: num 1.47
# $ betas.x2: num -2.85
# $ sd.x1 : num 0.55
# $ sd.x2 : num 0.25

as.data.frame will flatten for you. From ?as.data.frame:
Arrays with more than two dimensions are converted to matrices by
'flattening' all dimensions after the first and creating suitable
column labels.
That passage does a poor job of explaining that the flattening also applies to nested lists, not just arrays. (In other words, as far as I can tell, the docs do not discuss this behavior for non-arrays.)
str(as.data.frame(result))
# 'data.frame': 1 obs. of 6 variables:
# $ n : num 100
# $ method : chr "tree"
# $ betas.x1: num 1.47
# $ betas.x2: num -2.85
# $ sd.x1 : num 0.55
# $ sd.x2 : num 0.25
If you want a list rather than a data.frame, just call as.list on it next:
str(as.list(as.data.frame(result)))
# List of 6
# $ n : num 100
# $ method : chr "tree"
# $ betas.x1: num 1.47
# $ betas.x2: num -2.85
# $ sd.x1 : num 0.55
# $ sd.x2 : num 0.25
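If the intermediate data frame is not needed, one level of nesting can also be removed directly with base R's unlist(recursive = FALSE). This is only a sketch, assuming (as in the question) that every leaf holds a single value and that the nesting is one level deep:

```r
# Same structure as the question's `result`
result <- list(n = 100,
               method = "tree",
               betas = list(x1 = 1.47, x2 = -2.85),
               sd = list(x1 = 0.55, x2 = 0.25))

# recursive = FALSE removes only one level of nesting, so the result
# stays a list and inner names are prefixed with the outer name
flat <- unlist(result, recursive = FALSE)
str(flat)
```

Because the result is still a list, the character value in method is kept alongside the numeric values instead of forcing everything into one type.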

Related

apply() function by row requires each row to be a numeric vector

I have a data frame such as in this example:
sample1 sample2 sample3
test1 0.1 0.3 0.03
test2 0.24 0.4 0.5
...(many rows)
I am trying to apply() the function sumlog() from the metap package, which calculates a combined p-value using Fisher's procedure, to each row of the data frame. Each row consists of 3 p-values obtained from a given test run in 3 independent samples.
This function works like this (for the p-values in the first row):
sumlog(c(0.1, 0.3, 0.03))
but since
dt[1,]
yields
sample1 sample2 sample3
test1 0.6408721 0.2650909 0.8808415
and
class(dt[1,])
is "data.frame", when I run the apply() function
apply(dt, 1, sumlog)
it returns
Error in log(p[keep]) : non-numeric argument to mathematical function
which is the same that I obtain when running the sumlog() function for a single row:
sumlog(dt[1,])
Error in log(p[keep]) : non-numeric argument to mathematical function
How could I make apply() take each row as a numeric vector for sumlog() to process it?
EDIT:
> dput(head(df, 6))
structure(list(sample1 = list(0.640872129337761,
0.609000106674239, 0.895097234385105, 0.965620545232963,
0.383226609468318, 0.577994668964293), sample2 = list(
0.265090939404131, 0.472455371057292, 0.0126943959203454,
0.0968610413223728, 0.881022723350396, 0.311841106080399),
sample3 = list(0.880841481464769, 0.924264965127336,
0.684971652341359, 0.07916491063753, 0.204131282086192,
0.259781528310932)), row.names = c("test1", "test2",
"test3", "test4", "test5", "test6"), class = "data.frame")
Your data has embedded lists ("list-columns").
str(dt)
# 'data.frame': 6 obs. of 3 variables:
# $ sample1:List of 6
# ..$ : num 0.641
# ..$ : num 0.609
# ..$ : num 0.895
# ..$ : num 0.966
# ..$ : num 0.383
# ..$ : num 0.578
# $ sample2:List of 6
# ..$ : num 0.265
# ..$ : num 0.472
# ..$ : num 0.0127
# ..$ : num 0.0969
# ..$ : num 0.881
# ..$ : num 0.312
# $ sample3:List of 6
# ..$ : num 0.881
# ..$ : num 0.924
# ..$ : num 0.685
# ..$ : num 0.0792
# ..$ : num 0.204
# ..$ : num 0.26
While "normal" frames look like:
str(mtcars[,1:3])
# 'data.frame': 32 obs. of 3 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
# $ disp: num 160 160 108 258 360 ...
To do what you need, first you need to unlist each column, then you can do your stuff.
dt[] <- lapply(dt, unlist)
apply(dt, 1, sum)
# test1 test2 test3 test4 test5 test6
# 1.786805 2.005720 1.592763 1.141646 1.468381 1.149617
(The use of dt[] <- instead of just dt <- is because lapply will return a list instead of a data.frame. By using dt[], we're saying "overwrite the columns with this list of objects but preserve dt's class".)
(I know I'm using sum and you're using sumlog, but I believe the premise is still correct and your needs will work with the non-list structure.)
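To make the premise concrete without depending on metap, here is a minimal sketch on a toy frame. The function fisher_combined() is a hypothetical hand-written stand-in for sumlog() (Fisher's method via a chi-squared tail probability), not metap's implementation, and the toy values are taken from the example rows in the question:

```r
# Toy frame with list-columns, built the same way as the dput() output
dt <- structure(
  list(sample1 = list(0.10, 0.24),
       sample2 = list(0.30, 0.40),
       sample3 = list(0.03, 0.50)),
  row.names = c("test1", "test2"), class = "data.frame")

# Hypothetical stand-in for metap::sumlog(): Fisher's combined p-value
fisher_combined <- function(p) {
  statistic <- -2 * sum(log(p))
  pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)
}

dt[] <- lapply(dt, unlist)                 # list-columns -> numeric columns
combined <- apply(dt, 1, fisher_combined)  # one combined p-value per row
combined
```

Once the columns are plain numeric vectors, apply() hands each row to the function as a numeric vector, which is what log() needs.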

How to split list at every 10th item in R?

I have a list of 100 items.
I want to split it after each 10th item in Code 1.
Code 2 is about a list of two former lists and splitting it to 20 lists of 10 items each.
Code 1
Expected output: ten lists of 10 items.
A <- 100
a <- rnorm(A) # [1:100]
n <- 10
str(a)
# Not resulting in equal size of chunks with vectors so reject
# http://stackoverflow.com/a/3321659/54964
#d <- split(d, ceiling(seq_along(d)/(length(d)/n)))
# Works for vectors but not with lists
# http://stackoverflow.com/a/16275428/54964
#d <- function(d,n) split(d, cut(seq_along(d), n, labels = FALSE))
str(d)
Test code 2
Input: a list of two lists
aa <- list(a, rnorm(a))
Expected output: 20 lists of 10 item size
Testing Loki's answer
segmentLists <- function(A, segmentSize) {
res <- lapply(A, function(x) split(unlist(x), cut(seq_along(unlist(x)), segmentSize, labels = F)))
#print(res)
res <- unlist(res, recursive = F)
}
segmentLists(aa, 10)
Output: loop going on, never stopping
OS: Debian 8.5
R: 3.3.1
You can use lapply:
aa <- list(a, rnorm(a))
aa
n <- 10
x <- lapply(aa, function(x) split(unlist(x), cut(seq_along(unlist(x)), n, labels = F)))
y <- unlist(x, recursive = F)
str(y)
# List of 20
# $ 1 : num [1:10] 1.0895 -0.0477 0.225 -0.6308 -0.1558 ...
# $ 2 : num [1:10] -0.469 -0.381 0.709 -0.798 1.183 ...
# $ 3 : num [1:10] 0.757 -1.128 -1.394 -0.712 0.494 ...
# $ 4 : num [1:10] 1.135 0.324 0.75 -0.83 0.794 ...
# $ 5 : num [1:10] -0.786 -0.068 -0.179 0.354 -0.597 ...
# $ 6 : num [1:10] -0.115 0.164 -0.365 -1.827 -2.036 ...
...
length(y)
# [1] 20
To remove the names of the list elements in y ($ 1, $ 2, etc.), you can use unname():
str(unname(y))
# List of 20
# $ : num [1:10] 1.0895 -0.0477 0.225 -0.6308 -0.1558 ...
# $ : num [1:10] -0.469 -0.381 0.709 -0.798 1.183 ...
# $ : num [1:10] 0.757 -1.128 -1.394 -0.712 0.494 ...
# $ : num [1:10] 1.135 0.324 0.75 -0.83 0.794 ...
# $ : num [1:10] -0.786 -0.068 -0.179 0.354 -0.597 ...
...
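If you wrap this in a function, you have to return res explicitly at the end. An assignment as the last expression returns its value invisibly, so calling the function prints nothing: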
segmentLists <- function(A, segmentSize)
{
res <- lapply(A, function(x) split(unlist(x), cut(seq_along(unlist(x)), segmentSize, labels = F)))
#print(res)
res <- unlist(res, recursive = F)
res <- unname(res)
res
}
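Note that cut(seq_along(x), n) splits into n groups rather than into groups of n items; the two coincide here only because 100 items and n = 10 give ten groups of ten. A minimal sketch of splitting by chunk size instead, using a hypothetical helper chunk_by_size() that is not part of the answer above:

```r
# Hypothetical helper: split x (vector or list) into consecutive chunks
# of at most `size` elements each
chunk_by_size <- function(x, size) {
  groups <- ceiling(seq_along(x) / size)   # 1,1,...,1,2,2,...
  # split the positions, then index x so that lists stay lists
  unname(lapply(split(seq_along(x), groups), function(i) x[i]))
}

a <- as.list(rnorm(100))
chunks <- chunk_by_size(a, 10)
length(chunks)    # 10
lengths(chunks)   # ten chunks of 10 items each
```

Splitting the index positions rather than the data keeps the helper working for both atomic vectors and lists; a last chunk shorter than size is kept as is.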

R: perform parameter sweep and collect results in long data frame

I am looking for the right R idiom to run a function over a set of parameters and create a long data frame from the results. Imagine that you have the following toy function:
fun <- function(sd, mean, foobar = "foobar") {
list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
Now you want to run fun over different values of sd and mean:
par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)
I want to run fun for the parameters in each row of pars, and collect the results in a data frame with columns sd, mean, pos, value. Here is a rather clumsy solution:
set.seed(42)
## Run fun
res <- lapply(seq_len(nrow(pars)), function(x) {
do.call(fun, as.list(pars[x, ]))
})
## Select the result we need
res <- lapply(res, "[[", "random")
## Make it a single data frame
res <- do.call(rbind, res)
## Together with the parameters
res <- as.data.frame(cbind(sd = par_sd, mean = par_mean, res))
colnames(res) <- c("sd", "mean", 1:10)
## Make it a long data frame
res <- reshape2::melt(res, id.vars=c("sd", "mean"),
variable.name = "pos", value.name="value")
## Done
res[1:5,]
#> sd mean pos value
#> 1 1 0 1 2.37095845
#> 2 2 0 1 3.60973931
#> 3 3 0 1 0.08008422
#> 4 4 0 1 2.82180049
#> 5 5 0 1 2.02999300
Is there a simpler way to do this? Does anyone know a package that does things like this? My quick search did not give any good results...
If you're willing to amend fun() to return a data.frame, I find the most elegant solution is plyr's mdply.
fun <- function(sd, mean, foobar = "foobar") {
data.frame(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)
library(plyr) # provides mdply()
results <- mdply(pars, fun, foobar = "stuff")
str(results)
mapply would seem a good fit:
> str(with(pars, mapply(fun, sd=sd, mean=mean) ) )
List of 30
$ : num [1:10] 3.16 2.28 2.84 1.49 3.43 ...
$ : chr "foobar"
$ : num [1:10] 3.429 0.157 0.583 1.542 6.485 ...
$ : chr "foobar"
$ : num [1:10] -4.56 -1.51 -1.33 7.16 3.21 ...
$ : chr "foobar"
$ : num [1:10] -2.275 2.225 4.196 0.962 15.739 ...
$ : chr "foobar"
$ : num [1:10] 6.23 10.08 2.85 6.81 4.51 ...
$ : chr "foobar"
$ : num [1:10] 1.65 3.15 5.62 5.91 6.14 ...
$ : chr "foobar"
$ : num [1:10] 4.26 1.95 7.33 2.72 6.29 ...
$ : chr "foobar"
$ : num [1:10] 7.53 6.74 3.6 6.43 3.08 ...
$ : chr "foobar"
$ : num [1:10] -0.4181 -0.0584 5.5812 1.038 8.2482 ...
$ : chr "foobar"
$ : num [1:10] 0.2377 4.8557 5.2177 -0.0706 2.0434 ...
$ : chr "foobar"
$ : num [1:10] 2.95 4.3 5.26 8.58 5.81 ...
$ : chr "foobar"
$ : num [1:10] -0.85 4.83 8.19 5.17 6.58 ...
$ : chr "foobar"
$ : num [1:10] 3.59 11.46 6.29 6.57 2.97 ...
$ : chr "foobar"
$ : num [1:10] 0.117 3.142 10.473 10.196 5.56 ...
$ : chr "foobar"
$ : num [1:10] 13.03 2.64 -1.07 5.29 1.97 ...
$ : chr "foobar"
- attr(*, "dim")= int [1:2] 2 15
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "random" "foobar"
..$ : NULL
By default mapply will attempt to simplify and if you wanted to keep them as separate objects you could negate that default:
> str(with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) )
List of 15
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] 1.08 0.68 3.16 3.38 5.96 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] 0.0927 5.1506 -1.0109 2.7136 2.1263 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] -0.331 2.9 -1.705 5.471 4.712 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
snipped
And if you need them in one stacked dataframe, it's just:
> str(do.call( rbind, with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
'data.frame': 150 obs. of 2 variables:
$ random: num 1 3.34 2.5 4.72 4.25 ...
$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
If you want these "labeled" with the sd and mean values, just this modification of the constructor function:
fun <- function(sd, mean, foobar = "foobar") {
data.frame(random = rnorm(10) * sd + mean + 1:10,
sd=sd, mean=mean, foobar = foobar)
}
str(do.call( rbind, with(pars,
mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
#---------------
'data.frame': 150 obs. of 4 variables:
$ random: num 1.42 1.13 3.73 4.5 5.63 ...
$ sd : int 1 1 1 1 1 1 1 1 1 1 ...
$ mean : int 0 0 0 0 0 0 0 0 0 0 ...
$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
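For the long format asked for in the question (columns sd, mean, pos, value), a base R sketch along the same lines could use Map() and keep the original list-returning fun unchanged. The names pieces and long are illustrative only:

```r
# Original toy function from the question (returns a list)
fun <- function(sd, mean, foobar = "foobar") {
  list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
pars <- data.frame(sd = rep(1:5, 3), mean = rep(0:2, each = 5))

set.seed(42)
# One small long-format data frame per parameter combination
pieces <- Map(function(sd, mean) {
  out <- fun(sd, mean)
  data.frame(sd = sd, mean = mean,
             pos = seq_along(out$random), value = out$random)
}, pars$sd, pars$mean)

# Stack them into a single long data frame
long <- do.call(rbind, pieces)
str(long)
```

The rows come out grouped by parameter combination rather than by pos, so the row order differs from the reshape2::melt() result in the question.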

Impute missing values

I want to impute some data. I use the data moss from the package mvoutlier. The goal is to impute the values < 0.004 in the column Bi. Because the moss data are compositional, I use methods from the package robCompositions. When I try to impute the values, I get an error.
Code:
library(mvoutlier)
library(robCompositions)
data(moss)
attach(moss)
x <- moss[-c(1,2,3)] # copy the data from moss, without the first 3 variables, into x
x$Bi[Bi < 0.004] <- 0 # the values that are under 0.004 are replaced with 0
res <- impRZilr(x,dl=c(0,0,0,0,0,0.004,rep(0,25)))
|======= | 10%Error in !all.equal(x[!w], xOrig[!w]) : invalid argument type
Don't know how to handle this error
library(mvoutlier)
library(robCompositions)
data(moss)
x <- moss[-c(1,2,3)] # copy the data from moss, without the first 3 variables, into x
### Before
head(x$Bi)
## [1] 0.002 0.039 0.012 0.033 0.002 0.052
# Replace values below 0.004 with 0
x$Bi[x$Bi < 0.004] <- 0
head(x$Bi)
## [1] 0.000 0.039 0.012 0.033 0.000 0.052
# Imputation
result <- impRZilr(x, dl = rep(0.004, nrow(x)))
res <- data.frame(result$x)
head(res$Bi)
## [1] 0.002515667 0.039000000 0.012000000 0.033000000 0.002836172 0.052000000
As you can see, the values that were 0 are replaced by the impRZilr function values.
EDIT
Here is a description of how to access the results as required in your comments.
# Imputation
# Use the verbose = TRUE option to see how the algorithm is iterating
result <- impRZilr(x, dl = rep(0.004, nrow(x)), verbose = TRUE)
### Results description
str(result)
# List of 7
# $ x : num [1:598, 1:31] 0.016 0.073 0.032 0.118 0.038 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:31] "Ag" "Al" "As" "B" ...
# $ criteria: num 0.0203
# $ iter : num 4
# $ maxit : num 10
# $ wind : logi [1:598, 1:31] FALSE FALSE FALSE FALSE FALSE FALSE ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:598] "1" "2" "3" "4" ...
# .. ..$ : chr [1:31] "U" "Bi" "Th" "Tl" ...
# $ nComp : int [1:4] 4 6 3 5
# $ method : chr "pls"
# - attr(*, "class")= chr "replaced"
# Results data.frame with imputed zeros
res <- data.frame(result$x)
# Index of missing values
index_missing_wind <- data.frame(result$wind)
# Number of iterations
result$iter
# [1] 4
# Method used (you can change this)
result$method
The OP wrote in an edit:
I managed to solve the problem, this is what I did:
x <-moss[-c(1,2,3)]
x$Bi[x$Bi < 0.004] <- NA
res <- impAll(x)
and the object res contains the imputed matrix.
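The flagging step can be checked on a toy vector before running any imputation. The sketch below uses the first six Bi values printed above and base R only; the names bi, below_dl and bi_na are illustrative. Note that R parses `Bi <- 0.004` as an assignment, so the comparison has to be written as `Bi < 0.004`:

```r
# First six Bi values as printed by head(x$Bi) above
bi <- c(0.002, 0.039, 0.012, 0.033, 0.002, 0.052)

# Logical mask of values below the detection limit
below_dl <- bi < 0.004

# Mark those values as missing so an imputation routine can fill them in
bi_na <- bi
bi_na[below_dl] <- NA
bi_na
```

Keeping the mask in its own variable also makes it easy to look up afterwards which positions were imputed.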

Building a list in a loop in R - getting item names correct

I have a function which contains a loop over two lists and builds up some calculated data. I would like to return these data as a lists of lists, indexed by some value, but I'm getting the assignment wrong.
A minimal example of what I'm trying to do, and where I'm going wrong, would be:
mybiglist <- list()
for(i in 1:5){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
name <- paste('item:',i,sep='')
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[name]] <- append(mybiglist, tmp)
}
If you run this and look at the output mybiglist, you will see that something is going very wrong in the way each item is being named.
Any ideas on how I might achieve what I actually want?
Thanks
ps. I know that in R there is a sense in which one has failed if one has to resort to loops, but in this case I do feel justified ;-)
It works if you don't use the append command:
mybiglist <- list()
for(i in 1:5){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
name <- paste('item:',i,sep='')
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[name]] <- tmp
}
str(mybiglist)
# List of 5
# $ item:1:List of 3
# ..$ uniform : num [1:10] 0.737 0.987 0.577 0.814 0.452 ...
# ..$ normal : num [1:16] -0.403 -0.104 2.147 0.32 1.713 ...
# ..$ binomial: num [1:8] 0 0 0 0 1 0 0 1
# $ item:2:List of 3
# ..$ uniform : num [1:10] 0.61 0.62 0.49 0.217 0.862 ...
# ..$ normal : num [1:16] 0.945 -0.154 -0.5 -0.729 -0.547 ...
# ..$ binomial: num [1:8] 1 2 2 0 2 1 0 2
# $ item:3:List of 3
# ..$ uniform : num [1:10] 0.66 0.094 0.432 0.634 0.949 ...
# ..$ normal : num [1:16] -0.607 0.274 -1.455 0.828 -0.73 ...
# ..$ binomial: num [1:8] 2 2 3 1 1 1 2 0
# $ item:4:List of 3
# ..$ uniform : num [1:10] 0.455 0.442 0.149 0.745 0.24 ...
# ..$ normal : num [1:16] 0.0994 -0.5332 -0.8131 -1.1847 -0.8032 ...
# ..$ binomial: num [1:8] 2 3 1 1 2 2 2 1
# $ item:5:List of 3
# ..$ uniform : num [1:10] 0.816 0.279 0.583 0.179 0.321 ...
# ..$ normal : num [1:16] -0.036 1.137 0.178 0.29 1.266 ...
# ..$ binomial: num [1:8] 3 4 3 4 4 2 2 3
Change
mybiglist[[name]] <- append(mybiglist, tmp)
to
mybiglist[[name]] <- tmp
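To see why the append() version misbehaves, here is a cut-down sketch of the question's loop (three iterations, two components, shorter vectors; these sizes are chosen only for illustration). Each pass stores everything accumulated so far plus the new components under the new name, so the entries grow and nest:

```r
mybiglist <- list()
for (i in 1:3) {
  tmp <- list(uniform = runif(2), normal = rnorm(2))
  name <- paste0("item:", i)
  # append() returns the whole list built so far followed by tmp's
  # elements, and that combined list is stored under the new name
  mybiglist[[name]] <- append(mybiglist, tmp)
}
lengths(mybiglist)
names(mybiglist[["item:3"]])
```

The entry for item:3 contains item:1 and item:2 again in addition to its own uniform and normal components, which is the naming problem described in the question.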
To show that an explicit for loop is not required:
unif_norm <- replicate(5, list(uniform = runif(10),
normal = rnorm(16)), simplify = FALSE)
binomials <- lapply(seq_len(5)/10, function(prob) {
# n = 8 draws with size = 5, matching rbinom(8, 5, i/10) in the question
list(binomial = rbinom(n = 8, size = 5, prob = prob))})
biglist <- setNames(mapply(c, unif_norm, binomials, SIMPLIFY = FALSE),
paste0('item:', seq_along(unif_norm)))
In general, if you go down the for loop path, it is better to preallocate the list beforehand, which avoids growing it on every iteration and is more memory efficient.
mybiglist <- vector('list', 5)
names(mybiglist) <- paste0('item:', seq_along(mybiglist))
for(i in seq_along(mybiglist)){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[i]] <- tmp
}
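A more compact loop-free variant, offered as a sketch rather than as part of the answers above, builds each element with lapply() and names the result in one step with setNames(). The variable ids is illustrative:

```r
set.seed(1)
ids <- 1:5

# Build one three-component list per id, then attach the item names
mybiglist <- setNames(
  lapply(ids, function(i) {
    list(uniform  = runif(10),
         normal   = rnorm(16),
         binomial = rbinom(8, 5, i / 10))
  }),
  paste0("item:", ids))

str(mybiglist, max.level = 1)
```

Since lapply() returns a list of the right length directly, nothing has to be grown or preallocated by hand.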

Resources