Variance of a List of Arrays

Suppose x is a list of arrays.
a <- array(1,c(2,3,4));
b <- array(2,c(2,3,4));
c <- array(3,c(2,3,4));
x <- list(a=a,b=b,c=c)
My desired output is the following array:
array(var(c(1,2,3)),c(2,3,4))
This is quite similar to the following question, but with the sum replaced by the variance.
Ref: Sum a list of arrays
Edit in response to comments from @akrun
Thank you, @akrun.
His answer is the following:
apply(
array(unlist(x),append( dim(x[[1]]), length(x) ) ),
1:length(dim(x[[1]])),
var
)
where x is the following list of arrays:
a <- array(1,c(2,3,4));
b <- array(2,c(2,3,4));
c <- array(3,c(2,3,4));
d <- array(4,c(2,3,4));
x <- list(a=a,b=b,c=c,d=d)

One option would be to unlist the list, create an array, and use apply to get the variance (the last dimension, 3, is the number of arrays in the list):
apply(array(unlist(x), c(2, 3, 4, 3)), 1:3, var)
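If you would rather not hard-code the dimensions or build the combined array, a minimal sketch is to compute the variance elementwise with Reduce, because arithmetic on arrays of identical shape keeps their dim attribute. This assumes every array in the list has the same dimensions and that there are at least two arrays; list_var is a hypothetical helper name, not a base R function.

```r
# Elementwise sample variance across a list of same-shaped arrays.
# list_var is a hypothetical helper, not part of base R.
list_var <- function(x) {
  n <- length(x)
  stopifnot(n >= 2)
  m <- Reduce(`+`, x) / n                              # elementwise mean
  ss <- Reduce(`+`, lapply(x, function(a) (a - m)^2))  # elementwise sum of squared deviations
  ss / (n - 1)                                         # divide by n - 1, as var() does
}

x <- list(a = array(1, c(2, 3, 4)),
          b = array(2, c(2, 3, 4)),
          c = array(3, c(2, 3, 4)))

v <- list_var(x)
dim(v)  # 2 3 4
```

This makes two passes over the list but never allocates the extra stacked array, which can matter when the individual arrays are large.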

Related

Optimise row wise matrix comparison in R

I've googled extensively and can't seem to find an answer to my problem. Apologies if this has been asked before. I have two matrices, a & b, each with the same dimensions. What I am trying to do is iterate over the rows of a (from i = 1 to number of rows in a) and check if any elements found in row i of matrix a appear in the corresponding row in matrix b. I have a solution using sapply but this becomes quite slow with very large matrices. I wondered if it is possible to vectorise my solution somehow? Examples below:
# create example matrices
a = matrix(
1:9,
nrow = 3
)
b = matrix(
4:12,
nrow = 3
)
# iterate over rows in a....
# returns TRUE for each row of a where any element in ith row is found in the corresponding row i of matrix b
sapply(1:nrow(a), function(x){ any(a[x,] %in% b[x,])})
# However, for large matrices this performs quite poorly. Is it possible to vectorise?
a = matrix(
runif(14000000),
nrow = 7000000
)
b = matrix(
runif(14000000),
nrow = 7000000
)
system.time({
sapply(1:nrow(a), function(x){ any(a[x,] %in% b[x,])})
})

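Use apply to find any 0 differences (note that this compares a[i, j] with b[i, j] position by position, so it only detects matches in the same column):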
a <- sample(1:3, 9, replace = TRUE)
b <- sample(1:3, 9, replace = TRUE)
a <- matrix(a, ncol = 3)
b <- matrix(b, ncol = 3)
diff <- (a - b)
apply(diff, 1, function(x) which(x == 0)) # column indices where the difference is 0
apply(diff, 1, function(x) any(x == 0)) # row check only
or

Maybe you can try intersect with asplit, like below:
lengths(Map(intersect, asplit(a, 1), asplit(b, 1))) > 0
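Because the large example has only two columns, a minimal vectorized sketch is to loop over the few column pairs instead of the millions of rows: each comparison a[, j] == b[, k] covers all rows at once, so the cost is ncol(a) * ncol(b) vector operations. row_any_match is a hypothetical name; it assumes a and b have the same number of rows and, unlike %in%, it does not treat NA as matching NA.

```r
# TRUE for row i when any element of a[i, ] equals any element of b[i, ].
# row_any_match is a hypothetical helper name.
row_any_match <- function(a, b) {
  stopifnot(nrow(a) == nrow(b))
  hit <- logical(nrow(a))               # start with FALSE for every row
  for (j in seq_len(ncol(a))) {
    for (k in seq_len(ncol(b))) {
      eq <- a[, j] == b[, k]            # one comparison per row, all rows at once
      hit <- hit | (eq & !is.na(eq))    # NA comparisons count as "no match"
    }
  }
  hit
}

a <- matrix(1:9, nrow = 3)
b <- matrix(4:12, nrow = 3)
row_any_match(a, b)
```

Exact equality on doubles behaves like %in% here, so with runif data matches will be rare, exactly as in the sapply version.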

Create a matrix from a list consisting of unequal matrices for individual bootstraps

I tried to create a matrix from a list which consists of N unequal matrices...
The reason to do this is to make R individual bootstrap samples.
The example below has 2 companies: one with 10 observations and one with just 5.
Data:
set.seed(7)
Time <- c(10,5)
xv <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2);
y <- matrix( c(rnorm(10,5,2), rnorm(5,20,1)));
z <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2)
# create data frame of input variables which helps
# to conduct the rowise bootstrapping
data <- data.frame (y = y, xv = xv, z = z);
rows <- dim(data)[1];
cols <- dim(data)[2];
# create the index to sample from the different panels
cumTime <- c(0, cumsum (Time));
index <- findInterval (seq (1:rows), cumTime, left.open = TRUE);
# draw R individual bootstrap samples
R <- 5 # number of bootstrap samples; replicate() takes the count as n, not R
bootList <- replicate(n = R, list(), simplify = FALSE);
bootList <- lapply (bootList, function(x) by (data, INDICES = index, FUN = function(x) dplyr::sample_n (tbl = x, size = dim(x)[1], replace = T)));
---------- UNLISTING ---------
Currently I try to do it like this, which gives the wrong result:
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this, ideally reasonably efficiently?
The matrices are then processed in unfortunately slow MLE estimations...

I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths, and you would like a bootstrap sample for each company of the same size as its original panel length.
You merely have to add a company indicator:
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
Stop here if you want separate bootstrap samples, e.g. to estimate each company separately. Otherwise, combine them:
bootdata = do.call(rbind, L2)
Best wishes,
Tim
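To get one matrix per bootstrap replicate directly, a minimal base-R sketch (no dplyr) is to resample row indices within each company and convert the stacked result with as.matrix. The data below is a hypothetical stand-in for the question's data (15 rows: company 1 is rows 1 to 10, company 2 is rows 11 to 15), and boot_once is a hypothetical helper name.

```r
set.seed(7)
# Hypothetical stand-in data: 15 rows, company 1 has 10 rows, company 2 has 5
data <- data.frame(y = rnorm(15), x1 = rnorm(15), x2 = rnorm(15))
index <- rep(1:2, times = c(10, 5))   # company of each row

# One bootstrap sample: resample rows with replacement within each company,
# keeping each company's original panel length, then return a numeric matrix.
boot_once <- function(data, index) {
  groups <- split(seq_len(nrow(data)), index)
  rows <- unlist(lapply(groups, function(i) i[sample.int(length(i), replace = TRUE)]))
  as.matrix(data[rows, , drop = FALSE])
}

R <- 5
bootMats <- replicate(R, boot_once(data, index), simplify = FALSE)
dim(bootMats[[1]])  # 15 3
```

Indexing i[sample.int(length(i), ...)] rather than calling sample(i, ...) avoids sample()'s special case for a length-one vector, which would otherwise sample from 1:i for a company with a single row.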

Build dataframe with function dependent on its own output

I've tried on the web but I haven't found a suitable solution. Any help would be much appreciated!
Example:
# I want this OUTPUT df
Previous <- c(2, 4, 8, 16)
Today <- c(4, 8, 16, 32)
df <- data.frame(Previous, Today)
INPUT conditions:
Today is a function of Previous:
Today_function <- Previous_start * 2
For Previous I only have the starting value:
Previous_start <- 2
The output of Today_function feeds the next row of Previous and is the new input of Today_function.

You can abuse Reduce to iteratively apply a function to its own output:
n <- 4 # Desired number of rows in the data.frame
val0 <- 2 # Starting value
f <- function( valPrev, dummy ) { valPrev * 2 } # Make function "binary"
v <- Reduce( f, rep(NA,n), val0, accumulate=TRUE )
X <- data.frame( Previous=head(v,-1), Today=tail(v,-1) )
Side note: I know it's common, but I recommend against using df as a variable name, because it's a built-in name for the density function of the F distribution.
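For this particular doubling rule the recurrence also has a closed form, Previous[k] = 2 * 2^(k - 1), so the table can be built without iteration. A short sketch that builds it both ways and compares them (X_closed and X_reduce are illustrative names):

```r
n <- 4      # desired number of rows
val0 <- 2   # starting value of Previous

# Closed form of the doubling recurrence: Previous[k] = val0 * 2^(k - 1)
Previous <- val0 * 2^(seq_len(n) - 1)
X_closed <- data.frame(Previous = Previous, Today = Previous * 2)

# The same table via Reduce, feeding each output back in as the next input
v <- Reduce(function(prev, dummy) prev * 2, rep(NA, n), val0, accumulate = TRUE)
X_reduce <- data.frame(Previous = head(v, -1), Today = tail(v, -1))
```

The closed form only works because doubling is a geometric rule; for an arbitrary update function, Reduce (or a plain for loop) remains the general tool.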

How do you find the sample sizes used in calculations on r?

I am running correlations between variables, some of which have missing data, so the sample size for each correlation is likely to differ. I tried print and summary, but neither of these shows me how big my n is for each correlation. This is a fairly simple problem that I cannot find the answer to anywhere.

Like this?
x <- c(1:100,NA)
length(x)
length(x[!is.na(x)])
You can also get the degrees of freedom like this (for a Pearson correlation, df = n - 2):
y <- c(1:100,NA)
x <- c(1:100,NA)
cor.test(x,y)$parameter
But I think it would be best if you showed the code for how you are estimating the correlation, so we can give exact help.

Here's an example of how to find the pairwise sample sizes among the columns of a matrix. If you want to apply it to (certain) numeric columns of a data frame, combine them accordingly, coerce the resulting object to matrix and apply the function.
# Example matrix:
xx <- rnorm(3000)
# Generate some NAs
vv <- sample(3000, 200)
xx[vv] <- NA
# reshape to a matrix
dd <- matrix(xx, ncol = 3)
# find the number of NAs per column
apply(dd, 2, function(x) sum(is.na(x)))
# tack on some column names
colnames(dd) <- paste0("x", seq(3))
# Function to find the number of pairwise complete observations
# among all pairs of columns in a matrix. It returns a data frame
# whose first two columns comprise all column pairs
pairwiseN <- function(mat)
{
u <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
h <- expand.grid(x = u, y = u)
f <- function(x, y)
sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
h$n <- mapply(f, h[, 1], h[, 2])
h
}
# Call it
pairwiseN(dd)
The function can easily be improved; for example, you could set h <- expand.grid(x = u[-1], y = u[-length(u)]) to cut down on the number of calculations, you could return an n x n matrix instead of a three-column data frame, etc.

Here is a for-loop implementation of Dennis' function above that outputs an n x n table directly, rather than having to pivot_wider() the result. On my Databricks cluster it cut the compute time for a 1865-row by 69-column matrix from 2.5-3 minutes down to 30-40 seconds.
Thanks for your answer Dennis, this helped me with my work.
pairwise_nxn <- function(mat)
{
cols <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
nn <- data.frame(matrix(nrow = length(cols), ncol = length(cols)))
rownames(nn) <- colnames(nn) <- cols
f <- function(x, y)
sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
for (i in 1:nrow(nn))
for (j in 1:ncol(nn))
nn[i, j] <- f(i, j) # index by position so matrices without column names also work
nn
}
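A loop-free alternative, sketched under the assumption that mat is a numeric matrix: for a 0/1 indicator matrix ok of non-missing values, crossprod(ok), which equals t(ok) %*% ok, counts for every pair of columns the rows where both are observed. pairwise_n_crossprod is a hypothetical name; the diagonal holds each column's own non-missing count.

```r
# Pairwise complete-observation counts for all column pairs at once.
# pairwise_n_crossprod is a hypothetical helper name.
pairwise_n_crossprod <- function(mat) {
  ok <- 1 * !is.na(mat)   # 1 where observed, 0 where NA; keeps dim and dimnames
  crossprod(ok)           # [i, j] = number of rows where columns i and j are both non-NA
}

set.seed(1)
xx <- rnorm(3000)
xx[sample(3000, 200)] <- NA
dd <- matrix(xx, ncol = 3, dimnames = list(NULL, paste0("x", 1:3)))
n_mat <- pairwise_n_crossprod(dd)
n_mat
```

These are the counts that cor(dd, use = "pairwise.complete.obs") is based on, and a single matrix product replaces the row-wise apply calls in the loop versions.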

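If your variables are vectors named a and b, would something like sum(!is.na(a) & !is.na(b)) help you? It counts the pairs where both values are present; sum(is.na(a) | is.na(b)) would instead count the pairs that get dropped.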

How to designate "binding method" in apply

Say I have a matrix m with 2 rows and 3 columns, together with a function that takes a numeric argument and returns a vector. For example,
f <- function(x){
rep(x,4)
}
If I use
t <- apply(m, MARGIN = c(1,2), FUN = f)
I will get a tensor t with dim(t) == c(4,2,3). But what I want is a tensor t with dim(t) == c(2,3,4). How can I do this conveniently? Thank you.

Use aperm, a generalized transposition:
tt <- apply(m, MARGIN = c(1,2), FUN = f)
aperm(tt, c(2,3,1))
See here: How to change order of array dimensions
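A self-contained check of the aperm approach, assuming an example 2 x 3 matrix m: apply places each length-4 result of f along the first dimension, and aperm(tt, c(2, 3, 1)) moves that dimension to the end, so res[i, j, ] equals f(m[i, j]).

```r
m <- matrix(1:6, nrow = 2)   # example 2 x 3 matrix (assumed data)
f <- function(x) rep(x, 4)   # returns a length-4 vector for each element

tt <- apply(m, MARGIN = c(1, 2), FUN = f)  # f's output comes first: dim 4 2 3
res <- aperm(tt, c(2, 3, 1))               # reorder to dim 2 3 4
dim(tt)   # 4 2 3
dim(res)  # 2 3 4
```

The permutation vector lists, for each new dimension, which old dimension it comes from, so c(2, 3, 1) means "old rows, old columns, then f's output".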
