So, I'm trying to generate random numbers from multivariate normal distributions with different means. I'm also trying to use the apply functions and not for loops, which is where the problem occurs. Here is my code:
library(MASS)
set.seed(123)
# X and Y means
Means<-cbind(c(.2,.2,.8),c(.2,.6,.8))
Means
Sigma<-matrix(c(.01,0,0,.01),nrow=2)
Sigma
data<-apply(X=Means,MARGIN=1,FUN=mvrnorm,n=10,Sigma=Sigma)
data
Instead of getting two vectors of X and Y points covering all three means, I get three vectors, each with its X and Y points stacked. What is the best way to get the two vectors? I know I could unstack them manually, but I feel R should have some slick way of getting this done.
I'm not sure it's what I would call 'slick', but if you really want to use apply (instead of lapply, as previously mentioned), you can force apply to return your results as a list of matrices. Then it's just a matter of sticking the results together. I expect this to be less error-prone than trying to rebuild a two-column matrix.
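# wrap each 10 x 2 result in list() so apply() returns a list instead of flattening it
data <- apply(Means, 1, function(x) {
  list(mvrnorm(n=10, mu=x, Sigma=Sigma))
})
# remove one level of nesting, then stack the three matrices into one 30 x 2 matrix
data <- do.call('rbind', unlist(data, recursive=FALSE))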
Try:
set.seed(42)
res1 <- lapply(seq_len(nrow(Means)), function(i) mvrnorm(n=10, mu=Means[i,], Sigma=Sigma))
Checking against the results of apply:
set.seed(42)
res2 <- apply(X=Means,MARGIN=1,FUN=mvrnorm,n=10,Sigma=Sigma)
# each column of res2 is one 10 x 2 matrix flattened column-wise, so reshape to 10 x 2 x 3
dim(res2) <- c(10, 2, 3)
res3 <-lapply(1:dim(res2)[3], function(i) res2[,,i])
all.equal(res3, res1, check.attributes=FALSE)
#[1] TRUE
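A minimal sketch of getting the two vectors the question asks for, reusing Means and Sigma from the question (MASS ships with R as a recommended package). It stacks the per-mean matrices with do.call(rbind, ...); the group vector is a hypothetical addition recording which mean each point came from, and the seed is arbitrary.

```r
library(MASS)  # for mvrnorm()

Means <- cbind(c(.2, .2, .8), c(.2, .6, .8))  # one row per mean: (x, y)
Sigma <- matrix(c(.01, 0, 0, .01), nrow = 2)

set.seed(123)
# One 10 x 2 matrix of draws for each row of Means
draws <- lapply(seq_len(nrow(Means)), function(i)
  mvrnorm(n = 10, mu = Means[i, ], Sigma = Sigma))

# Stack them into a single 30 x 2 matrix and split out the two coordinates
xy <- do.call(rbind, draws)
x <- xy[, 1]
y <- xy[, 2]

# Hypothetical helper: which row of Means each point was drawn from
group <- rep(seq_len(nrow(Means)), each = 10)
```

Keeping the points together with a group label is usually easier to plot or model than three separate blocks.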
Related
I am working with the consumer price index (CPI), and in order to calculate it I have to multiply the index matrix by the corresponding weights:
grossCPI77_10 <- grossIND1977 %*% weights1910/100
grossCPI82_10 <- grossIND1982 %*% weights1910/100
Of course, I would rather have code like the following:
grossIND1982 <- replicate(20, cbind(1:61))
grossIND1993 <- replicate(20, cbind(1:61))
weights1910_sc <- c(1:20)
grossIND_list <- mget(ls(pattern = "grossIND...."))
totalCPI <- mapply("*", grossIND_list, weights1910_sc)
The problem is that it gives me a 1220x20 matrix. I expected an ordinary matrix (61x20) by vector (20x1) multiplication, which should result in a 61x1 vector. Could you explain what I am doing wrong? Thanks.
Part of your problem is that you don't have matrices but 3D arrays with one singleton dimension. The other issues are that mapply tries to combine the results into a matrix, and that constant arguments should be passed via MoreArgs. But actually, this is more a case for lapply.
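# [,1,] drops the singleton dimension, leaving plain 61 x 20 matrices
grossIND1982 <- replicate(20, cbind(1:61))[,1,]
grossIND1993 <- replicate(20, cbind(1:61))[,1,]
weights1910_sc <- c(1:20)
# anchor the pattern so an existing grossIND_list is not picked up on a rerun
grossIND_list <- mget(ls(pattern = "^grossIND[0-9]{4}$"))
# mapply: pass the constant weights via MoreArgs and keep the results as a list
totalCPI <- mapply("*", grossIND_list, MoreArgs=list(e2 = weights1910_sc), SIMPLIFY = FALSE)
# or, more simply, lapply
totalCPI <- lapply(grossIND_list, "*", e2 = weights1910_sc)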
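Note that "*" multiplies element by element (recycling the 20 weights down the columns rather than applying one weight per column), whereas the question's original calculation, grossIND1977 %*% weights1910/100, is a matrix product. If the matrix product is what is wanted, a sketch with the same toy data could look like this; the explicit named list in place of mget() is an assumption about how the objects are organised.

```r
# Toy index matrices: 61 rows x 20 columns, as in the answer above
grossIND1982 <- replicate(20, cbind(1:61))[, 1, ]
grossIND1993 <- replicate(20, cbind(1:61))[, 1, ]
weights1910_sc <- 1:20

# An explicit list avoids ls()/mget() picking up unrelated objects
grossIND_list <- list(grossIND1982 = grossIND1982, grossIND1993 = grossIND1993)

# %*% is the matrix product from the question: (61 x 20) %*% (20 x 1) gives 61 x 1
totalCPI <- lapply(grossIND_list, function(m) m %*% weights1910_sc / 100)

# The same results as the columns of one 61 x 2 matrix
totalCPI_mat <- sapply(grossIND_list, function(m) m %*% weights1910_sc / 100)
```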
I am not sure I understood all aspects of your problem (especially what should be columns, what should be rows, and in which order the matrix product should be applied), but I will try to cover at least some of them. See the comments in the code below for what you did and what you might want. I hope it helps; let me know if this is what you need.
#instead of using mget, I recommend using a list structure;
#otherwise you might capture other variables with similar names
#that you do not want
INDlist <- sapply(c("1990", "1991"), function(x) {
  #this is how to set up a matrix correctly, check `?matrix`
  #I think your combination of cbind and replicate did not give you what you wanted
  matrix(rep(1:61, 20), nrow = 61)
}, USE.NAMES = TRUE, simplify = FALSE)
weights <- list(c(1:20))
#the first argument of mapply needs to be a function, in this case of two variables
#the body of the function calculates the matrix product
#the remaining arguments (both lists) supply x and y, element by element
#I have repeated your weights, but you might assign different weights for each year
res <- mapply(function(x, y) {x %*% y}, INDlist, rep(weights, length(INDlist)))
dim(res)
#[1] 61 2
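The comment above mentions that each year could get its own weights. A sketch of both variants follows: constant weights passed once through MoreArgs instead of rep(), and per-year weights as a second list. weights_by_year and its values are made up for illustration.

```r
INDlist <- sapply(c("1990", "1991"), function(x) matrix(rep(1:61, 20), nrow = 61),
                  USE.NAMES = TRUE, simplify = FALSE)

# Same weights for every year: pass them once through MoreArgs
res_same <- mapply(function(x, y) x %*% y, INDlist,
                   MoreArgs = list(y = 1:20))

# Different weights per year (hypothetical values): one weight vector per list element
weights_by_year <- list("1990" = 1:20, "1991" = rep(1, 20))
res_diff <- mapply(function(x, y) x %*% y, INDlist, weights_by_year)
```

Both results are 61 x 2 matrices with one column per year, since each call returns a 61 x 1 product.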
I want to fill a list with many different matrices, each created by sampling rows from an original matrix at a different sample size, and then repeat this process 10 times. I managed to do it (after much fighting and a painful learning process). I would be very grateful if someone could point me in the right direction to get rid of my redundant code and improve the functions I'm using (maybe even get rid of the loops, which I gather are rather frowned upon).
My problem hinged on getting the differently sized matrices out of the loop.
Here's the code I used, one day I aspire to write R code that is not ugly:
##defining a matrix called allmat
allmat <- matrix(rnorm(100), nrow=50, ncol=2)
##sampling rows of allmat at sizes from 0.1 to 10 times nrow(allmat)
for(i in seq(0,9,by=1)) {
for(j in seq(0.1,10,by=0.05)) {
nam <- paste("powermatrix_",j,"_",i,sep="")
assign(nam, allmat[sample(nrow(allmat),replace=T,size=j*nrow(allmat)),])
}
}
##then using apropos to pick out the names of the matrices from the workspace
##eventually converting the matrix names into a list to then use lapply
matrixlist <- data.frame(apropos("powermatrix_"), stringsAsFactors = FALSE)
##then rather horribly trying somehow to get my dataframe into a
## list which eventually I do below (but although inelegant this bit is
## not crucial)
colnames(matrixlist) <- "col1"
matrixlist_split <- strsplit(matrixlist$col1, "_")
library("plyr")
df <- ldply(matrixlist_split)
colnames(df) <- c("V1", "V2", "V3")
vector_sample <- as.numeric(df$V2)
mynewdf <- cbind(matrixlist, vector_sample)
##creating the list before lapply
mylist <- as.list(mget(mynewdf$col1))
##then with the list I can use lapply (but there has to be a much
## much better way!)
Many thanks for all your input. This now works much better with the following two lines. I didn't know you could use seq_along or seq with lapply; the two in combination are very helpful.
# this vector sets the sample sizes (as fractions of nrow(allmat)) and the number of repetitions
seq_vector <- rep(seq(0.1, 1, by=0.1), each=10)
# this samples the matrix for every size and repeat defined in seq_vector
# (seq_along gives the positions 1..100, so index seq_vector[i] to get the fraction itself)
myotherlist <- lapply(seq_along(seq_vector), function(i) allmat[sample(1:nrow(allmat), replace=TRUE, size=seq_vector[i]*nrow(allmat)),])
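The same idea can replace the whole assign()/apropos()/mget()/strsplit() pipeline from the original code. A sketch, assuming the same fractions (0.1 to 10 in steps of 0.05) and 10 repetitions as the original loop: expand.grid() builds one row per (fraction, repetition) pair, so the sample fractions stay in a data frame instead of having to be parsed back out of object names. Rounding the sample size is an added safeguard against floating-point fractions like 0.30000000000000004.

```r
set.seed(1)
allmat <- matrix(rnorm(100), nrow = 50, ncol = 2)

fractions <- seq(0.1, 10, by = 0.05)   # sample sizes relative to nrow(allmat)
reps <- 0:9                            # repetition index, as in the original loop

# One row per (fraction, repetition) combination
grid <- expand.grid(fraction = fractions, rep = reps)

# Build the list directly; drop = FALSE keeps one-row samples as matrices
powermatrices <- lapply(grid$fraction, function(p)
  allmat[sample(nrow(allmat), size = round(p * nrow(allmat)), replace = TRUE), ,
         drop = FALSE])
names(powermatrices) <- paste("powermatrix", grid$fraction, grid$rep, sep = "_")
```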
I have the following data:
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow=10))  # the dataset: 10 rows x 200 columns
The following code creates 1000 bootstrapped datasets "x" and 1000 bootstrapped datasets "y" with 5 columns each.
colnums_boot <- replicate(1000,sample.int(200,10))
output<-lapply(1:1000, function(i){
Xprime <- X[,colnums_boot[1:5,i]]
Yprime <- X[,colnums_boot[6:10,i]]
xy <- list(x=Xprime,y=Yprime )
} )
I obtained a list of lists of data frames (xy), to which I would like to apply the following code, but I do not understand the list indexing operations.
From the output, consider the first element, output[[1]], which has
$x and
$y
I would like to apply the code
X <- cor(output[[1]]$x)
Y <- cor(output[[1]]$y)
separately and then
sapply(1:10, function(row) cor(X[row,], Y[row,]))
which will give me a single value for each row ("r1") for output[[1]].
I would like to apply this to the entire list and obtain r1, r2, and so on from output[[1]], output[[2]], ... up to output[[1000]], and combine them into a data frame. It will be a ten-by-thousand data frame in the end.
I can't find the question where I wrote that Xprime, Yprime bit; I hope you didn't delete it...? If I remember correctly, I suggested this, since it is much more efficient to deal with matrices:
Z <- as.matrix(X)
Xprime2 <- array(,dim=c(10,5,1000))
Yprime2 <- array(,dim=c(10,5,1000))
Xprime2[] <- Z[,colnums_boot[1:5,]]
Yprime2[] <- Z[,colnums_boot[6:10,]]
Anyway, in your setup, as @KarlForner commented, this will get you the correlations between the X and Y columns:
lapply(output,function(ll) cor(ll$x,ll$y))
This is also potentially inefficient when bootstrapping, since you will repeatedly compute correlations among the same 200 vectors. I think it makes more sense to compute them once up front with cor(X) and then pick the values out from there.
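A sketch of that idea with the question's setup (the seed is arbitrary): compute the 200 x 200 correlation matrix once, and each replicate's x-versus-y correlations are then just a 5 x 5 sub-block of it, the same values cor(ll$x, ll$y) would give.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))     # 10 rows x 200 columns
colnums_boot <- replicate(1000, sample.int(200, 10))

# All 200 x 200 column correlations, computed once
cor_all <- cor(X)

# Replicate i: rows are its 5 "x" columns, columns are its 5 "y" columns
boot_cors <- lapply(seq_len(ncol(colnums_boot)), function(i)
  cor_all[colnums_boot[1:5, i], colnums_boot[6:10, i]])
```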
As far as putting that into a data.frame, I'm not clear on what that would mean.
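If the goal is the per-row summary described in the question, one possible reading is sketched below. row_cors() is a hypothetical helper that applies the question's X/Y recipe to one replicate, and sapply() over output collects one column per replicate. Note that cor() on a 10 x 5 data frame returns a 5 x 5 matrix, so this yields 5 values per replicate rather than 10; seq_len(nrow(Cx)) avoids the hard-coded 1:10, which would run past the last row.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
colnums_boot <- replicate(1000, sample.int(200, 10))
output <- lapply(1:1000, function(i)
  list(x = X[, colnums_boot[1:5, i]], y = X[, colnums_boot[6:10, i]]))

# Hypothetical helper: correlate row k of cor(x) with row k of cor(y)
row_cors <- function(ll) {
  Cx <- cor(ll$x)
  Cy <- cor(ll$y)
  sapply(seq_len(nrow(Cx)), function(k) cor(Cx[k, ], Cy[k, ]))
}

# One column per bootstrap replicate, then as a data frame
r_mat <- sapply(output, row_cors)
r_df <- as.data.frame(r_mat)
```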
I am having difficulty subsetting my data by factors in a for loop. Here is an illustrative example:
x<-rnorm(n=40, m=0, sd=1)
y<-rep(1:5, 8)
df<-as.data.frame(cbind(x,y))
df_split<-split(df, df$y)
mean_vect<-rep(-99, 5)
for (i in c(1:5)) {
current_df<-df_split$i
mean_vect[i]<-mean(current_df)
}
This approach is not working because I think R is looking for a split called "i" when I really want it to pull out the i-th split! I have also tried the subset function with little joy. I always run into these problems when I am trying to split on a non-numeric factor, so any help would be appreciated.
FYI, this is typically done with tapply:
tapply( df$x, df$y, mean )
The first argument is the variable whose values you want to average by group. The second is the INDEX, i.e. the variable that defines your groups, and the last is the function you want to run on each group, in this case mean.
To get split number i, use
df_split[[i]]
BTW, as your final aim is mean_vect, you are better off using
mean_vect <- lapply(df_split, function(d) mean(d$x))
or:
mean_vect <- tapply(df$x, df$y, mean)
mean_vect
1 2 3 4 5
0.2566810 -0.1528079 -0.2097333 -0.1540343 0.3609312
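Putting the [[i]] fix back into the original loop gives the sketch below. Since the question mentions non-numeric factors, y here uses letters instead of 1:5 (an assumption for illustration; the seed is arbitrary). Two things change from the original: df_split[[i]] selects the i-th group by position, and mean() is applied to the x column, because mean() of a whole data frame returns NA with a warning.

```r
set.seed(1)
df <- data.frame(x = rnorm(40), y = rep(c("a", "b", "c", "d", "e"), 8))
df_split <- split(df, df$y)

mean_vect <- numeric(length(df_split))
names(mean_vect) <- names(df_split)
for (i in seq_along(df_split)) {
  current_df <- df_split[[i]]          # [[i]] picks the i-th group; $i looks for an element named "i"
  mean_vect[i] <- mean(current_df$x)   # mean() needs a numeric vector, not a data frame
}

# With a character or factor grouping you can also index a group by name
group_c <- df_split[["c"]]
```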
I'm trying to find an apply() type function that can run a function that operates on two arrays instead of one.
Sort of like:
apply(X1 = doy_stack, X2 = snow_stack, MARGIN = 2, FUN = r_part(a, b))
The data are band arrays from Landsat tiles, stacked together using rbind. Each row contains the data from a single tile, and in the end I need to apply a function to each column (pixel) of this stack. One stack records whether each pixel has snow on it or not, and the other contains the day of year for each row. I want to run a classifier (rpart) on each pixel and have it identify the snow-free day of year for each pixel.
What I'm doing now is pretty silly: mapply(paste, doy, snow_free) concatenates the day of year and the snow status for each pixel into a string, apply(strstack, 2, FUN) runs the classifier on each pixel, and inside the applied function I split each string back apart with strsplit. As you might imagine, this is pretty inefficient, especially on 1 million pixels x 300 tiles.
Thanks!
I wouldn't try to get too fancy. A for loop might be all you need.
n <- ncol(doy_stack)  # one iteration per pixel (column)
out <- numeric(n)
for(i in 1:n) {
  out[i] <- snow_free(doy_stack[,i], snow_stack[,i])
}
Or, if you don't want to do the bookkeeping yourself,
sapply(1:n, function(i) snow_free(doy_stack[,i], snow_stack[,i]))
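To try the loop and sapply versions without the real Landsat data, here is a toy sketch. The stack sizes are made up, and snow_free() below is a hypothetical stand-in (first day of year with no snow) for the real classifier. vapply() is used instead of sapply() because it checks that every call returns exactly one number.

```r
set.seed(1)
n_tiles <- 30   # rows: one per tile (date)
n <- 5          # columns: one per pixel

doy_stack  <- matrix(rep(sort(sample(1:365, n_tiles)), n), nrow = n_tiles)
snow_stack <- matrix(rbinom(n_tiles * n, 1, 0.5), nrow = n_tiles)

# Hypothetical stand-in for the classifier: first day of year without snow
snow_free <- function(doy, snow) {
  free <- doy[snow == 0]
  if (length(free)) min(free) else NA_real_
}

# Same per-pixel iteration as the sapply() version, with a type check on the result
out <- vapply(seq_len(n), function(i) snow_free(doy_stack[, i], snow_stack[, i]),
              numeric(1))
```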
I've just encountered the same problem and, if I understood the question correctly, I may have solved it using mapply.
We'll use two 10x10 matrices populated with uniform random values.
set.seed(1)
X <- matrix(runif(100), 10, 10)
set.seed(2)
Y <- matrix(runif(100), 10, 10)
Next, decide whether the operation between the matrices is row-wise or column-wise. If it is row-wise, you need to transpose X and Y and then convert them to data frames. This works because a data.frame is a list whose elements are its columns, and mapply() iterates over the elements of the lists (or vectors) you pass it. In this example I'll compute correlations row-wise.
res.row <- mapply(function(x, y){cor(x, y)}, as.data.frame(t(X)), as.data.frame(t(Y)))
res.row[1]
V1
0.36788
should be the same as
cor(X[1,], Y[1,])
[1] 0.36788
For column-wise operations exclude the t():
res.col <- mapply(function(x, y){cor(x, y)}, as.data.frame(X), as.data.frame(Y))
This assumes that X and Y have dimensions consistent with the operation of interest (they don't have to be exactly the same). For instance, a row-wise statistical test only needs the same number of rows in both matrices; the numbers of columns can differ.
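The data.frame round trip can also be skipped. A sketch of two alternatives for the same row-wise correlation: asplit() (base R, available since R 3.6.0) turns each matrix into a list of its rows for mapply(), and a plain vapply() over row indices does the same job without any reshaping.

```r
set.seed(1)
X <- matrix(runif(100), 10, 10)
set.seed(2)
Y <- matrix(runif(100), 10, 10)

# asplit(X, 1) gives a list with one element per row of X
res.row2 <- mapply(function(x, y) cor(x, y), asplit(X, 1), asplit(Y, 1))

# Or loop over row indices directly; numeric(1) checks each result is one number
res.row3 <- vapply(seq_len(nrow(X)), function(i) cor(X[i, ], Y[i, ]), numeric(1))
```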
Wouldn't it be more natural to implement this as a raster stack? With the raster package you can use entire rasters in functions (e.g. ras3 <- ras1^2 + ras2), as well as extract a single cell value from XY coordinates, or many cell values using a block or polygon mask.
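apply can work on higher-dimensional arrays, but not on a plain list (it needs an object with a dim attribute), so bind the two stacks into one 3-D array first. Not sure how your data is set up, but something like this might be what you are looking for:
arr <- array(c(doy_stack, snow_stack), dim = c(dim(doy_stack), 2))
apply(arr, c(1,2), function(x) r_part(x[1], x[2]))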
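If the per-pixel function needs each pixel's whole series rather than single cells, apply() over MARGIN = 2 of a tiles x pixels x 2 array passes one tiles x 2 matrix per pixel. A toy sketch: the sizes are made up, and r_part() here is a hypothetical stand-in (first snow-free day of year), not the rpart classifier.

```r
set.seed(1)
n_tiles <- 30
n_pix <- 5
doy_stack  <- matrix(rep(sort(sample(1:365, n_tiles)), n_pix), nrow = n_tiles)
snow_stack <- matrix(rbinom(n_tiles * n_pix, 1, 0.5), nrow = n_tiles)

# Hypothetical per-pixel function of both series (stands in for r_part)
r_part <- function(doy, snow) {
  free <- doy[snow == 0]
  if (length(free)) min(free) else NA_real_
}

# Bind the two stacks into a tiles x pixels x 2 array
arr <- array(c(doy_stack, snow_stack), dim = c(n_tiles, n_pix, 2))

# MARGIN = 2: each call receives arr[, j, ], a tiles x 2 matrix for pixel j
res <- apply(arr, 2, function(m) r_part(m[, 1], m[, 2]))
```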