Vertical concatenation of many vectors Mata/Stata - vector

I am working with Mata (Stata) trying to append (vertically concatenate) many vectors.
I would like to do something like
mat C = A\B
but since I have about 300 column vectors to append into a single column vector, I would like to know if there is a command to do it (since it is quite tedious to type
mat C = c1\c2\c3...
300 times).

The code you cite is Stata's matrix language, which is not Mata.
How did you get these vectors in the first place? Are they named systematically?
There is likely to be an easy answer depending on the details.
For example, in Stata, with column vectors c1 ... c300, you can use
mat C = c1
forval j = 2/300 {
mat C = C \ c`j'
}
although the matsize limit may mean you are better off handling such a column vector as a Stata variable or in Mata.
EDIT: To produce a matrix in Stata with those vectors as its columns, use the column-join operator , rather than the row-join operator \.

This code adopts Nick's logic but uses the Mata language.
mat c=(1,4,7,10\2,5,8,11\3,6,9,12) // 3 x 4 matrix
mat list c
mata
c=st_matrix("c") // Stata matrix into Mata matrix
x=c[.,1]
for (i=2; i<5;i++) {
x=x\c[.,i]
}
st_matrix("newC",x) // Mata matrix into Stata matrix
end
mat list newC

Related

R data.table: How to use apply function?

I need to briefly explain the context before letting you know my question.
I am trying to process a large graph, namely the Social circles: Google+ here. The file gplus_combined.txt downloaded from that site is read by using data.table package:
library(data.table)
data = fread('gplus_combined.txt',stringsAsFactors = TRUE)
Variable data is of dimensions dim(data) = c(30494865,2) and here is an example of a row of data:
> data[1,]
1: 112188647432305746617 107727150903234299458
The two long integer strings are ids of nodes of the graph, and each row of data corresponds to an edge between the first and second node ids. Since working with node ids like those is not very convenient, I'd like to convert them to numbers using the R function strtoi. Here is what I have tried:
M = matrix(0,2,2)
for (i in 1:2) {
for (j in 1:2) {
M[i,j] = strtoi(data[i,j,with = FALSE])
}
}
print(M)
[,1] [,2]
[1,] 47826 45374
[2,] 65616 2462
This works well for just two rows of data, but it is too slow for processing about 30 million rows. So I want to use the R function apply to speed up the calculation. The problem is that if I just use
apply(data[1:2,], 1:2, strtoi)
[1,] NA NA
[2,] NA NA
then it returns a 2x2 matrix with NA entries. Note that to get the matrix M above, I need to include the parameter with = FALSE,
strtoi(data[i,j,with = FALSE])
otherwise M would also be a matrix of NA entries. Is there a way to pass the option with = FALSE to the apply function? Or is there any other faster way to get the same result as matrix M? Any suggestions/comments are greatly appreciated!
Thank you for spending your time reading this long post!
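One possible faster route, sketched here under assumptions rather than taken from the post: if the goal is only to replace the long id strings with convenient integers, every id can be looked up in the set of distinct ids in a single vectorized step with base R's match. The small edges data frame, its column names from and to, and the third id are hypothetical stand-ins for the real file, and the integers produced this way are arbitrary labels that will not reproduce the particular values shown in M above.

```r
# Hypothetical miniature edge list; the real file has about 30 million rows.
edges <- data.frame(
  from = c("112188647432305746617", "107727150903234299458", "112188647432305746617"),
  to   = c("107727150903234299458", "100000000000000000001", "100000000000000000001"),
  stringsAsFactors = FALSE
)

# All distinct node ids, in order of first appearance.
ids <- unique(c(edges$from, edges$to))

# match() returns, for each id, its position in `ids`; no explicit loop is needed.
M <- cbind(from = match(edges$from, ids), to = match(edges$to, ids))
print(M)
```

The original string for any integer label can be recovered afterwards as ids[label].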

Convert code from Matlab to R

Example of My Data
I have three matrix csv data files that I need to flatten and combine in R, so that I have three columns (Lat, Long, Data). The code I have for this is in matlab, but I need to convert this to R. Any thoughts? This is the matlab code that does this:
LON=csvread('LONGITUDE.csv');
LAT=csvread('LATITUDE.csv');
SM=csvread('soil_moisture20151008.csv');
xyz=zeros(101*210,3);
k=0;
for i=1:101
for j=1:210
k=k+1;
xyz(k,1)=LAT(i,j);
xyz(k,2)=LON(i,j);
xyz(k,3)=SM(i,j);
end
end
csvwrite('xyz.csv',xyz);
So far this is how I have changed it in R:
LON<-read.csv("LONGITUDE.csv", header = T)
LAT<-read.csv("LATITUDE.csv", header = T)
ET<-read.csv("actual_ET20100101.csv")
xyz=matrix(3,101,210)
k=0
for (i in 1:101){
for (j in 1:210){
k=k+1
xyz[k,1]=LAT[i,j]
xyz[k,2]=LON[i,j]
xyz[k,3]=ET[i,j]
}
}
write.csv("xyz.csv",xyz);
I'm not sure what I'm doing wrong. Any guidance on this issue would be greatly appreciated.
Finally, I have a whole directory of files that I need to run this script on, so any ideas on how to apply this to a directory would be great. The LAT/LON files don't change, just the data files.
Thank you!!
If I am understanding your data correctly, you have a large number of matrix files, where each index (row/column position) refers to the same data point in every file. That is, (1,1) in each matrix gives the value of interest for the 1st data point, and (1,2) gives values for a different data point.
In that case, you should just be able to convert them all to a matrix, extract the values as a vector, then stitch them together.
To illustrate, here are three identical data.frames (so that we can see if they align correctly):
A <- B <- C <-
data.frame(matrix(runif(36), nrow = 6))
Each data.frame is this:
X1 X2 X3 X4 X5 X6
1 0.2462450 0.6887587 0.216578122 0.5982332 0.2402868 0.9588999
2 0.5924075 0.7511237 0.813704807 0.6892747 0.6253069 0.4648226
3 0.7482773 0.4808986 0.006036452 0.6576487 0.5752148 0.5554258
4 0.8545323 0.6822942 0.654128179 0.6582181 0.8173544 0.5191778
5 0.1748737 0.7456279 0.992209169 0.4468014 0.3491022 0.9736064
6 0.7189847 0.3424291 0.581840006 0.1460138 0.8071445 0.2920479
Then, I put them all in a list (named, so that the columns come out named):
myList <- list(A = A, B = B, C = C)
Then, we loop through the list, converting each data.frame to a matrix, then extracting the values as a vector. Then, I convert the resulting list to a data.frame to get the column/row behavior you likely want (data.frames are just lists with special properties; each column is an element of the list, but data.frames assumes the value orders match). Note that I am using magrittr/dplyr piping to simplify the nesting in the code:
library(magrittr) # provides the %>% pipe
flattened <-
  lapply(myList, function(x){
    as.matrix(x) %>%
      as.numeric()
  }) %>%
  as.data.frame()
Then, the head of this (from my randomization) looks like:
A B C
1 0.2462450 0.2462450 0.2462450
2 0.5924075 0.5924075 0.5924075
3 0.7482773 0.7482773 0.7482773
4 0.8545323 0.8545323 0.8545323
5 0.1748737 0.1748737 0.1748737
6 0.7189847 0.7189847 0.7189847
Of note, you mentioned that you may have multiple data sources that you want to merge -- as long as you load them all up into this list, the approach will generate a column for each.
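As a rough sketch of how this could look for the Lat/Long/Data case in the question: flatten_grid below is a hypothetical helper, not part of any package, and the small matrices are made-up stand-ins for the 101 x 210 grids. One detail worth noting is that the nested i/j loop in the MATLAB code walks each grid row by row, while as.vector() in R walks column by column, so the sketch transposes first to keep the same ordering.

```r
# Hypothetical helper: flatten three equally sized grids into Lat/Long/Data columns.
flatten_grid <- function(lat, lon, values) {
  stopifnot(all(dim(lat) == dim(lon)), all(dim(lat) == dim(values)))
  # t() before as.vector() walks each grid row by row, like the nested i/j loop.
  cbind(Lat  = as.vector(t(lat)),
        Long = as.vector(t(lon)),
        Data = as.vector(t(values)))
}

# Small made-up grids standing in for the 101 x 210 matrices.
LAT <- matrix(1:6, nrow = 2)
LON <- matrix(11:16, nrow = 2)
SM  <- matrix(21:26, nrow = 2)

xyz <- flatten_grid(LAT, LON, SM)
print(xyz)
```

With real files, each grid would first be read with something like as.matrix(read.csv(path, header = FALSE)), assuming the CSV files have no header row. For a whole directory, the data file names could be collected with list.files() and passed through the helper with lapply(), reusing the same LAT and LON each time.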

How to feed two arrays into apply

I have two 3-D arrays, and I want to calculate some statistics on them. As long as I am working with only one variable, I know how to do it. For example, to calculate the mean over the first dimension, I use the following:
obs<-array(1:8,c(2,2,2));
mod<-array(9:2,c(2,2,2));
meanObs <- apply(obs,c(2,3),mean) # mean of observation
meanMod <- apply(mod,c(2,3),mean) # mean of model simulation/forecast
However, I do not know how to feed two sliced array into apply. For example, I am trying to calculate the correlation coefficient over the first dimension. I can do it with the following loop functions:
pearsonCor<-matrix(, nrow = dim(obs)[2], ncol = dim(obs)[3])
for (i in 1:dim(obs)[2]){
for (j in 1:dim(obs)[3]){
pearsonCor[i,j]<-tryCatch(suppressWarnings(cor(obs[,i,j], mod[,i,j], method = "pearson")),
error=function(cond) {return(NA)})
}
}
result:
> pearsonCor
[,1] [,2]
[1,] -1 -1
[2,] -1 -1
But I want to learn how to deal with this situation with apply. Any help would be very much appreciated.
Thanks,
You can use expand.grid to get the index combination as in your nested for loop. Then apply over the data.frame of indices.
pearsonCor[] <- apply(expand.grid(1:dim(obs)[2], 1:dim(obs)[3]), 1, function(x)
cor(obs[,x[[1]], x[[2]]], mod[,x[[1]], x[[2]]]))
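Note that expand.grid varies its first variable fastest (the one corresponding to i in the loops), whereas the nested loops vary j fastest. Because pearsonCor[] is filled column by column, the first index still ends up as the row and the second as the column, so the result has the same layout as the matrix in your question.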
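A variant of the same idea, offered only as a sketch: mapply walks two index vectors in parallel, so the slice indices can be passed as named arguments instead of being unpacked from a row. The names idx, i, j and vals are illustrative; obs and mod are the arrays from the question.

```r
obs <- array(1:8, c(2, 2, 2))
mod <- array(9:2, c(2, 2, 2))

# Every (i, j) combination of the 2nd and 3rd dimensions; i varies fastest.
idx <- expand.grid(i = seq_len(dim(obs)[2]), j = seq_len(dim(obs)[3]))

# Correlate matching slices of the two arrays along the first dimension.
vals <- mapply(function(i, j) cor(obs[, i, j], mod[, i, j]), idx$i, idx$j)

# matrix() fills column by column, so element [i, j] belongs to slice (i, j).
pearsonCor <- matrix(vals, nrow = dim(obs)[2], ncol = dim(obs)[3])
print(pearsonCor)
```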

Creating an array or lists of lists in R

I have a list of matrices: my_list[[1]] holds one matrix, my_list[[2]] holds another, and so on. I want to build this list inside a loop so that every iteration produces a different my_list with different matrices, and I want to be able to access all of them later. Is there a way to do this in R? For example, by creating an array (of size equal to the number of loop iterations) in which each index holds a different list of matrices, or something similar. And how would I access it? I have looked around but cannot find a way to do this. Lists of lists seem to be an option, and I have experimented with one iteration, but it gives this error:
> nes <- list()
> nes[[1]] <- append(nes[[1]], my_list[[1]])
Error in nes[[1]] : subscript out of bounds
Would be great if anyone could help me with this.
EDIT:
Basically what I have is an initial list known as particles. Something like this:
for (k in 1:10)
{
# three centroids; k = 3
particle[[k]] <- rbind(features.dataf[sample(1:10, 1),2:4],
features.dataf[sample(1:10, 1),2:4],
features.dataf[sample(1:10, 1),2:4])
row.names(particle[[k]]) <- c(1,2,3)
}
Then I run this loop again, with an extra outer loop.
for (n in 1:30) {
for (k in 1:10) {
###some calculations
### create a vector f[k] with an f value for each k (calculated according to some formula)
pbestFitness[n,k] <- f[k] ##create a nXk dataframe that stores the f[k] value for every iteration of n
### over here I want to create a list of lists
}
}
The last comment in the code above marks where I want to create the list of lists, so that for every iteration of the outer loop all the particle[[k]] matrices are stored.
Any particle[[k]] is of the form:
[,1] [,2] [,3]
[1,] 0.96436532 0.8958297 0.6089338
[2,] 0.08555853 0.7762849 0.6647247
[3,] 0.30792817 0.8061227 0.5099790
The desired output is a list of lists (nes) such that nes[[n]] is a list containing the k matrices from iteration n.
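A minimal sketch of one way to get that structure, under stated assumptions: the error above arises because nes[[1]] is read before any first element exists, so the sketch assigns to nes[[n]] directly instead of appending to it. The sizes n_iter and n_particles and the random 3 x 3 matrices are placeholders for the real calculations.

```r
set.seed(1)       # only so the illustrative matrices are reproducible
n_iter <- 3       # placeholder for the 30 outer iterations
n_particles <- 2  # placeholder for the 10 particles

# Preallocate the outer list: one slot per outer iteration.
nes <- vector("list", n_iter)

for (n in seq_len(n_iter)) {
  particle <- vector("list", n_particles)
  for (k in seq_len(n_particles)) {
    # Stand-in for the real centroid matrix of particle k.
    particle[[k]] <- matrix(runif(9), nrow = 3)
  }
  # Store the whole list of matrices for this iteration.
  nes[[n]] <- particle
}

# Matrix of particle 1 from outer iteration 2.
print(nes[[2]][[1]])
```

Here nes[[n]] is the full list of matrices for iteration n, and nes[[n]][[k]] is a single matrix.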

R - "apply" for 2 matrices

I am doing an R assignment and I have to write a function that does what dist.xyz does.
dist.xyz(a, b = NULL, all.pairs=FALSE)
a and b are matrices of numbers and the function computes the distances between corresponding rows of 'a' and 'b'.
I tried a for loop (as below) but it takes too long, and "apply" only allows us to do operations on 1 matrix at a time.
dis = vector()
for (i in 1:nrow(a)) {
dis <- append(dis,sqrt(sum((a[i,] - b[i,]) ^ 2)))
}
Is there some way to "apply" to two matrices?
Thanks in advance
Would be easier if you had example data. But here's my take. This isn't a general solution for '"apply" for 2 matrices'. However, in your case, you only need apply for a single matrix a-b, since the element-wise difference of each row is the first thing you take. Then apply square, sum, and square root to each row to obtain your result.
set.seed(7) # just to ensure reproducible results
rowDist<-function(a,b) {
apply(a-b,1,function(x)sqrt(sum(x^2)))
}
a<-matrix(rnorm(25),5,5)
b<-matrix(rnorm(25),5,5)
rowDist(a,b)
#[1] 2.716251 2.685056 3.699462 2.125998 3.437412
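As a possible alternative, sketched here rather than taken from the answer above: the same row-wise distance can be written without apply at all, because rowSums works on the whole matrix of squared differences at once. The name rowDistVec is illustrative, and the sketch assumes a and b have the same dimensions.

```r
set.seed(7) # same seed as above, so a and b are the same matrices
a <- matrix(rnorm(25), 5, 5)
b <- matrix(rnorm(25), 5, 5)

# Square the element-wise differences, sum across each row, take the root.
rowDistVec <- function(a, b) {
  stopifnot(all(dim(a) == dim(b)))
  sqrt(rowSums((a - b)^2))
}

print(rowDistVec(a, b))
```

This should agree with rowDist(a, b) and is typically faster on large matrices, since the per-row work happens inside rowSums rather than in an R-level function call for each row.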
