Firstly, I’d like to apologize. I’m learning R on my own, so I couldn’t simplify my problem and decided to just write a short version of my real variables here.
I’m trying to implement a variant of the Maximum Likelihood classifier in R. I have some variables for each class stored in vectors and lists (each position refers to one class), and I want to apply a function to the rows of a matrix that contains the data I want to classify. The problem is that I need the results of that function separated by class. So far, I’m doing this:
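cc<-numeric(2)          # one covariance determinant per class
mm<-vector("list",2)    # one mean vector per class
ii<-vector("list",2)    # one inverse covariance matrix per class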
temp1<-matrix(nrow=16,ncol=6)
temp1<-as.data.frame(temp1)
temp1[]<-c(256,235,194,235,215,173,215,215,194,215,215,215,194,173,152,215,
430,388,388,388,388,430,430,430,388,346,346,388,388,388,346,388,
283,317,283,283,248,283,283,283,214,214,248,283,214,283,214,248,
3701,3450,3576,3826,3534,3450,3868,4035,3450,3493,3450,3701,3534,3242,3032,3116,
1646,1589,1589,1646,1646,1589,1646,1732,1560,1475,1589,1589,1675,1532,1503,1418,
474,556,556,515,556,556,597,637,556,515,515,515,515,515,434,434)
temp2<- matrix(nrow=11,ncol=6)
temp2<-as.data.frame(temp2)
temp2[]<-c(422,463,462,483,546,525,483,566,546,483,546,
770,812,770,812,854,854,812,939,939,854,981,
1038,1175,1004,1141,1209,1209,1038,1311,1311,1175,1311,
2359,2359,2275,2359,2359,2359,2359,2401,2359,2401,2401,
2445,2531,2417,2588,2759,2617,2388,2674,2730,2645,2731,
1413,1413,1373,1495,1618,1535,1413,1535,1659,1535,1618)
cc[1]<-det(cov(temp1))
cc[2]<-det(cov(temp2))
mm[[1]]<-as.numeric(sapply(temp1,"mean"))
mm[[2]]<-as.numeric(sapply(temp2,"mean"))
ii[[1]]<-solve(cov(temp1))
ii[[2]]<-solve(cov(temp2))
data<-matrix(nrow=10,ncol=6)
data<-as.data.frame(data)
data[]<-c(181,203,224,203,203,224,181,181,161,161,
338,338,338,338,296,296,338,381,338,296,
208,242,208,208,208,208,208,242,208,173,
3164,2954,2660,2787,2744,2787,2534,3457,2870,2912,
1476,1505,1391,1332,1304,1391,1132,1591,1448,1304,
474,474,474,515,392,432,432,556,515,474)
for (k in 1:2){
Pxi<-apply(data,1,function(x)1/(2*pi^(6/2)*cc[k]^(1/2))*exp(-1/2*t(as.numeric(x-mm[[k]]))%*%ii[[k]]%*%(as.numeric(x-mm[[k]]))))
if (k==1) {rule<-Pxi} else {rule<-cbind(rule,Pxi)}
}
This is what I get:
rule
rule Pxi
[1,] 4.316396e-13 0.000000e+00
[2,] 6.835553e-15 7.970888e-284
[3,] 8.674921e-21 2.687251e-145
[4,] 5.923777e-19 8.020048e-189
[5,] 5.627127e-16 8.064007e-184
[6,] 2.495667e-17 5.738550e-209
[7,] 6.311390e-22 8.913098e-97
[8,] 1.413893e-12 0.000000e+00
[9,] 5.521715e-15 1.619401e-221
[10,] 5.212091e-17 5.810407e-254
Well, as you can imagine, data is actually much bigger than in my example, and this last loop takes a very long time when the number of classes k is large. Any suggestions on how to make it faster?
It should be faster if you work with matrices. Here is a suggestion to replace the for loop:
data <- as.matrix(data)
const <- 2*pi^(6/2)
do.call(cbind, lapply(1L:2L, function(k) {
m <- sweep(data, 2L, mm[[k]])
#1/(const*cc[k]^(1/2))* exp(-1/2 * diag(m %*% ii[[k]] %*% t(m)))
1/(const*cc[k]^(1/2))* exp(-1/2 * rowSums((m %*% ii[[k]]) * m))
}))
The use of rowSums (instead of the original diag(m %*% ii[[k]] %*% t(m))) comes from the question "compute only diagonals of matrix multiplication in R".
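To see why the rowSums form can stand in for diag(), here is a minimal sketch on made-up data (the matrices m and S below are illustrative, not the question's variables). Both expressions return the same numbers, but the diag() version first builds a full n x n matrix, which is wasteful when there are many rows.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
m <- matrix(rnorm(12), nrow = 4)              # 4 observations (rows), 3 variables
S <- crossprod(matrix(rnorm(9), nrow = 3))    # a symmetric 3 x 3 matrix

via_diag    <- diag(m %*% S %*% t(m))         # builds a 4 x 4 matrix, keeps its diagonal
via_rowsums <- rowSums((m %*% S) * m)         # computes only the 4 diagonal entries

all.equal(via_diag, via_rowsums)
```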
output:
[,1] [,2]
[1,] 4.316396e-13 0.000000e+00
[2,] 6.835553e-15 7.970888e-284
[3,] 8.674921e-21 2.687251e-145
[4,] 5.923777e-19 8.020048e-189
[5,] 5.627127e-16 8.064007e-184
[6,] 2.495667e-17 5.738550e-209
[7,] 6.311390e-22 8.913098e-97
[8,] 1.413893e-12 0.000000e+00
[9,] 5.521715e-15 1.619401e-221
[10,] 5.212091e-17 5.810407e-254
Using cbind() in a loop is very expensive, because the accumulated result is copied on every iteration. Instead, you should assign the intermediate loop results to a list and then call do.call(cbind, rule) once at the end.
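A minimal sketch of that pattern, assuming a hypothetical score_class() helper that stands in for the per-class computation from the question:

```r
# Hypothetical stand-in for the per-class computation in the question
score_class <- function(k, n_rows = 5) rep(k, n_rows)

n_class <- 3
rule <- vector("list", n_class)     # pre-allocate one slot per class
for (k in seq_len(n_class)) {
  rule[[k]] <- score_class(k)       # store the column in its own slot
}
rule <- do.call(cbind, rule)        # bind all columns once, at the end
dim(rule)
```

The list is filled in place, so cbind() runs a single time regardless of the number of classes.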
Regarding why the apply() statement is slow, there are a lot of operations to go through for each row of data looped through. Instead, it is better to try to do matrix operations (or a function) all at once.
This uses the mahalanobis() function to simplify what's in the exp() call. It turns out that the function uses exactly the same approach as @chinsoon12.
1 / (2*pi^(6/2)*det(cov(temp1))^(1/2))*exp(-1 / 2 * mahalanobis(data, colMeans(temp1), cov(temp1)))
mahalanobis
#function (x, center, cov, inverted = FALSE, ...)
#{
# x <- if (is.vector(x))
# matrix(x, ncol = length(x))
# else as.matrix(x)
# if (!isFALSE(center))
# x <- sweep(x, 2L, center)
# if (!inverted)
# cov <- solve(cov, ...)
# setNames(rowSums(x %*% cov * x), rownames(x))
#}
#<bytecode: 0x000000000c217d80>
#<environment: namespace:stats>
I would approach this by first making a list() of your temp data.frames and then using lapply() to loop through them:
tmps <- list(temp1, temp2)
do.call(cbind,
lapply(tmps,
function(tmp) {
n = length(tmp)
cov_tmp <- cov(tmp)
1 / (2*pi^(n/2)*det(cov_tmp)^(1/2))*exp(-1 / 2 * mahalanobis(data, colMeans(tmp), cov_tmp))
}
)
)
[,1] [,2]
[1,] 4.316396e-13 0.000000e+00
[2,] 6.835553e-15 7.970888e-284
[3,] 8.674921e-21 2.687251e-145
[4,] 5.923777e-19 8.020048e-189
[5,] 5.627127e-16 8.064007e-184
[6,] 2.495667e-17 5.738550e-209
[7,] 6.311390e-22 8.913098e-97
[8,] 1.413893e-12 0.000000e+00
[9,] 5.521715e-15 1.619401e-221
[10,] 5.212091e-17 5.810407e-254
Reference: http://sar.kangwon.ac.kr/etc/rs_note/rsnote/cp11/cp11-7.htm
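Two details of the code above may be worth a sketch. First, R parses 2*pi^(6/2) as 2*(pi^3), whereas the textbook Gaussian constant is (2*pi)^(n/2); with n = 6 the two differ by a constant factor of 4 that is the same for every class, so it does not change which class scores highest. Second, several densities in the output underflow to 0, which working on the log scale avoids. The function name log_dmvn and the toy numbers below are assumptions for illustration, not part of the original answers.

```r
# Log-density of a multivariate normal for each row of x.
# Assumes the textbook normalising constant (2*pi)^(n/2).
log_dmvn <- function(x, center, cov) {
  x  <- as.matrix(x)
  n  <- ncol(x)
  d2 <- mahalanobis(x, center, cov)   # squared Mahalanobis distance per row
  log_det <- as.numeric(determinant(cov, logarithm = TRUE)$modulus)
  -0.5 * (n * log(2 * pi) + log_det + d2)
}

# Toy example with two independent variables (made-up numbers)
x   <- rbind(c(0, 0), c(1, 2))
mu  <- c(0, 1)
sig <- diag(c(1, 4))
log_dmvn(x, mu, sig)
```

With one column of log-densities per class, the predicted class for each row would be the column with the largest value, for example via max.col().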
I have the following list of 5x2 matrices:
l <- list(a=matrix(rnorm(10),nrow=5,ncol=2),
b=matrix(rnorm(10),nrow=5,ncol=2),
c=matrix(rnorm(10),nrow=5,ncol=2))
For example, the first element of this list looks like this:
$a
[,1] [,2]
[1,] -0.4988268 1.9881333
[2,] -0.2979064 1.5921169
[3,] -1.3783522 -1.4149601
[4,] 0.2205115 0.2029210
[5,] 1.2721645 0.2861253
I want to take this list and create a new 5x2 matrix using information from a vector v:
v <- c("a","a","b","c","b")
This vector is an indicator vector that has information on how this new matrix should be constructed. That is, take row 1 from list element a, take row 2 from list element a and so on.
One could do it with a for-loop, but for my application this is not efficient enough, and I feel there might be a more elegant solution. My approach:
goal <- matrix(nrow=5,ncol=2)
for(i in 1:length(v)){
goal[i,] <- l[[v[i]]][i,]
}
goal
[,1] [,2]
[1,] -0.4988268 1.98813326
[2,] -0.2979064 1.59211686
[3,] 0.7715907 0.16776669
[4,] 0.2690278 0.02542766
[5,] 1.7865093 0.46361239
Thanks!
Assuming all the matrices in the list have the same number of rows, we could use mapply and subset the matrices by name (v) and row number.
t(mapply(function(x, y) l[[x]][y, ], v, 1:nrow(l[[1]])))
# [,1] [,2]
#a -1.2070657 0.5060559
#a 0.2774292 -0.5747400
#b -0.7762539 -0.9111954
#c 0.4595894 -0.0151383
#b 0.9594941 2.4158352
data
set.seed(1234)
l <- list(a=matrix(rnorm(10),nrow=5,ncol=2),
b=matrix(rnorm(10),nrow=5,ncol=2),
c=matrix(rnorm(10),nrow=5,ncol=2))
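If the list holds only a few distinct matrices but v is long, another option is to loop over the distinct names rather than over the rows. This is only a sketch of that idea, reusing the seeded data above; it assumes, like the mapply version, that every matrix has the same dimensions.

```r
set.seed(1234)
l <- list(a = matrix(rnorm(10), nrow = 5, ncol = 2),
          b = matrix(rnorm(10), nrow = 5, ncol = 2),
          c = matrix(rnorm(10), nrow = 5, ncol = 2))
v <- c("a", "a", "b", "c", "b")

goal <- matrix(NA_real_, nrow = length(v), ncol = ncol(l[[1]]))
for (nm in unique(v)) {
  rows <- which(v == nm)                         # rows to take from matrix `nm`
  goal[rows, ] <- l[[nm]][rows, , drop = FALSE]  # copy them in one assignment
}
goal
```

The loop runs once per distinct name in v (three times here), not once per row.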
I'm trying to extract elements from the diagonals of several matrices stored in a list. My data looks something like this:
res <- list()
res[[1]] <- matrix(c(0.04770856,0.02854005,0.02854005,0.03260190), nrow=2, ncol=2)
res[[2]] <- matrix(c(0.05436957,0.04887182,0.04887182, 0.10484454), nrow=2, ncol=2)
> res
[[1]]
[,1] [,2]
[1,] 0.04770856 0.02854005
[2,] 0.02854005 0.03260190
[[2]]
[,1] [,2]
[1,] 0.05436957 0.04887182
[2,] 0.04887182 0.10484454
> diag(res[[1]])
[1] 0.04770856 0.03260190
> diag(res[[2]])
[1] 0.05436957 0.10484454
I would like to save the first and second elements of each diagonal of a given list into a vector similar to this:
d.1st.el <- c(0.04770856, 0.05436957)
d.2nd.el <- c(0.03260190, 0.10484454)
My problem is writing a function that runs over all the matrices in the list and extracts the diagonals. For some reason, when I use unlist() to extract the values of each matrix for a given level, I get the full matrix rather than just the diagonal values.
Does anyone have a simple solution?
sapply(res, diag)
[,1] [,2]
[1,] 0.04770856 0.05436957
[2,] 0.03260190 0.10484454
# or
lapply(res, diag)
[[1]]
[1] 0.04770856 0.03260190
[[2]]
[1] 0.05436957 0.10484454
If you want the vectors for some reason in your global environment:
alld <- lapply(res, diag)
names(alld) <- sprintf("d.%d.el", 1:length(alld))
list2env(alld, globalenv())
In two steps you can do:
# Step 1 - Get the diagonals
all_diags <- sapply(res, function(x) diag(t(x)))
print(all_diags)
[,1] [,2]
[1,] 0.04770856 0.05436957
[2,] 0.03260190 0.10484454
# Step 2 - Append to vectors
d.1st.el <- all_diags[1,]
d.2nd.el <- all_diags[2,]
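If only specific diagonal positions are needed, they can also be pulled out directly. The sketch below uses vapply(), which additionally checks that each result is a single number; it rebuilds res from the question so that it runs on its own.

```r
res <- list(
  matrix(c(0.04770856, 0.02854005, 0.02854005, 0.03260190), nrow = 2),
  matrix(c(0.05436957, 0.04887182, 0.04887182, 0.10484454), nrow = 2)
)

# One value per matrix: element [1, 1] and element [2, 2]
d.1st.el <- vapply(res, function(m) m[1, 1], numeric(1))
d.2nd.el <- vapply(res, function(m) m[2, 2], numeric(1))
```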
I have a matrix, named "mat", and a smaller matrix, named "center".
temp = c(1.8421,5.6586,6.3526,2.904,3.232,4.6076,4.8,3.2909,4.6122,4.9399)
mat = matrix(temp, ncol=2)
[,1] [,2]
[1,] 1.8421 4.6076
[2,] 5.6586 4.8000
[3,] 6.3526 3.2909
[4,] 2.9040 4.6122
[5,] 3.2320 4.9399
center = matrix(c(3, 6, 3, 2), ncol=2)
[,1] [,2]
[1,] 3 3
[2,] 6 2
I need to compute the (squared Euclidean) distance between each row of mat and every row of center. For example, the distance between mat[1,] and center[1,] can be computed as
diff = mat[1,]-center[1,]
t(diff)%*%diff
[,1]
[1,] 3.92511
Similarly, I can find the distance between mat[1,] and center[2,]:
diff = mat[1,]-center[2,]
t(diff)%*%diff
[,1]
[1,] 24.08771
Repeating this process for each row of mat, I end up with
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
I know how to implement it with for-loops. I was really hoping someone could tell me how to do it with some kind of an apply() function, maybe mapply() I guess.
Thanks
apply(center, 1, function(x) colSums((x - t(mat)) ^ 2))
# [,1] [,2]
# [1,] 3.925110 24.087710
# [2,] 10.308154 7.956554
# [3,] 11.324550 1.790750
# [4,] 2.608405 16.408805
# [5,] 3.817036 16.304836
If you want apply for the expressiveness of the code, that's one thing, but it's still looping, just with different syntax. This can be done without any loops, or with a very small one across center instead of mat. I'd just transpose first, because it's wise to get into the habit of moving as much work as possible out of the apply statement. (The BrodieG answer is pretty much identical in function.) These work because R automatically recycles the smaller vector along the matrix, and does it much faster than apply or for.
tm <- t(mat)
apply(center, 1, function(m){
colSums((tm - m)^2) })
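For the fully loop-free variant mentioned above, one possible sketch expands the squared distance as ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b and evaluates all row pairs with a single matrix product. The name sq_dist is illustrative; mat and center are rebuilt from the question so the snippet runs on its own.

```r
mat    <- matrix(c(1.8421, 5.6586, 6.3526, 2.904, 3.232,
                   4.6076, 4.8, 3.2909, 4.6122, 4.9399), ncol = 2)
center <- matrix(c(3, 6, 3, 2), ncol = 2)

# ||a||^2 + ||b||^2 for every (row of mat, row of center) pair, minus 2 * a.b
sq_dist <- outer(rowSums(mat^2), rowSums(center^2), "+") -
  2 * tcrossprod(mat, center)
round(sq_dist, 6)
```

Because of the subtraction, this form can lose a little floating-point precision when points are very close together, so treat it as a speed-oriented alternative rather than a drop-in replacement.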
Use dist and then extract the relevant submatrix:
ix <- 1:nrow(mat)
as.matrix( dist( rbind(mat, center) )^2 )[ix, -ix]
#           6         7
# 1 3.925110 24.087710
# 2 10.308154 7.956554
# 3 11.324550 1.790750
# 4 2.608405 16.408805
# 5 3.817036 16.304836
REVISION: simplified slightly.
You could use outer as well
d <- function(i, j) sum((mat[i, ] - center[j, ])^2)
outer(1:nrow(mat), 1:nrow(center), Vectorize(d))
This will solve it
t(apply(mat,1,function(row){
d1<-sum((row-center[1,])^2)
d2<-sum((row-center[2,])^2)
return(c(d1,d2))
}))
Result:
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
I have two datasets of equal length, 'vpXmin' and 'vpXmax', created from 'vp':
> head(vpXmin)
vp
[1,] 253641 2621722
[2,] 253641 2622722
[3,] 253641 2623722
[4,] 253641 2624722
[5,] 253641 2625722
[6,] 253641 2626722
> head(vpXmax)
vp
[1,] 268641 2621722
[2,] 268641 2622722
[3,] 268641 2623722
[4,] 268641 2624722
[5,] 268641 2625722
[6,] 268641 2626722
I want to join the corresponding rows of these datasets using 'rbind' and create a separate matrix for each pair, e.g.
l1<-rbind(vpXmax[1,],vpXmin[1,])
l2<-rbind(vpXmax[2,],vpXmin[2,])
... ...
I'm not familiar with R loops, but with this much data I want to do it in a loop. However, this attempt failed:
for (i in 1:length(vp)){rbind(vpXmax[i,],vpXmin[i,])}
Any idea why? Also, please point me to some good references for learning the different kinds of loops in R, if any. Thanks in advance.
Maybe something like:
vpXmax <- matrix(1:10,ncol=2)
vpXmin <- matrix(11:20,ncol=2)
l <- lapply(1:nrow(vpXmin),function(i) rbind(vpXmax[i,],vpXmin[i,]) )
Then, instead of l1, l2 etc etc you have
l[[1]]
# [,1] [,2]
#[1,] 1 6
#[2,] 11 16
l[[2]]
# [,1] [,2]
#[1,] 2 7
#[2,] 12 17
And although it is probably not ideal, there is one major thing wrong with your initial loop.
You aren't assigning your output, so you need to use assign or <- in some way to actually make an object. However, using assign is pretty much a flag to set off alarm bells that there is a better way to do things, and <- would require pre-allocating or other stuffing around.
Nevertheless, it will work, albeit polluting your work space with l1 l2... ln objects:
for (i in 1:nrow(vpXmax)) {assign(paste0("l",i), rbind(vpXmax[i,],vpXmin[i,]) )}
> l1
# [,1] [,2]
#[1,] 1 6
#[2,] 11 16
> l2
# [,1] [,2]
#[1,] 2 7
#[2,] 12 17
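If the l1, l2, ... names matter, a named list gives the same labels without creating one object per row in the workspace. A minimal sketch using the same toy matrices (the name pairs is just illustrative):

```r
vpXmax <- matrix(1:10, ncol = 2)
vpXmin <- matrix(11:20, ncol = 2)

pairs <- lapply(seq_len(nrow(vpXmax)),
                function(i) rbind(vpXmax[i, ], vpXmin[i, ]))
names(pairs) <- paste0("l", seq_along(pairs))   # "l1", "l2", ...
pairs$l1
```

Each element can then be reached as pairs$l1 or pairs[["l1"]], and the whole collection can still be passed to lapply().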
As @ToNoy indicates, it is not obvious what kind of output you want. The easiest way to proceed would be to create a list in which each element is the result of rbind-ing the corresponding rows of the two original data frames.
A <- data.frame("a" = runif(100, -1, 0), "b" = runif(100, 0, 1))
Z <- data.frame("a" = runif(100, -2, -1), "b" = runif(100, 1, 2))
output <- vector("list", nrow(A))
for (i in 1:nrow(A)) {
output[[i]] <- rbind(A[i, ], Z[i, ])
}