R package copula: does dimension order matter in cCopula?

I'm trying to get a sample conditional on a value in one dimension using cCopula from R's copula package. I get the expected behavior when the conditioned value is in the first dimension, but not in other dimensions.
The first dimension works as expected:
cc <- claytonCopula(.5, dim = 2)
U <- cCopula(cbind(.1, runif(1000)), copula = cc, inverse = TRUE)
> head(U)
[,1] [,2]
[1,] 0.1 0.02399811
[2,] 0.1 0.51941744
[3,] 0.1 0.54457839
[4,] 0.1 0.30212338
[5,] 0.1 0.16368668
[6,] 0.1 0.43865921
The second does not. I expect .1 to be the value in the second column.
U <- cCopula(cbind(runif(1000), .1), copula = cc, inverse = TRUE)
head(U)
[,1] [,2]
[1,] 0.85596900 0.19792006
[2,] 0.05069967 0.02663780
[3,] 0.87673450 0.20056410
[4,] 0.52156481 0.14809874
[5,] 0.42508008 0.13026719
[6,] 0.04852083 0.02567477
My question is: should the order matter in cCopula? If yes, how can I work around it, and if no, what am I doing wrong?

The order does matter in cCopula: it computes the (inverse) Rosenblatt transform. Check the Value section in the documentation for that function: each column "contains the conditional copula function values", conditioned on the columns before it.
I'm not sure why you'd expect a column of 0.1 in your second example; even in the first example, that second column is not the random uniform values:
set.seed(1)
cc <- claytonCopula(.5, dim = 2)
Z <- cbind(.1, runif(1000))
U <- cCopula(Z, copula = cc, inverse = TRUE)
> head(Z)
[,1] [,2]
[1,] 0.1 0.2655087
[2,] 0.1 0.3721239
[3,] 0.1 0.5728534
[4,] 0.1 0.9082078
[5,] 0.1 0.2016819
[6,] 0.1 0.8983897
> head(U)
[,1] [,2]
[1,] 0.1 0.2293643
[2,] 0.1 0.3274950
[3,] 0.1 0.5232455
[4,] 0.1 0.8893238
[5,] 0.1 0.1723588
[6,] 0.1 0.8777835
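Since the Clayton copula is exchangeable (symmetric in its two arguments), a simple workaround for conditioning on the second dimension is to condition on the first as usual and then swap the columns; a minimal sketch:
library(copula)
set.seed(1)
cc <- claytonCopula(.5, dim = 2)
U <- cCopula(cbind(.1, runif(1000)), copula = cc, inverse = TRUE)
U_swapped <- U[, 2:1]  # the conditioned value .1 now sits in column 2
head(U_swapped)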

Related

R: Calculation with each element in a matrix across lists

I think my example is something special. Since I am not advanced in the use of lapply, I am stuck with the following calculation. Here is a short reproducible example: assume I have a list containing three matrices:
list <- list(est1=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2), est2=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2),
est3=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2))
$`est1`
[,1] [,2]
[1,] 0.4 1.0
[2,] 0.0 0.4
[3,] 0.0 0.0
[4,] 0.0 0.4
[5,] 0.0 1.0
$est2
[,1] [,2]
[1,] 0.0 0.2
[2,] 0.4 0.4
[3,] 1.0 0.0
[4,] 0.2 1.0
[5,] 0.4 0.4
$est3
[,1] [,2]
[1,] 1.0 0.2
[2,] 0.4 1.0
[3,] 1.0 0.0
[4,] 1.0 0.2
[5,] 0.4 0.4
Each matrix contains coefficient estimates from different iterations. Each element inside one matrix belongs to one coefficient. I want to calculate the percentage, across the three matrices, at which a coefficient is different from zero.
Expected Output:
[,1] [,2]
0.67 1
0.67 1
0.67 0
0.67 1
0.67 1
Please do not call your list list. In the following, it will be called z.
z <- list(est1=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2), est2=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2),
est3=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2))
For the kind of problems that you describe, I like to use arrays, so the first step is to transform your list into an array.
library(abind)
A <- abind(z, along=3)
Then, you can apply a function along the third dimension:
apply(A, 1:2, function(x) 100 * sum(x!=0) / length(x))
[,1] [,2]
[1,] 100.0 100.0
[2,] 100.0 66.7
[3,] 100.0 66.7
[4,] 100.0 66.7
[5,] 66.7 66.7
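If you prefer plain base R without abind, a sketch assuming all matrices in z have the same dimensions:
# element-wise count of non-zero entries across the list, then the proportion
nonzero <- Reduce(`+`, lapply(z, function(m) m != 0))
nonzero / length(z)  # multiply by 100 for percentages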
Maybe the following does what you want. Note that it computes the proportion of zero entries; swap `==` for `!=` below to get the proportion of non-zero entries, which is what your expected output shows.
I start by setting the RNG seed to make the results reproducible
set.seed(2081) # Make the results reproducible
list <- list(est1 = matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2),
est2 = matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2),
est3 = matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2))
zeros <- sapply(list, `==`, 0)
res <- rowSums(zeros) / ncol(zeros)
matrix(res, ncol = 2)
# [,1] [,2]
#[1,] 0.3333333 0.3333333
#[2,] 0.0000000 0.6666667
#[3,] 0.0000000 0.3333333
#[4,] 0.3333333 0.3333333
#[5,] 0.6666667 0.3333333
EDIT.
The following uses rowMeans and is simpler. The result is identical() to res above.
res2 <- rowMeans(zeros)
identical(res, res2)
#[1] TRUE
matrix(res2, ncol = 2)

How does the `prop.table()` function work in r?

I've just started learning R and have had trouble finding an (understandable) explanation of what the prop.table() function does. I found the following explanation and example:
prop.table: Express Table Entries as Fraction of Marginal Table
Examples
m <- matrix(1:4, 2)
m
prop.table(m, 1)
But, as a beginner, I do not understand what this explanation means. I've also attempted to discern its functionality from the result of the above example, but I haven't been able to make sense of it.
With reference to the example above, what does the prop.table() function do? Furthermore, what is a "marginal table"?
The values in each cell divided by the sum of the 4 cells:
prop.table(m)
The value of each cell divided by the sum of the row cells:
prop.table(m, 1)
The value of each cell divided by the sum of the column cells:
prop.table(m, 2)
I think a complete worked example can help. It includes all three variants: prop.table(m), prop.table(m, 1), and prop.table(m, 2).
m <- matrix(1:4, 2)
> m
[,1] [,2]
[1,] 1 3
[2,] 2 4
> prop.table(m) #sum=1+2+3+4=10, 1/10=0.1, 2/10=0.2, 3/10=0.3,4/10=0.4
[,1] [,2]
[1,] 0.1 0.3
[2,] 0.2 0.4
> prop.table(m,1)
[,1] [,2]
[1,] 0.2500000 0.7500000 #row1: sum=1+3=4, 1/4=0.25, 3/4=0.75
[2,] 0.3333333 0.6666667 #row2: sum=2+4=6, 2/6=0.33, 4/6=0.67
> prop.table(m,2)
[,1] [,2]
[1,] 0.3333333 0.4285714 #1/3=0.33 (col1 sum=1+2=3), 3/7=0.43 (col2 sum=3+4=7)
[2,] 0.6666667 0.5714286 #2/3=0.67, 4/7=0.57
When m is a 2D matrix: prop.table(m, 1) gives each entry as a fraction of the row marginal table (the sum over its row), and prop.table(m, 2) as a fraction of the column marginal table (the sum over its column). In short, it is just a "% of the total sum of the row or column", if you don't want to care about the term marginal.
Example: m <- matrix(1:6, nrow = 3), shown with an extra row and column of margin totals (***):
     [,1] [,2]  ***
[1,]    1    4    5
[2,]    2    5    7
[3,]    3    6    9
 ***    6   15
> prop.table(m,1)
[,1] [,2]
[1,] 0.2000000 0.8000000
[2,] 0.2857143 0.7142857
[3,] 0.3333333 0.6666667
> prop.table(m,2)
[,1] [,2]
[1,] 0.1666667 0.2666667
[2,] 0.3333333 0.3333333
[3,] 0.5000000 0.4000000
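If it helps to demystify the function: prop.table(x, margin) is essentially sweep(x, margin, margin.table(x, margin), "/"), i.e. each entry divided by its marginal total. A quick check:
m <- matrix(1:6, nrow = 3)
all.equal(prop.table(m, 1), sweep(m, 1, margin.table(m, 1), "/"))  # TRUE
all.equal(prop.table(m, 2), sweep(m, 2, margin.table(m, 2), "/"))  # TRUE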

`sweep() function` in R taking `2L` as input

Very, very specific question, but I'm stuck trying to unravel the code within contr.poly() in R.
I am at what I think is the last hurdle... There is this internal function, make.poly(), which is the critical part of contr.poly(). Within make.poly I see that there is a raw matrix generated, which for contr.poly(4) is:
[,1] [,2] [,3] [,4]
[1,] 1 -1.5 1 -0.3
[2,] 1 -0.5 -1 0.9
[3,] 1 0.5 -1 -0.9
[4,] 1 1.5 1 0.3
From there the function sweep() is applied with the following call and result:
Z <- sweep(raw, 2L, apply(raw, 2L, function(x) sqrt(sum(x^2))),
"/", check.margin = FALSE)
[,1] [,2] [,3] [,4]
[1,] 0.5 -0.6708204 0.5 -0.2236068
[2,] 0.5 -0.2236068 -0.5 0.6708204
[3,] 0.5 0.2236068 -0.5 -0.6708204
[4,] 0.5 0.6708204 0.5 0.2236068
I am familiar with the apply functions, and I guess sweep is similar, at least in syntax, but I don't understand what 2L is doing, and I don't know if "/" and check.margin = F are important to understand the mathematical operation being performed.
EDIT: Quite easy, thanks to this: it just normalizes the columns to unit length, dividing ("/") each entry of the matrix by the column-wise result of function(x), i.e. by its column's Euclidean norm. The 2L is simply the integer literal 2, i.e. MARGIN = 2, meaning "operate column-wise".
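A quick way to confirm this: after the sweep, every column of Z has unit Euclidean length.
colSums(Z^2)
# [1] 1 1 1 1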
Here is an example that illustrates the operation performed by sweep().
I start with a matrix
> set.seed(0)
> (mat = matrix(rnorm(30, 5, 3), nrow= 10))
[,1] [,2] [,3]
[1,] 8.7888629 7.290780 4.327196
[2,] 4.0212999 2.602972 6.132187
[3,] 8.9893978 1.557029 5.400009
[4,] 8.8172880 4.131615 7.412569
[5,] 6.2439243 4.102355 4.828680
[6,] 0.3801499 3.765468 6.510824
[7,] 2.2142989 5.756670 8.257308
[8,] 4.1158387 2.324237 2.927138
[9,] 4.9826985 6.307050 1.146202
[10,] 12.2139602 1.287385 5.140179
and I want to center the data column-wise. Granted, I could use scale(mat, center = T, scale = F) and be done, but I find that this function gives you a list of attributes attached to the result, like this:
attr(,"scaled:center")
[1] 6.076772 3.912556 5.208229
corresponding to the column means. Good to have, but I just wanted the matrix, clean and neat. So it turns out that this can be achieved with:
> (centered = sweep(mat, 2, apply(mat,2, function(x) mean(x)),"-"))
[,1] [,2] [,3]
[1,] 2.7120910 3.3782243 -0.88103281
[2,] -2.0554720 -1.3095838 0.92395779
[3,] 2.9126259 -2.3555271 0.19177993
[4,] 2.7405161 0.2190592 2.20433938
[5,] 0.1671524 0.1897986 -0.37954947
[6,] -5.6966220 -0.1470886 1.30259477
[7,] -3.8624730 1.8441143 3.04907894
[8,] -1.9609332 -1.5883194 -2.28109067
[9,] -1.0940734 2.3944938 -4.06202721
[10,] 6.1371883 -2.6251713 -0.06805063
So the sweep() call can be read as: sweep(the matrix to sweep through; whether to work column-wise (2) or row-wise (1); the statistics to sweep out, here computed on the fly with apply() on the same matrix, or another one, again column- or row-wise, with function(x) mean(x); and finally the actual operation, in quotes: "-" or "/")... and done. A version with named arguments is sketched below.
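Here is the same centering written with sweep()'s named arguments, with colMeans() replacing the apply() call (the two are equivalent here):
centered <- sweep(mat, MARGIN = 2, STATS = colMeans(mat), FUN = "-")
# same values as scale(mat, center = TRUE, scale = FALSE), minus the attributes
all.equal(centered, scale(mat, center = TRUE, scale = FALSE), check.attributes = FALSE)
# [1] TRUE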
Interestingly, we could have used the column means of a completely different matrix to sweep through the original matrix, presumably a more complex operation, more in line with the reason why this function was developed.
> aux.mat = matrix(rnorm(9), nrow = 3)
> aux.mat
[,1] [,2] [,3]
[1,] -0.2793463 -0.4527840 -1.065591
[2,] 1.7579031 -0.8320433 -1.563782
[3,] 0.5607461 -1.1665705 1.156537
> (centered = sweep(mat, 2, apply(aux.mat,2, function(x) mean(x)),"-"))
[,1] [,2] [,3]
[1,] 8.1090952 8.107913 4.818142
[2,] 3.3415323 3.420105 6.623132
[3,] 8.3096302 2.374162 5.890954
[4,] 8.1375203 4.948748 7.903514
[5,] 5.5641567 4.919487 5.319625
[6,] -0.2996178 4.582600 7.001769
[7,] 1.5345313 6.573803 8.748253
[8,] 3.4360710 3.141369 3.418084
[9,] 4.3029308 7.124183 1.637147
[10,] 11.5341925 2.104517 5.631124

Calculate the minimum and maximum of coordinates from a SpatialLinesDataFrame object

I want a simple way to calculate the minimum and maximum of the coordinates for each line in a SpatialLinesDataFrame object.
Code:
coordinates(contour)
This extracts the coordinates from the SpatialLinesDataFrame object:
[[9]]
[[9]][[1]]
[,1] [,2]
[1,] -4.44583300 45.87010
[2,] -4.24583300 45.87874
[3,] -4.04583300 45.90037
[4,] -4.02830912 45.90306
[20,] -1.6458330 42.98340
[21,] -1.8458330 43.07336
[[12]]
[[12]][[1]]
[,1] [,2]
[1,] -1.845833 43.48721
[2,] -1.849027 43.50306
[3,] -1.845833 43.50926
[4,] -1.710073 43.70306
[5,] -1.645833 43.74554
[6,] -1.445833 43.73724
[7,] -1.373848 43.70306
[8,] -1.261626 43.50306
[9,] -1.308085 43.30306
[10,] -1.445833 43.17663
[11,] -1.645833 43.16952
[12,] -1.808587 43.30306
[13,] -1.845833 43.48721
[[13]]
[[13]][[1]]
[,1] [,2]
[1,] -1.645833 43.34325
[2,] -1.712682 43.50306
[3,] -1.645833 43.58276
[4,] -1.445833 43.58877
[5,] -1.376018 43.50306
[6,] -1.445833 43.33714
[7,] -1.645833 43.34325
Is there an easier way of doing it?
Edit:
To show the result I would like, here is an example taken from @EDi's answer:
[[1]]
[[1]][[1]]
[,1] [,2]
[1,] 1 3
[2,] 2 2
[3,] 3 2
min(1,2,3)=1 & min(3,2,2)=2
max(1,2,3)=3 & max(3,2,2)=3
[[1]][[2]]
[,1] [,2]
[1,] 1.05 3.05
[2,] 2.05 2.05
[3,] 3.05 2.05
min(1.05,2.05,3.05) = 1.05 & min(3.05,2.05,2.05) = 2.05
max(1.05,2.05,3.05) = 3.05 & max(3.05,2.05,2.05) = 3.05
[[2]]
[[2]][[1]]
[,1] [,2]
[1,] 1 1.0
[2,] 2 1.5
[3,] 3 1.0
min(1,2,3) = 1 & min(1.0,1.5,1.0) = 1.0
max(1,2,3) = 3 & max(1.0,1.5,1.0) = 1.5
Not sure if I understood you correctly... Something like this?
# Some Lines --------------------------------------------------------------
require(sp)
l1 = cbind(c(1,2,3), c(3,2,2))
l1a = cbind(l1[,1]+.05,l1[,2]+.05)
l2 = cbind(c(1,2,3),c(1,1.5,1))
sl1 = Lines(list(Line(l1), Line(l1a)), ID = 'a')
sl2 = Lines(Line(l2), ID = 'b')
sl = SpatialLines(list(sl1, sl2))
plot(sl, col = c("red", "blue"))
abline(v = 1:3, lty = 'dotted')
abline(h = 1:3, lty = 'dotted')
# Extract min / max of coordinates for each line --------------------------
cc <- coordinates(sl)
foo <- function(y) {
# combine coordinates lines with same ID
ccl <- do.call(rbind, y)
# return min / max
return(c(range(ccl[,1]), range(ccl[,2])))
}
out <- t(sapply(cc, foo))
out
# for each line one row
# from left to right (min(x), max(x), min(y), max(y))
Update
Based on your edit (it wasn't clear to me that you want the extent for each line segment) I would suggest:
foo <- function(y) {
return(c(range(y[,1]), range(y[,2])))
}
rapply(cc, foo, how = 'unlist')
matrix(rapply(cc, foo, how = 'unlist'), nrow = 4)
rapply() applies the function also to sublists, matrix() is just for formatting.
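For readability, you could label the rows of that matrix (using foo as defined just above, so the rows are min(x), max(x), min(y), max(y) and each column is one line):
ext <- matrix(rapply(cc, foo, how = 'unlist'), nrow = 4)
rownames(ext) <- c("min_x", "max_x", "min_y", "max_y")
ext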

Solving non-square linear system with R

How do you solve a non-square linear system A X = B with R?
(in the case where the system has no solution or infinitely many solutions)
Example :
A=matrix(c(0,1,-2,3,5,-3,1,-2,5,-2,-1,1),3,4,T)
B=matrix(c(-17,28,11),3,1,T)
A
[,1] [,2] [,3] [,4]
[1,] 0 1 -2 3
[2,] 5 -3 1 -2
[3,] 5 -2 -1 1
B
[,1]
[1,] -17
[2,] 28
[3,] 11
If the matrix A has more rows than columns, then you should use a least squares fit (see the sketch just below).
If the matrix A has fewer rows than columns, then you should perform singular value decomposition. Each algorithm does the best it can to give you a solution by using assumptions.
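For the overdetermined case, a minimal sketch using base R's qr.solve(), which returns a least-squares fit for rectangular systems (A2 and b2 are made-up data for illustration):
set.seed(42)
A2 <- matrix(rnorm(12), nrow = 4, ncol = 3)  # 4 equations, 3 unknowns
b2 <- rnorm(4)
x_ls <- qr.solve(A2, b2)                     # minimizes ||A2 %*% x - b2||^2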
Here's a link that shows how to use SVD as a solver:
http://www.ecse.rpi.edu/~qji/CV/svd_review.pdf
Let's apply it to your problem and see if it works:
Your input matrix A and known RHS vector B:
> A=matrix(c(0,1,-2,3,5,-3,1,-2,5,-2,-1,1),3,4,T)
> B=matrix(c(-17,28,11),3,1,T)
> A
[,1] [,2] [,3] [,4]
[1,] 0 1 -2 3
[2,] 5 -3 1 -2
[3,] 5 -2 -1 1
> B
[,1]
[1,] -17
[2,] 28
[3,] 11
Let's decompose your A matrix:
> asvd = svd(A)
> asvd
$d
[1] 8.007081e+00 4.459446e+00 4.022656e-16
$u
[,1] [,2] [,3]
[1,] -0.1295469 -0.8061540 0.5773503
[2,] 0.7629233 0.2908861 0.5773503
[3,] 0.6333764 -0.5152679 -0.5773503
$v
[,1] [,2] [,3]
[1,] 0.87191556 -0.2515803 -0.1764323
[2,] -0.46022634 -0.1453716 -0.4694190
[3,] 0.04853711 0.5423235 0.6394484
[4,] -0.15999723 -0.7883272 0.5827720
> adiag = diag(1/asvd$d)
> adiag
[,1] [,2] [,3]
[1,] 0.1248895 0.0000000 0.00000e+00
[2,] 0.0000000 0.2242431 0.00000e+00
[3,] 0.0000000 0.0000000 2.48592e+15
Here's the key: the third singular value in d is very small; conversely, the corresponding diagonal element in adiag is very large. Before solving, set it equal to zero:
> adiag[3,3] = 0
> adiag
[,1] [,2] [,3]
[1,] 0.1248895 0.0000000 0
[2,] 0.0000000 0.2242431 0
[3,] 0.0000000 0.0000000 0
Now let's compute the solution (see slide 16 in the link I gave you above):
> solution = asvd$v %*% adiag %*% t(asvd$u) %*% B
> solution
[,1]
[1,] 2.411765
[2,] -2.282353
[3,] 2.152941
[4,] -3.470588
Now that we have a solution, let's substitute it back to see if it gives us the same B:
> check = A %*% solution
> check
[,1]
[1,] -17
[2,] 28
[3,] 11
That's the B side you started with, so I think we're good.
Here's another nice SVD discussion from AMS:
http://www.ams.org/samplings/feature-column/fcarc-svd
The aim is to solve Ax = b, where A is p by q, x is q by 1, and b is p by 1, for x given A and b.
Approach 1: Generalized Inverse: Moore-Penrose
https://en.wikipedia.org/wiki/Generalized_inverse
Multiplying both sides of the equation by the transpose A', we get
A'Ax = A'b
Note that A'A is a q by q matrix now. One way to solve this is to multiply both sides of the equation by the inverse of A'A (assuming A has full column rank, so that A'A is invertible), which gives
x = (A'A)^{-1} A' b
This is the theory behind the generalized inverse: here G = (A'A)^{-1} A' is the pseudo-inverse of A. For a rank-deficient A like the one in this question, A'A is singular, but MASS::ginv() still returns the Moore-Penrose pseudo-inverse, which it computes internally via SVD.
library(MASS)
ginv(A) %*% B
# [,1]
#[1,] 2.411765
#[2,] -2.282353
#[3,] 2.152941
#[4,] -3.470588
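As a quick sanity check (the same substitution @duffymo did above), multiplying back reproduces B up to floating-point error:
x <- ginv(A) %*% B
A %*% x
#     [,1]
#[1,]  -17
#[2,]   28
#[3,]   11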
Approach 2: Generalized inverse using SVD.
@duffymo used SVD to obtain a pseudo-inverse of A.
However, the last elements of svd(A)$d may not always be as small as in this example, so one probably shouldn't use that method as is. Here's an example where none of the singular values of A is close to zero.
A <- as.matrix(iris[11:13, -5])
A
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 11 5.4 3.7 1.5 0.2
# 12 4.8 3.4 1.6 0.2
# 13 4.8 3.0 1.4 0.1
svd(A)$d
# [1] 10.7820526 0.2630862 0.1677126
One option would be to look at the singular values of cor(A):
svd(cor(A))$d
# [1] 2.904194e+00 1.095806e+00 1.876146e-16 1.155796e-17
Now it is clear that only two large singular values are present, so one can apply svd() on A to get a pseudo-inverse, as below.
svda <- svd(A)
G = svda$v[, 1:2] %*% diag(1/svda$d[1:2]) %*% t(svda$u[, 1:2])
# to get x
G %*% B
