Very, very specific question, but I'm stuck trying to unravel the code within contr.poly() in R.
I am at what I think is the last hurdle... There is this internal function, make.poly(), which is the critical part of contr.poly(). Within make.poly I see that there is a raw matrix generated, which for contr.poly(4) is:
[,1] [,2] [,3] [,4]
[1,] 1 -1.5 1 -0.3
[2,] 1 -0.5 -1 0.9
[3,] 1 0.5 -1 -0.9
[4,] 1 1.5 1 0.3
From there the function sweep() is applied with the following call and result:
Z <- sweep(raw, 2L, apply(raw, 2L, function(x) sqrt(sum(x^2))),
"/", check.margin = FALSE)
[,1] [,2] [,3] [,4]
[1,] 0.5 -0.6708204 0.5 -0.2236068
[2,] 0.5 -0.2236068 -0.5 0.6708204
[3,] 0.5 0.2236068 -0.5 -0.6708204
[4,] 0.5 0.6708204 0.5 0.2236068
I am familiar with the apply functions, and I guess sweep is similar, at least in syntax, but I don't understand what 2L is doing, and I don't know if "/" and check.margin = F are important to understand the mathematical operation being performed.
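EDIT: Quite easy... thanks to this - it just normalizes the columns: each entry of the matrix is divided ("/") by the result of function(x) applied to its own column, i.e. the column's Euclidean length. The 2L is simply the integer 2, meaning "column-wise", and check.margin = FALSE only skips a check that the dimensions agree; neither changes the arithmetic.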
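As a quick check of that reading, here is a minimal sketch that rebuilds the raw matrix exactly as printed above (typed in by hand, so it only carries the displayed precision) and compares the sweep() call with a plain column-by-column division. The names norms and Z_manual are just illustrative.

```r
# Raw matrix as printed in the question for contr.poly(4)
raw <- matrix(c( 1.0,  1.0,  1.0, 1.0,
                -1.5, -0.5,  0.5, 1.5,
                 1.0, -1.0, -1.0, 1.0,
                -0.3,  0.9, -0.9, 0.3), nrow = 4)

# Euclidean length of each column; this is what gets "swept out"
norms <- apply(raw, 2L, function(x) sqrt(sum(x^2)))

# Divide every entry by the length of its column (MARGIN = 2L means columns)
Z <- sweep(raw, 2L, norms, "/", check.margin = FALSE)

# The same operation written as an explicit loop over columns
Z_manual <- raw
for (j in seq_len(ncol(raw))) Z_manual[, j] <- raw[, j] / norms[j]

all.equal(Z, Z_manual)  # TRUE
colSums(Z^2)            # every column now has unit length
```

After the sweep each column has sum of squares 1, which is what makes the contrast columns unit-length.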
Here is an example that answers the operation in the function sweep().
I start with a matrix
> set.seed(0)
> (mat = matrix(rnorm(30, 5, 3), nrow= 10))
[,1] [,2] [,3]
[1,] 8.7888629 7.290780 4.327196
[2,] 4.0212999 2.602972 6.132187
[3,] 8.9893978 1.557029 5.400009
[4,] 8.8172880 4.131615 7.412569
[5,] 6.2439243 4.102355 4.828680
[6,] 0.3801499 3.765468 6.510824
[7,] 2.2142989 5.756670 8.257308
[8,] 4.1158387 2.324237 2.927138
[9,] 4.9826985 6.307050 1.146202
[10,] 12.2139602 1.287385 5.140179
and I want to center the data column-wise. Granted, I could use scale(mat, center = T, scale = F) and be done, but I find that this function attaches an attribute to the end of the result, like this:
attr(,"scaled:center")
[1] 6.076772 3.912556 5.208229
corresponding to the column means. Good to have, but I just wanted the matrix, clean and neat. So it turns out that this can be achieved with:
> (centered = sweep(mat, 2, apply(mat,2, function(x) mean(x)),"-"))
[,1] [,2] [,3]
[1,] 2.7120910 3.3782243 -0.88103281
[2,] -2.0554720 -1.3095838 0.92395779
[3,] 2.9126259 -2.3555271 0.19177993
[4,] 2.7405161 0.2190592 2.20433938
[5,] 0.1671524 0.1897986 -0.37954947
[6,] -5.6966220 -0.1470886 1.30259477
[7,] -3.8624730 1.8441143 3.04907894
[8,] -1.9609332 -1.5883194 -2.28109067
[9,] -1.0940734 2.3944938 -4.06202721
[10,] 6.1371883 -2.6251713 -0.06805063
So the sweep() function is understood as:
sweep(x, MARGIN, STATS, FUN), where
x is the matrix to sweep through;
MARGIN says whether to work column-wise (2) or row-wise (1);
STATS is the summary to sweep out, computed beforehand, for instance with apply() on the same matrix (or on another one), again column- or row-wise, using something like function(x) mean(x);
FUN is the operation to perform, given in quotes: "-" or "/".
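To make the roles of the arguments concrete, the centering example can be written with sweep()'s argument names spelled out. This is only a sketch: it uses colMeans() as a shortcut for the apply(..., mean) call, and it compares the result with scale() after dropping the "scaled:center" attribute.

```r
set.seed(0)
mat <- matrix(rnorm(30, 5, 3), nrow = 10)

# Subtract each column's mean from the entries of that column
centered <- sweep(x = mat, MARGIN = 2, STATS = colMeans(mat), FUN = "-")

# scale() gives the same numbers plus an attribute; remove it to compare
scaled <- scale(mat, center = TRUE, scale = FALSE)
attr(scaled, "scaled:center") <- NULL

all.equal(centered, scaled)  # TRUE
```

Dropping the attribute by hand is another way to get the "clean and neat" matrix if scale() is preferred.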
Interestingly, we could have used the means of the columns of a completely different matrix to then sweep through the original matrix - presumably a more complex operation, more in line with the reason why this function was developed.
> aux.mat = matrix(rnorm(9), nrow = 3)
> aux.mat
[,1] [,2] [,3]
[1,] -0.2793463 -0.4527840 -1.065591
[2,] 1.7579031 -0.8320433 -1.563782
[3,] 0.5607461 -1.1665705 1.156537
> (centered = sweep(mat, 2, apply(aux.mat,2, function(x) mean(x)),"-"))
[,1] [,2] [,3]
[1,] 8.1090952 8.107913 4.818142
[2,] 3.3415323 3.420105 6.623132
[3,] 8.3096302 2.374162 5.890954
[4,] 8.1375203 4.948748 7.903514
[5,] 5.5641567 4.919487 5.319625
[6,] -0.2996178 4.582600 7.001769
[7,] 1.5345313 6.573803 8.748253
[8,] 3.4360710 3.141369 3.418084
[9,] 4.3029308 7.124183 1.637147
[10,] 11.5341925 2.104517 5.631124
I'm trying to get a sample conditional on a value in one dimension using cCopula from R's copula package. I get the expected behavior when the conditioned value is in the first dimension, but not in other dimensions.
The first dimension works as expected:
cc <- claytonCopula(.5, dim = 2)
U <- cCopula(cbind(.1, runif(1000)), copula = cc, inverse = TRUE)
> head(U)
[,1] [,2]
[1,] 0.1 0.02399811
[2,] 0.1 0.51941744
[3,] 0.1 0.54457839
[4,] 0.1 0.30212338
[5,] 0.1 0.16368668
[6,] 0.1 0.43865921
The second does not. I expect .1 to be the value in the second column.
U <- cCopula(cbind(runif(1000), .1), copula = cc, inverse = TRUE)
head(U)
[,1] [,2]
[1,] 0.85596900 0.19792006
[2,] 0.05069967 0.02663780
[3,] 0.87673450 0.20056410
[4,] 0.52156481 0.14809874
[5,] 0.42508008 0.13026719
[6,] 0.04852083 0.02567477
My question is: should the order matter in cCopula? If yes, how can I work around it, and if no, what am I doing wrong?
The order does matter in cCopula. Check the Value section in the documentation for that function. Each column "contains the conditional copula function values", conditioned on the columns before it.
Not sure why you'd expect to have a column of 0.1 in your second example; even in the first example, that second column is not the random uniform values:
set.seed(1)
cc <- claytonCopula(.5, dim = 2)
Z <- cbind(.1, runif(1000))
U <- cCopula(Z, copula = cc, inverse = TRUE)
> head(Z)
[,1] [,2]
[1,] 0.1 0.2655087
[2,] 0.1 0.3721239
[3,] 0.1 0.5728534
[4,] 0.1 0.9082078
[5,] 0.1 0.2016819
[6,] 0.1 0.8983897
> head(U)
[,1] [,2]
[1,] 0.1 0.2293643
[2,] 0.1 0.3274950
[3,] 0.1 0.5232455
[4,] 0.1 0.8893238
[5,] 0.1 0.1723588
[6,] 0.1 0.8777835
I have a list of matrices and a list of vectors, and I want to divide each column of each matrix by the corresponding vector element.
For example, given
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)),c(rnorm(6),NA)), cbind(runif(7), runif(7)))
divisors <- list(c(0.5,2), c(3,4))
I'm looking for a vectorized function that produces output that looks the same as
for(i in 1:length(data)){
for(j in 1:ncol(data[[i]])){data[[i]][,j] <- data[[i]][,j] / divisors[[i]][j]}
}
i.e.
[[1]]
[,1] [,2]
[1,] NA 0.28265752
[2,] -0.46967014 -0.07132588
[3,] 0.20253439 -0.37432527
[4,] 0.65736410 0.06630705
[5,] 0.72349294 0.67202129
[6,] 0.88532648 -0.80892508
[7,] 0.08162027 NA
[[2]]
[,1] [,2]
[1,] 0.26597435 0.18120979
[2,] 0.31213250 0.16493883
[3,] 0.19250804 0.14104145
[4,] 0.21196882 0.10172964
[5,] 0.10389773 0.04979742
[6,] 0.02754329 0.15064043
[7,] 0.25771766 0.23042586
The closest I have been able to come is
Map(`/`, data, divisors)
But that divides rows (rather than columns) of the matrix by the vector. Any help appreciated.
Transpose your matrices before and after:
lapply(Map(`/`, lapply(data, t), divisors), t)
# [[1]]
# [,1] [,2]
# [1,] NA 0.28265752
# [2,] -0.46967014 -0.07132588
# [3,] 0.20253439 -0.37432527
# [4,] 0.65736410 0.06630705
# [5,] 0.72349294 0.67202129
# [6,] 0.88532648 -0.80892508
# [7,] 0.08162027 NA
#
# [[2]]
# [,1] [,2]
# [1,] 0.26597435 0.18120979
# [2,] 0.31213250 0.16493883
# [3,] 0.19250804 0.14104145
# [4,] 0.21196882 0.10172964
# [5,] 0.10389773 0.04979742
# [6,] 0.02754329 0.15064043
# [7,] 0.25771766 0.23042586
I prefer the transpose approach above, but another option is to expand your divisor vectors into matrices of the same dimensions as in data:
div_mat = Map(matrix, data = divisors, nrow = sapply(data, nrow), ncol = 2, byrow = T)
Map("/", data, div_mat)
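A third possibility, offered as a sketch rather than as the answerers' method, is to let sweep() do the column-wise division inside Map(). The names by_sweep and by_loop are illustrative; the loop is the one from the question and serves only as a reference.

```r
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)), c(rnorm(6), NA)),
             cbind(runif(7), runif(7)))
divisors <- list(c(0.5, 2), c(3, 4))

# For each matrix m and divisor vector d, divide column j of m by d[j]
by_sweep <- Map(function(m, d) sweep(m, 2, d, "/"), data, divisors)

# Reference result from the explicit double loop
by_loop <- data
for (i in seq_along(by_loop)) {
  for (j in seq_len(ncol(by_loop[[i]]))) {
    by_loop[[i]][, j] <- by_loop[[i]][, j] / divisors[[i]][j]
  }
}

all.equal(by_sweep, by_loop)  # TRUE
```

This avoids the double transpose and does not need the number of columns to be hard-coded.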
So basically I want to split a randomly generated matrix into 2 matrices, 1 for training and 1 for testing.
a <- s[sample(nrow(s),size=3,replace=FALSE),]
b <- s[-a,]
> s
[,1] [,2]
[1,] 0.69779187 -0.75869384
[2,] -0.46857477 -0.33813598
[3,] 0.53903809 -0.95950598
[4,] -0.33312675 -0.49951164
[5,] 0.88500834 0.08256923
[6,] 0.63664652 0.87420720
[7,] 0.61614134 0.77893294
[8,] 0.36956134 0.07586245
[9,] -0.03678593 -0.23743987
[10,] -0.27057064 -0.86067063
> a
[,1] [,2]
[1,] 0.8850083 0.08256923
[2,] 0.6366465 0.87420720
[3,] -0.2705706 -0.86067063
> b
[,1] [,2]
The idea here is to generate a 10*2 matrix and randomly pick 3 of its rows as training data, then output the training matrix and use the remaining rows of the matrix as the testing matrix.
Does anyone have suggestions on how to delete a from s?
The issue is that you're trying to index s with a matrix a, rather than the randomly selected indices. Modifying your code to the following should do the trick:
i <- sample(nrow(s),size=3,replace=FALSE)
a <- s[i,]
b <- s[-i,] # Note the indexing with i, rather than a
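Here is a self-contained sketch of the same split, with a made-up s (the seed and the names train and test are assumptions for illustration). My understanding of why s[-a,] came back empty is that the fractional values in a are truncated to index 0, which selects nothing. drop = FALSE is added so that the pieces stay matrices even if only one row is selected.

```r
set.seed(42)
s <- matrix(rnorm(20), ncol = 2)   # stand-in for the 10 x 2 matrix

# Sample row numbers once and reuse them for both pieces
i <- sample(nrow(s), size = 3, replace = FALSE)

train <- s[i, , drop = FALSE]      # the 3 sampled rows
test  <- s[-i, , drop = FALSE]     # the remaining 7 rows

dim(train)  # 3 2
dim(test)   # 7 2
```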
I have a matrix with 2 columns, and I'd like to turn it into a matrix with specified dimensions.
> t <- matrix(rnorm(20), ncol=2, nrow=10)
[,1] [,2]
[1,] 1.4938530 1.2493088
[2,] -0.8079445 1.8715868
[3,] 0.5775695 -0.9277420
[4,] 0.4415969 2.6357908
[5,] 0.3209226 -1.1306049
[6,] 0.5109251 -0.8661100
[7,] 1.9495571 0.2092941
[8,] 0.7816373 1.1517466
[9,] 0.0300595 -0.1351532
[10,] 0.7550894 0.7778869
What I'd like to do is something like:
> tt <- matrix(t, ncol=4, nrow=5)
[,1] [,2] [,3] [,4]
[1,] 1.4938530 1.2493088 -0.8079445 1.8715868
[2,] 0.5775695 -0.9277420 0.4415969 2.6357908
[3,] etc.
I tried to do things with modulo but my head hurts too much for me to try even one more minute.
You can transpose your first matrix, so that data is stored in the order you want, and then fill the second matrix by row:
tt <- matrix(t(t), ncol=4, nrow=5, byrow = T)
t
# [,1] [,2]
# [1,] -1.4162465950 0.01532476
# [2,] -0.2366332875 -0.04024386
# [3,] 0.5146631983 -0.34720239
# [4,] 1.9243922633 -0.24016160
# [5,] 1.6161165230 0.63187438
# [6,] -0.3558181508 -0.73199138
# [7,] 0.7459405376 0.01934826
# [8,] -1.0428581093 -2.04422042
# [9,] 0.0003166344 0.98973993
#[10,] 0.6390745275 -0.65584930
tt
# [,1] [,2] [,3] [,4]
# [1,] -1.4162465950 0.01532476 -0.2366333 -0.04024386
# [2,] 0.5146631983 -0.34720239 1.9243923 -0.24016160
# [3,] 1.6161165230 0.63187438 -0.3558182 -0.73199138
# [4,] 0.7459405376 0.01934826 -1.0428581 -2.04422042
# [5,] 0.0003166344 0.98973993 0.6390745 -0.65584930
When you work with a matrix in R, you can think of it as a vector with the data stored column by column. Extracting data by row is therefore not as straightforward as extracting by column, which follows the storage order. After transposing the first matrix, the data is stored in the order in which you want to read it, so filling the second matrix by row becomes straightforward.
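A small sketch with integers makes the storage order easier to see than random numbers do. The matrix m below is made up for illustration and stands in for t.

```r
# 10 x 2 matrix: column 1 holds 1..10, column 2 holds 11..20
m <- matrix(1:20, ncol = 2, nrow = 10)

# The underlying vector runs down the columns
as.vector(m)     # 1 2 3 ... 20

# After transposing, the underlying vector runs along the rows of m
as.vector(t(m))  # 1 11 2 12 3 13 ...

# Refill by row: each row of mm holds two consecutive rows of m
mm <- matrix(t(m), ncol = 4, nrow = 5, byrow = TRUE)
mm[1, ]          # 1 11 2 12
mm[2, ]          # 3 13 4 14
```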
I have a data.frame with several columns some of which contain NAs. I want to run the following function suggested by Farnsworth over every single column:
hpfilter = function(x,lambda=1600){
eye <- diag(length(x))
result <- solve(eye+lambda*crossprod(diff(eye,lag=1,d=2)),x)
return(result)
}
I do so by:
test <- as.data.frame(sapply(vectorOfColumnNames,function(X) hpfilter(mydf[,X])))
which works fine as long as none of the columns contain NAs. If I add an na.omit to the function, it continues to work as long as every column has the same number of NAs.
But how can I handle every column truly on its own and end up with a data.frame at the end (that contains NAs where the input had NAs) ?
EDIT: I wonder whether there is a general solution to the problem of ending up with vectors of different length when running a function over apply. Maybe something similar to what is possible with data.table indexing.
It is not completely clear to me what you want, but I'll give it a try.
Let's create some example data. Note that I use a matrix and not a data.frame. Explicitly iterating over the column names is then not needed, which greatly simplifies the code.
m = matrix(runif(100), 10, 10)
apply(m, 2, hpfilter)
And introduce some NA values:
m[sample(1:10, 2), sample(1:10, 2)] <- NA
apply(m, 2, hpfilter)
A tweak to the hpfilter function yields the result, I believe, you are looking for:
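hpfilter = function(x, lambda = 1600, na.omit = TRUE) {
  # Locate the NAs up front so the reinsertion step never meets an undefined variable
  na_values = is.na(x)
  if (na.omit && any(na_values)) x = x[!na_values]
  eye <- diag(length(x))
  result <- solve(eye + lambda * crossprod(diff(eye, lag = 1, differences = 2)), x)
  if (na.omit) {
    # reinsert NA values at their original positions (which() returns them in ascending order)
    for (idx in which(na_values)) result = append(result, NA, idx - 1)
  }
  return(result)
}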
Essentially, the NAs are torn out of the dataset. The Hodrick-Prescott filter is then computed from the values surrounding each NA, e.g. the next or previous hour. Afterwards the NAs are reintroduced at their original positions. You need to think carefully about whether this is the way you want to deal with NAs. If there is a large number of consecutive NAs, you start applying the filter to pieces of the time series which are far apart.
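For the data.frame case in the question, a variant of the same idea can be sketched as follows. The function name hp_trend_na and the example data are hypothetical. Instead of append(), it pre-fills the output with NA and writes the filtered values back into the non-missing positions, and lapply() handles each column on its own, so columns may have different numbers of NAs.

```r
# Hypothetical NA-aware wrapper around the filter from the question.
# Assumes each column has at least 3 non-missing values.
hp_trend_na <- function(x, lambda = 1600) {
  keep <- !is.na(x)
  out <- rep(NA_real_, length(x))            # NAs stay where the input had them
  eye <- diag(sum(keep))
  D <- diff(eye, lag = 1, differences = 2)   # second-difference matrix
  out[keep] <- solve(eye + lambda * crossprod(D), x[keep])
  out
}

set.seed(1)
mydf <- data.frame(a = runif(10), b = runif(10), c = runif(10))
mydf$a[c(1, 6)] <- NA                        # two NAs in column a
mydf$c[4] <- NA                              # one NA in column c

# Every column is filtered separately; the result is again a data.frame
filtered <- as.data.frame(lapply(mydf, hp_trend_na))
```

Because every output vector has the same length as its input, the pieces always fit back into a data.frame. The caveat about filtering across gaps applies here as well.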
The output:
> m
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.3492249 0.13243768 NA 0.302102537 0.4229100 0.5922950
[2,] 0.2933371 0.20001802 0.03145775 0.429109073 0.9597172 0.9490127
[3,] 0.7040072 0.49672438 0.22093906 0.323518480 0.4842678 0.4081306
[4,] 0.9072993 0.86930200 0.52859786 0.122859661 0.1841663 0.5389729
[5,] 0.3236061 0.38602856 0.46249498 0.866068888 0.6981199 0.9766099
[6,] 0.4878379 0.31511419 NA 0.807535084 0.6563737 0.0419552
[7,] 0.3244131 0.34287848 0.31360175 0.821228400 0.5989790 0.6631735
[8,] 0.3758025 0.39728965 0.64960319 0.283663049 0.9054992 0.8160815
[9,] 0.4485784 0.06440579 0.67518605 0.815575767 0.1479089 0.6391120
[10,] 0.9061172 0.16812244 0.86293095 0.005075972 0.6736308 0.7574890
[,7] [,8] [,9] [,10]
[1,] NA 0.02125704 0.7029417 0.490146887
[2,] 0.353827474 0.40482437 0.2102700 0.351850122
[3,] 0.778491744 0.32676623 0.6709055 0.953126856
[4,] 0.825446342 0.24411303 0.4939415 0.026877439
[5,] 0.264156057 0.30620799 0.0474103 0.505411467
[6,] NA 0.63995093 0.6155766 0.736349958
[7,] 0.048948805 0.96751061 0.9697167 0.005304793
[8,] 0.733419331 0.85554984 0.7438209 0.581133546
[9,] 0.823691194 0.74550281 0.0635690 0.903188495
[10,] 0.009001798 0.74201923 0.3516963 0.904093070
> apply(m, 2, hpfilter)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.4337716 0.4101083 NA 0.4239194 0.5762643 0.6178718 NA
[2,] 0.4512989 0.3950404 0.1219334 0.4367185 0.5756097 0.6219962 0.5909609
[3,] 0.4687735 0.3797990 0.2209373 0.4494414 0.5748593 0.6261047 0.5593590
[4,] 0.4860436 0.3640885 0.3198847 0.4620073 0.5741572 0.6303856 0.5276089
[5,] 0.5031048 0.3476868 0.4187190 0.4742566 0.5735911 0.6348910 0.4956993
[6,] 0.5202157 0.3306871 NA 0.4858177 0.5730049 0.6396161 NA
[7,] 0.5375230 0.3132068 0.5175141 0.4965640 0.5723201 0.6447694 0.4638051
[8,] 0.5551529 0.2953536 0.6163712 0.5065697 0.5715107 0.6501860 0.4319566
[9,] 0.5730986 0.2772537 0.7152643 0.5161124 0.5705671 0.6557125 0.3999246
[10,] 0.5912411 0.2590969 0.8141878 0.5253298 0.5696884 0.6612990 0.3676684
[,8] [,9] [,10]
[1,] 0.1423571 0.5362741 0.3871990
[2,] 0.2276829 0.5253623 0.4217619
[3,] 0.3129329 0.5145546 0.4563892
[4,] 0.3981423 0.5037583 0.4911015
[5,] 0.4833547 0.4929783 0.5262298
[6,] 0.5685175 0.4822135 0.5618152
[7,] 0.6534674 0.4711843 0.5978857
[8,] 0.7380857 0.4596942 0.6345782
[9,] 0.8224501 0.4478587 0.6716594
[10,] 0.9067115 0.4359704 0.7088627