PCA in R with princomp() and using svd() [duplicate]
Closed 11 years ago.
Possible Duplicate:
Comparing svd and princomp in R
How can I perform PCA in R using two methods: princomp() and an SVD of the correlation matrix?
I have a data set like:
438,498,3625,3645,5000,2918,5000,2351,2332,2643,1698,1687,1698,1717,1744,593,502,493,504,445,431,444,440,429,10
438,498,3625,3648,5000,2918,5000,2637,2332,2649,1695,1687,1695,1720,1744,592,502,493,504,449,431,444,443,429,10
438,498,3625,3629,5000,2918,5000,2637,2334,2643,1696,1687,1695,1717,1744,593,502,493,504,449,431,444,446,429,10
437,501,3625,3626,5000,2918,5000,2353,2334,2642,1730,1687,1695,1717,1744,593,502,493,504,449,431,444,444,429,10
438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10
439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10
440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10
444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10
451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20
458,5022,3640,3644,5000,2922,5000,2346,2321,2628,1688,1666,1674,1696,744,590,496,490,498,462,444,458,461,449,20
465,525,3646,3670,5000,2923,5000,2611,2315,2631,1674,1658,1666,1688,735,593,495,488,497,467,449,462,469,457,20
473,533,3652,3676,5000,2925,5000,2607,2310,2623,1669,1651,1659,1684,729,578,496,487,498,469,454,467,476,465,20
481,544,3658,3678,5000,2926,5000,2606,2303,2619,1668,1643,1651,1275,723,581,495,486,497,477,459,472,484,472,20
484,544,3661,3665,5000,2928,5000,2321,2304,5022,1647,1639,1646,1270,757,623,493,484,495,480,461,474,485,476,20
484,532,3669,3662,2945,2926,5000,2326,2306,2620,1648,1639,1646,1270,760,533,493,483,494,507,461,473,486,476,20
482,520,3685,3664,2952,2927,5000,2981,2307,2329,1650,1640,1644,1268,757,533,492,482,492,513,459,474,485,474,20
481,522,3682,3661,2955,2927,2957,2984,1700,2622,1651,1641,1645,1272,761,530,492,482,492,513,462,486,483,473,20
480,525,3694,3664,2948,2926,2950,2995,1697,2619,1651,1642,1646,1269,762,530,493,482,492,516,462,486,483,473,20
481,515,5018,3664,2956,2927,2947,2993,1697,2622,1651,1641,1645,1269,765,592,489,482,495,531,462,499,483,473,20
479,5000,3696,3661,2953,2927,2944,2993,1702,2622,1649,1642,1645,1269,812,588,489,481,491,510,462,481,483,473,20
480,506,5019,3665,2941,2929,2945,2981,1700,2616,1652,1642,1645,1271,814,643,491,480,493,524,461,469,484,473,20
479,5000,5019,3661,2943,2930,2942,2996,1698,2312,1653,1642,1644,1274,811,617,491,479,491,575,461,465,484,473,20
479,5000,5020,3662,2945,2931,2942,2997,1700,2313,1654,1642,1644,1270,908,616,490,478,489,503,460,460,478,473,10
481,508,5021,3660,2954,2936,2946,2966,1705,2313,1654,1643,1643,1270,1689,678,493,477,483,497,467,459,476,473,10
486,510,522,3662,2958,2938,2939,2627,1707,2314,1659,1643,1639,1665,1702,696,516,476,477,547,465,457,470,474,10
479,521,520,3663,2954,2938,2941,2957,1712,2314,1660,1643,1638,1660,1758,688,534,475,475,489,461,456,465,474,10
480,554,521,3664,2954,2938,2941,2632,1715,2313,1660,1643,1637,1656,1761,687,553,475,474,558,462,453,465,476,10
481,511,5023,3665,2954,2937,2941,2627,1707,2312,1660,1641,1636,1655,1756,687,545,475,475,504,463,458,470,477,10
482,528,524,3665,2953,2937,2940,2629,1706,2312,1657,1640,1635,1654,1756,566,549,475,476,505,464,459,468,477,10
So I am doing this:
x <- read.csv("C:\\data_25_1000.txt",header=F,row.names=NULL)
p1 <- princomp(x, cor = TRUE) ## using correlation matrix
p1
Call:
princomp(x = x, cor = TRUE)
Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16
1.9800328 1.8321498 1.4147367 1.3045541 1.2016116 1.1708212 1.1424120 1.0134829 1.0045317 0.9078734 0.8442308 0.8093044 0.7977656 0.7661921 0.7370972 0.7075442
Comp.17 Comp.18 Comp.19 Comp.20 Comp.21 Comp.22 Comp.23 Comp.24 Comp.25
0.7011462 0.6779179 0.6671614 0.6407627 0.6077336 0.5767217 0.5659030 0.5526520 0.5191375
25 variables and 1000 observations.
For the second method suppose I have the correlation matrix of "C:\data_25_1000.txt"
which is:
1.0 0.3045 0.1448 -0.0714 -0.038 -0.0838 -0.1433 -0.1071 -0.1988 -0.1076 -0.0313 -0.157 -0.1032 -0.137 -0.0802 0.1244 0.0701 0.0457 -0.0634 0.0401 0.1643 0.3056 0.3956 0.4533 0.1557
0.3045 0.9999 0.3197 0.1328 0.093 -0.0846 -0.132 0.0046 -0.004 -0.0197 -0.1469 -0.1143 -0.2016 -0.1 -0.0316 0.0044 -0.0589 -0.0589 0.0277 0.0314 0.078 0.0104 0.0692 0.1858 0.0217
0.1448 0.3197 1 0.3487 0.2811 0.0786 -0.1421 -0.1326 -0.2056 -0.1109 0.0385 -0.1993 -0.1975 -0.1858 -0.1546 -0.0297 -0.0629 -0.0997 -0.0624 -0.0583 0.0316 0.0594 0.0941 0.0813 -0.1211
-0.0714 0.1328 0.3487 1 0.6033 0.2866 -0.246 -0.1201 -0.1975 -0.0929 -0.1071 -0.212 -0.3018 -0.3432 -0.2562 0.0277 -0.1363 -0.2218 -0.1443 -0.0322 -0.012 0.1741 -0.0725 -0.0528 -0.0937
-0.038 0.093 0.2811 0.6033 1 0.4613 0.016 0.0655 -0.1094 0.0026 -0.1152 -0.1692 -0.2047 -0.2508 -0.319 -0.0528 -0.1839 -0.2758 -0.2657 -0.1136 -0.0699 0.1433 -0.0136 -0.0409 -0.1538
-0.0838 -0.0846 0.0786 0.2866 0.4613 0.9999 0.2615 0.2449 0.1471 0.0042 -0.1496 -0.2025 -0.1669 -0.142 -0.1746 -0.1984 -0.2197 -0.2631 -0.2675 -0.1999 -0.1315 0.0469 0.0003 -0.1113 -0.1217
-0.1433 -0.132 -0.1421 -0.246 0.016 0.2615 1 0.3979 0.3108 0.1622 -0.0539 0.0231 0.1801 0.2129 0.1331 -0.1325 -0.0669 -0.0922 -0.1236 -0.1463 -0.1452 -0.2422 -0.0768 -0.1457 0.036
-0.1071 0.0046 -0.1326 -0.1201 0.0655 0.2449 0.3979 1 0.4244 0.3821 0.119 -0.0666 0.0163 0.0963 -0.0078 -0.1202 -0.204 -0.2257 -0.2569 -0.2334 -0.234 -0.2004 -0.138 -0.0735 -0.1442
-0.1988 -0.004 -0.2056 -0.1975 -0.1094 0.1471 0.3108 0.4244 0.9999 0.5459 0.0498 -0.052 0.0987 0.186 0.2576 -0.052 -0.1921 -0.2222 -0.1792 -0.0154 -0.058 -0.1868 -0.2232 -0.3118 0.0186
-0.1076 -0.0197 -0.1109 -0.0929 0.0026 0.0042 0.1622 0.3821 0.5459 0.9999 0.2416 0.0183 0.063 0.0252 0.186 0.0519 -0.1943 -0.2241 -0.2635 -0.0498 -0.0799 -0.0553 -0.1567 -0.2281 -0.0263
-0.0313 -0.1469 0.0385 -0.1071 -0.1152 -0.1496 -0.0539 0.119 0.0498 0.2416 1 0.2601 0.1625 -0.0091 -0.0633 0.0355 0.0397 -0.0288 -0.0768 -0.2144 -0.2581 0.1062 0.0469 -0.0608 -0.0578
-0.157 -0.1143 -0.1993 -0.212 -0.1692 -0.2025 0.0231 -0.0666 -0.052 0.0183 0.2601 0.9999 0.3685 0.3059 0.1269 -0.0302 0.1417 0.1678 0.2219 -0.0392 -0.2391 -0.2504 -0.2743 -0.1827 -0.0496
-0.1032 -0.2016 -0.1975 -0.3018 -0.2047 -0.1669 0.1801 0.0163 0.0987 0.063 0.1625 0.3685 1 0.6136 0.2301 -0.1158 0.0366 0.0965 0.1334 -0.0449 -0.1923 -0.2321 -0.1848 -0.1109 0.1007
-0.137 -0.1 -0.1858 -0.3432 -0.2508 -0.142 0.2129 0.0963 0.186 0.0252 -0.0091 0.3059 0.6136 1 0.4078 -0.0615 0.0607 0.1223 0.1379 0.0072 -0.1377 -0.3633 -0.2905 -0.1867 0.0277
-0.0802 -0.0316 -0.1546 -0.2562 -0.319 -0.1746 0.1331 -0.0078 0.2576 0.186 -0.0633 0.1269 0.2301 0.4078 1 0.0521 -0.0345 0.0444 0.0778 0.0925 0.0596 -0.2551 -0.1499 -0.2211 0.244
0.1244 0.0044 -0.0297 0.0277 -0.0528 -0.1984 -0.1325 -0.1202 -0.052 0.0519 0.0355 -0.0302 -0.1158 -0.0615 0.0521 1 0.295 0.2421 -0.06 0.0921 0.243 0.0953 0.0886 0.0518 -0.0032
0.0701 -0.0589 -0.0629 -0.1363 -0.1839 -0.2197 -0.0669 -0.204 -0.1921 -0.1943 0.0397 0.1417 0.0366 0.0607 -0.0345 0.295 0.9999 0.4832 0.2772 0.0012 0.1198 0.0411 0.1213 0.1409 0.0368
0.0457 -0.0589 -0.0997 -0.2218 -0.2758 -0.2631 -0.0922 -0.2257 -0.2222 -0.2241 -0.0288 0.1678 0.0965 0.1223 0.0444 0.2421 0.4832 1 0.2632 0.0576 0.0965 -0.0043 0.0818 0.102 0.0915
-0.0634 0.0277 -0.0624 -0.1443 -0.2657 -0.2675 -0.1236 -0.2569 -0.1792 -0.2635 -0.0768 0.2219 0.1334 0.1379 0.0778 -0.06 0.2772 0.2632 1 0.2036 -0.0452 -0.142 -0.0696 -0.0367 0.3039
0.0401 0.0314 -0.0583 -0.0322 -0.1136 -0.1999 -0.1463 -0.2334 -0.0154 -0.0498 -0.2144 -0.0392 -0.0449 0.0072 0.0925 0.0921 0.0012 0.0576 0.2036 0.9999 0.2198 0.1268 0.0294 0.0261 0.3231
0.1643 0.078 0.0316 -0.012 -0.0699 -0.1315 -0.1452 -0.234 -0.058 -0.0799 -0.2581 -0.2391 -0.1923 -0.1377 0.0596 0.243 0.1198 0.0965 -0.0452 0.2198 1 0.2667 0.2833 0.2467 0.0288
0.3056 0.0104 0.0594 0.1741 0.1433 0.0469 -0.2422 -0.2004 -0.1868 -0.0553 0.1062 -0.2504 -0.2321 -0.3633 -0.2551 0.0953 0.0411 -0.0043 -0.142 0.1268 0.2667 1 0.4872 0.3134 0.1663
0.3956 0.0692 0.0941 -0.0725 -0.0136 0.0003 -0.0768 -0.138 -0.2232 -0.1567 0.0469 -0.2743 -0.1848 -0.2905 -0.1499 0.0886 0.1213 0.0818 -0.0696 0.0294 0.2833 0.4872 0.9999 0.4208 0.1317
0.4533 0.1858 0.0813 -0.0528 -0.0409 -0.1113 -0.1457 -0.0735 -0.3118 -0.2281 -0.0608 -0.1827 -0.1109 -0.1867 -0.2211 0.0518 0.1409 0.102 -0.0367 0.0261 0.2467 0.3134 0.4208 1 0.0592
0.1557 0.0217 -0.1211 -0.0937 -0.1538 -0.1217 0.036 -0.1442 0.0186 -0.0263 -0.0578 -0.0496 0.1007 0.0277 0.244 -0.0032 0.0368 0.0915 0.3039 0.3231 0.0288 0.1663 0.1317 0.0592 0.9999
I have also computed svd of this correlation matrix and got:
> s = svd(Correlation_25_1000)
$d
[1] 3.9205298 3.3567729 2.0014799 1.7018614 1.4438704 1.3708223 1.3051053 1.0271475 1.0090840 0.8242341 0.7127256 0.6549736 0.6364299 0.5870503 0.5433123 0.5006188 0.4916060
[18] 0.4595726 0.4451043 0.4105769 0.3693401 0.3326079 0.3202462 0.3054243 0.2695037
$u
matrix
$v
matrix
My question is: how can I use $d, $u and $v to get the principal components?
Could I use prcomp()? If so, how?
Try this one
princomp
princomp(USArrests, cor = TRUE)$loadings
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4
Murder -0.536 0.418 -0.341 0.649
Assault -0.583 0.188 -0.268 -0.743
UrbanPop -0.278 -0.873 -0.378 0.134
Rape -0.543 -0.167 0.818
svd
svd(cor(USArrests))$u
[,1] [,2] [,3] [,4]
[1,] -0.5358995 0.4181809 -0.3412327 0.64922780
[2,] -0.5831836 0.1879856 -0.2681484 -0.74340748
[3,] -0.2781909 -0.8728062 -0.3780158 0.13387773
[4,] -0.5434321 -0.1673186 0.8177779 0.08902432
eigen
eigen(cor(USArrests))$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5358995 0.4181809 -0.3412327 0.64922780
[2,] -0.5831836 0.1879856 -0.2681484 -0.74340748
[3,] -0.2781909 -0.8728062 -0.3780158 0.13387773
[4,] -0.5434321 -0.1673186 0.8177779 0.08902432
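For a correlation matrix, princomp, svd, and eigen all produce the same loadings (up to the sign of each column). The singular values in $d are the eigenvalues of the correlation matrix, so their square roots are the standard deviations that princomp reports; in the question, sqrt(3.9205298) = 1.9800328, which matches Comp.1.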
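To connect this back to the question, here is a minimal sketch (not part of the original answer) of recovering the standard deviations and the component scores from svd() of the correlation matrix, using the built-in USArrests data. It assumes the raw data are available, since scores cannot be computed from the correlation matrix alone. The rescaling by sqrt(n / (n - 1)) is needed because princomp() standardizes with divisor n while scale() uses n - 1. Component signs are arbitrary, so the comparison is done on absolute values.

```r
# USArrests ships with base R (datasets package)
X <- as.matrix(USArrests)
n <- nrow(X)

s <- svd(cor(X))
p <- princomp(X, cor = TRUE)

# Singular values of a correlation matrix are its eigenvalues
# (the component variances), so the standard deviations are sqrt(d)
sdev_svd <- sqrt(s$d)

# Scores: standardize the columns, then project onto the columns of u.
# scale() divides by the n - 1 standard deviation, princomp() by the
# n version, hence the rescaling factor.
Z <- scale(X) * sqrt(n / (n - 1))
scores_svd <- Z %*% s$u

# prcomp() with scale. = TRUE gives the same standard deviations
pr <- prcomp(X, scale. = TRUE)

all.equal(sdev_svd, unname(p$sdev))
all.equal(sdev_svd, pr$sdev)
all.equal(abs(scores_svd), abs(unname(p$scores)), check.attributes = FALSE)
```

The same steps should apply to the 25-variable data in the question: sqrt(s$d) gives the standard deviations, s$u (equal to s$v here, because the matrix is symmetric) gives the loadings, and the standardized data times s$u gives the scores.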
Related
Compute euclidean distance for PCA in R
I did a PCA (using eigenvalue and smartPCA) and now I am trying to compute the Euclidean distance to one population. For example, in this dataset, I would want to compute in R the Euclidean distance of all populations to population 6.
   Individual PCA1   PCA2    PCA3    PCA4    PCA5    PCA6    PCA7   PCA8    PCA9    PCA10
1: Pop1       0.0346 0.0095  -0.0022 -0.0018 -0.0033 0.0002  0.0042 -0.0003 0.0028  0.0268
2: Pop2       0.0370 0.0095  -0.0027 0.0015  -0.0027 -0.0024 0.0038 0.0012  0.0053  0.0210
3: Pop3       0.0379 0.0100  -0.0030 0.0021  -0.0017 -0.0043 0.0033 0.0005  0.0036  0.0144
4: Pop4       0.0352 0.0092  -0.0031 -0.0021 -0.0029 -0.0005 0.0038 -0.0003 0.0047  0.0349
5: Pop5       0.0342 0.0089  -0.0027 -0.0013 -0.0031 -0.0008 0.0032 -0.0017 -0.0009 0.0265
6: Pop6       0.0342 -0.0524 -0.0503 -0.1028 -0.4785 -0.0244 0.0279 0.0038  -0.0264 -0.0022 -0.0265
I found a thread on how to do it in Python but can't find one for R! I tried dist() but I don't understand the output or how to compare against only one population (I have 500+ populations in total). Thanks!
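As a rough illustration of the computation being asked about, the sketch below measures the Euclidean distance from every row of a score matrix to one reference row. The matrix pcs and its values are made up for the example (they are not the data above), and Pop1 stands in for the reference population.

```r
# Hypothetical scores: rows are populations, columns are PCs (made-up numbers)
pcs <- matrix(c(0, 0,
                3, 4,
                6, 8),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("Pop1", "Pop2", "Pop3"), c("PC1", "PC2")))

ref <- pcs["Pop1", ]

# Subtract the reference row from every row, then take row-wise norms
d_to_ref <- sqrt(rowSums(sweep(pcs, 2, ref)^2))

# dist() returns all pairwise distances; converting it to a full matrix
# and picking the reference row gives the same numbers
d_full <- as.matrix(dist(pcs))["Pop1", ]
```

With 500+ populations the sweep() version avoids building the full pairwise matrix, while the dist() version is convenient when several reference populations are needed.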
Error trying to produce forecast errors in R
I am trying to modify some code that I have, which works, so that it works with a different function for estimating a model. The original code is the following, and it works with the ARIMA function:
S=round(0.75*length(ts_HHFCE_log))
h=1
error1.h <- c()
for (i in S:(length(ts_HHFCE_log)-h))
{
  mymodel.sub <- arima(ts_HHFCE_log[1:i], order = c(0,1,3),seasonal=c(0,0,0))
  predict.h <- predict(mymodel.sub,n.ahead=h)$pred[h]
  error1.h <- c(error1.h,ts_HHFCE_log[i+h]-predict.h)
}
The intuition is the following: your time series has length T. You start somewhere near the beginning of your sample, but far enough in to have enough observations to regress and obtain estimates for your alpha and betas. Let's call this point t for simplicity. Based on this, you produce a one-step-ahead forecast for time period (t+1). Your forecast error is then the difference between the actual value for (t+1) and your forecast value based on regressing on data available until t. Then you iterate: consider the data from the start to (t+1), regress, and forecast (t+2), which gives a forecast error for (t+2). You keep repeating this process until you reach (T-1) and produce a forecast for T. This provides what is known as a dynamic out-of-sample forecast error series. You do this for different models and then use a statistical test to ascertain which is the more appropriate model to use. It is a way to produce out-of-sample forecasts using only the data you already have.
I have modified the code to be the following:
S=round(0.75*length(ts.GDP))
h=1
error1.h <- c()
for (i in S:(length(ts.GDP)-h))
{
  mymodel.sub <- lm(ts.GDP[4:i] ~ ts.GDP[3:(i-1)] + ts.GDP[2:(i-2)] + ts.GDP[1:(i-3)])
  predict.h <- predict(mymodel.sub,n.ahead=h)$pred[h]
  error1.h <- c(error1.h,ts.GDP[i+h]-predict.h)
}
I'm trying to do an AR(3) model. The reason I am not using the ARIMA function is that I also want to compare these forecast errors with those of an ARDL model, and to my knowledge there is no simple function for the ARDL model (I'd have to use lm(), which is why I want to do the AR(3) model using the lm() function). The model I wish to compare the AR(3) model with is the following:
model_ts.GDP_1 <- lm(ts.GDP[4:123] ~ ts.GDP[3:122] + ts.GDP[2:121] + ts.GDP[1:120] + ts.CCI_AGG[3:122] + ts.CCI_AGG[2:121] + ts.CCI_AGG[1:120])
I am unsure how to modify the code further to get what I am after. Hopefully the intuition I explained makes clear what I am trying to do. The GDP data is the quarterly growth rate, and it is stationary. The other variable in the second model is an index I've constructed using a dynamic PCA and then first-differenced, so it too is stationary. In any case, in the second model the forecast at t is based only on lagged data of GDP and the index I constructed. Equally, given that I am simulating out-of-sample forecasts using data I have, there is no issue with actually forecasting properly. (In time series, this technique is seen as a more robust method to compare models than simply using things such as RMSE, etc.) Thanks!
The data I am using:
Date       GDP_qoq CCI_A_qoq
31/03/1988 2.956 0.540
30/06/1988 2.126 -0.743
30/09/1988 3.442 0.977
31/12/1988 3.375 -0.677
31/03/1989 2.101 0.535
30/06/1989 1.787 -0.667
30/09/1989 2.791 0.343
31/12/1989 2.233 -0.334
31/03/1990 1.961 0.520
30/06/1990 2.758 -0.763
30/09/1990 1.879 0.438
31/12/1990 0.287 -0.708
31/03/1991 1.796 -0.078
30/06/1991 1.193 -0.735
30/09/1991 0.908 0.896
31/12/1991 1.446 0.163
31/03/1992 0.870 0.361
30/06/1992 0.215 -0.587
30/09/1992 0.262 0.238
31/12/1992 1.646 -1.436
31/03/1993 2.375 0.646
30/06/1993 0.249 -0.218
30/09/1993 1.806 0.676
31/12/1993 1.218 -0.393
31/03/1994 1.501 0.346
30/06/1994 0.879 -0.501
30/09/1994 1.123 0.731
31/12/1994 2.089 0.062
31/03/1995 0.386 0.475
30/06/1995 1.238 -0.243
30/09/1995 1.836 0.263
31/12/1995 1.236 -0.125
31/03/1996 1.926 -0.228
30/06/1996 2.109 -0.013
30/09/1996 1.312 0.196
31/12/1996 0.972 -0.015
31/03/1997 1.028 -0.001
30/06/1997 1.086 -0.016
30/09/1997 2.822 0.156
31/12/1997 -0.818 -0.062
31/03/1998 1.418 0.408
30/06/1998 0.970 -0.548
30/09/1998 0.968 0.466
31/12/1998 2.826 -0.460
31/03/1999 0.599 0.228
30/06/1999 -0.651 -0.361
30/09/1999 1.289 0.579
31/12/1999 1.600 0.196
31/03/2000 2.324 0.535
30/06/2000 1.368 -0.499
30/09/2000 0.825 0.440
31/12/2000 0.378 -0.414
31/03/2001 0.868 0.478
30/06/2001 1.801 -0.521
30/09/2001 0.319 0.068
31/12/2001 0.877 0.045
31/03/2002 1.253 0.061
30/06/2002 1.247 -0.013
30/09/2002 1.513 0.625
31/12/2002 1.756 0.125
31/03/2003 1.443 -0.088
30/06/2003 0.874 -0.138
30/09/2003 1.524 0.122
31/12/2003 1.831 -0.075
31/03/2004 0.780 0.395
30/06/2004 1.665 -0.263
30/09/2004 0.390 0.543
31/12/2004 0.886 -0.348
31/03/2005 1.372 0.500
30/06/2005 2.574 -0.066
30/09/2005 0.961 0.058
31/12/2005 2.378 -0.061
31/03/2006 1.015 0.212
30/06/2006 1.008 -0.218
30/09/2006 1.105 0.593
31/12/2006 0.943 -0.144
31/03/2007 1.566 0.111
30/06/2007 1.003 -0.125
30/09/2007 1.810 0.268
31/12/2007 1.275 -0.592
31/03/2008 1.413 0.017
30/06/2008 -0.491 -0.891
30/09/2008 -0.617 -0.836
31/12/2008 -1.410 -1.092
31/03/2009 -1.593 0.182
30/06/2009 -0.106 -0.922
30/09/2009 0.788 0.351
31/12/2009 0.247 0.414
31/03/2010 1.221 -0.329
30/06/2010 1.561 -0.322
30/09/2010 0.163 0.376
31/12/2010 0.825 -0.104
31/03/2011 2.484 0.063
30/06/2011 -0.574 -0.107
30/09/2011 0.361 -0.006
31/12/2011 0.997 -0.304
31/03/2012 0.760 0.243
30/06/2012 0.143 -0.381
30/09/2012 2.547 0.315
31/12/2012 0.308 -0.046
31/03/2013 0.679 0.221
30/06/2013 0.766 -0.170
30/09/2013 1.843 0.352
31/12/2013 0.756 0.080
31/03/2014 1.380 -0.080
30/06/2014 1.501 0.162
30/09/2014 0.876 0.017
31/12/2014 0.055 -0.251
31/03/2015 0.497 0.442
30/06/2015 1.698 -0.278
30/09/2015 0.066 0.397
31/12/2015 0.470 0.076
31/03/2016 1.581 0.247
30/06/2016 0.859 -0.342
30/09/2016 0.865 -0.011
31/12/2016 1.467 0.049
31/03/2017 1.006 0.087
30/06/2017 0.437 -0.215
30/09/2017 0.527 0.098
31/12/2017 0.900 0.218
The only thing you need to understand is how to get predictions using lm; it's not necessary to add other details (without reproducible data you're only making it more difficult).
Create dummy data:
set.seed(123)
df<-data.frame(a=runif(10),b=runif(10),c=runif(10))
> print(df)
           a          b         c
1  0.2875775 0.95683335 0.8895393
2  0.7883051 0.45333416 0.6928034
3  0.4089769 0.67757064 0.6405068
4  0.8830174 0.57263340 0.9942698
5  0.9404673 0.10292468 0.6557058
6  0.0455565 0.89982497 0.7085305
7  0.5281055 0.24608773 0.5440660
8  0.8924190 0.04205953 0.5941420
9  0.5514350 0.32792072 0.2891597
10 0.4566147 0.95450365 0.1471136
Fit your model:
model<-lm(c~a+b,data=df)
Create new data:
new_df<-data.frame(a=runif(1),b=runif(1))
> print(new_df)
          a        b
1 0.9630242 0.902299
Get predictions from your new data:
prediction<- predict(model,new_df)
> print(prediction)
        1
0.8270997
In your case, the new data new_df will be your lagged data, but you have to make the appropriate changes, OR provide reproducible data as above if you want us to go through the details of your problem. Hope this helps.
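One way the advice above might be applied to the rolling AR(3) loop is sketched below. It is an illustration under assumptions, not the asker's final code: the series y is simulated (made up for the example), and the column names lag1 to lag3 are hypothetical. The key change is that the lags live in a data frame with named columns, so predict() can be given the next row as newdata.

```r
set.seed(1)
# Made-up stationary series standing in for the GDP growth data
y <- as.numeric(arima.sim(list(ar = c(0.5, -0.2, 0.1)), n = 120))

# embed() puts y_t in column 1 and its lags 1..3 in columns 2..4
lagged <- as.data.frame(embed(y, 4))
names(lagged) <- c("y", "lag1", "lag2", "lag3")

S <- round(0.75 * nrow(lagged))
errors <- numeric(0)
for (i in S:(nrow(lagged) - 1)) {
  # Fit on rows 1..i only, i.e. on data available up to that point
  fit <- lm(y ~ lag1 + lag2 + lag3, data = lagged[1:i, ])
  # One-step-ahead forecast from the lags stored in row i + 1
  pred <- predict(fit, newdata = lagged[i + 1, ])
  errors <- c(errors, lagged$y[i + 1] - unname(pred))
}
```

For the ARDL comparison, lagged columns of the second series could be added to the same data frame and to the formula; the loop itself would not need to change.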
Calculate period return from monthly returns
This may sound naive but I can't seem to find the solution. I need to calculate 1, 3 and 5-year returns and my dataset consists of monthly returns rather than prices. The dataset I'm working on is similar to managers:
data(managers)
tail(managers)
              HAM1    HAM2   HAM3    HAM4    HAM5    HAM6 EDHEC LS EQ SP500 TR US 10Y TR US 3m TR
2006-07-31 -0.0144 -0.0131 0.0102 -0.0120 -0.0164 -0.0225     -0.0031  0.00620   0.01580  0.00423
2006-08-31  0.0161 -0.0113 0.0253 -0.0183  0.0169  0.0193      0.0114  0.02380   0.02190  0.00441
2006-09-30  0.0068 -0.0231 0.0072  0.0197  0.0132 -0.0177      0.0001  0.02580   0.01140  0.00456
2006-10-31  0.0427  0.0167 0.0183  0.0518  0.0266  0.0189      0.0194  0.03260   0.00584  0.00381
2006-11-30  0.0117  0.0206 0.0269  0.0373  0.0038  0.0300      0.0200  0.01900   0.01419  0.00430
2006-12-31  0.0115 -0.0062 0.0110  0.0206  0.0317  0.0215      0.0153  0.01403  -0.01550  0.00441
I looked into Return.cumulative from the package PerformanceAnalytics, but there is no argument for specifying periods. ROC from TTR can specify the number of periods to use, but it is not based on returns. What would be the best way to do this? Thank you in advance!
Based on what you want and what you know about ROC from TTR, I will only provide the data preparation part.
#Sample Data
df=read.table(text='
Date    HAM1    HAM2    HAM3    HAM4    HAM5    HAM6
2006-07-31 -0.0144 -0.0131 0.0102 -0.0120 -0.0164 -0.0225
2006-08-31 0.0161 -0.0113 0.0253 -0.0183 0.0169 0.0193
2006-09-30 0.0068 -0.0231 0.0072 0.0197 0.0132 -0.0177
2006-10-31 0.0427 0.0167 0.0183 0.0518 0.0266 0.0189
2006-11-30 0.0117 0.0206 0.0269 0.0373 0.0038 0.0300
2006-12-31 0.0115 -0.0062 0.0110 0.0206 0.0317 0.0215
',header=T,stringsAsFactors=F)
#Convert returns to prices, assuming every stock starts at an initial value of 100
for (i in 2:dim(df)[2]){
  B=Reduce(function(x,y) {x * (1+y)}, df[,i], init=100, accumulate=T)# if it is log Return: {x * exp(y)}
  if (i==2){
    Price= B
  }else{
    Price=cbind(Price,B)
  }
}
Price=data.frame(cbind(df$Date,Price[-1,]))
names(Price)=names(df)
> Price
  Date HAM1 HAM2 HAM3 HAM4 HAM5 HAM6
1 2006-07-31 98.56 98.69 101.02 98.8 98.36 97.75
2 2006-08-31 100.146816 97.574803 103.575806 96.99196 100.022284 99.636575
3 2006-09-30 100.8278143488 95.3208250507 104.3215518032 98.902701612 101.3425781488 97.8730076225
4 2006-10-31 105.133162021494 96.9126828290467 106.230636201199 104.025861555502 104.038290727558 99.7228074665652
5 2006-11-30 106.363220017145 98.909084095325 109.088240315011 107.906026191522 104.433636232323 102.714491690562
6 2006-12-31 107.586397047342 98.295847773934 110.288210958476 110.128890331067 107.744182500887 104.922853261909
Then you can use the usual package functions to annualize the return (or write your own).
Using @Wen's data:
df=read.table(text='
Date    HAM1    HAM2    HAM3    HAM4    HAM5    HAM6
2006-07-31 -0.0144 -0.0131 0.0102 -0.0120 -0.0164 -0.0225
2006-08-31 0.0161 -0.0113 0.0253 -0.0183 0.0169 0.0193
2006-09-30 0.0068 -0.0231 0.0072 0.0197 0.0132 -0.0177
2006-10-31 0.0427 0.0167 0.0183 0.0518 0.0266 0.0189
2006-11-30 0.0117 0.0206 0.0269 0.0373 0.0038 0.0300
2006-12-31 0.0115 -0.0062 0.0110 0.0206 0.0317 0.0215
',header=T,stringsAsFactors=F)
You can use the rollapply function from the zoo package.
library(zoo)
roll <- function(x, n, stat) {
  # Too few observations for even one window: return all NA
  if (length(x) <= n) return(rep(NA_real_, length(x)))
  x <- x + 1
  # list(-seq(n)) selects the n observations preceding the current one
  rollapply(x, list(-seq(n)), stat, fill = NA)
}
df2 <- transform(df, four_month_return_HAM1 = ave(HAM1, FUN = function(x) roll(x, 4, prod)-1))
Change 4 to the period you want to calculate the cumulative return over. So, for one year, this would be 12. This will then give you the 12-month rolling returns.
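The compounding behind both answers can also be written in base R. The sketch below is an illustration, not part of either answer: period_return is a hypothetical helper name, and its window is assumed to end at the current month and include it.

```r
# First six HAM1 monthly returns from the sample data above
r <- c(-0.0144, 0.0161, 0.0068, 0.0427, 0.0117, 0.0115)

# Trailing n-period compounded return: prod(1 + r) - 1 over the last n values,
# with NA wherever fewer than n observations are available
period_return <- function(r, n) {
  out <- rep(NA_real_, length(r))
  if (length(r) < n) return(out)
  for (t in n:length(r)) {
    out[t] <- prod(1 + r[(t - n + 1):t]) - 1
  }
  out
}

three_month <- period_return(r, 3)
six_month <- period_return(r, 6)
```

For monthly data, n = 12, 36 and 60 would give the 1, 3 and 5-year returns. The six-month value for HAM1 agrees with the price index in the first answer, which goes from 100 to 107.586397047342.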
How to color the branches and tick labels in the heatmap.2?
I have made a heat map using the function heatmap.2 of gplots in R, but I have no idea how to color the branches and tick labels by group (e.g. if I cut the tree into four groups, as in my second figure). I have checked that it is possible to color the dendrogram alone using the dendextend package. There is also a heatmap with a colored dendrogram in "selecting number of leaf nodes of dendrogram in heatmap.2 in R", but I can't implement it in my example. Can somebody help me with this issue?
Update
This is my heat map: and I would like to have one like this, with branches and tick labels colored according to their four groups (this figure was edited with Illustrator to explain this question):
Here is the data and code that I have used:
Data
YEAR varA varB varC varD varE varF var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 var16 var17 2005 1.175290887 1.535846033 1.531113178 -1.10297075 0.0284 26 -25.5470 -24.2101 24.7900 3.3345 0.0468 0.5058 0.0087 1.7378 0.0703 2.7070 0.0183 0.0340 0.0177 0.0176 0.0240 0.0015 0.0292 2004 0.834733204 0.64917365 -0.403174087 0.116169692 0.033 50 -24.4170 -22.2574 27.3400 3.4106 0.1151 0.5822 0.0085 1.8133 0.0762 3.2604 0.0114 0.0178 0.0086 0.0086 0.0824 0.0018 0.0308 2003 1.297607635 1.224946337 0.4486378 0.227557968 0.0544 181 -24.5080 -23.2790 27.4200 3.5092 0.1052 0.5239 0.0038 0.9815 0.0681 2.7465 0.0074 0.0099 0.0025 0.0025 0.0142 0.0015 0.0298 2002 1.043780072 0.650695815 -0.337133061 0.016766696 0.0374 227 -22.6110 -21.7828 30.0200 3.6270 0.1119 0.5753 0.0106 0.7916 0.0805 3.0434 0.0069 0.0086 0.0109 0.0108 0.0313 0.0017 0.0288 2001 0.781864124 0.534881678 -0.740527443 0.171745261 0.0074 20 -23.9170 -23.2327 3.8007 0.1243 0.6216 0.0553 1.2333 0.3414 2.9606 0.0074 0.0384 0.0079 0.0082 0.0570 0.0018 0.0360 2000 0.742528229 0.667207042 -0.614740091 0.189253192 0.0257 88 -22.6420 -21.4066 30.8900 3.1693 0.0287 0.6244 0.0070 1.0256 0.1336 2.7033 0.0063 0.0102 0.0185 0.0186 0.0248 0.0015 
0.0278 1999 0.701222612 1.059869033 0.772334853 0.290190993 0.0476 312 -22.4730 -21.8328 26.6600 3.0578 0.0719 0.6363 0.0032 0.7183 0.0649 2.5445 0.0066 0.0070 0.0063 0.0063 0.0095 0.0016 0.0252 1998 0.904634938 1.16455833 0.646654191 0.086214161 0.0546 332 -23.2070 -22.4399 26.1400 3.2344 0.0656 0.7096 0.0046 0.6709 0.0718 2.5656 0.0072 0.0166 0.0132 0.0131 0.0144 0.0016 0.0275 1997 0.965775183 1.362520795 0.653268963 0.007038426 0.0791 509 -23.4830 -22.4253 26.0400 3.0278 0.0438 0.7575 0.0081 0.5002 0.0657 2.5755 0.0077 0.0072 0.0083 0.0083 0.0108 0.0017 0.0252 1996 0.956113049 1.439534042 0.618648101 -0.334351083 0.0411 245 -23.4290 -23.0417 27.3000 2.9331 0.0363 0.9229 0.0050 0.4819 0.1306 2.7239 0.0072 0.0166 0.0027 0.0026 0.0174 0.0018 0.0240 1995 1.786742729 1.732091021 2.654237394 0.190377371 0.0842 646 -22.7600 -22.0212 24.2100 3.1562 0.0202 1.1728 0.0072 0.6133 0.0772 3.1313 0.0080 0.0051 0.0035 0.0035 0.0055 0.0022 0.0266 1994 0.811695681 0.670904284 0.76646691 2.163378723 0.0394 203 -22.4920 -21.3677 28.6500 3.2475 0.0132 1.7476 0.0084 0.9386 0.1880 3.8856 0.0082 0.0120 0.0129 0.0129 0.0151 0.0026 0.0280 1993 0.754876913 0.302624208 -0.927234708 -0.108263802 0.017 66 -22.3880 -21.2900 32.8400 3.5853 0.0008 1.6626 0.0221 1.2307 0.4173 3.8864 0.0079 0.0379 0.0199 0.0196 0.0225 0.0028 0.0319 1992 0.723058507 0.818965047 -0.52053294 0.384656566 0.0345 155 -21.4920 -20.8724 32.2000 3.3116 0.0068 1.5673 0.0104 0.9411 0.2245 4.0228 0.0075 0.0123 0.0308 0.0306 0.0112 0.0027 0.0287 1991 1.024225427 0.71537408 0.22672288 -0.029575009 0.0297 235 -23.4850 -22.7000 27.8400 4.4024 0.0097 1.6126 0.0698 0.6344 0.2832 4.4160 0.0108 0.0127 0.0184 0.0184 0.0122 0.0030 0.0356 1990 0.873807193 1.168599747 1.317306687 -0.335682786 0.0533 172 -23.7170 -23.5029 25.8100 4.0497 0.0170 1.5207 0.0065 0.5232 0.1734 4.6765 0.0104 0.0164 0.0131 0.0130 0.0093 0.0030 0.0332 1989 0.71498833 1.065965836 0.650281646 -0.048038841 0.0663 214 -23.5000 -23.1053 26.3500 4.1139 0.0159 1.6162 
0.0096 0.5199 0.1426 4.7752 0.0106 0.0083 0.0098 0.0099 0.0076 0.0031 0.0341 1988 1.188282778 1.133076429 -0.167816244 0.448030288 0.007 64 -23.3750 -21.9900 29.3900 3.6893 0.0278 1.8392 0.0939 0.5658 1.2390 5.1103 0.0086 0.0775 0.0203 0.0202 0.0339 0.0034 0.0340 1987 0.788798159 0.276008942 -0.934596308 -0.039259431 0.012 65 -22.9540 -22.7758 28.3800 3.6375 0.0011 1.8331 0.0768 0.6187 0.6081 5.0475 0.0088 0.0554 0.0183 0.0180 0.0159 0.0038 0.0381 1986 0.757757883 1.395817348 0.455252572 -0.001274532 0.0125 47 -22.6120 -22.9011 29.7400 3.7060 0.0172 1.5279 0.0151 0.5897 0.6168 4.4917 0.0085 0.0160 0.0257 0.0256 0.0276 0.0033 0.0410 1985 1.128413419 0.321849225 -0.904189697 -0.05362552 0.0705 291 -22.7200 -21.9357 28.4100 3.5887 0.0100 1.4955 0.0022 0.3538 0.1471 4.3125 0.0091 0.0157 0.0042 0.0042 0.0041 0.0029 0.0292 1984 1.015352865 1.014625668 0.39294569 -0.267936245 0.0419 121 -23.5170 -23.1678 25.6200 4.5018 0.0353 1.8985 0.0022 0.3420 0.2620 4.9867 0.0113 0.0069 0.0058 0.0058 0.0051 0.0033 0.0356 1983 0.393985784 0.474743555 -0.368393191 -0.222845745 0.0161 49 -24.5600 -23.9514 30.5300 3.0978 0.0270 0.9467 0.0421 0.3287 0.5616 3.1256 0.0075 0.0553 0.0154 0.0155 0.0084 0.0022 0.0323 1982 0.503744901 0.524683063 -0.946225504 0.016766696 0.0118 10 -23.5970 -24.0037 30.3100 2.7288 0.0011 1.2154 0.0097 0.3022 0.8415 4.3594 0.0083 0.0254 0.0075 0.0076 0.0134 0.0029 0.0304 1981 0.872025585 1.496555573 0.658923526 -0.175816424 0.0489 343 -23.8320 -23.4716 28.1100 4.6585 0.0128 1.9205 0.0031 0.2999 0.2278 5.6588 0.0134 0.0067 0.0072 0.0071 0.0087 0.0036 0.0437 1980 2.165460373 3.419095697 3.741300435 0.250364758 0.0644 626 -24.5010 -24.0323 28.7300 3.8474 0.0122 1.4827 0.0019 0.2164 0.1859 4.3602 0.0104 0.0056 0.0050 0.0050 0.0064 0.0028 0.0337 1979 1.00201444 0.453601121 0.109577407 0.73158507 0.0281 301 -23.6070 -22.9149 27.9100 4.5765 0.0467 1.6919 0.0344 0.1940 0.3453 5.1064 0.0132 0.0162 0.0078 0.0077 0.0554 0.0032 0.0389 1978 0.829984787 0.2021646 -0.724630653 
-0.178430782 0.0000 1977 0.939170906 0.192142351 -1.029656979 0.50745842 0.0068 30 -24.3510 -22.5760 29.4900 6.1029 0.3417 2.4069 0.0938 0.2824 1.3937 6.6441 0.0136 0.0609 0.0395 0.0391 0.6074 0.0045 0.0591 1976 0.741090851 0.151474404 -0.439448642 0.359471579 0.056 396 -23.7450 -22.7680 28.3700 4.3464 0.0431 1.6901 0.0234 0.2937 0.2160 5.1366 0.0113 0.0147 0.0082 0.0081 0.0317 0.0034 0.0389 1975 1.061884929 0.396763153 -1.075320241 0.433356946 0.0299 322 -23.4320 -22.9732 25.7800 5.0301 0.1740 2.2028 0.0311 0.3131 0.4254 5.8683 0.0131 0.0160 0.0182 0.0182 0.2093 0.0038 0.0443 1974 1.052548763 0.491883924 0.28198823 -0.562241025 0.0215 267 -23.3350 -22.7075 26.4100 5.3407 0.1187 2.2436 0.0231 0.2984 0.5378 5.8795 0.0127 0.0208 0.0127 0.0128 0.0821 0.0038 0.0466 1973 0.519163031 1.120525721 0.960322396 -0.84893256 0.0129 49 -23.4350 -23.0556 31.3500 6.4341 0.1105 2.4298 0.0484 0.2783 0.9249 5.8779 0.0129 0.0428 0.0124 0.0123 0.1293 0.0038 0.0499 1972 0.703961551 1.359485416 -0.306513069 -1.150818704 0.0228 247 -23.7840 -23.3257 28.3000 6.3520 0.1096 2.6043 0.0439 0.4126 0.5335 6.3320 0.0154 0.0279 0.0061 0.0062 0.0874 0.0042 0.0593 1971 0.714252707 1.621333793 -1.065184704 0.003023451 0.0274 196 -23.2140 -22.2731 31.3800 5.1332 0.0873 1.9259 0.0872 0.3598 0.4714 4.9337 0.0112 0.0234 0.0073 0.0073 0.0688 0.0034 0.0426 1970 1.022643019 1.491401283 0.088239434 -0.973528472 0.025 206 -22.9870 -21.9506 30.6200 5.0770 0.0698 2.1145 0.1825 0.3537 0.4990 5.3274 0.0129 0.0873 0.0098 0.0098 0.0316 0.0040 0.0479 1969 2.157784838 1.796722133 0.731152565 -0.193891705 0.0547 505 -24.2820 -23.9048 26.2400 5.0183 0.0637 2.2673 0.0127 0.2893 0.2420 5.1038 0.0129 0.0244 0.0069 0.0069 0.0154 0.0037 0.0440 1968 0.913026742 1.271215847 0.196849717 -1.068149218 0.0132 112 -22.9850 -21.9397 32.2300 4.0568 0.0498 2.0576 0.0965 0.2188 0.9468 5.3597 0.0080 0.0513 0.0157 0.0154 0.0507 0.0039 0.0371 1967 0.749350643 0.439194622 -1.316546028 0.306149455 0.0209 196 -23.7020 -22.8580 30.5400 
4.5873 0.0703 1.9639 0.4981 0.2136 0.6086 5.1528 0.0100 0.0934 0.0103 0.0102 0.0235 0.0042 0.0415 1966 0.732785384 0.74795644 -0.681581292 1.265096245 0.0189 204 -23.3746 -22.7452 30.0600 4.8598 0.0542 1.8172 0.0437 0.2605 0.6557 5.2782 0.0131 0.0118 0.0081 0.0080 0.0203 0.0036 0.0418 1965 0.613725701 0.507953446 -1.91048851 0.825418348 0.0073 75 -24.2131 -22.5251 30.1900 5.5445 0.0691 1.9367 0.9303 0.2240 1.6461 5.5971 0.0119 0.1519 0.0318 0.0322 0.0436 0.0053 0.0467 1964 0.761469549 0.591007527 -0.715988774 -0.038091331 0.0000 1963 0.863218851 0.888615198 -0.331691877 -0.251436807 0.0123 121 -25.0690 -24.5964 27.4600 6.3232 0.0777 2.0383 0.1999 0.2465 0.9724 5.8291 0.0133 0.0349 0.0130 0.0131 0.0240 0.0044 0.0519 1962 1.194332086 1.123299319 1.400311402 -0.006545299 0.0296 250 -23.6850 -23.4588 29.3800 5.7280 0.0771 1.8900 0.0077 0.1952 0.4429 5.7635 0.0122 0.0047 0.0064 0.0063 0.0121 0.0041 0.0471 1961 0.685968021 0.396586649 -0.75076967 0.0168 201 -26.3352 -26.3457 5.5119 0.0726 1.9270 0.0180 0.1741 0.7887 5.7523 0.0121 0.0080 0.0119 0.0119 0.0208 0.0043 0.0496 1960 0.881343621 0.681729796 -0.466014418 0.0242 250 -25.5025 -25.2769 29.1200 6.5630 0.1133 2.2199 0.1176 0.2603 0.5894 6.4430 0.0159 0.0392 0.0062 0.0061 0.0308 0.0051 0.0647 1959 0.976463783 0.856497076 -0.769653776 0.0046 109 -24.9889 -25.0234 28.1000 7.4239 0.0760 3.3692 3.7315 0.4288 2.8041 7.8173 0.0178 0.6213 0.0559 0.0554 0.0902 0.0115 0.0722 1958 1.267054108 0.846073161 -0.698278256 0.0069 41 -24.5183 -25.8900 24.7200 8.4312 0.0602 3.1824 0.6086 0.4111 1.6313 7.3141 0.0165 0.0977 0.0280 0.0279 0.0575 0.0046 0.0709 1957 0.811849325 0.818326511 -1.087269506 0.0126 95 -23.4967 -23.5870 32.4900 5.6488 0.0761 2.6156 0.2207 0.4425 1.0305 7.3572 0.0159 0.0726 0.0380 0.0377 0.0437 0.0059 0.0573 1956 0.837065839 1.0007592 0.424525891 0.0115 76 -23.4403 -22.9419 32.1500 5.6087 0.0844 2.8347 0.3853 0.3125 1.1162 8.0455 0.0167 0.0696 0.0158 0.0157 0.0306 0.0058 0.0565 1955 2.044375189 1.828578166 0.0218 
128 -24.9729 -24.2108 26.9000 7.4702 0.1659 4.0858 0.2619 0.3952 0.7023 9.7602 0.0222 0.0635 0.0111 0.0111 0.0338 0.0070 0.0731 1954 0.737033129 1.060103924 0.0029 8 -25.6604 -25.1068 28.9700 7.8034 0.0884 4.0907 1.8003 0.4834 5.0243 8.9409 0.0243 0.4037 0.0541 0.0529 0.2932 0.0091 0.0813 1953 0.619590578 0.647436408 0.0075 109 31.0400 1952 0.671851137 1.325676852 0.00562 41 33.1100 1951 0.894632264 1.397998867 0.00374 95 35.1800 1950 0.793048089 0.55195169 0.00186 76 -24.6750 -24.0405 37.2500 6.8214 0.1632 3.3876 1.0452 0.4622 1.7704 7.9556 0.0223 0.2316 0.0594 0.0592 0.3935 0.0066 0.0673 1949 0.70029018 1.053010492 0.0061 23 -25.2148 -26.0272 31.0900 5.8770 0.0532 3.0895 0.1231 0.4304 2.1365 7.9355 0.0165 0.1047 0.0204 0.0201 0.0735 0.0060 0.0578 1948 1.051413064 0.611568416 0.0105 86 -25.9116 -25.3761 29.6500 4.0905 0.0930 2.3578 0.7431 0.1757 1.3103 7.2889 0.0122 0.1378 0.0138 0.0136 0.0408 0.0056 0.0441 1947 0.706745895 0.323498221 0.0108 129 -26.5485 -25.8733 29.7700 5.7245 0.1294 3.2072 0.0524 0.2021 1.2550 9.1257 0.0150 0.1170 0.0155 0.0155 0.0393 0.0060 0.0588 1946 1.550656194 1.598435187 0.0164 381 -27.4603 -26.6368 28.0600 5.8659 0.1405 2.7682 0.0353 0.2424 0.3504 8.4089 0.0130 0.0437 0.0075 0.0075 0.0176 0.0057 0.0516 1945 0.877065687 0.539494611 0.0199 169 -26.7543 -26.0271 24.5700 6.2789 0.1407 2.9213 0.0309 0.3404 0.2888 7.9661 0.0131 0.0460 0.0079 0.0079 0.0185 0.0054 0.0507 1944 0.630508563 0.833959181 0.0116 20 -26.8748 -25.0203 29.4600 7.8427 0.0963 3.3664 0.8484 0.4187 0.4954 6.6868 0.0172 0.1799 0.0114 0.0114 0.0185 0.0066 0.0697 1943 0.948762137 0.552892235 0.0392 309 -24.8697 -26.9799 24.9700 7.2577 0.1020 3.2354 0.1611 0.3774 0.7706 8.0918 0.0196 0.0457 0.0060 0.0060 0.0120 0.0055 0.0699 1942 0.950673449 1.135547963 0.0148 18 -22.5094 -22.8155 28.5600 7.6926 0.1348 3.3979 0.6492 0.3347 1.3499 8.7744 0.0190 0.1142 0.0095 0.0095 0.0208 0.0072 0.0710 1941 1.185071356 1.263733805 0.0107 10 -24.3510 -22.5329 29.8200 6.2710 0.1459 3.3306 0.0560 
0.3519 1.0068 9.4886 0.0179 0.0185 0.0196 0.0198 0.1190 0.0066 0.0613 1940 1.262322422 0.924262914 0.0168 133 -25.2962 -25.0828 26.2600 7.9568 0.1977 3.2329 0.0803 0.3561 3.2999 9.5743 0.0200 0.0232 0.0125 0.0125 0.0538 0.0065 0.0702 1939 1.114823086 1.548939022 0.0158 25 -25.5439 -24.3820 27.9800 4.2674 0.1624 2.3578 0.4553 0.3042 2.2656 7.3905 0.0087 0.0741 0.0100 0.0100 0.3075 0.0059 0.0413 1938 0.639727143 0.569847918 0.0115 5 -23.4696 -22.7480 5.0000 0.0751 2.6663 0.4021 0.2049 0.4997 7.9594 0.0121 0.0753 0.0093 0.0092 0.0819 0.0068 0.0485 1937 0.844930794 1.201811673 0.0269 13 -24.2616 -24.5915 25.9500 4.5623 0.0912 2.3393 0.0227 0.3172 0.2136 7.5512 0.0108 0.0093 0.0080 0.0079 0.1586 0.0049 0.0397 1936 0.603048989 0.528796963 0.0167 4 -23.4819 -23.1849 29.0200 7.1722 0.0600 2.7679 0.0126 0.2080 1.1025 7.5967 0.0175 0.0076 0.0094 0.0095 0.0608 0.0052 0.0569 1935 0.739921482 0.980951812 0.0369 402 -25.3542 -25.7692 30.5500 4.8218 0.0563 2.1489 0.0084 0.2337 1.3120 6.8994 0.0154 0.0044 0.0081 0.0081 0.0329 0.0047 0.0404 1934 0.936808475 1.350050919 0.0289 166 -26.1766 -24.8557 26.5700 4.2794 0.0626 2.1503 0.0112 0.3330 1.5501 6.8375 0.0072 0.0045 0.0248 0.0249 0.0818 0.0046 0.0362 1933 0.822006233 0.980858486 0.0187 215 -25.2825 -24.7483 27.0600 4.0682 0.0719 2.1376 0.0170 0.3042 3.6465 6.7130 0.0085 0.0074 0.0071 0.0071 0.0790 0.0047 0.0380 1932 1.128679304 1.122260931 0.0302 318 -26.5160 -24.7148 29.8100 3.4429 0.0475 2.1194 0.0111 0.2919 2.6147 7.5700 0.0093 0.0039 0.0069 0.0071 0.0472 0.0047 0.0336 1931 1.013960586 0.485124456 0.0189 13 -24.7074 -24.9517 30.7100 3.9828 0.0677 2.2806 0.0183 0.2268 3.7269 9.1548 0.0074 0.0089 0.0073 0.0073 0.0687 0.0057 0.0383 1930 1.148649752 1.029163891 0.0203 175 -26.8323 -26.0809 29.1800 3.0899 0.0697 3.5321 0.0158 0.3735 1.8765 13.0435 0.0121 0.0145 0.0103 0.0104 0.0397 0.0086 0.0506 1929 0.99387758 1.204846613 0.0376 104 -26.6411 -26.0890 28.1500 4.2733 0.0412 2.6675 0.0078 0.2893 0.1528 9.4824 0.0094 0.0112 0.0075 
0.0075 0.0083 0.0064 0.0354 1928 0.905609551 0.772378969 0.0331 233 -25.8461 -26.2246 32.3600 5.8361 0.0440 2.8293 0.0095 0.2231 0.1736 8.7255 0.0186 0.0087 0.0074 0.0075 0.0091 0.0063 0.0476 1927 0.85672722 0.215215241 0.0171 152 -25.9555 -25.9299 28.1500 8.1915 0.1054 2.9585 0.0298 0.2692 0.3361 7.8459 0.0158 0.0135 0.0113 0.0112 0.2221 0.0057 0.0717 1926 0.932350398 0.425876672 0.0165 132 -27.7161 -26.9161 22.1900 7.5864 0.0875 3.2115 0.0256 0.2381 0.3483 8.4859 0.0152 0.0123 0.0127 0.0127 0.1256 0.0061 0.0618 1925 0.809324244 0.603492919 0.0174 48 -24.5765 -24.8562 28.9600 6.3520 0.0226 2.7524 0.0175 0.2355 0.3303 7.8838 0.0120 0.0130 0.0096 0.0096 0.0174 0.0058 0.0534 1924 1.735408827 1.991986688 0.027 253 -25.9985 -24.8571 31.4900 6.1000 0.1097 2.6762 0.0284 0.2676 2.2755 7.9132 0.0158 0.0089 0.0107 0.0106 0.2161 0.0054 0.0668 1923 0.787925712 1.573404755 0.0203 150 -24.6288 -25.1568 29.9300 5.6860 0.0967 2.5993 0.0231 0.2137 3.8395 9.0800 0.0128 0.0101 0.0098 0.0098 0.1010 0.0060 0.0536 1922 0.799163043 0.0208 334 -24.4215 -24.3729 28.8900 5.3341 0.0924 2.6394 0.0133 0.2462 3.8226 7.8138 0.0114 0.0069 0.0149 0.0150 0.0729 0.0054 0.0497 1921 0.77243578 0.0226 443 -23.4421 -23.8877 29.4300 6.1139 0.0805 3.2761 0.0156 0.2522 4.2754 10.1551 0.0128 0.0040 0.0195 0.0197 0.1065 0.0067 0.0623 1920 0.787155209 0.0385 278 -24.2587 -23.9798 29.2400 5.9896 0.0727 3.0804 0.0110 0.2266 3.7709 9.9680 0.0133 0.0038 0.0268 0.0269 0.0544 0.0067 0.0567 1919 0.836725864 0.0276 341 -24.7950 -24.8537 27.3900 6.5779 0.0798 3.1646 0.0126 0.2276 4.7733 10.8125 0.0149 0.0052 0.0154 0.0154 0.0604 0.0073 0.0629 1918 0.838156697 0.0058 392 -25.9260 -24.5236 30.6200 6.0259 0.0939 3.5283 0.0448 0.4603 6.5956 12.5834 0.0114 0.0238 0.0598 0.0605 0.2763 0.0095 0.0823 1917 0.966249549 0.0208 58 -25.5352 -24.7604 28.3400 5.8498 0.0925 2.8573 0.0143 0.2275 3.3143 9.2387 0.0118 0.0090 0.0238 0.0239 0.0445 0.0065 0.0535 1916 1.352618036 0.0152 567 -24.0530 -23.6626 27.6400 6.3964 0.0549 3.1876 
0.0166 0.2559 6.1909 11.3232 0.0119 0.0088 0.0303 0.0302 0.0696 0.0078 0.0620 1915 0.56838431 0.0354 153 -23.6817 -23.9420 29.7600 5.9449 0.0494 3.1254 0.0118 0.2632 3.6600 10.8684 0.0125 0.0096 0.0234 0.0234 0.0455 0.0075 0.0580 1914 1.653698335 0.0096 355 -25.3230 -25.5543 30.4100 6.1042 0.0305 3.3067 0.0310 0.3592 11.7772 11.9468 0.0103 0.0189 0.0230 0.0230 0.0825 0.0083 0.0603 1913 0.673176646 0.018 479 -25.2734 -25.9128 31.0800 6.1167 0.1001 3.5575 0.0227 0.3392 8.3156 12.0722 0.0131 0.0069 0.0294 0.0291 0.0844 0.0083 0.0681 1912 1.168563731 0.0026 57 -25.4911 -25.0984 30.9900 8.2413 0.1793 5.4744 0.1320 0.7542 53.7132 17.0050 0.0120 0.1196 0.0562 0.0570 0.3436 0.0120 0.1118 1911 1.458277945 0.0119 43 -25.0742 -25.1744 29.2000 8.5525 0.0326 4.2884 0.0276 0.4920 13.5179 14.3376 0.0117 0.0126 0.0152 0.0153 0.0453 0.0096 0.0817 1910 1.653698335 0.0096 355 -25.3230 -25.5543 30.4100 6.1042 0.0305 3.3067 0.0310 0.3592 11.7772 11.9468 0.0103 0.0189 0.0230 0.0230 0.0825 0.0083 0.0603

Code

# reading data
test <- read.delim("clipboard", sep="")
rnames <- test[,1]
test <- data.matrix(test[,2:ncol(test)]) # to matrix
rownames(test) <- rnames
test <- scale(test, center=TRUE, scale=TRUE) # data standardization
test <- t(test) # transpose

## Creating a color palette & color breaks
my_palette <- colorRampPalette(c("forestgreen", "yellow", "red"))(n = 299)
col_breaks = c(seq(-1,-0.5,length=100),   # forestgreen
               seq(-0.5,0.5,length=100),  # yellow
               seq(0.5,1,length=100))     # red

# distance & hierarchical clustering
distance = dist(test, method ="euclidean")
hcluster = hclust(distance, method ="ward.D")

# Creating Heat Map
heatmap.2(test,
          main = paste( "test"),
          trace="none",
          margins =c(5,7),
          col=my_palette,
          breaks=col_breaks,
          dendrogram="row",
          Rowv = as.dendrogram(hcluster),
          Colv = "NA",
          key.xlab = "Concentration (index)",
          cexRow =0.6,
          cexCol = 0.8,
          na.rm = TRUE
)
Solution: use the color_branches function from the dendextend package (or the set function, with the "branches_k_color", "k", and "value" parameters).

First we need to get the data into R and create the relevant objects (this part is the same as the code in the question):

test <- read.delim("clipboard", sep="")
rnames <- test[,1]
test <- data.matrix(test[,2:ncol(test)]) # to matrix
rownames(test) <- rnames
test <- scale(test, center=TRUE, scale=TRUE) # data standardization
test <- t(test) # transpose

## Creating a color palette & color breaks
my_palette <- colorRampPalette(c("forestgreen", "yellow", "red"))(n = 299)
col_breaks = c(seq(-1,-0.5,length=100),   # forestgreen
               seq(-0.5,0.5,length=100),  # yellow
               seq(0.5,1,length=100))     # red

# distance & hierarchical clustering
distance = dist(test, method ="euclidean")
hcluster = hclust(distance, method ="ward.D")

Next, we get the dendrogram and the heatmap ready:

dend1 <- as.dendrogram(hcluster)

# Get the dendextend package
if(!require(dendextend)) install.packages("dendextend")
library(dendextend)

# get some colors
cols_branches <- c("darkred", "forestgreen", "orange", "blue")

# Set the colors of 4 branches
dend1 <- color_branches(dend1, k = 4, col = cols_branches)
# or with:
# dend1 <- set(dend1, "branches_k_color", k = 4, value = cols_branches)

# get the colors of the tips of the dendrogram:
# col_labels <- cols_branches[cutree(dend1, k = 4)]
# this may need tweaking in various cases - the following is a more general solution.

# The following code will work on its own once I upload dendextend 0.18.6 to CRAN - but that can
# take several good weeks until that happens. In the meantime
# Either use devtools::install_github('talgalili/dendextend')
# Or just the following:
source("https://raw.githubusercontent.com/talgalili/dendextend/master/R/attr_access.R")
col_labels <- get_leaves_branches_col(dend1)

# But due to the way heatmap.2 works - we need to fix it to be in the
# order of the data!
col_labels <- col_labels[order(order.dendrogram(dend1))]

# Creating Heat Map
if(!require(gplots)) install.packages("gplots")
library(gplots)
heatmap.2(test,
          main = paste( "test"),
          trace="none",
          margins =c(5,7),
          col=my_palette,
          breaks=col_breaks,
          dendrogram="row",
          Rowv = dend1,
          Colv = "NA",
          key.xlab = "Concentration (index)",
          cexRow =0.6,
          cexCol = 0.8,
          na.rm = TRUE,
          RowSideColors = col_labels, # to add nice colored strips
          colRow = col_labels # to add nice colored labels - only for gplots 2.17.0 and higher
)

Which produces this plot:

For more details on the package, you can have a look at its vignette.

P.S.: whether the labels can be colored depends on the parameters of heatmap.2, so this should be requested from the maintainer of gplots (i.e. greg at warnes.net).

Update: this answer now includes the new "colRow" parameter in gplots 2.17.0.
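The reordering step col_labels[order(order.dendrogram(dend1))] can look odd, so here is a small sketch of what it does, using base R only. The toy data and the per_leaf values are made up for illustration (they stand in for the colors returned by get_leaves_branches_col()); the only assumption is that the per-leaf vector is given in left-to-right leaf order and has to be handed over in the row order of the data.

```r
# Toy data: five labelled points, deliberately not in cluster order.
x <- matrix(c(10, 1, 11, 2, 30), ncol = 1,
            dimnames = list(c("a", "b", "c", "d", "e"), NULL))
hc   <- hclust(dist(x), method = "average")
dend <- as.dendrogram(hc)

# For each leaf (left to right): the row index of that leaf in the data.
leaf_to_row <- order.dendrogram(dend)

# Hypothetical per-leaf values in dendrogram order; a stand-in for
# the branch colors of the leaves.
per_leaf <- paste0("col_for_", labels(dend))

# order() of a permutation is its inverse: for each data row,
# the position of that row among the leaves.
row_to_leaf <- order(leaf_to_row)

# Re-index so that element i belongs to row i of the data.
per_row <- per_leaf[row_to_leaf]
names(per_row) <- rownames(x)
print(per_row)
```

After the re-indexing, each row name is paired with its own value again, which is the order in which the answer passes the colors to RowSideColors and colRow.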
This is the maintainer of the gplots package. I've added two new arguments to the gplots::heatmap.2 function, 'colRow' and 'colCol', to control the colors of the row and column labels. They will be part of gplots 2.17.0, which should be submitted to CRAN in the next day or so.
Mapping spatial Distributions in R
My data set includes 17 stations, and for each station there are 24 hourly temperature values. I would like to map each station's value for a given hour, and do so for all the hours. What I want is something like the image.

The data is in the following format:

      N2     N3     N4     N5     N7     N8    N10    N12    N13    N14    N17    N19    N25    N28    N29    N31    N32
1  1.300 -0.170 -0.344  2.138  0.684  0.656  0.882  0.684  1.822  1.214  2.046  2.432  0.208  0.312  0.530  0.358  0.264
2  0.888 -0.534 -0.684  1.442 -0.178 -0.060  0.430 -0.148  1.420  0.286  1.444  2.138 -0.264 -0.042  0.398 -0.196 -0.148
3  0.792 -0.564 -0.622  0.998 -0.320  1.858 -0.036 -0.118  1.476  0.110  0.964  2.048 -0.480 -0.434  0.040 -0.538 -0.322
4  0.324 -1.022 -1.128  1.380 -0.792  1.042 -0.054 -0.158  1.518 -0.102  1.354  2.386 -0.708 -0.510  0.258 -0.696 -0.566
5  0.650 -0.774 -0.982  1.124 -0.540  3.200 -0.052 -0.258  1.452  0.028  1.022  2.110 -0.714 -0.646  0.266 -0.768 -0.532
6  0.670 -0.660 -0.844  1.248 -0.550  2.868 -0.098 -0.240  1.380 -0.012  1.164  2.324 -0.498 -0.474  0.860 -0.588 -0.324
  MeteoSwiss
1       -0.6
2       -1.2
3       -1.0
4       -0.8
5       -0.4
6       -0.2

where N2, N3, ..., MeteoSwiss are the stations and each row gives the stations' temperature values for one hour.

The station coordinates:

id          Longitude  Latitude
2           7.1735     45.86880001
3           7.17254    45.86887001
4           7.171636   45.86923601
5           7.18018    45.87158001
7           7.177229   45.86923001
8           7.17524    45.86808001
10          7.179299   45.87020001
12          7.175189   45.86974001
13          7.179379   45.87081001
14          7.175509   45.86932001
17          7.18099    45.87262001
19          7.18122    45.87355001
25          7.15497    45.87058001
28          7.153399   45.86954001
29          7.152649   45.86992001
31          7.154419   45.87004001
32          7.156099   45.86983001
MeteoSwiss  7.184      45.896
I define a toy example more or less resembling your data:

vals <- matrix(rnorm(24*17), nrow=24)
cds <- data.frame(id=paste0('N', 1:17),
                  Longitude=rnorm(n=17, mean=7.1),
                  Latitude=rnorm(n=17, mean=45.8))
vals <- as.data.frame(t(vals))
names(vals) <- paste0('H', 1:24)

The sp package defines several classes and methods to store and display spatial data. For your example you should use the SpatialPointsDataFrame class:

library(sp)
mySP <- SpatialPointsDataFrame(coords=cds[,-1], data=data.frame(vals))

and the spplot method to display the information:

spplot(mySP, as.table=TRUE, col.regions=bpy.colors(10), alpha=0.8, edge.col='black')

You may also find the spacetime package useful (paper at JSS).
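The toy example above uses random numbers, so here is a rough sketch of the same reshaping applied to values in the layout shown in the question (hours in rows, stations in columns). Only the first three stations and two hours are copied over, and the names temps, coords, by_station and xy are made up for this illustration. It assumes that column "N2" corresponds to id 2 in the coordinates table, and so on.

```r
# Hourly values as in the question: one row per hour, one column per station
# (only the first three stations and two hours are copied here).
temps <- data.frame(N2 = c(1.300, 0.888),
                    N3 = c(-0.170, -0.534),
                    N4 = c(-0.344, -0.684))

# Station coordinates, with ids as in the question's second table.
coords <- data.frame(id = c(2, 3, 4),
                     Longitude = c(7.1735, 7.17254, 7.171636),
                     Latitude  = c(45.86880001, 45.86887001, 45.86923601))

# Transpose so that each row is a station and each column an hour.
by_station <- as.data.frame(t(temps))
names(by_station) <- paste0("H", seq_len(ncol(by_station)))

# Match rows to coordinates by station id (assumption: "N2" is id 2).
station_id <- sub("^N", "", rownames(by_station))
idx <- match(station_id, as.character(coords$id))
stopifnot(!anyNA(idx))  # every station must have coordinates

xy <- as.matrix(coords[idx, c("Longitude", "Latitude")])
rownames(xy) <- rownames(by_station)

# With sp installed, the two objects could then be combined as in the answer:
# mySP <- sp::SpatialPointsDataFrame(coords = xy, data = by_station)
```

Matching by id rather than by position guards against the two tables listing the stations in a different order.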