Separate fields in R when no delimiter exists - r

I have a dataset like the following:
structure(list(Info = c("Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
"Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 ",
"Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173 ",
"Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349 ",
"Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497 ",
"Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153 "
)), .Names = "Info", row.names = c(NA, 6L), class = "data.frame")
It currently has only one column and it looks like this
Info
1 Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042
2 Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148
3 Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173
4 Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349
5 Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497
6 Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153
I would like it to have 7 columns and look like this:
Species V1 V2 V3 V4 V5 V6
1 Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042
2 Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148
3 Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173
4 Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349
5 Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497
6 Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153
This problem has been giving me headaches because the species name is not always two words. The original text file has no delimiter, so I have been unable to read it in as delimited data; I have only been able to get it in as a single column of strings. Does anyone have any suggestions?

Try using gsub for putting a comma before every number in the "Info" column of a dataframe we will assume is named "dat" and then re-read with read.csv:
> read.csv(text=gsub("( [-[:digit:].])", ",\\1", dat$Info), header=FALSE)
V1 V2 V3 V4 V5 V6 V7
1 Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042
2 Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148
3 Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173
4 Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349
5 Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497
6 Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153
I thank you for describing your use case. I might be able to use this myself in the future.

Suppose ds is your data:
ds <-
structure(list(Info = c("Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
"Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 ",
"Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173 ",
"Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349 ",
"Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497 ",
"Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153 "
)), .Names = "Info", row.names = c(NA, 6L), class = "data.frame")
You can then do something like
ds$Info <- gsub(" (-?[0-9])", ", \\1", ds$Info)
do.call(rbind, strsplit(ds$Info, ", "))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] "Acacia melanoceras" "0.0369" "0.0427" "0.0267" "0.0298" "0.0501" "0.0042 "
#[2,] "Acalypha diversifolia van" "0.0670" "0.0439" "0.0281" "0.0427" "0.0464" "-0.0148 "
#[3,] "Acalypha macrostachya vin" "0.0657" "0.0621" "0.0441" "0.0522" "0.0473" "-0.0173 "
#[4,] "Adelia triloba" "0.0481" "0.0350" "0.0202" "0.0174" "0.0286" "-0.0349 "
#[5,] "Aegiphila panamensis" "0.0437" "0.0312" "0.0166" "0.0148" "0.0194" "-0.0497 "
#[6,] "Alchornea costaricensis" "0.0568" "0.0781" "0.0502" "0.0221" "0.0734" "-0.0153 "
With ds as above, you're nearly done. You first look for a space followed by a number and put a comma in front of it. Then you split the strings on the commas and bind the resulting vectors into a matrix. You can then convert the object to a data.frame yourself, convert the relevant columns to numeric, and add colnames.
EDIT:
As seen in BondedDust's answer, using read.csv is much more elegant.
read.csv(text = ds$Info, header = FALSE)

Here's my suggestion:
1) Split by ' ',
2) paste the species and genus names together (I assume you have 6 numeric columns) and
3) make a (character) data.frame.
4) Finally convert columns to numeric and
5) set Species as colname.
df <- structure(list(Info = c("Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
"Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 ",
"Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173 ",
"Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349 ",
"Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497 ",
"Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153 "
)), .Names = "Info", row.names = c(NA, 6L), class = "data.frame")
df
# split
sp <- strsplit(df$Info, ' ')
sp
# make (character) data.frame
require(plyr)
newdf <- ldply(sp, function(x) {
l <- length(x)
dta <- x[(l-5):l]
spec <- paste(x[1:(l-6)], collapse = ' ')
out <- c(spec, dta)
return(out)
})
# make numeric cols
newdf[ , 2:7] <- apply(newdf[ , 2:7], 2, function(x) as.numeric(x))
names(newdf)[1] <- 'Species'
str(newdf)
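The idea shared by the answers above (the numbers start at the first space that is followed by a digit or a minus sign) can also be sketched in base R without inserting a delimiter. This is only a sketch: the names info, species, nums and out are made up for illustration, and it assumes that no word in a species name begins with a digit.

```r
# Three of the strings from the question
info <- c("Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
          "Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 ",
          "Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349 ")

# Species name: drop everything from the first " <number>" onwards
species <- sub(" -?[0-9].*$", "", info)

# Numeric part: whatever follows the species name and its trailing space
nums <- read.table(text = substring(info, nchar(species) + 2))

# One character column plus V1..V6 as numeric columns
out <- cbind(Species = species, nums)
str(out)
```

Because read.table splits on any whitespace by default, the trailing space at the end of each string does not create an extra column.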

Related

Compute euclidean distance for PCA in R

I did a PCA (using eigenvalue and smartPCA) and now I am trying to compute the Euclidean distance to one population. For example, in this dataset, I would want to compute in R the Euclidean distance of all populations to population 6.
Individual PCA1 PCA2 PCA3 PCA4 PCA5 PCA6 PCA7 PCA8 PCA9 PCA10
1: Pop1 0.0346 0.0095 -0.0022 -0.0018 -0.0033 0.0002 0.0042 -0.0003 0.0028 0.0268
2: Pop2 0.0370 0.0095 -0.0027 0.0015 -0.0027 -0.0024 0.0038 0.0012 0.0053 0.0210
3: Pop3 0.0379 0.0100 -0.0030 0.0021 -0.0017 -0.0043 0.0033 0.0005 0.0036 0.0144
4: Pop4 0.0352 0.0092 -0.0031 -0.0021 -0.0029 -0.0005 0.0038 -0.0003 0.0047 0.0349
5: Pop5 0.0342 0.0089 -0.0027 -0.0013 -0.0031 -0.0008 0.0032 -0.0017 -0.0009 0.0265
6: Pop6 0.0342 -0.0524 -0.0503 -0.1028 -0.4785 -0.0244 0.0279 0.0038 -0.0264 -0.0022 -0.0265
I found a thread on how to do it in Python but can't find one for R! I tried dist(), but I don't understand the output or how to compare against only one population (I have 500+ populations in total).
Thanks !
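One possible way to get the distance of every population to a single reference population is sketched below in base R. The small data frame uses only the first two components of three rows from the question; the names pcs, scores, ref and dist_to_ref are made up for illustration, and the same code should work with any number of PCA columns.

```r
# First two components of three populations, copied from the question
pcs <- data.frame(
  Individual = c("Pop1", "Pop2", "Pop6"),
  PCA1 = c(0.0346, 0.0370, 0.0342),
  PCA2 = c(0.0095, 0.0095, -0.0524),
  stringsAsFactors = FALSE
)

# Numeric matrix of scores, one row per population
scores <- as.matrix(pcs[, -1])
rownames(scores) <- pcs$Individual

# Subtract the reference row from every row, then take the row-wise norm
ref <- scores["Pop6", ]
dist_to_ref <- sqrt(rowSums(sweep(scores, 2, ref)^2))
dist_to_ref

# dist() returns all pairwise distances; the "Pop6" column of the full
# matrix holds the same numbers
as.matrix(dist(scores))[, "Pop6"]
```

With 500+ populations the sweep() version avoids building the full pairwise matrix, whereas the dist() version is handy if other reference populations are needed later.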

Calculate period return from monthly returns

This may sound naive but I can't seem to find the solution. I need to calculate 1, 3 and 5-year returns and my dataset consists of monthly returns rather than prices. The dataset I'm working on is similar to managers
data(managers)
tail(managers)
HAM1 HAM2 HAM3 HAM4 HAM5 HAM6 EDHEC LS EQ SP500 TR US 10Y TR US 3m TR
2006-07-31 -0.0144 -0.0131 0.0102 -0.0120 -0.0164 -0.0225 -0.0031 0.00620 0.01580 0.00423
2006-08-31 0.0161 -0.0113 0.0253 -0.0183 0.0169 0.0193 0.0114 0.02380 0.02190 0.00441
2006-09-30 0.0068 -0.0231 0.0072 0.0197 0.0132 -0.0177 0.0001 0.02580 0.01140 0.00456
2006-10-31 0.0427 0.0167 0.0183 0.0518 0.0266 0.0189 0.0194 0.03260 0.00584 0.00381
2006-11-30 0.0117 0.0206 0.0269 0.0373 0.0038 0.0300 0.0200 0.01900 0.01419 0.00430
2006-12-31 0.0115 -0.0062 0.0110 0.0206 0.0317 0.0215 0.0153 0.01403 -0.01550 0.00441
I looked into the Return.cumulative from package PerformanceAnalytics but there is no argument for specifying periods. ROC from TTR can specify the number of periods to use but it is not based on return. What would be the best way to do this? Thank you in advance!
Based on what you want and what you already know about ROC from TTR, I will only provide the data preparation part:
#Sample Data
df=read.table(text=' Date HAM1 HAM2 HAM3 HAM4 HAM5 HAM6
2006-07-31 -0.0144 -0.0131 0.0102 -0.0120 -0.0164 -0.0225
2006-08-31 0.0161 -0.0113 0.0253 -0.0183 0.0169 0.0193
2006-09-30 0.0068 -0.0231 0.0072 0.0197 0.0132 -0.0177
2006-10-31 0.0427 0.0167 0.0183 0.0518 0.0266 0.0189
2006-11-30 0.0117 0.0206 0.0269 0.0373 0.0038 0.0300
2006-12-31 0.0115 -0.0062 0.0110 0.0206 0.0317 0.0215 ',header=T,stringsAsFactors=F)
# Convert the returns to prices, assuming every series starts at a value of 100
for (i in 2:dim(df)[2]){
B=Reduce(function(x,y) {x * (1+y)}, df[,i], init=100, accumulate=T)# if it is log Return: {x * exp(y)}
if (i==2){
Price= B
}else{
Price=cbind(Price,B)
}
}
Price=data.frame(cbind(df$Date,Price[-1,]))
names(Price)=names(df)
> Price
Date HAM1 HAM2 HAM3 HAM4 HAM5 HAM6
1 2006-07-31 98.56 98.69 101.02 98.8 98.36 97.75
2 2006-08-31 100.146816 97.574803 103.575806 96.99196 100.022284 99.636575
3 2006-09-30 100.8278143488 95.3208250507 104.3215518032 98.902701612 101.3425781488 97.8730076225
4 2006-10-31 105.133162021494 96.9126828290467 106.230636201199 104.025861555502 104.038290727558 99.7228074665652
5 2006-11-30 106.363220017145 98.909084095325 109.088240315011 107.906026191522 104.433636232323 102.714491690562
6 2006-12-31 107.586397047342 98.295847773934 110.288210958476 110.128890331067 107.744182500887 104.922853261909
Then you can use the usual package functions to annualize the return (or write your own).
Using @Wen's data:
df=read.table(text=' Date HAM1 HAM2 HAM3 HAM4 HAM5 HAM6
2006-07-31 -0.0144 -0.0131 0.0102 -0.0120 -0.0164 -0.0225
2006-08-31 0.0161 -0.0113 0.0253 -0.0183 0.0169 0.0193
2006-09-30 0.0068 -0.0231 0.0072 0.0197 0.0132 -0.0177
2006-10-31 0.0427 0.0167 0.0183 0.0518 0.0266 0.0189
2006-11-30 0.0117 0.0206 0.0269 0.0373 0.0038 0.0300
2006-12-31 0.0115 -0.0062 0.0110 0.0206 0.0317 0.0215 ',header=T,stringsAsFactors=F)
You can use the rollapply function from the zoo package.
library(zoo)
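roll <- function(x, n, stat) {
  # Not enough observations for a full window: return NA for every row
  if (length(x) <= n) return(rep(NA, length(x)))
  # Turn returns into growth factors so that prod() compounds them
  x <- x + 1
  # Offsets -1, ..., -n: apply stat to the n observations before each row
  rollapply(x, list(-seq(n)), stat, fill = NA)
}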
df2 <- transform(df, four_month_return_HAM1 = ave(HAM1, FUN = function(x) roll(x, 4, prod)-1))
Change 4 to the number of months you want to compound over. For one year, this would be 12, which gives the 12-month rolling return. Note that the offsets -seq(n) cover the n months before each row and exclude the row's own month, so the first n rows are NA.
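For comparison, the same compounding can be sketched in base R with cumprod(). The helper name period_return is made up for illustration. Unlike the roll() function above, this sketch includes the current month in each window, and it assumes simple (not log) returns with no -100% month (which would make a later division by zero).

```r
# Trailing n-period compounded return, window ending at the current row
period_return <- function(r, n) {
  len <- length(r)
  if (len < n) return(rep(NA_real_, len))
  growth <- cumprod(1 + r)          # value of 1 unit invested at the start
  # Value n periods earlier: undefined for the first n - 1 rows,
  # the starting value 1 for row n, then growth shifted by n
  base <- c(rep(NA_real_, n - 1), 1, growth[seq_len(len - n)])
  growth / base - 1
}

# HAM1 monthly returns from the sample data
ham1 <- c(-0.0144, 0.0161, 0.0068, 0.0427, 0.0117, 0.0115)
period_return(ham1, 3)
```

For 1, 3 and 5-year returns on monthly data, n would be 12, 36 and 60.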

How to color the branches and tick labels in the heatmap.2?

I have made a heat map using the function heatmap.2 from gplots in R, but I have no idea how to color the branches and tick labels by group (e.g. if I cut the tree into four groups, as in my second figure). I have checked that it is possible to color the dendrogram alone using the dendextend package.
There is also a heatmap with a colored dendrogram here: selecting number of leaf nodes of dendrogram in heatmap.2 in R, but I can't apply it to my example.
Can somebody help me with this issue?
Update
This is my Heat Map:
and I would like to have one like this with branches and tick labels in color according their four groups (this figure was edited with Illustrator to explain this question):
Here is the data and code that I have used:
Data
YEAR varA varB varC varD varE varF var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 var16 var17
2005 1.175290887 1.535846033 1.531113178 -1.10297075 0.0284 26 -25.5470 -24.2101 24.7900 3.3345 0.0468 0.5058 0.0087 1.7378 0.0703 2.7070 0.0183 0.0340 0.0177 0.0176 0.0240 0.0015 0.0292
2004 0.834733204 0.64917365 -0.403174087 0.116169692 0.033 50 -24.4170 -22.2574 27.3400 3.4106 0.1151 0.5822 0.0085 1.8133 0.0762 3.2604 0.0114 0.0178 0.0086 0.0086 0.0824 0.0018 0.0308
2003 1.297607635 1.224946337 0.4486378 0.227557968 0.0544 181 -24.5080 -23.2790 27.4200 3.5092 0.1052 0.5239 0.0038 0.9815 0.0681 2.7465 0.0074 0.0099 0.0025 0.0025 0.0142 0.0015 0.0298
2002 1.043780072 0.650695815 -0.337133061 0.016766696 0.0374 227 -22.6110 -21.7828 30.0200 3.6270 0.1119 0.5753 0.0106 0.7916 0.0805 3.0434 0.0069 0.0086 0.0109 0.0108 0.0313 0.0017 0.0288
2001 0.781864124 0.534881678 -0.740527443 0.171745261 0.0074 20 -23.9170 -23.2327 3.8007 0.1243 0.6216 0.0553 1.2333 0.3414 2.9606 0.0074 0.0384 0.0079 0.0082 0.0570 0.0018 0.0360
2000 0.742528229 0.667207042 -0.614740091 0.189253192 0.0257 88 -22.6420 -21.4066 30.8900 3.1693 0.0287 0.6244 0.0070 1.0256 0.1336 2.7033 0.0063 0.0102 0.0185 0.0186 0.0248 0.0015 0.0278
1999 0.701222612 1.059869033 0.772334853 0.290190993 0.0476 312 -22.4730 -21.8328 26.6600 3.0578 0.0719 0.6363 0.0032 0.7183 0.0649 2.5445 0.0066 0.0070 0.0063 0.0063 0.0095 0.0016 0.0252
1998 0.904634938 1.16455833 0.646654191 0.086214161 0.0546 332 -23.2070 -22.4399 26.1400 3.2344 0.0656 0.7096 0.0046 0.6709 0.0718 2.5656 0.0072 0.0166 0.0132 0.0131 0.0144 0.0016 0.0275
1997 0.965775183 1.362520795 0.653268963 0.007038426 0.0791 509 -23.4830 -22.4253 26.0400 3.0278 0.0438 0.7575 0.0081 0.5002 0.0657 2.5755 0.0077 0.0072 0.0083 0.0083 0.0108 0.0017 0.0252
1996 0.956113049 1.439534042 0.618648101 -0.334351083 0.0411 245 -23.4290 -23.0417 27.3000 2.9331 0.0363 0.9229 0.0050 0.4819 0.1306 2.7239 0.0072 0.0166 0.0027 0.0026 0.0174 0.0018 0.0240
1995 1.786742729 1.732091021 2.654237394 0.190377371 0.0842 646 -22.7600 -22.0212 24.2100 3.1562 0.0202 1.1728 0.0072 0.6133 0.0772 3.1313 0.0080 0.0051 0.0035 0.0035 0.0055 0.0022 0.0266
1994 0.811695681 0.670904284 0.76646691 2.163378723 0.0394 203 -22.4920 -21.3677 28.6500 3.2475 0.0132 1.7476 0.0084 0.9386 0.1880 3.8856 0.0082 0.0120 0.0129 0.0129 0.0151 0.0026 0.0280
1993 0.754876913 0.302624208 -0.927234708 -0.108263802 0.017 66 -22.3880 -21.2900 32.8400 3.5853 0.0008 1.6626 0.0221 1.2307 0.4173 3.8864 0.0079 0.0379 0.0199 0.0196 0.0225 0.0028 0.0319
1992 0.723058507 0.818965047 -0.52053294 0.384656566 0.0345 155 -21.4920 -20.8724 32.2000 3.3116 0.0068 1.5673 0.0104 0.9411 0.2245 4.0228 0.0075 0.0123 0.0308 0.0306 0.0112 0.0027 0.0287
1991 1.024225427 0.71537408 0.22672288 -0.029575009 0.0297 235 -23.4850 -22.7000 27.8400 4.4024 0.0097 1.6126 0.0698 0.6344 0.2832 4.4160 0.0108 0.0127 0.0184 0.0184 0.0122 0.0030 0.0356
1990 0.873807193 1.168599747 1.317306687 -0.335682786 0.0533 172 -23.7170 -23.5029 25.8100 4.0497 0.0170 1.5207 0.0065 0.5232 0.1734 4.6765 0.0104 0.0164 0.0131 0.0130 0.0093 0.0030 0.0332
1989 0.71498833 1.065965836 0.650281646 -0.048038841 0.0663 214 -23.5000 -23.1053 26.3500 4.1139 0.0159 1.6162 0.0096 0.5199 0.1426 4.7752 0.0106 0.0083 0.0098 0.0099 0.0076 0.0031 0.0341
1988 1.188282778 1.133076429 -0.167816244 0.448030288 0.007 64 -23.3750 -21.9900 29.3900 3.6893 0.0278 1.8392 0.0939 0.5658 1.2390 5.1103 0.0086 0.0775 0.0203 0.0202 0.0339 0.0034 0.0340
1987 0.788798159 0.276008942 -0.934596308 -0.039259431 0.012 65 -22.9540 -22.7758 28.3800 3.6375 0.0011 1.8331 0.0768 0.6187 0.6081 5.0475 0.0088 0.0554 0.0183 0.0180 0.0159 0.0038 0.0381
1986 0.757757883 1.395817348 0.455252572 -0.001274532 0.0125 47 -22.6120 -22.9011 29.7400 3.7060 0.0172 1.5279 0.0151 0.5897 0.6168 4.4917 0.0085 0.0160 0.0257 0.0256 0.0276 0.0033 0.0410
1985 1.128413419 0.321849225 -0.904189697 -0.05362552 0.0705 291 -22.7200 -21.9357 28.4100 3.5887 0.0100 1.4955 0.0022 0.3538 0.1471 4.3125 0.0091 0.0157 0.0042 0.0042 0.0041 0.0029 0.0292
1984 1.015352865 1.014625668 0.39294569 -0.267936245 0.0419 121 -23.5170 -23.1678 25.6200 4.5018 0.0353 1.8985 0.0022 0.3420 0.2620 4.9867 0.0113 0.0069 0.0058 0.0058 0.0051 0.0033 0.0356
1983 0.393985784 0.474743555 -0.368393191 -0.222845745 0.0161 49 -24.5600 -23.9514 30.5300 3.0978 0.0270 0.9467 0.0421 0.3287 0.5616 3.1256 0.0075 0.0553 0.0154 0.0155 0.0084 0.0022 0.0323
1982 0.503744901 0.524683063 -0.946225504 0.016766696 0.0118 10 -23.5970 -24.0037 30.3100 2.7288 0.0011 1.2154 0.0097 0.3022 0.8415 4.3594 0.0083 0.0254 0.0075 0.0076 0.0134 0.0029 0.0304
1981 0.872025585 1.496555573 0.658923526 -0.175816424 0.0489 343 -23.8320 -23.4716 28.1100 4.6585 0.0128 1.9205 0.0031 0.2999 0.2278 5.6588 0.0134 0.0067 0.0072 0.0071 0.0087 0.0036 0.0437
1980 2.165460373 3.419095697 3.741300435 0.250364758 0.0644 626 -24.5010 -24.0323 28.7300 3.8474 0.0122 1.4827 0.0019 0.2164 0.1859 4.3602 0.0104 0.0056 0.0050 0.0050 0.0064 0.0028 0.0337
1979 1.00201444 0.453601121 0.109577407 0.73158507 0.0281 301 -23.6070 -22.9149 27.9100 4.5765 0.0467 1.6919 0.0344 0.1940 0.3453 5.1064 0.0132 0.0162 0.0078 0.0077 0.0554 0.0032 0.0389
1978 0.829984787 0.2021646 -0.724630653 -0.178430782 0.0000
1977 0.939170906 0.192142351 -1.029656979 0.50745842 0.0068 30 -24.3510 -22.5760 29.4900 6.1029 0.3417 2.4069 0.0938 0.2824 1.3937 6.6441 0.0136 0.0609 0.0395 0.0391 0.6074 0.0045 0.0591
1976 0.741090851 0.151474404 -0.439448642 0.359471579 0.056 396 -23.7450 -22.7680 28.3700 4.3464 0.0431 1.6901 0.0234 0.2937 0.2160 5.1366 0.0113 0.0147 0.0082 0.0081 0.0317 0.0034 0.0389
1975 1.061884929 0.396763153 -1.075320241 0.433356946 0.0299 322 -23.4320 -22.9732 25.7800 5.0301 0.1740 2.2028 0.0311 0.3131 0.4254 5.8683 0.0131 0.0160 0.0182 0.0182 0.2093 0.0038 0.0443
1974 1.052548763 0.491883924 0.28198823 -0.562241025 0.0215 267 -23.3350 -22.7075 26.4100 5.3407 0.1187 2.2436 0.0231 0.2984 0.5378 5.8795 0.0127 0.0208 0.0127 0.0128 0.0821 0.0038 0.0466
1973 0.519163031 1.120525721 0.960322396 -0.84893256 0.0129 49 -23.4350 -23.0556 31.3500 6.4341 0.1105 2.4298 0.0484 0.2783 0.9249 5.8779 0.0129 0.0428 0.0124 0.0123 0.1293 0.0038 0.0499
1972 0.703961551 1.359485416 -0.306513069 -1.150818704 0.0228 247 -23.7840 -23.3257 28.3000 6.3520 0.1096 2.6043 0.0439 0.4126 0.5335 6.3320 0.0154 0.0279 0.0061 0.0062 0.0874 0.0042 0.0593
1971 0.714252707 1.621333793 -1.065184704 0.003023451 0.0274 196 -23.2140 -22.2731 31.3800 5.1332 0.0873 1.9259 0.0872 0.3598 0.4714 4.9337 0.0112 0.0234 0.0073 0.0073 0.0688 0.0034 0.0426
1970 1.022643019 1.491401283 0.088239434 -0.973528472 0.025 206 -22.9870 -21.9506 30.6200 5.0770 0.0698 2.1145 0.1825 0.3537 0.4990 5.3274 0.0129 0.0873 0.0098 0.0098 0.0316 0.0040 0.0479
1969 2.157784838 1.796722133 0.731152565 -0.193891705 0.0547 505 -24.2820 -23.9048 26.2400 5.0183 0.0637 2.2673 0.0127 0.2893 0.2420 5.1038 0.0129 0.0244 0.0069 0.0069 0.0154 0.0037 0.0440
1968 0.913026742 1.271215847 0.196849717 -1.068149218 0.0132 112 -22.9850 -21.9397 32.2300 4.0568 0.0498 2.0576 0.0965 0.2188 0.9468 5.3597 0.0080 0.0513 0.0157 0.0154 0.0507 0.0039 0.0371
1967 0.749350643 0.439194622 -1.316546028 0.306149455 0.0209 196 -23.7020 -22.8580 30.5400 4.5873 0.0703 1.9639 0.4981 0.2136 0.6086 5.1528 0.0100 0.0934 0.0103 0.0102 0.0235 0.0042 0.0415
1966 0.732785384 0.74795644 -0.681581292 1.265096245 0.0189 204 -23.3746 -22.7452 30.0600 4.8598 0.0542 1.8172 0.0437 0.2605 0.6557 5.2782 0.0131 0.0118 0.0081 0.0080 0.0203 0.0036 0.0418
1965 0.613725701 0.507953446 -1.91048851 0.825418348 0.0073 75 -24.2131 -22.5251 30.1900 5.5445 0.0691 1.9367 0.9303 0.2240 1.6461 5.5971 0.0119 0.1519 0.0318 0.0322 0.0436 0.0053 0.0467
1964 0.761469549 0.591007527 -0.715988774 -0.038091331 0.0000
1963 0.863218851 0.888615198 -0.331691877 -0.251436807 0.0123 121 -25.0690 -24.5964 27.4600 6.3232 0.0777 2.0383 0.1999 0.2465 0.9724 5.8291 0.0133 0.0349 0.0130 0.0131 0.0240 0.0044 0.0519
1962 1.194332086 1.123299319 1.400311402 -0.006545299 0.0296 250 -23.6850 -23.4588 29.3800 5.7280 0.0771 1.8900 0.0077 0.1952 0.4429 5.7635 0.0122 0.0047 0.0064 0.0063 0.0121 0.0041 0.0471
1961 0.685968021 0.396586649 -0.75076967 0.0168 201 -26.3352 -26.3457 5.5119 0.0726 1.9270 0.0180 0.1741 0.7887 5.7523 0.0121 0.0080 0.0119 0.0119 0.0208 0.0043 0.0496
1960 0.881343621 0.681729796 -0.466014418 0.0242 250 -25.5025 -25.2769 29.1200 6.5630 0.1133 2.2199 0.1176 0.2603 0.5894 6.4430 0.0159 0.0392 0.0062 0.0061 0.0308 0.0051 0.0647
1959 0.976463783 0.856497076 -0.769653776 0.0046 109 -24.9889 -25.0234 28.1000 7.4239 0.0760 3.3692 3.7315 0.4288 2.8041 7.8173 0.0178 0.6213 0.0559 0.0554 0.0902 0.0115 0.0722
1958 1.267054108 0.846073161 -0.698278256 0.0069 41 -24.5183 -25.8900 24.7200 8.4312 0.0602 3.1824 0.6086 0.4111 1.6313 7.3141 0.0165 0.0977 0.0280 0.0279 0.0575 0.0046 0.0709
1957 0.811849325 0.818326511 -1.087269506 0.0126 95 -23.4967 -23.5870 32.4900 5.6488 0.0761 2.6156 0.2207 0.4425 1.0305 7.3572 0.0159 0.0726 0.0380 0.0377 0.0437 0.0059 0.0573
1956 0.837065839 1.0007592 0.424525891 0.0115 76 -23.4403 -22.9419 32.1500 5.6087 0.0844 2.8347 0.3853 0.3125 1.1162 8.0455 0.0167 0.0696 0.0158 0.0157 0.0306 0.0058 0.0565
1955 2.044375189 1.828578166 0.0218 128 -24.9729 -24.2108 26.9000 7.4702 0.1659 4.0858 0.2619 0.3952 0.7023 9.7602 0.0222 0.0635 0.0111 0.0111 0.0338 0.0070 0.0731
1954 0.737033129 1.060103924 0.0029 8 -25.6604 -25.1068 28.9700 7.8034 0.0884 4.0907 1.8003 0.4834 5.0243 8.9409 0.0243 0.4037 0.0541 0.0529 0.2932 0.0091 0.0813
1953 0.619590578 0.647436408 0.0075 109 31.0400
1952 0.671851137 1.325676852 0.00562 41 33.1100
1951 0.894632264 1.397998867 0.00374 95 35.1800
1950 0.793048089 0.55195169 0.00186 76 -24.6750 -24.0405 37.2500 6.8214 0.1632 3.3876 1.0452 0.4622 1.7704 7.9556 0.0223 0.2316 0.0594 0.0592 0.3935 0.0066 0.0673
1949 0.70029018 1.053010492 0.0061 23 -25.2148 -26.0272 31.0900 5.8770 0.0532 3.0895 0.1231 0.4304 2.1365 7.9355 0.0165 0.1047 0.0204 0.0201 0.0735 0.0060 0.0578
1948 1.051413064 0.611568416 0.0105 86 -25.9116 -25.3761 29.6500 4.0905 0.0930 2.3578 0.7431 0.1757 1.3103 7.2889 0.0122 0.1378 0.0138 0.0136 0.0408 0.0056 0.0441
1947 0.706745895 0.323498221 0.0108 129 -26.5485 -25.8733 29.7700 5.7245 0.1294 3.2072 0.0524 0.2021 1.2550 9.1257 0.0150 0.1170 0.0155 0.0155 0.0393 0.0060 0.0588
1946 1.550656194 1.598435187 0.0164 381 -27.4603 -26.6368 28.0600 5.8659 0.1405 2.7682 0.0353 0.2424 0.3504 8.4089 0.0130 0.0437 0.0075 0.0075 0.0176 0.0057 0.0516
1945 0.877065687 0.539494611 0.0199 169 -26.7543 -26.0271 24.5700 6.2789 0.1407 2.9213 0.0309 0.3404 0.2888 7.9661 0.0131 0.0460 0.0079 0.0079 0.0185 0.0054 0.0507
1944 0.630508563 0.833959181 0.0116 20 -26.8748 -25.0203 29.4600 7.8427 0.0963 3.3664 0.8484 0.4187 0.4954 6.6868 0.0172 0.1799 0.0114 0.0114 0.0185 0.0066 0.0697
1943 0.948762137 0.552892235 0.0392 309 -24.8697 -26.9799 24.9700 7.2577 0.1020 3.2354 0.1611 0.3774 0.7706 8.0918 0.0196 0.0457 0.0060 0.0060 0.0120 0.0055 0.0699
1942 0.950673449 1.135547963 0.0148 18 -22.5094 -22.8155 28.5600 7.6926 0.1348 3.3979 0.6492 0.3347 1.3499 8.7744 0.0190 0.1142 0.0095 0.0095 0.0208 0.0072 0.0710
1941 1.185071356 1.263733805 0.0107 10 -24.3510 -22.5329 29.8200 6.2710 0.1459 3.3306 0.0560 0.3519 1.0068 9.4886 0.0179 0.0185 0.0196 0.0198 0.1190 0.0066 0.0613
1940 1.262322422 0.924262914 0.0168 133 -25.2962 -25.0828 26.2600 7.9568 0.1977 3.2329 0.0803 0.3561 3.2999 9.5743 0.0200 0.0232 0.0125 0.0125 0.0538 0.0065 0.0702
1939 1.114823086 1.548939022 0.0158 25 -25.5439 -24.3820 27.9800 4.2674 0.1624 2.3578 0.4553 0.3042 2.2656 7.3905 0.0087 0.0741 0.0100 0.0100 0.3075 0.0059 0.0413
1938 0.639727143 0.569847918 0.0115 5 -23.4696 -22.7480 5.0000 0.0751 2.6663 0.4021 0.2049 0.4997 7.9594 0.0121 0.0753 0.0093 0.0092 0.0819 0.0068 0.0485
1937 0.844930794 1.201811673 0.0269 13 -24.2616 -24.5915 25.9500 4.5623 0.0912 2.3393 0.0227 0.3172 0.2136 7.5512 0.0108 0.0093 0.0080 0.0079 0.1586 0.0049 0.0397
1936 0.603048989 0.528796963 0.0167 4 -23.4819 -23.1849 29.0200 7.1722 0.0600 2.7679 0.0126 0.2080 1.1025 7.5967 0.0175 0.0076 0.0094 0.0095 0.0608 0.0052 0.0569
1935 0.739921482 0.980951812 0.0369 402 -25.3542 -25.7692 30.5500 4.8218 0.0563 2.1489 0.0084 0.2337 1.3120 6.8994 0.0154 0.0044 0.0081 0.0081 0.0329 0.0047 0.0404
1934 0.936808475 1.350050919 0.0289 166 -26.1766 -24.8557 26.5700 4.2794 0.0626 2.1503 0.0112 0.3330 1.5501 6.8375 0.0072 0.0045 0.0248 0.0249 0.0818 0.0046 0.0362
1933 0.822006233 0.980858486 0.0187 215 -25.2825 -24.7483 27.0600 4.0682 0.0719 2.1376 0.0170 0.3042 3.6465 6.7130 0.0085 0.0074 0.0071 0.0071 0.0790 0.0047 0.0380
1932 1.128679304 1.122260931 0.0302 318 -26.5160 -24.7148 29.8100 3.4429 0.0475 2.1194 0.0111 0.2919 2.6147 7.5700 0.0093 0.0039 0.0069 0.0071 0.0472 0.0047 0.0336
1931 1.013960586 0.485124456 0.0189 13 -24.7074 -24.9517 30.7100 3.9828 0.0677 2.2806 0.0183 0.2268 3.7269 9.1548 0.0074 0.0089 0.0073 0.0073 0.0687 0.0057 0.0383
1930 1.148649752 1.029163891 0.0203 175 -26.8323 -26.0809 29.1800 3.0899 0.0697 3.5321 0.0158 0.3735 1.8765 13.0435 0.0121 0.0145 0.0103 0.0104 0.0397 0.0086 0.0506
1929 0.99387758 1.204846613 0.0376 104 -26.6411 -26.0890 28.1500 4.2733 0.0412 2.6675 0.0078 0.2893 0.1528 9.4824 0.0094 0.0112 0.0075 0.0075 0.0083 0.0064 0.0354
1928 0.905609551 0.772378969 0.0331 233 -25.8461 -26.2246 32.3600 5.8361 0.0440 2.8293 0.0095 0.2231 0.1736 8.7255 0.0186 0.0087 0.0074 0.0075 0.0091 0.0063 0.0476
1927 0.85672722 0.215215241 0.0171 152 -25.9555 -25.9299 28.1500 8.1915 0.1054 2.9585 0.0298 0.2692 0.3361 7.8459 0.0158 0.0135 0.0113 0.0112 0.2221 0.0057 0.0717
1926 0.932350398 0.425876672 0.0165 132 -27.7161 -26.9161 22.1900 7.5864 0.0875 3.2115 0.0256 0.2381 0.3483 8.4859 0.0152 0.0123 0.0127 0.0127 0.1256 0.0061 0.0618
1925 0.809324244 0.603492919 0.0174 48 -24.5765 -24.8562 28.9600 6.3520 0.0226 2.7524 0.0175 0.2355 0.3303 7.8838 0.0120 0.0130 0.0096 0.0096 0.0174 0.0058 0.0534
1924 1.735408827 1.991986688 0.027 253 -25.9985 -24.8571 31.4900 6.1000 0.1097 2.6762 0.0284 0.2676 2.2755 7.9132 0.0158 0.0089 0.0107 0.0106 0.2161 0.0054 0.0668
1923 0.787925712 1.573404755 0.0203 150 -24.6288 -25.1568 29.9300 5.6860 0.0967 2.5993 0.0231 0.2137 3.8395 9.0800 0.0128 0.0101 0.0098 0.0098 0.1010 0.0060 0.0536
1922 0.799163043 0.0208 334 -24.4215 -24.3729 28.8900 5.3341 0.0924 2.6394 0.0133 0.2462 3.8226 7.8138 0.0114 0.0069 0.0149 0.0150 0.0729 0.0054 0.0497
1921 0.77243578 0.0226 443 -23.4421 -23.8877 29.4300 6.1139 0.0805 3.2761 0.0156 0.2522 4.2754 10.1551 0.0128 0.0040 0.0195 0.0197 0.1065 0.0067 0.0623
1920 0.787155209 0.0385 278 -24.2587 -23.9798 29.2400 5.9896 0.0727 3.0804 0.0110 0.2266 3.7709 9.9680 0.0133 0.0038 0.0268 0.0269 0.0544 0.0067 0.0567
1919 0.836725864 0.0276 341 -24.7950 -24.8537 27.3900 6.5779 0.0798 3.1646 0.0126 0.2276 4.7733 10.8125 0.0149 0.0052 0.0154 0.0154 0.0604 0.0073 0.0629
1918 0.838156697 0.0058 392 -25.9260 -24.5236 30.6200 6.0259 0.0939 3.5283 0.0448 0.4603 6.5956 12.5834 0.0114 0.0238 0.0598 0.0605 0.2763 0.0095 0.0823
1917 0.966249549 0.0208 58 -25.5352 -24.7604 28.3400 5.8498 0.0925 2.8573 0.0143 0.2275 3.3143 9.2387 0.0118 0.0090 0.0238 0.0239 0.0445 0.0065 0.0535
1916 1.352618036 0.0152 567 -24.0530 -23.6626 27.6400 6.3964 0.0549 3.1876 0.0166 0.2559 6.1909 11.3232 0.0119 0.0088 0.0303 0.0302 0.0696 0.0078 0.0620
1915 0.56838431 0.0354 153 -23.6817 -23.9420 29.7600 5.9449 0.0494 3.1254 0.0118 0.2632 3.6600 10.8684 0.0125 0.0096 0.0234 0.0234 0.0455 0.0075 0.0580
1914 1.653698335 0.0096 355 -25.3230 -25.5543 30.4100 6.1042 0.0305 3.3067 0.0310 0.3592 11.7772 11.9468 0.0103 0.0189 0.0230 0.0230 0.0825 0.0083 0.0603
1913 0.673176646 0.018 479 -25.2734 -25.9128 31.0800 6.1167 0.1001 3.5575 0.0227 0.3392 8.3156 12.0722 0.0131 0.0069 0.0294 0.0291 0.0844 0.0083 0.0681
1912 1.168563731 0.0026 57 -25.4911 -25.0984 30.9900 8.2413 0.1793 5.4744 0.1320 0.7542 53.7132 17.0050 0.0120 0.1196 0.0562 0.0570 0.3436 0.0120 0.1118
1911 1.458277945 0.0119 43 -25.0742 -25.1744 29.2000 8.5525 0.0326 4.2884 0.0276 0.4920 13.5179 14.3376 0.0117 0.0126 0.0152 0.0153 0.0453 0.0096 0.0817
1910 1.653698335 0.0096 355 -25.3230 -25.5543 30.4100 6.1042 0.0305 3.3067 0.0310 0.3592 11.7772 11.9468 0.0103 0.0189 0.0230 0.0230 0.0825 0.0083 0.0603
Code
# reading data
test <- read.delim("clipboard", sep="")
rnames <- test[,1]
test <- data.matrix(test[,2:ncol(test)]) # to matrix
rownames(test) <- rnames
test <- scale(test, center=TRUE, scale=TRUE) # data standardization
test <- t(test) # transpose
## Creating a color palette & color breaks
my_palette <- colorRampPalette(c("forestgreen", "yellow", "red"))(n = 299)
col_breaks = c(seq(-1,-0.5,length=100), # forestgreen
seq(-0.5,0.5,length=100), # yellow
seq(0.5,1,length=100)) # red
# distance & hierarchical clustering
distance= dist(test, method ="euclidean")
hcluster = hclust(distance, method ="ward.D")
# Creating Heat Map
heatmap.2(test,
main = paste( "test"),
trace="none",
margins =c(5,7),
col=my_palette,
breaks=col_breaks,
dendrogram="row",
Rowv = as.dendrogram(hcluster),
Colv = "NA",
key.xlab = "Concentration (index)",
cexRow =0.6,
cexCol = 0.8,
na.rm = TRUE )
Solution: use the color_branches function from the dendextend package (or the set function, with the "branches_k_color", "k", and "value" parameters ).
First we need to get the data into R and create the relevant objects ready (this part is the same as the code in the question):
test <- read.delim("clipboard", sep="")
rnames <- test[,1]
test <- data.matrix(test[,2:ncol(test)]) # to matrix
rownames(test) <- rnames
test <- scale(test, center=TRUE, scale=TRUE) # data standardization
test <- t(test) # transpose
## Creating a color palette & color breaks
my_palette <- colorRampPalette(c("forestgreen", "yellow", "red"))(n = 299)
col_breaks = c(seq(-1,-0.5,length=100), # forestgreen
seq(-0.5,0.5,length=100), # yellow
seq(0.5,1,length=100)) # red
# distance & hierarchical clustering
distance= dist(test, method ="euclidean")
hcluster = hclust(distance, method ="ward.D")
Next, we get the dendrogram and the heatmap ready:
dend1 <- as.dendrogram(hcluster)
# Get the dendextend package
if(!require(dendextend)) install.packages("dendextend")
library(dendextend)
# get some colors
cols_branches <- c("darkred", "forestgreen", "orange", "blue")
# Set the colors of 4 branches
dend1 <- color_branches(dend1, k = 4, col = cols_branches)
# or with:
# dend1 <- set(dend1, "branches_k_color", k = 4, value = cols_branches)
# get the colors of the tips of the dendrogram:
# col_labels <- cols_branches[cutree(dend1, k = 4)] # this may need tweaking in various cases - the following is a more general solution.
# The following code will work on its own once I upload dendextend 0.18.6 to CRAN - but it can
# take several weeks until that happens. In the meantime
# Either use devtools::install_github('talgalili/dendextend')
# Or just the following:
source("https://raw.githubusercontent.com/talgalili/dendextend/master/R/attr_access.R")
col_labels <- get_leaves_branches_col(dend1)
# But due to the way heatmap.2 works - we need to fix it to be in the
# order of the data!
col_labels <- col_labels[order(order.dendrogram(dend1))]
# Creating Heat Map
if(!require(gplots)) install.packages("gplots")
library(gplots)
heatmap.2(test,
main = paste( "test"),
trace="none",
margins =c(5,7),
col=my_palette,
breaks=col_breaks,
dendrogram="row",
Rowv = dend1,
Colv = "NA",
key.xlab = "Concentration (index)",
cexRow =0.6,
cexCol = 0.8,
na.rm = TRUE,
RowSideColors = col_labels, # to add nice colored strips
colRow = col_labels # to add nice colored labels - only for gplots 2.17.0 and higher
)
Which produces this plot:
For more details on the package, you can have a look at its vignette.
P.S.: coloring the labels depends on the parameters of heatmap.2, so it should be requested from the maintainer of gplots (i.e. from greg at warnes.net).
Update: this answer now includes the new "colRow" parameter in gplots 2.17.0.
This is the maintainer of the gplots package.
I've added two new arguments to the gplots::heatmap.2 function, 'colRow' and 'colCol' to control the colors of the row and column labels. This will be part of gplots 2.17.0 which should be submitted to CRAN in the next day or so.
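The reordering step in the answer above, col_labels[order(order.dendrogram(dend1))], can be checked on a small example in base R. This is only a sketch: it uses the built-in USArrests data instead of the data in the question and cutree() instead of dendextend, so the particular colour given to each group may differ from what color_branches assigns. The object names are made up for illustration.

```r
# Small example: cluster 10 rows of a built-in dataset
m <- scale(as.matrix(USArrests[1:10, ]))
hc <- hclust(dist(m, method = "euclidean"), method = "ward.D")
dend <- as.dendrogram(hc)

palette4 <- c("darkred", "forestgreen", "orange", "blue")

# cutree() returns group membership in the order of the data rows
groups <- cutree(hc, k = 4)
cols_data_order <- palette4[groups]

# order.dendrogram() gives the row index of each leaf, left to right,
# so this is the same set of colours listed in leaf order
leaf_order <- order.dendrogram(dend)
cols_leaf_order <- cols_data_order[leaf_order]

# order(leaf_order) is the inverse permutation: it maps colours that are
# in leaf order back to the order of the data rows
restored <- cols_leaf_order[order(leaf_order)]
identical(restored, cols_data_order)
```

Colours read off the dendrogram come in leaf order, while RowSideColors and colRow are matched to the rows of the input matrix, which is why the inverse permutation is applied before plotting.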

Mapping spatial Distributions in R

My data set includes 17 stations and for each station there are 24 hourly temperature values.
I would like to map each station's value for a given hour, and do so for all the hours.
What I want to produce is something like the image.
The data is in the following format:
N2 N3 N4 N5 N7 N8 N10 N12 N13 N14 N17 N19 N25 N28 N29 N31 N32
1 1.300 -0.170 -0.344 2.138 0.684 0.656 0.882 0.684 1.822 1.214 2.046 2.432 0.208 0.312 0.530 0.358 0.264
2 0.888 -0.534 -0.684 1.442 -0.178 -0.060 0.430 -0.148 1.420 0.286 1.444 2.138 -0.264 -0.042 0.398 -0.196 -0.148
3 0.792 -0.564 -0.622 0.998 -0.320 1.858 -0.036 -0.118 1.476 0.110 0.964 2.048 -0.480 -0.434 0.040 -0.538 -0.322
4 0.324 -1.022 -1.128 1.380 -0.792 1.042 -0.054 -0.158 1.518 -0.102 1.354 2.386 -0.708 -0.510 0.258 -0.696 -0.566
5 0.650 -0.774 -0.982 1.124 -0.540 3.200 -0.052 -0.258 1.452 0.028 1.022 2.110 -0.714 -0.646 0.266 -0.768 -0.532
6 0.670 -0.660 -0.844 1.248 -0.550 2.868 -0.098 -0.240 1.380 -0.012 1.164 2.324 -0.498 -0.474 0.860 -0.588 -0.324
MeteoSwiss
1 -0.6
2 -1.2
3 -1.0
4 -0.8
5 -0.4
6 -0.2
where N2, N3, ..., MeteoSwiss are the stations and each row holds the stations' temperature values for one hour. The station coordinates are:
id Longitude Latitude
2 7.1735 45.86880001
3 7.17254 45.86887001
4 7.171636 45.86923601
5 7.18018 45.87158001
7 7.177229 45.86923001
8 7.17524 45.86808001
10 7.179299 45.87020001
12 7.175189 45.86974001
13 7.179379 45.87081001
14 7.175509 45.86932001
17 7.18099 45.87262001
19 7.18122 45.87355001
25 7.15497 45.87058001
28 7.153399 45.86954001
29 7.152649 45.86992001
31 7.154419 45.87004001
32 7.156099 45.86983001
MeteoSwiss 7.184 45.896
I define a toy example more or less resembling your data:
vals <- matrix(rnorm(24*17), nrow=24)
cds <- data.frame(id=paste0('N', 1:17),
Longitude=rnorm(n=17, mean=7.1),
Latitude=rnorm(n=17, mean=45.8))
vals <- as.data.frame(t(vals))
names(vals) <- paste0('H', 1:24)
The sp package defines several classes and methods to store and
display spatial data. For your example you should use the
SpatialPointsDataFrame class:
library(sp)
mySP <- SpatialPointsDataFrame(coords=cds[,-1], data=data.frame(vals))
and the spplot method to display the information:
spplot(mySP, as.table=TRUE,
col.regions=bpy.colors(10),
alpha=0.8, edge.col='black')
You may also find the spacetime package useful
(paper at JSS).
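The toy example above already has one row per station. The table in the question is laid out the other way round (one row per hour, one column per station), and the coordinate ids lack the "N" prefix, so some reshaping is probably needed first. A minimal base R sketch, using the first three hours of three stations from the question; the object names temps, coords, by_station and coords_ordered are made up for illustration:

```r
# Miniature of the question's data: rows are hours, columns are stations.
temps <- data.frame(N2 = c(1.300, 0.888, 0.792),
                    N3 = c(-0.170, -0.534, -0.564),
                    MeteoSwiss = c(-0.6, -1.2, -1.0))

# Station coordinates, with ids as they appear in the question (no "N" prefix).
coords <- data.frame(id = c("2", "3", "MeteoSwiss"),
                     Longitude = c(7.1735, 7.17254, 7.184),
                     Latitude = c(45.86880001, 45.86887001, 45.896),
                     stringsAsFactors = FALSE)

# Strip the leading "N" so the column names match the coordinate ids.
station_ids <- sub("^N", "", names(temps))

# Transpose: one row per station, one column per hour.
by_station <- as.data.frame(t(temps))
names(by_station) <- paste0("H", seq_len(nrow(temps)))

# Reorder the coordinates so they follow the station order of the values.
coords_ordered <- coords[match(station_ids, coords$id), ]

# Every station must have found its coordinates.
stopifnot(!anyNA(coords_ordered$id))
```

by_station and coords_ordered[, c("Longitude", "Latitude")] should then be usable in place of vals and cds[,-1] in the SpatialPointsDataFrame call above.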

pca in R with princomp() and using svd() [duplicate]

Possible Duplicate:
Comparing svd and princomp in R
How can I perform PCA using two methods (princomp() and svd of the correlation matrix) in R?
I have a data set like:
438,498,3625,3645,5000,2918,5000,2351,2332,2643,1698,1687,1698,1717,1744,593,502,493,504,445,431,444,440,429,10
438,498,3625,3648,5000,2918,5000,2637,2332,2649,1695,1687,1695,1720,1744,592,502,493,504,449,431,444,443,429,10
438,498,3625,3629,5000,2918,5000,2637,2334,2643,1696,1687,1695,1717,1744,593,502,493,504,449,431,444,446,429,10
437,501,3625,3626,5000,2918,5000,2353,2334,2642,1730,1687,1695,1717,1744,593,502,493,504,449,431,444,444,429,10
438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10
439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10
440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10
444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10
451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20
458,5022,3640,3644,5000,2922,5000,2346,2321,2628,1688,1666,1674,1696,744,590,496,490,498,462,444,458,461,449,20
465,525,3646,3670,5000,2923,5000,2611,2315,2631,1674,1658,1666,1688,735,593,495,488,497,467,449,462,469,457,20
473,533,3652,3676,5000,2925,5000,2607,2310,2623,1669,1651,1659,1684,729,578,496,487,498,469,454,467,476,465,20
481,544,3658,3678,5000,2926,5000,2606,2303,2619,1668,1643,1651,1275,723,581,495,486,497,477,459,472,484,472,20
484,544,3661,3665,5000,2928,5000,2321,2304,5022,1647,1639,1646,1270,757,623,493,484,495,480,461,474,485,476,20
484,532,3669,3662,2945,2926,5000,2326,2306,2620,1648,1639,1646,1270,760,533,493,483,494,507,461,473,486,476,20
482,520,3685,3664,2952,2927,5000,2981,2307,2329,1650,1640,1644,1268,757,533,492,482,492,513,459,474,485,474,20
481,522,3682,3661,2955,2927,2957,2984,1700,2622,1651,1641,1645,1272,761,530,492,482,492,513,462,486,483,473,20
480,525,3694,3664,2948,2926,2950,2995,1697,2619,1651,1642,1646,1269,762,530,493,482,492,516,462,486,483,473,20
481,515,5018,3664,2956,2927,2947,2993,1697,2622,1651,1641,1645,1269,765,592,489,482,495,531,462,499,483,473,20
479,5000,3696,3661,2953,2927,2944,2993,1702,2622,1649,1642,1645,1269,812,588,489,481,491,510,462,481,483,473,20
480,506,5019,3665,2941,2929,2945,2981,1700,2616,1652,1642,1645,1271,814,643,491,480,493,524,461,469,484,473,20
479,5000,5019,3661,2943,2930,2942,2996,1698,2312,1653,1642,1644,1274,811,617,491,479,491,575,461,465,484,473,20
479,5000,5020,3662,2945,2931,2942,2997,1700,2313,1654,1642,1644,1270,908,616,490,478,489,503,460,460,478,473,10
481,508,5021,3660,2954,2936,2946,2966,1705,2313,1654,1643,1643,1270,1689,678,493,477,483,497,467,459,476,473,10
486,510,522,3662,2958,2938,2939,2627,1707,2314,1659,1643,1639,1665,1702,696,516,476,477,547,465,457,470,474,10
479,521,520,3663,2954,2938,2941,2957,1712,2314,1660,1643,1638,1660,1758,688,534,475,475,489,461,456,465,474,10
480,554,521,3664,2954,2938,2941,2632,1715,2313,1660,1643,1637,1656,1761,687,553,475,474,558,462,453,465,476,10
481,511,5023,3665,2954,2937,2941,2627,1707,2312,1660,1641,1636,1655,1756,687,545,475,475,504,463,458,470,477,10
482,528,524,3665,2953,2937,2940,2629,1706,2312,1657,1640,1635,1654,1756,566,549,475,476,505,464,459,468,477,10
So I am doing this:
x <- read.csv("C:\\data_25_1000.txt", header = FALSE, row.names = NULL)
p1 <- princomp(x, cor = TRUE) ## using correlation matrix
p1
Call:
princomp(x = x, cor = TRUE)
Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16
1.9800328 1.8321498 1.4147367 1.3045541 1.2016116 1.1708212 1.1424120 1.0134829 1.0045317 0.9078734 0.8442308 0.8093044 0.7977656 0.7661921 0.7370972 0.7075442
Comp.17 Comp.18 Comp.19 Comp.20 Comp.21 Comp.22 Comp.23 Comp.24 Comp.25
0.7011462 0.6779179 0.6671614 0.6407627 0.6077336 0.5767217 0.5659030 0.5526520 0.5191375
25 variables and 1000 observations.
For the second method suppose I have the correlation matrix of "C:\data_25_1000.txt"
which is:
1.0 0.3045 0.1448 -0.0714 -0.038 -0.0838 -0.1433 -0.1071 -0.1988 -0.1076 -0.0313 -0.157 -0.1032 -0.137 -0.0802 0.1244 0.0701 0.0457 -0.0634 0.0401 0.1643 0.3056 0.3956 0.4533 0.1557
0.3045 0.9999 0.3197 0.1328 0.093 -0.0846 -0.132 0.0046 -0.004 -0.0197 -0.1469 -0.1143 -0.2016 -0.1 -0.0316 0.0044 -0.0589 -0.0589 0.0277 0.0314 0.078 0.0104 0.0692 0.1858 0.0217
0.1448 0.3197 1 0.3487 0.2811 0.0786 -0.1421 -0.1326 -0.2056 -0.1109 0.0385 -0.1993 -0.1975 -0.1858 -0.1546 -0.0297 -0.0629 -0.0997 -0.0624 -0.0583 0.0316 0.0594 0.0941 0.0813 -0.1211
-0.0714 0.1328 0.3487 1 0.6033 0.2866 -0.246 -0.1201 -0.1975 -0.0929 -0.1071 -0.212 -0.3018 -0.3432 -0.2562 0.0277 -0.1363 -0.2218 -0.1443 -0.0322 -0.012 0.1741 -0.0725 -0.0528 -0.0937
-0.038 0.093 0.2811 0.6033 1 0.4613 0.016 0.0655 -0.1094 0.0026 -0.1152 -0.1692 -0.2047 -0.2508 -0.319 -0.0528 -0.1839 -0.2758 -0.2657 -0.1136 -0.0699 0.1433 -0.0136 -0.0409 -0.1538
-0.0838 -0.0846 0.0786 0.2866 0.4613 0.9999 0.2615 0.2449 0.1471 0.0042 -0.1496 -0.2025 -0.1669 -0.142 -0.1746 -0.1984 -0.2197 -0.2631 -0.2675 -0.1999 -0.1315 0.0469 0.0003 -0.1113 -0.1217
-0.1433 -0.132 -0.1421 -0.246 0.016 0.2615 1 0.3979 0.3108 0.1622 -0.0539 0.0231 0.1801 0.2129 0.1331 -0.1325 -0.0669 -0.0922 -0.1236 -0.1463 -0.1452 -0.2422 -0.0768 -0.1457 0.036
-0.1071 0.0046 -0.1326 -0.1201 0.0655 0.2449 0.3979 1 0.4244 0.3821 0.119 -0.0666 0.0163 0.0963 -0.0078 -0.1202 -0.204 -0.2257 -0.2569 -0.2334 -0.234 -0.2004 -0.138 -0.0735 -0.1442
-0.1988 -0.004 -0.2056 -0.1975 -0.1094 0.1471 0.3108 0.4244 0.9999 0.5459 0.0498 -0.052 0.0987 0.186 0.2576 -0.052 -0.1921 -0.2222 -0.1792 -0.0154 -0.058 -0.1868 -0.2232 -0.3118 0.0186
-0.1076 -0.0197 -0.1109 -0.0929 0.0026 0.0042 0.1622 0.3821 0.5459 0.9999 0.2416 0.0183 0.063 0.0252 0.186 0.0519 -0.1943 -0.2241 -0.2635 -0.0498 -0.0799 -0.0553 -0.1567 -0.2281 -0.0263
-0.0313 -0.1469 0.0385 -0.1071 -0.1152 -0.1496 -0.0539 0.119 0.0498 0.2416 1 0.2601 0.1625 -0.0091 -0.0633 0.0355 0.0397 -0.0288 -0.0768 -0.2144 -0.2581 0.1062 0.0469 -0.0608 -0.0578
-0.157 -0.1143 -0.1993 -0.212 -0.1692 -0.2025 0.0231 -0.0666 -0.052 0.0183 0.2601 0.9999 0.3685 0.3059 0.1269 -0.0302 0.1417 0.1678 0.2219 -0.0392 -0.2391 -0.2504 -0.2743 -0.1827 -0.0496
-0.1032 -0.2016 -0.1975 -0.3018 -0.2047 -0.1669 0.1801 0.0163 0.0987 0.063 0.1625 0.3685 1 0.6136 0.2301 -0.1158 0.0366 0.0965 0.1334 -0.0449 -0.1923 -0.2321 -0.1848 -0.1109 0.1007
-0.137 -0.1 -0.1858 -0.3432 -0.2508 -0.142 0.2129 0.0963 0.186 0.0252 -0.0091 0.3059 0.6136 1 0.4078 -0.0615 0.0607 0.1223 0.1379 0.0072 -0.1377 -0.3633 -0.2905 -0.1867 0.0277
-0.0802 -0.0316 -0.1546 -0.2562 -0.319 -0.1746 0.1331 -0.0078 0.2576 0.186 -0.0633 0.1269 0.2301 0.4078 1 0.0521 -0.0345 0.0444 0.0778 0.0925 0.0596 -0.2551 -0.1499 -0.2211 0.244
0.1244 0.0044 -0.0297 0.0277 -0.0528 -0.1984 -0.1325 -0.1202 -0.052 0.0519 0.0355 -0.0302 -0.1158 -0.0615 0.0521 1 0.295 0.2421 -0.06 0.0921 0.243 0.0953 0.0886 0.0518 -0.0032
0.0701 -0.0589 -0.0629 -0.1363 -0.1839 -0.2197 -0.0669 -0.204 -0.1921 -0.1943 0.0397 0.1417 0.0366 0.0607 -0.0345 0.295 0.9999 0.4832 0.2772 0.0012 0.1198 0.0411 0.1213 0.1409 0.0368
0.0457 -0.0589 -0.0997 -0.2218 -0.2758 -0.2631 -0.0922 -0.2257 -0.2222 -0.2241 -0.0288 0.1678 0.0965 0.1223 0.0444 0.2421 0.4832 1 0.2632 0.0576 0.0965 -0.0043 0.0818 0.102 0.0915
-0.0634 0.0277 -0.0624 -0.1443 -0.2657 -0.2675 -0.1236 -0.2569 -0.1792 -0.2635 -0.0768 0.2219 0.1334 0.1379 0.0778 -0.06 0.2772 0.2632 1 0.2036 -0.0452 -0.142 -0.0696 -0.0367 0.3039
0.0401 0.0314 -0.0583 -0.0322 -0.1136 -0.1999 -0.1463 -0.2334 -0.0154 -0.0498 -0.2144 -0.0392 -0.0449 0.0072 0.0925 0.0921 0.0012 0.0576 0.2036 0.9999 0.2198 0.1268 0.0294 0.0261 0.3231
0.1643 0.078 0.0316 -0.012 -0.0699 -0.1315 -0.1452 -0.234 -0.058 -0.0799 -0.2581 -0.2391 -0.1923 -0.1377 0.0596 0.243 0.1198 0.0965 -0.0452 0.2198 1 0.2667 0.2833 0.2467 0.0288
0.3056 0.0104 0.0594 0.1741 0.1433 0.0469 -0.2422 -0.2004 -0.1868 -0.0553 0.1062 -0.2504 -0.2321 -0.3633 -0.2551 0.0953 0.0411 -0.0043 -0.142 0.1268 0.2667 1 0.4872 0.3134 0.1663
0.3956 0.0692 0.0941 -0.0725 -0.0136 0.0003 -0.0768 -0.138 -0.2232 -0.1567 0.0469 -0.2743 -0.1848 -0.2905 -0.1499 0.0886 0.1213 0.0818 -0.0696 0.0294 0.2833 0.4872 0.9999 0.4208 0.1317
0.4533 0.1858 0.0813 -0.0528 -0.0409 -0.1113 -0.1457 -0.0735 -0.3118 -0.2281 -0.0608 -0.1827 -0.1109 -0.1867 -0.2211 0.0518 0.1409 0.102 -0.0367 0.0261 0.2467 0.3134 0.4208 1 0.0592
0.1557 0.0217 -0.1211 -0.0937 -0.1538 -0.1217 0.036 -0.1442 0.0186 -0.0263 -0.0578 -0.0496 0.1007 0.0277 0.244 -0.0032 0.0368 0.0915 0.3039 0.3231 0.0288 0.1663 0.1317 0.0592 0.9999
I have also computed svd of this correlation matrix and got:
> s = svd(Correlation_25_1000)
$d
[1] 3.9205298 3.3567729 2.0014799 1.7018614 1.4438704 1.3708223 1.3051053 1.0271475 1.0090840 0.8242341 0.7127256 0.6549736 0.6364299 0.5870503 0.5433123 0.5006188 0.4916060
[18] 0.4595726 0.4451043 0.4105769 0.3693401 0.3326079 0.3202462 0.3054243 0.2695037
$u
matrix
$v
matrix
My question is: how can I use $d, $u and $v to get the principal components?
Could I use prcomp()? If so, how?
Try comparing the three approaches on the built-in USArrests data:
princomp
princomp(USArrests, cor = TRUE)$loadings
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4
Murder -0.536 0.418 -0.341 0.649
Assault -0.583 0.188 -0.268 -0.743
UrbanPop -0.278 -0.873 -0.378 0.134
Rape -0.543 -0.167 0.818
svd
svd(cor(USArrests))$u
[,1] [,2] [,3] [,4]
[1,] -0.5358995 0.4181809 -0.3412327 0.64922780
[2,] -0.5831836 0.1879856 -0.2681484 -0.74340748
[3,] -0.2781909 -0.8728062 -0.3780158 0.13387773
[4,] -0.5434321 -0.1673186 0.8177779 0.08902432
eigen
eigen(cor(USArrests))$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5358995 0.4181809 -0.3412327 0.64922780
[2,] -0.5831836 0.1879856 -0.2681484 -0.74340748
[3,] -0.2781909 -0.8728062 -0.3780158 0.13387773
[4,] -0.5434321 -0.1673186 0.8177779 0.08902432
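For a correlation matrix, princomp, svd, and eigen all produce the same loadings. (The blank entry for Rape in Comp.4 of the princomp output is only a display effect: the print method hides loadings smaller than 0.1 in absolute value.)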
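To connect this back to $d, $u and $v: for a correlation matrix, $d holds the component variances (so its square roots are the standard deviations princomp prints), and the columns of $u are the loadings; $v contains the same vectors because the matrix is symmetric and positive definite. The following is a sketch on USArrests rather than the question's file, and the names sdev_svd, loadings_svd and scores_svd are only illustrative:

```r
# USArrests ships with base R and stands in for the question's data.
X <- USArrests

p <- princomp(X, cor = TRUE)   # PCA on the correlation matrix
s <- svd(cor(X))               # SVD of the same correlation matrix

# s$d holds the eigenvalues (component variances); their square roots
# are the standard deviations reported by princomp.
sdev_svd <- sqrt(s$d)
print(all.equal(sdev_svd, unname(p$sdev)))

# The columns of s$u are the loadings (possibly with flipped signs).
loadings_svd <- s$u
dimnames(loadings_svd) <- list(names(X), paste0("Comp.", seq_along(s$d)))

# Scores: project the standardized data onto the loadings.
# scale() divides by the n - 1 standard deviation, whereas princomp
# uses the divisor n, so these differ from p$scores by a constant factor.
scores_svd <- scale(X) %*% loadings_svd

# prcomp with scale. = TRUE also works on standardized data and
# gives the same standard deviations.
pr <- prcomp(X, scale. = TRUE)
print(all.equal(pr$sdev, sdev_svd))
```

For the question's data this would mean sqrt(s$d) should reproduce the princomp standard deviations; for instance sqrt(3.9205298) is about 1.98, in line with the Comp.1 value shown above.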

Resources