How can I split one column into multiple columns in R, using spaces as separators?
I tried to find an answer for a few hours (even days), but now I'm counting on you guys to help me!
This is what my data set looks like; it's all in one column. I don't really care about the column names, as in the end I will only need a few of them for my analysis:
[1] 1000.0 246
[2] 970.0 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7
[3] 909.0 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3
[4] 900.0 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2
[5] 879.0 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0
[6] 850.0 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6
Also, I tried the separate function, but it gives me an error telling me that this is not possible for an object of class 'function'.
Thanks a lot for your help!
It's always easier to help if there is a minimal reproducible example in the question. The data you show is not easily usable...
MRE:
data_vector <- c("1000.0 246",
"970.0 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7",
"909.0 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3",
"900.0 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2",
"879.0 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0",
"850.0 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6")
And here is a solution using gsub and read.csv:
oo <- read.csv(text=gsub(" +", " ", paste0(data_vector, collapse="\n")), sep=" ", header=FALSE)
Which produces this output:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 1000 246 NA NA NA NA NA NA NA NA NA
2 970 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7
3 909 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3
4 900 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2
5 879 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0
6 850 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6
read.table/read.csv will work if we pass the data as a character vector to the text argument:
read.table(text = data_vector, header = FALSE, fill = TRUE)
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
#1 1000 246 NA NA NA NA NA NA NA NA NA
#2 970 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7
#3 909 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3
#4 900 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2
#5 879 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0
#6 850 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6
data
data_vector <- c("1000.0 246",
"970.0 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7",
"909.0 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3",
"900.0 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2",
"879.0 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0",
"850.0 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6")
Related
I have two vectors here. One holds all the population data for various countries:
## [1] "China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47"
## [2] "India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35 17.70"
## [3] "United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25"
## [4] "Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56 3.51"
## [5] "Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35 2.83"
## [6] "Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88 2.73"
## [7] "Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52 2.64"
## [8] "Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39 2.11"
## [9] "Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74 1.87 "
## [10] "Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00"
## [11] "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00"
The other vector holds all the column names, in the exact order corresponding to the country name and the numbers above:
## [1] "Country(ordependency)" "Population(2020)" "YearlyChange"
## [4] "NetChange" "Density(P/Km²)" "LandArea(Km²)"
## [7] "Migrants(net)" "Fert.Rate" "Med.Age"
## [10] "UrbanPop%" "WorldShare"
How do I make a data frame that matches the column names to their corresponding data, like this:
head(population)
Country (or dependency) Population (2020) Yearly Change Net Change Density (P/Km²) ......
1 China 1439323776 0.39 5540090 ... ....
2 India 1380004385 0.99 13586631 .......
3 United States 331002651 0.59 1937734 .......
4 Indonesia 273523615 1.07 2898047 .......
5 Pakistan 220892340 2.00 4327022 .......
Note: For the last two countries Tokelau and Holy See there are no "Migrants(net)" data.
TIA!
EDIT:
Some more samples are here:
## [53] "Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51 0.34"
## [86] "Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74 0.14"
## [93] "United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86 0.13"
## [98] "Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13 0.11"
## [135] "Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52 0.04"
## [230] "Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
UPDATES:
Here is the problem:
tail(population)
## Country(ordependency) Population(2020) YearlyChange NetChange
## 230 Saint Pierre & Miquelon 5794 -0.48 -28
## 231 Montserrat 4992 0.06 3
## 232 Falkland Islands 3480 3.05 103
## 233 Niue 1626 0.68 11
## 234 Tokelau 1357 1.27 17
## 235 Holy See 801 0.25 2
##     Density(P/Km²) LandArea(Km²) Migrants(net) Fert.Rate Med.Age UrbanPop%
## 230 25 230 N.A. N.A. 100 0.00
## 231 50 100 N.A. N.A. 10 0.00
## 232 0 12170 N.A. N.A. 66 0.00
## 233 6 260 N.A. N.A. 46 0.00
## 234 136 10 N.A. N.A. 0 0.00
## 235 2003 0 N.A. N.A. N.A. 0.00
##     WorldShare
## 230 NA
## 231 NA
## 232 NA
## 233 NA
## 234 NA
## 235 NA
All the rows with 10 variables instead of 11 are here:
## [202] "Isle of Man 85033 0.53 449 149 570 N.A. N.A. 53 0.00"
## [203] "Andorra 77265 0.16 123 164 470 N.A. N.A. 88 0.00"
## [204] "Dominica 71986 0.25 178 96 750 N.A. N.A. 74 0.00"
## [205] "Cayman Islands 65722 1.19 774 274 240 N.A. N.A. 97 0.00"
## [206] "Bermuda 62278 -0.36 -228 1246 50 N.A. N.A. 97 0.00"
## [207] "Marshall Islands 59190 0.68 399 329 180 N.A. N.A. 70 0.00"
## [208] "Northern Mariana Islands 57559 0.60 343 125 460 N.A. N.A. 88 0.00"
## [209] "Greenland 56770 0.17 98 0 410450 N.A. N.A. 87 0.00"
## [210] "American Samoa 55191 -0.22 -121 276 200 N.A. N.A. 88 0.00"
## [211] "Saint Kitts & Nevis 53199 0.71 376 205 260 N.A. N.A. 33 0.00"
## [212] "Faeroe Islands 48863 0.38 185 35 1396 N.A. N.A. 43 0.00"
## [213] "Sint Maarten 42876 1.15 488 1261 34 N.A. N.A. 96 0.00"
## [214] "Monaco 39242 0.71 278 26337 1 N.A. N.A. N.A. 0.00"
## [215] "Turks and Caicos 38717 1.38 526 41 950 N.A. N.A. 89 0.00"
## [216] "Saint Martin 38666 1.75 664 730 53 N.A. N.A. 0 0.00"
## [217] "Liechtenstein 38128 0.29 109 238 160 N.A. N.A. 15 0.00"
## [218] "San Marino 33931 0.21 71 566 60 N.A. N.A. 97 0.00"
## [219] "Gibraltar 33691 -0.03 -10 3369 10 N.A. N.A. N.A. 0.00"
## [220] "British Virgin Islands 30231 0.67 201 202 150 N.A. N.A. 52 0.00"
## [221] "Caribbean Netherlands 26223 0.94 244 80 328 N.A. N.A. 75 0.00"
## [222] "Palau 18094 0.48 86 39 460 N.A. N.A. N.A. 0.00"
## [223] "Cook Islands 17564 0.09 16 73 240 N.A. N.A. 75 0.00"
## [224] "Anguilla 15003 0.90 134 167 90 N.A. N.A. N.A. 0.00"
## [225] "Tuvalu 11792 1.25 146 393 30 N.A. N.A. 62 0.00"
## [226] "Wallis & Futuna 11239 -1.69 -193 80 140 N.A. N.A. 0 0.00"
## [227] "Nauru 10824 0.63 68 541 20 N.A. N.A. N.A. 0.00"
## [228] "Saint Barthelemy 9877 0.30 30 470 21 N.A. N.A. 0 0.00"
## [229] "Saint Helena 6077 0.30 18 16 390 N.A. N.A. 27 0.00"
## [230] "Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
## [231] "Montserrat 4992 0.06 3 50 100 N.A. N.A. 10 0.00"
## [232] "Falkland Islands 3480 3.05 103 0 12170 N.A. N.A. 66 0.00"
## [233] "Niue 1626 0.68 11 6 260 N.A. N.A. 46 0.00"
## [234] "Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00"
## [235] "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00"
It would be easier to read this with read.table using a space delimiter. But there is an issue with spaces: the 'Country' may consist of multiple words, which should be read as a single column. To handle that, we can insert double quotes around the Country using sub, and then read with read.table, specifying the col.names from 'v2':
df1 <- read.table(text = sub("^([^0-9]+)\\s", '"\\1" ', v1),
header = FALSE, col.names = v2, fill = TRUE, check.names = FALSE)
Output:
df1
Country(ordependency) Population(2020) YearlyChange NetChange Density(P/Km²) LandArea(Km²) Migrants(net) Fert.Rate Med.Age UrbanPop%
1 China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61
2 India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35
3 United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83
4 Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56
5 Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35
6 Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88
7 Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52
8 Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39
9 Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74
10 Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0
11 Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0
12 Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51
13 Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74
14 United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86
15 Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13
16 Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52
17 Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0
WorldShare
1 18.47
2 17.70
3 4.25
4 3.51
5 2.83
6 2.73
7 2.64
8 2.11
9 1.87
10 NA
11 NA
12 0.34
13 0.14
14 0.13
15 0.11
16 0.04
17 NA
For the rows where the field count is lower, we can update the column values by shifting them with row/column indexing:
library(stringr)
cnt <- str_count(sub("^([^0-9]+)\\s", '', v1), "\\s+") + 2
i1 <- cnt == 10
df1[i1, 10:11] <- df1[i1, 9:10]
df1[i1, 9] <- NA
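To see what the quoting step does in isolation, here it is applied to a single row (a sketch, not the full pipeline; note the space kept after the closing quote in the replacement, so read.table still sees a field boundary):

```r
row <- "United States 331002651 0.59 1937734"
quoted <- sub("^([^0-9]+)\\s", '"\\1" ', row)  # wrap the non-numeric prefix in quotes
quoted
# "\"United States\" 331002651 0.59 1937734"
read.table(text = quoted, header = FALSE)$V1   # "United States" parsed as one field
```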
data
v1 <- c("China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47",
"India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35 17.70",
"United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25",
"Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56 3.51",
"Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35 2.83",
"Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88 2.73",
"Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52 2.64",
"Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39 2.11",
"Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74 1.87 ",
"Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00", "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00",
"Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51 0.34",
"Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74 0.14",
"United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86 0.13",
"Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13 0.11",
"Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52 0.04",
"Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
)
v2 <- c("Country(ordependency)", "Population(2020)", "YearlyChange",
"NetChange", "Density(P/Km²)", "LandArea(Km²)", "Migrants(net)",
"Fert.Rate", "Med.Age", "UrbanPop%", "WorldShare")
I am not sure what you mean, but you could try:
df <- do.call(rbind.data.frame, vector1)
colnames(df) <- vector2
I tried converting my data frame into a polygon using the code from a previous
post, but I got an error message. Please, I need assistance on how to fix this.
Thanks. Below is my code:
County MEDIAN_V latitude longitude RACE DRAG AGIP AGIP2 AGIP3
Akpa 18.7 13.637 46.048 3521875 140.1290323 55 19 5
Uopa 17.9 12.85 44.869 3980000 86.71929825 278 6 4
Kaop 15.7 14.283 45.41 6623750 167.6746988 231 66 17
Nguru 14.7 13.916 44.764 3642500 152.256705 87 15 11
Nagima 20.2 14.7666636 43.249999 23545500 121.699 271 287 450
Dagoja 17.2 16.7833302 45.5166646 2316000 135.5187713 114 374 194
AlKoma 20.7 16.7999968 51.7333304 767000 83.38818565 NA NA NA
Ikaka 18.1 15.46833146 43.5404978 5687500 99.86455331 18 29 11
Maru 17.4 15.452 44.2173 10845625 90.98423127 679 424 159
Nko 19.4 16.17 43.89 10693000 109.7594937 126 140 60
Dfor 16.8 14.702 44.336 16587000 120.7656012 74 52 30
Hydr 20.7 16.666664 49.499998 5468000 126.388535 2 5 NA
lami 23 16.17 43.156 10432875 141.3487544 359 326 795
Ntoka 16.9 13.9499962 44.1833326 21614750 134.3637902 153 84 2
Lakoje 20.6 13.244 44.606 4050250 100.5965167 168 108 75
Mbiri 14.6 15.4499982 45.333332 2386625 166.9104478 465 452 502
Masi 18.2 14.633 43.6 4265250 117.16839 6 1 NA
Sukara 20.6 16.94021 43.76393 6162750 66.72009029 974 928 1176
Shakara 18.9 15.174 44.213 10721000 151.284264 585 979 574
Bambam 18.8 14.5499978 46.83333 3017625 142.442623 101 84 134
Erika 17.8 13.506 43.759 23565000 93.59459459 697 728 1034
mydata %>%
group_by(County) %>%
summarise(geometry = st_sfc(st_cast(st_multipoint(cbind(longitude,
latitude)), 'POLYGON'))) %>%
st_sf()
After running the above I got an error message:
Error in ClosePol(x) : polygons require at least 4 points
Can someone please help me fix this?
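For what it's worth, the error message matches the structure of the data: st_cast(..., 'POLYGON') needs a closed ring of at least four coordinate pairs per group, but each County above contributes exactly one row. A quick base-R check on a small excerpt of the data (the excerpt itself is just an illustration) makes the point counts visible:

```r
mydata <- data.frame(
  County    = c("Akpa", "Uopa", "Kaop"),
  latitude  = c(13.637, 12.850, 14.283),
  longitude = c(46.048, 44.869, 45.410)
)
# a polygon ring needs >= 4 points; every County here has exactly 1
table(mydata$County)
```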
I am trying to calculate differences between different columns. I did it with a loop, but I know that is not an elegant solution, and not the best in R (not efficient). Also, my results contain duplicates and illogical operations (disp-disp, or both hp_disp and disp_hp).
My real data has NAs, which I tried to simulate. My goal is to improve my code while producing a table like the one below.
Here is an example of my code:
names(mtcars)
mtcars$mpg[mtcars$am==1]=NA
vars1= c("mpg","cyl","disp","hp")
vars2= c("mpg","cyl","disp","hp")
df=data.frame()
df_all=data.frame()
df_all=length(mtcars)
for(i in vars1){
for(k in vars2) {
df= mtcars[[i]]-mtcars[[k]]
df_all=cbind(df_all, df)
length =ncol(df_all)
colnames(df_all)[length]= paste0(i,"_",k)
}
}
head(df_all)
disp_mpg disp_cyl disp_disp disp_hp hp_mpg hp_cyl hp_disp hp_hp
[1,] NA 154 0 50 NA 104 -50 0
[2,] NA 154 0 50 NA 104 -50 0
[3,] NA 104 0 15 NA 89 -15 0
[4,] 236.6 252 0 148 88.6 104 -148 0
[5,] 341.3 352 0 185 156.3 167 -185 0
[6,] 206.9 219 0 120 86.9 99 -120 0
Here's one way to do that, using the data.table library
library(data.table)
vars = c("mpg","cyl","disp","hp")
# create table of pairs to diff
to_diff <- CJ(vars, vars)[V1 < V2]
# calculate diffs
diffs <-
to_diff[, .(diff_val = mtcars[, V1] - mtcars[, V2]),
by = .(cols = paste0(V1, '_minus_', V2))]
# number each row in each "cols" group
diffs[, rid := rowid(cols)]
# transform so that rid determines the row, cols determines the col, and
# the values are the value of diff_val
dcast(diffs, rid ~ cols, value.var = 'diff_val')
Output
#
# rid cyl_minus_disp cyl_minus_hp cyl_minus_mpg disp_minus_hp disp_minus_mpg hp_minus_mpg
# 1: 1 -154.0 -104 -15.0 50.0 139.0 89.0
# 2: 2 -154.0 -104 -15.0 50.0 139.0 89.0
# 3: 3 -104.0 -89 -18.8 15.0 85.2 70.2
# 4: 4 -252.0 -104 -15.4 148.0 236.6 88.6
# 5: 5 -352.0 -167 -10.7 185.0 341.3 156.3
# 6: 6 -219.0 -99 -12.1 120.0 206.9 86.9
# 7: 7 -352.0 -237 -6.3 115.0 345.7 230.7
# 8: 8 -142.7 -58 -20.4 84.7 122.3 37.6
# 9: 9 -136.8 -91 -18.8 45.8 118.0 72.2
# 10: 10 -161.6 -117 -13.2 44.6 148.4 103.8
# 11: 11 -161.6 -117 -11.8 44.6 149.8 105.2
# 12: 12 -267.8 -172 -8.4 95.8 259.4 163.6
# 13: 13 -267.8 -172 -9.3 95.8 258.5 162.7
# 14: 14 -267.8 -172 -7.2 95.8 260.6 164.8
# 15: 15 -464.0 -197 -2.4 267.0 461.6 194.6
# 16: 16 -452.0 -207 -2.4 245.0 449.6 204.6
# 17: 17 -432.0 -222 -6.7 210.0 425.3 215.3
# 18: 18 -74.7 -62 -28.4 12.7 46.3 33.6
# 19: 19 -71.7 -48 -26.4 23.7 45.3 21.6
# 20: 20 -67.1 -61 -29.9 6.1 37.2 31.1
# 21: 21 -116.1 -93 -17.5 23.1 98.6 75.5
# 22: 22 -310.0 -142 -7.5 168.0 302.5 134.5
# 23: 23 -296.0 -142 -7.2 154.0 288.8 134.8
# 24: 24 -342.0 -237 -5.3 105.0 336.7 231.7
# 25: 25 -392.0 -167 -11.2 225.0 380.8 155.8
# 26: 26 -75.0 -62 -23.3 13.0 51.7 38.7
# 27: 27 -116.3 -87 -22.0 29.3 94.3 65.0
# 28: 28 -91.1 -109 -26.4 -17.9 64.7 82.6
# 29: 29 -343.0 -256 -7.8 87.0 335.2 248.2
# 30: 30 -139.0 -169 -13.7 -30.0 125.3 155.3
# 31: 31 -293.0 -327 -7.0 -34.0 286.0 320.0
# 32: 32 -117.0 -105 -17.4 12.0 99.6 87.6
# rid cyl_minus_disp cyl_minus_hp cyl_minus_mpg disp_minus_hp disp_minus_mpg hp_minus_mpg
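For comparison, the unique pairwise differences can also be sketched in base R with combn, which mirrors the V1 < V2 filter above by generating each unordered pair exactly once (this uses the unmodified mtcars, so no NAs appear):

```r
vars  <- c("mpg", "cyl", "disp", "hp")
pairs <- combn(vars, 2)                  # each unordered pair once: no x_x, no both x_y and y_x
diffs <- apply(pairs, 2, function(p) mtcars[[p[1]]] - mtcars[[p[2]]])
diffs <- as.data.frame(diffs)
names(diffs) <- apply(pairs, 2, paste, collapse = "_minus_")
head(diffs)
```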
I've got a problem: I want to delete several lines from txt files and then convert them to csv with R, because I just want to extract the data from the txt.
My code can't delete properly, because it also deletes lines which contain the date of the data.
Here is the code I used:
setwd("D:/tugasmaritim/")
FILES <- list.files( pattern = ".txt")
for (i in 1:length(FILES)) {
l <- readLines(FILES[i])[-(1:4)]  # readLines() has no 'skip' argument; drop the first 4 lines instead
l2 <- l[-sapply(grep("</PRE><H3>", l), function(x) seq(x, x + 30))]
l3 <- l2[-sapply(grep("<P>Description", l2), function(x) seq(x, x + 29))]
l4 <- l3[-sapply(grep("<HTML>", l3), function(x) seq(x, x + 3))]
write.csv(l4,row.names=FALSE,file=paste0("D:/tugasmaritim/",sub(".txt","",FILES[i]),".csv"))
}
My data looks like this:
<HTML>
<TITLE>University of Wyoming - Radiosonde Data</TITLE>
<LINK REL="StyleSheet" HREF="/resources/select.css" TYPE="text/css">
<BODY BGCOLOR="white">
<H2>96749 WIII Jakarta Observations at 00Z 02 Oct 1995</H2>
<PRE>
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1011.0 8 23.2 22.5 96 17.30 0 0 295.4 345.3 298.5
1000.0 98 23.6 22.4 93 17.39 105 8 296.8 347.1 299.8
977.3 300 24.6 22.1 86 17.49 105 8 299.7 351.0 302.8
976.0 311 24.6 22.1 86 17.50 104 8 299.8 351.2 303.0
950.0 548 23.0 22.0 94 17.87 88 12 300.5 353.2 303.7
944.4 600 22.6 21.8 95 17.73 85 13 300.6 352.9 303.8
925.0 781 21.2 21.0 99 17.25 90 20 301.0 351.9 304.1
918.0 847 20.6 20.6 100 16.95 90 23 301.0 351.0 304.1
912.4 900 20.4 18.6 89 15.00 90 26 301.4 345.7 304.1
897.0 1047 20.0 13.0 64 10.60 90 26 302.4 334.1 304.3
881.2 1200 19.4 11.4 60 9.70 90 26 303.3 332.5 305.1
850.0 1510 18.2 8.2 52 8.09 95 18 305.2 329.9 306.7
845.0 1560 18.0 7.0 49 7.49 91 17 305.5 328.4 306.9
810.0 1920 15.0 9.0 67 8.97 60 11 306.0 333.4 307.7
792.9 2100 14.3 3.1 47 6.06 45 8 307.1 325.9 308.2
765.1 2400 13.1 -6.8 24 3.01 40 8 309.0 318.7 309.5
746.0 2612 12.2 -13.8 15 1.77 38 10 310.3 316.2 310.6
712.0 3000 10.3 -15.0 15 1.69 35 13 312.3 318.1 312.6
700.0 3141 9.6 -15.4 16 1.66 35 13 313.1 318.7 313.4
653.0 3714 6.6 -16.4 18 1.63 32 12 316.0 321.6 316.3
631.0 3995 4.8 -2.2 60 5.19 31 11 317.0 333.9 318.0
615.3 4200 3.1 -3.9 60 4.70 30 11 317.4 332.8 318.3
601.0 4391 1.6 -5.4 60 4.28 20 8 317.8 331.9 318.6
592.9 4500 0.6 -12.0 38 2.59 15 6 317.9 326.6 318.4
588.0 4567 0.0 -16.0 29 1.88 11 6 317.9 324.4 318.3
571.0 4800 -1.2 -18.9 25 1.51 355 5 319.1 324.4 319.4
549.8 5100 -2.8 -22.8 20 1.12 45 6 320.7 324.8 321.0
513.0 5649 -5.7 -29.7 13 0.64 125 10 323.6 326.0 323.8
500.0 5850 -5.1 -30.1 12 0.63 155 11 326.8 329.1 326.9
494.0 5945 -4.9 -29.9 12 0.65 146 11 328.1 330.6 328.3
471.7 6300 -7.4 -32.0 12 0.56 110 13 329.3 331.5 329.4
453.7 6600 -9.6 -33.8 12 0.49 100 14 330.3 332.2 330.4
400.0 7570 -16.5 -39.5 12 0.31 105 14 333.5 334.7 333.5
398.0 7607 -16.9 -39.9 12 0.30 104 14 333.4 334.6 333.5
371.9 8100 -20.4 -42.6 12 0.24 95 16 335.4 336.3 335.4
300.0 9660 -31.3 -51.3 12 0.11 115 18 341.1 341.6 341.2
269.0 10420 -36.3 -55.3 12 0.08 79 20 344.7 345.0 344.7
265.9 10500 -36.9 75 20 344.9 344.9
250.0 10920 -40.3 80 28 346.0 346.0
243.4 11100 -41.8 85 37 346.4 346.4
222.5 11700 -46.9 75 14 347.6 347.6
214.0 11960 -49.1 68 16 348.1 348.1
200.0 12400 -52.7 55 20 349.1 349.1
156.0 13953 -66.1 55 25 352.1 352.1
152.3 14100 -67.2 55 26 352.6 352.6
150.0 14190 -67.9 55 26 352.9 352.9
144.7 14400 -69.6 60 26 353.6 353.6
137.5 14700 -72.0 60 39 354.6 354.6
130.7 15000 -74.3 50 28 355.6 355.6
124.2 15300 -76.7 40 36 356.5 356.5
118.0 15600 -79.1 50 48 357.4 357.4
116.0 15698 -79.9 45 44 357.6 357.6
112.0 15900 -79.1 45 26 362.6 362.6
106.3 16200 -78.0 35 24 370.2 370.2
100.0 16550 -76.7 35 24 379.3 379.3
</PRE><H3>Station information and sounding indices</H3><PRE>
Station identifier: WIII
Station number: 96749
Observation time: 951002/0000
Station latitude: -6.11
Station longitude: 106.65
Station elevation: 8.0
Showalter index: 6.30
Lifted index: -1.91
LIFT computed using virtual temperature: -2.80
SWEAT index: 145.41
K index: 6.50
Cross totals index: 13.30
Vertical totals index: 23.30
Totals totals index: 36.60
Convective Available Potential Energy: 799.02
CAPE using virtual temperature: 1070.13
Convective Inhibition: -26.70
CINS using virtual temperature: -12.88
Equilibrum Level: 202.64
Equilibrum Level using virtual temperature: 202.60
Level of Free Convection: 828.70
LFCT using virtual temperature: 909.19
Bulk Richardson Number: 210.78
Bulk Richardson Number using CAPV: 282.30
Temp [K] of the Lifted Condensation Level: 294.96
Pres [hPa] of the Lifted Condensation Level: 958.67
Mean mixed layer potential temperature: 298.56
Mean mixed layer mixing ratio: 17.50
1000 hPa to 500 hPa thickness: 5752.00
Precipitable water [mm] for entire sounding: 36.31
</PRE>
<H2>96749 WIII Jakarta Observations at 00Z 03 Oct 1995</H2>
<PRE>
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1012.0 8 23.6 22.9 96 17.72 140 2 295.7 346.9 298.9
1000.0 107 24.0 21.6 86 16.54 135 3 297.1 345.2 300.1
990.0 195 24.4 20.3 78 15.39 128 4 298.4 343.4 301.2
945.4 600 22.9 20.2 85 16.00 95 7 300.9 348.0 303.7
925.0 791 22.2 20.1 88 16.29 100 6 302.0 350.3 304.9
913.5 900 21.9 18.2 80 14.63 105 6 302.8 346.3 305.4
911.0 924 21.8 17.8 78 14.28 108 6 302.9 345.4 305.5
850.0 1522 17.4 16.7 96 14.28 175 6 304.4 347.1 307.0
836.0 1665 16.4 16.4 100 14.24 157 7 304.8 347.5 307.4
811.0 1925 15.0 14.7 98 13.14 123 8 305.9 345.6 308.3
795.0 2095 14.2 7.2 63 8.08 101 9 306.8 331.6 308.3
794.5 2100 14.2 7.2 63 8.05 100 9 306.8 331.5 308.3
745.0 2642 10.4 2.4 58 6.14 64 11 308.4 327.6 309.6
736.0 2744 11.0 0.0 47 5.23 57 11 310.2 326.7 311.1
713.8 3000 9.2 5.0 75 7.70 40 12 310.9 335.0 312.4
711.0 3033 9.0 5.6 79 8.08 40 12 311.0 336.2 312.6
700.0 3163 8.6 1.6 61 6.18 40 12 312.0 331.5 313.1
688.5 3300 8.3 -6.0 36 3.57 60 12 313.1 324.8 313.8
678.0 3427 8.0 -13.0 21 2.08 70 12 314.2 321.2 314.6
642.0 3874 5.0 -2.0 61 5.17 108 11 315.7 332.4 316.7
633.0 3989 4.4 -11.6 30 2.50 117 10 316.3 324.7 316.8
616.6 4200 3.1 -14.1 27 2.09 135 10 317.1 324.3 317.6
580.0 4694 0.0 -20.0 21 1.36 164 13 319.1 323.9 319.4
572.3 4800 -0.4 -20.7 20 1.29 170 14 319.9 324.5 320.1
510.8 5700 -4.0 -26.6 15 0.86 80 10 326.1 329.2 326.2
500.0 5870 -4.7 -27.7 15 0.79 80 10 327.2 330.2 327.4
497.0 5917 -4.9 -27.9 15 0.78 71 13 327.6 330.5 327.7
491.7 6000 -5.5 -28.3 15 0.76 55 19 327.9 330.7 328.0
473.0 6300 -7.6 -29.9 15 0.68 55 16 328.9 331.4 329.0
436.0 6930 -12.1 -33.1 16 0.54 77 17 330.9 333.0 331.0
400.0 7580 -17.9 -37.9 16 0.37 100 19 331.6 333.1 331.7
388.3 7800 -19.9 -39.9 15 0.31 105 20 331.8 333.1 331.9
386.0 7844 -20.3 -40.3 15 0.30 103 20 331.9 333.1 331.9
372.0 8117 -18.3 -38.3 16 0.38 91 23 338.1 339.6 338.1
343.6 8700 -22.1 -41.4 16 0.30 65 29 340.7 342.0 340.8
329.0 9018 -24.1 -43.1 16 0.26 73 27 342.2 343.2 342.2
300.0 9680 -29.9 -44.9 22 0.23 90 22 343.1 344.1 343.2
278.6 10200 -34.3 85 37 344.1 344.1
266.9 10500 -36.8 60 32 344.7 344.7
255.8 10800 -39.4 65 27 345.2 345.2
250.0 10960 -40.7 65 27 345.4 345.4
204.0 12300 -51.8 55 23 348.6 348.6
200.0 12430 -52.9 55 23 348.8 348.8
194.6 12600 -55.0 60 23 348.1 348.1
160.7 13800 -70.1 35 39 342.4 342.4
153.2 14100 -73.9 35 41 340.6 340.6
150.0 14230 -75.5 35 41 339.9 339.9
131.5 15000 -76.3 50 53 351.6 351.6
124.9 15300 -76.6 50 57 356.2 356.2
122.0 15436 -76.7 57 45 358.3 358.3
118.6 15600 -77.3 65 31 360.2 360.2
115.0 15779 -77.9 65 31 362.2 362.2
112.6 15900 -77.7 85 17 364.8 364.8
107.0 16200 -77.2 130 10 371.2 371.2
100.0 16590 -76.5 120 18 379.7 379.7
</PRE><H3>Station information and sounding indices</H3><PRE>
Station identifier: WIII
Station number: 96749
Observation time: 951003/0000
Station latitude: -6.11
Station longitude: 106.65
Station elevation: 8.0
Showalter index: -0.58
Lifted index: 0.17
LIFT computed using virtual temperature: -0.57
SWEAT index: 222.41
K index: 31.80
Cross totals index: 21.40
Vertical totals index: 22.10
Totals totals index: 43.50
Convective Available Potential Energy: 268.43
CAPE using virtual temperature: 431.38
Convective Inhibition: -84.04
CINS using virtual temperature: -81.56
Equilibrum Level: 141.42
Equilibrum Level using virtual temperature: 141.35
Level of Free Convection: 784.91
LFCT using virtual temperature: 804.89
Bulk Richardson Number: 221.19
Bulk Richardson Number using CAPV: 355.46
Temp [K] of the Lifted Condensation Level: 293.21
Pres [hPa] of the Lifted Condensation Level: 940.03
Mean mixed layer potential temperature: 298.46
Mean mixed layer mixing ratio: 16.01
1000 hPa to 500 hPa thickness: 5763.00
Precipitable water [mm] for entire sounding: 44.54
And here is my data:
data
And this is what I want to get:
contoh
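Rather than deleting the HTML blocks by fixed line offsets (which breaks when block lengths vary between soundings), one pattern worth trying is to keep only the lines that look like numeric sounding records. A self-contained sketch on a few sample lines (the regex treats a record as two or more purely numeric fields):

```r
lines <- c("<PRE>",
           "   PRES   HGHT   TEMP",
           " 1011.0      8   23.2   22.5",
           "  850.0   1510   18.2    8.2",
           "</PRE><H3>Station information</H3>")
# a record line consists solely of two or more numeric fields
is_record <- grepl("^\\s*-?[0-9]+\\.?[0-9]*(\\s+-?[0-9]+\\.?[0-9]*)+\\s*$", lines)
lines[is_record]
```

Header rows, the dashed rules, and the HTML tags all fail the pattern, while data rows with missing columns still pass.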
I'm trying to use cor() to return the most correlated elements in order of their correlation. I wrote this function adapting cor() to do it, and it works perfectly, but only when I run it on a big input. When I run it on a small input, I get a `missing value where TRUE/FALSE needed` error, and I don't understand why.
Here is an example of my input data:
This can be directly copied into R (printed via write.table):
"Col2" "Col3" "Col4" "Col5" "Col6"
"Market Capitalization" NA NA 17082.69 17879.8 16266.11
"Cash & Equivalents" NA NA 747 132 394
"Preferred & Other" NA NA 0 0 0
"Total Debt" NA NA 12379 11982 11309
"Enterprise Value" NA NA 28714.69 29729.8 27181.11
"Total Revenue" 2896.75 3461.25 2818 3184 2901
"Growth % YoY" -0.15 0.68 1.7 3.44 -0.48
"Gross Profit" NA NA 1874 2080 1981
"Margin %" NA NA 66.5 65.33 68.29
"EBITDA" 758 1074 641 777 699
"Margin %1" 26.17 31.03 22.75 24.4 24.1
"Net Income Before XO" 214.5 410 172 192 207
"Margin %2" 7.4 11.85 6.1 6.03 7.14
"Adjusted EPS" 0.7 1.42 0.59 1.07 0.69
"Growth % YoY1" 0.72 -1.67 -3.28 5.94 -6.76
"Cash from Operations" 375.79 812.21 991 -84 961
"Capital Expenditures" NA NA -660 -676 -608
"Free Cash Flow" NA NA 331 -760 353
"Adjusted Price" 2094.66 3689.2 3805.62 3588.42 3582.4
This is the mycor() function I wrote:
mycor<-function(dataset, relative.to=19, neg.cor=0){
#This takes the dataset (as a matrix) and computes the best correlated value
#and returns the row (variable ID) that is the most strongly correlated
#to the variable row referenced by relative.to. Use neg.cor = 1 for neg correlation
if(neg.cor == 0){
best.cor <- -1.0 #Have to get a better correlation than this
best.cor.row <- integer() #The row with the best correlation
all.cor <- numeric() #The correlation for everything else
index <- 1 #The index for the all.cor array
for(i in 1:nrow(dataset)){
if(i != relative.to){ #No self correlation
temp.cor <- cor(dataset[i,], dataset[relative.to,], use = "na.or.complete")
all.cor[index] <- temp.cor
index <- index+1 #I wish the ++ operator worked in R...
cat(best.cor)
pause()
if(temp.cor > best.cor){ #This remembers the best seen cor value
best.cor <- temp.cor
best.cor.row <- i
} #End inner if
} #End outer if
} #End for loop
}else{
best.cor <- 1.0 #Have to get a lower correlation than this
best.cor.row <- integer() #The row with the best correlation
all.cor <- numeric() #The correlation for everything else
index <- 1 #The index for the all.cor array
for(i in 1:nrow(dataset)){
if(i != relative.to){ #No self correlation
temp.cor <- cor(dataset[i,], dataset[relative.to,], use = "na.or.complete")
all.cor[index] <- temp.cor
index <- index+1 #I wish the ++ operator worked in R...
if(temp.cor < best.cor){ #This remembers the worst seen cor value
best.cor <- temp.cor
best.cor.row <- i
} #End inner if
} #End outer if
} #End for loop
} #End else
return(list(all.cor = all.cor, best.cor.row = best.cor.row))
}
When I try to run this I get: Error in if (temp.cor > best.cor) { : missing value where TRUE/FALSE needed. The strange part is that the mycor function works perfectly and gives no error when I give it a larger chunk of the same data set.
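The likely culprit can be reproduced in isolation: when two rows have no overlapping non-NA values, cor(..., use = "na.or.complete") returns NA (and a zero-variance row, like the all-zero "Preferred & Other", yields NaN), so `if (temp.cor > best.cor)` is exactly the comparison that raises `missing value where TRUE/FALSE needed`. A minimal sketch, including the guard that avoids it:

```r
x <- c(NA, NA, 1, 2, 3)
y <- c(5, 4, NA, NA, NA)               # no complete pairs with x
r <- cor(x, y, use = "na.or.complete")
is.na(r)                               # TRUE -- so r > best.cor would be NA, not TRUE/FALSE
if (!is.na(r) && r > -1) "usable" else "skip this pair"
```

Guarding the comparison with !is.na(temp.cor) (or dropping NA correlations before the max/min) would make the function robust on small inputs.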
This is the larger chunk of the same data set.
This can also be copied into R (printed via write.table):
"Col2" "Col3" "Col4" "Col5" "Col6" "Col7" "Col8" "Col9" "Col10" "Col11" "Col12" "Col13" "Col14" "Col15" "Col16" "Col17" "Col18" "Col19" "Col20" "Col21" "Col22" "Col23" "Col24" "Col25" "Col26" "Col27" "Col28" "Col29" "Col30" "Col31" "Col32" "Col33" "Col34" "Col35" "Col36" "Col37" "Col38" "Col39" "Col40" "Col41" "Col42" "Col43" "Col44" "Col45" "Col46" "Col47" "Col48" "Col49" "Col50" "Col51" "Col52" "Col53" "Col54" "Col55" "Col56" "Col57" "Col58" "Col59" "Col60" "Col61" "Col62" "Col63" "Col64" "Col65" "Col66" "Col67" "Col68" "Col69" "Col70" "Col71" "Col72" "Col73" "Col74" "Col75" "Col76" "Col77" "Col78" "Col79" "Col80" "Col81" "Col82" "Col83" "Col84" "Col85" "Col86" "Col87" "Col88" "Col89" "Col90" "Col91" "Col92" "Col93" "Col94" "Col95" "Col96" "Col97" "Col98" "Col99" "Col100" "Col101" "Col102" "Col103" "Col104" "Col105" "Col106" "Col107" "Col108" "Col109" "Col110" "Col111"
"Market Capitalization" NA NA 17082.69 17879.8 16266.11 17540.1 18214.39 17110.13 18167.87 16700.24 15592.71 14824.06 14455.42 13685.56 12168.31 12550.1 12771.45 11273.2 10284.48 10863.21 10655.99 11750.74 10671.37 10818.32 13288.42 12558.8 12221.79 13213.51 12375.92 11854.12 10942.65 10689.79 11364.1 11887.9 11426.1 10249.34 10609.99 10167.51 9600.1 10001.68 9713.38 9184.3 9730.33 8249.64 9160.61 8586.38 8894.55 8908.81 11887.9 11426.1 10249.34 10609.99 10167.51 9600.1 10001.68 9713.38 9184.3 9730.33 8249.64 9160.61 8586.38 8894.55 8908.81 8566.69 8641.04 8444.84 7867.83 8163.04 7238.2 6279.55 6173.33 7376.47 9048.75 10095.35 10351.52 12311.04 12006.02 10785.58 11009.16 9655.09 7990.1 6918.52 7050.24 6844.2 6520.75 6873.11 7489.61 7459.85 7136.58 6930.38 6401.43 6048.8 5843.01 6224.43 6840.76 7529.23 8452.46 8247.48 8132.72 7632.03 7339.11 6549.2 6165.26 6535.8 5793.52 5621.57 5877.31 5391.98 4792.51 5362.35
"Cash & Equivalents" NA NA 747 132 394 69 1381 769 648 398 492 516 338 198 178 87 260 75 311 651 74 68 1757 144 210 192 186 157 94 234 63 177 81 119 818 477 26 70 487 55 49 49 60 62 117.86 83.4 59.2 108.34 119 818 477 26 70 487 55 49 49 60 62 117.86 83.4 59.2 108.34 271.35 432.14 41.63 59.57 94.83 72.81 37.66 73.6 485.05 188.94 291.14 57.5 102.29 153.82 105.01 198.26 183.46 269.87 12.23 94.9 106.88 117.28 57.37 103.23 342.29 429.89 48.49 111.39 245.22 360.74 80.65 205.1 36.76 203.96 143.32 74.33 282.45 349.66 384.84 238.24 317.86 315.65 291.01 185.21 353.33 160.33 160.31
"Preferred & Other" NA NA 0 0 0 0 0 0 213 213 213 213 213 213 213 213 213 213 213 213 213 213 213 257 256 255 255 254 254 254 255 255 255 254 255 255 252 252 253 254 255 221 222 221 221.47 221.13 221.2 220.79 254 255 255 252 252 253 254 255 221 222 221 221.47 221.13 221.2 220.79 222.09 212.56 249.61 212.56 249.61 212.56 212.56 212.56 249.61 212.56 212.56 212.56 249.61 318.02 318.02 318.02 318.02 322.34 322.42 322.54 322.65 322.74 322.77 322.84 639.92 639.98 640.13 640.24 640.31 640.39 640.47 640.54 640.73 640.89 640.95 641.09 641.25 645.87 634.99 635.05 635.18 637.51 637.73 638.05 638.15 640.53 640.77
"Total Debt" NA NA 12379 11982 11309 11111 11873 11073 10675 10676 10678 11144 10683 11526 11020 11027 10599 10773 10366 10699 10094 9751 9480 9363 9282 9213 8653 8943 8815 8968 8487 8162 8205 7687 7868 7498 7219 7245 7336 7432 7094 6968 6682 7000 6841.23 6584.25 6374.14 6264.74 7687 7868 7498 7219 7245 7336 7432 7094 6968 6682 7000 6841.23 6584.25 6374.14 6264.74 6234.03 6249.6 6448.51 6100.6 6011.55 5693.56 5536.13 5276.01 5449.52 4792.08 4881.68 4471.08 4312.4 4410.61 4480.08 4437.33 4758.17 4432.04 4532.28 4466.59 4387.54 4313.86 4316.43 4316.66 4146.02 4175.36 4082.33 4085.09 4089.16 4116.98 3970.11 3972.46 3827.89 3850.12 3927.94 3722.68 3709.36 3804.58 3658.69 3885.52 3667.45 3734.29 3737 3615.16 3492.38 3374.62 3229.81
"Enterprise Value" NA NA 28714.69 29729.8 27181.11 28582.1 28706.39 27414.13 28407.87 27191.24 25991.71 25665.06 25013.42 25226.56 23223.31 23703.1 23323.45 22184.2 20552.48 21124.21 20888.99 21646.74 18607.37 20294.32 22616.42 21834.8 20943.79 22253.51 21350.92 20842.12 19621.65 18929.79 19743.1 19709.9 18731.1 17525.34 18054.99 17594.51 16702.1 17632.68 17013.38 16324.3 16574.33 15408.64 16105.45 15308.35 15430.68 15286 19709.9 18731.1 17525.34 18054.99 17594.51 16702.1 17632.68 17013.38 16324.3 16574.33 15408.64 16105.45 15308.35 15430.68 15286 14751.46 14671.06 15101.34 14121.44 14329.37 13071.51 11990.59 11588.31 12590.55 13864.46 14898.46 14977.66 16770.77 16580.82 15478.67 15566.25 14547.82 12474.62 11760.98 11744.46 11447.51 11040.07 11454.93 12025.88 11903.5 11522.02 11604.35 11015.38 10533.05 10239.65 10754.35 11248.66 11961.09 12739.51 12673.05 12422.15 11700.18 11439.9 10458.04 10447.58 10520.58 9849.67 9705.29 9945.31 9169.17 8647.34 9072.61
"Total Revenue" 2896.75 3461.25 2818 3184 2901 3438 2771 3078 2915 3629 2993 3349 3140 3707 3017 3462 3273 3489 2845 3423 2998 3858 3149 3577 3228 3579 2957 3357 2649 3441 2555 3317 3107 3337 2395 2800 2181 2734 2164 2685 2279 2801 2176 2570 2057.03 2539.49 1848 2056 3337 2395 2800 2181 2734 2164 2685 2279 2801 2176 2570 2057.03 2539.49 1848 2056 1942.6 2627.56 2112.22 2886.26 2250.13 2820.78 2041.89 2318.59 1963.38 2346.24 1479.08 1776.59 1617.34 2061.62 1561.04 1853.05 1720.06 2011.03 1504.01 1886.15 1632.3 1920.34 1539.73 1867.36 1528.38 1879.88 1459.85 1668.79 1461.25 1821.99 1392.09 1697.76 1483.61 1799.69 1396.01 1586.08 1478.81 1717.88 1280.11 1456.11 1342.73 1720.3 1330.65 1479.39 1367.21 1613.83 1263.27
"Growth % YoY" -0.15 0.68 1.7 3.44 -0.48 -5.26 -7.42 -8.09 -7.17 -2.1 -0.8 -3.26 -4.06 6.25 6.05 1.14 9.17 -9.56 -9.65 -4.31 -7.13 7.8 6.49 6.55 21.86 4.01 15.73 1.21 -14.74 3.12 6.68 18.46 42.46 22.06 10.67 4.28 -4.3 -2.39 -0.55 4.47 10.79 10.3 17.75 25 5.89 -3.35 -12.51 -28.77 22.06 10.67 4.28 -4.3 -2.39 -0.55 4.47 10.79 10.3 17.75 25 5.89 -3.35 -12.51 -28.77 -13.67 -6.85 3.44 24.48 14.6 20.23 38.05 30.51 21.4 13.81 -5.25 -4.13 -5.97 2.52 3.79 -1.75 5.38 4.72 -2.32 1.01 6.8 2.15 5.47 11.9 4.59 3.18 4.87 -1.71 -1.51 1.24 -0.28 7.04 0.32 4.76 9.05 8.93 10.13 -0.14 -3.8 -1.57 -1.79 6.6 5.33 -1.02 NA NA NA
"Gross Profit" NA NA 1874 2080 1981 2393 1934 1993 1846 2244 1794 2000 1942 2103 1723 1826 1700 1979 1558 1551 1459 1531 1420 1588 1478 1595 1317 1506 1273 1554 1202 1322 1179 1460 1097 1217 916 1285 980 1169 1066 1349 975 1157 1024.93 1317.57 980 1091 1460 1097 1217 916 1285 980 1169 1066 1349 975 1157 1024.93 1317.57 980 1091 1052.71 1368.8 1091.61 1236.41 991.8 1374.86 1043.29 1236.87 1129.87 1507.31 998.19 1190.69 1151.22 1475.08 1025.84 1170.8 1115.9 1438.56 981.96 1159.37 1094.25 1401.25 1001.2 1198.64 1079.65 1405.45 984.46 1196.22 1086.13 1415.37 998.06 1177.1 1086.53 1381.01 971.41 1118.91 1055.19 1331.37 947.22 1036.88 991.58 1301.1 921.48 994.97 967.89 1217.32 848.39
"Margin %" NA NA 66.5 65.33 68.29 69.6 69.79 64.75 63.33 61.84 59.94 59.72 61.85 56.73 57.11 52.74 51.94 56.72 54.76 45.31 48.67 39.68 45.09 44.39 45.79 44.57 44.54 44.86 48.06 45.16 47.05 39.86 37.95 43.75 45.8 43.46 42 47 45.29 43.54 46.77 48.16 44.81 45.02 49.83 51.88 53.03 53.06 43.75 45.8 43.46 42 47 45.29 43.54 46.77 48.16 44.81 45.02 49.83 51.88 53.03 53.06 54.19 52.09 51.68 42.84 44.08 48.74 51.09 53.35 57.55 64.24 67.49 67.02 71.18 71.55 65.72 63.18 64.88 71.53 65.29 61.47 67.04 72.97 65.02 64.19 70.64 74.76 67.44 71.68 74.33 77.68 71.7 69.33 73.24 76.74 69.58 70.55 71.35 77.5 74 71.21 73.85 75.63 69.25 67.26 70.79 75.43 67.16
"EBITDA" 758 1074 641 777 699 1091 711 794 684 978 617 844 708 916 640 696 625 885 569 611 567 586 520 702 596 715 510 694 547 670 467 564 423 717 411 533 274 624 367 497 458 669 334 485 388.44 693.3 384 487 717 411 533 274 624 367 497 458 669 334 485 388.44 693.3 384 487 445 695.27 439.32 538.75 377.16 666.39 492.65 526.86 446.87 748.34 331.51 492.91 430.87 760.5 313.33 474.78 434.79 751.92 280.96 463.41 390.79 712.97 313.14 490.27 368.26 711.24 307.36 506.85 383.64 721.41 317.3 474.34 363.04 678.27 279.09 400.41 320.03 637.82 281.47 340.21 297.39 610.07 247.48 300.27 305.15 561.67 203.06
"Margin %1" 26.17 31.03 22.75 24.4 24.1 31.73 25.66 25.8 23.46 26.95 20.61 25.2 22.55 24.71 21.21 20.1 19.1 25.37 20 17.85 18.91 15.19 16.51 19.63 18.46 19.98 17.25 20.67 20.65 19.47 18.28 17 13.61 21.49 17.16 19.04 12.56 22.82 16.96 18.51 20.1 23.88 15.35 18.87 18.88 27.3 20.78 23.69 21.49 17.16 19.04 12.56 22.82 16.96 18.51 20.1 23.88 15.35 18.87 18.88 27.3 20.78 23.69 22.91 26.46 20.8 18.67 16.76 23.62 24.13 22.72 22.76 31.9 22.41 27.74 26.64 36.89 20.07 25.62 25.28 37.39 18.68 24.57 23.94 37.13 20.34 26.25 24.09 37.83 21.05 30.37 26.25 39.59 22.79 27.94 24.47 37.69 19.99 25.25 21.64 37.13 21.99 23.36 22.15 35.46 18.6 20.3 22.32 34.8 16.07
"Net Income Before XO" 214.5 410 172 192 207 440 214 280 193 386 168 314 236 353 186 229 205 339 153 183 163 185 283 303 209 313 154 261 205 234 129 183 148 290 121 184 55 253 92 158 50 260 69 157 123.03 286.54 101 169 290 121 184 55 253 92 158 50 260 69 157 123.03 286.54 101 169 128.51 280.74 104.07 182.51 49.48 283.27 72.14 191.53 124.96 339.41 69.8 180.05 135.23 351.55 66.51 176.45 143.61 355.04 47.56 166.61 120.15 327.99 71.42 188.48 113.12 333.3 76.4 201.03 117.88 339.87 87.21 189.31 117.29 324.84 62.45 153.94 100.63 309.44 77.54 116.48 92.2 303.36 64.65 106.7 121.1 263.26 49.06
"Margin %2" 7.4 11.85 6.1 6.03 7.14 12.8 7.72 9.1 6.62 10.64 5.61 9.38 7.52 9.52 6.17 6.61 6.26 9.72 5.38 5.35 5.44 4.8 8.99 8.47 6.47 8.75 5.21 7.77 7.74 6.8 5.05 5.52 4.76 8.69 5.05 6.57 2.52 9.25 4.25 5.88 2.19 9.28 3.17 6.11 5.98 11.28 5.47 8.22 8.69 5.05 6.57 2.52 9.25 4.25 5.88 2.19 9.28 3.17 6.11 5.98 11.28 5.47 8.22 6.62 10.68 4.93 6.32 2.2 10.04 3.53 8.26 6.36 14.47 4.72 10.13 8.36 17.05 4.26 9.52 8.35 17.65 3.16 8.83 7.36 17.08 4.64 10.09 7.4 17.73 5.23 12.05 8.07 18.65 6.26 11.15 7.91 18.05 4.47 9.71 6.8 18.01 6.06 8 6.87 17.63 4.86 7.21 8.86 16.31 3.88
"Adjusted EPS" 0.7 1.42 0.59 1.07 0.69 1.44 0.61 1.01 0.74 1.33 0.57 0.99 0.69 1.32 0.51 0.93 0.67 1.16 0.48 0.78 0.72 0.98 0.42 0.87 0.71 1.2 0.58 1.03 0.78 0.92 0.51 0.86 0.59 1.17 0.48 0.75 0.49 1.08 0.38 0.69 0.65 1.16 0.29 0.72 0.56 1.33 0.46 0.78 1.17 0.48 0.75 0.49 1.08 0.38 0.69 0.65 1.16 0.29 0.72 0.56 1.33 0.46 0.78 0.59 1.3 0.48 0.84 0.52 1.4 0.33 0.88 0.57 1.5 0.3 0.76 0.56 1.49 0.26 0.73 0.59 1.49 0.18 0.69 0.49 1.38 0.28 0.78 0.44 1.38 0.29 0.82 0.47 1.41 0.33 0.77 0.46 1.35 0.23 0.62 0.39 1.3 0.3 0.47 0.36 1.29 0.24 0.43 0.49 1.11 0.18
"Growth % YoY1" 0.72 -1.67 -3.28 5.94 -6.76 8.27 7.02 2.02 7.25 0.76 11.76 6.45 2.99 13.79 6.25 19.23 -6.94 18.37 14.29 -10.34 1.41 -18.33 -27.59 -15.53 -8.97 30.43 13.73 19.77 32.2 -21.37 6.25 14.67 20.41 8.33 26.32 8.7 -24.62 -6.9 31.03 -4.17 16.07 -12.78 -36.96 -7.69 -5.08 2.31 -4.17 -7.14 8.33 26.32 8.7 -24.62 -6.9 31.03 -4.17 16.07 -12.78 -36.96 -7.69 -5.08 2.31 -4.17 -7.14 13.46 -7.14 45.45 -4.55 -8.77 -6.67 10 15.79 1.79 0.67 13.64 4.11 -5.08 -0.07 44.44 5.89 20.41 8.05 -34.72 -11.62 11.36 0 -3.45 -4.88 -6.38 -2.13 -12.12 6.49 2.17 4.44 43.48 24.19 17.95 3.85 -23.33 31.91 8.33 0.78 25 9.3 -26.53 16.22 33.33 -23.21 NA NA NA
"Cash from Operations" 375.79 812.21 991 -84 961 391 845 402 976 572 1227 362 1407 179 794 1 997 26 798 645 581 -1237 733 563 630 109 346 481 710 -162 224 593 177 581 -346 389 525 164 490 152 766 218 492 -58 735.49 285 369 146 581 -346 389 525 164 490 152 766 218 492 -58 735.49 285 369 146 490.18 387.73 254.59 141.41 215.82 279.84 489.5 199.17 -325.31 -66.66 280.22 256.65 718.82 438.66 302.05 244.37 -52.38 647.78 53.19 258.9 294.29 359.1 267.8 184.51 310.07 585.52 233.75 145.31 426.63 480.57 187.86 270.34 236.08 472.92 243.13 69.8 261.19 291.41 285.57 77.33 283.64 328.4 309.68 11.95 357.21 141.59 357.15
"Capital Expenditures" NA NA -660 -676 -608 -478 -635 -523 -542 -503 -629 -460 -599 -548 -551 -465 -719 -531 -595 -529 -785 -584 -608 -547 -638 -519 -485 -482 -583 -480 -537 -420 -619 -385 -426 -390 -431 -439 -308 -373 -448 -356 -404 -317 -593.69 -310 -392 -340 -385 -426 -390 -431 -439 -308 -373 -448 -356 -404 -317 -593.69 -310 -392 -340 -302.22 -394.08 -274.8 -228.02 -75.57 -274.36 -684.94 -207.41 -211.95 -218.98 -157.07 -127.56 -210.59 -156.81 -150.58 -127.3 -226.32 -145.55 -171.37 -140.37 -244.12 -167.92 -185.35 -142.94 -239.55 -165.98 -166.25 -147.38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"Free Cash Flow" NA NA 331 -760 353 -87 210 -121 434 69 598 -98 808 -369 243 -464 278 -505 203 116 -204 -1821 125 16 -8 -410 -139 -1 127 -642 -313 173 -442 196 -772 -1 94 -275 182 -221 318 -138 88 -375 141.79 -25 -23 -194 196 -772 -1 94 -275 182 -221 318 -138 88 -375 141.79 -25 -23 -194 187.96 -6.35 -20.21 -86.61 140.26 5.47 -195.45 -8.24 -537.26 -285.64 123.15 129.09 508.23 281.85 151.46 117.07 -278.7 502.23 -118.18 118.53 50.17 191.18 82.45 41.57 70.51 419.54 67.49 -2.08 426.63 480.57 187.86 270.34 236.08 472.92 243.13 69.8 261.19 291.41 285.57 77.33 283.64 328.4 309.68 11.95 357.21 141.59 357.15
"Adjusted Price" 2094.66 3689.2 3805.62 3588.42 3582.4 3885.75 3523.13 3554.9 3420.27 3141.36 2984.19 2838.81 2760.09 2517.44 2447.56 2403.89 2188.98 1960.8 1952.2 2033.87 2099.97 1993.98 2043.36 2296.42 2201.73 2277.15 2301.5 2203.47 2086.87 1938.95 2019.34 2002.47 2048.12 1881.97 1817.17 1807.02 1664.57 1659.78 1717.25 1585.27 1589.9 1506.13 1534.98 1531.24 1498.21 1528.96 1418.46 1431.1 1343.43 1244.04 1194.62 1076.93 1058.66 960.76 1112.69 1322.69 1414.59 1442.28 1545.6 1364.27 1305.46 1231.15 1022.23 869.37 796.9 820.22 762.84 715.9 756.11 816.37 731.97 705.73 657.84 628.55 571.47 624.67 651.89 676.63 759.77 742.27 734.39 657.44 619.61 569.84 524.2 510.26 475.43 449.8 441.27 409.34 383 413.34 441.72 435.71 419.07 385.87 356.85 346.15 326.97 318.45 323.72 314.18 313.22 300.88 329.3 315.1 312.34 279.11 163.47 NA
The larger chunk works perfectly, but I need to be able to check the correlation on the smaller sections. I'm really new to R, so this might be easy, but I've read the boards here and the R manuals and can't find it.
In your example above, your code fails on the first (smaller) data set because row 3 consists only of 0's and NA's, so its standard deviation is 0, and its correlation with any other row returns NA: computing a correlation divides the sample covariance by the sample standard deviation of each vector, and here one of those standard deviations is zero. This doesn't happen in the larger example because row 3 there has enough variation to give a non-zero standard deviation.
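A minimal sketch of the problem (the matrix here is made up for illustration): a constant row has standard deviation 0, so `cor()` returns NA for it and warns about the zero standard deviation.

```r
m <- rbind(c(1, 2, 3, 4),   # varying row
           c(0, 0, 0, 0),   # constant row: sd == 0
           c(4, 3, 2, 1))   # varying row

cor(m[1, ], m[3, ])  # -1: perfectly anti-correlated
cor(m[1, ], m[2, ])  # NA, with a "standard deviation is zero" warning
```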
However, your approach seems a bit convoluted. If you want the correlation between a single row of the matrix and all other rows, sorted by correlation, you can call cor() on the transposed matrix and sort the result, for example:
mycor <- function(dataset, relative.to = 19) {
  # cor() works column-wise, so transpose: rows of `dataset` become variables
  mat <- t(dataset)
  # correlation of every row of `dataset` with the reference row,
  # ignoring NAs where possible
  cors <- cor(mat, mat[, relative.to], use = "na.or.complete")
  # return the correlations sorted in increasing order
  cors[order(drop(cors)), ]
}
mycor(dataset)
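As a quick sanity check on a small, made-up matrix (the toy data and row names below are illustrative, not from your dataset; the function is repeated so the snippet runs standalone):

```r
mycor <- function(dataset, relative.to = 19) {
  mat <- t(dataset)
  cors <- cor(mat, mat[, relative.to], use = "na.or.complete")
  cors[order(drop(cors)), ]
}

set.seed(42)
toy <- matrix(rnorm(20), nrow = 4)
toy[1, ] <- 2 * toy[2, ] + 1    # row 1: perfect linear function of row 2
toy[4, ] <- -toy[2, ]           # row 4: perfectly anti-correlated with row 2
rownames(toy) <- paste0("row", 1:4)

mycor(toy, relative.to = 2)
# row4 sorts first (correlation -1); row1 and row2 sort last (correlation 1)
```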