I'm running 100,000 simulations using R, and this code chunk is part of the simulation process. I understand that R is not very kind with for loops, so I would like to optimize my loop with (I think) an apply function, though I haven't figured that out yet.
I've tried a nested sapply, but it only seemed to work with the first row that fulfilled the if statement.
settlements <- matrix(data = NA, nrow = 40, ncol = 5)
settlements[-seq(3, 40, 3), ] <- 0
for(i in seq(3, 40, 3)){
for(j in 1:ncol(settlements)){
if(j > i){
settlements[i, j] <- 0
}else{
settlements[i, j] <- max(min(sum(expected_claims_pricing_cumulative[c(1:i), j]), sum(actual_claims[c(1:i), j])) - sum(expected_claims_discounted_cumulative[c(1:i), j]) - sum(settlements[c(1:(i - 1)), j]), 0)
}
}
}
The code chunk above works, but it is clearly inefficient.
This is a sample output of the code:
[,1] [,2] [,3] [,4] [,5]
[1,] 0.0 0 0.0 0.0 0
[2,] 0.0 0 0.0 0.0 0
[3,] 121571.9 0 297009.7 0.0 0
[4,] 0.0 0 0.0 0.0 0
[5,] 0.0 0 0.0 0.0 0
[6,] 217259.3 0 881364.8 1543090.3 1821937
[7,] 0.0 0 0.0 0.0 0
[8,] 0.0 0 0.0 0.0 0
[9,] 219615.8 0 1398676.8 2050454.9 2788436
[10,] 0.0 0 0.0 0.0 0
[11,] 0.0 0 0.0 0.0 0
[12,] 577947.7 0 2007643.5 2995483.9 4163100
[13,] 0.0 0 0.0 0.0 0
[14,] 0.0 0 0.0 0.0 0
[15,] 641731.9 1184557 2148414.1 4253819.1 5908208
[16,] 0.0 0 0.0 0.0 0
[17,] 0.0 0 0.0 0.0 0
[18,] 862253.0 0 0.0 5727515.7 8081478
[19,] 0.0 0 0.0 0.0 0
[20,] 0.0 0 0.0 0.0 0
[21,] 796571.5 0 1588740.9 7260817.4 10380325
[22,] 0.0 0 0.0 0.0 0
[23,] 0.0 0 0.0 0.0 0
[24,] 302051.9 0 0.0 4496129.9 0
[25,] 0.0 0 0.0 0.0 0
[26,] 0.0 0 0.0 0.0 0
[27,] 112847.7 0 0.0 2148951.8 0
[28,] 0.0 0 0.0 0.0 0
[29,] 0.0 0 0.0 0.0 0
[30,] 0.0 0 0.0 338543.2 0
[31,] 0.0 0 0.0 0.0 0
[32,] 0.0 0 0.0 0.0 0
[33,] 0.0 0 0.0 0.0 0
[34,] 0.0 0 0.0 0.0 0
[35,] 0.0 0 0.0 0.0 0
[36,] 0.0 0 0.0 0.0 0
[37,] 0.0 0 0.0 0.0 0
[38,] 0.0 0 0.0 0.0 0
[39,] 0.0 0 0.0 0.0 0
[40,] 0.0 0 0.0 0.0 0
expected claims pricing cumulative:
1 246504.7 246504.7 246504.7 246504.7 246504.7
2 359684.6 729441.6 729441.6 729441.6 729441.6
3 450958.0 990484.9 1606746.6 1606746.6 1606746.6
4 536705.6 1213142.7 2112354.2 2851868.2 2851868.2
5 616642.8 1421701.3 2549096.4 3628150.1 4614168.9
6 735863.2 1660827.4 3002591.5 4355465.6 5794203.9
7 844157.6 1947952.3 3489559.4 5099676.3 6903508.4
8 958274.5 2224510.9 4064168.9 5914097.3 8060919.8
9 1060873.2 2498285.0 4608679.0 6816268.5 9282839.9
10 1177249.5 2768559.4 5164245.7 7696718.5 10640171.2
11 1358375.2 3124249.5 5776432.6 8651256.2 12027886.6
12 1536404.6 3573967.4 6517091.2 9699710.9 13532809.1
13 1712181.1 4016788.0 7412726.0 10944474.6 15187967.5
14 1868723.0 4436994.7 8278006.1 12353131.8 17062129.9
15 1999373.6 4802458.2 9082910.9 13692124.6 19125625.6
16 2289300.0 5288360.4 9960168.0 15096711.3 21242329.6
17 2492864.7 5926814.7 10925248.8 16531417.9 23380142.2
18 2715687.3 6454984.3 12178234.4 18176355.2 25651247.4
19 2955382.3 7028913.2 13261074.9 20128975.0 28126469.5
20 3200558.6 7633632.0 14422850.2 21901444.3 31058644.4
21 770767.5 5571605.4 12960061.0 21107123.0 31078581.7
22 823184.7 1979336.0 9980732.5 18846879.2 29709628.5
23 875275.7 2110052.8 4036971.6 13638647.3 25460176.4
24 928077.7 2240991.2 4298953.1 6611255.6 19413489.9
25 981284.6 2373401.1 4561590.4 7031144.6 10114214.6
26 0.0 1471926.9 3792121.1 6417948.2 9710687.1
27 0.0 0.0 2453211.5 5237444.5 8738547.3
28 0.0 0.0 0.0 2943853.8 6656164.5
29 0.0 0.0 0.0 0.0 3925138.4
30 0.0 0.0 0.0 0.0 0.0
31 0.0 0.0 0.0 0.0 0.0
32 0.0 0.0 0.0 0.0 0.0
33 0.0 0.0 0.0 0.0 0.0
34 0.0 0.0 0.0 0.0 0.0
35 0.0 0.0 0.0 0.0 0.0
36 0.0 0.0 0.0 0.0 0.0
37 0.0 0.0 0.0 0.0 0.0
38 0.0 0.0 0.0 0.0 0.0
39 0.0 0.0 0.0 0.0 0.0
40 0.0 0.0 0.0 0.0 0.0
expected claims discounted cumulative:
1 218156.6 218156.6 218156.6 218156.6 218156.6
2 318320.9 645555.8 645555.8 645555.8 645555.8
3 399097.9 876579.2 1421970.8 1421970.8 1421970.8
4 474984.5 1073631.3 1869433.4 2523903.4 2523903.4
5 545728.9 1258205.6 2255950.3 3210912.9 4083539.5
6 651238.9 1469832.3 2657293.5 3854587.0 5127870.5
7 747079.5 1723937.8 3088260.1 4513213.5 6109605.0
8 848073.0 1968692.2 3596789.4 5233976.2 7133914.1
9 938872.8 2210982.3 4078680.9 6032397.7 8215313.3
10 1041865.8 2450175.0 4570357.5 6811595.9 9416551.5
11 1202162.1 2764960.8 5112142.9 7656361.8 10644679.6
12 1359718.1 3162961.2 5767625.8 8584244.2 11976536.1
13 1515280.3 3554857.3 6560262.6 9685860.0 13441351.3
14 1653819.9 3926740.3 7326035.4 10932521.7 15099985.0
15 1769445.7 4250175.5 8038376.2 12117530.3 16926178.6
16 2026030.5 4680199.0 8814748.7 13360589.5 18799461.7
17 2206185.2 5245231.0 9668845.1 14630304.8 20691425.9
18 2403383.3 5712661.1 10777737.4 16086074.4 22701353.9
19 2615513.3 6220588.2 11736051.3 17814142.9 24891925.5
20 2832494.4 6755764.3 12764222.5 19382778.2 27486900.3
21 682129.2 4930870.8 11469654.0 18679803.8 27504544.8
22 728518.5 1751712.4 8832948.2 16679488.1 26293021.2
23 774619.0 1867396.7 3572719.8 12070202.9 22532256.1
24 821348.7 1983277.2 3804573.5 5850961.2 17180938.6
25 868436.9 2100460.0 4037007.5 6222562.9 8951079.9
26 0.0 1302655.3 3356027.1 5679884.1 8593958.1
27 0.0 0.0 2171092.2 4635138.4 7733614.4
28 0.0 0.0 0.0 2605310.6 5890705.5
29 0.0 0.0 0.0 0.0 3473747.5
30 0.0 0.0 0.0 0.0 0.0
31 0.0 0.0 0.0 0.0 0.0
32 0.0 0.0 0.0 0.0 0.0
33 0.0 0.0 0.0 0.0 0.0
34 0.0 0.0 0.0 0.0 0.0
35 0.0 0.0 0.0 0.0 0.0
36 0.0 0.0 0.0 0.0 0.0
37 0.0 0.0 0.0 0.0 0.0
38 0.0 0.0 0.0 0.0 0.0
39 0.0 0.0 0.0 0.0 0.0
40 0.0 0.0 0.0 0.0 0.0
actual claims:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 60000 610000
[2,] 0 420000 530000 960000 410000
[3,] 1110000 660000 4140000 2260000 810000
[4,] 330000 750000 3550000 3480000 1070000
[5,] 1790000 850000 2140000 5090000 6120000
[6,] 2110000 540000 3940000 7440000 9500000
[7,] 0 1260000 2170000 7010000 9110000
[8,] 120000 3520000 5280000 6260000 10900000
[9,] 240000 210000 2610000 5730000 9140000
[10,] 2130000 2070000 5840000 14570000 6600000
[11,] 1430000 5110000 5810000 13540000 18910000
[12,] 860000 1140000 3970000 11630000 14430000
[13,] 2410000 5890000 8360000 13220000 15890000
[14,] 1550000 950000 9820000 12150000 21450000
[15,] 1960000 9370000 5780000 7740000 18530000
[16,] 4160000 5430000 4590000 14150000 17190000
[17,] 3930000 3450000 8190000 16840000 20080000
[18,] 590000 5290000 11690000 17580000 25380000
[19,] 2760000 4470000 19390000 20990000 28200000
[20,] 3450000 7140000 9740000 21740000 21070000
[21,] 820000 3780000 13220000 24690000 33260000
[22,] 0 2410000 4380000 19070000 30780000
[23,] 1930000 360000 2030000 9470000 15680000
[24,] 2460000 3620000 3140000 1540000 12560000
[25,] 0 4170000 6600000 5970000 8770000
[26,] 0 1280000 6890000 4940000 4530000
[27,] 0 0 740000 2880000 8280000
[28,] 0 0 0 4020000 9550000
[29,] 0 0 0 0 4960000
[30,] 0 0 0 0 0
[31,] 0 0 0 0 0
[32,] 0 0 0 0 0
[33,] 0 0 0 0 0
[34,] 0 0 0 0 0
[35,] 0 0 0 0 0
[36,] 0 0 0 0 0
[37,] 0 0 0 0 0
[38,] 0 0 0 0 0
[39,] 0 0 0 0 0
[40,] 0 0 0 0 0
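The loop above recomputes sum(x[1:i, j]) from scratch for every cell, and each settlement row depends on the rows settled before it. One possible way to cut the work, sketched here with made-up inputs rather than the real claims data, is to compute the running column totals once with cumsum and keep a per-column running total of what has already been settled. The names pricing, discounted, actual and settle_fast are hypothetical stand-ins for the three matrices and the loop in the question.

```r
set.seed(1)
n_row <- 40
n_col <- 5

# Made-up stand-ins for expected_claims_pricing_cumulative,
# expected_claims_discounted_cumulative and actual_claims
pricing    <- matrix(runif(n_row * n_col, 0, 100), n_row, n_col)
discounted <- pricing * 0.9
actual     <- matrix(runif(n_row * n_col, 0, 150), n_row, n_col)

# Running column totals, computed once instead of sum(x[1:i, j]) per cell
cum_pricing    <- apply(pricing, 2, cumsum)
cum_discounted <- apply(discounted, 2, cumsum)
cum_actual     <- apply(actual, 2, cumsum)

settle_fast <- function() {
  out  <- matrix(0, n_row, n_col)
  paid <- numeric(n_col)                 # settlements so far, one total per column
  for (i in seq(3, n_row, 3)) {
    # All columns of row i at once: pmin/pmax are the vectorized min/max
    row_i <- pmax(pmin(cum_pricing[i, ], cum_actual[i, ]) -
                    cum_discounted[i, ] - paid, 0)
    row_i[seq_len(n_col) > i] <- 0       # same guard as `if (j > i)` in the loop
    out[i, ] <- row_i
    paid <- paid + row_i
  }
  out
}

settlements_fast <- settle_fast()
```

The outer loop over every third row remains because each row needs the earlier settlements, but the inner loop over columns and the repeated sums are gone. Whether this is fast enough for 100,000 simulations would need to be timed on the real data.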
You can use outer to get the elements and either [<- or replace to put them in.
settlements <- matrix(data = 0, nrow = 40, ncol = 5)
myfun <- function(i, j){
3*i/sqrt(j)
}
settlements[seq(3, nrow(settlements), 3),] <-
outer(seq(3, nrow(settlements), 3), seq(ncol(settlements)), myfun)
# or
# settlements[seq(3, nrow(settlements), 3),] <-
# do.call(myfun,
# expand.grid(i = seq(3, nrow(settlements), 3),
# j = seq(ncol(settlements))))
# or replace(settlements, row(settlements) %% 3 == 0, outer_output)
# if you want to create a new matrix
settlements
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 0.000000 0.000000 0.0 0.000000
# [2,] 0 0.000000 0.000000 0.0 0.000000
# [3,] 9 6.363961 5.196152 4.5 4.024922
# [4,] 0 0.000000 0.000000 0.0 0.000000
# [5,] 0 0.000000 0.000000 0.0 0.000000
# [6,] 18 12.727922 10.392305 9.0 8.049845
# [7,] 0 0.000000 0.000000 0.0 0.000000
# [8,] 0 0.000000 0.000000 0.0 0.000000
# [9,] 27 19.091883 15.588457 13.5 12.074767
# [10,] 0 0.000000 0.000000 0.0 0.000000
# [11,] 0 0.000000 0.000000 0.0 0.000000
# [12,] 36 25.455844 20.784610 18.0 16.099689
# [13,] 0 0.000000 0.000000 0.0 0.000000
# [14,] 0 0.000000 0.000000 0.0 0.000000
# [15,] 45 31.819805 25.980762 22.5 20.124612
# [16,] 0 0.000000 0.000000 0.0 0.000000
# [17,] 0 0.000000 0.000000 0.0 0.000000
# [18,] 54 38.183766 31.176915 27.0 24.149534
# [19,] 0 0.000000 0.000000 0.0 0.000000
# [20,] 0 0.000000 0.000000 0.0 0.000000
# [21,] 63 44.547727 36.373067 31.5 28.174457
# [22,] 0 0.000000 0.000000 0.0 0.000000
# [23,] 0 0.000000 0.000000 0.0 0.000000
# [24,] 72 50.911688 41.569219 36.0 32.199379
# [25,] 0 0.000000 0.000000 0.0 0.000000
# [26,] 0 0.000000 0.000000 0.0 0.000000
# [27,] 81 57.275649 46.765372 40.5 36.224301
# [28,] 0 0.000000 0.000000 0.0 0.000000
# [29,] 0 0.000000 0.000000 0.0 0.000000
# [30,] 90 63.639610 51.961524 45.0 40.249224
# [31,] 0 0.000000 0.000000 0.0 0.000000
# [32,] 0 0.000000 0.000000 0.0 0.000000
# [33,] 99 70.003571 57.157677 49.5 44.274146
# [34,] 0 0.000000 0.000000 0.0 0.000000
# [35,] 0 0.000000 0.000000 0.0 0.000000
# [36,] 108 76.367532 62.353829 54.0 48.299068
# [37,] 0 0.000000 0.000000 0.0 0.000000
# [38,] 0 0.000000 0.000000 0.0 0.000000
# [39,] 117 82.731493 67.549981 58.5 52.323991
# [40,] 0 0.000000 0.000000 0.0 0.000000
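One caveat with outer: it calls the function once with two long vectors rather than once per cell, so the function has to be vectorized. myfun above already is. For a function that only handles single values (for example one built around if), wrapping it in Vectorize is one option. A small sketch, where scalar_fun is a made-up example:

```r
# Hypothetical scalar-only function: `if` needs a length-one condition
scalar_fun <- function(i, j) if (j > i) 0 else i * j

rows <- seq(3, 12, 3)
cols <- 1:5

# Vectorize() wraps the function in mapply so outer() can hand it full vectors
res <- outer(rows, cols, Vectorize(scalar_fun))
res
```

Vectorize still evaluates the function cell by cell under the hood, so this is a convenience rather than a speed-up.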
You can always just write an operation directly in terms of row() and col() to create a new matrix
row(settlements)/3 + col(settlements)^2.5
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.333333 5.990188 15.92179 32.33333 56.23503
# [2,] 1.666667 6.323521 16.25512 32.66667 56.56837
# [3,] 2.000000 6.656854 16.58846 33.00000 56.90170
# [4,] 2.333333 6.990188 16.92179 33.33333 57.23503
# [5,] 2.666667 7.323521 17.25512 33.66667 57.56837
# [6,] 3.000000 7.656854 17.58846 34.00000 57.90170
# [7,] 3.333333 7.990188 17.92179 34.33333 58.23503
# [8,] 3.666667 8.323521 18.25512 34.66667 58.56837
# [9,] 4.000000 8.656854 18.58846 35.00000 58.90170
# [10,] 4.333333 8.990188 18.92179 35.33333 59.23503
# [11,] 4.666667 9.323521 19.25512 35.66667 59.56837
# [12,] 5.000000 9.656854 19.58846 36.00000 59.90170
# [13,] 5.333333 9.990188 19.92179 36.33333 60.23503
# [14,] 5.666667 10.323521 20.25512 36.66667 60.56837
# [15,] 6.000000 10.656854 20.58846 37.00000 60.90170
# [16,] 6.333333 10.990188 20.92179 37.33333 61.23503
# [17,] 6.666667 11.323521 21.25512 37.66667 61.56837
# [18,] 7.000000 11.656854 21.58846 38.00000 61.90170
# [19,] 7.333333 11.990188 21.92179 38.33333 62.23503
# [20,] 7.666667 12.323521 22.25512 38.66667 62.56837
# [21,] 8.000000 12.656854 22.58846 39.00000 62.90170
# [22,] 8.333333 12.990188 22.92179 39.33333 63.23503
# [23,] 8.666667 13.323521 23.25512 39.66667 63.56837
# [24,] 9.000000 13.656854 23.58846 40.00000 63.90170
# [25,] 9.333333 13.990188 23.92179 40.33333 64.23503
# [26,] 9.666667 14.323521 24.25512 40.66667 64.56837
# [27,] 10.000000 14.656854 24.58846 41.00000 64.90170
# [28,] 10.333333 14.990188 24.92179 41.33333 65.23503
# [29,] 10.666667 15.323521 25.25512 41.66667 65.56837
# [30,] 11.000000 15.656854 25.58846 42.00000 65.90170
# [31,] 11.333333 15.990188 25.92179 42.33333 66.23503
# [32,] 11.666667 16.323521 26.25512 42.66667 66.56837
# [33,] 12.000000 16.656854 26.58846 43.00000 66.90170
# [34,] 12.333333 16.990188 26.92179 43.33333 67.23503
# [35,] 12.666667 17.323521 27.25512 43.66667 67.56837
# [36,] 13.000000 17.656854 27.58846 44.00000 67.90170
# [37,] 13.333333 17.990188 27.92179 44.33333 68.23503
# [38,] 13.666667 18.323521 28.25512 44.66667 68.56837
# [39,] 14.000000 18.656854 28.58846 45.00000 68.90170
# [40,] 14.333333 18.990188 28.92179 45.33333 69.23503
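If row() and col() are unfamiliar, it may help to look at them on a tiny matrix first. Each returns an integer matrix of the same shape as its argument, holding every cell's row or column index:

```r
m <- matrix(0, nrow = 3, ncol = 2)

r <- row(m)   # each cell holds its own row number
k <- col(m)   # each cell holds its own column number

r
k
```

Any arithmetic on these two matrices is therefore evaluated for all cells in one vectorized step.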
A more complex example using ifelse
new <-
ifelse(sqrt(row(settlements)*col(settlements)) > 5,
row(settlements)*col(settlements)^1.2,
row(settlements) + 44)
dim(new) <- dim(settlements)
new
# [,1] [,2] [,3] [,4] [,5]
# [1,] 45 45.00000 45.00000 45.00000 45.00000
# [2,] 46 46.00000 46.00000 46.00000 46.00000
# [3,] 47 47.00000 47.00000 47.00000 47.00000
# [4,] 48 48.00000 48.00000 48.00000 48.00000
# [5,] 49 49.00000 49.00000 49.00000 49.00000
# [6,] 50 50.00000 50.00000 50.00000 41.39189
# [7,] 51 51.00000 51.00000 36.94622 48.29054
# [8,] 52 52.00000 52.00000 42.22425 55.18919
# [9,] 53 53.00000 33.63474 47.50228 62.08783
# [10,] 54 54.00000 37.37193 52.78032 68.98648
# [11,] 55 55.00000 41.10912 58.05835 75.88513
# [12,] 56 56.00000 44.84631 63.33638 82.78378
# [13,] 57 29.86616 48.58351 68.61441 89.68243
# [14,] 58 32.16355 52.32070 73.89244 96.58108
# [15,] 59 34.46095 56.05789 79.17047 103.47972
# [16,] 60 36.75835 59.79509 84.44851 110.37837
# [17,] 61 39.05574 63.53228 89.72654 117.27702
# [18,] 62 41.35314 67.26947 95.00457 124.17567
# [19,] 63 43.65054 71.00666 100.28260 131.07432
# [20,] 64 45.94793 74.74386 105.56063 137.97297
# [21,] 65 48.24533 78.48105 110.83866 144.87161
# [22,] 66 50.54273 82.21824 116.11670 151.77026
# [23,] 67 52.84012 85.95543 121.39473 158.66891
# [24,] 68 55.13752 89.69263 126.67276 165.56756
# [25,] 69 57.43492 93.42982 131.95079 172.46621
# [26,] 26 59.73231 97.16701 137.22882 179.36486
# [27,] 27 62.02971 100.90421 142.50685 186.26350
# [28,] 28 64.32711 104.64140 147.78489 193.16215
# [29,] 29 66.62450 108.37859 153.06292 200.06080
# [30,] 30 68.92190 112.11578 158.34095 206.95945
# [31,] 31 71.21930 115.85298 163.61898 213.85810
# [32,] 32 73.51669 119.59017 168.89701 220.75675
# [33,] 33 75.81409 123.32736 174.17504 227.65539
# [34,] 34 78.11149 127.06456 179.45308 234.55404
# [35,] 35 80.40888 130.80175 184.73111 241.45269
# [36,] 36 82.70628 134.53894 190.00914 248.35134
# [37,] 37 85.00368 138.27613 195.28717 255.24999
# [38,] 38 87.30107 142.01333 200.56520 262.14864
# [39,] 39 89.59847 145.75052 205.84323 269.04728
# [40,] 40 91.89587 149.48771 211.12127 275.94593
edit
Based on the answer from @IceCreamToucan above (I didn't know about row and col!), you could also do this much more simply as follows:
# FUN is a placeholder for your own function of (i, j) that returns one number
map2_dbl(row(settlements), col(settlements), FUN) %>%
matrix(nrow = 40, ncol = 5)
original answer
Unfortunately, if you want to do something functionally in R, it's much easier to think of things as vectors, rather than as matrices. Fortunately, matrices really are vectors under the hood; the hard part is getting back the i and j indices.
I'm going to use purrr since the syntax is a little easier. This is a bit messy (if you want this to be reproducible, I would clean up the arguments to map_dbl), but the basic idea is to do something like this:
library(functional)
library(purrr)
# Make a function that can convert back and forth between position in vector
#+ and the indices of the same element in the matrix.
get_indices <- function(nrow, n) {
i <- ((n - 1) %% nrow) + 1
j <- ((n - 1) %/% nrow) + 1
list(i = i, j = j)
}
# Initialize your matrix, but fill it with the position of each element of the
#+ vector.
settlements <- matrix(1:200, nrow = 40, ncol = 5)
# Apply your function, and then turn it back into a matrix
map_dbl(settlements, ~do.call("FUN", Curry(get_indices, nrow = 40)(.))) %>%
matrix(nrow = 40, ncol = 5)
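As a sanity check on the index arithmetic, base R's arrayInd() performs the same conversion from vector position to row and column indices. The sketch below repeats get_indices so that it runs on its own and compares the two:

```r
get_indices <- function(nrow, n) {
  i <- ((n - 1) %% nrow) + 1
  j <- ((n - 1) %/% nrow) + 1
  list(i = i, j = j)
}

m <- matrix(1:200, nrow = 40, ncol = 5)

# arrayInd() returns a two-column matrix: row index, column index
idx    <- arrayInd(seq_along(m), dim(m))
manual <- get_indices(nrow(m), seq_along(m))

agree <- all(idx[, 1] == manual$i) && all(idx[, 2] == manual$j)
agree
```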
I'm working a large matrix (187,682,789 x 5)
Say it's build like this:
Day1 <- rep(1, 10)
Lat=sample(30:33, 10, replace=T)
Lon=sample(-30:-33, 10, replace=T)
Var=runif(10,1,100)
Mat1<-cbind(Day1,Lat,Lon,Var)
Day2 <- rep(2, 10)
Lat=sample(30:33, 10, replace=T)
Lon=sample(-30:-33, 10, replace=T)
Var=runif(10,1,100)
Mat2<-cbind(Day2,Lat,Lon,Var)
#... And so on, but let's stick to 2 days for the example
Mat = rbind(Mat1,Mat2)
Of course, the Lat/Lon combinations are not all unique here:
position=cbind(Mat[,2],Mat[,3]) # Lat Lon
nrow(unique(position)) < nrow(position) #True
I would like to obtain a matrix that lists every unique Lat/Lon combination, followed by the corresponding variable for each day.
For example:
> Mat
Day Lat Lon Var
[1,] 1 36 -36 51.086210
[2,] 1 37 -37 48.486008
[3,] 1 38 -38 39.482635
[4,] 1 39 -39 97.848232
[5,] 1 40 -40 71.076543
[6,] 2 31 -31 5.641855
[7,] 2 32 -32 62.124584
[8,] 2 33 -33 39.524119
[9,] 2 34 -34 7.214646
[10,] 2 35 -35 94.254170
[11,] 2 36 -36 40.615783
[12,] 2 37 -37 71.319719
[13,] 2 38 -38 81.775119
[14,] 2 39 -39 49.224411
[15,] 2 40 -40 80.813237
Would become:
>Resulting.Mat.Var
Unique.Lat Unique.Lon Day1 Day2
[1,] 36 -36 51.08621 40.615783
[2,] 37 -37 48.48601 71.319719
[3,] 38 -38 39.48264 81.775119
[4,] 39 -39 97.84823 49.224411
[5,] 40 -40 71.07654 80.813237
[6,] 31 -31 NA 5.641855
[7,] 32 -32 NA 62.124584
[8,] 33 -33 NA 39.524119
[9,] 34 -34 NA 7.214646
[10,] 35 -35 NA 94.254170
I tried to create a matrix of NAs and fill it with two for loops, but it takes far too long!
Many thanks!
Edit:
This is somewhat different from what I found on SO: it really needs to be efficient, all values are numeric, and two columns together form the position...
J
This is a typical "long-to-wide" conversion problem. One possibility to obtain the desired form is to use dcast() from the reshape2 package:
library(reshape2)
as.matrix(dcast(as.data.frame(Mat), Lat + Lon ~ Day, value.var = "Var"))
# Lat Lon 1 2
# [1,] 31 -31 NA 5.641855
# [2,] 32 -32 NA 62.124584
# [3,] 33 -33 NA 39.524119
# [4,] 34 -34 NA 7.214646
# [5,] 35 -35 NA 94.254170
# [6,] 36 -36 51.08621 40.615783
# [7,] 37 -37 48.48601 71.319719
# [8,] 38 -38 39.48264 81.775119
# [9,] 39 -39 97.84823 49.224411
#[10,] 40 -40 71.07654 80.813237
Quite a few similar questions have been answered before on SO, so this is probably a duplicate. However, most questions refer to data.frame structures, and not to matrices.
data:
Mat <- structure(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 36,
37, 38, 39, 40, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, -36,
-37, -38, -39, -40, -31, -32, -33, -34, -35, -36, -37, -38, -39,
-40, 51.08621, 48.486008, 39.482635, 97.848232, 71.076543, 5.641855,
62.124584, 39.524119, 7.214646, 94.25417, 40.615783, 71.319719,
81.775119, 49.224411, 80.813237), .Dim = c(15L, 4L),
.Dimnames = list(NULL, c("Day", "Lat", "Lon", "Var")))
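Since everything here is numeric, the same long-to-wide reshape can also be sketched in base R without leaving matrices, by filling the result through a two-column index matrix. This is only a sketch on a small made-up Mat, and it assumes at most one Var per Lat/Lon/Day combination (a later duplicate would silently overwrite an earlier one). I have not timed it on 187 million rows.

```r
# Small made-up input with the same columns as the question
Mat <- cbind(Day = c(1, 1, 2, 2, 2),
             Lat = c(36, 37, 31, 36, 37),
             Lon = c(-36, -37, -31, -36, -37),
             Var = c(51.1, 48.5, 5.6, 40.6, 71.3))

# Unique positions, in order of first appearance
keys <- unique(Mat[, c("Lat", "Lon"), drop = FALSE])

# Row of the output each input row belongs to (positions matched as strings)
row_id <- match(paste(Mat[, "Lat"], Mat[, "Lon"]),
                paste(keys[, "Lat"], keys[, "Lon"]))

# Column of the output: one per distinct day
days   <- sort(unique(Mat[, "Day"]))
col_id <- match(Mat[, "Day"], days)

wide <- matrix(NA_real_, nrow(keys), length(days),
               dimnames = list(NULL, paste0("Day", days)))
wide[cbind(row_id, col_id)] <- Mat[, "Var"]   # fill every cell in one assignment

res <- cbind(keys, wide)
res
```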
Another method using dplyr is:
library(dplyr)
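# group_by() needs a data frame, so convert the matrix first;
# [1] turns an empty match into NA when a position has no value for that day
Resulting.Mat.Var <- as.matrix(
  as.data.frame(Mat) %>% group_by(Unique.Lat=Lat,Unique.Lon=Lon) %>%
    summarise(Day1=Var[which(Day==1)][1], Day2=Var[which(Day==2)][1]))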
print(Resulting.Mat.Var)
## Unique.Lat Unique.Lon Day1 Day2
## [1,] 31 -31 NA 5.641855
## [2,] 32 -32 NA 62.124584
## [3,] 33 -33 NA 39.524119
## [4,] 34 -34 NA 7.214646
## [5,] 35 -35 NA 94.254170
## [6,] 36 -36 51.08621 40.615783
## [7,] 37 -37 48.48601 71.319719
## [8,] 38 -38 39.48264 81.775119
## [9,] 39 -39 97.84823 49.224411
##[10,] 40 -40 71.07654 80.813237
Looks like a merge to me:
> merge( Mat[Mat[,'Day']==1 , -1], Mat[ Mat[,'Day']==2, -1], by=c(1,2) , all=TRUE)
Lat Lon Var.x Var.y
1 31 -31 NA 5.641855
2 32 -32 NA 62.124584
3 33 -33 NA 39.524119
4 34 -34 NA 7.214646
5 35 -35 NA 94.254170
6 36 -36 51.08621 40.615783
7 37 -37 48.48601 71.319719
8 38 -38 39.48264 81.775119
9 39 -39 97.84823 49.224411
10 40 -40 71.07654 80.813237
The result is a data.frame, so coerce it with as.matrix() if you need a matrix.
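The merge above handles exactly two days. With more days, base R's reshape() does the same job in one call. A minimal sketch on a small made-up data frame (not the real data):

```r
# Small made-up input with the same columns as the question
df <- data.frame(Day = c(1, 1, 2, 2, 2),
                 Lat = c(36, 37, 31, 36, 37),
                 Lon = c(-36, -37, -31, -36, -37),
                 Var = c(51.1, 48.5, 5.6, 40.6, 71.3))

# One row per Lat/Lon pair, one Var.<day> column per distinct Day
wide <- reshape(df, idvar = c("Lat", "Lon"), timevar = "Day",
                direction = "wide")
wide
```

Positions with no value for a given day get NA in that day's column, as in the merge with all=TRUE.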
Am I the only one having trouble extracting the coordinates of a polygon from a SpatialPolygonsDataFrame object? I can extract other slots of the object (ID, plotOrder) but not the coordinates (coords), and I don't know what I am doing wrong. Below is my R session, where bdryData is a SpatialPolygonsDataFrame object with two polygons.
> bdryData
An object of class "SpatialPolygonsDataFrame"
Slot "data":
ID GRIDCODE
0 1 0
1 2 0
Slot "polygons":
[[1]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 415499.1 432781.7
Slot "area":
[1] 0.6846572
Slot "hole":
[1] FALSE
Slot "ringDir":
[1] 1
Slot "coords":
[,1] [,2]
[1,] 415499.6 432781.2
[2,] 415498.4 432781.5
[3,] 415499.3 432782.4
[4,] 415499.6 432781.2
Slot "plotOrder":
[1] 1
Slot "labpt":
[1] 415499.1 432781.7
Slot "ID":
[1] "0"
Slot "area":
[1] 0.6846572
[[2]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 415587.3 432779.4
Slot "area":
[1] 20712.98
Slot "hole":
[1] FALSE
Slot "ringDir":
[1] 1
Slot "coords":
[,1] [,2]
[1,] 415499.6 432781.2
[2,] 415505.0 432781.8
[3,] 415506.5 432792.6
[4,] 415508.9 432792.8
[5,] 415515.0 432791.5
[6,] 415517.7 432795.6
[7,] 415528.6 432797.7
[8,] 415538.8 432804.2
[9,] 415543.2 432805.8
[10,] 415545.1 432803.6
[11,] 415547.1 432804.7
[12,] 415551.7 432805.8
[13,] 415557.5 432812.3
[14,] 415564.2 432817.1
[15,] 415568.5 432823.9
[16,] 415571.0 432826.8
[17,] 415573.2 432828.7
[18,] 415574.1 432829.7
[19,] 415576.2 432830.7
[20,] 415580.2 432833.8
[21,] 415589.6 432836.0
[22,] 415593.1 432841.0
[23,] 415592.2 432843.7
[24,] 415590.6 432846.6
[25,] 415589.0 432853.3
[26,] 415584.8 432855.3
[27,] 415579.7 432859.8
[28,] 415577.7 432866.2
[29,] 415575.6 432868.1
[30,] 415566.7 432880.7
[31,] 415562.7 432887.5
[32,] 415559.2 432889.1
[33,] 415561.5 432890.7
[34,] 415586.2 432889.7
[35,] 415587.1 432888.6
[36,] 415588.5 432890.2
[37,] 415598.2 432888.7
[38,] 415599.1 432887.7
[39,] 415601.2 432886.7
[40,] 415603.1 432885.7
[41,] 415605.2 432884.7
[42,] 415606.1 432882.7
[43,] 415607.2 432880.7
[44,] 415608.3 432878.3
[45,] 415612.2 432874.8
[46,] 415614.7 432871.9
[47,] 415617.1 432870.7
[48,] 415622.4 432868.2
[49,] 415622.0 432862.4
[50,] 415624.2 432855.4
[51,] 415633.2 432845.3
[52,] 415639.0 432841.1
[53,] 415642.8 432832.9
[54,] 415647.5 432828.7
[55,] 415654.3 432820.3
[56,] 415654.1 432816.5
[57,] 415658.2 432812.8
[58,] 415661.9 432808.6
[59,] 415663.5 432808.7
[60,] 415668.1 432803.5
[61,] 415676.5 432801.3
[62,] 415679.1 432802.7
[63,] 415680.1 432802.7
[64,] 415681.1 432802.7
[65,] 415682.2 432802.7
[66,] 415685.8 432804.7
[67,] 415691.8 432802.2
[68,] 415693.6 432798.9
[69,] 415696.2 432777.0
[70,] 415689.8 432773.5
[71,] 415683.7 432771.6
[72,] 415680.2 432766.7
[73,] 415679.0 432765.6
[74,] 415676.8 432753.7
[75,] 415671.4 432747.7
[76,] 415662.7 432747.2
[77,] 415658.7 432750.0
[78,] 415657.0 432746.3
[79,] 415654.1 432743.7
[80,] 415652.3 432739.8
[81,] 415649.6 432739.6
[82,] 415648.0 432739.7
[83,] 415641.9 432736.4
[84,] 415633.4 432736.9
[85,] 415630.2 432734.7
[86,] 415622.3 432733.6
[87,] 415614.4 432726.5
[88,] 415617.1 432719.1
[89,] 415612.5 432718.1
[90,] 415610.0 432720.9
[91,] 415606.2 432716.6
[92,] 415603.2 432713.9
[93,] 415601.4 432710.0
[94,] 415580.3 432708.7
[95,] 415545.1 432709.7
[96,] 415543.5 432711.5
[97,] 415534.0 432715.7
[98,] 415527.1 432713.7
[99,] 415521.1 432711.6
[100,] 415505.6 432710.6
[101,] 415501.3 432710.9
[102,] 415499.3 432708.7
[103,] 415495.6 432711.6
[104,] 415482.6 432726.2
[105,] 415477.2 432734.0
[106,] 415478.1 432737.7
[107,] 415479.2 432739.7
[108,] 415480.9 432743.4
[109,] 415486.5 432751.2
[110,] 415493.2 432760.7
[111,] 415494.1 432762.7
[112,] 415498.1 432767.9
[113,] 415497.2 432770.7
[114,] 415490.6 432773.2
[115,] 415493.2 432775.6
[116,] 415496.0 432778.7
[117,] 415499.2 432779.7
[118,] 415499.6 432781.2
Slot "plotOrder":
[1] 1
Slot "labpt":
[1] 415587.3 432779.4
Slot "ID":
[1] "1"
Slot "area":
[1] 20712.98
Slot "plotOrder":
[1] 2 1
Slot "bbox":
min max
x 415477.2 415696.2
y 432708.7 432890.7
Slot "proj4string":
CRS arguments:
+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000
+datum=OSGB36 +units=m +no_defs +ellps=airy
+towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894
Subsetting second polygon from bdryData
> bdryData@polygons[[2]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 415587.3 432779.4
Slot "area":
[1] 20712.98
Slot "hole":
[1] FALSE
Slot "ringDir":
[1] 1
Slot "coords":
[,1] [,2]
[1,] 415499.6 432781.2
[2,] 415505.0 432781.8
[3,] 415506.5 432792.6
[4,] 415508.9 432792.8
[5,] 415515.0 432791.5
[6,] 415517.7 432795.6
[7,] 415528.6 432797.7
[8,] 415538.8 432804.2
[9,] 415543.2 432805.8
[10,] 415545.1 432803.6
[11,] 415547.1 432804.7
[12,] 415551.7 432805.8
[13,] 415557.5 432812.3
[14,] 415564.2 432817.1
[15,] 415568.5 432823.9
[16,] 415571.0 432826.8
[17,] 415573.2 432828.7
[18,] 415574.1 432829.7
[19,] 415576.2 432830.7
[20,] 415580.2 432833.8
[21,] 415589.6 432836.0
[22,] 415593.1 432841.0
[23,] 415592.2 432843.7
[24,] 415590.6 432846.6
[25,] 415589.0 432853.3
[26,] 415584.8 432855.3
[27,] 415579.7 432859.8
[28,] 415577.7 432866.2
[29,] 415575.6 432868.1
[30,] 415566.7 432880.7
[31,] 415562.7 432887.5
[32,] 415559.2 432889.1
[33,] 415561.5 432890.7
[34,] 415586.2 432889.7
[35,] 415587.1 432888.6
[36,] 415588.5 432890.2
[37,] 415598.2 432888.7
[38,] 415599.1 432887.7
[39,] 415601.2 432886.7
[40,] 415603.1 432885.7
[41,] 415605.2 432884.7
[42,] 415606.1 432882.7
[43,] 415607.2 432880.7
[44,] 415608.3 432878.3
[45,] 415612.2 432874.8
[46,] 415614.7 432871.9
[47,] 415617.1 432870.7
[48,] 415622.4 432868.2
[49,] 415622.0 432862.4
[50,] 415624.2 432855.4
[51,] 415633.2 432845.3
[52,] 415639.0 432841.1
[53,] 415642.8 432832.9
[54,] 415647.5 432828.7
[55,] 415654.3 432820.3
[56,] 415654.1 432816.5
[57,] 415658.2 432812.8
[58,] 415661.9 432808.6
[59,] 415663.5 432808.7
[60,] 415668.1 432803.5
[61,] 415676.5 432801.3
[62,] 415679.1 432802.7
[63,] 415680.1 432802.7
[64,] 415681.1 432802.7
[65,] 415682.2 432802.7
[66,] 415685.8 432804.7
[67,] 415691.8 432802.2
[68,] 415693.6 432798.9
[69,] 415696.2 432777.0
[70,] 415689.8 432773.5
[71,] 415683.7 432771.6
[72,] 415680.2 432766.7
[73,] 415679.0 432765.6
[74,] 415676.8 432753.7
[75,] 415671.4 432747.7
[76,] 415662.7 432747.2
[77,] 415658.7 432750.0
[78,] 415657.0 432746.3
[79,] 415654.1 432743.7
[80,] 415652.3 432739.8
[81,] 415649.6 432739.6
[82,] 415648.0 432739.7
[83,] 415641.9 432736.4
[84,] 415633.4 432736.9
[85,] 415630.2 432734.7
[86,] 415622.3 432733.6
[87,] 415614.4 432726.5
[88,] 415617.1 432719.1
[89,] 415612.5 432718.1
[90,] 415610.0 432720.9
[91,] 415606.2 432716.6
[92,] 415603.2 432713.9
[93,] 415601.4 432710.0
[94,] 415580.3 432708.7
[95,] 415545.1 432709.7
[96,] 415543.5 432711.5
[97,] 415534.0 432715.7
[98,] 415527.1 432713.7
[99,] 415521.1 432711.6
[100,] 415505.6 432710.6
[101,] 415501.3 432710.9
[102,] 415499.3 432708.7
[103,] 415495.6 432711.6
[104,] 415482.6 432726.2
[105,] 415477.2 432734.0
[106,] 415478.1 432737.7
[107,] 415479.2 432739.7
[108,] 415480.9 432743.4
[109,] 415486.5 432751.2
[110,] 415493.2 432760.7
[111,] 415494.1 432762.7
[112,] 415498.1 432767.9
[113,] 415497.2 432770.7
[114,] 415490.6 432773.2
[115,] 415493.2 432775.6
[116,] 415496.0 432778.7
[117,] 415499.2 432779.7
[118,] 415499.6 432781.2
Slot "plotOrder":
[1] 1
Slot "labpt":
[1] 415587.3 432779.4
Slot "ID":
[1] "1"
Slot "area":
[1] 20712.98
Extracting slots
> bdryData@polygons[[2]]@ID
[1] "1"
> bdryData@polygons[[2]]@plotOrder
[1] 1
But problem with coordinates
> bdryData@polygons[[2]]@coords
Error: no slot of name "coords" for this object of class "Polygons"
Any help is really appreciated. Thanks.
I finally figured out that I was not descending far enough into the object. The correct way is bdryData@polygons[[2]]@Polygons[[1]]@coords. Mind the difference between the polygons slot (lower case, a list of Polygons objects) and the Polygons slot (upper case, a list of Polygon objects); it took me ages to spot.
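The nesting is easier to see on a small hand-built object. The sketch below assumes the sp package is installed; the two rings and the IDs "a" and "b" are made up for illustration.

```r
library(sp)

# Two made-up closed rings (first vertex repeated at the end)
tri <- Polygon(cbind(c(0, 0, 1, 0), c(0, 1, 0, 0)))
sq  <- Polygon(cbind(c(2, 2, 3, 3, 2), c(0, 1, 1, 0, 0)))

# SpatialPolygons -> @polygons (list of "Polygons") -> @Polygons (list of "Polygon")
shapes <- SpatialPolygons(list(Polygons(list(tri), ID = "a"),
                               Polygons(list(sq),  ID = "b")))

feature2 <- shapes@polygons[[2]]     # class "Polygons": has ID and plotOrder
ring     <- feature2@Polygons[[1]]   # class "Polygon": this level holds coords
xy       <- ring@coords              # vertex matrix of the square

# Which of the two levels actually has a slot called "coords"?
has_coords <- c(Polygons = "coords" %in% slotNames(feature2),
                Polygon  = "coords" %in% slotNames(ring))
has_coords
```

Calling slotNames() on each level, as in the last step, is a quick way to find out which slots exist before reaching for @.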
Use the coordinates() function from the sp package. It should give you the values in a list format.
You can also get the Polygon attribute from the shapefile.
library(rgdal) # provides readOGR(); dsn and layername are placeholders for your own data source
mfile = readOGR(dsn=dsn,layer=layername)
polys = attr(mfile,'polygons')
npolys = length(polys)
for (i in 1:npolys){
poly = polys[[i]]
polys2 = attr(poly,'Polygons')
npolys2 = length(polys2)
for (j in 1:npolys2){
#do stuff with these values
coords = coordinates(polys2[[j]])
}
}
This took me a while to figure out too. The following function I wrote worked for me. sp.df should be SpatialPolygonsDataFrame.
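extractCoords <- function(sp.df)
{
  # Collect the vertex matrix of every ring in the first feature only
  results <- list()
  for(i in seq_along(sp.df@polygons[[1]]@Polygons))
  {
    results[[i]] <- sp.df@polygons[[1]]@Polygons[[i]]@coords
  }
  # Stack the per-ring matrices into one two-column matrix
  results <- Reduce(rbind, results)
  results
}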
This question was also addressed on gis.stackexchange, here. I made an example below testing all the options mentioned here by @mdsumner. Also have a look here.
library(sp)
library(sf)
#> Warning: package 'sf' was built under R version 3.5.3
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(raster)
library(spbabel)
#> Warning: package 'spbabel' was built under R version 3.5.3
library(tmap)
library(microbenchmark)
library(ggplot2)
# Prepare data
data(World)
# Convert from sf to sp objects
atf_sf <- World[World$iso_a3 == "ATF", ]
atf_sp <- as(atf_sf, "Spatial")
atf_sp
#> class : SpatialPolygonsDataFrame
#> features : 1
#> extent : 5490427, 5660887, -6048972, -5932855 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
#> variables : 15
#> # A tibble: 1 x 15
#> iso_a3 name sovereignt continent area pop_est pop_est_dens economy
#> <fct> <fct> <fct> <fct> <S3:> <dbl> <dbl> <fct>
#> 1 ATF Fr. ~ France Seven se~ 7257~ 140 0.0193 6. Dev~
#> # ... with 7 more variables: income_grp <fct>, gdp_cap_est <dbl>,
#> # life_exp <dbl>, well_being <dbl>, footprint <dbl>, inequality <dbl>,
#> # HPI <dbl>
# Try various functions:
raster::geom(atf_sp)
#> object part cump hole x y
#> [1,] 1 1 1 0 5550200 -5932855
#> [2,] 1 1 1 0 5589907 -5964836
#> [3,] 1 1 1 0 5660887 -5977490
#> [4,] 1 1 1 0 5656160 -5996685
#> [5,] 1 1 1 0 5615621 -6042456
#> [6,] 1 1 1 0 5490427 -6048972
#> [7,] 1 1 1 0 5509148 -5995424
#> [8,] 1 1 1 0 5536900 -5953683
#> [9,] 1 1 1 0 5550200 -5932855
ggplot2::fortify(atf_sp)
#> Regions defined for each Polygons
#> long lat order hole piece id group
#> 1 5550200 -5932855 1 FALSE 1 8 8.1
#> 2 5589907 -5964836 2 FALSE 1 8 8.1
#> 3 5660887 -5977490 3 FALSE 1 8 8.1
#> 4 5656160 -5996685 4 FALSE 1 8 8.1
#> 5 5615621 -6042456 5 FALSE 1 8 8.1
#> 6 5490427 -6048972 6 FALSE 1 8 8.1
#> 7 5509148 -5995424 7 FALSE 1 8 8.1
#> 8 5536900 -5953683 8 FALSE 1 8 8.1
#> 9 5550200 -5932855 9 FALSE 1 8 8.1
spbabel::sptable(atf_sp)
#> # A tibble: 9 x 6
#> object_ branch_ island_ order_ x_ y_
#> <int> <int> <lgl> <int> <dbl> <dbl>
#> 1 1 1 TRUE 1 5550200. -5932855.
#> 2 1 1 TRUE 2 5589907. -5964836.
#> 3 1 1 TRUE 3 5660887. -5977490.
#> 4 1 1 TRUE 4 5656160. -5996685.
#> 5 1 1 TRUE 5 5615621. -6042456.
#> 6 1 1 TRUE 6 5490427. -6048972.
#> 7 1 1 TRUE 7 5509148. -5995424.
#> 8 1 1 TRUE 8 5536900. -5953683.
#> 9 1 1 TRUE 9 5550200. -5932855.
as.data.frame(as(as(atf_sp, "SpatialLinesDataFrame"),"SpatialPointsDataFrame"))
#> iso_a3 name sovereignt continent
#> 8 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> 8.1 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> 8.2 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> 8.3 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> 8.4 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> 8.5 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> 8.6 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> 8.7 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> 8.8 ATF Fr. S. Antarctic Lands France Seven seas (open ocean)
#> area pop_est pop_est_dens economy
#> 8 7257.455 [km^2] 140 0.01929051 6. Developing region
#> 8.1 7257.455 [km^2] 140 0.01929051 6. Developing region
#> 8.2 7257.455 [km^2] 140 0.01929051 6. Developing region
#> 8.3 7257.455 [km^2] 140 0.01929051 6. Developing region
#> 8.4 7257.455 [km^2] 140 0.01929051 6. Developing region
#> 8.5 7257.455 [km^2] 140 0.01929051 6. Developing region
#> 8.6 7257.455 [km^2] 140 0.01929051 6. Developing region
#> 8.7 7257.455 [km^2] 140 0.01929051 6. Developing region
#> 8.8 7257.455 [km^2] 140 0.01929051 6. Developing region
#> income_grp gdp_cap_est life_exp well_being footprint
#> 8 2. High income: nonOECD 114285.7 NA NA NA
#> 8.1 2. High income: nonOECD 114285.7 NA NA NA
#> 8.2 2. High income: nonOECD 114285.7 NA NA NA
#> 8.3 2. High income: nonOECD 114285.7 NA NA NA
#> 8.4 2. High income: nonOECD 114285.7 NA NA NA
#> 8.5 2. High income: nonOECD 114285.7 NA NA NA
#> 8.6 2. High income: nonOECD 114285.7 NA NA NA
#> 8.7 2. High income: nonOECD 114285.7 NA NA NA
#> 8.8 2. High income: nonOECD 114285.7 NA NA NA
#> inequality HPI Lines.NR Lines.ID Line.NR coords.x1 coords.x2
#> 8 NA NA 1 8 1 5550200 -5932855
#> 8.1 NA NA 1 8 1 5589907 -5964836
#> 8.2 NA NA 1 8 1 5660887 -5977490
#> 8.3 NA NA 1 8 1 5656160 -5996685
#> 8.4 NA NA 1 8 1 5615621 -6042456
#> 8.5 NA NA 1 8 1 5490427 -6048972
#> 8.6 NA NA 1 8 1 5509148 -5995424
#> 8.7 NA NA 1 8 1 5536900 -5953683
#> 8.8 NA NA 1 8 1 5550200 -5932855
# What about speed? raster::geom is the fastest
res <- microbenchmark(raster::geom(atf_sp),
ggplot2::fortify(atf_sp),
spbabel::sptable(atf_sp),
as.data.frame(as(as(atf_sp, "SpatialLinesDataFrame"),
"SpatialPointsDataFrame")))
ggplot2::autoplot(res)
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Created on 2019-03-23 by the reprex package (v0.2.1)
ggplot2's fortify() function may be deprecated at some point, so the broom package is now suggested instead:
library(broom)
broom::tidy(atf_sp)
The only valid answer on this posting was provided by the author "repres_package" above. See that author's recommended solutions if you want to get the right answer. If you want to obtain the geometry of a polygon dataset, you are seeking the long and lat of every single vertex in the polygon feature class. The author's suggestion of using raster::geom() or ggplot2::fortify(), for example, will give you every vertex contained in the SpatialPolygonsDataFrame. That's what you want. The other authors' answers fail to do so.
For example, in my SpatialPolygonsDataFrame of North Carolina counties (from the US Census), I have a total of 1259547 vertices. By using raster::geom(NC_counties), I am given a data frame that contains a long and lat for each of those 1259547 vertices. I could also use ggplot2::fortify(NC_counties) to obtain coordinates for those 1259547 vertices. All of the valid options are given in the answer by "repres_package".
When I ran the code recommended in the other answers on this posting, I obtained long and lat coordinates for only 672 vertices, 1041 vertices, or 1721 vertices, which is off by over one million vertices. I'm supposed to get long and lat coordinates for 1259547 vertices. I suspect that those snippets are interpolating centroids for the polygons, which is not the geometry of the polygons.
> uc<-unique(r1$COMPANY)
> uc
[1] AZTEC CALIBER POINT COGNIZANT CYBAGE CYBAGE DLF
[7] GODREJ AND BOYCE LTD. HCL TECHNOLOGIES I-FLEX INFOCEPTS INFOSYS JATAAYU SOFTWARE (P) LTD.
[13] KANBAY KPIT L & T LTD. L & T INFOTECH MASTEK mBlazon SOLUTION PVT. LTD.
[19] MOTOR INDUSTRIES LTD NOVATECH PATNI COMPUTER SOFTWARE RF ARRAYS S.M. WIRELESS PVT. LTD. SATYAM COMPUTERS
[25] SATYAM COMPUTERS SATYAM COMPUTERS LTD. SHOBHA DEVELOPERS SOHAM's FOUNDTION ENGG. SYNTEL LTD. TCS
[31] TECH MAHINDRA LTD. ULTRA TECH CEMENT VRITTI SOLUTIONS ABO SOFTWARE ARTEFACT PROJECTS LTD. EATON
[37] FORCE MOTORS H.C.C. HEXAWARE TECHNOLOGIES HJB GROUP COMPANY, OMAN IBM DAKSH INDIAN MILITARY ACADEMY
[43] INDO RAMA SYNTHETICS INFOSPECTRUM PVT. LTD. JYOTI STRUCTUIRES LTD. KALPATARU KONE ELEVATORS L & T ( e- SOLUTIONS)
[49] LAMBENT MAHINDRA & MAHINDRA LTD. MAYTAS INFRA PVT. LTD. MOTOR INDUSTRIES LTD . ORIENT CEMENT PERSISTENT SYSTEMS PVT.LTD
[55] PREMIERE TECHNMOLOGY SCHNEIDER SETH CONSTRUCTION SIEMENS SIMPLEX SMS PARYAWARAN
[61] SOFT LINK INTERNATIONAL VARROC ENGINEERING, PUNE
> f<-length(uc)
> tt<-mat.or.vec(length(uc),3)
> for(n in 1:f)
+ {
+ tt[n,1]=uc[n]
+ tt[n,2]=5
+ tt[n,3]=9
+ }
> tt
[,1] [,2] [,3]
[1,] 3 5 9
[2,] 4 5 9
[3,] 5 5 9
[4,] 6 5 9
[5,] 7 5 9
[6,] 8 5 9
[7,] 11 5 9
[8,] 13 5 9
[9,] 16 5 9
[10,] 20 5 9
[11,] 22 5 9
[12,] 23 5 9
[13,] 26 5 9
[14,] 28 5 9
[15,] 29 5 9
[16,] 31 5 9
[17,] 34 5 9
[18,] 36 5 9
[19,] 37 5 9
[20,] 39 5 9
[21,] 41 5 9
[22,] 44 5 9
[23,] 45 5 9
[24,] 46 5 9
[25,] 47 5 9
[26,] 48 5 9
[27,] 51 5 9
[28,] 56 5 9
[29,] 57 5 9
[30,] 58 5 9
[31,] 59 5 9
[32,] 60 5 9
[33,] 62 5 9
[34,] 1 5 9
[35,] 2 5 9
[36,] 9 5 9
[37,] 10 5 9
[38,] 12 5 9
[39,] 14 5 9
[40,] 15 5 9
[41,] 17 5 9
[42,] 18 5 9
[43,] 19 5 9
[44,] 21 5 9
[45,] 24 5 9
[46,] 25 5 9
[47,] 27 5 9
[48,] 30 5 9
[49,] 32 5 9
[50,] 33 5 9
[51,] 35 5 9
[52,] 38 5 9
[53,] 40 5 9
[54,] 42 5 9
[55,] 43 5 9
[56,] 49 5 9
[57,] 50 5 9
[58,] 52 5 9
[59,] 53 5 9
[60,] 54 5 9
[61,] 55 5 9
[62,] 61 5 9
> strsplit(r1$COMPANY," ")
Error in strsplit(r1$COMPANY, " ") : non-character argument
I want tt[n,1] to store all the unique values of r1$COMPANY held in the vector uc, but it shows what look like random numbers instead. Please help me resolve this. I would also like to know how to obtain just the first word of each r1$COMPANY value. I tried splitting on the space character, but it gives an error.
As @Jason and @Metrics mentioned, it appears that uc is a factor rather than a character vector. To remedy this, define uc as follows:
uc <- with(r1, as.character(unique(COMPANY)))
The key here is as.character(), which will convert a factor variable to a character variable. Any string manipulation functions should now work on uc.
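As a minimal sketch of why the numbers appear and how the first word could be extracted: the `company` vector below is a hypothetical stand-in for r1$COMPANY (the real data is not shown in the question), and `sub()` is just one of several ways to keep the text before the first space.

```r
# Hypothetical stand-in for r1$COMPANY, stored as a factor
company <- factor(c("TCS", "L & T LTD.", "SATYAM COMPUTERS", "TCS"))

uc_factor <- unique(company)

# A factor is stored as integer codes into its (sorted) levels;
# these codes are the "random numbers" seen in the matrix
codes <- as.integer(uc_factor)
codes

# Converting to character recovers the labels
uc <- as.character(uc_factor)
uc

# First word of each name: drop everything from the first space onward
first_word <- sub(" .*", "", uc)
first_word

# Equivalent approach with strsplit(), which now works because uc is character
first_word2 <- vapply(strsplit(uc, " ", fixed = TRUE), `[`, character(1), 1)
```

Note that a name such as "L & T LTD." yields just "L" as its first word, so names with embedded spaces may need a different rule.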
# I think `uc` is a factor in your case (check using str(uc)). If it were a character vector, the loop would give the expected result. Consider the following example:
uc<-names(mtcars)
#str(uc)
#chr [1:11] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
f<-length(uc)
tt<-mat.or.vec(f,3)
for(n in 1:f) {
tt[n,1]<-uc[n]
tt[n,2]<-5
tt[n,3]<-9
}
tt
[,1] [,2] [,3]
[1,] "mpg" "5" "9"
[2,] "cyl" "5" "9"
[3,] "disp" "5" "9"
[4,] "hp" "5" "9"
[5,] "drat" "5" "9"
[6,] "wt" "5" "9"
[7,] "qsec" "5" "9"
[8,] "vs" "5" "9"
[9,] "am" "5" "9"
[10,] "gear" "5" "9"
[11,] "carb" "5" "9"
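In the output above, the 5 and 9 have become the strings "5" and "9", because a matrix can hold only one type. If the numeric columns should stay numeric, a data frame is one possible alternative. The sketch below assumes that is what is wanted; the column names `name`, `a` and `b` are made up for illustration.

```r
uc <- names(mtcars)

# One column per variable; each column keeps its own type, and no loop is needed
tt <- data.frame(name = uc, a = 5, b = 9, stringsAsFactors = FALSE)

str(tt)
```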
I have used PCA on 2D arrays before, and I use the first PC score vector, the one that best describes the variance across all the columns, in my analyses. Below is an R example that shows the Comp.1 vector that would best describe the variance of the 2D array of interest.
data <- array(data=sample(12), c(4,3))
data
[,1] [,2] [,3]
[1,] 11 2 12
[2,] 4 3 10
[3,] 8 7 1
[4,] 6 9 5
output=princomp(data)
output$scores
Comp.1 Comp.2 Comp.3
[1,] 6.422813 2.865390 0.4025040
[2,] 3.251842 -3.617633 -0.9814571
[3,] -5.856500 1.848419 -1.3819379
[4,] -3.818155 -1.096176 1.9608909
My question is: how can I apply the same procedure to a 3D array? For example, if I have an array of size 4 x 5 x 3, how could I get the 4 x 5 2D array that is equivalent to the Comp.1 vector found above?
I have provided an R example below with code and outputs. When I look at the scores, it only outputs one component (not 3 as expected), and the length is 60. Does that mean that the first 20 elements correspond to the first PC, the next 20 to the 2nd PC, and the last 20 to the 3rd PC? If so, how does princomp arrange the entries, so that I can get back to the original 4 x 5 2D array using the first 20 elements (1st PC)? Thank you for your assistance.
data=array(data=sample(48), c(4,5,3))
data
, , 1
[,1] [,2] [,3] [,4] [,5]
[1,] 47 21 45 41 34
[2,] 1 16 32 31 37
[3,] 39 8 35 10 6
[4,] 48 14 25 3 11
, , 2
[,1] [,2] [,3] [,4] [,5]
[1,] 12 43 15 36 23
[2,] 17 4 7 26 46
[3,] 2 13 33 20 40
[4,] 18 19 28 44 38
, , 3
[,1] [,2] [,3] [,4] [,5]
[1,] 42 24 47 21 45
[2,] 5 22 1 16 32
[3,] 30 29 39 8 35
[4,] 27 9 48 14 25
output=princomp(data)
output$scores
Comp.1
[1,] 21.8833333
[2,] -24.1166667
[3,] 13.8833333
[4,] 22.8833333
[5,] -4.1166667
[6,] -9.1166667
[7,] -17.1166667
[8,] -11.1166667
[9,] 19.8833333
[10,] 6.8833333
[11,] 9.8833333
[12,] -0.1166667
[13,] 15.8833333
[14,] 5.8833333
[15,] -15.1166667
[16,] -22.1166667
[17,] 8.8833333
[18,] 11.8833333
[19,] -19.1166667
[20,] -14.1166667
[21,] -13.1166667
[22,] -8.1166667
[23,] -23.1166667
[24,] -7.1166667
[25,] 17.8833333
[26,] -21.1166667
[27,] -12.1166667
[28,] -6.1166667
[29,] -10.1166667
[30,] -18.1166667
[31,] 7.8833333
[32,] 2.8833333
[33,] 10.8833333
[34,] 0.8833333
[35,] -5.1166667
[36,] 18.8833333
[37,] -2.1166667
[38,] 20.8833333
[39,] 14.8833333
[40,] 12.8833333
[41,] 16.8833333
[42,] -20.1166667
[43,] 4.8833333
[44,] 1.8833333
[45,] -1.1166667
[46,] -3.1166667
[47,] 3.8833333
[48,] -16.1166667
[49,] 21.8833333
[50,] -24.1166667
[51,] 13.8833333
[52,] 22.8833333
[53,] -4.1166667
[54,] -9.1166667
[55,] -17.1166667
[56,] -11.1166667
[57,] 19.8833333
[58,] 6.8833333
[59,] 9.8833333
[60,] -0.1166667
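The single Comp.1 column of length 60 suggests that princomp treated the whole array as one variable with 60 observations, so the three blocks of 20 values would not be three separate PCs. One possible approach, sketched here under the assumption that each of the 4 x 5 cells is an observation and each of the 3 slices is a variable, is to flatten the array to a 20 x 3 matrix, run princomp on that, and fold the first score vector back into a 4 x 5 matrix. The names `arr`, `m`, `pc` and `pc1` are illustrative.

```r
set.seed(1)
arr <- array(sample(60), c(4, 5, 3))

# Flatten the first two dimensions: 20 rows (cells), 3 columns (slices).
# R stores arrays column-major, so cell (i, j) lands in row i + (j - 1) * 4.
m <- matrix(arr, nrow = prod(dim(arr)[1:2]), ncol = dim(arr)[3])

pc <- princomp(m)

# Scores now have one column per component (3 in total)
dim(pc$scores)

# Fold the first component back into the original 4 x 5 layout
pc1 <- matrix(pc$scores[, 1], nrow = dim(arr)[1], ncol = dim(arr)[2])
pc1
```

Because both the flattening and the refolding use the same column-major order, `pc1[i, j]` is the first-component score for the cell `arr[i, j, ]`.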