Count values greater than x within subsets of a matrix? - r

I have a matrix (49 rows x 533 columns) and the columns are subsetted into 5 "subtypes".
For each row, I want to count how many values are greater than 1 within each subtype
For example, with subsets A, B, C, D and E: "In row i, how many of the values in subset A are greater than 1?", and the same for B, C, D and E, for every row.
Using tapply() and length() I am able to count the values for each row by subtype:
lengthBySubtype <- function(x) {tapply(x,subtypes,length)}
apply(dataMatrix,1,lengthBySubtype)
My code returns, for each row, the number of values in each subset. Here's a small chunk of the results:
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17
A 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111
B 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74
C 195 195 195 195 195 195 195 195 195 195 195 195 195 195 195 195 195
D 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128
E 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25
It's in the exact format I want, but what if I only want to count values that meet a certain condition? (e.g. are greater than 1 in my case). Is there a different function that would work with the apply family for this?

Assuming "matrix" here means an R matrix, this demonstrates the suggestion from the comments on the example output:
M <- data.matrix(read.table(text = " r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17
A 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111
B 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74
C 195 195 195 195 195 195 195 195 195 195 195 195 195 195 195 195 195
D 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128 128
E 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25", header = TRUE))
x <- 50
rowSums(M > x)  # M > x is a logical matrix; rowSums counts the TRUEs in each row
# A B C D E
#17 17 17 17 0
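The rowSums() call counts across a whole row. To keep the per-subtype layout from the question, one option is to swap length() for a sum over a logical comparison inside the same tapply() call. This is a minimal sketch: the small dataMatrix and subtypes objects below are made-up stand-ins for the 49 x 533 matrix and its subtype labels, and countBySubtype is a hypothetical helper name.

```r
set.seed(1)
# Hypothetical stand-in data: 4 rows, 10 columns, 3 subtypes
dataMatrix <- matrix(rnorm(4 * 10, mean = 1), nrow = 4)
subtypes <- rep(c("A", "B", "C"), times = c(4, 3, 3))

# x > threshold is logical, so sum() counts the values that pass
countBySubtype <- function(x, threshold = 1) {
  tapply(x > threshold, subtypes, sum)
}
counts <- apply(dataMatrix, 1, countBySubtype)  # subtypes in rows, matrix rows in columns

# Alternative without apply(): rowsum() adds up rows by group
alt <- rowsum(+t(dataMatrix > 1), subtypes)
counts
```

Both forms return one row per subtype and one column per row of the input, matching the layout of the length() result in the question.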

Related

K-nearest neighbor for spatial weights R

I was wondering if you could help me with this problem. I have a dataset of US counties that I am trying to do k-nearest neighbor analysis for spatial weighting, following the method proposed here (section 4.5), but the results aren't making sense, or potentially I'm not understanding them.
library(spdep)
library(tigris)
library(sf)
counties <- counties("Georgia", cb = TRUE)
coords <- st_centroid(st_geometry(counties), of_largest_polygon=TRUE)
col.knn <- knearneigh(coords)
gck4.nb <- knn2nb(knearneigh(coords, k=4, longlat=TRUE))
summary(gck4.nb, coords, longlat=TRUE, scale=0.5)
However, the link distances in the output seem far too small, all less than 1 km:
Neighbour list object:
Number of regions: 159
Number of nonzero links: 636
Percentage nonzero weights: 2.515723
Average number of links: 4
Non-symmetric neighbours list
Link number distribution:
4
159
159 least connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 with 4 links
159 most connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 with 4 links
Summary of link distances:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1355 0.2650 0.3085 0.3112 0.3482 0.6224
The decimal point is 1 digit(s) to the left of the |
1 | 44
1 | 7799999999999999
2 | 00000000000011111111112222222222222233333333333333333333333333444444
2 | 55555555555555555555555555556666666666666666666666666666666666667777+92
3 | 00000000000000000000000000000001111111111111111111111111111111111111+121
3 | 55555555555555555555555555555556666666666667777777777777777777777777+19
4 | 00000000000111111111112222222222223333333444
4 | 555667777999
5 | 0000014
5 | 7888
6 | 2

Optimal way to reshape dataframe in R to have observation on columns

Given e.g. the Orange data set, I would like to arrange the observations in a matrix in which the measurements (circumference) taken on each tree are arranged in rows (for a total of 5 rows).
One unsatisfactory way of obtaining this result is as follows:
mat <- matrix(Orange[, 3], nrow = 5, ncol = 7, byrow = TRUE, dimnames = list(as.character(unique(Orange$Tree)), 1:7))
An alternative is the dcast() function in the data.table package.
It converts data from long to wide format. In this case, I've created an ID that counts the records within each Tree.
In the reshaped data, Tree becomes the key column and circumference is recorded in 7 separate columns (one for each age).
library(data.table)
Orange <- data.table(Orange)[, ID := seq_len(.N), by = Tree]  # ID = measurement number within each tree
Orange2 <- dcast(
  data = Orange,
  formula = Tree ~ ID,
  value.var = "circumference")
Orange2
Tree 1 2 3 4 5 6 7
1: 3 30 51 75 108 115 139 140
2: 1 30 58 87 115 120 142 145
3: 5 30 49 81 125 142 174 177
4: 2 33 69 111 156 172 203 203
5: 4 32 62 112 167 179 209 214
EDIT (in response to additional comments/questions):
Technically the data is already ordered by Tree: Tree is an ordered factor with preset levels, and the rows follow that level order. To order numerically instead, there are two options: (1) order by as.character() or (2) re-level the variable.
Orange2[order(as.character(Tree)),]
1: 1 30 58 87 115 120 142 145
2: 2 33 69 111 156 172 203 203
3: 3 30 51 75 108 115 139 140
4: 4 32 62 112 167 179 209 214
5: 5 30 49 81 125 142 174 177
class(Orange$Tree)
[1] "ordered" "factor"
levels(Orange$Tree)
[1] "3" "1" "5" "2" "4"
Orange2[,Tree := factor(Tree, c("1","2","3","4","5"), ordered = FALSE)]
Orange2[order(Tree),]
Tree 1 2 3 4 5 6 7
1: 1 30 58 87 115 120 142 145
2: 2 33 69 111 156 172 203 203
3: 3 30 51 75 108 115 139 140
4: 4 32 62 112 167 179 209 214
5: 5 30 49 81 125 142 174 177
In base, you could simply do:
aggregate(circumference ~ Tree, Orange, I)
If you don't want to order it afterwards: aggregate(circumference ~ as.character(Tree), Orange, I) (that will strip the factor ordering).
Or similar to @RyanF:
Orange$id <- sequence(rle(as.character(Orange$Tree))$lengths)
reshape(Orange[,-2],
idvar = "Tree",
timevar = "id",
direction = "wide")
Output:
Tree circumference.1 circumference.2 circumference.3 circumference.4 circumference.5 circumference.6 circumference.7
1 1 30 58 87 115 120 142 145
8 2 33 69 111 156 172 203 203
15 3 30 51 75 108 115 139 140
22 4 32 62 112 167 179 209 214
29 5 30 49 81 125 142 174 177
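If a plain matrix is the goal, as in the question, another base R route is to split the circumference vector by tree and bind the pieces as rows. This is only a sketch and it assumes, as is the case for the built-in Orange data, that each tree has the same number of measurements and that rows are already sorted by age within each tree. The name wide is chosen here for illustration.

```r
# Use the copy of Orange that ships with R, untouched by any earlier reassignment
orange <- datasets::Orange

# Split by tree label (as character, so the rows come out in the order 1..5),
# then stack the five vectors as matrix rows
wide <- do.call(rbind, split(orange$circumference, as.character(orange$Tree)))
colnames(wide) <- seq_len(ncol(wide))  # measurement number 1..7
wide
```

The row names carry the tree labels, so no separate ordering step is needed.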

GAMs in R: Fewer unique covariate combinations than df

I tried fitting GAMs to several data frames I have. All but one work; that one fails with the error:
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom
I searched online but couldn't figure out what is going wrong. My other 7 data frames run without a problem.
I then ran epiR::epi.cp(srtm[-c(1,7,8)]) and it gave me this output:
$cov.pattern
id n curv_plan curv_prof dem slope ca
1 1 1 1.113192e-02 3.991046e-03 3909 43.601479 5.225853
2 2 1 -2.686749e-03 3.474989e-03 3312 35.022511 4.418310
3 3 1 -1.033450e-02 -4.626922e-03 3326 36.678623 4.421465
4 4 1 -5.439283e-03 2.066148e-03 4069 31.501045 3.887526
5 5 1 -2.602015e-03 -1.249511e-04 3021 37.199219 5.010560
6 6 1 1.068216e-03 1.216902e-03 2844 44.694374 4.852220
7 7 1 -1.855443e-02 -5.965539e-03 2841 42.753750 5.088554
8 8 1 2.363193e-03 2.353357e-03 2833 33.160995 4.652209
9 9 1 2.169674e-02 1.049735e-02 2964 32.311535 4.671970
10 10 1 2.850910e-02 9.416230e-03 2956 50.791847 3.496096
11 11 1 -1.932028e-02 4.949751e-04 2794 38.714302 4.217102
12 12 1 -1.372750e-03 -4.437230e-03 3799 48.356312 4.597039
13 13 1 1.154181e-04 -4.114155e-03 3808 54.669777 3.518823
14 14 1 2.743768e-02 7.829833e-03 3580 23.674162 3.268744
15 15 1 7.216539e-03 9.818082e-04 3969 29.421440 4.354250
16 16 1 2.385139e-03 6.333927e-04 3635 10.555381 4.905733
17 17 1 -1.129411e-02 2.719948e-03 2805 29.195084 4.807369
18 18 1 4.584329e-04 -1.497223e-03 3676 32.754879 3.729304
19 19 1 1.883965e-03 4.189690e-03 3165 30.973505 4.833158
20 20 1 -5.350136e-03 -2.615470e-03 2745 32.534698 4.420852
21 21 1 1.484253e-02 -1.245213e-03 3872 26.113234 4.045357
22 22 1 -2.449377e-02 -5.045668e-04 2931 31.060991 5.170872
23 23 1 -2.962795e-02 -9.271557e-03 2917 21.680889 4.547461
24 24 1 -2.487545e-02 -7.834328e-03 2736 41.775677 4.543325
25 25 1 2.890568e-03 -2.040353e-03 2577 47.003765 3.739546
26 26 1 -5.119631e-03 8.869720e-03 3401 38.519680 5.428564
27 27 1 6.171266e-03 -6.515175e-04 2687 36.678623 4.152842
28 28 1 -8.297552e-03 -7.053435e-03 3678 39.532673 4.081311
29 29 1 8.652663e-03 2.394378e-03 3515 33.895370 4.220177
30 30 1 -2.528805e-03 -1.293259e-03 3404 42.548138 4.266330
31 31 1 1.899994e-02 6.367806e-03 3191 41.696201 3.300749
32 32 1 -2.243623e-02 -1.866033e-04 2433 34.162479 5.364681
33 33 1 -6.934012e-03 9.280805e-03 2309 32.667160 5.650699
34 34 1 -1.121149e-02 6.376335e-05 2188 31.119059 4.706416
35 35 1 -1.429000e-02 5.299596e-04 2511 34.543365 4.538456
36 36 1 -7.168889e-03 1.301791e-03 2625 30.826660 4.059711
37 37 1 -4.226461e-03 7.440552e-03 2830 33.398251 4.941027
38 38 1 -2.635832e-03 8.748529e-03 3378 45.972672 4.861779
39 39 1 -2.007920e-02 -8.081778e-03 3281 31.735376 5.173269
40 40 1 -3.453595e-02 -6.867430e-03 2690 47.515182 4.935358
41 41 1 1.698363e-03 -8.296107e-03 2529 42.224693 4.386349
42 42 1 5.257193e-03 1.021242e-02 2571 43.070564 4.194372
43 43 1 6.968817e-03 5.538784e-03 2581 36.055031 4.209373
44 44 1 -7.632907e-04 2.803704e-04 2582 28.257311 4.230427
45 45 1 -3.468894e-03 -9.099842e-04 2409 29.421440 4.190946
46 46 1 1.879089e-02 6.532978e-03 3733 41.535984 4.032614
47 47 1 -1.076225e-03 -1.138945e-03 2712 39.260731 4.580621
48 48 1 -5.306205e-03 2.667941e-03 3446 34.250553 4.925404
49 49 1 -5.380515e-03 -2.595619e-03 3785 50.561493 4.642792
50 50 1 -2.571232e-03 -2.063937e-03 3768 46.160892 4.728879
51 51 1 -7.638110e-03 -2.432463e-03 3413 32.401161 5.058373
52 52 1 -2.950254e-03 -2.034031e-04 3852 32.543564 4.443869
53 53 1 -2.702386e-03 -1.776183e-03 2483 31.002720 3.879390
54 54 1 -3.892425e-02 -2.266178e-03 2225 26.126318 5.750985
55 55 1 -2.644659e-03 3.034660e-03 2192 32.103516 4.949506
56 56 1 -2.862503e-02 3.673996e-04 2361 23.930893 5.181818
57 57 1 6.263880e-03 -7.725377e-04 3780 17.752790 4.890797
58 58 1 1.054093e-03 -1.563014e-03 3089 36.422310 4.520845
59 59 1 9.474340e-04 -3.901043e-03 3155 42.552841 4.265886
60 60 1 5.569567e-03 -1.770366e-04 3516 13.166321 4.772187
61 61 1 -8.342760e-03 -9.908290e-03 3097 36.815479 5.346615
62 62 1 -1.422498e-03 -1.645628e-03 2865 29.802414 4.131463
63 63 1 4.523963e-02 1.067406e-02 2163 36.154739 3.369432
64 64 1 -1.164162e-02 6.808200e-04 2316 19.610609 4.634536
65 65 1 -8.043590e-03 9.395104e-03 2614 44.298817 3.983136
66 66 1 -1.925332e-02 -4.521391e-03 2035 31.205780 4.134195
67 67 1 -1.429050e-02 5.435983e-03 2799 38.876656 4.180761
68 68 1 6.935605e-04 3.015038e-03 2679 37.863647 4.213497
69 69 1 -5.062089e-03 5.961242e-04 2831 32.401161 3.729215
70 70 1 -3.617065e-04 -2.874465e-03 3152 45.871994 4.703659
71 71 1 -4.216370e-02 -4.917050e-03 3726 25.376934 4.614913
72 72 1 -2.184333e-02 -2.840071e-03 3610 43.138550 4.237120
73 73 1 -1.735273e-02 -2.199261e-03 3339 33.984894 4.811754
74 74 1 1.929157e-02 5.358084e-03 3447 32.356407 3.355368
75 75 1 -4.118797e-02 -2.408211e-03 3251 22.373844 5.160147
76 76 1 -1.393304e-02 7.900328e-05 3297 22.090260 4.724728
77 77 1 -3.078095e-02 -5.535597e-03 3143 37.298687 4.625203
78 78 1 1.717030e-02 -1.120720e-03 3617 37.965389 4.627342
79 79 1 -5.965119e-04 -5.377157e-04 3689 28.360373 4.767213
80 80 1 7.843294e-03 -9.579902e-04 3676 48.356312 3.907819
81 81 1 5.994634e-03 2.034169e-03 2759 25.142431 3.980591
82 82 1 -1.323012e-02 2.393529e-03 3972 26.880308 5.107575
83 83 1 6.312347e-03 2.877600e-04 3323 32.167103 3.496723
84 84 1 -1.180464e-02 4.438243e-03 3790 40.369972 4.081389
85 85 1 -8.333334e-03 4.009274e-03 3248 14.931417 4.881107
86 86 1 2.016023e-03 -5.707344e-04 3994 18.305449 4.278613
87 87 1 -5.515654e-03 -8.373593e-04 3368 40.703190 4.229169
88 88 1 8.931696e-03 1.677515e-03 4651 30.133842 4.327270
89 89 1 1.962347e-04 -7.458636e-04 5075 57.352509 3.263017
90 90 1 -2.880805e-02 -5.200595e-04 2645 11.976726 5.634262
91 91 1 -2.101875e-02 -5.110677e-03 3109 34.218582 4.925558
92 92 1 -8.390786e-03 -1.188547e-02 3667 39.895481 4.249029
93 93 1 -1.366958e-02 9.873455e-04 2827 22.636129 5.269634
94 94 1 1.004551e-02 5.205147e-04 3667 44.028976 3.993555
95 95 1 5.892557e-03 -5.482296e-04 2416 5.385977 4.614692
96 96 1 -1.662132e-02 -9.946494e-04 3806 42.599808 3.951163
97 97 1 -7.977792e-03 5.937776e-03 3470 28.888371 3.120762
98 98 1 -2.408042e-02 -2.647421e-03 2975 16.228737 4.227977
99 99 1 -1.191509e-02 -2.014583e-03 2461 30.051607 4.361413
100 100 1 1.110316e-02 2.506189e-04 3362 29.517509 4.591039
101 101 1 2.010373e-03 4.185408e-04 5104 17.387333 3.642855
102 102 1 -3.218945e-03 1.004196e-02 4113 44.448421 3.282414
103 103 1 2.438254e-03 2.551999e-03 3234 31.205780 3.844411
104 104 1 -1.178511e-02 2.775465e-04 1864 1.350224 3.875072
105 105 1 -9.511201e-04 -1.446065e-03 2351 22.406872 4.392300
106 106 1 -4.563018e-03 -5.890041e-03 3141 24.862123 3.998985
107 107 1 -1.471223e-02 5.965497e-03 3765 25.363234 3.661456
108 108 1 -5.857890e-03 -9.363544e-03 2272 22.878105 5.105480
109 109 1 1.369277e-02 1.019289e-02 4016 44.848000 4.092690
110 110 1 -8.784844e-03 3.358194e-03 3293 32.543564 4.115062
111 111 1 -5.148044e-03 5.372697e-03 3038 31.772562 3.626687
112 112 1 -1.556184e+35 5.799786e+34 4961 29.421440 3.020591
113 113 1 3.831991e-03 1.570888e-03 2069 28.821898 3.790284
114 114 1 8.289138e-04 6.439757e-04 2154 21.045721 3.959267
115 115 1 -4.800863e-03 3.194520e-03 5294 45.660866 3.701611
116 116 1 2.974254e-02 1.197812e-02 4380 31.670097 3.877057
117 117 1 1.137725e-02 -1.082659e-02 5172 18.774675 3.572600
118 118 1 -4.678526e-03 7.448288e-03 2257 39.260731 4.227000
119 119 1 -4.655881e-03 -1.119303e-03 3233 30.205467 5.613868
120 120 1 -4.827522e-03 -4.766134e-03 3414 42.974857 3.831894
121 121 1 -8.568994e-04 1.053632e-03 1750 29.421440 4.132886
122 122 1 1.212121e-02 0.000000e+00 5018 20.136303 3.669850
123 123 1 -4.711660e-03 -2.261143e-03 3013 45.007954 3.622240
124 124 1 -1.226328e-02 4.688181e-04 3842 26.880308 3.098333
125 125 1 3.438910e-03 1.441129e-03 3470 11.386165 4.552782
126 126 1 1.192164e-02 -1.295839e-03 3473 22.684824 4.748498
127 127 1 -1.960781e-40 0.000000e+00 4155 90.000000 2.960569
128 128 1 2.124726e-04 1.945100e-03 2496 32.103516 5.242211
129 129 1 5.669804e-03 -4.589476e-03 2577 35.398876 4.271112
130 130 1 -8.838220e-03 -9.496282e-04 4921 14.506372 4.088247
131 131 1 1.009090e-02 -2.243944e-03 3385 38.372120 4.067030
132 132 1 5.630660e-03 -8.632211e-04 4003 33.322365 3.776054
133 133 1 -9.103803e-03 -6.322661e-03 2758 47.934212 3.739807
134 134 1 6.225513e-03 -1.824928e-03 3925 37.085732 3.389725
135 135 1 -1.303080e-03 3.580316e-03 2978 27.432941 4.345174
136 136 1 1.355920e-02 3.468190e-03 5058 57.797195 3.739124
137 137 1 2.092464e-02 -3.244962e-04 2400 3.931096 3.032193
138 138 1 5.691811e-02 -7.933985e-04 3885 15.069956 3.414036
139 139 1 8.052407e-05 -3.197287e-03 3493 33.993008 3.881695
140 140 1 -1.892967e-02 -5.049255e-03 2985 24.904482 4.417928
141 141 1 2.278842e-02 1.188287e-02 3666 31.670097 3.313449
142 142 1 1.496110e-02 2.181270e-03 3702 30.498932 3.171413
[ reached 'max' / getOption("max.print") -- omitted 18 rows ]
$id
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
[34] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
[100] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
[133] 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
I tried lowering the number of knots in the gam call, but that didn't succeed either.
Does anyone have an idea?
I fit the GAM using the following line:
mgcv::gam(slide ~ s(curv_plan) + s(curv_prof) + s(dem) + s(slope) + s(ca), data = dataframes_new[[7]], family = binomial)
I have experienced the same issue. The root cause was that some of my categorical variables had fewer levels than k in my formula specification. To give an example:
Suppose one of the terms in my formula specification was:
s(I(pmin(example_variable, 120)), k = 5)
and the data in my example_variable had 3 levels (say, "yes", "no", "maybe"). This would throw the above-mentioned error.
In my case, I solved it by creating additional levels in my data (I was creating test data for a unit test). In other cases it could be solved by ensuring k does not exceed the number of levels in your categorical variables.
If you're using categorical variables, check if the root cause might be the same for you.
I found the solution to my problem by reading these:
https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002148.html
https://stat.ethz.ch/pipermail/r-help/2007-October/143569.html
The error means that you tried to create a thin plate spline basis expansion with more basis functions than the variable being expanded has unique values.
As you don't show the model fitting code, we can't say more than that one of the smooths in the model you tried to fit didn't have enough unique values for the value of k you specified or used (if you didn't set k, a default value was used).
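To make the point about k concrete, here is a minimal sketch with simulated data. The data frame d and its columns x and y are made up for illustration; only mgcv::gam() and s() come from the question. The covariate has 5 distinct values, so the default basis dimension of a one-dimensional s() term (10) is larger than the number of unique values, while k = 4 is not.

```r
library(mgcv)

set.seed(42)
# Hypothetical data: x takes only 5 distinct values
d <- data.frame(x = sample(1:5, 200, replace = TRUE))
d$y <- rbinom(200, 1, plogis(d$x - 3))

# Diagnostic: number of unique values per covariate
n_unique <- sapply(d["x"], function(v) length(unique(v)))
n_unique

# Default k is larger than the number of unique values of x;
# tryCatch() keeps the script running if the fit is rejected
res_default <- tryCatch(
  gam(y ~ s(x), data = d, family = binomial),
  error = function(e) conditionMessage(e)
)

# Keeping k at or below the number of unique values
fit <- gam(y ~ s(x, k = 4), data = d, family = binomial)
```

Running the same sapply() diagnostic over the covariates of the failing data frame should show which smooth term has too few unique values.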

Filtering my R data frame is causing it to sort the data frame incorrectly

Consider the following two code snippets.
A:
library(dplyr)  # assumed source of arrange(); plyr also provides an arrange()
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5, nrows=190) # Specify nrows, get correct answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
# No need to remove unranked countries because we specified nrows
# No need to convert V2 from factor to numeric
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get KNA, correct answer
B:
library(dplyr)  # assumed source of arrange(); plyr also provides an arrange()
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5) # Don't specify nrows, get incorrect answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
mergedData = mergedData[which(mergedData$V2 != ""),] # Remove unranked countries
mergedData$V2 = as.numeric(mergedData$V2) # make V2 a numeric column
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get SRB, incorrect answer
I would think the two code snippets would be identical, except that in A you never add the unranked countries to your dataframe and in B you add them but then remove them. Why is the sorting different for these two code snippets?
The file downloads are from Coursera's Getting and Cleaning Data class (Quiz 3, Question 3).
Edit: To avoid security concerns, I've pasted the raw .csv files below
gdp.csv - http://pastebin.com/raw.php?i=4aRZwBRd
education.csv - http://pastebin.com/raw.php?i=0pbhDCSX
Edit2: The problem is occurring in the as.numeric step. For case B, here is mergedData$V2 before and after mergedData$V2 = as.numeric(mergedData$V2) is applied:
> mergedData$V2
[1] 161 105 60 125 32 26 133 172 12 27 68 162 25 140 128 59 76 93
[19] 138 111 69 169 149 96 7 153 113 167 117 165 11 20 36 2 99 98
[37] 121 30 182 166 81 67 102 51 4 183 33 72 48 64 38 159 13 103
[55] 85 43 155 5 185 109 6 114 86 148 175 176 110 42 178 77 160 37
[73] 108 71 139 58 16 10 46 22 47 122 40 9 116 92 3 50 87 145
[91] 120 189 178 15 146 56 136 83 168 171 70 163 84 74 94 82 62 147
[109] 141 132 164 14 188 135 129 137 151 130 118 154 127 152 34 123 144 39
[127] 126 18 23 107 55 66 44 89 49 41 187 115 24 61 45 97 54 52
[145] 8 142 19 73 119 35 174 157 100 88 186 150 63 80 21 158 173 65
[163] 124 156 31 143 91 170 184 101 79 17 190 95 106 53 78 1 75 180
[181] 29 57 177 181 90 28 112 104 134
194 Levels: .. Not available. 1 10 100 101 102 103 104 105 106 107 ... Note: Rankings include only those economies with confirmed GDP estimates. Figures in italics are for 2011 or 2010.
> mergedData$V2 = as.numeric(mergedData$V2)
> mergedData$V2
[1] 72 10 149 32 118 111 41 84 26 112 157 73 110 49 35 147 166 185
[19] 46 17 158 80 58 188 159 63 19 78 23 76 15 105 122 104 191 190
[37] 28 116 94 77 172 156 7 139 126 95 119 162 135 153 124 69 37 8
[55] 176 130 65 137 97 14 148 20 177 57 87 88 16 129 90 167 71 123
[73] 13 161 47 146 70 4 133 107 134 29 127 181 22 184 115 138 178 54
[91] 27 101 90 59 55 144 44 174 79 83 160 74 175 164 186 173 151 56
[109] 50 40 75 48 100 43 36 45 61 38 24 64 34 62 120 30 53 125
[127] 33 91 108 12 143 155 131 180 136 128 99 21 109 150 132 189 142 140
[145] 170 51 102 163 25 121 86 67 5 179 98 60 152 171 106 68 85 154
[163] 31 66 117 52 183 82 96 6 169 81 103 187 11 141 168 3 165 92
[181] 114 145 89 93 182 113 18 9 42
Can anyone explain why the numbers change when I apply as.numeric()?
The results differ because, in the second case, the full file contains footer notes, which read.csv also reads. Because those footer rows contain text, most of the columns end up as class 'factor'. This could have been avoided either by
reading only the data rows, using the nrows argument of read.csv (skip only drops lines from the top of the file), or
using stringsAsFactors=FALSE in the read.csv call, again together with excluding the footer lines.
Calling as.numeric() on a factor returns the integer level codes, not the numbers shown as labels, so the column was ordered based on the "levels" of the factor.
If you have already read the files without excluding the footer lines, convert the columns to the appropriate classes. If it is a 'numeric' column, convert it with as.numeric(as.character(df$column)) or as.numeric(levels(df$column))[df$column].
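A small illustration of the difference, using a made-up factor rather than the GDP data (the name f and its values are chosen here only for the example):

```r
# Numbers stored as text and converted to a factor, as read.csv did with V2
f <- factor(c("161", "105", "60", "7"))

levels(f)                                    # sorted as strings: "105" "161" "60" "7"
codes    <- as.numeric(f)                    # level codes:  2 1 3 4
values   <- as.numeric(as.character(f))      # actual numbers: 161 105 60 7
values_2 <- as.numeric(levels(f))[f]         # same result, converts each level only once
```

Sorting on codes orders the rows by the alphabetical position of the label, which is why the ranking in snippet B comes out wrong.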

Create a for loop which prints every number that is x%%3=0 between 1-200

Like the title says I need a for loop which will write every number from 1 to 200 that is evenly divided by 3.
Every other method posted so far generates the 1:200 vector then throws away two thirds of it. What a waste. In an attempt to be eco-conscious, this method does not waste any electrons:
seq(3,200,by=3)
You don't need a for loop; use the which() function instead, as in:
which(1:200 %% 3 == 0)
[1] 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81
[28] 84 87 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162
[55] 165 168 171 174 177 180 183 186 189 192 195 198
Two other alternatives:
c(1:200)[c(FALSE, FALSE, TRUE)]
c(1:200)[1:200 %% 3 == 0]
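Since the question asks specifically for a for loop, here is a minimal sketch of that form as well. The vector multiples is not needed for printing; it is an addition here so the result can be compared with the vectorised answers.

```r
multiples <- integer(0)
for (i in 1:200) {
  if (i %% 3 == 0) {   # remainder 0 means i is evenly divisible by 3
    print(i)
    multiples <- c(multiples, i)
  }
}
```

The loop visits all 200 numbers, whereas seq(3, 200, by = 3) generates only the multiples.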
