Creating a sparse matrix from a list of lists in R

I have a list called res which contains 83 elements in the following format. I need to combine them into one sparse matrix. Row and Columns are the row and column indices of the sparse matrix, and Freq is the entry at that position.
Example of the format for res[[82]] and res[[83]]:
[[82]]
Row Columns Freq
2 82 33 1
3 82 173 1
4 82 211 1
5 82 247 2
6 82 480 2
7 82 541 1
8 82 974 1
9 82 1197 1
10 82 1416 1
11 82 1531 1
12 82 1797 7
13 82 2416 2
14 82 2530 1
15 82 2772 1
16 82 2970 2
17 82 3264 4
18 82 3416 1
19 82 3995 4
20 82 5593 1
21 82 6557 1
22 82 8141 1
23 82 9044 1
24 82 11889 1
25 82 12608 1
26 82 13352 1
27 82 13463 1
28 82 17937 1
29 82 29730 1
30 82 37712 1
31 82 258434 1
[[83]]
Row Columns Freq
2 83 309 1
3 83 447 1
4 83 480 2
5 83 487 1
6 83 619 1
7 83 651 1
8 83 913 1
9 83 1555 1
10 83 1874 1
11 83 2416 1
12 83 3101 1
13 83 3856 1
14 83 3964 1
15 83 3995 1
16 83 4017 1
17 83 4362 1
18 83 10551 1
19 83 17130 1
20 83 29730 1

We can use sparseMatrix from Matrix after rbinding the list elements.
library(Matrix)
d1 <- do.call(rbind, res)  # stack the 83 data frames into one
sm <- sparseMatrix(i = d1$Row, j = d1$Columns, x = d1$Freq)
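A minimal, self-contained sketch of the same approach, using a few rows copied from res[[82]] and res[[83]] as a toy stand-in for the full list (the small list below is illustrative, not the real data). Because sparseMatrix() sizes the matrix from the largest i and j it sees, you may want to pass dims = explicitly if the matrix must have a fixed shape.

```r
library(Matrix)

# Toy stand-in for `res`: a list of data frames with columns Row, Columns, Freq
# (a few rows copied from res[[82]] and res[[83]] above)
res <- list(
  data.frame(Row = c(82, 82, 82), Columns = c(33, 173, 1797), Freq = c(1, 1, 7)),
  data.frame(Row = c(83, 83), Columns = c(309, 480), Freq = c(1, 2))
)

# Stack every piece into one data frame of (row, column, value) triplets
triplets <- do.call(rbind, res)

# Build the sparse matrix; without `dims`, its size comes from the largest indices
m <- sparseMatrix(i = triplets$Row, j = triplets$Columns, x = triplets$Freq)

dim(m)       # 83 rows, 1797 columns
m[82, 1797]  # 7
m[83, 480]   # 2
```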


When using table on a vector, the numbers in the names are out of order

I have a data frame with a column Session. There are 215 unique values for Session, and I am trying to treat it as a categorical variable.
However, when I run table(df$Session), the sessions are not appearing in order and some appear to be missing:
table(df$Session)
1 10 100 101 102 103 104 105 106 107 108 109 11 110 111 113 114 115 116 117 118
6 11 20 14 17 8 14 11 8 14 15 17 12 16 15 17 19 26 24 31 28
12 120 121 122 123 124 125 126 127 128 13 130 131 132 133 134 135 136 137 138 139
13 36 27 20 23 18 12 12 40 52 19 91 78 88 78 8 7 74 5 8 6
14 140 141 142 143 144 145 146 147 148 149 15 150 151 152 153 154 155 156 157 158
14 7 6 7 5 3 75 3 70 75 68 16 68 67 67 68 58 69 70 68 26
159 16 160 161 162 163 164 165 166 167 168 169 17 170 171 172 173 174 175 176 177
75 17 65 70 63 76 57 43 45 32 31 18 18 20 17 22 13 15 12 7 7
178 179 18 180 181 182 183 184 185 186 187 188 189 19 190 191 192 193 194 195 196
6 7 17 9 9 13 12 18 19 22 15 3 10 3 21 32 43 54 66 77 84
197 198 199 2 20 200 201 202 203 204 205 206 207 208 209 21 210 211 212 213 215
77 85 79 6 17 89 87 93 85 85 98 80 78 68 54 17 34 24 50 50 65
22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 36 37 38 39 4 40
11 12 12 10 11 7 7 10 4 7 8 7 6 9 11 10 23 27 14 3 21
41 42 43 44 45 46 47 48 49 5 50 51 52 53 54 55 56 57 58 59 6
27 16 16 18 10 12 19 7 6 4 5 13 21 17 25 31 32 30 15 10 3
60 61 62 63 64 65 66 67 68 69 7 70 71 73 74 75 76 77 78 79 8
18 17 11 14 14 15 18 11 13 9 7 13 12 7 8 8 9 12 8 9 6
80 81 82 83 84 85 86 87 88 89 9 90 91 92 93 94 95 97 98 99
1 11 8 17 20 13 14 18 19 19 9 14 16 12 15 17 19 13 7 16
If we only look at a couple of columns:
table(df$Session)
# 1 10 100 101 ... 197 198 199 2 20 200 201 202 ...
# 6 11 20 14 ... 77 85 79 6 17 89 87 93 ...
Why are they not ordered by number (1, 2, 3 instead of 1, 10, 100)? And how can I correct this?
Answer
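The variable will be sorted correctly if you make it numeric first (if Session is already a factor, use as.numeric(as.character(df$Session)) instead, because as.numeric() on a factor returns the internal level codes, not the values):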
table(as.numeric(df$Session))
table(as.factor(as.numeric(df$Session)))
Explanation
Your variable is (or was) of class character, so its values are ordered alphabetically, which is what happens when you sort a character vector. Try: sort(c("1", "11", "2")). When you apply factor or as.factor to a character vector, the levels are ordered the same way (see ?factor):
levels: an optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)).
Keep in mind that R reads in numbers as numeric by default. If you expected the column to be numeric from the start but R made it character, then you likely have values in there that are not strictly numbers. It is important to find out why the vector was character.
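To find out why the column ended up as character, one option is to look for the entries that as.numeric() cannot parse. A small sketch; the vector below is a made-up stand-in for df$Session, and the bad values "n/a" and "12a" are assumptions for illustration:

```r
# Made-up stand-in for df$Session, read in as character because of two bad entries
session <- c("1", "10", "2", "n/a", "3", "12a")

# as.numeric() turns anything it cannot parse into NA (normally with a warning)
num <- suppressWarnings(as.numeric(session))

# Entries that are present but not numbers: these are why R kept the column as character
bad <- session[is.na(num) & !is.na(session)]
bad   # "n/a" "12a"
```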
Reproducible example
vec <- c(22, 11, 3, 2, 1)
table(vec) # correct: numeric
# 1 2 3 11 22
# 1 1 1 1 1
table(as.character(vec)) # incorrect: character
# 1 11 2 22 3
# 1 1 1 1 1
table(as.factor(as.character(vec))) # incorrect: character -> factor
# 1 11 2 22 3
# 1 1 1 1 1
table(as.factor(vec)) # correct: numeric -> factor
# 1 2 3 11 22
# 1 1 1 1 1
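If you want to keep Session as a categorical variable rather than converting it to numeric, a sketch (with made-up values) is to set the factor levels explicitly in numeric order. It also shows why a factor has to go through as.character() before as.numeric():

```r
# Made-up Session values read in as character
sessions <- c("22", "11", "3", "2", "1")

# Keep it categorical, but list the levels in numeric rather than alphabetical order
num_levels <- as.character(sort(unique(as.numeric(sessions))))
f <- factor(sessions, levels = num_levels)
levels(f)      # "1" "2" "3" "11" "22"
table(f)       # counts now appear in the order 1, 2, 3, 11, 22

# Pitfall: as.numeric() on a factor returns the level codes, not the values
f_alpha <- factor(sessions)          # alphabetical levels: "1" "11" "2" "22" "3"
as.numeric(f_alpha)                  # 4 2 5 3 1 (codes)
as.numeric(as.character(f_alpha))    # 22 11 3 2 1 (values)
```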

GAMs in R: Fewer unique covariate combinations than df

I tried fitting GAMs to several data frames. All but one work; that one fails with the error:
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom
I searched online but couldn't figure out what is going wrong. My other 7 data frames run without a problem.
I then ran epiR::epi.cp(srtm[-c(1,7,8)]) and it gave me this output:
$cov.pattern
id n curv_plan curv_prof dem slope ca
1 1 1 1.113192e-02 3.991046e-03 3909 43.601479 5.225853
2 2 1 -2.686749e-03 3.474989e-03 3312 35.022511 4.418310
3 3 1 -1.033450e-02 -4.626922e-03 3326 36.678623 4.421465
4 4 1 -5.439283e-03 2.066148e-03 4069 31.501045 3.887526
5 5 1 -2.602015e-03 -1.249511e-04 3021 37.199219 5.010560
6 6 1 1.068216e-03 1.216902e-03 2844 44.694374 4.852220
7 7 1 -1.855443e-02 -5.965539e-03 2841 42.753750 5.088554
8 8 1 2.363193e-03 2.353357e-03 2833 33.160995 4.652209
9 9 1 2.169674e-02 1.049735e-02 2964 32.311535 4.671970
10 10 1 2.850910e-02 9.416230e-03 2956 50.791847 3.496096
11 11 1 -1.932028e-02 4.949751e-04 2794 38.714302 4.217102
12 12 1 -1.372750e-03 -4.437230e-03 3799 48.356312 4.597039
13 13 1 1.154181e-04 -4.114155e-03 3808 54.669777 3.518823
14 14 1 2.743768e-02 7.829833e-03 3580 23.674162 3.268744
15 15 1 7.216539e-03 9.818082e-04 3969 29.421440 4.354250
16 16 1 2.385139e-03 6.333927e-04 3635 10.555381 4.905733
17 17 1 -1.129411e-02 2.719948e-03 2805 29.195084 4.807369
18 18 1 4.584329e-04 -1.497223e-03 3676 32.754879 3.729304
19 19 1 1.883965e-03 4.189690e-03 3165 30.973505 4.833158
20 20 1 -5.350136e-03 -2.615470e-03 2745 32.534698 4.420852
21 21 1 1.484253e-02 -1.245213e-03 3872 26.113234 4.045357
22 22 1 -2.449377e-02 -5.045668e-04 2931 31.060991 5.170872
23 23 1 -2.962795e-02 -9.271557e-03 2917 21.680889 4.547461
24 24 1 -2.487545e-02 -7.834328e-03 2736 41.775677 4.543325
25 25 1 2.890568e-03 -2.040353e-03 2577 47.003765 3.739546
26 26 1 -5.119631e-03 8.869720e-03 3401 38.519680 5.428564
27 27 1 6.171266e-03 -6.515175e-04 2687 36.678623 4.152842
28 28 1 -8.297552e-03 -7.053435e-03 3678 39.532673 4.081311
29 29 1 8.652663e-03 2.394378e-03 3515 33.895370 4.220177
30 30 1 -2.528805e-03 -1.293259e-03 3404 42.548138 4.266330
31 31 1 1.899994e-02 6.367806e-03 3191 41.696201 3.300749
32 32 1 -2.243623e-02 -1.866033e-04 2433 34.162479 5.364681
33 33 1 -6.934012e-03 9.280805e-03 2309 32.667160 5.650699
34 34 1 -1.121149e-02 6.376335e-05 2188 31.119059 4.706416
35 35 1 -1.429000e-02 5.299596e-04 2511 34.543365 4.538456
36 36 1 -7.168889e-03 1.301791e-03 2625 30.826660 4.059711
37 37 1 -4.226461e-03 7.440552e-03 2830 33.398251 4.941027
38 38 1 -2.635832e-03 8.748529e-03 3378 45.972672 4.861779
39 39 1 -2.007920e-02 -8.081778e-03 3281 31.735376 5.173269
40 40 1 -3.453595e-02 -6.867430e-03 2690 47.515182 4.935358
41 41 1 1.698363e-03 -8.296107e-03 2529 42.224693 4.386349
42 42 1 5.257193e-03 1.021242e-02 2571 43.070564 4.194372
43 43 1 6.968817e-03 5.538784e-03 2581 36.055031 4.209373
44 44 1 -7.632907e-04 2.803704e-04 2582 28.257311 4.230427
45 45 1 -3.468894e-03 -9.099842e-04 2409 29.421440 4.190946
46 46 1 1.879089e-02 6.532978e-03 3733 41.535984 4.032614
47 47 1 -1.076225e-03 -1.138945e-03 2712 39.260731 4.580621
48 48 1 -5.306205e-03 2.667941e-03 3446 34.250553 4.925404
49 49 1 -5.380515e-03 -2.595619e-03 3785 50.561493 4.642792
50 50 1 -2.571232e-03 -2.063937e-03 3768 46.160892 4.728879
51 51 1 -7.638110e-03 -2.432463e-03 3413 32.401161 5.058373
52 52 1 -2.950254e-03 -2.034031e-04 3852 32.543564 4.443869
53 53 1 -2.702386e-03 -1.776183e-03 2483 31.002720 3.879390
54 54 1 -3.892425e-02 -2.266178e-03 2225 26.126318 5.750985
55 55 1 -2.644659e-03 3.034660e-03 2192 32.103516 4.949506
56 56 1 -2.862503e-02 3.673996e-04 2361 23.930893 5.181818
57 57 1 6.263880e-03 -7.725377e-04 3780 17.752790 4.890797
58 58 1 1.054093e-03 -1.563014e-03 3089 36.422310 4.520845
59 59 1 9.474340e-04 -3.901043e-03 3155 42.552841 4.265886
60 60 1 5.569567e-03 -1.770366e-04 3516 13.166321 4.772187
61 61 1 -8.342760e-03 -9.908290e-03 3097 36.815479 5.346615
62 62 1 -1.422498e-03 -1.645628e-03 2865 29.802414 4.131463
63 63 1 4.523963e-02 1.067406e-02 2163 36.154739 3.369432
64 64 1 -1.164162e-02 6.808200e-04 2316 19.610609 4.634536
65 65 1 -8.043590e-03 9.395104e-03 2614 44.298817 3.983136
66 66 1 -1.925332e-02 -4.521391e-03 2035 31.205780 4.134195
67 67 1 -1.429050e-02 5.435983e-03 2799 38.876656 4.180761
68 68 1 6.935605e-04 3.015038e-03 2679 37.863647 4.213497
69 69 1 -5.062089e-03 5.961242e-04 2831 32.401161 3.729215
70 70 1 -3.617065e-04 -2.874465e-03 3152 45.871994 4.703659
71 71 1 -4.216370e-02 -4.917050e-03 3726 25.376934 4.614913
72 72 1 -2.184333e-02 -2.840071e-03 3610 43.138550 4.237120
73 73 1 -1.735273e-02 -2.199261e-03 3339 33.984894 4.811754
74 74 1 1.929157e-02 5.358084e-03 3447 32.356407 3.355368
75 75 1 -4.118797e-02 -2.408211e-03 3251 22.373844 5.160147
76 76 1 -1.393304e-02 7.900328e-05 3297 22.090260 4.724728
77 77 1 -3.078095e-02 -5.535597e-03 3143 37.298687 4.625203
78 78 1 1.717030e-02 -1.120720e-03 3617 37.965389 4.627342
79 79 1 -5.965119e-04 -5.377157e-04 3689 28.360373 4.767213
80 80 1 7.843294e-03 -9.579902e-04 3676 48.356312 3.907819
81 81 1 5.994634e-03 2.034169e-03 2759 25.142431 3.980591
82 82 1 -1.323012e-02 2.393529e-03 3972 26.880308 5.107575
83 83 1 6.312347e-03 2.877600e-04 3323 32.167103 3.496723
84 84 1 -1.180464e-02 4.438243e-03 3790 40.369972 4.081389
85 85 1 -8.333334e-03 4.009274e-03 3248 14.931417 4.881107
86 86 1 2.016023e-03 -5.707344e-04 3994 18.305449 4.278613
87 87 1 -5.515654e-03 -8.373593e-04 3368 40.703190 4.229169
88 88 1 8.931696e-03 1.677515e-03 4651 30.133842 4.327270
89 89 1 1.962347e-04 -7.458636e-04 5075 57.352509 3.263017
90 90 1 -2.880805e-02 -5.200595e-04 2645 11.976726 5.634262
91 91 1 -2.101875e-02 -5.110677e-03 3109 34.218582 4.925558
92 92 1 -8.390786e-03 -1.188547e-02 3667 39.895481 4.249029
93 93 1 -1.366958e-02 9.873455e-04 2827 22.636129 5.269634
94 94 1 1.004551e-02 5.205147e-04 3667 44.028976 3.993555
95 95 1 5.892557e-03 -5.482296e-04 2416 5.385977 4.614692
96 96 1 -1.662132e-02 -9.946494e-04 3806 42.599808 3.951163
97 97 1 -7.977792e-03 5.937776e-03 3470 28.888371 3.120762
98 98 1 -2.408042e-02 -2.647421e-03 2975 16.228737 4.227977
99 99 1 -1.191509e-02 -2.014583e-03 2461 30.051607 4.361413
100 100 1 1.110316e-02 2.506189e-04 3362 29.517509 4.591039
101 101 1 2.010373e-03 4.185408e-04 5104 17.387333 3.642855
102 102 1 -3.218945e-03 1.004196e-02 4113 44.448421 3.282414
103 103 1 2.438254e-03 2.551999e-03 3234 31.205780 3.844411
104 104 1 -1.178511e-02 2.775465e-04 1864 1.350224 3.875072
105 105 1 -9.511201e-04 -1.446065e-03 2351 22.406872 4.392300
106 106 1 -4.563018e-03 -5.890041e-03 3141 24.862123 3.998985
107 107 1 -1.471223e-02 5.965497e-03 3765 25.363234 3.661456
108 108 1 -5.857890e-03 -9.363544e-03 2272 22.878105 5.105480
109 109 1 1.369277e-02 1.019289e-02 4016 44.848000 4.092690
110 110 1 -8.784844e-03 3.358194e-03 3293 32.543564 4.115062
111 111 1 -5.148044e-03 5.372697e-03 3038 31.772562 3.626687
112 112 1 -1.556184e+35 5.799786e+34 4961 29.421440 3.020591
113 113 1 3.831991e-03 1.570888e-03 2069 28.821898 3.790284
114 114 1 8.289138e-04 6.439757e-04 2154 21.045721 3.959267
115 115 1 -4.800863e-03 3.194520e-03 5294 45.660866 3.701611
116 116 1 2.974254e-02 1.197812e-02 4380 31.670097 3.877057
117 117 1 1.137725e-02 -1.082659e-02 5172 18.774675 3.572600
118 118 1 -4.678526e-03 7.448288e-03 2257 39.260731 4.227000
119 119 1 -4.655881e-03 -1.119303e-03 3233 30.205467 5.613868
120 120 1 -4.827522e-03 -4.766134e-03 3414 42.974857 3.831894
121 121 1 -8.568994e-04 1.053632e-03 1750 29.421440 4.132886
122 122 1 1.212121e-02 0.000000e+00 5018 20.136303 3.669850
123 123 1 -4.711660e-03 -2.261143e-03 3013 45.007954 3.622240
124 124 1 -1.226328e-02 4.688181e-04 3842 26.880308 3.098333
125 125 1 3.438910e-03 1.441129e-03 3470 11.386165 4.552782
126 126 1 1.192164e-02 -1.295839e-03 3473 22.684824 4.748498
127 127 1 -1.960781e-40 0.000000e+00 4155 90.000000 2.960569
128 128 1 2.124726e-04 1.945100e-03 2496 32.103516 5.242211
129 129 1 5.669804e-03 -4.589476e-03 2577 35.398876 4.271112
130 130 1 -8.838220e-03 -9.496282e-04 4921 14.506372 4.088247
131 131 1 1.009090e-02 -2.243944e-03 3385 38.372120 4.067030
132 132 1 5.630660e-03 -8.632211e-04 4003 33.322365 3.776054
133 133 1 -9.103803e-03 -6.322661e-03 2758 47.934212 3.739807
134 134 1 6.225513e-03 -1.824928e-03 3925 37.085732 3.389725
135 135 1 -1.303080e-03 3.580316e-03 2978 27.432941 4.345174
136 136 1 1.355920e-02 3.468190e-03 5058 57.797195 3.739124
137 137 1 2.092464e-02 -3.244962e-04 2400 3.931096 3.032193
138 138 1 5.691811e-02 -7.933985e-04 3885 15.069956 3.414036
139 139 1 8.052407e-05 -3.197287e-03 3493 33.993008 3.881695
140 140 1 -1.892967e-02 -5.049255e-03 2985 24.904482 4.417928
141 141 1 2.278842e-02 1.188287e-02 3666 31.670097 3.313449
142 142 1 1.496110e-02 2.181270e-03 3702 30.498932 3.171413
[ reached 'max' / getOption("max.print") -- omitted 18 rows ]
$id
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
[34] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
[100] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
[133] 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
I tried lowering the number of knots in the gam call, but that didn't help either.
Does anyone have an idea?
I fit the gam using the following line:
mgcv::gam(slide ~ s(curv_plan) + s(curv_prof) + s(dem) + s(slope) + s(ca), data = dataframes_new[[7]], family = binomial)
I have experienced the same issue. The root cause was that some of my categorical variables had fewer levels than k in my formula specification. To give an example:
Suppose one of the terms in my formula specification was:
s(I(pmin(example_variable, 120)), k = 5)
and the data in my example_variable had 3 levels (say, "yes", "no", "maybe"). This would throw the above-mentioned error.
In my case, I solved it by creating additional levels in my data (I was creating test data for a unit test). In other cases it could be solved by ensuring k does not exceed the number of levels in your categorical variables.
If you're using categorical variables, check if the root cause might be the same for you.
I found the solution to my problem by reading these:
https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002148.html
https://stat.ethz.ch/pipermail/r-help/2007-October/143569.html
The error means that you tried to create a thin plate spline basis expansion with more basis functions than the variable from which the expansion is to be made has unique values.
As you don't show the model fitting code, we can't say more than that one of the smooths in the model you tried to fit didn't have enough unique values for the value of k you specified or used (if you didn't set k, a default value was used).
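A quick way to find the offending term is to count the unique values of every covariate that gets a smooth and compare that with k (10 by default for a one-dimensional s() term). The sketch below uses simulated data, where the hypothetical covariate x2 takes only 4 values, to reproduce the error and then fit the model with k capped at the number of unique values. For the question's data, the same sapply() call on the five covariates of dataframes_new[[7]] would show whether one of them has fewer than 10 unique values.

```r
library(mgcv)

set.seed(1)
n <- 200
# Simulated data: x1 is continuous, x2 takes only 4 distinct values
dat <- data.frame(x1 = runif(n), x2 = sample(1:4, n, replace = TRUE))
dat$y <- rbinom(n, 1, plogis(sin(2 * pi * dat$x1) + 0.3 * dat$x2 - 0.75))

# Number of unique values per covariate that will get a smooth
n_unique <- sapply(dat[c("x1", "x2")], function(v) length(unique(v)))
n_unique

# With the default basis size (k = 10 for a 1-D thin plate smooth), s(x2) fails
fit_err <- tryCatch(
  gam(y ~ s(x1) + s(x2), data = dat, family = binomial),
  error = function(e) conditionMessage(e)
)
fit_err

# Capping k at the number of unique values lets the model fit
k_x2 <- min(10L, n_unique[["x2"]])
fit_ok <- gam(y ~ s(x1) + s(x2, k = k_x2), data = dat, family = binomial)
```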

Conditionally replace column names in a dataframe based on values in another dataframe

I have downloaded a table of stream diversion data ("df_downloaded"). The column names of this table are primarily the ID numbers of the gauging stations.
I want to conditionally replace the ID numbers that have been used as column names with the station names, which will make the data more readable when I share the results. I created a table ("stationIDs") with the ID numbers and station names to use as a reference for renaming the columns of "df_downloaded".
I can replace the column names individually, but I want to write a loop of some kind that goes through all of the columns of "df_downloaded" and changes the names of the columns listed in the data frame "stationIDs".
An example of what I'm trying to do is below.
Downloaded Data ("df_downloaded")
A portion of the downloaded data is similar to this:
df_downloaded <- data.frame(Var1 = seq(as.Date("2012-01-01"),as.Date("2012-12-01"), by="month"),
Var2 = sample(50:150,12, replace =TRUE),
Var3 = sample(10:100,12, replace =TRUE),
Var4 = sample(15:45,12, replace =TRUE),
Var5 = sample(50:200,12, replace =TRUE),
Var6 = sample(15:100,12, replace =TRUE),
Var7 = c(rep(0,3),rep(13,6),rep(0,3)),
Var8 = rep(5,12))
colnames(df_downloaded) <- c("Diversion.Date","360410059","360410060",
"360410209","361000655","361000656","Irrigation","Seep")
df_downloaded # not run
#
# Diversion.Date 360410059 360410060 360410209 361000655 361000656 Irrigation Seep
# 1 2012-01-01 93 57 28 101 16 0 5
# 2 2012-02-01 102 68 19 124 98 0 5
# 3 2012-03-01 124 93 36 109 56 0 5
# 4 2012-04-01 94 96 23 54 87 13 5
# 5 2012-05-01 83 70 43 119 15 13 5
# 6 2012-06-01 78 63 45 195 15 13 5
# 7 2012-07-01 86 77 20 130 63 13 5
# 8 2012-08-01 118 29 27 118 57 13 5
# 9 2012-09-01 142 18 45 116 27 13 5
# 10 2012-10-01 74 68 34 182 79 0 5
# 11 2012-11-01 106 48 27 95 74 0 5
# 12 2012-12-01 91 41 20 179 55 0 5
Reference Table ("stationIDs")
stationIDs <- data.frame(ID = c("360410059", "360410060", "360410209", "361000655", "361000656"),
Names = c("RimView", "IPCO", "WMA.Ditch", "RV.Bypass", "LowerFalls"))
stationIDs # not run
#
# ID Names
# 1 360410059 RimView
# 2 360410060 IPCO
# 3 360410209 WMA.Ditch
# 4 361000655 RV.Bypass
# 5 361000656 LowerFalls
I can replace the column names in "df_downloaded" one statement at a time. The first three statements are shown below.
After three statements, "RimView", "IPCO", and "WMA.Ditch" have replaced their respective gauge ID numbers.
names(df_downloaded) <- gsub(stationIDs$ID[1],stationIDs$Names[1],names(df_downloaded))
# head(df_downloaded)
# Diversion.Date RimView 360410060 360410209 361000655 361000656 Irrigation Seep
# 1 2012-01-01 93 57 28 101 16 0 5
# 2 2012-02-01 102 68 19 124 98 0 5
# 3 2012-03-01 124 93 36 109 56 0 5
# 4 2012-04-01 94 96 23 54 87 13 5
# 5 2012-05-01 83 70 43 119 15 13 5
# 6 2012-06-01 78 63 45 195 15 13 5
names(df_downloaded) <- gsub(stationIDs$ID[2],stationIDs$Names[2],names(df_downloaded))
# head(df_downloaded)
# Diversion.Date RimView IPCO 360410209 361000655 361000656 Irrigation Seep
# 1 2012-01-01 93 57 28 101 16 0 5
# 2 2012-02-01 102 68 19 124 98 0 5
# 3 2012-03-01 124 93 36 109 56 0 5
# 4 2012-04-01 94 96 23 54 87 13 5
# 5 2012-05-01 83 70 43 119 15 13 5
# 6 2012-06-01 78 63 45 195 15 13 5
names(df_downloaded) <- gsub(stationIDs$ID[3],stationIDs$Names[3],names(df_downloaded))
# head(df_downloaded)
# Diversion.Date RimView IPCO WMA.Ditch 361000655 361000656 Irrigation Seep
# 1 2012-01-01 93 57 28 101 16 0 5
# 2 2012-02-01 102 68 19 124 98 0 5
# 3 2012-03-01 124 93 36 109 56 0 5
# 4 2012-04-01 94 96 23 54 87 13 5
# 5 2012-05-01 83 70 43 119 15 13 5
# 6 2012-06-01 78 63 45 195 15 13 5
If I try to do the renaming using a for loop, I end up with NAs for column names.
for(i in seq_along(names(df_downloaded))){
names(df_downloaded) <- gsub(stationIDs$ID[i],stationIDs$Names[i],names(df_downloaded))
}
# head(df_downloaded)
# NA NA NA NA NA NA NA NA
# 1 2012-01-01 93 57 28 101 16 0 5
# 2 2012-02-01 102 68 19 124 98 0 5
# 3 2012-03-01 124 93 36 109 56 0 5
# 4 2012-04-01 94 96 23 54 87 13 5
# 5 2012-05-01 83 70 43 119 15 13 5
# 6 2012-06-01 78 63 45 195 15 13 5
I really want to be able to change the names with a for loop or something similar, because the number of stations that I download data from changes depending on the years that I am analyzing.
Thanks for taking time to look at my question.
We can use match
#Convert factor columns to character
stationIDs[] <- lapply(stationIDs, as.character)
#Match names of df_downloaded with stationIDs$ID
inds <- match(names(df_downloaded), stationIDs$ID)
#Replace the matched name with corresponding Names from stationIDs
names(df_downloaded)[which(!is.na(inds))] <- stationIDs$Names[inds[!is.na(inds)]]
df_downloaded
# Diversion.Date RimView IPCO WMA.Ditch RV.Bypass LowerFalls Irrigation Seep
#1 2012-01-01 142 14 41 200 79 0 5
#2 2012-02-01 97 100 35 176 22 0 5
#3 2012-03-01 85 59 26 88 71 0 5
#4 2012-04-01 68 49 34 63 15 13 5
#5 2012-05-01 62 58 44 87 16 13 5
#6 2012-06-01 70 59 33 145 87 13 5
#7 2012-07-01 112 65 25 52 64 13 5
#8 2012-08-01 75 12 27 103 19 13 5
#9 2012-09-01 73 65 36 172 68 13 5
#10 2012-10-01 87 35 27 146 42 0 5
#11 2012-11-01 122 17 33 183 32 0 5
#12 2012-12-01 108 65 15 120 99 0 5
You can do this with dplyr and tidyr. Reshape the data to long format so that the IDs are in a column, join that column to your ID-to-name reference table, and then reshape back to wide format.
library(dplyr)
library(tidyr)
df_downloaded %>%
gather(ID, value, -Diversion.Date, -Irrigation, -Seep) %>%
left_join(stationIDs, by = "ID") %>%
dplyr::select(-ID) %>%
spread(Names, value)  # note: spread() orders the new columns alphabetically
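The loop in the question can also be repaired directly. It iterates over all 8 column names, but stationIDs has only 5 rows, so stationIDs$ID[6] is NA, and gsub() with an NA pattern returns NA for every element, which is where the NA column names come from. A sketch that loops over the rows of the lookup table instead and matches names exactly (the small data frames below are made up for illustration):

```r
set.seed(42)
df_downloaded <- data.frame(
  Diversion.Date = seq(as.Date("2012-01-01"), as.Date("2012-03-01"), by = "month"),
  a = sample(50:150, 3), b = sample(10:100, 3), c = rep(0, 3)
)
names(df_downloaded) <- c("Diversion.Date", "360410059", "360410060", "Irrigation")

stationIDs <- data.frame(
  ID = c("360410059", "360410060", "360410209"),
  Names = c("RimView", "IPCO", "WMA.Ditch"),
  stringsAsFactors = FALSE
)

# Loop over the rows of the lookup table, not over the columns of the data;
# IDs that are not present in df_downloaded (here 360410209) are simply skipped
for (i in seq_len(nrow(stationIDs))) {
  hit <- names(df_downloaded) == stationIDs$ID[i]   # exact match, no regex
  names(df_downloaded)[hit] <- as.character(stationIDs$Names[i])
}

names(df_downloaded)   # "Diversion.Date" "RimView" "IPCO" "Irrigation"
```

Comparing with == rather than gsub() also avoids partially rewriting a name when one ID happens to be a substring of another.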

Using barplot in R: the axes don't match the data

I want to use barplot (or a better alternative) to plot the following data:
action_number times
1 1 13408
2 2 5550
3 3 2757
4 4 1782
5 5 1114
6 6 847
7 7 582
8 8 410
9 9 306
10 10 278
11 11 212
12 12 165
13 13 139
14 14 112
15 15 106
16 16 82
17 17 64
18 18 61
19 19 69
20 20 47
21 21 31
22 22 40
23 23 34
24 24 31
25 25 28
26 26 26
27 27 21
28 28 16
29 29 14
30 30 16
31 31 11
32 32 10
33 33 11
34 34 10
35 35 4
36 36 6
37 37 5
38 38 8
39 39 6
40 40 3
41 41 6
42 42 8
43 43 3
44 44 3
45 45 7
46 46 8
47 47 4
48 48 4
49 49 1
50 50 4
51 51 2
52 52 4
53 53 3
54 54 1
55 55 2
56 56 1
57 58 2
58 59 4
59 60 1
60 62 2
61 63 1
62 66 1
63 67 4
64 68 2
65 69 1
66 70 1
67 71 1
68 73 1
69 74 1
70 77 1
71 79 1
72 80 1
73 82 1
74 92 2
75 97 1
76 98 1
77 103 1
78 106 1
79 114 1
80 118 1
81 128 1
82 142 1
83 148 1
84 153 1
85 155 1
86 166 1
87 183 1
88 218 1
89 224 1
90 298 1
91 536 1
I am using the following, but it does not match the data correctly:
mp <- barplot(data$times,axes=FALSE,ylim=c(0,13408))
axis(1,at=data$action_number,labels=data$action_number)
#??? Should I use at=data$action_number to at=data$times
axis(2,seq(0,91,3),c(0:30))
Problems:
- the x-axis does not have 536, it only goes to 224
- the Y axis only shows one number
Can you give me some advice? Should I use a particular package?
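It's still unclear what you want, but maybe something like this. Note that barplot() draws the bars at the midpoints it returns (mp), not at action_number, so the x-axis labels have to be placed at mp:
barplot(data$times, names.arg = data$action_number)
mp <- barplot(data$times, axes = FALSE, ylim = c(0, 13408))
idx <- seq(1, nrow(data), 10)  # label every 10th bar
axis(1, at = mp[idx], labels = data$action_number[idx])
axis(2, seq(0, 13408, 500), seq(0, 13408, 500))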
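Why the labels end up in the wrong place: barplot() does not draw bar i at x = i or at action_number[i]. With the default width = 1 and space = 0.2, the bar midpoints sit at 0.7, 1.9, 3.1 and so on, evenly spaced even where action_number has gaps (56 to 58, 224 to 298 to 536). A small sketch with a few rows of the data, drawn to a null PDF device so it runs without a screen:

```r
# A few rows of the question's data (note the gaps in action_number)
data <- data.frame(action_number = c(1, 2, 3, 58, 92, 536),
                   times         = c(13408, 5550, 2757, 2, 2, 1))

pdf(NULL)  # null device, so the sketch runs without opening a window
# barplot() returns the x positions (midpoints) of the bars it drew
mp <- barplot(data$times, axes = FALSE, ylim = c(0, 14000))
# Put the tick labels at the bar midpoints, not at action_number
axis(1, at = mp, labels = data$action_number)
axis(2, at = seq(0, 14000, 2000))
invisible(dev.off())

as.vector(mp)  # 0.7 1.9 3.1 4.3 5.5 6.7
```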

How to label boxplots made with svyboxplot() from the survey package in R

I am trying to figure out how to label the boxplots produced by the svyboxplot() function from the survey package.
I have tried the following:
svyboxplot(~ALCANYNO~factor(REGION), design=ihisDesign3, xlab='Region', ylab='Frequency', ylim=c(0,10), colnames=c("Northeast", "Midwest", "South", "West"));
SOLUTION: Add the following to factor:
labels = c('Northeast', 'Midwest', 'South', 'West')
This changes the example above to the following:
svyboxplot(~ALCANYNO~factor(REGION,
labels=c('Northeast', 'Midwest', 'South', 'West')),
design=ihisDesign3, xlab='Region', ylab='Frequency',
ylim = c(0, 10))
Here is a small example dataset to illustrate:
options(width = 120)
library (survey)
library (KernSmooth)
xd1<-
"xsmoke age_p psu stratum wt8
13601 3 22 2 20 356.5600
32966 3 38 2 45 434.3562
63493 1 32 1 87 699.9987
238175 3 46 1 338 982.8075
174162 3 40 1 240 273.6313
220206 3 33 2 308 1477.1688
118133 3 68 1 159 716.3012
142859 2 23 1 194 1100.9475
115253 2 35 2 155 444.3750
61675 3 31 1 85 769.5963
189813 3 37 1 263 328.5600
226274 1 47 2 318 605.8700
41969 3 71 2 58 597.0150
167667 3 40 2 230 1030.4637
225103 3 37 2 316 349.6825
49894 3 70 2 68 517.7862
98075 3 46 2 130 1428.7225
180771 3 50 1 250 652.4188
137057 3 42 1 186 590.2100
77705 2 23 1 105 1687.2450
89106 3 48 1 118 407.6513
208178 3 50 1 290 556.5000
100403 3 52 2 133 1481.8200
221571 1 27 2 310 833.5338
10823 2 72 1 16 1807.6425
108431 3 71 2 145 945.6263
68708 1 46 1 94 1989.3775
23874 3 23 2 33 1707.8775
150634 3 19 2 206 761.1500
231232 3 42 2 326 1487.4113
184654 2 42 2 255 1715.2375
215312 3 57 1 300 483.5663
40713 2 57 2 56 2042.2762
130309 3 23 1 177 948.5625
25515 2 55 1 35 2719.7525
235612 2 83 2 333 603.3537
13755 2 36 2 20 265.1938
2441 3 33 1 4 1062.1200
157327 3 77 1 215 2010.6600
66502 3 20 2 91 1122.9725
230778 1 55 2 325 1207.3025
74805 3 54 1 101 1028.5150
166556 1 50 1 229 1546.9450
91914 1 68 1 121 428.5350
89651 3 59 2 118 143.5437
149329 3 44 2 204 1064.7725
212700 2 59 2 295 1050.1163
454 1 79 1 1 275.5700
125639 1 27 1 170 785.1037
55442 3 47 1 76 950.3312
145132 3 77 1 197 1269.2287
123069 3 24 1 167 216.1937
188301 1 55 2 260 426.6313
852 2 66 2 1 1443.4887
3582 3 81 1 6 790.8412
235423 1 44 2 333 659.4238
42175 2 40 1 59 1089.6762
57033 3 43 1 78 226.8750
177273 2 85 1 244 392.7200
218558 3 40 2 305 1680.2700
27784 2 45 1 39 280.0550
81823 3 43 1 110 965.0438
76344 3 26 1 103 1095.6012
114916 3 56 2 154 436.8838
35563 3 78 1 49 333.2875
192279 3 30 2 267 722.0312
61315 1 48 2 84 1426.5725
219903 3 43 1 308 791.5738
42612 3 25 1 60 658.1387
178488 3 33 2 246 675.1912
9031 1 27 2 14 989.4863
145092 2 64 1 197 960.1912
71885 3 53 2 97 595.4050
38137 2 75 1 53 1004.0912
140149 1 21 1 190 1870.9350
162052 3 25 1 223 892.7775
89527 2 39 2 118 518.1050
59650 3 26 2 82 432.7837
24709 2 84 1 34 453.9013
18933 3 85 1 27 582.3288
24904 3 35 2 34 1027.5287
213668 3 39 1 298 3174.1925
110509 3 30 1 149 469.8188
72462 3 63 1 98 386.2163
152596 3 19 1 209 1328.2188
17014 4 62 1 24 294.9250
33467 2 50 1 46 1601.4575
5241 3 33 1 9 1651.0988
215094 3 23 1 300 427.6313
88885 1 21 1 118 1092.2613
204868 2 60 2 285 781.2325
157415 2 31 2 215 1323.5750
71081 2 44 2 96 1059.2088
25420 3 38 1 35 530.7413
144226 1 27 1 196 1126.3112
47888 3 46 2 66 965.4050
216179 3 29 2 301 1237.6463
29172 3 68 1 41 1025.9738
168786 1 47 1 232 680.6213
94035 2 23 2 124 330.4563
170542 1 25 2 234 757.2287
160331 2 33 2 220 636.3900
124163 3 80 2 167 287.6988
71442 2 37 1 97 442.2300
80191 2 74 2 107 871.0338
199309 3 29 2 277 485.2337
91293 3 35 2 120 138.3187
219524 2 68 1 307 609.5862
119336 3 85 2 160 149.7612
31814 3 68 1 44 396.6913
54920 1 28 2 75 532.7175
161034 3 29 2 221 791.0100
177037 1 50 1 244 626.2400
119963 1 54 1 162 374.1062
107972 2 58 1 145 944.8863
22932 3 60 1 32 310.6413
54197 3 23 2 74 931.2737
209598 3 23 1 292 1078.2950
213604 1 74 2 297 588.5000
146480 3 27 1 200 212.0588
162463 3 55 2 223 1202.0925
215534 3 33 2 300 430.3938
100703 1 53 1 134 463.6200
162588 3 27 1 224 612.0250
222676 1 35 1 312 292.7000
220052 3 84 1 308 1301.4738
131382 3 36 1 178 825.9512
102117 3 28 1 137 451.4075
70362 3 52 2 95 185.2562
188757 3 22 2 261 704.3913
215878 2 37 1 301 789.9837
45820 3 18 2 64 2019.4137
84860 3 47 1 113 149.0200
110581 3 37 1 149 526.0775
207650 3 51 2 289 688.0538
40723 3 59 2 56 497.6050
169663 3 19 2 233 845.0362
191955 1 36 1 267 735.7350
213816 3 18 2 298 2275.3513
120967 3 48 2 163 1055.3238
209430 2 42 2 291 1771.0225
21235 3 21 1 30 1204.5663
131326 3 29 1 178 331.9588
19667 1 57 1 28 638.9138
74743 2 48 1 101 1208.8763
178672 3 66 2 246 338.2013
100174 3 24 2 133 1733.6275
69046 3 24 2 94 542.4863
79960 1 41 2 107 567.6363
108591 2 42 1 146 978.3775
235635 3 24 1 334 1382.9437
187426 2 54 2 259 478.2362
28728 3 39 2 40 1165.6175
205348 3 32 2 286 1082.9913
218812 3 30 1 306 308.1037
168389 3 48 2 231 593.2475
145479 1 21 1 198 864.2663
105170 2 40 1 141 1016.7862
155753 2 78 2 212 1109.0025
169399 3 28 1 233 1467.1363
55664 1 63 1 76 904.3763
74024 2 51 1 100 547.5538
85558 1 25 1 114 893.8825
142684 3 54 2 193 1203.3212
198792 1 22 1 277 1800.3325
82603 3 70 2 110 827.3763
171036 2 50 2 235 2003.9725
1616 1 42 2 2 590.5662
57042 3 45 1 78 1021.7287
45100 2 38 2 63 1807.9288
134828 2 28 1 183 715.1187
91167 3 26 2 120 480.1950
170605 3 40 2 234 507.2763
175869 3 77 1 242 386.2987
81594 2 82 2 109 580.0838
37426 1 20 2 52 1159.1613
113799 3 85 1 153 459.5450
24721 3 18 2 34 2912.7575
26297 3 45 2 36 1304.4925
57074 1 51 1 78 602.2112
185000 3 34 1 256 583.5738
94196 3 44 2 124 2344.1087
80656 3 45 2 108 1340.9713
14849 1 46 1 22 967.2525
145730 2 73 1 198 418.8037
56633 3 34 2 77 1011.5488
273 2 54 1 1 786.2138
60567 1 40 2 83 315.2925
47788 1 38 2 66 1105.9188
76943 2 53 2 103 537.7062
165014 3 34 1 227 824.3125
188444 3 22 1 261 623.2225
29043 1 35 1 41 724.9025
165578 3 25 1 228 596.0275
50702 3 43 2 69 985.9662
197621 3 39 2 275 1310.1163
26267 3 41 2 36 1030.3900
29565 1 60 2 41 920.8550
20060 3 36 2 28 157.2188
119780 2 20 1 162 863.8100"
tor <- read.table(textConnection(xd1), header=TRUE, as.is=TRUE)
# Grouping variable "xsmoke" must be a factor
tor$xsmoke <- factor(tor$xsmoke,levels=c (1,2,3),
labels=c('Current SMK','Former SMK', 'Never Smk'), ordered=TRUE)
is.factor(tor$xsmoke)
# object with survey design variables and data
nhis <- svydesign (id=~psu,strat=~stratum, weights=~wt8, data=tor, nest=TRUE)
MyBreaks <- c(18, 25, 35, 45, 55, 65, 75, 85)
svyboxplot (age_p~xsmoke,
subset (nhis, age_p>=0),
col=c("red", "yellow", "green"), medcol="blue",
varwidth=TRUE, all.outliers=TRUE,
ylab="Age at Interview",
xlab=" "
)
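The key is that the grouping variable xsmoke is a factor, created with tor$xsmoke <- factor(tor$xsmoke, levels = c(1, 2, 3), labels = c('Current SMK', 'Former SMK', 'Never Smk'), ordered = TRUE); svyboxplot() uses the factor labels to name the boxes, so setting the labels there is what annotates the plot.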
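For a self-contained check, here is a sketch using the apistrat example data that ships with the survey package (not the NHIS data from the question). The new variable school_type is a hypothetical name; relabelling the grouping factor on the design object with update() makes svyboxplot() show those labels under the boxes:

```r
library(survey)
data(api)  # example data shipped with survey (apistrat and others)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)

# Add a relabelled copy of the grouping factor to the design;
# its labels become the names shown under the boxes
dstrat <- update(dstrat, school_type = factor(stype, levels = c("E", "M", "H"),
                                              labels = c("Elementary", "Middle", "High")))

pdf(NULL)  # null device, so the sketch runs without opening a window
svyboxplot(enroll ~ school_type, dstrat, ylab = "Enrollment", xlab = "School type")
invisible(dev.off())
```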
