Hello, I have this data frame:
res1 res4 aa1234
1 1 4 IVGG
2 10 13 RQFP
3 102 105 TSSV
4 112 115 LQNA
5 118 121 EAGT
6 12 15 FPFL
7 132 135 RSGG
8 138 141 SRFP
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
12 165 168 TRRG
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
16 181 184 CEGL
17 195 198 PCGR
18 20 23 NQGR
19 205 208 RVAL
20 32 35 HARF
21 39 42 AASC
22 40 43 ASCF
23 48 51 PGVS
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
33 95 98 LDRE
I want to subset it, keeping only rows whose res1 values follow each other closely, i.e. a row starting at i is kept together with any row starting between i and i+4, like this:
res1 res4 aa1234
29 8 11 RPRQ
6 12 15 FPFL
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
I tried the functions "filter" and "subset", but I didn't get the expected result.
So in general, I need the rows that overlap or touch within the range i to i+4, including i+4.
For example, in these 3 lines rows [9] and [10] overlap (150-153 overlaps with 151-154), and row [11] starts at res1[10] + 4 (151 + 4 = 155). So maybe the idea is to take res1[i] and check whether res1[i+1] <= res1[i] + 4.
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
Why not simply do this?
df[df$res1 %in% c(df$res1 -4,df$res1 -3, df$res1-2, df$res1 -1, df$res1+1,df$res1 +2, df$res1 +3, df$res1 +4),]
res1 res4 aa1234
2 10 13 RQFP
6 12 15 FPFL
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
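The eight offsets do not have to be typed out by hand. As a minimal sketch, assuming only the res1 column matters and using a hypothetical `width` variable for the window size, they can be generated with outer():

```r
# res1 values from the question's data, in their original row order
res1 <- c(1, 10, 102, 112, 118, 12, 132, 138, 150, 151, 155, 165, 171, 172,
          174, 181, 195, 20, 205, 32, 39, 40, 48, 57, 59, 64, 65, 78, 8, 82,
          83, 86, 95)

width <- 4                            # hypothetical name for the window size
offsets <- setdiff(-width:width, 0)   # -4..-1 and 1..4

# every value that lies 1..4 above or below some row's res1
reachable <- as.vector(outer(res1, offsets, "+"))

# keep a row when its res1 is reachable from another row
keep <- res1 %in% reachable
res1[keep]
```

With the full data frame, `df[keep, ]` (where `keep` is computed from `df$res1`) should select the same rows as the expression above.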
Edit: for the edited scenario, just order the df first; the rest stays the same. See:
df <- df[order(df$res1),]
df[sort(unique(c(which(rev(diff(rev(df$res1))) >= -3 & rev(diff(rev(df$res1))) <= 0), which(diff(df$res1) <= 4 & diff(df$res1) >= 0)+1))),]
res1 res4 aa1234
29 8 11 RPRQ
2 10 13 RQFP
6 12 15 FPFL
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
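Note that the one-liner tests the gap to the next row against 3 but the gap to the previous row against 4, so a row whose next neighbour is exactly 4 away (78 before 82) is dropped. A more readable sketch, assuming the same threshold of 4 should apply in both directions (the names `width`, `near_prev` and `near_next` are made up for illustration):

```r
# res1 values from the question's data
res1 <- c(1, 10, 102, 112, 118, 12, 132, 138, 150, 151, 155, 165, 171, 172,
          174, 181, 195, 20, 205, 32, 39, 40, 48, 57, 59, 64, 65, 78, 8, 82,
          83, 86, 95)

width <- 4             # assumed symmetric window size
x <- sort(res1)        # neighbours are only meaningful on sorted values
gap <- diff(x)         # gap[i] is x[i + 1] - x[i]

near_prev <- c(FALSE, gap <= width)   # previous value is at most 4 below
near_next <- c(gap <= width, FALSE)   # next value is at most 4 above

kept <- x[near_prev | near_next]
kept
```

On the data frame itself the same logical vector can be used after `df <- df[order(df$res1), ]`, as `df[near_prev | near_next, ]`.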
Old answer: use this
df[sort(unique(c(which(rev(diff(rev(df$res1))) >= -3 & rev(diff(rev(df$res1))) <= 0), which(diff(df$res1) <= 4 & diff(df$res1) >= 0)+1))),]
res1 res4 aa1234
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
21 39 42 AASC
22 40 43 ASCF
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
Data used
df <- read.table(text = "res1 res4 aa1234
1 1 4 IVGG
2 10 13 RQFP
3 102 105 TSSV
4 112 115 LQNA
5 118 121 EAGT
6 12 15 FPFL
7 132 135 RSGG
8 138 141 SRFP
9 150 153 PEDQ
10 151 154 EDQC
11 155 158 RPNN
12 165 168 TRRG
13 171 174 CNGD
14 172 175 NGDG
15 174 177 DGGT
16 181 184 CEGL
17 195 198 PCGR
18 20 23 NQGR
19 205 208 RVAL
20 32 35 HARF
21 39 42 AASC
22 40 43 ASCF
23 48 51 PGVS
24 57 60 AYDL
25 59 62 DLRR
26 64 67 ERQS
27 65 68 RQSR
28 78 81 ENGY
29 8 11 RPRQ
30 82 85 DPQQ
31 83 86 PQQN
32 86 89 NLND
33 95 98 LDRE", header = TRUE)
I tried fitting GAMs to some data frames I have. All but one work. The one that fails gives the error:
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom
I looked around on the internet but couldn't figure out what is going wrong. All 7 of my other data frames run without a problem.
I then ran epiR::epi.cp(srtm[-c(1,7,8)]) and it gave me this output:
$cov.pattern
id n curv_plan curv_prof dem slope ca
1 1 1 1.113192e-02 3.991046e-03 3909 43.601479 5.225853
2 2 1 -2.686749e-03 3.474989e-03 3312 35.022511 4.418310
3 3 1 -1.033450e-02 -4.626922e-03 3326 36.678623 4.421465
4 4 1 -5.439283e-03 2.066148e-03 4069 31.501045 3.887526
5 5 1 -2.602015e-03 -1.249511e-04 3021 37.199219 5.010560
6 6 1 1.068216e-03 1.216902e-03 2844 44.694374 4.852220
7 7 1 -1.855443e-02 -5.965539e-03 2841 42.753750 5.088554
8 8 1 2.363193e-03 2.353357e-03 2833 33.160995 4.652209
9 9 1 2.169674e-02 1.049735e-02 2964 32.311535 4.671970
10 10 1 2.850910e-02 9.416230e-03 2956 50.791847 3.496096
11 11 1 -1.932028e-02 4.949751e-04 2794 38.714302 4.217102
12 12 1 -1.372750e-03 -4.437230e-03 3799 48.356312 4.597039
13 13 1 1.154181e-04 -4.114155e-03 3808 54.669777 3.518823
14 14 1 2.743768e-02 7.829833e-03 3580 23.674162 3.268744
15 15 1 7.216539e-03 9.818082e-04 3969 29.421440 4.354250
16 16 1 2.385139e-03 6.333927e-04 3635 10.555381 4.905733
17 17 1 -1.129411e-02 2.719948e-03 2805 29.195084 4.807369
18 18 1 4.584329e-04 -1.497223e-03 3676 32.754879 3.729304
19 19 1 1.883965e-03 4.189690e-03 3165 30.973505 4.833158
20 20 1 -5.350136e-03 -2.615470e-03 2745 32.534698 4.420852
21 21 1 1.484253e-02 -1.245213e-03 3872 26.113234 4.045357
22 22 1 -2.449377e-02 -5.045668e-04 2931 31.060991 5.170872
23 23 1 -2.962795e-02 -9.271557e-03 2917 21.680889 4.547461
24 24 1 -2.487545e-02 -7.834328e-03 2736 41.775677 4.543325
25 25 1 2.890568e-03 -2.040353e-03 2577 47.003765 3.739546
26 26 1 -5.119631e-03 8.869720e-03 3401 38.519680 5.428564
27 27 1 6.171266e-03 -6.515175e-04 2687 36.678623 4.152842
28 28 1 -8.297552e-03 -7.053435e-03 3678 39.532673 4.081311
29 29 1 8.652663e-03 2.394378e-03 3515 33.895370 4.220177
30 30 1 -2.528805e-03 -1.293259e-03 3404 42.548138 4.266330
31 31 1 1.899994e-02 6.367806e-03 3191 41.696201 3.300749
32 32 1 -2.243623e-02 -1.866033e-04 2433 34.162479 5.364681
33 33 1 -6.934012e-03 9.280805e-03 2309 32.667160 5.650699
34 34 1 -1.121149e-02 6.376335e-05 2188 31.119059 4.706416
35 35 1 -1.429000e-02 5.299596e-04 2511 34.543365 4.538456
36 36 1 -7.168889e-03 1.301791e-03 2625 30.826660 4.059711
37 37 1 -4.226461e-03 7.440552e-03 2830 33.398251 4.941027
38 38 1 -2.635832e-03 8.748529e-03 3378 45.972672 4.861779
39 39 1 -2.007920e-02 -8.081778e-03 3281 31.735376 5.173269
40 40 1 -3.453595e-02 -6.867430e-03 2690 47.515182 4.935358
41 41 1 1.698363e-03 -8.296107e-03 2529 42.224693 4.386349
42 42 1 5.257193e-03 1.021242e-02 2571 43.070564 4.194372
43 43 1 6.968817e-03 5.538784e-03 2581 36.055031 4.209373
44 44 1 -7.632907e-04 2.803704e-04 2582 28.257311 4.230427
45 45 1 -3.468894e-03 -9.099842e-04 2409 29.421440 4.190946
46 46 1 1.879089e-02 6.532978e-03 3733 41.535984 4.032614
47 47 1 -1.076225e-03 -1.138945e-03 2712 39.260731 4.580621
48 48 1 -5.306205e-03 2.667941e-03 3446 34.250553 4.925404
49 49 1 -5.380515e-03 -2.595619e-03 3785 50.561493 4.642792
50 50 1 -2.571232e-03 -2.063937e-03 3768 46.160892 4.728879
51 51 1 -7.638110e-03 -2.432463e-03 3413 32.401161 5.058373
52 52 1 -2.950254e-03 -2.034031e-04 3852 32.543564 4.443869
53 53 1 -2.702386e-03 -1.776183e-03 2483 31.002720 3.879390
54 54 1 -3.892425e-02 -2.266178e-03 2225 26.126318 5.750985
55 55 1 -2.644659e-03 3.034660e-03 2192 32.103516 4.949506
56 56 1 -2.862503e-02 3.673996e-04 2361 23.930893 5.181818
57 57 1 6.263880e-03 -7.725377e-04 3780 17.752790 4.890797
58 58 1 1.054093e-03 -1.563014e-03 3089 36.422310 4.520845
59 59 1 9.474340e-04 -3.901043e-03 3155 42.552841 4.265886
60 60 1 5.569567e-03 -1.770366e-04 3516 13.166321 4.772187
61 61 1 -8.342760e-03 -9.908290e-03 3097 36.815479 5.346615
62 62 1 -1.422498e-03 -1.645628e-03 2865 29.802414 4.131463
63 63 1 4.523963e-02 1.067406e-02 2163 36.154739 3.369432
64 64 1 -1.164162e-02 6.808200e-04 2316 19.610609 4.634536
65 65 1 -8.043590e-03 9.395104e-03 2614 44.298817 3.983136
66 66 1 -1.925332e-02 -4.521391e-03 2035 31.205780 4.134195
67 67 1 -1.429050e-02 5.435983e-03 2799 38.876656 4.180761
68 68 1 6.935605e-04 3.015038e-03 2679 37.863647 4.213497
69 69 1 -5.062089e-03 5.961242e-04 2831 32.401161 3.729215
70 70 1 -3.617065e-04 -2.874465e-03 3152 45.871994 4.703659
71 71 1 -4.216370e-02 -4.917050e-03 3726 25.376934 4.614913
72 72 1 -2.184333e-02 -2.840071e-03 3610 43.138550 4.237120
73 73 1 -1.735273e-02 -2.199261e-03 3339 33.984894 4.811754
74 74 1 1.929157e-02 5.358084e-03 3447 32.356407 3.355368
75 75 1 -4.118797e-02 -2.408211e-03 3251 22.373844 5.160147
76 76 1 -1.393304e-02 7.900328e-05 3297 22.090260 4.724728
77 77 1 -3.078095e-02 -5.535597e-03 3143 37.298687 4.625203
78 78 1 1.717030e-02 -1.120720e-03 3617 37.965389 4.627342
79 79 1 -5.965119e-04 -5.377157e-04 3689 28.360373 4.767213
80 80 1 7.843294e-03 -9.579902e-04 3676 48.356312 3.907819
81 81 1 5.994634e-03 2.034169e-03 2759 25.142431 3.980591
82 82 1 -1.323012e-02 2.393529e-03 3972 26.880308 5.107575
83 83 1 6.312347e-03 2.877600e-04 3323 32.167103 3.496723
84 84 1 -1.180464e-02 4.438243e-03 3790 40.369972 4.081389
85 85 1 -8.333334e-03 4.009274e-03 3248 14.931417 4.881107
86 86 1 2.016023e-03 -5.707344e-04 3994 18.305449 4.278613
87 87 1 -5.515654e-03 -8.373593e-04 3368 40.703190 4.229169
88 88 1 8.931696e-03 1.677515e-03 4651 30.133842 4.327270
89 89 1 1.962347e-04 -7.458636e-04 5075 57.352509 3.263017
90 90 1 -2.880805e-02 -5.200595e-04 2645 11.976726 5.634262
91 91 1 -2.101875e-02 -5.110677e-03 3109 34.218582 4.925558
92 92 1 -8.390786e-03 -1.188547e-02 3667 39.895481 4.249029
93 93 1 -1.366958e-02 9.873455e-04 2827 22.636129 5.269634
94 94 1 1.004551e-02 5.205147e-04 3667 44.028976 3.993555
95 95 1 5.892557e-03 -5.482296e-04 2416 5.385977 4.614692
96 96 1 -1.662132e-02 -9.946494e-04 3806 42.599808 3.951163
97 97 1 -7.977792e-03 5.937776e-03 3470 28.888371 3.120762
98 98 1 -2.408042e-02 -2.647421e-03 2975 16.228737 4.227977
99 99 1 -1.191509e-02 -2.014583e-03 2461 30.051607 4.361413
100 100 1 1.110316e-02 2.506189e-04 3362 29.517509 4.591039
101 101 1 2.010373e-03 4.185408e-04 5104 17.387333 3.642855
102 102 1 -3.218945e-03 1.004196e-02 4113 44.448421 3.282414
103 103 1 2.438254e-03 2.551999e-03 3234 31.205780 3.844411
104 104 1 -1.178511e-02 2.775465e-04 1864 1.350224 3.875072
105 105 1 -9.511201e-04 -1.446065e-03 2351 22.406872 4.392300
106 106 1 -4.563018e-03 -5.890041e-03 3141 24.862123 3.998985
107 107 1 -1.471223e-02 5.965497e-03 3765 25.363234 3.661456
108 108 1 -5.857890e-03 -9.363544e-03 2272 22.878105 5.105480
109 109 1 1.369277e-02 1.019289e-02 4016 44.848000 4.092690
110 110 1 -8.784844e-03 3.358194e-03 3293 32.543564 4.115062
111 111 1 -5.148044e-03 5.372697e-03 3038 31.772562 3.626687
112 112 1 -1.556184e+35 5.799786e+34 4961 29.421440 3.020591
113 113 1 3.831991e-03 1.570888e-03 2069 28.821898 3.790284
114 114 1 8.289138e-04 6.439757e-04 2154 21.045721 3.959267
115 115 1 -4.800863e-03 3.194520e-03 5294 45.660866 3.701611
116 116 1 2.974254e-02 1.197812e-02 4380 31.670097 3.877057
117 117 1 1.137725e-02 -1.082659e-02 5172 18.774675 3.572600
118 118 1 -4.678526e-03 7.448288e-03 2257 39.260731 4.227000
119 119 1 -4.655881e-03 -1.119303e-03 3233 30.205467 5.613868
120 120 1 -4.827522e-03 -4.766134e-03 3414 42.974857 3.831894
121 121 1 -8.568994e-04 1.053632e-03 1750 29.421440 4.132886
122 122 1 1.212121e-02 0.000000e+00 5018 20.136303 3.669850
123 123 1 -4.711660e-03 -2.261143e-03 3013 45.007954 3.622240
124 124 1 -1.226328e-02 4.688181e-04 3842 26.880308 3.098333
125 125 1 3.438910e-03 1.441129e-03 3470 11.386165 4.552782
126 126 1 1.192164e-02 -1.295839e-03 3473 22.684824 4.748498
127 127 1 -1.960781e-40 0.000000e+00 4155 90.000000 2.960569
128 128 1 2.124726e-04 1.945100e-03 2496 32.103516 5.242211
129 129 1 5.669804e-03 -4.589476e-03 2577 35.398876 4.271112
130 130 1 -8.838220e-03 -9.496282e-04 4921 14.506372 4.088247
131 131 1 1.009090e-02 -2.243944e-03 3385 38.372120 4.067030
132 132 1 5.630660e-03 -8.632211e-04 4003 33.322365 3.776054
133 133 1 -9.103803e-03 -6.322661e-03 2758 47.934212 3.739807
134 134 1 6.225513e-03 -1.824928e-03 3925 37.085732 3.389725
135 135 1 -1.303080e-03 3.580316e-03 2978 27.432941 4.345174
136 136 1 1.355920e-02 3.468190e-03 5058 57.797195 3.739124
137 137 1 2.092464e-02 -3.244962e-04 2400 3.931096 3.032193
138 138 1 5.691811e-02 -7.933985e-04 3885 15.069956 3.414036
139 139 1 8.052407e-05 -3.197287e-03 3493 33.993008 3.881695
140 140 1 -1.892967e-02 -5.049255e-03 2985 24.904482 4.417928
141 141 1 2.278842e-02 1.188287e-02 3666 31.670097 3.313449
142 142 1 1.496110e-02 2.181270e-03 3702 30.498932 3.171413
[ reached 'max' / getOption("max.print") -- omitted 18 rows ]
$id
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
[34] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
[100] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
[133] 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
I tried to lower the number of knots in the gam call but didn't succeed either.
Does anyone have an idea?
I fit the gam using the following line:
mgcv::gam(slide ~ s(curv_plan) + s(curv_prof) + s(dem) + s(slope) + s(ca), data = dataframes_new[[7]], family = binomial)
I have experienced the same issue. The root cause was that some of my categorical variables had fewer levels than k in my formula specification. To give an example:
Suppose one of the terms in my formula specification was:
s(I(pmin(example_variable, 120)), k = 5)
and the data in my example_variable had 3 levels (say, "yes", "no", "maybe"). This would throw the above-mentioned error.
In my case, I solved it by creating additional levels in my data (I was creating test data for a unit test). In other cases it could be solved by ensuring k does not exceed the number of levels in your categorical variables.
If you're using categorical variables, check if the root cause might be the same for you.
I found the solution to my problem by reading these:
https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002148.html
https://stat.ethz.ch/pipermail/r-help/2007-October/143569.html
The error means that you tried to create a thin plate spline basis expansion with more basis functions than the variable from which the expansion is to be made has unique values.
As you don't show the model fitting code, we can't say more than that one of the smooths in the model you tried to fit didn't have enough unique values for the value of k you specified or used (if you didn't set k, a default value was used).
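One way to find the offending term is to count the unique values of each covariate and compare them with k. The sketch below uses a made-up data frame `dat` (not the asker's data) and assumes the default k of 10 that s() uses for a one-dimensional thin plate smooth:

```r
set.seed(1)

# made-up stand-in for the failing data frame
dat <- data.frame(
  slope = runif(50, 0, 60),                                      # continuous
  dem   = sample(c(2000, 2500, 3000, 3500), 50, replace = TRUE)  # 4 values only
)

# number of distinct values per covariate
n_unique <- vapply(dat, function(v) length(unique(v)), integer(1))

k_default <- 10L   # assumed default basis dimension for s(x)
too_few <- names(n_unique)[n_unique < k_default]
too_few
```

Any covariate listed in `too_few` would need a k no larger than its number of unique values, for example `s(dem, k = 4)` here, or it could enter the model as a parametric term instead of a smooth.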
Here is an example of my data.frame:
df = read.table(text = 'ID Day Episode Count
28047 6000 143 7
28049 6000 143 7
29002 6000 143 7
29003 6000 143 7
30003 6000 143 7
30004 6000 143 7
32010 6000 143 7
30001 7436 47 6
33021 7436 47 6
33024 7436 47 6
33034 7436 47 6
37018 7436 47 6
40004 7436 47 6
29003 7300 111 6
30003 7300 111 6
30004 7300 111 6
32010 7300 111 6
30001 7300 111 6
33021 7300 111 6
2001 7438 54 5
19007 7438 54 5
20002 7438 54 5
22006 7438 54 5
22007 7438 54 5
32010 7301 99 5
30001 7301 99 5
33021 7301 99 5
2001 7301 99 5
19007 7301 99 5
27021 5998 158 5
28015 5998 158 5
28047 5998 158 5
28049 5998 158 5
29001 5998 158 5
21009 7437 65 4
24001 7437 65 4
25005 7437 65 4
25009 7437 65 4
14001 7435 81 4
16004 7435 81 4
17001 7435 81 4
17005 7435 81 4
21009 7299 77 4
24001 7299 77 4
25005 7299 77 4
25009 7299 77 4
29002 5996 158 4
29003 5996 158 4
27002 5996 158 4
27003 5996 158 4
33014 5999 56 3
33023 5999 56 3
25005 5999 56 3
27021 5995 246 2
33006 5995 246 2
8876 7439 765 2
5421 7439 765 2
6678 7298 68 1
34001 5994 125 1
4432 7440 841 1', header = TRUE)
What I need to do is, for each unique Day, take its Count value and add it to the Count values of the previous 3 days (i.e. a 4-day time window).
e.g. 1) Day = 6000, sum 7 (Count value) to Count values of Day 5999, 5998 and 5997 (the last one not present in the df), which are respectively 3, 5 and 0 -> 7 + 3 + 5 + 0 = new_Count 15;
2) next Day = 7436, sum 6 to Count values of 7435, 7434 and 7433 -> 6 + 4 + 0 + 0 = new_Count 10;
and so on up to the last Day within df.
Desired output:
ID Day new_Episode new_Count
2001 7438 1 19
19007 7438 1 19
20002 7438 1 19
22006 7438 1 19
22007 7438 1 19
21009 7437 1 19
24001 7437 1 19
25005 7437 1 19
25009 7437 1 19
30001 7436 1 19
33021 7436 1 19
33024 7436 1 19
33034 7436 1 19
37018 7436 1 19
40004 7436 1 19
14001 7435 1 19
16004 7435 1 19
17001 7435 1 19
17005 7435 1 19
8876 7439 2 17
5421 7439 2 17
2001 7438 2 17
19007 7438 2 17
20002 7438 2 17
22006 7438 2 17
22007 7438 2 17
21009 7437 2 17
24001 7437 2 17
25005 7437 2 17
25009 7437 2 17
30001 7436 2 17
33021 7436 2 17
33024 7436 2 17
33034 7436 2 17
37018 7436 2 17
40004 7436 2 17
32010 7301 3 16
30001 7301 3 16
33021 7301 3 16
2001 7301 3 16
19007 7301 3 16
29003 7300 3 16
30003 7300 3 16
30004 7300 3 16
32010 7300 3 16
30001 7300 3 16
33021 7300 3 16
21009 7299 3 16
24001 7299 3 16
25005 7299 3 16
25009 7299 3 16
6678 7298 3 16
28047 6000 4 15
28049 6000 4 15
29002 6000 4 15
29003 6000 4 15
30003 6000 4 15
30004 6000 4 15
32010 6000 4 15
33014 5999 4 15
33023 5999 4 15
25005 5999 4 15
27021 5998 4 15
28015 5998 4 15
28047 5998 4 15
28049 5998 4 15
29001 5998 4 15
21009 7437 5 14
24001 7437 5 14
25005 7437 5 14
25009 7437 5 14
30001 7436 5 14
33021 7436 5 14
33024 7436 5 14
33034 7436 5 14
37018 7436 5 14
40004 7436 5 14
14001 7435 5 14
16004 7435 5 14
17001 7435 5 14
17005 7435 5 14
4432 7440 6 12
8876 7439 6 12
5421 7439 6 12
2001 7438 6 12
19007 7438 6 12
20002 7438 6 12
22006 7438 6 12
22007 7438 6 12
21009 7437 6 12
24001 7437 6 12
25005 7437 6 12
25009 7437 6 12
33014 5999 7 12
33023 5999 7 12
25005 5999 7 12
27021 5998 7 12
28015 5998 7 12
28047 5998 7 12
28049 5998 7 12
29001 5998 7 12
29002 5996 7 12
29003 5996 7 12
27002 5996 7 12
27003 5996 7 12
29003 7300 8 11
30003 7300 8 11
30004 7300 8 11
32010 7300 8 11
30001 7300 8 11
33021 7300 8 11
21009 7299 8 11
24001 7299 8 11
25005 7299 8 11
25009 7299 8 11
6678 7298 8 11
27021 5998 9 11
28015 5998 9 11
28047 5998 9 11
28049 5998 9 11
29001 5998 9 11
29002 5996 9 11
29003 5996 9 11
27002 5996 9 11
27003 5996 9 11
27021 5995 9 11
33006 5995 9 11
30001 7436 10 10
33021 7436 10 10
33024 7436 10 10
33034 7436 10 10
37018 7436 10 10
40004 7436 10 10
14001 7435 10 10
16004 7435 10 10
17001 7435 10 10
17005 7435 10 10
29002 5996 11 7
29003 5996 11 7
27002 5996 11 7
27003 5996 11 7
27021 5995 11 7
33006 5995 11 7
34001 5994 11 7
21009 7299 12 5
24001 7299 12 5
25005 7299 12 5
25009 7299 12 5
6678 7298 12 5
14001 7435 13 4
16004 7435 13 4
17001 7435 13 4
17005 7435 13 4
27021 5995 14 3
33006 5995 14 3
34001 5994 14 3
6678 7298 15 1
34001 5994 16 1
Note that output_df is larger than df (that's OK). It is ranked by -new_Count and then -Day, with the new_Episode column following that ranking.
Any suggestion?
So I'm not sure why output_df has more rows than the original data.frame, but we can use the by function along with subset to calculate new_Count. Note that I've called your data.frame df1 instead of df.
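output_df1 <- do.call('rbind', by(df1, list(df1$Day, df1$ID), FUN = function(d){
  # grab the rows from the 3 days before this row's Day
  sub_df <- subset(df1, Day < d$Day & Day > (d$Day - 4))
  # keep one row per unique (Day, Episode, Count) combination
  sub_df_u <- unique(sub_df[,-1])
  # add the current day's Count to the sum of the previous days' Counts
  d$new_Count <- sum(sub_df_u$Count) + d$Count
  d
}))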
head(output_df1)
ID Day Episode Count new_Count
14 2001 7438 54 5 15
28 14001 7435 81 4 4
29 16004 7435 81 4 4
30 17001 7435 81 4 4
31 17005 7435 81 4 4
15 19007 7438 54 5 15
To get the new_Episode column, we can use the dense_rank function from the dplyr package:
output_df1$new_Episode <- dplyr::dense_rank(-output_df1$new_Count)
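As a cross-check, the window sums can also be computed on one row per Day. This is only a sketch: it assumes every Day has a single Count (true for the sample data), and the names `days`, `window` and `ord` are made up for illustration. It breaks ties in new_Count by -Day, which seems to be what the desired output does, whereas dense_rank gives tied days the same rank.

```r
# one row per unique Day, with the Count values from the question's data
days <- data.frame(
  Day   = c(6000, 7436, 7300, 7438, 7301, 5998, 7437, 7435,
            7299, 5996, 5999, 5995, 7439, 7298, 5994, 7440),
  Count = c(7, 6, 6, 5, 5, 5, 4, 4, 4, 4, 3, 2, 2, 1, 1, 1)
)

window <- 4   # the day itself plus the previous 3 days

# sum Count over Day - 3 .. Day; days missing from the data contribute 0
days$new_Count <- vapply(days$Day, function(d) {
  in_window <- days$Day <= d & days$Day > d - window
  sum(days$Count[in_window])
}, numeric(1))

# rank by -new_Count, then -Day, giving each Day its own episode number
ord <- order(-days$new_Count, -days$Day)
days$new_Episode <- integer(nrow(days))
days$new_Episode[ord] <- seq_along(ord)

days[ord, ]
```

With the full data frame, `unique(df1[, c("Day", "Count")])` would give the starting table. The desired output appears to repeat, for each episode, every original row whose Day falls inside that episode's window, which would explain why it has more rows than the input.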
I need a vector that repeats numbers in a sequence at varying intervals. I basically need this
c(rep(1:42, each=6), rep(43:64, each = 7),
rep(65:106, each=6), rep(107:128, each = 7),
.... but I need this to keep going, until almost 2 million.
So I want a vector that looks like
[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 ...
.....
[252] 43 43 43 43 43 43 43 44 44 44 44 44 44 44
....
[400] 64 64 64 64 64 64 65 65 65 65 65 65...
and so on. Not just alternating between 6 and 7 repetitions, rather mostly 6s and fewer 7s until the whole vector is something like 1.7 million rows. So, is there a loop I can do? Or apply, replicate? I need the 400th entry in the vector to be 64, the 800th entry to be 128, and so on, in somewhat evenly spaced integers.
UPDATE
Thank you all for the quick clever tricks there. It worked, at least well enough for the deadline I was dealing with. I realize repeating 6 xs and 7 xs are a really dumb way to try to solve this, but it was quick at least. But now that I have some time, I would like to get everyone's opinions /ideas on my real underlying issue here.
I have two datasets to merge. They are both sensor datasets, both with stopwatch time as primary keys. But one records every 1/400 of a second, and the other records every 1/256 of a second. I have trimmed the top of each so that they are starting the exact same moment. But.. now what? I have 400 records for each second in one set, and 256 records for 1 second in the other. Is there a way to merge these without losing data? Interpolating or just repeating obs is a-ok, necessary, I think, but I'd rather not throw any data out.
I read this post here, that had to do with using xts and zoo for a very similar problem to mine. But they have nice epoch date/times for each. I just have these awful fractions of seconds!
sample data (A):
time dist a_lat
1 139.4300 22 0
2 139.4325 22 0
3 139.4350 22 0
4 139.4375 22 0
5 139.4400 22 0
6 139.4425 22 0
7 139.4450 22 0
8 139.4475 22 0
9 139.4500 22 0
10 139.4525 22 0
sample data (B):
timestamp hex_acc_x hex_acc_y hex_acc_z
1 367065215501 -0.5546875 -0.7539062 0.1406250
2 367065215505 -0.5468750 -0.7070312 0.2109375
3 367065215509 -0.4218750 -0.6835938 0.1796875
4 367065215513 -0.5937500 -0.7421875 0.1562500
5 367065215517 -0.6757812 -0.7773438 0.2031250
6 367065215521 -0.5937500 -0.8554688 0.2460938
7 367065215525 -0.6132812 -0.8476562 0.2109375
8 367065215529 -0.3945312 -0.8906250 0.2031250
9 367065215533 -0.3203125 -0.8906250 0.2226562
10 367065215537 -0.3867188 -0.9531250 0.2578125
(oh yeah, and btw, the B dataset timestamps are epoch format * 256, because life is hard. I haven't converted it for this because dataset A has nothing like that, only just 0.0025 intervals. Also the B data sensor was left on for hours after the A data sensor turned off, so that doesn't help)
Or if you like, you can try this using apply
# using this sample data
df <- data.frame(from=c(1,4,7,11), to = c(3,6,10,13),rep=c(6,7,6,7));
df
# from to rep
#1 1 3 6
#2 4 6 7
#3 7 10 6
#4 11 13 7
unlist(apply(df, 1, function(x) rep(x['from']:x['to'], each=x['rep'])))
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4
#[26] 5 5 5 5 5 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8
#[51] 8 9 9 9 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 12 12 12 12 12
#[76] 12 12 13 13 13 13 13 13 13
Now that you put it that way ... I have absolutely no idea how you are planning on using all of the 6s and 7s. :-)
Regardless, I recommend standardizing the time, adding a "sample" column, and merging on them. Having the "sample" column may facilitate your processing later on, perhaps.
Your data:
df400 <- structure(list(time = c(139.43, 139.4325, 139.435, 139.4375, 139.44, 139.4425,
139.445, 139.4475, 139.45, 139.4525),
dist = c(22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L),
a_lat = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
.Names = c("time", "dist", "a_lat"),
class = "data.frame", row.names = c(NA, -10L))
df256 <- structure(list(timestamp = c(367065215501, 367065215505, 367065215509, 367065215513,
367065215517, 367065215521, 367065215525, 367065215529,
367065215533, 367065215537),
hex_acc_x = c(-0.5546875, -0.546875, -0.421875, -0.59375, -0.6757812,
-0.59375, -0.6132812, -0.3945312, -0.3203125, -0.3867188),
hex_acc_y = c(-0.7539062, -0.7070312, -0.6835938, -0.7421875,
-0.7773438, -0.8554688, -0.8476562, -0.890625,
-0.890625, -0.953125),
hex_acc_z = c(0.140625, 0.2109375, 0.1796875, 0.15625, 0.203125,
0.2460938, 0.2109375, 0.203125, 0.2226562, 0.2578125)),
.Names = c("timestamp", "hex_acc_x", "hex_acc_y", "hex_acc_z"),
class = "data.frame", row.names = c(NA, -10L))
Standardize your time frames:
colnames(df256)[1] <- 'time'
df400$time <- df400$time - df400$time[1]
df256$time <- (df256$time - df256$time[1]) / 256
Assign a label for easy reference (not that the NAs won't be clear enough):
df400 <- cbind(sample='A', df400, stringsAsFactors=FALSE)
df256 <- cbind(sample='B', df256, stringsAsFactors=FALSE)
And now for the merge and sorting:
dat <- merge(df400, df256, by=c('sample', 'time'), all.x=TRUE, all.y=TRUE)
dat <- dat[order(dat$time),]
dat
## sample time dist a_lat hex_acc_x hex_acc_y hex_acc_z
## 1 A 0.000000 22 0 NA NA NA
## 11 B 0.000000 NA NA -0.5546875 -0.7539062 0.1406250
## 2 A 0.002500 22 0 NA NA NA
## 3 A 0.005000 22 0 NA NA NA
## 4 A 0.007500 22 0 NA NA NA
## 5 A 0.010000 22 0 NA NA NA
## 6 A 0.012500 22 0 NA NA NA
## 7 A 0.015000 22 0 NA NA NA
## 12 B 0.015625 NA NA -0.5468750 -0.7070312 0.2109375
## 8 A 0.017500 22 0 NA NA NA
## 9 A 0.020000 22 0 NA NA NA
## 10 A 0.022500 22 0 NA NA NA
## 13 B 0.031250 NA NA -0.4218750 -0.6835938 0.1796875
## 14 B 0.046875 NA NA -0.5937500 -0.7421875 0.1562500
## 15 B 0.062500 NA NA -0.6757812 -0.7773438 0.2031250
## 16 B 0.078125 NA NA -0.5937500 -0.8554688 0.2460938
## 17 B 0.093750 NA NA -0.6132812 -0.8476562 0.2109375
## 18 B 0.109375 NA NA -0.3945312 -0.8906250 0.2031250
## 19 B 0.125000 NA NA -0.3203125 -0.8906250 0.2226562
## 20 B 0.140625 NA NA -0.3867188 -0.9531250 0.2578125
I'm guessing your data was just a small representation. If I've guessed poorly (that A's times are in seconds and B's integers are 1/256ths of a second) then just scale differently. Either way, by resetting the first value to zero and then merging/sorting, they are easy to merge and sort.
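If interpolating is acceptable, the NAs can be avoided by bringing one sensor onto the other's time grid with approx() from the stats package. A minimal sketch, assuming both time columns have already been standardized to seconds from a common start as above; the small data frames `a` and `b` are cut-down stand-ins for df400 and df256:

```r
# stand-in for df400: ten samples at 0.0025 s spacing
a <- data.frame(time = seq(0, by = 0.0025, length.out = 10), dist = 22L)

# stand-in for df256: first three samples, timestamps 4 ticks of 1/256 s apart
b <- data.frame(time = (0:2) * 4 / 256,
                hex_acc_x = c(-0.5546875, -0.546875, -0.421875))

# linear interpolation of B's signal at A's sample times;
# times outside B's range would come back as NA
a$hex_acc_x <- approx(x = b$time, y = b$hex_acc_x, xout = a$time)$y
a
```

This keeps every row of A and adds an estimated B value to each one; the same call with the roles swapped would put A's columns on B's grid. Whether linear interpolation is appropriate for accelerometer data is a judgment call for the analysis.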
alt <- data.frame(len=c(42,22),rep=c(6,7));
alt;
## len rep
## 1 42 6
## 2 22 7
altrep <- function(alt,cyc,len) {
    cyclen <- sum(alt$len*alt$rep);
    if (missing(cyc)) {
        if (missing(len)) {
            cyc <- 1;
            len <- cyc*cyclen;
        } else {
            cyc <- ceiling(len/cyclen);
        };
    } else if (missing(len)) {
        len <- cyc*cyclen;
    };
    if (isTRUE(all.equal(len,0))) return(integer());
    result <- rep(1:(cyc*sum(alt$len)),rep(rep(alt$rep,alt$len),cyc));
    length(result) <- len;
    result;
};
altrep(alt,2);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128
altrep(alt,len=1000);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128 129 129 129 129
## [817] 129 129 130 130 130 130 130 130 131 131 131 131 131 131 132 132 132 132 132 132 133 133 133 133 133 133 134 134 134 134 134 134 135 135 135 135 135 135 136 136 136 136 136 136 137 137 137 137 137 137 138
## [868] 138 138 138 138 138 139 139 139 139 139 139 140 140 140 140 140 140 141 141 141 141 141 141 142 142 142 142 142 142 143 143 143 143 143 143 144 144 144 144 144 144 145 145 145 145 145 145 146 146 146 146
## [919] 146 146 147 147 147 147 147 147 148 148 148 148 148 148 149 149 149 149 149 149 150 150 150 150 150 150 151 151 151 151 151 151 152 152 152 152 152 152 153 153 153 153 153 153 154 154 154 154 154 154 155
## [970] 155 155 155 155 155 156 156 156 156 156 156 157 157 157 157 157 157 158 158 158 158 158 158 159 159 159 159 159 159 160 160
You can specify len=1.7e6 (and omit the cyc argument) to get exactly 1.7 million elements, or you can get a whole number of cycles using cyc.
How about
len <- 2e6
step <- 400
x <- rep(64 * seq(0, ceiling(len / step) - 1), each = step) +
sort(rep(1:64, length.out = step))
x <- x[seq(len)] # to get rid of extra elements
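The same constraint (entry 400 is 64, entry 800 is 128, and so on) can also be written in closed form without rep(). This is a sketch of an alternative, not the method above: it spreads the 7-fold repeats evenly through each block of 400 instead of grouping them at the start.

```r
len <- 2e6
i <- seq_len(len)

# integer form of ceiling(i * 64 / 400): position i maps to value number
# ceiling(0.16 * i), so every block of 400 positions covers 64 values
x <- (i * 64 - 1) %/% 400 + 1

x[c(1, 6, 7, 400, 401, 800)]
```

Each value is repeated either 6 or 7 times, and `x[400 * n]` equals `64 * n` for every whole block.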