So, I have this time series that tracks the daily number of applications to a graduate program. Each application period is 64 days - so for each period, you start at zero and it goes up until the end of the period. The last period is partial, representing the current application period.
[1] 0 26 32 36 37 38 40 43 43 44 45 45 45 45 49 49 55 61 66 69 73 77 85 94 99 102 104 108 113 117 123 126 128 132 138 143 151 156 158 161 162 172 175 179 182 189 193
[48] 196 206 213 218 225 234 241 243 251 256 264 267 273 277 282 290 302 0 16 23 36 40 44 51 54 58 60 64 66 69 74 82 88 90 91 92 93 96 102 102 104 106 109 111 115 117 124
[95] 124 126 128 128 129 130 132 135 135 136 139 140 146 150 152 155 157 159 160 167 171 173 174 174 176 177 180 182 184 185 186 186 187 187 0 11 16 27 38 40 44 51 54 57 61 71 80
[142] 85 92 95 97 100 107 116 121 125 131 134 134 136 137 143 150 151 156 163 163 165 173 189 200 210 215 233 247 256 275 279 284 291 304 310 315 325 330 332 332 343 345 351 357 359 359 365
[189] 371 372 372 374 0 24 34 41 53 65 74 78 84 90 93 96 104 105 112 118 122 126 134 138 143 151 155 156 158 159 164 171 177 180 184 188 196 201 203 218 223 225 230 233 236 240 245
[236] 250 255 259 265 267 275 281 285 290 293 298 307 316 319 320 322 325 328 338 342 342 0 10 18 23 27 40 51 60 67 71 73 76 82 88 91 94 102 102 104 111 114 118 119 123 123 130
[283] 133 142 146 154 157 160 163 172 177 187 192 195 195 197 201 208 210 214 222 225 227 232 240 243 246 249 251 254 258 261 265 267 269 270 272 274 293 293 0 12 17 19 22 27 28 32 35
[330] 38 44 45 45 46 52 54 55 61 67 73 77 79 82 85 87 90 110 122 128 133 145 157 169 179 198 205 215 229 239 256 264 279 290 298 306 309 317 322 324 327 331 341 357 375 379 382
[377] 385 395 396 398 400 407 409 415 0 57 72 94 104 119 125 129 131 136 149 154 165 173 177 181 186 191 195 204 210 216 224 234 240 245 253 257 263 269 273 276 283 287 304 322 328 332 352
[424] 366 377 380 383 387 388 398 405 408 411 416 420 427 435 437 446 448 455 463 468 476 486 493 501 501 0 17 35 48 61 69 77 87 95 100 105 109 112 117 120 122 125 131 136 141 145 154
[471] 159 161 164 169 172 179 182 190 192 199 203 206 209 218 225 228 231 237 241 243 245 248 249 256 262 277 289 295 303 308 313 321 330 333 334 342 343 344 346 349 353 354 1 17 32 40 48
[518] 50 53 54 55 56 62 65 69 73 75 81 85 87 89 92 96 98 100 103 106 108 111 112 113 121 123 127 130 136 136 141 143 146 146 150 151 152 153 154 164 175 184 187 189 191 192 193
[565] 198 203 217 220 230 234 237 240 244 256 262 268 0 20 31 46
Each day, I run a simple model that happens to predict the number of applications quite well.
myts2 <- ts(df, frequency = 64)
myts2 <- HoltWinters(myts2, seasonal = "additive")
fcast <- predict(myts2, n.ahead = 60, prediction.interval = TRUE, level = 0.95)
# Creates a data frame with day (elapsed to 63), predicted fit, and prediction intervals
# (`elapsed` is the number of days already observed in the current period; here 4, so 60 days remain)
castout <- data.frame(elapsed:63, as.numeric(fcast[, 1]), as.numeric(fcast[, 2]), as.numeric(fcast[, 3]))
names(castout) <- c("Day", "Total", "High", "Low")
# Simplified; this block ensures the low estimate cannot dip below the current number of applications
castout$Low[castout$Low < 53] <- 53
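As an aside, the same clamping can be written in one step with `pmax()` (a sketch; `53` stands in for the current application count):

```r
# Floor the lower bound at the current number of applications (here 53)
castout$Low <- pmax(castout$Low, 53)
```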
Here's a graph of the results, and the output of fcast:
> fcast
Time Series:
Start = c(10, 5)
End = c(10, 64)
Frequency = 64
fit upr lwr
10.06250 51.08407 77.18901 24.979132
10.07812 55.25007 91.76327 18.736879
10.09375 61.69342 106.24630 17.140542
10.10938 65.36204 116.71089 14.013186
10.12500 69.29609 126.64110 11.951078
10.14062 71.76356 134.53454 8.992582
10.15625 76.06790 143.83176 8.304034
10.17188 78.42243 150.83574 6.009127
10.18750 81.85213 158.63385 5.070411
10.20312 86.70147 167.61610 5.786832
10.21875 94.62669 179.47316 9.780222
10.23438 101.18980 189.79380 12.585798
10.25000 104.27303 196.48157 12.064493
10.26562 106.00446 201.68183 10.327081
10.28125 107.74120 206.76598 8.716431
10.29688 109.56690 211.82956 7.304241
10.31250 112.75659 218.15771 7.355464
10.32812 119.17347 227.62227 10.724667
10.34375 120.76563 232.17877 9.352490
10.35938 123.42045 237.72108 9.119822
10.37500 126.19423 243.31117 9.077281
10.39062 130.27639 250.14350 10.409274
10.40625 133.92534 256.48092 11.369764
10.42188 138.90565 264.09197 13.719325
10.43750 142.15385 269.91676 14.390943
10.45312 149.87770 280.16626 19.589151
10.46875 152.03874 284.80490 19.272586
10.48438 155.52991 290.72828 20.331547
10.50000 143.70956 281.29715 6.121980
10.51562 144.86804 284.80405 4.932018
10.53125 150.57027 292.81595 8.324581
10.54688 156.17148 300.68993 11.653042
10.56250 162.91642 309.67243 16.160415
10.57812 167.96348 316.92344 19.003512
10.59375 170.24252 321.37431 19.110738
10.60938 173.24254 326.51538 19.969707
10.62500 173.89835 329.28274 18.513961
10.64062 181.92820 339.39583 24.460577
10.65625 185.62127 345.14493 26.097603
10.67188 188.82313 350.37666 27.269594
10.68750 191.58817 355.14638 28.029951
10.70312 197.56781 363.10643 32.029187
10.71875 201.46633 368.96194 33.970710
10.73438 203.75381 373.18381 34.323802
10.75000 211.86575 383.20831 40.523188
10.76562 218.58229 391.81629 45.348290
10.78125 223.19144 398.29645 48.086433
10.79688 229.36717 406.32341 52.410940
10.81250 237.59928 416.38758 58.810989
10.82812 244.59432 425.19609 63.992543
10.84375 247.02798 429.42520 64.630764
10.85938 253.22807 437.40324 69.052906
10.87500 258.46738 444.40349 72.531266
10.89062 265.76017 453.44071 78.079642
10.90625 268.82203 458.23093 79.413143
10.92188 274.29332 465.41494 83.171700
10.93750 278.46062 471.27976 85.641485
10.95312 283.35496 477.85680 88.853120
10.96875 290.67334 486.84344 94.503231
10.98438 301.22108 499.04539 103.396775
As you can see, the # of applications in a given cycle is either flat or increasing. Yet in the prediction, there's a dip just after day 30. For the life of me, I cannot figure out what is causing it. Any ideas?
I am trying to fit my data with a Weibull density function.
Eventually, I want to smooth my observations for the entire year so that I can create a smooth GPP (my observation) vs. DOY (day of the year) curve.
The data is attached at the end of my question, and here's the point plot for my data:
Point plot
The formula is quite complex; here's the formula, where P(t) stands for my observations.
Somehow I managed to build a nonlinear model for my data using the code below:
library(nls2)
library(dplyr)
require(minpack.lm)
#I store my data in data.frame d
# Define the Weibull-type function
weibull_function <- function(a, b, k, x0, y0, t) {
  y <- ifelse(t > (x0 - b * (k - 1) / k),
              y0 + a * ((k - 1) / k)^((1 - k) / k) *
                abs((t - x0) / b + ((k - 1) / k)^(1 / k))^(k - 1) *
                exp(-abs((t - x0) / b + ((k - 1) / k)^(1 / k))^k + (k - 1) / k),
              y0)
  return(y)
}
# Data fitting
lm1 <- nlsLM(y ~ weibull_function(a, b, k, x0, y0, t),
             data = d, start = list(a = 0, b = 10, k = 2, x0 = 1, y0 = 0))
# Plot predicted values (the time column of d is t)
plot(d$t, predict(lm1, d))
But the predicted values cannot actually fit my data, as you can see in the plot of the fitted data.
I have gone through quite a lot of answers on StackOverflow,
and I am aware that the bias may relate to the start values I use.
So I changed some of the start values, and here's what surprised me:
as I went through different combinations of start values for a, b, k, x0 and y0, the nls function generated quite a number of different models with different parameter values;
however, none of them seems to really fit my data.
Now I am quite confused about which start values I should use, and how I can make sure that the model (supposing I eventually find one that fits my data) is better than any other nls Weibull model, since it is impossible to go through all combinations of start values.
Thank you.
t y
t y
1 1 0.0000000
2 2 0.0000000
3 3 0.0000000
4 4 0.0000000
5 5 0.0707867
6 6 0.1712200
7 7 0.4918100
8 8 0.7889240
9 9 0.5143970
10 10 0.7365840
11 11 0.8226880
12 12 0.8913360
13 13 1.9113300
14 14 1.9021600
15 15 2.5347900
16 16 2.9011300
17 17 2.4049000
18 18 0.7344520
19 19 0.1427200
20 20 0.0541768
21 21 0.0000000
22 22 0.0000000
23 23 0.1926340
24 24 0.5145610
25 25 0.8064800
26 26 0.8090040
27 27 2.1381500
28 28 1.8712600
29 29 0.9658490
30 30 0.2964860
31 31 1.2073700
32 32 2.5077900
33 33 3.4101900
34 34 2.8787600
35 35 3.6792400
36 36 2.9349200
37 37 2.6029300
38 38 1.9863700
39 39 1.2938900
40 40 0.4992630
41 41 0.6379650
42 42 0.4024000
43 43 0.1084260
44 44 0.1374730
45 45 0.2230510
46 46 0.1501440
47 47 0.4220550
48 48 0.7916190
49 49 0.6582870
50 50 1.2428100
51 51 1.0643000
52 52 0.4634650
53 53 0.4777060
54 54 0.2625760
55 55 0.3416690
56 56 2.0303200
57 57 1.1497000
58 58 1.4016800
59 59 0.7974760
60 60 1.6967400
61 61 1.5555500
62 62 1.3034300
63 63 2.9090000
64 64 2.0858800
65 65 0.8658620
66 66 3.3597300
67 67 1.0571400
68 68 4.4057700
69 69 3.0252900
70 70 1.2971200
71 71 3.9716500
72 72 3.1547100
73 73 1.6375300
74 74 3.0920600
75 75 4.3314800
76 76 3.6577800
77 77 3.0225800
78 78 3.4114200
79 79 4.1715900
80 80 3.5697300
81 81 3.8911100
82 82 4.4364500
83 83 4.9133700
84 84 5.2404200
85 85 5.7771400
86 86 6.7429000
87 87 6.9022200
88 88 7.4436900
89 89 4.3942800
90 90 0.8826800
91 91 1.4101000
92 92 2.2473800
93 93 2.9795900
94 94 3.9610900
95 95 2.8689700
96 96 2.3157700
97 97 4.2013700
98 98 2.4536200
99 99 2.3285200
100 100 1.6641800
101 101 1.8391400
102 102 3.7247200
103 103 4.4881200
104 104 5.4677000
105 105 7.1896600
106 106 4.5204400
107 107 5.8330400
108 108 3.3793700
109 109 3.8234600
110 110 3.9182200
111 111 3.1710000
112 112 2.9232900
113 113 4.2434700
114 114 4.7464600
115 115 4.6802300
116 116 5.1251200
117 117 6.4484500
118 118 5.6865200
119 119 4.1672000
120 120 4.9955900
121 121 6.9491800
122 122 5.7618500
123 123 2.4349800
124 124 3.7315500
125 125 8.3070800
126 126 4.3468400
127 127 8.4310100
128 128 9.7953500
129 129 5.1387300
130 130 5.6159800
131 131 4.9249800
132 132 5.2035200
133 133 7.3140900
134 134 8.5128400
135 135 8.8445500
136 136 6.4021100
137 137 8.5730400
138 138 9.0752800
139 139 6.9884600
140 140 10.0649000
141 141 10.9208000
142 142 10.4544000
143 143 14.0787000
144 144 12.6344000
145 145 11.9214000
146 146 15.1133000
147 147 15.3369000
148 148 15.4777000
149 149 16.0808000
150 150 15.8116000
151 151 15.3791000
152 152 10.9130000
153 153 11.8881000
154 154 12.5383000
155 155 2.9121600
156 156 4.8731600
157 157 11.6981000
158 158 6.8281600
159 159 8.1552300
160 160 11.3900000
161 161 10.4996000
162 162 9.9490400
163 163 7.3252500
164 164 11.6759000
165 165 10.3756000
166 166 17.2289000
167 167 6.7320000
168 168 13.6835000
169 169 15.4414000
170 170 12.7428000
171 171 13.5159000
172 172 13.8205000
173 173 9.9679200
174 174 11.4347000
175 175 11.8706000
176 176 6.5545700
177 177 13.6308000
178 178 15.3185000
179 179 9.1710900
180 180 13.5977000
181 181 11.2282000
182 182 11.7510000
183 183 11.4871000
184 184 10.4018000
185 185 10.8641000
186 186 9.2063100
187 187 11.3159000
188 188 10.6050000
189 189 12.6539000
190 190 9.2266000
191 191 8.5330400
192 192 9.2949000
193 193 8.2153200
194 194 10.7958000
195 195 7.4245200
196 196 7.2358800
197 197 9.3145700
198 198 8.3644700
199 199 8.4106900
200 200 13.7398000
201 201 12.8421000
202 202 9.3427900
203 203 11.5155000
204 204 12.1537000
205 205 11.3195000
206 206 10.8288000
207 207 11.1031000
208 208 12.6185000
209 209 10.4288000
210 210 8.7446600
211 211 13.1651000
212 212 12.4868000
213 213 7.0671500
214 214 10.6482000
215 215 10.5971000
216 216 11.2978000
217 217 12.0698000
218 218 11.9749000
219 219 11.3467000
220 220 12.7263000
221 221 8.9283400
222 222 9.7184300
223 223 10.2274000
224 224 11.9933000
225 225 12.6712000
226 226 11.4917000
227 227 11.5164000
228 228 11.1688000
229 229 12.1940000
230 230 12.2719000
231 231 12.6843000
232 232 12.0033000
233 233 10.4394000
234 234 10.0225000
235 235 9.3543900
236 236 9.5651400
237 237 8.0770500
238 238 8.2516400
239 239 6.7008700
240 240 10.2780000
241 241 8.4796000
242 242 9.8009400
243 243 8.6459500
244 244 7.7860100
245 245 9.7695600
246 246 8.4967000
247 247 8.2067600
248 248 8.2361900
249 249 7.3512700
250 250 6.2018700
251 251 7.1628900
252 252 7.0082400
253 253 6.9478600
254 254 6.8310100
255 255 4.1930200
256 256 7.1842600
257 257 7.2565500
258 258 3.7791600
259 259 6.7925900
260 260 10.1900000
261 261 7.4041900
262 262 8.6597800
263 263 9.5826000
264 264 8.3029000
265 265 7.2548300
266 266 8.7421600
267 267 4.3173600
268 268 5.5106100
269 269 6.4128400
270 270 5.4460700
271 271 5.8495000
272 272 6.1458700
273 273 6.7045200
274 274 7.3160100
275 275 6.4701900
276 276 4.5038000
277 277 2.7967300
278 278 4.6101100
279 279 3.1605100
280 280 3.4307200
281 281 5.7120700
282 282 4.8887400
283 283 5.2968700
284 284 5.8722500
285 285 6.0290200
286 286 3.8281000
287 287 1.4922500
288 288 4.3007900
289 289 4.7463100
290 290 3.6876100
291 291 3.1633900
292 292 2.5615100
293 293 4.0825100
294 294 2.8859400
295 295 3.1885900
296 296 5.4614400
297 297 4.9645100
298 298 4.4726700
299 299 1.3583300
300 300 1.6828900
301 301 3.0714600
302 302 3.4279900
303 303 1.2706300
304 304 2.2885800
305 305 4.0884900
306 306 1.4124700
307 307 3.6298100
308 308 2.7364700
309 309 2.8791000
310 310 2.6254400
311 311 3.5437700
312 312 1.8247300
313 313 1.6026100
314 314 2.0445300
315 315 1.2098200
316 316 2.9734400
317 317 1.7955200
318 318 1.6497700
319 319 3.7585900
320 320 2.1699300
321 321 1.9716500
322 322 1.0365200
323 323 1.0400600
324 324 1.2130500
325 325 2.7250800
326 326 1.6329600
327 327 3.0840200
328 328 0.7717740
329 329 0.8716610
330 330 1.6803600
331 331 1.3165100
332 332 0.8895280
333 333 1.1678900
334 334 1.3315100
335 335 1.3054600
336 336 0.8515050
337 337 0.4578000
338 338 0.0516099
339 339 0.1484510
340 340 0.2275460
341 341 0.8208840
342 342 0.7448860
343 343 2.3841900
344 344 0.2445460
345 345 0.7701040
346 346 1.9149200
347 347 1.4889100
348 348 0.8986610
349 349 0.3705810
350 350 0.4623590
351 351 0.2586430
352 352 0.1939820
353 353 0.1817090
354 354 0.1586170
355 355 0.0517517
356 356 0.0291422
357 357 0.0269378
358 358 0.0960937
359 359 0.4633600
360 360 0.5766720
361 361 0.8399390
362 362 0.6647790
363 363 0.7475380
364 364 1.6569600
365 365 1.8504600
366 366 1.3835600
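One way to make the start-value search systematic (a sketch, not the asker's code; the grid ranges below are illustrative guesses) is nls2's brute-force mode, which evaluates the residual sum of squares at every point of a grid of candidate starts and returns the best one, which can then seed nlsLM:

```r
library(nls2)
library(minpack.lm)

# Grid of candidate start values (ranges are illustrative guesses)
grid <- expand.grid(a = c(1, 5, 10), b = c(10, 50, 100),
                    k = c(1.5, 2, 3), x0 = c(50, 150, 250), y0 = c(0, 1))

# Brute-force pass: picks the grid point with the smallest residual sum of squares
fit0 <- nls2(y ~ weibull_function(a, b, k, x0, y0, t),
             data = d, start = grid, algorithm = "brute-force")

# Refine from the best grid point with a proper optimizer
fit1 <- nlsLM(y ~ weibull_function(a, b, k, x0, y0, t),
              data = d, start = coef(fit0))
```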
The function lubridate::yday returns the day of the year as an integer:
> lubridate::yday("2020-07-01")
[1] 183
I would like to be able to calculate the day of the year assuming a different yearly start date. For example, I would like to start all years on July 1st (07-01), such that I could call:
> lubridate::yday("2020-07-01", start = "2020-07-01")
[1] 1
I could call :
> lubridate::yday("2020-07-01") - lubridate::yday("2020-06-30")
[1] 1
But not only would this fail to account for leap years, it would also be difficult to handle a date in 2021 (or any date that crosses the January 1st threshold for a given year):
> lubridate::yday("2021-01-01") - lubridate::yday("2020-06-30")
[1] -181
After working a little bit with this on my own, this is what I have created:
valiDATE <- function(date) {
stopifnot(`date must take the form of "MM-DD"` = stringr::str_detect(date, "^\\d{2}-\\d{2}$"))
}
days <- function(x, end = "06-30") {
valiDATE(end)
calcdiff <- function(x) {
endx <- glue::glue("{lubridate::year(x)}-{end}")
if(lubridate::yday(x) > lubridate::yday(endx)) {
diff <- ceiling(difftime(x, endx, units = "days"))
} else {
endx <- glue::glue("{lubridate::year(x)-1}-{end}")
diff <- ceiling(difftime(x, endx, units = "days"))
}
unclass(diff)
}
purrr::map_dbl(x, calcdiff)
}
day_vec <- seq(as.Date("2020-07-01"), as.Date("2021-06-30"), by = "days")
days(day_vec)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
[38] 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
[75] 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
[112] 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
[149] 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185
[186] 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222
[223] 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259
[260] 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296
[297] 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333
[334] 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365
I would still like to see other solutions. Thanks!
Adding or subtracting months from your date to the desired start of the year can be helpful in this case.
For your example vector of dates day_vec, you can subtract six months from all the dates if you want to start your year on July 1.
library(lubridate) # for %m-% and yday()
day_vec <- seq(as.Date("2020-07-01"), as.Date("2021-06-30"), by = "days")
day_vec2 <- day_vec %m-% months(6) # Subtracting because the new year starts 6 months later
yday(day_vec2) # Result matches what you desired
The point to keep in mind is whether your new beginning of the year is before or after the conventional beginning. If your year starts early, you should add months and vice-versa.
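Wrapped as a small helper (a sketch; `yday_from` and its `start_month` parameter are hypothetical names), the shift generalizes to any month boundary:

```r
library(lubridate)

# Day of year relative to a custom year-start month; a sketch
yday_from <- function(dates, start_month = 7) {
  yday(dates %m-% months(start_month - 1))
}

day_vec <- seq(as.Date("2020-07-01"), as.Date("2021-06-30"), by = "days")
head(yday_from(day_vec)) # 2020-07-01 maps to day 1
```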
I am trying to create a training data frame for fitting my model. The data frame I am working with is a nested data frame. Using createDataPartition, I have created a list of indexes, but I am having trouble subsetting the data frame with that list.
Here is what the object partitionindex created by caret::createDataPartition looks like:
partitionindex
[[1]]
[[1]]$Resample1
[1] 4 5 6 8 9 10 11 12 14 15 17 18 20 21 23 28 30 32 34 38 39 41 42 46
[25] 47 48 50 52 53 56 57 58 59 60 64 66 67 70 73 75 76 77 78 82 85 87 90 95
[49] 97 99 105 106 110 113 114 116 117 118 119 120 123 124 126 128 129 130 132 134 135 137 139 141
[73] 142 143 144 145 146 148 149 151 153 154 155 157 158 164 165 167 170 174 176 178 182 183 184 186
[97] 189 190 191 193 194 197 198 200 201 202 203 206 210 211 212 213 214 216 219 221 222 223 226 232
[121] 236 237 241 243 247 248 251 254 255 256 258 262 263 264 269 270 271 274 276 277 280 281 284 291
[145] 292 293 295 296 297 299 300 301 302 303 304 309 314 317 318 319 320 323 324 327 328 329 339 341
[169] 342 343 344 345 349 350 351 353 354 355 356 360 361 363 364 365 367 370 371 375 379 380
[[2]]
[[2]]$Resample1
[1] 1 2 4 5 7 8 9 10 14 17 19 22 24 26 28 29 31 32 34 36 37 42 44 45
[25] 47 48 49 51 52 53 56 58 65 66 67 68 72 74 75 77 78 81 83 86 95 96 98 100
[49] 102 104 105 106 110 113 114 115 118 119 122 123 124 125 128 129 130 132 135 137 142 144 145 147
[73] 149 150 151 152 158 160 161 163 165 168 169 170 171 175 176 180 183 186 187 188 191 194 196 199
[97] 203 205 206 207 208 209 210 211 213 215 218 220 221 222 224 225 227 228 231 233 240 241 242 243
[121] 247 248 250 251 254 255 256 257 258 262 263 264 267 268 269 270 272 273 277 278 282 285 286 288
[145] 289 290 292 293 294 295 296 300 301 302 304 305 307 308 312 314 315 316 317 321 323 328 329 332
[169] 333 335 336 339 341 343 344 345 347 348 349 354 355 359 360 362 363 366 369 374 375 376 377
[[3]]
[[3]]$Resample1
[1] 5 8 10 12 17 22 25 26 27 30 32 33 34 36 38 39 42 44 45 46 47 51 52 57
[25] 58 59 62 64 66 70 71 73 75 78 81 82 83 84 86 89 90 95 96 97 98 100 103 104
[49] 105 108 109 111 112 113 114 117 119 120 121 123 124 127 130 131 132 133 137 139 140 141 144 148
[73] 149 150 151 153 154 155 156 157 159 160 163 164 167 168 170 172 173 176 178 179 181 182 184 186
[97] 187 188 189 190 191 207 208 212 214 215 219 220 222 223 227 230 233 234 238 248 250 251 252 253
[121] 256 258 260 261 262 264 265 266 267 270 271 272 275 278 281 285 288 289 291 293 295 297 298 302
[145] 303 305 306 308 312 314 315 318 319 320 321 323 325 326 329 332 333 334 335 336 338 342 343 345
[169] 347 348 349 350 351 352 360 361 363 364 365 366 368 369 370 371 372 374 375 376 377 378
[[4]]
[[4]]$Resample1
[1] 1 2 3 4 5 6 7 8 10 12 14 15 18 19 20 22 23 25 26 27 28 30 31 34
[25] 37 38 40 44 45 46 47 49 50 51 52 59 62 64 66 68 70 71 72 73 75 76 79 80
[49] 81 83 84 86 88 89 91 92 94 95 96 97 99 100 102 105 108 109 112 119 125 126 129 130
[73] 132 134 137 139 140 141 145 150 153 155 156 158 159 162 163 170 178 179 181 182 184 185 187 188
[97] 190 191 192 194 196 197 199 201 205 206 207 218 219 220 223 229 230 231 232 237 238 240 241 242
[121] 244 245 247 248 249 251 252 253 257 258 260 261 263 264 265 266 270 271 273 275 276 283 285 289
[145] 290 291 294 298 299 300 302 303 304 306 307
And the nested dataframe:
> nested_df
# A tibble: 4 x 2
# Groups: League [4]
League data
<chr> <list<df[,133]>>
1 F1 [380 x 133]
2 E0 [380 x 133]
3 SP1 [380 x 133]
4 D1 [308 x 133]
I tried something like this but to no avail:
nested_df%>%
mutate(train = data[map(data,~.x[partitionindex,])])
Error in x[i] : invalid subscript type 'list'
Is there a solution involving purrr::map or lapply?
I think this could work, with purrr::pmap
nested_df %>%
ungroup() %>% # make sure the table is not grouped
mutate(i = row_number()) %>%
mutate(train = pmap(
.,
function(data, i, ...) {
data[partitionindex[[i]]$Resample1,]
}
)) %>%
select(-i)
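Since partitionindex and the nested data column line up element by element, purrr::map2 is an alternative sketch that avoids the row-number bookkeeping:

```r
library(dplyr)
library(purrr)

nested_df %>%
  ungroup() %>%                       # drop grouping so mutate sees all rows
  mutate(train = map2(data, partitionindex,
                      ~ .x[.y$Resample1, ]))  # subset each league's rows
```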
I have downloaded the historical stock prices of a list of 218 stocks. I want to check whether each one is populated up to the most recent date or not. I have written a function to that effect, named check.date:
function(snlq){
j <- 1;
for(i in 1:length(snlq)){
ind <- index(snlq[[i]])
if(identical(ind[length(ind)],"2018-05-04") == FALSE){
s[j] <- i
j <- j+1
}
}
return(s);
}
snlq is a list of stocks with length 218 and of class list.
But when I run it, I get the following output:
check.date(snlq)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
[33] 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
[65] 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
[97] 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
[129] 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
[161] 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192
[193] 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 356 358 359 360 361 362
[225] 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394
[257] 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426
[289] 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458
[321] 459 460 461 462 463 464 465 466 467 468 469 470
How can the output be of length more than 218? Also I have checked that snlq[[1]] is up to date; then why is 1 in the output?
This might seem like a simple for loop problem, but is perplexing me.
Very many thanks for your time and effort...
It seems the problem is that s is not created in the scope in which it is updated and used, as #Dave2e correctly pointed out in the comment above. Most likely s already exists in the global environment; that is why your function does not throw an error — otherwise it would not have run at all.
There are many ways to fix the problem. One option is:
check.date <- function(snlq){
  s <- integer() # declare s inside the function's scope before use
  for(i in 1:length(snlq)){
    ind <- index(snlq[[i]])
    if(!identical(ind[length(ind)], "2018-05-04")){
      s <- c(s, i) # append the index of the out-of-date stock
    }
  }
  return(s)
}
I cannot check this result without a reproducible example, but I think this will simplify your function greatly.
check.data <- function(input, today) {
result <- sapply(input, function(x) {
ind <- index(x)
!identical(ind[length(ind)], today)
})
which(result)
}
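One more thing worth checking (an assumption about your data, since there is no reproducible example): if index() returns Date objects, then identical() against the character string "2018-05-04" is always FALSE because the types differ, which would explain why every index, including 1, ends up in the output. Coercing to character first makes the comparison meaningful:

```r
d <- as.Date("2018-05-04")
identical(d, "2018-05-04")                # FALSE: Date vs character
identical(as.character(d), "2018-05-04")  # TRUE after coercion
```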
I find myself pulling bits of data here and there from old Excel files I have stored from past projects. These files are unorganized (no common format of rows and columns), so it would be useless to read an entire spreadsheet directly into R, and most of the time I just want to grab a couple of columns and rows of data at a time anyway. Rather than creating a separate text file with the columns/rows I want and reading that file in, I find the easiest way to get the data is to use read.table(text="....") and copy and paste the bit of data I want into text. This works great, but when I start to grab larger data sets it takes longer for the console to process the information. I have the same issue sometimes when entering very large functions. I think this may be an RStudio limitation on how fast it shows the information passing through the console, but I am not sure. I have to wait for the console to pass this information before I can do anything else. How can I get my console to either show me the information I'm reading faster, or to not show me anything at all? Preferably the first.
Example Data
GNL <- read.table(header=TRUE,text="WYD Temp_C
1 12.77777778
2 11.66666667
3 8.888888889
4 3.888888889
5 -0.555555556
6 -1.111111111
7 3.888888889
8 7.777777778
9 8.333333333
10 6.666666667
11 10.55555556
12 15.55555556
13 16.11111111
14 16.66666667
15 15
16 13.33333333
17 14.44444444
18 13.88888889
19 11.66666667
20 12.77777778
21 12.22222222
22 15
23 14.44444444
24 11.11111111
25 7.222222222
26 5.555555556
27 6.666666667
28 8.888888889
29 11.66666667
30 11.66666667
31 10.55555556
32 7.777777778
33 8.333333333
34 2.777777778
35 -4.444444444
36 -5
37 -4.444444444
38 -1.666666667
39 0.555555556
40 5.555555556
41 NA
42 2.777777778
43 1.666666667
44 3.333333333
45 3.888888889
46 5
47 5.555555556
48 4.444444444
49 -1.111111111
50 -3.888888889
51 -3.888888889
52 -0.555555556
53 3.888888889
54 5.555555556
55 1.111111111
56 4.444444444
57 10
58 10
59 8.888888889
60 10
61 2.777777778
62 -3.333333333
63 1.666666667
64 -1.111111111
65 NA
66 NA
67 0
68 3.888888889
69 5
70 5.555555556
71 2.777777778
72 -0.555555556
73 -3.888888889
74 -3.333333333
75 -2.222222222
76 -1.666666667
77 3.888888889
78 6.111111111
79 1.666666667
80 2.222222222
81 5
82 3.333333333
83 0
84 4.444444444
85 5
86 5
87 5.555555556
88 6.111111111
89 8.888888889
90 7.222222222
91 5.555555556
92 7.777777778
93 10
94 8.888888889
95 9.444444444
96 11.11111111
97 8.888888889
98 6.111111111
99 5
100 7.777777778
101 7.777777778
102 5.555555556
103 6.111111111
104 5
105 6.111111111
106 5.555555556
107 0.555555556
108 -6.111111111
109 -2.222222222
110 2.777777778
111 1.666666667
112 2.222222222
113 -3.888888889
114 -3.333333333
115 NA
116 2.777777778
117 7.777777778
118 3.888888889
119 3.888888889
120 7.777777778
121 6.111111111
122 3.888888889
123 3.333333333
124 0.555555556
125 5.555555556
126 1.111111111
127 0.555555556
128 1.111111111
129 3.333333333
130 0
131 4.444444444
132 6.666666667
133 5
134 0
135 -1.111111111
136 -5.555555556
137 -2.777777778
138 -5
139 1.111111111
140 3.888888889
141 0
142 -2.222222222
143 0
144 6.666666667
145 8.333333333
146 8.888888889
147 7.777777778
148 2.777777778
149 -0.555555556
150 -5.555555556
151 -5.555555556
152 -5.555555556
153 -5
154 2.222222222
155 8.333333333
156 8.333333333
157 7.222222222
158 -5
159 -0.555555556
160 7.222222222
161 7.222222222
162 5
163 1.111111111
164 -0.555555556
165 -1.111111111
166 0
167 3.333333333
168 1.111111111
169 NA
170 -7.777777778
171 -6.666666667
172 NA
173 NA
174 NA
175 NA
176 2.777777778
177 -2.777777778
178 -2.777777778
179 NA
180 0.555555556
181 3.333333333
182 7.222222222
183 -0.555555556
184 -2.222222222
185 3.888888889
186 6.666666667
187 -1.111111111
188 -5.555555556
189 -1.666666667
190 6.111111111
191 7.777777778
192 7.777777778
193 3.888888889
194 -2.777777778
195 -3.333333333
196 -4.444444444
197 -2.777777778
198 3.333333333
199 6.111111111
200 6.666666667
201 7.222222222
202 11.11111111
203 13.88888889
204 15
205 15.55555556
206 13.88888889
207 10.55555556
208 7.222222222
209 2.222222222
210 4.444444444
211 8.888888889
212 11.11111111
213 11.11111111
214 8.888888889
215 8.333333333
216 5.555555556
217 5
218 6.666666667
219 10
220 11.66666667
221 12.77777778
222 13.88888889
223 12.77777778
224 12.22222222
225 14.44444444
226 16.11111111
227 10
228 10.55555556
229 NA
230 11.11111111
231 8.888888889
232 12.22222222
233 15.55555556
234 14.44444444
235 12.77777778
236 10
237 NA
238 NA
239 NA
240 NA
241 NA
242 NA
243 14.44444444
244 18.33333333
245 18.33333333
246 16.66666667
247 16.11111111
248 6.666666667
249 1.666666667
250 7.777777778
251 11.66666667
252 11.11111111
253 11.11111111
254 NA
255 15.55555556
256 16.11111111
257 16.11111111
258 16.11111111
259 17.22222222
260 22.22222222
261 21.11111111
262 17.22222222
263 16.11111111
264 18.33333333
265 16.11111111
266 11.11111111
267 7.777777778
268 9.444444444
269 8.888888889
270 10.55555556
271 13.33333333
272 15
273 14.44444444
274 14.44444444
275 14.44444444
276 15.55555556
277 16.11111111
278 16.11111111
279 16.11111111
280 17.22222222
281 19.44444444
282 19.44444444
283 20.55555556
284 22.22222222
285 22.22222222
286 22.22222222
287 19.44444444
288 18.33333333
289 17.22222222
290 15
291 12.22222222
292 13.33333333
293 13.88888889
294 17.77777778
295 20.55555556
296 21.11111111
297 19.44444444
298 17.22222222
299 17.22222222
300 17.22222222
301 16.11111111
302 15.55555556
303 17.77777778
304 19.44444444
305 19.44444444
306 20.55555556
307 21.66666667
308 22.22222222
309 18.88888889
310 19.44444444
311 19.44444444
312 21.11111111
313 22.22222222
314 22.77777778
315 22.77777778
316 23.88888889
317 23.33333333
318 23.33333333
319 21.11111111
320 21.11111111
321 21.11111111
322 21.11111111
323 18.88888889
324 20
325 20
326 18.88888889
327 18.33333333
328 17.77777778
329 18.88888889
330 18.88888889
331 15.55555556
332 16.66666667
333 17.77777778
334 18.88888889
335 19.44444444
336 15.55555556
337 13.88888889
338 16.11111111
339 17.22222222
340 18.33333333
341 18.33333333
342 16.11111111
343 17.77777778
344 19.44444444
345 19.44444444
346 17.77777778
347 16.11111111
348 17.77777778
349 19.44444444
350 18.88888889
351 18.33333333
352 17.22222222
353 16.66666667
354 NA
355 17.77777778
356 18.33333333
357 17.77777778
358 17.22222222
359 16.11111111
360 14.44444444
361 15
362 15.55555556
363 16.66666667
364 16.66666667
365 17.77777778
366 20")
If the problem is really just the time the information takes to be shown in the console, you could just source the file:
Save your read.table script in a separate .R file, and then just do source("filepath/file.R", echo = FALSE).
But I should say that, at first glance, it doesn't look like a good idea to handle your data this way, manually copying and pasting different bits of data from different files.
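If the real goal is grabbing a few columns and rows straight from a spreadsheet, one alternative sketch (assuming the readxl package; the file, sheet, and range below are hypothetical) reads just a cell range, with no copy-and-paste through the console at all:

```r
library(readxl)

# Read only cells B2:C367 from one sheet of an old project file
gnl <- read_excel("old_project.xlsx", sheet = "temps",
                  range = "B2:C367",
                  col_names = c("WYD", "Temp_C"))
```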