I have read the following post, as well as related ones, and found it really useful:
Interpolate / Extend quarterly to monthly series
I have a similar but more general/extended problem, which I still haven't figured out how to solve. I have a data frame of seven time series (named value1, ..., value7 below) containing quarterly data for 63 dates, including some NAs.
> str(test)
'data.frame': 63 obs. of 8 variables:
$ Date : Date, format: "2001-03-30" "2001-06-29" "2001-09-28" ...
$ value1: num 320 181.1 19.7 133.1 160.6 ...
$ value2: num 4741 4556 4115 3892 3605 ...
$ value3: num 146.8 -163.9 73.2 111.6 210.5 ...
$ value4: num -135 -383.3 104.3 74.7 -75.4 ...
$ value5: num 21.6 20.2 NA NA NA ...
$ value6: num -19.1 -82.4 85 134.6 111 ...
$ value7: num -163 -215 -164 -137 -199 ...
> test
Date value1 value2 value3 value4 value5 value6 value7
1 2001-03-30 319.952 4740.905 146.756 -134.998 21.645 -19.0611 -162.713
2 2001-06-29 181.103 4555.732 -163.867 -383.334 20.199 -82.3660 -215.105
3 2001-09-28 19.724 4115.053 73.189 104.300 NA 84.9740 -164.073
4 2001-12-31 133.134 3891.754 111.567 74.683 NA 134.6460 -136.974
5 2002-03-28 160.564 3605.080 210.533 -75.351 NA 110.9770 -199.083
6 2002-06-28 -111.902 3220.115 -107.759 -22.624 NA 408.4770 -172.327
7 2002-09-30 -127.751 2962.472 -93.616 241.749 NA 687.2240 -195.772
8 2002-12-31 -59.553 2697.029 -98.068 119.288 NA 903.8211 -137.965
9 2003-03-31 86.427 2509.511 -78.662 -124.428 NA 1130.9380 -180.496
10 2003-06-30 90.070 2554.473 -14.345 -66.764 NA 925.9010 -103.080
11 2003-09-30 246.801 3000.005 0.001 -244.487 NA 1005.6370 -123.959
12 2003-12-31 325.088 3519.168 388.592 129.915 NA 739.5460 -162.781
13 2004-03-31 359.263 4041.043 206.260 -101.966 NA 745.8810 -202.047
14 2004-06-30 367.347 4657.622 254.678 -59.913 NA 852.4181 -360.963
15 2004-09-30 373.089 4943.322 263.395 -37.116 NA 857.8670 -406.748
16 2004-12-31 351.817 5001.434 362.188 118.842 NA 663.5370 -470.379
17 2005-03-31 287.224 4991.632 251.327 39.029 24.245 785.3220 -518.472
18 2005-06-30 311.324 4989.710 265.163 11.546 25.653 676.1650 -303.265
19 2005-09-30 369.478 5273.006 429.086 133.030 30.615 667.2330 -362.296
20 2005-12-30 482.974 5847.577 537.279 63.616 24.447 -265.5200 -329.140
21 2006-03-31 432.157 5953.107 566.349 196.971 -4.915 -1807.2560 -310.326
22 2006-06-30 295.014 5909.556 218.850 -6.842 -17.449 -1837.8140 -455.364
23 2006-09-29 318.926 5714.423 230.185 14.135 -13.551 -1667.5960 -424.892
24 2006-12-29 232.784 5649.147 271.616 142.736 46.000 2256.0000 -666.418
25 2007-03-30 -190.000 5549.989 41.000 373.000 62.000 2674.0000 -586.000
26 2007-06-29 -70.000 5642.622 -635.000 -412.000 80.000 3943.0000 -414.000
27 2007-09-28 153.000 5873.000 223.000 168.000 76.000 3807.0000 -419.000
28 2007-12-31 234.000 5858.000 61.000 -153.000 76.000 3380.0000 -266.000
29 2008-03-31 83.000 6112.000 16.000 110.000 86.000 3534.0000 -323.000
30 2008-06-30 -18.000 6165.000 -242.000 -82.000 91.000 3694.0000 -106.000
31 2008-09-30 426.000 6404.000 -216.000 -497.000 87.000 3799.0000 -82.000
32 2008-12-31 -237.000 5808.000 -250.000 110.000 88.000 3680.0000 -113.000
33 2009-03-31 -18.000 5498.000 -391.000 -252.000 94.000 2844.0000 -84.000
34 2009-06-30 33.000 5320.000 -144.000 -120.000 102.000 3107.0000 -112.000
35 2009-09-30 205.000 4919.000 -142.000 -288.000 110.000 3059.0000 -97.000
36 2009-12-31 1572.000 5403.000 1150.000 -361.000 116.000 1884.0000 -174.000
37 2010-03-31 282.000 5800.000 23.000 -237.000 46.000 672.0000 -48.000
38 2010-06-30 221.000 6269.000 -98.000 -279.000 52.000 684.0000 -31.000
39 2010-09-30 217.000 6491.000 -124.000 -343.000 53.000 671.0000 -31.000
40 2010-12-31 511.000 6494.000 -213.000 -647.000 37.000 632.0000 -38.000
41 2011-03-31 142.000 6533.000 -168.000 -326.000 45.000 485.0000 -38.000
42 2011-06-30 185.000 6454.000 174.000 17.000 45.000 338.0000 -67.000
43 2011-09-30 217.000 6526.000 189.000 -5.000 39.000 203.0000 -58.000
44 2011-12-30 140.000 6568.000 187.000 63.000 41.000 102.0000 -87.000
45 2012-03-30 -517.000 6540.000 107.000 384.000 41.000 306.0000 -40.000
46 2012-06-29 142.000 6379.000 81.000 -49.000 41.000 262.0000 -39.000
47 2012-09-28 -65.000 5958.000 -240.000 -185.000 42.000 560.0000 -32.000
48 2012-12-31 -356.000 5422.000 -286.000 82.000 43.000 859.0000 -22.000
49 2013-03-28 -32.000 4925.000 -155.000 -159.000 43.000 861.0000 -20.000
50 2013-06-28 30.000 4673.000 -35.000 -8.000 40.000 930.0000 -28.000
51 2013-09-30 152.000 4865.000 21.000 -61.000 46.000 868.0000 -15.000
52 2013-12-31 189.000 5299.000 21.000 -128.000 43.000 871.0000 -21.000
53 2014-03-31 102.000 5608.000 -204.000 -277.000 46.000 1156.0000 -21.000
54 2014-06-30 116.000 5888.000 -28.000 -118.000 46.000 1262.0000 -23.000
55 2014-09-30 112.000 5856.000 18.000 -65.000 42.000 1270.0000 -29.000
56 2014-12-31 -282.000 5506.000 116.000 170.000 40.000 1172.0000 -22.000
57 2015-03-31 -91.000 5139.000 -172.000 -129.000 40.000 1362.0000 -22.000
58 2015-06-30 -92.000 4640.000 -57.000 55.000 NA 1440.0000 -17.000
59 2015-09-30 -116.000 4272.000 -59.000 64.000 NA 1505.0000 -25.000
60 2015-12-31 -15.000 3991.000 53.000 112.000 NA 1477.0000 -32.000
61 2016-03-31 -35.000 3793.000 -42.000 19.000 NA 1520.0000 -26.000
62 2016-06-30 25.000 3878.000 -85.000 -67.000 NA 1281.0000 -21.000
63 2016-09-30 -260.000 4124.000 29.000 67.000 NA 374.0000 -9.000
I want to interpolate daily values (output will include 5664 days in total), using a cubic spline or linear relation. The solution provided in the link above is good but it works only if I apply it on each time series separately, to which I always need to associate the "Date" column: (Date, value1); (Date, value2); ..., as below, which is quite time-consuming:
DateSeq <- seq(test$Date[1],tail(test$Date,1),by="1 day")
test1 <- test[1:2]
test2 <- test[c(1,3)]
...
test1Daily <- data.frame(Date=DateSeq, Interp.Value=spline(test1, method="natural", xout=DateSeq)$y)
test2Daily <- data.frame(Date=DateSeq, Interp.Value=spline(test2, method="natural", xout=DateSeq)$y)
...
merge1 <- merge(test1, test1Daily, by='Date', all.y = T)
merge2 <- merge(test2, test2Daily, by='Date', all.y = T)
...
...then finally merge all the merged variables above.
Does anyone know how to apply the interpolation once to the whole matrix (that is, to each column, or time series)?
Many thanks in advance.
I have found the following solution, and the way to plot it to verify that things work out well. Hope it may be useful for others!
library(zoo)
test_z1 <- zoo(test, order.by = test$Date, frequency = 1)
test_t1 <- as.ts(x=test_z1)
test_t2 <- as.zoo(test_t1)
index(test_t2) <- as.Date(index(test_t2), origin = '1970-01-01')
test_t2_ncol <- test_t2[,-c(1)]
test_g <- na.spline(test_t2_ncol)
Now I put together each time series ("value1, value2,...", in "test") with its own interpolated time series in "test_g", and plot them to verify by eye the goodness of the interpolation:
interp_val1 <- test_g[,-c(2:7)]
orig_val1 <- test[,-c(3:8)]
orig_val1_z <- read.zoo(orig_val1)
merge_val1 <- merge(orig_val1_z, interp_val1)
options(stringsAsFactors = FALSE) # to avoid conversion to factors
merge_val1_df <- data.frame(Date=time(merge_val1), merge_val1, check.names=FALSE, row.names=NULL)
plot(merge_val1_df$orig_val1_z, lwd=2)
lines(merge_val1_df$interp_val1, lwd=1, col="green")
It seems that the interpolation works well!
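For comparison, here is a minimal base R sketch of the same idea (interpolate every column in one pass) without zoo. It is not the method used above, only an alternative under stated assumptions: the small `test` data frame is a hypothetical stand-in (value1 is taken from the first rows of the question, valueNA is made up to show NA handling), and `interp_col` is a helper name introduced here for illustration.

```r
# Hypothetical stand-in for the quarterly data frame `test`
test <- data.frame(
  Date    = as.Date(c("2001-03-30", "2001-06-29", "2001-09-28",
                      "2001-12-31", "2002-03-28")),
  value1  = c(319.952, 181.103, 19.724, 133.134, 160.564),
  valueNA = c(10, NA, 30, NA, 50)   # made-up column with gaps
)

# Daily grid spanning the first to the last quarterly date
DateSeq <- seq(test$Date[1], tail(test$Date, 1), by = "1 day")

# Natural cubic spline through the non-NA points of one column,
# evaluated on the daily grid. Note that spline() would extrapolate
# if a column started or ended with NA.
interp_col <- function(y) {
  ok <- !is.na(y)
  spline(as.numeric(test$Date[ok]), y[ok], method = "natural",
         xout = as.numeric(DateSeq))$y
}

# Apply the helper to every value column at once
testDaily <- data.frame(Date = DateSeq, lapply(test[-1], interp_col))
head(testDaily)
```

Swapping `spline(...)` for `approx(...)` inside the helper would give linear instead of cubic interpolation.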
I am trying to plot a heat map from data with three variables. I am using ggplot with geom_raster, but it doesn't seem to work. I am unable to see what's going wrong.
library(tidyverse)
p <- read.csv("Rheatmaptest.csv", header = TRUE);
p
xdir ydir Category.1 Category.2 Category.3 Category.4
1 -10.731 10.153 0.61975 3.2650 0.19025 13.00
2 -21.462 9.847 1.77000 3.2475 0.56325 16.70
3 -32.193 9.847 1.65500 2.9900 0.51325 176.00
4 -42.924 10.000 1.34500 3.1800 0.41350 177.00
5 -16.770 20.000 0.69600 3.4975 0.22150 174.00
6 -33.541 20.000 0.68700 3.4275 0.20250 4.24
7 -50.311 20.000 0.77350 3.1575 0.24250 177.00
8 -67.082 20.000 1.09600 3.5350 0.34600 163.00
9 -18.689 30.000 0.54250 3.5875 0.18100 160.00
10 -37.378 30.000 0.63075 3.7125 0.19300 158.00
11 -56.067 30.000 0.71975 3.5425 0.22225 2.26
12 -74.756 30.000 0.79100 3.3750 0.23000 8.24
13 -20.000 40.000 0.76650 3.7200 0.24375 167.00
14 -40.000 40.000 0.68325 3.5300 0.21350 155.00
15 -60.000 40.000 0.81075 3.3400 0.25325 145.00
16 -80.000 40.000 0.68800 3.6375 0.21350 146.00
17 -19.521 50.000 0.67900 3.7150 0.21700 167.00
18 -39.043 50.000 0.69500 3.7950 0.21225 109.00
19 -58.564 49.847 0.68300 3.5575 0.20700 166.00
20 -78.085 50.000 0.67375 3.5325 0.21975 163.00
21 -17.562 60.000 0.64350 3.7025 0.19475 140.00
22 -35.585 60.000 0.56650 3.5250 0.17775 34.30
23 -54.067 60.000 0.82350 3.7700 0.24525 129.00
24 -72.090 60.000 0.85450 3.6675 0.28225 156.00
25 -15.522 70.000 0.59100 3.3475 0.18875 144.00
26 -31.044 69.847 0.56200 3.7975 0.17250 159.00
27 -46.566 70.000 0.79375 3.5350 0.24975 145.00
28 -62.088 70.000 0.64275 3.6100 0.20375 132.00
29 -11.040 80.000 0.75875 3.7450 0.23925 138.00
30 -22.081 80.000 0.81900 3.3875 0.25975 144.00
31 -33.121 80.000 0.72725 3.5825 0.22175 132.00
32 -44.161 80.000 0.83300 3.5550 0.27000 177.00
33 -4.522 90.000 1.77500 3.1250 0.57200 16.30
34 -9.440 90.000 0.96925 3.7200 0.31000 163.00
35 -13.106 90.000 0.76975 3.6600 0.23800 3.50
36 -18.089 90.000 0.86050 3.6750 0.26650 80.50
ggplot(p, aes(x = xdir, y = ydir)) +
geom_raster(aes(fill = Category.1), interpolate = TRUE) +
scale_fill_gradient2(limits=c(0.5,2), low="blue", mid="yellow", high="red", midpoint=1)
I am able to see points when I use geom_point instead of geom_raster. Even with geom_raster, I just see very tiny points at the corresponding locations. Interpolate doesn't seem to work.
Am I missing something?
The implied precision of your data is causing your rasters to be plotted so small they are barely visible.
By reducing your precision, you can at least see your raster plot, though it is still probably not very useful. Posting this, I see I came to the same solution as @tifu.
p %>%
ggplot(aes(x = round(xdir/2), y = round(ydir), fill = Category.1)) +
geom_raster() +
scale_fill_gradient2(limits=c(0.5,2), low="blue", mid="yellow", high="red", midpoint=1)
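To make the rounding idea a bit more explicit: as far as I understand it, geom_raster sizes its tiles from the smallest spacing between distinct x (and y) values, so coordinates like 9.847 next to 10.000 produce very thin tiles. A rough sketch of snapping the points to a regular grid first is below. The 10-unit `step` and the names `xbin`, `ybin` and `grid` are assumptions made for illustration, and only the first eight rows of the question's data are reproduced.

```r
# First eight rows of the question's data (xdir, ydir, Category.1)
p <- data.frame(
  xdir = c(-10.731, -21.462, -32.193, -42.924,
           -16.770, -33.541, -50.311, -67.082),
  ydir = c(10.153, 9.847, 9.847, 10.000, 20.000, 20.000, 20.000, 20.000),
  Category.1 = c(0.61975, 1.77000, 1.65500, 1.34500,
                 0.69600, 0.68700, 0.77350, 1.09600)
)

# Snap coordinates to a coarse regular grid (step size is an assumption)
step <- 10
p$xbin <- round(p$xdir / step) * step
p$ybin <- round(p$ydir / step) * step

# Average the values that land in the same grid cell
grid <- aggregate(Category.1 ~ xbin + ybin, data = p, FUN = mean)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plt <- ggplot(grid, aes(x = xbin, y = ybin, fill = Category.1)) +
    geom_raster() +
    scale_fill_gradient2(limits = c(0.5, 2), low = "blue", mid = "yellow",
                         high = "red", midpoint = 1)
}
```

Cells with no data stay empty; filling them would need a real spatial interpolation step, which this sketch does not attempt.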
My first question here :)
My goal: given a data frame of predictors (one predictor per column, one observation per row), fit a regression with lm over a rolling window and then predict the value for the last observation in each window.
The data frame looks like:
> DfPredictor[1:40,]
Y X1 X2 X3 X4 X5
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895
For instance, using a rolling window with width = 10, the regression should be estimated and then used to predict the 'Y' corresponding to X1, X2, ..., X5.
The predictions should be included in a new column 'Ypred'.
Is there some way to do that using rollapply + lm/predict + mutate?
Many thanks!!
Using the data in the Note at the end and assuming that in a window of width 10 we want to predict the last Y (i.e. the 10th), then:
library(zoo)
pred <- function(x) tail(fitted(lm(Y ~., as.data.frame(x))), 1)
transform(DF, pred = rollapplyr(DF, 10, pred, by.column = FALSE, fill = NA))
giving:
Y X1 X2 X3 X4 X5 pred
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440 NA
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971 NA
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380 NA
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536 NA
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555 NA
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555 NA
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433 NA
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681 NA
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245 NA
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990 3.219764
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192 3.241614
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752 3.225423
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546 3.217797
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116 3.205856
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225 3.177928
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948 3.156405
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589 3.176243
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600 3.177165
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899 3.177211
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395 3.145533
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396 3.127410
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640 3.148792
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291 3.124913
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991 3.124992
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310 3.117981
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414 3.117679
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480 3.119898
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676 3.121039
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162 3.123903
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640 3.119438
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999 3.113963
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600 3.101229
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200 3.076817
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590 3.083266
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183 3.089377
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133 3.084225
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750 3.075252
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656 3.063025
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447 3.068808
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895 3.091819
Note: Input DF in reproducible form is:
Lines <- " Y X1 X2 X3 X4 X5
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895"
DF <- read.table(text = Lines, header = TRUE)
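To show what the rolling call is doing, here is a plain-loop sketch of the same computation in base R. It uses a small simulated data frame (hypothetical, not the data above) so that it runs on its own. The second column, `pred_out`, is a variant not in the answer: it fits on the first 9 rows of each window and predicts the 10th out of sample, which may be closer to what "predict" means in the question.

```r
# Simulated stand-in data (hypothetical); columns mimic Y ~ X1 + X2
set.seed(1)
n  <- 25
DF <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
DF$Y <- 1 + 2 * DF$X1 - DF$X2 + rnorm(n, sd = 0.1)

width    <- 10
pred_in  <- rep(NA_real_, n)  # in-sample fitted value of the last row
pred_out <- rep(NA_real_, n)  # out-of-sample prediction of the last row

for (i in width:n) {
  win <- DF[(i - width + 1):i, ]

  # Same as tail(fitted(lm(Y ~ ., window)), 1) in the answer
  pred_in[i] <- tail(fitted(lm(Y ~ ., data = win)), 1)

  # Variant: fit without the last row, then predict it
  fit_out     <- lm(Y ~ ., data = win[-width, ])
  pred_out[i] <- predict(fit_out, newdata = win[width, ])
}

DF$pred_in  <- pred_in
DF$pred_out <- pred_out
```

The first `width - 1` rows stay NA in both columns, matching `fill = NA` in the rollapplyr version.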
I am trying to store data in multiple cells of a data frame, one per element of dd. But my code only stores the data in the last cell. Please see my output below.
Can somebody please correct me? Cannot figure out what I am doing wrong.
Thanks in advance,
MyData <- read.csv(file="Pat_AR_035.csv", header=TRUE, sep=",")
dd <- unique(MyData$POLICY_NUM)
for (j in length(dd)) {
myDF <- data.frame(i=1:length(dd), m=I(vector('list', length(dd))))
myDF$m[[j]] <- data.frame(j,MyData[which(MyData$POLICY_NUM==dd[j] & MyData$ACRES), ],ncol(MyData),nrow(MyData))
}
[[60]]
NULL
[[61]]
NULL
[[62]]
NULL
[[63]]
j OBJECTID DIVISION POLICY_SYM POLICY_NUM YIELD_ID LINE_ID RH_CLU_ID ACRES PLANT_DATE ACRE_TYPE CLU_DETERM STATE COUNTY FARM_SERIA TRACT
1646 63 1646 8 MP 754033 3 20 39565604 8.56 5/3/2014 PL A 3 35 109 852
1647 63 1647 8 MP 754033 1 10 39565605 30.07 4/19/2014 PL A 3 35 109 852
1648 63 1648 8 MP 754033 1 10 39565606 56.59 4/19/2014 PL A 3 35 109 852
CLU_NUMBER FIELD_ACRE RMA_CLU_ID UPDATE_DAT Percent_Ar RHCLUID Field1 OBJECTID_1 DIVISION_1 STATE_1 COUNTY_1
1646 3 8.56 F68E591A-ECC2-470B-A012-201C3BB20D7F 9/21/2014 63.4990 39565604 1646 1646 8 3 35
1647 1 30.07 eb04cfc0-e78b-415f-b447-9595c81ef09e 9/21/2014 100.0000 39565605 1647 1647 8 3 35
1648 2 56.59 5922d604-e31c-4b9d-b846-9f38e2d18abe 9/21/2014 92.1442 39565606 1648 1648 8 3 35
POLICY_N_1 YIELD_ID_1 RH_CLU_ID_ short_dist coords_x1 coords_x2 optional SHAPE_Leng SHAPE_Area ncol.MyData. nrow.MyData.
1646 754033 3 39565604 5.110837 516747.8 -221751.4 TRUE 831.3702 34634.73 35 1757
1647 754033 1 39565605 5.606284 515932.1 -221702.0 TRUE 1469.4800 121611.46 35 1757
1648 754033 1 39565606 5.325399 516380.1 -221640.9 TRUE 1982.8757 228832.22 35 1757
for (j in length(dd))
This doesn’t iterate over dd — it iterates over a single number: the length of dd. Not much of an iteration. You probably meant to write the following or something similar:
for (j in seq_along(dd))
However, there are more issues with your code. For instance, the myDF variable is continuously overwritten inside your loop, which probably isn’t what you intended at all. Instead, you should probably create objects in an lapply statement and forego the loop.
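A minimal sketch of the lapply approach is below. `MyData` here is a small hypothetical stand-in for the CSV in the question, and `per_policy` is a name introduced for illustration. The `ACRES != 0` test is my reading of the bare `& MyData$ACRES` condition in the original loop.

```r
# Hypothetical stand-in for read.csv("Pat_AR_035.csv")
MyData <- data.frame(
  POLICY_NUM = c(754031, 754031, 754032, 754033, 754033, 754033),
  ACRES      = c(12.5, 0, 30.07, 8.56, 30.07, 56.59)
)

dd <- unique(MyData$POLICY_NUM)

# One list element per policy number, built in a single pass.
# seq_along(dd) yields 1, 2, ..., length(dd), unlike length(dd) alone.
per_policy <- lapply(seq_along(dd), function(j) {
  rows <- MyData[MyData$POLICY_NUM == dd[j] & MyData$ACRES != 0, ]
  data.frame(j = rep(j, nrow(rows)), rows)
})

length(per_policy)   # one entry per unique policy
per_policy[[3]]      # rows for the third policy
```

Because each element is created inside the function, nothing gets overwritten between iterations.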
Given the following example:
library(metafor)
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg, append = TRUE)
dat
rma(yi, vi, data = dat, mods = ~dat[[8]], subset = (alloc=="systematic"), knha = TRUE)
trial author year tpos tneg cpos cneg ablat alloc yi vi
1 1 Aronson 1948 4 119 11 128 44 random -0.8893 0.3256
2 2 Ferguson & Simes 1949 6 300 29 274 55 random -1.5854 0.1946
3 3 Rosenthal et al 1960 3 228 11 209 42 random -1.3481 0.4154
4 4 Hart & Sutherland 1977 62 13536 248 12619 52 random -1.4416 0.0200
5 5 Frimodt-Moller et al 1973 33 5036 47 5761 13 alternate -0.2175 0.0512
6 6 Stein & Aronson 1953 NA NA NA NA 44 alternate NA NA
7 7 Vandiviere et al 1973 8 2537 10 619 19 random -1.6209 0.2230
8 8 TPT Madras 1980 505 87886 499 87892 NA random 0.0120 0.0040
9 9 Coetzee & Berjak 1968 29 7470 45 7232 27 random -0.4694 0.0564
10 10 Rosenthal et al 1961 17 1699 65 1600 42 systematic -1.3713 0.0730
11 11 Comstock et al 1974 186 50448 141 27197 18 systematic -0.3394 0.0124
12 12 Comstock & Webster 1969 5 2493 3 2338 33 systematic 0.4459 0.5325
13 13 Comstock et al 1976 27 16886 29 17825 33 systematic -0.0173 0.0714
What I basically want is to iterate the rma() command (varying only the mods argument) over, let's say, columns [7:8], and to store each result in a variable named after the column.
Two problems:
1) When I enter the command:
rma(yi, vi, data = dat, mods = ~dat[[8]], subset = (alloc=="systematic"), knha = TRUE)
The moderator is named dat[[8]]. But I want its name to be the column name (i.e. colnames(dat[i])).
Model Results:
estimate se tval pval ci.lb ci.ub
intrcpt 0.5543 1.4045 0.3947 0.7312 -5.4888 6.5975
dat[[8]] -0.0312 0.0435 -0.7172 0.5477 -0.2185 0.1560
2) Now imagine that I have many more columns and I want to iterate over [8:53], such that each result gets stored in a variable named after the column.
Problem 2) has been solved:
for(i in 7:8){
assign(paste(colnames(dat[i]), i, sep=""), rma(yi, vi, data = dat, mods = ~dat[[i]], subset = (alloc=="systematic"), knha = TRUE))}
To answer the first part of your question, you can change the names by accessing the attributes of the model object.
In this case
# inspect the attributes
attr(model$vb, which = "dimnames")
# assign the name
attr(model$vb, which = "dimnames")[[1]][2] <- paste(colnames(dat)[8])
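Another option is to avoid the odd name in the first place by building the formula from the column name. The sketch below shows the principle with lm() and the built-in mtcars data, because that runs without extra packages; it is not metafor code. Whether rma() accepts a formula object built this way for mods is an assumption on my part, so the rma() line is left as an untested comment.

```r
# Indexing by position inside the formula puts the literal expression
# into the coefficient name
fit_pos <- lm(mpg ~ mtcars[[2]], data = mtcars)
names(coef(fit_pos))          # second name is "mtcars[[2]]"

# Building the formula from the column name keeps the real name
cols <- colnames(mtcars)[2:3]
fits <- lapply(setNames(cols, cols), function(v) {
  lm(reformulate(v, response = "mpg"), data = mtcars)
})
names(coef(fits[["cyl"]]))    # second name is "cyl"

# Untested assumption: the analogous call for the question would be
# rma(yi, vi, data = dat, mods = reformulate(colnames(dat)[8]),
#     subset = (alloc == "systematic"), knha = TRUE)
```

Keeping the fits in a named list, as here, also replaces the assign() loop: `fits[["cyl"]]` looks up a model by column name.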
I have this df with columns for date and time combined, date, and time, plus the CH4 observations and the calculated Ratio (I have more columns, but they are irrelevant to this question).
'data.frame': 1420847 obs. of 17 variables
$ Start : Factor w/ 1469 levels "2013-08-31 23:56:09.000",..: 2 2 2 2 2 2 2 2 2 2 ...
$ CO2 : int 1510 1950 1190 1170 780 870 730 740 680 700 ...
$ CH4 : int 66 77 62 58 34 51 36 43 32 40 ...
$ Ratio : num 0.0437 0.0395 0.0521 0.0496 0.0436 ...
$ Start_time: POSIXlt, format: "2013-11-20 00:10:05" "2013-11-20 00:10:05" "2013-11-20 00:10:05" "2013-11-20 00:10:05" ...
$ Start_date: Date, format: "2013-09-01" "2013-09-01" "2013-09-01" "2013-09-01" ...
Now I wish to split every day in six blocks of 4 hrs and to assign numbers 1 - 6 to each block. The problem, however, is that I only have the date and time at which the measurements started (Start_date and Start_time, or the combined Start), so I think it is necessary to assign each new Start_time to a block. The length of the observations varies a lot, so there is no option of assigning a number to it. This is what I wish to accomplish:
Start Start_time Start_date CO2 CH4 Ratio block
2013-09-01 00:10:05.000 00:10:05 2013-09-01 1510 66 0.04370861 1
2013-09-01 00:10:05.000 00:10:05 2013-09-01 1950 77 0.03948718 1
2013-09-01 05:16:55.000 05:16:55 2013-09-01 1190 62 0.05210084 2
2013-09-01 05:16:55.000 05:16:55 2013-09-01 1170 58 0.04957265 2
2013-09-01 05:16:55.000 05:16:55 2013-09-01 780 34 0.04358974 2
2013-09-01 12:44:33.000 12:44:33 2013-09-01 870 51 0.05862069 4
2013-09-01 12:44:33.000 12:44:33 2013-09-01 730 36 0.04931507 4
2013-09-01 22:14:23.000 22:14:23 2013-09-01 740 43 0.05810811 6
2013-09-01 22:14:23.000 22:14:23 2013-09-01 680 32 0.04705882 6
2013-09-02 08:37:05.000 08:37:05 2013-09-02 700 40 0.05714286 3
2013-09-02 08:37:05.000 08:37:05 2013-09-02 610 35 0.05737705 3
2013-09-02 17:22:33.000 17:22:33 2013-09-02 630 25 0.03968254 5
2013-09-02 17:22:33.000 17:22:33 2013-09-02 670 40 0.05970149 5
2013-09-02 23:59:44.000 23:59:44 2013-09-02 640 37 0.05781250 6
2013-09-02 23:59:44.000 23:59:44 2013-09-02 730 35 0.04794521 6
I have searched this website and also tried Google but, so far, I have found no answer. I have tried the following code, which I found in an answer on this website but no luck.
qaa <- split(df, cut(strptime(paste(df$Start_date, df$Start_time), format = "%Y-%m-%d %H:%M"),"4 hours"))
Previously, I tried to split the number of observations in minutes, so I tried to adjust that code. And to be very honest, I have no idea what I am doing (as you can probably tell).
lst<- split(df, df$Start_date)
nobs <- "4 hours"
List <- unlist(lapply(lst, function(x) {
x$grp <- rep(1:(nrow(x)/nobs+1), each = nobs)[1:nrow(x)]
split(x, x$grp)}), recursive = FALSE)
b <- as.matrix(do.call("rbind", List))
Just to let you know, again, I am a NOOB concerning R so it takes me a lot of time to figure everything out. I understand very little of the language but I am trying my very best to make it work. I really enjoy working with it! If there is already another question like this on this website, please let me know so I can remove this.. I have not found it, though.
Thank you for taking your time to read my question and to consider to answer it!
If you can extract the start hour from the start time (try here: Dealing with timestamps in R), you could then use the following to assign the correct block number :
df$block[df$start_hour>=0 & df$start_hour<4]<-1
df$block[df$start_hour>=4 & df$start_hour<8]<-2
df$block[df$start_hour>=8 & df$start_hour<12]<-3
df$block[df$start_hour>=12 & df$start_hour<16]<-4
df$block[df$start_hour>=16 & df$start_hour<20]<-5
df$block[df$start_hour>=20 & df$start_hour<24]<-6
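Since the blocks are all 4 hours wide, the six assignments can also be collapsed into one line with integer division. A small sketch, using a few Start values from the question; the names `start`, `start_time` and `block` are just for illustration, and tz = "UTC" is an assumption to keep the parsing independent of the local time zone.

```r
# A few Start values taken from the question
start <- c("2013-09-01 00:10:05.000", "2013-09-01 05:16:55.000",
           "2013-09-01 12:44:33.000", "2013-09-01 22:14:23.000",
           "2013-09-02 08:37:05.000", "2013-09-02 17:22:33.000",
           "2013-09-02 23:59:44.000")

# Parse the timestamps; %OS also reads the fractional seconds
start_time <- as.POSIXlt(start, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Hours 0-3 -> 1, 4-7 -> 2, ..., 20-23 -> 6
block <- start_time$hour %/% 4 + 1
block
```

For the data frame in the question, the same expression could be applied to the Start column after converting it from factor with as.character().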
Installing lubridate in particular will help, as it has useful functions like hour. cut2 from Hmisc lets you specify some easy brackets to split your hours by.
library("lubridate")
library("Hmisc")
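example<-as.factor('2013-09-01 00:10:05.000')
# pass the format by name: the second positional argument of as.POSIXct is tz, not format
example<-data.frame(example,timeslot=cut2(hour(as.POSIXct(as.character(example),format="%Y-%m-%d %H:%M:%S")),cuts=seq(0,24,4)))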