How do I integrate a line created by the smooth.spline function?

I'm trying to integrate a list created by predict(smooth.spline) but I'm getting the following error: Error in stats::integrate(...) :
evaluation of function gave a result of wrong length.
predict(smooth.spline(x, y)) gives:
$x
[1] 0.000 0.033 0.067 0.100 0.133 0.167 0.200 0.233 0.267 0.300 0.333 0.367 0.400 0.433 0.467 0.500
[17] 0.533 0.567 0.600 0.633 0.667 0.700 0.733 0.767 0.800 0.833 0.867 0.900 0.933 0.967 1.000 1.033
[33] 1.067 1.100 1.133 1.167 1.200 1.233 1.267 1.300 1.333 1.367 1.400 1.433 1.467 1.500 1.533 1.567
[49] 1.600 1.633 1.667 1.700 1.733 1.767 1.800 1.833 1.867 1.900 1.933 1.967 2.000 2.033 2.067 2.100
[65] 2.133 2.167 2.200 2.233 2.267 2.300 2.333 2.367 2.400 2.433 2.467 2.500 2.533 2.567 2.600 2.633
[81] 2.667 2.700 2.733 2.767 2.800 2.833 2.867 2.900 2.933 2.967 3.000 3.033 3.067 3.100 3.133 3.167
[97] 3.200 3.233 3.267 3.300 3.333 3.367 3.400 3.433 3.467 3.500 3.533 3.567 3.600 3.633 3.667 3.700
[113] 3.733 3.767 3.800 3.833 3.867 3.900 3.933 3.967 4.000 4.033 4.067 4.100 4.133 4.167 4.200 4.233
[129] 4.267 4.300 4.333 4.367 4.400 4.433 4.467 4.500 4.533 4.567 4.600 4.633 4.667 4.700 4.733 4.767
[145] 4.800 4.833 4.867 4.900 4.933 4.967 5.000 5.033 5.067 5.100 5.133 5.167 5.200 5.233 5.267 5.300
[161] 5.333 5.367 5.400 5.433 5.467 5.500 5.533 5.567 5.600 5.633 5.667 5.700 5.733 5.767 5.800 5.833
[177] 5.867 5.900 5.933 5.967 6.000 6.033 6.067 6.100 6.133 6.167 6.200 6.233 6.267 6.300 6.333 6.367
[193] 6.400 6.433 6.467 6.500 6.533 6.567 6.600 6.633 6.667 6.700 6.733 6.767 6.800 6.833 6.867 6.900
[209] 6.933 6.967 7.000 7.033 7.067 7.100 7.133 7.167 7.200 7.233 7.267 7.300 7.333 7.367 7.400 7.433
[225] 7.467 7.500 7.533 7.567 7.600 7.633 7.667 7.700 7.733 7.767 7.800 7.833 7.867 7.900 7.933 7.967
[241] 8.000 8.033 8.067 8.100 8.133 8.167 8.200 8.233 8.267 8.300 8.333 8.367 8.400 8.433 8.467 8.500
[257] 8.533 8.567 8.600 8.633 8.667 8.700 8.733 8.767 8.800 8.833 8.867 8.900 8.933 8.967 9.000 9.033
[273] 9.067 9.100 9.133 9.167 9.200 9.233 9.267 9.300 9.333 9.367 9.400 9.433 9.467 9.500 9.533 9.567
[289] 9.600 9.633 9.667 9.700 9.733 9.767 9.800 9.833 9.867 9.900 9.933 9.967 10.000 10.033 10.067 10.100
$y
[1] 59.96571 182.14589 308.06545 430.28967 552.13181 676.76001 796.27007 913.45605 1030.73901 1140.24735
[11] 1244.62019 1345.89199 1437.37738 1521.99577 1601.97896 1672.60118 1736.28174 1794.58753 1844.06630 1886.59891
[21] 1923.24013 1952.04715 1974.93273 1993.22884 2006.84446 2017.75964 2027.59482 2036.61631 2045.82650 2056.14890
[31] 2067.21217 2079.44489 2093.29127 2107.48046 2121.84443 2136.20938 2149.03007 2160.03152 2168.83055 2174.72156
[41] 2177.92034 2178.50434 2177.25261 2175.18231 2173.05271 2171.23280 2169.75413 2168.60865 2167.58021 2166.28136
[51] 2164.31765 2161.56924 2157.84126 2153.06845 2147.68110 2141.80856 2135.99289 2131.40947 2128.57716 2127.73980
[61] 2129.07173 2132.52768 2137.84677 2144.15311 2151.04004 2158.20845 2164.72665 2170.38182 2175.16221 2178.72060
[71] 2181.26140 2183.34329 2185.47108 2188.20964 2191.71999 2195.72978 2200.17822 2204.67512 2208.37304 2210.99201
[81] 2212.16148 2211.52661 2209.27941 2205.52709 2200.82773 2195.80333 2191.14046 2187.86227 2186.22909 2186.61490
[91] 2189.21504 2193.74033 2200.00587 2207.23478 2215.15186 2223.55507 2231.56558 2239.35648 2247.15616 2254.58452
[101] 2262.25845 2270.90839 2280.40791 2291.00929 2302.93232 2315.07098 2327.30700 2339.53707 2350.58890 2360.39110
[111] 2368.83106 2375.48715 2380.80457 2385.21836 2389.36786 2394.40853 2401.47143 2410.55245 2422.11132 2436.78865
[121] 2453.43711 2472.06315 2492.92121 2514.41941 2536.79884 2560.48574 2584.09299 2608.55242 2635.61496 2664.80169
[131] 2697.76567 2735.79016 2776.54744 2820.81417 2868.96931 2916.89215 2964.73344 3012.72300 3056.87880 3097.62601
[141] 3135.48071 3167.79172 3195.56342 3220.27772 3241.55129 3261.03300 3279.41808 3295.63106 3310.16876 3323.00826
[151] 3332.94381 3340.03845 3344.39672 3345.94806 3345.34005 3343.03700 3339.80326 3336.46397 3333.90149 3333.10272
[161] 3334.29421 3337.81087 3343.53943 3351.20699 3360.65966 3370.86645 3381.56693 3392.54603 3402.66565 3411.98625
[171] 3420.52889 3427.65472 3433.82738 3439.48350 3444.52521 3449.15602 3453.47469 3457.18103 3460.26646 3462.61691
[181] 3463.90801 3464.03740 3462.81764 3460.39884 3456.89191 3452.34917 3447.51817 3442.81170 3438.49642 3434.61442
[191] 3430.68032 3426.12851 3420.51956 3412.97424 3402.44270 3389.08015 3372.22571 3350.92543 3326.65679 3299.18832
[201] 3267.98034 3235.60437 3201.97284 3166.74241 3132.31425 3097.84231 3062.28419 3027.69000 2992.94842 2956.82062
[211] 2921.23160 2884.94573 2846.71167 2808.67879 2769.66061 2728.44573 2687.49711 2645.56586 2600.90609 2555.63728
[221] 2507.95605 2455.68553 2401.27869 2342.78231 2278.34602 2212.01091 2142.26985 2067.55831 1993.06085 1917.46648
[231] 1839.35164 1764.18963 1690.48889 1616.92292 1548.58020 1483.78349 1421.22958 1365.02723 1313.47540 1265.38224
[241] 1223.67578 1186.75059 1153.52704 1125.77912 1102.26304 1082.24588 1066.67248 1054.56916 1045.35940 1039.20608
[251] 1035.34023 1033.24970 1032.58511 1032.85175 1033.69725 1034.73437 1035.66522 1036.21146 1036.16962 1035.42480
[261] 1033.76896 1031.12350 1027.27529 1021.86005 1014.99372 1006.33762 995.34857 982.53272 967.47341 949.51507
[271] 929.75179 907.75896 882.86053 856.68919 828.72692 798.22411 767.10143 734.59731 699.82246 665.13042
[281] 629.85926 593.30425 558.00149 523.24723 488.37898 455.79640 424.78607 394.77350 367.79586 343.17422
[291] 320.38235 300.83710 283.85695 268.87085 256.54269 246.16897 237.22002 229.91066 223.66652 218.11256
[301] 213.36419 209.04868 204.88159 200.94805
smooth <- predict(smooth.spline(x,y))
Then I wrap this result in a function:
func <- function(x) smooth
#Attempt to integrate
integrate(func,0,10)$value
Error in stats::integrate(...) :
evaluation of function gave a result of wrong length
I get the same error when I attempt to Vectorize the function
> integrate(Vectorize(func),0,10)$value
Error in stats::integrate(...) :
evaluation of function gave a result of wrong length
Ultimately, I'm trying to find the upper limit of integration that gives a specified area under the curve, but I can't even get the integration step to work.

You didn't include any reproducible data, so I can't test this advice for you, but here are two suggestions.
First: if you are starting with the smooth object that has evenly spaced x values and corresponding y values from predictions from the spline, then don't bother with integrate(), just use the trapezoidal rule to approximate the integral:
with(smooth, (x[2]-x[1])*(sum(y) - mean(y[c(1, length(y))])))
The Simpson's rule formula would be a bit more accurate but also more complicated.
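For reference, here is a minimal composite Simpson's rule sketch of my own (assuming evenly spaced x values and an odd number of points; with an even count you would handle the last interval separately, e.g. with one trapezoid):
simpson <- function(x, y) {
  n <- length(x)
  stopifnot(n %% 2 == 1, n >= 5)  # composite Simpson needs an odd point count
  h <- x[2] - x[1]                # assumes evenly spaced x
  h / 3 * (y[1] + y[n] +
           4 * sum(y[seq(2, n - 1, by = 2)]) +
           2 * sum(y[seq(3, n - 2, by = 2)]))
}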
Second: if you are starting with data vectors x and y, then you should construct a function which takes a vector of new x values and returns the corresponding predictions of y, and pass that function to integrate(). Here I do it that way:
fit <- smooth.spline(x, y)
smooth <- function(x) predict(fit, x)$y
integrate(smooth, 0, 10)
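If the ultimate goal is the upper limit that yields a given area under the curve, you can then solve for it numerically with uniroot(). A sketch, where target is a hypothetical desired area (not from your data) that must be attainable within [0, 10]:
target <- 10000   # hypothetical desired area under the curve
area_to <- function(b) integrate(smooth, 0, b)$value - target
uniroot(area_to, lower = 0, upper = 10)$root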

Related

multiple line graphs in single frame

I have discharge data that I want to display: observed vs. simulated. The data are as follows:
Time observed simulated
Jan-86 0.105 0.1597
Feb-86 0.0933 0.1259
Mar-86 3.5336 0.41
Apr-86 8.8999 2.494
May-86 5.2431 1.767
Jun-86 0.9747 1.96
Jul-86 0.079 1.98
Aug-86 0.0154 1.729
Sep-86 0.0053 1.419
Oct-86 0.0135 1.121
Nov-86 0.0235 0.8664
Dec-86 0.017 0.658
Jan-87 0.017 0.4925
Feb-87 0.017 0.3855
Mar-87 3.3483 1.089
Apr-87 3.3156 1.704
May-87 0.5563 1.327
Jun-87 0.2565 1.166
Jul-87 0.0446 1.012
Aug-87 0.0096 0.8278
Sep-87 0.0007 0.6567
Oct-87 0.0018 0.5083
Nov-87 0.0139 0.3892
Dec-87 0.0087 0.2953
Jan-88 0.0025 0.2196
Feb-88 0.0017 0.1641
Mar-88 0.0099 0.3858
Apr-88 1.6217 3.929
May-88 0.3398 0.5156
Jun-88 0.762 0.5537
Jul-88 0.0242 0.4985
Aug-88 0.0002 0.4125
Sep-88 0.0003 0.4027
Oct-88 0 0.2918
Nov-88 0 0.2388
Dec-88 0.0005 0.2024
Jan-89 0.0003 0.147
Feb-89 0.0004 0.1157
Mar-89 0.0006 0.3886
Apr-89 6.5433 10.92
May-89 0.8047 1.685
Jun-89 0.7968 1.486
Jul-89 0.0836 1.407
Aug-89 0.0024 1.22
Sep-89 0.0001 0.9965
Oct-89 0 0.7846
Nov-89 0.0005 0.6097
Dec-89 0 0.4636
Jan-90 0 0.3469
Feb-90 0 0.271
Mar-90 0.2724 0.9063
Apr-90 0.3768 2.902
May-90 0.0776 0.5038
Jun-90 0.1327 0.5622
Jul-90 0.0636 0.5068
Aug-90 0.0005 0.4169
Sep-90 0 0.3328
Oct-90 0 0.2611
Nov-90 0 0.2016
Dec-90 0 0.1549
Jan-91 0 0.116
Feb-91 0.0004 0.0904
Mar-91 0.0024 0.0709
Apr-91 0.0056 0.3813
May-91 0.1312 0.6567
Jun-91 0.1033 0.6053
Jul-91 1.1491 0.6226
Aug-91 0.0957 0.5423
Sep-91 0.01 0.4529
Oct-91 0.009 0.374
Nov-91 0.0436 0.3132
Dec-91 0.0629 0.2344
Jan-92 0.0238 0.1775
Feb-92 0.0125 0.1378
Mar-92 2.4242 3.399
Apr-92 2.9119 4.284
May-92 1.0843 1.854
Jun-92 0.1473 1.7
Jul-92 0.3467 1.451
Aug-92 0.0143 1.182
Sep-92 0.0193 2.272
Oct-92 0.035 1.332
Nov-92 0.0132 1.181
Dec-92 0.0353 0.9716
Jan-93 0.0213 0.7097
Feb-93 0.0196 0.5596
Mar-93 0.2553 5.669
Apr-93 3.4093 4.912
May-93 0.4553 1.575
Jun-93 1.4621 1.56
Jul-93 2.7732 2.622
Aug-93 7.4911 1.587
Sep-93 7.7134 1.381
Oct-93 0.4065 1.133
Nov-93 0.3042 0.9257
Dec-93 0.1669 0.7514
Jan-94 0.0756 0.5657
Feb-94 0.0317 0.4464
Mar-94 1.3576 3.802
Apr-94 1.5093 4.446
May-94 0.8696 1.246
Jun-94 0.3097 1.426
Jul-94 4.1223 1.66
Aug-94 0.6915 0.7939
Sep-94 3.9228 0.6434
Oct-94 1.5528 0.5081
Nov-94 3.0506 0.3907
Dec-94 0.6294 0.3053
Jan-95 0.2484 0.2327
Feb-95 0.1053 0.1842
Mar-95 9.4852 7.073
Apr-95 3.8737 3.122
May-95 3.0692 1.754
Jun-95 0.3433 1.386
Jul-95 2.6554 1.297
Aug-95 0.3252 0.9797
Sep-95 0.2854 0.7803
Oct-95 0.2667 0.6097
Nov-95 0.1444 0.4692
Dec-95 0.1098 0.355
Jan-96 0.0696 0.265
Feb-96 0.0399 0.4352
Mar-96 0.0419 0.2793
Apr-96 16.2771 17.33
May-96 25.3653 21.04
Jun-96 0.4064 4.901
Jul-96 0.3028 3.886
Aug-96 0.097 3.1
Sep-96 0.0325 2.51
Oct-96 0.0949 2.009
Nov-96 0.2763 1.614
Dec-96 0.1307 1.252
Jan-97 0.0778 0.9253
Feb-97 0.0661 0.7211
Mar-97 0.0703 0.7519
Apr-97 27.3434 21.65
May-97 4.2895 7.989
Jun-97 0.4939 3.661
Jul-97 6.7193 3.92
Aug-97 0.1174 2.802
Sep-97 0.0858 2.229
Oct-97 2.0501 1.789
Nov-97 0.891 1.644
Dec-97 0.3561 1.288
Jan-98 0.133 0.94
Feb-98 0.8482 2.56
Mar-98 7.2317 6.613
Apr-98 3.7604 4.181
May-98 3.039 2.323
Jun-98 5.3291 2.492
Jul-98 5.6387 2.607
Aug-98 0.1308 1.943
Sep-98 0.0937 1.647
Oct-98 1.4565 1.641
Nov-98 0.7778 1.563
Dec-98 0.5755 1.692
Jan-99 0.0573 1.65
Feb-99 0.0783 1.489
Mar-99 2.3554 7.688
Apr-99 25.3018 18.41
May-99 8.7571 5.154
Jun-99 14.8313 3.564
Jul-99 4.7535 2.423
Aug-99 3.6622 1.898
Sep-99 5.0639 1.524
Oct-99 0.9153 1.186
Nov-99 0.4436 0.905
Dec-99 0.181 0.6864
Jan-00 0.1015 0.5129
Feb-00 1.9763 0.3953
Mar-00 2.5832 0.3083
Apr-00 3.6585 0.2388
May-00 0.9701 0.182
Jun-00 7.1744 0.1605
Jul-00 1.7145 0.1494
Aug-00 0.6677 0.1364
Sep-00 0.1858 0.1195
Oct-00 1.1442 0.0997
Nov-00 15.1503 0.6839
Dec-00 0.5526 0.4275
01-Jan 0.182 0.6061
01-Feb 0.1582 0.5254
01-Mar 0.7527 0.437
01-Apr 18.8305 21
01-May 4.0794 2.765
01-Jun 1.7906 5.399
01-Jul 0.2344 2.615
01-Aug 2.8721 1.896
01-Sep 0.108 1.555
01-Oct 0.0896 1.237
01-Nov 0.6865 0.9588
01-Dec 0.1609 0.7329
02-Jan 0.0987 0.5496
02-Feb 0.081 0.4299
02-Mar 0.0671 0.4125
02-Apr 1.9161 5.189
02-May 2.8088 2.423
02-Jun 18.2132 2.137
02-Jul 2.881 2.783
02-Aug 0.676 1.102
02-Sep 1.309 0.892
02-Oct 0.1844 0.7183
02-Nov 0.1415 0.56
02-Dec 0.0781 0.4277
03-Jan 0.0897 0.3211
03-Feb 0.0191 0.2515
03-Mar 1.1978 2.32
03-Apr 1.4536 2.175
03-May 1.2194 0.9472
03-Jun 2.2049 0.7456
03-Jul 0.1934 0.6395
03-Aug 0.0362 0.5237
03-Sep 0.0047 0.4738
03-Oct 0.0338 0.3477
03-Nov 0.1166 0.2821
03-Dec 0.0301 0.2319
04-Jan 0.0151 0.1851
04-Feb 0.0218 0.1462
04-Mar 2.9284 3.967
04-Apr 5.113 8.21
04-May 14.4488 6.077
04-Jun 8.7876 4.92
04-Jul 0.7572 2.781
04-Aug 0.3186 2.023
04-Sep 1.7134 1.648
04-Oct 0.834 1.385
04-Nov 1.5215 1.571
04-Dec 0.1535 1.175
05-Jan 0.0515 0.8762
05-Feb 0.0535 0.7016
05-Mar 0.5916 2.954
05-Apr 10.2761 12.22
05-May 4.3927 3.95
05-Jun 12.6566 8.826
05-Jul 13.6267 4.855
05-Aug 11.4682 3.241
05-Sep 1.2082 2.454
05-Oct 1.1875 1.986
05-Nov 1.5555 1.566
05-Dec 0.3229 1.294
06-Jan 0.1832 1.055
06-Feb 0.112 0.885
06-Mar 0.3341 3.006
06-Apr 24.8525 19.75
06-May 6.2187 4.442
06-Jun 0.3634 2.697
06-Jul 0.0534 1.889
06-Aug 0.0439 1.571
06-Sep 0.02 1.261
06-Oct 0.0418 0.9836
06-Nov 0.0612 0.7535
06-Dec 0.0747 0.5717
07-Jan 0.0644 0.43
07-Feb 0.0339 0.3319
07-Mar 2.8046 2.675
07-Apr 2.7156 3.412
07-May 0.5788 2.576
07-Jun 8.5705 9.888
07-Jul 1.3929 2.897
07-Aug 0.1146 1.758
07-Sep 0.0374 1.486
07-Oct 0.1637 1.338
07-Nov 0.1599 1.2
07-Dec 0.1165 0.9649
08-Jan 0.054 0.7372
08-Feb 0.024 0.5469
08-Mar 0.04 0.6989
08-Apr 2.3773 9.219
08-May 1.3455 3.223
08-Jun 1.4375 4.011
08-Jul 0.531 2.341
08-Aug 0.0512 1.618
08-Sep 0.0902 1.377
08-Oct 2.8219 1.115
08-Nov 4.7166 0.9028
08-Dec 0.3393 0.8564
09-Jan 0.1303 0.6376
09-Feb 0.1594 0.7089
09-Mar 10.3111 5.402
09-Apr 14.466 14.64
09-May 6.0214 13.73
09-Jun 5.4491 6.086
09-Jul 7.4774 4.059
09-Aug 0.4845 2.885
09-Sep 0.1321 2.208
09-Oct 0.0935 1.755
09-Nov 0.1702 1.367
09-Dec 0.0786 1.183
10-Jan 0.049 1.461
10-Feb 0.0502 0.8349
10-Mar 9.9809 7.328
10-Apr 2.1785 5.341
10-May 5.54 9.544
10-Jun 6.5798 10.35
10-Jul 1.4304 5.972
10-Aug 0.3424 3.768
10-Sep 8.7223 3.844
10-Oct 5.7656 4.88
10-Nov 3.7897 4.978
10-Dec 0.5271 3.289
I tried the following code to display the data:
require(xts)
data <- read.csv('./flowout13.csv')
dd1<-data.frame(data[2:3])
dd1<-ts(dd1,frequency = 12,start = 1986)
plot(as.xts(dd1),major.format="%y-%m")
title(main="Calibrated observed and simulated discharge",xlab="Time",ylab="discharge in mm")
legend("topleft", inset=0.10, title="Discharge",
c("observed","simulated","r2=0.8", "NSE=0.60"), fill=terrain.colors(2), horiz=FALSE)
The graph does not show the colors I intended. I want the observed series as black lines and the simulated series as red, but it shows something different. I do not want r2 and NSE to have any color; they are just values I added from separate calculations. I also want to move the xlab below the dates. Please help out. I am working in RStudio.
Is this what you're looking for?
plot(as.xts(dd1), major.format="%y-%m", col = terrain.colors(2))
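If you specifically want black for observed and red for simulated, with matching legend entries, something along these lines should be closer to what you describe (a sketch; on newer versions of xts, legends on xts plots are added with addLegend() rather than legend()):
plot(as.xts(dd1), major.format = "%y-%m", col = c("black", "red"),
     main = "Calibrated observed and simulated discharge")
addLegend("topleft", legend.names = c("observed", "simulated", "r2=0.8", "NSE=0.60"),
          col = c("black", "red", NA, NA), lty = c(1, 1, NA, NA))
The r2 and NSE entries get NA for col and lty so they appear as plain text without a colored key.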

Error: faceting variables must have at least one value

I have the following dataset:
Col1 Col2 Col3 Col4 Col5 Col6
4439.5 6.5211 50.0182 29.4709 -0.0207 0.0888
4453 25.1186 46.5586 34.1279 -0.0529 0.082
4453.5 24.2974 46.6291 30.6281 -0.057 0.0809
4457.5 25.3257 49.6885 26.2664 -0.0357 0.0837
4465 7.1077 53.516 32.5077 -0.0398 0.1099
4465.5 7.5892 53.0884 33.1582 -0.0395 0.1128
4898.5 8.8296 55.0611 40.3813 -0.0123 0.1389
4899 9.2469 54.4799 37.1927 -0.0061 0.1354
4900 13.4119 50.8334 28.9441 -0.0272 0.1071
4900.5 21.8415 50.1127 24.2351 -0.0375 0.0882
4905 11.3824 52.4024 37.2646 -0.0324 0.1215
4918.5 6.2601 49.9454 27.715 0.0101 0.1444
4919 7.4157 49.7412 25.6159 -0.0164 0.1038
4932 25.737 46.2825 38.6334 -0.0425 0.0717
5008.5 13.641 49.7868 18.0337 -0.0213 0.111
5010.5 13.5935 49.5352 23.9319 -0.0518 0.0979
5012 16.6945 48.0672 25.2408 -0.0446 0.0985
5014.5 14.1303 49.6361 23.1816 -0.0455 0.1056
5040 7.6895 49.8688 31.562 -0.0138 0.126
5044 12.594 60.822 52.4569 0.0481 0.1877
5045.5 10.3719 56.443 43.3782 0.0076 0.1403
5046 8.1382 54.5388 46.2675 0.01 0.1443
5051.5 29.0142 46.8052 43.3224 -0.0465 0.0917
5052 32.3053 46.4278 32.9387 -0.0509 0.0868
5052.5 38.4807 45.3555 24.4187 -0.0619 0.0774
5053 38.8954 43.8459 21.8487 -0.0688 0.0681
5055 19.69 50.9335 46.9419 -0.0527 0.0897
5055.5 11.7398 51.8329 59.5443 -0.0307 0.1083
5056 13.3196 51.8329 55.4419 -0.0276 0.1262
5056.5 18.3702 51.7003 39.232 -0.0408 0.1105
5057.5 14.0531 50.1129 24.4546 -0.0444 0.0921
5058 15.292 49.8805 23.0938 -0.0347 0.0925
5059 20.5135 49.52 21.6173 -0.0333 0.1006
5060 14.5151 47.5836 27.0685 -0.0156 0.1062
5060.5 14.5188 48.2506 27.9704 -0.0363 0.1018
5228 1.2168 54.2009 17.4351 0.0583 0.1794
5229 3.5896 51.7649 26.1107 -0.0033 0.1362
5232.5 2.7404 53.5941 38.6852 0.0646 0.194
5233 3.6694 53.9483 36.674 0.0633 0.204
5234 1.3789 53.8741 18.5804 0.0693 0.1958
5234.5 0.8592 53.6052 18.1654 0.0742 0.1982
5237 2.6951 52.3763 24.8098 0.0549 0.1923
I am trying to create an R visual that will break out each Column into facets, using Col1 as the identity column.
To do this I am using this (faulty) code:
library(reshape2)
library(plotly)
plot.data <- dataset
melted <- melt(dataset, id.vars="Col1")
sp <- ggplot(melted, aes(x=Col1, y=value)) + geom_line()
# Divide by variable in the vertical direction
sp + facet_grid(variable~.)
ggplotly()
However, I am receiving an error saying:
Faceting variables must have at least one value
I know this is an unlikely solution, but did you make sure all your filters are correct and are not filtering out values somehow? I find that filters are often a source of mistakes for me, so if the code works in plain R, that could be the problem.
I had the same error and it was my filtering. For example, I did
data <- data[data$symbol == geneId, ]
instead of
data <- data[data$symbol %in% geneId, ]
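The difference matters as soon as geneId contains more than one value: == compares element-wise with recycling, which can silently drop matching rows, while %in% tests membership. A toy illustration (hypothetical values):
symbol <- c("A", "B", "C", "A")
geneId <- c("A", "B")
symbol == geneId    # TRUE TRUE FALSE FALSE -- recycling misses the last "A"
symbol %in% geneId  # TRUE TRUE FALSE TRUE  -- membership test, as intended
If such a comparison filters out every row, the faceting variable ends up with no values, which triggers exactly this error.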

R fitting and forecasting daily time series

I am working with a daily time series and I need to build a forecast for 90 days (or maybe more) based on my history. The current time series has roughly 298 data points.
The issue I have is the famous flat line in the final forecast. Yes, I may not have seasonality, but I am trying to work this out. Another issue is how to find the best model and adapt it from here on for this kind of behaviour.
I created a test case to investigate this further and any help is appreciated.
Thanks,
To start with
x <- day_data # My time series
z <- 90 # Days to forecast
low_bound_date <- as.POSIXlt(min(x$time), format = "%m/%d/%Y") # oldest date in the DF.
> low_bound_date
[1] "2015-12-21 PST"
> low_bound_date$yday # Day in Julian
[1] 354
lbyear <- as.numeric(substr(low_bound_date, 1, 4))
> lbyear
[1] 2015
This is the content of my time series:
> ts
Time Series:
Start = c(2065, 4)
End = c(2107, 7)
Frequency = 7
[2] 20.73 26.19 27.51 26.11 26.28 27.58 26.84 27.00 26.30 28.75 28.43 39.03 41.36 45.42 44.80 45.33 47.79 44.70 45.17
[20] 34.90 32.54 32.75 33.35 34.76 34.11 33.59 33.60 38.08 30.45 29.66 31.09 31.36 31.96 29.30 30.04 30.85 31.13 25.09
[39] 17.88 23.73 25.31 31.30 35.18 34.13 34.96 35.12 27.36 38.33 38.59 38.14 38.54 41.72 37.15 35.92 37.37 32.39 30.64
[58] 30.57 30.66 31.16 31.50 30.68 32.21 32.27 32.55 33.61 34.80 33.53 33.09 20.90 6.91 7.82 15.78 7.25 6.19 6.38
[77] 38.06 39.82 35.53 38.63 41.91 39.76 37.26 38.79 37.74 35.61 39.70 35.79 35.36 29.63 22.07 35.39 35.99 37.35 38.82
[96] 25.80 21.31 18.85 9.52 20.75 36.83 44.12 37.79 34.45 36.05 16.39 21.84 31.39 34.26 31.50 30.87 28.88 42.83 41.52
[115] 42.34 47.35 44.47 44.10 44.49 26.89 18.17 40.44 43.93 41.56 39.98 40.31 40.59 40.17 40.22 40.50 32.68 35.89 36.06
[134] 34.30 22.67 12.56 13.29 12.34 28.00 35.27 36.57 33.78 32.15 33.58 34.62 30.96 32.06 33.05 30.66 32.47 30.42 32.83
[153] 31.74 29.39 22.39 12.58 16.46 5.36 4.01 15.32 32.79 31.66 32.02 27.60 31.47 31.61 34.96 27.77 31.91 33.94 33.43
[172] 26.94 28.38 21.42 24.51 23.82 31.71 26.64 27.96 29.29 29.25 28.70 27.02 27.62 30.90 27.46 27.37 26.46 27.77 13.61
[191] 5.87 12.18 5.68 4.15 4.35 4.42 16.42 25.18 26.06 27.39 27.57 28.86 15.18 5.19 5.61 8.28 7.78 5.13 4.90
[210] 5.02 5.27 16.31 25.01 26.19 25.96 24.93 25.53 25.56 26.39 26.80 26.73 26.00 25.61 25.90 25.89 13.80 6.66 6.41
[229] 5.28 5.64 5.71 5.38 5.76 7.20 7.27 5.55 5.31 5.94 5.75 5.93 5.77 6.57 5.52 5.51 5.47 5.69 19.75
[248] 29.22 30.75 29.63 30.49 29.48 31.83 30.42 29.27 30.40 29.91 32.00 30.09 28.93 14.54 7.75 5.63 17.17 22.27 24.93
[267] 35.94 37.42 33.13 25.88 24.27 37.64 37.42 38.33 35.20 21.32 7.32 4.81 5.17 17.49 23.77 23.36 27.60 26.53 24.99
[286] 24.22 23.76 24.10 24.22 27.06 25.53 23.40 37.07 26.52 25.19 28.02 28.53 26.67
First step: I convert my data to a ts object.
day_data_ts <- ts(x$avg_day, start = c(lbyear,low_bound_date$yday), frequency=7)
plot(day_data_ts)
acf(day_data_ts)
Second step: I convert my data to an msts object with weekly and yearly seasonal periods.
day_data_msts <- msts(x$avg_day, seasonal.periods=c(7,365.25), start = c(lbyear,low_bound_date$yday))
plot(day_data_msts)
acf(day_data_msts)
I did several fitting iterations to try and figure out the best fit and forecast model.
First fitting test is with the ts only.
fit1 <- HoltWinters(day_data_ts)
> fit1
Holt-Winters exponential smoothing with trend and additive seasonal component.
Call: HoltWinters(x = day_data_ts)
Smoothing parameters: alpha: 1 beta : 0.006757112 gamma: 0
Coefficients:
[,1]
a 28.0922449
b 0.1652477
s1 0.6241837
s2 1.9084694
s3 0.9913265
s4 0.8198980
s5 -1.7015306
s6 -1.2201020
s7 -1.4222449
fit2 <- tbats(day_data_ts)
> fit2
BATS(1, {0,0}, 0.8, -)
Parameters: Alpha: 1.309966 Beta: -0.3011143 Damping Parameter: 0.800001
Seed States:
[,1]
[1,] 15.282259
[2,] 2.177787
Sigma: 5.501356 AIC: 2723.911
fit3 <- ets(day_data_ts)
> fit3
ETS(A,N,N)
Smoothing parameters: alpha = 0.9999
Initial states: l = 25.2275
sigma: 5.8506
AIC AICc BIC
2756.597 2756.678 2767.688
fit4 <- auto.arima(day_data_ts)
> fit4
ARIMA(1,1,2)
Coefficients:
ar1 ma1 ma2
0.7396 -0.6897 -0.2769
s.e. 0.0545 0.0690 0.0621
sigma^2 estimated as 30.47: log likelihood=-927.9
AIC=1863.81 AICc=1863.94 BIC=1878.58
Second test is using the msts object. I also changed the ets model to MAN.
fit5 <- tbats(day_data_msts)
> fit5
BATS(1, {0,0}, 0.8, -)
Parameters: Alpha: 1.309966 Beta: -0.3011143 Damping Parameter: 0.800001
Seed States:
[,1]
[1,] 15.282259
[2,] 2.177787
Sigma: 5.501356 AIC: 2723.911
fit6 <- ets(day_data_msts, model="MAN")
> fit6
ETS(M,A,N)
Smoothing parameters: alpha = 0.9999 beta = 9e-04
Initial states: l = 52.8658 b = 3.9184
sigma: 0.3459
AIC AICc BIC
3042.744 3042.949 3061.229
fit7 <- auto.arima(day_data_msts)
> fit7
ARIMA(1,1,2)
Coefficients:
ar1 ma1 ma2
0.7396 -0.6897 -0.2769
s.e. 0.0545 0.0690 0.0621
sigma^2 estimated as 30.47: log likelihood=-927.9
AIC=1863.81 AICc=1863.94 BIC=1878.58
You can forecast from a previously estimated model as follows (using the built-in time series LakeHuron):
library(forecast)
y <- LakeHuron
tsdisplay(y)
# estimate ARMA(1,1)
mod_2 <- Arima(y, order = c(1, 0, 1))
#make forecast for 5 periods (years in this case)
fHuron <- forecast(mod_2, h = 5)
#show results in table
fHuron
#plot results
plot(fHuron)
This will give you a table of point forecasts with prediction intervals, plus the forecast plot.
Note that an ARIMA model bases its forecast on previous values, so when predicting many periods ahead it feeds already-predicted values back in to predict the next ones, which reduces accuracy.
To find the ARIMA order with the best AIC, you can use a function like this:
library(R.utils) # for the function 'withTimeout'
fitARIMA <- function(timeseriesObject, timeout)
{
  final.aic <- Inf
  final.order <- c(0, 0, 0)
  # grid-search ARMA(p, q) orders up to (5, 5), skipping (0, 0)
  for (p in 0:5) for (q in 0:5) {
    if (p == 0 && q == 0) {
      next
    }
    # discard fits that error, warn, or exceed the time limit
    arimaFit <- tryCatch(
      withTimeout(arima(timeseriesObject, order = c(p, 0, q)),
                  timeout = timeout),
      error = function(err) FALSE,
      warning = function(err) FALSE)
    if (!is.logical(arimaFit)) {
      current.aic <- AIC(arimaFit)
      if (current.aic < final.aic) {
        final.aic <- current.aic
        final.order <- c(p, 0, q)
      }
    }
  }
  # return the best (p, d, q) followed by its AIC
  c(final.order, final.aic)
}
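A minimal usage sketch (assuming the forecast and R.utils packages are loaded and day_data_ts is the series from above; the 10-second timeout is an arbitrary choice):
best <- fitARIMA(day_data_ts, timeout = 10)  # c(p, 0, q, AIC) of the best fit
fit <- arima(day_data_ts, order = best[1:3])
plot(forecast(fit, h = 90))                  # forecast the next 90 days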

Read the data from a text file and reshape the data in R

I have a data set for different time intervals. There are three comment lines before the data for each time interval, and each time interval has 500 data points. I want to reshape the dataset into the following format:
t1 t2 t3 ................
0.00208 0.00417 0.00625 .................
a1 a2 a3 ...................
b1 b2 b3 ...................
c1 c2 c3 .................
...............................
................................
The link to the file is as follows: https://www.dropbox.com/s/hc8n3qcai1mlxca/WAT_DEP.DAT
As you will see in the file, the time for each interval is the second value on the third line before the data starts; for the first interval, t = 0.00208. I need to turn the data from several rows into one column, and finally build a data frame in the format shown above. In the sample above, a1, b1, c1 are the data for time t1, and so on.
I am sorry for posting a relatively large data set.
Thank you for the help.
Sample data added
The sample data is as follows:
** N:SNAPSHOT TIME DELT[S]
** WATER DEPTH [M]: (HP(L),L=2,LA)
1800 0.00208 0.10000
3.224 3.221 3.220 3.217 3.216 3.214 3.212 3.210 3.209 3.207
3.205 3.203 3.202 3.200 3.199 3.197 3.196 3.193 3.192 3.190
3.189 3.187 3.186 3.184 3.184 3.182 3.181 3.179 3.178 3.176
3.175 3.174 3.173 3.171 3.170 3.169 3.168 3.167 3.166 3.164
3.164 3.162 3.162 3.160 3.160 3.158 3.158 3.156 3.156 3.155
3.154 3.153 3.152 3.151 3.150 3.150 3.149 3.149 3.147 3.147
3.146 3.146 3.145 3.145 3.144 3.144 3.143 3.143 3.142 3.142
3.141 3.142 3.141 3.141 3.140 3.141 3.140 3.140 3.139 3.140
3.139 3.140 3.139 3.140 3.139 3.140 3.139 3.140 3.139 3.140
3.139 3.140 3.140 3.140 3.140 3.141 3.141 3.142 3.141 3.142
3.142 3.142 3.143 3.143 3.144 3.144 3.145 3.145 3.146 3.146
3.147 3.148 3.149 3.149 3.150 3.150 3.152 3.152 3.153 3.154
3.155 3.156 3.157 3.158 3.159 3.160 3.161 3.162 3.163 3.164
3.165 3.166 3.168 3.169 3.170 3.171 3.173 3.174 3.176 3.176
3.178 3.179 3.181 3.182 3.184 3.185 3.187 3.188 3.190 3.191
3.194 3.195 3.196 3.198 3.199 3.202 3.203 3.205 3.207 3.209
3.210 3.213 3.214 3.217 3.218 3.221 3.222 3.225 3.226 3.229
3.231 3.233 3.235 3.238 3.239 3.242 3.244 3.247 3.248 3.251
3.253 3.256 3.258 3.261 3.263 3.266 3.268 3.271 3.273 3.276
3.278 3.281 3.283 3.286 3.289 3.292 3.294 3.297 3.299 3.303
3.305 3.307 3.311 3.313 3.317 3.319 3.322 3.325 3.328 3.331
3.334 3.337 3.340 3.343 3.347 3.349 3.353 3.356 3.359 3.362
3.366 3.369 3.372 3.375 3.379 3.382 3.386 3.388 3.392 3.395
3.399 3.402 3.406 3.409 3.413 3.416 3.420 3.423 3.427 3.430
3.435 3.438 3.442 3.445 3.449 3.453 3.457 3.460 3.464 3.468
3.472 3.475 3.479 3.483 3.486 3.491 3.494 3.498 3.502 3.506
3.510 3.514 3.518 3.522 3.526 3.531 3.534 3.539 3.542 3.547
3.551 3.555 3.559 3.564 3.567 3.572 3.576 3.581 3.584 3.589
3.593 3.598 3.602 3.606 3.610 3.615 3.619 3.624 3.628 3.633
3.637 3.642 3.646 3.651 3.655 3.660 3.664 3.669 3.673 3.678
3.682 3.686 3.691 3.695 3.700 3.704 3.710 3.714 3.719 3.723
3.728 3.733 3.738 3.742 3.747 3.752 3.757 3.761 3.766 3.771
3.776 3.780 3.786 3.790 3.795 3.800 3.805 3.810 3.815 3.819
3.825 3.829 3.835 3.839 3.845 3.849 3.855 3.859 3.865 3.869
3.875 3.879 3.885 3.889 3.895 3.900 3.905 3.910 3.915 3.920
3.926 3.930 3.935 3.941 3.945 3.951 3.956 3.961 3.966 3.972
3.976 3.982 3.987 3.993 3.997 4.003 4.008 4.014 4.018 4.024
4.029 4.035 4.039 4.045 4.050 4.056 4.061 4.066 4.071 4.077
4.082 4.088 4.093 4.099 4.103 4.109 4.114 4.120 4.125 4.131
4.136 4.142 4.147 4.153 4.157 4.163 4.168 4.174 4.179 4.185
4.190 4.195 4.201 4.206 4.212 4.217 4.223 4.228 4.234 4.239
4.245 4.250 4.256 4.261 4.267 4.272 4.278 4.283 4.289 4.294
4.300 4.305 4.311 4.316 4.322 4.327 4.333 4.339 4.345 4.350
4.356 4.361 4.367 4.372 4.378 4.383 4.389 4.394 4.400 4.405
4.411 4.417 4.423 4.428 4.434 4.439 4.445 4.450 4.456 4.461
4.467 4.473 4.478 4.484 4.489 4.495 4.500 4.506 4.511 4.517
4.523 4.529 4.534 4.540 4.545 4.551 4.556 4.562 4.568 4.574
4.579 4.585 4.590 4.596 4.601 4.607 4.613 4.619 4.624 4.630
4.635 4.641 4.646 4.652 4.658 4.664 4.669 4.675 4.680 4.686
4.691 4.697 4.703 4.709 4.714 4.720 4.725 4.731 4.736 4.741
** N:SNAPSHOT TIME DELT[S]
** WATER DEPTH [M]: (HP(L),L=2,LA)
3600 0.00417 0.10000
4.124 4.123 4.123 4.122 4.122 4.121 4.121 4.120 4.120 4.119
4.118 4.117 4.117 4.116 4.116 4.115 4.115 4.114 4.114 4.114
4.114 4.113 4.113 4.112 4.112 4.111 4.111 4.110 4.110 4.109
4.109 4.109 4.109 4.108 4.108 4.107 4.107 4.106 4.107 4.106
4.106 4.105 4.105 4.105 4.105 4.104 4.104 4.104 4.104 4.103
4.103 4.103 4.102 4.102 4.102 4.102 4.101 4.102 4.101 4.101
4.101 4.101 4.100 4.101 4.100 4.101 4.100 4.100 4.100 4.100
4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100
4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.101
4.100 4.101 4.100 4.101 4.101 4.101 4.101 4.102 4.101 4.102
4.102 4.101 4.102 4.102 4.103 4.102 4.103 4.103 4.104 4.103
4.104 4.104 4.105 4.104 4.105 4.105 4.106 4.106 4.107 4.106
4.107 4.107 4.108 4.108 4.109 4.109 4.110 4.110 4.110 4.110
4.111 4.111 4.112 4.112 4.113 4.113 4.114 4.114 4.115 4.115
4.116 4.116 4.117 4.117 4.118 4.118 4.120 4.120 4.121 4.121
4.122 4.122 4.122 4.123 4.123 4.125 4.125 4.126 4.126 4.127
4.128 4.129 4.129 4.130 4.130 4.132 4.132 4.133 4.133 4.135
4.135 4.136 4.137 4.138 4.138 4.139 4.140 4.141 4.141 4.143
4.143 4.145 4.145 4.146 4.147 4.148 4.149 4.150 4.150 4.152
4.152 4.154 4.154 4.156 4.156 4.158 4.158 4.160 4.160 4.162
4.162 4.163 4.164 4.165 4.166 4.167 4.168 4.169 4.171 4.171
4.173 4.173 4.175 4.176 4.177 4.178 4.180 4.180 4.182 4.183
4.184 4.185 4.187 4.187 4.189 4.190 4.192 4.192 4.194 4.195
4.197 4.197 4.199 4.200 4.202 4.203 4.204 4.205 4.207 4.208
4.210 4.210 4.212 4.213 4.215 4.216 4.218 4.219 4.221 4.221
4.223 4.224 4.225 4.227 4.228 4.230 4.231 4.233 4.234 4.236
4.237 4.239 4.240 4.242 4.243 4.245 4.246 4.248 4.249 4.251
4.252 4.254 4.255 4.257 4.258 4.260 4.262 4.264 4.265 4.267
4.268 4.270 4.271 4.273 4.275 4.277 4.278 4.280 4.281 4.283
4.285 4.287 4.288 4.290 4.291 4.294 4.295 4.297 4.298 4.301
4.302 4.303 4.305 4.307 4.309 4.310 4.312 4.314 4.316 4.317
4.320 4.321 4.323 4.325 4.327 4.328 4.331 4.332 4.334 4.336
4.338 4.339 4.342 4.343 4.346 4.347 4.349 4.351 4.353 4.355
4.357 4.359 4.361 4.362 4.365 4.366 4.369 4.370 4.373 4.374
4.377 4.378 4.381 4.382 4.385 4.386 4.389 4.390 4.393 4.394
4.397 4.398 4.400 4.402 4.404 4.406 4.408 4.411 4.412 4.415
4.416 4.419 4.421 4.423 4.425 4.427 4.429 4.432 4.433 4.436
4.437 4.440 4.442 4.444 4.446 4.449 4.450 4.453 4.455 4.457
4.459 4.462 4.463 4.466 4.468 4.470 4.472 4.475 4.476 4.479
4.481 4.484 4.485 4.488 4.490 4.492 4.494 4.497 4.499 4.501
4.503 4.505 4.508 4.509 4.512 4.514 4.517 4.519 4.521 4.523
4.526 4.528 4.530 4.532 4.535 4.537 4.540 4.541 4.544 4.546
4.549 4.551 4.554 4.555 4.558 4.560 4.563 4.565 4.568 4.569
4.572 4.574 4.577 4.579 4.582 4.584 4.586 4.588 4.591 4.593
4.596 4.598 4.601 4.603 4.605 4.607 4.610 4.612 4.615 4.617
4.620 4.622 4.624 4.627 4.628 4.631 4.633 4.636 4.638 4.641
4.643 4.646 4.648 4.651 4.653 4.656 4.657 4.660 4.662 4.665
4.667 4.670 4.672 4.675 4.677 4.680 4.682 4.685 4.687 4.690
4.692 4.695 4.697 4.700 4.702 4.705 4.706 4.709 4.711 4.714
4.716 4.719 4.721 4.724 4.726 4.729 4.731 4.734 4.736 4.741
Currently, I have 10 columns of data for each time. I want to turn these into a single column of 500 data points, arranged so that the values on row 1 come first, then row 2, and so on. This way there is one column per time.
This produces a matrix, result, containing the times in the first row and the data in columns underneath the corresponding time.
infile <- "WAT_DEP.DAT" # path to the downloaded data file
L <- readLines(infile)
nt <- length(grep("TIME", L)) # no. of TIME lines
nd <- round((length(L) / nt) - 3) # no. of data lines per time
# times: pick the 3rd line of each block, then its 2nd field
ix.times <- rep(c(FALSE, TRUE, FALSE), c(2, 1, nd))
times <- scan(text = L[ix.times])[c(FALSE, TRUE, FALSE)]
# data: skip the 3 header lines of each block
ix.dat <- rep(c(FALSE, TRUE), c(3, nd))
dat <- matrix(scan(text = L[ix.dat]), ncol = nt)
result <- rbind(times, dat)
The first few rows are:
> head(result)
[,1] [,2]
times 0.00208 0.00417
3.22400 4.12400
3.22100 4.12300
3.22000 4.12300
3.21700 4.12200
3.21600 4.12200
For the first part of your question: one idea for removing the comment lines is to use recycling. First, read all the data using fill = TRUE:
dat <- read.table(file = file.Name, fill = TRUE)
Then, since each block has a fixed number of rows (3 header lines followed by 50 data lines of 10 values each), you can do this:
dat <- dat[c(rep(FALSE, 3), rep(TRUE, 50)), ]
You will get a clean data.frame.
I don't understand the second part of your question.
Second part solution:
First, call the sample data sample. I assume two columns in the solution below; you can use lapply to apply this to the other columns.
col.1 <- as.data.frame(sample[, 1])
col.2 <- as.data.frame(sample[, 2])
Now col.1 and col.2 are data frames. Make sure they have the same colnames for rbind to work.
sample.1 <- rbind(col.1, col.2)
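Note that stacking columns this way reads the block column by column, but the file fills row-first (the first 10 values belong to row 1). If you want the 500 values in row-major order, a one-line sketch (assuming sample holds one 50 x 10 block) is:
one.col <- as.vector(t(as.matrix(sample)))  # transpose so values are read row by row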

Practical questions about the vrtest package

I want to perform variance ratio tests (Lo-MacKinlay, Chow-Denning) but I am having trouble running the commands.
I have a price index for 1957 to 2007. Do I need to perform the variance ratio tests on the level series or on the series of returns?
How do you set kvec? It is a vector with the lags for which you want to run the test, right?
So here is my output:
> rcorr
[1] 0.0000 -0.1077 0.4103 -0.0347 0.1136 0.0286 0.0104 0.0104 0.1915
[10] -0.0025 0.0665 0.2127 0.0116 -0.1288 0.1640 0.3089 0.2098 -0.1071
[19] -0.2079 -0.1082 0.0022 0.1419 0.0641 -0.0082 -0.1163 -0.1731 0.0260
[28] 0.0468 0.0882 0.2640 0.3946 0.2094 0.2754 0.0623 -0.3696 -0.1095
[37] -0.1463 0.0118 0.0152 -0.0103 0.0223 0.0379 0.0580 -0.0091 -0.0510
[46] 0.0765 0.0984 0.1250 0.0519 0.1623 0.2552
> kvec<--c(2,5,10)
> Lo.Mac(rcorr,kvec)
Error in y[index] : only 0's may be mixed with negative subscripts
Why do I get this error?
It is the same error as in your other question I just answered:
kvec<--c(2,5,10)
is the same as
kvec <- -c(2,5,10)
i.e.
kvec <- -1 * c(2,5,10)
Remove the second dash.
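With the stray dash removed, the call should run. A sketch, assuming rcorr is the series you want to test and the vrtest package is installed:
library(vrtest)
kvec <- c(2, 5, 10)   # holding periods to test
Lo.Mac(rcorr, kvec)   # Lo-MacKinlay variance ratio statistics for each k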
