Since the data are rainfall, I want to replace the negative values in both the point forecasts and the intervals with 0. How can this be done in R? I am looking for R code that makes the required changes.
The forecast values obtained in R using an ARIMA model are given below:
> Predictions
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 2021 -1.6625108 -165.62072 162.2957 -252.41495 249.0899
Feb 2021 0.8439712 -165.57869 167.2666 -253.67752 255.3655
Mar 2021 35.9618300 -130.53491 202.4586 -218.67297 290.5966
Apr 2021 53.4407679 -113.05822 219.9398 -201.19746 308.0790
May 2021 206.7464927 40.24744 373.2455 -47.89184 461.3848
Jun 2021 436.2547446 269.75569 602.7538 181.61641 690.8931
Jul 2021 408.2814434 241.78239 574.7805 153.64311 662.9198
Aug 2021 431.7649076 265.26585 598.2640 177.12657 686.4032
Sep 2021 243.5520546 77.05300 410.0511 -11.08628 498.1904
Oct 2021 117.4581047 -49.04095 283.9572 -137.18023 372.0964
Nov 2021 25.0773401 -141.42171 191.5764 -229.56098 279.7157
Dec 2021 28.9468415 -137.55188 195.4456 -225.69098 283.5847
Jan 2022 -0.4912674 -171.51955 170.5370 -262.05645 261.0739
Feb 2022 2.2963271 -168.86759 173.4602 -259.47630 264.0690
Mar 2022 43.3561613 -127.81187 214.5242 -218.42275 305.1351
Apr 2022 48.6538398 -122.51431 219.8220 -213.12526 310.4329
May 2022 228.4762035 57.30805 399.6444 -33.30290 490.2553
Jun 2022 445.3540781 274.18592 616.5222 183.57497 707.1332
Jul 2022 441.8287867 270.66063 612.9969 180.04968 703.6079
Aug 2022 592.5766086 421.40845 763.7448 330.79751 854.3557
Sep 2022 220.6996396 49.53148 391.8678 -41.07946 482.4787
Oct 2022 158.7952154 -12.37294 329.9634 -102.98389 420.5743
Nov 2022 29.9052184 -141.26288 201.0733 -231.87380 291.6842
Dec 2022 25.9432583 -145.22303 197.1095 -235.83298 287.7195
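If Predictions is a numeric matrix or data frame (a forecast object can be converted first with as.data.frame(Predictions)), try:
Predictions[Predictions < 0] <- 0
This replaces every value below 0 with 0 in a single vectorized step. Vectorized operations like this are generally faster and more concise in R than an explicit for loop, so loops are discouraged wherever vectorization can be applied.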
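As a rough illustration of the replacement above, here is a minimal sketch that rebuilds the first three rows of the question's output as a plain data frame. The name preds is made up for the example; with a real forecast object you would presumably start from as.data.frame(Predictions) instead of typing the values in.

```r
# First three forecast rows from the question, rebuilt as a plain data frame.
# (Assumption: with a real forecast object, use preds <- as.data.frame(Predictions).)
preds <- data.frame(
  `Point Forecast` = c(-1.6625108, 0.8439712, 35.9618300),
  `Lo 80` = c(-165.62072, -165.57869, -130.53491),
  `Hi 80` = c(162.2957, 167.2666, 202.4586),
  `Lo 95` = c(-252.41495, -253.67752, -218.67297),
  `Hi 95` = c(249.0899, 255.3655, 290.5966),
  row.names = c("Jan 2021", "Feb 2021", "Mar 2021"),
  check.names = FALSE
)

# Vectorized replacement: every negative cell becomes 0, no loop needed
preds[preds < 0] <- 0

print(preds)
```

Positive values are left untouched, so only the negative point forecasts and lower bounds change.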
I have this data containing the animal ID, and measures of 15N and 13C for different years (2012-2017).
I need to obtain the mean and ± SE for 15N and 13C, then plot the yearly variation of each using these means.
In other words, I need to find the mean 15N and the mean 13C (with the standard error) for each year, then plot this, all using R.
ID Year N C
BVC002 2012 11,03 -16,3
BVC003 2012 12,6 -17,34
BVC004 2012 14,5 -11,3
BVC005 2012 14,08 -9,52
BVC00 2012 11,86 -15,34
BVC008 2012 11,5 -16,7
BVC009 2012 15,1 -15,7
BVC010 2012 13,25 -15,08
BVC011 2012 10,3 -14,6
BVC012 2012 17,8 -13,5
BVC014 2012 12,3 -11,9
BVC015 2012 10,83 -17,59
BVC117 2012 13,7 -9,6
BVC122 2012 13,3 -8,9
BVC127 2012 10,2 -17,5
BVG640 2012 12,6 -17,5
BVG642 2013 10,91 -16,52
BVG653 2013 12,03 -15,4
BVH013 2013 12,52 -15,9
BVH014 2013 12,17 -14,16
BVH015 2013 14,34 -15,31
BVH017 2013 12,06 -16,98
BVH041 2013 13,91 -14,6
BVH042 2013 11,56 -18
BVH044 2013 11,49 -16,61
BVH045 2013 14,29 -12,9
BVH046 2013 10,11 -16,5
BVH050 2013 10,35 -17,5
BVH051 2013 11,98 -12,13
BVH052 2013 12,77 -17,32
BVH053 2013 10,92 -15,31
BVH054 2013 12,85 -16,85
BVH055 2013 9,46 -15,47
BVH056 2013 11,54 -18,54
BVH058 2013 12,27 -16,84
BVH074 2013 13,6 -15,74
BVH101 2013 10,32 -17,93
BVH105 2013 13,05 -17,14
BVH107 2013 12,48 -17,19
BVH115 2013 12,61 -15,33
BVH117 2013 11,56 -14,97
BVH121 2013 15,1 -14,9
BVH122 2013 12,39 -16,26
BVG162 2013 11,6 -16,23
BVG165 2014 11,81 -15,68
BVG172 2014 11,51 -12,12
BVG173 2014 14,26 -16,48
BVG174 2014 14,01 -15,62
BVG175 2014 9,19 -17,26
BVG176 2014 13,86 -15,9
BVG180 2014 11,77 -16,6
BVG348 2014 11,65 -18,1
BVG353 2014 13,17 -15,05
BVG354 2014 10,75 -12,44
BVE191 2014 13,95 -16,69
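One possible approach, sketched in base R on a handful of rows copied from the table above. The names iso, se and yearly are made up for the example, and the decimal commas are assumed to be handled by dec = "," when reading the data. The plot covers 15N only; the same calls with C_mean and C_se would give 13C.

```r
# A few rows copied from the data above; dec = "," converts the decimal commas
iso <- read.table(text = "
ID Year N C
BVC002 2012 11,03 -16,3
BVC003 2012 12,6 -17,34
BVC004 2012 14,5 -11,3
BVG642 2013 10,91 -16,52
BVG653 2013 12,03 -15,4
BVH013 2013 12,52 -15,9
BVG165 2014 11,81 -15,68
BVG172 2014 11,51 -12,12
BVG173 2014 14,26 -16,48
", header = TRUE, dec = ",")

# Standard error of the mean: sd / sqrt(n), ignoring missing values
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# One row per year with the mean and SE of each isotope
means <- aggregate(cbind(N, C) ~ Year, data = iso, FUN = mean)
ses   <- aggregate(cbind(N, C) ~ Year, data = iso, FUN = se)
yearly <- data.frame(Year = means$Year,
                     N_mean = means$N, N_se = ses$N,
                     C_mean = means$C, C_se = ses$C)
print(yearly)

# Yearly mean 15N with +/- SE error bars
plot(yearly$Year, yearly$N_mean, type = "b", xaxt = "n",
     ylim = range(yearly$N_mean - yearly$N_se, yearly$N_mean + yearly$N_se),
     xlab = "Year", ylab = "Mean 15N")
axis(1, at = yearly$Year)
arrows(yearly$Year, yearly$N_mean - yearly$N_se,
       yearly$Year, yearly$N_mean + yearly$N_se,
       angle = 90, code = 3, length = 0.05)
```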
I am using the sentiment analysis function sentiment_by() from the R package sentimentr (by trinker). I have a dataframe containing the following columns:
review comments
month
year
I ran the sentiment_by function on the dataframe to find the average polarity score by year and month, and I get the following values.
review_year review_month word_count sd ave_sentiment
2015 March 8722 0.381686065 0.163440921
2015 April 7758 0.387046768 0.158812775
2015 May 7333 0.389256472 0.149220636
2015 November 14020 0.394711478 0.14691745
2016 February 7974 0.400406931 0.142345278
2015 September 8238 0.379989344 0.141740366
2015 February 7642 0.361415304 0.141624745
2015 December 24863 0.387409099 0.141606892
2016 March 8229 0.389033232 0.138552943
2016 January 10472 0.388300946 0.134302612
2015 August 7520 0.3640285 0.127980712
2016 May 3432 0.422246851 0.125041218
2015 June 8678 0.356612924 0.119333949
2015 January 9930 0.351126449 0.119225549
2016 April 9344 0.397066458 0.111879315
2015 July 8450 0.349963536 0.108881821
2015 October 7630 0.38017201 0.1044298
Next, I run the sentiment_by function on the dataframe grouped by the comments alone, and then I run the following on the resulting data frame to find the average polarity score by year and month.
sentiment_df[,list(avg=mean(ave_sentiment)),by="month,year"]
I get the following results.
month year avg
January 2015 0.110950199
February 2015 0.126943461
March 2015 0.146546669
April 2015 0.148264268
May 2015 0.143924126
June 2015 0.110691204
July 2015 0.106472437
August 2015 0.118976304
September 2015 0.135362187
October 2015 0.111441484
November 2015 0.137699548
December 2015 0.136786867
January 2016 0.128645808
February 2016 0.129139898
March 2016 0.134595706
April 2016 0.12106743
May 2016 0.142801514
As I understand it, both should return the same results; correct me if I am wrong. The reason I want to use the second approach is that I need the average polarity both by month and year and by month alone, and I don't want to run the method twice because that adds extra processing time. Could someone let me know what I am doing wrong here?
Here is an idea: maybe the first call averages over the individual sentences in each month, while the second averages the per-comment "ave_sentiment" values, which are already averages. An average of averages gives every comment the same weight regardless of how many sentences it contains, so it matches the average of the individual elements only when all comments contribute the same number of sentences.
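A small numeric sketch of that effect, using made-up sentence scores for two hypothetical comments (this is plain arithmetic and does not call sentimentr). Weighting each per-comment average by its number of sentences recovers the overall mean.

```r
# Hypothetical sentence-level scores: comment "a" has 4 sentences, "b" has 1
scores  <- c(0.5, 0.1, 0.1, 0.1, 0.9)
comment <- c("a", "a", "a", "a", "b")

# Average over all individual sentences
overall_mean <- mean(scores)

# Average per comment, then the (unweighted) average of those averages
per_comment   <- tapply(scores, comment, mean)
mean_of_means <- mean(per_comment)

# Weighting each comment's average by its sentence count restores the overall mean
n_sentences   <- tapply(scores, comment, length)
weighted_mean <- weighted.mean(per_comment, n_sentences)

print(c(overall = overall_mean, mean_of_means = mean_of_means,
        weighted = weighted_mean))
```

Here the overall mean is 0.34, while the unweighted mean of the two comment averages (0.2 and 0.9) is 0.55.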
I am following along with this guide to forecast data using an ARIMA model.
My question is: how do I extract the data points from the forecast?
I would like to have those points so I can draw exactly the same graph in Excel. Is this possible?
Thank you.
Suppose you use something like
library(forecast)
m_aa <- auto.arima(AirPassengers)
f_aa <- forecast(m_aa, h=24)
then you can show values for the forecast, for example with
f_aa
which gives
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 1961 446.7582 431.7435 461.7729 423.7953 469.7211
Feb 1961 420.7582 402.5878 438.9286 392.9690 448.5474
Mar 1961 448.7582 427.9043 469.6121 416.8649 480.6515
Apr 1961 490.7582 467.5287 513.9877 455.2318 526.2846
May 1961 501.7582 476.3745 527.1419 462.9372 540.5792
Jun 1961 564.7582 537.3894 592.1270 522.9012 606.6152
Jul 1961 651.7582 622.5388 680.9776 607.0709 696.4455
Aug 1961 635.7582 604.7986 666.7178 588.4096 683.1069
Sep 1961 537.7582 505.1511 570.3653 487.8900 587.6264
Oct 1961 490.7582 456.5830 524.9334 438.4918 543.0246
Nov 1961 419.7582 384.0838 455.4326 365.1989 474.3176
Dec 1961 461.7582 424.6450 498.8714 404.9985 518.5179
Jan 1962 476.5164 431.6293 521.4035 407.8675 545.1653
Feb 1962 450.5164 401.1834 499.8494 375.0681 525.9647
Mar 1962 478.5164 425.1064 531.9265 396.8328 560.2000
Apr 1962 520.5164 463.3192 577.7137 433.0408 607.9920
May 1962 531.5164 470.7676 592.2652 438.6092 624.4237
Jun 1962 594.5164 530.4126 658.6203 496.4780 692.5548
Jul 1962 681.5164 614.2245 748.8083 578.6024 784.4304
Aug 1962 665.5164 595.1809 735.8519 557.9475 773.0853
Sep 1962 567.5164 494.2636 640.7692 455.4859 679.5469
Oct 1962 520.5164 444.4581 596.5747 404.1953 636.8376
Nov 1962 449.5164 370.7525 528.2803 329.0574 569.9754
Dec 1962 491.5164 410.1368 572.8961 367.0570 615.9758
and you can save these values with something like
write.csv(f_aa, file="location_and_filename.csv")
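If you want more control over what ends up in the file, the pieces of the forecast can also be pulled out individually. A minimal sketch, assuming the forecast package is installed; the names f_df, point, lower and upper and the output file name are made up for the example.

```r
library(forecast)

m_aa <- auto.arima(AirPassengers)
f_aa <- forecast(m_aa, h = 24)

# Flatten the forecast into a data frame: one row per month, with columns
# "Point Forecast", "Lo 80", "Hi 80", "Lo 95", "Hi 95"
f_df <- as.data.frame(f_aa)

# The individual components are also stored on the forecast object
point <- f_aa$mean    # point forecasts (time series)
lower <- f_aa$lower   # lower bounds, one column per confidence level
upper <- f_aa$upper   # upper bounds, one column per confidence level

# Store the month labels in a regular column so they appear as data in Excel
f_df$Month <- rownames(f_df)
write.csv(f_df, file = "forecast_values.csv", row.names = FALSE)
```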
I've created a multiple line graph using ggplot2, where each line represents a year plotted against month. Volume is represented on the y-axis.
Here is the code I used to plot the figure above:
ggplot(data=df26, aes(x=Month, y=C1, group=Year, colour=factor(Year))) +
  geom_line(size=.75) + geom_point() +
  scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul",
                            "Aug","Sep","Oct","Nov","Dec")) +
  scale_y_continuous(labels=comma) +
  scale_colour_manual(values=cPalette, name="Year") +
  ylab("Volume")
Question: How do I also include another line to the plot that represents the mean volume within each month with the ability to modify the line thickness and color of that mean line? So far, all of my attempts at producing the right code have been unsuccessful (most likely due to my relative newbie status using R). Any help is much appreciated!
Edit: Dataframe df26 is provided below (as requested by a commenter):
Year Month C1
2010 Jan NA
2010 Feb NA
2010 Mar NA
2010 Apr NA
2010 May NA
2010 Jun NA
2010 Jul NA
2010 Aug 183.6516764
2010 Sep 120.6303348
2010 Oct 85.31007613
2010 Nov 13.7347988
2010 Dec 20.93950545
2011 Jan 13.35780833
2011 Feb 14.16910945
2011 Mar 9.786319721
2011 Apr 41.24848885
2011 May 122.3014387
2011 Jun 422.4012809
2011 Jul 539.8569592
2011 Aug 527.6301222
2011 Sep 385.8199781
2011 Oct 201.7846973
2011 Nov 27.91934061
2011 Dec 7.919004379
2012 Jan 10.22724424
2012 Feb 10.64391791
2012 Mar 88.06585438
2012 Apr 124.0320675
2012 May 325.1399457
2012 Jun 465.938168
2012 Jul 567.2273488
2012 Aug 459.769634
2012 Sep 333.8636373
2012 Oct 102.0607986
2012 Nov 23.18822051
2012 Dec 15.64841121
2013 Jan 7.458238256
2013 Feb 4.34972039
2013 Mar 26.2019396
2013 Apr 38.82781323
2013 May 257.0920645
2013 Jun 357.594195
2013 Jul 383.2780483
2013 Aug 456.469314
2013 Sep 319.3616298
2013 Oct NA
2013 Nov NA
2013 Dec 17.01748185
You need to calculate the means. Then you can plot them.
Using dplyr
library(dplyr)
# Mean of C1 for each month, ignoring missing values
df26means <- df26 %>%
  group_by(Month) %>%
  summarize(C1 = mean(C1, na.rm = TRUE))
Then add it to your plot:
ggplot(data=df26, aes(x=Month, y=C1, group=Year, colour=factor(Year))) +
  geom_line(size=.75) + geom_point() +
  scale_x_discrete(limits=c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) +
  scale_y_continuous(labels=comma) +
  scale_colour_manual(values=cPalette, name="Year") +
  ylab("Volume") +
  geom_line(data = df26means, aes(group = 1), size = 1.25, color = "black")
I'd recommend using annotate to add a nice piece of text on the plot identifying that line as the mean line. To get it in the legend, you'd probably need to set df26means$Year = "Mean", convert df26$Year to a character, rbind the two dataframes together, then convert Year to a factor. The plot code would be simpler, but the data wrangling is a bit more complicated.
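The legend route described above could look roughly like this. It is only a sketch on a small stand-in for df26 (values rounded from the question), with made-up colours, and it assumes ggplot2 3.4.0 or later, where line thickness is set with linewidth (older versions use size, as in the code above).

```r
library(ggplot2)

# Small stand-in for df26 (values rounded from the question's data)
df26 <- data.frame(
  Year  = rep(c(2011, 2012), each = 3),
  Month = rep(c("Jan", "Feb", "Mar"), times = 2),
  C1    = c(13.36, 14.17, 9.79, 10.23, 10.64, 88.07)
)

# Monthly means, labelled "Mean" in the Year column (NA rows are dropped)
df26means <- aggregate(C1 ~ Month, data = df26, FUN = mean)
df26means$Year <- "Mean"

# Stack yearly data and means; Year becomes a factor with "Mean" last
df26$Year <- as.character(df26$Year)
both <- rbind(df26, df26means[, names(df26)])
both$Year  <- factor(both$Year, levels = c(sort(unique(df26$Year)), "Mean"))
both$Month <- factor(both$Month, levels = month.abb)

# Colour and thickness are both keyed on Year, so "Mean" gets a legend entry
p <- ggplot(both, aes(x = Month, y = C1, group = Year,
                      colour = Year, linewidth = Year)) +
  geom_line() +
  geom_point(size = 1.5) +
  scale_colour_manual(name = "Year",
                      values = c("2011" = "steelblue", "2012" = "tomato",
                                 "Mean" = "black")) +
  scale_linewidth_manual(name = "Year",
                         values = c("2011" = 0.75, "2012" = 0.75,
                                    "Mean" = 1.25)) +
  ylab("Volume")
print(p)
```

The thickness and colour of the mean line are then controlled by the "Mean" entries of the two manual scales.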