I have 3 years of daily data in a column and need to write R code to convert the data frame into a time series object, but I am unsure of the coding. I have attached the raw data. I was wondering whether to set the frequency to monthly or leave it daily, or whether to adapt the raw data to make it more user-friendly in R. Any advice/help would be appreciated.
Thanks
Martin.
I couldn't get the code to run. I then changed the frequency to yearly (frequency = 1) and it accepted the data, but that is not giving the full picture.
This is the R code:
`install.packages("readxl")
install.packages("forecast")
install.packages("tseries")
library(readxl)
library(forecast)
library(tseries)
asb <- read_excel("C://Users//BCCAMNHY//OneDrive - Birmingham City Council//HomeFiles//My Documents//DATA ANALYST TRAINING//PROJECT 4//PROJECT DOCUMENTS//ASB_311022.xlsx")
View(asb)
class(asb)
asbtime=ts(asb$`ASB Submitted`, start = min(asb$`Date for R`), end = max(asb$`Date for R`), frequency = 12)
class(asbtime)
library(forecast)
library(tseries)
plot(asbtime)
acf(asbtime)
pacf(asbtime)
adf.test(asbtime)
gdpmodel=auto.arima(gdptime,ic="aic",trace = TRUE) ## don't understand this line of code
acf(ts(asb$residuals)) # not sure if this should be changed to asb$`ASB Submitted`
pacf(ts(asb$residuals)) # as above
myasbforecast=forecast(asbmodel,level = c(95),h=10*4) ##### Don't understand this line of code. I want a monthly or daily forecast - ideally monthly, I think
mygdpforecast
plot(asbforecast)
Box.test(myasbforecast$resid, lag=5, type= "Ljung-Box")
Box.test(mygdpforecast$resid, lag=15, type= "Ljung-Box")
Box.test(myasbforecast$resid, lag=25, type= "Ljung-Box")
An extract of the raw data is:
Submitted     Count of Submitted
01/03/2019    1
02/03/2019    0
03/03/2019    0
04/03/2019    0
05/03/2019    1
06/03/2019    0
07/03/2019    1
08/03/2019    2
09/03/2019    0
10/03/2019    0
11/03/2019    27
12/03/2019    54
13/03/2019    52
14/03/2019    46
15/03/2019    44
In your example, the column names in the data do not match those used in the code. I assume that is just an oversight, but check it anyway.
IMHO, this will be enough for the conversion into a ts:
asbtime=ts(asb$`Count of Submitted`, start=2019, frequency = 365)
plot(forecast(asbtime), xlab = "year", ylab="Submitted")
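If a monthly series is what you are after, one option is to aggregate the daily counts by month before calling ts(). A minimal sketch, assuming the columns are named Submitted (dd/mm/yyyy dates) and Count of Submitted as in the extract above, and that the data starts in March 2019:
asb$month <- format(as.Date(asb$Submitted, format = "%d/%m/%Y"), "%Y-%m")
monthly <- aggregate(asb$`Count of Submitted`, by = list(month = asb$month), FUN = sum)  # total submissions per month
asbtime_m <- ts(monthly$x, start = c(2019, 3), frequency = 12)  # monthly series starting March 2019
plot(forecast(auto.arima(asbtime_m)), xlab = "year", ylab = "Submitted")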
I would like to visualize the number of people infected with COVID-19, but I cannot compute the mortality rate per 100,000 population for each prefecture because the deaths column is not being treated as numeric.
What I want to achieve
I want to compute "covid19j_20200613$deaths / covid19j_20200613$POP2019 * 100" with the data type of "covid19j_20200613$deaths" set to num.
Error message:
Error in covid19j_20200613$deaths/covid19j_20200613$POP2019 :
  non-numeric argument to binary operator
Source code in question.
library(spdep)
library(sf)
library(spatstat)
library(tidyverse)
library(ggplot2)
needs::prioritize(magrittr)
covid19j <- read.csv("https://raw.githubusercontent.com/kaz-ogiwara/covid19/master/data/prefectures.csv",
header=TRUE)
# Below is an example for May 20, 2020.
# Month and date may be changed
covid19j_20200613 <- dplyr::filter(covid19j,
                                   year == 2020,
                                   month == 6,
                                   date == 13)
covid19j_20200613$CODE <- 1:47
covid19j_20200613[is.na(covid19j_20200613)] <- 0
pop19 <- read.csv("/Users/carlobroschi_imac/Documents/lectures/EGDS/07/covid19_data/covid19_data/pop2019.csv", header=TRUE)
covid19j_20200613 <- dplyr::inner_join(covid19j_20200613, pop19,
by = c("CODE" = "CODE"))
# Load Japan prefecture administrative boundary data
jpn_pref <- sf::st_read("/Users/carlobroschi_imac/Documents/lectures/EGDS/07/covid19_data/covid19_data/jpn_pref.shp")
# Data and concatenation
jpn_pref_cov19 <- dplyr::inner_join(jpn_pref, covid19j_20200613, by=c("PREF_CODE"="CODE"))
ggplot2::ggplot(data = jpn_pref_cov19) +
  geom_sf(aes(fill = testedPositive)) +
  scale_fill_distiller(palette = "RdYlGn") +
  theme_bw() +
  labs(title = "Tested Positive of Covid19 (2020/06/13)")
# Mortality rate per 100,000 population
# Population number in units of 1000
as.numeric(covid19j_20200613$deaths)
covid19j_20200613$deaths_rate <- covid19j_20200613$deaths / covid19j_20200613$POP2019 * 100
Data files in question:
prefectures.csv
https://docs.google.com/spreadsheets/d/11C2vVo-jdRJoFEP4vAGxgy_AEq7pUrlre-i-zQVYDd4/edit?usp=sharing
pop2019.csv
https://docs.google.com/spreadsheets/d/1CbEX7BADutUPUQijM0wuKUZFq2UUt-jlWVQ1ipzs348/edit?usp=sharing
What I tried
I tried putting as.numeric(covid19j_20200613$deaths) before the calculation to set the deaths column to type num, but I got the same error message during the calculation.
Additional information (FW/tool versions, etc.)
iMac M1 2021, R 4.2.0
as.numeric() does not permanently change the data type - it only does so temporarily.
So when you run as.numeric(covid19j_20200613$deaths), R shows you the deaths column as numeric, but the column itself stays character.
So if you want to coerce the data type, you need to also reassign:
covid19j_20200613$deaths <- as.numeric(covid19j_20200613$deaths)
covid19j_20200613$POP2019 <- as.numeric(covid19j_20200613$POP2019)
# Now you can do calculations
covid19j_20200613$deaths_rate <- covid19j_20200613$deaths / covid19j_20200613$POP2019 * 100
It's easier to read if you use mutate from dplyr:
covid19j_20200613 <- covid19j_20200613 |>
  mutate(
    deaths = as.numeric(deaths),
    POP2019 = as.numeric(POP2019),
    deaths_rate = deaths / POP2019 * 100
  )
Result
deaths POP2019 deaths_rate
1 91 5250 1.73333333
2 1 1246 0.08025682
3 0 1227 0.00000000
4 1 2306 0.04336513
5 0 966 0.00000000
PS: your question is really difficult to follow! There is a lot of stuff that we don't actually need to answer it, so that makes it harder for us to identify where the issue is. For example, all the data import, the join, the ggplot...
When writing a question, please only include the minimal elements that lead to a problem. In your case, we only needed a sample dataset with the deaths and POP2019 columns, and the two lines of code that you tried to fix at the end.
If you look at str(covid19j) you'll see that the deaths column is a character column containing a lot of blanks. You need to figure out the structure of that column to read it properly.
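A quick way to see this and handle it (a sketch using the question's column names):
str(covid19j_20200613$deaths)                                     # chr, with "" where no deaths were recorded
covid19j_20200613$deaths <- as.numeric(covid19j_20200613$deaths)  # "" becomes NA, with a coercion warning
covid19j_20200613$deaths[is.na(covid19j_20200613$deaths)] <- 0    # treat missing counts as zero, if that is appropriate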
I have used a forecasting method in R, with the code below:
library(forecast)
t1$StartDate <- as.Date(t1$StartDate, origin = "1899-12-30")
## 10,1 indicates 10th Week & Sunday
ordervalu_ts <- ts(t1$Revenue, start = c(10,1), frequency = 7)
print(ordervalu_ts)
ordervalu_ts_decom <- HoltWinters(ordervalu_ts)
print(ordervalu_ts_decom)
ordervalu_ts_for <- forecast:::forecast.HoltWinters(ordervalu_ts_decom, h=30)
print(ordervalu_ts_for)
t1 is the input file. It has two columns: Date and Revenue. I am trying to forecast the Revenue for the next 30 days. I am able to get output, but the dates in the output are not in the right format. I have the following questions:
Start date: I want it to be dynamic rather than static (i.e., the start date should be taken from the "Date" column).
The output does not give me the actual date of each prediction (it shows 81.42857 instead of the first predicted date), as shown below.
print(ordervalu_ts_for)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
81.42857 1390.4782 368.3917 2412.565 -172.668266 2953.625
81.57143 1351.3890 328.9055 2373.872 -212.364558 2915.142
81.71429 1355.7625 332.8507 2378.674 -208.646034 2920.171
Can someone help? I have tried reviewing all the videos on YouTube and elsewhere online. Thanks for your help.
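One possible way to attach calendar dates to the 30 forecast steps is to build a date sequence from the last observed date and bind it to the point forecasts. This is only a sketch, assuming t1$StartDate holds consecutive daily dates as in the question:
last_date <- max(t1$StartDate)
forecast_dates <- seq(last_date + 1, by = "day", length.out = 30)               # dates for the 30 forecast steps
data.frame(Date = forecast_dates, Forecast = as.numeric(ordervalu_ts_for$mean)) # forecasts with readable dates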
I intend to perform a time series analysis on my data set. I have imported the data (monthly data from January 2015 till December 2017) from a csv file and my codes in RStudio appear as follows:
library(timetk)
library(tidyquant)
library(timeSeries)
library(tseries)
library(forecast)
mydata1 <- read.csv("mydata.csv", as.is=TRUE, header = TRUE)
mydata1
date pkgrev
1 1/1/2015 39103770
2 2/1/2015 27652952
3 3/1/2015 30324308
4 4/1/2015 35347040
5 5/1/2015 31093119
6 6/1/2015 20670477
7 7/1/2015 24841570
mydata2 <- mydata1 %>%
mutate(date = mdy(date))
mydata2
date pkgrev
1 2015-01-01 39103770
2 2015-02-01 27652952
3 2015-03-01 30324308
4 2015-04-01 35347040
5 2015-05-01 31093119
6 2015-06-01 20670477
7 2015-07-01 24841570
class(mydata2)
[1] "data.frame"
It is when running this piece of code that things get a little weird (for me at least):
mydata2_ts <- ts(mydata2, start=c(2015,1), freq=12)
mydata2_ts
date pkgrev
Jan 2015 16436 39103770
Feb 2015 16467 27652952
Mar 2015 16495 30324308
Apr 2015 16526 35347040
May 2015 16556 31093119
Jun 2015 16587 20670477
Jul 2015 16617 24841570
I don't really understand the values in the date column! It seems the dates have been converted into numeric format.
class(mydata2_ts)
[1] "mts" "ts" "matrix"
Now, running the following codes give me an error:
stlRes <- stl(mydata2_ts, s.window = "periodic")
Error in stl(mydata2_ts, s.window = "periodic") :
only univariate series are allowed
What is wrong with my process?
The reason you got this error is that you fed a data set with two variables (date + pkgrev) into stl(), which only accepts a univariate time series.
To solve the problem, create a univariate ts object without the date variable: in your call mydata2_ts <- ts(mydata2, start=c(2015,1), freq=12), use mydata2$pkgrev (or equivalently mydata2[["pkgrev"]]) instead of mydata2. The ts object already carries the temporal information, since you specify the start date and the frequency in the call.
If you would like to create a new dataframe with both the ts object and its corresponding date variable, I would suggest you to use the following code:
mydata3 = cbind(as.Date(time(mydata2_ts)), mydata2_ts)
mydata3 = as.data.frame(mydata3)
However, for the purpose of STL decomposition, the input to the first argument should be a ts object, i.e., mydata2_ts.
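Putting it together, a minimal sketch of the working version would be:
mydata2_ts <- ts(mydata2$pkgrev, start = c(2015, 1), frequency = 12)  # univariate monthly series
stlRes <- stl(mydata2_ts, s.window = "periodic")
plot(stlRes)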
I had a column of data as follows:
141523
146785
143667
65560
88524
148422
151664
.
.
.
.
I used the ts() function to convert this data into a time series.
{
Aclines <- read.csv(file.choose())
Aclinests <- ts(Aclines[[1]], start = c(2013), end = c(2015), frequency = 52)
}
head(Aclines) gives me the following output:
X141.523
1 146785
2 143667
3 65560
4 88524
5 148422
6 151664
head(Aclinests) gives me the following output:
[1] 26 16 83 87 35 54
All of my further analysis, including graphs and predictions, is scaled like the head(Aclinests) output above. How can I get the outputs back onto the scale of the original data? Am I missing something when converting the data to a ts?
It is typically recommended to provide a reproducible example (see How to make a great R reproducible example?), but I will try to help based on what I'm reading. If it isn't helpful, I'll delete the post.
First, read.csv defaults to header = TRUE, and it doesn't look like your file has a header row. Also, it looks like R is reading the data in as factors instead of numeric.
So you can try a couple of parameters when reading the file:
Aclines <- read.csv(file.choose(), header=FALSE, stringsAsFactors=FALSE)
Then to get your time series
Aclinests <- ts(Aclines[, 1], start = c(2013), end = c(2015), frequency = 52)
Since your data has a single column, this reads that column of your data frame into a ts object as numeric values.
Hope this helps.
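As a quick sanity check after re-reading the file, the head of the ts object should now echo the raw values rather than factor codes:
head(Aclines)    # numeric values, e.g. 141523 146785 143667 ...
head(Aclinests)  # should show the same values, not small integer codes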
I want to create a new dummy variable that prints 1 if my observation is within a certain set of date ranges, and a 0 if its not. My dataset is a list of political contributions over a 10 year range and I want to make a dummy variable to mark if the donation came during a certain range of dates. I have 10 date ranges I'm looking at.
Does anyone know if the right way to do this is to create a loop? I've been looking at this question, which seems similar, but I think mine would be a bit more complicated: Creating a weekend dummy variable
By way of example, I have a variable listing the dates on which contributions were recorded, and I want to create a dummy to show whether each contribution came during a budget crisis. So, if there were a budget crisis from 2010-02-01 until 2010-03-25 and another from 2009-06-05 until 2009-07-30, the variable would ideally look like this:
Contribution Date    Budget Crisis
2009-06-01           0
2009-06-06           1
2009-07-30           1
2009-07-31           0
2010-01-31           0
2010-03-05           1
2010-03-26           0
Thanks yet again for your help!
This looks like a good opportunity to use the %in% syntax of the match(...) function.
dat <- data.frame(
  ContributionDate = as.Date(c("2009-06-01", "2009-06-06", "2009-07-30", "2009-07-31",
                               "2010-01-31", "2010-03-05", "2010-03-26")),
  CrisisYes = NA
)
crisisDates <- c(seq(as.Date("2010-02-01"), as.Date("2010-03-25"), by = "1 day"),
                 seq(as.Date("2009-06-05"), as.Date("2009-07-30"), by = "1 day"))
dat$CrisisYes <- as.numeric(dat$ContributionDate %in% crisisDates)
dat
ContributionDate CrisisYes
1 2009-06-01 0
2 2009-06-06 1
3 2009-07-30 1
4 2009-07-31 0
5 2010-01-31 0
6 2010-03-05 1
7 2010-03-26 0
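Since the question mentions 10 date ranges, an alternative sketch is to compare the contribution dates against vectors of range start and end dates directly, instead of enumerating every day in every range. The crisis_start/crisis_end vectors below hold only the two example ranges and would be extended to all 10:
crisis_start <- as.Date(c("2009-06-05", "2010-02-01"))
crisis_end   <- as.Date(c("2009-07-30", "2010-03-25"))
in_crisis <- rep(FALSE, nrow(dat))
for (i in seq_along(crisis_start)) {
  # flag rows falling inside the i-th crisis window
  in_crisis <- in_crisis | (dat$ContributionDate >= crisis_start[i] & dat$ContributionDate <= crisis_end[i])
}
dat$CrisisYes <- as.numeric(in_crisis)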