I am trying to implement time series analysis on a data set which has two attributes (Year and Sales). The years are 2016, 2017 and 2018, and for each year there is an average sales value for all 12 months. My data looks like this:
      JAN    FEB    MAR    APR    MAY    JUNE
2016  4,457  4,105  4,276  4,712  5,116  4,512
2017  4,222  5,432  4,816  5,018  4,497  4,603
2018  4,355  4,972  4,868  4,665  4,735  4,926
This is just part of my data set, to give an idea of what it looks like. The months run from JAN to DEC. First, how do I import this data set into R? I obviously cannot import it as is, because R treats the columns as X1, X2, etc., and these become too many variables. Second, R takes this data set as a "data.frame". How can I convert it into just a "ts"? I have tried
data.ts<- as.ts(myData)
but it converts it into
"mts" "ts" "matrix"
and, moreover, it shows my frequency as 1 when it should be 12. Please help me; I am stuck at the very start.
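First, restructure your data into long format, which can be done with the gather function from tidyr. gather stacks the month columns one after another, so the rows then need to be sorted by year and month (the code below assumes the year column is named Year).
library(tidyr)
myData <- myData %>% tidyr::gather(timeperiod, sales, JAN:DEC, factor_key = TRUE)
# factor_key = TRUE keeps the months in column order, so this sorts chronologically
myData <- myData[order(myData$Year, myData$timeperiod), ]
Then your data will be structured to create a time series. Use ts() on the sales column only; as.ts() has no from argument, and passing the whole data frame gives a multivariate "mts" object:
sales.ts <- ts(myData$sales, start = c(2016, 1), frequency = 12)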
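The wide-to-series conversion can also be sketched in base R without reshaping. This is only an illustration under assumptions: the table `wide`, its `Year` column, and the placeholder values 1 to 36 are all made up so that the ordering is easy to check, and the real data would be read from a file instead.

```r
# Placeholder wide table: one row per year, one column per month.
# The values 1..36 are made up purely to make the ordering visible.
wide <- data.frame(
  Year = 2016:2018,
  matrix(1:36, nrow = 3, byrow = TRUE,
         dimnames = list(NULL, toupper(month.abb)))
)

# Drop the year column, transpose so each year becomes a column of
# twelve months, then flatten column by column:
# Jan 2016, Feb 2016, ..., Dec 2018.
values <- as.vector(t(as.matrix(wide[, -1])))

sales.ts <- ts(values, start = c(min(wide$Year), 1), frequency = 12)

class(sales.ts)      # a plain "ts" rather than "mts"
frequency(sales.ts)  # 12
```

Because only a numeric vector is passed to ts(), the result is a univariate series with the frequency given explicitly.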
Related
I have included a photo of part of my data frame, called dfbtc1, in which the date column is called created_at. I would like to sort the data frame by date so that it starts with thu sept 15 00.00.... and ends at thu sept 15 23.59....
Here I include a picture of the current data frame.
How can I fix this?
I have two lines of code that look like this:
US <- tibble(USA = c(......))
Country <- map_df(US$USA, get_country)
With map_df I want to use the function get_country to download all available data from 1 January 2015 to 1 January 2018. The dates are in the format 2015-01-01. So I want to find out what to write in place of the ... to cover that interval.
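One way to generate dates over that interval is seq() on Date objects. This is a minimal sketch; the daily step is an assumption, and whether get_country accepts such strings is not stated in the question.

```r
# Dates from 1 January 2015 to 1 January 2018, inclusive.
# The daily step is an assumption; by = "month" would give monthly dates.
dates <- seq(as.Date("2015-01-01"), as.Date("2018-01-01"), by = "day")

# Render them as strings in the 2015-01-01 format used in the question.
date_strings <- format(dates, "%Y-%m-%d")

length(date_strings)
head(date_strings, 3)
```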
I'm just starting with time series analysis in R and I am having a hard time figuring out the best format for my ts file.
I will be importing data into R from a csv file and the data frame will look like this:
date sales
2015/01/01 150
2015/02/01 200
2015/03/01 175
...
My aim is to break this data into its time series components: seasonal, trend and irregular.
Can I leave the data as is, then convert it into a ts format and proceed with my analysis?
I have seen time series data in the following format also:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2015 150 200 175 ...
2016 250 420 350 ...
...
Which of these two formats works best for time series analysis in R? Does it make a difference?
For monthly data the simplest way is to use ts(), e.g.
ts(data$sales, start = c(2015, 1), frequency = 12)
This will produce the time series object you refer to in your last table. Some functions in R, e.g. stats::stl, require your time series to be a ts object so that they can read its frequency, often through tsp(). tsp() returns the properties of the time series, i.e. start, end and frequency. Other functions require xts objects from library(xts), which are often used for hourly or higher-frequency data. For multi-seasonal data you can use msts() from library(forecast), e.g. for forecast::tbats.
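The path from a long data frame to the seasonal, trend and irregular components can be sketched as follows. The data frame `df` and its values are made up for illustration (a linear trend plus a yearly cycle), and stl() is just one of several decomposition functions.

```r
# Made-up long-format data: 36 monthly values with a trend and a yearly cycle.
df <- data.frame(
  date  = seq(as.Date("2015-01-01"), by = "month", length.out = 36),
  sales = 150 + 2 * (1:36) + 20 * sin(2 * pi * (1:36) / 12)
)

# Only the numeric column goes into ts(); the dates are implied by
# start and frequency.
sales.ts <- ts(df$sales, start = c(2015, 1), frequency = 12)
tsp(sales.ts)  # start, end, frequency

# stl() needs a periodic ts covering more than two full periods.
fit <- stl(sales.ts, s.window = "periodic")
head(fit$time.series)  # columns: seasonal, trend, remainder
```

Printing `sales.ts` shows the year-by-month layout from the second table in the question, so the two formats are the same object seen before and after conversion.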
I have data with dates (all in 2015, in mm/dd/yy format) and sales. I need to predict sales for 2016 from the given data. I know I need to use time series forecasting, but I have no idea how. Many examples have only yearly data (1960, 1970, ...), whereas my data covers a single year with several months. I don't know how to plot it either. Can you give me a clear structure for how to proceed?
Assuming the date is a string in mm/dd/yy format, convert it to a Date with this code:
a <- "07/23/15"
b <- as.Date(a, format = "%m/%d/%y")
fullYear <- format(b, "%Y")  # "2015", the four-digit year
halfYear <- format(b, "%y")  # "15", the two-digit year
After this you can work on
I have found the solution. I converted the sales figures into time series format,
plotted the data and checked whether there was any trend or seasonality.
Since the data has only a trend, I applied Holt's exponential smoothing from the forecast package. The sales for 2016 were then forecast and plotted.
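The steps described here can be sketched with base R's HoltWinters(), used in place of the forecast package so that the example has no extra dependencies. With gamma = FALSE it fits level and trend only, which is Holt's method. The sales figures and the smoothing parameters alpha and beta below are made up for illustration; leaving alpha and beta unset lets HoltWinters() estimate them.

```r
# Made-up monthly sales for 2015 with an upward trend (illustration only).
sales <- ts(c(150, 158, 171, 176, 190, 195, 207, 214, 228, 231, 246, 252),
            start = c(2015, 1), frequency = 12)

# gamma = FALSE drops the seasonal component, leaving level + trend.
# alpha and beta are arbitrary choices here, not tuned values.
fit <- HoltWinters(sales, alpha = 0.5, beta = 0.3, gamma = FALSE)

# Forecast the twelve months of 2016.
fc <- predict(fit, n.ahead = 12)
fc
```

Calling `plot(fit, fc)` afterwards draws the observed series, the fitted values and the forecast on one chart.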
I want to convert monthly data into quarterly averages. These are my 2 datasets:
gas <- UKgas
dd <- UKDriverDeaths
I was able to accomplish this (I think) for the dd data like so:
library(zoo)
dd.zoo <- zoo(dd)
ddq <- aggregate(dd.zoo, as.yearqtr, mean)
However I cannot figure out how to do this with the gas data...any help?
Follow-up
When I try to subset the data based on date (1969-1984) the resulting data does not include 1969 Q1 and instead includes 1985 Q1...any suggestions on how to fix this? I was just trying to subset as gas[1969:1984].
Originally I did not plan to post an answer, as it looks like you did not check your UKgas dataset first to see that it is already a quarterly time series.
But the follow-up question is worth answering. A "ts" object comes with many handy generic functions. We can use window to easily subset a time series. To extract the section between the first quarter of 1969 and the final quarter of 1984, we can use
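A quick check of both series, plus a base R alternative for the monthly one, might look like this. It is a sketch using aggregate() on the "ts" object directly rather than the zoo approach from the question.

```r
frequency(UKgas)           # 4, so UKgas is already quarterly
frequency(UKDriverDeaths)  # 12, so this one is monthly

# Quarterly means of the monthly series: nfrequency = 4 groups every
# three consecutive months, and FUN = mean replaces the default sum.
ddq <- aggregate(UKDriverDeaths, nfrequency = 4, FUN = mean)

frequency(ddq)
head(ddq)
```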
window(UKgas, start = c(1969,1), end = c(1984,4))
The result will still be a quarterly time series.
On the other hand, if we use "[" for subsetting, we lose object class:
class(UKgas[1:12])
#[1] "numeric"
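To confirm that the window covers exactly the requested range, the endpoints of the result can be inspected. This is a small sketch; the name `gas_sub` is introduced here only for illustration.

```r
gas_sub <- window(UKgas, start = c(1969, 1), end = c(1984, 4))

class(gas_sub)      # still "ts"
start(gas_sub)      # first observation: 1969, quarter 1
end(gas_sub)        # last observation: 1984, quarter 4
frequency(gas_sub)  # 4
length(gas_sub)     # 16 years x 4 quarters
```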