I have downloaded the historical prices between Jan-1-2010 and Dec-31-2014 for Twitter, Inc. (TWTR) -NYSE from YAHOO! FINANCE in a twitter.csv file.
I then loaded it into RStudio using:
x = read.csv("Z:/path/to/file/twitter.csv", header=T,stringsAsFactors=F)
Here is what table x looks like:
View(x)
Then I used ts function to get the time series of Adj.Close:
x.ts = ts(x$Adj.Close, frequency = 12, start=c(2010,1), end=c(2014,12))
x.ts
How were the previous results obtained? They are really different from the data in table x. Do they need any adjustments?
Your problem is the scale at which the data are read. With frequency = 12, start=c(2010,1), end=c(2014,12) you are telling the function that you have one number per month. If you have one number per day, as in your case, you should try:
x.ts = ts(x$Adj.Close, frequency = 365, start=c(2010,1), end=c(2014,365))
Firstly, frequency should be set to 365 if you deal with daily data, 12 if monthly etc.
Secondly, I think you need to arrange the data ascending chronologically before using the ts() function.
The function blindly follows exactly what you tell it. For example, the data in the table start with the value 35.87 on 2014-12-31, but the start date in the code is January 2010, so that value will be attributed to Jan-2010.
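library(dplyr)
x <- x %>% mutate(date = as.Date(date)) %>% arrange(date)
# start must be numeric c(year, day of year); end is omitted so no values get recycled
first <- min(x$date)
ts.x <- ts(x$Adj.Close, frequency = 365,
           start = c(as.numeric(format(first, "%Y")), as.numeric(format(first, "%j"))))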
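To see that ts() attaches time labels purely by position and never looks at a date column, here is a minimal sketch. Only 35.87 comes from the question; the other two prices and all variable names are made up for illustration.

```r
# Prices as downloaded, newest first (35.87 is from the question, the rest are hypothetical)
adj_close_newest_first <- c(35.87, 36.40, 37.10)

# ts() labels the first element with whatever start says, regardless of its real date
wrong <- ts(adj_close_newest_first, start = c(2010, 1), frequency = 12)
time(wrong)[1]   # time label given to the first element
wrong[1]         # value that received that label

# Reversing the vector puts the oldest observation first, as ts() expects
right <- ts(rev(adj_close_newest_first), start = c(2010, 1), frequency = 12)
right[3]         # the newest value is now last
```

The start label here is only a placeholder; the point is that sorting has to happen before ts() is called.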
I have a WEEKLY dataset that starts on 1986-01-03 and ends on 2022-10-07.
The problem appears when I forecast the time series with ARIMA + GARCH: the date at T0 is wrong, i.e. 1975.
The function I used to convert the dataset into a time series is below. I think the problem is here, since it doesn't pick up the right date.
FutureWeekly= ts(WeeklyFuture$FutureWeekly, start= c(1986,1), end = c(2022,10), frequency = 52)
Does anyone know another way to convert a weekly dataset to a time series?
These are the first rows of my dataset; I then have to transform it into returns (diff(log(FutureWeekly))) to fit the ARMA+GARCH model.
Try this:
futures<-c(WeeklyFuture$FutureWeekly) #convert to vector
FutureWeekly= ts(futures, start= c(1986,1,10), end = c(1986,3,7), frequency = 52) #add day of week ending on
One of the things ts() demands is a vector of values. I think it might also be easier for ts() to convert the data if it was able to see the 7-day increments.
Assuming you have full un-broken weekly data for the entire period, I think these two things will solve the problem.
This is the CCMP Index. R produces a wrong graph. It never took a dip. The graph in Excel shows it correctly. The data in CSV file has no problem, so what did I do wrong?
ccmp <- read.csv("/Users/jackconnors/Downloads/yeet.csv")
ccmp$time=as.Date(ccmp$date, format ="%m/%d/%Y")
ccmp=ccmp[order(ccmp$time), ]
### Find the Date Range
ccmp_min_date = min(ccmp$time)
ccmp_max_date=max(ccmp$time)
### TS variable
ccmp_ts=ts(ccmp$price ,start=c(2012, 7), end=c(2022, 7), frequency=365)
View(ccmp_ts)
plot(ccmp_ts, xlab="Year", ylab="Price", main="CCMP Price", lwd=2.5)
This can be diagnosed even without access to your data. Knowing the real length of your series, i.e., length(ccmp$price), is sufficient. But basically, your following usage of ts() is wrong.
ccmp_ts <- ts(ccmp$price, start = c(2012, 7), end = c(2022, 7), frequency = 365)
By specifying start = c(2012, 7) and frequency = 365, you tell ts() that there are 365 observations in each year and that the series starts from day 7 of 2012. By specifying end = c(2022, 7), you tell ts() that you want 365 * (2022 - 2012) + 1 = 3651 values out of your series ccmp$price. Check length(ccmp_ts) to verify this.
But what if ccmp$price has fewer values than this? Well, the data will be recycled. This is what happened to you. The figure clearly shows that data in 2019 ~ 2022 are identical to data in 2012 ~ 2015.
Usually we never specify start and end at the same time when doing y <- ts(x, ...), as they will exactly imply the length of the resulting series.
If y is shorter than x, then x will be truncated, which is fine;
If y is longer than x, your series will be recycled, which causes problems.
By omitting either start or end, the other will be auto-determined based on frequency. All data in x are kept: no truncation or recycling. The resulting y is identical to x.
So, to make your code run without problem, you can drop either start = c(2012, 7) or end = c(2022, 7).
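The truncation and recycling behaviour described above can be reproduced on a toy vector. This is only a sketch with a made-up quarterly series; n_obs is a hypothetical helper that computes how many values a given start/end/frequency combination asks for.

```r
x <- 1:10  # ten observations, quarterly for the sake of the example

# Hypothetical helper: number of values implied by start, end and frequency
n_obs <- function(start, end, frequency) {
  s <- start[1] + (start[2] - 1) / frequency
  e <- end[1] + (end[2] - 1) / frequency
  round((e - s) * frequency) + 1
}
n_obs(c(2012, 1), c(2014, 4), 4)  # 12 values requested, but x has only 10

# start and end imply more values than x holds: x is recycled from the beginning
y_long <- ts(x, start = c(2012, 1), end = c(2014, 4), frequency = 4)
as.numeric(y_long)  # the last two values are 1 and 2 again

# start and end imply fewer values than x holds: x is truncated
y_short <- ts(x, start = c(2012, 1), end = c(2013, 4), frequency = 4)
length(y_short)

# Only start given: all ten values are kept and end is derived
y_auto <- ts(x, start = c(2012, 1), frequency = 4)
end(y_auto)
```

Comparing length(x) with the value from a helper like n_obs before calling ts() is a cheap way to catch silent recycling.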
But working code does not mean everything. Believe it or not, although you can pass any positive value into frequency, only 1 (evenly spaced series), 4 (quarterly series) and 12 (monthly series) have a natural interpretation. When you pass other values, you need to make sure the value is a sensible period. Here, 365 is not a good one for day of year, because leap years have 366 days.
I can only imagine two situations where using ts() for daily time series is reasonable.
Daily series grouped by week, i.e., frequency = 7. So time can be interpreted as Monday, Tuesday, ..., Sunday.
Daily series with no grouping, i.e., frequency = 1. So time is simply interpreted as day 1, day 2, etc.
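A small sketch of these two set-ups, using a made-up series of 21 daily values (the variable names are illustrative only):

```r
values <- seq_len(21)  # three weeks of hypothetical daily observations

# Grouped by week: cycle() gives the position within the week (1 to 7)
daily_by_week <- ts(values, frequency = 7)
cycle(daily_by_week)

# No grouping: time is simply day 1, day 2, ...
daily_plain <- ts(values, frequency = 1)
time(daily_plain)
```

Whether position 1 means Monday depends entirely on which weekday the first observation falls on; ts() itself has no notion of weekdays.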
If you want to identify daily series with full time information, like year, month, etc, you have to use package zoo or xts to create a "zoo" object or "xts" object.
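As a rough illustration of the zoo route, assuming the zoo package is installed: the dates and prices below are invented, and the gap on 2012-07-04 stands in for a non-trading day.

```r
library(zoo)

# Hypothetical daily prices with real calendar dates (note the missing 2012-07-04)
dates <- as.Date(c("2012-07-02", "2012-07-03", "2012-07-05",
                   "2012-07-06", "2012-07-09"))
price <- c(100.5, 101.2, 99.8, 100.1, 102.3)

# A zoo object is indexed by the dates themselves, so no frequency is needed
z <- zoo(price, order.by = dates)
index(z)

# Subsetting by date works directly
z_recent <- window(z, start = as.Date("2012-07-05"))
z_recent
```

Because the index carries the actual dates, irregular spacing such as weekends and holidays does not shift any observation to a wrong day.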
I am new to R and I am currently struggling to convert a set of data into TS format.
Call_VolumeTS10 <- ts(Forecast_Data_Test$`Call Volume`, frequency = 578, start = c(2019, 1,1), end=c(2020, 7, 31))
However, the code does not properly convert data into a daily time series.
Is it a problem with my code?
As pointed out by @AlexB, the value for frequency is certainly oddly defined. However, the error you're getting is also related to how you defined start and end.
You cannot define start and end with a numeric vector of length 3. It must be a vector of length 2: the first number should be the year (or, generically speaking, the number of seasons that have passed) and the second number should be a number between 1 and the value of the frequency.
To properly write your ts, you should use this code:
Call_VolumeTS10 <- ts(Forecast_Data_Test$`Call Volume`, frequency = 365.25, start = c(2019, 1))
Start, end and frequency will be defined as follows:
your_data <- rnorm(578)
your_ts <- ts(your_data, start = c(2019, 1), frequency = 365.25)
tsp(your_ts)
#> [1] 2019.00 2020.58 365.25
## respectively: start end frequency
However, I suppose you want to define a ts to forecast it.
The problem is that a frequency of 365.25 is rarely handled correctly by forecasting methods (for example forecast::auto.arima or forecast::ets).
You will probably need to use frequency = 7. Of course, in that case, the value for end in the time series definition will make no sense.
your_data <- rnorm(578)
your_ts <- ts(your_data, start = c(2019, 1), frequency = 7)
tsp(your_ts)
#> [1] 2019.000 2101.429 7.000
## respectively: start end frequency
Obviously, it has no meaning. So it would just make more sense to define it this way:
your_data <- rnorm(578)
your_ts <- ts(your_data, frequency = 7)
tsp(your_ts)
#> [1] 1.00000 83.42857 7.00000
## respectively: start end frequency
In this case, you can interpret the difference between 1.000 and 83.429 as 82 weeks (plus three days) having passed since the beginning of the time series.
Alternatively, you can use the msts function from the forecast package that allows you to define multiple frequencies.
library(forecast)
your_msts <- msts(your_data, start = c(2019, 1), seasonal.periods = c(7, 365.25))
your_msts
#> Multi-Seasonal Time Series:
#> Start: 2019 1
#> Seasonal Periods: 7 365.25
#> Data:
#> ...
That msts object is well-integrated with forecast::fourier and forecast::tbats.
I suggest you have a look at this for some ideas about it.
About how to write start in a proper way...
"2019-01-01" is a pretty convenient day, because the right value for start will be c(2019, 1). However, if you need a different start date, I suggest you use this code to define the start time:
start = c(lubridate::year(date), as.numeric(format(date, "%j")))
Where date is any date in the format yyyy-mm-dd.
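For example, here is the same idea in base R only (no lubridate), for a made-up start date of 2019-03-15. The names first_day and start_vec are just for illustration.

```r
# Hypothetical first observation date
first_day <- as.Date("2019-03-15")

# c(year, day of year): "%j" is the day of the year as a zero-padded string
start_vec <- c(as.numeric(format(first_day, "%Y")),
               as.numeric(format(first_day, "%j")))
start_vec  # 2019 74 (31 days in January + 28 in February + 15)

# Use it as start; end is left out so it is derived from the data length
y <- ts(seq_len(10), start = start_vec, frequency = 365)
tsp(y)[1]  # decimal start time, 2019 + 73/365
```

This assumes frequency = 365, so the second element is read as "day 74 of 365"; with frequency = 7 the day-of-year value would not be a valid position.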
I have a dataframe where some of the columns start later than the others. Please find a reproducible example below.
set.seed(354)
df <- data.frame(Product_Id = rep(1:100, each = 50),
                 Date = seq(from = as.Date("2014/1/1"),
                            to = as.Date("2018/2/1"),
                            by = "month"),
                 Sales = rnorm(100, mean = 50, sd = 20))
df <- df[-c(251:256, 301:312, 2551:2562, 2651:2662, 2751:2762), ]
library(zoo)
z <- read.zoo(df, index = "Date", split = "Product_Id", FUN = as.yearmon)
tt <- as.ts(z)
Now for this dataframe for the columns 6,7,52,54 and 56 I want to define them as timeseries starting from a different date as compared to the rest of the dataframe. Supposedly the data begins from Jan 2000, column 6 will begin from July 2000, column 7 from Jan 2001 and so on. How should I proceed to do this?
Later, I want to perform a forecast on this dataset. Any inputs on this? Should I consider each column as a separate dataframe and do the forecasting? Or can I convert each column to a different timeseries object that starts from the first non-NA value?
Now for this dataframe for the columns 6,7,52,54 and 56 I want to define them as timeseries starting from a different date as compared to the rest of the dataframe. Supposedly the data begins from Jan 2000, column 6 will begin from July 2000, column 7 from Jan 2001 and so on. How should I proceed to do this?
There is, AFAIK, no way to do this in R in a time series matrix. And if each column started at a different date, then (since each column has the same number of entries), each column would also need to end at a different date. Is this really what you need? A collection of time series that all happen to be of the same length (so they can fit into a matrix), but that start and end with offsets? I struggle to understand where something like this would be useful, outside a kind of forecasting competition.
If you really need this, then I would recommend you put your time series into a list structure. Then each one can start and end at any date, and they can be the same or different lengths. Take inspiration from Mcomp::M3.
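A minimal sketch of such a list, built from a small made-up monthly matrix rather than your tt: the two series a and b are hypothetical, and na.omit() is used here because, for ts objects, it drops leading and trailing NAs while keeping the time attributes.

```r
# Two hypothetical monthly series; b starts six months later than a
a <- ts(1:24, start = c(2000, 1), frequency = 12)
b <- ts(1:18, start = c(2000, 7), frequency = 12)

# Binding them gives a time series matrix padded with NA where b has no data
m <- cbind(a, b)

# Split the matrix into a list; each element keeps only its own non-NA span
series <- lapply(seq_len(ncol(m)), function(j) na.omit(m[, j]))
names(series) <- colnames(m)

start(series$a)  # 2000 1
start(series$b)  # 2000 7
```

Note that na.omit() stops with an error if a series has NAs in the middle, so this only suits columns whose gaps are at the beginning or the end.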
Later, I want to perform a forecast on this dataset. Any inputs on this? Should I consider each column as a separate dataframe and do the forecasting? Or can I convert each column to a different timeseries object that starts from the first non-NA value?
Since your tt is already a time series object, the simplest way would be simply to iterate over its columns:
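library(forecast)
fcst <- matrix(nrow = 10, ncol = ncol(tt))
# store each column's 10-step forecast in its own column of fcst
for (ii in seq_len(ncol(tt))) fcst[, ii] <- forecast(ets(tt[, ii]), h = 10)$mean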
Note that most modeling functions in forecast will throw a warning and do something reasonable on encountering NA values. Here, e.g.:
1: In ets(tt[, ii]) :
Missing values encountered. Using longest contiguous portion of time series
Of course, you could do something yourself inside the loop, e.g., search for the last NA and start the time series for modeling right after that (but make sure you fail gracefully if the last entry is NA).
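One way this could look, as a sketch only: trim_after_last_na is a hypothetical helper (not part of forecast), and it returns NULL when the last entry is NA so the caller can skip that series.

```r
# Hypothetical helper: keep only the part of a ts after its last NA
trim_after_last_na <- function(y) {
  na_pos <- which(is.na(y))
  if (length(na_pos) == 0) return(y)        # nothing to trim
  last_na <- max(na_pos)
  if (last_na == length(y)) return(NULL)    # last entry is NA: nothing usable
  window(y, start = time(y)[last_na + 1])   # keeps the ts time attributes
}

# Made-up monthly series with NAs at positions 1 and 3
y <- ts(c(NA, 1, NA, 3, 4, 5), start = c(2014, 1), frequency = 12)
trimmed <- trim_after_last_na(y)
start(trimmed)       # 2014 4
as.numeric(trimmed)  # 3 4 5

# A series ending in NA yields NULL, which the loop can test for
is.null(trim_after_last_na(ts(c(1, 2, NA))))
```

Inside the loop you would call the helper on tt[, ii] and skip the column when it returns NULL.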
I have a two variable dataframe (df) in R of daily sales for a ten year period from 2004-07-09 through 2014-12-31. Not every single date is represented in the ten year period, but pretty much most days Monday through Friday.
My objective is to aggregate sales by quarter, convert to a time series object, and run a seasonal decomposition and other time series forecasting.
I am having trouble with the conversion, as ultimately I receive an error:
time series has no or less than 2 periods
Here's the structure of my code.
# create a time series object
library(xts)
x <- xts(df$amount, df$date)
# create a time series object aggregated by quarter
q.x <- apply.quarterly(x, sum)
When I try to run
fit <- stl(q.x, s.window = "periodic")
I get the error message
series is not periodic or has less than two periods
When I try to run
q.x.components <- decompose(q.x)
# or
decompose(x)
I get the error message
time series has no or less than 2 periods
So, how do I take my original dataframe, with a date variable and an amount variable (sales), aggregate that quarterly as a time series object, and then run a time series analysis?
I think I was able to answer my own question. I did this. Can anyone confirm if this structure makes sense?
library(lubridate)
# add a new variable indicating the calendar year.quarter (i.e. 2004.3) of each observation
df$year.quarter <- quarter(df$date, with_year = TRUE)
library(plyr)
# summarize gift amount by year.quarter
new.data <- ddply(df, .(year.quarter), summarize,
sum = round(sum(amount), 2))
# convert the new data to a quarterly time series object beginning
# in July 2004 (2004, Q3) and ending in December 2014 (2014, Q4)
nd.ts <- ts(new.data$sum, start = c(2004,3), end = c(2014,4), frequency = 4)
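For comparison, here is a base R sketch of the same pipeline on simulated data (weekday-only dates over the same range, random amounts; every name below is made up). Both stl() and decompose() check frequency(), so they need a ts object with frequency greater than 1 and more than two full periods.

```r
set.seed(1)

# Simulated stand-in for df: weekdays only, 2004-07-09 through 2014-12-31
dates <- seq(as.Date("2004-07-09"), as.Date("2014-12-31"), by = "day")
dates <- dates[!format(dates, "%u") %in% c("6", "7")]  # drop Saturdays and Sundays
df <- data.frame(date = dates, amount = runif(length(dates), 10, 100))

# Year and quarter of each observation
df$year <- as.integer(format(df$date, "%Y"))
df$qtr  <- (as.integer(format(df$date, "%m")) - 1) %/% 3 + 1

# Sum by quarter and put the rows in chronological order
q <- aggregate(amount ~ year + qtr, data = df, FUN = sum)
q <- q[order(q$year, q$qtr), ]

# Only start is given; end follows from the number of quarters
q_ts <- ts(q$amount, start = c(q$year[1], q$qtr[1]), frequency = 4)
end(q_ts)  # 2014 4

fit <- stl(q_ts, s.window = "periodic")
```

Leaving end out means the series cannot be silently recycled, but it does assume that no quarter is missing from the aggregated data.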