How to optimally organize data analysis with different Date granularity - r

I am analyzing data for a client and I am building a Shiny app to showcase my graphs.
The raw data carries timestamps in UTC at one-second resolution, e.g. 2019-12-16 17:07:40.
I started my analysis at a weekly granularity using the cut function:
Signup_Table_Weekly <- User_Info[ , c("userID", "Signup_Date", "Device")]
Signup_Table_Weekly$Signup_Date <- cut(as.Date(Signup_Table_Weekly$Signup_Date), "week")
This does not feel optimal: I now want to analyze the same data at a monthly granularity, and I find myself copying the same chunk of code and changing the "week" argument of cut to "month", ending up with multiple data frames:
Signup_Table_Monthly <- User_Info[ , c("userID", "Signup_Date", "Device")]
Signup_Table_Monthly$Signup_Date <- cut(as.Date(Signup_Table_Monthly$Signup_Date), "month")
Any advice on the proper method? Should I do all my analysis on the raw, fine-grained dates and only aggregate on the date x-axis of my ggplot graphs?
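One possible way to avoid duplicating the code is to keep a single table with the raw timestamp and derive the period from a parameter. The following is only a minimal sketch: the sample rows, the `Period` column, and the `add_period` helper are hypothetical names introduced for illustration, not part of the original code.

```r
# Hypothetical stand-in for the User_Info table in the question.
User_Info <- data.frame(
  userID = 1:4,
  Signup_Date = c("2019-12-16 17:07:40", "2019-12-18 09:30:00",
                  "2019-12-25 12:00:00", "2020-01-02 08:15:00"),
  Device = c("iOS", "Android", "iOS", "Web")
)

# Hypothetical helper: adds a Period column holding the first day of the
# week or month, and leaves the raw timestamp column untouched.
add_period <- function(df, unit = c("week", "month"), date_col = "Signup_Date") {
  unit <- match.arg(unit)
  df$Period <- as.Date(cut(as.Date(df[[date_col]]), unit))
  df
}

weekly  <- add_period(User_Info, "week")
monthly <- add_period(User_Info, "month")

# Signups per period at each granularity.
table(weekly$Period)
table(monthly$Period)
```

In a Shiny app the `unit` argument could presumably be driven by a user input, so that one table serves every granularity.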

Related

Converting excel file to time series format

I have data in Excel with 13 columns: the year and the twelve months. I want to run a time series analysis but am finding it difficult to plot the data. How can I convert this file so that I can analyse the data?
My excel file
I tried plotting but got an error: "cannot plot more than 10 series"
Say your data is df.
Using base R,
rownames(df) <- df$Year
df$Year <- NULL
# flatten the year-by-month table row by row into one monthly series
ts(as.vector(t(as.matrix(df))),
start=..., end=..., frequency=12)
Set start and end to match your data.
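As a rough illustration of the approach above, here is a small self-contained sketch. The data frame is made up (two years, with the values 1 to 24 as placeholders), and the start year is an assumption chosen to match that toy data.

```r
# Hypothetical wide table: one row per year, one column per month.
df <- data.frame(
  Year = 2001:2002,
  matrix(1:24, nrow = 2, byrow = TRUE, dimnames = list(NULL, month.abb))
)

rownames(df) <- df$Year
df$Year <- NULL

# t() puts months in rows so that as.vector() reads year 1 (Jan..Dec),
# then year 2, and so on: a single monthly series.
monthly <- ts(as.vector(t(as.matrix(df))), start = c(2001, 1), frequency = 12)

plot(monthly)
```

Because the result is one univariate series rather than 12 or 13 parallel ones, plotting it should no longer hit the 10-series limit.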

How to get dates on the xaxis of my Arima forecast plot rather than just numbers

I have imported a netCDF file into R and created a dataset which has 58196 time stamps. I’ve then fitted an Arima model to it and forecasted. However, the format of the time is ‘hours since 1900-01-01 00:00:00’. Each of the times are just in a numerical order up to 58196, but I would like to use ggplot to plot the forecast with dates on the xaxis.
Any ideas? Here is some code I have put in.
I have read in the required variable and extracted it at the pressure level I want, so that it is a single variable at 58196 times, at 6-hourly intervals up to the end of 2018. I have then done the following:
data <- data_array[13, ] # To get my univariate time series.
print(data)
[58176] -6.537371e-01 -4.765177e-01 -4.226107e-01 -4.303621e-01 -3.519134e-01
[58181] -2.706966e-01 -1.864843e-01 -9.974014e-02 2.970415e-02 6.640909e-02
[58186] -1.504763e-01 -3.968417e-01 -4.864971e-01 -5.934973e-01 -7.059880e-01
[58191] -7.812654e-01 -7.622807e-01 -8.968482e-01 -9.414597e-01 -1.003678e+00
[58196] -9.908477e-01
datafit <- auto.arima(data)
datamodel <- Arima(data, order = c(5, 0, 2))
datafcst <- forecast(datamodel, h=60, level=95)
plot(datafcst, xlim=c(58100, 58250))
I have attached the image it yields too. The idea is to use ggplot for this rather than the standard plot, with dates on the x-axis instead of the numerical values. However, ggplot won't work for me either: it complains that the object is not a data frame.
Many thanks!
As you did not provide a minimal example it is hard to help, but I will try. Assume your date variable is called "date".
dater = as.Date(strptime(date, "%Y-%m-%d"))
And from ?strptime:
format
A character string. The default for the format methods is "%Y-%m-%d %H:%M:%S" if any element has a time component which is not midnight, and "%Y-%m-%d" otherwise.
Hope that helps
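For the specific "hours since 1900-01-01 00:00:00" encoding described in the question, one option is to convert the numeric offsets to date-times and assemble a plain data frame, which is what ggplot expects. This is only a sketch under stated assumptions: the three time stamps and all values below are placeholders rather than real results, UTC is assumed, and `hours_to_time`, `plot_df`, and `fc_mean` are hypothetical names (with a real forecast object the means would typically come from `datafcst$mean`).

```r
# Placeholder inputs standing in for the netCDF time variable, the observed
# series, and the point forecasts (e.g. as.numeric(datafcst$mean)).
hours_since_1900 <- c(1043118, 1043124, 1043130)
observed <- c(-0.8968, -0.9415, -0.9908)
fc_mean  <- c(-0.95, -0.90, -0.85)

# Convert "hours since 1900-01-01 00:00:00" to date-times (UTC assumed).
hours_to_time <- function(h) {
  as.POSIXct(h * 3600, origin = "1900-01-01", tz = "UTC")
}
obs_time <- hours_to_time(hours_since_1900)

# Forecast steps continue at the same 6-hourly spacing as the data.
fc_time <- seq(max(obs_time) + 6 * 3600, by = "6 hours",
               length.out = length(fc_mean))

# One long data frame that ggplot can consume directly.
plot_df <- rbind(
  data.frame(time = obs_time, value = observed, type = "observed"),
  data.frame(time = fc_time,  value = fc_mean,  type = "forecast")
)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(plot_df, ggplot2::aes(time, value, colour = type)) +
    ggplot2::geom_line() +
    ggplot2::scale_x_datetime(date_labels = "%Y-%m-%d")
  print(p)
}
```

Prediction intervals could be added as further columns of the same data frame and drawn with a ribbon layer.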

How to set and display index with multiple columns using R time series and forecast

I have a sample usage table of 'Account','Asset','Date','Asset Network Usage' with 15 days of summarised Usage data per Asset. I am trying to append the table with forecasted usage per day over the next 15 days, or at least create an output with the same table structure.
E.g.
Date (m/d/Y) Account Asset Network Usage
4/4/2019 Acct#100 AS-4310 56.5251
4/5/2019 Acct#100 AS-4310 592.1843
4/6/2019 Acct#100 AS-4310 556.1898
4/7/2019 Acct#100 AS-4310 808.2403
4/8/2019 Acct#100 AS-4310 466.118
I've been able to produce the appended table when aggregating only by Date. I want to include Date / Account / Asset, but I am struggling to set an index that does not cause an error in the ts() function.
library(forecast)
library(ggfortify)
dataset <-
as.data.frame(read.csv(file="/path/Data.csv",header=TRUE,sep=","))
dataset <- aggregate(Network_Usage ~ Date,data = dataset, FUN= sum)
ts <- ts(dataset$Network_Usage, frequency=15)
decom <- stl(ts,s.window = "periodic")
pred <- forecast(decom,h = 15)
fort <- fortify(pred,ts.connect= TRUE )
Any suggestions on syntax updates, or use of a different method to achieve my outcome?
I think forecast only works on objects convertible to matrices. My suggestion is to use lists: forecast the "values" in each element while keeping the other relevant information (such as Account and Asset) in separate elements.
If you provide a dput() dataset I can create an example for you.
Good Luck.
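The list-based idea could look roughly like the sketch below: split the table by Account and Asset, forecast each piece, and bind the results back in the original column layout. Everything here is illustrative. The second asset (AS-4311) and its values are invented, `forecast_group` is a hypothetical helper, and the default `forecast_fn` is a deliberately naive stand-in (repeating the group mean) that would be swapped for a real model.

```r
# Hypothetical data in the same shape as the table in the question.
usage <- data.frame(
  Date = rep(seq(as.Date("2019-04-04"), by = "day", length.out = 5), times = 2),
  Account = "Acct#100",
  Asset = rep(c("AS-4310", "AS-4311"), each = 5),
  Network_Usage = c(56.5251, 592.1843, 556.1898, 808.2403, 466.118,
                    10, 20, 30, 40, 50)
)

# Forecast one Account/Asset group and return rows in the original layout.
# forecast_fn is a naive placeholder; a real model could be plugged in, e.g.
# function(x, h) forecast::forecast(forecast::auto.arima(x), h = h)$mean
forecast_group <- function(g, h = 15,
                           forecast_fn = function(x, h) rep(mean(x), h)) {
  g <- g[order(g$Date), ]
  data.frame(
    Date = max(g$Date) + seq_len(h),
    Account = g$Account[1],
    Asset = g$Asset[1],
    Network_Usage = as.numeric(forecast_fn(g$Network_Usage, h))
  )
}

# One list element per Account/Asset combination that actually occurs.
groups <- split(usage, list(usage$Account, usage$Asset), drop = TRUE)
forecasts <- do.call(rbind, lapply(groups, forecast_group, h = 15))
rownames(forecasts) <- NULL

# Original rows followed by the forecast rows, same columns throughout.
result <- rbind(usage, forecasts)
```

Because each group is handled separately, no multi-column index has to be forced into ts(); the keys simply travel alongside the forecast values.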

Create intraday chart with R using chartSeries, candleChart, or barChart?

Can chartSeries, candleChart, or barChart be used to create an intraday chart in R?
chartSeries, candleChart, and barChart are part of the quantmod package for R.
First we need some example intraday trading data, which you can get for free from a variety of sites including Google's Undocumented Finance API.
Get some example data (hourly intervals)
query_addr <- 'https://www.google.com/finance/getprices'
stock_symb <- 'GOOG'
stock_exch <- 'NASD'
intvl_size <- 60*60 # 1 hr interval (in seconds) -- use 24*this for daily
period_len <- '90d'
output_fmt <- 'd,o,h,l,c,v' # date, open, high, low, close, volume
library(httr)
resp <-
POST(url = query_addr,
query = list(q = stock_symb,
x = stock_exch,
i = intvl_size,
p = period_len,
f = output_fmt) )
df <-
read.csv(text = content(resp),
skip = 7,
header = FALSE,
stringsAsFactors = FALSE)
# we need a function to munge the date convention used by google finance API
g_fin_date <- function(dstr, intvl_size){
unix_dates <- numeric(length(dstr))
date_is_unix <- grepl('^a',dstr)
unix_dates[date_is_unix] <- as.numeric(sub('^a','',dstr[date_is_unix]))
for(i in 2L:length(dstr)){
if(!date_is_unix[i]){
unix_dates[i] <- unix_dates[i-1] + intvl_size
}
}
return(as.POSIXct(unix_dates,origin="1970-01-01",tz="GMT" ))
}
# see header of resp text for column order
names(df) <- c('close_date','Close','High','Low','Open','Volume')
df[,'close_date'] <- g_fin_date(df[,'close_date'], intvl_size=intvl_size)
Here I have just chosen hourly open (i.e. beginning price), high, low, close (i.e. ending price)-- but you can specify a finer level of detail if you desire -- it will still roll up to a larger period with quantmod::to.period().
Make an xts
Once we have a data frame (such as you might obtain from an API or flat file) then you need to convert the data to xts. Note that for xts the timestamp must be the row name (and can be dropped from the columns).
library(xts)
rownames(df) <- df$close_date
df$close_date <- NULL
Convert to OHLC (Open, High, Low, Close) using xts
This is straightforward aggregation -- see ?to.period
GOOG <- to.hourly(as.xts(df)) # for daily use to.daily(as.xts(df))
More chart examples available at quantmod.com.
Make some charts using quantmod
There are great charts already built into quantmod, including the ones you mentioned.
library(quantmod)
chartSeries(GOOG)
barChart(GOOG, theme='white.mono',bar.type='hlc')
candleChart(GOOG,multi.col=TRUE,theme='white')
Enjoy your charts
Me: "I'll take intra-day time series charts in R for 100 Alex!" :D
Alex: "THIS popular format for financial time series can be used by quantmod functions chartSeries, candleChart, and barChart to create intraday charts in R"
Me: "What is an xts object, indexed by date/time stamp, containing prices for the Open, High, Low, and Close?"
Alex: "Right you are!"

R - How to work with and plot time data

I have the following data (of which the following is a small sample):
times <- c("02:45:00", "02:45:07", "02:45:10", "02:45:20", "02:45:25", "02:45:27", "02:45:27", "02:45:30", "02:45:32", "02:45:37")
I would like to plot these times, ideally stored as a proper time variable. In the graph, I want to be able to define different time bands in order to create a histogram of how the times are distributed.
You could look into strptime to get familiar with the base time format.
Then, something like this might get you started:
hist(strptime(times,"%H:%M:%S"), "secs", freq = TRUE, xlab="seconds")
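If explicit time bands are wanted rather than the default one-second breaks, one possibility is to bin the parsed times with cut() first. This is a sketch only: the 10-second band width is an arbitrary choice, and the date 1970-01-01 is a dummy date attached purely so that the times parse as full date-times.

```r
times <- c("02:45:00", "02:45:07", "02:45:10", "02:45:20", "02:45:25",
           "02:45:27", "02:45:27", "02:45:30", "02:45:32", "02:45:37")

# Attach a dummy date so the strings become proper date-time values.
parsed <- as.POSIXct(paste("1970-01-01", times),
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Group the times into 10-second bands and count how many fall in each.
bands  <- cut(parsed, breaks = "10 sec")
counts <- table(bands)
counts

# Bar chart of the band counts, labelled by the start time of each band.
barplot(counts, names.arg = format(as.POSIXct(levels(bands), tz = "UTC"), "%H:%M:%S"),
        xlab = "band start", ylab = "count")
```

Changing the `breaks` string (for example to "30 sec" or "min") gives wider bands without touching the rest of the code.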
