ACF on parts of dataset? - r

R noob here. I am running acf() in R to check for autocorrelation in my data before running other tests.
Now I am running into 2 problems. I have time-series data for 26 years (1990-2016).
Problem 1. For some of my variables a couple of years have missing data (1995-1997). For these specific variables I would like to start the acf at the year 1998. Is this possible?
Problem 2. One variable has multiple years with missing data throughout the time-series, but only for odd years. Is it possible to do an acf for only even years?
I could manually adjust the data but would prefer to keep it as one dataset.
Thank you!
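Both cases should be possible without splitting the data set by hand. A minimal sketch on made-up annual values (the names `x`, `df`, `year` and `y` are hypothetical, not from the question): `window()` restricts a `ts` object to a start year, and ordinary row subsetting keeps only the even years.

```r
set.seed(1)

# Hypothetical annual series 1990-2016 with 1995-1997 missing
x <- ts(rnorm(27), start = 1990, frequency = 1)
x[time(x) >= 1995 & time(x) <= 1997] <- NA

# Problem 1: start the ACF at 1998 without touching the original series
x_from_1998 <- window(x, start = 1998)
acf_1998 <- acf(x_from_1998, plot = FALSE)

# Hypothetical data frame where the odd years are missing
df <- data.frame(year = 1990:2016, y = rnorm(27))
df$y[df$year %% 2 == 1] <- NA

# Problem 2: keep only the even years, then compute the ACF
even <- df[df$year %% 2 == 0, ]
acf_even <- acf(even$y, plot = FALSE)
```

Note that in the even-year series one lag corresponds to two calendar years. Another option may be `acf(x, na.action = na.pass)`, which computes the ACF from the complete pairs instead of failing on `NA`.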

Related

Plot all pairs of variables in R data frame based on column type [duplicate]

This question already has answers here:
Scatterplot matrixes with boxplots for categorical data
(1 answer)
Create a matrix of scatterplots (pairs() equivalent) in ggplot2
(4 answers)
Closed 29 days ago.
I’m fairly sure I saw a package that did this, but I cannot find its name in my notes.
This package produces a plot for each pair of variables in a data frame, but chooses the plot based on the columns’ types. So, two numeric variables would produce a scatterplot. A numeric y and categorical x would produce side-by-side box plots. Like that. It’s this multiple column type ability that distinguishes it from the packages I can find by Googling.
Perhaps I should say that I’m certain I saw it, and didn’t see a bunch of surrounding code with loops or purrr calls looping over the data, so I’m guessing there was a package that did it.
You're probably thinking of GGally::ggpairs:
library(GGally)
ggpairs(iris)

How to use fixed time effects in r?

I have a panel dataset with 40 variables for many cities over the period 1980-2014. I'm trying to fit a multiple linear regression using only three of the variables, but I also want time dummies for every year to control for unobserved shocks over time.
Should I create a dummy for each year? That would create too many columns.
I don't know how to create the set of time dummies in R so that they stay in one column (as one variable).
I searched online but couldn't find help.
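One common way to do this, sketched here on simulated data: keep `year` as a single column and wrap it in `factor()` inside the formula. `lm()` then expands it into one dummy per year (minus a reference year) internally, so no extra columns are added to the data frame. The column names `y`, `x1`, `x2`, `city` and `year` are assumptions, not taken from the question.

```r
set.seed(42)

# Hypothetical balanced panel: 5 cities observed yearly from 1980 to 2014
panel <- expand.grid(city = paste0("city", 1:5), year = 1980:2014)
panel$x1 <- rnorm(nrow(panel))
panel$x2 <- rnorm(nrow(panel))
panel$y  <- 1 + 0.5 * panel$x1 - 0.3 * panel$x2 + rnorm(nrow(panel))

# factor(year) adds one dummy per year (1980 is the reference level)
fit <- lm(y ~ x1 + x2 + factor(year), data = panel)

# Coefficients of interest; the year dummies are estimated but not shown here
coef(fit)[c("x1", "x2")]
```

The data frame still has a single `year` column; the 34 year dummies only exist inside the model matrix.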

stepAIC on weekly aggregated data with many columns

I have around 4 years of US retail data. I aggregated it by (year, week of the year), built some models and checked the quantity forecast, but the performance was not up to the mark. Now I am trying to aggregate the data by week without considering the year (all years show almost the same behavior in the US market, and holidays and events fall on the same dates every year). So I end up with only 52 rows of data. I have around 35 features that I derived earlier, so stepAIC gives an infinity error. How do I deal with this issue? Can anyone suggest other good methods for choosing important features instead? Unfortunately I cannot give more information about the data. Thanks in advance.
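An infinite AIC usually means that some candidate model fits the rows perfectly (a residual sum of squares of zero), which is easy to hit when about 35 features meet only 52 rows. One possible workaround, sketched on simulated data, is to run `MASS::stepAIC()` forward from an intercept-only model, so the search grows the model one term at a time instead of starting from the full fit. The data and the names `qty` and `f1` to `f35` are made up for illustration.

```r
library(MASS)

set.seed(1)
n <- 52   # one row per week of the year
p <- 35   # number of candidate features

# Hypothetical features; only f1 and f2 actually drive the response
X <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
names(X) <- paste0("f", seq_len(p))
X$qty <- 2 * X$f1 - X$f2 + rnorm(n)

# Start from the intercept-only model and let stepAIC add terms
null_fit <- lm(qty ~ 1, data = X)
upper    <- reformulate(paste0("f", seq_len(p)), response = "qty")

sel <- stepAIC(null_fit,
               scope = list(lower = ~ 1, upper = upper),
               direction = "forward",
               trace = FALSE)

names(coef(sel))
```

With so few rows any stepwise result is likely to be unstable, so it is worth checking whether the selected features change when a few weeks are left out.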

Multiple seasonality using tbats() function, forecasting

I started using Tableau with its R integration, and I'm using the predicted graphs.
I have 6 years of hourly data with multiple seasonalities: hourly, weekly and yearly.
library(forecast); data <- msts(.arg1, seasonal.periods = c(24, 7 * 24, 365 * 24))
I've applied the above in Tableau. It takes 8 hours to complete and does not give good results. Previously I used the ts() function, which gave good results with frequency = 365 on daily data, but on hourly data it does not.
Some seasonal patterns may be getting missed. I know tbats() can do the job, but I need to improve it in Tableau.
Dates are notoriously difficult. The biggest issue is that you're not accounting for leap years, at least one of which will fall in any six-year window. Holidays make life even more complicated, since some holidays fall on different days of the week depending on the year, which can change observations.
Take a step back. What kind of data do you have? What do you want to learn about it? That will inform the best approach.
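To illustrate the leap-year point: a common convention is to use an average year of 365.25 days, so the yearly period for hourly data becomes 365.25 * 24 = 8766 instead of 8760. A small sketch on simulated hourly data (the series `y` is made up; in Tableau the input would be `.arg1` as in the question):

```r
library(forecast)

set.seed(1)

# Hypothetical hourly series covering six average years
n     <- 6 * 8766
hours <- seq_len(n)
y <- 10 +
  3 * sin(2 * pi * hours / 24) +        # daily cycle
  2 * sin(2 * pi * hours / (7 * 24)) +  # weekly cycle
  rnorm(n)

# The yearly period uses 365.25 days to absorb leap years
y_msts <- msts(y, seasonal.periods = c(24, 7 * 24, 365.25 * 24))

# A TBATS model could then be fitted with tbats(y_msts);
# it is not run here because it can be slow on long hourly series.
```

This only changes how the yearly period is declared. It does not address holidays that move between weekdays, which would need separate handling.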

How to plot time series clusters in R? [duplicate]

This question already has an answer here:
How can I produce plots like this?
(1 answer)
Closed 9 years ago.
I just read the "Mining time series data" PDF by Ratanamahatana, Lin, Gunopulos and Keogh. Does anyone know how to visualize time series clusters in R like in Figure 1.7?
You can visualize hundreds of time series sequences with sparklines. If you also want the hierarchical ordering, you can get that in 2 steps.
Sort your data.frame of time series sequences by their multi-level clusters. (This assumes that you have computed the cluster hierarchy for each series.)
Download and install the sparkTable package in your R setup. Now plot the sparklines for your time series sequences. Take a look at this Inside-R page for SparkEPS.
This answer on statExchange is exactly what you need for the plotting part, so I am not reproducing the same example here.
Hope that helps.
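If installing an extra package is not an option, a rough equivalent can be put together in base R. This is only a sketch on simulated series, not the method used for the figure in the paper: cluster the rows with `hclust()`, then draw one small line plot per series in the order given by the dendrogram. The object names (`series`, `wave`, `ramp`) are made up.

```r
set.seed(1)

# Hypothetical data: three noisy sine waves and three noisy ramps
t_grid <- seq(0, 2 * pi, length.out = 50)
wave <- t(replicate(3, sin(t_grid) + rnorm(50, sd = 0.1)))
ramp <- t(replicate(3, seq(-1, 1, length.out = 50) + rnorm(50, sd = 0.1)))
series <- rbind(wave, ramp)
rownames(series) <- paste0("ts", 1:6)

# Hierarchical clustering on Euclidean distances between series
hc  <- hclust(dist(series))
cl  <- cutree(hc, k = 2)   # cluster label per series
ord <- hc$order            # row order that follows the dendrogram

# One sparkline-style panel per series, stacked in dendrogram order
op <- par(mfrow = c(nrow(series), 1), mar = c(0.2, 4, 0.2, 1))
for (i in ord) {
  plot(series[i, ], type = "l", axes = FALSE, xlab = "",
       ylab = rownames(series)[i], col = cl[i])
}
par(op)
```

Calling `plot(hc)` separately shows the dendrogram itself, which can be placed next to the stacked panels.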
This figure was most likely made with a drawing program, not with data mining software.
Nobody would run a cluster analysis on 6 observations like this. It's easier to look at them and group them manually than to figure out how to make a program visualize them this way.

Resources