Related
I'm struggling with changing the x-axis (time) for my time series forecast plot. I have ran many models but I am struggling with the same issue. I'm going to write the code for the model fit, forecast and the plot here for one of the models. First here is my original time series. Note: I'm fitting my model on my training data that is from 2008-2016 and testing my model on my test data for the 11 months in 2017.
Data Split.
sal.ts <- window(sal.ts.original, start=c(2008,1), end=c(2016,12))
sal.test <- window(sal.ts.original, start=c(2017,1))
Now, the model.
sal.hw.mul <- HoltWinters(sal.ts, seasonal = "mult")
sal.hw.mul
fc.hwm <- forecast(sal.hw.mul, h=11)
fc.hwm
plot(fc.hwm, xlim=c(2017,2017+11/12), main = "Forecast from Mutltiplicative HW", xlab = "Year", ylab = "Total Sales, $M")
lines(sal.test,col='red', lwd=2)
legend("topleft", c("Actual", "Predicted"), col = c(4,2), lty = 1)
Here's my forecast plot:
See that ugly 2017.0, 2017.2.... 2017.8? I want it to instead say 1,2,3,....11 for the 11 months of 2017.
Yes, I only want to plot my test data and forecast on it and not the whole series.
I am pretty sure my problem is around my use of the xlim function. I am using that xlim function to just plot the months of 2017 and if I don't use that then R plots the whole series from 2008-2017. I tried to play around with the axis function a lot by setting xaxt="n" in the plot command but still couldn't figure it out.
Let me know if you need more information from me. Any help will be appreciated.
Update, on someone's suggestion I tried to write a custom axis by setting xaxt = 'n' in my plot. Here's the change in code.
x <- seq(1,11,1)
fc.hwm <- forecast(sal.hw.mul, h=11)
fc.hwm
layout(1:1)
plot(fc.hwm, xaxt='n', xlim=c(2017,2017+11/12), main = "Forecast from Mutltiplicative HW", xlab = "Year", ylab = "Total Sales, $M")
axis(side=1, at= x, labels=c("1","2","3","4","5","6","7","8","9","10","11"))
lines(sal.test,col='red', lwd=2)
legend("topleft", c("Actual", "Predicted"), col = c(4,2), lty = 1)
Like you can see. It gets me there half way. I can remove my current axis label but I am not being able to write a new axis. This new code is not even giving me an error or else I would've tried to debug it. It accepts my code but doesn't give me the desired output.
Here's an idea. I'm not sure what the data look like, but I'm guessing that you have a Date type for the date variable -- and that means that your "by" sequence of integer 1 to 11 might be placing those new labels outside the plot limits. Try using a Date sequence instead.
Change this:
x <- seq(1,11,1)
To something like this:
x <- seq.Date(as.Date("2017-01-01"), as.Date("2017-11-01"), "months")
I'm not sure how far into November your data go, so you might want to set that "to" Date in the sequence to December instead, so you can fully cover your November data points.
Please explain me which transformation should I be using in the below code to apply WN model.
Below is the code where difference is used, I did not use log() because the series is decaying :
data <- c(60088,48398,54687,43337,47839,43480,53297,46882,45387,47186,42794,43274,31486,29036,25242,21792,23699,19161)
diff_data <- diff(data)
ts.plot(diff_data)
model_wn <- arima(diff_data, order = c(0, 0, 0))
coeff<-model_wn$coef
ts.plot(data)
abline(0, coeff)
Please explain me two things:
with ts.plot and abline, the abline is not visible in the graph
what can I utilise using the time series analysis with the above data.
'abline' has some parameters that you can specify, for example-
If you want a horizontal line you need to specify h = y-value
If you want a vertical line, you need to specify v = x-value
Your plot is produced by-
ts.plot(data)
If you want a horizontal line in your plot, add this code after the above code-
abline(h = 40000, lty = "dashed", col = "black")
'lty' is for line type and 'col' is for line color.
Similarly, if you want a vertical line, replace 'h' with 'v' in the above code. But remember that the value of 'v' should be within the bounds of your x-variable values.
Hope this helps answer you're question.
I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for th histogram.
Yes, i know this means not all bins are of equal size
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000))) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)![alt text][3]
the only problem is that I'd like to match between the scale and the actual bars plotted. There two options for doing that : the one is simply use the actual margins of the plotted bars (how?) then get "ugly" x-axis labels like 1.1754,1.2985 etc. The other, which I prefer, is to control the actual bins margins used so they will match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to workaround above.
Using ggplot2 seems like the most easy option. If you want more control over your axes and your breaks, you can do something like the following :
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:
Using the plot() function in R, I'm trying to produce a scatterplot of points of the form (SaleDate,SalePrice) = (saldt,sapPr) from a time-series, cross-section real estate sales dataset in dataframe format. My problem concerns labels for the X-axis. Just about any series of annual labels would be adequate, e.g. 1999,2000,...,2013 or 1999-01-01,...,2013-01-01. What I'm getting now, a single label, 2000, at what appears to be the proper location won't work.
The following is my call to plot():
plot(r12rgr0$saldt, r12rgr0$salpr/1000, type="p", pch=20, col="blue", cex.axis=.75,
xlim=c(as.Date("1999-01-01"),as.Date("2014-01-01")),
ylim=c(100,650),
main="Heritage Square Sales Prices $000s 1990-2014",xlab="Sale Date",ylab="$000s")
The xlim and ylim are called out to bound the date and price ranges of the data to be plotted; note prices are plotted as $000s. r12rgr0$saldt really is a date; str(r12rgr0$saldt) returns:
Date[1:4190], format: "1999-10-26" "2013-07-06" "2003-08-25" NA NA "2000-05-24" xx
I have reviewed several threads here concerning similar questions, and see that the solution probably lies with turning off the default X-axis behavior and using axis.date, but i) At my current level of R skill, I'm not sure I'd be able to solve the problem, and ii) I wonder why the plotting defaults are producing these rather puzzling (to me, at least) results?
Addl Observations: The Y-axis labels are just fine 100, 200,..., 600. The general appearance of the scatterplot indicates the called-for date ranges are being observed and the relative positions of the plotted points are correct. Replacing xlim=... as above with xlim=c("1999-01-01","2014-01-01")
or
xlim=c(as.numeric(as.character("1999-01-01")),as.numeric(as.character("2014-01-01")))
or
xlim=c(as.POSIXct("1999-01-01", format="%Y-%m-%d"),as.POSIXct("2014-01-01", format="%Y-%m-%d"))
all result in error messages.
With plots it's very hard to reproduce results with out sample data. Here's a sample I'll use
dd<-data.frame(
saldt=seq(as.Date("1999-01-01"), as.Date("2014-01-10"), by="6 mon"),
salpr = cumsum(rnorm(31))
)
A simple plot with
with(dd, plot(saldt, salpr))
produces a few year marks
If i wanted more control, I could use axis.Date as you alluded to
with(dd, plot(saldt, salpr, xaxt="n"))
axis.Date(1, at=seq(min(dd$saldt), max(dd$saldt), by="30 mon"), format="%m-%Y")
which gives
note that xlim will only zoom in parts of the plot. It is not directly connected to the axis labels but the axis labels will adjust to provide a "pretty" range to cover the data that is plotted. Doing just
xlim=c(as.Date("1999-01-01"),as.Date("2014-01-01"))
is the correct way to zoom the plot. No need for conversion to numeric or POSIXct.
If you are running a plot in real time and don't mind some warnings, you can just pass, e.g., format = "%Y-%m-%d" in the plot function. For instance:
plot(seq((Sys.Date()-9),Sys.Date(), 1), runif(10), xlab = "Date", ylab = "Random")
yields:
while:
plot(seq((Sys.Date()-9), Sys.Date(), 1), runif(10), format = "%Y-%m-%d", xlab = "Date", ylab = "Random")
yields:
with lots of warnings about format not being a graphical parameter.
I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for th histogram.
Yes, i know this means not all bins are of equal size
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000))) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)![alt text][3]
the only problem is that I'd like to match between the scale and the actual bars plotted. There two options for doing that : the one is simply use the actual margins of the plotted bars (how?) then get "ugly" x-axis labels like 1.1754,1.2985 etc. The other, which I prefer, is to control the actual bins margins used so they will match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to workaround above.
Using ggplot2 seems like the most easy option. If you want more control over your axes and your breaks, you can do something like the following :
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this: