Plotting monthly time series in R should be simpler

R can be amazingly powerful and frustrating at the same time. This makes teaching R to non-statisticians (business students in my case) rather challenging. Let me illustrate this with a simple task.
Let's say you are working with a monthly time series dataset. Business data are often plotted as monthly time series. We would like to plot the data such that the x-axis shows a combination of month and year. For instance, January 2017 could be shown as 2017-01. This should be straightforward with the plot command. It is not.
Data Generation
Let's illustrate this with an example. I'll generate a random time series of monthly data for 120 observations, representing 10 years of information starting in January 2007 and ending in December 2016. Here's the code.
set.seed(1234)
x <- rnorm(120)
d <- 0.07
y <- cumsum(x+d)*-1
Since we have not declared the data as time series, plotting it with the plot command would not return the intended labels for the x-axis. See the code and the chart below.
plot(y, type="l")
Now there should be an option in the plot or the plot.ts command to display the time series specific x-axis. I couldn't find one. So here's the workaround.
Declare the data set to be time series.
Use tsp and seq to generate the required x-axis labels.
Plot the chart but suppress x-axis.
Use the axis command to add the custom x-axis labels.
Add an extra step to draw a vertical line at 2012.
Here's the code.
# 120 monthly observations starting January 2007, so the series ends December 2016
# (giving end = c(2017, 12) would make ts() recycle y to fill 132 months)
my.ts <- ts(y, start = c(2007, 1), frequency = 12)
tsp <- attributes(my.ts)$tsp
dates <- seq(as.Date("2007-01-01"), by = "month", along.with = my.ts)
plot(my.ts, xaxt = "n", main = "Plotting outcome over time",
     ylab = "outcome", xlab = "time")
axis(1, at = seq(tsp[1], tsp[2], along.with = my.ts), labels = format(dates, "%Y-%m"))
abline(v = 2012, col = "blue", lty = 2, lwd = 2)
The result is charted below.
This is a workable solution for most data scientists. But if your audience comprises business students or professionals there are too many lines of code to write.
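One way to hide those steps from students is to wrap them once in a small helper, so that the classroom code becomes a single call. The sketch below is only an illustration: plot_monthly is a hypothetical name, not a base R function, and it assumes a univariate ts object with frequency 12.

```r
# Hypothetical helper (not part of base R): plot a monthly ts object with
# date-formatted x-axis labels. Assumes a univariate series with frequency 12.
plot_monthly <- function(x, fmt = "%Y-%m", every = 12, ...) {
  stopifnot(is.ts(x), frequency(x) == 12)
  s <- as.integer(start(x))                       # c(year, month) of the first observation
  dates <- seq(as.Date(sprintf("%d-%02d-01", s[1], s[2])),
               by = "month", length.out = length(x))
  keep <- seq(1, length(x), by = every)           # label every `every`-th month
  plot(x, xaxt = "n", ...)                        # draw the series without an x-axis
  axis(1, at = as.numeric(time(x))[keep], labels = format(dates[keep], fmt))
  invisible(dates)                                # return the dates for inspection
}
```

With such a helper defined once (for example in a course script), the chart above would reduce to plot_monthly(my.ts, main = "Plotting outcome over time", ylab = "outcome", xlab = "time").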
Question: Is it possible to plot a time series variable (object) using the plot command with the format option controlling how the x-axis will be displayed?
--

The ggplot2 package has the scale_x_date function for plotting time series with the desired scales, labels, breaks and limits (day, month, year formats).
All you need is a Date-class object and the values y. For example:
dates = seq(as.Date("01-01-2007", format = "%d-%m-%Y"), length.out = 120, by = "month")
df <- data.frame(dates, y)
# use the format you need in your plot using scale_x_date
library(ggplot2)
ggplot(df, aes(dates, y)) + geom_line() + scale_x_date(date_labels = "%b-%Y") +
geom_vline(xintercept = as.Date("01-01-2012", format = "%d-%m-%Y"), linetype = 'dotted', color = 'blue')

I think the question boils down to wanting a pre-written function for the custom axis you have in mind. Note that plot(my.ts) already gives a plot with ticks every month and labels every year, which to me looks better than the plot shown in the question. But if you want a custom axis, R is a programming language, so you can write a simple function for it; from then on it is just a matter of calling that function.
For example, to get you started, here is a function that accepts a frequency-12 ts object. It draws an x-axis with a tick for each month, labelling each year and every every'th month, where the every argument can be any divisor of 12. The default is 3, so a label is shown for every third month (except January, which is labelled with the year). len is the number of letters of the month name shown and can be 1, 2 or 3: 1 shows Jul as J, 2 as Ju and 3 as Jul. The default is 1.
xaxis12 <- function(ser, every = 3, len = 1) {
  tt <- time(ser)
  axis(side = 1, at = tt, labels = FALSE)
  is.every <- cycle(ser) %in% seq(1, 12, every)[-1]
  month.labs <- substr(month.abb[cycle(ser)][is.every], 1, len)
  axis(side = 1, at = tt[is.every], labels = month.labs,
       cex.axis = 0.7, tcl = -0.75)
  is.jan <- cycle(ser) == 1
  year.labs <- sprintf("'%02d", as.integer(tt)[is.jan] %% 100)
  axis(side = 1, at = tt[is.jan], labels = year.labs,
       cex.axis = 0.7, tcl = -1)
}
# test
plot(my.ts, xaxt = "n")
xaxis12(my.ts)
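To see which months the every and len arguments select, the label logic can be run on its own, without drawing a plot. This only illustrates the two expressions used inside xaxis12; the variable name labelled is mine.

```r
every <- 3
len <- 1

# Month numbers that receive a month label; January is dropped by [-1]
# because it is labelled with the year instead.
labelled <- seq(1, 12, every)[-1]
labelled                                  # 4 7 10

# Month abbreviations truncated to `len` letters
labs <- substr(month.abb[labelled], 1, len)
labs                                      # "A" "J" "O"

# With every = 6, only month 7 (July) gets a month label
seq(1, 12, 6)[-1]                         # 7
```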

Gabor is spot-on. It really just depends on what you want, and what you are willing to dig up or alter. Here is a simple alternative using a newer and less-well-known package that is excellent for plotting xts types:
## alternative
library(rtsplot) # load the plotting package
library(xts) # load the xts time-series container package
xx <- as.xts(my.ts) # create an xts object
rtsplot(xx, main= "Plotting outcome over time")
rtsplot.x.highlight(xx, which(index(xx)=="Jan 2012"), 1)
As you can see, the plotting then is two calls -- rtsplot has lots of nice defaults. Below is a screenshot as I am lazy, the plot window does of course not have a title bar...

Related

My R bar plot is not showing the difference in yearly sales clearly

After successfully running my bar plot, I have a problem: the numbers on the y-axis are shown in scientific notation (for example 0e+00) rather than as full figures.
I would like the Y axis to show between 102,000 and 106,000.
The bars also represent different values, but the plot does not show the difference in yearly sales clearly.
library(dplyr)
# Table of average total sales by year
yearly_sales <- DataSet %>% group_by(Year) %>% summarise(sales = mean(Weekly_Sales))
print(yearly_sales)
mycols <- c("#FF7F50", "#DE3163", "#6495ED")
# Bar chart for Average Yearly Sales 2010 to 2012
barplot(height = yearly_sales$sales, names.arg = yearly_sales$Year, col = mycols)
The bar plot is shown here.
How can I get the chart to show the difference in sales clearly, and how can I get the actual values instead of 0e+00?
This can be done using ylim and xpd, though I caution against representing your data this way: it exaggerates the true difference between years and generally violates good data-visualization practice. The figure you presented in your question is more accurate.
At any rate, if you wanted to do this:
# Example data
data <- c(1033660, 1046239, 1059670)
# Plot, `a` saves the position of the x axis to place years text
a <- barplot(data, ylim = c(1020000, 1060000), xpd = FALSE, col = c("#FF7F50", "#DE3163", "#6495ED"))
axis(1, at = a, labels = c(2010, 2011, 2012), tick = FALSE)
ylim sets the y-axis limits, and xpd = FALSE clips the bars so that they do not overhang below the lower axis.
Either temporarily disable scientific notation,
op <- options(scipen=999) ## set option
barplot(sales ~ Year, Yearly_sales, col=2:4)
options(op) ## restore old option
or format the tick labels with formatC.
barplot(sales ~ Year, data=Yearly_sales, col=2:4, yaxt='n')
axis(side=2, at=axTicks(2), labels=formatC(axTicks(2), format='fg'))
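The effect of both approaches on the label text can be checked without drawing a plot. This is a small sketch using the sales figures from this thread; the big.mark argument is an extra option that is not used in the code above.

```r
sales <- c(1033660, 1046239, 1059670)

op <- options(scipen = 0)        # start from R's default setting
a <- format(1e5)                 # "1e+05": the shorter scientific form is chosen
options(scipen = 999)            # heavily penalise scientific notation
b <- format(1e5)                 # "100000"
options(op)                      # restore the previous setting

# Fixed notation with thousands separators
f1 <- formatC(sales, format = "d", big.mark = ",")
f2 <- format(sales, big.mark = ",", scientific = FALSE)
f1                               # "1,033,660" "1,046,239" "1,059,670"
```

Either string vector could be passed as labels to axis(side = 2, ...), in the same way as the formatC call above.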
Like @jpsmith, I'm not a fan of narrowing the y-axis either, because that's exactly how people dress up their statistics and give the impression of huge differences when they are actually tiny, thus deceiving their readers. Please don't do this.
It might be more professional to show the percentage differences, e.g. compared to the first measurement in 2010.
Yearly_sales <- transform(Yearly_sales,
diff=(sales - sales[Year == 2010])/sales[Year == 2010])
barplot(diff ~ Year, Yearly_sales, col=2:4, yaxt='n', ylim=c(-.04, 0))
axis(side=2, at=axTicks(2), labels=paste0(axTicks(2)*100, '%'))
Data:
Yearly_sales <- structure(list(sales = c(1033660, 1046239, 1059670), Year = 2012:2010,
diff = c(-0.0245453773344532, -0.0126747006143422, 0)), class = "data.frame", row.names = c(NA,
-3L))

R Plot Numeric Variable vs Days of the Week

I have a dataframe with two columns, Global Active Power is a numeric column and DateTime is a datetime type column. When I execute the command plot(DateTime,Global Active Power), I automatically get the days of the week as ticks on the x axis.
1. Can someone explain how this is happening?
2. Also, when I run plot(as.factor(weekdays(DateTime)),Global Active Power), I do not get the same plot, instead I get a boxplot.
Your DateTime column has its dates and times all within a 48-hour period, so R chooses the day of the week as the most appropriate x axis labels for you. You can change this formatting to whatever you like.
Since your example did not include any data, I've had to create some dummy data to show how this works:
set.seed(69)
x <- (as.POSIXct("2020-05-29 10:30:00") + 1:(24 * 60) * 300)[1:1000]
y <- rpois(1000, 50 * sin(seq(0, 12, length.out = 1000))^2) / 10
df <- data.frame(DateTime = x, `Global Active Power` = y)
So plotting this data, we get a similar layout to the plot in your question:
plot(df$DateTime, df$Global.Active.Power, type ="l", xlab ="Date", ylab ="Power")
Now, if I want to format with, say, the date, then I would draw the plot without an x axis then add a formatted axis like this:
plot(df$DateTime, df$Global.Active.Power,
type = "l", xaxt = "n", xlab = "Date", ylab = "Power")
axis.POSIXct(1, df$DateTime, format = "%d %b")
As for why your plot changes to a boxplot when you change the x axis to a factor variable according to the day of the week, you have transformed your time variable from a continuous to a discrete variable. There are only two weekdays in your data, so you will only have two points on your x axis where data can appear. R chooses a boxplot here because otherwise your plot would just be a mess, as you can see if I change the date-times to just dates:
plot(as.Date(df$DateTime),df$Global.Active.Power)
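The switch to a boxplot follows from the class of the x argument: when x is a factor and y is numeric, plot draws one box per factor level instead of points. A minimal sketch with made-up data (the names x, f and power are illustrative only):

```r
# 60 timestamps in 30-minute steps, covering parts of two calendar days (UTC)
x <- as.POSIXct("2020-05-29 10:30:00", tz = "UTC") + (0:59) * 1800
power <- seq_along(x) / 10
f <- as.factor(weekdays(x))

class(x)      # "POSIXct" "POSIXt": continuous time, so plot(x, power) draws points
class(f)      # "factor": discrete, so plot(f, power) draws one box per level
nlevels(f)    # 2, because the timestamps fall on only two weekdays
```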
Created on 2020-05-29 by the reprex package (v0.3.0)

ts.plot() not plotting Time Series data against custom x-axis

I am having issues with trying to plot some Time Series data; namely, trying to plot the date (increments in months) against a real number (which represents price).
I can plot the data with just plot(months, mydata) with no issue, but it's in a scatter plot format.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this by doing ts.plot(ts(months, mydata)), but with this I get a straight line (which I know isn't correct).
I have made sure that both months and mydata have the same length
EDIT: What I mean by custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018) - so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have attained the monthly increments, I need the above variable, months, to act as the x-axis for the Time Series plot (perhaps more accurately, the time index).
With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
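The straight line from ts(months, mydata) has a simple explanation: the first argument of ts is the data and the second is start, so the Date values in months (stored as day counts) become the plotted series. A base R sketch that avoids this, recreating the same example data; the name price.ts is illustrative:

```r
set.seed(1234)
months <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), by = "month")
mydata <- cumsum(rnorm(length(months)))

# Pass the prices as the data and describe the calendar with start/frequency
price.ts <- ts(mydata, start = c(1998, 3), frequency = 12)
start(price.ts)   # 1998 3
end(price.ts)     # 2018 2

# Alternatively, keep the Date vector on the x-axis and ask for a line:
# plot(months, mydata, type = "l")
```

ts.plot(price.ts) then draws the prices against a monthly time index, while the commented plot call keeps real dates on the axis.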

Create barplot to represent time series in ggplot2

I have a basic dataframe with 3 columns: (i) a date (when a sample was taken); (ii) a site location and (iii) a binary variable indicating what the condition was when sampling (e.g. wet versus dry).
Some reproducible data:
df <- data.frame(Date = rep(seq(as.Date("2010-01-01"), as.Date("2010-12-01"), by="months"),times=2))
df$Site <- c(rep("Site.A",times = 12),rep("Site.B",times = 12))
df$Condition<- as.factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0))
What I would like to do is use ggplot to create a bar chart indicating the condition of each site (y axis) over time (x axis) - the condition indicated by a different colour. I am guessing some kind of flipped barplot would be the way to do this, but I cannot figure out how to tell ggplot2 to recognise the values chronologically, rather than summed for each condition. This is my attempt so far which clearly doesn't do what I need it to.
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()
So I have 2 questions. Firstly, how do I tell ggplot to recognise changes in condition over time and not just group each condition in a traditional stacked bar chart?
Secondly, it seems ggplot converts the date to a numerical value, how would I reformat the x-axis to show a time period, e.g. in a month-year format? I have tried doing this via the scale_x_date function, but get an error message.
labDates <- seq(from = (head(df$Date, 1)),
to = (tail(df$Date, 1)), by = "1 months")
Datelabels <-format(labDates,"%b %y")
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()+
scale_x_date(labels = Datelabels, breaks=labDates)
I have also tried converting sampling times to factors and displaying these instead. Below I have done this by changing each sampling period to a letter (in my own code, the factor levels are in a month-year format - I put letters here for simplicity). But I cannot format the axis to place each level of the factor as a tick mark. Either a date or factor solution for this second question would be great!
df$Factor <- as.factor(unique(df$Date))
levels(df$Factor) <- list(A = "2010-01-01", B = "2010-02-01",
C = "2010-03-01", D = "2010-04-01", E = "2010-05-01",
`F` = "2010-06-01", G = "2010-07-01", H = "2010-08-01",
I = "2010-09-01", J = "2010-10-01", K= "2010-11-01", L = "2010-12-01")
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()+
scale_y_discrete(breaks=as.numeric(unique(df$Date)),
labels=levels(df$Factor))
Thank you in advance!
It doesn't really make sense to use geom_bar() here, since you do not want to summarise the data and you need the visualisation to run over time.
I would rather use geom_line() and increase the line thickness if you want it to look like a bar chart.
library(tidyr)
library(dplyr)
library(ggplot2)
library(scales)
library(lubridate)
df <- data.frame(Date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), by="months"),times=2))
df$Site <- c(rep("Site.A",times = 12),rep("Site.B",times = 12))
df$Condition<- as.factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0))
df$Date <- ymd(df$Date)
ggplot(df) +
geom_line(aes(y=Site,x=Date,color=Condition),size=10)+
scale_x_date(labels = date_format("%b-%y"))
Note that using coord_flip() also does not work; I think this is what causes the Date issue. See the threads below:
how to use coord_carteisan and coord_flip together in ggplot2
In ggplot2, coord_flip and free scales don't work together

Time-series plot: change x-axis format in R

I'm trying to change the x-axis labels on a ts plot from the default (e.g. year.samplenumber) to an actual date. I had already searched in other threads, but the solution I found isn't quite working for me.
mm17
date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860
mm17.ts<-ts(mm17.perday[,2], frequency=365, start=c(2015, 14))
cols<-c("red", "green", "orange", "purple", "blue")
dates<-as.Date(mm17[,1])
ts.plot(mm17.ts, col=cols[1], xaxt="n")
axis(1, dates, format(dates, "%m %d"), cex.axis = .7)
As you can see the axis command isn't working for some reason.
In general, ts class is not a good fit for daily data. It is more suitable for monthly, quarterly and annual data.
For daily data, it would be easier to just convert it to zoo and plot:
library(zoo)
z <- read.zoo(mm17, format = "%Y/%m/%d")
plot(z$fullband, col = "red")
Note
We assume the mm17 is given as shown below.
Lines <- " date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860"
mm17 <- read.table(text = Lines)
What's happening here is a mismatch between the underlying numeric values of the dates as plotted by ts.plot and the dates vector. The x-axis dates in the output of ts.plot literally have magnitudes of 2015.1, 2015.2 etc. However, the underlying numeric values of the dates in the dates vector are the number of days from January 1, 1970 to the given date (dates in R are actually numeric values with a Date class attached). For example:
dates
[1] "2015-01-14" "2015-01-15" "2015-01-16" "2015-01-17" "2015-01-18" "2015-01-19"
as.numeric(dates)
[1] 16449 16450 16451 16452 16453 16454
x=16449
class(x)="Date"
x
[1] "2015-01-14"
You can also see this with the following code. We expand the x-axis range to include the numeric values listed above. Note how you can see one of your date labels way out on the right end of the plot at 16,449, while the data values are plotted near the left side at 2015:
ts.plot(mm17.ts, col=cols[1], xlim=c(0, 16455))
axis(1, dates, format(dates, "%m %d"), cex.axis = .8, col.axis="red")
axis(1, 2015, 2015, cex.axis = .8, col.axis="red")
So, let's change the at argument in the axis function so that we get the date labels placed at the correct locations. We'll use a couple of functions from the lubridate package to help with this. Also, note that to remove the default x-axis labels, ts.plot requires that xaxt (and other graphical parameters) be passed as a list using the gpars argument (see the ts.plot help for more on this):
library(lubridate)
ts.plot(mm17.ts, col=cols[1], gpars=list(xaxt="n"))
# the ts index for frequency=365 places day k of the year at year + (k - 1)/365
axis(1, at=year(dates) + (yday(dates) - 1)/365, labels=format(dates, "%m %d"), cex.axis = .7)
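The tick positions have to be on the same scale as the series' time index. As a rough check in base R (no lubridate), the sketch below converts the dates to that scale and compares them with time(). Here to_ts_time is a hypothetical helper and the series values are placeholders; for a series created with frequency = 365, day k of a year sits at year + (k - 1)/365.

```r
dates <- as.Date(c("2015-01-14", "2015-01-15", "2015-01-16",
                   "2015-01-17", "2015-01-18", "2015-01-19"))
s <- ts(seq_along(dates), frequency = 365, start = c(2015, 14))  # placeholder values

# Hypothetical helper: Date -> position on a frequency-365 ts time axis
to_ts_time <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) + lt$yday / 365      # yday is 0-based, so Jan 14 gives 13/365
}

as.numeric(dates)[1]                                 # 16449, days since 1970-01-01
all.equal(to_ts_time(dates), as.numeric(time(s)))    # TRUE
```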

Resources