I'm trying to change the x-axis labels on a ts plot from the default (e.g. year.samplenumber) to an actual date. I had already searched in other threads, but the solution I found isn't quite working for me.
mm17
date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860
mm17.ts<-ts(mm17.perday[,2], frequency=365, start=c(2015, 14))
cols<-c("red", "green", "orange", "purple", "blue")
dates<-as.Date(mm17[,1])
ts.plot(mm17.ts, col=cols[1], xaxt="n")
axis(1, dates, format(dates, "%m %d"), cex.axis = .7)
As you can see the axis command isn't working for some reason.
In general, ts class is not a good fit for daily data. It is more suitable for monthly, quarterly and annual data.
For daily data, it would be easier to just convert it to zoo and plot:
library(zoo)
z <- read.zoo(mm17, format = "%Y/%m/%d")
plot(z$fullband, col = "red")
Note
We assume that mm17 is as shown below.
Lines <- " date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860"
mm17 <- read.table(text = Lines)
What's happening here is a mismatch between the underlying numeric values of the dates as plotted by ts.plot and the dates vector. The x-axis dates in the output of ts.plot literally have magnitudes of 2015.1, 2015.2 etc. However, the underlying numeric values of the dates in the dates vector are the number of days from January 1, 1970 to the given date (dates in R are actually numeric values with a Date class attached). For example:
dates
[1] "2015-01-14" "2015-01-15" "2015-01-16" "2015-01-17" "2015-01-18" "2015-01-19"
as.numeric(dates)
[1] 16449 16450 16451 16452 16453 16454
x=16449
class(x)="Date"
x
[1] "2015-01-14"
You can also see this with the following code. We expand the x-axis range to include the numeric values listed above. Note how you can see one of your date labels way out on the right end of the plot at 16,449, while the data values are plotted near the left side at 2015:
ts.plot(mm17.ts, col=cols[1], xlim=c(0, 16455))
axis(1, dates, format(dates, "%m %d"), cex.axis = .8, col.axis="red")
axis(1, 2015, 2015, cex.axis = .8, col.axis="red")
So, let's change the at argument in the axis function so that we get the date labels placed at the correct locations. We'll use a couple of functions from the lubridate package to help with this. Also, note that to remove the default x-axis labels, ts.plot requires that xaxt (and other graphical parameters) be passed as a list using the gpars argument (see the ts.plot help for more on this):
library(lubridate)
ts.plot(mm17.ts, col=cols[1], gpars=list(xaxt="n"))
# with frequency=365, the ts time of day-of-year d is year + (d - 1)/365
axis(1, at=year(dates) + (yday(dates) - 1)/365, labels=format(dates, "%m %d"), cex.axis = .7)
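If you'd rather not depend on lubridate, here is a minimal base R sketch of the same idea. The helper name date_to_ts_time is made up for illustration, and it assumes the series was built with frequency=365 and start=c(year, day-of-year) as in the question. Comparing its output against time() of the series is a quick way to confirm the tick positions line up with the plotted points.

```r
# Hypothetical helper: map Dates onto the time scale of a ts object
# that was created with frequency = 365 and start = c(year, day-of-year).
date_to_ts_time <- function(d, frequency = 365) {
  yr  <- as.integer(format(d, "%Y"))   # calendar year
  doy <- as.integer(format(d, "%j"))   # day of year, 1-based
  yr + (doy - 1) / frequency
}

dates <- seq(as.Date("2015-01-14"), by = "day", length.out = 6)
x <- ts(c(109.0873, 110.1434, 109.8811, 110.4814, 110.1513, 109.3266),
        frequency = 365, start = c(2015, 14))

at <- date_to_ts_time(dates)

# The tick positions should coincide with the series' own time index
all.equal(at, as.numeric(time(x)))

ts.plot(x, col = "red", gpars = list(xaxt = "n"))
axis(1, at = at, labels = format(dates, "%m %d"), cex.axis = .7)
```

Note that this ignores leap years, just as a frequency of 365 does.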
I am plotting data from 1st Feb 2020 to 31st Dec 2020 i.e. 11 months worth.
I have converted the x axis data from character to date format and it now shows six ticks labelled Mar, May, Jul, Sep, Nov, Jan which must be the default
I want 11 ticks labelled Feb through Dec.
I am pretty new to 'R' (obviously).
Assuming your data and plot look something like this:
dates <- seq(as.Date("2020-02-01"), as.Date("2020-12-31"), by = "day")
values <- cumsum(rnorm(length(dates)))
plot(dates, values, type = "l")
Then we can draw the plot with a blank x axis using xaxt = "n", and add a custom axis with axis.Date, setting the tick positions with at:
plot(dates, values, type = "l", xaxt = "n")
axis.Date(1, dates,
at = seq(as.Date("2020-02-01"), as.Date("2020-12-01"), by = "1 month"))
Created on 2020-11-16 by the reprex package (v0.3.0)
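As a small variation on the above (a sketch, not the only way to do it), you can build the tick vector first, check that it really has 11 entries, and pass an explicit format so the labels are month abbreviations regardless of what axis.Date would pick by default. The variable name ticks is just for illustration.

```r
set.seed(1)
dates  <- seq(as.Date("2020-02-01"), as.Date("2020-12-31"), by = "day")
values <- cumsum(rnorm(length(dates)))

# One tick at the first day of each month, Feb through Dec
ticks <- seq(as.Date("2020-02-01"), as.Date("2020-12-01"), by = "month")
length(ticks)   # number of ticks that will be drawn

plot(dates, values, type = "l", xaxt = "n")
# format = "%b" gives abbreviated month names (in the current locale)
axis.Date(1, x = dates, at = ticks, format = "%b", las = 2)
```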
I assume you are using ggplot for your data. If so, you will need to format the style for the ticks. There are plenty of tutorials for this online. Here is a nice one:
https://www.r-bloggers.com/2018/06/customizing-time-and-date-scales-in-ggplot2/
You will see scale_x_date there; I believe this is what you are looking for:
scale_x_date(date_breaks ="1 month")
I have a data set where the X axis variable is Date / Time data. I find the following syntax works when the X axis variable is a Date but when it is a date-time it does not seem to work.
What I wanted was the X axis here to have (say) weekly labels.
Any ideas how to make this work in plot()? I don't want to switch to ggplot etc.
This does not work:
plot(x = data$Time,y=data$foobar,
xlab = "Date / Time",
ylab = "y-foo-bar",main = "foo",xaxt="n")
axis.Date(1,data$Time,
at=seq(as.POSIXct("2020-04-01 16:36:00 IST"),
as.POSIXct("2020-05-01 16:36:00 IST"),by="weeks"))
Nor this:
axis.Date(1,data$Time,at=seq(as.Date("2020/04/01"),
as.Date("2020/05/01"),by="weeks"))
For more context:
class(data$Time)
[1] "POSIXct" "POSIXt"
data$Time[500]
[1] "2020-03-24 08:18:00 IST"
I would define the tick mark positions and labels beforehand. Use axis without labels and then mtext, since its las= argument lets you rotate the labels.
Using strftime you may extract weeks (or else, just lookup ?strftime) and subset the time points.
weeks <- strftime(dat$time, "%W")
ats <- dat$time[!duplicated(weeks)]
labs <- strftime(ats, "%m-%d")
with(dat, plot(x=time, y=x, type="l", main="foo", xaxt="n"))
axis(1, at=ats, labels=FALSE)
mtext(labs, side=1, line=.75, at=ats, las=2)
I omitted the year, since it might be redundant information. You could also omit the month by using two mtexts in different lines and also omit the duplicates.
Data
set.seed(33720)
n <- 100
dat <- data.frame(time=seq(1585034280, (1585034280 + n*24*60*60), length.out=n),
x=cumsum(rexp(n)))
dat$time <- as.POSIXct(dat$time, origin="1970-01-01")
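As a side note, since the column is POSIXct (seconds since 1970) rather than Date (days since 1970), the matching base helper is axis.POSIXct rather than axis.Date. Below is a rough sketch with weekly ticks; the data are made up, and tz = "UTC" is an assumption chosen only to keep the example free of daylight-saving effects.

```r
set.seed(1)
n <- 60
dat <- data.frame(
  time = seq(as.POSIXct("2020-03-24 08:18:00", tz = "UTC"),
             by = "day", length.out = n),
  x = cumsum(rexp(n))
)

# Weekly tick positions spanning the observed range
ats <- seq(min(dat$time), max(dat$time), by = "week")

plot(dat$time, dat$x, type = "l", main = "foo", xaxt = "n",
     xlab = "Date / Time", ylab = "x")
# axis.POSIXct understands date-time positions and formats the labels
axis.POSIXct(1, x = dat$time, at = ats, format = "%m-%d")
```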
I have a dataframe with two columns, Global Active Power is a numeric column and DateTime is a datetime type column. When I execute the command plot(DateTime,Global Active Power), I automatically get the days of the week as ticks on the x axis.
1. Can someone explain how this is happening?
2. Also, when I run plot(as.factor(weekdays(DateTime)),Global Active Power), I do not get the same plot, instead I get a boxplot.
Your DateTime column has its dates and times all within a 48-hour period, so R chooses the day of the week as the most appropriate x axis labels for you. You can change this formatting to whatever you like.
Since your example did not include any data, I've had to create some dummy data to show how this works:
set.seed(69)
x <- (as.POSIXct("2020-05-29 10:30:00") + 1:(24 * 60) * 300)[1:1000]
y <- rpois(1000, 50 * sin(seq(0, 12, length.out = 1000))^2) / 10
df <- data.frame(DateTime = x, `Global Active Power` = y)
So plotting this data, we get a similar layout to the plot in your question:
plot(df$DateTime, df$Global.Active.Power, type ="l", xlab ="Date", ylab ="Power")
Now, if I want to format with, say, the date, then I would draw the plot without an x axis then add a formatted axis like this:
plot(df$DateTime, df$Global.Active.Power,
type = "l", xaxt = "n", xlab = "Date", ylab = "Power")
axis.POSIXct(1, df$DateTime, format = "%d %b")
As for why your plot changes to a boxplot when you change the x axis to a factor variable according to the day of the week, you have transformed your time variable from a continuous to a discrete variable. There are only two weekdays in your data, so you will only have two points on your x axis where data can appear. R chooses a boxplot here because otherwise your plot would just be a mess, as you can see if I change the date-times to just dates:
plot(as.Date(df$DateTime),df$Global.Active.Power)
Created on 2020-05-29 by the reprex package (v0.3.0)
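To make the dispatch explicit, here is a short sketch with made-up data (tz = "UTC" is just an assumption to keep it reproducible). plot() picks its method from the class of the first argument: a POSIXct x gives a scatter/line plot with a time axis, while a factor x goes to plot.factor, which draws one box per level when y is numeric.

```r
x <- as.POSIXct("2020-05-29 10:30:00", tz = "UTC") + (1:1000) * 300
y <- seq_along(x) / 100

day <- as.factor(weekdays(x))   # discrete: one level per weekday present

class(x)     # date-time class, treated as continuous
class(day)   # factor, treated as categorical

plot(x, y, type = "l")   # continuous time axis
plot(day, y)             # factor x: handled by plot.factor, drawn as boxplots
```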
R could be amazingly powerful and frustrating at the same time. This makes teaching R to non-statisticians (business students in my case) rather challenging. Let me illustrate this with a simple task.
Let's say you are working with a monthly time series dataset. Most business data are usually plotted as monthly time series. We would like to plot the data such that the x-axis depicts a combination of month and year. For instance, January 2017 could be depicted as 2017-01. It should be straightforward with the plot command. Not true.
Data Generation
Let's illustrate this with an example. I'll generate a random time series of monthly data for 120 observations representing 10 years of information starting in January 2007 and ending in December 2016. Here's the code.
set.seed(1234)
x <- rnorm(120)
d <-.07
y <- cumsum(x+d)*-1
Since we have not declared the data as time series, plotting it with the plot command would not return the intended labels for the x-axis. See the code and the chart below.
plot(y, type="l")
Now there should be an option in the plot or the plot.ts command to display the time series specific x-axis. I couldn't find one. So here's the workaround.
Declare the data set to be time series.
Use tsp and seq to generate the required x-axis labels.
Plot the chart but suppress x-axis.
Use the axis command to add the custom x-axis labels.
Add an extra step to draw a vertical line at 2012.
Here's the code.
# 120 monthly observations from Jan 2007 end in Dec 2016; a later end would recycle y
my.ts <- ts(y, start=c(2007, 1), end=c(2016, 12), frequency=12)
tsp = attributes(my.ts)$tsp
dates = seq(as.Date("2007-01-01"), by = "month", along = my.ts)
plot(my.ts, xaxt = "n", main= "Plotting outcome over time",
ylab="outcome", xlab="time")
axis(1, at = seq(tsp[1], tsp[2], along = my.ts), labels = format(dates, "%Y-%m"))
abline(v=2012, col="blue", lty=2, lwd=2)
The result is charted below.
This is a workable solution for most data scientists. But if your audience comprises business students or professionals there are too many lines of code to write.
Question: Is it possible to plot a time series variable (object) using the plot command with the format option controlling how the x-axis will be displayed?
--
ggplot2 package has the scale_x_date function for plotting time series in desired scales, labels, breaks and limits (day, month, year formats).
All you need is a Date-class object and the y values. For example:
dates = seq(as.Date("01-01-2007", format = "%d-%m-%Y"), length.out = 120, by = "month")
df <- data.frame(dates, y)
# use the format you need in your plot using scale_x_date
library(ggplot2)
ggplot(df, aes(dates, y)) + geom_line() + scale_x_date(date_labels = "%b-%Y") +
geom_vline(xintercept = as.Date("01-01-2012", format = "%d-%m-%Y"), linetype = 'dotted', color = 'blue')
I think the question boils down to wanting a pre-written function for the custom axis you have in mind. Note that plot(my.ts) does give a plot with ticks every month and labels every year, which to me looks better than the plot shown in the question. But if you want a custom axis, R is a programming language, so you can certainly write a simple function for that; from then on it's just a matter of calling that function.
For example, to get you started, here is a function that accepts a frequency 12 ts object. It draws an X axis with a tick for each month, labelling the years and every every'th month, where the every argument can be any divisor of 12. The default is 3, so a label is shown for every third month (except Jan, which is shown as the year). len is the number of letters of the month name shown and can be 1, 2 or 3: 1 shows Jul as J, 2 as Ju and 3 as Jul. The default is 1.
xaxis12 <- function(ser, every = 3, len = 1) {
tt <- time(ser)
axis(side = 1, at = tt, labels = FALSE)
is.every <- cycle(ser) %in% seq(1, 12, every)[-1]
month.labs <- substr(month.abb[cycle(ser)][is.every], 1, len)
axis(side = 1, at = tt[is.every], labels = month.labs,
cex.axis = 0.7, tcl = -0.75)
is.jan <- cycle(ser) == 1
year.labs <- sprintf("'%02d", as.integer(tt)[is.jan] %% 100)
axis(side = 1, at = tt[is.jan], labels = year.labs,
cex.axis = 0.7, tcl = -1)
}
# test
plot(my.ts, xaxt = "n")
xaxis12(my.ts)
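Another route, sketched here in base R only, is to turn the ts index into Date values and then let axis.Date handle the formatting. This assumes a frequency 12 series whose cycle position is the calendar month; the variable names yr and d are just for illustration.

```r
set.seed(1234)
y <- cumsum(rnorm(120) + 0.07) * -1
my.ts <- ts(y, start = c(2007, 1), frequency = 12)

# Year = whole part of the time index (small epsilon guards against
# floating-point values just below an integer); month = cycle position.
yr <- floor(as.numeric(time(my.ts)) + 1e-6)
d  <- as.Date(sprintf("%d-%02d-01", as.integer(yr), as.integer(cycle(my.ts))))

plot(d, as.numeric(my.ts), type = "l", xaxt = "n",
     xlab = "time", ylab = "outcome")
# One labelled tick per year, formatted as year-month
axis.Date(1, x = d, at = seq(d[1], d[length(d)], by = "year"),
          format = "%Y-%m")
abline(v = as.Date("2012-01-01"), col = "blue", lty = 2, lwd = 2)
```

Once the x values are Dates, the format argument controls the labels directly, which is close to what the question asks for.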
Gabor is spot-on. It really just depends on what you want, and what you are willing to dig up or alter. Here is a simple alternative using a newer and less-well-known package that is excellent for plotting xts types:
## alternative
library(rtsplot) # load the plotting package
library(xts) # load the xts time-series container package
xx <- as.xts(my.ts) # create an xts object
rtsplot(xx, main= "Plotting outcome over time")
rtsplot.x.highlight(xx, which(index(xx)=="Jan 2012"), 1)
As you can see, the plotting then is two calls -- rtsplot has lots of nice defaults. Below is a screenshot as I am lazy, the plot window does of course not have a title bar...
Suppose I have a vector of numbers 1:12 and want to plot them over a period ranging from Jan. 2013 to Dec. 2013. I used the following code to generate the data and plot it:
dates<-seq(as.Date("2013/1/1"), by = "month", length.out = 12)
n<-seq(1:12)
df<-cbind(dates,n)
plot(df)
However, this code has some problems. First, I could not find an option in the first seq call to generate only months and years without the day. Second, all the dates in df become serial numbers, even after wrapping dates in as.Date inside cbind. Finally, as a result of these two problems, the x axis in the plot is not in a time format.
just use
plot(dates,n)
without cbinding it. cbind creates a matrix (see class(df)). Within this process the dates are saved as class numeric.
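You can check this yourself with a quick sketch like the one below (the names m and df2 are just for illustration). A matrix can hold only one type and carries no per-column class, so the Date class is dropped; a data frame keeps each column's class.

```r
dates <- seq(as.Date("2013/1/1"), by = "month", length.out = 12)
n <- 1:12

m <- cbind(dates, n)        # matrix: Date class is lost
class(m)
m[1, "dates"]               # days since 1970-01-01, not a date

df2 <- data.frame(dates, n) # data frame: column classes are kept
class(df2$dates)

# Month and year without the day is a formatting question, not a seq option
format(dates[1], "%Y-%m")
```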
For nicer and easier to customize plots use
require(ggplot2)
qplot(dates,n) + xlab("") + ylab("my y lab")
df<-data.frame(dates=dates,n=n)
plot(df$dates, df$n, axes=FALSE)
axis(1, labels=format(df$dates, "%b %Y"), at=df$dates)
axis(2)