I have a dataframe with two columns, Global Active Power is a numeric column and DateTime is a datetime type column. When I execute the command plot(DateTime,Global Active Power), I automatically get the days of the week as ticks on the x axis.
1. Can someone explain how this is happening?
2. Also, when I run plot(as.factor(weekdays(DateTime)),Global Active Power), I do not get the same plot; I get a boxplot instead.
Your DateTime column has its dates and times all within a 48-hour period, so R chooses the day of the week as the most appropriate x axis labels for you. You can change this formatting to whatever you like.
Since your example did not include any data, I've had to create some dummy data to show how this works:
set.seed(69)
x <- (as.POSIXct("2020-05-29 10:30:00") + 1:(24 * 60) * 300)[1:1000]
y <- rpois(1000, 50 * sin(seq(0, 12, length.out = 1000))^2) / 10
df <- data.frame(DateTime = x, `Global Active Power` = y)
So plotting this data, we get a similar layout to the plot in your question:
plot(df$DateTime, df$Global.Active.Power, type ="l", xlab ="Date", ylab ="Power")
Now, if I want to format with, say, the date, then I would draw the plot without an x axis then add a formatted axis like this:
plot(df$DateTime, df$Global.Active.Power,
type = "l", xaxt = "n", xlab = "Date", ylab = "Power")
axis.POSIXct(1, df$DateTime, format = "%d %b")
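If you also want to control where the ticks fall, not just how they are labelled, one option is to pass explicit positions through the at argument. The following is only a sketch on dummy data like the above; the name day_ticks and the fixed UTC time zone are my own choices for illustration:

```r
# Dummy data similar to the example above (UTC is assumed for reproducibility)
set.seed(69)
x <- as.POSIXct("2020-05-29 10:30:00", tz = "UTC") + (1:1000) * 300
y <- rpois(1000, 50 * sin(seq(0, 12, length.out = 1000))^2) / 10

# One tick at every midnight, starting from the midnight before the first point
day_ticks <- seq(as.POSIXct(trunc(min(x), "days")), max(x), by = "day")

# Draw the plot without an x axis, then add ticks at the chosen positions
plot(x, y, type = "l", xaxt = "n", xlab = "Date", ylab = "Power")
axis.POSIXct(1, at = day_ticks, format = "%d %b")
```

Ticks that fall outside the plotted range (here the first midnight) are simply not drawn.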
As for why your plot changes to a boxplot when you change the x axis to a factor of weekdays: you have transformed your time variable from a continuous variable into a discrete one. There are only two weekdays in your data, so there are only two positions on the x axis where data can appear. plot() picks its method from the class of x, and for a factor x with a numeric y it draws one box per level. A scatter plot would just be a mess here, as you can see if I change the date-times to just dates:
plot(as.Date(df$DateTime),df$Global.Active.Power)
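To make the dispatch visible, here is a small sketch on the same kind of dummy data (the variable name day is mine). As far as I know, plot() with a factor x and a numeric y ends up drawing the same thing as an explicit boxplot() call:

```r
# Dummy data spanning a few days (UTC is assumed for reproducibility)
set.seed(69)
x <- as.POSIXct("2020-05-29 10:30:00", tz = "UTC") + (1:1000) * 300
y <- rpois(1000, 50 * sin(seq(0, 12, length.out = 1000))^2) / 10

# Converting to weekday names turns a continuous variable into a discrete one
day <- as.factor(weekdays(x))
class(x)    # a date-time class
class(day)  # a factor

plot(day, y)       # factor x, numeric y: one box per weekday
boxplot(y ~ day)   # the explicit equivalent
```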
Created on 2020-05-29 by the reprex package (v0.3.0)
Related
#This is my code, I need help in improving the x axis resolution so that it is monthly (Jan 2017, Feb 2017.....Dec 2020)
ggplot()+
geom_line(data=IDA_DATA,
aes(y=final1,x= Date,colour="darkblue"),size=1 )+
geom_line(data=IDA_DATA,
aes(y=fpmc2,x= Date,colour="red"),
size=1) +
scale_color_discrete(name = "Y series", labels = c("Adjusted Trend", "Long Term Comp"))+
theme(legend.position = c(.85, .85))+
labs(y="PM10 Conc (ug/m3)")
I am assuming your Date variable is an actual date type (ie. numeric).
If your data is continuous (measurements on multiple days per month), you can use the breaks parameter of the x scale to specify each break you want; for a Date column that scale is scale_x_date() rather than scale_x_continuous(). Your breaks will also have to be dates, not text strings, so you'll need something like dmy() to convert a vector of all the first-day-of-the-month dates. That can be tedious if your data spans many years; scale_x_date() also accepts date_breaks = "1 month", which asks for monthly breaks without listing them.
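As a rough sketch of the breaks approach: the first-of-month dates do not have to be typed out, because seq() can generate them. The data frame dat, its columns and the name month_breaks are made up for illustration, and the plotting part only runs if ggplot2 is installed:

```r
# Hypothetical daily data for one year
dat <- data.frame(Date = seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day"))
dat$value <- seq_len(nrow(dat))

# One break on the first day of every month, generated rather than typed
month_breaks <- seq(as.Date("2017-01-01"), as.Date("2017-12-01"), by = "month")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(Date, value)) +
    geom_line() +
    # breaks must be Date values; date_labels controls how they are printed
    scale_x_date(breaks = month_breaks, date_labels = "%b %Y") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  print(p)
}
```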
If your data can be summarised as a single value for each entire month, you can convert the Date into a month-year string or factor variable and use that as a discrete x axis. A discrete axis automatically shows all the x values, so you won't have to create the breaks yourself. This should be relatively easy to achieve.
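A minimal sketch of that summarising step in base R, assuming a mean per month is the summary you want (the data frame d and its columns are invented for the example):

```r
# Hypothetical daily data covering three months
d <- data.frame(Date = seq(as.Date("2017-01-01"), as.Date("2017-03-31"), by = "day"))
d$value <- seq_len(nrow(d))

# Collapse each date to a year-month string; "%Y-%m" sorts chronologically
d$month <- format(d$Date, "%Y-%m")

# One summary row per month, ready to be used as a discrete x axis
monthly <- aggregate(value ~ month, data = d, FUN = mean)
monthly
```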
I have a data set where the X axis variable is Date / Time data. I find the following syntax works when the X axis variable is a Date but when it is a date-time it does not seem to work.
What I wanted was the X axis here to have (say) weekly labels.
Any ideas how to make this work in plot()? I don't want to switch to ggplot etc.
This does not work:
plot(x = data$Time,y=data$foobar,
xlab = "Date / Time",
ylab = "y-foo-bar",main = "foo",xaxt="n")
axis.Date(1,data$Time,
at=seq(as.POSIXct("2020-04-01 16:36:00 IST"),
as.POSIXct("2020-05-01 16:36:00 IST"),by="weeks"))
Nor this:
axis.Date(1,data$Time,at=seq(as.Date("2020/04/01"),
as.Date("2020/05/01"),by="weeks"))
For more context:
class(data$Time)
[1] "POSIXct" "POSIXt"
data$Time[500]
[1] "2020-03-24 08:18:00 IST"
I would define tick mark positions and labels beforehand. Use axis without labels, then mtext, since its las= argument lets you rotate the labels.
Using strftime you may extract weeks (or else, just lookup ?strftime) and subset the time points.
weeks <- strftime(dat$time, "%W")
ats <- dat$time[!duplicated(weeks)]
labs <- strftime(ats, "%m-%d")
with(dat, plot(x=time, y=x, type="l", main="foo", xaxt="n"))
axis(1, at=ats, labels=FALSE)
mtext(labs, side=1, line=.75, at=ats, las=2)
I omitted the year, since it might be redundant information. You could also omit the month by using two mtexts in different lines and also omit the duplicates.
Data
set.seed(33720)
n <- 100
dat <- data.frame(time=seq(1585034280, (1585034280 + n*24*60*60), length.out=n),
x=cumsum(rexp(n)))
dat$time <- as.POSIXct(dat$time, origin="1970-01-01")
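One likely reason the axis.Date attempts in the question fail is a scale mismatch: a Date is stored as days since 1970-01-01, while a POSIXct value is stored as seconds, so Date tick positions land far outside a date-time plot. Here is a sketch of the same weekly axis with axis.POSIXct instead; the data and the names times, vals and week_ticks are invented, and UTC is assumed:

```r
# Hypothetical date-time series: one value every 6 hours for about 30 days
t0 <- as.POSIXct("2020-04-01 16:36:00", tz = "UTC")
times <- t0 + seq(0, by = 6 * 3600, length.out = 120)
vals <- cumsum(rep(1, 120))

# The same calendar day on the two scales: days versus seconds since 1970
as.numeric(as.Date("2020-04-01"))
as.numeric(as.POSIXct("2020-04-01", tz = "UTC"))

# Weekly tick positions expressed on the POSIXct (seconds) scale
week_ticks <- seq(min(times), max(times), by = "week")

plot(times, vals, type = "l", xaxt = "n", xlab = "Date / Time", ylab = "y-foo-bar")
axis.POSIXct(1, at = week_ticks, format = "%d %b")
```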
R could be amazingly powerful and frustrating at the same time. This makes teaching R to non-statisticians (business students in my case) rather challenging. Let me illustrate this with a simple task.
Let's say you are working with a monthly time series dataset. Most business data are usually plotted as monthly time series. We would like to plot the data such that the x-axis depicts a combination of month and year. For instance, January 2017 could be depicted as 2017-01. It should be straightforward with the plot command. Not true.
Data Generation
Let's illustrate this with an example. I'll generate a random monthly time series of 120 observations, representing 10 years of information starting in January 2007 and ending in December 2016. Here's the code.
set.seed(1234)
x <- rnorm(120)
d <-.07
y <- cumsum(x+d)*-1
Since we have not declared the data as time series, plotting it with the plot command would not return the intended labels for the x-axis. See the code and the chart below.
plot(y, type="l")
Now there should be an option in the plot or the plot.ts command to display the time series specific x-axis. I couldn't find one. So here's the workaround.
1. Declare the data set to be time series.
2. Use tsp and seq to generate the required x-axis labels.
3. Plot the chart but suppress x-axis.
4. Use the axis command to add the custom x-axis labels.
5. Add an extra step to draw a vertical line at 2012.
Here's the code.
my.ts <- ts(y, start=c(2007, 1), frequency=12)  # 120 monthly values run from Jan 2007 to Dec 2016; an end of c(2017, 12) would recycle y to 132 values
tsp = attributes(my.ts)$tsp
dates = seq(as.Date("2007-01-01"), by = "month", along = my.ts)
plot(my.ts, xaxt = "n", main= "Plotting outcome over time",
ylab="outcome", xlab="time")
axis(1, at = seq(tsp[1], tsp[2], along = my.ts), labels = format(dates, "%Y-%m"))
abline(v=2012, col="blue", lty=2, lwd=2)
The result is charted below.
This is a workable solution for most data scientists. But if your audience comprises business students or professionals there are too many lines of code to write.
Question: Is it possible to plot a time series variable (object) using the plot command with the format option controlling how the x-axis will be displayed?
--
The ggplot2 package has the scale_x_date function for plotting time series with the desired scales, labels, breaks and limits (day, month, year formats).
All you need is a Date class object and the values y. For example:
dates = seq(as.Date("01-01-2007", format = "%d-%m-%Y"), length.out = 120, by = "month")
df <- data.frame(dates, y)
# use the format you need in your plot using scale_x_date
library(ggplot2)
ggplot(df, aes(dates, y)) + geom_line() + scale_x_date(date_labels = "%b-%Y") +
geom_vline(xintercept = as.Date("01-01-2012", format = "%d-%m-%Y"), linetype = 'dotted', color = 'blue')
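For comparison, roughly the same chart can be sketched in base graphics by plotting against a Date vector instead of a ts object and adding the axis with axis.Date. This is only an illustration under that assumption; the name year_ticks and the choice of one label per year are mine:

```r
# Same kind of simulated series as in the question
set.seed(1234)
y <- cumsum(rnorm(120) + 0.07) * -1

# 120 monthly dates starting in January 2007
dates <- seq(as.Date("2007-01-01"), by = "month", length.out = 120)

# Plot without an x axis, then label one tick per year in year-month format
plot(dates, y, type = "l", xaxt = "n", xlab = "time", ylab = "outcome",
     main = "Plotting outcome over time")
year_ticks <- seq(min(dates), max(dates), by = "year")
axis.Date(1, at = year_ticks, format = "%Y-%m")

# Vertical reference line at January 2012
abline(v = as.Date("2012-01-01"), col = "blue", lty = 2, lwd = 2)
```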
I think the question boils down to wanting a pre-written function for the custom axis you have in mind. Note that plot(my.ts) does give a plot with ticks every month and labels every year, which to me looks better than the plot shown in the question. But if you want a custom axis, R is a programming language, so you can certainly write a simple function for that, and from then on it's just a matter of calling that function.
For example, to get you started here is a function that accepts a frequency 12 ts object. It draws an X axis with ticks for each month labelling the years and each every'th month where the every argument can be a divisor of 12. The default is 3 so a label for every third month is shown (except Jan which is shown as the year). len is the number of letters of the month shown and can be 1, 2 or 3. 1 means show Jul as J, 2 means Ju and 3 means Jul. The default is 1.
xaxis12 <- function(ser, every = 3, len = 1) {
  tt <- time(ser)
  axis(side = 1, at = tt, labels = FALSE)
  is.every <- cycle(ser) %in% seq(1, 12, every)[-1]
  month.labs <- substr(month.abb[cycle(ser)][is.every], 1, len)
  axis(side = 1, at = tt[is.every], labels = month.labs,
       cex.axis = 0.7, tcl = -0.75)
  is.jan <- cycle(ser) == 1
  year.labs <- sprintf("'%02d", as.integer(tt)[is.jan] %% 100)
  axis(side = 1, at = tt[is.jan], labels = year.labs,
       cex.axis = 0.7, tcl = -1)
}
# test
plot(my.ts, xaxt = "n")
xaxis12(my.ts)
Gabor is spot-on. It really just depends on what you want, and what you are willing to dig up or alter. Here is a simple alternative using a newer and less-well-known package that is excellent for plotting xts types:
## alternative
library(rtsplot) # load the plotting package
library(xts) # load the xts time-series container package
xx <- as.xts(my.ts) # create an xts object
rtsplot(xx, main= "Plotting outcome over time")
rtsplot.x.highlight(xx, which(index(xx)=="Jan 2012"), 1)
As you can see, the plotting then is two calls -- rtsplot has lots of nice defaults. Below is a screenshot as I am lazy, the plot window does of course not have a title bar...
I'm trying to change the x-axis labels on a ts plot from the default (e.g. year.samplenumber) to an actual date. I had already searched in other threads, but the solution I found isn't quite working for me.
mm17
date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860
mm17.ts<-ts(mm17.perday[,2], frequency=365, start=c(2015, 14))
cols<-c("red", "green", "orange", "purple", "blue")
dates<-as.Date(mm17[,1])
ts.plot(mm17.ts, col=cols[1], xaxt="n")
axis(1, dates, format(dates, "%m %d"), cex.axis = .7)
As you can see the axis command isn't working for some reason.
In general, the ts class is not a good fit for daily data. It is more suitable for monthly, quarterly and annual data.
For daily data, it would be easier to just convert it to zoo and plot:
library(zoo)
z <- read.zoo(mm17, format = "%Y/%m/%d")
plot(z$fullband, col = "red")
Note
We assume the mm17 is given as shown below.
Lines <- " date fullband band1
1 2015/1/14 109.0873 107.0733
2 2015/1/15 110.1434 109.1999
3 2015/1/16 109.8811 108.6232
4 2015/1/17 110.4814 109.8164
5 2015/1/18 110.1513 109.2764
6 2015/1/19 110.3266 109.5860"
mm17 <- read.table(text = Lines)
What's happening here is a mismatch between the underlying numeric values of the dates as plotted by ts.plot and the dates vector. The x-axis dates in the output of ts.plot literally have magnitudes of 2015.1, 2015.2 etc. However, the underlying numeric values of the dates in the dates vector are the number of days from January 1, 1970 to the given date (dates in R are actually numeric values with a Date class attached). For example:
dates
[1] "2015-01-14" "2015-01-15" "2015-01-16" "2015-01-17" "2015-01-18" "2015-01-19"
as.numeric(dates)
[1] 16449 16450 16451 16452 16453 16454
x=16449
class(x)="Date"
x
[1] "2015-01-14"
You can also see this with the following code. We expand the x-axis range to include the numeric values listed above. Note how you can see one of your date labels way out on the right end of the plot at 16,449, while the data values are plotted near the left side at 2015:
ts.plot(mm17.ts, col=cols[1], xlim=c(0, 16455))
axis(1, dates, format(dates, "%m %d"), cex.axis = .8, col.axis="red")
axis(1, 2015, 2015, cex.axis = .8, col.axis="red")
So, let's change the at argument in the axis function so that we get the date labels placed at the correct locations. We'll use a couple of functions from the lubridate package to help with this. Also, note that to remove the default x-axis labels, ts.plot requires that xaxt (and other graphical parameters) be passed as a list using the gpars argument (see the ts.plot help for more on this):
library(lubridate)
ts.plot(mm17.ts, col=cols[1], gpars=list(xaxt="n"))
axis(1, at=year(dates) + (yday(dates) - 1)/365, labels=format(dates, "%m %d"), cex.axis = .7)  # start=c(2015, 14) with frequency=365 puts the first point at 2015 + 13/365
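The same conversion can be sketched in base R without lubridate. The sketch below rebuilds the six fullband values from the question as a small ts (the names daily and decimal_year are mine) and assumes a non-leap year, so that frequency = 365 matches the calendar:

```r
# The six daily values from the question, as a ts starting on day 14 of 2015
dates <- as.Date("2015-01-14") + 0:5
daily <- ts(c(109.0873, 110.1434, 109.8811, 110.4814, 110.1513, 110.3266),
            frequency = 365, start = c(2015, 14))

# POSIXlt stores the year as an offset from 1900 and the day of year from 0,
# so this reproduces the decimal-year scale that ts uses on the x axis
lt <- as.POSIXlt(dates)
decimal_year <- (lt$year + 1900) + lt$yday / 365

# The tick positions now coincide with the time points of the series
all.equal(decimal_year, as.numeric(time(daily)))

plot(daily, xaxt = "n", col = "red")
axis(1, at = decimal_year, labels = format(dates, "%m %d"), cex.axis = 0.7)
```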
I am working with xyplot, where the x-axis is a DateTime variable and the y-axis is a numeric variable. There is a huge number of DateTime values (a data point every 15 minutes for 3 days).
The graph looks good, but the x-axis labels run together.
If I reduce the number of ticks, the labels can be read clearly, but I don't know how to change the number of ticks or set the interval for a DateTime variable.
DateTime looks like this : 2014-04-08 17:00:00, 2014-04-08 17:15:00, ... etc.
Code I use right now:
xyplot(upper + lower + New1 ~ DateTime,data = a1,type = "l",lty = c(2, 2, 1),lwd = c(1, 1, 3),col.line = c(rep("black",2), "red"), scales=list(x=list(rot=45)))
This dataset can be a good example except that, x is datetime not just year:
df <- data.frame(x=paste0(rep(1960:1999, each=4), paste0("Q", 1:4)), y=1:160)
How can I handle this?
I fixed the same with ggplot. I used library(ggplot2) and library(scales) to handle the issue. It came out very well.
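If you would rather stay with lattice, one possible approach is to give xyplot explicit tick positions and labels through scales. Treat this as a sketch only: the data below are simulated with a single series, the 12-hour spacing and the name ticks are my own choices, and I am assuming that scales accepts date-time values for at together with matching character labels:

```r
library(lattice)

# Simulated stand-in for the real data: 3 days of 15-minute readings
tm <- seq(as.POSIXct("2014-04-08 17:00:00", tz = "UTC"),
          by = "15 min", length.out = 3 * 96)
a1 <- data.frame(DateTime = tm, New1 = sin(seq_along(tm) / 20))

# One tick every 12 hours instead of whatever lattice picks by default
ticks <- seq(min(tm), max(tm), by = "12 hours")

p <- xyplot(New1 ~ DateTime, data = a1, type = "l",
            scales = list(x = list(at = ticks,
                                   labels = format(ticks, "%d %b %H:%M"),
                                   rot = 45)))
print(p)
```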