labeling axis of dates in ggplot? - r

I am trying to make plots using ggplot in R, and I have the same problem that was discussed in the question below.
Date axis labels in ggplot2 is one day behind
My data ranges from 2016-09-01 to 2016-09-30, but labels in plots say 2016-08-31 is the first day of data.
I solved the problem with the solution in the previous question, which is:
ggplot(df, aes(x, y)) +
geom_point() +
scale_x_datetime(breaks =df$x , labels = format(df$x, "%Y-%m-%d"))
(Is this to set breaks and labels by taking exact dates from the data?)
Anyway, I have a new problem:
the dates now match the labels, but the plot does not look good.
I am not complaining that the date labels are too long, but I don't like that, with the solution above, I can't set breaks and labels by week or by a certain number of days.
Also, my data has many missing dates.
What should I do to solve this problem? I need a new solution.

Just use this if you want your dates to appear vertically (that way you can see all your dates):
ggplot(df, aes(x, y)) +
geom_point() +
scale_x_datetime(breaks =df$x , labels = format(df$x, "%Y-%m-%d")) +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))

I found the solution... Maybe my question was not described here in detail.
My solution for the case where the dates did not match the values on the axis, and I wanted the plots to look better, is:
# set breaks first by seq.POSIXt
breaks.index <- seq.POSIXt(from=as.POSIXct(strftime("2020-01-01", format="%Y-%m-%d"), format="%Y-%m-%d"), to=as.POSIXct(strftime("2020-12-31", format="%Y-%m-%d"), format="%Y-%m-%d"), by="1 week")
and
# plot
plot <- ggplot(data, aes(x=date, y=y)) +
geom_point() +
scale_x_datetime(breaks = breaks.index, labels = format(breaks.index, "%Y-%m-%d"))
plot
Although I don't understand how this differs from using scale_x_date(date_labels = '%F'), or why this code works, it works.
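For comparison, here is a minimal sketch of the scale_x_date route mentioned above. It assumes the x column can be stored as class Date rather than POSIXct; the data frame is made up for illustration, with a few days removed to mimic missing dates. Working with Date values avoids time zone conversion, which is one plausible cause of labels appearing a day early, though that is an assumption about the original data.

```r
library(ggplot2)

# Hypothetical data: September 2016 with some days missing
set.seed(1)
days <- seq(as.Date("2016-09-01"), as.Date("2016-09-30"), by = "day")
df <- data.frame(x = days[-c(3, 4, 10, 17, 18, 25)])
df$y <- rnorm(nrow(df))

# Weekly breaks with ISO-style labels; gaps in the data do not affect the breaks
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_date(date_breaks = "1 week", date_labels = "%Y-%m-%d")
print(p)
```

If the column is POSIXct, converting it with as.Date(x, tz = ...) using the time zone the data was recorded in should give the intended calendar days.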

Related

R ggplot2: labels on time axis to monthly bins

I am trying to adjust the way ggplot labels the axis. My code is:
x = as.POSIXct(c(1, 9999999), origin="1970-01-01 00:00:00 CET")
y = c(1,2)
df = data.frame(x=x, y=y)
ggplot()+
geom_line(data = df, aes(y=y, x=x))
and it produces this output:
I think it would be intuitive to place the labels mid-month, get rid of mid-month grid lines and so on... a bit like this:
Can this be accomplished with ggplot2?
This is a very hacky solution, but it's an option. I don't think it's possible to label on minor breaks only, so to do what you're looking for, you need to label on major breaks, offset labels with spaces (or maybe tabs), and then hide the minor break panel lines, like so:
ggplot()+
geom_line(data = df, aes(y=y, x=x)) +
scale_x_datetime(date_breaks = "1 month", date_labels=paste0(" ","%b")) +
theme(panel.grid.minor = element_blank())
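A possible alternative to padding the labels, offered only as a sketch: put the major breaks (and therefore the labels) at mid-month, put the minor breaks on the month boundaries, and hide the major grid lines. The explicit UTC time zone and the variable names month_starts and mid_months are assumptions made for this example.

```r
library(ggplot2)

# Same data as in the question, with an explicit UTC time zone (assumption)
x <- as.POSIXct(c(1, 9999999), origin = "1970-01-01", tz = "UTC")
df <- data.frame(x = x, y = c(1, 2))

# Month boundaries covering the data, and the midpoint of each month
month_starts <- seq(as.POSIXct("1970-01-01", tz = "UTC"),
                    as.POSIXct("1970-05-01", tz = "UTC"), by = "1 month")
mid_months <- month_starts[-length(month_starts)] + diff(month_starts) / 2

p <- ggplot(df, aes(x = x, y = y)) +
  geom_line() +
  # Labels sit at mid-month; minor grid lines mark the month boundaries
  scale_x_datetime(breaks = mid_months, minor_breaks = month_starts,
                   date_labels = "%b", timezone = "UTC") +
  # Hide the vertical grid lines at the labelled (mid-month) positions
  theme(panel.grid.major.x = element_blank())
print(p)
```

Note that the axis tick marks also move to mid-month with this approach, which may or may not be what you want.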

ggplot x axis trouble

Currently, I have this plot that looks like this:
I don't like how on the x-axis there are weird lines / bars. I suspect this may be because ggplot can't fit all 540000 observations in the x axis. Here is the code I used to graph this:
data %>%
ggplot() +
geom_point(aes(x = dates_df$date, y = Quantity)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "Invoice Date", y = "Quantity", title = "Quantity vs Invoice Date")
What can I do to get rid of / solve this mess on the x-axis?
As mentioned in the comments, the date column seems to be messy, and you are using two separate data frames. First, join the data. I assume both data frames share an id or another key column, such as a name:
library("dplyr")
data <- left_join(data, dates_df, by="id")
As was also mentioned, date is a character column. If you haven't already done so, convert it to Date with the as.Date function after joining:
data$date <- as.Date(data$date, "%m/%d/%Y")
You can find other date formats here: http://www.statmethods.net/input/dates.html
You said there are 540,000 observations on the x-axis. My suggestion is to split the chart by year. To do this, use the facet_grid function inside ggplot:
library(ggplot2)
library(lubridate)
ggplot(data, aes(x = date, y = Quantity)) +
geom_point() +
facet_grid(~ year(date))
Hope it helped :)
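To make the steps above concrete, here is a small self-contained sketch. The invoices data frame is invented for illustration (the real data is not shown in the question), and the year is extracted with base R instead of lubridate to keep the example free of extra dependencies.

```r
library(ggplot2)

# Hypothetical stand-in for the joined invoice data: dates stored as text
date_text <- format(seq(as.Date("2010-01-15"), as.Date("2011-12-15"), by = "month"),
                    "%m/%d/%Y")
set.seed(7)
invoices <- data.frame(date = date_text,
                       Quantity = rpois(length(date_text), lambda = 10),
                       stringsAsFactors = FALSE)

# Convert the character column to Date, then derive the year for faceting
invoices$date <- as.Date(invoices$date, format = "%m/%d/%Y")
invoices$year <- as.integer(format(invoices$date, "%Y"))

p <- ggplot(invoices, aes(x = date, y = Quantity)) +
  geom_point() +
  facet_wrap(~ year, scales = "free_x") +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y")
print(p)
```

Once the column is a real Date, ggplot2 treats the axis as continuous and picks a handful of breaks instead of printing one label per distinct string.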

R, ggplot2, skip printing x values

This might be fairly simple, but I can't seem to find out how to do it.
I have a nice plot with a group of lines of values in it.
The y-axis represents an amount; the x-axis represents dates.
The problem is simple: there are so many dates that they are printed on top of each other.
The code :
sp = rbind(sp1,sp2,sp3,sp4)
pm = ggplot(data = sp, aes(x = date,
y = amount,
colour=sm,
group=sm)) +
geom_line()
How can I make the x axis only print for example every 5 dates instead of all of them?
Thanks in advance!
library(scales)
sp = rbind(sp1,sp2,sp3,sp4)
pm = ggplot(data = sp, aes(x = date, y = amount, colour=sm, group=sm)) +
geom_line() +
scale_x_date("x axis title", date_breaks = "5 years")
scale_x_date will sort out the x-axis labels for you. To specify the label interval, set the break width as above. (P.S. Your dates need to be of class Date for scale_x_date; for POSIXct values, use scale_x_datetime instead.)

How to tell R's ggplot2 to put tick marks for some values of x-axis and still keep vertical lines for other values

I am creating a times series using ggplot2 in R. I would like to know how to show tick marks in the x-axis only for the months that are labeled (e.g. Mar 07, Mar 08, etc) while keeping the vertical grey lines for every single month.
The main reason is that having a tick mark for every month makes it hard to know which one corresponds to the labels.
Here is an example of a plot:
Here is the R code behind it:
ggplot(timeseries_plot_data_mean,aes(as.numeric(project_date)))+
geom_line(aes(y=num_views))+geom_point(aes(y=num_views))+
stat_smooth(aes(y=num_views),method="lm")+
scale_x_continuous(breaks = xscale$breaks, labels = xscale$labels)+
ggtitle("Monthly average num views")+xlab("months")+ylab("num views")
This is what I would like to generate. See how the ticks are positioned right above the month labels and the vertical lines are still there showing each month.
I manually edited the plot above using Inkscape (ignore the q's; Inkscape strangely replaced the dots with q's).
Here is a solution using the minor_breaks parameter of scale_x_date(). To use this, your x-values must be of class Date instead of numeric.
library(ggplot2)
set.seed(123)
x <- seq(as.Date("2007/3/1"), as.Date("2012/4/1"), by = "1 month")
y <- ((exp(-10 * seq(from=0, to=1, length.out=length(x))) * 120) +
runif(length(x), min=-10, max=10))
dat <- data.frame(Months=x, Views=y)
x_breaks <- seq(as.Date("2007/3/1"), as.Date("2012/4/1"), by="1 year")
x_labels <- format(x_breaks, format="%b-%y")
plot_1 <- ggplot(dat, aes(x=Months, y=Views)) +
theme_bw() +
geom_line() +
geom_point() +
scale_x_date(breaks=x_breaks, labels=x_labels, minor_breaks=dat$Months)
png("plot_1.png", width=600, height=240)
print(plot_1)
dev.off()

Date labels overlap when putting multiple ggplot plots on single page

I am trying to put multiple ggplot2 time series plots on a page using the gridExtra package's arrange() function. Unfortunately, I am finding that the x-axis labels get pushed together; it appears that the plot is putting the same number of x-axis labels as a full-page chart, even though my charts only take up 1/4 of a page. Is there a better way to do this? I would prefer not to have to manually set any points, since I will be dealing with a large number of charts that span different date ranges and have different frequencies.
Here is some example code that replicates the problem:
library(ggplot2)
library(reshape2)   # for melt()
library(gridExtra)
dfm <- data.frame(index=seq(from=as.Date("2000-01-01"), length.out=100, by="year"),
x1=rnorm(100),
x2=rnorm(100))
mydata <- melt(dfm, id="index")
pdf("test.pdf")
plot1 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot2 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot3 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot4 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
# the original post called arrange(); current gridExtra provides grid.arrange()
grid.arrange(plot1, plot2, plot3, plot4, ncol=2, nrow=2)
dev.off()
Either rotate the axis labels:
+ opts(axis.text.x=theme_text(angle=45, hjust=1))
Note that opts is deprecated in current versions of ggplot2. This functionality has been moved to theme():
+ theme(axis.text.x = element_text(angle = 45, hjust = 1))
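or thin out the x-axis breaks (older ggplot2 versions used major = "10 years"; current versions use date_breaks, with scale_x_date because index is of class Date):
+ scale_x_date(date_breaks = "10 years")
To shift the labels automatically, I think the arrange() function would need to be modified (though I'm not sure how).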
I wrote this function to return the proper major axis breaks given that you want some set number of major breaks.
year.range.major <- function(df, column = "index", n = 5){
range <- diff(range(df[,column]))
range.num <- as.numeric(range)
major = max(pretty((range.num/365)/n))
return(paste(major,"years"))
}
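So, instead of always fixing the breaks at 10 years, it produces a fixed number of breaks at nice intervals. The data frame has to be passed in, and current ggplot2 versions take the result through date_breaks rather than major:
+ scale_x_date(date_breaks = year.range.major(mydata))
or
+ scale_x_date(date_breaks = year.range.major(mydata, n=3))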
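As a possible alternative to a hand-written helper, the scales package has breaks_pretty(), which returns a break function that aims for roughly n breaks at round positions. The sketch below assumes a recent scales version and uses made-up data modelled on the question; treat it as an illustration rather than a drop-in replacement.

```r
library(ggplot2)

# Hypothetical yearly series similar to the example in the question
set.seed(42)
dfm <- data.frame(
  index = seq(as.Date("2000-01-01"), length.out = 100, by = "year"),
  value = rnorm(100)
)

p <- ggplot(dfm, aes(index, value)) +
  geom_line() +
  # Ask for roughly 5 "pretty" breaks, whatever the date range is
  scale_x_date(breaks = scales::breaks_pretty(n = 5), date_labels = "%Y")
print(p)
```

Because the break function is evaluated against each plot's own limits, the same scale call can be reused across charts that span different date ranges.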
