Breaks for scale_x_date in ggplot2 and R - r

I am plotting value~date in ggplot2 (in R). I have the following code. As you see ggplot2 adds more breaks on the x-axis that I have in my data. I just want to have the x-label everytime I have a data point in my data frame. How can I force ggplot2 to just show the breaks only at the values of my.dates? It seems there is no "breaks" argument for scale_x_date
require(ggplot2)
my.dates = as.Date(c("2011-07-22","2011-07-23",
"2011-07-24","2011-07-28","2011-07-29"))
my.vals = c(5,6,8,7,3)
my.data <- data.frame(date =my.dates, vals = my.vals)
plot(my.dates, my.vals)
p <- ggplot(data = my.data, aes(date,vals))+ geom_line(size = 1.5)
p <- p + scale_x_date(format="%m/%d", ' ')
p

One approach would be to treat the x-axis as numeric and set the breaks and labels aesthetics with scale_x_continuous().
ggplot(my.data, aes(as.numeric(date), vals)) +
geom_line(size = 1.5) +
scale_x_continuous(breaks = as.numeric(my.data$date)
, labels = format(my.data$date, format = "%m/%d"))
though the break between 7/24 through 7/28 looks a bit strange in my opinion. However, I think that's what you want? Let me know if I've misinterpreted.
EDIT
As noted above, I wasn't thrilled with the way the breaks looked above, specifically with the gray grid in the background. Here's one way to maintain the rectangular grid and to only label the points where we have data. You could do this all within the ggplot call, but I think it's easier to do the processing outside of ggplot. First, create a vector that contains the sequence of numbers corresponding to the dates. Then we'll update the appropriate labels and replace the NA entries with " " to prevent anything from being plotted on the x-axis for those entries:
xscale <- data.frame(breaks = seq(min(as.numeric(my.data$date)), max(as.numeric(my.data$date)))
, labels = NA)
xscale$labels[xscale$breaks %in% as.numeric(my.data$date)] <- format(my.data$date, format = "%m/%d")
xscale$labels[is.na(xscale$labels)] <- " "
This gives us something that looks like:
breaks labels
1 15177 07/22
2 15178 07/23
3 15179 07/24
4 15180
5 15181
6 15182
7 15183 07/28
8 15184 07/29
which can then be passed to the scale like this:
scale_x_continuous(breaks = xscale$breaks, labels = xscale$labels)

Related

Select data and name when pointing it chart with ggplotly

I did everything in ggplot, and it was everything working well. Now I need it to show data when I point a datapoint. In this example, the model (to identify point), and the disp and wt ( data in axis).
For this I added the shape (same shape, I do not actually want different shapes) to model data. and asked ggplot not to show shape in legend. Then I convert to plotly. I succeeded in showing the data when I point the circles, but now I am having problems with the legend showing colors and shapes separated with a comma...
I did not wanted to make it again from scrach in plotly as I have no experience in plotly, and this is part of a much larger shiny project, where the chart adjust automatically the axis scales and adds trend lines the the chart among other things (I did not include for simplicity) that I do not know how to do it in plotly.
Many thanks in advance. I have tried a million ways for a couple of days now, and did not succeed.
# choose mtcars data and add rowname as column as I want to link it to shapes in ggplot
data1 <- mtcars
data1$model <- rownames(mtcars)
# I turn cyl data to character as when charting it showed (Error: Continuous value supplied to discrete scale)
data1$cyl <- as.character(data1$cyl)
# linking colors with cylinders and shapes with models
ccolor <- c("#E57373","purple","green")
cylin <- c(6,4,8)
# I actually do not want shapes to be different, only want to show data of model when I point the data point.
models <- data1$model
sshapes <- rep(16,length(models))
# I am going to chart, do not want legend to show shape
graff <- ggplot(data1,aes(x=disp, y=wt,shape=model,col=cyl)) +
geom_point(size = 1) +
ylab ("eje y") + xlab('eje x') +
scale_color_manual(values= ccolor, breaks= cylin)+
scale_shape_manual(values = sshapes, breaks = models)+
guides(shape='none') # do not want shapes to show in legend
graff
chart is fine, but when converting to ggplotly, I am having trouble with the legend
# chart is fine, but when converting to ggplotly, I am having trouble with the legend
graffPP <- ggplotly(graff)
graffPP
legend is not the same as it was in ggplot
I succeeded in showing the model and data from axis when I point a datapoint in the chart... but now I am having problems with the legend....
To the best of my knowledge there is no easy out-of-the box solution to achieve your desired result.
Using pure plotly you could achieve your result by assigning legendgroups which TBMK is not available using ggplotly. However, you could assign the legend groups manually by manipulating the plotly object returned by ggplotly.
Adapting my answer on this post to your case you could achieve your desired result like so:
library(plotly)
p <- ggplot(data1, aes(x = disp, y = wt, shape = model, col = cyl)) +
geom_point(size = 1) +
ylab("eje y") +
xlab("eje x") +
scale_color_manual(values = ccolor, breaks = cylin) +
scale_shape_manual(values = sshapes, breaks = models) +
guides(shape = "none")
gp <- ggplotly(p = p)
# Get the names of the legend entries
df <- data.frame(id = seq_along(gp$x$data), legend_entries = unlist(lapply(gp$x$data, `[[`, "name")))
# Extract the group identifier, i.e. the number of cylinders from the legend entries
df$legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", df$legend_entries)
# Add an indicator for the first entry per group
df$is_first <- !duplicated(df$legend_group)
for (i in df$id) {
# Is the layer the first entry of the group?
is_first <- df$is_first[[i]]
# Assign the group identifier to the name and legendgroup arguments
gp$x$data[[i]]$name <- df$legend_group[[i]]
gp$x$data[[i]]$legendgroup <- gp$x$data[[i]]$name
# Show the legend only for the first layer of the group
if (!is_first) gp$x$data[[i]]$showlegend <- FALSE
}
gp

Create barplot to represent time series in ggplot2

I have a basic dataframe with 3 columns: (i) a date (when a sample was taken); (ii) a site location and (iii) a binary variable indicating what the condition was when sampling (e.g. wet versus dry).
Some reproducible data:
df <- data.frame(Date = rep(seq(as.Date("2010-01-01"), as.Date("2010-12-01"), by="months"),times=2))
df$Site <- c(rep("Site.A",times = 12),rep("Site.B",times = 12))
df$Condition<- as.factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0))
What I would like to do is use ggplot to create a bar chart indicating the condition of each site (y axis) over time (x axis) - the condition indicated by a different colour. I am guessing some kind of flipped barplot would be the way to do this, but I cannot figure out how to tell ggplot2 to recognise the values chronologically, rather than summed for each condition. This is my attempt so far which clearly doesn't do what I need it to.
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()
So I have 2 questions. Firstly, how do I tell ggplot to recognise changes in condition over time and not just group each condition in a traditional stacked bar chart?
Secondly, it seems ggplot converts the date to a numerical value, how would I reformat the x-axis to show a time period, e.g. in a month-year format? I have tried doing this via the scale_x_date function, but get an error message.
labDates <- seq(from = (head(df$Date, 1)),
to = (tail(df$Date, 1)), by = "1 months")
Datelabels <-format(labDates,"%b %y")
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()+
scale_x_date(labels = Datelabels, breaks=labDates)
I have also tried converting sampling times to factors and displaying these instead. Below I have done this by changing each sampling period to a letter (in my own code, the factor levels are in a month-year format - I put letters here for simplicity). But I cannot format the axis to place each level of the factor as a tick mark. Either a date or factor solution for this second question would be great!
df$Factor <- as.factor(unique(df$Date))
levels(df$Factor) <- list(A = "2010-01-01", B = "2010-02-01",
C = "2010-03-01", D = "2010-04-01", E = "2010-05-01",
`F` = "2010-06-01", G = "2010-07-01", H = "2010-08-01",
I = "2010-09-01", J = "2010-10-01", K= "2010-11-01", L = "2010-12-01")
ggplot(df) +
geom_bar(aes(x=Site,y=Date,fill=Condition),stat='identity')+coord_flip()+
scale_y_discrete(breaks=as.numeric(unique(df$Date)),
labels=levels(df$Factor))
Thank you in advance!
It doesn't really make sense to use geom_bar() considering you do not want to summarise the data and require the visualisation over "time"
I would rather use geom_line() and increase the line thickness if you want to portray a bar chart.
library(tidyr)
library(dplyr)
library(ggplot2)
library(scales)
library(lubridate)
df <- data.frame(Date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), by="months"),times=2))
df$Site <- c(rep("Site.A",times = 12),rep("Site.B",times = 12))
df$Condition<- as.factor(c(0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0))
df$Date <- ymd(df$Date)
ggplot(df) +
geom_line(aes(y=Site,x=Date,color=Condition),size=10)+
scale_x_date(labels = date_format("%b-%y"))
Note using coord_flip() also does not work, I think this causes the Date issue, see below threads:
how to use coord_carteisan and coord_flip together in ggplot2
In ggplot2, coord_flip and free scales don't work together

How to format x-axis tick label in sqrt(x) format?

I'm studying the example of coord_trans() of ggplot2:
library(ggplot2)
library(scales)
set.seed(4747)
df <- data.frame(a = abs(rnorm(26)),letters)
plot <- ggplot(df,aes(a,letters)) + geom_point()
plot + coord_trans(x = "log10")
plot + coord_trans(x = "sqrt")
I modified the code plot + coord_trans(x = "log10") as following and get what I expected:
plot + scale_x_log10(breaks=trans_breaks("log10", function(x) 10^x),
labels=trans_format("log10", math_format(10^.x)))
I modified the code plot + coord_trans(x = "sqrt") as following and get a strange x-axis:
plot + scale_x_sqrt(breaks=trans_breaks("sqrt", function(x) sqrt(x)),
labels=trans_format("sqrt", math_format(.x^0.5)))
How could I fix the problem?
I get why you said it was a strange / terrible axis. The documentation for trans_breaks even warns you about this in its first line:
These often do not produce very attractive breaks.
To make it less unattractive, I would use round(,2) so my axis labels only have 2 decimal points instead of the default 8 or 9 - cluttering up the axis. Then I would set a sensible range, say in your case 0 to 5 (c(0,5)).
Finally, you can specify the number of ticks for your axis using n in the trans_breaks call.
So putting it together, here's how you can format your x-axis and its tick label in the scale_x_sqrt(x) format:
plot <- ggplot(df,aes(a,letters)) + geom_point()
plot + scale_x_sqrt(breaks=trans_breaks("sqrt", function(x) round(sqrt(x),2), n=5)(c(0, 5)))
Produces this:
The c(0,5) is passed to pretty(), a lesser-known Base R's function. From the documentation, pretty does the following:
Compute a sequence of about n+1 equally spaced "round" values which cover the range of the values in x.
pretty(c(0,5)) simply produces [1] 0 1 2 3 4 5 in our case.
You can even fine-tune your axis by changing the parameters. Here the code uses 3 decimal points (round(x,3)) and we asked for 3 number of ticks n=3:
plot <- ggplot(df,aes(a,letters)) + geom_point()
plot + scale_x_sqrt(breaks=trans_breaks("sqrt", function(x) round(sqrt(x),3), n=3)(c(0, 5)))
Produces this:
EDIT based on OP's additional comments:
To get round integer values, floor() or round(x,0) works, so the following code:
plot <- ggplot(df,aes(a,letters)) + geom_point()
plot + scale_x_sqrt(breaks=trans_breaks("sqrt", function(x) round(sqrt(x),0), n=5)(c(0, 5)))
Produces this:

How to plot stacked point histograms?

What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:
Plot Histogram with Points Instead of Bars
Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)
ggplot2 does dotplots Link to the manual.
Here is an example:
library(ggplot2)
set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))
ggplot(x, aes(y)) + geom_dotplot()
In order to make it behave like a simple dotplot, we should do this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')
You should get this:
To address the density issue, you'll have to add another term, ylim(), so that your plot call will have the form ggplot() + geom_dotplot() + ylim()
More specifically, you'll write ylim(0, A), where A will be the number of stacked dots necessary to count 1.00 density. In the example above, the best you can do is see that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.
So your new call looks like this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)
Which will give you this:
Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.
Notice how changing the ylim values doesn't affect how the data is displayed, it just changes the labels in the y-axis.
As #joran pointed out, we can use geom_dotplot
require(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
Edit: (moved useful comments into the post):
The label "count" it's misleading because this is actually a density estimate may be you could suggest we changed this label to "density" by default. The ggplot implementation of dotplot follow the original one of Leland Wilkinson, so if you want to understand clearly how it works take a look at this paper.
An easy transformation to make the y axis actually be counts, i.e. "number of observations". From the help page it is written that:
When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.
So you can use this code to hide y axis:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5) +
scale_y_continuous(name = "", breaks = NULL)
I introduce an exact approach using #Waldir Leoncio's latter method.
library(ggplot2); library(grid)
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g # output to read parameter
### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8 # 0.8 is the argument binwidth
num_balls <- real_height / 1.1 / real_binwidth # the number of stacked balls. 1.1 is expanding value.
# num_balls is the value of A
g + ylim(0, num_balls)
Apologies : I don't have enough reputation to 'comment'.
I like cuttlefish44's "exact approach", but to make it work (with ggplot2 [2.2.1]) I had to change the following line from :
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
to
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)

Understanding dates and plotting a histogram with ggplot2 in R

Main Question
I'm having issues with understanding why the handling of dates, labels and breaks is not working as I would have expected in R when trying to make a histogram with ggplot2.
I'm looking for:
A histogram of the frequency of my dates
Tick marks centered under the matching bars
Date labels in %Y-b format
Appropriate limits; minimized empty space between edge of grid space and outermost bars
I've uploaded my data to pastebin to make this reproducible. I've created several columns as I wasn't sure the best way to do this:
> dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)
> head(dates)
YM Date Year Month
1 2008-Apr 2008-04-01 2008 4
2 2009-Apr 2009-04-01 2009 4
3 2009-Apr 2009-04-01 2009 4
4 2009-Apr 2009-04-01 2009 4
5 2009-Apr 2009-04-01 2009 4
6 2009-Apr 2009-04-01 2009 4
Here's what I tried:
library(ggplot2)
library(scales)
dates$converted <- as.Date(dates$Date, format="%Y-%m-%d")
ggplot(dates, aes(x=converted)) + geom_histogram()
+ opts(axis.text.x = theme_text(angle=90))
Which yields this graph. I wanted %Y-%b formatting, though, so I hunted around and tried the following, based on this SO:
ggplot(dates, aes(x=converted)) + geom_histogram()
+ scale_x_date(labels=date_format("%Y-%b"),
+ breaks = "1 month")
+ opts(axis.text.x = theme_text(angle=90))
stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
That gives me this graph
Correct x axis label format
The frequency distribution has changed shape (binwidth issue?)
Tick marks don't appear centered under bars
The xlims have changed as well
I worked through the example in the ggplot2 documentation at the scale_x_date section and geom_line() appears to break, label, and center ticks correctly when I use it with my same x-axis data. I don't understand why the histogram is different.
Updates based on answers from edgester and gauden
I initially thought gauden's answer helped me solve my problem, but am now puzzled after looking more closely. Note the differences between the two answers' resulting graphs after the code.
Assume for both:
library(ggplot2)
library(scales)
dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)
Based on #edgester's answer below, I was able to do the following:
freqs <- aggregate(dates$Date, by=list(dates$Date), FUN=length)
freqs$names <- as.Date(freqs$Group.1, format="%Y-%m-%d")
ggplot(freqs, aes(x=names, y=x)) + geom_bar(stat="identity") +
scale_x_date(breaks="1 month", labels=date_format("%Y-%b"),
limits=c(as.Date("2008-04-30"),as.Date("2012-04-01"))) +
ylab("Frequency") + xlab("Year and Month") +
theme_bw() + opts(axis.text.x = theme_text(angle=90))
Here is my attempt based on gauden's answer:
dates$Date <- as.Date(dates$Date)
ggplot(dates, aes(x=Date)) + geom_histogram(binwidth=30, colour="white") +
scale_x_date(labels = date_format("%Y-%b"),
breaks = seq(min(dates$Date)-5, max(dates$Date)+5, 30),
limits = c(as.Date("2008-05-01"), as.Date("2012-04-01"))) +
ylab("Frequency") + xlab("Year and Month") +
theme_bw() + opts(axis.text.x = theme_text(angle=90))
Plot based on edgester's approach:
Plot based on gauden's approach:
Note the following:
gaps in gauden's plot for 2009-Dec and 2010-Mar; table(dates$Date) reveals that there are 19 instances of 2009-12-01 and 26 instances of 2010-03-01 in the data
edgester's plot starts at 2008-Apr and ends at 2012-May. This is correct based on a minimum value in the data of 2008-04-01 and a max date of 2012-05-01. For some reason gauden's plot starts in 2008-Mar and still somehow manages to end at 2012-May. After counting bins and reading along the month labels, for the life of me I can't figure out which plot has an extra or is missing a bin of the histogram!
Any thoughts on the differences here? edgester's method of creating a separate count
Related References
As an aside, here are other locations that have information about dates and ggplot2 for passers-by looking for help:
Started here at learnr.wordpress, a popular R blog. It stated that I needed to get my data into POSIXct format, which I now think is false and wasted my time.
Another learnr post recreates a time series in ggplot2, but wasn't really applicable to my situation.
r-bloggers has a post on this, but it appears outdated. The simple format= option did not work for me.
This SO question is playing with breaks and labels. I tried treating my Date vector as continuous and don't think it worked so well. It looked like it was overlaying the same label text over and over so the letters looked kind of odd. The distribution is sort of correct but there are odd breaks. My attempt based on the accepted answer was like so (result here).
UPDATE
Version 2: Using Date class
I update the example to demonstrate aligning the labels and setting limits on the plot. I also demonstrate that as.Date does indeed work when used consistently (actually it is probably a better fit for your data than my earlier example).
The Target Plot v2
The Code v2
And here is (somewhat excessively) commented code:
library("ggplot2")
library("scales")
dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)
dates$Date <- as.Date(dates$Date)
# convert the Date to its numeric equivalent
# Note that Dates are stored as number of days internally,
# hence it is easy to convert back and forth mentally
dates$num <- as.numeric(dates$Date)
bin <- 60 # used for aggregating the data and aligning the labels
p <- ggplot(dates, aes(num, ..count..))
p <- p + geom_histogram(binwidth = bin, colour="white")
# The numeric data is treated as a date,
# breaks are set to an interval equal to the binwidth,
# and a set of labels is generated and adjusted in order to align with bars
p <- p + scale_x_date(breaks = seq(min(dates$num)-20, # change -20 term to taste
max(dates$num),
bin),
labels = date_format("%Y-%b"),
limits = c(as.Date("2009-01-01"),
as.Date("2011-12-01")))
# from here, format at ease
p <- p + theme_bw() + xlab(NULL) + opts(axis.text.x = theme_text(angle=45,
hjust = 1,
vjust = 1))
p
Version 1: Using POSIXct
I try a solution that does everything in ggplot2, drawing without the aggregation, and setting the limits on the x-axis between the beginning of 2009 and the end of 2011.
The Target Plot v1
The Code v1
library("ggplot2")
library("scales")
dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)
dates$Date <- as.POSIXct(dates$Date)
p <- ggplot(dates, aes(Date, ..count..)) +
geom_histogram() +
theme_bw() + xlab(NULL) +
scale_x_datetime(breaks = date_breaks("3 months"),
labels = date_format("%Y-%b"),
limits = c(as.POSIXct("2009-01-01"),
as.POSIXct("2011-12-01")) )
p
Of course, it could do with playing with the label options on the axis, but this is to round off the plotting with a clean short routine in the plotting package.
I know this is an old question, but for anybody coming to this in 2021 (or later), this can be done much easier using the breaks= argument for geom_histogram() and creating a little shortcut function to make the required sequence.
dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)
dates$Date <- lubridate::ymd(dates$Date)
by_month <- function(x,n=1){
seq(min(x,na.rm=T),max(x,na.rm=T),by=paste0(n," months"))
}
ggplot(dates,aes(Date)) +
geom_histogram(breaks = by_month(dates$Date)) +
scale_x_date(labels = scales::date_format("%Y-%b"),
breaks = by_month(dates$Date,2)) +
theme(axis.text.x = element_text(angle=90))
I think the key thing is that you need to do the frequency calculation outside of ggplot. Use aggregate() with geom_bar(stat="identity") to get a histogram without the reordered factors. Here is some example code:
require(ggplot2)
# scales goes with ggplot and adds the needed scale* functions
require(scales)
# need the month() function for the extra plot
require(lubridate)
# original data
#df<-read.csv("http://pastebin.com/download.php?i=sDzXKFxJ", header=TRUE)
# simulated data
years=sample(seq(2008,2012),681,replace=TRUE,prob=c(0.0176211453744493,0.302496328928047,0.323054331864905,0.237885462555066,0.118942731277533))
months=sample(seq(1,12),681,replace=TRUE)
my.dates=as.Date(paste(years,months,01,sep="-"))
df=data.frame(YM=strftime(my.dates, format="%Y-%b"),Date=my.dates,Year=years,Month=months)
# end simulated data creation
# sort the list just to make it pretty. It makes no difference in the final results
df=df[do.call(order, df[c("Date")]), ]
# add a dummy column for clarity in processing
df$Count=1
# compute the frequencies ourselves
freqs=aggregate(Count ~ Year + Month, data=df, FUN=length)
# rebuild the Date column so that ggplot works
freqs$Date=as.Date(paste(freqs$Year,freqs$Month,"01",sep="-"))
# I set the breaks for 2 months to reduce clutter
g<-ggplot(data=freqs,aes(x=Date,y=Count))+ geom_bar(stat="identity") + scale_x_date(labels=date_format("%Y-%b"),breaks="2 months") + theme_bw() + opts(axis.text.x = theme_text(angle=90))
print(g)
# don't overwrite the previous graph
dev.new()
# just for grins, here is a faceted view by year
# Add the Month.name factor to have things work. month() keeps the factor levels in order
freqs$Month.name=month(freqs$Date,label=TRUE, abbr=TRUE)
g2<-ggplot(data=freqs,aes(x=Month.name,y=Count))+ geom_bar(stat="identity") + facet_grid(Year~.) + theme_bw()
print(g2)
The error graph this under the title "Plot based on Gauden's approach" is due to the binwidth parameter:
... + Geom_histogram (binwidth = 30, color = "white") + ...
If we change the value of 30 to a value less than 20, such as 10, you will get all frequencies.
In statistics the values are more important than the presentation is more important a bland graphic to a very pretty picture but with errors.

Resources