Dropping data outside valid range when using geom_ma in scatterplot - r

I have four categories that I am plotting here using ggplot. I would like to add a moving average using geom_ma, but I have too few of the green dots to get a good moving average (I would prefer a period of at least 20). How can I keep the scatterplot as is and only add an MA of the purple and blue dots, which would be in my range of a 20-period moving average?
Example:
ggplot(data, aes(x, y, color=Str)) + geom_point(stat="identity") + geom_ma(ma_fun = SMA, n = 20, linetype=1, size=1, na.rm=TRUE)
I get the error: "Warning message:
Computation failed in stat_sma():
n = 20 is outside valid range: [1, 10]"

This is a great example of why it helps to provide a minimal reproducible example. You have provided the code that produced the error, but there is nothing wrong with the code on its own: it will only cause this error with certain inputs. Given suitable data, your code is fine.
Let's make a dummy data frame with the same name and column names as your data frame. We will make data for the first 330 days of 2020, and we will have 4 groups in Str, so a total of 1320 rows:
library(tidyquant)
library(ggplot2)
set.seed(1)
data <- data.frame(x = rep(seq(as.Date("2020-01-01"),
by = "day", length.out = 330), 4),
y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
Str = rep(c("A", "B", "C", "D"), each = 330))
Now if we use your exact plotting code, we can see that the plot is fine:
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)
But if one or more of our Str groups has fewer than 20 measurements, then we get your error. Let's remove most of the Str == "A" and Str == "B" cases, and repeat the plot:
data <- data[c(1:20 * 33, 661:1320),]
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)
#> Warning: Computation failed in `stat_sma()`:
#> n = 20 is outside valid range: [1, 10]
We get your exact warning, and the MA lines disappear from all the groups. Clearly we cannot get a 20-measurement moving average if we only have 10 data points, so geom_ma just gives up.
The fix here is to use the data = argument in geom_ma to filter out any groups with fewer than 20 data points:
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE,
data = data[data$Str %in% names(table(data$Str)[table(data$Str) >= 20]),])
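The filtering step can be checked on its own before plotting. The sketch below uses a small made-up data frame (`toy`, `n_window`, and `ma_data` are illustrative names, not from the question) and base R's `ave()` to count rows per group. Only groups with at least `n_window` rows survive, which is the kind of subset one would pass to the `data =` argument of `geom_ma`.

```r
# Made-up data: group "A" has 10 rows, "B" has exactly 20, "C" has 30.
toy <- data.frame(
  Str = rep(c("A", "B", "C"), times = c(10, 20, 30)),
  y   = seq_len(60)
)

n_window <- 20  # moving-average period

# ave() returns, for every row, the size of the group that row belongs to.
group_size <- ave(seq_along(toy$Str), toy$Str, FUN = length)

# Keep only rows from groups that can support an n_window-point average.
ma_data <- toy[group_size >= n_window, ]

table(as.character(ma_data$Str))
```

Note that the comparison is `>=`, so a group with exactly 20 rows is kept for a 20-period average.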

Related

Loop printing lots of graphs in order (PDF) using ggplot2 in R

I have a large dataset as a result of a Bayesian logistic regression. The dataset contains parameter estimates, confidence intervals, etc. (see below for head).
        mean         sd confint_2.5 confint_97.5     Rhat     median    spec    Errorup Errordown
1 -0.7897597 0.18668304  -1.1759960   -0.4517294 1.002211 -0.7811156 Marvulg -0.3293862 -1.957112
2 -0.7891327 0.08145761  -0.9570086   -0.6380287 1.000155 -0.7861764 Viotric -0.1481477 -1.743185
3 -0.6619662 0.26049168  -1.2203315   -0.2059030 1.045208 -0.6440501 Antdioi -0.4381470 -1.864382
4 -0.6571516 0.17940842  -1.0417642   -0.3364415 1.008100 -0.6470382 Eleacic -0.3105968 -1.688802
5 -0.6526717 0.20005184  -1.0816375   -0.2968111 1.005126 -0.6394952 Antcotu -0.3426842 -1.721133
6 -0.6497648 0.16620699  -1.0081607   -0.3555847 1.003738 -0.6384035 Triflav -0.2828188 -1.646564
I have a total of 714 rows of data, sorted by mean from low to high. I use this code to plot 50 at a time, where a3_sort is a subset of 50 rows of data (so I manually do a3_sort <- a3[n:n,], print the subset, and proceed to the next 50):
ggplot2::ggplot(data = a3_sort, mapping = aes(x = reorder(spec, mean), y = mean, ymin = confint_97.5, ymax = confint_2.5))+
geom_pointrange()+
geom_hline(yintercept = 0, lty = 2)+
coord_flip()+
xlab ("species") +ylab ("mean (credibility interval)")+
theme_bw()
This works, and I get what I want, but there must be a less labour-intensive way to do this.
My question: Is there a way to loop this procedure, automatically saving the PDF in the working directory?
Below is an example of what one plot looks like:
You can try this solution. I tested with dummy data DF with 714 rows and same columns as you have. DF in your case is your sorted dataframe of 714 rows and the variables you have. I have set the code so that you can change if you require a width larger than 50.
library(zoo)
library(ggplot2) # needed for aes() and the geoms used in myplot()
#Create keys; change 50 if you want a larger window
keys <- seq(1, nrow(DF), 50)
vals=1:length(keys)
#Flag to allocate the position and values
#na.locf is used to complete NA so that we have same index
DF$Flag <- NA
DF$Flag[keys]<-vals
DF$Flag <- na.locf(DF$Flag)
#Then split by flag
ListData <- split(DF,DF$Flag)
#Function to create plot
myplot <- function(x)
{
tplot <- ggplot2::ggplot(data = x, mapping = aes(x = reorder(spec, mean), y = mean, ymin = confint_97.5, ymax = confint_2.5))+
geom_pointrange()+
geom_hline(yintercept = 0, lty = 2)+
coord_flip()+
xlab ("species") +ylab ("mean (credibility interval)")+
theme_bw()
return(tplot)
}
#Replicate plots
LPlots <- lapply(ListData,myplot)
#Export to pdf
pdf('Myplots.pdf',width = 14)
for(i in c(1:length(LPlots)))
{
plot(LPlots[[i]])
}
dev.off()
In the end, you will have your plots in pdf. I hope this helps. Let me know if you have any doubt.
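The chunk index built with `na.locf` above can also be computed arithmetically, which avoids the zoo dependency. This is a minimal sketch with a stand-in data frame (`DF` here holds only a hypothetical `id` column, and `chunk_size` and `chunk_id` are illustrative names); the resulting list can be passed to the same `lapply(ListData, myplot)` call.

```r
# Stand-in for the sorted 714-row data frame.
DF <- data.frame(id = seq_len(714))

chunk_size <- 50

# Rows 1-50 get chunk 1, rows 51-100 get chunk 2, and so on.
chunk_id <- ceiling(seq_len(nrow(DF)) / chunk_size)

# One data frame per chunk; the last one holds the leftover rows.
ListData <- split(DF, chunk_id)

length(ListData)
sapply(ListData, nrow)
```

With 714 rows and a chunk size of 50, this gives 14 full chunks and a final chunk of 14 rows.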
This approach could be adapted to your case:
# Some dummy data:
df <- data.frame(g = letters[1:24],
min = sample(0:10, 24, replace = TRUE),
mid = sample(11:20, 24, replace = TRUE),
max = sample(21:30, 24, replace = TRUE))
library(ggplot2)
library(purrr)
# list of the rows you want printing, this could be automated
plot_range <- list(p1_6 = 1:6, p7_12 = 7:12, p13_18 = 13:18, p19_24 = 19:24)
# plotting function which also sets a title and plot name
gg_plot <- function(df, plot_rows){
title <- paste("Automatic plot rows: ", min(plot_rows), "to", max(plot_rows))
plot_nm <- paste0(paste("plots", min(plot_rows), max(plot_rows), sep = "_"), ".pdf") # add the extension so the file is recognised as a PDF
p <- ggplot(df[plot_rows, ])+
geom_segment(aes(x = min , xend = max, y = g, yend = g))+
geom_point(aes(x = mid, y = g))+
ggtitle(title)
print(ggsave(plot_nm, p, device = "pdf"))
}
# purrr function which acts as a loop to print each graph and allows a different data frame to be used.
walk(plot_range, ~gg_plot(df = df, plot_rows = .x))
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
#> Saving 7 x 5 in image
#> NULL
Created on 2020-07-11 by the reprex package (v0.3.0)
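The comment in the code above notes that the list of row ranges could be automated. One possible way to do that, sketched here with assumed sizes (`n_rows` and `per_plot` are illustrative names), is to generate the start of each range with `seq()` and build the named list from those:

```r
# Assumed sizes matching the dummy data above: 24 rows, 6 rows per plot.
n_rows   <- 24
per_plot <- 6

# First row of each plot: 1, 7, 13, 19.
starts <- seq(1, n_rows, by = per_plot)

# Each element is the vector of row numbers for one plot; min() guards the
# last range when n_rows is not a multiple of per_plot.
plot_range <- lapply(starts, function(s) s:min(s + per_plot - 1, n_rows))

# Names in the same style as the hand-written list (p1_6, p7_12, ...).
names(plot_range) <- vapply(
  plot_range,
  function(r) paste0("p", min(r), "_", max(r)),
  character(1)
)

str(plot_range)
```

The result has the same shape as the hand-written `plot_range`, so it can be passed to `walk()` unchanged.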

Advice on how to plot side by side histograms with line graph going through in ggplot2

I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I also would like to draw a trend line for the median values since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like it was one graph, but I have had issues with the endpoints matching up since each facet is a separate graphical device. Can anyone give me some advice on how to do this? I am not particularly partial to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val)) # plyr is not attached, so summarise must be qualified too
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median") # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
This produces a violin for each value of rho, and joins the medians for each violin.
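To see the values that the median line passes through, the per-group medians can be computed directly. This is a small base R sketch using the same made-up data as above; the seed is arbitrary and only added so the numbers are reproducible, and `med` is an illustrative name.

```r
set.seed(42)  # arbitrary seed, not part of the original answer

df <- data.frame(
  rho = rep(c(0.1, 0.2, 0.3), each = 50),
  val = sample(1:10, 150, replace = TRUE)
)
df$val <- df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))

# One row per rho, holding the median of val within that group.
med <- aggregate(val ~ rho, data = df, FUN = median)
med
```

A data frame like `med` could also be supplied to a `geom_line()` layer if a pre-computed trend line is preferred over `stat_summary()`.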

Fill area between two lines, with high/low and dates

Foreword: I provide a reasonably satisfactory answer to my own question. I understand this is acceptable practice. Naturally my hope is to invite suggestions and improvements.
My purpose is to plot two time series (stored in a dataframe with dates stored as class 'Date') and to fill the area between the data points with two different colors according to whether one is above the other. For instance, to plot an index of Bonds and an index of Stocks, and to fill the area in red when the Stock index is above the bond index, and to fill the area in blue otherwise.
I have used ggplot2 for this purpose, because I am reasonably familiar with the package (author: Hadley Wickham), but feel free to suggest other approaches. I wrote a custom function based on the geom_ribbon() function of the ggplot2 package. Early on I faced problems related to my lack of experience in handling the geom_ribbon() function and objects of class 'Date'. The function below represents my effort to solve these problems; almost surely it is roundabout, unnecessarily complicated, clumsy, etc. So my question is: please suggest improvements and/or alternative approaches. Ultimately, it would be great to have a general-purpose function made available here.
Data:
set.seed(123456789)
df <- data.frame(
Date = seq.Date(as.Date("1950-01-01"), by = "1 month", length.out = 12*10),
Stocks = 100 + c(0, cumsum(runif(12*10-1, -30, 30))),
Bonds = 100 + c(0, cumsum(runif(12*10-1, -5, 5))))
library('reshape2')
df <- melt(df, id.vars = 'Date')
Custom Function:
## Function to plot geom_ribbon for class Date
geom_ribbon_date <- function(data, group, N = 1000) {
# convert column of class Date to numeric
x_Date <- as.numeric(data[, which(sapply(data, class) == "Date")])
# append numeric date to dataframe
data$Date.numeric <- x_Date
# ensure fill grid is as fine as data grid
N <- max(N, length(x_Date))
# generate a grid for fill
seq_x_Date <- seq(min(x_Date), max(x_Date), length.out = N)
# ensure the grouping variable is a factor
group <- factor(group)
# create a dataframe of min and max
area <- Map(function(z) {
d <- data[group == z,];
approxfun(d$Date.numeric, d$value)(seq_x_Date);
}, levels(group))
# create a categorical variable for the max
maxcat <- apply(do.call('cbind', area), 1, which.max)
# output a dataframe with x, ymin, ymax, is. max 'dummy', and group
df <- data.frame(x = seq_x_Date,
ymin = do.call('pmin', area),
ymax = do.call('pmax', area),
is.max = levels(group)[maxcat],
group = cumsum(c(1, diff(maxcat) != 0))
)
# convert back numeric dates to column of class Date
df$x <- as.Date(df$x, origin = "1970-01-01")
# create and return the geom_ribbon
gr <- geom_ribbon(data = df, aes(x, ymin = ymin, ymax = ymax, fill = is.max, group = group), inherit.aes = FALSE)
return(gr)
}
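The `group = cumsum(c(1, diff(maxcat) != 0))` line in the function above is what keeps separate ribbon segments from being joined: it gives every consecutive run of the same "top" series its own id. A tiny standalone illustration, with a made-up `maxcat` vector:

```r
# Made-up index of which series is on top at each grid point (1 or 2).
maxcat <- c(1, 1, 2, 2, 2, 1, 2)

# diff() is non-zero wherever the top series changes; cumsum() turns those
# change points into a running segment id.
run_id <- cumsum(c(1, diff(maxcat) != 0))

run_id
#> [1] 1 1 2 2 2 3 4
```

Points 1-2 and point 6 share the same top series but get different ids (1 and 3), so they are drawn as two separate polygons rather than one that spans the gap.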
Usage:
ggplot(data = df, aes(x = Date, y = value, group = variable, colour = variable)) +
geom_ribbon_date(data = df, group = df$variable) +
theme_bw() +
xlab(NULL) +
ylab(NULL) +
ggtitle("Bonds Versus Stocks (Fake Data!)") +
scale_fill_manual('is.max', breaks = c('Stocks', 'Bonds'),
values = c('darkblue','darkred')) +
theme(legend.position = 'right', legend.direction = 'vertical') +
theme(legend.title = element_blank()) +
theme(legend.key = element_blank())
Result:
While there are related questions and answers on Stack Overflow, I haven't found one that was sufficiently detailed for my purpose. Here is a selection of useful exchanges:
create-geom-ribbon-for-min-max-range: Asks a similar question, but provides less detail than I was looking for.
possible-bug-in-geom-ribbon: Closely related, but intermediate steps on how to compute max/min are missing.
fill-region-between-two-loess-smoothed-lines-in-r-with-ggplot: Closely related, but focuses on loess lines. Excellent.
ggplot-colouring-areas-between-density-lines-according-to-relative-position : Closely related, but focuses on densities. This post greatly inspired me.
Perhaps I'm not understanding your full problem but it seems that a fairly direct approach would be to define a third line as the minimum of the two time series at each time point. geom_ribbon is then called twice (once for each unique value of Asset) to plot the ribbons formed by each of the series and the minimum line. Code could look like:
set.seed(123456789)
df <- data.frame(
Date = seq.Date(as.Date("1950-01-01"), by = "1 month", length.out = 12*10),
Stocks = 100 + c(0, cumsum(runif(12*10-1, -30, 30))),
Bonds = 100 + c(0, cumsum(runif(12*10-1, -5, 5))))
library(reshape2)
library(ggplot2)
df <- cbind(df,min_line=pmin(df[,2],df[,3]) )
df <- melt(df, id.vars=c("Date","min_line"), variable.name="Assets", value.name="Prices")
sp <- ggplot(data=df, aes(x=Date, fill=Assets))
sp <- sp + geom_ribbon(aes(ymax=Prices, ymin=min_line))
sp <- sp + scale_fill_manual(values=c(Stocks="darkred", Bonds="darkblue"))
sp <- sp + ggtitle("Bonds Versus Stocks (Fake Data!)")
plot(sp)
This produces the following chart:
I actually had the same question some time ago and here is the related post. It defines a function that finds the intersections between two lines and another function that takes a data frame as input and then colors the space between the two columns using matplot and polygon.
EDIT
Here is the code, modified a bit to allow the last polygon to be plotted
set.seed(123456789)
dat <- data.frame(
Date = seq.Date(as.Date("1950-01-01"), by = "1 month", length.out = 12*10),
Stocks = 100 + c(0, cumsum(runif(12*10-1, -30, 30))),
Bonds = 100 + c(0, cumsum(runif(12*10-1, -5, 5))))
intersects <- function(x1, x2) {
seg1 <- which(!!diff(x1 > x2)) # location of first point in crossing segments
above <- x2[seg1] > x1[seg1] # which curve is above prior to crossing
slope1 <- x1[seg1+1] - x1[seg1]
slope2 <- x2[seg1+1] - x2[seg1]
x <- seg1 + ((x2[seg1] - x1[seg1]) / (slope1 - slope2))
y <- x1[seg1] + slope1*(x - seg1)
data.frame(x=x, y=y, pindex=seg1, pabove=(1:2)[above+1L])
# pabove is greater curve prior to crossing
}
fillColor <- function(data, addLines=TRUE) {
## Find points of intersections
ints <- intersects(data[,2], data[,3]) # because the first column is for Dates
intervals <- findInterval(1:nrow(data), c(0, ints$x))
## Make plot
matplot(data, type="n", col=2:3, lty=1, lwd=4,xaxt='n',xlab='Date')
axis(1,at=seq(1,dim(data)[1],length.out=12),
labels=data[,1][seq(1,dim(data)[1],length.out=12)])
legend("topright", c(colnames(data)[2], colnames(data)[3]), col=3:2, lty=1, lwd=2)
## Draw the polygons
for (i in seq_along(table(intervals))) {
xstart <- ifelse(i == 1, 0, ints$x[i-1])
ystart <- ifelse(i == 1, data[1,2], ints$y[i-1])
xend <- ints$x[i]
yend <- ints$y[i]
x <- seq(nrow(data))[intervals == i]
polygon(c(xstart, x, xend, rev(x)), c(ystart, data[x,2], yend, rev(data[x,3])),
col=ints$pabove[i]%%2+2)
}
# add end of plot
xstart <- ints[dim(ints)[1],1]
ystart <- ints[dim(ints)[1],2]
xend <- nrow(data)
yend <- data[dim(data)[1],2]
x <- seq(nrow(data))[intervals == max(intervals)]
polygon(c(xstart, x, xend, rev(x)), c(ystart, data[x,2], yend, rev(data[x,3])),
col=ints[dim(ints)[1]-1,4]%%2+2)
## Add lines for curves
if (addLines)
invisible(lapply(1:2, function(x) lines(seq(nrow(data)), data[,x+1], col=x%%2+2, lwd=2))) # x+1 skips the Date column
}
## Plot the data
fillColor(dat,FALSE)
and the final result is this (with the same data used for the question)
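The crossing points in `intersects()` come from linear interpolation between neighbouring observations. As a worked check of that arithmetic, here is the same calculation on two made-up two-point series (the names `seg`, `cross_x`, and `cross_y` are illustrative):

```r
# Series 1 rises from 0 to 2; series 2 stays flat at 1.
x1 <- c(0, 2)
x2 <- c(1, 1)

# Index of the first point of the segment in which the order flips.
seg <- which(diff(x1 > x2) != 0)

slope1 <- x1[seg + 1] - x1[seg]
slope2 <- x2[seg + 1] - x2[seg]

# Solve x1[seg] + slope1 * t == x2[seg] + slope2 * t for t, then shift by seg.
cross_x <- seg + (x2[seg] - x1[seg]) / (slope1 - slope2)
cross_y <- x1[seg] + slope1 * (cross_x - seg)

c(x = cross_x, y = cross_y)
#>   x   y
#> 1.5 1.0
```

The lines meet halfway between the two observations, at height 1, which is where a polygon boundary would be placed.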
The answer by @walts should remain the winner, but while implementing his solution I gave it a tidy update.
library(tidyverse)
set.seed(2345)
# fake data
raw_data <-
tibble(
date = as.Date("2020-01-01") + (1:40),
a = 95 + cumsum(runif(40, min = -20, max = 20)),
b = 55 + cumsum(runif(40, min = -1, max = 1))
)

# the steps
# the 'y' + 'min_line' + 'group' is the right granularity (by date) to
# create 2 separate ribbons
df <-
raw_data %>%
# find min of the two columns
mutate(min_line = pmin(a, b)) %>%
pivot_longer(c(a, b), names_to = "group", values_to = "y") %>%
print()

# the result
ggplot(data = df, aes(x = date, fill = group)) +
geom_ribbon(aes(ymax = y, ymin = min_line)) +
theme_classic()
Another option is to use ggh4x, which requires the data to be wide, with the y values for lines 1 and 2 in different columns.
library(ggh4x)
#> Loading required package: ggplot2
set.seed(123456789)
df <- data.frame(
Date = seq.Date(as.Date("1950-01-01"), by = "1 month", length.out = 12*10),
Stocks = 100 + c(0, cumsum(runif(12*10-1, -30, 30))),
Bonds = 100 + c(0, cumsum(runif(12*10-1, -5, 5))))
## The data frame is NOT made long!!
ggplot(data = df, aes(x = Date)) +
stat_difference(aes(ymin = Stocks, ymax = Bonds)) +
scale_fill_brewer(palette = "Set1")
Created on 2022-11-24 with reprex v2.0.2

ggplot in function using variables -- geom_density: arguments imply differing number of rows

I have a function to print and save some charts using ggplot2. When execution reaches geom_density, this error message shows up:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous
Error in data.frame(x = 1:5, y = c(44.43, 72.36, 177.17, 515.09, 1403.33 :
arguments imply differing number of rows: 5, 0
After some research, I found that I may be missing the group name and some instruction, but I cannot pinpoint the error. I have listed the entire data.
library(ggplot2)
dt <- read.table("/R/10G.csv", header=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
dt$Thread <- factor(dt$Thread) # factorize 'Thread'
library(plyr)
dd.mean <- ddply(dt, 'Thread', summarize, TPS = round(mean(TPS), 2), RT = round(mean(RT), 2))
m <- ggplot(dd.mean, aes(x=Thread,y=RT, group=seq))
m + geom_density(fill=NA)
m + geom_text(data=dd.mean, aes(x=Thread, label=TPS), vjust=-2)
You should research more into ggplot(), specifically geom_line and geom_point. I don't think geom_density is what you were looking for here. Below is an example of one way you could approach this task. There are many other approaches you could take, which is why I recommend looking into the documentation more. Some links that may help:
http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_%28ggplot2%29/.
http://sape.inf.usi.ch/quick-reference/ggplot2/geom_text
Note: I didn't factorize dt$Thread
library(plyr)
dd.mean <- ddply(dt, 'Thread', summarize, TPS = round(mean(TPS), 2), RT = round(mean(RT), 2))
ggplot(dd.mean, aes(x = Thread, y = RT)) +
geom_line(size = 1, alpha = 0.3, colour = "red") +
geom_point(size = 3, alpha = 1, colour = "red") +
geom_text(aes(label = TPS), vjust = -2) # x and y are inherited from ggplot()
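As for the error itself: the message "Don't know how to automatically pick scale for object of type function" suggests that `group = seq` was resolved to the base R function `seq()`, since the summarised data has no column of that name. A minimal sketch of a line plot that keeps `Thread` as a factor and uses a constant group instead; the RT values are the ones printed in the error message, while the Thread values 1 to 5 are an assumption for illustration.

```r
library(ggplot2)

# RT values are taken from the error message; Thread values are assumed.
dd <- data.frame(
  Thread = factor(1:5),
  RT     = c(44.43, 72.36, 177.17, 515.09, 1403.33)
)

# group = 1 places every row in one group, so geom_line() connects the
# points even though the x variable is a factor.
p <- ggplot(dd, aes(x = Thread, y = RT, group = 1)) +
  geom_line() +
  geom_point()

# Number of rows actually passed to the line layer.
n_drawn <- nrow(layer_data(p, 1))
n_drawn
```

Printing `p` draws the chart. Without a `group` mapping, a factor x-axis would put each point in its own group and `geom_line()` would have nothing to connect.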

How to prevent two labels from overlapping in a barchart?

The image below shows a chart that I created with the code below. I highlighted the missing or overlapping labels. Is there a way to tell ggplot2 to not overlap labels?
week = c(0, 1, 1, 1, 1, 2, 2, 3, 4, 5)
statuses = c('Shipped', 'Shipped', 'Shipped', 'Shipped', 'Not-Shipped', 'Shipped', 'Shipped', 'Shipped', 'Not-Shipped', 'Shipped')
dat <- data.frame(Week = week, Status = statuses)
p <- qplot(factor(Week), data = dat, geom = "bar", fill = factor(Status))
p <- p + geom_bar()
# Below is the most important line, that's the one which displays the value
p <- p + stat_bin(aes(label = ..count..), geom = "text", vjust = -1, size = 3)
p
You can use a variant of the well-known population pyramid.
Some sample data (code inspired by Didzis Elferts' answer):
set.seed(654)
week <- sample(0:9, 3000, rep=TRUE, prob = rchisq(10, df = 3))
status <- factor(rbinom(3000, 1, 0.15), labels = c("Shipped", "Not-Shipped"))
data.df <- data.frame(Week = week, Status = status)
Compute count scores for each week, then convert one category to negative values:
library("plyr")
plot.df <- ddply(data.df, .(Week, Status), nrow)
plot.df$V1 <- ifelse(plot.df$Status == "Shipped",
plot.df$V1, -plot.df$V1)
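The same count-and-flip step can be sketched without plyr. This is an alternative using base R's `table()`, not the answer's own method, on a smaller made-up sample. One difference worth noting: `table()` also returns combinations with a count of zero, whereas `ddply()` only returns combinations that occur in the data.

```r
set.seed(654)

# Smaller made-up sample: 300 shipments over weeks 0-9.
data.df <- data.frame(
  Week   = sample(0:9, 300, replace = TRUE),
  Status = factor(rbinom(300, 1, 0.15), levels = 0:1,
                  labels = c("Shipped", "Not-Shipped"))
)

# Count every Week/Status combination; the count column is called Freq.
plot.df <- as.data.frame(table(Week = data.df$Week, Status = data.df$Status))

# Mirror the "Not-Shipped" counts below the baseline.
plot.df$V1 <- ifelse(plot.df$Status == "Shipped", plot.df$Freq, -plot.df$Freq)

head(plot.df)
```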
Draw the plot. Note that the y-axis labels are adapted to show positive values on either side of the baseline.
library("ggplot2")
ggplot(plot.df) +
aes(x = as.factor(Week), y = V1, fill = Status) +
geom_bar(stat = "identity", position = "identity") +
scale_y_continuous(breaks = 100 * -1:5,
labels = 100 * c(1, 0:5)) +
geom_text(aes(y = sign(V1) * max(V1) / 30, label = abs(V1)))
The plot:
For production purposes you'd need to determine the appropriate y-axis tick labels dynamically.
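One possible way to avoid hard-coding the tick labels, offered as a sketch rather than as part of the original answer, is to pass a function to the `labels` argument of `scale_y_continuous()`. ggplot2 then chooses the breaks itself and the function only changes how they are displayed. The data frame below is made up for illustration.

```r
library(ggplot2)

# Made-up counts; negative values are the mirrored "Not-Shipped" bars.
plot.df <- data.frame(
  Week   = factor(rep(0:2, times = 2)),
  Status = rep(c("Shipped", "Not-Shipped"), each = 3),
  V1     = c(120, 340, 90, -20, -55, -10)
)

p <- ggplot(plot.df, aes(x = Week, y = V1, fill = Status)) +
  geom_bar(stat = "identity", position = "identity") +
  # Breaks are chosen automatically; abs() shows them as positive numbers
  # on both sides of the baseline.
  scale_y_continuous(labels = function(b) abs(b)) +
  geom_text(aes(label = abs(V1), vjust = ifelse(V1 >= 0, -0.5, 1.5)))

# Number of bars passed to the bar layer.
n_bars <- nrow(layer_data(p, 1))
n_bars
```

This keeps the axis correct when the data range changes, at the cost of less control over exactly which ticks appear.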
Made new sample data (inspired by the code of @agstudy).
week <- sample(0:5,1000,rep=TRUE,prob=c(0.2,0.05,0.15,0.5,0.03,0.1))
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
Using ddply() from the plyr package, make a new data frame text.df for the labels. The column count contains the number of observations in each combination of Week and Status. Then add a column ypos that contains the cumulative sum of count within each Week, plus 15; this is used as the y position. For Not-Shipped, ypos is replaced with -10.
library(plyr)
text.df<-ddply(dat,.(Week,Status),function(x) data.frame(count=nrow(x)))
text.df<-ddply(text.df,.(Week),transform,ypos=cumsum(count)+15)
text.df$ypos[text.df$Status=="Not-Shipped"]<- -10
Now labels are plotted with geom_text() using new data frame.
ggplot(dat,aes(as.factor(Week),fill=Status))+geom_bar()+
geom_text(data=text.df,aes(x=as.factor(Week),y=ypos,label=count))
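To make the label positions concrete, here is the same `ypos` calculation on a four-row made-up `text.df`, using base R's `ave()` in place of `ddply(..., transform, ...)`. The counts are invented for illustration.

```r
# Made-up label data: two weeks, two statuses each.
text.df <- data.frame(
  Week   = rep(0:1, each = 2),
  Status = rep(c("Not-Shipped", "Shipped"), times = 2),
  count  = c(30, 170, 10, 40)
)

# Cumulative count within each week gives the top of each stacked segment;
# adding 15 lifts the label just above it.
text.df$ypos <- ave(text.df$count, text.df$Week, FUN = cumsum) + 15

# The lower segment's label goes below the axis instead.
text.df$ypos[text.df$Status == "Not-Shipped"] <- -10

text.df$ypos
#> [1] -10 215 -10  65
```

The "Shipped" labels end up 15 units above the full bar height (200 and 50), while the "Not-Shipped" labels sit at a fixed position under the baseline, so the two can no longer collide.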
One way to avoid overlaps is to dodge the bars and the text labels. To avoid missing labels, you can set ylim. Here is an example.
## I create some more realistic data similar to your picture
week <- sample(0:5,1000,rep=TRUE)
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
## for dodging
dodgewidth <- position_dodge(width=0.9)
## get max y to set ylim
ymax <- max(table(dat$Week,dat$Status))+20
ggplot(dat,aes(x = factor(Week),fill = factor(Status))) +
geom_bar( position = dodgewidth ) +
stat_count(geom = "text", position = dodgewidth, aes(label = after_stat(count)), # stat_bin no longer accepts a discrete x in current ggplot2
vjust = -1, size = 5) +
ylim(0,ymax)
Based on Didzis' plot, you could also increase readability by keeping the position on the y axis constant and by colouring the text in the same colour as the legend.
library(ggplot2)
week <- sample(0:5,1000,rep=TRUE,prob=c(0.2,0.05,0.15,0.5,0.03,0.1))
statuses <- gl(2,1000,labels=c('Not-Shipped', 'Shipped'))
dat <- data.frame(Week = week, Status = statuses)
library(plyr)
text.df<-ddply(dat,.(Week,Status),function(x) data.frame(count=nrow(x)))
text.df$ypos[text.df$Status=="Not-Shipped"]<- -15
text.df$ypos[text.df$Status=="Shipped"]<- -55
p <- ggplot(dat,aes(as.factor(Week),fill=Status))+geom_bar()+
geom_text(data=text.df,aes(x=as.factor(Week),y=ypos,label=count),colour=ifelse(text.df$Status=="Not-Shipped","#F8766D","#00BFC4"))

Resources