Make dotplot scale y axis as for histogram - r

We are using dotplots in a classroom setting to introduce the histogram, because the binning concept is confusing to many students. So we start with the dotplot which is similar but more intuitive:
x <- rnorm(100)
qplot(x, geom = "bar")
qplot(x, geom = "dotplot", method="histodot")
Because students do this on their own data, the code needs to work without manual fiddling. However the geom_dotplot seems to use different scaling defaults than geom_bar. The y axis does not adjust with the data, but seems to depend only on the size of the dots. For example:
x <- runif(1000)
qplot(x, geom = "bar")
qplot(x, geom = "dotplot", method="histodot")
How can I make geom_dotplot with stat_histodot scale the y axis exactly as it would do for the histogram, either by using smaller or overlapping dots?

I came up with the following workaround that shrinks the binwidth until things fit on the page:
# This function calculates a default binwidth that will work better
# for the dotplot with large n than the ggplot2 default.
calculate_smart_binwidth <- function(x, aspect_ratio = 2/3){
x <- as.numeric(x)
nbins <- max(30, round(sqrt(length(x)) / aspect_ratio))
range <- range(x, na.rm = TRUE, finite = TRUE)
if(diff(range) == 0) return(NULL)
repeat {
message("trying nbins: ", nbins)
binwidth <- diff(range)/nbins;
highest_bin <- max(ggplot2:::bin(x, binwidth = binwidth)$count);
if(highest_bin < aspect_ratio * nbins) return(binwidth)
nbins <- ceiling(nbins * 1.03);
}
}
Examples:
x <- runif(1e4)
qplot(x, geom="dotplot", binwidth=calculate_smart_binwidth(x))
x <- rnorm(1e4)
qplot(x, geom="dotplot", binwidth=calculate_smart_binwidth(x))
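The workaround above relies on ggplot2:::bin, an unexported internal that may not exist in every ggplot2 version. As a rough sketch of the same search using only base R, the helper below (smart_binwidth_base is a hypothetical name, not part of ggplot2) counts points per bin with hist() instead. It assumes equal-width bins spanning the finite range of the data, so the counts may differ slightly from what ggplot2 computes.

```r
# Hypothetical base-R variant of calculate_smart_binwidth():
# increase the number of bins until the tallest bin fits the aspect ratio.
smart_binwidth_base <- function(x, aspect_ratio = 2/3) {
  x <- as.numeric(x)
  x <- x[is.finite(x)]                 # drop NA, NaN and infinite values
  rng <- range(x)
  if (diff(rng) == 0) return(NULL)     # constant data: no sensible binwidth
  nbins <- max(30, round(sqrt(length(x)) / aspect_ratio))
  repeat {
    # nbins equal-width bins covering the full data range
    breaks <- seq(rng[1], rng[2], length.out = nbins + 1)
    counts <- hist(x, breaks = breaks, plot = FALSE)$counts
    if (max(counts) < aspect_ratio * nbins) return(diff(rng) / nbins)
    nbins <- ceiling(nbins * 1.03)     # try slightly narrower bins
  }
}

set.seed(42)
x_demo <- rnorm(1e4)
bw_demo <- smart_binwidth_base(x_demo)
```

The result can be passed as binwidth to the dotplot call in the same way as in the examples above.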

Related

Common breaks and free axes for overlapping lattice histograms

What is the required incantation to achieve an overlapping, faceted lattice::histogram with common break points (across groups, but potentially varying across panels)?
For example, assume I want the total range of the data (groups combined) for each panel to be split into 30 bins.
Example data:
library(lattice)
set.seed(1)
d <- data.frame(v1=rep(c('A', 'B'), each=1000),
v2=rep(c(0.5, 1), each=2000),
mean=rep(c(0, 10, 2, 12), each=1000))
d$x <- rnorm(nrow(d), d$mean, d$v2)
Using nint=30?
p1 <- histogram(~x|v1, d, groups=v2, nint=30,
scales=list(relation='free'), type='percent',
panel = function(...) {
panel.superpose(..., panel.groups=panel.histogram,
col=c('red', 'blue'), alpha=0.3)
})
p1
Above, the bins are consistent across groups, but (1) the x-axis limits are shared across panels (problematic when the x-axis range varies substantially across panels - I really want the 30 bins to be calculated individually for each panel), and (2) the y-axis is cramped when using type='percent' (it should extend further).
Using breaks=30?
p2 <- histogram(~x|v1, d, groups=v2, breaks=30,
scales=list(relation='free'), type='percent',
panel = function(...) {
panel.superpose(..., panel.groups=panel.histogram,
col=c('red', 'blue'), alpha=0.3)
})
p2
Now the axis limits look good, but the bin width varies across groups.
So...
Using lattice, how can I achieve overlapping, faceted histograms that have constant bin width across groups within panels, but have axis limits that fit the data for each panel?
(I realise that ggplot is an option, but I want the figure style to be consistent with my other lattice plots.)
This works, but I'm afraid it's rather pedestrian. At least it only requires the trellis object itself; it will assume the number of bins you want in each panel is equal to the nint parameter.
It works like this: check whether the panels' ranges overlap. If they don't, split each (slightly extended) range into nint bins, then concatenate them with a few empty bins in between. We also need to work out the y range, which we do by scaling according to the maximum bin count in each panel.
fix_facets <- function(p1)
{
n_bins <- p1$panel.args.common$nint
xvals1 <- p1$panel.args[[1]]$x
xvals2 <- p1$panel.args[[2]]$x
if(min(xvals2) > max(xvals1) | min(xvals1) > max(xvals2)){
left_range <- range(xvals1)
left_range <- left_range + (diff(left_range) * c(-0.1, 0.1))
left_bins <- seq(left_range[1], left_range[2], diff(left_range)/n_bins)
right_range <- range(xvals2)
right_range <- right_range + (diff(right_range) * c(-0.1, 0.1))
right_bins <- seq(right_range[1], right_range[2], diff(right_range)/n_bins)
if(max(left_range) < min(right_range)){
mid_bins <- seq(max(left_bins), min(right_bins), diff(left_bins[1:2]))
all_bins <- c(left_bins, mid_bins, right_bins)
} else {
mid_bins <- seq(max(right_bins), min(left_bins), diff(right_bins[1:2]))
all_bins <- c(right_bins, mid_bins, left_bins)
}
p1$panel.args.common$breaks <- all_bins
p1$x.limits[[1]] <- left_range
p1$x.limits[[2]] <- right_range
histleft <- hist(xvals1, breaks = left_bins)
histright <- hist(xvals2, breaks = right_bins)
group_factor <- 100 * length(p1$condlevels[[1]])
p1$y.limits[[1]][2] <- group_factor * max(histleft$counts) / length(xvals1)
p1$y.limits[[2]][2] <- group_factor * max(histright$counts) / length(xvals2)
}
return(p1)
}
So with your example, we can do this:
p1 <- histogram(~x|v1, d, groups=v2, nint=30,
scales=list(relation='free'), type='percent',
panel = function(...) {
panel.superpose(..., panel.groups=panel.histogram,
col=c('red', 'blue'), alpha=0.3)
})
fix_facets(p1)
and to show it works with other numbers of bins...
p1 <- histogram(~x|v1, d, groups=v2, nint=10,
scales=list(relation='free'), type='percent',
panel = function(...) {
panel.superpose(..., panel.groups=panel.histogram,
col=c('red', 'blue'), alpha=0.3)
})
fix_facets(p1)

How can I place multiple unrelated graphs on the same axes in ggplot2?

I am trying to recreate an image found in a textbook in R, the original of which was built in MATLAB:
I have generated each of the graphs separately, but what would be the best practice for combining them into an image like this in ggplot2?
Edit: Provided code used. This is just a transformation of normally distributed data.
library(ggplot2)
mean <- 6
sd <- 1
X <- rnorm(100000, mean = mean, sd = sd)
Y <- dnorm(X, mean = mean, sd = sd)
Y_p <- pnorm(X, mean = mean, sd = sd)
ch_vars <- function(X){
# Logistic transform; vectorized, so no element-by-element loop is needed
return(1/(1 + exp(-X + 5)))
}
nu_X <- ch_vars(X)
nu_Y <- ch_vars(Y)
# Column names must be unique; they match the names used in aes() below
data <- data.frame(X = X, Y = Y, Y_p = Y_p, nu_X = nu_X, nu_Y = nu_Y)
# Cumulative distribution
ggplot(data = data) +
geom_line(aes(x = X, y = Y_p))
# Distribution of initial data
ggplot(data = data, aes(x = X)) +
geom_histogram(aes(y = ..density..), bins = 25, fill = "red", color = "black")
# Distribution of transformed data
ggplot(data = data, aes(x = nu_X)) +
geom_histogram(aes(y = ..density..), bins = 25, fill = "green", color = "black")
In short, you can't, or rather, you shouldn't.
ggplot is a high-level plotting package. More than a system for drawing shapes and lines, it's fairly "opinionated" about how data should be represented, and one of its opinions is that a plot should express a clear relationship between its axes and marks (points, bars, lines, etc.). The axes essentially define a coordinate space, and the marks are then plotted onto the space in a straightforward and easily interpretable manner.
The plot you show breaks that relationship -- it's a set of essentially arbitrary histograms all drawn onto the same box, where the axis values become ambiguous. The x-axis represents the values of 1 histogram and the y-axis represents another (and thus neither axis represents the histograms' heights).
It is of course technically possible to force ggplot to render something like your example, but it would require pre-computing the histograms, normalizing their values and bin heights to a common coordinate space, converting these into suitable coordinates for use with geom_rect, and then re-labeling the plot axes. It would be a very large amount of manual effort and ultimately defeats the point of using a high-level plotting grammar like ggplot.
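To give a feel for the pre-computation involved, here is a minimal sketch of the first step only: binning one variable with base hist() and rescaling the bars into a common [0, 1] box so they could be handed to geom_rect. The helper name hist_to_rects and the choice of [0, 1] as the shared coordinate space are assumptions made for illustration; positioning several histograms and relabelling the axes would still have to be done by hand.

```r
# Hypothetical helper: convert a histogram into rectangle coordinates
# rescaled so that x spans [0, 1] and the tallest bar has height 1.
hist_to_rects <- function(x, bins = 25) {
  h <- hist(x, breaks = bins, plot = FALSE)  # 'bins' is only a suggestion to hist()
  nb <- length(h$counts)
  x_scaled <- (h$breaks - min(h$breaks)) / diff(range(h$breaks))
  data.frame(
    xmin = x_scaled[-(nb + 1)],              # left edge of each bar
    xmax = x_scaled[-1],                     # right edge of each bar
    ymin = 0,
    ymax = h$density / max(h$density)        # bar heights rescaled to [0, 1]
  )
}

set.seed(1)
rects <- hist_to_rects(rnorm(1000, mean = 6, sd = 1))

# Drawing step, only attempted if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(rects) +
    geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
              fill = "red", colour = "black")
}
```

Even this single histogram has lost its original axis values, which is the ambiguity described above.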

specific colours are required within Hexbin package?

I am drawing a scatter plot with a high density of points. I used the hexbin package and plotted the data successfully, but the colours are not pretty and I have been asked to follow a standard colour scheme. I wonder if this is supported by R. The image shows my output (right) and the wanted colours (left).
Example:
library(hexbin)
x <- rnorm(1000)
y <- rnorm(1000)
bin<-hexbin(x,y, xbins=50)
plot(bin, main="Hexagonal Binning")
Using the example on the package help page for hexbin, you can get close by using rainbow and playing with the colorcut argument, like so:
library(hexbin)
x <- rnorm(10000)
y <- rnorm(10000)
(bin <- hexbin(x, y))
plot(hexbin(x, y + x*(x+1)/4),main = "Example" ,
colorcut = seq(0,1,length.out=64),
colramp = function(n) rev(rainbow(64)),
legend = 0 )
You will need to play with the legend specification etc to get exactly what you want.
Alternative colour palette suggested by @Roland:
## nicer colour palette
cols <- colorRampPalette(c("darkorchid4","darkblue","green","yellow", "red") )
plot(hexbin(x, y + x*(x+1)/4), main = "Example" ,
colorcut = seq(0,1,length.out=24),
colramp = function(n) cols(24) ,
legend = 0 )

scatter plot specifying color and labelling axis in r

I have following data and plot:
pos <- rep(1:2000, 20)
xv =c(rep(1:20, each = 2000))
# colrs <- unique(xv)
colrs <- xv # edits
yv =rnorm(2000*20, 0.5, 0.1)
xv = lapply(unique(xv), function(x) pos[xv==x])
to.add = cumsum(sapply(xv, max) + 1000)
bp <- c(xv[[1]], unlist(lapply(2:length(xv), function(x) xv[[x]] + to.add[x-1])))
plot (bp,yv, pch = "*", col = colrs)
I have few issues in this plot I could not figure out.
(1) I want to use a different color for each group (i.e. xv), or two alternating colors for successive groups, but when I tried the color argument the result turned out to be a mixture of colors. I also need to highlight some points (for example, bp 4000 to 4500 in blue).
(2) Instead of bp positions I want to put a tick mark and label with the group.
Thank you, appreciate your help.
Edits: with the help of the following answer (using a slightly different approach, so that it also works with an unbalanced number of points per group) I could get a similar plot. One question about colors remains: what if I want to use two alternating colors for alternate groups?
You can solve your colour issue by repeating each colour index as many times as its group has points plotted (this assumes colrs holds one index per group, i.e. colrs <- unique(xv)), like so:
plot (bp,yv, pch = "*", col = rep(colrs,each=2000))
The default colour palette (see ?palette or palette() ) will wrap around itself and you might want to specify your own to get 20 distinct colours.
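As a small sketch of what specifying your own colours could look like, the snippet below builds one colour per point, either 20 distinct colours or two alternating ones (which also covers the follow-up in the question's edit). The variable names group, cols20, point_cols and alt_cols are made up for this example, and rainbow() is just one possible palette.

```r
# One group label per point: 20 groups of 2000 points, as in the question
group <- rep(1:20, each = 2000)

# 20 distinct colours, indexed by group so each point gets its group's colour
cols20 <- rainbow(20)
point_cols <- cols20[group]

# Two alternating colours: even-numbered groups black, odd-numbered groups grey
alt_cols <- c("black", "grey60")[(group %% 2) + 1]
```

Either vector can be passed directly as col = point_cols or col = alt_cols in the plot() call, with no further rep() needed.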
To relabel the x axis, try plotting without the axis and then specifying the points and labels manually.
plot (bp,yv, pch = "*", col = rep(colrs,each=2000),xaxt="n")
axis(1,at=seq(1000,58000,3000),labels=1:20)
If you are trying to squeeze a lot of labels in there, you might have to shrink the text (cex.axis)or spin the labels 90 degrees (las=2).
plot (bp,yv, pch = "*", col = rep(colrs,each=2000),xaxt="n")
axis(1,at=seq(1000,58000,3000),labels=1:20,cex.axis=0.7,las=2)
Result:
One way is you could use a nested ifelse.
I'm still learning R, but one way it could be done would look something like:
# Colour names must be quoted strings; whatev is a placeholder data frame
plot(whatev$x, whatev$y, col=ifelse(xv<2000,"red",ifelse(xv<4000,"yellow","blue")))
You could nest as many of these as you want to get more specific colors and intervals. The ifelse command has the form ifelse(test, yes, no): it returns the yes value where test is TRUE and the no value otherwise.
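When the number of intervals grows, nested ifelse calls become hard to read. One possible alternative, sketched here with made-up break points, is cut(), which maps each value to a labelled interval in a single call. The vector vals and the three colours are illustrative assumptions only.

```r
# Example values straddling the interval boundaries
vals <- c(1999, 2000, 3999, 4000)

# right = FALSE makes the intervals [-Inf, 2000), [2000, 4000), [4000, Inf)
interval_cols <- as.character(
  cut(vals,
      breaks = c(-Inf, 2000, 4000, Inf),
      labels = c("red", "yellow", "blue"),
      right = FALSE)
)
```

The as.character() call matters: cut() returns a factor, and passing a factor to col would use its integer codes rather than the colour names.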
A simpler way would be to use the unique groups in xv to assign rainbow colors.
colrs=rainbow(length(unique(xv))) # one colour per group
# Index the palette by group so every point gets its own group's colour
plot(whatev$x, whatev$y, col=colrs[as.integer(factor(xv))])
I hope I got all that right. I'm still learning R myself.
I'm going to go out on a limb and guess that your real data are something like 2000 values of things from 20 different groups. For instance, heights of 2000 plants of 20 different species. In such a case, you might want to look at the dotplot() function (or as illustrated below, dotplot.table()) in the lattice package.
Generate matrix of hypothetical values:
set.seed(1)
myY <- sapply( seq_len(20), function(x) rnorm(2000, x^(1/3)))
Transpose matrix to get groups as rows
myY <- t(myY)
Provide names of groups to matrix:
dimnames(myY)[[1]]<-paste("group", seq_len(nrow(myY)))
Load lattice package
library(lattice)
Generate dotplot
dotplot(myY, horizontal = FALSE, panel = function(x, y, horizontal, ...) {
panel.dotplot(x = x, y = y, horizontal = horizontal, jitter.x = TRUE,
col = seq_len(20)[x], pch = "*", cex = 1.5)
}, scales = list(x = list(rot = 90))
)
Which looks like (with unfortunate y-axis labeling):
Seeing that @JohnCLK is requesting a way of colouring by values on the x axis, I tried these demos in ggplot2. Each uses a dummy variable that is coded based on the values or ranges to be highlighted in the other variables.
So, first set up the data, as in the question:
pos <- rep(1:2000, 20)
xv <- c(rep(1:20, each = 2000))
yv <- rnorm(2000*20, 0.5, 0.1)
xv <- lapply(unique(xv), function(x) pos[xv==x])
to.add <- cumsum(sapply(xv, max) + 1000)
bp <- c(xv[[1]], unlist(lapply(2:length(xv), function(x) xv[[x]] + to.add[x-1])))
Then load ggplot2, prepare a couple of utility functions, and set the default theme:
library("ggplot2")
make.png <- function(p, fName) {
png(fName, width=640, height=480, units="px")
print(p)
dev.off()
}
make.plot <- function(df) {
p <- ggplot(df,
aes(x = bp,
y = yv,
colour = highlight))
p <- p + geom_point()
p <- p + theme(legend.position = "none") # opts() has been removed from ggplot2
return(p)
}
theme_set( theme_bw() )
Draw a plot which highlights values in a defined range on the vertical axis:
# highlight a horizontal band
df <- data.frame(cbind(bp, yv))
df$highlight <- 0
df$highlight[ df$yv >= 0.4 & df$yv < 0.45 ] <- 1
p <- make.plot(df)
print(p)
make.png(p, "demo_horizontal.png")
Next draw a plot which highlights values in a defined range on the x axis, a vertical band:
# highlight a vertical band
df$highlight <- 0
df$highlight[ df$bp >= 38000 & df$bp < 42000 ] <- 1
p <- make.plot(df)
print(p)
make.png(p, "demo_vertical.png")
And finally draw a plot which highlights alternating vertical bands, by x value:
# highlight alternating bands
library("gtools")
alt.band.width <- 2000
df$highlight <- as.integer(df$bp / alt.band.width)
df$highlight <- ifelse(odd(df$highlight), 1, 0)
p <- make.plot(df)
print(p)
make.png(p, "demo_alternating.png")
Hope this helps; it was good practice anyway.

draw one or more plots in the same window

I want to compare two curves. Is it possible with R to draw a plot and then draw another plot over it? How?
thanks.
With base R, you can plot your first curve and then add the second curve with the lines() function. Here's a quick example:
x <- 1:10
y <- x^2
y2 <- x^3
plot(x,y, type = "l")
lines(x, y2, col = "red")
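One thing to watch for with this approach: the axis limits are fixed by the first plot() call, so a second curve with a larger range is clipped. A minimal sketch of one way around that, assuming the same two curves, is to compute the limits from both curves up front (ylim_both is a name made up for this example).

```r
x <- 1:10
y <- x^2
y2 <- x^3

# y-axis limits that cover both curves, so neither is cut off
ylim_both <- range(c(y, y2))

plot(x, y, type = "l", ylim = ylim_both, ylab = "value")
lines(x, y2, col = "red")
```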
Alternatively, if you wanted to use ggplot2, here are two methods - one plots different colors on the same plot, and the other generates separate plots for each variable. The trick here is to "melt" the data into long format first.
library(ggplot2)
library(reshape2) # provides melt()
df <- data.frame(x, y, y2)
df.m <- melt(df, id.var = "x")
qplot(x, value, data = df.m, colour = variable, geom = "line")
qplot(x, value, data = df.m, geom = "line")+ facet_wrap(~ variable)
Using lattice package:
require(lattice)
x <- seq(-3,3,length.out=101)
xyplot(dnorm(x) + sin(x) + cos(x) ~ x, type = "l")
There have been some solutions for you already. If you stay with the base package, you should get acquainted with the functions plot(), lines(), abline(), points(), polygon(), segments(), rect(), box(), arrows(), ... Take a look at their help files.
You should see a plot from the base package as a pane with the coordinates you gave it. On that pane, you can draw a whole set of objects with the above-mentioned functions. They allow you to construct a graph as you want. You should remember though that, unless you play with the par settings like Dr. G showed, every call to plot() gives you a new pane. Also take into account that things can be plotted over other things, so think about the order in which you draw them.
See eg:
set.seed(100)
x <- 1:10
y <- x^2
y2 <- x^3
yse <- abs(runif(10,2,4))
plot(x,y, type = "n") # type="n" only plots the pane, no curves or points.
# plots the area between both curves
polygon(c(x,sort(x,decreasing=T)),c(y,sort(y2,decreasing=T)),col="grey")
# plot both curves
lines(x,y,col="purple")
lines(x, y2, col = "red")
# add the points to the first curve
points(x, y, col = "black")
# adds some lines indicating the standard error
segments(x,y,x,y+yse,col="blue")
# adds some flags indicating the standard error
arrows(x,y,x,y-yse,angle=90,length=0.1,col="darkgreen")
This gives you :
Have a look at par
> ?par
> plot(rnorm(100))
> par(new=T)
> plot(rnorm(100), col="red")
ggplot2 is a great package for this sort of thing:
install.packages('ggplot2')
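require(ggplot2)
library(reshape2) # provides melt()
x <- 1:10
y1 <- x^2
y2 <- x^3
df <- data.frame(x = x, curve1 = y1, curve2 = y2)
# reshape2 spells the argument variable.name (the older reshape package used variable_name)
df.m <- melt(df, id.vars = 'x', variable.name = 'curve' )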
# now df.m is a data frame with columns 'x', 'curve', 'value'
ggplot(df.m, aes(x,value)) + geom_line(aes(colour = curve)) +
geom_point(aes(shape=curve))
You get the plot coloured by curve, with different point marks for each curve and a nice legend, all painlessly without any additional work:
Draw multiple curves at the same time with the matplot function. Do help(matplot) for more.
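A quick sketch of what that could look like for the two curves used in the other answers. The matrix name curves and the colour choices are arbitrary; matplot() draws one line per column.

```r
x <- 1:10

# One column per curve; column names are reused for the legend
curves <- cbind(square = x^2, cube = x^3)

# matplot() picks axis limits that cover every column
matplot(x, curves, type = "l", lty = 1, col = c("black", "red"), ylab = "value")
legend("topleft", legend = colnames(curves), col = c("black", "red"), lty = 1)
```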
