Suppose I have a number of replications of bivariate experiments which I wish to display simultaneously in hexagonally binned plots, with common cell counts. Is there existing code to do this? Is there an easy way to modify the hexbin package to do this for me?
For example:
library(hexbin)
x <- replicate(9, rnorm(10000), simplify=FALSE)
y <- replicate(9, rnorm(10000), simplify=FALSE)
h <- mapply(hexbin, x, y)
par(mfrow=c(3,3))
lapply(h, plot)
This code doesn't display a grid of hexbin plots with common cell counts, but I'd like it to.
hexbin objects are plotted using grid graphics, so your par(mfrow=c(3,3)) does not do anything: each graph is plotted on a separate page. To get the details of the plot options:
?gplot.hexbin
In this case, we want to set maxcnt to the largest cell count:
lapply(h, plot, maxcnt=max(unlist(lapply(h, function(x) max(x@count)))))
This will apply the same legend to each graph.
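If you also want all nine plots on one page, one possible route is to lay them out yourself with grid viewports, since hexbin draws with grid. This is only a sketch, not something from the hexbin documentation beyond the newpage and maxcnt arguments of the plot method: the viewport name "hexgrid", the output file name and the 12 x 12 inch PDF size are my own assumptions (the large page is there so that each cell has room for the margins and legend).

```r
library(hexbin)
library(grid)

set.seed(1)
x <- replicate(9, rnorm(10000), simplify = FALSE)
y <- replicate(9, rnorm(10000), simplify = FALSE)
h <- mapply(hexbin, x, y, SIMPLIFY = FALSE)

# largest cell count over all nine binnings (count is an S4 slot, hence @)
maxcnt <- max(sapply(h, function(b) max(b@count)))

# assumption: write to a large PDF in tempdir() so every cell has enough room
out_file <- file.path(tempdir(), "hexbin_grid.pdf")
pdf(out_file, width = 12, height = 12)
grid.newpage()
# "hexgrid" is a hypothetical name, used only to find the layout again
pushViewport(viewport(layout = grid.layout(3, 3), name = "hexgrid"))
for (i in seq_along(h)) {
  seekViewport("hexgrid")  # return to the 3 x 3 layout before each panel
  pushViewport(viewport(layout.pos.row = (i - 1) %/% 3 + 1,
                        layout.pos.col = (i - 1) %% 3 + 1))
  # newpage = FALSE keeps plot() from starting a fresh page for every panel
  plot(h[[i]], maxcnt = maxcnt, newpage = FALSE)
}
dev.off()
```

On a small device the default margins and legend can eat most of each cell, so you may need to shrink or drop the legend (see ?gplot.hexbin) if you cannot use a large page.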
I have a standard unfilled contour plot in R. It has two regions and was generated using the KDE function. It looks to be normalised to between 0 and 1. I want to plot the original data over it however R just seems to plot the data on a separate graph each time. I have tried using lines() and points(). So my two questions are: 1) how do you un-normalise a contour plot (did KDE normalise the output?) and 2) how do you plot the original data over a contour plot?
Skeleton code:
data.kde <- kde(data)
plot(data)
contour(data.kde$estimate, add=TRUE)
I am not sure if the add=TRUE statement is working, as the data is on different scales as my contour plot has come out normalised to between 0 and 1. If I normalise my original data it does not quite match where it should on the contour - the two data centres are slightly off from the contour centres.
Suppose your data is like this:
library(ks)
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
data <- cbind(x, y)
Then you can do:
KDE <- kde(data)
plot(KDE, drawpoints = TRUE)
Or if you want to use contour
contour(x = KDE$eval.points[[1]], y = KDE$eval.points[[2]], z = KDE$estimate)
points(KDE$x[,1], KDE$x[,2])
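As for why the contour in the question looked "normalised": as far as I can tell the density estimate itself is not rescaled. When contour() is given only a matrix z, it places the rows and columns on an evenly spaced 0 to 1 grid, so the axes no longer match the data. Here is a small sketch of that behaviour. Note that it uses MASS::kde2d as a stand-in for ks::kde (my substitution, only so the example runs without extra packages), and the data are made up.

```r
library(MASS)  # kde2d() is used here instead of ks::kde(); an assumption for illustration

set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)  # made-up data far away from [0, 1]
y <- rnorm(100, mean = 5)

dens <- kde2d(x, y, n = 50)  # list with grid vectors x, y and density matrix z

# what contour(dens$z) would use for its x axis when no grid is supplied
default_x <- seq(0, 1, length.out = nrow(dens$z))

# passing the evaluation grid keeps the contours on the data scale
plot(x, y)
contour(dens$x, dens$y, dens$z, add = TRUE)
```

The same idea is what the eval.points arguments do in the ks version above.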
Created on 2022-02-03 by the reprex package (v2.0.1)
I need to make a histogram of my variable, 'travel time'. Inside each histogram I need to plot the regression (correlation) data, i.e. my observed data vs. predicted, and I need to repeat this for different times of day and week (in simple words, make a matrix of such figures using the par function). For now, I can draw the histograms and arrange them in matrix form, but I am having trouble with the inset plot (plotting the x and y data together with the y=x line, and arranging each inset within its corresponding histogram in the matrix). How can I do that, as in the figure below? Any help would be appreciated. Thanks!
One way to do this is to loop over your data and create the desired plot on every iteration. Here is a not very polished example, but it shows the logic of plotting a small plot over a larger one. You will have to tweak the code to get it to work the way you need, but that shouldn't be difficult.
# create some sample dataset (your x values)
a <- c(rnorm(100,0,1))
b <- c(rnorm(100,2,1))
# create their "y" values counterparts
x <- a + 3
y <- b + 4
# bind the data into two dataframes (explanatory variables in one, explained in the other)
data1 <- cbind(a,b)
data2 <- cbind(x,y)
# set dimensions of the plot matrix
par(mfrow = c(2,1))
# for each of the explanatory - explained pair
for (i in 1:ncol(data2))
{
# set positioning of the histogram
par("plt" = c(0.1,0.95,0.15,0.9))
# plot the histogram
hist(data1[, i])
# set positioning of the small plot
par("plt" = c(0.7, 0.95, 0.7, 0.95))
# plot the small plot over the histogram
par(new = TRUE)
plot(data1[, i], data2[, i])
# add some line into the small plot
lines(data1[, i], data1[, i])
}
Is there any way for me to add some points to a pairs plot?
For example, I can plot the Iris dataset with pairs(iris[1:4]), but I wanted to execute a clustering method (for example, kmeans) over this dataset and plot its resulting centroids on the plot I already had.
It would help too if there's a way to plot the whole data and the centroids together in a single pairs plot in such a way that the centroids can be plotted in a different way. The idea is, I plot pairs(rbind(iris[1:4],centers) (where centers are the three centroids' data) but plotting the three last elements of this matrix in a different way, like changing cex or pch. Is it possible?
You give the solution yourself in the last paragraph of your question. Yes, you can use pch and col in the pairs function.
pairs(rbind(iris[1:4], kmeans(iris[1:4],3)$centers),
pch=rep(c(1,2), c(nrow(iris), 3)),
col=rep(c(1,2), c(nrow(iris), 3)))
Another option is to use a panel function:
cl <- kmeans(iris[1:4],3)
idx <- subset(expand.grid(x=1:4,y=1:4),x!=y)
i <- 1
pairs(iris[1:4],bg=cl$cluster,pch=21,
panel=function(x, y,bg, ...) {
points(x, y, pch=21,bg=bg)
points(cl$centers[,idx[i,'x']],cl$centers[,idx[i,'y']],
cex=4,pch=10,col='blue')
i <<- i +1
})
But I think it is safer and easier to use the lattice splom function. The legend is also automatically generated.
cl <- kmeans(iris[1:4],3)
library(lattice)
splom(iris[1:4],groups=cl$cluster,pch=21,
panel=function(x, y,i,j,groups, ...) {
panel.points(x, y, pch=21,col=groups)
panel.points(cl$centers[,j],cl$centers[,i],
pch=10,col='blue')
},auto.key=TRUE)
I have three data sets of different lengths and I would like to plot density functions of all three on the same plot. This is straightforward with base graphics:
n <- c(rnorm(10000), rnorm(10000))
a <- c(rnorm(10001), rnorm(10001, 0, 2))
p <- c(rnorm(10002), rnorm(10002, 2, .5))
plot(density(n))
lines(density(a))
lines(density(p))
Which gives me something like this:
[Image: density plot, http://www.cerebralmastication.com/wp-content/uploads/2009/10/density.png]
But I really want to do this with GGPLOT2 because I want to add other features that are only available with GGPLOT2. It seems that GGPLOT really wants to take my empirical data and calculate the density for me. And it gives me a bunch of lip because my data sets are of different lengths. So how do I get these three densities to plot in GGPLOT2?
The secret to happiness in ggplot2 is to put everything in the "long" (or what I guess matrix oriented people would call "sparse") format:
df <- rbind(data.frame(x="n",value=n),
data.frame(x="a",value=a),
data.frame(x="p",value=p))
qplot(value, colour=x, data=df, geom="density")
If you don't want colors:
qplot(value, group=x, data=df, geom="density")
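For what it's worth, the same long-format idea can also be written with the full ggplot() interface instead of qplot(). This is just a sketch of an equivalent call, re-creating the three vectors from the question (with a seed I added so it is reproducible):

```r
library(ggplot2)

set.seed(1)  # seed added here only for reproducibility
n <- c(rnorm(10000), rnorm(10000))
a <- c(rnorm(10001), rnorm(10001, 0, 2))
p <- c(rnorm(10002), rnorm(10002, 2, .5))

# long format: one row per observation, with a column saying which set it came from
df <- rbind(data.frame(x = "n", value = n),
            data.frame(x = "a", value = a),
            data.frame(x = "p", value = p))

# one density curve per level of x, coloured by data set
dens_plot <- ggplot(df, aes(x = value, colour = x)) + geom_density()
print(dens_plot)
```

The different lengths stop being a problem because each data set is just a different number of rows in df.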
I have come across a number of situations where I want to plot more points than I really ought to; the main holdup is that when I share my plots with people or embed them in papers, they occupy too much space. It's very straightforward to randomly sample rows in a dataframe.
If I want a truly random sample for a point plot, it's easy to say:
qplot(x,y,data=myDf[sample(1:nrow(myDf),1000),])
However, I was wondering if there were more effective (ideally canned) ways to specify the number of plot points such that your actual data is accurately reflected in the plot. So here is an example.
Suppose I am plotting something like the CCDF of a heavy tailed distribution, e.g.
ccdf <- function(myList,density=FALSE)
{
# generates the CCDF of a list or vector
freqs = table(myList)
X = rev(as.numeric(names(freqs)))
Y = cumsum(rev(as.vector(freqs)))
data.frame(x=X,count=Y)
}
qplot(x,count,data=ccdf(rlnorm(10000,3,2.4)),log='xy')
This will produce a plot where the points become increasingly dense along the x and y axes. Here it would be ideal to have fewer samples plotted for large x or y values.
Does anybody have any tips or suggestions for dealing with similar issues?
Thanks,
-e
I tend to use png files rather than vector based graphics such as pdf or eps for this situation. The files are much smaller, although you lose resolution.
If it's a more conventional scatterplot, then using semi-transparent colours also helps, as well as solving the over-plotting problem. For example,
x <- rnorm(10000); y <- rnorm(10000)
qplot(x, y, colour=I(alpha("blue",1/25)))
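If you are not using ggplot2, roughly the same effect should be possible in base graphics with adjustcolor() from grDevices. A minimal sketch (the variable name faint_blue is mine):

```r
set.seed(1)
x <- rnorm(10000); y <- rnorm(10000)

# blue at 1/25 opacity; overlapping points build up to a darker colour
faint_blue <- adjustcolor("blue", alpha.f = 1/25)
plot(x, y, pch = 16, col = faint_blue)
```

This needs a graphics device that supports transparency, such as pdf() or png().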
Beyond Rob's suggestions, one plot function I like as it does the 'thinning' for you is hexbin; an example is at the R Graph Gallery.
Here is one possible solution for downsampling plot with respect to the x-axis, if it is log transformed. It log transforms the x-axis, rounds that quantity, and picks the median x value in that bin:
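downsampled_qplot <- function(x,y,data,rounding=0, ...) {
# assumes we are doing log=xy or log=x
# note: the x and y arguments are not used; the columns x and count of data are plotted
group = factor(round(log(data$x),rounding))
# keep the row with the median x value in each bin (nrow, not length, counts rows)
d <- do.call(rbind, by(data, group,
function(X) X[order(X$x)[ceiling(nrow(X)/2)],]))
qplot(x,count,data=d, ...)
}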
Using the definition of ccdf() from above, we can then compare the original plot of the CCDF of the distribution with the downsampled version:
myccdf=ccdf(rlnorm(10000,3,2.4))
qplot(x,count,data=myccdf,log='xy',main='original')
downsampled_qplot(x,count,data=myccdf,log='xy',rounding=1,main='rounding = 1')
downsampled_qplot(x,count,data=myccdf,log='xy',rounding=0,main='rounding = 0')
In PDF format, the original plot takes up 640K, and the downsampled versions occupy 20K and 8K, respectively.
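To see where the saving comes from, you can count the rows before and after thinning. Below is a base R sketch of the same log-binning idea, without ggplot2. The names full, bin and thinned are made up for this example, and the CCDF is rebuilt inline so that the snippet runs on its own.

```r
set.seed(1)
v <- rlnorm(10000, 3, 2.4)

# CCDF: for each distinct value, the number of observations at or above it
freqs <- table(v)
full <- data.frame(x = rev(as.numeric(names(freqs))),
                   count = cumsum(rev(as.vector(freqs))))

# bin on log(x) rounded to one decimal, then keep the median-x row of each bin
bin <- round(log(full$x), 1)
thinned <- do.call(rbind, lapply(split(full, bin), function(d) {
  d[order(d$x)[ceiling(nrow(d) / 2)], ]
}))

nrow(full)     # number of points in the original plot
nrow(thinned)  # one point per occupied bin
plot(thinned$x, thinned$count, log = "xy")
```

The number of points left depends only on how many bins are occupied, which is why coarser rounding gives smaller files.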
I'd either make image files (png or jpeg devices) as Rob already mentioned, or I'd make a 2D histogram. An alternative to the 2D histogram is a smoothed scatterplot, which makes a similar graphic but has a smoother cutoff from dense to sparse regions of space.
If you've never seen addictedtor before, it's worth a look. It has some very nice graphics generated in R with images and sample code.
Here's the sample code from the addictedtor site:
2-d histogram:
require(gplots)
# example data, bivariate normal, no correlation
x <- rnorm(2000, sd=4)
y <- rnorm(2000, sd=1)
# separate scales for each axis, this looks circular
hist2d(x,y, nbins=50, col = c("white",heat.colors(16)))
rug(x,side=1)
rug(y,side=2)
box()
smoothscatter:
library("geneplotter") ## from BioConductor; smoothScatter() and densCols() also ship with base R (graphics, grDevices)
require("RColorBrewer") ## from CRAN
x1 <- matrix(rnorm(1e4), ncol=2)
x2 <- matrix(rnorm(1e4, mean=3, sd=1.5), ncol=2)
x <- rbind(x1,x2)
layout(matrix(1:4, ncol=2, byrow=TRUE))
op <- par(mar=rep(2,4))
smoothScatter(x, nrpoints=0)
smoothScatter(x)
smoothScatter(x, nrpoints=Inf,
colramp=colorRampPalette(brewer.pal(9,"YlOrRd")),
bandwidth=40)
colors <- densCols(x)
plot(x, col=colors, pch=20)
par(op)