Using the QQ Plot functionality in ggplot - r

I'm brand new to R and have a data frame with 8 columns containing daily changes in interest rates. I can draw QQ plots for each of the 8 columns using the following code:
par(mfrow = c(2,4))
for(i in 1:length(column_names)){
qqnorm(deltaIR.df[,i],main = column_names[i], pch = 16, cex = .5)
qqline(deltaIR.df[,i],cex = .5)
}
I'd like now to use the stat_qq function in the ggplot2 package to do this more elegantly, but just can't get my arms around the syntax - I keep getting it wrong. Would someone kindly help me translate the above code to use ggplot and allow me to view my 8 QQ plots on one page with an appropriate header? Trying the obvious
ggplot(deltaIR.df) + stat_qq(sample = columns[i])
gets me only an error message
Warning: Ignoring unknown parameters: sample
Error: stat_qq requires the following missing aesthetics: sample
and adding in the aesthetics
ggplot(deltaIR.df, aes(column_names)) + stat_qq()
is no better. The error message just changes to
Error: Aesthetics must be either length 1 or the same as the data (5271)
In short, nothing I have done so far (even with Google's assistance) has got me closer to a solution. May I ask for guidance?
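One possible approach, sketched under assumptions: stat_qq() takes its data through the sample aesthetic inside aes(), and ggplot2 works most naturally with long-format data, so the columns can be stacked and each one drawn in its own facet. The data below is a made-up stand-in for deltaIR.df (the column names rate1 to rate8 are hypothetical), and stat_qq_line() assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for deltaIR.df: 8 numeric columns of daily changes
set.seed(1)
deltaIR.df <- as.data.frame(matrix(rnorm(8 * 200), ncol = 8))
names(deltaIR.df) <- paste0("rate", 1:8)

# stack() reshapes wide to long: 'values' holds the data,
# 'ind' is a factor holding the original column name
long.df <- stack(deltaIR.df)

# One QQ panel per original column, laid out in 2 rows as with par(mfrow = c(2, 4))
p <- ggplot(long.df, aes(sample = values)) +
  stat_qq(size = 0.5) +
  stat_qq_line() +
  facet_wrap(~ ind, nrow = 2, scales = "free") +
  labs(title = "Normal QQ plots of daily interest rate changes")
print(p)
```

The facet strips play the role of the main = column_names[i] titles in the base R loop.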

Related

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I have applied the DBSCAN algorithm to the built-in iris dataset in R, but I get an error when I try to visualise the output using plot().
My code is as follows.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscan, not loading fpc since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
  geom_point(mapping = aes(x = Species, y = cluster, colour = cluster),
             position = "jitter") +
  labs(title = "DBSCAN")
Here's the image it generates:
If you're looking for something else, please be more specific about what the final plot should look like.

Plot histograms or pie charts in a scatter plot

I need to reproduce what was done in the post "tiny pie charts to represent each point in an scatterplot using ggplot2", but I ran into the problem that the ggsubplot package is not available for R version 3.3.1.
Essentially I need a histogram or a pie chart in predefined points on the scatterplot. Here is the same code that is used in the cited post:
foo <- data.frame(X=runif(30),Y=runif(30),A=runif(30),B=runif(30),C=runif(30))
foo.m <- melt(foo, id.vars=c("X","Y"))
ggplot(foo.m, aes(X,Y))+geom_point()
ggplot(foo.m) +
geom_subplot2d(aes(x = X, y = Y, subplot = geom_bar(aes(variable,
value, fill = variable), stat = "identity")), width = rel(.5), ref = NULL)
The code used libraries reshape2, ggplot2 and ggsubplot.
The image that I want to see is in the post cited above
UPD: I downloaded the older versions of R (3.0.2 and 3.0.3) and checkpoint package, and used:
checkpoint("2014-09-18")
as described in the comment below. But I get an error:
Using binwidth 0.0946
Using binwidth 0.0554
Error in layout_base(data, vars, drop = drop) :
At least one layer must contain all variables used for facetting
Which I can't get around, because when I try to include facet, the following error comes up:
Error: ggsubplots do not support facetting
It doesn't look like ggsubplot is going to fix itself any time soon. One option would be to use the checkpoint package, and essentially "reset" your copy of R to a time when the package was compatible. This post suggests using a time point of 2014-09-18.
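If resetting R is not an option, a rough alternative that avoids ggsubplot altogether is to build each small chart as a grob and place it with annotation_custom(). This is a different technique from the one in the cited post, shown only as a sketch: the helper make_bar_grob, the half-width value and the 5-row data are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)
# Same shape as 'foo' in the question, with fewer rows for clarity
foo <- data.frame(X = runif(5), Y = runif(5),
                  A = runif(5), B = runif(5), C = runif(5))

# Hypothetical helper: turn one row of foo into a small bar chart grob
make_bar_grob <- function(row) {
  bars <- data.frame(variable = c("A", "B", "C"),
                     value = c(row$A, row$B, row$C))
  small <- ggplot(bars, aes(variable, value, fill = variable)) +
    geom_col() +
    ylim(0, 1) +
    theme_void() +
    theme(legend.position = "none")
  ggplotGrob(small)
}
grobs <- lapply(seq_len(nrow(foo)), function(i) make_bar_grob(foo[i, ]))

# Empty scatterplot canvas; each grob is centred on its (X, Y) point
half <- 0.06  # half-width of each inset in data units (arbitrary choice)
p <- ggplot(foo, aes(X, Y)) +
  geom_blank() +
  coord_cartesian(xlim = c(-0.1, 1.1), ylim = c(-0.1, 1.1))
for (i in seq_along(grobs)) {
  p <- p + annotation_custom(grobs[[i]],
                             xmin = foo$X[i] - half, xmax = foo$X[i] + half,
                             ymin = foo$Y[i] - half, ymax = foo$Y[i] + half)
}
print(p)
```

Note that annotation_custom() only works with Cartesian coordinates, and the insets carry no legend of their own.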

Weird ggplot2 error: Empty raster

Why does
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(1.5,1.5)),aes(x=x,y=y,color=z)) +
geom_point()
give me the error
Error in grid.Call.graphics(L_raster, x$raster, x$x, x$y, x$width, x$height, : Empty raster
but the following two plots work
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(2.5,2.5)),aes(x=x,y=y,color=z)) +
geom_point()
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(1.5,2.5)),aes(x=x,y=y,color=z)) +
geom_point()
I'm using ggplot2 0.9.3.1
TL;DR: Check your data -- do you really want to use a continuous color scale with only one possible value for the color?
The error does not occur if you add + scale_color_continuous(guide=FALSE) to the plot. (This turns off the legend.)
ggplot(data.frame(x=c(1,2), y=c(1,2), z=c(1.5,1.5)), aes(x=x,y=y,color=z)) +
geom_point() + scale_color_continuous(guide = FALSE)
The error seems to be triggered in cases where a continuous color scale uses only one color. The current GitHub version already includes the relevant pull request. Install it via:
devtools::install_github("hadley/ggplot2")
But more probably there is an issue with the data: why would you use a continuous color scale with only one value?
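A minimal sketch of that data check, assuming the colour variable is a column named z as in the question: if z holds only one distinct value, map it as a discrete factor rather than a continuous scale. The name z_is_constant is introduced here purely for illustration.

```r
library(ggplot2)

df <- data.frame(x = c(1, 2), y = c(1, 2), z = c(1.5, 1.5))

# A continuous colour scale needs a range; a single value has none
z_is_constant <- length(unique(df$z)) == 1

if (z_is_constant) {
  # Treat the single value as a discrete level instead
  p <- ggplot(df, aes(x = x, y = y, color = factor(z))) +
    geom_point() +
    labs(color = "z")
} else {
  p <- ggplot(df, aes(x = x, y = y, color = z)) +
    geom_point()
}
print(p)
```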
I got the same behaviour (i.e. the "Empty raster" error) with a value other than 1.5.
Try the following:
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(0.02,0.02)),aes(x=x,y=y,color=z)) +
  geom_point()
And you get the same error again (tried with both versions 0.9.3.1 and 1.0.0.0), so it looks like a nasty and weird bug.
This definitely sounds like an edge case better suited to a bug report, as others have mentioned, but here's some generalizable code that might be useful as a clunky workaround or for handling labels and colors. It plots a rescaled variable and uses the real values as labels.
library(ggplot2)
library(scales)
z <- c(1.5,1.5)
# rescale z to 0:1
z_rescaled <- rescale(z)
# customizable number of breaks in the legend
max_breaks_cnt <- 5
# break z and z_rescaled by quantiles determined by number of maximum breaks
# and use 'unique' to remove duplicate breaks
breaks_z <- unique(as.vector(quantile(z, seq(0,1,by=1/max_breaks_cnt))))
breaks_z_rescaled <- unique(as.vector(quantile(z_rescaled, seq(0,1,by=1/max_breaks_cnt))))
# make a color palette
Pal <- colorRampPalette(c('yellow','orange','red'))(500)
# plot z_rescaled with breaks_z used as labels
ggplot(data.frame(x=c(1,2),y=c(1,2),z_rescaled),aes(x=x,y=y,color=z_rescaled)) +
geom_point() + scale_colour_gradientn("z",colours=Pal,labels = breaks_z,breaks=breaks_z_rescaled)
This is quite off-topic but I like to use rescaling to send tons of changing variables to a function like this:
colorfunction <- gradient_n_pal(colours = colorRampPalette(c('yellow','orange','red'))(500),
values = c(0:1), space = "Lab")
colorfunction(z_rescaled)

r - Add text to each lattice histogram with panel.text but has error "object x is missing"

In the following R code, I try to create 30 histograms for the variable allowed.clean by the factor zip_cpt (which has 30 levels).
For each of these histograms, I also want to add the mean and sample size, which need to be calculated for each level of the factor zip_cpt. So I used panel.text to do this.
After running this code, I get an error message inside each histogram which reads "Error using packet 21..."x" is missing, with..." (I can't read the whole message because it is cut off). I guess there's something wrong with the object x. Is it because mean(x) and length(x) don't actually apply to the data at each level of the factor zip_cpt?
I appreciate any help!
histogram(~allowed.clean|zip_cpt,data=cpt.IC_CAB1,
type='density',
nint=100,
breaks=NULL,
layout=c(10,3),
scales= list(y=list(relation="free"),
x=list(relation="free")),
panel=function(x,...) {
mean.values <-mean(x)
sample.n <- length(x)
panel.text(lab=paste("Sample size = ",sample.n))
panel.text(lab=paste("Mean = ",mean.values))
panel.histogram(x,col="pink", ...)
panel.mathdensity(dmath=dnorm, col="black",args=list(mean=mean(x, na.rm = TRUE),sd=sd(x, na.rm = TRUE)), ...)})
A discussion I found online is helpful for adding customized text (e.g., basic statistics) on each of the histograms:
https://stat.ethz.ch/pipermail/r-help/2007-March/126842.html
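As a hedged sketch of what such customized text can look like: the error in the question appears to come from calling panel.text() without x and y coordinates, since only the labels are supplied. The example below uses made-up data (the columns value and group are hypothetical stand-ins for allowed.clean and zip_cpt) and positions the text relative to the panel limits returned by current.panel.limits().

```r
library(lattice)

set.seed(1)
# Hypothetical stand-in data: one numeric variable, one 3-level factor
d <- data.frame(value = rnorm(300, mean = 50, sd = 10),
                group = factor(rep(c("a", "b", "c"), each = 100)))

p <- histogram(~ value | group, data = d, type = "density",
  panel = function(x, ...) {
    panel.histogram(x, col = "pink", ...)
    # Work out where to put the text from the limits of this panel
    lims <- current.panel.limits()
    x_pos <- lims$xlim[1] + 0.05 * diff(lims$xlim)
    y_pos <- lims$ylim[2] - c(0.08, 0.18) * diff(lims$ylim)
    # panel.text() needs coordinates as well as labels
    panel.text(x_pos, y_pos[1], adj = c(0, 0.5), cex = 0.7,
               labels = paste("Sample size =", length(x)))
    panel.text(x_pos, y_pos[2], adj = c(0, 0.5), cex = 0.7,
               labels = paste("Mean =", round(mean(x, na.rm = TRUE), 2)))
  })
print(p)
```

Inside the panel function, x already contains only the values for the current level of the conditioning factor, so mean(x) and length(x) are per-panel statistics.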

What does "negative length vectors" mean in a wireframe plot (lattice package)?

I want to plot a wireframe in R using the lattice package. However, I get the following error message "error using packet 1 negative length vectors are not allowed". The data looks like the following:
> result_mean
  experiment alpha beta   packet
1          0   1.0    1 3.000000
2          0   1.1    1 2.571429
The command to create the plot is the following:
png(file=paste("foobar.png"),width=1280, height=1280);
plot <- wireframe(result_mean$packet ~ result_mean$alpha * result_mean$beta,
data=result_mean, scales = list(arrows=FALSE, cex= .45, col = "black", font = 3),
drape = TRUE, colorkey = TRUE, main = "Foo",
col.regions = terrain.colors(100),
screen = list(z = -60, x = -60),
xlab="alpha", ylab="beta", zlab="mean \npackets");
print(plot);
dev.off();
I'm wondering what this error message means and if there is a good way to debug this?
Thanks in advance!
Debugging lattice graphics is a bit difficult because (a) the code is complex and multi-layered and (b) the errors get trapped in a way that makes them hard to intercept. However, you can at least get some way in diagnosing the problem.
First create a minimal example. I suspected that your problem was that your data fall on a single line, so I created data that looked like that:
d <- data.frame(x=c(1,1.1),
y=c(1,1),
z=c(2,3))
library(lattice)
wireframe(z~y*x,data=d)
Now confirm that fully three-dimensional data (data that define a plane) work just fine:
d2 <- data.frame(expand.grid(x=c(1,1.1),
y=c(1,1.1)),
z=1:4)
wireframe(z~y*x,data=d2)
So the question is really -- did you intend to draw a wireframe of two points lying on a line? If so, what did you want to have appear in the plot? You could hack things a little bit to set the y values to differ by a tiny bit -- I tried it, though, and got no wireframe appearing (but no error either).
edit: I did a bit more tracing, with various debug() incantations (and searching the source code of the lattice package and R itself for "negative length") to deduce the following: within a function called lattice:::panel.3dwire, there is a call to a C function wireframePanelCalculations, which you can see at https://r-forge.r-project.org/scm/viewvc.php/pkg/src/threeDplot.c?view=markup&root=lattice
Within this function:
nh = (nx-1) * (ny-1) * ng; /* number of quadrilaterals */
sHeights = PROTECT(allocVector(REALSXP, nh));
In this case nx is zero, so this code is asking R to allocate a negative-length vector, which is where the error comes from.
In this case, though, I think the diagnosis is more useful than the explicit debugging.
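The diagnosis can be turned into a quick pre-check before calling wireframe(). The helper below, spans_grid, is a hypothetical name introduced for illustration. It only tests that both predictors take at least two distinct values, which is a necessary condition suggested by the reasoning above, not a guarantee that the surface will draw.

```r
# Hypothetical helper: TRUE if both predictors take at least two distinct values
spans_grid <- function(data, x, y) {
  length(unique(data[[x]])) >= 2 && length(unique(data[[y]])) >= 2
}

# The two example data sets from the answer above
d <- data.frame(x = c(1, 1.1), y = c(1, 1), z = c(2, 3))
d2 <- data.frame(expand.grid(x = c(1, 1.1), y = c(1, 1.1)), z = 1:4)

spans_grid(d, "x", "y")   # FALSE: all points share the same y
spans_grid(d2, "x", "y")  # TRUE: the points define a 2 x 2 grid
```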

Resources