Kernel density plots on a single figure - r

I have been trying to plot simple density plots using R as:
plot(density(Data$X1),col="red")
plot(density(Data$X2),col="green")
Since I want to compare, I'd like to plot both in one figure. But 'matplot' doesn't work!! I also tried with ggplot2 as:
library(ggplot2)
qplot(Data$X1, geom="density")
qplot(Data$X2, add = TRUE, geom="density")
Also in this case, plots appear separately (though I wrote add=TRUE)!! Can anyone come up with an easy solution to the problem, please?

In ggplot2 or lattice you need to reshape the data to superpose them.
For example:
dat <- data.frame(X1= rnorm(100),X2=rbeta(100,1,1))
library(reshape2)
dat.m <- melt(dat)
Using lattice:
library(lattice)
densityplot(~value, groups = variable, data = dat.m, auto.key = TRUE)
Using ggplot2:
library(ggplot2)
ggplot(data=dat.m)+geom_density(aes(x=value, color=variable))
EDIT: adding X1+X2
Using lattice and the extended formula interface, it is extremely easy to do this:
densityplot(~X1+X2+I(X1+X2) , data=dat) ## no need to reshape data!!

You can try:
plot(density(Data$X1),col="red")
points(density(Data$X2),col="green")
I must add that the xlim and ylim values should ideally be set to include ranges of both X1 and X2, which could be done as follows:
foo <- density(Data$X1)
bar <- density(Data$X2)
plot(foo,col="red", xlim=c(min(foo$x,bar$x),max(foo$x,bar$x)), ylim=c(min(foo$y,bar$y),max(foo$y,bar$y)))
points(bar,col="green")

In base graphics you can overlay density plots if you keep the ranges identical and use par(new=TRUE) between them. I think add=TRUE is a base graphics strategy that some functions but not all will honor.
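A minimal sketch of the par(new=TRUE) approach, assuming made-up data in a data frame called Data (the question does not show the real one). The axis limits are computed once and reused so that both curves share the same coordinate system:

```r
set.seed(42)
# Made-up stand-in for the question's data (assumption)
Data <- data.frame(X1 = rnorm(100), X2 = rbeta(100, 1, 1))

d1 <- density(Data$X1)
d2 <- density(Data$X2)

# Common limits so the two plots line up exactly
xlim <- range(d1$x, d2$x)
ylim <- range(d1$y, d2$y)

plot(d1, col = "red", xlim = xlim, ylim = ylim, main = "X1 and X2")
par(new = TRUE)  # the next high-level plot is drawn over the current one
plot(d2, col = "green", xlim = xlim, ylim = ylim,
     axes = FALSE, ann = FALSE)  # skip the second set of axes and labels
```

Without identical xlim and ylim in both calls, the second curve would be drawn on a different scale than the axes shown.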

If you specify n, from, and to in the calls to density and make sure that they match between the 2 calls then you should be able to use matplot to plot both in one step (you will need to bind the 2 sets of y values into a single matrix).
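Something along these lines should work for the matplot route. The data are made up, and the padding of 1 around the data range is an arbitrary choice for illustration:

```r
set.seed(42)
# Made-up stand-in for the question's data (assumption)
Data <- data.frame(X1 = rnorm(100), X2 = rbeta(100, 1, 1))

# Evaluate both densities on the same grid
rng  <- range(Data$X1, Data$X2)
from <- rng[1] - 1   # arbitrary padding
to   <- rng[2] + 1
d1 <- density(Data$X1, n = 512, from = from, to = to)
d2 <- density(Data$X2, n = 512, from = from, to = to)

# Bind the y values into one matrix, one column per curve
ymat <- cbind(X1 = d1$y, X2 = d2$y)
matplot(d1$x, ymat, type = "l", lty = 1, col = c("red", "green"),
        xlab = "value", ylab = "Density")
legend("topright", legend = colnames(ymat),
       col = c("red", "green"), lty = 1)
```

Because n, from, and to match, d1$x and d2$x are the same grid, so either can serve as the x argument.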

Related

Is it possible to use more than two characters as points in a plot

I am trying to plot points in a plot where each dot is represented by a number. However, it seems that the points can only be one character long, as you can see in the plot produced by the code below:
set.seed(1); plot(rnorm(15), pch=paste(1:15))
I wonder if there is any workaround for this. Thanks.
set.seed(1); y <- rnorm(15)
plot(y, type='n')  # set up the axes without drawing any points
text(x=1:15, y=y, labels=1:15)  # labels can be any number of characters
Another option, using lattice (which is built on grid), for example:
library(lattice)
dat <- data.frame(x=1:15,y=rnorm(15))
xyplot(y~x,data=dat,
panel=function(x,y,...){
panel.xyplot(x,y,...)
panel.text(x,y,labels=round(y,2),adj=2,col='red')})

ggplot2 2d Density Weights

I'm trying to plot some data with 2d density contours using ggplot2 in R.
I'm getting one slightly odd result.
First I set up my ggplot object:
p <- ggplot(data, aes(x=Distance,y=Rate, colour = Company))
I then plot this with geom_point and geom_density2d. I want geom_density2d to be weighted based on the organisation's size (OrgSize variable). However, when I add OrgSize as a weighting variable, nothing changes in the plot:
This:
p+geom_point()+geom_density2d()
Gives an identical plot to this:
p+geom_point()+geom_density2d(aes(weight = OrgSize))
However, if I do the same with a loess line using geom_smooth, the weighting does make a clear difference.
This:
p+geom_point()+geom_smooth()
Gives a different plot to this:
p+geom_point()+geom_smooth(aes(weight=OrgSize))
I was wondering whether I'm using geom_density2d inappropriately. Should I instead be using contour and supplying OrgSize as the 'height'? If so, why does geom_density2d accept a weighting factor?
Code below:
require(ggplot2)
Company <- c("One","One","One","One","One","Two","Two","Two","Two","Two")
Store <- c(1,2,3,4,5,6,7,8,9,10)
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize <- c(500,1000,200,300,1500,800,50,1000,75,800)
data <- data.frame(Company,Store,Distance,Rate,OrgSize)
p <- ggplot(data, aes(x=Distance,y=Rate))
# Difference is apparent between these two
p+geom_point()+geom_smooth()
p+geom_point()+geom_smooth(aes(weight = OrgSize))
# Difference is not apparent between these two
p+geom_point()+geom_density2d()
p+geom_point()+geom_density2d(aes(weight = OrgSize))
geom_density2d is "accepting" the weight aesthetic, but then not passing it to MASS::kde2d, since that function has no weights argument. As a consequence, you will need to use a different 2d-density method.
(I realize my answer is not addressing why the help page says that geom_density2d "understands" the weight argument, but when I have tried to calculate weighted 2D-KDEs, I have needed to use other packages besides MASS. Maybe this is a TODO that @hadley put in the help page that then got overlooked?)
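To illustrate what a weighted 2D density involves, here is a rough base-R sketch using a Gaussian product kernel. The function name kde2d_weighted is hypothetical (it is not part of MASS or ggplot2), and using bw.nrd for the default bandwidths is an assumption made for illustration, not the rule kde2d itself applies:

```r
# Hypothetical helper: weighted 2D kernel density estimate on a regular grid.
# h holds the kernel standard deviations in x and y (an assumed default).
kde2d_weighted <- function(x, y, w, h = c(bw.nrd(x), bw.nrd(y)), n = 50,
                           lims = c(range(x), range(y))) {
  gx <- seq(lims[1], lims[2], length.out = n)
  gy <- seq(lims[3], lims[4], length.out = n)
  w  <- w / sum(w)                               # normalise weights to sum to 1
  kx <- dnorm(outer(gx, x, "-") / h[1]) / h[1]   # n x N kernel values in x
  ky <- dnorm(outer(gy, y, "-") / h[2]) / h[2]   # n x N kernel values in y
  z  <- kx %*% (w * t(ky))                       # weighted sum over the N points
  list(x = gx, y = gy, z = z)
}

# Data from the question
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate     <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize  <- c(500,1000,200,300,1500,800,50,1000,75,800)

dw <- kde2d_weighted(Distance, Rate, OrgSize)             # weighted by OrgSize
d0 <- kde2d_weighted(Distance, Rate, rep(1, 10))          # equal weights

contour(dw$x, dw$y, dw$z, xlab = "Distance", ylab = "Rate")
contour(d0$x, d0$y, d0$z, add = TRUE, col = "red", lty = 2)
points(Distance, Rate)
```

The solid contours (weighted) and dashed red contours (unweighted) differ, which is the effect the question expected from the weight aesthetic.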

Producing statistics over levels

I've generated a set of levels from my dataset, and now I want to find a way to sum the rest of the data columns in order to plot it while plotting my first column. Something like:
levelSet <- cut(frame$x1, "cutting")
boxplot(frame$x1~levelSet)
for (l in levelSet)
{
x2Sum<-sum(frame$x2[levelSet==l])
}
or maybe the inside of the loop should look like:
lines(sum(frame$x2[levelSet==l]))
Any thoughts? I am new to R, but I can't seem to get a hang of the indexing and ~ notation thus far.
I know R doesn't work this way, but I'd like functionality that 'looks' like
hist(frame$x2~levelSet)
## Or
hist(frame$x2, breaks = levelSet)
To plot a histogram, boxplot, etc. over a level set:
Try the lattice package:
library(lattice)
histogram(~x2|equal.count(x1),data=frame)
Substitute shingle for equal.count to set your own break points.
ggplot2 would also work nicely for this.
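For the per-level sums that the loop in the question was aiming at, base R's tapply may be enough. A small sketch, assuming a made-up data frame and arbitrary break points (the question's actual cut points are not shown):

```r
set.seed(1)
# Made-up stand-in for the question's data frame (assumption)
frame <- data.frame(x1 = runif(50, 0, 10), x2 = rpois(50, 5))

# Arbitrary break points chosen for illustration
levelSet <- cut(frame$x1, breaks = c(0, 2.5, 5, 7.5, 10),
                include.lowest = TRUE)

# Sum x2 within each level of levelSet; no explicit loop needed
x2Sum <- tapply(frame$x2, levelSet, sum)

barplot(x2Sum, xlab = "x1 interval", ylab = "sum of x2")
```

tapply(values, groups, fun) applies fun to the values in each group and returns one result per level, named by the level.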
To put a histogram over a boxplot:
par(mfrow=c(2,1))
hist(frame$x2)
boxplot(frame$x2)
You can also use the layout() command to fine-tune the arrangement.
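As a sketch of the layout() variant, the following gives the histogram three quarters of the figure height and puts a horizontal boxplot underneath on the same x range. The data, the height ratio, and the margin values are arbitrary choices for illustration:

```r
set.seed(1)
x2 <- rnorm(200)  # made-up data (assumption)

# Compute the histogram first so both panels can share its x range
h    <- hist(x2, plot = FALSE)
xlim <- range(h$breaks)

# Two rows: tall panel for the histogram, short one for the boxplot
layout(matrix(1:2, nrow = 2), heights = c(3, 1))
op <- par(mar = c(2, 4, 2, 1))
plot(h, xlim = xlim, main = "x2", xlab = "")
par(mar = c(3, 4, 0, 1))
boxplot(x2, horizontal = TRUE, ylim = xlim)  # ylim sets the data axis here
par(op)
layout(1)  # back to a single-panel layout
```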

How to plot data grouped by a factor, but not as a boxplot

In R, given a vector
casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001, 0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478, 0.1927945, 0.2624068, 0.2732615)
and a factor:
trans.factor <- factor (rep (c("t0", "t12", "t24", "t72"), c(4,3,3,3)))
I want to create a plot where the data points are grouped as defined by the factor. So the categories should be on the x-axis, values in the same category should have the same x coordinate.
Simply doing plot(trans.factor, casp6) does almost what I want, it produces a boxplot, but I want to see the individual data points.
require(ggplot2)
qplot(trans.factor, casp6)
You can do it with ggplot2, using facets. When I read "I want to create a plot where the data points are grouped as defined by the factor", the first thing that came to my mind was facets.
But in this particular case, a faster alternative would be:
plot(as.numeric(trans.factor), casp6)
And you can play with plot options afterwards (type, fg, bg...), but I recommend sticking with ggplot2, since it has much cleaner code, great functionality, you can avoid overplotting... etc. etc.
Learn how to deal with factors. You got a boxplot when evaluating plot(trans.factor, casp6) because trans.factor is of class factor (ironically, you even named it in such a manner) and, as such, was passed before the continuous (numeric) variable in the plot() call. Hence plot() "feels" the need to split the data and draw a boxplot for each part (if you pass the continuous variable first, you'll get an ordinary graph, right?). ggplot2, on the other hand, handles the factor differently: qplot() puts it on a discrete x axis and draws points by default (this applies to the syntax provided by Jonathan Chang; you must specify a geom when doing something more complex in ggplot2).
But let's suppose that you have one continuous variable and a factor, and you want to draw a histogram for each part of the continuous variable, as defined by the factor levels. This is where things become complicated with base graphics.
# create dummy data
set.seed(23)
x <- rnorm(200, 23, 2.3)
g <- factor(round(runif(200, 1, 4)))
By using base graphs (package:graphics):
par(mfrow = c(1, 4))
tapply(x, g, hist)
ggplot2 way:
qplot(x, facets = . ~ g)
Try to do this with graphics in one line of code (semicolons and custom functions are considered cheating!):
qplot(x, log(x), facets = . ~ g)
Let's hope that I haven't bored you to death, but helped you!
Kind regards,
aL3xa
I find the following solution:
stripchart(casp6~trans.factor,data.frame(casp6,trans.factor),pch=1,vertical=T)
simple and direct.
(Refer e.g. to http://www.mail-archive.com/r-help@r-project.org/msg34176.html)
You may be able to get close to what you want using lattice graphics by doing:
library(lattice)
xyplot(casp6 ~ trans.factor,
scales = list(x = list(at = 1:4, labels = levels(trans.factor))))
I think there's a better solution (I wrote it for a workshop a few days ago), but it slipped my mind. Here's an ugly substitute with base graphics. Feel free to annotate the x axis ad libitum. Personally, I like Greg's solution.
plot(0, 0, xlim = c(1, 4), ylim = range(casp6), type = "n")
points(casp6 ~ trans.factor)
No extra package needed
I'm a bit late to the party, but I found that you can get the desired result very easily with the standard plot function -- simply convert the factor to a numeric value:
plot(as.numeric(trans.factor), casp6)
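One drawback of the numeric conversion is that the x axis then shows 1 to 4 instead of the level names. A small sketch of how the labels could be put back, using the data from the question (the xlim padding of 0.5 is an arbitrary choice):

```r
casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001,
           0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478,
           0.1927945, 0.2624068, 0.2732615)
trans.factor <- factor(rep(c("t0", "t12", "t24", "t72"), c(4, 3, 3, 3)))

x <- as.numeric(trans.factor)  # integer codes of the levels: 1, 2, 3, 4
k <- nlevels(trans.factor)

# Suppress the default numeric axis, then label the ticks with the levels
plot(x, casp6, xaxt = "n", xlab = "trans.factor",
     xlim = c(0.5, k + 0.5))
axis(1, at = seq_len(k), labels = levels(trans.factor))
```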
10 year old question...but if you want a neat base R solution:
plot(trans.factor, casp6, border=NA, outline=FALSE)
points(trans.factor, casp6)
The first line sets up the plot but draws nothing. The second adds the points. This is slightly neater than the solutions that force x to be numeric.

Plotting predefined density functions using ggplot and R

I have three data sets of different lengths and I would like to plot density functions of all three on the same plot. This is straightforward with base graphics:
n <- c(rnorm(10000), rnorm(10000))
a <- c(rnorm(10001), rnorm(10001, 0, 2))
p <- c(rnorm(10002), rnorm(10002, 2, .5))
plot(density(n))
lines(density(a))
lines(density(p))
Which gives me something like this:
[image: http://www.cerebralmastication.com/wp-content/uploads/2009/10/density.png]
But I really want to do this with ggplot2 because I want to add other features that are only available with ggplot2. It seems that ggplot2 really wants to take my empirical data and calculate the density for me. And it gives me a bunch of lip because my data sets are of different lengths. So how do I get these three densities to plot in ggplot2?
The secret to happiness in ggplot2 is to put everything in the "long" (or what I guess matrix oriented people would call "sparse") format:
df <- rbind(data.frame(x="n",value=n),
data.frame(x="a",value=a),
data.frame(x="p",value=p))
qplot(value, colour=x, data=df, geom="density")
If you don't want colors:
qplot(value, group=x, data=df, geom="density")
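If the densities really must be computed up front (the "predefined" case in the title), one possible approach is to store the output of density() in the same long format and draw it as lines. This is only a sketch; the column names and the samples list are made up for illustration, and the ggplot2 part runs only if the package is installed:

```r
set.seed(1)
n <- c(rnorm(10000), rnorm(10000))
a <- c(rnorm(10001), rnorm(10001, 0, 2))
p <- c(rnorm(10002), rnorm(10002, 2, .5))

samples <- list(n = n, a = a, p = p)

# One block of rows per data set: grid points and the estimated density there
dens <- do.call(rbind, lapply(names(samples), function(nm) {
  d <- density(samples[[nm]])
  data.frame(x = nm, value = d$x, density = d$y)
}))

# The precomputed curves are plotted as plain lines, not re-estimated
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(dens, aes(x = value, y = density, colour = x)) + geom_line())
}
```

Since each density is evaluated on 512 grid points by default, the differing lengths of the raw data sets no longer matter.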

Resources