Multiple Pen's Parade Graphs on the same Plot - r

I'm doing stochastic dominance analysis with diferent income distributions using Pen's Parade. I can plot a single Pen's Parade using Pen function from ineq package, but I need a visual comparison and I want multiple lines in the same image. I don't know how extract values from the function, so I can't do this.
I have the following reproducible example:
set.seed(123)
x <- rnorm(100)
y <- rnorm(100, mean = 0.2)
library(ineq)
Pen(x)
Pen(y)
I obtain the following plots:
I want obtain sometime as the following:

You can use add = TRUE:
set.seed(123)
x <- rnorm(100)
y <- rnorm(100, mean = 0.2)
library(ineq)
Pen(x); Pen(y, add = TRUE)
From help("Pen"):
add logical. Should the plot be added to an existing plot?
While the solution mentioned by M-M in the comments is a more general solution, in this specific case it produces a busy Y axis:
Pen(x)
par(new = TRUE)
Pen(y)
I would generalize the advice for plotting functions in this way:
Check the plotting function's help file. If it has an add argument, use that.
Otherwise, use the par(new = TRUE) technique
Update
As M-M helpfully mentions in the comments, their more general solution will not produce a busy Y axis if you manually suppress the Y axis on the second plot:
Pen(x)
par(new = TRUE)
Pen(y, yaxt = "n")

Looking at ?ineq::Pen() it seems to work like plot(); therefore, followings work for you.
Pen(x)
Pen(y, add=T)
Note: However, add=T cuts out part of your data since second plot has points which fall out of the limit of the first.
Update on using par(new=T):
Using par(new=T) basically means overlaying two plots on top of each other; hence, it is important to make them with the same scale. We can achieve that by setting the same axis limits. That said, while using add=T argument it is desired to set limits of the axis to not loose any part of data. This is the best practice for overlaying two plots.
Pen(x, ylim=c(0,38), xlim=c(0,1))
par(new=T)
Pen(y, col="red", ylim=c(0,38), xlim=c(0,1), yaxt='n', xaxt='n')
Essentially, you can do the same with add=T.

Related

Axis breaks in ggplot histogram in R [duplicate]

I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for th histogram.
Yes, i know this means not all bins are of equal size
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000))) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)![alt text][3]
the only problem is that I'd like to match between the scale and the actual bars plotted. There two options for doing that : the one is simply use the actual margins of the plotted bars (how?) then get "ugly" x-axis labels like 1.1754,1.2985 etc. The other, which I prefer, is to control the actual bins margins used so they will match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to workaround above.
Using ggplot2 seems like the most easy option. If you want more control over your axes and your breaks, you can do something like the following :
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:

filled.contour() in R: nonlinear key range

I am using filled.contour() to plot data stored in a matrix. The data is generated by a (highly) non-linear function, hence its distribution is not uniform at all and the range is very large.
Consequently, I have to use the option "levels" to fine tune the plot. However, filled.contour() does not use these custom levels to make an appropriate color key for the heat map, which I find quite surprising.
Here is a simple example of what I mean:
x = c(20:200/100)
y = c(20:200/100)
z = as.matrix(exp(x^2)) %*% exp(y^2)
filled.contour(x=x,y=y,z=z,color.palette=colorRampPalette(c('green','yellow','red')),levels=c(1:60/3,30,50,150,250,1000,3000))
As you can see, the color key produced with the code above is pretty much useless. I would like to use some sort of projection (perhaps sin(x) or tanh(x)?), so that the upper range is not over-represented in the key (in a linear way).
At this point, I would like to:
1) know if there is something very simple/obvious I am missing, e.g.: an option to make this "key range adapting" automagically;
2) seek suggestions/help on how to do it myself, should the answer to 1) be negative.
Thanks a lot!
PS: I apologize for my English, which is far from perfect. Please let me know if you need me to clarify anything.
I feel your frustration. I never found a way to do this with filled contour, so have usually reverted to using image and then adding my own scale as a separate plot. I wrote the function image.scale to help out with this (link). Below is an example of how you can supply a log-transform to your scale in order to stretch out the small values - then label the scale with the non-log-transformed values as labels:
Example:
source("image.scale.R") # http://menugget.blogspot.de/2011/08/adding-scale-to-image-plot.html
x = c(20:200/100)
y = c(20:200/100)
z = as.matrix(exp(x^2)) %*% exp(y^2)
pal <- colorRampPalette(c('green','yellow','red'))
breaks <- c(1:60/3,30,50,150,250,1000,3000)
ncolors <- length(breaks)-1
labs <- c(0.5, 1, 3,30,50,150,250,1000,3000)
#x11(width=6, height=6)
layout(matrix(1:2, nrow=1, ncol=2), widths=c(5,1), heights=c(6))
layout.show(2)
par(mar=c(5,5,1,1))
image(x=x,y=y,z=log(z), col=pal(ncolors), breaks=log(breaks))
box()
par(mar=c(5,0,1,4))
image.scale(log(z), col=pal(ncolors), breaks=log(breaks), horiz=FALSE, xlab="", ylab="", xaxt="n", yaxt="n")
axis(4, at=log(labs), labels=labs)
box()
Result:

Plot with two Y axes : confidence intervals

I am trying to plot several points with error bars, with two y axes.
However at every call of the plotCI or errbar functions, a new plot is initialized - with or without par(new=TRUE) calls -.
require(plotrix)
x <- 1:10
y1 <- x + rnorm(10)
y2<-x+rnorm(10)
delta <- runif(10)
plotCI(x,y=y1,uiw=delta,xaxt="n",gap=0)
axis(side=1,at=c(1:10),labels=rep("a",10),cex=0.7)
par(new=TRUE)
axis(4)
plotCI(x,y=y2,uiw=delta,xaxt="n",gap=0)
I have also tried the twoord.plot function from plotrix, but I don't think it's possible to add the error bars.
With ggplot2 I have only managed to plot in two different panels with the same Y axis.
Is there a way to do this?
Use add=TRUE,
If FALSE (default), create a new plot; if TRUE, add error bars to an
existing plot.
For example the last line becomes:
plotCI(x,y=y2,uiw=delta,xaxt="n",gap=0,add=TRUE)
PS: hard to do this with ggplot2. take a look at this hadley code
EDIT
The user coordinate system is now redefined by specifying a new user setting. Here I do it manually.
plotCI(x,y=y1,uiw=delta,xaxt="n",gap=0)
axis(side=1,at=c(1:10),labels=rep("a",10),cex=0.7)
usr <- par("usr")
par(usr=c(usr[1:2], -1, 20))
plotCI(x,y=y2,uiw=delta,xaxt="n",gap=0,add=TRUE,col='red')
axis(4,col.ticks ='red')

R: Plotting one ECDF on top of another in different colors

I have a couple of cumulative empirical density functions which I would like to plot on top of each other in order to illustrate differences in the two curves. As was pointed out in a previous question, the function to draw the ECDF is simply plot(Ecdf()) And as I read the fine manual page, I determined that I can plot multiple ECDFs on top of each other using something like the following:
require( Hmisc )
set.seed(3)
g <- c(rep(1, 20), rep(2, 20))
Ecdf(c( rnorm(20), rnorm(20)), group=g)
However my curves sometimes overlap a bit and can be hard to tell which is which, just like the example above which produces this graph:
I would really like to make the color of these two CDFs different. I can't figure out how to do that, however. Any tips?
If memory serves, I have done this in the past. As I recall, you needed to trick it as Ecdf() is so darn paramterised. I think in help(ecdf) it hints that it is just a plot of stepfunctions, so you could estimate two or more ecdfs, plot one and then annotate via lines().
Edit Turns out it is as easy as
R> Ecdf(c(rnorm(20), rnorm(20)), group=g, col=c('blue', 'orange'))
as the help page clearly states the col= argument. But I have also found some scriptlets where I used plot.stepfun() explicitly.
You can add each curve one at a time (each with its own style), e.g.
Ecdf(rnorm(20), lwd = 2)
Ecdf(rnorm(20),add = TRUE, col = 'red', lty = 1)
Without using Ecdf (doesn't look like Hmisc is available):
set.seed(3)
mat <- cbind(rnorm(20), rnorm(20))
matplot(apply(mat, 2, sort), seq(20)/20, type='s')

How can I plot a histogram of a long-tailed data using R?

I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for th histogram.
Yes, i know this means not all bins are of equal size
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000))) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)![alt text][3]
the only problem is that I'd like to match between the scale and the actual bars plotted. There two options for doing that : the one is simply use the actual margins of the plotted bars (how?) then get "ugly" x-axis labels like 1.1754,1.2985 etc. The other, which I prefer, is to control the actual bins margins used so they will match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to workaround above.
Using ggplot2 seems like the most easy option. If you want more control over your axes and your breaks, you can do something like the following :
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:

Resources