R: Creating graphs with two y-axes

I'm looking to display two graphs on the same plot in R where the two graphs have vastly different scales, i.e. one goes from -0.001 to 0.0001 and the other goes from 0.05 to 0.2.
I've found this link, http://www.statmethods.net/advgraphs/axes.html,
which shows how to display two y-axes on the same plot, but I'm having trouble.
My code reads as follows:
plot(rateOfChangeMS[,1],type="l",ylim=c(-0.01,.2),axes = F)
lines(ratios[,1])
x = seq(-0.001,0.0001,0.0001)
x2 = seq(0.05,0.2,0.01)
axis(2,x)
axis(4,x2)
The problem I'm having is that, although R shows both axes, they are not next to each other as I would like, with the resulting graph attached. The left axis is measuring the graph with the small range, while the right is measuring the graph from 0.05 to 0.2. The second graph is, in fact, on the plot, but the scaling is so small that you can't see it.
I'm not sure whether I'm violating some etiquette rule; I've never uploaded an image before, so I'm not quite sure how best to do it.
Any help would be greatly appreciated!
Thanks
Mike

Since you don't provide a reproducible example, or a representative dataset, this is a partial answer.
set.seed(1)
df <- data.frame(x=1:100,
y1=-0.001+0.002/(1:100)+rnorm(100,0,5e-5),
y2=0.05+0.0015*(0:99)+rnorm(100,0,1e-2))
ticks.1 <- seq(-0.001,0.001,0.0001)
ticks.2 <- seq(0.05,0.2,0.01)
plot(df$x, df$y1, type="l", yaxt="n", xlab="X", ylab="", col="blue")
axis(2, at=ticks.1, col.ticks="blue", col.axis="blue")
par(new=TRUE)  # draw the next plot on top of the existing one
plot(df$x, df$y2, type="l", xaxt="n", yaxt="n", xlab="", ylab="", col="red")
axis(4, at=ticks.2, col.ticks="red", col.axis="red")
The reason your left axis is compressed is that both axes are drawn on the same scale. You can get around that by superimposing two completely separate plots (which is what a plot with two axes amounts to, after all). Incidentally, dual axes like this are not a good way to visualize data: they can create a grossly misleading visual impression.
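Another way to see what "two scales on one plot" means is to map the second series linearly onto the range of the first and then label the right-hand axis with the original values. The following is only a sketch under that assumption; `rescale_to()` is a hypothetical helper written for this example, not a base R function, and the data are the simulated `df` from the answer.

```r
set.seed(1)
df <- data.frame(x  = 1:100,
                 y1 = -0.001 + 0.002 / (1:100) + rnorm(100, 0, 5e-5),
                 y2 = 0.05 + 0.0015 * (0:99) + rnorm(100, 0, 1e-2))

# Hypothetical helper: map values linearly from the range `from` onto `to`.
rescale_to <- function(v, from, to) {
  (v - from[1]) / diff(from) * diff(to) + to[1]
}

r1 <- range(df$y1)
r2 <- range(df$y2)
y2_scaled <- rescale_to(df$y2, from = r2, to = r1)

op <- par(mar = c(5, 4, 4, 4) + 0.1)   # leave room for the right-hand axis
plot(df$x, df$y1, type = "l", col = "blue", xlab = "X", ylab = "")
lines(df$x, y2_scaled, col = "red")

# Right axis: ticks placed at rescaled positions, labelled with original values.
ticks2 <- pretty(r2)
axis(4, at = rescale_to(ticks2, from = r2, to = r1), labels = ticks2,
     col.ticks = "red", col.axis = "red")
par(op)
```

The same caveat applies as with `par(new=TRUE)`: the apparent crossing points and relative slopes depend entirely on the chosen mapping.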

Related

How can I plot a smooth line over plot points, like a contour/skyline of the plot?

What I'm looking for is best explained by a picture: A line that "contours" the maxima of my points (like giving the "skyline" of the plot). I have a plot of scattered points with dense, (mostly) unique x coordinates (not equally distributed in either axis). I want a red line surfacing this plot:
What I've found so far is that a simple "draw as line" approach fails because the data are dense, with unique x values and a lot of local maxima and minima (basically at every point). For the same reason, a plain "get maximum" approach does not work either.
Therefore I'm asking: Is there some kind of smoothing option for a plot? Or any existing "skyline" operator for a plot?
I am specifically NOT looking for a "contour plot" or a "skyline plot" (as in Bayesian skylineplot) - the terms would actually describe what I want, but unfortunately are already used for other things.
Here is a minimal version of what I'm working with so far, a negative example of lines not giving the desired results. I uploaded sample data here.
load("xy_lidarProfiles.RData")
plot(x, y,
xlab="x", ylab="y", # axis
pch = 20, # point marker style (1 - 20)
asp = 1 # aspect of x and y ratio
)
lines(x, y, type="l", col = "red") # makes a mess
You will get close to your desired result if you order() by x values. What you want then is a running maximum, which TTR::runMax() provides.
plot(x[order(x)], y[order(x)], pch=20)
lines(x[order(x)], TTR::runMax(y[order(x)], n=10), col="red", lwd=2)
You may adjust the window with the n= parameter.
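To make the role of `n` concrete, here is a minimal base R sketch of a trailing-window running maximum. `run_max()` is a hypothetical helper intended to mirror the idea behind `TTR::runMax()` (the first `n - 1` positions have no complete window and are set to `NA`); it is not the package's implementation.

```r
# Hypothetical helper: maximum over the last n values at each position.
run_max <- function(y, n) {
  stopifnot(n >= 1, length(y) >= n)
  # embed() yields one row per window of n consecutive values.
  c(rep(NA_real_, n - 1), apply(embed(y, n), 1, max))
}

y_demo <- c(1, 3, 2, 5, 4)
run_max(y_demo, n = 3)
```

A larger `n` gives a smoother, more conservative "skyline"; a smaller `n` follows the local peaks more closely.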

Contour plot via Scatter plot

Scatter plots are useless when the number of points is large.
So, e.g., using a normal approximation, we can get a contour plot instead.
My question: is there a package that produces a contour plot from scatter-plot data?
Thank you @G5W !! I can do it !!
You don't offer any data, so I will respond with some artificial data,
constructed at the bottom of the post. You also don't say how much data
you have although you say it is a large number of points. I am illustrating
with 20000 points.
You used the group number as the plotting character to indicate the group.
I find that hard to read. But just plotting the points doesn't show the
groups well. Coloring each group a different color is a start, but does
not look very good.
plot(x,y, pch=20, col=rainbow(3)[group])
Two tricks that can make a lot of points more understandable are:
1. Make the points transparent. The dense places will appear darker. AND
2. Reduce the point size.
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
That looks somewhat better, but did not address your actual request.
Your sample picture seems to show confidence ellipses. You can get
those using the function dataEllipse from the car package.
library(car)
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
dataEllipse(x,y,factor(group), levels=c(0.70,0.85,0.95),
plot.points=FALSE, col=rainbow(3), group.labels=NA, center.pch=FALSE)
But if there are really a lot of points, the points can still overlap
so much that they are just confusing. You can also use dataEllipse
to create what is basically a 2D density plot without showing the points
at all. Just plot several ellipses of different sizes over each other filling
them with transparent colors. The center of the distribution will appear darker.
This can give an idea of the distribution for a very large number of points.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
You can get a more continuous look by plotting more ellipses and leaving out the border lines.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=seq(0.11,0.99,0.02),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.05, lty=0)
Please try different combinations of these to get a nice picture of your data.
Additional response to comment: Adding labels
Perhaps the most natural place to add group labels is the centers of the
ellipses. You can get that by simply computing the centroids of the points in each group. So for example,
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
## Now add labels
for(i in unique(group)) {
text(mean(x[group==i]), mean(y[group==i]), labels=i)
}
Note that I just used the number as the group label, but if you have a more elaborate name, you can change labels=i to something like
labels=GroupNames[i].
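As a sketch of the labelling step on its own, the centroids can also be computed in one pass with `tapply()`. The data below are a small simulated stand-in, and `GroupNames` holds made-up names purely for illustration.

```r
set.seed(1)
x <- c(rnorm(200, 0, 1), rnorm(200, 5, 1))
y <- c(rnorm(200, 0, 1), rnorm(200, 6, 1))
group <- rep(1:2, each = 200)
GroupNames <- c("Control", "Treated")   # hypothetical group names

# Per-group centroids; names(cx) holds the group ids as strings.
cx <- tapply(x, group, mean)
cy <- tapply(y, group, mean)

plot(x, y, pch = 20, cex = 0.8, col = rainbow(2, alpha = 0.2)[group])
text(cx, cy, labels = GroupNames[as.integer(names(cx))], font = 2)
```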
Data
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
You can use hexbin::hexbin() to show very large datasets.
@G5W gave a nice dataset:
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
If you don't know the group information, then the ellipses are inappropriate; this is what I'd suggest:
library(hexbin)
plot(hexbin(x,y))
which produces
If you really want contours, you'll need a density estimate to plot. The MASS::kde2d() function can produce one; see the examples in its help page for plotting a contour based on the result. This is what it gives for this dataset:
library(MASS)
contour(kde2d(x,y))
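If it helps, the density estimate can be stored first so that the grid resolution and number of contour levels are explicit, and the contours can then be overlaid on faint points. This is only a sketch; the values `n = 100` and `nlevels = 8` are arbitrary choices, not recommendations.

```r
library(MASS)

set.seed(1)
x <- c(rnorm(2000, 0, 1), rnorm(7000, 1, 1), rnorm(11000, 5, 1))
twist <- c(rep(0, 2000), rep(-0.5, 7000), rep(0.4, 11000))
y <- c(rnorm(2000, 0, 1), rnorm(7000, 5, 1), rnorm(11000, 6, 1)) + twist * x

# 2D kernel density estimate on a 100 x 100 grid (list with x, y, z).
dens <- kde2d(x, y, n = 100)

# Faint points for context, density contours on top.
plot(x, y, pch = 20, cex = 0.3, col = rgb(0, 0, 0, alpha = 0.05))
contour(dens, nlevels = 8, col = "red", lwd = 2, add = TRUE)
```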

How do I plot an abline() when I don't have any data points (in R)

I have to plot a few different simple linear models on a chart, the main point being to comment on them. I have no data for the models. I can't get R to create a plot with appropriate axes, i.e. I can't get the range of the axes correct. I think I'd like my y-axis to be 0-400 and my x-axis to be 0-50.
Models are:
$$
\widehat y=108+0.20x_1
$$
$$
\widehat y=101+2.15x_1
$$
$$
\widehat y=132+0.20x_1
$$
$$
\widehat y=119+8.15x_1
$$
I know I could possibly do this much more easily in a different software or create a dataset from the model and estimate and plot the model from that but I'd love to know if there is a better way in R.
As @Glen_b noticed, type = "n" in plot() produces a plot with nothing on it. Because plot() requires data, you have to provide something as x; it can be NA or some actual data. If you provide actual data, plot() works out the axis limits from the data; otherwise you have to set the limits by hand using the xlim and ylim arguments. Next, use abline(), which takes parameters a and b for the intercept and slope (or h and v if you just want a horizontal or vertical line).
plot(x=NA, type="n", ylim=c(100, 250), xlim=c(0, 50),
xlab=expression(x[1]), ylab=expression(hat(y)))
abline(a=108, b=0.2, col="red")
abline(a=101, b=2.15, col="green")
abline(a=132, b=0.2, col="blue")
abline(a=119, b=8.15, col="orange")
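Rather than guessing `ylim`, it can be derived from the models: a straight line reaches its extremes at the ends of the x-range, so evaluating each model at both ends of `xlim` is enough. A minimal sketch, where the `models` data frame is just a convenient way of holding the four coefficient pairs from the question:

```r
models <- data.frame(a   = c(108, 101, 132, 119),
                     b   = c(0.20, 2.15, 0.20, 8.15),
                     col = c("red", "green", "blue", "orange"),
                     stringsAsFactors = FALSE)
xlim <- c(0, 50)

# Rows: models; columns: fitted values at xlim[1] and xlim[2].
ends <- outer(models$b, xlim) + models$a
ylim <- range(ends)

plot(NA, type = "n", xlim = xlim, ylim = ylim,
     xlab = expression(x[1]), ylab = expression(hat(y)))
for (i in seq_len(nrow(models))) {
  abline(a = models$a[i], b = models$b[i], col = models$col[i])
}
legend("topleft", lty = 1, col = models$col,
       legend = paste0(models$a, " + ", models$b, " x"))
```

With these coefficients the steepest model reaches about 526 at x = 50, so a fixed upper limit of 250 or 400 clips it.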

Align x-axis to plot for consistent use with grid

I'm trying to build a histogram using data available from here. I'm using the CSV version of this database to display the number of exoplanets discovered per year. A simple script would be
bulkdata <- read.csv('file.csv', header=TRUE, sep=',')
pdf(file="yearcount.pdf",family="Times")
bins <- seq(min(bulkdata$discovered,na.rm=T),max(bulkdata$discovered,na.rm=T),by=1)
hist(bulkdata$discovered,breaks=bins,col='gray',ylab="Discovered",xlab="Year",main="",ylim=c(0,100),axes=FALSE)
axis(1, at=seq(1989,2012,by=1))
axis(2, at=seq(0,100,by=10))
grid(nx=10)
hist(bulkdata$discovered,breaks=bins,col='gray',ylab="Discovered",xlab="Year",main="", add=TRUE)
dev.off()
The problem is that the x-axis is not aligned with the 0 point of the y-axis. This matters because the lines drawn by grid() are meaningless when they are not aligned with the ticks! I tried adding the option line=-1 to axis(1, at=seq(1989,2012,by=1)); the axis is then drawn correctly, but the grid starts below the axis. Maybe a non-standard package is needed?
?grid says:
If more fine tuning is required, use ‘abline(h = ., v = .)’
directly.
So here's a suggestion:
par(las=1,bty="l")
h <- hist(bulkdata$discovered,breaks=bins,
col='gray',ylab="Discovered",xlab="Year",main="",
ylim=c(0,100),axes=FALSE)
yrs <- 1989:2012
yvals <- seq(0,100,by=10)
axis(1, at=yrs)
axis(2, at=yvals)
abline(h=yvals,v=yrs,col="gray",lty=3)
hist(bulkdata$discovered,breaks=bins,
col='gray',ylab="Discovered",xlab="Year",main="", add=TRUE)
I would consider making the grid lines a little bit sparser (e.g. every 5 years?)
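A sketch of the sparser variant, with vertical grid lines and x-axis ticks every 5 years. Since the original CSV is not available here, `discovered` is simulated and purely a stand-in for `bulkdata$discovered`.

```r
set.seed(1)
# Simulated stand-in for bulkdata$discovered (hypothetical data).
discovered <- sample(1989:2012, 500, replace = TRUE)

bins       <- seq(min(discovered), max(discovered), by = 1)
grid_years <- seq(1990, 2010, by = 5)   # sparser vertical grid
yvals      <- seq(0, 100, by = 10)

hist(discovered, breaks = bins, col = "gray", xlab = "Year",
     ylab = "Discovered", main = "", ylim = c(0, 100), axes = FALSE)
axis(1, at = grid_years)
axis(2, at = yvals, las = 1)
# Grid lines are drawn at exactly the tick positions, so they always line up.
abline(h = yvals, v = grid_years, col = "gray", lty = 3)
# Redraw the bars so they sit on top of the grid.
hist(discovered, breaks = bins, col = "gray", add = TRUE)
```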

change look-and-feel of plot to resemble hist

I used the information from this post to create a histogram with logarithmic scale:
Histogram with Logarithmic Scale
However, the output from plot looks nothing like the output from hist. Does anyone know how to configure the output from plot to resemble the output from hist? Thanks for the help.
A simplified, reproducible version of the linked answer is
x <- rlnorm(1000)
hx <- hist(x, plot=FALSE)
plot(hx$counts, type="h", log="y", lwd=10, lend="square")
To get the axes looking more "hist-like", replace the last line with
plot(hx$counts, type="h", log="y", lwd=10, lend="square", axes = FALSE)
Axis(side=1)
Axis(side=2)
Getting the bars to join up is going to be a nightmare using this method. I suggest using trial and error with values of lwd (in this example, 34 is somewhere close to looking right), or learning to use lattice or ggplot.
EDIT:
You can't set a border colour, because the bars aren't really rectangles – they are just fat lines. We can fake the border effect by drawing slightly thinner lines over the top. The updated code is
par(lend="square")
bordercol <- "blue"
fillcol <- "pink"
linewidth <- 24
plot(hx$counts, type="h", log="y", lwd=linewidth, col=bordercol, axes = FALSE)
lines(hx$counts, type="h", lwd=linewidth-2, col=fillcol)
Axis(side=1)
Axis(side=2)
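A different technique from the fat-line trick above is to draw real rectangles with rect() on an empty log-scaled plot, which gives proper borders and bars that join up. The sketch below makes two assumptions of its own: bars start at an arbitrary baseline `ybase = 0.5` (a log axis cannot start at 0), and empty bins are skipped. Unlike the code above, the x-axis here shows the actual bin breaks rather than the bin index.

```r
set.seed(42)
x  <- rlnorm(1000)
hx <- hist(x, plot = FALSE)   # compute bins only, no drawing

nb    <- length(hx$breaks)
left  <- hx$breaks[-nb]       # left edge of each bin
right <- hx$breaks[-1]        # right edge of each bin
keep  <- hx$counts > 0        # zero counts cannot be shown on a log axis
ybase <- 0.5                  # arbitrary positive baseline for the bars

# Empty plot with a log y-axis spanning the baseline to the tallest bar.
plot(range(hx$breaks), c(ybase, max(hx$counts)), type = "n", log = "y",
     xlab = "x", ylab = "Count")
rect(left[keep], ybase, right[keep], hx$counts[keep],
     col = "pink", border = "blue")
```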
How about using ggplot2?
library(ggplot2)
x <- rnorm(1000)
qplot(x) + scale_y_log10()
But I agree with Hadley's comment on the other post that having a histogram with a log scale seems weird to me =).
