How do I prevent the red line for the last distribution from being plotted outside of the area of the plot in the graph below?
chart http://i.minus.com/jRiGxDBVw6kjZ.jpeg
I generated the graph with the following code:
x <- seq(0,4,.1)
alpha_0 <- 2
beta_0 <- .2
hist(rexp(256, rate=1))
sample <- rexp(256, rate=1)
plot(x,dgamma(x, shape=alpha_0, rate=beta_0),type='l',col='black',ylim=c(0,2),main="Posteriors of Exponential Distribution", ylab='')
lines(x,dgamma(x, shape=alpha_0+4, rate=beta_0+sum(sample[1:4])),col='blue')
lines(x,dgamma(x, shape=alpha_0+8, rate=beta_0+sum(sample[1:8])),col='green')
lines(x,dgamma(x, shape=alpha_0+16, rate=beta_0+sum(sample[1:16])),col='orange')
lines(x,dgamma(x, shape=alpha_0+256, rate=beta_0+sum(sample[1:256])),col='red')
legend(x=2.5,y=2, c("prior","n=4", "n=8", "n=16", 'n=256'), col = c('black', 'blue', 'green','orange' ,'red'),lty=c(1,1,1,1))
Sorry, seems like a pretty simple fix, I just couldn't figure it out from the documentation. Thanks for your help.
Yes, as Joran mentioned, it was graphing the line outside of the plot area because I ran par(xpd=TRUE) earlier in the session to try to put the legend outside. I simply ran par(xpd=FALSE) and it solved the problem.
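If the original goal was a legend outside the plot area, one way to avoid changing xpd for the whole session is to pass xpd=TRUE to legend() itself, so that only the legend escapes clipping. This is just a sketch: the seed, the margin size and the inset value are my own guesses to be tuned, not values from the session above.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
x <- seq(0, 4, 0.1)
draws <- rexp(256, rate = 1)

op <- par(mar = c(5, 4, 4, 8))     # widen the right margin (assumed size)
plot(x, dgamma(x, shape = 2, rate = 0.2), type = "l", ylim = c(0, 2),
     ylab = "", main = "Posteriors of Exponential Distribution")
# Clipped to the plot region, because par("xpd") is still FALSE
lines(x, dgamma(x, shape = 2 + 256, rate = 0.2 + sum(draws)), col = "red")
# xpd = TRUE applies to this legend() call only
legend("topright", inset = c(-0.35, 0), xpd = TRUE,
       legend = c("prior", "n=256"), col = c("black", "red"), lty = 1)
par(op)                            # restore the previous margins
```

Because the global setting is never touched, later lines() calls stay clipped to the plot region.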
Scatter plots are useless when the number of points is large.
So, e.g., using a normal approximation, we can get a contour plot instead.
My question: is there a package that produces a contour plot from a scatter plot?
Thank you @G5W!! I can do it!!
You don't offer any data, so I will respond with some artificial data,
constructed at the bottom of the post. You also don't say how much data
you have although you say it is a large number of points. I am illustrating
with 20000 points.
You used the group number as the plotting character to indicate the group.
I find that hard to read. But just plotting the points doesn't show the
groups well. Coloring each group a different color is a start, but does
not look very good.
plot(x,y, pch=20, col=rainbow(3)[group])
Two tricks that can make a lot of points more understandable are:
1. Make the points transparent. The dense places will appear darker. AND
2. Reduce the point size.
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
That looks somewhat better, but did not address your actual request.
Your sample picture seems to show confidence ellipses. You can get
those using the function dataEllipse from the car package.
library(car)
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
dataEllipse(x,y,factor(group), levels=c(0.70,0.85,0.95),
plot.points=FALSE, col=rainbow(3), group.labels=NA, center.pch=FALSE)
But if there are really a lot of points, the points can still overlap
so much that they are just confusing. You can also use dataEllipse
to create what is basically a 2D density plot without showing the points
at all. Just plot several ellipses of different sizes over each other filling
them with transparent colors. The center of the distribution will appear darker.
This can give an idea of the distribution for a very large number of points.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
You can get a more continuous look by plotting more ellipses and leaving out the border lines.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=seq(0.11,0.99,0.02),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.05, lty=0)
Please try different combinations of these to get a nice picture of your data.
Additional response to comment: Adding labels
Perhaps the most natural place to add group labels is the centers of the
ellipses. You can get that by simply computing the centroids of the points in each group. So for example,
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
## Now add labels
for(i in unique(group)) {
text(mean(x[group==i]), mean(y[group==i]), labels=i)
}
Note that I just used the number as the group label, but if you have a more elaborate name, you can change labels=i to something like
labels=GroupNames[i].
Data
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
You can use hexbin::hexbin() to show very large datasets.
@G5W gave a nice dataset:
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
If you don't know the group information, then the ellipses are inappropriate; this is what I'd suggest:
library(hexbin)
plot(hexbin(x,y))
which produces
If you really want contours, you'll need a density estimate to plot. The MASS::kde2d() function can produce one; see the examples in its help page for plotting a contour based on the result. This is what it gives for this dataset:
library(MASS)
contour(kde2d(x,y))
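If you want to see the contours in the context of the raw points, a possible sketch is to draw very transparent points first and add the contours on top. The grid size (n = 50), the number of levels, the seed and the colours are arbitrary choices of mine.

```r
library(MASS)

set.seed(1)                        # arbitrary seed, only for reproducibility
x <- c(rnorm(2000, 0, 1), rnorm(7000, 1, 1), rnorm(11000, 5, 1))
twist <- c(rep(0, 2000), rep(-0.5, 7000), rep(0.4, 11000))
y <- c(rnorm(2000, 0, 1), rnorm(7000, 5, 1), rnorm(11000, 6, 1)) + twist * x

dens <- kde2d(x, y, n = 50)        # density estimate on a 50 x 50 grid
# Faint points underneath, so dense regions still read as darker
plot(x, y, pch = 20, cex = 0.5, col = rgb(0, 0, 0, alpha = 0.05))
# add = TRUE draws the contours on the existing plot instead of a new one
contour(dens, add = TRUE, col = "red", nlevels = 8)
```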
I'm new to R. Previously, I've been able to overlay 2 separate plots that were of the same kind, p1 and p2, using plot (p1); plot (p2, add=T).
I'm struggling with the definition of factors when overlaying a barplot with a point plot showing all individual points.
I can plot the barplot on its own the way I want it. The point plot also looks the way I want, but I realize I'm incorrectly defining phase as numeric to force plot() to display each value, rather than defaulting to a boxplot (as happens when I use plot(my.df$cond, my.df$val)).
Any tips on defining my variable types correctly or whether I'm using the correct barplot and plot functions, would be greatly appreciated. Thank you so much.
shpad <- c(1,2,5,6,1,2,5,6,1,2,5,6,1,2,5,6)
my.df <- data.frame(val=c(0.0738,0.0518,0.002,0.0397,0.1452,0.1152,0.1774,0.0658,0.0218,0.0497,-0.0296,0.0653,0.0848,0.1296,0.1416,0.0923),
phase=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
sub=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
cond=c("NsNm", "NsNm", "NsNm", "NsNm", "NsLm", "NsLm", "NsLm", "NsLm", "LsNm", "LsNm", "LsNm", "LsNm", "LsLm", "LsLm", "LsLm", "LsLm"))
avg <-tapply(my.df$val, my.df$phase, mean)
barplot(avg, border=NA, names.arg=c("NsNm", "NsLm", "LsNm", "LsLm"),col=c("blue","darkblue","red", "darkred"),ylab = "score",ylim=c(-0.03,0.25))
plot(my.df$phase, my.df$val, type="p", ylim=c(-0.03,0.25), ylab = "score", pch=shpad)
tl;dr: problem is that if instead of the last line, I have plot(my.df$phase, my.df$val, type="p", ylim=c(-0.03,0.25), ylab = "score", pch=shpad, add=T), the formats are incongruent.
Alright, so, I've tried for a bit to accomplish what you wanted, but the best I could do with the base plotting system is this:
Which is accomplished purely by your lines of code above except for the last line, which I replaced with
points(my.df$phase,my.df$val,type="p",pch=shpad)
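One thing to watch with the base approach: barplot() does not put the bars at x = 1, 2, 3, 4. It returns the midpoints it actually used, so a sketch that lines the points up with the bars is to capture that return value and index it by phase. The variable name bp is just my choice.

```r
my.df <- data.frame(
  val = c(0.0738, 0.0518, 0.002, 0.0397, 0.1452, 0.1152, 0.1774, 0.0658,
          0.0218, 0.0497, -0.0296, 0.0653, 0.0848, 0.1296, 0.1416, 0.0923),
  phase = rep(1:4, each = 4))
shpad <- rep(c(1, 2, 5, 6), times = 4)   # one plotting symbol per subject

avg <- tapply(my.df$val, my.df$phase, mean)
# barplot() returns the x coordinates of the bar midpoints
bp <- barplot(avg, border = NA, names.arg = c("NsNm", "NsLm", "LsNm", "LsLm"),
              col = c("blue", "darkblue", "red", "darkred"),
              ylab = "score", ylim = c(-0.03, 0.25))
# Look up each point's bar midpoint by its phase (1 to 4)
points(bp[my.df$phase], my.df$val, pch = shpad)
```

With the default bar width and spacing the midpoints are 0.7, 1.9, 3.1 and 4.3, which is why plotting at my.df$phase directly drifts to the right of the first bars and to the left of the last one.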
However, I think you can do much better, if you want to keep the same kind of plot, using the ggplot2 library. Using this code:
library('ggplot2')
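# one row per phase: the mean score and the phase label
new.df <- data.frame(avg=as.vector(avg), phase=levels(factor(my.df$phase)))
ggplot(new.df) +
geom_bar(stat="identity",aes(x=phase,y=avg, fill=c("NsNm","NsLm","LsNm","LsLm")))+
# the individual points come from my.df, so pass it as this layer's data
geom_point(data=my.df, aes(x=factor(phase),y=val,shape=factor(shpad))) +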
scale_x_discrete(name="Type",labels=c("NsNm","NsLm","LsNm","LsLm")) +
ylab("Score")
you can make this chart:
I didn't adjust the coloring and the point types and the legend titles (not sure how important they are, but those can be fiddled with). However, you can see this probably produces the result you were aiming for.
I am trying to remove all grid lines outside the graph. I noticed that the behavior in R is not deterministic, i.e., sometimes the grid lines are inside the graph only (as I want), but sometimes they span the entire figure (see sample). I'd like the grid lines to always stay inside.
I read the grid() manual, but could not find an option to do so. abline() also draws lines across the entire figure.
The code I am using is
plot(xrange, yrange, type="n", xlab="X", ylab="Y", xlim=c(200,1500), ylim=c(0,10000))
...
grid(lty=3, col="gray")
Any help is appreciated. Thanks,
Nodir
When I have had this problem it is because par(xpd=TRUE) is somewhere in the code. Try setting par(xpd=FALSE) before calling grid(), and then set par(xpd=TRUE) again afterwards if you still need it. The sample code below draws the same scatter plot twice: once with the grid lines confined to the plot region, and once with them extending outside it.
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
# scatter plot with gridlines inside
par(xpd=FALSE) # do not plot outside the plot region
plot(x,y)
grid(lwd=2)
# scatterplot with gridlines outside the region
par(xpd=TRUE) # plot outside the plot region
plot(x,y)
grid(lwd=2)
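If toggling xpd by hand gets tedious, a small wrapper can do it for you. This is only a sketch; grid_inside is a name I made up, not a function from any package.

```r
# Hypothetical helper: draw grid lines clipped to the plot region,
# whatever par("xpd") happens to be when it is called
grid_inside <- function(...) {
  op <- par(xpd = FALSE)   # clip to the plot region for this call only
  on.exit(par(op))         # put the caller's setting back afterwards
  grid(...)
}

set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
par(xpd = TRUE)            # e.g. left over from placing a legend outside
plot(x, y)
grid_inside(lwd = 2)       # grid lines stay inside the plot region
```

After the call par("xpd") is still TRUE, so anything later in the script that relies on drawing outside the plot region keeps working.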
I have plotted a histogram in R and marked quantiles using abline() in vertical intervals. However, I want to plot a legend that shows the corresponding values to the quantiles together with the quantile interval itself.
The current legend is almost there as you can see if you run the example code below. But I can't seem to succeed at aligning the legend interval with its corresponding value and colored line symbol. I tried to use a data.frame() to achieve this but it didn't work out.
Any tips or suggestions will be very much appreciated.
x<-1:100
quantiles_x<-quantile(x)
hist(x)
abline(v=quantiles_x, col=c("blue", "green","red","yellow","black"))
legend('topright', legend=c(names(quantiles_x), levels(factor(quantiles_x))), lwd=1, col=c("blue","green","red","yellow","black"))
Something like this??
x<-1:100
quantiles_x<-quantile(x)
hist(x)
abline(v=quantiles_x, col=c("blue", "green","red","yellow","black"))
labels <- paste(names(quantiles_x), "[",quantiles_x,"]")
legend('topright', legend=labels, lwd=1,
col=c("blue","green","red","yellow","black"))
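A small variation, in case you want tighter control over how the numbers look: sprintf() lets you fix the number of decimals and avoids the extra spaces that paste() puts around the brackets. The two-decimal format is just an assumption about what you want.

```r
x <- 1:100
quantiles_x <- quantile(x)
cols <- c("blue", "green", "red", "yellow", "black")

# One label per quantile, e.g. "25% [25.75]"
labels <- sprintf("%s [%.2f]", names(quantiles_x), quantiles_x)

hist(x)
abline(v = quantiles_x, col = cols)
legend("topright", legend = labels, lwd = 1, col = cols)
```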
I used the information from this post to create a histogram with logarithmic scale:
Histogram with Logarithmic Scale
However, the output from plot looks nothing like the output from hist. Does anyone know how to configure the output from plot to resemble the output from hist? Thanks for the help.
A simplified, reproducible version of the linked answer is
x <- rlnorm(1000)
hx <- hist(x, plot=FALSE)
plot(hx$counts, type="h", log="y", lwd=10, lend="square")
To get the axes looking more "hist-like", replace the last line with
plot(hx$counts, type="h", log="y", lwd=10, lend="square", axes = FALSE)
Axis(side=1)
Axis(side=2)
Getting the bars to join up is going to be a nightmare using this method. I suggest using trial and error with values of lwd (in this example, 34 is somewhere close to looking right), or learning to use lattice or ggplot.
EDIT:
You can't set a border colour, because the bars aren't really rectangles – they are just fat lines. We can fake the border effect by drawing slightly thinner lines over the top. The updated code is
par(lend="square")
bordercol <- "blue"
fillcol <- "pink"
linewidth <- 24
plot(hx$counts, type="h", log="y", lwd=linewidth, col=bordercol, axes = FALSE)
lines(hx$counts, type="h", lwd=linewidth-2, col=fillcol)
Axis(side=1)
Axis(side=2)
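If you would rather have real rectangles than fat lines, one alternative sketch is to set up an empty log-scaled plot and draw the bars with rect(), using the breaks from hist(). The bars then join up by construction and take a proper border colour. The floor value of 0.5 for the bottom of the bars is my own assumption (it only needs to be positive on a log axis), and empty bins are skipped because they cannot be shown on a log scale.

```r
set.seed(1)                                  # arbitrary seed
x <- rlnorm(1000)
hx <- hist(x, plot = FALSE)

keep  <- hx$counts > 0                       # empty bins have no log height
left  <- hx$breaks[-length(hx$breaks)][keep] # left edge of each kept bin
right <- hx$breaks[-1][keep]                 # right edge of each kept bin
base  <- 0.5                                 # assumed bar floor, must be > 0

# Empty plot with a log y axis, sized to hold all the bars
plot(range(hx$breaks), c(base, max(hx$counts)), type = "n", log = "y",
     xlab = "x", ylab = "Frequency")
rect(left, base, right, hx$counts[keep], col = "pink", border = "blue")
```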
How about using ggplot2?
library(ggplot2)
x <- rnorm(1000)
qplot(x) + scale_y_log10()
But I agree with Hadley's comment on the other post that having a histogram with a log scale seems weird to me =).