change look-and-feel of plot to resemble hist - r

I used the information from this post to create a histogram with logarithmic scale:
Histogram with Logarithmic Scale
However, the output from plot looks nothing like the output from hist. Does anyone know how to configure the output from plot to resemble the output from hist? Thanks for the help.

A simplified, reproducible version of the linked answer is
x <- rlnorm(1000)
hx <- hist(x, plot=FALSE)
plot(hx$counts, type="h", log="y", lwd=10, lend="square")
To get the axes looking more "hist-like", replace the last line with
plot(hx$counts, type="h", log="y", lwd=10, lend="square", axes = FALSE)
Axis(side=1)
Axis(side=2)
Getting the bars to join up is going to be a nightmare using this method. I suggest using trial and error with values of lwd (in this example, 34 is somewhere close to looking right), or learning to use lattice or ggplot.
EDIT:
You can't set a border colour, because the bars aren't really rectangles – they are just fat lines. We can fake the border effect by drawing slightly thinner lines over the top. The updated code is
par(lend="square")
bordercol <- "blue"
fillcol <- "pink"
linewidth <- 24
plot(hx$counts, type="h", log="y", lwd=linewidth, col=bordercol, axes = FALSE)
lines(hx$counts, type="h", lwd=linewidth-2, col=fillcol)
Axis(side=1)
Axis(side=2)
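If trial and error with lwd gets tedious, one alternative (a different technique from the fat-line approach above, offered only as a minimal sketch) is to draw real rectangles with rect() on a log-scaled plot. The bars then join up by construction and can take a border colour. The baseline value 0.5 is an assumption: a log axis has no zero, so the bars need some positive floor.

```r
set.seed(42)                      # only so the sketch is repeatable
x  <- rlnorm(1000)
hx <- hist(x, plot = FALSE)

nb     <- length(hx$breaks)
left   <- hx$breaks[-nb]          # left edge of each bin
right  <- hx$breaks[-1]           # right edge of each bin
keep   <- hx$counts > 0           # a log scale cannot show empty bins
bottom <- 0.5                     # assumed baseline; a log axis has no zero

# Empty log-y plot spanning the bins, then one rectangle per non-empty bin
plot(range(hx$breaks), c(bottom, max(hx$counts)), type = "n", log = "y",
     xlab = "x", ylab = "Frequency", main = "Histogram of x")
rect(left[keep], bottom, right[keep], hx$counts[keep],
     col = "pink", border = "blue")
```

Because the x-axis here uses the actual break values rather than bin indices, the tick labels also match what hist would show.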

How about using ggplot2?
library(ggplot2)
x <- rnorm(1000)
qplot(x) + scale_y_log10()
But I agree with Hadley's comment on the other post that having a histogram with a log scale seems weird to me =).

Related

Create scatter plot with third dimension and multiple colors

Purpose
Create scatter plot with third dimension and multiple colors.
First:
- 3rd dimension with another scale in contrast to y-axis
- create two colors (this is done using col, see code)
Sketch simulating the purpose:
Code
Two "containers" of points plotted in this way:
plot(1:3, c(3,3,3))
points(1:3, c(2,2,2), col="blue")
Another nice plotting is done by:
#install.packages("hexbin")
library(hexbin)
x <- 1:1000#rnorm(1000)
y <- 1500:501#rnorm(1000)
bin<-hexbin(x, y, xbins=50)
plot(bin, main="Hexagonal Binning")
But I do not know how to use hexbin (I do not understand how it works). I need two colors, and I do not know how to generate them.
Questions
How do I create a third axis with a different scale from the y-axis?
Can I use hexbin to get the result?
For some reason, using points() does not work, but using plot() does work:
#Set margin on right side to be a bit larger
par(mar = c(5,4.5,4,5))
#Plot first set of data
plot(1:3, rep(3,3), ylim=c(-5,5), xlab="X-Axis", ylab="Y-Axis 1")
#Plot second set of data on different axis.
par(new=TRUE)
plot(1:3, rep(5,3), ylim=c(-10,10), col="blue", xlab="", ylab="", axes=FALSE)
#Add numbers and labels to the second y-axis
mtext("Y-Axis 2",side=4,line=3)
#axis() takes its range from the plot above; ylim is not an axis() argument
axis(4)

Line markers (pch) are not shown for big datasets using R plot command

I am able to plot the data and everything seems to work. The only problem is that R seems to decide whether line markers are drawn or not. I have several different datasets; for the dataset with 1500 points the plot works fine and I can see the markers. For every other dataset, all of them with 3000+ points, the plot ignores all markers and only the line can be seen.
Below you can see the code used to plot the data, and example plot figures.
My question is, how can I assure that R will plot the lines with markers? Am I doing something wrong?
Thanks for your time and help.
png(filename="figures/all.normdtime.png", width=800, height=600)
plot(ecdf(data1[,10]), col="blue", ann=FALSE, pch=c(1,NA,NA,NA,NA,NA,NA,NA,NA), cex=2)
lines(ecdf(data2[,10]), col="green", pch=c(3,NA,NA,NA,NA,NA,NA,NA,NA), cex=2)
lines(ecdf(data3[,10]), col="red", pch=c(8,NA,NA,NA,NA,NA,NA,NA,NA), cex=2)
lines(ecdf(data4[,10]), col="orange", pch=c(2,NA,NA,NA,NA,NA,NA,NA,NA), cex=2)
title(xlab="Transfer rate (bytes/ms)")
title(main="ECDF Normalized Download Time")
dev.off()
No markers, 21100 points plotted
With markers, 1400 points plotted
I would try something like this:
data1 <- dnorm(seq(-5,5,.001))
x <- ecdf(data1)
plot(ecdf(data1), col="blue", ann=FALSE, pch=c(1,rep(NA,10000)), cex=2)
points(x=knots(x)[seq(1,length(knots(x)),5)], y=ecdf(data1)(knots(x)[seq(1,length(knots(x)),5)]), col="red",pch=3)
title(xlab="Transfer rate (bytes/ms)")
title(main="ECDF Normalized Download Time")
The original ECDF is not visible since we plotted approx. 1500 points.
If you want fewer, just change the value 5 inside the x and y arguments of points to a bigger number, e.g. 100. Then we have ~70 points plotted:
I don't have your data available but I think this should work for you:
ecdf1 <- ecdf(data1[,10])
ecdf2 <- ecdf(data2[,10])
ecdf3 <- ecdf(data3[,10])
ecdf4 <- ecdf(data4[,10])
knots1 <- knots(ecdf1)
knots2 <- knots(ecdf2)
knots3 <- knots(ecdf3)
knots4 <- knots(ecdf4)
n <- 10 # every 10th point
png(filename="figures/all.normdtime.png", width=800, height=600)
plot(ecdf1, col="blue", ann=FALSE)
points(x=knots1[seq(1,length(knots1),n)], y=ecdf1(knots1[seq(1,length(knots1),n)]), col="blue",pch=1)
lines(ecdf2, col="green")
points(x=knots2[seq(1,length(knots2),n)], y=ecdf2(knots2[seq(1,length(knots2),n)]), col="green",pch=3)
lines(ecdf3, col="red")
points(x=knots3[seq(1,length(knots3),n)], y=ecdf3(knots3[seq(1,length(knots3),n)]), col="red",pch=8)
lines(ecdf4, col="orange")
points(x=knots4[seq(1,length(knots4),n)], y=ecdf4(knots4[seq(1,length(knots4),n)]), col="orange",pch=2)
title(xlab="Transfer rate (bytes/ms)")
title(main="ECDF Normalized Download Time")
dev.off()
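The repeated seq()/knots() expressions above could be wrapped in a small helper. The following is only a sketch: add_ecdf_markers and its every argument are hypothetical names introduced here, and the data are simulated because the original datasets are not available.

```r
# Hypothetical helper: draw a marker at every `every`-th knot of an ECDF
add_ecdf_markers <- function(Fn, every = 10, ...) {
  k   <- knots(Fn)                       # sorted unique data values
  idx <- seq(1, length(k), by = every)   # thin the knots
  points(k[idx], Fn(k[idx]), ...)        # markers sit on the ECDF itself
  invisible(idx)
}

set.seed(1)
d  <- rnorm(3000)                        # simulated stand-in for data1[,10]
Fn <- ecdf(d)

# Draw the step function without its own points, then add thinned markers
plot(Fn, col = "blue", do.points = FALSE, verticals = TRUE, ann = FALSE)
idx <- add_ecdf_markers(Fn, every = 100, col = "blue", pch = 1, cex = 2)
```

The same call can follow each lines(ecdfN, ...) in the code above, changing only col and pch.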

R: Creating graphs with two y-axes

I'm looking to display two graphs on the same plot in R where the two graphs have vastly different scales i.e. the one goes from -0.001 to 0.0001 and the other goes from 0.05 to 0.2.
I've found this link http://www.statmethods.net/advgraphs/axes.html
which indicates how to display two y axes on the same plot, but I'm having trouble.
My code reads as follows:
plot(rateOfChangeMS[,1],type="l",ylim=c(-0.01,.2),axes = F)
lines(ratios[,1])
x = seq(-0.001,0.0001,0.0001)
x2 = seq(0.05,0.2,0.01)
axis(2,x)
axis(4,x2)
The problem I'm having is that, although R shows both axes, they are not next to each other as I would like, with the resulting graph attached. The left axis is measuring the graph with the small range, while the right is measuring the graph from 0.05 to 0.2. The second graph is, in fact, on the plot, but the scaling is so small that you can't see it.
Not sure if there is some etiquette rule I'm violating, never uploaded an image before so not quite sure how best to do it.
Any help would be greatly appreciated!
Thanks
Mike
Since you don't provide a reproducible example, or a representative dataset, this is a partial answer.
set.seed(1)
df <- data.frame(x=1:100,
y1=-0.001+0.002/(1:100)+rnorm(100,0,5e-5),
y2=0.05+0.0015*(0:99)+rnorm(100,0,1e-2))
ticks.1 <- seq(-0.001,0.001,0.0001)
ticks.2 <- seq(0.05,0.2,0.01)
plot(df$x, df$y1, type="l", yaxt="n", xlab="X", ylab="", col="blue")
axis(2, at=ticks.1, col.ticks="blue", col.axis="blue")
par(new=TRUE)
plot(df$x, df$y2, type="l", yaxt="n", xlab="", ylab="", col="red")
axis(4, at=ticks.2, col.ticks="red", col.axis="red")
The reason your left axis is compressed is that both axes are on the same scale. You can get around that by basically superimposing two completely different plots (which is what having two axes does, after all). Incidentally, dual axes like this are not a good way to visualize data. They create a grossly misleading visual impression.
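Given the caveat about dual axes, one possible alternative is to stack the two series in separate panels that share the x-axis, so each keeps its own honest y-scale. This is a sketch using the same simulated data frame as above; the panel layout and margins are arbitrary choices.

```r
set.seed(1)
df <- data.frame(x  = 1:100,
                 y1 = -0.001 + 0.002 / (1:100) + rnorm(100, 0, 5e-5),
                 y2 = 0.05 + 0.0015 * (0:99) + rnorm(100, 0, 1e-2))

# Two rows, one column; save the old settings so they can be restored
op <- par(mfrow = c(2, 1), mar = c(4, 4, 1, 1))
plot(df$x, df$y1, type = "l", col = "blue", xlab = "",  ylab = "y1")
plot(df$x, df$y2, type = "l", col = "red",  xlab = "X", ylab = "y2")
par(op)                                   # back to a single panel
```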

Keep R from graphing a line outside of a chart's area

How do I prevent the red line for the last distribution from being plotted outside of the area of the plot in the graph below?
chart http://i.minus.com/jRiGxDBVw6kjZ.jpeg
I generated the graph with the following code:
x <- seq(0,4,.1)
alpha_0 <- 2
beta_0 <- .2
hist(rexp(256, rate=1))
sample <- rexp(256, rate=1)
plot(x,dgamma(x, shape=alpha_0, rate=beta_0),type='l',col='black',ylim=c(0,2),main="Posteriors of Exponential Distribution", ylab='')
lines(x,dgamma(x, shape=alpha_0+4, rate=beta_0+sum(sample[1:4])),col='blue')
lines(x,dgamma(x, shape=alpha_0+8, rate=beta_0+sum(sample[1:8])),col='green')
lines(x,dgamma(x, shape=alpha_0+16, rate=beta_0+sum(sample[1:16])),col='orange')
lines(x,dgamma(x, shape=alpha_0+256, rate=beta_0+sum(sample[1:256])),col='red')
legend(x=2.5,y=2, c("prior","n=4", "n=8", "n=16", 'n=256'), col = c('black', 'blue', 'green','orange' ,'red'),lty=c(1,1,1,1))
Sorry, seems like a pretty simple fix, I just couldn't figure it out from the documentation. Thanks for your help.
Yes, as Joran mentioned, it was graphing the line outside of the plot area because I ran par(xpd=TRUE) earlier in the session to try to put the legend outside. I simply ran par(xpd=FALSE) and it solved the problem.
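To avoid leaving xpd switched on for the rest of the session, legend() accepts its own xpd argument, which applies only while the legend is drawn. The sketch below assumes a widened right margin and a legend position just past the x-range; both values are illustrative, and obs stands in for the question's sample.

```r
set.seed(1)
x   <- seq(0, 4, 0.1)
obs <- rexp(256, rate = 1)            # simulated data, as in the question

op <- par(mar = c(5, 4, 4, 8))        # widen the right margin for the legend
plot(x, dgamma(x, shape = 2, rate = 0.2), type = "l", ylim = c(0, 2),
     ylab = "", main = "Posteriors of Exponential Distribution")
# This curve exceeds ylim but is clipped, because par("xpd") is still FALSE
lines(x, dgamma(x, shape = 2 + 256, rate = 0.2 + sum(obs)), col = "red")
# xpd = TRUE here affects the legend only
legend(x = 4.3, y = 2, legend = c("prior", "n=256"),
       col = c("black", "red"), lty = 1, xpd = TRUE)
xpd_after <- par("xpd")               # unchanged by the legend() call
par(op)                               # restore the original margins
```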

Histogram with Logarithmic Scale and custom breaks

I'm trying to generate a histogram in R with a logarithmic scale for y. Currently I do:
hist(mydata$V3, breaks=c(0,1,2,3,4,5,25))
This gives me a histogram, but the density between 0 to 1 is so great (about a million values difference) that you can barely make out any of the other bars.
Then I've tried doing:
mydata_hist <- hist(mydata$V3, breaks=c(0,1,2,3,4,5,25), plot=FALSE)
plot(mydata_hist$counts, log="xy", pch=20, col="blue")
It gives me sorta what I want, but the bottom shows me the values 1-6 rather than 0, 1, 2, 3, 4, 5, 25. It's also showing the data as points rather than bars. barplot works but then I don't get any bottom axis.
A histogram is a poor-man's density estimate. Note that hist() plots frequencies by default only when the breaks are equidistant; with unequal breaks such as yours it plots densities -- add ,prob=TRUE (or ,freq=TRUE) to the call to choose explicitly.
As for the log axis problem, don't use 'x' if you do not want the x-axis transformed:
plot(mydata_hist$counts, log="y", type='h', lwd=10, lend=2)
gets you bars on a log-y scale -- the look-and-feel is still a little different but can probably be tweaked.
Lastly, you can also do hist(log(x), ...) to get a histogram of the log of your data.
Another option would be to use the package ggplot2.
library(ggplot2)
ggplot(mydata, aes(x = V3)) + geom_histogram() + scale_x_log10()
It's not entirely clear from your question whether you want a logged x-axis or a logged y-axis. A logged y-axis is not a good idea when using bars because they are anchored at zero, which becomes negative infinity when logged. You can work around this problem by using a frequency polygon or density plot.
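A frequency polygon on a log y-axis can also be sketched in base graphics, which sidesteps the zero-anchored bars altogether. The data below are simulated and the bin count is arbitrary; empty bins are dropped because zero cannot be shown on a log scale.

```r
set.seed(1)
x <- rlnorm(10000)                        # simulated, heavily skewed data
h <- hist(x, breaks = 50, plot = FALSE)   # bin without drawing

keep <- h$counts > 0                      # log scale cannot show zero counts
# Points at the bin midpoints, joined by lines: a frequency polygon
plot(h$mids[keep], h$counts[keep], type = "b", log = "y", pch = 20,
     xlab = "x", ylab = "Frequency (log scale)")
```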
Dirk's answer is a great one. If you want an appearance like what hist produces, you can also try this:
buckets <- c(0,1,2,3,4,5,25)
mydata_hist <- hist(mydata$V3, breaks=buckets, plot=FALSE)
# barplot() needs one name per bar, so label each bar with its bin's upper break
bp <- barplot(mydata_hist$counts, log="y", col="white", names.arg=buckets[-1])
text(bp, mydata_hist$counts, labels=mydata_hist$counts, pos=1)
The last line is optional, it adds value labels just under the top of each bar. This can be useful for log scale graphs, but can also be omitted.
I also pass main, xlab, and ylab parameters to provide a plot title, x-axis label, and y-axis label.
Run the hist() function without making a graph, log-transform the counts, and then draw the figure.
hist.data = hist(my.data, plot=FALSE)
hist.data$counts = log(hist.data$counts, 2)
plot(hist.data)
It should look just like the regular histogram, but the y-axis will be log2 Frequency.
I've put together a function that behaves identically to hist in the default case, but accepts the log argument. It uses several tricks from other posters, but adds a few of its own. hist(x) and myhist(x) look identical.
The original problem would be solved with:
myhist(mydata$V3, breaks=c(0,1,2,3,4,5,25), log="xy")
The function:
myhist <- function(x, ..., breaks="Sturges",
                   main = paste("Histogram of", xname),
                   xlab = xname,
                   ylab = "Frequency") {
  xname = paste(deparse(substitute(x), 500), collapse="\n")
  h = hist(x, breaks=breaks, plot=FALSE)
  plot(h$breaks, c(NA,h$counts), type='S', main=main,
       xlab=xlab, ylab=ylab, axes=FALSE, ...)
  axis(1)
  axis(2)
  lines(h$breaks, c(h$counts,NA), type='s')
  lines(h$breaks, c(NA,h$counts), type='h')
  lines(h$breaks, c(h$counts,NA), type='h')
  lines(h$breaks, rep(0,length(h$breaks)), type='S')
  invisible(h)
}
Exercise for the reader: Unfortunately, not everything that works with hist works with myhist as it stands. That should be fixable with a bit more effort, though.
Here's a pretty ggplot2 solution:
library(ggplot2)
library(scales) # makes pretty labels on the x-axis
breaks <- c(0,1,2,3,4,5,25)
ggplot(mydata, aes(x = V3)) +
  geom_histogram(breaks = log10(breaks)) +
  scale_x_log10(
    breaks = breaks,
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  )
Note that to set the breaks in geom_histogram, they had to be transformed to work with scale_x_log10.

Resources