Plot multiple barplots on one plot, but bars don't overlap (R)

If I wanted to do this with the bars of the different calls to barplot(add=T) overlapping, that's fine and dandy. But say I want them to be plotted on the same plot, but with the first call having a ylim from 0:1, then the second call from 1:2, etc. I tried:
for(i in 1:length(files)) {
file <- as.matrix(read.table(files[i], header=F, sep=" ") )
if(i==1) barplot(file, beside=T, col=1:i, border=NA, ylim = c(0,length(files)))
if(i>1) barplot(file, beside=T, col=1:i, border=NA, ylim = c(i-1,i) ,xpd=T, add=T)
}
but that overlays them. How can I do it so that they're on the same image but not overlapping, if that makes sense? I envisage something like this: http://img585.imageshack.us/img585/5439/romak13.png

If you're doing something like this, I'd recommend using ggplot2, as it's much easier.
Here's some sample code:
library(ggplot2)
data(diamonds)
ggplot(diamonds,aes(x=carat,y=price,fill=color))+
geom_col()+ # same as geom_histogram(stat='identity'), without the warning
facet_grid(cut~.,scales='free')+labs(title="Graph Title") # labs() needs a named argument
The output looks like this:
The interpretation of this particular graph is a bit strange, considering the nature of the data set, but if you follow the same format, you should be able to get a decent-looking graph. If anyone has any better data examples, let me know.
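For readers who want to stay in base graphics, the banded layout described in the question can be sketched with the offset argument of barplot(), which shifts the base of the bars. This is a minimal sketch and not the answerer's method: the matrices in mats are simulated stand-ins for the asker's files, and it assumes all values lie between 0 and 1 so that each data set fits inside its own band.

```r
set.seed(1)
# Hypothetical stand-ins for the matrices read from `files`; values in [0, 1]
mats <- lapply(1:3, function(i) matrix(runif(8), nrow = 2))
n <- length(mats)

for (i in seq_len(n)) {
  # offset = i - 1 lifts the bars of data set i into the band (i - 1, i)
  mids <- barplot(mats[[i]], beside = TRUE, col = i, border = NA,
                  ylim = c(0, n), offset = i - 1, add = (i > 1),
                  axes = (i == 1))
}
abline(h = 1:(n - 1), lty = 3)  # dotted separators between the bands
```

If the values are not already in [0, 1], they would need rescaling first (for example dividing each matrix by its maximum), at the cost of losing a common y scale.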

Related

How to create histogram plot in ggplot2 without data frame?

I am plotting two histograms in R by using the following code.
x1<-rnorm(100)
x2<-rnorm(50)
h1<-hist(x1)
h2<-hist(x2)
plot(h1, col=rgb(0,0,1,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE)
plot(h2, col=rgb(1,0,0,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE,add=TRUE)
legend("topright", c("H1", "H2"), fill=c(rgb(0,0,1,.25),rgb(1,0,0,.25)))
The code produces the following output.
I need a visually good looking (or stylistic) version of the above plot. I want to use ggplot2. I am looking for something like this (see Change fill colors section). However, I think, ggplot2 only works with data frames. I do not have data frames in this case. Hence, how can I create good looking histogram plot in ggplot2? Please let me know. Thanks in advance.
You can (and should) put your data into a data.frame if you want to use ggplot. Ideally for ggplot, the data.frame should be in long format. Here's a simple example:
df1 = rbind(data.frame(grp='x1', x=x1), data.frame(grp='x2', x=x2))
ggplot(df1, aes(x, fill=grp)) +
geom_histogram(color='black', alpha=0.5)
There are lots of options to change the appearance how you like. If you want to have the histograms stacked or grouped, or shown as percent versus count, or as densities etc., you will find many resources in previous questions showing how to implement each of those options.
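As one illustration of those options, the two histograms can be overlaid rather than stacked and drawn on a density scale, which is closer to the base-graphics plot in the question. This is a sketch under a few assumptions: ggplot2 3.3.0 or later (for after_stat()), and bins = 30 and the colours are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(50)
# Long format: one row per observation, with a column naming the group
df1 <- rbind(data.frame(grp = "x1", x = x1), data.frame(grp = "x2", x = x2))

p <- ggplot(df1, aes(x, fill = grp)) +
  # position = "identity" overlays the groups instead of stacking them;
  # after_stat(density) rescales each group so its bars integrate to 1
  geom_histogram(aes(y = after_stat(density)), position = "identity",
                 bins = 30, color = "black", alpha = 0.5) +
  scale_fill_manual(values = c(x1 = "blue", x2 = "red")) +
  labs(x = "Index", y = "Density", fill = NULL)
print(p)
```

The density scale matters here because the two samples have different sizes (100 and 50), so raw counts would make the smaller group look artificially flat.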

Contour plot via Scatter plot

Scatter plots are useless when the number of points is large.
So, e.g., using normal approximation, we can get the contour plot.
My question: Is there any package to produce a contour plot from a scatter plot?
Thank you @G5W!! I can do it!!
You don't offer any data, so I will respond with some artificial data,
constructed at the bottom of the post. You also don't say how much data
you have although you say it is a large number of points. I am illustrating
with 20000 points.
You used the group number as the plotting character to indicate the group.
I find that hard to read. But just plotting the points doesn't show the
groups well. Coloring each group a different color is a start, but does
not look very good.
plot(x,y, pch=20, col=rainbow(3)[group])
Two tricks that can make a lot of points more understandable are:
1. Make the points transparent. The dense places will appear darker. AND
2. Reduce the point size.
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
That looks somewhat better, but did not address your actual request.
Your sample picture seems to show confidence ellipses. You can get
those using the function dataEllipse from the car package.
library(car)
plot(x,y, pch=20, col=rainbow(3, alpha=0.1)[group], cex=0.8)
dataEllipse(x,y,factor(group), levels=c(0.70,0.85,0.95),
plot.points=FALSE, col=rainbow(3), group.labels=NA, center.pch=FALSE)
But if there are really a lot of points, the points can still overlap
so much that they are just confusing. You can also use dataEllipse
to create what is basically a 2D density plot without showing the points
at all. Just plot several ellipses of different sizes over each other filling
them with transparent colors. The center of the distribution will appear darker.
This can give an idea of the distribution for a very large number of points.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
You can get a more continuous look by plotting more ellipses and leaving out the border lines.
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=seq(0.11,0.99,0.02),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.05, lty=0)
Please try different combinations of these to get a nice picture of your data.
Additional response to comment: Adding labels
Perhaps the most natural place to add group labels is the centers of the
ellipses. You can get that by simply computing the centroids of the points in each group. So for example,
plot(x,y,pch=NA)
dataEllipse(x,y,factor(group), levels=c(seq(0.15,0.95,0.2), 0.995),
plot.points=FALSE, col=rainbow(3), group.labels=NA,
center.pch=FALSE, fill=TRUE, fill.alpha=0.15, lty=1, lwd=1)
## Now add labels
for(i in unique(group)) {
text(mean(x[group==i]), mean(y[group==i]), labels=i)
}
Note that I just used the number as the group label, but if you have a more elaborate name, you can change labels=i to something like
labels=GroupNames[i].
Data
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
You can use hexbin::hexbin() to show very large datasets.
@G5W gave a nice dataset:
x = c(rnorm(2000,0,1), rnorm(7000,1,1), rnorm(11000,5,1))
twist = c(rep(0,2000),rep(-0.5,7000), rep(0.4,11000))
y = c(rnorm(2000,0,1), rnorm(7000,5,1), rnorm(11000,6,1)) + twist*x
group = c(rep(1,2000), rep(2,7000), rep(3,11000))
If you don't know the group information, then the ellipses are inappropriate; this is what I'd suggest:
library(hexbin)
plot(hexbin(x,y))
which produces
If you really want contours, you'll need a density estimate to plot. The MASS::kde2d() function can produce one; see the examples in its help page for plotting a contour based on the result. This is what it gives for this dataset:
library(MASS)
contour(kde2d(x,y))
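A slightly fuller version of the kde2d() approach, with the contours drawn over faint points, might look like the sketch below. The grid size n = 50, the seed, and the colours are arbitrary choices; the data are generated the same way as above.

```r
library(MASS)  # for kde2d()

set.seed(1)
x <- c(rnorm(2000, 0, 1), rnorm(7000, 1, 1), rnorm(11000, 5, 1))
twist <- c(rep(0, 2000), rep(-0.5, 7000), rep(0.4, 11000))
y <- c(rnorm(2000, 0, 1), rnorm(7000, 5, 1), rnorm(11000, 6, 1)) + twist * x

# Kernel density estimate on a 50 x 50 grid (default bandwidths)
dens <- kde2d(x, y, n = 50)

# Faint points for context, then the density contours on top
plot(x, y, pch = 20, cex = 0.5, col = rgb(0, 0, 0, 0.05))
contour(dens, add = TRUE, col = "red", lwd = 2)
```

Unlike the ellipses, this does not use the group labels at all, so it also works when the groups are unknown.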

R plot overlay barplot with plot type "p" (confused with factors)

I'm new to R. Previously, I've been able to overlay 2 separate plots that were of the same kind, p1 and p2, using plot (p1); plot (p2, add=T).
I'm struggling with the definition of factors when overlaying a barplot with a point plot showing all individual points.
I can individually plot the barplot as I want it. The point plot looks like I want it, but I realize I'm using an incorrect definition of phase as numerical to force R plot to display each value, rather than default to a boxplot (like when I use plot(my.df$cond, my.df$val)).
Any tips on defining my variable types correctly or whether I'm using the correct barplot and plot functions, would be greatly appreciated. Thank you so much.
shpad <- c(1,2,5,6,1,2,5,6,1,2,5,6,1,2,5,6)
my.df <- data.frame(val=c(0.0738,0.0518,0.002,0.0397,0.1452,0.1152,0.1774,0.0658,0.0218,0.0497,-0.0296,0.0653,0.0848,0.1296,0.1416,0.0923),
phase=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
sub=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
cond=c("NsNm", "NsNm", "NsNm", "NsNm", "NsLm", "NsLm", "NsLm", "NsLm", "LsNm", "LsNm", "LsNm", "LsNm", "LsLm", "LsLm", "LsLm", "LsLm"))
avg <-tapply(my.df$val, my.df$phase, mean)
barplot(avg, border=NA, names.arg=c("NsNm", "NsLm", "LsNm", "LsLm"),col=c("blue","darkblue","red", "darkred"),ylab = "score",ylim=c(-0.03,0.25))
plot(my.df$phase, my.df$val, type="p", ylim=c(-0.03,0.25), ylab = "score", pch=shpad)
tl;dr: problem is that if instead of the last line, I have plot(my.df$phase, my.df$val, type="p", ylim=c(-0.03,0.25), ylab = "score", pch=shpad, add=T), the formats are incongruent.
Alright, so, I've tried for a bit to accomplish what you wanted, but the best I could do with the base plotting system is this:
Which is accomplished purely by your lines of code above except for the last line, which I replaced with
points(my.df$phase,my.df$val,type="p",pch=shpad)
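One reason the points do not line up with the bars in base graphics is that barplot() does not place the bars at x = 1, 2, 3, 4. It returns the bar midpoints, and those can be used as the x positions for the points. A minimal sketch using the data from the question (only the columns needed here are rebuilt):

```r
shpad <- rep(c(1, 2, 5, 6), times = 4)
my.df <- data.frame(
  val = c(0.0738, 0.0518, 0.002, 0.0397, 0.1452, 0.1152, 0.1774, 0.0658,
          0.0218, 0.0497, -0.0296, 0.0653, 0.0848, 0.1296, 0.1416, 0.0923),
  phase = rep(1:4, each = 4)
)
avg <- tapply(my.df$val, my.df$phase, mean)

# barplot() returns the x coordinate of the middle of each bar
mids <- barplot(avg, border = NA, names.arg = c("NsNm", "NsLm", "LsNm", "LsLm"),
                col = c("blue", "darkblue", "red", "darkred"),
                ylab = "score", ylim = c(-0.03, 0.25))

# Index the midpoints by phase so each point sits over its own bar
points(mids[my.df$phase], my.df$val, pch = shpad)
```

This keeps phase numeric, which sidesteps the factor question entirely: the numeric phase is only used as an index into the midpoints.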
However, I think you can do much better, if you want to keep the same kind of plot, using the ggplot2 library. Using this code:
library('ggplot2')
# one row per phase: the mean, the phase as a factor, and the condition label
new.df <- data.frame(avg=as.vector(avg),phase=factor(names(avg)),cond=c("NsNm","NsLm","LsNm","LsLm"))
ggplot(new.df) +
geom_bar(stat="identity",aes(x=phase,y=avg,fill=cond))+
geom_point(data=my.df,aes(x=factor(phase),y=val,shape=factor(shpad))) +
scale_x_discrete(name="Type",labels=c("NsNm","NsLm","LsNm","LsLm")) +
ylab("Score")
you can make this chart:
I didn't adjust the coloring and the point types and the legend titles (not sure how important they are, but those can be fiddled with). However, you can see this probably produces the result you were aiming for.

Exporting graphs in R

I have two graphs that I plotted in R and I want to export them as a high-resolution picture for publication.
For example:
a<-c(1,2,3,4,5,6,7)
b<-c(2,3,4,6,7,8,9)
par(mfrow=c(2,1))
plot (a,b)
plot(a,b)
I usually export this graph by:
dev.copy(jpeg,'test.jpeg',width=80,height=150,units="mm",res=200)
dev.off()
However, I always find this process a bit troublesome. The graph that was plotted in R does not necessarily look like the one that I exported. Therefore, I am wondering if there is a way to specify the dimensions and resolution of graphs before I plot them so that I can visually inspect the graphs before I export them?
Thank you
You can try:
png('out.png')
a<-c(1,2,3,4,5,6,7)
b<-c(2,3,4,6,7,8,9)
par(mfrow=c(2,1))
plot (a,b)
plot(a,b)
dev.off()
As baptiste said, jpeg is the worst format you can choose. You should take a look at the help for the bmp and png functions (with ?bmp and ?png). Both bmp and png have height, width, and res arguments that you can use to specify the dimensions and resolution of the output. Also, I wouldn't recommend the use of dev.copy. As you could see, the result of the output is not always what you expect.
To add to Bonifacio2's answer: if you call the device function first, you can also define your margins, window size, etc. before doing any actual plotting. That way you have full control over all figure specs.
pdf(file='test.pdf',width=80/25.4,height=150/25.4) # pdf() takes inches and has no units argument; I prefer pdf because the files are editable
a<-c(1,2,3,4,5,6,7)
b<-c(2,3,4,6,7,8,9)
par(mfrow=c(2,1))
plot (a,b)
plot(a,b)
dev.off()
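Putting the two answers together, one pattern is to wrap the plotting commands in a function and call it once per device, so the on-screen preview and the exported file are drawn by the same code at the same size. This is only a sketch: draw_panels is a hypothetical helper name, the 300 dpi resolution is an arbitrary choice, and whether dev.new() honours width and height depends on the screen device in use.

```r
a <- c(1, 2, 3, 4, 5, 6, 7)
b <- c(2, 3, 4, 6, 7, 8, 9)

# Hypothetical helper: all plotting code lives in one place
draw_panels <- function() {
  par(mfrow = c(2, 1))
  plot(a, b)
  plot(a, b)
}

w_mm <- 80
h_mm <- 150

# Preview in a screen window of the same size (dev.new() takes inches)
if (interactive()) {
  dev.new(width = w_mm / 25.4, height = h_mm / 25.4)
  draw_panels()
}

# Export with the same dimensions; png() accepts units = "mm" together with res
out <- file.path(tempdir(), "test.png")
png(out, width = w_mm, height = h_mm, units = "mm", res = 300)
draw_panels()
dev.off()
```

Because the file is drawn directly on the png device rather than copied from the screen, text and margins are laid out for the final size, which is the usual source of the mismatch seen with dev.copy.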
You can use cowplot package to combine multiple panels in several different ways. For example, in your case, we export one plot with two panels arranged in two rows and one column. I assume that you prefer to use base-R 'plot' function instead of ggplot.
library(cowplot)
p1 <- ~{
plot(a,b)
}
p2 <- ~{
plot(b,a)
}
png("plot.png",
width = 3.149606, # 80 mm
height = 5.905512, # 150 mm
units = 'in',
res = 500)
plot_grid(p1, p2, labels = "AUTO", nrow = 2, ncol = 1)
dev.off()
Note that you can either remove the labels if not needed or print small letters by using "auto". Regarding size of the text, axis-labels etc, use the standard arguments for generic plot function of base-R. I hope this answer helps you. Best wishes.

How can I plot a histogram of a long-tailed data using R?

I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for the histogram.
Yes, I know this means not all bins are of equal size.
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives
neither of which is what I want.
update
Following the answers here, I now produce something that is almost exactly what I want (I went with a continuous plot instead of a bar histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the actual bars plotted. There are two options for doing that: one is to simply use the actual margins of the plotted bars (how?) and then get "ugly" x-axis labels like 1.1754, 1.2985, etc. The other, which I prefer, is to control the actual bin margins used so they match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
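If the bin edges should coincide with the tick labels, as the question's update asks, one option in base graphics is to choose the breaks on the log scale yourself and reuse them for the axis. A minimal sketch, assuming simulated log-normal data and bin edges at every quarter decade (both arbitrary choices):

```r
set.seed(1)
x <- rlnorm(1000, sdlog = 3)

# Bin edges equally spaced in log10 units, wide enough to cover the data
lo <- floor(log10(min(x)))
hi <- ceiling(log10(max(x)))
log_breaks <- seq(lo, hi, by = 0.25)

# Histogram of the log-transformed data, using exactly those edges
h <- hist(log10(x), breaks = log_breaks, axes = FALSE,
          main = "Histogram of x", xlab = "x (log scale)")
axis(side = 2)

# Label whole decades in the original units; every tick falls on a bin edge
decades <- seq(lo, hi, by = 1)
axis(side = 1, at = decades, labels = 10^decades)
```

Since the edges are equally spaced on the log scale, the bars have equal width on the plot even though the bins are unequal in the original units.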
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help with this plot. Use the manipulate package from RStudio to make a histogram with a dynamically selected range:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:

Resources