Logarithmic scale plot in R - r

I want to plot the clustering coefficient and the average shortest-
path as a function of the parameter p of the Watts-Strogatz model as following:
And this is my code:
library(igraph)
library(ggplot2)
library(reshape2)
library(pracma)
p <- #don't know how to generate this?
trans <- -1
path <- -1
for (i in p) {
ws_graph <- watts.strogatz.game(1, 1000, 4, i)
trans <-c(trans, transitivity(ws_graph, type = "undirected", vids = NULL,
weights = NULL))
path <- c(path,average.path.length(ws_graph))
}
#Remove auxiliar values
trans <- trans[-1]
path <- path[-1]
#Normalize them
trans <- trans/trans[1]
path <- path/path[1]
x = data.frame(v1 = p, v2 = path, v3 = trans)
plot(p,trans, ylim = c(0,1), ylab='coeff')
par(new=T)
plot(p,path, ylim = c(0,1), ylab='coeff',pch=15)
How should I proceed to make this x-axis?

You can generate the values of p using code like the following:
p <- 10^(seq(-4,0,0.2))
You want your x values to be evenly spaced on a log10 scale. This means you need to take evenly spaced values as the exponent for the base 10, because the log10 scale takes the log10 of your x values, which is the exact opposite operation.
With this, you are already pretty far. You don't need par(new=TRUE), you can simply use the function plot followed by the function points. The latter does not redraw the whole plot. Use the argument log = 'x' to tell R you need a logarithmic x axis. This only needs to be set in the plot function, the points function and all other low-level plot functions (those who do not replace but add to the plot) respect this setting:
plot(p,trans, ylim = c(0,1), ylab='coeff', log='x')
points(p,path, ylim = c(0,1), ylab='coeff',pch=15)
EDIT: If you want to replicate the log-axis look of the above plot, you have to calculate them yourselves. Search the internet for 'R log10 minor ticks' or similar. Below is a simple function which can calcluate the appropriate position for log axis major and minor ticks
log10Tck <- function(side, type){
lim <- switch(side,
x = par('usr')[1:2],
y = par('usr')[3:4],
stop("side argument must be 'x' or 'y'"))
at <- floor(lim[1]) : ceil(lim[2])
return(switch(type,
minor = outer(1:9, 10^(min(at):max(at))),
major = 10^at,
stop("type argument must be 'major' or 'minor'")
))
}
After you have defined this function, by using the above code, you can call the function inside the axis(...) function, which draws axes. As a suggestion: save the function away in its own R script and import that script at the top of your calculation using the function source. By this means, you can reuse the function in future projects. Prior to drawing the axes, you have to prevent plot from drawing default axes, so add the parameter axes = FALSE to your plot call:
plot(p,trans, ylim = c(0,1), ylab='coeff', log='x', axes=F)
Then you may generate the axes, using the tick positions generated by the
new function:
axis(1, at=log10Tck('x','major'), tcl= 0.2) # bottom
axis(3, at=log10Tck('x','major'), tcl= 0.2, labels=NA) # top
axis(1, at=log10Tck('x','minor'), tcl= 0.1, labels=NA) # bottom
axis(3, at=log10Tck('x','minor'), tcl= 0.1, labels=NA) # top
axis(2) # normal y axis
axis(4) # normal y axis on right side of plot
box()
As a third option, as you are importing ggplot2 in your original post: The same, without all of the above, with ggplot:
# Your data needs to be in the so-called 'long format' or 'tidy format'
# that ggplot can make sense of it. Google 'Wickham tidy data' or similar
# You may also use the function 'gather' of the package 'tidyr' for this
# task, which I find more simple to use.
d2 <- reshape2::melt(x, id.vars = c('v1'), measure.vars = c('v2','v3'))
ggplot(d2) +
aes(x = v1, y = value, color = variable) +
geom_point() +
scale_x_log10()

Related

Rotate x-axis labels at a given degree for boxplot in R

I generate a boxplot with code below:
boxplot(top10threads$affect ~ top10threads$ThreadID[], data = top10threads, xlab = "10 biggest Threads", ylab = "Affect", col=(c("gold","darkgreen")), srt=45)
But as you may notice that some labels in x-axis are missing, so I want to rotate them into 45 degrees. I added srt=45, but it doesn't work.
By setting las=2 can rotate them vertically, but it's not exactly I need.
How could I do that? Thanks.
First, store the output of boxplot() as a object. It contains names of the groups. You can use $names to get them. Then use text() to add labels on the axis. The argument srt works on text().
bp <- boxplot(y ~ x, data = df, col = c("gold", "darkgreen"), xaxt = "n")
tick <- seq_along(bp$names)
axis(1, at = tick, labels = FALSE)
text(tick, par("usr")[3] - 0.3, bp$names, srt = 45, xpd = TRUE)
Data
df <- data.frame(x = sample(100:110, 100, TRUE), y = rnorm(100))
Some test data:
mydata=lapply(1:5,function(i) rnorm(100,mean=i))
names(mydata)=c("first","second","third","fourth","fifth")
First, plot the boxplot with no x-axis:
boxplot(mydata,xaxt="n",xlab="")
Then, we make a function to add textual x-axis labels:
x_axis_labels=function(labels,every_nth=1,...) {
axis(side=1,at=seq_along(labels),labels=F)
text(x=(seq_along(labels))[seq_len(every_nth)==1],
y=par("usr")[3]-0.075*(par("usr")[4]-par("usr")[3]),
labels=labels[seq_len(every_nth)==1],xpd=TRUE,...)
}
# axis() draws the axis with ticks at positions specified by at. Again, we don't plot the labels yet.
# text() plots the labels at positions given by x and y.
# We estimate the y-positions from the values of the y-axis (using par("usr")),
# and specify xpd=TRUE to indicate that we don't want to crop plotting to within the plot area
# Note that we select the [seq_len(every_nth)==1] elements of both the x positions and the labels,
# so we can easily skip labels if there would be too many to cram in otherwise.
# Finally, we leave a ... in the function so we can pass additional arguments to text()
Finally, we call the new function to plot the axis tick labels:
x_axis_labels(labels=names(mydata),every_nth=1,adj=1,srt=45)
Here we take advantage of the ... in the function to pass the rotation/justification parameters: adj=1 specifies to right-justify the text labels, and srt=45 indicates to rotate them by 45 degrees.

Histogram to decide whether two distributions have the same shape in R [duplicate]

I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
On additional thing, I commented on Dirk's question that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
Already beautiful answers are there, but I thought of adding this. Looks good to me.
(Copied random numbers from #Dirk). library(scales) is needed`
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel result from hist0 is prettier to look than hist
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
Here's the version like the ggplot2 one I gave only in base R. I copied some from #nullglob.
generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
#Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers but since I've just written a function (plotMultipleHistograms() in 'basicPlotteR' package) function to do this, I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the plotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).

r program grouping 3 histograms into one grouped histogram [duplicate]

I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
On additional thing, I commented on Dirk's question that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
Already beautiful answers are there, but I thought of adding this. Looks good to me.
(Copied random numbers from #Dirk). library(scales) is needed`
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel result from hist0 is prettier to look than hist
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
Here's the version like the ggplot2 one I gave only in base R. I copied some from #nullglob.
generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
#Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers but since I've just written a function (plotMultipleHistograms() in 'basicPlotteR' package) function to do this, I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the plotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).

R: Plotting a scatterplot over a filled.contour plot

I am very new to R and have made a filled.contour plot using interpolated data like the data found in Plotting contours on an irregular grid . Using some sample data from Plotting contours on an irregular grid , I made a filled.contour and simple scatterplot using the following codes
x <- datr$Lat
y <- datr$Lon
z <- datr$Rain
require(akima)
fld <- interp(x,y,z)
filled.contour(fld)
plot(x,y)
Is there a way to make the plot(x,y) and filled.contour(fld) on the same plot (overlaying)? I have tried the points(x,y), but this doesn't match the x and y axes. In Matlab, I believe I would do this with hold, but I am unsure how to do it on R.
Thanks!
You could use the arguments plot.title or plot.axes for that:
x <- 10*1:nrow(volcano)
y <- 10*1:ncol(volcano)
filled.contour(x, y, volcano, plot.title = {
points(x = 200, y = 200)
})
(via)
One way is to read the code for filled.contour, and do a
little hacking like so:
Make your figure:
filled.contour(fld)
Define these constants by copying them from the arguments list.
nlevels = 20
zlim = range(z, finite = TRUE)
las = 1
levels = pretty(zlim, nlevels)
xlim = range(x, finite = TRUE)
ylim = range(y, finite = TRUE)
xaxs = "i"
yaxs = "i"
asp = NA
Calculate these values by copying code from the function body
mar.orig <- (par.orig <- par(c("mar", "las", "mfrow")))$mar
w <- (3 + mar.orig[2L]) * par("csi") * 2.54
Set the layout by copying code from the function body
layout(matrix(c(2, 1), ncol = 2L), widths = c(1, lcm(w)))
Noteice that the figure is actually plotted after the color scale,
but we don't wnat to reverse the order of the layout because layout
actually sets the 'current' region as the last region because the
first call to plot.new will cause the current region to wrap around
to the first region. Hence, when you set the plot window and plot the points via:
plot.window(ylim=ylim,xlim=xlim)
points(x,y)
title(main='title',
sub='Sub-Title',
xlab='This is the x axis',
ylab='This is the y axis')
They overlay figure as desired.

Surface plot Q in R - compable to surf() in matlab

I want to plot a matrix of z values with x rows and y columns as a surface similar to this graph from MATLAB.
Surface plot:
Code to generate matrix:
# Parameters
shape<-1.849241
scale<-38.87986
x<-seq(from = -241.440, to = 241.440, by = 0.240)# 2013 length
y<-seq(from = -241.440, to = 241.440, by = 0.240)
matrix_fun<-matrix(data = 0, nrow = length(x), ncol = length(y))
# Generate two dimensional travel distance probability density function
for (i in 1:length(x)) {
for (j in 1:length(y)){
dxy<-sqrt(x[i]^2+y[j]^2)
prob<-1/(scale^(shape)*gamma(shape))*dxy^(shape-1)*exp(-(dxy/scale))
matrix_fun[i,j]<-prob
}}
# Rescale 2-d pdf to sum to 1
a<-sum(matrix_fun)
matrix_scale<-matrix_fun/a
I am able to generate surface plots using a couple methods (persp(), persp3d(), surface3d()) but the colors aren't displaying the z values (the probabilities held within the matrix). The z values only seem to display as heights not as differentiated colors as in the MATLAB figure.
Example of graph code and graphs:
library(rgl)
persp3d(x=x, y=y, z=matrix_scale, color=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)))
surface3d(x=x, y=y, z=matrix_scale, color=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)))
persp(x=x, y=y, z=matrix_scale, theta=30, phi=30, col=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)), border=NA)
Image of the last graph
Any other tips to recreate the image in R would be most appreciated (i.e. legend bar, axis tick marks, etc.)
So here's a ggplot solution which seems to come a little bit closer to the MATLAB plot
# Parameters
shape<-1.849241
scale<-38.87986
x<-seq(from = -241.440, to = 241.440, by = 2.40)
y<-seq(from = -241.440, to = 241.440, by = 2.40)
df <- expand.grid(x=x,y=y)
df$dxy <- with(df,sqrt(x^2+y^2))
df$prob <- dgamma(df$dxy,shape=shape,scale=scale)
df$prob <- df$prob/sum(df$prob)
library(ggplot2)
library(colorRamps) # for matlab.like(...)
library(scales) # for labels=scientific
ggplot(df, aes(x,y))+
geom_tile(aes(fill=prob))+
scale_fill_gradientn(colours=matlab.like(10), labels=scientific)
BTW: You can generate your data frame of probabilities much more efficiently using the built-in dgamma(...) function, rather than calculating it yourself.
In line with alexis_laz's comment, here is an example using filled.contour. You might want to increase your by to 2.40 since the finer granularity increases the time it takes to generate the plot by a lot but doesn't improve quality.
filled.contour(x = x, y = y, z = matrix_scale, color = terrain.colors)
# terrain.colors is in the base grDevices package
If you want something closer to your color scheme above, you can fiddle with the rainbow function:
filled.contour(x = x, y = y, z = matrix_scale,
color = (function(n, ...) rep(rev(rainbow(n/2, ...)[1:9]), each = 3)))
Finer granularity:
filled.contour(x = x, y = y, z = matrix_scale, nlevels = 150,
color = (function(n, ...)
rev(rep(rainbow(50, start = 0, end = 0.75, ...), each = 3))[5:150]))

Resources