Combining 2 datasets in a single plot in R - r

I have two columns of data, f.delta and g.delta that I would like to produce a scatter plot of in R.
Here is how I am doing it.
plot(f.delta~x, pch=20, col="blue")
points(g.delta~x, pch=20, col="red")
The problem is this: the values of f.delta vary from 0 to -7; the values of g.delta vary from 0 to 10.
When the plot is drawn, the y axis extends from 1 to -7. So while all the f.delta points are visible, any g.delta point that has y>1 is cut-off from view.
How do I stop R from automatically setting the ylims from the data values. Have tried, unsuccessfully, various combinations of yaxt, yaxp, ylims.
Any suggestion will be greatly appreciated.
Thanks,
Anjan

In addition to Gavin's excellent answer, I also thought I'd mention that another common idiom in these cases is to create an empty plot with the correct limits and then to fill it in using points, lines, etc.
Using Gavin's example data:
with(df,plot(range(x),range(f.delta,g.delta),type = "n"))
points(f.delta~x, data = df, pch=20, col="blue")
points(g.delta~x, data = df, pch=20, col="red")
The type = "n" causes plot to create only the empty plotting window, based on the range of x and y values we've supplied. Then we use points for both columns on this existing plot.

You need to tell R what the limits of the data are and pass that as argument ylim to plot() (note the argument is ylim not ylims!). Here is an example:
set.seed(1)
df <- data.frame(f.delta = runif(10, min = -7, max = 0),
g.delta = runif(10, min = 0, max = 10),
x = rnorm(10))
ylim <- with(df, range(f.delta, g.delta)) ## compute y axis limits
plot(f.delta ~ x, data = df, pch = 20, col = "blue", ylim = ylim)
points(g.delta ~ x, data = df, pch = 20, col = "red")
Which produces

Related

Histogram to decide whether two distributions have the same shape in R [duplicate]

I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
On additional thing, I commented on Dirk's question that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
Already beautiful answers are there, but I thought of adding this. Looks good to me.
(Copied random numbers from #Dirk). library(scales) is needed`
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel result from hist0 is prettier to look than hist
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
Here's the version like the ggplot2 one I gave only in base R. I copied some from #nullglob.
generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
#Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers but since I've just written a function (plotMultipleHistograms() in 'basicPlotteR' package) function to do this, I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the plotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).

r program grouping 3 histograms into one grouped histogram [duplicate]

I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms - carrot length and cucumbers lengths - on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies not absolute numbers since the number of instances in each group is different.
Something like this would be nice but I don't understand how to create it from my two tables:
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
set.seed(42)
p1 <- hist(rnorm(500,4)) # centered at 4
p2 <- hist(rnorm(500,6)) # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10)) # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T) # second
The key is that the colours are semi-transparent.
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))
# Now, combine your two dataframes into one.
# First make a new column in each that will be
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'
# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.
ggplot(vegLengths, aes(length, fill = veg)) +
geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
On additional thing, I commented on Dirk's question that all of the arguments could simply be in the hist command. I was asked how that could be done. What follows produces exactly Dirk's figure.
set.seed(42)
hist(rnorm(500,4), col=rgb(0,0,1,1/4), xlim=c(0,10))
hist(rnorm(500,6), col=rgb(1,0,0,1/4), xlim=c(0,10), add = TRUE)
Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms
plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
breaks=NULL, xlim=NULL, ylim=NULL){
ahist=NULL
bhist=NULL
if(!(is.null(breaks))){
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
} else {
ahist=hist(a,plot=F)
bhist=hist(b,plot=F)
dist = ahist$breaks[2]-ahist$breaks[1]
breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)
ahist=hist(a,breaks=breaks,plot=F)
bhist=hist(b,breaks=breaks,plot=F)
}
if(is.null(xlim)){
xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
}
if(is.null(ylim)){
ylim = c(0,max(ahist$counts,bhist$counts))
}
overlap = ahist
for(i in 1:length(overlap$counts)){
if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
} else {
overlap$counts[i] = 0
}
}
plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}
Here's another way to do it using R's support for transparent colors
a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )
The results end up looking something like this:
Already beautiful answers are there, but I thought of adding this. Looks good to me.
(Copied random numbers from #Dirk). library(scales) is needed`
set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)
The result is...
Update: This overlapping function may also be useful to some.
hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border)
I feel result from hist0 is prettier to look than hist
hist2 <- function(var1, var2,name1='',name2='',
breaks = min(max(length(var1), length(var2)),20),
main0 = "", alpha0 = 0.5,grey=0,border=F,...) {
library(scales)
colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))
max0 = max(var1, var2)
min0 = min(var1, var2)
den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
den_max <- max(den2_max, den1_max)*1.2
var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
var2 %>% hist0(xlim = c(min0 , max0), breaks = breaks,
freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)
legend(min0,den_max, legend = c(
ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
"Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }
The result of
par(mar=c(3, 4, 3, 2) + 0.1)
set.seed(100)
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)
is
Here is an example of how you can do it in "classic" R graphics:
## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Lengths',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).
Here's the version like the ggplot2 one I gave only in base R. I copied some from #nullglob.
generate the data
carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
main = 'Distribution of carrots and cucumbers',
panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
fill = c(carrotCol, cukeCol), bty = 'n',
border = NA)
#Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Plotly's R API might be useful for you. The graph below is here.
library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
name = "Carrots",
type='histogramx',
opacity = 0.8)
data1 = list(x=x1,
name = "Cukes",
type='histogramx',
opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
plot_bgcolor = 'rgba(249,249,251,.85)')
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))
url = response$url
filename = response$filename
browseURL(response$url)
Full disclosure: I'm on the team.
So many great answers but since I've just written a function (plotMultipleHistograms() in 'basicPlotteR' package) function to do this, I thought I would add another answer.
The advantage of this function is that it automatically sets appropriate X and Y axis limits and defines a common set of bins that it uses across all the distributions.
Here's how to use it:
# Install the plotteR package
install.packages("devtools")
devtools::install_github("JosephCrispell/basicPlotteR")
library(basicPlotteR)
# Set the seed
set.seed(254534)
# Create random samples from a normal distribution
distributions <- list(rnorm(500, mean=5, sd=0.5),
rnorm(500, mean=8, sd=5),
rnorm(500, mean=20, sd=2))
# Plot overlapping histograms
plotMultipleHistograms(distributions, nBins=20,
colours=c(rgb(1,0,0, 0.5), rgb(0,0,1, 0.5), rgb(0,1,0, 0.5)),
las=1, main="Samples from normal distribution", xlab="Value")
The plotMultipleHistograms() function can take any number of distributions, and all the general plotting parameters should work with it (for example: las, main, etc.).

Using Bxp function in R with varwidth

I am quite new to R programming and have been given the task of representing some data in a boxplot. We were only provided the five figure summary of the data, i.e the lowest value, lower quartile,median,upper quartile,highest value. We are also told the amount of samples (n).
I read bxp was a function similar to boxplot but drew the boxplot based upon this five figure summary.
However, I know varwidth can be used to change the width of boxes proportionate to N, yet it does not seem to work here as all boxes are the same length. This is what I need help with.
MORSEYear1 <- c(18.2,58.5,64.4,73.4,91.1)
MORSEYear2 <- c(22.3,56.4,64.3,75.7,97.4)
MORSEYear3 <- c(29.1,57.9,66.6,73.4,86.0)
MathStatYear1 <- c(46.8,54.8,66.1,71.4,84.1)
MathStatYear2 <- c(35.1,47.8,57.8,65.7,82.8)
MathStatYear3 <- c(32.6,56.3,61.1,75.6,89.4)
MORSE1<-list(stats=matrix(MORSEYear1,MORSEYear1[5],MORSEYear1[1]), n=139)
MORSE2<-list(stats=matrix(MORSEYear2,MORSEYear2[5],MORSEYear2[1]), n=132)
MORSE3<-list(stats=matrix(MORSEYear3,MORSEYear3[5],MORSEYear3[1]), n=131)
MS1 <- list(stats=matrix(MathStatYear1,MathStatYear1[5],MathStatYear1[1]), n= 21)
MS2 <- list(stats=matrix(MathStatYear2,MathStatYear2[5],MathStatYear2[1]), n=20)
MS3 <- list(stats=matrix(MathStatYear3,MathStatYear3[5],MathStatYear3[1]), n= 14)
bxp(MORSE1, xlim = c(0.5,6.5),ylim = c(0,100),varwidth= TRUE, main = "Graph comparing distribution of marks across different years of MORSE and MathStat",ylab = "Marks", xlab = "Course and year of study (Course,Year)", axes = FALSE)
par(new=T)
bxp(MORSE2, xlim = c(-0.5,5.5), ylim = c(0,100),axes= TRUE, varwidth=TRUE)
par(new=T)
bxp(MORSE3, xlim = c(-1.5,4.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS1, xlim = c(-2.5,3.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS2, xlim = c(-3.5,2.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS3, xlim = c(-4.5,1.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
NOTE: My supervisor said to use par(new=T) and change the xlim to plot multiple graphs using bxp(), if someone could verify if this is the best method or not that would be great!
Thanks
Stumbled upon the same problem, without much experience with R.
The varwidth argument of the bxp() function requires multiple boxplots being plotted at once. Adding to an initial plot does not count, as no readjustment is possible after the fact.
The question is how to construct a multidimensional z argument for bxp(). To answer this, a look at the result of something like boxplot(c(c(1,1),c(2,2))~c(c(11,11),c(22,22))) helps.
First, a generic example with made-up data to aid anyone that lands here:
# data
d1 <- c(1,2,3,4,5)
d2 <- c(1,2,3,5,8,13,21,34)
# summaries (generated with quantile and structured accordingly)
z1 <- list(
stats=matrix(quantile(d1, c(0.05,0.25,0.5,0.75,0.85))),
n=length(d1)
)
z2 <- list(
stats=matrix(quantile(d2, c(0.05,0.25,0.5,0.75,0.85))),
n=length(d2)
)
# merging the summaries appropriately
z <- list(
stats=cbind(z1$stats,z2$stats),
n=c(z1$n,z2$n)
)
# check result
print(z)
# call bxp with needed parameters ("at" can/should also be used here)
bxp(z=z,varwidth=TRUE)
In the case of the original question, one should merge MORSE# and MS#. The code is far from optimal - there might be a better way to merge and a function for this can be written, but the aim is ugly clarity and simplicity:
z <- list(
stats=cbind(MORSE1$stats, MORSE2$stats, MORSE3$stats, M1$stats, M2$stats, M3$stats),
n=c(MORSE1$stats, MORSE2$n, MORSE3$n, M1$n, M2$n, M3$n)
)

Plotting two Poisson processes on one plot

I have two Poisson processes:
n <- 100
x <- seq(0, 10, length = 1000)
y1 <- cumsum(rpois(1000, 1 / n))
y2 <- -cumsum(rpois(1000, 1 / n))
I would like to plot them in one plot and expect that y1 lies above x-axis and y2 lies below x-axis. I tried the following code:
plot(x, y1)
par(new = TRUE)
plot(x, y2, col = "red",
axes = FALSE,
xlab = '', ylab = '',
xlim = c(0, 10), ylim = c(min(y2), max(y1)))
but it did not work. Can someone please tell me how to fix this? (I am working with R for my code)
Many thanks in advance
How about
plot(x,y1, ylim=range(y1,y2), type="l")
lines(x, y2, col="red")
I would suggest trying to avoid multiple calls to plot with par(new=TRUE). That is usually very messy. Here we use lines() to add to an existing plot. The only catch is that the x and y limits won't change based on the new data, so we use ylim in the first plot() call to set a range appropriate for all the data.
Or if you don't want to worry about limits (like MrFlick mentioned) or the number of lines, you could also tide up your data and using melt and ggplot
df <- data.frame(x, y1, y2)
library(reshape2)
library(ggplot2)
mdf <- melt(df, "x")
ggplot(mdf, aes(x, value, color = variable)) +
geom_line()

Howto Plot ROC curve in R with only known SN/PPV/Cutoff info

Given such data:
#Cutpoint SN (1-PPV)
5 0.56 0.01
7 0.78 0.19
9 0.91 0.58
How can I plot ROC curve with R that produce similar result like the
attached ?
I know ROCR package but it doesn't take such input.
If you just want to create the plot (without that silly interpolation spline between points) then just plot the data you give in the standard way, prepending a point at (0,0) and appending one at (1,1) to give the end points of the curve.
## your data with different labels
dat <- data.frame(cutpoint = c(5, 7, 9),
TPR = c(0.56, 0.78, 0.91),
FPR = c(0.01, 0.19, 0.58))
## plot version 1
op <- par(xaxs = "i", yaxs = "i")
plot(TPR ~ FPR, data = dat, xlim = c(0,1), ylim = c(0,1), type = "n")
with(dat, lines(c(0, FPR, 1), c(0, TPR, 1), type = "o", pch = 25, bg = "black"))
text(TPR ~ FPR, data = dat, pos = 3, labels = dat$cutpoint)
abline(0, 1)
par(op)
To explain the code: The first plot() call sets up the plotting region, without doing an plotting at all. Note that I force the plot to cover the range (0,1) in both axes. The par() call tells R to plot axes that cover the range of the data - the default extends them by 4 percent of the range on each axis.
The next line, with(dat, lines(....)) draws the ROC curve and here we prepend and append the points at (0,0) and (1,1) to give the full curve. Here I use type = "o" to give both points and lines overplotted, the points are represented by character 25 which allows it to be filled with a colour, here black.
Then I add labels to the points using text(....); the pos argument is used to position the label away from the actual plotting coordinates. I take the labels from the cutpoint object in the data frame.
The abline() call draws the 1:1 line (here the 0, and 1 mean an intercept of 0 and a slope of 1 respectively.
The final line resets the plotting parameters to the defaults we saved in op prior to plotting (in the first line).
The resulting plot looks like this:
It isn't an exact facsimile and I prefer the plot using the default for the axis ranges(adding 4 percent):
plot(TPR ~ FPR, data = dat, xlim = c(0,1), ylim = c(0,1), type = "n")
with(dat, lines(c(0, FPR, 1), c(0, TPR, 1), type = "o", pch = 25, bg = "black"))
text(TPR ~ FPR, data = dat, pos = 3, labels = dat$cutpoint)
abline(0, 1)
Again, not a true facsimile but close.

Resources