Remove part of y-axis in a boxplot - r

For the following boxplot:
How can I modify the y-axis to remove the range 0.1 to 0.8? I want each box to be clearly visible, and they all lie in the range 0.81 to 1.
For this boxplot, I wrote the following R script:
dataset <- read.csv("/boxplot.csv")
x <- boxplot(dataset)

Breaking an axis is generally discouraged and considered bad practice, as it can lead to misleading visualizations. If you still want to do it, you can use gap.boxplot() from the plotrix package.
Here is an arbitrary example:
library(plotrix)
test_data <- c(rnorm(100, 1000, 20), 10, 20)
par(mfrow = c(1, 2))
boxplot(test_data, main = "boxplot")
gap.boxplot(test_data, gap = list(bottom = c(50, 900), top = c(NA, NA)),
main = "gap.boxplot")
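If nothing of interest lies below 0.8, a simpler alternative (not part of the answer above) is to restrict the visible range with ylim instead of breaking the axis. A minimal sketch, assuming made-up data in place of boxplot.csv:

```r
# Made-up data standing in for boxplot.csv (an assumption for illustration):
# three columns whose values all lie between 0.81 and 1
set.seed(1)
dataset <- data.frame(
  a = runif(50, 0.85, 1.00),
  b = runif(50, 0.90, 1.00),
  c = runif(50, 0.81, 0.95)
)

# Zoom the y-axis to the range of interest instead of breaking it
bp <- boxplot(dataset, ylim = c(0.8, 1), main = "boxplot with ylim")
```

Anything outside ylim is simply not drawn, so this only suits data with no relevant points below 0.8.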

Related

R multiple plots of time series xts with only 1 legend

I want to produce multiple graphs of a time series xts object in different windows. The issue is that I cannot add only one legend (for the last plot). My code is the following:
dev.new(width=3,height=9)
par(mfrow=c(3,1))
plot(csum_GVMP[,c(-2,-3)],main=" ",minor.ticks="years",cex.axis = 1,major.ticks="years",grid.ticks.on=FALSE,grid.ticks.lty=0,col=color)
addLegend("bottomleft",legend.names = c("","","","","","",""))
plot(csum_ERC[,c(-2,-3)],main=" ",minor.ticks="years",cex.axis = 1,major.ticks="years",grid.ticks.on=FALSE,grid.ticks.lty=0,col=color)
addLegend("bottomleft",legend.names = c("","","","","","",""))
plot(csum_MD[,c(-2,-3)],main=" ",minor.ticks="years",cex.axis = 1,major.ticks="years",grid.ticks.on=FALSE,grid.ticks.lty=0,col=color)
As you can see, I added blank legend names for the first and second plots, but the result is that the same plot is repeated: only the plot for csum_GVMP is shown.
If I leave addLegend out entirely, the plots look the way I want, but I would like to add a single legend. If I leave out the addLegend command only for the first and second plots, the figures are not even plotted.
Does anybody know how to handle this? Thank you in advance.
Here you go. If you uncomment the addLegend lines, the graphs are duplicated, as I mentioned in the post.
I hope this helps.
set.seed(10)
library(MASS)
library(xts)
date=seq(as.Date("2000/1/1"), as.Date("2000/1/10"), "days")
matrixA=as.numeric(mvrnorm(n = 30, 0.5, 0.2, tol = 1e-6, empirical = TRUE, EISPACK = FALSE))
matrixA=matrix(matrixA,10,3)
matrixA.ts=as.xts(matrixA,date)
matrixB=as.numeric(mvrnorm(n = 30, 0.5, 0.2, tol = 1e-6, empirical = TRUE, EISPACK = FALSE))
matrixB=matrix(matrixB,10,3)
matrixB.ts=as.xts(matrixB,date)
par(mfrow=c(2,1))
plot(as.xts(matrixA,date),main="A")
#addLegend("bottomleft",legend.names = c("A","B"))
plot(as.xts(matrixB,date),main="B")
#addLegend("bottomleft",legend.names = c("",""))
You should be able to see this
I'm not particularly happy with this solution, but it solves the immediate problem.
The strategy is to "build" the plot to completion before plotting/printing it. See below.
set.seed(10)
library(MASS)
library(xts)
date <- seq(as.Date("2000-01-01"), as.Date("2000-01-10"), "days")
matrixA <- matrix(mvrnorm(n = 30, 0.5, 0.2, empirical = TRUE), 10, 3)
matrixA.ts <- xts(matrixA, date)
matrixB <- matrix(mvrnorm(n = 30, 0.5, 0.2, empirical = TRUE), 10, 3)
matrixB.ts <- xts(matrixB, date)
# Create the first plot, but do not draw it
# Assign the result to 'p1'
p1 <- plot(matrixA.ts, main = "A")
p1 <- addLegend("bottomleft", legend.names = c("A","B"))
# Create the second plot without drawing it
# Assign the result to 'p2'
p2 <- plot(matrixB.ts, main = "B")
p2 <- addLegend("bottomleft", legend.names = c("",""))
# Set up the device layout, and draw both plots
par(mfrow=c(2,1))
p1
p2

Add multiple horizontal lines in a boxplot

I know that I can add a horizontal line to a boxplot using a command like
abline(h=3)
When there are multiple boxplots in a single panel, can I add different horizontal lines for each single boxplot?
In the above plot, I would like to add lines 'y=1.2' for 1, 'y=1.5' for 2, and 'y=2.1' for 3.
I am not sure I understand exactly what you want, but it might be this: add a line for each boxplot that covers the same x-axis range as the box.
The width of the boxes is controlled by pars$boxwex which is set to 0.8 by default. This can be seen from the argument list of boxplot.default:
formals(boxplot.default)$pars
## list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5)
So, the following produces a line segment for each boxplot:
# create sample data and box plot
set.seed(123)
datatest <- data.frame(a = rnorm(100, mean = 10, sd = 4),
b = rnorm(100, mean = 15, sd = 6),
c = rnorm(100, mean = 8, sd = 5))
boxplot(datatest)
# create data for segments
n <- ncol(datatest)
# width of each boxplot is 0.8
x0s <- 1:n - 0.4
x1s <- 1:n + 0.4
# these are the y-coordinates for the horizontal lines
# that you need to set to the desired values.
y0s <- c(11.3, 16.5, 10.7)
# add segments
segments(x0 = x0s, x1 = x1s, y0 = y0s, col = "red")
This gives the following plot:
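The segment approach above can be wrapped in a small helper so that the half-width follows boxwex instead of being hard-coded. This is only a sketch: add_box_lines is a hypothetical name, and it assumes the boxes sit at the default positions 1, 2, ..., n.

```r
# Hypothetical helper: draw one horizontal segment per box, as wide as the box.
# Assumes boxes are drawn at x = 1, 2, ..., length(y) (the boxplot default).
add_box_lines <- function(y, boxwex = 0.8, ...) {
  at   <- seq_along(y)
  half <- boxwex / 2
  segments(x0 = at - half, y0 = y, x1 = at + half, y1 = y, ...)
  # Return the coordinates invisibly so they can be inspected
  invisible(data.frame(x0 = at - half, x1 = at + half, y = y))
}

set.seed(123)
datatest <- data.frame(a = rnorm(100, mean = 10, sd = 4),
                       b = rnorm(100, mean = 15, sd = 6),
                       c = rnorm(100, mean = 8,  sd = 5))
boxplot(datatest)
coords <- add_box_lines(c(11.3, 16.5, 10.7), col = "red", lwd = 2)
```

If boxplot() is called with a non-default boxwex, pass the same value to the helper so the segments match the box widths.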

Draw vertical quantile lines over histogram

I currently generate the following plot using ggplot in R:
The data is stored in a single dataframe with three columns: PDF (y-axis in the plot above), mids(x) and dataset name. This is created from histograms.
What I want to do is to plot a color-coded vertical line for each dataset representing the 95th quantile, like I manually painted below as an example:
I tried to use + geom_line(stat="vline", xintercept="mean") but of course I'm looking for the quantiles, not for the mean, and AFAIK ggplot does not allow that. Colors are fine.
I also tried + stat_quantile(quantiles = 0.95) but I'm not sure what it does exactly. Documentation is very scarce. Colors, again, are fine.
Please note that density values are very low, down to 1e-8. I don't know if the quantile() function likes that.
I understand that calculating the quantile of a histogram is not quite the same as calculating that of a list of numbers. I don't know how it would help, but the HistogramTools package contains an ApproxQuantile() function for histogram quantiles.
Minimum working example is included below. As you can see I obtain a data frame from each histogram, then bind the dataframes together and plot that.
library(ggplot2)
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks=c(0:100))
df1 <- data.frame(h$mids,h$density,rep("dataset1", 100))
colnames(df1) <- c('Bin','Pdf','Dataset')
df2 <- data.frame(h$mids*2,h$density*2,rep("dataset2", 100))
colnames(df2) <- c('Bin','Pdf','Dataset')
df_tot <- rbind(df1, df2)
ggplot(data=df_tot[which(df_tot$Pdf>0),], aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
geom_point(aes(color=Dataset), alpha = 0.7, size=1.5)
Precomputing these values and plotting them separately seems like the simplest option. Doing so with dplyr requires minimal effort:
library(dplyr)
q.95 <- df_tot %>%
group_by(Dataset) %>%
summarise(Bin_q.95 = quantile(Bin, 0.95))
ggplot(data=df_tot[which(df_tot$Pdf>0),],
aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
geom_point(aes(color=Dataset), alpha = 0.7, size=1.5) +
geom_vline(data = q.95, aes(xintercept = Bin_q.95, colour = Dataset))
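Note that quantile(Bin, 0.95) takes the quantile of the bin midpoints themselves and ignores how much density each bin holds. If the quantile of the underlying distribution is wanted, one possible approach is to accumulate the density over the sorted bins and take the first midpoint at which the cumulative share reaches 0.95. A minimal sketch in base R; hist_quantile is a hypothetical helper, not a function from any package, and it assumes equal-width bins within each dataset:

```r
# Hypothetical helper: approximate a quantile from binned densities.
# Assumes equal-width bins, so each bin's mass is proportional to its density.
hist_quantile <- function(bin, pdf, prob = 0.95) {
  ord <- order(bin)
  cdf <- cumsum(pdf[ord]) / sum(pdf)   # cumulative share of the total mass
  bin[ord][which(cdf >= prob)[1]]      # first midpoint reaching 'prob'
}

# Same data as in the question
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)
df_tot <- rbind(
  data.frame(Bin = h$mids,     Pdf = h$density,     Dataset = "dataset1"),
  data.frame(Bin = h$mids * 2, Pdf = h$density * 2, Dataset = "dataset2")
)

# One quantile per dataset, in a data frame usable as geom_vline(data = q.95, ...)
parts <- split(df_tot, df_tot$Dataset)
q.95 <- data.frame(
  Dataset  = names(parts),
  Bin_q.95 = sapply(parts, function(d) hist_quantile(d$Bin, d$Pdf))
)
```

The result is only as precise as the bin width, since it always returns a bin midpoint.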

How to overlay density plots in R?

I would like to overlay 2 density plots on the same device with R. How can I do that? I searched the web but I didn't find any obvious solution.
My idea would be to read data from a text file (columns) and then use
plot(density(MyData$Column1))
plot(density(MyData$Column2), add=T)
Or something in this spirit.
Use lines() for the second one:
plot(density(MyData$Column1))
lines(density(MyData$Column2))
Make sure the limits of the first plot are suitable, though.
ggplot2 is another graphics package that handles things like the range issue Gavin mentions in a pretty slick way. It also handles auto generating appropriate legends and just generally has a more polished feel in my opinion out of the box with less manual manipulation.
library(ggplot2)
#Sample data
dat <- data.frame(dens = c(rnorm(100), rnorm(100, 10, 5))
, lines = rep(c("a", "b"), each = 100))
#Plot.
ggplot(dat, aes(x = dens, fill = lines)) + geom_density(alpha = 0.5)
Here is a base graphics version that takes care of the y-axis limits, adds colors, and works for any number of columns:
If we have a data set:
myData <- data.frame(std.normal=rnorm(1000, mean=0, sd=1),
wide.normal=rnorm(1000, mean=0, sd=2),
exponent=rexp(1000, rate=1),
uniform=runif(1000, min=-3, max=3)
)
Then to plot the densities:
dens <- apply(myData, 2, density)
plot(NA, xlim=range(sapply(dens, "[", "x")), ylim=range(sapply(dens, "[", "y")))
mapply(lines, dens, col=1:length(dens))
legend("topright", legend=names(dens), fill=1:length(dens))
Which gives:
Just to provide a complete set, here's a version of Chase's answer using lattice:
library(lattice)
dat <- data.frame(dens = c(rnorm(100), rnorm(100, 10, 5))
, lines = rep(c("a", "b"), each = 100))
densityplot(~dens,data=dat,groups = lines,
plot.points = FALSE, ref = TRUE,
auto.key = list(space = "right"))
which produces a plot like this:
That's how I do it in base (it's actually mentioned in the comments on the first answer, but I'll show the full code here, including the legend, as I cannot comment yet).
First you need to get the info on the max values for the y axis from the density plots. So you need to actually compute the densities separately first
dta_A <- density(VarA, na.rm = TRUE)
dta_B <- density(VarB, na.rm = TRUE)
Then plot them according to the first answer and define min and max values for the y axis that you just got. (I set the min value to 0)
plot(dta_A, col = "blue", main = "2 densities on one plot",
ylim = c(0, max(dta_A$y, dta_B$y)))
lines(dta_B, col = "red")
Then add a legend to the top right corner
legend("topright", c("VarA","VarB"), lty = c(1,1), col = c("blue","red"))
I took the above lattice example and made a nifty function. There is probably a better way to do this with reshape via melt/cast. (Comment or edit if you see an improvement.)
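multi.density.plot <- function(data, main = paste(names(data), collapse = " vs "), ...) {
  # Combines multiple density plots in one panel, given a named list of numeric vectors.
  # Requires the lattice package for densityplot().
  df <- data.frame()
  for (n in names(data)) {
    # One block of rows per list element, labelled with the element's name
    idf <- data.frame(x = data[[n]], label = rep(n, length(data[[n]])))
    df <- rbind(df, idf)
  }
  lattice::densityplot(~x, data = df, groups = label, plot.points = FALSE, ref = TRUE,
                       auto.key = list(space = "right"), main = main, ...)
}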
Example usage:
multi.density.plot(list(BN1=bn1$V1,BN2=bn2$V1),main='BN1 vs BN2')
multi.density.plot(list(BN1=bn1$V1,BN2=bn2$V1))
You can use the ggjoy package (since superseded by ggridges). Let's say that we have three different beta distributions such as:
set.seed(5)
b1<-data.frame(Variant= "Variant 1", Values = rbeta(1000, 101, 1001))
b2<-data.frame(Variant= "Variant 2", Values = rbeta(1000, 111, 1011))
b3<-data.frame(Variant= "Variant 3", Values = rbeta(1000, 11, 101))
df<-rbind(b1,b2,b3)
You can get the three different distributions as follows:
library(tidyverse)
library(ggjoy)
ggplot(df, aes(x=Values, y=Variant))+
geom_joy(scale = 2, alpha=0.5) +
scale_y_discrete(expand=c(0.01, 0)) +
scale_x_continuous(expand=c(0.01, 0)) +
theme_joy()
Whenever there are issues of mismatched axis limits, the right tool in base graphics is to use matplot. The key is to leverage the from and to arguments to density.default. It's a bit hackish, but fairly straightforward to roll yourself:
set.seed(102349)
x1 = rnorm(1000, mean = 5, sd = 3)
x2 = rnorm(5000, mean = 2, sd = 8)
xrng = range(x1, x2)
#force the x values at which density is
# evaluated to be the same between 'density'
# calls by specifying 'from' and 'to'
# (and possibly 'n', if you'd like)
kde1 = density(x1, from = xrng[1L], to = xrng[2L])
kde2 = density(x2, from = xrng[1L], to = xrng[2L])
matplot(kde1$x, cbind(kde1$y, kde2$y))
Add bells and whistles as desired (matplot accepts all the standard plot/par arguments, e.g. lty, type, col, lwd, ...).

R: How do I display clustered matrix heatmap (similar color patterns are grouped)

I searched a lot of questions about heatmap throughout the site and packages, but I still have a problem.
I have clustered data (kmeans/EM/DBscan..), and I want to create a heatmap by grouping the same cluster. I want the similar color patterns to be grouped in the heatmap, so generally, it looks like a block-diagonal.
I tried to order the data by the cluster number and display it,
k = kmeans(data, 3)
d = data.frame(data)
d = data.frame(d, k$cluster)
d = d[order(d$k.cluster),]
heatmap(as.matrix(d))
but the result is still not sorted. I want it to be sorted by cluster number, so that rows from the same cluster appear together.
Can I do this in R?
I searched lots of packages and tried many ways, but I still have a problem.
Thanks a lot.
You can do this using reshape2 and ggplot2 as follows:
library(reshape2)
library(ggplot2)
# Create dummy data
set.seed(123)
df <- data.frame(
a = sample(1:5, 1000, replace=TRUE),
b = sample(1:5, 1000, replace=TRUE),
c = sample(1:5, 1000, replace=TRUE)
)
# Perform clustering
k <- kmeans(df, 3)
# Append id and cluster
dfc <- cbind(df, id=seq(nrow(df)), cluster=k$cluster)
# Add idsort, the id number ordered by cluster
dfc$idsort <- dfc$id[order(dfc$cluster)]
dfc$idsort <- order(dfc$idsort)
# use reshape2::melt to create data.frame in long format
dfm <- melt(dfc, id.vars=c("id", "idsort"))
ggplot(dfm, aes(x=variable, y=idsort)) + geom_tile(aes(fill=value))
You should set Rowv and Colv to NA if you don't want the dendrograms and the subsequent reordering. By the way, you should also turn off the scaling. Using Andrie's df:
heatmap(as.matrix(df)[order(k$cluster),],Rowv=NA,Colv=NA,scale="none",labRow=NA)
In fact, heatmap() is built on image(). You can use image() directly to construct a plot exactly the way you want. heatmap() uses layout() internally, so it is difficult to set the margins. With image() you could do, for example:
myHeatmap <- function(x,ord,xlab="",ylab="",main="My Heatmap",
col=heat.colors(5), ...){
op <- par(mar=c(3,0,2,0)+0.1)
on.exit(par(op))
nc <- NCOL(x)
nr <- NROW(x)
labCol <- names(x)
x <- t(x[ord,])
image(1L:nc, 1L:nr, x, xlim = 0.5 + c(0, nc), ylim = 0.5 +
c(0, nr), axes = FALSE, xlab=xlab, ylab=ylab, main=main,
col=col,...)
axis(1, 1L:nc, labels = labCol, las = 2, line = -0.5, tick = 0)
axis(2, 1L:nr, labels = NA, las = 2, line = -0.5, tick = 0)
}
library(RColorBrewer)
myHeatmap(df,order(k$cluster),col=brewer.pal(5,"BuGn"))
This produces a plot with smaller margins on the sides. You can also manipulate the axes, colors, and so on. You should definitely take a look at the RColorBrewer package.
(This custom function is based on the internal plotting used by heatmap btw, simplified for the illustration and to get rid of all the dendrogram stuff)
