R: Histogram with both custom breaks and constant width

I have some skewed data and want to create a histogram with custom breaks, but I want it to stay readable by drawing every bin with the same width (this distorts the scale of the x axis, but that's fine). Does anyone know how to do this in ggplot/R?
This is what I don't want, but I don't know how to make breaks not override the width argument:
library(ggplot2)
test_data = rep(c(1,2,3,4,5,8,9,14,20,42,98,101,175), c(50,40,30,20,10,6,6,7,9,5,6,4,1))
buckets = c(-.5,.5,1.5,2.5,3.5,4.5,5.5,10.5,99.5,200)
q1 = qplot(test_data,geom="histogram",breaks=buckets)
print(q1)
Not the histogram I want :(

As ulfelder suggested, use cut() to turn the values into a factor of bins, so that every bin is drawn with the same width:
library(ggplot2)
test_data = rep(c(1,2,3,4,5,8,9,14,20,42,98,101,175),
c(50,40,30,20,10,6,6,7,9,5,6,4,1))
buckets = c(-.5,.5,1.5,2.5,3.5,4.5,5.5,10.5,99.5,200)
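# cut() returns a factor, so draw bars; recent ggplot2 versions reject geom="histogram" for discrete data
q1 = qplot(cut(test_data, buckets), geom="bar")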
print(q1)
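The same idea can be written with ggplot() directly. This is a minimal sketch, not the answerer's code: the names binned and bin_counts are introduced here for illustration, and the plot is only drawn if ggplot2 is installed.

```r
# Same data and breaks as in the question
test_data <- rep(c(1, 2, 3, 4, 5, 8, 9, 14, 20, 42, 98, 101, 175),
                 c(50, 40, 30, 20, 10, 6, 6, 7, 9, 5, 6, 4, 1))
buckets <- c(-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 99.5, 200)

# cut() assigns each value to a bucket and returns a factor
binned <- cut(test_data, breaks = buckets)

# Counts per bucket, useful for checking the binning before plotting
bin_counts <- table(binned)
print(bin_counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # A factor on the x axis gives every bucket the same bar width
  p <- ggplot(data.frame(bin = binned), aes(x = bin)) +
    geom_bar() +
    scale_x_discrete(drop = FALSE) +  # keep empty buckets on the axis
    labs(x = "bucket", y = "count")
  print(p)
}
```

Because the x axis is now categorical, the bar widths no longer reflect the numeric width of each bucket, which is the trade-off the question accepts.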

Related

How do I use R cut function to create custom bins with x y coordinates?

Firstly, apologies if I do not explain this very well; I'm relatively new to the cut function and cannot find a suitable answer to my question.
I have X Y coordinate data, and I know how to create evenly distributed bins for use with stat_bin_2d. I would do something like this:
heatmap$xbin <- cut(heatmap$x, breaks = seq(from=0, to=100, by = 20),include.lowest=TRUE )
heatmap$ybin <- cut(heatmap$y, breaks = seq(from=0, to=100, by = 20),include.lowest=TRUE)
Using ggplot I'd include this in a plot like so:
stat_bin_2d(data = heatmap, aes(x=x, y=y), binwidth = c(20,20))+
However, if I want to create custom bins of different sizes, I'm not sure how to do this. For example, if I wanted to plot specific zones of interest on a sports pitch, how do I use the cut function when the bins are not evenly spaced?
I tried this, but again I don't believe this is correct:
heatmap$xbin <- cut(heatmap$x, breaks = seq(from=80, to=100, by = 1),include.lowest=TRUE )
heatmap$ybin <- cut(heatmap$y, breaks = seq(from=40, to=60, by = 1),include.lowest=TRUE)
Effectively I'd like bins like this, once I know how to customize the bin sizes:
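No answer is recorded for this question, so the following is only a sketch of one possible approach. cut() accepts any increasing vector of breaks, so unevenly sized bins just need their edges listed explicitly. The zone edges and the simulated coordinates below are hypothetical, chosen purely for illustration.

```r
# Simulated coordinates on a 0-100 pitch (illustrative data, not from the question)
set.seed(42)
heatmap <- data.frame(x = runif(200, 0, 100), y = runif(200, 0, 100))

# Hypothetical zone edges; they do not need to be evenly spaced
x_breaks <- c(0, 17, 50, 83, 100)
y_breaks <- c(0, 21, 37, 63, 79, 100)

# include.lowest = TRUE keeps points that sit exactly on the lowest edge
heatmap$xbin <- cut(heatmap$x, breaks = x_breaks, include.lowest = TRUE)
heatmap$ybin <- cut(heatmap$y, breaks = y_breaks, include.lowest = TRUE)

# Number of points in every x/y zone combination
zone_counts <- as.data.frame(table(xbin = heatmap$xbin, ybin = heatmap$ybin))
print(head(zone_counts))
```

Values that fall outside the range of the breaks are returned as NA by cut(), which is why breaks covering only 80 to 100 or 40 to 60 leave most of the pitch unbinned.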

Setting custom LCL and UCL limits with qcc (Rstudio)

Perhaps this is an easy question, but I am quite new to R and am struggling to define custom UCL and LCL limits in xBar control charts. In production we already have tolerances that must be met, and I would like to set the limits (LCL and UCL) according to those tolerances, but I do not know how.
Here is a simple example to illustrate:
library(qcc)
data(pistonrings)
diameter <- pistonrings$diameter
q1 <- qcc(diameter, type = "xbar.one", plot = TRUE)
This creates the xBar chart and derives the two limits from the measurements and the confidence interval. I would like to set them as follows (just as an example) and have the results calculated against these values:
LCL: 73.99
UCL: 74.02
Is it possible?
I fixed the issue. It was enough to specify the limits in the call to qcc:
q1 <- qcc(diameter, type = "xbar.one", plot = TRUE, limits = c(73.99,74.02))
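As a quick cross-check of what fixed limits imply, the observations outside the tolerance band can be listed in base R and compared with the chart. This is a sketch under assumptions: the diameter values are simulated rather than taken from pistonrings, the names lcl, ucl and outside are introduced here, and the qcc call only runs if the package is installed.

```r
# Tolerance limits from the example above
lcl <- 73.99
ucl <- 74.02

# Simulated individual measurements (illustrative, not the pistonrings data)
set.seed(1)
diameter <- rnorm(50, mean = 74, sd = 0.01)

# Indices of observations outside the fixed limits
outside <- which(diameter < lcl | diameter > ucl)
print(outside)

if (requireNamespace("qcc", quietly = TRUE)) {
  # Same call as in the answer, with plotting switched off
  q1 <- qcc::qcc(diameter, type = "xbar.one", plot = FALSE,
                 limits = c(lcl, ucl))
  print(q1$limits)  # should show the supplied LCL and UCL
}
```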

Plot a table with box size changing

Does anyone know how this kind of chart is plotted? It looks like a heat map, but instead of color, the size of each cell indicates the magnitude. I want to plot a figure like this but I don't know how to do it. Can this be done in R or Matlab?
Try scatter:
scatter(x,y,sz,c,'s','filled');
where x and y are the positions of each square, sz is the size (must be a vector of the same length as x and y), and c is a length(x)-by-3 matrix with the RGB color value for each entry. The labels for the plot can be set with set(gca,properties) or xticklabels:
X=30;
Y=10;
[x,y]=meshgrid(1:X,1:Y);
x=reshape(x,[size(x,1)*size(x,2) 1]);
y=reshape(y,[size(y,1)*size(y,2) 1]);
sz=50;
sz=sz*(1+rand(size(x)));
c=[1*ones(length(x),1) repmat(rand(size(x)),[1 2])];
scatter(x,y,sz,c,'s','filled');
xlab={'ACC';'BLCA';etc}
xticks(1:X)
xticklabels(xlab)
xtickangle(90) % rotate the x tick labels
ylab={'RAPGEB6';etc}
yticks(1:Y)
yticklabels(ylab)
EDIT: xticks, yticks and related functions are only available in R2016b and later; if you have an older version, use set instead:
set(gca,'XTick',1:X,'XTickLabel',xlab,'XTickLabelRotation',90) % rotation requires R2014b or later
set(gca,'YTick',1:Y,'YTickLabel',ylab)
In R, you can use ggplot2, which allows you to map your values (gene expression in your case?) onto the size aesthetic. Here is a simulation that resembles your data structure:
my_data <- matrix(rnorm(8*26,mean=0,sd=1), nrow=8, ncol=26,
dimnames = list(paste0("gene",1:8), LETTERS))
Then, reshape the matrix into a long data frame that ggplot2 can use:
library(reshape)
dat_m <- melt(my_data, varnames = c("gene", "cancer"))
Now, use ggplot2::geom_tile() to map the values onto the size variable. You may update additional features of the plot.
library(ggplot2)
ggplot(data=dat_m, aes(cancer, gene)) +
geom_tile(aes(size=value, fill="red"), color="white") +
scale_fill_discrete(guide="none") + ##hide scale (guide=FALSE is deprecated in recent ggplot2)
scale_size_continuous(guide="none") ##hide another scale
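In geom_tile() the size aesthetic sets the thickness of the tile border, so the cell appears smaller only indirectly. An alternative sketch, not part of the original answer, draws square points whose area is mapped to the magnitude. The column name magnitude and the use of the absolute value are assumptions made here; the plot is only drawn if ggplot2 is installed.

```r
# Simulated data with the same shape as in the answer above
set.seed(1)
my_data <- matrix(rnorm(8 * 26), nrow = 8, ncol = 26,
                  dimnames = list(paste0("gene", 1:8), LETTERS))

# Long format using base R only: one row per gene/cancer pair
dat_m <- as.data.frame(as.table(my_data), responseName = "value")
names(dat_m)[1:2] <- c("gene", "cancer")

# Point area must be non-negative, so map the absolute value (an assumption)
dat_m$magnitude <- abs(dat_m$value)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat_m, aes(cancer, gene)) +
    geom_point(aes(size = magnitude), shape = 15, colour = "red") +  # shape 15 = filled square
    scale_size_area(max_size = 8, guide = "none")                    # area proportional to value
  print(p)
}
```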
In R, the corrplot package can be used. Specifically, you have to use method = 'square' when creating the plot.
Try this as an example:
library(corrplot)
corrplot(cor(mtcars), method = 'square', col = 'red')

Setting equal xlim and ylim in plot function

Is there a way to get the plot function to generate equal xlim and ylim automatically?
I do not want to define a fixed range beforehand; I want the plot function to decide on the range itself. However, I expect it to pick the same range for x and y.
A possible solution is to define a wrapper to the plot function:
plot.Custom <- function(x, y, ...) {
.limits <- range(x, y)
plot(x, y, xlim = .limits, ylim = .limits, ...)
}
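A short usage sketch of the same idea with the built-in cars dataset follows. The function name plot_equal_limits, the finite = TRUE argument and asp = 1 are additions made here for illustration, not part of the answer above.

```r
# Variant of the wrapper that also returns the limits it used
plot_equal_limits <- function(x, y, ...) {
  # finite = TRUE ignores NA and infinite values when computing the range
  limits <- range(x, y, finite = TRUE)
  # asp = 1 additionally makes one x unit as long as one y unit on screen
  plot(x, y, xlim = limits, ylim = limits, asp = 1, ...)
  invisible(limits)
}

used <- plot_equal_limits(cars$speed, cars$dist,
                          xlab = "speed", ylab = "dist")
print(used)
```

Drop asp = 1 if only the numeric limits need to match and the physical scaling of the axes does not matter.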
Another way is to adjust the limits interactively and then choose the ones you like. Sliders will appear once you run the following code in RStudio (the manipulate package depends on it).
library(manipulate)
manipulate(
plot(cars, xlim=c(x.min,x.max)),
x.min=slider(0,15),
x.max=slider(15,30))
I'm not aware of any way to do this using plot (which doesn't mean there isn't one). ggplot might be the way to go; it lends itself better to being changed retroactively, since it is designed around a layer system.
library(ggplot2)
#Creating our ggplot object
loop_plot <- ggplot(cars, aes(x = speed, y = dist)) +
geom_point()
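#pulling out the 'auto' x & y axis limits
#(ggplot2 >= 3.0 stores them under $layout$panel_params; older versions used $panel$ranges)
built <- ggplot_build(loop_plot)
rangepull <- t(cbind(
built$layout$panel_params[[1]]$x.range,
built$layout$panel_params[[1]]$y.range))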
#taking the max and min(so we don't cut out data points)
newrange <- list(cor.min = min(rangepull[,1]), cor.max = max(rangepull[,2]))
#changing our plot size to be nice and symmetric
loop_plot <- loop_plot +
xlim(newrange$cor.min, newrange$cor.max) +
ylim(newrange$cor.min, newrange$cor.max)
Note that the loop_plot object is of class ggplot and won't actually be drawn until it is printed.
I used the cars dataset in the code above to show what's going on, but just substitute your own data set(s) and then do whatever post-processing your end goal requires.
You'll also be able to add titles and the like based on the dataset name, which will likely produce a clearer visualization from your loop.
Hopefully this works for your needs.

lattice or latticeExtra combine multiple plots different yscaling (log10 and non-transformed)

I have a multivariate time series where some of the variables have rather large ranges. I wish to make a single-page plot with stacked plots of each variable, where some of the variables have a log10 y-axis scaling. I am relatively new to lattice and have not been able to figure out how to effectively mix the log10 scaling with non-transformed axes and get a publication-quality plot. If print.trellis is used, the plots are not aligned and the padding needs some work; if c.trellis is used, the layout is good, but the y-scaling from only one plot is used. Any suggestions for an efficient solution where I can replicate the output of c.trellis using the different y-scaling of each (original) object?
Example below:
require(lattice)
require(latticeExtra)
# make data.frame
d.date <- as.POSIXct(c("2009-12-15", "2010-01-15", "2010-02-15", "2010-03-15", "2010-04-15"))
CO2dat <- c(100,200,1000,9000,2000)
pHdat <- c(10,9,7,6,7)
tmp <- data.frame(date=d.date ,CO2dat=CO2dat ,pHdat=pHdat)
# make plots
plot1 <- xyplot(pHdat ~ date, data=tmp
, ylim=c(5,11)
, ylab="pHdat"
, xlab="Date"
, origin = 0, border = 0
, scales=list(y=list(alternating=1))
, panel = function(...){
panel.xyarea(...)
panel.xyplot(...)
}
)
# make plot with log y scale
plot2 <- xyplot(CO2dat ~ date, data=tmp
, ylim=c(10,10^4)
, ylab="CO2dat"
, xlab="Date"
, origin = 0, border = 0
, scales=list(y=list(alternating=1,log=10))
, yscale.components = yscale.components.log10ticks
, panel = function(...){
panel.xyarea(...)
panel.xyplot(...)
# plot CO2air uatm
panel.abline(h=log10(390),col="blue",type="l",...)
}
)
# plot individual figures using split
print(plot2, split=c(1,1,1,2), more=TRUE)
print(plot1, split=c(1,2,1,2), more=F)
# combine plots (more convenient)
comb <- c(plot1, plot2, x.same=F, y.same=F, layout = c(1, 2))
# plot combined figure
update(comb, ylab = c("pHdat","log10 CO2dat"))
Using @joran's idea, I can get the axes closer to aligned, but not exactly; reducing the padding also brings the plots closer together but changes the aspect ratio. In the picture below I've reduced the padding, perhaps by too much, to show the inexactness; if the plots were wanted this close, you'd clearly want to remove the x-axis labels on the top one as well.
I looked into the code that sets up the layout, and the margin on the left side is calculated from the width of the labels, so @joran's idea is probably the only thing that will work when printing with split, unless one were to rewrite the plot.trellis command. Perhaps the c method could work, but I haven't yet found a way to set the scale components separately for each panel. That does seem more promising, though.
mtheme <- standard.theme("pdf")
mtheme$layout.heights$bottom.padding <- -10
plot1b <- update(plot1, scales=list(y=list(alternating=1, at=5:10, labels=paste(" ",c(5:10)))))
plot2b <- update(plot2, par.settings=mtheme)
pdf(file="temp.pdf")
print(plot2b, split=c(1,1,1,2), more=TRUE)
print(plot1b, split=c(1,2,1,2), more=F)
dev.off() # close the pdf device so that temp.pdf is written
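A different route, sketched here as an assumption rather than as the answerer's method, is to reshape the data to long format, transform CO2dat with log10() by hand, and let a single xyplot() call draw both panels with free y scales. The panels then share one layout and stay aligned. The price is that the CO2 axis is labelled in log10 units instead of with log-spaced ticks. The names long and p are introduced here.

```r
library(lattice)

# Same data as in the question
d.date <- as.POSIXct(c("2009-12-15", "2010-01-15", "2010-02-15",
                       "2010-03-15", "2010-04-15"))
CO2dat <- c(100, 200, 1000, 9000, 2000)
pHdat  <- c(10, 9, 7, 6, 7)

# Long format: one row per date/variable pair, CO2 transformed by hand
long <- data.frame(
  date     = rep(d.date, times = 2),
  value    = c(pHdat, log10(CO2dat)),
  variable = factor(rep(c("pHdat", "log10 CO2dat"), each = length(d.date)),
                    levels = c("pHdat", "log10 CO2dat"))
)

# One trellis object, two stacked panels, independent y axes
p <- xyplot(value ~ date | variable, data = long,
            layout = c(1, 2), type = "b",
            scales = list(y = list(relation = "free", alternating = 1)),
            xlab = "Date", ylab = NULL)
print(p)
```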
