Plotting a histogram with custom breaks - r

I have a vector like:
K <- rnorm(10000, mean=100)
I want to create a histogram of K with custom breaks (and labels) like <20, 20-50, 50-75, 75-99, =100, >400, etc.
Any ideas?

With base plotting, you might be better off cutting the vector first, and then using barplot or the plot method for tables.
For example:
K <- rnorm(10000, mean=100, sd = 100)
K.cut <- cut(K, c(-Inf, 20, 50, 75, 100, 400, Inf))
plot(table(K.cut), xaxt='n', ylab='Frequency')
axis(1, at=1:6, labels=c('< 20', '20-50', '50-75', '75-100', '100-400', '> 400'))
box(bty='L')
xax <- barplot(table(K.cut), xaxt='n')
axis(1, at=xax, labels=c('< 20', '20-50', '50-75', '75-100', '100-400', '> 400'))
box(bty='L')
Note that by default, cut includes the upper (but not the lower) bound in each bin, so for example the 20-50 bin includes any 50s, but the 20s will be included in the lower adjacent bin.
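The boundary behaviour described above can be checked with a small sketch. The test values and variable names here are illustrative assumptions; right = FALSE is the cut argument that flips which side of each bin is closed.

```r
# Values that sit exactly on a break reveal which side of a bin is closed
edges  <- c(-Inf, 20, 50, 75, 100, 400, Inf)
values <- c(20, 50, 100)

# Default (right = TRUE): each bin is (lower, upper]
right_closed <- cut(values, edges)
# right = FALSE: each bin is [lower, upper)
left_closed  <- cut(values, edges, right = FALSE)

# Bin index of each value under both conventions
as.integer(right_closed)  # 20 falls in bin 1, 50 in bin 2, 100 in bin 4
as.integer(left_closed)   # 20 falls in bin 2, 50 in bin 3, 100 in bin 5
```

If the labels should read as "20-50 includes 20", pass right = FALSE to cut and keep the axis labels unchanged.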

Try the ggplot2 version:
library(ggplot2)
ggplot()+ geom_histogram(aes(K))
Many options are available for tweaking.

Related

Is there a way to make a Line Chart with a separate Pie Chart as a marker in R?

I am looking for a way to have a line chart with values over a timeline, however, I need the markers to be pie charts that I can programmatically generate. Something similar to this is what I am looking for:
Is this even possible in R, and if so, what libraries would I need to download to achieve this?
Assuming I have a dataset like this (in a comma-separated list):
I want the line chart to be constructed with time on the X-axis and status on the Y-axis. However, the markers should be pie charts with equal proportions with different colors based on the Quality, Cost, and Delivery status in the Dataset. Similar to this:
Not sure if there exists a package to do this. But with some assistance and inspiration from the plotrix package (in particular plotrix::getYmult()) (see more here: https://CRAN.R-project.org/package=plotrix) it is doable.
First define the function
addPies <- function(x, y=x, radius=0.1, shareVector=c(25, 25, 25, 25),
                    col="cbPalette"){
  # setup
  if (!require('plotrix')) { stop('Need package plotrix. See https://CRAN.R-project.org/package=plotrix') }
  seqD <- seq(0, 2*pi, length.out=100)
  if(any(grepl("cbPalette", col))){
    # color palette from http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/
    col <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
  }
  # recycle a single radius so that radius[j] is defined for every pie
  radius <- rep_len(radius, length(x))
  # iterate over number of circles
  for(j in seq_along(x)){
    xcord <- x[j]
    ycord <- y[j]
    r <- radius[j]
    if(is.list(shareVector)){
      shareVec <- shareVector[[j]]
    } else {
      shareVec <- shareVector
    }
    # the way xx and yy are defined is heavily inspired by the plotrix package
    xx <- cos(seqD)*r+xcord
    yy <- sin(seqD)*getYmult()*r+ycord
    # cumulative shares index into the 100 points on the circle
    inputPer <- cumsum(shareVec)
    # initiate circle
    inputPush <- 0
    # iterate over number of shares
    for(i in seq_along(inputPer)){
      # straight edge from the centre to the start of the arc
      nullX <- seq(xcord, xx[(inputPush[i]):inputPer[i]][1], length.out=100)
      nullY <- seq(ycord, yy[(inputPush[i]):inputPer[i]][1], length.out=100)
      xpol <- c(xx[(inputPush[i]):inputPer[i]], nullX)
      ypol <- c(yy[(inputPush[i]):inputPer[i]], nullY)
      polygon(xpol, ypol, col=col[i], border="white", lwd=2)
      inputPush[i+1] <- inputPer[i]
    }
  }
}
Inputs are:
x is a single number (one pie) or a vector (several pies) of x-coordinates; y and radius follow the same convention. shareVector is a vector (one pie) or a list of vectors (several pies). The shares must be integers that sum to 100, or else the pie will have a blank spot.
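Because the shares have to be integers that sum to exactly 100, raw weights usually need rounding first. The helper below is a hypothetical sketch (normalizeShares is not part of plotrix or of the answer above) that uses largest-remainder rounding, which is one of several reasonable choices.

```r
# Hypothetical helper: turn non-negative weights into integer shares
# that sum to exactly 100, using largest-remainder rounding.
normalizeShares <- function(weights) {
  stopifnot(is.numeric(weights), all(weights >= 0), sum(weights) > 0)
  exact  <- 100 * weights / sum(weights)
  shares <- floor(exact)
  # hand the units lost by flooring to the largest fractional parts
  leftover <- 100 - sum(shares)
  if (leftover > 0) {
    bump <- order(exact - shares, decreasing = TRUE)[seq_len(leftover)]
    shares[bump] <- shares[bump] + 1
  }
  as.integer(shares)
}

shares <- normalizeShares(c(1, 1, 1))
sum(shares)  # 100
```

The result can be passed as shareVector for a single pie, or collected with lapply into a list for several pies.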
Example single pie:
plot(0,0, type="n")
addPies(0)
Example multiple pies over line:
xVec <- c(2010, 2011, 2012, 2013)
yVec <- c(20, 50, 10, 35)
radiusVec <- c(0.15, 0.25, 0.1, 0.20)
shareList <- list(c(70, 20, 10), c(20, 50, 30), c(20, 20, 40, 10, 10), c(50, 50))
plot(y=yVec, x=xVec, type='l', xlim=c(2009.5, 2013.5), ylim=c(0, 66),
lwd=2, col="lightblue")
addPies(xVec, yVec, radiusVec, shareList)
getYmult() corrects for the aspect ratio of the plot at the time it is drawn, so save the plot at the current device size and the pies should look round.
It's an old question, but to be complete, you can use add.pie() from the mapplots package.

Grouping extreme value bins into one "> x" bin

Does a function / method already exist to determine the frequency of data greater than some value? Similar to the Excel frequency distribution, I would like to group extreme values into the last bin (e.g., >120 as in image). I have been doing this manually by first using the hist function and then summing the counts for breaks greater than a given value.
Here's one option:
d <- rlnorm(1000, 3)
d.cut <- cut(d, c(seq(0, 120, 10), Inf))
hist(as.numeric(d.cut), breaks=0:13, xaxt='n', xlab='',
col=1, border=0, main='', cex.axis=0.8, las=1)
axis(1, at=0:13, labels=c(seq(0, 120, 10), '>120'), cex.axis=0.8)
box()
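If only the frequencies are needed, rather than the plot, the same cut can be tabulated directly. This is a minimal sketch; the seed and the label strings are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
d <- rlnorm(1000, 3)

# Regular bins of width 10 up to 120, plus one open-ended bin for the tail
breaks <- c(seq(0, 120, 10), Inf)
labels <- c(paste(seq(0, 110, 10), seq(10, 120, 10), sep = "-"), ">120")
counts <- table(cut(d, breaks, labels = labels))

# Frequency of the grouped tail, and the same number computed directly
tail_count <- as.integer(counts[">120"])
tail_count == sum(d > 120)
```

barplot(counts, las = 2) would then draw the grouped distribution with the labels already attached.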

R - Customizing X Axis Values in Histogram

I want to change the values on the x axis in my histogram in R.
The computer currently has it set as
0, 20, 40, 60, 80, 100.
I want the x axis to go by 10 as in:
0,10,20,30,40,50,60,70,80,90,100.
I know to get rid of the current axis I have to do this
(hist(x), .... xaxt = 'n')
and then
axis(side = 1) .....
But how do I get it to show the numbers that I need it to show?
Thanks.
The answer is right there in ?axis...
dat <- sample(100, 1000, replace=TRUE)
hist(dat, xaxt='n')
axis(side=1, at=seq(0, 100, 10), labels=seq(0, 100, 10))

Reverse Statistics with R

What I want to do sounds simple. I want to plot a normal IQ curve with R with a mean of 100 and a standard deviation of 15. Then, I'd like to be able to overlay a scatter plot of data on top of it.
Anybody know how to do this?
I'm guessing what you want to do is this: you want to plot the model normal density with mean 100 and sd = 15, and you want to overlay on top of that the empirical density of some set of observations that purportedly follow the model normal density, so that you can visualize how well the model density fits the empirical density. The code below should do this (here, x would be the vector of actual observations but for illustration purposes I'm generating it with a mixed normal distribution N(100,15) + 15*N(0,1), i.e. the purported N(100,15) distribution plus noise).
require(ggplot2)
x <- round( rnorm( 1000, 100, 15 )) + rnorm(1000)*15
dens.x <- density(x)
empir.df <- data.frame( type = 'empir', x = dens.x$x, density = dens.x$y )
norm.df <- data.frame( type = 'normal', x = 50:150, density = dnorm(50:150,100,15))
df <- rbind(empir.df, norm.df)
m <- ggplot(data = df, aes(x,density))
m + geom_line( aes(linetype = type, colour = type))
Well, it's more like a histogram, since I think you are expecting these to be more like an integer rounded process:
x<-round(rnorm(1000, 100, 15))
y<-table(x)
plot(y)
par(new=TRUE)
plot(density(x), xlim=range(x), main="", yaxt="n", ylab="", xlab="", xaxt="n")
If you want the theoretic value of dnorm superimposed, then use one of these:
lines(sort(x), dnorm(sort(x), 100, 15), col="red")
or
points(x, dnorm(x, 100, 15))
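Another way to get the overlay, sketched here as an alternative and not taken from the answers above, is to draw the histogram on the density scale with freq = FALSE so that it shares a y-axis with dnorm. The seed, the variable name iq, and the bin count are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
iq <- round(rnorm(1000, mean = 100, sd = 15))

# freq = FALSE scales the bars so that their total area is 1
h <- hist(iq, breaks = 30, freq = FALSE, col = "grey90", border = "white",
          main = "Simulated IQ scores", xlab = "IQ")

# Theoretical N(100, 15) density drawn on the same axes
curve(dnorm(x, mean = 100, sd = 15), add = TRUE, col = "red", lwd = 2)
```

With both layers on the density scale, no second axis or par(new=TRUE) is needed.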
You can generate IQ scores PDF with:
curve(dnorm(x, 100, 15), 50, 150)
But why would you like to overlay scatter over density curve? IMHO, that's very unusual...
In addition to the other good answers, you might be interested in plotting a number of panels, each with its own graph. Something like this.

How can I create raster plots with the same colour scale in R

I'm creating some maps from raster files using the "raster" package in R. I'd like to create comparison rasters, showing several maps side by side. It's important for this that the colour scales used are the same for all maps, regardless of the values in each map. For example, if map 1 has values from 0-1, and map 2 has values from 0-0.5, cells with a value of 0.5 should have the same colour on both maps.
For example:
map 1 has values from 0 to 1
map 2 has values from 0 to 0.5
the colour goes from red (lowest) to green (highest)
I would like a value of 0.5 to have the same colour in both maps (i.e. yellow, as halfway between red and green). The current behaviour is that it is yellow in map 1, and green in map 2.
I can't find a way to make this work. I can't see any way to set the range of pixel values to use with the plotting function. setMinMax() doesn't help (as 'plot' always calculates the values). Even trying to set the values by hand (e.g. g1@data@max <- 10) doesn't work (these are ignored when plotting).
Finally, making a stack of the maps (which might be expected to plot everything on the same colour scale) doesn't work either - each map still has its own colour scale.
Any thoughts on how to do this?
EDIT:
The solution I ended up using is:
plot( d, col=rev( rainbow( 99, start=0,end=1 ) ), breaks=seq(min(minValue( d )),max(maxValue(d)),length.out=100) )
Easy solution now is to use the zlim option.
plot( d, col=rev( rainbow( 99, start=0,end=1 ) ),zlim=c(0,1) )
Since the documentation for raster's image method says that the arguments of base image (graphics::image) can be passed on (which suggests that base image does the drawing), wouldn't you just specify the same col= and breaks= arguments in every call? You do need to keep the breaks and col arguments in sync: the number of colors must be one less than the number of breaks. The example below is based on the classic volcano data; the second version shows how a range of values can be excluded from an image:
x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))
image(x, y, volcano, col = terrain.colors( length(seq(90, 200, by = 5))-1), axes = FALSE, breaks= seq(90, 200, by = 5) )
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)
x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))
image(x, y, volcano, col = terrain.colors( length(seq(150, 200, by = 5))-1), axes = FALSE, breaks= seq(150, 200, by = 5) )
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano Restricted to elevations above 150", font.main = 4)
A specific example would aid this effort.
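To make the shared-scale idea concrete for the two maps in the question, here is a minimal base-R sketch using plain matrices in place of raster objects. The toy data, the seed, and the helper colourOf are assumptions made for illustration only.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
map1 <- matrix(runif(100), nrow = 10)      # values in [0, 1]
map2 <- matrix(runif(100) / 2, nrow = 10)  # values in [0, 0.5]

# One set of breaks and colours, reused for every panel;
# there is one colour fewer than there are breaks
brks <- seq(0, 1, length.out = 11)
cols <- colorRampPalette(c("red", "yellow", "green"))(length(brks) - 1)

op <- par(mfrow = c(1, 2))
image(map1, breaks = brks, col = cols, main = "map 1")
image(map2, breaks = brks, col = cols, main = "map 2")
par(op)

# Hypothetical lookup: the colour of a value depends only on the shared breaks
colourOf <- function(v) cols[findInterval(v, brks, rightmost.closed = TRUE)]
colourOf(0.45)
```

Because both calls receive identical breaks and col, a cell value of 0.45 is drawn in the same colour in either panel, whatever the range of the individual map.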
Added as an answer in response to @Tomas
The answer I ended up using is:
plot( d, col=rev( rainbow( 99, start=0,end=1 ) ),
breaks=seq(min(minValue( d )),max(maxValue(d)),length.out=100) )
There is more work to be done here in 'raster' but here is a hack:
library(raster)
r1 <- r2 <- r3 <- raster(ncol=10, nrow=10)
r1[] <- runif(ncell(r1))
r2[] <- runif(ncell(r2)) / 2
r3[] <- runif(ncell(r3)) * 1.5
r3 <- min(r3, 1)
s <- stack(r1, r2, r3)
brk <- c(0, 0.25, 0.5, 0.75, 1)
par(mfrow=c(1,3))
plot(r1, breaks=brk, col=rainbow(4), legend=FALSE)
plot(r1, breaks=brk, col=rainbow(4), legend.only=TRUE, box=FALSE)
plot(r2, breaks=brk, col=rainbow(4), legend=FALSE)
plot(r1, breaks=brk, col=rainbow(4), legend.only=TRUE, box=FALSE)
plot(r3, breaks=brk, col=rainbow(4), legend=FALSE)
plot(r1, breaks=brk, col=rainbow(4), legend.only=TRUE, box=FALSE)
You can also use the spplot function (sp package)
s <- stack(r1, r2, r3)
sp <- as(s, 'SpatialGridDataFrame')
spplot(sp)
You can also send the values to ggplot (search the r-sig-geo archives for examples)
If your RasterLayer links to a very large file, you might first do, before going to ggplot
r <- sampleRegular(r, size=100000, asRaster=TRUE)
and then perhaps
m <- as.matrix(r)
It did not work for me. I used this script to split the color scale and select the one more suitable according to my data:
plot(d, col=rev(heat.colors(8, alpha = 1)), breaks = seq(0, 0.40, by = 0.05))
A pretty simple solution that should usually work (e.g. with the "plot" function in the raster package) is to set "z axis" limits (which control the colors and the color legend).
E.g. you can do something like:
plot(d, zlim=c(0,1))
where d is a stacked raster object. Or, if you have a bunch of separate rasters d1, d2, d3, ..., you can just do:
plot(d1, zlim=c(0,1))
plot(d2, zlim=c(0,1))
plot(d3, zlim=c(0,1))
...
