Discontinuous heatmap in R

I want to create something similar to the following types of discontinuous heatmap in R:
My data is arranged as follows:
k_e percent time
.. .. ..
.. .. ..
I want k_e on the x-axis, percent on the y-axis, and time to determine the color.
All the links I could find either plotted a continuous matrix (http://www.r-bloggers.com/ggheat-a-ggplot2-style-heatmap-function/) or interpolated the data. I want neither: I want to plot a discontinuous heatmap like the ones in the images above.

The second one is a hexbin plot.
If your (x, y) pairs are unique, you can simply draw an x-y scatter plot. If that's what you want, try the base R plotting functions:
x <- runif(100)
y <- runif(100)
time <- runif(100)
pal <- colorRampPalette(c('white', 'black'))
# cut() splits the range of time into 10 equal-width intervals and assigns
# each value to one of them; each interval is then indexed to a colour
# from the pal colorRampPalette.
cols <- pal(10)[as.numeric(cut(time, breaks = 10))]
# plot(x, y) creates the plot; pch sets the point symbol and col the colour
# of the points
plot(x, y, pch = 19, col = cols)
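Base plot() does not draw a colour key on its own. A minimal sketch of adding one with legend(), reusing the same cut() bins so each legend entry matches a point colour; the seed, legend position and text size are arbitrary choices, not from the original answer:

```r
# Reproducible random data (the seed is an arbitrary choice)
set.seed(42)
x <- runif(100)
y <- runif(100)
time <- runif(100)

# White-to-black palette with one colour per time bin
pal <- colorRampPalette(c("white", "black"))
n_bins <- 10
time_bins <- cut(time, breaks = n_bins)   # factor with 10 interval levels
cols <- pal(n_bins)[as.integer(time_bins)]

plot(x, y, pch = 19, col = cols)
# Legend mapping each time interval to its colour
legend("topright", legend = levels(time_bins), col = pal(n_bins),
       pch = 19, cex = 0.7, title = "time")
```

Points in the lowest bin are drawn in white, so on a white background you may prefer to start the palette at a light grey.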
With ggplot2, you can also try:
library(ggplot2)
ggplot(data.frame(x, y, time), aes(x = x, y = y, colour = time)) + geom_point()

Generate data
d <- data.frame(x=runif(100),y=runif(100),w=runif(100))
Using ggplot2
library(ggplot2)
Sample Count
The following code produces a discontinuous heatmap where color represents the number of items falling into a bin:
ggplot(d, aes(x = x, y = y)) + stat_bin_2d(bins = 10)
Average weight
The following code creates a discontinuous heatmap where color represents the average value of w over all samples that fall in each bin:
ggplot(d, aes(x = x, y = y, z = w)) + stat_summary_2d(bins = 10)
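Applied to the question's own columns (k_e on x, percent on y, time as colour), one hedged option when k_e and percent lie on a regular grid is geom_tile(), which draws one discrete rectangle per observation with no interpolation. The data frame below is made up for illustration; for scattered (non-gridded) k_e and percent values, the binned stat_summary_2d() approach shown above is the better fit:

```r
library(ggplot2)

# Hypothetical example data with the question's column names; k_e and
# percent lie on a regular grid, so each (k_e, percent) pair is one cell
set.seed(1)
df <- expand.grid(k_e = 1:10, percent = seq(10, 100, by = 10))
df$time <- runif(nrow(df))

# One rectangular tile per observation, filled by time
p <- ggplot(df, aes(x = k_e, y = percent, fill = time)) +
  geom_tile()
print(p)

# If k_e and percent are scattered rather than gridded, bin them and
# colour each bin by the mean time of the points that fall in it
p_binned <- ggplot(df, aes(x = k_e, y = percent, z = time)) +
  stat_summary_2d(bins = 10)
```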

Related

R PCA: with the fviz_pca_ind function, can we show two categorical variables, one as the point shape and one as the fill color?

I am trying to make a PCA plot of individuals
- where one categorical variable (A) is represented by the point shape (e.g. one group as circles, a second as squares, etc.)
- and a second categorical variable (B) by the color inside the point.
Is that possible?
Which code would you use?
I don't think you can modify the output of fviz_pca_ind() this way, so you would need to extract the data from the result and plot it again with ggplot2:
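library(factoextra)
library(ggplot2)
set.seed(1)  # make the random grouping B reproducible
data <- iris
colnames(data)[5] <- "A"
# a second, random two-level grouping variable B
data$B <- sample(letters[1:2], nrow(data), replace = TRUE)
res.pca <- prcomp(data[, 1:4], scale. = TRUE)
basic_plot <- fviz_pca_ind(res.pca, label = "none")
# basic_plot$data holds the individual coordinates (columns x and y) in the
# same row order as data, so the grouping columns can be bound alongside
ggplot(cbind(basic_plot$data, data[, c("A", "B")]),
       aes(x = x, y = y, col = A, shape = B)) + geom_point() + theme_bw()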
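The code above maps A to the outline colour and B to the shape. If you want A as the shape and B as the color *inside* the point, as the question asks, one option is the filled point shapes (codes 21 to 25), which have a separate outline (colour) and interior (fill). This sketch takes the scores straight from prcomp() instead of fviz_pca_ind(), so it needs only ggplot2; the seed, the shape codes 21/22/24 and the legend override are my own choices:

```r
library(ggplot2)

set.seed(1)
data <- iris
colnames(data)[5] <- "A"
# hypothetical second grouping variable, built as in the answer above
data$B <- sample(letters[1:2], nrow(data), replace = TRUE)

res.pca <- prcomp(data[, 1:4], scale. = TRUE)
# principal component scores for each individual
scores <- data.frame(PC1 = res.pca$x[, 1], PC2 = res.pca$x[, 2],
                     A = data$A, B = data$B)

p <- ggplot(scores, aes(x = PC1, y = PC2, shape = A, fill = B)) +
  geom_point(size = 3, colour = "black") +
  # filled shapes: 21 circle, 22 square, 24 triangle; fill paints the interior
  scale_shape_manual(values = c(21, 22, 24)) +
  # the fill legend would otherwise use a solid shape that ignores fill
  guides(fill = guide_legend(override.aes = list(shape = 21))) +
  theme_bw()
print(p)
```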

geom_bar messing with the y-axis scale

I have some elevation data that I would like to associate with climatic categories in a dataset. When I try to plot it as a bar plot to see the distribution of the categories along the elevation, something in ggplot's geom_bar turns the y-axis scale into strange values.
Here is the example:
library(ggplot2)
# Example dataset
data_mountain_A <- data.frame(elevation=c(0,500,1000,1500,2000),
temperature=c(20,16,12,8,5),
name="A")
data_mountain_B <- data.frame(elevation=c(0,500,1000,1500,2000,2500,3000),
temperature=c(20,16,12,8,5,0,-5),
name="B")
data_merge <- rbind(data_mountain_A, data_mountain_B)
# Creates the temperature intervals
data_merge$temperature_intervals <- cut(data_merge$temperature,seq(-5,20,5))
# Fancy colors
colfunc <- colorRampPalette(c("white","light blue","dark green"))
# Plot
ggplot(data=data_merge, aes(fill=temperature_intervals, y=elevation, x=name)) +
geom_bar(stat="identity") +
scale_fill_manual(values=colfunc(5))
And here is the output I get:
Any hints on what I am doing wrong?
Thanks!
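I've found the issue. With stat="identity", geom_bar stacks the values within each x category, so each bar's height was the sum of all the absolute elevations (for mountain A, 0 + 500 + 1000 + 1500 + 2000 = 5000) rather than a single elevation. I fixed it by replacing the absolute elevation values with the length of each elevation interval: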
library(ggplot2)
# Example dataset: each elevation value is now the height of one interval (500)
data_mountain_A <- data.frame(elevation=c(500,500,500,500,500),
temperature=c(20,16,12,8,5),
name="A")
data_mountain_B <- data.frame(elevation=c(500,500,500,500,500,500,500),
temperature=c(20,16,12,8,5,0,-5),
name="B")
data_merge <- rbind(data_mountain_A, data_mountain_B)
# Creates the temperature intervals; include.lowest=TRUE keeps -5 in the
# first interval instead of turning it into NA
data_merge$temperature_intervals <- cut(data_merge$temperature, seq(-5,20,5),
include.lowest=TRUE)
# Fancy colors
colfunc <- colorRampPalette(c("white","light blue","dark green"))
# Plot: stacked bars, one segment per elevation interval
ggplot(data=data_merge, aes(fill=temperature_intervals, y=elevation, x=name)) +
geom_bar(position="stack", stat="identity") +
scale_fill_manual(values=colfunc(5))
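Instead of typing the interval lengths by hand, they could be derived from the original absolute elevations. This is a sketch under a stated assumption: elevations are sorted within each mountain, each value is the bottom of a band reaching up to the next value, and the topmost band reuses the width of the band below it (which reproduces the 500s used above). geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"); the column name band is hypothetical:

```r
library(ggplot2)

# Absolute elevations as in the question
data_mountain_A <- data.frame(elevation = c(0, 500, 1000, 1500, 2000),
                              temperature = c(20, 16, 12, 8, 5), name = "A")
data_mountain_B <- data.frame(elevation = c(0, 500, 1000, 1500, 2000, 2500, 3000),
                              temperature = c(20, 16, 12, 8, 5, 0, -5), name = "B")
data_merge <- rbind(data_mountain_A, data_mountain_B)

# Assumption: sorted elevations; band width = gap to the next elevation,
# with the top band reusing the last gap
data_merge$band <- ave(data_merge$elevation, data_merge$name,
                       FUN = function(e) c(diff(e), tail(diff(e), 1)))

data_merge$temperature_intervals <- cut(data_merge$temperature, seq(-5, 20, 5),
                                        include.lowest = TRUE)
colfunc <- colorRampPalette(c("white", "light blue", "dark green"))

# geom_col() stacks the band widths within each mountain
p <- ggplot(data_merge, aes(x = name, y = band, fill = temperature_intervals)) +
  geom_col() +
  scale_fill_manual(values = colfunc(5))
print(p)
```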

Standardize Color Range For Multiple Plots

I am plotting multiple dataframes, where the color of the line is dependent on a variable in the dataframe. The problem is that for each plot, R makes the color spectrum relative to the range of each plot.
I would like the range (and corresponding colors) to be kept constant for all of the dataframes I'm using. I won't know the range of numbers in advance, though they'll all be set before plotting. In addition, there will be hundreds of values, so a manual mapping is not feasible.
As of right now, I have:
library(ggplot2)
df1 <- as.data.frame(list('x'=1:5,'y'=1:5,'colors'=6:10))
df2 <- as.data.frame(list('x'=1:5,'y'=1:5,'colors'=8:12))
qplot(data=df1,x,y,geom='line', colour=colors)
qplot(data=df2,x,y,geom='line', colour=colors)
In the first plot, the color range goes from 6 to 10.
In the second plot, the color range goes from 8 to 12.
I would like a constant range for both, from 6 to 12.
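One common approach (not taken from this thread) is to compute the combined range of the colour variable once, after all data frames are available, and pass it as limits to the colour scale of every plot. Identical limits give an identical value-to-colour mapping, and collecting the data frames in a list scales to hundreds of them. The helper structure below is a sketch, not the only way to do it:

```r
library(ggplot2)

df1 <- data.frame(x = 1:5, y = 1:5, colors = 6:10)
df2 <- data.frame(x = 1:5, y = 1:5, colors = 8:12)
dfs <- list(df1, df2)   # works the same for hundreds of data frames

# Combined range of the colour variable across every data frame
shared_limits <- range(unlist(lapply(dfs, function(d) d$colors)))

plots <- lapply(dfs, function(d) {
  ggplot(d, aes(x, y, colour = colors)) +
    geom_line() +
    # identical limits => identical value-to-colour mapping in every plot
    scale_colour_gradient(limits = shared_limits)
})
print(plots[[1]])
print(plots[[2]])
```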

different color for dots in ggplot2

I want to draw a scatter plot with R. I use ggplot2 to draw the picture:
library(ggplot2)
data<-data.frame(x=runif(50),y=runif(50))
ggplot(data, aes(x,y))+geom_point()
But I want the dots colored according to their x value: dots in each of the following x ranges should have a different color: [0,0.2), [0.2,0.4), [0.4,0.6), [0.6,0.8), [0.8,1].
There's probably a better way to do this, but here's my solution:
# what we started with
library(ggplot2)
data<-data.frame(x=runif(50),y=runif(50))
# create discretized variable z from x to determine plotted color.
# Since you wanted 5 levels, multiply by 5, take the floor, and convert
# to a factor; levels=0:4 keeps all five ranges even if one is empty
z<-factor(floor((data$x)*5), levels=0:4) # or use data[,1] instead of data$x
# add z to previous data frame and store in new variable dat
dat<-cbind(data,z)
# make pretty labels
lolim<-seq(0,0.8,0.2)
hilim<-seq(0.2,1,0.2)
lbls<-paste(lolim,'-',hilim)
# plot, changed x-axis ticks to show cutoff values;
# drop=FALSE keeps one label per level even if a level is unused
ggplot(dat,aes(x=x,y=y,color=z))+
geom_point()+
scale_color_hue(name='x',labels=lbls,drop=FALSE)+
scale_x_continuous(breaks=seq(0,1,0.2))
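Alternatively, map the colour directly to cut() on the existing plot; right = FALSE gives the [a, b) intervals from the question, and include.lowest = TRUE closes the last one at 1:
last_plot() + aes(colour = cut(x, breaks = seq(0, 1, by = 0.2), right = FALSE, include.lowest = TRUE))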
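A standalone version of the cut() approach that also picks one explicit colour per range. The colour names and the seed are arbitrary choices for illustration; drop = FALSE keeps every range in the legend even if no point falls in it:

```r
library(ggplot2)

set.seed(1)
data <- data.frame(x = runif(50), y = runif(50))

# right = FALSE gives left-closed intervals [a, b); include.lowest = TRUE then
# closes the last interval at the top, giving [0.8, 1]
data$x_range <- cut(data$x, breaks = seq(0, 1, by = 0.2),
                    right = FALSE, include.lowest = TRUE)

p <- ggplot(data, aes(x, y, colour = x_range)) +
  geom_point() +
  # one colour per interval; drop = FALSE keeps empty intervals in the legend
  scale_colour_manual(values = c("red", "orange", "darkgreen", "blue", "purple"),
                      drop = FALSE) +
  scale_x_continuous(breaks = seq(0, 1, 0.2))
print(p)
```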

adding text to ggplot geom_jitter points that match a condition

How can I add text labels to points drawn with geom_jitter? geom_text will not work because I don't know the coordinates of the jittered points. Could I capture the positions of the jittered points so I can pass them to geom_text?
My practical use case is a boxplot with geom_jitter over it to show the data distribution, where I would like to label the outlier points or the ones that match a certain condition (for example, the lowest 10% of the values used to color the points).
One solution would be to capture the x/y positions of the jittered points and use them later in another layer. Is that possible?
[update]
From Joran's answer, a solution is to calculate the jittered values with the jitter() function from base R, add them to a data frame, and use them with geom_point. For filtering, he used ddply (from plyr) to add a filter column (a logical vector) and used it to subset the data in geom_text.
He asked for a minimal dataset. I just modified his example (adding a unique identifier in the label column):
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=paste('id_',1:300,sep=''))
This is the result of Joran's example with my data, labeling only the ids in the lowest 1%.
And this is a modification of the code that colors the points by another variable and displays some values of that variable (the lowest 1% in each group):
library("ggplot2")
library("plyr")   # for ddply()
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=paste('id_',1:300,sep=''),quality= rnorm(300))
#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))
#Create indicator variables that pick out the obs in the lowest 1%
# of y (grp) and of quality (top_q) within each level of x
datJit <- ddply(datJit,.(x),.fun=function(g){
g$grp <- g$y <= quantile(g$y,0.01);
g$top_q <- g$quality <= quantile(g$quality,0.01);
g})
#Create a boxplot, overlay the jittered points and
# label the bottom 1% points
ggplot(dat,aes(x=x,y=y)) +
geom_boxplot() +
geom_point(data=datJit,aes(x=xj,colour=quality)) +
geom_text(data=subset(datJit,grp),aes(x=xj,label=lab)) +
geom_text(data=subset(datJit,top_q),aes(x=xj,label=sprintf("%0.2f",quality)))
Your question isn't completely clear; for example, you mention labeling points but also coloring them, so I'm not sure which you mean, or perhaps both. A reproducible example would be very helpful. But with a little guesswork on my part, the following code does what I think you're describing:
library(ggplot2)
library(plyr)   # for ddply()
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=rep('label',300))
#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))
#Create an indicator variable that picks out those
# obs that are in lowest 10% by x
datJit <- ddply(datJit,.(x),.fun=function(g){
g$grp <- g$y <= quantile(g$y,0.1); g})
#Create a boxplot, overlay the jittered points and
# label the bottom 10% points
ggplot(dat,aes(x=x,y=y)) +
geom_boxplot() +
geom_point(data=datJit,aes(x=xj)) +
geom_text(data=subset(datJit,grp),aes(x=xj,label=lab))
Just an addition to Joran's wonderful solution:
I ran into trouble with the x-axis positioning when I tried to use it in a faceted plot with facet_wrap(). The problem is that ggplot2 uses 1 as the x value in every facet. The solution is to create a vector of jittered 1s:
datJit$xj <- jitter(rep(1,length(dat$x)),amount=0.1)
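A newer alternative, not from this thread: since ggplot2 3.0.0, position_jitter() accepts a seed argument, so two layers that use the same jitter object with the same seed get identical offsets, and the labels land on the jittered points without precomputing xj (this also sidesteps the facet issue above). The catch is that both layers must receive the same rows in the same order, so the sketch blanks the unwanted labels with ifelse() instead of subsetting. It also flags the lowest 1% with base ave() instead of plyr; the seed, jitter width and text offsets are arbitrary choices:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rep(letters[1:3], times = 100), y = runif(300),
                  lab = paste0("id_", 1:300))

# Flag the lowest 1% of y within each level of x (base R instead of plyr)
dat$low <- ave(dat$y, dat$x, FUN = function(v) v <= quantile(v, 0.01)) == 1

# The same seed => the same horizontal offsets in every layer that uses it
jit <- position_jitter(width = 0.2, height = 0, seed = 123)

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_boxplot(outlier.shape = NA) +   # hide the boxplot's own outlier dots
  geom_point(position = jit) +
  # keep all rows (blank labels) so this layer draws the same random offsets
  geom_text(aes(label = ifelse(low, lab, "")), position = jit,
            hjust = -0.2, size = 3)
print(p)
```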
