I have some elevation data which I would like to associate with climatic categories in a dataset. When I try to plot it as a barplot to see the distribution of the categories along the elevation, something in ggplot's geom_bar converts the y axis scale to some weird values.
Here is the example:
# Example dataset
data_mountain_A <- data.frame(elevation=c(0,500,1000,1500,2000),
temperature=c(20,16,12,8,5),
name="A")
data_mountain_B <- data.frame(elevation=c(0,500,1000,1500,2000,2500,3000),
temperature=c(20,16,12,8,5,0,-5),
name="B")
data_merge <- rbind(data_mountain_A, data_mountain_B)
# Creates the temperature intervals
data_merge$temperature_intervals <- cut(data_merge$temperature,seq(-5,20,5))
# Fancy colors
colfunc <- colorRampPalette(c("white","light blue","dark green"))
# Plot
ggplot(data=data_merge, aes(fill=temperature_intervals, y=elevation, x=name)) +
geom_bar(stat="identity") +
scale_fill_manual(values=colfunc(5))
And here is the output I get:
Any hints on what am I doing wrong?
Thanks!
EDIT: I've found out the issue. It was considering the elevation as a range, not as a single measure. I fixed it by replacing the elevation absolute values by the length of the elevation intervals.
I've found out the issue. It was considering the elevation as a range, not as a single measure. I fixed it by replacing the elevation absolute values by the length of the elevation intervals.
# Example dataset
data_mountain_A <- data.frame(elevation=c(500,500,500,500,500),
temperature=c(20,16,12,8,5),
name="A")
data_mountain_B <- data.frame(elevation=c(500,500,500,500,500,500,500),
temperature=c(20,16,12,8,5,0,-5),
name="B")
data_merge <- rbind(data_mountain_A, data_mountain_B)
# Creates the temperature intervals
data_merge$temperature_intervals <- cut(data_merge$temperature,seq(-5,20,5))
# Fancy colors
colfunc <- colorRampPalette(c("white","light blue","dark green"))
# Plot
ggplot(data=data_merge, aes(fill=temperature_intervals, y=elevation, x=name))+geom_bar(position="stack",stat="identity")+
scale_fill_manual(values=colfunc(5))
Related
I've done the rounds here and via google without a solution, so please help if you can.
I'm looking to create something like this : painSensitivityHeatMap using ggplot2
I can create something kinda similar using geom_tile, but without the smoothing between data points ... the only solution I have found requires a lot of code and data interpolation. Not very elegant, me thinks.uglySolutionUsingTile
So I'm thinking, I could coerce the density2d plots to my purposes instead by having the plot use fixed values rather than a calculated data-point density -- much in the same way that stat='identity' can be used in histograms to make them represent data values, rather than data counts.
So a minimal working example:
df <- expand.grid(letters[1:5], LETTERS[1:5])
df$value <- sample(1:4, 25, replace=TRUE)
# A not so pretty, non-smooth tile plot
ggplot(df, aes(x=Var1, y=Var2, fill=value)) + geom_tile()
# A potentially beautiful density2d plot, except it fails :-(
ggplot(df, aes(x=Var1, y=Var2)) + geom_density2d(aes(color=..value..))
This took me a little while, but here is a solution for future reference
A solution using idw from the gstat package and spsample from the sp package.
I've written a function which takes a dataframe, number of blocks (tiles) and a low and upper anchor for the colour scale.
The function creates a polygon (a simple quadrant of 5x5) and from that creates a grid of that shape.
In my data, the location variables are ordered factors -- therefor I unclass them into numbers (1-to-5 corresponding to the polygon-grid) and convert them to coordinates -- thus converting the tmpDF from a datafra to a spatial dataframe. Note: there are no overlapping/duplicate locations -- i.e 25 observations corresponding to the 5x5 grid.
The idw function fills in the polygon-grid (newdata) with inverse-distance weighted values ... in other words, it interpolates my data to the full polygon grid of a given number of tiles ('blocks').
Finally I create a ggplot based on a color gradient from the colorRamps package
painMapLumbar <- function(tmpDF, blocks=2500, lowLimit=min(tmpDF$value), highLimit=max(tmpDF$value)) {
# Create polygon to represent the lower back (lumbar)
poly <- Polygon(matrix(c(0.5, 0.5,0.5, 5.5,5.5, 5.5,5.5, 0.5,0.5, 0.5), ncol=2, byrow=TRUE))
# Create a grid of datapoints from the polygon
polyGrid <- spsample(poly, n=blocks, type="regular")
# Filter out the data for the figure we want
tmpDF <- tmpDF %>% mutate(x=unclass(x)) %>% mutate(y=unclass(y))
tmpDF <- tmpDF %>% filter(y<6) # Lumbar region only
coordinates(tmpDF) <- ~x+y
# Interpolate the data as Inverse Distance Weighted
invDistanceWeigthed <- as.data.frame(idw(formula = value ~ 1, locations = tmpDF, newdata = polyGrid))
p <- ggplot(invDistanceWeigthed, aes(x=x1, y=x2, fill=var1.pred)) + geom_tile() + scale_fill_gradientn(colours=matlab.like2(100), limits=c(lowLimit,highLimit))
return(p)
}
I hope this is useful to someone ... thanks for the replies above ... they helped me move on.
I wish to create something similar to the following types of discontinuous heat map in R:
My data is arranged as follows:
k_e percent time
.. .. ..
.. .. ..
I wish k_e to be x-axis, percent on y-axis and time to denote the color.
All links I could find plotted a continuous matrix http://www.r-bloggers.com/ggheat-a-ggplot2-style-heatmap-function/ or interpolated. But I wish neither of the aforementioned, I want to plot discontinuous heatmap as in the images above.
The second one is a hexbin plot
If your (x,y) pairs are unique, you can do an x y plot, if that's what you want, you can try using base R plot functions:
x <- runif(100)
y<-runif(100)
time<-runif(100)
pal <- colorRampPalette(c('white','black'))
#cut creates 10 breaks and classify all the values in the time vector in
#one of the breaks, each of these values is then indexed by a color in the
#pal colorRampPalette.
cols <- pal(10)[as.numeric(cut(time,breaks = 10))]
#plot(x,y) creates the plot, pch sets the symbol to use and col the color
of the points
plot(x,y,pch=19,col = cols)
With ggplot, you can also try:
library(ggplot2)
qplot(x,y,color=time)
Generate data
d <- data.frame(x=runif(100),y=runif(100),w=runif(100))
Using ggplot2
require(ggplot2)
Sample Count
The following code produces a discontinuous heatmap where color represents the number of items falling into a bin:
ggplot(d,aes(x=x,y=y)) + stat_bin2d(bins=10)
Average weight
The following code creates a discontinuous heatmap where color represents the average value of variable w for all samples inside the current bin.
ggplot(d,aes(x=x,y=y,z=w)) + stat_summary2d(bins=10)
I'm using R to read and plot data from NetCDF files (ncdf4). I've started using R only recently thus I'm very confused, I beg your pardon.
Let's say from the files I obtain N 2-D matrixes of numerical values, each with different dimensions and many NA values.
I have to histogram these values in the same plot, with bins of given width and within given limits, the same for every matrix.
For just one matrix, I can do this:
library(ncdf4)
library(ggplot2)
file0 <- nc_open("test.nc")
#Read a variable
prec0 <- ncvar_get(file0,"pr")
#Some settings
min_plot=0
max_plot=30
bin_width=2
xlabel="mm/day"
ylabel="PDF"
title="Precipitation"
#Get maximum of array, exclude NAs
maximum_prec0=max(prec0, na.rm=TRUE)
#Store the histogram
histo_prec0 <- hist(prec0, xlim=c(min_plot,max_plot), right=FALSE, breaks=seq(0,ceiling(maximum_prec0),by=bin_width))
#Plot the histogram densities using points instead of bars, which is what we want
qplot(histo_prec0$mids, histo_prec0$density, xlim=c(min_plot,max_plot), color=I("yellow"), xlab=xlabel, ylab=ylabel, main=title, log="y")
#If necessary, can transform matrix to vector using
#vector_prec0 <- c(prec0)
However it occurs to me that it would be best to use a DataFrame for plotting multiple matrixes. I'm not certain of that nor on how to do it. This would also allow for automatic legends and all the advantages that come from using dataframes with ggplot2.
What I want to achieve is something akin to this:
https://copy.com/thumbs_public/j86WLyOWRs4N1VTi/scatter_histo.jpg?size=1024
Where on Y we have the Density and on X the bins.
Thanks in advance.
To be honest, it is unclear what you are after (scatter plot or histogram of data with values as points?).
Here are a couple of examples using ggplot which might fit your goals (based on your last sentence: "Where on Y we have the Density and on X the bins"):
# some data
nsample<- 200
d1<- rnorm(nsample,1,0.5)
d2<- rnorm(nsample,2,0.6)
#transformed into histogram bins and collected in a data frame
hist.d1<- hist(d1)
hist.d2<- hist(d2)
data.d1<- data.frame(hist.d1$mids, hist.d1$density, rep(1,length(hist.d1$density)))
data.d2<- data.frame(hist.d2$mids, hist.d2$density, rep(2,length(hist.d2$density)))
colnames(data.d1)<- c("bin","den","group")
colnames(data.d2)<- c("bin","den","group")
ddata<- rbind(data.d1,data.d2)
ddata$group<- factor(ddata$group)
# plot
plots<- ggplot(data=ddata, aes(x=bin, y=den, group=group)) +
geom_point(aes(color=group)) +
geom_line(aes(color=group)) #optional
print(plots)
However, you could also produce smooth density plots (or histograms) directly in ggplot:
ddata2<- cbind(c(rep(1,nsample),rep(2,nsample)),c(d1,d2))
ddata2<- as.data.frame(ddata2)
colnames(ddata2)<- c("group","value")
ddata2$group<- factor(ddata2$group)
plots2<- ggplot(data=ddata2, aes(x=value, group=group)) +
geom_density(aes(color=group))
# geom_histogram(aes(color=group, fill=group)) # for histogram instead
windows()
print(plots2)
How can I add text to points rendered with geom_jittered to label them? geom_text will not work because I don't know the coordinates of the jittered dots. Could you capture the position of the jittered points so I can pass to geom_text?
My practical usage would be to plot a boxplot with the geom_jitter over it to show the data distribution and I would like to label the outliers dots or the ones that match certain condition (for example the lower 10% for the values used for color the plots).
One solution would be to capture the xy positions of the jittered plots and use it later in another layer, is that possible?
[update]
From Joran answer, a solution would be to calculate the jittered values with the jitter function from the base package, add them to a data frame and use them with geom_point. For filtering he used ddply to have a filter column (a logic vector) and use it for subsetting the data in geom_text.
He asked for a minimal dataset. I just modified his example (a unique identifier in the label colum)
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=paste('id_',1:300,sep=''))
This is the result of joran example with my data and lowering the display of ids to the lowest 1%
And this is a modification of the code to have colors by another variable and displaying some values of this variable (the lowest 1% for each group):
library("ggplot2")
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=paste('id_',1:300,sep=''),quality= rnorm(300))
#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))
#Create an indicator variable that picks out those
# obs that are in lowest 1% by x
datJit <- ddply(datJit,.(x),.fun=function(g){
g$grp <- g$y <= quantile(g$y,0.01);
g$top_q <- g$qual <= quantile(g$qual,0.01);
g})
#Create a boxplot, overlay the jittered points and
# label the bottom 1% points
ggplot(dat,aes(x=x,y=y)) +
geom_boxplot() +
geom_point(data=datJit,aes(x=xj,colour=quality)) +
geom_text(data=subset(datJit,grp),aes(x=xj,label=lab)) +
geom_text(data=subset(datJit,top_q),aes(x=xj,label=sprintf("%0.2f",quality)))
Your question isn't completely clear; for example, you mention labeling points at one point but also mention coloring points, so I'm not sure which you really mean, or perhaps both. A reproducible example would be very helpful. But using a little guesswork on my part, the following code does what I think you're describing:
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
lab=rep('label',300))
#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))
#Create an indicator variable that picks out those
# obs that are in lowest 10% by x
datJit <- ddply(datJit,.(x),.fun=function(g){
g$grp <- g$y <= quantile(g$y,0.1); g})
#Create a boxplot, overlay the jittered points and
# label the bottom 10% points
ggplot(dat,aes(x=x,y=y)) +
geom_boxplot() +
geom_point(data=datJit,aes(x=xj)) +
geom_text(data=subset(datJit,grp),aes(x=xj,label=lab))
Just an addition to Joran's wonderful solution:
I ran into trouble with the x-axis positioning when I tried to use in a facetted plot using facet_wrap(). The problem is, that ggplot2 uses 1 as the x-value on every facet. The solution is to create a vector of jittered 1s:
datJit$xj <- jitter(rep(1,length(dat$x)),amount=0.1)
I saw Boxplot in R showing the mean
I'm interested in the ggplot solution. But what I am plotting are averages already so I don't want to do an average of an average. I do have the true mean stored in TrueAvgCPC.
Here is what I tried, but it's not working:
p <- qplot(Mydf$Network,Mydf$Avg.CPC,data=Mydf,geom='boxplot')
p <- p+stat_summary(TrueAvgCPC,shape=1,col='red',geom='point')
print(p)
Thanks!
As far as I see, you want to just add a true mean (or several?) to the box plot. If you have the value(s), why use stat_summary instead of just plotting the points?
#sample data
x <- rnorm(30)
y <- rep(letters[1:3],10)
TrueAVGCPC <- c(0.34,0.1,0.44)
#plot
p <- qplot(y,x,geom='boxplot')
p <- p+geom_point(aes(x=c(1,2,3),y=TrueAVGCPC),col="red")
print(p)