Getting geom_tile to draw square rather than rectangular cells - r

I'm trying to generate a heatmap plot using ggplot's geom_tile. My data have far more rows than columns.
set.seed(1)
df <- data.frame(val=rnorm(100),gene=rep(letters[1:20],5),cell=c(sapply(LETTERS[1:5],function(l) rep(l,20))))
Running:
library(ggplot2)
ggplot(df,aes(y=gene,x=cell,fill=val))+geom_tile(color="white")
produces:
How do I get the heatmap cells to be squares instead of rectangles (height = width) without distorting the dimensions of the figure?

One option is to add coord_equal(). The default, ratio = 1, ensures that one unit on the x-axis is the same length as one unit on the y-axis:
ggplot(df, aes(y = gene, x = cell, fill = val)) +
geom_tile(color = "white") +
coord_equal()

Alternatively, tweak the ratio with coord_fixed(); ratio is the length of one y-axis unit relative to one x-axis unit:
set.seed(1)
df <- data.frame(val=rnorm(100),gene=rep(letters[1:20],5),
cell=c(sapply(LETTERS[1:5],function(l) rep(l,20))))
library(ggplot2)
p <- ggplot(df,aes(y=gene,x=cell,fill=val))+geom_tile(color="white")
p <- p + coord_fixed(ratio = 0.7)
p
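As a minimal sketch of what the ratio argument does for this data: each discrete level occupies one unit on its axis, so ratio = 1 should give square tiles and a panel whose height-to-width ratio equals the number of genes over the number of cells. The names p_square and panel_aspect are illustrative only and are not part of the original answers.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(val  = rnorm(100),
                 gene = rep(letters[1:20], 5),
                 cell = rep(LETTERS[1:5], each = 20))

# ratio = (length of one y unit) / (length of one x unit);
# ratio = 1 gives square tiles, ratio = 0.7 gives tiles that are
# 0.7 times as tall as they are wide.
p_square <- ggplot(df, aes(x = cell, y = gene, fill = val)) +
  geom_tile(color = "white") +
  coord_fixed(ratio = 1)

# With square tiles the panel is n_cols tiles wide and n_rows tiles tall,
# so a device with roughly this height/width ratio wastes little space.
n_cols <- length(unique(df$cell))
n_rows <- length(unique(df$gene))
panel_aspect <- n_rows / n_cols
```

Choosing figure dimensions close to panel_aspect when saving the plot is one way to keep the square tiles without large empty margins.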

Related

How to clip an interpolated layer in R so it does not extend past data boundaries

I am trying to display a cross-section of conductivity in a lagoon environment using isolines. I have applied interp() and stat_contour() to my data, but I would like to clip the interpolated output so that it doesn't extend past my data points. This way the bathymetry of the lagoon in the cross-section is clear. Here is the code I have used so far:
cond_df <- read_csv("salinity_profile.csv")
di <- interp(cond_df$stop, cond_df$depth, cond_df$conductivity,
xo = seq(min(cond_df$stop), max(cond_df$stop), length = 200),
yo = seq(min(cond_df$depth), max(cond_df$depth), length = 200))
dat_interp <- data.frame(expand.grid(x=di$x, y=di$y), z=c(di$z))
ggplot(dat_interp) +
aes(x=x, y=y, z=z, fill=z)+
scale_y_reverse() +
geom_tile()+
stat_contour(colour="white", size=0.25) +
scale_fill_viridis_c() +
theme_tufte(base_family="Helvetica")
Here is the output:
[Image: interpolated plot]
To help clarify, here is the data just as a geom_point() graph, and I do not want the interpolated layer going past the lower points of the graph:
cond_df%>%
ggplot(mapping=aes(x=stop, y=depth, z=conductivity, fill=conductivity)) +
geom_point(aes(colour = conductivity), size = 3) +
scale_y_reverse()
[Image: point plot]
You can mask the unwanted region of the plot by using geom_ribbon.
You will need to generate a data.frame with values for the max depth at each stop. Here's one somewhat inelegant way to do that:
# Create the empty data frame for all stops
bathymetry <- data.frame(depth = as.numeric(NA),
stop = unique(cond_df$stop))
# Find the max depth for each stop
for(thisStop in bathymetry$stop){
bathymetry[bathymetry$stop==thisStop, "depth"] <- max(cond_df[cond_df$stop==thisStop, "depth"])
}
Then, you can add the geom_ribbon as the last geom of your plot, like so
geom_ribbon(data=bathymetry, aes(x=stop, ymin=depth, ymax=max(cond_df$depth)), inherit.aes = FALSE)
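The loop above can probably be replaced by a single aggregate() call. The sketch below uses a small synthetic stand-in for cond_df (the values are made up, not the asker's data), and the bottom column and the grey fill are illustrative choices.

```r
library(ggplot2)

# Synthetic stand-in for cond_df (hypothetical values)
cond_df <- data.frame(
  stop  = rep(1:4, times = c(3, 5, 4, 2)),
  depth = c(1:3, 1:5, 1:4, 1:2)
)
cond_df$conductivity <- 30 + cond_df$depth

# Deepest sample at each stop, computed without an explicit loop
bathymetry <- aggregate(depth ~ stop, data = cond_df, FUN = max)
# Lower edge of the mask: the deepest point in the whole profile
bathymetry$bottom <- max(cond_df$depth)

p <- ggplot(cond_df, aes(x = stop, y = depth)) +
  geom_point(aes(colour = conductivity), size = 3) +
  # Mask everything between each stop's maximum depth and the overall bottom
  geom_ribbon(data = bathymetry,
              aes(x = stop, ymin = depth, ymax = bottom),
              inherit.aes = FALSE, fill = "grey30") +
  scale_y_reverse()
```

The same geom_ribbon() layer can be added as the last layer of the interpolated plot so that it covers the tiles below the lagoon floor.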

Weird behavior of ggplot combined with fill and scale_y_log10()

I'm trying to produce a histogram with ggplot's geom_histogram which colors the bars according to a gradient and shows the counts on a log10 scale.
Here's the code:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
bins <- 10
cols <- c("darkblue","darkred")
colGradient <- colorRampPalette(cols)
cut.cols <- colGradient(bins)
df$cut <- cut(df$val,bins)
df$cut <- factor(df$cut,level=unique(df$cut))
Then,
ggplot(data=df,aes_string(x="val",y="..count..+1",fill="cut"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(df$cut))+
scale_fill_manual(values=cut.cols,labels=levels(df$cut))+
scale_y_log10()
gives:
whereas dropping the fill from the aesthetics:
ggplot(data=df,aes_string(x="val",y="..count..+1"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(df$cut))+
scale_fill_manual(values=cut.cols,labels=levels(df$cut))+
scale_y_log10()
gives:
Any idea why the histogram bars differ between the two plots, and how to make the first one look like the second?
The OP is trying to produce a histogram with ggplot's geom_histogram which colors the bars according to a gradient...
The OP has already done the binning (with 10 bins) but is then calling geom_histogram(), which does its own binning using 30 bins by default (see ?geom_histogram).
When geom_bar() is used instead, together with cut instead of val:
ggplot(data = df, aes_string(x = "cut", y = "..count..+1", fill = "cut")) +
geom_bar(show.legend = FALSE) +
scale_color_manual(values = cut.cols, labels = levels(df$cut)) +
scale_fill_manual(values = cut.cols, labels = levels(df$cut)) +
scale_y_log10()
the chart becomes:
Using geom_histogram() with filled bars is less straightforward, as can be seen in the answers to the question "How to fill histogram with color gradient?"
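The mismatch between the two binnings can be checked directly. This is a sketch that assumes ggplot2 3.3.0 or later (for after_stat()) and keeps the natural level order of cut(); the names n_hist_bins, n_cut_bins, p_hist and p_bar are illustrative.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(val = rnorm(1000))
df$cut <- cut(df$val, 10)   # pre-binned into 10 intervals

# geom_histogram() bins val again, independently of df$cut
p_hist <- ggplot(df, aes(x = val)) + geom_histogram(bins = 30)
n_hist_bins <- nrow(layer_data(p_hist, 1))  # bins computed by the stat
n_cut_bins  <- nlevels(df$cut)              # bins computed by cut()

# geom_bar() counts the existing intervals, so the fill lines up with the bars
p_bar <- ggplot(df, aes(x = cut, y = after_stat(count + 1), fill = cut)) +
  geom_bar(show.legend = FALSE) +
  scale_fill_manual(
    values = colorRampPalette(c("darkblue", "darkred"))(n_cut_bins)
  ) +
  scale_y_log10()
```

Because the fill groups follow the 10 intervals while the histogram bars follow 30 separate bins, the two do not coincide in the filled histogram; counting the pre-binned factor avoids the second binning step.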

Line up columns of bar graph with points of line plot with ggplot

Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: to align the plots, use gtable (or see the answers to your earlier question).
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit: To change the heights of the plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot the geoms cover the x-axis from about 0.5 to 27.5, while in the other plot the data only range from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect. The y-axis labels take up a little more space in the upper plot, because the numbers have two digits. The plot looks as follows:
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows:
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())
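The reason given above for the misalignment (bars have a width, points do not) can be checked by inspecting the data ggplot2 computes for each layer. This is only a sketch; bar_range and point_range are illustrative names.

```r
library(ggplot2)

data <- data.frame(x = rep(1:27, each = 5), y = rep(1:5, times = 27))
other_data <- data.frame(x = 1:27, y = 50:76)

no  <- ggplot(other_data, aes(x = x, y = y)) + geom_bar(stat = "identity")
yes <- ggplot(data, aes(x = x, y = y)) + geom_point() + geom_line()

# Bars extend half a bar width beyond the first and last x value
# (with the default width of 0.9 this is 0.55 and 27.45)
bars <- layer_data(no, 1)
bar_range <- c(min(bars$xmin), max(bars$xmax))

# Points have no width, so they span exactly the data range
pts <- layer_data(yes, 1)
point_range <- range(pts$x)
```

Since bar_range is wider than point_range, the default axis limits of the two plots differ, which is what the explicit limits and the faceting approach both correct.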

Making adjustments to a forest plot using ggplot2

I'm trying to create a forest plot in R from meta-analysis results. However, I'm having difficulties adjusting the line thickness & the center points as well as getting rid of the automatic legend and creating my own legend.
#d is a data frame with 4 columns
#d$x gives variable names
#d$y gives center point
#d$ylo gives lower limits
#d$yhi gives upper limits
#data
d <- data.frame(x = toupper(letters[1:10]),
y = rnorm(10, 0, 0.1))
d <- transform(d, ylo = y-1/10, yhi=y+1/10)
d$x <- factor(d$x, levels=rev(d$x)) # Reverse the ordering to match the order used in the function
credplot.gg <- function(d){
require(ggplot2)
p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi,group=x,colour=x))+
geom_pointrange()+ theme_bw()+ coord_flip()+
guides(color=guide_legend(title="Cohort"))+
geom_hline(yintercept=0, colour = 'red', lty=1)+
xlab('Cohort') + ylab('Beta') + ggtitle('rs6467890_CACNA2D1')
return(p)
}
credplot.gg(d)
The issues that I'm having are:
When I insert "size" into ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi, group=x,colour=x), size=1.5), the lines and points are extremely large.
How do I get rid of the legend that is automatically generated with the plot, and how do I create my own legend?
I'm fairly new to R, so any help is greatly appreciated.
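No answer is recorded here, so the following is only a sketch of one possible approach, under the assumption that a fixed size is wanted: constant settings such as size go in the geom call outside aes(), and show.legend = FALSE suppresses the automatic legend for that layer. The value 0.4 is an arbitrary illustration, and building a custom legend is not covered.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = toupper(letters[1:10]), y = rnorm(10, 0, 0.1))
d <- transform(d, ylo = y - 1/10, yhi = y + 1/10)
d$x <- factor(d$x, levels = rev(d$x))

p <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi, colour = x)) +
  # size is a fixed setting, so it is given outside aes();
  # show.legend = FALSE drops the automatic colour legend
  geom_pointrange(size = 0.4, show.legend = FALSE) +
  geom_hline(yintercept = 0, colour = "red") +
  coord_flip() +
  theme_bw() +
  xlab("Cohort") + ylab("Beta")
```

Smaller or larger values of size can be tried until the points and lines look right.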

Insert color into scale fill gradient at specified location / value in ggplot2

I've created a plot in ggplot2 and then "on top" of that plot I have used geom_rect() to visualize (in red) the data points with the lowest 10% of values of z. I would still like to use scale_fill_gradientn to fill the legend/scale bar, but I would like to insert the color red at the minimum of the fill (i.e. that bottom of the vertical scale). How can I use scale_fill_gradientn and do this? Or, how can I achieve the desired result using another method?
# The example code here produces a plot for illustrative purposes only.
# create data frame, from ggplot2 documentation
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- runif(nrow(df))
# select min. 10%
df.1 <- df[df$z <= quantile(df$z, 0.1),]
#plot
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
scale_fill_gradientn(colours=topo.colors(7),na.value = "transparent") +
geom_rect(data=df.1, size=1, fill="red", colour=NA , aes(xmin=x-.5, xmax=x+.5, ymin=y-.5, ymax=y+.5))
One solution would be to provide "red" as one of the colors in scale_fill_gradientn(). Red is used twice to ensure that all values in the range from 0 to 0.1 get "red"; then, with seq(), the remaining colors are distributed evenly over the range from 0.1001 to 1. Note that values refers to positions on the fill scale after it has been rescaled to the range 0 to 1. In this case you don't need a second data frame.
set.seed(123)
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- runif(nrow(df))
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
# one position per colour: 2 reds + 6 topo colours = 8 values
scale_fill_gradientn(colours=c("red","red",topo.colors(6)),
values=c(0,0.1,seq(0.1001,1,length.out=6)),na.value = "transparent")
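If the red band should cover exactly the lowest 10% of the data rather than the lowest tenth of the scale, the break position can be derived from the quantile. This is a sketch that assumes the fill scale keeps its default limits (the range of z); q10, cutoff, cols and vals are illustrative names.

```r
library(ggplot2)

set.seed(123)
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- runif(nrow(df))

# Position of the 10% quantile on the fill scale, rescaled to [0, 1]
q10 <- unname(quantile(df$z, 0.1))
cutoff <- (q10 - min(df$z)) / (max(df$z) - min(df$z))

# One position per colour: red from 0 to cutoff, then the topo colours
cols <- c("red", "red", topo.colors(6))
vals <- c(0, cutoff, seq(cutoff + 1e-4, 1, length.out = 6))

p <- ggplot(df, aes(x, y, fill = z)) +
  geom_raster() +
  scale_fill_gradientn(colours = cols, values = vals,
                       na.value = "transparent")
```

Keeping cols and vals the same length makes the pairing between each colour and its position explicit.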
