Binning not correct? Different amount of counts


I have two vectors of values, both with the same number of entries. When these vectors are histogrammed, the two distributions show counts versus values. I'm not sure whether I am misinterpreting something or plotted something wrong, but as I understand it, the red bars should not be above the green bars everywhere. Since both vectors have the same number of entries, shouldn't one distribution be lower wherever the other is higher? Or not?
The plot command:
library(ggplot2)
library(latex2exp)  # provides TeX()

number_ticks <- function(n) {function(limits) pretty(limits, n)}
ggplot(data, aes(x = value, fill = Parameter)) +
  geom_histogram(
    binwidth = 0.25,
    color = "black",
    alpha = 0.75) +
  theme_classic() +
  theme(legend.position = c(0.21, 0.85)) +
  labs(title = "",
       x = TeX("$\\Delta U_{bias}$ / V")) +
  scale_x_continuous(breaks = number_ticks(20)) +
  guides(fill = guide_legend(title = "Parameter"))

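Currently the red histogram is drawn on top of the green one: the bars are stacked. That is because position = "stack" is the default for geom_histogram, so the top of each red bar sits at the red count plus the green count and can never be lower than the green bar, even though both vectors have the same total. You want position = "identity" instead, which draws every bar from zero (add some transparency so overlapping bars stay visible).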
For instance, compare
ggplot(diamonds, aes(price, fill = cut)) +
geom_histogram(binwidth = 500)
with
ggplot(diamonds, aes(price, fill = cut)) +
geom_histogram(binwidth = 500, position = "identity", alpha = 0.5)
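A quick way to see why the stacked plot looks wrong is to count the bins by hand. The base-R sketch below uses two simulated vectors (`green` and `red` are made-up stand-ins, not the question's data) with the same length and shared bins of width 0.25, and compares the bar tops under stacking.

```r
# Two simulated vectors with the same number of entries (stand-ins for the question's data)
set.seed(1)
green <- rnorm(200, mean = 0)
red   <- rnorm(200, mean = 0.5)

# Shared bins of width 0.25 covering both vectors
breaks <- seq(floor(min(green, red)), ceiling(max(green, red)), by = 0.25)
n_green <- hist(green, breaks = breaks, plot = FALSE)$counts
n_red   <- hist(red,   breaks = breaks, plot = FALSE)$counts

# position = "identity": every bar starts at 0, so its top is simply its own count.
# position = "stack": the red bar starts where the green bar ends.
stack_top <- n_green + n_red

sum(n_green) == sum(n_red)   # TRUE: same total number of entries
all(stack_top >= n_green)    # TRUE: a stacked top can never dip below the other group
```

Both groups have the same area, but stacking adds one group's counts to the other's, so the upper group looks taller everywhere.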

Related

ground geom_text to x axis (e.g. y =0)

I have a graph made in ggplot that looks like this:
I want the numeric label on each bar to be grounded (glued) to the x axis, i.e. placed at y <= 0.
This is the code to generate the graph as such:
ggplot(data=df) +
geom_bar(aes(x=row, y=numofpics, fill = crop, group = 1), stat='identity') +
geom_point(data=df, aes(x = df$row, y=df$numofparcels*50, group = 2), alpha = 0.25) +
geom_line(data=df, aes(x = df$row, y=df$numofparcels*50, group = 2), alpha = 0.25) +
geom_text(aes(x=row, y=numofpics, label=bbch)) +
geom_hline(yintercept=300, linetype="dashed", color = "red", size=1) +
scale_y_continuous(sec.axis= sec_axis(~./50, name="Number of Parcels")) +
scale_x_discrete(name = c(),breaks = unique(df$crop), labels = as.character(unique(df$crop)))+
labs(x=c(), y="Number of Pictures")
I've tried vjust and experimented with position_nudge for the geom_text layer, but every solution I can find shifts each label relative to its current position. As a result, everything I try leads to a situation like this one:
How can I make ggplot ground the text at the bottom, on the x axis where y <= 0, ideally also with angle = 45?
Link to dataframe = https://drive.google.com/file/d/1b-5AfBECap3TZjlpLhl1m3v74Lept2em/view?usp=sharing

As I said in the comments, just set the y-coordinate of the text to 0 or below, and specify the angle : geom_text(aes(x=row, y=-100, label=bbch), angle=45)

I'm behind a proxy server that blocks connections to Google Drive, so I can't access your data and can't test this, but I would add a new label field to the dataset that is set to 0 wherever y < 0:
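library(dplyr)
df <- df %>%
  # as.numeric() keeps both branches the same type, which older versions of if_else() require
  mutate(labelField = if_else(numofpics < 0, 0, as.numeric(numofpics)))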
I would then use this label field in my geom_text call:
geom_text(aes(x=row, y=labelField, label=bbch), angle = 45)
Hope that helps.

You can simply set a fixed y-value in geom_text (e.g. -50):
ggplot(data = df) +
  geom_bar(aes(x = row, y = numofpics, fill = crop, group = 1), stat = 'identity') +
  geom_point(aes(x = row, y = numofparcels*50, group = 2), alpha = 0.25) +
  geom_line(aes(x = row, y = numofparcels*50, group = 2), alpha = 0.25) +
  geom_text(aes(x = row, y = -50, label = bbch)) +
  geom_hline(yintercept = 300, linetype = "dashed", color = "red", size = 1) +
  scale_y_continuous(sec.axis = sec_axis(~./50, name = "Number of Parcels")) +
  scale_x_discrete(name = c(), breaks = unique(df$crop), labels = as.character(unique(df$crop))) +
  labs(x = c(), y = "Number of Pictures")
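Putting the pieces together, here is a minimal sketch with made-up data. The linked file is not reproduced here, so `df` below is a hypothetical stand-in with only the columns the labels need. A constant `y` inside `aes()` anchors every label at the same height, `angle = 45` rotates it, and `hjust = 1` right-aligns the rotated text so it ends at that point and hangs down and to the left, below the axis.

```r
library(ggplot2)

# Hypothetical data: one bar per row, with a text label per bar
df <- data.frame(row = factor(1:5),
                 numofpics = c(420, 150, 310, 60, 220),
                 bbch = c("BBCH 10", "BBCH 21", "BBCH 32", "BBCH 55", "BBCH 69"))

p <- ggplot(df, aes(x = row, y = numofpics)) +
  geom_col(fill = "grey70") +
  # Constant y: every label is anchored at y = -10, just below the x axis,
  # no matter how tall its bar is
  geom_text(aes(y = -10, label = bbch), angle = 45, hjust = 1, size = 3) +
  # Extra room below 0 so the rotated labels are not clipped
  scale_y_continuous(expand = expansion(mult = c(0.25, 0.05)))
p
```

Because `y = -10` is a mapping rather than a fixed parameter, the y scale is trained on it, so the axis range extends far enough to include the label anchors.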

How to switch y axis labels in ggplot without reversing plot?

I'm trying to plot mean values for species, but the mean values are all negative. I want the smaller (more negative) values towards the bottom of the y axis and the larger (less negative) values higher up.
I've tried changing coord_cartesian and ylim, and neither works.
ggplot(meanWUE, aes(x = Species, y = mean, fill = Species)) +
  coord_cartesian(ylim = c(-0.8, -0.7)) +
  scale_fill_manual(values = c("EUCCHR" = "darkolivegreen2", "ESCCAL" = "darkgoldenrod2", "ARTCAL" = "darkcyan", "DEIFAS" = "darkred", "ENCCAL" = "darkorchid2", "SALMEL" = "deepskyblue1", "ERIFAS" = "blue3", "BRANIG" = "azure3", "PHAPAR" = "palevioletred")) +
  scale_y_reverse() +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = .3) +
  labs(x = "Species", y = "WUE") +
  theme_bw() +
  theme(panel.grid.major = element_blank(), legend.position = "none")
I want ESCCAL and EUCCHR to be the shortest bars essentially, but currently they're being shown as the tallest.
[Figure: Species vs water use efficiency]
If I leave out scale_y_reverse(), I get a plot that looks like this second image:

One approach is to shift all the numbers to show their value over a baseline, and then adjust the labeling the same way:
df <- data.frame(Species = LETTERS[1:10],
mean = -80:-71/100)
ggplot(df, aes(x = Species, y = mean, fill = Species)) +
geom_bar(position = position_dodge(), stat="identity")
Here we shift the values to show them against a new baseline. Then we can show larger numbers as larger bars the way we'd normally expect for positive numbers. At the same time, we change the labels on the y axis so they correspond to the original values. So -0.8 becomes +0.1 vs. a baseline of -0.9. But we adjust the labels too, so that adjusted 0 has a label of -0.9, and adjusted +0.1 has a label of -0.8, its original value.
baseline <- -0.9
ggplot(df, aes(x = Species, y = mean - baseline, fill = Species)) +
geom_bar(position = position_dodge(), stat="identity") +
scale_y_continuous(breaks = 0:100*0.02,
labels = 0:100*0.02 + baseline, minor_breaks = NULL)
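Instead of hard-coding matching vectors of breaks and labels, `labels` also accepts a function, which ggplot2 calls on the breaks it has chosen. A sketch of the same shift done that way (`unshift` is a hypothetical helper name; `geom_col()` is shorthand for `geom_bar(stat = "identity")`):

```r
library(ggplot2)

# Toy data from above: every mean is negative
df <- data.frame(Species = LETTERS[1:10], mean = -80:-71/100)
baseline <- -0.9

# Label function: maps a break on the shifted axis back to the original value
unshift <- function(b) b + baseline

p <- ggplot(df, aes(x = Species, y = mean - baseline, fill = Species)) +
  geom_col() +
  # ggplot2 picks the breaks; only their labels are shifted back
  scale_y_continuous(name = "mean", labels = unshift, minor_breaks = NULL)
p

unshift(c(0, 0.1))  # -0.9 -0.8
```

The design choice is that the breaks and labels can no longer get out of sync: changing `baseline` updates both the bar heights and the axis labels.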

ggplot2 - using two different color scales for same fill in overlayed plots

A very similar question to the one asked here. However, in that case the fill parameter differs between the two plots. In my case the fill parameter is the same for both plots, but I want different color schemes.
I would like to manually change the color in the boxplots and the scatter plots (for example making the boxes white and the points colored).
Example:
library(dplyr)    # for ntile()
library(ggplot2)
n <- 4*3*10
myvalues <- rexp(n)
days <- ntile(rexp(n), 4)
doses <- ntile(rexp(n), 3)
test <- data.frame(values = myvalues,
                   day = factor(days, levels = unique(days)),
                   dose = factor(doses, levels = unique(doses)))
p <- ggplot(data = test, aes(x = day, y = values)) +
  geom_boxplot(aes(fill = dose)) +
  geom_point(aes(fill = dose), alpha = 0.4,
             position = position_jitterdodge())
produces a plot like this:
Using 'scale_fill_manual()' overwrites the aesthetic on both the boxplot and the scatterplot.
I found a hack: if I add 'colour' to geom_point, then scale_fill_manual() no longer changes the colors of the scatter points:
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(fill = dose), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = factor(test$dose)),
position = position_jitterdodge(jitter.width = 0.1))+
scale_fill_manual(values = c('white', 'white', 'white'))
Are there more efficient ways of getting the same result?

You can use group to set the different boxplots. No need to set the fill and then overwrite it:
ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(group = interaction(day, dose)), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = dose),
position = position_jitterdodge(jitter.width = 0.1))
And you should never use data$column inside aes - just use the bare column. Using data$column will work in simple cases, but will break whenever there are stat layers or facets.
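If the goal really is two independent color schemes, another option is to map `dose` to two different aesthetics: `fill` for the boxes and `colour` for the points. Each aesthetic has its own scale, so the `scale_fill_*()` and `scale_colour_*()` calls do not interfere. A sketch with simulated data shaped like `test` (the palettes are arbitrary choices):

```r
library(ggplot2)

# Simulated data shaped like the question's `test` (4 days x 3 doses)
set.seed(1)
test <- data.frame(values = rexp(120),
                   day  = factor(rep(1:4, each = 30)),
                   dose = factor(rep(1:3, times = 40)))

p <- ggplot(test, aes(x = day, y = values)) +
  # Boxes: dose mapped to fill, controlled only by the fill scale
  geom_boxplot(aes(fill = dose), outlier.shape = NA) +
  # Points: dose mapped to colour, controlled only by the colour scale
  geom_point(aes(colour = dose), position = position_jitterdodge(jitter.width = 0.1)) +
  scale_fill_grey(start = 0.7, end = 0.95) +
  scale_colour_manual(values = c("1" = "red3", "2" = "blue3", "3" = "darkgreen"))
p
```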

geom_tile single color as 0, then color scale

I want to produce a heat map with a green-to-red color palette, but with values of 0 shown in white. I got started with geom_tile heatmap with different high fill colours based on factor and others on SO but can't quite get what I need. For example, with the following data frame:
df <- data.frame(expand.grid(1:10,1:10))
df$z <- sample(0:10, nrow(df), replace = TRUE)
I can create this plot:
ggplot(df,aes(x = Var1,y = Var2,fill = z)) +
geom_tile() +
scale_fill_gradient(low = "green", high = "red")
But I want the values equal to zero to be white. So this gets part way there:
ggplot(df,aes(x = Var1,y = Var2,fill = z)) +
geom_tile() +
scale_fill_gradient(low="green", high="red", limits=c(1, 10))
And this gets 0 as white but I lose the green to red:
ggplot(df,aes(x = Var1,y = Var2,fill = z)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "red")
And I can't use brewer scales at all (though I think I'm missing something simple based on the error).
ggplot(df,aes(x = Var1,y = Var2,fill = z)) +
geom_tile() +
scale_fill_brewer("Greens")
Error: Continuous value supplied to discrete scale
Should I just replace 0 with NA? Any help would be appreciated.
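About the error in the last attempt: scale_fill_brewer() is a discrete scale, while z is numeric, hence "Continuous value supplied to discrete scale". ggplot2's continuous counterpart for Brewer palettes is scale_fill_distiller(), and the palette has to be passed by name (`palette = "Greens"`). A sketch; note this only fixes the error and does not by itself make the zeros white:

```r
library(ggplot2)

df <- data.frame(expand.grid(1:10, 1:10))
df$z <- sample(0:10, nrow(df), replace = TRUE)

p <- ggplot(df, aes(x = Var1, y = Var2, fill = z)) +
  geom_tile() +
  # Continuous version of the Brewer "Greens" palette;
  # direction = 1 keeps RColorBrewer's light-to-dark order, so larger z is darker
  scale_fill_distiller(palette = "Greens", direction = 1)
p
```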

You can use scale_fill_gradientn():
ggplot(df,aes(x = Var1,y = Var2, fill = z)) +
geom_tile() +
scale_fill_gradientn(colours = c("white", "green", "red"), values = c(0,0.1,1))
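The `values` argument places each colour on the 0 to 1 range that `z` is rescaled to. A base-R sketch of that arithmetic, assuming `z` spans 0 to 10 as in the example data (with 100 draws from `0:10` that is very likely, but not guaranteed):

```r
# Data range of z, assumed to be 0..10
z_range <- c(0, 10)
# Positions of the three colours on the rescaled 0-1 range
values <- c(0, 0.1, 1)
# Convert those positions back into data units
anchors <- z_range[1] + values * diff(z_range)
anchors   # 0 1 10: white at z = 0, green at z = 1, red at z = 10
```

So only z = 0 is pure white; any value between 0 and 1 would be blended between white and green. If the range of z changes, `values` has to be recomputed to keep green at 1.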

If you do not have NA values in your data, you can modify your 2nd plot:
ggplot(df,aes(x = Var1,y = Var2,fill = z)) +
geom_tile() +
scale_fill_gradient(low="green", high="red", limits=c(1, 10),na.value="white")
The advantage of this solution (compared with the one by @erc) is that only the values outside your limits (here, the zeros) are white, while in the solution by @erc the values between zero and your limit are colored as a gradient, so some of them can also be white-ish (i.e. be on the white side of the gradient, visually indistinguishable from white).
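Replacing the zeros with NA, as the question suggests, is an equivalent route that makes the intent explicit; it has the same caveat that any pre-existing NA values will also be drawn white. A sketch that does the replacement on a copy of `z` (`z_plot` is a hypothetical column name):

```r
library(ggplot2)

set.seed(42)
df <- data.frame(expand.grid(1:10, 1:10))
df$z <- sample(0:10, nrow(df), replace = TRUE)

# Copy of z in which exact zeros become NA; the original z is left untouched
df$z_plot <- ifelse(df$z == 0, NA, df$z)

p <- ggplot(df, aes(x = Var1, y = Var2, fill = z_plot)) +
  geom_tile() +
  # Non-missing values get the green-to-red gradient, NA (the zeros) gets white
  scale_fill_gradient(low = "green", high = "red", na.value = "white")
p
```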

Count and axis labels on stat_bin2d with ggplot

I am trying to make a 2D histogram in which the individual bins show both the bin contents and a gradient. The data are integers ranging from 0 to 4 (only) on both axes.
I tried working with this answer but ended up with a few issues. First, a few bins get no gradient at all: in the MWE below, the bottom-left bins of 130 and 60 seem to be blank. Second, the bins are shifted below 0 on both axes. For this axis issue, I found I could simply add 0.5 to both x and y. In the end, though, I would also like the axis labels to be centered within a bin, and adding that 0.5 does not address that.
library(ggplot2)
# Construct the data to be plotted
x <- c(rep(0,190),rep(1,50),rep(2,10),rep(3,40))
y <- c(rep(0,130),rep(1,80),rep(2,30),rep(3,10),rep(4,40))
data <- data.frame(x,y)
# Taken from the example
ggplot(data, aes(x = x, y = y)) +
geom_bin2d(binwidth=1) +
stat_bin2d(geom = "text", aes(label = ..count..), binwidth=1) +
scale_fill_gradient(low = "snow3", high = "red", trans = "log10") +
xlim(-1, 5) +
ylim(-1, 5) +
coord_equal()
Is there something obvious I am doing wrong in both the color gradients and axis labels? I am also not married to ggplot or stat_bin2d if there is a better way to do it with some other package/command. Thanks in advance!

stat_bin2d uses the cut function to create the bins. By default, cut creates bins that are open on the left and closed on the right. stat_bin2d also sets include.lowest=TRUE so that the lowest interval will be closed on the left also. I haven't looked through the code for stat_bin2d to try and figure out exactly what's going wrong, but it seems like it has to do with how the breaks in cut are being chosen. In any case, you can get the desired behavior by setting the bin breaks explicitly to start at -1. For example:
ggplot(data, aes(x = x, y = y)) +
geom_bin2d(breaks=c(-1:4)) +
stat_bin2d(geom = "text", aes(label = ..count..), breaks=c(-1:4)) +
scale_fill_gradient(low = "snow3", high = "red", trans = "log10") +
xlim(-1, 5) +
ylim(-1, 5) +
coord_equal()
To center the tiles on the integer lattice points, set the breaks to half-integer values:
ggplot(data, aes(x = x, y = y)) +
geom_bin2d(breaks=seq(-0.5,4.5,1)) +
stat_bin2d(geom = "text", aes(label = ..count..), breaks=seq(-0.5,4.5,1)) +
scale_fill_gradient(low = "snow3", high = "red", trans = "log10") +
scale_x_continuous(breaks=0:4, limits=c(-0.5,4.5)) +
scale_y_continuous(breaks=0:4, limits=c(-0.5,4.5)) +
coord_equal()
Or, to emphasize that the values are discrete, set the bins to be half a unit wide:
ggplot(data, aes(x = x, y = y)) +
geom_bin2d(breaks=seq(-0.25,4.25,0.5)) +
stat_bin2d(geom = "text", aes(label = ..count..), breaks=seq(-0.25,4.25,0.5)) +
scale_fill_gradient(low = "snow3", high = "red", trans = "log10") +
scale_x_continuous(breaks=0:4, limits=c(-0.25,4.25)) +
scale_y_continuous(breaks=0:4, limits=c(-0.25,4.25)) +
coord_equal()
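The effect of the breaks can be checked with cut() itself. Assuming, as described above, that the binning behaves like cut() with include.lowest = TRUE, this base-R sketch counts the question's x values with integer breaks and with half-integer breaks:

```r
# x values from the question
x <- c(rep(0, 190), rep(1, 50), rep(2, 10), rep(3, 40))

# Integer breaks: each integer v falls into (v - 1, v], so its tile spans v - 1 to v
int_bins <- table(cut(x, breaks = -1:4, include.lowest = TRUE))
# Half-integer breaks: each integer v falls into (v - 0.5, v + 0.5], centred on v
half_bins <- table(cut(x, breaks = seq(-0.5, 4.5, 1), include.lowest = TRUE))

int_bins
half_bins
```

Both give the same counts (190, 50, 10, 40, 0); only the bin edges differ, which is why the integer-break tiles sit half a unit to the left of the values they count, while the half-integer tiles line up with the axis ticks at 0:4.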
