geom_boxplot behaving oddly?


I'm currently plotting some data (response times in ms) with geom_boxplot.
I have a question:
When you adjust the limits on the y-axis, does it disregard any values above the limit in the plotting and error bar calculations?
The data comprises over 20k entries, and I'm not sure providing a sample will be of much use, as this is more of a functionality question.
Here is the code I use:
# Summary for the shaded band: the notch confidence interval around the median
f <- function(x) {
  ans <- boxplot.stats(x)
  data.frame(ymin = ans$conf[1], ymax = ans$conf[2], y = ans$stats[3])
}
RTs.box <- ggplot(mean.vis.aud.long, aes(x = Report, y = RTs, fill = Report)) +
  theme_bw() +
  facet_grid(Audio ~ Visual)
RTs.box +
  geom_boxplot(alpha = .8) +
  geom_hline(yintercept = .333, linetype = 3, alpha = .8) +
  theme(legend.position = "none") +
  ylab("Response Times ms") +
  scale_fill_grey(start = .4) +
  labs(title = expression("Visual Condition")) +
  theme(plot.title = element_text(size = rel(1))) +
  theme(panel.background = element_rect()) +
  # line below for shaded confidence intervals
  stat_summary(fun.data = f, geom = "crossbar",
               colour = NA, fill = "skyblue", width = 0.75, alpha = .9) +
  ylim(0, 1000) # this is the value that I change that results in different plots and shaded confidence intervals
Here is the plot with
ylim(0,1000)
And using the same data but changing the limit to
ylim(0,3000)
results in this plot:
As you can see, the values in the boxplots appear to be adjusted according to the limit used. Instead of being plotted up to the edge of the limit, the percentiles are reduced. This is apparent when you compare the middle boxplot in the top-left panel of both grids.
The confidence intervals also differ, as can be seen.
Does this mean geom_boxplot is discarding the data above the limit, or is there something I'm missing?
I want to include all the data when computing the boxplot and confidence intervals, but limit the scale so the plot can be read clearly. That means not seeing some major outliers in the data, but for my purposes that is fine.
Does anyone have suggestions as to what is going on here, and how to get around it without dropping the values outside the chosen visual range from my calculations?
Thanks as always.

From ?ylim: "Observations not in this range will be dropped completely and not passed to any other layers. If a NA value is substituted for one of the limits that limit is automatically calculated."
If you want to adjust the limits without affecting the data, use coord_cartesian instead.

The function ylim affects which data points are used for plotting: values outside the limits are dropped before the statistics are computed.
To avoid this, use coord_cartesian, which zooms the view without changing the underlying data.
Try replacing ylim(0,1000) with:
coord_cartesian(ylim = c(0,1000))
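To see why the two approaches give different boxes, here is a minimal sketch using made-up response times (the rts vector and the toy data frame are illustrative assumptions, not the asker's data). It compares the statistics computed on the full data with those computed after values above 1000 are dropped, which is effectively what ylim(0, 1000) does before the boxplot is calculated:

```r
library(ggplot2)

# Hypothetical response times in ms, including three large outliers
rts <- c(seq(200, 900, by = 100), 1500, 2500, 4000)

# Statistics on the full data (what coord_cartesian leaves untouched)
full_stats <- boxplot.stats(rts)$stats

# Statistics after dropping values above 1000 (what ylim(0, 1000) effectively does)
kept_stats <- boxplot.stats(rts[rts <= 1000])$stats

median_full <- full_stats[3]
median_kept <- kept_stats[3]
print(c(full = median_full, kept = median_kept))

toy <- data.frame(Report = "a", RTs = rts)

# Zooms the view only; the boxplot is still computed from all 11 values
p_zoom <- ggplot(toy, aes(x = Report, y = RTs)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 1000))

# Drops the three values above 1000 before the boxplot is computed
p_drop <- ggplot(toy, aes(x = Report, y = RTs)) +
  geom_boxplot() +
  ylim(0, 1000)
```

Printing p_zoom and p_drop side by side should reproduce the effect described in the question: the box in p_drop shrinks because its quartiles come from the reduced data.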

Related

Points keep getting cut off, and standard fixes don't work well with facet grid on a log scale

Novice R user here, wrestling with some arcane details of ggplot.
I am trying to produce a plot that charts two data ranges: one plotted as a line, and another plotted on the same plot as points. The code is roughly like this:
ggplot(data1, aes(x = Year, y = Capacity, col = Process)) +
  geom_line() +
  facet_grid(Country ~ ., scales = "free_y") +
  scale_y_continuous(trans = "log10") +
  geom_point(data = data2, aes(x = Year, y = Capacity, col = Process))
I've left out some additional cosmetic arguments for the sake of simplicity.
The problem is that the points from geom_point keep getting cut off by the x-axis:
I know the standard fix here would be to adjust the y limits to make room for the points:
scale_y_continuous(limits = c(-100, Y_MAX))
But this runs into a separate problem due to the facet grid with free scales, since there is no single value for Y_MAX.
I've also tried using expansions:
scale_y_continuous(expand = c(0.5, 0))
But this runs into problems with the log scale, since it multiplies by different values for each facet, producing very wonky results.
I just want to produce enough blank space at the bottom of each facet to make room for the points or, alternatively, move each point up a little to make room. Is there an easy way to do this in my case?

This might be a good place for scales::pseudo_log_trans, which combines a log transformation with a linear transformation (and a flipped sign log transformation) to retain most of the benefits of a log transformation while also allowing zero and negative values. Adjust the sigma parameter of the function to adjust where the transition from linear to log should happen.
library(ggplot2)
ggplot(data = data.frame(country = rep(c("France", "USA"), each = 5),
                         x = rep(1:5, times = 2),
                         y = c(10^(2:6), 0, 10^(1:4))),
       aes(x, y)) +
  geom_point() +
  # scale_y_continuous(trans = "log10") +
  # label_number_si() is deprecated in recent versions of scales;
  # label_number(scale_cut = cut_short_scale()) is the suggested replacement
  scale_y_continuous(trans = scales::pseudo_log_trans(),
                     breaks = c(0, 10^(0:6)),
                     labels = scales::label_number_si()) +
  facet_wrap(~country, ncol = 1, scales = "free_y")
vs. with (trans = "log10"):
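To get a feel for what the pseudo-log transformation does to the values themselves, the following sketch applies it to a few hand-picked numbers. The input vector and the choice of sigma = 1 with base = 10 are illustrative assumptions; the point is only that zero and negative inputs map to finite values, while large values behave roughly like log10:

```r
library(scales)

# Pseudo-log transformation: linear near zero, logarithmic further out
plt <- pseudo_log_trans(sigma = 1, base = 10)

x <- c(-1000, -10, 0, 10, 1000)
y <- plt$transform(x)
print(round(y, 3))

# For large values the result is close to an ordinary log10
print(c(pseudo_log = y[5], log10 = log10(1000)))

# The inverse maps transformed values back to the original scale
x_back <- plt$inverse(y)
```

Increasing sigma widens the region around zero that is treated approximately linearly.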

How to make consistent color scales between two plots with geom_hex

I'm trying to make a before/after comparison between two plots, so I need them both to have the same color scale so that a true comparison can be made. I have been trying for a while to change the color scales of geom_hex, but I have only found ways of providing a min/max cutoff. Is there any way of manually setting the scale to a defined range, e.g. 1-100? My plot code and examples are below.
ggplot() +
  geom_hex(aes(x = VolumeBefore$Flow, y = SpeedBefore$Speed)) +
  xlab("Flow") + ylab("Speed (MPH)") +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0), text = element_text(size = 20)) +
  ggtitle('Speed-Flow Before Density Plot')
ggplot() +
  geom_hex(aes(x = VolumeAfter$Flow, y = SpeedAfter$Speed)) +
  xlab("Flow") + ylab("Speed (MPH)") +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0), text = element_text(size = 20)) +
  ggtitle('Speed-Flow After Density Plot')
Before Plot
After Plot
In these two images you can see that the scales are different; I just want to make them the same :)
Thanks!

Here's a way using scale_fill_gradient2 and oob = scales::squish, which lets you specify the lower and upper limits of the fill and clamps any values beyond that range to the nearest limit.
Note that without specifications, the fill will use the full range of densities in the data:
ggplot(diamonds) +
  geom_hex(aes(x = carat, y = price)) +
  scale_fill_gradient2()
Alternatively, we could specify the range directly, clamping anything beyond it into that range. That will let you match the legend ranges of multiple plots:
ggplot(diamonds) +
  geom_hex(aes(x = carat, y = price)) +
  scale_fill_gradient2(limits = c(0, 3000), oob = scales::squish)
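Applied to a before/after pair, the same idea could look like the sketch below. Two subsets of diamonds stand in for the asker's data, and fill_limits and shared_fill are hypothetical names introduced for illustration. A plain scale_fill_gradient is used here rather than scale_fill_gradient2, since counts have no natural midpoint; either accepts limits and oob. Drawing the plots requires the hexbin package.

```r
library(ggplot2)

# Hypothetical shared range for the hexagon counts; choose it to cover both plots
fill_limits <- c(0, 3000)

# squish() clamps out-of-range values to the nearest limit instead of turning them into NA
squished <- scales::squish(c(-5, 50, 5000), range = fill_limits)
print(squished)

# Helper that returns the same fill scale for every plot
shared_fill <- function() {
  scale_fill_gradient(limits = fill_limits, oob = scales::squish)
}

# Two subsets of diamonds stand in for the "before" and "after" data
p_before <- ggplot(subset(diamonds, cut == "Ideal"), aes(carat, price)) +
  geom_hex() +
  shared_fill()

p_after <- ggplot(subset(diamonds, cut == "Premium"), aes(carat, price)) +
  geom_hex() +
  shared_fill()
```

Because both plots use identical limits, the same count maps to the same color in each legend.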

ggplot2 question - How to graph on a log scale for values < 0

I've been playing with a shiny application to visualize data from the International Monetary Fund database.
One of the tabs in the application plots two indicators against one another using geom_point().
I thought that plotting the data on a log scale might be helpful for visualization purposes, as the data distribution is often skewed.
On the one hand, no issues are encountered for values that are strictly positive.
On the other hand, since some indicators are measured in percent change, which can be negative, ggplot simply excludes the values from the graph (which makes sense given the fact that the log of a negative value is undefined).
Might sound like a weird question but here it is:
Is there a method by which you can show negative values, while still plotting the axes on a log scale?
Please take a look at the code below:
output$interact1 <- renderPlot({
  x <- ggplot(filter(testplot, Year == input$y),
              aes_string(x = input$indicator, y = input$indicator2))
  x + geom_jitter(aes_string(col = "Continent", size = "Population")) +
    scale_size_continuous(breaks = c(0, 5, 50, 100, 200, 1000)) +
    theme(legend.position = "bottom") +
    scale_x_log10() +
    scale_y_log10() +
    geom_smooth(aes_string(input$indicator, input$indicator2),
                method = "loess", col = "black") +
    geom_smooth(method = "lm", col = "red", show.legend = FALSE) +
    ylab(gsub("_", " ", input$indicator2)) +
    xlab(gsub("_", " ", input$indicator)) +
    labs(caption = paste(gsub("_", " ", input$indicator2),
                         "vs.",
                         gsub("_", " ", input$indicator)))
})
The screenshot below may also be helpful in understanding what it is that I am trying to do here:
Edit:
To answer a question asked below, the attached bar graph shows the distribution of GDP % Change in the year 2018 (I know the labels might be a little hard to read).

Reduce space between groups of bars in ggplot2

I haven't been able to remove the extra white space flanking groups of bars in geom_bar.
I'd like to do what Roland achieves here: Remove space between bars ggplot2, but when I try to implement his solution I get this warning: "Warning message:
geom_bar() no longer has a binwidth parameter. Please use geom_histogram() instead."
I added this line of code to my plot (trying different widths):
geom_histogram(binwidth = 0.5) +
which returns "Error: stat_bin() must not be used with a y aesthetic." and no plot.
Data:
mydf <- data.frame(Treatment = c("Con", "Con", "Ex", "Ex"),
                   Response = rep(c("Alive", "Dead"), times = 2),
                   Count = c(259, 10, 290, 21))
aPalette <- c("#009E73", "#D55E00")
Plot:
example <- ggplot(mydf, aes(factor(Response), Count, fill = Treatment)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.55), width = 0.5) +
  scale_fill_manual(values = aPalette, name = "Treatment") + # legend title
  theme_classic() +
  labs(x = "Response",
       y = "Count") +
  scale_y_continuous(breaks = c(0, 50, 100, 150, 200, 250, 275), expand = c(0, 0),
                     limits = c(0, 260)) +
  theme(legend.position = c(0.7, 0.3)) +
  theme(text = element_text(size = 15)) # change all text size
example
Returns:
Note: I don't know why I'm getting "Warning message: Removed 1 rows containing missing values (geom_bar)." but I'm not concerned about it because that doesn't happen with my actual data.
Edit re: note - this happens because I set the upper limit of the y-axis lower than the height of the bar that was removed. I'm not going to change the code, so that I don't have to redraw my figure, but changing
limits = c(0, 260)
to
limits = c(0, 300)
will show all the bars, in case someone else has a similar problem. I'm going to find a post related to this issue and will make this edit more concise when I can link an answer.
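The missing bar can be reproduced outside the plot. By default, continuous scales replace values outside their limits with NA (scales::censor), and the geom then drops the row. The sketch below shows this for the counts above and, as one possible alternative (an assumption about what is wanted, not part of the original post), keeps the 260 cutoff by zooming with coord_cartesian instead of setting scale limits:

```r
library(ggplot2)

mydf <- data.frame(Treatment = c("Con", "Con", "Ex", "Ex"),
                   Response = rep(c("Alive", "Dead"), times = 2),
                   Count = c(259, 10, 290, 21))

# Default out-of-bounds handling: values outside the limits become NA
censored <- scales::censor(mydf$Count, range = c(0, 260))
print(censored)

# Zooming with the coordinate system keeps all four bars;
# the bar of height 290 is clipped at the top of the panel instead of removed
p <- ggplot(mydf, aes(factor(Response), Count, fill = Treatment)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.55), width = 0.5) +
  scale_y_continuous(expand = c(0, 0)) +
  coord_cartesian(ylim = c(0, 260))
```

Only the third count (290) exceeds the upper limit, which matches the single removed row reported in the warning.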

Forgive me if I completely missed what you're trying to accomplish here, but the only reason ggplot has included so much white space is that you constrained the bars to a particular width and increased the size of the graph.
The white space within the graph is a product of the width of the bars and the width of the graph.
Using your original graph...
We notice a lot of whitespace, but you made the bars narrow and the graph wide. Think of the space as a compromise between bar width and whitespace. It's illogical to expect a wide graph with narrow bars and no whitespace. To fix this we can either decrease the graph size or increase the bar width.
First we increase the bar width back to normal by removing your constraints.
Which looks ridiculous....
But looking at the Remove space between bars ggplot2 link that you included above, all he did was remove the constraints and limit the width of the graph. Doing so would result in a similar graph...
Including the graph from your link above....
And removing all of your constraints....
example <- ggplot(mydf, aes(factor(Response), Count, fill = Treatment)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = aPalette, name = "Treatment") +
  theme_bw() +
  labs(x = "Response", y = "Count")
example
If your goal was not to make your graph similar to the one in the link by removing whitespace, let me know; other than that, I hope this helped.

Position dodge with error bars in ggplot2

I am trying to use ggplot to show a series of confidence intervals across different time points. I have two sets of confidence intervals, one parametric and one bootstrap, and I would like to display them using geom_errorbar(). I tried using position_dodge() so the two CIs won't directly overlay one another, but it is not working. How do I offset the two CIs at the same time point?
pd <- position_dodge(.6)
ggplot(results, aes(x = intervals, y = change)) +
  geom_errorbar(aes(ymin = ci.par.low, ymax = ci.par.hi), position = pd, width = .1,
                colour = "green") +
  geom_errorbar(aes(ymin = ci.boot.low, ymax = ci.boot.hi), width = .1, colour = "blue") +
  geom_abline(intercept = slope.est, slope = 0, colour = "red") +
  labs(title = paste("Protein ID:", prot.name))

I accomplished my goal with position_jitter(), though it's clunky.
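A likely reason position_dodge() had no effect is that it separates groups within a single layer, and here each geom_errorbar layer contains only one interval per time point. One possible alternative to jittering, sketched below with made-up numbers standing in for results, is to shift each layer by a fixed amount with position_nudge(). The data values and the offset variable are assumptions for illustration only:

```r
library(ggplot2)

# Hypothetical data standing in for `results`
results <- data.frame(
  intervals   = 1:3,
  change      = c(0.5, 0.7, 0.6),
  ci.par.low  = c(0.2, 0.4, 0.3),
  ci.par.hi   = c(0.8, 1.0, 0.9),
  ci.boot.low = c(0.1, 0.5, 0.2),
  ci.boot.hi  = c(0.9, 1.1, 1.0)
)

offset <- 0.1  # horizontal shift applied to each layer, in x-axis units

p <- ggplot(results, aes(x = intervals, y = change)) +
  # parametric CI, shifted left
  geom_errorbar(aes(ymin = ci.par.low, ymax = ci.par.hi),
                position = position_nudge(x = -offset), width = .1, colour = "green") +
  # bootstrap CI, shifted right
  geom_errorbar(aes(ymin = ci.boot.low, ymax = ci.boot.hi),
                position = position_nudge(x = offset), width = .1, colour = "blue")
```

Unlike position_jitter(), the shift is deterministic, so the two intervals stay at the same distance from each time point on every redraw.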
