GGplot - how to not treat points as outliers? - r

In the below script, outliers on the boxplot are shown as individual scatter points. Instead, I would like the creation of the boxplot to include these and to not treat these points as outliers. Consequently, the box would be extended to include them.
ggplot(imp,aes(Group,LWG,fill=Group))+geom_boxplot()
As per the below picture, the bottom of the left boxplot would extend downwards further.

It would be inappropriate to extend the box. The whole point of a boxplot is to show the quantiles, so extending the box would make the plot statistically wrong in its interpretation.
But you can hide the outlier points with:
geom_boxplot(outlier.shape = NA)
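As a rough sketch of the difference (the data frame `imp` below is made up, since the asker's data isn't shown): `outlier.shape = NA` only hides the points, while the `coef` argument of `geom_boxplot()` controls how far the whiskers may reach. Using `coef` is an alternative I'm adding here, not part of the answer above. In both cases the box itself still spans the first to third quartile.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: two groups, one very low value in A
set.seed(42)
imp <- data.frame(
  Group = rep(c("A", "B"), each = 30),
  LWG   = c(rnorm(29, mean = 10), 2, rnorm(30, mean = 12))
)

# Hide the outlier points; box and whiskers are computed exactly as before
p_hidden <- ggplot(imp, aes(Group, LWG, fill = Group)) +
  geom_boxplot(outlier.shape = NA)

# Let the whiskers reach the data extremes, so no point is flagged as an outlier
# (coef is the whisker length as a multiple of the IQR; the default is 1.5)
p_full <- ggplot(imp, aes(Group, LWG, fill = Group)) +
  geom_boxplot(coef = Inf)
```

Printing `p_full` should show the left whisker running down to the low value, with the box unchanged.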

Related

How to get `geom_smooth` style CIs when using `emmip`

I have a model and a graph in R:
fit.lmer = lmer(std_brain ~ std_beh*type*taught + (1|subject/run), data=avg_data)
graph_lmer = emmip(fit.lmer, type~std_beh | taught , at=list(type=type, std_beh=std_beh,
taught=taught), CIs=FALSE)
I always set CIs to false because the default style of the CIs makes the graph totally illegible -- there's a colorful vertical bar at every marking point on three different lines. You can't see the actual lines on the graph. But I see examples of the kind of bands I'd like using geom_smooth and stat_smooth with ggplot. Here's an example -- there's a solid band, rather than bars on points, and it's gray:
However, I am not graphing points, I'm graphing marginal means, and so I don't think geom_smooth or stat_smooth are appropriate. What I really want is just to apply that style to my emmip graph. I can't find an example.

Mixed geom_line & geom_point plot: remove marker from color scale

I often have to use plots mixing lines and points (ggplot2), with the colors of the line representing one variable (here, "Dose"), and the shape of the points another one (here, "Treatment"). Figure 1 shows what I typically get:
Figure 1: what I get
I like having different legends for the two variables, but would like to remove the round markers from the color scale, to only show the colors (see legend mockup below, made with Gimp). Doing so would allow me to have a clean legend, with colors and shapes clearly segregated.
Figure 2 (mockup): what I would like
Would anyone know if there is a way to do that? Any help would be much appreciated.
Note: the plots above show means and error bars, but I have the same problem with any plot mixing geom_line and geom_point, even simple ones.
Thanks in advance !

ggplot draw multiple plots by levels of a variable

I have a sample dataset
d=data.frame(n=rep(c(1,1,1,1,1,1,2,2,2,3),2),group=rep(c("A","B"),each=20),stringsAsFactors = F)
And I want to draw two separate histograms based on group variable.
I tried this method suggested by @jenesaisquoi in a separate post, "Generating Multiple Plots in ggplot by Factor":
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)+facet_wrap(~group)
It did the trick but if you look closely, the proportions are wrong. It didn't calculate the proportion for each group but rather a grand proportion. I want the proportion to be 0.6 for number 1 for each group, not 0.3.
Then I tried the dplyr package, and it didn't even create two graphs: it ignored the group_by command. The proportion is right this time, though.
d%>%group_by(group)%>%ggplot(data=.)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)
Finally I tried factoring with color
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..),color=group),binwidth = 1)
But the result is far from ideal. I could accept a single plot, but with the bins side by side, not on top of each other.
In conclusion, I want to draw two separate histograms with correct proportions calculated within each group. If there is no easy way to do this, I can live with one graph but having the bins side by side, and with correct proportions for each group. In this example, number 1 should have 0.6 as its proportion.
Changing ..count../sum(..count..) to ..density.. gives you the desired proportion:
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
You actually have the separation of charts by variable correct! Especially with ggplot, you sometimes need to consider the scales of the graph separately from the shape. Facet_wrap applies a new layer to your data, regardless of scale. It will behave the same, no matter what your axes are. You could also try adding scale_y_log10() as a layer, and you'll notice that the overall shape and style of your graph is the same, you've just changed the axes.
What you actually need is a fix to your scales. Understandable - frequency plots can be confusing. ..count../sum(..count..) divides each bin's count by the total over all bins in every facet, not just the bins in its own panel, which is why you see 0.3 instead of 0.6. See a good explanation of this here: Show % instead of counts in charts of categorical variables
What you want is ..density.., which is the count divided by the group's total count and by the bin width. With binwidth = 1 that is exactly the within-group proportion. The difference is subtle in principle, but the important bit is that the bin width on the x-axis matters. For an extreme case of this, see here: Normalizing y-axis in histograms in R ggplot to proportion, where tiny x-axis values produced huge densities.
Your original code will still work, just substituting the aesthetics I described above.
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
If you're still confused about density, so are lots of people. Hadley Wickham wrote a long piece about it, you can find that here: http://vita.had.co.nz/papers/density-estimation.pdf
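To double-check the numbers, here is a small sketch using the sample data from the question. It computes the within-group proportions directly and then plots them with `after_stat()`, which ggplot2 3.3.0 and later offer in place of the `..density..` notation. Multiplying the density by the bin width is my own addition, so that the bar heights stay proportions even if the bin width is not 1.

```r
library(ggplot2)

# Sample data from the question: n is recycled, giving 20 rows per group
d <- data.frame(n = rep(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3), 2),
                group = rep(c("A", "B"), each = 20),
                stringsAsFactors = FALSE)

# Proportion of each value within its own group (each row sums to 1)
props <- prop.table(table(d$group, d$n), margin = 1)
print(props)

# density = count / (total count in the panel's group * bin width),
# so density * width is the proportion within each panel and group
p <- ggplot(d) +
  geom_histogram(aes(x = n, y = after_stat(density * width)), binwidth = 1) +
  facet_wrap(~group)
```

In each group, 12 of the 20 rows have n equal to 1, so both the table and the tallest bar in each facet come out at 0.6.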

R stack multiple boxplot on top of each other

I am trying to make some boxplots. Here is a sample data
set.seed(1)
a<-rnorm(100)
a1<-rnorm(100);a2<-rnorm(100);a3<-rnorm(100);a4<-rnorm(100)
b1<-rnorm(100);b2<-rnorm(100);b3<-rnorm(100);b4<-rnorm(100)
c1<-rnorm(100);c2<-rnorm(100);c3<-rnorm(100);c4<-rnorm(100)
d1<-rnorm(100);d2<-rnorm(100);d3<-rnorm(100);d4<-rnorm(100)
e1<-rnorm(100);e2<-rnorm(100);e3<-rnorm(100);e4<-rnorm(100)
f1<-rnorm(100);f2<-rnorm(100);f3<-rnorm(100);f4<-rnorm(100)
dat<-data.frame(a,a1,a2,a3,a4,b1,b2,b3,b4,c1,c2,c3,c4,d1,d2,d3,d4,e1,e2,e3,e4,f1,f2,f3,f4)
par(mfrow=c(4,1))
boxplot(dat$a,dat$a1,dat$b1,dat$c1,dat$d1,dat$e1,dat$f1)
boxplot(dat$a,dat$a2,dat$b2,dat$c2,dat$d2,dat$e2,dat$f2)
boxplot(dat$a,dat$a3,dat$b3,dat$c3,dat$d3,dat$e3,dat$f3)
boxplot(dat$a,dat$a4,dat$b4,dat$c4,dat$d4,dat$e4,dat$f4)
And this is the resultant plot
As you can see, the four boxplots are stacked on top of each other. Is there any way I can combine these plots so that there are no spaces between them, and also make the boxes inside the plots smaller?
I thought doing a par(mfrow=c(4,1)) should do the trick but it is leaving a lot of spaces between the plots. Ideally, I would want a single x-axis and single y-axis (further split into four axis to show the values of each of the plots)
Thanks
You can use par(mar=c(0,0,0,0)) to get rid of the entire figure margin. Adjusting the four mar values will change the margins (see ?par).
As for changing the size of the boxplots, you can adjust the boxwex argument in the boxplot function (see ?boxplot). Here is code that changes both mar and boxwex.
par(mfrow=c(4,1), mar=c(2,3,0,1))
boxplot(dat$a,dat$a1,dat$b1,dat$c1,dat$d1,dat$e1,dat$f1, boxwex=0.25)
boxplot(dat$a,dat$a2,dat$b2,dat$c2,dat$d2,dat$e2,dat$f2, boxwex=0.5)
boxplot(dat$a,dat$a3,dat$b3,dat$c3,dat$d3,dat$e3,dat$f3, boxwex=0.75)
boxplot(dat$a,dat$a4,dat$b4,dat$c4,dat$d4,dat$e4,dat$f4, boxwex=1,
names=1:7)
You can set the first element of mar to 0 if you want to completely get rid of the space between the plots, but that doesn't seem like it would look particularly nice, and that makes it trickier to get the x-axis in the bottom figure without changing its size relative to the first three plots.
Another alternative you could try is to put all the boxplots into one plot, but have side-by-side boxplots for each category (1-7). You can use the at argument in the boxplot function to specify the position of each boxplot along the x-axis.
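A rough sketch of that last idea, using data with the same column layout as `dat` in the question. The offsets, colours, legend, and the variable names `n_cat`, `n_set`, `offsets` and `stats` are arbitrary choices for illustration, not part of the original answer.

```r
# Sample data with the same columns as in the question: a, a1..a4, ..., f1..f4
set.seed(1)
cols <- c("a", paste0(rep(letters[1:6], each = 4), 1:4))
dat <- as.data.frame(matrix(rnorm(100 * length(cols)), ncol = length(cols),
                            dimnames = list(NULL, cols)))

n_cat <- 7                                    # boxes per original plot
n_set <- 4                                    # the four original plots
offsets <- seq(-0.3, 0.3, length.out = n_set) # horizontal shift for each set

# Draw an empty frame with a single x-axis and y-axis
plot(NA, xlim = c(0.5, n_cat + 0.5), ylim = range(unlist(dat)),
     xaxt = "n", xlab = "", ylab = "")
axis(1, at = 1:n_cat, labels = 1:n_cat)

# Add each set of seven boxes, shifted sideways with the at argument
stats <- vector("list", n_set)
for (k in 1:n_set) {
  set_k <- dat[, c("a", paste0(letters[1:6], k))]
  stats[[k]] <- boxplot(set_k, at = 1:n_cat + offsets[k], boxwex = 0.15,
                        col = k + 1, add = TRUE, axes = FALSE)
}
legend("topright", legend = paste("set", 1:n_set), fill = 1:n_set + 1,
       horiz = TRUE, bty = "n")
```

Each category on the x-axis then holds four narrow boxes, one per original plot, on a shared pair of axes.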

In Stata, how do I modify axes of dot chart?

I'm trying to create a dot chart in Stata, splitting it into two categories
Running a chunk of code:
sysuse nlsw88, clear
drop if race == 3
graph dot (mean) wage, over(occ) by(race)
Creates such output:
So far so good but I'd like to remove labels of Y axis from the right graph to give the data some more space.
The only way I've been able to do that was to manually edit graph and hide the axis label object:
Is there a way to do it programmatically? I do know I could use one more over() but in some graphs of mine that is already taken.
I believe the solution is buried in help bystyle and help by_option. However, I can't get it to work with your example (I'm on Stata 12). But the description is clear. For example:
A bystyle determines the overall look of the combined graphs, including
whether the individual graphs have their own axes and labels or if instead the axes and labels are shared across graphs arrayed in the same row and/or in the same column;
...
There are options that let you control each of the above attributes -- see [G-3] by_option --
And also
iyaxes and ixaxes (and noiyaxes and noixaxes) specify whether the y axes and x axes are to be displayed with each graph. The default with most styles and schemes is to place y axes on the leftmost graph of each row and to place x axes on the bottommost graph of each column. The y and x axes include the default ticks and labels but exclude the axes titles.
If for some reason that doesn't work out, something like
sysuse nlsw88, clear
drop if race == 3
graph dot (mean) wage, over(occ) by(race)
gr_edit .plotregion1.grpaxis[2].draw_view.setstyle, style(no)
does (but I don't really like the approach). You can mess with at least the axis number [#] to do a bit of customization. I guess recording changes in the graphical editor and then recycling the corresponding code may be one way out of difficult situations.
