Why does a boxplot in ggplot require x and y axes?

I have a variable ceroonce, which is the number of schools per county (integers) in 2011. When I plot it with boxplot(), it only requires the ceroonce variable. The result is a boxplot in which the y axis is the number of schools and the x axis is... the "factor" ceroonce. But in ggplot, geom_boxplot requires me to supply both the x and y axes, and I just want a boxplot of ceroonce. I have tried supplying ceroonce as both x and y, but that produces a weird boxplot in which the y axis is the number of schools and the x axis (which should be the factor variable) is also the number of schools. I assume this is very basic statistics, but I am confused. I am attaching the images hoping this will clarify my question.
This is the code I am using:
ggplot(escuelas, aes(x=ceroonce, y=ceroonce))+geom_boxplot()
boxplot(escuelas$ceroonce)

ggplot(escuelas, aes(x="ceroonce", y=ceroonce))+geom_boxplot()
ggplot will interpret the character string "ceroonce" as a vector with the same length as the ceroonce column and it will give the result you're looking for.
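A minimal sketch of why the constant string works, assuming a small made-up escuelas data frame (the values below are hypothetical, not the asker's data). layer_data() returns the statistics ggplot2 computed for the layer, so it can be used to check that only one box is drawn:

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: schools per county.
escuelas <- data.frame(ceroonce = c(1, 2, 2, 3, 4, 5, 8, 13, 21))

# A constant string for x is recycled to every row, so all
# observations fall into a single group and yield a single box.
p <- ggplot(escuelas, aes(x = "ceroonce", y = ceroonce)) +
  geom_boxplot()

# One row per box; a single row means a single box was computed.
box_stats <- layer_data(p)
n_boxes <- nrow(box_stats)
```

Printing p draws the plot; box_stats holds the quartiles and whisker ends that ggplot2 computed for that one box.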

There are no fancy statistics happening here. boxplot simply assumes that, since you've given it a single vector, you want a single box in your boxplot. ggplot and geom_boxplot simply don't make that assumption.
If you want a bit less typing, you can do this:
qplot(y=escuelas$ceroonce, x= 1, geom = "boxplot")
ggplot2 will automatically create a vector of 1s equal in length to escuelas$ceroonce.

This could work for you:
ggplot(escuelas, aes(x= "", y=ceroonce)) + geom_boxplot()

Related

How to swap axes on boxplot when one variable is discrete and the other is continuous

I currently have a boxplot that looks like this in R:
However, I would like the variable "length" to be on the y-axis instead of the x-axis, and the variable "group" to be on the x-axis instead (i.e. flip the axes).
I tried the following code, with p being the formula for the boxplot on ggplot2, but I think it does not work as "group" is a discrete variable whereas "length" is a continuous variable:
p + scale_x_reverse()
Is there a way to make it work? Any help will be appreciated, thanks!

geom_histogram does not show all values in x axis

Trying to run a simple and quick analysis of some variables. I run this code:
ggplot(data, aes(var1)) +
geom_bar()
This results in a histogram. However, in spite of there being only 6 possible values in var1, the x axis only shows 2, 4, 6. Is it possible to easily include all 6 possible values as labels?

You want a frequency bar plot for six individual numbers. However, you wish to see all of these numbers on the x axis, which makes me think that you actually treat them as categorical data rather than numeric data, so you would prefer a categorical x axis that shows all the values. Turning x into a factor should do the trick:
data <- data.frame(var1=floor(6*runif(200) + 1))
ggplot(data, aes(factor(var1))) + geom_bar()
Below: left - without factor, right - with factor.

What does your data look like?
Assuming you have a numeric x, adding scale_x_continuous(breaks = seq(1,6, by = 1)) should work.
Of course this would only work if the x values go from 1 to 6... Otherwise you can replace the seq call with a vector that contains the values you want.
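As a rough sketch of that last suggestion, the breaks can be derived from the data instead of hard-coding 1 to 6. The data frame below is made up for illustration, and x_breaks and x_scale are hypothetical names:

```r
library(ggplot2)

# Hypothetical data with six distinct numeric values.
data <- data.frame(var1 = c(1, 2, 2, 3, 4, 4, 4, 5, 6, 6))

# Use every distinct value of var1 as an axis break.
x_breaks <- sort(unique(data$var1))
x_scale <- scale_x_continuous(breaks = x_breaks)

# Numeric x axis, but with a tick label at each observed value.
p <- ggplot(data, aes(var1)) +
  geom_bar() +
  x_scale
```

This keeps the axis continuous, so gaps between values are preserved, which is the main difference from the factor() approach above.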

Plot with two different x axis for the same variable in R

I am trying to create a plot that displays a line with two x axes: one is continuous numeric and the other is discrete.
This is an example of the data:
df <-cbind.data.frame("Category"=c("A","A","A","A","A","B","B","B","B","B"),
"Y"=c(5,6,4,8,9,4,5,3,7,8),
"X1"=c(0,10,20,30,40,0,10,20,30,40),
"X2"=c(0,0,1,1,2,0,1,2,2,3))
I tried to add a secondary axis and re-scale it, but since my two variables are not proportional I don't know how to re-scale so the same Y point in the line will fit both x axis.
ggplot(data=df) +
geom_path(aes(y=Y,x=X1),color="red")+
geom_path(aes(y=Y,x=X2*10),color="blue")+
facet_wrap(~Category)+
scale_y_continuous("Y")+
scale_x_continuous("X1",sec.axis = sec_axis(~ .*1/10, "X2"))
I read different problems with two axis, but was not able to find a solution for my problem.
I am looking for something like this:
I will appreciate a lot any help on this!

The plot you provide does not evidence a clear algebraic relationship, so I'm going to give you an example of a completely-arbitrary second x-axis.
library(ggplot2)
ggplot(mtcars, aes(mpg, disp)) +
geom_point() +
scale_x_continuous(sec.axis=sec_axis(~., breaks=c(15,20,30), labels=c('a','b','c')))
The first argument is the transformation "~." (essentially x2=x1) and is required, so in this case it's a 1-for-1 transformation. The other two are relatively clear, you place 'a' at x=15, 'b' at x=20, etc. I don't think there's a way to put both on the same axis (with ggplot2 alone).

Plot a 'top 10' style list/ranking in R based on numerical column of dataframe

I have an R dataframe that contains a string variable and a numerical variable, and I would like to plot the top 10 strings, based on the value of the numerical variable.
I can of course get the top 10 entries pretty simply:
top10_rank <- rank[order(rank$numerical_var_name),]
My first approach to visualizing this was to simply attempt to plot it like this:
ggplot(data=top10_rank, aes(x = top10_rank$numerical_var_name, y = top10_rank$string_name)) + geom_point(size=3)
And to a first approximation this "works" - the problem is that the strings on the y axis are sorted alphabetically rather than by the numerical value.
My preference would be to find a way to plot the top 10 strings without having to bother showing the numerical variable at all - just basically as a list (even better would be if I could enumerate the list). I am attempting to plot this so it looks more pleasing than simply dumping the text to the screen.
Any ideas greatly appreciated!

The y-axis tick marks may be sorted alphabetically, but the points are drawn in the order (from left to right) of the top10_rank dataframe. What you need to do is change the order of the y-axis. Add scale_y_discrete(limits=top10_rank$String) to your ggplot call and it should work:
ggplot(data=top10_rank, aes(x = top10_rank$Number,
y = top10_rank$String)) + geom_point(size=3) + scale_y_discrete(limits=top10_rank$String)
Here is a link to a great resource on R graphics: R Graphics Cookbook
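A small sketch of the same idea on made-up data, assuming columns named String and Number as in the answer above. It also keeps only the top rows with head(), which the code in the question does not do; rank_df, top_rows and y_limits are hypothetical names:

```r
library(ggplot2)

# Hypothetical data: one label and one score per row.
rank_df <- data.frame(
  String = c("delta", "alpha", "charlie", "bravo"),
  Number = c(4, 9, 1, 7),
  stringsAsFactors = FALSE
)

# Sort by score, largest first, and keep the top 3 (use 10 for a top 10).
top_rows <- head(rank_df[order(rank_df$Number, decreasing = TRUE), ], 3)

# The first limit is drawn at the bottom of a discrete y axis,
# so reverse the order to put the highest score at the top.
y_limits <- rev(top_rows$String)

p <- ggplot(top_rows, aes(x = Number, y = String)) +
  geom_point(size = 3) +
  scale_y_discrete(limits = y_limits)
```

Referring to columns by bare name inside aes() (rather than top_rows$Number) is the more usual ggplot2 style, since the data argument already supplies the data frame.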

Plotting percent change for a large number of factors on same figure using ggplot by faceting or color-coding factors

Here is an example of the code I'm working with
x<-as.factor(rep(c("tree_mean","tree_qmean","tree_skew"),3))
factor<-c(rep("mfn2_burned_99",3),rep("mfna_burned_5_7",3),rep("mfna_burned_5_7_10_12",3))
y<-c(0.336457409,-0.347422910,-0.318945621,1.494109367, 0.003578698,-0.019985780,-0.484171146, 0.611589217,-0.322292664)
dat<-data.frame(x,factor,y)
head(dat)
x factor y
tree_mean mfn2_burned_99 -0.3364574
tree_qmean mfn2_burned_99 -0.3474229
tree_skew mfn2_burned_99 -0.3189456
tree_mean mfna_burned_5_7 -0.8269814
tree_qmean mfna_burned_5_7 -0.8088810
tree_skew mfna_burned_5_7 -2.5429226
tree_mean mfna_burned_5_7_10_12 -0.8601206
tree_qmean mfna_burned_5_7_10_12 -0.8474920
tree_skew mfna_burned_5_7_10_12 -2.9854178
I am trying to plot how much x deviates from 0, and facet it by each factor, as so:
ggplot(dat) +
geom_point(aes(x=x,y=y),shape=1,size=3)+
geom_linerange(aes(x=x,ymin=0,ymax=y))+
geom_hline(yintercept=0)+
facet_grid(factor~.)
This works fine when I have three factors (ignore the *: I had a significance column which I have since removed).
Example below:
However, I have 8 factors in total, and faceting obscures the plot such that the distance from zero for each x value gets very distorted.
Example below
So, my question is this: what would be a better way of coding/rendering this plot given my large number of x values and factors using faceting or color coding by factor in ggplot??
I would be very open to color-coding each distance for x by factor rather than faceting, but I have been beating my head against the wall trying to figure out how to even do that in ggplot (very new to ggplot), so I can't yet say if it would make the figure much more interpretable.

One option as you note is to color your point and/or linerange by a factor. You can then use position_dodge to move the points slightly on the x axis.
For example:
ggplot(dat, aes(color = factor)) +
geom_point(aes(x=x,y=y),shape=1,size=3, position = position_dodge(width = 0.5))+
geom_linerange(aes(x=x,ymin=0,ymax=y), position = position_dodge(width =0.5))+
geom_hline(yintercept=0)
I think this would still be difficult with many factors, but with 8 it might suit your purposes.
