keep the original order when using ggplot - r

I have this simple code, trying to plot the figure. My intention was to plot the x axis ordered as what I made, i.e. the same as order_num: from 1:10 and then 10+. However, ggplot changed my order. How could I keep the original order I put them in the data frame.
data_order=data.frame(order_num=as.factor(c(rep(1:10),"10+")),
ratio=c(0.18223,0.1561,0.14177,0.1163,0.09646,
0.07518,0.05699,0.04,0.0345,0.02668,0.006725))
ggplot(data_order,aes(x=order_num,y=ratio))+geom_bar(stat = 'identity')

Reading the data: (Notice the removal of as.factor, we will be doing it in the next step. This is not mandatory!)
data_order=data.frame(order_num=c(rep(1:10),"10+"),
ratio=c(0.18223,0.1561,0.14177,0.1163,0.09646,
0.07518,0.05699,0.04,0.0345,0.02668,0.006725))
You need to work with the dataframe instead of the ggplot.
data_order$order_num <- factor(data_order$order_num, levels = data_order$order_num)
Once you change the levels, it will be as expected.
ggplot(data_order,aes(x=order_num,y=ratio))+geom_bar(stat = 'identity')

Related

Reordering data based on a column in [r] to order x-value items from lowest to highest y-values in ggplot

I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
which produces this:
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had succesfully reorganised the data then the y-values would go up as the plot moves along the x-value. So maybe i'm focussing on the wrong part of the code to achieve this in a ggplot figure?
You could do something like this?
library(tidyverse);
cor.data %>%
mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
Rather than try to order the data before creating the plot, I can reorder the data at the time of writing the plot:
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- This line controls order points drawn created to make (slightly) more readible plot
gplot(cor.data.sorted,aes(x=reorder(pic,r.val),y=r.val,size=df.val,color=exp)) + geom_point()
to create

simple boxplot using qplot/ggplot2

This is my first post, so go easy. Up until now (the past ~5 years?) I've been able to either tweak my R code the right way or find an answer on this or various other sites. Trust me when I say that I've looked for an answer!
I have a working script to create the attached boxplot in basic R.
http://i.stack.imgur.com/NaATo.jpg
This is fine, but I really just want to "jazz" it up in ggplot, for vain reasons.
I've looked at the following questions and they are close, but not complete:
Why does a boxplot in ggplot requires axis x and y?
How do you draw a boxplot without specifying x axis?
My data is basically like "mtcars" if all the numerical variables were on the same scale.
All I want to do is plot each variable on the same boxplot, like the basic R boxplot I made above. My y axis is the same continuous scale (0 to 1) for each box and the x axis simply labels each month plus a yearly average (think all the mtcars values the same on the y axis and the x axis is each vehicle model). Each box of my data represents 75 observations (kind of like if mtcars had 75 different vehicle models), again all the boxes are on the same scale.
What am I missing?
Though I don't think mtcars makes a great example for this, here it is:
First, we make the data (hopefully) more similar to yours by using a column instead of rownames.
mt = mtcars
mt$car = row.names(mtcars)
Then we reshape to long format:
mt_long = reshape2::melt(mt, id.vars = "car")
Then the plot is easy:
library(ggplot2)
ggplot(mt_long, aes(x = variable, y = value)) +
geom_boxplot()
Using ggplot all but requires data in "long" format rather than "wide" format. If you want something to be mapped to a graphical dimension (x-axis, y-axis, color, shape, etc.), then it should be a column in your data. Luckily, it's usually quite easy to get data in the right format with reshape2::melt or tidyr::gather. I'd recommend reading the Tidy Data paper for more on this topic.

R ggplot geom_text Aesthetic Length

I'm working with a really big data setcontaining one dummy variable and a factor variable with 14 levels- a sample of which I have posted here. I'm trying to make a stacked proportional bar graph using the following code:
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar(position="fill")+
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
It works great and its almost the plot I need. I just want to add small text labels reporting the number of observations of each factor level. My intuition tells me that something like this should work
Labels<-c("n=1853" , "n=392", "n=181" , "n=80", "n=69", "n=32" , "n=10", "n=6", "n=4", "n=5", "n=3", "n=3", "n=2", "n=1" )
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar(position="fill")+
geom_text(aes(label=Labels,y=.5))+
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
But it spits out a blank graph and the error
Aesthetics must either be length one, or the same length as the dataProblems:Labels
this really doesn't make sense to me because I know for a fact that the length of my factor levels is the same length as the number of labels I muscled in. I've been trying to figure out how I can get it to just print what I need without creating a vector of values for the number of observations like this example, but no matter what I try I always get the same Aesthetics error.
How about this:
library(dplyr)
# Create a separate data frame of counts for the count labels
counts = data %>% group_by(factor) %>%
summarise(n=n()) %>%
mutate(dummy=NA)
counts$factor = factor(counts$factor, levels=0:10)
ggplot(data, aes(factor(factor), fill=factor(dummy))) +
geom_bar(position="fill") +
geom_text(data=counts, aes(label=n, x=factor, y=-0.03), size=4) +
ylab("Proportion")+
theme(axis.title.y=element_text(angle=0))
Your method is the right idea, but Labels needs to be a data frame, rather than a vector. geom_text needs to be given the name of the data frame using the data argument. Then, the label argument inside aes tells geom_text which column to use for the labels. Also, even though geom_text doesn't use the dummy column, it has to be in the data frame or you'll get an error.

Making ordered heat maps in qplot (ggplot2)

I am making heat maps from correlations. I have two columns that represent ID's and a third column that gives the correlation between those two datapoints. I am struggling to get qplot to keep the order of my data in the file. Link to data:
https://www.dropbox.com/s/3l9p1od5vjt0p4d/SNPS.txt?n=7399684
Here is the code I am using to make the plot:
test <- qplot(x=x, y=y, data=PCIT, fill = col1, geom = "tile")
I have tried several order options but they don't seem to do the trick? Ideas?
Thanks and Happy Holidays
You need to set the levels of the factors x and y to be in the order you want them (as they come in from the file). Try
PCIT$x <- factor(PCIT$x, levels=unique(as.character(PCIT$x)))
and similarly with y.

Ordering the bars of a stacked bar graph in ggplot from least to greatest

Is there a way to specify that I want the bars of a stacked bar graph in with ggplot ordered in terms of the total of the four factors from least to greatest? (so in the code below, I want to order by the total of all of the variables) I have the total for each x value in a dataframe that that I melted to create the dataframe from which I formed the graph.
The code that I am using to graph is:
ggplot(md, aes(x=factor(fullname), fill=factor(variable))) + geom_bar()
My current graph looks like this:
http://i.minus.com/i5lvxGAH0hZxE.png
The end result is I want to have a graph that looks a bit like this:
http://i.minus.com/kXpqozXuV0x6m.jpg
My data looks like this:
(source: minus.com)
and I melt it to this form where each student has a value for each category:
melted data http://i.minus.com/i1rf5HSfcpzri.png
before using the following line to graph it
ggplot(data=md, aes(x=fullname, y=value, fill=variable), ordered=TRUE) + geom_bar()+ opts(axis.text.x=theme_text(angle=90))
Now, I'm not really sure that I understand the way Chi does the ordering and if I can apply that to the data from either of the frames that I have. Maybe it's helpful that that the data is ordered in the original data frame that I have, the one that I show first.
UPDATE: We figured it out. See this thread for the answer:
Order Stacked Bar Graph in ggplot
I'm not sure about the way your data were generated (i.e., whether you use a combination of cast/melt from the reshape package, which is what I suspect given the default name of your variables), but here is a toy example where sorting is done outside the call to ggplot. There might be far better way to do that, browse on SO as suggested by #Andy.
v1 <- sample(c("I","S","D","C"), 200, rep=T)
v2 <- sample(LETTERS[1:24], 200, rep=T)
my.df <- data.frame(v1, v2)
idx <- order(apply(table(v1, v2), 2, sum))
library(ggplot2)
ggplot(my.df, aes(x=factor(v2, levels=LETTERS[1:24][idx], ordered=TRUE),
fill=v1)) + geom_bar() + opts(axis.text.x=theme_text(angle=90)) +
labs(x="fullname")
To sort in the reverse direction, add decr=TRUE with the order command. Also, as suggested by #Andy, you might overcome the problem with x-labels overlap by adding + coord_flip() instead of the opts() option.

Resources