I am hoping to make a stacked bar plot to show two factors. The questions and answers I can find on this site that address this problem all work with data that appears to be in a matrix format and use ggplot2. My data is in lists of observations, like this:
mydata = data.frame(V1=c("A","B","B","C","C"), V2=c("X","X","Y","Z","Z"))
I would like to show categories of V1 on the x axis of my plot, but stacked to show the proportions of V2 in each bar.
I can use the "count" function in the plyr library to find the frequency of each observation,
library(plyr)
mydata.count = count(mydata)
but I don't know how to structure my barplot command to group data by the level of V1: barplot(mydata.count$freq) separates all combinations of V1 and V2 into separate bars.
If possible, I would like to create this plot using the base R barplot functions so that it is visually consistent with other plots in my study.
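For the base R route, one possible sketch is to cross-tabulate the two columns and pass the resulting table to barplot(), which stacks the rows within each column by default. The object names tab and prop are only illustrative, and the axis labels are assumptions about what the final plot should show.

```r
mydata <- data.frame(V1 = c("A", "B", "B", "C", "C"),
                     V2 = c("X", "X", "Y", "Z", "Z"))

# Rows become the stacked segments (V2); columns become the bars (V1)
tab <- table(mydata$V2, mydata$V1)
barplot(tab, legend.text = TRUE, xlab = "V1", ylab = "Count")

# Column-wise proportions, so every bar has height 1
prop <- prop.table(tab, margin = 2)
barplot(prop, legend.text = TRUE, xlab = "V1", ylab = "Proportion")
```

Because barplot() only needs a matrix or table, this avoids the plyr count() step entirely.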
Here is another possibility with ggplot:
library(ggplot2)
ggplot(as.data.frame(table(mydata)), aes(x=V1, y=Freq, fill=V2)) + geom_bar(stat="identity")  # V1 on the x axis, stacked by V2
ggplot(as.data.frame(table(mydata)), aes(x=V2, y=Freq, fill=V1)) + geom_bar(stat="identity")  # V2 on the x axis, stacked by V1
I have a stacked barchart that looks like this.
I have a second dataframe with the same layout as the one that created the plot. I want to group both datasets by position while still keeping the stacked percentages. How would I go about this? I'm not sure how to do it in ggplot2.
It is hard to say without seeing the data and without knowing more about what you actually want to achieve, but the general approach I would use is to combine your dataframes, especially if the variables are the same. Just make sure to record which dataset each row came from; that becomes your identifying column.
So, if your data is in myData1 and myData2:
# add identifying columns
myData1$id <- 'dataset1'
myData2$id <- 'dataset2'
# put them together
newData <- rbind(myData1, myData2)
It is not clear what you are looking for in the combined plot, so there are several ways to go about it, depending on what you want to show. Perhaps the simplest is to use facet_grid() or facet_wrap() from ggplot2 to show the datasets in side-by-side panels:
ggplot(newData, aes(x=name, y=value)) +
  geom_col(aes(fill=gene)) +
  facet_wrap(~id)
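Since the question's data is not shown, here is a self-contained sketch of the combine-then-facet idea on made-up numbers. The values in myData1 and myData2 are hypothetical; the column names name, value and gene are taken from the plotting call above. Using position = "fill" is an assumption about what "keeping the stacked percentages" means.

```r
library(ggplot2)

# Hypothetical toy data; the columns mirror the plotting call above
myData1 <- data.frame(name  = rep(c("s1", "s2"), each = 2),
                      gene  = rep(c("g1", "g2"), times = 2),
                      value = c(10, 30, 25, 25))
myData2 <- data.frame(name  = rep(c("s1", "s2"), each = 2),
                      gene  = rep(c("g1", "g2"), times = 2),
                      value = c(5, 15, 40, 10))

# Tag each row with its origin, then stack the two data frames
myData1$id <- "dataset1"
myData2$id <- "dataset2"
newData <- rbind(myData1, myData2)

# position = "fill" rescales every bar to height 1, so segments read as proportions
p <- ggplot(newData, aes(x = name, y = value, fill = gene)) +
  geom_col(position = "fill") +
  facet_wrap(~id)
print(p)
```

To place the two datasets next to each other for each position instead, swapping the roles (x = id with facet_wrap(~name)) is one option.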
I want to reduce clutter when plotting bars for different categories. That is, I use facets to compare the same categorical variables across other categorical variables.
For example, I use the tips dataset from reshape2:
library(reshape2)
library(ggplot2)
ggplot(tips, aes(x=time)) +
geom_bar(shape=1) +
facet_grid(. ~ sex)
The result is:
My desired change is that "Dinner" and "Lunch" only appear below the "Female" facet. I tried
scale_x_discrete(labels = c("with", "without", "", ""))
but of course without effect since there are only two categories within the variable time, so why take more than two elements in the labels vector? How can I accomplish my desired graph without the "draw two graphs and combine them"-workaround?
You can modify components of a ggplot using ggplot_build and ggplot_gtable:
# p is the faceted plot from the question, saved to a variable
x <- ggplot_gtable(ggplot_build(p))
If you look at str(x), you can then figure out where to change labels:
x$grobs[[8]]$children$axis$grobs[[2]]$label <- c('', '')
plot(x)
However, it's important to note that this may not work with future versions of ggplot2 if they decide to change the internal structure of plots.
I'm struggling with making a graph of proportion of a variable across a factor in ggplot.
Taking mtcars data as an example and stealing part of a solution from this question I can come up with
library(ggplot2)
library(scales)  # provides percent_format()
ggplot(mtcars, aes(x = as.factor(cyl))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent_format())
This graph gives me proportion of each cyl category in the whole dataset.
What I'd like to get though is the proportion of cars in each cyl category, that have automatic transmission (binary variable am).
On top of each bar I would like to add an error bar for the proportion.
Is it possible to do it with ggplot only? Or do I have to first prepare a data frame with summaries and use it with identity option of bar graphs?
I found some examples on Cookbook for R web page, but they deal with continuous y variable.
I think that it would be easier to make new data frame and then use it for plotting. Here I calculated proportions and lower/upper confidence interval values (took them from prop.test() result).
library(plyr)
library(ggplot2)
# one row per cyl: proportion with am == 1 and its confidence interval
mt.new <- ddply(mtcars, .(cyl), summarise,
                prop = sum(am)/length(am),
                low = prop.test(sum(am), length(am))$conf.int[1],
                upper = prop.test(sum(am), length(am))$conf.int[2])
ggplot(mt.new, aes(as.factor(cyl), y = prop, ymin = low, ymax = upper)) +
  geom_bar(stat = "identity") +
  geom_errorbar()
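If you would rather not depend on plyr, the same summary can be sketched in base R with split() and lapply(). The names groups and mt.base are illustrative. Wrapping prop.test() in suppressWarnings() is my own choice here, because it can warn about the chi-squared approximation when a group is small.

```r
# Split the am indicator by number of cylinders
groups <- split(mtcars$am, mtcars$cyl)

# One row per cyl group: proportion of am == 1 and its prop.test() interval
mt.base <- do.call(rbind, lapply(names(groups), function(k) {
  am <- groups[[k]]
  ci <- suppressWarnings(prop.test(sum(am), length(am))$conf.int)
  data.frame(cyl = k, prop = mean(am), low = ci[1], upper = ci[2])
}))
```

The resulting data frame has the same prop, low and upper columns, so it can be passed to the same ggplot() call.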
Is there a way to specify that I want the bars of a stacked bar graph in ggplot ordered by the total of the four factors, from least to greatest? (So in the code below, I want to order by the total of all of the variables.) I have the total for each x value in a dataframe that I melted to create the dataframe from which I formed the graph.
The code that I am using to graph is:
ggplot(md, aes(x=factor(fullname), fill=factor(variable))) + geom_bar()
My current graph looks like this:
http://i.minus.com/i5lvxGAH0hZxE.png
The end result is I want to have a graph that looks a bit like this:
http://i.minus.com/kXpqozXuV0x6m.jpg
My data looks like this:
(source: minus.com)
and I melt it to this form where each student has a value for each category:
melted data http://i.minus.com/i1rf5HSfcpzri.png
before using the following line to graph it
ggplot(data=md, aes(x=fullname, y=value, fill=variable), ordered=TRUE) + geom_bar(stat="identity") + theme(axis.text.x=element_text(angle=90))
Now, I'm not really sure that I understand the way Chi does the ordering, or whether I can apply it to the data from either of the frames that I have. Maybe it's helpful that the data is already ordered in the original data frame, the one that I show first.
UPDATE: We figured it out. See this thread for the answer:
Order Stacked Bar Graph in ggplot
I'm not sure how your data were generated (i.e., whether you used a combination of cast/melt from the reshape package, which is what I suspect given the default names of your variables), but here is a toy example where the sorting is done outside the call to ggplot. There might be far better ways to do that; browse SO as suggested by @Andy.
v1 <- sample(c("I","S","D","C"), 200, replace=TRUE)
v2 <- sample(LETTERS[1:24], 200, replace=TRUE)
my.df <- data.frame(v1, v2)
# total count per v2 level, then the ordering from least to greatest
totals <- colSums(table(v1, v2))
idx <- order(totals)
library(ggplot2)
ggplot(my.df, aes(x=factor(v2, levels=names(totals)[idx], ordered=TRUE),
                  fill=v1)) + geom_bar() +
  theme(axis.text.x=element_text(angle=90)) +
  labs(x="fullname")
To sort in the reverse direction, add decreasing=TRUE to the order() call. Also, as suggested by @Andy, you can avoid overlapping x labels by adding + coord_flip() instead of the theme() call.
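As an alternative to building the level order by hand, base R's reorder() can sort factor levels by a per-level summary. This is only a sketch on the same kind of toy data: the seed is arbitrary, and counting rows with length() assumes that the bar total is the number of observations per level.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the toy data is reproducible
v1 <- sample(c("I", "S", "D", "C"), 200, replace = TRUE)
v2 <- sample(LETTERS[1:24], 200, replace = TRUE)
my.df <- data.frame(v1, v2)

# reorder() sorts the levels of v2 by a summary of each level;
# length() gives the number of rows, i.e. the total bar height
my.df$v2 <- reorder(my.df$v2, my.df$v2, FUN = length)

p <- ggplot(my.df, aes(x = v2, fill = v1)) +
  geom_bar() +
  coord_flip() +
  labs(x = "fullname")
print(p)
```

If the bar height comes from a value column rather than row counts, passing that column as the second argument with FUN = sum should give the corresponding ordering.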
I'm having problems making a barplot using ggplot.
I tried different combinations of qplot and ggplot, but I either get a histogram, or it swaps my bars, or it decides to use log scaling.
Using the ordinary plot functions, I would do it like this:
d <- 1/(10:1)
names(d) <- paste("id", 1:10)
barplot(d)
To plot a bar chart in ggplot2, you have to use geom="bar" or geom_bar. Have you tried any of the geom_bar examples on the ggplot2 website?
To get your example to work, try the following:
1. ggplot needs a data.frame as input, so convert your input data into a data.frame.
2. Map your data to aesthetics on the plot using aes(x=x, y=y). This tells ggplot which columns in the data to map to which elements on the chart.
3. Use geom_bar to create the bar chart. In this case, you probably want to tell ggplot that the data is already summarised using stat="identity", since the default is to count the observations at each x value.
(Note that the function barplot that you used in your example is part of base R graphics, not ggplot.)
The code:
d <- data.frame(x=1:10, y=1/(10:1))
ggplot(d, aes(x, y)) + geom_bar(stat="identity")
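To keep the "id 1" to "id 10" labels from the original barplot() call, one possible variation is to store the names as a factor with explicit levels. The names df, id and value are illustrative, and geom_col() is used here as a shorthand for geom_bar(stat="identity").

```r
library(ggplot2)

d <- 1/(10:1)
names(d) <- paste("id", 1:10)

# Fixing the factor levels keeps the bars in the original order;
# otherwise "id 10" would sort right after "id 1"
df <- data.frame(id = factor(names(d), levels = names(d)),
                 value = unname(d))

p <- ggplot(df, aes(x = id, y = value)) + geom_col()
print(p)
```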