ggplot2 continuous bar plot with multiple factors - r

I'm going to use the diamonds data set that ships with the ggplot2 package to illustrate what I'm looking for.
I want to build a graph that is like this:
library(ggplot2)
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position="dodge")
However, instead of a count, I would like to show the mean of a continuous variable. I'd like to group by cut and color and plot the mean carat. If I put in this code:
ggplot(diamonds, aes(carat, fill=cut)) + geom_bar(position="dodge")
my output is a count of diamonds at each carat value, split by cut.
Anyone know how to do this?

You can get a new data frame with mean(carat) grouped by cut and color and then plot:
library(plyr)
data <- ddply(diamonds, .(cut, color), summarise, mean_carat = mean(carat))
ggplot(data, aes(color, mean_carat,fill=cut))+geom_bar(stat="identity", position="dodge")
If you want faster solutions you can use either dplyr or data.table
With dplyr:
library(dplyr)
data <- diamonds %>% group_by(cut, color) %>% summarise(mean_carat=mean(carat))
With data.table:
library(data.table)
data <- data.table(diamonds)[,list(mean_carat=mean(carat)), by=c('cut', 'color')]
The code for the plot is the same for both.
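If the summary table isn't needed for anything else, ggplot2 can also compute the means while plotting. This is only a minimal sketch, assuming ggplot2 3.3.0 or later (older versions call the argument fun.y instead of fun); the name p_mean is just for illustration:

```r
library(ggplot2)

# Let stat_summary() average carat within each color/cut combination,
# so no separate summary data frame is needed.
p_mean <- ggplot(diamonds, aes(color, carat, fill = cut)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge")
p_mean
```

The bars should match the pre-aggregated versions above; the trade-off is that the means are recomputed every time the plot is drawn.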

Related

Boxplots in ggplot2 R

My goal is to visualize some data frames with ggplot2.
I have several data.frames looking like this
And my goal is a boxplot looking like this, just nicer.
I managed to get single boxplots using
plt <- ggplot(data, aes(RF, data$RF)) +
geom_boxplot()
plt
But that's not what I want.
library(ggplot2)
library(reshape)
airquality_m = melt(airquality)
ggplot(airquality_m, aes(variable, value )) + geom_boxplot()
I did not beautify the plot but I guess you get the idea here.
The boxplot you showed was created with base R graphics. A single command,
boxplot(data), will do it.
If you want to use ggplot, you have to first melt the dataframe and then plot.
library(reshape2)
datPlot <- melt(data)
ggplot(datPlot,aes(variable,value)) + geom_boxplot()
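To see what the melt step does, here is a self-contained sketch on made-up data. Only the column name RF comes from the question; SVM, GLM and the random values are hypothetical stand-ins for the real data frame:

```r
library(ggplot2)
library(reshape2)

# Hypothetical stand-in for the asker's data: one numeric column per model.
# Only RF appears in the question; the other column names are made up.
set.seed(42)
data <- data.frame(RF = runif(30), SVM = runif(30), GLM = runif(30))

# Wide to long: one row per observation, with columns "variable" and "value".
datPlot <- melt(data, measure.vars = names(data))

ggplot(datPlot, aes(variable, value)) + geom_boxplot()
```

Each original column becomes one level of variable, so ggplot draws one box per column.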
I guess this is what you want:
library(ggplot2)
library(reshape)
myddt_m = melt(mydata)
names(myddt_m)=c("Models","CI")
ggplot(myddt_m, aes(Models, CI,fill=Models )) + geom_boxplot()+guides(fill="none")+labs( x="", y="C-Index")

ggplot bar chart for time series

I'm reading Hadley Wickham's book about ggplot, but I'm having trouble plotting certain weights over time in a bar chart. Here is sample data:
dates <- c("20040101","20050101","20060101")
dates.f <- strptime(dates,format="%Y%m%d")
m <- rbind(c(0.2,0.5,0.15,0.1,0.05),c(0.5,0.1,0.1,0.2,0.1),c(0.2,0.2,0.2,0.2,0.2))
m <- cbind(dates.f,as.data.frame(m))
This data.frame has in the first column the dates and each row the corresponding weights. I would like to plot the weights for each year in a bar chart using the "fill" argument.
I'm able to plot the weights as bars using:
p <- ggplot(m,aes(dates.f))
p+geom_bar()
However, this is not exactly what I want. I would like to see in each bar the contribution of each weight. Moreover, I don't understand why I have the strange format on the x-axis, i.e. why there is "2004-07" and "2005-07" displayed.
Thanks for the help
Hope this is what you are looking for:
ggplot2 requires data in a long format.
require(reshape2)
m_molten <- melt(m, "dates.f")
Plotting itself is done by
ggplot(m_molten, aes(x=dates.f, y=value, fill=variable)) +
geom_bar(stat="identity")
You can add position="dodge" to geom_bar if you want them side by side.
EDIT
If you want yearly breaks only: convert m_molten$dates.f to date.
require(scales)
m_molten$dates.f <- as.Date(m_molten$dates.f)
ggplot(m_molten, aes(x=dates.f, y=value, fill=variable)) +
geom_bar(stat="identity") +
scale_x_date(labels = date_format("%y"), breaks = date_breaks("year"))
P.S.: See http://vita.had.co.nz/papers/tidy-data.pdf for Hadley's philosophy of tidy data.
To create the plot you need, you have to reshape your data from "wide" to "tall". There are many ways of doing this, including the reshape() function in base R (not recommended), reshape2 and tidyr.
In the tidyr package you have two functions to reshape data, gather() and spread().
The function gather() transforms from wide to tall. In this case, you have to gather your columns V1:V5.
Try this:
library("tidyr")
tidy_m <- gather(m, var, value, V1:V5)
ggplot(tidy_m,aes(x = dates.f, y=value, fill=var)) +
geom_bar(stat="identity")
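In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape, assuming that version of tidyr is installed; here the dates are stored as Date instead of POSIXlt, and w is just a helper name for the weight matrix:

```r
library(ggplot2)
library(tidyr)

# Rebuild the question's data, storing the dates as Date rather than POSIXlt.
dates <- c("20040101", "20050101", "20060101")
w <- rbind(c(0.2, 0.5, 0.15, 0.1, 0.05),
           c(0.5, 0.1, 0.1, 0.2, 0.1),
           c(0.2, 0.2, 0.2, 0.2, 0.2))
m <- data.frame(dates.f = as.Date(dates, format = "%Y%m%d"), as.data.frame(w))

# Wide to long: V1..V5 collapse into a name column and a value column.
tidy_m <- pivot_longer(m, cols = V1:V5, names_to = "var", values_to = "value")

# geom_col() is shorthand for geom_bar(stat = "identity").
ggplot(tidy_m, aes(x = dates.f, y = value, fill = var)) +
  geom_col()
```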

How to plot graph type='h' in ggplot2

I'm wondering how to plot in ggplot2 something like this:
let's say I've got two numeric vectors:
time <- c(1,3,4,6,9,10,12)
n.censor <- c(0,0,1,4,0,3,1)
and I'd like to plot:
plot(n.censor~time,type='h')
How to achieve something like this in ggplot2 ?
In this case, a "histogram" is the look you want, but the data are coded as if they're for a bar graph, since they are already aggregated. As such, your stat will be "identity".
Here's some code to use for ggplot() :
# first put your vectors into a data.frame
df <- data.frame(time, n.censor)
# plot
ggplot(df, aes(x=time, y=n.censor))+
geom_bar(stat="identity")
# or alternatively, with the histogram layer:
ggplot(df, aes(x=time, y=n.censor))+
geom_histogram(stat="identity")
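If you want thin vertical lines, which is closer to what type='h' draws in base graphics, one option is geom_segment() running from zero up to each value. This is a sketch of that alternative, not the only way to do it; p_h is just an illustrative name:

```r
library(ggplot2)

time <- c(1, 3, 4, 6, 9, 10, 12)
n.censor <- c(0, 0, 1, 4, 0, 3, 1)
df <- data.frame(time, n.censor)

# One vertical segment per observation, from y = 0 up to n.censor.
p_h <- ggplot(df, aes(x = time, y = n.censor)) +
  geom_segment(aes(xend = time, yend = 0))
p_h
```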

Plot results from dist_tab() function from qdap library

I am interested in plotting the results from the following code which produces a frequency distribution table. I would like to graph the Freq column as a bar with the cum.Freq as a line both sharing the interval column as the x-axis.
library("qdap")
x <- c(1,2,3,2,4,2,5,4,6,7,8,9)
dist_tab(x)
I have been able to build the bar chart using ggplot, but I want to take it further by adding cum.Freq on a secondary axis. I also want the percent and cum.percent values added as data labels. Any help is appreciated.
library("ggplot2")
ggplot(dist_tab(x), aes(x=interval)) + geom_bar(aes(y=Freq))
Not sure if I understand your question. Is this what you are looking for?
library(reshape2)
df <- dist_tab(x)
# stack Freq and cum.Freq into one value column, keyed by interval
df.melt <- melt(df, id.vars="interval", measure.vars=c("Freq", "cum.Freq"))
ggplot(df.melt, aes(x=interval, y=value, fill=variable)) +
geom_bar(stat="identity", position="dodge")
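If you would rather have Freq as bars and cum.Freq as a line over the same x-axis, something along these lines may be closer. To keep the sketch runnable without qdap, the table is rebuilt with base R; tab is a hypothetical stand-in for the dist_tab(x) output and has one row per distinct value, which may differ from the intervals dist_tab() produces. Both series share a single y scale here, so this is not a true secondary axis:

```r
library(ggplot2)

x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)

# Stand-in for dist_tab(x): frequency and cumulative frequency per value.
tab <- as.data.frame(table(interval = x))
tab$cum.Freq <- cumsum(tab$Freq)

ggplot(tab, aes(x = interval)) +
  geom_bar(aes(y = Freq), stat = "identity") +
  # group = 1 tells geom_line() to connect points across the factor levels
  geom_line(aes(y = cum.Freq, group = 1)) +
  geom_point(aes(y = cum.Freq))
```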

Using density in stat_bin with factor variables

It seems density plot in stat_bin doesn't work as expected for factor variables. The density is 1 for each category on y-axis.
For example, using diamonds data:
diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]
ggplot(diamonds_small, aes(x = cut)) + stat_bin(aes(y=..density.., fill=cut))
I understand I could use
stat_bin(aes(y=..count../sum(..count..), fill=cut))
to make it work. However, according to the docs of stat_bin, it should work with categorical variables.
You can get what you (might) want by setting the group aesthetic manually.
ggplot(diamonds_small, aes(x = cut)) + stat_bin(aes(y=..density..,group=1))
However, you can't easily fill differently within a group. You can summarize the data yourself:
library(plyr)
dd_dens <- ddply(diamonds_small,.(cut),
function(x) data.frame(dens=nrow(x)/nrow(diamonds_small)))
ggplot(dd_dens,aes(x=cut,y=dens))+geom_bar(aes(fill=cut),stat="identity")
A slightly more compact version of the summarization step:
as.data.frame.table(prop.table(table(diamonds_small$cut)))
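For completeness, a sketch of how the compact version could feed the same plot. The names dd_prop and dens are chosen here for illustration, and set.seed() is only there so the random sample is reproducible:

```r
library(ggplot2)

set.seed(1)  # only so the random sample is reproducible
diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]

# Share of rows per cut; name the dimension and the value column explicitly.
dd_prop <- as.data.frame(prop.table(table(cut = diamonds_small$cut)),
                         responseName = "dens")

ggplot(dd_prop, aes(x = cut, y = dens)) +
  geom_bar(aes(fill = cut), stat = "identity")
```

Naming the table dimension (cut = ...) and passing responseName avoids the default Var1 and Freq column names.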

Resources