Proportional barplot in R

I am trying to create a proportional barplot in R. So far I have managed to do something like this:
library(ggplot2)
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..prop..,fill=color))
This obviously does not work, but neither does this:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..prop..,fill=color,group=1))
or this:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..count../sum(..count..),fill=color))
This works:
ggplot(data=diamonds)+
geom_bar(aes(x=cut,y=..count../sum(..count..),fill=color),position="fill")
But I would like bars to be side by side within a category.
What I want is a proportional barplot without transforming my data beforehand.

I think you need to aggregate first and then use position="dodge":
diamonds2 <- aggregate(carat ~ cut + color, diamonds, length)
ggplot(data = transform(diamonds2, p = ave(carat, cut, FUN = function(x) x/sum(x))),
aes(x = cut, y = p, fill=color))+
geom_bar(stat = "identity", position = "dodge")
The resulting plot:

EDIT after OP's comment
If you want conditional, side-by-side histograms, use geom_bar(stat = "identity", position = "dodge") when you build the conditional histogram with ggplot2 (I display only the first 100 rows of the data for the sake of clarity):
library(ggplot2)
ggplot(data = diamonds[1:100, ], aes(cut, carat, fill = color)) + geom_bar(stat = "identity", position = "dodge")
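For the original goal (proportions within each cut, bars side by side, no pre-aggregation), one option is to compute the proportion from the statistic itself. This is only a minimal sketch and not part of the answer above; it assumes ggplot2 3.3.0 or later, where after_stat() is available.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.3.0 for after_stat().
# count and x are computed by stat_count; ave() sums the counts within each
# x position (each cut), so the bars belonging to one cut add up to 1.
p <- ggplot(diamonds, aes(x = cut, fill = color)) +
  geom_bar(
    aes(y = after_stat(count / ave(count, as.numeric(x), FUN = sum))),
    position = "dodge"
  ) +
  ylab("proportion within cut")
p
```

The as.numeric() call just strips the class ggplot2 puts on discrete positions before they are used as a grouping variable.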

Related

GGplot geom_bar stack items with even spacing

I was wondering if anyone has a solution for me as I would like to visualize a stacked bar chart that kinda looks like this:
This was made with a little data.table and the ggplot code below
library(data.table)
library(ggplot2)
dt <- data.table(id = seq(15), pvalue = c(0.0323616533686601, 0.00405825892193357, 0.00406609088355357, 0.00252697950679603, 0.00277696431629866, 0.0212521760053885, 0.0315721033650767, 0.00716594255390525, 0.00829537987151543, 0.0163753389504665, 0.0328650069220695, 0.0146991756928858, 0.0178425139730873, 0.00345987886149332, 0.0499748920124661))
ggplot(dt, aes(1, id, fill = pvalue)) + geom_bar(stat = 'identity')
But I'm looking for a slight modification. The data has an id column ranging from 1 to 15, this causes every item to have the corresponding size. But I would like to have them the same height/size.
This can be achieved with this bit of code:
ggplot(dt, aes(id, fill = pvalue)) + geom_bar(stat = 'count') + coord_flip()
But when I run this bit, I lose the ability to color them correctly (with scale_fill_gradient2).
Let me know if you find a nice solution :)
I think adding group= is what you are after:
ggplot(dt, aes(y=id, fill = pvalue, group=id)) +
geom_bar()
And if you define y= you don't need to coord_flip()
ps, geom_col() is the same as geom_bar(stat = 'identity')
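If the goal is a single stacked bar whose segments all have the same height, a possible variation on the group= idea is to map y to a constant. This is a sketch under that assumption; the gradient colours are arbitrary placeholders, not from the question.

```r
library(ggplot2)

# Same ids and p-values as in the question, in a plain data frame.
dt <- data.frame(
  id = seq(15),
  pvalue = c(0.0323616533686601, 0.00405825892193357, 0.00406609088355357,
             0.00252697950679603, 0.00277696431629866, 0.0212521760053885,
             0.0315721033650767, 0.00716594255390525, 0.00829537987151543,
             0.0163753389504665, 0.0328650069220695, 0.0146991756928858,
             0.0178425139730873, 0.00345987886149332, 0.0499748920124661)
)

# y = 1 gives every item the same height; group = id keeps the items separate,
# so each stacked segment keeps its own continuous fill value.
p <- ggplot(dt, aes(x = 1, y = 1, fill = pvalue, group = id)) +
  geom_col() +
  scale_fill_gradient(low = "lightblue", high = "steelblue") # placeholder colours
p
```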

Filling stacked/dodged bar with different colors

I'm trying to build a chart combining stacked and dodged to compare two business lines over months on two different KPIs (VOL and NV).
I would like to have something like this:
(https://imgur.com/a/IambH09)
I would like to use 4 different colours, but even with scale_fill_manual it uses just the first two for all the categories.
Do you think it is possible? Otherwise I won't go any further in adjusting the other details.
Thanks
Bruno
This is the result I'm stuck with:
https://imgur.com/a/5RJMMiN
df=data.frame(
SOC=rep(c("ENERGIA","ENERGIE"),each=4),
MESE_RIF=rep(c("2019_01","2019_02")),
CHURN_TYPE=rep(c("VOL","NV"),each=2),
CHURN_RATE=rep(c(1.35,1.14,0.23,0.22,1.49,1.54,0.13,0.10)),
NR_LOST=rep(c(8288,7010,1432,1372,2818,2857,247,186)))
#filling colors
fill <- c("#72A3C9", "#B9DDF1","#F07E27","#FFC786")
#graph
ggplot(df, aes(x = SOC, y = CHURN_RATE, fill = CHURN_TYPE)) +
geom_bar(position = "stack", stat = "identity") + facet_wrap( ~ MESE_RIF) +
geom_text(aes(label = NR_LOST), size=4,
position=position_stack(vjust = 0.5)) + scale_fill_manual(values=fill)
You have four fill colors, but fill is being mapped to CHURN_TYPE, which has only two values.
One approach could be to map fill to the combination of CHURN_TYPE and SOC, like this:
ggplot(df, aes(x = SOC, y = CHURN_RATE,
fill = interaction(CHURN_TYPE, SOC))) +
...
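A fuller version of the interaction() idea might look like the sketch below. The GROUP column name is hypothetical, and which colour goes with which combination is an assumption; naming the colour vector simply ties each colour to a level regardless of level order.

```r
library(ggplot2)

# Data from the question, with the recycling written out explicitly.
df <- data.frame(
  SOC        = rep(c("ENERGIA", "ENERGIE"), each = 4),
  MESE_RIF   = rep(c("2019_01", "2019_02"), times = 4),
  CHURN_TYPE = rep(rep(c("VOL", "NV"), each = 2), times = 2),
  CHURN_RATE = c(1.35, 1.14, 0.23, 0.22, 1.49, 1.54, 0.13, 0.10),
  NR_LOST    = c(8288, 7010, 1432, 1372, 2818, 2857, 247, 186)
)

# One fill level per CHURN_TYPE/SOC combination (4 levels in total).
# GROUP is a hypothetical helper column.
df$GROUP <- interaction(df$CHURN_TYPE, df$SOC)

# Named colours; the colour-to-combination assignment is an assumption.
fill <- c(
  "VOL.ENERGIA" = "#72A3C9", "NV.ENERGIA" = "#B9DDF1",
  "VOL.ENERGIE" = "#F07E27", "NV.ENERGIE" = "#FFC786"
)

p <- ggplot(df, aes(x = SOC, y = CHURN_RATE, fill = GROUP)) +
  geom_col(position = "stack") +
  geom_text(aes(label = NR_LOST), size = 4,
            position = position_stack(vjust = 0.5)) +
  facet_wrap(~ MESE_RIF) +
  scale_fill_manual(values = fill)
p
```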

How to label stacked histogram in ggplot

I am trying to add labels corresponding to the colors in the bars of a histogram. Here is reproducible code:
ggplot(aes(displ),data =mpg) + geom_histogram(aes(fill=class),binwidth = 1,col="black")
This code gives a histogram and uses a different color for each car "class" in the histogram bars. But is there any way I can add the labels of the "class" inside the corresponding colors in the graph?
The inbuilt functions geom_histogram and stat_bin are perfect for quickly building plots in ggplot. However, if you are looking to do more advanced styling it is often required to create the data before you build the plot. In your case you have overlapping labels which are visually messy.
The following code builds a binned frequency table for the data frame:
library(ggplot2)  # provides the mpg data set
library(plyr)     # ddply
library(reshape2) # melt
# Subset data
mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)
# Optional: inspect the raw displ/class frequencies in long format
melt(table(mpg_df[, c("displ", "class")]))
# Bin Data
breaks <- 1
cuts <- seq(0.5, 8, breaks)
mpg_df$bin <- .bincode(mpg_df$displ, cuts)
# Count the data
mpg_df <- ddply(mpg_df, .(class, bin), nrow)
names(mpg_df) <- c("class", "bin", "Freq")
You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
position=position_stack(vjust=0.5), colour="black")
I don't think it makes a lot of sense duplicating the labels, but it may be more useful showing the frequency of each group:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
position=position_stack(vjust=0.5), colour="black")
Update
I realised you can actually filter the labels selectively using ggplot's computed variable ..count.., so there is no need to preformat the data!
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", position=position_stack(vjust=0.5), aes(label=ifelse(..count..>4, ..count.., "")))
This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)
This second approach will only work if you want to label the dataset with the counts. If you want to label the dataset by the class or another parameter, you will have to prebuild the data frame using the first method.
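In newer ggplot2 releases the same count-based labelling can be written with after_stat() instead of the ..count.. notation. A minimal sketch, assuming ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Assumes ggplot2 >= 3.3.0 for after_stat().
# The label is the bin count, shown only when a segment holds more than 4 cars.
p <- ggplot(mpg, aes(x = displ, fill = class)) +
  geom_histogram(binwidth = 1, colour = "black") +
  stat_bin(
    aes(label = after_stat(ifelse(count > 4, count, ""))),
    binwidth = 1, geom = "text",
    position = position_stack(vjust = 0.5)
  )
p
```

The same limitation applies: only variables computed by the stat (such as count) are available this way.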
Looking at the examples from the other stackoverflow links you shared, all you need to do is change the vjust parameter.
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", vjust=1.5)
That said, it looks like you have other issues. Namely, the labels stack on top of each other because there aren't many observations at each point. Instead I'd just let people use the legend to read the graph.

Individual Gradient Fill with Facet Wrap

I'm trying to achieve an output where the fill gradient is independent for each histogram. I know I could make individual plots and then combine them using grid.arrange, but I want this to work on a data set with any number of columns.
Any help is appreciated.
P.S. I would include an image but I don't have the reputation points.
library(ggplot2)
library(reshape2) # melt
var_his <- function(this_data){
this_data <- melt(this_data)
ggplot(this_data, aes(x = value)) +
geom_histogram(aes(x = value, y = ..density.., fill = ..count..), position="identity") +
facet_wrap(~variable, scales = "free") +
scale_fill_gradient('count', low='lightblue', high='steelblue')
}
data(Seatbelts)
data <- data.frame(Seatbelts)
var_his(data)
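One possible direction, offered only as a sketch: rescale the bin counts within each facet, so every panel spans the full gradient. It assumes ggplot2 3.3.0 or later and that the PANEL column of the computed layer data can be referenced inside after_stat(). It uses base R's stack() in place of melt().

```r
library(ggplot2)

# Long format with base R: stack() returns the columns `values` and `ind`.
long <- stack(data.frame(Seatbelts))

# Assumption: PANEL (the facet id in the computed layer data) is visible
# inside after_stat(). Dividing each bin count by the largest count in its
# own panel puts every facet on the same 0-1 fill range.
p <- ggplot(long, aes(x = values)) +
  geom_histogram(
    aes(y = after_stat(density),
        fill = after_stat(ave(count, PANEL, FUN = function(v) v / max(v)))),
    bins = 30
  ) +
  facet_wrap(~ ind, scales = "free") +
  scale_fill_gradient("relative count", low = "lightblue", high = "steelblue")
p
```

The legend then shows the count relative to each panel's tallest bin rather than the raw count.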

What is the simplest method to fill the area under a geom_freqpoly line?

The x-axis is time broken up into time intervals. There is an interval column in the data frame that specifies the time for each row. The column is a factor, where each interval is a different factor level.
Plotting a histogram or line using geom_histogram and geom_freqpoly works great, but I'd like to have a line, like that provided by geom_freqpoly, with the area filled.
Currently I'm using geom_freqpoly like this:
ggplot(quake.data, aes(interval, fill=tweet.type)) + geom_freqpoly(aes(group = tweet.type, colour = tweet.type)) + theme(axis.text.x=element_text(angle=-60, hjust=0, size = 6))
I would prefer to have a filled area, such as provided by geom_density, but without smoothing the line:
The geom_area has been suggested, is there any way to use a ggplot2-generated statistic, such as ..count.., for the geom_area's y-values? Or, does the count aggregation need to occur prior to using ggplot2?
As stated in the answer, geom_area(..., stat = "bin") is the solution:
ggplot(quake.data, aes(interval)) + geom_area(aes(y = ..count.., fill = tweet.type, group = tweet.type), stat = "bin") + theme(axis.text.x=element_text(angle=-60, hjust=0, size = 6))
produces:
Perhaps you want:
geom_area(aes(y = ..count..), stat = "bin")
geom_ribbon can be used to produce a filled area between two lines without needing to explicitly construct a polygon. There is good documentation here.
ggplot(quake.data, aes(interval, fill=tweet.type, group = 1)) + geom_density()
But I don't think this is a meaningful graphic.
I'm not entirely sure what you're aiming for. Do you want a line or bars? You should check out geom_bar for filled bars. Something like:
p <- ggplot(data, aes(x = time, y = count))
p + geom_bar(stat = "identity")
If you want a line filled in underneath then you should look at geom_area which I haven't personally used but it appears the construct will be almost the same.
p <- ggplot(data, aes(x = time, y = count))
p + geom_area()
Hope that helps. Give some more info and we can probably be more helpful.
Actually, I would add an index column (just the row number of the data), use that as x, and then use
p <- ggplot(data, aes(x = index, y = count))
p + geom_bar(stat = "identity") + scale_x_continuous("Intervals",
breaks = index, labels = intervals)
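The pre-aggregated route mentioned above (one count per interval, then geom_area) can be sketched as follows. quake.data is not available here, so the toy data frame and its values are made up purely for illustration; only the column names follow the question.

```r
library(ggplot2)

# Hypothetical stand-in for quake.data: one row per tweet, with an interval
# factor and a tweet type. The values are invented for illustration.
toy <- data.frame(
  interval = factor(c("t1", "t1", "t2", "t2", "t2", "t3", "t3", "t1", "t2", "t3"),
                    levels = c("t1", "t2", "t3")),
  tweet.type = c("a", "b", "a", "a", "b", "a", "b", "a", "b", "b")
)

# Aggregate before plotting: one row per interval/type pair, zeros included.
counts <- as.data.frame(
  table(interval = toy$interval, tweet.type = toy$tweet.type),
  responseName = "count"
)

# group = tweet.type lets geom_area connect points across the discrete x axis;
# stat = "identity" plots the precomputed counts as they are.
p <- ggplot(counts, aes(x = interval, y = count,
                        fill = tweet.type, group = tweet.type)) +
  geom_area(stat = "identity", position = "identity", alpha = 0.5)
p
```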
