Creating a grouped bar plot in R using barplot() from raw data - r

I have the following data:
        CT           VT           TT
A*      5.923076923  6.529411765  5.305555556
Not A*  5.555555556  6.434782609  5.352941176
I want to make a grouped bar chart in R from the data such that the grouping is on A* and Not A*, the x-axis ticks are CT, VT and TT and the numeric values are plotted in the y-direction.
What do I need to do to produce the bar plot from this raw .csv data?

Please provide a reproducible example next time. That said, here is how to create the desired bar plot with ggplot2.
Before jumping into the main body, make sure you have the required packages installed:
install.packages(c("ggplot2", "reshape2"))
Now for a stacked bar chart:
library(ggplot2)
library(reshape2)
data <- data.frame(CT = c(5.923076923, 5.555555556),
                   VT = c(6.529411765, 6.434782609),
                   TT = c(5.305555556, 5.352941176))
rownames(data) <- c("A*", "Not A*")
# reshape2::melt() on a matrix returns the columns Var1 (row names), Var2 (column names) and value
long_format <- melt(as.matrix(data))
ggplot(long_format, aes(x = Var2,
                        y = value,
                        fill = Var1)) +
  geom_col()
A grouped bar chart:
ggplot(data = long_format,
       aes(x = Var2,
           y = value,
           fill = Var1)) +
  geom_bar(position = "dodge",
           stat = "identity")
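Since the question asks about barplot() specifically, here is a minimal base R sketch of the same grouped chart. It assumes the CSV has a leading column of group labels; the header name "group" and the inline csv_text are made up for illustration, and with a real file you would pass its path to read.csv() instead of the text argument.

```r
# Hypothetical CSV content; the header name "group" is an assumption.
csv_text <- "group,CT,VT,TT
A*,5.923076923,6.529411765,5.305555556
Not A*,5.555555556,6.434782609,5.352941176"

# Use the first column as row names so only the numeric columns remain.
df <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)
m <- as.matrix(df)

# beside = TRUE draws the rows side by side within each column (CT, VT, TT).
mids <- barplot(m, beside = TRUE, legend.text = rownames(m),
                ylab = "Value", ylim = c(0, 8))
```

barplot() groups by column, so the rows of the matrix (A* and Not A*) become the bars within each group. The returned mids matrix holds the bar midpoints, which is handy if you want to add text labels afterwards.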

Related

Variable distances between bars on ggplot bar plot

I have the following bar plot created with ggplot2 in R. How do I set the distances between the bars dynamically, using the 'distance' column of the same data frame?
library(tidyverse)
data.frame(name = c("A", "B", "C", "D", "E"),
           value = c(34, 45, 23, 45, 75),
           distance = c(3, 4, 1, 2, 5)) %>%
  ggplot(aes(x = name, y = value)) +
  geom_col()

How can I draw grouped bar charts, segmented bar plot and spine plot using ggplot2?

I'm trying to draw a grouped bar plot in R.
Here is my code:
xtable <- xtabs(~ view + grade, data=hs)
xtable
barplot(xtable, beside = T, legend.text = T)
library(reshape2)
data.m <- melt(xtable, id.vars='view')
data.m
# plot
ggplot(data.m, aes(grade, value)) +
  geom_bar(aes(fill = view),
           width = 0.4, position = "dodge", stat = "identity") +
  theme(legend.position = "top",
        legend.title = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank())
View and grade are two properties of homes sold. Grade is a value from 0 to 13 giving the rank of a home, and view is a value from 0 to 4 showing how good the view from the home is.
The usual barplot command in R works okay. However, I would like a ggplot2 version of it.
I followed the answer of similar questions, but I get a stacked bar instead of a grouped one. Also, how can I generate a segmented bar plot and spine plot using the same data?
Your code treats view as continuous, which it is not. Convert it to a factor.
library(ggplot2)
library(reshape2)
hs <- read.csv(file = file.choose())
xtable <- xtabs(formula = (~ view + grade),
                data = hs)
data.m <- melt(data = xtable,
               id.vars = 'view')
ggplot(data = data.m,
       mapping = aes(x = grade,
                     y = value)) +
  geom_bar(mapping = aes(fill = factor(x = view)),
           position = "dodge",
           stat = "identity")
This generates a grouped bar chart, which you can modify later to make it look nicer.
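The question also asks about a segmented bar plot and a spine plot, which the answer above does not cover. Here is a rough base R sketch of both. The hs data is not available, so the data frame below is simulated and its values are made up; only the column names view and grade come from the question.

```r
# Simulated stand-in for the hs data (values are made up for illustration).
set.seed(1)
hs <- data.frame(view  = sample(0:4, 200, replace = TRUE),
                 grade = sample(5:9, 200, replace = TRUE))

# Counts with grade in rows and view in columns.
tab <- xtabs(~ grade + view, data = hs)

# Segmented bar plot: one bar per grade, segments show the share of each view.
shares <- prop.table(tab, margin = 1)
barplot(t(shares), legend.text = TRUE, xlab = "grade", ylab = "proportion")

# Spine plot: same shares, but bar widths reflect how many homes have each grade.
spineplot(tab)
```

In ggplot2, swapping position = "dodge" for position = "fill" in the geom_bar() call should give the segmented version of the chart.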

ggplot2 boxplot medians aren't plotting as expected

So, I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot using geom_boxplot. The following produces what appears to be a reasonable plot:
require(reshape2)
require(ggplot2)
require(scales)
require(grid)
require(gridExtra)
df <- read.csv("\\Downloads\\boxplot.csv", na.strings = "*")
df$year <- factor(df$year, levels = c(2010,2011,2012,2013,2014), labels = c(2010,2011,2012,2013,2014))
d <- ggplot(data = df, aes(x = year, y = value)) +
  geom_boxplot(aes(fill = station)) +
  facet_grid(station ~ .) +
  scale_y_continuous(limits = c(0, 15)) +
  theme(legend.position = "none")
d
However, when you dig a little deeper, problems creep in that freak me out. When I labeled the boxplot medians with their values, the following plot results.
df.m <- aggregate(value~year+station, data = df, FUN = function(x) median(x))
d <- d + geom_text(data = df.m, aes(x = year, y = value, label = value))
d
The medians plotted by geom_boxplot aren't at the medians at all. The labels are plotted at the correct y-axis values, but the middle lines of the boxplots are definitely not at the medians. I've been stumped by this for a few days now.
What is the reason for this? How can this type of display be produced with correct medians? How can this plot be debugged or diagnosed?
The solution to this question lies in the use of scale_y_continuous. ggplot2 performs operations in the following order:
1. Scale transformations
2. Statistical computations
3. Coordinate transformations
In this case, because limits are set on the scale, ggplot2 drops data outside those limits before the boxplot statistics are computed. The medians calculated by the aggregate function and used in the geom_text call are based on the entire dataset, however. As a result, the median lines of the boxplots and the text labels can disagree.
The solution is to omit the scale_y_continuous instruction and instead use:
d <- ggplot(data = df, aes(x = year, y = value)) +
  geom_boxplot(aes(fill = station)) +
  facet_grid(station ~ .) +
  theme(legend.position = "none") +
  coord_cartesian(ylim = c(0, 15))
This lets ggplot2 compute the boxplot statistics from the entire dataset while restricting only the visible range of the y-axis.
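To see the difference between the two kinds of limits, here is a small sketch on made-up toy data (not the dataset from the question). It uses layer_data() to read back the statistics that ggplot2 computed for the boxplot layer.

```r
library(ggplot2)

# Toy data (made up): two values lie above the upper limit of 15.
toy <- data.frame(g = "a", value = c(1, 2, 3, 20, 30))

p <- ggplot(toy, aes(x = g, y = value)) + geom_boxplot()

# Scale limits: out-of-range values are dropped before the statistics are computed.
p_scale <- p + scale_y_continuous(limits = c(0, 15))
# Coordinate limits: statistics use all the data; only the view is cropped.
p_coord <- p + coord_cartesian(ylim = c(0, 15))

layer_data(p_scale)$middle  # median of 1, 2, 3
layer_data(p_coord)$middle  # median of all five values
median(toy$value)           # what aggregate() would report
```

The scale-limited plot also emits a warning about removed rows, which is a useful hint when debugging this kind of mismatch.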

faceted piechart with ggplot

I have the following data.frame:
x = data.frame(category=c(1,1,1,1,2,2,2,2), value=c(1,2,1,1,2,2,2,1));
x$category = as.factor(x$category);
x$value = as.factor(x$value);
and I have created a faceted bar chart with ggplot2.
ggplot(x, aes(value, fill=category)) + geom_bar() + facet_wrap(~category);
However, I would like to have a pie chart that shows the fractions of the values (based on the totals for each category). The diagram should then show one pie chart for each category and two slices inside each pie chart, one for each level of value. The real data has up to 6 categories, and I have a few thousand data sets. Is there a generic way to do that?
One way is to calculate the percentage/ratio beforehand and then use it to get the position of the text label. See also "how to put percentage label in ggplot when geom_text is not suitable?"
library(plyr)      # ddply
library(reshape2)  # melt
library(ggplot2)
# Your data
y = data.frame(category = c(1, 1, 1, 1, 2, 2, 2, 2), value = c(2, 2, 1, 1, 2, 2, 2, 1))
# get counts and melt it
data.m = melt(table(y))
names(data.m)[3] = "count"
# calculate percentage:
m1 = ddply(data.m, .(category), summarize, ratio = count / sum(count))
# order data frame (needed to comply with percentage column):
m2 = data.m[order(data.m$category), ]
# combine them:
mydf = data.frame(m2, ratio = m1$ratio)
# get positions of percentage labels:
mydf = ddply(mydf, .(category), transform, position = cumsum(count) - 0.5 * count)
# create bar plot
pie = ggplot(mydf, aes(x = factor(1), y = count, fill = as.factor(value))) +
  geom_bar(stat = "identity", width = 1) +
  facet_wrap(~category)
# make a pie
pie = pie + coord_polar(theta = "y")
# add labels
pie + geom_text(aes(label = sprintf("%1.2f%%", 100 * ratio), y = position))
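The ratio step can also be sketched in base R alone, which may be easier to follow than the ddply calls. This is an alternative illustration on the same sample data, not the method used in the answer; the names counts, ratios and ratio_df are made up.

```r
y <- data.frame(category = c(1, 1, 1, 1, 2, 2, 2, 2),
                value    = c(2, 2, 1, 1, 2, 2, 2, 1))

# Count each (category, value) pair, then divide by the category totals.
counts <- table(y)
ratios <- prop.table(counts, margin = 1)

# Long format with one row per category/value pair.
ratio_df <- as.data.frame(ratios, responseName = "ratio")
ratio_df
```

For the label positions, recent ggplot2 versions also offer geom_text(position = position_stack(vjust = 0.5)), which centers each label inside its stacked segment without computing cumulative sums by hand.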

how to append line graph above barplot in ggplot

I have a data frame with the columns age and gender (Male/Female). I want to draw a grouped bar plot by age and add a line plot of the ratio of males to females at each age.
test is the data frame with age and gender as columns.
ratio_df is a new data frame that stores the ratio of males to females at each age:
ratio_df <- ddply(test, 'age', function(x) c('ratio' = sum(test$gender == 'Male') / sum(test$gender == 'Female')))
The bar plot with the ratio line in ggplot:
ggplot(data = test, aes(x = factor(age), fill = gender)) + geom_bar() + geom_line(data = ratio_df, aes(x = age, y = ratio))
As mentioned above, your ddply call seems off to me: I think it always yields the same ratio (over the whole data frame). I could not come up with a compact, elegant version off the top of my head, so I resorted to a somewhat clunky one, but it does work.
EDIT: I changed the code to reflect the workaround described at http://rwiki.sciviews.org/doku.php?id=tips:graphics-ggplot2:aligntwoplots to address the OP's comment.
#sample data
test=data.frame(gender=c("m","m","f","m","f","f","f"),age=c(1,3,4,4,3,4,4))
require(plyr)
age_N <- ddply(test, c("age","gender"), summarise, N=length(gender))
require(reshape2)
ratio_df <- dcast(age_N, age ~ gender, value.var="N", fill=0)
ratio_df$ratio <- ratio_df$m / (ratio_df$f+ratio_df$m)
#create variables for facetting
test$panel = rep("Distribution",length(test$gender))
ratio_df$panel = rep("Ratio",length(ratio_df$ratio))
test$panel <- factor(test$panel,levels=c("Ratio","Distribution"))
require(ggplot2)
g <- ggplot(data = test, aes(x = factor(age)))
g <- g + facet_wrap(~panel, scales = "free", ncol = 1)
g <- g + geom_line(data = ratio_df, aes(x = factor(age), y = ratio, group=1))
g <- g + geom_bar(aes(fill=gender))
print(g)
Is this what you are looking for? However, I think @SvenHohenstein is right that the line does not add any information, as the split is evident from the fill.
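The reason the original ddply call returns the same value for every age is that its anonymous function refers to test rather than to its argument x, so each group sees the whole data frame. As a rough base R alternative (a sketch, not the code from the answer), the per-age share of males can be computed with tapply(); the name male_share is made up.

```r
test <- data.frame(gender = c("m", "m", "f", "m", "f", "f", "f"),
                   age    = c(1, 3, 4, 4, 3, 4, 4))

# Share of males within each age group: the mean of a logical vector
# is the proportion of TRUE values.
male_share <- tapply(test$gender == "m", test$age, mean)

ratio_df <- data.frame(age   = as.numeric(names(male_share)),
                       ratio = as.vector(male_share))
ratio_df
```

This matches the m / (f + m) definition used in the answer. A plain male-to-female ratio would divide by zero for any age without females, which is one reason to prefer the share.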

Resources