how to get a stacked bar chart for a dummy variable - r

My columns in a dataset look like this:
teacher student
y n
y n
y y
y n
y n
n n
n n
n y
y y
n y
y n
I used
barchart(data$teacher)
for a graph of teacher, which shows the frequencies of y and n in two separate bars. Now I want to show y and n stacked for both variables, with one bar per variable. I tried several things, such as chart.StackedBar, but none of them worked. Thanks for any help!

Read the man pages, bro. This is what you want:
# table() sorts the levels alphabetically, so each column holds the counts in the order n, y
barplot(matrix(c(table(df$teacher), table(df$student)), ncol=2),
    col=c('red', 'blue'),
    names.arg=c('teacher', 'student'),
    legend.text=c('n', 'y'))
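For reference, here is a small sketch of the same base-graphics idea using the data from the question. The data frame name dat and the explicit factor levels are assumptions made for illustration; fixing the level order keeps the rows of the count matrix lined up with the legend.

```r
# Data from the question (the name 'dat' is an assumption for this sketch)
dat <- data.frame(
  teacher = c("y", "y", "y", "y", "y", "n", "n", "n", "y", "n", "y"),
  student = c("n", "n", "y", "n", "n", "n", "n", "y", "y", "y", "n"),
  stringsAsFactors = FALSE
)

# Count n and y per column; fixing the levels makes the row order explicit
counts <- sapply(dat, function(col) table(factor(col, levels = c("n", "y"))))

# One stacked bar per variable; the legend labels come from the row names
barplot(counts, col = c("red", "blue"), legend.text = rownames(counts))
```

Because the legend is taken from rownames(counts), the labels cannot drift out of step with the colours.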

EDIT: based on your comment, is this what you are looking for?
library(reshape2)
tmp <- melt(dat, id.vars = NULL)
names(tmp) <- c('Occupation', 'ID')
# geom_bar() counts the rows for each discrete x value; geom_histogram() expects a continuous x
ggplot(data = tmp, aes(x = Occupation, fill = ID)) + geom_bar()
ORIGINAL:
I've approached this type of graph using ggplot. Here is a simple example:
library(ggplot2)
set.seed(1618)
dat <- data.frame(teacher = sample(c('y','n'), 10, replace = TRUE),
                  student = sample(c('y','n'), 10, replace = TRUE))
# geom_bar() counts the rows for each discrete x value
ggplot(data = dat, aes(x = teacher, fill = student)) + geom_bar()
You might also consider
ggplot(data = dat, aes(x = teacher, fill = student)) +
    geom_bar(alpha = .5, position = 'identity')
Here the bars are overlaid rather than stacked.
I'm not great at ggplot, but hopefully this helps a bit.
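The reshaping step in the EDIT can also be done without reshape2. The sketch below uses base R's stack() on a small made-up data frame (the values are illustrative, not the asker's data) and then draws one stacked bar per variable with geom_bar().

```r
library(ggplot2)

# Small illustrative data set (values are made up for this sketch)
dat <- data.frame(
  teacher = c("y", "y", "n", "y", "n"),
  student = c("n", "y", "n", "n", "y"),
  stringsAsFactors = FALSE
)

# stack() converts the two columns to long format:
# 'values' holds y/n and 'ind' holds the original column name
tmp <- stack(dat)
names(tmp) <- c("ID", "Occupation")

# geom_bar() counts the rows per Occupation and stacks them by ID
p <- ggplot(tmp, aes(x = Occupation, fill = ID)) + geom_bar()
print(p)
```

The column names ID and Occupation simply mirror the ones used in the EDIT above.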

Related

How to draw different line segment with different facets

I have a question about using geom_segment in R ggplot2.
For example, I have three facets and two clusters of points (points that have the same y value) in each facet. How do I draw multiple vertical line segments, one for each cluster, with geom_segment?
Suppose my data is:
x <- (1:24)
y <- (rep(1,2),2,rep(2,2),1,rep(3,2),4, rep(4,1),5,6, ..rep(8,2),7)
facets <-(1,2,3)
factors <-(1,2,3,4,5,6)
xmean <- ( (1+2+3)/3, (4+5+6)/3, ..., (22+23+24)/3)
Note: (1+2+3)/3 is the mean of the first cluster in the first facet, (4+5+6)/3 is the mean of the second cluster in the first facet, and (7+8+9)/3 is the mean of the first cluster in the second facet.
My Code:
ggplot(, aes(x = as.numeric(x), y = as.numeric(y), color = factors)) + geom_point(alpha = 0.85, size = 1.85) + facet_grid(~facets) +
    geom_segment(what should I put here to draw this line in different factors?)
Desired result: [picture not included]
Thank you so much! Have a nice day :).
Maybe this is what you are looking for. Instead of working with vectors, put your data in a data frame. You can then easily build an aggregated data frame with the mean x value per facet and cluster, which makes it easy to draw the segments:
Note: I wasn't sure about the setup of your data. You talk about two clusters per facet, but your data has 8, so I slightly changed the example data.
library(ggplot2)
library(dplyr)
df <- data.frame(
x = 1:24,
y = rep(1:6, each = 4),
facets = rep(1:3, each = 8)
)
df_sum <- df %>%
group_by(facets, y) %>%
summarise(x = mean(x))
#> `summarise()` has grouped output by 'facets'. You can override using the `.groups` argument.
ggplot(df, aes(x, y, color = factor(y))) +
geom_point(alpha = 0.85, size = 1.85) +
geom_segment(data = df_sum, aes(x = x, xend = x, y = y - .25, yend = y + .25), color = "black") +
facet_wrap(~facets)
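If dplyr is not available, the aggregated data frame can be sketched with base R's aggregate(). This is an alternative built on the same example data, not part of the original answer.

```r
# Same example data as in the answer above
df <- data.frame(
  x = 1:24,
  y = rep(1:6, each = 4),
  facets = rep(1:3, each = 8)
)

# Mean x position per facet and cluster (one row per segment to draw)
df_sum <- aggregate(x ~ facets + y, data = df, FUN = mean)
print(df_sum)
```

The result has the columns facets, y and x, so it can be passed to geom_segment(data = df_sum, ...) in the same way.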

How to plot multiple facets histogram with ggplot in r?

I have a data frame structured like this:
Elem  Category  SEZa  SEZb  SEZc
A     ONE       1     3     4
B     TWO       4     5     6
I want to plot three histograms in three different facets (SEZa, SEZb, SEZc) with ggplot, where the x values are the category values (ONE and TWO) and the y values are the numbers in columns SEZa, SEZb, and SEZc.
How can I do this? Thank you for your suggestions!
Assuming df is your data.frame, I would first convert it from wide format to long format:
new_df <- reshape2::melt(df, id.vars = c("Elem", "Category"))
Then make the plot using geom_col() instead of geom_histogram(), because it seems you have precomputed the y values and don't need ggplot to calculate them for you.
ggplot(new_df, aes(x = Category, y = value, fill = Elem)) +
geom_col() +
facet_grid(variable ~ .)
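To make the wide-to-long step concrete, here is a sketch that builds the data frame from the question and reshapes it with base R only. The column names variable and value mirror what reshape2::melt() produces by default; the helper names sez and stacked are assumptions for illustration.

```r
# Data frame from the question
df <- data.frame(
  Elem = c("A", "B"),
  Category = c("ONE", "TWO"),
  SEZa = c(1, 4), SEZb = c(3, 5), SEZc = c(4, 6),
  stringsAsFactors = FALSE
)

sez <- c("SEZa", "SEZb", "SEZc")

# stack() puts the three SEZ columns under each other
stacked <- stack(df[sez])

# Repeat the id columns once per SEZ column so the rows line up
new_df <- data.frame(
  Elem = rep(df$Elem, times = length(sez)),
  Category = rep(df$Category, times = length(sez)),
  variable = stacked$ind,
  value = stacked$values
)
print(new_df)
```

Each row of new_df is now one bar: Category gives the x position, value the height, and variable the facet.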
I think that what you are looking for is something like this:
library(ggplot2)
library(reshape2)
df <- data.frame(Category = c("One", "Two"),
SEZa = c(1, 4),
SEZb = c(3, 5),
SEZc = c(4, 6))
df <- melt(df, id.vars = "Category")
ggplot(df, aes(x = Category, y = value)) +
geom_col(aes(fill = variable)) +
facet_grid(variable ~ .)
My inspiration is:
http://felixfan.github.io/stacking-plots-same-x/

Plotting a time series where color depends on a category with ggplot

Consider this minimum working example:
library(ggplot2)
x <- c(1,2,3,4,5,6)
y <- c(3,2,5,1,3,1)
data <- data.frame(x,y)
pClass <- c(0,1,1,2,2,0)
plottedGraph <- ggplot(data, aes(x = x, y = y, colour = factor(pClass))) + geom_line()
print(plottedGraph)
I have a time series y = f(x) where x is a timestep. Each timestep should have a color which depends on the category of the timestep, recorded in pClass.
This is the result it gives:
It doesn't make any kind of sense to me why ggplot would connect points with the same color together and not points that follow each other (which is what geom_line should do according to the documentation).
How do I make it plot the following:
You should use group = 1 inside aes() to tell ggplot that the different colours in fact belong to the same line (i.e. the same group).
ggplot(data, aes(x = x, y = y, colour = factor(pClass), group = 1)) +
geom_line()
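The effect of group = 1 can be checked by looking at the groups ggplot computes for the layer. The sketch below is only an illustration: it stores pClass as a column of the data frame (an assumption, the question keeps it as a separate vector) and uses layer_data() to inspect the group column.

```r
library(ggplot2)

# Example data from the question, with pClass stored as a column
data <- data.frame(
  x = 1:6,
  y = c(3, 2, 5, 1, 3, 1),
  pClass = c(0, 1, 1, 2, 2, 0)
)

# Without an explicit group, each colour level becomes its own line
p_split <- ggplot(data, aes(x = x, y = y, colour = factor(pClass))) +
  geom_line()

# With group = 1, all points belong to a single line
p_single <- ggplot(data, aes(x = x, y = y, colour = factor(pClass), group = 1)) +
  geom_line()

# layer_data() returns the computed data of a layer, including 'group'
groups_split <- unique(layer_data(p_split)$group)
groups_single <- unique(layer_data(p_single)$group)

print(length(groups_split))
print(length(groups_single))
```

A discrete colour mapping is what splits the data into groups by default, which is why the original plot connected points of the same colour.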

how to append line graph above barplot in ggplot

I have a data frame with the columns age and gender (Male/Female). I want to plot a grouped bar plot by age and add a line plot of the ratio of males to females at each age.
test is a data frame with age and gender as columns.
ratio_df is a new data frame that stores the ratio of males to females at each age:
ratio_df <- ddply(test, 'age', function(x) c('ratio' = sum(test$gender == 'Male') / sum(test$gender == 'Female')))
Bar plot with the ratio line in ggplot:
ggplot(data = test, aes(x = factor(age), fill = gender)) + geom_bar() + geom_line(data = ratio_df, aes(x = age, y = ratio))
As mentioned above, your ddply call seems off to me: I think it always yields the same ratio (over the whole data frame). I could not come up with a compact, elegant version off the top of my head, so I had to resort to a somewhat clunky one, but it does work.
EDIT: I changed the code to reflect the workaround described by http://rwiki.sciviews.org/doku.php?id=tips:graphics-ggplot2:aligntwoplots to address the OP's comment.
#sample data
test=data.frame(gender=c("m","m","f","m","f","f","f"),age=c(1,3,4,4,3,4,4))
require(plyr)
age_N <- ddply(test, c("age","gender"), summarise, N=length(gender))
require(reshape2)
ratio_df <- dcast(age_N, age ~ gender, value.var="N", fill=0)
ratio_df$ratio <- ratio_df$m / (ratio_df$f+ratio_df$m)
#create variables for facetting
test$panel = rep("Distribution",length(test$gender))
ratio_df$panel = rep("Ratio",length(ratio_df$ratio))
test$panel <- factor(test$panel,levels=c("Ratio","Distribution"))
require(ggplot2)
g <- ggplot(data = test, aes(x = factor(age)))
g <- g + facet_wrap(~panel, scales = "free", ncol = 1)
g <- g + geom_line(data = ratio_df, aes(x = factor(age), y = ratio, group=1))
g <- g + geom_bar(aes(fill=gender))
print(g)
Is this what you are looking for? However, I think @SvenHohenstein is right that the line does not add any information, as the split is evident from the fill.
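The per-age counts can also be sketched with base R's table(), which avoids the plyr and reshape2 steps. As in the code above, ratio here is the share of males, m / (f + m); a plain m / f ratio would be Inf for ages with no females. The sample data is the same as in the answer.

```r
# Sample data from the answer above
test <- data.frame(
  gender = c("m", "m", "f", "m", "f", "f", "f"),
  age = c(1, 3, 4, 4, 3, 4, 4)
)

# Cross-tabulate: one row per age, one column per gender
counts <- table(test$age, test$gender)

ratio_df <- data.frame(
  age = as.numeric(rownames(counts)),
  f = as.vector(counts[, "f"]),
  m = as.vector(counts[, "m"])
)

# Share of males at each age, computed per row rather than over the whole data
ratio_df$ratio <- ratio_df$m / (ratio_df$f + ratio_df$m)
print(ratio_df)
```

Because the counts are taken per age, the ratio differs between ages instead of repeating one overall value.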

normalize ggplot histogram so that first height is 1 (to show growth) in R

I was wondering if there is a way to normalize the heights of the histograms with multiple groups so that their first heights are all = 1. For instance:
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..count..))
p + geom_histogram(position = "dodge")
gives a regular histogram with 3 groups.
Also
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..ncount..))
p + geom_histogram(position = "dodge")
gives a histogram where each group is normalized to have a maximum height of 1.
I want to get a histogram where each group is normalized to have a first height of 1 (so I can show growth), but I don't know whether there is an appropriate alternative to ..count.. or ..ncount.. for this. If anyone can help me understand the structure of ..count.., I could maybe figure it out from there.
Thanks!
I bet there is a nice way to do everything within ggplot. However, I tend to prefer preparing the desired data set before I plug it into ggplot. If I understood you correctly, you may try something like this:
# 'data' is the data frame from the question
df <- data
# convert 'results' to factor and set levels to get an equi-spaced 'results' x-axis
df$results <- factor(df$results, levels = 1:7)
# for each category, count frequency of 'results'
df <- as.data.frame(with(df, table(results, category)))
# normalize: for each category, divide all 'Freq' (heights) by the first 'Freq'
df$freq2 <- with(df, ave(Freq, category, FUN = function(x) x/x[1]))
ggplot(data = df, aes(x = results, y = freq2, fill = category)) +
geom_bar(stat = "identity", position = "dodge")
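As a quick check of the normalization, the sketch below repeats the data preparation on the question's data and prints the rows for the first bin. It assumes the first bin is non-empty in every category; otherwise x / x[1] would divide by zero.

```r
# Data from the question
results <- rep(c(1, 1, 2, 2, 2, 3, 1, 1, 1, 3, 4, 4, 2, 5, 7), 3)
category <- rep(c("a", "b", "c"), 15)

# Count each 'results' value per category (levels 1:7 keep empty bins)
df <- data.frame(results = factor(results, levels = 1:7), category)
df <- as.data.frame(with(df, table(results, category)))

# Divide every count by the first count of its category
df$freq2 <- with(df, ave(Freq, category, FUN = function(x) x / x[1]))

# Every category should start at height 1
print(df[df$results == "1", ])
```

The remaining rows then read as growth relative to the first bin, for example 0.5 means half as many observations as in the first bin.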
It looks like ..density.. does what you want, but I can't for the life of me find documentation on it. On both your examples it does what you are looking for, though!
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..density..))
p + geom_histogram(position = "dodge")

Resources