Below is the dataset.
https://docs.google.com/spreadsheet/ccc?key=0AjmK45BP3s1ydEUxRWhTQW5RczVDZjhyell5dUV4YlE#gid=0
Code:
counts = table(finaldata$satjob, finaldata$degree)
barplot(counts, xlab="Highest Degree after finishing 9-12th Grade",col = c("Dark Blue","Blueviolet","deepPink4","goldenrod"), legend =(rownames(counts)))
The below barplot is the result of the above code.
https://docs.google.com/file/d/0BzmK45BP3s1yVkx5OFlGQk5WVE0/edit
Now I want to plot the relative frequency table of "counts".
To create a relative frequency table, I need to divide each cell of a column by that column's total, and do the same for every other column. How do I go about doing this?
I tried counts/sum(counts), but this does not work (it divides by the grand total, not by the column totals). counts[1:4]/sum(counts[1:4]) gives me the relative frequencies for the first column only.
Please help me obtain the same for the other columns in the same table.
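For reference, base R can do the per-column division directly. The sketch below is only an illustration: the finaldata rows are made up (the real spreadsheet is not reproduced here), and only the column names satjob and degree come from the question.

```r
# Hypothetical stand-in for finaldata: two categorical columns
finaldata <- data.frame(
  satjob = c("Very satisfied", "Satisfied", "Dissatisfied", "Very satisfied",
             "Satisfied", "Satisfied", "Dissatisfied", "Very satisfied"),
  degree = c("High school", "High school", "High school", "Bachelor",
             "Bachelor", "Graduate", "Graduate", "Graduate")
)

counts <- table(finaldata$satjob, finaldata$degree)

# margin = 2 divides every cell by its column total
rel_freq <- prop.table(counts, margin = 2)

# Same result spelled out with sweep(), for comparison
rel_freq_sweep <- sweep(counts, 2, colSums(counts), "/")

# Each stacked bar now has total height 1
barplot(rel_freq, xlab = "Highest degree", legend.text = rownames(rel_freq))
```

Using margin = 1 instead would divide by row totals, so it is worth checking colSums(rel_freq) once to confirm every column sums to 1.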
I'm a big fan of plyr & ggplot2, so you may have to download a few packages for the below to work.
install.packages('ggplot2') # only have to run once
install.packages('plyr') # only have to run once
install.packages('scales') # only have to run once
library(plyr)
library(ggplot2)
library(scales)
# dat <- YOUR DATA
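# count() from plyr returns one row per degree/satjob combination, with a 'freq' column
dat_count <- count(dat, vars = c('degree', 'satjob'))
# within each degree, divide each frequency by that degree's total
dat_rel_freq <- ddply(dat_count, .(degree), transform, rel_freq = freq/sum(freq))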
ggplot(dat_rel_freq, aes(x = degree, y = rel_freq, fill = satjob)) +
geom_bar(stat = 'identity') +
scale_y_continuous(labels = percent) +
labs(title = 'Highest Degree After finishing 9-12th Grade\n',
x = '',
y = '',
fill = 'Job Satisfaction')
Related
I've searched everywhere but cannot seem to find even a messy / hacked way of creating this plot.
I would like to plot a column chart with:
x = categorical factor, sorted in descending y order
y = numeric variable, summed
fill = categorical factor, sorted in descending y order - BUT having this calculated separately for each occurrence of x.
For example, the below code (using data from datasets) will nearly sort everything as I want, but I cannot for the life of me figure out how to tell ggplot to reorder the fill for each x.
library(tidyverse)
UCBAdmissions <- as.data.frame(UCBAdmissions)
UCBAdmissions$Dept <- as.factor(UCBAdmissions$Dept)
UCBAdmissions$Gender <- as.factor(UCBAdmissions$Gender)
plot <- UCBAdmissions %>%
ggplot(aes(
x = fct_reorder(Dept, Freq, .fun = sum),
y = Freq,
fill = fct_reorder(Gender, Freq, .fun = sum)
)) +
geom_col() + coord_flip() + labs(fill = "gender")
plot
I would like to keep Dept A showing Male closest to the axis, then Female,
but change Dept E to show Female closest (or any Dept where Female > Male).
Any ideas? Open to a messy solution at this point :)
Thanks in advance for your help.
From the position_stack help here:
position_fill() and position_stack() automatically stack values in
reverse order of the group aesthetic
So we can get what you want by mapping the group aesthetic to the frequency. Since the data includes two Admit categories, I did some pre-processing here to combine them first.
Now for each Dept, the stacking order is determined by which Gender has the higher number.
plot <- UCBAdmissions %>%
count(Dept, Gender, wt = Freq) %>% # outputs n = total Freq per Dept/Gender
ggplot(aes(
x = fct_reorder(Dept, n, .fun = sum),
y = n,
group = n,
fill = fct_reorder(Gender, n, .fun = sum)
)) +
geom_col() + coord_flip() + labs(fill = "gender")
plot
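To see what the pre-processing step produces, and which departments should end up with a flipped stacking order, here is a small base R sketch. It uses aggregate() and xtabs() as stand-ins for the count(..., wt = Freq) call above; the variable names ucb, totals, wide and flip are mine.

```r
# UCBAdmissions ships with base R (package 'datasets') as a 3-d table
ucb <- as.data.frame(UCBAdmissions)

# Sum Freq over the two Admit categories: one row per Dept/Gender
totals <- aggregate(Freq ~ Dept + Gender, data = ucb, FUN = sum)

# Reshape to a Dept x Gender matrix to compare the two genders directly
wide <- xtabs(Freq ~ Dept + Gender, data = totals)

# Departments where Female outnumbers Male,
# i.e. where the stacking order should differ from Dept A
flip <- rownames(wide)[wide[, "Female"] > wide[, "Male"]]

print(wide)
print(flip)
```

Comparing flip against the finished plot is a quick way to confirm that the group aesthetic is doing what you expect.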
I need your help.
I am trying to make a stacked bar plot in R and have not succeeded so far. I have read several posts, but none of them got me there.
As I am a newbie, this is the chart I want (I made it in Excel):
And this is how my data is laid out:
Thank you in advance.
I would use the ggplot2 package to create this plot, as positioning text labels is easier than with the base graphics package:
# First we create a dataframe using the data taken from your excel sheet:
myData <- data.frame(
Q_students = c(1000,1100),
Students_with_activity = c(950, 10000),
Average_debt_per_student = c(800, 850),
Week = c(1,2))
# The data in the dataframe above is in 'wide' format, to use ggplot
# we need to use the tidyr package to convert it to 'long' format.
library(tidyr)
myData <- gather(myData,
Condition,
Value,
Q_students:Average_debt_per_student)
# To add the text labels we calculate the midpoint of each bar and
# add this as a column to our dataframe using the package dplyr:
library(dplyr)
myData <- group_by(myData,Week) %>%
mutate(pos = cumsum(Value) - (0.5 * Value))
#We pass the dataframe to ggplot2 and then add the text labels using the positions which
#we calculated above to place the labels correctly halfway down each
#column using geom_text.
library(ggplot2)
# plot bars and add text
p <- ggplot(myData, aes(x = Week, y = Value)) +
geom_bar(aes(fill = Condition),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 3)
#Add title
p <- p + ggtitle("My Plot")
#Plot p
p
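The midpoint arithmetic can be checked on its own, without plotting. This is a minimal base R sketch of the same calculation (ave() in place of group_by/mutate); the long-format data frame is typed in by hand with the same numbers as above.

```r
# Same numbers as the answer above, already in long format
myData <- data.frame(
  Week = rep(c(1, 2), times = 3),
  Condition = rep(c("Q_students", "Students_with_activity",
                    "Average_debt_per_student"), each = 2),
  Value = c(1000, 1100, 950, 10000, 800, 850)
)

# Running total within each week, in row order
top <- ave(myData$Value, myData$Week, FUN = cumsum)

# Midpoint of each segment = top of the segment minus half its height
myData$pos <- top - 0.5 * myData$Value

print(myData[order(myData$Week), ])
```

These midpoints only line up with the bars if ggplot2 stacks the segments in the same order as the rows. As far as I know, geom_text(aes(label = Value, group = Condition), position = position_stack(vjust = 0.5)) lets ggplot2 compute the midpoints itself, which sidesteps the ordering question.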
so <- data.frame(week1 = c(1000, 950, 800), week2 = c(1100, 10000, 850), row.names = c("Q students", "students with Activity", "average debt per student"))
barplot(as.matrix(so))
I would like to create a bar chart with ggplot in R.
The sample data is as follows:
Name <- c('Sample1', 'Sample2', 'Sample3')
Total <- c(86020045,30974095,1520609)
Part <- c(41348957, 2956650, 595121)
DT <- data.frame(Name,Total,Part)
DT
ggplot(DT, aes(Name, Total, fill=Name)) +
geom_bar(position="stack",stat="identity")
What I would like is a stacked bar chart that shows each Name's Total count, with the Part count shown within the bar and its percentage labelled in the middle of the bar.
Is there any way possible to do this? I've been searching on here but haven't been able to find a solution.
Oh... It seems like someone already commented the answer while I was writing it down. I'll post mine anyways since it's slightly different.
DT <- transform(DT, Part0 = Total - Part)
library(reshape2)
DT2 <- melt(DT, id.vars = c("Name", "Total"))
DT2 <- transform(DT2, perc = value/Total * 100)
ggplot(DT2, aes(Name, perc, fill=variable)) +
geom_bar(position="stack",stat="identity") +
geom_text(data = subset(DT2, variable == "Part"), aes(y = (perc),
label = paste0("Total = ", Total, "\n",
"Part = ", value, "\n",
round(perc, 1), "%\n")))
If you use value instead of perc you will get bars proportional to the raw counts, but since the total for Sample3 is much smaller than for Sample1, the chart will be difficult to read. So I decided to use percentages instead of the actual values.
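The percentage and label values can also be computed in plain base R before any plotting, which makes them easy to inspect. This is only a sketch using the sample data from the question; the column names Part_perc, Part0_perc, label and label_y are my own, and the label position assumes the Part segment is drawn from 0 upward.

```r
Name  <- c("Sample1", "Sample2", "Sample3")
Total <- c(86020045, 30974095, 1520609)
Part  <- c(41348957, 2956650, 595121)
DT <- data.frame(Name, Total, Part)

# Remainder of each bar, and both pieces as a percentage of Total
DT$Part0      <- DT$Total - DT$Part
DT$Part_perc  <- DT$Part  / DT$Total * 100
DT$Part0_perc <- DT$Part0 / DT$Total * 100

# Label text, and a y position halfway up the Part segment
# (assumption: Part starts at 0; adjust if it is stacked on top of Part0)
DT$label   <- paste0(round(DT$Part_perc, 1), "%")
DT$label_y <- DT$Part_perc / 2

print(DT)
```

Because the two percentages always sum to 100, every bar has the same height, which is what keeps Sample3 readable next to Sample1.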
I'm having trouble constructing a histogram from a matrix in R.
The matrix contains 3 treatments (lambda0.001, lambda0.002, lambda0.005) for 4 populations (rec1, rec2, rec3, con1). The matrix is:
lambda0.001 lambda0.002 lambda.003
rec1 1.0881688 1.1890554 1.3653264
rec2 1.0119031 1.0687678 1.1751051
rec3 0.9540271 0.9540271 0.9540271
con1 0.8053506 0.8086985 0.8272758
My goal is to plot a histogram with lambda on the Y axis and four groups of three treatments on the X axis. The four groups should be separated from each other by a small gap.
I need help; it doesn't matter whether it is in ggplot2 or just regular base R plotting.
Thanks a lot!
I agree with docendo discimus that a barplot may be what you're looking for. Based on what you're asking, though, I would first reshape your data to make it a little easier to work with; you can still get it done with stat = "identity".
# gather() comes from tidyr, so load it alongside dplyr and ggplot2
sapply(c("dplyr", "tidyr", "ggplot2"), require, character.only = TRUE)
# b is the matrix from the question
# convert from matrix to data frame and preserve row names as column
b <- data.frame(population = row.names(b), as.data.frame(b), row.names = NULL)
# gather so in a tidy format for ease of use in ggplot2
b <- gather(b, lambda, value, -1)
# plot 1 as described in question
ggplot(b, aes(x = population, y = value)) + geom_bar(aes(fill = lambda), stat = "identity", position = "dodge")
# plot 2 using facets to separate as an alternative
ggplot(b, aes(x = population, y = value)) + geom_bar(stat = "identity") + facet_grid(. ~ lambda)
I was wondering if there is a way to normalize the heights of the histograms with multiple groups so that their first heights are all = 1. For instance:
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..count..))
p + geom_histogram(position = "dodge")
gives a regular histogram with 3 groups.
Also
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..ncount..))
p + geom_histogram(position = "dodge")
gives a histogram where each group is normalized to have a maximum height of 1.
I want a histogram where each group is normalized so that its first bar has height 1 (so I can show growth), but I don't know whether there is an appropriate alternative to ..ncount.. or ..count.. for this. If anyone can help me understand the structure of ..count.., I could maybe figure it out from there.
Thanks!
I bet there is a nice way to do everything within ggplot. However, I tend to prefer preparing the desired data set before I plug it into ggplot. If I understood you correctly, you may try something like this:
# 'df' is the data frame called 'data' in the question
df <- data
# convert 'results' to factor and set levels to get an equi-spaced 'results' x-axis
df$results <- factor(df$results, levels = 1:7)
# for each category, count frequency of 'results'
df <- as.data.frame(with(df, table(results, category)))
# normalize: for each category, divide all 'Freq' (heights) with the first 'Freq'
df$freq2 <- with(df, ave(Freq, category, FUN = function(x) x/x[1]))
ggplot(data = df, aes(x = results, y = freq2, fill = category)) +
geom_bar(stat = "identity", position = "dodge")
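To inspect the normalized heights without plotting, the same idea can be written against the contingency table itself. This is a sketch that uses sweep() on the table rather than ave() on the long data frame; tab, first and normalized are names I made up.

```r
results  <- rep(c(1, 1, 2, 2, 2, 3, 1, 1, 1, 3, 4, 4, 2, 5, 7), 3)
category <- rep(c("a", "b", "c"), 15)

# Counts per results value (rows) and category (columns)
tab <- table(factor(results, levels = 1:7), category)

# Divide each column by its first entry (the count for results == 1).
# Assumption: that first count is non-zero, otherwise this yields Inf/NaN.
first <- tab[1, ]
normalized <- sweep(tab, 2, first, "/")

print(round(normalized, 2))
```

The first row of normalized is 1 for every category by construction, so the remaining rows read directly as growth relative to the first bar.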
It looks like ..density.. does what you want, but I can't for the life of me find documentation on it. On both your examples it does what you are looking for, though!
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..density..))
p + geom_histogram(position = "dodge")