I realize there are already several versions of this question, but none of them really answered it for me. So I've got this (already melted) data frame:
df <-data.frame(
Var1 = c("Inschrijvingen", "BSA", "Inschrijvingen", "BSA"),
Var2 = c("Totaal","Totaal", "OD_en_MD", "OD_en_MD"),
Value = c(262, 190, 81, 69)
)
Note that this is only a small part of the data frame and that I've got lots of similar data frames. I made stacked bar charts the following way:
library(ggplot2)
ggplot(df, aes(Var2, as.numeric(as.character(Value)), fill=Var1))+
geom_bar(position="identity", stat="identity") +
scale_alpha_manual(values=c(.6,.8)) +
ggtitle(names(df)) + labs(x="", y="Aantal") +
scale_colour_brewer(palette = "Set2") +
scale_fill_discrete("BSA Resultaten", labels=c("BSA niet behaald", "BSA behaald"))
Which gives me the following bar chart:
Now I would like to add percentages to the blue parts of the bar chart. The red part is the total number of subscribers and the blue part is the number that made it through. So in my example these percentages should be:
df$Value[2]*100/df$Value[1]
df$Value[4]*100/df$Value[3]
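The two index expressions above rely on the rows always being in the same order. As a minimal base-R sketch (assuming each Var2 group has exactly one "Inschrijvingen" row and one "BSA" row), the same percentages can be computed by selecting rows by their Var1 label and matching them by Var2:

```r
df <- data.frame(
  Var1 = c("Inschrijvingen", "BSA", "Inschrijvingen", "BSA"),
  Var2 = c("Totaal", "Totaal", "OD_en_MD", "OD_en_MD"),
  Value = c(262, 190, 81, 69)
)

# Select rows by their label rather than by their position
is_bsa  <- df$Var1 == "BSA"
is_insc <- df$Var1 == "Inschrijvingen"

# Name both vectors by Var2 so the division matches groups, not row order
bsa  <- setNames(df$Value[is_bsa],  df$Var2[is_bsa])
insc <- setNames(df$Value[is_insc], df$Var2[is_insc])

# Percentage of subscribers that made it through, per Var2 group
pct <- round(bsa * 100 / insc[names(bsa)], 1)
print(pct)
```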
Since I've got loads of these data frames, I don't want to do this manually. I've seen examples on Stack Overflow where the text and the percentage calculations were both done inside ggplot, and others where the percentages were calculated before calling ggplot, but I'm afraid my data preparation isn't good enough to do this easily.
Things I've tried:
#ddply (from plyr), to add a column with percentages:
library(plyr)
ddply(df, .(Var2), transform, percent=Value*100/Value)
The problem here is, of course, my percent-calculation. How do I make ddply select and multiply the right values? Would this be the right way in the first place?
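Inside ddply(), Value*100/Value divides each value by itself, so it always gives 100. What is needed is the group's Inschrijvingen value as the denominator. The sketch below uses base R's ave() instead of plyr to put that value on every row of its Var2 group; the helper name add_percent() is hypothetical, and it assumes one Inschrijvingen row per group. Because it takes and returns a single data frame, it can be applied to many data frames with lapply():

```r
df <- data.frame(
  Var1 = c("Inschrijvingen", "BSA", "Inschrijvingen", "BSA"),
  Var2 = c("Totaal", "Totaal", "OD_en_MD", "OD_en_MD"),
  Value = c(262, 190, 81, 69)
)

# Hypothetical helper: adds the BSA percentage of each Var2 group as a Per column
add_percent <- function(d) {
  # Inschrijvingen value of each row's Var2 group, repeated on every row of that group
  insc <- ave(ifelse(d$Var1 == "Inschrijvingen", d$Value, 0), d$Var2, FUN = sum)
  # Percentage only on the BSA rows; NA on the Inschrijvingen rows
  d$Per <- ifelse(d$Var1 == "BSA", round(d$Value * 100 / insc, 1), NA)
  d
}

res <- add_percent(df)
print(res)

# The same helper applied to a list of data frames
all_res <- lapply(list(df, df), add_percent)
```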
#Calculating percentages before melting the data frame, which gives me the (molten) data frame:
df2 <- data.frame(
Var1 =c("Inschrijvingen", "BSA","Percentage","Inschrijvingen",
"BSA","Percentage"),
Var2 =c("Totaal","Totaal","Totaal","OD_en_MD","OD_en_MD","OD_en_MD"),
Value = c(262,190,72.5,81,69,85.2)
)
The problem here is that I don't know how to get this into ggplot without the percentages also being plotted as bars. I guess I should separate the Percentage rows from the other values of Var1, but I haven't managed to do that.
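A minimal base-R sketch of that separation, using df2 as defined above: the Percentage rows are split off into their own data frame and given a y position equal to the matching BSA bar. The names bars, pct_labels and y are illustrative choices, not from the original code:

```r
df2 <- data.frame(
  Var1 = c("Inschrijvingen", "BSA", "Percentage", "Inschrijvingen", "BSA", "Percentage"),
  Var2 = c("Totaal", "Totaal", "Totaal", "OD_en_MD", "OD_en_MD", "OD_en_MD"),
  Value = c(262, 190, 72.5, 81, 69, 85.2)
)

# Rows drawn as bars: everything except the Percentage rows
bars <- df2[df2$Var1 != "Percentage", ]

# Rows used only as text labels, one per Var2 group
pct_labels <- df2[df2$Var1 == "Percentage", ]

# Put each label at the height of its group's BSA bar
bsa <- bars[bars$Var1 == "BSA", ]
pct_labels$y <- bsa$Value[match(pct_labels$Var2, bsa$Var2)]
print(pct_labels)
```

bars has the same columns as df, so it should work in place of df in the ggplot() call from the question, and pct_labels could then be drawn with an extra geom_text(data = pct_labels, aes(x = Var2, y = y, label = Value), inherit.aes = FALSE) layer.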
Any help would be greatly appreciated!
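One option is to compute the percentage for each Var2 group with dplyr and then add it to the plot with annotate():
library(dplyr)
library(ggplot2)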
df <- df %>%
group_by(Var2) %>%
mutate(Max = max(Value), Min = min(Value), Per = round(Min*100/Max, 2))%>%
arrange(Var2)
ggplot(df, aes(Var2, as.numeric(as.character(Value)), fill=Var1))+
geom_bar(position="identity", stat="identity") +
scale_alpha_manual(values=c(.6,.8)) +
ggtitle(names(df)) + labs(x="", y="Aantal") +
scale_colour_brewer(palette = "Set2") +
scale_fill_discrete("BSA Resultaten", labels=c("BSA niet behaald", "BSA behaald"))+
annotate("text", x = df$Var2[!duplicated(df$Var2)], y = min(df$Max - df$Min), label = df$Per[!duplicated(df$Var2)])
I use geom_bar in ggplot2 to visualize customers' purchase decisions (a factor with three levels: purchase, may be, no purchase). The decisions are split by product group with facet_wrap.
ggplot(df, aes(x= status_purchase)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 90)) +
facet_wrap(~ product_group)
Not surprisingly, this works fine. Do I have any options to show another variable for each facet_wrap group (e.g. total expenses per product group)? A bubble sized by that value in the upper-right corner of each panel, or at least the sum of the expenses in the panel title, would be nice.
Thank you for your answers.
Philipp
In the absence of a specific example, let me demonstrate one way to do this: use geom_text() to display summary data for a dataset that is split into facets.
In this example, I'll use the txhousing dataset (which is part of ggplot2):
library(dplyr)
library(tidyr)
library(ggplot2)
df <- txhousing
df %>% ggplot(aes(x=month, y=sales)) + geom_col() +
facet_wrap(~year)
Let's say we wanted to display a red total of sales for a year in the upper right portion of each facet. The easiest way to do this is to first calculate our summary data in a separate dataset, then overlay that information according to the facets via geom_text().
df_summary <- df %>%
group_by(year) %>%
summarize(total = sum(sales, na.rm = TRUE))
df %>% ggplot(aes(x=month, y=sales)) + geom_col() +
facet_wrap(~year) +
geom_text(
data=df_summary, x=12, y=33000, aes(label=total),
hjust=1, color='red', size=3
)
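In the geom_text() call, x and y are set to fixed values outside aes() instead of being mapped, so each label appears at the same position (x = 12, y = 33000) in every facet. As long as df_summary contains a column called year (the faceting variable), each total is placed in the matching facet.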
I hope you can apply a similar idea to your particular question.
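For the other option mentioned in the question (the sum in the headline), one sketch is to build the facet strip labels from the totals and pass them to facet_wrap() through as_labeller(). This again uses txhousing as a stand-in for the purchase data; the names totals and year_labels are illustrative:

```r
library(ggplot2)

df <- txhousing

# Total sales per year, skipping missing months
totals <- tapply(df$sales, df$year, sum, na.rm = TRUE)

# Named character vector: names are the facet values, elements are the strip labels
year_labels <- setNames(
  paste0(names(totals), " (total: ", formatC(totals, format = "d", big.mark = ","), ")"),
  names(totals)
)

p <- ggplot(df, aes(x = month, y = sales)) +
  geom_col() +
  facet_wrap(~year, labeller = as_labeller(year_labels))
```

With the question's data, the same pattern would use product_group in place of year and the summed expenses in place of sales.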
I have three vectors and a list of crimes. Each crime represents a row. On each row, each vector identifies the percentage change in the number of incidents of each type from the prior year.
Below is the reproducible example. Unfortunately, the data frame takes the first value and repeats it down each column (this is my first attempt at a reproducible example).
crime_vec = c('\tSTRONGARM - NO WEAPON', '$500 AND UNDER', 'ABUSE/NEGLECT: CARE FACILITY', 'AGG CRIM')
change15to16vec = as.double(825, -1.56, -66.67, -19.13)
change16to17vec = as.double(8.11, .96, 50, 4.84)
change17to18vec = as.double(-57.50, 1.29, 83.33, 28.72)
df = data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)
df
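The repetition comes from as.double(): it only converts its first argument, so as.double(825, -1.56, -66.67, -19.13) gives just 825, which data.frame() then recycles down the column. Building the vectors with c() instead gives one value per crime:

```r
crime_vec <- c('\tSTRONGARM - NO WEAPON', '$500 AND UNDER', 'ABUSE/NEGLECT: CARE FACILITY', 'AGG CRIM')

# c() combines all four values into one numeric vector
change15to16vec <- c(825, -1.56, -66.67, -19.13)
change16to17vec <- c(8.11, .96, 50, 4.84)
change17to18vec <- c(-57.50, 1.29, 83.33, 28.72)

df <- data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)
print(df)
```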
I need a graph that takes the correct data frame and shows the crimes on the y-axis and ALL 3 percentage-change vectors on the x-axis as dodged bars. The examples I've seen plot only two vectors. I've tried plot(), geom_bar and geom_col, but I can only get one column to plot (occasionally).
Any suggestions for a remedy would help.
Not sure if this is what you are looking for:
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(-crime_vec) %>%
ggplot(aes(x = value, y = crime_vec, fill = as.factor(name))) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
xlab("Percentage Change") +
ylab("Crime") +
labs(fill = "Change from")
To use ggplot2 here, you first need to bring your data into long format (one row per crime and period), which is what pivot_longer() does. geom_bar() with position = "dodge" then creates the dodged bars you want.
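To see what long format means here, this is a base-R sketch of roughly what pivot_longer(-crime_vec) produces: one row per crime and period, with the period name in name and the percentage in value. The row order differs from pivot_longer(), which should not matter for the plot; value_cols and long are illustrative names:

```r
df <- data.frame(
  crime_vec = c('\tSTRONGARM - NO WEAPON', '$500 AND UNDER', 'ABUSE/NEGLECT: CARE FACILITY', 'AGG CRIM'),
  change15to16vec = c(825, -1.56, -66.67, -19.13),
  change16to17vec = c(8.11, .96, 50, 4.84),
  change17to18vec = c(-57.50, 1.29, 83.33, 28.72)
)

value_cols <- setdiff(names(df), "crime_vec")

# unlist() stacks the value columns one after another, so the crime names
# repeat once per column and each column name repeats once per crime
long <- data.frame(
  crime_vec = rep(df$crime_vec, times = length(value_cols)),
  name      = rep(value_cols, each = nrow(df)),
  value     = unlist(df[value_cols], use.names = FALSE)
)
print(long)
```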
I'm currently making simple plots with ggplot2.
The graph looks good, but there is one tiny detail I can't fix.
When you look at the legend, it says "Low n" twice. One of them should be "High n".
Here is my code:
half_plot <- ggplot() +
ggtitle(plot_title) +
geom_line(data = plot_dataframe_SD1, mapping = aes(x = XValues, y = YValues_SD1, color = "blue")) +
geom_line(data = plot_dataframe_SD2, mapping = aes(x = XValues, y = YValues_SD2, color = "green")) +
xlim(1, 2) +
ylim(1, 7) +
xlab("Standard Deviation") +
ylab(AV_column_name) +
scale_fill_identity(name = 'the fill', guide = 'legend',labels = c('m1')) +
scale_colour_manual(name = 'Legend',
values =c('blue'='blue','green'='green'),
labels = c(paste("Low ", Mod_column_name), paste("High ", Mod_column_name)))
Here is the graph I get in my output:
So do you know how to fix this?
And there is one more thing that makes me curious: I don't remember changing anything in this code, but I know the legend worked just fine a few days ago. I saved pictures I made with this code, and they look all right.
Also, if you have any further suggestions for improving the graph, they are very welcome too.
When asking questions, it helps us if you provide a reproducible example including the data. With some sample data, there are a couple of ways to fix it.
Sample data
library(dplyr)
plot_dataframe_SD1 = data.frame(XValues=seq(1,2,by=.2)) %>%
mutate(YValues_SD1=XValues*2)
plot_dataframe_SD2 = data.frame(XValues=seq(1,2,by=.2)) %>%
mutate(YValues_SD2=XValues*5)
The simplest way to modify your code is to supply the desired color label in the aesthetic.
library(ggplot2)
Mod_column_name = 'n'
half_plot <- ggplot() +
# put the desired label name in the aesthetic
# link describing the bang bang operator (!!) https://www.r-bloggers.com/2019/07/bang-bang-how-to-program-with-dplyr/
geom_line(data=plot_dataframe_SD1,mapping=aes(x=XValues,y=YValues_SD1,color=!!paste('Low',Mod_column_name))) +
geom_line(data=plot_dataframe_SD2,mapping=aes(x=XValues,y=YValues_SD2,color=!!paste('High',Mod_column_name))) +
# name the colours by label; unnamed values would be matched to the alphabetical order of the labels
scale_color_manual(values=setNames(c('blue','green'),
c(paste('Low',Mod_column_name),paste('High',Mod_column_name))))
A more general approach is to join the dataframes and pivot the joined df to have a column with the SD values and another to specify how to separate the colors. This makes it easier to plot without having to make multiple calls to geom_line.
# Join the dfs, pivot the SD columns longer, and make a new column with your desired labels
joined_df = plot_dataframe_SD1 %>% full_join(plot_dataframe_SD2,by='XValues') %>%
tidyr::pivot_longer(cols=contains('YValues'),names_to='df_num',values_to='SD') %>%
mutate(label_name=if_else(df_num == 'YValues_SD1',paste('Low',Mod_column_name),paste('High',Mod_column_name)))
# Simplified plot
ggplot(data=joined_df,aes(x=XValues,y=SD,color=label_name)) +
geom_line() +
# name the colours by label so each line keeps its intended colour
scale_color_manual(values=setNames(c('blue','green'),
c(paste('Low',Mod_column_name),paste('High',Mod_column_name))))
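Why the order of the colour labels matters: when colour is mapped to a character variable, ggplot2 lists the legend entries alphabetically, and unnamed values or labels in scale_color_manual() are matched to that order, not to the order of the geom_line() calls. A quick base-R check of that order for these labels (with Mod_column_name = 'n' as above; line_labels and line_colours are illustrative names):

```r
Mod_column_name <- 'n'
line_labels <- c(paste('Low', Mod_column_name), paste('High', Mod_column_name))

# Order in which a discrete scale lists these character values
legend_order <- sort(unique(line_labels))
print(legend_order)

# Named colours are looked up by label, so that order no longer matters
line_colours <- setNames(c('blue', 'green'), line_labels)
print(line_colours[legend_order])
```

Because "High n" sorts before "Low n", an unnamed labels = c("Low n", "High n") would be attached to the wrong legend keys, while values = line_colours keeps each label with its colour.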
I am trying to plot a line graph with multiple lines grouped by a categorical variable (a factor). Based on what I have done in the past and what I can find online, the easiest way to do this is to map the categorical variable to the group aesthetic, but this isn't working for me: I only get one line on the graph. I'm sure I'm doing something super silly, but I can't for the life of me work it out. Thanks in advance :)
#dummy data for example
test <- data.frame(x = sample(seq(as.Date('2015/01/01'), as.Date('2020/01/01'), by="day"), 20),
y = sample(10:300, 10),
Origin_Station = as.factor(rep(1, 10)),
Neighbour_station = as.factor(rep(1:5, each = 20)))
#plot - what I want to see is a line for each of the 5 Neighbour_station categories (1:5) but what I get is just one line
ggplot(test, aes(x=x, y=y, group = Neighbour_station))+
geom_line()
I have also tried this:
ggplot(test, aes(x=x, y=y, group = factor(Neighbour_station), colour = Neighbour_station))+
geom_line()
Hi Rhetta also from Aus here, big ups Australian useRs:
library(ggplot2)
ggplot(test, aes(x = x, y = y, group = Neighbour_station, colour = Neighbour_station))+
geom_line()
Note that the reason you can't see distinct lines is that your data are exactly the same for each factor level (Neighbour_station 1:5), so the five lines are drawn on top of each other.
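The identical data come from the way data.frame() recycles shorter vectors: x has 20 values and y has 10, but Neighbour_station has 100, so both are repeated until they fill 100 rows, and every station gets the same 20 (x, y) pairs. A base-R sketch that confirms this and builds dummy data with different values per station (the seed, the monthly dates and the name test2 are arbitrary choices for illustration):

```r
set.seed(42)  # only so the random dummy data is reproducible

test <- data.frame(x = sample(seq(as.Date('2015/01/01'), as.Date('2020/01/01'), by = "day"), 20),
                   y = sample(10:300, 10),
                   Origin_Station = as.factor(rep(1, 10)),
                   Neighbour_station = as.factor(rep(1:5, each = 20)))

# 100 rows: the length-20 and length-10 vectors were recycled to fill them
print(nrow(test))

# Every station ends up with exactly the same x and y values
groups <- split(test[c("x", "y")], test$Neighbour_station)
same_data <- sapply(groups, function(g) identical(g$x, groups[[1]]$x) && identical(g$y, groups[[1]]$y))
print(same_data)

# Dummy data with 20 monthly dates per station and different y values for each station
test2 <- data.frame(x = rep(seq(as.Date('2015-01-01'), by = "month", length.out = 20), times = 5),
                    y = sample(10:300, 100),
                    Neighbour_station = as.factor(rep(1:5, each = 20)))
```

With test2, the ggplot() call above should show five separate coloured lines.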
Say I'm measuring 10 personality traits and I know the population baseline. I would like to create a chart for individual test-takers that shows their percentile ranking on each trait, so the values run from the 1st to the 99th percentile. Given that 50 is perfectly average, I'd like the graph to show bars going left or right from 50 as the origin line. In ggplot bar graphs, the bars seem to start from 0 by default. Is there a way to move the origin line to 50?
Here's some fake data and default graphing:
df <- data.frame(
names = LETTERS[1:10],
factor = round(rnorm(10, mean = 50, sd = 20), 1)
)
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor)) +
geom_bar(stat="identity") +
coord_flip()
Picking up on @nongkrong's comment, here's some code that will do what I think you want while relabeling the ticks to match the original range and relabeling the axis to avoid showing the math:
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(breaks=seq(-50,50,10), labels=seq(0,100,10)) + ylab("Percentile") +
coord_flip()
This post was really helpful for me, thanks @ulfelder and @nongkrong. However, I wanted to reuse the code on different data without having to adjust the tick labels manually. To do this while keeping ggplot's own tick placement, I defined a tiny function and passed it to the labels argument:
fix.labels <- function(x){
x + 50
}
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(labels = fix.labels) + ylab("Percentile") +
coord_flip()
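The same idea can be parameterised for a reference line other than 50. A minimal sketch; shift_labels() is a hypothetical helper that returns a labels function for a given origin:

```r
# Hypothetical helper: returns a function that adds `origin` back to the shifted axis values
shift_labels <- function(origin) {
  force(origin)  # capture the value now rather than when the labels are drawn
  function(x) x + origin
}

percentile_labels <- shift_labels(50)
print(percentile_labels(c(-50, -25, 0, 25, 50)))
```

With the bars plotted as y = factor - origin, it would be used as scale_y_continuous(labels = shift_labels(50)), which behaves like fix.labels() above for an origin of 50.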