R Horizontal Stacked Percentage Chart

I have a task using ggplot2 that I'm stuck on.
I'm asked to plot, as horizontal bars, the percentage of income from each source for each individual, using the following data.frame:
PersonalID <- c(1,2,3)
Stock.Return <- c(20,10,70)
Salary <- c(90,25,40)
Bond.Return <- c(16,10,7)
MyDat <- data.frame(PersonalID, Stock.Return, Salary, Bond.Return)
And it should look like this
As I understand it, I'm supposed to use facet_wrap. I have x and y but no fill, so I'm a bit lost.
Thank you

Try this approach. The key to getting the most out of ggplot2 is to reshape the data to long format first; after that you can design the plot. Here is the code for the plot, using tidyverse functions for the data processing and ggplot2 for the chart:
library(tidyverse)
#Code
MyDat %>% pivot_longer(-PersonalID) %>%
ggplot(aes(x=factor(PersonalID),y=value,fill=name,group=name))+
geom_bar(stat='identity',color='black',position='fill')+
scale_y_continuous(labels = scales::percent)+
theme_bw()+
theme(legend.position = 'bottom',
plot.title = element_text(hjust=0.5))+
coord_flip()+
xlab('PersonalID')+ylab('Percentage')+
ggtitle('Income distribution per person')
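As a cross-check on what position='fill' draws, the shares can also be computed by hand. This is a minimal base R sketch using the MyDat data from the question; the name shares is introduced here purely for illustration.

```r
# Data from the question
MyDat <- data.frame(
  PersonalID   = c(1, 2, 3),
  Stock.Return = c(20, 10, 70),
  Salary       = c(90, 25, 40),
  Bond.Return  = c(16, 10, 7)
)

# Row-wise shares: each income source divided by that person's total.
# `shares` is an illustrative name, not part of the original answer.
shares <- prop.table(as.matrix(MyDat[, -1]), margin = 1)
rownames(shares) <- MyDat$PersonalID

# Percentages per person; each row sums to 100
round(100 * shares, 1)
```

Each bar segment in the plot should correspond to one cell of this matrix.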

Related

How can I visualize the size of each group using the facet_wrap option in ggplot?

I use geom_bar in ggplot to visualize the purchase decisions of customers (3 factor levels: purchase, maybe, no purchase). The decisions are grouped by product group with facet_wrap.
ggplot(df, aes(x= status_purchase)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 90)) +
facet_wrap(~ product_group)
Not surprisingly, this works fine. Are there any options to visualize another variable for the groups in facet_wrap (e.g. total expenses for each product group)? A bubble of the corresponding size placed in the upper right corner of each panel, or at least the sum of the expenses in the panel title, would be nice.
Thank you for your answers.
Philipp
OP, in the absence of a specific example, let me demonstrate one way to do this that uses geom_text() to display summary data for a dataset that is separated into facets.
In this example, I'll use the txhousing dataset (which is part of ggplot2):
library(dplyr)
library(tidyr)
library(ggplot2)
df <- txhousing
df %>% ggplot(aes(x=month, y=sales)) + geom_col() +
facet_wrap(~year)
Let's say we wanted to display a red total of sales for a year in the upper right portion of each facet. The easiest way to do this is to first calculate our summary data in a separate dataset, then overlay that information according to the facets via geom_text().
df_summary <- df %>%
group_by(year) %>%
summarize(total = sum(sales, na.rm = TRUE))
df %>% ggplot(aes(x=month, y=sales)) + geom_col() +
facet_wrap(~year) +
geom_text(
data=df_summary, x=12, y=33000, aes(label=total),
hjust=1, color='red', size=3
)
I override the mapping for the x and y aesthetics in the geom_text() call. As long as the df_summary dataset contains a column called year, the data will be placed on the facets properly.
I hope you can apply a similar idea to your particular question.
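A variation on the same idea, offered as a sketch rather than the only way to do it: instead of hard-coding x=12 and y=33000, the label can be pinned to the upper right corner of every panel with x = Inf and y = Inf, nudged inward with hjust and vjust. The nudge values and the number formatting below are assumptions chosen for illustration.

```r
library(dplyr)
library(ggplot2)

df <- txhousing

# One row per facet, holding the value to display
df_summary <- df %>%
  group_by(year) %>%
  summarize(total = sum(sales, na.rm = TRUE))

p <- ggplot(df, aes(x = month, y = sales)) +
  geom_col() +
  facet_wrap(~year) +
  geom_text(
    data = df_summary,
    # Inf places the label at the panel edge regardless of axis limits
    x = Inf, y = Inf,
    aes(label = format(total, big.mark = ",", trim = TRUE)),
    # hjust/vjust > 1 push the text back inside the panel (illustrative values)
    hjust = 1.1, vjust = 1.5, color = "red", size = 3
  )
p
```

This keeps the label position independent of the data range, which helps when panels use free scales.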

How do I add a separate legend for each variable in geom_tile?

I would like to have a separate scale bar for each variable.
I have measurements taken throughout the water column for which the means have been calculated into 50cm bins. I would like to use geom_tile to show the variation of each variable in each bin throughout the water column, so the plot has the variable (categorical) on the x-axis, the depth on the y-axis and a different colour scale for each variable representing the value. I am able to do this for one variable using
ggplot(data, aes(x=var, y=depth, fill=value, color=value)) +
geom_tile(size=0.6)+ theme_classic()+scale_y_continuous(limits = c(0,11), expand = c(0, 0))
But if I put all variables onto one plot, the legend is scaled to the min and max of all values so the variation between bins is lost.
To provide a reproducible example, I have used the mtcars dataset and mapped value to alpha, which, of course, doesn't help much because the scales of the variables are so different:
data("mtcars")
# STACKS DATA
library(reshape2)
dat2b <- melt(mtcars, id.vars=1:2)
dat2b
ggplot(dat2b) +
geom_tile(aes(x=variable , y=cyl, fill=variable, alpha = value))
Which produces
Is there a way I can add a scale bar for each variable on the plot?
This question is similar to others (e.g. here and here), but they do not use a categorical variable on the x-axis, so I have not been able to modify them to produce the desired plot.
Here is a mock-up of the plot I have in mind using just four of the variables, except I would have all legends horizontal at the bottom of the plot using theme(legend.position="bottom")
Hope this helps:
The function myfun was originally posted by Duck here: R ggplot heatmap with multiple rows having separate legends on the same graph
library(purrr)
library(ggplot2)
library(patchwork)
data("mtcars")
# STACKS DATA
library(reshape2)
dat2b <- melt(mtcars, id.vars=1:2)
dat2b
#Split into list
List <- split(dat2b,dat2b$variable)
#Function for plots
myfun <- function(x)
{
G <- ggplot(x, aes(x=variable, y=cyl, fill = value)) +
geom_tile() +
theme(legend.direction = "vertical", legend.position="bottom")
return(G)
}
#Apply
List2 <- lapply(List,myfun)
#Plot
reduce(List2, `+`) + plot_annotation(title = 'My plot')
# Alternative: let patchwork lay out the list of plots directly
patchwork::wrap_plots(List2)

How to plot multiple histograms at once of specific columns within dataset R

I am using the R dataset "USArrests" and am trying to plot a histogram of each column. However, when I do this, I cannot figure out how to set the x-axis label and the title of each histogram so that they name the variable being plotted.
I currently have
attach(USArrests)
lapply(arrests[,c(1:4)], FUN = hist)
The four histograms outputted look like this:
How can I add the axis/title labels? Thanks
If you want to use base R, this is an occasion where a for-loop is better than lapply.
par(mfrow = c(2, 2))
for(i in names(USArrests))
hist(USArrests[[i]], main = i, xlab = "Value")
You could also use tidyr and ggplot2 with facets, but would need to experiment with the histogram bin size.
library(tidyr)
library(ggplot2)
USArrests %>%
pivot_longer(cols = 1:4) %>%
ggplot(aes(value)) +
geom_histogram() +
facet_wrap(~name, scales = "free")
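As a starting point for that experimentation, here is the same pipeline with an explicit bin count. The value bins = 10 is an arbitrary assumption; geom_histogram() also accepts binwidth, but a single width rarely suits variables on such different scales.

```r
library(tidyr)
library(ggplot2)

p <- USArrests %>%
  pivot_longer(cols = 1:4) %>%
  ggplot(aes(value)) +
  # bins = 10 is an illustrative choice; adjust to taste
  geom_histogram(bins = 10) +
  facet_wrap(~name, scales = "free")
p
```

Because the facets use free scales, a fixed number of bins adapts to each variable's range.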

ggplot2 not rendering correctly using Plotly

I have created a point plot using ggplot2 that works relatively well. I would love to run it using Plotly; however, when I do, it upsets the y axis and makes the legend very wonky. I will post some before and after images below, but I am very new to both packages and am looking for the right direction. The ggplot2 version is okay, but the added interactivity of Plotly would be a huge win for what we are doing. Also, a weird note: the top graph returned seems to cut off the plot (the highest value, not sure why). Thanks.
Code is:
library(ggplot2)
library(dplyr)
library(plotly)
library(sqldf)
library(tidyverse)
library(lubridate)
library(rio) #lets you use "import" for any file - without using extension name
options(scipen =999) #disable scientific notation
#prepare data:
setwd("C:/Users/hayescod/Desktop/BuysToForecastTracking")
Buys_To_Forecast <- import("BuysToForecastTrack")
colnames(Buys_To_Forecast) <- c("Date", "BusinessSegment", "Material", "StockNumber", "POCreatedBy", "PlantCode", "StockCategory", "Description", "Excess", "QuantityBought", "WareHouseSalesOrders", "GrandTotal", "Comments" )
Buys_To_Forecast$PlantCode <-factor(Buys_To_Forecast$PlantCode) #update PlantCode to factor
#use SQL to filter and order the data set:
btf <- sqldf("SELECT Date,
SUM(QuantityBought) AS 'QuantityBought',
Comments
FROM Buys_To_Forecast
GROUP BY Date, Comments
ORDER BY Date")
#use ggplot:
btfnew <- ggplot(data=btf, aes(x=Date, y=QuantityBought, color=Comments, size=QuantityBought)) +
geom_point() +
facet_grid(Comments~., scales="free")+
ggtitle("Buys To Forecast Review")+
theme(plot.title = element_text(hjust = 0.5),
axis.title.x = element_text(color="DarkBlue", size = 18),
axis.title.y = element_text(color="Red", size = 14))
btfnew #display the plot in ggplot
ggplotly(btfnew) #display the plot in Plotly

How to add text to bar chart in ggplot

I realize there already are multiple instances of this question, but none of them really provided the answer for me. So I've got this (already melted) data frame:
df <-data.frame(
Var1 = c("Inschrijvingen", "BSA", "Inschrijvingen", "BSA"),
Var2 = c("Totaal","Totaal", "OD_en_MD", "OD_en_MD"),
Value = c(262, 190, 81, 69)
)
Note that this is only a small part of the data frame and that I've got lots of similar data frames. I made stacked bar charts the following way:
ggplot(df, aes(Var2, as.numeric(as.character(Value)), fill=Var1))+
geom_bar(position="identity", stat="identity") +
scale_alpha_manual(values=c(.6,.8)) +
ggtitle(names(df)) + labs(x="", y="Aantal") +
scale_colour_brewer(palette = "Set2") +
scale_fill_discrete("BSA Resultaten", labels=c("BSA niet behaald", "BSA behaald"))
Which gives me the following bar chart:
Now I would like to add percentages to the blue parts of the bar chart. The red part is the total amount of subscribers and the blue part is the amount that made it through. So in my example these percentages should become
df$Value[2]*100/df$Value[1]
df$Value[4]*100/df$Value[3]
Since I've got loads of these data frames, I don't really want to do it manually. I've seen examples on stackoverflow where the text and percentage calculations have been both implemented in ggplot and where the percentages were calculated before using ggplot, but I'm afraid my data preparation isn't that good to do this that easily.
Things I've tried:
#ddply, to add a column with percentages:
ddply(df2, .(Var2), transform, percent=value*100/value)
The problem here is, of course, my percent-calculation. How do I make ddply select and multiply the right values? Would this be the right way in the first place?
#Calculating percentages before melting the data frame, which gives me the (molten) data frame:
df2 <- data.frame(
Var1 =c("Inschrijvingen", "BSA","Percentage","Inschrijvingen",
"BSA","Percentage"),
Var2 =c("Totaal","Totaal","Totaal","OD_en_MD","OD_en_MD","OD_en_MD"),
Value = c(262,190,72.5,81,69,85.2)
)
The problem here is that I don't know how to get this into ggplot without the percentages being plotted. I guess I should separate the values Percentage from Var1, but I haven't been able to manage that.
Any help would be greatly appreciated!
library(dplyr)
library(ggplot2)
df <- df %>%
group_by(Var2) %>%
mutate(Max = max(Value), Min = min(Value), Per = round(Min*100/Max, 2))%>%
arrange(Var2)
ggplot(df, aes(Var2, as.numeric(as.character(Value)), fill=Var1))+
geom_bar(position="identity", stat="identity") +
scale_alpha_manual(values=c(.6,.8)) +
ggtitle(names(df)) + labs(x="", y="Aantal") +
scale_colour_brewer(palette = "Set2") +
scale_fill_discrete("BSA Resultaten", labels=c("BSA niet behaald", "BSA behaald"))+
annotate("text", x = 1:length(unique(df$Var2)), y=rep(min((unique(df$Max)-unique(df$Min))),2), label = unique(df$Per))
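An alternative sketch, under the assumption that every Var2 group has exactly one "Inschrijvingen" row and one "BSA" row: compute the percentage in a small label data frame and pass it to geom_text(), which avoids relying on min/max and on positional x values. The names labels_df, y and Per are illustrative.

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  Var1  = c("Inschrijvingen", "BSA", "Inschrijvingen", "BSA"),
  Var2  = c("Totaal", "Totaal", "OD_en_MD", "OD_en_MD"),
  Value = c(262, 190, 81, 69)
)
# Put the total first so the BSA bar is drawn on top of it
df$Var1 <- factor(df$Var1, levels = c("Inschrijvingen", "BSA"))

# One row per bar: height of the BSA part and its share of the total
labels_df <- df %>%
  group_by(Var2) %>%
  summarise(
    y   = Value[Var1 == "BSA"],
    Per = round(100 * Value[Var1 == "BSA"] / Value[Var1 == "Inschrijvingen"], 1)
  )

p <- ggplot(df, aes(Var2, Value)) +
  geom_col(aes(fill = Var1), position = "identity") +
  # Place the label halfway up the BSA part of each bar
  geom_text(data = labels_df, aes(y = y / 2, label = paste0(Per, "%"))) +
  labs(x = "", y = "Aantal")
p
```

Since the labels are derived per group, the same code can be applied to each of the similar data frames without manual calculation.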
