I would like to add a different secondary axis to each facet. Here is my working example:
library(ggplot2)
library(data.table)
#Create the data:
data<-data.table(cohort=sample(c(1946,1947,1948),10000,replace=TRUE),
works=sample(c(0,1),10000,replace=TRUE),
year=sample(seq(2006,2013),10000,replace=TRUE))
data[,age_cohort:=year-cohort]
data[,prop_works:=mean(works),by=c("cohort","year")]
#Prepare data for plotting:
data_to_plot<-unique(data,by=c("cohort","year"))
#Plot what I want:
ggplot(data_to_plot,aes(x=age_cohort,y=prop_works))+geom_point()+geom_line()+
facet_wrap(~ cohort)
The plot shows how many people of a particular cohort work at a given age. I would like to add a secondary x axis showing which year corresponds to a particular age for different cohorts.
Since you have the actual values you want to use in your dataset, one workaround is to plot them as an additional geom_text layer:
ggplot(data_to_plot,
aes(x = age_cohort, y = prop_works, label = year))+
geom_point() +
geom_line() +
geom_text(aes(y = min(prop_works)),
hjust = 1.5, angle = 90) + # rotate to save space
expand_limits(y = 0.44) +
scale_x_continuous(breaks = seq(58, 70, 1)) + # ensure x-axis breaks are at whole numbers
scale_y_continuous(labels = scales::percent) +
facet_wrap(~ cohort, scales = "free_x") + # show only relevant age cohorts in each facet
theme(panel.grid.minor.x = element_blank()) # hide minor grid lines for cleaner look
You can adjust the hjust value in geom_text() and y value in expand_limits() for a reasonable look, depending on your desired output's dimensions.
(More data wrangling would be required if there are missing years in the data, but I assume that isn't the case here.)
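If some cohort/year combinations were missing, one option might be to derive the year labels rather than read them from the data, since the question defines age_cohort as year - cohort. A minimal sketch in base R; the table name label_grid is made up for illustration and is not part of the original answer:

```r
# Hypothetical helper table: every cohort/year combination from the question,
# whether or not it actually occurs in the sampled data.
label_grid <- expand.grid(cohort = 1946:1948, year = 2006:2013)

# The question defines age_cohort as year - cohort.
label_grid$age_cohort <- label_grid$year - label_grid$cohort

head(label_grid)
```

A table like this could then be supplied to the text layer through its data argument, with a fixed y position of your choosing, so the labels no longer depend on which rows happen to be present.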
I'm currently working on a sales dataset, and I have been trying to make a barchart, showcasing the units sold by each country, with the colouring of the charts being the number of inhabitants in each country.
So far I have used the following code to get the chart:
ChartProductsbyCountry <- ggplot(Dataframe10, aes(x=Country, y=SumAct_U)) + geom_col(aes(fill=Inhabitants_2019)) + theme(axis.text.x = element_text(angle = 90, size = 10))
library(scales)
ChartProductsbyCountry + scale_y_continuous(labels = comma)
This gives me the following chart:
(image: bar chart)
I'm already very satisfied with it, however I would like to change some things but don't know how:
On the right side, the colour legend shows 2e+07, 4e+07, etc. rather than the actual numbers. How can I change it to show full numbers instead? And for the y-axis, how should I change my code to have ticks going from 0 to 1.250.000,00 at every 125.000 units (so starting with 0, then 125.000, then 250.000, then 375.000, ...)?
OP, as mentioned by @stefan in the comment, you can use the breaks= argument to control spacing of your ticks on the axis. To format the colorbar legend items, you can use scale_fill_continuous() in the same manner you did for the y axis. Here's an example:
library(ggplot2)
library(scales)
set.seed(8675309)
df <- data.frame(x=LETTERS, y=sample(0:20, replace=TRUE, size=26) * 1000000)
p <-
ggplot(df, aes(x,y, fill=y)) + geom_col() +
scale_y_continuous(labels = comma)
p
To set the number formatting in the colorbar, you can use labels = comma within scale_fill_continuous(). To change the ticks on the y axis, you can access that via the breaks= argument. Essentially, send a vector of values to breaks= to have the labels marked where you want. In this case, I'll set it up to make a tick mark every 1 million, and then set the colorbar to use comma format:
ggplot(df, aes(x,y, fill=y)) + geom_col() +
scale_y_continuous(labels = comma, breaks=seq(0, max(df$y), by=1000000)) +
scale_fill_continuous(labels=comma)
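For the spacing asked about in the question (0 to 1.250.000 in steps of 125.000), the same idea might look like the sketch below. The data is simulated, and the names y_breaks and euro_labels are made up here. It assumes a scales version that provides label_number() (1.1.0 or later), which is used to get "." as the thousands mark and "," as the decimal mark.

```r
library(ggplot2)
library(scales)

set.seed(8675309)
# Simulated stand-in for the sales data (values are made up)
df <- data.frame(x = LETTERS,
                 y = sample(0:10, size = 26, replace = TRUE) * 125000)

# One tick every 125,000 units from 0 to 1,250,000
y_breaks <- seq(0, 1250000, by = 125000)

# Labeller with European-style separators and two decimals
euro_labels <- label_number(accuracy = 0.01, big.mark = ".", decimal.mark = ",")

p <- ggplot(df, aes(x, y, fill = y)) +
  geom_col() +
  scale_y_continuous(breaks = y_breaks, limits = c(0, 1250000),
                     labels = euro_labels) +
  scale_fill_continuous(labels = euro_labels)
p
```

If plain comma-separated labels are enough, labels = comma works in both scales exactly as in the answer above.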
I am trying to use scale_y_continuous() with a faceted histogram and running into an issue. I am hoping to get each count to be a percentage instead. My code is:
ggplot(d, aes(x = likely_att)) +
geom_histogram(binwidth = 0.5, color = "black") +
facet_wrap(~married, scales = "free_y") +
theme_classic() +
scale_y_continuous(labels = percent_format())
It looks like the distributions themselves are accurate, but the scaling is off: the percentages are "200 000%", "5 000%", etc. and that seems wrong, but I'm not quite sure why it's happening.
There are many more "yes" than "no" or "separated" married values in my dataset, which is why I use scales = "free_y" and why I'm hoping to just have percentages shown and only need one axis value shown.
I can't share this exact data for privacy reasons, but the likely_att variable is just a 1-5 numeric var, and married is a character var with 3 values: yes, no, separated.
In case it's helpful, I basically want it to look just like this image, but with percentages instead of counts, so I can just have one single y axis on the far left with 0 - 100 %
The problem is that percent_format() changes the way the labels are printed, but it doesn't actually rescale the numbers. To do that, you could use the computed density variable and multiply it by the bin width: density times bin width is the proportion of observations that fall in each bin, which the percent formatting then displays correctly.
ggplot(d, aes(x = likely_att)) +
stat_bin(aes(y=..density..*.5, group = married),
binwidth = 0.5, color = "black") +
facet_wrap(~married, scales = "free_y") +
theme_classic() +
scale_y_continuous(labels = percent_format())
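To see why multiplying the density by the bin width gives a proportion, here is a small base R check using hist() on made-up scores. The bin edges are an assumption chosen for illustration and need not match how ggplot2 aligns its bins, but the arithmetic is the same:

```r
set.seed(1)
# Made-up stand-in for likely_att: integer scores from 1 to 5
x <- sample(1:5, size = 200, replace = TRUE)
binwidth <- 0.5

# Bin edges chosen so that each integer score falls inside one bin
h <- hist(x, breaks = seq(0.75, 5.25, by = binwidth), plot = FALSE)

prop_from_density <- h$density * binwidth       # density times bin width
prop_from_counts  <- h$counts / sum(h$counts)   # share of observations per bin

all.equal(prop_from_density, prop_from_counts)
sum(prop_from_density)
```

In recent ggplot2 releases (3.3.0 and later) the same computed variable can also be written as after_stat(density) instead of ..density..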
I've made a violin plot that looks like this:
As we can see most of the data lies near the region where the score is 0.90-0.95. What I wish is to focus on the interval 0.75 to 1.00 by changing the scale giving less space to ratings from 0 to 0.75.
Is there a way to do this?
This is the code I'm currently using to create the violin plot:
ggplot(data=Violin_plots, aes(x = Year, y = Score)) +
geom_violin(aes(fill = Violin_plots$Year), trim = TRUE) +
coord_flip()+
scale_fill_brewer(palette = "Blues") +
theme(legend.position = 'none') +
labs(y = "Rating score",
fill = "Rating year",
title = "Violin-plots of credit rating scores")
While it's possible to transform the scale to focus more in the upper region (e.g. add trans = "exp" as an argument to the scale), a non linear scale is often hard to interpret appropriately.
For such use cases, I recommend facet_zoom from the ggforce package, which is pretty much built for this exact purpose (see the package vignette).
I also switched from geom_violin() + coord_flip() to geom_violinh from the ggstance package, which extends ggplot2 by providing flipped versions of ggplot components. Example with simulated data below:
library(ggforce) # for facet_zoom
library(ggstance) # for flipped version of geom_violin
ggplot(df,
aes(x = rating, y = year, fill = year)) +
geom_violinh() + # no need to specify trim = TRUE as it's the default
scale_fill_brewer(palette = "Blues") +
theme(legend.position = 'none') +
facet_zoom(xlim = c(0.75, 0.98)) # specify zoom range here
Sample data that simulates the characteristics of the data in the question:
df <- diamonds[, c("color", "price")]
df$rating <- (max(df$price) - df$price) / max(df$price)
df$year <- df$color
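You could create a second plot that zooms in on the original plot, without modifying the data, by setting limits on the coordinate system. A plot can only have one coordinate system, so adding ggplot2::coord_cartesian() after coord_flip() would replace the flip; pass the limits to coord_flip() instead (ylim refers to the Score aesthetic, which is drawn horizontally after the flip):
ggplot(data=Violin_plots, aes(x=Year,y=Score)) +
geom_violin(aes(fill=Year),trim=TRUE) +
coord_flip(ylim = c(0.75, 1.00)) + # zoom in without dropping any data
scale_fill_brewer(palette="Blues") +
theme(legend.position='none') +
labs(y="Rating score",fill="Rating year",title="Violin-plots of credit rating scores")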
Say I'm measuring 10 personality traits and I know the population baseline. I would like to create a chart for individual test-takers to show them their individual percentile ranking on each trait. Thus, the numbers go from 1 (percentile) to 99 (percentile). Given that a 50 is perfectly average, I'd like the graph to show bars going to the left or right from 50 as the origin line. In bar graphs in ggplot, it seems that the origin line defaults to 0. Is there a way to change the origin line to be at 50?
Here's some fake data and default graphing:
df <- data.frame(
names = LETTERS[1:10],
factor = round(rnorm(10, mean = 50, sd = 20), 1)
)
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor)) +
geom_bar(stat="identity") +
coord_flip()
Picking up on @nongkrong's comment, here's some code that will do what I think you want while relabeling the ticks to match the original range and relabeling the axis to avoid showing the math:
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(breaks=seq(-50,50,10), labels=seq(0,100,10)) + ylab("Percentile") +
coord_flip()
This post was really helpful for me - thanks @ulfelder and @nongkrong. However, I wanted to re-use the code on different data without having to manually adjust the tick labels to fit the new data. To do this in a way that retained ggplot's tick placement, I defined a tiny function and passed it to the labels argument:
fix.labels <- function(x){
x + 50
}
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(labels = fix.labels) + ylab("Percentile") +
coord_flip()
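If the origin may differ between datasets, the same trick could be wrapped in a small function factory. This is only a sketch; shift_labels is a made-up name, not something provided by ggplot2:

```r
# Hypothetical helper: builds a labelling function that adds `origin`
# back onto the shifted values before they are printed as tick labels.
shift_labels <- function(origin) {
  function(x) x + origin
}

# Equivalent to the fix.labels function above when the origin is 50
fix.labels <- shift_labels(50)
fix.labels(c(-50, 0, 25))
```

It would be used as scale_y_continuous(labels = shift_labels(50)) together with y = factor - 50, keeping the subtracted value and the label offset in step.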
Is it possible to remove certain items from a legend created with ggplot? I have a plot that is faceted, and point sizes provide another dimension to the plot. Since the plot is faceted I do not need to have certain legend items since it is explained by the facet titles, but the legend is still relevant for the point size.
In the plot below I would like to remove the "AREA" legend items since it is already explained by the faceting, but keep the "TOTAL_VOLUME" legend items that explain the point sizes.
Here is the code used to generate the plot:
library(data.table) # Import libraries
library(ggplot2)
library(scales)
set.seed(1234) # Set Seed
area.list <- LETTERS[seq(1:7)] # 7 Possible areas
date.list <- seq(as.Date("2014/03/01"), by="month", length=13)
# Build a random data set
data <- data.table(AREA = sample(area.list, 80, replace=TRUE),
DATE = sample(date.list, 80, replace=TRUE),
VOLUME = rnorm(n=80, mean=100000,sd=40000),
NON_CONFORMING_VOLUME = rnorm(n=80, mean=30000,sd=5000))
# Summarise data by area and date
data <- data[, list(TOTAL_VOLUME=sum(VOLUME),
TOTAL_NC_VOLUME=sum(NON_CONFORMING_VOLUME)),
by=list(AREA, DATE)]
data$PERCENT_NC <- data$TOTAL_NC_VOLUME / data$TOTAL_VOLUME * 100
p <- ggplot(data = data, aes(x = DATE,
y = PERCENT_NC,
colour = AREA)) +
geom_point(aes(size = TOTAL_VOLUME)) +
geom_line() +
facet_grid(. ~ AREA) +
theme(legend.position="bottom", axis.text.x=element_text(angle=90,hjust=1)) +
ggtitle("Percent Non-Conforming by Area by Month") +
labs(x = "Month", y = "% Non-Conforming") +
scale_size_continuous(labels = comma)
plot(p)
I tried adding show_guide=FALSE to geom_point() but that removes both TOTAL_VOLUME and AREA.
Thank you
You can set the guide for each scale in the following way:
p + guides(size = "legend", colour = "none")