R ggplot remove certain items from legend [duplicate] - r

This question already has answers here:
Turning off some legends in a ggplot
(2 answers)
Closed 4 years ago.
Is it possible to remove certain items from a legend created with ggplot? I have a plot that is faceted, and point sizes provide another dimension to the plot. Since the plot is faceted I do not need to have certain legend items since it is explained by the facet titles, but the legend is still relevant for the point size.
In the plot below I would like to remove the "AREA" legend items since it is already explained by the faceting, but keep the "TOTAL_VOLUME" legend items that explain the point sizes.
Here is the code used to generate the plot:
library(data.table) # Import libraries
library(ggplot2)
library(scales)
set.seed(1234) # Set Seed
area.list <- LETTERS[seq(1:7)] # 7 Possible areas
date.list <- seq(as.Date("2014/03/01"), by="month", length=13)
# Build a random data set
data <- data.table(AREA = sample(area.list, 80, replace=TRUE),
DATE = sample(date.list, 80, replace=TRUE),
VOLUME = rnorm(n=80, mean=100000,sd=40000),
NON_CONFORMING_VOLUME = rnorm(n=80, mean=30000,sd=5000))
# Summarise data by area and date
data <- data[, list(TOTAL_VOLUME=sum(VOLUME),
TOTAL_NC_VOLUME=sum(NON_CONFORMING_VOLUME)),
by=list(AREA, DATE)]
data$PERCENT_NC <- data$TOTAL_NC_VOLUME / data$TOTAL_VOLUME * 100
p <- ggplot(data = data, aes(x = DATE,
y = PERCENT_NC,
colour = AREA)) +
geom_point(aes(size = TOTAL_VOLUME)) +
geom_line() +
facet_grid(. ~ AREA) +
theme(legend.position="bottom", axis.text.x=element_text(angle=90,hjust=1)) +
ggtitle("Percent Non-Conforming by Area by Month") +
labs(x = "Month", y = "% Non-Conforming") +
scale_size_continuous(labels = comma)
plot(p)
I tried adding show_guide=FALSE to geom_point() but that removes both TOTAL_VOLUME and AREA.
Thank you

You can set the guide for each scale in the following way:
p + guides(size = "legend", colour = "none")
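As a minimal sketch of the same idea on a small made-up data set (the data frame `toy` and its columns are illustrative assumptions, not the asker's data), the colour legend can be dropped either through `guides()` or through the colour scale's own `guide` argument, while the size legend stays in place:

```r
library(ggplot2)

# Hypothetical toy data: two areas, three months, one size variable
toy <- data.frame(
  AREA   = rep(c("A", "B"), each = 3),
  MONTH  = rep(1:3, times = 2),
  PCT    = c(30, 28, 35, 22, 25, 27),
  VOLUME = c(100, 150, 120, 90, 200, 160)
)

base <- ggplot(toy, aes(x = MONTH, y = PCT, colour = AREA)) +
  geom_point(aes(size = VOLUME)) +
  geom_line() +
  facet_grid(. ~ AREA)

# Option 1: switch the colour guide off and keep the size legend
p1 <- base + guides(colour = "none", size = "legend")

# Option 2: the same effect, set on the colour scale itself
p2 <- base + scale_colour_discrete(guide = "none")
```

Either object can then be drawn with print(p1) or print(p2). Setting the guide per aesthetic is what avoids the all-or-nothing behaviour of hiding the legend on the layer.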

Related

Adding different secondary x axis for each facet in ggplot2

I would like to add a different secondary axis to each facet. Here is my working example:
library(ggplot2)
library(data.table)
#Create the data:
data<-data.table(cohort=sample(c(1946,1947,1948),10000,replace=TRUE),
works=sample(c(0,1),10000,replace=TRUE),
year=sample(seq(2006,2013),10000,replace=TRUE))
data[,age_cohort:=year-cohort]
data[,prop_works:=mean(works),by=c("cohort","year")]
#Prepare data for plotting:
data_to_plot<-unique(data,by=c("cohort","year"))
#Plot what I want:
ggplot(data_to_plot,aes(x=age_cohort,y=prop_works))+geom_point()+geom_line()+
facet_wrap(~ cohort)
The plot shows how many people of a particular cohort work at a given age. I would like to add a secondary x axis showing which year corresponds to a particular age for different cohorts.
Since you have the actual values you want to use in your dataset, one workaround is to plot them as an additional geom_text layer:
ggplot(data_to_plot,
aes(x = age_cohort, y = prop_works, label = year))+
geom_point() +
geom_line() +
geom_text(aes(y = min(prop_works)),
hjust = 1.5, angle = 90) + # rotate to save space
expand_limits(y = 0.44) +
scale_x_continuous(breaks = seq(58, 70, 1)) + # ensure x-axis breaks are at whole numbers
scale_y_continuous(labels = scales::percent) +
facet_wrap(~ cohort, scales = "free_x") + # show only relevant age cohorts in each facet
theme(panel.grid.minor.x = element_blank()) # hide minor grid lines for cleaner look
You can adjust the hjust value in geom_text() and y value in expand_limits() for a reasonable look, depending on your desired output's dimensions.
(More data wrangling would be required if there are missing years in the data, but I assume that isn't the case here.)

How to fill the area under the lines in ggplot2 geom_freqpoly graph? [duplicate]

This question already has answers here:
What is the simplest method to fill the area under a geom_freqpoly line?
(4 answers)
Closed 6 years ago.
I am plotting a continuous variable on the x-axis against the corresponding counts (not the density) on the y-axis using ggplot2.
This is my code
p <- ggplot(matched.frame, aes(x = AGE, color = as.factor(DRUG_KEY))) + geom_freqpoly(binwidth=5)
p1 <- p + theme_minimal()
plot(p1)
This produces a graph like this:
I want the areas under these lines to be filled with colors and with little bit of transparency. I know to do this for density plots in ggplot2, but I am stuck with this frequency polygon.
Also, how do I change the legend on the right side? For example, I want 'Cases' instead of '26' and 'Controls' instead of '27'. Instead of as.factor(DRUG_KEY), I want the legend title to appear as 'Colors'.
Sample data
matched.frame <- data.frame("AGE"=c(18,19,20,21,22,23,24,25,26,26,27,18,19,20,24,23,23,23,22,30,28,89,30,20,23))
matched.frame$DRUG_KEY <- 26
matched.frame$DRUG_KEY[11:25] <- 27
You can use geom_ribbon to fill the area under the curves and scale_fill_discrete (fill color) as well as scale_color_discrete (line color) to change the legend labels:
library(ggplot2)
set.seed(1)
df <- data.frame(x = 1:10, y = runif(20), f = gl(2, 10))
ggplot(df, aes(x=x, ymin=0, ymax=y, fill=f)) +
geom_ribbon(alpha=.5) +
scale_fill_discrete(labels = c("1"="foo", "2"="bar"), name = "Labels")
With regard to your edit:
ggplot(matched.frame, aes(x=AGE, fill=as.factor(DRUG_KEY), color=as.factor(DRUG_KEY))) +
stat_bin(aes(ymax=..count..), alpha=.5, ymin=0, geom="ribbon", binwidth =5, position="identity", pad=TRUE) +
geom_freqpoly(binwidth=5, size=2) +
scale_fill_discrete(labels = c("26"="foo", "27"="bar"), name = "Labels") +
scale_color_discrete(labels = c("26"="foo", "27"="bar"), name = "Labels")
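To get the exact labels asked for in the question ('Cases', 'Controls', and a legend titled 'Colors'), one option is to recode the grouping variable as a labelled factor before plotting. The sketch below uses the question's sample data; the column name `GROUP` is an assumption introduced here, and `after_stat(count)` assumes ggplot2 3.3.0 or later (it is the newer spelling of `..count..`).

```r
library(ggplot2)

# Sample data from the question
matched.frame <- data.frame(AGE = c(18, 19, 20, 21, 22, 23, 24, 25, 26, 26, 27, 18, 19,
                                    20, 24, 23, 23, 23, 22, 30, 28, 89, 30, 20, 23))
matched.frame$DRUG_KEY <- 26
matched.frame$DRUG_KEY[11:25] <- 27

# Hypothetical helper column: readable factor labels instead of 26/27
matched.frame$GROUP <- factor(matched.frame$DRUG_KEY,
                              levels = c(26, 27),
                              labels = c("Cases", "Controls"))

p <- ggplot(matched.frame, aes(x = AGE, fill = GROUP, colour = GROUP)) +
  # filled, semi-transparent area under each frequency polygon
  stat_bin(aes(ymax = after_stat(count)), ymin = 0, alpha = 0.5,
           geom = "ribbon", binwidth = 5, position = "identity", pad = TRUE) +
  geom_freqpoly(binwidth = 5) +
  # one shared title so the fill and colour legends merge
  labs(fill = "Colors", colour = "Colors")
```

Because the labels live in the data, the fill and colour scales pick them up automatically and no `labels =` argument is needed on either scale.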

Moving legend to the bottom in ggplot2 [duplicate]

This question already has answers here:
How to move or position a legend in ggplot2
(4 answers)
Closed 7 years ago.
I have created the following heatmap. Notice that the legend for cohort is on the right and is placed vertically.
How do I move the legend to the bottom in order to give more space to the x-axis variable, month (M0 to M55)? Also, you will notice that the x-axis labels overlap and are therefore unreadable.
Output of the graph:
cohort.clients<-df1
cohort.clients$cohort<-as.character(cohort.clients$cohort)
#we need to melt data
cohort.chart.cl <- melt(cohort.clients, id.vars = 'cohort')
colnames(cohort.chart.cl) <- c('cohort', 'month', 'clients')
#define palette
reds <- colorRampPalette(c('light green',"dark green","yellow"))
#plot data
p <- ggplot(cohort.chart.cl, aes(x=month, y=clients, group=cohort))
p + geom_area(aes(fill = cohort)) +
scale_fill_manual(values = reds(nrow(cohort.clients))) +
ggtitle('Customer Cohort')
Try something like:
ggplot(cohort.chart.cl, aes(x=month, y=clients, group=cohort)) +
geom_area(aes(fill = cohort)) +
scale_fill_manual(values = reds(nrow(cohort.clients))) +
ggtitle('Customer Cohort') +
theme(axis.text.x = element_text(angle = 45, hjust = 1), # rotate x labels so they do not overlap
legend.direction = "horizontal", legend.position = "bottom") # horizontal legend below the plot
It's also worth noting that your color palette is essentially a single color. If you make cohort$month a factor then ggplot should automatically give you a much more informative palette by default. That being said, with >50 categories, you're well past the realm of distinguishable colors and might also consider binning the months (into yearly quarters?) and returning to a spectrum like the one you have now.

Line up columns of bar graph with points of line plot with ggplot

Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: to align the plots, use gtable (or see the answers to your earlier question).
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit: To change the heights of the plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot the geoms cover the x-axis from roughly 0.5 to 27.5, while in the other plot the data only range from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect. The y-axis labels take up a little more space in the upper plot, because the numbers have two digits.
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows:
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())

Modifying ggplot2 Y axis to use integers without enforcing an upper limit [duplicate]

This question already has answers here:
How to display only integer values on an axis using ggplot2
(13 answers)
Closed 8 years ago.
I am trying to modify the axes in ggplot2 so that it is one decimal point and has a label for every integer. However, I want to do it without an upper limit so that it will automatically adjust to data of different counts.
The difference between my question and the question posed here (of which mine was flagged as a duplicate) is that I need to make this work automatically for many different data sets, not just for a single one. It must automatically choose the upper limit instead of creating a fixed y-axis with breaks=(0,2,4,...). This question has been answered extremely well by @DidzisElferts below.
Here is my work:
library(data.table)
library(scales)
library(ggplot2)
mtcars <- data.table(mtcars)
mtcars$Cylinders <- as.factor(mtcars$cyl)
mtcars$Gears <- as.factor(mtcars$gear)
setkey(mtcars, Cylinders, Gears)
mtcars <- mtcars[CJ(unique(Cylinders), unique(Gears)), .N, by = .EACHI, allow.cartesian = TRUE] # by = .EACHI: one count per Cylinders/Gears combination
ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) +
geom_bar(position="dodge", stat="identity") +
ylab("Count") + theme(legend.position="top") +
scale_x_discrete(drop = FALSE)
As you can see, ggplot2 is plotting the axes with a decimal point and doing it every 2.5 automatically. I'd like to change that. Any way to do so?
integer_breaks <- function(x)
seq(floor(min(x)), ceiling(max(x)))
ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) +
geom_bar(position="dodge", stat="identity") +
ylab("Count") + theme(legend.position="top") +
scale_y_continuous(breaks=integer_breaks) +
scale_x_discrete(drop = FALSE)
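When `breaks` is given a function, ggplot2 calls it with the limits of the scale, which is why this approach adapts to each data set. As a small illustration of what the helper returns for a given pair of limits (the limits below are made-up inputs, and `integer_breaks_by()` is a hypothetical variant added here for wider spacing, not part of the original answer):

```r
# Same helper as in the answer: a break at every integer spanning the limits
integer_breaks <- function(x)
  seq(floor(min(x)), ceiling(max(x)))

# Hypothetical variant: returns a breaks function with a break every `step` units
integer_breaks_by <- function(step = 1) {
  function(x) seq(floor(min(x)), ceiling(max(x)), by = step)
}

integer_breaks(c(0.2, 3.7))       # 0 1 2 3 4
integer_breaks_by(2)(c(0, 9.4))   # 0 2 4 6 8 10
```

The variant would be used as scale_y_continuous(breaks = integer_breaks_by(2)), which keeps integer labels without hard-coding an upper limit.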
Use scale_y_continuous(breaks=c(0,2,4,6,8,10)). So your plotting code will look like:
ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) +
geom_bar(position="dodge", stat="identity") +
ylab("Count") + theme(legend.position="top") +
scale_y_continuous(breaks=c(0,2,4,6,8,10)) +
scale_x_discrete(drop = FALSE)
EDIT: Alternatively, you can use scale_y_continuous(breaks=seq(round(max(mtcars$N),0))) in order to automatically adjust the scale to the maximum value of the y-variable. When you want the breaks more than 1 apart, you can use, for example, seq(from=0,to=round(max(mtcars$N),0),by=2).
