I'm having problems renaming my figure legend. When I try using scale_color_discrete to do this the legend duplicates on the graph:
This is the code I've used:
Scoping <- read.csv("Data/scoping.csv")
#Enzyme column must be turned into a factor
Scoping$Enzyme <- as.factor(Scoping$Enzyme)
#Creating scatterplot called scopplt
scopplt <- ggplot(Scoping,aes(x=Time,y=PNP,shape=Enzyme, color=Enzyme))+
geom_point(size=2)+
theme_classic()+
scale_y_continuous(limits=c(0,120), breaks = c(0,30,60,90,120), name = "[PNP] µM")+
scale_x_continuous(limits=c(0,12), breaks = c(0,2,4,6,8,10,12), name = "Time (min)")+
theme(legend.position = c(0.2, 0.6))
scopplt
# Adding linear regression
scopplt+geom_smooth(method=lm,se=FALSE,fullrange=TRUE,
aes(color=Enzyme)) +
scale_color_discrete(name= "[Enzyme] µM")
Does anyone know why this is happening? Thanks.
From what I can tell, you are calling scale_color_discrete because you want to rename the legend. If that is indeed what you are trying to do with that line, you are taking the wrong approach. The problem is that both the color and the shape of the points are mapped to Enzyme, and scale_color_discrete only applies to the color. Once the color legend has a different title from the shape legend, ggplot can no longer merge them and draws two legends. To change the legend title, you can do what teunbrand suggested so that ggplot knows that you want the same title for the color and shape, thereby putting the two legends together. Alternatively, replace scale_color_discrete(name = "[Enzyme] µM") with labs(color = "[Enzyme] µM", shape = "[Enzyme] µM"). My intuition tells me there should be a simpler way of doing this, but I have not been able to find one.
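As a minimal sketch of the labs() approach, the following uses a small made-up data frame (the values are hypothetical and only stand in for the scoping data) in which color and shape are both mapped to Enzyme. Giving both aesthetics the same title should keep a single merged legend:

```r
library(ggplot2)

# Hypothetical stand-in for the scoping data: two enzyme concentrations
toy <- data.frame(
  Time   = rep(c(0, 4, 8, 12), times = 2),
  PNP    = c(0, 20, 40, 60, 0, 35, 70, 105),
  Enzyme = factor(rep(c("0.5", "1"), each = 4))
)

p <- ggplot(toy, aes(x = Time, y = PNP, shape = Enzyme, color = Enzyme)) +
  geom_point(size = 2) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, fullrange = TRUE) +
  theme_classic() +
  # Same title for both aesthetics, so the two legends stay merged
  labs(color = "[Enzyme] µM", shape = "[Enzyme] µM")

p
```

If only one of the two titles is changed, the legends no longer match and are drawn separately, which is the duplication described in the question.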
I am using quantile regression in R with the qgam package and visualising them using the mgcViz package, but I am struggling to understand how to control the appearance of the plots. The package effectively turns gams (in my case mqgams) into ggplots.
Simple reprex:
egfit <- mqgam(data = iris,
Sepal.Length ~ s(Petal.Length),
qu = c(0.25,0.5,0.75))
plot.mgamViz(getViz(egfit))
I am able to control things that can be added, for example the axis labels and theme of the plot, but I'm struggling to affect things that would normally be addressed in the aes() or geom_x() functions.
How would I control the thickness of the line? If this were a normal geom_smooth() or geom_line() I'd simply put size = 1 inside of the geoms, but I cannot see how I'd do so here.
How can I control the linetype of these lines? The "id" is continuous and one cannot supply a linetype to a continuous scale. If this were a normal plot I would convert "id" to a character, but I can't see a way of doing so with the plot.mgamViz function.
How can I supply a new colour scale? It seems as though if I provide it with a new colour scale it invents new ID values to put on the legend that don't correlate to the actual "id" values, e.g.
plot.mgamViz(getViz(egfit)) + scale_colour_viridis_c()
I fully expect this to be relatively simple and that I'm missing something obvious, and I imagine the answers to all three of these subquestions are very similar to one another. Thanks in advance.
You need to extract your ggplot element using this:
p1 <- plot.mgamViz(getViz(egfit))
p <- p1$plots[[1]]$ggObj
Then, id should be as.factor:
p$data$id <- as.factor(p$data$id)
Now you can play with ggplot elements as you prefer:
library(mgcViz)
egfit <- mqgam(data = iris,
Sepal.Length ~ s(Petal.Length),
qu = c(0.25,0.5,0.75))
p1 <- plot.mgamViz(getViz(egfit))
# Taking gg infos and convert id to factor
p <- p1$plots[[1]]$ggObj
p$data$id <- as.factor(p$data$id)
# Changing ggplot attributes
p <- p +
geom_line(linetype = 3, size = 1)+
scale_color_brewer(palette = "Set1")+
labs(x="Petal Length", y="s(Petal Length)", color = "My ID labels:")+
theme_classic(14)+
theme(legend.position = "bottom")
p
Here is the generated plot:
Hope it is useful!
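The key step above is converting id to a factor. A rough illustration with plain ggplot2 (no mgcViz involved; the curves data frame is made up for the example) shows why: once id is discrete it can be mapped to linetype and given a discrete colour palette. The linewidth argument assumes ggplot2 3.4.0 or later; older versions use size for lines.

```r
library(ggplot2)

# Made-up data: three straight lines tagged with a numeric id, like quantiles
curves <- data.frame(
  x  = rep(1:10, times = 3),
  y  = rep(1:10, times = 3) * rep(c(0.5, 1, 1.5), each = 10),
  id = rep(c(0.25, 0.5, 0.75), each = 10)
)

# A continuous id cannot be mapped to linetype, so make it discrete first
curves$id <- as.factor(curves$id)

p <- ggplot(curves, aes(x = x, y = y, colour = id, linetype = id)) +
  geom_line(linewidth = 1) +
  scale_colour_brewer(palette = "Set1") +
  theme_classic()

p
```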
I'm plotting a sort of choropleth of up to three selectable species abundances across a research area. This toy code behaves as expected and does almost what I want:
library(dplyr)
library(ggplot2)
square <- expand.grid(X=0:10, Y=0:10)
sq2 <- square[rep(row.names(square), 2),] %>%
arrange(X,Y) %>%
mutate(SPEC = rep(c('red','blue'),len=n())) %>%
mutate(POP = ifelse(SPEC %in% 'red', X, Y)) %>%
group_by(X,Y) %>%
mutate(CLR = rgb(X/10,0,Y/10)) %>% ungroup()
ggplot(sq2, aes(x=X, y=Y, fill=CLR)) + geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=c('red','blue'), breaks=c('#FF0000','#0000FF'))
Producing this:
A modified version properly plots the real map, appropriately mixing the RGBs to show the species proportions per map unit. But given that mixing, the real data does not necessarily include the specific values listed in breaks, in which case no entry appears in the legend for that species. If you change the last line of the example to
labels=c('red','blue','green'), breaks=c('#FF0000','#0000FF','#00FF00'))
you get the same legend as shown, with only 'red' and 'blue' displayed, as there is no green in it. Searching the data for each max(Species) and assigning those to the legend is possible but won't make good legend keys for species that only occur in low proportions. What's needed is for the legend to display the idea of the entities present, not their attested presences -- three colors in the legend even if only one species is detected.
I'd think that scale_fill_manual() or the override.aes argument might help me here but I haven't been able to make any combination work.
Edit: Episode IV -- A New Dead End
(Thanks @r2evans for fixing my omission of packages.)
I thought I might be able to trick the legend by mutating a further column into the df in the processing pipe called spCLR to represent the color ('#FF0000', e.g.) that codes each entry's species (redundant info, but fine). Now the plotting call in my real version goes:
df %>% [everything] %>%
ggplot(aes(x = X, y = Y, height = WIDTH, width = WIDTH, fill = CLR)) +
geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=spCODE, breaks=spCLR)
But this gives the error: Error in check_breaks_labels(breaks, labels) : object 'spCLR' not found. That seems weird since spCLR is indeed in the pipe-modified df, and of all the values supplied to the ggplot functions spCODE is the only one present in the original df -- so if there's some kind of scope problem I don't get it. [Re-edit -- I see that neither labels nor breaks wants to look at df$anything. Anyway.]
I assume (rightly?) there's some way to make this one work [?], but it still wouldn't make the legend show 'red', 'blue' and 'green' in my toy example -- which is what my original question is really about -- because there is still no actual green-data present in that. So to reiterate, isn't there any way to force a ggplot2 legend to show the things you want to talk about, rather than just the ones that are present in the data?
I have belatedly discovered that my question is a near-duplicate of this. The accepted answer there (from @joran) doesn't work for this case, but the second answer (from @Axeman) does. So the way for me to go here is that the last line should be
labels=c('red','blue','green'), limits=c('#FF0000','#0000FF','#00FF00'))
using the limits argument instead of breaks, and now my example and my real version work as desired.
I have to say I spent a lot of time digging around in the ggplot2 reference without ever gaining a suspicion that limits was the correct alternative to breaks. The breaks argument is explicitly mentioned on that reference page, while limits does not appear. The ?limits page is quite uninformative, and I can't find anything that lays out the distinction between the two: when to use this rather than that.
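For reference, here is a self-contained sketch of the toy example with the limits fix applied, written with base R plus ggplot2 only (the data frame name sq is just for this illustration). The data contain pure red and pure blue but no pure green, yet all three keys should appear because limits declares the full set of values the scale covers:

```r
library(ggplot2)

# Toy grid: red channel follows X, blue channel follows Y, green is always 0
sq <- expand.grid(X = 0:10, Y = 0:10)
sq$CLR <- rgb(sq$X / 10, 0, sq$Y / 10)

p <- ggplot(sq, aes(x = X, y = Y, fill = CLR)) +
  geom_tile() +
  scale_fill_identity("Species", guide = "legend",
                      labels = c("red", "blue", "green"),
                      # limits (not breaks) keeps '#00FF00' in the legend
                      # even though no tile has that colour
                      limits = c("#FF0000", "#0000FF", "#00FF00"))

p
```

As far as I can tell, the practical difference is that breaks only selects which of the values present on the scale get a key, whereas limits defines what the scale contains in the first place.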
I assume from the heatmap use case that you have no other need for colour mapping in the chart. In this case, a possible workaround is to leave the fill scale alone, & create an invisible geom layer with colour aesthetic mapping to generate the desired legend instead:
ggplot(sq2, aes(x=X, y=Y)) +
geom_tile(aes(fill = CLR)) + # move fill mapping here so new point layer doesn't inherit it
scale_fill_identity() + # scale_*_identity has guide = "none" by default
# add invisible layer with colour (not fill) mapping, within x/y coordinates within
# same range as geom_tile layer above
geom_point(data = . %>%
slice(1:3) %>%
# optional: list colours in the desired label order
mutate(col = forcats::fct_inorder(c("red", "blue", "green"))),
aes(colour = col),
alpha = 0) +
# add colour scale with alpha set to 1 (overriding alpha = 0 above),
# also make the shape square & larger to mimic the default legend keys
# associated with fill scale
scale_color_manual(name = "Species",
values = c("red" = '#FF0000', "blue" = '#0000FF', "green" = '#00FF00'),
guide = guide_legend(override.aes = list(alpha = 1, shape = 15, size = 5)))
I've got a bar graph whose variable labels (a couple of them) need changing. In the specific example here, I've got a variable "Sputum.Throat" which refers to samples which could be either sputum or throat swabs, so the label for this value should really read "Sputum/Throat" or even "Sputum or Throat Swab" (this latter would only work if I can wrap the text). So far, no syntax I've tried can pull this off.
Here's my code:
CultPerf <- data.frame(Blood=ForAnalysis$Cult_lastmo_blood, CSF=ForAnalysis$Cult_lastmo_csf, Fecal=ForAnalysis$Cult_lastmo_fecal, Genital=ForAnalysis$Cult_lastmo_genital, `Sputum-Throat`=ForAnalysis$`Cult_lastmo_sput-throat`, Urine=ForAnalysis$Cult_lastm_urine, `Wound-Surgical`=ForAnalysis$`Cult_lastmo_wound-surg`, Other=ForAnalysis$Cult_lastmo_oth)
CP <- data.table::melt(CultPerf, variable.names("Frequency"))
CP$value <- factor(CP$value, levels=c(">100","50-100","25-50","0-25"))
CP$variable <- factor(CP$variable, levels = c("Other","Wound.Surgical","Urine","Sputum.Throat","Genital","Fecal","CSF","Blood"))
ggplot(data=CP)+
geom_bar(aes(x=variable, fill = value), position="dodge", width = 0.9)+
labs(x="Culture Type", y="Number of Labs", title="Number of Cultures Performed Per Month at Study Hospitals", subtitle="n=140")+
coord_flip()+
theme(legend.title = element_blank(),aspect.ratio = 1.25/1,plot.subtitle=element_text(face="italic",hjust=0.5),plot.title=element_text(hjust=0.5))+
guides(fill = guide_legend(reverse = TRUE))
And for reference, here's a copy of the successful plot which it does produce:
As I mentioned, all I want to do is change those labels of the individual values on the Y axis. Any suggestions will be appreciated!
If you want to just change the axis label for that one category, try adding in this line
scale_x_discrete(labels=c("Sputum.Throat"="Sputum/Throat"))
be sure to add it (+) to your ggplot object.
Using the helpful suggestion from @MrFlick above (with my thanks), I added the following to my ggplot code, which also gave me a word-wrapped version of the second label:
scale_x_discrete(labels=c("Sputum.Throat"="Sputum/Throat", "Wound.Surgical"="Surgical or \n Other Wound"))+
The resultant plot looks like this:
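A small self-contained sketch of the same relabelling, using a made-up data frame in place of the survey data (cp and new_labels are names chosen for this example). Only the categories named in the labels vector are relabelled, and "\n" inserts the line break:

```r
library(ggplot2)

# Hypothetical stand-in for the melted culture data
cp <- data.frame(
  variable = factor(c("Blood", "Sputum.Throat", "Wound.Surgical", "Sputum.Throat")),
  value    = c("0-25", "25-50", "0-25", "0-25")
)

# Named vector: names are the existing factor levels, values the new labels
new_labels <- c("Sputum.Throat"  = "Sputum/Throat",
                "Wound.Surgical" = "Surgical or\nOther Wound")

p <- ggplot(cp) +
  geom_bar(aes(x = variable, fill = value), position = "dodge") +
  scale_x_discrete(labels = new_labels) +
  coord_flip()

p
```

Because of coord_flip(), the discrete x scale ends up on the vertical axis, which is why scale_x_discrete is the one to modify.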
I am trying to color my stacked histogram according to a factor column, but all the bars have a "green" roof. I want the top of each bar to be the same color as the bar itself. The figure below shows clearly what is wrong: all the bars have a green horizontal line at the top.
Here is a dummy data set :
BodyLength <- rnorm(100, mean = 50, sd = 3)
vector <- c("80","10","5","5")
colors <- c("black","blue","red","green")
color <- rep(colors,vector)
data <- data.frame(BodyLength,color)
And the program I used to generate the plot below :
plot <- ggplot(data = data, aes(x=data$BodyLength, color = factor(data$color), fill=I("transparent")))
plot <- plot + geom_histogram()
plot <- plot + scale_colour_manual(values = c("Black","blue","red","green"))
Also, since the data column itself contains color names, is there any way to avoid specifying them again in scale_color_manual? Can ggplot identify them from the data itself? But I would really like help with the first problem right now... Thanks.
Here is a quick way to get your colors to scale_colour_manual without writing out a vector:
data <- data.frame(BodyLength,color)
data$color<- factor(data$color)
and then later,
scale_colour_manual(values = levels(data$color))
Now, with respect to your first problem, I don't know exactly why your bars have green roofs. However, you may want to look at some different options for the position argument in geom_histogram, such as
plot + geom_histogram(position="identity")
...or position="dodge". The identity option is closer to what you want, but since green is the last line drawn, it overwrites the previous colors.
I like density plots better for these problems myself.
ggplot(data=data, aes(x=BodyLength, color=color)) + geom_density()
ggplot(data=data, aes(x=BodyLength, fill=color)) + geom_density(alpha=.3)
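On the second question (reusing the color names stored in the data), another option, not mentioned above, is scale_colour_identity(), which tells ggplot2 to use the values in the column as the colors themselves. The sketch below regenerates the dummy data under the name dat, maps columns by name rather than with data$, and sets position = "identity" as suggested; bins = 30 is an arbitrary choice:

```r
library(ggplot2)

set.seed(1)  # only so the dummy data are reproducible
dat <- data.frame(
  BodyLength = rnorm(100, mean = 50, sd = 3),
  color      = rep(c("black", "blue", "red", "green"), times = c(80, 10, 5, 5))
)

p <- ggplot(dat, aes(x = BodyLength, colour = color)) +
  geom_histogram(fill = "transparent", position = "identity", bins = 30) +
  # Use the color names in the column directly; guide = "legend" keeps a legend
  scale_colour_identity(guide = "legend")

p
```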
I am preparing a series of plots using the sjPlot package. For simple frequency presentation I use sjp.frq. I would like to use a different color for each bar. I found the option to choose a color, but it applies only to the whole series: the geom.colors argument changes the color of all bars. Even the combination geom.colors=c("color1","color2","color3") doesn't work.
Is there any solution to achieve something similar to this:
data(mpg)
sjp.frq(mpg$year,title = "", axis.title = "",
show.prc = TRUE, show.n = FALSE,
show.axis.values = FALSE)
I'm not sure, but I think I recall that ggplot2 used this color scheme by default for plots when no color aesthetic was specified. However, later versions of ggplot2 now use a single color for simple frequencies (without grouping/colour aesthetics):
library(ggplot2)
library(sjmisc)
data(efc)
ggplot(efc, aes(e42dep)) + geom_bar()
That's why the image you posted has different colors, while sjp.frq now prints bars in one color only. Since you don't have a grouping aesthetic for simple frequency bars, you can't provide different colors for each geom / bar in sjp.frq. In this case, you have to find your own solution and add a group aesthetic, like:
ggplot(efc, aes(e42dep, fill = to_label(e42dep))) +
geom_bar() +
labs(y = NULL, x = get_label(efc$e42dep), fill = get_label(efc$e42dep)) +
scale_x_continuous(breaks = c(1:4), labels = get_labels(efc$e42dep))
However, to me it does not make much sense to give each bar a separate color and also provide axis labels. Using a legend instead of axis labels (dropping the axis labels) would work, but this makes the graph less intuitive, because you have to switch between legend and bars to find out which bar represents which category. For simple frequency plots, this is unnecessary complexity.