I've got a bar graph whose variable labels (a couple of them) need changing. In the specific example here, I've got a variable "Sputum.Throat" which refers to samples which could be either sputum or throat swabs, so the label for this value should really read "Sputum/Throat" or even "Sputum or Throat Swab" (this latter would only work if I can wrap the text). So far, no syntax I've tried can pull this off.
Here's my code:
CultPerf <- data.frame(
  Blood = ForAnalysis$Cult_lastmo_blood,
  CSF = ForAnalysis$Cult_lastmo_csf,
  Fecal = ForAnalysis$Cult_lastmo_fecal,
  Genital = ForAnalysis$Cult_lastmo_genital,
  `Sputum-Throat` = ForAnalysis$`Cult_lastmo_sput-throat`,
  Urine = ForAnalysis$Cult_lastm_urine,
  `Wound-Surgical` = ForAnalysis$`Cult_lastmo_wound-surg`,
  Other = ForAnalysis$Cult_lastmo_oth
)
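# Long format with columns `variable` (culture type) and `value` (frequency bin)
CP <- data.table::melt(data.table::as.data.table(CultPerf), measure.vars = names(CultPerf))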
CP$value <- factor(CP$value, levels=c(">100","50-100","25-50","0-25"))
CP$variable <- factor(CP$variable, levels = c("Other","Wound.Surgical","Urine","Sputum.Throat","Genital","Fecal","CSF","Blood"))
ggplot(data=CP)+
geom_bar(aes(x=variable, fill = value), position="dodge", width = 0.9)+
labs(x="Culture Type", y="Number of Labs", title="Number of Cultures Performed Per Month at Study Hospitals", subtitle="n=140")+
coord_flip()+
theme(legend.title = element_blank(),aspect.ratio = 1.25/1,plot.subtitle=element_text(face="italic",hjust=0.5),plot.title=element_text(hjust=0.5))+
guides(fill = guide_legend(reverse = TRUE))
And for reference, here's the plot it does successfully produce: [plot image not reproduced]
As I mentioned, all I want to do is change the labels of the individual categories on the y axis. Any suggestions will be appreciated!
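The dot in "Sputum.Throat" comes from data.frame() itself: with its default check.names = TRUE, column names are passed through make.names(), which replaces the hyphen with a dot. A small sketch with toy one-column data frames (not the real ForAnalysis data):

```r
# data.frame() runs names through make.names() by default (check.names = TRUE)
df_default <- data.frame(`Sputum-Throat` = 1)
names(df_default)   # "Sputum.Throat"

# check.names = FALSE keeps the name exactly as written
df_kept <- data.frame(`Sputum-Throat` = 1, check.names = FALSE)
names(df_kept)      # "Sputum-Throat"
```

Keeping the hyphenated name would make the axis read "Sputum-Throat", but the factor levels used later would then need that spelling too.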
If you just want to change the axis label for that one category, try adding this line:
scale_x_discrete(labels=c("Sputum.Throat"="Sputum/Throat"))
Be sure to add it (with +) to your ggplot object.
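To get the wrapped "Sputum or Throat Swab" form without typing \n by hand, scale_x_discrete() also accepts a function for labels: it receives the breaks and returns the text to display. A minimal sketch; cp, pretty_names and relabel are hypothetical names and the rows are made up. The category stays on the x aesthetic even though coord_flip() draws it vertically, which is why the x scale is the one to change.

```r
library(ggplot2)

# Hypothetical stand-in for CP (made-up rows): category in `variable`, bin in `value`
cp <- data.frame(
  variable = factor(c("Blood", "Sputum.Throat", "Sputum.Throat", "Wound.Surgical")),
  value = factor(c("0-25", "25-50", "0-25", ">100"))
)

# Readable names for the levels that need them; other levels are left unchanged
pretty_names <- c(Sputum.Throat = "Sputum or Throat Swab",
                  Wound.Surgical = "Surgical or Other Wound")

relabel <- function(x) {
  x <- as.character(x)
  out <- ifelse(x %in% names(pretty_names), pretty_names[x], x)
  # strwrap() only breaks at spaces; width = 12 is an arbitrary choice
  vapply(out, function(s) paste(strwrap(s, width = 12), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

p <- ggplot(cp, aes(x = variable, fill = value)) +
  geom_bar(position = "dodge") +
  scale_x_discrete(labels = relabel) +  # function maps breaks to display labels
  coord_flip()
```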
Using the helpful suggestion from @MrFlick above (with my thanks), I added the following to my ggplot code, which also let me word-wrap the second label:
scale_x_discrete(labels=c("Sputum.Throat"="Sputum/Throat", "Wound.Surgical"="Surgical or\nOther Wound"))+
The resulting plot looks like this: [image: Revised plot]
I'm having problems renaming my figure legend. When I try to use scale_color_discrete to do this, the legend is duplicated on the graph:
This is the code I've used:
Scoping <- read.csv("Data/scoping.csv")
#Enzyme column must be turned into a factor
Scoping$Enzyme <- as.factor(Scoping$Enzyme)
#Creating scatterplot called scopplt
scopplt <- ggplot(Scoping,aes(x=Time,y=PNP,shape=Enzyme, color=Enzyme))+
geom_point(size=2)+
theme_classic()+
scale_y_continuous(limits=c(0,120), breaks = c(0,30,60,90,120), name = "[PNP] µM")+
scale_x_continuous(limits=c(0,12), breaks = c(0,2,4,6,8,10,12), name = "Time (min)")+
theme(legend.position = c(0.2, 0.6))
scopplt
# Adding linear regression
scopplt+geom_smooth(method=lm,se=FALSE,fullrange=TRUE,
aes(color=Enzyme)) +
scale_color_discrete(name= "[Enzyme] µM")
Does anyone know why this is happening? Thanks.
From what I can tell, you are calling scale_color_discrete because you are trying to rename the legend. If that is indeed what you are trying to do with that line, you are taking the wrong approach. You map both the color and the shape of the points to Enzyme, and scale_color_discrete only applies to the color. ggplot2 merges the color and shape legends only when their titles (and labels) match, so renaming just the color scale splits them into two legends. To change the legend title, you can do what teunbrand suggested so that ggplot knows you want the same title for the color and shape, which puts the two legends back together. Or you can replace scale_color_discrete(name = "[Enzyme] µM") with labs(color = "[Enzyme] µM", shape = "[Enzyme] µM"). My intuition tells me there should be a simpler way of doing this, but I am unable to figure it out at this point in time.
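A minimal sketch of the labs() fix. The scoping data frame and its values are made up to stand in for Data/scoping.csv, which isn't available; only the column names come from the question. Giving both scales the same title lets ggplot2 draw a single combined legend.

```r
library(ggplot2)

# Hypothetical stand-in for Data/scoping.csv (values are made up)
scoping <- data.frame(
  Time = rep(c(0, 4, 8, 12), times = 2),
  PNP = c(0, 20, 41, 60, 0, 35, 72, 110),
  Enzyme = factor(rep(c("5", "10"), each = 4))
)

p <- ggplot(scoping, aes(x = Time, y = PNP, shape = Enzyme, color = Enzyme)) +
  geom_point(size = 2) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, fullrange = TRUE) +
  # Same title for the color and shape scales, so ggplot2 merges the legends
  labs(color = "[Enzyme] µM", shape = "[Enzyme] µM") +
  theme_classic()
```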
I have used the following code to generate a plot with ggplot:
I want the legend to show runs 1-8 and only the volumes 12.5 and 25. Why doesn't it?
Also, is it possible to show all the points in the plot even though some of them overlap? Right now the plot shows only 4 of the 8 points because of the overlap.
OP, you've already been given part of your answer. Here's a solution that takes your additional comment into account, with some explanation.
For reference, you were looking to:
Change a continuous variable to a discrete/discontinuous one and have that reflected in the legend.
Show runs 1-8 labeled in the legend
Disconnect lines based on some criteria in your dataset.
First, here is your data again in reproducible form (with the extra characters removed, so you can run all of the code directly):
library(ggplot2)
mydata <- data.frame(
`Run`=c(1:8),
"Time"=c(834, 834, 584, 584, 1184, 1184, 938, 938),
`Area`=c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842),
`Volume`=c(12.5, 12.5, 12.5, 12.5, 25.0, 25.0, 25.0, 25.0)
)
Changing to a Discrete Variable
If you check the type of each column (run str(mydata)), you'll see that mydata$Run is an int and the rest of the columns are num. Numeric columns are treated as continuous variables: ggplot2 assumes values can exist between the ones you have, so any legend for them has to be able to represent that whole range. That is why you get a continuous color scale instead of a discrete one.
To force ggplot2 to give you a discrete scale, you must make your data discrete by turning it into a factor. You can either convert the variable before plotting (e.g. mydata$Run <- as.factor(mydata$Run)), or do it inline, writing aes(size = factor(Run), ...) instead of just aes(size = Run, ...).
Wrapping a column in factor() or as.factor() inline in your ggplot calls changes the legend title to that expression (e.g. "as.factor(Volume)"), so you also have to set the title in the labs() call. In the end, converting Volume inline, the plot code looks like this:
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line() +
labs(
x = "Area", y = "Time",
# This has to be changed now
color='Volume'
) +
theme_bw()
Note that in the above code I am also not referring to mydata$Run, but just Run. When using ggplot2, it is much better to refer to just the column name. Both work here, but mydata$Run bypasses the data you passed to ggplot(), which breaks things such as faceting, where ggplot2 needs to split the data itself.
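If you'd rather convert before plotting (the first option above), the column keeps its own name, so the legend title is already "Volume" and no labs() override is needed. A sketch using the same reproducible mydata:

```r
library(ggplot2)

mydata <- data.frame(
  Run = 1:8,
  Time = c(834, 834, 584, 584, 1184, 1184, 938, 938),
  Area = c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842),
  Volume = c(12.5, 12.5, 12.5, 12.5, 25.0, 25.0, 25.0, 25.0)
)

# Convert once, up front; the column (and so the legend title) keeps its name
mydata$Volume <- factor(mydata$Volume)
str(mydata)  # Volume now shows up as a factor with 2 levels

p <- ggplot(mydata, aes(x = Area, y = Time)) +
  geom_point(aes(color = Volume, size = Run)) +
  geom_line(aes(group = Volume)) +
  theme_bw()
```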
Disconnect Lines
Your lines are connected across all of the data because geom_line() is given no information other than the x= and y= aesthetics. To get separate lines, just as with separate colors or shapes of points, you need to supply an aesthetic to split on. The two lines differ by Volume, so that is the variable to use... but you want to keep the same color for both. For this, we use the group= aesthetic: it tells ggplot2 to draw a separate line for each group of data defined by that aesthetic.
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", color='Volume'
) +
theme_bw()
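If you want to confirm how group= splits the data, layer_data() returns the data ggplot2 computed for a layer. A quick check on the reproducible mydata, where layer 2 is geom_line():

```r
library(ggplot2)

mydata <- data.frame(
  Run = 1:8,
  Time = c(834, 834, 584, 584, 1184, 1184, 938, 938),
  Area = c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842),
  Volume = c(12.5, 12.5, 12.5, 12.5, 25.0, 25.0, 25.0, 25.0)
)

p <- ggplot(mydata, aes(x = Area, y = Time)) +
  geom_point(aes(color = as.factor(Volume), size = Run)) +
  geom_line(aes(group = as.factor(Volume)))

# Layer 2 is geom_line(); its computed `group` column shows how rows were split
line_data <- layer_data(p, 2)
table(line_data$group)  # two groups of 4 rows each
```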
Show Runs 1-8 Labeled in Legend
Here I'm reading a bit into what exactly you wanted to do in terms of "showing runs 1-8" in the legend. This could mean one of two things, so I'll assume you want both and show you how to do each.
Listing and showing sizes 1-8 in the legend.
To set the values shown in the scale (legend) for size, use the scale_ function for that aesthetic. Recall that since mydata$Run is an int, it is treated as continuous. A size legend can't draw a continuous range, so it shows a few representative point sizes instead. This means we don't need to change Run to a factor; we only need to tell ggplot2 to show every break in the sequence from 1 to 8 in the legend. You can do this with scale_size_continuous(breaks=...).
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", color='Volume'
) +
scale_size_continuous(breaks=c(1:8)) +
theme_bw()
Showing all of your runs as points.
The note about showing all runs might also mean you want to literally see each run represented as a separate point in your plot. For this... well, they already are! ggplot2 is plotting every point in your data. Since some points share the same x= and y= values, you are getting overplotting: the points are drawn on top of one another.
If you want to see each point individually, one option is to use geom_jitter() instead of geom_point(). It's not great here, because it makes your data look as if it had different x and y values, but it is an option if this is what you want. In the code below I'm also changing the point shape to a circle with a separate outline and fill (shape=21) for clarity: color= sets the outline of each point (here black), and the fill= aesthetic is used for Volume instead. You should get the idea though.
set.seed(1234) # using the same randomization seed ensures you have the same jitter
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_jitter(aes(fill =as.factor(Volume), size = Run), shape=21, color='black') +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", fill='Volume'
) +
scale_size_continuous(breaks=c(1:8)) +
theme_bw()
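Another option that doesn't move any points: label each plotted position with every run it represents. A minimal sketch using the reproducible mydata; run_labels is a hypothetical helper name. aggregate() collapses runs that share the same Area and Time into one string such as "1, 2".

```r
library(ggplot2)

mydata <- data.frame(
  Run = 1:8,
  Time = c(834, 834, 584, 584, 1184, 1184, 938, 938),
  Area = c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842),
  Volume = c(12.5, 12.5, 12.5, 12.5, 25.0, 25.0, 25.0, 25.0)
)

# One row per distinct (Area, Time); Run becomes text like "1, 2"
run_labels <- aggregate(Run ~ Area + Time, data = mydata,
                        FUN = function(r) paste(r, collapse = ", "))

p <- ggplot(mydata, aes(x = Area, y = Time)) +
  geom_point(aes(color = as.factor(Volume)), size = 3) +
  geom_line(aes(group = as.factor(Volume))) +
  # Text layer uses the collapsed data; x and y are inherited from ggplot()
  geom_text(data = run_labels, aes(label = paste("Runs", Run)), vjust = -1) +
  labs(color = "Volume") +
  theme_bw()
```

The points stay at their true coordinates, and the labels make the overlap explicit.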
I am new to R, so I'm having some problems plotting with ggplot and need help.
In the diagram above, if any of my bars have high values (in this case, a green one with a value of 447), the plot overlaps the plot title. The values are normalised/scaled so that the y-axis values are always between 0 and 100, although the label may show a different number (the label is the actual count of occurrences, whereas the scaling is based on percentages).
How can I prevent the plot from overlapping the plot title in all cases where the bar heights are very close to 100?
Here is the ggplot code I am using:
my_plot<-ggplot(data_frame,
aes(x=as.factor(X_VAR),y=GROUP_VALUE,fill=GROUP_VAR)) +
geom_bar(stat="identity",position="dodge") +
geom_text(aes(label = BAR_COUNT), vjust = -1, position=position_dodge(width=0.9), size = 4) +  # vjust is a setting, not a mapping; 0.9 matches the bars' dodge width
theme(axis.text.y=element_blank(),axis.text.x=element_text(size=12),legend.position = "right",legend.title=element_blank()) + ylab("Y-axis label") +
scale_fill_discrete(breaks=c("GRP_PERCENTAGE", "NORMALIZED_COUNT"),
labels=c("Percentage", "Count of Jobs")) +
ggtitle("Distribution based on Text Analysis 2nd Level Sub-Category") +
theme(plot.title = element_text(lineheight=1, face="bold"))
Here is the ggsave command, with its dpi, height and width values, in case that is causing the problem:
ggsave(filename = paste0(variable_name, "_my_plot.png"), plot = my_plot, dpi = 72, height = 6.75, width = 9)
Can anyone please suggest what needs to be done to get this right?
Many Thanks
As Axeman suggests, ylim() is useful here. Have a look at the documentation:
http://docs.ggplot2.org/0.9.3/xylim.html
In your code:
my_plot + ylim(0,110)
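If you'd rather not hard-code an upper limit, a sketch of an alternative is to pad only the top of the y scale with expansion() (available in ggplot2 3.3.0 and later). Unlike ylim(), this doesn't set limits, so no data can be dropped as out of bounds. The data frame below is hypothetical and only mimics the column names from the question; the 15% padding is an arbitrary choice to tune for your label size.

```r
library(ggplot2)

# Hypothetical data using the question's column names (values are made up)
df <- data.frame(
  X_VAR = rep(c("A", "B"), each = 2),
  GROUP_VAR = rep(c("GRP_PERCENTAGE", "NORMALIZED_COUNT"), times = 2),
  GROUP_VALUE = c(40, 98, 60, 100),
  BAR_COUNT = c(40, 447, 60, 512)
)

p <- ggplot(df, aes(x = X_VAR, y = GROUP_VALUE, fill = GROUP_VAR)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = BAR_COUNT),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  # No padding below 0, 15% extra room above the tallest bar for the labels
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  ggtitle("Distribution based on Text Analysis 2nd Level Sub-Category")
```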
Also, I find this introduction to axes quite useful:
http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/
Good luck!
I have a scatterplot that colors the points by category. I want one of these categories to have a line connecting its points, to highlight that category's data. I'm having trouble figuring this out...
floyd <- read.csv("http://goo.gl/3c3vBU") # Data (the plot code below refers to it as floyd)
qplot(factor(Round), Opp.Landed, data=floyd, color=Opponent, size=Opp.Percent.Landed, alpha = I(0.7)) +
labs(x="Round", y="Punches Landed", title="Opponent Punches Landed / Percentage", colour="Boxer", size="Connect Percentage") +
scale_linetype_manual(values=1:2, labels=c("Boxer", "Connect Percentage")) +
guides(colour = guide_legend(override.aes = list(size=5)))
The ftheme code is just colors and formatting. Any ideas? I've tried adding geom_line(aes(linetype=floyd[Opponent="Manny Pacquiao"]), size=1) but it errors out with
Error in `[.data.frame`(floyd, Opponent = "Manny Pacquiao") : unused argument (Opponent = "Manny Pacquiao")
EDIT: I've updated the code above to exclude ftheme so it's reproducible. Please see the sample dataset with three categories. I just want any one of these to have connected points: http://goo.gl/3c3vBU
I can't give a tailored answer without being able to run your code on a sample of your data, but you can use scale_color_manual to set the colour of the category you want to highlight to, say, "red" and set all the others to NA. For example, if the category you want to highlight is the second category and you have a total of five categories, then add this to your plot code:
scale_colour_manual(values=c(NA, "red", rep(NA,3)))
If you have points that are tied to the color aesthetic as well, then you'll need to change the points to a fill aesthetic (e.g., fill=Opponent) and use a filled point marker that you can set manually using shape or pch. Otherwise, your point markers will disappear along with the lines. Marker numbers 21 through 25 are filled (see ?pch for more on point markers).
UPDATE: Here's my attempt using the data you provided. I'm not exactly sure how you want the legends and other details to look, so let me know if this works. I've switched to ggplot, as I don't know the ins and outs of qplot.
ggplot(floyd, aes(factor(Round), Opp.Landed, color=Opponent,
fill=Opponent, group=Opponent, size=Opp.Percent.Landed)) +
geom_point(pch=21, colour=NA, alpha=0.7) +  # alpha and pch belong in the layer, not in ggplot()
geom_line() +
labs(x="Round", y="Punches Landed", title="Opponent Punches Landed / Percentage",
colour="Boxer", size="Connect Percentage") +
scale_linetype_manual(values=1:2, labels=c("Boxer", "Connect Percentage")) +
scale_colour_manual(values=c(hcl(15,100,65), NA, NA), guide="none") +
guides(fill = guide_legend(override.aes = list(size=5)))
Try to add:
geom_line(data=subset(floyd,Opponent=="Manny Pacquiao"), aes(factor(Round), Opp.Landed, group=Opponent), size = 2)
This short snippet takes a subset of your data (a single opponent) and draws a line of size 2 through that opponent's points.
(for the image I used the opponent Miguel Cotto since you did not provide Manny Pacquiao in the data set)
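For comparison, here is a sketch of the same subset idea written with ggplot() instead of qplot() (qplot() is deprecated as of ggplot2 3.4.0). The floyd data frame below is a made-up stand-in with the question's column names, and "Opponent B" and "Opponent C" are hypothetical. The group aesthetic is needed because factor(Round) makes x discrete, so ggplot2 otherwise treats every point as its own group and has nothing to connect.

```r
library(ggplot2)

# Hypothetical stand-in for the floyd data (values and two names are made up)
floyd <- data.frame(
  Round = rep(1:4, times = 3),
  Opp.Landed = c(5, 8, 6, 9, 4, 7, 10, 6, 3, 5, 4, 8),
  Opp.Percent.Landed = c(20, 25, 22, 30, 18, 24, 33, 21, 15, 19, 17, 28),
  Opponent = rep(c("Miguel Cotto", "Opponent B", "Opponent C"), each = 4)
)

highlight <- "Miguel Cotto"  # the one category to connect with a line

p <- ggplot(floyd, aes(x = factor(Round), y = Opp.Landed, color = Opponent)) +
  geom_point(aes(size = Opp.Percent.Landed), alpha = 0.7) +
  # Line only through the highlighted opponent's rows
  geom_line(data = subset(floyd, Opponent == highlight), aes(group = Opponent)) +
  labs(x = "Round", y = "Punches Landed",
       colour = "Boxer", size = "Connect Percentage")
```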
I want to put percentage labels on my stacked bar plot, but only for the 3 largest percentages in each bar. I went through a lot of helpful posts on SO (for example: 1, 2, 3), and here is what I've accomplished so far:
library(ggplot2)
groups<-factor(rep(c("1","2","3","4","5","6","Missing"),4))
site<-c(rep("Site1",7),rep("Site2",7),rep("Site3",7),rep("Site4",7))
counts<-c(7554,6982, 6296,16152,6416,2301,0,
20704,10385,22041,27596,4648, 1325,0,
17200, 11950,11836,12303, 2817,911,1,
2580,2620,2828,2839,507,152,2)
tapply(counts,site,sum)
tot<-c(rep(45701,7),rep(86699,7), rep(57018,7), rep(11528,7))
prop<-sprintf("%.1f%%", counts/tot*100)
data<-data.frame(groups,site,counts,prop)
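# geom_col() stacks the supplied heights (current ggplot2 errors on geom_bar() with a y aesthetic)
ggplot(data, aes(x=site, y=counts,fill=groups)) + geom_col()+
geom_text(aes(label = prop), position = position_stack(), vjust = 1) +
scale_y_continuous(labels = scales::percent)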
I wanted to insert my output image here but don't seem to have enough reputation...But the code above should be able to produce the plot.
So how can I label only the 3 largest percentages on each bar? Also, is it possible to change the order of the categories in the legend, for example to put "Missing" first? This is not a big issue here, but in my real data set the order of the categories in the legend really bothers me.
I'm new on this site, so if there's anything that's not clear about my question, please let me know and I will fix it. I appreciate any answer/comments! Thank you!
I did this in a sort of hacky manner. It isn't that elegant.
Anyway, I used the plyr package, since the split-apply-combine strategy seemed to be the way to go here.
I recreated your data frame with a variable perc that represents the percentage for each site. Then, for each site, I just kept the 3 largest values for prop and replaced the rest with "".
# I added some variables, and added stringsAsFactors=FALSE
data <- data.frame(groups, site, counts, tot, perc=counts/tot,
prop, stringsAsFactors=FALSE)
# Load plyr
library(plyr)
# Split on the site variable, and keep all the other variables (is there an
# option to keep all variables in the final result?)
data2 <- ddply(data, ~site, summarize,
groups=groups,
counts=counts,
perc=perc,
prop=ifelse(perc %in% sort(perc, decreasing=TRUE)[1:3], prop, ""))
# I changed some of the plotting parameters
ggplot(data2, aes(x=site, y=perc, fill=groups)) + geom_col()+
geom_text(aes(label = prop), position = position_stack(), vjust = 1) +
scale_y_continuous(labels = scales::percent)
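For reference, a sketch of the same top-3 idea without plyr (plyr is retired), using base R's ave() to compute each site's total and a within-site rank. site_rank is a hypothetical helper name, and ties go to the earlier row. It also centres the labels in each segment with position_stack(vjust = 0.5), which is a change from the vjust = 1 used above.

```r
library(ggplot2)
library(scales)  # percent()

# Data as built in the question
groups <- factor(rep(c("1", "2", "3", "4", "5", "6", "Missing"), 4))
site <- rep(c("Site1", "Site2", "Site3", "Site4"), each = 7)
counts <- c(7554, 6982, 6296, 16152, 6416, 2301, 0,
            20704, 10385, 22041, 27596, 4648, 1325, 0,
            17200, 11950, 11836, 12303, 2817, 911, 1,
            2580, 2620, 2828, 2839, 507, 152, 2)
data <- data.frame(groups, site, counts)

# Share of each site's total (computed rather than hard-coding tot)
data$perc <- data$counts / ave(data$counts, data$site, FUN = sum)

# Rank within each site, largest count first
site_rank <- ave(-data$counts, data$site,
                 FUN = function(x) rank(x, ties.method = "first"))
data$label <- ifelse(site_rank <= 3, sprintf("%.1f%%", 100 * data$perc), "")

p <- ggplot(data, aes(x = site, y = perc, fill = groups)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = percent)
```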
EDIT: Looks like your scales are wrong in your original plotting code. It gave me results with 7500000% on the y axis, which seemed a little off to me...
EDIT: I fixed up the code.
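The answer above doesn't cover the second part of the question (legend order). A minimal sketch: the legend, and the stacking order, follow the factor levels of groups, so moving "Missing" to the front of the levels with base R's relevel() puts it first.

```r
library(ggplot2)

# Same groups/site/counts as in the question
groups <- factor(rep(c("1", "2", "3", "4", "5", "6", "Missing"), 4))
site <- rep(c("Site1", "Site2", "Site3", "Site4"), each = 7)
counts <- c(7554, 6982, 6296, 16152, 6416, 2301, 0,
            20704, 10385, 22041, 27596, 4648, 1325, 0,
            17200, 11950, 11836, 12303, 2817, 911, 1,
            2580, 2620, 2828, 2839, 507, 152, 2)

# relevel() moves "Missing" to the front and keeps the other levels in order
groups <- relevel(groups, ref = "Missing")
levels(groups)  # "Missing" "1" "2" "3" "4" "5" "6"

data <- data.frame(groups, site, counts)
p <- ggplot(data, aes(x = site, y = counts, fill = groups)) +
  geom_col()
```

For an arbitrary order, factor(groups, levels = c(...)) with the full list of levels does the same job.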