I've built a function that utilizes ggplot2 to create a bar chart for a given style of summary table, but there are a few changes I'd like to make that I haven't quite figured out. Here's what the function looks like:
bar_chart_dist <- function(df, x_var, y_var, title) {
title_in_fun <- title
p <- ggplot(df, aes_string(x = df[,x_var], y = df[,y_var])) +
geom_bar(stat = "identity", fill="#005a8c") +
geom_text(aes_string(label= y_var, vjust = -0.2)) +
xlab("") + ylab(y_var) + my_theme() +
ggtitle(title_in_fun) + scale_y_continuous(limits=c(0,100))
return(p)
}
The my_theme function edits the font family to Open Sans and changes the color to grey, among other things.
The data frames I am using with this function are each three variables long -- topic (this name changes with each given dataframe), n (number of observations) and percent_of_population (pre-calculated percent of total population in a given group). I'm using topic as my x_var and percent_of_population as my y_var.
There are a few things here I haven't gotten to work:
1) I'd like the y-axis to be labeled with a percent sign (%) and to span 0% to 100%. I've tried to edit the scale_y_continuous argument to be:
scale_y_continuous(labels=percent, limits=c(0,100))
but that changes the scale such that my upper boundary is 10,000%.
2) I'd like to change the font color, size, and family in the geom_text argument, as well as add a % sign to the label. The family I'd like to use is Open Sans, but it doesn't seem to recognize that. When I set size = 4, a legend is created, which does not seem to happen in the examples I've looked at.
Any help you guys can provide is very much appreciated. I'm not sure what's not working because this is wrapped in a function, and what's not working because it's the wrong approach. Here's what the plot looks like in current state:
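On point 1, the 10,000% upper boundary is consistent with how scales::percent works: it multiplies values by 100, and percent_of_population is already on a 0 to 100 scale. On point 2, a legend usually appears when size is placed inside aes(); constants such as size and colour belong outside it. What follows is a minimal sketch under those assumptions, not a definitive fix. pct_label(), bar_chart_pct() and example_df are hypothetical names, the .data pronoun stands in for aes_string(), my_theme() is omitted, and the font family is left out because whether Open Sans is found depends on the graphics device.

```r
library(ggplot2)

# Hypothetical helper: append "%" to values that are already on a 0-100 scale.
pct_label <- function(x) paste0(x, "%")

# Hypothetical variant of bar_chart_dist(); x_var and y_var are column names given as strings.
bar_chart_pct <- function(df, x_var, y_var, title) {
  ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_col(fill = "#005a8c") +
    # size, colour and vjust are constants, so they sit outside aes() and create no legend
    geom_text(aes(label = pct_label(.data[[y_var]])),
              vjust = -0.2, size = 4, colour = "grey40") +
    # label the axis ticks with "%" without rescaling the data
    scale_y_continuous(labels = pct_label, limits = c(0, 100)) +
    labs(x = NULL, y = y_var, title = title)
}

# Made-up example data in the shape described in the question
example_df <- data.frame(topic = c("a", "b", "c"),
                         n = c(10, 30, 60),
                         percent_of_population = c(10, 30, 60))
p <- bar_chart_pct(example_df, "topic", "percent_of_population", "Example")
```

Printing p draws the chart. The same pct_label() function is used for both the axis ticks and the bar labels, so the two stay consistent.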
I'm using ggplot and I get weird horizontal lines out of geom_bar. I cannot provide a minimal working example: the same code works with few observations, and it relies on data I am importing and transforming. However, I can show the relevant lines of code and cross my fingers that someone has run into this issue:
ggplot(data) + geom_bar(aes(x=Horizon, y=Importance, fill=Groups),
position='fill', stat='identity') +
theme_timeseries2() +
scale_fill_manual(values=c('#1B9E77', 'orange2', 'black',
'red2', 'blue4')) +
xlab('') + ylab('')
My personal function, theme_timeseries2(), isn't the source of the problem: it happens even if I stop after geom_bar. I checked for missing values in Importance and every other column of my data frame, and there are none.
It's also very odd: the white lines aren't the same on the zoomed page as in the plot window of RStudio. They do print in .png format when I save the file, so there really is something going on with those horizontal bars. Any theory about why geom_bar() does this would be highly appreciated.
You can fix it by adding the fill as color. Like this:
geom_bar(aes(x=Horizon, y=Importance, fill=Groups, color=Groups),
position='fill', stat='identity')
This was suggested here.
I'm guessing the lines are due to a plotting bug between observations that go into each bar. (That could be related to the OS, the graphics device, and/or how ggplot2 interacts with them...)
I expect it'd go away if you summarized before ggplot2, e.g.:
library(dplyr)
data %>%
count(Horizon, Groups, wt = Importance, name = "Importance") %>%
ggplot() +
geom_col(aes(x = Horizon, y= Importance, fill = Groups), position = "fill") + ....
Mine went away when changing the size of the graphs in rmarkdown.
I am trying to make one figure with two categories of data, which looks like:
A comparison between two groups (indicated by pink and black) concerning various different species
It seems the author of this figure put two boxplots into one figure. I constructed a similar boxplot in R with the code below:
{library(reshape2)
species_melt <- melt(species, "Species")
library(ggplot2)
p<-ggplot(species_melt, aes(Species, value),color="Red") + geom_boxplot()
windowsFonts(myFont1=windowsFont("Arial"),myFont2=windowsFont("Times New Roman"))
p+scale_y_log10()}
This generates a boxplot like the one below (shown in part):
So I wonder how I could add another layer of boxplots on top of it; it seems difficult in R.
It's hard to test without having your data, but something like this should work:
library(ggplot2)
ggplot() +
geom_boxplot(data = species_melt_1,
aes(Species, value),
fill = "#ff84b3", color = "#994f6b") +
geom_boxplot(data = species_melt_2,
aes(Species, value),
alpha = 0, color = "black")
I'm using two geom_boxplot layers with different datasets (species_melt_1 and species_melt_2). The first one has a pink fill and the second one has a transparent fill with a black outline.
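If the two groups should sit side by side for each species rather than on top of each other, one option is to stack both datasets into a single data frame and map the group to colour. This is only a sketch: the data below are made up, and the column name group and the labels "group 1" and "group 2" are assumptions for illustration.

```r
library(ggplot2)

# Made-up stand-ins for species_melt_1 and species_melt_2
set.seed(1)
species_melt_1 <- data.frame(Species = rep(c("sp_a", "sp_b"), each = 20),
                             value = rlnorm(40))
species_melt_2 <- data.frame(Species = rep(c("sp_a", "sp_b"), each = 20),
                             value = rlnorm(40, meanlog = 1))

# Stack both datasets and record which one each row came from
both <- rbind(cbind(species_melt_1, group = "group 1"),
              cbind(species_melt_2, group = "group 2"))

p <- ggplot(both, aes(Species, value, colour = group)) +
  geom_boxplot() +  # boxes for the two groups are dodged side by side
  scale_colour_manual(values = c("group 1" = "#ff84b3", "group 2" = "black")) +
  scale_y_log10()
```

Compared with two separate layers, this also produces a legend for the two groups automatically.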
I am a newbie to R, I'm having some problems plotting with ggplot, and I need help.
In the above diagram, if any of my bars have high values (in this case, a green one with a value of 447), the plot and the plot title overlap. The values here are normalised / scaled so that the y-axis values are always between 0 and 100, though the label might show a different number (the label is the actual count of occurrences, whereas the scaling is based on percentages).
I would like to know how to avoid the overlap between the plot and the plot title in all cases where the bar heights are very close to 100.
The ggplot function I am using is as below.
my_plot<-ggplot(data_frame,
aes(x=as.factor(X_VAR),y=GROUP_VALUE,fill=GROUP_VAR)) +
geom_bar(stat="identity",position="dodge") +
geom_text(aes(label = BAR_COUNT, y=GROUP_VALUE, ymax=GROUP_VALUE, vjust = -1), position=position_dodge(width=1), size = 4) +
theme(axis.text.y=element_blank(),axis.text.x=element_text(size=12),legend.position = "right",legend.title=element_blank()) + ylab("Y-axis label") +
scale_fill_discrete(breaks=c("GRP_PERCENTAGE", "NORMALIZED_COUNT"),
labels=c("Percentage", "Count of Jobs")) +
ggtitle("Distribution based on Text Analysis 2nd Level Sub-Category") +
theme(plot.title = element_text(lineheight=1, face="bold"))
Here is the ggsave command, with its dpi, height and width values, in case that is causing the problem.
ggsave(my_plot,file=paste(paste(variable_name,"my_plot",sep="_"),".png",sep = ""),dpi=72, height=6.75,width=9)
Can anyone please suggest what needs to be done to get this right?
Many Thanks
As Axeman suggests, ylim is useful. Have a look at the documentation here:
http://docs.ggplot2.org/0.9.3/xylim.html
In your code:
my_plot + ylim(0,110)
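If the tallest bar is not always close to 100, the upper limit can be derived from the data instead of being hard-coded. A minimal sketch, assuming some fractional headroom above the tallest bar is enough to fit the labels; headroom_limits() is a hypothetical helper and the data are made up:

```r
library(ggplot2)

# Hypothetical helper: y limits from 0 to the tallest bar plus a fraction of headroom.
headroom_limits <- function(values, pad = 0.1) {
  c(0, max(values, na.rm = TRUE) * (1 + pad))
}

# Made-up data standing in for data_frame
bars <- data.frame(X_VAR = c("a", "b", "c"),
                   GROUP_VALUE = c(40, 75, 100),
                   BAR_COUNT = c(120, 310, 447))

p <- ggplot(bars, aes(x = X_VAR, y = GROUP_VALUE)) +
  geom_col() +
  geom_text(aes(label = BAR_COUNT), vjust = -1, size = 4) +
  # leave room above the tallest bar for its label
  scale_y_continuous(limits = headroom_limits(bars$GROUP_VALUE))
```

With pad = 0.1 and a tallest bar of 100, this gives the same limits as ylim(0, 110).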
Also, I find this intro to axes quite useful:
http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/
Good luck!
I would like to use customized linetypes in ggplot. If that is impossible (which I believe to be true), then I am looking for a smart hack to plot arrowlike symbols above, or below, my line.
Some background:
I want to plot some water quality data and compare it to the standard (set by the European Water Framework Directive) in a red line. Here's some reproducible data and my plot:
df <- data.frame(datum <- seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y=rnorm(53,mean=100,sd=40))
(plot1 <-
ggplot(df, aes(x=datum,y=y)) +
geom_line() +
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
However, in this plot it is completely unclear whether the standard is a maximum value (as it would be for chloride, for example) or a minimum value (as it would be for oxygen). So I would like to make this clear by adding small pointers/arrows pointing up or down. The best way would be to customize the linetype so that it consists of these arrows, but I couldn't find a way.
Q1: Is this at all possible, defining custom linetypes?
All I could think of was adding extra points below the line:
extrapoints <- data.frame(datum2 <- seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y2=68)
plot1 + geom_point(data=extrapoints, aes(x=datum2,y=y2),
shape=">",size=5,colour="red",rotate=90)
However, I can't seem to rotate these symbols pointing downward. Furthermore, this requires calculating the right spacing of X and distance to the line (Y) every time, which is rather inconvenient.
Q2: Is there any way to achieve this, preferably as automated as possible?
I'm not sure what is requested, but it sounds as though you want arrows that point up or down based on whether the y-value is greater or less than some expected value. If that's the case, then this does it using geom_segment:
require(grid) # as noted by ?geom_segment
(plot1 <-
ggplot(df, aes(x=datum,y=y)) + geom_line()+
geom_segment(data = data.frame(datum = df$datum, y = 70, up = df$y > 70), # name the date column so aes() finds it in this data
aes(xend = datum , yend =70 + c(-1,1)[1+up]*5), #select up/down based on 'up'
arrow = arrow(length = unit(0.1,"cm"))
) + # adjust units to modify size or arrow-heads
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
If I'm wrong about what was desired and you only wanted a bunch of down arrows, then just take out the code that creates and uses "up" and use a minus sign instead.
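The down-arrows-only variant could look roughly like the sketch below. It assumes the date column is named explicitly when the data frame is built, and it places one arrow per month, which is an arbitrary spacing chosen for illustration; standard, arrows_df and plot_down are hypothetical names.

```r
library(ggplot2)
library(grid)  # arrow() and unit()

set.seed(42)
# Same shape as the question's data, with the date column named explicitly
df <- data.frame(datum = seq.Date(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "week"),
                 y = rnorm(53, mean = 100, sd = 40))

standard <- 70  # height of the red reference line
# One short arrow per month, starting on the line and ending 5 units below it
arrows_df <- data.frame(datum = seq.Date(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "month"),
                        y = standard,
                        yend = standard - 5)

plot_down <- ggplot(df, aes(x = datum, y = y)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = standard, colour = "red") +
  # x and y are inherited from ggplot(); only the segment ends are added here
  geom_segment(data = arrows_df,
               aes(xend = datum, yend = yend),
               arrow = arrow(length = unit(0.1, "cm")), colour = "red") +
  theme_classic()
```

Using yend = standard + 5 instead would turn the arrows upward, for a standard that is a minimum value.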
I've got some problems with ggplot2 again.
I want to plot at least two datasets with two different colors and two different shapes.
This works, but when I try to set the names for the legend, the legend is automatically doubled.
The number of datasets can change, and so can the legend names, of course.
I need code that works for more than just this example:
library(ggplot2)
library(reshape2) # provides melt()
xdata=1:5
ydata=c(3.45,4.67,7.8,8.98,10)
ydata2=c(12.4,13.5,14.6,15.8,16)
p <-data.frame(matrix(NA,nrow=5,ncol=3))
p$X1 <- xdata
p$X2 <- ydata
p$X3 <- ydata2
shps <-c(1,2)
colp <-c("navy","red3")
p <- melt(p,id="X1")
px <-ggplot(p,aes(X1,value))
legendnames <- c("name1","name2")
px <- px +aes(shape = factor(variable))+
geom_point(aes(colour =factor(variable)))+
theme_bw()+
scale_shape_manual(labels=legendnames,values =shps )+
scale_color_manual(values = colp)
px
This gives me this:
But I want this, with my legend names:
For that plot I just deleted labels=legendnames from scale_shape_manual.
So what do I need to change to solve this problem?
Please help.
I think this is just a matter of providing the same labels parameter to the scale_color_manual, otherwise it doesn't know how to consolidate the legends together.
So
px <- px + aes(shape = factor(variable)) +
geom_point(aes(colour = factor(variable))) +
theme_bw()+
scale_shape_manual(labels=legendnames, values = shps)+
scale_color_manual(labels=legendnames, values = colp)
px
It's not really a problem; you programmed it in yourself by using legendnames (which ggplot2 then adds, even though those values are not in your data). If you remove them, the plot behaves as you want:
shps <-c(X2=1,X3=2)
colp <-c(X2="navy",X3="red3")
# don't overwrite variables; that makes the code easy to rerun
p2 <- melt(p,id="X1")
px <- ggplot(data=p2) + geom_point(aes(x=X1, y=value,shape=variable,colour=variable)) +
scale_shape_manual(values=shps)+
scale_color_manual(values=colp)
px
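Since the number of datasets can change, the shapes, colours and labels can also be built from whatever levels are present in the data, and the same breaks and labels passed to both scales so the legends merge. A minimal sketch with made-up long-format data; long and keys are hypothetical names, and the placeholder labels and the rainbow() palette are assumptions for illustration:

```r
library(ggplot2)

# Made-up long-format data, shaped like the melted data frame
long <- data.frame(X1 = rep(1:5, times = 2),
                   variable = rep(c("X2", "X3"), each = 5),
                   value = c(3.45, 4.67, 7.8, 8.98, 10,
                             12.4, 13.5, 14.6, 15.8, 16))

keys <- unique(as.character(long$variable))     # one entry per dataset
legendnames <- paste0("name", seq_along(keys))  # placeholder legend labels
shps <- seq_along(keys)                         # shapes 1, 2, ...
colp <- rainbow(length(keys))                   # one colour per dataset

px <- ggplot(long, aes(X1, value, shape = variable, colour = variable)) +
  geom_point() +
  theme_bw() +
  # identical breaks and labels on both scales let ggplot2 merge the two legends
  scale_shape_manual(breaks = keys, labels = legendnames, values = shps) +
  scale_colour_manual(breaks = keys, labels = legendnames, values = colp)
```

Adding a third dataset to long would extend keys, shps, colp and legendnames without further changes to the plotting code.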