ggplot - gaps in stacked bar plot [duplicate] - r

I'm using ggplot and I get those weird horizontal lines out of geom_bar. I cannot provide a minimal working example: the same code works with few observations and it relies on data I am importing and transforming. However, I can show the relevant line of codes and cross my fingers someone ran into this issue:
ggplot(data) + geom_bar(aes(x=Horizon, y=Importance, fill=Groups),
position='fill', stat='identity') +
theme_timeseries2() +
scale_fill_manual(values=c('#1B9E77', 'orange2', 'black',
'red2', 'blue4')) +
xlab('') + ylab('')
My personal function, theme_timeseries2() isn't the source of the problem: it happens even if I stop after geom_bar. I checked for missing values in Importance and every other column of my data frame and there are none.
It's also very odd: the white lines aren't the same on the zoomed page as in the plot window of RStudio. They do print in .png format when I save the file, so there really is something going on with those horizontal bars. Any theory about why geom_bar() does this would be highly appreciated.

You can fix it by adding the fill as color. Like this:
geom_bar(aes(x=Horizon, y=Importance, fill=Groups, color=Groups),
position='fill', stat='identity')
This was suggested here.

I'm guessing the lines are due to a plotting bug between observations that go into each bar. (That could be related to the OS, the graphics device, and/or how ggplot2 interacts with them...)
I expect it'd go away if you summarized before ggplot2, e.g.:
library(dplyr);
data %>%
count(Horizon, Groups, wt = Importance, name = "Importance") %>%
ggplot() +
geom_col(aes(x = Horizon, y= Importance, fill = Groups), position = "fill") + ....

Mine went away when changing the size of the graphs in rmarkdown.

Related

Why does changing the label mess up my plot?

I have recently been playing around with various plot types using fictitious data to get my head around how I could display various pieces of information. One plot type that is gaining popularity is the so called individual differences dot plot which shows the change in each subjects score pre-post. The plot is fairly easy to produce, but my issue is that when I go to change the labels using either the labs or xlab ylab functions in ggplot, the plot itself becomes messed up. Below I have attached the fictitious data, the code used and the results.
Data
df<- data.frame(Participant<- c(rep(1:10,2)), Score<- c(rnorm(20,100,5)), Session<- c(1,1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2,2))
colnames(df) <- c("Participant", "Score", "Session")
Code for plot
p<- ggplot(df, aes(x=df$Session, y=df$Score, colour=df$Participant))+ geom_point()+
geom_line(group=df$Participant)+
theme_classic()
Plot
Individual difference plot
My dilemma is that anytime I try to change the label names, the plot messes up as per below.
Problem
p + xlab("Session") + ylab("Score")
Plot after relabelling
The same thing happens if I try the labs function i.e, p + labs(x= "Session", y= "Score"). You can see that the labels themselves do actually change, but for some reason this messes up the actual plot. Does any have any ideas as to what could be going wrong here?
The issue appears to be the grouping is undone when the label functions are called. Instead, issue the grouping as an aesthetic mapping:
library(dplyr); library(ggplot)
df %>% mutate(across(c(Session,Participant),factor)) -> df
p <- ggplot(df, aes(x=Session, y=Score, colour=Participant))+ geom_point()+
geom_line(aes(group=Participant))+
theme_classic()
p + xlab("Session") + ylab("Score")
I suspect this is probably a bug.

Unable to fix the x-axis labels in ggplot2 facet_wrap geom_histogram plot

Maybe I have missed this in the stackexchange literature, as I'm surprised to find many solutions for adding floating axis labels and adjustments to axes (e.g. add "floating" axis labels in facet_wrap plot), but nothing to solve my issue of overlapping x-axis labels with a facet_wrap and scales = "free". This one is close, but for an older ggplot version: overlapping y-scales in facet (scale="free"). This one may be the answer, to write a function to do such an operation, but I couldn't make it work for me: Is there a way of manipulating ggplot scale breaks and labels?
Here is a reproducible example:
v<-runif(1000,min=1000000,max=10000000)
x<-runif(100,min=0.1,max=1)
y<-runif(100000,min=0,max=20000)
z<-runif(100000,min=10000,max=2000000)
df<-data.frame(v,x,y,z)
colnames(df)<-c("Order V","Order X","Order Y","Order z")
library(reshape2)
melt.df<-melt(df[,c(1:4)])
library(ggplot2)
ggplot(melt.df, aes(x = value)) +
geom_histogram(bins=50,na.rm=TRUE) +
facet_wrap(~variable, scales="free") +
theme_bw()
The resulting figure is this:
I have a similar setup, which produces this:
Any help on a cmd to do this, or a function that could set up these x-axis label breaks would be awesome!
I figured it out - at least a hacked answer - and will post my solution in case others can use it.
Basically I adjusted the font size of the axis text, and used the scales pkg to keep the notation consistent (i.e. get rid of scientific). My altered code is:
ggplot(melt.df, aes(x = value)) +
geom_histogram(bins=50,na.rm=TRUE) +
facet_wrap(~variable, scales="free") +
theme_bw()+
theme(axis.text=element_text(size=10),text=element_text(size=16))+
scale_x_continuous(labels = scales::comma)
If you don't want the commas in the labels, you can set something like
options(scipen = 10)
before plotting. This makes the threshold for using scientific notation higher so ordinary labels will be used in this case.

ggplot: plot title and plot overlap each other

I am a newbie to R and hence having some problems in plotting using ggplot and hence need help.
In the above diagram, if any of my bars have high values (in this case, a green one with value of 447), the plot and the plot title gets overlapped. The values here are normalised / scaled such that the y-axis values are always between 0-100, though the label might indicate a different number (this is the actual count of occurrences, where as the scaling is done based on percentages).
I would like to know how to avoid the overlap of the plot with the plot title, in all cases, where the bar heights are very close to 100.
The ggplot function I am using is as below.
my_plot<-ggplot(data_frame,
aes(x=as.factor(X_VAR),y=GROUP_VALUE,fill=GROUP_VAR)) +
geom_bar(stat="identity",position="dodge") +
geom_text(aes(label = BAR_COUNT, y=GROUP_VALUE, ymax=GROUP_VALUE, vjust = -1), position=position_dodge(width=1), size = 4) +
theme(axis.text.y=element_blank(),axis.text.x=element_text(size=12),legend.position = "right",legend.title=element_blank()) + ylab("Y-axis label") +
scale_fill_discrete(breaks=c("GRP_PERCENTAGE", "NORMALIZED_COUNT"),
labels=c("Percentage", "Count of Jobs")) +
ggtitle("Distribution based on Text Analysis 2nd Level Sub-Category") +
theme(plot.title = element_text(lineheight=1, face="bold"))
Here is the ggsave command, in case if that is creating the problem, with dpi, height and width values.
ggsave(my_plot,file=paste(paste(variable_name,"my_plot",sep="_"),".png",sep = ""),dpi=72, height=6.75,width=9)
Can anyone please suggest what need to be done to get this right?
Many Thanks
As Axeman suggests ylim is useful Have a look at the documentation here:
http://docs.ggplot2.org/0.9.3/xylim.html
In your code:
my_plot + ylim(0,110)
Also, I find this intro to axis quite useful:
http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/
Good luck!

ggplot2 stacked barplots, formatting, and grids

In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are couple things I need to accomplish. And I don't know where to start.
I would like the bars not to be placed at its corresponding nseqs value, rather placed next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried to set scales and size to for facet_grid to free_x, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))

ggplot2: plotting two size aesthetics

From what I can find on stackoverflow, (such as this answer to using two scale colour gradients on one ggplot) this may not (yet) be possible with ggplot2.
I want to create a bubbleplot with two size aesthetics, one always larger than the other. The idea is to show the proportion as well as the absolute values. Now I could colour the points by the proportion but I prefer multi-bubbles. In Excel this is relatively simple. (http://i.stack.imgur.com/v5LsF.png) Is there a way to replicate this in ggplot2 (or base)?
Here's an option. Mapping size in two geom_point layers should work. It's a bit of a pain getting the sizes right for bubblecharts in ggplot though.
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point(aes(size = disp), shape = 1) +
geom_point(aes(size = hp/(2*disp))) + scale_size_continuous(range = c(15,30))
To get it looking most like your exapmle, add theme_bw():
P <- p + theme_bw()
The scale_size_continuous() is where you have to just fiddle around till you're happy - at least in my experience. If someone has a better idea there I'd love to hear it.

Resources