geom_ribbon() just turns the graph blank - r

Hi, I have a data frame weekly.mean.values with the following structure:
week:mean:ci.lower:ci.upper
Where week is a factor; mean, ci.lower and ci.upper are numeric. For each week, there is only one mean, and one ci.lower or ci.upper.
I was trying to plot a shaded area inside of the 95% confidence interval around the mean, with the following code:
ggplot(weekly.mean.values,aes(x=week,y=mean)) +
geom_line() +
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper))
The plot, however, came out blank (that is only with x-axis and y-axis present, but no lines, or points, let alone shaded areas).
If I removed the geom_ribbon part, I did get a line. I know that this should be a very simple task but I don't know why I couldn't get geom_ribbon to plot what I wanted. Any hint would be truly appreciated.

I realize this thread is super old, but Google still finds it.
The answer is that you need to set ymin and ymax from the data you are using on the y-axis. If you set them to scalar values, the ribbon covers the entire plot from top to bottom.
You can use
ymin=0
ymax=mean
to go from 0 to your y-point or even
ymin=mean-1
ymax=mean+1
to have the ribbon cover a strip encompassing your actual data.

I may be missing something, but the ribbon will be plotted filled with grey20 by default. You are plotting this layer on top of the data, so it is no wonder it obscures it. It is also possible that the limits for the plot axes derived from the data provided to the initial ggplot() call will not be sufficient to contain the confidence interval ribbon. In that case, I would not be surprised to see a grey/blank plot.
To see if this is the problem, try altering your geom_ribbon() line to:
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper), alpha = 0.5)
which will plot the ribbon with transparency, which should show the data underneath if the problem is what I think it is.
If so, set the x and y limits to the range of the data +/- the confidence interval you wish to plot and swap the order of the layers (i.e. draw the line on top of the ribbon), and use transparency in the ribbon to show the grid through it.

From ggplot's docs for geom_ribbon (2.1.0):
For each continuous x value, geom_interval displays a y interval. geom_area is a special case of geom_ribbon, where the minimum of the range is fixed to 0.
In this case, x values cannot be factors for geom_ribbon. One solution would be to convert week from a factor to a numeric. e.g.
ggplot(weekly.mean.values,aes(x=as.numeric(week),y=mean)) +
geom_line() +
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper))
geom_line should handle the switch from factor to numeric without incident, although the X axis scale may display differently.
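The suggestions in the answers above (numeric x, ribbon drawn first, transparency) can be combined in one minimal sketch. The data frame below is a hypothetical stand-in, since the original weekly.mean.values was not posted, and the numbers are invented for illustration only.

```r
library(ggplot2)

# Hypothetical stand-in for weekly.mean.values (made-up numbers).
weekly.mean.values <- data.frame(
  week     = factor(1:5),
  mean     = c(10, 12, 11, 14, 13),
  ci.lower = c(8, 10, 9.5, 12, 11),
  ci.upper = c(12, 14, 12.5, 16, 15)
)

# as.numeric() on a factor returns the level codes (1, 2, ...), not the labels.
p <- ggplot(weekly.mean.values, aes(x = as.numeric(week), y = mean)) +
  geom_ribbon(aes(ymin = ci.lower, ymax = ci.upper), alpha = 0.5) +  # ribbon first
  geom_line() +                                                      # line on top
  xlab("week")

# Inspect what the ribbon layer will actually draw.
ribbon_data <- layer_data(p, 1)
print(ribbon_data[, c("x", "ymin", "ymax")])
```

If the factor levels are not in week order, the level codes will not match the week numbers, so it may be safer to convert via as.numeric(as.character(week)) when the labels themselves are numeric.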


How to increase the interval of labels in geom_text?

I am trying to put labels beside some points which are very close to each other on geographic coordinate. Of course, the problem is overlapping labels. I have used the following posts for reference:
geom_text() with overlapping labels
avoid overlapping labels in ggplot2 charts
Relative positioning of geom_text in ggplot2?
The problem is that I do not want to relocate labels but increase the interval of labeling (for example every other 10 points).
I tried adding an alpha column to my data frame to make the unwanted points transparent:
combined_df_c$alpha=rep(c(1,rep(0,times=11)),
times=length(combined_df_c$time)/
length(rep(c(1,rep(0,times=11)))))
I do not know why it does not affect the plot and all labels are plotted again.
The expected output is fewer labels on my plot.
You can do this by subsetting your data frame for the labels passed to geom_text.
I used the built-in dataset mtcars for this, since you did not provide any data. With df[seq(1,nrow(df),6),] I take every 6th row of the data. These are the rows whose labels are shown in the graph afterwards. You can use any step size you want. The sliced data frame is given to geom_text, so it no longer uses the original dataset, just the sliced one. This way the number of labelled points and the number of labels are equal.
df <- mtcars
# Store the row index before slicing so labels keep the y position of their points
df$idx <- seq_len(nrow(df))
labdf <- df[seq(1, nrow(df), 6), ]
ggplot() +
geom_point(data=df, aes(x=drat, y=idx)) +
geom_text(data=labdf,
aes(x=drat, y=idx, label=drat))
The output is as expected: from 32 rows, just 6 get labeled.
You can easily adjust the code for your case.
Also: you can put the aes in ggplot(), which may be more useful if you use more than just geom_point. I wrote it this way to make clear that a different dataset is used in geom_text().

Histograms and Density Plots do not match up

I am creating histograms of substitutions: 1st, 2nd, or 3rd sub over Time. So each histogram shows the number of subs in a given minute given the Sub Number. The histograms make sense to me because for the most part they are smooth (I used a bin width of 1 minute). Nothing looks too out of the ordinary. However, when I overlay a density plot, the tails on the left inflate for one of the graphs and I cannot determine why.
The dataset is of substitutions, ranging from minute 1 to a maximum time. I then cut this dataset in half to only look at subs made after minute 45. I have not folded this data back. I have tried to create a reproducible example, but cannot given the data.
Code used to create in R
## Filter out subs that are not in the second half
df.half<-df[df$PeriodId>=2,]
p<-ggplot(data=df.half, aes(x=time)) +
geom_histogram(aes(y=..density..),position="identity", alpha=0.5,binwidth=1)+
geom_vline(data=sumy.df.half,aes(xintercept=grp.mean),color="blue", linetype="dashed", size=1)+
geom_density(alpha=.2)+
facet_grid(SUB_NUMBER ~ .)+
scale_y_continuous(limits = c(0,0.075),breaks = c(seq(0,0.075,0.025)),
minor_breaks = c(seq(0,0.075,0.025)),name='Count')
p
Why, for the First Sub is the density plot inflated in the tail if there are no values less than 45? Also why isn't the density plot more inflated in the tail for the Second Sub?
Side Note: I did ask this question on crossvalidated, but was told since it involved R, to ask it here instead.
So I was able to change the code and get the following:
ggplot() +
geom_histogram(data=df.half, aes(x=time,y=..density..),position="identity", alpha=0.5,binwidth=1)+
geom_density(data=df.half,aes(x=time,y=..density..))+
geom_vline(data=sumy.df.half,aes(xintercept=grp.mean),color="blue", linetype="dashed", size=1)+
facet_grid(SUB_NUMBER ~ .)
This looks more correct and at least now fits the dataset. However, I am still confused as to why those issues occurred in the first place.
While there is no data sample to reproduce the error, you could try to make sure that the inputs used by geom_density are correct by specifying them explicitly. You can also try to move the line specifying the density (geom_density) to just after geom_histogram. Also, the y-axis label is probably wrong: it is set to 'Count', while the values suggest that it is in fact a density.
How would I specify density explicitly?
You can specify the density parameters explicitly by specifying data, aes and position directly in geom_density function call, so it would use these stated instead of inherited arguments:
ggplot() +
geom_histogram(data=df.half, aes(x=time,y=..density..),position="identity", alpha=0.5,binwidth=1)+
geom_density(data=df.half,aes(x=time,y=..density..))+
geom_vline(data=sumy.df.half,aes(xintercept=grp.mean),color="blue", linetype="dashed", size=1)+
facet_grid(SUB_NUMBER ~ .)
I do not understand how it occurred in the first place
I think in your initial code for geom_density, you explicitly specified just the alpha argument. Thus, for all the other parameters it needed (data, aes, position, etc.) it used the inherited arguments, and apparently it did not inherit them correctly. Probably it tried to use the data argument from the geom_vline call (sumy.df.half), or was confused by the syntax of the "..density.." argument.
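Separately from the inheritance explanation above, one general property worth checking is that a kernel density estimate is smoothed past the ends of the data, so a curve can show mass below minute 45 even when no observation lies there. The sketch below illustrates this with base R's density() on made-up substitution times (the vector time is hypothetical, not the asker's data); whether this accounts for the tails in the original plot is an assumption.

```r
set.seed(1)

# Hypothetical second-half substitution times: nothing below minute 45.
time <- runif(200, min = 45, max = 95)

# Default: the evaluation grid extends `cut` (default 3) bandwidths past the data.
d_default <- density(time)

# cut = 0 stops the grid at the observed minimum and maximum.
d_cut <- density(time, cut = 0)

print(range(time))
print(range(d_default$x))  # starts below the smallest observation
print(range(d_cut$x))      # starts at the smallest observation
```

Truncating the grid only hides the spill-over; it does not renormalise the curve, so the area under the truncated estimate is slightly less than 1.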

Settings y-axis limits when stat = "summary" (binomial data) [duplicate]

I am trying to create a barplot using ggplot2, with the y axis starting at a value greater than zero.
Lets say I have the means and standard errors for hypothetical dataset about carrot length at three different farms:
carrots<-NULL
carrots$Mean<-c(270,250,240)
carrots$SE<-c(3,4,5)
carrots$Farm<-c("Plains","Hill","Valley")
carrots<-data.frame(carrots)
I create a basic plot:
p<-ggplot(carrots,aes(y=Mean,x=Farm)) +
geom_col(fill="slateblue") +
geom_errorbar(aes(ymin=Mean-SE,ymax=Mean+SE), width=0)
p
This is nice, but because the y-axis starts at 0 it is difficult to see the differences in length. Therefore, I would like to rescale the y axis to something like c(200,300). However, when I try to do this with:
p+scale_y_continuous('Length (mm)', limits=c(200,300))
The bars disappear, although the error bars remain.
My question is: is it possible to plot a barplot with this adjusted axis using ggplot2?
Thank you for any help or suggestions you can offer.
Try this
p + coord_cartesian(ylim=c(200,300))
Setting the limits on the coordinate system performs a visual zoom;
the data is unchanged, and we just view a small portion of the original plot.
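A minimal sketch of the zoom described above, rebuilding the carrots data from the question with geom_col() for the bars. The layer_data() call at the end is only there to show that the bar geometry still runs from 0 up to each mean, i.e. the data passed to the bar layer is left intact by coord_cartesian().

```r
library(ggplot2)

carrots <- data.frame(
  Farm = c("Plains", "Hill", "Valley"),
  Mean = c(270, 250, 240),
  SE   = c(3, 4, 5)
)

p <- ggplot(carrots, aes(x = Farm, y = Mean)) +
  geom_col(fill = "slateblue") +
  geom_errorbar(aes(ymin = Mean - SE, ymax = Mean + SE), width = 0)

# Zoom the view to 200-300 without touching the scale limits.
zoomed <- p + coord_cartesian(ylim = c(200, 300)) + ylab("Length (mm)")

# Bars still extend from 0 to Mean; only the visible window changed.
bars <- layer_data(zoomed, 1)
print(bars[, c("x", "ymin", "ymax")])
```

Keep in mind that a bar chart whose axis does not start at zero exaggerates the differences between bars, so points with error bars may be a clearer choice for this kind of comparison.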
If someone is trying to accomplish the same zoom effect for a flipped bar chart, the accepted answer won't work (even though the answer is perfect for the example in the question).
The solution for the flipped bar chart is to use the ylim argument of the coord_flip function. I decided to post this answer because my bars were also "disappearing", as in the original question, while I was trying to rescale with other methods, but in my case the chart was a flipped one. This may help other people with the same issue.
This is the adapted code, based on the example of the question:
ggplot(carrots,aes(y=Mean,x=Farm)) +
geom_col(fill="slateblue") +
geom_errorbar(aes(ymin=Mean-SE,ymax=Mean+SE), width=0) +
coord_flip(ylim=c(200,300))
Flipped chart example

Adding points with error bars into a Matlab scatter plot

I have performed a multidimensional cluster analysis in MATLAB. For each cluster, I have calculated mean and covariance (assuming conditional independence).
I have chosen two or three dimensions out of my raw data and plotted them in a scatter or scatter3 plot.
Now I would like to add the cluster means and the corresponding standard deviations to the same plot.
In other words, I want to add some data points with error bars to a scatter plot.
This question is almost what I want. But I would be ok with bars instead of boxes and I wonder if in that case there is a built-in way to do it with less effort.
Any suggestions on how to do that?
Once you realize that line segments will probably suffice for your purpose (and may be less ugly than the usual error bars with the whiskers, depending on the number of points), you can do something pretty simple (which applies to probably any plotting package, not just MATLAB).
Just plot a scatter, then write a loop to plot all line-segments you want corresponding to error bars (or do it in the opposite order like I did with error bars first then the scatter plot, depending if you want your dots or your error bars on top).
Here is the simple MATLAB code, along with an example figure showing error bars in two dimensions (sorry for the boring near-linearity):
As you can see, you can plot error bars for each axis in different colors to aid in visualization.
function scatterError(x, y, xe, ye, varargin)
%Brandon Barker 01/20/2014
nD = length(x);
%Make these defaults later:
dotColor = [1 0.3 0.3]; % conservative pink
yeColor = [0, 0.4, 0.8]; % bright navy blue
xeColor = [0.35, 0.35, 0.35]; % not-too-dark grey
dotSize = 23;
figure();
set(gcf, 'Position', get(0,'Screensize')); % Maximize figure.
set(gca, 'FontSize', 23);
hold all;
for i = 1:nD
plot([(x(i) - xe(i)) (x(i) + xe(i))], [y(i) y(i)], 'Color', xeColor);
plot([x(i) x(i)], [(y(i) - ye(i)) (y(i) + ye(i))], 'Color', yeColor);
end
scatter(x, y, dotSize, repmat(dotColor, nD, 1));
set(gca, varargin{:});
axis square;
With some extra work, it wouldn't be too hard to add whiskers to your error bars if you really want them.
If you are not too picky about what the graph looks like and are looking for performance, a builtin function is indeed often a good choice.
My first thought would be to try using a boxplot. It has quite a lot of options, so probably one combination of them will give you the result you need.
Sidenote: At first sight the answer you referred to does not look very inefficient so you may have to manage your expectations when it comes to achievable speedups.

How to represent datapoints that are out of scale in R

I am trying to plot a set of data in R:
x <- c(1,4,5,3,2,25)
My y-axis is fixed at 20, so the last data point is effectively not visible on the plot when I execute the following code:
plot(x, ylim=c(0,20), type='l')
I want to show the range of the outlying data point in a smaller box above the plot, with an independent y scale, representing only this last data point.
Is there any package or way to approach this problem?
You may try axis.break (plotrix package) http://rss.acs.unt.edu/Rdoc/library/plotrix/html/axis.break.html, with which you can define the axis to break, the style, size and color of the break marker.
The potential disadvantage of this approach is that the trend perception might be fooled. Good luck!
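As an alternative to axis.break, the "smaller box above the plot" from the question can be sketched with base graphics only, using layout() to stack two panels with independent y scales. This is a different technique from the plotrix suggestion above, and the function name draw_split_plot and the panel proportions are hypothetical choices for illustration.

```r
x <- c(1, 4, 5, 3, 2, 25)
y_max <- 20

# Indices of the points that fall above the fixed y limit.
out <- which(x > y_max)

draw_split_plot <- function(x, y_max, out) {
  old <- par(no.readonly = TRUE)
  on.exit(par(old))  # restore graphics settings afterwards

  # Two stacked panels: a short one for the outliers, a tall one for the rest.
  layout(matrix(1:2, ncol = 1), heights = c(1, 3))

  # Top panel: only the out-of-range points, on their own y scale.
  par(mar = c(0.5, 4, 2, 1))
  plot(out, x[out], xlim = c(1, length(x)), xaxt = "n",
       xlab = "", ylab = "outlier", pch = 19)

  # Bottom panel: the original plot with the fixed y limit.
  par(mar = c(4, 4, 0.5, 1))
  plot(x, ylim = c(0, y_max), type = "l")
}

draw_split_plot(x, y_max, out)
```

Both panels share the same x range, so the outlier sits directly above its position in the main plot. The same caveat about misleading trend perception applies here as well.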
