ggplot2: Changing which values are annotated on the x-axis - r

My question is certainly a duplicate, but I can't find the answer.
On the x-axis, the values that have a tick in my plot are 2.5, 5, 7.5, 10, 12.5.
I want to change which values have a tick so that I see 2, 4, 6, 8, 10, 12 instead.
To make sure I am understood: I do not want to change my axes to something that is not Cartesian; I just want to change which positions on the x-axis are annotated.
How can I achieve this?
Here is my current code:
ggplot(data.and.factors.prov,aes(x=number.of.traits,y=FP,colour=factor(Corr))) +
stat_summary(fun.data=mean_cl_normal,position=position_dodge(width=0.2)) +
geom_blank() +
geom_smooth(method='lm',se=F,formula=y~I(x)) +
labs(x='Number of traits') +
scale_colour_manual(values=c(1:6),name='Correlation Coefficient') +
xlim(c(1,12))

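Because number.of.traits is numeric, use scale_x_continuous(breaks = seq(2, 12, by = 2), limits = c(1, 12)) in place of xlim(c(1, 12)). scale_x_discrete() is meant for categorical axes.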
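A minimal sketch of the idea with made-up data, since the asker's data.and.factors.prov is not available. The data frame toy and its values are assumptions for illustration only; the point is that a numeric x takes its tick positions from the breaks argument of a continuous scale, and that limits can stand in for xlim().

```r
library(ggplot2)

# Made-up stand-in for the asker's data: a numeric x running from 1 to 12
set.seed(1)
toy <- data.frame(
  number.of.traits = rep(1:12, each = 5),
  FP = runif(60)
)

tick_positions <- seq(2, 12, by = 2)  # 2, 4, 6, 8, 10, 12

p <- ggplot(toy, aes(x = number.of.traits, y = FP)) +
  geom_point() +
  # limits does the job of xlim(); breaks picks the annotated positions
  scale_x_continuous(breaks = tick_positions, limits = c(1, 12))
p
```

Only the annotated positions change; the axis itself stays linear.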

Related

Change scale on X axis in ggplot in R

I am using ggplot (line graph) and trying to plot my data by week. However, when I plot the data, R automatically labels the weeks in steps of 5 (10, 15, ...).
I want to show every week number on my x-axis, e.g. 10, 11, 12, ...
ggplot(...) + geom_line(...) + scale_x_continuous(n.breaks = 30)
You can modify the n.breaks parameter to your liking.
It seems your "weeks" axis is numeric (just the number) rather than a date. To change where the tick marks are indicated for your axis, you can use the breaks= argument of scale_*_continuous() for the numeric scale. Here's an example where you can see how to do this:
df <- data.frame(x=1:20, y=rnorm(20))
p <- ggplot(df, aes(x,y)) + geom_point()
p
By default, the x axis is separated into major breaks of 5. If you wanted breaks every 1, you supply a vector to the breaks= argument:
p + scale_x_continuous(breaks=seq(0,20,by=1))
You can even do odd things, like specify breaks individually if you want:
p + scale_x_continuous(breaks=c(0,5,10,11,12,18,20))
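If the range of weeks is not known in advance, breaks can also be given as a function of the axis limits, so nothing has to be hard-coded. A small sketch; every_integer is a hypothetical helper name and the week data are made up:

```r
library(ggplot2)

# Hypothetical helper: one break at every whole number inside the axis limits
every_integer <- function(lims) seq(ceiling(lims[1]), floor(lims[2]), by = 1)

set.seed(42)
df <- data.frame(week = 10:40, y = rnorm(31))

p <- ggplot(df, aes(week, y)) +
  geom_line() +
  # ggplot2 calls the function with the scale limits and uses the result as breaks
  scale_x_continuous(breaks = every_integer)
p
```

With many weeks the labels may overlap, so a wider step (by = 2, say) can be easier to read.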

ggplot: add manually labelled tick marks on top of automatic tick marks

I am trying to highlight the point with the lowest y value by attempting the following:
1) draw a line from this point down to the x-axis and another to the y-axis; and
2) add a manual tick mark with this point's x and y value on the x-axis and y-axis, respectively. This manual tick mark must be added in addition to the automatic tick marks on both axes.
Sample data:
df <- data.frame(x=1:100,y=rnorm(100,10,1))
ggplot(df) +
geom_point(aes(x=x,y=y))
Edit:
Here's an illustration of what I am attempting:
It's unclear exactly what you want this to look like but you could do one of two options. You could either use geom_vline() or geom_segment(). Vline will do a line from the bottom to the top, but it sounds like you may prefer to use segment. Try this:
+ geom_segment(x = min(df$x), xend = min(df$x), y = 0, yend = 1)
If you change the yend argument you could make the tick smaller or larger. Drawing one for the max value should be as simple as swapping the min() arguments for max() arguments. Or you could just input the values manually. Alternatively, you could add a vline to go the full height of the panel with:
+ geom_vline(xintercept = min(df$x))
You can read more about both in their help pages (?geom_segment and ?geom_vline). If this doesn't help much, please provide a proper reprex and maybe a sketch of your desired output, so we can modify that code to get a bit closer to what you want.
edit:
Writing outside of the plot window is a bit more difficult, but this link may help you. I've tried it on a few and always found that in my cases it was easier to use a different solution. Here's one option:
library(ggplot2)
set.seed(123) # so we have the same toy data
df <- data.frame(x=1:100,y=rnorm(100,10,1))
ggplot(df) +
geom_point(aes(x=x,y=y)) +
geom_segment(x=0, xend=18, y=8.033383, yend=8.033383) + # horizontal line across to the y axis
geom_segment(x=18, xend=18, y=0, yend=8.033383) + # vertical line down to the x axis
annotate("text", 18.2, 8.2, label="(18, 8.03)", size=3) # ordered pair just above it
If you don't want to draw all the way to the point, change the xend of the first segment and the yend of the second (the ones whose x or y starts at zero) so that each segment ends just inside the edge of the plot window.
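The coordinates above are typed in by hand. A sketch that looks the point up instead, assuming the goal is the row with the smallest y: which.min() finds it, and -Inf as a segment start places it at the panel edge, so the y axis does not have to be stretched down to zero. The name low is introduced here for illustration.

```r
library(ggplot2)

set.seed(123)  # same toy data as above
df <- data.frame(x = 1:100, y = rnorm(100, 10, 1))

low <- df[which.min(df$y), ]  # the row holding the lowest y value

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  # horizontal guide from the left panel edge to the point
  annotate("segment", x = -Inf, xend = low$x, y = low$y, yend = low$y,
           linetype = "dashed") +
  # vertical guide from the bottom panel edge to the point
  annotate("segment", x = low$x, xend = low$x, y = -Inf, yend = low$y,
           linetype = "dashed") +
  annotate("text", x = low$x + 2, y = low$y,
           label = sprintf("(%d, %.2f)", low$x, low$y), hjust = 0, size = 3)
p
```

This only draws the guide lines; the extra tick marks on the axes would still need their own breaks.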

R, how to add one break to the default breaks in ggplot?

Suppose I have the following issue: having a set of data, generate a chart indicating how many datapoints are below any given threshold.
This is fairly easy to achieve
n.data <- 215
set.seed(0)
dt <- rnorm(n.data) ** 2
x <- seq(0, 5, by=.2)
y <- sapply(x, function(i) length(which(dt < i)))
ggplot() +
geom_point(aes(x=x,y=y)) +
geom_hline(yintercept = n.data)
The question is: suppose I want to add a label indicating the total number of observations (n.data). How do I do that while keeping the other breaks at their defaults?
The outcome I'd like looks something like the image below, generated with the code
ggplot() +
geom_point(aes(x=x,y=y)) +
geom_hline(yintercept = n.data) +
scale_y_continuous(breaks = c(seq(0,200,50),n.data))
However, I'd like this to work even when I change the value of n.data, just by adding it to the default breaks.
(bonus points if you also get rid of the grid line between the last default break and the n.data one!)
Three years and some more knowledge of ggplot later, here's how I would do this today.
ggplot() +
geom_point(aes(x=x,y=y)) +
geom_hline(yintercept = n.data) +
scale_y_continuous(breaks = c(pretty(y), n.data))
Here is how you can get rid of the grid line between the last auto break and the manual one :
theme_update(panel.grid.minor=element_blank())
For the rest, I can't quite understand your question, as when you change n.data, your break is updated.
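Another way to keep the automatic breaks and append one value is to pass a function to breaks. The sketch below assumes that scales::extended_breaks() is what a plain continuous scale uses by default; add_breaks is a hypothetical helper name, not part of ggplot2.

```r
library(ggplot2)

n.data <- 215
set.seed(0)
dt <- rnorm(n.data)^2
x <- seq(0, 5, by = 0.2)
y <- sapply(x, function(i) sum(dt < i))  # how many data points fall below each threshold

# Hypothetical helper: the usual automatic breaks plus any extra values
add_breaks <- function(extra) {
  function(lims) sort(unique(c(scales::extended_breaks()(lims), extra)))
}

p <- ggplot(data.frame(x, y), aes(x, y)) +
  geom_point() +
  geom_hline(yintercept = n.data) +
  scale_y_continuous(breaks = add_breaks(n.data))
p
```

Because the function receives the scale limits rather than the data, it keeps working when n.data changes.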

forcing ggplot2 y-axis label to be of integer only, and give proper breaks [duplicate]

This question already has answers here:
How to display only integer values on an axis using ggplot2
I am drawing a bar chart for discrete data. By default ggplot adjusts the y-axis for me, but it gives me y-axis labels with breaks at 0.5 intervals, which I don't like. I tried scale_y_discrete, but then a break is drawn for every discrete value, which is not good either.
Can I force the y-axis breaks to be integers only, with sensible breaks for each facet?
Sample script is as below:
set.seed(1)
chart.data <- data.frame(x=rep(LETTERS[1:10],3),
y=c(sample(0:10,10,replace=TRUE),
sample(0:100,10,replace=TRUE),
sample(0:1000,10,replace=TRUE)),
group=sort(rep(1:3,10)))
chart <- ggplot(data=chart.data,aes(x=x,y=y))
chart <- chart + geom_bar(stat="identity")
chart <- chart + scale_y_discrete()
chart <- chart + facet_wrap(facets=~group,nrow=1,scale="free_y")
Update #1
Since the post is being considered as possible duplicate, the script is refined to show a more complicated scenario.
First, as your y data are continuous, you should use scale_y_continuous(). In this function you can add the argument breaks = pretty_breaks() (load the scales package to use pretty_breaks()). If you don't pass a number to pretty_breaks(), then in this case you will get integer numbers on the y axis. You can also set the number of breaks to display, for example pretty_breaks(4), but for the first facet, where the range is 0-10, it will still display only integer values, and the number of breaks will be larger so that the numbers stay "nice".
library(scales)
ggplot(data=chart.data,aes(x=x,y=y)) +
geom_bar(stat="identity") +
facet_wrap(facets=~group,nrow=1,scale="free_y")+
scale_y_continuous(breaks= pretty_breaks())
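pretty_breaks() gives integers here because the ranges are wide enough; with a very small range (say 0 to 2) it can still propose 0.5 steps. A sketch of a breaks function that drops any non-integer candidates. integer_breaks is a hypothetical helper, and the first facet's data are deliberately shrunk to 0-2 to show the effect:

```r
library(ggplot2)

# Hypothetical helper: "pretty" breaks with non-integer candidates filtered out
integer_breaks <- function(n = 5) {
  function(lims) {
    b <- pretty(lims, n)
    b[abs(b - round(b)) < 1e-8]
  }
}

set.seed(1)
chart.data <- data.frame(
  x = rep(LETTERS[1:10], 3),
  y = c(sample(0:2, 10, replace = TRUE),     # small range: 0.5 steps would appear
        sample(0:100, 10, replace = TRUE),
        sample(0:1000, 10, replace = TRUE)),
  group = rep(1:3, each = 10)
)

p <- ggplot(chart.data, aes(x, y)) +
  geom_col() +
  facet_wrap(~group, nrow = 1, scales = "free_y") +
  scale_y_continuous(breaks = integer_breaks())
p
```

With free y scales the function is applied to each facet's own limits, so every panel gets its own integer breaks.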
You can also directly specify the breaks with a function. Below are a few examples of how you could do this. Also look at the breaks argument in ?discrete_scale.
chart + scale_y_discrete(breaks=function(n) c(0, floor(max(n)/2), max(n)))
chart + scale_y_discrete(breaks=function(n) n[floor(length(n)/5)*1:5+1])
chart + scale_y_discrete(breaks=function(n) 10^(ceiling(log10(max(n)))-1)*2*0:5)

facet_wrap: How to add y axis to every individual graph when scales="free_x"?

The following code
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
ggplot(m, aes(value)) +
facet_wrap(~variable,ncol=2,scales="free_x") +
geom_histogram()
produces 4 graphs with fixed y axis (which is what I want). However, by default, the y axis is only displayed on the left side of the faceted graph (i.e. on the side of 1st and 3rd graph).
What do I do to make the y axis show itself on all 4 graphs? Thanks!
EDIT: As suggested by @Roland, one could set scales="free" and use ylim(c(0,30)), but I would prefer not to have to set the limits manually every time.
@Roland also suggested using hist and ddply outside of ggplot to get the maximum count. Isn't there a ggplot2-based solution?
EDIT: There is a very elegant solution from @baptiste. However, when changing binwidth, it starts to behave oddly (at least for me). Check this example with the default binwidth (range/30). The values on the y axis are between 0 and 30,000.
library(ggplot2)
library(reshape2)
m=melt(data=diamonds[,c("x","y","z")])
ggplot(m,aes(x=value)) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram() +
geom_blank(aes(y=max(..count..)), stat="bin")
And now this one.
ggplot(m,aes(x=value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin")
The binwidth is now set to 0.5, so the highest frequency should change (decrease, in fact, as tighter bins hold fewer observations). However, nothing happened to the y axis: it still covers the same range of values, creating a huge empty space in each graph.
[The problem is solved... see @baptiste's edited answer.]
Is this what you're after?
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin", binwidth=0.5)
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
ylim(c(0,30)) +
geom_histogram()
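For the original request (a shared y scale, but with the y axis drawn on every panel), newer ggplot2 releases appear to cover this directly. The sketch below assumes ggplot2 3.5.0 or later, where facet_wrap() has an axes argument; on older versions the call will fail with an unused-argument error. It uses base stack() instead of reshape2::melt(), so the columns are named values and ind.

```r
library(ggplot2)

# Long format without reshape2: stack() yields the columns `values` and `ind`
m <- stack(iris[, 1:4])

p <- ggplot(m, aes(values)) +
  geom_histogram(bins = 30) +
  # scales = "free_x" keeps one common y scale;
  # axes = "all" (assumed: ggplot2 >= 3.5.0) draws that axis on every panel
  facet_wrap(~ind, ncol = 2, scales = "free_x", axes = "all")
p
```

No manual limits are needed, since the y scale is still shared across panels.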
Didzis Elferts in https://stackoverflow.com/a/14584567/2416535 suggested using ggplot_build() to get the values of the bins used in geom_histogram (ggplot_build() provides data used by ggplot2 to plot the graph). Once you have your graph stored in an object, you can find the values for all the bins in the column count:
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
plot = ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value))
ggplot_build(plot)$data[[1]]$count
Therefore, I tried to replace the max y limit by this:
max(ggplot_build(plot)$data[[1]]$count)
and managed to get a working example:
m=melt(data=diamonds[,c("x","y","z")])
bin=0.5 # you can use this to try out different bin widths to see the results
plot=
ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value),binwidth=bin)
ggplot(m) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram(aes(x=value),binwidth=bin) +
ylim(c(0,max(ggplot_build(plot)$data[[1]]$count)))
It does the job, albeit clumsily. It would be nice if someone improved upon that to eliminate the need to create 2 graphs, or rather the same graph twice.
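One way to avoid writing the plot out twice is to store it once and add ylim() to the stored object. ggplot2 still builds the plot a second time when it is printed, so this is a tidier spelling of the same idea rather than a different technique. The names p and top are introduced for illustration, and stack() replaces melt() to keep the sketch free of reshape2.

```r
library(ggplot2)

# Long format: columns `values` (measurement) and `ind` (which of x, y, z)
m <- stack(as.data.frame(diamonds[, c("x", "y", "z")]))
bin <- 0.5  # try other bin widths here

p <- ggplot(m, aes(values)) +
  facet_wrap(~ind, ncol = 2, scales = "free") +
  geom_histogram(binwidth = bin)

# Highest bar over all panels, read from the built plot
top <- max(ggplot_build(p)$data[[1]]$count)

p + ylim(c(0, top))
```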
