ggplot2 R: fixing too many values on an axis (line plot) - r

I can't read my y-axis because it has too many values. I tried rotating the labels, but that doesn't work the way I want, and it isn't something I want to do anyway.
I want to specify the values on the axis, say from 20 to 30, maybe with a step of 0.1.
But the data has 1000 values, so I guess the range suggested above doesn't work (?).
Ex:
runNumbers <- seq(from = 1, to = 1000)
tempVector <- seq(from = 20.0010, to = 30, by = 0.01)
plotData <- data.frame(RunNumber = runNumbers, temp = tempVector)
myUglyPlot <- ggplot(data = plotData, mapping = aes(x = RunNumber, y = temp, group = 1)) + geom_line()
#
#http://stackoverflow.com/questions/14428887/overflowing-x-axis-ggplot2?noredirect=1&lq=1
require(scales) # for removing scientific notation
# manually generate breaks/labels
labels <- seq(from = 0, to = 30, length.out = 1000)
# and set breaks and labels
myUglyPlot <- myUglyPlot + scale_y_discrete(breaks = labels, labels = as.character(labels))
# And now my graph is without labels, why?
Is there another way to do this without rotating my labels? Or am I doing something wrong in the code from the other question (I tried to follow what he did...)?
Later I will have 10,000 values instead, so I really need to change this. I want a readable axis where I can set the interval myself.
Maybe I'm missing some simple concept. I searched and read the R Graphics Cookbook, but without success so far.
Thanks for your time.
Update
I'm trying to use breaks, thanks for the help, guys. Here's what I'm doing now (only this):
myUglyPlot <- ggplot(data = plotData, mapping = aes(x = RunNo, y = t_amb, group = 1)) + geom_line()
myUglyPlot <- myUglyPlot + scale_y_discrete(breaks=seq(from = 1, to = 50, by = 0.01))
But it doesn't give me any breaks. See pic.

You are almost there. Since your y-axis shows a continuous variable, you need to use scale_y_continuous instead of scale_y_discrete:
myUglyPlot <- myUglyPlot + scale_y_continuous(breaks = labels)
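Note that passing all 1000 values as breaks will still produce overlapping labels. A minimal sketch of a more readable variant, assuming a step of 1 between 20 and 30 is acceptable (the names yBreaks and myPlot are illustrative, not from the question):

```r
library(ggplot2)

# Same data as in the question
runNumbers <- seq(from = 1, to = 1000)
tempVector <- seq(from = 20.001, to = 30, by = 0.01)
plotData <- data.frame(RunNumber = runNumbers, temp = tempVector)

# A small, evenly spaced set of breaks (assumed step of 1; adjust as needed)
yBreaks <- seq(from = 20, to = 30, by = 1)

myPlot <- ggplot(plotData, aes(x = RunNumber, y = temp)) +
  geom_line() +
  scale_y_continuous(breaks = yBreaks, limits = c(20, 30))
```

The number of breaks depends only on the chosen step, not on the number of rows, so the same call should stay readable with 10,000 values.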

Related

Can I draw a horizontal line at specific number of range of values using ggplot2?

I have data (from Excel) with ranges on the y-axis (also calculated in Excel) and cell counts on the x-axis, and I would like to draw a horizontal line at a specific value within the ranges, like a reference line. I tried using geom_hline(yintercept = 450), but I suspect that is naive and does not work that way for a number inside a range. I wonder if there are any better suggestions for it :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you proceed: don't calculate those ranges in Excel. Let ggplot do it for you.
Say, your data is like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#> x
#>1 322.76123
#>2 57.46708
#>3 223.31943
#>4 498.91870
#>5 155.05416
#>6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
  geom_histogram(aes(x = x),
                 boundary = 0,
                 binwidth = 50,
                 fill = "steelblue",
                 colour = "white") +
  geom_vline(xintercept = 450, colour = "red", linetype = 2, size = 1) +
  coord_flip()
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, to use a numeric value. Discrete levels sit at the integer positions 1, 2, 3, ..., so 9.5 puts the line between the top two of the ten ranges, i.e. at the 450 boundary:
g + geom_hline(yintercept = 9.5, linetype = 2)
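Rather than hard-coding 9.5, the position can be derived from the range labels. This is only a sketch, assuming every label has the form "low-high" and the target value coincides with a range boundary; the names lower, target and pos are illustrative:

```r
library(ggplot2)

# Sample data with the same structure as above
d <- data.frame(CTRL = sample(100, 10),
                Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
d$Bin.range <- factor(d$Bin.range, d$Bin.range)

# Lower bound of each range, parsed from the label text
lower <- as.numeric(sub("-.*$", "", levels(d$Bin.range)))

# Ranges that start below the target occupy positions 1..k,
# so the boundary line belongs at k + 0.5
target <- 450
pos <- sum(lower < target) + 0.5

g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
g2 <- g + geom_hline(yintercept = pos, linetype = 2)
```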

Adding dummy values on axis in ggplot2 to add asymmetric distance between ticks

How to add dummy values on x-axis in ggplot2
I have 0, 2, 4, 6, 12, 14, 18, 22, 26 in my data and have plotted them on the x-axis. Is there a way to add the remaining even numbers for which there is no data in the table? This would create the appropriate gaps on the x-axis.
Afterwards, the x-axis should show 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26.
I have already tried using rbind.fill to add dummy data, but when I convert the values to a factor, the added levels (8, 10, etc.) end up last.
Thanks
Hope this makes sense:
library(ggplot2)
gvals <- factor(letters[1:3])
xvals <- factor(c(0,2,4,6,12,14,18,22,26), levels = seq(0, 26, by = 2))
yvals <- rnorm(10000, mean = 2)
df <- data.frame(x = sample(xvals, size = length(yvals), replace = TRUE),
                 y = yvals,
                 group = sample(gvals, size = length(yvals), replace = TRUE))
ggplot(df, aes(x = x, y = y)) + geom_boxplot(aes(fill = group)) +
  scale_x_discrete(drop = FALSE)
The tricks are to make the x-variable with all levels you need and to specify drop = FALSE in scale.
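If the x values can stay numeric instead of being converted to a factor, a different route (not the one used above) is to keep a continuous axis and only set the tick positions. A minimal sketch with made-up data; allTicks and p are illustrative names:

```r
library(ggplot2)

set.seed(1)
xvals <- c(0, 2, 4, 6, 12, 14, 18, 22, 26)
df <- data.frame(x = sample(xvals, size = 500, replace = TRUE),
                 y = rnorm(500, mean = 2))

# Every even number from 0 to 26, whether or not data exists for it
allTicks <- seq(0, 26, by = 2)

# group = x is needed so that each numeric x value gets its own box
p <- ggplot(df, aes(x = x, y = y, group = x)) +
  geom_boxplot() +
  scale_x_continuous(breaks = allTicks)
```

Here the spacing between boxes follows the numeric distance between values, so the gaps at 8, 10, 16, 20 and 24 appear without any dummy rows.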

how to get top 100 count number for each cell in ggplot2 with geom_bin2d

Before asking, I have read this post, but mine is more specific.
library(ggplot2)
library(scales)
set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
I have replaced my real data with dat. With this random seed, x and y fall within [-4, 4], and I partition the area into 256 (16 x 16) cells of width 0.5. For each cell, I want to get the count.
That part is easy; geom_bin2d can do it.
# plot
p <- ggplot(dat, aes(x = x, y = y)) + geom_bin2d()
# Get data - this includes counts and x,y coordinates
newdat <- ggplot_build(p)$data[[1]]
# add in text labels
p + geom_text(data=newdat, aes((xmin + xmax)/2, (ymin + ymax)/2,
label=count), col="white")
So far so good, but I only want to get the top 100 counts and show those in the plot, as in the picture below.
According to ?geom_bin2d, drop = TRUE only removes cells with 0 counts, whereas I care about the top 100 counts. What should I do? This is question 1.
Please also look at the legend of the second picture: the counts there are small and close together, but what if they were 10,000, 20,000, 30,000?
One approach is the trans argument of scale_fill_gradient. The built-in transformations include exp, log, sqrt and so on, but I want to divide by 1,000. I found trans_new() in the scales package and tried it, but without success.
sci_trans <- function(){ trans_new('sci', function(x) x/1000, function(x) x*1000)}
p + scale_fill_gradient(trans='sci')
That is question 2. I have googled a lot but cannot find a way to solve it. Thanks a lot to anyone who can help!
Apparently you can't get the output bins or counts from stat_bin2d or stat_summary_2d; according to a related question, "How to use stat_bin2d() to compute counts labels in ggplot2?", @MrFlick's comment quotes Hadley from 2010: "he basically says you can't use stat_bin2d, you'll have to do the summarization yourself".
So, the workaround: create the coordinate bins manually yourself, get the 2D counts, then take the top n. For example, using dplyr (the bin edges here are an assumption based on the 0.5-wide cells described in the question):
library(dplyr)
brks <- seq(-4, 4, by = 0.5)  # assumed bin edges: 16 bins of width 0.5 per axis
dat %>% mutate(x_binned = cut(x, brks), y_binned = cut(y, brks)) %>%
  group_by(x_binned, y_binned) %>%
  summarize(count = n(), .groups = "drop") %>%  # NOTE: no need to sort() or order()
  top_n(100, count)  # keeps ties, so it may return more than 100 rows
You might have to poke into stat_bin2d in order to copy (or call) its exact coordinate-binning code. UPDATE: here's an excerpt of the source for stat-bin2d.r:
StatBin2d <- ggproto("StatBin2d", Stat,
default_aes = aes(fill = ..count..),
required_aes = c("x", "y"),
compute_group = function(data, scales, binwidth = NULL, bins = 30,
breaks = NULL, origin = NULL, drop = TRUE) {
origin <- dual_param(origin, list(NULL, NULL))
binwidth <- dual_param(binwidth, list(NULL, NULL))
breaks <- dual_param(breaks, list(NULL, NULL))
bins <- dual_param(bins, list(x = 30, y = 30))
xbreaks <- bin2d_breaks(scales$x, breaks$x, origin$x, binwidth$x, bins$x)
ybreaks <- bin2d_breaks(scales$y, breaks$y, origin$y, binwidth$y, bins$y)
xbin <- cut(data$x, xbreaks, include.lowest = TRUE, labels = FALSE)
ybin <- cut(data$y, ybreaks, include.lowest = TRUE, labels = FALSE)
...
}
bin2d_breaks <- function(scale, breaks = NULL, origin = NULL, binwidth = NULL,
bins = 30, right = TRUE) {
...
(But this seems a worthy enhancement request for ggplot2, if it hasn't already been filed.)
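The manual workaround can also be sketched end to end in base R plus ggplot2. This is an illustration under stated assumptions, not ggplot2's own binning code: the bin edges (brks) cover [-4, 4] in steps of 0.5 as described in the question, and the names cells and top_cells are made up for the example.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Assumed bin edges: 16 bins of width 0.5 on each axis
brks <- seq(-4, 4, by = 0.5)
mids <- head(brks, -1) + 0.25  # centre of each bin

# Bin index of every point (labels = FALSE returns integer codes)
xi <- cut(dat$x, brks, include.lowest = TRUE, labels = FALSE)
yi <- cut(dat$y, brks, include.lowest = TRUE, labels = FALSE)

# Count points per cell and drop empty cells
cells <- as.data.frame(table(xbin = xi, ybin = yi), responseName = "count")
cells <- cells[cells$count > 0, ]
cells$x <- mids[as.integer(as.character(cells$xbin))]
cells$y <- mids[as.integer(as.character(cells$ybin))]

# Keep at most the 100 cells with the highest counts
top_cells <- head(cells[order(-cells$count), ], 100)

p <- ggplot(top_cells, aes(x = x, y = y, fill = count)) +
  geom_tile() +
  geom_text(aes(label = count), colour = "white")
```

For question 2, if the goal is only to show the legend in thousands, one option that avoids a custom transformation is the labels argument, for example p + scale_fill_gradient(labels = function(v) v / 1000); this rescales the legend text without changing the colour mapping.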

How to annotate lines like this in ggplot2?

See example:
I hope I don't need to manually assign the coordinates of the text labels. If this is too complicated to achieve in ggplot2, what are the alternatives in R? Or maybe even outside R?
As @Axeman says, ggrepel is a decent option. Unfortunately it will only avoid overlap with other labels, and not with the lines, so the solution isn't quite perfect.
library(ggplot2)
# install.packages("ggrepel")  # run once if the package is not installed
library(ggrepel)
set.seed(50)
d <- data.frame(y = c(rnorm(50), rnorm(50, 5), rnorm(50, 10)),
x = rep(seq(50), times = 3),
group = rep(LETTERS[seq(3)], each = 50))
ggplot(d, aes(x, y, group = group, label = group)) +
geom_line() +
geom_text_repel(data = d[d$x == sample(d$x, 1), ], size = 10)
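If overlap with the lines is a concern, one alternative that still avoids hand-picked coordinates is to place each label just past the last point of its line. A minimal sketch using plain geom_text on the same simulated data; the name ends and the offsets are illustrative choices:

```r
library(ggplot2)

set.seed(50)
d <- data.frame(y = c(rnorm(50), rnorm(50, 5), rnorm(50, 10)),
                x = rep(seq(50), times = 3),
                group = rep(LETTERS[seq(3)], each = 50))

# One row per line: the point with the largest x value
ends <- d[d$x == max(d$x), ]

p <- ggplot(d, aes(x, y, group = group)) +
  geom_line() +
  geom_text(data = ends, aes(label = group), hjust = -0.5) +  # nudge right of the line end
  expand_limits(x = max(d$x) + 2)                             # leave room for the labels
```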

How do you label a horizontal line when the x axis is categorical?

There is a worked example that shows how to label a straight line in R using ggplot2. Please look at example 5 - "Recreate the following plot of flight volume by longitude".
How would you code this if the x axis were categorical instead of continuous? How would one write the part of the geom_text call that is currently
data = data.frame(x = - 119, y = 0)
I created a line
+ geom_text(aes(x,y, label = "seronegative"),
data = data.frame(x = 1, y = 20),
size = 4, hjust = 0, vjust = 0, angle = 0)
and I tried several options
data = data.frame(x = 1, y = 20)
data = data.frame(x = factor(1), y = 20)
#where gard is the name of one of the categories
data = data.frame(x = "gard", y = 20)
...but I get the error
invalid argument to unary operator
It's not entirely clear to me what you're trying to do, since you say you try to create a line, and then your code uses geom_text. Assuming that you'd like to place a vertical line, with a text label oriented vertically on that line, using a categorical x variable, here's a simple example:
dat <- data.frame(x = letters[1:5],y = 1:5)
txt <- data.frame(x = 1.5, y = 1, lab = "label")
ggplot(dat,aes(x = x, y = y)) +
geom_point() +
geom_vline(xintercept = 1.5) +
geom_text(data = txt,aes(label = lab),angle = 90, hjust = 0, vjust = 0)
which on my machine produces this output:
Note that I put the text labels in a separate data frame, outside the ggplot call. That is not strictly necessary, but I prefer it as I find that it avoids confusion.
Using an x value of 1.5 for the text label works here, as would setting it to "a" if you wanted it directly on the plotted x values.
The error you're describing suggests to me a simple syntax error somewhere in your code (which you haven't completely provided). Perhaps this example will help you to spot it.
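For the horizontal case asked about in the title, the same idea applies with geom_hline. A minimal sketch, assuming made-up data and the label text from the question; x = 1 refers to the position of the first category on the discrete axis:

```r
library(ggplot2)

# Made-up data with a categorical x variable
dat <- data.frame(x = letters[1:5], y = c(5, 12, 18, 25, 30))

# Label data: x = 1 is the position of the first category ("a")
txt <- data.frame(x = 1, y = 20, lab = "seronegative")

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  geom_hline(yintercept = 20, linetype = 2) +
  geom_text(data = txt, aes(label = lab), hjust = 0, vjust = -0.5)
```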
