Basically, I want to make a graph that shows where different "analysts" chose a certain point on the graph.
This is what the base graph looks like.
This is what I want to produce.
I have a separate dataframe called sum_data that summarizes the time choices made by each analyst. It looks like this. The following is the code used to create the plot:
gqplot <- ggplot(Qdata,
aes(x = date,
y = cfs))+
labs(#title = paste(watershedID,"_",event),
x = "Date",
y = "Flow [cfs]")+
geom_line(colour = "#000099")
# Show plot
gqplot
Hey, what you need is a data.frame that contains the choices of those two analysts (like your sum_data), which you then pass to geom_point().
Here is an example. I made up data and code etc because you didn't provide a completely reproducible example.
# Library
library(ggplot2)
# Seed for exact reproducibility
set.seed(20210307)
# Main data frame
main_data <- data.frame(x = 1:10, y = rnorm(10))
# Analyst data frame
analyst_choice <- data.frame(x = c(2, 3), y = main_data[2:3, 'y'], analyst = c('John', 'Paul'))
# Create plot
ggplot(main_data, aes(x = x, y = y)) +
geom_line() +
geom_point(data = analyst_choice, aes(x = x, y = y, colour = analyst), size = 10, shape = 4)
That's what this code produces:
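Applied to the question's own setup, the same idea might look like the sketch below. The data are invented, and the column names in sum_data (analyst, date, cfs) are assumptions, since the question does not show them. geom_text() is added only to print each analyst's name next to the marker.

```r
library(ggplot2)

# Made-up stand-ins for the question's Qdata and sum_data (column names assumed)
set.seed(1)
Qdata <- data.frame(
  date = seq(as.Date("2021-03-01"), by = "1 day", length.out = 10),
  cfs  = round(runif(10, min = 50, max = 150))
)
sum_data <- data.frame(
  analyst = c("A", "B"),
  date    = Qdata$date[c(4, 7)],
  cfs     = Qdata$cfs[c(4, 7)]
)

gqplot <- ggplot(Qdata, aes(x = date, y = cfs)) +
  geom_line(colour = "#000099") +
  # One marker per analyst, taken from the summary data frame
  geom_point(data = sum_data, aes(colour = analyst), size = 4, shape = 4) +
  # Analyst name printed just above each marker
  geom_text(data = sum_data, aes(label = analyst), vjust = -1) +
  labs(x = "Date", y = "Flow [cfs]")

gqplot
```

Because both extra layers get their own data argument, the line itself is still drawn from Qdata only.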
I am, in R and using ggplot2, plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial sample (using long data suitable for plotting) is this:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
which gives
In this example, the first variable -- shown in the left column -- has unlimited range, while the second variable -- shown in the right column -- is weakly positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns in this plot, i.e. set Y axis limits separately for the two variables plotted. However, in order to allow for easy visual comparison of the different groups for each of the two variables, I would also like to have the identical Y axes within each column.
I've looked at the scales option to facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_y" allows the Y axes to vary across rows, while
passing scales = "free_x" allows the X axes to vary across columns, but
there is no option to allow the Y axes to vary across columns (nor, presumably, the X axes across rows).
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest would be to create one plot per facet column and bind them with something like {patchwork}. To get the facet look, you can still add a faceting layer to each plot.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
group_split(groups1) %>%
map({
~ggplot(.x, aes(x = x, y = values)) +
geom_line() +
facet_grid(groups2 ~ groups1)
}) %>%
wrap_plots()
Created on 2023-01-11 with reprex v2.0.2
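One possible refinement, offered only as a sketch: since every sub-plot carries its own facet_grid(), the row strips appear once per column. If that is unwanted, the strips of the left-hand plot can be blanked with theme() before the plots are combined. The object names plots and combined are made up for this example.

```r
library(ggplot2)
library(dplyr)
library(purrr)
library(patchwork)

set.seed(42)
data <- tibble(
  x       = rep(1:100, times = 14),
  groups1 = rep(1:2, each = 7 * 100),
  groups2 = rep(rep(1:7, times = 2), each = 100),
  values  = c(rnorm(n = 700), rgamma(n = 700, shape = 2))
)

# One plot per column; within each plot facet_grid() keeps a shared Y axis
plots <- data %>%
  group_split(groups1) %>%
  map(~ ggplot(.x, aes(x = x, y = values)) +
        geom_line() +
        facet_grid(groups2 ~ groups1))

# Drop the duplicated row strips from the left-hand plot
plots[[1]] <- plots[[1]] +
  theme(strip.text.y = element_blank(),
        strip.background.y = element_blank())

combined <- wrap_plots(plots)
combined
```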
I am creating a scatter plot using ggplot/geom_point. Here is my code for building the function in ggplot.
AddPoints <- function(x) {
list(geom_point(data = dat , mapping = aes(x = x, y = y) , shape = 1 , size = 1.5 ,
color = "blue"))
}
I am wondering if it would be possible to replace the standard points on the plot with numbers. That is, instead of seeing a dot on the plot, you would see a number on the plot to represent each observation. I would like that number to correspond to a column for that given observation (column name 'RP'). Thanks in advance.
Sample data.
Data <- data.frame(
X = sample(1:10),
Y = sample(3:12),
RP = sample(c(4,8,9,12,3,1,1,2,7,7)))
Use geom_text() and map the RP column to the label aesthetic.
ggplot(Data, aes(x = X, y = Y, label = RP)) +
geom_text()
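If the helper-function style from the question is preferred, the same layer can be wrapped the way AddPoints() is. This is a minimal sketch: AddLabels() is a hypothetical name, and it takes the data frame as its argument rather than relying on a global dat.

```r
library(ggplot2)

set.seed(1)
Data <- data.frame(
  X  = sample(1:10),
  Y  = sample(3:12),
  RP = sample(c(4, 8, 9, 12, 3, 1, 1, 2, 7, 7))
)

# Hypothetical counterpart to AddPoints(): returns a text layer instead of points
AddLabels <- function(dat) {
  list(geom_text(data = dat,
                 mapping = aes(x = X, y = Y, label = RP),
                 colour = "blue", size = 4))
}

p <- ggplot() + AddLabels(Data)
p
```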
I have a dataset I'm plotting, with facets by variables (in the toy dataset - densities of 2 species). I need to use the actual variable names to do 2 things: 1) italicize species names, and 2) have the 2 in n/m2 properly superscripted (or ASCII-ed, whichever easier).
It's similar to this, but I can't seem to make it work for my case.
toy data
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10,
z = rep(c("Species1 density (n/m2)", "Species2 density (m/m2)"), each = 5),
z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2))
ggplot(df) + geom_point(aes(x = x, y = y)) + facet_grid(z1 ~ z)
I get an error (variable z not found) when I try to use the code in the answer naively. How do I get around having 2 variables in the facetting?
A little modification gets the code from your link to work. I've changed the code to use tibble() so the character vector is not converted to a factor, and taken the common information out of the labels so it can be added via the labeller (otherwise it would be a pain to make only half of the text italic).
library(tidyverse)
# tibble() replaces the deprecated data_frame()
df <- tibble(
x = 1:10,
y = 1:10,
z = rep(c("Species1", "Species2"), each = 5),
z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2)
)
ggplot(df) +
geom_point(aes(x = x, y = y)) +
# cols is the full argument name in label_bquote()
facet_grid(z1 ~ z, labeller = label_bquote(cols = italic(.(z))~density~m^2))
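To get the full unit from the question, "(n/m2)" with a superscript, the plotmath expression can be extended. The following is a sketch under the assumption that both species share the same unit; the toy data in the question use "m/m2" for the second species, which would need its own label.

```r
library(ggplot2)

df <- data.frame(
  x  = 1:10,
  y  = 1:10,
  z  = rep(c("Species1", "Species2"), each = 5),
  z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2),
  stringsAsFactors = FALSE
)

p <- ggplot(df) +
  geom_point(aes(x = x, y = y)) +
  # Column strips: italic species name, then "density (n/m^2)" with superscript 2
  facet_grid(z1 ~ z,
             labeller = label_bquote(cols = italic(.(z)) ~ density ~ (n/m^2)))
p
```

Row strips are left to the default labeller, so the area names are printed unchanged.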
I have an experiment where three evolving populations of yeast have been studied over time. At discrete time points, we measured their growth, which is the response variable. I basically want to plot the growth of yeast as a time series, using boxplots to summarise the measurements taken at each point, and plotting each of the three populations separately. Basically, something that looks like this (as a newbie, I don't get to post actual images, so x,y,z refer to the three replicates):
| xyz
| x z xyz
| y xyz
| xyz y
| x z
|
-----------------------
t0 t1 t2
How can this be done using ggplot2? I have a feeling that there must be a simple and elegant solution, but I can't find it.
Try this code:
require(ggplot2)
df <- data.frame(
time = rep(seq(Sys.Date(), len = 3, by = "1 day"), 10),
y = rep(1:3, 10, each = 3) + rnorm(30),
group = rep(c("x", "y", "z"), 10, each = 3)
)
df$time <- factor(format(df$time, format = "%Y-%m-%d"))
p <- ggplot(df, aes(x = time, y = y, fill = group)) + geom_boxplot()
print(p)
Using x = factor(time) on its own, as in ggplot(df, aes(x = factor(time), y = y, fill = group)) + geom_boxplot() + scale_x_date(), did not work, because scale_x_date() expects a Date variable rather than a factor.
Pre-processing the dates with factor(format(df$time, format = "%Y-%m-%d")) was required for this kind of plot.
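An alternative that keeps time as a real Date, shown here only as a sketch with made-up data, is to state the grouping explicitly with interaction(). Each combination of time point and population then gets its own box, and scale_x_date() can be used for the axis.

```r
library(ggplot2)

set.seed(1)
# Invented data: 3 time points x 3 populations x 10 measurements each
df <- expand.grid(
  time  = seq(as.Date("2024-01-01"), by = "1 day", length.out = 3),
  group = c("x", "y", "z"),
  rep   = 1:10
)
df$y <- as.integer(factor(df$group)) + rnorm(nrow(df))

p <- ggplot(df, aes(x = time, y = y, fill = group,
                    group = interaction(time, group))) +
  geom_boxplot() +
  scale_x_date(date_labels = "%Y-%m-%d")
p
```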
I have a data frame, which after applying the melt function looks similar to:
var val
1 a 0.6133426
2 a 0.9736237
3 b 0.6201497
4 b 0.3482745
5 c 0.3693730
6 c 0.3564962
..................
The initial dataframe had 3 columns with the column names, a,b,c and their associated values.
I need to plot on the same graph, using ggplot the associated ecdf for each of these columns (ecdf(a),ecdf(b),ecdf(c)) but I am failing in doing this. I tried:
p<-ggplot(melt_exp,aes(melt_exp$val,ecdf,colour=melt_exp$var))
pg<-p+geom_step()
But I am getting an error: arguments imply differing number of rows: 34415, 0.
Does anyone have an idea on how this can be done? The graph should look similar to the one returned by plot(ecdf(x)), not a step-like one.
Thank you!
My first thought was to try to use stat_function, but since ecdf returns a function, I couldn't get that working quickly. Instead, here's a solution that requires you to attach the computed values to the data frame first (using Ramnath's example data):
library(plyr) # function ddply()
mydf_m <- ddply(mydf_m, .(variable), transform, ecd = ecdf(value)(value))
ggplot(mydf_m,aes(x = value, y = ecd)) +
geom_line(aes(group = variable, colour = variable))
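The key step is ecdf(value)(value): ecdf() returns a function, which is then evaluated at the same values it was built from. A plyr-free sketch of the same computation using base R's ave() follows; the data are generated here in long form directly, with column names chosen to match the melted data frame.

```r
library(ggplot2)

set.seed(1)
# Long-format stand-in for the melted data (same column names as above)
mydf_m <- data.frame(
  variable = rep(c("a", "b", "c"), each = 100),
  value    = c(rnorm(100, 0, 1), rnorm(100, 2, 1), rnorm(100, -2, 0.5))
)

# For each variable, build its ECDF and evaluate it at that variable's own values
mydf_m$ecd <- ave(mydf_m$value, mydf_m$variable,
                  FUN = function(v) ecdf(v)(v))

p <- ggplot(mydf_m, aes(x = value, y = ecd, colour = variable)) +
  geom_line()
p
```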
If you want a smooth estimate of the ECDF you could also use geom_smooth together with the function ns() from the splines package:
library(splines) # function ns()
ggplot(mydf_m, aes(x = value, y = ecd, group = variable, colour = variable)) +
geom_smooth(se = FALSE, formula = y ~ ns(x, 3), method = "lm")
As noted in a comment above, as of version 0.9.2.1, ggplot2 has a specific stat for this purpose: stat_ecdf. Using that, we'd just do something like this:
ggplot(mydf_m,aes(x = value)) + stat_ecdf(aes(colour = variable))
Based on Ramnath's approach, you can get the ECDF from ggplot2 by doing the following:
library(ggplot2)
library(reshape2) # function melt()
mydf = data.frame(
a = rnorm(100, 0, 1),
b = rnorm(100, 2, 1),
c = rnorm(100, -2, 0.5)
)
mydf_m = melt(mydf)
p0 = ggplot(mydf_m, aes(x = value)) +
stat_ecdf(aes(group = variable, colour = variable))
print(p0)
Here is one approach
library(ggplot2)
library(reshape2) # function melt()
mydf = data.frame(
a = rnorm(100, 0, 1),
b = rnorm(100, 2, 1),
c = rnorm(100, -2, 0.5)
)
mydf_m = melt(mydf)
p0 = ggplot(mydf_m, aes(x = value)) +
geom_density(aes(group = variable, colour = variable)) +
theme(legend.position = c(0.85, 0.85)) # opts() was removed from ggplot2; theme() replaces it