I have the following data frame resulting from simulations of ODEs with different parameter sets, e.g.
df <- data.frame(t = rep(seq(0,4), 4),
x1 = c(1.2*seq(1,5), 1.3*seq(1,5), 1.4*seq(1,5), 1.5*seq(1,5)),
x2 = c(0.2*seq(1,5), 0.3*seq(1,5), 0.4*seq(1,5), 0.5*seq(1,5)),
a = rep(c(rep(1, 5), rep(2,5)), 2),
b = c(rep(1, 10), rep(2,10))
)
I now would like to have a facet_grid with x1 and x2 on top and a and b on the right where the values of a and b determine the line colour.
I tried
df.1 <- df %>%
gather(x, xval, -t, -a, -b) %>%
gather(p, pval, -t, -x, -xval) %>%
distinct()
df.1$pval <- as.factor(df.1$pval)
ggplot(df.1, aes(t, xval)) +
geom_line(aes(colour = pval)) +
facet_grid(p~x)
and
dm.1 <- melt(df[, c("t", "x1", "x2")], id = 't')
colnames(dm.1) <- c("t", "x", "xval")
dm.2 <- melt(df[, c("t", "a", "b")], id = 't')
colnames(dm.2) <- c("t", "p", "pval")
dm <- merge(dm.1, dm.2)
dm$pval <- as.factor(dm$pval)
ggplot(dm, aes(t, xval)) +
geom_line(aes(colour = pval)) +
facet_grid(p~x)
But neither gives the desired result. Any hint would be greatly appreciated.
Edit: The desired result is two lines in each facet, similar to my first attempt but correct, i.e. straight lines rather than the zigzag lines I currently get.
The problem that's causing the zigzag plots you're getting is that there are multiple repeats of the same combinations of p and x, but there isn't a way to demarcate one from the other. So you get the first plot below:
library(tidyverse)
df <- data.frame(t = rep(seq(0,4), 4),
x1 = c(1.2*seq(1,5), 1.3*seq(1,5), 1.4*seq(1,5), 1.5*seq(1,5)),
x2 = c(0.2*seq(1,5), 0.3*seq(1,5), 0.4*seq(1,5), 0.5*seq(1,5)),
a = rep(c(rep(1, 5), rep(2,5)), 2),
b = c(rep(1, 10), rep(2,10))
)
df_long <- df %>%
gather(key = x, value = xval, x1, x2) %>%
gather(key = p, value = pval, a, b) %>%
mutate(pval = as.factor(pval))
df_long %>%
ggplot(aes(x = t, y = xval)) +
geom_line(aes(color = pval)) +
facet_grid(p ~ x)
You can see what is happening by filtering for a specific pair of x, p values. Within a facet, each colour (pval) contains several simulations, so the same values of t occur repeatedly and geom_line joins them into a single zigzag path instead of drawing distinct lines.
df_long %>%
filter(x == "x1", p == "a") %>%
head()
#> t x xval p pval
#> 1 0 x1 1.2 a 1
#> 2 1 x1 2.4 a 1
#> 3 2 x1 3.6 a 1
#> 4 3 x1 4.8 a 1
#> 5 4 x1 6.0 a 1
#> 6 0 x1 1.3 a 2
Instead, before gathering, you can make an ID for each combination of a and b, and use that as your grouping variable in aes. There are probably other ways to do this, but a simple one is just interaction(a, b), which will give IDs that look like 1.1, 1.2, 2.1, 2.2, etc. Then add group = id inside your aes to make separate lines.
df_long_id <- df %>%
mutate(id = interaction(a, b)) %>%
gather(key = x, value = xval, x1, x2) %>%
gather(key = p, value = pval, a, b) %>%
mutate(pval = as.factor(pval))
df_long_id %>%
ggplot(aes(x = t, y = xval, group = id)) +
geom_line(aes(color = pval)) +
facet_grid(p ~ x)
Created on 2018-05-09 by the reprex package (v0.2.0).
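To see why interaction(a, b) works as a grouping ID, here is a small base-R sketch that rebuilds the t, a and b columns from the question and inspects the resulting IDs. It does not draw anything. It only checks that each ID corresponds to exactly one simulation run.

```r
# Columns copied from the question's data frame
t <- rep(seq(0, 4), 4)
a <- rep(c(rep(1, 5), rep(2, 5)), 2)
b <- c(rep(1, 10), rep(2, 10))

# One ID per combination of a and b
id <- interaction(a, b)

levels(id)
#> [1] "1.1" "2.1" "1.2" "2.2"

# Each ID holds five rows ...
rows_per_id <- table(id)

# ... and no value of t is repeated within an ID, so each ID is one clean line
dup_t_per_id <- tapply(t, id, anyDuplicated)
```

Because every ID contains each value of t only once, using it as the group aesthetic gives geom_line one unambiguous path per simulation.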
I am trying to plot multiple paths in a gganimate plot. I want the lines to fade out over the last N frames (e.g. N=5 in this example).
The data look like this:
set.seed(27)
df <- data.frame(Frame = rep(1:10, 3),
id = factor(rep(1:3, each = 10)),
x = runif(30),
y = runif(30))
head(df)
Frame id x y
1 1 1 0.97175023 0.14257923
2 2 1 0.08375751 0.47864658
3 3 1 0.87386992 0.05182206
4 4 1 0.32923136 0.25514379
5 5 1 0.22227551 0.14262912
6 6 1 0.40164822 0.48288482
I tried to make the plot using shadow_mark, but this doesn't appear to have the lines fade out over time.
df %>%
ggplot(aes(x = x, y = y, group = id, color = id)) +
geom_path() +
geom_point()+
scale_color_manual(values=c("red","blue","green")) +
transition_reveal(along = Frame) +
shadow_mark(size = 0.75) +
theme_void()
This just produces the below:
Is there a way to make these lines fade? Ideally, I would just plot a rolling path of the last N frames.
Is this something like what you're looking for? Adapted from the post mentioned in the comments. You don't need to use transition_reveal() if you use geom_segment().
library(gganimate)
library(dplyr)
library(tidyr)
set.seed(27)
n <- 10
df <- data.frame(Frame = rep(1:n, 3),
id = factor(rep(1:3, each = n)),
x = runif(3*n),
y = runif(3*n))
newdf <- df %>%
uncount(n, .id = "newframe") %>%
filter(Frame <= newframe) %>%
arrange(newframe, Frame) %>%
group_by(newframe, id) %>%
mutate(x_lag = lag(x),
y_lag = lag(y),
tail = last(Frame) - Frame,
# Make the points solid for 1 frame then alpha 0
point_alpha = if_else(tail == 0, 1, 0),
# Make the lines fade out over 3 frames
segment_alpha = pmax(0, (3-tail)/3)) %>%
ungroup()
ggplot(newdf,
aes(x = x, y = y, xend = x_lag, yend = y_lag, group = Frame, color = id)) +
geom_segment(aes(alpha = segment_alpha)) +
geom_point(aes(alpha = point_alpha)) +
scale_alpha(range = c(0,1)) +
guides(alpha = "none") +
transition_manual(newframe) +
theme_void() +
scale_color_manual(values = c("red","blue","green"))
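The fade itself comes from the segment_alpha formula in the answer. As a minimal base-R sketch, here is that formula applied to a hypothetical vector of tail values (the number of frames since a segment was drawn), with the fade length pulled out into a variable named n_fade, which is an illustrative name and not part of the original code:

```r
# Number of frames over which a segment fades out (3 in the answer;
# set to 5 to match the N = 5 mentioned in the question)
n_fade <- 3

# Hypothetical ages of segments, in frames since they were drawn
tail <- 0:5

# Same formula as in the answer: linear fade, clamped at 0
segment_alpha <- pmax(0, (n_fade - tail) / n_fade)

# Newest segment is fully opaque, then 2/3, 1/3, and 0 from n_fade frames on
print(data.frame(tail, segment_alpha))
```

Changing n_fade is all that is needed to lengthen or shorten the visible trail.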
For example I have this dataset:
c1 c2
A 1
A 3
A 10
B 5
B 4
C 3
C 4
C 6
A 5
C 7
Is there a short way to plot, in one graph, the first third of the values of A, B and C, the second third, and the last third? Each variable would get 3 lines, so there would be 9 lines in total.
You could use group_split and lapply:
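library(tidyverse)
df <- data.frame(c1 = rep(LETTERS[1:3], 3), c2 = sample(1:10, size = 9, replace = TRUE))
df %>%
group_by(c1) %>%
mutate(num = 1:n()) %>%
ungroup() %>% # group_split() ignores its arguments on an already grouped data frame
group_split(num) -> plot_list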
lapply(plot_list, function(x) {
ggplot(x, aes(x = num, y = c2)) + geom_line()
})
Or you use facets:
df %>%
group_by(c1) %>%
mutate(num = 1:n()) %>%
ggplot() +
facet_grid(scales = "free", cols = vars(num)) +
geom_line(aes(x = c1, y = c2, group = num))
Recently I discovered the function geom_table(), from ggpmisc package, which allows you to put a table inside a plot. But I don't know how to put different tables into a grid plot.
I have this df and plot:
library(lubridate)
library(ggplot2)
library(ggpmisc)
library(tidyr) # provides pivot_longer() and the %>% pipe used below
Date <- c("2010-01-28", "2010-02-28", "2010-03-28",
"2010-04-28", "2010-05-28", "2010-06-28",
"2010-07-28", "2010-08-28", "2010-09-28",
"2010-10-28")
Date <- as_date(Date)
Country <- rep("Japan", 10)
A <- runif(10, min=30, max=90)
B <- runif(10, min = 1, max = 15)
df <- data.frame(Date, Country, A, B)
df %>% pivot_longer(-c(Date, Country)) %>%
ggplot(aes(x=Date,y=value,group=1,color=Country))+
geom_line(size = 0.9) +
facet_grid(name~Country, scales = "free", switch = "y")
I also have these two tables, tableA and tableB:
Time <- c("Today", "Yesterday", "One week ago")
Value_A <- 10:12
Value_B <- 1:3
tableA <- data.frame(Time, Value_A)
tableB <- data.frame(Time, Value_B)
How do I put tableA in the top graph and tableB in the bottom graph?
I appreciate it if someone can help :)
You need to create a little data frame that hosts your tableA and tableB in a list column:
d <- tibble(x = c(0.95, 0.95), y = c(0.95, 0.95),
name = c("A", "B"), tb = list(tableA, tableB))
df %>% pivot_longer(-c(Date, Country)) %>%
ggplot(aes(x=Date,y=value,group=1,color=Country))+
geom_line(size = 0.9) +
geom_table_npc(data = d, aes(npcx = x, npcy = y, label = tb)) +
facet_grid(name~Country, scales = "free", switch = "y")
I have spectral reflectance data like this:
library(hsdar)
library(tidyverse)
##Create some data
parameter <- data.frame(N = seq(1, 1.5, 0.05), LAI = seq(1,6,0.5))
spec <- PROSAIL(parameterList=parameter)
Then I calculated the first-order derivative of the data like this:
d1 <- derivative.speclib(spec)
I extracted the data frame from the d1 object using the following code:
d1_df <- d1@spectra@spectra_ma
d1_wav <- d1@wavelength
colnames(d1_df) <- d1_wav
#Plotting of the data
matplot(d1_wav,t(d1_df[1:11,]),type='l', xlim = c(660, 800), ylim=c(-0.01,+0.01), xlab='Wavelength /nm',ylab='Reflectance')
Then I subsetted far-red (680 to 700 nm) and NIR (725 to 760 nm) region like
d1_df %>% as.data.frame() %>%
setNames(paste0("WV_", names(.))) %>%
mutate(ID = seq.int(nrow(.))) %>%
select(281:301, ID) %>%
pivot_longer(cols = -ID) %>%
separate(name, c("chr", "wv"), convert = T)
d1_df %>% as.data.frame() %>%
setNames(paste0("WV_", names(.))) %>%
mutate(ID = seq.int(nrow(.))) %>%
select(326:361, ID) %>%
pivot_longer(cols = -ID) %>%
separate(name, c("chr", "wv"), convert = T)
Now, how can I fit two straight lines to the two regions and get the x value corresponding to the intersection of these two lines for each ID, as shown in the following figure?
There is not a single unique answer to this question, because there is not a unique reflectance line (each ID has its own reflectance line and therefore its own unique crossing point). If we take your subsetted data like this:
region_A <- d1_df %>% as.data.frame() %>%
setNames(paste0("WV_", names(.))) %>%
mutate(ID = seq.int(nrow(.))) %>%
select(290:301, ID) %>%
pivot_longer(cols = -ID) %>%
separate(name, c("chr", "wv"), convert = T) %>%
mutate(ID = factor(ID))
region_B <- d1_df %>% as.data.frame() %>%
setNames(paste0("WV_", names(.))) %>%
mutate(ID = seq.int(nrow(.))) %>%
select(332:350, ID) %>%
pivot_longer(cols = -ID) %>%
separate(name, c("chr", "wv"), convert = T) %>%
mutate(ID = factor(ID))
and plot it, we see:
p <- ggplot(region_A, aes(x = wv, y = value, group = ID)) +
geom_line() +
geom_line(data = region_B)
p
If we extrapolate these lines, we can see they cross at different wavelengths:
p <- p + geom_smooth(method = "lm", formula = y ~ x, fullrange = TRUE,
aes(colour = factor(ID)), se = FALSE) +
geom_smooth(method = "lm", formula = y ~ x, fullrange = TRUE,
data = region_B, aes(colour = factor(ID)), se = FALSE) +
coord_cartesian(ylim = c(0, 0.0125))
p
We can get a linear regression on each of these lines like this:
modA <- lm(value ~ wv * ID, data = region_A)
modB <- lm(value ~ wv * ID, data = region_B)
And we can define a function that returns 0 when the prediction from both models at a particular wavelength is the same like so:
meet_at <- function(X, ID)
{
A <- predict(modA, newdata = list(wv = X, ID = ID))
B <- predict(modB, newdata = list(wv = X, ID = ID))
abs(A - B)
}
This allows us to use the optimize function to find the crossing point of the two fitted lines for each ID and return a nice data frame of results as follows:
df <- do.call(rbind, lapply(unique(region_A$ID), function(i) {
wv <- optimize(meet_at, c(700, 740), ID = i)$minimum
value <- predict(modA, newdata = list(wv = wv, ID = i))
data.frame(wv, value, ID = as.character(i))
}))
df
#> wv value ID
#> 1 708.8861 0.004254394 1
#> 11 710.4923 0.005915650 2
#> 12 712.1372 0.007343448 3
#> 13 713.6095 0.008527553 4
#> 14 714.8414 0.009483770 5
#> 15 715.8220 0.010241372 6
#> 16 716.5676 0.010833544 7
#> 17 717.1078 0.011292029 8
#> 18 717.4764 0.011644701 9
#> 19 717.7071 0.011914912 10
#> 110 717.8309 0.012121712 11
We can check that these results are correct by drawing the points on our plot:
p + geom_vline(data = df, aes(xintercept = wv, colour = ID)) +
geom_point(data = df)
So the answer to your question is that the crossing point depends on the ID and lies at wavelengths between roughly 708 and 718 nm, with the exact values given in df.
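Since both fits are straight lines, the crossing point can also be written in closed form rather than found numerically: setting a0 + a1 * x equal to b0 + b1 * x gives x = (b0 - a0) / (a1 - b1). The following is a minimal base-R sketch of that alternative for a single ID. The data are synthetic and the slopes and offsets are made-up values for illustration, not taken from the PROSAIL output above.

```r
# Synthetic, noise-free data for one ID (hypothetical values)
wv_a <- 680:700                           # far-red region
val_a <- 0.0005 * (wv_a - 680) + 0.001
wv_b <- 725:760                           # NIR region
val_b <- -0.0002 * (wv_b - 725) + 0.012

# One straight-line fit per region
fit_a <- lm(val_a ~ wv_a)
fit_b <- lm(val_b ~ wv_b)
ca <- coef(fit_a)  # intercept and slope of the far-red line
cb <- coef(fit_b)  # intercept and slope of the NIR line

# Closed-form intersection of the two fitted lines
wv_cross <- (cb[[1]] - ca[[1]]) / (ca[[2]] - cb[[2]])
value_cross <- ca[[1]] + ca[[2]] * wv_cross

# Numerical search in the style of the answer, for comparison
gap <- function(x) abs((ca[[1]] + ca[[2]] * x) - (cb[[1]] + cb[[2]] * x))
wv_numeric <- optimize(gap, c(700, 740))$minimum

print(c(closed_form = wv_cross, numeric = wv_numeric))
```

The closed form does not depend on a search interval, so it still works when the lines cross outside the range one would have guessed. It is undefined when the two slopes are equal, i.e. when the lines are parallel.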
I am comparing the intra-group correlation between duplicate samples within a large gene expression experiment, where I have multiple separate biological groups - the idea being to see if any of the groups is much less well-correlated than the others, indicating a potential sample mixup or other error.
I am using ggplot to plot the expression values of each duplicate pair against each other. I would like to also be able to add the correlation coefficient and p-value to each panel of the plot, which I obtain through summarize and cor.test. You can use this code to get the general idea: in exp1, the duplicates are correlated, but not in exp2.
library(tidyverse)
df <- data.frame(exp=c(rep('exp1', 100), rep('exp2', 100)), a=rnorm(200, 1000, 200))
df <- mutate(df, b=ifelse(exp=='exp1', a*rnorm(100,1,0.05), rnorm(100, 1000, 200)))
head(df)
tail(df)
df %>% ggplot(aes(x=a, y=b))+
geom_point() +
facet_wrap(~exp)
group_by(df, exp) %>%
summarize(corr=cor.test(a,b)$estimate, pval=cor.test(a,b)$p.value)
This is the plot I generated via ggplot, and I've manually added the R and p-values that I got at the end. But of course, if I have a lot of sample pairs to analyze, it would be nice to be able to add these automatically from within the ggplot call. I'm just not sure how to do it.
If, for whatever reason, you want to build this yourself instead of using the ggpubr functions, you can create your summary data, format labels, and place the labels with geom_text.
I'm formatting the stats so that R has a fixed 3 significant digits and p has 3 digits, falling back on scientific notation. I changed the names of those columns in summarise to R and p to make the labels below. Reshaping to long data and creating a new column with unite gets this:
library(tidyverse)
...
group_by(df, exp) %>%
summarize(R = cor.test(a, b)$estimate, p = cor.test(a, b)$p.value) %>%
mutate(R = formatC(R, format = "fg", digits = 3),
p = formatC(p, format = "g", digits = 3)) %>%
gather(key = measure, value = value, -exp) %>%
unite("stat", measure, value, sep = " = ")
#> # A tibble: 4 x 2
#> exp stat
#> <chr> <chr>
#> 1 exp1 R = 0.965
#> 2 exp2 R = 0.0438
#> 3 exp1 p = 1.14e-58
#> 4 exp2 p = 0.665
Then for each of the groups, I want to collapse both labels, separated by a newline \n. This is a place that will scale well—you might have more summary stats to display, but this should still work.
summ <- group_by(df, exp) %>%
summarize(R = cor.test(a, b)$estimate, p = cor.test(a, b)$p.value) %>%
mutate(R = formatC(R, format = "fg", digits = 3),
p = formatC(p, format = "g", digits = 3)) %>%
gather(key = measure, value = value, -exp) %>%
unite("stat", measure, value, sep = " = ") %>%
group_by(exp) %>%
summarise(both_stats = paste(stat, collapse = "\n"))
summ
#> # A tibble: 2 x 2
#> exp both_stats
#> <chr> <chr>
#> 1 exp1 "R = 0.965\np = 1.14e-58"
#> 2 exp2 "R = 0.0438\np = 0.665"
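As a small base-R sketch of what the formatting and collapsing steps do to a single group, here are the same formatC and paste calls applied to hypothetical values of R and p (chosen to resemble the exp1 row above, not computed from the data):

```r
# Hypothetical correlation estimate and p-value for one group
R <- 0.96512
p <- 1.1402e-58

# "fg" keeps 3 significant digits in fixed notation;
# "g" keeps 3 significant digits and switches to scientific notation when needed
R_txt <- formatC(R, format = "fg", digits = 3)
p_txt <- formatC(p, format = "g", digits = 3)

# Build "name = value" strings and join them with a newline
stats <- paste(c("R", "p"), c(R_txt, p_txt), sep = " = ")
label <- paste(stats, collapse = "\n")

cat(label, "\n")
```

Adding another statistic only means adding one more element to the two vectors passed to paste, which is why this step scales well.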
In geom_text, I'm setting the x coordinate to -Inf, which pins the label to the left edge of the panel, and the y coordinate to Inf, which pins it to the top edge. That puts the label in the top-left corner, regardless of the values in the data.
The one thing I don't like here is then hacking the hjust and vjust outside their intended ranges of 0 to 1. But nudge_x/nudge_y won't do anything because of the values being set to infinity.
df %>%
ggplot(aes(x = a, y = b)) +
geom_point() +
geom_text(aes(x = -Inf, y = Inf, label = both_stats), data = summ,
hjust = -0.1, vjust = 1.1, lineheight = 1) +
facet_wrap(~ exp)
Created on 2018-11-14 by the reprex package (v0.2.1)
We can use the stat_cor function from the ggpubr package.
set.seed(123)
library(dplyr)
library(ggplot2)
library(ggpubr)
df <- data.frame(exp=c(rep('exp1', 100), rep('exp2', 100)), a=rnorm(200, 1000, 200))
df <- mutate(df, b=ifelse(exp=='exp1', a*rnorm(100,1,0.05), rnorm(100, 1000, 200)))
ggplot(df, aes(x=a, y=b))+
geom_point() +
facet_wrap(~exp) +
stat_cor(method = "pearson")
Similar to camille's answer, but you can do it all in one pipeline:
library(tidyverse)
set.seed(123)
df %>%
group_by(exp) %>%
mutate(p = cor.test(a, b)$p.value,
rho = cor.test(a, b)$estimate) %>%
mutate_at(vars(p, rho), signif, 2) %>%
ggplot(aes(x=a, y=b)) +
geom_point() +
geom_text(data = . %>% distinct(p, rho, exp),
aes(x = -Inf, y = Inf,label = paste("p=",p,"\nrho=",rho)),
hjust = -0.1, vjust = 1.1, lineheight = 1) +
facet_wrap(~exp)