Shading different regions of the graph based on time period - r

I am creating a graph using ggplot2 that takes dates on the x-axis (e.g. 1000 years ago) and probabilities on the y-axis. I would like to distinguish different time periods by shading regions of the graph in different colors. I stored the dates as follows:
paleo.dates <- c(c(13500,8000), c(13500,10050) ,c(10050,9015),
c(9015,8000), c(8000,2500), c(8000,5500), c(5500,3500), c(3500,2500),
c(2500,1150), c(2500,2000), c(2000,1500), c(1500,1150), c(1150,500))
I would like to take a time period, say 13500 to 8000, and color code it until it overlaps with another date, such as the third entry.
I am using the ggplot2 cheatsheet, and I attempted to use aes(fill = paleo.dates), but this does not work because the vector is not the same length as my dataset. I was also thinking of using + geom_rect() to fill the areas manually, but that does not seem very elegant, and I am not sure it will even work.
Any advice is appreciated, thank you.

You just need to assign each row of the data to a period. In this case I built a helper vector of period indices and converted it to a factor so it can be used for the fill.
library(dplyr)
library(ggplot2)
# Example data: one random probability per 100-year step
df <- data.frame(paleo.dates = seq(500, 13000, 100),
                 p = runif(n = length(seq(500, 13000, 100)), 0, 1))
# Period index: every 5 consecutive rows (500 years) share one period
sub <- data.frame(sub = rep(1:(13000/500), each = 5))
# Trim the index to the number of rows in df
sub <- sub %>%
  dplyr::slice(1:nrow(df))
df <- df %>%
  dplyr::mutate(period = sub$sub,
                period = as.factor(period))
# Bars are filled and outlined by period
ggplot2::ggplot(df) +
  geom_bar(aes(x = paleo.dates, y = p,
               fill = period,
               col = period),
           show.legend = FALSE, stat = "identity") +
  theme_bw()
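The geom_rect() idea from the question can also work. Below is a minimal sketch, assuming the finer-grained date pairs from the question (13500 to 10050, 10050 to 9015, and so on) are the periods to shade. The names `periods` and `years_ago` and the random probabilities are illustrative and not part of the original post.

```r
library(ggplot2)

# Assumed non-overlapping periods, taken from the finer-grained pairs in the question
periods <- data.frame(
  start = c(13500, 10050, 9015, 8000, 5500, 3500, 2500, 2000, 1500, 1150),
  end   = c(10050, 9015, 8000, 5500, 3500, 2500, 2000, 1500, 1150, 500)
)
periods$period <- factor(seq_len(nrow(periods)))

# Illustrative data: one random probability per 100-year step
set.seed(1)
df <- data.frame(years_ago = seq(500, 13500, 100))
df$p <- runif(nrow(df))

p <- ggplot() +
  # One shaded rectangle per period, spanning the full height of the panel
  geom_rect(data = periods,
            aes(xmin = end, xmax = start, ymin = -Inf, ymax = Inf, fill = period),
            alpha = 0.3) +
  geom_line(data = df, aes(x = years_ago, y = p)) +
  scale_x_reverse() +  # oldest dates on the left
  theme_bw()
p
```

Because the rectangles come from their own data frame, the length mismatch with the plotted data that broke aes(fill = paleo.dates) does not arise.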

Related

Plot relative abundance through time

I would like to plot a number of symmetric bars like these two, in which the width of the bar corresponds to the relative abundance of the variable through time. I could not find anything similar in R; any help is appreciated.
Are you looking for a violin plot?
As per your comment, the violin plot is not what you are after.
There are two approximate solutions, neither of them ideal but they get you a bit further:
library(dplyr)
library(tibble)
library(ggplot2)
set.seed(123)
data <- tibble(
Date = seq.Date(from = as.Date("2020/01/01"), length = 50, by = "day"),
Value = runif(50, min = 0, max = 10)
)
data <- data %>%
mutate(Value_plus = Value,
Value_min = -Value)
# Option 1: mirrored step lines (steps, but no fill)
p <- ggplot(data = data) +
  geom_step(aes(x = Date, y = Value_plus)) +
  geom_step(aes(x = Date, y = Value_min))
p
# Option 2: ribbon between the mirrored values (fill, but no steps)
p <- ggplot(data = data) +
  geom_ribbon(aes(x = Date, ymin = Value_min, ymax = Value_plus))
p
The first plot has the steps from your example, but filling a geom_step appears to be non-trivial. The second plot, using geom_ribbon, gives you a fill but not the steps. There are several worked examples elsewhere of how to get a filled step plot.
Using geom_step:
Using geom_ribbon:
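One possible way to get both the steps and the fill is to draw each time interval as its own rectangle. This is a minimal sketch, not the approach from the linked solutions; the `Date_end` column is a helper introduced here, and the last value is assumed to last one extra day.

```r
library(ggplot2)

set.seed(123)
data <- data.frame(
  Date = seq.Date(from = as.Date("2020-01-01"), length.out = 50, by = "day"),
  Value = runif(50, min = 0, max = 10)
)

# Each value is held until the next date; the last row is extended by one day (assumption)
data$Date_end <- c(data$Date[-1], data$Date[nrow(data)] + 1)

p <- ggplot(data) +
  # A symmetric rectangle per interval gives a filled, stepped outline
  geom_rect(aes(xmin = Date, xmax = Date_end, ymin = -Value, ymax = Value),
            fill = "red")
p
```

The rectangles touch edge to edge, so the result reads as one filled shape whose width follows Value through time.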

R ggplot2 Specify separate color gradients by group

I'm trying to make separate color gradients for grouped data that is displayed on the same scatterplot. I've included sample data below. User is unique user IDs, task is unique task IDs, days_completion is the time in days when the task was completed, task_group is the group indicator that the tasks are grouped into, and task_order is the order in which the tasks were made available for users to complete. Each row represents the time that the user completed a specific task. The task_order may not logically follow this organization as it was randomly generated, but it should suffice for demonstration.
The resulting plot would have days_completion on the x axis and user on the y axis, and each point from geom_point would represent the time in days at which the user completed their task. Each task group would have its own color, in a gradient from dark to light by task_order. For example, task group 1 would be dark red at task_order == 1 and light red at task_order == 7.
Sample code is below:
library(dplyr)
library(forcats)
library(ggplot2)
test_data <- tibble(user = rep(seq(1:50), 10) %>%
as_factor(),
task = sample(1:10, 500, replace = TRUE) %>%
as_factor(),
days_completion = sample(1:500, 500, replace = FALSE),
task_group = sample(1:3, 500, replace = TRUE) %>%
as_factor(),
task_order = sample(1:7, 500, replace = TRUE, prob = c(rep(.25,3),.2,.2,.1,.1)) %>%
as_factor()) %>%
arrange(days_completion)
#Sample plotting approach; does not work
test_plot <- test_data %>%
ggplot(aes(x = days_completion, y = user, color = task)) +
geom_point() +
#This seems to be what I need, but I can't figure out how to specify multiple gradients by task_group
scale_color_gradient()
I know I could manually order the factors and map colors with hex codes, but I'd like something that can scale and avoid the manual process. Also, if anyone has any suggestions for how to display this plot other than a scatterplot, I'm open to suggestions. The main idea is to detect patterns in completion time in trends displayed by the color. The trends may not show due to it being randomly generated data, but that's okay.
My coworker found a solution in another post that requires an additional package called ggnewscale. I still don't know if this can be done only with ggplot2, but this works. I'm still open to alternative plotting suggestions though. The purpose is to detect any trends in day of completion across and within users. Across users is where I expect to see more of a trend, but within could be informative too.
How merge two different scale color gradient with ggplot
library(ggnewscale)
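# task_order is a factor in test_data; a gradient scale needs numeric values
test_data_num <- test_data %>% mutate(task_order = as.integer(as.character(task_order)))
dat1 <- test_data_num %>% filter(task_group == 1)
dat2 <- test_data_num %>% filter(task_group == 2)
dat3 <- test_data_num %>% filter(task_group == 3)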
ggplot(mapping = aes(x = days_completion, y = user)) +
geom_point(data = dat1, aes(color = task_order)) +
scale_color_gradientn(colors = c('#99000d', '#fee5d9')) +
new_scale_color() +
geom_point(data = dat2, aes(color = task_order)) +
scale_color_gradientn(colors = c('#084594', '#4292c6')) +
new_scale_color() +
geom_point(data = dat3, aes(color = task_order)) +
scale_color_gradientn(colors = c('#238b45'))
You can generate your own color scale with colorRampPalette (which comes from grDevices in base R, so RColorBrewer is not strictly required here) and pass it to scale_color_manual:
library(RColorBrewer)
colo <- colorRampPalette(c("darkred", "orangered"))(10)
library(ggplot2)
ggplot(test_data, aes(x = days_completion, y = user))+
geom_point(aes(color = task))+
scale_color_manual(values = colo)
Regarding a representation other than a scatterplot, it is difficult to propose something else; it will depend on your original data and the question you are trying to answer. Do you need to see the pattern per user, or are your 50 users just replicates of your experiment? In that case, geom_density could be helpful. Otherwise, you could take a look at the stat_contour function.
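As for doing this with ggplot2 alone, one possible route is to compute a color per row and map it with scale_color_identity(). This is a minimal sketch on a small toy data set; the hex endpoints, the `ramps` list and the `point_col` column are assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)
# Toy data with the same columns as the question, kept numeric for indexing
toy <- data.frame(
  user = factor(rep(1:10, 6)),
  days_completion = sample(1:500, 60),
  task_group = sample(1:3, 60, replace = TRUE),
  task_order = sample(1:7, 60, replace = TRUE)
)

# One dark-to-light ramp of 7 colors per task group (endpoints are assumptions)
ramps <- list(
  colorRampPalette(c("#99000d", "#fcbba1"))(7),
  colorRampPalette(c("#084594", "#c6dbef"))(7),
  colorRampPalette(c("#005a32", "#c7e9c0"))(7)
)

# Look up each row's color by its group (which ramp) and order (position in the ramp)
toy$point_col <- mapply(function(g, o) ramps[[g]][o], toy$task_group, toy$task_order)

p <- ggplot(toy, aes(x = days_completion, y = user, color = point_col)) +
  geom_point() +
  scale_color_identity()  # use the stored hex values as-is
p
```

The trade-off is that scale_color_identity() draws no legend by default, so this scales to any number of groups but leaves the color key to be documented separately.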

Time series data using ggplot: how to use a different color for each time point and also connect with lines the data belonging to each subject?

I have data from several cells which I tested in several conditions: a few times before and also a few times after treatment. In ggplot, I use color to indicate different times of testing.
Additionally, I would like to connect with lines all data points which belong to the same cell. Is that possible?...
Here is my example data (https://www.dropbox.com/s/eqvgm4yu6epijgm/df.csv?dl=0) and a simplified code for the plot:
df$condition = as.factor(df$condition)
df$cell = as.factor(df$cell)
df$condition <- factor(df$condition, levels = c("before1", "before2", "after1", "after2", "after3"))
windows(width=8,height=5)
ggplot(df, aes(x=condition, y=test_variable, color=condition)) +
labs(title="", x = "Condition", y = "test_variable", color="Condition") +
geom_point(aes(color=condition),size=2,shape=17, position = position_jitter(w = 0.1, h = 0))
I think your code is heading in the wrong direction: you should instead group the points by the Cell column (the color can stay mapped to condition, as in the code below). Then, if I understand correctly, you want to see how the variable evolves for each cell before and after treatment, so you can order the x variable using scale_x_discrete.
Altogether, you can do something like that:
library(ggplot2)
ggplot(df, aes(x = condition, y = variable, group = Cell)) +
geom_point(aes(color = condition))+
geom_line(aes(color = condition))+
scale_x_discrete(limits = c("before1","before2","after1","after2","after3"))
Does it look like what you are expecting?
Data
df = data.frame(Cell = c(rep("13a",5),rep("1b",5)),
condition = rep(c("before1","before2","after1","after2","after3"),2),
variable = c(58,55,36,29,53,57,53,54,52,52))
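A variant of the same idea, sketched under the assumption that the lines only need to show which points belong together: draw the per-cell lines in a neutral grey and keep the condition colors for the points. It reuses the sample data above and sets the order through factor levels instead of scale_x_discrete.

```r
library(ggplot2)

df <- data.frame(Cell = c(rep("13a", 5), rep("1b", 5)),
                 condition = rep(c("before1", "before2", "after1", "after2", "after3"), 2),
                 variable = c(58, 55, 36, 29, 53, 57, 53, 54, 52, 52))

# Fix the x-axis order through the factor levels
df$condition <- factor(df$condition,
                       levels = c("before1", "before2", "after1", "after2", "after3"))

p <- ggplot(df, aes(x = condition, y = variable)) +
  # group = Cell makes one line per cell; a constant color keeps the lines neutral
  geom_line(aes(group = Cell), color = "grey60") +
  # points keep the per-condition color from the question
  geom_point(aes(color = condition), size = 2, shape = 17)
p
```

The lines are drawn first so the colored points sit on top of them.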

How to plot the mean of a single factor in a barplot with

I'm having trouble creating a figure with ggplot2.
In this plot, I'm using geom_bar to plot three factors. I mean, for each "time" and "dose" I'm plotting two bars (two genotypes).
To be more specific, this is what I mean:
This is my code so far (I actually changed some settings, but I'm presenting just what is needed):
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
geom_bar(stat="identity", position="dodge")+
scale_fill_grey(start=0.3, end=0.6, name="Genotype")
Question: I want to add the mean for each time as points, placed in the middle of the bars for that time. How can I proceed?
I tried to add these points using geom_dotplot and geom_point but I did not succeed.
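library(dplyr)
library(ggplot2)
# Mean of b per time, stored under an explicit column name
time_data <- data %>% group_by(time) %>% summarize(mean_b = mean(b))
data <- inner_join(data, time_data, by = "time")
This gives you the data with the means attached. Now make the plot:
ggplot(data=data, aes(x=interaction(dose,time), y=b, fill=factor(genotype)))+
  geom_bar(stat="identity", position="dodge")+
  scale_fill_grey(start=0.3, end=0.6, name="Genotype")+
  # geom_text needs a label aesthetic; each bar is labelled with the mean for its time
  geom_text(aes(label = round(mean_b, 1)), position = position_dodge(width = 0.9), vjust = 0)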
You might need to fiddle around with the argument hjust and vjust in the geom_text statement. Maybe the aes one too, I didn't run the program so I don't know.
It generally helps if you can give a reproducible example. Here, I made some of my own data.
sampleData <-
data.frame(
dose = 1:3
, time = rep(1:3, each = 3)
, genotype = rep(c("AA","aa"), each = 9)
, b = rnorm(18, 20, 5)
)
You need to calculate the means somewhere, and I chose to do that on the fly. Note that, instead of using points, I used a line to show that the mean is for all of those values. I also sorted somewhat differently, and used facet_wrap to cluster things together. Points would be a fair bit harder to place, particularly when using position_dodge, but you could likely modify this code to accomplish that.
ggplot(
sampleData
, aes(x = dose
, y = b
, fill = genotype)
) +
geom_bar(position = "dodge", stat = "identity") +
geom_hline(data =
sampleData %>%
group_by(time) %>%
summarise(meanB = mean(b)
, dose = NA, genotype = NA)
, aes(yintercept = meanB)
, col = "black"
) +
facet_wrap(~time)
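If points are preferred over a line, the faceted layout makes them easier to place, since each panel is one time and its horizontal middle is the middle dose. A minimal sketch along those lines; `timeMeans` is a helper name introduced here, and set.seed is added only to make the example repeatable.

```r
library(ggplot2)

set.seed(1)
sampleData <- data.frame(
  dose = 1:3,
  time = rep(1:3, each = 3),
  genotype = rep(c("AA", "aa"), each = 9),
  b = rnorm(18, 20, 5)
)

# One mean per time, placed at the middle of the dose axis (dose = 2 here)
timeMeans <- aggregate(b ~ time, data = sampleData, FUN = mean)
timeMeans$dose <- mean(unique(sampleData$dose))

p <- ggplot(sampleData, aes(x = dose, y = b)) +
  geom_col(aes(fill = genotype), position = "dodge") +
  # The point layer uses its own data, so it draws one point per facet
  geom_point(data = timeMeans, size = 3) +
  facet_wrap(~time)
p
```

Mapping fill only inside geom_col keeps the point layer from needing a genotype column.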

ggplot2 graphic with several x variable?

I need help with an R graphics issue in ggplot2.
Let's take an example:
date <- c("oct", "dec")
min.national <- c(17, 20)
min.international <- c(11, 12)
min.roaming <- c(5, 7)
mb.national <- c(115, 150)
mb.international <- c(72, 75)
mb.roaming <- c(30, 40)
df <- data.frame(min.national, min.international, min.roaming, mb.national, mb.international, mb.roaming)
What I want is two graphics side by side: one for the minutes and one for the megabytes. Each should show bars for the three variables (for example, minutes in national, international and roaming) on the same graphic, with fill = date.
Is that clear?
Thanks
I appreciate there may be a language challenge here, and it sounds like you're just getting started with ggplot2 and aren't sure where to begin, so I hope you find this useful.
It makes sense to treat the minutes and mb separately; they're different units. So I'll just use the minutes as an example. What I understand you're trying to achieve is easy with the right approach and the tidyr library.
library(tidyr)
library(ggplot2)
#first get your data in a data frame
min.df <- data.frame(national = min.national, international = min.international, roaming = min.roaming, month = date)
#now use the tidyr function to create a long data frame, you should recognize that this gives you a data structure readily suited to what you want to plot
min.df.long <- gather(min.df, "region", "minutes", 1:3)
ggplot(min.df.long) + geom_bar(aes(x = region, y = minutes, fill = month), stat = "identity")
If you want the months side by side, as I understand your question, then you could do:
ggplot(min.df.long) + geom_bar(aes(x = region, y = minutes, fill = factor(month, levels = c("oct", "dec"))), position = "dodge", stat = "identity") + labs(fill = "month")
The key parameter is the position keyword, the rest is just to make it neater.
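library(reshape2)  # provides melt()
library(tidyr)
library(ggplot2)
df <- data.frame(date, min.national, min.international, min.roaming, mb.national, mb.international, mb.roaming)
# Reshape to long format with date as the id column, then split names such as "min.national" into unit and type
df.stk <- tidyr::separate(melt(df, id.vars = "date"), col="variable", into=c("min_byte", "type"), sep="\\.")
# Stacked bars per type, with one panel for minutes and one for megabytes
plt <- ggplot(df.stk, aes(type, value, fill = date)) +
  geom_bar(stat = "identity") +
  facet_grid(.~min_byte)
print(plt)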
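The same reshaping can probably be done with tidyr alone, using pivot_longer in place of gather or melt. A minimal sketch, assuming the column names from the question; the `unit`, `type` and `df.long` names are introduced here for illustration.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  date = c("oct", "dec"),
  min.national = c(17, 20), min.international = c(11, 12), min.roaming = c(5, 7),
  mb.national = c(115, 150), mb.international = c(72, 75), mb.roaming = c(30, 40)
)

# Split names such as "min.national" into a unit ("min"/"mb") and a type
df.long <- pivot_longer(df, cols = -date,
                        names_to = c("unit", "type"), names_sep = "\\.",
                        values_to = "value")
# Keep the months in calendar order rather than alphabetical order
df.long$date <- factor(df.long$date, levels = c("oct", "dec"))

p <- ggplot(df.long, aes(x = type, y = value, fill = date)) +
  geom_col(position = "dodge") +
  # Minutes and megabytes are different units, so each panel gets its own y scale
  facet_wrap(~unit, scales = "free_y")
p
```

Using free y scales keeps the minutes panel readable next to the much larger megabyte values.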
