Plot difference over time in R - r

Consider the following data:
set.seed(4235)
dates <- c("2016-01-01", "2015-01-01", "2014-01-01", "2013-01-01")
small <- data.frame(group = "small", n1 = rnorm(4), dates = as.Date(dates))
medium <- data.frame(group = "medium", n1 = rnorm(4), dates = as.Date(dates))
large <- data.frame(group = "large", n1 = rnorm(4), dates = as.Date(dates))
data <- rbind(small, medium, large)
This is pretty basic data that can be plotted like this:
ggplot(data, aes(dates, col = group)) +
geom_line(aes(y = n1))
However, imagine that I want to plot the small and medium group against the large group. In other words the difference between the small and medium group and the large. In other words the large group should be represented by a straight line around zero and the other groups should represent the difference. Something like a autocovariance plot.
Any idea on how to do this with ggplot?

You probably can't do it with ggplot directly, though it is relatively straightforward to calculate the differences first, then pass them back to ggplot.
Here, I am using tidyr and dplyr to do the manipulations. First, I spread the data to get the groups in their own columns (with one row per date) to allow comparison. Then, I mutate to create the difference variables of interest. Finally, I gather the comparisons back into long form (be warned that this duplicates the entries in small, medium, and large; however, those can be dropped with select if needed). Then, simply pass the result to ggplot and plot however you desire (here, simple lines again).
data %>%
spread(group, n1) %>%
mutate(large - medium
, large - small) %>%
gather(Comparison, Difference, `large - medium`, `large - small`) %>%
ggplot(aes(x = dates
, y = Difference
, col = Comparison)) +
geom_line()
gives:

This should also work:
library(reshape2)
df <- dcast(data, dates~group, value.var = 'n1')
df$diff.small <- df$small - df$large
df$diff.medium <- df$medium - df$large
df$large <- 0
data <- subset(melt(df, id='dates'), variable %in% c('diff.small', 'diff.medium', 'large'))
ggplot(data, aes(dates, fill = variable, col = variable)) +
geom_ribbon(aes(ymax=value, ymin=0), alpha=0.2) +
geom_line(aes(y = value))

Related

Setting per-column y axis limits with facet_grid

I am, in R and using ggplot2, plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial sample (using long data suitable for plotting) is this:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
which gives
In this example, the first variable -- shown in the left column -- has unlimited range, while the second variable -- shown in the right column -- is weakly positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns in this plot, i.e. set Y axis limits separately for the two variables plotted. However, in order to allow for easy visual comparison of the different groups for each of the two variables, I would also like to have the identical Y axes within each column.
I've looked at the scales option to facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_x" allows the Y axes to vary across rows, while
passing scales = "free_y" allows the X axes to vary across columns, but
there is no option to allow the Y axes to vary across columns (nor, presumably, the X axes across rows).
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest would to create a plot per facet column and bind them with something like {patchwork}. To get the facet look, you can still add a faceting layer.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
group_split(groups1) %>%
map({
~ggplot(.x, aes(x = x, y = values)) +
geom_line() +
facet_grid(groups2 ~ groups1)
}) %>%
wrap_plots()
Created on 2023-01-11 with reprex v2.0.2

Bins for fixed interval in Longitudinal data and plotting it over the period of time by categories

It is longitudinal data; ID wise values are repeating 4 times in every tick of 20 steps. Then this experiments repeats. For the datafarme below I want bins based for every tick time steps for the categories of land based on the values of X.
Bins can be 3 for every time interval for land type (Small, medium and large) each.
I want to see timeline of bins of X based on categories of Land.
Any help will be appreciated. I have added possibly a picture of how data may look like for ggplot and plot as bins or dots may look like as in picture.
Seed(123)
ID = 1:5
Time = rep (c(1,2,3,4,5), each = 20)
Type = 1:25
data <- data.frame( IDn = rep(ID,20), Time, Land = rep(Type, 40), y = rnorm(100,0,1), x = runif(100,0,1))
data$Land= ifelse (data$Land > 15,"large farmers", ifelse(data$Land <=5, "small farmers", "medium-farmers"))
Edit: Question for labeling the faceting variable and dot plots.
Maybe something like this would help -
library(dplyr)
library(ggplot2)
data %>%
group_by(Time, Land) %>%
mutate(x = cut(x, c(0, 0.25, 0.75, 1))) %>%
ungroup %>%
count(Time, Land, x) %>%
ggplot() + aes(Time, n, fill = Land) + geom_col(position = 'dodge')

How to plot % positive cases (y-axis) by collection date (x-axis) and by other factors (R)?

Please help!
I have case data I need to prepare for a report soon and just cannot get the graphs to display properly.
From a dataset with CollectionDate as the "record" of cases (i.e. multiple rows with the same date means more cases that day), I want to display Number of positive cases/total (positive + negative) cases for that day as a percent on the y-axis, with collection dates along the x-axis. Then I want to break down by region. Goal is to look like this but in terms of daily positives/# of tests rather than just positives vs negatives. I also want to add a horizontal line on every graph at 20%.
I have tried manipulating it before, in and after ggplot:
ggplot(df_final, aes(x =CollectionDate, fill = TestResult)) +
geom_bar(aes(y=..prop..)) +
scale_y_continuous(labels=percent_format())
Which is, again, close. But the percents are wrong because they are just taking the proportion of that day against counts of all days instead of per day.
Then I tried using tally()in the following command to try and count per region and aggregate:
df_final %>%
group_by(CollectionDate, Region, as.factor(TestResult)) %>%
filter(TestResult == "Positive") %>%
tally()
and I still cannot get the graphs right.
Suggestions?
A quick look at my data:
head(df_final)
Well, I have to say that I am not 100% sure that I got what you want, but anyway, this can be helpful.
The data: Since you are new here, I have to let you know that using a simple and reproducible version of your data will make it easier to the rest of us to answer. To do this you can simulate a data frame o any other objec, or use dput function on it.
library(ggplot2)
library(dplyr)
data <- data.frame(
# date
CollectionDate = sample(
seq(as.Date("2020-01-01"), by = "day", length.out = 15),
size = 120, replace = TRUE),
# result
TestResult = sample(c("Positive", "Negative"), size = 120, replace = TRUE),
# region
Region = sample(c("Region 1", "Region2"), size = 120, replace = TRUE)
)
With this data, you can do ass follow to get the plots you want.
# General plot, positive cases proportion
data %>%
count(CollectionDate, TestResult, name = "cases") %>%
group_by(CollectionDate) %>%
summarise(positive_pro = sum(cases[TestResult == "Positive"])/sum(cases)) %>%
ggplot(aes(x = CollectionDate, y = positive_pro)) +
geom_col() +
geom_hline(yintercept = 0.2)
# positive proportion by day within region
data %>%
count(CollectionDate, TestResult, Region, name = "cases") %>%
group_by(CollectionDate, Region) %>%
summarise(
positive_pro = sum(cases[TestResult == "Positive"])/sum(cases)
) %>%
ggplot(aes(x = CollectionDate, y = positive_pro)) +
geom_col() +
# horizontal line at 20%
geom_hline(yintercept = 0.2) +
facet_wrap(~Region)
I can get you halfway there (refer to the comments in the code for clarifications). This code is for the counts per day per region (plotted separately for each region). I think you can tweak things further to calculate the counts per day per county too; and whole state should be a cakewalk. I wish you good luck with your report.
rm(list = ls())
library(dplyr)
library(magrittr)
library(ggplot2)
library(scales)
library(tidyr) #Needed for the spread() function
#Dummy data
set.seed(1984)
sdate <- as.Date('2000-03-09')
edate <- as.Date('2000-05-18')
dateslist <- as.Date(sample(as.numeric(sdate): as.numeric(edate), 10000, replace = TRUE), origin = '1970-01-01')
df_final <- data.frame(Region = rep_len(1:9, 10000),
CollectionDate = dateslist,
TestResult = sample(c("Positive", "Negative"), 10000, replace = TRUE))
#First tally the positve and negative cases
#by Region, CollectionDate, TestResult in that order
df_final %<>%
group_by(Region, CollectionDate, TestResult) %>%
tally()
#Then
#First spread the counts (in n)
#That is, create separate columns for Negative and Positive cases
#for each Region-CollectionDate combination
#Then calculate their proportions (as shown)
#Now you have Negative and Positive
#percentages by CollectionDate by Region
df_final %<>%
spread(key = TestResult, value = n) %>%
mutate(Negative = Negative/(Negative + Positive),
Positive = Positive/(Negative + Positive))
#Plotting this now
#Since the percentages are available already
#Use geom_col() instead of geom_bar()
df_final %>% ggplot() +
geom_col(aes(x = CollectionDate, y = Positive, fill = "Positive"),
position = "identity", alpha = 0.4) +
geom_col(aes(x = CollectionDate, y = Negative, fill = "Negative"),
position = "identity", alpha = 0.4) +
facet_wrap(~ Region, nrow = 3, ncol = 3)
This yields:

ggplot2 - Two color series in area chart

I've got a question regarding an edge case with ggplot2 in R.
They don't like you adding multiple legends, but I think this is a valid use case.
I've got a large economic dataset with the following variables.
year = year of observation
input_type = *labor* or *supply chain*
input_desc = specific type of labor (eg. plumbers OR building supplies respectively)
value = percentage of industry spending
And I'm building an area chart over approximately 15 years. There are 39 different input descriptions and so I'd like the user to see the two major components (internal employee spending OR outsourcing/supply spending)in two major color brackets (say green and blue), but ggplot won't let me group my colors in that way.
Here are a few things I tried.
Junk code to reproduce
spec_trend_pie<- data.frame("year"=c(2006,2006,2006,2006,2007,2007,2007,2007,2008,2008,2008,2008),
"input_type" = c("labor", "labor", "supply", "supply", "labor", "labor","supply","supply","labor","labor","supply","supply"),
"input_desc" = c("plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck"),
"value" = c(1,2,3,4,4,3,2,1,1,2,3,4))
spec_broad <- ggplot(data = spec_trend_pie, aes(y = value, x = year, group = input_type, fill = input_desc)) + geom_area()
Which gave me
Error in f(...) : Aesthetics can not vary with a ribbon
And then I tried this
sff4 <- ggplot() +
geom_area(data=subset(spec_trend_pie, input_type="labor"), aes(y=value, x=variable, group=input_type, fill= input_desc)) +
geom_area(data=subset(spec_trend_pie, input_type="supply_chain"), aes(y=value, x=variable, group=input_type, fill= input_desc))
Which gave me this image...so closer...but not quite there.
To give you an idea of what is desired, here's an example of something I was able to do in GoogleSheets a long time ago.
It's a bit of a hack but forcats might help you out. I did a similar post earlier this week:
How to factor sub group by category?
First some base data
set.seed(123)
raw_data <-
tibble(
x = rep(1:20, each = 6),
rand = sample(1:120, 120) * (x/20),
group = rep(letters[1:6], times = 20),
cat = ifelse(group %in% letters[1:3], "group 1", "group 2")
) %>%
group_by(group) %>%
mutate(y = cumsum(rand)) %>%
ungroup()
Now, use factor levels to create gradients within colors
df <-
raw_data %>%
# create factors for group and category
mutate(
group = fct_reorder(group, y, max),
cat = fct_reorder(cat, y, max) # ordering in the stack
) %>%
arrange(cat, group) %>%
mutate(
group = fct_inorder(group), # takes the category into account first
group_fct = as.integer(group), # factor as integer
hue = as.integer(cat)*(360/n_distinct(cat)), # base hue values
light_base = 1-(group_fct)/(n_distinct(group)+2), # trust me
light = floor(light_base * 100) # new L value for hcl()
) %>%
mutate(hex = hcl(h = hue, l = light))
Create a lookup table for scale_fill_manual()
area_colors <-
df %>%
distinct(group, hex)
Lastly, make your plot
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "stack") +
scale_fill_manual(
values = area_colors$hex,
labels = area_colors$group
)

Plotting multiple box plots as a single graph in R

I am trying to plot multiple box plots as a single graph. The data is where I have done a wilcoxon test. It should be like this
I have four/five questions and I want to plot the respondent score for two sets as a box plot. This should be done for all questions (Two groups for each question).
I am thinking of using ggplot2. My data is like
q1o <- c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4)
q1s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4)
q2o <- c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4)
q2s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
....
....
q1 means question 1 and q2 means question 2. I also want to know how to align these stacked box plots based on my need. Like one row or two rows.
This should get you started:
Unfortunately you don't provide a minimal example with sample data, so I will generate some random sample data.
# Generate sample data
set.seed(2017);
df <- cbind.data.frame(
value = rnorm(1000),
Label = sample(c("Good", "Bad"), 1000, replace = T),
variable = sample(paste0("F", 5:11), 1000, replace = T));
# ggplot
library(tidyverse);
df %>%
mutate(variable = factor(variable, levels = paste0("F", 5:11))) %>%
ggplot(aes(variable, value, fill = Label)) +
geom_boxplot(position=position_dodge()) +
facet_wrap(~ variable, ncol = 3, scale = "free");
You can specify the number of columns and rows in your 2d panel layout through arguments ncol and nrow, respectively, of facet_wrap. Many more details and examples can be found if you follow ?geom_boxplot and ?facet_wrap.
Update 1
A boxplot based on your sample data doesn't make too much sense, because your data are not continuous. But ignoring that, you could do the following:
df <- data.frame(
q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4));
df %>%
gather(key, value, 1:4) %>%
mutate(
variable = ifelse(grepl("q1", key), "F1", "F2"),
Label = ifelse(grepl("o$", key), "Bad", "Good")) %>%
ggplot(aes(variable, value, fill = Label)) +
geom_boxplot(position = position_dodge()) +
facet_wrap(~ variable, ncol = 3, scale = "free");
Update 2
One way of visualising discrete data would be in a mosaicplot.
mosaicplot(table(df2));
The plot shows the count of value (as filled rectangles) per Variable per Label. See ?mosaicplot for details.

Resources