Labels for line series of different lengths - r

I have a problem with one of my charts. It plots four data sets: three have the same length and one is a month longer. Only the longest data set shows its label at the end of its line.
I'm trying to get all four labels, one per line series, to show on the chart, but I can only get the label for the longest series. Any thoughts and ideas would be greatly appreciated!
I show the code and the chart output below.
library(GetBCBData)
library(ggplot2)
library(dplyr)
library(ggrepel)
# set ids
id.series <- c(ICC_sprd_total = 27443,
ICC_sprd_corps = 27444,
ICC_sprd_indivs = 27445,
SELIC = 4189)
first.date = '2013-01-01'
# get series from bcb
df_cred <- gbcbd_get_series(id = id.series,
first.date = first.date,
last.date = Sys.Date(),
use.memoise = FALSE)
glimpse(df_cred)
p <- ggplot(df_cred, aes(x = ref.date, y = value, colour = series.name)) +
  geom_line() +
  geom_label_repel(data = df_cred %>%
                     slice(which.max(ref.date)),
                   aes(label = value),
                   nudge_x = 0.05,
                   show.legend = FALSE,
                   size = 4.5) +
  scale_y_continuous(limits = c(0, NA), expand = c(0, 0)) +
  geom_hline(yintercept = 0)
print(p)

The original code selects the row whose ref.date is the latest in the whole data set, and that row belongs to the longest series. What you want is the latest ref.date within each series, which you can get by grouping first.
...
geom_label_repel(data = df_cred %>%
                   group_by(series.name) %>% # ADD THIS
                   slice(which.max(ref.date)),
...
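To see the difference without downloading the BCB series, here is a minimal sketch on made-up data. The data frame toy and its values are assumptions for illustration only; just the column names mirror df_cred.

```r
library(dplyr)

# Toy stand-in for df_cred (hypothetical values):
# series "b" runs one month longer than series "a"
toy <- data.frame(
  series.name = c("a", "a", "b", "b", "b"),
  ref.date    = as.Date(c("2020-01-01", "2020-02-01",
                          "2020-01-01", "2020-02-01", "2020-03-01")),
  value       = c(1, 2, 10, 20, 30)
)

# Ungrouped: a single row, the one with the overall latest date
last_overall <- toy %>%
  slice(which.max(ref.date))

# Grouped: the latest row within each series
last_per_series <- toy %>%
  group_by(series.name) %>%
  slice(which.max(ref.date)) %>%
  ungroup()
```

Here last_overall has one row (series "b") while last_per_series has one row per series, which is the data the label layer needs.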

Related

How do I create a non-stacked barplot with data labels using ggplot2 in R?

I am processing this dataset (bottom of the page) in R for a project.
First I load in the data:
count_data <- read.table(file = "../data/GSE156388_read_counts.tsv", header = T, sep = "",
row.names = 1)
I then melt the data using reshape2:
melted_count_data <- melt(count_data)
Then I create a factor for colouring graphs by group:
color_groups <- factor(melted_count_data$variable, labels = rep(c("siTFIP11", "siGl3"), each = 3))
Now we get to the barplot I'm trying to make:
ggplot(melted_count_data, aes(x = variable, y = value / 1e6, fill = color_groups)) +
geom_bar(stat = "identity") + labs(title = "Read counts", y = "Sequencing depth (millions of reads)")
The problem is that this creates a barplot with a bunch of stripes, leading me to believe it is trying to stack a ton of bars on top of each other instead of just creating one solid block.
I also wanted to add data labels to the plot:
+ geom_text(label = value / 1e6)
but this seemed to just put a bunch of values on top of each other.
For the stacked bars problem I tried to use y = sum(values) but this just made all the bars the same height. I also tried using y = colSums(values) but this obviously didn't work because it needs "an array of at least two dimensions".
I tried figuring it out using the unmelted data but to no avail.
I just kind of gave up on the labels since I wasn't even able to fix the bars problem.
EDIT:
I found a thread suggesting this:
ggplot(melted_count_data, aes(x = variable, y = value / 1e6, color = color_groups)) +
geom_bar(stat = "identity") + labs(title = "Read counts", y = "Sequencing depth (millions of reads)")
Changing fill to color. This fixes the white lines but results in some (fewer) black lines. Looking at this new chart leads me to believe it might actually be pasting a bunch of charts on top of each other?
You could do:
library(tidyverse)
url <- paste0( "https://www.ncbi.nlm.nih.gov/geo/download/",
"?acc=GSE156388&format=file&file=GSE156388%5",
"Fread%5Fcounts%2Etsv%2Egz")
tmpfile <- tempfile()
download.file(url, tmpfile)
count_data <- readr::read_tsv(gzfile(tmpfile),
show_col_types = FALSE)
count_data %>%
pivot_longer(-1) %>%
mutate(color_groups = factor(name,
labels = rep(c("siTFIP11", "siGl3"), each = 3))) %>%
group_by(name) %>%
summarise(value = sum(value)/1e6, color_groups = first(color_groups)) %>%
ggplot(aes(name, value, fill = color_groups)) +
geom_col() +
geom_text(aes(label = round(value, 2)), nudge_y = 0.5) +
labs(title = "Read counts", x = "", fill = "Type",
y = "Sequencing depth (millions of reads)") +
scale_fill_manual(values = c("gold", "deepskyblue3")) +
theme_minimal()
Created on 2022-03-21 by the reprex package (v2.0.1)
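The key step in that pipeline is summing the counts per sample before plotting, so that each bar is a single rectangle rather than one stacked segment per row. A minimal sketch of that idea, on made-up numbers (toy_long and its values are assumptions, not the GEO data):

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-format counts: three rows (genes) per sample
toy_long <- data.frame(
  name  = rep(c("s1", "s2"), each = 3),
  value = c(1e6, 2e6, 3e6, 2e6, 2e6, 1e6)
)

# One row per sample, in millions of reads
totals <- toy_long %>%
  group_by(name) %>%
  summarise(value = sum(value) / 1e6)

# One solid bar and one label per sample
p <- ggplot(totals, aes(name, value)) +
  geom_col() +
  geom_text(aes(label = value), vjust = -0.5)
```

Plotting toy_long directly with geom_bar(stat = "identity") would instead draw three stacked segments per sample, which is where the stripes come from.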

How do you make a line graph with multiple lines from multiple variables in R

I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so
df2019 <- data.frame(Role = c("A","B","C"),Women_percent = c(65,50,70),Men_percent = c(35,50,30), Women_total =
c(130,100,140), Men_total = c(70,100,60))
df2016 <- data.frame(Role= c("A","B","C"),Women_percent = c(70,45,50),Men_percent = c(30,55,50),Women_total =
c(140,90,100), Men_total = c(60,110,100))
all_melted <- reshape2::melt(
rbind(cbind(df2019, year=2019), cbind(df2016, year=2016)),
id=c("year", "Role"))
There's no reason I need the data in melted form. I only melted it because I was plotting bar graphs, but now I need a line graph and I don't know how to make one from melted data, or how to keep the 2019/2016 tag if the data is not melted. When I try to make a line graph, I don't know how to specify which "variable" will be used. I want the lines to show the Women/Men percent values and the labels to show the totals. (In this picture the geom_text shows the percent values; I want it to use the total values.)
Crucially, I want the linetype to be dotted for 2016, and for the legend to show that.
I think it would be simplest to rbind the two frames after labelling them with their year, then reshape the result so that you have columns for role, year, gender, percent and total.
I would then use a bit of alpha scale trickery to hide the points and labels from 2016:
library(tidyverse) # provides %>%, pivot_longer(), pivot_wider() and ggplot()
df2016$year <- 2016
df2019$year <- 2019
rbind(df2016, df2019) %>%
pivot_longer(cols = 2:5, names_sep = "_", names_to = c("Gender", "Type")) %>%
pivot_wider(names_from = Type) %>%
ggplot(aes(Role, percent, color = Gender,
linetype = factor(year),
group = paste(Gender, year))) +
geom_line(size = 1.3) +
geom_point(size = 10, aes(alpha = year)) +
geom_text(aes(label = total, alpha = year), colour = "black") +
scale_colour_manual(values = c("#07aaf6", "#ef786f")) +
scale_alpha(range = c(0, 1), guide = guide_none()) +
scale_linetype_manual(values = c(2, 1)) +
labs(y = "Percent", color = "Gender", linetype = "Year")
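The reshape in the middle of that pipeline may be easier to follow in isolation. This is a small sketch on a two-role subset of the 2019 data; the object names wide, long and tidy are mine, and values_from = value is spelled out here although it is the default.

```r
library(tidyr)

wide <- data.frame(
  Role          = c("A", "B"),
  Women_percent = c(65, 50),
  Men_percent   = c(35, 50),
  Women_total   = c(130, 100),
  Men_total     = c(70, 100)
)

# Split "Women_percent" into Gender = "Women" and Type = "percent"
long <- pivot_longer(wide, cols = -Role,
                     names_sep = "_", names_to = c("Gender", "Type"))

# Spread Type back out, giving one row per Role/Gender
# with separate percent and total columns
tidy <- pivot_wider(long, names_from = Type, values_from = value)
```

The result has the columns Role, Gender, percent and total, so percent can be mapped to y and total to the label.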

Barplot side by side and line charts in the same plot

I want to create in R a plot which contains side by side bars and line charts as follows:
I tried:
Total <- c(584,605,664,711,759,795,863,954,1008,1061,1117,1150)
Infected <- c(366,359,388,402,427,422,462,524,570,560,578,577)
Recovered <- c(212,240,269,301,320,359,385,413,421,483,516,548)
Death <- c(6,6,7,8,12,14,16,17,17,18,23,25)
day <- itemizeDates(startDate="01.04.20", endDate="12.04.20")
df <- data.frame(Day=day, Infected=Infected, Recovered=Recovered, Death=Death, Total=Total)
value_matrix = matrix(, nrow = 2, ncol = 12)
value_matrix[1,] = df$Recovered
value_matrix[2,] = df$Death
plot(c(1:12), df$Total, ylim=c(0,1200), xlim=c(1,12), type = "b", col="peachpuff", xaxt="n", xlab = "", ylab = "")
points(c(1:12), df$Infected, type = "b", col="red")
barplot(value_matrix, beside = TRUE, col = c("green", "black"), width = 0.35, add = TRUE)
But the bar chart does not line up with the line chart. I guess it would be easier to use ggplot2, but I don't know how. Could anyone help me? Thanks a lot in advance!
With ggplot2, the margins are handled nicely for you, but you'll need the data in two separate long forms. Reshape from wide to long with tidyr::gather, tidyr::pivot_longer, reshape2::melt, reshape, or whatever you prefer.
library(tidyr)
library(ggplot2)
df <- data.frame(
Total = c(584,605,664,711,759,795,863,954,1008,1061,1117,1150),
Infected = c(366,359,388,402,427,422,462,524,570,560,578,577),
Recovered = c(212,240,269,301,320,359,385,413,421,483,516,548),
Death = c(6,6,7,8,12,14,16,17,17,18,23,25),
day = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = 'day')
)
ggplot(
tidyr::gather(df, Population, count, Total:Infected),
aes(day, count, color = Population, fill = Population)
) +
geom_line() +
geom_point() +
geom_col(
data = tidyr::gather(df, Population, count, Recovered:Death),
position = 'dodge', show.legend = FALSE
)
Another way to do it is to gather twice before plotting. Not sure if this is easier or harder to understand, but you get the same thing.
df %>%
tidyr::gather(Population, count, Total:Infected) %>%
tidyr::gather(Resolution, count2, Recovered:Death) %>%
ggplot(aes(x = day, y = count, color = Population)) +
geom_line() +
geom_point() +
geom_col(
aes(y = count2, color = Resolution, fill = Resolution),
position = 'dodge', show.legend = FALSE
)
You can actually plot the lines and points without reshaping by making separate calls for each, but to dodge bars (or get legends), you'll definitely need to reshape.
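As a rough sketch of the no-reshape approach for the lines and points only, using the first four days of the data above (the object name df_small and the fixed colours are my own choices):

```r
library(ggplot2)

df_small <- data.frame(
  day      = seq(as.Date("2020-04-01"), as.Date("2020-04-04"), by = "day"),
  Total    = c(584, 605, 664, 711),
  Infected = c(366, 359, 388, 402)
)

# One pair of layers per column; colours are set outside aes(),
# so no legend is drawn
p <- ggplot(df_small, aes(x = day)) +
  geom_line(aes(y = Total), colour = "peachpuff3") +
  geom_point(aes(y = Total), colour = "peachpuff3") +
  geom_line(aes(y = Infected), colour = "red") +
  geom_point(aes(y = Infected), colour = "red") +
  labs(y = "count")
```

This gets repetitive as columns are added, which is one more reason to prefer the long format.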

plot multiple lines in ggplot

I need to plot hourly data for different days using ggplot, and here is my dataset:
The data consists of hourly observations, and I want to plot each day's observation into one separate line.
Here is my code
xbj1 = bj[c(1:24),c(1,6)]
xbj2 = bj[c(24:47),c(1,6)]
xbj3 = bj[c(48:71),c(1,6)]
ggplot()+
geom_line(data = xbj1,aes(x = Date, y= Value), colour="blue") +
geom_line(data = xbj2,aes(x = Date, y= Value), colour = "grey") +
geom_line(data = xbj3,aes(x = Date, y= Value), colour = "green") +
xlab('Hour') +
ylab('PM2.5')
Please advise on this.
I'll make some fake data (I won't try to transcribe yours) first:
set.seed(2)
x <- data.frame(
Date = rep(Sys.Date() + 0:1, each = 24),
# Year, Month, Day ... are not used here
Hour = rep(0:23, times = 2),
Value = sample(1e2, size = 48, replace = TRUE)
)
This is a straightforward ggplot2 plot, either with one coloured line per date or with one facet per date:
library(ggplot2)
ggplot(x) +
geom_line(aes(Hour, Value, color = as.factor(Date))) +
scale_color_discrete(name = "Date")
ggplot(x) +
geom_line(aes(Hour, Value)) +
facet_grid(Date ~ .)
I highly recommend you find good tutorials for ggplot2, such as http://www.cookbook-r.com/Graphs/. Others exist, many quite good.

Separate boxes for two grouping variables when color by only one variable

Here is an example from the geom_boxplot man page:
p = ggplot(mpg, aes(class, hwy))
p + geom_boxplot(aes(colour = drv))
which looks like this:
I would like to make a very similar plot, but with (yearmon formatted) dates where the class variable is in the example, and a factor variable where drv is in the example.
Here is some sample data:
df_box = data_frame(
Date = sample(
as.yearmon(seq.Date(from = as.Date("2013-01-01"), to = as.Date("2016-08-01"), by = "month")),
size = 10000,
replace = TRUE
),
Source = sample(c("Inside", "Outside"), size = 10000, replace = TRUE),
Value = rnorm(10000)
)
I have tried a bunch of different things:
Put an as.factor around the date variable, then I no longer have the nicely spaced out date scale for the x-axis:
df_box %>%
ggplot(aes(
x = as.factor(Date),
y = Value,
# group = Date,
color = Source
)) +
geom_boxplot(outlier.shape = NA) +
theme_bw() +
xlab("Month Year") +
theme(
axis.text.x = element_text(hjust = 1, angle = 50)
)
On the other hand, if I use Date as an additional group variable as suggested here, adding color no longer has any additional impact:
df_box %>%
ggplot(aes(
x = Date,
y = Value,
group = Date,
color = Source
)) +
geom_boxplot() +
theme_bw()
Any ideas as to how to achieve the output of #1 while still maintaining a yearmon scale x-axis?
Since you need separate boxes for each combination of Date and Source, use interaction(Source, Date) as the group aesthetic:
ggplot(df_box, aes(x = Date, y = Value,
                   colour = Source,
                   group = interaction(Source, Date))) +
  geom_boxplot()
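To illustrate what interaction() contributes, here is a small sketch on made-up data. The data frame toy is an assumption, and its Date column is a plain numeric stand-in for the yearmon values so that the example needs no extra packages.

```r
library(ggplot2)

# Hypothetical data: two dates x two sources, two observations each
toy <- data.frame(
  Date   = rep(c(2013.0, 2013.5), each = 4),
  Source = rep(c("Inside", "Outside"), times = 4),
  Value  = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# One factor level per Source/Date combination
g <- interaction(toy$Source, toy$Date)

# Same grouping used as the group aesthetic, with a continuous x-axis
p <- ggplot(toy, aes(x = Date, y = Value,
                     colour = Source,
                     group = interaction(Source, Date))) +
  geom_boxplot()
```

Grouping by Date alone would give two groups here; the interaction gives four, one per box, while colour is still mapped to Source only.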
