How to customize a horizontal stacked-bar chart with ggplot2 - r

I'm just starting with R visuals, and I'm stuck on this one. In the code below you will find the table I'm working with and its rendering.
What I'm trying to do is to obtain a horizontal bar chart where for each Pool value I have two variables (D_MTD and I_MTD).
To achieve that (and I don't know if this is the right way of doing it) I added two geom_bar layers and what I get is pretty much what it should look like (it's okay to have stacked columns).
MyTable <- data.frame(
Pool = c(" ", "(CA) Domestic", "(CA) International", "(FR) International", "(US) Domestic", "(US) International" ),
ADomestic = c(0, 36000, 3000, 1200, 54000, 6000),
AInternational = c(0, 0, 600, 600, 0, 1200),
D_MTD = c(23.4, 0, 0, 0, 8.38, 0),
I_MTD = c(6.63, 0, 0, 0, 0, 0)
)
MyTable
MyVisual <- MyTable %>%
ggplot(
Data = MyTable,
aes(y = Pool)
) +
geom_bar(
aes(
weight = D_MTD,
colour = "Domestic"
),
alpha = 0.5
) +
geom_bar(
aes(
weight = I_MTD,
colour = "International"
),
alpha = 0.5
) +
labs(
title = "Month-To-Date Consumption by Pool",
subtitle = "(in minutes)",
y = "Pools",
x = "MTD Used Minutes",
colour = "Pool context"
)
MyVisual
However, the visual has two outline colours that I don't want (I would like the bars to be filled with colors instead). I tried to use the fill argument to assign a fixed color to the bars (ex: red for Domestic and blue for International) but that generates another legend that I don't want to show.
Example:
MyVisual <- MyTable %>%
ggplot(
Data = MyTable,
aes(y = Pool)
) +
geom_bar(
aes(
weight = D_MTD,
colour = "Domestic",
fill = "red"
),
alpha = 0.5
) +
geom_bar(
aes(
weight = I_MTD,
colour = "International",
fill = "blue"
),
alpha = 0.5
) +
labs(
title = "Month-To-Date Consumption by Pool",
subtitle = "(in minutes)",
y = "Pools",
x = "MTD Used Minutes",
colour = "Pool context"
)
MyVisual
Also, how do I apply labels to each bar with the corresponding values?
Thanks in advance!

This solution should work for you. It contains elements and transformations that may be confusing for a beginner, but you will have to face them at some point. I recommend reading the book R for Data Science, or at least the chapter related to data visualization with ggplot2.
Steps:
1- Reshape the data using the pivot_longer() function
2- Create factors and labels for the plot
library(tidyr)
library(dplyr)
library(ggplot2)
MyTable <- data.frame(
Pool = c(" ", "(CA) Domestic", "(CA) International", "(FR) International", "(US) Domestic", "(US) International" ),
ADomestic = c(0, 36000, 3000, 1200, 54000, 6000),
AInternational = c(0, 0, 600, 600, 0, 1200),
D_MTD = c(23.4, 0, 0, 0, 8.38, 0),
I_MTD = c(6.63, 0, 0, 0, 0, 0)
)
to_plot <- MyTable %>%
select(Pool, D_MTD, I_MTD) %>%
# Data to long format
pivot_longer(
cols = contains('MTD'),
names_to = 'pool_context',
values_to = 'used_minutes'
) %>%
mutate(
pool_context = factor(
pool_context, c('D_MTD', 'I_MTD'), c('Domestic', 'International')
),
# Labels only for non 0 values
label = ifelse(used_minutes == 0, NA, used_minutes)
)
to_plot %>%
ggplot(aes(y = Pool, x = used_minutes)) +
# Fill aesthetic only for the bars, so the labels are not colored too
geom_col(aes(fill = pool_context)) +
theme(legend.position = 'bottom') +
geom_label(aes(label = label), hjust = 1.1) +
labs(fill = NULL, x = 'Used minutes (MTD)', y = 'Pool')
Created on 2022-02-17 by the reprex package (v2.0.1)
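On the two remaining points from the question (fixed colours without an extra legend, and labels that follow the stacked segments), here is a minimal sketch. It assumes ggplot2 3.3.0 or later and rebuilds a small hand-typed subset of to_plot so that it runs on its own; the red/blue choice is simply the one mentioned in the question. The idea is to map fill to the variable and choose the colours in scale_fill_manual(), instead of writing fill = "red" inside aes(), and to give the label layer position_stack() so that it is stacked the same way as the bars.

```r
library(ggplot2)

# Hand-typed subset of the long-format data (same columns as `to_plot`)
to_plot <- data.frame(
  Pool = rep(c(" ", "(US) Domestic"), each = 2),
  pool_context = factor(rep(c("Domestic", "International"), times = 2)),
  used_minutes = c(23.4, 6.63, 8.38, 0)
)
# Labels only for non-zero values
to_plot$label <- ifelse(to_plot$used_minutes == 0, NA, to_plot$used_minutes)

# `group` is set globally so bars and labels are stacked in the same order
p <- ggplot(to_plot, aes(x = used_minutes, y = Pool, group = pool_context)) +
  geom_col(aes(fill = pool_context), alpha = 0.5) +
  # position_stack(vjust = 0.5) centres each label inside its segment
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5), na.rm = TRUE) +
  # Fixed colours are chosen in the scale, which keeps a single legend
  scale_fill_manual(values = c(Domestic = "red", International = "blue")) +
  labs(fill = "Pool context", x = "MTD Used Minutes", y = "Pools")

p
```

A constant such as "red" placed inside aes() is treated as data to be mapped, which is why it produced its own legend entry in the question.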

Related

How to plot filtered data with loop in R and combine them with facet_grid?

I am new in R, so my question could seem very trivial for someone, but I need a solution. I have a data frame:
`structure(list(Time = c(0, 0, 0), Node = 1:3, Depth = c(0, -10,
-20), Head = c(-1000, -1000, -1000), Moisture = c(0.166, 0.166,
0.166), HeadF = c(-1000, -1000, -1000), MoistureF = c(0.004983,
0.004983, 0.004983), Flux = c(-0.00133, -0.00133, -0.00133),
FluxF = c(-0.00122, -0.00122, -0.00122), Sink = c(0, 0, 0
), Transf = c(0, 0, 0), TranS = c(0, 0, 0), Temp = c(20,
20, 20), ConcF = c(0, 0, 0), ConcM = c(0, 0, 0)), row.names = c(NA,
3L), class = "data.frame")`.
I am able to make a single plot of TranS vs. Time, with color = Transf (using scale_color_viridis). I want to create plots from filtered data (Depth = -20, -40, -60, -80 and -100). Note that the title also has to change according to the depth value. I then want to place these plots next to each other using facet_grid.
I have tried in a such way:
plot_d20 <-plot_node %>% filter(plot_node$Depth == -20)
plot_d40 <-plot_node %>% filter(plot_node$Depth == -40)
plot_d60 <-plot_node %>% filter(plot_node$Depth == -60)
plot_d80 <-plot_node %>% filter(plot_node$Depth == -80)
plot_d100 <-plot_node %>% filter(plot_node$Depth == -100)
depth_plot <- c(plot_d20,plot_d40,plot_d60,plot_d80,plot_d100)
for (p in depth_plot){
ggpS<-ggplot(p, aes(Time, TranS, color=Transf) ) +
geom_point(alpha = 1)+
scale_color_viridis(option = "D")+
scale_x_continuous(limits = c(0,1400), breaks = seq(0,1400,200))+
ggtitle('Solute Mass Transfer for depth = 20mm')
ggpS
}
But it doesn't work.
R says:
data must be a data frame, or another object coercible by fortify(), not a numeric vector. I also don't know how to make my title dynamic and combine it with facet_grid. Putting everything on a single plot is an alternative, but then it would be difficult to distinguish the lines and to assign a legend by color, because color already represents another variable. What is a possible way to accomplish this?
Edit: I understood the question differently.
facet_grid accepts a single data.frame and uses one of that frame's variables to split a chart into multiple subplots. Your question describes combining multiple charts into a single chart, which is available as a function in the cowplot package. However, if you are interested in faceting the data, here is a way to filter and use facet_wrap.
Example with Iris data:
library(tidyverse)
iris %>%
filter(Sepal.Length %in% c(6.4,5.7,6.7,5.1,6.3,5)) %>% ### Your values here
ggplot(aes(Petal.Length, Petal.Width, color=Species)) +
geom_point(alpha = 1) +
scale_color_viridis_d()+ #(option = "D") + ### New function name
#scale_x_continuous(limits = c(0,1400), breaks = seq(0,1400,200))+
facet_wrap("Sepal.Length") +
# facet_grid("Sepal.Length") + ### Alternative Layout
ggtitle('Sepal Length Range')
To create a "grid" of plots with only one faceting variable, you'll actually want to use facet_wrap(). You can create your facet titles before plotting, and change the formatting of strip.text within theme() to make them look more "title-like."
library(dplyr)
library(ggplot2)
plot_node %>%
mutate(
facet = paste0("Solute Mass Transfer for Depth = ", abs(Depth), "mm")
) %>%
ggplot(aes(Time, TranS, color=Transf)) +
geom_point(alpha = 1) +
scale_color_viridis_c(option = "D") +
scale_x_continuous(limits = c(0, 1400), breaks = seq(0, 1400, 200)) +
facet_wrap(vars(facet), ncol = 2, scales = "free") +
theme_minimal() +
theme(strip.text = element_text(size = 12, face = "bold"))
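As for the error in the question: c() applied to data frames does not give a list of data frames. It concatenates their columns into one flat list, so the for loop hands ggplot() a numeric vector. If separate plot objects are really needed (for example to combine them with cowplot), a sketch along the following lines may help. The small plot_node below is made-up stand-in data, not the asker's, and the title is built from each subset's depth.

```r
library(ggplot2)

# Hypothetical stand-in for `plot_node` (only the columns used here)
plot_node <- data.frame(
  Time   = rep(c(0, 700, 1400), times = 2),
  Depth  = rep(c(-20, -40), each = 3),
  TranS  = c(0, 0.2, 0.5, 0, 0.1, 0.3),
  Transf = c(0, 0.01, 0.02, 0, 0.005, 0.01)
)

# c() flattens the data frames into a plain list of columns
flattened <- c(plot_node[1:3, ], plot_node[4:6, ])
class(flattened[[1]])  # a column vector, not a data frame

# split() keeps one data frame per depth value instead
by_depth <- split(plot_node, plot_node$Depth)

# One plot per depth, with the title derived from the data
plots <- lapply(by_depth, function(d) {
  ggplot(d, aes(Time, TranS, color = Transf)) +
    geom_point() +
    scale_color_viridis_c(option = "D") +
    scale_x_continuous(limits = c(0, 1400), breaks = seq(0, 1400, 200)) +
    ggtitle(paste0("Solute Mass Transfer for depth = ", abs(d$Depth[1]), "mm"))
})
```

Inside a loop or lapply() a plot is not printed automatically, so print(plots[[1]]) (or passing the list to a layout function) is needed to actually see it.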

R visualization: How to display flags on points created by geom_quasirandom with the use of geom_flag

I am pretty new to R programming and I need help with plotting flags using ggflags library.
If you run the code below, it will create a single graph of density plots, box plots, and swarm plots. There are two purple dots in each 5-year period, which stand for China and India (the two most populous countries).
What I need is to replace these purple dots with flags of China and India, and with the country code attached to each flag.
I used the ggflags library but I could not figure out how to use it (geom_flag) with the points created by geom_quasirandom.
The two code lines mentioned are:
geom_quasirandom(aes(color = color_custom, alpha = alpha_custom, size = size_custom), width = 0.20)
This creates points of the swarm plot, including purple dots.
geom_flag(aes(country = CodeISO2C), size = 3)
This uses the country codes to make flags. However, when you uncomment this line, an error will occur:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘grobify’ for signature ‘"NULL"’
If it is not possible to use flags instead of purple dots, then please tell me how to set 2 colors for China and India and add a legend with a single function call:
ComboGraph(country_data['isMostPopulous'],'none')
Thank you very much for your help.
The data source file can be downloaded from:
https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/1_Population/WPP2019_POP_F03_RATE_OF_NATURAL_INCREASE.xlsx
Here is my full code:
# ****************************************************************
# Visualization of natural population rate in the world
# with density plots, box plots, and swarm plots in a single graph
# ****************************************************************
if (!require("dplyr")) {
install.packages("dplyr")
}
if (!require("devtools")) {
install.packages("devtools")
}
# flags in round shape
if (!require("ggflags")) {
devtools::install_github("rensa/ggflags")
}
# flags in rectangular form
# if (!require("ggflags")) {
# devtools::install_github("ellisp/ggflags")
# }
if (!require("countrycode")) {
install.packages("countrycode")
}
library(dplyr) # to use the pipe operator %>%
library(tidyverse)
library(readxl)
library(ggdist)
library(ggbeeswarm)
library(ggtext)
library(ggflags)
library(countrycode)
# import data from the 'ESTIMATES' sheet and skip the first 16 rows
country_data <- read_excel('WPP2019_POP_F03_RATE_OF_NATURAL_INCREASE.xlsx',
sheet = 'ESTIMATES', skip = 16) %>%
# only select rows with 'Country/Area' value in the 'Type' column
dplyr::filter(Type == 'Country/Area') %>%
# select columns 1, 3, 6, 8 to 21
select(c(1, 3, 6, 8:21)) %>%
# now the table has 17 columns
# increase the number of rows and decrease the number of columns
# to 2 columns: 'Period', and 'NaturalRate'
pivot_longer(4:17, names_to = 'Period', values_to = 'NaturalRate') %>%
# mark countries by continent and max population
# mutate: add new variables and preserve existing ones
mutate(
NaturalRate = as.numeric(NaturalRate),
isAfrica = ifelse(Index >= 27 & Index <= 88, T, F),
isAsia = ifelse(Index >= 90 & Index <= 146, T, F),
isLatAmCar = ifelse(Index >= 149 & Index <= 188, T, F),
isOceania = ifelse(Index >= 190 & Index <= 206, T, F),
isEurope = ifelse(Index >= 210 & Index <= 252, T, F),
isNorAm = ifelse(Index >= 254 & Index <= 255, T, F),
isMostPopulous = ifelse(Index == 127 | Index == 119, T, F)
)
# mark countries by max and min rate in each period
country_data <- country_data %>%
group_by(Period) %>%
mutate(
maxPeriodRate = max(NaturalRate),
minPeriodRate = min(NaturalRate)
) %>%
mutate(isMaxMinPeriodRate = ifelse(NaturalRate == maxPeriodRate | NaturalRate == minPeriodRate, T, F)) %>%
ungroup() # always ungroup after group_by to prevent future errors
# rename columns
names(country_data)[2] = 'Country'
# add a new column of two-letter country codes for certain cases
# Note: The Channel Islands is not part of the UK, but ISO offers the code for it under "GB"
country_data <- country_data %>%
mutate(CodeISO2C_NorAm = ifelse(isNorAm == TRUE,
tolower(countrycode(Country, origin = 'country.name', destination = 'iso2c',
custom_match = c('Channel Islands' = 'GB'))),
NA)) %>%
mutate(CodeISO2C_Populous = ifelse(isMostPopulous == TRUE,
tolower(countrycode(Country, origin = 'country.name', destination = 'iso2c',
custom_match = c('Channel Islands' = 'GB'))),
NA)) %>%
mutate(CodeISO2C_MaxMin = ifelse(isMaxMinPeriodRate == TRUE,
tolower(countrycode(Country, origin = 'country.name', destination = 'iso2c',
custom_match = c('Channel Islands' = 'GB'))),
NA))
# count the number of distinct periods
total_periods = nrow(unique(country_data[,'Period']))
# choose the range of colors for the density plots
column_colors <- colorRampPalette(c("#8ecae6", "#219ebc"))(total_periods)
# choose other colors
background_color = '#fffef7'
country_color = '#800080'
world_color = '#5DB6D3'
# create a function to make density plots, box plots, and swarm plots (strip plots) in the same graph
ComboGraphWithFlag <- function(boolean_column){
# create temporarily 3 columns: size_custom, color_custom, alpha_custom to plot points of the selected group
country_data <- mutate(country_data, size_custom = ifelse(boolean_column == TRUE, 0.8, 0.5))
country_data <- mutate(country_data, color_custom = ifelse(boolean_column == TRUE, 'highlight', Period))
country_data <- mutate(country_data, alpha_custom = ifelse(boolean_column == TRUE, 0.9, 0.5))
if(colnames(boolean_column) == 'isNorAm') {
region_group = 'Northern America'
code_col = country_data$CodeISO2C_NorAm
} else if(colnames(boolean_column) == 'isMostPopulous') {
region_group = 'two most populous countries'
code_col = country_data$CodeISO2C_Populous
} else if(colnames(boolean_column) == 'isMaxMinPeriodRate') {
region_group = 'countries of max and min rates'
code_col = country_data$CodeISO2C_MaxMin
}
print(paste0('Generating the graph of ', region_group))
ggplot(country_data, aes(x = Period, y = NaturalRate)) +
geom_boxplot(fill = 'transparent', width = 0.4, color = 'orange',
outlier.shape = NA, alpha = 0.8, coef = 0) +
geom_quasirandom(aes(color = color_custom, alpha = alpha_custom, size = size_custom), width = 0.20) +
stat_halfeye(aes(fill = Period), color = 'orange', justification = -0.6,
width = 0.4, .width = 0, alpha = 0.9) +
geom_text(data = unique(country_data[, c('Period')]),
aes(y = 45, label = Period), color = column_colors, size = 2.7, family = 'Arial') +
geom_flag(aes(country = code_col), size = 3, position = position_quasirandom()) +
annotate('text', y = -16, x = 5.5, label = 'Cambodian genocide',
family = 'Arial', color = 'grey50', size = 3)+
geom_curve(data = NULL, aes(x = 5.45, y = -17.5, xend = 5.9, yend = -21),
arrow = arrow(length = unit(0.02, "npc")), color = 'grey60', size = 0.35) +
scale_color_manual(values = c(column_colors, country_color)) +
scale_alpha_identity() +
scale_size_identity() +
scale_fill_manual(values = column_colors) +
labs(title = paste0("Natural population rate in <span style='color:", country_color, "'>", region_group,
"</span> and <span style='color:", world_color, "'>the rest of the world </span>"),
subtitle = 'Natural population rate = crude birth rate - crude death rate, per 1000 population',
caption = "Source: Rate of Natural Population Increase - Population Division, United Nations
\nhttps://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/1_Population/WPP2019_POP_F03_RATE_OF_NATURAL_INCREASE.xlsx",
y = 'Rate of natural population increase, per 1000 population') +
theme_void() +
theme(
text = element_text('Arial', color = 'grey50'),
plot.background = element_rect(fill = background_color, color = background_color),
plot.margin = margin(0.8, 0.8, 0.8, 0.8, 'cm'),
plot.title = element_markdown(face = 'bold', size = 16, margin = margin(b = 4)),
plot.subtitle = element_text(colour = 'grey60', margin = margin(b = 7.5)),
plot.caption = element_text(colour = 'grey40', size = 8),
legend.position = 'none',
axis.line.y = element_line(colour = 'grey50'),
axis.title.y = element_text(angle = 90, margin = margin(r = 10), size = 10),
axis.text.y = element_text(color = 'grey70', size = 9, margin = margin(r = 5)),
panel.grid.major.y = element_line(colour = 'grey80', linetype = 'dotted')
)
ggsave(paste0('Population rate in ', region_group,'.png'), width = 10, height = 6)
print('Graph making is completed.')
}
# make plots with flags for certain cases
ComboGraphWithFlag(country_data['isNorAm'])
ComboGraphWithFlag(country_data['isMostPopulous'])
ComboGraphWithFlag(country_data['isMaxMinPeriodRate'])
ggbeeswarm comes with a few new "positions": these are specific objects that define the offset of the plotted points. You can pass such a position to geom_flag.
I have not used your example because it seems quite... involved. I hope the underlying concept becomes clear from the below example.
library(ggbeeswarm)
#> Loading required package: ggplot2
# devtools::install_github("jimjam-slam/ggflags")
library(ggflags)
ggplot(iris) +
geom_flag(aes(x = Species, Sepal.Length, country = "fr"),
position = position_quasirandom())
Created on 2022-06-08 by the reprex package (v2.0.1)

geom_text in the right of a heatmap in ggplot

I created a heatmap with this dataframe:
datos<- data.frame(
stringsAsFactors = FALSE,
country_name = c("Argentina","Bolivia",
"Brazil","Chile","Colombia","Paraguay","Peru","Uruguay",
"Argentina","Bolivia","Brazil","Chile","Colombia",
"Paraguay","Peru","Uruguay","Argentina","Bolivia",
"Brazil","Chile"),
year = c("1961","1961","1961","1961",
"1961","1961","1961","1961","1962","1962","1962",
"1962","1962","1962","1962","1962","1963","1963",
"1963","1963"),
crec = c(1,1,1,1,1,1,1,1,0,1,1,
1,1,1,1,0,0,1,1,1)
)
colors<-c("red","blue")
chart<- ggplot(datos,aes(x=year,y=country_name,fill=factor(crec))) +
geom_tile(color=gris,size=0.01)+
scale_fill_manual(values=colors)+
scale_y_discrete(limits = crisis$country_name)+
guides(fill=FALSE)
I would like to add a geom_text to the right of the last year of each country, so I can show how many red squares each country has. I think geom_text would be good, but I am not sure how to create one for each country.
text<- data.frame(
stringsAsFactors = FALSE,
country_name = c("Colombia","Bolivia","Chile",
"Peru","Brazil","Paraguay","Uruguay","Argentina"),
label = c("0 years","0 years","0 years",
"0 years","0 years","0 years","1 years","2 years")
)
library(dplyr)
# get the maximum year per country
text = text %>%
left_join(
datos %>% group_by(country_name) %>%
summarize(year = max(year))
)
chart +
geom_text(
data = text,
aes(label = label, x = year, y = country_name),
# left justified, but nudged to the right
hjust = 0, nudge_x = 0.55,
inherit.aes = FALSE
) +
# give a little extra room for the text
scale_x_discrete(expand = expansion(mult = 0, add = c(0, 1)))
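The label data frame above is typed by hand. Assuming the red squares are the ones with crec == 0 (the first colour in colors goes to the first factor level, and the hand-typed counts agree with that), the labels could probably be derived from datos directly. A minimal sketch, with datos rebuilt compactly and the result named label_df (my own name, chosen to avoid reusing text):

```r
library(dplyr)

# Same values as `datos` in the question, written more compactly
countries <- c("Argentina", "Bolivia", "Brazil", "Chile",
               "Colombia", "Paraguay", "Peru", "Uruguay")
datos <- data.frame(
  country_name = c(countries, countries, countries[1:4]),
  year = rep(c("1961", "1962", "1963"), times = c(8, 8, 4)),
  crec = c(1, 1, 1, 1, 1, 1, 1, 1,
           0, 1, 1, 1, 1, 1, 1, 0,
           0, 1, 1, 1)
)

# One row per country: number of red tiles and the last year with data
label_df <- datos %>%
  group_by(country_name) %>%
  summarise(
    n_red = sum(crec == 0),
    year = max(year)
  ) %>%
  mutate(label = paste(n_red, "years"))

label_df
```

label_df can then be passed as data to geom_text() in the same way as the joined data frame in the answer.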

Multiple scatter plots in a single figure from multiple data frame in R using ggplot?

I want to generate a single figure that displays all the scatter plots, using data from the two data frames (i.e., regressing column A of Data1 against column A of Data2). Each plot in the figure should show the R-squared and p-value. I am most interested in how to use the facet_wrap function of ggplot while grabbing data from multiple data frames.
I tried a couple of methods but did not succeed.
library(tidyverse)
Data1=data.frame(A=runif(20, min = 0, max = 100), B=runif(20, min = 0, max = 250), C=runif(20, min = 0, max = 300))
Data2=data.frame(A=runif(20, min = -10, max = 50), B=runif(20, min = -5, max = 150), C=runif(20, min = 5, max = 200))
#method-1: using plot functions
par(mfrow=c(3,1))
plot(Data1$A, Data2$A)
abline(lm(Data1$A ~ Data2$A))
plot(Data1$B, Data2$B)
abline(lm(Data1$B ~ Data2$B))
plot(Data1$C, Data2$C)
abline(lm(Data1$C ~ Data2$C))
dev.off()
#method-2: using ggplot
ggplot()+
geom_point(aes(Data1$A,Data2$A))
I want a Figure like the one below
The hardest part is tidying up your data. Once that's done, the plot is pretty straightforward.
library(tidyverse)
Data1=data.frame(A=runif(20, min = 0, max = 100), B=runif(20, min = 0, max = 250), C=runif(20, min = 0, max = 300))
Data2=data.frame(A=runif(20, min = -10, max = 50), B=runif(20, min = -5, max = 150), C=runif(20, min = 5, max = 200))
data <- Data1 %>%
#add columns to indicate the source and the observation number
mutate(source = "Data1",
obs = row_number()) %>%
#bind to Data2 with the same new columns
bind_rows(Data2 %>% mutate(source = "Data2", obs = row_number())) %>%
#tidy the data so we've got a column for Data1 and Data2 and an indicator for the series (A, B, C)
gather(A, B, C, key = series, value = value) %>%
spread(key = source, value = value)
#create a separate data frame for annotations, finding the "top left" corner of each series
annotations <- data %>%
group_by(series) %>%
summarise(x = min(Data1),
y = max(Data2)) %>%
mutate(label = c("P = 0.6", "P = 0.5", "P = 0.9"))
#plot the data, faceting by series
data %>%
ggplot(aes(Data1, Data2))+
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_grid(series~., scales = "free") +
#add the annotations with adjustments to the horiz & vert placement
geom_text(data = annotations, aes(x = x, y = y, label = label, hjust = 0, vjust = 1),
color = "red", fontface = "italic")
You can make a list of plots and then use the grid.arrange() function from the gridExtra package.
sc_plots = list()
sc_plots$sc1 = ggplot() + ...
sc_plots$sc2 = ggplot() + ...
grid.arrange(sc_plots$sc1, sc_plots$sc2,
ncol = 3)
@Jordo82, here is what I get when I try to insert the text on the figures. Is there a way to free up the y-axis so that the added text does not depend on the y-scale, but instead appears in the top-left corner of each plot? The reason I used annotation_custom was that it does not depend on the y-scale, but the downside is that it takes only the first text in the labels. My real values are very different from each other; see the y-scale of the attached figure.
I used your code while editing the placement coordinates:
annotate("text", -1.5, 800, label = c("P = 0.6", "P = 0.5", "P = 0.9", "P = 0.9"),
color = "red", fontface = "italic")
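One possible way to get a label in the top-left corner of every facet regardless of its y-scale is to keep the labels in their own data frame (one row per facet) and place them at x = -Inf, y = Inf, which ggplot2 draws at the panel edge. The sketch below also computes R-squared and the slope p-value per series with lm() instead of typing them by hand. The random data mirrors the question; the names long and stats and the hjust/vjust offsets are my own choices and may need tuning.

```r
library(ggplot2)

set.seed(1)
Data1 <- data.frame(A = runif(20, 0, 100), B = runif(20, 0, 250), C = runif(20, 0, 300))
Data2 <- data.frame(A = runif(20, -10, 50), B = runif(20, -5, 150), C = runif(20, 5, 200))
series <- names(Data1)

# Long format: one row per observation, with matching columns side by side
long <- data.frame(
  series = rep(series, each = nrow(Data1)),
  Data1  = unlist(Data1[series], use.names = FALSE),
  Data2  = unlist(Data2[series], use.names = FALSE)
)

# One row per series with R-squared and the p-value of the slope
stats <- do.call(rbind, lapply(series, function(s) {
  fit <- summary(lm(Data2[[s]] ~ Data1[[s]]))
  data.frame(series = s, r2 = fit$r.squared, p = fit$coefficients[2, 4])
}))
stats$label <- sprintf("R2 = %.2f, P = %.3f", stats$r2, stats$p)

p <- ggplot(long, aes(Data1, Data2)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_grid(series ~ ., scales = "free") +
  # -Inf / Inf pin the label to the panel corner, whatever the axis range
  geom_text(data = stats, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, color = "red", fontface = "italic",
            inherit.aes = FALSE)

p
```

Because stats has a series column, each label is drawn only in its own facet, which is what annotate() cannot do.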

Connect multiple points across multiple facets with a dashed line

I have following data. Each observation is a genomic coordinate with copy number changes (copy.number.type) which is found in some percentage of samples (per.found).
chr<-c('1','12','2','12','12','4','2','X','12','12','16','16','16','5'
,'4','16','X','16','16','4','1','5','2','4','5','X','X','X','4',
'1','16','16','1','4','4','12','2','X','1','16','16','2','1','12',
'2','2','4','4','2','1','5','X','4','2','12','16','2','X','4','5',
'4','X','5','5')
start <- c(247123880,91884413,88886155,9403011,40503634,10667741,88914884,
100632615,25804205,25803542,18925987,21501823,21501855,115902990,
26120955,22008406,432498,22008406,22008406,69306802,4144380,73083197,
47743372,34836043,16525257,315832,1558229,51048657,49635818,239952709,
69727769,27941625,80328938,49136485,49136654,96076105,133702693,315823,
16725215,69728318,88520557,89832606,202205081,124379013,16045662,89836880,
49657307,97117994,76547133,35051701,344973,1770075,49139874,77426085,
9406416,69727781,108238962,151006944,49121333,6669602,89419843,74214551,
91203955,115395615)
type <- c('Inversions','Deletions','Deletions','Deletions','Deletions','Duplications','Deletions','Deletions',
'Duplications','Deletions','Duplications','Inversions','Inversions','Deletions','Duplications',
'Deletions','Deletions','Deletions','Deletions','Inversions','Duplications','Inversions','Inversions',
'Inversions','Deletions','Deletions','Deletions','Insertions','Deletions','Inversions','Inversions',
'Inversions','Inversions','Deletions','Deletions','Inversions','Deletions','Deletions','Inversions',
'Inversions','Deletions','Deletions','Deletions','Insertions','Inversions','Deletions','Deletions',
'Deletions','Inversions','Deletions','Duplications','Inversions','Deletions','Deletions','Deletions',
'Inversions','Deletions','Inversions','Deletions','Inversions','Inversions','Inversions','Deletions','Deletions')
per.found <- c(-0.040,0.080,0.080,0.040,0.080,0.040,0.080,0.040,0.040,0.120,0.040,-0.080,-0.080,0.040,0.040,0.120,
0.040,0.120,0.120,-0.040,0.011,-0.011,-0.023,-0.023,0.011,0.023,0.011,0.011,0.011,-0.011,-0.034,
-0.011,-0.023,0.011,0.011,-0.011,0.023,0.023,-0.023,-0.034,0.011,0.023,0.011,0.011,-0.023,0.023,
0.011,0.011,-0.011,0.011,0.011,-0.023,0.011,0.057,0.011,-0.034,0.023,-0.011,0.011,-0.011,-0.023,
-0.023,0.011,0.011)
df <- data.frame(chromosome = chr, start.coordinate = start, copy.number.type = type, per.found = per.found )
I would like to create a line plot. I created a plot using ggplot (facets), but the problem is that I cannot connect the points between two facets. Is there any way to do that? I do not necessarily need to use facets if there is a way to annotate the x-axis scale by chromosome. In the following image the dotted line shows what I would like to have for all copy.number.type lines.
EDIT: Looking for simplified approach.
library(ggplot2)
ggplot(df, aes(x=start.coordinate,y=per.found, group=copy.number.type, color=copy.number.type))+
geom_line()+
geom_point()+
facet_grid(.~chromosome,scales = "free_x", space = "free_x")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Desired output: As shown by the red dashed lines. I want to connect all the border points with a dashed line across facets.
Note: it may not make sense to connect the lines between the chromosomes.
But here is one way, by avoiding facets:
library(dplyr)
df2 <- df %>%
mutate(chromosome = factor(chromosome, c(1, 2, 4, 5, 12, 16, 'X'))) %>%
arrange(chromosome, start.coordinate)
chromosome_positions <- df2 %>%
group_by(chromosome) %>%
summarise(start = first(start.coordinate), end = last(start.coordinate)) %>%
mutate(
size = end - start,
new_start = cumsum(lag(size, default = 0)),
new_end = new_start + size
)
df3 <- df2 %>%
left_join(chromosome_positions, 'chromosome') %>%
mutate(new_x = start.coordinate + (new_start - start))
ggplot(df3, aes(x=new_x,y=per.found, group=copy.number.type, color=copy.number.type))+
geom_rect(
aes(xmin = new_start, xmax = new_end, ymin = -Inf, ymax = Inf, fill = chromosome),
chromosome_positions, inherit.aes = FALSE, alpha = 0.3
) +
geom_line() +
geom_point() +
geom_text(
aes(x = new_start + 0.5 * size, y = Inf, label = chromosome),
chromosome_positions, inherit.aes = FALSE, vjust = 1
) +
scale_fill_manual(values = rep(c('grey60', 'grey90'), 10), guide = 'none') +
scale_x_continuous(expand = c(0, 0))
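To make the offset arithmetic easier to follow, here is the same computation on a tiny made-up data set (the toy values are mine, not from the question). Each chromosome is shifted so that it begins where the previous one ended.

```r
library(dplyr)

# Toy coordinates: three chromosomes, already in the desired order
toy <- data.frame(
  chromosome = factor(c("1", "1", "2", "2", "X"), levels = c("1", "2", "X")),
  start.coordinate = c(100, 300, 50, 150, 10)
)

# Span of each chromosome and where it begins on the shared axis
offsets <- toy %>%
  arrange(chromosome, start.coordinate) %>%
  group_by(chromosome) %>%
  summarise(start = first(start.coordinate), end = last(start.coordinate)) %>%
  mutate(
    size = end - start,
    # running total of the spans of all previous chromosomes
    new_start = cumsum(dplyr::lag(size, default = 0)),
    new_end = new_start + size
  )

# Shift every point by its chromosome's offset
shifted <- toy %>%
  left_join(offsets, by = "chromosome") %>%
  mutate(new_x = start.coordinate + (new_start - start))

shifted$new_x
```

With this scheme the last point of one chromosome and the first point of the next land on the same x value (here 200 and 300), so adding a small gap to new_start might be worth considering if that overlap matters.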
