Making multi-group line plot with many observations more readable - r

I have created the following plot:
From a bigger version (5 rows, 58 columns) of this df:
df <- data.frame(row.names = c("ROBERT", "FRANK", "MICHELLE", "KATE"), `1` = c(31, 87, 22, 12), `2` = c(37, 74, 33, 20), `3` = c(35, 32, 44, 14))
colnames(df) <- c("1", "2", "3")
In the following manner:
df = df %>%
rownames_to_column("Name") %>%
as.data.frame()
df <- melt(df , id.vars = 'Name', variable.name = 'ep')
ggplot(df, aes(ep,value)) + geom_line(aes(colour = Name, group=Name))
The plot kind of shows what I'd like it to, but it really is a mess. Does anyone have a suggestion that would help me increase its readability?
Any help is very much appreciated!

Here are a few options for visualizing lots of datapoints across a smallish number of cases. These are illustrated with a subset of the txhousing data included with ggplot2.
Solution 1: Faceting
As @rdelrossi suggested, one solution is to facet by Name:
library(ggplot2)
ggplot(df, aes(ep,value)) +
geom_line(aes(colour = Name, group=Name), show.legend = FALSE) +
scale_x_date(expand = c(0,0)) + # ep is a Date in the example data (see Data prep)
facet_wrap(vars(Name), ncol = 1, scales = "free_x") +
theme_bw()
Solution 2: Smoothing
Use geom_smooth() to smooth out local fluctuations to see larger longer-term trends:
ggplot(df, aes(ep,value)) +
geom_smooth(
aes(colour = Name, group=Name),
se = FALSE,
span = 1, # higher number = smoother
size = 1.25
) +
scale_x_date(expand = c(0,0)) +
theme_bw()
Solution 3: Lasagna
Sometimes called a "lasagna plot," this is a heatmap with cases on the y axis, time (or whatever) on the x axis, and values mapped to color. It's a different way of comparing changes within (left to right) and between (up and down) individuals.
ggplot(df, aes(ep, Name, colour = value, fill = value)) +
geom_tile(size = .5) +
scale_fill_viridis_c(option = "B", aesthetics = c("colour", "fill")) +
coord_cartesian(expand = FALSE) +
theme(
axis.text.y = element_text(size = 12, face = "bold"),
axis.title.y = element_blank()
)
Data prep:
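library(dplyr)
library(ggplot2) # provides the txhousing dataset
library(lubridate)
df <- txhousing %>%
filter(
city %in% c("Beaumont", "Amarillo", "Arlington", "Corpus Christi", "El Paso"),
between(year, 2004, 2012)
) %>%
group_by(city) %>%
mutate(
Name = city,
value = as.numeric(scale(sales)), # scale() returns a matrix; keep a plain vector
ep = make_date(year, month) # first day of each month as a Date
) %>%
ungroup()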

If your readability concern is only the x-axis labels, the main issue is that reshape2::melt() returns the ep column as a factor, so the x axis shows every level and gets crowded. Converting ep to numeric lets ggplot2 choose a sensible set of axis breaks.
I replaced reshape2::melt() with tidyr::pivot_longer(), which has superseded it within the {tidyverse}, but your original code would still work.
library(tidyverse)
df <- structure(list(`1` = c(31, 87, 22, 12), `2` = c(37, 74, 33, 20), `3` = c(35, 32, 44, 14)), class = "data.frame", row.names = c("ROBERT", "FRANK", "MICHELLE", "KATE"))
df %>%
rownames_to_column("Name") %>%
pivot_longer(-Name, names_to = "ep") %>%
mutate(ep = as.numeric(ep)) %>%
ggplot(aes(ep, value, color = Name)) +
geom_line()
Created on 2022-03-07 by the reprex package (v2.0.1)
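If you keep reshape2::melt() and convert the resulting factor yourself, one detail is worth checking: calling as.numeric() directly on a factor returns the internal level codes, not the labels. The sketch below uses hypothetical column names ("5", "10", "15") chosen only to make the difference visible; with columns named "1", "2", "3" in order, the two results happen to coincide.

```r
# Hypothetical ep factor, as melt() might return it for columns named 5, 10, 15
ep <- factor(c("5", "10", "15"), levels = c("5", "10", "15"))

# Direct conversion gives the level codes (position of each level)
codes <- as.numeric(ep)

# Going through character gives the labels read as numbers
values <- as.numeric(as.character(ep))

print(codes)
print(values)
```

In the pivot_longer() version above this is not an issue, because names_to produces a character column rather than a factor.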

Another solution could be the use of a geom_bar()
Sample code:
ggplot(df, aes(fill=Name)) +
geom_bar(aes(x=ep, y=value, group=Name),stat="identity", position = position_dodge(width = 0.9)) +
labs(x="ep", y="count")+
scale_y_continuous(expand=c(0,0))+
theme_bw()
Plot:
You can also add facet_grid(~Name) + to put each Name in its own panel,
and label the values with
geom_text(aes(label=value), position = position_stack(vjust = .5))+

Related

Combining two heatmaps with the variables next to each other

I'm trying to combine two heatmaps. I want var_a and var_x on the y axis with for example: var_a first and then var_x. I don't know if I should do this by changing the dataframe or combining them, or if I can do this in ggplot.
Below I have some example code and a drawing of what I want (since I don't know if I explained it right).
I hope someone has ideas how I can do this either in the dataframe or in ggplot!
Example code:
df_one <- data.frame(
vars = c("var_a", "var_b", "var_c"),
corresponding_vars = c("var_x", "var_y", "var_z"),
expression_organ_1_vars = c(5, 10, 20),
expression_organ_2_vars = c(50, 2, 10),
expression_organ_3_vars = c(5, 10, 3)
)
df_one_long <- pivot_longer(df_one,
cols=3:5,
names_to = "tissueType",
values_to = "Expression")
expression.df_one <- ggplot(df_one_long,
mapping = aes(y=tissueType, x=vars, fill = Expression)) +
geom_tile()
expression.df_one
df_two <- data.frame(
corresponding_vars = c("var_x", "var_y", "var_z"),
expression_organ_1_corresponding_vars = c(100, 320, 120),
expression_organ_2_corresponding_vars = c(23, 30, 150),
expression_organ_3_corresponding_vars = c(89, 7, 200)
)
df_two_long <- pivot_longer(df_one,
cols=3:5,
names_to = "tissueType",
values_to = "Expression")
expression.df_two <- ggplot(df_two_long,
mapping = aes(y=tissueType, x=vars, fill = Expression)) +
geom_tile()
expression.df_two
Drawing:
You can bind your data frames together and pivot into a longer format so that vars and corresponding vars are in the same column, but retain a grouping variable to facet by:
df_two %>%
mutate(cor = corresponding_vars) %>%
rename_with(~sub('corresponding_', '', .x)) %>%
bind_rows(df_one %>% rename(cor = corresponding_vars)) %>%
pivot_longer(contains('expression'), names_to = 'organ') %>%
mutate(organ = gsub('expression_|_vars', '', organ)) %>%
group_by(cor) %>%
summarize(vars = vars, organ = organ, value = value,
cor = paste(sort(unique(vars)), collapse = ' cor ')) %>%
ggplot(aes(vars, organ, fill = value)) +
geom_tile(color = 'white', linewidth = 1) +
facet_grid(.~cor, scales = 'free_x', switch = 'x') +
scale_fill_viridis_c() +
coord_cartesian(clip = 'off') +
scale_x_discrete(expand = c(0, 0)) +
theme_minimal(base_size = 16) +
theme(strip.placement = 'outside',
axis.text.x = element_blank(),
axis.ticks.x.bottom = element_line(),
panel.spacing.x = unit(3, 'mm'))
Okay, so I solved the issue for my own project, which is to convert it to a scatter plot. I combined both datasets and then used a simple scatterplot.
df.combined <- dplyr::full_join(df_two_long, df_one_long,
by = c("vars", "corresponding_vars", "tissueType"))
ggplot(df.combined,
aes(x=vars, y=tissueType, colour=Expression.x, size = Expression.y)) +
geom_point()
It's not a solution with heatmaps, but I don't know how to do that at the moment.

Plot multiple lines (data series) with unique colors and custom x_axis in R

I'm trying to generate a plot in R with multiple lines (each line represents a different category), each with a unique color. The x-axis is the time, which starts at 17:00 and ends at 9:00 the next day. The y-axis is the frequency (i.e. number of counts) of each category at a given time. Please have a look at the CSV file that I use to plot this:
Time,-3,-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,3
0,0,0,0,0,0,0,288,224,148,78,37,23,19
1,0,0,0,0,0,0,321,208,128,74,55,20,11
2,0,0,0,0,0,0,326,212,128,80,46,20,5
3,0,0,0,0,0,0,345,209,131,73,36,17,6
4,0,0,0,0,0,0,364,201,117,77,38,15,5
5,0,0,0,0,0,0,390,205,100,73,36,10,3
6,0,0,0,0,0,0,406,196,121,57,24,8,5
7,0,0,0,0,0,1,560,161,62,25,5,3,0
8,0,0,0,0,0,18,772,22,5,0,0,0,0
9,0,0,0,0,18,130,667,1,0,1,0,0,0
10,0,0,0,2,55,256,503,1,0,0,0,0,0
11,1,0,0,7,106,349,354,0,0,0,0,0,0
12,1,1,0,12,184,368,251,0,0,0,0,0,0
13,0,0,0,32,228,357,200,0,0,0,0,0,0
14,0,0,0,51,245,314,208,0,0,0,0,0,0
15,0,0,0,51,232,317,218,0,0,0,0,0,0
16,0,0,0,37,224,338,218,1,0,0,0,0,0
17,0,0,0,21,156,350,290,1,0,0,0,0,0
18,0,0,0,2,72,351,392,1,0,0,0,0,0
19,0,0,0,0,15,207,587,9,0,0,0,0,0
20,0,0,0,0,1,33,748,34,2,0,0,0,0
21,0,0,0,0,0,3,609,137,51,12,4,1,1
22,0,0,0,0,0,0,325,241,133,71,31,11,6
23,0,0,0,0,0,0,272,227,149,82,50,21,17
Aside from the Time column, each column represents a category (i.e. -3, -2.5, -2, etc.). At time 0, category -3 appears 0 times while category 3 appears 19 times, and so on. I want my lines to represent the categories and show the frequency of each category over time on the graph (similar to this question, but instead of just crimeFreq I have multiple categories here).
Two other requirements:
My x-axis needs to run from 17 (17:00, or 5pm) to 9 (9:00, or 9am).
I only need the categories that are >= 0 (i.e. 0, 0.5, 1, etc.).
I have tried the solution above and the solution from this question, but without success. Some of my attempts are:
Attempt 1:
df = read.csv("data_summary.csv")
# Taking the rows in the order of time that I want (i.e. from 17:00 to 9:00)
row_to_take = c(18,19,20,21,22,23, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# Trying to plot it with the x-axis in the desired order
matplot(x = df$Time[row_to_take], y = df[row_to_take, 9:14], ylab = "Frequency", xlab = "Hour", type = c("b"), pch=3, col = 1:7, xaxt="n" )
axis(1, at = c(17, 18, 19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9), las=0)
legend("topleft", legend = 1:7, col=1:7, pch=3)
However, this attempt does not generate a correct graph (the x-axis is incorrect and the category represented by the black color is drawn twice).
My second attempt:
ggplot(df, aes(Time)) + geom_line(aes(y = 0, colour = "0")) +
geom_line(aes(y = 0.5, colour = "0.5")) + geom_line(aes(y = 1, colour = "1"))+
geom_line(aes(y = 1.5, colour = "1.5"))+ geom_line(aes(y = 2, colour = "2"))+
geom_line(aes(y = 2.5, colour = "2.5"))+ geom_line(aes(y = 3, colour = "3"))
This attempt has the same problem as the first one. Also, I don't know how to change the legend name for each color or the axis names (xlab and ylab don't work?).
Please suggest a simple solution. I'm very new to R and don't know much about advanced functions/packages. Thank you all in advance :)
You could use the lovely package ggplot2. First, you should make your dataframe in a longer format using pivot_longer and then you can assign each category as a line with color. You can filter the categories like this:
df <- read.table(text = "Time,-3,-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,3
0,0,0,0,0,0,0,288,224,148,78,37,23,19
1,0,0,0,0,0,0,321,208,128,74,55,20,11
2,0,0,0,0,0,0,326,212,128,80,46,20,5
3,0,0,0,0,0,0,345,209,131,73,36,17,6
4,0,0,0,0,0,0,364,201,117,77,38,15,5
5,0,0,0,0,0,0,390,205,100,73,36,10,3
6,0,0,0,0,0,0,406,196,121,57,24,8,5
7,0,0,0,0,0,1,560,161,62,25,5,3,0
8,0,0,0,0,0,18,772,22,5,0,0,0,0
9,0,0,0,0,18,130,667,1,0,1,0,0,0
10,0,0,0,2,55,256,503,1,0,0,0,0,0
11,1,0,0,7,106,349,354,0,0,0,0,0,0
12,1,1,0,12,184,368,251,0,0,0,0,0,0
13,0,0,0,32,228,357,200,0,0,0,0,0,0
14,0,0,0,51,245,314,208,0,0,0,0,0,0
15,0,0,0,51,232,317,218,0,0,0,0,0,0
16,0,0,0,37,224,338,218,1,0,0,0,0,0
17,0,0,0,21,156,350,290,1,0,0,0,0,0
18,0,0,0,2,72,351,392,1,0,0,0,0,0
19,0,0,0,0,15,207,587,9,0,0,0,0,0
20,0,0,0,0,1,33,748,34,2,0,0,0,0
21,0,0,0,0,0,3,609,137,51,12,4,1,1
22,0,0,0,0,0,0,325,241,133,71,31,11,6
23,0,0,0,0,0,0,272,227,149,82,50,21,17", header = TRUE, sep = ",", check.names = FALSE)
library(dplyr)
library(ggplot2)
library(tidyr)
df %>%
pivot_longer(cols = -Time) %>%
filter(as.numeric(name) >= 0) %>% # name is character after pivot_longer()
ggplot(aes(x = Time, y = value, colour = name)) +
geom_line() +
labs(x = "Time", y = "Value", colour = "Category")
Created on 2022-08-25 with reprex v2.0.2
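A note on the filtering step: pivot_longer() stores the old column names as character strings, and when R compares a character vector with a number it converts the number to a string and compares them as text, using the collating order of the current locale. Converting the names to numbers first makes the comparison numeric. A minimal base R sketch, using a few of the category labels from the CSV header:

```r
# A few category labels, as character strings (the type pivot_longer() produces)
name <- c("-3", "-0.5", "0", "0.5", "3")

# Convert to numeric before comparing so that ">= 0" is a numeric test
keep <- as.numeric(name) >= 0

kept <- name[keep]
print(kept)
```

Inside a dplyr pipeline the same idea is filter(as.numeric(name) >= 0).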
Reverse x-axis values
You could use scale_x_continuous with "reverse":
library(dplyr)
library(ggplot2)
library(tidyr)
df %>%
pivot_longer(cols = -Time) %>%
filter(as.numeric(name) >= 0) %>% # name is character after pivot_longer()
filter(Time >= 9 & Time <= 17) %>%
ggplot(aes(x = Time, y = value, colour = name)) +
geom_line() +
scale_x_continuous(trans = "reverse") +
labs(x = "Time", y = "Value", colour = "Category")
Created on 2022-08-25 with reprex v2.0.2
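The reversed scale covers the hours between 9 and 17. If the window should instead run from 17:00 through midnight to 09:00, one possible approach (a sketch, not the only way) is to plot "hours elapsed since 17:00" and relabel the axis with clock hours. The helper column since_17 and the object names are made up for this illustration; the values are the counts for category 0 from the question's table.

```r
library(ggplot2)

# Counts for category "0" from the question's table, hours 0 to 23
df0 <- data.frame(
  Time = 0:23,
  value = c(288, 321, 326, 345, 364, 390, 406, 560, 772, 667, 503, 354,
            251, 200, 208, 218, 218, 290, 392, 587, 748, 609, 325, 272)
)

# Hours elapsed since 17:00, wrapping past midnight
# (17 -> 0, 23 -> 6, 0 -> 7, 9 -> 16)
df0$since_17 <- (df0$Time - 17) %% 24

# Keep only the 17:00 to 09:00 window (16 hours long)
night <- df0[df0$since_17 <= 16, ]

# Plot against elapsed hours, but label the axis with the clock hour
p <- ggplot(night, aes(x = since_17, y = value)) +
  geom_line() +
  scale_x_continuous(
    breaks = 0:16,
    labels = as.character((0:16 + 17) %% 24)
  ) +
  labs(x = "Hour", y = "Frequency")
p
```

With the long-format data from pivot_longer(), the same since_17 column can be added with mutate() and colour = name kept in aes().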
I combined the answer above from @Quinten and this answer: Setting limits with scale_x_time and was able to come up with this:
df = read.csv("data_summary.csv", check.names = FALSE, header = TRUE)
df$Time <- as.POSIXct(df$Time)
df %>%
pivot_longer(cols = -Time) %>%
filter(as.numeric(name) > 0) %>% # name is character after pivot_longer()
ggplot(aes(x=Time, y=value, color=name)) +
geom_line()+
labs(x="Time", y="Frequency", title="")+
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,family="Times",
face="bold", size=12),
axis.text.y = element_text(family="Times", face="bold", size=12),
strip.text = element_text(size=6, face="bold")) +
scale_y_continuous(expand = expansion(mult = c(0, .1))) +
scale_x_datetime(date_labels = '%T',
limits = c(as.POSIXct('2022-08-24 22:00:00', tz = 'UTC'),
as.POSIXct('2022-08-25 14:00:00', tz = 'UTC')),
breaks = '1 hours')
The code yields this graph:
which is what I need. Note that I had to change my "Time" column into the format 0:00, 1:00, etc. so that I could use as.POSIXct(df$Time) on it.

How to customize bar chart in ggplot R?

I have a mirrored bar chart, and I want to avoid the mirrored bars and instead have the same graph with two bars for each category:
Negative and positive (positive values first on the right side, with negative values below on the left side)
Colours must be determined by the 'Model' categorical variable, and positive and negative values need to look different: fully coloured bars = positive; outlined (in the same colour) but unfilled bars = negative.
Also, USA-based values must be on top and Canada-based values below.
df <- data.frame (Origin = c("Canada", "Canada","Canada", "Canada","Canada", "Canada","USA","USA","USA","USA","USA","USA"),
Model = c("A","B","C","D","E","F","A","B","C","D","E","F"),
poschange = c(60, 45,34,56, 65, 44,40, 55, 35, 24,34,12),
negchange = c(-5,-2,-0.5,-2,-1,-0.05,-1,-3,-0.1,-3,-1.5,-0.9))
require(dplyr)
require(ggplot2)
require(tidyr)
df2 <- df %>% pivot_longer(., cols=c('poschange','negchange'),
names_to = 'value_category')
df2 <- df2 %>% mutate(Groups = paste(Origin, Model))
df2 <- df2 %>% mutate(label_position=ifelse(value>0, value-5,value-8)) # adjusting label position
df2 %>% arrange(value) %>% ggplot(aes(x=value, y=reorder(Groups,value),
fill=value_category,
group=value_category))+
geom_col(width=0.75)
coord_flip()
Output:
Desired output (something like this but colours must be corresponding to Model cat. variable):
Maybe something like this?
Use an ifelse statement to label the negative values as "white"
To have a fill of white, use scale_fill_manual with a my_color palette
To avoid "mirrored" bars, use position = "dodge"
To have negative and positive values side-by-side, you need to swap the x and y arguments in ggplot
To avoid overlapping text on the x-axis, use theme(axis.text.x = element_text(angle = 90))
Use the breaks argument in both scale_xxx_manual functions to remove the "white" label from the legend
library(tidyverse)
df <- data.frame (Origin = c("Canada", "Canada","Canada", "Canada","Canada", "Canada","USA","USA","USA","USA","USA","USA"),
Model = c("A","B","C","D","E","F","A","B","C","D","E","F"),
poschange = c(60, 45,34,56, 65, 44,40, 55, 35, 24,34,12),
negchange = c(-5,-2,-0.5,-2,-1,-0.05,-1,-3,-0.1,-3,-1.5,-0.9))
df2 <- df %>% pivot_longer(., cols=c('poschange','negchange'),
names_to = 'value_category') %>%
mutate(Groups = paste(Origin, Model),
value_category = factor(value_category, levels = c("negchange", "poschange")))
my_color = c("A" = '#7fc97f', "B" = '#beaed4', "C" = '#fdc086',
"D" = '#ffff99', "E" = '#386cb0', "F" = '#f0027f', "white" = "white")
ggplot(df2, aes(value, Model,
fill = ifelse(value_category == "negchange", "white", Model),
color = Model)) +
geom_col(position = "dodge") +
scale_fill_manual(values = my_color, breaks = df2$Model) +
scale_color_manual(values = my_color, breaks = df2$Model) +
labs(fill = "Model") +
facet_grid(Origin ~ ., switch = "y") +
theme(axis.text.x = element_text(angle = 90),
strip.background = element_rect(fill = "white"),
strip.placement = "outside",
strip.text.y.left = element_text(angle = 0),
panel.spacing = unit(0, "lines"))
Created on 2022-05-03 by the reprex package (v2.0.1)

Population pyramid with gender and comparing across two time periods with ggplot2

I am new to R & ggplot2 and wondering if it is possible to do a population pyramid for Male & Female that compares each gender across two time periods. Please see the screenshot for the details. I have found quite a few resources on this site that show how to build population pyramids, but they all use only one variable, i.e. gender. I want to compare gender and time periods in the same chart.
Any help is greatly appreciated. Thank you.
Here is an idea. First you didn't prepare an example dataset. Therefore I created this df. Note that the Values (number of people) are negative for women.
df <- data.frame(Gender = rep(c("M", "F"), each = 20),
Age = rep(c("0-10", "11-20", "21-30", "31-40", "41-50",
"51-60", "61-70", "71-80", "81-90", "91-100"), 4),
Year = factor(rep(c(2009, 2010, 2009, 2010), each= 10)),
Value = sample(seq(50, 100, 5), 40, replace = TRUE)) %>%
mutate(Value = ifelse(Gender == "F", Value *-1 , Value))
The next step is to put everything in a bar plot. The interaction() function helps to colour the bars by Gender and Year, and the colours can be specified in scale_fill_manual(). Alternatively, you can use fill = Gender and alpha = Year if you don't want to use the interaction.
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge") +
scale_y_continuous(labels = abs,
expand = c(0, 0)) +
scale_fill_manual(values = hcl(h = c(15,195,15,195),
c = 100,
l = 65,
alpha=c(0.4,0.4,1,1)),
name = "") +
coord_flip() +
facet_wrap(.~ Gender,
scales = "free_x",
strip.position = "bottom") +
theme_minimal() +
theme(legend.position = "bottom",
panel.spacing.x = unit(0, "pt"),
strip.background = element_rect(colour = "black"))
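The alternative mentioned above (fill = Gender and alpha = Year) could look roughly like the sketch below. It rebuilds the same kind of random example data, with set.seed() added only so the values are reproducible, and the alpha levels 0.4 and 1 are an assumption carried over from the hcl() call.

```r
library(ggplot2)

set.seed(1)  # only so the random example values are reproducible

# Same structure as the example data above; values are random, not real counts
df <- data.frame(
  Gender = rep(c("M", "F"), each = 20),
  Age = rep(c("0-10", "11-20", "21-30", "31-40", "41-50",
              "51-60", "61-70", "71-80", "81-90", "91-100"), 4),
  Year = factor(rep(c(2009, 2010, 2009, 2010), each = 10)),
  Value = sample(seq(50, 100, 5), 40, replace = TRUE)
)

# Negative values for women so their bars point to the left
df$Value <- ifelse(df$Gender == "F", -df$Value, df$Value)

# Gender sets the hue, Year sets the transparency
p <- ggplot(df, aes(x = Age, y = Value, fill = Gender, alpha = Year)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = abs, expand = c(0, 0)) +
  scale_alpha_manual(values = c("2009" = 0.4, "2010" = 1)) +
  coord_flip() +
  facet_wrap(~ Gender, scales = "free_x", strip.position = "bottom") +
  theme_minimal() +
  theme(legend.position = "bottom",
        panel.spacing.x = unit(0, "pt"))
p
```

Compared with interaction(), this gives two separate legends (one for Gender, one for Year) instead of four combined entries.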

Is there a way to create borders around rectangles in ggplot2 with geom_raster?

I'm creating a visualization of missing data by slightly tweaking some of the code from the missmap function in the Amelia package. I want to draw borders around my rectangles, but I can't figure out a way to do that in ggplot2.
I found the function "borders()" but that appears to be related to map work. I also tried using geom_rect, but it seems like that would require me to specify mins and maxes. Geom_raster seems to be doing exactly what I need, but I can't figure out how to specify borders.
This example code creates the visualization that I'm imagining, but I have more variables in the "real" version and I'd like to be able to outline each variable (var1, var2, etc.) with a line (border).
#Dataset
missmap_data_test <- data.frame(var1 = c(11, 26, NA, NA, 15),
var2 = c(NA, NA, 0, NA, 1))
#Create Function
ggplot_missing <-
function(x){
x %>%
is.na %>%
melt %>%
ggplot(data = .,
aes(x = Var2,
y = Var1)) +
geom_raster(aes(fill = value)) +
scale_fill_grey(name = "",
labels = c("Present","Missing")) +
theme_minimal() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
labs(x = "Variables in Dataset",
y = "Observations")
}
#Feed the function my new data
ggplot_missing(missmap_data_test)
As @Axeman suggests, geom_tile does the job. I've updated your code to give an example below. Here, colour defines the colour of the border, while size defines its thickness.
#Dataset
missmap_data_test <- data.frame(var1 = c(11, 26, NA, NA, 15),
var2 = c(NA, NA, 0, NA, 1))
# Load libraries
library(dplyr)
library(ggplot2)
library(reshape2)
#Create Function
ggplot_missing <- function(x){
x %>%
is.na %>%
melt %>%
ggplot(data = .,
aes(x = Var2,
y = Var1)) +
geom_tile(aes(fill = value), colour = "#FF3300", size = 2) +
scale_fill_grey(name = "",
labels = c("Present","Missing")) +
theme_minimal() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
labs(x = "Variables in Dataset",
y = "Observations")
}
#Feed the function my new data
ggplot_missing(missmap_data_test)
Created on 2019-05-30 by the reprex package (v0.3.0)
If you're getting notches in the top left corner (discussed here and apparent in the plot above), you may want to update to the development version of ggplot2. That is, devtools::install_github("tidyverse/ggplot2"). For example, compare the plot above with the plot below:
Update
I assume this is a toy example, so I've tried to come up with a generic solution. Here, I've created a function called boxy that will make a data frame for geom_rect based on the original data frame.
#Dataset
missmap_data_test <- data.frame(var1 = c(11, 26, NA, NA, 15),
var2 = c(NA, NA, 0, NA, 1))
# Function for making box data frame
boxy <- function(df){
data.frame(xmin = seq(0.5, ncol(df) - 0.5),
xmax = seq(1.5, ncol(df) + 0.5),
ymin = 0.5, ymax = nrow(df) + 0.5)
}
# Load libraries
library(dplyr)
library(ggplot2)
library(reshape2)
#Create Function
ggplot_missing <- function(x){
df_box <- boxy(x)
df_rast <- x %>% is.na %>% melt
ggplot() +
geom_raster(data = df_rast,
aes(x = Var2,
y = Var1,
fill = value)) +
geom_rect(data = df_box,
aes(xmin = xmin, xmax = xmax,
ymin = ymin, ymax = ymax),
colour = "#FF3300", fill = NA, size = 3) +
scale_fill_grey(name = "",
labels = c("Present","Missing")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "Variables in Dataset",
y = "Observations")
}
#Feed the function my new data
ggplot_missing(missmap_data_test)
Created on 2019-05-30 by the reprex package (v0.3.0)
If you add a third variable (i.e., column) to your data frame, you get something like this:
