How to group a variable by quality number? - r

Here's a made-up dataset that demonstrates the general idea of what I'm working with:
library(ggplot2)
Quality <- sample(1:4, 300, replace = TRUE)
reader_ID <- rep(1:3, each = 100)
df <- data.frame(Quality, reader_ID)
df
quality_percentage <- ggplot(df, aes(x = reader_ID, y = Quality, fill = Quality)) +
geom_bar(position="fill", stat="identity")
quality_percentage
Here is the graph it produced. I'm trying to get each quality level grouped together in one block per bar, instead of having it scattered across many separate slices.

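Because Quality is numeric, fill = Quality uses a continuous colour scale and does not split the data into groups, so each bar is stacked from the rows in the order they appear in the data frame. Sorting the data frame by Quality before plotting puts equal values next to each other: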
ggplot(df[order(df$Quality),],
aes(x = reader_ID, y = Quality, fill = Quality)) +
geom_col(position = "fill")
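Another option, sketched below, is to aggregate to one row per reader and quality level and map fill to factor(Quality), so that each level becomes its own discrete group. This assumes the y values should still be the summed Quality values, as in the question's y = Quality; the seed and the names seg and total are illustrative only.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
df <- data.frame(
  Quality   = sample(1:4, 300, replace = TRUE),
  reader_ID = rep(1:3, each = 100)
)

# One row per reader and quality level; `total` keeps the question's
# y = Quality meaning (sum of the Quality values in that segment)
seg <- aggregate(
  list(total = df$Quality),
  by  = list(reader_ID = df$reader_ID, Quality = df$Quality),
  FUN = sum
)

# A discrete fill (factor) makes each quality level its own group,
# so position = "fill" stacks exactly one segment per level in each bar
p <- ggplot(seg, aes(x = factor(reader_ID), y = total, fill = factor(Quality))) +
  geom_col(position = "fill") +
  labs(x = "reader_ID", fill = "Quality")
p
```

If you want the share of observations per quality level rather than the share of summed Quality values, aggregate with a count (e.g. FUN = length) instead of sum.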

Related

overlay lexis_grid with heatmap

I am trying to make a Lexis grid (with lexis_grid) for a series of events in a synthetic cohort of people aged 0 to 80 over the period 1900-2021. What I'd like to get is something that looks a little like this (taken from this article):
I have created some dummy data below:
library('dplyr')
library('LexisPlotR')
library('lubridate')
library('ggplot2')
df <- data.frame(
year = sample(1900:2021, 1000, replace = TRUE),
age = sample(0:80, 1000, replace = TRUE),
event = sample(0:5, 1000, replace = TRUE)
)
mylexis <- lexis_grid(year_start = 1900,
year_end = 2021,
age_start = 0,
age_end = 80,
delta = 10
)
And I can create a heatmap in ggplot:
ggplot(df, aes(x = year, y = age, fill = event)) + geom_tile()
But I have been unsuccessful at combining them. These were my best guesses:
mylexis + geom_tile(df, mapping = aes(x = year(year), y = age, fill = event))
mylexis + ggplot(df, aes(x = year, y = age, fill = event)) + geom_tile()
Any advice on where to go from here?
One option would be to convert your year variable to a proper date, since the grid produced by lexis_grid() uses dates rather than plain numbers on its x axis:
library(ggplot2)
mylexis +
geom_tile(data = df, mapping = aes(x = as.Date(paste0(year, "-01-01")), y = age, fill = event))
EDIT: A somewhat hacky but quick way to change the order of the three layers is to manipulate the layers of the ggplot2 object directly, i.e. move the geom_tile (layer 3) to the first position so it is drawn underneath the grid. (I have to admit, though, that for your example data the difference is hardly visible.)
library(ggplot2)
p <- mylexis +
geom_tile(data = df, mapping = aes(x = as.Date(paste0(year, "-01-01")), y = age, fill = event))
p$layers <- p$layers[c(3, 1, 2)]
p
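For readers without LexisPlotR, the sketch below reproduces the same two ideas (year converted to a Date, tile layer moved to the front) with plain ggplot2. The vertical and horizontal lines every 10 years are only a hypothetical stand-in for lexis_grid(), which also draws diagonal lines; the seed and the date column are assumptions.

```r
library(ggplot2)

set.seed(1)  # assumption: seed only for reproducibility
df <- data.frame(
  year  = sample(1900:2021, 1000, replace = TRUE),
  age   = sample(0:80, 1000, replace = TRUE),
  event = sample(0:5, 1000, replace = TRUE)
)
# Convert the integer year to 1 January of that year so it can share
# a Date x scale with the grid
df$date <- as.Date(paste0(df$year, "-01-01"))

# Hypothetical stand-in for the Lexis grid: plain lines every 10 years
p <- ggplot() +
  geom_vline(xintercept = as.Date(paste0(seq(1900, 2020, 10), "-01-01")),
             colour = "grey40") +
  geom_hline(yintercept = seq(0, 80, 10), colour = "grey40") +
  geom_tile(data = df, aes(x = date, y = age, fill = event))

# Layers are drawn in list order; moving the tiles (layer 3) to the
# front means the grid lines are drawn on top of them
p$layers <- p$layers[c(3, 1, 2)]
p
```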

How create a box plot + line plot in a single plot using ggplot2

I want to create a box plot + line plot in a single plot using ggplot2
This is my code so far:
library(ggplot2)
dat <- data.frame(day = c(0,0,0,0,0,0,10,10,10,10,10,10,14,14,14,14,14,14,21,21,21,21,21,21,28,28,28,28,28,28,35,35,35,35,35,35,42,42,42,42,42,42), group = c('Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP'), score = c(37.5,43,7,63,26,15,17,16,43,26,53,26,26,26,43,10,6,15,18,9,10,4,8,18,60,26,20,12.5,9,43,43,43,11,10,7,60,43,43,32,10.5,8,57.5))
g1 = ggplot(data = dat, aes(x = factor(day), y = score)) +
geom_boxplot(aes(fill = group))
g1
For the box plot, I want the scores of the different treatments (groups) to be shown separately, so I set x = factor(day).
But for the line plot, I want each day's score to be the average over both treatments (groups) on that day.
This is how my plot looks now:
This is how I want my plot to look:
How can I do this? Thank you so much!
#Libraries
library(tidyverse)
#Data
dat <- data.frame(day = c(0,0,0,0,0,0,10,10,10,10,10,10,14,14,14,14,14,14,21,21,21,21,21,21,28,28,28,28,28,28,35,35,35,35,35,35,42,42,42,42,42,42), group = c('Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP','Saline','RP','Saline','Saline','RP','RP'), score = c(37.5,43,7,63,26,15,17,16,43,26,53,26,26,26,43,10,6,15,18,9,10,4,8,18,60,26,20,12.5,9,43,43,43,11,10,7,60,43,43,32,10.5,8,57.5))
#How to
dat %>%
ggplot(aes(x = factor(day), y = score)) +
geom_boxplot(aes(fill = group))+
geom_line(
data = dat %>%
group_by(day) %>%
summarise(score = mean(score, na.rm = TRUE)), # daily average, as asked; median() is a robust alternative
aes(group = 1),
linewidth = 1, # on ggplot2 < 3.4.0 use size = 1
col = "red"
)
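A possible alternative, sketched below, lets ggplot2 compute the daily average itself with stat_summary(fun = mean), so no pre-summarised data frame is needed. It assumes ggplot2 3.3.0 or later (where the argument is fun rather than fun.y); the data are the question's, rewritten compactly with rep().

```r
library(ggplot2)

# Same data as in the question, written more compactly with rep()
dat <- data.frame(
  day   = rep(c(0, 10, 14, 21, 28, 35, 42), each = 6),
  group = rep(c("Saline", "RP", "Saline", "Saline", "RP", "RP"), times = 7),
  score = c(37.5, 43, 7, 63, 26, 15, 17, 16, 43, 26, 53, 26, 26, 26, 43, 10,
            6, 15, 18, 9, 10, 4, 8, 18, 60, 26, 20, 12.5, 9, 43, 43, 43, 11,
            10, 7, 60, 43, 43, 32, 10.5, 8, 57.5)
)

p <- ggplot(dat, aes(x = factor(day), y = score)) +
  # One box per day and treatment, dodged side by side
  geom_boxplot(aes(fill = group)) +
  # Mean score per day across both treatments; group = 1 joins all days
  # into a single line instead of one group per x level
  stat_summary(aes(group = 1), fun = mean, geom = "line", colour = "red") +
  stat_summary(aes(group = 1), fun = mean, geom = "point", colour = "red") +
  labs(x = "day")
p
```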

Barplot side by side and line charts in the same plot

I want to create a plot in R that combines side-by-side bars and line charts, like this:
I tried:
Total <- c(584,605,664,711,759,795,863,954,1008,1061,1117,1150)
Infected <- c(366,359,388,402,427,422,462,524,570,560,578,577)
Recovered <- c(212,240,269,301,320,359,385,413,421,483,516,548)
Death <- c(6,6,7,8,12,14,16,17,17,18,23,25)
day <- itemizeDates(startDate="01.04.20", endDate="12.04.20")
df <- data.frame(Day=day, Infected=Infected, Recovered=Recovered, Death=Death, Total=Total)
value_matrix = matrix(, nrow = 2, ncol = 12)
value_matrix[1,] = df$Recovered
value_matrix[2,] = df$Death
plot(c(1:12), df$Total, ylim=c(0,1200), xlim=c(1,12), type = "b", col="peachpuff", xaxt="n", xlab = "", ylab = "")
points(c(1:12), df$Infected, type = "b", col="red")
barplot(value_matrix, beside = TRUE, col = c("green", "black"), width = 0.35, add = TRUE)
But the bars don't line up with the line chart. I guess it would be easier with ggplot2, but I don't know how. Could anyone help me? Thanks a lot in advance!
With ggplot2, the margins are handled nicely for you, but you'll need the data in two separate long-format data frames (one for the lines, one for the bars). Reshape from wide to long with tidyr::gather, tidyr::pivot_longer, reshape2::melt, reshape, or whatever you prefer.
library(tidyr)
library(ggplot2)
df <- data.frame(
Total = c(584,605,664,711,759,795,863,954,1008,1061,1117,1150),
Infected = c(366,359,388,402,427,422,462,524,570,560,578,577),
Recovered = c(212,240,269,301,320,359,385,413,421,483,516,548),
Death = c(6,6,7,8,12,14,16,17,17,18,23,25),
day = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = 'day')
)
ggplot(
tidyr::gather(df, Population, count, Total:Infected),
aes(day, count, color = Population, fill = Population)
) +
geom_line() +
geom_point() +
geom_col(
data = tidyr::gather(df, Population, count, Recovered:Death),
position = 'dodge', show.legend = FALSE
)
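Since tidyr::gather() is superseded, here is a sketch of the same plot built with tidyr::pivot_longer(); the intermediate names lines_long and bars_long are illustrative.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  Total     = c(584, 605, 664, 711, 759, 795, 863, 954, 1008, 1061, 1117, 1150),
  Infected  = c(366, 359, 388, 402, 427, 422, 462, 524, 570, 560, 578, 577),
  Recovered = c(212, 240, 269, 301, 320, 359, 385, 413, 421, 483, 516, 548),
  Death     = c(6, 6, 7, 8, 12, 14, 16, 17, 17, 18, 23, 25),
  day       = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = "day")
)

# Long data for the lines (Total, Infected) and for the bars (Recovered, Death)
lines_long <- pivot_longer(df, c(Total, Infected),
                           names_to = "Population", values_to = "count")
bars_long <- pivot_longer(df, c(Recovered, Death),
                          names_to = "Population", values_to = "count")

p <- ggplot(lines_long, aes(day, count, colour = Population)) +
  geom_line() +
  geom_point() +
  # Bars use their own data; position = "dodge" places them side by side
  geom_col(data = bars_long, aes(fill = Population),
           position = "dodge", show.legend = FALSE)
p
```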
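Another way to do it is to gather twice before plotting. Each row then appears twice (once per Resolution), but the duplicates are drawn on top of each other, so you get the same plot. Not sure if this is easier or harder to understand.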
df %>%
tidyr::gather(Population, count, Total:Infected) %>%
tidyr::gather(Resolution, count2, Recovered:Death) %>%
ggplot(aes(x = day, y = count, color = Population)) +
geom_line() +
geom_point() +
geom_col(
aes(y = count2, color = Resolution, fill = Resolution),
position = 'dodge', show.legend = FALSE
)
You can actually plot the lines and points without reshaping by making separate calls for each, but to dodge bars (or get legends), you'll definitely need to reshape.
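If you would rather stay in base graphics, the mismatch in the question comes from barplot() placing its bars at its own x positions rather than at 1:12. A minimal sketch: draw the bars first, keep the midpoints barplot() returns, and draw the lines at the centre of each day's pair of bars. The names mids, centres and days are illustrative.

```r
Total     <- c(584, 605, 664, 711, 759, 795, 863, 954, 1008, 1061, 1117, 1150)
Infected  <- c(366, 359, 388, 402, 427, 422, 462, 524, 570, 560, 578, 577)
Recovered <- c(212, 240, 269, 301, 320, 359, 385, 413, 421, 483, 516, 548)
Death     <- c(6, 6, 7, 8, 12, 14, 16, 17, 17, 18, 23, 25)
days      <- seq(as.Date("2020-04-01"), by = "day", length.out = 12)

# Draw the bars first; barplot() returns the bar midpoints
# (a 2 x 12 matrix here because beside = TRUE)
mids <- barplot(rbind(Recovered, Death), beside = TRUE,
                col = c("green", "black"), ylim = c(0, 1200),
                names.arg = format(days, "%d.%m"))

# Centre of each day's pair of bars, used as the x values of the lines
centres <- colMeans(mids)
lines(centres, Total, type = "b", col = "peachpuff")
lines(centres, Infected, type = "b", col = "red")
```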

plot multiple lines in ggplot

I need to plot hourly data for different days using ggplot, and here is my dataset:
The data consists of hourly observations, and I want to plot each day's observation into one separate line.
Here is my code:
xbj1 = bj[c(1:24),c(1,6)]
xbj2 = bj[c(25:48),c(1,6)]
xbj3 = bj[c(49:72),c(1,6)]
ggplot()+
geom_line(data = xbj1,aes(x = Date, y= Value), colour="blue") +
geom_line(data = xbj2,aes(x = Date, y= Value), colour = "grey") +
geom_line(data = xbj3,aes(x = Date, y= Value), colour = "green") +
xlab('Hour') +
ylab('PM2.5')
Please advise.
I'll make some fake data (I won't try to transcribe yours) first:
set.seed(2)
x <- data.frame(
Date = rep(Sys.Date() + 0:1, each = 24),
# Year, Month, Day ... are not used here
Hour = rep(0:23, times = 2),
Value = sample(1e2, size = 48, replace = TRUE)
)
This is a straightforward ggplot2 plot, with one coloured line per day:
library(ggplot2)
ggplot(x) +
geom_line(aes(Hour, Value, color = as.factor(Date))) +
scale_color_discrete(name = "Date")
Alternatively, give each day its own panel with facet_grid():
ggplot(x) +
geom_line(aes(Hour, Value)) +
facet_grid(Date ~ .)
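If the data have no date column and each day is simply a block of 24 consecutive rows (as the question's row slicing suggests), one option is to derive a day index from the row number and map it to colour, which replaces the three separate geom_line() calls. The data frame below is a hypothetical stand-in for the asker's bj with only Hour and Value columns.

```r
library(ggplot2)

set.seed(2)
# Hypothetical stand-in for `bj`: 72 consecutive hourly rows (3 days)
bj <- data.frame(
  Hour  = rep(0:23, times = 3),
  Value = sample(1e2, size = 72, replace = TRUE)
)
# Label every block of 24 consecutive rows as one day: 1, 2, 3
bj$day <- (seq_len(nrow(bj)) - 1) %/% 24 + 1

# A discrete colour gives one line per day, plus a legend
p <- ggplot(bj, aes(x = Hour, y = Value, colour = factor(day))) +
  geom_line() +
  labs(x = "Hour", y = "PM2.5", colour = "Day")
p
```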
I highly recommend you find good tutorials for ggplot2, such as http://www.cookbook-r.com/Graphs/. Others exist, many quite good.

Plot summary of unique observations with ggplot

Is it possible to count unique observations within the ggplot call itself? For instance, can I get the same result as the code below without the middle line (the plyr::count() step)? My attempts so far, e.g. using geom_histogram with stat='bin', have failed.
set.seed(1)
d = data.frame(year = sample(2005:2009, 50, prob = 1:5, replace = TRUE),
group = sample(letters, 50, prob = 1:26, replace = TRUE))
d2 = plyr::count(unique(d)$year)
ggplot(d2, aes(x, freq)) + geom_bar(stat='identity') + labs(x='year', y='count of groups')
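Pass the de-duplicated data to geom_bar(), which counts the rows at each x value (via stat_count()). Current ggplot2 versions only accept a continuous x in stat_bin(), so it raises an error for a factor like as.factor(year):
ggplot(unique(d), aes(x = as.factor(year))) +
geom_bar() +
labs(x='year', y='count of groups')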
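To check that the bars match the question's plyr::count() result, the sketch below compares the counts ggplot2 computed (read back with layer_data()) against a plain table() of the unique years. The names plotted and expected are illustrative.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(year  = sample(2005:2009, 50, prob = 1:5, replace = TRUE),
                group = sample(letters, 50, prob = 1:26, replace = TRUE))

p <- ggplot(unique(d), aes(x = factor(year))) +
  geom_bar() +
  labs(x = "year", y = "count of groups")

# Counts computed by geom_bar(), one per year in factor-level order
plotted <- layer_data(p)$count
# Reference counts without plyr: years of the de-duplicated rows
expected <- as.vector(table(unique(d)$year))
all.equal(as.numeric(plotted), as.numeric(expected))
```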
