I have a dataframe with paired scores in two columns, each scoring patients on a 1-8 scale (where 1 = managing well and 8 = terminally ill).
One score is given by the patient and one by the clinician.
Sample data:
df <- data.frame(Patient = c(1,1,2,4,5,3,2,6,7,6,3,4,2,3,5,6,7,3,8,1), Clinican= c(1,2,2,5,4,5,4,4,4,2,3,5,4,6,5,4,3,7,7,1))
I'd like to create a bar chart similar to the one below using my dataset.
Any help would be much appreciated.
I believe I need tidyr's pivot_longer(), similar to this post:
geom_bar two datasets together in R
Here is a solution using pivot_longer and geom_bar as you asked.
Libraries
library(dplyr)
library(tidyr)
library(ggplot2)
Solution
You can replace "name" and "value" with whatever column names you prefer.
Also, the x-axis is categorical, so we have to convert value to a factor with mutate().
You can then recode the factor levels to the labels you need (e.g., 'well', 'very fit'...).
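df %>%
  # reshape to long format: one row per (rater, score) pair
  pivot_longer(Patient:Clinican, names_to = "name", values_to = "value") %>%
  # scores are categories on the x-axis, so make them a factor
  mutate(value = factor(value)) %>%
  ggplot(aes(x = value, fill = name)) +
  geom_bar(position = "dodge")   # side-by-side counts: patient vs clinician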
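A minimal sketch of the recoding step mentioned above, assuming you want every score from 1 to 8 on the axis even when one rater never used it. The label text is hypothetical (only the two endpoints come from the question), and "rater"/"score" are just illustrative column names. position_dodge(preserve = "single") keeps the bars the same width when only one rater has a count at a given score.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(Patient  = c(1,1,2,4,5,3,2,6,7,6,3,4,2,3,5,6,7,3,8,1),
                 Clinican = c(1,2,2,5,4,5,4,4,4,2,3,5,4,6,5,4,3,7,7,1))

# Hypothetical labels: only the endpoints (1 and 8) are defined in the question
score_labels <- c("1 (managing well)", as.character(2:7), "8 (terminally ill)")

long <- df %>%
  pivot_longer(Patient:Clinican, names_to = "rater", values_to = "score") %>%
  # fix all 8 levels so scores nobody gave still get a slot on the axis
  mutate(score = factor(score, levels = 1:8, labels = score_labels))

p <- ggplot(long, aes(x = score, fill = rater)) +
  geom_bar(position = position_dodge(preserve = "single")) +
  scale_x_discrete(drop = FALSE)   # keep empty levels visible
p
```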
Related
Need help with a problem.
I am making a ggplot from a dplyr pipeline, and I need to group by one categorical variable while also facet-wrapping by another.
My thought process was this:
d %>%
group_by(Grade) %>%
summarise(TotalPay = sum(PaymentsReceived)) %>%
ggplot(aes(y = Grade, x= TotalPay)) +
geom_col(fill = c(2:16), color = 'Black') +
facet_wrap(~ Status)
In this case I want to group the horizontal bars by the 'Grade' variable but also facet-wrap by the 'Status' variable. However, facet_wrap() then fails, because after group_by() and summarise() the Status variable is no longer in the data set.
Any direction would help.
Thanks.
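One common way around this, sketched here under assumptions, is to group by both variables so that summarise() keeps Status as a column for facet_wrap(). The data frame below is a hypothetical stand-in for `d` (only the column names come from the question). Mapping fill = Grade also replaces the fixed fill = c(2:16) vector, which only works when the number of bars happens to match its length.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `d`; values are made up for illustration
d <- data.frame(
  Grade = c("A", "A", "B", "B", "C", "C", "A", "B"),
  Status = c("Current", "Late", "Current", "Late",
             "Current", "Late", "Current", "Current"),
  PaymentsReceived = c(100, 50, 80, 20, 60, 10, 40, 30)
)

totals <- d %>%
  # grouping by both variables keeps Status in the summarised data
  group_by(Grade, Status) %>%
  summarise(TotalPay = sum(PaymentsReceived), .groups = "drop")

p <- ggplot(totals, aes(y = Grade, x = TotalPay, fill = Grade)) +
  geom_col(color = "black") +
  facet_wrap(~ Status)
p
```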
I can see many posts on this topic, but none addresses this question; apologies if I missed a relevant answer. I have a large protein expression dataset, with samples as the columns, like so:
rep1_0hr, rep1_16hr, rep1_24hr, rep1_48hr, rep1_72hr .....
and 2000+ proteins in the rows. Each sample corresponds to a different developmental time point.
If it is of any interest, the original dataset is 'mulvey2015' from the pRolocdata package in R, which I converted to a SummarizedExperiment object in RStudio.
I first ran k-means clustering on the data (the assay() of a SummarizedExperiment object) to get 12 clusters:
k_mul <- kmeans(scale(assay(mul)), centers = 12, nstart = 10)
Then:
summary(k_mul)
produced the expected output.
I would like the visualisation to look like this, with samples on the x-axis and expression on the y-axis. The plots look like they have been generated using facet_wrap() in ggplot:
For ggplot, the data need to be provided as a data frame with a column for the cluster identity of each protein, and in long format. I tried pivoting the original dataset with pivot_longer(), but of course that produces a very large number of data points. Moreover, the image I posted shows that in any one plot the number of coloured lines is smaller than the total number of proteins, which suggests that dimension reduction may have been applied to the dataset first, but I am unsure. So far I have been running the kmeans algorithm without dimension reduction. Could I get guidance on how to produce this plot?
Here is my attempt at reverse engineering the plot:
library(pRolocdata)
library(dplyr)
library(tidyverse)
library(magrittr)
library(ggplot2)
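data(mulvey2015)   # load the MSnSet from pRolocdata
set.seed(1)        # kmeans() uses random starts; fix the seed for reproducible clusters
mulvey2015 %>%
  Biobase::assayData() %>%
  magrittr::extract2("exprs") %>%          # expression matrix: proteins x samples
  data.frame(check.names = FALSE) %>%
  tibble::rownames_to_column("prot_id") %>%
  # cluster the proteins (rows) on their expression profiles
  mutate(.,
         cl = kmeans(select(., -prot_id),
                     centers = 12,
                     nstart = 10) %>%
           magrittr::extract2("cluster") %>%
           as.factor()) %>%
  # long format: one row per protein x time point
  pivot_longer(cols = !c(prot_id, cl),
               names_to = "Timepoint",
               values_to = "Expression") %>%
  ggplot(aes(x = Timepoint, y = Expression, color = cl)) +
  geom_line(aes(group = prot_id)) +        # one line per protein
  facet_wrap(~ cl, ncol = 4)               # one panel per cluster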
As for your questions: pivot_longer() is usually quite fast; it tends to run into trouble only with non-unique key combinations or with data-type conversion between the pivoted columns. The plot can be improved by:
tweaking the alpha parameter of geom_line() (e.g. alpha = 0.5), to give an idea of the density of lines
finding a good abbreviation and order for Timepoint
rotating the x-axis labels via theme(axis.text.x = ...)
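The three tweaks listed above can be sketched on a small toy matrix instead of mulvey2015, so the example runs without pRolocdata. Everything here is hypothetical: 30 random "proteins", 5 time points and 3 clusters. Converting Timepoint to a factor with explicit levels locks the x-axis to the original column order, since ggplot otherwise sorts character values alphabetically (which would, for example, put "16hr" before "8hr").

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)
# Hypothetical toy data: 30 proteins x 5 time points
tp <- c("rep1_0hr", "rep1_16hr", "rep1_24hr", "rep1_48hr", "rep1_72hr")
m <- matrix(rnorm(30 * 5), nrow = 30,
            dimnames = list(paste0("P", 1:30), tp))

long <- as.data.frame(m) %>%
  tibble::rownames_to_column("prot_id") %>%
  mutate(cl = factor(kmeans(m, centers = 3, nstart = 10)$cluster)) %>%
  pivot_longer(cols = all_of(tp), names_to = "Timepoint",
               values_to = "Expression") %>%
  # keep the original column order on the x-axis
  mutate(Timepoint = factor(Timepoint, levels = tp))

p <- ggplot(long, aes(x = Timepoint, y = Expression, colour = cl)) +
  geom_line(aes(group = prot_id), alpha = 0.5) +   # translucent lines show density
  facet_wrap(~ cl, ncol = 3) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```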
Here is my own, very similar solution to the above.
dfsa_mul <- data.frame(scale(assay(mul)))
dfsa_mul2 <- rownames_to_column(dfsa_mul, "protID")
Add the kmeans $cluster column to the dfsa_mul2 data frame, and only convert clus to a factor after running pivot_longer:
dfsa_mul2$clus <- k_mul$cluster
dfsa_mul2 %>%
pivot_longer(cols = -c("protID", "clus"),
names_to = "samples",
values_to = "expression") %>%
ggplot(aes(x = samples, y = expression, colour = factor(clus))) +
geom_line(aes(group = protID)) +
facet_wrap(~ factor(clus))
This generates a series of plots identical to the graphs posted by @sbarbit.
I have a rather untidy dataset and can't wrap my head around how to do this in R. The alternative would be to do it in Excel, but since I have several of these datasets, that would take forever.
So what I need is to create a grouped boxplot.
For this I think I need a dataset with 4 columns: species, group (A or B), variable, value.
But at the moment I only have a variable column and one column per species_group combination (species and group are combined in the column name).
Here is a reproducible example:
variable <- c('precipitation','soil','land use')
species1_A <- c(10000, 500, 1322)
species1_B <- c(11500, 200, 600)
species2_A <- c(10000, 500, 1489)
species2_B <- c(15687, 800, 587)
df <- data.frame(variable, species1_A, species1_B,species2_A, species2_B)
So I guess I have to create a whole new column "group" with A or B and somehow tell R to take that information from the "species1_A" name.
Can anyone help me please? Thank you!
I'd suggest the following:
library(tidyverse)
df %>%
pivot_longer(contains("species"), names_to = "name", values_to = "value") %>%
separate(name, c("species", "group"), "_") %>%
ggplot() +
facet_wrap(~variable) +
aes(x = species, y = value, color = group) +
geom_point()
Sorry, I'm not sure how you'd want things laid out, and your example dataset has only one value per group. You can change geom_point() to geom_boxplot() once you have more than one value per group; the spacing between the boxes can be adjusted with position_dodge(). HTH.
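As a variation, pivot_longer() can split "species1_A" into species and group in one step via names_to plus names_sep, which makes the separate() call unnecessary. The sketch below uses the example data and switches to the grouped boxplot; with one value per group each box collapses to a line until more rows are available. The dodge width of 0.8 and scales = "free_y" (the variables are on very different scales) are arbitrary choices, not requirements.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(variable   = c('precipitation', 'soil', 'land use'),
                 species1_A = c(10000, 500, 1322),
                 species1_B = c(11500, 200, 600),
                 species2_A = c(10000, 500, 1489),
                 species2_B = c(15687, 800, 587))

long <- df %>%
  # "species1_A" -> species = "species1", group = "A"
  pivot_longer(contains("species"),
               names_to = c("species", "group"),
               names_sep = "_",
               values_to = "value")

p <- ggplot(long, aes(x = species, y = value, fill = group)) +
  geom_boxplot(position = position_dodge(width = 0.8)) +
  facet_wrap(~ variable, scales = "free_y")
p
```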
I am trying to create a clustered bar plot for 3 different types of precipitation data. I've searched for how this might be done in R with a similar dataset, but couldn't find anything helpful.
This is the dataset I am currently using. I have tried adding multiple geom_bar() layers, but that didn't work. See my attempt below:
ggplot(ppSAcc,aes(x=date,y=as.numeric(Precipitation)))+geom_bar(stat="identity",aes(color="blue"),show.legend=FALSE,size=1)+
geom_bar(ppMAcc,stat="identity",aes(x=date,y=as.numeric(Precipitation),color="purple"),show.legend = FALSE,size=1)+
labs(title="Accumulated Solid Precipitation (Snow)",y="Precipitation (mm)")
In my second attempt, I tried creating a dataframe which includes all three precipitation types.
data<-data.frame(date=ppSAcc$date,snow=ppSAcc$Precipitation,mixed=ppMAcc$Precipitation,rain=ppRAcc$Precipitation)
This gave me the data frame shown above.
This is where I am stuck. I started with ggplot(data, aes(x = date)) + geom_bar(position = "dodge", stat = "identity"), but I'm not sure how to write the code so that there are three columns (snow, mixed, rain) for each year, or how to set up the aes() part.
You need to reshape your data frame into a longer format before plotting it with ggplot2. You can use the pivot_longer() function from tidyr:
library(tidyr)
library(dplyr)
library(ggplot2)
library(lubridate)
data %>%
  # one row per date x precipitation type
  pivot_longer(-date, names_to = "var", values_to = "val") %>%
  ggplot(aes(x = ymd(date), y = val, fill = var)) +
  geom_col(position = position_dodge())   # bars side by side within each date
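Alternatively, since the three series start out as separate data frames, they can be stacked directly into long format with dplyr's bind_rows(), using named arguments plus .id to record which data frame each row came from. The three data frames below are hypothetical stand-ins with made-up values; if Precipitation is stored as text in your real data, convert it with as.numeric() first. Formatting the date as a year gives one cluster of bars per year.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-ins for ppSAcc, ppMAcc and ppRAcc
dates  <- as.Date(c("2018-01-01", "2019-01-01", "2020-01-01"))
ppSAcc <- data.frame(date = dates, Precipitation = c(120, 95, 140))
ppMAcc <- data.frame(date = dates, Precipitation = c(40, 55, 35))
ppRAcc <- data.frame(date = dates, Precipitation = c(300, 280, 310))

# argument names become the values of the new "type" column
long <- bind_rows(snow = ppSAcc, mixed = ppMAcc, rain = ppRAcc, .id = "type")

p <- ggplot(long, aes(x = factor(format(date, "%Y")), y = Precipitation, fill = type)) +
  geom_col(position = position_dodge()) +
  labs(x = "Year", y = "Precipitation (mm)")
p
```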
Does this answer your question?
If not, please provide a reproducible example of your dataset by following this guide: How to make a great R reproducible example
I have a data frame tracking the daily evolution of the coronavirus, with 4 columns: date, active cases, deaths and recoveries.
Since these last three values add up to the total number of cases, I want a bar chart in which each day has one bar divided into 3 parts: active cases, deaths and recoveries.
How do I do this with ggplot? Thank you in advance
One way is to reshape your data frame into a longer format, for example with the pivot_longer() function, so that it fits the way ggplot2 expects its data.
As an example with a fake dataset:
library(lubridate)
df <- data.frame(date = seq(ymd("2020-01-01"),ymd("2020-01-10"),by = "day"),
active = sample(10:100,10),
death = sample(10:100,10),
recov = sample(10:100,10))
library(dplyr)
library(tidyr)
library(ggplot2)
df %>% pivot_longer(-date, names_to = "case", values_to = "val") %>%
mutate(case = factor(case, levels = c("active","recov","death"))) %>%
ggplot(aes(x = date, y = val, fill = case))+
geom_col(position = position_stack(reverse = TRUE))
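If each day's split matters more than the absolute counts, the same long data can be drawn with position_fill(), which rescales every bar to 100%. The sketch below uses fixed, made-up numbers instead of sample() so the result is reproducible, and also computes the per-day totals, which should equal the height of each stacked bar in the position_stack() version above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical fixed values instead of sample()
df <- data.frame(date   = as.Date("2020-01-01") + 0:2,
                 active = c(50, 60, 55),
                 death  = c(5, 6, 8),
                 recov  = c(20, 30, 45))

long <- df %>%
  pivot_longer(-date, names_to = "case", values_to = "val") %>%
  mutate(case = factor(case, levels = c("active", "recov", "death")))

# per-day total number of cases (= height of each stacked bar)
totals <- long %>%
  group_by(date) %>%
  summarise(total = sum(val), .groups = "drop")

p <- ggplot(long, aes(x = date, y = val, fill = case)) +
  geom_col(position = position_fill(reverse = TRUE)) +   # each bar scaled to 100%
  scale_y_continuous(labels = scales::percent)
p
```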
Does this answer your question?
If not, please provide a reproducible example of your dataset by following this guide: How to make a great R reproducible example, along with the code you have tried so far.