I would really appreciate some help with this plot. I'm very new to R and, even after looking at many tutorials, I'm struggling to understand how to plot the following:
This is my Table
The X axis is meant to have PatientID, the Y is cell counts for each patient
I've managed to do a basic plot for each variable individually, eg:
This is for 2 of the variables
And this gives me 2 separate graphs
Total cell counts
Cell counts for zone 1
I would like all the data represented on one graph. That means each patient will have 4 bars: the total cell count and the cell count for each zone (1-3).
I don't understand whether I should do this as one combined plot or make the 4 separate plots and then combine them. I'm also very confused about how to actually code this. I've tried ggplot, and I've used the regular barplot() in base R (it worked for 1 variable at a time, but I'm not sure how to handle several variables). Some very step-by-step help would be much appreciated. TIA
Here's a way of doing it using the ggplot2 and tidyr packages from the tidyverse. The key step is pivoting your data from "wide" to "long" format so that ggplot2 can use it. After that, the ggplot call is pretty simple. There's more info here if you want more explanation of grouped and stacked bar plots in ggplot2, with an example that's pretty much identical to yours.
library(ggplot2)
library(tidyr)
# Reproducing your data
dat <- tibble(
patientID = c("a", "b", "c"),
tot_cells = c(2773, 3348, 4023),
tot_cells_zone1 = c(994, 1075, 1446),
tot_cells_zone2 = c(1141, 1254, 1349),
tot_cells_zone3 = c(961, 1075, 1426)
)
to_plot <- pivot_longer(dat, cols = starts_with("tot"), names_to = "Zone", values_to = "Count")
ggplot(to_plot, aes(x = patientID, y = Count, fill = Zone)) +
geom_bar(position="dodge", stat="identity")
Output:
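If it helps to see what the pivot step does, here is a rough base R sketch of the same wide-to-long reshape using stack(). The names measure_cols, stacked and to_plot_base are made up for this illustration, and the row order differs from what pivot_longer() returns, but the contents should be the same: one row per patient and measure.

```r
# Same data as above, as a plain data.frame
dat <- data.frame(
  patientID = c("a", "b", "c"),
  tot_cells = c(2773, 3348, 4023),
  tot_cells_zone1 = c(994, 1075, 1446),
  tot_cells_zone2 = c(1141, 1254, 1349),
  tot_cells_zone3 = c(961, 1075, 1426)
)

# Columns holding the counts (everything except the ID column)
measure_cols <- setdiff(names(dat), "patientID")

# stack() puts all count columns underneath each other:
# 'values' holds the numbers, 'ind' holds the original column name
stacked <- stack(dat[measure_cols])

# Re-attach the patient IDs; stack() goes column by column,
# so the IDs simply repeat once per count column
to_plot_base <- data.frame(
  patientID = rep(dat$patientID, times = length(measure_cols)),
  Zone = as.character(stacked$ind),
  Count = stacked$values
)

print(to_plot_base)
```

The result has 12 rows (3 patients x 4 measures), which is the shape ggplot2 needs in order to map Zone to fill.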
Thanks everyone for your help. I was able to make the plot as follows:
First, I made a new table from data I imported into R:
#Make new table of patientID and tot cell count
patientID <- c("a", "b", "c")
tot_cells <- c(tot_cells_a, tot_cells_b, tot_cells_c)
tot_cells_zone1 <- c(tot_cells_a_zone1, tot_cells_b_zone1, tot_cells_c_zone1)
tot_cells_zone2 <- c(tot_cells_a_zone2, tot_cells_b_zone2, tot_cells_c_zone2)
tot_cells_zone3 <- c(tot_cells_a_zone3, tot_cells_b_zone3, tot_cells_c_zone3)
tot_cells_table <- data.frame(tot_cells,
tot_cells_zone1,
tot_cells_zone2,
tot_cells_zone3)
rownames(tot_cells_table) <- c(patientID)
Then I plotted as follows, first converting the data.frame to a matrix:
#Plot "Total Microglia Counts per Patient"
tot_cells_matrix <- data.matrix(tot_cells_table) # row names (patient IDs) are kept
par(mar = c(5, 4, 4, 10),
xpd = TRUE)
barplot(t(tot_cells_matrix), # transpose so that each patient becomes one group of bars
col = c("red", "blue", "green", "magenta"),
main = "Total Microglia Counts per Patient",
xlab = "Patient ID", ylab = "Cell #",
beside = TRUE)
legend("topright", inset = c(- 0.4, 0),
legend = c("tot_cells", "tot_cells_zone1",
"tot_cells_zone2", "tot_cells_zone3"),
fill = c("red", "blue", "green", "magenta"))
And the graph looks like this:
Barplot of multiple variables
Thanks again for pointing me in the right direction!
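For anyone else landing here: barplot() with beside = TRUE draws one group of bars per column of the matrix it is given, which is why the table above is transposed before plotting. Below is a small self-contained sketch of that step. The counts are taken from the reproduced data in the answer above (an assumption, since the imported values are not shown), heights and mids are made-up names, and legend.text = TRUE is used here instead of a separate legend() call.

```r
# Counts copied from the reproduced data in the earlier answer (assumed values)
tot_cells_table <- data.frame(
  tot_cells       = c(2773, 3348, 4023),
  tot_cells_zone1 = c(994, 1075, 1446),
  tot_cells_zone2 = c(1141, 1254, 1349),
  tot_cells_zone3 = c(961, 1075, 1426),
  row.names = c("a", "b", "c")
)

# Rows = measures, columns = patients: one group of 4 bars per patient
heights <- t(data.matrix(tot_cells_table))

# barplot() returns the x positions of the bars, one column per group
mids <- barplot(heights, beside = TRUE,
                col = c("red", "blue", "green", "magenta"),
                main = "Total Microglia Counts per Patient",
                xlab = "Patient ID", ylab = "Cell #",
                legend.text = TRUE,                  # legend labels come from rownames(heights)
                args.legend = list(x = "topright"))
```

Without the t(), the groups would be the four measures, with one bar per patient inside each group.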
Related
I want to make a pie chart for each column of my data frame, where the slices represent how often each value appears in that column. For instance, the following produces a 3-column data set and rounds the numbers to whole numbers.
test1<-rnorm(200,mean = 20, sd = 2)
test2<-rnorm(200,mean=20, sd =1)
test3<-rnorm(200,mean=20, sd =3)
testdata<-cbind(test1,test2,test3)
testdata <-round(testdata,0)
So I would need 3 pie charts, where the slices represent the number of times a given value appears in the respective column (with the name of the column on top of the pie chart, if possible).
So far, I have tried pie(frame(testdata$test1)), but that only creates a single pie chart, and my real data has 25 columns. On top of that, trying to pass a "main=" argument to name it results in an error.
Thank you in advance.
ggplot2 is the go-to library to make nice plots. To have 3 different pie-plots one needs to adjust the data a bit, which is done with some tidyverse-functions.
test1<-rnorm(200,mean = 20, sd = 2)
test2<-rnorm(200,mean=20, sd =1)
test3<-rnorm(200,mean=20, sd =3)
testdata<-cbind(test1,test2,test3)
testdata <-round(testdata,0)
library(ggplot2)
library(tidyverse)
plot_data <- testdata %>%
as_tibble() %>%
pivot_longer(everything(), names_to = "data1", values_to = "value") %>%
group_by(data1) %>%
count(value)
ggplot(plot_data, aes(x = "", y = n, fill = factor(value))) +
geom_col(width = 1, show.legend = TRUE) +
coord_polar("y", start = 0) +
facet_wrap(~data1)
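If you would rather stay in base R, a loop over the columns is another option, and it also lets you pass main= for each chart. This is only a sketch: freqs is a name made up here, set.seed(1) is added just to make it repeatable, and with 25 columns you would want a different par(mfrow = ...) layout than a single row.

```r
set.seed(1)  # only so the example is repeatable
testdata <- round(cbind(
  test1 = rnorm(200, mean = 20, sd = 2),
  test2 = rnorm(200, mean = 20, sd = 1),
  test3 = rnorm(200, mean = 20, sd = 3)
), 0)

# One frequency table per column
freqs <- lapply(colnames(testdata), function(col) table(testdata[, col]))
names(freqs) <- colnames(testdata)

# Draw the pies side by side, using the column name as the title
old_par <- par(mfrow = c(1, ncol(testdata)))
for (col in names(freqs)) {
  pie(freqs[[col]], main = col)
}
par(old_par)  # restore the previous layout
```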
I am using a gene expression dataset from ~100 cells.
I want to generate a dot plot indicating which cells are expressing which genes, like below, excluding the color delineations.
I have tried ggplot solutions, but (from what I can tell) ggplot2 cannot graph numerous variables on each axis. I've looked into more complex packages like Seurat and cRegulome (the image above is from cRegulome), but these put more information in the graphical output than I want.
Below is an example of the type of data frame I am working with.
Cell_A<-c(0,0,1,0,1,0,1,0)
Cell_B<-c(1,1,1,0,0,0,1,0)
Cell_C<-c(1,0,1,0,0,1,0,1)
Cell_D<-c(0,0,0,1,1,1,1,0)
Cell_E<-c(1,1,1,1,1,0,1,1)
Cell_F<-c(0,0,0,0,0,1,1,0)
Cell_G<-c(1,1,1,1,1,1,1,1)
Cell_H<-c(1,1,1,1,1,1,1,1)
Genes <- c("Gene1","Gene2","Gene3","Gene4","Gene5","Gene6","Gene7","Gene8")
fake_data <- data.frame(Cell_A, Cell_B, Cell_C, Cell_D, Cell_E,
Cell_F, Cell_G,Cell_H, row.names = Genes)
How can I manipulate this dataset to get the graphical output I want?
You can do this by reshaping the data and using geom_point. Map the size aesthetic to your count variable and it will work well. The legend is currently a bit nonsensical but can be manually tweaked if you do not have any other sizes than 0 and 1.
library(tidyverse)
Cell_A<-c(0,0,1,0,1,0,1,0)
Cell_B<-c(1,1,1,0,0,0,1,0)
Cell_C<-c(1,0,1,0,0,1,0,1)
Cell_D<-c(0,0,0,1,1,1,1,0)
Cell_E<-c(1,1,1,1,1,0,1,1)
Cell_F<-c(0,0,0,0,0,1,1,0)
Cell_G<-c(1,1,1,1,1,1,1,1)
Cell_H<-c(1,1,1,1,1,1,1,1)
Genes <- c("Gene1","Gene2","Gene3","Gene4","Gene5","Gene6","Gene7","Gene8")
fake_data <- data.frame(Cell_A, Cell_B, Cell_C, Cell_D, Cell_E,
Cell_F, Cell_G,Cell_H, row.names = Genes)
fake_data %>%
rownames_to_column(var = "gene") %>%
gather(cell, count, -gene) %>%
ggplot() +
geom_point(aes(x = gene, y = cell, size = count))
Created on 2019-08-02 by the reprex package (v0.3.0)
This solution is a base R solution that relies on matplot().
fake_data2 <- sweep(fake_data, 2, seq_len(length(fake_data)), FUN = '*')
fake_data2[fake_data2 == 0] <- NA_integer_
matplot(x = seq_along(Genes), y = as.matrix(fake_data2),
cex = colSums(fake_data) / 3, pch = 16, col = 1,
yaxt = 'n', xaxt = 'n', ann = FALSE)
axis(1, at = seq_along(Genes), Genes)
axis(2, at = seq_len(length(fake_data)), names(fake_data), las = 1)
You didn't provide enough detail on what size you wanted. The size here is based on the number of 1 values in each column.
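To make the sweep() step less mysterious: it multiplies column j by j, so every 1 becomes the y position of its cell, and the zeros are then set to NA so that they are not drawn. A tiny sketch with made-up data (m and pos are illustrative names, not part of the code above):

```r
# Two cells, three genes; 1 = expressed, 0 = not expressed
m <- data.frame(Cell_A = c(0, 1, 1), Cell_B = c(1, 0, 1))

# Multiply column 1 by 1, column 2 by 2, ...
pos <- sweep(as.matrix(m), 2, seq_len(ncol(m)), FUN = "*")

# Zeros would otherwise be plotted at y = 0, so hide them
pos[pos == 0] <- NA

print(pos)
```

Cell_A keeps its 1s at height 1, while the 1s in Cell_B move to height 2, so each cell ends up on its own horizontal line.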
I am trying to plot multiple box plots in a single graph. The data are the ones on which I ran a Wilcoxon test. It should look like this:
I have four/five questions and I want to plot the respondent score for two sets as a box plot. This should be done for all questions (Two groups for each question).
I am thinking of using ggplot2. My data is like
q1o <- c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4)
q1s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4)
q2o <- c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4)
q2s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
....
....
q1 means question 1 and q2 means question 2. I also want to know how to arrange these box plots as needed, e.g. in one row or two rows.
This should get you started:
Unfortunately you don't provide a minimal example with sample data, so I will generate some random sample data.
# Generate sample data
set.seed(2017)
df <- cbind.data.frame(
value = rnorm(1000),
Label = sample(c("Good", "Bad"), 1000, replace = TRUE),
variable = sample(paste0("F", 5:11), 1000, replace = TRUE))
# ggplot
library(tidyverse)
df %>%
mutate(variable = factor(variable, levels = paste0("F", 5:11))) %>%
ggplot(aes(variable, value, fill = Label)) +
geom_boxplot(position = position_dodge()) +
facet_wrap(~ variable, ncol = 3, scales = "free")
You can specify the number of columns and rows in your 2d panel layout through arguments ncol and nrow, respectively, of facet_wrap. Many more details and examples can be found if you follow ?geom_boxplot and ?facet_wrap.
Update 1
A boxplot based on your sample data doesn't make too much sense, because your data are not continuous. But ignoring that, you could do the following:
df <- data.frame(
q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4))
df %>%
gather(key, value, 1:4) %>%
mutate(
variable = ifelse(grepl("q1", key), "F1", "F2"),
Label = ifelse(grepl("o$", key), "Bad", "Good")) %>%
ggplot(aes(variable, value, fill = Label)) +
geom_boxplot(position = position_dodge()) +
facet_wrap(~ variable, ncol = 3, scales = "free")
Update 2
One way of visualising discrete data would be in a mosaicplot.
mosaicplot(table(df2))
The plot shows the count of value (as filled rectangles) per Variable per Label. See ?mosaicplot for details.
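Note that df2 is not defined anywhere in the code shown. Going by the description (counts of value per variable per Label), it is presumably the long-format data from Update 1. Here is a guess at how it could be built, using base R only; the construction of df2 and the names long and counts are assumptions, not necessarily the original code:

```r
df <- data.frame(
  q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
  q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
  q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
  q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4))

# Wide to long: 'values' holds the scores, 'ind' the original column name
long <- stack(df)

# Assumed layout of df2: same variable/Label coding as in Update 1
df2 <- data.frame(
  value    = long$values,
  variable = ifelse(grepl("q1", long$ind), "F1", "F2"),
  Label    = ifelse(grepl("o$", long$ind), "Bad", "Good")
)

# Three-way table of counts: value x variable x Label
counts <- table(df2)
mosaicplot(counts, main = "Scores per question and group")
```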
There are a few similar questions but they are not asking what I am looking for.
I have gene expression data with multiple independent variables. I want to visualize it using a heatmap in R, but I am not able to include all three variables together on the heatmap. Below is the example code:
species <- rep(c("st", "rt"), each = 18)
life <- rep(c("5d", "15d", "45d"), 2, each = 6)
concentration <- rep(c("c1", "c2", "c3"), 6, each = 2)
gene <- rep(c("gene1", "gene2"), 18)
response <- runif(36, -4, 4)
data1 <- data.frame(species, life, concentration, gene, response)
I am open to use any package. Please see below image which is from a different dataset. I wish to visualize my data in a similar manner.
example_data_visualized
Many thanks in advance!
I am not sure which of the variables in your code correspond to which of the dimensions in your chart but, using the ggplot2 package, it's quite easy to do it:
library(ggplot2)
ggplot(data1, aes(x = factor(life, levels = c("5d", "15d", "45d")),
y = concentration,
fill = response)) +
geom_tile() +
facet_wrap(~species + gene, nrow = 1) +
scale_fill_gradient(low = "red", high = "green", guide = "none") +
scale_x_discrete(name = "life")
Of course, you can adjust the titles, labels, colours etc accordingly.
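The factor(life, levels = ...) part is there because the life values would otherwise be ordered as text, which puts "15d" and "45d" before "5d". A quick check of that, nothing more (life_default and life_ordered are throwaway names):

```r
life <- rep(c("5d", "15d", "45d"), 2, each = 6)

# Default: levels are the sorted unique strings
life_default <- factor(life)

# Explicit levels give the chronological order on the axis
life_ordered <- factor(life, levels = c("5d", "15d", "45d"))

print(levels(life_default))
print(levels(life_ordered))
```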
I'm trying to plot distribution of species between 2 different habitat types (hab 1 and hab 2). Some of my species secondarily use some habitats, so I have a separate column for secondary hab1 (hab1.sec). To visualise their distribution across the two habitats and different depths, I am using a facet_grid between hab1 and hab2. Example code as below:
# example code
set.seed(101)
ID <- seq(1,20, by=1) ## ID for plotting
species <- sample(letters, size=20) ## arbitrary species
## different habitat types in hab.1
hab1 <- c("coastal","shelf","slope","open.ocean","seamount")
hab1.pri <- sample(hab1, size = 20, replace = T)
## secondarily used habitats, may not be present for some species
hab.sec <- c("coastal","shelf","slope","open.ocean","seamount", NA)
hab1.sec <- sample(hab.sec, size = 20, replace = T)
## habitat types for hab.2
hab2 <- c("epipelagic","benthopelagic","epibenthic","benthic")
hab.2 <- sample(hab2, size = 20, replace = T)
## arbitrary depth values
dep.min <- sample(seq(0,1000), size = 20, replace = T)
dep.max <- sample(seq(40, 1500), size = 20, replace = T)
# make data frame
dat <- data.frame(ID, species, hab1.pri, hab1.sec, hab.2,dep.min, dep.max)
# ggplot with facet grid
p <- ggplot(data = dat) +
geom_segment(aes(x = as.factor(ID), xend = as.factor(ID), y = dep.min, yend = dep.max), size = 2, data = dat) +
scale_y_reverse(breaks = c(0, 200, 1000, 1500)) +
facet_grid(hab.2 ~ hab1.pri, scales = "free", space = "free") +
theme_bw()
I would like to add segments for hab1.sec within the existing facet grid. I have tried this code:
p+ geom_segment(aes(x=as.factor(ID),xend=as.factor(ID),y=dep.min, yend=dep.max),linetype=2,data = dat)+facet_wrap(~hab1.sec)
But doing this produces a new graph.
Is there a better way to add those extra lines to the existing grid (preferably as dashed lines)?
I'd be really grateful for any help with this!
Thanks a lot, in advance!
What about combining the primary and secondary habitats into one variable and mapping that variable to an aesthetic?
Note I'm using tidyr and dplyr tools here because they help a lot in cases like this.
library(dplyr)
library(tidyr)
dat %>%
gather(hab1, value, -ID, -species, -(hab.2:dep.max)) %>%
ggplot()+
geom_segment(aes(x=as.factor(ID),xend=as.factor(ID),y=dep.min, yend=dep.max, linetype=hab1),size=2) +
scale_y_reverse(breaks = c(0, 200, 1000,1500))+
facet_grid(hab.2~value, scales = "free" ,space = "free")+
theme_bw()
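One thing to watch for: species without a secondary habitat have NA in hab1.sec, and after the reshape those rows will typically end up in their own NA panel unless they are dropped (gather() has an na.rm argument for that). Here is a small base R sketch of the same reshape on made-up rows, so the effect is easy to see; toy, shared and long are illustrative names and the values are invented:

```r
# Made-up example rows, not the data from the question
toy <- data.frame(
  ID = 1:4,
  hab1.pri = c("coastal", "shelf", "slope", "shelf"),
  hab1.sec = c("shelf", NA, "coastal", NA),
  dep.min = c(0, 50, 200, 10),
  dep.max = c(40, 300, 900, 120)
)

# Columns that are repeated for both the primary and the secondary habitat
shared <- toy[c("ID", "dep.min", "dep.max")]

# Stack primary and secondary habitats into one 'value' column,
# remembering where each row came from in 'hab1'
long <- rbind(
  data.frame(shared, hab1 = "hab1.pri", value = toy$hab1.pri),
  data.frame(shared, hab1 = "hab1.sec", value = toy$hab1.sec)
)

# Drop species with no secondary habitat so they do not form an NA facet
long <- long[!is.na(long$value), ]

print(long)
```

Each species now has one row per habitat it uses, and hab1 can be mapped to linetype as in the code above.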