Boxplot across three timepoints in ggplot - r

I would like boxplots for all three timepoints in my data on the same plot.
Data:
df<-
structure(list(ID = c("ED_001", "ED_002", "ED_003", "ED_004",
"ED_005"), Color = c("Black", "White", "Black", "Black", "White"
), Data_t1 = c(150, 159, 160, 154, 187), Data_t2 = c(123, 124,
125, 126, 140), Data_t3 = c(133, 135, 145, 150, 153)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_character",
"collector")), Color = structure(list(), class = c("collector_character",
"collector")), Data_t1 = structure(list(), class = c("collector_double",
"collector")), Data_t2 = structure(list(), class = c("collector_double",
"collector")), Data_t3 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
I can plot the first timepoint easily enough:
df %>%
ggplot(. , aes(x = as.factor(Color), y = Data_t1)) +
geom_boxplot()
But how do I also plot Data_t2 and Data_t3? I don't think facet_wrap is the right approach. Do I group_by timepoint, and if so how? I would prefer a dplyr solution if possible rather than melting the data into long format as I always come unstuck with long format. Thanks

It looks like you’ve noticed it’s easiest to work with the data if it’s in long format. Here’s the method with tidyr: pivot_longer() stacks the three Data_ columns into name/value pairs, and a facet then separates the timepoints. Which facet you use depends on how you want to compare them.
library(tidyverse)
df %>%
pivot_longer(starts_with("Data")) %>%
ggplot(. , aes(y = value, x= Color, group = Color)) +
geom_boxplot() +
facet_grid(~name)
If you really wanted them all on the same plot without facets, you could create a dummy variable. You could play around with factors to order them how you wish.
df %>%
pivot_longer(starts_with("Data")) %>%
mutate(group_var = paste0(name, " - ", Color)) %>%
ggplot(. , aes(y = value, x= group_var, group = group_var)) +
geom_boxplot()
Created on 2022-01-14 by the reprex package (v2.0.1)
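If the goal is a single panel with the timepoints side by side, one option (a sketch, not part of the answer above) is to put the timepoint on the x axis and map Color to fill, so that ggplot2 dodges the boxes within each timepoint. The names timepoint, long and p are chosen here for illustration, and df is rebuilt as a plain tibble with the question's values. The explicit factor levels show one way the ordering could be controlled.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same values as the question's df, rebuilt as a plain tibble
df <- tibble(
  ID = c("ED_001", "ED_002", "ED_003", "ED_004", "ED_005"),
  Color = c("Black", "White", "Black", "Black", "White"),
  Data_t1 = c(150, 159, 160, 154, 187),
  Data_t2 = c(123, 124, 125, 126, 140),
  Data_t3 = c(133, 135, 145, 150, 153)
)

# One row per ID and timepoint; the factor levels fix the order on the x axis
long <- df %>%
  pivot_longer(starts_with("Data"), names_to = "timepoint", values_to = "value") %>%
  mutate(timepoint = factor(timepoint, levels = c("Data_t1", "Data_t2", "Data_t3")))

# Mapping fill to Color gives one box per Color within each timepoint
p <- ggplot(long, aes(x = timepoint, y = value, fill = Color)) +
  geom_boxplot()
p
```

Compared with the pasted group_var label, this keeps timepoint and Color as separate variables, so the legend and axis stay readable.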

Related

Making a unique grouped bar graph for multiple different data frames

I have a series of about 300 data frames, each structured the same way, and want to write code that turns each of them into its own bar graph. I am struggling to write code that structures the graph correctly in the first place. As an example, my data frames look like this:
  precursorMz  Mz_Round  HW Intensity  Reg Intensity  diff1  diff2
1    256.6814    141.10          4216           3994   0.96   1.00
2    256.6814    142.10          7184           5988   1.00   1.02
3    256.6814    143.12         44510          30020   1.02   1.00
4    256.6814    144.12          1858           1312   1.00   0.00
5    256.6814    260.20         43010          23230   4.52   1.00
6    256.6814    261.20          9452           6388   1.00   0.99
I want my graph to have the Mz_Round column be the X axis and then my Y values be HW Intensity and Reg Intensity.
I have tried using the barplot() function but again am having issues with getting my axes to be correct.
intensities <- table(split1$`HW Intensity`, split1$`Reg Intensity`)
barplot(intensities,
main = "Intensity Compared",
xlab = "M/z", ylab = "Intensity",
col = c("darkgrey", "blue"),
rownames(split1$Mz_Round),
beside = TRUE)
I have tried a couple of plots. I hope this helps.
# Data
> dput(df)
structure(list(precursor_Mz = c(256.6814, 256.6814, 256.6814,
256.6814, 256.6814, 256.6814), Mz_Round = c(141.1, 142.1, 143.12,
144.12, 260.2, 261.2), HW_Intensity = c(4216, 7184, 44510, 1858,
43010, 9452), Reg_Intensity = c(3994, 5988, 30020, 1312, 23230,
6388), diff1 = c(0.96, 1, 1.02, 1, 4.52, 1), diff2 = c(1, 1.02,
1, 0, 1, 0.99)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), spec = structure(list(cols = list(
precursor_Mz = structure(list(), class = c("collector_double",
"collector")), Mz_Round = structure(list(), class = c("collector_double",
"collector")), HW_Intensity = structure(list(), class = c("collector_double",
"collector")), Reg_Intensity = structure(list(), class = c("collector_double",
"collector")), diff1 = structure(list(), class = c("collector_double",
"collector")), diff2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
library(tidyverse)
# pivoting data
df1 <- df|>
select("Mz_Round", "HW_Intensity", "Reg_Intensity")|>
pivot_longer(!Mz_Round)
# stacked bar plot
ggplot(df1) +
geom_col(aes(x = as.factor(Mz_Round), y = value, fill = name))
# dodged bar plot
ggplot(df1) +
geom_col(aes(x = as.factor(Mz_Round), y = value, fill = name), position = "dodge")
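Since the question mentions about 300 data frames with the same structure, the dodged plot could be wrapped in a function and applied over a list. This is only a sketch: plot_intensity and frames are hypothetical names, the list here holds two copies of the example data as a stand-in, and the underscore column names follow the answer's dput rather than the question's original headers.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: builds the dodged bar plot for one data frame
plot_intensity <- function(d) {
  d %>%
    select(Mz_Round, HW_Intensity, Reg_Intensity) %>%
    pivot_longer(!Mz_Round) %>%
    ggplot() +
    geom_col(aes(x = as.factor(Mz_Round), y = value, fill = name),
             position = "dodge") +
    labs(x = "M/z", y = "Intensity", fill = NULL)
}

# Example data with the values from the answer's dput
split1 <- tibble(
  Mz_Round = c(141.1, 142.1, 143.12, 144.12, 260.2, 261.2),
  HW_Intensity = c(4216, 7184, 44510, 1858, 43010, 9452),
  Reg_Intensity = c(3994, 5988, 30020, 1312, 23230, 6388)
)

# Stand-in for the real collection: a named list of data frames
frames <- list(split1 = split1, split2 = split1)

# One ggplot object per data frame, keeping the list names
plots <- lapply(frames, plot_intensity)
plots$split1
```

Each element of plots can then be printed or saved individually, for example with ggsave().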

Issue in creating polygon in ggplot in R

Why does my polygon look like two triangles instead of a box? Can someone help explain how I can turn the polygon into a box?
data <- structure(list(AREA = c("a", "a", "b", "b"), Lat = c(43.68389835,
43.68389835, 44.3395883, 44.3395883), Long = c(-88.22909367,
-88.99888743, -88.22909367, -88.99888743)), row.names = c(NA,
-4L), spec = structure(list(cols = list(AREA = structure(list(), class = c("collector_character",
"collector")), Lat = structure(list(), class = c("collector_double",
"collector")), Long = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
Code:
library(tidyverse)
ggplot() + geom_polygon(data=data, mapping=aes(x=Long, y=Lat))
Currently geom_polygon draws the polygon in exactly the order of the points as given in data. To have a closed polygon, you need to order your points appropriately, either clockwise or anti-clockwise.
We can do this by calculating the angle relative to the lat/long centre, and then order points according to that angle.
library(tidyverse)
data %>%
mutate(angle = atan2(Lat - mean(Lat), Long - mean(Long))) %>%
arrange(desc(angle)) %>%
ggplot() +
geom_polygon(aes(x = Long, y = Lat))
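To see why the ordering matters, the same angle calculation can be checked in base R. This is a small sketch using the question's coordinates in a plain data.frame; the names angle and ordered are illustrative. In the original row order the path runs bottom-right, bottom-left, top-right, top-left, so two edges cross and the shape looks like two triangles.

```r
# The question's four corner points as a plain data.frame
data <- data.frame(
  AREA = c("a", "a", "b", "b"),
  Lat = c(43.68389835, 43.68389835, 44.3395883, 44.3395883),
  Long = c(-88.22909367, -88.99888743, -88.22909367, -88.99888743)
)

# Angle of each point around the centre of the four points
angle <- atan2(data$Lat - mean(data$Lat), data$Long - mean(data$Long))

# Sorting by descending angle walks the corners clockwise:
# rows 4, 3, 1, 2 = top-left, top-right, bottom-right, bottom-left
ordered <- data[order(angle, decreasing = TRUE), ]
ordered
```

Sorting by angle around the centre is reliable for convex outlines such as this rectangle; a concave outline may need its points ordered by hand.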

Loop in tidyverse

I am learning the tidyverse and working with a time-series dataset, from which I selected the columns that start with sec. Basically, I would like to identify the values in those columns that equal 123, keep them, and replace the rest with 0. But I don't know how to loop over sec1:sec4. Also, how can I sum() per column?
df1<-df %>%
select(starts_with("sec")) %>%
select(ifelse("sec1:sec4"==123, 1, 0))
Sample data:
structure(list(sec1 = c(1, 123, 1), sec2 = c(123, 1, 1), sec3 = c(123,
0, 0), sec4 = c(1, 123, 1)), spec = structure(list(cols = list(
sec1 = structure(list(), class = c("collector_double", "collector"
)), sec2 = structure(list(), class = c("collector_double",
"collector")), sec3 = structure(list(), class = c("collector_double",
"collector")), sec4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
-3L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
I think you would have to use mutate and across to accomplish this. The code below mutates across each column starting with sec, keeping all values that equal 123 and replacing all others with 0.
df1<-df %>%
select(starts_with("sec")) %>%
mutate(across(starts_with("sec"),.fns = function(x){ifelse(x == 123,x,0)}))
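The question also asks how to sum() per column, which the snippet above does not cover. One possible approach, sketched here with the sample data rebuilt as a plain tibble, is summarise() with across(); col_sums is an illustrative name.

```r
library(dplyr)

# Sample data from the question, rebuilt as a plain tibble
df <- tibble(
  sec1 = c(1, 123, 1),
  sec2 = c(123, 1, 1),
  sec3 = c(123, 0, 0),
  sec4 = c(1, 123, 1)
)

# Keep values equal to 123 and replace everything else with 0
df1 <- df %>%
  mutate(across(starts_with("sec"), ~ ifelse(.x == 123, .x, 0)))

# One-row tibble holding the sum of each sec column
col_sums <- df1 %>%
  summarise(across(starts_with("sec"), sum))
col_sums
```

Base R's colSums(df1) would return the same totals as a named numeric vector.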

Bubble plot for three observation

This is my data. I'm trying to put three columns in my bubble plot.
They are Altered, Unaltered and the associated survival q-value.
My data frame:
dput(df)
structure(list(Class = c("cell fate commitment", "chromatin remodeling",
"chromatin_covalent", "demethylation", "histone methylation",
"intracellular receptor signaling pathway", "negative regulation of cell differentiation",
"Nuclear Receptor transcription pathway", "PID HDAC CLASSI PATHWAY",
"PID SMAD2 3NUCLEAR PATHWAY", "regulation of chromatin organization",
"Transcriptional misregulation in cancer"), Altered = c(182,
312, 433, 117, 354, 294, 258, 268, 244, 185, 197, 282), Unaltered = c(489,
361, 235, 559, 315, 370, 411, 409, 426, 491, 483, 387), `q-Value` = c(0.0009732,
1.1e-07, 2.832e-05, 0.137, 0.003188, 0.971, 0.139, 0.0008647,
0.002938, 2.843e-06, 3.102e-06, 0.032)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L), spec = structure(list(
cols = list(Class = structure(list(), class = c("collector_character",
"collector")), Altered = structure(list(), class = c("collector_double",
"collector")), Unaltered = structure(list(), class = c("collector_double",
"collector")), `q-Value` = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Code for the plot
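library(tidyverse)
library(magrittr) # provides the %<>% assignment pipe
library(ggrepel)  # provides geom_label_repel()
xm <- reshape2::melt(df, id.vars = "Class", variable.name = "Samples", value.name = "Size")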
# Calculate bubble size
bubble_size <- function(val){
ifelse(val > 3, (1/15) * val + (1/3), val)
}
# Calculate bubble colour
bubble_colour <- function(val){
ifelse(val > 3, "A", "B")
}
# Calculate bubble size and colour
xm %<>%
mutate(bub_size = bubble_size(Size),
bub_col = bubble_colour(Size))
# Plot data
ggplot(xm, aes(x = Samples, y = fct_rev(Class))) +
geom_point(aes(size = bub_size, fill = bub_col), shape = 21, colour = "black") +
# geom_text()
geom_label_repel(aes(label=Size), size=3)+
theme(panel.grid.major = element_line(colour = alpha("gray", 0.5), linetype = "dashed"),
text = element_text(family = "serif"),
legend.position = "none") +
scale_size(range = c(1, 25)) +
scale_fill_manual(values = c("blue","red")) +
ylab("Class")
I get something like this
How do I give the two patient groups different colors, and also label each data point with the patient number for both groups and the q-value in the plot?
Update
I can now put labels into the plot, but I am not able to map two different colors to the Altered and Unaltered patient groups.
Updated fig

Visualize bubbles on a map, using hc_add_series_map() instead of hcmap()

I am trying to visualize a bubble map, using highcharter.
I did it perfectly, using this code
library(highcharter)
library(tidyverse)
hcmap("custom/africa") %>%
hc_add_series(data = fake_data, type = "mapbubble", maxSize = '10%', color =
"Red", showInLegend = FALSE) %>%
hc_legend(enabled = FALSE)
My data
> dput(fake_data)
structure(list(country = c("DZ", "CD", "ZA", "TZ"), lat = c(28.033886,
-4.038333, -30.559482, -6.369028), lon = c(1.659626, 21.758664,
22.937506, 34.888822), name = c("Algeria", "Congo, Dem. Rep",
"South Africa", "Tanzania"), z = c(20, 5, 10, 1)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec =
structure(list(
cols = list(country = structure(list(), class = c("collector_character",
"collector")), lat = structure(list(), class = c("collector_double",
"collector")), lon = structure(list(), class = c("collector_double",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), z = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
The external geo data for Africa originally comes from this source and is used by hcmap().
But I converted it to RDS and use it locally. Available here.
My problem is that I cannot use this code with external data due to corporate IT security restrictions: when I deploy it with Shiny/RMarkdown on Connect, it is blocked.
So my current solution is as follows.
Use the same data in RDS format:
africa_map_data <- readRDS("africa_map_data.RDS")
And use hc_add_series_map() with the local data instead of hcmap():
highchart() %>%
hc_add_series_map(
map = africa_map_data,
df = fake_data,
value = "z",
joinBy = c("hc-a2", "country"),
type = "mapbubble",
maxSize = '10%',
color = "Red"
)
But it does not work well; I get a mess.
How can I create a bubble map with hc_add_series_map() (or any other way) without hcmap() and without pulling external data?
Thanks!
