Making a unique grouped bar graph for multiple different data frames - r

I have a series of about 300 data frames each structured the same way and want to write a code that will turn each of them into their own bar graph. I am struggling to write a code that structures the graph correctly in the first place. My data frames look like this as an example:
precursorMz Mz_Round HW Intensity Reg Intensity diff1 diff2
1 256.6814 141.10 4216 3994 0.96 1.00
2 256.6814 142.10 7184 5988 1.00 1.02
3 256.6814 143.12 44510 30020 1.02 1.00
4 256.6814 144.12 1858 1312 1.00 0.00
5 256.6814 260.20 43010 23230 4.52 1.00
6 256.6814 261.20 9452 6388 1.00 0.99
I want my graph to have the Mz_Round column be the X axis and then my Y values be HW Intensity and Reg Intensity.
I have tried using the barplot() function but again am having issues with getting my axes to be correct.
intensities <- table(split1$`HW Intensity`, split1$`Reg Intensity`)
barplot(intensities,
main = "Intensity Compared",
xlab = "M/z", ylab = "Intensity",
col = c("darkgrey", "blue"),
rownames(split1$Mz_Round),
beside = TRUE)

I have tried a couple of plots. I hope this helps.
# Data
> dput(df)
structure(list(precursor_Mz = c(256.6814, 256.6814, 256.6814,
256.6814, 256.6814, 256.6814), Mz_Round = c(141.1, 142.1, 143.12,
144.12, 260.2, 261.2), HW_Intensity = c(4216, 7184, 44510, 1858,
43010, 9452), Reg_Intensity = c(3994, 5988, 30020, 1312, 23230,
6388), diff1 = c(0.96, 1, 1.02, 1, 4.52, 1), diff2 = c(1, 1.02,
1, 0, 1, 0.99)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), spec = structure(list(cols = list(
precursor_Mz = structure(list(), class = c("collector_double",
"collector")), Mz_Round = structure(list(), class = c("collector_double",
"collector")), HW_Intensity = structure(list(), class = c("collector_double",
"collector")), Reg_Intensity = structure(list(), class = c("collector_double",
"collector")), diff1 = structure(list(), class = c("collector_double",
"collector")), diff2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
library(tidyverse)
# pivoting data
df1 <- df|>
select("Mz_Round", "HW_Intensity", "Reg_Intensity")|>
pivot_longer(!Mz_Round)
# stacked bar plot
ggplot(df1) +
geom_col(aes(x = as.factor(Mz_Round), y = value, fill = name))
# dodged bar plot
ggplot(df1) +
geom_col(aes(x = as.factor(Mz_Round), y = value, fill = name), position = "dodge")

Related

Count edges using the adjacency matrix

Based on the adjacency matrix, I would like to count the number of unique edges in a network. In the below example I coloured the unique edges between the different nodes. But I don't know how to proceed.
Desired output:
Sample data
structure(list(...1 = c("m1", "m2", "m3", "m4"), m1 = c(0.2,
0.2, 0.2, 0.3), m2 = c(0.1, 0.2, 0.2, 0.6), m3 = c(0.5, 0.2,
1, 0), m4 = c(0.3, 0, 0, 0.1)), row.names = c(NA, -4L), spec = structure(list(
cols = list(...1 = structure(list(), class = c("collector_character",
"collector")), m1 = structure(list(), class = c("collector_double",
"collector")), m2 = structure(list(), class = c("collector_double",
"collector")), m3 = structure(list(), class = c("collector_double",
"collector")), m4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
Assuming that this is an undirected graph such that 0 indicates no edge and a positive number indicates an edge, convert the input DF to a logical matrix and from that to an igraph object. Then get its edges and the names of those edges. (Another possible output is by using as_edgelist(g) to get a 2 column matrix such that each row defines an edge.)
If it were intended that the graph be directed then replace "undirected" with "directed" and in that case a character vector of 13 edge names will be produced instead of the 9 undirected edges shown below.
library(igraph)
m <- as.matrix(DF[-1])
rownames(m) <- colnames(m)
g <- graph_from_adjacency_matrix(m > 0, "undirected")
e <- E(g)
attr(e, "vnames")
## [1] "m1|m1" "m1|m2" "m1|m3" "m1|m4" "m2|m2" "m2|m3" "m2|m4" "m3|m3" "m4|m4"
Alternately as a pipeline
library(igraph)
library(tibble)
DF %>%
column_to_rownames("...1") %>%
as.matrix %>%
sign %>%
graph_from_adjacency_matrix("undirected") %>%
E %>%
attr("vnames")
## [1] "m1|m1" "m1|m2" "m1|m3" "m1|m4" "m2|m2" "m2|m3" "m2|m4" "m3|m3" "m4|m4"
The graph of g looks like this. (If "directed" had been chosen above then the edges would have arrowheads on them.)
set.seed(123)
plot(g)
Note
DF <-
structure(list(...1 = c("m1", "m2", "m3", "m4"), m1 = c(0.2,
0.2, 0.2, 0.3), m2 = c(0.1, 0.2, 0.2, 0.6), m3 = c(0.5, 0.2,
1, 0), m4 = c(0.3, 0, 0, 0.1)), row.names = c(NA, -4L), spec = structure(list(
cols = list(...1 = structure(list(), class = c("collector_character",
"collector")), m1 = structure(list(), class = c("collector_double",
"collector")), m2 = structure(list(), class = c("collector_double",
"collector")), m3 = structure(list(), class = c("collector_double",
"collector")), m4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))

Loop in tidyverse

I am learning tidyverse() and I am using a time-series dataset, and I selected columns that start with sec. What I would like basically to identify those values from columns that equal 123, keep these and have the rest replace with 0. But I don't know how to loop from sec1:sec4. Also how can I sum() per columns?
df1<-df %>%
select(starts_with("sec")) %>%
select(ifelse("sec1:sec4"==123, 1, 0))
Sample data:
structure(list(sec1 = c(1, 123, 1), sec2 = c(123, 1, 1), sec3 = c(123,
0, 0), sec4 = c(1, 123, 1)), spec = structure(list(cols = list(
sec1 = structure(list(), class = c("collector_double", "collector"
)), sec2 = structure(list(), class = c("collector_double",
"collector")), sec3 = structure(list(), class = c("collector_double",
"collector")), sec4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
-3L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
I think you would have to use mutate and across to accomplish this. below you will mutate across each column starting with sec and then keep all values that are 123 and replace all others with 0.
df1<-df %>%
select(starts_with("sec")) %>%
mutate(across(starts_with("sec"),.fns = function(x){ifelse(x == 123,x,0)}))

Unexpected behavior of filter inside a function dplyr

I have a function that filters a data.frame based on the unique values of a group column that is passed to the function
la <- function(df, grp){
gr <- df %>% pull({{grp}}) %>% unique()
purrr::map(gr, function(x){
print(x)
filter(df, {{grp}} == x)
})
}
When I use it with this df,
x <- structure(list(mac = c("dc:a6:32:21:59:2b", "dc:a6:32:2d:8c:ca",
"dc:a6:32:2d:b8:62", "dc:a6:32:2d:ca:3f"), datetime = structure(c(1594644546,
1594645457, 1594645375, 1594645080), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Comment = c("FED2", "FED7", "FED1", "FED6")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
la(x, mac)
I get the proper prints and the subsets.
However, when I use it with this other df, which should be equivalent, it doesn't work as expected.
df <- structure(list(datetime = structure(c(1594644600, 1594644900,
1594645200, 1594645500, 1594645800, 1594646100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), movement = c(9940.50454596681, 10779.7747307276,
7148.52826988968, 7687.54314683339, 8797.06954533588, 7524.02474093548
), x = c(606, NA, 240, NA, 504, NA), y = c(386, NA, 274, NA,
56, NA), i_x = c(606, 228, 214, 407.5, 500, 292.947368421053),
i_y = c(386, 286, 258, 49.1666666666667, 56, 234), mac = c("dc:a6:32:21:59:2b",
"dc:a6:32:21:59:2b", "dc:a6:32:21:59:2b", "dc:a6:32:21:59:2b",
"dc:a6:32:21:59:2b", "dc:a6:32:21:59:2b")), spec = structure(list(
cols = list(filename = structure(list(), class = c("collector_character",
"collector")), datetime = structure(list(format = ""), class = c("collector_datetime",
"collector")), movement = structure(list(), class = c("collector_double",
"collector")), x = structure(list(), class = c("collector_double",
"collector")), y = structure(list(), class = c("collector_double",
"collector")), i_x = structure(list(), class = c("collector_double",
"collector")), i_y = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = "\t"), class = "col_spec"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
I get 0 rows on each type of group (my real example has the same groups as the ones for the x dataframe).
Interestingly, this works as expected.
la(select(head(df), mac, datetime), mac)
[1] "dc:a6:32:21:59:2b"
[[1]]
# A tibble: 6 x 2
mac datetime
<chr> <dttm>
1 dc:a6:32:21:59:2b 2020-07-13 12:50:00
2 dc:a6:32:21:59:2b 2020-07-13 12:55:00
3 dc:a6:32:21:59:2b 2020-07-13 13:00:00
4 dc:a6:32:21:59:2b 2020-07-13 13:05:00
5 dc:a6:32:21:59:2b 2020-07-13 13:10:00
6 dc:a6:32:21:59:2b 2020-07-13 13:15:00
What is going on?
As the comment suggests, the problem is that I have function(x) inside the map call and because df has an x column, things become weird. I chose another variable name for that, and now it's working.
la <- function(df, grp){
gr <- df %>% pull({{grp}}) %>% unique()
purrr::map(gr, function(tt){
print(tt)
filter(df, {{grp}} == tt)
})
}

Visualize bubbles on a map, using hc_add_series_map() instead of hcmap()

I am trying to visualize a bubble map, using highcharter.
I did it perfectly, using this code
library(highcharter)
library(tidyverse)
hcmap("custom/africa") %>%
hc_add_series(data = fake_data, type = "mapbubble", maxSize = '10%', color =
"Red", showInLegend = FALSE) %>%
hc_legend(enabled = FALSE)
My data
> dput(fake_data)
structure(list(country = c("DZ", "CD", "ZA", "TZ"), lat = c(28.033886,
-4.038333, -30.559482, -6.369028), lon = c(1.659626, 21.758664,
22.937506, 34.888822), name = c("Algeria", "Congo, Dem. Rep",
"South Africa", "Tanzania"), z = c(20, 5, 10, 1)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec =
structure(list(
cols = list(country = structure(list(), class = c("collector_character",
"collector")), lat = structure(list(), class = c("collector_double",
"collector")), lon = structure(list(), class = c("collector_double",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), z = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
External geo data for Africa originally comes from this source and used with hcmap().
But I transform it into RDS and use locally. Available here.
My problem that I cannot use my code and external data due to corporate IT security restrictions. I cannot deploy this code with Shiny/RMarkdown on Connect, it is blocked.
So my solution currently
Use the same data in RDS format
africa_map_data <- readRDS("africa_map_data.RDS")
And use the hc_add_series_map() with local data instead of hcmap().
highchart() %>%
hc_add_series_map(
map = africa_map_data,
df = fake_data,
value = "z",
joinBy = c("hc-a2", "country"),
type = "mapbubble",
maxSize = '10%',
color = "Red"
)
But it does not work well, I get a mess.
How to create a bubble map with hc_add_series_map() (or any other way) without 'hcmap' and pulling external data.
Thanks!

ggplot loop deal with special characters

Hi there I'm trying to plot a defined number of graphs using gridExtra.
This is working but unfortunately it is not dealing with special characters in its name. I tried to work around by using R friendly names and add in the actual name as a subtitle
library(gridExtra)
library(ggplot2)
Dataframe<-read.csv2(File_with_R_friendly_names.csv)
names<-read.csv2(File_with_actual_names.csv)
bar<-colnames(names)
list_of_plots<-lapply(names(Dataframe)[2:10], function(i) {
ggplot(Dataframe, aes_string(x="X1", y=i)) + geom_point()+labs(x=i, y="Intensity", subtitle=bar[i])
})
do.call(grid.arrange, c(list_of_plots, ncol=3))
If I put in bar[2] all graphs get the actual name but it is the same one for all while if I set bar to i, all graphs get NA.
The names I use to suit R are
Met1, Met2, Met3, Met4, Met5, Met6, Met7, Met8, Met9 and Met10
Examples of names that I need on the plots are:
-(-)-Corey lactone
-(2R)-2,3-Dihydroxypropanoic acid
-(D-(+)-Glyceric acid?)
-1,5-Naphthalenediamine
-12-Aminododecanoic acid
-2,5-di-tert-Butylhydroquinone
-2,6-di-tert-Butylphenol
-2-Amino-N,N-diethylacetamide
-2-Ethyl-2-phenylmalonamide
-2-Naphthalenesulfonic acid
Here is the dput to reproduce the bar (names):
`bar<-c("X1", "(-)-Corey lactone", "(2R)-2,3-Dihydroxypropanoic acid (D-(+)- Glyceric acid?)", "1,5-Naphthalenediamine", "12-Aminododecanoic acid", "2,5-di- tert-Butylhydroquinone", "2,6-di-tert-Butylphenol", "2-Amino-N,N- diethylacetamide", "2-Ethyl-2-phenylmalonamide", "2-Naphthalenesulfonic acid")`
Here is the dput to reproduce the dataframe:
Dataframe<-structure(list(X1 = c(0, 0, 0.25, 0.25, 0.5, 0.5, 1, 1, 2, 2),
Met1 = c(0, 0, 38096319.85, 45978353.93, 35077691.7, 42146132.41,
62606961.17, 32786049.6, 51054004.82, 48898547.32), Met2 = c(0,
0, 1288905.771, 948466.4001, 645979.6463, 1228663.251, 1137957.136,
940928.9344, 1443680.706, 1755726.385), Met3 = c(0, 0, 575887.464,
693692.0349, 1362477.6, 1515767.293, 2241120.502, 2417932.908,
3866432.112, 3894701.876), Met4 = c(0, 0, 16737068.73, 21915551.3,
12088089.1, 16003037.3, 17720785.29, 11957614.24, 13127281.5,
14192542.13), Met5 = c(0, 0, 4556006.426, 4782909.936, 4484706.271,
8019957.826, 5112289.476, 8537488.48, 6680688.948, 5959748.061
), Met6 = c(0, 0, 16874476.32, 15721984.25, 18093323.61,
18619817.92, 22055835.04, 19754379.11, 29211315.88, 27321333.35
), Met7 = c(0, 0, 6604385.457, 6396794.568, 13823034.64,
15449539.63, 26013299.82, 20262673.28, 35301685.57, 33367520.66
), Met8 = c(0, 0, 6727973.448, 7166827.569, 13238311.46,
13986568.69, 20957194.23, 19186953.76, 34513697.47, 31192991.75
), Met9 = c(0, 0, 2373752.304, 3259738.104, 1998529.732,
2387445.15, 2479309.442, 26924139.6, 4611277.427, 2439602.098
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-10L), .Names = c("X1", "Met1", "Met2", "Met3", "Met4", "Met5",
"Met6", "Met7", "Met8", "Met9"), spec = structure(list(cols = structure(list(
X1 = structure(list(), class = c("collector_double", "collector"
)), Met1 = structure(list(), class = c("collector_double",
"collector")), Met2 = structure(list(), class = c("collector_double",
"collector")), Met3 = structure(list(), class = c("collector_double",
"collector")), Met4 = structure(list(), class = c("collector_double",
"collector")), Met5 = structure(list(), class = c("collector_double",
"collector")), Met6 = structure(list(), class = c("collector_double",
"collector")), Met7 = structure(list(), class = c("collector_double",
"collector")), Met8 = structure(list(), class = c("collector_double",
"collector")), Met9 = structure(list(), class = c("collector_double",
"collector"))), .Names = c("X1", "Met1", "Met2", "Met3",
"Met4", "Met5", "Met6", "Met7", "Met8", "Met9")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Because names(Dataframe)[2:10] is not number. Below will work:
list_of_plots<-lapply(as.numeric(names(Dataframe)[2:10]), function(i) {
ggplot(Dataframe, aes_string(x="X1", y=i)) + geom_point()+labs(x=i,
y="Intensity", subtitle=bar[i])
})
do.call(grid.arrange, c(list_of_plots, ncol=3))

Resources