Count edges using the adjacency matrix - r

Based on the adjacency matrix, I would like to count the number of unique edges in a network. In the below example I coloured the unique edges between the different nodes. But I don't know how to proceed.
Desired output:
Sample data
structure(list(...1 = c("m1", "m2", "m3", "m4"), m1 = c(0.2,
0.2, 0.2, 0.3), m2 = c(0.1, 0.2, 0.2, 0.6), m3 = c(0.5, 0.2,
1, 0), m4 = c(0.3, 0, 0, 0.1)), row.names = c(NA, -4L), spec = structure(list(
cols = list(...1 = structure(list(), class = c("collector_character",
"collector")), m1 = structure(list(), class = c("collector_double",
"collector")), m2 = structure(list(), class = c("collector_double",
"collector")), m3 = structure(list(), class = c("collector_double",
"collector")), m4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))

Assuming that this is an undirected graph such that 0 indicates no edge and a positive number indicates an edge, convert the input DF to a logical matrix and from that to an igraph object. Then get its edges and the names of those edges. (Another possible output is by using as_edgelist(g) to get a 2 column matrix such that each row defines an edge.)
If it were intended that the graph be directed then replace "undirected" with "directed" and in that case a character vector of 13 edge names will be produced instead of the 9 undirected edges shown below.
library(igraph)
m <- as.matrix(DF[-1])
rownames(m) <- colnames(m)
g <- graph_from_adjacency_matrix(m > 0, "undirected")
e <- E(g)
attr(e, "vnames")
## [1] "m1|m1" "m1|m2" "m1|m3" "m1|m4" "m2|m2" "m2|m3" "m2|m4" "m3|m3" "m4|m4"
Alternately as a pipeline
library(igraph)
library(tibble)
DF %>%
column_to_rownames("...1") %>%
as.matrix %>%
sign %>%
graph_from_adjacency_matrix("undirected") %>%
E %>%
attr("vnames")
## [1] "m1|m1" "m1|m2" "m1|m3" "m1|m4" "m2|m2" "m2|m3" "m2|m4" "m3|m3" "m4|m4"
The graph of g looks like this. (If "directed" had been chosen above then the edges would have arrowheads on them.)
set.seed(123)
plot(g)
Note
DF <-
structure(list(...1 = c("m1", "m2", "m3", "m4"), m1 = c(0.2,
0.2, 0.2, 0.3), m2 = c(0.1, 0.2, 0.2, 0.6), m3 = c(0.5, 0.2,
1, 0), m4 = c(0.3, 0, 0, 0.1)), row.names = c(NA, -4L), spec = structure(list(
cols = list(...1 = structure(list(), class = c("collector_character",
"collector")), m1 = structure(list(), class = c("collector_double",
"collector")), m2 = structure(list(), class = c("collector_double",
"collector")), m3 = structure(list(), class = c("collector_double",
"collector")), m4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))

Related

Making a unique grouped bar graph for multiple different data frames

I have a series of about 300 data frames each structured the same way and want to write a code that will turn each of them into their own bar graph. I am struggling to write a code that structures the graph correctly in the first place. My data frames look like this as an example:
precursorMz Mz_Round HW Intensity Reg Intensity diff1 diff2
1 256.6814 141.10 4216 3994 0.96 1.00
2 256.6814 142.10 7184 5988 1.00 1.02
3 256.6814 143.12 44510 30020 1.02 1.00
4 256.6814 144.12 1858 1312 1.00 0.00
5 256.6814 260.20 43010 23230 4.52 1.00
6 256.6814 261.20 9452 6388 1.00 0.99
I want my graph to have the Mz_Round column be the X axis and then my Y values be HW Intensity and Reg Intensity.
I have tried using the barplot() function but again am having issues with getting my axes to be correct.
intensities <- table(split1$`HW Intensity`, split1$`Reg Intensity`)
barplot(intensities,
main = "Intensity Compared",
xlab = "M/z", ylab = "Intensity",
col = c("darkgrey", "blue"),
rownames(split1$Mz_Round),
beside = TRUE)
I have tried a couple of plots. I hope this helps.
# Data
> dput(df)
structure(list(precursor_Mz = c(256.6814, 256.6814, 256.6814,
256.6814, 256.6814, 256.6814), Mz_Round = c(141.1, 142.1, 143.12,
144.12, 260.2, 261.2), HW_Intensity = c(4216, 7184, 44510, 1858,
43010, 9452), Reg_Intensity = c(3994, 5988, 30020, 1312, 23230,
6388), diff1 = c(0.96, 1, 1.02, 1, 4.52, 1), diff2 = c(1, 1.02,
1, 0, 1, 0.99)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), spec = structure(list(cols = list(
precursor_Mz = structure(list(), class = c("collector_double",
"collector")), Mz_Round = structure(list(), class = c("collector_double",
"collector")), HW_Intensity = structure(list(), class = c("collector_double",
"collector")), Reg_Intensity = structure(list(), class = c("collector_double",
"collector")), diff1 = structure(list(), class = c("collector_double",
"collector")), diff2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
library(tidyverse)
# pivoting data
df1 <- df|>
select("Mz_Round", "HW_Intensity", "Reg_Intensity")|>
pivot_longer(!Mz_Round)
# stacked bar plot
ggplot(df1) +
geom_col(aes(x = as.factor(Mz_Round), y = value, fill = name))
# dodged bar plot
ggplot(df1) +
geom_col(aes(x = as.factor(Mz_Round), y = value, fill = name), position = "dodge")

Loop in tidyverse

I am learning tidyverse() and I am using a time-series dataset, and I selected columns that start with sec. What I would like basically to identify those values from columns that equal 123, keep these and have the rest replace with 0. But I don't know how to loop from sec1:sec4. Also how can I sum() per columns?
df1<-df %>%
select(starts_with("sec")) %>%
select(ifelse("sec1:sec4"==123, 1, 0))
Sample data:
structure(list(sec1 = c(1, 123, 1), sec2 = c(123, 1, 1), sec3 = c(123,
0, 0), sec4 = c(1, 123, 1)), spec = structure(list(cols = list(
sec1 = structure(list(), class = c("collector_double", "collector"
)), sec2 = structure(list(), class = c("collector_double",
"collector")), sec3 = structure(list(), class = c("collector_double",
"collector")), sec4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
-3L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
I think you would have to use mutate and across to accomplish this. below you will mutate across each column starting with sec and then keep all values that are 123 and replace all others with 0.
df1<-df %>%
select(starts_with("sec")) %>%
mutate(across(starts_with("sec"),.fns = function(x){ifelse(x == 123,x,0)}))

Visualize bubbles on a map, using hc_add_series_map() instead of hcmap()

I am trying to visualize a bubble map, using highcharter.
I did it perfectly, using this code
library(highcharter)
library(tidyverse)
hcmap("custom/africa") %>%
hc_add_series(data = fake_data, type = "mapbubble", maxSize = '10%', color =
"Red", showInLegend = FALSE) %>%
hc_legend(enabled = FALSE)
My data
> dput(fake_data)
structure(list(country = c("DZ", "CD", "ZA", "TZ"), lat = c(28.033886,
-4.038333, -30.559482, -6.369028), lon = c(1.659626, 21.758664,
22.937506, 34.888822), name = c("Algeria", "Congo, Dem. Rep",
"South Africa", "Tanzania"), z = c(20, 5, 10, 1)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec =
structure(list(
cols = list(country = structure(list(), class = c("collector_character",
"collector")), lat = structure(list(), class = c("collector_double",
"collector")), lon = structure(list(), class = c("collector_double",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), z = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
External geo data for Africa originally comes from this source and used with hcmap().
But I transform it into RDS and use locally. Available here.
My problem that I cannot use my code and external data due to corporate IT security restrictions. I cannot deploy this code with Shiny/RMarkdown on Connect, it is blocked.
So my solution currently
Use the same data in RDS format
africa_map_data <- readRDS("africa_map_data.RDS")
And use the hc_add_series_map() with local data instead of hcmap().
highchart() %>%
hc_add_series_map(
map = africa_map_data,
df = fake_data,
value = "z",
joinBy = c("hc-a2", "country"),
type = "mapbubble",
maxSize = '10%',
color = "Red"
)
But it does not work well, I get a mess.
How to create a bubble map with hc_add_series_map() (or any other way) without 'hcmap' and pulling external data.
Thanks!

How to plot correlation graphs with R^2 for a big datamatrix?

I have a proteomics data matrix. In the data matrix, I have detected a different number of peptides for each protein (detectable peptides numbers vary on the protein).
Q1. How can I plot correlation graphs for each protein to compare how its' peptides behave. i.e. For protein A, I have peptides a1-a3, I want to compare a1 vs a2, a1 vs a3, and a2 vs a3.
Sample data
structure(list(Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"), Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"), Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199, 0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801), Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982, 0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278), Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584, 0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602), Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525, 1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923), Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.000820463, 1.03124071, 0.995897846, 0.289542364, 0.578721727, 0.672592766, 1.168944588), Sample6 = c(1.063124715, 0.623917522, 0.613196611, 0.990921045, 1.014340981, 0.965631141, 0.316793011, 1.02220535, 1.182063616, 1.41196421), Sample7 = c(1.335677026, 0.628621656, 0.411171453, 1.050563412, 1.290233552, 1.1603839, 0.445372411, 1.077192698, 0.726669337, 1.09453338), Sample8 = c(1.139360562, 0.404024829, 0.263714711, 0.899959209, 1.356913804, 1.246338203, 0.426568548, 1.104988267, 0.964924824, 1.083654341), Sample9 = c(1.38146599, 0.582817437, 0.783698738, 1.118948066, 1.010795866, 1.277086848, 0.434025911, 1.238871048, 1.201184368, 1.476478831), Sample10 = c(1.111486801, 0.60513273, 0.460680037, 1.385702246, 1.448873253, 1.364329784, 0.375032044, 1.382750002, 0.741842319, 1.035657705)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Protein = structure(list(), class = c("collector_character", "collector")), Peptide = structure(list(), class = c("collector_character", "collector")), Sample1 = structure(list(), class = c("collector_double", "collector")), Sample2 = structure(list(), class = c("collector_double", "collector")), Sample3 = structure(list(), class = c("collector_double", "collector")), Sample4 = structure(list(), class = c("collector_double", "collector")), Sample5 = structure(list(), class = c("collector_double", "collector")), Sample6 = structure(list(), class = c("collector_double", "collector")), Sample7 = structure(list(), class = c("collector_double", "collector")), Sample8 = structure(list(), class = c("collector_double", "collector")), Sample9 = structure(list(), class = c("collector_double", "collector")), Sample10 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
Hence peptide number varies for each protein, how can I compare each peptide and save the faceted graph into single plots, by this, I can select only required graphs.
"Hence peptide number varies for each protein, how can I compare each peptide and save the faceted graph into single plots, by this, I can select only required graphs." I'm not entirely sure what you actually want to plot. A correlation plot of which quantities? Select only which required graphs?
Anyway, perhaps the following will help.
library(GGally)
library(tidyverse)
df %>%
gather(Sample, Value, -Protein, -Peptide) %>%
spread(Peptide, Value) %>%
filter(Protein == "A") %>%
ggpairs(columns = 3:6)
Explanation: We reshape data such that we have Values for every Peptide in columns; then we filter entries for Protein == "A" and use GGally::ggpairs to show pairwise correlation plots of Values for every Peptide.
You have a lot of flexibility in customising the output plot of ggpairs (e.g. add regression lines, remove panels, etc.); I recommend taking a look at the GGally GitHub project page and at Multiple regression lines in ggpairs.
Update
If you want to show correlation plots only for certain Peptides, you could do the following
pep_of_interest <- c("a2", "a4")
df %>%
gather(Sample, Value, -Protein, -Peptide) %>%
spread(Peptide, Value) %>%
filter(Protein == "A") %>%
ggpairs(columns = match(pep_of_interest, colnames(.)))
Here is a solution using the corrplot library if you are looking for visual representation of correlation. A lot more plotting options are available in the library (take a look at the corrplot vignette).
# sample data
dd <- structure(list(Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"), Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"), Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199, 0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801), Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982, 0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278), Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584, 0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602), Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525, 1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923), Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.000820463, 1.03124071, 0.995897846, 0.289542364, 0.578721727, 0.672592766, 1.168944588), Sample6 = c(1.063124715, 0.623917522, 0.613196611, 0.990921045, 1.014340981, 0.965631141, 0.316793011, 1.02220535, 1.182063616, 1.41196421), Sample7 = c(1.335677026, 0.628621656, 0.411171453, 1.050563412, 1.290233552, 1.1603839, 0.445372411, 1.077192698, 0.726669337, 1.09453338), Sample8 = c(1.139360562, 0.404024829, 0.263714711, 0.899959209, 1.356913804, 1.246338203, 0.426568548, 1.104988267, 0.964924824, 1.083654341), Sample9 = c(1.38146599, 0.582817437, 0.783698738, 1.118948066, 1.010795866, 1.277086848, 0.434025911, 1.238871048, 1.201184368, 1.476478831), Sample10 = c(1.111486801, 0.60513273, 0.460680037, 1.385702246, 1.448873253, 1.364329784, 0.375032044, 1.382750002, 0.741842319, 1.035657705)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Protein = structure(list(), class = c("collector_character", "collector")), Peptide = structure(list(), class = c("collector_character", "collector")), Sample1 = structure(list(), class = c("collector_double", "collector")), Sample2 = structure(list(), class = c("collector_double", "collector")), Sample3 = structure(list(), class = c("collector_double", "collector")), Sample4 = structure(list(), class = c("collector_double", "collector")), Sample5 = structure(list(), class = c("collector_double", "collector")), Sample6 = structure(list(), class = c("collector_double", "collector")), Sample7 = structure(list(), class = c("collector_double", "collector")), Sample8 = structure(list(), class = c("collector_double", "collector")), Sample9 = structure(list(), class = c("collector_double", "collector")), Sample10 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
# for Protein A, build subset of data
tempdd <- dd[dd$Protein == "A",][,-1]
cc <- tempdd[,1]
tempdd <- t(tempdd[,-1])
colnames(tempdd) <- cc
# calculate the correlations for all samples
rr <- cor(tempdd)
# install.packages("corrplot")
library(corrplot)
#Build the plot
corrplot(rr,method='circle')

ggplot loop deal with special characters

Hi there I'm trying to plot a defined number of graphs using gridExtra.
This is working but unfortunately it is not dealing with special characters in its name. I tried to work around by using R friendly names and add in the actual name as a subtitle
library(gridExtra)
library(ggplot2)
Dataframe<-read.csv2(File_with_R_friendly_names.csv)
names<-read.csv2(File_with_actual_names.csv)
bar<-colnames(names)
list_of_plots<-lapply(names(Dataframe)[2:10], function(i) {
ggplot(Dataframe, aes_string(x="X1", y=i)) + geom_point()+labs(x=i, y="Intensity", subtitle=bar[i])
})
do.call(grid.arrange, c(list_of_plots, ncol=3))
If I put in bar[2] all graphs get the actual name but it is the same one for all while if I set bar to i, all graphs get NA.
The names I use to suit R are
Met1, Met2, Met3, Met4, Met5, Met6, Met7, Met8, Met9 and Met10
Examples of names that I need on the plots are:
-(-)-Corey lactone
-(2R)-2,3-Dihydroxypropanoic acid
-(D-(+)-Glyceric acid?)
-1,5-Naphthalenediamine
-12-Aminododecanoic acid
-2,5-di-tert-Butylhydroquinone
-2,6-di-tert-Butylphenol
-2-Amino-N,N-diethylacetamide
-2-Ethyl-2-phenylmalonamide
-2-Naphthalenesulfonic acid
Here is the dput to reproduce the bar (names):
`bar<-c("X1", "(-)-Corey lactone", "(2R)-2,3-Dihydroxypropanoic acid (D-(+)- Glyceric acid?)", "1,5-Naphthalenediamine", "12-Aminododecanoic acid", "2,5-di- tert-Butylhydroquinone", "2,6-di-tert-Butylphenol", "2-Amino-N,N- diethylacetamide", "2-Ethyl-2-phenylmalonamide", "2-Naphthalenesulfonic acid")`
Here is the dput to reproduce the dataframe:
Dataframe<-structure(list(X1 = c(0, 0, 0.25, 0.25, 0.5, 0.5, 1, 1, 2, 2),
Met1 = c(0, 0, 38096319.85, 45978353.93, 35077691.7, 42146132.41,
62606961.17, 32786049.6, 51054004.82, 48898547.32), Met2 = c(0,
0, 1288905.771, 948466.4001, 645979.6463, 1228663.251, 1137957.136,
940928.9344, 1443680.706, 1755726.385), Met3 = c(0, 0, 575887.464,
693692.0349, 1362477.6, 1515767.293, 2241120.502, 2417932.908,
3866432.112, 3894701.876), Met4 = c(0, 0, 16737068.73, 21915551.3,
12088089.1, 16003037.3, 17720785.29, 11957614.24, 13127281.5,
14192542.13), Met5 = c(0, 0, 4556006.426, 4782909.936, 4484706.271,
8019957.826, 5112289.476, 8537488.48, 6680688.948, 5959748.061
), Met6 = c(0, 0, 16874476.32, 15721984.25, 18093323.61,
18619817.92, 22055835.04, 19754379.11, 29211315.88, 27321333.35
), Met7 = c(0, 0, 6604385.457, 6396794.568, 13823034.64,
15449539.63, 26013299.82, 20262673.28, 35301685.57, 33367520.66
), Met8 = c(0, 0, 6727973.448, 7166827.569, 13238311.46,
13986568.69, 20957194.23, 19186953.76, 34513697.47, 31192991.75
), Met9 = c(0, 0, 2373752.304, 3259738.104, 1998529.732,
2387445.15, 2479309.442, 26924139.6, 4611277.427, 2439602.098
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-10L), .Names = c("X1", "Met1", "Met2", "Met3", "Met4", "Met5",
"Met6", "Met7", "Met8", "Met9"), spec = structure(list(cols = structure(list(
X1 = structure(list(), class = c("collector_double", "collector"
)), Met1 = structure(list(), class = c("collector_double",
"collector")), Met2 = structure(list(), class = c("collector_double",
"collector")), Met3 = structure(list(), class = c("collector_double",
"collector")), Met4 = structure(list(), class = c("collector_double",
"collector")), Met5 = structure(list(), class = c("collector_double",
"collector")), Met6 = structure(list(), class = c("collector_double",
"collector")), Met7 = structure(list(), class = c("collector_double",
"collector")), Met8 = structure(list(), class = c("collector_double",
"collector")), Met9 = structure(list(), class = c("collector_double",
"collector"))), .Names = c("X1", "Met1", "Met2", "Met3",
"Met4", "Met5", "Met6", "Met7", "Met8", "Met9")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Because names(Dataframe)[2:10] is not number. Below will work:
list_of_plots<-lapply(as.numeric(names(Dataframe)[2:10]), function(i) {
ggplot(Dataframe, aes_string(x="X1", y=i)) + geom_point()+labs(x=i,
y="Intensity", subtitle=bar[i])
})
do.call(grid.arrange, c(list_of_plots, ncol=3))

Resources