Extract vertices from an R graph

I have the following problem: I have a directed graph called tutti and a vector called tabellaerrori that holds a vertex in each position.
I want to create an array containing the vertices that are in both the tutti graph and the tabellaerrori vector.
I used the following code, but it doesn't work:
risultato<-as.character(intersect(tabellaerrori,V(tutti)))
It always gives me back the content of tabellaerrori.
What's wrong?

For those who didn't suffer through the non-downloadable google plus image gallery, the actual line generating the error is:
graph.neighborhood(tutti, vcount(tutti), risultato, "out")
## Error in as.igraph.vs(graph, nodes) : Invalid vertex names
From the help - graph.neighborhood(graph, order, nodes=V(graph), mode=c("all", "out", "in")) is expecting nodes to be an actual igraph vertex sequence. You just need to make sure your intersected nodes are in that form.
Here's what jbaums meant by a reproducible example (provided I've made the right assumptions from your screen captures):
library(igraph)
set.seed(1492) # makes this more reproducible
# simulate your overall graph
tutti <- graph.full(604, directed=TRUE)
V(tutti)$name <- as.character(sample(5000, 604))
# simulate your nodes
tabellaerrori <- as.character(c(sample(V(tutti), 79), sample(6000:6500, 70)))
names(tabellaerrori) <- as.numeric(tabellaerrori)
# give a brief view of the overall data objects
head(V(tutti))
## Vertex sequence:
## [1] "1389" "1081" "922" "553" "261" "42"
length(V(tutti))
## [1] 604
head(tabellaerrori)
## 293 415 132 299 408 526
## "293" "415" "132" "299" "408" "526"
length(tabellaerrori)
## [1] 149
# for your answer, find the intersection of the vertex *names*
risultato <- as.character(intersect(tabellaerrori, V(tutti)$name))
risultato
## Vertex sequence:
## [1] "293" "132" "155" "261" "68" "381" "217" "394" "581"
# who are the ppl in your neighborhood
graph.neighborhood(tutti, vcount(tutti), risultato, "out")
## [[1]]
## IGRAPH DN-- 604 364212 -- Full graph
## + attr: name (g/c), loops (g/l), name (v/c)
##
## [[2]]
## IGRAPH DN-- 604 364212 -- Full graph
## + attr: name (g/c), loops (g/l), name (v/c)
##
## ... (a few more)
What you were really doing before (i.e. what intersect was doing under the covers) is:
risultato <- as.character(intersect(tabellaerrori, as.character(V(tutti))))
hence, your Invalid vertex names error.
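The same idea on a graph small enough to check by eye may help. This is only a sketch: the ring graph, the letter names, and the candidates vector are made up for illustration, and it uses the newer igraph function names (make_ring) rather than the graph.* ones above.

```r
library(igraph)

# Small named graph; the names are deliberately unrelated to the vertex ids
g <- make_ring(5, directed = TRUE)
V(g)$name <- c("a", "b", "c", "d", "e")

# Hypothetical candidate list; "z" is not a vertex of g
candidates <- c("b", "e", "z")

# Compare against the name attribute, not against the vertex sequence itself
common <- intersect(candidates, V(g)$name)
common

# Index V(g) by name if an actual vertex sequence is needed afterwards
common_vs <- V(g)[common]
common_vs
```

Indexing V(g) with a character vector looks vertices up by name, so common_vs can be passed to functions that expect a vertex sequence.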

Related

(v)matchPattern DNAStringSetList of Codons to Reference DNAString

I am assessing the impact of hotspot single nucleotide polymorphisms (SNPs) from a next-generation sequencing (NGS) experiment on the protein sequence of a virus. I have the reference DNA sequence and a list of hotspots. I first need to figure out the reading frame in which these hotspots occur. To do this, I generated a DNAStringSetList with all human codons and want to use vmatchPattern or matchPattern from the Biostrings package to figure out where the hotspots land in the codon reading frame.
I often struggle with lapply and the other apply functions, so I tend to use for loops instead. I am trying to improve in this area, so I would welcome an apply solution should one be available.
Here is the code for the list of codons:
alanine <- DNAStringSet("GCN")
arginine <- DNAStringSet(c("CGN", "AGR", "CGY", "MGR"))
asparginine <- DNAStringSet("AAY")
aspartic_acid <- DNAStringSet("GAY")
asparagine_or_aspartic_acid <- DNAStringSet("RAY")
cysteine <- DNAStringSet("TGY")
glutamine <- DNAStringSet("CAR")
glutamic_acid <- DNAStringSet("GAR")
glutamine_or_glutamic_acid <- DNAStringSet("SAR")
glycine <- DNAStringSet("GGN")
histidine <- DNAStringSet("CAY")
start <- DNAStringSet("ATG")
isoleucine <- DNAStringSet("ATH")
leucine <- DNAStringSet(c("CTN", "TTR", "CTY", "YTR"))
lysine <- DNAStringSet("AAR")
methionine <- DNAStringSet("ATG")
phenylalanine <- DNAStringSet("TTY")
proline <- DNAStringSet("CCN")
serine <- DNAStringSet(c("TCN", "AGY"))
threonine <- DNAStringSet("ACN")
tyrosine <- DNAStringSet("TGG")
tryptophan <- DNAStringSet("TAY")
valine <- DNAStringSet("GTN")
stop <- DNAStringSet(c("TRA", "TAR"))
codons <- DNAStringSetList(list(alanine, arginine, asparginine, aspartic_acid, asparagine_or_aspartic_acid,
cysteine, glutamine, glutamic_acid, glutamine_or_glutamic_acid, glycine,
histidine, start, isoleucine, leucine, lysine, methionine, phenylalanine,
proline, serine, threonine, tyrosine, tryptophan, valine, stop))
Current for loop code:
reference_stringset <- DNAStringSet(covid)
codon_locations <- list()
for (i in 1:length(codons)) {
  pattern <- codons[[i]]
  codon_locations[i] <- vmatchPattern(pattern, reference_stringset)
}
Current error message. Note that I am subsetting the codons DNAStringSetList so that each pattern is a DNAStringSet:
Error in normargPattern(pattern, subject) : 'pattern' must be a single string or an XString object
I can't give out the exact nucleotide sequence, but here is the COVID genome (link: https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?report=fasta) to use as a reprex:
# for those not used to .fasta files: first copy and paste the genome into Notepad and save it as a .fasta file
# then use readDNAStringSet from the Biostrings package to read in the .fasta file
filepath <- "path/to/genome.fasta" # placeholder: insert your own file path
covid <- readDNAStringSet(filepath)
To fix the current code, change how codons is built. At the moment, printing codons gives:
DNAStringSetList of length 24
[[1]] GCN
[[2]] CGN AGR CGY MGR
[[3]] AAY
[[4]] GAY
[[5]] RAY
[[6]] TGY
[[7]] CAR
[[8]] GAR
[[9]] SAR
[[10]] GGN
...
<14 more elements>
Change it from DNAStringSetList to a conglomerate DNAStringSet of the amino acids.
codons <- DNAStringSet(c(alanine, arginine, asparginine, aspartic_acid, asparagine_or_aspartic_acid,
cysteine, glutamine, glutamic_acid, glutamine_or_glutamic_acid, glycine,
histidine, start, isoleucine, leucine, lysine, methionine, phenylalanine,
proline, serine, threonine, tyrosine, tryptophan, valine, stop))
codons
DNAStringSet object of length 32:
width seq
[1] 3 GCN
[2] 3 CGN
[3] 3 AGR
[4] 3 CGY
[5] 3 MGR
... ... ...
[28] 3 TGG
[29] 3 TAY
[30] 3 GTN
[31] 3 TRA
[32] 3 TAR
When I run the script I get the following output with the SARS-CoV-2 isolate listed for the example (I'm showing a small slice)
codon_locations[27:28]
[[1]]
MIndex object of length 1
$`NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome`
IRanges object with 0 ranges and 0 metadata columns:
start end width
<integer> <integer> <integer>
[[2]]
MIndex object of length 1
$`NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome`
IRanges object with 554 ranges and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 89 91 3
[2] 267 269 3
[3] 283 285 3
[4] 352 354 3
[5] 358 360 3
... ... ... ...
[550] 29261 29263 3
[551] 29289 29291 3
[552] 29472 29474 3
[553] 29559 29561 3
[554] 29793 29795 3
Looking at the elements that returned matches, only the patterns made up of standard nucleotides ("ATCG", no wobble codes) found any. The patterns containing ambiguity codes will need to be changed as well before they can be searched.
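One possible way to handle both the apply-style loop and the wobble codes is sketched below. It assumes that matching with fixed = FALSE, which tells vmatchPattern to treat IUPAC ambiguity codes as the set of bases they stand for, is acceptable for your analysis. The toy subject sequence and the three patterns are made up so that the result can be checked by hand; they are not taken from the genome above.

```r
library(Biostrings)

# Toy subject standing in for the reference genome (hypothetical sequence)
reference_stringset <- DNAStringSet(c(toy = "ATGGCTGCAACGTGGTAA"))

# A few patterns, two of them containing the ambiguity code N
codons <- DNAStringSet(c("GCN", "ACN", "TGG"))

# vmatchPattern() takes one pattern at a time, so iterate over the set.
# codons[[i]] is a single DNAString; fixed = FALSE lets ambiguity codes
# in the pattern match the bases they represent.
codon_locations <- lapply(seq_along(codons), function(i) {
  vmatchPattern(codons[[i]], reference_stringset, fixed = FALSE)
})
names(codon_locations) <- as.character(codons)

# Number of hits per pattern in the first (only) subject sequence
hits <- sapply(codon_locations, function(m) length(m[[1]]))
hits
```

Each element of codon_locations is an MIndex object, so codon_locations[["GCN"]][[1]] gives the IRanges of matches for that pattern, and (start(...) - 1) %% 3 would give the frame of each hit relative to position 1.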
If you're on twitter, I suggest linking the question using the #rstats, #bioconductor, and #bioinformatics hashtags to generate some more traction, I've noticed that bioinformatic specific questions on SO don't generate as much buzz.

How to remove elements of a list in R?

I have an igraph object, which I created with the igraph library. This object is a list. Some of the components of this list have a length of 2. I would like to remove all of them.
IGRAPH clustering walktrap, groups: 114, mod: 0.79
+ groups:
$`1`
[1] "OTU0041" "OTU0016" "OTU0062"
[4] "OTU1362" "UniRef90_A0A075FHQ0" "UniRef90_A0A075FSE2"
[7] "UniRef90_A0A075FTT8" "UniRef90_A0A075FYU2" "UniRef90_A0A075G543"
[10] "UniRef90_A0A075G6B2" "UniRef90_A0A075GIL8" "UniRef90_A0A075GR85"
[13] "UniRef90_A0A075H910" "UniRef90_A0A075HTF5" "UniRef90_A0A075IFG0"
[16] "UniRef90_A0A0C1R539" "UniRef90_A0A0C1R6X4" "UniRef90_A0A0C1R985"
[19] "UniRef90_A0A0C1RCN7" "UniRef90_A0A0C1RE67" "UniRef90_A0A0C1RFI5"
[22] "UniRef90_A0A0C1RFN8" "UniRef90_A0A0C1RGE0" "UniRef90_A0A0C1RGX0"
[25] "UniRef90_A0A0C1RHM1" "UniRef90_A0A0C1RHR5" "UniRef90_A0A0C1RHZ4"
+ ... omitted several groups/vertices
For example, this one :
> a[[91]]
[1] "OTU0099" "UniRef90_UPI0005B28A7E"
I tried this but it does not work :
a[lapply(a,length)>2]
Any help?
Since you didn't provide any reproducible data or example, I had to produce some dummy data:
# create dummy data
a <- list(x = 1, y = 1:4, z = 1:2)
# remove elements in list with lengths greater than 2:
a[which(lapply(a, length) > 2)] <- NULL
In case you wanted to remove the items with lengths exactly equal to 2 (question is unclear), then last line should be replaced by:
a[which(lapply(a, length) == 2)] <- NULL
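If it is easier to think in terms of what to keep rather than what to delete, the same filtering can be sketched with lengths() or Filter() from base R. This uses the same dummy list as above and assumes a behaves like a plain list; the names kept and kept2 are made up for the example.

```r
# Same dummy data as above
a <- list(x = 1, y = 1:4, z = 1:2)

# Keep every element whose length is not exactly 2
kept <- a[lengths(a) != 2]
names(kept)

# Equivalent formulation with Filter(), which keeps elements where the predicate is TRUE
kept2 <- Filter(function(el) length(el) != 2, a)
identical(kept, kept2)
```

Both versions return a new list and leave a untouched, which can be handy when the original object is still needed.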

How to create igraph multi-graph list

I have a number of networks, currently represented as edgelists, and I can create individual igraph objects from them:
> nodes <- read.csv("nodes.csv",header=T,as.is=T)
> links <- read.csv("edges.csv",header=T,as.is=T)
> net <- graph_from_data_frame(d=links, vertices=nodes, directed=T)
> net
IGRAPH f40255d DNW- 255 458 --
+ attr: name (v/c), word (v/c), type (e/c), weight (e/n)
+ edges from f40255d (vertex names):
[1] s001->s002 s001->s004 s001->s006 s001->s013 s001->s018 s001->s020
[7] s001->s025 s001->s027 s001->s031 s001->s032 s001->s033 s001->s034
[13] s001->s035 s001->s037 s001->s038 s001->s041 s001->s042 s001->s044
[19] s001->s046 s001->s047 s001->s050 s001->s051 s001->s052 s001->s055
[25] s001->s059 s001->s060 s001->s064 s001->s065 s001->s067 s001->s068
[31] s001->s069 s001->s070 s001->s072 s001->s074 s001->s075 s001->s078
[37] s001->s081 s001->s088 s001->s091 s001->s092 s001->s093 s001->s098
[43] s001->s100 s001->s103 s001->s111 s001->s112 s001->s112 s001->s118
+ ... omitted several edges
But I want to analyze these using the graphkernels package in R, which requires a list of igraph graphs. I have looked over the docs, but I can't find any guidance on how to turn a number of these into a list. Can anyone point me in the right direction?
Sorry if this is a dumb question. I am minimally proficient in R.
Update: I tried creating 3 igraph objects and putting them in a list with
g <- list(net1,net2,net3)
But when I try to run this I get
> result <- CalculateWLKernel(g,5)
Error in CalculateKernelCpp(graph.info.list, par, 11) :
vector::_M_default_append
In addition: There were 12 warnings (use warnings() to see them)
So obviously something isn't right.
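For the list-building step on its own, a minimal sketch is shown below. The two small edge tables are made up, and in practice each one would come from its own read.csv() call. This only shows how to get a plain R list of igraph objects; it does not address whatever CalculateWLKernel is objecting to.

```r
library(igraph)

# Hypothetical edge lists standing in for the per-network CSV files
edge_tables <- list(
  data.frame(from = c("s001", "s001"), to = c("s002", "s003")),
  data.frame(from = c("s001", "s002"), to = c("s002", "s003"))
)

# Build one igraph object per edge list; the result is an ordinary list
graph_list <- lapply(edge_tables, graph_from_data_frame, directed = TRUE)

# Quick sanity checks on what ended up in the list
length(graph_list)
sapply(graph_list, vcount)
```

This gives the same kind of object as list(net1, net2, net3), so if the kernel call still fails, the cause is more likely in the graphs themselves than in how the list was built.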

Convert a graphNEL graph to a network graph

I'm trying to convert a graphNEL graph to network graph.
Here's my example using topGO's vignette:
library(topGO)
library(ALL)
data(ALL)
data(geneList)
affyLib <- paste(annotation(ALL),"db",sep= ".")
library(package=affyLib,character.only=TRUE)
topgo.obj <- new("topGOdata",description="Simple session",ontology="BP",allGenes=geneList,geneSel=topDiffGenes,nodeSize=10,annot=annFUN.db,affyLib=affyLib)
topgo.graph <- attr(topgo.obj,"graph")
And trying to convert topgo.graph to a network through intergraph
library(network)
library(sna)
library(scales)
library(igraph)
library(intergraph)
topgo.igraph <- graph_from_graphnel(topgo.graph,name=TRUE,weight=TRUE,unlist.attrs=TRUE)
And finally
topgo.network <- asNetwork(topgo.igraph,amap=attrmap())
throws this error:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "environment" to a data.frame
When I try this with intergraph's example:
asNetwork(exIgraph)
it works fine
and as far I can tell exIgraph and topgo.igraph look similar:
> exIgraph
IGRAPH D--- 15 11 --
+ attr: label (v/c), label (e/c)
+ edges:
[1] 2-> 1 3-> 1 4-> 1 5-> 1 6-> 7 8-> 9 10->11 11->12 12->13 13->14 14->12
> topgo.igraph
IGRAPH DNW- 1017 2275 --
+ attr: name (v/c), genes (v/x), weight (e/n)
+ edges (vertex names):
[1] GO:0000003->GO:0008150 GO:0000070->GO:0000278 GO:0000070->GO:0007067 GO:0000070->GO:1903047 GO:0000070->GO:0000819
[6] GO:0000075->GO:0022402 GO:0000077->GO:0031570 GO:0000077->GO:0006974 GO:0000079->GO:1904029 GO:0000079->GO:0071900
[11] GO:0000082->GO:0044772 GO:0000082->GO:0044843 GO:0000086->GO:0000278 GO:0000086->GO:0044772 GO:0000086->GO:0044839
[16] GO:0000122->GO:0006357 GO:0000122->GO:0045892 GO:0000122->GO:0006366 GO:0000165->GO:0035556 GO:0000165->GO:0023014
[21] GO:0000187->GO:0032147 GO:0000187->GO:0043406 GO:0000209->GO:0016567 GO:0000226->GO:1902589 GO:0000226->GO:0007010
[26] GO:0000226->GO:0007017 GO:0000278->GO:0007049 GO:0000280->GO:0048285 GO:0000302->GO:0006979 GO:0000302->GO:1901700
[31] GO:0000723->GO:0006259 GO:0000723->GO:0032200 GO:0000723->GO:0060249 GO:0000819->GO:1902589 GO:0000819->GO:0098813
[36] GO:0000819->GO:0051276 GO:0000902->GO:0032989 GO:0000910->GO:0022402 GO:0000910->GO:0051301 GO:0001501->GO:0048731
+ ... omitted several edges
Any idea?
This is happening because of the "genes" attribute. If you view it using V(topgo.igraph)$genes, you will see that it returns a list of environments rather than a vector. Deep in the intergraph code, asNetwork tries to coerce the vertex attributes into a data frame, which it cannot do for environments. (This happens in the dumpAttr() function; see getAnywhere(dumpAttr.igraph).)
To solve this, you can simply delete the attribute:
topgo.igraph <- delete_vertex_attr(topgo.igraph,"genes")
topgo.network <- asNetwork(topgo.igraph,amap=attrmap())
The argument unlist.attrs=T I think is designed to prevent the exact problem above, but it is not working in this case. This might be due to the naming convention used by the genes in the network.
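If you would rather not hard-code the attribute name, one option is to look for list-valued vertex attributes and drop whichever ones turn up. The sketch below runs on a made-up three-vertex graph with a fake genes attribute, not on the topGO data, and the names is_list_attr, list_attrs, and g_clean are invented for the example.

```r
library(igraph)

# Toy graph with one atomic and one list-valued vertex attribute
g <- make_ring(3)
V(g)$name <- c("GO:A", "GO:B", "GO:C")               # made-up names
V(g)$genes <- list(new.env(), new.env(), new.env())  # mimics the environments

# Flag vertex attributes that are stored as lists rather than atomic vectors
is_list_attr <- vapply(vertex_attr(g), is.list, logical(1))
list_attrs <- names(is_list_attr)[is_list_attr]
list_attrs

# Drop each flagged attribute before converting the graph
g_clean <- g
for (nm in list_attrs) {
  g_clean <- delete_vertex_attr(g_clean, nm)
}
vertex_attr_names(g_clean)
```

The cleaned graph keeps its atomic attributes, so it could then be handed to asNetwork() in the same way as above.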
If you look at the attributes from the original graphNEL object, you will notice that it consists of objects of class environment:
> head(graph::nodeData(topgo.graph, attr = "genes"))
$`GO:0000003`
<environment: 0x15c005ae0>
$`GO:0000070`
<environment: 0x15c136bf0>
$`GO:0000075`
<environment: 0x15c118a70>
$`GO:0000077`
<environment: 0x15c13ae70>
$`GO:0000079`
<environment: 0x163145670>
$`GO:0000082`
<environment: 0x16313d148>
You could also alter the attribute data in the original topGO object to solve the problem as well:
nodeData(topgo.graph, attr = "genes") <- topgo.obj@graph@nodes
topgo.igraph <- graph_from_graphnel(topgo.graph,name=TRUE,weight=TRUE,unlist.attrs=TRUE)
topgo.network <- asNetwork(topgo.igraph,amap=attrmap())
This preserves the genes as a vertex attribute, if you want that:
> head(network::get.vertex.attribute(topgo.network, "genes"))
[1] "GO:0000003" "GO:0000070" "GO:0000075" "GO:0000077" "GO:0000079" "GO:0000082"

How to access attributes of a dendrogram in R

From a dendrogram which I created with
hc<-hclust(kk)
hcd<-as.dendrogram(hc)
I picked a subbranch
k=hcd[[2]][[2]][[2]][[2]][[2]][[2]][[2]][1]
When I simply have k displayed, this gives:
> k
[[1]]
[[1]][[1]]
[1] 243
attr(,"label")
[1] "NAfrica_002"
attr(,"members")
[1] 1
attr(,"height")
[1] 0
attr(,"leaf")
[1] TRUE
[[1]][[2]]
[1] 257
attr(,"label")
[1] "NAfrica_016"
attr(,"members")
[1] 1
attr(,"height")
[1] 0
attr(,"leaf")
[1] TRUE
attr(,"members")
[1] 2
attr(,"midpoint")
[1] 0.5
attr(,"height")
[1] 37
How can I access, for example, the "midpoint" attribute, or the second of the "label" attributes?
(I hope I am using the correct terminology here.)
I have tried things like
k$midpoint
attr(k,"midpoint")
but both returned 'NULL'.
Sorry for asking a second question: how could I add a "label" attribute after the "midpoint" attribute?
Your k is still buried one layer too deep. The attributes have been set on the first element of the list k.
attributes(k[[1]]) # Display attributes
attributes(k[[1]])$label # Access attributes
attributes(k[[1]])$label <- 'new' # Change attribute
Alternatively, you can use attr:
attr(k[[1]],'label') # Display attribute
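The difference between single and double brackets can be reproduced on a small built-in data set. This is a sketch that uses USArrests as a stand-in for your data, so the values differ from your output; only the pattern of where the attributes live carries over.

```r
# Small dendrogram built from a built-in data set (stand-in for hcd)
hcd <- as.dendrogram(hclust(dist(USArrests[1:5, ])))

# Single brackets return a plain list wrapped around the branch
k <- hcd[2]
attr(k, "midpoint")        # NULL: the wrapper list carries no attributes

# The branch itself is the first element of that list
attr(k[[1]], "midpoint")
attr(k[[1]], "members")

# Going one level further down reaches individual sub-branches and leaves
second_label <- attr(k[[1]][[1]][[2]], "label")
second_label
```

Using hcd[[2]] instead of hcd[2] in the first place avoids the extra wrapper, so attr() can then be called on the result directly.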
You can change attributes manually, as in the previous answer. The problem is that this is inefficient when you need to do it many times. Also, while it is easy to change an attribute, the change may not be reflected by other functions, since they only act on it if they were programmed to.
For your specific question - it generally depends on which attribute we want to view. For "midpoint", use the get_nodes_attr function, with the "midpoint" parameter - from the dendextend package.
# install.packages("dendextend")
library(dendextend)
dend <- as.dendrogram(hclust(dist(USArrests[1:5,])))
# Like:
# dend <- USArrests[1:5,] %>% dist %>% hclust %>% as.dendrogram
# midpoint for all nodes
get_nodes_attr(dend, "midpoint")
And you get this:
[1] 1.25 NA 1.50 0.50 NA NA 0.50 NA NA
To also change an attribute, you can use the various assign functions from the package: assign_values_to_leaves_nodePar, assign_values_to_leaves_edgePar, assign_values_to_nodes_nodePar, assign_values_to_branches_edgePar, remove_branches_edgePar, remove_nodes_nodePar
If all you want is to change the labels, the following ability from the package would solve your question:
> labels(dend)
[1] "Arkansas" "Arizona" "California" "Alabama" "Alaska"
> labels(dend) <- 1:5
> labels(dend)
[1] 1 2 3 4 5
For more details on the package, you can have a look at its vignette.
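If you would rather stay in base R, dendrapply() from the stats package can apply the same change to every node without indexing each one by hand. This is a swapped-in alternative rather than something dendextend does for you, and the upper-casing rule and the helper name relabel are arbitrary choices for illustration.

```r
# Same small dendrogram as in the dendextend example
dend <- as.dendrogram(hclust(dist(USArrests[1:5, ])))

# Function applied to every node: only leaves get their label changed
relabel <- function(node) {
  if (is.leaf(node)) {
    attr(node, "label") <- toupper(attr(node, "label"))
  }
  node
}

# dendrapply() walks the tree and rebuilds it from the returned nodes
dend2 <- dendrapply(dend, relabel)
labels(dend2)
```

The same pattern works for any attribute that should be set per node, as long as the function returns the modified node.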
