I have a network in R and need to attach names to all vertices that have more than 3 related ties (or better, that have degree >= 2, that is, 2 or more adjacent edges). In one case the network is made of firms that collaborated with one another, and I need to assign to every vertex with degree >= 3 the corresponding firm's name (which I have in the CSV dataset, in the column Project Company).
I need an efficient way to find the node in a subset of nodes that is the most connected to another subset of nodes.
Currently, I iterate over each node in the first subset S1 and, for each node in S2, increment a counter if there is a path to it. The time complexity is therefore |S1| x |S2| x (time to find a path between two candidates). My current implementation uses networkx, and the graph is directed.
Does anyone know of an algorithm that can solve my problem?
If your first subset is connected (there is a path between every pair of nodes in the subset), then there is no answer: all nodes are equally connected to the second subset. If your second subset is connected, then again there is no answer, because every node is either equally connected or not connected at all.
If your subsets are not connected, then the "most connected" node in the first subset is one of the nodes in the first subset connected to the largest component of the second subset. (A component is a set of nodes, all reachable from each other.)
So:
If subsets are connected
    select random node
If second subset not connected
    find largest component of second subset
    find first node in first subset connected to a node in largest component of second subset
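As a point of comparison with the per-pair path search described in the question, here is a minimal sketch of a different technique (not the component-based steps above): run one breadth-first search per node of S1 and count how many nodes of S2 fall inside the reachable set. This costs roughly |S1| x (V + E) instead of |S1| x |S2| path searches. The adjacency-dict representation, the function names, and the toy graph are all assumptions made for illustration, and a start node is not counted as reaching itself.

```python
from collections import deque

def reachable(graph, start):
    """Return the set of nodes reachable from start in a directed graph.

    graph is a dict mapping each node to a list of its successors.
    The start node itself is excluded from the result.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

def most_connected(graph, s1, s2):
    """Return (best_node, counts): counts[u] is how many S2 nodes u can reach."""
    targets = set(s2)
    counts = {u: len(reachable(graph, u) & targets) for u in s1}
    best = max(counts, key=counts.get)  # ties resolve to the first node in s1
    return best, counts

# Hypothetical toy graph, for illustration only.
graph = {"a": ["x"], "b": ["x", "c"], "c": ["y"], "x": [], "y": ["z"], "z": []}
best, counts = most_connected(graph, ["a", "b"], ["x", "y", "z"])
print(best, counts)
```

With networkx, `nx.descendants(G, u)` returns the same reachable set for a node `u`, so the helper above could be swapped for that call.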
I have a table with two columns: One column contains Disease names the other contains Genes. I want to create a graph of only Disease nodes, connecting two nodes if they have a common gene affecting them.
How can I do this in Cytoscape?
There isn't a button (or import function) in Cytoscape that will perform this transformation automatically. You would need to perform this transformation prior to network import in order to construct the network model that you want. Or you could perform it in a number of steps in Cytoscape, following an algorithm like this:
Import disease-gene network
Identify sets of disease nodes adjacent to a given gene node (e.g., select each gene and then select first neighbors; or use Filters)
Connect nodes within each set as a clique (all connected to all) (e.g., right-click on selected set, Add > Edges Connecting Selected Nodes)
Delete gene nodes (and consequently all disease-gene connections) (e.g., select all nodes; or use Filters)
If you like R or Python, then you could leverage the RCy3 or py4cytoscape packages to interact with Cytoscape while doing the transformation by script.
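If the transformation is done before import, it can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the input is taken to be a sequence of (disease, gene) pairs, the function name `disease_projection` and the placeholder identifiers D1, G1, and so on are hypothetical, and writing the result to a file for Cytoscape's import dialog is left out.

```python
from collections import defaultdict
from itertools import combinations

def disease_projection(rows):
    """Map each pair of diseases to the set of genes they share.

    rows is an iterable of (disease, gene) pairs. Only disease pairs
    with at least one common gene appear in the result.
    """
    diseases_by_gene = defaultdict(set)
    for disease, gene in rows:
        diseases_by_gene[gene].add(disease)

    shared = defaultdict(set)
    for gene, diseases in diseases_by_gene.items():
        # Every gene turns its adjacent diseases into a clique.
        for d1, d2 in combinations(sorted(diseases), 2):
            shared[(d1, d2)].add(gene)
    return dict(shared)

# Placeholder data, not real disease-gene associations.
rows = [("D1", "G1"), ("D2", "G1"), ("D1", "G2"), ("D3", "G2"), ("D2", "G3")]
edges = disease_projection(rows)

# Print a tab-separated edge table: source, target, shared genes.
for (d1, d2), genes in sorted(edges.items()):
    print(d1, d2, ",".join(sorted(genes)), sep="\t")
```

Keeping the shared genes on each edge preserves the information that would otherwise be lost when the gene nodes are deleted.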
I want to analyze a dataset of companies that are related through shared corporate directors. The problem is that I first create a bipartite network and then work with its projection, and I don't know when to attach the vertex attributes.
I'm starting with a large dataframe with the following columns:
CompanyID, Director ID, CompanyName, Revenue, Size, Ticker, DirectorName, Director Position, ChairmanDummy, NumberofBoardSeats
I turn the df into a graph:
net <- graph.data.frame(df, directed = FALSE)
This gives me a graph ("net") of: CompanyID, Director ID relationships
I can't add vertex attributes at this stage because it is a bipartite network, and I only have a table of attributes for the company vertexes (the columns CompanyID, Director ID, CompanyName, Revenue, Size, Ticker). If I try to include "vertices=nodes" -- nodes being a file that contains the company vertexes, I get the error:
"Some vertex names in edge list are not listed in vertex data frame"
presumably because the table doesn't include data on the director vertexes.
So I move on to taking the bipartite projection that I am interested in:
V(net)$type <- V(net)$name %in% df$source
table(V(net)$type)
is_bipartite(net)
net.list<-bipartite_projection(net, multiplicity = TRUE)
firms<-net.list[[1]]
Here, firms is a projection of firm-firm relationships where two firms share the same company director.
But, at this point, how do I add the data on the attributes associated with each vertex (each company) once I have already created a graph?
As above, I have the "nodes" file that contains the information associated with each company, but I don't know how to attach it to the firms graph.
I tried exporting the graph as a data.frame: Firms.df <- as_data_frame(firms)
But I get the error "cannot coerce class "igraph" to a data.frame"
As a related question, I'm unsure how to deal with the information on the director vertexes, which are the part of the bipartite projection that I am essentially discarding. Some of these are director attributes (name, number of director positions), but others are specific to a firm-director pair (position). These are less useful for my analysis, but my sense is that I would need to subset the initial dataframe to select the features that I am interested in (e.g., only directors that belong to 10 or more boards) and then construct the network from the resulting subset of data. Is this the best way to approach them?
Any help would be much appreciated.
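To make the order of operations concrete, here is a language-agnostic sketch in plain Python rather than igraph: filter directors first, build the firm-firm projection with multiplicity, and only then look up company attributes by CompanyID for the vertices that survive. The function names, the `min_boards` parameter, and the placeholder IDs and attribute values are hypothetical, and this is one possible approach under those assumptions, not a confirmed answer to the igraph errors above.

```python
from collections import Counter, defaultdict
from itertools import combinations

def project_companies(pairs, min_boards=1):
    """Count shared directors for every pair of companies.

    pairs is an iterable of (company_id, director_id) rows. Directors
    sitting on fewer than min_boards boards are ignored.
    """
    boards = defaultdict(set)
    for company, director in pairs:
        boards[director].add(company)

    weights = Counter()
    for companies in boards.values():
        if len(companies) < min_boards:
            continue
        for c1, c2 in combinations(sorted(companies), 2):
            weights[(c1, c2)] += 1
    return weights

def attach_attributes(weights, company_attrs):
    """Return {company_id: attributes} for every vertex in the projection.

    Companies missing from company_attrs get an empty dict.
    """
    vertices = {c for edge in weights for c in edge}
    return {c: company_attrs.get(c, {}) for c in vertices}

# Placeholder rows and attributes, for illustration only.
pairs = [("C1", "D1"), ("C2", "D1"), ("C2", "D2"),
         ("C3", "D2"), ("C1", "D3"), ("C2", "D3")]
company_attrs = {"C1": {"CompanyName": "Alpha"}, "C2": {"CompanyName": "Beta"}}

weights = project_companies(pairs)
vertex_attrs = attach_attributes(weights, company_attrs)
print(dict(weights))
print(vertex_attrs)
```

Because the attribute lookup is keyed on the vertex name, it does not matter that the attribute table has no rows for directors: they are gone by the time the join happens.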
I used UCSC blat to search for a horse genomic sequence. Three results were returned, two were unplaced scaffolds, and the other was chr1. All had 100% identity to my query (gagttcctagacaccaaatacaacgtgggaatacacaacctactggcctatgtgaaacacctgaaaggccagaatgaggaagccctgaagagcttgagagaagctgaagacttaatccaggaagaacatggtgaccaatcaggcat).
My question is, are there 3 copies of this gene in the horse, or can the scaffolds belong to chr1? For what it's worth, there is only one copy of the gene in mouse.
The answer turns out to be that all 3 results are distinct locations in the genome. Unplaced scaffolds are assembled sequences that are still being worked on and have not yet been assigned to a chromosome in the reference genome.
I have data representing the paths people take across a fixed set of points (discrete, e.g., nodes and edges). So far I have been using igraph.
I haven't found a good way yet (in igraph or another package) to create canonical paths summarizing what significant sub-groups of respondents are doing.
A canonical path can be operationalized in any reasonable way and is just meant to represent a typical path or sub-path for a significant portion of the population.
Does there already exist a function to create these within igraph or another package?
One option: represent each person's movement as a directed edge. Create an aggregate graph such that each edge has a weight corresponding to the number of times that edge occurred. Those edges with large weights will be "typical" 1-paths.
Of course, it gets more interesting to find common k-paths or explore how paths vary among individuals. The naive approach for 2-paths would be to create N additional nodes that correspond to nodes when visited in the middle of the 2-path. For example, if you have nodes a_1, ..., a_N you would create nodes b_1, ..., b_N. The aggregate network might have an edge (a_3, b_5, 10) and an edge (b_5, a_7, 10); this would represent the two-path from a_3 through a_5 to a_7, encoded as (a_3, b_5, a_7), occurring 10 times. The task you're interested in corresponds to finding those two-paths with large weights.
Both the igraph and network packages would suffice for this sort of analysis.
If you have some bound on k (e.g., only paths of length up to 6 occur in your dataset), I might also suggest enumerating all the paths that are taken and computing a histogram over the unique paths. I don't know of any functions that do this automagically for you.
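The enumeration idea can be sketched with a counter over sliding windows. This is a minimal illustration that assumes each person's path is a list of node labels; the function name `count_subpaths` and the toy paths are hypothetical. With k = 1 it yields the aggregate edge weights described earlier, and larger k gives the histogram of k-paths directly, without the auxiliary b nodes.

```python
from collections import Counter

def count_subpaths(paths, k):
    """Count every run of k consecutive edges (k + 1 nodes) across all paths."""
    counts = Counter()
    for path in paths:
        for i in range(len(path) - k):
            counts[tuple(path[i:i + k + 1])] += 1
    return counts

# Hypothetical paths taken by four people over nodes a, b, c, d.
paths = [
    ["a", "b", "c", "d"],
    ["a", "b", "c"],
    ["b", "c", "d"],
    ["a", "c", "d"],
]

edge_weights = count_subpaths(paths, 1)  # typical 1-paths
two_paths = count_subpaths(paths, 2)     # typical 2-paths

for subpath, n in two_paths.most_common():
    print(" -> ".join(subpath), n)
```

Sub-paths with large counts are candidates for "canonical" paths; a threshold on the count, or on the share of respondents, decides what counts as typical.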