Disease network - networking

I have a table with two columns: one column contains disease names, the other contains genes. I want to create a graph of only disease nodes, connecting two nodes if they have a common gene affecting them.
How can I do this in Cytoscape?

There isn't a button (or import function) in Cytoscape that will perform this transformation automatically. You would need to perform this transformation prior to network import in order to construct the network model that you want. Or you could perform it in a number of steps in Cytoscape, following an algorithm like this:
1. Import the disease-gene network.
2. Identify the set of disease nodes adjacent to a given gene node (e.g., select each gene and then select its first neighbors; or use Filters).
3. Connect the nodes within each set as a clique, all connected to all (e.g., right-click on the selected set, Add > Edges Connecting Selected Nodes).
4. Delete the gene nodes, and consequently all disease-gene connections (e.g., select all gene nodes manually; or use Filters).
If you like R or Python, then you could leverage the RCy3 or py4cytoscape packages to interact with Cytoscape while doing the transformation by script.
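If you go the "transform before import" route, the clique-per-gene idea above can be sketched in plain Python. This is only a minimal illustration: the function name `disease_network` and the sample rows are made up, and it assumes the table has already been read into (disease, gene) pairs.

```python
from collections import defaultdict
from itertools import combinations


def disease_network(pairs):
    """Return the set of (disease_a, disease_b) edges for diseases sharing a gene."""
    # Group diseases by the gene that affects them.
    diseases_by_gene = defaultdict(set)
    for disease, gene in pairs:
        diseases_by_gene[gene].add(disease)

    edges = set()
    for diseases in diseases_by_gene.values():
        # Connect every pair of diseases that share this gene (a clique).
        # Sorting gives each undirected edge one canonical orientation.
        for a, b in combinations(sorted(diseases), 2):
            edges.add((a, b))
    return edges


# Hypothetical sample rows, for illustration only.
pairs = [
    ("asthma", "IL13"),
    ("eczema", "IL13"),
    ("eczema", "FLG"),
    ("ichthyosis", "FLG"),
    ("asthma", "ADRB2"),
]
edges = disease_network(pairs)
```

The resulting edge set can be written out as a two-column source/target table and imported into Cytoscape as a network, so no gene nodes ever need to be created or deleted there.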

Related

Efficient algorithm to get node in a subset most connected to nodes in another subset in a directed graph

I need an efficient way to find the node in a subset of nodes that is the most connected to another subset of nodes.
Currently, I iterate over each node in the first subset S1 and, for each node in S2, increment a counter if there is a path to that node. So the time complexity is |S1| × |S2| × (time to find a path between two candidates). My current algorithm is implemented using networkx, and the graph is a directed graph.
Does anyone know of an algorithm that can solve my problem?

If your first subset is connected (there is a path between every pair of nodes in the subset), then there is no answer: all nodes are equally connected to the second subset. If your second subset is connected, then again there is no answer, because every node is either equally connected or not connected at all.
If your subsets are not connected, then the "most connected" node in the first subset is one of the nodes in the first subset connected to the largest component of the second subset. (A component is a set of nodes, all reachable from each other.)
So:
If subsets are connected
    select random node
If second subset not connected
    find largest component of second subset
    find first node in first subset connected to node in largest component of second subset
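For comparison, here is a sketch of a more direct approach than the component-based outline above (it is a different technique, not an implementation of that outline): run one breadth-first search per node of S1 and count how many nodes of S2 it reaches. Each search costs O(V + E), so the total is O(|S1| · (V + E)) rather than one path query per (S1, S2) pair. The graph is assumed to be a plain dict of successor lists, and the function names are made up for illustration; in networkx, `nx.descendants(G, u)` returns the same reachable set.

```python
from collections import deque


def reachable(graph, start):
    """Return all nodes reachable from start (including start) along directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def most_connected(graph, s1, s2):
    """Return (best_node, counts): how many S2 nodes each S1 node can reach."""
    targets = set(s2)
    # A node does not count as "reaching" itself, even if it is also in S2.
    counts = {u: len((reachable(graph, u) - {u}) & targets) for u in s1}
    best = max(counts, key=counts.get)  # ties go to the earliest node in s1
    return best, counts


# Hypothetical directed graph as {node: [successors]}.
graph = {"a": ["b"], "b": ["x"], "c": ["x", "y"], "x": [], "y": ["z"], "z": []}
best, counts = most_connected(graph, ["a", "c"], ["x", "y", "z"])
```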

Do I need to create all nodes by hand in Neo4j?

I am probably missing something because I am very new to Neo4j, but looking at their Movie graph (probably the very first graph to play with when you are learning the platform), they give us a really big piece of code where every node, label, and property is entered by hand, one after the other. OK, that seems fair for a small graph for learning purposes. But how should I proceed when I want to import a CSV and create a graph from this data? I believe entering everything by hand is not expected at all.
My data look something like this:
date        origin  destiny  value  type    balance
01-05-2021  A       B        500    transf  2500
It has more than 10 thousand rows like this.
I loaded it as:
LOAD CSV FROM "file:///MyData.csv" AS data
RETURN data;
and it worked. The data was loaded etc. But now I have some questions:
1- How do I proceed if I want origin to be a node and destiny to be another node, with type as the edge and value as a property? I mean, I know how to create it like (a)->[]->(b), but how do I create the entire graph without creating it edge by edge, node by node, property by property, etc.?
2- Am I able to select the date and see something like a time evolution for this graph? I want to see all transactions on 20-05-2021, 01-05-2021, etc. and see how it evolves. Is it possible?

As the example in the official docs shows here: https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/#tutorial-neo4j-admin-import
You may want to create 3 separate files for the import:
First: you need the movies.csv to import nodes with label :Movie
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Second: you need actors.csv to import nodes with label :Actor
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
Finally, you can import relationships
As you see, actors and movies are already imported. So now you just need to specify the relationships. In the example, you're importing ACTED_IN relationships, each with a role property, in the given format:
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
So as you see in the header, you've got these values:
:START_ID - where the relationship starts, from which node
role - property name (you can specify multiple properties here, just make sure the CSV contains data for each of them)
:END_ID - where the relationship ends, to which node
:TYPE - type of the relationship
That's all :)
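Applied to the transaction data from the question, the same header conventions could be produced with a small script like the one below. Treat it as a sketch under assumptions: it assumes the CSV is comma-separated with a header row naming the six columns, and the `Account` label, the `accountId` ID field, and the choice to store date, value, and balance as relationship properties are all illustrative, not something the docs or the question prescribe.

```python
import csv
import io


def to_import_rows(transactions):
    """Split transaction dicts into node rows and relationship rows."""
    accounts = set()
    rels = []
    for t in transactions:
        # Every distinct origin/destiny value becomes one node.
        accounts.update((t["origin"], t["destiny"]))
        # One relationship per row; the "type" column becomes the relationship type.
        rels.append([t["origin"], t["date"], t["value"], t["balance"],
                     t["destiny"], t["type"]])
    node_rows = [["accountId:ID", ":LABEL"]]
    node_rows += [[a, "Account"] for a in sorted(accounts)]  # hypothetical label
    rel_rows = [[":START_ID", "date", "value:int", "balance:int", ":END_ID", ":TYPE"]]
    rel_rows += rels
    return node_rows, rel_rows


def write_csv(rows, handle):
    """Write rows to an open text handle (a file opened with newline='')."""
    csv.writer(handle).writerows(rows)


# In-memory stand-in for MyData.csv, using the sample row from the question.
source = io.StringIO(
    "date,origin,destiny,value,type,balance\n"
    "01-05-2021,A,B,500,transf,2500\n"
)
node_rows, rel_rows = to_import_rows(csv.DictReader(source))

nodes_out = io.StringIO()
write_csv(node_rows, nodes_out)
```

In real use you would pass file handles instead of `io.StringIO` objects and then hand the two generated files to the import tool as the nodes file and the relationships file.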

Set vertex names

I have a network in R and I have to attach names to all vertices that have more than 3 related ties (or better, that have degree >= 2, that is, 2 or more adjacent edges). In one case I have a network made of firms that collaborated with one another, and I need to assign to all vertices with degree >= 3 the corresponding firm's name (which I have in the CSV dataset in the column Project Company).

From and From Named Graph in SPARQL

I am confused about FROM and FROM NAMED graphs in SPARQL. I did read the parts of the SPARQL specification relating to these two constructs. I just want to confirm my understanding.
Suppose an RDF Dataset is located at IRI I. I is made up of:
a default graph G
3 named graphs {(I1,G1), (I2,G2), (I3,G3)}
Now, suppose I have a SPARQL query:
SELECT *
FROM I
FROM I1
FROM NAMED I2
So, if I understand correctly, to evaluate this query the SPARQL service may construct the dataset to be queried behind the scenes, and this dataset will contain:
a default graph which is the merge of I and I1
a named graph I2
Is this understanding right?

The FROM, FROM NAMED clauses describe the dataset to be queried. How that comes into being is not part of the SPARQL spec. There is a universe of graphs from which I, I1, and I2 are taken.
You are correct that the dataset for the query will have a default graph which is the merge of I and I1, and also a named graph I2.
Whether those are taken from the underlying dataset is implementation dependent. It is a common thing to provide (the universe of graphs is the named graphs in the dataset) but it is also possible that the I, I1, and I2 are taken from the web (the universe of graphs is the web).
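To make the dataset description concrete, here is a toy model in Python. It is only an illustration of the rule confirmed above, not how any SPARQL engine is implemented: the "universe of graphs" is assumed to be a dict from IRI to a set of triples, `build_dataset` is a made-up name, and a plain set union stands in for the RDF merge (a real merge also has to keep blank nodes from different graphs apart).

```python
def build_dataset(universe, from_iris, from_named_iris):
    """Return (default_graph, named_graphs) for the given FROM / FROM NAMED IRIs."""
    # FROM: all listed graphs are merged into the single default graph.
    default_graph = set()
    for iri in from_iris:
        default_graph |= universe[iri]
    # FROM NAMED: each listed graph stays separate, keyed by its IRI.
    named_graphs = {iri: universe[iri] for iri in from_named_iris}
    return default_graph, named_graphs


# Hypothetical universe of graphs with one triple each.
universe = {
    "I": {("s0", "p", "o0")},
    "I1": {("s1", "p", "o1")},
    "I2": {("s2", "p", "o2")},
    "I3": {("s3", "p", "o3")},
}

# Mirrors: FROM I  FROM I1  FROM NAMED I2
default_graph, named_graphs = build_dataset(universe, ["I", "I1"], ["I2"])
```

Note that I3 appears nowhere in the result: a graph that is not listed in FROM or FROM NAMED is simply not part of the dataset for that query.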

Analyzing Path Data

I have data representing the paths people take across a fixed set of points (discrete, e.g., nodes and edges). So far I have been using igraph.
I haven't found a good way yet (in igraph or another package) to create canonical paths summarizing what significant sub-groups of respondents are doing.
A canonical path can be operationalized in any reasonable way and is just meant to represent a typical path or sub-path for a significant portion of the population.
Does there already exist a function to create these within igraph or another package?

One option: represent each person's movement as a directed edge. Create an aggregate graph such that each edge has a weight corresponding to the number of times that edge occurred. Those edges with large weights will be "typical" 1-paths.
Of course, it gets more interesting to find common k-paths or explore how paths vary among individuals. The naive approach for 2-paths would be to create N additional nodes that correspond to nodes when visited in the middle of the 2-path. For example, if you have nodes a_1, ..., a_N you would create nodes b_1, ..., b_N. The aggregate network might have an edge (a_3, b_5, 10) and an edge (b_5, a_7, 10); this would represent the two-path (a_3, b_5, a_7) occurring 10 times. The task you're interested in corresponds to finding those two-paths with large weights.
Both the igraph and network packages would suffice for this sort of analysis.
If you have some bound on k (i.e., only 6-paths occur in your dataset), I might also suggest enumerating all the paths that are taken and computing a histogram over the unique paths. I don't know of any functions that do this automagically for you.
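The counting described above is short enough to write by hand. The sketch below assumes each person's path is a list of node names in visiting order; the function name `k_path_counts` and the sample paths are made up. With k = 1 it gives the aggregate edge weights, and with larger k it gives the histogram of k-paths directly, without the extra b_1, ..., b_N nodes.

```python
from collections import Counter


def k_path_counts(paths, k):
    """Count every run of k consecutive edges (k + 1 nodes) across all paths."""
    counts = Counter()
    for path in paths:
        # Slide a window of k + 1 nodes along the path.
        for i in range(len(path) - k):
            counts[tuple(path[i:i + k + 1])] += 1
    return counts


# Hypothetical paths taken by three people.
paths = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "b", "c", "d"],
]
edge_weights = k_path_counts(paths, 1)  # weights for the aggregate graph
two_paths = k_path_counts(paths, 2)     # histogram of 2-paths
most_typical = two_paths.most_common(1)
```

These counts are occurrences, so a person who repeats a sub-path contributes more than once; count each sub-path once per person instead if "share of the population" is what matters.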
