igraph does not show the right network I imported - r

I would like to run some SNA analysis. I work with RStudio and the igraph package.
My input data comes from a text file (exported from Excel as a tab-separated text file).
The data file has 3 columns. The 1st and 2nd columns are the network data (vertices) and the 3rd column is the weight for each edge. I use airport connection data that looks like this:
1 54 28382 (Airport ID Origin Airport / Airport ID Destination Airport / Passenger number as a weight)
I loaded it with these commands:
USAN_num1 <- read.table('USAN_num.txt', header=T)
USAN_g_num1 <- graph.data.frame(USAN_num1)
> summary(USAN_g_num1)
Vertices: 626
Edges: 7078
Directed: TRUE
No graph attributes.
Vertex attributes: name.
Edge attributes: PAX.
Data looks like this:
ORIGN DESTN PAX
1 1 604 646
2 2 42 3736
3 2 118 5189
Now to the problem that occurred:
My network consists of 6 different clusters when I check it with igraph. Even when I plot the network, it has 6 separate parts. That makes no sense, since my data should be connected as one network. I checked through my dataset and there really are no separate sub-networks.
Here is the cluster characteristics I get:
$csize
[1] 5 608 2 4 5 2
$no
[1] 6
One vertex in a small cluster is even a huge airport that should be connected to many others, not just to 1 other...
UPDATE:
I now updated to the newest igraph version but it still does not work.
I uploaded an exemplary part of my data as a .txt file here: USAN_numS.txt
Would be great if someone has an idea on what I did wrong.
Thank you

So, as I said above in my comment, a possible source of confusion is that your graph has symbolic vertex names that are actually numbers and don't match igraph's vertex ids. The workaround is to drop the vertex names, or to specify them explicitly when creating the graph, so that they match the igraph vertex ids.
But your graph really has multiple components. See the following code, where I check in the original table that two vertices each appear exactly once, in the same row, so they form a component of two by themselves.
Maybe the network really has multiple components, or there are mistakes in the file.
library(igraph)
USAN_num1 <- read.table('USAN_numS.txt', header=T)
USAN_g_num1 <- graph.data.frame(USAN_num1,
vertices=data.frame(id=1:max(USAN_num1[,1:2])))
clu <- clusters(USAN_g_num1)
clu$csize
## [1] 5 607 2 4 5 1 2 1
## The '1's appear because we counted the vertices that are
## not in the table
## Third component has two vertices only, let's check them in the
## original table
which(clu$membership == 3)
## [1] 64 617
## List the table rows where any of these two appear
USAN_num1[ USAN_num1[,1] %in% c(64, 617) | USAN_num1[,2] %in% c(64, 617), ]
## ORIGN DESTN PAX
## 691 64 617 636
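The same check can be done without igraph by counting how often each vertex id is mentioned in the edge table. This is only a minimal sketch: the four rows are the ones quoted in this thread (rows 1 to 3 from the question and row 691 from the output above), not the full file, and the names edges and appearances are made up for the illustration.

```r
# Four edge rows quoted in this thread (not the full data set)
edges <- data.frame(ORIGN = c(1, 2, 2, 64),
                    DESTN = c(604, 42, 118, 617),
                    PAX   = c(646, 3736, 5189, 636))

# Count how many edge endpoints mention each vertex id
appearances <- table(c(edges$ORIGN, edges$DESTN))

# Vertices that appear only once have a single connection; if two such
# vertices share one row, they form a component of size two
appearances[c("64", "617")]
```

On the real file, any large airport that shows up with a count of 1 here would point to a problem in the data rather than in igraph.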

Related

How to create an interval file defined by values from another file - for circos imaging of WGS data

I am trying to depict my whole-genome sequence (WGS) data of my parasite, using the circos software.
One of the elements I would like to depict is the areas of the reference genome for which I do not have sequencing data from my parasite.
In order to do this, I have used Samtools to create an mpileup file, from which I have extracted the positions where the sequence depth = 0. I therefore have a file that looks like this:
$chromosome_name $chromosome_position $depth
chr_1 1 0
chr_1 2 0
chr_1 3 0
chr_2 67 0
chr_2 68 0
chr_2 1099 0
chr_2 1100 0
chr_2 1101 0
This means that there are 3 positions in chromosome 1 with no sequence data (depth = 0), namely positions 1, 2 and 3. For chromosome 2, the positions with no data are 67, 68, 1099, 1100 and 1101.
Because my files are enormous (up to 3 million lines) and a lot of the unsequenced positions come in runs, I would like to create an interval file from the above data. Also, circos requires such an interval file in order to create tiles. I therefore need to create a new file from the above that looks like this:
$chromosome_name $start_pos $end_pos
chr_1 1 3
chr_2 67 68
chr_2 1099 1101
I have searched a bunch, but I have only found questions pertaining to grouping data by pre-defined intervals (e.g. group purchases occurring over a period of 6 months, patients by age etc).
So if anybody can help me out, I will be extremely happy!
Sidsel
Consider using bedtools. Specifically the bedtools merge sub-command:
http://bedtools.readthedocs.io/en/latest/content/tools/merge.html
From this page, it would seem to do what you want:
bedtools merge combines overlapping or “book-ended” features in an
interval file into a single feature which spans all of the combined
features.
Moreover, you can use the -d option to specify the maximum distance between features to be merged:
-d Maximum distance between features allowed for features to be merged. Default is 0. That is, overlapping and/or book-ended features
are merged.
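Note that bedtools merge expects BED-style intervals (chromosome, start, end) as input, so a per-position list like the one in the question would need converting first. As an alternative, here is a minimal base R sketch that collapses consecutive positions into intervals directly. It assumes the table is already sorted by chromosome and then by position; the object and column names (depth0, chr, pos, intervals) are made up for the illustration, and the data are the example rows from the question.

```r
# Example rows from the question: positions with depth 0
depth0 <- data.frame(
  chr = c("chr_1", "chr_1", "chr_1", "chr_2", "chr_2", "chr_2", "chr_2", "chr_2"),
  pos = c(1, 2, 3, 67, 68, 1099, 1100, 1101)
)

n <- nrow(depth0)

# A new interval starts on the first row, when the chromosome changes,
# or when the position is not exactly the previous position + 1
new_run <- c(TRUE,
             depth0$chr[-1] != depth0$chr[-n] | diff(depth0$pos) != 1)
run_id <- cumsum(new_run)

# One row per run: chromosome, first position, last position
intervals <- data.frame(
  chr   = depth0$chr[new_run],
  start = depth0$pos[new_run],
  end   = as.vector(tapply(depth0$pos, run_id, max))
)
intervals
```

The result can be written out with write.table(intervals, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE). Whether circos wants 0-based or 1-based coordinates is worth checking against its documentation before using the file.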

how to create a network having edge list and node list in two different csv file using igraph in R?

What I am trying to do is create a network using igraph in R.
My data includes:
1. An edge list in a CSV file like this one (call it book):
a b
1 2
2 3
5 1
2. A node list in a CSV file like this one (call it name):
c
1
2
3
4
5
this is the code :
library(igraph)
# Import Data
relations=read.csv("book.csv",head=TRUE)
# Load (UNDIRECTED) graph from data frame
network=graph.data.frame(relations,directed=FALSE)
ecount(network)
Vcount(network)
summary(network)
#visualizing the network
tkplot(network,vertex.shape="circle", vertex.color="red" ,vertex.size=10)
I can create and plot the network with only the edge list and everything works fine, but no nodes show up, and when I call Vcount(network) I get this error: could not find function "Vcount"
But in the summary I get, for example:
IGRAPH UN--20 111 --
attr: name (v/c)
I think that I should use a data.frame to specify the nodes, but I don't know how (I was using this: [http://www.inside-r.org/packages/cran/igraph/docs/graph.data.frame]).
I tried this code but it doesn't work:
library(igraph)
# Import Data
relations=read.csv("book.csv",head=TRUE)
nodes=read.csv("name.csv",head=TRUE)
vertex=data.frame(nodes)
# Load (UNDIRECTED) graph from data frame
network=graph.data.frame(relations,directed=FALSE,vertices=vertex)
ecount(network)
Vcount(network)
summary(network)
#visualizing the network
tkplot(network,vertex.shape="circle", vertex.color="red" ,vertex.size=10)
When I run summary(network) everything looks right, but I still get the Vcount error, and in tkplot there are still no nodes, just the edges.
How should I fix this?
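A minimal sketch of how the two tables could be combined, assuming the igraph package is installed. R function names are case-sensitive and igraph's vertex counter is vcount() in lowercase, which would explain the "could not find function" error. The data frames below are typed in by hand from the example rows in the question instead of being read from book.csv and name.csv, and graph_from_data_frame() is the newer name for graph.data.frame().

```r
library(igraph)

# Edge list and node list from the question, typed in by hand
relations <- data.frame(a = c(1, 2, 5), b = c(2, 3, 1))
nodes     <- data.frame(c = 1:5)

# The first column of `vertices` supplies the vertex names, so node 4
# is kept even though it has no edges
network <- graph_from_data_frame(relations, directed = FALSE, vertices = nodes)

vcount(network)   # lowercase "v"; number of vertices
ecount(network)   # number of edges
degree(network)   # isolated vertices have degree 0
```

With the example data this gives 5 vertices and 3 edges. Plotting is left out of the sketch; plot(network) or tkplot(network) should draw node 4 as an unconnected point.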

Excel: Select data for graph

To put it simply, I have three columns in Excel like the ones below:
Vehicle x y
1 10 10
1 15 12
1 12 9
2 8 7
2 11 6
3 7 12
x and y are the coordinates of customers assigned to the corresponding vehicle. This file is the output of a program I run in advance. The list will always be sorted by vehicle, but the number of customers assigned to vehicle "k" may change from one experiment to the next.
I would like to plot a graph containing 3 series, one for each vehicle, where the customers of each vehicle would appear (as dots in 2D based on their x- and y- values) in different color.
In my real file, I have 12 vehicles and 3200 customers, and the ranges change from one experiment to the next, so I would like to automate the process, i.e., copy-paste the list into my Excel sheet and see the graph appear automatically (if this is possible).
Thanks in advance for your time and effort.
EDIT: There is a similar post here: Use formulas to select chart data but requires the use of VB. Moreover, I am not sure whether it has been indeed answered.
you should try this free online tool - www.cloudyexcel.com/excel-to-graph/

merge same row of different Vector and multiplicate afterwards

I have a dataset like this:
MQ = data.frame(Model=c("C150A","B174","DG18"),Quantity=c(5000,3800,4000))
MQ is a data.frame that shows the production plan for a week in the future, with Model being the model to produce and Quantity the number of units.
C150A = data.frame( Material=c("A0015", "A0071", "Z00071", "Z00080","Z00090",
"Z00012","SZ0001"), Number=c(1,1,1,1,1,1,4))
B174= data.frame(Material=c("A0014","A0071","Z00080","Z00091","Z00011","SZ0000"),
Number=c(1,1,1,1,2,4))
DG18= data.frame( Material=c("A0014","A0075","Z00085","Z00090","Z00010","SZ0005"),
Number=c(1,1,1,2,3,4))
T75A= data.frame(Material=c("A0013","A0075","Z00085","Z00090","Z00012","SZ0005"),
Number=c(1,1,1,2,3,4))
G95= data.frame(Material=c("A0013","A0075","Z00085","Z00090","Z00017","SZ0008"),
Number=c(1,1,1,2,3,4))
These are Models which could be produced...
My first problem is that, depending on the production plan MQ, I want to automatically open the needed models and multiply the Quantity by the Number, to know how many of each component (Material) is needed.
The output could be a data.frame that combines all needed components across all models listed in the production plan (different models can share components or use different ones, and the amount needed per component can differ).
Material_Master= data.frame( Material=c( "A0013", "A001","A0015", "A0071", "A0075",
"A0078", "Z00071", "Z00080", "Z00090", "Z00091",
"Z00012","Z00091","Z00010","Z00012","Z00017","SZ0001",
"SZ0005","SZ0005","SZ0000","SZ0008","SZ0009"),
Number=c(20000,180000,250000,480000,250000,170000,
690000,1800000,17000,45000,12000,5000, 5000,
8000,16000,17000,45000,88000,7500,12000,45000))
In the last step, the created data.frame should be merged with the Material_Master data, which lists all important components together with their stock.
In my example, all components needed for production are also listed in Material_Master, but a component may be missing from Material_Master; in that case, just ignore that component.
The output should compare the needed amount of each component with its actual stock and report where the need exceeds the stock.
Thank you for your help.
This should work:
mods <- do.call(rbind,lapply(MQ$Model,function(x)cbind(Model=x,get(x))))
full_plan <- merge(mods,MQ,by="Model")
material_plan <- with(full_plan,aggregate(Quantity*Number,by=list(Material),sum))
# Group.1 x
# 1 A0014 7800
# 2 A0015 5000
# 3 A0071 8800
# 4 A0075 4000
# 5 SZ0000 15200
# 6 SZ0001 20000
# 7 SZ0005 16000
# 8 Z00010 12000
# 9 Z00011 7600
# 10 Z00012 5000
# 11 Z00071 5000
# 12 Z00080 8800
# 13 Z00085 4000
# 14 Z00090 13000
# 15 Z00091 3800
The first line gets each of your models and stacks them, along with the model name. The second line merges back to get the Quantity, and the third aggregates.
I went ahead and made a usable example by trimming off the 1 at the beginning of each Number in your latter models. Also, I read the Model and Material columns in as character instead of factor.
options(stringsAsFactors=FALSE)
MQ = data.frame(Model=c("C150A","B174","DG18"),Quantity=c(5000,3800,4000))
C150A = data.frame(Material=c("A0015","A0071","Z00071","Z00080","Z00090","Z00012","SZ0001"),Number=c(1,1,1,1,1,1,4))
B174= data.frame(Material=c("A0014","A0071","Z00080","Z00091","Z00011","SZ0000"), Number=c(1,1,1,1,2,4))
DG18= data.frame(Material=c("A0014","A0075","Z00085","Z00090","Z00010","SZ0005"),Number=c(1,1,1,2,3,4))
T75A= data.frame(Material=c("A0013","A0075","Z00085","Z00090","Z00012","SZ0005"),Number=c(1,1,1,2,3,4))
G95= data.frame(Material=c("A0013","A0075","Z00085","Z00090","Z00017","SZ0008"),Number=c(1,1,1,2,3,4))
Edit: Added the required stringsAsFactors option, as identified by @RicardoSaporta.
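The last step from the question (comparing the needed amounts with the stock) is not covered above. Here is a minimal sketch of one way it could go, using small made-up tables so it runs on its own. The names needed, stock, report and Shortfall are hypothetical, the numbers are illustrative and not taken from Material_Master, and the sketch assumes one stock row per material (duplicated materials would have to be summed first).

```r
# Hypothetical, trimmed-down inputs (values are illustrative only)
needed <- data.frame(Material = c("A0014", "A0015", "Z00090"),
                     Needed   = c(7800, 5000, 13000))
stock  <- data.frame(Material = c("A0015", "Z00090", "SZ0009"),
                     Stock    = c(250000, 12000, 45000))

# merge() does an inner join by default, so components missing from
# the stock table (here A0014) are silently dropped, as requested
report <- merge(needed, stock, by = "Material")

# Shortfall is how much more is needed than is in stock (0 if enough)
report$Shortfall <- pmax(report$Needed - report$Stock, 0)

# Report only the components where the need exceeds the stock
short <- report[report$Shortfall > 0, ]
short
```

To apply this to the answer above, the columns of material_plan (Group.1 and x) would first need renaming to Material and Needed, and the Number column of Material_Master to Stock.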

How can I select human miRNA from affy chip while analyzing data using R?

I am new to R and want to analyze miRNA expression from a data set of 3 groups. Can anyone help me out?
In this case I got other miRNAs (on Affy chips) as top expressed genes. Now I want to select only human miRNAs. Please help me.
Thanks in advance
Summary
I'm not entirely sure what your data frame looks like, given that I haven't worked with Affy chips before. Let me try to summarize what I think you have told us. You have a data frame with a list of all of the microRNAs on the Affy chip, along with their expression data. You want to select a subset of these microRNAs that are unique to humans.
Possible solution 1
You do not state whether or not your data frame contains a variable that identifies whether or not these microRNAs are indeed from humans. If it does have this information, all you would need to do is subset your data based on this identifier. Type help(subset) or help(Extract) for more information on how to do this.
Possible solution 2
If your data frame does not contain such an identifier, you will first need to make a list of all known human microRNAs. You could retrieve these manually from the online miRBase website (and then import them into R), or you could download them from Ensembl using the R package biomaRt. To do the latter, after loading biomaRt, you might type this command:
miRNA <- getBM(c("mirbase_id", "ensembl_gene_id", "start_position", "chromosome_name"), filters = c("with_mirbase"), values = list(TRUE), mart = ensembl)
The above code requests that R download the mirbase identifier, gene ID, start position, and chromosome name for all microRNAs in the miRBase catalog. (Note that you would have to specify the human Ensembl mart in an earlier command, which I have not shown).
Once you have downloaded this information, you could use a merge command or perhaps a which command to pull the appropriate microRNAs from your Affy chip data.
Recommendations
This all might sound a bit complicated. If you haven't already, I recommend that you spend some time working through exercises on biomaRt and Bioconductor. Information about these packages, and how to install them, is available at the links below:
Bioconductor, http://www.bioconductor.org/install/
Database mining with biomaRt, http://www.stat.berkeley.edu/~sandrine/Teaching/PH292.S10/Durinck.pdf
You might consider asking for this question to be migrated to Biostar. I think you would get better responses there. Also, consider editing your question to provide more information about your data. Good luck.
Edit to my original answer
In reference to your comment made at 2012-02-26 22:08:02, try the following:
## Load biomaRt package
library(biomaRt)
## Specify which "mart" (i.e., source of genetic data) that you want to use
ensembl <- useMart("ensembl")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
## You can then ask the system what attributes are available for download
listAttributes(ensembl)
name description
58 mirbase_accession miRBase Accession(s)
59 mirbase_id miRBase ID(s)
60 mirbase_gene_name miRBase gene name
61 mirbase_transcript_name miRBase transcript
Above I have pasted part of the output from the listAttributes() command, which shows the relevant miRBase options. Now you can try the following code:
## Download microRNA data
miRNA <- getBM(c("mirbase_id", "ensembl_gene_id", "start_position", "chromosome_name"), filters = c("with_mirbase"), values = list(TRUE), mart = ensembl)
## Check how much we downloaded
> dim(miRNA)
[1] 715 4
## Peek at the head of our data
> head(miRNA)
mirbase_id ensembl_gene_id start_position chromosome_name
1 hsa-mir-320c-1 ENSG00000221493 19263471 18
2 hsa-mir-133a-1 ENSG00000207786 19405659 18
3 hsa-mir-1-2 ENSG00000207694 19408965 18
4 hsa-mir-320c-2 ENSG00000212051 21901650 18
5 hsa-mir-187 ENSG00000207797 33484781 18
6 hsa-mir-1539 ENSG00000222690 47013743 18
## Check which chromosomes are contributing to our data
> table(miRNA$chromosome_name)
1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 3 4 5 6 7 8 9 X
50 27 26 25 15 59 26 15 35 7 85 23 32 5 16 31 23 30 17 33 27 28 80
Now your challenge will be to use this downloaded data to parse your original Affy data frame. Again, read the help files for the merge, Extract, and which functions to give it a try yourself first.
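If you get stuck, here is a minimal sketch of the subsetting step. It does not need biomaRt: the affy data frame, its column names and the expression values are made up for illustration, and your real chip annotation may use different identifiers (for example with a probe suffix), in which case the ids would need cleaning before matching. The human ids are taken from the head(miRNA) output above.

```r
# Hypothetical Affy-style table: one row per probe, made-up expression values
affy <- data.frame(
  probe_id = c("hsa-mir-187", "mmu-mir-21a", "hsa-mir-1-2", "cel-mir-39"),
  expr     = c(8.1, 7.4, 6.9, 5.2)
)

# Human miRBase ids, as they would come from miRNA$mirbase_id
human_ids <- c("hsa-mir-187", "hsa-mir-1-2", "hsa-mir-1539")

# Option 1: keep rows whose id is in the human list
human_only <- affy[affy$probe_id %in% human_ids, ]

# Option 2: the same selection with merge(), which also lets you
# carry over extra annotation columns from the downloaded table
by_merge <- merge(affy, data.frame(mirbase_id = human_ids),
                  by.x = "probe_id", by.y = "mirbase_id")

human_only
```

Since miRBase ids for human microRNAs start with "hsa-", a quick sanity check on the result is all(grepl("^hsa-", human_only$probe_id)).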
