igraph R vertex ids get changed - r

I have a very basic issue with igraph (in R): renumbering of the node IDs.
For example, I have the following graph in the form of an edge list.
10,12
10,14
12,14
12,15
14,15
12,17
17,34
17,100
100,34
I want to calculate the local clustering coefficient (CC) for each node. First I read the edge list into an object g using read.csv. Then I used the following command to dump the local CC for each node.
write.csv(transitivity(g,type="local"),file="DumpLocalCC.csv")
Now the problem is that igraph renumbers the node IDs starting from 1, and I get the following output:
"","x"
"1",NA
"2",0.333333333333333
"3",0.333333333333333
"4",0.333333333333333
"5",1
"6",1
"7",1
Now how can I work out which node ID is which? That is, does 7 in the output file refer to 100 or to 34?
Is there any way to force igraph to dump the actual node IDs (10, 34, 100, etc.) along with their respective local CC?
While googling, I found that people suggest "V(g)$name <- as.character(V(g))" for preserving the node IDs. I tried it, but I think I am not using it correctly.
Also, since the data is large, I would rather not renumber the node IDs manually to make them sequential from 1.
P.S.: I noticed a similar question has been asked here. It was suggested to "assign these numbers as vertex names".
How do I do that?
Can someone give an example, please?
Another similar question (I understand it is much the same question) suggested opening an issue. I am not sure whether this has been resolved.
Thanks in advance.

You just need to combine the stats with the node names when you write the table. For example
DF <- read.csv(text="10,12
10,14
12,14
12,15
14,15
12,17
17,34
17,100
100,34", header=FALSE)
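g <- graph.data.frame(DF)
# Pair each vertex name (the original node ID) with its local clustering coefficient
outdata <- data.frame(node=V(g)$name, trans=transitivity(g, type="local"))
write.csv(outdata, file="DumpLocalCC.csv", row.names=FALSE)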
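To check which original ID a given row belongs to, one option is to name the coefficient vector by vertex name and look values up directly. This is a minimal sketch assuming the igraph package is installed; the names `cc` and `lookup` are made up for illustration, and the graph is built as undirected with `graph_from_data_frame`, the current name for `graph.data.frame`.

```r
library(igraph)

# Same edge list as in the question.
DF <- data.frame(
  from = c(10, 10, 12, 12, 14, 12, 17, 17, 100),
  to   = c(12, 14, 14, 15, 15, 17, 34, 100, 34)
)
g <- graph_from_data_frame(DF, directed = FALSE)

# Name each coefficient by the original node ID (stored as the vertex name).
cc <- setNames(transitivity(g, type = "local"), V(g)$name)
cc[["100"]]  # local CC of the node whose original ID is 100

# Table mapping igraph's internal 1..n numbering to the original IDs.
lookup <- data.frame(internal_id = seq_len(vcount(g)), node = V(g)$name)
```

Vertex names are character strings, so the lookup uses "100" rather than the number 100.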

Related

Rstudio - how to write smaller code

I'm brand new to programming and am picking up RStudio as a stats tool.
I have a dataset which includes multiple questionnaires divided by weeks, and I'm trying to organize the data into meaningful chunks.
Right now this is what my code looks like:
w1a=table(qwest1,talm1)
w2a=table(qwest2,talm2)
w3a=table(quest3,talm3)
Here quest and talm are the names of the variables and the number denotes the week.
Is there a way to compress all those lines into one line of code so that I could make w1a,w2a,w3a... each their own object with the corresponding questionnaire added in?
Thank you for your help, I'm very new to coding and I don't know the etiquette or all the vocabulary.
This might do what you wanted (but not what you asked for):
tbl_list <- mapply(table, list(qwest1, qwest2, quest3),
                   list(talm1, talm2, talm3), SIMPLIFY = FALSE)
names(tbl_list) <- c('w1a', 'w2a','w3a')
You are committing a fairly typical new-R-user error in creating multiple similarly named and structured objects but not putting them in a list. This is my effort at pushing you in that direction. Could also have been done via:
qwest_lst <- list(qwest1, qwest2, quest3)
talm_lst <- list(talm1, talm2, talm3)
tbl_lst <- mapply(table, qwest_lst, talm_lst, SIMPLIFY = FALSE)
names(tbl_lst) <- paste0('w', 1:3, 'a')
There are other ways to programmatically access objects with character vectors using get or mget.
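As a rough illustration of the mget route, the sketch below collects the weekly objects by name and tabulates them pairwise. The toy values are made up, and it assumes the variables are consistently named quest1..quest3 and talm1..talm3 (the question's own code mixes "qwest" and "quest").

```r
# Toy stand-ins for the weekly variables (made-up values).
quest1 <- c("yes", "no", "yes", "no");  talm1 <- c("a", "a", "b", "b")
quest2 <- c("yes", "yes", "no", "no");  talm2 <- c("a", "b", "a", "b")
quest3 <- c("no", "no", "yes", "yes");  talm3 <- c("b", "b", "a", "a")

# mget() fetches several objects by name and returns them as a list.
quest_lst <- mget(paste0("quest", 1:3), envir = environment())
talm_lst  <- mget(paste0("talm", 1:3), envir = environment())

# Map() always returns a list, one table per week.
tbl_lst <- Map(table, quest_lst, talm_lst)
names(tbl_lst) <- paste0("w", 1:3, "a")

tbl_lst$w2a  # the week-2 table
```

Keeping the tables in one list means later steps can loop over `tbl_lst` with lapply instead of repeating code per week.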

Nodes Size Relative to Associated Value

I'm fairly new to R and haven't been able to find an answer for this. Someone else asked a similar question, but no solution was ever reported. If I should have posted this Q on a different stackexchange, I apologize and will delete if it can't be migrated.
Using data I pulled from the FDIC on US based financial institutions and their total asset holdings, I would like to create a basic network graph where each node is proportionally sized to each other node in the graph. Each node would also be labeled with the name of the financial institution.
The edges of the graph actually don't matter for now, but I want each node connected to the network by at least one edge.
As of now, I've already successfully created a very basic network with 8 banks, connected by edges I randomly assigned, as shown here (I apparently can't embed pictures yet, sorry about that):
My .csv file will be formatted as:
id, bank, assets
1, JP Morgan Chase, 16928000
2, Bank of America, 19075000
... ... ...
For the graph I already created, it is the same as above except without the asset column. It was also only 8 banks, where the file I hope to use will have 25.
Like I already said, as for edges, I just randomly assigned some. If someone knows an easier way of creating random edges that connect the nodes I create, please let me know. Otherwise, this is how my file is formatted as of now:
to, from
1, 2
1, 3
...
And I created the graph I linked with the following commands:
> nodes <- read.csv("~/foo/foo/foo.csv")
> links <- read.csv("~/blah/taco/burrito/blah.csv")
> net <- graph_from_data_frame(d=links, vertices = nodes, directed = F)
> class(net)
> net
IGRAPH UN-- 8 10 --
+ attr: name (v/c), bank (v/c)
+ edges (vertex names):
[1] 1--2 1--3 1--4 1--5 2--3 2--4 2--7 4--5 5--8 7--8
> plot(net, main = "Financial Intermediaries", edge.arrow.size=.4, vertex.size=25, vertex.label.cex=1.5, vertex.label.color="black", vertex.label=V(net)$bank)
I hope I was clear with my problem and gave the necessary details/code. If not, please just let me know and I'll post it up here. Like I said, I'm really new to R (I literally picked it up today, lol), and much of the code I've used so far was more or less taken from Katya Ognyanova's examples/presentations on her blog.
For the sake of clarity, I'm currently using RStudio (most recent stable) and R v3.2.5.
I have been only using the igraph package, but if what I want can't be done with that, I am more than willing to switch over to a different package. That said, I would like to stay with R (unless there really is something so much easier for this it can't be ignored. I would like to stick with and learn R).
Thank you for any and all help, I really appreciate it.
As @Osssan linked to in the comments, there was a partial solution floating around.
That said, I think I created more of a 'hack' solution than a proper one with what I gleaned from the previous question. Here is what I did.
In my csv file, I had four columns. The third column held the assets for a given bank. NOTE: Since I don't know how to do data manipulation inside of R, I had to rescale the asset values beforehand so that they did not produce nodes covering the entire graph. With my solution, you will NOT get relatively sized nodes automatically. You must rescale the values first.
Since I wanted to create a network with nodes (banks) that vary in size according to their respective asset holdings, what I did was read that column on its own, like so:
> df <- read.csv("~/blah/blah/blah.csv", colClasses = c("NULL","NULL", NA, "NULL"))
This command reads in the csv file and uses colClasses to decide which columns to keep: columns marked "NULL" are skipped, and the one marked NA is read in with its type guessed automatically. The result is a data frame with a single column (the assets), which I then plugged into the plot function as such:
> plot(net, main = "Financial Intermediaries", edge.arrow.size=.4, vertex.size=as.matrix(df), vertex.label.color="black")
where as.matrix(df) converts that one-column data frame into a numeric matrix, which I pass to vertex.size= so that each node gets its own size.
I still have to do some relabeling and connecting with edges, but it worked in graphing. I graphed the largest 26 commercial banks by total asset holdings (and adjusted them to % of total commercial bank assets in the US), so you will see that the size of nodes increase from 26-1. Here's the output.
Like I said, this solution works, but I am far from sure whether it would be considered proper or kosher. I welcome anyone to edit this solution so that it clarifies what is actually happening with my code and or post a proper/optimized solution if it exists. I'm going to give this post a solid few days before marking it solved, as I would like to still get a solid answer on this confusing problem.
P.S. If anyone knows of a way to force nodes not to overlap, I would appreciate a comment explaining how to do that. If you look at my picture, you'll see that the effect of dwarfing the other nodes is diminished when the largest node is covered by its closely sized peers.
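The rescaling can also be done inside R rather than by hand. The following is only a sketch under stated assumptions: the first two asset figures come from the question, "Bank C" and "Bank D" and their values are invented, the edges are arbitrary, and `max_size` is a hypothetical cap on the plotted size. Sizes are scaled by the square root of assets so that node area, rather than radius, tracks the asset value.

```r
library(igraph)

# Node table in the question's format (last two rows are made-up examples).
nodes <- data.frame(
  id     = 1:4,
  bank   = c("JP Morgan Chase", "Bank of America", "Bank C", "Bank D"),
  assets = c(16928000, 19075000, 5000000, 1200000)
)
# Arbitrary edges so that every node is connected.
links <- data.frame(from = c(1, 1, 2, 3), to = c(2, 3, 4, 4))

net <- graph_from_data_frame(d = links, vertices = nodes, directed = FALSE)

# Largest bank gets max_size; the others shrink relative to it.
max_size <- 40
V(net)$size <- max_size * sqrt(V(net)$assets / max(V(net)$assets))

plot(net, main = "Financial Intermediaries",
     vertex.label = V(net)$bank, vertex.label.color = "black")
```

Because the extra columns of `nodes` become vertex attributes, the sizes stay matched to the right bank without reading the csv a second time.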

Best approach to splitting up clusters of data

I am working on a way to split up data in a CSV file based on a timestamp.
For example, for a given object id, check each entry's date and see if it is within a given, allowed range. So if a set of rows in the table were:
OBJECT ID - Info - Date
obj1 xyz 1/1/12
obj1 xyw 1/2/12
obj1 cya 1/3/12
obj1 abc 2/1/12
...
In this example, the fourth entry is well outside of the area of time that the other entries are in. Therefore, my desired behavior is for a script to assign that entry to a new object, say 'obj2' for example, such that it is separated from data within its own cluster. Note that the dataset this will be applied to will be somewhat large, at the very least in the 10s of thousands, so I don't know if manual algorithms will be fast enough.
I'm using R for the moment to try to get this done with the pam and pamk functions (pamk is in the fpc package). This gives me a plot of the clusters (I think), but I don't know how to apply this information to the actual data.
Any thoughts or ideas on the best way to do this?
I figured out a solution using the following steps:
library(fpc) # provides pamk
# Convert the timestamps to numbers (seconds since the epoch)
data$date <- as.numeric(as.POSIXct(data$date, format="date_format_here"))
# Split the data using the object ID as the parameter
splitData <- split(data, f=data$id)
# Iterate over the split sessions, appending each row's cluster ID to its object ID using paste
for (i in seq_along(splitData)) {
  pamk.result <- pamk(splitData[[i]][dataColumnIndex])
  splitData[[i]]$id <- paste(splitData[[i]]$id,
                             pamk.result$pamobject$clustering,
                             sep="delimiter_here")
}
# Put the relabeled pieces back into one data frame
newData <- do.call(rbind, splitData)
Anyway, this is a rough outline of how I approached the problem. Maybe this will give some ideas to others down the line.
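For comparison, here is a simpler alternative that is not the pamk approach outlined above: start a new cluster whenever the gap between consecutive dates of the same object exceeds a threshold. It uses base R only and the rows from the question; the 7-day threshold `max_gap_days`, the "_" delimiter, and the month/day/year reading of the dates are assumptions made for illustration.

```r
data <- data.frame(
  id   = c("obj1", "obj1", "obj1", "obj1"),
  info = c("xyz", "xyw", "cya", "abc"),
  date = as.Date(c("1/1/12", "1/2/12", "1/3/12", "2/1/12"), format = "%m/%d/%y"),
  stringsAsFactors = FALSE
)

max_gap_days <- 7  # hypothetical threshold for "too far apart"

# Sort so that consecutive rows of one object are adjacent in time.
data <- data[order(data$id, data$date), ]

# Days since the previous entry of the same object (0 for its first entry).
gap <- ave(as.numeric(data$date), data$id, FUN = function(d) c(0, diff(d)))

# Each over-threshold gap bumps the running cluster number within the object.
data$cluster <- ave(as.integer(gap > max_gap_days), data$id, FUN = cumsum) + 1
data$new_id  <- paste(data$id, data$cluster, sep = "_")
```

Everything here is vectorized, so tens of thousands of rows should not be a problem, but the result depends entirely on the chosen threshold.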

CSV Import in Gephi

I've created my network using R from a large dataset. I've used a smaller one to test and wrote my own plotter to show how I'd like it displayed, I just can't seem to get it right....
This Image shows how my network should look. I've tried square matrices of data (36x36) and a 1x36 exported as CSV, neither of which give the result I desire.
Ignoring the bigger circles, I'd like the network displayed in the image above.
Version 1 - 1x36 - https://www.dropbox.com/s/k4a7tc0kwlfqd0l/ABC.csv
Version 2 - 36x36 - https://www.dropbox.com/s/mmu7spix076bn6e/DEF.csv
The structure is as follows. Row 1 & Column 1 - node names. All numbers decide if an edge exists or not (0 or 1).
When I try to import these files, Gephi interprets them in an unusual way.
Is there something I'm doing wrong?
Cheers
I suggest you use rgexf. It is available at
http://cran.r-project.org/web/packages/rgexf/index.html
I assume that you already have an edge list. Let me call it x.
library(rgexf)
data <- edge.list(x) # Creates two objects from your edge list: data$nodes and data$edges
g <- write.gexf(nodes=data$nodes, edges=data$edges) # Creates a graph in gexf format; node and edge attributes can be passed as extra arguments
print(g, file="mygraph.gexf") # Saves the graph
For more details, the manual is here: http://cran.r-project.org/web/packages/rgexf/rgexf.pdf
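If the data is a 0/1 square matrix like the 36x36 file in the question, it first has to become an edge list. A minimal base R sketch follows, using a made-up 3x3 matrix with node names A, B, C and assuming the network is undirected, so only the upper triangle is read. The resulting `x` is a two-column edge list of the kind the answer starts from.

```r
# Small made-up adjacency matrix: row and column names are the node names.
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Positions of the 1s above the diagonal (each undirected edge counted once).
idx <- which(adj == 1 & upper.tri(adj), arr.ind = TRUE)

# Two-column edge list: one row per edge.
x <- data.frame(source = rownames(adj)[idx[, "row"]],
                target = colnames(adj)[idx[, "col"]],
                stringsAsFactors = FALSE)
x
```

For a directed network, dropping the `upper.tri(adj)` condition keeps both directions.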

R GraphNEL: Result of adj function is a vector of length one. Access concatenated values separately?

I'm working with a graphNEL object and need to extract the adjacent nodes of
a specified node. This is solvable with adj(nodes(graph),"node123"),
however the nodes are returned as a vector of size 1. So I can't
directly access certain nodes in it.
Lets say:
> adjacent <- adj(subgraph,"hsa:991")
> adjacent
$`hsa:991`
[1] "hsa:10744" "hsa:29945" "hsa:51433" "hsa:8881"
For an algorithm I just need, let's say, "hsa:29945", but since this
vector is just of size one, I have a problem. Is this possible?
The best thing would be for every node to be recognized as an element.
Btw.: maybe somebody can explain to me why they are even only one element.
I mean, [1] "hsa:10744 hsa:29945 hsa:51433 hsa:8881" I could understand,
but why are there quotes after every node? After all, I just need to implement
a random walk on a graph. But I haven't found any packages, so I will try to
implement it myself.
Hope you can help me.
Thanks in advance.
Cheers
Rich
adj(g, index=XXX) returns a list containing the neighbours for each entry of XXX.
So, in order to extract the results for an entry of XXX, you need to access the corresponding entry in the list. This then gives you the desired result:
##a simple mock-up graph
g <- new("graphNEL", nodes=c("V1","V2","V3"), edgemode="undirected")
g <- addEdge("V1","V2",g)
g <- addEdge("V1","V3",g)
adj.res <- adj(g,"V1") #returns a list
adj.res[["V1"]] #returns a vector
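Since the question's end goal is a random walk, here is a rough base R sketch of one. It works on a plain named list of neighbour vectors, which is the same shape adj() returns; the list `adj_list` below is written out by hand to mirror the mock-up graph, and the function name `random_walk` is made up for illustration.

```r
# Neighbour lists in the same shape as the result of adj():
# a named list with one character vector per node.
adj_list <- list(V1 = c("V2", "V3"), V2 = "V1", V3 = "V1")

# Walk `steps` steps from `start`, picking a neighbour uniformly at random.
random_walk <- function(adj_list, start, steps) {
  path <- character(steps + 1)
  path[1] <- start
  for (i in seq_len(steps)) {
    nbrs <- adj_list[[path[i]]]
    if (length(nbrs) == 0) {
      # Dead end: return the part of the walk completed so far.
      return(path[seq_len(i)])
    }
    path[i + 1] <- nbrs[sample.int(length(nbrs), 1)]
  }
  path
}

set.seed(1)
walk <- random_walk(adj_list, "V1", 5)
walk
```

With a graphNEL object, the list could presumably be built once with adj(g, nodes(g)) and passed in unchanged.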
