CSV Import in Gephi (R)

I've created my network using R from a large dataset. I used a smaller one to test and wrote my own plotter to show how I'd like it displayed; I just can't seem to get it right in Gephi.
This image shows how my network should look. I've tried square matrices of data (36x36) and a 1x36 matrix exported as CSV, neither of which gives the result I desire.
Ignoring the bigger circles, I'd like the network displayed as in the image above.
Version 1 - 1x36 - https://www.dropbox.com/s/k4a7tc0kwlfqd0l/ABC.csv
Version 2 - 36x36 - https://www.dropbox.com/s/mmu7spix076bn6e/DEF.csv
The structure is as follows: row 1 and column 1 hold the node names, and the remaining numbers indicate whether an edge exists (1) or not (0).
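For illustration, a 3x3 matrix in that layout (with hypothetical node names) would be:
,A,B,C
A,0,1,0
B,1,0,1
C,0,1,0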
When I try to import these files, Gephi interprets them in an unusual way.
Is there something I'm doing wrong?
Cheers

I suggest you use rgexf. It is available at
http://cran.r-project.org/web/packages/rgexf/index.html
I assume that you have an edge list already. Let me call it x.
library(rgexf)
data <- edge.list(x)  # creates two objects from your edge list: data$nodes and data$edges
g <- write.gexf(nodes = data$nodes, edges = data$edges, ...)  # builds a graph in gexf format; here you can add node attributes, edge attributes, etc.
print(g, file = "mygraph.gexf")  # saves the graph
For more details, the manual is here: http://cran.r-project.org/web/packages/rgexf/rgexf.pdf
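If you are starting from the 36x36 adjacency matrix rather than an edge list, a minimal sketch of the full round trip might look like this (assuming DEF.csv has node names in the first row and column as described, and an undirected network; file and object names are placeholders):
library(igraph)
library(rgexf)
m <- as.matrix(read.csv("DEF.csv", row.names = 1))  # adjacency matrix with named rows/columns
g <- graph.adjacency(m, mode = "undirected")        # 0/1 entries become edges
el <- get.edgelist(g)                               # two-column edge list of node names
data <- edge.list(el)                               # rgexf node and edge tables
out <- write.gexf(nodes = data$nodes, edges = data$edges)
print(out, file = "mygraph.gexf")                   # import this .gexf into Gephi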


Do I need to create all nodes by hand in Neo4j?

I am probably missing something because I am very new to Neo4j, but looking at their Movie graph - probably the very first graph to play with when you are learning the platform - they give us a really big piece of code where every node, label, and property is entered by hand, one after the other. OK, that seems fair for a small graph for learning purposes. But how should I proceed when I want to import a CSV and create a graph from that data? I believe hand-input is not expected at all.
My data look something like this:
date,origin,destiny,value,type,balance
01-05-2021,A,B,500,transf,2500
It has more than 10 thousand rows like this.
I loaded it as:
LOAD CSV FROM "file:///MyData.csv" AS data
RETURN data;
and it worked. The data was loaded etc. But now I have some questions:
1- How do I proceed if I want origin to be one node and destiny another, with type as the edge and value as a property? I mean, I know how to create a single (a)-[]->(b) pattern, but how do I create the entire graph without creating it edge by edge, node by node, property by property?
2- Am I able to select by date and see something like a time evolution of this graph? I want to see all transactions on 20-05-2021, 01-05-2021, etc., and see how it evolves. Is that possible?
As the example in the official docs shows here: https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/#tutorial-neo4j-admin-import
You may want to create 3 separate files for the import:
First: you need movies.csv to import nodes with label :Movie
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Second: you need actors.csv to import nodes with label :Actor
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
Finally, you can import the relationships.
As you see, actors and movies are already imported, so now you just need to specify the relationships. In the example, you're importing the ACTED_IN relationships (each with a role property) in the given format:
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
So as you see in the header, you've got these values:
:START_ID - the node where the relationship starts
role - a property name (you can specify multiple properties here; just make sure the CSV contains data for them)
:END_ID - the node where the relationship ends
:TYPE - the type of the relationship
That's all :)
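The LOAD CSV route from the question also works for building the whole graph in one pass, using MERGE so repeated origins and destinies collapse into single nodes. A minimal sketch in Cypher (the Account label and TRANSACTION relationship type are assumptions, and the file's first row is assumed to hold the column names shown above):
LOAD CSV WITH HEADERS FROM "file:///MyData.csv" AS row
MERGE (o:Account {name: row.origin})
MERGE (d:Account {name: row.destiny})
CREATE (o)-[:TRANSACTION {type: row.type,
                          value: toInteger(row.value),
                          date: row.date}]->(d);
For the time-evolution question, you can then filter on the date property, e.g.:
MATCH (o)-[t:TRANSACTION {date: "01-05-2021"}]->(d)
RETURN o, t, d;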

igraph R vertex ids get changed

I have a very basic issue with igraph (in R): the renaming of node IDs.
For example, I have following graph in form of edgelist.
10,12
10,14
12,14
12,15
14,15
12,17
17,34
17,100
100,34
I want to calculate the local clustering coefficient for each node. First I read the edge list into an object g using read.csv. Then I used the following command to dump the local CC for each node.
write.csv(transitivity(g,type="local"),file="DumpLocalCC.csv")
Now the problem is that igraph renumbers the node IDs starting from 1, and I get the following output:
"","x"
"1",NA
"2",0.333333333333333
"3",0.333333333333333
"4",0.333333333333333
"5",1
"6",1
"7",1
Now how can I work out which node ID is which? That is, does 7 in the output file refer to 100 or to 34?
Is there any way to force igraph to output the actual node IDs, like 10, 34, 100, with their respective local CC?
I was googling and found people suggesting "V(g)$name <- as.character(V(g))" for preserving the node IDs. I tried it; however, I think I am not using it correctly.
Also, since the data is large, I would not like to renumber the node IDs manually to make them sequential from 1 myself.
P.S.: I noticed a similar question has been asked here, where it was suggested to "assign these numbers as vertex names".
How do I do that? Can someone give an example, please?
Another similar question (I understand it is much the same question) suggested opening an issue. I am not sure whether this has been resolved.
Thanks in advance.
You just need to combine the stats with the node names when you write the table. For example
DF <- read.csv(text="10,12
10,14
12,14
12,15
14,15
12,17
17,34
17,100
100,34", header=FALSE)
library(igraph)
g <- graph.data.frame(DF)  # vertex names are taken from the edge-list values
outdata <- data.frame(node = names(V(g)), trans = transitivity(g, type = "local"))
write.csv(outdata, file = "DumpLocalCC.csv")
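Note that transitivity(g, type = "local") returns one value per vertex in the same order as V(g), which is why pairing it with names(V(g)) lines up correctly.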

R XLConnect getting index/formula to a chunk of data using content found in first cell

Sorry if this is difficult to understand - I don't have enough karma to add a picture, so I will do the best I can to describe this! I am using the XLConnect package within R to read from and write to Excel spreadsheets.
I am working on a project in which I am trying to take columns of data out of many workbooks and concatenate them together into rows of a new workbook based on which workbook they came from (each workbook is data from a consecutive business day). The snag is that the data that I seek is only a small part (10 rows X 3 columns) of each workbook/worksheet and is not always located in the same place within the worksheet due to sloppiness on behalf of the person who originally created the spreadsheets. (e.g. I can't just start at cell A2 because the dataset that starts at A2 in one workbook might start at B12 or C3 in another workbook).
I am wondering if it is possible to search for a cell based on its contents (e.g. a cell containing the title "Table of Arb Prices") and return either the index or reference formula to be able to access that cell.
Also wondering whether, once I have referenced that cell by its contents, there is a way to adjust that reference to reach another cell whose position I know relative to it. For example, if a cell with known contents is always located 2 rows above and 3 columns to the left of the cell where I wish to start collecting data, is it possible to take that first reference and increment it by 2 rows and 3 columns to get the reference for the cell I want?
Thanks for any help and please advise me if you need further information to be able to understand my questions!
You can just read the entire worksheet in as a matrix with something like
library(XLConnect)
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
mm <- as.matrix(readWorksheetFromFile(demoExcelFile, sheet = 1))
class(mm) <- "character"  # convert all cells to character so which() can match text values
Then you can search for values and get the row/column:
which(mm=="3.435", arr.ind=T)
# row col
# [1,] 23 6
Then you can offset those and extract values from the matrix however you like. In the end, when you know where you want to read from, you can convert to a cleaner data frame with
read.table(text=apply(mm[25:27, 6:8],1,paste, collapse="\t"), sep="\t")
Hopefully that gives you a general idea of something you can try. It's hard to be more specific without knowing exactly what your input data looks like.
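To make the offset idea concrete, here is a minimal sketch (the title text, offsets, and block size come from the question and are placeholders, not tested against your files):
idx <- which(mm == "Table of Arb Prices", arr.ind = TRUE)  # locate the known title cell
start_row <- idx[1, "row"] + 2  # data starts 2 rows below the title
start_col <- idx[1, "col"] + 3  # ...and 3 columns to the right
block <- mm[start_row:(start_row + 9), start_col:(start_col + 2)]  # the 10 x 3 block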

CSV file to Histogram in R

I'm a total newbie with R, and I'm trying to create a histogram (with value and frequency as the axes) from a CSV file (just one row of values). Any idea how I can do this?
I'm also an R newbie, and I ran into the same thing. I made two separate mistakes, actually, so I'll describe them both here.
Mistake 1: Passing a frequency table to hist(). Originally I was trying to pass a frequency table to hist() instead of passing in the raw data. One way to fix this is to use the rep() ("replicate") function to explode your frequency table back into a raw dataset, as described here:
Creating a histogram using aggregated data
Simple R (histogram) from counted csv file
Instead of that, though, I just decided to read in my original dataset instead of the frequency table.
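For reference, a minimal sketch of that rep() approach (the frequencies.csv file name and the value/freq column names are hypothetical):
freqs <- read.csv("frequencies.csv")
raw <- rep(freqs$value, times = freqs$freq)  # replicate each value by its count
hist(raw)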
Mistake 2: Wrong data type. My raw data CSV file contains two columns: hostname and bookings (the idea is to count the number of bookings each host generated during a given time period). I read it into a table:
> tbl <- read.csv('bookingsdata.csv')
Then when I tried to generate a histogram off the second column, I did this:
> hist(tbl[2])
This gave me the "'x' must be numeric" error you mention in a comment: tbl[2] is a one-column data frame, not a numeric vector, so hist() cannot treat it as numeric.
This fixed it:
> hist(tbl$bookings)
You should really start by reading some basic R manuals...
CRAN offers a lot of them (look in the Manuals and Contributed sections).
In any case:
setwd("path/to/csv/file")
myvalues <- read.csv("filename.csv", header = FALSE)  # a single row of values, no header line
hist(unlist(myvalues[1, ]), breaks = 100)  # example: 100 breaks, but you can specify them at will
See the manual pages for those functions for more help (accessible through ?read.table, ?read.csv and ?hist).
To plot the histogram, the values must be numeric, i.e. the data must be of class numeric. Here the value of x seems to be of some other class.
Run the following command and see:
sapply(myvalues[1,],class)

Import Large Unusual File To R

First time poster here, so I'll try and make myself as clear as possible on the help I need. I'm fairly new to R, and this is my first real independent programming experience.
I have stock tick data for about 2.5 years; each day has its own file. The files are .txt and consist of approximately 20-30 million rows, averaging I would guess 360 MB each. I am working one file at a time for now. I don't need all the data these files contain, and I was hoping I could use some programming to cut my files down a bit.
Now my problem is that I am having some difficulty writing the proper code so R understands what I need it to do.
Let me first show you some of the data so you can get an idea of the formatting.
M977
R 64266NRE1VEW107 FI0009653869 2EURXHEL 630 1
R 64516SSA0B 80SHB SE0002798108 8SEKXSTO 40 1
R 645730BBREEW750 FR0010734145 8EURXHEL 640 1
R 64655OXS1C 900SWE SE0002800136 8SEKXSTO 40 1
R 64663OXS1P 450SWE SE0002800219 8SEKXSTO 40 1
R 64801SSIEGV LU0362355355 11EURXCSE 160 1
M978
Another snip of data:
M732
D 3547742
A 3551497B 200000 67110 02800
D 3550806
D 3547743
A 3551498S 250000 69228 09900
So as you can see, each line begins with a letter, and each letter denotes what the line means. For instance, R means an order book directory message, M means milliseconds after the last second, and H means a stock trading action message. There are 14 different letters used in total.
I have used the readLines function to import the data into R. However, R seems to take a very long time to process the data when I want to work with it.
Now I would like to write some sort of if function that says: if the first letter is R, then offsets 1 to 4 encode the Market Segment Identifier, and so on, and have R build columns from these so I can work with the data in a more structured fashion.
What is the best way of importing such data and creating some form of structure - i.e. using the unique ID information in each line to analyze one stock at a time, for instance?
You can try something like this :
options(stringsAsFactors = FALSE)
# Parse an "A" line and append it to the running table;
# split = " +" treats runs of spaces as one separator
f_A <- function(line, tab_A) {
  values <- unlist(strsplit(line, " +"))[2:5]
  rbind(tab_A, list(name_1 = as.character(values[1]), name_2 = as.numeric(values[2]),
                    name_3 = as.numeric(values[3]), name_4 = as.numeric(values[4])))
}
tab_A <- data.frame(name_1 = character(), name_2 = numeric(),
                    name_3 = numeric(), name_4 = numeric(), stringsAsFactors = FALSE)
for (i in readLines(con = "/home/data.txt")) {
  # dispatch on the first character of the line
  switch(substr(i, 1, 1), M = cat("1\n"), R = cat("2\n"), D = cat("3\n"),
         A = (tab_A <- f_A(i, tab_A)))
}
Then replace the cat() calls with functions that add values to each type of data frame. Follow the pattern of f_A() to construct the other functions, and do the same for each table structure.
You can combine your readLines() command with regular expressions. To get more information about regular expressions, look at the R help for grep():
> ?grep
So you can go through all the lines, check what each line means, and then handle or store its content however you like. (Regular expressions are also useful for splitting the data within one line.)
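A minimal sketch of that approach, following the fixed-width layout shown in the question (the file path and character offsets are placeholders):
lines <- readLines("/home/data.txt")
r_lines <- grep("^R ", lines, value = TRUE)  # keep only order book directory messages
ids <- substr(r_lines, 3, 8)                 # cut a hypothetical field out by position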
