Use neo4j with R

Is there an R library that supports Neo4j? I would like to construct an R graph (e.g., an igraph object) from Neo4j or, vice versa, store an R graph in Neo4j.
More precisely, I am looking for something similar to bulbflow for Python.
Update
There is a new Neo4j driver for R that looks promising: http://nicolewhite.github.io/RNeo4j/. I have changed the accepted answer.

This link might be helpful. I'm going to connect Neo4j with R in the following days and will try first with the provided link. Hope it helps.
I tried it out and it works well. Here is the function:
First, install and load the packages, then define the function:
install.packages('RCurl')
install.packages('RJSONIO')
library('bitops')
library('RCurl')
library('RJSONIO')
query <- function(querystring) {
  h <- basicTextGatherer()
  curlPerform(
    url = "localhost:7474/db/data/ext/CypherPlugin/graphdb/execute_query",
    postfields = paste('query', curlEscape(querystring), sep = '='),
    writefunction = h$update,
    verbose = FALSE
  )
  result <- fromJSON(h$value())
  data <- data.frame(t(sapply(result$data, unlist)))
  names(data) <- result$columns  # set column names before printing
  print(data)
  data  # return the data frame so callers can assign it
}
and this is an example of calling the function:
q <-"start a = node(50) match a-->b RETURN b"
data <- query(q)

Consider the RNeo4j driver instead. The function shown above is incomplete: it cannot return single-column data and has no NULL handling.
https://github.com/nicolewhite/RNeo4j
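For reference, a minimal sketch of what a query looks like with RNeo4j, based on the documentation at the link above; the connection URL assumes a default local Neo4j install, and the query and property names are illustrative:
library(RNeo4j)

# Connect to a local Neo4j server (default REST endpoint; adjust as needed).
graph <- startGraph("http://localhost:7474/db/data/")

# cypher() returns a data frame, handling single-column results and NULLs.
data <- cypher(graph, "MATCH (a)-->(b) RETURN b.name AS name")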

I tried the R script above (thanks a lot for providing it), and it seems that with Neo4j 2.0 you can directly use:
/db/data/cypher
instead of
/db/data/ext/CypherPlugin/graphdb/execute_query
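For illustration, a hedged adaptation of the query() helper above for that endpoint; note that /db/data/cypher takes a JSON body of the form {"query": ...} rather than form-encoded fields. The details follow the Neo4j 2.0 REST API as I understand it:
library(RCurl)
library(RJSONIO)

# Variant of query() for the /db/data/cypher endpoint (Neo4j 2.0).
query2 <- function(querystring) {
  h <- basicTextGatherer()
  curlPerform(
    url = "http://localhost:7474/db/data/cypher",
    httpheader = c("Content-Type" = "application/json"),
    postfields = toJSON(list(query = querystring)),
    writefunction = h$update
  )
  fromJSON(h$value())
}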

Not sure if it fits your requirements, but have a look at Gephi: http://gephi.org/.

Related

How to store SparkR result into an R object?

I am still new to the world of Azure Databricks, and the use of SparkR remains very obscure to me, even for very simple tasks...
It took me a very long time to find out how to count distinct values, and I'm not sure it's the right way to go:
library(SparkR)
sparkR.session()
DW <- sql("select * from db.mytable")
nb.per <- head(summarize(DW, n_distinct(DW$VAR)))
I thought I had found it, but nb.per is not a single value; it is still a data frame...
class(nb.per)
[1] "data.frame"
I tried:
nb.per <- as.numeric(head(summarize(DW, n_distinct(DW$VAR))))
It seems OK, but isn't there a better way to achieve this?
Thanks!
Since you are using Spark SQL anyway, a very simple approach is:
nb.per <- SparkR::collect(SparkR::sql("select count(distinct VAR) from db.mytable"))[[1]]
Or, using the SparkR APIs:
DW <- SparkR::tableToDF("db.mytable")
nb.per <- SparkR::collect(SparkR::agg(DW, SparkR::countDistinct(SparkR::column("VAR"))))[[1]]
The SparkR::sql function returns a SparkDataFrame.
In order to use this in R as an R data.frame, you can simply coerce it:
as.data.frame(sql("select * from db.mytable"))

Walsh-Hadamard Transform in R

I am searching for a command to compute the Walsh-Hadamard transform of an image in R, but I can't find anything. In MATLAB, fwht is used for this; it applies the Walsh-Hadamard transform to each row of a matrix. Can anyone suggest a similar way to compute the Walsh-Hadamard transform on the rows or columns of a matrix in R?
I found a package here:
http://www2.uaem.mx/r-mirror/web/packages/boolfun/boolfun.pdf
But why is this package not available when I try to install it?
Packages that are not maintained get moved to the CRAN Archive. They get put there when they aren't updated to match changing requirements or start producing errors against the changing R code base. https://cran.r-project.org/web/packages/boolfun/index.html
It's possible that you might be able to extract useful code from the archive version, despite the relatively ancient version of R that package was written under.
The R code for walshTransform calls an object code routine:
walshTransform <- function(truthTable)  # /!\ should check truthTable values are in {0,1}
{
  len <- log(length(truthTable), base = 2)
  if (len != round(len))
    stop("bad truth table length")
  res <- .Call("walshTransform",
               as.integer(truthTable),
               as.integer(len))
  res
}
Installing the package succeeded on my Mac, but would require the appropriate toolchain on whatever OS you are working in.
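If the archived package can't be built, the transform itself is small enough to write in plain R. Below is a minimal sketch of an iterative fast Walsh-Hadamard transform, assuming the input length is a power of two; the 1/n scaling mirrors MATLAB's fwht default, so drop that line for the unnormalized transform.
# Fast Walsh-Hadamard transform of a numeric vector (length must be 2^k).
fwht <- function(x) {
  n <- length(x)
  if (n < 1L || bitwAnd(n, n - 1L) != 0L) stop("length must be a power of two")
  h <- 1L
  while (h < n) {
    for (i in seq(1L, n, by = 2L * h)) {
      for (j in i:(i + h - 1L)) {
        a <- x[j]; b <- x[j + h]
        x[j]     <- a + b
        x[j + h] <- a - b
      }
    }
    h <- 2L * h
  }
  x / n  # MATLAB-style normalization; remove for the unnormalized transform
}

# Apply to each row of a matrix M, as MATLAB's fwht does:
# wht_rows <- t(apply(M, 1, fwht))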

R callback functions using sparklyr

I hope to use the mapPartitions and reduce functions of Spark (http://spark.apache.org/docs/latest/programming-guide.html) via sparklyr.
It is easy in PySpark: all I need is plain Python code, and I can simply pass Python functions as callbacks. So easy.
For example, in pyspark, I can use those two functions as follows:
mapdata = self.rdd.mapPartitions(mycbfunc1(myparam1))
res = mapdata.reduce(mycbfunc2(myparam2))
However, it seems this is not possible in R, for example with the sparklyr library. I also checked SparkR, but it seems to be just another way of querying/wrangling data in R, nothing else.
I would appreciate if someone let me know how to use those two functions in R, with R callback functions.
In SparkR you could use internal functions (hence the SparkR::: prefix) to accomplish the same:
newRdd <- SparkR:::toRDD(self)
mapdata <- SparkR:::mapPartitions(newRdd, function(x) { mycbfunc1(x, myparam1) })
res <- SparkR:::reduce(mapdata, function(x) { mycbfunc2(x, myparam2) })
I believe sparklyr interfaces only with the DataFrame / DataSet API.
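That said, later versions of sparklyr do provide spark_apply(), which runs an R closure over each partition of a Spark DataFrame and covers much of the mapPartitions use case. A hedged sketch, reusing the callback names from the question as placeholders:
library(sparklyr)

sc  <- spark_connect(master = "local")
sdf <- sdf_copy_to(sc, iris, overwrite = TRUE)

# spark_apply() calls the function once per partition, passing the
# partition as an R data frame; mycbfunc1/myparam1 are placeholders.
mapped <- spark_apply(sdf, function(df) mycbfunc1(df, myparam1))
There is no direct reduce() counterpart in sparklyr; a common pattern is to aggregate on the Spark side with dplyr verbs, or to collect() the mapped result and reduce it locally.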

Save package settings between sessions

Is there a definitive way to save options or information pertaining to a certain package between sessions?
For example, say somebody made a game and released it as an R package. If they wanted to save high scores and not have them reset each time R started a new session, what would be the best way to do this? Currently I can only think of storing a file in the user's home directory, but I'm not sure I like that approach.
This may be an approach. I created a dummy package with a dummy function (any function I create is bound to be a dummy function) and a data set I called scores that I set as follows:
scores <- NA
Then I created the package with the scores data set.
Then I used the following to change the data set from within R.
loc <- paste0(find.package("new"), "/Data")
unlink(paste0(loc, "/scores.rda"), recursive = TRUE, force = FALSE)
scores <- 10
save(scores, file=paste0(loc, "/scores.rda"))
Then, when I unloaded the library and reloaded it again, the data set says:
> scores
[1] 10
Could this be modified to do what you want? You'd have to have it save in between somehow, but I am not sure how to do this without messing with the .Last function.
EDIT:
It appears this option is not viable: when you compile as a package and use lazy loading, the data sets are saved as Rdata.rdb and Rdata.rdx, not as .rda files. That means the approach I used above is pretty much worthless here, since we want the data to be recognized automatically.
EDIT2
This approach works, and I tried it on a package I made. You can't lazy load the data, and you have to explicitly use data(scores), either directly or inside the function you're calling. I also assigned scores to .scores in the global environment the first time it was created, and used exists() inside the function to see if it exists. If .scores existed, I assigned it to scores within the function. Once you unload the library and load it again, you never have to worry about that again.
Maybe an alternative is to save this as a function somehow that can be altered using Josh's advice here: Permanently replacing a function
I guess there is no way to store settings without saving them to disk or a database, one way or another. It can be done silently, though, by putting the code below in your ~/.Rprofile. However, if you have packages that save settings in other ways than using options, you need to add them manually.
I know this is exactly what you said you did not want, but it might spark some debate at least.
.Last <- function() {
  my.options <- options()
  save(my.options, file = "~/.Roptions.Rdata")
}

.First <- function() {
  tryCatch({
    load("~/.Roptions.Rdata")
    do.call(options, my.options)
    rm(my.options)
  }, error = function(...) {})
}
To my surprise, try(..., silent=TRUE) gives a warning on startup if ~/.Roptions.Rdata does not exist, which is why I used tryCatch instead.
The modern answer to this problem is well explained at https://blog.r-hub.io/2020/03/12/user-preferences/
I think I will be trying the hoardr package! Here is an example that worked for me :)
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$mkdir()

scores <- data.frame(
  user  = c("one", "two", "three"),
  score = c(500, 200, 1100)
)

save(scores, file = file.path(x$cache_path_get(), "scores.rdata"))
x$list()
x$details()

# new session
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$list()
x$details()
load(file = file.path(x$cache_path_get(), "scores.rdata"))
PS: you can see a working example in the rnoaa package on GitHub at "ropensci/rnoaa". Check their R/onload.r file! I can expand if needed.
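For completeness, here is a minimal sketch of the base-R route in the spirit of the r-hub post above, using tools::R_user_dir() (available from R 4.0 onward); the package name and file name are placeholders:
# Persist data in the platform-appropriate per-user directory.
save_scores <- function(scores) {
  dir <- tools::R_user_dir("yourpackage", which = "data")
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(scores, file.path(dir, "scores.rds"))
}

load_scores <- function() {
  path <- file.path(tools::R_user_dir("yourpackage", which = "data"), "scores.rds")
  if (file.exists(path)) readRDS(path) else NULL
}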

Data inside a function (package creation)

If I need to use a data set (as a lookup table) inside a function in a package I'm creating, do I need to explicitly load the data set inside the function?
The function and the data set are both part of my package.
Is this the correct way to use that data set inside the function:
foo <- function(x) {
  x <- dataset_in_question
}
or is this better:
foo <- function(x) {
  x <- data(dataset_in_question)
}
or is there some approach I'm not thinking of that's correct?
There was a recent discussion about this topic (in the context of package development) on R-devel, numerous points of which are relevant to this question:
If only the options you provide are applicable to your example, R himself (i.e., Brian Ripley) tells you to do:
foo <- function(x) {
  data("dataset_in_question")
}
This approach will, however, throw a NOTE in R CMD check, which can be avoided in upcoming versions of R (or currently in R-devel) by using the globalVariables() function added by John Chambers.
The 'correct' approach (i.e., the one advocated by Brian Ripley and Peter Dalgaard) would be to use the LazyData option for your package. See this section of "Writing R Extensions".
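Concretely, that amounts to a single field in the package's DESCRIPTION file:
LazyData: true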
Btw: I do not fully understand how your first approach is supposed to work. What should x <- dataset_in_question do? Is dataset_in_question a global variable or defined previously?
For me it was necessary to use get() in addition to LazyData: true in the DESCRIPTION file (see the posting by @Henrik, point 3) to get rid of the NOTE no visible binding for global variable .... My R version is 3.2.3.
foo <- function(x) {
  get("dataset_in_question")
}
So LazyData makes dataset_in_question directly accessible (without using data("dataset_in_question", envir = environment())), and get() is there to satisfy R CMD check.
HTH
One can just place the data set as a .rda file in the R folder as described by Hadley here: http://r-pkgs.had.co.nz/data.html#data-sysdata
Matthew Jockers uses this approach in the syuzhet package for data sets including the bing data set as seen at ~line 452 here: https://github.com/mjockers/syuzhet/blob/master/R/syuzhet.R
bing is not available to the user but is to the package as demonstrated by: syuzhet:::bing
Essentially, the command devtools::use_data(..., internal = TRUE) will set everything up in the way it's needed.
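A minimal sketch of that workflow, with a placeholder lookup table (the data set name and lookup logic are illustrative):
# Run once during package development: this writes R/sysdata.rda,
# making the object visible to package functions but not exported to users.
dataset_in_question <- data.frame(key = c("a", "b"), value = c(1, 2))
devtools::use_data(dataset_in_question, internal = TRUE)

# Inside the package, functions can then reference it directly:
foo <- function(x) {
  dataset_in_question$value[match(x, dataset_in_question$key)]
}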
