How to use rdflib to query WikiData? - wikidata

I want to use rdflib to query Wikidata on my local computer, but rdflib.Graph() needs to parse the namespace first. Therefore, how can I get the Wikidata namespace to use with local rdflib code?

I think the goal was:
from rdflib import Graph
g = Graph()
g.parse('wikidata-link')
or
g.load('wikidata-link')
I haven't spent much time on it, but here are my attempts, to round out the question and maybe help find an answer.
Some of the following variants resulted in errors, ranging from 'timeout', 'not well formed (invalid token)', TypeErrors and '.. not a valid NCName ...' to missing-plugin errors when 'text/html' or '.../json' came back. I marked what worked and what didn't.
CODE SAMPLES I'VE TRIED
g.parse('https://www.wikidata.org/wiki/Special:EntityData/Q42.n3') # WORKS
g.parse('https://www.wikidata.org/wiki/Special:EntityData/Q42.json') # FAILS
g.parse('https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl') # WORKS
g.parse('https://www.wikidata.org/wiki/Special:EntityData/Q42.rdf') # FAILS
g.parse('https://www.wikidata.org/wiki/Special:EntityData/Q64') # FAILS
g.parse('https://www.wikidata.org/wiki/Q42') # FAILS
g.load('https://www.wikidata.org/wiki/Special:EntityData/Q42.n3') # FAILS
g.load('https://www.wikidata.org/wiki/Special:EntityData/Q42.json') # FAILS
g.load('https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl') # FAILS
g.load('https://www.wikidata.org/wiki/Special:EntityData/Q42.rdf') # FAILS
g.load('https://www.wikidata.org/wiki/Special:EntityData/Q42') # FAILS
g.load('https://www.wikidata.org/wiki/Q42') # FAILS
I tried these out based on Wikidata Access
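As a minimal sketch of the variants that worked above: build the .ttl export URL for an item, parse it with an explicit format, and query the local graph with SPARQL. The helper names (entity_data_url, load_entity) and the label query are my own illustration, not part of rdflib or Wikidata, and I'm assuming the Special:EntityData export keeps serving Turtle at that URL.

```python
# Template for the Turtle export that parsed successfully in the attempts above.
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.ttl"

# Hypothetical example query: English label of Q42 from the local graph.
LABEL_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?label WHERE {
  wd:Q42 rdfs:label ?label .
  FILTER(lang(?label) = "en")
}
"""


def entity_data_url(qid):
    """Return the Turtle export URL for an item ID such as 'Q42' (hypothetical helper)."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return ENTITY_DATA.format(qid=qid)


def load_entity(qid):
    """Download one item's Turtle export and parse it into a new rdflib Graph."""
    from rdflib import Graph  # imported here so the URL helper works without rdflib

    g = Graph()
    # Passing the format explicitly avoids relying on rdflib guessing it.
    g.parse(entity_data_url(qid), format="turtle")
    return g
```

With network access, g = load_entity("Q42") followed by for row in g.query(LABEL_QUERY): print(row.label) should list the English label, assuming the export contains rdfs:label triples for the item.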
VERSIONS USED
RDFLib 6.1.1
Python 3.10.1
Last additional thoughts
You could also query Wikidata via its SPARQL endpoint and build your rdflib graph from the result.
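A rough sketch of that idea, untested against the live service: send a CONSTRUCT query to the public query service, ask for Turtle, and parse the response into a Graph. The endpoint URL (https://query.wikidata.org/sparql), the Accept: text/turtle negotiation, the User-Agent string and the function names are assumptions on my side, so check them against the Wikidata Query Service documentation before relying on this.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_construct_request(qid, limit=100):
    """Build a GET request whose CONSTRUCT query copies an item's direct triples.

    Hypothetical helper; qid is an item ID such as 'Q42'.
    """
    query = (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        f"CONSTRUCT {{ wd:{qid} ?p ?o }} WHERE {{ wd:{qid} ?p ?o }} LIMIT {int(limit)}"
    )
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query})
    headers = {
        "Accept": "text/turtle",  # assumption: the endpoint can return Turtle
        "User-Agent": "rdflib-wikidata-example/0.1",  # placeholder, use your own
    }
    return Request(url, headers=headers)


def fetch_graph(qid, limit=100):
    """Run the CONSTRUCT query and parse the returned Turtle into an rdflib Graph."""
    from rdflib import Graph  # imported here so the request builder works without rdflib

    with urlopen(build_construct_request(qid, limit), timeout=30) as response:
        turtle = response.read().decode("utf-8")
    g = Graph()
    g.parse(data=turtle, format="turtle")
    return g
```

Calling fetch_graph("Q42") would then give you a local graph that you can query with g.query(...) without touching the endpoint again.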

Related

Creating a parameter in neo4j through R driver

I am trying to generate a graph using the neo4r R driver. I have no problems performing standard queries such as
"MATCH (n:Node {nodeName: 'A Name'}) RETURN COUNT(n)" %>% call_neo4j(con)
However when I try to create a parameter with the following query
":params {Testnode: {testNodeName: 'Node Name'}}" %>% call_neo4j(con)
I get the following syntax error
$error_code
[1] "Neo.ClientError.Statement.SyntaxError"
$error_message
[1] "Invalid input ':': expected <init> (line 1, column 1 (offset: 0))\n\":params {Testnode: {testNodeName: 'Node Name'}}\"\n ^"
The parameter query works fine when I run it directly in the neo4j browser so I do not understand how there is a syntax error?
Any ideas on how to fix this would be gratefully accepted!
:params only works in the Neo4j Browser, it's not really Cypher.
Worse, the R Neo4j driver doesn't seem to support passing parameters. There's an open GitHub issue that points to a fork containing the relevant changes, but that fork also has other changes that make it deviate from the main driver.
I'd try the fork to see if it gets you anywhere. If it does, either create the relevant PR against the project or maintain a local fork that tracks the main driver and contains only the parameter change.

Is an rJava object exportable in future (package for asynchronous computing in R)?

I'm trying to speed up my R code using the future package with the multicore plan on Linux. Inside the future I create a Java object and try to pass it to .jcall(), but I get a null value for the Java object in the future. Could anyone please help me resolve this? Below is sample code:
library("future")
plan(multicore)
library(rJava)
.jinit()
# preprocess is a user defined function
my_value <- preprocess(a = value){
# some preprocessing task here
# time consuming statistical analysis here
return(lreturn) # return a list of 3 components
}
obj=.jnew("java.custom.class")
f <- future({
.jcall(obj, "V", "CustomJavaMethod", my_value)
})
Basically I'm dealing with large streaming data. In the code above I send a string of streaming data to a user-defined function for statistical analysis, which returns a list of 3 components. I then want to send this list to a custom Java class [ java.custom.class ] for further processing using a custom Java method [ CustomJavaMethod ].
Without future my code runs fine, but I receive 12 streaming records per minute, and at that rate my code slows down and I observe a delay in processing.
Currently I'm using Unix with 16 cores. With the future package my processing finishes quickly. I have traced back through my code, and something goes wrong in .jcall.
I hope this clarifies my problem.
(Author of the future package here:)
Unfortunately, there are certain types of objects in R that cannot be sent to another R process for further processing. To clarify, this is a limitation of those types of objects, not of the parallel framework used (here the future framework). The simplest example of such an object may be a file connection, e.g. con <- file("my-local-file.txt", open = "wb"). I've documented some examples in Section 'Non-exportable objects' of the 'Common Issues with Solutions' vignette (https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html).
As mentioned in the vignette, you can set an option (*) such that the future framework looks for these types of objects and gives an informative error before attempting to launch the future ("early stopping"). Here is your example with this check activated:
library("future")
plan(multisession)
## Assert that global objects can be sent back and forth between
## the main R process and background R processes ("workers")
options(future.globals.onReference = "error")
library("rJava")
.jinit()
end <- .jnew("java/lang/String", " World!")
f <- future({
start <- .jnew("java/lang/String", "Hello")
.jcall(start, "Ljava/lang/String;", "concat", end)
})
# Error in FALSE :
# Detected a non-exportable reference ('externalptr') in one of the
# globals ('end' of class 'jobjRef') used in the future expression
So, yes, your example actually works when using plan(multicore). The reason is that 'multicore' uses forked processes (available on Unix and macOS but not Windows). However, I would try my best not to limit your software to parallelizing only on "forkable" systems; if you can find an alternative approach, I would aim for that. That way your code will also work on, say, a huge cloud cluster.
(*) The reason for these checks not being enabled by default is (a) it's still in beta testing, and (b) it comes with overhead because we basically need to scan for non-supported objects among all the globals. Whether these checks will be enabled by default in the future or not, will be discussed over at https://github.com/HenrikBengtsson/future.
The code in the question calls an unknown method Method1, my_value is undefined, and so on, so it is hard to know what you are really trying to achieve.
Take a look at the following example, maybe you can get inspiration from it:
library(future)
plan(multicore)
library(rJava)
.jinit()
end = .jnew("java/lang/String", " World!")
f <- future({
start = .jnew("java/lang/String", "Hello")
.jcall(start, "Ljava/lang/String;", "concat", end)
})
value(f)
[1] "Hello World!"

Getting Internal Server Error when trying to use Gviz's IdeogramTrack

I posted this on Bioconductor's support page but didn't get any answers, hence trying here.
I am using the IdeogramTrack function of the R/Bioconductor package Gviz from my institution's cluster:
IdeogramTrack(genome="mm10",chromosome="chr1")
When I try this from the master node it works fine, but when I try it from any other node in the cluster, all of which do their I/O through the master node, it hangs and eventually I get the error message:
Error: Internal Server Error
I am able to reach the UCSC site or any other UCSC mirror from these nodes (checked using traceroute http://genome.ucsc.edu), and can successfully download data from other repositories such as Ensembl (e.g., using getBM).
Any idea what's wrong?
BTW, any idea which port is IdeogramTrack trying to use?
It sounds like your institution's cluster has an issue fetching annotation data from UCSC through Gviz. One suggestion is to see whether you can manually download the mm9 annotation from UCSC, by chromosome. Alternatively, you may use a Bioconductor annotation package.
Once you have a data.frame with chromosome and chromosomal position information (e.g. mapInfo), you can use GenomicRanges::makeGRangesFromDataFrame to convert the mm9 annotation to a GRanges object, which lets you build your own IdeogramTrack object. Details on making a custom IdeogramTrack are in the Gviz documentation.
In general, here is the workflow:
library(GenomicRanges)
library(Gviz)
mm9_annot <- read.table(<file or url with annotation>)
mm9_granges <- makeGRangesFromDataFrame(mm9_annot)
# Alternatively, you may use rtracklayer package
# mm9_granges <- rtracklayer::import(<file or url with annotation>)
my_ideo <- IdeogramTrack(genome="mm9_custom", bands=mm9_granges)
Hope this helps.

Reading SDMX in R - parse error?

I've been trying to develop a shiny app in R with INEGI (Mexican statistics agency) data through their recently launched SDMX service. I went as far as contacting the developers themselves, and they gave me the following, unworkable, code:
require(devtools)
require(RSQLite)
require(rsdmx)
require(RCurl)
url <- paste("http://www.snieg.mx/opendata/NSIRestService/Data/ALL,DF_PIB_PB2008,ALL/ALL/INEGI");
sdmxObj <- readSDMX(url)
df_pib <- as.data.frame(sdmxObj)
Which brings me to the following errors:
sdmxObj <- readSDMX(url)
Opening and ending tag mismatch: ad line 1 and Name
Opening and ending tag mismatch: b3 line 1 and Name
Opening and ending tag mismatch: b3 line 1 and Department
Opening and ending tag mismatch: c3 line 1 and Contact
Opening and ending tag mismatch: a1 line 1 and Sender
Opening and ending tag mismatch: c3 line 1 and Header
Opening and ending tag mismatch: b3 line 1 and GenericData
... etc, you get the point.
I tried another URL (maybe the first was too broad, bringing in every GDP measurement), but I get the same result:
url<-"http://www.snieg.mx/opendata/NSIRestService/Data/ALL,DF_PIB_PB2008,ALL/.MX.........C05.......0101/INEGI?format=compact"
If I download the file directly with my browser I seem to be getting useful structures.
Any ideas? Does this look like a faulty definition at the source or an issue with the "rsdmx" package? If the latter, has anyone found a way to parse similar structures correctly?
The code you pasted above, using rsdmx, works perfectly fine. The issue you had came from your workplace firewall, as you correctly figured out.
You only need to load the rsdmx package (the other packages do not need to be explicitly loaded)
require(rsdmx)
and do this code:
url <- paste("http://www.snieg.mx/opendata/NSIRestService/Data/ALL,DF_PIB_PB2008,ALL/ALL/INEGI");
sdmxObj <- readSDMX(url)
df_pib <- as.data.frame(sdmxObj)
I've checked for any potential issue related to this data source and found none. Staying strictly within the scope of your post, your code is fine.
That said, if you find a bug in rsdmx, you can submit a ticket directly at https://github.com/opensdmx/rsdmx/issues. Prompt feedback is provided to users. You can also send suggestions or feature requests there or to the rsdmx mailing list.
You could try RJSDMX.
To download all the time series of the DF_PIB_PB2008 dataflow you just need to hit:
library(RJSDMX)
result = getSDMX('INEGI', 'DF_PIB_PB2008/.................')
or equivalently:
result = getSDMX('INEGI', 'DF_PIB_PB2008/ALL')
If you need time series as a result, you're done. Otherwise, if you prefer a data.frame, you can get it by calling:
dfresult = sdmxdf(result, meta=TRUE)
You can find more information about the package and its configuration in the project wiki.

How to work with an expected return type of "java.lang.Class"?

I thought I'd try to write an R interface to Scribe (a mature OAuth library for Java by Pablo Fernandez) as a way of refreshing my Java (not used it in 8 years), learning rJava, and making better use of the Twitter API. But mostly because it's Friday afternoon and I thought it'd be fun. :)
Unfortunately I haven't got very far...
I downloaded the .jar file for scribe and also commons-codec (its only dependency, which I subsequently unzipped). I've run the code in Java using NetBeans and it works fine with his Twitter example.
I was OK for the first few lines of code, just following the rJava documentation:
# load R packages
library(rJava)
# Initialise
.jinit()
# Add class paths
d1 <- "C:/Users/Tony/Documents/R/java/scribe-1.1.0.jar"
d2 <- "C:/Users/Tony/Documents/R/java/commons-codec-1.4/"
.jaddClassPath(path=c(d1, d2))
But then scribe quick start guide says the following is needed:
// Java Code
OAuthService service = new ServiceBuilder()
.provider(TwitterApi.class)
.apiKey("6icbcAXyZx67r8uTAUM5Qw")
.apiSecret("SCCAdUUc6LXxiazxH3N0QfpNUvlUy84mZ2XZKiv39s")
.build();
I can't figure out how to rewrite that into rJava parlance. A little web searching suggests I should do it in parts, so first I did:
# Create object (back to R code again)
( service <- .jnew("org.scribe.builder.ServiceBuilder") )
[1] "Java-Object{org.scribe.builder.ServiceBuilder#58fe64b9}"
# Set up apiKey and apiSecret using "$" shortcut
service$apiKey("6icbcAXyZx67r8uTAUM5Qw")
service$apiSecret("SCCAdUUc6LXxiazxH3N0QfpNUvlUy84mZ2XZKiv39s")
Good so far. Then I need to figure out what return type is expected from the provider function:
# Inspect return type
.jmethods(service, "provider")
[1] "public org.scribe.builder.ServiceBuilder org.scribe.builder.ServiceBuilder.provider(java.lang.Class)"
It needs "java.lang.Class". This is where I get confused. What does that mean? I guess, looking at the source, it needs a return type of type "ServiceBuilder", but how to do that? This was my best guess after looking at ?.jcall (note: 'use.true.class = TRUE' didn't do anything):
> .jcall(obj = service, returnSig = "Lorg.scribe.builder.ServiceBuilder;", method = "org.scribe.builder.ServiceBuilder.provider", "org.scribe.builder.api.TwitterApi")
Error in .jcall(obj = service, returnSig = "Lorg.scribe.builder.ServiceBuilder;", :
method org.scribe.builder.ServiceBuilder.provider with signature (Ljava/lang/String;)Lorg.scribe.builder.ServiceBuilder; not found
Any ideas?
It looks to me like the provider method returns a ServiceBuilder and takes a Class as its parameter.
In Java, writing the class name followed by .class produces a class literal in the code. If you instead load the class using reflection, you can refer to a class by its string name. I'm not sure how this works in R, but in Java the syntax is:
Class c = Class.forName("org.scribe.builder.api.TwitterApi");
This puts the class instance into the variable c. Then you could call the provider method:
service$provider(c);
