What is a VAD package in Virtuoso?

I googled quite a lot but beside installation instructions I could not find an answer to my simple question:
What is a Virtuoso Application Distribution (VAD) package, and what is it useful for?
I am mostly interested in DBpedia: having already loaded the DBpedia data, what is the added value of installing dbpedia_dav.vad?
Many thanks.

Product documentation is a useful resource to start with --
6.1.4. VAD - Virtuoso Application Distribution
VAD provides a package distribution framework for installation, management, dependency checking, and un-installation of Virtuoso applications. A VAD package contains all required Virtuoso components, which would constitute an application or hosted solution, within a single distributable file. A VAD package cannot contain any system parts independent of Virtuoso, thus excludes operating system executables, shared objects, installers, and settings.
To your specific question --
The DBpedia VAD includes various things, like the "green page" template, which are not part of the DBpedia data set nor the base Virtuoso install.
For the future --
DBpedia-specific questions are often better raised to the DBpedia Discussion list.
Virtuoso-specific questions are often better raised in Virtuoso-specific arenas, such as the Virtuoso Users mailing list, the public OpenLink Support Forums, or a confidential OpenLink Support Case.
ObDisclaimer: I work for OpenLink Software, producer of Virtuoso, supporter of DBpedia.


GRPC Services: Central Proto Repository or Distributed

We plan to keep a central proto repository that holds all proto definitions and their generated code. Messages as well as service definitions would live in this central Git repo, and we plan to drive our API design standard from it.
But any service that wants to use this to expose a server or generate clients would have to import the generated code (.pb.go) from this repo.
Do you see any issue with this approach? Or do you see keeping service proto files individually in the service repos as a better alternative?
PS: Starter in the GRPC journey of building microservices. Still learning the right way to structure and distribute code here.
This question occurs regularly and I suspect the fact that there's no published guidance is because the answer depends on your needs more than the technology's.
The specific issue of many vs. one is not dissimilar to whether you prefer to use a monorepo, and only you can effectively determine that. One way to decide is to understand how many shared dependencies your services will have, now and in the future. Another is to estimate how many repos you'll have (how complex would it be to manage 10s or 100s of repos?).
In my experience, it's a good practice to keep the protos distinct (i.e. separate repo) from code that uses them. Not only may you want to version protos independently from implementations (across languages) but the implementations themselves are independent; in one use-case I must clone a repo containing an entire system (written mostly in one language) in order to get its protos to generate bindings in another language. In this case, it would be preferable if the repo were limited to just the protos.
You could look to examples for guidance. The gRPC repo keeps a bunch of stuff rooted on the grpc package in addition to math. Although less broad, Google bundles its well-known types under google.protobuf.

Advice needed for R package security in production

I am working as a Data Scientist for a small start up and we are using R as part of our platform for analysis, dashboards etc. Therefore, I need to ensure that we maintain security with each package we use and load.
I have looked around and done extensive searching and have come across the following links:
This is the official R Studio Blog Security update page.
This blog post shows how you can implement rJava to help with those packages that require it, though it does state that '...the integrity & safety of the R package ecosystem is still in the “trust me, everything’s 👍!!”'
This post gives some good advice for package security, but basically boils down to: if you get it from CRAN or another trusted source then it should be ok.
The CVE site lists vulnerabilities, though the last one was back in 2017.
However, all the above links essentially say the same thing, which is "if it's from CRAN (or similar), then it is probably fine". Now this might indeed be the case, but I was hoping for something a bit more rigorous. Has anyone else come across this issue with production R deployment?
If someone could direct me to where I might find more information on checking for security updates, breaches and changes for R packages, or on how to go about testing the security myself, I would be very grateful.
Thanks!

How to keep abreast of known bugs and bug fixes in R packages?

Is there a standard R community resource for keeping up to date on known bugs or bug fixes for packages? My current approach is rather manual. (NB: I'm restricting this to CRAN - see Note 1.)
My use case is basically bug surveillance and the management of package updates. I've been averaging a couple of bug discoveries each month for a while (which I duly report to the authors ;-)). Since a lot of my work is done with virtual machines, I tend to update the VM images when I have a good handle on the bug status for necessary packages. When a bunch of bugs are fixed, I can remove my workarounds, which is great, and I update the images. When I discover an outbreak of bugs, I don't create a new image.
Here are the sources I'm currently using:
NEWS files: Many, but not all, packages have NEWS files. These are certainly a helpful place to start.
Package home page: Some packages do not have a NEWS file on CRAN, but separately post a change log on the author's site.
R project-hosted mailing lists
Google Groups for packages
Personal communication with package authors
Bug tracking for packages (e.g. a developer may use Bugzilla)
It's one thing to be the first to discover a bug (I grant that bugs happen to all of us), it's another to belatedly discover a bug that is either already known or, better yet, already fixed. Both slow down my own coding, but better bug surveillance (maybe we need a cdc4R package :)) would significantly reduce the impact. Without a standard update alerting system (e.g. an extension to update.packages() that reports on which packages could be updated and links to info on what's changed), it's the user's job to seek out this information.
As such a user, trying to seek out this information, is there some standard resource I've overlooked in the list above? For instance, is there an R mailing list where it's common for developers to post their changes and bug fixes? Or is there a site that aggregates such posts, posts tests (CRAN posts R CMD CHECK output, it seems), or that gives some other feedback?
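The "which packages could be updated" half of that wish can be roughly sketched in base R. This is only an illustration, not an existing alerting system: summarise_outdated is a hypothetical helper name, and it simply reshapes the matrix that old.packages() returns (columns "Package", "Installed", "ReposVer"). It does not link to change information.

```r
# Hypothetical helper: turn the matrix returned by old.packages()
# into a data frame of package name, installed version, and repository version.
summarise_outdated <- function(outdated) {
  if (is.null(outdated)) {
    # old.packages() returns NULL when nothing is out of date
    return(data.frame(package = character(0),
                      installed = character(0),
                      available = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(
    package   = unname(outdated[, "Package"]),
    installed = unname(outdated[, "Installed"]),
    available = unname(outdated[, "ReposVer"]),
    stringsAsFactors = FALSE
  )
}

# Usage (needs network access; the mirror URL is an assumption):
# summarise_outdated(old.packages(repos = "https://cloud.r-project.org"))
```

For packages that are already installed and ship a NEWS file, utils::news(package = "<pkgname>") shows the installed version's change log, which still leaves the new version's changes to be looked up by hand.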
A few additional notes on other resources, for others' benefit:
I see that CRANberries has a terse diff summary on packages, which is new to me. (I am inspired to consider a grep for bug or fix in the diff output.)
bug.report() in R is a good way to send a message to R Core or the email address of a package maintainer.
Several testing packages worth consideration are: testthat, RUnit, and svUnit.
My personal "quick test" is to simply use digest to verify that results match, without having to test equality of very large objects.
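A minimal sketch of that digest-based "quick test", assuming the CRAN package digest is installed. The helper name same_result is hypothetical. The comparison is strict: digest() hashes the serialized object, so two objects that print alike but differ in type (integer vs. double, say) will not match.

```r
# Hypothetical helper: compare two R objects by hash instead of element by element.
# Requires the CRAN package 'digest' (not part of base R).
same_result <- function(a, b) {
  digest::digest(a) == digest::digest(b)
}

# Usage: compare a fresh result against a stored reference object
# reference <- readRDS("reference_result.rds")   # hypothetical file name
# same_result(reference, new_result)
```

Storing only the hash of a known-good result, rather than the result itself, keeps the reference small when the objects are very large.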
Note 1: I'm tagging this cran because it's impossible to manage the universe of all R packages. For an individual package author, one can distribute a package wherever they'd like, use whatever mailing list or bug tracking system they like, etc. However, that's outside the "mainstream" for R. Were I to release a package and alert users to changes, bugs, bugfixes, I'd go with CRAN + NEWS + Bugzilla + Google Groups + R-Forge (and/or RForge), etc., but is there another standard reporting mechanism that is missing from this list?
In some sense, this note also serves to ask if there's a mechanism that developers are encouraged to use. I suspect there is no standard, as packages by R Core members seem to do many different things regarding bug and change reporting.
Note 2: I'm also adding administration (though something else may be more apropos), since this also relates to administering R. For reproducibility, administration of packages is quite important; when there are multiple users or more moving pieces, keeping aware of bugs and fixes becomes an administrative task, as well as an important consideration for development that depends on the external packages. If another tag, e.g. system-administration is more appropriate, I'm open to a change.
Not a complete answer but here are some thoughts.
In the case of data.table we track bugs (and feature requests) on R-Forge here. I imagine you could query R-Forge's tracker (programmatically) for all packages hosted there, so that is one more source to add to your list. That web tracker is where bug.report(package="data.table") points to (not just an email address to maintainer).
Also, anyone can subscribe to any <pkgname>-commits@lists.r-forge.r-project.org mailing list to receive a unified diff and commit message (at the time of commit) for each project on R-Forge. I'm not aware of a general mailing list spanning any commit to any R-Forge project, though.
At the top of ?data.table there is a link to up to the minute NEWS. This is how we communicate to users what is in the latest version (and in development) if they upgrade. That link updates in real-time; i.e., "up to the minute" is meant literally. But, they do have to check there!

Document/Scripts management for R code

I am looking for a solution that allows me to keep a track of a multitude of R scripts that I create for various projects and purposes. Some scripts are easily tracked to specific projects, whereas others are "convenience" functions created to serve a set of tasks.
Is there a way I can create a central DB and query it to find which scripts match most appropriately?
I could create a system using a DBMS manually, but are users aware of anything in general or specific to R, that comes in the form of a software tool (maybe FOSS) ?
EDIT: Thank you for the responses. My current system is just a set of scripts with comments that allow me to identify their intended task. Though I use StatET with SVN, I would like a search utility along the lines of the "sos" package.
The question
I am looking for a solution that allows me to keep a track of a multitude of R scripts
that I create for various projects and purposes. Some scripts are easily tracked to specific
projects, whereas others are "convenience" functions created to serve a set of tasks.
fails to address the obvious follow-up of why the existing mechanism is not suitable:
1. Create a local package for each project
2. Create one or more local packages for local utility functions
3. Use R's already existing mechanisms for searching, indexing, testing, cross-referencing
4. And use any revision control system of your liking, local or on the web, to host the code for 1. to 3. above.
Reinventing an RDBMS schema for 1. to 3. is just wrong in my book. But if you must, go ahead and replicate what you can already (mostly) get for free in tested and widely used code.
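As a rough illustration of the "local package" steps, base R's utils::package.skeleton() can generate the directory layout from existing functions. Everything named here is hypothetical: the function trim_ws, the package name myutils, and the temporary directory standing in for a real project location.

```r
# Hypothetical convenience function that should live in a local utility package
trim_ws <- function(x) gsub("^\\s+|\\s+$", "", x)

# Collect the objects to package in their own environment
pkg_env <- new.env()
assign("trim_ws", trim_ws, envir = pkg_env)

# Use a temporary directory here; a real project would use a versioned folder
pkg_parent <- tempfile("pkgs")
dir.create(pkg_parent)

# Writes DESCRIPTION, NAMESPACE, R/ and man/ skeletons for the package
utils::package.skeleton(name = "myutils", list = "trim_ws",
                        environment = pkg_env, path = pkg_parent)

list.files(file.path(pkg_parent, "myutils"))
```

The generated man/ stubs still need to be filled in by hand, and it is that documentation which R's search tools index once the package is installed.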
R comes with several mechanisms for searching for help, most of which naturally use CRAN. Some examples: the sos package, cranberries, crantastic, and rseek. In many cases, these could be adapted to use a local repository (you can find out how to create a local repository in the R manual, which is very easy to do). Otherwise, if you package your scripts and submit them to CRAN, you will naturally have these available to you. I would also highly recommend this presentation on the subject: Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories from Spencer Graves and Sundar Dorai-Raj.
These would require you to put your code in packages and create documentation, both of which are worth doing anyway. The package documentation turns out to be very useful both for documenting what things do and for helping you find them in the future. You can use roxygen to create this documentation in-line with your code. Also read this related question: Organizing R Source Code.
Alternatively, the help.search() function can be very useful for searching local packages, regardless of whether you have a repository set up.
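A small sketch of such a search, restricted to one installed package. The package "stats" is used only because it ships with R; a local utility package would be substituted once installed, and the search phrase is an arbitrary example.

```r
# Search the help index of a single installed package.
# "stats" stands in for a local package name here.
res <- help.search("linear model", package = "stats")

# The result is an object of class "hsearch"; the hits are in res$matches
class(res)
head(res$matches)
```

By default help.search() looks at aliases, concepts and titles, so the quality of the hits depends on how well each function's help page is written.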
You'd probably be best working with a version control system. Many can be indexed and be made search-able. At my work, a stack of R, Eclipse, StatET, Subversion and Subclipse works very well for us.

Oracle Coherence License Issue

Are there any restrictions for using coherence.jar without any license?
coherence.jar is open for downloading without any fee.
You can use it for development purposes. Any other purpose means purchasing a license. On the download page is a link to the license agreement that states:
You may not:
- use the programs for your own internal data processing or for any commercial or production purposes, or use the programs for any purpose except the development of your application;
- use the application you develop with the programs for any internal data processing or commercial or production purposes without securing an appropriate license from us;
- continue to develop your application after you have used it for any internal data processing, commercial or production purpose without securing an appropriate license from us, or an Oracle reseller;
- remove or modify any program markings or any notice of our proprietary rights;
- make the programs available in any manner to any third party;
- use the programs to provide third party training;
- assign this agreement or give or transfer the programs or an interest in them to another individual or entity;
- cause or permit reverse engineering (unless required by law for interoperability), disassembly or decompilation of the programs;
- disclose results of any program benchmark tests without our prior consent.
The first two points are the most relevant.
On the Coherence download page it says you need to agree to the Oracle Technology Network (OTN) License Agreement to download the software.
That license contains this text:
We grant you a nonexclusive, nontransferable limited license to use the programs only for the purpose of developing, testing, prototyping and demonstrating your application, and not for any other purpose. If you use the application you develop under this license for any internal data processing or for any commercial or production purposes, or you want to use the programs for any purpose other than as permitted under this agreement, you must obtain a production release version of the program by contacting us or an Oracle reseller to obtain the appropriate license.
So it's a free download only for development purposes. (Most Oracle Products are available free to developers.)
But if you want to use this code in production or in a product you're selling you will need a license.
Have you considered using Infinispan as an open source alternative to Coherence?
Don't forget that the version that you download from the public website is usually just the major release. The minor release, with all the many bug fixes, is only available if you have a support contract.
