Any alternatives to Virtuoso as a graph store? [closed]

I like it (very much) that it supports SPARQL/Update and the SPARQL endpoint that comes with it, but:
I'm a little worried about vendor lock-in
I think it is overkill for my requirements (I want a graph store with half a billion triples)
I would love to use an open-source and free product instead
So far I couldn't find any decent and comparable products (commercial or otherwise). They pretty much look immature or experimental to me.
Ideas?

What you might be looking for is http://4store.org/ and you might also try searching for questions very like this over on http://www.semanticoverflow.com/ (link is defunct)

Two others besides 4store, which dajobe has already mentioned, are Dydra and the Talis Platform. Vendor lock-in should not, in general, be a problem if you stick to language features specified in the SPARQL standards.

Having used a lot of different triple stores as storage layers in my research project, I would recommend the following two:
4store - already mentioned by dajobe; it is very good and has frequent releases to fix bugs and add new features as SPARQL 1.1 continues to be standardised. It also has the benefit of being totally free.
AllegroGraph - free for up to 50 million triples, though it tends to be quite a RAM hog even at relatively low numbers of triples (e.g. it used around 3 GB of my 4 GB of RAM when I had about 1.5m triples). Actual memory usage will vary with usage - in my case I was running an app that meant my entire dataset had to be loaded into memory. I haven't used version 4, so I can't say whether they have improved this.
While Virtuoso is very good at some things, it has a seriously bad case of feature creep and a lot of non-standard/proprietary features which, as you imply, might lead to vendor lock-in.
Like Ian says, stick to using the core language features in the SPARQL standards and you can easily move to a different triple store as your needs change. When developing your application, try to design it to be storage agnostic so you can plug in a different storage layer as you need to. How easy this is will depend on your programming environment/language/API, but doing it will be beneficial in the long run.
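To make the storage-agnostic idea a bit more concrete, here is a minimal Python sketch (assuming the SPARQLWrapper library; the endpoint URLs are placeholders, not recommendations). Because it only speaks standard SPARQL 1.1 over HTTP, switching between 4store, Virtuoso, AllegroGraph or any other compliant store is just a matter of changing the endpoint URL:

```python
# Minimal sketch of a storage-agnostic triple store client.
# Assumes the SPARQLWrapper library (pip install SPARQLWrapper); the endpoint
# URLs below are placeholders -- swap in whichever store you deploy.
from SPARQLWrapper import SPARQLWrapper, JSON, POST


class TripleStore:
    """Thin wrapper that only uses standard SPARQL 1.1 over HTTP."""

    def __init__(self, query_endpoint, update_endpoint=None):
        self.query_endpoint = query_endpoint
        self.update_endpoint = update_endpoint or query_endpoint

    def select(self, query):
        client = SPARQLWrapper(self.query_endpoint)
        client.setQuery(query)
        client.setReturnFormat(JSON)
        return client.query().convert()["results"]["bindings"]

    def update(self, statement):
        client = SPARQLWrapper(self.update_endpoint)
        client.setQuery(statement)
        client.setMethod(POST)
        client.query()


# Swapping stores is just a matter of changing the endpoint URL.
store = TripleStore("http://localhost:8890/sparql")   # e.g. a local Virtuoso
# store = TripleStore("http://localhost:8080/sparql") # e.g. a local 4store

for row in store.select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"):
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```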

We have positive experience with Bigdata. 4Store (as mentioned above) is also good, but does not have support for transactions.

I'm a little worried about vendor lock-in
OpenLink Software (my employer) works very hard to implement open standards and specifications where they exist and are sufficient. We add extensions, and document that we've done so, when necessary -- as with the aggregate and other analytics functions which were not part of SPARQL 1.0, but are part of SPARQL 1.1 and/or will be part of SPARQL 2.0.
If you stick with the published standards, you won't be locked in. If you need the extensions, we think we're not so much locking you in as enabling and empowering you... but your mileage may vary.
I think it is overkill for my requirements (I want a graph store with half a billion triples)
By all means, consider all the functionality you need when making your decision. But it seems likely to me that you'll be doing more than storing your triples. Queries, reasoning, query optimization, Federated SPARQL (joins against other remote SPARQL endpoints, formerly known as SPARQL-FED), and other functionality may not be so much overkill as simply not-yet-needed.
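As a small aside, Federated SPARQL needs nothing vendor-specific; the SERVICE keyword below is plain SPARQL 1.1 syntax. The DBpedia endpoint and FOAF terms are only illustrative, and the query could be sent through any standards-compliant client (for instance the wrapper sketched earlier):

```python
# Illustrative SPARQL 1.1 federated query: the local pattern is joined against
# a remote endpoint via SERVICE. Endpoint and vocabulary are examples only.
FEDERATED_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?homepage WHERE {
  ?person foaf:name ?name .                # matched in the local store
  SERVICE <http://dbpedia.org/sparql> {    # joined against a remote endpoint
    ?person foaf:homepage ?homepage .
  }
}
LIMIT 10
"""
```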
It's worth noting that Virtuoso can be run in a minimized form (LiteMode=1) which disables many of the features perceived as "overkill" and makes it much more like an embedded DBMS -- but still hybrid at the core. When Lite mode is on:
Web services are not initialized, i.e., no web server, DAV, SOAP, POP3, etc.
replication is stopped
PL debugging is disabled
plugins are disabled
Bonjour/Rendezvous is disabled
tables relevant to the above are not created
the number of index tree maps is set to 8 if no other setting is given
memory reserve is not allocated
DisableTcpSocket setting is treated as 1, regardless of value in INI file
I would love to use an open-source and free product instead
Virtuoso has two flavors -- commercial (VCE), and open source (VOS). Commercial includes shared-nothing elastic clustering which brings linear scalability, SPARQL GEO indexing and querying, result transformation to CXML for exploration with PivotViewer, and other features which VOS lacks ... but use the one that makes sense to you.

Related

Difference of safety-critical SW development [closed]

When developing safety-critical software under quality standards (e.g. IEC 61508 or DO-178C), developers have to care about many things. I know that the verification in each development step is quite time consuming and expensive. Moreover, I know that restricted subsets of programming languages are used.
But I am interested in the concrete differences from a "normal" SW development process. I mean, in the standard V-Model, verification and testing should also be part of each development step. What do I have to consider when finding requirements? What do I have to consider in SW design?
It isn't so much a change in the "V Model" that helps verify a critical system; it's what you do at each step of the way.
For example, you may prefer to plan your development using waterfall in order to have verification steps and controlled transition periods. This has the benefit of staying in line with any government regulations that may be in place.
While developing, it is common to use a limited subset of assemblies (APIs) in order to prevent developers from performing dangerous operations. This type of restriction can also ensure that developers use the APIs correctly, such as requiring that objects be cleaned up.
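As a toy illustration of that kind of API restriction (not a qualified tool -- real projects rely on MISRA-style checkers and proper static analysers), a build step could flag any call outside an approved subset; the banned-call list here is purely hypothetical:

```python
# Hedged sketch: a crude build-step check that flags calls outside an approved
# API subset. The banned-call list is hypothetical; adapt to your coding standard.
import re
import sys

BANNED_CALLS = {"malloc", "free", "longjmp", "system"}  # example C APIs
PATTERN = re.compile(r"\b(" + "|".join(sorted(BANNED_CALLS)) + r")\s*\(")


def scan(path):
    """Return (path, line number, text) for each banned call found in a file."""
    violations = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            if PATTERN.search(line):
                violations.append((path, lineno, line.strip()))
    return violations


if __name__ == "__main__":
    found = [v for p in sys.argv[1:] for v in scan(p)]
    for path, lineno, text in found:
        print(f"{path}:{lineno}: banned call: {text}")
    sys.exit(1 if found else 0)
```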
Once the product has been developed you'll likely have gone through all of the testing phases. It is common in industry to develop test fixtures to verify the system and to generate data that proves to the government or customers that it does what it says.
In general, this topic is very deep. You did mention standards; one more is the ISO 2008 standard. I think what you should keep in mind is that the process doesn't change much (the life cycle model stays generally the same), but what you do at each step of the model will change depending on the project. You can take classes on project management... In fact it is a track and sometimes a full degree program, so there's tons to learn about process and how to manage different projects.
Googling safety-critical projects and project management will likely turn up a trove of knowledge.
Hope that helps shed some light on the subject.
EDIT: Finding requirements, as in a waterfall process, is very time consuming. It will involve understanding the customer's needs and goals, of course. In general you have to spend lots of time in this area for regulatory reasons and for the software architecture. It's not really a different technique... Be explicit; understanding the requirements is most critical. "The system shall recover from 90 second timeouts within 5 seconds of resetting." <- it's like all other requirements in SW engineering: explicit and testable, objective not subjective. Think grammar-pedant levels of scrutiny.
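To make that last point concrete, here is a hedged Python sketch of how such an explicit, testable requirement can map onto an automated check; the System class is purely hypothetical, and a real safety-critical harness would of course be qualified and traceable back to the requirement:

```python
# Hypothetical sketch: turning the requirement "the system shall recover from
# 90 second timeouts within 5 seconds of resetting" into an automated test.
# The System class is a stand-in for the real unit under test.
import time


class System:
    def __init__(self):
        self._operational = True

    def simulate_timeout(self, seconds):
        # In a real harness this would drive the actual fault condition.
        self._operational = False

    def reset(self):
        self._operational = True

    def is_operational(self):
        return self._operational


def test_recovers_within_5s_of_reset_after_90s_timeout():
    sut = System()
    sut.simulate_timeout(90)
    start = time.monotonic()
    sut.reset()
    while not sut.is_operational():
        assert time.monotonic() - start < 5.0, "recovery took longer than 5 seconds"
        time.sleep(0.1)


if __name__ == "__main__":
    test_recovers_within_5s_of_reset_after_90s_timeout()
    print("requirement check passed")
```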
One example of a safety-critical system is Lockheed's F-35... The system requirements manuals are huge, and the process to make a change requires meetings and quite a bit of paperwork.

Anybody tried Neo4j vs Titan - pros and cons [closed]

Can anybody please provide or point out to a good comparison between Neo4j and Titan?
One thing I can see is in terms of scale - Titan scales out and requires an underlying scalable datastore like Cassandra, while Neo4j is only for HA and has its own embedded database. Any other pros and cons? Any specific use cases? (Is Titan being used anywhere currently?)
I also have the following link: http://architects.dzone.com/articles/16-graph-databases-compared which gives an objective comparison of graph databases but not much on the pros and cons between Neo4j and Titan.
We have a social graph to which we add almost 1 million nodes a day and twice as many edges. We started with Neo4j because, yes, it is very fast due to the fact that its storage is on the same machine the graph engine runs on. But the following are the experiences we would like to share with you about Neo4j.
Not a good fit for real-time queries. We have a social structure like Twitter: we have to show the latest 20 activities (and their associated activities) of all the users that a user follows on his timeline.
We have some users who follow more than 1000 users. The Gremlin query that we wrote for this (if you are interested we can share it) produced so much GC that a server with 8 CPUs and 48 GB of RAM used to freeze, and we had to restart the server to get it online again. (A sketch of this kind of timeline query appears at the end of this answer.)
Network partitions were observed many times.
There is no vertex-centric index, which is very much needed in a graph database.
Ultimately we were so fed up with the server performance of Gremlin queries that we had to change the database to Titan.
On Titan we are getting reasonable performance, and scaling is also very easy as we are using Cassandra as the backend storage. But mind you: using Gremlin here is also not a good idea, as multiget queries are very ugly to write, and without multiget the queries become very slow.
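For reference, here is a rough Cypher sketch of that timeline query, wrapped in Python with the official Neo4j driver (which post-dates this discussion). The data model -- (:User)-[:FOLLOWS]->(:User) and (:User)-[:POSTED]->(:Activity) -- and the connection details are assumptions, and whether it avoids the GC problems described above still depends on follow fan-out:

```python
# Hedged sketch only: hypothetical data model
#   (:User)-[:FOLLOWS]->(:User), (:User)-[:POSTED]->(:Activity {ts, body})
# using the official neo4j Python driver (pip install neo4j).
from neo4j import GraphDatabase

TIMELINE_QUERY = """
MATCH (me:User {id: $user_id})-[:FOLLOWS]->(friend:User)-[:POSTED]->(a:Activity)
RETURN friend.id AS author, a.body AS body, a.ts AS ts
ORDER BY a.ts DESC
LIMIT 20
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))


def latest_activities(user_id):
    # Returns the 20 most recent activities posted by users that user_id follows.
    with driver.session() as session:
        return [record.data() for record in session.run(TIMELINE_QUERY, user_id=user_id)]
```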
Great to see you exploring graph databases. I will speak to the Neo4j part of your question:
More than 30 of the Global 2000 now use Neo4j in production for a wide range of use cases, many of them surprising, even to us! (And we invented the property graph!)
A partial list of customers can be found below:
www.neotechnology.com/customers
Neo4j has been in 24x7 production for 10 years, and while the product has of course evolved significantly since then, it's built on a very solid foundation.
Most of the companies moving to graph databases -- speaking for Neo4j, which is what I know about -- are doing so because either a) their RDBMSs weren't able to handle the scope & scale of their connected query requirements, and/or b) they want the immense convenience and speed that comes from modeling domains that are a graph (social, network & data center management, fraud, portfolios, identity, etc.) as a graph, not as tables.
For kicks, you can find a number of customer talks here, from the four (soon five) GraphConnect conferences that were held this year in major cities around the world:
http://watch.neo4j.org/
If you're in London, the last one will be held next week:
http://www.graphconnect.com
You'll find a summary below of some of the technology behind Neo4j, with some customer examples. To speak very directly to your question about scaling: Neo4j has a unique architecture designed to optimize query response time & query predictability, by allowing horizontal scale-out in such a way that each instance can access the graph without having to hop over the network. (Need more read throughput? Just add instances.) It turns out that this approach works well for 95+% of the graphs out there, including some production customers who have more than half of the Facebook social graph running in a single Neo4j cluster, backing an "always on" 24x7 web site.
www.neotechnology.com/neo4j-scales-for-the-enterprise/
One of the world's largest postal delivery services does all of their real-time package routing with Neo4j. Railroads are building routing systems on Neo4j. Some of the world's largest customers are using them for HR and data governance, alternate-path routing, network & data center management, real-time fraud detection, bioinformatics, etc.
Neo4j's Cypher query language is the only declarative query language built expressly for property graphs. It takes all of the lessons learned from our 13-year-old native Java API (which was the basis for Blueprints, which some of the other graph databases have since adopted) and rolls them into a next-generation language. Cypher is a great way to learn graphs and to develop applications; and there's always the native Java API if you have special needs or value "bare metal" performance (i.e. sub-millisecond vs. single-digit millisecond) above convenience. Neo4j is built from the ground up to support graphs, and has a storage engine built to store graphs; unlike some of the more recent additions to the graph database ecosystem, which are architected as graph libraries on top of non-graph databases and are subject to the inherent limitations of that approach. (e.g. FlockDB, because it is based on MySQL, will still be very slow for anything greater than one hop.)
Definitely feel free to contact the Neo team if you need anything more specific. We'll be more than happy to help you! http://info.neotechnology.com/ContactUs.html
Good luck!

DNN vs Composite C1 - Pros and Cons [closed]

I've spent a few hours already studying some CMS solutions for one of my customers' new business... In the end, I've taken into consideration these two: DotNetNuke and Composite C1.
I know both of these have a lot of features, a lot of capabilities, etc.
What I would like to know:
Is there any .NET guy here who has worked with both of these? (If yes, can you please share your opinion - pros and cons?)
If for some reason I do need to add some new ASP.NET code (for some custom things), which of these two is better for that?
You should definitely go with Composite C1. I have been using DotNetNuke for many years and have been very frustrated by its limitations.
Below I have summed up what I think is best/worst about both CMS products (note: it's been two years since I set up a DNN site, so I do not know if there has been any improvement):
PROS for DNN
VERY simple administration interface.
It is possible to edit pages directly by simply clicking on content when you are signed in.
Huge user/developer base - many extension modules available. Many of them for no charge.
CONS for DNN
The simplicity of DNN is basically the root of all the cons:
Hard to extend - you have to develop packages for DNN with your specialized code or generate user controls in .NET (.ascx) files.
Difficult to skin.
A lot of overhead related to WebForms AJAX files (100+ KB without compression).
No built-in nice URLs (but it is easy to find extensions enabling them).
Relies on an SQL database (= additional costs).
PROS for C1
Easy installation and setup.
Can run on file system only (and easily be upgraded to SQL support).
Built-in package manager with easy-to-install extensions.
Support for MVC, XSLT, Webforms and the best: Razor syntax!
Specialized code can be developed easily due to great API.
Great templating support. Can be made very simple and very advanced depending on your skills.
Nice URLs in the 3.0 releases.
Easy to set up multi-site support.
Supports Windows Azure out of box.
CONS for C1
Based on XML for data storage (can cause problems if the server shuts down unexpectedly, etc. - I have not experienced this).
The many features in the backend can be a little overwhelming.
The backend takes quite a while to load because it is built like a web app.
Not as large a user base as DNN.
Bottom line: Composite C1 is far better than DotNetNuke, especially if you want to add custom code/functionality.
If you want even more functionality and a more mature CMS than Composite C1, you should take a look at Umbraco. It is open source just like C1 and, of course, developed by great Danes ;)
I have quite some years of experience with DotNetNuke.
Avoid it at all costs. I'm serious.
Edit: Since I'm being asked about my reasons for this bold statement, I'll try to provide them.
The company I worked for had around 300 clients on DNN. Many of them were rather large corporations. I have a lot of experience with DNN.
First of all, DNN is riddled with bugs. Bugs that are never fixed. Instead, the guys behind DNN seem to be concerned more about introducing new features than providing stability. I've personally submitted a boatload of bug reports to their tracker. How many were fixed? Virtually zero. In most cases I even took the time to provide a patch! To no avail. When they made the switch to C#, they simply closed most of the open issues because their laziness began to bite them in their ass. "In order to better manage and assess issues for fixing, any issue that has not had any activity logged previous to January 1st, 2011, will be auto-closed. " (See here)
I've been bitten by so many bugs in those years. It was a rather frustrating and unsatisfying experience.
Secondly, new features are usually problematic and faulty. DNN Corp also often decides that these new features are not that important and subsequently abandons them for newer features. For instance, their taxonomy module has some serious issues when you set up multiple portals and try to use system-wide vocabularies. To my knowledge this hasn't been fixed yet. Their MetaData / ContentItem / ContentType API had serious problems for a long time and probably still has. It's not even really used for anything, even though it could alleviate some of the problems that I describe further down (architecture).
Thirdly, their documentation just sucks.
More importantly, I think DNN's architecture is rather outdated. It carries a lot of old baggage. Their tab / module approach makes it very hard to create structured content and make relationships between content items. As soon as you try to create complex web sites, it falls flat on its nose. Not just from a programmer's perspective, but also from the perspective of a content administrator.
The overall impression is that these people are a) not very good programmers and b) don't know what they are doing.
I have worked with both CMS systems (and many others) and I would recommend you to use Composite C1.
In my experience it is much easier to learn and much faster to be productive in Composite C1. The UI is much better (prettier and easier to understand). They have lots of good resources on their website. In my view, the most powerful feature of Composite is that you don't have to bother with the data layer - you just create your datatypes as classes or in the GUI, and an 'ORM' makes the whole thing happen. That is, if you even need datatypes (chances are you don't if it's a simple website).
They're both free and open source. DotNetNuke has a lot more modules that you can buy from third-party developers, but Composite C1 still has a lot of what you need.
It's easier to develop new modules in Composite just because of the whole 'ORM' concept.
The only downside I can think of when comparing these two systems is that extranet functionality (logged-in users that are not admins) is built into DotNetNuke. For Composite, this is a module that you have to buy or develop yourself.
Composite C1 is the best out there for custom functionality and high flexibility. I've been using it for a few weeks and my site will finally be up and running because of it. Short learning curve (speaking for myself) but still very robust and flexible. Love it because you can use Razor with it.
DotNetNuke is still using WebForms, and even though it has a huge user base, it is somewhat outdated.

Is there a MDSD/MDA success story for a real world application? [closed]

I am currently facing a situation where I as an advocate of test driven development have to compete with an advocate of model driven software development (MDSD) / model driven architecture (MDA).
In my opinion, code generation is a valuable tool in my toolbox and I make heavy use of templates and automation when needed. I also create UML diagrams when I think this helps to understand the inner workings or to discuss architecture at the whiteboard. However, I strongly doubt that creating software via UML (creating statecharts and sequence diagrams to produce working code, not only skeletons of code) is more efficient for multi-tier applications (database layer, business/domain layer and a GUI, maybe even distributed). It seems to me that when it comes to MDSD, the CASE tooling suddenly isn't just a tool anymore but the thing to satisfy: as I see it, MDSD developers profit from the higher abstraction UML gives them, but at the same time they struggle with modifying the code generator/template/engine to fulfill needs that might be easily implemented (and tested) with another tool out of their toolbox (Visual Studio, Eclipse, ...).
All this makes me wonder if there has been a success story (success being that the product was rolled out on time, within budget and with only a few bugs, and parts of the software have been reused later on) for a real-world application which fulfills these criteria and has been developed using a strict model-driven approach:
it has nothing to do with the Object Management Group (OMG) or with consultants related to MDSD/MDA/SOA
the application is not related to Business Process Modelling and is not a CASE tool itself
the application is actively used by end users
it has at least three tiers, including a user interface which goes beyond displaying raw table values and is not one of the common MDA/MDSD examples ("how to model a coffee machine, traffic light, dishwasher").
A tiny, but nevertheless useful testimonial on the use of MDSD has been posted on the Model Driven Software Network:
http://www.modeldrivensoftware.net/profiles/blogs/viva-mdd-follow-up-building-a?xg_source=activity
It is a relatively small app being developed, but still a good example of MDSD in action.
More success stories are listed at Metacase's site (http://www.metacase.com/cases/index.html). Metacase sells MetaEdit+, which implements DSM (Domain-Specific Modeling). DSM is just a form of MDSD.
I am also developing ABSE (Atom-Based Software Engineering), another form of MDSD, very close to DSM. ABSE is outlined at http://www.abse.info.
I used MDA and code generation on an embedded system project using 4 processors connected via CAN. We had over 20 axes of motion and many, many sensors. The system was highly robust and maintainable as the mechanical components were evaluated and modified.
We worked in the models and generated code so the models were always up-to-date. We did a careful domain analysis to achieve subject matter isolation. The motor control required very high performance and so was not modeled or generated. Our network drivers were also hand-coded, and we wrote interfaces that allowed bridge services to send events to any service anywhere in the system as needed (although this was tightly controlled so as to minimize interprocessor dependencies).
Using the method took a bit of discipline, but having working models was great because they can be reviewed by non-software types.
Version control and differencing of the models was a bit of a challenge but we had a small, localized team so we were able to avoid merge issues.
The good people at Pathfinder Solutions (our tool vendor) can help mentor you through the project.
You could also take a look at the slides from previous Code Generation conferences. Several of these talks were from successful case studies e.g. http://www.codegeneration.net/cg2009/slides.php
I am working on a legacy modernization project using an MDA tool named Bluage. It's for a big healthcare organization and it's in production, so I would say it's successful. MDA is a good fit for legacy modernization, as it can generate a KDM model from technologies like Pacbase which are going out of support.
I worked on an MDSD system that generated admin-style web apps in Google Closure. I believe that your question is compelling. Too much complexity and your MDSD system is too hard to use; too simple and you won't generate apps that are useful in the real world. Where MDSD really shines is in saving developers from typing lots of plumbing-style code, but how can MDSD remain effective over multiple releases? Requirements can go in many directions. That is the real challenge. I recently blogged about my MDSD lessons learned on that project.

What are some key concepts for effective development teams? [closed]

Where I work we've recently put together what we call the Development Standards Committee which is tasked with improving our procedures, processes, methodologies, tools, standards, and whatever we think would help us become a more effective team.
We've got a spreadsheet of items that we've ranked and are going to start tackling from the top down. We've got things such as better source control (currently on SourceSafe), implementing a bug tracker (such as Mantis or FogBugz), peer code review, moving to .NET 3.5, possibly moving to some form of Agile, doing more actual team development rather than single-developer-per-project type stuff, and some other things...
What do you think are some key things that can make or break a development team? What should we add to this list?
Some additional information: we have about 12 people on our Windows team, and about fifty in development if you include all platforms. We want to improve as much as possible for everyone, but our biggest focus is the Windows team. All of us have been here for a couple of years at least, so most of us know each other and work together pretty well.
The number of people on your team is actually really important here. There are basic things that every team should implement (source code control, bug tracking, etc.), but there are things that differ based on team size. Code reviews on a very small team, for instance, can be more informal.
Moving to Agile is a good idea, unless your particular development environment makes it a bad idea. Also, you'll not be able to do this without support from the people who are using your software.
Consider doing things to ensure that communication within the team is easier and has fewer roadblocks - do all your members know each other pretty well? Can you work with each other? Do you understand each other's idiosyncrasies? Learning to work as a team is much more important than any random process improvements you can make.
Require comments when you check in code (it's great if you can tie commits back to your bug tracker - see the hook sketch after this list)
Maybe static code analysis, like what's built into Visual Studio
Continuous Integration, like CruiseControl
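To illustrate tying check-ins to the bug tracker (first bullet above), here is a hedged sketch of a Git commit-msg hook in Python; it assumes you move to Git, and the issue-ID pattern is hypothetical -- adapt it to your Mantis or FogBugz conventions:

```python
#!/usr/bin/env python
# Minimal sketch of a Git commit-msg hook that rejects commits whose message
# does not reference a bug-tracker issue (e.g. "MANTIS-123" or "FB-456").
# The ID pattern is an assumption; adjust it to your tracker's convention.
import re
import sys

ISSUE_PATTERN = re.compile(r"\b(MANTIS|FB)-\d+\b")


def main(message_file):
    with open(message_file) as fh:
        message = fh.read()
    if not ISSUE_PATTERN.search(message):
        sys.stderr.write("Commit rejected: message must reference an issue, e.g. MANTIS-123\n")
        return 1
    return 0


if __name__ == "__main__":
    # Git passes the path of the commit message file as the first argument.
    sys.exit(main(sys.argv[1]))
```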
Development teams really need good people to start with who work well together, but this isn't really an item to add to the list. It does, however, affect my first recommendation: be pragmatic. If you're not encouraging your developers to think about how they work and to drive themselves to improve, it's really hard to lay down a development environment that will do it for them.
Mentoring and training: if you can't do XP, then at least pair up your juniors with seniors whenever you can. Not only will you share knowledge but you'll share the context around the projects you own.
Some sort of Continuous Integration and regular, tested, working "releases" work wonders for quality.
as better source control (currently on SourceSafe)
If this is Visual SourceSafe -- you need to change it immediately. Try CVS, SVN, or even something paid like Perforce.
There exists something called Rational Unified Process that deals with your problem (and much more).
