Best UI tool/language to query MarkLogic data - xquery

We will be moving from Oracle to MarkLogic 8 as our datastore, and will be using MarkLogic's Java API to work with the data.
I am looking for a UI tool (like SQL Developer for Oracle) that can be used with MarkLogic. I found that MarkLogic's Query Manager can be used for accessing data, but I see multiple options with respect to language:
SQL
SPARQL
XQuery
JavaScript
We need to perform CRUD operations and search the data, and our testing team is familiar with SQL (from Oracle), so I am unsure which route to follow and on what basis to decide which one or two are worth exploring. We will most likely use the JSON document type.
Any help/suggestions would be helpful.

You already mention you will be using the MarkLogic Java Client API. That should cover most of the common needs you could have, including search, CRUD, facets, and lexicon values, and also custom functionality through REST extensions, since the Client API is layered on the MarkLogic REST API. It saves you from having to write code inside MarkLogic to a large extent.
Apart from that, you can run ad hoc commands from the Query Console using any of the above-mentioned languages. SQL will require the presence of a so-called SQL view (see also your earlier question Using SQL in Query Manager in MarkLogic). SPARQL will require enabling the triple index and ingesting RDF data.
That leaves XQuery and JavaScript, which have pretty much identical expressive power and performance. If you are unfamiliar with XQuery and XML languages in general, JavaScript might be more appealing.
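If it helps to see what the Client API is doing on the wire, here is a rough sketch of JSON document CRUD and a simple search against that underlying REST API. It uses .NET's HttpClient purely for illustration; the host, port, credentials and document URI are placeholders, and in practice the Java Client API gives you document and query manager classes so you would not write these calls by hand.

// Rough sketch of the MarkLogic REST API that the Client API wraps.
// Host, port, credentials and the document URI below are placeholders.
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class MarkLogicRestSketch
{
    static async Task Main()
    {
        var handler = new HttpClientHandler
        {
            // MarkLogic's REST app server challenges with digest auth by default.
            Credentials = new NetworkCredential("rest-user", "rest-password")
        };
        using var http = new HttpClient(handler) { BaseAddress = new Uri("http://localhost:8000") };

        // Create/update a JSON document at a chosen URI.
        var json = "{ \"customer\": { \"id\": 1, \"name\": \"Acme\" } }";
        await http.PutAsync("/v1/documents?uri=/customers/1.json",
            new StringContent(json, Encoding.UTF8, "application/json"));

        // Read it back.
        var doc = await http.GetStringAsync("/v1/documents?uri=/customers/1.json");

        // Simple string search across documents.
        var results = await http.GetStringAsync("/v1/search?q=Acme&format=json");

        // Delete it.
        await http.DeleteAsync("/v1/documents?uri=/customers/1.json");

        Console.WriteLine(doc);
        Console.WriteLine(results);
    }
}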
HTH!

Related

Changing the database from SQL Server to PostgreSQL in an ASP.NET web application?

I am currently using SQL Server as the database and Dapper (a micro-ORM) to map relations to model classes. I have been using Dapper's multiple-result reads, so I can fetch several tables in one call to the database. But due to certain circumstances, I have to change my database from SQL Server to PostgreSQL. In PostgreSQL, I could not find an equivalent facility for such multiple reads.
Is there any way to handle the below situation in Postgresql?
using (var multi = conn.QueryMultiple(query, param, commandType: CommandType.StoredProcedure))
{
    obj.InventoryItemOrder = multi.Read<InventoryItemOrder>()
        .FirstOrDefault(); // single object, hence FirstOrDefault
    obj.InventoryItemDataModel = multi.Read<InventoryItemDataModel>(); // list
}
Can I use this concept when dealing with PostgreSQL and Dapper in building an ASP.NET application?
I'm not familiar with Dapper, but from a quick look at the docs, it appears that QueryMultiple basically runs multiple statements in the same command (that is, separated by semicolons) and then maps the results of each statement.
You can certainly do the first part of that with Postgres: combine multiple statements into one command separated by semicolons, and they will all be run. However, I am not aware of anything within Postgres itself that will automatically return the results of the statements separately. Depending on exactly how you are interfacing with Postgres and getting the results, you could perhaps create a custom method of mapping these results.
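For example, with Npgsql (the usual .NET driver for Postgres) you can send both statements in one command and walk the result sets yourself with NextResult(). This is only a rough sketch under that assumption; the table and column names are placeholders:

// Sketch: two statements in one command, results read back with NextResult()
// and mapped by hand. Table and column names are placeholders.
using System.Collections.Generic;
using Npgsql;

public static class MultiStatementSketch
{
    public static (int? Quantity, List<string> Names) Load(string connectionString, int itemId)
    {
        using var conn = new NpgsqlConnection(connectionString);
        conn.Open();

        // Two statements in one command, separated by a semicolon.
        using var cmd = new NpgsqlCommand(
            "select quantity from inventory_item_order where item_id = @id; " +
            "select name from inventory_item_data where item_id = @id",
            conn);
        cmd.Parameters.AddWithValue("id", itemId);

        using var reader = cmd.ExecuteReader();

        // First result set: at most one row.
        int? quantity = reader.Read() ? reader.GetInt32(0) : (int?)null;

        // Move on to the second result set and map it by hand.
        reader.NextResult();
        var names = new List<string>();
        while (reader.Read())
            names.Add(reader.GetString(0));

        return (quantity, names);
    }
}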
From a purely database perspective, you don't gain much, if any, performance advantage from the behavior described in QueryMultiple as long as you issue the separate queries on the same connection.
It's the connection startup and teardown that is expensive, and even Dapper would have to issue and map the results for each query, so there is no performance benefit there. Rather, it's essentially a matter of syntactic sugar.
It's not clear from a quick look at the doc if Dapper is compatible with Postgres. If it is, and its QueryMultiple call is supported there, then it's probably handling the mapping of the multiple statements within the ORM itself.
If Dapper does not support Postgres, however, I would recommend simply issuing the queries separately and handling their results separately.
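If Dapper does work over Npgsql's connection (it targets IDbConnection in general), a minimal sketch of that separate-queries approach could look like the following; the connection string, table names and the two model classes are placeholders standing in for the ones in your question:

// Sketch: run the two queries separately on one open connection and map each result.
// The connection string, table names and these model classes are placeholders.
using System.Collections.Generic;
using System.Linq;
using Dapper;
using Npgsql;

public class InventoryItemOrder { public int ItemId { get; set; } public int Quantity { get; set; } }
public class InventoryItemDataModel { public int ItemId { get; set; } public string Name { get; set; } }

public class InventoryResult
{
    public InventoryItemOrder InventoryItemOrder { get; set; }
    public List<InventoryItemDataModel> InventoryItemDataModel { get; set; }
}

public static class InventoryQueries
{
    public static InventoryResult Load(string connectionString, int itemId)
    {
        using var conn = new NpgsqlConnection(connectionString);
        conn.Open();

        return new InventoryResult
        {
            // Single row mapped to one object (hence QueryFirstOrDefault).
            InventoryItemOrder = conn.QueryFirstOrDefault<InventoryItemOrder>(
                "select item_id as ItemId, quantity as Quantity from inventory_item_order where item_id = @itemId",
                new { itemId }),

            // List result, fetched as its own query on the same open connection.
            InventoryItemDataModel = conn.Query<InventoryItemDataModel>(
                "select item_id as ItemId, name as Name from inventory_item_data where item_id = @itemId",
                new { itemId }).ToList()
        };
    }
}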

Making a FHIR server using C# and SQL Server (RDBMS)

I am excited about FHIR's promise. I have spent the last couple of days getting my head around the subject.
We have an existing SQL Server database containing health-related records. We are trying to communicate using FHIR-compliant messages.
Sending data: Based on the specification at http://hl7.org/fhir/, and using the object model from https://www.nuget.org/packages/Hl7.Fhir.DSTU2, I can transform my relational data into Hl7.Fhir.Model data. Then it's a matter of serializing that data to either JSON or XML.
Consuming data: We can have the incoming data mapped to Hl7.Fhir.Model. But I find it difficult to map extensions (i.e. anything that is not a direct property) to our columns. Is there any way I can do this easily?
Is SQL Server not a good choice for building a FHIR server? Do I have to consider using MongoDB / DocumentDB?
You can add tables to support extensions directly, if you want. Of course, you would not know those extensions internally or make use of their content, but it would be just like using Mongo, etc.
But you do not have to round-trip extensions. Many FHIR implementations are exactly what you describe: a FHIR facade over an existing schema, usually a relational database. They support the specific extensions they have decided to support, by building them into their schema (or the columns already existed).
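As a rough sketch of what mapping a column to an extension looks like with the Hl7.Fhir.DSTU2 object model (the extension URL and the "eye colour" column are made up for illustration; substitute the extensions you actually decide to support):

// Sketch: mapping a relational column to/from a FHIR extension with Hl7.Fhir.Model.
// The extension URL and the column value are illustrative placeholders.
using System.Linq;
using Hl7.Fhir.Model;

public static class PatientExtensionMapping
{
    // Hypothetical extension URL for a column we want to carry on Patient.
    private const string EyeColourUrl = "http://example.org/fhir/StructureDefinition/eye-colour";

    // Relational column value -> extension on the outgoing resource.
    public static Patient ToFhir(string eyeColourColumn)
    {
        var patient = new Patient();
        patient.Extension.Add(new Extension
        {
            Url = EyeColourUrl,
            Value = new FhirString(eyeColourColumn)
        });
        return patient;
    }

    // Incoming resource -> column value, looked up by extension URL.
    public static string EyeColourFromFhir(Patient patient)
    {
        var ext = patient.Extension.FirstOrDefault(e => e.Url == EyeColourUrl);
        return (ext?.Value as FhirString)?.Value;
    }
}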

Querying namespaces using Dataflow's DatastoreIO

Is it possible to query entities in a specific namespace when using Dataflow's DatastoreIO?
As of today, unfortunately no - DatastoreIO does not support reading from entities in namespaces due to limitations of the Datastore QuerySplitter API which is used to read results of a query in parallel. We are tracking the issue internally and your feedback is valuable for prioritizing it.
If the number of entities your pipeline reads from Datastore is small enough (or the rest of the processing heavy enough) that reading them sequentially (but processing in parallel) would be OK, you can try the workaround suggested in Google Cloud Dataflow User-Defined MySQL Source.
You can also try exporting your data to BigQuery and processing it there, using BigQuery's querying capabilities or Dataflow's BigQueryIO connectors - those have no parallelism limitations.

Which is faster to transmit: XML or DataTables?

I would like to know which is faster. Let me give you the scenario: I'm on a LAN and have a report to build using data from a SQL Server database (let's say 2005, if the version matters), and I have these ways of getting the report done:
1) Have a web service on the server, where the data is taken from the database and serialized into XML. The client uses this XML as the source for a report that is built on the client machine. The client would be a Windows Forms app.
2) From the client side, connect to the database using ADO.NET, get a DataTable and use it as the source for the report built on the client.
3) The same as (2) but using a DataReader.
Also, is there a better way to do this?
The serialization to XML is going to cost in terms of the time it takes to do it, the overhead of the XML structure, and the time to deserialize. It will, however, provide a format that is consumable by more technologies. If you are using .NET end-to-end, and that isn't likely to change, I would not use XML but would use the framework-provided data access methods. Personally, I would probably use LINQ over DataTables or a DataReader, but that is more for ease of use and readability on the client side than for any performance advantage.
The best practice is to not use .NET-specific types in the interface of a web service. Even if you are certain today that your service will never be called by anything other than a .NET program, things change, and tomorrow you may be told that the service will be called by a Perl program.
Perl programs don't understand DataSet. Nor do Java programs, nor anything other than .NET.
The best practice is to create a Data Transfer Object containing just the data you need to transfer, in simple properties with primitive types, or collections or arrays of primitive types, or collections or arrays of Data Transfer Objects, etc. These will be understandable by any client.
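As a sketch of that last point, the DTO and an ASMX-style web method returning a list of them (instead of a DataSet) might look like this; the report columns, view name and connection string are invented for illustration:

// Sketch: a plain data transfer object for one report row, plus a web method
// that returns a list of them. Column names, view and connection string are placeholders.
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Web.Services;

// Primitive-typed properties only, so any client technology can consume it.
public class ReportRowDto
{
    public int OrderId { get; set; }
    public string CustomerName { get; set; }
    public DateTime OrderDate { get; set; }
    public decimal Total { get; set; }
}

public class ReportService : WebService
{
    // Placeholder connection string; in practice this comes from configuration.
    private const string ConnectionString = "Server=.;Database=Reports;Integrated Security=true";

    [WebMethod]
    public List<ReportRowDto> GetReportRows(DateTime from, DateTime to)
    {
        var rows = new List<ReportRowDto>();
        using (var conn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand(
            "SELECT OrderId, CustomerName, OrderDate, Total " +
            "FROM dbo.ReportView WHERE OrderDate BETWEEN @from AND @to", conn))
        {
            cmd.Parameters.AddWithValue("@from", from);
            cmd.Parameters.AddWithValue("@to", to);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    rows.Add(new ReportRowDto
                    {
                        OrderId = reader.GetInt32(0),
                        CustomerName = reader.GetString(1),
                        OrderDate = reader.GetDateTime(2),
                        Total = reader.GetDecimal(3)
                    });
                }
            }
        }
        return rows;
    }
}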

Best way to keyword search Amazon SimpleDB using EC2 and Asp.Net?

I am wondering if anyone has any thoughts on the best way to perform keyword searches on Amazon SimpleDB from an EC2 Asp.Net application.
A couple options I am considering are:
1) Add keywords to a multi-value attribute and search with a query like:
select id from keywordTable where keyword ='firstword' intersection keyword='secondword' intersection keyword = 'thirdword'
Amazon Query Example
2) Create a webservice frontend to Katta:
Katta on EC2
3) A queued Lucene.Net update service that periodically pushes the Lucene index to the cloud. (to get around the 'locking' issue)
Load balance Lucene(StackOverflow post)
Lucene on S3 (blog post)
If you are looking for a strictly SimpleDB solution (as the question is stated), Katta and Lucene won't help you. If you are looking for merely an 'Amazon infrastructure' based solution, then any of the choices will work.
All three options differ in terms of how much setup and management you'll have to do and deciding which is best depends on your actual requirements.
SimpleDB with a multi-valued attribute named keyword is your best choice if you need simplicity and minimum administration, and if you don't need to sort by relevance. There is nothing to set up or administer, and you'll only be charged for your actual CPU and bandwidth.
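If you go that route, a rough sketch of the intersection query with the AWS SDK for .NET might look like this (assuming the Amazon.SimpleDB client and its SelectAsync call; the domain and attribute names are placeholders):

// Sketch: keyword search against a SimpleDB domain whose items carry a
// multi-valued "keyword" attribute. Domain and attribute names are placeholders.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Amazon.SimpleDB;
using Amazon.SimpleDB.Model;

public static class KeywordSearch
{
    // Returns the item names whose "keyword" attribute contains every supplied keyword.
    public static async Task<List<string>> FindItemNamesAsync(
        IAmazonSimpleDB simpleDb, string domain, IEnumerable<string> keywords)
    {
        // keyword = 'a' intersection keyword = 'b' intersection keyword = 'c'
        var predicates = keywords.Select(k => $"keyword = '{k.Replace("'", "''")}'");
        var select = $"select itemName() from `{domain}` where " +
                     string.Join(" intersection ", predicates);

        var response = await simpleDb.SelectAsync(new SelectRequest { SelectExpression = select });
        return response.Items.Select(item => item.Name).ToList();
    }
}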
Lucene is a great choice if you need more than keyword searching, but you'll have to manage updates to the index yourself. You'll also have to manage the load balancing, backups and failover that you would have gotten with SimpleDB. If you don't care about failover and can tolerate downtime while you do a restore in the event of an EC2 crash, then that's one less thing to worry about and one less reason to prefer SimpleDB.
With Katta on EC2 you'd be managing everything yourself. You'd have the most flexibility and the most work to do.
Just to tidy up this question... We wound up using Lightspeed's SimpleDB provider, Solr and SolrNet by writing a custom search provider for Lightspeed.
Info on implementing ISearchEngine interface for Lightspeed:
http://www.mindscape.co.nz/blog/index.php/2009/02/25/lightspeed-writing-a-custom-search-engine/
And this is the Solr Library we are using:
http://code.google.com/p/solrnet/
Since Solr can be easily scaled using EC2 machines, this made the most sense to us.
Simple Savant is an open-source .NET persistence library for SimpleDB which includes integrated support for full-text search using Lucene.NET (I'm the Simple Savant creator).
The full-text indexing approach is described here.
