Neo4j as OLAP: performance benchmark

I would like to know whether Neo4j can be used for OLAP. Searches say it is a great tool for OLTP, but I have found no proven case studies of it being used for OLAP.
Also, the only connector I have found is a first version of the Jasper connector for Neo4j; I have not seen connectors from any other BI providers. If any BI providers offer an out-of-the-box connector for Neo4j, please share.

Reportsanywhere has a Neo4j connector and understands graphs.
If your OLAP model can be described as a graph, then Neo4j is very suitable.

Related

How to copy data from SSAS (on-premise) to Azure Analysis Services

My company plans to copy all data from on-premises SQL Server Analysis Services (SSAS 2017, tabular) to Azure Analysis Services (AAS) on a periodic basis. We want to do this at least once a day, and then use the Azure Analysis Services copy for Power BI reporting only. The idea is to reduce load on the on-premises cube and to improve response times in Power BI.
Is this a recommended design for reporting?
What methods are available for the periodic copy of data, and what are the pros and cons of each?
In addition to Nandan’s approach, you could continue to refresh the model on premises, then back up the database and restore it to Azure Analysis Services. I shared a PowerShell script which automates this operation.
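A minimal sketch of the backup-and-restore route, assuming the copy is driven by TMSL commands (the JSON scripting language that tabular SSAS and AAS accept). It only builds the two commands; `SalesModel` and the file name are hypothetical placeholders, and this is not the PowerShell script mentioned above. The commands could be run, for example, with the `Invoke-ASCmd` cmdlet from the SqlServer PowerShell module, or as an XMLA query in SSMS.

```python
import json

# Sketch only: builds TMSL "backup" and "restore" commands as JSON text.
# Database and file names used below are hypothetical placeholders.


def tmsl_backup(database: str, backup_file: str) -> str:
    """TMSL command that backs up a tabular database to an .abf file."""
    command = {
        "backup": {
            "database": database,
            "file": backup_file,
            "allowOverwrite": True,
            "applyCompression": True,
        }
    }
    return json.dumps(command, indent=2)


def tmsl_restore(database: str, backup_file: str) -> str:
    """TMSL command that restores an .abf file, replacing an existing database."""
    command = {
        "restore": {
            "database": database,
            "file": backup_file,
            "allowOverwrite": True,
        }
    }
    return json.dumps(command, indent=2)


if __name__ == "__main__":
    # Step 1: run against the on-premises SSAS server after its refresh.
    print(tmsl_backup("SalesModel", "SalesModel.abf"))
    # Step 2: run against the AAS server once the .abf file is in its backup storage.
    print(tmsl_restore("SalesModel", "SalesModel.abf"))
```

Between the two steps, the .abf file has to be copied to the Azure storage container configured for backups on the AAS server; that copy step is left out of the sketch.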
Can you tell us what the data source for the on-premises SSAS cube is?
If it is SQL Server, then rather than syncing data from SSAS to AAS, you can refresh AAS directly from the on-premises SQL Server through the on-premises data gateway.
And if the cube is only used for reporting (Power BI), then AAS alone is enough; there is no need to maintain both SSAS and AAS.
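If the on-premises SQL Server becomes the direct source, the daily AAS refresh could be triggered through the Azure Analysis Services asynchronous refresh REST API. This sketch only builds the request with the Python standard library; the region, server, model and token are hypothetical placeholders, and obtaining the Azure AD bearer token is out of scope.

```python
import json
import urllib.request

# Sketch: builds (but does not send) an AAS asynchronous refresh request.
# Region, server, model and token values are hypothetical placeholders.


def build_aas_refresh_request(region: str, server: str, model: str,
                              token: str) -> urllib.request.Request:
    url = (f"https://{region}.asazure.windows.net/servers/{server}"
           f"/models/{model}/refreshes")
    body = {
        "Type": "Full",                # reload data and recalculate the model
        "CommitMode": "transactional",  # commit all objects together
        "MaxParallelism": 2,
        "RetryCount": 2,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_aas_refresh_request("westus", "myaasserver", "SalesModel", "<token>")
    print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` should return `202 Accepted` with a `Location` header that can be polled for the refresh status.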

Connection options to Azure Analysis Services vs Power BI vs Power BI embedded vs Power BI Premium

I'm looking for options to connect to and query the model/database of Azure Analysis Services (AAS) or Power BI. I've found multiple options for connecting to AAS from .NET Core, but nothing for Power BI. Can I use any of the following connection types to connect to Power BI? And if so, which edition: Power BI Pro, Power BI Premium, or Power BI Embedded?
I can connect to Azure Analysis Services using the following:
ADOMD <- This is my preferred connection method.
AMO
MSOLAP
REST API with Bearer token
I'm not looking to embed my report in a .NET Core application. I'm looking to actually query different models so that everyone reports off the same data.
I don't want to shell out for AAS if I can do this with Power BI Pro!
As a short answer, I would say you can most likely do what you are asking with datasets hosted in a Power BI Premium capacity, or by users with a Premium Per User (PPU) license.
My reasoning is simple. Access to the XMLA endpoint is only available for datasets hosted in Power BI Premium. Microsoft describes this in a bit more detail here in the Power BI documentation. Power BI Embedded (in a roundabout way) also ends up requiring Power BI Premium, so I believe the same applies to Power BI Embedded.
As a reminder on why the XMLA endpoint matters, Power BI Premium encapsulates an AAS instance (with some limitations). Per Microsoft (from here):
XMLA is the same communication protocol used by the Microsoft Analysis Services engine, which under the hood, runs Power BI's semantic modeling, governance, lifecycle, and data management.
So the XMLA endpoint is required to connect to the AAS instance behind Power BI.
To answer your question regarding the different connection methods:
ADOMD/AMO
Microsoft provides ADOMD.NET and AMO client libraries for both .NET Framework and .NET Core, which should be able to connect to the Power BI XMLA endpoint. You can browse those, and Microsoft's information on them, here. Several open-source tools (recommended by Microsoft) use these libraries, so if you are looking for examples, look into Tabular Editor 2 or DAX Studio.
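For reference, a sketch of the connection strings an ADOMD.NET or AMO client would be given for each service, assuming the standard `asazure://` and `powerbi://` data-source formats. Region, server, workspace and dataset names are hypothetical; for Power BI, the exact workspace connection URL is best copied from the workspace settings.

```python
# Sketch: formats ADOMD.NET/AMO connection strings. All names are hypothetical.


def aas_connection_string(region: str, server: str, database: str) -> str:
    """Connection string for a model hosted in Azure Analysis Services."""
    return (f"Data Source=asazure://{region}.asazure.windows.net/{server};"
            f"Initial Catalog={database};")


def powerbi_connection_string(workspace: str, dataset: str) -> str:
    """Connection string for a dataset behind a Premium/PPU workspace's XMLA endpoint."""
    return (f"Data Source=powerbi://api.powerbi.com/v1.0/myorg/{workspace};"
            f"Initial Catalog={dataset};")


if __name__ == "__main__":
    print(aas_connection_string("westus", "myaasserver", "SalesModel"))
    print(powerbi_connection_string("SalesWorkspace", "SalesModel"))
```

The point of the comparison is that the client code stays the same; only the `Data Source` changes between AAS and a Premium-hosted Power BI dataset.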
MSOLAP
Per Microsoft (in same link about client libraries):
Analysis Services OLE DB Provider (MSOLAP) is the native client library for Analysis Services database connections. It's used indirectly by both ADOMD.NET and AMO, delegating connection requests to the data provider. You can also call the OLE DB Provider directly from application code.
So unless you have very specific needs that call for MSOLAP, I would rely on Microsoft's AMO/ADOMD.NET client libraries.
REST API
If we are talking about the actual Power BI REST API (like this link), then it depends. The API exposes some functionality that might cover your reason for wanting a direct connection to the XMLA endpoint. For example, it lets you execute DAX queries or trigger dataset refreshes (each with its own limitations). So I would advise you to review the API documentation. It seems to be a good tool so far, and my guess is that it will only expand.
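To illustrate the REST route, a sketch that builds a call to the Power BI `executeQueries` endpoint for a dataset, using only the standard library. The dataset ID, the DAX query and its `Sales` table, and the token are hypothetical placeholders; token acquisition is not shown.

```python
import json
import urllib.request

# Sketch: builds (but does not send) a Power BI "execute queries" request.
# Dataset ID, DAX query and token are hypothetical placeholders.


def build_execute_queries_request(dataset_id: str, dax: str,
                                  token: str) -> urllib.request.Request:
    url = f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries"
    body = {
        "queries": [{"query": dax}],
        "serializerSettings": {"includeNulls": True},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_execute_queries_request(
        "00000000-0000-0000-0000-000000000000",
        "EVALUATE ROW(\"RowCount\", COUNTROWS('Sales'))",
        "<access-token>",
    )
    print(req.get_method(), req.full_url)
```

If sent, the JSON response should contain the rows under `results[0].tables[0].rows`.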

Streaming data from Oracle 11g to Kafka

I am looking for a solution to stream data from Oracle 11g to Kafka. I was hoping to use GoldenGate, but that only seems to be available for Oracle 12c. Is the Confluent platform the best way to go?
Thanks!
First, the general answer would be: The best way to connect Oracle (databases) to Kafka is indeed to use Confluent Platform with Kafka's Connect API in combination with a ready-to-use connector for GoldenGate. See the GoldenGate/Oracle entry in section "Certified Connectors" at https://www.confluent.io/product/connectors/. The listed Kafka connector for GoldenGate is maintained by Oracle.
Is the Confluent platform the best way to go?
Hence, in general, the answer to the above question is: "Yes, it is."
However, as you pointed out for your specific question about Oracle versions, Oracle unfortunately has the following information in the README of their GoldenGate connector:
Supported Versions
The Oracle GoldenGate Kafka Connect Handler/Formatter is coded and tested with the following product versions.
Oracle GoldenGate for Big Data 12.2.0.1.1
Confluent IO Kafka/Kafka Connect 0.9.0.1-cp1
Porting may be required for Oracle GoldenGate Kafka Connect Handler/Formatter to work with other versions of Oracle GoldenGate for Big Data and/or Confluent IO Kafka/Kafka Connect
This means that the connector does not work with Oracle 11g, at least as far as I can tell.
Sorry if that doesn't answer your specific question. At least I wanted to give you some feedback on the general approach. If I do come across a more specific answer, I'll update this text.
Update Mar 15, 2017: The best option you have at the moment is Confluent's JDBC connector. That connector can't give you quite the same feature set as Oracle's native GoldenGate connector, though.
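A rough sketch of registering Confluent's JDBC source connector for one Oracle table through the Kafka Connect REST API (`POST /connectors`, default port 8083). Host names, credentials, the `ORDERS` table and the `ID`/`LAST_MODIFIED` columns are hypothetical; `timestamp+incrementing` mode assumes the table has a strictly increasing numeric key and a last-modified timestamp that is always set. The Oracle JDBC driver jar also has to be on the Connect worker's classpath.

```python
import json
import urllib.request

# Sketch: JDBC source connector config for one Oracle table.
# Hosts, credentials, table and column names are hypothetical placeholders.


def jdbc_source_payload(name: str, jdbc_url: str, user: str,
                        password: str, table: str) -> dict:
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": jdbc_url,
            "connection.user": user,
            "connection.password": password,
            "table.whitelist": table,
            # New rows are detected by ID, changed rows by LAST_MODIFIED.
            "mode": "timestamp+incrementing",
            "incrementing.column.name": "ID",
            "timestamp.column.name": "LAST_MODIFIED",
            "topic.prefix": "oracle-",   # topic becomes "oracle-<table>"
            "poll.interval.ms": "5000",  # how often the table is polled
        },
    }


def create_connector_request(connect_url: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    payload = jdbc_source_payload("oracle-orders", "jdbc:oracle:thin:@db-host:1521:ORCL",
                                  "kafka", "secret", "ORDERS")
    req = create_connector_request("http://connect-host:8083", payload)
    print(req.get_method(), req.full_url)
```

Because this mode works by periodically polling with queries, it does not see deleted rows, which is one of the gaps compared with log-based replication such as GoldenGate.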
Oracle GoldenGate and Confluent Platform are not comparable.
Confluent Platform is a complete streaming platform: a collection of software components that can be used to stream your data, whereas GoldenGate is replication and data-integration software.
Also, GoldenGate is highly reliable for database replication because it maintains transactional integrity; the same cannot be said for Kafka MirrorMaker or Confluent Replicator at this time.
If you only need the raw transactions, also consider OpenLogReplicator. It supports Oracle Database from version 11.2.0.1.
It can write transactions to Kafka in two formats:
Classic format: each transaction is one Kafka message (multiple DML statements per Kafka message)
Debezium-style format: transactions are split up, and each DML statement is one Kafka message
There is already a working version. You can try it.
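To make the difference between the two output formats concrete, a toy sketch that turns one transaction into Kafka message payloads either way. The message fields (`xid`, `payload`, `op`, ...) are hypothetical and are not OpenLogReplicator's actual schema.

```python
import json
from typing import Dict, List

# Toy model of one committed transaction. Field names are hypothetical and do
# NOT reflect OpenLogReplicator's real message schema.
TRANSACTION = {
    "xid": "0x0005.01a.00000123",
    "dmls": [
        {"op": "insert", "table": "ORDERS", "after": {"ID": 1, "QTY": 5}},
        {"op": "update", "table": "ORDERS",
         "before": {"ID": 1, "QTY": 5}, "after": {"ID": 1, "QTY": 7}},
    ],
}


def classic_format(txn: Dict) -> List[str]:
    """One Kafka message for the whole transaction, holding every DML."""
    return [json.dumps({"xid": txn["xid"], "payload": txn["dmls"]})]


def per_dml_format(txn: Dict) -> List[str]:
    """One Kafka message per DML; each keeps the xid so consumers can regroup."""
    return [json.dumps({"xid": txn["xid"], "payload": [dml]}) for dml in txn["dmls"]]


if __name__ == "__main__":
    print(len(classic_format(TRANSACTION)), "message(s) in classic format")
    print(len(per_dml_format(TRANSACTION)), "message(s) in per-DML format")
```

The trade-off is that the classic format keeps the transaction boundary inside a single message, while the per-DML format gives smaller messages and leaves regrouping by transaction to the consumer.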
Right now I am using ojdbc6 to connect to Oracle 11g. It is good enough but not perfect, especially when using polling mode to check whether there are new updates on the source tables.
I also tried reading all tables matching a certain pattern, but this did not work well.
The best way to connect an Oracle DB to Kafka (especially when the tables are very wide, column-wise) is to use queries in the connectors. This way, you ensure that you pick the right fields, and you can cast numbers if you are using Avro.
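A sketch of the query-based approach with the Confluent JDBC source connector, assuming a hypothetical `ORDERS` table. In `query` mode, `topic.prefix` is used as the full topic name, and the query is wrapped in a subquery so that the `WHERE` clause the connector appends in incremental modes applies to the outer `SELECT`. Casting `QUANTITY` to a `NUMBER` with explicit precision and scale, together with `numeric.mapping=best_fit`, is one way to get an integer type in Avro instead of a raw decimal.

```python
# Sketch: query-mode JDBC source connector config.
# Table, column and topic names are hypothetical placeholders.


def query_source_payload(name: str, jdbc_url: str, topic: str) -> dict:
    # Select only the needed columns and give QUANTITY an explicit precision and
    # scale. The subquery lets the connector append its incremental WHERE clause.
    query = (
        "SELECT * FROM ("
        "SELECT ID, CUSTOMER_NAME, CAST(QUANTITY AS NUMBER(10,0)) AS QUANTITY, LAST_MODIFIED "
        "FROM ORDERS)"
    )
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": jdbc_url,
            "query": query,
            "mode": "timestamp+incrementing",
            "incrementing.column.name": "ID",
            "timestamp.column.name": "LAST_MODIFIED",
            "topic.prefix": topic,           # full topic name in query mode
            "numeric.mapping": "best_fit",   # map NUMBER(p,s) to int/float types
        },
    }


if __name__ == "__main__":
    payload = query_source_payload("oracle-orders-query",
                                   "jdbc:oracle:thin:@db-host:1521:ORCL", "orders")
    print(payload["config"]["query"])
```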

Method to replicate sqlite database across multiple servers

I'm developing a distributed application, and I have a SQLite database that must be shared between the distributed servers.
If I change a SQLite row on serverA, the change must appear on the other servers immediately; and if a server goes offline and then comes back online, it must catch up so that its data matches the other servers.
I'm trying to build an HA service with small SQLite databases.
I'm considering something like MongoDB or RethinkDB, because their replication works well and the data stays available regardless of which servers are online.
Is there a library or other SQL approach for sharing data between servers?
I used the Raft consensus protocol to replicate my SQLite database. You can find the system here:
https://github.com/rqlite/rqlite
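A minimal sketch of talking to rqlite over its HTTP API with the Python standard library, assuming a node listening on the default HTTP port 4001 on a hypothetical host. Writes go to `/db/execute` as a JSON array of SQL statements; reads go to `/db/query` with the SQL in the `q` parameter. The `kv` table is hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Sketch: builds rqlite HTTP requests. Host and table names are hypothetical.


def execute_request(base_url: str, statements: List[str]) -> urllib.request.Request:
    """POST a batch of write statements to /db/execute."""
    return urllib.request.Request(
        f"{base_url}/db/execute",
        data=json.dumps(statements).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_url(base_url: str, sql: str) -> str:
    """Build a GET URL for a read query against /db/query."""
    return f"{base_url}/db/query?" + urllib.parse.urlencode({"q": sql})


if __name__ == "__main__":
    base = "http://node1:4001"  # hypothetical host, default rqlite HTTP port
    req = execute_request(base, [
        "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)",
        "INSERT INTO kv (k, v) VALUES ('color', 'blue')",
    ])
    print(req.get_method(), req.full_url)
    print(query_url(base, "SELECT v FROM kv WHERE k = 'color'"))
    # urllib.request.urlopen(req) would actually send the write to the cluster.
```

Writes are committed through the Raft leader, so a node that was offline catches up from the cluster when it rejoins.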
Here are some options:
LiteReplica:
It supports master-slave replication for SQLite3 databases using a single master (writable node) and one or many replicas (read-only nodes).
If a device goes offline and then comes back online, the secondary (replica) databases are updated from the primary incrementally.
LiteSync:
It implements multi-master replication so we can write to the db in any node, even when the device is off-line.
On both we open the database using a modified URI, like this:
“file:/path/to/app.db?replica=master&bind=tcp://0.0.0.0:4444”
AergoLite:
Blockchain based, it has the highest level of security. Stores immutable relational data, secured by a distributed consensus with low resource usage.
Disclosure: I am the author of these solutions
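The LiteSync-style URI shown above can be built programmatically. This sketch assumes Python's `sqlite3` module; with stock SQLite, unknown URI parameters such as `replica` and `bind` are simply ignored, so replication would only happen if the module were linked against LiteSync's modified SQLite library (an assumption, not something the answer above states). Paths containing `?` or `#` would need escaping, which the sketch does not handle.

```python
import sqlite3
from urllib.parse import urlencode

# Sketch: builds a LiteSync-style SQLite URI. The replica/bind values mirror
# the example above; everything else here is an illustrative assumption.


def litesync_uri(db_path: str, replica: str, bind: str) -> str:
    # Keep ":" and "/" unescaped so "tcp://host:port" reads as written.
    query = urlencode({"replica": replica, "bind": bind}, safe=":/")
    return f"file:{db_path}?{query}"


def open_replicated(db_path: str, replica: str, bind: str) -> sqlite3.Connection:
    # uri=True makes sqlite3 parse the "file:" URI. With stock SQLite this opens
    # a plain local database, because unknown parameters are ignored.
    return sqlite3.connect(litesync_uri(db_path, replica, bind), uri=True)


if __name__ == "__main__":
    print(litesync_uri("/path/to/app.db", "master", "tcp://0.0.0.0:4444"))
```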
You can synchronize SQLite databases by embedding SymmetricDS in your application. It supports occasionally connected clients, so it will capture changes and sync them when a server comes online. It supports several different database platforms and can be used as a library or as a standalone service.
You can also use CopyCat, which supports SQLite as well as a few other database types.
Marmot looks good:
https://github.com/maxpert/marmot
From their docs:
What & Why?
Marmot is a distributed SQLite replicator with leaderless, and eventual consistency. It allows you to build a robust replication between your nodes by building on top of fault-tolerant NATS Jetstream. This means if you are running a read heavy website based on SQLite, you should be easily able to scale it out by adding more SQLite replicated nodes. SQLite is probably the most ubiquitous DB that exists almost everywhere, Marmot aims to make it even more ubiquitous for server side applications by building a replication layer on top.

Can PostgreSQL scale to the likes of SQL Server? Is it easy to tune?

I'm hoping someone has experience with both SQL Server and PostgreSQL.
Between the two databases, which one is easier to scale?
Is creating a read-only DB that mirrors the main DB easier or harder than with SQL Server?
Seeing as SQL Server can get expensive, I really want to look into PostgreSQL.
Also, are the database access libraries well written for an ASP.NET application?
(Please no comments along the lines of "do you need the scale", "worry about scaling later", or "don't optimize until you have scaling issues". I just want to learn from a theoretical standpoint, thanks!)
Currently, setting up a read-only replica is probably easier with SQL Server. There's a lot of work going on to get hot standby and streaming replication into the next release, though.
Regarding scaling, people are using PostgreSQL with massive databases. Skype uses PostgreSQL, and Yahoo has something based on PostgreSQL with several petabytes in it.
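For comparison with the state described here: PostgreSQL 9.0 did ship hot standby and streaming replication, and in PostgreSQL 12 and later a read-only replica needs only a few settings. The sketch below just generates them as text; the host, subnet and `replicator` user are hypothetical, the standby's data directory is assumed to be a base backup of the primary (for example taken with `pg_basebackup`), and the replication password is assumed to come from a `.pgpass` file.

```python
from typing import Dict, List

# Sketch for PostgreSQL 12+: settings for one primary and one read-only standby.
# Host, subnet and user names are hypothetical placeholders.


def primary_settings(replica_cidr: str, repl_user: str) -> Dict[str, List[str]]:
    return {
        # Already the defaults in recent releases; shown explicitly for clarity.
        "postgresql.conf": ["wal_level = replica", "max_wal_senders = 10"],
        # Allow the replication user to connect from the replica's network.
        "pg_hba.conf": [f"host replication {repl_user} {replica_cidr} scram-sha-256"],
    }


def standby_settings(primary_host: str, primary_port: int,
                     repl_user: str) -> Dict[str, List[str]]:
    conninfo = f"host={primary_host} port={primary_port} user={repl_user}"
    return {
        # An empty standby.signal file in the data directory starts the server as a standby.
        "standby.signal": [],
        "postgresql.conf": [
            f"primary_conninfo = '{conninfo}'",
            "hot_standby = on",  # accept read-only queries while replaying WAL
        ],
    }


if __name__ == "__main__":
    for role, settings in (("primary", primary_settings("10.0.0.0/24", "replicator")),
                           ("standby", standby_settings("10.0.0.5", 5432, "replicator"))):
        for filename, lines in settings.items():
            print(f"# {role}: {filename}")
            for line in lines:
                print(line)
```

In practice, `pg_basebackup -R` can write the standby part (`standby.signal` and `primary_conninfo`) automatically.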
I've used PostgreSQL with C# and ASP.NET 2.0, with the database provider from Devart:
http://www.devart.com/dotconnect/postgresql/
The visual designer has a few teething difficulties but the connectivity was fine.
I have only used SQL Server and not much PostgreSQL, so I can only answer for SQL Server.
When scaling out SQL Server, you have a couple of options. You can use peer-to-peer replication between databases, or you can have secondary read-only DB(s) as you mention. The latter is relatively straightforward to set up using database mirroring or log shipping. Database mirroring can also give you automatic failover (with a witness server) if the primary DB fails. For an overview, look here:
http://www.microsoft.com/sql/howtobuy/passive-server-failover-support.mspx
http://technet.microsoft.com/en-us/library/cc917680.aspx
http://blogs.technet.com/josebda/archive/2009/04/02/sql-server-2008-database-mirroring.aspx
As for licensing, you only need a license for the standby server if it is actively used for serving queries - you do not need one for a pure standby server.
If you are serious, you can set up a failover cluster with a SAN for storage, but that is not really a load balancing setup in itself.
Here are some links on the general scale up/out topic:
http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-performance-scale.aspx
http://msdn.microsoft.com/en-us/library/aa479364.aspx
ASP.NET libraries are obviously very well written for SQL Server, but I believe good alternatives exist for PostgreSQL as well.
