How can HBase/Bigtable be used for data analysis? - bigdata

Conceptually, HBase and Bigtable are key-value stores. The documentation for both often mentions that they can be used for analytics. But since they are key-value stores and don't support SQL or an SQL-like language, how are they used for analytics?
Cloud Bigtable also excels as a storage engine for batch MapReduce
operations, stream processing/analytics, and machine-learning
applications. (source)

You can use analytics tools such as Hadoop MapReduce, Apache Spark, and Apache Beam / Google Cloud Dataflow on HBase and Cloud Bigtable, e.g., see:
Dataflow connector for Cloud Bigtable
Connect Apache Spark to your HBase database
HBaseIO connector for Apache Beam
BigtableIO connector for Apache Beam
Additionally, TensorFlow is integrated with Cloud Bigtable for ML training, e.g., see:
Using Cloud Bigtable as a streaming data source for TensorFlow
TensorFlow APIs for accessing data in Cloud Bigtable
Finally, you can run SQL analytics via integrations: BigQuery can run SQL queries on data stored in Cloud Bigtable, and Apache Hive can run SQL queries on data stored in Apache HBase; e.g., see:
BigQuery + Cloud Bigtable federated queries
Hive + HBase integration
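To make the answer above concrete: the analytics engines listed (MapReduce, Spark, Beam/Dataflow) don't need SQL support in the store itself. They scan a row-key range and aggregate on the compute side. The sketch below is a toy illustration of that pattern, assuming a dict stands in for a Bigtable/HBase table and row keys follow a hypothetical "metric#timestamp" scheme:

```python
# Toy sketch: analytics over a key-value store via row-key prefix scans.
# The dict below stands in for a Bigtable/HBase table; in both systems,
# rows are stored sorted by row key, which is what makes prefix/range
# scans efficient. The "metric#timestamp" key scheme is an assumption
# for illustration only.
table = {
    "cpu#2023-01-01T00:00": 40,
    "cpu#2023-01-01T01:00": 60,
    "mem#2023-01-01T00:00": 70,
}

def scan(prefix):
    """Yield (key, value) pairs whose row key starts with `prefix`,
    mimicking a prefix scan over lexicographically sorted row keys."""
    for key in sorted(table):
        if key.startswith(prefix):
            yield key, table[key]

# "Analytics" here is just an aggregation over the scan output, which is
# essentially what a MapReduce or Spark job does at scale.
cpu_values = [v for _, v in scan("cpu#")]
cpu_avg = sum(cpu_values) / len(cpu_values)
print(cpu_avg)  # 50.0
```

Frameworks like Spark and Beam parallelize exactly this: the key space is split into ranges, each worker scans its range, and partial aggregates are combined.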

Related

How to copy data from SSAS (on-premise) to Azure Analysis Services

My company plans to copy all data from on-premises SQL Server Analysis Services (2017 tabular) to Azure Analysis Services on a periodic basis. We want to do this at least once a day, and then use the Azure Analysis Services version for Power BI reporting only. The idea is to reduce load on the on-premises cube and to improve response times in Power BI.
Is this a recommended design for reporting?
What are the methods available for the periodic copy of data (and pros and cons for each)?
In addition to Nandan’s approach, you could continue to refresh the model on premises, then backup and restore to Azure Analysis Services. I shared a PowerShell script which automates this operation.
Can you tell us what the data source for the on-prem SSAS cube is?
If it is a SQL Server, then rather than syncing data from SSAS to AAS, you can refresh AAS directly with the on-prem SQL Server as the source via the on-premises data gateway.
And if the cube is only used for reporting (Power BI), then having AAS alone is enough, rather than maintaining both SSAS and AAS.

Does my application need a connection pool if the backend database is NoSQL (Azure Cosmos DB)?

I am very new to the NoSQL world and am wondering how connections are managed by NoSQL databases like Azure Cosmos DB.
I am designing a highly scalable solution for a real-time application, and one of my concerns is how to manage the numerous connections/requests to Azure Cosmos DB from Azure Functions or my business tier.
Is Cosmos DB subject to limitations similar to SQL Server's in terms of the number of available connections?
The Azure Functions connection limit applies to all outbound connections irrespective of the target service. Some services may optimize connection usage (pooling, multiplexing, etc.) for higher concurrency and throughput.
Specifically for Cosmos DB: the 2.0.0-preview package has connection multiplexing and pooling; please check https://www.nuget.org/packages/Microsoft.Azure.DocumentDB/2.0.0-preview
NOTE: The Azure Functions V2 runtime is required to use a custom Cosmos DB SDK version.
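Regardless of SDK version, the usual guidance is to create one client per process and reuse it across invocations, so the process keeps a single connection pool instead of opening new outbound connections per request. A minimal sketch of that singleton pattern, assuming a dict as a stand-in for the real client (in a real Azure Function you would return `azure.cosmos.CosmosClient(endpoint, credential=key)` instead):

```python
import os
from functools import lru_cache

@lru_cache(maxsize=1)
def get_client():
    """Return a process-wide client, created once and reused.

    A plain dict stands in for the real Cosmos DB client here so the
    sketch runs without the SDK or a live account; the environment
    variable name COSMOS_ENDPOINT is an assumption.
    """
    endpoint = os.environ.get(
        "COSMOS_ENDPOINT", "https://example.documents.azure.com"
    )
    return {"endpoint": endpoint}  # placeholder for CosmosClient(...)

# Every function invocation reuses the same client (and therefore the
# same underlying connection pool) instead of opening new connections.
first = get_client()
second = get_client()
assert first is second
```

In an Azure Function, defining the client at module scope (or memoizing it as above) achieves the same reuse across warm invocations of the same host instance.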

Connection to Google Bigtable in Google Dataproc using HBase odbc driver

Has anyone already made a connection to Google Bigtable in Google Cloud Dataproc using any available HBase ODBC driver? If yes, can you tell us which ODBC driver you used? Thanks.
As noted by Solomon in the comments, we don't have a native ODBC or JDBC driver for Google Cloud Bigtable. If you're building a .NET application and want to interact with Bigtable we recommend using the native client here:
https://github.com/GoogleCloudPlatform/google-cloud-dotnet
Another approach that may work would be to use BigQuery which does have ODBC support and combine that with BigQuery's support for federated data access to Bigtable. That's a bit of a Rube Goldberg construction though, so it may be more painful to set up and debug.
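The BigQuery detour above can be sketched: BigQuery exposes the Bigtable data through an external table, and any ODBC-capable tool (or the BigQuery client library) then queries it with ordinary SQL. The helper below only builds the SQL string, so it runs without credentials; the external table path `my_project.my_dataset.bigtable_ext` and the `rowkey` column are assumptions for illustration:

```python
# Sketch of the BigQuery-federated workaround: an ODBC client or the
# google-cloud-bigquery library sends plain SQL to BigQuery, which in
# turn reads the Bigtable-backed external table. The table path below
# is a placeholder, not a real resource.
def federated_query(external_table: str, limit: int = 10) -> str:
    """Build a simple SQL query against a Bigtable external table."""
    return f"SELECT rowkey FROM `{external_table}` LIMIT {limit}"

sql = federated_query("my_project.my_dataset.bigtable_ext")
# With credentials configured, execution would look like:
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(sql).result()
```

The extra hop (tool → ODBC → BigQuery → Bigtable) is what makes this path harder to set up and debug than the native client.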

MSDTC is not supported by AWS RDS SQL Server

I have a transaction scope in my code which promotes the transaction to MSDTC. But when I run this code in the AWS cloud, where the database is RDS SQL Server, MSDTC is not supported. How can I make this work, or what is an alternative? I need MSDTC in my code.
Well, there isn't much that you can do since AWS does not support it (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html).
Features Not Supported and Features with Limited Support
The following Microsoft SQL Server features are not supported on Amazon RDS:
Stretch database
Backing up to Microsoft Azure Blob Storage
Buffer pool extension
Data Quality Services
Database Log Shipping
Database Mail
Distributed Transaction Coordinator (MSDTC)
File tables
FILESTREAM support
Maintenance Plans
Performance Data Collector
...
...
The alternative is to deploy/host/manage your own MSSQL server on AWS.
Update: it looks like MSDTC is now supported by AWS; see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.SQLServer.Options.MSDTC.html

Using Firebase within the physical network of an organization - not on the cloud

If, for security and regulatory reasons, an organization cannot run its core internal business processes on the cloud, is there a solution by which Firebase could still be used, for example running only within the physical network of that organization?
Firebase is a pure cloud-hosted solution. There is no on-premise version available.
