HCatalog & Impala integration - cloudera

Is there a way to use WebHcat to submit Impala queries?
As far as I understand, Impala uses the same metastore as Hive, and HCatalog gives unified access to this metastore.

Unfortunately, Impala queries are submitted to a different service endpoint than Hive queries, so you can't use WebHCat to submit queries to Impala.
If you're curious, here's a bit more information about how to submit queries to Impala. First, read the Impala Concepts and Architecture documentation. As it explains, you can submit your query to any node running the impalad daemon. The interface exposed by this daemon is specified in ImpalaService.thrift. A number of open source clients let you submit queries to Impala from the command line, from a web interface, or from a library in your favorite programming language. Here are a few examples:
impala-shell: command-line interface that ships with Impala
Impala app in Cloudera Hue: web interface
impyla: Python library
impala-ruby: Ruby library
php-impala: PHP library
ImpalaSharp: C# library
impala-java-client: Java library
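As a minimal sketch of the library route, the snippet below uses impyla's DB-API interface (`impala.dbapi.connect`). The host name is a placeholder, port 21050 is assumed to be the port impalad exposes to such clients (the usual default), and the `run_impala_query` helper with its `connect` parameter is a hypothetical name introduced here so the function can be exercised without a cluster.

```python
def run_impala_query(sql, host="impalad-host.example.com", port=21050, connect=None):
    """Run one statement against an impalad node and return all rows.

    `connect` defaults to impyla's DB-API connect(); passing another
    callable is a hypothetical hook for trying this without a cluster.
    """
    if connect is None:
        # Requires `pip install impyla`; imported lazily so this module
        # still loads on machines where impyla is not installed.
        from impala.dbapi import connect
    conn = connect(host=host, port=port)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

With a reachable impalad node, `run_impala_query("SHOW TABLES")` would return the tables registered in the shared metastore, including those created through Hive.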

Related

Connection to Google Bigtable in Google Dataproc using HBase ODBC driver

Has anyone already made a connection to Google Bigtable in Google Cloud Dataproc using any available HBase ODBC driver? If yes, can you tell me which ODBC driver you used? Thanks
As noted by Solomon in the comments, we don't have a native ODBC or JDBC driver for Google Cloud Bigtable. If you're building a .NET application and want to interact with Bigtable, we recommend using the native client here:
https://github.com/GoogleCloudPlatform/google-cloud-dotnet
Another approach that may work is to use BigQuery, which does have ODBC support, and combine that with BigQuery's support for federated data access to Bigtable. That's a bit of a Rube Goldberg construction though, so it may be more painful to set up and debug.

Streaming data from Oracle 11g to Kafka

I am looking for a solution to stream data from Oracle 11g to Kafka. I was hoping to use GoldenGate, but that only seems to be available for Oracle 12c. Is the Confluent platform the best way to go?
Thanks!
First, the general answer would be: The best way to connect Oracle (databases) to Kafka is indeed to use Confluent Platform with Kafka's Connect API in combination with a ready-to-use connector for GoldenGate. See the GoldenGate/Oracle entry in section "Certified Connectors" at https://www.confluent.io/product/connectors/. The listed Kafka connector for GoldenGate is maintained by Oracle.
Is the Confluent platform the best way to go?
Hence, in general, the answer to the above question is: "Yes, it is."
However, as you pointed out for your specific question about Oracle versions, Oracle unfortunately has the following information in the README of their GoldenGate connector:
Supported Versions
The Oracle GoldenGate Kafka Connect Handler/Formatter is coded and
tested with the following product versions.
Oracle GoldenGate for Big Data 12.2.0.1.1
Confluent IO Kafka/Kafka Connect 0.9.0.1-cp1
Porting may be required for Oracle GoldenGate Kafka Connect
Handler/Formatter to work with other versions of Oracle GoldenGate for
Big Data and/or Confluent IO Kafka/Kafka Connect
This means that the connector does not work with Oracle 11g, at least as far as I can tell.
Sorry if that doesn't answer your specific question. At least I wanted to give you some feedback on the general approach. If I do come across a more specific answer, I'll update this text.
Update Mar 15, 2017: The best option you have at the moment is to use Confluent's JDBC connector. That connector can't give you quite the same feature set as Oracle's native GoldenGate connector though.
Oracle GoldenGate and Confluent Platform are not comparable.
Confluent Platform provides a complete streaming platform: it is a collection of software components that can be used for streaming your data, whereas GoldenGate is replication and data-integration software.
Also, GoldenGate is highly reliable for database replication since it maintains transactional integrity; the same cannot be said for Kafka MirrorMaker or Confluent's Replicator at this time.
If you just want pure transactions, please also consider using OpenLogReplicator. It supports Oracle Database from version 11.2.0.1.
It can produce transactions to Kafka in 2 formats:
Classic format: every transaction is one Kafka message (multiple DMLs per Kafka message)
Debezium style format: transactions are divided, and every DML is one Kafka message
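The difference between the two formats can be sketched as follows. This only illustrates the grouping; the field names (`xid`, `dmls`, `op`, `table`, `row`) and the sample data are invented for the example and are not OpenLogReplicator's actual message schema.

```python
import json

# One committed transaction containing three DML operations (made-up data;
# field names are assumptions, not OpenLogReplicator's real schema).
transaction = {
    "xid": "txn-42",
    "dmls": [
        {"op": "insert", "table": "ORDERS", "row": {"ID": 1}},
        {"op": "update", "table": "ORDERS", "row": {"ID": 1}},
        {"op": "delete", "table": "CARTS", "row": {"ID": 7}},
    ],
}

def one_message_per_transaction(txn):
    """Classic style: the whole transaction becomes a single Kafka message."""
    return [json.dumps(txn)]

def one_message_per_dml(txn):
    """Debezium style: each DML becomes its own Kafka message."""
    return [json.dumps({"xid": txn["xid"], **dml}) for dml in txn["dmls"]]

classic_messages = one_message_per_transaction(transaction)
per_dml_messages = one_message_per_dml(transaction)
```

Consumers that need to apply a transaction atomically may find the first grouping easier to work with, while per-DML messages fit tooling that expects one change event per row.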
There is already a working version. You can try it.
Right now I am using ojdbc6 to connect to Oracle 11g. It is good enough but not perfect, especially when using polling mode to check whether there are new updates on the original tables.
I also tried to read all tables matching a certain pattern, but this did not work well.
The best way to connect an Oracle DB to Kafka (especially when the tables are very wide, column-wise) is to use queries for the connectors. This way, you ensure that you pick the right fields, and you can do some casting for numbers if you are using Avro.
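A query-based setup along these lines might look like the sketch below, which builds a configuration for Confluent's JDBC source connector as a Python dict ready to be serialized to JSON. The connection URL, credentials, table, column names, and the helper name `jdbc_query_connector_config` are all placeholders invented for illustration; the property keys are ones the JDBC source connector documents, but check them against the connector version you run.

```python
import json

def jdbc_query_connector_config(name, topic, query, timestamp_column):
    """Build a JDBC source connector config that reads from a custom query."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            # Placeholder connection details; replace with your own.
            "connection.url": "jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL",
            "connection.user": "kafka_reader",
            "connection.password": "change-me",
            # Poll for new or changed rows based on a timestamp column.
            "mode": "timestamp",
            "timestamp.column.name": timestamp_column,
            # With a custom query, topic.prefix is used as the full topic name.
            "query": query,
            "topic.prefix": topic,
            # Map NUMBER columns to primitive types where precision allows.
            "numeric.mapping": "best_fit",
            "poll.interval.ms": "10000",
        },
    }

# Hypothetical table: pick only the needed columns and cast the numeric one.
config = jdbc_query_connector_config(
    name="oracle-orders-source",
    topic="oracle.orders",
    query="SELECT ORDER_ID, CAST(AMOUNT AS NUMBER(12,2)) AS AMOUNT, UPDATED_AT FROM ORDERS",
    timestamp_column="UPDATED_AT",
)
payload = json.dumps(config, indent=2)
```

The resulting JSON is the kind of document that can be posted to the Kafka Connect REST API. Note that in timestamp mode the connector appends its own filter to the query, so the query here deliberately has no WHERE clause.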

Is it possible to use a single table for multiple connections in Cassandra

Is it possible to use a single table for multiple connections in Cassandra?
I am using Spring Data for the Java connection and am trying to use Python Flask for the Raspberry Pi connection.
Yes, you can use a single table for multiple connections. I'm using Cassandra 2.2, where I connect using the Java Astyanax driver and the Python pycassa client.
Are you asking about table locks for transactions? Cassandra does not use any locking mechanism on data; the docs link below describes transactions and consistency in Cassandra in more detail:
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_about_transactions_c.html
Please mention what you tried and the problems you faced.

How to use ODBC to connect to any DBMS

I'm developing a Java application and I'm using JDBC to connect to a MySQL database. Now I want to use ODBC to be able to retrieve data from any DBMS, provided of course that I have access to it. Is there an API or tool to do this?
What you are looking for is a JDBC-ODBC bridge. There are several available. Using one is not recommended, though; you should use a native JDBC driver whenever one exists.

Is it bad to convert sqlite database to server (database)?

I am using a C program to write/delete 1-2 MB of data periodically (every 10 min) to an sqlite3 database. The program also acts as a read-only database for my node.js web server, which exposes RESTful APIs. (I cannot use node.js modules because the node.js web server is on a different machine.)
The documentation mentions that a client/server RDBMS might be a better fit for a client/server architecture, but that point is not made strongly.
I am using a C program to act as a server that answers requests from the web server as well as from other processes on different machines. The system requires a small amount of data (~2-5 MB) frequently (every 5 min).
If it is not good to use SQLite as a client/server database, how can I convince my manager?
If it's okay, then why is there no standard server plugin?
When the SQLite documentation speaks about a client/server architecture, it says:
If you have many client programs accessing a common database over a network
This applies to whichever programs access the database directly. (In the case of SQLite, this would imply that you have a networked file server, and that multiple clients access the database file directly over the network.)
If you have a single process that accesses the database file, then you do not have a client/server architecture as far as the database is concerned.
If there are other clients that access your server program, this has no effect on your database.
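The arrangement described above, one process owning the database file while everything else talks to that process, can be sketched with Python's standard `sqlite3` module. This stands in for the C program from the question; the `ReadingStore` class, the `readings` table, and its columns are hypothetical names chosen for the example.

```python
import sqlite3
import threading

class ReadingStore:
    """The only component that opens the SQLite file.

    Remote clients (web server, other machines) would call into the
    process hosting this object; they never touch the file themselves.
    """

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets request-handler threads share the
        # connection; the lock below serializes their access.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS readings (ts INTEGER, value REAL)"
            )

    def replace_batch(self, rows):
        """Periodic writer: swap in a fresh batch inside one transaction."""
        with self._lock, self._conn:  # commits on success, rolls back on error
            self._conn.execute("DELETE FROM readings")
            self._conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

    def latest(self, limit=10):
        """Read path used to answer client requests."""
        with self._lock:
            cur = self._conn.execute(
                "SELECT ts, value FROM readings ORDER BY ts DESC LIMIT ?", (limit,)
            )
            return cur.fetchall()
```

However many network clients the hosting process serves, SQLite still sees a single local process, which is the case it is designed for.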

Resources