Can Apache Drill work with Cloudera Hadoop?

I am trying to set up Apache Drill in distributed mode. I already have a Cloudera Hadoop cluster with a master and 2 slaves. From the documentation on Apache Drill, it is not entirely clear whether it can be set up on a typical Cloudera cluster, and I could not find any relevant articles. Any kind of help will be appreciated.

Drill can be installed alongside Cloudera on the nodes of the cluster independently, and it will be able to query the files on HDFS.
Refer to this link for installation details:
https://cwiki.apache.org/confluence/display/DRILL/Deploying+Apache+Drill+in+a+Clustered+Environment
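The key part of a clustered install is pointing every drillbit at the same ZooKeeper quorum. A minimal sketch of conf/drill-override.conf, assuming a three-node ZooKeeper ensemble (the cluster-id and host names are placeholders):

    drill.exec: {
      cluster-id: "drillbits1",
      zk.connect: "host1:2181,host2:2181,host3:2181"
    }

Every node that shares the same cluster-id and zk.connect string joins the same Drill cluster.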

I got this working with the Cloudera Hadoop distribution. I already had a Cloudera cluster installed with all services running.
Perform the following steps:
Install Apache Drill on all nodes of the cluster.
Run drill/bin/drillbit.sh start on each node.
Configure a storage plugin for dfs using the Apache Drill web interface at host:8047, and update the HDFS configuration there (see the sketch below).
Run sqlline: ./sqlline -u jdbc:drill:zk=host1:2181,host2:2181,host3:2181
(2181 is the default port used by ZooKeeper.)
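The dfs plugin update in the web interface is just a JSON document. A minimal sketch, where the namenode host, port, and format list are assumptions to adapt to your cluster:

    {
      "type": "file",
      "enabled": true,
      "connection": "hdfs://namenode-host:8020/",
      "workspaces": {
        "root": { "location": "/", "writable": false, "defaultInputFormat": null }
      },
      "formats": {
        "csv": { "type": "text", "extensions": ["csv"], "delimiter": "," },
        "parquet": { "type": "parquet" }
      }
    }

Once sqlline is connected, you can verify the setup with a query against a file on HDFS, for example (path is hypothetical):

    SELECT * FROM dfs.`/user/foo/sample.csv` LIMIT 10;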

It may only work with a rudimentary insecure cluster, as Drill currently isn't tested or documented to integrate with HDFS + Kerberos for secure Hadoop clusters. Vote for and watch this ticket for Drill secure-HDFS support:
https://issues.apache.org/jira/browse/DRILL-3584

Related

JFrog Artifactory upgrade and clustering in docker-compose

We are using self-hosted JFrog Artifactory version 6.20.0 as a single-node installation. We run a docker-compose environment in a single VM, using nginx, artifactory-pro, and PostgreSQL DB containers.
We now plan to upgrade Artifactory and convert it from standalone to a cluster. I have the following questions:
Can we directly upgrade from version 6.20.0 to Artifactory 7.31.13?
Is there any document or guidance for moving from a single node to a two-node (active/active) cluster?
Does a cluster in docker-compose mean that each node has its own PostgreSQL DB and Artifactory containers, with a load balancer on top of the two VMs containing those containers? Did I understand that right, or am I missing something?
Yes, you can directly upgrade from 6.20.0 to 7.31.13.
Refer to this JFrog Confluence page for the Artifactory upgrade.
Each node cannot have its own DB; both nodes rely on a single PostgreSQL database, while the Artifactory load is distributed between the nodes.
So the best approach for you is as follows:
Perform the upgrade on the single node first.
Once the upgrade is successful, you can add the secondary node by connecting it to the same database (see the sketch below). Visit this page before adding the secondary node.
After visiting the requirements page, visit this page for adding another node with the docker-compose method.
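For reference, pointing the second node at the shared database happens in its system.yaml. A minimal sketch, assuming PostgreSQL and placeholder host and credentials (adjust to your environment):

    shared:
      database:
        type: postgresql
        driver: org.postgresql.Driver
        url: jdbc:postgresql://db-host:5432/artifactory
        username: artifactory
        password: password
      node:
        id: node2
        haEnabled: true

Both nodes must also share the same master.key so they can join the same cluster.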
If you have any further queries or issues while upgrading or adding the node, you can reach out to JFrog Support.

How to keep persistent volumes in sync between clusters?

I'm trying to get an installation of Wordpress running in Kubernetes, and I also want the option of running the same configuration locally in minikube. I want to use the standard Docker image of Wordpress: https://hub.docker.com/_/wordpress/.
I'm having trouble making sure that the plugins and templates stay in sync, though. The Docker container exposes a volume at /var/www/html, where the Wordpress installation, as well as my plugins, will live.
Assuming I do the development on minikube, along with the installation of plugins etc., how do I handle moving Persistent Volumes between my local cluster and the target cluster? Should I just reinstall Wordpress every time the Pod is scaled?
You can follow the Writing Portable Configuration guide (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#writing-portable-configuration) for persistent volumes if you are planning to migrate them to a different cluster.
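In practice that mostly means your manifests should ship only PersistentVolumeClaims and let each cluster bind them to its own storage. A minimal sketch for the Wordpress html volume (the claim name and size are assumptions):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: wordpress-html
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi

Omitting storageClassName lets minikube and the target cluster each bind the claim with their own default storage class.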
In a real production scenario you would want to use a standard tool to back up and migrate persistent volumes between clusters. Velero is such a tool that enables you to achieve that.
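With Velero installed in both clusters and pointed at the same object storage, the migration is roughly the following (the namespace name is an assumption):

    # in the source cluster
    velero backup create wordpress-backup --include-namespaces wordpress
    # in the target cluster
    velero restore create --from-backup wordpress-backup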

Cloudify architecture

I am trying to set up Cloudify in an OpenStack installation using this offline guide.
The guide does not specify much about the cloud platform, so I have assumed it can be used in an OpenStack environment. I am using the simple manager blueprint YAML file for bootstrapping.
I have the following questions:
Can I use Fabric 1.4.2 with Cloudify 3.4.1?
If not, where can I find the Wagon file (.wgn) for Fabric 1.4.1?
Architecture: Can I use the CLI inside a network to bootstrap a manager within that same network, where the network lies inside the OpenStack environment? Can the Cloudify CLI machine, the Cloudify Manager, and the application all reside within one network inside OpenStack? If so, how? We would like to test it inside one single network.
(Full disclosure: I wrote the document you linked to.)
Yes, you can.
You can find all Wagon files for all versions of the Fabric plugin here: https://github.com/cloudify-cosmo/cloudify-fabric-plugin/releases
Yes. (See the sketch below.)
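For the third question, here is a hedged sketch of what the bootstrap looks like from a CLI machine on the same OpenStack network, using the simple manager blueprint (all IPs, user names, and paths are placeholders; flags are as in the 3.x CLI):

    # inputs.yaml for the simple manager blueprint
    public_ip: 10.0.0.5        # can be the same private address when the CLI,
    private_ip: 10.0.0.5       # manager, and application share one network
    ssh_user: centos
    ssh_key_filename: /home/user/.ssh/manager-key.pem

    # run from the CLI machine
    cfy init
    cfy bootstrap -p simple-manager-blueprint.yaml -i inputs.yaml

Since bootstrapping only needs SSH and HTTP access to the target VM, a CLI machine inside the same network works fine.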

How to use Cloudera Manager to monitor the components of CDH4

I have already installed CDH4 without using Cloudera Manager. I want to use Cloudera Manager so that I can monitor the different components of CDH4. Please suggest how to proceed.
I have recently had to undertake the same task of importing already installed and running clusters into new Cloudera Manager instances.
I would first suggest taking your time to read through as much documentation as possible to fully understand the process and its key components.
As a short answer, you need to manually import all your cluster configurations and assignments into Cloudera Manager so that they can be managed. A rough outline of the plan I used is below:
Set up a MySQL instance on NEW hardware (PostgreSQL can also be used)
Create a Cloudera Manager user on all servers (must be sudo-enabled; see the sketch after this outline)
Set up ssh key access between the cloudera-manager server and all other hosts
Useful Docs below:
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Installation-Guide/cmig_install_mysql.html
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Installation-Guide/cmig_install_path_B.html
Install Cloudera Manager and agent/daemon packages on Cloudera Manager server
Shut down all cluster services and anything else using the cluster
Save the namespace (see the sketch after this outline)
Backup Meta Data and Configuration files to MULTIPLE LOCATIONS
Ensure the backup can be loaded by starting a single instance NN
Install Cloudera Manager agent and daemon on all production servers
Start the services on the Cloudera Manager server
Access the Cloudera Manager interface
Skip Setup Wizard
Add all hosts to Cloudera Manager
Create HDFS service - DO NOT start the service
Check hosts assignments are correct
Input all configuration file parameters and verify them (this means each server's conf files need to be entered manually)
Run host inspector and configuration check
Perform the above process for remaining services
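Two of the preparation steps above have simple command-line equivalents. A sketch, where the user name and host names are assumptions:

    # create a sudo-enabled Cloudera Manager user on every host and push an
    # ssh key from the Cloudera Manager server
    for host in master slave1 slave2; do
      ssh root@$host 'useradd cm-admin && echo "cm-admin ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/cm-admin'
      ssh-copy-id cm-admin@$host
    done

    # put HDFS into safe mode and save the namespace before backing up metadata
    sudo -u hdfs hdfs dfsadmin -safemode enter
    sudo -u hdfs hdfs dfsadmin -saveNamespace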
I hope this provides some assistance. If you have any other questions, I will be happy to help as much as I can.
Regards,
James
I just recorded a webinar titled "Installing Cloudera Manager in < 30 mins" for Global Knowledge. Available at: http://www.globalknowledge.com/training/coursewebsem.asp?pageid=9&courseid=20221&catid=248&country=United+States (register in the upper right of the page). In the video, I install CM on Ubuntu, set up the core components (Hadoop only), and then browse through some of the graphs for monitoring.

Best way to install web applications (e.g. Jira) on Unixes?

Can you share some pointers on the best way, or best practices, to install web applications on Unixes?
For example:
where to place the app and its data,
how to configure it to be secure and easy to back up,
etc.
One suggestion I already know of is to create a unique user for each app.
The app in question is Jira on FreeBSD, but more general suggestions are also welcome.
Here's what I did for my JIRA install on Fedora Linux:
Created a separate user to run JIRA
Installed JIRA under the JIRA user's home directory
Made a soft link /home/jira/jira pointing to the JIRA installation directory (the directory as installed contains the version number, something like /home/jira/atlassian-jira-enterprise-4.0-standalone)
Created an /etc/init.d script to run JIRA as a service, and added it to chkconfig so that it runs at system startup - see these instructions
Created a MySQL database for JIRA on a separate data volume
Set up scheduled XML backups via the JIRA admin interface
Set up a remote backup script to dump the MySQL database and copy the DB dump and XML backups to a separate backup server (a sketch of this appears at the end of the answer)
To avoid having to open extra firewall ports, set up an Apache virtual host jira.myhost.com and used mod_proxy to forward requests to the JIRA URL (see the sketch below)
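The mod_proxy virtual host is only a few lines. A sketch, assuming JIRA's Tomcat listens on its default port 8080 (the host name and port are placeholders):

    <VirtualHost *:80>
        ServerName jira.myhost.com
        ProxyPass        / http://localhost:8080/
        ProxyPassReverse / http://localhost:8080/
    </VirtualHost>

ProxyPass and ProxyPassReverse require mod_proxy and mod_proxy_http to be enabled.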
I set everything up on a virtual machine (an Amazon EC2 instance in my case) and cloned the machine image so that I can easily restart a new instance if the current one goes down.
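For reference, the remote backup script amounted to something like the following sketch (database name, credentials, and paths are assumptions):

    #!/bin/sh
    # dump the JIRA database, then copy the dump and the XML backups offsite
    mysqldump -u jira -p'secret' jiradb > /backup/jiradb-$(date +%F).sql
    rsync -a /backup/ backupserver:/backups/jira/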
