Is there any relationship between Zuul and tox in OpenStack projects?
When I read an open-source OpenStack project,
I find there are zuul.d, bindep.txt, and tox.ini files.
So the project uses Zuul and tox. I know they both play a role in the code review/CI process; what is the relationship between them here?
No direct relationship, but because OpenStack is a Python project you are likely to see all of them used at once.
- tox is for running the Python tests
- bindep is for installing system packages before a job starts (the job would not have sudo)
- Zuul runs the jobs
Obviously, you can also install the system dependencies yourself.
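For context, a minimal tox.ini of the kind such a project carries might look like the sketch below; the env names, deps file, and commands are illustrative, not taken from any particular OpenStack repository:

    # tox.ini -- illustrative sketch, not from a specific OpenStack project
    [tox]
    envlist = pep8,py3

    [testenv]
    # test-requirements.txt is the conventional (assumed) deps file name
    deps = -r{toxinidir}/test-requirements.txt
    commands = stestr run {posargs}

    [testenv:pep8]
    commands = flake8

Zuul jobs then typically just invoke one of these tox environments, while bindep.txt lists the distro packages the job needs installed beforehand.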
I am interested in knowing how I can integrate a repository with an Azure Machine Learning Workspace.
What have I tried?
I have some experience with Azure Data Factory, where I usually set up workflows as follows:
- I have a dev Azure Data Factory instance that is linked to an Azure repository.
- Changes are made to the repository using the code editor.
- These changes are published via the adf_publish branch to the live dev instance.
- I use a CI/CD pipeline with the AzureRMTemplate task to deploy the templates in the publish branch, releasing the changes to the production environment.
Question:
How can I achieve the same or a similar workflow with an Azure Machine Learning Workspace?
How is CI/CD done with an Azure ML Workspace?
The following workflow is the officially recommended practice for this task.
Starting with the architecture outlined below:
- We need a dedicated data store to handle the dataset.
- Perform the regular code modifications using an IDE such as Jupyter Notebook or VS Code.
- Train and test the model.
- To register and operate on the model, deploy the model image as a web service and operate the rest (a sketch of this step follows the list).
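As a rough sketch of that register-and-deploy step using the azureml-core Python SDK (the paths, names, and scoring script below are placeholders, not taken from the answer above):

    # Hedged sketch: register a model and deploy it as an ACI web service
    # with azureml-core. All names/paths are placeholder assumptions.
    from azureml.core import Environment, Workspace
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice

    ws = Workspace.from_config()  # reads config.json describing the workspace

    model = Model.register(
        workspace=ws,
        model_path="outputs/model.pkl",  # placeholder model artifact
        model_name="my-model",           # placeholder name
    )

    inference_config = InferenceConfig(
        entry_script="score.py",  # placeholder scoring script
        environment=Environment.get(workspace=ws, name="AzureML-Minimal"),
    )
    deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

    service = Model.deploy(ws, "my-service", [model], inference_config, deployment_config)
    service.wait_for_deployment(show_output=True)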
Configure the CI Pipeline:
Follow the steps below to complete the procedure.
Before implementation:
- We need an account with an enabled Azure subscription.
- Azure DevOps must be activated for the account.
Open the DevOps portal with SSO enabled.
Navigate to Pipelines -> Builds -> choose the model that was created -> click EDIT.
The build pipeline will look like the screen below.
We need to use the Anaconda distribution for this example to get all the dependencies.
To install the environment dependencies, check the link.
Use the Python environment, under Install Requirements in the user setup.
Select create or get workspace and select your account subscription, as shown in the screen below.
Save the changes made in the other tasks; all of them must be in the same subscription.
The entire CI/CD procedure and solution is documented in the link.
Document Credit: Praneet Singh Solanki
I am considering using Apache Airflow. I had a look at the documentation, and now I am trying to implement an already existing pipeline (from a home-made framework) using Airflow.
All the given examples are simple one-module DAGs. But in real life you can have a versioned application that provides (complex) pipeline blocks, and the DAGs use those blocks as tasks. Basically, the application package is installed in a dedicated virtual environment with its dependencies.
OK, so now how do you plug that into Airflow? Should Airflow be installed in the application virtualenv? Then there is a dedicated Airflow instance for this application's pipelines. But in this case, if you have 100 applications you have to run 100 Airflow instances... On the other side, if you have one unique instance, it means you have installed all your application packages in the same environment, and you can potentially have dependency conflicts...
Is there something I am missing? Are there best practices? Do you know of internet resources that may help, or GitHub repos using one pattern or the other?
Thanks
One instance with 100 pipelines. Each pipeline can easily be versioned, and Python dependencies can be packaged.
We have 200+ very different pipelines and use one central Airflow instance. Folders are organized as follows (a sketch of one such DAG file follows the listing):
DAGs/
DAGs/pipeline_1/v1/pipeline_1_dag_1.0.py
DAGs/pipeline_1/v1/dependencies/
DAGs/pipeline_1/v2/pipeline_1_dag_2.0.py
DAGs/pipeline_1/v2/dependencies/
DAGs/pipeline_2/v5/pipeline_2_dag_5.0.py
DAGs/pipeline_2/v5/dependencies/
DAGs/pipeline_2/v6/pipeline_2_dag_6.0.py
DAGs/pipeline_2/v6/dependencies/
etc.
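To make the layout concrete, here is a minimal sketch of what one of those versioned DAG files could look like. The package and function names (pipeline_1, run_step) are hypothetical, and PythonVirtualenvOperator is one way to isolate per-task dependencies when packages would otherwise conflict on a shared instance:

    # DAGs/pipeline_1/v2/pipeline_1_dag_2.0.py -- hypothetical sketch
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonVirtualenvOperator


    def run_step():
        # Import inside the callable so it resolves in the task's own virtualenv.
        import pipeline_1  # hypothetical versioned application package
        pipeline_1.run()


    with DAG(
        dag_id="pipeline_1_v2",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        # Each task gets its own virtualenv, so pipelines with conflicting
        # dependencies can still share one central Airflow instance.
        PythonVirtualenvOperator(
            task_id="run_pipeline_1",
            python_callable=run_step,
            requirements=["pipeline-1==2.0"],  # hypothetical pinned package
        )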
I know it can be fetched from the command line by running feature:list -i, but is there any API/JSON available to fetch this?
You can use Jolokia and Hawtio to retrieve that information quite easily. I believe you can easily add the Hawtio repo from the native Karaf feature repos (repo-add hawtio). Then you need to install jolokia, hawtio, and the Karaf web console. From the Karaf web console alone you can see a full list of features, but I find the Hawtio interface to be a godsend.
A REST API can be installed without the need for Hawtio, which itself uses Jolokia under the hood to access the bundle list.
The Jolokia project provides web applications called agents that serve a REST API. For quick experiments you can deploy the WAR jolokia-war-unsecured into the hot-deploy folder of a running Karaf instance. This installs a REST web service at e.g. http://localhost/jolokia-war-unsecured/ which does not require any authentication.
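For example, here is one way to pull the feature list over that endpoint from Python. The MBean and attribute names below follow the usual org.apache.karaf:type=feature pattern but can vary between Karaf versions, so treat this as a sketch and check the agent's /list output for your installation:

    # Sketch: query the unsecured Jolokia agent for Karaf's features MBean.
    # The MBean/attribute names are assumptions; verify them via the agent's
    # /list endpoint on your Karaf version.
    import requests

    payload = {
        "type": "read",
        "mbean": "org.apache.karaf:type=feature,name=root",
        "attribute": "Features",
    }
    resp = requests.post("http://localhost/jolokia-war-unsecured/", json=payload)
    resp.raise_for_status()
    print(resp.json().get("value"))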
I am trying to set up Cloudify in an OpenStack installation using this offline guide.
The guide does not say much about the cloud platform, so I have assumed it can be used in an OpenStack environment. I am using the simple manager blueprint YAML file for bootstrapping.
I have the following questions:
Can I use Fabric 1.4.2 with Cloudify 3.4.1?
If not: I am unable to find the wagon (.wgn) file for Fabric 1.4.1.
Architecture: Can I use the CLI inside a network to bootstrap a manager within that network? This network lies inside the OpenStack environment. Can the Cloudify CLI machine, the Cloudify Manager, and the application reside within one network inside OpenStack? If so, how? We would like to test it inside one single network.
(Full disclosure: I wrote the document you linked to.)
Yes you can.
You can find all Wagon files for all versions of the Fabric plugin here: https://github.com/cloudify-cosmo/cloudify-fabric-plugin/releases
Yes.
I have already installed CDH4 without using Cloudera Manager. I want to use Cloudera Manager so that I can monitor the different components of CDH4. Please suggest how I can start using the manager now.
I have recently had to undertake the same task of importing already installed and running clusters into new Cloudera Manager instances.
I would first suggest taking your time to read through as much documentation as possible to fully understand the processes and key components.
As a short answer, you need to manually import all your cluster configurations and assignments into Cloudera Manager so that they can be managed. A rough outline of the plan I used is below:
Set up a MySQL instance on NEW hardware (PostgreSQL can also be used)
Create a Cloudera Manager user on all servers (must be sudo-enabled)
Set up SSH key access between the Cloudera Manager server and all other hosts
Useful Docs below:
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Installation-Guide/cmig_install_mysql.html
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Installation-Guide/cmig_install_path_B.html
Install the Cloudera Manager and agent/daemon packages on the Cloudera Manager server
Shut down all services using the cluster, as well as the cluster services themselves
Save the namespace (e.g. hadoop dfsadmin -saveNamespace with the NameNode in safe mode)
Back up the metadata and configuration files to MULTIPLE LOCATIONS
Ensure the backup can be loaded by starting a single-instance NameNode
Install the Cloudera Manager agent and daemon on all production servers
Start the services on the Cloudera Manager server
Access the Cloudera Manager interface
Skip the Setup Wizard
Add all hosts to Cloudera Manager
Create the HDFS service - DO NOT start the service
Check that the host assignments are correct
Input all configuration file parameters and verify them (this means each server's conf files need to be entered manually; a scripted way to read the result back is sketched after this list)
Run the host inspector and configuration check
Perform the above process for the remaining services
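As an aside on the manual configuration step above: once the services are imported, the Cloudera Manager Python client (cm-api) can help you read the configuration back for verification instead of clicking through every value. A rough, hypothetical sketch, where the host, credentials, and cluster/service names are all placeholders:

    # Hedged sketch using the cm-api client (pip install cm-api) to read back
    # service configuration for verification; all names are placeholders.
    from cm_api.api_client import ApiResource

    api = ApiResource("cm-host.example.com", username="admin", password="admin")
    cluster = api.get_cluster("cluster1")
    hdfs = cluster.get_service("hdfs1")

    # Print each role config group with its non-default settings.
    for group in hdfs.get_all_role_config_groups():
        print(group.name, group.get_config(view="summary"))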
I hope this provides some assistance. If you have any other questions, I will be happy to try and assist you as much as I can.
Regards,
James
I just recorded a webinar titled "Installing Cloudera Manager in < 30 mins" for Global Knowledge. It is available at: http://www.globalknowledge.com/training/coursewebsem.asp?pageid=9&courseid=20221&catid=248&country=United+States (register in the upper right of the page). In the video, I install CM on Ubuntu, set up the core components (Hadoop only), and then browse through some of the graphs for monitoring.