CI / CD and repository integration for Azure ML Workspace - azure-machine-learning-studio

I am interested in knowing how can I integrate a repository with Azure Machine Learning Workspace.
What have I tried ?
I have some experience with Azure Data Factory and usually I have setup workflows where
I have a dev azure data factory instance that is linked to azure repository.
Changes made to the repository using the code editor.
These changes are published via the adf_publish branch to the live dev instance
I use CI / CD pipeline and the AzureRMTemplate task to deploy the templates in the publish branch to release the changes to production environment
Question:
How can I achieve the same / similar workflow with Azure Machine Learning Workspace ?
How is CI / CD done with Azure ML Workspace

The following workflow is the official practice to be followed to achieve the task required.
Starting with the architecture mentioned below
we need to have a specific data store to handle the dataset
Perform the regular code modifications using the IDE like Jupyter Notebook or VS Code
Train and test the model
To register and operate on the model, deploy the model image as a web service and operate the rest.
Configure the CI Pipeline:
Follow the below steps to complete the procedure
Before implementation:
- We need azure subscription enabled account
- DevOps activation must be activated.
Open DevOps portal with enabled SSO
Navigate to Pipeline -> Builds -> Choose the model which was created -> Click on EDIT
Build pipeline will be looking like below screen
We need to use Anaconda distribution for this example to get all the dependencies.
To install environment dependencies, check the link
Use the python environment, under Install Requirements in user setup.
Select create or get workspace select your account subscription as mentioned in below screen
Save the changes happened in other tasks and all those muse be in same subscription.
The entire CI/CD procedure and solution was documented in link
Document Credit: Praneet Singh Solanki

Related

How to use AWS profile with Amplify?

Amplify does not support CLI option --profile. It always uses profile specified when application was generated. Different team members use different AWS profiles.
How to change/configure/use profile different than profile used during application generation?
Aim is to publish changes from different computer. Final goal is to use CI server to publish application to different regions.
Amplify does not work like "other" development tools where tool is detached from git. Amplify goes hand in hand with Git and requires initialization after cloning. Running amplify init and choosing existing environment (which is pushed by other developer), it is possible to select different AWS profile.

Azure Data Factory Test Framework

Are there any automatic testing mechanism available for azure data factory pipelines? Does the azure data factory visual studio project come with any test suite of its own? Any help highly appreciated
Thanks
EDIT after comment:
You could use a github repository (gbrueckl - Azure.DataFactory.LocalEnvironment) for running custom pipelines local etc. His repository provides some tools which make it easier to work with Azure Data Factory (ADF). It mainly contains two features:
Debug Custom .Net Activities locally (within VS and without
deployment to the ADF Service!)
Export existing ADF Visual Studio projects a Azure Resource Manager
(ARM) template for deployment
In addition, the repository also contains various samples to
demonstrate how to work with the ADF Local Environment.
https://github.com/gbrueckl/Azure.DataFactory.LocalEnvironment is the link to the repository for doing that. You can use it to debug your pipelines on your local environment at least and it could also help to test them...
Not that I'm aware of, but happy to be told otherwise.
I suggest you post this on Microsoft's user voice page as a feedback idea. Then people searching will come here, go to that link and vote to get something developed.
https://feedback.azure.com/forums/270578-data-factory/filters/my_feedback?query=Unit%20Testing%20for%20ADF%20Projects
Hope this helps.
The only related project / sample code I'm aware of is the ability to step into and debug "DotNetActivty" outside of Azure Batch, by using your pipeline variables.
This could effectively be used as a test runner of some description.
If you have your data factory setup to be automatically deployed, you could deploy to an alternative QA environment.
Then you could probably (I haven't dug into the SDK in that area enough to know for sure) then use the SDK to run the slice and check if it ran successfully. It would be more of an integration test / end to end smoke test at this point.
ADFCustomActivityRunner on Github

Running a post deploy ps script or executable

I am in the process of converting our legacy custom database deployment process with custom built tools into a full fledged SSDT project. So far everything has gone very well. I have composite projects that can deploy a base database as well as projects that deploy sample and test data.
The problem I am having now is finding a solution for running some sort of code that can call a web service to get an activation code and add it to the database as the final step of the process. Can anyone point me to a hook that I might be able to use?
UPDATE: To be clearer I am doing this to make it easier to maintain and deploy our sample and test data to a local machine. We can easily use Jenkins to activate the sites when they are deployed nightly to our official testing environments. I'm just hoping to be able to do this in a single step to replace the homegrown database deploy tool that we use now.
In my deployment scenario I wrapped database deployment process in some powershell scripts which do necessary prerequisites. For example:
powershell script is started and then it stops some services
next it run sqlpackage.exe or preproduced sql deployment scripts
finally powershell script starts services.
You can pass some parameters from powershell to sql scripts or sqlpackage.exe as sqlcmd variables. So you can call webservice first, then pass activation code as sqlcmd variable and use the variable in postdeployment script.
Particularly if it's the final step, I'd be tempted to do this separately, using whatever tool you're using to do the deployment: Powershell scripts, msbuild, TFS, Jenkins, whatever. Presumably there's also a front-end of some description that gets provisioned this way?
SSDT isn't an eierlegende Wollmilchsau, it's a set of tools for managing database changes.
I suspect if the final step were "provision a Google App Engine Instance and deploy a Python script", for example, it wouldn't appear to be a natural candidate for inclusion in an SSDT post-deploy script, and I reckon this falls into the same category.

Using Jenkins to Deploy to Production Server

I have 3 stages (dev / staging / production). I've successfully set up publishing for each, so that the code will be deployed, using msbuild, to the correct location, with the correct web configs transformed - all within Jenkins.
The problem I'm having is that I don't know to deploy the code to staging from what was built on dev (and staging to production). I'm currently using SVN as the source control, so I think I would need to somehow save the latest revision number dev has built and somehow tell Jenkins to build/deploy staging based on that number?
Is there a way to do this, or a better alternative?
Any help would be appreciated.
Edit: Decided to use the save the revision number method, which parses a file containing the revision number to the next job -- to do this, I followed this answer:
How to promote a specific build number from another job in Jenkins?
It explains how to copy an artifact from one job to another using the promotion plugin. For the artifact itself, I added a "Execute Windows batch command" build step after the main build with:
echo DEV_ENVIRONMENT_CORE_REVISION:%SVN_REVISION%>env.properties
Then in the staging job, using that above guide, copied that file, and then using a plugin EnvInject, to read from that file and set an environment variable, which can then be used as a parameter to the SVN Repository URL.
You should be able to identify the changeset number that was built in DEV and manually pass that changeset to the Jenkins build to pull that same changeset from SVN. Obviously that makes your deployment more manual. Maybe you can setup Jenkins to publish the changeset number to a file and then have the later env build to read that file for the changeset number.
We used to use this model as well and it was always complex. Eventually we moved to a build once and deploy many times model using WebDeploy. This has made the process much more simple. Check it out - http://www.dotnetcatch.com/2016/04/16/msbuild-once-msdeploy-many-times/

MSBuild: automate collecting of db migration scripts?

Summary of environment.
Asp.net web application (source stored in svn)
SQL Server database. (Database schema (tables/sprocs) stored in svn)
db version is synced with web application assembly version. (stored in table 'CurrentVersion')
CI hudson server that checks out web app from repo and runs custom msbuild file to publish/package app.
My msbuild script updates the assembly version of the web app (Major.Minor.Revision.Build) on each build. The 'Revision' is set to the currently checked out svn revision and the 'Build' to the hudson build number (incremented on each automated build).
This way i can match the app to a specific trunk revision also get other build stats from the hudson build number.
I'd like to automate the collecting of migration scripts (updated sprocs etc) to add to the zip package.
I guess by comparing the svn revision of the db that has yet to be deployed to, to the revision being deployed, i can find what db files have changed in the trunk since the last deployment to that database/environment.
This could easily be achieved by manually calling the svn diff -r REVNO:REVNO command to list changed .sql files. These files could then manually have to be added to the package.
It would be great if this could be automated.
Firstly i'd imagine I'll have to write a custom task to check the version of the db that has yet to be deployed to. After that I'm quite unsure.
Does anyone have any suggestion on how this would be achieved through an msbuild task either existing or custom?
Finally I'll have to autogen a script to add to the package that updates the database version table so as to be in sync with the application.
Integrating SQL changes into an automated build/deploy process is HARD. I know, because I've tried to to it a couple times with limited success. What you're trying to do is roughly on the right track, but I would argue that it's actually a bit too complicated. In your proposal, you suggest collecting the specific SQL scripts that need to be applied to your DB at build/package time. Instead, you should package all your delta scripts (for the entire history of your database) with your project, and calculate the deltas that actually need to be applied when you deploy -- that way, your deployable package can be deployed to environments with databases of differing versions. There are two implementation pieces you need to achieve this:
1) You need to package your deltas into your deployable package. Note that you should package deltas -- not static files that create the schema in its current state. These delta scripts should be in source control. It's okay to keep the static schema in source control as well, but you will have to keep it in sync with the deltas. You can actually use a tool like Red Gate's SQLCompare or the VS Database version to generate (most) deltas from the static schema. To get the deltas into your deployable package, and given that you're using svn -- you may want to look into svn:externals as a way to "soft link" the delta scripts into your web project. Your build script can then simply copy them into your deployable package.
2) You need a system that can read the list of delta files, compare them to an existing database, determine which deltas need to be applied to that database, and then apply the deltas (and update the bookkeeping information, like the database version). There is an open-source project (sponsored by ThoughtWorks) called dbdeploy that accomplishes this. I've had some success with that tool personally.
Good luck -- this is a tough nut to crack (correctly).
Have a look at SQL database projects. In VS 2010 they have been enhanced quite a bit and have built in deployment capabilities that can sync your DEV database to other environments.
Here are a few good links about DB projects in vs 2010:
http://msmvps.com/blogs/deborahk/archive/2010/05/02/vs-2010-database-project-building-and-deployment.aspx
http://weblogs.asp.net/gunnarpeipman/archive/2009/07/29/visual-studio-2010-database-projects.aspx
Try SQL Examiner:
http://www.sqlaccessories.com/Howto/Version_Control.aspx
You can automate script collecting with SQL Examiner command-line tool.
The solutions available today that target a .NET/SQL Server stack are:
DBUp (open source)
ReadyRoll (deeper Visual Studio integration,
auto-generation of scripts)
The latter product is one that we're actively developing here at Redgate.

Resources