I have a Hudson job that just does a check-out/update to a third-party library. Call this Job A.
Several other jobs depend on this library. Call them Jobs B and C. They use the stuff checked out by Job A, and need it to be up-to-date.
My question is, how can I require Jobs B and C to always run Job A (to update the library) before they run through their build routine?
If this is not possible, can someone recommend another way to achieve the same effect?
You can do it the other way around with "child" jobs. For example, you can configure A to trigger B and C after it has succeeded. (You will find the option on job A's configuration page.)
If you need more advanced conditions for triggering the child jobs, you can take a look at the Parameterized Trigger plugin.
After thinking about the problem some more, I think I may have been over-complicating things.
Since the library in Job A is rarely updated, we decided it's probably acceptable to just scan SVN on an interval and update when there are changes. There's a small possibility that builds of B and C will miss library changes if they start right after the changes to A were checked in, but that should rarely be an issue.
If I follow you, it sounds like you might need the Join plugin:
This plugin allows a job to be run after all the immediate downstream jobs have completed. In this way, the execution can branch out and perform many steps in parallel, and then run a final aggregation step just once after all the parallel work is finished. The plugin is useful for creating a 'diamond' shape project dependency. This means there is a single parent job that starts several downstream jobs. Once those jobs are finished, a single aggregation job runs.
At the moment we schedule our Databricks notebooks using Airflow. Due to dependencies between projects, there are dependencies between DAGs. Some DAGs wait until a task in a previous DAG is finished before starting (by using sensors).
We are now looking to use Databricks DBX. It is still new for us, but it seems that DBX's main added value is when you use Databricks workflows. It would be possible to run a Python wheel in a job that was created by DBX. My question is now: is it possible to add dependencies between Databricks jobs? Can we create 2 different jobs using DBX, and make the second job wait until the first one is completed?
I am aware that I can have dependencies between tasks in one job, but in our case it is not possible to have only one job with all the tasks.
I was thinking about adding a notebook/Python script before the wheel with the ETL logic. This notebook would then check whether the previous job is finished. Once this is the case, the task with the wheel would be executed. Does this make sense, or are there better ways? Is something like the ExternalTaskSensor in Airflow available within Databricks workflows?
Or is there a good way to use DBX without DB workflows?
Author of dbx here.
TL;DR - dbx is not opinionated in terms of the orchestrator choice.
It is still new for us, but it seems that DBX's main added value is when you use Databricks workflows. It would be possible to run a Python wheel in a job that was created by DBX.
The short answer is yes, but it's done on the task level (read more here on the difference between workflow and task).
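To make the "task level" part concrete, here is a hedged illustration using the Jobs 2.1 REST API directly; dbx generates something equivalent from its deployment file, and the job name, cluster id and package names below are made up:

curl -X POST "$DATABRICKS_HOST/api/2.1/jobs/create" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "etl-then-report",
    "tasks": [
      {"task_key": "etl",
       "existing_cluster_id": "1234-567890-abcde123",
       "python_wheel_task": {"package_name": "my_etl", "entry_point": "run"}},
      {"task_key": "report",
       "depends_on": [{"task_key": "etl"}],
       "existing_cluster_id": "1234-567890-abcde123",
       "python_wheel_task": {"package_name": "my_report", "entry_point": "run"}}
    ]
  }'

The second task only starts once the first has completed, which is the closest built-in equivalent to a dependency between two separate jobs.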
Another approach would be the following: if you still need (or want) to use Airflow, you can do it this way:
Deploy and update your jobs from your CI/CD pipeline with dbx deploy commands.
In Airflow, use the Databricks Operator to launch the job (either by name or by id).
I am developing processes for collecting, cleaning and storing various data sets. The development is done with RStudio projects. I won't say I'm following every tidyverse/RStudio workflow recommendation but in general I'm using that framework-- relevant now is that I'm using standard subdirectories and the here package for referencing them.
Every project has a MAIN.R script that ultimately sources the functions from the other scripts-- one only needs to run MAIN.R to execute the process. I did this not only for simplicity but also because the long-term intent is to have this be a scheduled process.
For now at least my method for scheduling R Scripts is with Windows Task Scheduler. Getting an R Script scheduled and running is not a problem. The issue is the contextual assumptions of developing within a project: source(here("CODE", "some-file.R")) fails when I run MAIN.R outside of the scope of the project.
One obvious solution would be to hard-code the project location as one of the parameters. I would need to have two different MAIN.R files, one for development that uses the project and one that uses that parameter for scheduling. I don't hate that idea, but I don't love it either, as it somewhat nullifies the whole point of the project/here approach. Is there a more elegant solution that someone else has created that I couldn't find on Google, or better workaround ideas?
I ended up using the solution described here: https://community.rstudio.com/t/how-to-play-nice-with-taskscheduler-r-studio-projects-and-here/24406/2.
I didn't have to make any changes to the MAIN.R script. Instead, I scheduled it directly but added the project directory to the "Starts In" argument of the Windows Task Scheduler task.
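For anyone scheduling outside Windows (e.g. with cron), the equivalent trick is simply to change into the project root before invoking Rscript, since here() resolves the project root by searching upward from the working directory. A rough sketch, with a made-up path:

cd /path/to/my-project && Rscript MAIN.R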
In the project I'm working for we're having a continuous deployment setup. The goal is to always install the latest working build to production, unless someone manually overrides this functionality.
In order to make this work, we:
Run static code analysis
Run unit tests
Run integration tests
Run automatic UI tests, to the extent this is feasible
If any of the above steps fails, the build process is halted and the build is marked as failed. If the installation package is created, it is then installed, step by step, to
CI --> staging --> production
At each step we run integration and UI tests for the environment, to make sure we didn't introduce something new that fails on the subsequent environments. If none of the tests fail, and N minutes pass without anyone pressing the panic button, the build gets promoted to the next env. If the tests fail, we want to delete the package and discard it completely. The installation packages are, however, delivered to other servers, so we need to run a bunch of remote (shell) scripts to make this step happen.
The problem is that there is a big set of failure cases which we cannot reliably test in the normal automated cycle, e.g. page layout, or integrations that fail only in production, and so on.
So the actual question: How shall I demote/delete builds, once they've been promoted? Is it possible to either run a remote script when doing delete build or use any of the promotion plugins to achieve this functionality? Is there some think-outside-the-box solution for this that I might not have thought about?
Instead of deleting builds manually, you may write a Jenkins job that accepts the build number as a parameter, deletes that build, and then does the rest of the housekeeping. You can configure Jenkins access privileges so that people do not delete builds manually by accident.
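A minimal sketch of that job's shell step, assuming it takes TARGET_JOB and BUILD_NUMBER as parameters and that an API token is available; Jenkins exposes build deletion through the doDelete endpoint:

curl -X POST -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
  "$JENKINS_URL/job/$TARGET_JOB/$BUILD_NUMBER/doDelete"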
This might be a very particular case, but we decided against creating a separate job for removing the builds, for the very simple reason of keeping all the logging related to a specific build number in one single place. The setup was the following:
Promotion here means make the installation package (RPM) available to the given server, where auto-update handles the actual upgrade of the package.
We have one main build that runs every time a new commit is available. We did some fine-tuning related to quiet times etc., but basically every pushed set of commits results in a new build. The build contains all the relevant and available testing, which is far from complete, and probably never will be.
Every hour a separate promotion step handles promotion from staging to production. This build then kicks off another promotion, which takes the latest accepted build from CI to staging. There is a 30-minute delay before builds are promoted CI --> staging, to prevent accidental promotions of last-second commits. The delays are implemented with some bash find scripting. The promotions run in this order to make sure a build has been available in staging for (at least) one hour before going to production.
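The quiet-period part needs nothing more exotic than find's -mmin test; a hypothetical sketch (the repository path is made up):

# consider only packages that have sat untouched for at least 30 minutes
find /srv/repo/ci-accepted -name '*.rpm' -mmin +30 -print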
The actual answer:
The promotion steps were done as separate builds. In order to do a real promotion, rather than just a separate build with a separate log, each such build kicks off an actual promotion on the main build, using curl to call the remote HTTP API. This leaves the relevant promotion star in the main build's log. Using different colors, the promotions are visible at a glance.
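For reference, a rough sketch of such a call, assuming the promoted-builds plugin and its forcePromotion endpoint (verify the exact URL against your plugin version; the job and promotion names are placeholders):

curl -X POST -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
  "$JENKINS_URL/job/main-build/$BUILD_NUMBER/promotion/forcePromotion?name=promote-to-staging"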
To demote builds I decided to create a separate "demote build" promotion step. This would then issue a purple star as a sign of the build being defective, and thus removed. The demotion is done by accessing the correct build in the UI, and pressing the "Remove build" button. No automation has been added to this step, but by creating a separate test step, it would be fairly easy to automate the demotion as well. We, however, have not gotten quite this far yet.
The benefits of this approach include
A build is deleted by accessing the failed build, not by providing parameters, which makes it much easier to document and to get right under pressure
Having a "panic button" like this available for anyone to press builds trust and ownership for the process, not only amongst the developers but also among managers and DevOps.
It's dead simple to spot dead builds, as the log is available alongside the other promotion logs
Having all the relevant promotion calls in the same build makes further scripting easier
The most acute things we still have to improve include automating the testing in the later stages of the build pipeline, and finding a suitable way of downgrading builds after demotion. E.g. in production, a defective build and a demotion must always lead to installing the last good build, which has turned out to be fairly hard to achieve. Production data centers are rarely accessible at this level from the development DC where the CI system sits. Stopping and starting the build pipeline must also be automated, as otherwise there is a chance of slipping back into the manual state.
Naturally, in the spirit of continuous improvement, there are always things to improve. The whole setup is something of a bash/perl scripting mess, but since it's scripted and repeatable, there is always the option of improving one small piece at a time. The most important thing is the automation, as it allows for incremental steps, which any manual steps more or less prevent.
For anyone looking for an easy way to delete a build with custom steps:
Create a 'defective' promotion.
Make it manually triggered.
Force it to run on the master.
Add a choice parameter DELETE with choices NO and YES.
Add action Execute Shell.
if [ "${DELETE}" == "YES" ]; then
  # TODO: my custom steps
  curl -X POST "${PROMOTED_URL}/doDelete"
fi
To delete a build now, just go to promotions, flip the choice to YES and click approve.
There is another thread here on StackOverflow, dealing with how often to commit changes to source control. I want to put that in the context of using a DVCS like git or mercurial.
How often and when do you commit?
Do you only commit changes when they build correctly?
How often and when do you push your changes (or file a pull request or similar)?
How do you approach developing a complex feature / doing a complex refactoring requiring many places to be touched? Are "private commits" that won't build ok? When finished, do you push them also to the master repository or do you bundle all your changes into a single changeset before pushing?
It depends on the nature of the branch ("line of development") you are working on.
The main advantage of these DVCSs (git or mercurial) is the ease with which you can:
branch
merge
So:
1/ How often and when do you commit?
2/ Do you only commit changes when they build correctly?
As many times as necessary on a private branch (for instance, whenever it compiles).
The practice of only committing if unit tests pass is a good one, but it should only apply to an "official" (as in "could be published or 'pushed'") branch: on your private branch, you can commit a gazillion times if you need to.
The only thing is: do a rebase --interactive to reorganize the many commits on your private branch before replaying them on your main development branch, where you can run some tests.
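A small sketch of that flow, assuming the official branch is called master (adjust names to taste):

git checkout my-private-branch
git rebase --interactive master   # squash and reorder the work-in-progress commits
git checkout master
git merge --ff-only my-private-branch   # fast-forward the official branch to the cleaned-up history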
3/ How often and when do you push your changes (or file a pull request or similar)?
Publication is another matter and should be done with a "clear" history (coherent merges, representing content which compiles and passes some tests).
The branch you publish should be one where the history is never rewritten, only updated.
The pace of the publications depends on the nature of the remote branch and of the population pulling that branch. For instance, if it is for another team, you could push quite often. If it is for a system-wide integration testing team, you will push a lot less often.
4/ How do you approach developing a complex feature / doing a complex refactoring requiring many places to be touched? Are "private commits" that won't build ok? When finished, do you push them also to the master repository or do you bundle all your changes into a single changeset before pushing?
See 1. and 2.: patch first in your own private branch, then reorganize your commits on an official (published) patch branch. One single commit is not always the best option if the patch involves several different "activities" (or bug fixes).
I'd commit changes to my DVCS (my own topic or task branch) very, very often; this way I can use it not only for "delivering changes" but also to help me while I work, as in "why was this working 5 minutes ago and not anymore?" If you commit often, you can just run a diff.
Also, a technique I found very, very good is using it to "self-document refactors". Let me explain: if you have to do a big refactor on a topic branch and then review the change as a whole (having modified a nice set of files), you'd probably get lost. But suppose you check in at every "intermediate step" and document it with a comment: then you're creating some sort of "movie" of your own changes that helps describe what you've done! Huge for reviewers.
I commit a lot: when adding functions, or even when just reformatting my sources.
I use git and do most of my work on non-shared branches. And when I've added enough little changes that count as a block, I use git rebase to collect the smaller related changes into larger chunks and commit that to the main branches.
This way, I have all the advantages of committed code that I can move backwards and forwards in, but I don't have to commit all my mistakes and backtracks to the main history.
How often and when do you commit?
Very frequently. It could be as often as a few times in an hour, if the changes I've made work and make up a nice patch. Or it could be every few hours, depending on whether I am spending longer debugging things or experimenting with risky changes.
Do you only commit changes when they build correctly?
Yes, almost always. I can't think of a good reason to check in code that doesn't build correctly. There are plenty of reasons you might check in code that doesn't run correctly (in a branch), though.
How often and when do you push your changes (or file a pull request or similar)?
Normally only when a feature is complete and ready for integration testing. That means it has passed unit tests and other relevant tests, and the code/patch is clean enough that I consider it ready for review.
How do you approach developing a complex feature / doing a complex refactoring requiring many places to be touched? Are "private commits" that won't build ok? When finished, do you push them also to the master repository or do you bundle all your changes into a single changeset before pushing?
I would create a named branch for the feature (which would have traceability across design docs and Issue Tracking system). Commits that don't build would only really be ok on a private branch as an intermediate step, but would still be exceptional. Currently I don't rebase and merge the entire feature branch into a single changeset, though it is something I'm looking at doing in future. Changes are only pushed when appropriate tests are all passed.
I follow this kind of flow
(The original answer included a flowchart of the suggested commit/update workflow, hosted at http://img121.imageshack.us/img121/3272/versioncontrolsysbestpr.png.)
I guess this says pretty much everything. Basically I would do a check-in after implementing a full working use case / user story (depends on your software process). The most important thing is that you check in things that work, in the sense that they compile. Never break the build!
Doing a commit after each user story/use case has the advantage that you have better tracking of past versions, and undoing changes is much easier.
I am in the phase of implementing Hudson for build automation. I am using some shell scripts to perform one of the build steps. Cancelling a build (in the middle of the build process) leaves the build in an illogical state. Is it possible to restrict users from using the cancel build operation?
You would be better off having the build make a clean workspace on each build. Having prior exit state be a factor in the stability of the build is likely to lead to more problems.
Even if you have the Use custom workspace option enabled and have a special set up for the build, consider having the first step of the build be a script that cleans up any prior artifacts and build steps that might have been created by failed builds.
Cleaning up the workspace is not a problem, but I use some scripts to check out and change source code in ClearCase prior to packaging. And if somebody aborts the build in between, those files could remain checked out and in an invalid state. Hence I need users not to use the abort build operation in the middle of the process. Thanks.
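Building on the clean-up-script idea above, a hypothetical first build step could undo any ClearCase checkouts left behind by an aborted run (assuming cleartool is on the PATH and the job's view is already set):

# list everything still checked out in this view and undo it, discarding changes
cleartool lscheckout -recurse -cview -short | while read -r f; do
  cleartool uncheckout -rm "$f"
done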