We are running a local installation of Artifactory Pro which contains around 1M artifacts. Recently, we tried to migrate from the embedded Derby DB to Postgres and switched back to Derby because of errors occurring during the migration.
After that, users reported missing files, mostly maven-metadata.xml but also at least one pom.xml. The files are missing on the filesystem.
The only way I can think of is to query the Artifactory API for a list of all files and then attempt to download each one. Is there a better way to check whether all artifacts in Artifactory actually exist on the filesystem?
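For reference, this is roughly the brute-force approach I had in mind (a rough sketch against Artifactory's File List and download REST endpoints; the host, repository key and credentials are placeholders):

import requests

BASE_URL = "https://artifactory.example.com/artifactory"  # placeholder host
REPO = "libs-release-local"                               # placeholder repo key
AUTH = ("admin", "password")                              # use an API key in practice

def list_files(repo):
    # File List API: returns the URIs of all files below the repository root.
    r = requests.get(f"{BASE_URL}/api/storage/{repo}",
                     params={"list": "", "deep": "1", "listFolders": "0"},
                     auth=AUTH)
    r.raise_for_status()
    return [f["uri"] for f in r.json().get("files", [])]

def check_repo(repo):
    for uri in list_files(repo):
        # A HEAD request avoids transferring the artifact itself; a missing
        # binary should surface as a non-200 status.
        resp = requests.head(f"{BASE_URL}/{repo}{uri}", auth=AUTH)
        if resp.status_code != 200:
            print(f"MISSING ({resp.status_code}): {repo}{uri}")

check_repo(REPO)

With around 1M artifacts this is obviously slow, which is why I'm asking whether there's a better way.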
Welcome, Thomas!
Although errors like these don't happen in normal operation, migrating a large number of artifacts back and forth can sometimes lead to such problems.
We have a user plugin that finds them, so check it out; it looks like exactly what you need.
We're using a remote repository and caching artifacts locally. However, we are running into a problem because the remote repository regularly rebuilds all artifacts that it hosts. In our current state, we update metadata (e.g. repodata/repomd.xml), but artifacts are not updated.
We have to continually clear out our local remote-repository cache in order to allow it to download the rebuilt artifacts.
Is there any way we can configure Artifactory to re-cache new artifacts as well as new artifact metadata?
The error we regularly run into is:
https://artifactory/artifactory/remote-repo/some/path/package.rpm:
[Errno -1] Package does not match intended download.
Suggestion: run yum --enablerepo=artifactory-newrelic_infra-agent clean metadata
Unfortunately, there is no good answer to that. Artifacts under a version should be immutable; it's dependency management 101.
I'd put as much effort as possible into convincing the team producing the artifacts to stop overwriting versions. It's true that it might sometimes be cumbersome to change versions of dependencies in metadata, but there are ways around it (like resolving the latest patch during development, as supported in the semver spec), and in any case, that's not a good excuse.
If that's not possible, I'd look into enabling direct repository-to-client streaming (i.e. disabling artifact caching) to prevent the problem of stale artifacts.
Another solution might be cleaning up the cache using a user plugin or a script using JFrog CLI once you learn about newer artifacts being published in the remote repository.
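If you go the scripted route, eviction is just an HTTP DELETE against the remote repository's cache (a rough sketch; the host, repo, path and credentials are placeholders, and the JFrog CLI equivalent would be deleting the same <repo>-cache path):

import requests

BASE_URL = "https://artifactory.example.com/artifactory"  # placeholder host
AUTH = ("admin", "password")                              # placeholder credentials

def evict_cached(remote_repo, path):
    # Cached copies of a remote repository live in the implicit
    # "<repo>-cache" repository; deleting there only drops the local copy,
    # the remote repository itself is untouched.
    url = f"{BASE_URL}/{remote_repo}-cache/{path}"
    requests.delete(url, auth=AUTH).raise_for_status()
    print(f"evicted {url}")

# e.g. force a re-fetch of a rebuilt RPM:
evict_cached("remote-repo", "some/path/package.rpm")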
I have several services deployed in my Karaf container. Most of them have multiple repository URLs. I want to remove the old URLs, but it seems that this isn't possible if the old artifacts are no longer in my Nexus repository.
Example: I'm using service XYZ in version 25.0, but I still have the repository URLs for versions 22.0, 23.0 and 24.0. These versions have already been deleted from my Nexus repository, but Karaf won't uninstall/remove them because it can't find them. But why is Karaf looking at all? I just want to remove the old stuff. It seems like I have to reset the container, because in its current state it's not possible to add or remove any features; Karaf always complains about not finding the old artifacts.
Is there any way to delete these entries manually, e.g. some file where Karaf keeps track of the old repository URLs?
In a Maven repository it is not a good idea to remove a version or an artifact. Anything published to a Maven repository is expected to stay there. Maven creates copies of artifacts in the local repository, and for releases it will never check Nexus again once a local copy is found.
So Karaf simply does not expect that a released artifact ever goes away.
Why are you deleting these old versions anyway?
I'm accustomed to writing desktop applications in C#.NET, where I have a nice little solution folder which is also under version control. So at any time, on any computer, I can check out whatever version of my software I want, run the compiler, and have a working copy of my program.
Now I'm looking into developing websites, where the files and data are a lot more dispersed. I'm using ASP.NET, but really my question is more general and could apply to any website framework.
I'm trying to understand the proper work-flow between developing my website, a version control server, and the actual live website that users will see. Obviously this can vary a lot depending on the type and scale of the website, but I'm only considering a pretty simple site. I'm just getting started with this stuff.
The diagram below shows my current idea. All the source files for the site would be stored on a subversion server, which I would check out onto my local computer. My local computer would have a local database which I would use for development of the site. Next I would publish to a test version on my hosted server, which would point to a separate test database. This test database may periodically be replaced by a copy of the live database.
If all goes well I would then publish to a beta version of the site which points to the live data. Users could then check out the beta version to provide feedback. Finally if there are still no problems the source files for the live site would be updated.
Does this make sense? Does anyone have any comments on how this could be improved? Are there any good books or online tutorials available on developing these kinds of workflows?
Also, the one thing that I'm really not sure about is how to manage changes to the actual schema of the database. I figure with each version I could generate a SQL script that can be used to update the Test and Live databases on the host. However, I'd also like to be able to easily set up a new database for any version of my site without having to run every update SQL script for every version up to the desired version. Is the best solution to use an ORM like NHibernate or SubSonic so I could always generate my database schema directly from my code?
I currently use a workflow very similar to this. It's not flawless, but for the most part it works well. It sounds like you pretty much have the important parts figured out.
Where I work we have a powerful web server. Our subversion repos also live on this server. For myself, as a LAMP developer, I do all of my development on my local Linux machine (or VM) with a local MySQL database, operating on a local working copy.
At all times I maintain two primary branches of the application: A Dev branch (trunk) and a Live branch (which reflects the current production version). My repo looks like this:
/repo/trunk/ [My current active development version]
/repo/archive/ [All other versions not in active development live here]
/repo/archive/2010.12/ [This happens to be the current production version]
On the web server we maintain three separate instances of the software: Live (or Production), Beta, and Dev. The following should illustrate how we use each of these versions.
Dev - Always points to our development version at /repo/trunk. Uses a non-versioned config file to point to the development database. Development is never actually done here, however; instead we work on our local machines, where our working copies point to the Trunk, and do all our development testing on our local machines. After several commits from multiple developers, though, it's important to test on the Dev server to make sure we're on the right track and no one is breaking someone else's changes.
Live - Always points to the most recent stable production version. In this case, it points to /repo/archive/2010.12/. This version is world-accessible.
Beta - Most of the time, Beta will be a mirror of what's on Live and points to the same production version in our repository (/repo/archive/2010.12). Anytime we need bug fixes on Live, we make them here, test them, and then commit and update Live. (We also merge them into Trunk, if necessary.)
When the version in Trunk is deemed complete and ready for testing, I create a new archive branch in the repository for the upcoming release (i.e. 2011.01) by svn-copying the existing production branch. Then I merge the Trunk version into the new branch and commit, so we have a version that mirrors what's currently on Dev. Of course, active development for the next release can now continue on Dev, while we test this new archived version on Beta. When beta testing is complete, we commit any fixes and switch Live to the new version (/repo/archive/2011.01).
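In case it helps, the branching steps above boil down to a handful of svn commands; here's a rough sketch driving the command-line client from Python (the repository URL and working-copy names are placeholders):

import subprocess

REPO = "http://svnserver/repo"  # placeholder repository URL

def run(*cmd):
    subprocess.run(cmd, check=True)

# 1. Branch the current production version as the upcoming release.
run("svn", "copy", f"{REPO}/archive/2010.12", f"{REPO}/archive/2011.01",
    "-m", "Create 2011.01 release branch from current production")

# 2. Merge Trunk into the new branch so it mirrors what's on Dev.
run("svn", "checkout", f"{REPO}/archive/2011.01", "wc-2011.01")
run("svn", "merge", f"{REPO}/trunk", "wc-2011.01")
run("svn", "commit", "wc-2011.01", "-m", "Merge trunk into 2011.01 for beta testing")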
Now, you've probably figured out that the database merging is a bit trickier. We use MySQL and, to my knowledge, there's no suitable versioning system for MySQL. So with each migration (VM to Dev, Dev to Beta, Beta to Live) I'll first make a backup of the current database, then make the necessary changes. Personally I use a commercial version of SQLyog, which allows me to synchronize the database schema (super handy).
The process of creating a new build and releasing it to production is a critical step in the SDLC but it is often left as an afterthought and varies greatly from one company to the next.
I'm hoping people will share improvements they have made to this process in their organisation so we can all take steps to 'reduce the pain'.
So the question is: name one painful/time-consuming part of your release process, and what did you do to improve it?
My example: at a previous employer all developers made database changes on one common development database. Then when it came to release time, we used Redgate's SQL Compare to generate a huge script from the differences between the Dev and QA databases.
This works reasonably well but the problems with this approach are:-
ALL changes in the Dev database are included, some of which may still be 'works in progress'.
Sometimes developers made conflicting changes (that were not noticed until the release was in production)
It was a time-consuming and manual process to create and validate the script (by validate I mean, try to weed out issues like problems 1 and 2).
When there were problems with the script (e.g. ordering issues, such as inserting a record that relies on a foreign-key record which is in the script but not yet run), it took time to 'tweak' it so it ran smoothly.
It's not an ideal scenario for Continuous Integration.
So the solution was:-
Enforce a policy that all changes to the database must be scripted.
A naming convention was important for ensuring the correct running order of the scripts.
Create/use a tool to run the scripts at release time (a rough sketch follows after this list).
Developers had their own copy of the database to develop against (so there was no more 'stepping on each other's toes').
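To give an idea of points 2 and 3, the runner can be as simple as this (a sketch, not our actual tool; it assumes scripts named with a zero-padded numeric prefix like 001_create_users.sql, and the pyodbc connection string is a placeholder):

import glob
import pyodbc

def run_release_scripts(script_dir, conn_str):
    conn = pyodbc.connect(conn_str, autocommit=True)
    cursor = conn.cursor()
    # Lexicographic sort of the zero-padded prefixes gives the running order.
    for path in sorted(glob.glob(f"{script_dir}/*.sql")):
        print(f"running {path}")
        with open(path) as f:
            cursor.execute(f.read())
    conn.close()

run_release_scripts("release/1.4", "DSN=qa_db;UID=deploy;PWD=secret")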
The next release after we started this process was much faster with fewer problems; indeed, the only problems found were due to people 'breaking the rules', e.g. not creating a script.
Once the issues with releasing to QA were fixed, when it came time to release to production it was very smooth.
We applied a few other changes (like introducing CI), but this was the most significant; overall we reduced release time from around 3 hours down to a maximum of 10-15 minutes.
We've done a few things over the past year or so to improve our build process.
Fully automated and complete build. We've always had a nightly "build", but we found that there are different definitions of what constitutes a build. Some would consider it compiling; usually people include unit tests, and sometimes other things. We clarified internally that our automated build literally does everything required to go from source control to what we deliver to the customer. The more we automated the various parts, the better the process got and the less we had to do manually when it was time to release (and the fewer worries about forgetting something). For example, our build stamps everything with the svn revision number, compiles the various application parts written in a few different languages, runs unit tests, copies the compile outputs to the appropriate directories for creating our installer, creates the actual installer, copies the installer to our test network, runs the installer on the test machines, and verifies the new version was properly installed.
Delay between code complete and release. Over time we've gradually increased the amount of delay between when we finish coding for a particular release and when that release gets to customers. This provides more dedicated time for testers to test a product that isn't changing much and produces more stable production releases. Source control branch/merge is very important here so the dev team can work on the next version while testers are still working on the last release.
Branch owner. Once we've branched our code to create a release branch and then continued working on trunk for the following release, we assign a single rotating release branch owner who is responsible for verifying all fixes applied to the branch. Every single check-in, regardless of size, must be reviewed by two devs.
We were already using TeamCity (an excellent continuous integration tool) to do our builds, which included unit tests. There were three big improvements worth mentioning:
1) Install kit and one-click UAT deployments
We packaged our app as an install kit using NSIS (not an MSI, which was much more complicated and unnecessary for our needs). This install kit did everything necessary, like stop IIS, copy the files, put configuration files in the right places, restart IIS, etc. We then created a TeamCity build configuration which ran that install kit remotely on the test server using psexec (sketched below).
This allowed our testers to do UAT deployments themselves, as long as they didn't contain database changes - but those were much rarer than code changes.
Production deployments were, of course, more involved and we couldn't automate them this much, but we still used the same install kit, which helped to ensure consistency between UAT and production. If anything was missing or not copied to the right place it was usually picked up in UAT.
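For the curious, the remote-install step of that TeamCity build configuration amounted to little more than this (a sketch; the server name, account and paths are placeholders):

import subprocess

# Run the NSIS install kit silently on the test server via Sysinternals psexec.
subprocess.run([
    "psexec", r"\\uat-server",   # placeholder test server
    "-u", r"DOMAIN\deploy",      # account with admin rights on the test box
    "-p", "secret",
    "-c", "-f",                  # copy the installer across, overwriting old copies
    r"output\setup.exe",         # the install kit produced by the build
    "/S",                        # NSIS silent-install switch
], check=True)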
2) Automating database deployments
Deploying database changes was a big problem as well. We were already scripting all DB changes, but there were still problems in knowing which scripts were already run and which still needed to be run and in what order. We looked at several tools for this, but ended up rolling our own.
DB scripts were organised in a directory structure by release number. In addition to the scripts, developers were required to add the filename of each script to a text file, one filename per line, which specified the correct order. We wrote a command-line tool which processed this file and executed the scripts against a given DB. It also recorded which scripts it had run (and when) in a special table in the DB, and next time it did not run those again. This meant that a developer could simply add a DB script, add its name to the text file and run the tool against the UAT DB without running around asking others what scripts they last ran. We used the same tool in production, but of course it was only run once per release.
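Reconstructed from the description above (a sketch, not the original code; sqlite3 stands in for the real database driver), the core of the tool looked something like this:

import sqlite3  # stand-in for the real DB driver
from datetime import datetime

def deploy(manifest_path, script_dir, conn):
    cur = conn.cursor()
    # The tracking table records which scripts have already been applied.
    cur.execute("""CREATE TABLE IF NOT EXISTS applied_scripts (
                       name TEXT PRIMARY KEY,
                       run_at TEXT NOT NULL)""")
    already_run = {row[0] for row in cur.execute("SELECT name FROM applied_scripts")}
    with open(manifest_path) as manifest:
        for line in manifest:
            name = line.strip()
            if not name or name in already_run:
                continue  # skip blanks and previously applied scripts
            with open(f"{script_dir}/{name}") as f:
                cur.executescript(f.read())
            cur.execute("INSERT INTO applied_scripts VALUES (?, ?)",
                        (name, datetime.now().isoformat()))
            conn.commit()
            print(f"applied {name}")

deploy("release-2.3/scripts.txt", "release-2.3", sqlite3.connect("uat.db"))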
The extra step that really made this work well is running the DB deployment as part of the build. Our unit tests ran against a real DB (a very small one, with minimal data). The build script would restore a backup of the DB from the previous release and then run all the scripts for the current release and take a new backup. (In practice it was a little more complicated, because we also had patch releases and the backup was only done for full releases, but the tool was smart enough to handle that.) This ensured that the DB scripts were tested together at every build and if developers made conflicting schema changes it would be picked up quickly.
The only manual steps were at release time: we incremented the release number on the build server and copied the "current DB" backup to make it the "last release" backup. Apart from that we no longer had to worry about the DB used by the build. The UAT database still occasionally had to be restored from backup (e.g. since the system couldn't undo the changes for a deleted DB script), but that was fairly rare.
3) Branching for a release
It sounds basic and almost not worth mentioning, yet we weren't doing this to begin with. Merging back changes can certainly be a pain, but not as much of a pain as having a single codebase for today's release and next month's! We also got the person who made the most changes on the release branches to do the merge, which served to remind everyone to keep their release branch commits to an absolute minimum.
Automate your release process wherever possible.
As others have hinted, use different levels of build "depth". For instance, a developer build could make all binaries for running your product on the dev machine, directly from the repository, while an installer build could assemble everything for installation on a new machine.
This could include
binaries,
JAR/WAR archives,
default configuration files,
database scheme installation scripts,
database migration scripts,
OS configuration scripts,
man/hlp pages,
HTML documentation,
PDF documentation
and so on. The installer build can stuff all this into an installable package (InstallShield, ZIP, RPM or whatever) and even build the CD ISOs for physical distribution.
The output of the installer build is what is typically handed over to the test department. Whatever is not included in the installation package (patch on top of the installation...) is a bug. Challenge your devs to deliver a fault-free installation procedure.
Automated single-step build. The ant build script edits all the installer configuration files and the program files that need to be changed (versioning) and then builds. No intervention required.
There is still a script to run to generate the installers when it's done, but we will eliminate that.
The CD artwork is versioned manually; that needs fixing too.
Agree with previous comments.
Here is what has evolved where I work. This current process has eliminated the 'gotchas' that you've described in your question.
We use ant to pull code from svn (by tag version) and pull in dependencies and build the project (and at times, also to deploy).
The same ant script (passing different params) is used for each env (dev, integration, test, prod).
Project process
Capturing requirements as user 'stories' (helps avoid quibbling over the interpretation of a requirement when it is phrased as a meaningful user interaction with the product)
Following Agile principles so that each iteration of the project (2 wks) results in a demo of current functionality and a releasable, if limited, product
Managing release stories throughout the project to understand what is in and out of scope (and prevent confusion about last-minute fixes)
(repeat of previous response) Code freeze, then only test (no added features)
Dev process
unit tests
code checkins
scheduled automated builds (cruise control, for example)
complete a build/deploy to an integration environment, and run a smoke test
tag the code and communicate to team (for testing and release planning)
Test process
functional testing (selenium, for example)
executing test plans and functional scenarios
One person manages the release process and ensures everyone complies. Additionally, all releases are reviewed a week before launch and are only approved if they pass that review.
Release Process
Approve release for a specific date/time
Review release/rollback plan
run ant with 'production deployment' parameter
execute DB tasks (if any) (also, these scripts can be versioned and tagged for production)
execute other system changes / configs
communicate changes
I don't know or practice SDLC, but for me, these tools have been indispensable in achieving smooth releases:
Maven for build, with Nexus local repository manager
Hudson for continuous integration, release builds, SCM tagging and build promotion
Sonar for quality metrics.
Tracking changes to the development DB schema and managing updates to QA and release via DbMaintain and LiquiBase
On a project where I work, we were using Doctrine's (PHP ORM) migrations to upgrade and downgrade the database. We had all manner of problems, as the generated models no longer matched the database schema, causing the migrations to fail completely halfway through.
In the end we decided to write our own super-basic version of the same thing - nothing fancy, just ups and downs that execute SQL. Anyway, it worked out great (so far - touch wood). Although we were reinventing the wheel slightly by writing our own, the fact that the focus was on keeping it simple meant that we had far fewer problems. Now a release is a cinch.
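To illustrate just how basic it was, here's the same idea sketched in Python (ours was PHP, and every name here is illustrative): each migration is a pair of raw SQL strings, applied in order or rolled back in reverse.

import sqlite3  # stand-in for the real database

MIGRATIONS = [
    {"up": "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
     "down": "DROP TABLE users"},
    {"up": "ALTER TABLE users ADD COLUMN email TEXT",
     "down": "ALTER TABLE users DROP COLUMN email"},
]

def migrate(conn, current, target):
    # Apply the ups from the current level to the target level.
    cur = conn.cursor()
    for m in MIGRATIONS[current:target]:
        cur.execute(m["up"])
    conn.commit()

def rollback(conn, current, target):
    # Apply the downs in reverse, from the current level back to the target.
    cur = conn.cursor()
    for m in reversed(MIGRATIONS[target:current]):
        cur.execute(m["down"])
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn, 0, 2)    # apply both migrations
rollback(conn, 2, 1)   # undo the second one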
I guess the moral of the story here is that it is sometimes OK to reinvent the wheel, as long as you are doing so for a good reason.