Deleting artifacts older than 2 years from a local Nexus repository

We're running Nexus on some old hardware that is limited in disk space, and we would like to remove artifacts older than a certain threshold.
Is there any way to do this other than a combination of find and curl?

There is a scheduled task that can automatically remove old snapshot versions:
http://www.sonatype.com/people/2009/09/nexus-scheduled-tasks/
http://www.sonatype.com/books/nexus-book/reference/confignx-sect-managing-tasks.html
Unfortunately, this does not work for hosted release repositories.

As mentioned in a Sonatype blog post linked from a comment on the blog in gavenkoa's answer, since Nexus 2.5 there is a built-in "Remove Releases From Repository" scheduled task that can be configured to delete old releases while keeping a defined number of the most recent ones.
This is sufficient to meet our needs.

Delete all files that have not been accessed for more than 100 days and not been modified for more than 200 days:
find . -type f -atime +100 -mtime +200 -delete
To clean up empty directories:
find . -type d -empty -delete
Alternatively, look at https://github.com/akquinet/nexus_cleaner/blob/master/nexus_clean.sh and the corresponding blog entry http://blog.akquinet.de/2013/12/09/how-to-clean-your-nexus-release-repositories/ (it deletes all except the last 10 releases).
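If you stay with the find approach, a cron entry can run it on a schedule. A minimal sketch, assuming a typical Nexus 2 storage path and a nexus system user (both are assumptions; adjust for your install):

# /etc/cron.d/nexus-cleanup -- example path and user, adjust for your install
0 3 * * 0  nexus  find /opt/sonatype-work/nexus/storage -type f -atime +100 -mtime +200 -delete
30 3 * * 0 nexus  find /opt/sonatype-work/nexus/storage -type d -empty -delete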

To auto-purge Docker images from Nexus 3 that are older than 30 days (you can change this) and have never been downloaded, see:
https://gist.github.com/anjia0532/4a7fee95fd28d17f67412f48695bb6de
# Nexus 3 username and password
username = 'admin'
password = 'admin123'
# Nexus host
nexusHost = 'http://localhost:8081'
# repository to purge
repoName = 'docker'
# older than days
days = 30
# change these values and run the script

For Nexus 2, you can use my Spring Boot application https://github.com/vernetto/nexusclean. You can define rules based on date and on a minimum number of artifacts to retain, and it generates "rm -rf" commands (using the REST API is damn slow).
For Nexus 3, I would definitely use a Groovy script run as an "Execute script" admin task. One is posted here: groovy script to delete artifacts on nexus 3 (not nexus 2).

Related

How to delete GAVs in Nexus 3 through Nexus REST API

Is there a way to delete GAVs through a REST API in Nexus 3? From various
google searches it appears that this capability existed in Nexus 2, but not in
Nexus 3 yet. Is that true?
I tried the following with my current Nexus installation, which is OSS 3.2.1-01:
I was trying to delete GAV:
groupId = org.mycompany.myproject
artifactId = myartifact
version = 1.0.0
$ curl --request DELETE --user "USERNAME:PASSWORD" --write-out '%{http_code}\n' http://my-server:8081/service/local/repositories/my-repo/content/org.mycompany.myproject/myartifact/1.0.0
This gave me a 405.
I also looked at the release notes for 3.3 through 3.5 and nothing jumped out
that REST API support was added.
I also looked into
https://help.sonatype.com/display/NXRM3/REST+and+Integration+API. I downloaded
the nexus-book-examples and several of the Javadocs (nexus-core,
nexus-repository, nexus-common, nexus-script, nexus-commands, nexus-selector)
for version 3.2.1-01 and started to look through the code. It was not clear
where to start with a simple program to delete GAVs.
Am I correct that you cannot delete GAVs through the REST API in Nexus 3? Is
there a plan to support this in a future Nexus 3 release? Is there a way to do
what I want by creating a Groovy script using the code referenced by the
REST+and+Integration+API link above? Is there some sample code that would help
bootstrap me to using the above code (either 3.2.1-01 or a newer version of
Nexus)?
Thanks.
You might take a look at our Beta REST API in Nexus Repository 3. Upgrade to a version greater than 3.3, preferably to 3.5 (just so you are using the latest and greatest), and navigate to:
http://nexushostname:nexusport/swagger-ui/
Since the REST API is currently Beta we have yet to publish documentation or fanfare around it while we let people experiment with it and give us feedback.
You should see endpoints for deleting components and assets. You will likely want to use the component delete, so that it will clean up all associated assets.
Let me know your mileage!
According to the documentation, you can delete an ASSET (an individual file) or a COMPONENT (a set of files, such as jar+md5+sha1+pom.xml, representing an artifact) only if you know the assetId or the componentId:
https://help.sonatype.com/repomanager3/rest-and-integration-api/components-api
https://help.sonatype.com/repomanager3/rest-and-integration-api/assets-api
So you should issue a separate search call passing the GAV to find the componentId, then use that componentId in a second call to delete the component.
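A minimal sketch of that two-step flow with curl and jq, assuming a Nexus 3 version new enough to expose the v1 REST endpoints (older 3.x releases used a beta path instead) and reusing the repository and GAV from the tests below:

# 1. search for the component by GAV and pull out its componentId
#    (endpoint path assumes the v1 API; adjust for older Nexus 3 versions)
ID=$(curl -s -u admin:admin123 "http://localhost:8081/service/rest/v1/search?repository=deployments&group=org.apache.commons&name=commons-compress&version=1.18" | jq -r '.items[0].id')
# 2. delete the component; this removes all of its assets (jar, pom, checksums)
curl -u admin:admin123 -X DELETE -w "%{http_code}" "http://localhost:8081/service/rest/v1/components/$ID"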
However, I see here https://issues.sonatype.org/browse/NEXUS-11266 and here
https://issues.sonatype.org/browse/NEXUS-11881 that people can delete an asset just by specifying its entire path. I have tried with
curl -u admin:admin123 -X "DELETE" -w "%{http_code}" http://localhost:8081/repository/deployments/org/apache/commons/commons-compress/1.18/commons-compress-1.18.jar
and it gives me an HTTP 204 (no content). In my case "deployments" is a hosted repository. I have tried the same command on "central" (a proxy repo) and I get a 405.
But if I try to delete the whole component (including the pom, sha1, etc.) with
curl -u admin:admin123 -X "DELETE" -w "%{http_code}" http://localhost:8081/repository/deployments/org/apache/commons/commons-compress/1.18/
I get a HTTP 404.
I know, it's painful and in Nexus2 it was much easier.

Get size of specific repository in Nexus 3

How can I get the size of a specific repository in Nexus 3?
For example, Artifactory shows a repository's "size on disk" in the UI.
Does Nexus have something similar? If not, how can I get this information with a script?
You can use an admin task with the Groovy script nx-blob-repo-space-report.groovy from https://issues.sonatype.org/browse/NEXUS-14837 - for me it turned out to be too slow.
Or you can get it from the database:
1. Log in on the Nexus server as the user that owns the Nexus installation (e.g. nexus).
2. Go to the application directory (e.g. /opt/nexus):
$ cd /opt/nexus
3. Run the Orient console:
$ java -jar ./lib/support/nexus-orient-console.jar
4. Connect to the local database (e.g. /opt/sonatype-work/nexus3/db/component):
> CONNECT PLOCAL:/opt/sonatype-work/nexus3/db/component admin admin
5. Find the repository's row id in the #RID column by its repository_name value:
> select * from bucket limit 50;
6. Sum the size of all assets with the repository row id found in the previous step:
> select sum(size) from asset where bucket = #15:9;
The result should look like the following (the sum is in bytes, so here about 210 GiB):
+----+------------+
|# |sum |
+----+------------+
|0 |224981921470|
+----+------------+
The database connection steps are taken from https://support.sonatype.com/hc/en-us/articles/115002930827-Accessing-the-OrientDB-Console.
Other useful queries:
Total size by repository name (replacing steps 5 and 6):
> select sum(size) from asset where bucket.repository_name = 'releases';
Top 10 repositories by size:
> select bucket.repository_name as repository,sum(size) as bytes from asset group by bucket.repository_name order by bytes desc limit 10;
Alternatively, assign each repository to its own blob store; the size of each blob store is then visible in the Nexus UI, which gives you the per-repository size.
You can also refer to the GitHub project below. Its script can help clear storage space on a Nexus repository by analyzing the stats it generates. The script features prompt-based user input, lets you search/filter the results, generates a CSV output file, and prints the output to the console in tabular format.
Nexus Space Utilization - GitHub
You may also refer to the post below on the same topic.
Nexus Space Utilization - Post

Can Ansible unarchive be made to write static folder modification times?

I am writing a build process for a WordPress installation using Ansible. It doesn't have an application-level build system at the moment, and I've chosen Ansible so that it can cleanly integrate with server build scripts, letting me bring up a working server at the touch of a button.
Most of my WordPress plugins are being installed with the unarchive feature, pointing to versioned plugin builds on the official wordpress.org installation server. I've encountered a problem with just one of these, which is that it is always being marked as "changed" even though the files are exactly the same.
Having examined the state of ls -Rl before and after, I noticed that this plugin (WordPress HTTPS) is the only one to use internal sub-directories, and upon each decompression, the modification time of folders is getting bumped.
It may be useful to know that this is a project build script with a connection of local, so I guess that means SSH is not being used.
Here is a snippet of my playbook:
- name: Install the W3 Total Cache plugin
  unarchive: >
    src=https://downloads.wordpress.org/plugin/w3-total-cache.0.9.4.1.zip
    dest=wp-content/plugins
    copy=no

- name: Install the WP DB Manager plugin
  unarchive: >
    src=https://downloads.wordpress.org/plugin/wp-dbmanager.2.78.1.zip
    dest=wp-content/plugins
    copy=no

# #todo Since this has internal sub-folders, need to work out
# how to preserve timestamps of the original folders rather than
# re-writing them, which forces Ansible to record a change of
# server state.
- name: Install the WordPress HTTPS plugin
  unarchive: >
    src=https://downloads.wordpress.org/plugin/wordpress-https.3.3.6.zip
    dest=wp-content/plugins
    copy=no
One hacky way of fixing this would be to run ls -R before and after, with options to include file sizes but not timestamps, and then md5sum that output, marking the task as changed only if the checksum changes. It would work, but it's not very elegant (and I'd want to do it for all plugins, for consistency).
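For what it's worth, with GNU ls an empty --time-style format prints a long listing with sizes but no timestamps, so the whole check could be a one-liner (a sketch assuming GNU coreutils):

# empty +FORMAT after --time-style= suppresses the time column (GNU ls only)
ls -lR --time-style=+ wp-content/plugins | md5sum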
Another approach is to abandon the task if a plugin file already exists, but that would cause problems when I bump the plugin version number to the latest copy.
Thus, ideally, I am looking for a switch to present to unarchive to say that I want the folder modification times from the zip file, not from playbook runtime. Is it possible?
Update: a commenter asked if the file contents could have changed in any way. To determine whether they have, I wrote this script, which creates a checksum for (1) all file contents and (2) all file/directory timestamps:
#!/bin/bash
# Save pwd and then change dir to root location
STARTDIR=`pwd`
cd `dirname $0`/../..
# Clear collation file
echo > /tmp/wp-checksum
# List all files recursively and append their contents to the collation file
find wp-content/plugins/wordpress-https/ -type f | while read file
do
    #echo $file
    cat "$file" >> /tmp/wp-checksum
done
# Get checksum of file contents
sha1sum /tmp/wp-checksum
# Get checksum of file sizes
ls -Rl wp-content/plugins/wordpress-https/ | sha1sum
# Go back to original dir
cd $STARTDIR
I ran this as part of my playbook (running it in isolation using tags) and received this:
PLAY [Set this playbook to run locally] ****************************************
TASK [setup] *******************************************************************
ok: [localhost]
TASK [jonblog : Run checksum command] ******************************************
changed: [localhost]
TASK [jonblog : debug] *********************************************************
ok: [localhost] => {
"checksum_before.stdout_lines": [
"374fadc4df1578f78fd60b1be6758477c2c533fa /tmp/wp-checksum",
"10d66f7bdbbdd3af531d1b11a3db3059a5868838 -"
]
}
TASK [jonblog : Install the WordPress HTTPS plugin] ***************
changed: [localhost]
TASK [jonblog : Run checksum command] ******************************************
changed: [localhost]
TASK [jonblog : debug] *********************************************************
ok: [localhost] => {
"checksum_after.stdout_lines": [
"374fadc4df1578f78fd60b1be6758477c2c533fa /tmp/wp-checksum",
"719c9da94b525e723b1abe188ee9f5bbaf121f3f -"
]
}
PLAY RECAP *********************************************************************
localhost : ok=6 changed=3 unreachable=0 failed=0
The debug lines reflect the checksum hash of the contents of the files (this is identical) and then the checksum hash of ls -Rl of the file structure (this has changed). This is in keeping with my prior manual finding that directory checksums are changing.
So, what can I do next to track down why folder modification times are incorrectly flagging this operation as changed?
Rather than overwriting all the files each time and finding a way to keep their modification times, you may want to use the creates option of the unarchive module.
As you may already know, this tells Ansible that a specific file/folder will be created as a result of the task, so the task will not run again if that file/folder already exists.
See http://docs.ansible.com/ansible/unarchive_module.html#options
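A sketch of how that could look for the problematic plugin here; the creates path is an assumption (any file the archive is known to contain would work as the marker):

- name: Install the WordPress HTTPS plugin
  unarchive: >
    src=https://downloads.wordpress.org/plugin/wordpress-https.3.3.6.zip
    dest=wp-content/plugins
    copy=no
    creates=wp-content/plugins/wordpress-https/wordpress-https.php
# the creates= path above is an assumption: point it at a file the zip actually contains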
My solution is to modify the checksum script and to make that a permanent feature of the Ansible process. It feels a bit hacky to do my own checksumming, when Ansible should do it for me, but it works.
New answers that explain that I am doing something wrong, or that a new version of Ansible fixes the problem, would be most welcome.
If I get a moment, I will raise this as a possible bug with the Ansible team. However I do sometimes wonder about the effort/reward ratio when raising bugs on a busy tracker - I already have one item outstanding, it has been waiting a while, and I've chosen to work around that too.
Update (18 months later)
This Ansible build system never made it into live. It felt like I was always working around something. Recently, when I decided I needed to move my blog to another server, I finally Dockerised it. This took several weeks (since there is a surprising amount of things to think about in a real WordPress installation) but in general I found the process much nicer than using orchestration tools.

Deleting artifacts in Artifactory

I want to delete artifacts in Artifactory. I googled and found this link:
https://www.jfrog.com/confluence/display/RTF/Artifactory+REST+API
Delete Build, using the REST API, is what we are going for at the moment. Can anyone give me a general idea of how the command should look using curl? Also, what do I need to specify in buildname?
For deleting a single artifact or folder you should use the Delete Item API, for example:
curl -uadmin:password -XDELETE http://localhost:8080/artifactory/libs-release-local/ch/qos/logback/logback-classic/0.9.9
Notice that you will need a user with delete permissions.
If all goes well you should expect a response with a 204 status and no content.
The Delete Builds API, by contrast, is intended for deleting build information and is relevant if you are using the Artifactory build integration; the buildname in that call is the name of the build as it appears in Artifactory.
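A hedged example of that call (the build name and numbers are placeholders; artifacts=1 also removes the artifacts produced by those builds):

# my-build, 51 and 52 are placeholders for your build name and numbers
curl -uadmin:password -XDELETE "http://localhost:8080/artifactory/api/build/my-build?buildNumbers=51,52&artifacts=1"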
Nowadays there's a tool that can be used for it (note that I am a contributor to that tool):
https://github.com/devopshq/artifactory-cleanup
Assume I have 10 repositories and I want to keep only the last 20 artifacts in 5 of them and an unlimited number in the other 5.
The rule for the limited repositories would look like this:
# artifactory-cleanup.yaml
artifactory-cleanup:
  server: https://repo.example.com/artifactory
  # $VAR is auto populated from environment variables
  user: $ARTIFACTORY_USERNAME
  password: $ARTIFACTORY_PASSWORD

  policies:
    - name: reponame
      rules:
        - rule: Repo
          name: "reponame"
        - rule: KeepLatestNFiles
          count: 20
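Usage is roughly as follows; as far as I recall, the tool performs a dry run by default and only deletes when explicitly asked to:

pip install artifactory-cleanup
# dry run: prints what would be removed (default behavior, as far as I recall)
artifactory-cleanup --config artifactory-cleanup.yaml
# actually delete
artifactory-cleanup --config artifactory-cleanup.yaml --destroy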

Capistrano 3: deploy WordPress with Git available in the release dir

I use Capistrano 3 to deploy my WordPress projects (as implemented in the Bedrock WP stack: https://github.com/roots/bedrock).
WordPress specifically supports a number of features that update the actual code of the production/staging sites (plugin updates, settings files for certain plugins, etc.), and there are various scenarios where I might want to commit these code changes to the project Git repo directly from a server.
So, the question is: is there a way to configure the Capistrano deploy to keep the .git repo in the release dir?
I gather this was doable with the 'copy strategy' settings in Cap 2, but I can't find any info about this for Cap 3.
I've solved this by modifying the custom deployment strategy implemented by the https://github.com/Mixd/wp-deploy project.
Note the changed context.execute line.
# Usage:
# 1. Drop this file into lib/capistrano/submodule_strategy.rb
# 2. Add the following to your Capfile:
#      require 'capistrano/git'
#      require './lib/capistrano/submodule_strategy'
# 3. Add the following to your config/deploy.rb
#      set :git_strategy, SubmoduleStrategy
module SubmoduleStrategy
  # do all the things a normal capistrano git session would do
  include Capistrano::Git::DefaultStrategy

  # check for a .git directory
  def test
    test! " [ -d #{repo_path}/.git ] "
  end

  # same as in Capistrano::Git::DefaultStrategy
  def check
    test! :git, :'ls-remote', repo_url
  end

  def clone
    git :clone, '-b', fetch(:branch), '--recursive', repo_url, repo_path
  end

  # same as in Capistrano::Git::DefaultStrategy
  def update
    git :remote, :update
  end

  # put the working tree in a release-branch,
  # make sure the submodules are up-to-date
  # and copy everything to the release path
  def release
    release_branch = fetch(:release_branch, File.basename(release_path))
    git :checkout, '-B', release_branch,
        fetch(:remote_branch, "origin/#{fetch(:branch)}")
    git :submodule, :update, '--init'
    # context.execute "rsync -ar --exclude=.git\* #{repo_path}/ #{release_path}"
    context.execute "rsync -ar #{repo_path}/ #{release_path}"
  end
end
This solution deploys each release as a Git repo checked out to a custom branch named after the release id.
Changes made on the server can then be committed and pushed up to the master repo for merging as required.
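Committing those server-side changes back is then plain Git; for example (the deploy path and branch are placeholders):

# /var/www/mysite/current and master are placeholders for your deploy path and branch
cd /var/www/mysite/current
git add -A
git commit -m "Commit plugin updates made on the server"
git push origin HEAD:master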
All credit goes to Aaron Thomas, the creator of the WP-Deploy project.
