I have deleted all the artifacts from my instance of Artifactory cloud, and I have deleted all the repositories, but the Binaries size is still showing as 1.5GB.
How can I resolve this issue?
I am not seeing a maintenance option in JFrog. I am an individual user and I am the admin for this account.
Artifacts in Artifactory are not removed immediately. Deleted artifacts go to a trash can, which retains them for 2 weeks by default before deleting them permanently.
The trash can settings can be accessed only by an admin, so as a non-admin user you can either wait ~2 weeks or ask your Artifactory admin to empty the trash can.
To empty the trash can as an admin, go to Administration | Artifactory | General | Settings and click the "Empty Trash Can" button.
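If you prefer the REST API over the UI, the trash can can also be emptied with Artifactory's "Empty Trash Can" call; a minimal sketch, assuming admin credentials and a placeholder cloud URL:
curl -X POST -u admin:password "https://myinstance.jfrog.io/artifactory/api/trash/empty"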
For more information see:
Trash Can Settings
How can I completely remove artifacts from Artifactory?
How to delete artifact from Trash can in Jfrog Artifactory?
I know this is late, but let me share this info as it may be helpful to others.
There are two scenarios to check.
First, check whether the trash can has been cleaned up. The usual retention period is 14 days: once you delete an artifact it goes to the trash can (if the trash can facility is enabled), stays there for 14 days, and is then cleaned up automatically.
Second, when we delete an artifact in the UI or via the JFrog API, the actual binary of that artifact is not deleted; only the checksum reference to that binary is removed.
In the case above you can see in the screenshot that the binaries size is 1.53 GB while the artifacts size is 8.06 MB, which should not be the case:
the artifacts size should always be greater than or equal to the binaries size.
The reason for this is that Artifactory stores binaries in checksum-based storage.
So when we delete an artifact, only the checksum reference is deleted, not the actual binary.
The actual binaries are then deleted by the garbage collection (GC) that Artifactory runs every 4 hours by default (triggered by a cron expression; it can also be run manually via the REST API).
During a GC run, Artifactory checks whether there are any binaries left without a checksum reference and deletes them.
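For example, a manual GC run can be triggered with a single REST call; a minimal sketch, assuming an admin user and a placeholder host and password:
curl -X POST -u admin:password "http://localhost:8081/artifactory/api/system/storage/gc"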
Please refer to the links below for detailed info.
https://jfrog.com/knowledge-base/why-does-removing-deleting-old-artifacts-is-not-affecting-the-artifactory-disk-space-usage/#:~:text=The%20above%20behavior,any%20other%20repository.
I have JFrog Artifactory 6.16.0 Pro. I installed the artifactCleanup plugin and ran it against our repositories. It deleted about 500GB.
Next, I deleted the files from the trash can, and it is now at zero.
Finally, I ran "Garbage Collection" manually.
Space wasn't freed. The Storage section shows me the following info:
Binaries Size: 1.67 TB
Artifacts Size: 663.15 GB
Optimization: 257.79%
How can I actually free space after artifacts deletion?
First, let's make sure we understand how Artifactory GC works. From the docs:
When a new file is deployed, Artifactory checks if a binary with the
same checksum already exists and if so, links the repository path to
this binary. Upon deletion of a repository path, Artifactory does not
delete the binary since it may be used by other paths. However, once
all paths pointing to a binary are deleted, the file is actually no
longer being used. To make sure your system does not become clogged
with unused binaries, Artifactory periodically runs a "Garbage
Collection" to identify unused ("deleted") binaries and dispose of
them from the datastore. By default, this is set to run every 4 hours
and is controlled by a cron expression.
This means that if I store the same 5GB file 100 times, then our artifacts size is 500GB, while our binaries size is still 5GB. This is because Artifactory de-duplicates through checksum-based storage.
The binaries size should never be more than the artifacts size; quite the opposite, the optimization shouldn't exceed 100%. However, the binaries size is calculated essentially from what you get running a "df" command, so if GC hasn't run it will still show those binaries there.
This takes us to your issue, which may not be an issue but expected behavior, also noted in the previously linked docs:
Unreferenced binaries, (including existing unreferenced binaries or
artifacts that were manually deleted from the trashcan), will be
deleted during the next Full GC strategy that runs every 20 GC
iterations (configurable,
'artifactory.gc.skipFullGcBetweenMinorIterations=20').
This tells us that the actual deletion of binaries will happen only on every 20th GC iteration.
Please try to manually trigger GC 20 times; the output of the full GC will be different from the regular one, giving you a summary of what was deleted.
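If you want to script this, here is a minimal sketch using the standard garbage collection REST call, assuming an admin user and a placeholder host:
# trigger GC 20 times so that the Full GC iteration is reached
for i in $(seq 1 20); do
  curl -X POST -u admin:password "http://localhost:8081/artifactory/api/system/storage/gc"
done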
If that doesn't work, look into the permissions for the Artifactory user to make sure it can delete files.
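On the Artifactory host, something along these lines can expose an ownership or permission problem; a minimal sketch, assuming the default Artifactory 6.x filestore location:
# compare the owner of the filestore with the OS user running the Artifactory process
ls -ld "$ARTIFACTORY_HOME/data/filestore"
ps -ef | grep -i '[a]rtifactory'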
I have just encountered an issue I have never seen before and can't find any help on.
I run a git master repo on the live website in our hosting environment, and a bare origin repo on the same server. All our developer commits go to the bare origin master, which has a post-receive hook to push each commit to the master repo so that the push is reflected in the live site files. The only annoyance is that whenever we get WordPress updates, we have to commit on the master repo, then push it back to the origin repo so that our developers can download those updated files.
Problem: Today, I went to commit a WordPress plugin update, and the commit worked fine, but the push gave the following cryptic error that I cannot find any help on:
git push origin master
Counting objects: 344, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (323/323), done.
Writing objects: 100% (344/344), 1.76 MiB | 5.61 MiB/s, done.
Total 344 (delta 237), reused 14 (delta 1)
remote: Resolving deltas: 100% (237/237), completed with 226 local objects.
remote: Running post-receive (git pull origin master):
remote: error: index uses ▒yz▒ extension, which we do not understand
remote: fatal: index file corrupt
(The weird symbols around the letters "yz" read as "<B3>yz<AC>" when I run git diff)
If I run git status on master (cannot be run on origin as it is a bare repo), it repeats the last two lines of output. If I run git log on both repos, they look normal, but the origin is of course one commit behind the master, because committing to master worked, but pushing to origin failed.
I have never heard of this "yz" extension before, and as far as I can tell, no one else has either. All I did was a commit and a push, and the push failed. We do not use any such "yz" extension; we use vanilla Git and do very basic pushes and pulls. Our post-receive hook is about a 5-line script which just auto-pushes every commit from origin to master and resets file permissions to ensure they're readable by the web server. The only unusual thing we're doing is using Git to version-control a WordPress site. All developers use TortoiseGit on Windows without any special settings or any sort of plugins.
I have seen plenty of questions and answers about corrupt indexes, but nothing like this.
I have not tried anything yet as I'm not sure what to do, don't understand the internals of Git, and don't want to break anything, but we need to be able to pull and push and cannot until the repos are in sync again.
Everything prefixed with remote: came not from your Git but from their Git—the Git your Git called up, at the URL you used when you ran git push.
It's their Git that is complaining about their Git's index. This is probably a file corruption issue on their machine. You need to get them—whoever "they" are—to check out their system. If all looks OK, they should probably just remove and rebuild their index files.
(If "they" / "them" is really you, go over to whichever machine(s) are involved, inspect them carefully for damage or disks that are on fire or whatever, and if all looks good, remove and rebuild the index files.)
Alright, here's what I did:
Because I did not want to lose any changes to the master repo (corrupt index), as the changes needed to be kept and also pushed to origin (working index), I ran the following:
rm .git/index    # remove the corrupt index file
git reset --mixed    # rebuild the index from HEAD without touching the working tree
(--mixed is the default option)
According to the documentation:
--mixed
Resets the index but not the working tree (i.e., the changed files are preserved but not marked for commit) and reports what has not been updated. This is the default action.
Thanks to torek and bk2204 for your help in understanding the situation, and to a little bit of googling to figure out the safest process for resetting an index without losing any committed or working files.
I am using JFrog CLI (jfrog rt download) to download build reports from Artifactory that are published there by GitLab CI in unpacked form, in order to allow unhindered browsing of the HTML reports.
However, it takes extremely long (10-20 minutes) because of just how many small files there are.
I see that Artifactory has a REST API to download a whole repository folder's content in one go as a single archive.
But I am not able to find any way to do the same using JFrog CLI.
Am I missing something, or is there truly no way to download a whole folder's content as an archive using JFrog CLI?
P.S.: I am aware that there is a configuration option in Artifactory that supposedly allows browsing the contents of archives, but there are reasons (organizational and technical) preventing me from using it.
Using the CLI, you can increase the "--threads" value. I have seen a massive improvement when downloading a directory with lots of small files after increasing the number of threads.
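For example, something along these lines; a minimal sketch in which the repository path, target directory and thread count are placeholders to adjust to your setup:
jfrog rt download "reports-repo/my-pipeline/reports/" ./reports/ --threads=16
The flag simply parallelizes the per-file transfers, which is exactly where a large number of small files hurts the most.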
We are running a local installation of Artifactory Pro which contains around 1M artifacts. Recently, we tried to migrate from the embedded Derby DB to Postgres and switched back to Derby because of errors occurring during the migration.
After that, users reported missing files, mostly maven-metadata.xml but also at least one pom.xml. The files are missing on the filesystem.
The only way I can think of is to query the Artifactory API for all files and try to download each one to check that it can be downloaded. Is there a better way to check whether all artifacts in Artifactory exist on the filesystem?
Welcome, Thomas! 👋🏻
Although that kind of error doesn't happen in normal operation, migrating a large number of artifacts back and forth can sometimes lead to such problems.
We have a user plugin that finds them, so check it out; it looks like exactly what you need.
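If you still want to script the check yourself along the lines you described, here is a minimal sketch; it assumes jq is installed, the placeholder host, repository and credentials are replaced with real values, and the user can call the File List API. Note that a full GET per artifact will be slow for ~1M files:
# list every file in the repo via the File List API, then try to actually download each one;
# a GET (not HEAD) is used so that the binary really has to be served from the filestore
HOST="http://localhost:8081/artifactory"
REPO="libs-release-local"
curl -s -u admin:password "$HOST/api/storage/$REPO?list&deep=1" \
  | jq -r '.files[].uri' \
  | while read -r f; do
      curl -s -o /dev/null -u admin:password -w "%{http_code} $REPO$f\n" "$HOST/$REPO$f"
    done | grep -v '^200'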
I have a Nexus server, version 2.11.1-01, with a scheduled task set up to run over our snapshots repo and remove snapshots, with the minimum snapshot count set to 3 and a retention of 5 days. The scheduled task shows that it runs, but when I look through our repository there are as many as 26 snapshots for a single artifact GAV, spanning 6 months.
Is there something not configured correctly, or is there a way to find out why it isn't running correctly?
As reported in this Nexus Jira:
The snapshot remover removes timestamped snapshots within one version
folder. It does not remove previous versions of snapshots. Note that
it is very common for there to be multiple active versions of
snapshots for a given artifact, some being produced by branch builds
and others by trunk builds.
The "remove if released" option can be used to clean up all snapshots
from a particular version.
Also, as reported in this other Jira, you have to check the file naming layout:
Nexus supports Maven2/3 layout only
If you want to delete multiple versions of an artifact, as reported here, you have to use the REST APIs:
curl -X DELETE -u admin:admin123 http://localhost:8081/nexus/service/local/repositories/releases/content/com/test/project/1.0/
Or, alternatively you can delete them directly from the repository's
local storage on disk. If you delete directly from local storage then
use the REST command to rebuild metadata for the affected path:
curl -X DELETE -u admin:admin123 http://localhost:8081/nexus/service/local/metadata/repositories/snapshots/content/com/test/project/
You'll also need to update the search indexes. If you schedule a
nightly "update indexes" task it will pick up any changes made
directly to the storage area.