Docker layer reuse - Artifactory

At the moment Artifactory stores duplicate Docker image layers. If image A and image B both depend on layer SHA__12345, Artifactory stores two copies of that layer. That is not a problem unless layer SHA__12345 is a gigabyte in size, in which case you can run out of space very quickly.
Is there a way in Artifactory to deduplicate overlapping layers for storage reasons?
Thanks!

Artifactory uses checksum-based storage:
A file uploaded to Artifactory first has its SHA1 checksum calculated, and it is then renamed to that checksum. It is hosted in the configured filestore in a directory structure made up of the first two characters of the checksum. For example, a file whose checksum is "ac3f5e56..." would be stored in directory "ac"; a file whose checksum is "dfe12a4b..." would be stored in directory "df", and so forth.
In parallel, Artifactory creates a database entry mapping the file's checksum to the path it was uploaded to in a repository. This way of storing binaries optimizes many operations in Artifactory, since they are implemented as simple database transactions rather than actual file manipulation.
One implication of this is that artifacts are deduplicated in general: any two artifacts with the same checksum point to the same file in storage, even if they live in different repositories. This applies to Docker layers as well as all other artifacts, so you should not run into the problem you describe.
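To make the scheme concrete, here is a minimal Python sketch of checksum-based storage. A local directory stands in for the filestore and a plain dict stands in for the database; these are illustrative stand-ins, not Artifactory internals.

```python
import hashlib
import shutil
from pathlib import Path

FILESTORE = Path("filestore")  # stand-in for the configured filestore
db = {}                        # stand-in for the repo-path -> checksum table

def deploy(repo_path: str, source: Path) -> str:
    """Store a file under its SHA1 checksum; identical content is stored once."""
    sha1 = hashlib.sha1(source.read_bytes()).hexdigest()
    target = FILESTORE / sha1[:2] / sha1   # e.g. "ac/ac3f5e56..."
    if not target.exists():                # duplicate layers hit this branch
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
    db[repo_path] = sha1                   # the repo path is just a DB row
    return sha1
```

Deploying the same gigabyte layer under two different repository paths results in two dict entries but a single physical file.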

Related

Is an artifact common to multiple Artifactory repositories stored only once?

Artifactory uses checksum-based storage, so if I upload the same artifact to two Artifactory repositories, the artifact should be physically stored only once to optimize the footprint.
Is this applicable to any type of repository, especially generic and Docker?
In other words, if I have two registries configured in my Artifactory, will an image common to several charts be stored only once?
Best regards
Yes. It's stored only once for best efficiency and control, regardless of the repository type.
See the official documentation on how it actually works.

rdiff-backup-like storage on Artifactory

I am looking for a way to store files in an Artifactory repository in a storage-efficient way and to upload/download only the difference between the local and remote versions, in order to save disk space, bandwidth, and time.
There are two good utilities that work this way: rsync and rdiff-backup. Surely there are others.
Is there a way to arrange something similar with the Artifactory stack?
What is rsync:
DESCRIPTION
Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
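To make the delta-transfer idea concrete, here is a toy Python sketch of the exchange: the receiver hashes fixed-size blocks of the file it already has, and the sender transmits only the blocks the receiver is missing. Real rsync uses a rolling weak checksum plus a strong checksum so matches are found at any byte offset; this fixed-block version is an illustration only.

```python
import hashlib

BLOCK = 4096

def block_signatures(old: bytes) -> dict:
    """Receiver side: hash each fixed-size block of the file it already has."""
    return {hashlib.md5(old[i:i + BLOCK]).hexdigest(): i
            for i in range(0, len(old), BLOCK)}

def make_delta(new: bytes, signatures: dict) -> list:
    """Sender side: emit ("copy", offset) for blocks the receiver already
    holds, and ("data", literal_bytes) for blocks it does not."""
    ops = []
    for i in range(0, len(new), BLOCK):
        block = new[i:i + BLOCK]
        digest = hashlib.md5(block).hexdigest()
        if digest in signatures:
            ops.append(("copy", signatures[digest]))  # no payload on the wire
        else:
            ops.append(("data", block))               # literal bytes on the wire
    return ops

def apply_delta(old: bytes, ops: list) -> bytes:
    """Receiver side: rebuild the new file from local blocks plus literals."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + BLOCK] if kind == "copy" else arg
    return bytes(out)
```

Only changed blocks travel over the network; unchanged blocks are reassembled from the copy the receiver already has.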
JFrog CLI includes functionality called "Sync Deletes", which allows syncing files between the local file system and Artifactory.
This functionality is supported by both the "jfrog rt upload" and "jfrog rt download" commands; both accept the optional --sync-deletes flag.
When uploading, the value of this flag specifies a path in Artifactory under which to sync the files after the upload. Once the upload completes, that path will include only the files uploaded during this operation; any other files under it will be deleted.
The same goes for downloading, except that the value of --sync-deletes specifies a path in the local file system under which any files that were not downloaded from Artifactory are deleted.
Read more about this at the following link:
https://www.jfrog.com/confluence/display/CLI/CLI+for+JFrog+Artifactory
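As a rough illustration of the download-side semantics, here is a Python sketch of what syncing deletes amounts to; the sync_deletes function and the downloaded set are hypothetical names for illustration, not JFrog CLI code.

```python
from pathlib import Path

def sync_deletes(local_root: str, downloaded: set) -> None:
    """After a download, remove every file under local_root that was not
    part of this download, so local_root mirrors the remote path exactly.
    `downloaded` holds the Paths the download operation just wrote."""
    for path in Path(local_root).rglob("*"):
        if path.is_file() and path not in downloaded:
            path.unlink()
```

The upload direction is symmetric: the Artifactory path given to --sync-deletes ends up containing exactly the files from that upload operation.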

Can Artifactory age artifacts to S3?

We have an Artifactory solution deployed and I am trying to figure out whether it can meet my use case. Normally, artifacts are deleted within a week or so and fit in X GB of local storage, but we'd like to be able to:
- Keep some artifacts around much longer and, since they are accessed infrequently, store them in AWS S3.
- Burst to the cloud when artifacts can't be cleaned up in time and local storage overflows.
I was thinking I could do the following:
- A local repository of X GB
- A repo pointing to S3
- A virtual repo in front of both of these
- A plugin to move artifacts from local to S3 according to our policies
However, I can't figure out what a filestore is in Artifactory, or how you'd have two repositories backed by different filestores.
Does anyone have pointers to documentation or anything else that can help? The docs I can find are rather slim on the high-level details of filestores and repositories.
The Artifactory binary provider does not support configuring multiple storage backends, so it is impossible to use S3 and NFS in parallel. The main reason for this limitation is that Artifactory uses checksum-based storage, which stores each binary only once and keeps pointers from all relevant repositories; for that reason, Artifactory does not manage separate storage per repository.
For archiving purposes, one possible solution is to set up a second Artifactory instance dedicated to archiving, connected to an S3 storage backend.
You can use replication to synchronize the two instances (without syncing deletes): keep one or more repositories in your master Artifactory for artifacts that should be archived, replicate those artifacts to the archive Artifactory, and later delete them from the master.
You can use a user plugin to decide which artifacts should be moved to the archive repository.
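As a hedged sketch of such a policy driven from outside Artifactory (a user plugin would run inside Artifactory in Groovy instead), the following Python uses Artifactory's AQL search endpoint to find artifacts old enough to have been replicated, then deletes them from the master. BASE, AUTH, REPO, and the 30-day cutoff are assumptions for illustration.

```python
import requests

BASE = "https://artifactory.example.com/artifactory"  # assumed URL
AUTH = ("admin", "password")                          # assumed credentials
REPO = "to-archive-local"                             # assumed repo name

# AQL: find artifacts in the archive-candidate repo created more than
# 30 days ago; by then replication (configured without sync-deletes)
# should have copied them to the archive instance.
aql = 'items.find({"repo":"%s","created":{"$before":"30d"}})' % REPO

resp = requests.post(BASE + "/api/search/aql", data=aql, auth=AUTH,
                     headers={"Content-Type": "text/plain"})
resp.raise_for_status()

for item in resp.json()["results"]:
    path = "%s/%s/%s" % (item["repo"], item["path"], item["name"])
    # Delete from the master only once it is safely in the archive.
    requests.delete("%s/%s" % (BASE, path), auth=AUTH).raise_for_status()
```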

Mounting two html5fs locations with different PERSISTENT and TEMPORARY types

I am using the nacl_io library in my project. Is it possible to mount two locations with the html5fs filesystem, but with different types, PERSISTENT and TEMPORARY, at the same time?
Thanks.
This is supported.
The nacl-spawn library in the naclports repo, which is used to build command line tools, does this by default. It mounts a temporary html5fs at /tmp and persistent ones at /mnt/html5 and /home/user.

A file storage format for file sharing site

I am implementing a file-sharing system in ASP.NET MVC3. I suppose most file-sharing sites store files in a standard binary format on a server's file system, right?
Storage-wise, I have two options: the file system, or a binary data field in a database.
Are there any advantages to storing files (including large ones) in a database rather than on the file system?
MORE INFO:
The expected average file size is 800 MB, and roughly three files per minute are requested for download by users.
If the files are as big as that, then using the filesystem is almost certainly a better option. Databases are designed to contain relational data grouped into small rows and are optimized for consulting and comparing the values in these relations. Filesystems are optimized for storing fairly large blobs and recalling them by name as a bytestream.
Putting files that big into a database will also make it difficult to manage the space the database occupies. The tools for querying space usage, and for removing and replacing data, are better on a filesystem.
The only caveat to using the filesystem is that your application has to run under an account that has the necessary permission to write the (portion of the) filesystem you use to store these files.
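For files of this size, the serving code should also stream from disk in chunks rather than buffering whole files; in ASP.NET MVC3 a FilePathResult does this for you. As a language-neutral illustration, here is a minimal Python sketch of the same idea:

```python
from pathlib import Path
from typing import Iterator

CHUNK = 64 * 1024  # 64 KiB per read

def stream_file(path: str) -> Iterator[bytes]:
    """Yield a large file chunk by chunk, so an 800 MB download never has
    to be held in memory the way a database BLOB round-trip often forces."""
    with Path(path).open("rb") as f:
        while chunk := f.read(CHUNK):
            yield chunk
```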
Use SQL Server FILESTREAM when:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
Here is the MSDN link: https://msdn.microsoft.com/en-us/library/gg471497.aspx
How to use it: https://www.simple-talk.com/sql/learn-sql-server/an-introduction-to-sql-server-filestream/
