We have an Artifactory solution deployed and I am trying to figure out if it can meet my use case. The normal use case is that artifacts are deleted within a week or so and can normally fit in X GB of local storage, but we'd like to be able to:
Keep some artifacts around much longer, and since they are accessed infrequently, store them in AWS S3.
Sometimes artifacts aren't able to be cleaned up in time, so we'd like to burst to the cloud when local storage is overflowed.
I was thinking I could do the following:
Local repository of X GB
Repo pointing to S3
Virtual repo in front of both of these
Setup a plugin to move artifacts from local->S3 via our policies
However, I can't figure out what a Filestore is in Artifactory, and how you'd have two Repositories backed by different filestores.
Anyone have pointers to documentation or anything that can help? The docs I can find are rather slim on the high level details of filestores and repositories.
The Artifactory binary provider does not support configuring multiple storage backends, so it is impossible to use S3 and NFS in parallel. The main reason for this limitation is that Artifactory has a checksum based storage which stores each binary only once and keeps pointers from all relevant repositories. For that reason Artifactory does not manage separate storage per repository.
For archiving purposes, one of the possible solutions is setting up another Artifactory instance which will take care of archiving. This instance can be connected to an S3 storage backend.
You can use replication to synchronize between the two instances (without syncing deletes). You can have a repository(s) in your master Artifactory which contains artifacts which should be archived, those artifacts will be replicated to the archive Artifactory and later on can be deleted from the master.
You can use a user plugin to decide which artifacts should be moved to the archive repository.
Related
In a near future I will start using Artifactory in my project. I have been reading about local and remote repositories and I am a bit confused of their practical use. In general as far as I understand
Local repositories are for pushing and pulling artifacts. They have no connection to a remote repository (i.e. npm repo at https://www.npmjs.com/)
Remote repositories are for pulling and caching artifacts on demand. It works only one way, it is not possible to push artifacts.
If I am right up to this point, then practically it means you only need a remote repository for npm if you do not develop npm modules but only use them to build your application. In contrast, if you need to both pull and push Docker container images, you need to have one local repository for pushing&pulling custom images and one remote repository for pulling official images.
Question #1
I am confused because our Artifactory admin created a local npm repository for our project. When I discussed the topic with him he told me that I need to first get packages from the internet to my PC and push them to Artifactory server. This does not make any sense to me because I have seen some remote repositories on the same server and what we need is only to pull packages from npm. Is there a point that I miss?
Question #2
Are artifacts at remote repository cache saved until intentionally deleted? Is there a default retention policy (i.e. delete packages older than 6 months)? I ask this because it is important to keep packages until a meteor hits the servers (for archiving policy of the company).
Question #3
We will need to get official Docker images and customize them for CI. It would be a bit hard to maintain one local repo for pulling&pushing custom images and one remote repo for pulling official images. Let's say I need to pull official Ubuntu latest, modify it, push and finally pull the custom image back. In this case it should be pulled using remote repository, pushed to local repo and pulled again from local repo. Is it possible to use virtual repositories to do this seamlessly as one repo?
Question #1 This does not make any sense to me because I have seen some remote repositories on the same server and what we need is only to pull packages from npm. Is there a point that I miss?
Generally, you would want to use a remote repository for this. You would then point your client to this remote repository and JFrog Artifactory would grab them from the remote site and cache them locally, as needed.
In some very secure environments, corporate policies do not even allow this (they may not even be connected to the internet) and instead manually download, vet, and then upload those third-party libraries to a local repository. I don't think that is your case and they may just not understand their intended usages.
Question #2 Are artifacts at remote repository cache saved until intentionally deleted? Is there a default retention policy?
They will not be deleted unless you actively configure it to do so.
For some repo types there are built-in retention mechanisms like the number of snapshots or maximum tags but not for all of them and even in those that have it, they must be actively turned on. Different organizations have different policies for how long artifacts must be maintained. There are a lot of ways to cleanup those old artifacts but ultimately it will depend on your own requirements.
Question #3 Is it possible to use virtual repositories to do this seamlessly as one repo?
A virtual repository will let you aggregate your local and remote sites and appear as a single source. So you can do something like:
docker pull myarturl/docker/someimage:sometag
... docker build ...
docker push myarturl/docker/someimage:sometag-my-modified-version
docker pull myarturl/docker/someimage:sometag-my-modified-version
It is also security-aware so if the user only has access to the local stuff and not the remote stuff, they will only be able to access the local stuff even though they are using the virtual repository that contains both of them.
That said, I don't see why it would be any harder to explicitly use different repositories:
docker pull myarturl/docker-remote/someimage:sometag
... docker build ...
docker push myarturl/docker-local/someimage:sometag-my-modified-version
docker pull myarturl/docker-local/someimage:sometag-my-modified-version
This also has the added advantage that you know they can only pull your modified version of the image and not the remote (though you can also accomplish that by creating the correct permissions).
Artifactory is using storage based on checksum; So if i need to upload the same artifact in 2 artifactory repos; The artifact shall be physically stored only once to optimize footprint.
Is this applicable to any type of repo: especially generic and docker ?
in other words, if i have 2 registries configured in my artifactory, will image common to several charts be stored only once?
Brs
Yes. It's stored only once for best efficiency and control, regardless of the repository type.
See the official documentation on how it actually works.
Is there a script or any other automated process for migration of artifacts into JFrog? We are currently working on this and need more information to carry out this process. Please help us in achieving this. Thanks in advance.
If you have an existing artifact repository, JFrog Artifactory supports acting as an artifact proxy while you are in the process of migrating to Artifactory.
I would recommend the following:
Create a local repository in artifactory
Create a remote repository in artifactory which points to your current artifact repository.
Create a virtual repository in artifactory which contains both the local and remote repositories.
Iterate on all your projects to have them publish to the local artifactory repository and pull from the virtual repository.
The advantage to this workflow is that you can port things over piece by piece, rather than trying to do it all at once. If you point a dependency at artifactory that hasn't been ported there yet, artifactory will proxy it for you. When the dependency is ported over, it will be transparent to its users.
When you have moved everything to your local Artifactory repository, then you can remove the remote repository from your virtual repository.
The relevant documentation is available here: https://www.jfrog.com/confluence/display/RTF/Configuring+Repositories
For an Enterprise account, I'd suppose S3 storage and a significant number of artifacts, so there will be no easy and automated way to do it. It also highly dependent on the storage implementation of choice in the on-prem solution. If you plan to use S3 storage, JFrog can help to perform S3 replication. In other scenarios, the solution will be different. I suggest contacting the support.
Our setup includes a company wide Artifactory that holds in-house-built artifacts as well as goes out and fetches publicly available artifacts. I’m trying to setup a local Artifactory at our location that would fetch publicly available artifacts through the regular internet, but would connect to the company wide Artifactory for our in-house-built artifacts. Is this possible?
In my local Artifactory setup, I put the company wide Artifactory URL as a Remote Repository. I can hit the Test button and it tells me that it successfully connected. However, when I go to download an artifact it does not work. I would like to say that publicly available artifacts can be fetched through my local Artifactory, so at least I can get to jcenter.bintray.
Can one Artifactory be connected to another Artifactory? If yes, is there a way to test if this connection works
I don’t think we would be using all the contents of the company wide Artifactory, so I don’t want to do an export and import to the local or do replication. I would prefer if we could fetch on demand. Is this possible?
Edit: Thanks to #DarthFennec pointing me to Smart Remote Repositories I have solved my problem. To others who have the same problem
Please follow the steps mentioned on the previously mentioned page to set up the Smart Remote Repository. In my case Artifactory did not detect that the remote was another instance of Artifactory and did not give me any options to set, but I was not interested in these anyway.
Note You can always click the Test button to make sure that your connection to the Remote Repository works.
Next, go to the Admin -> Virtual Repositories select your Repository Key and select your Smart Repository from the Available Repositories so that it moves into the Selected Repositories. Click Save & Finish at the bottom and you should be good to go.
I'm not sure exactly what your problem ended up being, but if you want to remote one Artifactory repository from another, it should be a smart remote repository. This is when Artifactory detects that a remote is pointing at another Artifactory, and it enables a number of extra features, like download statistics, property replication, and remote browsing.
An important thing to keep in mind when configuring a smart remote repository is that depending on the package type, you might need to point the remote at <artifactory>/api/<type>/<repo>, rather than just <artifactory>/<repo>. This is the case for Bower, Chef, CocoaPods, Docker, Go, NuGet, Npm, Php Composer, Puppet, Pypi, RubyGems, and Vagrant repositories. Other repository types should use the standard <artifactory>/<repo> URL.
I am looking for a solution that would allow me to have a network share where people can access (read-only) the artifacts from an Artifactory repository.
Why? We use Artifactory to also keep track of big binaries like installation kits, ISO images and so on and it takes a lot of time to download all of them (sometimes as zips), unpack and run them. If these would be exported to a NFS/SMB share people would be able to only mount them in order to use them.
How can we achieve this? Please keep in mind that we also want to automate this, so the files would be updated by Artifactory when needed.
Artifactory supports WebDAV out of the box.
It seems that's not possible at this moment and there is a feature request for enabling it:
https://www.jfrog.com/jira/browse/RTFACT-8302
Feel free to vote and to comment on it, allowing jFrog to realise how important is this use case.
I guess they should be able to provide a script that does mirror/sync a repository to a NFS share but that would almost double the storage space needed.
Instead if they would use hardlinks or symlinks to create a browsable tree of the repository inside the storage directory, this would be solved and no sync will be needed.