Artifactory mirror doesn't update cached deb packages. Any way to fix this?

We have an Artifactory installation acting as a proxy/cache for a remote Ubuntu repository. Sometimes packages are updated on the remote, but the update doesn't fully propagate to the Artifactory cache and outdated packages are served.
What's been tried:
Adding the remote repository using both the Generic and the Debian repository type
Adjusting Metadata Retrieval Cache Period (Sec) - the Release/Packages files are updated and contain the correct checksums; however, the previously cached packages remain unchanged and no longer match those checksums.
Switching Disable artifact resolution in repository ON/OFF - no difference.
For testing purposes, in an effort to reproduce the issue, apt-mirror was used to create a fake repository; the files there were replaced and dpkg-scanpackages was used to regenerate the Release/Packages metadata.
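Roughly, the reproduction looked like this (paths, distribution and package names here are illustrative, not the exact ones used):
# mirror a small upstream section to act as the fake remote repository
apt-mirror /etc/apt/mirror.list
# swap in a modified .deb without changing its file name
cd /var/spool/apt-mirror/mirror/archive.ubuntu.com/ubuntu
cp ~/rebuilt/hello_2.10-2_amd64.deb pool/main/h/hello/
# regenerate the Packages index and the Release file so their checksums
# reflect the replaced package
dpkg-scanpackages pool /dev/null | gzip -9 > dists/focal/main/binary-amd64/Packages.gz
apt-ftparchive release dists/focal > dists/focal/Release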
I'd expect Artifactory to validate the cached packages against the remote package metadata and re-fetch them on a mismatch.
Am I overlooking something or is there any way to fix this that doesn't involve an ugly workaround?

Related

Github Action failing with R CMD check, using old commit?

I'm not sure how best to describe this, hence the rather vague title.
I have an R package that uses Github Actions to run checks. You can see the workflow file here:
https://github.com/Azure/Microsoft365R/blob/master/.github/workflows/check-standard.yaml
It's basically the same as the check-standard workflow in the r-lib/actions repo, with some tweaks for my particular requirements. My latest commit is failing the check for the macOS build, with this error:
Run remotes::install_deps(dependencies = TRUE)
Error: Error: HTTP error 404.
Not Found
Did you spell the repo owner (`hongooi73`) and repo name (`AzureGraph`) correctly?
- If spelling is correct, check that you have the required permissions to access the repo.
Execution halted
Error: Process completed with exit code 1.
The step in question is this. It just scans the package's DESCRIPTION file and installs the dependencies for the package -- all very straightforward.
- name: Install dependencies
  run: |
    remotes::install_deps(dependencies = TRUE)
    remotes::install_cran(c("pkgbuild", "rcmdcheck", "drat"))
  shell: Rscript {0}
It looks like it's trying to install a dependency from the hongooi73/AzureGraph repo, which no longer exists. But my DESCRIPTION file doesn't list hongooi73/AzureGraph as a remote dependency; it uses Azure/AzureGraph, of which hongooi73/AzureGraph was a fork. It used to refer to hongooi73/AzureGraph, but that was several commits ago. Indeed, the Linux and Windows checks both run without problems, so they are clearly using the correct repo location.
What can be causing this failure? And how do I fix it? I've already tried rerunning the workflow and deleting older workflow runs.
You're using actions/cache to cache your R libraries. This means you may be restoring a stale cache if your key and restore-keys aren't set up properly.
At the moment, there is no direct way to manually clear the cache. For some other options you can check Clear cache in GitHub Actions.
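For illustration, a cache step along these lines ties the key to your dependency list (the path and key names here are a sketch, not your exact workflow):
- name: Cache R packages
  uses: actions/cache@v2
  with:
    path: ${{ env.R_LIBS_USER }}
    key: ${{ runner.os }}-r-${{ hashFiles('DESCRIPTION') }}
    restore-keys: ${{ runner.os }}-r-
Hashing DESCRIPTION makes the key change whenever the dependency list changes, so a library cached while hongooi73/AzureGraph was still listed would not be restored.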
Jan. 2021:
At the moment, there is no direct way to manually clear the cache
June 2022: Actually, there now is:
List and delete caches in your Actions workflows
You can now get more transparency and control over dependency caching in your actions workflows.
Actions users who use actions/cache to make jobs faster on GitHub Actions can now use our cache list and delete APIs to:
list all the Actions caches within a repository and sort by specific metadata like cache size, creation time or last accessed time.
delete a corrupt or a stale cache entry by providing the cache key or ID.
Learn more about Managing caching dependencies to speed up workflows.
See the updated answer to "Clear cache in GitHub Actions" from beatngu13 for the GitHub API call examples.
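For example, with the Actions cache REST API (OWNER, REPO and the key value are placeholders):
# list the Actions caches in a repository, sorted by last access time
curl -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/actions/caches?sort=last_accessed_at"
# delete every cache entry matching a given key
curl -X DELETE -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/actions/caches?key=Linux-r-"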

RPM Remote Repository - Package does not match intended download

We're making use of a remote repository and are storing artifacts locally. However, we are running into a problem because the remote repository regularly rebuilds all artifacts that it hosts. In our current state, the metadata (e.g. repodata/repomd.xml) is updated, but the artifacts themselves are not.
We have to continually clear out our local remote-repository cache in order to allow it to download the rebuilt artifacts.
Is there any way we can configure Artifactory to re-cache new artifacts as well as the new artifact metadata?
In our current state, the error we regularly run into is
https://artifactory/artifactory/remote-repo/some/path/package.rpm:
[Errno -1] Package does not match intended download.
Suggestion: run yum --enablerepo=artifactory-newrelic_infra-agent clean metadata
Unfortunately, there is no good answer to that. Artifacts under a version should be immutable; it's dependency management 101.
I'd put as much effort as possible into convincing the team producing the artifacts to stop overriding versions. It's true that it can sometimes be cumbersome to change dependency versions in metadata, but there are ways around it (like resolving the latest patch during development, as supported by the semver spec), and in any case, that's not a good excuse.
If that's not possible, I'd look into enabling direct repository-to-client streaming (i.e. disabling artifact caching) to prevent the problem of stale artifacts.
Another solution might be cleaning up the cache with a user plugin or a JFrog CLI script once you learn that newer artifacts have been published to the remote repository.
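A minimal JFrog CLI sketch, relying on the fact that a remote repository's cache is addressable as <repo-key>-cache (the repository key and path here are placeholders):
# drop the stale entry from the remote repository's cache so the next
# client request re-fetches the rebuilt artifact from upstream
jfrog rt del "remote-repo-cache/some/path/package.rpm" --quiet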

Dependency resolution against local Artifactory takes very long

I have an Artifactory Pro server (without support) installed in my local network.
One major use case for this Artifactory instance was to use it as a local cache for remote artifacts from e.g. the repo1 Maven repository or the Lightbend ivy2 repository. The hope was that caching dependencies hosted on repo1 in my local Artifactory would speed up their resolution.
I am pretty sure my development machine is configured correctly to exclusively resolve artifacts against my local artifactory.
However, every once in a while (suspiciously close to the interval configured as Metadata Retrieval Cache Period (Sec) in the Advanced tab of the remote repository settings), the resolution of dependencies originally hosted on repo1 takes far longer than usual.
I suspect that at these times Artifactory refreshes the artifact metadata (pom, ivy.xml) of remote artifacts. But this takes far longer than I would expect: a simple pom or ivy download should not take several seconds, but rather a few milliseconds.
I am currently requesting root access to the server from ops so I can attempt a tcpdump, which may take time...
So my question is:
Has anyone an idea what might actually be happening that takes several seconds per dependency when refreshing the metadata files of a remote repository, or am I looking in the wrong direction?
Update
My Artifactory version is
Artifactory Professional 5.1.3 rev 50019
We had a similar issue, but with npm repositories, where the metadata recalculation was taking quite some time. Eventually we found out it was a bug in Artifactory that was resolved in version 6.1.0. It's worth checking the Artifactory Jiras for any such bugs. Hope this helps!
Artifactory Jira Link

Artifactory 404 error on pulling package from remote

We are running into a 404 error when pulling a specific package from the npm remote repository. It seems to only happen with @ngrx/effects@2.0.2. We are able to install the 2.0.0 version and other scoped packages correctly.
We tested it with scoped and unscoped packages that we had never installed before, and those work successfully. Just this package seems to have a problem.
We are on version 5.1.0
The issue is the metadata retrieval cache periods. In order to avoid the latency associated with upstream connections, Artifactory will cache certain metadata from the remote site (NPMJS in this case). This can mean that the period has to pass before you can see anything new.
You can read more about the settings in the Artifactory Wiki entry for Advanced Settings. In your case, the relevant settings are Metadata Retrieval Cache Period and Missed Retrieval Cache Period. If you want to always receive the most up-to-date information, simply set those to zero (or a couple of minutes). This may slow down your builds a tad, but it's a compromise between speed and completeness.
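A minimal sketch using the repository configuration REST API, assuming admin credentials and an npm remote repository key of npm-remote (the host, credentials and key are placeholders):
# set both cache periods to zero on the remote repository
curl -u admin:password -X POST \
  -H "Content-Type: application/json" \
  -d '{"retrievalCachePeriodSecs": 0, "missedRetrievalCachePeriodSecs": 0}' \
  "https://artifactory.example.com/artifactory/api/repositories/npm-remote"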
As administering my Artifactory install was not an option, I found an easy fix:
Remove the line containing the token for your Artifactory server from ~/.npmrc.
This may be done with npm logout, though I didn't try that. In any case, the token being present resulted in 404 responses from the server.
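Something along these lines should do it (the registry URL is a placeholder for your Artifactory npm registry):
# log out from the Artifactory registry, which removes its token from ~/.npmrc
npm logout --registry=https://artifactory.example.com/artifactory/api/npm/npm-remote/
# or simply delete the _authToken line from ~/.npmrc by hand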

How do I use Artifactory to mirror linux distributions?

I configured external yum and apt repositories in Artifactory, for CentOS, Debian and Ubuntu, and it seems to work, but Artifactory does not cache/mirror them in advance. It seems that artifacts are cached the first time they are requested, and I want to be sure that I pre-cache them.
I imagined this would be done by the replication option, but it seems that this option requires an Artifactory server on the other side, which I obviously do not have, as these are just public HTTP mirrors, like:
http://mirror.bytemark.co.uk/centos/
http://ftp.uk.debian.org/debian/
http://mirror.bytemark.co.uk/ubuntu/
How do I perform the caching/mirroring?
All your observations and assumptions are correct.
Artifactory remote repositories are lazy proxies and download artifacts only on demand.
Replication can pre-populate the caches, but it requires Artifactory instances on both sides (because of the checksum-based replication algorithm it uses).
If you're sure you want to pre-populate Artifactory with all the artifacts from those repositories (we usually don't see this demand as justified), the easiest way will be to use a web crawler to build a list of all the packages and then issue a HEAD request for each of them via Artifactory.
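A minimal sketch, assuming the crawler has already written one package path per line to package-list.txt (the host and repository key are placeholders):
# issue a HEAD request per package so Artifactory fetches and caches each one
while read -r path; do
  curl -sI "https://artifactory.example.com/artifactory/ubuntu-remote/$path" > /dev/null
done < package-list.txt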
