How to list subfolders in Artifactory

I'm trying to write a script which cleans up old builds in my generic file repository in Artifactory. I guess the first step would be to look in the repository and check which builds are in there.
Each build shows up as a subfolder of /foo, so for example I have folders /foo/123, /foo/124, /foo/125/, etc.
There doesn't seem to be an ls or dir command. So I tried the search command:
jfrog rt search my-repo/foo/*
But this recursively lists all files, which is not what I'm looking for. I just need the list of direct subfolders. I also tried
jfrog rt search my-repo/foo/* --recursive=false
but this doesn't return any results, because the search command only returns files, not folders.
How do I list the subfolders of a given folder in an Artifactory repository?

Just one more way to do it, with curl and jq:
curl -s http://myartifactory.domain:4567/artifactory/api/storage/myRepo/myFolder | jq -r '.children[] | select(.folder==true) | .uri'
Explanation: curl fetches the folder info from the storage API and pipes it to jq, which prints the uri of every entry in the children array whose folder key is true.
For easier understanding, the JSON that curl gets back looks something like this (example from the Artifactory docs):
{
  "uri": "http://localhost:8081/artifactory/api/storage/libs-release-local/org/acme",
  "repo": "libs-release-local",
  "path": "/org/acme",
  "created": ISO8601 (yyyy-MM-dd'T'HH:mm:ss.SSSZ),
  "createdBy": "userY",
  "lastModified": ISO8601 (yyyy-MM-dd'T'HH:mm:ss.SSSZ),
  "modifiedBy": "userX",
  "lastUpdated": ISO8601 (yyyy-MM-dd'T'HH:mm:ss.SSSZ),
  "children": [
    {
      "uri": "/child1",
      "folder": true
    },
    {
      "uri": "/child2",
      "folder": false
    }
  ]
}
For this example, the output of the command would be /child1.
Of course, this assumes that the Artifactory repo myRepo allows anonymous read.
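If anonymous read is not enabled, the same call works with credentials (or an API key header); a minimal sketch, assuming basic auth with placeholder credentials:
# Sketch: same query as above, but authenticated. Replace myUser/myPassword with real
# credentials, or send an API key via the X-JFrog-Art-Api header instead of -u.
curl -s -u myUser:myPassword \
  "http://myartifactory.domain:4567/artifactory/api/storage/myRepo/myFolder" \
  | jq -r '.children[] | select(.folder==true) | .uri'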

You should have a look at AQL (Artifactory Query Language) here: https://www.jfrog.com/confluence/display/RTF/Artifactory+Query+Language
As an example, the following AQL query retrieves all folders located in "my-repo" under the "foo" folder and displays the result ordered by folder name:
items.find(
  {
    "type": "folder",
    "repo": {"$eq": "my-repo"},
    "path": {"$eq": "foo"}
  }
)
.include("name")
.sort({"$desc": ["name"]})
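If you prefer the REST API over the CLI, the same query can be POSTed to Artifactory's AQL search endpoint, which takes the query as plain text. A minimal sketch, with placeholder host and credentials:
# Sketch: run the AQL query above via the REST API (api/search/aql).
# Host, port, and credentials are placeholders.
curl -s -u myUser:myPassword \
  -X POST "http://myartifactory.domain:4567/artifactory/api/search/aql" \
  -H "Content-Type: text/plain" \
  -d 'items.find({"type":"folder","repo":{"$eq":"my-repo"},"path":{"$eq":"foo"}}).include("name")'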
For cleanup, you can also have a look at the following example, which lists the 10 biggest artifacts created more than a month ago that have never been downloaded:
items.find(
  {
    "type": "file",
    "repo": {"$eq": "my-repo"},
    "created": {"$before": "1mo"},
    "stat.downloads": {"$eq": null}
  }
)
.include("size", "name")
.sort({"$desc": ["size"]})
.limit(10)

Based on jroquelaure's answer, I ended up with the following. The key thing that was still missing was that you have to convert the "items.find" call into JSON when putting it in a filespec. There is an example of that in the filespec documentation which I missed at first.
I put this JSON in a test.aql file:
{
  "files": [
    {
      "aql": {
        "items.find": {
          "type": "folder",
          "repo": {"$eq": "my-repo"},
          "path": {"$eq": "foo"}
        }
      }
    }
  ]
}
Then I call jfrog rt search --spec=test.aql.
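Since the end goal here is cleaning up old builds, the same spec file can presumably be reused with the delete command, which also accepts file specs; a hedged sketch (double-check what the search matches before deleting anything):
# Sketch: review the matches first, then delete them using the same spec.
jfrog rt search --spec=test.aql
jfrog rt delete --spec=test.aql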

The JFrog CLI now includes the --include-dirs option for search.
The command:
jf rt search --recursive=false --include-dirs path/
will essentially act like an ls.

By default, the search command returns files; if you want to list directories as well, add the --include-dirs flag.
Refer to the jfrog search documentation for additional parameters.
Here is the command:
jf rt search --recursive=false --include-dirs=true path/
Response:
[
  {
    "path": "artifactory-name/path",
    "type": "folder",
    "created": "",
    "modified": ""
  }
]
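Since the response is plain JSON, it can be piped through jq to print only the paths, similar to the curl-based answer above; a small sketch:
# Sketch: list just the paths of the matched folders.
jf rt search --recursive=false --include-dirs=true path/ | jq -r '.[].path'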

A cleaner approach is to tell Artifactory about builds, and let it discard old ones.
There are three parts to this. My examples are for the jfrog command-line utility:
1. When uploading files with the "jfrog rt upload" command, use the --build-name someBuildName and --build-number someBuildNumber arguments. This links the uploaded files to a particular build.
2. After uploading files, publish the build with "jfrog rt build-publish someBuildName someBuildNumber".
3. To clean up all but the 3 latest builds, use "jfrog rt build-discard --max-builds=3 someBuildName". A combined sketch of the three steps follows.
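A minimal sketch of the three steps together; build name, number, and paths are placeholders, and --delete-artifacts is optional (it removes the artifacts of discarded builds, if your CLI version supports it):
#!/bin/bash
# Sketch of the upload / publish / discard cycle; all names and paths are placeholders.
BUILD_NAME=myApp
BUILD_NUMBER=126

# 1. Upload the artifacts and associate them with a build name/number.
jfrog rt upload "build/output/*" "my-repo/foo/${BUILD_NUMBER}/" \
  --build-name="${BUILD_NAME}" --build-number="${BUILD_NUMBER}"

# 2. Publish the collected build info to Artifactory.
jfrog rt build-publish "${BUILD_NAME}" "${BUILD_NUMBER}"

# 3. Keep only the 3 latest builds (optionally also deleting their artifacts).
jfrog rt build-discard --max-builds=3 --delete-artifacts "${BUILD_NAME}"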

Related

How to avoid Jupyter cell-ids from changing all the time and thereby spamming the VCS diffs?

As discussed in q/66678305, newer Jupyter versions store, in addition to each cell's source code and output, an ID for purposes such as linking to a cell.
However, these IDs aren't stable but often change even when the cell's source code was not touched. As a result, if you have the .ipynb file under version control with e.g. git, the commits end up containing lots of rather funny-sounding “changed lines” that don't correspond to any actual change made in the commit. Like:
  {
    "cell_type": "code",
    "execution_count": null,
-   "id": "respected-breach",
+   "id": "incident-winning",
    "metadata": {},
    "outputs": [],
Is there a way to prevent this?
Answer for Git on Linux. Probably also works on macOS, but not Windows.
It is good practice not to put the .ipynb files as saved by Jupyter under version control, but instead a filtered version that does not contain all the volatile information. For this purpose, various git clean/smudge filters are available; the one I'm using is based on https://github.com/toobaz/ipynb_output_filter/blob/master/ipynb_output_filter.py.
Strangely enough, it turns out this script cannot be modified to remove the "id" field from cells. Namely, if you try to remove that field in the filtering loop, like with
for field in ("prompt_number", "execution_number", "id"):
    if field in cell:
        del cell[field]
then the write function from jupyter_nbformat will just put an id back in. It is possible to merely change the id to something constant, but then Jupyter will complain about nonunique ids.
As a hack to circumvent this, I now use this filter with a simple grep to delete the ID:
#!/bin/bash
grep -v '^ *"id": "[a-z\-]*",$'
Store that in e.g. ~/bin/ipynb_output_filter.sh, make it executable (chmod +x ~/bin/ipynb_output_filter.sh) and ensure you have the following ~/.gitattributes file:
*.ipynb filter=dropoutput_ipynb
and in your git config (either the global ~/.gitconfig or the project config):
[core]
    attributesfile = ~/.gitattributes
[filter "dropoutput_ipynb"]
    clean = ~/bin/ipynb_output_filter.sh
    smudge = cat
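If you'd rather not edit the config files by hand, the same setup can be done with git config commands; a sketch assuming a per-user (global) configuration:
# Sketch: register the attributes file and the clean/smudge filter globally.
git config --global core.attributesfile '~/.gitattributes'
git config --global filter.dropoutput_ipynb.clean '~/bin/ipynb_output_filter.sh'
git config --global filter.dropoutput_ipynb.smudge cat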
If you want to use a standard Python filter in addition to that, you can invoke it before the grep in ~/bin/ipynb_output_filter.sh, like:
#!/bin/bash
~/bin/ipynb_output_filter.py | grep -v '^ *"id": "[a-z\-]*",$'

In Artifactory, can I modify the target paths for AQL download results like I can when using patterns and placeholders?

Is there a way to perform path manipulation on AQL results? I.e., similar to how placeholders can be used to modify a pattern's results when assigning the target path.
I am attempting to download files from a repository like so:
my-generic-repository/myFolder/mySubfolder/(*)
...becomes...
C:\my\crazy\new\location\{1}
...recursively, with the portion of the tree in (*) preserved as {1}.
My download spec looks like:
{
  "files": [
    {
      "aql": {
        "items.find": {
          "repo": "my-generic-repository",
          "path": {
            "$match": "myFolder/mySubFolder/*"
          }
        }
      },
      "flat": "false",
      "recursive": "true"
    }
  ]
}
To run the query I do this:
cd C:\my\crazy\new\location
jfrog rt download --spec=MySpecFile.txt --include-dirs=true --quiet=true
I observe the following file-tree:
C:\my\crazy\new\location\myFolder\mySubfolder\...
...but I want the following file-tree:
C:\my\crazy\new\location\...
Setting flat=true causes the repository's contents to be flattened completely and all files drop into C:\my\crazy\new\location without their hierarchy, which is not useful to me either.
(I'll add more AQL query complexity once I get the basic download to work, which is why I wish to use AQL and not 'pattern')
(side-note: I only get the files that are one subfolder deep or deeper within the specified path, not the files located exactly in the specified path, which is also making me scratch my head)

Submit topology to storm cluster through streamparse

I am trying to use streamparse to develop topologies and submit them to the Storm cluster.
Since streamparse has a default wordcount topology to help users test the cluster, most of the tutorials I could find online are about submitting this default wordcount example to the Storm cluster.
My question is how to submit my own topologies. For example, I have a topology named 'mytopology'. Per streamparse's documentation, I tried
sparse submit --environment prod --name mytopology
and my config file is:
{
  "serializer": "json",
  "topology_specs": "topologies/",
  "virtualenv_specs": "virtualenvs/",
  "envs": {
    "prod": {
      "user": "userx",
      "ssh_password": "mypasswd",
      "nimbus": "10.XXX.XX.210",
      "workers": ["10.XXX.XX.206"],
      "log": {
        "path": "/home/userx/stormapp/splog",
        "max_bytes": 1000000,
        "backup_count": 10,
        "level": "info"
      },
      "virtualenv_root": "/home/userx/stormapp/venv"
    }
  }
}
However, the log showed that
JAR created: _build/wordcount-0.0.1-SNAPSHOT.jar
was created and submitted to Nimbus.
Isn't the
--name mytopology
supposed to find the mytopology.py and build something like mytopology.jar and submit that?
Then I checked the project.clj file; the top line is
defproject wordcount "0.0.1-SNAPSHOT"
Now it is confusing. Should I also configure this file? When I do
sparse submit --environment prod --name mytopology
Does it do something that is related to this file? Please help...
I suppose that you first created your wordcount project using the following command: sparse quickstart wordcount
In this case, "wordcount" will be the name of the topology that will be submitted to Storm using the sparse run command.
Now if you want to submit another topology, say mytopology, you have to create another quickstart project called mytopology and edit the config.json file to suit your technical environment. You cannot just copy and rename the "wordcount" project's folder, as I guess you've done, because "wordcount" appears in your project.clj file.
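A minimal sketch of that workflow; project and environment names are placeholders:
# Sketch: create a fresh streamparse project for the new topology, then configure and submit it.
sparse quickstart mytopology
cd mytopology
# edit config.json to point at your Nimbus host, workers, virtualenv paths, etc.
sparse submit --environment prod --name mytopology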

R: How can I install a specific release by install_github()?

If the current version of a package gives some errors, users may prefer to install a specific release (e.g. version 1.0.1). What kind of R code can be used to achieve that?
Take, for example, the latest release of the OhdsiRTools R package:
https://github.com/OHDSI/OhdsiRTools/tree/v1.0.1
The command would be something like:
install_github("OHDSI/OhdsiRTools", ref = 'v1.0.1')
The code above is not correct. It only works for branches (e.g., master or devA). But the devtools package has functions to refer to releases.
Ideally I would refer to releases by their tag (but solution with commit ID would work too).
EXTRA BONUS: What code can install the "latest" release? (But consider this a bonus question; the question above is the main one.)
You need to append tags for releases directly onto the name of the repository argument. So, username/repo#releasetag will work. Only use the parameter ref = "devA" when you need to refer to a specific branch of the git repository.
For your example, regarding OhdsiRTools v1.0.1, we have:
devtools::install_github("OHDSI/OhdsiRTools#v1.0.1")
Edit
After toying around with the devtools source, it has come to my attention that one can request the latest release with:
username/repo#*release
Hence, you could use:
devtools::install_github("OHDSI/OhdsiRTools#*release")
End Edit
Outdated, see edit
Unfortunately, obtaining the latest release tag is a bit more complicated, as it involves parsing a response from the GitHub API. Here are some notes if you really do need the tagged version... You would have to parse the JSON from:
https://api.github.com/repos/<user>/<repo>/releases/latest
using RJSONIO, jsonlite, or rjson
To extract "tag_name" from:
{
  "url": "https://api.github.com/repos/OHDSI/OhdsiRTools/releases/2144150",
  "assets_url": "https://api.github.com/repos/OHDSI/OhdsiRTools/releases/2144150/assets",
  "upload_url": "https://uploads.github.com/repos/OHDSI/OhdsiRTools/releases/2144150/assets{?name,label}",
  "html_url": "https://github.com/OHDSI/OhdsiRTools/releases/tag/v1.0.1",
  "id": 2144150,
  "tag_name": "v1.0.1",
  "target_commitish": "master",
  "name": "Minor bug fix",
  "draft": false,
  "author": {
    "login": "schuemie",
    "id": 6713328,
    "avatar_url": "https://avatars.githubusercontent.com/u/6713328?v=3",
    "gravatar_id": "",
    "url": "https://api.github.com/users/schuemie",
    "html_url": "https://github.com/schuemie",
    "followers_url": "https://api.github.com/users/schuemie/followers",
    "following_url": "https://api.github.com/users/schuemie/following{/other_user}",
    "gists_url": "https://api.github.com/users/schuemie/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/schuemie/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/schuemie/subscriptions",
    "organizations_url": "https://api.github.com/users/schuemie/orgs",
    "repos_url": "https://api.github.com/users/schuemie/repos",
    "events_url": "https://api.github.com/users/schuemie/events{/privacy}",
    "received_events_url": "https://api.github.com/users/schuemie/received_events",
    "type": "User",
    "site_admin": false
  },
  "prerelease": false,
  "created_at": "2015-11-18T00:55:28Z",
  "published_at": "2015-11-18T06:35:57Z",
  "assets": [],
  "tarball_url": "https://api.github.com/repos/OHDSI/OhdsiRTools/tarball/v1.0.1",
  "zipball_url": "https://api.github.com/repos/OHDSI/OhdsiRTools/zipball/v1.0.1",
  "body": "Fixed bug in `convertArgsToList ` function."
}
The above is taken from https://api.github.com/repos/OHDSI/OhdsiRTools/releases/latest
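If you only need the tag from the command line, a small sketch with curl and jq (this assumes the remotes package and jq are installed, and uses the @tag form rather than the # form shown above):
# Sketch: look up the latest release tag via the GitHub API, then install that tag.
tag=$(curl -s https://api.github.com/repos/OHDSI/OhdsiRTools/releases/latest | jq -r '.tag_name')
Rscript -e "remotes::install_github('OHDSI/OhdsiRTools@${tag}')"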
For anyone who arrives here looking for how to install from a specific commit SHA, it's simply:
remotes::install_github("username/repository#commitSHA")
Example
Look for the SHA of the commit you want to install on the 'commits' page on GitHub:
In this case the commit SHA is: 8bc79ec6dd57f46f753cc073a3a50e0921825260, so simply:
remotes::install_github("wilkelab/ggtext#8bc79ec6dd57f46f753cc073a3a50e0921825260")

Globbing file pattern

I am new to the Grunt task runner. I'm trying to do some file matching in one of my configurations. With this file matching, I need to ignore all of my test files except for one. The one test file that I need to keep is named 'basic.test.js'. In an attempt to do this, I currently have the following configuration:
files: [
    'src/**/*.js',
    '!src/**/*.test.js',
    'src/root/basic.test.js'
]
At this time, I am still getting ALL of my tests. This means that my tests in the other test files are still being seen. I'm trying to confirm if I'm doing my globbing pattern correctly. Does my globbing pattern look correct for my scenario?
Thank you!
If you only want one test, then there is no need to match and then unmatch all the others. Just include the one test:
files: [
    'src/root/basic.test.js'
]
