How to determine the last part of a URL with jq?

I have to distinguish between the following two paths.
shorter: https://www.example.com/
longer: https://www.example.com/foo/
In a Bash script, using Bash parameter expansion as follows returns a value only for the longer one.
$ url1=https://www.example.com/
$ url2=https://www.example.com/foo/
$ cut -d/ -f4 <<<${url1%/*}   # this returns nothing
$ cut -d/ -f4 <<<${url2%/*}   # this returns the last part of the path
foo
So the longer one can be identified in a Bash script, but now I have to define the same filter for a JSON value handled in jq.
If jq could be written like the following, my goal would be achieved...
jq '. | select( .url | (cut -d/ -f4 <<< ${url2%/*}) != null )'
But that cannot be done. How can I do it?

jq has many string-handling functions -- one could do worse than checking the jq manual. For the task at hand, using a regex function would probably be best, but since you mentioned cut -d/ -f4, it might be of interest to note that much the same effect can be achieved by:
split("/")[3]
For the last non-trivial part you could consider:
sub("/ *$";"") | split("/")[-1]

Related

jq combine output on a single line separated by space

I am trying to run a jq query on a Windows machine, and it extracts values from the output onto separate lines.
jq -r .Accounts[].Id
Output
204359864429
224271824096
282276286062
210394168456
090161402717
How do I run the jq query so that it combines the output on a single line separated by spaces?
This is what I need:
204359864429 224271824096 282276286062 210394168456 090161402717
Any help would be appreciated.
The usual way would be to use the @csv or @tsv operators to convert the result to CSV or tab-delimited format. These operators need the result to be contained in an array. Since in your case the delimiter should be a single space, we can instead do a simple join(" ") operation:
jq -r '[.Accounts[].Id]|join(" ")'
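For example, given input of the assumed shape (the {"Accounts": [{"Id": ...}]} structure is inferred from the query in the question):
$ echo '{"Accounts":[{"Id":"204359864429"},{"Id":"224271824096"}]}' | jq -r '[.Accounts[].Id]|join(" ")'
204359864429 224271824096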
You can use the @sh formatter (wrapping the results in an array so they come out on one line):
jq -r "[.Accounts[].Id] | @sh"
From the jq docs:
The input is escaped suitable for use in a command-line for a POSIX shell. If the input is an array, the output will be a series of space-separated strings.
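For example (using the same assumed {"Accounts": [{"Id": ...}]} input shape as above); note that @sh single-quotes each item:
$ echo '{"Accounts":[{"Id":"204359864429"},{"Id":"224271824096"}]}' | jq -r '[.Accounts[].Id] | @sh'
'204359864429' '224271824096'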
Reference:
https://stedolan.github.io/jq/manual/#Basicfilters
At first I thought the join() solution above did not work. Then I realized that I was "overfeeding" the join() filter, causing it to fail because I was providing more than a simple array as input. I had concatenated several filters with , and failed to limit the scope of my join().
Did not work:
jq -r \
'.ansible_facts |
.ansible_hostname,
.ansible_all_ipv4_addresses | join(" "),
.ansible_local."aws-public-ipv4".publicIP'
This gave the error,
jq: error (at <stdin>:0): Cannot iterate over string ("hostone")
because jq was attempting to "consume" not only ansible_all_ipv4_addresses but also the output of the preceding ansible_hostname filter (the comma operator binds more tightly than the pipe, so the join(" ") was applied to both).
Does work:
jq -r \
'.ansible_facts |
.ansible_hostname,
(.ansible_all_ipv4_addresses | join(" ")),
.ansible_local."aws-public-ipv4".publicIP'
Here, I restrict join() to .ansible_all_ipv4_addresses only (ansible_all_ipv4_addresses is an array of IP addresses I wish to translate into a single, space-separated string).
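A quick way to see this precedence in action (a minimal demonstration of my own, not from the original post): the comma binds more tightly than the pipe, so a, b | c parses as (a, b) | c, and the pipe consumes both outputs.
$ jq -n '1, 2 | . + 10'
11
12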
P.S.: I found that the @sh filter produces space-separated output as desired, but in addition delimits each output item in single quotes.
P.P.S.:
Here was my workaround, until I discovered that join() works just as well when used properly (see above):
jq -r '[.Accounts[].Id] | @tsv | sub("\t";" ";"g")'
Explanation: the @tsv filter produces tab-separated values (it, too, requires an array as input), then the sub() filter substitutes tabs with spaces, globally.

jq in CLI create error when I want to parse the output

Using Home Assistant 0.92 to test my CLI for creating automated backups. After a successful backup, the command responds with an output, and I need to catch that value. I'm trying to use jq to parse it but only get an error.
$ hassio snapshots new --name"Testbackup"
This gives an output of slug: 07afd144 and I want to catch 07afd144
Tried following:
$ hassio snapshots new --name="Testbackup" | jq --raw-output '.data.slug'
This gives an output of parse error: Invalid numeric literal at line 1, column 5
The final result is planned to be:
slug=$(hassio snapshots new --name="${name}" | jq --raw-output '.data.slug')
so that ${slug} holds 07afd144.
What am I doing wrong?
jq is a tool for parsing and transforming JSON documents. What you have shown is not legal JSON. It is however a legal YAML document and can be transformed with yq. yq uses jq-like syntax, but can handle JSON, YAML, XML, and CSV files.
slug=$(hassio snapshots new --name="${name}" | yq '.slug')
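For example (assuming the Go-based yq by Mike Farah, with v4 syntax):
$ echo 'slug: 07afd144' | yq '.slug'
07afd144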
slug: 07afd144 isn't valid JSON and as such cannot be parsed with jq. Furthermore, it doesn't contain a data property anywhere, so .data.slug doesn't make sense.
If the format is always this simple (property name, colon, space, value), then the value can be easily extracted with other common tools generally available on GNU+Linux systems (a sample invocation is shown after the list):
cut (different invocations possible):
cut -d' ' -f2-
cut -c7-
awk:
awk '{print $2}'
sed:
sed 's/^slug: //'
perl:
perl -lane 'print $F[1]'
or even grep (different invocations possible):
grep -o '[^ ]*$'
grep -o '[[:xdigit:]]*$'
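Each of these reads the line on stdin and emits just the value, e.g.:
$ echo 'slug: 07afd144' | cut -d' ' -f2-
07afd144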

Divide the result of two grep and word count

I have a log file and I would like to divide the result of one grep and count by another grep and count.
$ echo $((cat log2.txt | grep timed\|error\|Error | wc -l)/(cat log2.txt | grep Duration | wc -l))
zsh: bad math expression: operator expected at `log2.txt |...'
It's ugly, it doesn't work, and I can probably do it in a better way, but I don't know how.
Also, I would like to know if it is possible to do this incrementally on a log stream read by tail, for example.
First of all, you should know that grep | wc -l counts the number of matched lines, not the number of occurrences; I hope this is what you really want.
Regarding your requirement: indeed, your approach is ugly (7 processes), apart from the mistakes. The job can be done with a single awk one-liner:
awk '/timed|[Ee]rror/{a++}/Duration/{b++}END{printf "%.2f\n",a/b}' log2.txt
The above line calculates the result based on matched number of lines, same as your grep|wc -l.
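As for doing this incrementally on a stream read by tail: here is a sketch (my own, not part of the original answer) that re-prints the running ratio on every input line once the denominator is non-zero:
$ tail -f log2.txt | awk '/timed|[Ee]rror/{a++} /Duration/{b++} b{printf "%.2f\n", a/b}'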
You have several problems:
You are trying to run shell commands directly inside an arithmetic expression.
You aren't passing the correct regular expression to grep.
You need to make sure at least one of the operands is a floating-point value to trigger zsh's floating-point division.
Each pipeline can also be reduced to a single command; use input redirection instead of cat, and use the -c option to get the number of lines that match the regular expression.
echo $(( 1.0 * $(grep -c 'timed\|error\|Error' log2.txt) / $(grep -c Duration log2.txt) ))
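To see why the floating-point operand matters (a quick zsh demonstration):
$ echo $(( 3 / 2 ))
1
$ echo $(( 1.0 * 3 / 2 ))
1.5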
Basic regular expressions treat unescaped | as a literal character, not an alternation operator.
$ echo foo | grep foo\|bar
$ echo foo | grep foo\\\|bar # Pass a literal backslash as part of the regex
foo
$ echo foo | grep 'foo\|bar' # Use '...' instead of explicitly escaping \ and |
foo
$ echo foo | grep -E 'foo|bar' # Use extended regular expressions instead
foo

When is it allowed to omit the dot filter in jq?

I do not understand when it is allowed to omit the dot expression.
It is possible to convert every line of raw input into a JSON string:
$ echo -e "a\nb" | jq -Rc .
"a"
"b"
In that example it makes no difference when the dot expression is missing:
$ echo -e "a\nb" | jq -Rc
"a"
"b"
Next I can read the output from the first jq and slurp it into an array:
$ echo -e "a\nb" | jq -Rc . | jq -sc .
["a","b"]
Here it also makes no difference when I omit the dot expression:
$ echo -e "a\nb" | jq -Rc . | jq -sc
["a","b"]
But when I omit both dot expressions, I get a usage error and an empty array as the result:
$ echo -e "a\nb" | jq -Rc | jq -sc
jq - commandline JSON processor [version 1.5]
Usage: jq [options] <jq filter> [file...]
...
[]
Why?
Before directly answering the question, I'd like to clarify that:
It is always acceptable to specify a filter explicitly.
Some versions of jq expect that a filter will be specified explicitly.
Different versions of jq behave differently in the absence of an explicit filter.
The main idea guiding jq's evolution with regard to interpreting the absence of a filter intelligently has been that if there's something to read on STDIN, and if a filter has not been specified explicitly, and if it looks like you meant ., then assume you did mean ..
The answer to the question, then, is that the perplexing behavior noted in the question is a bug in a particular version of jq.
(Or if you like, the perplexing behavior reflects the difficulties that arise when developers seek to endow software with the ability to read your mind.)
By the way, the bug has been fixed:
$ jq --version
jq-1.5rc2-150-g1740fd0
$ echo -e "a\nb" | jq -Rc | jq -sc
["a","b"]
The answer is in the rest of the usage text:
Usage: jq [options] <jq filter> [file...]
A filter should therefore be mandatory: a filter takes an input and produces an output. But much of the time you don't need to transform the input and just want the result printed, so . became the default filter (see the GitHub issue; I believe this was introduced in 1.5, and before that you had to include the filter explicitly).
So it should make no difference whether . is given explicitly or applied as the default; unfortunately, the problem lies in how jq detects whether stdin and stdout are connected to a pipe or a terminal. You can read the details in the GitHub issue:
Maybe we should print the usage message only when the program is empty, and stdin and stdout are both terminals? That is, assume . when stdin is not a terminal or when stdout is not a terminal.
So the rule is:
if you want to be a perfectionist, always use a filter, even when . is the filter you want;
if you want the result of your command to be the input of another pipe, you must specify the filter, even if you just want the same result to be taken as input by the next command.
Thus:
echo -e "a\nb" | jq -Rc > test.txt will produce an error, but echo -e "a\nb" | jq -Rc . > test.txt will write the result of the command into the file.

unix awk repeated pattern

I have the following variable. I want to search it with the pattern "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/".
export str='16/02/02 11:29:22 INFO mortbay.log: State being saved: {"#class":"com.paypal.fpti.hadoop.copy.FPTICopyState","timestamp":0,"state":"Running","name":"com.paypal.fpti.hadoop.copy.FPTICopyState","id":"99c7cba7-d211-4845-97a1-c34168a91b22","subStates":{"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-4_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"99034acb-cfad-41a0-89ed-e2731b1f82ec","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-4","sourceDir":"/fpti/v2/hdfs_writer_4//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"40325dec-0fe2-4025-8258-f896f957ddf0","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data","sourceDir":"/fpti/v2/hdfs_writer//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-1_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5216f8c1-2cfa-4eac-a390-f4d2bcd6584f","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-1","sourceDir":"/fpti/v2/hdfs_writer_1//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-2_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5fcd0b6e-3df9-4f82-a76f-bc8ff1493623","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-2","sourceDir":"/fpti/v2/hdfs_writer_2//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-3_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"6ec9223a-fcf0-447a-b9ae-2020e3232f6d","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-3","sourceDir":"/fpti/v2/hdfs_writer_3//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-5_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"d123742c-8a55-4e25-bfa0-0a97f6ed25d7","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-5","sourceDir":"/fpti/v2/hdfs_writer_5//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"}},"copystate":"CopyToLocalDone","start":"2016-02-02T11:21:24.678Z","end":null,"window":"2016-02-02T10:00:00.000Z","retryCount":0}'
I tried the following, but it gives only the first occurrence:
[ggangadharan#phxbastion2 ~]$ echo $str | awk '{match($0, "/x/home[/,a-z,0-9,_]+*", a)}END{print a[0]}'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
but i want output like below.
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Can somebody help me with how to use awk for this scenario?
Thanks in advance.
I'm not sure how to hack this in awk, but you can safely use egrep here (quoting the pattern so the shell leaves it alone):
$ echo "$str" | egrep -o '/x/home[/,a-z,0-9,_]+*'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Using "significant splitting" in AWK:
$ awk -v RS="\"" '/\/x\/home\/pp_dt_fpti_batch\/stampy_copy_orchestration\//' <<< "$str"
which gives
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
You specified /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/ for your search pattern, so I used that. If you want something different, use something different.
This separates input into records by a quote " (set RS to ", escaped in the shell). Any record matching the regular expression is printed. Input is given from the shell with the string $str. Maybe this is more readable:
$ awk -v RS='"' '/regexp/' <<< "$str"
Here are two approaches using a JSON-aware command-line tool, here jq.
In both cases we assume that the string of interest is embedded in the
JSON object contained in $str
(1) In the following, we simply pretty-print the JSON object and grep for
the string of interest in case it appears in a surprising spot. Further trimming of the result can easily be done (e.g. using sed) as desired:
$ sed 's/^[^{]*//' <<< "$str" | jq '.[]' | fgrep /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"
(2) The following query is appropriate if we are only interested in a
match if it occurs in an object as a value corresponding to the key "localDir":
sed 's/^[^{]*//' <<< "$str" |
jq -r '..
| select(.localDir?)
| .localDir
| select(test("/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"))'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
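A more compact variant of (2), collecting every "localDir" value before testing it (a sketch of my own, under the same assumptions about the embedded JSON):
sed 's/^[^{]*//' <<< "$str" |
jq -r '.. | .localDir? // empty | select(test("/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"))'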
