Sed: replacing a pattern that starts with but does not contain a string - unix

I have the following text file: file.text
Cygwin
value: c
Unix-keep
value: u
Linux
value: l
Unix-16
value: u
Solaris
value: s
Unix-replace-1
value: u
Unix-replace-2
value: u
I want to replace the value lines based on the previous line: lines whose previous line starts with Unix- but does not contain the string keep.
So in the end the affected lines should be the ones after Unix-16, Unix-replace-1 and Unix-replace-2, each set to value: NEW_VERSION.
Expected output:
Cygwin
value: c
Unix-keep
value: u
Linux
value: l
Unix-16
value: NEW_VERSION
Solaris
value: s
Unix-replace-1
value: NEW_VERSION
Unix-replace-2
value: NEW_VERSION
I tried the following sed script:
sed '/^Unix-/ {n;s/.*/value: NEW_VERSION/}' file.text
but this only matches lines starting with Unix-; it does not exclude the ones containing keep, and I am not sure how to combine the exclusion.
Any ideas?

$ awk -v new='NEW_VERSION' 'f{$0=$1 FS new} {f=(/^Unix-/ && !/keep/)} 1' file
Cygwin
value: c
Unix-keep
value: u
Linux
value: l
Unix-16
value: NEW_VERSION
Solaris
value: s
Unix-replace-1
value: NEW_VERSION
Unix-replace-2
value: NEW_VERSION
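For reference, the exclusion the question asks about can also be expressed in sed itself by nesting a negated address inside the /^Unix-/ block (a sketch; works with GNU sed, other seds may want explicit semicolons before the closing braces):
sed '/^Unix-/{/keep/!{n;s/.*/value: NEW_VERSION/}}' file.text
The outer address selects lines starting with Unix-; the inner /keep/! block runs only when the line does not also contain keep.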

Using a proper XML parser (the file was originally plain XML before the question was edited by the OP):
The XML, edited to be valid (the closing tags were missing):
<project>
<Cygwin>
<version>c</version>
</Cygwin>
<Unix-keep>
<version>u</version>
</Unix-keep>
<Linux>
<version>l</version>
</Linux>
<Solaris>
<version>s</version>
</Solaris>
<Unix-replace-1>
<version>u</version>
</Unix-replace-1>
<AIX>
<version>a</version>
</AIX>
<Unix-replace-2>
<version>u</version>
</Unix-replace-2>
</project>
The command:
xmlstarlet ed -u '//project/*
[starts-with(name(), "Unix") and not(starts-with(name(), "Unix-keep"))]
/version
' -v 'NEW_VERSION' file
To edit the file in place, use
xmlstarlet ed -L -u ...
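That is, the in-place form is just the command above with -L added:
xmlstarlet ed -L -u '//project/*[starts-with(name(), "Unix") and not(starts-with(name(), "Unix-keep"))]/version' -v 'NEW_VERSION' file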

Using kislyuk/yq returns syntax error, unexpected INVALID_CHARACTER with additional /0 at the end

I am using kislyuk/yq - the more commonly discussed version, which is a wrapper over jq, written in Python and using the PyYAML library for YAML parsing.
The version is yq 2.12.2
My jq is jq-1.6
I'm using ubuntu and bash scripts to do my parsing.
I wrote this in bash:
alias=alias1
token=abc
yq -y -i ".tokens += { $alias: { value: $token }}" /root/.github.yml
I get the following error
jq: error: abc/0 is not defined at <top-level>, line 1:
.tokens += { alias1: { value: abc }}
I don't get it. Why would there be a /0 at the end?
The problem is that abc is not interpreted as a literal string once the double quotes are expanded by the shell. The underlying jq wrapper then tries to resolve abc as a built-in or user-defined function, which it cannot, hence the error. (The /0 is jq's notation for a function's arity: abc/0 means a function named abc taking zero arguments.)
A JSON string (which is what jq needs here) must be quoted with ".." to be consistent with the JSON grammar. One way is to pass the values from the command line with --arg:
yq -y -i --arg t "$token" --arg a "$alias" '.tokens += { ($a): { value: $t } }' /root/.github.yml
Or have a quoting mess like below, which I don't recommend at all
yq -y -i '.tokens += { "'"$alias"'": { value: "'"$token"'" }}' /root/.github.yml
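For illustration, if /root/.github.yml initially contained nothing but an empty tokens: {} mapping (hypothetical content), the --arg form above should rewrite it to:
tokens:
  alias1:
    value: abc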

how to read a tab-separated line into variables

I wrote this function in zsh
function test() {
  test="a\tb\tc"
  while IFS=$'\t' read -r user host key; do
    echo "user: $user"
    echo "host: $host"
    echo "key: $key"
  done <<< "$test"
}
The output was:
user: a b c
host:
key:
if instead of
... IFS=$'\t' read -r ...
I change it to
... IFS='\t' read -r ...
the output is
user: a
host:
key: b c
Just what is going on?
I would like to read the tab separated line and set my variables accordingly.
Changing the double quotes to $'...' (single quotes preceded by a $) in the assignment of $test fixes it:
test=$'a\tb\tc'
Here is the zsh manual for QUOTING (double-quoting and $'...'):
QUOTING
...
A string enclosed between $' and ' is processed the same way as the string arguments of the print builtin
...
Inside double quotes (""), parameter and command substitution occur, and \ quotes the characters \, `, ", $, and the first character of $histchars (default !).
--- zshmisc(1), QUOTING
For example:
"\$" -> $, "\!" -> ! etc.
"\t" -> \t (zsh does not recognize as tab this case), "\a" -> \a etc.
It does not treat the escape sequence \t as a tab when it is used inside double quotes, so "a\tb\tc" does not mean a<TAB>b<TAB>c. (But things are a little more complicated: the builtin echo recognizes the escape sequence \t.)
(1) ... IFS=$'\t' read -r ... (the original form)
Because the expansion of "$test" does not contain any tab characters, read assigns the whole line to $user:
user: a b c
host:
key:
(But echo recognizes the escape sequence \t as the tab.)
(2) ... IFS='\t' read -r ...
Again, the expansion of "$test" does not contain any tab characters, so read splits the fields on \ and t according to $IFS.
a\tb\tc splits into a (to $user), \ (separator), an empty field (to $host), t (separator), and the rest of the line (b\tc to $key):
user: a
host:
key: b c
(But again, echo recognizes the escape sequence \t as the tab.)
Here is the code changed from test="..." to test=$'...':
function test() {
  test=$'a\tb\tc'
  while IFS=$'\t' read -r user host key; do
    echo "user: $user"
    echo "host: $host"
    echo "key: $key"
  done <<< "$test"
}
test
The output is:
user: a
host: b
key: c
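A quick way to see the difference is to dump the bytes with od -c (a standard utility); the $'...' form holds real tab characters, the double-quoted form holds literal backslash-t pairs:
printf '%s\n' $'a\tb\tc' | od -c   # a \t b \t c \n
printf '%s\n' "a\tb\tc" | od -c    # a \ t b \ t c \n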
PS: it is worth reading POSIX's Quoting specification, which is simpler than zsh's (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02)

Select version string from JSON array and increment it by one using jq

A bash script to find the tags in an ECR repo:
aws ecr describe-images --repository-name laplacelab-backend-repo \
  --query 'sort_by(imageDetails, &imagePushedAt)[*]' \
  --output json | jq -r '.[].imageTags'
Output:
[
"v1",
"sometag",
...
]
How can I extract the version number? Only the version tag has the form v<number>. I need to get the number and increment it to set the next version in a variable. For a repo with images, sort_by(imageDetails, &imagePushedAt)[*] returns something like this (set 1):
[
{
"registryId": "057296704062",
"repositoryName": "laplacelab-backend-repo",
"imageDigest": "sha256:c14685cf0be7bf7ab1b42f529ca13fe2e9ce00030427d8122928bf2d46063bb7",
"imageTags": [
"v1"
],
"imageSizeInBytes": 351676915,
"imagePushedAt": 1593514683.0
}
]
Set 2: for an empty repo, sort_by(imageDetails, &imagePushedAt)[*] returns [] instead.
As a result, I am trying to get a variable VERSION holding the next version for an update, or 1 if the repo is empty.
You could use the select() function on the imageTags array to get only the tag starting with v, then increment it.
jq '( .[].imageTags[] | select(startswith("v")) | ltrimstr("v") | tonumber | .+1 ) // 1'
For other cases, like the tags array being empty or containing null strings (error case), the value defaults to 1.
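As a quick check of the filter against the two input shapes described above (sample run with jq 1.6):
$ echo '[{"imageTags": ["v1", "sometag"]}]' | jq '( .[].imageTags[] | select(startswith("v")) | ltrimstr("v") | tonumber | .+1 ) // 1'
2
$ echo '[]' | jq '( .[].imageTags[] | select(startswith("v")) | ltrimstr("v") | tonumber | .+1 ) // 1'
1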
For storing it into a variable, e.g. version (avoid uppercase variable names in user scripts), use command substitution. See How do I set a variable to the output of a command in Bash?
version=$( <your-pipeline> )
Note: this does not work well with version strings following the Semantic Versioning specification, e.g. v1.2.1, as jq does not have a library to parse them.
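Putting the pieces together, a sketch of the whole pipeline (same repository name as in the question):
version=$(aws ecr describe-images --repository-name laplacelab-backend-repo \
  --query 'sort_by(imageDetails, &imagePushedAt)[*]' --output json |
  jq '( .[].imageTags[] | select(startswith("v")) | ltrimstr("v") | tonumber | .+1 ) // 1')
echo "next tag: v${version}"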

Loop over environment variables in POSIX sh

I need to loop over environment variables and get their names and values in POSIX sh (not bash). This is what I have so far.
#!/usr/bin/env sh
# Loop over each line from the env command
while read -r line; do
  # Get the string before = (the var name)
  name="${line%=*}"
  eval value="\$$name"
  echo "name: ${name}, value: ${value}"
done <<EOF
$(env)
EOF
It works most of the time, except when an environment variable contains a newline. I need it to work in that case.
I am aware of the -0 flag for env that separates variables with NUL bytes instead of newlines, but if I use that flag, how do I loop over each variable? Edit: @chepner pointed out that POSIX env doesn't support -0, so that's out.
Any solution that uses portable Linux utilities is fine as long as it works in POSIX sh.
There is no way to parse the output of env with complete confidence; consider this output:
bar=3
baz=9
I can produce that with two different environments:
$ env -i "bar=3" "baz=9"
bar=3
baz=9
$ env -i "bar=3
> baz=9"
bar=3
baz=9
Is that two environment variables, bar and baz, with simple numeric values, or is it one variable bar with the value $'3\nbaz=9' (to use bash's ANSI quoting style)?
You can safely access the environment with POSIX awk, however, using the ENVIRON array. For example:
awk 'END { for (name in ENVIRON) {
    print "Name is " name;
    print "Value is " ENVIRON[name];
  }
}' < /dev/null
With this command, you can distinguish between the two environments mentioned above.
$ env -i "bar=3" "baz=9" awk 'END { for (name in ENVIRON) { print "Name is "name; print "Value is "ENVIRON[name]; }}' < /dev/null
Name is baz
Value is 9
Name is bar
Value is 3
$ env -i "bar=3
> baz=9" awk 'END { for (name in ENVIRON) { print "Name is "name; print "Value is "ENVIRON[name]; }}' < /dev/null
Name is bar
Value is 3
baz=9
Maybe this would work?
#!/usr/bin/env sh
env | while IFS= read -r line
do
  name="${line%%=*}"
  indirect_presence="$(eval echo "\${$name+x}")"
  [ -z "$name" ] || [ -z "$indirect_presence" ] || echo "name:$name, value:$(eval echo "\$$name")"
done
It is not bullet-proof: if the value of a variable containing a newline happens to include a line that looks like an assignment, it can be confused.
The expansion uses %% to remove the longest matching suffix, so if a line contains several = signs, everything from the first = onward is removed, leaving only the variable name at the beginning of the line.
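A quick illustration of the difference between %% (longest match, used here) and the question's original % (shortest match):
line='FOO=a=b=c'
printf '%s\n' "${line%%=*}"   # FOO      - everything from the first = is removed
printf '%s\n' "${line%=*}"    # FOO=a=b  - only the last =... suffix is removed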
Here is an example based on the awk approach:
#!/bin/sh
for NAME in $(awk "END { for (name in ENVIRON) { print name; }}" < /dev/null)
do
  VAL="$(awk "END { printf \"%s\", ENVIRON[\"$NAME\"]; }" < /dev/null)"
  echo "$NAME=$VAL"
done

How to extract text which appears one or more times in each line of a file?

I have a text file which has one or more ids in each line, e.g.
id:123, name:test, id: 5678, name john, address:new york
id:567, name:bob
id:3643, name:meg, id: 6721, name kate, address:la
Now, the problem is that id:value may appear one or more times in a single line. How do I extract all id:value pairs so that the output is:
id:123, id:5678
id:567
id:3643, id:6721
I tried egrep -o but that puts each id:value pair on a separate line.
sed/awk should do the trick but I am a noob.
I do not want to use Perl as that would require a Perl installation.
EDIT:
On further analysis of the data files, I am seeing inconsistent separators, i.e. not all lines are comma-separated. Some are even separated with : and |. Also, , appears within the address value field, e.g. address:52nd st, new york. Can this be done in awk using a regex?
If your content is in the file test.txt then the following command:
cat test.txt | sed 's/ *: */:/g' | grep -o 'id:[0-9]*'
will return:
id:123
id:5678
id:567
id:3643
id:6721
The sed command is to remove any spaces adjacent to the colon, yielding an output of:
id:123, name:test, id:5678, name john, address:new york
id:567, name:bob
id:3643, name:meg, id:6721, name kate, address:la
and the grep -o command finds all matches of id: followed by zero or more digits, with -o returning only the matching part of each line.
Per the man page:
-o, --only-matching Print only the matched (non-empty) parts of a matching
line, with each such part on a separate output line.
(FYI, the grep and sed commands are using regular expressions.)
EDIT:
Sorry, I didn't read carefully. I see that you object to the -o output format of one value per line. Back to the drawing board...
Note: If the reason you are opposed to the -o output is to preserve line numbers, using grep -no will give the following output (where the first number is the line number):
1:id:123
1:id:5678
2:id:567
3:id:3643
3:id:6721
Maybe that helps?
This might work for you (GNU sed):
sed -r 's/\<id:\s*/\n/g;s/,[^\n]*//g;s/\n/, id:/g;s/^, //' file
Convert the word id: and any following spaces to a unique token (in this case \n). Delete anything following a , up to the next \n. Replace each \n with , id: and then delete the leading comma and space.
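Run against the sample file, that should print exactly the requested format:
id:123, id:5678
id:567
id:3643, id:6721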
This should work:
awk -F, '{id=0;for(i=1;i<=NF;i++) if($i~/id:/) id=id?id FS $i:$i; print id}' file
Test:
$ cat file
id:123, name:test, id: 5678, name john, address:new york
id:567, name:bob
id:3643, name:meg, id: 6721, name kate, address:la
$ awk -F, '{id=0;for(i=1;i<=NF;i++) if($i~/id:/) id=id?id FS $i:$i; print id}' file
id:123, id: 5678
id:567
id:3643, id: 6721
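To address the edit about inconsistent separators (:, |, commas inside the address value), splitting on fields is fragile. A hedged alternative is to scan each line for id: followed by digits with POSIX awk's match()/RSTART/RLENGTH, ignoring the separators entirely:
awk '{
  out = ""
  s = $0
  while (match(s, /id:[ ]*[0-9]+/)) {   # next id:<digits>, optional spaces after the colon
    m = substr(s, RSTART, RLENGTH)
    gsub(/ /, "", m)                    # normalize "id: 5678" to "id:5678"
    out = out ? out ", " m : m
    s = substr(s, RSTART + RLENGTH)     # keep scanning after this match
  }
  print out
}' file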
perl -lne 'push @a,/id:[^,]*/g;print "@a";undef @a' your_file
Tested Below:
> cat temp
id:123, name:test, id: 5678, name john, address:new york
id:567, name:bob
id:3643, name:meg, id: 6721, name kate, address:la
> perl -lne 'push @a,/id:[^,]*/g;print "@a";undef @a' temp
id:123 id: 5678
id:567
id:3643 id: 6721
>
This is just a variation of an answer already given. I personally prefer the script version in a file over the command line (better control, readability).
id.txt
id:1, name:test, id:2, name john, address:new york
id:3, name:bob
id:4, name:meg, id:5, name kate, address:la
id.awk
{
    id = ""
    for (i = 1; i <= NF; i++)
        if ($i ~ /id:/)
            id = id ? id " " $i : $i
    print id
}
call: awk -f id.awk id.txt
output:
id:1, id:2,
id:3,
id:4, id:5,
