Why is the first line skipped? - jq

Why is the first line skipped if I remove echo "prevent remove first line"?
(echo "prevent remove first line" && find ~/somedir -mindepth 1 -maxdepth 1 -type d) | /usr/local/bin/jq -nR \
'{
"items": [
inputs |
inputs as $title |
{
"title": $title,
}
]
}'

Your example should be:
find ~/somedir -mindepth 1 -maxdepth 1 -type d | /usr/local/bin/jq -nR \
'{
"items": [
inputs as $title |
{
"title": $title,
}
]
}'
From the jq man page:
inputs
Outputs all remaining inputs, one by one.
This is primarily useful for reductions over a program's inputs.
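This implicitly means that every time inputs is evaluated, it reads lines from stdin. stdin is a pipe, so once a line has been read, it is gone. In inputs | inputs as $title, the first inputs reads the first line and hands it to the second inputs, which ignores it and reads all the remaining lines, so the first line is swallowed.
When you echo an extra line in front of the real input, that extra line is the one swallowed by the extra read. Put two inputs in front of inputs as $title and the problem comes back: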
(echo "test" && find ~/somedir -mindepth 1 -maxdepth 1 -type d) | /usr/local/bin/jq -nR \
'{
"items": [
inputs | inputs |
inputs as $title |
{
"title": $title,
}
]
}'
PS: As pmf pointed out, the whole solution can be simplified to:
find ~/somedir -mindepth 1 -maxdepth 1 -type d |
jq -nR '{items: [{title: inputs}]}'
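A quick way to watch a line disappear, assuming jq is installed and on the PATH: feed three lines in, compare a single inputs with two chained ones, then run pmf's shortened filter on similar input.

```shell
#!/usr/bin/env bash
# -n: jq reads nothing up front, so every line is pulled in by `inputs`.
# -R: each input line is taken as a raw string; -c: compact output.
single=$(printf 'a\nb\nc\n' | jq -cnR '[inputs]')

# The outer `inputs` pulls "a" and feeds it to the inner `inputs`, which
# ignores it and pulls everything that is left, so "a" is lost.
chained=$(printf 'a\nb\nc\n' | jq -cnR '[inputs | inputs]')

# pmf's simplified filter: one {title: ...} object per input line.
items=$(printf 'x\ny\n' | jq -cnR '{items: [{title: inputs}]}')

echo "$single"    # ["a","b","c"]
echo "$chained"   # ["b","c"]
echo "$items"     # {"items":[{"title":"x"},{"title":"y"}]}
```

Without -n, jq would also hand the first line to the filter as its input (.), which is another way for it to go missing when the filter only uses inputs.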

Related

Pass variable from bash to R with commandArgs

I'm having a terrible go trying to pass some variables from the shell to R. I am hesitant to post this because I can't figure out a reasonable way to make this reproducible, since it involves a tool that has to be downloaded, and really it's more of a general methodology issue that I don't think needs to be reproducible, if you can just suspend your disbelief and bear with me for a quick minute.
I have arguments that are defined in a bash script: $P, $G, and $O.
I have some if/then statements and everything is fine until I get to the $O options.
This is the first part of the $O section and it works fine. It grabs data from $P and passes it to the twoBitToFa utility from UCSC's genome project and outputs the data correctly in a .fa file. Beautiful. (Although I think using 'stdout' and '>' is perhaps redundant?)
if [ "$O" = "fasta" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout > "${P%.bed}".fa
fi
The next section is where I am stuck. If the $O option is "bed", then I want to invoke the Rscript command and pass my stuff over to R. I am able to pass my $P, $G, and $O variables without issue, but now I also need to pass the output from the twoBitToFa function. I could add a step and make the .fa file and then pick that up in R, but I am trying to skip the .fa file creation step and output a different file type instead (.bed). Here are some things I have tried:
# try saving twoBitToFa output to variable and including it in the variables passed to R:
if [ "$O" = "bed" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
myvar=$(twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout) \
Rscript \
GetSeq_R.r \
$P \
$G \
$O \
$myvar
fi
To check what variables come through, my GetSeq_R.r script starts with:
args = commandArgs(trailingOnly=TRUE)
print(args)
and with the above code, the output only includes my $P, $G, and $O variables. $myvar doesn't make it. $P is the TAD-1 file, $G is "hg38", and $O is "bed".
[1] "TAD-1_template.bed" "hg38" "bed"
I am not sure if the way I am trying to pass the data in the variable is wrong. From everything I've read, it seems like it should work. I've also tried using tee to see what is in my stdout at that step like so:
if [ "$O" = "bed" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout | tee \
Rscript \
GetSeq_R.r \
$P \
$G \
$O
fi
And the data I want to pass to R is correctly shown in my console by using tee. I've tried saving stdout and tee to a variable and passing that variable to R, thinking maybe it's something about twoBitToFa that refuses to be put inside a variable, but was unsuccessful. I've spent hours looking up info about tee, stdout, and passing variables from bash to R. I feel like I'm missing something fundamental, or trying to do something impossible, and would really appreciate some other eyes on this.
Here's the whole bash script, in case that's illuminating. Do I need to define a variable in "$@" for what I am trying to pass to R, even though it's not something I want the user to be aware of? Am I capturing the variable with $myvar incorrectly? Can I get the contents of stdout or tee to show up in R?
Thanks in advance.
for arg in "$@"; do
shift
case "$arg" in
"--path") set -- "$@" "-P" ;;
"--genome") set -- "$@" "-G" ;;
"--output") set -- "$@" "-O" ;;
"--help") set -- "$@" "-h" ;;
*) set -- "$@" "$arg"
esac
done
while getopts ":P:G:O:h" OPT
do
case $OPT in
P) P=$OPTARG;;
G) G=$OPTARG;;
O) O=$OPTARG;;
h) help ;;
\?)
echo "Invalid option: -$OPTARG" >&2
usage
exit 1
;;
:)
echo "Option -$OPTARG requires an argument." >&2
usage
exit 1
;;
esac
done
num_col=$(cat "$P" | awk "{print NF; exit}")
if [ "$num_col" = 3 ]
then
echo -e "\n\n3 column bed file detected; no directional considerations for sequences \n\n"
if [ "$G" = "hg38" ]
then
twobit="https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit"
fi
if [ "$G" = "hg19" ]
then
twobit="https://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/hg19.2bit"
fi
if [ "$O" = "fasta" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout > "${P%.bed}".fa
fi
if [ "$O" = "bed" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
#myvar=$(twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout) \
Rscript \
GetSeq_R.r \
$P \
$G \
$O \
$myvar
fi
fi
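The reason $myvar never arrives can be reproduced without R or twoBitToFa. In myvar=$(...) Rscript ... $myvar, the shell expands the command's arguments before it performs the prefix assignment, and that assignment only lands in Rscript's environment anyway, so $myvar is still empty and the unquoted empty word vanishes. The sketch below uses two hypothetical stand-ins: produce for twoBitToFa and count_args for Rscript.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for twoBitToFa: emits a tiny FASTA record.
produce() { printf '>chr1:1-10\nACGTACGTAC\n'; }

# Hypothetical stand-in for Rscript: reports how many arguments it received.
count_args() { echo "$#"; }

unset myvar

# Pattern from the question: $myvar is expanded (to nothing) before the
# prefix assignment happens, so only 3 arguments arrive.
broken=$(myvar=$(produce) count_args P G O $myvar)

# Capture first, then pass the value quoted: the whole multi-line
# FASTA text arrives as a single fourth argument.
myvar=$(produce)
working=$(count_args P G O "$myvar")

echo "broken=$broken working=$working"   # broken=3 working=4
```

For large FASTA output, piping twoBitToFa straight into Rscript and reading stdin on the R side (something like readLines(file("stdin"))) avoids the command-line length limit; that R call is an untested assumption here, not something checked against GetSeq_R.r.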

Suffix subdirectory name with number of files underneath it

I have subdirectories a b c. For various obscure reasons, I would like to count all files recursively underneath each of them and, for the first layer of subdirectories only (mindepth=1, maxdepth=1), suffix each directory name with the file count of its whole subtree (no depth limit).
So if a and its subdirectories contain 23 files, b 64 and c 82, I will end up with the subdirectories renamed as:
a_23
b_64
c_82
I have a routine to count recursively:
function count_all_files () {
echo "enter directory"
find "$1" -type f | wc -l
}
but am at a loss how to construct a find -exec operation to rename as I need.
Something like this pseudo code.
find . -type d -mindepth 1 -maxdepth 1 "*" -exec $(count_all_files {}) && [suffix dir name]
Grateful for thoughts. Needs to work with directories containing spaces too.
This seems to work. I have amended it so it always makes a clean update, e.g. if you add new files.
function label_subdirectories_number_files () {
# Strip suffixes left by a previous run, then append each subdirectory's file count.
for file in *_my_dir_count_* ; do rename 's/_my_dir_count_.*//g' "$file" ; done
# The directory is passed to bash -c as $1 (not spliced in as {}), so names with spaces work.
find . -mindepth 1 -maxdepth 1 -type d -exec bash -c 'cd "$1" \
&& number_of_files=$(find . -type f | wc -l) && directory=$(pwd) \
&& read -r number_of_files <<< "$number_of_files" \
&& new_directory="$directory""_my_dir_count_""$number_of_files" && \mv "$directory" "$new_directory" ' _ {} \; > /dev/null 2>&1
}
This variation also labels a chosen number of lower subdirectory levels, in case you want a quick eyeball check of lower-level counts.
function label_subdirectories_number_files_many () {
echo "enter number of levels to scan"
for file in *_my_dir_count_* ; do rename 's/_my_dir_count_.*//g' "$file" > /dev/null 2>&1 ; done
for zcount in $(seq 1 "$1") ; do
echo "level = $zcount out of $1 "
# As above, pass each directory to bash -c as $1 so names with spaces survive.
find . -mindepth "$zcount" -maxdepth "$zcount" -type d -exec bash -c 'cd "$1" \
&& number_of_files=$(find . -type f | wc -l) && directory=$(pwd) \
&& read -r number_of_files <<< "$number_of_files" \
&& new_directory="$directory""_my_dir_count_""$number_of_files" && \mv "$directory" "$new_directory" ' _ {} \; > /dev/null 2>&1
done
}
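For comparison, a plain bash loop over */ avoids passing paths through bash -c at all and keeps names with spaces intact. This is a minimal sketch, assuming bash and that only the immediate, non-hidden subdirectories of the current directory should be renamed; the function name is hypothetical, and unlike the answer above it does not strip suffixes from a previous run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename each immediate subdirectory NAME to NAME_COUNT,
# where COUNT is the number of regular files anywhere beneath it.
suffix_with_file_count() {
  local dir count
  for dir in */; do                 # glob is expanded once, before any rename
    dir=${dir%/}                    # the glob adds a trailing slash
    # Skip symlinks to directories and the literal '*' left when nothing matches.
    if [ ! -d "$dir" ] || [ -L "$dir" ]; then
      continue
    fi
    count=$(find "$dir" -type f | wc -l)   # names containing newlines would be over-counted
    count=$((count))                # drop the padding some wc versions print
    mv -- "$dir" "${dir}_${count}"
  done
}
```

Run it from the parent directory, e.g. cd /path/to/parent && suffix_with_file_count. Running it twice would produce names like a_23_23, so strip old suffixes first if you re-run it.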

Unix SSH : Find files with different path on two servers

I have server A and server B. I want to run a find command on both servers, but with a different path on each.
Currently I use the code below:
dir1=( $DATA_DIR/sdfgv $DATA_DIR/1wefgg $DATA_DIR/3fdsevg );
dir2=( $DATA_DIR/asdf $DATA_DIR/sdfewfT $DATA_DIR/efergvfw );
timestamp=$(date +%Y%m%d%H%M%S);
report_name=Audit_Report_${timestamp}.txt
uname=xyz
server=( serv1 serv2);
for j in "${server[@]}";
do {
if [ "$j" == "serv1" ]
then
for i in "${dir1[@]}";
do {
Size=`ssh -q $uname@$j "find $i -type f -mtime +1 -name '*.gz' -printf '%s + ' | dc -e0 -f- -ep"`;
echo " $j $i $Size "
}
done
else
for i in "${dir2[@]}";
do {
Size=`ssh -q $uname@$j "find $i -type f -mtime +1 -name '*.gz' -exec du -k {} \; | awk '{ total += $ 1} END{print total/1024;}'"`;
echo " $j $i $Size "
}
done
fi
}
done
This code works pretty well, but I want something more generic, without an if/else on the server name.
Both servers have different directory paths to search.
Please come up with some suggestions.
Thank you!
Bash functions would help. Perhaps one for determining which directory to use for which server, and then one for executing the find command. Something like this (pseudo-code, untested):
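getDirs() {
# Print one directory per line for the given server
case $1 in
serv1) printf '%s\n' "${dir1[@]}" ;;
serv2) printf '%s\n' "${dir2[@]}" ;;
esac
}
findFiles() {
local srv=$1
shift
local dir=$1
shift
# printf %q re-quotes each argument so that patterns like '*.gz'
# survive the extra round of parsing by the remote shell
ssh -q "$uname@$srv" "find $(printf '%q ' "$dir" "$@")"
}
for srv in "${server[@]}"
do
mapfile -t dirs < <(getDirs "$srv")
for d in "${dirs[@]}"
do
findFiles "$srv" "$d" -type f -mtime +1 -name '*.gz'
done
done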
Your sample does different things with the find results, so you will still need to add logic to handle that (perhaps just an if statement inside the loops).
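One way to drop the server-name branch entirely is to name each directory list after its server and look it up with bash's indirect expansion. This is a sketch, assuming bash, server names that are valid inside a variable name (serv1, serv2; real host names with dots or dashes would need a different lookup, such as an associative array), and a hypothetical run_remote that only prints what it would run.

```shell
#!/usr/bin/env bash
DATA_DIR=${DATA_DIR:-/data}     # assumption: normally set elsewhere in the script
server=( serv1 serv2 )

# One array per server, named dirs_<server name>.
dirs_serv1=( "$DATA_DIR/sdfgv" "$DATA_DIR/1wefgg" "$DATA_DIR/3fdsevg" )
dirs_serv2=( "$DATA_DIR/asdf" "$DATA_DIR/sdfewfT" "$DATA_DIR/efergvfw" )

# Hypothetical stand-in for ssh: print the server and the remote command.
run_remote() { printf '%s: %s\n' "$1" "$2"; }

audit_all() {
  local srv d ref
  for srv in "${server[@]}"; do
    ref="dirs_${srv}[@]"          # the string "dirs_serv1[@]", "dirs_serv2[@]", ...
    for d in "${!ref}"; do        # indirect expansion: every element of that array
      # %q quotes the path for the remote shell; '*.gz' is passed quoted to find.
      run_remote "$srv" "find $(printf '%q' "$d") -type f -mtime +1 -name '*.gz'"
    done
  done
}

audit_all
```

Adding a server then only needs a new entry in server and a matching dirs_<name> array. For real use, replace run_remote's body with ssh -q "$uname@$1" "$2"; the per-server processing of the results (dc versus du in the question) still has to be handled separately.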

Is there a way to ignore header lines in a UNIX sort?

I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.
The problem is there is a two-line header at the top of the file which is being sorted to the bottom of the file (as each header line begins with a colon).
Is there a way to tell sort either "pass the first two lines across unsorted" or to specify an ordering which sorts the colon lines to the top? The remaining lines always start with a 6-digit number (which is actually the key I'm sorting on), if that helps.
Example:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00
should sort to:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00
(head -n 2 <file> && tail -n +3 <file> | sort) > newfile
The parentheses create a subshell, wrapping up the stdout so you can pipe it or redirect it as if it had come from a single command.
If you don't mind using awk, you can take advantage of awk's built-in pipe abilities, e.g.:
extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}'
This prints the first two lines verbatim and pipes the rest through sort.
Note that this has the specific advantage of being able to selectively sort parts of a piped input. Methods that read the file twice (such as head plus tail) only work on plain files, which can be read multiple times; this works on anything.
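A variant of the awk approach with the header size as a parameter, sorting ascending as the question asks. This is a sketch; the function name is hypothetical, and the fflush() call is an addition here (supported by gawk and mawk): because awk buffers its own output while sort writes to the same stdout as a separate process, flushing makes sure the header lines leave awk before sort prints anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first N lines of stdin unchanged and
# pipe the rest through sort; extra arguments become sort options.
awk_header_sort() {
  local n=$1
  shift
  awk -v n="$n" -v cmd="sort $*" '
    NR <= n { print; fflush(); next }   # header: emit now, before sort writes anything
            { print | cmd }             # body: stream into a single sort process
  '
}
```

For the question's file, extract_data | awk_header_sort 2 keeps both colon lines on top; awk_header_sort 2 -r reverses the body instead.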
In simple cases, sed can do the job elegantly:
your_script | (sed -u 1q; sort)
or equivalently,
cat your_data | (sed -u 1q; sort)
The key is in the 1q -- print first line (header) and quit (leaving the rest of the input to sort).
For the example given, 2q will do the trick.
The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.
Here is a version that works on piped data:
(read -r; printf "%s\n" "$REPLY"; sort)
If your header has multiple lines:
(for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort)
This solution is from here
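The same read-based idea can be wrapped as a function, with the header size as the first argument and any remaining arguments handed to sort. A sketch assuming bash; the function name is hypothetical. IFS= plus read -r keeps each header line intact, and since bash's read takes one byte at a time from a pipe, sort still receives every remaining line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the first N lines of stdin in place and
# sort everything after them; extra arguments are passed to sort.
sort_keep_header() {
  local n=$1 i line
  shift
  for ((i = 0; i < n; i++)); do
    IFS= read -r line || return 0    # input shorter than the header: nothing left to sort
    printf '%s\n' "$line"
  done
  sort "$@"                          # reads whatever the loop left on stdin
}
```

For the example in the question: cat datafile | sort_keep_header 2.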
You can use tail -n +3 <file> | sort ... (tail will output the file contents from the 3rd line).
head -2 <your_file> && nawk 'NR>2' <your_file> | sort
example:
> cat temp
10
8
1
2
3
4
5
> head -2 temp && nawk 'NR>2' temp | sort -r
10
8
5
4
3
2
1
It only takes 2 lines of code...
head -1 test.txt > a.tmp;
tail -n+2 test.txt | sort -n >> a.tmp;
For numeric data, -n is required; for an alphabetic sort it is not.
Example file:
$ cat test.txt
header
8
5
100
1
-1
Result:
$ cat a.tmp
header
-1
1
5
8
100
So here's a bash function whose arguments are exactly like sort's. It supports both files and pipes.
function skip_header_sort() {
local file=
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
local file=${@: -1}
set -- "${@:1:$(($#-1))}"
fi
awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' ${file:+"$file"}
}
How it works: this line checks that there is at least one argument and that the last argument is a file.
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
This saves the file name in a separate variable, since we're about to remove the last argument.
local file=${@: -1}
Here we remove the last argument, since we don't want to pass it to sort.
set -- "${@:1:$(($#-1))}"
Finally, we do the awk part, passing the arguments (minus the last argument if it was the file) to sort inside awk. This was originally suggested by Dave, and modified to take sort arguments. When we're piping, file is empty, so ${file:+"$file"} expands to nothing and awk reads stdin.
awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' ${file:+"$file"}
Example usage with a comma separated file.
$ cat /tmp/test
A,B,C
0,1,2
1,2,0
2,0,1
# SORT NUMERICALLY SECOND COLUMN
$ skip_header_sort -t, -nk2 /tmp/test
A,B,C
2,0,1
0,1,2
1,2,0
# SORT REVERSE NUMERICALLY THIRD COLUMN
$ cat /tmp/test | skip_header_sort -t, -nrk3
A,B,C
0,1,2
2,0,1
1,2,0
Here's a bash shell function derived from the other answers. It handles both files and pipes. First argument is the file name or '-' for stdin. Remaining arguments are passed to sort. A couple examples:
$ hsort myfile.txt
$ head -n 100 myfile.txt | hsort -
$ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r
The shell function:
hsort ()
{
if [ "$1" == "-h" ]; then
echo "Sort a file or standard input, treating the first line as a header.";
echo "The first argument is the file or '-' for standard input. Additional";
echo "arguments to sort follow the first argument, including other files.";
echo "File syntax : $ hsort file [sort-options] [file...]";
echo "STDIN syntax: $ hsort - [sort-options] [file...]";
return 0;
elif [ -f "$1" ]; then
local file=$1;
shift;
(head -n 1 "$file" && tail -n +2 "$file" | sort "$@");
elif [ "$1" == "-" ]; then
shift;
(read -r; printf "%s\n" "$REPLY"; sort "$@");
else
>&2 echo "Error. File not found: $1";
>&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'";
return 1 ;
fi
}
This is the same as Ian Sherbin's answer, but my implementation is:
cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
head -1 filetmp.tc > file.tc;
tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;
Another simple variation on all the others, reading a file once
HEADER_LINES=2
(head -n $HEADER_LINES; sort) < data-file.dat
With Python:
import sys
HEADER_ROWS = 2
# Copy the header rows through unchanged
for _ in range(HEADER_ROWS):
    sys.stdout.write(next(sys.stdin))
# Sort and print everything that is left
for row in sorted(sys.stdin):
    sys.stdout.write(row)
cat file_name.txt | sed 1d | sort
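This deletes the first line (the header) and sorts the rest, so it only does what you want if you don't need to keep the header in the output.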

Check that a variable is a number in UNIX shell [duplicate]

This question already has answers here:
How do I test if a variable is a number in Bash?
(40 answers)
Closed 1 year ago.
How do I check to see if a variable is a number, or contains a number, in UNIX shell?
if echo "$var" | grep -Eq '^[0-9]+$'; then
: # $var is a number
else
: # $var is not a number
fi
Shell variables have no type, so the simplest way is to use the exit status of the test command:
if [ "$var" -eq "$var" ] 2> /dev/null; then ...
(Or else parse it with a regexp)
No forks, no pipes. Pure POSIX shell:
case $var in
(*[!0-9]*|'') echo not a number;;
(*) echo a number;;
esac
(Assumes number := a string of digits). If you want to allow signed numbers with a single leading - or + as well, strip the optional sign like this:
case ${var#[-+]} in
(*[!0-9]*|'') echo not a number;;
(*) echo a number;;
esac
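The case patterns above can be wrapped into reusable functions that return an exit status instead of echoing. A sketch in plain POSIX sh; the function names are hypothetical, and as in the answer a "number" means a string of digits, optionally with one leading sign in the second function.

```shell
#!/bin/sh
# Hypothetical helpers built from the case patterns above.
# Exit status 0 means "yes", 1 means "no"; nothing is printed.
is_unsigned_int() {
  case $1 in
    (*[!0-9]*|'') return 1 ;;   # empty, or contains a non-digit
    (*) return 0 ;;
  esac
}

is_signed_int() {
  is_unsigned_int "${1#[-+]}"   # strip one optional leading sign, then reuse
}
```

Usage: if is_signed_int "$var"; then echo "$var is an integer"; fi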
In either ksh93 or bash with the extglob option enabled:
if [[ $var == +([0-9]) ]]; then ...
Here's a version using only the features available in a bare-bones shell (i.e. it would work in sh), and with one fewer process than using grep:
if expr "$var" : '[0-9][0-9]*$'>/dev/null; then
echo yes
else
echo no
fi
This checks that the $var represents only an integer; adjust the regexp to taste, and note that the expr regexp argument is implicitly anchored at the beginning.
This can be checked using a regular expression.
echo "$var" | grep -Eq '^[0-9]+$'
if [ $? -eq 0 ]; then
echo "$var is a number"
else
echo "$var is not a number"
fi
I'm fairly new to shell programming, so I tried to find the easiest and most readable way.
It just checks that the variable is greater than or equal to 0.
I think it's a nice way to validate parameters, though maybe not for every case:
if [ "$var" -ge 0 ] 2>/dev/null; then ...
INTEGER
if echo "$var" | grep -Eq '^-?[0-9]+$'; then
echo "$var is an integer"
else
echo "$var is not an integer"
fi
tests (with var=2 etc.):
2 is an integer
-2 is an integer
2.5 is not an integer
2b is not an integer
NUMBER
if echo "$var" | grep -Eq '^-?[0-9]*\.?[0-9]+$'; then
echo "$var is a number"
else
echo "$var is not a number"
fi
tests (with var=2 etc.):
2 is a number
-2 is a number
-2.6 is a number
-2.c6 is not a number
2. is not a number
2.0 is a number
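Note that a grep test such as
if echo "$var" | grep -Eq '^[0-9]+$'
does not work if var spans several lines, because grep checks each line separately and succeeds if any one of them matches, e.g.
var="123
qwer"
This is especially likely if var comes from a file:
var=`cat var.txt`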
This is the simplest:
if [ "$var" -eq "$var" ] 2> /dev/null
then echo yes
else echo no
fi
Here is the test without any regular expressions (tcsh code):
Create a file checknumber:
#! /usr/bin/env tcsh
if ( "$*" == "0" ) then
exit 0 # number
else
((echo "$*" | bc) > /tmp/tmp.txt) >& /dev/null
set tmp = `cat /tmp/tmp.txt`
rm -f /tmp/tmp.txt
if ( "$tmp" == "" || $tmp == 0 ) then
exit 1 # not a number
else
exit 0 # number
endif
endif
and run
chmod +x checknumber
Use
checknumber -3.45
and you'll get the result as the exit status ($?).
You can optimise it easily.
( test ! -z "$num" && test "$num" -eq "$num" 2> /dev/null ) && {
: # $num is a number
}
You can do that with the simple test command.
$ test ab -eq 1 >/dev/null 2>&1
$ echo $?
2
$ test 21 -eq 1 >/dev/null 2>&1
$ echo $?
1
$ test 1 -eq 1 >/dev/null 2>&1
$ echo $?
0
So if the exit status is either 0 or 1, it is an integer, but if the exit status is 2, it is not a number.
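That 0/1-versus-2 distinction can be turned into a small function. A sketch assuming bash's builtin test, which returns 2 when an operand is not an integer (POSIX only promises a status greater than 1 on error); the function name is hypothetical. Bash's integer parsing may also accept surrounding blanks, so the case-pattern check above is stricter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: use test's exit status to decide whether $1 is an integer.
# 0 or 1: the comparison ran, so $1 parsed as an integer.
# Anything higher: test reported an error (e.g. "integer expression expected").
is_int_by_status() {
  test "$1" -eq 0 2>/dev/null
  [ $? -le 1 ]
}
```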
a=123
if [ `echo $a | tr -d '[:digit:]' | wc -w` -eq 0 ]
then
echo numeric
else
echo ng
fi
numeric
a=12s3
if [ `echo $a | tr -d '[:digit:]' | wc -w` -eq 0 ]
then
echo numeric
else
echo ng
fi
ng
This takes the value from the command line and reports whether the input is decimal or not, and whether it is a number:
NUMBER=$1
IsDecimal=`echo "$NUMBER" | grep "\."`
if [ -n "$IsDecimal" ]
then
echo "$NUMBER is Decimal"
var1=`echo "$NUMBER" | cut -d"." -f1`
var2=`echo "$NUMBER" | cut -d"." -f2`
Digit1=`echo "$var1" | egrep '^-[0-9]+$'`
Digit2=`echo "$var1" | egrep '^[0-9]+$'`
Digit3=`echo "$var2" | egrep '^[0-9]+$'`
if [ -n "$Digit1" ] && [ -n "$Digit3" ]
then
echo "$NUMBER is a number"
elif [ -n "$Digit2" ] && [ -n "$Digit3" ]
then
echo "$NUMBER is a number"
else
echo "$NUMBER is not a number"
fi
else
echo "$NUMBER is not Decimal"
Digit1=`echo "$NUMBER" | egrep '^-[0-9]+$'`
Digit2=`echo "$NUMBER" | egrep '^[0-9]+$'`
if [ -n "$Digit1" ] || [ -n "$Digit2" ]; then
echo "$NUMBER is a number"
else
echo "$NUMBER is not a number"
fi
fi
