Problems with cut (unix)

I've got a strange problem with cut.
I wrote a script that contains this line:
... | cut -d" " -f3,4 >! out
cut receives this data (I checked it with echo; the fields are separated by runs of spaces):
James    James    033333333    0    0.00
but I get empty lines in out. Can somebody explain why?

You need to compress out the sequences of spaces, so that each string of spaces is replaced by a single space. The tr command's -s (squeeze) option is perfect for this:
$ ... | tr -s " " | cut -d" " -f3,4 >! out
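For the sample line above, this produces the fields you wanted:
$ echo 'James    James    033333333    0    0.00' | tr -s " " | cut -d" " -f3,4
033333333 0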

If you want fields from a text file, awk is almost always the answer:
... | awk '{print $3" "$4}'
For example:
$ echo 'James    James    033333333    0    0.00' | cut -d" " -f3,4
(prints an effectively blank line: fields 3 and 4 fall between consecutive spaces)
$ echo 'James    James    033333333    0    0.00' | awk '{print $3" "$4}'
033333333 0

cut doesn't treat a run of spaces as a single delimiter; every single space ends a field, so it matches the "nothingness" between consecutive spaces.
Do you get empty lines when you leave out the >! out part? I.e., are you targeting the correct fields?
If your input uses fixed spacing, you might want cut -c 4-10,15-20 | tr -d ' ' to extract character positions 4-10 and 15-20 and strip the spaces from them.
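For example, a quick illustration of character-range extraction:
$ echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | cut -c 4-10,15-20
DEFGHIJOPQRST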

... | grep -o "[^ ]*"
will extract the fields, each on a separate line. Then you might head/tail them; putting them back on the same line takes another step (see the sketch below).
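One way to rejoin them (a sketch: sed picks out lines 3 and 4, paste glues them back together with a space):
$ echo 'James    James    033333333    0    0.00' | grep -o '[^ ]*' | sed -n '3,4p' | paste -s -d' ' -
033333333 0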

Related

How to capture text after / but before the 2nd - symbol in zsh

For example, I have a git branch name feature/ABC-123-my-stuff
I want to capture just ABC-123, in this format.
I tried
cut -d "/" -f2 <<< "$branchName"
which results in
ABC-123-my-stuff
but I want to keep only the string right after the / and before the 2nd -.
What do I add / modify to achieve that?
NOTE: I am using zsh on MacOS
Use cut twice:
(cut -d"/" -f2 | cut -d"-" -f1,2) <<< $branchName
or with echo:
echo $branchName | cut -d"/" -f2 | cut -d"-" -f1,2
Another way, using grep:
echo $branchName | egrep -o '[A-Z]{3}-[0-9]{3}'
Note: this solution only works if the pattern is always 3 capital letters, then -, then 3 digits.
All solutions give me the output:
ABC-123
Use regexp matching:
if [[ $branchName =~ /([^-]+-[^-]+)- ]]
then
    desired_part=$match[1]
else
    echo "$branchName does not have the expected format"
fi
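A pure-zsh alternative with no external processes (a sketch using parameter expansion and the (s:-:) split flag):
branchName='feature/ABC-123-my-stuff'
rest=${branchName#*/}                  # strip through the first /: ABC-123-my-stuff
parts=(${(s:-:)rest})                  # split on -: (ABC 123 my stuff)
print -r -- "${parts[1]}-${parts[2]}"  # ABC-123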

Use of cut command in unix

Suppose I have a string like the following:
Rajat;Harshit Srivastava;Mayank 123;5
Now I want the following result, using the cut command:
Rajat
Harshit Srivastava
Mayank 123
5
I have tried, but cut is not working on a string containing spaces.
man cut would tell you:
-d, --delimiter=DELIM
use DELIM instead of TAB for field delimiter
--output-delimiter=STRING
use STRING as the output delimiter; the default is to use
the input delimiter
If you insist on using cut for changing the delimiters:
$ echo "Rajat;Harshit Srivastava;Mayank 123;5" | cut -d \; --output-delimiter=\ -f 1-
Rajat Harshit Srivastava Mayank 123 5
but instead you should use sed or tr or awk for it. Try man tr, for example.
Try this
echo "Rajat;Harshit Srivastava;Mayank 123;5" | sed 's/;/ /g'

I tried various cut commands but unable to get the output I desire

cat DecisionService.txt
/MAGI/Household/MAGI_EDG_FLOW.erf;/Medicaid/MAGI_EDG_FLOW;4;4
/VCL/VCL_Ruleflow_1.erf;/VCL/VCL1_EBDC_FLOW;4;4
/VCL/VCL_Ruleflow_2.erf;/VCL/VCL2_EBDC_FLOW;4;4
I tried this:
cat DecisionService.txt | cut -d ';' -f2 | cut -d '/' -f2 | tr -s ' ' '\n'
My output is:
$i=Medicaid
VCL
VCL
Whereas I need the output to be:
$a=Medicaid
$b=VCL
If you just want the unique values then:
awk -F'/' 'NF&&!a[$(NF-1)]++{print $(NF-1)}' file
Medicaid
VCL
If you actually want the output to contain prefixed incremental variables then:
awk -F'/' 'NF&&!a[$(NF-1)]++{printf "$%c=%s\n",i++,$(NF-1)}' i=97 file
$a=Medicaid
$b=VCL
Note: If your input may contain more than 26 unique values, you will need to do something cleverer to avoid output such as $|=VCL.
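For example, switching to numbered variables (a sketch) removes the 26-letter limit:
awk -F'/' 'NF&&!a[$(NF-1)]++{printf "$v%d=%s\n",++i,$(NF-1)}' file
$v1=Medicaid
$v2=VCL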
Well, from the question it's not very clear what exactly you want, but I guess you don't want the repeated VCL in the output. Try adding sort and uniq at the end.
cat DecisionService.txt
/MAGI/Household/MAGI_EDG_FLOW.erf;/Medicaid/MAGI_EDG_FLOW;4;4
/VCL/VCL_Ruleflow_1.erf;/VCL/VCL1_EBDC_FLOW;4;4
/VCL/VCL_Ruleflow_2.erf;/VCL/VCL2_EBDC_FLOW;4;4
cat DecisionService.txt | cut -d ';' -f2 | cut -d '/' -f2 | tr -s ' ' '\n'|sort|uniq
Medicaid
VCL

Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:
2U2133   1239
1290fsdsf   3234
From this, I need to extract
1239
3234
The delimiter for all records will always be 3 blanks.
I need to do this in a unix script (.scr) and write the output to another file, or use it as input to a do-while loop. I tried the below:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then
int_1=0
else
int_2=0
fi
done < awk -F' ' '{ print $2 }' ${Directoty path}/test_file.txt
test_file.txt is the input file and file1.txt is a lookup file. But the above does not work; it gives me syntax errors near awk -F.
I tried writing the output to a file. The following worked in command line:
more test_file.txt | awk -F' ' '{ print $2 }' > output.txt
This works and writes the records to output.txt on the command line. But the same command does not work in the unix script (it is a .scr file).
Please let me know where I am going wrong and how I can resolve this.
Thanks,
Visakh
The job of replacing multiple delimiters with just one is left to tr:
cat <file_name> | tr -s ' ' | cut -d ' ' -f 2
tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.
The manual states:
-s, --squeeze-repeats
replace each sequence of a repeated character that is
listed in the last specified SET, with a single occurrence
of that character
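With the question's sample data, for example:
$ printf '2U2133   1239\n1290fsdsf   3234\n' | tr -s ' ' | cut -d ' ' -f 2
1239
3234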
It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:
cut -i -d' ' -f 2 data.file
If not (and it is not universal — and maybe not even widespread, since neither GNU nor MacOS X have the option), then using awk is better and more portable.
You need to pipe the output of awk into your loop, though:
awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done
The only residual issue is whether the while loop runs in a sub-shell and therefore does not modify your main shell script's variables, just its own copies of them.
With bash, you can use process substitution:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)
This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
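A minimal illustration of the difference in bash (hypothetical one-liners counting the lines read):
count=0
printf 'a\nb\n' | while read x; do count=$((count+1)); done
echo $count    # 0: the loop ran in a subshell, so the assignment was lost

count=0
while read x; do count=$((count+1)); done < <(printf 'a\nb\n')
echo $count    # 2: the loop ran in the current shell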
The blank in ${Directory path} is not normally legal — unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:
awk -F' ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline
etc.
Besides, the use of "readline" as a variable name may or may not get you into problems.
In this particular case, you can use the following line
sed 's/  */\t/g' <file_name> | cut -f 2
to get your second column (the s/  */\t/g replaces each run of one or more spaces with a single tab).
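Applied to the sample data (GNU sed; BSD sed needs a literal tab instead of \t):
$ printf '2U2133   1239\n' | sed 's/  */\t/g' | cut -f 2
1239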
In bash you can start from something like this:
for n in `cut -d " " -f 4 ${Directory_path}/test_file.txt`
do
    grep -c "$n" ${Directory_path}/file*.txt
done
(With a single-space delimiter and three blanks between fields, the second word lands in field 4.)
This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875
tr -s ' ' <text.txt | cut -d ' ' -f4
tr -s '<character>' squeezes multiple repeated instances of <character> into one.
It's not working in the script because of the typo "Directoty" in ${Directoty path} (last line of your script).
cut isn't flexible enough. I usually use Perl for that:
cat file.txt | perl -F'   ' -ane 'print $F[1]."\n"'
Instead of the triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.
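With the question's multi-blank data, for instance:
$ printf '2U2133   1239\n1290fsdsf   3234\n' | perl -F'   ' -ane 'print $F[1]."\n"'
1239
3234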

Forcing the order of output fields from cut command

I want to do something like this:
cat abcd.txt | cut -f 2,1
and I want the order to be 2 and then 1 in the output. On the machine I am testing on (FreeBSD 6), this is not happening (it's printing in 1,2 order). Can you tell me how to do this?
I know I can always write a shell script to do this reversing, but I am looking for something using the 'cut' command options.
I think I am using version 5.2.1 of coreutils containing cut.
This can't be done using cut. According to the man page:
Selected input is written in the same order that it is read, and is
written exactly once.
Patching cut has been proposed many times, but even complete patches have been rejected.
Instead, you can do it using awk, like this:
awk '{print($2,"\t",$1)}' abcd.txt
Replace the \t with whatever you're using as field separator.
Lars' answer was great, but I found an even better one. The issue with his is that consecutive tabs (\t\t) are collapsed into a single separator, so empty columns are lost. To fix this, use the following:
awk -v OFS="  " -F"\t" '{print $2, $1}' abcd.txt
Where:
-F"\t" is what to cut on exactly (tabs).
-v OFS=" " is what to seperate with (two spaces)
Example:
printf 'A\tB\t\tD\n' | awk -v OFS="  " -F"\t" '{print $2, $4, $1, $3}'
(printf is used instead of echo so the \t escapes are expanded in any shell.) This outputs:
B  D  A
with two spaces between fields; $3 is empty, so the line ends with trailing separators.
