Drop 4 first columns - unix

I have a command that drops the first 4 columns. Unfortunately, if the 2nd and 4th column names are similar, the output is truncated at the 2nd column; if the two names differ, it is truncated at the 4th column. Is there anything wrong with my command?
awk -F"|" 'NR==1 {h=substr($0, index($0,$5)); next}
{file= path ""$1""$2"_"$3"_"$4"_03042017.csv"; print (a[file]++?"": "DETAILS 03042017" ORS h ORS) substr($0, index($0,$5)) > file}
END{for(file in a) print "EOF " a[file] > file}' filename
Account Num | Name | Card_Holder_Premium | Card_Holder| Type_Card | Balance | Date_Register
01 | 02 | 03 | 04 | 05 | 06 | 07
_Premium | Card_Holder| Type_Card | Balance | Date_Register
04 | 05 | 06 | 07
My desired output:
Card_Holder| Type_Card | Balance | Date_Register
05 | 06 |07

Is this all you're trying to do?
$ sed -E 's/([^|]+\| ){4}//' file
April | May | June
05 | 06 | 07
$ awk '{sub(/([^|]+\| ){4}/,"")}1' file
April | May | June
05 | 06 | 07

The method you use to remove columns with index is not correct. As you have found, index can be confused and match an earlier field when that field contains the same text as the later one.
The correct way is the one advised by Ed Morton.
In this online test, the code below, based on Ed Morton's suggestion, gives the output you expect:
awk -F"|" 'NR==1 {sub(/([^|]+\|){3}/,"");h=$0;next} \
{file=$1$2"_"$3"_"$4"_03042017.csv"; sub(/([^|]+\|){3}/,""); \
print (a[file]++?"": "DETAILS 03042017" ORS h ORS) $0 > file} \
END{for(file in a) print "EOF " a[file] > file}' file1.csv
DETAILS 03042017
Card_Holder| Type_Card | Balance | Date_Register
04 | 05 | 06 | 07
Because of the whitespace you have included in your fields, the name of the generated file appears as 01 02 _ 03 _ 04 _03042017.csv. With your real data the filename should come out correctly.
In any case, I just adapted Ed Morton's answer to your code. If you are happy with this solution, you should accept Ed Morton's answer.
PS: I removed a space from Ed Morton's answer, since that seems to work a bit better with your not-so-clear data.
Ed suggested:
awk '{sub(/([^|]+\| ){4}/,"")}1' file
#Mind this space ^
With this space, the pattern might fail to match your data if there is no space after each field separator (i.e. April|May).
Without the space, Ed's solution seems to match fields correctly in either format, April | May or April|May.
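As a rough illustration of the index problem described above, and of one way around it that does not search for text at all, here is a minimal sketch. The sample line and the `first` variable are made up for the example; it rebuilds the record from a chosen field onward instead of using sub with a regex, so it is an alternative to, not a copy of, Ed Morton's approach.

```shell
# Hypothetical header line modelled on the question (no spaces around |).
line='Account Num|Name|Card_Holder_Premium|Card_Holder|Type_Card|Balance'

# index() reports the FIRST place the text of $4 occurs, which is inside $3.
pos=$(printf '%s\n' "$line" | awk -F'|' '{ print index($0, $4) }')

# Rebuilding the record from field "first" onward avoids any text search.
rest=$(printf '%s\n' "$line" | awk -F'|' -v first=4 '{
    out = $first
    for (i = first + 1; i <= NF; i++) out = out FS $i
    print out
}')

echo "$pos"    # position where Card_Holder_Premium starts, not where $4 starts
echo "$rest"
```

The loop keeps field 4 onward, matching the output shown in the answer above; set first=5 to drop four columns.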


Unix Shell Scripting to calculate

I have the following data:
24692 -rw-rw-r--+ 1 da01 da01 25284427 Aug 31 09:06 collected_BOT.227031
24660 -rw-rw-r--+ 1 da01 da01 25248756 Aug 31 09:35 collected_BOT.227032
24748 -rw-rw-r--+ 1 da01 da01 25338868 Aug 31 10:03 collected_BOT.227033
24740 -rw-rw-r--+ 1 da01 da01 25331322 Aug 31 10:31 collected_BOT.227034
grep 1303 collected_BOT.227034 | more
$15 = duration
I just want to calculate the total of $15 in file collected_BOT.227034 (only for lines where $11 is 1303).
awk -F, '$11==1303{sum+=$15} END {print sum}' collected_BOT.227034
-F, field separator is ,
$11==1303 check if 11th field exactly matches the number 1303
If so, add the value of 15th field to sum variable (whose initial value is zero by default)
END {print sum} after processing all the lines of input file, print the value of sum variable
Thanks @Mark Setchell for pointing out that $11==1303 can be used instead of $11 ~ /^1303$/
Also, use print sum + 0 if output is needed as '0' even when no lines match. Or an explicit BEGIN{sum=0} block
Great solution @sp asic.
No need to use regular expression for field $11 though:
awk -F, '$11=="1303" {sum+=$15} END {print sum}' collected_BOT.227034
(Beware: use == and not =. A single = assigns 1303 to field $11, and because the assigned value is non-zero the pattern is then true for every line, so all lines get summed.)
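To see the points above in action, here is a small sketch on made-up comma-separated rows (the real layout of collected_BOT.227034 is not shown in the question, so the data below is purely hypothetical). It shows the conditional sum, the `sum + 0` trick when nothing matches, and what the single = mistake does.

```shell
# Hypothetical rows: field 11 is the code, field 15 is the duration.
data='a,b,c,d,e,f,g,h,i,j,1303,l,m,n,10
a,b,c,d,e,f,g,h,i,j,1304,l,m,n,99
a,b,c,d,e,f,g,h,i,j,1303,l,m,n,5'

# Sum field 15 only where field 11 equals 1303.
total=$(printf '%s\n' "$data" | awk -F, '$11 == 1303 { sum += $15 } END { print sum + 0 }')

# No matching rows: sum + 0 prints 0 instead of an empty line.
none=$(printf '%s\n' "$data" | awk -F, '$11 == 9999 { sum += $15 } END { print sum + 0 }')

# The single = mistake: the assignment is true for every row, so all rows are summed.
wrong=$(printf '%s\n' "$data" | awk -F, '$11 = 1303 { sum += $15 } END { print sum + 0 }')

echo "$total $none $wrong"
```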

Format output of concatenating 2 variables in unix

I am writing a simple shell script that checks the space of a target path and the space used by each directory under it (for example, I check the space of /path1/home and also how much of the total each folder in /path1/home consumes). My question is about the output it produces: the spacing is uneven and not pleasing to the eye. See the sample output lines below.
83G FOLDER 1 Apr 15 03:45
34G FOLDER 10 Mar 9 05:02
26G FOLDER 11 Mar 29 13:01
8.2G FOLDER 100 Apr 1 09:42
1.8G FOLDER 101 Apr 11 13:50
1.3G FOLDER 110 Feb 16 09:30
I just want the output format to be in line with the header so it will look neat because I will use it as a report. Here is the code I am using for this part.
ls -1 | grep -v "lost+found" | grep -v "email_body.tmp" > $v_path/Users.tmp
for user in `cat $v_path/Users.tmp | grep -v "Users.tmp"`
do
folder_size=`du -sh $user 2>/dev/null` # should be run using a more privileged user so that other folders can be read (2>/dev/null was used to discard error messages i.e. "du: cannot read directory `./marcnad/.gnupg': Permission denied")
folder_date=`ls -ltr | tr -s " " | cut -f6,7,8,9, -d" " | grep -w $user | cut -f1,2,3, -d" "`
folder_size="$folder_size $folder_date"
echo $folder_size >> $v_path/Users_Usage.tmp
done
echo "Summary of $v_path Disk Space Utilization per folder." >> email_body.tmp
echo "" >> email_body.tmp
echo "SIZE USER_FOLDER DATE_LAST_MODIFIED" >> email_body.tmp
for i in T G M K
do
cat $v_path/Users_Usage.tmp | grep [0-9]$i | sort -nr -k 1 >> $v_path/email_body.tmp
done
EDIT: Formatting
When you print the data, use printf instead of echo:
cat $v_path/Users_Usage.tmp | while read a b c d e f
do
printf '%-5s%-7s%-4s%-4s%-3s%-6s\n' "$a" "$b" "$c" "$d" "$e" "$f"
done
See here
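For a quick check of how fixed-width printf fields line the columns up, here is a small sketch using two of the sample lines from the question. The column widths and variable names are arbitrary choices for the example, not something taken from the original script.

```shell
# Two sample lines in the same shape as Users_Usage.tmp in the question.
usage='83G FOLDER 1 Apr 15 03:45
8.2G FOLDER 100 Apr 1 09:42'

# %-Ns left-justifies each value in a field N characters wide.
report=$(printf '%s\n' "$usage" | while read -r size name num mon day time
do
    printf '%-6s%-8s%-5s%-4s%-3s%s\n' "$size" "$name" "$num" "$mon" "$day" "$time"
done)

printf '%s\n' "$report"
```

If a value can be longer than its field, widen that field, since printf pads but never truncates with this format.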

Unix grep line above and concatenate

here is my sample input from a log file.
#2014 03 06 11:21:44:028#+1300#
[UserID= testUser]
What I am trying to do is go through all the log entries, grep for "UserID=", and then get the line 2 lines above it (the timestamp). I then want my output to be a concatenation of the two, written to the file tempLog.txt:
#2014 03 06 11:21:44:028#+1300# [UserID= testUser]
Can anyone help me with this? Still kinda new to Unix.... :)
#2.#2014 03 06 11:21:29:163#+1300#Info#/System/Security/Audit/Logon#
#xxxxxx (Has white spaces)
Logon failed | LOGIN.ERROR | null | | Login Method=[default], IP Address=[xx.xx.xxxx], UserID=[testUser], Reason=[Authentication did not succeed.]#
give this line a try:
grep --group-separator="" -B2 'UserID=' file|awk -v RS="" -F '\n' '{$2=""}7'
kent$ cat f
#2014 03 06 11:21:44:028#+1300#
[UserID= testUser]
#2014 03 06 11:21:44:028#+1400#
[UserID= testUser2]
kent$ grep --group-separator="" -B2 'UserID=' f|awk -v RS="" -F '\n' '{$2=""}7'
#2014 03 06 11:21:44:028#+1300# [UserID= testUser]
#2014 03 06 11:21:44:028#+1400# [UserID= testUser2]
This awk should do:
awk '/#20/ {f=$0} /\[UserID/ {print f,$0}' file
#2014 03 06 11:21:44:028#+1300# [UserID= testUser]
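If the timestamp is not easy to recognise by pattern, another option is to remember the previous two lines and print the older one when a UserID line shows up. This is a minimal sketch on made-up three-line entries (the middle lines are placeholders); unlike the grep version it does not need the GNU-only --group-separator option.

```shell
# Hypothetical log: timestamp, one filler line, then the UserID line.
log='#2014 03 06 11:21:44:028#+1300#
some other line
[UserID= testUser]
#2014 03 06 11:21:44:028#+1400#
another line
[UserID= testUser2]'

joined=$(printf '%s\n' "$log" | awk '
    /UserID=/ { print prev2, $0 }    # prev2 is the line two above the match
    { prev2 = prev1; prev1 = $0 }    # shift the two-line history
')

printf '%s\n' "$joined"    # redirect to tempLog.txt as needed
```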

extract a string after a pattern

I want to extract the numbers following client_id and id and pair up client_id and id in each line.
For example, for the following lines of log,
User(client_id:03)) results:[RelatedUser(id:204, weight:10),_RelatedUser(id:491,_weight:10),_RelatedUser(id:29, weight: 20)
User(client_id:04)) results:[RelatedUser(id:209, weight:10),_RelatedUser(id:301,_weight:10)
User(client_id:05)) results:[RelatedUser(id:20, weight: 10)
I want to output
03 204
03 491
03 29
04 209
04 301
05 20
I know I need to use sed or awk. But I do not know exactly how.
This may work for you:
awk -F "[):,]" '{ for (i=2; i<=NF; i++) if ($i ~ /id/) print $2, $(i+1) }' file
03 204
03 491
03 29
04 209
04 301
05 20
Here's an awk script that works (I put it on multiple lines and made it a bit more verbose so you can see what's going on):
awk 'BEGIN{FS="[\(\):,]"}
/client_id/ {
for (i=1; i<NF; i++) {
if ($i == "client_id") {
cid = $(i+1)
} else if ($i == "id") {
id = $(i+1);
print cid OFS id;
}
}
}' input_file_name
03 204
03 491
03 29
04 209
04 301
05 20
awk 'BEGIN{FS="[\(\):,]"}: invoke awk, use ( ) : and , as delimiters to separate your fields
/client_id/ {: Only do the following for the lines that contain client_id:
for (i=1; i<NF; i++) {: iterate through the fields on each line one field at a time
if ($i == "client_id") { cid = $(i+1) }: if the field we are currently on is client_id, then its value is the next field in order.
else if ($i == "id") { id = $(i+1); print cid OFS id;}: otherwise if the field we are currently on is id, then print the client_id : id pair onto stdout
input_file_name: supply the name of your input file as first argument to the awk script.
This might work for you (GNU sed):
sed -r '/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d;s//\2 \3\n\1/;P;D' file
/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d if the line doesn't have the intended strings delete it.
s//\2 \3\n\1/ re-arrange the line by copying the client_id and moving the first id ahead thus reducing the line for successive iterations.
P print up to the introduced newline.
D delete up to the introduced newline and restart the cycle on what remains, without reading a new line.
I would prefer awk for this, but if you were wondering how to do this with sed, here's one way that works with GNU sed.
/client_id/ {
:a; s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/; ta
s/^[^\n]+\n//
}
Run it like this:
sed -rf parse.sed infile
Or as a one-liner:
<infile sed -r '/client_id/ { :a; s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/; ta; s/^[^\n]+\n//; }'
03 204
03 491
03 29
04 209
04 301
05 20
The idea is to repeatedly match client_id:([0-9]+) and id:([0-9]+) pairs and put them at the end of pattern space. On each pass the id:([0-9]+) is removed.
The final replace removes left-overs from the loop.
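The same repeated-match idea can also be sketched in awk with match() and substr(), which avoids relying on GNU sed extensions. This is only an illustration of the loop described above; the function name extract_pairs is made up, and the input is the sample log from the question.

```shell
# Print "client_id id" pairs, one per line, for each log line on stdin.
extract_pairs() {
    awk '{
        if (!match($0, /client_id:[0-9]+/)) next
        cid  = substr($0, RSTART + 10, RLENGTH - 10)   # skip "client_id:" (10 chars)
        rest = substr($0, RSTART + RLENGTH)
        # Consume one "(id:NNN" at a time until none are left.
        while (match(rest, /\(id:[0-9]+/)) {
            print cid, substr(rest, RSTART + 4, RLENGTH - 4)   # skip "(id:" (4 chars)
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

pairs=$(extract_pairs <<'EOF'
User(client_id:03)) results:[RelatedUser(id:204, weight:10),_RelatedUser(id:491,_weight:10),_RelatedUser(id:29, weight: 20)
User(client_id:04)) results:[RelatedUser(id:209, weight:10),_RelatedUser(id:301,_weight:10)
User(client_id:05)) results:[RelatedUser(id:20, weight: 10)
EOF
)

printf '%s\n' "$pairs"
```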

Sort history on number of occurrences

Basically I want to print the 10 most used commands that are stored in the bash history, but they still have to be preceded by the number that indicates when each was used.
I got this far:
history | cut -f 2 | cut -d ' ' -f 3,5 | sort -k 2 -n
Which should sort the second column of the number of occurrences from the command in that row... But it doesn't do that. I know I can head -10 the pipe at the end to take the highest ten of them, but I'm kinda stuck with the sorting part.
The 10 most used commands stored in your history:
history | sed -e 's/ *[0-9][0-9]* *//' | sort | uniq -c | sort -rn | head -10
This gives you the most used command line entries by removing the history number (sed), counting (sort | uniq -c), sorting by frequency (sort -rn) and showing only the top ten entries.
If you just want the commands alone:
history | awk '{print $2;}' | sort | uniq -c | sort -rn | head -10
Both of these strip the history number. Currently, I have no idea how to achieve that in one line.
If you want to find the top used commands in your history file, you will have to count the instances in your history. awk can be used to do this. In the following code, the awk segment will create a hashtable with commands as the key and the number of times they appear as the value. This is printed out with the last history number for that command and sorted:
history | cut -f 2 | cut -d ' ' -f 3,5 | awk '{a[$2]++;b[$2]=$1} END{for (i in a) {print b[i], i, a[i]}}' | sort -k3 -rn | head -n 10
Output looks like:
975 cd 142
972 vim 122
990 ls 118
686 hg 90
974 mvn 51
939 bash 39
978 tac 32
958 cat 28
765 echo 27
981 exit 17
If you don't want the last column you could pipe the output through cut -d' ' -f1,2.
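Since the exact spacing of history output varies, the two cut stages can be fragile. Here is a minimal sketch of the same counting idea that lets awk split on whitespace instead; the sample text stands in for real history output and is purely made up, and it assumes each line has the form "number command arguments".

```shell
# Hypothetical stand-in for the output of the history builtin.
sample='    1  ls -l
    2  cd /tmp
    3  ls -l
    4  vim notes
    5  ls -l
    6  cd /tmp'

# count[cmd] = number of uses, last[cmd] = most recent history number.
top=$(printf '%s\n' "$sample" | awk '
    { count[$2]++; last[$2] = $1 }
    END { for (c in count) print last[c], c, count[c] }
' | sort -k3,3nr | head -n 2)

printf '%s\n' "$top"
```

With real data, replace the printf line with history and change head -n 2 to head -n 10.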
