How can I apply a substring function on a file - unix

I have a scenario where I have to find the column names used for the tables.
I have a large file with around 50K records, like:
PLACES_OF_INDIA_2 :3432123, :Names Expr=('"Table_Name1".Column_1 || '' '' || "Table_Name2".Column_1 || '' '' || "Table_Name3"."Column_2"'), :Name=BCUDB2."Table_Name1".ATTR_VALUE, :Condition=BCUDB2."Table_Name1".Column_1, :Tables=(ABCXYZ."Table_Name1", ABCXYZ."Table_Name2", ABCXYZ."Table_Name3"), :Keys=ABCXYZ."IT_DIM_ANHBUSCH_37560".(M_113478_PQR(Int), M_113443_PQR(Int), M_113484_PQR(Int), M_113470_PQR(Int), M_113468_PQR(Int)), :Attrs=(Name :456866 = ('"Table_Name1".Column_1 || '' '' || "Table_Name2".Column_1 || '' '' || "Table_Name3"."Column_2"', ABCXYZ."Table_Name1", ABCXYZ."Table_Name2", ABCXYZ."Table_Name3"), PLACES_OF_ORIGIN_WineHierarchy_2 :490064736 = ABCXYZ."Table_Name1".ATTR_VALUE)
So I want to extract the columns used for the tables in the file, along with the line number. I have the list of tables in a separate file, like:
Table_Name1
Table_Name2
Table_Name3
and I want output like:
1,Column_1
1,Column_2
1,Column_3
Could you please help me achieve this?
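One way to approach this with awk (a sketch; tables.txt and data.txt are placeholder names for the table list and the large file, and it assumes column references always look like "Table_Name".Column or "Table_Name"."Column"):
awk 'NR == FNR { tables[$1]; next }
{
    for (t in tables) {
        s = $0
        # repeatedly match "Table".Column or "Table"."Column" on this line
        while (match(s, "\"" t "\"\\.\"?[A-Za-z0-9_]+\"?")) {
            ref = substr(s, RSTART, RLENGTH)
            sub("^\"" t "\"\\.", "", ref)   # drop the table prefix
            gsub(/"/, "", ref)              # drop quotes around the column name
            print FNR "," ref
            s = substr(s, RSTART + RLENGTH)
        }
    }
}' tables.txt data.txt | sort -u
Each match prints the data file's line number (FNR) and the column name; sort -u removes duplicate references from the same line.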

Application Engine PeopleCode bind variables

I have the PeopleCode step below in an Application Engine program that reads a CSV file using a File Layout and then inserts the data into a table, and I am just trying to get a better understanding of how the line of code (&SQL1 = CreateSQL("%Insert(:1)");) in the script below gets generated. It looks like CreateSQL is using a bind variable (:1) inside the Insert statement, but I am struggling to find where this variable is defined in the program.
Function EditRecord(&REC As Record) Returns boolean;
   Local integer &E;
   &REC.ExecuteEdits(%Edit_Required + %Edit_DateRange + %Edit_YesNo + %Edit_OneZero);
   If &REC.IsEditError Then
      For &E = 1 To &REC.FieldCount
         &MYFIELD = &REC.GetField(&E);
         If &MYFIELD.EditError Then
            &MSGNUM = &MYFIELD.MessageNumber;
            &MSGSET = &MYFIELD.MessageSetNumber;
            &LOGFILE.WriteLine("****Record:" | &REC.Name | ", Field:" | &MYFIELD.Name);
            &LOGFILE.WriteLine("****" | MsgGet(&MSGSET, &MSGNUM, ""));
         End-If;
      End-For;
      Return False;
   Else
      Return True;
   End-If;
End-Function;
Function ImportSegment(&RS2 As Rowset, &RSParent As Rowset)
   Local Rowset &RS1, &RSP;
   Local string &RecordName;
   Local Record &REC2, &RECP;
   Local SQL &SQL1;
   Local integer &I, &L;
   &SQL1 = CreateSQL("%Insert(:1)");
   rem &SQL1 = CreateSQL("%Insert(:1) Order by COUNT_ORDER");
   &RecordName = "RECORD." | &RS2.DBRecordName;
   &REC2 = CreateRecord(@(&RecordName));
   &RECP = &RSParent(1).GetRecord(@(&RecordName));
   For &I = 1 To &RS2.ActiveRowCount
      &RS2(&I).GetRecord(1).CopyFieldsTo(&REC2);
      If (EditRecord(&REC2)) Then
         &SQL1.Execute(&REC2);
         &RS2(&I).GetRecord(1).CopyFieldsTo(&RECP);
         For &L = 1 To &RS2.GetRow(&I).ChildCount
            &RS1 = &RS2.GetRow(&I).GetRowset(&L);
            If (&RS1 <> Null) Then
               &RSP = &RSParent.GetRow(1).GetRowset(&L);
               ImportSegment(&RS1, &RSP);
            End-If;
         End-For;
         If &RSParent.ActiveRowCount > 0 Then
            &RSParent.DeleteRow(1);
         End-If;
      Else
         &LOGFILE.WriteRowset(&RS);
         &LOGFILE.WriteLine("****Correct error in this record and delete all error messages");
         &LOGFILE.WriteRecord(&REC2);
         For &L = 1 To &RS2.GetRow(&I).ChildCount
            &RS1 = &RS2.GetRow(&I).GetRowset(&L);
            If (&RS1 <> Null) Then
               &LOGFILE.WriteRowset(&RS1);
            End-If;
         End-For;
      End-If;
   End-For;
End-Function;
rem *****************************************************************;
rem * PeopleCode to Import Data *;
rem *****************************************************************;
Local File &FILE1, &FILE3;
Local Record &REC1;
Local SQL &SQL1;
Local Rowset &RS1, &RS2;
Local integer &M;
&FILE1 = GetFile("\\nt115\apps\interface_prod\interface_in\Item_Loader\ItemPriceFile.csv", "r", "a", %FilePath_Absolute);
&LOGFILE = GetFile("\\nt115\apps\interface_prod\interface_in\Item_Loader\ItemPriceFile.txt", "r", "a", %FilePath_Absolute);
&FILE1.SetFileLayout(FileLayout.GH_ITM_PR_UPDT);
&LOGFILE.SetFileLayout(FileLayout.GH_ITM_PR_UPDT);
&RS1 = &FILE1.CreateRowset();
&RS = CreateRowset(Record.GH_ITM_PR_UPDT);
REM &SQL1 = CreateSQL("%Insert(:1)");
&SQL1 = CreateSQL("%Insert(:1)");
/*Skip Header Row: The following line of code reads the first line in the file layout (the header)
and does nothing. Then the pointer goes to the next line in the file and starts using the
file.readrowset*/
&some_boolean = &FILE1.ReadLine(&string);
&RS1 = &FILE1.ReadRowset();
While &RS1 <> Null
   ImportSegment(&RS1, &RS);
   &RS1 = &FILE1.ReadRowset();
End-While;
&FILE1.Close();
&LOGFILE.Close();
The :1 is filled in by the line further down: &SQL1.Execute(&REC2);
&REC2 gets assigned a record object, so the line &SQL1.Execute(&REC2); evaluates to %Insert(your_record_object).
Here is a simple example that's doing basically the same thing
Here is a description of %Insert
Answering here because this is too long for a comment:
The table name is most likely (PS_)GH_ITM_PR_UPDT. The general consensus is to name the FileLayout the same as the record it is based on.
If not, it is defined in FileLayout.GH_ITM_PR_UPDT. Open the FileLayout, right click the segment and under 'Selected Node Properties' you will find the 'File Record Name'.
In your code this record is carried over into &RS1.
&FILE1.SetFileLayout(FileLayout.GH_ITM_PR_UPDT);
&RS1 = &FILE1.CreateRowset();
The rowset is a collection of rows. A row consists of records, and a record is a row of data from a database table. (PeopleSoft object data types are fun...)
This rowset is filled with data in the following statement:
&RS1 = &FILE1.ReadRowset();
This uses your file as input and outputs a rowset collection, mapping the data to records based on how you defined your FileLayout.
The result is fed into the ImportSegment function:
ImportSegment(&RS1, &RS);
Function ImportSegment(&RS2 As Rowset, &RSParent As Rowset)
&RS2 in the function is a reference to &RS1 in the rest of your code.
The table name is also hidden here:
&RecordName = "RECORD." | &RS2.DBRecordName;
So if you can't or don't want to check the FileLayout, you could output &RS2.DBRecordName with a MessageBox, and your answer will be in the Message Log of your Process Monitor.
Finally a record object is created for this database table and it is filled with a row from the rowset. This record is inserted into the database table:
&REC2 = CreateRecord(@(&RecordName));
&RS2(&I).GetRecord(1).CopyFieldsTo(&REC2);
&SQL1 = CreateSQL("%Insert(:1)");
&SQL1.Execute(&REC2);
TLDR:
The table name can be found in the FileLayout, or output in the ImportSegment function as &RS2.DBRecordName.

Validating the data type of all columns in a CSV file through unix

I have a CSV file like this:
dsdgh|234|#jhsjdh||jdhjdhfu|123|
#45ghf|123|laiej|||b8#hfj|
|hyrhyf|123||fhyr|##$%|
and so on.
The number of columns can be up to 100. Also, the file above is pipe-separated.
I want to check the data type of each column, i.e. whether a column is numeric, alphabetic, or alphanumeric,
and I want to redirect the result to a txt file.
Please help me to achieve this.
Thanks.
Assuming that the number of columns is the same in every row, you can use this script:
import re
import sys

input_file = open(sys.argv[1])
cols = None
for line in input_file:
    # strip the newline so the last field is not polluted by it
    fields = line.rstrip('\n').split('|')
    if cols is None:
        cols = ['empty'] * len(fields)
    for i, field in enumerate(fields):
        if field == '':
            continue
        if re.match(r'^[0-9]+$', field):        # digits only
            if cols[i] == 'empty':
                cols[i] = 'numeric'
            elif cols[i] == 'alphabetic':
                cols[i] = 'alphanumeric'
        elif re.match(r'^[^0-9]+$', field):     # no digits at all
            if cols[i] == 'empty':
                cols[i] = 'alphabetic'
            elif cols[i] == 'numeric':
                cols[i] = 'alphanumeric'
        else:                                   # mix of digits and other characters
            cols[i] = 'alphanumeric'
print('|'.join(cols))
Just save it to a file (script.py in this example) and run:
$ python script.py <path_to_file_with_columns>
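If you would rather stay with plain unix tools, here is a rough awk equivalent (a sketch; file.csv and result.txt are placeholder names) that classifies each column the same way and redirects the result to a txt file:
awk -F'|' '{
    for (i = 1; i <= NF; i++) {
        if ($i == "") continue
        # classify this field, then merge with what earlier rows showed
        t = ($i ~ /^[0-9]+$/) ? "numeric" : (($i ~ /^[^0-9]+$/) ? "alphabetic" : "alphanumeric")
        if (type[i] == "" || type[i] == t) type[i] = t
        else type[i] = "alphanumeric"
    }
    if (NF > maxf) maxf = NF
}
END {
    for (i = 1; i <= maxf; i++)
        printf "%s%s", (type[i] == "" ? "empty" : type[i]), (i < maxf ? "|" : "\n")
}' file.csv > result.txt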

Improve AWK script - Compare and add fields

I have a CSV file separated by semicolons; the file contains a sentiment analysis of customer reviews.
The reviews are grouped by fields 1 and 6. What I have to do is sum the last (numeric) field of each line of the group and then compare the sum with field 3: if they match, the comparison yields 1, otherwise 0.
At the same time, I also have to compare fields 6 and 7, applying the same rule as above.
Finally, count the number of ones obtained for each of the two comparisons separately.
I have the script below, but I think it could be improved. Any suggestions? Also, I am not sure the script is correct!
BEGIN {
    OFS = FS = ";";
    flag = "";
    counter1 = 0;
    counter2 = 0;
    counter3 = 0;
}
{
    number = $1;
    topic = $6;
    id = number ";" topic;
    if (id != flag)
    {
        for (i in topics)
        {
            if ((sum < 0) && (polarity[i] == "negative") || (sum > 0) && (polarity[i] == "positive"))
            {
                hit_2 = 1;
                counter2++;
            }
            else
            {
                hit_2 = 0;
            }
            s = split(topics[i], words, ";")
            hit_1 = 0;
            for (k = 1; k <= s; k++)
            {
                if ((words[k] == words[k+1]) && (words[k] != "") || (words[k] == "NULL") && (hit_2 == 1))
                {
                    hit_1 = 1;
                }
            }
            if (hit_1 == 1)
            {
                counter1++;
            }
            print to_print[i] ";" hit_1 ";" hit_2;
        }
        delete topics;
        delete to_print;
        delete polarity;
        counter3++;
        sum = "";
        flag = id;
    }
    sum += $(NF-1);
    topics[$1";"$6] = topics[$1";"$6] ";" $6 ";" $7;
    to_print[$1";"$6] = $1";"$2";"$3";"$4";"$5";"$6
    polarity[$1";"$6] = $3;
}
END {
    print ""
    print "#### - sentiments: " counter3 " - topic: " counter1 " - polarity: " counter2;
}
A portion of the input data:
100429301;"RESTAURANT#GENERAL";negative;1004293;10042930;place;place;place;good;good;2.000000;
100429301;"RESTAURANT#GENERAL";negative;1004293;10042930;place;place;place;not longer;not longer;-3.000000;
100429331;"FOOD#QUALITY";negative;1004293;10042933;food;food;food;lousy;lousy;-3.000000;
100429331;"FOOD#QUALITY";negative;1004293;10042933;food;food;food;too sweet;too sweet;3.600000;
100429331;"FOOD#QUALITY";negative;1004293;10042933;food;portions;portion;tiny;tiny;-1.000000;
103269521;"FOOD#QUALITY";positive;1032695;10326952;duck breast special;visit;visit;incredible;incredible;4.000000;
Output:
100429301;"RESTAURANT#GENERAL";negative;1004293;10042930;place;1;1
100429331;"FOOD#QUALITY";negative;1004293;10042933;food;1;1
103269521;"FOOD#QUALITY";positive;1032695;10326952;duck breast special;0;1
#### - sentiments: 57 - topic: 28 - polarity: 39
I rewrote part of it, perhaps you can extend this further.
$ awk -F';' -v OFS=';' '
{key=$1 FS $3 FS $6;
sum[key]+=$(NF-1);
line[key]=$1 FS $2 FS $3 FS $4 FS $5 FS $6;
sign[key]=($3=="negative"?-1:1)
}
END{for(k in sum)
print line[k],(sum[k]*sign[k]<0?0:1),sum[k],sign[k]}' data
100429301;"RESTAURANT#GENERAL";negative;1004293;10042930;place;1;-1;-1
103269521;"FOOD#QUALITY";positive;1032695;10326952;duck breast special;1;4;1
100429331;"FOOD#QUALITY";negative;1004293;10042933;food;1;-0.4;-1
This only does your first check (hit_2, I guess), while also adding sum and sign information (the last two fields).
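One possible extension of that rewrite (a sketch; it uses length() on an array to count groups, which gawk supports) that also tallies the polarity hits the way the original counters do:
$ awk -F';' -v OFS=';' '
{key=$1 FS $3 FS $6;
 sum[key]+=$(NF-1);
 line[key]=$1 FS $2 FS $3 FS $4 FS $5 FS $6;
 sign[key]=($3=="negative"?-1:1)
}
END{for(k in sum)
      {hit=(sum[k]*sign[k]<0?0:1); hits+=hit; print line[k],hit}
    print "#### - sentiments: " length(sum) " - polarity: " hits}' data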

Calculating and displaying date difference

I have two dates which I'm loading into variables using:
a=`date +%s`
b=`date +%s`
I want to know the difference between the times, e.g. a difference of 00:00:10 and so on. I calculate it using:
diff=$(( b-a ))
echo "$(( diff/3600 )):$((( diff/60)%60)):$((diff%60))"
but the output is 0:0:07. How can I format each part with two digits, i.e. get 00:00:07?
If the string length is 1, prepend a zero to the value:
hour=$(( diff/3600 ))
min=$((( diff/60)%60))
sec=$((diff%60))
[[ ${#hour} == 1 ]] && hour="0$hour" || hour="$hour"
[[ ${#min} == 1 ]] && min="0$min" || min="$min"
[[ ${#sec} == 1 ]] && sec="0$sec" || sec="$sec"
echo "$hour:$min:$sec"
Output:
00:00:16
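A shorter alternative (a sketch; it relies only on printf's zero padding, which the bash builtin provides):
diff=$(( b-a ))
printf '%02d:%02d:%02d\n' "$(( diff/3600 ))" "$(( (diff/60)%60 ))" "$(( diff%60 ))"
%02d pads each number to two digits, so no string-length checks are needed.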

awk count and sum based on slab

I would like to extract all the lines from the first file (gzipped, i.e. Input.csv.gz) whose 4th field falls within the range given by the second file (Slab.csv), first field (StartRange) to second field (EndRange), and then populate the slab-wise count of rows and the sums of the 4th and 5th fields of the first file.
Input.csv.gz (gzipped):
Desc,Date,Zone,Duration,Calls
AB,01-06-2014,XYZ,450,3
AB,01-06-2014,XYZ,642,3
AB,01-06-2014,XYZ,0,0
AB,01-06-2014,XYZ,205,3
AB,01-06-2014,XYZ,98,1
AB,01-06-2014,XYZ,455,1
AB,01-06-2014,XYZ,120,1
AB,01-06-2014,XYZ,0,0
AB,01-06-2014,XYZ,193,1
AB,01-06-2014,XYZ,0,0
AB,01-06-2014,XYZ,161,2
Slab.csv
StartRange,EndRange
0,0
1,10
11,100
101,200
201,300
301,400
401,500
501,10000
Expected Output:
StartRange,EndRange,Count,Sum-4,Sum-5
0,0,3,0,0
1,10,NotFound,NotFound,NotFound
11,100,1,98,1
101,200,3,474,4
201,300,1,205,3
301,400,NotFound,NotFound,NotFound
401,500,2,905,4
501,10000,1,642,3
I am using the two commands below to get the above output, except for the "NotFound" cases.
awk -F, 'NR==FNR{s[NR]=$1;e[NR]=$2;c[NR]=$0;n++;next} {for(i=1;i<=n;i++) if($4>=s[i]&&$4<=e[i]) {print $0,","c[i];break}}' Slab.csv <(gzip -dc Input.csv.gz) >Op_step1.csv
cat Op_step1.csv | awk -F, '{key=$6","$7;++a[key];b[key]=b[key]+$4;c[key]=c[key]+$5} END{for(i in a)print i","a[i]","b[i]","c[i]}' >Op_step2.csv
Op_step2.csv
101,200,3,474,4
501,10000,1,642,3
0,0,3,0,0
401,500,2,905,4
11,100,1,98,1
201,300,1,205,3
Any suggestions to make it a one-liner command that achieves the expected output? I don't have perl or python access.
Here is another option using perl, which takes advantage of multi-dimensional arrays and hashes.
perl -F, -lane'
    BEGIN {
        $x = pop;
        ## Create array of arrays from start and end ranges
        ## @range = ( [0,0] , [1,10] ... )
        (undef, @range) = map { chomp; [split /,/] } <>;
        @ARGV = $x;
    }
    ## Skip the first line
    next if $. == 1;
    ## Create hash of hashes
    ## %line = ( "0 0" => { "count" => counts , "sum4" => sum_of_col4 , "sum5" => sum_of_col5 } )
    for (@range) {
        if ($F[3] >= $_->[0] && $F[3] <= $_->[1]) {
            $line{"@$_"}{"count"}++;
            $line{"@$_"}{"sum4"} += $F[3];
            $line{"@$_"}{"sum5"} += $F[4];
        }
    }
}{
    print "StartRange,EndRange,Count,Sum-4,Sum-5";
    print join ",", @$_,
        $line{"@$_"}{"count"} // "NotFound",
        $line{"@$_"}{"sum4"} // "NotFound",
        $line{"@$_"}{"sum5"} // "NotFound"
        for @range
' slab input
StartRange,EndRange,Count,Sum-4,Sum-5
0,0,3,0,0
1,10,NotFound,NotFound,NotFound
11,100,1,98,1
101,200,3,474,4
201,300,1,205,3
301,400,NotFound,NotFound,NotFound
401,500,2,905,4
501,10000,1,642,3
Here is one way using awk and sort:
awk '
BEGIN {
    FS = OFS = SUBSEP = ",";
    print "StartRange,EndRange,Count,Sum-4,Sum-5"
}
FNR == 1 { next }
NR == FNR {
    ranges[$1,$2]++;
    next
}
{
    for (range in ranges) {
        split(range, tmp, SUBSEP);
        if ($4 >= tmp[1] && $4 <= tmp[2]) {
            count[range]++;
            sum4[range] += $4;
            sum5[range] += $5;
            next
        }
    }
}
END {
    for (range in ranges)
        print range, (count[range] ? count[range] : "NotFound"), (sum4[range] ? sum4[range] : "NotFound"), (sum5[range] ? sum5[range] : "NotFound") | "sort -t, -nk1,2"
}' slab input
StartRange,EndRange,Count,Sum-4,Sum-5
0,0,3,NotFound,NotFound
1,10,NotFound,NotFound,NotFound
11,100,1,98,1
101,200,3,474,4
201,300,1,205,3
301,400,NotFound,NotFound,NotFound
401,500,2,905,4
501,10000,1,642,3
Set the Input and Output Field Separators and SUBSEP to ,. Print the Header line.
If it is the first line of a file, skip it.
Load the entire Slab.csv into an array called ranges.
For every range in the ranges array, split the key to get the start and end of the range. If the 4th column is in the range, increment the count array and add the values to the sum4 and sum5 arrays appropriately.
In the END block, iterate through the ranges and print them.
Pipe the output to sort to get the output in order.
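To run it against the gzipped input directly, as in the question's own pipeline, you could save the program above to a file (slab.awk here, a placeholder name) and use process substitution (bash assumed):
awk -f slab.awk Slab.csv <(gzip -dc Input.csv.gz)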
