I have read data from a csv file containing many duplicate email addresses into a temp table. The format is essentially id, emailtype-description, email.
Here is an example of some data:
id  emailtype-description  email
1   E-Mail                 john@gmail.com
1   preferred E-mail       john@gmail.com
2   2nd E-mail             stacey@yahoo.com
2   preferred-Email        sth@yahoo.com
2   family E-Mail          sth@yahoo.com
cInputFile = SUBSTITUTE(cDataDirectory, "Emails").

INPUT STREAM csv FROM VALUE(cInputFile).

IMPORT STREAM csv DELIMITER "," ^ NO-ERROR.   /* skip the header line */

REPEAT TRANSACTION:
  CREATE ttEmail.
  IMPORT STREAM csv DELIMITER ","
    ttEmail.uniqueid
    ttEmail.emailTypeDescription
    ttEmail.emailAddr
    .
END.

INPUT STREAM csv CLOSE.
I want to dedupe the rows, but I don't want to do this randomly. I want to make sure that certain types take priority over others. For instance, rows marked with the type "preferred E-mail" should always be kept if they exist, and the remaining types have their own order of precedence, so "E-mail" takes precedence over "2nd-Email" or "family E-Mail".
I'd like to do, in Progress code, the equivalent of a custom sort on emailtype-description followed by a de-dupe. That way I could define the sort order and then dedupe so that the highest-priority email and type are retained.
Is there a way to do this to my table in Progress? I want to sort first by uniqueid, then by emailtype-description, but with a custom sort order rather than an alphabetical one. What is the best approach?
When you say that you want a custom sort, not an alphabetical one, do you mean that you want to sort by the email type in a non-alphabetical way? If so, I think you need to translate the email type into a field that sorts the way you want. Something along these lines:
/* first add a field called emailTypeSortOrder to your ttEmail temp-table */

define variable emailTypeSortOrderList as character no-undo.

emailTypeSortOrderList = "preferred E-mail,E-mail,2nd-Email,family E-mail".

cInputFile = SUBSTITUTE(cDataDirectory, "Emails").

INPUT STREAM csv FROM VALUE(cInputFile).

IMPORT STREAM csv DELIMITER "," ^ NO-ERROR.   /* skip the header line */

REPEAT TRANSACTION:

  CREATE ttEmail.
  IMPORT STREAM csv DELIMITER ","
    ttEmail.uniqueid
    ttEmail.emailTypeDescription
    ttEmail.emailAddr
    .

  /* classify the email type sort order; unknown types sort last */
  ttEmail.emailTypeSortOrder = lookup( ttEmail.emailTypeDescription, emailTypeSortOrderList ).
  if ttEmail.emailTypeSortOrder <= 0 then ttEmail.emailTypeSortOrder = 9999999.

END.

INPUT STREAM csv CLOSE.
And now you can sort and de-duplicate using the new ordering field:

for each ttEmail
  break by ttEmail.uniqueid
        by ttEmail.emailAddr
        by ttEmail.emailTypeSortOrder:

  if first-of( ttEmail.emailAddr ) then
    next.             /* keep the highest-priority type for this id and address */
  else
    delete ttEmail.   /* remove unwanted duplicates... */

end.
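To sanity-check the result you can list what is left in priority order; a quick sketch, assuming the same ttEmail temp-table:

/* show the surviving rows per uniqueid, highest priority first */
for each ttEmail
  by ttEmail.uniqueid
  by ttEmail.emailTypeSortOrder:

  display
    ttEmail.uniqueid
    ttEmail.emailTypeDescription format "x(20)"
    ttEmail.emailAddr            format "x(30)".

end.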
Related
I am trying to import a .csv file and match its records against the records in the database. However, the database records have leading zeros. The field is a character field, and the amount of data is fairly large.
The format of the field in the database is x(15).
The problem I am facing is that the .csv file contains data like AB123456789, whereas the database field has "00000AB123456789".
I am importing the .csv into a character variable.
Could someone please let me know what I should do to add the leading zeros using Progress?
Thank you.
You need to FILL() the input string with "0" in order to pad it to a specific length. You can do that with code similar to this:
define variable inputText as character no-undo format "x(15)".
define variable n         as integer   no-undo.

input from "input.csv".

repeat:

  import inputText.

  n = 15 - length( inputText ).

  if n > 0 then
    inputText = fill( "0", n ) + inputText.

  display inputText.

end.

input close.
Substitute your actual field name for inputText and use whatever mechanism you are actually using for importing the CSV data.
FYI - the "length of the field in the database" is NOT "x(15)". That is a display formatting string. The data dictionary has a default format string that was created when the schema was defined but it has absolutely no impact on what is actually stored in the database. ALL Progress data is stored as variable length length. It is not padded to fit the display format and, in fact, it can be "overstuffed" and it is very, very common for applications to do so. This is a source of great frustration to SQL reporting tools that think the display format is some sort of length limit. It is not.
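Putting the padding together with the original goal of matching database records, a minimal sketch might look like this. The table customer and field custRef are made-up placeholders for whatever your actual table and field are:

define variable inputText as character no-undo.

input from "input.csv".

repeat:

  import inputText.

  /* left-pad to 15 characters with zeros */
  if length( inputText ) < 15 then
    inputText = fill( "0", 15 - length( inputText )) + inputText.

  /* "customer" and "custRef" are hypothetical - use your real table and field */
  find first customer where customer.custRef = inputText no-lock no-error.

  if available customer then
    display customer.custRef.

end.

input close.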
I use the code below and it is working fine. I don't want to change the temp-table field (dActiveDate) type, but please help me change the date format.
Note - the date format can be changed by the user. It can be YY/MM/DD or DD/MM/YYYY or MM/DD/YY and so on...
DEFINE TEMP-TABLE tt_data NO-UNDO
  FIELD cName       AS CHARACTER
  FIELD dActiveDate AS DATE.

CREATE tt_data.
ASSIGN
  tt_data.cName       = "David"
  tt_data.dActiveDate = TODAY
  .

OUTPUT TO VALUE("C:\Users\ast\Documents\QRF\data.csv").

PUT UNFORMATTED "Name,Activedate" SKIP.

FOR EACH tt_data NO-LOCK:
  /* There are more than 15 fields available, so using EXPORT DELIMITER
     helps keep the number of lines of code down */
  EXPORT DELIMITER "," tt_data.
END.

OUTPUT CLOSE.
As this a "part two" of this question: How to change date format based on variable initial value? why not build on the answer there?
Wrap the dateformat part in a function/procedure/method and call it in the EXPORT statement. The only change required will be to specify each field rather than just the temp-table.
EXPORT DELIMITER ","
dateformat(tt_data.dactivedate, cDateFormat)
tt_data.cName
This assumes that there's a function called dateformat that takes the date and format and returns a string with the formatted date (as in the previous question).
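In case it helps, a minimal sketch of such a function might look like this. The implementation in the linked answer may differ; this version simply substitutes the day, month and year into a format string such as "dd/mm/yyyy" or "yy/mm/dd":

FUNCTION dateformat RETURNS CHARACTER (INPUT pdDate AS DATE, INPUT pcFormat AS CHARACTER):

  DEFINE VARIABLE cResult AS CHARACTER NO-UNDO.

  IF pdDate = ? THEN RETURN "".

  cResult = pcFormat.

  /* replace the longer token first so "yyyy" is not clobbered by "yy" */
  cResult = REPLACE(cResult, "yyyy", STRING(YEAR(pdDate), "9999")).
  cResult = REPLACE(cResult, "yy",   STRING(YEAR(pdDate) MODULO 100, "99")).
  cResult = REPLACE(cResult, "mm",   STRING(MONTH(pdDate), "99")).
  cResult = REPLACE(cResult, "dd",   STRING(DAY(pdDate), "99")).

  RETURN cResult.

END FUNCTION.

Called as dateformat(TODAY, "dd/mm/yyyy"), it returns the date as a string in that layout, so cDateFormat can hold whatever layout the user picked.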
"and so on..." needs to be specified. Depending on the specification you may have to resort to a custom function like Jensd's answer.
If you can constrain the formats allowed, you can use normal handling by using:
session:date-format = "ymd" / "mdy" / "dmy".
session:year-offset = 1 / 1950. // for four vs two digit year
How you populate these two attributes can be done in a similar fashion to the other question.
You may need to reset these session attributes to their initial state in a finally block.
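A minimal sketch of that approach, assuming a hypothetical variable cUserDateFormat that holds the user's choice:

DEFINE VARIABLE cUserDateFormat AS CHARACTER NO-UNDO INITIAL "ymd".  /* hypothetical user choice */
DEFINE VARIABLE cOldDateFormat  AS CHARACTER NO-UNDO.
DEFINE VARIABLE iOldYearOffset  AS INTEGER   NO-UNDO.

/* remember the current session settings */
ASSIGN
  cOldDateFormat = SESSION:DATE-FORMAT
  iOldYearOffset = SESSION:YEAR-OFFSET.

DO ON ERROR UNDO, THROW:

  ASSIGN
    SESSION:DATE-FORMAT = cUserDateFormat
    SESSION:YEAR-OFFSET = 1950.

  OUTPUT TO VALUE("C:\Users\ast\Documents\QRF\data.csv").
  PUT UNFORMATTED "Name,Activedate" SKIP.

  FOR EACH tt_data NO-LOCK:
    EXPORT DELIMITER "," tt_data.
  END.

  OUTPUT CLOSE.

  FINALLY:
    /* put the session back the way it was */
    ASSIGN
      SESSION:DATE-FORMAT = cOldDateFormat
      SESSION:YEAR-OFFSET = iOldYearOffset.
  END FINALLY.

END.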
Good day.
I have a previous question in this link. On the exported CSV, I put the TABLE NAME on the first line. I want to import this CSV into my system.
My current code is this:
DEF VAR ic     as INT.
DEF VAR cTable as CHAR.

INPUT FROM VALUE(SESSION:TEMP-DIRECTORY + "temp.csv").

ic = 0.

REPEAT:
  ic = ic + 1.

  IF ic > 1 THEN DO:
    CREATE cTable.
    IMPORT DELIMITER "," cTable.
  END.

  IMPORT cTable.
END.

INPUT CLOSE.
I know that the code is wrong in the CREATE part. How do I do this?
Also, when I EXPORT, there is an additional BLANK line after the last record. How do I remove this without opening the CSV file?
Removing the empty line at the end of your file doesn't fix your problem; worse, you will not be able to read the last valid CSV line with the IMPORT statement if you do that (it needs an empty line at the end to work properly).
The actual problem is that you get an empty row in your table because the IMPORT DELIMITER "," cTable. fails when the REPEAT block reaches the end of the file: to leave the loop, it raises the ENDKEY condition. But since you call CREATE cTable. before the loop is left, you get an empty entry. I hope this explanation helps you understand how the REPEAT loop works; if you don't know this, it looks just like an endless loop without any break condition.
Anyway, to fix that problem you can either delete the empty row (like you did before), which is perfectly valid, or you can omit the NO-UNDO from the temp-table definition, because then the REPEAT will UNDO the CREATE by default.
To your other question about the CSV header line: you have to read the line somehow. I don't think there is a statement to just skip it and start reading at the 2nd line of the file.
If you need the header names, you can simply define a character variable for every column and import it like:
IMPORT DELIMITER "," cColumn1 cColumn2. /* for every column */
Or if you just want to read and ignore it, you can use
IMPORT UNFORMATTED cTemp.
with a temp variable that reads the whole line.
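Putting those pieces together, a minimal sketch might look like this. The temp-table ttData and its fields are made-up placeholders, since the real layout comes from your export:

/* NO-UNDO is omitted on purpose: when ENDKEY is raised at end of file,
   the REPEAT iteration is undone and the last empty CREATE disappears */
DEFINE TEMP-TABLE ttData
  FIELD cField1 AS CHARACTER
  FIELD cField2 AS CHARACTER.

DEFINE VARIABLE cHeader AS CHARACTER NO-UNDO.

INPUT FROM VALUE(SESSION:TEMP-DIRECTORY + "temp.csv").

/* read and discard the first line (the table name) */
IMPORT UNFORMATTED cHeader.

REPEAT:
  CREATE ttData.
  IMPORT DELIMITER "," ttData.
END.

INPUT CLOSE.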
If you're trying to import an unknown file format, you might try reading in the entire line of data and then pick through the individual fields. As for question 2, it's better to handle the incorrect data programmatically, rather than trying to change the file itself.
This code reads in each line from the file, skipping any that are blank or null. Then it goes through each comma-separated field in the line and displays them.
DEFINE VARIABLE cTable AS CHARACTER NO-UNDO.
DEFINE VARIABLE cField AS CHARACTER NO-UNDO.
DEFINE VARIABLE iLoop  AS INTEGER   NO-UNDO.

INPUT FROM VALUE(SESSION:TEMP-DIRECTORY + "temp.csv").

REPEAT ON ERROR UNDO, NEXT:

  IMPORT UNFORMATTED cTable.                  /* Read an entire line from the file. */

  IF cTable = "" OR cTable = ? THEN NEXT.     /* Skip blank lines. */

  DO iLoop = 1 TO NUM-ENTRIES(cTable, ","):   /* Break up the line by the comma delimiter. */
    cField = ENTRY(iLoop, cTable).
    MESSAGE "Field " + STRING(iLoop) + ": " + cField VIEW-AS ALERT-BOX.
  END.

END.

INPUT CLOSE.
Since the file layout is unknown, all of the fields are read as character. You'll need to add some logic to determine if the values are integers, decimals, dates, etc.
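As a rough illustration of that last point, one way to classify a value is to attempt each conversion under NO-ERROR and check ERROR-STATUS; a sketch, not tied to any particular file layout:

DEFINE VARIABLE cValue AS CHARACTER NO-UNDO.
DEFINE VARIABLE iValue AS INTEGER   NO-UNDO.
DEFINE VARIABLE dValue AS DECIMAL   NO-UNDO.
DEFINE VARIABLE tValue AS DATE      NO-UNDO.
DEFINE VARIABLE cType  AS CHARACTER NO-UNDO.

cValue = "42".   /* one field pulled out of a line, for example */

/* try the conversions in turn and see which one succeeds */
ASSIGN iValue = INTEGER(cValue) NO-ERROR.
IF NOT ERROR-STATUS:ERROR THEN
  cType = "integer".
ELSE DO:
  ASSIGN dValue = DECIMAL(cValue) NO-ERROR.
  IF NOT ERROR-STATUS:ERROR THEN
    cType = "decimal".
  ELSE DO:
    ASSIGN tValue = DATE(cValue) NO-ERROR.
    IF NOT ERROR-STATUS:ERROR THEN
      cType = "date".
    ELSE
      cType = "character".
  END.
END.

MESSAGE cValue "looks like a" cType VIEW-AS ALERT-BOX.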
I have data with some thousands of records, and each record has multiple columns. One of the columns contains data with a comma "," in it.
When I tried to spool that data into a CSV file and split the text into columns using the comma as the delimiter, the data came out wrong because the data itself contains a comma.
I am looking for a way to export the data from the command line so that it looks the same as when I export the data via TOAD.
Any help is much appreciated.
Note: I have been looking for a solution for many days, but only now got a chance to post it here.
When exporting the dataset in Toad, select a delimiter other than a comma or drop down the "string quoting" dropdown box and select "double quote strings including NULLS".
Oh wait - if you are spooling output, you'll need to add the double quotes in your SELECT statement, like this, in order to surround the columns containing the comma with double quotes:
select '"' || column || '"' as column from table;
This format is pretty standard, but you could use pipes as delimiters instead and save space by not having to wrap strings in double quotes. It really depends on what the consumer of the data requires.
I'm coming from a Java/.NET background and trying to learn ABL but the difference in structure and the limited information on the internet is making it hard. What I want to do is import data from a text file which is in the following format:
john smith 52 ceo
...
line by line, and take the different parts based on character position. For example, positions 1-10 are for the first name, 10-20 for the second name, and so on... Do I have to use ENTRY for that? If so, can someone more experienced give an example of how to do it, because I'm quite confused. Then I need to add a record for each line to a temp-table I have created called tt-employee. How do I go about doing that?
I apologise if my question is a bit vague but as I said, I am new to this so I'm still figuring things out.
If space is a delimiter you can use the IMPORT statement.
DEFINE TEMP-TABLE tt-employee NO-UNDO
  FIELD firstname AS CHARACTER
  FIELD lastname  AS CHARACTER
  FIELD age       AS INTEGER
  FIELD empTitle  AS CHARACTER.

INPUT FROM c:\temp\indata.dat.

REPEAT:
  CREATE tt-employee.
  IMPORT DELIMITER " " tt-employee.
END.

INPUT CLOSE.
However, if there isn't a delimiter but rather a fixed record width (as you mention), you can do something like this (error checking and the correct record lengths need to be applied):
/* Skipping the temp-table definition - copy-paste from above */
DEFINE VARIABLE cRow AS CHARACTER NO-UNDO.

INPUT FROM c:\temp\indata.dat.

REPEAT:
  IMPORT UNFORMATTED cRow.

  /* You could replace 0 with a higher number that qualifies a record, so
     SUBSTRING doesn't return an error if it reads past the end of the line */
  IF LENGTH(cRow) > 0 THEN DO:
    CREATE tt-employee.
    ASSIGN
      tt-employee.firstname = SUBSTRING(cRow, 1, 10)
      tt-employee.lastname  = SUBSTRING(cRow, 11, 10)
      tt-employee.age       = INTEGER(SUBSTRING(cRow, 21, 2))
      tt-employee.empTitle  = SUBSTRING(cRow, 23, 10) NO-ERROR.
  END.
END.

INPUT CLOSE.
There are several places on the web to look for OpenEdge information:
Official knowledgebase - http://knowledgebase.progress.com/
Official community - https://community.progress.com/?Redirected=true
More communities - http://www.progresstalk.com/ and http://oehive.org/