Good day.
I have a previous question in this link. In the exported CSV, I put the table name on the first line. I now wish to import this CSV back into my system.
My current code is this:
DEF VAR ic AS INT.
DEF VAR cTable AS CHAR.

INPUT FROM VALUE(SESSION:TEMP-DIRECTORY + "temp.csv").
ic = 0.
REPEAT:
    ic = ic + 1.
    IF ic > 1 THEN DO:
        CREATE cTable.
        IMPORT DELIMITER "," cTable.
    END.
    IMPORT cTable.
END.
INPUT CLOSE.
I know the code is wrong in the CREATE part. How do I do this correctly?
Also, when I EXPORT, there is an additional BLANK line after the last record. How do I remove this without opening the CSV file?
Removing the empty line at the end of your file doesn't fix your problem; worse, you will not be able to read the last valid CSV line with the IMPORT statement if you do that (IMPORT needs a line break after the last record to work properly).
The actual problem is that you get an empty row in your table because the IMPORT DELIMITER "," cTable. statement fails when the REPEAT block reaches the end of the file: to leave the loop, it raises the ENDKEY condition. But since CREATE cTable. has already run before the loop is left, you end up with an empty record. I hope this explanation helps you understand how the REPEAT loop works; without it, the code looks like an endless loop with no break condition.
Anyway, to fix that problem you can either delete the empty row afterwards (like you did before), which is perfectly valid, or you can omit the NO-UNDO from the temp-table definition, because then the REPEAT block will UNDO the CREATE by default when ENDKEY is raised.
To your other question about the CSV header line: you have to read the line somehow; I don't think there is a statement to simply skip it and start reading at the second line of the file.
If you need the header names, you can simply define a character variable for every column and import them like:
IMPORT DELIMITER "," cColumn1 cColumn2. /* and so on for every column */
Or, if you just want to read the header line and ignore it, you can use
IMPORT UNFORMATTED cTemp.
with a temp variable that reads the whole line.
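Putting those pieces together, here is a minimal sketch, assuming a temp-table named tt-data with two character fields (placeholders for whatever your exported table actually contains). It is deliberately defined without NO-UNDO, so the CREATE from the final, failing IMPORT is undone automatically when ENDKEY leaves the loop:
DEFINE TEMP-TABLE tt-data /* deliberately not NO-UNDO */
    FIELD field1 AS CHARACTER
    FIELD field2 AS CHARACTER.
DEFINE VARIABLE cHeader AS CHARACTER NO-UNDO.

INPUT FROM VALUE(SESSION:TEMP-DIRECTORY + "temp.csv").
IMPORT UNFORMATTED cHeader.       /* read and discard the table-name line */
REPEAT:
    CREATE tt-data.
    IMPORT DELIMITER "," tt-data. /* ENDKEY at end of file leaves the loop and undoes the last CREATE */
END.
INPUT CLOSE.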
If you're trying to import an unknown file format, you might try reading in the entire line of data and then pick through the individual fields. As for question 2, it's better to handle the incorrect data programmatically, rather than trying to change the file itself.
This code reads in each line from the file, skipping any that are blank or null. Then it goes through each comma-separated field in the line and displays them.
DEFINE VARIABLE cTable AS CHARACTER NO-UNDO.
DEFINE VARIABLE cField AS CHARACTER NO-UNDO.
DEFINE VARIABLE iLoop  AS INTEGER   NO-UNDO.

INPUT FROM VALUE(SESSION:TEMP-DIRECTORY + "temp.csv").
REPEAT ON ERROR UNDO, NEXT:
    IMPORT UNFORMATTED cTable.                    /* Read an entire line from the file.        */
    IF cTable = "" OR cTable = ? THEN NEXT.       /* Skip blank lines.                         */
    DO iLoop = 1 TO NUM-ENTRIES(cTable, ","):     /* Break up the line by the comma delimiter. */
        cField = ENTRY(iLoop, cTable).
        MESSAGE "Field " + STRING(iLoop) + ": " + cField VIEW-AS ALERT-BOX.
    END.
END.
INPUT CLOSE.
Since the file layout is unknown, all of the fields are read as character. You'll need to add some logic to determine if the values are integers, decimals, dates, etc.
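As a rough sketch of that conversion step (which fields are integers or dates is an assumption here; adapt it to your actual layout), you could try each conversion with NO-ERROR and fall back to the unknown value:
DEFINE VARIABLE iValue AS INTEGER NO-UNDO.
DEFINE VARIABLE dValue AS DATE    NO-UNDO.

ASSIGN iValue = INTEGER(cField) NO-ERROR.   /* try the field as an integer    */
IF ERROR-STATUS:ERROR THEN iValue = ?.      /* not numeric - leave it unknown */

ASSIGN dValue = DATE(cField) NO-ERROR.      /* same idea for a date field     */
IF ERROR-STATUS:ERROR THEN dValue = ?.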
I have read data from a csv file containing many duplicate email addresses into a temp table. The format is essentially id, emailtype-description, email.
Here is an example of some data:
id  emailtype-description  email
1   E-Mail                 john@gmail.com
1   preferred E-mail       john@gmail.com
2   2nd E-mail             stacey@yahoo.com
2   preferred-Email        sth@yahoo.com
2   family E-Mail          sth@yahoo.com
cInputFile = SUBSTITUTE(cDataDirectory, "Emails").

INPUT STREAM csv FROM VALUE(cInputFile).
IMPORT STREAM csv DELIMITER "," ^ NO-ERROR.   /* skip the header line */
REPEAT TRANSACTION:
    CREATE ttEmail.
    IMPORT STREAM csv DELIMITER ","
        ttEmail.uniqueid
        ttEmail.emailTypeDescription
        ttEmail.emailAddr
        .
END.
INPUT STREAM csv CLOSE.
I want to dedupe the rows, but I don't want to do this randomly. I want to make sure that certain types take priority over others. For instance, rows marked with the type "preferred E-mail" should always remain if they exist; additionally, some types take precedence over others, so "E-mail" takes precedence over "2nd-Email" or "family E-Mail".
I'd like to do in Progress code the equivalent of a custom sort of emailtype-description, then a de-dupe. That way I could define the sort order and then dedupe to retain the emails and the types by priority.
Is there a way to do this to my table in Progress? I want to sort first by uniqueid, then by emailtype-description, but I want a custom sort, not an alphabetical sort. What is the best approach?
When you say that you want a custom sort, not alphabetical, do you mean that you want to sort by the email type in a non-alphabetical order? If so, then I think you would need to translate the email type into a field that sorts the way you wish. Something along these lines:
/* First add a field to your ttEmail called emailTypeSortOrder. */
define variable emailTypeSortOrderList as character no-undo.

emailTypeSortOrderList = "preferred E-mail,E-mail,2nd-Email,family E-mail".

cInputFile = SUBSTITUTE(cDataDirectory, "Emails").

INPUT STREAM csv FROM VALUE(cInputFile).
IMPORT STREAM csv DELIMITER "," ^ NO-ERROR.   /* skip the header line */
REPEAT TRANSACTION:
    CREATE ttEmail.
    IMPORT STREAM csv DELIMITER ","
        ttEmail.uniqueid
        ttEmail.emailTypeDescription
        ttEmail.emailAddr
        .

    /* Classify the email type sort order; unknown types sort last. */
    ttEmail.emailTypeSortOrder = lookup( ttEmail.emailTypeDescription, emailTypeSortOrderList ).
    if ttEmail.emailTypeSortOrder <= 0 then ttEmail.emailTypeSortOrder = 9999999.
END.
INPUT STREAM csv CLOSE.
And now you can sort and de-duplicate using the newly ordered field:
for each ttEmail break by ttEmail.emailAddr by ttEmail.emailTypeSortOrder:
    if first-of( ttEmail.emailAddr ) then
        next.           /* always keep the first (highest priority) one */
    else
        delete ttEmail. /* remove unwanted duplicates...                */
end.
I am trying to import a .csv file and match the records against the database. However, the database records have leading zeros in a character field, and the amount of data is on the higher side.
Here the length of the field in the database is x(15).
The problem I am facing is that the .csv file contains data like, for example, AB123456789, whereas the database field has "00000AB123456789".
I am importing the .csv into a character variable.
Could someone please let me know what I should do to add the prefix zeros in a Progress query?
Thank you.
You need to FILL() the input string with "0" in order to pad it to a specific length. You can do that with code similar to this:
define variable inputText as character no-undo format "x(15)".
define variable n         as integer   no-undo.

input from "input.csv".
repeat:
    import inputText.
    n = 15 - length( inputText ).
    if n > 0 then
        inputText = fill( "0", n ) + inputText.
    display inputText.
end.
input close.
Substitute your actual field name for inputText and use whatever mechanism you are actually using for importing the CSV data.
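If the goal is then to find the matching database record, a sketch along these lines might follow (customer and custKey are made-up table and field names; substitute your own):
/* Hypothetical table and field names - adjust to your schema. */
find first customer where customer.custKey = inputText no-lock no-error.
if available customer then
    display customer.custKey.
else
    message "No match for" inputText view-as alert-box.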
FYI - the "length of the field in the database" is NOT "x(15)". That is a display formatting string. The data dictionary has a default format string that was created when the schema was defined but it has absolutely no impact on what is actually stored in the database. ALL Progress data is stored as variable length length. It is not padded to fit the display format and, in fact, it can be "overstuffed" and it is very, very common for applications to do so. This is a source of great frustration to SQL reporting tools that think the display format is some sort of length limit. It is not.
I am trying to add a blank line between results when appending.
I am using EXPORT DELIMITER to append the data and adding SKIP at the end of my EXPORT statement.
How can I achieve this?
Thank you.
I have tried using double spaces and using ",":
export delimiter ","
col1
col2
col3
skip.
I'm expecting output similar to what is shown below.
Results
Space
Results
Space
Results
Space
Results
Space.
Thank you
Multiple skips will not work in an export action. You can use PUT UNFORMATTED though...
DEFINE VARIABLE i AS INTEGER NO-UNDO.

OUTPUT TO VALUE("c:\tmp\jp1.txt").
DO i = 1 TO 10:
    PUT UNFORMATTED "JP" "," "PV" "," "HX" SKIP(1).
END.
OUTPUT CLOSE.

MESSAGE 1 VIEW-AS ALERT-BOX.
Try multiple skips, like SKIP(1) or SKIP(2). If that still doesn't work, try putting a PUT SKIP statement after your EXPORT command.
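A minimal sketch of that second suggestion, assuming col1 through col3 are the variables from the question and a made-up output file name:
OUTPUT TO VALUE("c:\tmp\results.txt") APPEND.
EXPORT DELIMITER "," col1 col2 col3.  /* writes one data line         */
PUT SKIP(1).                          /* writes a blank line after it */
OUTPUT CLOSE.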
I'm coming from a Java/.NET background and trying to learn ABL, but the difference in structure and the limited information on the internet are making it hard. What I want to do is import data from a text file which is in the following format:
john smith 52 ceo
...
line by line, and take the different parts based on character position. For example, positions 1-10 are for the first name, 10-20 for the second name, and so on. Do I have to use ENTRY for that? If so, can someone more experienced give an example of how to do it, because I'm quite confused. Then I need to add a record for each line to a temp-table I have created called tt-employee. How do I go about doing that?
I apologise if my question is a bit vague but as I said, I am new to this so I'm still figuring things out.
If space is a delimiter, you can use the IMPORT statement.
DEFINE TEMP-TABLE tt-employee NO-UNDO
    FIELD firstname AS CHARACTER
    FIELD lastname  AS CHARACTER
    FIELD age       AS INTEGER
    FIELD empTitle  AS CHARACTER.

INPUT FROM c:\temp\indata.dat.
REPEAT:
    CREATE tt-employee.
    IMPORT DELIMITER " " tt-employee.
END.
INPUT CLOSE.
However, if there isn't a delimiter but rather a fixed record width (as you mention), you can do something like this (error checking and the correct record lengths need to be applied):
/* Skipping the temp-table definition - copy and paste from above. */
DEFINE VARIABLE cRow AS CHARACTER NO-UNDO.

INPUT FROM c:\temp\indata.dat.
REPEAT:
    IMPORT UNFORMATTED cRow.
    /* You could replace 0 with a higher number that qualifies a record so
       SUBSTRING doesn't return an error when reading past the end of the line. */
    IF LENGTH(cRow) > 0 THEN DO:
        CREATE tt-employee.
        ASSIGN
            tt-employee.firstname = SUBSTRING(cRow, 1, 10)
            tt-employee.lastname  = SUBSTRING(cRow, 11, 10)
            tt-employee.age       = INTEGER(SUBSTRING(cRow, 21, 2))
            tt-employee.empTitle  = SUBSTRING(cRow, 23, 10) NO-ERROR.
    END.
END.
INPUT CLOSE.
There are several places on the web to look for OpenEdge information:
Official knowledgebase - http://knowledgebase.progress.com/
Official community - https://community.progress.com/?Redirected=true
More communities - http://www.progresstalk.com/ and http://oehive.org/
I have a large dataset in a DBF file and would like to export it to a CSV file.
Thanks to SO, I already managed to do that smoothly.
However, when I try to import it into R (the environment I work in), it combines some characters together, making some rows much longer than they should be and consequently breaking the whole database. In the end, whenever I import the exported CSV file, I get only about half of the db.
I think the main problem is with quotes in character strings, but specifying quote="" in R didn't help (and it usually does).
I've searched for any question on how to deal with quotes when exporting in Visual FoxPro, but couldn't find an answer. I wanted to test this, but my computer throws an error stating that I don't have enough memory to complete the operation (probably due to the large db).
Any help will be highly appreciated. I've been stuck with this problem of exporting from the DBF into R for long enough; I've searched everything I could and am desperately looking for a simple way to import a large DBF into my R environment without any bugs.
(In R: I checked whether the imported file has problems, and indeed most columns have much longer nchar values than they should, while the number of rows is halved. Reading the db with read.csv("file.csv", quote="") didn't help. Reading with data.table::fread() returns the error
Expected sep (',') but '0' ends field 88 on line 77980:
but according to verbose=T this function reads the right number of rows (read.csv imports only about 1.5 million rows):
Count of eol after first data row: 2811729 Subtracted 1 for last eol
and any trailing empty lines, leaving 2811728 data rows
When exporting with TYPE DELIMITED, you have some control on the VFP side as to how the export formats the output file.
To change the character placed around each field from quotes to, say, a pipe character, you can do:
copy to myfile.csv type delimited with "|"
so that will produce something like:
|A001|,|Company 1 Ltd.|,|"Moorfields"|
You can also change the separator from a comma to another character:
copy to myfile.csv type delimited with "|" with character "#"
giving
|A001|#|Company 1 Ltd.|#|"Moorfields"|
That may help in parsing on the R side.
There are three ways to delimit a string in VFP: the normal single and double quote characters, plus square brackets. So to strip quotes out of the character fields myfield1 and myfield2 in your DBF file, you could do this in the Command Window:
close all
use myfile
copy to mybackupfile
select myfile
replace all myfield1 with chrtran(myfield1,["'],"")
replace all myfield2 with chrtran(myfield2,["'],"")
and repeat for other fields and tables.
You might have to write code to do the export, rather than simply using the COPY TO ... DELIMITED command.
SELECT thedbf
mfld_cnt = AFIELDS(mflds)

fh = FOPEN(m.filename, 1)  && file must already exist; use FCREATE() to create a new one

SCAN
    FOR aa = 1 TO mfld_cnt
        mcurfld = 'thedbf.' + mflds[aa, 1]
        mvalue = &mcurfld
        ** Or you can use:
        ** mvalue = EVAL(mcurfld)

        ** Manipulate the contents of mvalue, possibly based on the field type.
        DO CASE
            CASE mflds[aa, 2] = 'D'
                mvalue = DTOC(mvalue)
            CASE mflds[aa, 2] $ 'CM'
                ** Replace characters that are giving you problems in R.
                mvalue = STRTRAN(mvalue, ["], '')
            OTHERWISE
                ** Etc.
        ENDCASE
        = FWRITE(fh, mvalue)
        IF aa # mfld_cnt
            = FWRITE(fh, [,])
        ENDIF
    ENDFOR
    = FWRITE(fh, CHR(13) + CHR(10))
ENDSCAN

= FCLOSE(fh)
Note that I'm using [ ] characters to delimit strings that include commas and quotation marks. That helps readability.
* Create a comma-delimited file with no quotes around the character fields.
copy to myfile.csv type delimited with ""   && the delimiter is two double quotes, i.e. nothing