I am trying to import a .csv file to match the records in the database. However, the database records have leading zeros. This is a character field, and the amount of data is on the higher side.
Here, the length of the field in the database is x(15).
The problem I am facing is that the .csv file contains data like AB123456789, whereas the database field has "00000AB123456789".
I am importing the .csv to a character variable.
Could someone please let me know what I should do to add the prefix zeros using a Progress query?
Thank you.
You need to FILL() the input string with "0" in order to pad it to a specific length. You can do that with code similar to this:
define variable inputText as character no-undo format "x(15)".
define variable n as integer no-undo.

input from "input.csv".

repeat:
  import inputText.
  n = 15 - length( inputText ).
  if n > 0 then
    inputText = fill( "0", n ) + inputText.
  display inputText.
end.

input close.
Substitute your actual field name for inputText and use whatever mechanism you are actually using for importing the CSV data.
FYI - the "length of the field in the database" is NOT "x(15)". That is a display formatting string. The data dictionary has a default format string that was created when the schema was defined, but it has absolutely no impact on what is actually stored in the database. ALL Progress data is stored as variable-length. It is not padded to fit the display format and, in fact, it can be "overstuffed" - it is very, very common for applications to do so. This is a source of great frustration to SQL reporting tools that think the display format is some sort of length limit. It is not.
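For example, here is a minimal sketch (the variable name is invented) showing that the format controls display only, not storage:

define variable c as character no-undo format "x(4)".

c = "overstuffed".   /* succeeds even though the value is longer than the format */

display length(c).   /* 11 - the full value is stored */
display c.           /* shown truncated to 4 characters by the display format */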
I have a table in Progress named Car. I need to have my Progress code take an input parameter of one Car instance.
I've tried this
DEFINE INPUT PARAMETER i_tuPDO AS Car.
But this results in a compiler error.
You cannot use a single record as an input parameter. You could define an object that relates to the "Car" record and pass that object as input. Another option is to pass the corresponding buffer handle instead.
DEFINE TEMP-TABLE tt NO-UNDO
    FIELD a AS CHARACTER.

CREATE tt.
ASSIGN tt.a = "HELLO".

RUN proc (INPUT BUFFER tt:HANDLE).

PROCEDURE proc:
    DEFINE INPUT PARAMETER phBuffer AS HANDLE NO-UNDO.
    MESSAGE phBuffer:BUFFER-FIELD(1):BUFFER-VALUE VIEW-AS ALERT-BOX.
END.
If what you really want is to pass a DATASET into a procedure (or a program), that can be done like this:
DEFINE TEMP-TABLE tt NO-UNDO
    FIELD a AS CHARACTER.

DEFINE DATASET ds FOR tt.

CREATE tt.
ASSIGN tt.a = "HELLO".
RELEASE tt.

RUN proc (INPUT DATASET ds).

PROCEDURE proc:
    DEFINE INPUT PARAMETER DATASET FOR ds.
    FIND FIRST tt NO-ERROR.
    IF AVAILABLE tt THEN
        DISPLAY tt.
END.
I'm not totally sure what you are trying to do. In case you want to pass a specific record of the table Car, you could load the buffer and pass it, or pass the ROWID of the record. Example:
PROCEDURE test1:
    define parameter buffer pbCar for Car.
END procedure.

PROCEDURE test2:
    define input parameter rCar as rowid no-undo.
    define buffer bCar for Car.

    find bCar
        where rowid(bCar) = rCar
        no-lock.
END procedure.
find first Car no-lock.
run test1 ( buffer Car ).
run test2 ( rowid(Car) ).
I am new to RSQLite.
I have an input document in text format in which values are separated by '|'.
I created a table with the required variables (dummy code as follows)
library(RSQLite)

db <- dbConnect(SQLite(), dbname = "test.sqlite")
dbSendQuery(conn = db,
            "CREATE TABLE TABLE1(
               MARKS INTEGER,
               ROLLNUM INTEGER,
               NAME CHAR(25),
               DATED DATE)"
)
However, I am stuck at how to import values into the created table.
I cannot use the INSERT INTO ... VALUES command, as there are thousands of rows and more than 20 columns in the original data file, and it is impossible to manually type in each data point.
Can someone suggest an alternative efficient way to do so?
You are using a scripting language. The whole point of this is literally to avoid manually typing each data point. Sorry.
You have two routes:
1: You have correctly opened a database connection and created an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R using e.g. df <- read.table('textfile.txt', sep='|') (modify the arguments to fit your text file).
To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows both named and positional placeholders. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', params = list(1, 16, 'Big fellow'))
You see? The first ? got value 1, the second ? got value 16, and the last ? got the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows. Or just more than one. Either way, you can send in your data frame. dbSendQuery has some requirements: 1) each vector must have the same number of entries (not an issue when providing a data.frame), and 2) you may only submit the same number of vectors as you have placeholders.
I assume your data frame df contains columns mark, roll, and name, corresponding to the table's columns. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', params = df)
This will execute an INSERT statement for each row in df!
TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time, because after each insert, data is written to file and indices are updated. Instead, enclose the inserts in a transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)
and SQLite will save the data to a journal file, and only save the result when you execute the dbCommit(db). Try both methods and compare the speed!
2: Ah, yes. The second way. This can be done entirely in SQLite.
With the SQLite command-line utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply do an INSERT INTO ... SELECT ...; command. Alternatively, read the text file in sqlite3 into a temporary table and run an INSERT INTO ... SELECT ...;.
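For example, a rough sketch of the temporary-table route (the table and file names here are placeholders):

.separator |
.import input.txt temp_import
INSERT INTO table1 SELECT * FROM temp_import;
DROP TABLE temp_import;

If temp_import does not already exist, .import creates it and uses the first line of the file as column names; the SELECT * assumes the column order matches table1.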
Useful site to remember: http://www.sqlite.com/lang.html
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbExecute(db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME TEXT
)"
)
df <- data.frame(MARKS = sample(1:100, 10),
ROLLNUM = sample(1:100, 10),
NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe-delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.
I am currently working on converting a SAS script to R. As I am relatively new to SAS, I am having a hard time understanding the following statement:
VARS=date id sales units
/* create lag event variable names to be used in the RETAIN statement */
%let vars_l = lag_%sysfunc(tranwrd(&vars,%str( ),%str( lag_)));
Here, date, id, etc. are all variables present in my current data set. I understand the function tranwrd is used to replace a value with another value in a character variable. In this case, it creates new items as:
vars_l = lag_date lag_id lag_sales lag_units
Am I right? What is vars_l? Is it a list? Or are these variables that are added to my dataset?
Also, what is the lag_ before %sysfunc in the code below used for?
%let vars_l = lag_%sysfunc(tranwrd(&vars,%str( ),%str( lag_)));
Are lagged variables created at all, or are just empty variables prefixed with lag_ created?
I don't have access to SAS or the datasets to try and check the result. Any help on this would be great. Thanks!
The following code is basically creating macro variables to hold a list of variables to process. (Note: macros in SAS are mere text replacements!)
%let VARS=date id sales units ;
/* create lag event variable names to be used in the RETAIN statement */
%let vars_l = lag_%sysfunc(tranwrd(&vars,%str( ),%str( lag_)));
If you run the following code to see what exactly is stored in the VARS & vars_l macro variables, you see the following in the log:
31 %put VARS::&VARS.;
SYMBOLGEN: Macro variable VARS resolves to date id sales units
VARS::date id sales units
32 %put VARS_l::&VARS_L.;
SYMBOLGEN: Macro variable VARS_L resolves to lag_date lag_id lag_sales lag_units
VARS_l::lag_date lag_id lag_sales lag_units
In R the equivalent would be the following:
VARS<-c("date", "id", "sales", "units" )
vars_l<-paste("lag_",VARS, sep="")
The second vars_l macro assignment is just adding lag_ to the beginning of each space-delimited value in the VARS macro variable. Since the first value does not begin with a space, if you omit the lag_ at the beginning of %let vars_l = lag_%sysfunc(tranwrd(&vars,%str( ),%str( lag_))); you will get the following stored in vars_l: date lag_id lag_sales lag_units
From the code I can see there are no variables created just yet, but you should find a data step later on which does that. The mention of RETAIN statement in the comments suggests just that.
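A typical shape for such a data step (a guess at what the later code might look like, with have/want as placeholder dataset names) would be:

data want;
  set have;
  retain &vars_l.;   /* keep lag_date lag_id lag_sales lag_units across iterations */
  /* at this point the lag_ variables still hold the previous row's values */
  output;
  /* now store the current row's values for the next iteration */
  lag_date  = date;
  lag_id    = id;
  lag_sales = sales;
  lag_units = units;
run;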
I'm using the sqlite dialect with sqlalchemy. When I run the following script, I see the file paths get printed out, but for result.lastrowid and result.rowcount the values None and -1 get printed, respectively.
from sqlalchemy import create_engine, select
from create_database import files
engine = create_engine('sqlite:///test.db', echo=True)
conn = engine.connect()
s = select([files.c.full_path, files.c.file_name])
result = conn.execute(s)
for row in result:
print row[files.c.full_path]
print result.lastrowid
print result.rowcount
result.close()
Why are those methods returning None and -1 when there are rows in the result set?
Is there a constant time operation to determine whether a SELECT statement returns no rows?
lastrowid is used only for INSERT statements; rowcount is used only for UPDATE/DELETE statements.
Result rows are computed dynamically; there is no function to check result rows without actually trying to fetch one.
You have to call fetchone() and check the result.
(This has the side effect that the first result row is already fetched, so the following code must not fetch again if it wants to read the first row.)
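For example, a sketch reusing the result object from the question's script:

result = conn.execute(s)

# fetchone() returns None when the SELECT produced no rows
first = result.fetchone()
if first is None:
    print 'no rows'
else:
    # the first row is already consumed, so handle it before iterating the rest
    print first[files.c.full_path]
    for row in result:
        print row[files.c.full_path]
result.close()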