Postgres Insert statements from stdin

I have a dump file which is:
COPY public.applications (id, reference_id, lead_id) FROM stdin;
This is followed by the rows that need to be added.
Instead of copy, I want to insert these rows from stdin because copy is replacing my entire table (removing existing data in the table). I just want to add rows, not remove any existing ones.
I tried:
insert into public.applications (id, reference_id, lead_id) values FROM stdin;
But this is incorrect syntax. What's the correct way to do this?
Is there a way to tweak the copy command to only add rows and not replace the table?

As pointed out in the comments, COPY does not replace existing data.
That is, COPY public.applications (id, reference_id, lead_id) FROM stdin;
behaves just like an INSERT: it appends the incoming rows and leaves the existing rows in place.
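A minimal sketch (the column types and values here are invented; the data rows in a COPY block are tab-separated, as in the dump) showing that replaying the dump's COPY block in psql appends to whatever is already in the table:
CREATE TABLE public.applications (id integer, reference_id integer, lead_id integer);
INSERT INTO public.applications VALUES (1, 101, 201);  -- a pre-existing row
COPY public.applications (id, reference_id, lead_id) FROM stdin;
2	102	202
3	103	203
\.
SELECT count(*) FROM public.applications;  -- returns 3; the existing row is untouched
If existing rows are disappearing, something else in the dump (for example a DROP TABLE or TRUNCATE statement emitted before the COPY) is removing them, not the COPY itself.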

Related

Is it possible to import a CSV file to an existing table without the headers being included?

I'm trying to import a CSV file to a table that is empty but already exists in an SQLite database. For example:
sqlite> CREATE TABLE data (...);
sqlite> .mode csv
sqlite> .import mydata.csv data
I have created the table in advance because I'd like to specify a primary key, data types, and foreign key constraints. This process works as expected, but it unfortunately includes the header row from the CSV file in the table.
Here's what I've learned from the SQLite docs regarding CSV imports:
There are two cases to consider: (1) Table "tab1" does not previously exist and (2) table "tab1" does already exist.
In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file.
For the second case, when the table already exists, every row of the CSV file, including the first row, is assumed to be actual content. If the CSV file contains an initial row of column labels, that row will be read as data and inserted into the table. To avoid this, make sure that table does not previously exist.
So basically, I get extra data because I've created the table in advance. Is there a flag to change this behavior? If not, what's the best workaround?
The sqlite3 command-line shell has no such flag.
If you have a sufficiently advanced OS, you can use an external tool to split off the first line:
sqlite> .import "|tail -n +2 mydata.csv" data
You can also use the --skip 1 option with .import, as documented on the sqlite3 website. So you can use the following command:
.import --csv --skip 1 mydata.csv data
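For example, a complete session might look like this (the column definitions are made up; --skip requires a reasonably recent sqlite3 shell):
sqlite> CREATE TABLE data (id INTEGER PRIMARY KEY, name TEXT, value REAL);
sqlite> .import --csv --skip 1 mydata.csv data
With --skip 1 the header line of mydata.csv is ignored, so only the data rows land in the pre-created table.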

Adding value to existing database table in RSQLite

I am new to RSQLite.
I have an input document in text format in which values are separated by '|'.
I created a table with the required variables (dummy code as follows)
library(RSQLite)
db <- dbConnect(SQLite(), dbname = "test.sqlite")
dbSendQuery(conn = db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME CHAR(25),
DATED DATE)"
)
However I am stuck on how to import values into the created table.
I cannot use an INSERT INTO ... VALUES command, as there are thousands of rows and more than 20 columns in the original data file, and it is impossible to manually type in each data point.
Can someone suggest an alternative efficient way to do so?
You are using a scripting language. The whole point of that is literally to avoid manually typing each data point. Sorry.
You have two routes:
1: You have correctly opened a database connection and created an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R using e.g. df <- read.table('textfile.txt', sep='|') (modify the arguments to fit your text file).
To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows both named and positional placeholders. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', list(1, 16, 'Big fellow'))
You see? The first ? got value 1, the second ? got value 16, and the last ? got the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows. Or just more than one. Either way, you can send in your data frame. dbSendQuery has two requirements: 1) each vector must have the same number of entries (not an issue when providing a data.frame), and 2) you may only submit as many vectors as you have placeholders.
I assume your data frame df contains the columns mark, roll, and name, corresponding to the table's columns. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', df)
This will execute an INSERT statement for each row in df!
TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time: after each insert, data is written to file and indices are updated. Instead, enclose the inserts in a transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)
and SQLite will save the data to a journal file, and only save the result when you execute dbCommit(db). Try both methods and compare the speed!
2: Ah, yes. The second way. This can be done entirely in SQLite.
With the SQLite command-line utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply do an INSERT INTO ... SELECT ...; command. Alternatively, read the text file into a temporary table in sqlite3 and run an INSERT INTO ... SELECT ...;.
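For example, a minimal sqlite3-shell sketch of the temporary-table route (the staging table and the file name textfile.txt are placeholders for your actual columns and file):
sqlite> CREATE TEMP TABLE staging(MARKS INTEGER, ROLLNUM INTEGER, NAME CHAR(25), DATED DATE);
sqlite> .separator |
sqlite> .import textfile.txt staging
sqlite> INSERT INTO TABLE1 SELECT * FROM staging;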
Useful site to remember: http://www.sqlite.com/lang.html
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbExecute(db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME TEXT
)"
)
df <- data.frame(MARKS = sample(1:100, 10),
ROLLNUM = sample(1:100, 10),
NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.

SQL*Loader: how to use SUBSTR in the WHEN clause correctly?

I'm having a problem that I thought was rather common, but trying to look it up in "Oracle Database 10g2 Utilities_b14215.pdf" didn't help. After that I surfed through numerous threads, but no luck so far.
I have a tab-delimited file (x'09') with e.g. name, userid, persnr fields. The values for the userids begin with either P, R or T, e.g. P2198, P2199, R7288, T1229.
I want to load only the records with userids beginning with P.
Isolating a single record with a controlfile like this works splendidly:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN USERID = 'P2198'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
But every attempt at using SUBSTR in the when-clause fails.
This:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN SUBSTR(USERID, 1, 1) = 'P'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
fails with an SQL*Loader-350 syntax error.
This
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN "SUBSTR(:USERID, 1, 1)" = 'P'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
fails with SQL*Loader-403: Referenced column USERID not present in table Z_USERLIST.
But IT IS PRESENT - as the first example proves. I've found that the column should be preceded by : but that obviously isn't the issue.
What am I doing wrong?
According to the SQL*Loader docs, the left-hand side of a WHEN condition can only be a full field name, e.g. USERID, or a position spec, e.g. (3:5).
The docs aren't very clear though on what is allowed - e.g. can LIKE be used as the operator?
USERID LIKE 'P%'
I strongly suspect it can't though.
I would load the entire file into a staging table that matches the file layout, then run a procedure that inserts the rows you want from there into the production table. That is a more common way to handle loads with criteria like this without having to edit source data.
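For example, assuming the whole file has been loaded into a staging table Z_USERLIST_STAGE (a made-up name with the same three columns), the filtered copy is plain SQL:
INSERT INTO z_userlist (name, userid, persnr)
SELECT name, userid, persnr
FROM z_userlist_stage
WHERE userid LIKE 'P%';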
If you can preprocess the source file, move the userid to the first field, or copy the first letter of the userid to its own field, and construct the WHEN clause like this so that sqlldr looks at the first position (this will cause sqlldr to return non-zero, though, as not all rows meet the WHEN clause criteria):
WHEN (1) = 'P'

SQLite: select rows where a certain column is contained in a given string

I have a table which has a column named "directory" which contains strings like:
c:\mydir1\mysubdir1\
c:\mydir2
j:\myotherdir
...
I would like to do something like
SELECT FROM mytable WHERE directory is contained within 'c:\mydir2\something\'
This query should give me as a result:
c:\mydir2
Ok, I've just found that sqlite has a function instr that seems to work for my purpose.
Not sure about the performance, though.
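For example, using the table and column names from the question:
SELECT * FROM mytable WHERE instr('c:\mydir2\something\', directory) > 0;
instr(X, Y) returns the 1-based position of the first occurrence of Y within X, and 0 if Y does not occur, so the condition keeps exactly the rows whose directory value is contained in the given path.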

Why Unix Join Failed Even Though Corresponding Entries Exist in Two Files

I tried the simple join
join query.txt source.tab
based on the 1st column in both files. It's clear that source.tab contains the query. But why does the operation yield no result?
Both the query and the source file are downloadable here:
http://dl.dropbox.com/u/11482318/query.txt (2B)
http://dl.dropbox.com/u/11482318/source.tab (40KB)
The man page for join says (as suggested by shelter):
Important: FILE1 and FILE2 must be sorted on the join fields.
In your case the source.tab file is sorted naturally on the first field (r1.1, r2.1, etc.), but the sort order required by join would be based on the collating sequence of sort (probably r1.1, r10.1, r100.1, r11.1, r12.1, etc.).
If you sort your source.tab file using the sort command, then join, it should work.
(Note that - perhaps by luck - the query.txt file has the correct sort order.)
The join command will not return results if a file has a header row. The header makes join consider the file unsorted, so it fails to match any keys that sort before the header's field.
One way to strip the header out is to use grep -v ",Header2," file1.txt > file2.txt and then join against file2.txt (assuming the file was sorted to begin with). Another option is to just sort the file as it is, letting the header row remain. This works as long as the header row does not show up in the final result.
