PostgreSQL, R and timestamps with no time zone - r

I am reading a big csv (>1GB big for me!). It contains a timestamp field.
I read it (100 rows to start with ) with fread from the excellent data.table package.
ddfr <- fread(input="~/file1.csv",nrows=100, header=T)
Problem 1 (RESOLVED): the timestamp fields (called "ts" and "update"), e.g. "02/12/2014 04:40:00 AM", are read in as strings. I convert the fields back to timestamps with mdy_hms from the lubridate package. Splendid.
ddfr$ts <- data.frame( mdy_hms(ddfr$ts))
Problem 2 (NOT RESOLVED): The timestamp is created with time zone as per POSIXlt.
How do I create in R a timestamp with NO TIME ZONE? is it possible??
Now I use another (new) great package, PivotalR, to write the dataframe to PostgreSQL 9.3 using as.db.data.frame. It works like a charm.
x <- as.db.data.frame(ddfr, table.name= "tbl1", conn.id = 1)
Problem 3 (NOT RESOLVED): As the original dataframe timestamp fields had time zones, a table is created with the fields "timestamp with time zone". Ultimately the data needs to be stored in a table with fields configured as "timestamp without time zone".
But in my table in Postgres the data is stored as "2014-02-12 04:40:00.0", where the .0 at the end is the UTC offset. I think I need to have "2014-02-12 04:40:00".
I tried
ALTER TABLE tbl ALTER COLUMN ts type timestamp without time zone;
Then I copied across. While Postgres accepts the ALTER COLUMN command, when I try to copy (using INSERT INTO tbls SELECT ...) I get an error:
"column "ts" is of type timestamp without time zone but expression is of type text
Hint: You will need to rewrite or cast the expression."
Clearly the .0 at the end is not liked (but why then does Postgres accept the ALTER COLUMN? boh!).
I tried to do what the error suggested using CAST in the INSERT INTO query:
INSERT INTO tbl2 SELECT CAST(ts as timestamp without time zone) FROM tbl1
But I get the same error (including the suggestion to use CAST aargh!)
The table directly created by PivotalR (based on the dataframe) has this CREATE script:
CREATE TABLE tbl2
(
businessid integer,
caseno text,
ts timestamp with time zone
)
WITH (
OIDS=FALSE
);
ALTER TABLE tbl1
OWNER TO mydb;
The table I'm inserting into has this CREATE script:
CREATE TABLE tbl1
(
id integer NOT NULL DEFAULT nextval('bus_seq'::regclass),
businessid character varying,
caseno character varying,
ts timestamp without time zone,
updated timestamp without time zone,
CONSTRAINT busid_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE tbl1
OWNER TO postgres;
My apologies for the convoluted explanation, but potentially a solution could be found at any step in the chain, so I preferred to put all my steps in one question. I am sure there has to be a simpler method...

I think you're confused about copying data between tables.
INSERT INTO ... SELECT without a column list expects the columns from source and destination to be the same. It doesn't magically match up columns by name; it just assigns columns from the SELECT to the INSERT from left to right until it runs out of columns, at which point any remaining columns are filled with their default values (null if no default is declared). So your query:
INSERT INTO tbl2 SELECT ts FROM tbl1;
isn't doing this:
INSERT INTO tbl2(ts) SELECT ts FROM tbl1;
it's actually picking the first column of tbl2, which is businessid, so it's really attempting to do:
INSERT INTO tbl2(businessid) SELECT ts FROM tbl1;
which is clearly nonsense, and no casting will fix that.
(Your error in the original question doesn't match your tables and queries, so the details might be different as you've clearly made a mistake in mangling/obfuscating your tables or posted a newer version of the tables than the error. The principle remains.)
It's generally a really bad idea to assume your table definitions won't change and column order won't change anyway. So always be explicit about columns. In this case I think your intention might have actually been:
INSERT INTO tbl2(businessid, caseno, ts)
SELECT CAST(businessid AS integer), caseno, ts
FROM tbl1;
Note the cast, because the type of businessid is different between the two tables.
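To see the left-to-right matching in isolation, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables src and dst are hypothetical, and SQLite is far more permissive about types than PostgreSQL (it would not raise the type error from the question), so only the positional behaviour carries over.

```python
import sqlite3

# Hypothetical tables: same column names, declared in a different order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (caseno TEXT, ts TEXT);
    CREATE TABLE dst (ts TEXT, caseno TEXT);
    INSERT INTO src VALUES ('A-1', '2014-02-12 04:40:00');
""")

# No column list: values are assigned by position, not by name,
# so src.caseno lands in dst.ts and src.ts lands in dst.caseno.
conn.execute("INSERT INTO dst SELECT caseno, ts FROM src")
positional = conn.execute("SELECT ts, caseno FROM dst").fetchall()

# Explicit column list: each value goes where it was meant to go.
conn.execute("DELETE FROM dst")
conn.execute("INSERT INTO dst (caseno, ts) SELECT caseno, ts FROM src")
explicit = conn.execute("SELECT ts, caseno FROM dst").fetchall()

print(positional)
print(explicit)
```

The first result has the two values swapped; the second has them in the right columns, which is the reason to always spell out the column list.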

Related

Adding value to existing database table in RSQLite

I am new to RSQLite.
I have an input document in text format in which values are separated by '|'.
I created a table with the required variables (dummy code as follows)
db <- dbConnect(SQLite(), dbname="test.sqlite")
dbSendQuery(conn=db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME CHAR(25),
DATED DATE)"
)
However, I am stuck at how to import values into the created table.
I cannot use the INSERT INTO ... VALUES command as there are thousands of rows and 20+ columns in the original data file, and it is impossible to manually type in each data point.
Can someone suggest an alternative, efficient way to do so?
You are using a scripting language. The whole point of it is to avoid manually typing each data point. Sorry.
You have two routes:
1: You have correctly loaded a database connection and created an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R using e.g. df <- read.table('textfile.txt', sep='|') (modify arguments to fit your text file).
To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows for both named and positional placeholders. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', list(1, 16, 'Big fellow'))
You see? The first ? got value 1, the second ? got value 16, and the last ? got the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows. Or just more than one. Either way, you can send in your data frame. dbSendQuery has some requirements. 1) That each vector has the same number of entries (not an issue when providing a data.frame). And 2) You may only submit the same number of vectors as you have placeholders.
I assume your data frame, df, contains columns mark, roll, and name, corresponding to the table columns. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', df)
This will execute an INSERT statement for each row in df!
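The same one-statement-per-row idea with named placeholders can be tried outside R. This is only a sketch with Python's built-in sqlite3 module, not RSQLite; the sample rows are made up, and the list of dicts plays the role of the data frame df.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (marks INTEGER, rollnum INTEGER, name TEXT)")

# Made-up rows; the dict keys match the :mark, :roll and :name placeholders.
rows = [
    {"mark": 71, "roll": 1, "name": "Ann"},
    {"mark": 85, "roll": 2, "name": "Bob"},
    {"mark": 64, "roll": 3, "name": "Cy"},
]

# One INSERT is executed per dict; text values need no quoting.
conn.executemany(
    "INSERT INTO table1 (marks, rollnum, name) VALUES (:mark, :roll, :name)",
    rows,
)
conn.commit()

inserted = conn.execute(
    "SELECT marks, rollnum, name FROM table1 ORDER BY rollnum"
).fetchall()
print(inserted)
```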
TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time: after each insert, data is written to file and indices are updated. Instead, enclose it in a transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)
and SQLite will save the data to a journal file, and only save the result when you execute the dbCommit(db). Try both methods and compare the speed!
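A side effect of wrapping the inserts in a transaction is that the batch becomes all-or-nothing. Below is a rough sketch with Python's built-in sqlite3 module rather than RSQLite; the PRIMARY KEY on rollnum and the sample rows are assumptions added purely to provoke a failure. It shows the commit and rollback behaviour only, it does not measure speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY is an assumption for this sketch, so a duplicate can fail.
conn.execute(
    "CREATE TABLE table1 (marks INTEGER, rollnum INTEGER PRIMARY KEY, name TEXT)"
)

good = [(71, 1, "Ann"), (85, 2, "Bob")]
bad = [(60, 3, "Cy"), (55, 3, "Dee")]  # second row repeats rollnum 3

# "with conn" commits once at the end of the block (like dbBegin/dbCommit)...
with conn:
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", good)

# ...and rolls the whole batch back if any statement in it fails.
try:
    with conn:
        conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", bad)
except sqlite3.IntegrityError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
print(row_count)
```

Only the two rows from the first batch remain; the row for Cy was inserted and then undone together with the failing one.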
2: Ah, yes. The second way. This can be done in SQLite entirely.
With the SQLite command utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply run an INSERT INTO ... SELECT ... ; command. Alternatively, read the text file in sqlite3 into a temporary table and run an INSERT INTO ... SELECT ... ;.
Useful site to remember: http://www.sqlite.com/lang.html
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname=":memory:")
dbExecute(db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME TEXT
)"
)
df <- data.frame(MARKS = sample(1:100, 10),
ROLLNUM = sample(1:100, 10),
NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.

SQLite3 split date while creating index

I'm using a SQLite3 database, and I have a table that looks like this:
The database is quite big and running queries is very slow. I'm trying to speed up the process by indexing some of the columns. One of the columns that I want to index is the QUOTE_DATETIME column.
Problem: I want to index by date (YYYY-MM-DD) only, not by date and time (YYYY-MM-DD HH:MM:SS), which is the data I currently have in QUOTE_DATETIME.
Question: How can I use CREATE INDEX to create an index that uses only dates in the format YYYY-MM-DD? Should I split QUOTE_DATETIME into 2 columns: QUOTE_DATE and QUOTE_TIME? If so, how can I do that? Is there an easier solution?
Thanks for helping! :D
Attempt 1: I tried running CREATE INDEX id ON DATA (date(QUOTE_DATETIME)) but I got the error Error: non-deterministic functions prohibited in index expressions.
Attempt 2: I ran ALTER TABLE data ADD COLUMN QUOTE_DATE TEXT to create a new column to hold the date only. And then INSERT INTO data(QUOTE_DATE) SELECT date(QUOTE_DATETIME) FROM data. The date(QUOTE_DATETIME) should convert the date + time to only date, and the INSERT INTO should add the new values to QUOTE_DATE. However, it doesn't work and I don't know why. The new column ends up not having anything added to it.
Expression indexes must not use functions that might change their return value based on data not mentioned in the function call itself. The date() function is such a function because it might use the current time zone setting.
However, in SQLite 3.20 or later, you can use date() in indexes as long as you are not using any time zone modifiers.
INSERT adds new rows. To modify existing rows, use UPDATE:
UPDATE Data SET Quote_Date = date(Quote_DateTime);
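Putting the pieces together, here is a small sketch run through Python's built-in sqlite3 module. The price column, the sample rows and the index name are made up for illustration; the ALTER TABLE and UPDATE statements are the ones discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up stand-in for the real table.
    CREATE TABLE data (quote_datetime TEXT, price REAL);
    INSERT INTO data VALUES ('2014-12-29 09:30:00', 10.5),
                            ('2014-12-30 16:00:00', 11.0);

    -- Add the date-only column, fill it for the existing rows, index it.
    ALTER TABLE data ADD COLUMN quote_date TEXT;
    UPDATE data SET quote_date = date(quote_datetime);
    CREATE INDEX idx_data_quote_date ON data (quote_date);
""")

dates = [row[0] for row in conn.execute(
    "SELECT quote_date FROM data ORDER BY quote_datetime"
)]
print(dates)
```

Queries that filter on quote_date can then use the index. Rows inserted later need quote_date set as well, since the UPDATE only touches rows that exist when it runs.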

Query a manual list of data items

I would like to run a query involving joining a table to a manually generated list but am stuck trying to generate the manual list. There is an example of what I am attempting to do below:
SELECT
*
FROM
('29/12/2014', '30/12/2014', '30/12/2014') dates
;
Ideally I would want my output to look like:
29/12/2014
30/12/2014
31/12/2014
What's your Teradata release?
In TD14 there's STRTOK_SPLIT_TO_TABLE:
SELECT *
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1 -- any dummy value
,'29/12/2014,30/12/2014,30/12/2014' -- any delimited string
,',' -- delimiter
)
RETURNS (outkey INTEGER
,tokennum INTEGER
,token VARCHAR(20) CHARACTER SET UNICODE) -- modify to match the actual size
) AS d
You can easily put this in a Derived Table and then join to it.
inkey (here the dummy value 1) is a numeric or string column, usually a key. Can be used for joining back to the original row.
outkey is the same as inkey.
tokennum is the ordinal position of the token in the input string.
token is the extracted substring.
Try this:
select '29/12/2014'
union
select '30/12/2014'
union
...
It should work in Teradata as well as in MySQL.
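One thing to watch with this approach: UNION removes duplicate rows, while UNION ALL keeps them. A quick sketch follows, using Python's built-in sqlite3 module as a stand-in (not Teradata or MySQL, whose syntax details may differ) and the three values from the question as written, where 30/12/2014 appears twice. The column alias d is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION: the repeated '30/12/2014' collapses into a single row.
distinct_dates = conn.execute("""
    SELECT d FROM (
        SELECT '29/12/2014' AS d
        UNION SELECT '30/12/2014'
        UNION SELECT '30/12/2014'
    ) AS dates
    ORDER BY d
""").fetchall()

# UNION ALL: every listed value is kept, duplicates included.
all_dates = conn.execute("""
    SELECT d FROM (
        SELECT '29/12/2014' AS d
        UNION ALL SELECT '30/12/2014'
        UNION ALL SELECT '30/12/2014'
    ) AS dates
    ORDER BY d
""").fetchall()

print(distinct_dates)
print(all_dates)
```

Either form can be used as a derived table and joined to a real table.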

Fastload to temporal table

I have a CSV file containing 3 fields
1,cat,2012-06-16,2013-06-16
1,cat,2013-06-16,
I am trying to load it into a temporal table with a valid_dt PERIOD(DATE) column using a FastLoad script
nonsequenced validtime
INSERT INTO financial.test1 (id,name,valid_dt) values
(:id,:name,period( cast(:start_dt as date FORMAT 'YYYY-MM-DD'),cast(:end_dt as date FORMAT 'YYYY-MM-DD'))
);
The error I got is RDBMS error 3618: Expression not allowed in Fast Load Insert, column INTERNALPERIODDATETYPE.
I could not find anything about this in the manuals; they only said it would be possible with FastLoad.
Thank you.
FastLoad doesn't allow ANSI style CAST, must be old Teradata style instead:
:start_dt (date, FORMAT 'YYYY-MM-DD')
But there's no old-style PERIOD cast and FastLoad also doesn't allow any kind of expression and PERIOD(...) is an expression.
So you can only load data which can be automatically converted to a PERIOD, like:
1;cat;(2012-06-16, 2013-06-16)
1;cat;(2013-06-16, 9999-12-31)
Including the parens, the blank after the comma and a different delimiter...
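If you want to go that route, the input file has to be rewritten first. A minimal sketch of such a preprocessing step in Python: the helper name to_period_row is made up, and whether FastLoad accepts the result in your setup is something to verify, not something this code guarantees.

```python
import csv
import io

OPEN_END = "9999-12-31"  # stand-in end date for rows with no end_dt


def to_period_row(fields):
    """Turn [id, name, start_dt, end_dt] into 'id;name;(start_dt, end_dt)'."""
    rec_id, name, start_dt, end_dt = fields
    end_dt = end_dt or OPEN_END
    # Parens, a blank after the comma, and ';' as the field delimiter.
    return f"{rec_id};{name};({start_dt}, {end_dt})"


# The two sample rows from the question.
raw = "1,cat,2012-06-16,2013-06-16\n1,cat,2013-06-16,\n"
converted = [to_period_row(row) for row in csv.reader(io.StringIO(raw))]

for line in converted:
    print(line)
```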
I would suggest simply loading the data as DATEs (or CHARs) into a staging table using FastLoad or MultiLoad, followed by a
nonsequenced validtime
INSERT INTO financial.test1 (id,name,valid_dt)
select id, name, period(start_dt,COALESCE(end_dt, date '9999-12-31'))
from stagingtable

SELECT clause with a DATETIME column in Sybase 15

I'm trying to do a query like this on a table with a DATETIME column.
SELECT * FROM table WHERE the_date =
2011-03-06T15:53:34.890-05:00
I have the following as an string input from an external source:
2011-03-06T15:53:34.890-05:00
I need to perform a query on my database table and extract the row which contains this same date. In my database it gets stored as a DATETIME and looks like the following:
2011-03-06 15:53:34.89
I can probably manipulate the outside input slightly ( like strip off the -5:00 ). But I can't figure out how to do a simple select with the datetime column.
I found the convert function, and style 123 seems to match my needs but I can't get it to work. Here is the link to reference about style 123
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.blocks/html/blocks/blocks125.htm
I think convert is slightly wrongly documented in that version of the docs.
Because this format always includes the century, I think you only need to use 23. Normally the 100 range for convert adds the century to the year format.
What's more, that format only goes down to seconds.
If you want more, you'll need to paste together two converts. That is, paste a ymd part onto a convert(varchar, datetime-column, 14) and compare with your trimmed string. Comparing milliseconds is likely to be a problem, though, depending on where your big time string came from: the Sybase binary stored form has a granularity of 1/300 of a second, I think, so if your source string is from somewhere else it's not likely to compare equal. In other words, strip the milliseconds and compare as strings.
So maybe:
SELECT * FROM table WHERE convert(varchar,the_date,23) = '2011-03-06T15:53:34'
But the convert on the column would prevent the use of an index, if that's a problem.
If you compare as datetimes, the convert is on the right-hand side and an index can be used, but you then have to know what the milliseconds in the_date are.
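The trimming of the external string can be done before it ever reaches the query. A small sketch in Python, assuming the input always arrives in the ISO 8601 form shown in the question; the helper name trim_for_compare is made up.

```python
from datetime import datetime


def trim_for_compare(external: str) -> str:
    """Drop the UTC offset and the fractional seconds from an ISO 8601 string."""
    parsed = datetime.fromisoformat(external)
    # The wall-clock time is kept as-is; the offset is discarded, not applied.
    return parsed.replace(tzinfo=None, microsecond=0).isoformat()


trimmed = trim_for_compare("2011-03-06T15:53:34.890-05:00")
print(trimmed)  # → 2011-03-06T15:53:34
```

The result is the string to compare against the style 23 output of convert. Discarding the offset rather than converting assumes the DATETIME column holds the same local time as the input, as in the question's example.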
