dbplyr: Postgres: How to create an index on a table from R

A user has a large table (3+ billion rows).
To speed up queries over the next few months, an index needs to be created on the remote database.
Assuming there is a connection called conn, what is the best way to create an index and have it persist after disconnecting from the database?
e.g.,
library(DBI)
sql <- 'CREATE INDEX idx_pmid ON medcit (pmid ASC);'
dbExecute(conn, sql)
dbDisconnect(conn)
The code above seems to work, but how can the index be verified (to make sure it truly exists and speeds up future queries)? In other words, how can a user check the existence of the index? Also, is an explicit COMMIT needed?

To create an index on a table, send the DDL with dbExecute() rather than dbGetQuery() (the latter is meant for statements that return rows):
dbExecute(conn, "CREATE INDEX index_name ON public.table_name USING btree (variable_name)")
As for COMMIT: over an autocommitting DBI connection (the default), Postgres runs each statement in its own transaction, so the CREATE INDEX is committed as soon as dbExecute() returns and persists after dbDisconnect(). No explicit COMMIT is needed.
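To check that the index actually exists and is being used, one hedged sketch (the table and column names come from the question; the sample pmid value is illustrative) queries the pg_indexes catalog and asks the planner for a plan:
library(DBI)
# List all indexes defined on the table via the pg_indexes system view
dbGetQuery(conn, "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'medcit'")
# Ask the planner whether a representative lookup uses the index
dbGetQuery(conn, "EXPLAIN SELECT * FROM medcit WHERE pmid = 12345")
If the plan shows an Index Scan (or Bitmap Index Scan) on idx_pmid rather than a Seq Scan, the index is in place and being used.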

Related

MariaDB table missing but I can't recreate it

Something went wrong during a structure synchronization between two databases.
One of our production databases is now missing a key table, 'customers' (which just about every other table has foreign keys to).
I'm trying to recreate the table from last night's backup (I don't want to restore the entire db - just recreate this table, as its data does not change much and I don't want to lose today's transactional data).
The hassle seems to be that all the foreign key data for this table still exists in INFORMATION_SCHEMA.KEY_COLUMN_USAGE, and I get errno 121 and 150 errors when I try to run the CREATE TABLE query.
I've manually deleted all FKs to the missing table and I am still getting errno 150 when trying to recreate the table. Any ideas where else there might be lost references to this table that are stopping me from creating it again?
This was eventually resolved after repeatedly consulting SHOW ENGINE INNODB STATUS.
The missing table had various indexes; for example, on the customer name there was an index "customer_name_idx". The CREATE TABLE query asked for this index to be created, and SHOW ENGINE INNODB STATUS reported: "could not create table because index customer_name_idx already exists."
There was no reference to this index, to any primary key, or to the table itself in any of the metadata tables - I checked
INFORMATION_SCHEMA.INNODB_SYS_INDEXES
INFORMATION_SCHEMA.TABLES
INFORMATION_SCHEMA.STATISTICS
so I could not explain why this error was being thrown.
My guess, after the fact, is that MySQL holds a cached copy of the information_schema metadata in memory and was consulting that - and maybe that cache only gets refreshed when you restart MySQL?
The solution was to give the indexes new names as a short-term fix, and to rename them back during our next scheduled downtime.
Once these changes were made, the table could be created and the backup data reinstated.
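As a hedged illustration of that short-term fix (all table, column, and index names here are invented, and conn stands in for a DBI/RMariaDB connection), the recreate statement simply declares the index under a fresh name so it no longer collides with the phantom metadata entry:
library(DBI)
# Hypothetical recreate of the lost table; customer_name_idx2 is the new
# index name that sidesteps the "index already exists" error
dbExecute(conn, "
  CREATE TABLE customers (
    id   INT NOT NULL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    INDEX customer_name_idx2 (name)
  ) ENGINE=InnoDB
")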

Include a hashtag in dbGetQuery()

I'm trying to use RJDBC to connect to a SAP HANA database and query for a temporary table, which is stored with a #-prefix:
test <- dbGetQuery(jdbcConnection,
"SELECT * FROM #CONTROL_TBL")
# Error in [...]: invalid table name: Could not find table/view #CONTROL_TBL in schema USER
If I execute the SQL statement in HANA directly, it works perfectly fine. I'm also able to query permanent tables, so I assume R doesn't pass the hashtag through. Inserting escapes like "SELECT * FROM \\#CONTROL_TBL", however, didn't solve the problem.
It's not possible to query the data of a local or global temporary table from a different session, since they are by definition session-specific. In the case of a global temporary table, one can query the table's metadata because it is shared across sessions.
Source: Tutorial for HANA temporary tables
You have to double-quote the table name because it contains special characters; see the SAP Help on identifiers for details.
test <- dbGetQuery(jdbcConnection,
'SELECT * FROM "#CONTROL_TBL"')
See also the related discussion on Stack Overflow.
Ok, local temporary tables are only ever visible to the session in which they were defined, while global temporary tables are visible just like normal tables, but their data is session-private.
So, if you created the local temp. table (name starts with #) in a different session, then no wonder it cannot be found.
For your example, the question is: why do you need a temporary table in the first place?
Instead of that, you could e.g. define a view or a table function to select data from.
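A hedged sketch of that suggestion, reusing the question's jdbcConnection (RJDBC's dbSendUpdate() is used for statements that return no rows; the base table name is made up): define a plain view once, then query it from any session.
# "CONTROL_SRC" is a hypothetical base table standing in for wherever the
# temp table's data came from; a plain view is visible from any session
dbSendUpdate(jdbcConnection, 'CREATE VIEW "CONTROL_VW" AS SELECT * FROM "CONTROL_SRC"')
test <- dbGetQuery(jdbcConnection, 'SELECT * FROM "CONTROL_VW"')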

Change a large dynamodb table to use LSIs instead of GSIs

I have a live table in dynamo with about 28 million records in it.
The table has a number of GSIs that I'd like to change to LSIs; however, LSIs can only be created when the table is created.
I need to create a new table and migrate the data with minimum downtime. I was thinking I'd do the following:
1. Create the new table with the correct indexes.
2. Update the code to write records to both the old and new tables; when this starts, note the timestamp of the first record.
3. Write a simple process to sync existing data for anything with a create date prior to that first timestamp. (I'd have to add a lock field to the new table to prevent race conditions when an existing record is updated.)
4. When it's all synced, swap to using the new table.
I think that will work, but it's fairly complicated and feels prone to error. Has anyone found a better way to do this?
Here is an approach:
(Let's refer to the table with GSIs as oldTable and the new table with LSIs as newTable).
1. Create newTable with the required LSIs (see the sketch after this list).
2. Create a DynamoDB trigger for oldTable so that every new record written to oldTable is also inserted into newTable (this logic lives in an AWS Lambda function attached to the table's stream).
3. Make your application point to newTable.
4. Migrate all the existing records from oldTable to newTable.
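Step 1 can be sketched in R with the paws AWS SDK (an assumption, since the question is language-agnostic; every table, attribute, and index name below is illustrative). The constraint it illustrates: an LSI must reuse the table's partition key and can only be declared at creation time.
library(paws)  # assumption: the paws AWS SDK for R is installed and credentials are configured
ddb <- dynamodb()
ddb$create_table(
  TableName = "newTable",
  AttributeDefinitions = list(
    list(AttributeName = "pk", AttributeType = "S"),
    list(AttributeName = "sk", AttributeType = "S"),
    list(AttributeName = "createdAt", AttributeType = "S")
  ),
  KeySchema = list(
    list(AttributeName = "pk", KeyType = "HASH"),
    list(AttributeName = "sk", KeyType = "RANGE")
  ),
  LocalSecondaryIndexes = list(
    list(
      IndexName = "byCreatedAt",
      # an LSI keeps the table's HASH key and swaps in an alternate RANGE key
      KeySchema = list(
        list(AttributeName = "pk", KeyType = "HASH"),
        list(AttributeName = "createdAt", KeyType = "RANGE")
      ),
      Projection = list(ProjectionType = "ALL")
    )
  ),
  BillingMode = "PAY_PER_REQUEST"
)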

copy sqlite index from one database to another

I have a massive database (~800 GB) with several indexed tables. I need to copy one table (including indexes) to a new database. Copying the table itself is pretty straightforward.
$ sqlite3 newDB
> attach database 'oldDB.db' as oldDB;
> create table newTable as select * from oldDB.oldTable;
But I can't seem to find any information on a way to also copy over an index. Is there any way to do this? Since the tables are so large I'd really like to avoid having to re-index them.
SQLite has no mechanism to copy index contents.
If this particular table would be the majority of the data in the database, the fastest way to copy it would be to copy the database file and then to drop all other tables.
But otherwise, you cannot avoid the reindex operation.
Please note that CREATE TABLE ... AS ... copies only the contents of the table, not the complete table definition (such as column types or constraints).
Copying a large table in a single transaction is not a good idea. If you really have to, you should first turn off journaling on the destination database:
PRAGMA journal_mode=OFF;
As the others have stated, the index cannot be broken out. I suspect that the time spent copying the database and then dropping a very large table would be longer than:
1. Creating the new destination database.
2. Determining the original CREATE TABLE statement (from the SQLITE_MASTER table of the source database) and recreating the table in the destination database.
3. ATTACHing the destination database to the source database and running INSERT INTO destinationdb.tablename SELECT * FROM sourcedb.tablename; to get the copy rolling.
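A hedged end-to-end sketch of that recipe in R with RSQLite (file and table names taken from the question; assumes the table's indexes are plain CREATE INDEX statements recorded in sqlite_master):
library(DBI)
library(RSQLite)
# Pull the original table and index DDL out of the source database
src <- dbConnect(SQLite(), "oldDB.db")
tbl_ddl <- dbGetQuery(src, "SELECT sql FROM sqlite_master
                            WHERE type = 'table' AND name = 'oldTable'")$sql
idx_ddl <- dbGetQuery(src, "SELECT sql FROM sqlite_master
                            WHERE type = 'index' AND tbl_name = 'oldTable'
                            AND sql IS NOT NULL")$sql   # skips auto-created indexes
dbDisconnect(src)

dst <- dbConnect(SQLite(), "newDB")
dbGetQuery(dst, "PRAGMA journal_mode = OFF")   # per the answer above; this pragma returns a row
dbExecute(dst, tbl_ddl)                        # full definition: types, constraints and all
dbExecute(dst, "ATTACH DATABASE 'oldDB.db' AS oldDB")
dbExecute(dst, "INSERT INTO oldTable SELECT * FROM oldDB.oldTable")
dbExecute(dst, "DETACH DATABASE oldDB")
for (stmt in idx_ddl) dbExecute(dst, stmt)     # rebuild indexes once, after the bulk load
dbDisconnect(dst)
The reindex at the end is unavoidable, but doing it once after the bulk insert is the cheap ordering.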

Dropping a SQLite database index and then adding it back on

I am inserting a large number of records into a SQLite database on Android. To improve insert times, I am considering creating the index on the table after data has been fully added.
My question is, at what point does the database actually build the index against values on the table? Does it happen as soon as I issue the SQL statement (create index index_name on table ...), or can the database defer it until the first query arrives?
Thanks,
Ranjit
It creates the index immediately, as soon as you issue the create index command. The relevant code is in sqlite3CreateIndex, and it builds the index and writes it to disk (except for the special case where it's called as part of a database open operation, but that's not the case when a user creates an index).
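A small hedged illustration of the insert-then-index pattern the question describes, using RSQLite (Android would issue the same SQL through SQLiteDatabase; all names are made up):
library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE t (id INTEGER, val TEXT)")
# Bulk-load first: inserts are cheaper while there is no index to maintain
dbWriteTable(con, "t", data.frame(id = 1:100000, val = "x"), append = TRUE)
# The index is built and persisted right here, not deferred to the first query
dbExecute(con, "CREATE INDEX idx_t_id ON t (id)")
dbDisconnect(con)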
