I want to query encrypted data from my SQLite database.
For each row, I apply an XOR operation to every value, convert it to Base64, and then INSERT it into the database.
Now I need to find a way to SELECT the encrypted values.
e.g.:
SELECT *
FROM table
WHERE name_column BETWEEN 'value1' AND 'value2'
Considering the huge amount of data in my database, how can I do that without having to decrypt the entire table to find the rows I want?
It's impossible as written. You are using BETWEEN 'value1' AND 'value2', but the database can only see the XORed, Base64-encoded strings, so BETWEEN will not work as expected. Even if you found a way to decrypt the strings on the fly inside SQLite (remember that applying XOR again decrypts), it would be inefficient and resource-consuming when there are thousands of entries.
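To see why the comparison breaks, here is a minimal Python sketch (the single-byte key 0x2A and the sample names are made up for illustration): the stored Base64 ciphertexts simply do not sort in the same order as the plaintexts, so BETWEEN on the stored column is meaningless.

import base64

def xor_b64(text, key=0x2A):              # hypothetical single-byte XOR key
    return base64.b64encode(bytes(b ^ key for b in text.encode())).decode()

names = ["alice", "bob", "carol"]
print(sorted(names))                      # plaintext order
print(sorted(xor_b64(n) for n in names))  # ciphertext order: unrelated to the plaintext order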
So to move forward with your problem you could have a look at this extension list. SQLite provides a very basic demonstration encryption module that can XOR the whole database with a key you define (not recommended):
This file describes the SQLite Encryption Extension (SEE) for SQLite.
The SEE allows SQLite to read and write encrypted database files. All
database content, including the metadata, is encrypted so that to an
outside observer the database appears to be white noise.
This file contains the complete source code to the SEE variant that
does weak XOR encryption. Do not take this file seriously. It is for
demonstration purposes only. XOR encryption is so weak that it hardly
qualifies as "encryption".
The way you want to do it won't work unless you read all values of the column into your Qt program, decrypt them there, and check whether each value is between A and B.
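A minimal Python sketch of that client-side filtering, just to illustrate the shape of it (a Qt/C++ version would look similar; the file name, table, column, and key are assumptions):

import base64, sqlite3

KEY = 0x2A                                # assumed single-byte XOR key

def decrypt(stored):
    # XOR is its own inverse, so XORing the decoded bytes again recovers the plaintext.
    return bytes(b ^ KEY for b in base64.b64decode(stored)).decode()

conn = sqlite3.connect("app.db")          # hypothetical database file
matches = [row_id for row_id, enc in conn.execute("SELECT id, name_column FROM my_table")
           if "value1" <= decrypt(enc) <= "value2"]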
Obviously, editing any column value will change the checksum of the database file,
but saving the original value back will not return the file to its original checksum.
I ran VACUUM before and after, so it isn't due to buffer size.
I don't have any indexes referencing the column, and rows are not added or removed, so the primary-key index shouldn't need to change either.
I tried turning off the rollback journal, but that is a separate file, so I'm not surprised it had no effect.
I'm not aware of an internal log or modification dates that would explain why the same content does not produce the same file bytes.
I'm looking for insight into what is happening inside the file to explain this, and whether there is a way to make it behave (I don't see a relevant PRAGMA).
Granted, https://sqlite.org/dbhash.html exists to work around this problem, but I don't see any of its listed conditions being triggered, and "... and so forth" is a pretty vague cause.
Database files contain (the equivalent of) a timestamp of the last modification so that other processes can detect that the data has changed.
There are many other things that can change in a database file (e.g., the order of pages, the B-tree structure, random data in unused parts) without a difference in the data as seen at the SQL level.
If you want to compare databases at the SQL level, you have to compare a canonical SQL representation of that data, such as the .dump output, or use a specialized tool such as dbhash.
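As a rough Python sketch of the dump-comparison idea (the file names are placeholders, and the dump is only as canonical as the order in which rows are emitted):

import hashlib, sqlite3

def dump_hash(path):
    # Hash a SQL dump of the contents instead of the raw file bytes.
    conn = sqlite3.connect(path)
    dump = "\n".join(conn.iterdump())
    conn.close()
    return hashlib.sha256(dump.encode()).hexdigest()

print(dump_hash("a.db") == dump_hash("b.db"))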
I'm an MSSQL developer who was recently tasked with building a new application using DynamoDB, since we use AWS and wanted a highly scalable database service.
My biggest concern is data integrity. For example, I have a table for all my users where every row needs to have username, email, and name fields, all strings, plus a verified field that's an int. Is there any way to require all entries in that table to have those fields and to be of those particular types?
Since the application is in PHP, I'm using Kettle as my ORM, which should prevent me from messing up the data integrity, but another developer voiced a concern about what happens if we ever add another application or if someone manually changes some types via the console.
https://github.com/inouet/kettle
Currently, no: you are responsible for maintaining the integrity of your items with respect to the existence of non-key attributes on the base table. However, you can use an LSI or GSI to enforce the data types of attributes (notwithstanding my qualm that this is not a recommended pattern, as it can cause partition heat, especially for attributes whose range of values is small). For example, verified seems to take only 0 or 1 as a value, so if you create a GSI with PK=verified where verified is a Number, writes to the base table may get throttled by the verified GSI.
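For illustration only, here is a Python (boto3) sketch of declaring verified as a Number and making it a GSI key; the table name, index name, and throughput numbers are made up. Once the attribute is an index key, a write that supplies verified with a non-numeric type should be rejected with a validation error.

import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.create_table(                    # hypothetical table/index names and throughput values
    TableName="users",
    AttributeDefinitions=[
        {"AttributeName": "username", "AttributeType": "S"},
        {"AttributeName": "verified", "AttributeType": "N"},
    ],
    KeySchema=[{"AttributeName": "username", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[{
        "IndexName": "verified-index",
        "KeySchema": [{"AttributeName": "verified", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
)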
We have several pairs of databases that relate to each other, i.e.
db1a and db1b
db2a and db2b
db3a and db3b
etc.
where there are cross-db joins between db1a and db1b, db2a and db2b, etc.
Given an open SQLite database connection, they can be attached, e.g.
ATTACH 'db1a' as a; ATTACH 'db1b' as b;
and later detached and replaced with other pairs when needed.
How should the database "connection" be created, though, given that there is originally no real database to attach to? Using the first database as the main database gives it much more significance than is meaningful, and since the main database cannot be detached, it becomes a hindrance later on.
The only way I can see is opening a :memory: or a temporary ('') database connection. Is there some better option available?
In the absence of any better alternative, :memory: is the best choice. (:temp: is a normal file name, and invalid in many OSes; a temp DB would have an empty file name.)
If you have some meta database that lists all other databases, you could use that.
Please note that when you have multiple attached databases, any change to one of them will involve a multi-database transaction.
So if the various database pairs have no relationship with the pairs under other numbers, consider using a new connection for each pair.
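A minimal Python/sqlite3 sketch of the :memory: approach (database file names and table names are placeholders):

import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway main database; it only hosts the ATTACHed pair
conn.execute("ATTACH DATABASE 'db1a' AS a")
conn.execute("ATTACH DATABASE 'db1b' AS b")
rows = conn.execute("SELECT * FROM a.t1 JOIN b.t2 USING (id)").fetchall()   # hypothetical tables
conn.execute("DETACH DATABASE a")
conn.execute("DETACH DATABASE b")
# ...ATTACH the next pair here, or open a fresh connection per pair as suggested above.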
Is there any way to query a SQLite database for basic meta data such as:
Last date/time updated
Hash of database to indicate "state"
I am just looking for a simple, infrastructural way to have a script evaluate different databases and make a reasonable judgment about whether they are in the same "state" as databases in a different environment (PROD and DEV, for instance).
In my experience, if no update, insert, or other change is made to the SQLite database file, the file's last-modified time doesn't change, so the last-modified time should suffice as the time of the most recent change to the database.
If two database files with the same state are only accessed for reading, their modified times remain the same.
Similarly, you can compare the file sizes.
You can calculate a hash over the whole file. If you consider the same data in the database to be the same "state" regardless of any differences in its history, then you may instead want a hash of all the records in the database, which is probably not as simple.
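A small Python sketch of the simple checks described above (the file names are placeholders; the raw-file hash will differ whenever the files' bytes differ, even if the SQL-level data is identical):

import hashlib, os

def db_fingerprint(path):
    # Cheap "state" indicators: last-modified time, file size, and a hash of the raw file bytes.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    info = os.stat(path)
    return info.st_mtime, info.st_size, digest

print(db_fingerprint("prod.db"))
print(db_fingerprint("dev.db"))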
One requirement is that, when persisting my C# objects to the database, I must decide the database ID (surrogate primary key) in code.
The second requirement is that the database type for the key must be int or char(x)... so no uniqueidentifier or binary(16) or the like.
These are unchangeable requirements.
What would be the best way to go about handling this?
One idea is Base64-encoded GUIDs, looking like "XSiZtdXcKU68QWe7N96Dig". These are easily created in code and are, to me, acceptable in URLs if necessary. But will it be too expensive in terms of performance (indexing, size) to have all primary and foreign keys be char(22)? Offhand, I really like this idea.
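For reference, a short Python sketch of how such a 22-character ID can be produced (a C# version would do the same with a GUID's 16 bytes); the output is random each time:

import base64, uuid

def short_guid():
    # 16 random GUID bytes -> URL-safe Base64, minus the "==" padding: 22 characters.
    return base64.urlsafe_b64encode(uuid.uuid4().bytes).rstrip(b"=").decode()

print(short_guid())                       # e.g. "XSiZtdXcKU68QWe7N96Dig"-style strings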
Another idea would be to create an in-code version of a database sequence that produces incrementing integers for me. But I don't know whether this is feasible and would need some guidance to make it reliable. The sequencer must know how far it has come, and what about threads that I don't control, etc.?
I imagine that no table involved will ever exceed 1,000,000 rows... it will probably be far fewer.
You could have a table called "sequences". For each table there would be a row with a counter. Then, when you need another number, fetch it from the counter table and increment it. Wrap that in a transaction and you will have uniqueness.
However, this will suffer in terms of performance, of course.
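A minimal sketch of that counter table, illustrated here with Python and SQLite for brevity (the table and column names are made up; the same pattern applies to other engines):

import sqlite3

conn = sqlite3.connect("app.db")          # hypothetical database file
conn.execute("CREATE TABLE IF NOT EXISTS sequences (table_name TEXT PRIMARY KEY, next_id INTEGER NOT NULL)")

def next_id(table_name):
    with conn:                            # one transaction makes the read-and-increment atomic
        conn.execute("INSERT OR IGNORE INTO sequences VALUES (?, 0)", (table_name,))
        conn.execute("UPDATE sequences SET next_id = next_id + 1 WHERE table_name = ?", (table_name,))
        (value,) = conn.execute("SELECT next_id FROM sequences WHERE table_name = ?", (table_name,)).fetchone()
    return value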
A simple incrementing int would be the easiest way to ensure uniqueness. This is what the database will do if you let it. If you set the key column to auto_increment, the database will do this for you automatically.
There are no security issues with this, but since you will be handling it yourself instead of letting the database engine take care of it, you will need to ensure that you never generate the same id twice. This should be simple if you are on a single-threaded system, but if your program is distributed you will need to put some effort into ensuring uniqueness.
Seeing that you have an ASP.NET app, you could do the following (hoping and assuming all users must authenticate themselves before using your app!):
Assign each user a unique "UserID" in your database (can be INT, or CHAR)
Assign each user a "HighestSequentialID" (INT) in your database
When the user logs on, read those values from the database and store them in e.g. a custom principal, or in a cookie, or something else
Whenever the user is about to insert a row, create a segmented ID, (UserID).(user's sequential number), and store it as VARCHAR(20); e.g. if your UserID is 15, this user's entries would have unique IDs of "15.00001", "15.00002" and so on (see the sketch after this list)
When the user logs off (or at any other time), update that user's new highest-used sequential ID in the database so that next time around you'll know what the user last used
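A tiny Python illustration of the segmented-ID format only (the real bookkeeping of the per-user counter, as described above, still has to live in your app and database):

def segmented_id(user_id, sequential_number):
    # (UserID).(user's sequential number), zero-padded, e.g. 15 and 3 -> "15.00003"
    return f"{user_id}.{sequential_number:05d}"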
Again - you'll have to do a lot more housekeeping work yourself, and it's always prone to a mishap (assigning a duplicate user ID, or misinterpreting the highest sequential number for that user).
I would strongly recommend trying to get these requirements changed - with these in place, all solutions will be sub-optimal at best, while using the database to handle this would be totally painless.
Marc
For a table below 1,000,000 rows, I would not be too terribly concerned about a char(22) primary key. Of course the ideal solution for a situation like this would be for each object to have something unique about it that you could leverage for the key, even if it is a multi-part key. The next-best solution would be to have the requirements changed :)