encryption with type information preserved - encryption

How can we achieve asymmetric-key encryption that preserves type information, e.g. int to int, string to string, datetime to datetime, and so on?
I need this in my application before sending data to any DB. Before sending an object to the document based DB, I want to encrypt the members of the object and set the encrypted value back to the object's members.
The objects are statically typed, say instances of a C# or Java class. So we cannot assign the encrypted value (which is essentially a string) to a non-string member, and we don't want to create a copy of the instance that holds all the encrypted values in corresponding string members, because that way I would lose the type information.
Any help/suggestion is much appreciated.

Before sending an object to the document based DB, I want to encrypt the members of the object and set the encrypted value back to the object's members.
So encode the type along with the data. This is completely independent of encryption. Any way that you'd want to encode the data would have this issue; whatever you'd use to encode this to JSON works precisely the same when you encrypt it.
It's important to note that encrypted data is not "essentially a string." The output of encryption is a series of bytes that looks random.
How you implement this in practice is highly dependent on your use case and your language. Different languages have very different kinds of types. But in every language, there is some way to encode data into a serialized format (JSON, Protobuf, etc). That is the thing you want to focus on. Once you are able to convert to and from any serialized format, that format can be encrypted. It's the serialization that matters, not the encryption.
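As a rough illustration of "encode the type along with the data", here is a minimal Python sketch. The names `serialize` and `deserialize`, the tagged-JSON layout, and the three supported types are assumptions made for this example, not part of any library. The cipher is deliberately left out: the bytes returned by `serialize` are what you would hand to whatever encryption routine you use, and `deserialize` takes the bytes you get back after decryption.

```python
import json
from datetime import datetime

# Hypothetical per-type converters; extend for the types your objects use.
_ENCODERS = {"int": str, "str": str, "datetime": datetime.isoformat}
_DECODERS = {"int": int, "str": str, "datetime": datetime.fromisoformat}


def serialize(record: dict) -> bytes:
    """Turn a record into bytes, storing each value next to its type name."""
    tagged = {}
    for name, value in record.items():
        type_name = type(value).__name__
        tagged[name] = {"type": type_name, "value": _ENCODERS[type_name](value)}
    return json.dumps(tagged).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Rebuild the record, restoring every value to its original type."""
    tagged = json.loads(payload.decode("utf-8"))
    return {
        name: _DECODERS[field["type"]](field["value"])
        for name, field in tagged.items()
    }


record = {"age": 42, "name": "Ada", "joined": datetime(2020, 1, 2, 3, 4, 5)}
payload = serialize(record)      # encrypt these bytes before storing them
restored = deserialize(payload)  # call this after decrypting
```

The stored document then holds opaque bytes, while the type information travels inside the serialized payload rather than in the member types.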

Related

Querying encrypted data in SQLite

I want to query encrypted data from my SQLite database.
For each row, I apply an XOR operation to every value, convert the result to Base64, and then INSERT it into the database.
Now I need to find a way to SELECT the encrypted values.
For example:
SELECT *
FROM table
WHERE name_column BETWEEN 'value1' AND 'value2'
Considering the large amount of data in my database, how can I do that without having to decrypt the whole table to get the wanted rows?
It's impossible. You are using BETWEEN 'value1' AND 'value2', but the database only sees the XORed strings, so BETWEEN will not work as expected. Even if you find a way to decrypt the strings on the fly within SQLite (remember that applying XOR again decrypts), it would be inefficient and resource-consuming with thousands of entries.
To move forward, you could have a look at this extension list. SQLite seems to provide some very basic encryption modules, which can XOR the whole database with a key you define (not recommended):
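To see why a range comparison breaks, here is a small Python sketch that mimics the XOR-then-Base64 scheme from the question. The single-byte key and the sample names are made up for the demonstration, and `xor_base64` is a hypothetical helper, not the asker's actual code.

```python
import base64
from itertools import cycle


def xor_base64(plaintext: str, key: bytes) -> str:
    """XOR every byte with the repeating key, then Base64-encode the result."""
    xored = bytes(b ^ k for b, k in zip(plaintext.encode("utf-8"), cycle(key)))
    return base64.b64encode(xored).decode("ascii")


KEY = b"\x5a"  # illustrative key only
names = ["alice", "bob", "carol"]

# Order the database would use if it could see the plaintext.
by_plaintext = sorted(names)
# Order the database actually sees: it compares the stored ciphertext strings.
by_ciphertext = sorted(names, key=lambda name: xor_base64(name, KEY))

# In plaintext 'alice' <= 'bob', but the stored values compare the other way
# round, so a BETWEEN over the two stored values matches no rows.
lower, upper = xor_base64("alice", KEY), xor_base64("bob", KEY)
range_is_empty = lower > upper
```

Because the two orderings differ, an index or a BETWEEN over the stored values answers a different question than the one asked about the plaintext.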
This file describes the SQLite Encryption Extension (SEE) for SQLite.
The SEE allows SQLite to read and write encrypted database files. All
database content, including the metadata, is encrypted so that to an
outside observer the database appears to be white noise.
This file contains the complete source code to the SEE variant that
does weak XOR encryption. Do not take this file seriously. It is for
demonstration purposes only. XOR encryption is so weak that it hardly
qualifies as "encryption".
The way you want to do it won't work unless you read all values of a column into your Qt program, decrypt them, and check whether value X is BETWEEN A and B.

What's the best way to store an array in a relational database?

I'm currently trying to design a database and I'm not too sure about the best way to approach a dynamically sized array field of one of my objects. My first thought is to use a column in my object's table to store an array of integers. However, the more I read, the more I think this isn't the best option. As a concrete example, I have a player object that stores zero to many items, each represented by an integer. What is the best way to represent this?
If that collection of values is atomic, store them together. Meaning, if you always care about the entire group, if you never search for nested values and never sort by nested values, then they should be stored together as a single field value.
If not, they should be stored in a separate table, each value being a row, each assigned the parent ID (foreign key) of the record in the other table that "owns" them as a group.
For example, a clump of readings from a scientific instrument that are only ever used together as a collection for analysis should be stored together in a field. In contrast, a list of phone numbers for a customer that may often need to be queried for an individual number should probably be broken up into single phone number per row in a related child table.
For more info, search on the term "database normalization".
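For the player/items case from the question, the child-table layout could look like the following sketch, written with Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE player (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per (player, item) pair instead of an array column.
    CREATE TABLE player_item (
        player_id INTEGER NOT NULL REFERENCES player(id),
        item_id   INTEGER NOT NULL
    );
    """
)
conn.executemany("INSERT INTO player (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO player_item (player_id, item_id) VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 20)])


def items_of(player_id: int) -> list:
    """All item IDs owned by one player."""
    rows = conn.execute(
        "SELECT item_id FROM player_item WHERE player_id = ? ORDER BY item_id",
        (player_id,),
    )
    return [row[0] for row in rows]


def owners_of(item_id: int) -> list:
    """Names of all players owning a given item (a search on a nested value)."""
    rows = conn.execute(
        "SELECT p.name FROM player p "
        "JOIN player_item pi ON pi.player_id = p.id "
        "WHERE pi.item_id = ? ORDER BY p.name",
        (item_id,),
    )
    return [row[0] for row in rows]
```

The second query is the kind of lookup that becomes awkward once the items are packed into a single field.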
Some databases support an array as a data type. For example, Postgres allows you to define a column as a one-dimensional array, or even a two-dimensional array.
If your database does not support arrays as a column type, you have three alternatives:
XML/JSON: Transform your data collection into an XML or JSON document, if your database supports that type. For example, Postgres has basic support for storing, retrieving, and non-indexed searching of XML using XPath. And Postgres offers excellent industry-leading support for JSON as a data type, including indexed support on nested values with its jsonb data type, where incoming JSON is parsed and stored in an internally-defined binary format. This feature addresses one of the main reasons people consider using the so-called “NoSQL” systems: storing and searching semi-structured data.
Text: Create a string representation of your data to store as text.
BLOB: Create a binary value to store as a binary large object (BLOB).
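As one possible reading of the text alternative, the sketch below stores an atomic collection as a JSON string in a single TEXT column, again using Python's `sqlite3`. The table `reading_set` and the helper names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading_set (id INTEGER PRIMARY KEY, readings TEXT NOT NULL)"
)


def save_readings(values: list) -> int:
    """Store the whole collection as one JSON string; return the new row ID."""
    cursor = conn.execute(
        "INSERT INTO reading_set (readings) VALUES (?)", (json.dumps(values),)
    )
    return cursor.lastrowid


def load_readings(row_id: int) -> list:
    """Read the JSON string back and parse it into a list."""
    (text,) = conn.execute(
        "SELECT readings FROM reading_set WHERE id = ?", (row_id,)
    ).fetchone()
    return json.loads(text)


row_id = save_readings([3, 1, 4, 1, 5])
```

This only suits collections that are always read and written as a whole, since the database cannot search or sort by the individual values without parsing the text.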

How to make printed pdf files authentic to prevent forging or cloning using asp.net

I have a web application developed using ASP.NET which generates certificates (PDF files) from data in XML files. What can I include (e.g. a barcode or something similar) to make the certificates authentic, so that they are impossible to forge or clone? I have been on this for about a week now, and Google hasn't been helpful either. Please assist.
Thanks
I can explain how to achieve this, but as is customary on SO, you'll have to write the code yourself.
You are creating a PDF file based on XML data. When the document is printed, you can't use a digital signature as digital signatures check the validity of the document at the byte level: a hash is created of the document, this hash is signed with a private key (some extra stuff is added) and that signed hash is integrated into the file.
Now when somebody wants to check the integrity of the file, a new hash of the bytes in the file is created (hash #1) and the encrypted hash is decrypted (hash #2) using the public key that corresponds with the private key that was used to encrypt the hash. If hash #1 differs from hash #2, the document was forged.
When you print a document, there are no bytes to check. As Chris Haas points out, you can't protect the document. However: you can protect the data. For instance, you could add the original XML to the document in the form of a 2D barcode (you can choose which type of barcode). This way, people can scan the original data. You can then add a second barcode. For instance: you make a hash of the original XML and you encrypt it with a private key. You add this encrypted hash as a barcode (you may have to use Base64 encoding).
Now when somebody has scanned the first barcode for the data, he can scan the second barcode for the "signature". He needs to decrypt the scanned signature using the public key and compare the resulting hash with a hash of the scanned data. If both hashes are identical, the scanned data equals the original data.
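The two-barcode idea can be sketched in Python as follows. Everything here is a toy: the key is a tiny textbook RSA key built from two known Mersenne primes, there is no padding, and the names `sign` and `verify` are made up for this example. It only illustrates the flow (hash the XML, apply the private key, Base64-encode for the barcode, check with the public key). Real certificates would need a vetted cryptography library and a properly sized key. It needs Python 3.8 or later for the modular inverse.

```python
import base64
import hashlib

# Toy key material, far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent
SIG_BYTES = (N.bit_length() + 7) // 8


def _digest(data: bytes) -> int:
    """SHA-256 hash of the data, reduced so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> str:
    """Issuer side: transform the hash with the private key, Base64 for the barcode."""
    signature = pow(_digest(data), D, N)
    return base64.b64encode(signature.to_bytes(SIG_BYTES, "big")).decode("ascii")


def verify(data: bytes, signature_b64: str) -> bool:
    """Checker side: recover the hash with the public key and compare."""
    signature = int.from_bytes(base64.b64decode(signature_b64), "big")
    return pow(signature, E, N) == _digest(data)


xml = b"<certificate><name>Jane Doe</name><grade>A</grade></certificate>"
second_barcode = sign(xml)  # content of the second barcode
```

A checker scans both barcodes and calls `verify(scanned_xml, scanned_signature)`; changing a single character of the scanned data makes the comparison fail.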

What is a hash map in programming and where can it be used

I have often heard people talking about hashing and hash maps and hash tables. I wanted to know what they are and where you can best use them for.
First, you should maybe read this article.
When you use lists and are looking for a particular item, you normally have to iterate over the complete list. This is very expensive when you have large lists.
A hashtable can be a lot faster; under the best circumstances you get the item you are looking for with only one access.
How does it work? Like a dictionary: when you are looking for the word "hashtable" in a dictionary, you do not start with the first word under 'a'. Rather, you go straight to the letter 'h', then to 'ha', 'has' and so on, until you have found your word. You are using an index within your dictionary to speed up your search.
A hashtable does basically the same. Every item gets an index (the so-called hash), ideally a unique one. You use this hash for lookups. The hash may be an index into a normal array. For instance, your hash could be a number like 2130, which means that you should look at position 2130 in your array. A lookup at a known index within an array is very easy and fast.
The crux of the whole approach is the so-called hash function, which assigns this index to each item. When you are looking for an item, you should be able to calculate the index in advance. Just like in a real dictionary, where you see that the word 'hashtable' starts with the letter 'h' and therefore you know the approximate position.
A good hash function provides hashcodes that are evenly distributed over the space of all possible hashcodes. And of course it tries to avoid collisions. A collision happens when two different items get the same hashcode.
In C#, for instance, every object has a GetHashCode() method which provides a hash for it (not necessarily unique). This can be used for lookups within your dictionary.
When you start using hashtables, always make sure that you handle collisions correctly. In large hashtables it can easily happen that two objects get the same hash (maybe your override of GetHashCode() is faulty, maybe something else happened).
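One common way to handle collisions is to keep a small list of entries per slot ("separate chaining"). The following Python sketch is a minimal illustration of that idea, not how any particular runtime implements its dictionaries. The class name and the default bucket count are arbitrary choices.

```python
class ChainedHashTable:
    """Tiny hash table that resolves collisions by chaining entries per bucket."""

    def __init__(self, bucket_count: int = 8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash function decides which slot of the array to look at.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for position, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[position] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))              # new key, possibly a collision

    def get(self, key):
        # Compare keys, not just hashes: colliding keys share this bucket.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(bucket_count=1)  # one bucket forces every key to collide
table.put("apple", 1)
table.put("pear", 2)
```

Even with every key landing in the same bucket, `table.get("pear")` still finds the right value, because the final comparison is on the key itself. The lookup just degrades towards a list scan.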
Basically, a HashMap allows you to store items with identifiers. They are stored in a table format with the identifier being hashed using a hashing algorithm.
Retrieving items from them is typically more efficient than from search trees etc.
You may find this helpful: http://www.relisoft.com/book/lang/pointer/8hash.html
Hope it helps,
Chris
Hashing (in the noncryptographic sense) is a blanket term for taking an input and producing an output to identify it with. A trivial example of a hash is summing the letters of a string (with a = 1, b = 2, and so on), e.g.:
f(abc) = 6
Note that this trivial hash scheme would create a collision between the strings abc, bca, ae, etc. An effective hash scheme would produce different values for each string, naturally.
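The letter-sum scheme can be written out in a few lines of Python. The function name is made up, and the sketch assumes lowercase ASCII letters with a = 1, b = 2, and so on.

```python
def letter_sum_hash(text: str) -> int:
    """Trivial hash: add up the alphabet positions of the letters (a = 1)."""
    return sum(ord(ch) - ord("a") + 1 for ch in text)


# All three strings from the text land on the same value, i.e. they collide.
colliding = {word: letter_sum_hash(word) for word in ("abc", "bca", "ae")}
```

Any reordering of the same letters collides under this scheme, which is why practical hash functions also mix in the position of each character.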
Hashmaps and hashtables are datastructures (like arrays and lists), that use hashing to store data. In a hashtable, a hash is produced (either from a provided key, or from the object itself) that determines where in the table the object is stored. This means that as long as the user of the hashtable is aware of the key, retrieving the object is extremely fast.
In a list, in comparison, you would need to search through the list in some way in order to find the object you are after. This also points to the downside of hashtables: it is very complicated to find an object in one without knowing the key, because where the object is stored in the table has no relation to its value or to when it was inserted.
A hash set is similar to a hashtable, but only one copy of each object is stored in it (hence no separate key needs to be provided; the object itself is the key). A hash map, like a hashtable, maps keys to values.
This is of course a very simple explanation, so I suggest you read in depth from this point on. I hope I didn't make any silly mistakes. =)
A hashmap is used for storing data in key-value pairs. We can use a hashmap to store objects in an application and use it further in the same application for storing, updating, and deleting values. The keys and values are stored in buckets, and the bucket for a given entry is determined by the hash code function applied to its key. The detailed explanation of how a hashmap works is given in this video: https://youtu.be/iqYC1odZSNo
Hash maps save a lot of time compared to other search approaches. A key corresponds to a hash code, which in turn helps to find its index. In terms of implementation, a hash map takes a string, converts it into an integer, and remaps that integer into an index of an array, where the required value is found.
To go into detail, we can look at handling collisions in hash maps. For example, each array slot can hold a linked list of entries instead of a single entry.
There is a short video available to understand it.
Available here :
Implementation example --> https://www.youtube.com/watch?v=shs0KM3wKv8
Sample:
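int hashCode(String s)
{
    int hash = 0;
    // Polynomial hash with base 31, the same scheme String.hashCode() uses.
    for (int i = 0; i < s.length(); i++)
        hash = 31 * hash + s.charAt(i);
    return hash;
}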

Shorter GUID using CRC

I am making a website in ASP.NET and want to have a user profile that can be accessed via a URL with the user's ID at the end. A unique identifier is obviously a bad choice, as it is long and (correct me if I am wrong) not really URL friendly.
I was wondering: if I produced a unique identifier on the ASP page and then hashed it using CRC (or something similar), would it still be as unique (or even unique at all) as just a GUID?
For example:
The GUID 6f1a7841-190b-4c7a-9f23-98709b6f8848 equals CRC E6DC2D44.
Thanks
A CRC of a GUID would not be unique, no. That would be some awesome compression algorithm otherwise, to be able to put everything into just 4 bytes.
Also, if your users are stored in the database with a GUID key, you'd have trouble finding the user that matches up to this particular CRC.
You'd be better off using a plain old integer to uniquely identify a user. If you want to have the URL unguessable, you can combine it with a second ticket (or token) parameter that's randomly generated. It doesn't have to be unique, because you use the integer ID for identifying the user. You can think of it more or less as a password.
Any calculated hash contains less information (bits) than the original data and can never be as unique. There are always collisions.
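To put a rough number on "there are always collisions", the standard birthday approximation estimates how likely at least one clash is among n IDs squeezed into a 32-bit value. The sketch below assumes the CRC values are spread uniformly over the 2^32 possibilities, which is an idealisation. The function name is hypothetical.

```python
import math


def collision_probability(n_ids: int, hash_bits: int = 32) -> float:
    """Birthday approximation: P(at least one collision) among n_ids hashes."""
    space = 2 ** hash_bits
    return 1.0 - math.exp(-n_ids * (n_ids - 1) / (2 * space))


# Chance of at least one clash for a few user counts.
for users in (1_000, 10_000, 77_163, 1_000_000):
    print(users, round(collision_probability(users), 4))
```

Under this approximation, roughly 77,000 users are already enough for about even odds of at least one duplicate, whereas a full 128-bit GUID keeps that probability negligible at any realistic user count.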
If the users have a username then why not use that? It should be unique (I would hope!) and would probably be short and URL friendly. It would also be easy for users to remember, and it fits in with the ASP.NET membership scheme (since usernames are the "primary key" in membership providers). I don't see any security issue as (presumably) only authenticated users would be able to access it anyway?
No, it won't be as unique, because you're losing information from it. If you take a 32 character hex string and convert it to an 8 character hex string then, by definition, you're losing 75% of the data.
What you can do is use more distinct characters to represent the data. The string form of a GUID uses only 16 distinct characters (base 16), so you could use a higher base (e.g. base 64), which lets you encode the same amount of information in fewer characters.
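As a sketch of the higher-base idea, the Python snippet below encodes the 16 bytes of a GUID with URL-safe Base64 and drops the padding, giving 22 characters instead of 32 hex digits. The helper names `shorten` and `expand` are made up. Note also that Python's `uuid` byte order differs from .NET's `Guid.ToByteArray()`, so the exact characters would differ between the two platforms.

```python
import base64
import uuid


def shorten(guid: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe Base64 without the '==' padding."""
    return base64.urlsafe_b64encode(guid.bytes).rstrip(b"=").decode("ascii")


def expand(short: str) -> uuid.UUID:
    """Restore the padding and decode back to the original GUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(short + "=="))


guid = uuid.UUID("6f1a7841-190b-4c7a-9f23-98709b6f8848")
short_id = shorten(guid)  # 22 URL-safe characters
```

Unlike the CRC, this is fully reversible, so `expand(short_id)` gives back exactly the GUID stored in the database.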
I don't see any problem with a normal GUID in an HTTP URL. If you want a shortened form of the GUID, use the following:
var gid = Guid.NewGuid().ToString("N");
This will give a GUID without any hyphen or special characters.
A GUID is globally unique, meaning that you won't run into clashes, hopefully ever. GUIDs are usually based on some sort of time-based calculation with randomness mixed in. If you want to shorten something using a hash such as CRC, then uniqueness is not automatic, but as long as you manage uniqueness yourself (checking whether the hash is already assigned to another user and, if so, regenerating until you get a unique one), you could use almost anything.
This is the way a lot of url-shorteners work.
If you use a CRC of a UUID/GUID as ID you could also use a shorter ID in the first place.
The idea of a UUID/GUID as an ID is, IMO, that you can create IDs on disconnected systems and should have no problem with duplicate IDs.
And who is going to enter the URL for the profile page by hand anyway?
Also, I see no problem with the URL friendliness of a UUID/GUID: it contains no characters that are not allowed by HTTP.
How are users identified in the database (or any other place you use to store your data)?
If they are identified using this GUID, I hope you have a really good reason for it, because it makes searching for a particular ID more expensive (even when using a binary tree), and more space is needed to store these values.
If they are identified by a unique integer value, why not use that to call up the user profile?
You can shorten a GUID to 20 printable ASCII characters, with it still being unique and without losing any information.
Take a look at this blog post by Jeff Atwood:
Equipping our ASCII Armor
