I am developing an application where the idea is to combine two separate keys into a single key. From my research, there exist two common ways to do this:
1) Use concatenation of hashes and perhaps hash again
2) Use HMAC
Which would be the more secure approach, and, if possible, is there a source that can be cited to back a specific approach?
Normally, the two keys are XORed together to compute the final key.
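For illustration, here is a minimal Python sketch of the three options mentioned (hash of the concatenation, HMAC, and XOR). The 32-byte key values are placeholders, and in practice a dedicated KDF such as HKDF is often used for this kind of key combination.

```python
import hashlib
import hmac

# Placeholder input keys; in practice these come from your two key sources.
key_a = b"\x01" * 32
key_b = b"\x02" * 32

# Option 1: hash the concatenation of the two keys (or of their hashes).
combined_hash = hashlib.sha256(key_a + key_b).digest()

# Option 2: use HMAC, treating one key as the HMAC key and the other as the message.
combined_hmac = hmac.new(key_a, key_b, hashlib.sha256).digest()

# XOR (mentioned above) only preserves security if both keys are full-length,
# uniformly random, and independent of each other.
combined_xor = bytes(a ^ b for a, b in zip(key_a, key_b))
```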
I am trying to create a naming convention for different objects in DynamoDB, such as tables, partition and sort keys, LSIs, GSIs, attributes, etc. I have read a lot of articles and there is no single common way to do it, but I want to learn from real-world examples to choose the one that best fits our needs.
The infrastructure I am working on is based on microservices, and some of our development environments share the same AWS account. With that in mind, I ended up with something like this:
Tables: [Environment].[Service Name].[Table Name].ddb-table
GSIs/LSIs: [Environment].[Service Name].[Table Name].[GSI/LSI Name].ddb-[gsi/lsi]
Partition Key: pk ??? (in my understanding, the keys should have abstract names, because a single table stores heterogeneous data under the same key)
Sort Key: sk ??? (in my understanding, the keys should have abstract names, because a single table stores heterogeneous data under the same key)
Attributes: meaningful but as short as possible as they are kept for every item in the table
Different elements are separated by a dot (.)
Words within a name are separated by dashes (kebab-case) and everything is lower case
Table/GSI/LSI names are in singular form
Here is an example:
Table: dev.user-service.user-order.ddb-table
LSI: dev.user-service.user-order.lsi1pk.ddb-lsi
GSI: dev.user-service.user-order.gsi1pk.ddb-gsi
What naming conventions do you follow?
Thanks a lot in advance!
My advice:
Use PK and SK as your partition key and sort key.
Don't put table names into code. Use Parameter Store (see the sketch after this list). For example, if you ever do a table restore it will be to a new table name, and if you want to send traffic to the new table you won't want to change code.
Thus don't get too attached to any particular table name. Never have code try to predict a table name. Only keep names consistent to help humans.
Don't put regions in your table names. When you switch to Global Tables they all keep the same name. Awkward!
GSIs can be called GSI1, GSI2, etc. GSI keys are GSI1PK and GSI1SK, etc.
Tag your tables with their name if you ever want to track per-table costs later.
Short yet meaningful attribute names are nice because they reduce storage and can reduce RCU/WCU if you're near the 4 KB (read) or 1 KB (write) boundaries.
Use different accounts for dev, staging, and production. If you want to put the environment names into table names as well to help you spot "OMG I'm in production", that's fine.
If you have lots of attributes that form the item payload, aren't used for GSIs or filtering, and are always returned together, consider storing them as a single string or binary attribute that gets parsed client-side. You can even compress it. It's more efficient and lower latency because it skips the data marshaling.
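As a rough illustration of the Parameter Store and payload points above, here is a Python (boto3) sketch. The parameter path, key names, and attribute names are made up for the example, not prescribed.

```python
import json
import zlib
import boto3

ssm = boto3.client("ssm")
dynamodb = boto3.resource("dynamodb")

# The table name lives in Parameter Store, not in code (hypothetical parameter path).
table_name = ssm.get_parameter(Name="/dev/user-service/user-order/table-name")["Parameter"]["Value"]
table = dynamodb.Table(table_name)

# Payload attributes never used for GSIs or filtering: serialize, compress,
# and store them as one opaque binary attribute parsed client-side.
payload = {"street": "123 Main St", "city": "Springfield", "notes": "gift wrap"}
blob = zlib.compress(json.dumps(payload).encode("utf-8"))

table.put_item(
    Item={
        "PK": "USER#42",
        "SK": "ORDER#2024-01-01",
        "GSI1PK": "ORDER#2024-01-01",
        "GSI1SK": "USER#42",
        "payload": blob,  # compressed blob, decompressed on the client
    }
)
```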
I have a service which maintains the data of users. Can I just use integer userIds like 1, 2, 3, etc., which other services can use to reference our users? Or is it better practice to use a string like a UUID as the ID?
Regarding performance, using strings for IDs will indeed have an impact: when querying and analyzing data, matching two numerical values is faster than matching two strings. However, you need to take into account that the solution (not limited to the DB here) needs to support scaling.
This means that at some point you might need to migrate or combine this database with something else, or distribute it across multiple servers, hence you would need something more solid, and you should look into GUIDs (e.g. https://blog.codinghorror.com/primary-keys-ids-versus-guids/).
TLDR: Use numerical values if you plan on keeping this a small DB for this service only; consider something more complex if you plan on scaling the project later on.
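If you go the GUID route, a minimal sketch in Python looks like this (random, version 4 UUIDs; the printed value is just an example):

```python
import uuid

# A random (version 4) UUID as a string identifier; collision probability is negligible.
user_id = str(uuid.uuid4())
print(user_id)  # e.g. "3f9b2a1e-8c4d-4a7b-9f1e-0d2c5b6a7e8f" (varies per run)
```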
Are there adequate ways to encrypt/decrypt files using several different keys? That is, is it possible to make a group of n keys such that any key in the group can encrypt a file and any key in the group can also decrypt it? (Provided, of course, that no key other than these n group keys can decrypt the file.)
There isn't a way to encrypt with only one key and be able to decrypt with several other keys individually.
You could have a group key that is encrypted by several keys, so those keys can decrypt the group key as needed, and the group key can be used to encrypt/decrypt data that is shared between users with different keys.
You could do the same thing with the data itself, but this means all keys need to be available for encryption, which doesn't seem to meet your use case.
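A minimal Python sketch of the group-key idea, using the cryptography package's AES-GCM primitive (the member keys and file contents are placeholders):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Hypothetical per-member keys; in practice each member already holds their own key.
member_keys = [AESGCM.generate_key(bit_length=256) for _ in range(3)]

# One shared "group key" actually encrypts the file.
group_key = AESGCM.generate_key(bit_length=256)

# Wrap (encrypt) the group key once per member key.
wrapped = []
for mk in member_keys:
    nonce = os.urandom(12)
    wrapped.append((nonce, AESGCM(mk).encrypt(nonce, group_key, None)))

# Encrypt the file data once, under the group key.
file_nonce = os.urandom(12)
ciphertext = AESGCM(group_key).encrypt(file_nonce, b"file contents", None)

# Any single member can recover the group key from their wrapped copy and decrypt.
nonce0, wrapped0 = wrapped[0]
recovered = AESGCM(member_keys[0]).decrypt(nonce0, wrapped0, None)
plaintext = AESGCM(recovered).decrypt(file_nonce, ciphertext, None)
assert plaintext == b"file contents"
```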
https://en.wikipedia.org/wiki/Chord_(peer-to-peer)
I've looked into Chord and I'm having trouble understanding exactly what it does.
Is it a protocol for a distributed hash table which stores various key/value pairs for later use? Is it just an efficient way to look up the value for a given key in the hash table?
Any help such as a basic example would be much appreciated
An example question: say I insert the string "Hi" and it hashes to 3, but there is no peer at 3. Would it go to the next available peer and store it there? Or where does it store its values?
I already answered a similar question for bittorrent/kademlia, so just to summarize in a more general sense:
DHTs store the values with some redundancy on N nodes whose ID is closest to the target hash.
Considering the vastness of ≥ 128-bit keyspaces, it would be extremely unlikely for a node's ID to exactly match the key, at least in routing schemes where nodes don't adjust their IDs based on content, and Chord is one of those.
It's pretty much the same as regular hash tables, hence distributed hash table. You have a limited set of buckets into which the entries are hashed, where the bucket space is much smaller than the potential input keyspace and thus does not precisely match the keys either.
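To make the "next available peer" intuition from the question concrete, here is a toy Python sketch of Chord's successor rule in a small 8-bit identifier space (real Chord uses a much larger space, e.g. 160-bit SHA-1 IDs; the peer names are made up):

```python
import hashlib

# Toy 8-bit identifier space for illustration.
ID_BITS = 8
RING = 2 ** ID_BITS

def ring_id(value: str) -> int:
    """Hash a name or key onto the ring."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big") % RING

# Hypothetical peers already on the ring, sorted by their IDs.
peers = sorted(ring_id(p) for p in ["peer-a", "peer-b", "peer-c", "peer-d"])

def successor(key_hash: int) -> int:
    """A key is stored on the first peer whose ID is >= the key's hash,
    wrapping around the ring if necessary (Chord's successor rule)."""
    for pid in peers:
        if pid >= key_hash:
            return pid
    return peers[0]  # wrap around past the highest peer ID

key = ring_id("Hi")
print(f"key 'Hi' hashes to {key}, stored on peer with ID {successor(key)}")
```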
We are planning to move our transactional data into a big data platform and do the analysis there. One challenge we face is how to create auto-increment values in big data; we need them to generate surrogate keys.
The most common approach is to use a random (version 4) UUID, i.e. a pseudo-random identifier with an extremely, extremely low collision chance.
If you really need sequential (or at least monotonic) identifiers for some reason, then you will need to generate them from a single source, and this single source may need to be separated out as a service, e.g. Twitter Snowflake.
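To show what such a single-source generator looks like, here is a minimal Python sketch of a Snowflake-style ID (timestamp bits + worker bits + sequence bits). It is only an illustration of the idea, not Twitter's actual implementation, and the epoch and bit widths are assumptions:

```python
import threading
import time

class SnowflakeLikeGenerator:
    """Sketch: 41 bits of milliseconds since a custom epoch,
    10 bits of worker ID, 12 bits of per-millisecond sequence."""

    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond; wait for the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeLikeGenerator(worker_id=1)
print(gen.next_id())  # monotonically increasing 64-bit-ish integer
```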
Yes, I agree with the UUID approach, but please make sure that you refactor your ER model to have a proper balance between normalised and denormalised entities.
If you move your existing application's ER model as-is into a big data architecture, it will slow down performance, as it might have to do joins with BigTable.
Also make sure that the key you use to access data is stable and does not change when the data is updated, since it is being stored in a NoSQL database.
These links will give you some idea about the above:
Transition-RDBMS-NoSQL
relational-databases-vs-non-relational-databases