guid in System.Guid - guid

MSDN describes System.Guid.NewGuid() as follows:
The chance that the value of the new Guid will be all zeros or equal to any other Guid is very **low**
Would it be a bad idea to make the CustomerID column of the Customer table a "uniqueidentifier" and generate the value with System.Guid.NewGuid()? How can I be sure the method generates only unique IDs?

Unless you have a very good reason to use a Guid as the ID, I would recommend against using one as the key. A Guid takes 16 bytes in the database versus 4 for an int, and it provides no benefit in common scenarios. Random Guids also do not play well with indexes, because new values land at arbitrary positions in the index rather than at the end.
Why not make CustomerID an integer and have the database auto-generate its value when you insert a new record?
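On the uniqueness concern in the question, a rough back-of-the-envelope check may help. The sketch below is in Python rather than C#, and it assumes the Guid is a random (version 4) UUID with 122 random bits; `collision_probability` is a hypothetical helper that applies the usual birthday-bound approximation, not anything from the .NET API.

```python
import math
import uuid

RANDOM_BITS = 122  # a version 4 UUID has 128 bits, 6 of which are fixed


def collision_probability(n_ids, bits=RANDOM_BITS):
    """Birthday-bound estimate that n_ids random IDs contain a duplicate."""
    space = 2 ** bits
    # 1 - exp(-n(n-1) / 2N); expm1 keeps precision when the result is tiny
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


sample = uuid.uuid4()                      # Python analogue of a random Guid
p_billion = collision_probability(10**9)   # one billion customers
print(sample, p_billion)
```

Under these assumptions the estimate for a billion IDs is on the order of 1e-19, so uniqueness cannot be guaranteed in the strict sense, but a duplicate is not a practical concern. A unique constraint on the column would still catch one if it ever happened.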

Related

nullable GSI vs Sparse Index

I know AWS questions rarely get an answer, but I will try my luck. It's a pretty easy one, but I cannot find an answer to it.
Let's say I have the simplest table ever, with 2 'columns': token (Partition Key) and userId.
What I want is to have 1000 tokens, and once a user signs up, a token gets assigned to that user, i.e. the userId property in the table gets populated. I also want to be able to query, not scan, by both token and userId.
In order to query by userId I can use it as a GSI key, BUT it is stated that GSI keys should not be null. All my entries will have the token property from the beginning, but userId will be empty until the token actually gets assigned.
What can I use for this scenario? I thought about a Sort Key, since that gives a Sparse Index, but as far as I know I cannot query by the Sort Key ALONE.
Just do what you're thinking and create a GSI keyed on userId. In the base table, don't include a userId attribute at all until the token has one. This isn't relational: you don't need every item to have all attributes. Only the key attributes have to be provided.
So:
Early on, all items have just the token attribute and no other attributes.
You have a GSI with userId as its PK.
When a token is assigned, add the userId attribute to that item, and it will be propagated to the GSI.
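The three steps above can be sketched with a small in-memory model. This is plain Python, not the AWS SDK: `build_gsi` is a hypothetical stand-in for what DynamoDB does when it maintains a sparse index, and the token and user values are made up.

```python
def build_gsi(items, key_attr):
    """Simulate a sparse GSI: only items that carry key_attr are indexed."""
    index = {}
    for item in items:
        if key_attr in item:  # items without the attribute are simply skipped
            index.setdefault(item[key_attr], []).append(item)
    return index


# Base table keyed by token; unassigned tokens omit userId entirely.
table = {t: {"token": t} for t in ("tok-1", "tok-2", "tok-3")}
gsi_before = build_gsi(table.values(), "userId")

# Assigning a token adds the attribute (an UpdateItem in real DynamoDB).
table["tok-2"]["userId"] = "user-42"
gsi_after = build_gsi(table.values(), "userId")

print(gsi_before)             # nothing indexed yet
print(gsi_after["user-42"])   # the single item assigned to user-42
```

The point of the model is that the attribute is absent rather than null, so unassigned tokens never appear in the index at all.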

DynamoDB : Good practice to use a timestamp field in a Primary Key

I want to store and retrieve data from a DynamoDB table.
My data (an item = a review a user gave on a feature of an app) have the following attributes :
user string
feature string
appVersion string
timestamp string
rate int
description string
There are multiple features, across multiple versions of the app, and a user can give multiple reviews on these features. So I would like to use (user, appVersion, feature, timestamp) as the primary key.
But it does not seem to be possible to use that many attributes in a primary key in DynamoDB.
The first solution I implemented uses user as the Partition Key, and a hash of (appVersion, feature, timestamp) as the Sort Key (in a new field named reviewID).
My problem is that I want to retrieve an item for a given user, feature, and appVersion without knowing the timestamp value (say I want the item with the latest timestamp, or the list of all items matching the 3 fields).
Without knowing the timestamp, I can't build the Sort Key needed to retrieve my item. But if I remove the timestamp from the Sort Key, I will not be able to store multiple items with the same (user, appVersion, feature).
What would be the proper way to handle this use case?
I am thinking about using a hash of (user, appVersion, feature) as the Partition Key, and the timestamp as the Sort Key. Would this be a correct solution?
Put the timestamp at the end of your SK, and then when you Query the data, use begins_with on the SK.
PK        SK
UserID    appVersion#feature#timestamp
This lets you query the data flexibly. For example, if you want all of a user's votes for a specific app version:
SELECT * FROM "Mytable" WHERE PK = 'x' AND begins_with(SK, '{VERSION ID}')
This is done using a Query command.
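To make the composite sort key concrete, here is a minimal in-memory sketch in Python. `sort_key` and `query` are hypothetical helpers that only imitate a Query with a begins_with condition; the user, versions, and timestamps are invented, and the timestamps are assumed to be fixed-width UTC strings so that string order matches time order.

```python
def sort_key(app_version, feature, timestamp):
    """Compose the SK as appVersion#feature#timestamp."""
    return f"{app_version}#{feature}#{timestamp}"


def query(items, pk, sk_prefix=""):
    """Stand-in for a Query: PK equals pk and SK begins with sk_prefix."""
    hits = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda i: i["SK"])  # results come back in SK order


reviews = [
    {"PK": "alice", "SK": sort_key("1.2.0", "search", "2023-01-05T10:00:00Z"), "rate": 3},
    {"PK": "alice", "SK": sort_key("1.2.0", "search", "2023-02-01T09:30:00Z"), "rate": 5},
    {"PK": "alice", "SK": sort_key("1.2.0", "export", "2023-01-20T14:00:00Z"), "rate": 4},
    {"PK": "alice", "SK": sort_key("1.3.0", "search", "2023-03-11T08:15:00Z"), "rate": 2},
]

all_v120 = query(reviews, "alice", "1.2.0#")                # every review for 1.2.0
search_v120 = query(reviews, "alice", "1.2.0#search#")      # one feature of 1.2.0
latest = search_v120[-1]                                    # newest review sorts last
print(len(all_v120), len(search_v120), latest["rate"])
```

Ending the prefix with the `#` separator keeps a prefix such as `1.2.0` from also matching a version like `1.2.01`. In a real Query the latest item could be fetched by reading in descending order with a limit of one instead of sorting on the client.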
The answer from Lee Hannigan will work; I like it.
However, keep in mind that accessing an item by PK is very fast because it's hash-based.
"I am thinking about using a hash of (user, appVersion, feature) as a Partition Key, and the timestamp as a Sort Key, would this be a correct solution?"
This might also work. The table would look like this:
PK                                                       SK
User#{User}#AppVersion#{appVersion}#Feature#{feature}    TimeStamp#{timestamp}
If you always know the user, the appVersion, and the feature, this will be more efficient, because the PK lookup is hash-based while the SK lookup is O(log N).
One way:
HASH string "modelName": "user"
RANGE string "id": "b0d5be50-4fae-11ed-981f-dbffcc56c88a"
The UUID itself can be used as a timestamp (a time-based UUID like this one embeds its creation time).
When searching, you can read the index in reverse order.
Another way:
HASH string "modelName": "user"
RANGE string "createdAt": "2019-10-12T07:20:50.52Z"
For createdAt, use the RFC 3339 time format.
When searching, you can read the index in reverse order.
Put down on paper what you need, and you'll find other ways to manage the HASH/RANGE index.
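Both range-key ideas can be checked with the Python standard library. The sketch below assumes the id above is a version 1 (time-based) UUID; `uuid1_to_datetime` is a hypothetical helper, and the extra createdAt values are invented for the sorting demonstration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals since this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_to_datetime(value):
    """Recover the creation time embedded in a version 1 UUID string."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


created = uuid1_to_datetime("b0d5be50-4fae-11ed-981f-dbffcc56c88a")
print(created.isoformat())

# RFC 3339 strings in UTC with a fixed layout sort chronologically as text.
stamps = ["2019-10-12T07:20:50.52Z", "2018-03-01T23:59:59.00Z", "2019-10-12T07:20:49.10Z"]
newest_first = sorted(stamps, reverse=True)
print(newest_first[0])
```

One caveat on the first approach: the textual form of a version 1 UUID starts with the low bits of the timestamp, so sorting the raw strings does not give chronological order. The createdAt string is the safer range key when ordering by time matters.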

Using a GUID as entity Id vs the entity's "actual" Id

In every Cosmos DB repository example I've seen, the id/row key has been generated like this: {partitionKey}:{Guid.NewGuid()}. I'm working on a web API where the user won't necessarily have any way of knowing what this random GUID is. But they will know the EmployeeId, ProjectId etc. of the respective object, so I'm wondering if there are any issues with using e.g. EmployeeId as both the partition key and the id?
There's nothing technically wrong with setting the id and the partition key to the same value. However, you will have just one document per partition, and that's bad design IMHO, because any read that spans more than one document (e.g. listing all employees) will be a cross-partition query.
One approach could be to set the partition key to the type of the entity (Employee, Project etc.) and the id to the unique identifier of the entity (employee id, project id etc.).
To be honest, if you know the partition key AND the item id, you can do a point read, which is the fastest option.
We also used to use random GUIDs for all item IDs, but this means you always need to know both this id and the partition key. Sometimes a more functional key makes more sense as the item ID, so give it some good thought!
And remember, an item ID is not globally unique; it is unique only within its partition key.
So you could have two items with the same item ID and different partition keys.
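The uniqueness rule can be illustrated with a toy model. This is plain Python, not the Cosmos DB SDK: `Container`, `create_item`, and `point_read` are hypothetical names for an in-memory stand-in that keys items by the (partition key, id) pair.

```python
class Container:
    """Toy container: an item is identified by (partition key, id)."""

    def __init__(self):
        self._items = {}

    def create_item(self, partition_key, item_id, body):
        key = (partition_key, item_id)
        if key in self._items:
            raise KeyError(f"id {item_id!r} already exists in partition {partition_key!r}")
        self._items[key] = body

    def point_read(self, partition_key, item_id):
        # Direct lookup by both values, no query involved.
        return self._items[(partition_key, item_id)]


container = Container()
container.create_item("Employee", "1001", {"name": "Ada"})
container.create_item("Project", "1001", {"title": "Migration"})  # same id, other partition

try:
    container.create_item("Employee", "1001", {"name": "Duplicate"})
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True

print(container.point_read("Employee", "1001"), duplicate_rejected)
```

With the entity type as partition key and the business identifier as id, a caller who knows the EmployeeId already has everything needed for the direct lookup.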

DynamoDB query by non-partition fields

Given a DynamoDB table that looks similar to:
sessionId: String
deviceType: String (mobile/tablet/computer/...)
networkType: String (wifi/ethernet/3g/4g/...)
There may be some other fields.
I need to be able to look up a session id given the other parameters. SQLish:
SELECT sessionId WHERE deviceType="Mobile"
SELECT sessionId WHERE networkType in (wifi, ethernet) AND deviceType="Tablet"
But from what I understand, querying in DynamoDB always requires the partition key (sessionId).
Is there an alternative layout to this table that will allow for better querying? We're still in setup phase, so it can be changed.
To be efficient and cost effective, I suggest you create 2 Global Secondary Indexes (GSIs): one with "deviceType" as its PK and one with "networkType" as its PK. I don't have enough information to suggest anything for the SK. There is no need to project all attributes, because you only want to retrieve sessionId, which is projected by default because it is the table's PK.
To sum up the data model:
         PK            Attributes
Table:   sessionId     deviceType, networkType, ...
GSI_1:   deviceType    sessionId, networkType, ...
GSI_2:   networkType   sessionId, deviceType, ...
For example, when querying GSI_1, you'll use PK="Mobile" to retrieve all related sessionIds.
Doing it this way is really fast and cost effective, unlike a Scan.
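The two lookups from the question can be sketched against an in-memory model of GSI_1. This is plain Python with invented session data; `build_index` is a hypothetical stand-in for the index DynamoDB would maintain, and filtering the second query on networkType after the key lookup is an assumption about how one might handle the extra condition.

```python
def build_index(items, pk_attr):
    """Group items by the attribute that acts as the index partition key."""
    index = {}
    for item in items:
        index.setdefault(item[pk_attr], []).append(item)
    return index


sessions = [
    {"sessionId": "s1", "deviceType": "Mobile", "networkType": "4g"},
    {"sessionId": "s2", "deviceType": "Tablet", "networkType": "wifi"},
    {"sessionId": "s3", "deviceType": "Tablet", "networkType": "3g"},
    {"sessionId": "s4", "deviceType": "Mobile", "networkType": "wifi"},
    {"sessionId": "s5", "deviceType": "Tablet", "networkType": "ethernet"},
]

gsi_1 = build_index(sessions, "deviceType")

# SELECT sessionId WHERE deviceType = "Mobile"
mobile_ids = [s["sessionId"] for s in gsi_1.get("Mobile", [])]

# SELECT sessionId WHERE networkType IN (wifi, ethernet) AND deviceType = "Tablet"
tablet_ids = [
    s["sessionId"]
    for s in gsi_1.get("Tablet", [])
    if s["networkType"] in {"wifi", "ethernet"}  # filter applied after the key lookup
]
print(mobile_ids, tablet_ids)
```

The key lookup narrows the read to one device type, and only the remaining condition is checked item by item.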

How to support GUID in Windows Azure Mobile services

It is specifically mentioned that WAMS needs an int ID column to work in SQL Azure. However, when developing enterprise apps over distributed databases, GUIDs are the preferred primary key. How does one avoid the int ID column and support GUIDs?
If that cannot be done, then how does one go about syncing data to the cloud from multiple standalone databases on the various tablets/mobiles that the app using WAMS is running on?
An update on this issue: as of last week, mobile services support arbitrary strings as the ids for the column - check out this post for more information. You can now insert data with an 'id' value (which you couldn't before), so you can use a Guid value on insert. Also, if you don't send anything for the id column on insert (or the value is set to null), the server will by default generate a unique identifier for the column.
At present, I don't think that it's possible to use a GUID in the ID column. The documentation for the Mobile Services server-side scripts specifies that for the Delete function, the ID must be a JavaScript Number type. As far as I can see, all of the available sample code, and the code that you can download from the portal, is quite explicit in using an integer type for the ID.
You'll have to come up with a way of generating a unique integer value whenever a new record is created. The example here uses a tick count in the Insert script, which is probably OK for a low-volume application, but it would need to be made more robust, perhaps by generating a number based on the user's identity and combining it with the tick count.
I'm a little late to this, but I have found you can use a Guid as the primary key of a mobile services table. A couple of points though: set the JSON property to lower case "id" and use a nullable Guid, which allows inserting when there is a default on the id column (NewId()).
[JsonProperty(PropertyName = "id")]
public Guid? Id { get; set; }
Ash..
