I want to run a query like "find me the item with the smallest 'id' that is larger than some number".
Is this possible in DynamoDB?
And how would I do it?
Thanks in advance.
As you probably know, a DynamoDB table can have two types of primary key: a hash key alone, or a composite hash + range key.
When you run a Query, you must specify the hash key value of the items you are looking for. If your table has a hash + range key, the results come back automatically sorted by the range attribute (ascending by default). Your Query request can also add a key condition on the range attribute, for example requiring it to be larger than some number. Combined with a limit of 1, the first result is then the item with the smallest range value above that number. Note that this ordering only applies within a single hash key value, not across the whole table. So, yes, what you are looking for is possible, assuming you design your table appropriately (with 'id' as the range key).
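A minimal sketch with boto3's resource-level `Table.query`. It assumes a hash + range table whose partition key and numeric sort key are passed in by name (the names used in the usage line are hypothetical). Because results come back in ascending sort-key order, `Limit=1` yields the smallest sort key above the threshold within that one hash key. The request is built separately so it can be inspected without AWS credentials.

```python
# Sketch: "smallest id greater than n" as a DynamoDB Query on a hash+range table.
# Assumption: the sort key holds the 'id' values; attribute names are supplied by the caller.
from typing import Any, Dict, Optional


def smallest_above_kwargs(pk_name: str, pk_value: Any,
                          sk_name: str, threshold: Any) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Table.query (resource API)."""
    return {
        # Placeholders (#pk/#sk) avoid clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#pk = :pk AND #sk > :n",
        "ExpressionAttributeNames": {"#pk": pk_name, "#sk": sk_name},
        "ExpressionAttributeValues": {":pk": pk_value, ":n": threshold},
        "ScanIndexForward": True,  # ascending by sort key (the default)
        "Limit": 1,                # stop after the first, i.e. smallest, match
    }


def find_smallest_above(table, pk_name: str, pk_value: Any,
                        sk_name: str, threshold: Any) -> Optional[dict]:
    """Run the query on a boto3 Table (or any object with a compatible .query)."""
    response = table.query(**smallest_above_kwargs(pk_name, pk_value, sk_name, threshold))
    items = response.get("Items", [])
    return items[0] if items else None
```

Usage would look like `find_smallest_above(table, "pk", "games", "id", 10)`. Keep the comparison in the key condition rather than a FilterExpression: `Limit` counts items before filtering, so a filter combined with `Limit=1` can return nothing even when a match exists.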
For more info, check out the following links:
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
Related
I have a table with a composite key: there is both a partition key and a sort key. I know that the Java SDK lets me query by just the partition key. However, if I do this, the docs say I get back an ItemCollection<QueryOutcome>. This means that to work with this data, I have to iterate over the entire collection.
It would be easier if I could get back a Map<T, V> where the key is the sort key. That way, I could quickly find rows for a particular sort key. Is this possible? I would rather not iterate over the collection just to find items with a certain sort key value.
If you just want the item with a certain sort key (within a known partition key), that's a GetItem. Don't do a Query.
You may be confused by DynamoDB's use of the word Query. It's not the only way to query the database; it's one way to query that happens to be named Query.
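The question uses the Java SDK; to keep one language across these examples, here is a minimal Python/boto3 sketch of the same idea. `get_by_sort_key` is a GetItem with the complete primary key (partition key plus sort key). `index_by_sort_key` shows the Map-keyed-by-sort-key the question asks for, built once from Query results. The `table` object is assumed to be a boto3 resource `Table`; the attribute names in the usage are hypothetical.

```python
# Sketch: fetch one item by its full primary key (GetItem), and, for comparison,
# turn a partition's Query results into a dict keyed by sort key.
from typing import Any, Dict, Iterable, Optional


def get_by_sort_key(table, pk_name: str, pk_value: Any,
                    sk_name: str, sk_value: Any) -> Optional[dict]:
    """GetItem needs the complete key: partition key AND sort key."""
    response = table.get_item(Key={pk_name: pk_value, sk_name: sk_value})
    # "Item" is absent from the response when no such item exists.
    return response.get("Item")


def index_by_sort_key(items: Iterable[dict], sk_name: str) -> Dict[Any, dict]:
    """Build a {sort key: item} map from Query results (one pass over the items)."""
    return {item[sk_name]: item for item in items}
```

If you only ever need one sort key at a time, the GetItem call reads just that item; the dict is only worthwhile when you already have the whole partition and will look up many sort keys in it.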
We currently have a table that has both a partition key and sort key that make up the primary key.
They're both strings.
Example:
p_id: A#2021-04-21 (region code + date)
s_id: XYZ#2#1634925978 (id, code, timestamp)
One of our use cases is to get all items for a given partition (region code + date), but ONLY the latest item for each id and code.
So for example if we had:
A#2021-04-21 , XYZ#2.0#10000 , <other attributes>
A#2021-04-21 , XYZ#2.0#20000 , ...
A#2021-04-21 , QRS#2.0#10000 , ...
We'd only want to get
A#2021-04-21 , XYZ#2.0#20000 , ...
A#2021-04-21 , QRS#2.0#10000 , ...
To do this currently, I'm just doing:
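from boto3.dynamodb.conditions import Key
# Read every item in the partition "<region_id>#<date_key>"
response = self.table.query(
    KeyConditionExpression=Key(self.table_key_name).eq(f"{region_id}#{date_key}")
)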
Then I pull out the items and manually build a map keyed by the sort-key prefix (everything before the epoch-milliseconds timestamp). For each key, I set the value only if its timestamp is newer than the one already stored.
Is there a way to do this faster and make more use of the query itself? I've considered storing the pieces of the ID as separate attributes and maybe using some kind of filtering, but I don't see anything that would let me do the equivalent of a "GROUP BY" like I want here. Do I have no choice but to create some kind of index?
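For reference, here is a tidied sketch of the client-side approach just described, assuming sort keys shaped like `<id>#<code>#<timestamp>` (as in `XYZ#2.0#20000`) with an integer last segment. It also follows `LastEvaluatedKey`, which the snippet in the question omits: a single Query call returns at most 1 MB of data, so a large partition arrives in several pages.

```python
# Sketch: client-side "group by prefix, keep the latest" over one partition.
# Assumption: the last "#"-separated piece of the sort key is an integer timestamp.
from typing import Dict, Iterable, List


def query_all(table, **kwargs) -> List[dict]:
    """Repeat the Query, following LastEvaluatedKey until every page is read."""
    items: List[dict] = []
    while True:
        response = table.query(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key  # kwargs is a local copy


def latest_per_prefix(items: Iterable[dict], sk_name: str = "s_id") -> Dict[str, dict]:
    """Keep only the newest item for each "<id>#<code>" prefix."""
    latest: Dict[str, dict] = {}
    newest_ts: Dict[str, int] = {}
    for item in items:
        prefix, ts_text = item[sk_name].rsplit("#", 1)
        ts = int(ts_text)
        if prefix not in latest or ts > newest_ts[prefix]:
            latest[prefix] = item
            newest_ts[prefix] = ts
    return latest
```

On the three example rows above this keeps `XYZ#2.0#20000` and `QRS#2.0#10000`. It is still a full read of the partition; it only makes the bookkeeping explicit.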
Any ideas? Help would be much appreciated!
DynamoDB doesn't support aggregations (MIN/MAX/COUNT/SUM, GROUP BY) the way an RDBMS does.
One solution is to use a "trigger" (DynamoDB Streams + Lambda) to aggregate the needed data for you. See "Using Global Secondary Indexes for Materialized Aggregation Queries" in the DynamoDB Developer Guide.
You might also want to consider looking at various ways to implement versioning of your DDB data.
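The Streams + Lambda approach mentioned above can be sketched as follows. Every name here is an assumption for illustration: a second table `LatestItems` (hypothetical) with partition key `p_id` and sort key `prefix`, a stream view type that includes `NEW_IMAGE`, and source sort keys ending in `#<timestamp>`. Each insert or update becomes a conditional PutItem that only overwrites the aggregate when the incoming timestamp is newer, so querying `LatestItems` by `p_id` returns one item per id/code.

```python
# Sketch of a DynamoDB Streams -> Lambda "materialized latest" aggregation.
# Assumptions (hypothetical): source items use p_id / s_id as in the question,
# s_id ends in "#<timestamp>", and the stream view type includes NEW_IMAGE.
from typing import Any, Dict, List

AGGREGATE_TABLE = "LatestItems"  # hypothetical table: PK p_id, SK prefix


def build_latest_puts(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn stream records into conditional PutItem requests (low-level client format)."""
    puts = []
    for record in records:
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE events are ignored in this sketch
        image = record["dynamodb"]["NewImage"]  # typed map, e.g. {"S": "..."}
        p_id = image["p_id"]["S"]
        s_id = image["s_id"]["S"]
        prefix, ts = s_id.rsplit("#", 1)
        puts.append({
            "TableName": AGGREGATE_TABLE,
            "Item": {
                "p_id": {"S": p_id},
                "prefix": {"S": prefix},
                "latest_ts": {"N": ts},
                "source_s_id": {"S": s_id},
            },
            # Only overwrite when this event carries a newer timestamp.
            "ConditionExpression": "attribute_not_exists(latest_ts) OR latest_ts < :ts",
            "ExpressionAttributeValues": {":ts": {"N": ts}},
        })
    return puts


def handler(event, context):
    """Lambda entry point; boto3 is imported here so build_latest_puts stays testable offline."""
    import boto3

    client = boto3.client("dynamodb")
    for request in build_latest_puts(event.get("Records", [])):
        try:
            client.put_item(**request)
        except client.exceptions.ConditionalCheckFailedException:
            pass  # an equal-or-newer version is already stored
```

The conditional write makes out-of-order or repeated stream deliveries harmless: an older timestamp simply fails the condition.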
If you want to get the latest item, your sort key should end in an ISO 8601 timestamp that is set when the item is added. ISO 8601 timestamps sort alphabetically in chronological order (as long as they all use the same time zone, e.g. UTC, and the same precision), and sort keys are automatically sorted, so by default a Query returns the items oldest first. If you tell it to return the results in the opposite direction (ScanIndexForward set to false), the first item returned is automatically the latest one; add a limit of 1 and that is all you get.
You will need something like SK: SOME_QUALIFIER#YYYY-mm-ddTHH:MM:SSZ (always in UTC), and then query with the key condition SK begins_with "SOME_QUALIFIER#". You will have to think about how you want to organize this, but it is entirely possible, taking advantage of the fact that the sort key is automatically sorted.
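A minimal sketch of that pattern, assuming hypothetical attribute names `pk`/`sk` and a qualifier such as `ORDER`. `make_sort_key` normalizes to UTC with second precision so string order matches time order, and `latest_kwargs` builds a boto3 `Table.query` request with `begins_with` on the qualifier, descending order, and `Limit=1`.

```python
# Sketch: ISO 8601 timestamps in the sort key so "latest" is a Query with Limit=1.
# Assumptions: hypothetical attribute names "pk"/"sk"; all timestamps normalized
# to UTC with the same precision, otherwise string order and time order can disagree.
from datetime import datetime, timezone
from typing import Any, Dict


def make_sort_key(qualifier: str, when: datetime) -> str:
    """E.g. ORDER#2021-04-21T09:30:00Z (when must be timezone-aware)."""
    utc = when.astimezone(timezone.utc)
    return f"{qualifier}#{utc.strftime('%Y-%m-%dT%H:%M:%SZ')}"


def latest_kwargs(pk_value: Any, qualifier: str, newest_first: bool = True) -> Dict[str, Any]:
    """Table.query kwargs returning the newest (or, with newest_first=False, oldest) item."""
    return {
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "pk", "#sk": "sk"},
        "ExpressionAttributeValues": {":pk": pk_value, ":prefix": f"{qualifier}#"},
        "ScanIndexForward": not newest_first,  # False = descending sort-key order
        "Limit": 1,
    }
```

The same request with `newest_first=False` gives the MIN instead of the MAX, which is the "trick" the next answer refers to.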
Alternatively, if you are only going to do this once in a while (e.g. for a generated report), it's OK to put your last-updated date (or last-created date, whichever matters more) in its own attribute (and with composite keys you often should anyway) and then create an index with that attribute as its sort key and something else (a report type, for example) as its partition key. Then you can query that partition key and get the latest item.
MIN/MAX and many other sql style calls can be tricked by making clever use of the sort key.
I have a games table.
To keep it simple, I'll include only the two fields relevant to the question:
gameId:
deadlineToPlay:
I want to query for all games with deadlineToPlay later than today.
How would I set up the index for this? I thought I could create an index on just deadlineToPlay, but if I understand correctly, a query on the hash key has to use an exact value; I can't use >.
I would also prefer not to use a Scan, because of the cost.
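One workaround is to create (or reuse) an attribute that has a constant value (for example, an attribute hasDeadline with the value true). Key attributes must be of type string, number, or binary, so store it as the string "true" (or the number 1) rather than as a boolean.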
Now you can create the table key like this: hasDeadline as HASH key and deadlineToPlay as SORT key (if the table is already created, you can define this key in a new GSI).
This way you will be able to query by hasDeadline = true and deadlineToPlay > today.
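A minimal sketch of that index and query with boto3, assuming a table named `games`, a hypothetical index name `deadline-index`, `hasDeadline` stored as the string `"true"`, and `deadlineToPlay` stored as an ISO 8601 date string so that `>` compares chronologically. `create_index_kwargs` is shaped for the low-level client's `update_table` on an on-demand table; a provisioned table would also need `ProvisionedThroughput` inside `"Create"`.

```python
# Sketch: a GSI keyed on a constant partition value plus the deadline, then a range query.
# Assumptions (hypothetical): table "games", index "deadline-index", hasDeadline stored
# as the STRING "true", deadlineToPlay stored as an ISO 8601 date string.
from datetime import date
from typing import Any, Dict

INDEX_NAME = "deadline-index"


def create_index_kwargs() -> Dict[str, Any]:
    """kwargs for client.update_table (on-demand billing assumed)."""
    return {
        "TableName": "games",
        "AttributeDefinitions": [
            {"AttributeName": "hasDeadline", "AttributeType": "S"},
            {"AttributeName": "deadlineToPlay", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": INDEX_NAME,
                "KeySchema": [
                    {"AttributeName": "hasDeadline", "KeyType": "HASH"},
                    {"AttributeName": "deadlineToPlay", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        }],
    }


def upcoming_games_kwargs(today: date) -> Dict[str, Any]:
    """Table.query kwargs for games whose deadline is after `today`."""
    return {
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": "hasDeadline = :t AND deadlineToPlay > :today",
        "ExpressionAttributeValues": {":t": "true", ":today": today.isoformat()},
    }
```

Items without a `hasDeadline` attribute are simply left out of the index, which keeps it small. The trade-off of a single constant partition value is that all reads and writes for this index land on one partition key, which can throttle at high volume.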
I've got a list of partition keys from one table.
userId["123","456","235"]
I need to get an attribute that they all share. like "username".
What would be the best practice to get them all at once?
Is Scan my only option, given that I know all my partition keys?
Do I know the sort key? Yes, but only the beginning of it, so I don't think I can use BatchGetItem.
Scan is only appropriate if you don't know the partition keys. Because you know the partition keys you want to search, you can achieve the desired behavior with multiple Query operations.
A Query searches all documents with the specified partition key; you can only query one partition key per request, so you'll need multiple queries, but this will still be significantly more efficient than a single Scan operation.
If you're only looking for documents with a sort key that begins with something, you can include it in your KeyConditionExpression along with the partition key.
For example, if you wanted to only return documents whose sort key begins with a certain string, you could pass something like userId = :user_id AND begins_with(#SortKey, :str) as the key condition expression.
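A minimal sketch of the multiple-Query approach with boto3's resource `Table.query`, assuming a partition key `userId`, a sort key named `sk` and the wanted attribute `username` (the sort key name is hypothetical). Each known user ID gets its own Query, narrowed with `begins_with` on the sort key, projecting only the needed attributes and following pagination within each partition.

```python
# Sketch: one Query per known partition key, narrowed with begins_with on the sort key.
# Assumptions (hypothetical): partition key "userId", sort key "sk", attribute "username".
from typing import Any, Dict, List


def query_kwargs(user_id: str, sk_prefix: str) -> Dict[str, Any]:
    return {
        "KeyConditionExpression": "#uid = :uid AND begins_with(#sk, :prefix)",
        "ProjectionExpression": "#uid, #name",  # return only what is needed
        "ExpressionAttributeNames": {"#uid": "userId", "#sk": "sk", "#name": "username"},
        "ExpressionAttributeValues": {":uid": user_id, ":prefix": sk_prefix},
    }


def usernames_for(table, user_ids: List[str], sk_prefix: str) -> Dict[str, List[str]]:
    """Map each userId to the username values found under the sort-key prefix."""
    result: Dict[str, List[str]] = {}
    for user_id in user_ids:
        kwargs = query_kwargs(user_id, sk_prefix)
        names: List[str] = []
        while True:  # follow pagination within each partition
            response = table.query(**kwargs)
            names.extend(i["username"] for i in response.get("Items", []) if "username" in i)
            if "LastEvaluatedKey" not in response:
                break
            kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
        result[user_id] = names
    return result
```

The queries are independent, so if latency matters they could also be issued concurrently (for example from a thread pool); the sketch runs them one after another for clarity.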
You can achieve this efficiently with a PartiQL SELECT statement. It lets you query a list of partition keys with the IN operator and apply additional conditions on other attributes without causing a full table scan.
From the AWS documentation: "To ensure that a SELECT statement does not result in a full table scan, the WHERE clause condition must specify a partition key. Use the equality or IN operator."
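A minimal sketch of such a request for the low-level client's `execute_statement`, assuming a table `Users`, partition key `userId`, a sort key `sk` and an attribute `username` (table and sort key names are hypothetical). It also assumes that `?` parameter placeholders are accepted inside the IN list, and uses the typed AttributeValue format (`{"S": ...}`) for parameters. Identifiers are double-quoted, which is how PartiQL distinguishes them from string literals.

```python
# Sketch: PartiQL SELECT with IN over known partition keys (low-level client).
# Assumptions (hypothetical): table "Users", partition key userId, sort key sk.
from typing import Any, Dict, List


def partiql_request(user_ids: List[str], sk_prefix: str) -> Dict[str, Any]:
    if not user_ids:
        raise ValueError("IN needs at least one partition key value")
    placeholders = ", ".join("?" for _ in user_ids)
    statement = (
        'SELECT "userId", "username" FROM "Users" '
        f'WHERE "userId" IN [{placeholders}] AND begins_with("sk", ?)'
    )
    parameters = [{"S": uid} for uid in user_ids] + [{"S": sk_prefix}]
    return {"Statement": statement, "Parameters": parameters}


def run(user_ids: List[str], sk_prefix: str) -> List[dict]:
    """Execute and follow NextToken; boto3 is imported lazily so partiql_request is testable offline."""
    import boto3

    client = boto3.client("dynamodb")
    request = partiql_request(user_ids, sk_prefix)
    items: List[dict] = []
    while True:
        response = client.execute_statement(**request)
        items.extend(response.get("Items", []))
        token = response.get("NextToken")
        if not token:
            return items
        request["NextToken"] = token
```

The returned items are in the typed format too (for example `{"username": {"S": "ann"}}`), unlike the resource-level Query results.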
I have a table which has two varchar(MAX) columns:
Column 1      Column 2
-----------------------
URLRewitten   OriginalURL
It's part of my URL rewriting for an ASP.NET Web Forms site.
When a URL comes in, I check whether it's in the table; if it is, I use the OriginalURL.
My question is: if all I'm doing is querying the table for URLs, and no other table in the database will ever link to this table, does it need a dedicated primary key field, like an auto-number? Will that make queries faster?
And also, how can I make the queries run faster?
Edit: I do have a unique constraint on URLRewitten.
Edit: the ways I'm using this table:
When a new request comes in, I search on URLRewitten to find the OriginalURL.
When I need to display a link on the site, I query on OriginalURL to find the URLRewitten URL I should use.
When adding a new URL to the table, I make sure that it doesn't already exist.
That's all the queries I do at the moment.
Both columns together would be unique.
Do you need a primary key? Yes. Always. However, it looks like in your case OriginalURL could be your primary key (I'm assuming that there wouldn't be more than one value for URLRewritten for a given value in OriginalURL).
This is what's known as a "natural key" (where a component of the data itself is, by its nature, unique). These can be convenient, though I have found that they're generally more trouble than they're worth under most circumstances, so yes, I would recommend some sort of opaque key (meaning a key that has no relation to the data in the row, other than to identify a single row). Whether or not you want an autonumber is up to you. It's certainly convenient, though identity columns come with their own set of advantages and disadvantages.
For now I suppose I would advise creating two things:
A primary key on your table of an identity column
A unique constraint on OriginalURL to enforce data integrity.
I'd put one in there anyway; it'll make updating a lot easier, as well as duplicating an existing rule.
For example, this is easier:
UPDATE Rules SET OriginalURL = 'http://www.domain.com' WHERE ID = 1
--OR
INSERT INTO Rules (OriginalUrl, NewUrl) SELECT OriginalUrl, NewUrl FROM Rules WHERE ID = 1
than this:
UPDATE Rules SET OriginalURL = 'http://www.domain.com' WHERE OriginalURL = 'http://old.domain.com'
--OR
INSERT INTO Rules (OriginalUrl, NewUrl) SELECT OriginalUrl, NewUrl FROM Rules WHERE OriginalURL = 'http://old.domain.com'
In terms of performance, if you're going to be searching by OriginalURL, you should add an index to that column.
I would use OriginalURL as your primary key, as I assume it is unique. Assuming you are using SQL Server, you could create an index on URLRewitten with OriginalURL as an "included column" to speed up the query.
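The indexing advice in these answers can be checked quickly with Python's built-in `sqlite3`, used here purely as a stand-in for SQL Server (SQLite has no `INCLUDE` clause, so this sketch uses plain indexes). It assumes a table named `Rules` with a surrogate `ID` and the question's two columns: a unique index on `URLRewitten` for incoming requests and an index on `OriginalURL` for link generation. `EXPLAIN QUERY PLAN` then shows which index each lookup uses.

```python
# Sketch: indexing the URL rewrite table, with sqlite3 standing in for SQL Server.
# Assumption: table name "Rules" is hypothetical; column names follow the question.
import sqlite3
from typing import List, Optional


def build_rules_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Rules (
            ID          INTEGER PRIMARY KEY,  -- surrogate key (IDENTITY in SQL Server)
            URLRewitten TEXT NOT NULL,
            OriginalURL TEXT NOT NULL
        );
        -- Incoming requests look up by URLRewitten; the unique index enforces
        -- the constraint from the question and serves that lookup.
        CREATE UNIQUE INDEX ux_rules_rewritten ON Rules (URLRewitten);
        -- Link generation looks up by OriginalURL.
        CREATE INDEX ix_rules_original ON Rules (OriginalURL);
    """)
    return conn


def original_for(conn: sqlite3.Connection, rewritten: str) -> Optional[str]:
    row = conn.execute(
        "SELECT OriginalURL FROM Rules WHERE URLRewitten = ?", (rewritten,)
    ).fetchone()
    return row[0] if row else None


def plan(conn: sqlite3.Connection, sql: str, params: tuple) -> List[str]:
    """EXPLAIN QUERY PLAN detail strings; they name the index SQLite chose."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```

In SQL Server the equivalent would be a nonclustered index on one column with the other listed in `INCLUDE`, so the lookup never has to touch the base table.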
An identity column can help when you search for recent entries:
SELECT TOP 100 * FROM MyTable ORDER BY IdColumn DESC
We'd have to know what kind of queries you are running, before we can search for a way to make them faster.
As you are doing your query on the URLRewritten column I don't think adding an auto-generated primary key would help you.
Have you got an index on your URLRewritten column? If not, create one: that should give a big increase in the speed of your queries (or perhaps just make URLRewritten your primary key?).
Yes, there should be a primary key, because the primary key is automatically indexed (in SQL Server, as the clustered index by default), which gives fast access.
I don't think adding an auto-generated primary key will make your query faster.
However, there are a few things to consider:
- I would not be so sure that nothing will ever link to this table :(.
- I've seen a lot of people asking how to, e.g., remove duplicates from a table like that; with a primary key it is much easier.
To make this query faster, we need to know more about this table and the ways it is used...
In my opinion, every table must have an auto-generated primary key (e.g. IDENTITY in MSSQL).
I don't believe in unique natural keys.