Getting recent N items from DynamoDB - amazon-dynamodb

I just started figuring out DynamoDB.
I have a simple table has date attribute(ex. 20160101) as HASH and created_at attribute(ex. 20160101185332) as RANGE.
I'd like to get latest N items from the table.
First, SCAN command does not have ScanIndexForward option. I think it's not possible with SCAN.
Next, QUERY command. It seems to be work if I repeat QUERY command several times to get enough number of items(cuz, I don't know how many items have same key value). - for example, I can query using today first and repeat for the day before if the result does not give enough items.
How can I do the job more efficiently? Or, can I query without KEY value?

as you described your table, you cant do it more efficiently, and you cant query dynamodb without KEY(hash) value
look at the answer here:
dynamodb get earliest inserted distinct values from a table

Related

DynamoDB Best practice to select all items from a table with pagination (Without PK)

I simply want to get a list of products back from my table and paginated, the pagination part is relatively clear with last_evaluated_key, however all the examples are using on PK or SK, but in my case I just want to get paginated results sort by createdAt.
My product id (uniq uuid) is not very useful in this case. Is the last solution to scan the whole table?
Yes, you will use Scan. DynamoDB has two types of read operation, Query and Scan. You can Query for one-and-only-one Partition Key (and optionally a range of Sort Key values if your table has a compound primary key). Everything else is a Scan.
Scan operations read every item, max 1 MB, optionally filtered. Filters are applied after the read. Results are unsorted.
The SDKs have pagination helpers like paginateScan to make life easier.
Re: Cost. Ask yourself: "is Scan returning lots of data MB I don't actually need?" If the answer is "No", you are fine. The more you are overfetching, however, the greater the cost benefit of Query over Scan.

PeopleSoft Query Manager - 'count' function

I'm using the current version of PeopleSoft and I'm using their Query manager. I've built a query that looks at the job table and a customized version of the job table (so I can see future hires). In order to do this I've created a union. Everything works fine, except now I want to do a count of the job codes.
When I put in a count, I get an error. I don't know how to get it to work properly. I also don't really know how to using the 'having' tab.
I've attached some screenshots, including the SQL code.
SQL:
Having tab
You have a criteria in your query:
AND COUNT(*) = A.JOBCODE
Your job codes are string values that uniquely identify a job. It will never be equal to a count.
If you remove that criteria, your query will work:
The bigger issue is, what do you want to count? If your query was simply:
SELECT DEPTID, JOBCODE, COUNT(*)
This will give the count of employees in this department and job code. In your description, you said that you wanted the count of job codes. But each row has JOBCODE on it. The count of job codes on the row is one. What do you really want? The count of job codes in the database? The count of job codes in the result set?
If you want to get anything other than the count of rows within the group, you are not able to put that logic in PeopleSoft Query. You will need to create a view in AppDesigner and then you can add that to the query.

dynamodb query to select all items that match a set of values

In a dynamo table I would like to query by selecting all items where an attributes value matches one of a set of values. For example my table has a current_status attribute so I would like all items that either have a 'NEW' or 'ASSIGNED' value.
If I apply a GSI to the current_status attribute it looks like I have to do this in two queries? Or instead do a scan?
DynamoDB does not recommend using scan. Use it only when there is no other option and you have fairly small amount of data.
You need use GSIs here. Putting current_status in PK of GSI would result in hot
partition issue.
The right solution is to put random number in PK of GSI, ranging from 0..N, where N is number of partitions. And put the status in SK of GSI, along with timestamp or some unique information to keep PK-SK pair unique. So when you want to query based on current_status, execute N queries in parallel with PK ranging from 0..N and SK begins_with current_status. N should be decided based on amount of data you have. If the data on each row is less than 4kb, then this parallel query operation would consume N read units without hot partition issue. Below link provides the details information on this
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html

DynamoDBScanExpression withLimit returns more records than Limit

Have to list all the records from a DynamoDB table, without any filter expression.
I want to limit the number of records hence using DynamoDBScanExpression with setLimit.
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
....
// Set ExclusiveStartKey
....
scanExpression.setLimit(10);
However, the scan operation returns more than 10 results always !!!!
Is this the expected behaviour and if so how?
Python Answer
It is not possible to set a limit for scan() operations, however, it is possible to do so with a query.
A query searches through items, the rows in the database. It starts at the top or bottom of the list and finds items based on set criteria. You must have a partion and a sort key to do this.
A scan on the other hand searches through the ENTIRE database and not by items, and, as a result, is NOT ordered.
Since queries are based on items and scan is based on the ENTIRE database, only queries can support limits.
To answer OP's question, essentially it doesn't work because you're using scan not query.
Here is an example of how to use it using CLIENT syntax. (More advanced syntax version. Sorry I don't have a simpler example that uses resource. you can google that.)
def retrieve_latest_item(self):
result = self.dynamodb_client.query(
TableName="cleaning_company_employees",
KeyConditionExpression= "works_night_shift = :value",
ExpressionAttributeValues={':value': {"BOOL":"True"}},
ScanIndexForward = False,
Limit = 3
)
return result
Here is the DynamoDB module docs

Teradata - joining query for single partition

I need to run a query which joins 5 large table on user_id and filter it on proc_date.
I have planed to do partition on proc_date and partition(5 range partition) on user_id to increase query performance. I keep primary index as well on proc_date and user_id.
"But how can I run the query for just one partition of the user_id at a time? I want to restrict the query to join first partition(on User_id) of every table"
Reason behind this is, once I complete the query for first partition, I can send the output data for next process. While next process is running i can run the query for 2nd partition.
Could anyone please give me some solution to achieve this.

Resources