I don't get the concept of limits for Query/Scan in DynamoDB.
According to the docs:
A single Query operation can retrieve a maximum of 1 MB of data. This
limit applies before any FilterExpression is applied to the results.
Let's say I have 10k items, 250 KB per item, and all of them match the query parameters.
If I run a simple query, I get only 4 items?
If I use ProjectionExpression to retrieve only a single attribute (1 KB
in size), will I get 1k items?
If I only need to count items (select: 'COUNT'), will it count all
items (10k)?
If I run a simple query, I get only 4 items?
Yes
If I use ProjectionExpression to retrieve only a single attribute (1 KB in size), will I get 1k items?
No. FilterExpressions and ProjectionExpressions are applied after the query has completed, so you still get 4 items.
If I only need to count items (select: 'COUNT'), will it count all items (10k)?
No, still just 4
The thing that you are probably missing here is that you can still get all 10k results, or the 10k count; you just need to get the results in pages. Some details here. Basically, when your query completes, check the LastEvaluatedKey attribute, and if it's not empty, run the query again with it to get the next set of results. Repeat this until the attribute is empty, and you know you have all the results.
EDIT: I should say that some of the SDKs abstract this away for you. For example, the Java SDK has query and queryPage, where query will go back to the server multiple times to get the full result set for you (i.e. in your case, give you the full 10k results).
For any operation that returns items, you can request a subset of attributes to retrieve; however, doing so has no impact on the item size calculations. In addition, Query and Scan can return item counts instead of attribute values. Getting the count of items uses the same quantity of read capacity units and is subject to the same item size calculations. This is because DynamoDB has to read each item in order to increment the count.
Managing Throughput Settings on Provisioned Tables
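In practice this also means a COUNT query is paginated like any other; each page returns a partial Count that you add up yourself. A minimal sketch (assuming the DocumentClient, with the table and key condition already supplied in params):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Sum the partial Count of each page until LastEvaluatedKey is empty.
async function countAll(params) {
  let total = 0;
  let ExclusiveStartKey;
  do {
    const page = await docClient
      .query({ ...params, Select: 'COUNT', ExclusiveStartKey })
      .promise();
    total += page.Count;
    ExclusiveStartKey = page.LastEvaluatedKey;
  } while (ExclusiveStartKey);
  return total;
}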
Great explanation by @f-so-k.
This is how I am handling the query.
import AWS from 'aws-sdk';

// Low-level client, since the values below use the { S: ... } format.
const dynamodb = new AWS.DynamoDB();

async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    // Stop once there are no more pages. Note this version only returns
    // the last page; see the edit below for accumulating all items.
    if (!result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return result;
}
const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un = :n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // Omit ProjectionExpression to return all attributes.
};
const result = await loopQuery(params);
Edit:
import AWS from 'aws-sdk';

// Low-level client, since the values below use the { S: ... } format.
const dynamodb = new AWS.DynamoDB();

async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  let list = [];
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    // Accumulate this page's items before checking for more pages.
    list = [...list, ...result.Items];
    if (!result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return list;
}
const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un = :n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // Omit ProjectionExpression to return all attributes.
};
const result = await loopQuery(params);
I have two vertices named user and device, and an edge named ownership.
My business logic is: when I receive device information, I upsert it with dateCreated and dateUpdated fields added. If that device was inserted, I also insert a new user with default values and create an edge connecting them. If it was an update, I simply return the already connected user as the result.
How can I achieve this without losing atomicity?
I tried a single AQL query, but it seems this is not possible without conditionals, and traversals are not supported alongside insert/update operations.
I can do separate queries, but that loses atomicity.
var finalQuery = aql`
  UPSERT ${deviceQuery}
  INSERT MERGE(${deviceQuery}, { dateCreated: DATE_NOW() })
  UPDATE MERGE(${deviceQuery}, { dateUpdated: DATE_NOW() })
  IN ${this.DeviceModel}
  RETURN { doc: NEW, type: OLD ? 'update' : 'insert' }`;
var cursor = await db.query(finalQuery);
var result = await cursor.next();
if (result.type == 'insert') {
  console.log('Inserted documents');
  finalQuery = aql`
    LET user = (INSERT {
      "_key": UUID(),
      "name": "User"
    } INTO user
    RETURN NEW)
    INSERT {
      _from: ${result.doc._id},
      _to: user[0]._id,
      "type": "belongs"
    } INTO ownership
    RETURN user[0]`;
  cursor = await db.query(finalQuery);
  result = await cursor.next();
  console.log('New user:', result);
}
You can try something like this:
UPSERT ...
FILTER !OLD
LET model = NEW
LET user = FIRST(INSERT {
  "_key": UUID(),
  "name": "User"
} INTO user
RETURN NEW)
INSERT {
  _from: model._id,
  _to: user._id,
  "type": "belongs"
} INTO ownership
RETURN user
I ended up separating the modification and selection queries.
var finalQuery = aql`
  LET device = (
    UPSERT ${deviceQuery}
    INSERT MERGE(${deviceQuery}, { dateCreated: DATE_NOW() })
    UPDATE MERGE(${deviceQuery}, { dateUpdated: DATE_NOW() })
    IN ${this.DeviceModel}
    RETURN { doc: NEW, type: OLD ? 'update' : 'insert' })
  FILTER device[0].type == 'insert'
  LET user = (INSERT {
    "_key": UUID(),
    "name": "User"
  } INTO user
  RETURN NEW)
  INSERT {
    _from: device[0].doc._id,
    _to: user[0]._id,
    "type": "belongs"
  } INTO ownership
  RETURN user[0]`;
var cursor = await db.query(finalQuery);
var result = await cursor.next();
if (result == null) {
  const deviceId = this.DeviceModel.name + "/" + queryParams._key;
  finalQuery = aql`
    FOR v, e, p IN 1..1
    OUTBOUND ${deviceId} ownership
    FILTER e.type == "belongs"
    RETURN v`;
  cursor = await db.query(finalQuery);
  result = await cursor.next();
  isUpdate = true;
}
This way I ensure atomicity. There is room for improvement, e.g. checking whether cursor.extra.stats.writesExecuted reports the expected number of writes.
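For instance, a sketch of that check (arangojs exposes query statistics on the cursor's extra property; the exact handling here is an assumption):

cursor = await db.query(finalQuery);
result = await cursor.next();
// Inspect how many writes actually happened; the insert path performs
// more writes (device + user + edge) than the update path (device only).
if (cursor.extra && cursor.extra.stats) {
  console.log('writes executed:', cursor.extra.stats.writesExecuted);
}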
I am using a >= query against a collection. To test the script, I just have 4 entries in my collection.
My query is:
...
.where("Workdesc", ">=", "imple") // no returning as expected
.get()
.then(querySnapshot => {
querySnapshot.forEach(function(doc) {
console.log("Result");
console.log(doc.id, " ===> ", doc.data());
});
});
The Workdesc values of the 4 docs are:
"kj implementation"
"hb implementation urgent"
"sharu implementation quick response needed"
"cb implementation urgent job"
In my view it should have returned all 4 docs, but it is returning only 2. I am attaching a screenshot of the console log and the Firebase console:
How can I get results back by matching partial text anywhere in the string?
Your query is working as expected. When you perform comparisons with strings, they are sorted lexicographically, or in other words, alphabetically. Here's the actual sort order of each value, and where "impl" sorts among them:
"cb implementation urgent job"
"hb implementation urgent"
"impl"
"kj implementation"
"sharu implementation quick response needed"
Alphabetically, you can see that "k" and "s" come after "i". So, those are the only documents you're going to get from a query where Workdesc values are greater than "impl".
If you're trying to do a substring search to find all the Workdesc strings that contain "impl", that's not possible with Firestore. Firestore doesn't offer substring searches. You'll have to find another way (probably mirroring data to another database that supports it).
To build on Doug's answer: unfortunately, Firestore does not support the type of string search you are looking to do. A potential solution that does away with text search is to create another field on your todo documents that stores whether you're dealing with an "implementation" or not.
For example, if you had a field isImplementation, which would be true for implementation todos and false for those that are not, you could add this field to the where clause of your query. This would ensure that you are fetching implementation todos only.
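A minimal sketch of that query (the todos collection name and the isImplementation field are assumptions for illustration):

db.collection('todos')
  .where('isImplementation', '==', true)
  .get()
  .then(querySnapshot => {
    querySnapshot.forEach(doc => {
      console.log(doc.id, ' ===> ', doc.data());
    });
  });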
Once again building on @Doug's answer: Firestore is an indexed document database. To query for data, the query must be performed against an index in a single sweep, to keep queries performant in the way the database is designed.
Firestore won't index the contents of strings for substring search by default, because it isn't efficient and is quite a taxing operation at scale. A different approach is often the best option.
Take, for example, the following function, which splits an input string into searchable parts that can then be added to an index. As the length of the input string grows, the number of substrings it contains grows rapidly.
function shatter(str, minLength = 1) {
let parts = [str]; // always have full string
let i, subLength = minLength;
let strLength = str.length;
while (subLength < strLength) {
for (i = 0; i < (strLength - subLength + 1); i++) {
parts.push(str.substring(i, i + subLength));
}
subLength++;
}
return parts;
}
For example, running it on a sample string:
let str = prompt('Please type out a string to shatter:', 'This is a test string');
let partsOfMin1 = shatter(str, 1);
console.log('Shattering into pieces of minimum length 1 gives:', partsOfMin1);
let partsOfMin3 = shatter(str, 3);
console.log('Shattering into pieces of minimum length 3 gives:', partsOfMin3);
let partsOfMin5 = shatter(str, 5);
console.log('Shattering into pieces of minimum length 5 gives:', partsOfMin5);
alert('The string "' + str + '" can be shattered into as many as ' + partsOfMin1.length +
  ' pieces.\r\n\r\nThis can be reduced to only ' + partsOfMin3.length +
  ' with a minimum length of 3, or ' + partsOfMin5.length + ' with a minimum length of 5.');
Using the shatter function above, we can repurpose it so that it saves the shattered pieces to Firestore at /substringIndex/todos/workDesc, with a link back to the document containing the string.
const firebase = require('firebase');
firebase.initializeApp(/* config here */);
const arrayUnion = firebase.firestore.FieldValue.arrayUnion;
const TODOS_COL_REF = firebase.firestore().collection('todos');
const SUBSTRING_INDEX_COL_REF = firebase.firestore().collection('substringIndex');
// splits given string into segments ranging from the given minimum length up to the full length
function shatter(str, minLength = 1) {
let parts = [str];
let i, subLength = minLength;
let strLength = str.length;
while (subLength < strLength) {
for (i = 0; i < (strLength - subLength + 1); i++) {
parts.push(str.substring(i, i + subLength));
}
subLength++;
}
return parts;
}
// upload data
const testData = {
workDesc: 'this is a prolonged string to break code',
assignDate: firebase.firestore.Timestamp.fromDate(new Date()),
assignTo: 'Ddy1QVOAO6SIvB8LfAE8Z0Adj4H3',
followers: ['Ddy1QVOAO6SIvB8LfAE8Z0Adj4H3'],
searchArray: ['v1', 'v2']
}
const todoDocRef = TODOS_COL_REF.doc();
const todoId = todoDocRef.id;
todoDocRef.set(testData)
.then(() => console.log('Uploaded test data!'))
.catch((err) => console.error('Failed to upload test data!', err));
// Note: in this example, I'm not waiting for the above promise to finish
// Normally, you would integrate it into the batched write operations below
// index each desired string field
const indexDocRef = SUBSTRING_INDEX_COL_REF.doc('todos');
const indexedFields = ["workDesc"];
const indexEntryMinLength = 3;
const indexUpdatePromises = indexedFields.map((fieldName) => {
const indexColRef = indexDocRef.collection(fieldName);
const fieldValue = testData[fieldName];
if (typeof fieldValue !== 'string') return Promise.resolve(undefined); // skip non-string values
const parts = shatter(fieldValue, indexEntryMinLength);
console.log('INFO: Consuming ' + (parts.length * 2) + ' write operations to index ' + fieldName);
// Each batched write can handle up to 500 operations, each arrayUnion counts as two
const partsBatches = [];
if (parts.length > 250) {
for (let i = 0; i < parts.length; i += 250) {
partsBatches.push(parts.slice(i, i + 250));
}
} else {
partsBatches.push(parts);
}
const batchCommitPromises = partsBatches
.map((partsInBatch) => {
const batch = firebase.firestore().batch();
partsInBatch.forEach((part) => {
batch.set(indexColRef.doc(part), {ids: arrayUnion(todoId)}, { merge: true })
})
return batch.commit();
});
return Promise.all(batchCommitPromises);
})
Promise.all(indexUpdatePromises)
.then(() => console.log('Uploaded substring index!'))
.catch((err) => console.error('Failed to upload index!', err));
Then, when you want to search for all documents containing "impl", you would use the following to get an array of matching document IDs:
firebase.firestore().doc('substringIndex/todos/workDesc/impl').get()
.then(snap => snap.get('ids'))
.then(console.log, console.error)
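From there, the matching documents themselves can be fetched by ID (a sketch, reusing TODOS_COL_REF from the indexing code above):

firebase.firestore().doc('substringIndex/todos/workDesc/impl').get()
  .then(snap => snap.get('ids') || [])
  .then(ids => Promise.all(ids.map(id => TODOS_COL_REF.doc(id).get())))
  .then(docs => docs.forEach(doc => console.log(doc.id, doc.data())))
  .catch(console.error);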
While the above code works, you will hit your read/write limits quite quickly as you update the index and you will also likely run into concurrency issues. I also consider it fragile in that non-English characters and punctuation will also trip it up - it is included as a demo only. These issues are why the relevant Firebase documentation recommends making use of a third-party search service like Algolia for full-text search.
TL;DR:
The best solution is to have a human-readable form of your data ("sharu implementation quick response needed") and an indexable form of your data ({ implementation: true, urgent: true, pending: true }), as covered by @Luis in their answer.
I'm new to DynamoDB and am trying to query a table based on the presence of a field's value in a given list.
I have a field doc_id, which is also a secondary index, and I'd like to return all results where doc_id is contained in a list of values.
I'm trying something like this:
response = table.query(
IndexName='doc_id-index',
FilterExpression=In(['27242226'])
)
But clearly that is not correct.
Can anyone point me in the right direction?
Thanks!
With the Query operation:
A FilterExpression does not allow key attributes. You cannot define a filter expression based on a partition key or a sort key.
So your doc_id field, being the partition key of the doc_id-index, cannot be used in a FilterExpression.
Note
A FilterExpression is applied after the items have already been read; the process of filtering does not consume any additional read capacity units.
I'm assuming you have another field like userId, just to show how to implement the IN operator with Query:
var params = {
  TableName: 'tbl',
  IndexName: 'doc_id-index',
  KeyConditionExpression: 'doc_id = :doc_id',
  FilterExpression: 'userId IN (:userId1, :userId2)', // you can add more userIds here
  ExpressionAttributeValues: {
    ':doc_id': 100,
    ':userId1': 11,
    ':userId2': 12
  }
};
If you have more userIds, you should build the FilterExpression dynamically.
But in your case, you can use the Scan operation:
var params = {
  TableName: 'tbl',
  FilterExpression: 'doc_id IN (:doc_id1, :doc_id2)',
  ExpressionAttributeValues: {
    ':doc_id1': 100,
    ':doc_id2': 101
  }
};
You can even build the FilterExpression dynamically, like below:
var documentsId = ["100", "101","200",...];
var documentsObj = {};
var index = 0;
documentsId.forEach((value)=> {
index++;
var documentKey = ":doc_id"+index;
documentsObj[documentKey.toString()] = value;
});
var params = {
TableName: 'job',
FilterExpression: 'doc_id IN ('+Object.keys(documentsObj).toString()+')',
ExpressionAttributeValues: documentsObj,
};
Note: be careful when using the Scan operation; it is less efficient than Query.
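Either way, the params object is then passed to the client; a minimal sketch assuming the DocumentClient (the client setup is a placeholder):

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

// Use docClient.query(...) instead for the Query variant above.
docClient.scan(params, function (err, data) {
  if (err) { console.error(err); return; }
  console.log(data.Items);
});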
I have a table storedgames, which contains 2092 items,
and an index on that table, which also lists 2092 items.
When I fetch data, I use the index to obtain the items for one specific user.
const params = {
TableName: "storedgames",
IndexName: "user-index",
KeyConditionExpression: "#usr = :usr",
ExpressionAttributeNames: { "#usr": "user" },
ExpressionAttributeValues: { ":usr": user }
};
const data = await new Promise((resolve, reject) => {
docClient.query(params, (err, data) => {
if (err) { reject(err); } else { resolve(data); }
});
}).catch((err) => {
console.error(err);
return false;
});
However, the above code does not return all items. It only finds 42, and for today's items there is only 1 hit. When I check directly in the AWS console, I actually find more items for today.
And even when I query there using the index, it finds more records.
When I leave out the filtering on the day, I actually find over 130 items,
while my JavaScript code only returns 42 items with the day filter left out.
So my question is: why does the data of my index seem to be incomplete when I query it programmatically?
The records actually contain a lot of data, and there is a limit on the amount of data that can be fetched per query:
A single Query operation can retrieve a maximum of 1 MB of data. This
limit applies before any FilterExpression is applied to the results.
If LastEvaluatedKey is present in the response and is non-null, you
must paginate the result set (see Paginating the Results).
So one possible solution is to perform multiple fetches until you have the entire collection:
const queryAllItems = (params, callback) => {
let fullResult = { Items: [], Count: 0, ScannedCount: 0 };
const queryExecute = (callback) => {
docClient.query(params, (err, result) => {
if (err) {
callback(err);
return;
}
const { Items, LastEvaluatedKey, Count, ScannedCount } = result;
fullResult.Items = [...fullResult.Items, ...Items];
fullResult.Count += Count;
fullResult.ScannedCount += ScannedCount;
if (!LastEvaluatedKey) {
callback(null, fullResult);
return;
}
params.ExclusiveStartKey = LastEvaluatedKey;
queryExecute(callback);
});
}
queryExecute(callback);
}
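It can then be called with the same params object as the original query, e.g.:

// Sketch: drop-in replacement for the direct docClient.query call.
queryAllItems(params, (err, data) => {
  if (err) { console.error(err); return; }
  console.log(data.Count, 'items fetched in total');
});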
Unfortunately, this isn't a complete solution. In my situation, a query for a mere 130 items (which requires 4 actual fetches) takes about 15 seconds.
I've been reading the limited documentation and, from what I've seen, this query should work.
// SAMPLE STORED PROCEDURE
function sample() {
var prefix = "";
var collection = getContext().getCollection();
var data = collection.readDocuments(collection.getSelfLink());
console.log(data);
console.log(JSON.stringify(data));
// Query documents and take 1st item.
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
'SELECT * FROM root r',
function (err, feed, options) {
if (err) throw err;
// Check the feed and if empty, set the body to 'no docs found',
// else take 1st element from feed
if (!feed || !feed.length) {
var response = getContext().getResponse();
response.setBody('no docs found');
}
else {
var response = getContext().getResponse();
var body = { prefix: prefix, feed: feed[0] };
response.setBody(JSON.stringify(body));
}
});
if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
After all, this is just the sample query that the portal generates. The only issue is that it does not return any results.
The collection has 7000 documents.
It is partitioned by a property, /EmailTypeId.
When I execute the query, I am submitting a partition value of 5 (which is the partition value for all current records).
I'm console.logging a call to collection.readDocuments, which should return all docs, but it just returns the value true.
I'm trying to return all records so I can aggregate by id and count. How can I get this query to actually return data?
Here is a sample screen shot of one of the doc schemas in the collection
Here is the input form for supplying the partition value
Update
I created a new collection as a control. This collection has no partition key, and the same query returns results in it. Therefore, the issue has to be in the second screenshot I provided. Perhaps I am providing the partition key incorrectly.
I believe you need to limit your response size, since Cosmos DB caps what a stored procedure can return. I added something like this in my sproc to alleviate it. (What follows is my fragment reconstructed into a self-contained function; the getNodes wrapper, the accumulator variables, and the page-size bookkeeping are assumptions about the surrounding script.)
var collection = getContext().getCollection();
var response = getContext().getResponse();
var nodesBatch = [];          // accumulated documents across pages
var responseSize = 0;         // approximate size of the response so far
var lastContinuationToken = null;

getNodes(null);

function getNodes(continuationToken) {
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        'SELECT * FROM root r',
        { continuation: continuationToken },
        function (err, documentsRead, responseOptions) {
            if (err) throw err;
            // Approximate how much this page adds to the response.
            var queryPageSize = JSON.stringify(documentsRead).length;
            if (responseSize + queryPageSize < 1024 * 1024) {
                // Append query results to nodesBatch.
                nodesBatch = nodesBatch.concat(documentsRead);
                // Keep track of the response size.
                responseSize += queryPageSize;
                if (responseOptions.continuation) {
                    // If there is a continuation token, run the query again to get the next page of results.
                    lastContinuationToken = responseOptions.continuation;
                    getNodes(responseOptions.continuation);
                } else {
                    // If there is no continuation token, we are done. Return the response.
                    response.setBody({
                        "message": "Query completed successfully.",
                        "queryResponse": nodesBatch
                    });
                }
            } else {
                // Response size limit reached; run the script again with lastContinuationToken as a script parameter.
                response.setBody({
                    "message": "Response size limit reached.",
                    "lastContinuationToken": lastContinuationToken,
                    "queryResponse": nodesBatch
                });
            }
        });
    if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
Let me know how this works for you.
Also, check your collection name and use that instead of root.