I've got a table with the following attributes:
request
avgResponseTime
Every request is logged to this table. I use putItem for inserting or updating items, but I'd like avgResponseTime to be a "continuous value" (the avgResponseTime so far, combined with the response time of the newly added request).
Is there a way to do that in a single operation?
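One common single-request approach (a sketch, not from the thread; the table and attribute names are assumptions) is to store the running sum and count instead of the average itself, update both atomically with update_item's ADD action, and derive the average on read:

import boto3
from decimal import Decimal

# Hypothetical table with partition key "request"; totalResponseTime and
# requestCount are created automatically by ADD on first use.
table = boto3.resource('dynamodb').Table('requests')

def log_request(request_id, response_time_ms):
    # ADD is atomic, so concurrent writers cannot lose updates.
    resp = table.update_item(
        Key={'request': request_id},
        UpdateExpression='ADD totalResponseTime :t, requestCount :one',
        ExpressionAttributeValues={
            ':t': Decimal(str(response_time_ms)),
            ':one': Decimal(1),
        },
        ReturnValues='UPDATED_NEW',
    )
    attrs = resp['Attributes']
    return attrs['totalResponseTime'] / attrs['requestCount']

Storing sum and count sidesteps the read-modify-write race that recomputing avgResponseTime on the client would introduce.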
At first sight, it's clear what the continuation token does in Cosmos DB: attaching it to the next query gives you the next set of results. But what does "next set of results" mean exactly?
Does it mean:
the next set of results as if the original query had been executed completely without paging at the time of the very first query (skipping the appropriate number of documents)?
the next set of results as if the original query had been executed now (skipping the appropriate number of documents)?
Something completely different?
Answer 1 would seem preferable, but it is unlikely, given that the server would need to store an unbounded amount of state. Answer 2 is also problematic, as it may result in inconsistencies: for example, the same document may be served multiple times across pages if the underlying data has changed between the page queries.
Cosmos DB query executions are stateless at the server side. The continuation token is used to recreate the state of the index and track progress of the execution.
"Next set of results" means, the query is executed again on from a "bookmark" from the previous execution. This bookmark is provided by the continuation token.
Documents created during continuations
They may or may not be returned, depending on where the inserted document falls relative to the bookmark of the query being executed.
Example:
SELECT * FROM c ORDER BY c.someValue ASC
Let us assume the bookmark was at someValue = 10; the query engine resumes processing using the continuation token, from someValue = 10.
If you were to insert a new document with someValue = 5 in between query executions, it will not show up in the next set of results.
If the new document is inserted at a position beyond the bookmark, it will show up in the next set of results.
Documents updated during continuations
The same logic as above applies to updates as well (see "Chances of duplicates" below).
Documents deleted during continuations
They will not show up in the next set of results.
Chances of duplicates
In the case of the query below:
SELECT * FROM c ORDER BY c.remainingInventory ASC
If the remainingInventory was updated after the first set of results and it now satisfies the ORDER BY criteria for the second page, the document will show up again.
Cosmos DB doesn’t provide snapshot isolation across query pages.
However, as per the product team this is an incredibly uncommon scenario because queries over continuations are very quick and in most cases all query results are returned on the first page.
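For concreteness, here is roughly how page-by-page execution with a continuation token looks in the Python SDK; a sketch, assuming azure-cosmos v4 with placeholder account details:

from azure.cosmos import CosmosClient

URL = 'https://<ACCOUNT>.documents.azure.com:443/'  # placeholder
KEY = '<KEY>'                                       # placeholder
client = CosmosClient(URL, credential=KEY)
container = client.get_database_client('db').get_container_client('coll')

query = 'SELECT * FROM c ORDER BY c.someValue ASC'

# First page: a small page size forces a continuation token.
pager = container.query_items(
    query, enable_cross_partition_query=True, max_item_count=1
).by_page()
first_page = list(next(pager))
token = pager.continuation_token  # the "bookmark"

# ...documents may be inserted, updated or deleted here...

# Resume from the bookmark; documents sorting before it are skipped.
resumed = container.query_items(
    query, enable_cross_partition_query=True, max_item_count=1
).by_page(token)
second_page = list(next(resumed))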
Based on preliminary experiments, the answer seems to be option #2, or more precisely:
Documents created after serving the first page are observable on subsequent pages
Documents updated after serving the first page are observable on subsequent pages
Documents deleted after serving the first page are omitted on subsequent pages
Documents are never served twice
The first statement above contradicts information from MSFT (cf. Kalyan's answer). It would be great to get a more qualified answer from the Cosmos DB Team specifying precisely the semantics of retrieving pages. This may not be very important for displaying data in the UI, but may be essential for data processing in the backend, given that there doesn't seem to be any way of disabling paging when performing a query (cf. Are transactional queries possible in Cosmos DB?).
Experimental method
I used Sacha Bruttin's Cosmos DB Explorer to query a collection with 5 documents, because this tool allows playing around with the page size and other request options.
The page size was set to 1, and Cross Partition Queries were enabled. Different queries were tried, e.g. SELECT * FROM c or SELECT * FROM c ORDER BY c.name.
After retrieving page 1, new documents were inserted, and some existing documents (including documents that should appear on subsequent pages) were updated and deleted. Then all subsequent pages were retrieved in sequence.
(A quick look at the source code of the tool confirmed that ResponseContinuationTokenLimitInKb is not set.)
I am creating a leave tracker app where I want to store the user ID along with the from date and to date. I am using Amazon's DynamoDB as the database, and the user enters a leave through a custom command.
E.g.: apply-leave from-date to-date
I want to avoid duplicate entries in the database. For example, if a user has already applied for a leave from 06-10-2019 to 10-10-2019 and applies for a leave between the same dates again, they should get a message saying that it already exists, and a new record should not be created.
However, a user can apply for multiple leaves and two users can take a leave between the same dates.
I tried using a conditional statement as follows:
table.put_item(
    Item={
        'leave_id': leave_id,
        'user_id': user_id,
        'from_date': from_date,
        'to_date': to_date,
    },
    ConditionExpression='attribute_not_exists(user_id) AND attribute_not_exists(from_date) AND attribute_not_exists(to_date)'
)
where leave_id is the partition key. However, this does not work: a new row is added every time, even for the same dates. I have looked through other similar questions, but haven't been able to understand how to configure this correctly.
Any ideas on how I should go about this, or if there is a different design that I should follow?
If you call your code with a leave_id that doesn't yet exist in the table, the item will always be inserted, because the condition is evaluated only against the item with that exact key. If you call your code with a leave_id that does already exist in your table, you should be getting the An error occurred (ConditionalCheckFailedException) when calling the PutItem operation: The conditional request failed error message.
I have two suggestions:
If you don't want to change your table, you can create a secondary index with user_id as the partition key and then query the index for all the items where the given user has some from_date and to_date attributes.
Like this:
from boto3.dynamodb.conditions import Attr, Key

table.query(
    IndexName='user_id-index',
    KeyConditionExpression=Key('user_id').eq(user_id),
    FilterExpression=Attr('from_date').exists() & Attr('to_date').exists()
)
Then you will need to check for overlapping leave requests (e.g. a leave request that starts before one already in place finishes); see the sketch below. After deciding that the leave request is valid, you can call put_item.
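A range-overlap check is a one-liner once the dates are parsed; a minimal sketch, assuming the DD-MM-YYYY format used in the question:

from datetime import datetime

DATE_FMT = '%d-%m-%Y'  # assumed input format, e.g. 06-10-2019

def overlaps(a_from, a_to, b_from, b_to):
    # Two inclusive date ranges overlap iff each one starts on or
    # before the day the other one ends.
    a1, a2, b1, b2 = (datetime.strptime(d, DATE_FMT)
                      for d in (a_from, a_to, b_from, b_to))
    return a1 <= b2 and b1 <= a2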
Another, and probably better, suggestion is to create a composite primary key on your table, with user_id as the partition key and leave_id as the sort key. That way you can query for all leave requests from a particular user without needing a secondary index, as sketched below.
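One way to realize this (a sketch; the table name is hypothetical) is to derive the sort key from the date range, so that an identical request maps to the same item and the condition check rejects the duplicate:

import boto3
from botocore.exceptions import ClientError

table = boto3.resource('dynamodb').Table('leaves')  # hypothetical table

def apply_leave(user_id, from_date, to_date):
    leave_id = f'{from_date}#{to_date}'  # sort key derived from the range
    try:
        table.put_item(
            Item={'user_id': user_id, 'leave_id': leave_id,
                  'from_date': from_date, 'to_date': to_date},
            # Checking any key attribute is enough: this fails only if an
            # item with this exact user_id/leave_id combination exists.
            ConditionExpression='attribute_not_exists(user_id)',
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False  # exact duplicate
        raise
    return True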
When sending batched records to the "Sync Leads" endpoint, if the request itself was successful, there is a result array returned with the success or failure of each record. Unfortunately in the case of a record-level failure, there won't be any identifying information I can use in the response to reference back to the input collection that I sent in the request as a batch.
I need to tie back any "skipped" results to the record in the request that failed to process. Is the result array in the same order as the collection of input records I posted in the batch? This would allow me to reference the input records by index of the collection.
Referring to Marketo's Developer documentation on querying marketable people.
You should be able to correlate the order of your submission with the sequence number returned.
... calls to create or update lead database objects will return a seq field in each object in the results array. The number listed corresponds to the order of the updated record in the request made. Each item will also return the value of the idField for the object type, and a status. The status field will indicate one of "created," "updated," or "skipped." If the status is skipped, then there will also be a corresponding "reasons" array with one or more reason objects that include a code and a message, indicating why a record was skipped.
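In other words, the n-th object in your input array corresponds to the result whose seq is n. A hedged sketch of the correlation using the requests library (the instance id, token and batch contents are placeholders):

import requests

MUNCHKIN_ID = '123-ABC-456'      # placeholder instance id
ACCESS_TOKEN = '<ACCESS_TOKEN>'  # placeholder token
leads = [{'email': 'a@example.com'}, {'email': 'b@example.com'}]  # the batch

payload = {'action': 'createOrUpdate', 'lookupField': 'email', 'input': leads}
resp = requests.post(
    f'https://{MUNCHKIN_ID}.mktorest.com/rest/v1/leads.json',
    params={'access_token': ACCESS_TOKEN},
    json=payload,
).json()

for result in resp['result']:
    submitted = leads[result['seq']]  # seq indexes into the request batch
    if result['status'] == 'skipped':
        reasons = [r['message'] for r in result.get('reasons', [])]
        print(submitted['email'], 'was skipped:', reasons)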
Suppose that I have a table called persons and that a request to change any information about a person also updates that record's last_modified column. Would such a request still be considered idempotent? What I'm trying to find out is whether auxiliary fields can be exempted from the criteria of idempotence.
If repeating a request changes information in the database each time (a POST request, obviously; you would not alter a person record on a GET request), then it's not idempotent, by definition, unless you only store stats (like logs).
Here it's not the last_modified column that is important; it's the change any information about a person part.
A GET request is safe, and therefore idempotent: you can take any URI and put it in an <IMG> in a web page, and browsers will load it without asking; it must not alter anything in the database or in the session (destroying a session, for example, is not idempotent). A safe request can be prefetched and can run at any priority (there is no need to care about the order of several safe queries, since none of them can impact the others), etc.
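To make the distinction concrete, a toy sketch (a hypothetical handler, not from the thread): repeating the same update converges on every field except the server-stamped timestamp, which is exactly the case the question asks about:

from datetime import datetime, timezone

db = {}  # stand-in for the persons table

def put_person(person_id, data):
    # The handler stamps last_modified on every write, so repeating the
    # same request stores a different timestamp each time: strictly
    # speaking the write is not idempotent, even though every other
    # field converges.
    db[person_id] = {**data, 'last_modified': datetime.now(timezone.utc)}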
I need to get the Marketo Leads that have changes on the "progressionStatus" field (inside membership) via the API.
I can get all the leads related to a Program (with the Get Leads by ProgramID API) without issues, but what I need are the Leads with changes on the "progressionStatus" column.
I was thinking of using the CreatedAt / UpdatedAt fields of the Program and then getting all the leads related to those programs, but I didn't get the accurate results I wanted.
I also tried to use the Get Lead Changes API with the "fields" parameter set to "progressionstatus", but that field doesn't exist.
Is it possible to resolve this?
Thanks in advance.
You can get the list of Leads with progression status change by querying against the Get Lead Activities endpoint.
The Get Lead Changes endpoint might sound like a good candidate, but it only returns changes on lead fields. Progression status change is not stored on the lead directly, so that won't work. On the other hand, the Get Leads by ProgramId endpoint returns, amongst others, the actual value of progressionStatus (the program status of the lead in the parent program) but not the change itself, so you cannot process the resultset based on that.
The good news is that the progression status change is an activity type, and luckily we have the above-mentioned Get Lead Activities endpoint (also referred to as Query in the API docs) available to query just that. This endpoint also allows filtering by activityTypeIds to narrow down the resultset to a single activity type.
Basically, you have to call the GET /rest/v1/activities.json endpoint and pass the values of activityTypeIds and nextPageToken as query parameters (along with the access token, obviously). So first you need to get the internal Id of the activity type called "Change Status in Progression". You can do that by querying the GET /rest/v1/activities/types.json endpoint and looking for a record with that name. (I don't know if this Id changes from instance to instance, but in ours it is #104.) Also, to obtain a nextPageToken, you have to make a call to GET /rest/v1/activities/pagingtoken.json, where you specify the earliest datetime to retrieve activities from. See more about Paging Tokens.
Once you have all of these bits at hand, you can make your request like that:
GET https://<INSTANCE_ID>.mktorest.com/rest/v1/activities.json?activityTypeIds=<TYPE_ID>&nextPageToken=<NEXTPAGE_TOKEN>&access_token=<ACCESS_TOKEN>
The result is an array of items like the one below, which is easy to process further.
{
  "id": 712630,
  "marketoGUID": "712630",
  "leadId": 824864,
  "activityDate": "2017-12-01T08:51:13Z",
  "activityTypeId": 104,
  "primaryAttributeValueId": 1104,
  "primaryAttributeValue": "PROGRAM_NAME",
  "attributes": [
    { "name": "Acquired By", "value": true },
    { "name": "New Status ID", "value": 33 },
    { "name": "Old Status ID", "value": 32 },
    { "name": "Reason", "value": "Filled out form" },
    { "name": "Success", "value": false },
    { "name": "New Status", "value": "Filled-out Form" },
    { "name": "Old Status", "value": "Not in Program" }
  ]
}
Knowing the leadIds in question, you can make yet another request to fetch the actual lead records.
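Putting the steps together, a rough Python sketch of the whole flow (the instance id, access token and start datetime are placeholders):

import requests

INSTANCE_ID = '123-ABC-456'      # placeholder
ACCESS_TOKEN = '<ACCESS_TOKEN>'  # placeholder
BASE = f'https://{INSTANCE_ID}.mktorest.com'
auth = {'access_token': ACCESS_TOKEN}

# 1. Look up the id of the "Change Status in Progression" activity type.
types = requests.get(f'{BASE}/rest/v1/activities/types.json', params=auth).json()
type_id = next(t['id'] for t in types['result']
               if t['name'] == 'Change Status in Progression')

# 2. Get a paging token anchored at the earliest datetime of interest.
token = requests.get(
    f'{BASE}/rest/v1/activities/pagingtoken.json',
    params={**auth, 'sinceDatetime': '2017-01-01T00:00:00Z'},
).json()['nextPageToken']

# 3. Page through the activities, collecting the lead ids.
lead_ids = set()
while True:
    page = requests.get(
        f'{BASE}/rest/v1/activities.json',
        params={**auth, 'activityTypeIds': type_id, 'nextPageToken': token},
    ).json()
    for activity in page.get('result', []):
        lead_ids.add(activity['leadId'])
    if not page.get('moreResult'):
        break
    token = page['nextPageToken']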