The database insert action now seems to be synchronous, returning the _id right after the insertion, so no callback is needed here.
The question is where the _id is generated (and checked). This seems to be a fast synchronous action done in miniMongo, yet the client has no full list of the _ids in a collection, so how can miniMongo check whether the _id is a duplicate or not?
When using Collection.insert on the client, the _id is generated on the client using a random UUID algorithm, hence the seemingly perfect, latency-compensated client-side insertion.
Since Collection.insert is implemented as a special case of a Meteor method, we know that while the client simulation runs on the client, a corresponding server operation is triggered: the client document is sent to the server along with its locally generated _id.
On the server, there is a check that the _id is valid (truly unique), and the server acknowledges the successful insertion back to the client.
If the client-generated _id was not unique after all, then the insert will fail with a "duplicate key error" (this could happen something like 0.001% of the time, the actual probability being even lower, and you would have to resubmit your client form or whatever).
To answer your question specifically: the _id can be generated in the browser in the case of a client insert, but its validity is ultimately checked on the server.
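To make the mechanism concrete, here is a minimal sketch of a client-side insert, assuming a collection named Posts (the name and fields are invented for the example); the point is simply that the _id is available synchronously on the client, while the server still gets the final say on uniqueness.

```javascript
// Minimal sketch: client-side Meteor code with an assumed Posts collection.
import { Mongo } from 'meteor/mongo';
import { Random } from 'meteor/random';

const Posts = new Mongo.Collection('posts');

// On the client, insert() generates the _id locally and returns it at once,
// before any server round trip; the server later verifies it is truly unique.
const newId = Posts.insert({ title: 'Hello', body: 'Latency compensated' });
console.log('locally generated _id:', newId);

// The generator behind it is essentially a random id like this one:
console.log('another random id:', Random.id()); // e.g. "ZcX9fTvnwW3mqW2hR"
```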
EDIT: I initially assumed that Meteor would try to recover from the duplicate key error by generating a new key and propagating it to the client. I tested that case and found out I was wrong; thanks @Tom Freudenberg for pointing this out.
A customer of ours insists that they expect this "from any API" (meaning they don't want to pay for the changes). I'm having trouble finding clear information on this, though.
Say we have an API that creates an appointment for a calendar. Server-side everything was successful, data is committed to the database. API tries to send the HTTP 201 (Created) response, but something goes wrong there. Client ignores the response, or connection dropped, ...
They want our API to undo the database changes in that particular situation.
The question is not how to do this, but rather whether this is something most APIs do. Is this standard behavior? Or something similar, like refusing duplicate create requests?
The difficult part, of course, is actually knowing whether an API has failed to send the response, and as far as the crux of the question is concerned, this is not behavior that is usually implemented. If the user willingly inputs the data, you can go ahead and store it. If the response doesn't return properly due to a timeout (you are not responsible for the user "ignoring" the response), then the client-side code can refresh on failure and load fresh data, and the user can delete the inputted data themselves (given you provide an endpoint for that).
Depending on the database, it is possible to make all of an API's database changes reversible. For example, with SQL you use transactions, with commit, rollback and savepoints. There is most likely a similar mechanism available for NoSQL databases.
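As an illustration of that commit/rollback mechanism, here is a minimal sketch assuming the node-postgres ("pg") package and an appointments table; the table and field names are invented for the example.

```javascript
// Minimal sketch of commit/rollback with node-postgres; names are illustrative.
const { Pool } = require('pg');
const pool = new Pool();

async function createAppointment(appointment) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const { rows } = await client.query(
      'INSERT INTO appointments (title, starts_at) VALUES ($1, $2) RETURNING id',
      [appointment.title, appointment.startsAt]
    );
    await client.query('COMMIT');   // changes become permanent here
    return rows[0].id;
  } catch (err) {
    await client.query('ROLLBACK'); // undo everything done since BEGIN
    throw err;
  } finally {
    client.release();
  }
}
```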
I am building a system that processes orders. Each order will follow a workflow. So this order can be, e.g., booked, accepted, payment approved, cancelled, and so on.
Every time the status of an order changes I will post this change to SNS. To know whether an order's status has changed I will need to make a request to an external API and compare with the last known status.
The question is: What is the best place to store the last known order status?
1. An SQS queue. Every time I read a message from the queue, I check the status using the external API, delete the message and insert another one with the new status.
2. Use a database (like DynamoDB) to control the order status.
You should not use the word "store" to describe something happening with stateful facts and a queue. Stateful, factual information should be stored -- persisted -- to a database.
The queue messages should be treated as "hints" on what work needs to be done -- a request to consider the reasonableness of a proposed action, and if reasonable, perform the action.
What I mean by this, is that when a queue consumer sees a message to create an order, it should check the database and create the order if not already present. Update an order? Check the database to see whether the order is in a correct status for the update to occur. (Canceling an order that has already shipped would be an example of a mismatched state).
Queues, by design, can't be as precise and atomic in their operation as a database should. The Two Generals Problem is one of several scenarios that becomes an issue in dealing with queues (and indeed with designing a queue system) -- messages can be lost or delivered more than once.
What happens in a "queue is authoritative" scenario when a message is delivered (received from the queue) more than once? What happens if a message is lost? There's nothing wrong with using a queue, but I respectfully suggest that in this scenario the queue should not be treated as authoritative.
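To make the "queue as hint, database as authority" idea concrete, here is a rough sketch of a consumer; getOrder, updateOrderStatus and the allowed transitions are hypothetical helpers and values, not anything from the original setup.

```javascript
// Illustrative sketch only: the queue message is a hint, the database decides.
const ALLOWED_TRANSITIONS = {
  booked: ['accepted', 'cancelled'],
  accepted: ['payment approved', 'cancelled'],
  'payment approved': ['shipped', 'cancelled'],
};

async function handleStatusMessage(message, db) {
  const { orderId, newStatus } = JSON.parse(message.Body);
  const order = await db.getOrder(orderId);          // hypothetical data-access helper

  if (!order) return;                                 // unknown order: ignore the hint
  if (order.status === newStatus) return;             // duplicate delivery: already applied
  if (!ALLOWED_TRANSITIONS[order.status]?.includes(newStatus)) {
    return;                                           // e.g. cancelling an already shipped order
  }
  await db.updateOrderStatus(orderId, newStatus);     // database remains the source of truth
}
```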
I would go with the database option instead of SQS:
1) Option SQS:
You will have one application which will change the status
Add the status value into SQS
Another application will then read your messages, send the notification, and delete the message
2) Option DynamoDB:
Insert your updated status into DynamoDB
Configure a Lambda function to trigger on updates of that field
The Lambda function will send the notification
The database option looks cleaner. Additionally, you don't have to worry about maintaining a queue, and with a queue you can only read one message at a time unless you implement parallel readers. With a database you can update multiple rows, each update will trigger the Lambda, and you don't have to worry about it.
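As a rough sketch of option 2, assuming a DynamoDB stream wired to this Lambda and an SNS topic ARN passed in via an environment variable (all names are illustrative):

```javascript
// Sketch of a Lambda triggered by a DynamoDB stream, publishing status changes to SNS.
const AWS = require('aws-sdk');
const sns = new AWS.SNS();

exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== 'MODIFY') continue;        // only react to updates

    const oldStatus = record.dynamodb.OldImage.status.S;
    const newStatus = record.dynamodb.NewImage.status.S;
    if (oldStatus === newStatus) continue;               // status did not actually change

    await sns.publish({
      TopicArn: process.env.ORDER_STATUS_TOPIC_ARN,      // assumed environment variable
      Message: JSON.stringify({
        orderId: record.dynamodb.Keys.orderId.S,
        oldStatus,
        newStatus,
      }),
    }).promise();
  }
};
```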
Hope that helps
I'm learning Meteor and fundamentally enjoy how fast I can build data-driven applications; however, as I went through the Creating Posts chapter in the Discover Meteor book I learned about using server-side Methods. Specifically, the primary reason (and there are a number of very valid reasons to use these) was the timestamp: you wouldn't want to rely on the client date/time, you'd want to use the server date/time.
Makes sense, except that in almost every application I've ever built we store the date/time of row creation/update in a column. Effectively every single create or update to the database records a date/time, which in Meteor now looks like it would require server-side Methods to ensure data integrity.
If I'm understanding correctly, that pretty much eliminates the ease of use and real-time nature of a client-side Collection, because I'll need to use Methods for almost every single update and create against our databases.
Just wanted to check and see how everyone else is doing this in the real world. Are you just calling a server-side Method that returns the date/time and then using a client-side Collection, or something else?
Thanks!
The short answer to this question is that yes, every operation that affects the server's database will go through a server-side method. The only difference is whether you are defining this method explicitly or not.
When you are just getting started with Meteor, you will probably do insert/update/remove operations directly on client collections using validators, which check whether the operation is allowed. This usage actually calls predefined methods on both the server and the client (for a collection named foo you have /foo/insert, for example), which simply check the specified validators before doing the operation. As you become more familiar with Meteor you will probably override these default methods, for the reasons you described (among others).
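As a minimal sketch of those validators, assuming a collection named Posts and an ownerId field (both invented for the example); the built-in /posts/insert method is what checks these rules when the client calls Posts.insert() directly:

```javascript
// Sketch only: the collection name and rules are illustrative.
import { Mongo } from 'meteor/mongo';

export const Posts = new Mongo.Collection('posts');

Posts.allow({
  insert(userId, doc) {
    return !!userId;                 // only logged-in users may insert
  },
  update(userId, doc, fields, modifier) {
    return doc.ownerId === userId;   // only the owner may update
  },
  remove(userId, doc) {
    return doc.ownerId === userId;
  },
});
```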
When using your own methods, you will typically want to define a method both on the server and the client, just as the default collection functions do for you. This is because of Meteor's latency compensation, which allows most client operations to be reflected immediately in the browser without any noticeable lag, as long as they are permitted. Meteor does this by first simulating the effect of a method call in the client, updating the client's cached data temporarily, then sending the actual method call to the server. If the server's method causes a different set of changes than the client's simulation, the client's cache will be updated to reflect this when the server method returns. This also means that if the client's method would have done the same as the server, we've basically allowed for an instant operation from the perspective of the client.
By defining your own methods on the server and client, you can extend this to fill your own needs. For example, if you want to insert timestamps on updates, have the client insert whatever timestamp in the simulation method. The server will insert an authoritative timestamp, which will replace the client's timestamp when the method returns. From the client's perspective, the insert operation will be instant, except for an update to the timestamp if the client's time happens to be way off. (By the way, you may want to check out my timesync package for displaying relative server time accurately on the client.)
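Here is a minimal sketch of that timestamp pattern; the collection and method name are my own choices for illustration:

```javascript
// Shared code (loaded on both client and server), so the same method runs
// as a client simulation and as the real server call.
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const Posts = new Mongo.Collection('posts');

Meteor.methods({
  insertPost(post) {
    // Client simulation: uses the client's clock, applied instantly to the cache.
    // Server run: uses the server's authoritative clock; its result replaces
    // the simulated document when the method returns.
    post.createdAt = new Date();
    return Posts.insert(post);
  },
});
```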
A final note: it's good to understand what scope you are doing collection operations in, as this was one of the things that originally confused me about Meteor. For example, if you have a collection instance in the client Foo, Foo.insert() in normal client code will call the default pair of client/server methods. However, Foo.insert() in a client method will run only in a simulation and will never call server code - so you will need to define the same method on the server and make sure you do Foo.insert() there as well, for the method to work properly.
A good rule of thumb for moving forward is to replace groups of validated collection operations with your own methods that do the same operations, and then add specific extra features on the server and client respectively.
In short: yes!
Publications exist to send out a "live", dynamic subset of the database to the client, sending DDP added messages for the existing records, followed by a ready message, and then added, changed, and removed messages to keep the client's cache consistent.
Methods exist to cause Mongo updates, directly or indirectly, and as Andrew mentioned, they are always in use.
But truly, because of Meteor's publication architecture, any edits to collections that are currently being published to at least one client will be published via DDP, regardless of the source of the change to Mongo, even an outside process.
I'm currently implementing a DDP client based on the specs available on this page:
https://github.com/meteor/meteor/blob/master/packages/livedata/DDP.md
I just have a doubt concerning the two message types called "ready" and "updated".
Let's start with the "ready", according to the spec:
When one or more subscriptions have finished sending their initial
batch of data, the server will send a ready message with their IDs.
Does this mean that we can receive several "added" messages from the server until the whole collection has been completely transferred to the client? Should we store these in a temporary place and wait for the "ready" semaphore before making them public, i.e. in the real collection?
The same question applies to remote procedure calls. Should I store the result in a temporary collection and only return (process) the result once the "updated" message is received?
This part is obscure
Once the server has finished sending the client all the relevant data messages based on this procedure call, the server should send an
updated message to the client with this method's ID.
"Should", so I'm stuck if I do rely on it but nothing ?
Should we store these in a temporary place and wait for the "ready" semaphore before making them public, i.e. in the real collection?
The standard Meteor JavaScript client makes added documents available in the client collection as they come in from the server. So if, for example, the collection is being displayed on the web page and 5 of 100 documents have arrived so far, the user will be able to see the 5 documents.
When the subscription "ready" message arrives, the subscription on the client is marked as "ready", which the client can use if they're doing something that needs to wait for all the data to arrive.
Whether you want to wait in your client for all the data to arrive before making it public is up to you... it depends on what you're doing with your client and if you want to show documents as they arrive or not.
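If it helps, here is one way a hand-rolled DDP client might handle this: apply "added" documents to a local cache as they arrive and flag the subscription when "ready" names it. The cache structure is just an assumption.

```javascript
// Sketch of a minimal DDP message handler for "added" and "ready".
const collections = {};     // { collectionName: { docId: fields } }
const readySubs = new Set();

function handleDdpMessage(raw) {
  const msg = JSON.parse(raw);
  switch (msg.msg) {
    case 'added':
      collections[msg.collection] = collections[msg.collection] || {};
      collections[msg.collection][msg.id] = msg.fields || {};
      break;                                   // visible immediately, like Meteor's own client
    case 'ready':
      msg.subs.forEach((subId) => readySubs.add(subId));
      break;                                   // now you know the initial batch is complete
  }
}
```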
"Should", so I'm stuck if I do rely on it but nothing ?
The Meteor server does send the "updated" message, so you can rely on it.
The same question applies to remote procedure calls. Should I store the result in a temporary collection and only return (process) the result once the "updated" message is received?
There are two outcomes from making a method call: the return value (or error) returned by the method (the "result" message), and documents that may have been inserted / updated / removed by the method call (the "updated" message). Which one you want to listen to is up to you: whether it's important for you to know when you've received all the document changes coming from the method call or if you just want the method return value.
The "updated" message is used by the Meteor client to perform "latency compensation": when the client changes a local document, the change is applied immediately to the local document (and the changes will be visible to the user)... on the assumption that the changes will likely be accepted by the server. Then the client makes a method call requesting the change, and waits for the updated documents to be sent from the server (which may include the changes if they were accepted, or not, if they were rejected). When the "update" message is received, the local changes are thrown away and replaced by the real updates from the server. If you're not doing latency compensation in your own client then you may not care about the "updated" message.
I am wondering if my current approach makes sense or if there is a better way to do it.
I have multiple situations where I want to create new objects and let the server assign an ID to those objects. Sending a POST request appears to be the most appropriate way to do that.
However since POST is not idempotent the request may get lost and sending it again may create a second object. Also requests being lost might be quite common since the API is often accessed through mobile networks.
As a result I decided to split the whole thing into a two-step process:
First sending a POST request to create a new object which returns the URI of the new object in the Location header.
Secondly performing an idempotent PUT request to the supplied Location to populate the new object with data. If a new object is not populated within 24 hours the server may delete it through some kind of batch job.
Does that sound reasonable or is there a better approach?
The only advantage of POST-creation over PUT-creation is the server generation of IDs.
I don't think it is worth the lack of idempotency (and then the need for removing duplicates or empty objects).
Instead, I would use PUT with a UUID in the URL. Thanks to UUID generators, you can be nearly certain that the ID you generate client-side will be unique server-side.
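A minimal sketch of that PUT-with-client-UUID approach, with an invented /appointments endpoint and crypto.randomUUID() as the generator:

```javascript
// Sketch only: the endpoint path is made up for the example.
// crypto.randomUUID() is available in modern browsers and Node.js.
const appointmentId = crypto.randomUUID();

async function createAppointment(data) {
  // Safe to retry: repeating the same PUT to the same URI creates at most one resource.
  const response = await fetch(`/appointments/${appointmentId}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(data),
  });
  return response.status; // e.g. 201 on first creation, 200/204 on a repeat
}
```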
Well, it all depends. To start with, you should talk more about URIs, resources and representations, and not be concerned with objects.
The POST method is designed for non-idempotent requests, or requests with side effects, but it can be used for idempotent requests.
on POST of form data to /some_collection/
normalize the natural key of your data (Eg. "lowercase" the Title field for a blog post)
calculate a suitable hash value (Eg. simplest case is your normalized field value)
lookup resource by hash value
if none then
generate a server identity, create resource
Respond => "201 Created", "Location": "/some_collection/<new_id>"
if found but no updates should be carried out due to app logic
Respond => 302 Found/Moved Temporarily or 303 See Other
(client will need to GET that resource which might include fields required for updates, like version_numbers)
if found but updates may occur
Respond => 307 Temporary Redirect, Location: /some_collection/<id>
(like a 302, but the client should reuse the original HTTP method, and might do so automatically)
A suitable hash function might be as simple as some concatenated fields, or for large fields or values a truncated MD5 hash could be used. See hash functions for more details.
I've assumed you:
need a different identity value than a hash value
data fields used for identity can't be changed
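As a rough sketch of the flow above, assuming Express and an in-memory map standing in for the real lookup table (route names, fields, and storage are all invented for the example):

```javascript
// Illustrative sketch of hash-based de-duplication on POST.
const express = require('express');
const crypto = require('crypto');

const app = express();
app.use(express.json());

const byHash = new Map();    // hash -> resource id (stand-in for a real lookup table)
const resources = new Map();
let nextId = 1;

app.post('/some_collection', (req, res) => {
  const normalizedTitle = String(req.body.title || '').toLowerCase();   // normalize natural key
  const hash = crypto.createHash('md5').update(normalizedTitle).digest('hex');

  const existingId = byHash.get(hash);
  if (existingId) {
    // Already created: point the client at the existing resource instead of duplicating it.
    return res.redirect(303, `/some_collection/${existingId}`);
  }

  const id = nextId++;                        // server-generated identity
  resources.set(id, req.body);
  byHash.set(hash, id);
  res.status(201).location(`/some_collection/${id}`).end();
});
```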
Your method of generating ids at the server, in the application, in a dedicated request-response, is a very good one! Uniqueness is very important, but clients, like suitors, are going to keep repeating the request until they succeed, or until they get a failure they're willing to accept (unlikely). So you need to get uniqueness from somewhere, and you only have two options: either the client, with a GUID as Aurélien suggests, or the server, as you suggest. I happen to like the server option. Seed columns in relational DBs are a readily available source of uniqueness with zero risk of collisions. Around 2000, I read an article advocating this solution called something like "Simple Reliable Messaging with HTTP", so this is an established approach to a real problem.
Reading REST stuff, you could be forgiven for thinking a bunch of teenagers had just inherited Elvis's mansion. They're excitedly discussing how to rearrange the furniture, and they're hysterical at the idea they might need to bring something from home. The use of POST is recommended because it's there, without ever broaching the problems with non-idempotent requests.
In practice, you will likely want to make sure all unsafe requests to your api are idempotent, with the necessary exception of identity generation requests, which as you point out don't matter. Generating identities is cheap and unused ones are easily discarded. As a nod to REST, remember to get your new identity with a POST, so it's not cached and repeated all over the place.
Regarding the sterile debate about what idempotent means, I say it needs to be everything. Successive requests should generate no additional effects, and should receive the same response as the first processed request. To implement this, you will want to store all server responses so they can be replayed, and your ids will be identifying actions, not just resources. You'll be kicked out of Elvis's mansion, but you'll have a bombproof api.
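A small sketch of that "ids identify actions, responses are stored and replayed" idea, again assuming Express and in-memory storage purely for illustration:

```javascript
// Illustrative only: routes, payloads, and storage are invented for the example.
const express = require('express');
const app = express();
app.use(express.json());

const processed = new Map();   // actionId -> { status, body }

// Step 1: cheap, non-cacheable identity generation (the only non-idempotent call).
let nextActionId = 1;
app.post('/action-ids', (req, res) => {
  res.status(201).json({ actionId: String(nextActionId++) });
});

// Step 2: the actual operation, keyed by that action id, safe to repeat.
app.put('/appointments/actions/:actionId', (req, res) => {
  const prior = processed.get(req.params.actionId);
  if (prior) {
    return res.status(prior.status).json(prior.body);    // replay, no additional effect
  }
  const body = { created: true, appointment: req.body };  // ...do the real work here...
  processed.set(req.params.actionId, { status: 201, body });
  res.status(201).json(body);
});
```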
But now you have two requests that can be lost? And the POST can still be repeated, creating another resource instance. Don't over-think stuff. Just have the batch process look for dupes. Possibly have some "access" count statistics on your resources to see which of the dupe candidates was the result of an abandoned post.
Another approach: screen incoming POSTs against some log to see whether they are repeats. Should be easy to find: if the body content of a request is the same as that of a request just x time ago, consider it a repeat. And you could check extra parameters like the originating IP, the same authentication, ...
No matter what HTTP method you use, it is theoretically impossible to make an idempotent request without generating the unique identifier client-side, temporarily (as part of some request checking system) or as the permanent server id. An HTTP request being lost will not create a duplicate, though there is a concern that the request could succeed getting to the server but the response does not make it back to the client.
If the end client can easily delete duplicates and they don't cause inherent data conflicts it is probably not a big enough deal to develop an ad-hoc duplication prevention system. Use POST for the request and send the client back a 201 status in the HTTP header and the server-generated unique id in the body of the response. If you have data that shows duplications are a frequent occurrence or any duplicate causes significant problems, I would use PUT and create the unique id client-side. Use the client created id as the database id - there is no advantage to creating an additional unique id on the server.
I think you could also collapse the creation and update requests into a single request (upsert). In order to create a new resource, the client POSTs to a "factory" resource, located for example at /factory-url-name, and the server then returns the URI for the new resource.
Why don't you use a request id at your originating point? Your originating point should do two things: first, send a GET request for request_id=2 to see whether its request has been applied (for example, a response saying the person was created as part of request_id=2).
This will ensure your originating system knows which request was executed last, as the request id is stored in the db.
Second, if your originating point finds that the last request is still at 1 and not yet 2, then it may try again, to make sure that request 2 wasn't actually created in the db with just the GET response getting lost.
You can introduce a limit on the number of tries for your GET request, a time to wait before firing another GET, and similar mechanisms.
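A very rough sketch of that check-then-retry loop, with invented endpoints and field names, just to show the shape of the client side:

```javascript
// Illustrative only: /requests/last-applied and /people?request_id=... are
// made-up endpoints; the real API would define its own.
async function sendWithRequestId(requestId, payload, maxTries = 3) {
  for (let attempt = 0; attempt < maxTries; attempt++) {
    // First, ask the server which request id it last applied.
    const status = await fetch('/requests/last-applied').then((r) => r.json());
    if (status.lastRequestId >= requestId) return;   // already applied, stop

    try {
      // Not applied yet (or we just can't tell): (re)send the tagged request.
      await fetch(`/people?request_id=${requestId}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload),
      });
    } catch (err) {
      // Response lost or connection dropped: the next loop iteration checks again.
    }
    await new Promise((resolve) => setTimeout(resolve, 1000)); // wait before the next GET
  }
}
```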