I am trying to write a piece of code that would:
access two webservices with some request
the responses will be sequences of objects, each object identified by an ID, and each response sorted by ID in ascending order
the responses will be large and streamed (or gzipped/chunked)
the result will be a merge of the data from the two inputs, based on IDs
What I want to achieve is that as soon as the corresponding parts of both responses are available, the output is written out. I don't want to wait for a whole response to arrive, since that would run out of memory. I want to start streaming output as soon as I can and keep as little in memory as possible.
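Conceptually, what I'm after is a streaming merge of two ID-sorted sequences. A minimal sketch of the shape of that logic (in TypeScript purely for illustration; the Item type and async-iterator inputs are my assumptions, not anything from the libraries below):

```typescript
interface Item { id: number; payload: unknown; }

// Two-pointer merge of two ID-sorted streams: only the current head of
// each stream is held in memory, so output can start immediately.
async function* mergeById(
  a: AsyncIterator<Item>,
  b: AsyncIterator<Item>
): AsyncGenerator<[Item | null, Item | null]> {
  let ra = await a.next();
  let rb = await b.next();
  while (!ra.done || !rb.done) {
    if (!ra.done && (rb.done || ra.value.id < rb.value.id)) {
      yield [ra.value, null];       // ID present only in stream A
      ra = await a.next();
    } else if (!rb.done && (ra.done || rb.value.id < ra.value.id)) {
      yield [null, rb.value];       // ID present only in stream B
      rb = await b.next();
    } else {
      yield [ra.value, rb.value];   // ID present in both: merge
      ra = await a.next();
      rb = await b.next();
    }
  }
}
```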
What would be a good way to start?
I have taken a look at aleph and lamina, and also async.http.client. It seems these tools could help me, but I'm struggling to figure out how to write one piece of code that reacts once the same part of the response is available from both webservices.
You can do something like this (using aleph, which under the hood uses lamina's channel abstraction):
Use sync-http-request to create the two HTTP requests.
Get the :body from each of the two response objects created above. See the example at https://github.com/ztellman/aleph/wiki/Consuming-and-Broadcasting-a-Twitter-Stream
The :body is a lamina channel; use lamina's join to combine the two channels into one.
Subscribe to the resulting joined channel.
Now the subscription callback will receive each JSON object as soon as it arrives on either of the channels. You can then keep a local atom holding a map, where the key is the value on which you want to combine the results from the two channels and the value is a vector storing the items seen so far for that key. This goes something like the following (a sketch in code follows the list):
On receiving an item in the callback, check if the map in the local atom already has the key.
If the key is already there, store or otherwise process the two items (the one already in the map and the one just received) for that key, and remove the key from the map.
If the key is not there, add it with the value [item], i.e. a one-element vector holding the item just received.
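The bookkeeping in those three steps, sketched in TypeScript for illustration (the answer itself is about lamina channels; the Item shape and emit callback are assumptions):

```typescript
interface Item { id: number; payload: unknown; }

// Items waiting for their counterpart from the other channel, keyed by ID.
const pending = new Map<number, Item>();

// Invoked once per item, as items arrive from either joined channel.
function onItem(item: Item, emit: (a: Item, b: Item) => void): void {
  const match = pending.get(item.id);
  if (match !== undefined) {
    pending.delete(item.id);    // pair complete: drop it from memory
    emit(match, item);          // write the merged result downstream
  } else {
    pending.set(item.id, item); // first half of the pair: park it
  }
}
```

Because both inputs are sorted by ID, the map only ever holds items from whichever stream is currently ahead, so memory use stays proportional to how far the two streams drift apart rather than to the full response size.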
I have a Firestore database with data like this:
Now, I access this data with doc('mydoc').get().data() and it returns the data. But even without the data changing, if I make the same call again and again, I get a different response. I mean, the data is the same, but the order of the fields is different each time.
Here are my logs with two calls; see how the field order is random? Not just between objects in the same request, but between the same object in different requests.
I'm accessing this data in a Cloud Function and serving it as an API endpoint. I want to cache the response if the data (in the database) hasn't changed, but I can't, because the data (as returned by doc.get().data()) is constantly changing.
From what I could find, this might stem from ProtoBuf encoding.
My question: is there any way to get a consistent response to a firebase query when the underlying data isn't changing?
And if no, is my only option to JSON.stringify() the whole object before putting it into firestore? (I don't need to query within document objects.)
Edit for clarity: I am not expecting to know in advance the order of the fields being returned. I am expecting (hoping) that the order will be the same each time.
JSON object fields are unordered as per the JSON spec. Individual implementations of JSON are free to rearrange order however they see fit, and there's no surefire way to guarantee an order. See e.g. this answer.
This isn't a Firestore-specific problem, this is just generally how JSON objects work. You cannot and should not depend on the order of fields for any parsing or representation.
If display order is extremely important to you, you might want to investigate libraries like ordered-json.
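If the underlying goal is caching, one common workaround (sketched here in TypeScript; this is not a Firestore feature) is to canonicalize the data yourself by stringifying with recursively sorted keys, then compare or hash that string:

```typescript
// Stringify with keys sorted recursively, so the same data always
// produces the same string regardless of the field order Firestore
// happens to return it in.
function canonicalJson(value: unknown): string {
  return JSON.stringify(value, (_key, val) =>
    val !== null && typeof val === "object" && !Array.isArray(val)
      ? Object.fromEntries(
          Object.entries(val).sort(([a], [b]) => a.localeCompare(b))
        )
      : val
  );
}

// Both produce {"a":1,"b":2} and therefore compare equal.
canonicalJson({ b: 2, a: 1 }) === canonicalJson({ a: 1, b: 2 }); // true
```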
In a current project the UI posts an ordered list of ids of several files under one key to tell the server in which order the files need to be processed:
file[]=18&file[]=20&...
So far the order is preserved when handing this over from client to server; however, I could find no specification of whether the HTTP protocol keeps the parameters in the given order. So the question is: is it safe to depend on the given order, or should I implement a workaround that assigns each file id an explicit position? E.g.
file_18=0&file_20=1&...
Edit:
jQuery UI has a serialize method, which passes the parameters just in the initial way that I described above:
foo_1, foo_5, foo_2 will serialize to foo[]=1&foo[]=5&foo[]=2
This is for a sortable list, so I assume they know what they are doing.
It depends on the server. In general, the order of the data on the wire is guaranteed by the TCP protocol, so if your HTTP parser reads the parameters in that order and stores them in the same sequence, there is nothing to worry about: nothing along the way reorders these parameters.
HTTP doesn't specify the format of GET and POST data. So they just get passed as blobs of data.
It is up to your form data parser to maintain the order; I'm not aware of any that don't, at least for identically named fields.
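For what it's worth, the WHATWG URLSearchParams parser (available in browsers and Node, shown here in TypeScript) is one concrete example that preserves the order of repeated keys:

```typescript
const params = new URLSearchParams("file[]=18&file[]=20&file[]=7");

// getAll returns the values of a repeated key in the order they
// appeared in the query string.
console.log(params.getAll("file[]")); // ["18", "20", "7"]
```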
Our program so far: We have a process that involves multiple schemata, orchestrations and messages sent/received.
Our desire: To have an ID that links the whole process together when we log our progress into a SQL server table.
So far, we have a table that logs our progress, but when there are multiple messages it is very difficult to read, since BizTalk will sometimes process certain messages out of order.
E.g., we could have:
1 Beginning process for client1
2 Second item for client1
3 Third item for client1
4 Final item for client1
Easily followed if there's only one client being updated at a time. On the other hand, this will be much more likely:
1 Beginning process for client1
2 Beginning process for client2
3 Second item for client2
4 Third item for client2
5 Second item for client1
6 Third item for client1
7 Final item for client1
8 Final item for client2
It would be nice to have an ID throughout the whole thing so that the last listing could be ordered by this ID field.
What is the best and/or quickest way to do this? We had thought to add an ID that we would create at the moment the first orchestration is triggered, and keep passing that value to all the schemata and later orchestrations. This seems like a lot of work and would require us to modify all the schemata, which just seems wrong.
Should we even be wanting to have such an ID? Any other solutions that come to mind?
This may not exactly be the easiest way, but have you looked at this:
http://blogs.msdn.com/b/appfabriccat/archive/2010/08/30/biztalk-application-tracing-made-easy-with-biztalk-cat-instrumentation-framework-controller.aspx
Basically it's an instrumentation framework which allows you to event out from pipelines, maps, orchs, etc.
When you write out to the event trace you can use a "business key", which will tie multiple events together in a chain, similar to what you are describing.
Available here: http://btscatifcontroller.codeplex.com/
I'm not sure I fully understand all the details of your specific setup, but here goes:
If you can correlate the messages from the same client into a "long running" orchestration (which waits for subsequent messages from the same client), then the orchestration will have an automatically assigned ServiceId Guid, which will be kept throughout the orchestration.
As you say, for correlation purposes you would usually try to use natural keys within the existing incoming message schemas to correlate subsequent messages back to the running orchestration - this way you don't need to change the schemas. In your example, ClientId might be a good correlation key, provided that the same client cannot send multiple message 'sets' simultaneously. (Worst case, if you do add a new correlation key to the schemas, all systems involved in the orchestration will need to be changed to 'remember' this key and return it to you.) Again, assuming ClientId as the correlation key, in your example two orchestrations would be running simultaneously - one for Client 1 and one for Client 2.
However, for scalability and version-control reasons, (very) long-running orchestrations are generally to be avoided unless they are absolutely necessary (e.g. unless you can only trigger a process once all 4 client messages are received). If you decide to keep each message as a separate orchestration, or just mapped and filtered on a port, another way to 'track' the sets of messages is by using BAM - you can use a continuation to tie all the client messages back together, e.g. for the purpose of a report or such.
Take a look at BAM. It's designed to do exactly what you describe: see Using Business Activity Monitoring.
This book has a very good chapter about BAM, and this tool, by one of the authors of the book, can help you develop your BAM solution. And finally, a nice BAM poster.
Don't be put off by the initial complexity. When you get your head around it, BAM is one of the coolest features of BizTalk.
Hope this helps. Good luck.
BizTalk assigns various values in the message context that usually persist for the life of that message's processing, such as the initial MessageId. Will that work for you?
In our application we have to use an externally provided ID (from the customer). We have a multi-part message with this id in part of it. You might consider that as well.
You could create a UniqueId and StepId and pass them around in the message context. When a new process for a client starts, set UniqueId to a Guid and StepId to 1. As it gets passed to the next process, increment the StepId.
This would allow you to query events, grouped by client id and in the order (StepId) in which the events happened.
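A platform-neutral sketch of that scheme (TypeScript for illustration; in BizTalk the two values would live in the message context rather than in a ProcessContext object):

```typescript
import { randomUUID } from "node:crypto";

// Correlation info carried alongside every message in the process.
interface ProcessContext { uniqueId: string; stepId: number; }

// Called once, when the first orchestration for a client fires.
function startProcess(): ProcessContext {
  return { uniqueId: randomUUID(), stepId: 1 };
}

// Called each time the message is handed to the next process.
function nextStep(ctx: ProcessContext): ProcessContext {
  return { ...ctx, stepId: ctx.stepId + 1 };
}

// Log rows tagged with (uniqueId, stepId) can then be grouped per client
// process and ordered by step.
```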
Let's say I have a flat file containing incoming messages. Where would the appropriate place be to inject the logic that takes identifying information from the message and sets primary-key properties to link it to internal record IDs? For example, to map a customer's version of an order ID onto our internal order ID.
Sounds like you are looking to convert the incoming id to the internal id before sending the message further along.
There are multiple places to do this.
You could do it in a pipeline component that reads either directly from its run-time configuration or from a database. You could also do it in an orchestration.
The easiest and most suitable place, however, is probably a transformation map. Just make sure not to hard-code the translation table (which external id maps to which of your internal ids), as these mappings usually change a lot. Have the map do a lookup in a database, for example, to find the matching id (a sketch of this follows below).
Doing this kind of task in a map, compared to the other options, gives you a bit more flexibility, as you can then apply the map directly on a receive or send port. So if you don't need any workflow-based logic you can use a messaging-only pattern and skip orchestrations entirely (always preferable).
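A sketch of that lookup (TypeScript for illustration; in BizTalk this would typically be a database functoid or a helper class called from the map, and the Db interface and table name here are assumptions):

```typescript
// Minimal data-access interface assumed for this sketch.
interface Db {
  queryOne(sql: string, args: unknown[]): Promise<{ internal_id: string } | null>;
}

// Resolve the customer's order id to our internal order id via a lookup
// table, rather than hard-coding the translation in the map itself.
async function lookupInternalId(db: Db, externalId: string): Promise<string> {
  const row = await db.queryOne(
    "SELECT internal_id FROM order_id_map WHERE external_id = ?",
    [externalId]
  );
  if (!row) throw new Error(`No mapping for external id ${externalId}`);
  return row.internal_id;
}
```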
I would consider doing this type of conversion in a map.
I'd like some advice on designing a REST API which will allow clients to add/remove large numbers of objects to a collection efficiently.
Via the API, clients need to be able to add items to the collection and remove items from it, as well as manipulating existing items. In many cases the client will want to make bulk updates to the collection, e.g. adding 1000 items and deleting 500 different items. It feels like the client should be able to do this in a single transaction with the server, rather than requiring 1000 separate POST requests and 500 DELETEs.
Does anyone have any info on the best practices or conventions for achieving this?
My current thinking is that one should be able to PUT an object representing the change to the collection URI, but this seems at odds with the HTTP 1.1 RFC, which seems to suggest that the data sent in a PUT request should be interpreted independently from the data already present at the URI. This implies that the client would have to send a complete description of the new state of the collection in one go, which may well be very much larger than the change, or even be more than the client would know when they make the request.
Obviously, I'd be happy to deviate from the RFC if necessary but would prefer to do this in a conventional way if such a convention exists.
You might want to think of the change task as a resource in itself. So you're really PUT-ing a single object, which is a Bulk Data Update object. Maybe it's got a name, owner, and big blob of CSV, XML, etc. that needs to be parsed and executed. In the case of CSV you might want to also identify what type of objects are represented in the CSV data.
List jobs, add a job, view the status of a job, update a job (probably in order to start/stop it), delete a job (stopping it if it's running) etc. Those operations map easily onto a REST API design.
Once you have this in place, you can easily add different data types that your bulk data updater can handle, maybe even mixed together in the same task. There's no need to have this same API duplicated all over your app for each type of thing you want to import, in other words.
This also lends itself very easily to a background-task implementation. In that case you probably want to add fields to the individual task objects that allow the API client to specify how they want to be notified (a URL they want you to GET when it's done, or send them an e-mail, etc.).
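A sketch of how a client might drive such a job resource (TypeScript with fetch; the /bulk-updates URL and all field names are made up for illustration):

```typescript
// Inside an async function. Submit a bulk-update job: the job, not the
// individual items, is the resource being created.
const res = await fetch("/bulk-updates", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    name: "nightly import",
    format: "csv",
    notifyUrl: "https://client.example/done", // fetched on completion
    data: "id,op\n1000,add\n500,delete",
  }),
});
const { id } = await res.json();

// The usual job operations then map onto plain REST calls:
const status = await (await fetch(`/bulk-updates/${id}`)).json(); // view status
await fetch(`/bulk-updates/${id}`, { method: "DELETE" });         // stop/delete
```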
Yes, PUT creates/overwrites, but does not partially update.
If you need partial update semantics, use PATCH. See http://greenbytes.de/tech/webdav/draft-dusseault-http-patch-14.html.
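For example, a minimal sketch (the URL and the shape of the patch document are invented; the draft referenced above was later published as RFC 5789, and JSON-based patch formats such as JSON Patch, RFC 6902, standardize documents like this):

```typescript
// Inside an async function. PATCH transfers only the change, not the
// whole new state of the collection.
await fetch("/collections/42", {
  method: "PATCH",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    add: [{ id: 1001, name: "new item" }], // items to insert
    remove: [500, 501],                    // ids to delete
  }),
});
```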
You should use AtomPub. It is specifically designed for managing collections via HTTP. There might even be an implementation for your language of choice.
For the POSTs, at least, it seems like you should be able to POST to a list URL and have the body of the request contain a list of new resources instead of a single new resource.
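For example (a sketch; the /items URL and payload shape are assumptions):

```typescript
// Inside an async function. One POST carrying a batch of new resources
// instead of one request per resource.
await fetch("/items", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify([
    { name: "item 1" },
    { name: "item 2" },
    // ...and so on for the rest of the batch
  ]),
});
```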
As far as I understand it, REST means REpresentational State Transfer, so you should transfer the state from client to server.
If that means too much data going back and forth, perhaps you need to change your representation. A collectionChange structure would work, with a series of deletions (by id) and additions (with embedded full XML representations), POSTed to a handling interface URL. The interface implementation can choose its own method for applying the deletions and additions server-side.
The purest version would probably be to define the items by URL and have the collection contain a series of URLs. The new collection can be PUT after changes by the client, followed by a series of PUTs for the items being added, and perhaps a series of deletions if you want to actually remove the items from the server rather than just remove them from that list.
You could introduce a meta-representation of existing collection elements that don't need their entire state transferred, so in some abstract code your update could look like this:
{existing elements 1-100}
{new element foo with values "bar", "baz"}
{existing element 105}
{new element foobar with values "bar", "foo"}
{existing elements 110-200}
Adding (and modifying) elements is done by defining their values, deleting elements is done by not mentioning them in the new collection, and reordering elements is done by specifying the new order (if order is stored at all).
This way you can easily represent the entire new collection without having to re-transmit its entire content. Using an If-Unmodified-Since header makes sure that your idea of the content matches the server's idea (so that you don't accidentally remove elements that you simply didn't know about when the request was submitted).
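A sketch of such a request (TypeScript with fetch; the range/element syntax in the body is invented to mirror the abstract notation above):

```typescript
// Inside an async function. PUT the new collection state, referencing
// unchanged elements by range instead of re-transmitting them; the
// precondition makes the request fail with 412 if the server's copy
// changed since the client last fetched it.
await fetch("/collections/42", {
  method: "PUT",
  headers: {
    "Content-Type": "application/json",
    "If-Unmodified-Since": "Sat, 01 Jan 2022 00:00:00 GMT",
  },
  body: JSON.stringify({
    elements: [
      { existing: "1-100" },
      { new: { id: "foo", values: ["bar", "baz"] } },
      { existing: "105" },
      { new: { id: "foobar", values: ["bar", "foo"] } },
      { existing: "110-200" },
    ],
  }),
});
```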
The best way is:
1. Pass only an array of the IDs of the deletable objects from the front-end application to the Web API.
2. Then you have two options (a sketch of the second follows this list):
2.1. Web API way: find all the collections/entities by the ID array and delete them in the API, but you need to take care of dependent entities, like foreign-key related table data, too.
2.2. Database way: pass the IDs to the database side, find all records in the foreign-key tables and the primary-key tables, and delete them in that order, i.e. the F-key table records first, then the P-key table records.
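A sketch of the database way (TypeScript for illustration; the Db/Tx interfaces, table names, and placeholder syntax are all assumptions standing in for your actual data-access layer):

```typescript
// Minimal data-access interfaces assumed for this sketch.
interface Tx { run(sql: string, args: unknown[]): Promise<void>; }
interface Db { transaction(fn: (tx: Tx) => Promise<void>): Promise<void>; }

// Delete dependent (foreign-key) rows first, then the primary rows,
// inside one transaction so a failure leaves nothing half-deleted.
async function deleteByIds(db: Db, ids: number[]): Promise<void> {
  await db.transaction(async (tx) => {
    await tx.run("DELETE FROM order_items WHERE order_id IN (?)", [ids]); // F-key table
    await tx.run("DELETE FROM orders WHERE id IN (?)", [ids]);            // P-key table
  });
}
```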