Is there a way to programmatically write data to Time Series Insights Gen2?
In our case we regularly compute values based on the raw data. The computation requires more logic than simply applying aggregates to the values of a single time series.
I want to re-ingest the result of the computation and its timestamp into TSI so I can query it along with the raw data.
Is the only way to achieve this to send an event with the computation result to IoT Hub or Event Hub, the same way it was for Gen1 as seen here, or am I missing something?
It still stands true for Gen2 that the only way to add data to TSI is through an Event Hub or IoT Hub - see: https://learn.microsoft.com/en-us/azure/time-series-insights/concepts-streaming-ingestion-event-sources
Please add a new feature request here with your scenario details so others can vote and we can prioritize ways to ingest data other than through IoT Hub or Event Hub.
For your scenario, you can send the result of the computation to your IoT Hub or Event Hub via HTTP (both services expose an HTTP endpoint). Be aware, though, that it is not advisable to mix historical data with real-time data, so if you do this you should run the calculation as soon as possible, within the same day as the original event or sooner.
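For illustration, here is a minimal sketch of pushing a computed value into the event source using the @azure/event-hubs Node.js SDK (the SDK handles the transport for you, as an alternative to calling the HTTP endpoint directly). The connection string, hub name, and payload fields are placeholders, and the timestamp property name depends on how your TSI event source is configured:

import { EventHubProducerClient } from "@azure/event-hubs";

// Placeholders: use the connection string and name of the event hub that is
// already configured as the TSI event source.
const connectionString = "<event-hub-namespace-connection-string>";
const eventHubName = "<event-hub-name>";

async function sendComputedValue(): Promise<void> {
  const producer = new EventHubProducerClient(connectionString, eventHubName);
  const batch = await producer.createBatch();

  // The body should mirror the shape of your raw telemetry so TSI can parse
  // it; "deviceId", "timestamp" and "rollingAverage" are hypothetical fields.
  batch.tryAdd({
    body: {
      deviceId: "computed-series-1",
      timestamp: new Date().toISOString(),
      rollingAverage: 42.7,
    },
  });

  await producer.sendBatch(batch);
  await producer.close();
}

sendComputedValue().catch(console.error);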
I have been scouring the internet for days on a solution to this problem.
That is, how do you handle aggregation when there is no network connection? I have a task management app that aggregates metadata about user tasks. For example, a task can contain tags that are aggregated and shown to the user in a dashboard on a daily basis. This would be easy if the user were always online, since I could use a transaction or Cloud Function to aggregate, but when the user is offline the aggregation will appear incorrect until the network connection is restored.
Aggregation queries are explained here:
https://firebase.google.com/docs/firestore/solutions/aggregation
Which states a limitation:
Offline support - Client-side transactions will fail when the user's device is offline, which means you need to handle this case in your app and retry at the appropriate time.
However, there has yet to be any example or documentation on how to 'handle this case'. How would I go about addressing this problem?
Some thoughts:
I could cache the item if a transaction fails. This item will be aggregated on top of the stored aggregation. However, going down this route would mean that I can't take advantage of Firestore's "offline mode", because I'm using my own cache on every write while offline anyway.
I could aggregate on demand. That is, never store the aggregation. This is going to be very heavy on read depending on how many tasks a user has. Furthermore, if the aggregation will need to be shared as insights to other users, this option will not work because other users do not have access to the tasks.
I'm at a loss and any help would be appreciated, thanks!
After a lot of research and trial and error I found a solution that can address this problem gracefully.
FieldValue.increment to the rescue.
What FieldValue.increment does is bypass the use of a transaction while respecting Firestore's default offline cache behaviour. It requires using set or update on the field directly. The drawback is the inability to use withConverter on the collection for type safety. I'm willing to live with that drawback considering how useful FieldValue.increment is.
I've done multiple tests and can confirm that the values can be incremented/decremented multiple times locally while offline. This offline value is reflected in a get or snapshot call to the cache. When the network connection is restored, the values are updated on the server.
The value itself is not stored on the cache, it simply stores the "difference" in the FieldValue sentinel for when it is time to update it on the server.
This method only works for incrementing and decrementing values. Storing averages is not possible, because the true total number of items is not known at the time of the calculation when offline.
Instead, the total number of items is stored alongside the total value, and the average is calculated when and as needed. This way the average is always accurate from a local perspective while offline, and it is also accurate online once the total value and count have been synced.
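To illustrate, here is a minimal sketch of the pattern with the modular web SDK; the collection and field names (userStats, totalValue, taskCount) are assumptions, not something the docs prescribe:

import { initializeApp } from "firebase/app";
import { doc, getFirestore, increment, setDoc } from "firebase/firestore";

const db = getFirestore(initializeApp({ /* your config */ }));

// Works online or offline: the increments are applied to the local cache
// immediately and synced to the server once the connection is restored.
// The returned promise only resolves after the server acknowledges the
// write, so it is deliberately not awaited here.
function recordTaskValue(uid: string, value: number): void {
  const statsRef = doc(db, "userStats", uid);
  setDoc(
    statsRef,
    {
      totalValue: increment(value), // running sum
      taskCount: increment(1),      // running count
    },
    { merge: true }
  ).catch(console.error);
}

// The average is derived on read instead of being stored, so it stays
// consistent with whatever totals are currently in the cache.
function averageOf(stats: { totalValue: number; taskCount: number }): number {
  return stats.taskCount === 0 ? 0 : stats.totalValue / stats.taskCount;
}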
The goal is to generate events on every participating node when a state is changed that includes the business action that caused the change. In our case, Business Action maps to the Transaction command and provides the business intent or what the user is doing in business terms. So in our case, where we are modelling the lifecycle of a loan, an action might be to "Close" the loan.
We model Event at a state level as follows: Each Event encapsulates a Transaction Command and is uniquely identified by a (TxnHash, OutputIndex) and a created/consumed status.
We would prefer a polling mechanism to generate events on demand, but an async approach that generates events on ledger changes would be acceptable. Either way, our challenge is in getting the Command from the Transaction.
We considered querying the States using the Vault Query API vaultQueryBy() for the polling solution (or vaultTrackBy() for the async Observable stream solution). We were able to create a flow that gets the transaction for a state. This had to be done in a flow, as Corda deprecated the function that would have allowed us to do this in our Spring Boot client. In the client we use vaultQueryBy() to get a list of States. Then we call a flow that iterates over the states, gets the txHash from each StateRef and then calls serviceHub.validatedTransactions.getTransaction(txHash) to get the SignedTransaction from which we can ultimately retrieve the Command. Is this the best or recommended approach?
Alternatively, we have also thought of generating events from the Transaction itself by querying for transactions and then building the Event for each input and output state in the transaction. If we go this route, what's the best way to query transactions from the vault? Is there an Observable stream-based option?
I assume this mapping of states to command is a common requirement for observers of the ledger because it is standard to drive contract logic off the transaction command and quite natural to have the command map to the user intent.
What is the best way to generate events that encapsulate the transaction command for each state created or consumed on the ledger?
If I understand correctly, you're attempting to get notified when certain types of ledger updates occur (open, approved, closed, etc.).
First: Asynchronous notifications are best practice in Corda. Polling should be avoided because of the load that constant querying puts on the node and the delays it introduces. Corda provides several mechanisms for Observables which you can use: https://docs.corda.net/api/kotlin/corda/net.corda.core.messaging/-corda-r-p-c-ops/vault-track-by.html
Second: Avoid querying transactions from the database as these are intended to be internal to the node. See this answer for background on why to avoid transaction querying. In general only tables that begin with "VAULT_*" are intended to be queried.
One way to solve your use case would be a "status" field which reflects the command that was used to produce the current state. For example: if a "Close" command was used to produce the state, its status field could be "closed". This way you could use vaultTrackBy as above to look at each state's status field and infer the action that occurred.
Just to finish up on my comment: while the approach met the requirements, the problem with this solution is that we have to add and maintain our own code across all relevant states to capture transaction-level information that the platform already tracks. I would think a better solution would be for the platform to give consumers access to transaction-level information (selectively, perhaps) just as it does for states. After all, the transaction is, in part, a business/functional construct that is meaningful at the client application level. For example, if I am "transferring" a loan, that may be a complex business transaction involving many input and output states, and it may be an important construct/notion for the client application to manage.
I have set up an IoT Hub that receives messages from a device. The Hub is getting the messages, and I am able to see the information reaching and being processed in TSI.
Metrics from TSI Azure
However, when trying to view the data in the TSI environment I get an error message saying there is no data.
I think the problem might have to do with setting up the model. I have created a hierarchy, types, and an instance.
model view - instance
As I understand it, the instance fields are what is needed to reference the set of data. In my case, the JSON message being pushed through the IoT Hub has a field called dvcid, in which "1" is the name of the only device sending values.
Am I doing something wrong?
How can I check the data being stored in TSI, like the rows and columns?
Is there a tutorial or example online where I can see the raw data going in and the model creation based on that data?
Thanks in advance
I also had a similar issue when I first tried using TSI. My problem was due to the timestamp I sent, which was not in a proper format (the formatter produced things like "/Date(1547048015593+0100)/", which is not a typical way of encoding dates). When I specified the 'o' (round-trip) date-to-string format, it worked fine afterwards:
message.Timestamp = DateTime.UtcNow.ToString("o");
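If the device code is JavaScript/TypeScript rather than C#, the equivalent is an ISO 8601 string; the payload below is only a guess at the asker's message shape, and the timestamp property name must match whatever the TSI event source is configured to use:

// toISOString() produces e.g. "2019-01-09T14:33:35.593Z", the same
// round-trip style as DateTime.UtcNow.ToString("o") in C#.
const message = {
  dvcid: "1",                          // hypothetical device id field
  value: 23.4,                         // hypothetical telemetry value
  timestamp: new Date().toISOString(),
};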
Hope this helps
I am using Kafka as a pipeline to store analytics data before it gets flushed to S3 and ultimately to Redshift. I am thinking about the best architecture to store data in Kafka, so that it can easily be flushed to a data warehouse.
The issue is that I get data from three separate page events:
When the page is requested
When the page is loaded
When the page is unloaded
These events fire at different times (all usually within a few seconds of each other, but up to minutes/hours away from each other).
I want to eventually store a single event about a web page view in my data warehouse. For example, a single log entry as follows:
pageid=abcd-123456-abcde, site='yahoo.com' created='2015-03-09 15:15:15' loaded='2015-03-09 15:15:17' unloaded='2015-03-09 15:23:09'
How should I partition Kafka so that this can happen? I am struggling to find a partition scheme in Kafka that does not need a process using a data store like Redis to temporarily store data while merging the CREATE (initial page view) and UPDATE (subsequent load/unload events).
Assuming:
you have multiple interleaved sessions
you have some kind of a sessionid to identify and correlate separate events
you're free to implement consumer logic
absolute ordering of merged events is not important
wouldn't it then be possible to use separate topics with the same number of partitions for the three kinds of events and have the consumer merge those into a single event during the flush to S3?
As long as you have more than one partition in total, you would then have to make sure to use the same partition key for the different event types (e.g. a hash of the sessionid) so that they end up in the corresponding partition of each topic. They could then be merged using a simple consumer which reads the three topics from one partition at a time. Kafka guarantees ordering within partitions but not between partitions.
Big warning for the edge case where a broker goes down between page request and page reload though.
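As a rough sketch of that setup (using the kafkajs client; the topic names, message fields, and in-memory buffering are all assumptions made for brevity), key every message by sessionid so the three topics partition identically, then let a consumer buffer partial page views until all three events have arrived:

import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "pageview-merger", brokers: ["localhost:9092"] });

// Producer side: using the same key (sessionid) sends all three event types
// for a session to the same partition number in each topic.
export async function publish(topic: string, sessionId: string, event: object) {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic,
    messages: [{ key: sessionId, value: JSON.stringify(event) }],
  });
  await producer.disconnect();
}

// Consumer side: merge request/load/unload events per pageid before flushing.
type PartialView = { created?: string; loaded?: string; unloaded?: string };
const pending = new Map<string, PartialView>();

export async function runMerger() {
  const consumer = kafka.consumer({ groupId: "pageview-merger" });
  await consumer.connect();
  for (const topic of ["page-requested", "page-loaded", "page-unloaded"]) {
    await consumer.subscribe({ topic, fromBeginning: true });
  }

  await consumer.run({
    eachMessage: async ({ topic, message }) => {
      const event = JSON.parse(message.value!.toString());
      const view = pending.get(event.pageid) ?? {};
      if (topic === "page-requested") view.created = event.timestamp;
      if (topic === "page-loaded") view.loaded = event.timestamp;
      if (topic === "page-unloaded") view.unloaded = event.timestamp;
      pending.set(event.pageid, view);

      if (view.created && view.loaded && view.unloaded) {
        // All three events seen: emit the single merged record (e.g. to S3).
        console.log({ pageid: event.pageid, ...view });
        pending.delete(event.pageid);
      }
    },
  });
}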
Most examples of Flux use a todo or chat example. In all those examples, the data set you are storing is somewhat small and can be kept locally, so I'm not exactly sure if my planned use of stores falls in line with the Flux "way".
The way I intend to use stores is somewhat like ORM repositories: a way to access data in multiple ways and persist data to the data service, whatever that might be.
Lets say I am building a project management system. I would probably have methods like these for data retrieval:
getIssueById
getIssuesByProject
getIssuesByAssignedUser
getIssueComments
getIssueCommentById
etc...
I would also have methods like this for persisting data to the data service:
addIssue
updateIssue
removeIssue
addIssueComment
etc...
The one main thing I would not do is locally store any issue data (and, for that matter, most store data that relates to a data service). Most of the data is important to have fresh, because the issue status may have been updated since I last retrieved the issue. All my data retrieval methods would probably always make an API request for the latest data.
Is this against the Flux "way"? Are there any issues with going about Flux in this way?
I wouldn't get too hung up on the term "store". You need to create application state in some way if you want your components to render something. If you need to clear that state every time a different request is made, no problem. Here's how things would flow with getIssueById(), as an example:
component calls store.getIssueById(id)
the store returns an empty object since the issue isn't in its cache
the store calls action.fetchIssue(id)
component renders empty state
server responds with issue data and calls action.receiveIssue(data)
store caches that data and dispatches a change event
component responds to event by calling store.getIssueById(id)
the issue data is returned
component renders data
Persisting changes would be similar, with only the most recent server response being held in the store.
user interaction in component triggers action.updateIssue(modifiedIssue)
store handles action, sending changes to server
server responds with updated issue and calls action.receiveIssue(data)
...and so on with the last 4 steps from above.
As you can see, it's not really about modeling your data, just controlling how it comes and goes.
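For what it's worth, here is a minimal sketch of that read flow in TypeScript, using the flux Dispatcher and a Node EventEmitter; the action names, store shape, and API endpoint are assumptions rather than a prescribed implementation:

import { Dispatcher } from "flux";
import { EventEmitter } from "events";

type Issue = { id: string; title: string; status: string };
type Action = { type: "RECEIVE_ISSUE"; issue: Issue };

const dispatcher = new Dispatcher<Action>();

// Action creator: fetches from a hypothetical API and dispatches the result.
function fetchIssue(id: string): void {
  fetch(`/api/issues/${id}`)
    .then((res) => res.json())
    .then((issue: Issue) => dispatcher.dispatch({ type: "RECEIVE_ISSUE", issue }));
}

class IssueStore extends EventEmitter {
  private cache = new Map<string, Issue>();

  constructor() {
    super();
    dispatcher.register((action) => {
      if (action.type === "RECEIVE_ISSUE") {
        // Only the most recent server response is kept for each issue.
        this.cache.set(action.issue.id, action.issue);
        this.emit("change");
      }
    });
  }

  // Returns whatever we have right now and kicks off a fetch on a cache miss;
  // the component renders an empty state until the change event fires.
  getIssueById(id: string): Issue | undefined {
    if (!this.cache.has(id)) {
      fetchIssue(id);
    }
    return this.cache.get(id);
  }
}

export const issueStore = new IssueStore();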