Using R with IoT data logged using "Hysteresis Logging" (only log differences)

We want to perform data analysis on IoT data stored in our SQL Server database. The data is generated by IoT devices, some of which use hysteresis-based logging for data compression: a value is only logged when it has changed for that particular property.
As an example, here's how it looks inside the database:
The Float and Timestamp columns are the values we're actually interested in; the rest is metadata. AssetTypePropertyId links to the name of a property, which describes what the value is actually about.
We can reshape this data into a 2D matrix, which already makes it more usable. However, since the data is compressed with hysteresis logging, we need to 'recreate' the missing values.
To give an example, we want to go from this 2D dataset:
To a set which has all the gaps filled in:
This is generated under the assumption that the previous value is valid as long as no new value has been logged for it.
My question: How can this transformation be done in R?
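One minimal sketch of how this could be done, assuming the long-format table has the Timestamp, AssetTypePropertyId and Float columns described above (the sample values here are made up): tidyr::pivot_wider() builds the 2D set and tidyr::fill() carries the last logged value forward into the gaps. zoo::na.locf() applied to the wide data would be an alternative to fill().

library(dplyr)
library(tidyr)

# Hypothetical sample in the long shape described above:
# one row per logged change (timestamp, property id, value)
raw <- data.frame(
  Timestamp           = as.POSIXct(c("2019-01-01 00:00:00", "2019-01-01 00:00:00",
                                     "2019-01-01 00:05:00", "2019-01-01 00:10:00")),
  AssetTypePropertyId = c(1, 2, 1, 2),
  Float               = c(20.5, 0.0, 21.0, 1.0)
)

filled <- raw %>%
  # reshape into the 2D set: one row per timestamp, one column per property
  pivot_wider(names_from = AssetTypePropertyId, values_from = Float,
              names_prefix = "Property_") %>%
  arrange(Timestamp) %>%
  # fill the gaps downwards: the previous value is assumed valid
  # until a new value is logged for that property
  fill(-Timestamp, .direction = "down")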

Related

How to decide whether to store data under 1 column or multiple columns?

I am recording the data for 3 counters and have the choice of using either of the following schemas:
Date|Sensor|Value
Date|Sensor1Value|Sensor2Value|Sensor3Value
When visualizing using either of the above schemas, the x-axis will be the date. In the case of the 1st schema, the sensor will be the legend and the value will be on the y-axis.
In the case of the 2nd schema, each column will need to be added as a separate y-axis series, and there will be no legend.
Which of the above 2 schemas is better suited for reporting (plotting graphs)?
The best answer will depend on 3 things:
the type of visualizations you're trying to build
which visualization tool(s) you're planning to use, and
if you plan to add more sensor values in the future
Essentially, you're either going to pivot your data when storing it (the second schema, with one column per value) or you're going to store it long and rely on the visualization tool or your database query to perform the pivot logic.
In my experience working with BI and analytics tools, it's almost always better to store data using the first model (Date | Sensor | Value). This provides the most flexibility across visualization tools, and if you need to add more sensor values in the future, you won't have to modify your table structure. If you need to convert the data into the second model, you can always build a view or temp table that uses a dynamic pivot query, or reshape it at read time, as sketched below.
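To keep with the R used elsewhere in this thread, here is a rough sketch of that reshaping: data stored in the first (long) schema can be pivoted into the second (wide) schema on demand, so nothing is lost by storing it long. The column names and values below are hypothetical.

library(tidyr)

# Data stored in the first schema: Date | Sensor | Value (made-up sample)
readings <- data.frame(
  Date   = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-02")),
  Sensor = c("Sensor1", "Sensor2", "Sensor3", "Sensor1"),
  Value  = c(10, 20, 30, 12)
)

# Pivot into the second schema only when a report or plot needs it:
# Date | Sensor1 | Sensor2 | Sensor3
wide <- pivot_wider(readings, names_from = Sensor, values_from = Value)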

AnalysisServices: Cannot query internal supporting structures for column because they are not processed. Please refresh or recalculate the table

I'm getting the following error when trying to connect Power BI to my tabular model in AS:
AnalysisServices: Cannot query internal supporting structures for column 'table'[column] because they are not processed. Please refresh or recalculate the table 'table'
It is not a calculated column and the connection seems to work fine on the local copy. I would appreciate any help with this!
This would depend on how you are processing the data within your model. If you have just done a Process Data, then the accompanying meta objects such as relationships have not yet been built.
Every column of data that you load needs to also be processed in this way regardless of whether it is a calculated column or not.
This can be achieved by running a Process Recalc on the database, or by loading your tables or table partitions with a Process Full/Process Default rather than just a Process Data; Process Full and Process Default automatically run the Process Recalc once the data is loaded.
If you have a lot of calculated columns and tables that result in a Process Recalc taking a long time, you will need to factor this in to your refreshes and model design.
If you run a Process Recalc on your database or a Process Full/Process Default on your table now, you will no longer have those errors in Power BI.
More in depth discussion on this can be found here: http://bifuture.blogspot.com/2017/02/ssas-processing-tabular-model.html

Azure Time Series - Can't get data

I have set up an IotHub that receives messages from a device. The Hub is getting the messages, and I am able to see the information reaching and being processed in TSI.
Metrics from TSI Azure
However, when trying to view the data in the TSI environment, I get an error message saying there is no data.
I think the problem might have to do with setting up the model. I have created a hierarchy, types, and an instance.
model view - instance
As I understand it, the instance fields are what is needed to reference the set of data. In my case, the JSON message being pushed through the IoT Hub has a field called dvcid, in which "1" is the name of the only device sending values.
Am I doing something wrong?
How can I check the data being stored in TSI, like the rows and columns?
Is there a tutorial or example online where I can see the raw data going in and the model creation based on that data?
Thanks in advance
I also had a similar issue when I first tried using TSI. My problem was due to the timestamp I sent, which was not in a proper format (the formatter sent things like "/Date(1547048015593+0100)/", which is not a typical way of encoding dates). When I specified the 'o' (round-trip) date-to-string format, it worked fine afterwards:
message.Timestamp = DateTime.UtcNow.ToString("o");
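For comparison only (the snippet above is C#, while the rest of this thread uses R), an ISO 8601 / round-trip style UTC timestamp similar to what the 'o' format produces could be built in R like this; the variable name is just for illustration:

# Produces e.g. "2019-01-09T15:33:35.593Z" - an ISO 8601 UTC timestamp
iso_ts <- format(Sys.time(), "%Y-%m-%dT%H:%M:%OS3Z", tz = "UTC")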
Hope this helps

R Models with Factors in Tableau

I'm attempting to build a model for sales in R and integrate it into Tableau so I can compare the predictions with the actual values. I'm integrating it by creating a calculated field that uses Tableau's SCRIPT_REAL function to return the model's predicted value for each record. The records all come from a MySQL database connection. The issue I'm having comes from using factors in my model (for example, month).
If I want to group all of the predictions by day of week, Tableau can't perform the calculation because it tries to aggregate each field I'm using before passing it into the model. When it tries to aggregate month, not all of the values are the same, so it instead returns "*" (Tableau's indicator for multiple values). Obviously a prediction value can't then be reached, because there is no value associated with "*". Essentially what I'm trying to do is get a prediction value for each record that I have, and then aggregate those prediction values in various ways.
Okay, now I can understand a little better what you're talking about. A twbx with dummy data (and even a dummy model, as long as it reproduces the same problem you're facing) would help even more, but let me try to say a couple of things.
One thing that is important to understand is that SCRIPT functions are like table calculations, i.e., they are performed only with aggregated fields, they are computed last (after all aggregations, measures and regular calculations) and you can define the level of aggregation you want.
So, if you want to display values on a daily basis, put your date field on page, go to the day level, and for the calculation partition by DAY(date_field). If you want by week, same thing.
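As a rough, hypothetical sketch (the field names, the saved model file and the .arg mapping are all made up, not taken from the question), such a calculated field might look like the one below. Each .argN arrives in R as a vector with one element per row of the partition, so the month factor has to be rebuilt inside the script:

SCRIPT_REAL("
  newdata <- data.frame(
    month   = factor(.arg1),          # rebuild month as a factor (ideally with the training levels)
    traffic = .arg2
  )
  fit <- readRDS('sales_model.rds')   # hypothetical saved model
  as.numeric(predict(fit, newdata = newdata))
",
ATTR([Month]), SUM([Traffic]))

For this to return anything useful, [Month] has to be at a level of detail where ATTR() resolves to a single value per partition; otherwise it comes back as "*" and the prediction fails, which is exactly the aggregation problem described in the question.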
I find table calculations (including R scripts) very useful when they are an end in themselves, i.e. the calculation is the answer. They're not so useful (or rather, not so easily manipulable) when they're a means to an end, like an intermediate step before a final calculation that gets to the answer. That is mainly because the level of aggregation is based on the fields that are on the page. So, for instance, if I have multiple orders from different clients and want to assess the average order value by customer, a table calculation is great: WINDOW_AVG(SUM(order_value)) partitioned by customer. If, for some reason, I want to sum all these values, then it's tricky. I can't do it directly, as the average order value isn't stored anywhere and can't be retrieved without all the clients being on the page. So what I usually do is create the table with all customers, export it to an mdb, and reconnect to it in Tableau.
I said all this because it might be your problem, when you say "Tableau can't perform the calculation because it tries to aggregate each field I'm using before passing it into the model". Yes, Tableau does that, and there's nothing you can do about it other than working around it. Creating an intermediate table in Tableau, exporting it, and connecting to it again in Tableau might be one answer. Performing the calculations in R, exporting the results, and then connecting to them in Tableau might be another.
But again, without actually seeing what you're trying to do, it's hard to say what you need to do

Data Structure for storing dynamic data

I want to store data which is dynamic in nature. The user can create forms for capturing data. The data can be stored in various formats, as per the configuration; the major ones are an RDBMS and an XML file. The XML format makes it pretty easy to store dynamic data and load it back.
I am not able to devise a data structure for an RDBMS. I currently store data in a key-value format and do a PIVOT when fetching it. For fields which have multiple values, I store them as CSV in the value column.
Is there a better way for storing such dynamic data which helps in performance and extensibility?
Without knowing more about your application it is hard to say.
You could save the data as XML in a BLOB in the database. That would mean all your data was (sorta) handled the same way (as XML).
The other approach would be to change your database structure to hold nested data (which appears to be your problem). So instead of a straight key-value table, you might have a table structure that can reference itself (e.g. parent - key - value), plus a header table to hold the top-level keys.
The real question though is why you want to use a database to hold the data. It seems the real problem is trying to fit a round peg into a square hole (or vice versa).
