After I adapted my SQL dataset to match my model, the sensor list disappeared from localhost, and the SQL command no longer returns the expected data (a table from sample.db). The default dataset was Hyperion, but I changed it to my own dataset, sensors.
https://forge.autodesk.com/en/docs/dataviz/v1/developers_guide/advanced_topics/sqlite_adapter/
What is the problem? What kind of table is expected to come from the SQL database?
Please note that the Data Viz Extension tutorial is a work-in-progress.
Before these tutorials are finished, I'd suggest that you take a look at a sample app that I've been working on: https://github.com/petrbroz/forge-iot-extensions-demo. It's also using the Data Visualization Extensions but it aims to be simpler and easier to reuse. By default, the IoT sensors, channels, and samples are defined in a simple JSON, but I've put together a separate code branch (sample/sqlite) where the IoT data is fetched from a sample sqlite database.
I'm designing a data provisioning module in a big data system. Data provisioning is described as
The process of providing the data from the Data Lake to downstream systems is referred to as Data Provisioning; it provides data consumers with secure access to the data assets in the Data Lake and allows them to source this data. Data delivery, access, and egress are all synonyms of Data Provisioning and can be used in this context.
in Data Lake Development with Big Data. I'm looking for standards for designing this module, including how to secure the data, how to identify whether a piece of data originated from the system, etc. I have searched on Google, but there are not many results for that keyword. Can you provide me with some advice or your own experience related to this problem? Every answer is appreciated.
Thank you!
Data Provisioning is mainly done by creating different Data Marts for your downstream consumers. For example, if you have a Big Data system with data from various sources aggregated into one Data Lake, you can create different Data Marts, such as 'Purchase', 'Sales', 'Inventory', etc., and let the downstream systems consume these. A downstream consumer who needs only 'Inventory' data then consumes only the 'Inventory' data mart.
Your best bet is to search for 'Data Marts'. For example, ref: https://panoply.io/data-warehouse-guide/data-mart-vs-data-warehouse/
Now you can fine-tune security and access control per data mart. For example:
'Sales' data is accessible only to sales reporting systems, users, groups, etc.
Tokenize data in the 'Purchase' data mart, etc. It all depends on the business requirement.
Another way is to export the aggregate data via data export mechanisms. For example, use 'Apache Sqoop' to offload data to an RDBMS. This approach is advisable when the data is small enough to be exported for the downstream consumer.
Another way is to create separate 'Consumer Zones' in the same Data Lake, for example a different Hadoop directory or Hive DB.
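A minimal sketch of the per-mart access control and tokenization ideas above, in Python. The mart names come from the answer; the group names, the `MART_ACL` table, the `can_read`, `tokenize` and `provision` helpers, and the salt are all hypothetical. A real deployment would normally rely on the platform's own authorization layer rather than application code.

```python
import hashlib

# Hypothetical access-control table: data mart -> groups allowed to read it.
MART_ACL = {
    "Purchase": {"procurement", "finance"},
    "Sales": {"sales_reporting"},
    "Inventory": {"warehouse", "sales_reporting"},
}


def can_read(group: str, mart: str) -> bool:
    """Return True if the group is allowed to consume the given data mart."""
    return group in MART_ACL.get(mart, set())


def tokenize(value: str, salt: str = "demo-salt") -> str:
    """Replace a sensitive value with a stable, non-reversible token.

    A salted SHA-256 digest is used here only as an illustration; which
    fields to tokenize, and how, depends on the business requirement.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def provision(group: str, mart: str, rows: list[dict], sensitive: set[str]) -> list[dict]:
    """Return the rows of a mart for a group, with sensitive fields tokenized."""
    if not can_read(group, mart):
        raise PermissionError(f"{group!r} may not read {mart!r}")
    return [
        {k: tokenize(str(v)) if k in sensitive else v for k, v in row.items()}
        for row in rows
    ]
```

Keeping the access table keyed by mart means a new consumer is granted access by adding a group to one entry, without touching the data itself.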
My aim is to create a visual representation of temperature data coming from a street light via sensors. I am reading that data from a MongoDB database. So far I have used Shiny and R to show static graphs of data from the database, but I am unable to find any way of showing continuously updating data in a moving line chart. I hope you get the picture.
Please refer to this example using node.js (I want to achieve similar results, but using R):
https://www.youtube.com/watch?v=nauRfoNNEQs
My question is: is there a way to show real-time data visualization in R, or do I necessarily have to use node.js with plotly?
I'm not sure how "live" you can get the stream, but you might want to look into this:
https://shiny.rstudio.com/reference/shiny/latest/reactivePoll.html
http://shiny.rstudio.com/gallery/reactive-poll-and-file-reader.html
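The idea behind `reactivePoll` is to run a cheap check on a timer and re-read the data only when the value returned by that check changes. Below is a language-neutral sketch of that pattern in Python. It is not Shiny's implementation; the `Poller` class and the in-memory `readings` list standing in for the MongoDB collection are hypothetical.

```python
from typing import Any, Callable


class Poller:
    """Re-run an expensive read only when a cheap check reports a change."""

    def __init__(self, check: Callable[[], Any], read: Callable[[], Any]) -> None:
        self._check = check      # cheap, e.g. latest timestamp or row count
        self._read = read        # expensive, e.g. fetch the rows to plot
        self._last_seen: Any = object()  # sentinel: never equal to a real value
        self._cached: Any = None
        self.reads = 0           # how many times the expensive read ran

    def poll(self) -> Any:
        """Call this on a timer; returns the cached data unless it changed."""
        current = self._check()
        if current != self._last_seen:
            self._last_seen = current
            self._cached = self._read()
            self.reads += 1
        return self._cached


# Stand-in for the sensor collection; a real app would query the database.
readings = [21.5, 21.7]
poller = Poller(check=lambda: len(readings), read=lambda: list(readings))

first = poller.poll()    # first call always reads
second = poller.poll()   # nothing changed, cached copy returned
readings.append(22.1)    # a new temperature sample arrives
third = poller.poll()    # change detected, data re-read
```

How "live" the chart feels then depends mostly on the polling interval and on how cheap the check is.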
For example, let's say I wish to analyze a month's worth of company data for trends. I plan on doing regression analysis and classification using an MLP.
A month's worth of data has ~10 billion data points (rows).
There are 30 dimensions to the data.
12 features are numeric (integer or float; continuous).
The rest are categoric (integer or string).
Currently the data is stored in flat files (CSV) and is processed and delivered in batches. Data analysis is carried out in R.
I want to:
change this to stream processed (rather than batch process).
offload the computation to a Spark cluster
house the data in a time-series database to facilitate easy read/write and query. In addition, I want the cluster to be able to query data from the database when loading the data into memory.
I have an Apache Kafka system that can publish the feed for the processed input data. I can write a Go module to interface this into the database (via CURL, or a Go API if it exists).
There is already a development Spark cluster available to work with (assume that it can be scaled as necessary, if and when required).
But I'm stuck on the choice of database. There are many solutions (here is a non-exhaustive list) but I'm looking at OpenTSDB, Druid and Axibase Time Series Database.
Other time-series databases that I have looked at briefly (InfluxDB, RiakTS and Prometheus) seem to be optimised for handling metric data.
Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3. - Apache Spark Website
In addition, the time-series database should store that data in a fashion that exposes it directly to Spark (as this is time-series data, it should be immutable and therefore satisfies the requirements of an RDD - therefore, it can be loaded natively by Spark into the cluster).
Once the data (or the data with its dimensionality reduced by dropping categoric elements) is loaded, regression and classification models can be developed and experimented with using sparklyr (an R interface for Spark) and Spark's machine learning library (MLlib; this cheatsheet provides a quick overview of the functionality).
So, my questions:
Does this seem like a reasonable approach to working with big data?
Is my choice of database solutions correct? (I am set on working with columnar store and time-series database, please do not recommend SQL/Relational DBMS)
If you have prior experience working with data analysis on clusters, from both an analytics and systems point of view (as I am doing both), do you have any advice/tips/tricks?
Any help would be greatly appreciated.
I have lots of data to wrangle and I need some help.
I have been using an Excel file that has two worksheets of interest to me. Each produces an OLAP pivot table with the data I need to work with. What I would like to do is move those (.odc) connections to Access queries so I don't have to paste all of this info out by hand, manipulate it, and then go through the whole process several more times.
One table is Throughput (number of parts through an operation(s)) by Part Number and by Date. The other is Hours Logged at the operation(s) by Part Number and by Date. I also have a master list of all part numbers with some more data that I have to mix in.
Biggest problem: Each chart is producing its own subset of dates and part numbers so I have to take care to match up the data to run the calculations. I've tried:
By hand. Got tired with that real quick.
Using LOOKUP, VLOOKUP, MATCH with INDIRECT and all sorts of tricks.
It's a mess. But I'm confident that if I can put the original pivot tables into Access I can add a few joins and write up a couple queries and it will turn out beautifully.
If worst comes to worst, I can copy/paste the pivot table data into Access by hand, but what if I want to change or expand the data set? I'd rather work with the raw data.
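The matching problem described above amounts to an outer join of the two tables on (part number, date). Here is a small Python sketch of that join, using made-up part numbers, dates, and values. The `hours_per_part` name and the choice to report `None` where one side is missing are assumptions for illustration only.

```python
from typing import Optional

Key = tuple[str, str]  # (part number, ISO date)

# Made-up sample data standing in for the two pivot tables.
throughput: dict[Key, int] = {
    ("PN-100", "2024-01-02"): 40,
    ("PN-100", "2024-01-03"): 35,
    ("PN-200", "2024-01-02"): 12,
}
hours: dict[Key, float] = {
    ("PN-100", "2024-01-02"): 8.0,
    ("PN-200", "2024-01-02"): 6.0,
    ("PN-200", "2024-01-03"): 4.0,
}


def hours_per_part(
    throughput: dict[Key, int], hours: dict[Key, float]
) -> dict[Key, Optional[float]]:
    """Outer-join the two tables on (part, date) and compute hours per part.

    Keys present in only one table, or with zero throughput, map to None
    instead of being silently dropped.
    """
    result: dict[Key, Optional[float]] = {}
    for key in sorted(throughput.keys() | hours.keys()):
        parts = throughput.get(key)
        logged = hours.get(key)
        if parts and logged is not None:
            result[key] = logged / parts
        else:
            result[key] = None
    return result


joined = hours_per_part(throughput, hours)
```

Taking the union of keys makes the mismatched dates and part numbers visible as explicit gaps rather than rows that quietly fail to line up.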
EDIT:
The data is held on SQL Server and I cannot change that.
The Excel pivot tables use .odc files for the connection. They give the following connection strings:
Provider=MSOLAP.3;Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=[MyCatalog];Data Source=[MySource];MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error
Provider=MSOLAP.4;Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=[MyCatalog];Data Source=[MySource];MDX Compatibility=1;Safety Options=2;MDX Missing Member Mode=Error
(I replaced the actual catalog and source)
Can I use the .odc file information to create a pass through query in Access?
Have you considered using a proper OLAP server?
Comparison of OLAP Servers
Once it is set up, you'll be able to connect your Excel pivot tables to the server (as well as other reporting tools).
Talked to our IT dept. The guy who built the Cubes is working on querying the same info into MS Access for me.
Thanks everyone.
I'm migrating/consolidating multiple FMP6 databases to a single C# application backed by SQL Server 2008. The problem I have is how to export the data to a real database (SQL Server) so I can work on data quality and normalisation, which will be significant: there are a number of repeating fields that need to be normalised into child tables.
As I see it, there are a few different options, most of which involve either connecting to FMP over ODBC and using an intermediary to copy the data across (either custom code or MS Access linked tables), or exporting to a flat file format (CSV with no header, or XML) and either using Excel to generate insert statements or writing custom code to load the file.
I'm leaning towards writing some custom code to do the migration over ODBC (like this article does, but in C# instead of Perl), but I'm concerned about the overhead of writing a migrator that will only be used once (as soon as the new system is up, the existing DBs will be archived)...
A few joyful little caveats: in this version of FMP there's only one table per file, and a single column may have multi-value attributes, separated by hex 1D, which is the ASCII group separator, of course!
Does anyone have experience with similar migrations?
I have done this in the past, but using MySQL as the backend. The method I use is to export as CSV or merge format and then use the LOAD DATA INFILE statement.
SQL Server may have something similar; maybe this link on bulk insert would help.
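A minimal sketch of the flat-file route in Python: read a headerless CSV export, split the multi-value column on the ASCII group separator (hex 1D), and produce parent rows plus child rows ready for a bulk load. The column layout (id, name, repeating phone field), the sample data, and the `normalise` helper are hypothetical. The question targets C#, so this only illustrates the shape of the transformation.

```python
import csv
import io

GROUP_SEP = "\x1d"  # ASCII group separator used for repeating fields

# Hypothetical headerless export: id, name, repeating phone field.
EXPORT = 'id1,Alice,555-0100\x1d555-0101\nid2,Bob,\n'


def normalise(export_text: str):
    """Split one flat table into parent rows and child rows.

    Returns (parents, children) where children holds one
    (parent_id, position, value) tuple per repeated value.
    """
    parents = []
    children = []
    for record_id, name, repeating in csv.reader(io.StringIO(export_text)):
        parents.append((record_id, name))
        # Skip empty entries so a blank field yields no child rows.
        values = [v for v in repeating.split(GROUP_SEP) if v]
        for position, value in enumerate(values, start=1):
            children.append((record_id, position, value))
    return parents, children


parents, children = normalise(EXPORT)
```

The two lists can then be written out as separate files and loaded with whatever bulk-load facility the target database provides.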