Best format to store incremental data in regularly using R - r

I have a database that is used to store transactional records, these records are created and another process picks them up and then removes them. Occasionally this process breaks down and the number of records builds up. I want to setup a (semi) automated way to monitor things, and as my tool set is limited and I have an R shaped hammer, this looks like an R shaped nail problem.
My plan is to write a short R script that will query the database via ODBC, and then write a single record with the datetime, the number of records in the query, and the datetime of the oldest record. I'll then have a separate script that will process the data file and produce some reports.
What's the best way to create my datafile, At the moment my options are
Load a dataframe, add the record and then resave it
Append a row to a text file (i.e. a csv file)
Any alternatives, or a recommendation?

I would be tempted by the second option because from a semantic point of view you don't need the old entries for writing the new ones, so there is no reason to reload all the data each time. It would be more time and resources consuming to do that.

Related

How to keep track of database changes

I'm working with Progress 11.6 appBuilder and procedure editor (and Data Dictionary).
Regularly we are doing modifications at the customer's database, there are two types of modifications:
Modifications of the structure: those are done, using interactive GUI of the data dictionary.
Modifications of the data: those are done, using the procedure editor
An example of a data modification in the procedure typically looks like this:
FOR EACH Table1 WHERE Table1.Field1 = <value>:
CREATE Table2.
Table2.Field1 = <value>.
Table2.Field2 = <some-other-value>.
END.
This is completely in contradiction with one of the basics of software delivery quantity, repeatability: there is no way to return to the previous situation!
Therefore I'm looking for ways to do this in an (automatable) repeatable way, hence my questions:
What can we use instead of the interactive GUI of data dictionary (without undo feature) in order to perform/undo database structure modifications?
What can we do in order to undo database data modifications? (Is there something like a Oracle redo log or a Oracle archive log in Progress?)
In case you say "What are you talking about? You can do "Undo transaction" in the data dictionary.", I mean the following:
I perform a transaction using the data dictionary, I leave the data dictionary and the day later the customer complains. When I open the data dictionary at that moment, the "Undo transaction" feature is disabled.
At a high level you should be creating "df files" (DDL scripts) and applying those to the customer database rather than manually making changes. There are many ways to create those files and you can automate the entire process with the appropriate tooling.
One of the most common ways to create a df file is to create whatever new schema you need in your development database and then use the "create an incremental df" facility in the data dictionary tool. This tool compares the development database schema to the target schema and builds a "df file" (DDL script) of the differences. You could connect directly to the target db for this process or you could have an empty skeleton db that you use for this.
How to create an incremental df file
(If you then reverse the comparison you can also create a reversing df file to undo the changes.)
Most df files consist of additions - new tables, new fields, new indexes. These can all be added online and that can all be completely scripted. And, of course, the individual df files and all of the supporting scripts can (and should) be stored in a repository (like git or whatever).
As for the data change scripts... there's no reason that those programs cannot be written as actual programs and saved in a repository. You can enclose the whole update in a transaction and UNDO it if that is appropriate. For what it is worth, I personally do not think that is a very good idea. Especially when large amounts of data are involved you really don't want to be creating monstrous multi-gigabyte undo logs. You're better off with a second "reversing transaction" script that will roll things back piecemeal. A side benefit is that you can still use that if you decide to back out the change a day or three afterwards.
The really gory details are going to depend on your development process and the customers change management process and the tooling available. It kind of sounds like there is not much process or tooling at either end of this relationship so you probably have a lot of adventures ahead of you!

Can I enter an existing DB into xamarin forms application?

I have some data contained in a CSV file, I need to efficiently access that information and want to importing it into my existing database.
I am wondering if I can make a pre-loaded database with the tables I need and then build the rest of the database on top of it (or make a second separate connection), or load the database from the CSV files on first startup.
What would be the preferred method and either way how would can I achieve it efficiently?
p.s 2 files are about 1000 lines long and 2 columns wide which seems to me to be considered fairly small... and the other ones really shouldn't be more then 10 lines long and 6-7 columns wide
Edit: realised I have a bunch of tables that need to be updated yearly, so any form that risks the user input data is unacceptable so using the existing DB is a not an option...

Airflow transfer data between tasks without storing data in between stages

I would like to know how to transfer data between tasks without storing them in between.
Attached image one can find the flow of tasks. As of now I am storing the output csv files of each task as a file in my local machine and fetching this csv file again as an input to next task. I wanted to know if there is any otherway to pass data between tasks without storing it after each task. I researched a bit and came across Xcoms. I wanted to make sure if Xcoms are the right way to achieve this or am I wrong. I could not find any practical examples. Any help is appreciated as I am just a newbie in airflow started couple of days
Short answer is no, tasks require data to be at rest before moving to the nest task. Xcom's are most suited to short strings that can be shared between tasks (file directories, object names, etc.). Your current flow of storing the data in csv files between tasks is the optimal way of running your flow.
XCom is intended for sharing little pieces of information, like the len of the sql table, any specific values or things like that. It is not made for sharing dataframes (which can be huge) because the shared information is written in the metadata database.
So either you keep exporting the csv to your computer (or uploading them somewhere), for reading it in the next Operator, or you combine the operators into one.

Meta-data from SQLite

Is there any way to query a SQLite database for basic meta data such as:
Last date/time updated
Hash of database to indicate "state"
I am just looking for a simple, infrastructural way to have a script evaluate different databases and take a reasonable point of view on whether they are the same "state" as other databases in a different environment (PROD and DEV for instance).
In my experience, if no update, new record, or any change is made to the SQLite database file, the last modified time of the file doesn't change. So the last modified time should suffice for the time of any change made to database.
If 2 database files with same state are only accessed for reading, their modified times are always the same.
Similarly you get the file sizes for comparison.
You can use the whole file to calculate hash. If you consider same data in the database as the same "state" regardless of any difference in the past, then maybe you want hash of the all records in database, which is probably not simple.

SQlite3 Optimization: Store external file name in db? Or just have a huge number of rows?

I am a newbie with no comp sci background. So please forgive me for whatever dumb stuff I may say. I am working on a solar power monitoring project to monitor the power output of the solar power systems my company installs. I am writing a client that will query the inverter (for power output, voltage output, current output, system errors/faults, etc--which constitutes one "reading") of each of our monitoring customers every 15 minutes for as long as they have their system--which means roughly 35k readings per year per customer. So I was thinking of organizing my sqlite3 database in one of the two following ways.
(1) Have the database be two tables, one table with regular customer information (name, email, etc) and another much bigger table where each row represents one reading and includes the customer id and timestamp of reading as identifiers. Which means roughly 35k rows will be being added to this bigger table per customer per year. (Data more than two years old will be pared down and archived.)
OR
(2) Store all readings in a csv file (one csv file per customer) and store the csv file name in my table with regular customer information
This database will be serving a website (built on rails if that makes any difference for options) where customers will be able to view their power output data. I want to minimize the amount of time it will take to load their output data on login. I basically don't have a clear idea of the amount of time it would take for my computer to open and read in lines from a text file versus open, look for (based on customer id) and read in the data from a huge sqlite3 table--and therefore am having trouble knowing how to judge between the two options above. Also I'm having trouble gauging the limits of sqlite3 where it functions optimally despite having read some about it (I don't think I have the background to understand the reading I did because it seems to say 100s of millions of rows are just fine when I read other people's comments seeming to say just the opposite.). I am also open to a completely different option as I'm not married to anything right now. Whatever makes things load faster. Thanks so much in advance!
Storing the parsed data in sqlite would definitely be a timesaver if you're doing any kind of repeated data mining on it. CSV Parsing overhead would almost instantly eat up any database space/time savings you'd gain.
As for efficiency, you'd have to test it. There's no one hard fast rule that says "use this database" or "use that database". It's ALWAYS a "depends on the scenario". SQLite may be perfect for you in this case, but be useless for someone else with a slightly different workload.
SQL applications in general do very well with large data sets, as long as the columns being queried are indexed. You should keep them in the same database. It will take a huge lot less to obtain the data from the database than for parsing CSV files. Databases are created with the purpose of storing and retrieving data, CSV files are not.
I use MySQL databases with tens of millions of rows per table and queries return results in fractions of a second. SQLite might be faster.
Just make sure you create indexes for what you will be searching.
I would do option 1, but use a database server such as PostgreSQL instead of SQLite.
SQLite will lock the table on update so you may run into locking issues if you read and write to the table a lot. SQLite is better suited for single user applications on the desktop or in a smartphone.
You can easily have millions of rows without it causing any problems.

Resources