Long-term data logging - sampling every 20 ms for 24 hours - SQLite

I am using Max/MSP and the SQL object described here:
https://cycling74.com/2008/09/05/data-collection-building-databases-using-sqlite/
I basically want to record an electrical voltage that streams into Max every 20 ms, for 24 hours, so I can play it back at various speeds. Any idea how much space this is going to use up? I'd like to do this over 8 channels, so 8 channels for 24 hours, taking a sample every 20 ms. Any tips appreciated.

This depends on the table structure and on the contents of the rows.
With eight integers per row, like in this Python test script:
import sqlite3

db = sqlite3.connect('logger.db')
c = db.cursor()
c.execute('CREATE TABLE log(ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8)')

# 50 samples per second (one every 20 ms) * 60 * 60 * 24 = one day of rows
for i in range(50 * 60 * 60 * 24):
    c.execute('INSERT INTO log VALUES(1, 2, 3, 4, 5, 6, 7, 8)')

db.commit()
the database ends up at about 100 MB.
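For a rough back-of-the-envelope check of that figure (using the ~100 MB measured above):

# One sample every 20 ms = 50 rows per second, for 24 hours
rows = 50 * 60 * 60 * 24          # 4,320,000 rows
db_bytes = 100 * 1024 * 1024      # ~100 MB measured above
print(db_bytes / rows)            # roughly 24 bytes per row, including SQLite overhead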

Related

Predecessor lag converted into 4800 in MS Project when saving it as XML

Description
If I provide a 1-day lag on a predecessor, it is converted into 4800 when saving as XML.
Please find the link I have created in MS Project:
Predecessor with lag:
Please find the link lag in XML:
How can I find the calculation behind this?
Assuming you are using MS Project's standard calendar, the calculation seems to be the duration in minutes (one 8-hour work day = 480 minutes of duration) multiplied by 10 (480 × 10 = 4800).
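In other words, a minimal sketch of the conversion described above, assuming the standard calendar's 480-minute working day and that the XML lag field is expressed in tenths of a minute:

# Convert a lag in working days to the value MS Project writes to XML.
def lag_days_to_xml(days, minutes_per_day=480):
    return days * minutes_per_day * 10   # lag stored in tenths of a minute

print(lag_days_to_xml(1))  # 4800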

Poor performance Arrow Parquet multiple files

After watching the mind-blowing webinar at the RStudio conference here, I was pumped enough to dump an entire SQL Server table to Parquet files. The result was 2,886 files (78 entities over 37 months) with around 700 million rows in total.
Doing a basic select returned all rows in less than 15 seconds (an out-of-this-world result!). At the webinar, Neal Richardson from Ursa Labs was showcasing the NY Taxi dataset with 2 billion rows in under 4 seconds.
I felt it was time to do something more daring, like a basic mean, sd, and mode over a year's worth of data, but that took a minute per month, so I was sitting 12.4 minutes waiting for a reply from R.
What is the issue? My badly written R query, or simply too many files, or the granularity (decimal values)?
Any ideas?
PS: I did not want to file a Jira case on the Apache Arrow board, as I see that Google search does not retrieve answers from there.
My guess (without actually looking at the data or profiling the query) is two things:
You're right, the decimal type is going to require some work to convert to an R type because R doesn't have a decimal type, so that will be slower than just reading in an int32 or float64 type.
You're still reading in ~350 million rows of data to your R session, and that's going to take some time. In the example query on the arrow package vignette, more data is filtered out (and the filtering is very fast).
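To illustrate how much pushing the filter into the scan matters, here is a sketch in Python with pyarrow (rather than the R arrow package the question uses); the path, partitioning scheme, and column names are assumptions:

import pyarrow.dataset as ds

# Hypothetical layout: Parquet files partitioned hive-style by entity and month.
dataset = ds.dataset("parquet_root/", format="parquet", partitioning="hive")

# Only the filtered rows and selected columns are read into memory;
# pulling all ~350 million rows into the session first is what makes the summary slow.
table = dataset.to_table(
    columns=["amount"],
    filter=(ds.field("month") == 1) & (ds.field("entity") == "A"),
)
print(table.num_rows)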

Unexpected throughput with DynamoDB

I have a table in DDB with site_id as my hash key and person_id as the range key. There are another 6-8 columns on this table with numeric statistics about this person (e.g. times seen, last log in etc). This table has data for about 10 sites and 20 million rows (this is only used as a proof of concept now - the production table will have much bigger numbers).
I'm trying to retrieve all person_ids for a given site where time_seen > 10. So I'm doing a query using the hash key and passing time_seen > 10 as a criterion. This results in a few thousand entries, which I expected to get pretty much instantly. My test harness runs in AWS in the same region.
The read capacity on this table is 100 units. The results I'm getting are attached.
For some reason I'm hitting the limits. The only two limits I'm aware of are the maximum data size returned and the request time. I'm only returning 32 bytes per row (so approximately 100 KB per result), so there's no chance that is the case. The time, as you can see, doesn't hit the 5-second limit either. So why can't I get my results faster?
Results are retrieved in a single thread from C#.
Thanks
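For reference, a query of the shape described (hash key plus a filter on time_seen) looks roughly like this in Python with boto3; the table name, site value, and attribute names are assumptions taken from the question:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("person_stats")  # hypothetical table name

items = []
kwargs = dict(
    KeyConditionExpression=Key("site_id").eq("site-1"),
    # FilterExpression is applied after items are read, so it does not reduce
    # the read capacity consumed by the query.
    FilterExpression=Attr("time_seen").gt(10),
    ProjectionExpression="person_id",
)
while True:
    resp = table.query(**kwargs)
    items.extend(resp["Items"])
    # DynamoDB returns at most 1 MB per page, so large result sets are paged.
    if "LastEvaluatedKey" not in resp:
        break
    kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
print(len(items))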

How do I create accumulated bandwidth usage in RRDtool (i.e. GB per month downloaded)?

The following data comes from a mobile phone provider; it's a list of kB downloaded at a certain time, usually on a per-minute basis.
It's not the average, not the max, but the total for that time interval, which allows the data consumption to be tracked precisely. These graphs were made with PIL, and instead of showing spikes to indicate large data consumption, large steps can be seen, which is much more revealing, because it doesn't just say "much happened here" but "exactly this much happened here". For example, in the second graph, Sat 10 at night: 100 MB. A rate-of-change graph wouldn't be as informative.
I'm also trying to find a way to do this with RRD.
I was misled, when using COUNTER to track my network's data usage, into thinking that I would be able to precisely compute the monthly/weekly accumulated data usage, but that turned out to be a wrong assumption.
How do I store my data in RRD in order to be able to easily generate graphs like the ones below? Would that be by using ABSOLUTE and, before updating, subtracting the previous insertion value? Would that be precise down to the byte when checking the monthly usage?
You can add up all the values in your chart quite easily:
CDEF:sum=data,$step_width,*,PREV,ADDNAN
If your chart covers just one month, that should be all you have to do. If you want it to cover multiple months, you will have to use a combination of IF and TIME operators to reset the line to 0 at the start of each month.
Version 1.5.4 will contain an additional operator called STEPWIDTH, which pushes the step width onto the stack, making this even simpler.
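Putting that together, a minimal sketch using the Python rrdtool bindings; it assumes rrdtool >= 1.5.4 (for STEPWIDTH) and an RRD called traffic.rrd with a data source named data holding bytes per second:

import rrdtool  # python-rrdtool bindings

rrdtool.graph(
    "accumulated.png",
    "--start", "end-1month", "--end", "now",
    "DEF:data=traffic.rrd:data:AVERAGE",
    # rate (bytes/s) * step width (s) = bytes in this step; keep a running total
    "CDEF:sum=data,STEPWIDTH,*,PREV,ADDNAN",
    "LINE2:sum#0000ff:accumulated bytes",
)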
This is a common question with very few answers online, but I first encountered a method to do this with RRD in 2009.
The DS type to use is GAUGE, and in your update script you manually handle resetting the GAUGE to 0 at the start of the month for monthly usage graphs.
Then along came RRDtool's 'mrtg-traffic-sum' package.
More recently I've had to monitor both traffic bandwidth and traffic volume, so I created a standard RRD for that first and confirmed that it was working.
So, with the bandwidth being sampled (captured to the RRD), use the mrtg-traffic-sum tool to generate the stats needed, as in the example below, and then pump them into another RRD created with just the GAUGE DS type and just LAST (no need for MIN/AVG/MAX).
This allows using RRDs to collect both traffic bandwidth as well as monthly traffic volumes / traffic quota limits.
root@server:~# /usr/bin/mrtg-traffic-sum --range=current --units=MB /etc/mrtg/R4.cfg
Subject: Traffic total for '/etc/mrtg/R4.cfg' (1.9) 2022/02
Start: Tue Feb 1 01:00:00 2022
End: Tue Mar 1 00:59:59 2022
Interface In+Out in MB
------------------------------------------------------------------------------
eth0 0
eth1 14026
eth2 5441
eth3 0
eth4 15374
switch0.5 12024
switch0.19 151
switch0.49 1
switch0.51 0
switch0.92 2116
root@server:~#
From mrtg-traffic-sum, just write up a script that populates your second RRD with these values (a sketch follows below), and presto, you have a traffic volume / quota graph as well.
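A sketch of such a script in Python; the config path, the interface picked out, and the volume RRD's name are assumptions, and it relies on the python-rrdtool bindings:

import re
import subprocess
import rrdtool

# Run mrtg-traffic-sum for the current month, as in the example above.
out = subprocess.run(
    ["/usr/bin/mrtg-traffic-sum", "--range=current", "--units=MB", "/etc/mrtg/R4.cfg"],
    capture_output=True, text=True, check=True,
).stdout

# Pick the "<interface> <total MB>" lines out of the report.
totals = {}
for line in out.splitlines():
    m = re.match(r"^(\S+)\s+(\d+)$", line.strip())
    if m:
        totals[m.group(1)] = int(m.group(2))

# Push one interface's monthly total into the GAUGE-only volume RRD.
rrdtool.update("volume_eth1.rrd", "N:%d" % totals.get("eth1", 0))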

What does this number mean?

I have a number that is somehow related to the current date:
634101448539930000
634101448627430000 (this value was taken 9 seconds later than the first one)
I have many values like this and I need to know what these numbers mean. They are somehow related to the current time, because newer values are always bigger than older ones.
If anybody could help me, thanks.
It's a Tick Count.
There are 10,000 ticks in 1 ms; that number is the number of ticks that have passed since January 1st, 0001 at midnight.
This particular tick count represents the date/time 2010-05-22 17:07:33 (makes sense, since that's today).
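A quick way to verify this in Python: a .NET tick count is a number of 100-nanosecond intervals from 0001-01-01, which lines up with the proleptic Gregorian calendar that Python's datetime uses.

from datetime import datetime, timedelta

ticks = 634101448539930000  # first number from the question

# One tick is 100 ns, i.e. one tenth of a microsecond.
dt = datetime(1, 1, 1) + timedelta(microseconds=ticks // 10)
print(dt)  # 2010-05-22 17:07:33.993000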
Well, the second number is 87500000 greater than the first, and since they're about 9 seconds apart, I'd guess it is a count of 100-nanosecond intervals since some epoch.
If you divide the number by 1×10^7, you get approximately the number of seconds since 1 Jan 0001 (ignoring the calendar reforms and all that).
Dates and times are generally stored as a single number in computer languages. The number usually represents an offset from a specific date; for example, it might be an offset from 1 Jan 1970. Usually you don't deal with these numbers directly. I suspect that if you look at your APIs you will find a function to convert these numbers into more meaningful representations.
It appears that those numbers represent your system's internal system time with precision down to the decimicrosecond (one ten-millionth of a second), as measured from the date January 1st, 1 A.D.
634101448627430000 - 634101448539930000 = 87500000
87500000 / 10,000,000 = 8.75
And, using Perl's Time::Duration module:
&duration_exact(634101448627430000/10_000_000);
2010 years, 263 days, 17 hours, 7 minutes, and 42 seconds
So from that we know that 63410144862 seconds is roughly 2010 years, which means the timestamp is based at the year 1 A.D.
It could be a variation of Unix time (a converter for Unix time is available here).
Unix time is a way of representing a date/time as a number that is easy to store and compare. Your numbers may be a variant of Unix time, or may be a format designed by the author. I would examine your ASP.NET code to check how the number is used, and you'll find out what it means.
