Is there a way to get units with the data points when querying OpenTSDB?

I am trying to find a way to get units along with the data points when querying OpenTSDB. I have saved the units of my metrics in the metadata and can fetch them with a metadata query, but that forces me to run an extra query just to get the units. I was wondering if there is a way to get the units included in the data that OpenTSDB returns from a regular query (/api/query).

I think you want to get back the raw data you put into OpenTSDB. If so, that is hard for OpenTSDB to do: OpenTSDB always aggregates data points at the same timestamp.
If you really need that, you can try KairosDB (a fork of OpenTSDB), or you can try Elasticsearch (which I think can handle everything OpenTSDB can).

OpenTSDB is a time-series database, i.e. it stores a value for a particular point in time.
It allows just one value per timestamp (smallest time interval: 1 second) and per row key (metric name, timestamp, tag=value).
Suggestion 1:
You can put/write the unit as a tag value, i.e. a unit tag.
When you query (the api/query endpoint), you'll get the complete data back:
<metric name> <time stamp> <the value> tag1=val1 tag2=val2
For example:
db.bytes_sent 1287333217 6604859181710 unit=kB host=db1
db.bytes_received 1287333232 327812421706 unit=Mb host=db1
db.bytes_sent 1287333232 6604901075387 unit=MB host=db1
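For illustration, here is a minimal sketch of writing such a data point over OpenTSDB's HTTP /api/put endpoint, with the unit encoded as a tag. The host, port, and values below are assumptions, not anything from your setup:

// Sketch: write one data point whose unit travels along as a tag.
// Host and port are placeholders for your OpenTSDB instance.
const point = {
  metric: "db.bytes_sent",
  timestamp: 1287333217,             // Unix seconds
  value: 6604859181710,
  tags: { unit: "kB", host: "db1" }  // the unit comes back with every query result
};

fetch("http://opentsdb.example.com:4242/api/put", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(point)
}).then(res => console.log("put status:", res.status));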

Related

Preventing magic numbers used in Firebase Realtime Database

So I want to specify a time after which a post gets deleted. The time is 3 months; in my code I would define this as
const THREE_MONTHS_IN_MS = 7889400000
export const TIME_AFTER_WHICH_USER_IS_DELETED = THREE_MONTHS_IN_MS
How can I define this in my database without resorting to the use of a magic number? Basically it looks like this right now:
timeAfterWhichUserIsDeleted: 7889400000
Or as a direct screenshot of the database: https://gyazo.com/67abfdc329e1e36aae4e66b0da4b4f75
I would like to avoid this value in the database and instead have it be more readable.
Any tips or suggestions?
The 7889400000 is a UNIX timestamp, indicating a number of milliseconds since the epoch. While you can store a value indicating the same moment in a different format, you'll want to make sure that the format you use still allows you to query the value.
A common format that is both readable and queryable is ISO 8601, and my current time in that format would be 2022-03-22 06:50:48.
I noticed after re-reading your question that your timeAfterWhichUserIsDeleted is actually an interval and not a moment. If the interval is always going to be in months, you could store countOfMonthsAfterWhichUserIsDeleted: 3 as a more readable form of the same intent. Just note that 3 is every bit as magic as 7889400000; the main difference is that I've named the field more meaningfully.
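A minimal sketch of that idea, assuming the Firebase Realtime Database JavaScript SDK and a hypothetical config path:

// Stored in the database (readable, no magic milliseconds):
// { "config": { "countOfMonthsAfterWhichUserIsDeleted": 3 } }

// Hypothetical helper: derive the millisecond interval in code instead.
const MS_PER_MONTH = 30.44 * 24 * 60 * 60 * 1000; // average month length, an approximation

async function getDeletionIntervalMs(db) {
  const snapshot = await db
    .ref("config/countOfMonthsAfterWhichUserIsDeleted")
    .once("value");
  return snapshot.val() * MS_PER_MONTH;
}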

How do I keep track of the most recent time an item was read in DynamoDB?

I have a use case where I want to always know and be able to look up DynamoDB items by their last read time. What is the easiest way to do this (I would prefer not to use any other services).
You can recall items by their last read time in DynamoDB by using a combination of the UpdateItem API, Query API and GSIs.
Estimate how long it will take your application to randomly read 1MB worth of items from the DynamoDB table. Let's assume we are working with small items, each <=1KB; then, if the RPS on the table is 100, it will take 10 seconds for your application to randomly read 1MB of data.
Create a GSI that projects all attributes and is keyed on (PK=read_time_bucket, SK=read_time). The WPS on the GSI should equal the RPS of the base table in the case of small items.
Use the UpdateItem API with the following parameters to “read” each item. The UpdateItemResult will contain the item along with the updated last read time and bucket:
(
  ReturnValues=ALL_NEW,
  UpdateExpression="SET read_time_bucket = :bucket, read_time = :time",
  ExpressionAttributeValues={
    ":bucket": <a partial timestamp indicating the 10-second-long bucket of time that corresponds to now>,
    ":time": <epoch millis>
  }
)
You can use the Query API to look up items by last read time on the GSI, using key conditions on read_time_bucket and read_time.
You will need to adjust your time bucket size and throughput settings depending on item size and the read/write patterns on the base table. If item size is prohibitively large, restrict the GSI projection to INCLUDE (selected attributes) or KEYS_ONLY.
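Putting the pieces together, a sketch using the AWS SDK for JavaScript (v2) DocumentClient; the table name, key schema, and index name are assumptions:

const AWS = require("aws-sdk");
const ddb = new AWS.DynamoDB.DocumentClient();

// "Read" an item by updating its last-read attributes and returning the new image.
async function readItem(pk) {
  const now = Date.now();
  const bucket = Math.floor(now / 10000); // 10-second bucket, per the sizing above
  const res = await ddb.update({
    TableName: "Items",                    // assumption
    Key: { PK: pk },
    UpdateExpression: "SET read_time_bucket = :bucket, read_time = :time",
    ExpressionAttributeValues: { ":bucket": bucket, ":time": now },
    ReturnValues: "ALL_NEW"
  }).promise();
  return res.Attributes;
}

// Look up items last read in a given bucket, most recently read first.
async function itemsReadInBucket(bucket) {
  const res = await ddb.query({
    TableName: "Items",
    IndexName: "read_time-index",          // the GSI described above; name is an assumption
    KeyConditionExpression: "read_time_bucket = :bucket",
    ExpressionAttributeValues: { ":bucket": bucket },
    ScanIndexForward: false                // sort by read_time descending
  }).promise();
  return res.Items;
}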

How can I be notified of an index being updated on DynamoDB?

I have a table (key=username, value=male or female) and an index on the values.
After I add an item to the table, I want to update the counts of males and females. However, after a successful write, as the index is a Global Secondary Index, the count query is not consistent.
Is there a way (dynamo db Streams, Lambda, ...) to monitor when the index is up to date?
Note that I'm not looking for a solution that involves something else (keeping a count of increments in Redis, etc.); what I describe here is a simplified problem, specifically to ask how I can monitor an index in DynamoDB.
Thanks!
I am not sure if there is any mechanism currently provided to check this, but you can easily solve this problem by adding a single line to your query:
ConsistentRead = True
DynamoDB has a parameter which, when set to true, will make sure that you get the latest updated value.
Now, when you add/update an item and then query the data, include the ConsistentRead option; this will ensure that you get the latest count value.
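A minimal sketch of such a query using the AWS SDK for JavaScript (v2) DocumentClient; the table and attribute names are assumptions. One caveat worth knowing: DynamoDB honors ConsistentRead only on base tables and local secondary indexes, not on global secondary indexes.

const AWS = require("aws-sdk");
const ddb = new AWS.DynamoDB.DocumentClient();

async function countForUser(username) {
  const res = await ddb.query({
    TableName: "Users",                      // assumption
    KeyConditionExpression: "username = :u",
    ExpressionAttributeValues: { ":u": username },
    ConsistentRead: true,                    // base table / LSI only; rejected on a GSI
    Select: "COUNT"                          // return only the matching item count
  }).promise();
  return res.Count;
}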
Here is the reference link.
If you are able to accomplish this using another technique, then please do share it.
Hope that helps.

Dynamodb data model for process/transaction monitoring

I want to keep track of a multi-stage processing job.
I likely just need the following fields:
batchId (guid) | eventId (guid) | statusId (int) | timestamp | message (string)
There are a relatively small number of events per batch.
I want to be able to easily query events that have a statusId less than n (still being processed, or didn't finish processing).
Would using multiple rows for each status change, and querying for the latest status, be the best approach? I would use a global secondary index, but statusId does not seem like a good candidate for a hash key (fewer than 10 statuses).
Instead of using multiple rows for every status change, if you updated the same event row instead, you could use a technique described in the DynamoDB documentation in the section 'Use a Calculated Value'. Basically this would involve adding another attribute (say 'derivedStatusId') which would be derived by appending a random number to statusId at the time of writing to DynamoDB. For example, for a statusId of 2, derivedStatusId could be one of {"2-00", "2-01", .. "2-99"}. Setting up a Global Secondary Index on derivedStatusId would give you some fan-out that will help in preventing the index from becoming hot.
If you are sure that you will use this index only for unfinished events, then removing the derivedStatusId attribute from the record when it transitions to a finished status will remove it from the index as well, which may be a good property if events are expected to finish processing eventually, even if the items themselves stay around forever. This technique is called a "Sparse Index" and is described in more detail here.
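A sketch of the write side of that idea, assuming the AWS SDK for JavaScript (v2) DocumentClient and hypothetical table and attribute names:

const AWS = require("aws-sdk");
const ddb = new AWS.DynamoDB.DocumentClient();

const SHARDS = 100; // fan-out factor; tune to your write volume

async function recordStatus(batchId, eventId, statusId, message) {
  // e.g. statusId 2 becomes "2-00" .. "2-99", spreading writes across the GSI
  const shard = String(Math.floor(Math.random() * SHARDS)).padStart(2, "0");
  await ddb.put({
    TableName: "Events",                       // assumption
    Item: {
      batchId,
      eventId,
      statusId,
      derivedStatusId: statusId + "-" + shard, // GSI hash key; omit when finished
      timestamp: Date.now(),
      message
    }
  }).promise();
}

// To find events at a given status, query the GSI once per shard value
// ("2-00" .. "2-99") and merge the results.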
From your question, it seems like keeping a history of status changes is a desired property (I assume this because you want multiple rows for status changes). Consider putting this historical information in the same row. DynamoDB supports list data types and also has a generous 400KB item limit, which may just allow you to capture all the desired historical information in the same record.

CouchDB Date functions

We are in the process of converting a rather large PHP/MySQL project to Angular/Node.js/CouchDB. The main problem I am running into now is that our MySQL queries are rather complicated, using a lot of date functions (like DATEDIFF, DATE_FORMAT, etc.), and I don't know how to convert them to this new architecture.
How do most devs handle those types of functions in CouchDB? Do they just pull the raw data from the database and leave all of the calculations up to the controller/front-end?
Example query:
SELECT DATEDIFF(NOW(), table.datefrom) AS how_long, DATE_FORMAT(table.datefrom, '%m/%d/%Y') AS formatted_date FROM table ORDER BY datefrom
How would that query be handled with CouchDB?
Datetimes are not a "native" type in CouchDB. However, you have several good options that you can choose between depending on the situation.
You can use a "timestamp" numeric value (either in the native milliseconds, or converted to seconds if needed). You can get a timestamp for "now" with (new Date()).valueOf().
You can also break the parts of your datetimes up into an array ([ year, month, day, hour, minute, second ]). This will enable you to use grouping to "drill down" into increasingly specific time-frames, as well as query based on individual parts of the date.
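For example, a map function along these lines (a sketch; it assumes each document carries a millisecond datefrom field, as in your query above):

// CouchDB map function: emit the date broken into parts so callers can
// group by year, [year, month], [year, month, day], and so on.
function (doc) {
  if (doc.datefrom) {
    var d = new Date(doc.datefrom);
    emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()], doc.datefrom);
  }
}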
If you want date manipulation and formatting from a tested library, you can pull in a 3rd party module like moment.js as a CommonJS module that you can use in your view/show/list/etc.
I can see one potential issue with your example query above. You are basically computing a "time since" via DATEDIFF(NOW(), ...). In a view function, you won't be able to use a "transient" value like NOW(), since views need to remain unaffected by outside variables/conditions. However, this is solved by adding a list function that takes your view results and transforms the output to include "relative" values like what you are trying to achieve; it can also receive querystring arguments to further add dynamism to your view.
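For instance, a list function along these lines (a sketch, paired with a view like the one above that emits the datefrom timestamp as its value):

// CouchDB list function: compute "how long ago" at request time, so the
// view itself stays deterministic while the output is relative to now.
function (head, req) {
  var row, now = Date.now(), out = [];
  while ((row = getRow())) {
    out.push({
      formatted_date: new Date(row.value).toISOString().slice(0, 10), // YYYY-MM-DD
      how_long_ms: now - row.value                                    // elapsed ms since datefrom
    });
  }
  send(JSON.stringify(out));
}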
