Reduce the number of queries in OpenTSDB - opentsdb

I use OpenTSDB to store my time series data. For each incoming data point, I need to fetch the 20 data points that precede it. But I have a large number of metrics, so I can't call the OpenTSDB query API that many times. What can I do to reduce the number of queries to OpenTSDB?

As far as I know you can't aggregate different metrics into one single result, but I would suggest two solutions:
1. You can put multiple metric queries in one call. If you use the HTTP API endpoint you can do something like this (see the POST sketch below):
http://otsdb:4242/api/query?start=15m-ago&m=avg:metric1{tag1=a}&m=avg:metric2{tag2=b}
You get the results for all queries with the same start/end dates and times. But don't forget that a call with multiple metrics will take longer.
2. Redefine your time series. I don't know any details about your data, but if you're going to store and use data you should also think about usage: What queries am I going to run? How often? What would be the most common access to the data? And so on.
That's also what the OpenTSDB documentation advises [1]:
Cardinality also affects query speed a great deal, so consider the queries you will be performing frequently and optimize your naming schema for those.
So I would suggest using tags to overcome this issue of multiple metrics. As I mentioned, I don't know your schema, but OpenTSDB is much more powerful with tags; there are many examples, and filtering options as well.
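For instance, option 1's multi-metric call can also be issued as a single POST to /api/query. Here is a minimal sketch using Python's requests library (host and metric names are taken from the example above):

import requests

# One POST to /api/query can carry several sub-queries; all of them share
# the same start/end time and come back in a single HTTP response.
payload = {
    "start": "15m-ago",
    "queries": [
        {"aggregator": "avg", "metric": "metric1", "tags": {"tag1": "a"}},
        {"aggregator": "avg", "metric": "metric2", "tags": {"tag2": "b"}},
    ],
}
resp = requests.post("http://otsdb:4242/api/query", json=payload)
for result in resp.json():
    print(result["metric"], result["dps"])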
Edit 1:
From OpenTSDB 2.3 version there is also expression api: http://opentsdb.net/docs/build/html/api_http/query/exp.html
It should be able to handle multiple metric queries together (though I've never used it for any query).
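Based on the documentation linked above, a request to the expression endpoint might look roughly like this. This is a hedged sketch I haven't run against a live server, and the metric and tag names are placeholders:

import requests

# POST /api/query/exp: declare metrics by id, then combine them in an expression.
payload = {
    "time": {"start": "1h-ago", "aggregator": "sum"},
    "filters": [{
        "id": "f1",
        "tags": [{"type": "literal_or", "tagk": "host", "filter": "web01", "groupBy": False}],
    }],
    "metrics": [
        {"id": "a", "metric": "sys.cpu.user", "filter": "f1", "fillPolicy": {"policy": "nan"}},
        {"id": "b", "metric": "sys.cpu.sys", "filter": "f1", "fillPolicy": {"policy": "nan"}},
    ],
    "expressions": [{"id": "e", "expr": "a + b"}],
    "outputs": [{"id": "e", "alias": "cpu user + sys"}],
}
resp = requests.post("http://otsdb:4242/api/query/exp", json=payload)
print(resp.json())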

Related

DynamoDB usable for largeish event table?

I'm thinking of re-architecting an RDS model to a DynamoDB one, and it appears mostly to be working using a single-table design. We have, however, a log table that can contain 5-10 million rows and is queried on many attributes.
Is there any pattern that might be applicable in migrating to DynamoDB, or is this a case where full scans would be required and we would be better off keeping the log data in a relational table?
Thanks in advance,
Nik
Those keywords and phrases "log" and "queried on many attributes" sound to me like DynamoDB is not the best solution for your log data. If the number of distinct queries is fairly limited and well-known in advance, you might be able to design your keys to fit your access patterns.
For example, if you commonly query on Color and Quantity attributes, you could design a composite key like COLOR#Red#QTY#25. And you could similarly use local or global secondary indexes for queries involving other attributes.
But it is not a great solution if you have many attributes that you need to query arbitrarily.
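For illustration, querying such a composite key with boto3 might look like this (the table, key, and attribute names are hypothetical):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("logs")  # hypothetical table name

# Fetch log items whose sort key encodes Color and Quantity,
# e.g. "COLOR#Red#QTY#25".
resp = table.query(
    KeyConditionExpression=Key("pk").eq("LOG")
    & Key("sk").begins_with("COLOR#Red#QTY#"),
)
for item in resp["Items"]:
    print(item)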
Alternative Solution: Another serverless option to consider is storing your log data in S3 and using Athena to query it using SQL.
You will likely trade away a bit of latency and speed with this approach compared to RDS or DynamoDB. But queries against log data often don't need millisecond response times, so it can cover a lot of use cases.
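Sketched with boto3 (the database, table, and bucket names are hypothetical), an Athena query over log data in S3 could look like this:

import boto3

athena = boto3.client("athena")

# Start an asynchronous Athena query over log files stored in S3; results
# land in the given S3 output location and can be polled via the execution id.
resp = athena.start_query_execution(
    QueryString="SELECT * FROM logs WHERE color = 'Red' AND qty = 25 LIMIT 100",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])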
Data modelling for DynamoDB
1. Write down all of your access patterns, in order of priority/most used.
2. Research models which are similar to your use-case.
3. Download NoSQL Workbench and create test models where you can visualize your ideas.
4. Run commands against DynamoDB Local to test that your access patterns are fulfilled (see the snippet after this list).
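For step 4, pointing boto3 at DynamoDB Local is a one-liner, assuming it's running on its default port 8000 (the credentials are dummies; DynamoDB Local accepts anything):

import boto3

# Only the endpoint matters for DynamoDB Local; credentials are ignored.
dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)
print(list(dynamodb.tables.all()))  # list tables to verify the connection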
Access Patterns
Your access patterns will ultimately decide whether DynamoDB suits your needs. If you need to query on multiple fields, you can have up to 20 Global Secondary Indexes, which gives you some flexibility; but if you find yourself exceeding 8-10 indexes, DynamoDB may not be a good choice, or the schema is badly designed.
Use smart designs with sort-key and index-key overloading; this lets you group the data better and makes your access patterns more efficient.
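As a hypothetical illustration of sort-key overloading, one table can serve several item types and access patterns:

# Hypothetical single-table items sharing generic "pk"/"sk" key attributes.
# The sort key is overloaded: its meaning depends on the item type, so one
# Query per partition can serve several access patterns.
items = [
    {"pk": "USER#123", "sk": "PROFILE",              "name": "Nik"},
    {"pk": "USER#123", "sk": "LOG#2023-05-01T10:00", "event": "login"},
    {"pk": "USER#123", "sk": "LOG#2023-05-01T10:05", "event": "query"},
]

# "All logs for user 123 on 2023-05-01" then becomes a single key condition:
#   Key("pk").eq("USER#123") & Key("sk").begins_with("LOG#2023-05-01")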
Log Data Use-case
Storing log data is a pretty common use-case for DynamoDB, and many AWS customers use it for that sole purpose. But I can't overemphasize the importance of understanding your access patterns and working backwards from them to create your model.
Alternatives
If you require rich query capability or free-text search, you could use DynamoDB's integrations with OpenSearch (via Lambda/EventBridge, for example), with OpenSearch providing the flexibility for your queries.
Doesn't seem like a good use case - I have done it and wasn't at all happy with the result. Now I load 'log like' data into Elasticsearch and am much happier with the outcome.
In my case, I insert the data into DynamoDB to archive it, but also feed the data into ES, so that once in a while, if I kill my ES cluster, I can reload all or some of the data from DDB.

How to model data in dynamodb if your access pattern includes many WHERE conditions

I am a bit confused about whether this is possible in DynamoDB.
I will give an example in SQL, explain how the query could be optimized, and then try to explain why I am confused about how to model and access the same data in DynamoDB.
This is not company code, just an example I made up based on the pcpartpicker filters.
SELECT * FROM BUILDS
WHERE CPU = 'Intel' AND OVERCLOCKED = 'true'
AND Price < 3000
AND GPU='GeForce RTX 3060'
AND ...
From my understanding, SQL will first scan the BUILDS table and filter down to the builds whose CPU is Intel. From this subset it then applies the next WHERE condition, OVERCLOCKED = 'true', and so forth. Basically, each additional WHERE clause has a smaller number of rows to filter.
One thing we can do to speed up this query is to create an index on these columns. The main performance gain comes from avoiding the initial scan of the whole table for the first clause the database evaluates. So, in the example above, instead of scanning the whole database to find builds using Intel, it can retrieve them quickly because the column is indexed.
How would you model this data in DynamoDB? I know you can create a bunch of secondary indexes, but instead of letting the engine apply one WHERE clause and pass the result along to the next filter, it seems like you would have to do all of this yourself. For example, we would need to use our secondary indexes to find all the builds that use Intel, are overclocked, cost less than 3000, and use a specific GPU, and then compute the intersection ourselves (a sketch of this follows below). Is there a better way to map out this access pattern? I am having a hard time figuring out whether this is even possible.
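To make that concrete, the client-side intersection described above might look like this in boto3 (the table, index, and attribute names are hypothetical):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
builds = dynamodb.Table("builds")  # hypothetical table with one GSI per attribute

# One query per condition, each against its own GSI...
by_cpu = builds.query(IndexName="cpu-index",
                      KeyConditionExpression=Key("cpu").eq("Intel"))["Items"]
by_gpu = builds.query(IndexName="gpu-index",
                      KeyConditionExpression=Key("gpu").eq("GeForce RTX 3060"))["Items"]

# ...then the intersection (and any remaining filters) computed by hand.
gpu_ids = {item["id"] for item in by_gpu}
matches = [b for b in by_cpu if b["id"] in gpu_ids and b["price"] < 3000]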
EDIT:
I know I could also just use a normal filter, but that seems like it would be pretty expensive, since it basically brute-force searches through the table, similar to the SQL solution without indexing.
To see what I mean, here is the pcpartpicker page in question: https://pcpartpicker.com/builds/
People basically select multiple filters, which makes designing for access patterns even harder.
I'd highly recommend going through the various AWS presentations on YouTube...
In particular here's a link to The Iron Triangle of Purpose - PIE Theorem chapter of the AWS re:Invent 2018: Building with AWS Databases: Match Your Workload to the Right Database (DAT301) presentation.
DynamoDB provides IE - Infinite Scale and Efficiency.
But you need P - Pattern Flexibility.
You'll need to decide if you need PI or PE.

Is there a workaround for the Firebase Query "NOT-IN" Limit to 10?

I saw a similar question here: Is there a workaround for the Firebase Query "IN" Limit to 10?
The point now is that with the in query the union works, but with
not-in it becomes an intersection and gives me all the documents. Does anyone know how to handle this?
As @samthecodingman mentioned, it's hard to provide specific advice without examples/code, but I've had to deal with this a few times, and there are a few generalized strategies you can take:
Restructure your data - You can use up to 100 equality operators, so one possible approach is to store your filters/tags as a map, for example:
{
  id: 1234567890,
  ...
  filters: {
    filter1: true,
    filter2: true,
    filter3: true,
  }
}
If a doc doesn't have a particular tag, you could simply omit it, or you could set it to false, depending on your use case.
Note, however, that you may need to create composite indexes if you want to combine equality operators with inequality operators (see the docs). If you have too many filters, this will get unwieldy quickly.
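With that structure, a query in the Python client would simply combine equality checks on the map fields (the collection and filter names here are hypothetical):

from google.cloud import firestore
from google.cloud.firestore_v1.base_query import FieldFilter

db = firestore.Client()

# Equality checks on map entries; documents missing the key simply don't match.
query = (
    db.collection("items")
    .where(filter=FieldFilter("filters.filter1", "==", True))
    .where(filter=FieldFilter("filters.filter2", "==", True))
)
for doc in query.stream():
    print(doc.id, doc.to_dict())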
Query everything and cache locally - As you mentioned, fetching all the data repeatedly can get expensive. But if it doesn't change too often or it isn't critical to get the changes in real time, you can cache it locally and refresh at some interval (hourly or daily, for example).
Implement Full-Text Search - If neither of the previous options will work for you, you can always implement full-text search using one of the services Firebase recommends like Elastic. These are typically far more efficient for use-cases with a high number of tags/filters, but obviously there's an upfront time cost for setup and potentially an ongoing monetary cost if your usage is higher than the free tiers these services offer.

Combining multiple Firestore queries to get specific results (with pagination)

I am working on a small app that allows users to browse items based on various filters they select in the view.
After looking through the Firebase documentation, I realised that the sort of compound query I'm trying to create is not possible, since Firestore only supports a single "IN" operator per query. To get around this, the docs say to use multiple separate queries and then merge the results on the client side.
https://firebase.google.com/docs/firestore/query-data/queries#query_limitations
Cloud Firestore provides limited support for logical OR queries. The in and array-contains-any operators support a logical OR of up to 10 equality (==) or array-contains conditions on a single field. For other cases, create a separate query for each OR condition and merge the query results in your app.
I can see how this would work normally, but what if I only want to show the user ten results per page? How would I implement pagination, given that I don't want to send lots of results back to the user each time?
My first thought was to paginate each separate query and then merge the results, but since I'd only be getting a small sample back from the db, I'm not sure how I would compare and merge it with the other queries on the client side.
Any help would be much appreciated since I'm hoping I don't have to move away from firestore and start over in an SQL db.
Say you want to show 10 results on a page. You will need to get 10 results for each of the subqueries, and then merge the results client-side. You will be overreading quite a bit of data, but that's unfortunately unavoidable in such an implementation.
The (preferred) alternative is usually to find a data model that allows you to implement the use-case with a single query. It is impossible to say generically how to do that, but it typically involves adding a field for the OR condition.
Say you want to get all results where either "fieldA" is "Red" or "fieldB" is "Blue". By adding a field "fieldA_is_Red_or_fieldB_is_Blue", you could then perform a single query on that field. This may seem horribly contrived in this example, but in many use-cases it is more reasonable and may be a good way to implement your OR use-case with a single query.
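As a rough sketch of the merge approach in the Python client (the collection, field names, and page size are hypothetical; the order_by may require a composite index):

from google.cloud import firestore
from google.cloud.firestore_v1.base_query import FieldFilter

db = firestore.Client()
items = db.collection("items")

# One subquery per OR branch, each over-reading a full page of 10.
red = (
    items.where(filter=FieldFilter("fieldA", "==", "Red"))
    .order_by("createdAt")
    .limit(10)
    .stream()
)
blue = (
    items.where(filter=FieldFilter("fieldB", "==", "Blue"))
    .order_by("createdAt")
    .limit(10)
    .stream()
)

# Merge client-side: de-duplicate by document id, re-sort, keep one page.
merged = {doc.id: doc for doc in list(red) + list(blue)}
page = sorted(merged.values(), key=lambda d: d.get("createdAt"))[:10]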
You could just create a complex where
Take a look at the where property in https://www.npmjs.com/package/firebase-firestore-helper
Disclaimer: I am the creator of this library. It helps you manipulate objects in Firebase Firestore (and adds caching).
Enjoy!

OpenTSDB indexing on keys

As I've worked in my personal lab instance of OpenTSDB, I've started to wonder whether it's possible to get it to index on tags as well as metric names. My understanding (correction is welcome...) is that OpenTSDB indexes only on metric names. So, suppose I have something like the following, borrowed from the docs:
tsd.hbase.rpcs{type=*,host=tsd1}
My understanding is that tsd.hbase.rpcs is indexed for searching, but that the tag keys (type=, host=, etc.) are not. Is that correct? If so, is there a way to have them indexed, or some reasonable approximation of that? Thanks.
Yes, you are correct. According to the documentation, OpenTSDB creates keys in the 'tsdb' HBase table of the form:
[salt]<metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]
When you run a query with a specific tagk and tagv, OpenTSDB can construct the key and look it up directly. If you have a range of tagks and tagvs, it will look up all the matching rows and either aggregate them or return multiple time series, depending on your query.
If you are interested in asking questions about tagks, you should use the OpenTSDB search/lookup API; note, however, that it still requires a metric name.
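For example, a lookup filtered by a tag could look like this (a sketch based on the API docs; the host and metric/tag names are placeholders):

import requests

# POST /api/search/lookup: list time series for a metric, filtered by tags.
resp = requests.post(
    "http://otsdb:4242/api/search/lookup",
    json={"metric": "tsd.hbase.rpcs", "tags": [{"key": "host", "value": "tsd1"}]},
)
for series in resp.json().get("results", []):
    print(series["metric"], series["tags"])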
If you want to formulate your question around tagks only, you could consider forwarding your data to Bosun for indexing and using its API:
/api/metric/{tagk}/{tagv}
Returns the metrics that are available for the specified tagk/tagv pair. For example, you can see what metrics are available for host=server01
