Graphite null JSON data responses, and more frequent datapoints than expected

I am having trouble interpreting the data coming back from the render API for some metrics I have set up in Graphite. A Splunk query populates the results, and Graphite is configured to run this query, which covers a 1-minute range, once every minute. The results I get back from a render API call with format=json are not making much sense to me. If I add from=-1min to the call, I get 2 datapoints, 4 for -2min, and so on. Along with this, the datapoints are often given a null value, which should never happen given how well populated this Splunk query is when I run it manually over the same time ranges.
I am wondering if there is any additional documentation on the render API beyond what is listed here: https://graphite.readthedocs.io/en/0.9.15/render_api.html because it really isn't telling me much about why the data appears the way it does. Does anyone know of additional docs, or have any insight into what is going on here?
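For reference, the call I'm making looks like this (hypothetical host and metric name):

http://graphite.example.com/render?target=splunk.mymetric&from=-1min&format=json

and a response typically comes back with [value, timestamp] pairs like:

[{"target": "splunk.mymetric", "datapoints": [[null, 1547040000], [12.0, 1547040030]]}]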

Related

Azure Time Series - Can't get data

I have set up an IoT Hub that receives messages from a device. The hub is getting the messages, and I am able to see the information reaching and being processed in TSI.
[screenshot: metrics from TSI in Azure]
However, when trying to view the data in the TSI environment I get an error message saying there is no data.
I think the problem might have to do with setting up the model. I have created a hierarchy, types, and an instance.
[screenshot: model view - instance]
As I understand it, the instance fields are what is needed to reference the set of data. In my case, the JSON message being pushed through the IoT Hub has a field called dvcid, in which "1" is the name of the only device sending values.
Am I doing something wrong?
How can I check the data being stored in TSI, like the rows and columns?
Is there a tutorial or example online where I can see the raw data going in and the model creation based on that data?
Thanks in advance
I also had a similar issue when I first tried using TSI. My problem was due to the timestamp I sent, which was not in a proper format (the formatter produced things like "/Date(1547048015593+0100)/", which is not a typical way of encoding dates). When I specified the 'o' (round-trip ISO 8601) date-to-string format, it worked fine afterwards:
message.Timestamp = DateTime.UtcNow.ToString("o");
Hope this helps
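A minimal sketch of the same idea in Python, in case it helps; the payload fields and value are assumptions based on the question, and only the timestamp format is the point:

import json
from datetime import datetime, timezone

payload = {
    "dvcid": "1",  # the device id mentioned in the question
    "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601, e.g. 2019-01-09T14:53:35.593000+00:00
    "value": 42.0,  # hypothetical telemetry value
}
message_body = json.dumps(payload)  # this body is then sent through the IoT Hub device client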

Graphite Render URL API to Splunk - Track received events?

I'd like to set up a scripted input in Splunk to do a curl against the render URL API for Graphite. I imagine I could configure this input to run every minute and retrieve the last minute's worth of events.
My concern with this is that some events might be missed, or duplicated.
Has anybody done something similar to this? How could I keep track of the events from Graphite that I have already read?
If you write a modular input you can use data checkpoints. See the docs for more info: http://docs.splunk.com/Documentation/Splunk/6.2.1/AdvancedDev/ModInputsCheckpoint
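A rough sketch of that pattern, assuming a checkpoint file and the standard Graphite render API; the host, metric, and paths are placeholders, not Splunk-provided values:

import json, os, time
import requests

RENDER_URL = "http://graphite.example.com/render"   # placeholder host
CHECKPOINT = "/opt/splunk/var/graphite.checkpoint"  # placeholder path

def read_checkpoint(default):
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    return default

now = int(time.time())
since = read_checkpoint(now - 60)

# The render API accepts Unix epoch times for from/until.
resp = requests.get(RENDER_URL, params={
    "target": "stats.some.metric",  # placeholder metric
    "from": str(since),
    "until": str(now),
    "format": "json",
})
resp.raise_for_status()

for series in resp.json():
    for value, ts in series["datapoints"]:
        if value is not None:  # skip slots Graphite has not flushed yet
            print(json.dumps({"time": ts, "metric": series["target"], "value": value}))

# Advance the checkpoint only after the window has been emitted, so a crash
# re-reads the window: this trades possible duplicates for no gaps.
with open(CHECKPOINT, "w") as f:
    f.write(str(now))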
My concern with this is that some events might be missed, or duplicated.
Yes, data may go missing, in two cases:
If you're pushing your Graphite server to its limits, there is a lag between the point where a datapoint is received and its flushing to disk. With large queues, I have seen this lag go up to 20 minutes (IO is the constraint here).
For example, in the case above where there's a 20-minute lag and I am storing data at 1-minute granularity, the latest 20 datapoints will have NULL against their timestamps. Of course, they will soon fill in with the next flush.
Know that these lags are indeterminate, so only go for this approach if you have a zero-lag deployment.
The latest datapoint may or may not be NULL at any given point, because of the flushing nature of Graphite, even if nothing is throttling. You can use something like &from=-21min&until=-1min to make sure you never encounter this. Note: your monitoring now lags by a minute. :)
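As a concrete illustration of that lag window, the full render call would look something like this (hypothetical host and metric):

http://graphite.example.com/render?target=stats.some.metric&from=-21min&until=-1min&format=json

Every slot in the returned window is at least a minute old, so it has had a chance to be flushed.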
All said, Graphite is a great monitoring tool if your requirements aren't realtime.

Get all leads programmatically in Marketo v1

I would like to get all the leads that a customer has in Marketo.
I understand that you can use the Get Multiple Leads by Filter Type REST API endpoint.
If I do not have access to their Marketo UI, how should I get all the leads?
I was thinking about querying 300 ids at a time until there were no more results. But I am unsure how to handle the case where all leads in a 300-id batch are deleted, but there are leads after that deleted batch. Are deleted leads returned?
I'll describe a workaround you can use to determine the last lead that was created in Marketo using the REST API. You can use this lead as the upper bound for lead id, and then query leads 300 at a time until you reach this upper bound as you described.
The workaround is to use the Get Lead Activities API to return activities for the most recent leads that were created. By calling this API, you can determine the last lead that was created in Marketo, and then use it as your upper bound.
Here are some tips for calling the Get Lead Activities API (a sketch putting them together follows the example call below):
- Specify an activityTypeIds=12 parameter to return activities for new leads.
- Include a paging token parameter as the start date to look for the most recent leads that were created. To generate a paging token, you will need to use the Get Paging Token API.
- To optimize this, start with a time range that is close to the current date. For example, first query the Get Lead Activities API for leads created in the past hour. Then if there are no results, query for the past day, and so on.
- Iterate through the results from the Get Lead Activities API until the moreResult attribute in the response is false. The last lead returned will be the upper bound for lead ids.
For example, a call to the Get Lead Activities API will look like:
/rest/v1/activities.json?nextPageToken=GIYDAOBNGEYS2MBWKQYDAORQGA5DAMBOGAYDAKZQGAYDALBQ&activityTypeIds=12
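Putting the whole flow together, a hedged sketch in Python; the instance URL, token, and anchor date are placeholders, the endpoints are the ones named above, and the treatment of deleted ids in the last step is an assumption:

import requests

BASE = "https://<munchkin-id>.mktorest.com"  # placeholder instance URL
AUTH = {"access_token": "..."}               # placeholder token

# 1. Get a paging token anchored at a recent datetime (Get Paging Token API).
r = requests.get(BASE + "/rest/v1/activities/pagingtoken.json",
                 params={**AUTH, "sinceDatetime": "2019-01-01T00:00:00Z"})
token = r.json()["nextPageToken"]

# 2. Page through "New Lead" activities (activityTypeIds=12) until
#    moreResult is false; the highest leadId seen is the upper bound.
max_lead_id, more = 0, True
while more:
    r = requests.get(BASE + "/rest/v1/activities.json",
                     params={**AUTH, "nextPageToken": token, "activityTypeIds": "12"})
    body = r.json()
    for activity in body.get("result", []):
        max_lead_id = max(max_lead_id, activity["leadId"])
    token, more = body["nextPageToken"], body["moreResult"]

# 3. Fetch leads 300 ids at a time up to the upper bound; the assumption
#    here is that ids matching no lead (including deleted ones) are simply
#    absent from the result, so empty batches are safe to skip past.
for start in range(1, max_lead_id + 1, 300):
    ids = ",".join(str(i) for i in range(start, min(start + 300, max_lead_id + 1)))
    r = requests.get(BASE + "/rest/v1/leads.json",
                     params={**AUTH, "filterType": "id", "filterValues": ids})
    for lead in r.json().get("result", []):
        print(lead["id"], lead.get("email"))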

OpenTSDB api query pagination

I am using OpenTSDB for my time series data.
I have a use-case on the front end in which a user can fetch data from OpenTSDB between specific dates:
http://localhost:5000/api/query?start=2014/06/04%2020:30&end=2014/09/18%2000:00&m=sum:cpu_system
My problem is that the returned data is too large: thousands of records if I fetch data for an interval of more than one day. The service call then takes a couple of minutes, which makes for a bad user experience on the front end.
I want to apply pagination on the service call so that it will take less time.
The /api/query documentation does not have any mention of pagination. The /api/search documentation does offer pagination, but does not have any mention of time ranges.
How can I query over a time range with pagination?
There is no native pagination support in queries, but you can always emulate it by splitting your time range into multiple queries, so that, for example, you only ask for a day in each query.
Another solution that may be feasible in some cases is to ask OpenTSDB to downsample the data. This way it will return far fewer data points, and your application will have less data to download and process.
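A sketch of the emulation, splitting the range into day-sized windows with one /api/query call per window; the host, metric, and dates come from the question, and the downsampling is optional:

from datetime import datetime, timedelta
import requests

BASE = "http://localhost:5000/api/query"

def fetch_range(metric, start, end, step=timedelta(days=1)):
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield requests.get(BASE, params={
            "start": cursor.strftime("%Y/%m/%d %H:%M"),
            "end": window_end.strftime("%Y/%m/%d %H:%M"),
            # "1h-avg" asks OpenTSDB to downsample to one point per hour
            "m": "sum:1h-avg:" + metric,
        }).json()
        cursor = window_end

# One "page" per day between the dates from the question:
for page in fetch_range("cpu_system",
                        datetime(2014, 6, 4, 20, 30),
                        datetime(2014, 9, 18, 0, 0)):
    print(page)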

Is there any way to fill in missing data in graphite when using statsD?

I'm using statsD to report counter data to Graphite; it sends a tick every time I get a message. This works great, except when statsD has to restart for whatever reason. Then I get huge holes in my graphs, since statsD is no longer sending '0' every 10 seconds for periods when I didn't get any messages.
I'm reporting for various different message types and queues, and sometimes I don't get a message for a particular queue for a long time.
Is there any existing way to 'fill-in' the missing data with a default value I specify (in my case this would be 0)?
I thought about sending a '0' count for a given metric so that statsD starts sending 0's for it, but I don't always know the set of metrics I'll be reporting in advance.
Check out the transformNull function that Graphite provides, e.g.
transformNull(stats.timers.deploys.all.duration.total.mean, 0)
This will map sections with null data to 0.
You can use the keepLastValue function in Graphite to deal with missing data; in the render API you call it as keepLastValue(seriesList). It "[c]ontinues the line with the last received value when gaps (‘None’ values) appear in your data, rather than breaking your line."
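For example, using the same series as above:
keepLastValue(stats.timers.deploys.all.duration.total.mean)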
If you just want to "fill in" the visual graph with zeros, look at "Graph Options -> Line Mode -> Draw Null as Zero". This won't let you set a value other than 0, and it won't cause 0s to show up if you get the data in JSON or CSV format, but it's often what you want if you just want to see a graph with some stretches where no data gets recorded.
The solution to this problem is not to keep the last value or transform nulls. Implementing one of those options will only cause you to display incorrect data, and you will not be alerted when something is wrong.
You need to change your storage schema so that it stores the amount of data that you're sending, and no more.
If metrics are being sent every 5s and your storage schema says 1s, you will get five data points, four of which will be null.
Check out this doc: https://github.com/etsy/statsd/blob/master/docs/graphite.md
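For instance, if statsD flushes every 10 seconds (its default), the first retention in storage-schemas.conf should match, along the lines of the example in that doc:

[stats]
pattern = ^stats.*
retentions = 10s:6h,1min:7d,10min:5y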
