Graphite — filter out N latest datapoints? - graphite

Is there a way to filter out the N latest data points of Graphite series?
The latest data points contain incomplete data and I want to filter them out. Timeshift does not cut it.

You could use relative time in the render query by modifying from/until as needed, that might help. https://graphite.readthedocs.io/en/latest/render_api.html#from-until
If it's only the most recent few datapoints you want to ignore, but if it's some data in the middle of the range, that wont work. If the data you want to omit our outliers of some sort, with values exceeding or far below the other data, you could use something like removeAbove/BelowValue or removeAbove/BelowPercentile: https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.removeAbovePercentile

Related

ggplot2 histogram with outlier bin from tibble?

Much like this previous question, I am trying to create some histograms with ggplot2 and would like to add a "greater than X"-bin that includes all values greater than some value.
In this particular case, I am dealing with a set of bike trip data with trip durations that vary from very small (under a minute) to very, very large (hours to days). I have filter()ed the set to remove the truly extraneous rows, but I would still like to show some of the outliers on the plot.
I had already found the solution given there, the package ggoutlier, but it evidently requires the data be in a data.frame—I get an error message that "is.data.frame(x) is not TRUE". This is correct, x is a tibble—is there another solution that works with a tibble?

Google Spreadsheet IF and AND

im trying to find an easy formula to do the following:
=IF(AND(H6="OK";H7="OK";H8="OK";H9="OK";H10="OK";H11="OK";);"OK";"X")
This actually works. But I want to apply to a range of cells within a column (H6:H11) instead of having to create a rule for each and every one of them... But trying as a range:
=IF(AND(H6:H11="OK";);"OK";"X")
Does not work.
Any insights?
Thanks.
=ArrayFormula(IF(AND(H6:H11="OK");"OK";"X"))
also works
arrayformulas work the same way they do in excel... they just need an ArrayFormula() around to work (will be automatically set when pressing Ctrl+Alt+Return like in excel)
In google sheets the formula is:
=ArrayFormula(IF(SUM(IF(H6:H11="OK";1;0))=6;"OK";"X"))
in excel:
=IF(SUM(IF(H6:H11="OK";1;0))=6;"OK";"X")
And confirm with Ctrl-Shift-Enter
This basically counts the number of times the said range is = to the criteria and compares it to the number it should be. So if the range is increased then increase the number 6 to accommodate.

Missing Time Series data in Hadoop

I have a big text file (in TBs), every line has a timestamp and some other data, like this:
timestamp1,data
timestamp2,data
timestamp5,data
timestamp7,data
...
timestampN,data
This file is ordered by timestamp but there might be gaps between consecutive timestamps. I need to fill those gaps and write the new file.
Can this be done in Hadoop Map Reduce? The reason for asking this question,to interpolate the missing lines I need the previous and next lines too. For Eg. To interpolate timestamp6, I need the values in timestamp5 and timestamp7. So what if, starting from timestamp7 sits in another data block in which case I will not be able to calculate timestamp6 at all..
Any other algorithm/solution? Maybe this can not be done with mapreduce? Can we do this in RHADOOP?
(Pig/Hive solutions are also valid)
Though my suggestion is a bit tedious and may impact a little bit performance also. You can implement your own RecordReader and at the end of all lines in the current split, get the first line of next split using its block location. I am suggesting this because, hadoop itself do this if last line of any mapper is incomplete. Hope this helps!!

GeoServer: extract raster data of a time period

is there a way to use WMS->GetFeatureInfo specifying a TIME period (eg: 2014-01-01/2014-03-01) to extract a series of values from a raster layer loaded from a GeoServer instance?
Thanks in advance
Not at the moment, no. It may be added in the future though, it's not the first time I hear this request. I don't have a ETA, it depends on when funding to work on it shows up.
In the meantime, a somewhat complex workaround might be to configure the image mosaic index as a WFS feature type, query it by date, figure out the exact time values intersected by the interval, and then do N GetFeatureInfo requests, one for each of those values.

JasperReports CategoryDataset has less data than expected?

I'm trying to develop a ChartCustomizer that takes the data from a chart and converts it into a histogram (because JR does not directly support histograms). It's a fairly simple implementation with hard-coded intervals, etc. mostly as a proof-of-concept at this point.
The data I'm analyzing is HTTP response-time data of the form [date, response-time] and I have a CSV file with 18512 records in it. In my summary band, I have 3 items:
A text field dumping $V{REPORT_COUNT} (it reports 18512 in iReport's report preview)
A time series showing all the data points [date, response-time]
A category plot containing all the data points in a single series [category=$F{DATE}, value=$F{RESPONSE_TIME}]
I decided that the most straightforward way to build a histogram would be to use the Category plot because it had the right structure for the final histogram chart.
When the ChartCustomizer runs, it dumps out all kinds of good information about the data set, including the size. Strangely, the size is 10252: it's missing something like 8000 data points. I can't understand why the category plot would have fewer data points than the whole data set.
Any ideas?
Answering my own question in case others run across this foolish user error.
The problem was that CategoryDataset only allows one data point per "category", and in my case, "category" was a java.util.Date captured from the web server log. Apparently, nearly half of my dates were duplicates and so part of the data set overwrote the other half, leaving a subset of the data.
That should have been totally obvious to me at the outset, because that is exactly how a category dataset works.
Anyhow, simply changing the category plot series's "category expression" from $F{DATE} to $V{REPORT_COUNT} gave each datum a unique category which makes everything work.

Resources