Can Graphite (Whisper) metrics be renamed?

I'm preparing to refactor some Graphite metric names, and would like to be able to preserve the historical data. Can the .wsp files be renamed (and possibly moved to new directories if the higher level components change)?
Example: group.subgroup1.metric is stored as:
/opt/graphite/storage/whisper/group/subgroup1/metric.wsp
Can I simply stop loading data and move metric.wsp to metricnew.wsp?
Can I move metric.wsp to whisper/group/subgroup2/metric.wsp?

Yes.
The storage architecture is pretty flexible. Rename, move, or delete away; just make sure you update your storage-schemas and storage-aggregation settings for the new location or pattern.
More advanced use cases, like merging into existing whisper files, can get tricky, but they can also be done with the help of the included scripts. The following page contains an overview of the included Whisper scripts:
https://github.com/graphite-project/whisper
That said, it sounds like you don't already have existing data in the new target location so you can just move them.
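As a minimal sketch of the plain move described above: the helper names `metric_to_path` and `rename_metric` are hypothetical, and the root directory is simply the one from the question. It assumes nothing is writing to the metric while the file is moved, and it refuses to overwrite an existing target, since that case calls for a merge rather than a move.

```python
import os
import shutil

# Root directory taken from the question; adjust for your installation.
WHISPER_ROOT = "/opt/graphite/storage/whisper"


def metric_to_path(metric, root=WHISPER_ROOT):
    """Map a dotted metric name to the path of its .wsp file."""
    return os.path.join(root, *metric.split(".")) + ".wsp"


def rename_metric(old, new, root=WHISPER_ROOT):
    """Move the whisper file for `old` to the location for `new`.

    Refuses to overwrite an existing file, because combining two
    whisper files is a merge, not a move.
    """
    src = metric_to_path(old, root)
    dst = metric_to_path(new, root)
    if os.path.exists(dst):
        raise FileExistsError(dst)
    # Create the new directory levels if the higher components changed.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    return dst
```

For example, `rename_metric("group.subgroup1.metric", "group.subgroup2.metric")` would cover the second case from the question. Run it as the user that owns the whisper files so the moved file stays writable.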

Related

How to keep track of database changes

I'm working with Progress 11.6 appBuilder and procedure editor (and Data Dictionary).
We regularly make modifications to the customer's database. There are two types of modifications:
Modifications of the structure: those are done using the interactive GUI of the data dictionary.
Modifications of the data: those are done using the procedure editor.
An example of a data modification in the procedure typically looks like this:
FOR EACH Table1 WHERE Table1.Field1 = <value>:
    CREATE Table2.
    Table2.Field1 = <value>.
    Table2.Field2 = <some-other-value>.
END.
This completely contradicts one of the basics of software delivery quality, repeatability: there is no way to return to the previous situation!
Therefore I'm looking for ways to do this in an (automatable) repeatable way, hence my questions:
What can we use instead of the interactive GUI of data dictionary (without undo feature) in order to perform/undo database structure modifications?
What can we do in order to undo database data modifications? (Is there something like an Oracle redo log or an Oracle archive log in Progress?)
In case you say "What are you talking about? You can do "Undo transaction" in the data dictionary.", I mean the following:
I perform a transaction using the data dictionary, I leave the data dictionary, and a day later the customer complains. When I open the data dictionary at that moment, the "Undo transaction" feature is disabled.
At a high level you should be creating "df files" (DDL scripts) and applying those to the customer database rather than manually making changes. There are many ways to create those files and you can automate the entire process with the appropriate tooling.
One of the most common ways to create a df file is to create whatever new schema you need in your development database and then use the "create an incremental df" facility in the data dictionary tool. This tool compares the development database schema to the target schema and builds a "df file" (DDL script) of the differences. You could connect directly to the target db for this process or you could have an empty skeleton db that you use for this.
How to create an incremental df file
(If you then reverse the comparison you can also create a reversing df file to undo the changes.)
Most df files consist of additions - new tables, new fields, new indexes. These can all be added online and that can all be completely scripted. And, of course, the individual df files and all of the supporting scripts can (and should) be stored in a repository (like git or whatever).
As for the data change scripts... there's no reason that those programs cannot be written as actual programs and saved in a repository. You can enclose the whole update in a transaction and UNDO it if that is appropriate. For what it is worth, I personally do not think that is a very good idea. Especially when large amounts of data are involved you really don't want to be creating monstrous multi-gigabyte undo logs. You're better off with a second "reversing transaction" script that will roll things back piecemeal. A side benefit is that you can still use that if you decide to back out the change a day or three afterwards.
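To illustrate the forward script plus reversing script idea, here is a rough sketch. It is not Progress ABL: it uses Python with SQLite purely as a stand-in, and the `change_id` column, the `CHANGE_ID` label, and the function names are all hypothetical. The forward script tags every row it creates, so the reversing script can find those rows later and remove them in small batches instead of in one huge transaction.

```python
import sqlite3

# Hypothetical label that identifies this particular data change.
CHANGE_ID = "change-001"


def apply_change(conn, value, other_value):
    """Forward script: create one table2 row per matching table1 row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO table2 (field1, field2, change_id) "
            "SELECT ?, ?, ? FROM table1 WHERE field1 = ?",
            (value, other_value, CHANGE_ID, value),
        )


def reverse_change(conn, batch_size=1000):
    """Reversing script: delete the tagged rows in small batches.

    Each batch is its own transaction, so no single transaction
    has to hold the whole rollback. Returns the number of rows removed.
    """
    total = 0
    while True:
        with conn:
            cur = conn.execute(
                "DELETE FROM table2 WHERE rowid IN ("
                "SELECT rowid FROM table2 WHERE change_id = ? LIMIT ?)",
                (CHANGE_ID, batch_size),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

Because the tag is stored with the data, the reversing script still works days later, and both scripts can live in the repository next to the df files.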
The really gory details are going to depend on your development process, the customer's change management process, and the tooling available. It kind of sounds like there is not much process or tooling at either end of this relationship, so you probably have a lot of adventures ahead of you!

What is the best way to handle time consuming dynamic generated reports downloads?

A website serves continuously updated content (think stock exchange) and is required to generate reports on demand, which users then download as files. Users can customize the downloaded report based on lots of parameters.
What is the best practice for handling highly customized reports downloaded as files (.xls)?
How can I cache them and improve performance?
It might be good to mention that the data is stored in RavenDb and the reports are expected to handle 100K results sizes.
Here are some pointers:
Make sure you have static indexes defined in RavenDB to match all possible reports. You don't want to use dynamically generated temp indexes for this.
Probably one or more parameters will drastically change the query, so you may need some conditional logic to choose which of several queries to run. This is especially true for different groupings, as they'll require a different map-reduce index.
Choose whether you want to limit your result set using standard paging with Skip and Take operators, or whether you are going to stream unbounded result sets.
However you build the actual report, do it in memory. Do not try to write it to disk first. Managing file permissions, locks, and cleanup is not worth the hassle. Plus, you risk taking servers down if they run out of disk space.
Preferably you should build the response and stream it out to your user in a single step, so as not to require large amounts of memory on the server. Make sure you understand the yield keyword in C#, and that you work with IEnumerable and IQueryable directly whenever possible. Don't use .ToList() or .ToArray(), which will put the whole result set into memory.
With regard to caching, you could consider using a front-end cache like Memcached, but I'm not sure if it will help you here or not. You probably want data that's as accurate as possible from your database. Introducing any sort of cache will require that you understand how and when to reset that cache. Keep in mind that Raven has several caching layers built in already. Build your solution without a cache first, and then add caching if you need it.
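The streaming pointer above can be sketched as follows. This is Python rather than C#, with a generator playing the role of `yield` and `IEnumerable`, and CSV standing in for the .xls output. The names `report_chunks` and `_drain` are hypothetical, and `rows` is assumed to be any lazy iterable of result rows, such as a streamed query.

```python
import csv
import io


def _drain(buf):
    """Return the buffered text and reset the buffer for reuse."""
    text = buf.getvalue()
    buf.seek(0)
    buf.truncate(0)
    return text


def report_chunks(rows, header):
    """Yield the report as CSV text, one line per chunk.

    `rows` is consumed lazily, so only one row is held in memory
    at a time, however large the result set is.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    yield _drain(buf)
    for row in rows:
        writer.writerow(row)
        yield _drain(buf)
```

A web framework that accepts an iterable as the response body can send each chunk as it is produced. Calling `list(rows)` first would defeat the purpose, for the same reason as `.ToList()` in C#.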

Should I use the WordPress Transients API in this case?

I'm writing a simple WordPress plugin for work and am wondering if using the Transients API is practical in this case, or if I should seek out another way.
The plugin's purpose is simple. I'm making a call to USZip Web Service (http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP) to retrieve data. Our sales team is using a Lead Intake sheet that the plugin will run on.
I wanted to reduce the number of API calls, so I thought of setting a transient for each zip code as the key and store the incoming data (city and zip). If the corresponding data for a given zip code already exists, then no need to make an API call.
Here are my concerns:
1. After a quick search, I realized that the transient data is stored in the wp_options table, and storing the data would balloon that table in no time. Would this cause a significant performance issue if the db becomes huge?
2. Is it horrible practice to create this many transient keys? It could easily become thousands in a few months' time.
If using Transient is not the best way, could you please help point me in the right direction? Thanks!
P.S. I opted for the Transients API over the Options API. I know zip codes don't change often, but they sometimes do. I set an expiration time of 3 months.
A less-inflated solution would be:
Store a single option called uszip with a serialized array inside the option
Grab the entire array each time and simply check if the zip code exists
If it doesn't exist, grab the data and save the whole array again
You should make sure you don't hit the upper bounds of a serialized array in this table (9,000 elements) considering 43,000 zip codes exist in the US. However, you will most likely have a very localized subset of zip codes.
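Here is a language-neutral sketch of the single-option approach above, written in Python rather than PHP. The `store` argument is a plain dict standing in for the options table (in WordPress that role would be played by `get_option` and `update_option`), `fetch` stands in for the web service call, and the `fetched_at` field and `TTL_SECONDS` value are assumptions added to keep the 3-month expiry from the question.

```python
import json
import time

# Roughly the 3 months mentioned in the question.
TTL_SECONDS = 90 * 24 * 3600


def lookup_zip(zip_code, store, fetch, now=None):
    """Return data for zip_code, calling fetch() only on a miss or expiry.

    The whole cache is kept under the single key "uszip" as one
    serialized (JSON) array, mirroring the single-option idea.
    """
    now = time.time() if now is None else now
    cache = json.loads(store.get("uszip", "{}"))
    entry = cache.get(zip_code)
    if entry is not None and now - entry["fetched_at"] < TTL_SECONDS:
        return entry["data"]
    data = fetch(zip_code)  # the remote web service call
    cache[zip_code] = {"data": data, "fetched_at": now}
    store["uszip"] = json.dumps(cache)  # save the whole array back
    return data
```

The trade-off is that every lookup reads and every miss rewrites the full array, which is fine for a localized subset of zip codes but grows with the number of entries.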

Making Graphite UI data cumulative by default

I'm setting up Graphite, and hit a problem with how data is represented on the screen when there's not enough pixels.
I found this post whose first answer is very close to what I'm looking for:
No, what is probably happening is that you're looking at a graph with more datapoints than pixels, which forces Graphite to aggregate the datapoints. The default aggregation method is averaging, but you can change it to summing by applying the cumulative() function to your metrics.
Is there any way to get this cumulative() behavior by default?
I've modified my storage-aggregation.conf to use 'aggregationMethod = sum', but I believe this is for historical data and not for data that's displayed in the UI.
When I apply cumulative() everything is perfect, I'm just wondering if there's a way to get this behavior by default.
I'm guessing that even though you've modified your storage-aggregation.conf to use 'aggregationMethod = sum', the metrics you've already created have not changed their aggregationMethod. The rules in storage-aggregation.conf only affect new metrics.
To change your existing metrics to be summed instead of averaged, you'll need to use whisper-resize.py. Or you can delete your existing metrics and they'll be recreated with sum.
Here's an example of what you might need to run:
whisper-resize.py --xFilesFactor=0.0 --aggregationMethod=sum /opt/graphite/storage/whisper/stats_counts/path/to/your/metric.wsp 10s:28d 1m:84d 10m:1y 1h:3y
Make sure to run that as the same user who owns the file, or at least make sure the files have the same ownership when you're done, otherwise they won't be writeable for new data.
Another possibility if you're using statsd is that you're just using metrics under stats instead of stats_counts. From the statsd README:
In the legacy setting rates were recorded under stats.counter_name
directly, whereas the absolute count could be found under
stats_count.counter_name. With disabling the legacy namespacing those
values can be found (with default prefixing) under
stats.counters.counter_name.rate and stats.counters.counter_name.count
now.
Basically, metrics are aggregated differently under the different namespaces when using statsd, and you want stuff under stats_counts or stats.counters for things that should be summed.
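To see why the aggregation method matters for counters, here is a simplified model of squeezing many datapoints into fewer values. It is only an illustration, not Graphite's actual implementation, and the names `consolidate` and `average` are hypothetical.

```python
def consolidate(points, max_points, func):
    """Reduce `points` to at most `max_points` values.

    Consecutive datapoints are grouped into equal-sized buckets and
    each bucket is collapsed with `func`. None values (gaps) are
    ignored, and a bucket with no data stays None.
    """
    if not points:
        return []
    per_bucket = -(-len(points) // max_points)  # ceiling division
    result = []
    for start in range(0, len(points), per_bucket):
        bucket = [p for p in points[start:start + per_bucket] if p is not None]
        result.append(func(bucket) if bucket else None)
    return result


def average(values):
    """Mean of a non-empty list of numbers."""
    return sum(values) / len(values)
```

With per-interval counts `[1, 1, 1, 1, 2, 2, 2, 2]` reduced to two values, averaging gives `[1.0, 2.0]` while summing gives `[4, 8]`. For a counter, only the summed values still add up to the true total of 12.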

Is it possible (and wise) to add more data to the riak search index document, after the original riak object has been saved (with a precommit hook)?

I am using riak (and riak search) to store and index text files. For every file I create a riak object (the text content of the file is the object value) and save it to a riak bucket. That bucket is configured to use the default search analyzer.
I would like to store (and be able to search by) some metadata for these files. Like date of submission, size etc.
So I have asked on IRC, and also given it quite some thought.
Here are some solutions, though they are not as good as I would like:
I could have a second "metadata" object that stores the data in question (maybe in another bucket), have it indexed etc. But that is not a very good solution especially if I want to be able to do combined searches like value:someword AND date:somedate
I could put the contents of the file inside a JSON object like: {"date":somedate, "value":"some big blob of text"}. This could work, but it's going to put too much load on the search indexer, as it will have to first deserialize a big JSON object (and those files are sometimes quite big).
I could write a custom analyzer/indexer that reads my file object and generates/indexes the metadata in question. The only real problem here is that I have a hard time finding documentation on how to do that. And it is probably going to be a bit of an operational PITA, as I will need to push some Erlang code to every riak node (and remember to do that when I update the cluster, when I add new nodes etc.). I might be wrong on this; if so, please correct me.
So the best solution for me would be if I could alter the riak search index document, and add some arbitrary search fields to it, after it gets generated. Is this possible, is this wise, and is there support for this in libraries etc.? I can certainly modify the document in question "manually", as a bucket with index documents gets automatically created, but as I said, I just don't know what's the right thing to do.
