Munin: Is it possible to recover lost graphs?

I'm not extremely familiar with how munin works, so I apologize if this is obvious.
I've been using munin for a couple of my projects now, and I've run into this twice: I lose all the munin graphs that were generated from past events. This just happened to me this afternoon, and now I only have a track record of system events since this afternoon.
Are these graphs recoverable? Is there data stored somewhere that's used to generate these graphs?
If it's unrecoverable, I would like to know what could possibly have caused this to occur. Granted, each time this has happened, I was messing with my munin config settings. In this case, I was adding new servers to be logged by munin. I don't see how doing that would cause munin to lose all data on my other servers.
Thanks.

The data is generated from .rrd databases, which are updated during 'munin-update' (usually called from 'munin-cron'). You can find those files in the directory specified by 'dbdir' in munin.conf (/var/db/munin on my site). If your graphs only show events since this afternoon, my guess is that the .rrd files got deleted, recreated, or corrupted. You should be able to restore them from backup and then call 'munin-graph' and 'munin-html'; see the respective man pages for those commands.
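For illustration, here is a rough sketch of that recovery in Python. The backup location, the dbdir path, and the 'munin' user/ownership are assumptions; check your own munin.conf and distribution layout before running anything like this:

```python
# Rough sketch: restore munin's .rrd files from a backup and regenerate
# graphs/HTML. Paths, ownership, and command locations are assumptions --
# adjust to match your munin.conf and distribution (run as root).
import shutil
import subprocess
from pathlib import Path

BACKUP_DIR = Path("/backup/var/db/munin")   # hypothetical backup location
DBDIR = Path("/var/db/munin")               # 'dbdir' from munin.conf

# Copy the backed-up .rrd databases back into dbdir, preserving timestamps.
for rrd in BACKUP_DIR.rglob("*.rrd"):
    target = DBDIR / rrd.relative_to(BACKUP_DIR)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(rrd, target)
    shutil.chown(target, user="munin", group="munin")  # keep munin's ownership

# Regenerate graphs and HTML as the munin user.
# (On some distributions these live in /usr/share/munin/ rather than on PATH.)
subprocess.run(["sudo", "-u", "munin", "munin-graph"], check=True)
subprocess.run(["sudo", "-u", "munin", "munin-html"], check=True)
```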

Related

Running an R Script to Update a Database

I have written a script in R that calculates a specific value for each of the S&P 500 stocks. I would like to run this script every five minutes during trading hours and have it upload the values to an online database.
I don't know anything about IT. I was thinking of running the script on AWS and having it upload the values to a SQL database, or an AWS-hosted SQL server, every five minutes.
Do you have any ideas about how I should approach this problem, or any other methodologies I could use?
Thank you.
If you want to go the AWS route with a database then there are any number of ways to achieve this, but here is an outline of a fairly straightforward approach.
Launch a database. See e.g. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html.
Launch an EC2 instance. See e.g. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html.
Set up a cron job to launch your R script every 5 minutes during business hours from the EC2 instance. See e.g. https://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/.
Use an R package such as dplyr/dbplyr/DBI/odbc as suggested by @rpolicastro to connect to and write data to the database (see the sketch after these steps).
I'm glossing over a lot of complication in setting up the system in AWS, but hopefully this can get you started. Also, if you really care about ensuring that you never miss any data timepoints then you may either need to set up some kind of redundancy systems, or code in the ability to look backwards in time and fill in the missing timepoints.
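To make the last two steps concrete, here is a minimal sketch of the kind of job the cron entry would run. It is written in Python just to show the pattern (the RDS endpoint, credentials, table layout, and schedule are all hypothetical); from R you would do the equivalent with DBI/odbc as suggested above:

```python
# Minimal sketch of a job cron runs every 5 minutes: connect to the database
# and append freshly computed values. Endpoint, credentials, and table are
# hypothetical placeholders.
#
# Example crontab entry (every 5 minutes on weekdays; adjust the hours to
# your market's timezone):
#   */5 9-16 * * 1-5 python3 /opt/jobs/upload_values.py
from datetime import datetime, timezone

import psycopg2  # assumes a PostgreSQL RDS instance


def upload_values(values):
    """values: list of (ticker, value) tuples computed by the script."""
    conn = psycopg2.connect(
        host="mydb.abc123.us-east-1.rds.amazonaws.com",  # hypothetical endpoint
        dbname="markets",
        user="writer",
        password="********",
    )
    now = datetime.now(timezone.utc)
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.executemany(
            "INSERT INTO stock_values (ts, ticker, value) VALUES (%s, %s, %s)",
            [(now, ticker, value) for ticker, value in values],
        )
    conn.close()


if __name__ == "__main__":
    upload_values([("AAPL", 1.23), ("MSFT", 4.56)])  # placeholder data
```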

Add a new node - OpenLDAP N-Way

I have been running an OpenLDAP N-Way multi-master structure with two master nodes. This configuration has been running for some months without any problem. https://www.openldap.org/doc/admin24/replication.html
Now I need to add a third node. The strange behavior is that when I add this third node (with a clean database), it starts to delete the entries of the other two nodes.
It seems that the "clean database" is getting replicated to the other servers, deleting several entries. This is creating a lot of problems, even making it hard to restore the backups.
I am looking for the best practice/way to add a completely new node to this already-running environment without losing data.
Also, is there any official documentation about the best way to back up this environment?
Any information is welcome.
Thank you,

Per-asset versioning

Until now I have been using the standard "assets.version" configuration directive for the versioning of my assets. I release a new production version quite frequently (once a week or more), so if I change a single asset (e.g. a JavaScript file), I increment my "version counter".
Here is my problem with this system: changing a single line in one asset invalidates every asset of the whole application! This means that every week, users connecting to my application re-download all assets. This appears to be quite inefficient to me...
My question: is there a smarter system? For example, we could imagine a console command, executed before each release, that would track changes to every asset (using e.g. md5) and save the version to be used for each individual asset. This way, only modified assets would be re-downloaded...
I know I can develop my own service and use assets.version_strategy as in this example. But before re-inventing the wheel, I would like to know whether something similar already exists. It seems to me that everyone should be using such a solution...
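To illustrate what I have in mind, here is a rough sketch of such a pre-release command (in Python just to show the logic; the directory and manifest name are made up):

```python
# Rough sketch: walk the public asset directory, compute an md5 per file,
# and write a manifest that a custom version strategy could read.
# Paths and the manifest name are made up for illustration.
import hashlib
import json
from pathlib import Path

ASSET_DIR = Path("web/assets")             # hypothetical asset root
MANIFEST = Path("web/asset-versions.json")


def build_manifest():
    versions = {}
    for path in ASSET_DIR.rglob("*"):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()[:8]
            versions[str(path.relative_to(ASSET_DIR))] = digest
    MANIFEST.write_text(json.dumps(versions, indent=2, sort_keys=True))


if __name__ == "__main__":
    build_manifest()
```

A custom strategy registered via assets.version_strategy could then look each asset up in that manifest, so only files whose hash actually changed would get a new URL.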
Thank you!
Vincent

Alfresco failed to find new folder

I'm using Alfresco through CMIS.
On one of our environments, we have an issue.
We want to create a folder and put some docs in it.
This works fine in all our environments except one.
In this one, we can create the folder.
But when we do a search to find the folder, the folder isn't found.
After that, I can find it with the Share GUI.
I have no error message in the Share app.
Does anyone have an idea what the issue could be?
Promoting a comment to an answer...
When using Alfresco with SOLR, you need to be aware that the SOLR index isn't quite real-time. Close to real time, sure, but it's asynchronous, so there's always a lag. (It's an eventually consistent index, not a fully real-time one.)
There's a lot of information on the Alfresco and SOLR Wiki, including the way you can query what the current lag is.
If the lag is very low (e.g. on a lightly loaded system), you can find that SOLR catches up almost instantly and newly created items show up in the search results right away. However, it's more normal to expect to have to wait a little, especially on more heavily loaded systems.
If no new results are showing up even after several minutes, you'll want to follow the instructions on the wiki or the SOLR Monitoring and Troubleshooting docs to work out why and fix it.
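If your search goes through CMIS right after the folder is created, one pragmatic workaround is to poll the query for a short while until the index catches up. A rough sketch in Python, assuming the Apache Chemistry cmislib client (the repository URL, credentials, and folder name are placeholders):

```python
# Rough sketch: retry a CMIS query until the SOLR index has caught up.
# Repository URL, credentials, and folder name are placeholders.
import time

from cmislib import CmisClient  # Apache Chemistry cmislib

client = CmisClient(
    "http://alfresco.example.com/alfresco/api/-default-/public/cmis/versions/1.1/atom",
    "admin",
    "admin",
)
repo = client.defaultRepository


def wait_for_folder(name, timeout=60, interval=5):
    """Poll until the newly created folder shows up in search results."""
    query = "SELECT * FROM cmis:folder WHERE cmis:name = '%s'" % name
    deadline = time.time() + timeout
    while time.time() < deadline:
        results = list(repo.query(query))
        if results:
            return results[0]
        time.sleep(interval)  # index is eventually consistent -- wait and retry
    return None


folder = wait_for_folder("my-new-folder")
```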

What's the best way to sync large amounts of data around the world?

I have a great deal of data to keep synchronized across 4 or 5 sites around the world, around half a terabyte at each site. This changes (additions or modifications) by around 1.4 gigabytes per day, and the data can change at any of the sites.
A large percentage (30%) of the data consists of duplicate packages (perhaps packaged-up JDKs), so the solution would have to include a way of noticing that such things are already lying around on the local machine and grabbing them instead of downloading them from another site.
Version control is not an issue; this is not a codebase per se.
I'm just interested in whether there are any solutions out there (preferably open source) that come close to such a thing.
My baby script using rsync doesn't cut the mustard any more; I'd like to do more complex, intelligent synchronization.
Thanks
Edit: This should be UNIX-based :)
Have you tried Unison?
I've had good results with it. It's basically a smarter rsync, which may be what you want. There is a listing comparing file-syncing tools here.
Sounds like a job for BitTorrent.
For each new file at each site, create a BitTorrent seed file and put it into a centralized, web-accessible directory.
Each site then downloads (via BitTorrent) all files. This will get you bandwidth sharing and automatic reuse of local copies.
The actual recipe will depend on your needs.
For example, you can create one BitTorrent seed for each file on each host, and set the modification time of the seed file to be the same as the modification time of the file itself. Since you'll be doing it daily (hourly?), it's better to use something like "make" to (re-)create seed files only for new or updated files.
Then you copy all seed files from all hosts to the centralized location ("tracker dir") with the option "overwrite only if newer". This gets you a set of torrent seeds for the newest copies of all files.
Then each host downloads all seed files (again, with the "overwrite if newer" setting) and starts a BitTorrent download for all of them. This will download/re-download all the new/updated files.
Rinse and repeat, daily.
By the way, there will be no "downloading from itself", as you said in the comment. If a file is already present on the local host, its checksum will be verified and no downloading will take place.
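To make the "re-create seeds only for new or updated files" step concrete, here is a rough sketch in Python. The directories and tracker URL are made up, and it assumes the mktorrent command-line tool is installed:

```python
# Rough sketch of the make-like step: regenerate a .torrent seed only when
# the data file is newer than its existing seed. Paths, tracker URL, and
# the use of mktorrent are assumptions for illustration.
import os
import subprocess
from pathlib import Path

DATA_DIR = Path("/data/packages")   # files to share
SEED_DIR = Path("/data/seeds")      # local seeds, later copied to the tracker dir
TRACKER = "http://tracker.example.com:6969/announce"

SEED_DIR.mkdir(parents=True, exist_ok=True)

for f in DATA_DIR.rglob("*"):
    if not f.is_file():
        continue
    # Flatten the relative path into the seed name (a simplification).
    seed_name = str(f.relative_to(DATA_DIR)).replace("/", "_") + ".torrent"
    seed = SEED_DIR / seed_name
    if seed.exists() and seed.stat().st_mtime >= f.stat().st_mtime:
        continue  # seed is already up to date
    seed.unlink(missing_ok=True)
    subprocess.run(["mktorrent", "-a", TRACKER, "-o", str(seed), str(f)], check=True)
    # Match the seed's mtime to the data file, as described above.
    os.utime(seed, (f.stat().st_atime, f.stat().st_mtime))
```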
How about something along the lines of Red Hat's Global Filesystem, so that the whole structure is split across every site onto multiple devices, rather than having it all replicated at each location?
Or perhaps a commercial network storage system such as from LeftHand Networks (disclaimer - I have no idea on cost, and haven't used them).
You have a lot of options:
You can try setting up a replicated DB to store the data.
Use a combination of rsync or lftp and custom scripts, but that doesn't suit you.
Use git repos with max compression and sync between them using some scripts.
Since the amount of data is rather large, and probably important, do either some custom development or hire an expert ;)
Check out Super Flexible... it's pretty cool. I haven't used it in a large-scale environment, but on a 3-node system it seemed to work perfectly.
Sounds like a job for Foldershare
Have you tried the detect-renamed patch for rsync (http://samba.anu.edu.au/ftp/rsync/dev/patches/detect-renamed.diff)? I haven't tried it myself, but I wonder whether it will detect not just renamed but also duplicated files. If it won't detect duplicated files, then, I guess, it might be possible to modify the patch to do so.
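Independent of rsync itself, the duplicate-detection idea can be sketched like this: index local files by checksum and reuse an identical local copy before downloading anything. A rough Python illustration (the directory and the download hook are hypothetical):

```python
# Rough sketch: index local files by SHA-1 so that a file needed from a
# remote site can first be looked up (and copied) locally if an identical
# copy already exists. Directory and download hook are placeholders.
import hashlib
import shutil
from pathlib import Path

LOCAL_ROOT = Path("/data/mirror")


def sha1(path, chunk=1 << 20):
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


# Build an index of everything already on this machine.
local_index = {sha1(p): p for p in LOCAL_ROOT.rglob("*") if p.is_file()}


def fetch(expected_sha1, destination, download):
    """Reuse a local duplicate if present; otherwise call the (hypothetical)
    download function to pull the file from a remote site."""
    if expected_sha1 in local_index:
        shutil.copy2(local_index[expected_sha1], destination)
    else:
        download(destination)
```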
