Does anybody know how to take a backup of a Riak database, so that I can restore it to a previous point if anything goes wrong? Basho's site suggests that rsync is the best strategy. I can copy the database files with rsync, but I am unable to link them with a newly created node in the Riak cluster. Please help.
The best advice you will get on backing up Riak is located here: http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/backing-up/
Point-in-time backups are possible but challenging, due to the way your data is distributed around nodes and the built-in repair mechanisms that Riak employs when nodes leave and return to the cluster. If you want to restore a cluster to the state it was in at a given point in time, then all nodes need to be restored to that state at the same time, which likely means downtime (which Riak is designed to avoid).
As to why you are unable to restore the node from the backup you made: you don't provide enough information to determine why the restoration steps in the documentation (http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/backing-up/#restoring-a-node) aren't working for you.
Very often in enterprise applications, something doesn't work as expected and you need to debug and create a fix.
Obviously you can't test in production, as you might have to save something in order to debug it, and you don't want to be responsible for accidentally sending a $1M transaction!
With traditional applications, this process is done by copying the database from production to a dev environment (maybe redacting sensitive data) and duplicating and debugging the problem there.
In Corda you have multiple nodes involved, the nodes have specific keys and the network has a truststore hierarchy.
What is the process to replicate the production structure and copy all the data from production to development in order to debug?
I think it depends on how complicated your setup is.
The easy way to do this rigorously is within a MockNetwork during unit testing. This is the most common setup; there is an example here: https://github.com/corda/samples-kotlin/blob/master/Advanced/obligation-cordapp/workflows/src/test/kotlin/net/corda/samples/obligation/flow/IOUSettleFlowTests.kt
Something I like to do a lot is to use IntelliJ breakpoints in the flow and unit tests to be sure something works the way I expect.
Another way to do it is potentially to use the testnet (again, this depends on your use case): https://docs.corda.net/docs/corda-os/4.7/corda-testnet-intro.html
Yet another way is to write a script that performs all of the transactions you want the nodes to do while running them locally on your machine, using the Corda shell on each local node and feeding the transactions in directly.
Copying data from production to a local network is going to be hard, because you can't fake all the transactions and state history without a lot of really painful editing of all the tables on each node.
Generally, I am excited by the Temporal Database feature.
However, mysqldump is not supported for database export and restore.
I can find no resource in the documentation (linked to above) that indicates which methods of backup and restore are safe to use for this type of database. Google searches do not seem to help.
Does anyone have any insights into using these MariaDB temporal databases in production environments? Or more specifically, in using them in development environments, and then transferring the database to a production environment and still keeping the history of the database intact?
I understand this is something of a dev-ops question, but it seems a pretty central issue in how to work with and around this new feature. Does anyone have any insights into moving these databases around and relying on that process in production? I am just wondering how mature this technology is, given that this issue (which seems pretty central) is not covered in the documentation.
Unfortunately, as the documentation states, while mysqldump will dump these tables, the invisible temporal columns are not included - the tool will only back up the current state of the tables.
Luckily, there are a couple of options here.
You can use mariadb-enterprise-backup or mariabackup, which should support the new format of the temporal data and back it up correctly (these tools do binary backups instead of table dumps):
https://mariadb.com/docs/usage/mariadb-enterprise-backup/#mariadb-enterprise-backup
https://mariadb.com/kb/en/library/full-backup-and-restore-with-mariabackup/
Unfortunately, we have found the tool to be somewhat unreliable - especially when using the MyRocks storage engine. However, it is constantly improving.
To get around this, on our production servers we take advantage of slave replication, which keeps the temporal data (and everything else) intact across all our nodes. We then do secondary backups by taking the slave nodes down and doing a straight copy of the database data files. For more information on how to set up replication, please refer to the documentation:
https://mariadb.com/kb/en/library/setting-up-replication/
So you could potentially set up a dev copy of the database with replication and just copy the data from there. However, in your case, mariabackup might also do the trick.
Regardless of how you do it, be wary of the system clock when setting up replication or when moving these files between systems. You can run into problems when the clocks are not in sync (or if the systems are in different time zones). There is some official documentation (and mitigation) on this topic too:
https://mariadb.com/kb/en/library/temporal-data-tables/#use-in-replication-and-binary-logs
Looking at your additional comment - I am not aware of any way to get a complete image of a database as it looked at a given date (with temporal data included) directly from MariaDB itself. I don't think this information is stored in a way that makes this possible. However, there is a workaround even for this. You could potentially use the above method in combination with incremental rdiff backups. The steps would be:
Back up the database with any of the above methods.
Use rdiff-backup (https://www.nongnu.org/rdiff-backup/) on those backup files, running it once per day.
This would allow you to fetch an exact copy of how the database looked on any date of your choice (at daily granularity). rdiff-backup also fully supports ssh, allowing you to do things like:
rdiff-backup -r 10D host.net::/var/lib/mariadb /my/tmp/mariadb
This would fetch a copy of those backup files as they looked 10 days ago.
For future planning, according to https://mariadb.com/kb/en/system-versioned-tables/#limitations:
Before MariaDB 10.11, mariadb-dump did not read historical rows from versioned tables, and so historical data would not be backed up. Also, a restore of the timestamps would not be possible as they cannot be defined by an insert/a user. From MariaDB 10.11, use the -H or --dump-history options to include the history.
10.11 is still in development as of writing this answer.
In the publishing scenario I have, we have multiple deployers pushing content to both file system and database (broker). Pages and Binaries are put on the file system, everything else in the Broker. We have one of the deployers putting the content into the database. Is this the recommended best practice?
If the storage configurations in all deployers also put the content into the database, how does Tridion handle this? Could this cause duplicate entries, locking failures etc?
I'm afraid at the time of writing I don't have access to an environment to test how this would work.
SDL best practice is to have a one-to-one relationship between a deployer and a publication. As long as two deployers do not publish the same content (from the same publication), they will not collide, provided that on a file system the deployed sites are kept separate, e.g. www/pub1 and www/pub2.
Your explanation of your scenario needs some additional information to make it complete but it sounds most likely that there are multiple broker databases (albeit hosted on a single database server). This is the most common setup when dealing with multiple file systems on webservers, combined with a single database server.
I personally do not like this setup, as I think it would be better to host file system content in a shared location and share a single DB. Or, better still, deploy everything to the database and use something like DD4T/CWA.
I have seen (and even recommended based on customer limitations) similar configurations where you have multiple deployers configured as destinations of a given target.
Only one of the deployers can write to the database for the same transaction, otherwise you'll have concurrency issues. So one deployer writes to the database, while all others write to the file system.
All brokers/web applications are configured to read from the database.
This solves the issue of deploying to multiple servers and/or data centers where using a shared file system (the preferred approach) is not feasible, be it for cost or any other reason.
In short - not a best practice, but it is known to work.
Julian's and Nuno's approaches cover most of the common scenarios. Indeed a single database is a single point of failure, but in many installations, you are expected to run multiple schemas on the same database server, so you still have a single point of failure even if you have multiple "Broker DBs".
Another alternative to consider is totally independent delivery nodes. This might even mean running a database server on your presentation box. These days it's all virtual anyway so you could run separate small database servers. (Licensing costs would be an important constraint)
Each delivery server has its own database and file system. Depending on how many you want, you might not want to set up multiple destinations/deployers, so you deploy to one, and use file system replication and database log shipping to mirror the content to the rest.
Of course, you could configure two deployment systems (or three) for redundancy, assuming you can manage all the clustering etc.
OK - to come clean - I've never built one like this, but I'm fairly sure elements of this kind of design will become more common as virtualisation increases, and licensing models which support it. (Maybe we have to wait for Tridion to support an open source database!)
I have got a BizSpark account from Microsoft, and they are providing a basic Azure account. I have been told that it can run PHP; however, I would like to use a more tested solution like WAMP. On top of that, I want to host quite a heavy WordPress / BuddyPress installation (that I hope will bring a lot of traffic :)
Has anyone done something similar to this? If so, what is your experience / pitfalls etc.?
Thanks
Stelios
Yes, you can do this. At the end of the day you are just using Windows Server, so anything that installs there will install in the cloud as well. I have done this myself for hosting WordPress in Windows Azure.
However, there are some pitfalls here, mostly around the M (MySQL). Setting up MySQL in Windows Azure is not really that hard, but you have several options to consider for making sure it is always available. You can:
Set up a single instance of MySQL in a role and store the DB on local disk (this is a bad idea).
Set up a single instance of MySQL in a role and store the DB on a drive (blob-backed storage).
Set up 2 instances of MySQL that each point to a shared drive (hot failover). Only one instance will be able to mount the drive at a time. Now you have reliability and failover, but only a single instance working for you at any one time.
Set up 1 writer of MySQL on a drive, and multiple readers on a snapshot of the drive. Put in some logic via connection strings to make sure writes go only to the writer and reads go to the others. Snapshot every X minutes to update the readers.
Set up multiple instances of MySQL and use native replication features (each storing to local disk), and rely on that if you lose an instance.
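The "logic via connection strings" in the writer/readers option could look something like the following minimal Python sketch. The class and method names are hypothetical, the list of write verbs is only an assumption about what counts as a write, and the connection strings are placeholders; it illustrates the routing idea only and does not open any connections.

```python
import itertools


class ConnectionRouter:
    """Pick a connection string: writes go to one writer, reads rotate over readers."""

    # Assumed set of statements treated as writes; extend as needed.
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, writer, readers):
        self.writer = writer
        # Fall back to the writer when no readers are configured.
        self._readers = itertools.cycle(readers or [writer])

    def for_statement(self, sql):
        """Return the connection string to use for the given SQL statement."""
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.writer
        # Everything else is treated as a read and spread round-robin.
        return next(self._readers)


if __name__ == "__main__":
    router = ConnectionRouter("mysql://writer", ["mysql://reader1", "mysql://reader2"])
    print(router.for_statement("UPDATE wp_posts SET post_title = 'x'"))
    print(router.for_statement("SELECT * FROM wp_posts"))
```

Because the readers only see the last snapshot, a read issued right after a write may return stale data until the next snapshot is taken.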
There are probably more permutations, but the gist of the problem is how you scale out MySQL to be available and reliable. In Windows Azure, you don't get to rely on the fact that the local disk will always be around or that you will always have the same instance. In fact, you can guarantee that your instances will be down for some period of time each month and eventually, given enough time, you will lose the local disk.
With multiple instances, however, you can guarantee they won't all be down simultaneously (to the service SLA level at least). So you need to make sure MySQL works with multiple instances (or live with single-instance downtime) and that your data is backed by blob storage to guarantee it is persisted.
Or you can scrap all that crap and just use SQL Azure, which solves all those problems. So, it becomes WASP. SQL Azure can also be more economical for smaller DBs.
Or you can scrap all that crap and just use SQL Azure, which solves all those problems. So, it becomes WASP. SQL Azure can also be more economical for smaller DBs.
Ditto.
Installing MySQL on an Azure role is not a good idea for plenty of reasons, most notably (lack of) scalability and reliability. (That's just for deploying on Azure; MySQL itself is great.)
To set it up remotely and reliably you're going to need a dedicated instance, which will run you at least $40 a month. Going with SQL Azure is $10/GB, or free if you get an introductory offer or BizSpark.
If you're just looking to play around with a single-instance app, I'd suggest you use SQLite or some other in-memory DB instead; it'll be a lot less painful.
There are certain tables that get read often but updated rarely. One of these tables is Departments. To save DB trips, I think it is OK to cache this table, given that it is very small. However, once it is cached, the issue of keeping the cached data fresh arises. What is the best way to determine that the table is dirty and therefore requires a reload, and how should that code be invoked? I am looking for a solution that is scalable, so updating the cache on a single machine right after inserting will not resolve the issue: if one machine inserts a record, all the others in the farm should be notified to reload the cache. I was thinking of calling a corresponding web service from T-SQL, but I don't really like the idea of consuming resources on SQL Server. What are the best practices for this type of problem?
Thanks in advance
Eddy
There are some great distributed caching frameworks out there. Have a look at NCache and Velocity. NCache has some great features for keeping the cached data in sync between different cache nodes as well as the underlying database. But it comes at a price.
Have you tried using SQL dependencies or cache dependencies? The library will poll the database every so often to see if the data has changed. An alternative is to use cache dependencies: you can have a master cache object and have child caches depend on it, so if the master cache changes, the child caches will be updated.
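To make the polling idea concrete, here is a minimal Python sketch. It is not the SQL dependency API itself, just an illustration of the same principle; the class name and the two callables (`load_data`, `read_version`) are hypothetical, and it assumes the table exposes some cheap change marker such as a counter or a last-modified timestamp.

```python
class PolledCache:
    """Reload cached data when a cheap version check reports a change."""

    def __init__(self, load_data, read_version):
        self._load_data = load_data        # expensive: reads the whole table
        self._read_version = read_version  # cheap: e.g. a counter or max timestamp
        self._version = None
        self._data = None
        self._loaded = False

    def poll(self):
        """Call periodically (timer or background thread); returns True if reloaded."""
        current = self._read_version()
        if not self._loaded or current != self._version:
            self._data = self._load_data()
            self._version = current
            self._loaded = True
            return True
        return False

    def get(self):
        """Return the cached data, loading it on first use."""
        if not self._loaded:
            self.poll()
        return self._data
```

Each machine in the farm would run its own poller, so nothing has to notify the other servers; the cost is one cheap query per machine per polling interval, and the data can be stale for up to that interval.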
Edit:
If the above is not a solution, you can easily use memcached.net (see Wikipedia). It is geared toward large sites, but it is a solution for your problem.
Here is an article that describes the thinking around setting up a cache.
http://www.javaworld.com/javaworld/jw-07-2001/jw-0720-cache.html?page=1
Generally speaking, objects in a cache have lifetimes, and when the lifetime expires they are re-fetched from the database. If the data is not so important, this eventual consistency gives a balance between performance and accuracy of the presented information.
Other cache tools add further techniques to keep data more accurate, e.g. if a particular object is known to have been updated, it is repopulated after the update command is executed.
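Both ideas, lifetimes and repopulating after an update, fit in a few lines. The following is a minimal Python sketch under assumed names (`TTLCache`, `loader`, `refresh` are made up for this example); it is single-process only and is not how any particular cache product is implemented.

```python
import time


class TTLCache:
    """Cache entries for a fixed lifetime, with an explicit refresh after updates."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # fetches a value from the database by key
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the lifetime can be tested
        self._entries = {}         # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]        # still within its lifetime
        return self.refresh(key)   # missing or expired: re-fetch

    def refresh(self, key):
        """Re-fetch now; call this right after executing an update for this key."""
        value = self._loader(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value
```

A caller that changes a department would run its update command and then call `refresh` for that key; every other reader simply sees the new value once the lifetime expires.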