We have set up a master-slave configuration and we are keeping 30 days of bin logs. They are taking up a lot of space on the server and we need to find a better way to handle this.
We are thinking of keeping only 7 days of logs. Is that OK?
Yes, that's how often I do it. Just make sure you do a full-database backup every 7 days, and rsync or copy the binlogs to an off-site location more frequently.
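As a rough illustration of a 7-day retention window, the sketch below builds the `PURGE BINARY LOGS BEFORE` statement for a given cutoff. The function name is hypothetical, and running the statement through a MySQL client is left out. Alternatively, the server can expire logs by itself through the `expire_logs_days` variable (or `binlog_expire_logs_seconds` on MySQL 8.0). Before purging on the master, it is worth checking that every slave has already read the logs in question.

```python
from datetime import datetime, timedelta


def purge_binlogs_statement(now: datetime, retention_days: int = 7) -> str:
    """Return a PURGE BINARY LOGS statement that keeps `retention_days` of logs.

    Hypothetical helper: it only builds the SQL text; executing it against
    the master is up to the caller.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    cutoff = now - timedelta(days=retention_days)
    return "PURGE BINARY LOGS BEFORE '{}'".format(
        cutoff.strftime("%Y-%m-%d %H:%M:%S")
    )


if __name__ == "__main__":
    # Print the statement for "seven days before now".
    print(purge_binlogs_statement(datetime.now()))
```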
I am currently looking for a way to synchronize confidential files between two PCs (and possibly an always running raspberry pi - would serve as a host and backup).
On each PC I have an LUKS-encrypted partition. I want to synchronize the files in those partitions with the rpi, but I don't want to store them on rpi in clear text.
I think the only reliable way is to encrypt the files while still on the PC (in every other way the files could be obtained as long as there is physical access to the rpi).
One possible way is to also store the files in an encrypted partition on the rpi and send the pass-phrase to the rpi every time I want to sync, but I did not find an extremely simple way to do this (e.g. Unison doesn't offer such a feature), and the pass-phrase could be obtained by simple manipulations.
The second way I thought of was storing the files in an encrypted container and synchronizing the container, but with every little change the whole file would have to be uploaded to the rpi.
So, is there a fast way to encrypt single files (esp. only the changed ones and possibly combine it with synchronization right away)?
I read openssl is one way of encrypting single files.
I don't know much about encryption or synchronization, but I want to find a way that is reasonably safe and not more than reasonably complex and doesn't use any external services...
Thank you very much for reading and considering my question,
Max
Edit: One part that might solve my problem right away:
If I use a container (luks) and change some files, will the changes in the container file be proportional to the changes I made in the files AND will rsync only transmit the changed parts of the big container file?
Edit: After editing my question the first time I continued researching and found this article: Off Site Encrypted Backups using Rsync and AES
This article covers backing up files to a remote machine and encrypting them before transmitting them. The next step will be to compare files and use the more recent one. I can probably use a local sync mechanism (which rsync offers) if there is not an option for that already.
Edit: I finally found this discussion debating whether a truecrypt container could be synced via rsync. The discussion concluded that it in fact is possible. This might be the perfect solution for me then. I would still be interested whether it is possible with luks-containers as well (I might try that out), but I will probably simply use truecrypt.
This discussion presents a solution.
If a truecrypt container is synced by rsync only the affected blocks of the container will be updated.
I tried out the procedure explained in the article using an LUKS-container (aes-xts-plain) and it worked, too. So, this answers my question.
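To see why only a small part of a container needs to be sent, here is a simplified sketch that compares two versions of a file block by block and reports which blocks differ. It is not rsync's actual algorithm (rsync uses a rolling checksum so it can match blocks at any offset); comparing at fixed offsets is enough here on the assumption that the container keeps its size and changes in place, as with a sector-wise mode like aes-xts-plain. The function names and the 4096-byte block size are chosen for illustration only.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list:
    """Hash each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> list:
    """Return the indexes of blocks in `new` that differ from `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    changed = []
    for index, digest in enumerate(new_digests):
        # A block is "changed" if it is new or its hash no longer matches.
        if index >= len(old_digests) or old_digests[index] != digest:
            changed.append(index)
    return changed


if __name__ == "__main__":
    before = bytes(16384)       # stand-in for a 16 KiB container
    after = bytearray(before)
    after[5000] = 1             # a small in-place change
    print(changed_blocks(before, bytes(after)))
```

Only the blocks listed would have to be transferred; everything else is already identical on the other side.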
I'm about to start developing an application to transfer very large files, without any rush but with a need for reliability. I would like people who have coded this kind of thing to give me an insight into what I'm about to get into.
The environment will be an intranet FTP server, so far using active FTP on the normal ports, on Windows systems. I might also need to zip up the files before sending; I remember once working with a library that zipped in memory and had a limit on the size... ideas on this would also be appreciated.
Let me know if I need to clarify something else. I'm asking for general/higher level gotchas, if any, not really detailed help. I've done apps with normal sizes (up to 1GB) before, but for this one it seems I'd need to limit the speed so I don't kill the network, or things like that.
Thanks for any help.
I think you can get some inspiration from torrents.
Torrents generally break the file up into manageable pieces and calculate a hash of each one. Then they transfer the file piece by piece. Each piece is verified against its hash and accepted only if it matches. This is a very effective mechanism: it lets the transfer happen from multiple sources and lets it restart any number of times without worrying about corrupted data.
For transfer from a server to single client, I would suggest that you create a header which includes the metadata about the file so the receiver always knows what to expect and also knows how much has been received and can also check the received data against hashes.
I have practically implemented this idea on a client server application but the data size was much smaller, say 1500k but reliability and redundancy were important factors. This way, you can also effectively control the amount of traffic you want to allow through your application.
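The header-plus-hashes idea can be sketched roughly as follows. This is a minimal illustration, not the code from my application: the manifest layout, the function names and the use of SHA-256 are all assumptions. The sender builds the manifest by streaming the file, and the receiver uses it to check each piece and to work out what still has to be fetched after an interruption.

```python
import hashlib
import io


def build_manifest(stream, piece_size: int = 1024 * 1024) -> dict:
    """Read `stream` piece by piece and record the size and per-piece hashes."""
    hashes = []
    size = 0
    for piece in iter(lambda: stream.read(piece_size), b""):
        hashes.append(hashlib.sha256(piece).hexdigest())
        size += len(piece)
    return {"size": size, "piece_size": piece_size, "hashes": hashes}


def verify_piece(manifest: dict, index: int, piece: bytes) -> bool:
    """Accept a received piece only if its hash matches the manifest."""
    return hashlib.sha256(piece).hexdigest() == manifest["hashes"][index]


def missing_pieces(manifest: dict, received: dict) -> list:
    """Return indexes that are absent or corrupt, i.e. what to (re)request."""
    return [
        index
        for index in range(len(manifest["hashes"]))
        if index not in received
        or not verify_piece(manifest, index, received[index])
    ]


if __name__ == "__main__":
    manifest = build_manifest(io.BytesIO(b"abcdefghij"), piece_size=4)
    # Piece 1 arrived corrupted and piece 2 never arrived.
    print(missing_pieces(manifest, {0: b"abcd", 1: b"XXXX"}))
```

Because the receiver decides which pieces to request and when, it can also pace the requests to keep the traffic under whatever limit you want.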
I think the way to go is to use the rsync utility as an external process called from Python.
Quoting from here:
the pieces, using checksums, to possibly existing files in the target
site, and transports only those pieces that are not found from the
target site. In practice this means that if an older or partial
version of a file to be copied already exists in the target site,
rsync transports only the missing parts of the file. In many cases
this makes the data update process much faster as all the files are
not copied each time the source and target site get synchronized.
And you can use the -z switch to have the data compressed on the fly, transparently, during the transfer; there is no need to bottle up either end compressing the whole file.
Also, check the answers here:
https://serverfault.com/questions/154254/for-large-files-compress-first-then-transfer-or-rsync-z-which-would-be-fastest
And from rsync's man page, this might be of interest:
--partial
By default, rsync will delete any partially transferred
file if the transfer is interrupted. In some circumstances
it is more desirable to keep partially transferred files.
Using the --partial option tells rsync to keep the partial
file which should make a subsequent transfer of the rest of
the file much faster
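Calling rsync from Python could look roughly like this. It is a sketch under the assumption that rsync is installed and reachable on both ends; the helper names, the file name and the host are made up. `--partial` and `--compress` (the long form of -z) are the options discussed above, and `--bwlimit` is rsync's option for capping the transfer rate, which covers the "don't kill the network" concern.

```python
import subprocess


def build_rsync_command(source, destination, bwlimit_kbps=None, compress=True):
    """Build the rsync argument list (hypothetical helper)."""
    command = ["rsync", "--archive", "--partial"]
    if compress:
        command.append("--compress")  # same as -z
    if bwlimit_kbps is not None:
        # Cap the transfer rate so the network is not saturated.
        command.append("--bwlimit={}".format(bwlimit_kbps))
    command += [source, destination]
    return command


def run_rsync(source, destination, **options):
    """Run rsync and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(
        build_rsync_command(source, destination, **options), check=True
    )


if __name__ == "__main__":
    # Only print the command; the paths and host are placeholders.
    print(build_rsync_command("big.iso", "backup@host:/data/", bwlimit_kbps=5000))
```

If a run is interrupted, running the same command again lets rsync pick up from the partial file instead of starting over.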
Is there any way to run an NBD (Network Block Device) client and server on the same machine without deadlocking the system?
I am very exhausted looking to find an answer for this. I appreciate if anyone can help.
UPDATE:
I'm writing an NBD server that talks to the Google Storage system. I want to mount a file system on the NBD and back up my files. I will be hugely disappointed if I end up having to run the server on another machine. The few ideas I already had seem to lead nowhere:
telling the file system to open the block device using O_DIRECT flag to bypass the linux buffer cache
using a raw device (unfortunately, raw devices are character devices and FSes refuse to use them as underlying device)
Just for the record, having the NBD client and server on the same machine has been possible since 2008.
Use a virtual machine (not a container) - you need two kernels, but you don't need two physical machines.
Since the front page of the Sourceforge project for NBD says that a deadlock will happen "within seconds" in this scenario, I'm guessing the answer is a big "No."
Try to write a more complete question about the actual goal you're trying to accomplish. There are times when you need to bang away at a little problem, and times when you need to look at the big picture.
I would like to know how people deal with logging across multiple web servers. E.g. assume there are 2 web servers and some events during the user's session are serviced from one, some from the other. How would you go about logging events from the session coherently in one place (without e.g. creating single points of failure)? Assume we are using: ASP.Net MVC, log4net.
Or am I looking at this the wrong way - should I log separately and then merge later?
Thanks,
S
UPDATE
Please also assume that the load balancers will not guarantee that a session is stuck to one server.
You definitely want your web servers to log locally rather than over a network. You don't want potential network outages to prevent logging operations and you don't want the overhead of a network operation for logging. You should have log rotation set up and all your web servers' clocks synced. When log rotation rolls your log files over to a new file, have the completed log files from each web server shipped to a common destination where they can be merged. I'm not a .net guy but you should be able to find software out there to merge IIS logs (or whatever web server you're using). Then you analyze the merged logs. This strategy is optimal except in the case that you need real-time log analysis. Do you? Probably not. It's fairly resilient to failures (assuming you have redundant disks) because if a server goes down, you can just reboot it and reprocess any log ship, log merge or log analysis operations that were interrupted.
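The merge step itself is small. Here is a sketch in Python, assuming each line starts with an ISO 8601 timestamp in the same time zone (so the timestamps sort as plain text) and each server's file is already in time order. The log format and the names are made up for the example; they are not what IIS or log4net emit by default.

```python
import heapq


def timestamp_of(line: str) -> str:
    """Return the leading timestamp field of a log line."""
    return line.split(" ", 1)[0]


def merge_logs(*sources):
    """Lazily merge already-sorted log sources into one time-ordered stream."""
    return heapq.merge(*sources, key=timestamp_of)


if __name__ == "__main__":
    web1 = [
        "2024-01-01T10:00:00 web1 login user=42",
        "2024-01-01T10:00:05 web1 view page=/a",
    ]
    web2 = ["2024-01-01T10:00:02 web2 view page=/b"]
    for line in merge_logs(web1, web2):
        print(line)
```

The sources can be open file objects as well as lists, so the shipped log files never need to be loaded into memory in full. This only gives a meaningful order if the servers' clocks really are in sync.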
An interesting alternative solution:
Have 2 log file appenders:
The first one logs to the local machine. In case of network failure you'll still have this log.
The second one logs remotely to a unix syslog service (this of course needs a very reliable network connection).
I used a similar approach a long time ago and it worked really well; there are a lot of nice tools for analyzing unix logs.
Normally your load balancing would lock the user to one server after the session is started. Then you wouldn't have to deal with logs for a specific user being spread across multiple servers.
One thing you could try is to have the log file in a location that is accessible by all web servers and have log4net configured to write to it. This may be problematic, however, with multiple processes trying to write to the same file. I have read about NLog which may work better in this scenario.
Also, the log4net FAQ has a question about, and a possible solution to, this exact problem.
We have an ASP.NET file delivery app (internal users upload, external users download) and I'm wondering what the best approach is for distributing files so we don't have a single point of failure by only storing the app's files on one server. We distribute the app's load across multiple front end web servers, meaning for file storage we can't simply store a file locally on the web server.
Our current setup has us pointing at a share on a primary database/file server. Throughout the day we robocopy the contents of the share on the primary server over to the failover. This scenario ensures we have a secondary machine with fairly current data on it, but we want to get to the point where we can fail over from the primary to the failover and back again without data loss or errors in the front end app. Right now it's a fairly manual process.
Possible solutions include:
Robocopy. Simple, but it doesn't easily allow you to fail over and back again without multiple jobs running all the time (copying data back and forth)
Store the file in a BLOB in SQL Server 2005. I think this could be a performance issue, especially with large files.
Use the FILESTREAM type in SQL Server 2008. We mirror our database so this would seem to be promising. Anyone have any experience with this?
Microsoft's Distributed File System. Seems like overkill from what I've read since we only have 2 servers to manage.
So how do you normally solve this problem and what is the best solution?
Consider a cloud solution like AWS S3. It's pay for what you use, scalable and has high availability.
You need a SAN with RAID. They build these machines for uptime.
This is really an IT question...
When there are a variety of different application types sharing information via the medium of a central database, storing file content directly into the database would generally be a good idea. But it seems you only have one type in your system design - a web application. If it is just the web servers that ever need to access the files, and no other application interfacing with the database, storage in the file system rather than the database is still a better approach in general. Of course it really depends on the intricate requirements of your system.
If you do not perceive DFS as a viable approach, you may wish to consider Failover clustering of your file server tier, whereby your files are stored in an external shared storage (not an expensive SAN, which I believe is overkill for your case since DFS is already out of your reach) connected between Active and Passive file servers. If the active file server goes down, the passive may take over and continue read/writes to the shared storage. Windows 2008 clustering disk driver has been improved over Windows 2003 for this scenario (as per article), which indicates the requirement of a storage solution supporting SCSI-3 (PR) commands.
I agree with Omar Al Zabir on high availability web sites:
Do: Use Storage Area Network (SAN)
Why: Performance, scalability,
reliability and extensibility. SAN is
the ultimate storage solution. SAN is
a giant box running hundreds of disks
inside it. It has many disk
controllers, many data channels, many
cache memories. You have ultimate
flexibility on RAID configuration,
adding as many disks you like in a
RAID, sharing disks in multiple RAID
configurations and so on. SAN has
faster disk controllers, more parallel
processing power and more disk cache
memory than regular controllers that
you put inside a server. So, you get
better disk throughput when you use
SAN over local disks. You can increase
and decrease volumes on-the-fly, while
your app is running and using the
volume. SAN can automatically mirror
disks and upon disk failure, it
automatically brings up the mirrors
disks and reconfigures the RAID.
Full article is at CodeProject.
Because I don't personally have the budget for a SAN right now, I rely on option 1 (ROBOCOPY) from your post. But the files that I'm saving are not unique and can be recreated automatically if they die for some reason, so absolute fault-tolerance is not necessary in my case.
I suppose it depends on the type of download volume that you would be seeing. I am storing files in a SQL Server 2005 Image column with great success. We don't see heavy demand for these files, so performance is really not that big of an issue in our particular situation.
One of the benefits of storing the files in the database is that it makes disaster recovery a breeze. It also becomes much easier to manage file permissions as we can manage that on the database.
Windows Server has a File Replication Service that I would not recommend. We have used it for some time and it has caused a lot of headaches.
DFS is probably the easiest solution to set up, although depending on the reliability of your network it can become un-synchronized at times, which requires you to break the link and re-sync, which is quite painful to be honest.
Given the above, I would be inclined to use a SQL Server storage solution, as this reduces the complexity of your system rather than increasing it.
Do some tests to see if performance will be an issue first.