rsync and logrotate transfer old logs every day

I have an Apache web server producing a considerable amount of log files. I logrotate them every day on the server. Every night I rsync the server backups over the Internet to another computer.
The speedup of the log file sync is very close to 1, because logrotate renames all the files and rsync treats them as totally different, because they really are.
I guess this is a common problem; what tools would you recommend? I want to keep some log history (say 50 days) on the server and the whole log history on the backup machine. The nightly job should only transfer the logs from the last day.

Use the dateext option in logrotate:
Archive old versions of log files adding a daily extension like YYYYMMDD instead of simply adding a number.
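A minimal sketch of what the logrotate entry could look like, assuming something like /etc/logrotate.d/apache2 and a 50-day retention (the path and counts are only examples):
/var/log/apache2/*.log {
    daily
    rotate 50
    dateext
    compress
    missingok
    notifempty
}
With dated names such as access.log-20240101.gz, yesterday's archive keeps the same name forever, so the nightly rsync only needs to transfer the files created since the last run.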

Rsync - How to display only changed files

When my colleague and I upload a PHP web project to production, we use rsync for the file transfer with these arguments:
rsync -rltz --progress --stats --delete --perms --chmod=u=rwX,g=rwX,o=rX
When this runs, we see a long list of files that were changed.
Running this twice in a row will always show just the files that changed between the two transfers.
However, when my colleague runs the same command after I have run it, he sees a very long list of all the files being changed (even though the contents are identical), and this transfer is extremely fast.
If he uploads again, then again there will be only minimal output.
So it seems to me that we get the correct output, showing only the changes, but if someone else uploads from another computer, rsync regards everything as changed.
I believe this may have something to do with the file permissions or times, but would like to know how to best solve this.
The idea is that we only see the changes, regardless who does the upload and in what order.
The huge file list is quite scary to see in a big project, because we have no idea what was actually changed.
PS: We both deploy using the same user@server as target.
The t in your command (-rltz) tells rsync to copy the timestamps of the files, so if they don't match you'll see them get updated. If you think the timestamps on your two machines should match, then the problem is something else.
The easiest way to ensure that the timestamps match would be to rsync them down from the server before making your edits.
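For example, a hedged sketch of that pull, assuming the project lives at /var/www/project on the server (both paths are placeholders):
rsync -rltz user@server:/var/www/project/ ./project/
After that, a subsequent upload with your usual command should list only the files you actually edited, since the local timestamps now match the server's.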
Incidentally, having two people use rsync to update a production server seems error-prone and fragile. You should consider putting your files in Git and pushing them to the server that way (you'd need a server-side hook to update the working copy that the web server uses).

How do I set up a scheduled task on my server?

I have a shared host running ASP.NET MVC; my worker process times out after 5 minutes, causing the site to take up to 30 seconds to restart. I can't edit these settings on shared hosting. I found some info online suggesting I can use a scheduled task that keeps hitting the site every few minutes to stop it from going idle.
Executable: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Argument: -c "(new-object system.net.webclient).downloadstring('http://[domain.tld][path][file_name]')"
I'm not sure about the Executable and the Argument, or what to put there. Should I point it at the home page, or at a page with few views, like the privacy page?
What's a good-practice setup to keep the site from going idle using a scheduled task?
The executable and argument together form the command that a scheduler can execute to make a request to a web page and print out the data returned. For example, if you run this from a command-line terminal (assuming you have PowerShell), you should see a whole bunch of the JavaScript and HTML code present on google.com:
powershell -c "(new-object system.net.webclient).downloadstring('https://google.com')"
I am not sure whether or not this is an acceptable practice to keep websites from going idle on shared hosting spaces.
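If your host exposes the Windows Task Scheduler directly rather than a web form, a rough sketch of creating the task from a command prompt might look like this (the task name and the 5-minute interval are only examples, and the quoting inside /TR can be finicky):
schtasks /Create /SC MINUTE /MO 5 /TN "KeepSiteWarm" /TR "powershell -c \"(new-object system.net.webclient).downloadstring('http://[domain.tld][path][file_name]')\""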

Updating code on production server when using Go

When I develop and update files on a production server with PHP, I just copy the files over on the fly and everything seems to work without interrupting the server.
But to update the code of a Go server and application I would need to kill the server, copy the source files over, run go install, and then start the server again. This would interrupt the service, and if I do it often it is going to look very bad to the users of the service.
How can I update the code without downtime when using Go and Go's HTTP server?
PHP is an interpreted language, which means you provide your code in source format and the PHP interpreter will read it and execute it (it may create a more compact binary form so that it doesn't have to analyze the source again when needed).
Go is a compiled language: it compiles into a native executable binary. Going further, that binary is statically linked, which means all the code and libraries your app refers to are compiled and linked in when the executable is created. This implies you can't just "drop in" new Go code into a running application.
You have to stop the running application and start the new version. You can however minimize the downtime: only stop the running application once the new version of the executable has been created and is ready to run. You may choose to compile it on a remote machine and upload the binary to the server, or upload the source and compile it on the server; it doesn't matter.
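A rough sketch of that flow, assuming a Linux server, a binary called myapp, and a systemd service of the same name (all names and paths are examples):
# build a binary for the server's platform
GOOS=linux GOARCH=amd64 go build -o myapp ./cmd/myapp
# upload it next to the running one
scp myapp user@server:/opt/myapp/myapp.new
# only now swap the binary and restart the service
ssh user@server 'mv /opt/myapp/myapp.new /opt/myapp/myapp && sudo systemctl restart myapp'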
With this you can decrease the downtime to a maximum of a few seconds, which your users won't notice. Also, you shouldn't update every hour; you can't really achieve significant updates in just an hour of coding. You could schedule updates daily (or even less frequently), and schedule them for hours when your traffic is low.
If even a few seconds downtime is not acceptable to you, then you should look for platforms which handle this for you automatically without any downtime. Check out Google App Engine - Go for example.
The grace library will allow you to do graceful restarts without annoying your users: https://github.com/facebookgo/grace
Yet in my experience restarting Go applications is so quick that, unless you have a high-traffic website, it won't cause any trouble.
First of all, don't do it in that order. Copy and install first. Then you could stop the old process and run the new one.
If you run multiple instances of your app, you can do a rolling update, so that while you bounce one server the other ones are still serving. A similar approach is blue-green deployment, which has the advantage that the code your active cluster is running is always homogeneous (whereas during a rolling deploy you'll have a mixture until they've all rolled), and it also works when you normally run only one instance of your app (whereas rolling requires more than one). It does, however, require you to run double the instances during the blue-green switch.
One thing you'll want to take into consideration is in-flight requests -- you may want to make sure that requests already in flight continue to go to the old-code servers until they're finished.
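As a simplified, single-host illustration of that blue-green flip and drain, assuming the app takes a -port flag and sits behind an nginx proxy whose upstream address is kept in a small include file (every name, port, and path here is hypothetical):
# the old ("blue") instance is assumed to be running on port 8081; start the new ("green") build on 8082
/opt/myapp/myapp.new -port 8082 &
# point the proxy's upstream include at the green instance and reload
echo 'server 127.0.0.1:8082;' > /etc/nginx/conf.d/myapp_upstream.inc
nginx -s reload
# give in-flight requests on the blue instance time to finish, then stop it
sleep 30
pkill -f 'myapp -port 8081'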
You can also look into Platform-as-a-Service solutions, that can automate a lot of this stuff for you, plus a whole lot more. That way you're not ssh'ing into production servers and copying files around manually. The 12 Factor App principles are always a good place to start when thinking about ops.

How to find out which script is running at what particular time in unix?

On my application server, some files are getting deleted from one folder at exactly 1 AM every day. I have checked the crontab.wms file and there is no script that runs at 1 AM.
How can I find out which script is deleting the files?
Exactly 1AM makes cron a prime suspect, but processes can be launched from other places (e.g. init). Also, if the directory can be mounted elsewhere then your server may not be deleting the files. And if malware is causing this, the origin of the process could be intentionally hidden. Some information about where the files are and what the files are could be useful clues.
Repeatedly running ps -aef for several seconds may uncover the culprit. I would run it hundreds of times, without sleeping between runs, starting just before 1 AM. There can be a lot of processes to examine.
You may also repeatedly run this:
/usr/sbin/lsof +d <fullNameOfTheDirectory>
to list processes that have opened the specific directory (or files in the directory). This could give a more concise list, but you have to be lucky to be probing at exactly the time the process is using the directory. You may need to try over many nights and you will want both ps and lsof.
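A minimal sketch of such a watcher, assuming the affected directory is /app/data (adjust the path, and start it a minute or two before 1 AM):
#!/bin/sh
# snapshot ps and lsof in a tight loop and append everything to one log
DIR=/app/data
LOG=/tmp/1am-watch.log
while true; do
    date >> "$LOG"
    ps -aef >> "$LOG"
    /usr/sbin/lsof +d "$DIR" >> "$LOG" 2>&1
done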
If the files do not belong to root, you can chown them to root before 1 AM. If the delete still succeeds, that suggests the process runs as root (strictly speaking, deleting a file needs write permission on the directory rather than ownership of the file, so treat this as a hint, not proof).
I assume the deletion is messing you up. You can archive the files before 1AM and restore them when they go missing, assuming the files are fairly static. Or, you can remove write permissions for a few minutes to see if that thwarts the process (you should still see it accessing the directory). These are kludges, but could patch things up until you can really solve it.
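A sketch of those kludges, again with /app/data as an example path:
# keep a dated copy to restore from after the files disappear
cp -a /app/data /app/data.bak.$(date +%F)
# temporarily block deletion (remember to restore the original mode afterwards)
chmod a-w /app/data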

Transferring millions of images -- rsync not good enough

We've got a folder, 130GB in size, with millions of tiny (5-20k) image files, and we need to move it from our old server (EC2) to our new server (Hetzner, Germany).
Our SQL files SCP'd over really quickly -- 20-30 MB/s at least -- and the first ~5 GB or so of images transferred pretty quickly, too.
Then we went home for the day, and coming back in this morning, our images have slowed to only ~5 KB/s. rsync seems to slow down as it hits the middle of the workload. I've looked into alternatives, like gigasync (which doesn't seem to work), but everyone seems to agree rsync is the best option.
We have so many files that doing ls -al takes over an hour, and all my attempts at using Python to batch up our transfer into smaller parts have eaten all available RAM without completing successfully.
How can I transfer all these files at a reasonable speed, using readily available tools and some light scripting?
I don't know if it will be significantly faster, but maybe a
cd /folder/with/data; tar cvzf - . | ssh target 'cd /target/folder; tar xvzf -'
will do the trick.
If you can, maybe restructure your file arrangement. In similar situations, I group the files project-wise or just 1000 at a time, so that a single folder doesn't have too many entries at once.
But I can imagine that the need of rsync (which I otherwise like very much) to keep a list of the transferred files is responsible for the slowness. If the rsync process occupies so much RAM that it has to swap, all is lost.
So another option could be to rsync folder by folder.
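A small sketch of that, assuming the images are already grouped into subdirectories under /folder/with/data (the paths are the same placeholders as above):
cd /folder/with/data
for d in */; do
    rsync -a --partial "$d" target:/target/folder/"$d"
done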
It's likely that the performance issue isn't with rsync itself, but a result of having that many files in a single directory. Very few file systems perform well with a single huge folder like that. You might consider refactoring that storage to use a hierarchy of subdirectories.
Since it sounds like you're doing essentially a one-time transfer, though, you could try something along the lines of a tar cf - -C <directory> . | ssh <newhost> tar xf - -C <newdirectory> - that might eliminate some of the extra per-file communication rsync does and the extra round-trip delays, but I don't think that will make a significant improvement...
Also, note that, if ls -al is taking an hour, then by the time you get near the end of the transfer, creating each new file is likely to take a significant amount of time (seconds or even minutes), since it first has to check every entry in the directory to see if it's in fact creating a new file or overwriting an old one.
