How to let humans and programs access the same file without stepping on each other's toes - unix

Suppose I have a file, urls.txt, that contains a list of URLs I'm monitoring. My monitoring script edits that file occasionally, say, to indicate whether each URL is reachable. I'd like to also manually edit that file, to add to or change the list of URLs. How can I allow that so that I don't have to think about it when editing manually?
Here are some possible answers. What would you do?
Engage in hackery like having the program check for the lockfiles that vim or emacs create. Since this is just for me, this would actually work.
If the human edits always take precedence, just always have the human clobber the program's changes (e.g., ignore the editor's warning that the file has changed on disk). The program can then just redo its changes on its next loop. Still, changing the file while the user is editing it is not so nice.
Never let a human touch a file that a program makes ongoing modifications to. Rethink the design and have one file that only the human edits and another file that only the program edits.
Give the human a custom tool to edit the file that does the appropriate file locking. That could be as crude as locking the file and then launching an editor (see the sketch after these options), or a custom interface (perhaps a simple command-line interface) for inserting/changing/deleting entries in the file.
Use a database instead of a flat file and then the locking is all taken care of automatically.
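For the custom-tool option, the crude "lock the file, then launch an editor" variant can be tiny. A minimal sketch in Python, assuming the monitoring script cooperates by taking the same advisory flock() on urls.txt before rewriting it:

    #!/usr/bin/env python3
    # Crude lock-then-edit wrapper. Sketch only: it assumes the monitor
    # takes the same advisory lock before it touches the file.
    import fcntl
    import os
    import subprocess
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "urls.txt"

    with open(path, "a+") as f:          # a+ creates the file if it is missing
        fcntl.flock(f, fcntl.LOCK_EX)    # blocks until the monitor releases its lock
        subprocess.call([os.environ.get("EDITOR", "vi"), path])
        # the lock is released automatically when the file is closed

The monitor would wrap its own rewrite in the same flock() call, so whichever side gets there first simply waits for the other.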
(Note that I concocted the URL monitoring example to make this more concrete and because what I actually have in mind is perhaps too weird and distracting -- this question is strictly about how to let humans and programs both modify the same state file.)

I'd use a database since that's basically what you're going to have to build to achieve what you want. Why re-invent the wheel?
If a full-blown DBMS is too much of a load, separate the data into two files and synchronize them periodically. Whether the URL is reachable doesn't sound like something the user would be changing, so it should not be editable by them.
During the synchronize process (which would have to lock out both the monitor and the user, although it could be a sub-function of the monitor), remove entries in the monitor file that aren't in the user file. Also, add to the monitor file those that have been added to the user file (and start monitoring them).
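A minimal sketch of that synchronize step in Python, with invented file names and a made-up tab-separated format for the monitor file (the user file is just one URL per line):

    import os

    def sync(user_file="urls.user.txt", monitor_file="urls.monitor.txt"):
        # The user file is the source of truth for WHICH URLs exist;
        # the monitor file carries the reachability status.
        with open(user_file) as f:
            wanted = {line.strip() for line in f if line.strip()}

        status = {}
        if os.path.exists(monitor_file):
            with open(monitor_file) as f:
                for line in f:
                    url, _, state = line.strip().partition("\t")
                    status[url] = state

        # Drop URLs the user removed, add URLs the user added.
        with open(monitor_file + ".tmp", "w") as f:
            for url in sorted(wanted):
                f.write(f"{url}\t{status.get(url, 'unknown')}\n")
        os.replace(monitor_file + ".tmp", monitor_file)   # atomic swap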
But I'd go with the database method with a special front-end for the user, since you can get relatively good lightweight databases nowadays.
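If you do go the database route, SQLite is about as lightweight as it gets and already serialises concurrent writers. A sketch with a hypothetical schema (the table and column names are made up):

    import sqlite3

    conn = sqlite3.connect("urls.db", timeout=30)   # wait if the other side holds the write lock
    conn.execute("""CREATE TABLE IF NOT EXISTS urls (
                        url          TEXT PRIMARY KEY,
                        reachable    INTEGER,        -- written only by the monitor
                        last_checked TEXT            -- ISO timestamp
                    )""")

    # Monitor side: update reachability without touching the user's list.
    conn.execute("UPDATE urls SET reachable = ?, last_checked = datetime('now') WHERE url = ?",
                 (1, "https://example.com"))

    # User-facing front-end: add or remove entries.
    conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", ("https://example.org",))
    conn.execute("DELETE FROM urls WHERE url = ?", ("https://old.example.net",))
    conn.commit()
    conn.close()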

Use a sensible version control system!
(Git would work well here).
That said, the nature of the problem implies that a real database would be best - databases generally have database-level, table-level, or row-level locking - but then put any scripts you need into version control.
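A sketch of what the program side could look like if urls.txt lives in a git repository (an assumption): after rewriting the file, the monitor stages and commits its own change, so a human editing the same working copy can see and merge the program's edits with normal git tooling instead of having them silently clobbered.

    import subprocess

    def commit_program_edit(path="urls.txt"):
        subprocess.run(["git", "add", path], check=True)
        # Commit only if the rewrite actually changed something
        # (git diff --cached --quiet exits non-zero when changes are staged).
        if subprocess.run(["git", "diff", "--cached", "--quiet"]).returncode != 0:
            subprocess.run(["git", "commit", "-m", "monitor: update reachability"], check=True)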

I would go with option 3. In fact, I would have the program read the human-edited input file and append the results of each query to a log file. That way you can also analyse the reachability of sites over time. You can also have the program maintain a separate file that holds a snapshot of the current reachability state of each site in the input file.
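A minimal sketch of that split in Python, with invented file names: urls.txt is edited only by the human, while the log and the snapshot file are written only by the program.

    import datetime
    import urllib.request

    def reachable(url, timeout=10):
        try:
            urllib.request.urlopen(url, timeout=timeout)
            return True
        except Exception:
            return False

    with open("urls.txt") as f:                     # human-owned input
        urls = [line.strip() for line in f if line.strip()]

    now = datetime.datetime.now().isoformat(timespec="seconds")
    state = {}
    with open("reachability.log", "a") as log:      # append-only history
        for url in urls:
            ok = reachable(url)
            state[url] = ok
            log.write(f"{now}\t{url}\t{'up' if ok else 'down'}\n")

    with open("state.txt", "w") as snap:            # program-owned snapshot
        for url, ok in state.items():
            snap.write(f"{url}\t{'up' if ok else 'down'}\n")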

One other option is using two files, one for automated access and one for manual edits. You'd need a way in the user file to indicate modifications or deletions, but you'd have similar problems in some of the other solutions as well.

Related

Stop rsync from backing up if too many files are being changed

Does anyone know of a way that I can tell rsync not to perform a backup if it detects that more than X amount of data would be changed? For example, if I run a backup and it detects that 25% of the data in the destination directory would be changed, can I have it automatically abort that run so that I can evaluate and decide whether to allow it or not? I back up my machine every night, but what I'm worried about is my machine getting hit with ransomware or another issue that destroys or loses a ton of my data; I really don't want that to propagate to my backup. I used to use a tool called Synconvery and it had this feature, but I don't think the tool is supported very well, and I get a lot of permission and read errors with it that I don't see with any other tools. Goodsync also has this feature, but even though it runs on the Mac, it doesn't support special characters in file names and replaces them with an underscore when the file is copied. I just think that will cause problems when I try to restore those files and they're referenced with the special character but can't be found because it has a damn underscore. I like using rsync and I will eventually retrofit my script to use msrsync, but I can't trust it if I can't get this protection in place.
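One way to approximate this with plain rsync is to do a --dry-run first, count the itemized changes, and only run the real transfer if the churn is below a threshold. A rough sketch in Python; the paths and the 25% threshold are placeholders, and counting itemized lines is only an approximation of "data changed":

    #!/usr/bin/env python3
    import subprocess
    import sys

    SRC, DEST = "/Users/me/", "/Volumes/backup/me/"
    THRESHOLD = 0.25

    # Dry run: with --itemize-changes and no -v, rsync prints one line per
    # file it would transfer or delete.
    dry = subprocess.run(
        ["rsync", "-a", "--delete", "--dry-run", "--itemize-changes", SRC, DEST],
        capture_output=True, text=True, check=True)
    changed = [ln for ln in dry.stdout.splitlines() if ln.strip()]

    # Rough baseline: number of files already in the destination.
    total = subprocess.run(["find", DEST, "-type", "f"],
                           capture_output=True, text=True).stdout.count("\n") or 1

    if len(changed) / total > THRESHOLD:
        sys.exit(f"Aborting: {len(changed)} pending changes against {total} existing "
                 f"files exceeds {THRESHOLD:.0%}; inspect before running for real.")

    subprocess.run(["rsync", "-a", "--delete", SRC, DEST], check=True)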

How can I be sure an uploaded file is not a virus in ASP.NET?

I saw this question:
ASP.NET File Upload: how can I make sure that an uploaded file is really a JPEG?
and similar questions about making sure that a file uploaded through the asp:FileUpload control in ASP.NET is really an image. But what if users upload virus-infected images? How can I be sure that image files uploaded via my ASP.NET application don't affect the files in my web app folder and/or images uploaded by other users?
As long as you don't serve it back to anyone as anything other than an image (content type) and never try to execute the file, you'll be fine.
Most anti-virus software runs what's known as an "on-access scan". That is, when a file is changed, it automatically scans that file.
So save that file to the file system and let your server's anti-virus software do the work for you.
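As an illustration of "save it and let a scanner look at it", here is a rough sketch (in Python rather than C#, just to show the shape) that saves the upload under a server-generated name and asks ClamAV's command-line scanner about it; the upload directory and the surrounding API are assumptions.

    import os
    import subprocess
    import uuid

    def save_and_scan(data, upload_dir="/var/uploads"):
        # Never trust the client's file name: generate our own, with an
        # extension the server will never execute.
        path = os.path.join(upload_dir, uuid.uuid4().hex + ".jpg")
        with open(path, "wb") as f:
            f.write(data)
        # clamscan exits 0 for clean, 1 for infected, 2 on scanner errors.
        result = subprocess.run(["clamscan", "--no-summary", path])
        if result.returncode != 0:
            os.remove(path)
            return None
        return path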
I'll take what is likely a somewhat controversial position.
There is no way to know with 100% certainty what the intent of a file is, be it good or evil. It is impossible. AV scanners give you a slice of data but they can't give you 100% guarantees either. No one can.
Given this reality, you need to build your app assuming that all files uploaded are bad. Yes, scanning is still fine and will filter out a bunch of stuff. But it will never be 100%. Is it 99.999% or 20%? Who knows. Does it really matter?
I would build any app today assuming that all user-supplied content is bad. Very bad. Hostile bad. Because eventually it will be, if your app takes off. And when it is, you'll be ready for it... unlike all the people who have to rearchitect their apps because they made bad assumptions early on.
With a bit more data about your exact concerns, I'd be happy to comment on them more specifically...
As a side note, in older versions of IIS (6 or earlier) it was possible for a crafted file name to cause the file, once saved under its original name, to be read and executed by the server as a script.
E.g. a file name like: file.asp;.jpg or file.asp%00.jpg, etc.
It was also possible to change the target directory by manipulating the file name, which is extremely dangerous.
E.g. newfolder.asp::$Index_Allocation, etc.
There are also newer variations of these attacks.

Best-practice place to put a URL that configures my app?

We have a Qt app that when it starts tries to connect to a servlet to get config parameters that it needs to keep running.
The URL may change frequently because we have to test the application in several environments. Right now (as a temporary solution) the URL is a constant in source code, but it is a little bit ugly.
Where is the best place to maintain this URL, so that we do not need to change the source code every time we want to change the target environment?
In a database table maybe (my application uses a SQLite DB), in a settings file, or in some other way?
Thank you for your replies.
You have a number of options:
Hard coded (like you have already)
Run-time user input
Command line arguments
QSettings
Read from a bespoke file as text.
I would think option 3 would be the simplest to implement without being intrusive, but it does depend on what kind of application you have.
I would keep the list of URLs in a document, e.g. an XML file, stored in a central, well-known place, e.g. a known web server, and hardcode the URL of that place in the app.
The list could then be edited externally without recompiling your app.
The app would, at startup, download and parse the list, pointing to the right servlet based upon an environment specified as a command-line parameter.
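A rough sketch of that startup step, shown in Python just for brevity (the Qt app would use its own network and XML classes); the config URL and the XML layout are invented:

    import sys
    import urllib.request
    import xml.etree.ElementTree as ET

    CONFIG_URL = "https://config.example.com/environments.xml"   # the only hardcoded URL

    def servlet_url(environment):
        with urllib.request.urlopen(CONFIG_URL, timeout=10) as resp:
            root = ET.fromstring(resp.read())
        # Assumed layout: <environments><env name="staging" url="..."/>...</environments>
        for env in root.findall("env"):
            if env.get("name") == environment:
                return env.get("url")
        raise KeyError(f"no such environment: {environment}")

    if __name__ == "__main__":
        print(servlet_url(sys.argv[1] if len(sys.argv) > 1 else "production"))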

Development shell in ASP.NET

I write a lot of code, most of which I eventually throw away when I am done with it. Recently I was thinking that if I just kept every small piece of utility script I wrote, named it, tagged it and filed it in a dev shell, I would never lose the code. On top of that, I wouldn't need to redo something I have done already, which is the main motivation, as I keep finding myself writing something I've done earlier.
Is there an ASP.NET shell-style environment anywhere?
If not, what would be the best way to go about this?
I am looking to be able to do the following:
Write big or small bits of code.
Derive from or chain together already-written code/libraries/services.
Ability to have everything on my desktop (would that mean IIS on the desktop, or is there a lighter-weight mechanism?), synced with the server at home, so if I am on the move I can still access this and make it part of my day-to-day workflow.
You could build a unique solution, with many class library projects inside. Each project would address a specific scenario, something like this:
MyStuff (Solution)
MyStuff.Common
MyStuff.Validation
MyStuff.Web
MyStuff.Encryption
etc.
Then you can put this solution on an online versioning service like Bitbucket or Assembla, so you can access your source code from anywhere, edit it and commit it back to the server. This way you get the advantages of versioning and you store your code on a remote server, so even if your hard disk breaks it's not a problem, because what's on the server is what matters.
You should either look into a source control system (Git perhaps?) or into a file storage / syncing / sharing service like Dropbox.
Dropbox would allow you to access code snippets from wherever you are and works really easily (just drop a file into a folder).
If you need versioning and branching you're going to have to look into a source control system. Since you have a server at home, that should be no problem.

Strategy for handling user input as files

I'm creating a script to process files provided to us by our users. Everything happens within the same UNIX system (running on Solaris 10).
Right now our design is this:
1. User places a file into the upload directory.
2. A script placed on cron runs every 10 minutes.
3. The script looks for files in the upload directory, processes them, and deletes them immediately afterward.
For historical/legacy reasons, #1 can't change. Also, deleting the file after processing is a requirement.
My primary concern is concurrency. It is very likely that the situation will arise where the analysis script runs while an input file is still being written. In this case, data will be lost, and this is (obviously) unacceptable.
Since we have no control over the user's chosen means of placing the input file, we cannot require them to obtain a file lock. As I understand it, file locks on UNIX are advisory only, so a user must choose to adhere to them.
I am looking for advice on best practices for handling this problem. Thanks
Obviously all the best solutions involve the client providing some kind of trigger indicating that it has finished uploading. That could be a second file, an atomic move of the file to a processing directory after writing it to a stage directory, or a REST web service. I will assume you have no control over your clients and are unable or unwilling to change anything about them.
In that case, you still have a few options:
You can use a pretty simple heuristic: check the file size, wait 5 seconds, check the file size again. If it didn't change, it's probably good to go (see the sketch after these options).
If you have super-user privileges, you can use lsof to determine if anyone has this file open for writing.
If you have access to the thing that handles the upload (HTTP, FTP, a setuid script that copies files?), you can of course put triggers in there.
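A sketch of the first two options combined, in Python; the upload directory, the 5-second window, and the processing step are placeholders, and note that the lsof check only tells you the file is open by someone, not specifically open for writing (it also needs sufficient privileges and lsof installed on the Solaris box):

    import os
    import subprocess
    import time

    UPLOAD_DIR = "/var/upload"

    def looks_complete(path, settle_seconds=5):
        size_before = os.path.getsize(path)
        time.sleep(settle_seconds)
        if os.path.getsize(path) != size_before:
            return False                      # still growing, try again next run
        # lsof exits 0 if any process has the file open, non-zero otherwise.
        still_open = subprocess.run(
            ["lsof", path],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).returncode == 0
        return not still_open

    for name in os.listdir(UPLOAD_DIR):
        path = os.path.join(UPLOAD_DIR, name)
        if looks_complete(path):
            # process(path)  ...hypothetical processing step...
            os.remove(path)   # deleting after processing is a requirement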

Resources