I have an executable program which outputs data to the hard disk, e.g. to C:\documents.
I need some means to intercept the data in Windows 7 before it gets to the hard drive. Then I will encrypt the data and send it on to the hard disk. Unfortunately, the .exe file does not support output redirection (i.e. > at the command prompt). Do you know how I can achieve such a thing in any programming language (C, C++, Java, PHP)?
The encryption can only be done before the plain data is sent to the disk, not after.
Any ideas most welcome. Thanks
This is virtually impossible in general. Many programs write to disk using memory-mapped files. In such a scheme, a memory range is mapped to (part of) a file, and writes to the file can't be distinguished from writes to memory: a statement like p[OFFSET_OF_FIELD_X] = 17; is logically a write to the file. Furthermore, the OS keeps track of the synchronization of memory and disk. Not all logical writes to memory are directly translated into physical writes to disk; from time to time, at the whim of the OS, dirty memory pages are copied back to disk.
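To make the point concrete, here is a minimal sketch of the memory-mapped pattern in C on Windows (the path and sizes are purely illustrative, error handling omitted): once the view is mapped, a plain pointer store is logically a write to the file, so there is no WriteFile call left to intercept.

    #include <windows.h>

    int main(void)
    {
        /* Map 4 KiB of an ordinary file into memory. */
        HANDLE file = CreateFileA("C:\\documents\\data.bin",
                                  GENERIC_READ | GENERIC_WRITE, 0, NULL,
                                  OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE map  = CreateFileMappingA(file, NULL, PAGE_READWRITE, 0, 4096, NULL);
        unsigned char *p = MapViewOfFile(map, FILE_MAP_WRITE, 0, 0, 4096);

        p[17] = 42;              /* an ordinary store that ends up on disk */

        UnmapViewOfFile(p);      /* dirty pages are flushed at the OS's discretion */
        CloseHandle(map);
        CloseHandle(file);
        return 0;
    }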
Even in the simpler case of CreateFile/WriteFile, there's little room to intercept the data on the fly. The closest you could achieve is the use of Microsoft Detours. I know of at least one snake-oil encryption program (WxVault, crapware shipped on Dells) that does that. It repeatedly crashed my application in the field, which is why my program unpatches any attempt to intercept data on the fly. So not even such hacks are robust against programs that dislike interference.
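For what it's worth, here is a rough sketch of what the Detours approach looks like: hooking WriteFile from code injected into the target process and forwarding the (possibly transformed) buffer. It assumes the Microsoft Detours library is available; the encryption step itself is only a placeholder comment, and, as noted above, a hostile program can detect and undo this kind of patching.

    #include <windows.h>
    #include <detours.h>

    static BOOL (WINAPI *TrueWriteFile)(HANDLE, LPCVOID, DWORD, LPDWORD, LPOVERLAPPED) = WriteFile;

    static BOOL WINAPI HookedWriteFile(HANDLE h, LPCVOID buf, DWORD n,
                                       LPDWORD written, LPOVERLAPPED ov)
    {
        /* Encrypt the n bytes at buf into a temporary buffer here (placeholder),
         * then forward the call so the data still reaches the disk. */
        return TrueWriteFile(h, buf, n, written, ov);
    }

    BOOL install_hook(void)
    {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach((PVOID *)&TrueWriteFile, (PVOID)HookedWriteFile);
        return DetourTransactionCommit() == NO_ERROR;
    }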
When you download a file from the internet, whether it be an FTP request, a peer-to-peer connection, etc., you are always prompted with a window asking where to store the file on your HDD or SSD (maybe you have a little NAS enclosure in your house). Either way you put it, this information is being stored on a physical drive and is not considered volatile. It is stored digitally or magnetically and is readily available to you even after the system is restarted.
Is it possible for software to be programmed to download and store information directly to a designated location in RAM without it ever touching a form of non-volatile memory?
If this is not possible can you please elaborate on why?
Otherwise, if this is possible, could you give me examples of software that implements this, or perhaps a scenario where this would be the only way to produce a desired outcome?
Thank you for the help. I feel this must be possible; however, I can't think of any time I've encountered this, and Google doesn't seem to understand what I'm asking.
Edit: This is being asked from the perspective of a novice programmer, someone who is looking into creating something like this. I seem to have over-inflated my own question. I suppose what I mean to ask is as follows:
How is software such as RAMDisk programmed, how exactly does it work, and are heavily abstract languages such as C# and Java incapable of implementing such a feature?
This is actually not very hard to do if I understand your request correctly. What you're looking for is tmpfs[1].
Carve out a tmpfs partition (if /tmp isn't tmpfs for you by default) and mount it at a location, say something like /volatile.
Then you can simply configure your browser or whatever application to download all files to that folder/directory henceforth. Since tmpfs is essentially RAM mounted as a folder, it's reset after a reboot.
Edit: OP asks how tmpfs and related RAM-based file systems are implemented. This is usually operating-system specific, but the general idea remains the same: the driver responsible for the RAM file system mmap()s the required amount of memory and then exposes that memory in a way that the file system APIs typical of your operating system (for example, POSIX-y operations on Linux/Solaris/BSD) can access it.
Here's a paper describing the implementation of tmpfs on Solaris [2].
Further note: If, however, you're simply trying to download something, use it and delete it without ever hitting disk, entirely internally to your application, then you can simply allocate memory dynamically based on the size of whatever you're downloading, write the bytes into the allocated memory, and free() it once you're done using it.
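As a minimal sketch of that last idea in C (error handling omitted; the descriptor is assumed to be an already-open socket or pipe), everything stays in a heap buffer and never touches disk:

    #include <stdlib.h>
    #include <unistd.h>

    char *slurp_fd(int fd, size_t *out_len)
    {
        size_t cap = 64 * 1024, len = 0;
        char *buf = malloc(cap);
        ssize_t n;

        while ((n = read(fd, buf + len, cap - len)) > 0) {
            len += (size_t)n;
            if (len == cap) {            /* grow the buffer as needed */
                cap *= 2;
                buf = realloc(buf, cap);
            }
        }
        *out_len = len;
        return buf;                      /* caller uses the data, then free()s it */
    }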
This answer assumes you're on a Linux-y operating system. There are likely similar solutions for other operating systems.
References:
[1] https://en.wikipedia.org/wiki/Tmpfs
[2] http://www.solarisinternals.com/si/reading/tmpfs.pdf
I have some questions about performing file I/O using MPI.
A set of files are distributed across different processes.
I want the processes to read the files in the other processes.
For example, in one-sided communication, each process exposes a window visible to the other processes. I need exactly the same functionality. (Create 'windows' for all files and share them so that any process can read any file from any offset.)
Is it possible in MPI? I have read a lot of MPI documentation, but couldn't find the exact answer.
The simple answer is that you can't do that automatically with MPI.
You can convince yourself of this by noting that MPI_File_open() is a collective call taking an intra-communicator as its first argument and returning a file handle to the opened file as its last argument. All processes in this communicator open the file, and therefore all processes must see the file. So unless a process sees a file, it cannot get an MPI_File handle to access it.
Now, that doesn't mean there's no solution. A possibility could be to do by hand exactly what you described, namely:
Each MPI process individually opens the file it sees and is responsible for; then
Each of these processes reads its local file into a buffer;
These individual buffers are all exposed, using either a single global MPI_Win memory window or several individual ones, ready for one-sided read access; and finally
All read accesses to any data that was previously stored in these individual local files are now done through MPI_Get() calls using the memory window(s), as in the sketch below.
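A rough sketch of steps 3 and 4 in C, assuming each rank has already read its local file into buf of len bytes (all ranks call this collectively; error handling omitted and names are illustrative):

    #include <mpi.h>

    void expose_and_read(char *buf, MPI_Aint len, int target_rank,
                         MPI_Aint offset, int count, char *out, MPI_Comm comm)
    {
        MPI_Win win;
        /* Every rank exposes its buffer in one collective window. */
        MPI_Win_create(buf, len, 1, MPI_INFO_NULL, comm, &win);

        /* One-sided read: fetch count bytes starting at offset in the file
         * that target_rank loaded, without that rank's participation. */
        MPI_Win_lock(MPI_LOCK_SHARED, target_rank, 0, win);
        MPI_Get(out, count, MPI_CHAR, target_rank, offset, count, MPI_CHAR, win);
        MPI_Win_unlock(target_rank, win);

        MPI_Win_free(&win);
    }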
The true limitation of this approach is that it requires fully reading all of the individual files, so you need sufficient memory per node to store each of them. I'm well aware that this is a very, very big caveat that could just make the solution completely impractical. However, if the memory is sufficient, this is an easy approach.
Another, even simpler solution would be to store the files on a shared file system, or to have them all copied onto every local file system. I imagine this isn't an option, since the question wouldn't have been asked otherwise...
Finally, as a last resort, a possibility I see would be to dedicate one MPI process (or an OpenMP thread of an MPI process) per node to serve its files. This process would just act as a "file server", answering "read" requests coming from the other MPI processes and serving them by reading the requested data from the file and sending it back via MPI. It's a bit lengthy to write, but it should work.
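A compact sketch of that file-server loop, with illustrative tags and a shutdown convention (a negative length), could look like this:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TAG_REQ  1
    #define TAG_DATA 2

    void serve_file(const char *path, MPI_Comm comm)
    {
        FILE *f = fopen(path, "rb");
        long req[2];                     /* req[0] = offset, req[1] = length */
        MPI_Status st;

        for (;;) {
            MPI_Recv(req, 2, MPI_LONG, MPI_ANY_SOURCE, TAG_REQ, comm, &st);
            if (req[1] < 0) break;       /* negative length = shutdown request */

            char *buf = malloc((size_t)req[1]);
            fseek(f, req[0], SEEK_SET);
            size_t n = fread(buf, 1, (size_t)req[1], f);
            MPI_Send(buf, (int)n, MPI_BYTE, st.MPI_SOURCE, TAG_DATA, comm);
            free(buf);
        }
        fclose(f);
    }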
I'm having an issue with an MPI program running across a group of Linux nodes. The group is currently set up with NFS, with /home/mpi mounted across all nodes. The problem is that the program requires all of the nodes to open a file in the file system in write mode (using fopen on /home/mpi/file) and write to it while it does calculations. One node will be able to open it, while the others won't and will throw an error. Instead, I want each node to have its own file to write to.
I was wondering if there was a way to get around this. I was thinking about making a separate file for each node, with the node's rank appended to the filename, but was wondering if there were simpler ways to get around this issue. Is there a way to set up the group so that all the worker nodes have their own copy of the /home/mpi directory that is auto-updated with any changes that the master node makes to its copy?
Thanks.
As far as I know, the standard way of doing things is to open one file per node, indexed by rank as you described. Depending on what these files are used for (e.g. logging), you then have to write a script to re-combine them at the end of the computation.
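A minimal sketch of that convention in C (the file-name pattern is just an example):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char name[64];
        snprintf(name, sizeof(name), "/home/mpi/output_%04d.log", rank);
        FILE *f = fopen(name, "w");      /* every rank gets its own file */
        fprintf(f, "hello from rank %d\n", rank);
        fclose(f);

        MPI_Finalize();
        return 0;
    }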
If you really need all processes to write to the same file on the filesystem, you'll have to somehow coordinate concurrent outputs from all processes wanting to write to the file.
There is no way to do this at the filesystem level as far as I know, but you can do it within your MPI code. The standard, historical implementation of this is to have all MPI processes send messages to rank 0, which is in charge of actually writing them to the filesystem.
Another option would be to look at the I/O features introduced in MPI-2, which allow all processes to work on different parts of the same file.
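A sketch of that MPI-2 I/O route, where every rank writes its own block at a disjoint offset in one shared file (the block size and file name are assumptions):

    #include <mpi.h>
    #include <string.h>

    #define BLOCK 1024

    void write_shared(MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);

        char block[BLOCK];
        memset(block, 'a' + (rank % 26), BLOCK);

        MPI_File fh;
        MPI_File_open(comm, "/home/mpi/shared.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        /* Collective write, each rank at its own disjoint offset. */
        MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK, block, BLOCK,
                              MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    }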
I'm about to start developing an application to transfer very large files, with no particular rush but with a need for reliability. I would like people who have worked on such a case to give me an insight into what I'm about to get into.
The environment will be an intranet FTP server, so far using active FTP on the normal ports, on Windows systems. I might also need to zip up the files before sending; I remember working with a library once that would zip in memory, and there was a limit on the size... ideas on this would also be appreciated.
Let me know if I need to clarify something else. I'm asking for general/higher-level gotchas, if any, not really detailed help. I've done apps with normal sizes (up to 1 GB) before, but for this one it seems I'd need to limit the speed so I don't kill the network, or things like that.
Thanks for any help.
I think you can get some inspiration from torrents.
Torrents generally break up the file into manageable pieces and calculate a hash of each of them. Later they transfer the file piece by piece. Each piece is verified against its hash and accepted only if it matches. This is a very effective mechanism: it lets the transfer happen from multiple sources and also lets it restart any number of times without worrying about corrupted data.
For a transfer from a server to a single client, I would suggest that you create a header which includes the metadata about the file, so the receiver always knows what to expect, knows how much has been received, and can also check the received data against the hashes.
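One possible shape for such a per-chunk header in C, purely as an illustration (the field names and the fixed-size SHA-256 digest are assumptions, not a defined protocol):

    #include <stdint.h>

    #pragma pack(push, 1)
    typedef struct {
        uint64_t file_size;       /* total size of the file being transferred   */
        uint64_t chunk_offset;    /* where this chunk starts within the file    */
        uint32_t chunk_length;    /* number of payload bytes that follow        */
        uint32_t chunk_index;     /* sequence number, for resuming/reordering   */
        uint8_t  chunk_hash[32];  /* e.g. SHA-256 of the payload, for checking  */
    } chunk_header_t;
    #pragma pack(pop)

The receiver reads one chunk_header_t, then chunk_length payload bytes, recomputes the hash, and either accepts the chunk or asks for it again.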
I have implemented this idea in practice in a client-server application; the data size was much smaller, say 1500 KB, but reliability and redundancy were important factors. This way, you can also effectively control the amount of traffic you want to allow through your application.
I think the way to go is to use the rsync utility, run from Python as an external process.
Quoting from here:
…the pieces, using checksums, to possibly existing files in the target site, and transports only those pieces that are not found from the target site. In practice this means that if an older or partial version of a file to be copied already exists in the target site, rsync transports only the missing parts of the file. In many cases this makes the data update process much faster as all the files are not copied each time the source and target site get synchronized.
And you can use the -z switch to have compression applied on the fly to the data transfer, transparently; there is no need to bottleneck either end by compressing the whole file up front.
Also, check the answers here:
https://serverfault.com/questions/154254/for-large-files-compress-first-then-transfer-or-rsync-z-which-would-be-fastest
And from rsync's man page, this might be of interest:
--partial
By default, rsync will delete any partially transferred file if the transfer is interrupted. In some circumstances it is more desirable to keep partially transferred files. Using the --partial option tells rsync to keep the partial file which should make a subsequent transfer of the rest of the file much faster.
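The answer above drives rsync from Python; as a language-neutral illustration of the same "external process" idea, here is the equivalent call from C (the source path and remote destination are placeholders):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* -z compresses on the wire; --partial keeps partial files so an
         * interrupted transfer of a huge file can resume rather than restart. */
        int rc = system("rsync -z --partial /data/bigfile.bin user@backup:/data/");
        if (rc != 0)
            fprintf(stderr, "rsync exited with status %d\n", rc);
        return rc == 0 ? 0 : 1;
    }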
If I have to move a moderate amount of memory between two processes, I can do the following:
create a file for writing
ftruncate to desired size
mmap and unlink it
use as desired
When another process requires that data, it:
connects to the first process through a Unix socket
receives the fd of the file, which the first process sends as a Unix-socket message (fd passing is sketched below)
mmaps the fd
uses the memory as desired
This allows us to move memory between processes without any copying, but the file created must be on a memory-backed filesystem, otherwise we might get a disk hit, which would degrade performance. Is there a way to do something like this without using a filesystem? A malloc-like function that returned an fd along with a pointer would do it.
[Edit] Having a file descriptor also provides a reference-counting mechanism that is maintained by the kernel.
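For reference, a minimal sketch of the fd-passing step in the second list above, assuming sock is an already-connected AF_UNIX socket (the helper name send_fd is illustrative):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int sock, int fd_to_send)
    {
        char dummy = 'x';                        /* must carry at least one data byte */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

        union {                                  /* properly aligned control buffer */
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;

        struct msghdr msg = {0};
        msg.msg_iov        = &iov;
        msg.msg_iovlen     = 1;
        msg.msg_control    = u.buf;
        msg.msg_controllen = sizeof(u.buf);

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;           /* "this message carries an fd" */
        cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }

The receiver calls recvmsg(), walks CMSG_FIRSTHDR(), reads the new descriptor out of CMSG_DATA(), and can then mmap() it exactly as described.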
Is there anything wrong with System V or POSIX shared memory (which are somewhat different, but end up with the same result)? With any such system, you have to worry about coordination between the processes as they access the memory, but that is true with memory-mapped files too.
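A minimal sketch of the POSIX shared-memory route (the name "/demo_shm" is arbitrary, error handling is abbreviated, and older glibc needs -lrt):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SHM_NAME "/demo_shm"
    #define SHM_SIZE 4096

    int main(void)
    {
        /* Producer side: create the segment, size it, map it, write into it. */
        int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        if (ftruncate(fd, SHM_SIZE) < 0) { perror("ftruncate"); return 1; }

        char *p = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        strcpy(p, "hello from shared memory");

        /* A second process calls shm_open(SHM_NAME, O_RDWR, 0) and mmap()s the
         * same name to see the data; no disk-backed file is involved.
         * shm_unlink(SHM_NAME) removes the name when you are done. */
        munmap(p, SHM_SIZE);
        close(fd);
        return 0;
    }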