What exactly are anonymous files in R?

A passage in the file documentation caught my eye:
## We can do the same thing with an anonymous file.
Tfile <- file()
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
close(Tfile)
What exactly is this anonymous file? Does it exist on disk, or only in memory? I'm interested because I'm contemplating a program that will potentially need to create and delete thousands of temporary files, and if this happens only in memory it would have a much smaller impact on system resources.
This Linux SO question appears to suggest the file could be a real disk file, but I'm not sure how relevant that is to this particular example. Additionally, this bigmemory document seems to hint at real disk-based storage (though I'm assuming a file-backed anonymous file is being used there):
It should also be noted that a user can create an “anonymous” file-backed big.matrix by specifying "" as the filebacking argument. In this case, the backing resides in the temporary directory and a descriptor file is not created. These should be used with caution since even anonymous backings use disk space which could eventually fill the hard drive. Anonymous backings are removed either manually, by a user, or automatically, when the operating system deems it appropriate.
Alternatively, if textConnection is appropriate for this type of application (opened and closed hundreds or thousands of times) and is memory-only, that would satisfy my needs. I was planning on doing this until I read the note in that function's documentation:
As output text connections keep the character vector up to date line-by-line, they are relatively expensive to use, and it is often better to use an anonymous file() connection to collect output.

My C is very rusty, so hopefully more experienced people can correct me, but I think the answer to your question "What exactly is this anonymous file? Does it exist on disk, or only in memory?" is "It exists on disk".
Here is what happens at C level (I'm looking at the source code at http://cran.r-project.org/src/base/R-3/R-3.0.2.tar.gz):
A. The function file_open, defined in src/main/connections.c:554, has the following logic related to anonymous files (those with an empty description), lines 565-568:
if(strlen(con->description) == 0) {
    temp = TRUE;
    name = R_tmpnam("Rf", R_TempDir);
} else name = R_ExpandFileName(con->description);
So a new temporary filename is generated if no file name was supplied to file.
B. If the name of the file is not equal to stdin, the call R_fopen(name, con->mode) happens at line 585 (there are some subtleties with Win32 and UTF-8 names, but we can ignore them for now).
C. The file name is then unlinked at line 607. The documentation for unlink says:
The unlink() function removes the link named by path from its directory and decrements the link count of the file which was referenced by the link. If that decrement reduces the link count of the file to zero, and no process has the file open, then all resources associated with the file are reclaimed. If one or more process have the file open when the last link is removed, the link is removed, but the removal of the file is delayed until all references to it have been closed.
So in effect the directory entry is removed, but the file exists for as long as it is kept open by the R process.
D. Finally, R_fopen is defined in src/main/sysutils.c:135 and just calls fopen internally.
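To make this concrete, here is a minimal C sketch of the same open-then-unlink idea (this is not R's actual code, which uses R_tmpnam and R_fopen; the template path and buffer sizes are just illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char path[] = "/tmp/RfXXXXXX";   /* template, in the spirit of R_tmpnam("Rf", ...) */
    int fd = mkstemp(path);          /* create and open a unique temporary file        */
    if (fd == -1)
        return 1;

    unlink(path);                    /* drop the directory entry; the data stays
                                        reachable through fd until it is closed        */

    FILE *fp = fdopen(fd, "r+");
    fputs("abc\ndef\n", fp);         /* write ...                                      */
    rewind(fp);                      /* ... and read it back, as the R example does    */

    char line[64];
    while (fgets(line, sizeof line, fp))
        fputs(line, stdout);

    fclose(fp);                      /* only now is the disk space reclaimed           */
    return 0;
}

The file therefore does occupy disk space while it is open, even though it no longer has a name in the filesystem.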

Related

How do I check whether a file exists without exceptions in Julia?

How do I see if a file exists without exceptions using Julia? I want to make sure that my program does not crash if for some reason the file I am trying to open is not accessible, has been deleted, or does not exist.
There are two simple ways of doing so.
First:
println(isfile("Sphere.jl"))
false
This isfile() function will simply check whether the file exists. Note: if Sphere.jl is not in your current working directory, you will need to provide the absolute path to the file.
Second (more of a trial by fire example):
try
    open("Sphere.jl", "w") do s
        println(s, "Hi")
    end
catch
    @warn "Could not open the file to write."
end
The second example uses the try-catch construct. It is generally best for your program not to rely on error handling for control flow, so it's recommended that you use isfile() unless your use case requires try-catch.
It's worth noting that there may be cases where the file exists but writing to it is not possible (e.g. it's locked by the OS). In that case, using try-catch when attempting to write is a great option.

How to move files in Unix in a way that preserves all of their attributes

I had this question in my exam today, and I can't seem to find an answer for it:
How do you take all of the .fit files found in the root directory and move them into a fit directory in a way that preserves all of their attributes?
The question sounds a little vague: the first and most immediate answer is that you use the cp(1) command with --preserve=all. From the manpage:
--preserve[=ATTR_LIST]
    preserve the specified attributes (default: mode,ownership,timestamps), if possible additional attributes: context, links, xattr, all
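For example, assuming the .fit files sit directly under / and the target directory is /fit (the paths here are only illustrative):

mkdir -p /fit
cp --preserve=all /*.fit /fit/ && rm /*.fit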
It looks like you're taking an operating systems class though, so I assume that the exam was not testing your ability to know all the possible options to cp(1). The question is (probably) about how to do it in code (or rather, how cp(1) does it).
Programmatically, you could do it like this (a C sketch follows the list):
Open the source directory with opendir(3).
Iteratively fetch each entry in the directory with readdir(3). Each call to readdir(3) will return a struct dirent, which, among other things, contains the inode of that entry, the filename, and the type of file (you may want to recursively repeat the process if the file type is a directory)
For each regular file entry, open(2) the file with O_RDONLY. Also open(2) the same filename in the target directory with O_WRONLY | O_CREAT | O_TRUNC (write only mode, truncate the file if it already exists, create it if it doesn't).
Copy the contents as usual with read(2) and write(2).
Call fstat(2) on the source file to get all the attributes.
Call fchmod(2) on the target file to set the permissions to be the same as those in the st_mode field of the stat structure of the source file.
Call fchown(2) on the target file to set the owner and group to be the same as the st_uid and st_gid fields of the stat structure of the source file.
Call futimens(2) on the target file to set the access and modification times to be the same as the st_atime and st_mtime fields of the stat structure of the source file.
Close the file and process the next file.
When done, close the directory with closedir(3).
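A minimal C sketch of those steps for a single regular file might look like the following (Linux-style struct stat field names st_atim/st_mtim are assumed, directory iteration and most error handling are left out, and sparse files are copied naively):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to dst, preserving mode, ownership and timestamps. Sketch only. */
int copy_preserving(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1) return -1;

    struct stat st;
    if (fstat(in, &st) == -1) { close(in); return -1; }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out == -1) { close(in); return -1; }

    /* contents */
    char buf[8192];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0)
        write(out, buf, (size_t)n);

    fchmod(out, st.st_mode);                  /* permissions       */
    fchown(out, st.st_uid, st.st_gid);        /* owner and group   */

    struct timespec times[2] = { st.st_atim, st.st_mtim };
    futimens(out, times);                     /* access/mod. times */

    close(in);
    close(out);
    return 0;
}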
These are all the attributes that you can preserve in a copy. Note that there are still some differences between the two files:
The ctime (time of last status change, i.e., when the inode was last changed) can't be copied.
The ID of the device containing the file may be different (depending on where you are copying it to).
The inode number is obviously different.
The number of hard links may not be the same.
The file size may differ if the source file had holes in it. Holes in a file are not necessarily backed by disk storage, but if you naively copy byte by byte from the source to target, the destination file will not have holes and will need more disk space.

Overwrite an existing file programmatically

I have a QDialogBox where there is an option to upload a file.
I can upload files and save them to a folder. It works fine.
But if in case there is a file that already exists in the folder, I am not sure how to handle that scenario.
I want to warn the user that the file with same name already exists.
Is there a Windows API that I can use in this case? (because when we manually save an existing file, we get a warning, how can I use that?)
If someone can point me to that documentation, it will be great.
If you are using a QFileDialog, confirmOverwrite is activated by default, so if getSaveFileName() returned a non-empty QString, that means the user accepted overwriting the file. Otherwise, you get an empty QString.
Then you can check whether the file exists and remove it in that case, knowing that the user was OK with that.
There is always a potential race condition when saving files. Checking to see if the file exists first is not safe, because some other process could create a file with the same name in between the check and when you actually write the file.
To avoid problems, the file must be opened with exclusive access, and in such a way that it immediately fails if it already exists.
If you want to do things properly, take a look at these two answers:
How do I create a file in python without overwriting an existing file
Safely create a file if and only if it does not exist with python
You can use QDir::entryList() to get the file names in a directory if you're not using a QFileDialog.
QDir dir("/path/to/directory");
QStringList fileNames = dir.entryList();
Then iterating through file names, you can see if there's a file with the same name. If you need it, I can give an example for that too. It'd be C++, but easily adaptable to Python.
Edit: Smasho just suggested using the QDir::exists() method. You can check whether the file name exists in the directory with this method instead of iterating as I suggested:
if(dir.exists(uploadedFileName))
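For example, a small hypothetical sketch of warning the user (assuming this code runs inside a QWidget subclass, #include <QMessageBox> is present, and uploadedFileName holds the name about to be saved):

if (dir.exists(uploadedFileName)) {
    QMessageBox::StandardButton answer = QMessageBox::question(
        this, tr("File exists"),
        tr("A file named \"%1\" already exists. Overwrite it?").arg(uploadedFileName),
        QMessageBox::Yes | QMessageBox::No);
    if (answer != QMessageBox::Yes)
        return;    // the user declined, keep the existing file
}
// ... proceed to save, overwriting the old file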

File upload in ASP.NET - How can I prevent exceptions?

I have a FileUploader control in my web form. If the file being uploaded is already present, I want to delete it, and overwrite it with the newly-uploaded file. But I get an error, as the file is in use by another process, and thus the application can't delete it. Sample code:
if (FUpload.HasFile)
{
    string FileName = Path.GetFileName(FUpload.PostedFile.FileName);
    string Extension = Path.GetExtension(FUpload.PostedFile.FileName);
    string FolderPath = ConfigurationManager.AppSettings["FolderPath"];
    string FilePath = Server.MapPath(FolderPath + FileName);
    if (File.Exists(FilePath))
    {
        File.Delete(FilePath);
    }
    FUpload.SaveAs(FilePath);
}
Is there anything I can do apart from writing the code in try/catch blocks?
Generate a unique temporary file name. Rename it to your destination when complete. You may still have collisions if someone uploads the "same" file name at the same time. You should always be catching file system errors somewhere. If you don't do it here, may I suggest a global error handler in global.asax.
You can save your file under some other name, and if the destination already exists, use File.Replace to replace the old file.
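A rough C# sketch of that save-then-swap idea, reusing the names from the question's code (a race with a concurrent upload is still possible, so this still belongs inside a try/catch):

string Folder = Server.MapPath(ConfigurationManager.AppSettings["FolderPath"]);
string FinalPath = Path.Combine(Folder, Path.GetFileName(FUpload.PostedFile.FileName));
string TempPath = Path.Combine(Folder, Path.GetRandomFileName());

FUpload.SaveAs(TempPath);                      // write to a unique temporary name first

if (File.Exists(FinalPath))
    File.Replace(TempPath, FinalPath, null);   // swap the new contents into place
else
    File.Move(TempPath, FinalPath);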
At the end of the day, due to potential race conditions on your web site (due to, hopefully, concurrent users), you can't get around try/catch. (Why are you averse to it?)
Utkarsh and No Refunds No Returns have the basic answer right -- save it with a temporary file name, then replace/overwrite the existing one if needed. A good approach for this is to use a GUID as the temporary file name, to ensure that there are no collisions on the filename alone.
Depending on the nature of your application, you could get quite a few files stacked up, uploaded by different users, with lots of potential name conflicts. Depending on the nature and scale of your app, as well as its security boundaries, you might consider giving each user his/her own directory, based on user ID (how you'd identify the user in the database). Each user uploads his/her files there. If there's a name collision, you can bounce back to the user (holding the GUID name in session if needed) and ask if he/she wants to overwrite, and know with confidence that the answer is safe.
If the user declines to overwrite, you can delete your temp file.
If the user agrees to overwrite, you can delete the original and write the new one.
In either event, all of this is localized to the user's own directory, and thus (unless multiple users are signed on with the same ID) the behavior is safe.
In general, this will be more robust and safe than arbitrarily overwriting file name collisions.
Again, due to race conditions and other situations beyond your control, you need to use a try/catch block any time you attempt to write to the file system. Why? What if the drive is out of space? What if the file you are attempting to overwrite is legitimately in use by another process? What if the file you are attempting to overwrite has NTFS permissions forbidding the web process from touching it? So on and so forth. You need to be prepared to handle these kinds of exceptions.

File uploading: what should be the name of the file to save to?

I am going to add a file upload control to my ASP.NET 2.0 web page so that users can upload files. Files will be stored on the server in a folder named after the user. I want to know the best way to name the files when saving them to the server, considering security, performance, flexibility in handling the files, etc.
Options I am considering now :
1. Upload with the same name as the input file
2. User ID + random number + the input file name
3. A random number + the current time in seconds, saving files under that number, with one table mapping this number to the user's upload
Anything else? What is the best way?
NEVER EVER use user input for filenames. Don't use the username. Use the user ID instead (I assume your users have a unique ID).
NEVER use the original filename. Use your solution number 3, plus the user id instead of the username.
For your information, PHP had a vulnerability a few years ago: one could forge an HTTP POST request with a file upload and a file name like "../../anything.php", and the PHP $_FILES array, supposed to contain sanitized values, didn't detect this kind of file name, so one could write files anywhere in the filesystem.
I'd use a combination of
User ID
A randomly generated string (e.g. a GUID)
Example PDF file name: 23212-dd503cf8-a548-4584-a0a3-39dc8be618df.pdf
This way, the user can upload as many files as he/she wants, without file name conflict, and you are also able to point out which files belong to which users, just by looking at the file names.
I don't see the need to include any other information in the file name, since upload time/date and such can be retrieved from the file's attributes.
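A one-line C# sketch of building such a name (userId and the uploaded file's original name are assumed to be available in the surrounding code):

string storedName = userId + "-" + Guid.NewGuid() + Path.GetExtension(originalFileName);
// e.g. 23212-dd503cf8-a548-4584-a0a3-39dc8be618df.pdf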
Also, you should store the files in a safe location, which external users, such as visitors of your website, cannot access. Instead, you deliver the file to them through a proxy web page (you read the file from the safe location, and pass the data on to the user). For this solution, a database is needed to keep track of files, their location, etc.
This also makes you able to control which users have access to which files through your code.
Update: Here's a description of how the solution with the proxy web page could be implemented.
Create a Web Form with the name GetFile.aspx
GetFile.aspx takes one query parameter named fileid, which is used to identify the file to get. E.g.: http://www.mypage.com/GetFile.aspx?fileid=100
Use the fileid parameter to lookup the file location in the database, so that it can be read and sent to the user. In the Web Form you use Request.QueryString("fileid") to get the file ID and use it in a query that will look something like this (SQL): SELECT FileLocation FROM UserFiles WHERE FileID = 100
Read the file using a System.IO.FileStream and output its contents through Response.Write. Remember to set the appropriate content type using Response.ContentType first, so that the client browser handles the requested file correctly (see this post on asp.forums.net and the MSDN article which is also referred to in the post, which both discuss a method of determining the appropriate content type automatically).
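Putting the steps together, a hypothetical C# code-behind for GetFile.aspx could look like this (LookupFileLocation is an assumed data-access helper that runs the SELECT shown above, and Response.WriteFile stands in for the manual FileStream/Response.Write loop):

protected void Page_Load(object sender, EventArgs e)
{
    int fileId = int.Parse(Request.QueryString["fileid"]);
    string fileLocation = LookupFileLocation(fileId);     // path from the UserFiles table

    Response.ContentType = "application/pdf";              // use the type that matches the file
    Response.AddHeader("Content-Disposition",
        "attachment; filename=" + Path.GetFileName(fileLocation));
    Response.WriteFile(fileLocation);                       // stream the file to the client
    Response.End();
}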
If you choose this approach, it's easy to implement your own simple security or custom actions later on, such as making sure a user is logged into your web site before you send the file, or that users can only access files they uploaded themselves, or logging which users download which files, etc. The possibilities are endless ;-)
Take a look at the System.IO.Path class as it has lots of useful functions you can utilise, such as:
Check which characters are invalid in a path:
System.IO.Path.GetInvalidPathChars();
Get a random file name:
System.IO.Path.GetRandomFileName();
Create a uniquely named temporary file on disk and get its full path:
System.IO.Path.GetTempFileName();
I would go with option #3. A table mapping these files to users will provide other uses down the road; it always does. If you use the mapping, the only advantage of appending the user name or ID to the file is for debugging a problem.
I'd probably use a GUID instead of a random number but either would work. The important things in my opinion are
No username as any part of the stored file name
Never use the original file name as any part of the stored file name
Use a random number or GUID to ensure no duplicate file names
Adding a user ID to the file name will help with manual debugging
There is more to this than meets the eye...which I am thinking that you already knew!
What sort of files are you talking about? If they are anything even remotely big or in such quantity that the group of files could be big I would immediately suggest that you add some flexibility to your approach.
Create a table that stores the root paths to various file stores (these could be drives, UNC paths, whatever your environment supports). It will initially have one entry, which will be your first storage location. A nice attribute to maintain with this data is how much room is available there.
Maintain a table of file-related data (ID {GUID}, create date, foreign key to the path data, file size).
Write the file to a root that still has room on it (query all file sizes stored in a root location and compare to that root's capacity).
Write the file using a GUID for the name (this obfuscates the file on the file system); it can be written without the file extension if security requires it (sensitive files).
Write the file under a path derived from its create date, starting from root/year{number}/month{number}/day{number}/file.extension.
With a system of this nature in place - even though you won't/don't need it up front - you can now more easily relocate the files. You can better manage the files. You can better manage collections of files. Etc. I have used this system before and found it to be quite flexible. Dealing with files that are stored to a file system but managed from a database can get a bit out of control once the file store becomes so large and things need to get moved around a bit. Also, at least in the case of windows...storing zillions of files in one directory is usually not a good idea (the reason for breaking things up by their create date).
This complexity is only really needed when you have high volumes and large footprints.
