How to move files in Unix in a way that preserves all of the attributes


I had this question on my exam today, and I can't seem to find an answer for it:
How do you move all the matching files found under root into the appropriate directory in a way that preserves all of their attributes?

The question sounds a little vague: the first and most immediate answer is to use the cp(1) command with --preserve=all. From the man page:
--preserve[=ATTR_LIST]
    preserve the specified attributes (default: mode,ownership,timestamps), if possible additional attributes: context, links, xattr, all
It looks like you're taking an operating systems class, though, so I assume the exam was not testing whether you know every possible option to cp(1). The question is (probably) about how to do it in code (or rather, how cp(1) does it).
Programmatically, you could do it like this:
1. Open the source directory with opendir(3).
2. Iteratively fetch each entry in the directory with readdir(3). Each call to readdir(3) returns a pointer to a struct dirent, which, among other things, contains the inode number of the entry (d_ino), its file name (d_name) and, on many systems, the file type (d_type; it can be DT_UNKNOWN, in which case you need stat(2)). If the entry is a directory other than "." or "..", you may want to repeat the process recursively.
3. For each regular file entry, open(2) the file with O_RDONLY. Also open(2) the same file name in the target directory with O_WRONLY | O_CREAT | O_TRUNC (write-only mode, create the file if it doesn't exist, truncate it if it does).
4. Copy the contents as usual with read(2) and write(2).
5. Call fstat(2) on the source file to get all of its attributes.
6. Call fchown(2) on the target file to set the owner and group to the st_uid and st_gid fields of the source file's stat structure. Changing the owner normally requires root privileges, and on many systems a successful chown clears the set-user-ID and set-group-ID bits, which is why this step comes before fchmod.
7. Call fchmod(2) on the target file to set its permissions to those in the st_mode field of the source file's stat structure.
8. Call futimens(2) on the target file to set its access and modification times to those of the source file (the st_atim and st_mtim fields, the nanosecond-precision versions of st_atime and st_mtime).
9. Close both files. Since the question asks for a move, unlink(2) the source file once the copy has succeeded. Then process the next entry.
10. When done, close the directory with closedir(3).
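The per-file part of the procedure above (open, copy the data, fstat, then set owner, mode and times on the target) can be sketched in C roughly as follows. This is a minimal sketch assuming a POSIX.1-2008 system (needed for st_atim/st_mtim and futimens); the function name copy_preserving_attrs is made up for illustration, the directory walk is left out, and an EPERM from fchown (what non-root users usually get) is deliberately treated as non-fatal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Copy the regular file src to dst, then give dst the owner, group,
 * permission bits and access/modification times of src.
 * Returns 0 on success, -1 on error (errno describes the failure). */
int copy_preserving_attrs(const char *src, const char *dst)
{
    struct stat st;
    struct timespec times[2];
    char buf[8192];
    ssize_t n;
    int ret = -1, saved;

    int sfd = open(src, O_RDONLY);
    if (sfd < 0)
        return -1;
    /* Take the attributes before reading, so the copied atime is the
     * source's atime as it was before we touched the file. */
    if (fstat(sfd, &st) < 0) {
        close(sfd);
        return -1;
    }
    /* Owner-only permissions for now; the real mode is set below. */
    int dfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (dfd < 0) {
        close(sfd);
        return -1;
    }

    /* Copy the data, coping with short writes. */
    while ((n = read(sfd, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(dfd, p, (size_t)n);
            if (w < 0)
                goto out;
            p += w;
            n -= w;
        }
    }
    if (n < 0)
        goto out;

    /* Owner and group first: changing them can clear the set-user-ID
     * and set-group-ID bits, so fchmod has to run afterwards.
     * Non-root callers normally get EPERM here; keep going then. */
    if (fchown(dfd, st.st_uid, st.st_gid) < 0 && errno != EPERM)
        goto out;
    if (fchmod(dfd, st.st_mode & 07777) < 0)
        goto out;
    times[0] = st.st_atim;   /* access time */
    times[1] = st.st_mtim;   /* modification time */
    if (futimens(dfd, times) < 0)
        goto out;
    ret = 0;

out:
    saved = errno;
    close(sfd);
    if (close(dfd) < 0 && ret == 0)
        return -1;
    if (ret < 0)
        errno = saved;
    return ret;
}
```

To turn the copy into a move, the caller would unlink(src) only after copy_preserving_attrs(src, dst) returns 0, so a failed copy never loses the original.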
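These are the standard stat attributes that you can preserve in a copy (extended attributes, ACLs and SELinux contexts, which cp --preserve=all also handles, need additional calls). Note that there are still some differences between the two files: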
The ctime (time of last status change, i.e., when the inode was last changed) can't be copied.
The ID of the device containing the file may be different (depending on where you are copying it to).
The inode number is obviously different.
The number of hard links may not be the same.
The disk usage may differ if the source file has holes in it. Holes in a file are not necessarily backed by disk storage; the apparent size (st_size) is the same either way, but if you naively copy byte by byte from the source to the target, the destination file will not have holes and will occupy more disk blocks (st_blocks).
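The holes point is easy to check for yourself. The following minimal sketch assumes a POSIX system, and make_sparse_file and file_sizes are hypothetical helper names: it creates a file by seeking past the end before writing, then reports both the apparent size and the space actually allocated.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create path as a file whose only data is one byte at offset `hole`,
 * leaving a hole of `hole` bytes in front of it.
 * Returns 0 on success, -1 on error. */
int make_sparse_file(const char *path, off_t hole)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Seeking past end-of-file and then writing leaves a hole that
     * reads back as zeros but need not occupy any disk blocks. */
    if (lseek(fd, hole, SEEK_SET) < 0 || write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Report the apparent size (st_size) and the allocated space.
 * On Linux st_blocks counts 512-byte units. */
int file_sizes(const char *path, off_t *apparent, long long *allocated)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    *apparent = st.st_size;
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

On a filesystem that supports holes (ext4, XFS and tmpfs do), a file made with make_sparse_file(path, 1 << 20) typically reports an apparent size of 1048577 bytes but only a few KiB allocated; copying it with a plain read(2)/write(2) loop writes the zeros out and allocates the full megabyte.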

Related

Writing a library in unix

I have to do this:
Write a library responsible for handling an array of pointers to blocks containing the results of executing the find command. The following instructions are possible:
execute a search for files named filenames, starting from the root directory, and store the output and diagnostic output in the temp file. You can use the system function to execute the external process/Unix command. Syntax: search root filenames temp
store the contents of the temp file in a dynamically allocated block of memory and add the pointer to this block to the array. Syntax: store temp
remove (deallocate) the block of memory referenced by entry number in the array. Syntax: remove number
But I don't know how to start. Does anyone know where I can find info or can help me to do this?
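As a starting point, the three operations could be sketched in C like this. It is a minimal sketch assuming a POSIX system with find(1) on the PATH; the names entries, search, store and remove_entry are hypothetical (remove_entry avoids clashing with the standard remove() from <stdio.h>), and parsing the "search/store/remove" command syntax into these calls is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

static char **entries = NULL;  /* the array of pointers to blocks */
static size_t n_entries = 0;

/* "search root filenames temp": run find(1) via system(3), sending
 * both standard output and diagnostic output to temp. The arguments
 * are pasted into a shell command, so they must be trusted. */
int search(const char *root, const char *filenames, const char *temp)
{
    char cmd[1024];
    int n = snprintf(cmd, sizeof cmd, "find %s -name '%s' > %s 2>&1",
                     root, filenames, temp);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    return system(cmd);
}

/* "store temp": read the whole file into a malloc'd, NUL-terminated
 * block and append its pointer to the array.
 * Returns the entry number, or -1 on error. */
long store(const char *temp)
{
    FILE *f = fopen(temp, "rb");
    if (!f)
        return -1;
    size_t cap = 4096, len = 0, got;
    char *block = malloc(cap);
    /* Always leave one byte free for the terminating NUL. */
    while (block && (got = fread(block + len, 1, cap - len - 1, f)) > 0) {
        len += got;
        if (cap - len - 1 == 0) {
            char *bigger = realloc(block, cap * 2);
            if (!bigger) {
                free(block);
                block = NULL;
                break;
            }
            block = bigger;
            cap *= 2;
        }
    }
    fclose(f);
    if (!block)
        return -1;
    block[len] = '\0';

    char **grown = realloc(entries, (n_entries + 1) * sizeof *entries);
    if (!grown) {
        free(block);
        return -1;
    }
    entries = grown;
    entries[n_entries] = block;
    return (long)n_entries++;
}

/* "remove number": free the block and set the slot to NULL, so the
 * other entry numbers stay valid. Returns 0, or -1 if invalid. */
int remove_entry(size_t number)
{
    if (number >= n_entries || entries[number] == NULL)
        return -1;
    free(entries[number]);
    entries[number] = NULL;
    return 0;
}
```

Keeping freed slots as NULL instead of compacting the array is a design choice: it keeps "remove number" stable for entries stored later.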

Overwrite an existing file programmatically

I have a QDialogBox where there is an option to upload a file.
I can upload files and save them to a folder. It works fine.
But if a file with the same name already exists in the folder, I am not sure how to handle that scenario.
I want to warn the user that a file with the same name already exists.
Is there a Windows API that I can use in this case? (When we manually save over an existing file, we get a warning; how can I use that?)
If someone can point me to the relevant documentation, that would be great.

If you are using a QFileDialog, confirmOverwrite is enabled by default, so if getSaveFileName() returned a non-empty QString, the user has agreed to overwrite the file. Otherwise, you get an empty QString.
Then you can check whether the file exists and remove it if it does, knowing that the user is OK with that.

There is always a potential race condition when saving files. Checking to see if the file exists first is not safe, because some other process could create a file with the same name in between the check and when you actually write the file.
To avoid problems, the file must be opened with exclusive access, and in such a way that it immediately fails if it already exists.
If you want to do things properly, take a look at these two answers:
How do I create a file in python without overwriting an existing file
Safely create a file if and only if it does not exist with python
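The same idea exists at the POSIX level in C: open(2) with O_CREAT | O_EXCL performs the existence check and the creation as one atomic step and fails with EEXIST if the name is taken. This is a minimal sketch assuming a POSIX system; save_if_absent is a hypothetical helper name. Python's open() has the equivalent "x" mode, and Qt 5.11 and later offer QIODevice::NewOnly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Create path and write data to it, but only if no file of that name
 * exists yet. Because O_CREAT | O_EXCL checks and creates in a single
 * step, no other process can create the file in between.
 * Returns 1 if the file was written, 0 if the name is already taken
 * (the caller can then ask the user whether to overwrite), -1 on error. */
int save_if_absent(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;
    while (len > 0) {
        ssize_t w = write(fd, data, len);
        if (w < 0) {
            close(fd);
            return -1;
        }
        data += w;
        len -= (size_t)w;
    }
    return close(fd) == 0 ? 1 : -1;
}
```

A return value of 0 is the point where a GUI would show its "file already exists" warning; nothing has been overwritten at that stage.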

You can use QDir::entryList() to get the file names in a directory if you're not using a QFileDialog.
QDir dir("/path/to/directory");
QStringList fileNames = dir.entryList();
Then, by iterating through the file names, you can check whether there's a file with the same name. If you need it, I can give an example for that too. It'd be C++, but easily adaptable to Python.
Edit: Smasho just suggested using the QDir::exists() method. You can check whether the file name exists in the directory with this method instead of iterating as I suggested:
if (dir.exists(uploadedFileName)) {
    // warn the user that a file with this name already exists
}

What Exactly are Anonymous Files

A passage in the file documentation caught my eye:
## We can do the same thing with an anonymous file.
Tfile <- file()
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
close(Tfile)
What exactly is this anonymous file? Does it exist on disk, or only in memory? I'm interested in this as I'm contemplating a program that will potentially need to create/delete thousands of temporary files, and if this happens only in memory it seems like it would have a much lesser impact on system resources.
This Linux SO question appears to suggest that this file could be a real disk file, but I'm not sure how relevant it is to this particular example. Additionally, this bigmemory doc seems to hint at real disk-based storage (though I'm assuming the file-based anonymous file is what is being used there):
It should also be noted that a user can create an “anonymous” file-backed big.matrix by specifying "" as the filebacking argument. In this case, the backing resides in the temporary directory and a descriptor file is not created. These should be used with caution since even anonymous backings use disk space which could eventually fill the hard drive. Anonymous backings are removed either manually, by a user, or automatically, when the operating system deems it appropriate.
Alternatively, if textConnection is suitable for this type of application (opened/closed hundreds or thousands of times) and is memory-only, that would satisfy my needs. I was planning on using it until I read this note in that function's documentation:
As output text connections keep the character vector up to date line-by-line, they are relatively expensive to use, and it is often better to use an anonymous file() connection to collect output.

My C is very rusty, so hopefully more experienced people can correct me, but I think the answer to your question "What exactly is this anonymous file? Does it exist on disk, or only in memory?" is "It exists on disk".
Here is what happens at C level (I'm looking at the source code at http://cran.r-project.org/src/base/R-3/R-3.0.2.tar.gz):
A. Function file_open, defined in src/main/connections.c:554, has the following logic related to anonymous file (with an empty description), lines 565-568:
if(strlen(con->description) == 0) {
    temp = TRUE;
    name = R_tmpnam("Rf", R_TempDir);
} else name = R_ExpandFileName(con->description);
So a new temporary filename is generated if no file name was supplied to file.
B. If the file name is not "stdin", the call R_fopen(name, con->mode) happens at line 585 (there are some subtleties with Win32 and UTF-8 names, but we can ignore them for now).
C. Finally, the file name is unlinked at line 607. The documentation for unlink says:
The unlink() function removes the link named by path from its directory and decrements the link count of the file which was referenced by the link. If that decrement reduces the link count of the file to zero, and no process has the file open, then all resources associated with the file are reclaimed. If one or more process have the file open when the last link is removed, the link is removed, but the removal of the file is delayed until all references to it have been closed.
So in effect the directory entry is removed, but the file keeps existing as long as the R process has it open.
D. Finally, R_fopen is defined in src/main/sysutils.c:135 and just calls fopen internally.
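The same create-then-unlink pattern can be sketched in C. This is a minimal illustration assuming a POSIX system, not R's actual code: it swaps in mkstemp(3) for R_tmpnam plus fopen, and the open_anonymous_file name and the /tmp location are assumptions. Where the data lives depends on the filesystem holding that directory: normally disk, and memory only if it happens to be a tmpfs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* tmpl must be a writable string ending in "XXXXXX", e.g. "/tmp/RfXXXXXX".
 * Returns an open read/write descriptor to a file that no longer has
 * a name, or -1 on error. */
int open_anonymous_file(char *tmpl)
{
    int fd = mkstemp(tmpl);   /* creates and opens a uniquely named file */
    if (fd < 0)
        return -1;
    /* Remove the directory entry at once. The open descriptor keeps the
     * inode and its data alive until it is closed; then the kernel
     * reclaims the space, as the unlink() text above describes. */
    if (unlink(tmpl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Each such file still costs a create and an unlink on the filesystem, which is worth keeping in mind when thousands of them are opened and closed.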

Append text to the beginning of a text file

Is there a way that I can always add new text at the beginning of a text file in Qt? I'm using QFile::Append to do it:
file.open(QFile::Append | QFile::Text);

You can't, see the documentation at http://doc.qt.io/qt-5/qiodevice.html:
QIODevice::Append 0x0004 The device is opened in append mode, so that all data is written to the end of the file.
The problem goes even deeper: a file is usually stored sequentially on disk, so appending (better: inserting) at the start of a file would involve moving all the data towards the end of the file, i.e. a reorganization of filesystem blocks. I'm not sure such a filesystem exists, but if one does, I guess it would only allow inserting a multiple of the filesystem block size into a file.
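The usual workaround is to write a new file that starts with the new text, copy the old contents after it, and rename the new file over the old one. Below is a minimal sketch in plain C/POSIX rather than Qt (prepend_to_file is a hypothetical name). It assumes the directory is writable and that rewriting the whole file, at a cost proportional to its size, is acceptable; it keeps the permission bits but not the owner, and any other hard links keep pointing at the old contents.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Insert text at the start of the file at path by writing a new file
 * (text, then the old contents) next to it and renaming it over the
 * original. Within one filesystem rename(2) is atomic, so readers see
 * either the old or the new file, never a half-written one.
 * Returns 0 on success, -1 on error (the original is left untouched). */
int prepend_to_file(const char *path, const char *text)
{
    char tmp[4096];
    char buf[8192];
    struct stat st;
    size_t n;

    int len = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;
    if (fstat(fileno(in), &st) < 0) {
        fclose(in);
        return -1;
    }
    int fd = mkstemp(tmp);   /* temporary file in the same directory */
    if (fd < 0) {
        fclose(in);
        return -1;
    }
    FILE *out = fdopen(fd, "wb");
    if (!out) {
        close(fd);
        unlink(tmp);
        fclose(in);
        return -1;
    }

    int ok = fputs(text, out) >= 0;
    while (ok && (n = fread(buf, 1, sizeof buf, in)) > 0)
        ok = fwrite(buf, 1, n, out) == n;
    ok = ok && !ferror(in);
    fclose(in);
    /* mkstemp creates the file with mode 0600; restore the original bits. */
    ok = ok && fchmod(fd, st.st_mode & 07777) == 0;
    ok = (fclose(out) == 0) && ok;   /* always close; keep earlier failure */
    if (!ok || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

In Qt itself, QSaveFile implements the same write-to-a-temporary-then-rename pattern, so the new text and the old contents can be written through it in the same order.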

File uploading: what should be the name of the file to save to?

I am going to add a file upload control to my ASP.NET 2.0 web page so that users can upload files. Files will be stored on the server in a folder named after the user. I want to know the best way to name the files when saving them to the server, considering security, performance, flexibility in handling the files, etc.
Options I am considering now:
Save the file with the same name as the input file
User ID + random number + the original input file name
Random number + current time in seconds as the file name, with one table that maps this number to the user's upload
Anything else? What is the best way?

NEVER EVER use user input for file names. Don't use the username. Use the user id instead (I assume your users have a unique id).
NEVER use the original file name. Use your solution number 3, plus the user id instead of the username.
For your information, PHP had a vulnerability a few years ago: one could forge an HTTP POST request with a file upload and a file name like "../../anything.php", and PHP's $_FILES array, which was supposed to contain sanitized values, didn't catch this kind of file name, so one could write files anywhere in the file system.

I'd use a combination of
User ID
A random generated string (e.g. a GUID)
Example PDF file name: 23212-dd503cf8-a548-4584-a0a3-39dc8be618df.pdf
This way, the user can upload as many files as he/she wants, without file name conflict, and you are also able to point out which files belong to which users, just by looking at the file names.
I don't see the need to include any other information in the file name, since upload time/date and such can be retrieved from the file's attributes.
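A sketch of this naming scheme in C, assuming a Unix-like server with /dev/urandom (in ASP.NET itself you would more likely call Guid.NewGuid()). make_stored_name is a hypothetical helper; the random part is formatted as a version 4 UUID, and the extension is assumed to come from a whitelist of allowed types rather than from the uploaded file name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Build "<user id>-<random UUID>.<ext>", in the same shape as
 * "23212-dd503cf8-a548-4584-a0a3-39dc8be618df.pdf".
 * Returns 0 on success, -1 on error or if out is too small. */
int make_stored_name(char *out, size_t outlen, unsigned long user_id,
                     const char *ext)
{
    unsigned char b[16];
    FILE *r = fopen("/dev/urandom", "rb");
    if (!r)
        return -1;
    size_t got = fread(b, 1, sizeof b, r);
    fclose(r);
    if (got != sizeof b)
        return -1;
    b[6] = (unsigned char)((b[6] & 0x0f) | 0x40);  /* version 4 (random) */
    b[8] = (unsigned char)((b[8] & 0x3f) | 0x80);  /* RFC 4122 variant */
    int n = snprintf(out, outlen,
        "%lu-%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
        "%02x%02x%02x%02x%02x%02x.%s",
        user_id, b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
        b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], ext);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Nothing from the user's original file name reaches the result, which is what protects against names like "../../anything.php".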
Also, you should store the files in a safe location, which external users, such as visitors of your website, cannot access. Instead, you deliver the file to them through a proxy web page (you read the file from the safe location, and pass the data on to the user). For this solution, a database is needed to keep track of files, their location, etc.
This also makes you able to control which users have access to which files through your code.
Update: Here's a description of how the solution with the proxy web page could be implemented.
Create a Web Form with the name GetFile.aspx
GetFile.aspx takes one query parameter named fileid, which is used to identify the file to get. E.g.: http://www.mypage.com/GetFile.aspx?fileid=100
Use the fileid parameter to lookup the file location in the database, so that it can be read and sent to the user. In the Web Form you use Request.QueryString("fileid") to get the file ID and use it in a query that will look something like this (SQL): SELECT FileLocation FROM UserFiles WHERE FileID = 100
Read the file using a System.IO.FileStream and send its contents with Response.BinaryWrite (Response.Write is meant for text and would mangle binary data). Remember to set the appropriate content type with Response.ContentType first, so that the client browser handles the requested file correctly (see this post on asp.forums.net and the MSDN article it refers to, which both discuss a method of determining the appropriate content type automatically).
If you choose this approach, it's easy to implement your own simple security or custom actions later on, such as making sure a user is logged into your web site before you send the file, or that users can only access files they uploaded themselves, or logging which users download which files, etc. The possibilities are endless ;-)

Take a look at the System.IO.Path class as it has lots of useful functions you can utilise, such as:
Get the characters that are invalid in a file name (GetInvalidPathChars() does the same for full paths):
System.IO.Path.GetInvalidFileNameChars();
Get a random file name:
System.IO.Path.GetRandomFileName();
Create a uniquely named, zero-byte file in the temporary directory and get its full path:
System.IO.Path.GetTempFileName();

I would go with option #3. A table mapping these files with users will provide other uses down the road, it always does. If you use the mapping, the only advantage of appending the user name or id to the file is if you are trying to debug a problem.
I'd probably use a GUID instead of a random number but either would work. The important things in my opinion are
Never use the username as any part of the stored file name
Never use the original file name as any part of the stored file name
Use a random number or GUID to ensure there are no duplicate file names
Adding a user id to the file name will help with manual debugging

There is more to this than meets the eye, which I suspect you already knew!
What sort of files are you talking about? If they are anything even remotely big or in such quantity that the group of files could be big I would immediately suggest that you add some flexibility to your approach.
create a table that stores the root paths to various file stores (these could be drives, UNC paths, whatever your environment supports). It will initially have one entry, which will be your first storage location. A useful attribute to maintain with this data is how much can be stored there.
maintain a table of file-related data (id {guid}, create date, foreign key to the path data, file size)
write the file to a root that still has room on it (sum the sizes of all files stored in a root location and compare that to the root's capacity)
write the file using a GUID for the name (this obfuscates the file on the file system); it can be written without the file extension if security requires it (sensitive files)
write the file according to its create date, starting from the root: root/year{number}/month{number}/day{number}/file.extension
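Building the date-based part of the path is straightforward. Here is a minimal C sketch, assuming UTC dates and a plain YYYY/MM/DD layout (the answer's year{number}/month{number}/day{number} naming may mean something slightly different); build_store_path is a hypothetical name, and the name argument would be the GUID-based name recorded in the database.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Build "root/YYYY/MM/DD/name" from the file's create time (UTC), so
 * that no single directory accumulates too many files.
 * Returns 0 on success, -1 on error or if out is too small. */
int build_store_path(char *out, size_t outlen, const char *root,
                     time_t created, const char *name)
{
    struct tm tm;
    char date[16];
    if (gmtime_r(&created, &tm) == NULL)
        return -1;
    if (strftime(date, sizeof date, "%Y/%m/%d", &tm) == 0)
        return -1;
    int n = snprintf(out, outlen, "%s/%s/%s", root, date, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For example, a file created at Unix time 1234567890 under root "/data/files" ends up in "/data/files/2009/02/13/". The directories themselves would still need to be created (for example with mkdir(2)) before the file is written.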
With a system of this nature in place, even though you won't/don't need it up front, you can more easily relocate the files, better manage the files, better manage collections of files, etc. I have used this system before and found it to be quite flexible. Dealing with files that are stored on a file system but managed from a database can get a bit out of control once the file store becomes large and things need to be moved around. Also, at least in the case of Windows, storing zillions of files in one directory is usually not a good idea (hence breaking things up by create date).
This complexity is only really needed when you have high volumes and large footprints.
