The POSIX shm_open() function returns a file descriptor that can be used to access shared memory. This is extremely convenient because one can use all the traditional mechanisms for controlling file descriptors to also control shared memory.
The only drawback is that shm_open() always wants a filename. So I need to do this:
// Open with a clever temp file name and hope for the best.
fd = shm_open(tempfilename, O_RDWR | O_CREAT | O_EXCL, 0600);
// Immediately delete the temp file to keep the shm namespace clean.
shm_unlink(tempfilename);
// Then keep using fd -- the shm object remains as long as there are open fds.
This use of tempfilename is difficult to do portably and reliably. The interpretation of the filename (what the namespace is, how permissions are handled) differs among systems.
In many situations the processes using the shared memory object have no need for a filename because the object can be accessed more simply and safely by just passing a file descriptor from one process to another. So is there something that's just like shm_open() but can be used without touching the shared memory filename namespace?
mmap() with MAP_ANON|MAP_SHARED is great but instead of a file descriptor it gives a pointer. The pointer doesn't survive over an exec boundary and can't be sent to another process over a Unix domain socket like file descriptors can.
The file descriptor returned by shm_open() also doesn't survive an exec boundary by default: the POSIX definition says that the FD_CLOEXEC file descriptor flag associated with the new file descriptor is set. But it is possible to clear the flag using fcntl() on MacOS, Linux, FreeBSD, OpenBSD, NetBSD, DragonFlyBSD and possibly other operating systems.
A library to solve the problem
I managed to write a library that provides the simple interface:
int shm_open_anon(void);
The library compiles without warnings and successfully runs a test program on Linux, Solaris, MacOS, FreeBSD, OpenBSD, NetBSD, DragonFlyBSD and Haiku. You may be able to adapt it to other operating systems; please send a pull request if you do.
The library returns a file descriptor with the close-on-exec flag set. You can clear that flag using fcntl() on all supported operating systems, which will allow you to pass the fd over exec(). The test program demonstrates that this works.
Implementation techniques used in the library
The readme of the library has very precise notes on what was done and what wasn't done for each OS. Here's a summary of the main stuff.
There are several non-portable things that are more or less equivalent to shm_open() without a filename:
FreeBSD can take SHM_ANON as the pathname for shm_open() since 2008.
Linux has a memfd_create() system call since kernel version 3.17.
Earlier versions of Linux can use mkostemp(name, O_CLOEXEC | O_TMPFILE) where name is something like /dev/shm/XXXXXX. Note that we are not using shm_open() at all here -- mkostemp() is implicitly using a perfectly ordinary open() call. Linux mounts a special memory-backed file system in /dev/shm but some distros use /run/shm instead so there are pitfalls here. And you still have to shm_unlink() the temp file.
OpenBSD has a shm_mkstemp() call since release 5.4. You still have to shm_unlink() the temp file but at least it is easy to create safely.
For other OSes, I did the following:
Figure out an OS-dependent format for the name argument of POSIX shm_open(). Please note that there is no name you can pass that is absolutely portable. For example, NetBSD and DragonFlyBSD have conflicting demands about slashes in the name. This applies even if your goal is to use a named shm object (for which the POSIX API was designed) instead of an anonymous one (as we are doing here).
Append some random letters and numbers to the name (by reading from /dev/random). This is basically what mktemp() does, except we don't check whether our random name exists in the file system. The interpretation of the name argument varies wildly so there's no reasonable way to portably map it to an actual filename. Also Solaris doesn't always provide mktemp(). For all practical purposes, the randomness we put in will ensure a unique name for the fraction of a second that we need it.
Open the shm object with that name via shm_open(name, O_RDWR | O_CREAT | O_EXCL | O_NOFOLLOW, 0600). In the astronomical chance that our random filename already exists, O_EXCL will cause this call to fail anyway, so no harm done. The 0600 permissions (owner read-write) are necessary on some systems instead of blank 0 permissions.
Immediately call shm_unlink() to get rid of the random name. The file descriptor remains for our use.
This technique is not quaranteed to work by POSIX, but:
The shm_open() name argument is underspecified by POSIX so nothing else is guaranteed to work either.
I'll let the above compatibility list speak for itself.
Enjoy.
No, there isn't. Since both System V shared memory model and POSIX shared file mapping for IPC require operations with a file, there is always need for a file in order to do mapping.
mmap() with MAP_ANON|MAP_SHARED is great but instead of a file
descriptor it gives a pointer. The pointer doesn't survive over an
exec boundary and can't be sent to another process over a Unix domain
socket like file descriptors can.
As John Bollinger says,
Neither memory mappings created via mmap() nor POSIX shared-memory
segments obtained via shm_open() nor System V shared-memory segments
obtained via shmat() are preserved across an exec.
There must be a well-known place on the memory to meet and exchange information. That's why a file is the requirement. By doing this, after exec, the child is able to reconnect to the appropriate shared memory.
This use of tempfilename is difficult to do portably and reliably. The interpretation of the filename (what the namespace is, how permissions are handled) differs among systems.
You can have mkstemp create a unique filename in /dev/shm/ or /tmp and open the file for you. You can then unlink the filename, so that no other process can open this file, apart from the process that have the file descriptor returned from mkstemp.
mkstemp(): CONFORMING TO 4.3BSD, POSIX.1-2001.
Why not creating it with access rights to 0?
Thus no process would be able to "open" it and let you unlink it safely just after?
Related
I am working on an embedded Linux project that can run on multiple
platforms. One uses e.MMC for storage and another NAND flash. I want
to encrypt all the filesystems (mainly to protect against someone
unsoldering the flash chips and putting them in a reader). I want to
try and maintain a common approach across both hardware types as far
as possible. One big difference between the two is the wear levelling
is in the hardware for the e,MMC whereas for NAND I'll be using UBI.
For the root filesystem I am thinking of using squashfs which is
protected using dm-crypt. For the NAND device I have tried this out
and I can layer dm-crypt on top of ubiblock then use the device mapper
to load the squashfs. This maps nicely to the e.MMC world with the
only difference being that the device mapper is on a gpt partition
rather than a ubiblock device.
My challenge is for other read/ write filesystems. I want to mount an
overlay filesystem on top of the read-only root and a data partition.
I want both of these to also be encrypted. I have been investigating
how fscrypt can help me. (I believe dm-crypt won't work with ubifs).
For filesystems on the e.MMC I will be using ext4 and for NAND
ubifs. The documentation says both of these support fscrypt. I've struggled
a bit to find detailed documentation about how to use this with ubifs
(there is a lot more for the ext4) but I think that there are some
differences between how this has been implemented on each and I'd like
those who know more to confirm this.
On the NAND side I have only been able to get it to work by using the
fscryptctl tool (https://github.com/google/fscryptctl
) as opposed to the fuller featured fscrypt tool
(https://github.com/google/fscrypt). This was following instructions I
found in a patch to add fscrypt support to mkfs.ubifs here:
https://patchwork.ozlabs.org/project/linux-mtd/cover/20181018143718.26298-1-richard#nod.at/
This appears to encrypt all the files on the partition using the
supplied key. When I look at fscrypt on ext4 it seems here that you
can't do this. The root directory cannot itself be encrypted, only
sub-directories. Reading here:
https://www.kernel.org/doc/html/v4.17/filesystems/fscrypt.html it
says:
"Note that the ext4 filesystem does not allow the root directory to be
encrypted, even if it is empty. Users who want to encrypt an entire
filesystem with one key should consider using dm-crypt instead."
So this is different. It also seems that with ubifs I can't apply
encryption to the subdirectories like I could in ext4. The README.md
here https://github.com/google/fscryptctl gives an example using ext4.
This encrypts a subdirectory called test. I don't see how to do the
same thing using ubifs. Could someone help me?
I've been using the NANDSIM kernel module for testing. At the end of
this post is a script for building an encrypted overlay ubifs
filesystem. As you can see the mkfs.ubifs is taking the key directly
and it appears to apply it to all the files on the partition. You
can't then apply policies to any sub-directories as they are already
encrypted.
I would like to use some of the other features that the userspace
fscrypt tool provides e.g. protectors (so I don't need to use the
master key directly). I can't however see any way to get the userspace fscrypt
tool to setup encryption on a ubifs. The userspace fscrypt command
creates a .fscrypt directory in the root of the
partition to store information about policies and protectors. This
seems to fit more with the ext4 implementation where the root itself isn't encrypted.
When I try to set-up an unencrypted ubifs with "fscrypt setup" I run
into trouble as making a standard ubifs seems to create a v4 ubifs format
version rather than the required v5. This means the "fscrypt encrypt"
command fails. (Errors like this are seen in the dmesg output
[12022.576268] UBIFS error (ubi0:7 pid 6006): ubifs_enable_encryption
[ubifs]: on-flash format version 5 is needed for encryption).
Is there some way to get mkfs.ubifs to create an unencrypted v5 formatted
filesystem? Or does v5 mean encrypted?
Here is my script to create an encrypted ubifs using the fscryptctl tool:
#!/bin/bash
MTD_UTILS_ROOT=../../mtd-utils
FSCRYPTCTL=../../fscryptctl/fscryptctl
MOUNTPOINT=./mnt
dd if=/dev/urandom of=overlay.keyfile count=64 bs=1 # XTS needs a 512bit key
descriptor=`$FSCRYPTCTL get_descriptor < overlay.keyfile`
$MTD_UTILS_ROOT/mkfs.ubifs --cipher AES-256-XTS --key overlay.keyfile
-m 2048 -e 129024 -c 32 -r ./overlay -o overlay.enc.img
$MTD_UTILS_ROOT/ubiupdatevol /dev/ubi0_6 overlay.enc.img
# Try it out
$FSCRYPTCTL insert_key < overlay.keyfile
key=`keyctl show | grep $descriptor | awk '{print $1}'`
mount -t ubifs /dev/ubi0_6 $MOUNTPOINT
ls $MOUNTPOINT
umount $MOUNTPOINT
keyctl unlink $key
NB I've been working with mtd-utils v2.1.2 on a 5.4 kernel.
I am trying to encrypt/decrypt the data using javax.crypto.Ciper where I have given transformation as AES/ECB/PKCS5Padding.
My problem is when I run the code in Local machine, encryption / decryption works fine, however when I run the same code on Server, system throws Exception during Cipher.init("AES/ECB/PKCS5Padding").
On doing detailed analysis and checking the code inside Cipher.java, I found the problem is inside the following method Cipher-initCryptoPermission() when system checks for JceSecurity.isRestricted().
In my local machine, JceSecurity.isRestricted() returns FALSE, however when it runs on Server, the same method returns TRUE. Due to this on server, the system does not assigns right permissions to the Cipher.
Not sure, where exactly JceSecurity restriction is set. Appreciate your help.
On doing deep-diving I found the real problem and solution.
Under Java_home/jre/lib/security there are two jar files, local_policy.jar and US_export_policy.jar. Inside local_policy.jar, there is a file called default_local.policy, which actually stores all the permissions of the cryptography.
In my local machine the file had AllPermission, hence there were no Restriction in JceSecurity for me and was allowing me to use AES encryption algorithm, but on the server it is having limited version as provided by Java bundle.
Replacing the local_policy.jar with no restrictions (or unlimited permissions) did the trick.
On reading more about it on Internet found that Java provides the restricted version with the download package as some countries have restrictions on using the type of cyptography algorithms, hence you must check with your organisation before replacing the jar files.
Jar files with no restrictions can be found on Oracle (Java) site at following location.Download link
In my app I ask users to register using a unique name. The app creates a directory for them with that name that they then can work with, saving files, etc.
I hadn't really thought about screening for other than alpha-numeric for the name. However, I ran across a thread somewhere than said to make sure not to create directory names that match a unix command name.
Is this a legitimate risk? If so, how might one programmatically screen for such an occurrence? I'm also curious how such a scenario might play out to illustrate the problem (exploit?). That last part is academic interest only, of course.
Generally, it doesn't matter(has no obvious security risk). Most softwares, for example shell, search a unix command based on some enviroment variables(like PATH). So even if your created directory matches a unix command like "cd", it can only be used as a parameter to other unix command, like cd cd.
However, if another application search the unix command based on other approaches like searching some directories, it may lead to security breaches.
The only way I can think of that being a risk is if you're going to turn around and process those user names through command-line functions. You would want to be careful to escape the user names anywhere that they could be interpreted as a command...though off the top of my head, with strictly alphanumeric user names, you'd have to go to a lot of trouble to run into such a risk.
If you decided anyway that you wanted to ensure that the username didn't match an application on the path of the creating process, you could shell out from whatever your app environment is, and evaluate the result of which $prospectiveUsername. If it returns anything other than an empty string, you know that the username is an application on the process's path.
NOTE: In the above scenario, make sure you sanitize the username before calling out to the shell command. Otherwise, you do run security risks, if e.g. the user decides to enter her username as "janedoe; rm -rf /".
I read a Powerpoint slide recently, and can't get a clear concept about the usage of setuid.
Here is the content of the slide.
Setuid example - printing a file
Goals
Every user can queue files
Users cannot delete other users' files
Solution
Queue directory owned by user printer
Setuid queue-file program
Create queue file as user printer
Copy joe's data as user joe
Also, setuid remove-file program
Allows removal only of files you queued
User printer mediated user joe's queue access.
Here are my questions.
I don't have a clear ​understanding​ how to solve this problem by "setuid".
First, if the file is created by user printer, how does user joe copy data to a file owned by printer. Does it have some certain right to this file.
Secondly, how to identify the relationship between you and the file you queued.
Does the file's state show something, but the file is owned by printer, which can only be changed by root. Does it have something to do with the file's gid? If so, other users' gid may also be related to this file.
Does the file's context show something about who queued it?
My questions are a mess, and I really don't have a clear concept about the solution.
Let's assume that the files are to be stored in /var/spool/pq directory; the permissions on the directory printer:printgrp:2700 (owner printer, group printgrp, mode 2700 - read, write, execute for owner only, and the SGID bit set on the directory means all files created in it will belong to group printgrp).
Further, we assume that the print queuing program, pq, has permissions printer:printgrp:4511 (setuid printer, anyone can execute it, but only printer or root can look at it).
Now, suppose Joe runs pq /home/joe/file, and the permissions on the file are joe:staff:600 (only Joe can read or write to the file). Joe's umask is set to 022 (though this file has more restrictive permissions than are implied by the umask).
When the program runs, the real UID of the process is joe, but the effective UID is printer. Since there is no need for setgid operation here, both the real and effective GID are staff. Joe's auxilliary groups list does not include printgrp.
Note that it is the effective UID and GID of a process that control its access to files. On his own, Joe cannot see what jobs are in the printer queue; the directory permissions shown don't even allow him to list the files, or access the files in the directory. Note that the converse applies; on its own, the printer user (or the pq program running as that user) cannot access Joe's file.
The pq program can create files in the printer queue directory; it will probably create two files for this request. One will be a control file, the other will be a copy of the file to be printed. It will determine a job number by some mechanism (say 12345) and it might create and open two files for writing (with restrictive permissions - 0600 or even 0400):
/var/spool/pq/c.12345 - the control file
/var/spool/pq/d.12345.1 - the first (only) data file
It should then reset its effective UID to the real UID, so it is running as Joe. It can then open the file Joe asked to be printed, and copy it to the data file. It can also write whatever information it deems relevant to the control file (date, time, who submitted the request, the number of files to be printed, special printing options, etc). When it closes those files, Joe is no longer able to access them. But the program was able to copy Joe's file to its print queue.
So, that addresses questions 1 (permissions) and 2 (control information), and also 4 (control information again).
Regarding question 3, root is always a wild card on Unix-like systems because can do anything that he wants. However, as suggested by the rest of the answer, the control file contains information about the print request, including (in particular) who submitted it. You can use setgid programs instead of setuid programs; these work in analogous ways. However, under the system I postulated, the group permissions essentially didn't come into the picture. The pq program set the permissions on the control file and data file such that the group can't read it, and the directory permissions also deny group access.
We can postulate two more programs:
pqs - printer queue status
pqr - printer queue remove
These programs would also be setuid printer. The pqs program can read the control files in the directory and list pertinent information from them. The pqr program can read the control files to ensure that when Joe submitted job 12345 when he requests the removal of job 12345. If the program is satisfied, then it can delete the files.
Separately from these user-invoked programs, there would also be a daemon program (conventionally named pqd in this system) that would be kicked into action by pq if it is not already running. It would be responsible for reading the control files in the directory, and using that information to actually print the data files to the relevant printer. The details of how different printers and different data formats are handled are a problem for the daemon to deal with. The daemon too will run with printer privileges, and printer will have been given access to the printer devices (for locally attached printers) or configured to communicate over the network with a protocol such as IPP (Internet Printer Protocol). Joe probably won't be able to use the printer devices directly.
Note that the setuid programs have powers that Joe does not have. They must be written carefully to ensure that Joe cannot abuse those extra powers. Any setuid program is somewhat dangerous; a setuid root program can be lethal. In general, setgid programs are less dangerous. However, for both types of program, great care is required in the writing of such programs.
What alternatives to the make command are able to detect file changes on other criteria than timestamp?
So far I have only found Rant ( http://rant.rubyforge.org/ ) which is able to detect file changes based on MD5 checksums instead of file modification times.
Are there others?
Ideally I would be able to specify an external command that is used to detect file changes (in that case I would be able to compare local file checksums to checksums of remote files on S3 for example).
I suppose you are looking something like makepp signature methods. Options for makepp are:
exact_match
target_newer
md5
c_compilation_md5
shared_object
xml
default (= target_newer)
They are all described here