What alternatives to the make command are able to detect file changes on criteria other than timestamps?

So far I have only found Rant (http://rant.rubyforge.org/), which can detect file changes based on MD5 checksums instead of file modification times.
Are there others?
Ideally I would be able to specify an external command that is used to detect file changes (that way I could, for example, compare local file checksums to the checksums of remote files on S3).

I suppose you are looking for something like makepp's signature methods. The options for makepp are:
exact_match
target_newer
md5
c_compilation_md5
shared_object
xml
default (= target_newer)
They are all described in the makepp documentation.
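The core of any checksum-based up-to-date check (the idea behind Rant's MD5 mode and makepp's md5 signature) is: hash the file contents, compare with the hash recorded on the previous run, and rebuild only when they differ. Below is a minimal C sketch of that idea. It swaps in 64-bit FNV-1a for MD5 because standard C has no MD5 function; FNV-1a is fine for spotting edits but is not collision-resistant. The `.sig` sidecar file and both function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hash a file's contents with 64-bit FNV-1a (a stand-in for MD5).
   Returns 0 on success, -1 if the file cannot be opened. */
static int file_signature(const char *path, uint64_t *out)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    uint64_t h = 14695981039346656037ULL;      /* FNV offset basis */
    int c;
    while ((c = fgetc(f)) != EOF) {
        h ^= (uint64_t)(unsigned char)c;
        h *= 1099511628211ULL;                 /* FNV prime */
    }
    fclose(f);
    *out = h;
    return 0;
}

/* Hypothetical helper: returns 1 if the contents of `path` differ from the
   signature stored in `sigfile` (or none was stored yet), 0 if unchanged,
   -1 on error. A changed signature is written back to `sigfile`. */
static int file_changed(const char *path, const char *sigfile)
{
    uint64_t now, before = 0;
    int have_before = 0;
    if (file_signature(path, &now) != 0) return -1;

    FILE *s = fopen(sigfile, "r");
    if (s) {
        unsigned long long tmp;
        if (fscanf(s, "%llx", &tmp) == 1) { before = tmp; have_before = 1; }
        fclose(s);
    }
    if (have_before && before == now) return 0;   /* same contents: skip rebuild */

    s = fopen(sigfile, "w");
    if (!s) return -1;
    fprintf(s, "%llx\n", (unsigned long long)now);
    fclose(s);
    return 1;
}
```

Touching a file without changing its bytes leaves `file_changed()` returning 0, which is exactly the difference from a timestamp check. Replacing `file_signature()` with a call to an external command (for example one that fetches a remote checksum) would give the pluggable behaviour the question asks for.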


How to get filtered list of files from SFTP server using SSHJ [duplicate]

I am using SSHJ SFTP library to get file list from SFTP-server.
The connection to the server is very slow and there are tens of thousands of files in the directory. Getting the file list often ends in various timeout/socket errors.
Is it possible to tell the client to retrieve only, e.g., ".zip" files, so that performance improves? Pseudo command: sftpClient.ls("*.zip")
I know there is a method List<RemoteResourceInfo> net.schmizz.sshj.sftp.SFTPClient.ls(String path, RemoteResourceFilter filter) which will filter the list, but from what I understand, the filtering would happen only on the client side, i.e. the client would still receive the whole file list and only then filter it?
Is there any way to make the server return only the requested names? Does the SFTP protocol even support this?
Indeed, the SFTP protocol has no way to return a list of files matching any criteria. It does not matter which SFTP library you use.
You would have to use another interface/API if you need the filtered list. If you have shell access, you might use the shell command ls *.zip.
Or build your own (REST?) API.

Is there anything like shm_open() without filename?

The POSIX shm_open() function returns a file descriptor that can be used to access shared memory. This is extremely convenient because one can use all the traditional mechanisms for controlling file descriptors to also control shared memory.
The only drawback is that shm_open() always wants a filename. So I need to do this:
// Open with a clever temp file name and hope for the best.
fd = shm_open(tempfilename, O_RDWR | O_CREAT | O_EXCL, 0600);
// Immediately delete the temp file to keep the shm namespace clean.
shm_unlink(tempfilename);
// Then keep using fd -- the shm object remains as long as there are open fds.
This use of tempfilename is difficult to do portably and reliably. The interpretation of the filename (what the namespace is, how permissions are handled) differs among systems.
In many situations the processes using the shared memory object have no need for a filename because the object can be accessed more simply and safely by just passing a file descriptor from one process to another. So is there something that's just like shm_open() but can be used without touching the shared memory filename namespace?
mmap() with MAP_ANON|MAP_SHARED is great but instead of a file descriptor it gives a pointer. The pointer doesn't survive over an exec boundary and can't be sent to another process over a Unix domain socket like file descriptors can.
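For comparison, passing a descriptor over a Unix domain socket looks roughly like this: the fd travels as an SCM_RIGHTS control message attached to sendmsg(), and the receiver gets a new descriptor referring to the same open file. A minimal sketch with hypothetical helper names; it sends one byte of ordinary data alongside, as is customary, since ancillary data generally needs a non-empty payload to be delivered.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor over a connected Unix domain socket.
   Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                              /* the one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;                   /* "this message carries fds" */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) <= 0) return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (!cm || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```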
The file descriptor returned by shm_open() also doesn't survive an exec boundary by default: the POSIX definition says that the FD_CLOEXEC file descriptor flag associated with the new file descriptor is set. But it is possible to clear the flag using fcntl() on MacOS, Linux, FreeBSD, OpenBSD, NetBSD, DragonFlyBSD and possibly other operating systems.
A library to solve the problem
I managed to write a library that provides the simple interface:
int shm_open_anon(void);
The library compiles without warnings and successfully runs a test program on Linux, Solaris, MacOS, FreeBSD, OpenBSD, NetBSD, DragonFlyBSD and Haiku. You may be able to adapt it to other operating systems; please send a pull request if you do.
The library returns a file descriptor with the close-on-exec flag set. You can clear that flag using fcntl() on all supported operating systems, which will allow you to pass the fd over exec(). The test program demonstrates that this works.
Implementation techniques used in the library
The library's README has very precise notes on what was and wasn't done for each OS. Here's a summary of the main points.
There are several non-portable things that are more or less equivalent to shm_open() without a filename:
FreeBSD has accepted SHM_ANON as the pathname for shm_open() since 2008.
Linux has had a memfd_create() system call since kernel version 3.17.
Earlier versions of Linux can use mkostemp(name, O_CLOEXEC | O_TMPFILE) where name is something like /dev/shm/XXXXXX. Note that we are not using shm_open() at all here -- mkostemp() is implicitly using a perfectly ordinary open() call. Linux mounts a special memory-backed file system in /dev/shm but some distros use /run/shm instead so there are pitfalls here. And you still have to shm_unlink() the temp file.
OpenBSD has had a shm_mkstemp() call since release 5.4. You still have to shm_unlink() the temp file, but at least it is easy to create safely.
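Of these, the Linux memfd_create() route needs no temporary name at all. A minimal sketch, assuming glibc 2.27 or later (which provides the memfd_create() wrapper); the helper name is hypothetical, and the string passed to memfd_create() is only a debugging label that shows up in /proc/<pid>/fd, not an entry in any namespace.

```c
#define _GNU_SOURCE            /* memfd_create() is a GNU/Linux extension */
#include <sys/mman.h>
#include <unistd.h>

/* Linux-only sketch: an anonymous shared-memory fd of `size` bytes,
   created with close-on-exec set. Returns the fd or -1. */
static int anon_shm_linux(off_t size)
{
    int fd = memfd_create("anon-shm", MFD_CLOEXEC);   /* label only, no file name */
    if (fd == -1) return -1;
    if (ftruncate(fd, size) == -1) {                  /* give the object a size */
        close(fd);
        return -1;
    }
    return fd;
}
```

The resulting fd can be mmap()ed with MAP_SHARED, passed over a Unix domain socket, and kept across exec() once FD_CLOEXEC is cleared with fcntl().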
For other OSes, I did the following:
Figure out an OS-dependent format for the name argument of POSIX shm_open(). Please note that there is no name you can pass that is absolutely portable. For example, NetBSD and DragonFlyBSD have conflicting demands about slashes in the name. This applies even if your goal is to use a named shm object (for which the POSIX API was designed) instead of an anonymous one (as we are doing here).
Append some random letters and numbers to the name (by reading from /dev/random). This is basically what mktemp() does, except we don't check whether our random name exists in the file system. The interpretation of the name argument varies wildly so there's no reasonable way to portably map it to an actual filename. Also Solaris doesn't always provide mktemp(). For all practical purposes, the randomness we put in will ensure a unique name for the fraction of a second that we need it.
Open the shm object with that name via shm_open(name, O_RDWR | O_CREAT | O_EXCL | O_NOFOLLOW, 0600). In the astronomically unlikely case that our random name already exists, O_EXCL will cause this call to fail anyway, so no harm is done. The 0600 permissions (owner read-write) are necessary on some systems instead of blank 0 permissions.
Immediately call shm_unlink() to get rid of the random name. The file descriptor remains for our use.
This technique is not guaranteed to work by POSIX, but:
The shm_open() name argument is underspecified by POSIX so nothing else is guaranteed to work either.
I'll let the above compatibility list speak for itself.
Enjoy.
No, there isn't. Since both the System V shared memory model and POSIX shared file mapping for IPC require operations on a file, a file is always needed in order to do the mapping.
mmap() with MAP_ANON|MAP_SHARED is great but instead of a file descriptor it gives a pointer. The pointer doesn't survive over an exec boundary and can't be sent to another process over a Unix domain socket like file descriptors can.
As John Bollinger says,
Neither memory mappings created via mmap() nor POSIX shared-memory segments obtained via shm_open() nor System V shared-memory segments obtained via shmat() are preserved across an exec.
There must be a well-known place in memory to meet and exchange information. That's why a file is required: after exec, the child can use it to reconnect to the appropriate shared memory.
This use of tempfilename is difficult to do portably and reliably. The interpretation of the filename (what the namespace is, how permissions are handled) differs among systems.
You can have mkstemp create a unique filename in /dev/shm/ or /tmp and open the file for you. You can then unlink the filename so that no other process can open the file; only processes that hold the file descriptor returned by mkstemp can access it.
mkstemp(): CONFORMING TO 4.3BSD, POSIX.1-2001.
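A minimal sketch of this mkstemp() approach; the helper name is hypothetical. On Linux the template can point into /dev/shm so that the file is memory-backed. The caller would then size it with ftruncate() and map it with mmap(..., MAP_SHARED, fd, 0).

```c
#include <stdlib.h>
#include <unistd.h>

/* Create a unique file from `tmpl` (which must end in "XXXXXX" and is
   rewritten in place by mkstemp), then unlink the name immediately.
   Returns the fd or -1. */
static int open_unlinked_tmp(char *tmpl)
{
    int fd = mkstemp(tmpl);          /* created exclusively, mode 0600 per POSIX.1-2008 */
    if (fd == -1) return -1;
    if (unlink(tmpl) == -1) {        /* from here on only fd holders can reach the file */
        close(fd);
        return -1;
    }
    return fd;
}
```

Usage would be along the lines of `char tmpl[] = "/dev/shm/anon-XXXXXX"; int fd = open_unlinked_tmp(tmpl);`.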
Why not create it with access rights of 0?
That way no process would be able to "open" it, and you could safely unlink it right afterwards.

What would be the best practice for downloading all the files from a directory using SFTP?

I would like to implement the following functionality:
downloading all the files from a specified remote directory to a local directory.
after downloading all the files I need a list file which contains all the downloaded files.
(I only want this list file when all the files were downloaded successfully.)
Point 1:
Let's say we have around 10 files in the remote directory.
I can use an int-sftp:inbound-channel-adapter component to download all the files but 10 poll cycles are needed to download all of them since the inbound component is only able to download 1 file per poll request.
Spring Integration creates 10 File messages one by one.
Questions:
How can I identify the last file (message) received from the FTP server?
I don't want to let users access the list file until all the files from the FTP server have been successfully received.
How can I achieve this?
I can write file names into a list file using the int-file:outbound-channel-adapter, but users could read temporary information from that file before the download process is finished.
How can I trigger the event that all files which are on the FTP are downloaded?
First of all this isn't correct:
the inbound component is only able to download 1 file per poll request
You can configure it to download any number of files during a single poll with max-messages-per-poll=-1, which is the default option on <poller> anyway.
That said, if downloading one file per poll suits your case, you can go ahead with that requirement.
Since any messaging system aims for a stateless paradigm, it is normal that one message knows nothing about another, so messages don't affect each other. Asynchronous processing is where messaging works best: the second message may even be processed faster than the first one.
Your requirement is interesting enough, and I wouldn't dare call it strange: any business requirement can be legitimate.
Since you are going to process several downloaded files as one group, you will need some marker on the remote server. It could be a time window, which can be extracted from the file timestamps. Or you could store a marker file on the remote server indicating that a set of files is complete, so that your application can process them using their local copies. It would be great if that marker file contained the list of file names in that group.
Otherwise we don't have any hook for grouping the messages for those files.
On the other hand, you could consider using <int-sftp:outbound-gateway> with the MGET command: http://docs.spring.io/spring-integration/docs/latest-ga/reference/html/sftp.html#sftp-outbound-gateway

Prevent easy downloading of mp3s that play on mp3 players?

I have mp3 players set up on my site to play mp3s. At the moment, users can easily look through the source, run a search for "mp3" and download all of the music on my site. I know it's virtually impossible to completely prevent a determined user from downloading the music but I want to make it harder for the average user. Is there any way I can obfuscate the links to the mp3s?
Relevant site: http://boyceavenue.com/music
You did not specify the language you are using. To expand upon what Marc B wrote, I would recommend using PHP's http_send_file() function (from the pecl_http extension) along with the checksum of the file.
To send the file, use the following:
$filename = "/absolute/or/relative/path/to/file.ext";
$mime_type = "audio/mpeg"; // See note below
http_send_content_disposition($filename, true);
http_send_content_type($mime_type);
http_throttle(0.1, 2048);
http_send_file($filename);
If you are serving up multiple types of files using PHP 5.3.0 or later, you could determine the MIME type this way:
$filename = "/absolute/or/relative/path/to/file.ext";
$finfo = finfo_open(FILEINFO_MIME_TYPE);
$mime_type = finfo_file($finfo, $filename);
finfo_close($finfo);
Calculating the checksum is simple enough. Just use md5_file.
So, what can you do with the checksum?
Well, you could create an array of checksums and filenames that cross-reference each other. That is, you include the checksum in the links to the file, but have a little routine that looks up the checksum and delivers the mp3 file. You could also do this in a database, or do what some apps do and store files in a directory structure based on their checksums (music/3/3a/song.mp3 with a checksum of 3a62f6 or whatever).
If you don't care about the filenames being mangled, you could save the files with a checksum for the filename. That could be done at upload time (if your files are being uploaded) or through a batch script (using the CLI).
Another thing you should do is put a default document (index.php or whatever) in the directory that tells people to look elsewhere. Also disable directory browsing. If only a very small number of people need access, you could also put a password on the directory, thus requiring a login to access the files.

Connect R to POP Email Server (Gmail)

Is it possible to have R connect to Gmail's POP server and read/download the messages in a specific folder of mine? I have been storing emails and would like to go back and start analyzing subject lines, etc.
Basically, I need a way to export a folder in my Gmail account, and I would like to do this programmatically if at all possible.
I am not sure this can be done with a single command. There may be a package out there that can do it which I am not aware of, but until you come across one, the following process might be a solution:
Consider got-your-back (http://code.google.com/p/got-your-back/wiki/GettingStarted#Step_4%3a_Performing_A_Backup) which "is a command line tool that backs up and restores your Gmail account".
You can invoke it like this (given that python is available on your machine):
python gyb.py --email foo#bar.com --search "from:pip#pop.com" --folder "mail_from_pip"
After completion you'll find all the emails matching the --search in the specified --folder, along with a sqlite database. (posted by dukedave, Dec 4 '11)
So depending on your OS you should be able to invoke the above command from within R and then access the downloaded mails in the respective folder.
GotYourBack is a good backup utility, but for downloading metadata for analysis, you might want something that doesn't first require you to fetch the entire content of all your email.
I've recently used the gmailr package to do a similar analysis.
