Using printf() in OpenCL kernel - opencl

While processing a vector with 1,000,000 elements I tried printing the global ID every 10,000 iterations to monitor progress in development by adding these lines to the kernel:
"#pragma OPENCL EXTENSION cl_amd_printf : enable \n" \
and
" if(id % 10000 == 0){ \n" \
" printf(\"%d\\r\\n\", id); \n" \
" } \n" \
That bloated the normal 3.0-3.3 second execution into 38-40 seconds.
As I could not find any mention of performance in section A.8.10 of the AMD OpenCL 3.0 SDK, it is not immediately clear whether this behavior is normal.
Is this performance hit normal and expected, or am I doing something wrong?
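For context, here is roughly how those quoted string fragments fit together as plain OpenCL C; the kernel name, arguments, and per-element work are placeholders I have assumed, not taken from the original host code:
#pragma OPENCL EXTENSION cl_amd_printf : enable

__kernel void process(__global const float *in, __global float *out)
{
    int id = get_global_id(0);
    if (id % 10000 == 0) {
        printf("%d\r\n", id);   // progress marker every 10,000 work-items
    }
    out[id] = in[id];           // placeholder for the real per-element work
}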

Related

Is `ls -f | grep -c .` the fastest way to count files in directory, when using POSIX / Unix system (Big Data)?

I used to do ls path-to-whatever | wc -l, until I discovered that it actually consumes a huge amount of memory. Then I moved to find path-to-whatever -name "*" | wc -l, which seems to consume a much more graceful amount of memory, regardless of how many files you have.
Then I learned that ls is mostly slow and less memory efficient due to sorting the results. By using ls -f | grep -c ., one gets very fast results; the only problem is filenames which might have "line breaks" in them. However, that is a very minor problem for most use cases.
Is this the fastest way to count files?
EDIT / Possible Answer: It seems that when it comes to Big Data, some versions of ls, find, etc. have been reported to hang with >8 million files (this still needs to be confirmed, though). To succeed with very large file counts (my guess is > 2.2 billion), one should use the getdents64 system call instead of getdents, which can be done with most programming languages that support POSIX standards. Some filesystems might offer faster non-POSIX methods for counting files.
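As a rough, Linux-specific sketch of that getdents64 idea (untested at very large counts; the buffer size is arbitrary and the struct layout follows man getdents64):
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

struct linux_dirent64 {            // layout as documented in man getdents64
    unsigned long long d_ino;
    long long          d_off;
    unsigned short     d_reclen;
    unsigned char      d_type;
    char               d_name[];
};

int main (int argc, char *argv[]) {
    if (argc <= 1)                 // require dir
        return 1;
    int fd = open(argv[1], O_RDONLY | O_DIRECTORY);
    if (fd < 0)                    // dir not found
        return 2;
    char buf[65536];
    long long c = 0;               // 64 bit counter
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
        if (n <= 0)                // 0 = end of directory, <0 = error
            break;
        for (long off = 0; off < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            if (d->d_type == DT_REG)   // count regular files only
                c++;
            off += d->d_reclen;
        }
    }
    printf ("%lli\n", c);
    close(fd);
    return 0;
}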
One way would be to use readdir and count the entries (in one directory). Below I'm counting regular files using d_type==DT_REG, which is available only on some OSs and filesystems (man readdir and see NOTES), but you could just comment out that line and count all the dir entries:
#include <stdio.h>
#include <dirent.h>

int main (int argc, char *argv[]) {
    struct dirent *entry;
    DIR *dirp;
    long long c = 0;      // 64 bit counter, initialized to zero

    if (argc <= 1)        // require dir
        return 1;

    dirp = opendir (argv[1]);
    if (dirp == NULL) {   // dir not found
        return 2;
    }

    while ((entry = readdir(dirp)) != NULL) {
        if (entry->d_type == DT_REG)
            c++;
        // printf ("%s\n", entry->d_name); // for outputting filenames
    }
    printf ("%lli\n", c);
    closedir (dirp);
    return 0;
}
Compile and run:
$ gcc code.c
$ ./a.out ~
254
(I need to clean my home dir :)
Edit:
I touched 1,000,000 files into a dir and ran a quick comparison (best user+sys of 5 runs shown):
$ time ls -f | grep -c .
1000005
real 0m1.771s
user 0m0.656s
sys 0m1.244s
$ time ls -f | wc -l
1000005
real 0m1.733s
user 0m0.520s
sys 0m1.248s
$ time ../a.out .
1000003
real 0m0.474s
user 0m0.048s
sys 0m0.424s
Edit 2:
As requested in comments:
$ time ./a.out testdir | wc -l
1000004
real 0m0.567s
user 0m0.124s
sys 0m0.468s

Ring buffer log file on unix

I'm trying to come up with a unix pipeline of commands that will allow me to log only the most recent n lines of a program's output to a text file.
The text file should never be more than n lines long. (it may be less when it is first filling the file)
It will be run on a device with limited memory/resources, so keeping the filesize small is a priority.
I've tried stuff like this (n=500):
program_spitting_out_text > output.txt
cat output.txt | tail -500 > recent_output.txt
rm output.txt
or
program_spitting_out_text | tee output.txt | tail -500 > recent_output.txt
Obviously neither works for my purposes...
Anyone have a good way to do this in a one-liner? Or will I have to write a script/utility?
Note: I don't want anything to do with dmesg and must use standard BSD unix commands. The "program_spitting_out_text" prints out about 60 lines per second, continuously.
Thanks in advance!
If program_spitting_out_text runs continuously and keeps its file open, there's not a lot you can do.
Even deleting the file won't help since it will still continue to write to the now "hidden" file (data still exists but there is no directory entry for it) until it closes it, at which point it will be really removed.
If it closes and reopens the log file periodically (every line or every ten seconds or whatever), then you have a relatively easy option.
Simply monitor the file until it reaches a certain size, then roll the file over, something like:
while true; do
    sleep 5
    lines=$(wc -l <file.log)
    if [[ $lines -ge 5000 ]]; then
        rm -f file2.log
        mv file.log file2.log
        touch file.log
    fi
done
This script will check the file every five seconds and, if it's 5000 lines or more, will move it to a backup file. The program writing to it will continue to write to that backup file (since it has the open handle to it) until it closes it, then it will re-open the new file.
This means you will always have (roughly) between five and ten thousand lines in the log file set, and you can search them with commands that combine the two:
grep ERROR file2.log file.log
Another possibility is if you can restart the program periodically without affecting its function. By way of example, a program which looks for the existence of a file once a second and reports on that, can probably be restarted without a problem. One calculating PI to a hundred billion significant digits will probably not be restartable without impact.
If it is restartable, then you can basically do the same trick as above. When the log file reaches a certain size, kill off the current program (which you will have started as a background task from your script), do whatever magic you need to in rolling over the log files, then restart the program.
For example, consider the following (restartable) program prog.sh which just continuously outputs the current date and time:
#!/usr/bin/bash
while true; do
    date
done
Then, the following script will be responsible for starting and stopping the other script as needed, by checking the log file every five seconds to see if it has exceeded its limits:
#!/usr/bin/bash

exe=./prog.sh
log1=prog.log
maxsz=500

pid=-1
touch ${log1}
log2=${log1}-prev

while true; do
    if [[ ${pid} -eq -1 ]]; then
        lines=${maxsz}
    else
        lines=$(wc -l <${log1})
    fi
    if [[ ${lines} -ge ${maxsz} ]]; then
        if [[ $pid -ge 0 ]]; then
            kill $pid >/dev/null 2>&1
        fi
        sleep 1
        rm -f ${log2}
        mv ${log1} ${log2}
        touch ${log1}
        ${exe} >> ${log1} &
        pid=$!
    fi
    sleep 5
done
And this output (from an every-second wc -l on the two log files) shows what happens at the time of switchover, noting that it's approximate only, due to the delays involved in switching:
474 prog.log 0 prog.log-prev
496 prog.log 0 prog.log-prev
518 prog.log 0 prog.log-prev
539 prog.log 0 prog.log-prev
542 prog.log 0 prog.log-prev
21 prog.log 542 prog.log-prev
Now keep in mind that's a sample script. It's relatively intelligent but probably needs some error handling so that it doesn't leave the executable running if you shut down the monitor.
And, finally, if none of that suffices, there's nothing stopping you from writing your own filter program which takes standard input and continuously outputs that to a real ring buffer file.
Then you would simply do:
program_spitting_out_text | ringbuffer 4096 last4k.log
That program could be a true ring buffer in that it treats the 4k file as a circular character buffer but, of course, you'll need a special marker in the file to indicate the write-point, along with a program that can turn it back into a real stream.
Or, it could do much the same as the scripts above, rewriting the file so that it's always below the size desired.
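As a minimal sketch of that second, simpler variant (the program name lastn, the 4096-byte line limit, and the rewrite-on-every-line behaviour are illustrative assumptions, not an existing tool):
// lastn: keep only the most recent N lines of stdin in a file.
// Hypothetical usage: program_spitting_out_text | ./lastn 500 recent_output.txt
// This is the "rewrite the file" variant, not a true circular buffer.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    if (argc < 3) {
        fprintf(stderr, "usage: lastn <nlines> <outfile>\n");
        return 1;
    }
    long n = atol(argv[1]);
    if (n <= 0)
        return 1;
    char **ring = calloc(n, sizeof(char *));   // in-memory ring of the last n lines
    char line[4096];
    long long total = 0;

    while (fgets(line, sizeof(line), stdin) != NULL) {
        long slot = total % n;
        free(ring[slot]);                      // drop the oldest line in this slot
        ring[slot] = strdup(line);
        total++;

        FILE *out = fopen(argv[2], "w");       // rewrite the file from scratch
        if (out == NULL)
            return 2;
        long long start = (total > n) ? total - n : 0;
        for (long long i = start; i < total; i++)
            fputs(ring[i % n], out);
        fclose(out);
    }
    return 0;
}
At 60 lines per second, rewriting the file on every line is wasteful; a real version would probably rewrite only every few seconds or every few hundred lines.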
Since apparently this basic feature (a circular file) does not exist on GNU/Linux, and because I needed it to track logs on my Raspberry Pi with limited storage, I just wrote the code as suggested above!
Behold: circFS
Unlike the other tools quoted in this post and similar ones, the maximum size is arbitrary and limited only by the actual available storage.
It does not rotate across several files; everything is kept in a single file, which is rewritten on "release".
You can have as many log files as needed in the virtual directory.
It is a single C file (~600 lines including comments), and it builds with a single compile line after having installed fuse development dependencies.
This first version is very basic (see the README), if you want to improve it with some of the TODOs (see the TODO) be welcome to submit pull requests.
As a joke, this is my first "write only" fuse driver! :-)

grep -f maximum number of patterns?

I'd like to use grep on a text file with -f to match a long list (10,000) of patterns. It turns out that grep doesn't like this (who knew?). After a day, it hadn't produced anything. Smaller lists work almost instantaneously.
I was thinking I might split my long list up and do it a few times. Any idea what a good maximum length for the pattern list might be?
Also, I'm rather new with unix. Alternative approaches are welcome. The list of patterns, or search terms, are in a plaintext file, one per line.
Thank you everyone for your guidance.
From comments, it appears that the patterns you are matching are fixed strings. If that is the case, you should definitely use -F. That will increase the speed of the matching considerably. (Using 479,000 strings to match on an input file with 3 lines using -F takes under 1.5 seconds on a moderately powered machine. Not using -F, that same machine is not yet finished after several minutes.)
I had the same problem with approx. 4 million patterns to search for in a file with 9 million lines. It seems to be a RAM problem, so I came up with this neat little workaround, which might be slower than splitting and joining but needs just this one line:
while read line; do grep "$line" fileToSearchIn; done < patternFile
I needed to use the workaround since the -F flag is no solution for files that large...
EDIT: This seems to be really slow for large files. After some more research I found 'faSomeRecords' and other really awesome tools from the Kent NGS-editing-Tools.
I tried it on my own by extracting 2 million FASTA records from a 5.5 million record file. It took approx. 30 seconds.
cheers
EDIT: direct download link
Here is a bash script you can run on your files (or if you would like, a subset of your files). It will split the key file into increasingly large blocks, and for each block attempt the grep operation. The operations are timed - right now I'm timing each grep operation, as well as the total time to process all the sub-expressions.
Output is in seconds - with some effort you can get ms, but with the problem you are having it's unlikely you need that granularity.
Run the script in a terminal window with a command of the form
./timeScript keyFile textFile 100 > outputFile
This will run the script, using keyFile as the file where the search keys are stored, and textFile as the file where you are looking for keys, and 100 as the initial block size. On each loop the block size will be doubled.
In a second terminal, run the command
tail -f outputFile
which will keep track of the output of your other process into the file outputFile
I recommend that you open a third terminal window, and that you run top in that window. You will be able to see how much memory and CPU your process is taking - again, if you see vast amounts of memory consumed it will give you a hint that things are not going well.
This should allow you to find out when things start to slow down - which is the answer to your question. I don't think there's a "magic number" - it probably depends on your machine, and in particular on the file size and the amount of memory you have.
You could take the output of the script and put it through a grep:
grep entire outputFile
You will end up with just the summaries - block size, and time taken, e.g.
Time for processing entire file with blocksize 800: 4 seconds
If you plot these numbers against each other (or simply inspect the numbers), you will see when the algorithm is optimal, and when it slows down.
Here is the code: I did not do extensive error checking but it seemed to work for me. Obviously in your ultimate solution you need to do something with the outputs of grep (instead of piping it to wc -l which I did just to see how many lines were matched)...
#!/bin/bash
# script to look at difference in timing
# when grepping a file with a large number of expressions
# assume first argument = name of file with list of expressions
# second argument = name of file to check
# optional third argument = initial block size (default 100)
#
# split f1 into chunks of 1, 2, 4, 8... expressions at a time
# and print out how long it took to process all the lines in f2
if (($# < 2 )); then
    echo Warning: need at least two parameters.
    echo Usage: timeScript keyFile searchFile [initial blocksize]
    exit 0
fi

f1_linecount=`cat $1 | wc -l`
echo linecount of file1 is $f1_linecount

f2_linecount=`cat $2 | wc -l`
echo linecount of file2 is $f2_linecount
echo

if (($# < 3 )); then
    blockLength=100
else
    blockLength=$3
fi

while (($blockLength < f1_linecount))
do
    echo Using blocks of $blockLength
    # split is a standard command that splits the file
    # -l tells it to break after $blockLength lines
    # and the block$blockLength parameter is a prefix for the files
    split -l $blockLength $1 block$blockLength
    Tstart="$(date +%s)"
    Tbefore=$Tstart
    for fn in block*
    do
        echo "grep -f $fn $2 | wc -l"
        echo number of lines matched: `grep -f $fn $2 | wc -l`
        Tnow="$(($(date +%s)))"
        echo Time taken: $(($Tnow - $Tbefore)) s
        Tbefore=$Tnow
    done
    echo Time for processing entire file with blocksize $blockLength: $(($Tnow - $Tstart)) seconds
    blockLength=$((2*$blockLength))
    # remove the split files - no longer needed
    rm block*
    echo block length is now $blockLength and f1 linecount is $f1_linecount
done
exit 0
You could certainly give sed a try to see whether you get a better result, but it is a lot of work to do either way on a file of any size. You didn't provide any details on your problem, but if you have 10k patterns I would be trying to think about whether there is some way to generalize them into a smaller number of regular expressions.
Here is a perl script "match_many.pl" which addresses a very common subset of the "large number of keys vs. large number of records" problem. Keys are accepted one per line from stdin. The two command line parameters are the name of the file to search and the field (white space delimited) which must match a key. This subset of the original problem can be solved quickly since the location of the match (if any) in the record is known ahead of time and the key always corresponds to an entire field in the record. In one typical case it searched 9400265 records with 42899 keys, matching 42401 of the keys and emitting 1831944 records in 41s. The more general case, where the key may appear as a substring in any part of a record, is a more difficult problem that this script does not address. (If keys never include white space and always correspond to an entire word the script could be modified to handle that case by iterating over all fields per record, instead of just testing the one, at the cost of running M times slower, where M is the average field number where the matches are found.)
#!/usr/bin/perl -w
use strict;
use warnings;
my $kcount;
my ($infile,$test_field) = @ARGV;
if(!defined($infile) || "$infile" eq "" || !defined($test_field) || ($test_field <= 0)){
    die "syntax: match_many.pl infile field"
}
my %keys;       # hash of keys
$test_field--;  # external range (1,N) to internal range (0,N-1)
$kcount=0;
while(<STDIN>) {
    my $line = $_;
    chomp($line);
    $keys{$line} = 1;
    $kcount++
}
print STDERR "keys read: $kcount\n";

my $records = 0;
my $emitted = 0;
open(INFILE, $infile) or die "Could not open $infile";
while(<INFILE>) {
    if(substr($_,0,1) =~ /#/){ # skip comment lines
        next;
    }
    my $line = $_;
    chomp($line);
    $line =~ s/^\s+//;
    my @fields = split(/\s+/, $line);
    if(exists($keys{$fields[$test_field]})){
        print STDOUT "$line\n";
        $emitted++;
        $keys{$fields[$test_field]}++;
    }
    $records++;
}

$kcount=0;
while( my( $key, $value ) = each %keys ){
    if($value > 1){
        $kcount++;
    }
}
close(INFILE);
print STDERR "records read: $records, emitted: $emitted; keys matched: $kcount\n";
exit;

Running C++ program multiple times

I have a C++ program which I need to run multiple times.
For example:-
Run ./addTwoNumbers 50 times.
What would be a good approach to solve this problem?
In bash and other shells that support brace expansion,
for i in {1..50} ; do ./addTwoNumbers ; done
If this is code you are writing, take the number of times you want to "run" as an argument:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    int numTimes = 1;
    if (argc > 1)
    {
        numTimes = atoi(argv[1]);   // run count passed on the command line
    }
    for (int i = 0; i < numTimes; i++)
    {
        // Your code goes here
    }
}
(Note this doesn't do any sanity checking on the input, but it should point you in the right direction)
The way you were asking the question indicated that you had a finished binary. You want to run it as if it was from the command line. The forward slash, to me, is a clue that you are a Unix like operating system user. Well, that, and the fact that this post is tagged "Unix", which I just saw after writing the below. It should all be applicable.
The scheme of using the shell is probably the simplest one.
man bash tells you how to write a shell script. Actually we need to figure out what shell you are using. From the command line, type:
echo $SHELL
The response I get is
/bin/bash
Meaning that I am running bash. Whatever you get, copy down, you will need it later.
The approach requiring the least background knowledge is to simply create a file, with any standard text editor and no suffix. Call it, simply (for example), run50.
The first line is a special line that tells the unix system to use bash to run the command:
#! /bin/bash
(or whatever you got from echo $SHELL).
Now, in the file, on the next line, type the complete path, from root, to the executable.
Type the command just as if you were typing it on the command line. You may put any arguments to your program there as well. Save your file.
Do you want to run the program, and wait for it to finish, then start the next copy? Or do you want to start it 50 times as fast as you can without waiting for it to finish? If the former, you are done, if the latter, end the line with &
That tells the shell to start the program and to go on.
Now duplicate that line 50 times: copy and paste so it is there twice, select all and paste at the end for 4 copies, again for 8, again for 16, and again for 32. Then copy 18 more lines, paste those at the end, and you are done. If you happen to copy the line that says #! /bin/bash, don't worry about it; it is a comment to the shell.
Save the file.
From the command line, enter the following command:
chmod +x ./filenameofmyshellcommand
Where you will replace filenameofmyshellcommand with the name of the file you just created.
Finally run the command:
./filenameofmyshellcommand
And it should run the program 50 times.
If you are using bash, instead of duplicating the line 50 times, you can write a loop:
for ((i=1;i<=50;i++)) do
    echo "Invocation $i"
    /complete/path/to/your/command
done
I have included a message that tells you which run the command is on. If you are timing the program I would not recommend a "feelgood" message like this. You can end the line with & if you want the command to be started and the script to continue.
The double parentheses are required for this syntax, and you have to pay close attention to the syntax.
for ((i=1;i<=50;i++)) do echo "invocation $i" & done
is an interesting thing to just enter from the command line, for fun. It will start the 50 echos disconnected from the command line, and they often come out in a different order than 1 to 50.
In Unix, there is a system() library call that will invoke a command more or less as if from the terminal. You can use that call from C++ or from perl or about a zillion other programs. But this is the simplest thing you can do, and you can time your program this way. It is the common approach in Unix for running one program or a sequence of programs, or for doing common tasks by running a series of system tools.
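As a hedged sketch of that system() approach in C (the binary name ./addTwoNumbers comes from the question; the rest is illustrative):
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    for (int i = 1; i <= 50; i++) {
        // system() hands the command string to a shell,
        // much as if you had typed it on the command line
        int status = system("./addTwoNumbers");
        if (status == -1) {
            perror("system");
            return 1;
        }
        printf("run %d finished (status %d)\n", i, status);
    }
    return 0;
}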
If you are going to use Unix, you should know how to write a simple shell script.
int count = 0;

int main()
{
beginning:
    // do whatever you need to do;
    count++;
    if (count < 50)   // loop until the work has run 50 times
    {
        goto beginning;
    }
    return 0;
}

Unix strace command

I found the following bash script in order to monitor cp progress.
#!/bin/sh
cp_p()
{
    strace -q -ewrite cp -- "${1}" "${2}" 2>&1 \
        | awk '{
            count += $NF
            if (count % 10 == 0) {
                percent = count / total_size * 100
                printf "%3d%% [", percent
                for (i=0;i<=percent;i++)
                    printf "="
                printf ">"
                for (i=percent;i<100;i++)
                    printf " "
                printf "]\r"
            }
        }
        END { print "" }' total_size=$(stat -c '%s' "${1}") count=0
}
I don't understand the "-ewrite" option for the strace command. The closest thing I've found is this entry in the strace man page:
-e write=set   Perform a full hexadecimal and ASCII dump of all the data
               written to file descriptors listed in the specified set. For
               example, to see all output activity on file descriptors 3
               and 5 use -e write=3,5. Note that this is independent from
               the normal tracing of the write(2) system call which is
               controlled by the option -e trace=write.
However I don't understand what the -ewrite option does.
-ewrite means that only the "write" system call will be traced.
-e expr        A qualifying expression which modifies which events to trace
               or how to trace them. The format of the expression is:
                   [qualifier=][!]value1[,value2]...
               where qualifier is one of trace, abbrev, verbose, raw,
               signal, read, or write and value is a qualifier-dependent
               symbol or number. The default qualifier is trace. Using an
               exclamation mark negates the set of values. For example,
               -eopen means literally -e trace=open which in turn means
               trace only the open system call. By contrast, -etrace=!open
               means to trace every system call except open. In addition,
               the special values all and none have the obvious meanings.
               Note that some shells use the exclamation point for history
               expansion even inside quoted arguments. If so, you must
               escape the exclamation point with a backslash.
