ack: Excluding only one directory but keeping all others with the same name

My folder structure looks like this:
/app
/app/data
...
/app/secondary
/app/secondary/data
I want to recursively search /app, including /app/data. I do not want to search /app/secondary/data, however. This is what I have so far:
ack --ignore-dir=data searchtext
ack --ignore-dir=secondary/data searchtext
The first command is ignoring both directories and the second one is ignoring neither of them. From within the app folder, what should my ack command look like?

The older versions of ack can only take the folder name, not the folder path. As of version 1.93_02, they've added this ability in:
1.93_02 Wed Oct 6 21:39:58 CDT 2010
[ENHANCEMENTS]
The --ignore-dir option now can ignore entire paths relative
to your current directory. Thanks to Nick Hooey. For example:
ack --ignore-dir=t/subsystem/test-data
(From betterthangrep.com/Changes)
You can check which version you have with --version:
ack --version

This answer is for versions of ack prior to 2; see this answer for versions of ack >= 2.
The first command ignores both because both directories are named 'data' and ack searches subdirectories by default, so it ignores every subdirectory with that name. Unfortunately, your second way doesn't work either. This works for me:
ack -a searchtext -G '^(?!.*secondary/data.*).*$'
Instead of -a (search all files), see ack-grep --help=types to search only certain file types, e.g. --type=text.
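To see what the -G filter above actually selects, here is a small Python sketch (not part of ack) that applies the same negative-lookahead pattern to a few hypothetical paths relative to /app. It assumes ack matches the -G regex against the relative file path, which is how the answer uses it; Python's re module handles this (?!...) pattern the same way Perl does.

```python
import re

# Same pattern as the ack -G argument: match any path that does NOT
# contain "secondary/data" anywhere (negative lookahead anchored at ^).
EXCLUDE_SECONDARY_DATA = re.compile(r'^(?!.*secondary/data.*).*$')


def kept_paths(paths):
    """Return the paths the -G pattern would let through."""
    return [p for p in paths if EXCLUDE_SECONDARY_DATA.search(p)]


# Hypothetical paths, relative to /app as in the question.
sample = [
    "data/users.txt",
    "secondary/notes.txt",
    "secondary/data/cache.txt",
    "main.txt",
]

for path in kept_paths(sample):
    print(path)
```

Only secondary/data/cache.txt is dropped; /app/data and the rest of /app/secondary are still searched.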

Related

Udev rule with "PROGRAM" statement is not executed anymore after update to Ubuntu 22.04.1

I'm running a udev rule on my 3D printing server to automatically create easily identifiable symlinks to some attached microcontroller boards, which worked perfectly fine on Ubuntu 20.04.
The rule triggers on the USB vendor and product IDs and runs a Python script via the PROGRAM directive. The script connects to the microcontroller boards and reads their init sequence to get the board's 'name'. It then outputs a string like "aaaaaaa b cccccc", and only the first block (containing the name) is used in the udev rule.
However, it seems the whole PROGRAM directive is no longer executed at all since I updated my system to Ubuntu 22.04.1.
My udev rule currently looks like this (this is the debugging version; normally it contains only lines 1 and 3. I added line 2 for testing purposes, because the hook in line 1 works and that script is executed):
KERNELS=="ttyUSB*", ENV{ID_VENDOR_ID}=="0403", ENV{ID_MODEL_ID}=="6001", ENV{ID_SERIAL_SHORT}!="AI046A0Q", ACTION=="add|remove", RUN="/bin/su me -c \"/opt/me/deviceReg.py -d %k -a %E{ACTION}\""
KERNELS=="ttyUSB*", ENV{ID_VENDOR_ID}=="0403", ENV{ID_MODEL_ID}=="6001", ENV{ID_SERIAL_SHORT}!="AI046A0Q", ACTION=="add|remove", PROGRAM="/opt/me/serialUdev.py -s %s{serial} /dev/%k", SYMLINK+="%c{1}", OWNER="me", GOTO="script_end"
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", ATTRS{serial}!="AI046A0Q", PROGRAM="/opt/me/serialUdev.py -s %s{serial} /dev/%k", SYMLINK+="%c{1}", OWNER="me", GOTO="script_end"
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", ATTRS{serial}=="A9QXPRV7", SYMLINK+="tty_MainSwitch", GROUP="dialout", OWNER="me", GOTO="script_end"
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", ATTRS{serial}=="A9QOIMJ6", SYMLINK+="tty_Cooler", GROUP="dialout", OWNER="me", GOTO="script_end"
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", ATTRS{serial}=="A9PTMHGV", SYMLINK+="tty_CurrentTransformer", GROUP="dialout", OWNER="me", GOTO="script_end"
The python scripts write to some logfiles which clearly indicate that only lines 1 and 4, 5 or 6 are executed.
Is there anything in line 3 that isn't supported anymore in the latest udev version? As I said, line 3 worked perfectly before I updated the system.
The last 3 lines are my current workaround. They work fine, but that's not at all what I want to achieve with this whole naming system.
The Python script in lines 2 and 3 runs perfectly fine, whether called as a standard user or as root. It also delivers valid output if the '-s' input data does not match the microcontroller board, is missing, or is random garbage.
Does anyone have an idea why every line with a PROGRAM statement is skipped?
OK, I was able to solve the issue.
I set udev's log level to debug to see what actually happens when the device is handled. The script actually IS invoked, but it fails immediately while importing the modules it needs: the pyserial module could not be found.
The module is installed, but evidently not in a way the script could import it.
I checked the Python script again and changed the first line from #!/usr/bin/env python3 to #!/usr/bin/python3, and now it works again.
So my problem actually wasn't related to udev at all, it was just my python script.
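If you hit the same symptom, a quick check is to log which interpreter udev actually ran and whether it can see pyserial (imported as serial). The helper below is hypothetical, not part of the original script. The likely cause, which is an assumption since the answer does not say, is that #!/usr/bin/env python3 picks the first python3 on udev's PATH, and that may not be the interpreter your own shell uses.

```python
import importlib.util
import os
import sys


def interpreter_report(module_name="serial"):
    """Describe the running interpreter and whether a module is importable.

    pyserial is imported as "serial"; pass another name to check others.
    """
    return {
        "executable": sys.executable,
        "PATH": os.environ.get("PATH", ""),
        "module": module_name,
        "importable": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    # A PROGRAM script's stdout is what udev hands back as %c, so send
    # diagnostics to stderr (or a log file) rather than stdout.
    for key, value in interpreter_report().items():
        print(f"{key}: {value}", file=sys.stderr)
```

Comparing the executable and PATH lines from a udev run with those from a manual run usually shows quickly whether two different interpreters are involved.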

I cannot retrieve all entries from the OpenLDAP database

I set up OpenLDAP on my Ubuntu server and filled the database via python-ldap with 10,000 persons.
Now, when trying to search for all of them, at first I only got 500 entries.
$ ldapsearch -x -h 192.168.1.222 -b dc=ldap-test,dc=xxx,dc=xx
I googled for a solution, and I read about a server side limit.
Then I changed the following value from 500 to:
olcSizeLimit: unlimited
I also tried 15,000, but with the same effect.
Now, with the same search command I get:
# numResponses: 992
# numEntries: 991
I cannot find any 992 or 991 number restriction anywhere. I also grepped for sizelimit - only result is the above setting.
I also read about client-side restrictions, but I tried the same search command against the old, deprecated test server, and there I get all 10,000 results.
I'd appreciate any help.
The problem was the generation of the data.
I used the Python package Faker to fake last names, which I used as cn.
As Faker only provides a limited pool of last names, many generated cn values were duplicates, and creating those entries failed silently.
I fixed the problem by using the complete name for cn.
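A cheap way to catch this kind of silent loss before touching the directory is to count how many generated cn values collide. The sketch below is hypothetical (it uses neither Faker nor python-ldap) and assumes every person is added as cn=<name> under the same parent entry, so colliding names map to the same DN. It lower-cases and strips names to approximate cn's case-insensitive matching.

```python
from collections import Counter


def colliding_cns(common_names):
    """Return normalized cn values that occur more than once, with counts.

    cn uses case-insensitive matching, so "Smith" and "smith" would
    produce the same DN under the same parent entry.
    """
    counts = Counter(name.strip().lower() for name in common_names)
    return {cn: n for cn, n in counts.items() if n > 1}


def entries_created(common_names):
    """Number of entries that survive if every duplicate add is rejected."""
    return len({name.strip().lower() for name in common_names})


names = ["Smith", "Jones", "smith", "Lee", "Jones"]
print(colliding_cns(names))    # {'smith': 2, 'jones': 2}
print(entries_created(names))  # 3
```

With only last names as cn, entries_created() can never exceed the size of the generator's surname pool, however many persons you generate.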

How does execlp work exactly?

So I am looking at my professor's code, which he handed out to give us an idea of how to implement >, <, | support in our Unix shell. I ran his code and was amazed at what actually happened.
if( pid == 0 )
{
close(1); // close
fd = creat( "userlist", 0644 ); // then open
execlp( "who", "who", NULL ); // and run
perror( "execlp" );
exit(1);
}
This created a userlist file in the directory I was currently in, with the "who" data inside that file. I don't see where any connection between fd and execlp is made. How did execlp manage to put the information into userlist? How did execlp even know userlist existed?
Read Advanced Linux Programming. It has several chapters related to this issue, and we cannot explain it all in a few sentences. See also the Wikipedia pages on standard streams and processes.
First, every system call your program makes (see syscalls(2) for a list, and read the documentation of each individual system call you use) should be checked for failure. But assume they all succeed. After close(1); the file descriptor 1 (STDOUT_FILENO) is free. So creat("userlist",0644) is likely to reuse it, hence fd is 1; you have redirected your stdout to the newly created userlist file.
At last, you are calling execlp(3) which will call execve(2). When successful, your entire process is restarted with the new executable (so a fresh virtual address space is given to it), and its stdout is still the userlist file descriptor. In particular (unless execve fails) the perror call is not reached.
So your code does roughly what a shell does when running who > userlist: it redirects stdout to userlist and runs the who command.
If you are coding a shell, use strace(1) (notably with the -f option) to understand which system calls are made. Also try strace -f /bin/sh -c ls to look into the behavior of a shell. Study the source code of existing free software shells too (e.g. bash and sash).
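The same close/open/exec sequence can be reproduced with Python's os module, which wraps these system calls thinly. This is a sketch for experimenting, not the professor's code, and run_redirected is a hypothetical name. One Python-specific wrinkle: descriptors created by os.open are close-on-exec by default (PEP 446), so the sketch marks the new descriptor inheritable to match what creat() does in C.

```python
import os


def run_redirected(path, argv):
    """Run argv with stdout redirected to path, like the shell's `cmd > path`.

    Follows the same close(1) / creat() / execlp() steps as the C snippet.
    Returns the child's exit status (-1 if it was killed by a signal).
    """
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(1)  # descriptor 1 (stdout) is now free
            # open() returns the lowest free descriptor, so this is 1 as
            # long as descriptor 0 is still open.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            # Python creates descriptors close-on-exec (PEP 446); C's creat()
            # does not, so let this one survive the exec as it would in C.
            os.set_inheritable(fd, True)
            os.execlp(argv[0], *argv)  # only returns by raising OSError
        except OSError as exc:
            os.write(2, f"exec failed: {exc}\n".encode())
        os._exit(1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, run_redirected("userlist", ["who"]) should leave the output of who in userlist, just like the C snippet. Nothing passes the file to who explicitly; it simply inherits descriptor 1.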
See also this and the references I gave there.
execlp knows nothing about the file. Before the exec, stdout was closed and a file was opened, so the new descriptor is the one corresponding to stdout (open always returns the lowest free descriptor). At that point the process has its "stdout" plugged into the file. Then exec is called, which replaces the whole address space, but some properties survive, such as the open descriptors. So the code of who is now executed with a stdout that corresponds to the file. This is how shells implement redirections.
Remember that when you use printf (for example), you never specify what stdout actually is; it can be a file, a terminal, etc.
Basile Starynkevitch correctly explained:
After close(1); the file descriptor 1 (STDOUT_FILENO) is free. So creat("userlist",0644) is likely to re-use it…
This is because, as Jean-Baptiste Yunès wrote, open always returns the lowest free descriptor.
It should be stressed that the professor's code only works under that assumption: if file descriptor 0 is also closed, creat() returns 0 instead of 1, and stdout is not redirected to userlist.
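The usual way to remove that dependency is dup2(), which makes descriptor 1 refer to the file no matter which number open() returned. Here is a Python sketch of the idea; run_with_stdout and its close_stdin flag are hypothetical, and the flag exists only to reproduce the failing case described above.

```python
import os


def run_with_stdout(path, argv, close_stdin=False):
    """Run argv with stdout redirected to path using dup2().

    close_stdin=True reproduces the case where descriptor 0 is already
    closed, in which the close(1)/open() trick would get descriptor 0.
    """
    pid = os.fork()
    if pid == 0:  # child process
        try:
            if close_stdin:
                os.close(0)
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)  # 1 now refers to the file (inheritable)
                os.close(fd)    # the original descriptor is no longer needed
            else:
                os.set_inheritable(1, True)
            os.execlp(argv[0], *argv)  # only returns by raising OSError
        except OSError as exc:
            os.write(2, f"exec failed: {exc}\n".encode())
        os._exit(1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

Because dup2() names the target descriptor explicitly, the redirection works whether or not descriptor 0 happens to be free.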

Configuring Nexus 3 (3.0m7) to run as a Linux Service

Can anyone help me translate the instructions for setting this up as a Linux Service (at http://books.sonatype.com/nexus-book/3.0/reference/install.html#service-linux) into English?
After following them as best I could, I get the following when starting the service:
su: user / does not exist
Here are the parts of the instructions which were unclear:
In the bin/nexus script remove the line below.
INSTALL4J_JAVA_PREFIX="su - $run_as_user -c"
The line in the file is actually
INSTALL4J_JAVA_PREFIX=""
but ok, I can remove that. However, the next instruction is:
Replace the entire link with this line:
exec su - $run_as_user "$prg_dir/$progname" $#
What is meant by "the entire link"? The line I removed above? That was the first line in the file, so the three variables above have not been set yet at that point, which is probably why the script currently fails.
I'll get the book fixed, it shouldn't have this in it anymore.
Download the 3.0 release, this was just a bug in 3.0m7, and it has been fixed. You don't need to make these changes.
https://support.sonatype.com/hc/en-us/articles/217965118
All you need to do is edit $NEXUS_HOME/bin/nexus.rc, uncomment the run_as_user line, and set its value appropriately. Then symlink $NEXUS_HOME/bin/nexus to /etc/init.d/nexus, and after that run chkconfig or update-rc.d, depending on your Linux distribution.

How do Perl Cwd::cwd and Cwd::getcwd functions differ?

The question
What is the difference between Cwd::cwd and Cwd::getcwd in Perl, generally, without regard to any specific platform? Why does Perl have both? What is the intended use, which one should I use in which scenarios? (Example use cases will be appreciated.) Does it matter? (Assuming I don’t mix them.) Does choice of either one affect portability in any way? Which one is more commonly used in modules?
Even if I interpret the manual as saying that, except for corner cases, cwd is `pwd` and getcwd just calls getcwd from unistd.h, what is the actual difference? This works only on POSIX systems anyway.
I can always read the implementation, but that tells me nothing about the meaning of those functions. Implementation details may change; the defined meaning should not. (Otherwise a breaking change occurs, which is serious business.)
What does the manual say
Quoting Perl’s Cwd module manpage:
Each of these functions are called without arguments and return the absolute path of the current working directory.
getcwd
my $cwd = getcwd();
Returns the current working directory.
Exposes the POSIX function getcwd(3) or re-implements it if it's not available.
cwd
my $cwd = cwd();
The cwd() is the most natural form for the current architecture. For most systems it is identical to `pwd` (but without the trailing line terminator).
And in the Notes section:
Actually, on Mac OS, the getcwd(), fastgetcwd() and fastcwd() functions are all aliases for the cwd() function, which, on Mac OS, calls `pwd`. Likewise, the abs_path() function is an alias for fast_abs_path()
OK, I know that on Mac OS¹ there is no difference between getcwd() and cwd(), as both actually boil down to `pwd`. But what about other platforms? (I’m especially interested in Debian Linux.)
¹ Classic Mac OS, not OS X. $^O values are MacOS and darwin for Mac OS and OS X, respectively. Thanks, @tobyink and @ikegami.
And a little meta-question: How to avoid asking similar questions for other modules with very similar functions? Is there a universal way of discovering the difference, other than digging through the implementation? (Currently, I think that if the documentation is not clear about intended use and differences, I have to ask someone more experienced or read the implementation myself.)
Generally speaking
I think the idea is that cwd() always resolves to the external, OS-specific way of getting the current working directory: running pwd on Linux, command /c cd on DOS, /usr/bin/fullpath -t in QNX, and so on (all examples are from the actual Cwd.pm). getcwd() is supposed to use the POSIX system call if it is available and fall back to cwd() if not.
Why do we have both? With the current implementation, I believe exporting just getcwd() would be enough for most systems, but the logic "if the syscall is available, use it, else run cwd()" can fail on some systems for whatever reason (e.g. on MorphOS in Perl 5.6.1).
On Linux
On Linux, cwd() will run `/bin/pwd` (will actually execute the binary and get its output), while getcwd() will issue getcwd(2) system call.
Actual effect inspected via strace
One can use strace(1) to see that in action:
Using cwd():
$ strace -f perl -MCwd -e 'cwd(); ' 2>&1 | grep execve
execve("/usr/bin/perl", ["perl", "-MCwd", "-e", "cwd(); "], [/* 27 vars */]) = 0
[pid 31276] execve("/bin/pwd", ["/bin/pwd"], [/* 27 vars */] <unfinished ...>
[pid 31276] <... execve resumed> ) = 0
Using getcwd():
$ strace -f perl -MCwd -e 'getcwd(); ' 2>&1 | grep execve
execve("/usr/bin/perl", ["perl", "-MCwd", "-e", "getcwd(); "], [/* 27 vars */]) = 0
Reading Cwd.pm source
You can take a look at the sources (Cwd.pm, e.g. on CPAN) and see that on Linux the cwd() call is mapped to _backtick_pwd which, as the name suggests, calls pwd in backticks.
Here is a snippet from Cwd.pm, with my comments:
unless ($METHOD_MAP{$^O}{cwd} or defined &cwd) {
...
# some logic to find the pwd binary here, $found_pwd_cmd is set to 1 on Linux
...
if( $os eq 'MacOS' || $found_pwd_cmd )
{
*cwd = \&_backtick_pwd; # on Linux we actually go here
}
else {
*cwd = \&getcwd;
}
}
Performance benchmark
Finally, the practical difference between the two is that cwd(), which runs another binary, must be slower. A simple performance test shows this:
$ time perl -MCwd -e 'for (1..10000) { cwd(); }'
real 0m7.177s
user 0m0.380s
sys 0m1.440s
Now compare it with the system call:
$ time perl -MCwd -e 'for (1..10000) { getcwd(); }'
real 0m0.018s
user 0m0.009s
sys 0m0.008s
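The same cost gap can be observed outside Perl. The following Python sketch is only an analogue of the two Cwd strategies, and the function names are made up for illustration: one spawns the external pwd command, the other calls os.getcwd(), which uses the C library's getcwd(). Absolute timings vary by machine, so none are shown here.

```python
import os
import subprocess
import time


def backtick_pwd():
    """Spawn the external pwd command, roughly what cwd() does on Linux."""
    out = subprocess.run(["pwd"], capture_output=True, text=True, check=True)
    return out.stdout.rstrip("\n")  # drop the trailing newline, like Cwd


def syscall_getcwd():
    """Ask the kernel directly, roughly what getcwd() does on Linux."""
    return os.getcwd()


def time_calls(fn, n=200):
    """Wall-clock seconds spent calling fn n times."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return time.perf_counter() - start


if __name__ == "__main__":
    for fn in (backtick_pwd, syscall_getcwd):
        print(f"{fn.__name__}: {time_calls(fn):.4f}s for 200 calls")
```

Both return the same directory; the spawning variant pays for a fork and exec on every call, which is what dominates the Perl numbers above.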
Discussion, choice
But as you don't usually query the current working directory very often, both options will work, unless you cannot spawn any more processes (for example because of a ulimit or an out-of-memory situation).
Finally, as for selecting which one to use: for Linux, I would always use getcwd(). I suppose you will need to make your tests and select which function to use if you are going to write a portable piece of code that will run on some really strange platform (here, of course, Linux, OS X, and Windows are not in the list of strange platforms).

Resources