ack - Binding an actual file name to a filetype

For me ack is essential kit (it's aliased to a and I use it a million times a day). Mostly it has everything I need, so I'm figuring that this behavior is covered and I just can't find it.
I'd love to be able to restrict it to specific kinds of files using a type. The problem is that these files have a full filename rather than an extension. For instance, I'd like to restrict it to build files for buildr so I can search them with --buildr (similar would apply for mvn POMs). I have the following defined in my .ackrc:
--type-set=buildr=buildfile,.rake
The problem is that 'buildfile' is the entire filename, not an extension, and I'd like ack to match on this name in full. However, if I look at the types bound to 'buildr', it shows .buildfile as an extension rather than the whole filename:
--[no]buildr .buildfile .rake
The ability to restrict to a particular filename would be really useful for me, as there are numerous XML use cases (e.g. Ant's build.xml or Maven's pom.xml) that it would be perfect for. I do see that binary files, Makefiles and Rakefiles have special type configuration, and maybe that's the way to go, but I'd really like to be able to do it within ack if possible before resorting to custom functions. Does anyone know if this is possible?

No, you cannot do it. ack 1.x only uses extensions for detecting file types. ack 2.0 will have much more flexible capabilities, where you'll be able to do stuff like:
# There are four different ways to match
# is: Match the filename exactly
# ext: Match the extension of the filename exactly
# match: Match the filename against a Perl regular expression
# firstlinematch: Match the first 80 characters of the first line
# of text against a Perl regular expression. This is only for
# the --type-add option.
--type-add=make:ext:mk
--type-add=make:ext:mak
--type-add=make:is:makefile
--type-add=make:is:gnumakefile
# Rakefiles http://rake.rubyforge.org/
--type-add=rake:is:Rakefile
# CMake http://www.cmake.org/
--type-add=cmake:is:CMakeLists.txt
--type-add=cmake:ext:cmake
# Perl http://perl.org/
--type-add=perl:ext:pod
--type-add=perl:ext:pl
--type-add=perl:ext:pm
--type-add=perl:firstlinematch:/perl($|\s)/
You can see what development on ack 2.0 is doing at https://github.com/petdance/ack2. I'd love to have your help.
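Under that syntax, the buildr case from the question would presumably look something like this (a sketch only, mirroring the --type-add examples above; the type name is yours to choose):
# buildr: match the buildfile by full name and .rake files by extension
--type-add=buildr:is:buildfile
--type-add=buildr:ext:rake
After that, ack --buildr PATTERN should restrict the search to just those files.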

Related

scp_download to download multiple files based on a pattern?

I need to download many files from a server (specifically Tectia), ideally using the ssh package. These files all follow a predictable pattern across multiple subfolders. The file path is formatted like this:
/directory/subfolder/A001/abcde001.csv
where A001 counts up alongside the last three digits of the filename (/A002/abcde002.csv and so on).
The vignette for scp_download states that the files parameter may contain wildcards, so I have tried something like
scp_download(session, "/directory/subfolder/A.*/abcde.*[.]csv", to=tempdir())
and
scp_download(session, "directory/subfolder/A\\d{3}/abcde\\d{3}[.]csv", to=tempdir())
but no matter which combination of patterns or wildcards I can think of (which isn't many), I only get something like:
Warning: SSH warning: scp: /directory/subfolder/A\d{3}/abcde\d{3}[.]csv: No such file or directory
What I'm hoping to do is either find a way to do pattern matching here, or find a way to store the Tectia directory listing as a string to be read by scp_download. I've made sure that my session is connected properly, and the download works when I don't attempt to pattern match.
I had the same problem. When you use * in your pattern it gets escaped when it is sent to the server. However, when you request a specific file name like /directory/subfolder/A001/abcde001.csv, it works fine.
Finally I changed my code based on the steps below:
1. Get the list of files/folders using the ls command with the ssh_exec_wait function and store them in a variable.
2. Download the files in that variable separately.
session <- ssh_connect("username@ip", passwd = "password")
files <- capture.output(ssh_exec_wait(session, command = 'ls /directory/subfolder/A001/*'))
dnc1 <- scp_download(session, files[1], to = paste0(getwd(), "/data/"))
dnc2 <- scp_download(session, files[2], to = paste0(getwd(), "/data/"))
dnc3 <- scp_download(session, files[3], to = paste0(getwd(), "/data/"))
The last three scp_download calls can be done in a loop, as there could be hundreds or thousands of files.
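For example, that loop might look roughly like this (a sketch reusing the same session and ls command; the grepl() line is an assumption, there to drop anything in the captured output that is not a csv path):
files <- capture.output(ssh_exec_wait(session, command = 'ls /directory/subfolder/A001/*'))
# keep only the lines that look like csv paths
files <- files[grepl("[.]csv$", files)]
for (f in files) {
  scp_download(session, f, to = paste0(getwd(), "/data/"))
}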

How do I tell Vim to use any/all dictionary files that fit a filepath wildcard glob?

I am trying to set the dictionary option (to allow autocompletion of certain words of my choosing) using wildcards in a filename glob, as follows:
:set dict+=$VIM/dict/dict*.lst
The hope is that, with this line in the initially sourced .vimrc (or, in my case of Windows 10, _vimrc), I can add different dictionary files to my $VIM/dict directory later, and each new invocation of Vim will use those dictionary files, without me needing to modify my .vimrc settings.
However, an error message says that there is no such file. When I give a specific filename (as in :set dict+=$VIM/dict/dict01.lst ), then it works.
The thing is, I could swear that this used to work. I had this setting in my .vimrc files since I started using Vim 7.1, and I don't recall any such error message until recently. Now it shows up on my Linux laptop as well as my Windows 7 and Windows 10 laptops. I can't remember exactly when this started happening.
Yes, I tried using backslashes (as in :set dict+=$VIM\dict\dict*.lst ) in case it was a Windows compatibility issue, but that still doesn't work. (Also this is happening on my Linux laptop, too, and that doesn't use backslashes for filepaths.)
Am I going senile? Or is there some other mysterious force going on?
Assuming for now that it is a change in the latest version of Vim, is there some way to specify "use all the dictionary files that fit this glob"?
-- Edited 2021-02-14 06:17:07
I also checked to see if it was due to having more than one file that fits the wildcard glob. (I thought that if I had more than one file that fit the wildcard, the glob would turn into two filenames, equivalent to saying dict+=$VIM/dict/dict01.lst dict02.lst, which would not be syntactically valid.) But it still did not work after removing extra files so that only one file fit my pathname of $VIM/dict/dict*.lst. (I had previously put another addendum here happily explaining that this was how I solved my problem, but it turned out to be premature.)
You must expand wildcards before setting an option. Multiple file names must be separated by commas. For example,
let &dictionary = tr(expand("$VIM/dict/dict*.lst"), "\n", ",")
If adding a value to a non-empty option, don't forget to add a comma too (let is more universal than set, so it's less forgiving):
let &dictionary .= "," . tr(expand(...)...)
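If you want to guard against the glob matching nothing at all, a vimrc variant of the same idea might look like this (a sketch using the list form of expand(); it only appends when at least one file matched):
" collect the matching files as a list and append them to 'dictionary'
let s:dicts = expand("$VIM/dict/dict*.lst", 0, 1)
if !empty(s:dicts)
  let &dictionary .= (empty(&dictionary) ? "" : ",") . join(s:dicts, ",")
endif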

Should the longer synonyms for local variables be used in a Makefile?

In make (I am using OpenBSD’s implementation, but I suppose the question is relevant for GNU make as well), we have the following so-called local variables:
@ The name of the target
% The name of the archive member (for library rules)
! The name of the archive file (for library rules)
? The list of prerequisites for this target that were deemed out of date
< The name of the prerequisite from which this target is to be built (for inference rules)
* The file prefix of the file, containing only the file portion, no suffix or preceding directory components
(roughly from make(1) on OpenBSD)
These local variables have synonyms: for example, .IMPSRC for < or .TARGET for @. The manual in FreeBSD says these longer versions are preferred. OpenBSD’s man page mentions no such thing, but says these longer names are an extension.
Is it better to use the longer names? Which is better for compatibility? Are both POSIX?
Those variables are called automatic variables in GNU make and internal variables in the POSIX standard.
The long names for these are purely BSD make inventions; they do not exist in any other version of make (such as GNU make) and they are not mentioned in the POSIX standard for make.
It's up to you whether you want to use them, but they are completely non-portable. Of course you could always define them yourself if you wanted to implement a compatibility layer.
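For instance, here is a suffix rule written with the portable short names, next to the BSD-only spelling with the long synonyms (a sketch; the rule itself is just an ordinary compile rule):
# Portable: works with POSIX, GNU and BSD make
.c.o:
	$(CC) $(CFLAGS) -c -o $@ $<
# BSD make only: the same rule spelled with the long synonyms
#.c.o:
#	$(CC) $(CFLAGS) -c -o $(.TARGET) $(.IMPSRC)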

What is the general syntax of a Unix shell command?

In particular, why is it that the options to some commands are sometimes preceded by a + sign and sometimes by a - sign?
For example:
sort -f
sort -nr
sort +4n
sort +3nr
These days, the POSIX convention implemented by getopt() (aka getopt(3)) is widely used as a standard notation, but in the early days people were experimenting. On some machines, the sort command no longer supports the + notation. However, various commands (notably ar and tar) accept controls without any prefix character, and dd (alluded to by Alok in a comment) uses another convention altogether.
The GNU convention of using '--' for long options (supported by getopt_long(3)) replaced an earlier use of '+'. Of course, the X11 software uses a single dash before multi-character options. So the whole thing is a collection of historic relics as people experimented with how best to handle it.
POSIX documents the Utility Conventions that it works to, except where historical precedent is stronger.
What styles of option handling are there?
[At one time, SO 367309 contained the following material as my answer. It was originally asked 2008-12-15 02:02 by FerranB, but was subsequently closed and deleted.]
How many different types of options do you recognize? I can think of many, including:
Single-letter options preceded by single dash, groupable when there is no argument, argument can be attached to option letter or in next argument (many, many Unix commands; most POSIX commands).
Single-letter options preceded by single dash, grouping not allowed, arguments must be attached (RCS).
Single-letter options preceded by single dash, grouping not allowed, arguments must be separate (pre-POSIX SCCS, IIRC).
Multi-letter options preceded by single dash, arguments may be attached or in next argument (X11 programs; also Java and many programs on Mac OS X with a NeXTSTEP heritage).
Multi-letter options preceded by single dash, may be abbreviated (Atria Clearcase).
Multi-letter options preceded by single plus (obsolete).
Multi-letter options preceded by double dash; arguments may follow '=' or be separate (GNU utilities).
Options without prefix/suffix, some names have abbreviations or are implied, arguments must be separate (AmigaOS Shell).
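To make a few of these concrete, here is the same operation written in several of the styles above (a sketch; out.txt and data.txt are placeholder names, and all four forms are accepted by GNU sort):
sort -o out.txt -k 3 data.txt            # short options, detached arguments
sort -oout.txt -k3 data.txt              # short options, attached arguments
sort --output=out.txt --key=3 data.txt   # GNU long options with '='
sort --output out.txt --key 3 data.txt   # GNU long options, detached arguments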
For options taking an optional argument, sometimes the argument must be attached (co -p1.3 rcsfile.c),
sometimes it must follow an '=' sign. POSIX doesn't support optional
arguments meaningfully (the POSIX getopt() only allows them for the last
option on the command line).
All sensible option systems use an option consisting of double-dash
('--') alone to mean "end of options" — the following arguments are
"non-option arguments" (usually file names; POSIX calls them 'operands')
even if they start with a
dash. (I regard supporting this notation as an imperative. Be aware that if the -- is preceded by an option requiring an argument, the -- will be treated as the argument to the option, not as the 'end of options' marker.)
Many but not all programs accept single dash as a file name to mean
standard input (usually) or standard output (occasionally). Sometimes,
as with GNU 'tar', both can be used in a single command line:
... | tar -cf - -T - | ...
The first solo dash means 'write to stdout'; the second means 'read file
names from stdin'.
Some programs use other conventions — that is, options not preceded by a
dash. Many of these are from the oldest days of Unix. For example,
'tar' and 'ar' both accept options without a dash, so:
tar cvzf /tmp/somefile.tgz some/directory
The dd command uses opt=value exclusively:
dd if=/some/file of=/another/file bs=16k count=200
Some programs allow you to interleave options and other arguments
completely; the C compiler, make and the GNU utilities run without
POSIXLY_CORRECT in the environment are examples. Many programs expect
the options to precede the other arguments.
Note that git and other VCS commands often use a hybrid system:
git commit -m 'This is why it was committed'
There is a sub-command as one of the arguments. Often, there will be optional 'global' options that can be specified between the command and the sub-command. There are examples of this in POSIX; the sccs command is in this category; you can argue that some of the other commands that run other commands are also in this category: nice and xargs spring to mind from POSIX; sudo is a non-POSIX example, as are svn and cvs.
I don't have strong preferences between the different systems. When
there are few enough options, then single letters with mnemonic value
are convenient. GNU supports this, but recommends backing it up with
multi-letter options preceded by a double-dash.
There are some things I do object to. One of the worst is the same
option letter being used with different meanings depending on what other
option letters have preceded it. In my book, that's a no-no, but I know
of software where it is done.
Another objectionable behaviour is inconsistency in style of handling
arguments (especially for a single program, but also within a suite of
programs). Either require attached arguments or require detached
arguments (or allow either), but do not have some options requiring an
attached argument and others requiring a detached argument. And be
consistent about whether '=' may be used to separate the option and
the argument.
As with many, many (software-related) things — consistency is more
important than the individual decisions. Using tools that automate
and standardize the argument processing helps with consistency.
Whatever you do, please, read the TAOUP's Command-Line Options and
consider Standards for Command Line Interfaces. (Added by J F
Sebastian — thanks; I agree.)
It's completely arbitrary; the command may implement all of the option handling in its own special way or it might call out to some other convenience functions. The getopt() family of functions is pretty popular, so most software written even remotely recently follows the conventions set by those routines. There are always exceptions, of course!
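For reference, the core of that getopt() convention looks roughly like this in C (a sketch; the -n/-r/-k flags simply mirror the sort flags from the question):
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int opt;
    /* "nrk:" -- -n and -r take no argument, -k requires one (attached or detached) */
    while ((opt = getopt(argc, argv, "nrk:")) != -1) {
        switch (opt) {
        case 'n': printf("numeric sort requested\n"); break;
        case 'r': printf("reverse order requested\n"); break;
        case 'k': printf("sort key: %s\n", optarg); break;
        default:  /* '?' for an unknown option or a missing argument */
            fprintf(stderr, "usage: %s [-nr] [-k field] [file...]\n", argv[0]);
            return 2;
        }
    }
    /* argv[optind] .. argv[argc-1] are the remaining operands (file names) */
    return 0;
}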
It's left to apps to parse options, hence the inconsistency. Expanding on your sort example, these are all equivalent for coreutils:
sort -k3
sort --k 3
sort --key 3
sort --key=3
_POSIX2_VERSION=199209 sort +2
A shell command is just a program, and it is free to interpret its command line any way it likes.
Unix never had anything like Apple's interface police to make sure that the command-line interface was consistent across applications. As a result, there is inconsistency, especially in older commands.
Peering into my crystal ball, I think command-line tools will slowly migrate toward GNU standards, double dashes and all. (I grew up with single dashes and still find the double dash very awkward, but it is consistent.)

Convert ASP.NET project pages from Windows-1251 to UTF-8

I can do that file by file with Save As Encoding in Visual Studio, but I'd like to do this in one click. Is it possible?
I know, some will start bashing me, but:
download a Smalltalk IDE (such as ST/X),
open a workspace,
type in:
'yourDirectoryHere' asFilename directoryContentsAsFilenamesDo:[:oldFileName |
|cyrString utfString newFile|
cyrString := oldFileName contentsAsString.
utfString := CharacterEncoder encodeString:cyrString from:#'iso8859-5' into:#'utf'.
newFile := oldFileName withSuffix:'utf'.
newFile contents:utfString.
].
That will convert all files in the given directory and create corresponding .utf files without affecting the original files. Even if you normally do not use Smalltalk, for this type of action Smalltalk is a perfect scripting environment.
I know most of you don't read Smalltalk, but the code should be readable even for non-Smalltalkers, and a corresponding Perl/Python/Java/C# piece of code could also be written and executed in a minute or so, taking the above as a guide. I guess all current languages provide something similar to the CharacterEncoder above.
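Taking up that suggestion, a rough Python equivalent might look like this (a sketch only, assuming the pages really are Windows-1251; like the Smalltalk version it writes .utf copies and leaves the originals untouched):
import pathlib

src = pathlib.Path("yourDirectoryHere")
for path in src.iterdir():
    if path.is_file():
        text = path.read_text(encoding="windows-1251")
        # write a sibling file with a .utf suffix, leaving the original untouched
        path.with_suffix(".utf").write_text(text, encoding="utf-8")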
