shell script detect first couple letters - unix

I'm writing a shell script. A variable will have a url that will look at the the beginning few characters and
make sure there is something before the //...so a http, https,rtmp,rtmps,rtmpe,etc....
if nothing is in front of the // then tell user there is nothing...else if that value = whatever do whatever
How would I be able to do this?

case "$URL" in
http://*|https://*) echo "HTTP selected" ;;
ftp://*|ftps://*) echo "FTP selected" ;;
*) echo "Nothing selected" ;;
esac

Firstly you should should try coding this in some language such as perl or ruby that has extensive built-in regex capabilities. However if you really wish to do this in shell, then the sting operators are your friend:
url="something://and something else"
${url%%://*} will extract "something", i. e. the protocol being used.
${url##*://} will extract "and something else" i.e. everything to the right of ://

Ignacio has demonstrated how to do straight string matching, and ennuikiller has shown some of bash's pattern matching capabilities. But for what it's worth (and perhaps as surprising to others as to me), bash is indeed capable of full regexp pattern matching:
An additional binary operator, =~,
is available, with the same precedence
as == and !=. When it is used, the
string to the right of the operator
is considered an extended regular
expres‐ sion and matched accordingly
(as in regex(3)).
This was in the description of the [[ ]] operators for expression evaluation.

Related

Extract mm/dd/yyyy and m/dd/yyyy dates from string in R [duplicate]

My regex pattern looks something like
<xxxx location="file path/level1/level2" xxxx some="xxx">
I am only interested in the part in quotes assigned to location. Shouldn't it be as easy as below without the greedy switch?
/.*location="(.*)".*/
Does not seem to work.
You need to make your regular expression lazy/non-greedy, because by default, "(.*)" will match all of "file path/level1/level2" xxx some="xxx".
Instead you can make your dot-star non-greedy, which will make it match as few characters as possible:
/location="(.*?)"/
Adding a ? on a quantifier (?, * or +) makes it non-greedy.
Note: this is only available in regex engines which implement the Perl 5 extensions (Java, Ruby, Python, etc) but not in "traditional" regex engines (including Awk, sed, grep without -P, etc.).
location="(.*)" will match from the " after location= until the " after some="xxx unless you make it non-greedy.
So you either need .*? (i.e. make it non-greedy by adding ?) or better replace .* with [^"]*.
[^"] Matches any character except for a " <quotation-mark>
More generic: [^abc] - Matches any character except for an a, b or c
How about
.*location="([^"]*)".*
This avoids the unlimited search with .* and will match exactly to the first quote.
Use non-greedy matching, if your engine supports it. Add the ? inside the capture.
/location="(.*?)"/
Use of Lazy quantifiers ? with no global flag is the answer.
Eg,
If you had global flag /g then, it would have matched all the lowest length matches as below.
Here's another way.
Here's the one you want. This is lazy [\s\S]*?
The first item:
[\s\S]*?(?:location="[^"]*")[\s\S]* Replace with: $1
Explaination: https://regex101.com/r/ZcqcUm/2
For completeness, this gets the last one. This is greedy [\s\S]*
The last item:[\s\S]*(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explaination: https://regex101.com/r/LXSPDp/3
There's only 1 difference between these two regular expressions and that is the ?
The other answers here fail to spell out a full solution for regex versions which don't support non-greedy matching. The greedy quantifiers (.*?, .+? etc) are a Perl 5 extension which isn't supported in traditional regular expressions.
If your stopping condition is a single character, the solution is easy; instead of
a(.*?)b
you can match
a[^ab]*b
i.e specify a character class which excludes the starting and ending delimiiters.
In the more general case, you can painstakingly construct an expression like
start(|[^e]|e(|[^n]|n(|[^d])))end
to capture a match between start and the first occurrence of end. Notice how the subexpression with nested parentheses spells out a number of alternatives which between them allow e only if it isn't followed by nd and so forth, and also take care to cover the empty string as one alternative which doesn't match whatever is disallowed at that particular point.
Of course, the correct approach in most cases is to use a proper parser for the format you are trying to parse, but sometimes, maybe one isn't available, or maybe the specialized tool you are using is insisting on a regular expression and nothing else.
Because you are using quantified subpattern and as descried in Perl Doc,
By default, a quantified subpattern is "greedy", that is, it will
match as many times as possible (given a particular starting location)
while still allowing the rest of the pattern to match. If you want it
to match the minimum number of times possible, follow the quantifier
with a "?" . Note that the meanings don't change, just the
"greediness":
*? //Match 0 or more times, not greedily (minimum matches)
+? //Match 1 or more times, not greedily
Thus, to allow your quantified pattern to make minimum match, follow it by ? :
/location="(.*?)"/
import regex
text = 'ask her to call Mary back when she comes back'
p = r'(?i)(?s)call(.*?)back'
for match in regex.finditer(p, str(text)):
print (match.group(1))
Output:
Mary

How to process latex commands in R?

I work with knitr() and I wish to transform inline Latex commands like "\label" and "\ref", depending on the output target (Latex or HTML).
In order to do that, I need to (programmatically) generate valid R strings that correctly represent the backslash: for example "\label" should become "\\label". The goal would be to replace all backslashes in a text fragment with double-backslashes.
but it seems that I cannot even read these strings, let alone process them: if I define:
okstr <- function(str) "do something"
then when I call
okstr("\label")
I directly get an error "unrecognized escape sequence"
(of course, as \l is faultly)
So my question is : does anybody know a way to read strings (in R), without using the escaping mechanism ?
Yes, I know I could do it manually, but that's the point: I need to do it programmatically.
There are many questions that are close to this one, and I have spent some time browsing, but I have found none that yields a workable solution for this.
Best regards.
Inside R code, you need to adhere to R’s syntactic conventions. And since \ in strings is used as an escape character, it needs to form a valid escape sequence (and \l isn’t a valid escape sequence in R).
There is simply no way around this.
But if you are reading the string from elsewhere, e.g. using readLines, scan or any of the other file reading functions, you are already getting the correct string, and no handling is necessary.
Alternatively, if you absolutely want to write LaTeX-like commands in literal strings inside R, just use a different character for \; for instance, +. Just make sure that your function correctly handles it everywhere, and that you keep a way of getting a literal + back. Here’s a suggestion:
okstr("+label{1 ++ 2}")
The implementation of okstr then needs to replace single + by \, and double ++ by + (making the above result in \label{1 + 2}). But consider in which order this needs to happen, and how you’d like to treat more complex cases; for instance, what should the following yield: okstr("1 +++label")?

how to key out operators in grep in r

I am trying to use a grep formula to search for at least one of the following terms in quotations in the code below in df$AllPrograms.
grep("Service & Product Provider (Partner;ACT)" | "Buildings (Prospect;INA)", df$AllPrograms)
This isn't working and I suspect it is because grep is not interpret ting the & ; and () as operators rather than characters.
Use a double backslash "\" to escape these characters. This is because the backslash is an escape character in extended regex, but we need to "escape" the first backslash as well.
Also, in your example code you have incorrectly specified the OR statement. Try:
grep("Service \\& Product Provider \\(Partner\\;ACT\\)|Buildings \\(Prospect\\;INA\\)", df$AllPrograms)
If there are many other patterns that you'd like to check for, take a look at this link here:
grep using a character vector with multiple patterns

zsh: command substitution, proper quoting and backslash (again)

(Note: This is a successor question to my posting zsh: Command substitution and proper quoting , but now with an additional complication).
I have a function _iwpath_helper, which outputs to stdout a path, which possibly contains spaces. For the sake of this discussion, let's assume that _iwpath_helper always returns a constant text, for instance
function _iwpath_helper
{
echo "home/rovf/my directory with spaces"
}
I also have a function quote_stripped expects one parameter and if this parameter is surrounded by quotes, it removes them and returns the remaining text. If the parameter is not surrounded by quotes, it returns it unchanged. Here is its definition:
function quote_stripped
{
echo ${1//[\"\']/}
}
Now I combine both functions in the following way:
target=$(quote_stripped "${(q)$(_iwpath_helper)}")
(Of course, 'quote_stripped' would be unnecessary in this toy example, because _iwpath_helper doesn't return a quote-delimited path here, but in the real application, it sometimes does).
The problem now is that the variable target contains a real backslash character, i.e. if I do a
echo +++$target+++
I see
+++home/rovf/my\ directory\ with\ spaces
and if I try to
cd $target
I get on my system the error message, that the directory
home/rovf/my/ directory/ with/ spaces
would not exist.
(In case you are wondering where the forward slashes come from: I'm running on Cygwin, and I guess that the cd command just interprets backslashes as forward slashes in this case, to accomodate better for the Windows environment).
I guess the backslashes, which physically appear in the variable target are caused by the (q) expansion flag which I apply to $(_iwpath_helper). My problem is now that I can not simply drop the (q), because without it, the function quote_stripped would get on parameter $1 only the first part of the string, up to the first space (/home/rovf/my).
How can I write this correctly?
I think you just want to avoid trying to strip quotes manually, and use the (Q) expansion flag. Compare:
% v="a b c d"
% echo "$v"
a b c d
% echo "${(q)v}"
a\ b\ c\ d
% echo "${(Q)${(q)v}}"
a b c d
chepner was right: The way I tried to unquote the string was silly (I was thinking too much in a "Bourne Shell way"), and I should have used the (Q) flag.
Here is my solution:
target="${(Q)$(_iwpath_helper)}"
No need for the quote_stripped function anymore....

Ungoogleble machine-code or otherwise hard to read EUID part

Not sure whether obfuscated, machine-code or something else. Please, let me know what the part is for and how to read it. The part is from the file.
###############################################################################
# Set prompt based on EUID
################################################################################
if (( EUID == 0 )); then
PROMPT=$'%{\e[01;31m%}%n#%m%{\e[0m%}[%{\e[01;34m%}%3~%{\e[0;m%}]$(pc_scm_f)%# '
else
PROMPT=$'%{\e[01;32m%}%n#%m%{\e[0m%}[%{\e[01;34m%}%3~%{\e[0;m%}]$(pc_scm_f)%% '
fi
could someone break it a bit more into parts?
What does the conditional EUID == 0 do?
I get an error about pc_scm_f, using OBSD, is it some sort of value in other OS?
the \e starts some sort of logical part, what do the rest do?
Looks like ANSI escape sequences to me.
I found this link which seems to contain the whole thing in proper context.
Also tells me Ferruccio is right: It's an ANSI escape string, used to change the style of the command-prompt. \e starts the escape codes, the rest is the code itself. Used to be very popular in the old DOS time, especially with a game called NetHack. It's just pretty-print for your console.

Resources