Teradata SQL match regular expression

My code currently returns: https://www.website.com/events
but I want to strip everything from the first path slash onward (the /events part too), so the result would be
https://www.website.com
I feel like I am close, but I am missing something.
This is in Teradata SQL.
select 'https://www.website.com/events/143403?sid=1090794&mid=35' as string_to_search
,REGEXP_SUBSTR(string_to_search, '^.*(?=(/))',1,1,'i') as extract_domain

If your URLs always include the scheme, you can use a simple strtok to pick the host name as the second /-delimited token:
strtok(PAGREF_URL, '/',2)

Starting from the beginning of the string (^), lazily take everything (.*?) up to the first position where there is:
no slash behind: (?<!/)
a slash in front: (?=/)
but not two slashes in front: (?!//)
select 'https://www.website.com/events/143403?sid=1090794&mid=35' as string_to_search
,regexp_substr(string_to_search, '^.*?(?<!/)(?=/)(?!//)') as extract_domain
;
https://www.website.com
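
To experiment with the lookarounds outside the database, here is a minimal sketch that runs the same pattern through Python's re module. This assumes Python's regex engine treats this particular pattern the same way Teradata's does, and the helper name extract_domain is made up for illustration.

```python
import re

# Same pattern as in the answer above: lazily consume characters until a
# position that has no slash behind it, a slash ahead, and not two slashes ahead.
DOMAIN_PATTERN = re.compile(r"^.*?(?<!/)(?=/)(?!//)")


def extract_domain(url):
    """Return scheme + host, or None when the URL has no path slash."""
    match = DOMAIN_PATTERN.search(url)
    return match.group(0) if match else None


print(extract_domain("https://www.website.com/events/143403?sid=1090794&mid=35"))
print(extract_domain("https://www.website.com"))  # no path slash, so no match
```

Note that a URL without any path slash produces no match at all, so the SQL version would return NULL for such rows.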

Related

Why is my url rewriting rule not working properly?

I'm trying to write this redirection
images/catalog/1002/10002/main-200x250.12345.jpg to url images/catalog/1002/10002/main.jpg?w=200&h=250&vw=main
I tried this rule:
rewrite "^/images/(.*)/([a-z0-9]+)-([0-9])x([0-9]).([0-9]{5}).(jpg|jpeg|png|gif|ico)$" /images/$1/$2.$6?w=$3&h=$4&vw=$2 break;
It is not working; it returns a 404 Not Found error. I don't know what I'm missing.
Also, when I remove the double quotes ("), I get this error:
directive "rewrite" is not terminated by ";"
I also don't clearly see the purpose of the " character, or when I should use or avoid it.
I'm working on a Mac with MAMP Pro v 5.2.2.
You forgot to add a quantifier for the width and height numbers in your regex. Try this (I added a + twice; you might want to use {X} instead, where X is the number of digits, if each number always has the same number of digits):
rewrite "^/images/(.*)/([a-z0-9]+)-([0-9]+)x([0-9]+).([0-9]{5}).(jpg|jpeg|png|gif|ico)$" /images/$1/$2.$6?w=$3&h=$4&vw=$2 break;
Your regular expression needs to be quoted because it contains a }.
The nginx documentation for the rewrite directive explains when a regular expression needs to be quoted:
If a regular expression includes the “}” or “;” characters, the whole
expressions should be enclosed in single or double quotes.
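
The effect of the missing quantifier can be checked outside nginx. The sketch below uses Python's re module as a stand-in for nginx's regex engine (an assumption that holds for a pattern this simple); the rewrite helper is hypothetical, and nginx's $1..$6 are written as \1..\6.

```python
import re

URI = "/images/catalog/1002/10002/main-200x250.12345.jpg"

# Original rule: [0-9] without a quantifier matches exactly one digit.
ORIGINAL = r"^/images/(.*)/([a-z0-9]+)-([0-9])x([0-9]).([0-9]{5}).(jpg|jpeg|png|gif|ico)$"
# Corrected rule: [0-9]+ matches one or more digits.
FIXED = r"^/images/(.*)/([a-z0-9]+)-([0-9]+)x([0-9]+).([0-9]{5}).(jpg|jpeg|png|gif|ico)$"
# Replacement from the rule, in Python's backreference syntax.
REPLACEMENT = r"/images/\1/\2.\6?w=\3&h=\4&vw=\2"


def rewrite(pattern, uri):
    """Return the rewritten URI, or None when the pattern does not match."""
    if re.search(pattern, uri) is None:
        return None
    return re.sub(pattern, REPLACEMENT, uri)


print(rewrite(ORIGINAL, URI))  # pattern fails on the multi-digit width
print(rewrite(FIXED, URI))
```

The unescaped dots in the pattern match any character; writing them as \. would make the rule stricter, though that is not what caused the 404.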

How to remove up to a certain punctuation when there are same punctuations [duplicate]

I'm new at regular expressions and wonder how to phrase one that collects everything after the last /.
I'm extracting an ID used by Google's GData.
my example string is
http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123
Where the ID is: p1f3JYcCu_cb0i0JYuCu123
Oh and I'm using PHP.
This matches at least one of (anything not a slash) followed by end of the string:
[^/]+$
Notes:
No parens because it doesn't need any groups - result goes into group 0 (the match itself).
Uses + (instead of *) so that if the last character is a slash it fails to match (rather than matching empty string).
But, most likely a faster and simpler solution is to use your language's built-in string list processing functionality - i.e. ListLast( Text , '/' ) or equivalent function.
For PHP, the closest function is strrchr which works like this:
strrchr( Text , '/' )
This includes the slash in the results - as per Teddy's comment below, you can remove the slash with substr:
substr( strrchr( Text, '/' ), 1 );
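
For comparison, here is a small sketch of both approaches in Python rather than PHP (a language swap made only so the example is easy to run; the function names are invented). It shows the regex [^/]+$ next to a plain string split, including how each behaves on a trailing slash.

```python
import re


def last_segment_regex(text):
    """Regex approach: one or more non-slash characters before the end."""
    match = re.search(r"[^/]+$", text)
    return match.group(0) if match else None


def last_segment_split(text):
    """String approach: everything after the last slash (may be empty)."""
    # rpartition returns (head, separator, tail); the tail follows the last "/".
    return text.rpartition("/")[2]


url = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123"
print(last_segment_regex(url))
print(last_segment_split(url))
```

With a trailing slash the regex version finds no match, while the split version returns an empty string, which mirrors the + versus * note above.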
Generally:
/([^/]*)$
The data you want would then be the match of the first group.
Edit: Since you're using PHP, you could also use strrchr, which returns everything from the last occurrence of a character in a string up to the end. Or you could use a combination of strrpos and substr: first find the position of the last occurrence, then get the substring from that position up to the end. Or use explode and array_pop: split the string at the / and take just the last part.
You can also get the "filename", or the last part, with the basename function.
<?php
$url = 'http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123';
echo basename($url); // "p1f3JYcCu_cb0i0JYuCu123"
On my box I could just pass the full URL. It's possible you might need to strip off http:/ from the front.
Basename and dirname are great for moving through anything that looks like a unix filepath.
/^.*\/(.*)$/
^ = start of the line
.*\/ = greedy match from the start of the line up to the last occurrence of /
(.*) = group containing everything that comes after the last occurrence of /
You can also use a normal string split:
$str = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123";
$s = explode("/",$str);
print end($s);
This pattern will not capture the last slash in $0, and it won't match anything if there's no characters after the last slash.
/(?<=\/)([^\/]+)$/
Edit: but it requires lookbehind, which was not supported by ECMAScript (JavaScript, ActionScript) before ES2018, by Ruby before 1.9, or by a few other flavors. If you are using one of those flavors, you can use:
/\/([^\/]+)$/
But it will capture the last slash in $0.
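
The difference between the two patterns is only in the whole match ($0), not in the captured group. A short Python sketch (used here in place of PHP purely for illustration) makes that visible:

```python
import re

URL = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123"

# Lookbehind: the slash is asserted but not consumed.
with_lookbehind = re.search(r"(?<=/)([^/]+)$", URL)
# Plain slash: the slash is consumed and becomes part of the whole match.
with_slash = re.search(r"/([^/]+)$", URL)

print(with_lookbehind.group(0), with_lookbehind.group(1))
print(with_slash.group(0), with_slash.group(1))
```

Reading group 1 instead of the whole match therefore gives the same ID with either pattern.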
Not a PHP programmer, but strrpos seems a more promising place to start. Find the rightmost '/', and everything past that is what you are looking for. No regex used.
Find position of last occurrence of a char in a string
Based on @Mark Rushakoff's answer, a solution that also handles a query string and fragment:
<?php
$path = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123?var1&var2#hash";
$vars = strrchr($path, "?"); // "?var1&var2#hash"
var_dump(preg_replace('/'. preg_quote($vars, '/') . '$/', '', basename($path))); // "p1f3JYcCu_cb0i0JYuCu123"
?>
Regular Expression to collect everything after the last /
How to get file name from full path with PHP?

Extract a certain element from URL using regular expressions

I need to extract the first element ("adidas-originals") after "designers" in the following URL using regular expressions.
xxx/en-ca/men/designers/adidas-originals/shorts
This needs to be done in Google BigQuery (standard SQL). I have tried several approaches to get the desired value without success. Below is the best solution I have found so far, which is obviously not right because it returns "/adidas-originals/shorts".
REGEXP_EXTRACT(hits.page.pagePath, r'designers([^\n]*)')
Thanks!
The [^\n]* matches 0 or more chars other than a newline, LF, so no wonder it matches too much.
You need a pattern to match up to the next /, so you may use
designers/([^/]+)
Or a more precise:
(?:^|/)designers/([^/]+)
See the regex demo
Details
(?:^|/) - either start of a string or / (you may just use / if designers is always preceded with /)
designers/ - a literal designers/ substring
([^/]+) - Capturing group 1 (just what will be returned with the REGEXP_EXTRACT function): one or more chars other than /.
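
Both patterns can be compared quickly outside BigQuery. This is a minimal sketch using Python's re module, on the assumption that it handles these simple patterns the same way BigQuery's engine does; extract_designer is a hypothetical helper name.

```python
import re

PATH = "xxx/en-ca/men/designers/adidas-originals/shorts"


def extract_designer(path):
    """Return the path segment right after 'designers/', or None."""
    match = re.search(r"(?:^|/)designers/([^/]+)", path)
    return match.group(1) if match else None


# The pattern from the question keeps going until a newline.
too_greedy = re.search(r"designers([^\n]*)", PATH).group(1)

print(too_greedy)
print(extract_designer(PATH))
```

The (?:^|/) prefix also keeps the pattern from matching inside a longer segment such as mydesigners/.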

Using curly brackets({}) for REGEX in drupal db_query

I have a WHERE clause in my query like this: "WHERE sth REGEXP '[0-9]{5,10}'"
When I run this query in phpMyAdmin it returns all matching records, but in Drupal it returns no results. I think it's because Drupal treats anything like "{sth}" as a table name.
How can I solve this problem?
Thanks
Your theory is correct.
Curly brackets used as repetition quantifiers in regexes are stripped just like the curly brackets around table names. Pass the regex as an argument to db_query() instead, like this:
db_query('SELECT name from {users} WHERE std RLIKE "%s"', '[0-9]{5,10}');
(I've had to guess at the rest of your query.)
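
Why passing the regex as an argument helps can be shown with a toy model. The sketch below is not Drupal code: prefix_tables and run_query are hypothetical Python stand-ins, written under the assumption that table braces are processed on the query text before the arguments are substituted.

```python
def prefix_tables(sql, prefix=""):
    """Hypothetical stand-in for table prefixing: {users} becomes prefix + users."""
    return sql.replace("{", prefix).replace("}", "")


def run_query(sql, *args):
    """Prefix the query text first, then substitute the arguments.

    Illustration only: nothing is escaped here, so never build real SQL this way.
    """
    return prefix_tables(sql) % args


# Regex written inline: its braces are stripped along with the table braces.
inline = run_query("SELECT name FROM {users} WHERE sth REGEXP '[0-9]{5,10}'")

# Regex passed as an argument: it is inserted after prefixing and survives.
as_argument = run_query("SELECT name FROM {users} WHERE sth REGEXP '%s'", "[0-9]{5,10}")

print(inline)
print(as_argument)
```

In the inline case the quantifier collapses to [0-9]5,10, which is still a valid regex but no longer matches the intended rows, consistent with the empty result described in the question.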

Regular Expression for ASP.NET ID Using Javascript

I am trying to extract the word "need" from this string.
ctl00_ctl00_ContentMainContainer_ContentColumn1__needDont_Panel1
I have tried [__]([.]?=Dont)
This is using javascript .match()
I have even tried to use http://gskinner.com/RegExr/ but just can't solve this one. Thanks for the help!
(?<=__)\w+(?=Dont)
Matches all word characters (letters, digits, and underscores) between __ and Dont.
Edit
Sorry, I hadn't noticed the word JavaScript. JavaScript engines older than ES2018 do not support lookbehind, so __(\w+)(?=Dont) can be used there, with the wanted text in capture group 1.
If the regex should match even when nothing comes between __ and Dont, use \w* instead of \w+. Be careful with .* because the dot matches almost any character; do you allow spaces in the ID?
This will accomplish what you're looking for:
__(.*)(?=Dont)
You seem to be mixing up what a character class (square brackets []) does; instead, you should be using parentheses ().
In your regex, [__] will only match a single underscore _ and [.] will match a single period.
Your error is writing [__] instead of __ (without the brackets). [__] matches only a single underscore, so it will match _ctl00_ContentMainContainer_ContentColumn1__need.
[.] is also wrong. You should use something like [^_]+ (anything except an underscore).
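
The points made in these answers can be checked with a short sketch. It is written in Python rather than JavaScript only to keep the example easy to run, and word_before_dont is a made-up helper name.

```python
import re

ELEMENT_ID = "ctl00_ctl00_ContentMainContainer_ContentColumn1__needDont_Panel1"


def word_before_dont(element_id):
    """Return the word characters between '__' and 'Dont', or None."""
    match = re.search(r"__(\w+)(?=Dont)", element_id)
    return match.group(1) if match else None


# A character class matches ONE character from the set, so [__] is just "_".
single_underscore = re.search(r"[__]", ELEMENT_ID).group(0)

print(word_before_dont(ELEMENT_ID))
print(single_underscore)
```

Because \w also matches underscores, the engine first runs to the end of the ID and then backtracks until Dont follows, which leaves only need in the group.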
