Regex to get parts of URL - asp.net

Hi, I have URLs like the following:
vimeo.com/99612902
www.vimeo.com/99612902
http://vimeo.com/99612902
http://www.vimeo.com/99612902
http://vimeo.com/moogaloop.swf?clip_id=81368903
I need to parse the above URLs into two groups as follows:
Group 1                          Group 2
vimeo.com/                       99612902
www.vimeo.com/                   99612902
http://vimeo.com/                99612902
http://www.vimeo.com/            99612902
http://vimeo.com/                81368903
I've tried the following regex:
^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(:([^\/]*))?((\/[\w\-]+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?
but it yields unwanted and empty groups. Please help me out.

With your input, we can match both parts into Groups 1 and 2 with this:
^(.*/)(.*)
or, for your revised input:
^(.*[/=])([^/=]+$)
In VB.NET, you can do this:
' Requires: Imports System.Text.RegularExpressions
Dim theUrl As String
Dim theNumbers As String
Try
    ' SubjectString holds the URL to parse
    Dim result As Match = Regex.Match(SubjectString, "^(.*/)(.*)", RegexOptions.Multiline)
    theUrl = result.Groups(1).Value
    theNumbers = result.Groups(2).Value
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
Option 2
If you want to do some very lightweight url validation at the same time, you can use this:
^((?:http://)?(?:www\.)?[^./]+\.\w+/)(.*)
or, with your revised input:
^((?:http://)?(?:www\.)?[^./]+\.\w+/)(?:.*=)?([^/=]+)$
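To see what the simple "split at the last / or =" pattern does on each sample URL from the question, here is a minimal sketch. It is written in Python purely for illustration (the question is about .NET), and the helper name split_url is hypothetical. The pattern itself is the one given above for the revised input.

```python
import re

# Sample URLs copied from the question.
URLS = [
    "vimeo.com/99612902",
    "www.vimeo.com/99612902",
    "http://vimeo.com/99612902",
    "http://www.vimeo.com/99612902",
    "http://vimeo.com/moogaloop.swf?clip_id=81368903",
]

# Group 1: everything up to and including the last "/" or "=".
# Group 2: the remaining characters up to the end of the string.
PATTERN = re.compile(r"^(.*[/=])([^/=]+$)")


def split_url(url):
    """Return (prefix, trailing_part), or None if the pattern does not match."""
    m = PATTERN.match(url)
    return (m.group(1), m.group(2)) if m else None


results = [split_url(u) for u in URLS]
for url, parts in zip(URLS, results):
    print(url, "->", parts)
```

Note that for the last URL, Group 1 comes out as http://vimeo.com/moogaloop.swf?clip_id= rather than just http://vimeo.com/, so this pattern only gives the exact table from the question for the first four URLs.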

If you don't want to validate the URL, you can also try this. The two parts are in the matched groups at index 1 and 2.
(.*?[^\/]*\/)(\d+)
String literals for use in programs: C#
@"(.*?[^\/]*\/)(\d+)"

You could simply use the regex below:
^(.*\/)(.*)$
Everything from the start up to and including the last / is captured by group 1. The remaining characters are captured by group 2.
OR
^((?:https?:\/\/)?(?:www\.)?(?:[^.]*)\.\w+\/)(.*)$

Related

How to Store All Text in Between Two Index Positions of Same String in VBScript?

So I am going off memory here because I cannot see the code I am trying to figure this out for at the moment, but I am working with some old VB Script code where there is a data connection that is set like this:
set objCommand = Server.CreateObject("ADODB.command")
and I have a field from the database that is being stored in a variable like this:
Items = RsData("Item")
This specific field in the database is a long string of text (e.g. “This is part of a string of text…Header One: Here is text after header one… Header Two: Here is more text after header two”).
There are certain parts of the text that I wish to store as a variable that are between two index positions in the long string of text within that field. They are separated by headers that are stored in the text field above like this: “Header One:” and “Header Two:”, and I want to capture all text that occurs in between those two headers of text and store them into their own variable (i.e. “Here is text after header one…”).
How do I achieve this? I have tried to use the InStr function to get the index, but as I understand it, it only returns the position where the searched text begins. Am I wrong about this? Because of that, I am also having trouble getting the Mid function to work. Can someone please show me an example of how this is supposed to work? Remember, I am only going off memory, so please forgive me for not being able to provide better code examples right now. I hope my question makes sense!
I am hopeful that someone can help me with an answer tonight so I can try this out tomorrow when I am near the code again! Thank you for your efforts and any help offered!
You can extract all the substrings that start with the text Header and end just before either the next Header or the end of the string. I have used a regular expression to implement that, and it is working for me. Have a look at the code below. If I find a simpler (non-regex) solution, I will update the answer.
Code:
strTest = "Header One: Some random text Header Two: Some more text Header One: Some random textwerwerwefvxcf234234 Header Three: Some more t2345fsdfext Header Four: Some randsdfsdf3w42343om text Header Five: Some more text 123213"
set objReg = new Regexp
objReg.Global = true
objReg.IgnoreCase = false
objReg.pattern = "Header[^:]+:([\s\S]*?)(?=Header|$)" '<---Regex Pattern. Explained later.
set objMatches = objReg.Execute(strTest)
Dim arrHeaderValues() '<-----This array contains all the required values
i=-1
for each objMatch in objMatches
    i = i+1
    Redim Preserve arrHeaderValues(i)
    arrHeaderValues(i) = objMatch.subMatches.item(0) '<---item(0) indicates the 1st group of each match
next
'Displaying the array values
for i=0 to ubound(arrHeaderValues)
    msgbox arrHeaderValues(i)
next
set objReg = Nothing
Regex Explanation:
Header - matches Header literally
[^:]+: - matches 1+ occurrences of any character that is not a :. This is then followed by matching a :. So far, keeping the above 2 points in mind, we have matched strings like Header One:, Header Two:, Header blabla123: etc. Now, whatever comes after this match is relevant to us. So we will capture that inside a Group as shown in the next breakup.
([\s\S]*?)(?=Header|$) - matches and captures everything(including newlines) until either the next Header or the end-of-the-string(represented by $)
([\s\S]*?) - lazily matches 0+ occurrences of any character (including newlines) and captures them in Group 1
(?=Header|$) - a positive lookahead: it asserts that the next thing is either another instance of the string Header or the end of the string, without consuming or capturing it. This is what stops the lazy match above.
Alternative Solution(non-regex):
strTest = "Header One: Some random text Header Two: Some more text Header One: Some random textwerwerwefvxcf234234 Header Three: Some more t2345fsdfext Header Four: Some randsdfsdf3w42343om text Header Five: Some more text 123213"
arrTemp = split(strTest,"Header") 'Split using the text Header
j=-1
Dim arrHeaderValues()
for i=0 to ubound(arrTemp)
    strTemp = arrTemp(i)
    intTemp = instr(1,strTemp,":") 'Find the position of : in each array value
    if(intTemp>0) then
        j = j+1
        Redim preserve arrHeaderValues(j)
        arrHeaderValues(j) = mid(strTemp,intTemp+1) 'Store the desired value in array
    end if
next
'Displaying the array values
for i=0 to ubound(arrHeaderValues)
    msgbox arrHeaderValues(i)
next
If you don't want to store the values in an array, you can use the Execute statement to create variables with different names at run time and store the values in them.
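If you only need the text between two known headers, as in the question, the index arithmetic can be sketched like this. It is Python for illustration only, and text_between is a hypothetical helper name. The idea maps directly onto InStr and Mid: InStr gives the position where a header starts, so you add the header's length to get where the wanted text begins, then search for the second header from there.

```python
def text_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if the first is missing."""
    start = text.find(start_marker)           # like InStr: where the marker begins
    if start == -1:
        return None
    start += len(start_marker)                # skip past the marker itself
    end = text.find(end_marker, start)        # search only after the first marker
    if end == -1:
        return text[start:]                   # no closing marker: take the rest
    return text[start:end]                    # like Mid(text, start, end - start)


sample = ("This is part of a string of text…Header One: Here is text after "
          "header one… Header Two: Here is more text after header two")
between = text_between(sample, "Header One:", "Header Two:").strip()
print(between)
```

Keep in mind that VBScript strings are 1-based while Python's are 0-based, so the offsets differ by one when translating this back.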

How to split a string and get all values?

So I've got this small piece of example code in my View
@{
string MyValue = "val1;val2;val3";
}
And I am wondering how I can split it wherever there is a semicolon, so that I can then run through each value and print it in a list.
Remember that in a View, most C# code can be used inside @{ }. You can use this line with a loop to get what you need done.
string sub = input.Substring(0, input.IndexOf(";"));
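The Substring line above only yields the first value. Splitting on the separator returns all of them at once; in C# that is String.Split(';'). The same idea as a minimal Python sketch, for illustration only (the variable names are made up):

```python
my_value = "val1;val2;val3"

# Split on every semicolon; the result is a list with one entry per value.
values = my_value.split(";")

# Run through each value, e.g. to emit one list item per entry.
items = ["<li>{}</li>".format(v) for v in values]
for item in items:
    print(item)
```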

How do I delete characters in a string up to a certain point in classic asp?

I have a string that at any point may or may not contain one or more / characters. I'd like to be able to create a new string based on this string. The new string would include every character after the very last / in the original string.
Sounds like you're wanting the file name from a URL. In any case, it's the same function. The key is using the InStrRev function to find the first / char, but starting from the right. Here's the function:
Function GetFilename(URL)
    Dim I
    I = InStrRev(URL, "/")
    If I > 0 Then
        GetFilename = Mid(URL, I + 1)
    Else
        GetFilename = URL
    End If
End Function
Split it up into parts and get the last part:
a = split("my/string/thing", "/")
wscript.echo a(ubound(a))
note: Not safe when the string is empty.
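For comparison, here is how the same "everything after the last /" logic behaves on the edge cases (no slash, trailing slash, empty string). This is a Python sketch for illustration, not classic ASP, and after_last_slash is a hypothetical name.

```python
def after_last_slash(text):
    """Return everything after the final "/" (the whole string if there is none)."""
    # rpartition splits on the LAST occurrence and returns (head, sep, tail);
    # when the separator is absent, tail is the whole string.
    return text.rpartition("/")[2]


examples = ["my/string/thing", "no-slash-here", "trailing/", ""]
results = [after_last_slash(s) for s in examples]
print(results)
```

This mirrors the InStrRev version above: a string without any / comes back unchanged, and an empty string does not raise an error.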

xQuery substring problem

I now have a full path for a file as a string like:
"/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml"
However, I now need only the folder path, i.e. the above string without whatever follows the last slash:
"/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/"
But it seems that the substring() function in XQuery only comes as substring(string,start,len) or substring(string,start). I am trying to figure out a way to specify the last occurrence of the slash, but no luck.
Could experts help? Thanks!
Try out the tokenize() function (for splitting a string into its component parts) and then re-assembling it, using everything but the last part.
let $full-path := "/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml",
$segments := tokenize($full-path,"/")[position() ne last()]
return
concat(string-join($segments,'/'),'/')
For more details on these functions, check out their reference pages:
fn:tokenize()
fn:string-join()
fn:replace can do the job with a regular expression:
replace("/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml",
"[^/]+$",
"")
This can be done even with a single XPath 2.0 (subset of XQuery) expression:
substring($fullPath,
1,
string-length($fullPath) - string-length(tokenize($fullPath, '/')[last()])
)
where $fullPath should be substituted with the actual string, such as:
"/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml"
The following code tokenizes, removes the last token, replaces it with an empty string, and joins back.
string-join(
(
tokenize(
"/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml",
"/"
)[position() ne last()],
""
),
"/"
)
It seems to return the desired result on try.zorba-xquery.com. Does this help?
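All of the XQuery variants above boil down to "cut the string just after the last /". As a cross-check of the expected result, here is that logic as a small Python sketch (illustration only; folder_of is a hypothetical name):

```python
def folder_of(path):
    """Return the path up to and including the last "/"."""
    cut = path.rfind("/")      # index of the last slash, or -1 if there is none
    return path[: cut + 1]     # with no slash this is path[:0], i.e. ""


full_path = ("/db/Liebherr/Content_Repository/Techpubs/Topics/"
             "HyraulicPowerDistribution/Released/"
             "TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml")
folder = folder_of(full_path)
print(folder)
```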

Regular expression to convert substring to link

I need a regular expression to convert a string to a link. I wrote something, but it doesn't work in ASP.NET. I couldn't solve it, and I am new to regular expressions. The function should convert (bkz: string) to (bkz: show.aspx?td=string).
Dim pattern As String = "<bkz[a-z0-9$-$&-&.-.ö-öı-ış-şç-çğ-ğü-ü\s]+)>"
Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)
str = regex.Replace(str, "<font color=""#CC0000"">$1</font>")
Generic remarks on your code: besides the missing opening parenthesis, you do redundant things: $-$ isn't incorrect but can be simplified to just $. The same goes for the accented characters.
Everybody will tell you that the font tag is deprecated even in plain HTML: favor a span with a style attribute.
And from your question and the example in the reply, I think the expression could be something like:
\(bkz: ([a-z0-9$&.öışçğü\s]+)\)
the replace string would look like:
(bkz: <span style=""color: #C00"">$1</span>)
BUT the first $1 must be actually URL encoded.
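Because the captured text has to be URL-encoded before it goes into the query string, a plain replacement string is not enough; you need a replacement callback (in .NET, a MatchEvaluator passed to Regex.Replace). Here is a minimal sketch of that idea in Python, for illustration only. The pattern is the one suggested above; the anchor markup and the name link_bkz are assumptions based on the question's show.aspx?td=string target.

```python
import re
from html import escape
from urllib.parse import quote

# Pattern from the answer above; IGNORECASE mirrors RegexOptions.IgnoreCase.
BKZ = re.compile(r"\(bkz: ([a-z0-9$&.öışçğü\s]+)\)", re.IGNORECASE)


def link_bkz(text):
    """Turn each "(bkz: term)" into a link to show.aspx?td=<url-encoded term>."""
    def repl(match):
        term = match.group(1)
        href = "show.aspx?td=" + quote(term)   # URL-encode the query value
        # HTML-escape both the attribute value and the visible text.
        return '(bkz: <a href="{}">{}</a>)'.format(escape(href), escape(term))
    return BKZ.sub(repl, text)


linked = link_bkz("see (bkz: here) and (bkz: foo bar)")
print(linked)
```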
Your regexp is in trouble because of a ')' without '('
Would:
<bkz:\s+((?:.(?!>))+?.)>
work better ?
The first group would capture what you are after.
Thanks Vonc. Now it doesn't raise an error, but when I assign str to a Label.Text, I can't see the link either. For example, after I bind str to my label, the page source should show:
<span id="Label1">(bkz: here)</span>
But now the page source shows:
<span id="Label1">(bkz: here)</span>
