How to write a regex to extract the Flickr image ID from a URL? - asp.net

I'm looking to do two things, and I'd like to do them cleanly. I am working on a project that allows users to upload Flickr photos by simply entering their Flickr image URL. Ex: http://www.flickr.com/photos/xdjio/226228060/
I need to:
make sure it is a URL that matches the following format: http://www.flickr.com/photos/[0]/[1]/
extract the following part: http://www.flickr.com/photos/xdjio/[0]/
Now I could very easily write some string methods to do the above, but I think it would be messy, and I love learning how to do these things in regex. Not being a regex ninja, though, I am currently unable to do the above.

Given an input string with a URL like the one you provided, this will extract the image ID for any arbitrary user:
// requires: using System.Text.RegularExpressions;
string input = "http://www.flickr.com/photos/xdjio/226228060/";
Match match = Regex.Match(input, "photos/[^/]+/(?<img>[0-9]+)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (match.Success)
{
    // The named group "img" holds the numeric image ID.
    string imageID = match.Groups["img"].Value;
}
Breaking it down, we are searching for "photos/" followed by one or more characters that are not a '/', followed by a '/', followed by one or more digits. We also put the digits segment into a named group called "img".
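For anyone who wants validation and extraction in one step, here is a minimal sketch in Python (the pattern itself ports across regex engines). Anchoring with ^ and $, accepting https, and making "www." and the trailing slash optional are my assumptions, not part of the question, and extract_image_id is a hypothetical helper name. Python spells named groups (?P<img>...) where .NET uses (?<img>...).

```python
import re

# Anchored pattern: checks the whole URL shape and captures the numeric ID.
# Assumptions: http or https, optional "www.", optional trailing slash.
FLICKR_PHOTO = re.compile(
    r"^https?://(?:www\.)?flickr\.com/photos/[^/]+/(?P<img>[0-9]+)/?$",
    re.IGNORECASE,
)


def extract_image_id(url):
    """Return the image ID as a string, or None if the URL does not match."""
    match = FLICKR_PHOTO.match(url)
    return match.group("img") if match else None


print(extract_image_id("http://www.flickr.com/photos/xdjio/226228060/"))  # 226228060
print(extract_image_id("http://www.flickr.com/photos/xdjio/"))  # None
```

Because the pattern is anchored, a None result doubles as the "not a valid photo URL" check.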

I thought I would add that the JavaScript (client-side) part of the ASP.NET validator doesn't support named groups.
The regex to use in that situation would be:
photos/[^/]+/([0-9]+)
Someone might find this useful.

Related

Remove http:// or https:// and Trailing / in NetSuite Saved Search

Let me preface this by stating very clearly that I am not a developer and I'm new to NetSuite formulas.
I have a NetSuite saved search that include the Web Address (field id: {url})
I need to remove everything except the main part of the domain (end result should look like abc.com).
I have attempted to use REPLACE({url}, 'http://[,' ']) unsuccessfully.
I have also attempted various LTRIM, RTRIM, TRIM formulas without luck.
I found some information on using REGEXP_SUBSTR, but wasn't successful there either.
I was able to accomplish my goal in Excel using Excel string functions MID, LEN, and RIGHT, but that doesn't seem to translate in NetSuite.
I'd love some assistance.
REGEXP_SUBSTR({url}, '//(.)+') --> get substring starting with //
REPLACE({text}, '/') --> replace / with nothing
The final formula is:
REPLACE(REGEXP_SUBSTR({url}, '//(.)+'), '/')
Jala's answer doesn't seem to work for URLs such as https://stdun7.wixsite.com/stdunstansparish where it returns stdun7.wixsite.comstdunstansparish
In your saved search, create a Formula (Text) field with the following formula:
REGEXP_REPLACE({url},'(^http[s]?://)([a-zA-Z0-9.-]*)(/?.*)', '\2')
I'll break down the arguments for the REGEXP_REPLACE function and how it all works...
First argument - {url}, the field containing the URL to parse
Second argument - the regexp string
Third argument - the replacement string
The regexp string uses parentheses to denote capture groups, each covering a portion of the regular expression.
The first capture group captures the protocol portion of the URL.
The second capture group captures the next part, all permissible hostname characters until the end of the string, or until a '/'
The third capture group captures the remaining portion of the string.
The replace string is used to prepare the return value of the REGEXP_REPLACE function. Since the entire URL is matched by the regexp, the entire string is replaced by this expression, which references the second capture group (the hostname).
Since you say you're new to NetSuite formulas, I'll note that those functions are based on Oracle PL/SQL, so if you want additional info or examples of how they work beyond what NetSuite provides, it is sometimes instructive to search for things like "pl/sql REGEXP_SUBSTR" to find additional documentation on how they work.
Another good resource is regex101.com, a helpful site to test regular expressions in advance....
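If you want to try the three-capture-group idea outside NetSuite first, here is a rough Python sketch of the same approach (protocol, hostname, everything else), replacing the whole match with group 2. The hostname function name is hypothetical, and Python's re engine is not Oracle's, so treat it as an illustration of the technique rather than a drop-in formula.

```python
import re

# Group 1: protocol, group 2: hostname characters, group 3: the rest.
URL_PARTS = re.compile(r"^(https?://)([a-zA-Z0-9.-]*)(/?.*)$")


def hostname(url):
    """Replace the whole URL with its second capture group (the hostname)."""
    return URL_PARTS.sub(r"\2", url)


print(hostname("https://stdun7.wixsite.com/stdunstansparish"))  # stdun7.wixsite.com
print(hostname("http://abc.com/"))  # abc.com
```

A value that does not start with http:// or https:// is returned unchanged, because the pattern does not match it.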

Get the host name from url without www or extension in asp.net

Hello, I need a way to find the host part of a URL. I've tried
Request.Url.Host.Split('.')
but it doesn't work with url like this:
sub.sub.domain.com
or
www.domain.co.uk
since you can have a variable number of dots before and after the domain.
I need to get only "domain".
Check out the second answer at Get just the domain name from a URL?
I checked the pastebin link; it's active. I didn't test the code myself, but if it outputs as he describes, you can .split() from there.
If you need to be totally flexible, you need to make a list of all possible top-level domains and try to remove those, along with the preceding dot, from the end of your string, resulting in
www.domain
or
sub.sub.domain
Then take the characters after the last remaining dot.
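The suffix-list approach above could look roughly like this in Python. KNOWN_SUFFIXES is a hypothetical and deliberately tiny list, and domain_label is a made-up helper name. A real list would have to cover every suffix you expect to encounter.

```python
# Hypothetical, deliberately tiny suffix list for illustration only.
KNOWN_SUFFIXES = ["co.uk", "com", "org", "net"]


def domain_label(host):
    """Strip a known suffix from the end, then return the last remaining label."""
    host = host.lower().rstrip(".")
    # Try longer suffixes first so a multi-part suffix such as "co.uk"
    # is preferred over any shorter entry.
    for suffix in sorted(KNOWN_SUFFIXES, key=len, reverse=True):
        if host.endswith("." + suffix):
            remainder = host[: -(len(suffix) + 1)]
            return remainder.rsplit(".", 1)[-1]
    return None  # suffix not in the list


print(domain_label("sub.sub.domain.com"))  # domain
print(domain_label("www.domain.co.uk"))  # domain
```

Returning None for an unknown suffix makes it obvious when the list needs extending, instead of silently returning the wrong label.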

ASP.NET Routing: Formatting the URL string

I have implemented a routing functionality successfully in my project (a news website):
Sub RegisterRoutes(ByVal routes As RouteCollection)
routes.MapPageRoute("ndetails", "news/{title}/{id}/", "~/newsdetail.aspx")
End Sub
and I set the URLs like this (databound to a repeater):
href="<%# Page.GetRouteUrl("ndetails", new with { .title= Server.UrlEncode(Eval("Title")), .id= Eval("NewsID")})%>"
The URL produced is like:
/this%20is%20a%20news%20item/89
As can be seen above, the URL part is difficult to read and I would like it to be like:
/this_is_a_news_item/89
I thought of going for a Replace function. But then, since the user creating the news might enter any string, I have to take into account all the other characters that might need to be replaced.
I just wanted to know from an experienced developer whether going with a long replace function is the way to go, or whether there is another solution to format my URLs in this routing scenario.
Many thanks in advance
AFAIK there is no built-in functionality in the framework to make a URL "pretty". You have to implement your own function for rewriting the title.
When saving your entities, simply use a function that does the replacements you need (' ' with '_', for example) and then use UrlEncode.
You can also use a Regular expression to do the replacement in one go.
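As a rough illustration of the "one go" regex replacement, here is a Python sketch. The slugify name is hypothetical, and two choices are my assumptions, not anything the framework requires: lowercasing the result, and treating only ASCII letters and digits as safe. The same character class should carry over to Regex.Replace in .NET.

```python
import re


def slugify(title, separator="_"):
    """Collapse every run of non-alphanumeric characters into one separator."""
    # Assumption: only ASCII letters and digits are kept; anything else
    # (spaces, punctuation, accented letters) becomes the separator.
    slug = re.sub(r"[^A-Za-z0-9]+", separator, title)
    return slug.strip(separator).lower()


print(slugify("this is a news item"))  # this_is_a_news_item
print(slugify("Breaking: 50% off & more!"))  # breaking_50_off_more
```

Since the route also carries the numeric id, the slug does not have to be unique or reversible; it only needs to be readable.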

asp.net allow german characters in Url

I am using RegularExpressionValidator control with
[http(s)?://]*([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
regular expression to validate Url. I need to allow german characters
(ä,Ä,É,é,ö,Ö,ü,Ü,ß)
in Url. What should be exact regular expression to allow these characters?
I hope you are aware that it is not easy to use regex for URL validation, because there are many valid variations of URLs. See for example this question.
First your regex has several flaws (this is only after a quick check, maybe not complete)
See here for online check on Regexr
It does not match
http://RegExr.com?2rjl6
Why do you allow only \w and - after the first dot?
but it does match
hhhhhhppth??????ht://stackoverflow.com
You define a character group at the beginning, [http(s)?://], which means "match any one of the characters inside". You probably want (?:http(s)?://) followed by ? instead of *.
To answer your question:
Create a character group with those letters and put it where you want to allow it.
[äÄÉéöÖüÜß]
Use it like this
(?:https?://)?([äÄÉéöÖüÜß\w-]+\.)+[äÄÉéöÖüÜß\w-]+(/[-äÄÉéöÖüÜß\w ./?%&=]*)?
Other hints
The - inside a character group has to be at the start or the end, or it needs to be escaped.
(s)? can be simplified to s?
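To sanity-check the suggested pattern, here is a small Python sketch. The is_valid_url name is hypothetical. Two things here are my choices: fullmatch is used because a validator requires the whole input to match, and re.ASCII is passed because Python's \w is Unicode-aware by default, so without it the explicit German character group would be redundant in this engine.

```python
import re

GERMAN = "äÄÉéöÖüÜß"

# re.ASCII restricts \w to [A-Za-z0-9_], so the German letters are admitted
# only by the explicit character group, as in the suggested pattern.
URL_PATTERN = re.compile(
    r"(?:https?://)?([" + GERMAN + r"\w-]+\.)+[" + GERMAN + r"\w-]+"
    r"(/[-" + GERMAN + r"\w ./?%&=]*)?",
    re.ASCII,
)


def is_valid_url(text):
    """True if the whole input matches the pattern."""
    return URL_PATTERN.fullmatch(text) is not None


print(is_valid_url("http://müller.de/straße"))  # True
print(is_valid_url("hhhhhhppth??????ht://stackoverflow.com"))  # False
```

This only checks the shape of the string. As noted above, full URL validation by regex is much harder than this.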

Help with a regular expression to validate a series of n email addresses separated by semicolons

I'm using an asp.net Web Forms RegularExpressionValidator Control to validate a text field to ensure it contains a series of email addresses separated by semicolons.
What is the proper regex for this task?
I think this one will work:
^([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}(;|$))+
Breakdown:
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4} : valid email (from http://www.regular-expressions.info/)
(;|$) : either semicolon or end of string
(...)+ : repeat all one or more times
Make sure you are using case-insensitive matching. Also, this pattern does not allow whitespace between emails or at the start or end of the string.
The 'proper' (aka RFC2822) regex is too complicated. Try something like (\S+@[a-zA-Z0-9-.]+(\s*;\s*|\s*\Z))+
Not perfect but should be there 90% (haven't tried it, so it might need some alteration)
Note: Not too sure about \Z it might be a Perl only thing. Try $ as well if it doesn't work.
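A quick way to try the stricter pattern (no whitespace allowed) before wiring it into the validator is a small Python sketch like the one below. It assumes the usual @ between local part and domain, and is_email_list is a hypothetical helper name. fullmatch stands in for the validator's whole-input matching.

```python
import re

# Single address: simple practical pattern, not RFC-complete.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}"

# One or more addresses, each followed by a semicolon or the end of input.
EMAIL_LIST = re.compile(r"(?:" + EMAIL + r"(?:;|$))+")


def is_email_list(text):
    """True if text is a semicolon-separated list of addresses, no spaces."""
    return EMAIL_LIST.fullmatch(text) is not None


print(is_email_list("a@example.com;b@example.org"))  # True
print(is_email_list("a@example.com b@example.org"))  # False
```

Note that a trailing semicolon ("a@example.com;") is accepted by this pattern, and that the {2,4} limit rejects longer top-level domains.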
