Need regex help for URL rewriting querystrings to friendly URLs - asp.net

I updated my website CMS and the URL formats have changed. Where previously I had the URL /blog.aspx?Year=XXXX&Month=YY I now have /blog/XXXX/YY
Can someone help me create a regex for this?
Two additional notes:
It also has to support the year on its own (/blog.aspx?Year=XXXX).
The old Month URLs use only one digit for single-digit months (/blog.aspx?Year=2009&Month=2 instead of Month=02).
Here is what I came up with:
/blog.aspx[?]Year=([0-9]{4})([&]?)(Month=)?([0-9]*)
I can't seem to get it to work, as I still get a 404 on the page when I go to one of the above URLs.

Is this workable?
/blog.aspx\?Year=([0-9]{4})(?>\&?Month=?([0-9]{1,2})|)
It works with these inputs:
/blog.aspx?Year=1983&Month=2
/blog.aspx?Year=1983
/blog.aspx?Year=1983&Month=12
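It uses the (?>blabla|moomoo) syntax, an atomic group containing an alternation: if it can't match blabla, it will match moomoo. Here the second alternative is empty, which makes the Month part optional.
Though I suspect the regex is not the root problem here. What CMS handles the redirect?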
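For anyone who wants to try the mapping outside the CMS, here is a minimal sketch in Python (not ASP.NET, purely for illustration). It uses a simplified pattern with a plain optional group instead of the atomic group, and the function name rewrite_blog_url is made up for this example. Zero-padding the month to two digits is an assumption based on the /blog/XXXX/YY format in the question.

```python
import re

# Simplified variant of the pattern above: escaped dot, anchored,
# and an ordinary optional group for the Month part.
OLD_BLOG_URL = re.compile(r"^/blog\.aspx\?Year=([0-9]{4})(?:&Month=([0-9]{1,2}))?$")


def rewrite_blog_url(old_url):
    """Map an old querystring URL to the new friendly form, or None if it does not match."""
    match = OLD_BLOG_URL.match(old_url)
    if match is None:
        return None
    year, month = match.group(1), match.group(2)
    if month is None:
        # Year-only URL, e.g. /blog.aspx?Year=1983
        return f"/blog/{year}"
    # Assumption: the new scheme wants two-digit months (2 -> 02).
    return f"/blog/{year}/{int(month):02d}"


for url in ("/blog.aspx?Year=1983&Month=2", "/blog.aspx?Year=1983", "/blog.aspx?Year=1983&Month=12"):
    print(url, "->", rewrite_blog_url(url))
```

If this maps the URLs correctly but the site still returns a 404, the problem is more likely in how the rewrite rule is registered than in the regex itself.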

Related

How to remove a trailing question mark (empty query string) from the URL?

Given a URL such as example.com/page? I would like to strip the trailing question mark from the URL to avoid Google treating example.com/page? as a duplicate of example.com/page.
Ideally, example.com/page? should return a 301 redirect to example.com/page. How can I accomplish this with Nginx?
Thank you.
The question mark is not part of the path, so the URL with and without ? points to the same location. You don't need to fix anything for Google or in Nginx.
If you just want prettier URLs, fix the code that generates the ugly ones.
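To illustrate the point about the path, here is a small Python sketch (chosen only for illustration, the question is about Nginx). It splits both forms of the URL with the standard library parser; the http:// scheme and the helper name url_parts are assumptions added for the example. It only shows how one URL parser sees the two strings, not how any search engine treats them.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return the (scheme, host, path, query) components of a URL."""
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc, parts.path, parts.query)


with_mark = url_parts("http://example.com/page?")
without_mark = url_parts("http://example.com/page")

# Both have the path "/page" and an empty query string.
print(with_mark)
print(without_mark)
print(with_mark == without_mark)
```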

Regex for fixing URL patterns

I have the following url structure:
http://www.xyxyxyxyx.com/ShowProduct.aspx?ID=334
http://www.xyxyxyxyx.com/ShowProduct.aspx?ID=1094
and so on..
Recently I used IIS rewrite to rewrite this structure as
http://www.xyxyxyxyx.com/productcategory/334/my-product-url
http://www.xyxyxyxyx.com/productcategory/1094/some-other-product-url
and so on..
This works fine.
I want to create another rule so that if an invalid URL request comes in with the following structure:
http://www.xyxyxyxyx.com/productcategory/ShowProduct.aspx?ID=334
the 'productcategory' part should be removed from the url and the url should look like
http://www.xyxyxyxyx.com/ShowProduct.aspx?ID=334
How do I write this rule?
It may vary depending on what you are using to apply the regex, but here's a basic one:
's|productcategory/||'
If you want to make sure it also only does this when the xyxyxyxyx url is present, this should work:
's|^http://www\.xyxyxyxyx\.com/productcategory/|http://www.xyxyxyxyx.com/|'
Edit: Ah, so if productcategory could be any category, then you'll need to match around it, like so:
's|^http://www\.xyxyxyxyx\.com/.*/ShowProduct|http://www.xyxyxyxyx.com/ShowProduct|'
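The same substitution can be tried out in Python before porting it to an IIS rewrite rule. This is only a sketch: the function name strip_category is made up, and it matches exactly one path segment ([^/]+) before ShowProduct.aspx rather than .*, which is a deliberate narrowing so the already rewritten product URLs are left alone.

```python
import re

# One arbitrary category segment directly in front of ShowProduct.aspx.
CATEGORY_PREFIX = re.compile(r"^(http://www\.xyxyxyxyx\.com)/[^/]+/(ShowProduct\.aspx)")


def strip_category(url):
    """Drop a single category segment that precedes ShowProduct.aspx, if present."""
    return CATEGORY_PREFIX.sub(r"\1/\2", url)


print(strip_category("http://www.xyxyxyxyx.com/productcategory/ShowProduct.aspx?ID=334"))
print(strip_category("http://www.xyxyxyxyx.com/productcategory/334/my-product-url"))
```

URLs that do not contain a segment followed by ShowProduct.aspx come back unchanged, so the rule would not interfere with the friendly URLs.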

Get the host name from url without www or extension in asp.net

Hello, I need a way to find the host part of a URL. I've tried
Request.Url.Host.Split('.')
but it doesn't work with URLs like this:
sub.sub.domain.com
or
www.domain.co.uk
since you can have a variable number of dots before and after the domain.
I need to get only "domain".
Check out the second answer at Get just the domain name from a URL?
I checked the pastebin link; it's active. I didn't test the code myself, but if it outputs as he describes, you can .split() from there.
If you need to be totally flexible, you need to make a list of all possible top-level domains and try to remove those, with the dot, from the end of your string, resulting in
www.domain
or
sub.sub.domain
Then take the characters after the last dot.
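A rough sketch of that approach in Python (for illustration only, the same logic ports to C#). The suffix list here is a tiny made-up sample, not a complete list of top-level domains, and the function name domain_label is hypothetical.

```python
# Illustrative sample only; a real list of suffixes is far longer.
KNOWN_SUFFIXES = ("co.uk", "com", "org", "net")


def domain_label(host):
    """Return the label just before a known suffix, or None if no suffix matches."""
    host = host.lower().rstrip(".")
    # Try longer suffixes first so "co.uk" is tested before shorter ones.
    for suffix in sorted(KNOWN_SUFFIXES, key=len, reverse=True):
        tail = "." + suffix
        if host.endswith(tail):
            remainder = host[: -len(tail)]  # e.g. "www.domain" or "sub.sub.domain"
            return remainder.rsplit(".", 1)[-1]  # characters after the last dot
    return None


print(domain_label("sub.sub.domain.com"))
print(domain_label("www.domain.co.uk"))
```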

301 Redirect with Regular Expressions

Couldn't find an answer to this and thought it might be a quick answer.
My company, a local news site, is working on migrating to WordPress from a proprietary CMS. Part of the challenge is we are restructuring URLs. I will be utilizing 301 redirects but my issue is as follows:
Example Page name: Story Name: is "this"
Example Old CMS Page URL: /story-name--is--this-/
New CMS Page URL: /news/2012/09/12/story-name-is-this/
The old CMS turned special characters and spaces into hyphens. WordPress will be configured to instead ignore special characters and simply turn spaces into hyphens. Additionally, the old CMS did not include the date in the URL, and I'm not sure the best route to take regarding adding the date.
Thanks!
You're either going to have to write a script that takes each old link, looks it up in your database to find the new link, and redirects the browser there, or you'll have to enumerate the entire mapping of old links -> new links and create a 301 redirect for each of them (in either your vhost/server config or in an htaccess file):
Redirect 301 /story-name--is--this-/ /news/2012/09/12/story-name-is-this/
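If you go the enumeration route, the Redirect lines can be generated rather than typed by hand. The sketch below is in Python and rests on an assumption: the old CMS rule is inferred from the single example in the question (lowercase the title, turn every non-alphanumeric character into a hyphen). The function names are made up, and the new path is taken as given, for example from a database export.

```python
import re


def old_cms_path(title):
    """Guess the old CMS path: every non-alphanumeric character becomes a hyphen (assumed rule)."""
    return "/" + re.sub(r"[^a-z0-9]", "-", title.lower()) + "/"


def redirect_line(old_path, new_path):
    """Format one Apache 'Redirect 301' directive."""
    return f"Redirect 301 {old_path} {new_path}"


old_path = old_cms_path('Story Name: is "this"')
print(redirect_line(old_path, "/news/2012/09/12/story-name-is-this/"))
```

Check the generated old paths against a sample of real URLs first, since one example is thin evidence for the old CMS's rules.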
It's not clear what your real question is. I am also not sure what regular expressions have to do with the problem.
There is no information about what your old CMS is capable of. Assuming you can intercept requests for old articles before they are rendered, you can build the redirect URL dynamically, using whatever programming mechanisms your proprietary CMS offers, and send the redirect back to the browser.
Again, assuming you have access to Java:
A. When generating the redirect URL you can access the article's date and build the 2012/09/12 part from it. You can use SimpleDateFormat to format a Date into a string with a pattern like yyyy/MM/dd.
B. You can use a similar approach with the titles and replace the special characters in the title string with spaces. For example, the Apache StringUtils library lets you specify a set of characters to look for, and any that are found will be replaced with the target character.
C. You concatenate the output of A and B to create the target redirect URL and send it back to the browser instead of the article itself.
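Steps A to C could look roughly like this. The sketch is in Python rather than Java, purely to keep it short, and the function names are made up. It drops special characters instead of replacing them with spaces, following the question's description of the WordPress setup; real WordPress slugs may differ in edge cases, so treat the slug rule as an assumption.

```python
import re
from datetime import date


def slugify(title):
    """Drop special characters, lowercase, and join the words with hyphens (assumed rule)."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", title)
    return "-".join(cleaned.lower().split())


def redirect_target(title, published):
    """Concatenate the date part (step A) and the slug (step B) into the new URL (step C)."""
    return f"/news/{published:%Y/%m/%d}/{slugify(title)}/"


print(redirect_target('Story Name: is "this"', date(2012, 9, 12)))
```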

What is the name for that thing that lets part of the URL be an argument?

For example:
http://stackoverflow.com/questions/698627/ms-access-properties
The number is part of the URL but is an argument to the web app as opposed to other options like:
http://www.google.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official
where all the args come after the '?'. I have used the second form before and I'm only trying to learn about the first form.
I'm sure I can find what else I need once I know what that's called so I can Google it.
URL Rewriting, generally.
Edit: Here is a good introduction to URL Rewriting.
Variables passed in the form of a URL are called the Query String. In a url like:
http://examples.com?a=b&c=d&e=f
The query string is ?a=b&c=d&e=f
In the Stackoverflow example, it uses URL Rewriting, specifically with MVC Routing to make 'pretty URLs'. There are other ways to do it in other languages. Some make use of Apache's mod_rewrite (example) while others parse the requested URI. In PHP a url like
http://example.com/index.php/test/path/info
can be parsed by reading $_SERVER['PATH_INFO'] which is /test/path/info.
Generally, they are using URL Rewriting to simulate the query string however. In the Stackoverflow example:
http://stackoverflow.com/questions/698711/what-is-the-name-for-that-thing-that-lets-part-of-the-url-be-an-argument
The important parts are the questions/698711. You can change the title of the question with impunity but the other two parts you cannot.
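To make the two styles concrete, here is a short Python sketch (not how Stack Overflow or any framework actually implements routing). The function names and the returned dictionary keys are hypothetical. The first function reads arguments from the query string, the second reads them from path segments.

```python
from urllib.parse import parse_qs, urlsplit


def query_args(url):
    """Arguments passed after the '?', as a dict of lists."""
    return parse_qs(urlsplit(url).query)


def question_args(url):
    """Arguments embedded in the path, for URLs shaped like /questions/<id>/<title>."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) < 2 or segments[0] != "questions" or not segments[1].isdigit():
        return None
    title = segments[2] if len(segments) > 2 else None
    return {"id": int(segments[1]), "title": title}


print(query_args("http://examples.com?a=b&c=d&e=f"))
print(question_args("http://stackoverflow.com/questions/698627/ms-access-properties"))
```

Only the id segment identifies the question in this sketch, so changing the title segment yields the same id.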
It's usually called the 'path info'.
That's just URL mapping. It lets you use pretty URLs instead of a large query string.
I believe the StackOverflow URL works that way because it is using MVC whereas your bottom example is using standard requests.
It is indeed done by URL rewriting.
Usually, web application frameworks do this automatically if you install it correctly on your server.
Check out CakePHP as an example.
It's called a URL parameter and uses the HTTP GET method. As others mentioned, it can be rewritten using URL rewriting so that the URL is easier to read and use. Some search keywords: "SEF URLs", "Apache Rewrite", "pretty URLs".

Resources