why are there so many duplicate words in url - http

Like this one, why are there so many duplicate words in url?
http://medicine.uiowa.edu/biochemistry/proteomics/biochemistry/biochemistry/biochemistry/biochemistry/biochemistry/node/451
Even when I add more biochemistry, it still works! Anyone can explain?

I used Chrome's Network Inspector, but all browsers have this capability. When a request is made to https://medicine.uiowa.edu/biochemistry/, the response code is a nice 200. If you hit https://medicine.uiowa.edu/biochemistry/proteomics/, you'll see that you get a 301, meaning that this link has been moved permanently, and you can see that you've been redirected to just /biochemistry again.
You may also see a 304 Not Modified, which tells the browser that its cached copy is still current, so the server doesn't retransmit the content. Indeed, it appears you can add any number of /proteomics or /biochemistry segments to the URL and it will resolve to the same place. My guess is that whoever set up the web server's routing rules used an overly permissive regular expression.
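That "permissive regular expression" guess can be illustrated with a sketch (a hypothetical pattern, not the university's actual configuration): a routing rule that lets a known path segment repeat will happily match any number of /biochemistry or /proteomics segments and map them all to the same node.

```python
import re

# Hypothetical routing pattern: one or more known section segments,
# followed by a node path. Any repetition of the segments still matches.
route = re.compile(r"^/(?:(?:biochemistry|proteomics)/)+node/(\d+)$")

def resolve(path):
    """Return the node id the route maps to, or None if no match."""
    m = route.match(path)
    return m.group(1) if m else None

print(resolve("/biochemistry/node/451"))                          # 451
print(resolve("/biochemistry/proteomics/biochemistry/node/451"))  # 451
print(resolve("/unrelated/node/451"))                             # None
```

Because the `+` quantifier allows the segment group to repeat indefinitely, every variation collapses to the same node — exactly the behaviour observed with the real URL.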

Related

What is the significance of the _sm_nck=1 querystring?

What is the significance of the querystring:
_sm_nck=1
that I see appended to a lot of page requests? What adds it and why?
We are seeing the occasional page request coming into our site with that parameter added, usually as a second request for the same page from the same IP address a few seconds later. Also, if I Google that parameter, I see a high number of search results with it appended too.
I think it has something to do with ZScaler not having the site in its database of sites. I did not add the _sm_nck myself.

stylesheet linked with question mark and numeric value

I can see this: site.com/assets/css/screen.css?954d46d92760d5bf200649149cf28ab453c16e2b. What is this random alphanumeric value after the question mark? I don't think it's a value the page actually uses, so what is it about?
Edit: also, on refreshing the page, the alphanumeric value stays the same.
It is there to control how the browser caches the CSS. When a stylesheet is requested, some browsers, specifically Internet Explorer, keep a local copy and may reuse it instead of asking the server again.
When a request is given to a server as:
site.com/assets/css/screen.css?skdjhfk
site.com/assets/css/screen.css?5sd4f65
site.com/assets/css/screen.css?w4rtwgf
site.com/assets/css/screen.css?helloWd
The server at site.com sees only:
site.com/assets/css/screen.css
And gives the latest version. But when the HTML page is requesting the browser to fetch the CSS as: site.com/assets/css/screen.css, for the first time, it fetches from the site.com server. There are many possibilities that the content might be changed in the meantime when the next request is sent. So programmers generally add a ?and-some-random-text, which is called Query String. This will force the browser to get a new copy from the server.
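The fact that the value stays the same across refreshes (per the edit in the question) suggests it isn't random at all, but a hash of the file's contents: it only changes when the CSS changes, so the browser can cache aggressively until the file is actually updated. A minimal sketch of how a build step might generate such a value (hypothetical helper, not the site's actual tooling):

```python
import hashlib

def versioned_url(path, contents):
    """Append a SHA-1 of the file contents as a cache-busting query string."""
    digest = hashlib.sha1(contents).hexdigest()
    return f"{path}?{digest}"

css = b"body { margin: 0; }"
print(versioned_url("/assets/css/screen.css", css))
# Same contents -> same URL, so the browser keeps its cached copy;
# changed contents -> new URL, so the browser fetches a fresh copy.
```

This is why the value in the question looks like a 40-character hex string: that is exactly the length of a SHA-1 digest.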
Some more detailed explanation:
It is a well known problem that IE caches too much HTML, even when
given a Cache-Control: no-cache or Last-Modified header on
every page.
This behaviour is really troubling when working with query strings to
get dynamic information, as IE considers it to be the same page
(i.e.: http://example.com/?id=10) and serves the cached version.
I've solved it by adding either a random number or a time string to the
query string (as others have done), like this:
http://example.com/?id=10&t=2009-08-06_13:12:56, which I just ignore
server-side.
Is there a better option? Is there another, cleaner way to accomplish
this? I'm aware that POST isn't cached, but it is semantically
correct to use GET here.
Reference: Random Querystring to avoid IE caching
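The quoted approach — append a timestamp parameter that the server simply ignores — can be sketched like this (a Python stand-in for whatever language the original poster used):

```python
from datetime import datetime
from urllib.parse import urlencode

def cache_busted(base, params):
    """Add a timestamp parameter so IE treats every request as a new URL."""
    params = dict(params, t=datetime.now().strftime("%Y-%m-%d_%H:%M:%S"))
    return f"{base}?{urlencode(params)}"

print(cache_busted("http://example.com/", {"id": 10}))
# -> http://example.com/?id=10&t=<current timestamp>
```

Unlike the content-hash approach, this defeats caching entirely — every request is a cache miss — so it suits dynamic data, not static assets.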

Facebook like button is liking wrong url

First off, I saw similar posts already, but they weren't exactly what I am asking.
I used the Facebook Dev tools to create a Like button for my website, stuck the code in, and the button showed up. The only issue is that it likes the wrong URL when I click the button.
I'm pretty sure the issue is that I have it set to redirect automatically from mydomain.com to the most recent post. I think this is gumming up the works with the like button and causing it to like mydomain.com/mostrecentpost instead of simply liking mydomain.com.
Is there a way to correct this issue without having to get rid of the redirect (because that isn't an option)?
Sorry if that was a little wordy, wanted to make sure I explained the issue fully.
Is there a way to correct this issue without having to get rid of the redirect (because that isn't an option)?
Either don’t redirect in those cases where the user agent header of the request points to it being Facebook’s scraper;
or set the canonical URL of http://example.com/mostrecentpost to just http://example.com/ using the appropriate Open Graph meta tag. (Although that would mean you would not be able to like a single post any more, because all of your posts would just point to your base domain as the “real” website; so the first is probably the better option.)
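The first option — skip the redirect when the request comes from Facebook's scraper — might look like this sketch. The user-agent substrings are the ones Facebook's crawler identifies itself with; `should_redirect` is a hypothetical helper, not tied to any particular framework:

```python
# Tokens Facebook's scraper puts in its User-Agent header.
SCRAPER_TOKENS = ("facebookexternalhit", "Facebot")

def should_redirect(user_agent):
    """Redirect normal visitors, but let Facebook's scraper see the page as-is."""
    ua = (user_agent or "").lower()
    return not any(token.lower() in ua for token in SCRAPER_TOKENS)

print(should_redirect("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # True
print(should_redirect("facebookexternalhit/1.1"))                    # False
```

In your redirect handler, you would only issue the 301/302 when `should_redirect` returns True, so the scraper reads the Open Graph tags from the page the user actually liked.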

Is it kosher to send a 404 or 410, but still show the content?

There was a lot of discussion about deleted questions over on Meta over the past couple of days. One proposal that came up for dealing with the deletion of questions now deemed off-topic was showing some popular deleted questions to everyone - with the grey look that 10k+ users get when viewing a deleted question.
In that look, the background is greyed out, no interaction is possible, but all the content is still accessible:
I proposed the pages could at the same time send a 404 not found or 410 gone if the overwhelming desire is to phase them out from the search index.
So the content would be shown, but a 4xx status code sent.
However, there was a comment criticizing this idea:
Ehhh why send a 404 when the link exists publicly? You're breaking the semantics of the 404 code
I tend to disagree: what is shown in the response body (to satisfy the curiosity of us humans) doesn't really matter, does it? And machines get the 4xx to work with.
Who's right?
In my mind, if you're going to show the original content (yes, the colours are different to a human, but not to a search engine), then returning a not found or gone status is not appropriate. It's either there or it isn't; it can't be simultaneously there and not there (unless of course it's Schrödinger's content).
It would be more appropriate to have the URL permanently redirect to a non-indexed archive URL instead; or, if the original content is genuinely gone, to serve a non-indexed 404 linking to similar content if possible - but I think that needs to be kept short and sweet.
As a user of the internet - I personally hate 404 pages that actually try to display meaningful content.
Ultimately I want to know if you have what I'm looking for or not. If not, then tell me straight. Don't tell me you 'used' to have some content but it was gotten rid of!

Give user a 404 if querystring is missing?

Say that I require a querystring parameter; for example, "itemid". If that parameter is for some reason missing, should I give the user an error page with a 200 status or a "404 Not Found"?
I would favour 404 but I'm not really sure.
Maybe you should give a "400 Bad Request".
The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for more possibilities.
And, like Chris Simpson said, give a "404 Not Found" when no item exists for the given item ID.
You could also check popular RESTful APIs, such as Twitter's, to see how others have handled the problem.
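The 400-vs-404 split suggested above can be sketched in a few lines (pure standard library; a hypothetical handler, to be adapted to your framework): return 400 when the itemid parameter is missing or malformed, 404 when it is well-formed but no such item exists, and 200 otherwise.

```python
from urllib.parse import parse_qs

ITEMS = {"100": "Blue widget"}  # stand-in for a real data store

def handle(query_string):
    """Return an (HTTP status, body) pair for a ?itemid=... lookup."""
    params = parse_qs(query_string)
    itemid = params.get("itemid", [None])[0]
    if itemid is None or not itemid.isdigit():
        return 400, "Bad Request: itemid parameter is required"
    if itemid not in ITEMS:
        return 404, "Not Found: no item with that id"
    return 200, ITEMS[itemid]

print(handle("itemid=100"))  # (200, 'Blue widget')
print(handle(""))            # 400: the request itself is malformed
print(handle("itemid=999"))  # 404: valid request, no such resource
```

The distinction matters for clients and crawlers alike: a 400 says "fix your request", while a 404 says "the request was fine, but nothing lives here".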
Good question. Personally I would use a 200 and provide a user friendly error explaining the problem but it really depends on the circumstances.
Would you also show the 404 if they did provide the itemid but the particular item did not exist?
From a usability standpoint, I'd say neither.
You should display a page that tells the user what's wrong and gives them an opportunity to fix it.
If the link is coming from another page on your site (or another site), then a page that tells them that the requested item wasn't found and redirects them to an appropriate page, maybe one that lets them browse items?
If the user is typing the querystring themselves, then I'd have to ask why, since such URIs aren't typically user-friendly.
You should return a 200 only when the request was answered with an appropriate response, even if that response is just a simple HTML page telling the user a parameter is missing.
The 404 code is for when the user agent requests a resource that is missing.
Check this list for further info
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
Give a 500 error, probably a 501. This will allow a browser or code to handle it through existing onError mechanisms, instead of having to listen for it in some custom way.
The way I see this working, you should return a 200 - because it should be a valid resource.
Let's say one of your URLs is widgets.com/browse.php?itemid=100 - and that URL displays a specific item in your catalog.
Now, a user enters widgets.com/browse.php - what do we expect the action to be? To list all of the items in your catalog, of course (or at least a paginated list).
Rethink your URL structure and how directory levels and parameters relate to one another.
