Redirect page to itself - what's the correct HTTP status code to use?

This is a question about web application architecture rather than coding per se; however, I still think it belongs here, as it's in the problem domain of most web developers:
My problem: I have a page on which the content is not yet complete (only partial content). I don't want to just return a 200 response, because I want it to be clear that the content on the page is only temporary, and that a visitor (Google) should return at a later date to retrieve the correct page.
I'm not sure if there is a status code in the http specification that would be useful here.
I'm thinking about using a 302 redirect to the same URI, but I'm not sure if Google will see this as gaming (I don't see why it should; no-one would 302 to the same URI on a permanent basis, as the page content would be pretty much disregarded).
That's exactly what I want: for the page to be accessible, but for Google to disregard the page, remember the URL, and come back later to index it.
I don't want to use a 'noindex' meta tag with a 200 response, as I fear this will stop the page being re-indexed when the correct content is ready.

206 is the Partial Content status code, but that's not what you are doing here: it's for responses to byte-range requests (partial downloads), not for unfinished pages. What you have here is an "under construction" type page, but only the content on the page is going to change, not the URI. So the right thing to do is just return a 200 and let Google index it.
If you don't want it indexed yet because it is not ready for the public, then add a meta noindex like you say. Google still downloads the page and parses it to find the noindex, but does not index it. Remove the noindex when you are ready and it will start indexing. You can even prompt this by submitting a new sitemap.xml file with your page in it.
Google re-indexes insanely quickly these days, so don't worry too much about temporarily blocking a page with a meta tag.
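To make the temporary blocking concrete, here's a minimal sketch in PHP (the question doesn't state a stack, so treat the language and the $contentReady flag as hypothetical). The X-Robots-Tag response header does the same job as the meta tag:

<?php
// Hypothetical flag: wire this to however you track whether the final content is live.
$contentReady = false;

if (!$contentReady) {
    // Equivalent to <meta name="robots" content="noindex"> in the page head.
    header('X-Robots-Tag: noindex');
}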

Google will automatically re-index the page when the content changes. Or you can force an update somewhere in Webmaster Tools.
Alternatively, you could have the page 302 to an alternate address with your partially completed content until such time as the page is 'finished'. Then copy the final content into your original page and take off the 302.
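In PHP terms, that temporary bounce is a couple of lines (the placeholder path is made up for illustration):

<?php
// 302 = temporary: search engines keep the original URL and retry it later.
header('Location: /photos-under-construction', true, 302);
exit;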

Error codes are reserved for error conditions; there is indeed no such error as "This page is not in its final version". What you might want instead is to specify that the page becomes obsolete and invalid at some later time.
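For example (a PHP sketch of the idea; an Expires date in the past marks the response stale the moment it arrives), the following makes the page obsolete instantly:

<?php
// An Expires date in the past means caches treat the response as already stale.
header('Expires: Thu, 01 Jan 1970 00:00:00 GMT');
// Belt and braces for HTTP/1.1 caches.
header('Cache-Control: no-cache');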

Related

Cannot change og_type

I just replaced a Tumblr website for a client with a brand new WordPress site. And when running it through the Facebook debugger, I get this error:
The object at 'http://example.com/' previously had type 'tumblr-feed:tumblelog' and cannot be changed to an object of type 'website' to avoid data corruption of existing actions.
I Googled "Cannot change og_type" (in quotes) and got literally zero results (well now it seems there are results stemming from this question). Am I really doomed to Facebook data mismatch?
Per the error message
... cannot be changed to an object of type 'website' to avoid data corruption of existing actions.
If the og:type were changed for a URL, any existing user posts linking to it or sharing it, any Open Graph actions referencing it, as well as any likes of the URL would become broken and the user's profiles would be missing content they'd posted before.
I don't believe there's any way around this, as it's an intentional restriction to avoid breaking existing posts, likes, actions, etc referencing a URL. If the posts were broken, content would be removed from or mangled on the user's timeline.
A possible workaround if you want to have a 'new' object at that URL is to use my instructions in this answer about moving URLs: put a Like button on the URL you're trying to change (let's call it A), but pointing to a slightly different, new URL (let's call it B), and then use the redirect mechanism in my answer to bounce users landing at URL B back to A, but serve the metadata describing A on URL B if the Facebook crawler accesses it.
Does the client's site have more than 10,000 likes? If so, Facebook doesn't allow og:type to be changed.
You can update the attributes of your page by updating your page's tags. Note that og:title and og:type are only editable initially - after your page receives 50 likes the title becomes fixed, and after your page receives 10,000 likes the type becomes fixed. These properties are fixed to avoid surprising users who have liked the page already. Changing the title or type tags after these limits are reached does nothing, your page retains the original title and type.
Here's the link to the Open Graph documentation. :)
I would recommend using the Open Graph Debugger to check what Facebook really sees, and whether Facebook has a cached version of your site. (You'll find the debugger here: https://developers.facebook.com/tools/debug)
Note that it doesn't say og:type; it says og_type.
This is hitting me too, since my og:type is set to "shamrockirishbar:shamrockirishbar", BUT the linter is saying og_type (of which there is none in my metadata) is set to "website".

Will Googlebot follow _escaped_fragment_ HTTP redirect?

I have an ajaxified website, and I want all my content to be crawlable. I have a photo gallery, which only loads the photo using ajax, without refreshing the whole page. My root URL is this:
http://mysite/photos
and whenever a photo thumbnail is clicked, it displays the photo without refreshing the whole page, and the hash becomes #!/photo/photoid/phototitle. When you are searching by criteria it becomes, e.g., #!/photos/f-number/1.8/iso/640 for photos shot at f/1.8 and ISO 640 (and more criteria can be appended this way).
When a user opens a URL like http://mysite/photos/#!/photos/f-number/1.8/iso/640, the landing page, using JavaScript, redirects the user to http://mysite/photos/f-number/1.8/iso/640 (without the hashbang). There, the page loads http://mysite/Dynamic/PhotoThumbnails.aspx?f-number=1.8&iso=640 using ajax (yes, JavaScript looks at the location path and parses it according to that format). For the first case (the link to a photo itself rather than a search), again using only JavaScript, the page loads the photo itself (along with some extra tables showing technical info about the photo) from the URL http://mysite/Dynamic/RenderPhoto.aspx?ID=123 (where 123 is the ID of the photo).
Given this information, my problem is simple: I am planning (in my master page load event) to redirect all requests with _escaped_fragment_s to the appropriate RenderPhoto or PhotoThumbnails page, by parsing the _escaped_fragment_ server-side. Will that work? My main concerns are:
Will Google follow the HTTP redirect? (301 or 302)
Will I get into any trouble (such as being removed from the index) because I am not showing the exact same content to Google? (A browser will load a side navigation bar, all those fancy CSS styles, the visually-nice-looking page etc., and then load the real content into a pane on that page, whereas Google will get only the "true" content. My base page, sidebar thumbnail list page, and photo renderer are COMPLETELY different pages which implement their OWN logic, so I cannot ever merge them.)
If there is a risk of being removed for the reasons above, what are my alternatives (no, I cannot merge the pages, it is NOT an option)? Do you recommend taking regular snapshots of pages, caching them, and sending those to Googlebot?
Here is the current BETA of my website (yeah I know about lots of bugs), just to give you the idea how it will work: http://canpoyrazoglu.com/photos
I'm on ASP.NET 4.0, and using jQuery, if it helps.
A new answer to an old question: yes, it will follow it. However, you may end up with both the clean and the #! URLs indexed. Check this out (from the Google Developer Guides):
Note that if you use a permanent (301) redirect, the url shown in our search results will typically be the target of the redirect, whereas if a temporary (302) redirect is used, we'll typically show the #! url in search results.
This is the Google Developer Guide link:
https://developers.google.com/webmasters/ajax-crawling/docs/faq#redirects
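The asker is on ASP.NET, but the server-side half is easy to sketch; here it is in PHP purely for illustration, using the URL scheme from the question (the fragment-parsing logic is a guess at the app-specific mapping):

<?php
// Googlebot requests "#!"-URLs with the fragment moved into a query parameter, e.g.
//   http://mysite/photos?_escaped_fragment_=/photos/f-number/1.8/iso/640
if (isset($_GET['_escaped_fragment_'])) {
    // Split "/photos/f-number/1.8/iso/640" into segments, dropping empty ones.
    $parts = array_values(array_filter(explode('/', $_GET['_escaped_fragment_']), 'strlen'));
    array_shift($parts); // drop the leading "photos" segment
    // Pair up the remaining segments: f-number => 1.8, iso => 640.
    $params = array();
    for ($i = 0; $i + 1 < count($parts); $i += 2) {
        $params[$parts[$i]] = $parts[$i + 1];
    }
    // Per the quoted guide, a 302 keeps the "#!" URL in search results.
    header('Location: /Dynamic/PhotoThumbnails.aspx?' . http_build_query($params), true, 302);
    exit;
}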
Yes, I'm pretty sure it will follow a redirect. The Facebook open graph debugger does, and this blog post advocates implementing redirects: http://www.yearofmoo.com/2012/11/angularjs-and-seo.html

Knowing the status of a form POST that is protected by X-Frame-Options

I am going to describe my specific case below, but this might be useful to a number of web-mashups.
My web application POSTs to Twitter by filling a form and then submitting it (via javascript). The target of the form is set to an iframe which has an onload trigger. When the onload trigger is called the application knows that the POST was completed.
This used to work fine until Chrome version 11, which now respects the X-Frame-Options: SAMEORIGIN header sent by Twitter in the POST response. The POST goes through, but the iframe's onload is no longer called.
It still works in Firefox 4, but I suppose that's a bug that will eventually get fixed.
Is there any other way to know the status of the POST? I understand that knowing the contents of the POST response would violate the security policy, but I am not interested in the contents. I would just like the app to be notified when the POST is completed.
If you just need to know when the POST was submitted, and not necessarily whether it succeeded or not, you could poll the iframe's contentWindow and contentWindow.document on an interval. When you can no longer access one of those objects, or when the document has an empty body, that means that the iframe has loaded a page with X-Frame-Options restrictions, which likely means that the submission went through. It's hacky, but it looks like it will work for this purpose. (You'll probably have to go through a few combinations to figure out what the contents of restricted iframes look like in your target browsers.)
You can check by fetching the response headers of the page. In PHP it would look like this:
$url = 'http://www.google.com';
// get_headers() performs its own GET request and returns the response headers.
print_r(get_headers($url));

Setting content expiration on all pages to avoid back button in ASP Classic

Is there a way to set pages to expire in ASP Classic so that the user can't hit back and re-do anything?
Is this a good practice?
If you force the page to 'expire', it would have the opposite effect to the one you want: it would actually force the browser to make the request again (because it's been told that the data it has is expired).
I suspect you might be barking up the wrong tree here, though. Are the pages that "do stuff" using the Query String values as the parameters to take those actions? In other words, is the page that links to the 'action' page doing so via a regular anchor tag with query string parameters in the URL, or via a form using the GET method?
If so, you should change the form submitting that action to a POST form. Doing that will not only result in a prompt in the browser if the person uses the Back or Refresh buttons to try to reload that page, but also helps protect you against Cross-Site Request Forgery attacks. (more info on XSRF here)
What is the problem that you are trying to solve? If the back button is forcing something to be updated on the server, then you are better off making sure that you don't allow pages to be in the browser history that can cause problems.
After a POST, I often do a Response.Redirect so that the POST is not in the browser history. This helps avoid these types of issues.
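The same Post/Redirect/GET idea, sketched in PHP for concreteness (process_order and the confirmation URL are hypothetical; the answer's Response.Redirect is the ASP Classic equivalent):

<?php
// Post/Redirect/GET: handle the POST, then redirect so Back/Refresh
// replays a harmless GET instead of re-submitting the form.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    process_order($_POST);                                  // hypothetical form handler
    header('Location: /order-confirmation.php', true, 303); // 303 "See Other" forces a GET
    exit;
}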

Smart way to disallow users going to a site page directly

A site has hundreds of pages, following a certain sitemap. A user can navigate to page2.aspx from page1.aspx. But if the user goes to page2.aspx directly, say through a bookmarked URL, the user should be redirected to page1.aspx.
Edit: I don't want to go in and add code to every page that needs to fulfill this need.
Note: This is not a cross-page postback scenario.
You might consider something that is based off WorkFlow, such as this: http://blogs.msdn.com/mwinkle/archive/2007/06/07/introducing-the-pageflow-sample.aspx
The WCSF team also included a pageflow application block that you can use as a standalone add-on to your application.
I guess you could check the referrer, and if there isn't one, or it isn't page1.aspx, then redirect back to page1.aspx.
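A minimal sketch of that check in PHP (page names follow the question; and as the next answer notes, the referrer is client-supplied and spoofable):

<?php
// Bounce direct visits back to page1 unless the visitor arrived from it.
// HTTP_REFERER is client-supplied and easily faked or stripped.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (strpos($referer, 'page1.aspx') === false) {
    header('Location: /page1.aspx', true, 302);
    exit;
}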
As another answerer mentioned, you could use the Referrer header, but that can be faked by the client.
Since you don't want to modify each page, you could do something with an IHttpModule. Assuming you have some way of describing the valid page navigations, you could do something like this in the BeginRequest handler (see the sketch after this list):
Check the session for a list of valid pages (using a default list for first visit if none are in the session).
If this request is for an invalid page, redirect to the place the user should be.
Based on this request, set up the list of valid pages and redirect page in the session so it's ready for the next request.
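That answer describes an ASP.NET IHttpModule; here is the same session-based logic as a PHP front-controller sketch (the navigation map is invented just to show the shape):

<?php
// Track which pages the user may visit next; redirect anything off the map.
session_start();

$current = basename($_SERVER['SCRIPT_NAME']); // e.g. "page2.aspx"
$valid = isset($_SESSION['valid_pages']) ? $_SESSION['valid_pages'] : array('page1.aspx');

if (!in_array($current, $valid)) {
    header('Location: /page1.aspx', true, 302); // send the user where they should be
    exit;
}

// Hypothetical map of which pages are reachable from the current one.
$nextPages = array(
    'page1.aspx' => array('page1.aspx', 'page2.aspx'),
    'page2.aspx' => array('page2.aspx', 'page3.aspx'),
);
$_SESSION['valid_pages'] = isset($nextPages[$current]) ? $nextPages[$current] : array('page1.aspx');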
I recently worked with real code that checked whether the referrer was blank and used that as a step in authorization. The idea was that users wouldn't be able to fake a referrer, but you don't need a custom browser to fake one. Users can bookmark your page to delicious, and then delicious.com is the referrer (and not blank).
I've had real arguments about how sophisticated a user needs to be to do certain hacks, i.e. "if users don't know how to set the referrer, then you can trust it". It's true that your users are unlikely to write a custom browser, but there are already Firefox add-ons to set headers, referrers etc., and they're easy to use.
Josh has the best answer: on page2 you should check the page-hit log and see if the user has recently visited page1.
I like a lot of the answers above (specifically the workflow one).
Another option is creating each page as a user control and having page1.aspx control which user control gets loaded. This has the advantage of storing your workflow in a single place instead of on each page.
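In PHP terms, that single-entry-point idea might look like this (step names and view files are made up):

<?php
// One entry point decides which step's view to include,
// so the workflow lives in one place instead of on each page.
session_start();

$step = isset($_SESSION['step']) ? $_SESSION['step'] : 'step1';
$views = array(
    'step1' => 'views/step1.php',
    'step2' => 'views/step2.php',
);
include $views[$step];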
However, I don't think there's a magic bullet out there. It sounds like this security problem is an afterthought, or possibly reported as a bug, and you have been tasked with fixing it quickly and efficiently.
I would start weighing the answers here against their associated cost in hours. I suspect the quickest solution will be to check referrer addresses on each page. Although hackable, it is obscure, and if that risk is acceptable to you it may be the appropriate solution.
