Will Googlebot follow _escaped_fragment_ HTTP redirect? - asp.net

I have an ajaxified website, and I want all my content to be crawlable. I have a photo gallery, which only loads the photo using ajax, without refreshing the whole page. My root URL is this:
http://mysite/photos
and whenever a photo thumbnail is clicked, it displays the photo and the hash becomes #!/photo/photoid/phototitle; when you search by criteria it becomes, for example, #!/photos/f-number/1.8/iso/640 for photos shot at f/1.8 and ISO 640 (more criteria can be appended the same way). When a user opens a URL like http://mysite/photos/#!/photos/f-number/1.8/iso/640, the landing page uses JavaScript to redirect the user to http://mysite/photos/f-number/1.8/iso/640 (without the hashbang), and there the page loads http://mysite/Dynamic/PhotoThumbnails.aspx?f-number=1.8&iso=640 via ajax (yes, the JavaScript looks at the location path and parses it according to that format). For the first case (a link to a photo itself rather than a search), the page, again using only JavaScript, loads the photo itself (along with some extra tables showing technical info about the photo) from the URL http://mysite/Dynamic/RenderPhoto.aspx?ID=123 (where 123 is the ID of the photo).
Given this information, my problem is simple: I am planning (in my master page's Load event) to redirect all requests containing _escaped_fragment_ to the appropriate RenderPhoto or PhotoThumbnails page by parsing the _escaped_fragment_ on the server side; a rough sketch of what I have in mind is at the end of this question. Will that work? My main concerns are:
Will Google follow the HTTP redirect? (301 or 302)
Will I get into any trouble (such as being removed from the index) because I am not showing Google exactly the same content? (A browser will load a side navigation bar, all those fancy CSS styles, the visually-nice-looking page etc., and then load the real content into a pane on that page, whereas Google will get only the "true" content. My base page, sidebar content, thumbnail list page and photo renderer are COMPLETELY different pages which implement their OWN logic, so I can never merge them.)
If there is a risk of being removed for the reasons above, what are my alternatives (no, I cannot merge the pages, it is NOT an option)? Do you recommend taking regular snapshots of the pages, caching them, and serving those to Googlebot?
Here is the current BETA of my website (yeah I know about lots of bugs), just to give you the idea how it will work: http://canpoyrazoglu.com/photos
I'm on ASP.NET 4.0, and using jQuery, if it helps.
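Roughly, this is the kind of thing I have in mind on the master page (a simplified sketch; the exact parsing and the 301-vs-302 choice are still open):

protected void Page_Load(object sender, EventArgs e)
{
    // Googlebot requests http://mysite/photos?_escaped_fragment_=/photo/123/phototitle
    // instead of the #! URL, so the fragment arrives as an ordinary query string value.
    string fragment = Request.QueryString["_escaped_fragment_"];
    if (string.IsNullOrEmpty(fragment))
        return; // normal browser: let the ajax version render as usual

    if (fragment.StartsWith("/photo/"))
    {
        // #!/photo/photoid/phototitle -> the photo renderer
        string id = fragment.Split('/')[2];
        Response.RedirectPermanent("/Dynamic/RenderPhoto.aspx?ID=" + id); // 301; use Response.Redirect for a 302
    }
    else if (fragment.StartsWith("/photos/"))
    {
        // #!/photos/f-number/1.8/iso/640 -> f-number=1.8&iso=640 on the thumbnail page
        string[] parts = fragment.Substring("/photos/".Length).Split('/');
        var pairs = new System.Collections.Generic.List<string>();
        for (int i = 0; i + 1 < parts.Length; i += 2)
            pairs.Add(parts[i] + "=" + HttpUtility.UrlEncode(parts[i + 1]));
        Response.RedirectPermanent("/Dynamic/PhotoThumbnails.aspx?" + string.Join("&", pairs.ToArray()));
    }
}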

A new answer to an old question: yes, it will follow it. You may, however, end up with both the clean and the #! URLs in the index. Check this out (from the Google Developer Guides):
Note that if you use a permanent (301) redirect, the url shown in our search results will typically be the target of the redirect, whereas if a temporary (302) redirect is used, we'll typically show the #! url in search results.
This is the Google Developer Guide link:
https://developers.google.com/webmasters/ajax-crawling/docs/faq#redirects

Yes, I'm pretty sure it will follow a redirect. The Facebook Open Graph debugger does, and this blog post advocates implementing redirects: http://www.yearofmoo.com/2012/11/angularjs-and-seo.html

Related

Show constant URL for site in asp.net

I have a web site with a number of pages, developed in ASP.NET.
I have page URLs like:
example:
1) http://www.xyz.com/Home.aspx
2) http://www.xyz.com/Index.aspx
3) http://www.xyz.com/viewMember?Name=abc&id=1
But whichever page the end user is on, I would like the URL shown to be "http://www.xyz.ie".
Is there any setting in web.config ? If not, is there any other way ?
Please help me...
Thanks in advance.
Jagadi
You cannot keep one single URL for different pages, but you can do some tricks to simulate it.
To make the URL stay the same while the content changes, you need a trick, and I do not recommend it: search engines will not follow what you do and will treat each page as different, users cannot bookmark anything, and an average user can easily find the real URL of a page, often with a single extra click in the browser.
One trick is to use frames or iframes: on the main page you load all the rest inside an iframe (or a frame).
A second trick is to use ajax to load the other content.
Finally, you can use session state to decide what to show the user: the user does not follow links, but triggers postbacks that change the content (a small sketch of this follows below).
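A minimal sketch of that last trick, with example control names (HomePanel, MembersPanel and the link's click handler are just placeholders), in the page's code-behind:

// The URL never changes; a postback just records which content to show.
protected void ShowMembersLink_Click(object sender, EventArgs e)
{
    Session["view"] = "members";
}

protected void Page_PreRender(object sender, EventArgs e)
{
    // Show one panel or the other depending on what is stored in session.
    string view = (Session["view"] as string) ?? "home";
    HomePanel.Visible = (view == "home");
    MembersPanel.Visible = (view == "members");
}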

Redirect page to itself - what's the correct http status code to use

This is a question about web application architecture rather than coding per se, however I still think it belongs here as it's in the problem domain of most web developers:
My problem: I have a page on which the content is not yet complete (only partial content). I don't want to simply return a 200 response, because I want it to be clear that the content on the page is only temporary and that a visitor (Google) should return at a later date to retrieve the correct page.
I'm not sure if there is a status code in the http specification that would be useful here.
I'm thinking about using a 302 redirect to the same URI, but I'm not sure if google will see this as gaming (I don't see why it should - no-one would 302 to the same URI on a permanent basis as the page content would be pretty much disregarded).
That's exactly what I want: For the page to be accessible - but for google to disregard the page, remember the URL and come back later to index it.
I don't want to use a meta 'no-index' tag with a 200 response as I fear this will stop the page being reindexed when the correct content is ready.
206 is the Partial Content status code, but that's not what you're doing here; it's for responses to range requests (partial downloads), not for an unfinished page. What you have is an "under construction" type page where only the content is going to change, not the URI. So the right thing to do is just return a 200 and let Google index it.
If you don't want it indexed yet because it is not ready for the public, then add a meta no-index like you say. Google still downloads the page and parses it to find the no-index, but does not index it. Remove the no-index when you are ready and Google will start indexing it. You can even prompt this by submitting a new sitemap.xml file with your page in it.
Google re-indexes insanely quickly these days so don't worry too much about temp blocking a page with a meta tag.
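In ASP.NET, adding that tag from the code-behind might look roughly like this (it needs a head element with runat="server"; the "still partial" flag is a placeholder for however you detect the page isn't finished):

protected void Page_Load(object sender, EventArgs e)
{
    bool contentIsStillPartial = true; // placeholder: your own "not finished yet" check
    if (contentIsStillPartial)
    {
        // HtmlMeta lives in System.Web.UI.HtmlControls
        var robots = new System.Web.UI.HtmlControls.HtmlMeta();
        robots.Name = "robots";
        robots.Content = "noindex";
        Page.Header.Controls.Add(robots);
    }
}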
Google will re-index the page when the content changes automatically. Or you can force an update somewhere in the webmaster tools.
Alternatively, you could have the page 302 to an alternate address with your partially completed content until such time as the page is 'finished'. Then copy the final content into your original page and take off the 302.
Error codes are reserved for error conditions; there is indeed no such error as "this page is not in its final version". What you might want instead is to specify that the page becomes obsolete and invalidated at some later time. For example, in ASP.NET something along these lines marks the page as becoming obsolete immediately:
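// Sketch: cache/expiry headers that mark the response as already stale,
// so clients and crawlers treat it as immediately obsolete.
protected void Page_Load(object sender, EventArgs e)
{
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    Response.Cache.SetExpires(DateTime.UtcNow.AddSeconds(-1)); // an expiry date in the past
}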

how to get search engines to understand a DB driven asp.net site

All,
This would seem like a fairly basic asp.net question - but in all my years of coding, I've never really thought about it.
Say you have an ASP.NET 2.0 site with only a master page and a default.aspx, and it's a blog that saves all its data into the database. Links on the side are generated automatically. So... the URL is always just http://www.XXXXX.com/default.aspx.
So, with that being the case, what do you need to do so that ... say google ... knows about all the different blog entries and links directly to the entries instead of just the base URL?
Is it as simple as changing the forms method to: method="get"?
Thanks, L. Lee Saunders
There are at least two solutions:
Search engines understand query strings, so just add the article IDs to the URLs in your anchor tags -- no need to even use a form control.
Use URL rewriting to expose one set of URLs to the outside world (like /article-title/1234/) in your anchor tags, and then modify the URL to be default.aspx when it arrives at your site; the page could then pull the article to be displayed from any number of places, including but not limited to a query string.
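A minimal sketch of the rewriting half of the second option, assuming the /article-title/1234/ pattern above (the route pattern and query string key are just examples), could go in Global.asax.cs:

using System;
using System.Text.RegularExpressions;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        // Incoming /some-article-title/1234/ is served internally by default.aspx?articleId=1234;
        // the URL the outside world (and Google) sees stays the friendly one.
        Match m = Regex.Match(Request.Path, @"^/[^/]+/(\d+)/?$");
        if (m.Success)
        {
            Context.RewritePath("~/default.aspx?articleId=" + m.Groups[1].Value);
        }
    }
}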
You could have a REST web service so that you can just use URLs to navigate the site, and perhaps a front page with some new posts, so that the spider can navigate the site.
As an example, look at the URLs for SO; it is easy for a spider to navigate this database-driven website.
Create a page that just serves up an XML Sitemap (the data obviously being pulled from your database) and submit the sitemap to Google.
Google will then index any links in your sitemap.
(This assumes that there is some difference between each article - e.g. a querystring key/value.)
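A bare-bones version of such a page might look like the sketch below (the Article type, GetArticles(), and the URL format are placeholders for your own data access):

using System;
using System.Collections.Generic;
using System.Web;
using System.Xml;

// Registered as e.g. sitemap.ashx and submitted to Google as your Sitemap URL.
public class SitemapHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/xml";
        using (XmlWriter writer = XmlWriter.Create(context.Response.OutputStream))
        {
            writer.WriteStartElement("urlset", "http://www.sitemaps.org/schemas/sitemap/0.9");
            foreach (Article article in GetArticles())
            {
                writer.WriteStartElement("url");
                writer.WriteElementString("loc", "http://www.XXXXX.com/default.aspx?postid=" + article.Id);
                writer.WriteElementString("lastmod", article.Modified.ToString("yyyy-MM-dd"));
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
    }

    public bool IsReusable { get { return true; } }

    // Placeholder: replace with a real query against your blog database.
    private IEnumerable<Article> GetArticles()
    {
        yield return new Article { Id = 12345, Modified = DateTime.UtcNow };
    }

    private class Article { public int Id; public DateTime Modified; }
}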
Useful Link(s):
Web Sitemap Generators
Google Sitemap Validator
Google Sitemaps for ASP.NET 2.0 (there are about a gazillion interesting links off the back of this as well).
Some sort of URL rewriting may be the answer.
I wouldn't recommend a postback for your situation; it can get ugly for refreshes etc. So, yes, change the method to "get".
Then, say, your page default.aspx?postid=12345 can be translated into /mm/dd/yy/this-is-my-post.aspx.

iframed ASP actions trouble

This is actually a follow up on my previous question (link)
I've created the HttpHandler and it works fine for now, I'll add flexibility by using the querystring and session to point the post I'm making in the right direction.
The next question is as follows.
Now that I have the old page iframed as it should be, there's still the trouble of handling the postbacks (or actions) these pages trigger.
Every button action (asp form post) refers to a page that is not there (it's on the other server from which I am importing functionality).
I've tried using a URL mapping to the other server, but I get an error telling me the external link is not a valid virtual directory, so I discarded this option.
Is there any way to keep the functionality working inside the iframe?
Please ask for clarification if you need it.
I got a solution from a colleague.
Before passing the response string from the handler to the iframe, I use a String.Replace to adjust the URLs in the old site's markup so that they point back to the old site, and everything works again :)
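Stripped down, the handler does something like this (the old server's address and the page name are just examples; in the real handler the target comes from the query string and session):

using System.Net;
using System.Web;

public class LegacyPageHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        const string oldServer = "http://oldserver.example.com"; // example address
        string html;
        using (var client = new WebClient())
        {
            html = client.DownloadString(oldServer + "/SomePage.aspx");
        }

        // Point relative form actions, links and resources back at the old server,
        // so the postbacks inside the iframe keep hitting the site that actually handles them.
        html = html.Replace("action=\"/", "action=\"" + oldServer + "/")
                   .Replace("href=\"/", "href=\"" + oldServer + "/")
                   .Replace("src=\"/", "src=\"" + oldServer + "/");

        context.Response.ContentType = "text/html";
        context.Response.Write(html);
    }

    public bool IsReusable { get { return false; } }
}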

Track ad link clicks but maintain SEO-friendly links?

I have a web site that spits out links to third party sites. Now these third parties want MY site to track their clicks. How do I do this without ruining the SEO-friendly nature of a plain link?
Currently an ad link is just an anchor:
<a href="http://adsiteA.com/">Come Visit Site A!</a>
I can easily change the links to something like this:
http://mysite.com/clicktracker.aspx?redirect=adsiteA.com
But won't that kill any search engine benefits of linking to their site? If not, I'll happily do it this way... What are my other options? An onmousedown script that hijacks the click and does a postback then redirect?
Do your third party sites want you to report on all the bots and spiders that have crawled your site and followed the links, or just "real" people?
If it's the latter, you could do something along the lines that google use for their search results.
Basically, you render the link out normally, but add an OnMouseDown event to it, so that a spider that doesn't use a mouse follows the standard link, but a normal browser will fire the JS event first.
What you would end up with is something like this:
<a onmousedown="return trackMe(this)" href="http://example.com/">Come Visit Site A!</a>
The trackMe method then performs the redirect to the tracking page, which in turn issues a 302 redirect to the third-party site.
You'd obviously want to check how this works for users navigating via the keyboard or similar (i.e. using Space or Return to follow the links).
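Server side, clicktracker.aspx then just has to record the click and send the visitor on, roughly like this (LogClick and the plain http:// prefix are placeholders for your own logging and URL handling):

protected void Page_Load(object sender, EventArgs e)
{
    string target = Request.QueryString["redirect"];
    if (string.IsNullOrEmpty(target))
        return;

    LogClick(target); // placeholder: write the click to your database or log

    // Response.Redirect issues a 302 to the third-party site.
    Response.Redirect("http://" + target, true);
}

private void LogClick(string target)
{
    // placeholder: persist the click however you need to report it
}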
If they're paid links, Google says they're not supposed to benefit your advertiser's PageRank. (In fact you could get penalized for trying to subvert this)
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66736
