I'm trying to support a CQ5 (5.5) installation developed by an outside firm for my company.
It appears that my company wanted a pretty 404 page that looked like the rest of the site, and using the custom Sling 404.jsp error handler to redirect to a regular page that merely says "Page Not Found" was the easiest way to do it. The problem is that the 404 page actually returns a 200 status code since it really is just a regular content page that bears a "Not Found" message on it.
This is causing us problems with Google and the GoogleBot, since Google believes all the old search links to now non-existent pages are still valid (200 status code).
Is there any way to configure CQ to return the appropriate 404 status code for the "not found" HTML page that we display? When I am in the CQ Author mode editing the page, I find nothing in page properties or in components that could be added to the page.
Any help would be appreciated, as CQ is not exactly my area of expertise.
You'll have to overlay /libs/sling/servlet/errorhandler/404.jsp file in order to do so - copy it to /apps/sling/servlet/errorhandler/404.jsp and change according to your specification.
And if you are looking specifically into setting appropriate response status code - you can do it by setting respective response property:
response.setStatus(404);
UPDATE: instead of redirecting to the page_not_found.html you might want to include it to the 404.jsp after setting response status:
<sling:include path="path/page_not_found.html" />
You can set the response code fairly easily with this sort of code: response.setStatus(SlingHttpServletResponse.SC_NOT_FOUND);
So for example, a quick-and-dirty implementation on your page_not_found.jsp would be as follows:
<%
response.setStatus(SlingHttpServletResponse.SC_NOT_FOUND);
%>
(or a longer-term/better implementation would be to set it via a tag and a tag library to avoid scriptlets)
If your page_not_found.html page is a static HTML page and not rendered via a jsp, you may need to change your 404.jsp so it redirects to a page that is rendered via a jsp for this approach to work. The status code is set by the server rendering the response. It is not something intrinsic in the HTML itself, so you won't be able to set this in a regular, static HTML page. Something must be done on the server to set this status code. Also see How to Return Specific HTTP Status Code in a Plain HTML Page
Related
On the following page, the number 2, 3 ... at the bottom all point to the same URL. Yet, the different tables will be shown. Does anybody know what specific techniques are used here? How to extract information in these tables using raw HTTP request (I prefer not to use a headless browser to do so)? Thanks.
https://services27.ieee.org/fellowsdirectory/home.html#results_table
It is using Javascript (AJAX) to make HTTP calls to the server.
If you inspect the Network activity in the Developer tools you will see calls to the following URL: https://services27.ieee.org/fellowsdirectory/getpageresultsdesk.html.
They send data from Javascript:
selectedJSON: {"alpha":"ALL","menu":"ALPHABETICAL","gender":"All","currPageNum":1,"breadCrumbs":[{"breadCrumb":"Alphabetical Listing "}],"helpText":"Click on any of the alphabet letters to view a list of Fellows."}
inputFilterJSON: {"sortOnList":[{"sortByField":"fellow.lastName","sortType":"ASC"}],"typeAhead":false}
pageNum: 2
You can see the pageNum property. This is how they request a specific page of results.
When you click the number buttons, some Javascript code makes an AJAX POST request to https://services27.ieee.org/fellowsdirectory/getpageresultsdesk.html;jsessionid=yoursessionid with formData including pageNum: 3 and some other formatting parameters. The server responds with the HTML block of table rows that get loaded into the page. You can look at the requests on that webpage in your browser's network inspector (in the developer tools) to see exactly what HTTP requests are happening.
The link has an onclick handler that changes the href onclick. Go to
https://services27.ieee.org/fellowsdirectory/home.html#results_table
In the console, enter:
window.location=getDetailProfileUrl('lOH1bDxMyI1CCIxo5ODlGg==');
This redirects to Aarons, Jules.
Now go back and enter window.location=getDetailProfileUrl('JJuL3J00kHdIUozoVAgKdg==');
This opens Aarts, Ronald.
Basically, when the link is clicked, the JavaScript changes the url of the link.
To extract them using php, use the file_get_contents() function.
echo file_get_contents('https://services27.ieee.org/fellowsdirectory/home.html#results_table');
That will print out the page. Now scrape it with JavaScript.
echo "<script>console.log(document.querySelectorAll('.name'));</script>";
Hope this helps.
With Crawler4j, I can fetch page linked by a complete url, such as:
<a href='http://www.domain.com/thelink'>
However I found that if the link is relative, such as:
<a href='/thelink'>
Crawler4j will bypass this link(page), and I even have no chance to see the link in shouldVisit(Page referringPage, WebURL url) method.
I do not see any configuration about this in Crawler4j Github page, do I miss something?
As described in the related issue on the project page, it seems that this behaviour is related to the fact, that this specific web-page does a lot of rendering content using ajax / javascript.
However, crawler4j is not able to render javascript styling on demand as it does not include a javascript engine for this purpose. In addition, the script tag is not scanned for URLS yet.
How to add Open Graph properties to an iframe?
For example, this is a Facebook "Like" button generated by default by prettyPhoto (lightbox-like jquery plugin for displaying full-screen images):
<iframe src="//www.facebook.com/plugins/like.php?locale=en_US&href=http%3A%2F%2Fexample.com%2F%23prettyPhoto%2F2%2F&layout=button_count&show_faces=true&width=500&action=like&font&colorscheme=light&height=23" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:500px; height:23px;" allowtransparency="true"></iframe>
This code has the following problems:
in practice, it Likes the page http://example.com instead of the
dynamic photo page (which is http://example.com/#prettyPhoto/2/)
It shows title and description from page http://example.com instead of the dynamic photo page
The http://developers.facebook.com/tools/debug tool, when I pass my dynamic image page address to it (http://example.com/#prettyPhoto/2/) tells me that:
Inferred Property: The 'og:url' property should be explicitly provided, even if a value can be inferred from other tags.
Inferred Property: The 'og:title' property should be explicitly provided, even if a value can be inferred from other tags.
Inferred Property: The 'og:description' property should be explicitly provided, even if a value can be inferred from other tags.
Will setting those properties help to have the correct url and description for the image? How to insert those properties into the "Like" button iframe? For example, let's assume that the og:title is "My super image". How should the mentioned above iframe code look like with this property set, where to put it?
How to add Open Graph properties to an iframe?
Not at all, of course.
Those properties are put into the HTML code of URLs delivering HTML documents; the “iframe” has nothing to do with it whatsoever.
This code has the following problems:
in practice, it Likes the page http://example.com instead of the dynamic photo page (which is http://example.com/#prettyPhoto/2/)
It shows title and description from page http://example.com instead of the dynamic photo page
Those are not problems of the like button code, but of the site at http://example.com.
The hash part of an URL has only client-side meaning – it does not even get transferred to the web server at all. So http://example.com/#prettyPhoto/2/ and http://example.com/ are the exact same URL, from the web server’s view (the server serving example.com). So it will deliver the same data for requests to “both” of those URLs, and that is totally not the like button’s or for that matter Facebook’s fault.
It is most likely the fault of the noob who set up that page on example.com using fancy AJAX an’ stuff, thinking that’s cool and whatnot, but lacking basic knowledge about the techniques involved at the same time.
Will setting those properties help to have the correct url and description for the image?
No; not until the matter of making the individual contents of the site available under individual URLs is not resolved first.
I'm trying to execute an HTTP GET from my website to another website that is brought in via iframe.
On Firefox, you can see in the source that the correct url is in the iframe src along with it's correct parameters-- and it works.
On IE, you can see in the source that the correct url is in the iframe src along with it's correct parameters-- and it doesn't work...
Is there something about IE that doesn't let you pass parameters through an iframe in the querystring?
I've tried refreshing the iframe in IE, I've tried refreshing my page & the iframe in IE, and I've tried copying the url and re-pasting it into the iframe src (forcing it to refresh as if I just entered it into the address bar for that iframe window). Still no luck!
Anyone know why this is happening, or have any suggestions to try to get around this?
Edit: I cannot give a link to this because the site requires a password and login credentials to both our site and our vendor's site. Even though I could make a test account on our site, it would not do any good for the testing process because I cannot do the same for the vendor site. As for the code, all it's doing is creating the src from the backend code on page load and setting the src attribute from the back end...
//Backend code to set src
mainIframe.Attributes["src"] = srcWeJustCreated;
//Front end iframe code
<iframe id="mainIframe" runat="server" />
Edit: Problem was never solved. Answer auto accepted because the bounty expired. I will re-ask this question with more info and a link to the page when our site is closer to going live.
Thanks,
Matt
By the default security settings in IE query parameters are blocked in Iframes. On the security tab under internet options set your security level to low. If this fixes your problem then you know that is your issue. If the site is for external customers then expecting them to turn down their security settings is probably unreasonable, so you may have to find a work around.
Let's say your site is www.acme.com and the iframe source is at www.myvendor.com.
IIRC, most domain-level security settings don't care about the hostname, so add a DNS CNAME to your zone file for myvendor.acme.com, pointed back to www.myvendor.com. Then, in your IFRAME, set the source using your hostname alias.
Another solution might be to have your Javascript set the src to a redirector script on your own server (and, thus, within your domain). Your script would then simply redirect the IFRAME to the "correct" URL with the same parameters.
If it suits you, you can communicate between sites with fragment identifiers. You can find an article here: http://tagneto.blogspot.com/2006/06/cross-domain-frame-communication-with.html
What BYK said. I think what's happening is you are GETting a URL that is too large for IE to handle. I notice you are trying to send variable named src, which is probably very long, over 4k. I ran into this problem before, and this was my code. Notice the comment about IE. Also notice it causes a problem with Firefox then, which is addressed in another comment.
var autoSaveFrame = window.frames['autosave'];
// try to create a temp form object to submit via post, as sending the browser to a very very long URL causes problems for the server and in IE with GET requests.
var host = document.location.host;
var protocol = document.location.protocol;
// Create a form
var f = autoSaveFrame.document.createElement("form");
// Add it to the document body
autoSaveFrame.document.body.appendChild(f);
// Add action and method attributes
f.action = protocol + '//' + host + "/autosave.php"; // firefox requires a COMPLETE url for some reason! Less a cryptic error results!
f.method = "POST"
var postInput = autoSaveFrame.document.createElement('input');
postInput.type = 'text'
postInput.name = 'post';
postInput.value = post;
f.appendChild(postInput);
//alert(f.elements['post'].value.length);
// Call the form's submit method
f.submit();
Based on Mike's answer, the easiest solution in your case would be to use "parameter hiding" to convert all GET parameters into a single URL.
The most scalable way would be for each 'folder' in the URL to consist of the parameter, then a comma, then the value. For example you would use these URLs in your app:
http://example.com/app/param,value/otherparam,othervalue
http://example.com/app/param,value/thirdparam,value3
Which would be the equivalent of these:
http://example.com/app?param=value&otherparam=othervalue
http://example.com/app?param=value&thirdparam=value3
This is pretty easy on Apache with .htaccess, but it looks like you're using IIS so I'll leave it up to you to research the exact implementation.
EDIT: just came back to this and realised it wouldn't be possible for you to implement the above on a different domain if you don't own it :p However, you can do it server-side like this:
Set up the above parameter-hiding on your own server as a special script (might not be necessary if IE doesn't mind GET from the same server).
In Javascript, build the static-looking URL from the various parameters.
Have the script on your server use the parameters and read the external URL and output it, i.e. get the content server-side. This question may help you with that.
So your iframe URL would be:
http://yoursite.com/app/param,value/otherparam,othervalue
And that page would read and display the URL:
http://externalsite.com/app?param=value&otherparam=othervalue
Try using an indirect method. Create a FORM. Set its action parameter to the base url you want to navigate. Set its method to POST. Set its target to your iframe and then create the necessary parameters as hidden inputs. Finally, submit the form. It should work since it works with POST.
I just discovered a rather peculiar issue in IE8 for a HTTPS link. Every time the page tries to access the HTTPS link, it produces an error. This happens only in IE8 and nothing else. Any idea what's going on? I found some items that said that means the files were not loaded, hence the issue and tried some fixes recommended, but they haven't worked so far. This is a .NET site by the way.
https://www.beckshoes.com/cart/cart.aspx
Message: 'Sys' is undefined
Line: 70
Char: 1
Code: 0
URI: https://www.beckshoes.com/cart/cart.aspx
Message: 'Sys' is undefined
Line: 319
Char: 1
Code: 0
URI: https://www.beckshoes.com/cart/cart.aspx
Looks to be a JavaScript error. Firefox handles it fine, but Sys is undefined in IE8 so I'm guessing that the part where it normally gets defined is missing in IE?
Use View > Source.
Is the <script src="..."> coming from the https server as well, or is it coming from http? IE8 may not be loading the script because it isn't coming from the same secure source the rest of the page is coming from. Take the <script src="..."> (if it does not include the protocol and server, use the same one the page is coming from) and paste it into the address bar of a new tab, does the script load/download?
Is the <script src="..."> tag that loads the appropriate library even listed in the source? Maybe it isn't being added because ASP.NET doesn't recognize the User Agent and doesn't think it is capable or something.
Sys is part of the Microsoft script library, and is loaded through the WebResource.axd file.
Your page seems to be mostly working in IE 8 for me - have you fixed it?
I see that you're loading the WebResource at the foot of the page - if the calls to initialise are happening before the script is loaded, then that will be what is causing it - have you deliberately moved this to the footer? The scriptmanager has a property to do this correctly: LoadScriptsBeforeUI. Setting this to false will move those scripts that can be moved down to the bottom of the page.
I notice you've also pushed the viewstate down there, so you're clearly doing som post processing of the html.
The only other thing I can think of is that looking at your page, you've got an IFrame holding the brand rotator, that is requesting it's content from http://www.beckshoes.com/brands.aspx
You really want that under https as well - that's probably not helping.