Google Indexing Non-Existent URLs. WordPress Doesn't Show 404 - wordpress

I was inspecting the Google search results for "site:mywordpress.org" and found lots of pages indexed that shouldn't exist.
There are two problems here:
I don't know how Google located, crawled, or found these URLs.
Wordpress doesn't show a 404 error, so it looks like duplicate content.
I tried the Wordpress support forums, but no one responded. I also have not been able to find anyone reporting this problem. Here's an example of what I am seeing:
mywordpress.org/blog-post/
mywordpress.org/blog-post/1363035032000/
I've added a canonical link reference to the head and I've been doing lots of Google WMT removal requests, but I'm still seeing some results like this.
I've tested this on a few WordPress installs. It seems that if you add any string of numbers to the end of a permalink, WordPress still displays the content rather than showing a 404 error.
I also noticed that the number being added to the permalinks is a UNIX timestamp with three zeros on the end, i.e. the time in milliseconds. As of this post the current UNIX timestamp is: 1363035971.
I'm looking for some advice on what I should do. I'm particularly interested in a PHP function that would check the url to see if there was a string of numbers at the end, and if there was, 301 redirect it to the right permalink. I'd also value any input on why Google is finding these wrong urls and if the UNIX time stamp is the clue.

Did you check whether a plugin is causing this? Also check your permalink settings under Settings > Permalinks.
Until you locate the source of the problem, you could work around it by using the Redirect plugin.
This plugin has many features, two features important to your case are:
All URLs can be redirected, not just ones that don't exist
Full regular expression support
So, with the help of a regular expression, you should be able to redirect URLs that end in numbers to the correct URL.

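I had the same issue and found a solution.
WordPress treats a number at the end of a permalink as a page number for posts that are split with <!--nextpage-->. The check below serves the 404 template whenever the requested page number is higher than the number of pages the post really has. Just add this to functions.php: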
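add_action( 'template_redirect', 'so16179138_template_redirect', 0 );
function so16179138_template_redirect()
{
    if ( is_singular() )
    {
        // $page holds the page number WordPress parsed from the URL.
        global $post, $page, $wp_query;
        // Pages are separated by <!--nextpage--> markers in the post content.
        $num_pages = substr_count( $post->post_content, '<!--nextpage-->' ) + 1;
        if ( $page > $num_pages ) {
            // Send a real 404 status, not just the 404 template with a 200.
            $wp_query->set_404();
            status_header( 404 );
            include( get_template_directory() . '/404.php' );
            exit;
        }
    }
}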
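The question also asked for a function that 301-redirects these URLs to the right permalink instead of returning a 404. The following is a minimal sketch of that idea, not a tested plugin. It assumes the unwanted suffix is a single all-digit path segment, and the names so_strip_trailing_number and so_redirect_numeric_suffix are made up for this example.

```php
<?php
/**
 * Return the path without its trailing all-digit segment, or null when the
 * path does not end in one. Pure string logic, no WordPress needed.
 */
function so_strip_trailing_number( string $path ): ?string {
    if ( preg_match( '#^(.*/)\d+/?$#', $path, $m ) ) {
        return $m[1];
    }
    return null;
}

// Hypothetical hook: send /blog-post/1363035032000/ to the real permalink.
function so_redirect_numeric_suffix() {
    global $post, $page;
    if ( ! is_singular() || ! $post ) {
        return;
    }
    // Same page-count check as the answer above, so real paginated URLs still work.
    $num_pages = substr_count( $post->post_content, '<!--nextpage-->' ) + 1;
    $path      = (string) parse_url( $_SERVER['REQUEST_URI'], PHP_URL_PATH );
    if ( $page > $num_pages && null !== so_strip_trailing_number( $path ) ) {
        wp_safe_redirect( get_permalink( $post ), 301 );
        exit;
    }
}

// Only register the hook when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'template_redirect', 'so_redirect_numeric_suffix', 0 );
}
```

A 301 tells Google to drop the numbered URL in favour of the permalink, while the 404 variant simply removes it from the index. Use one or the other, not both.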

Related

How to add #anchor at the end of the Wordpress URL if the traffic source is coming from Google search

I have Wordpress and I would like that all visitors who access my site by searching in google (traffic source), go directly to a specific part of the page (focus).
To focus on this part, I need to include the anchor at the end of the url: #welcome-source-google
https://example.com/category/page-or-post/#welcome-source-google
I think I need to insert this code, probably in functions.php:
function organic_source_anchor() {
if (preg_match('/(www\\.)?google\\./', $_SERVER['HTTP_REFERER'])) {
wp_redirect( get_permalink( $postID ) . '#welcome-source-google' );
exit;
}
add_action( 'template_redirect', 'organic_source_anchor' );
I got this code by researching, but since I am not a programmer, I believe there are several errors or maybe I need to put the code out of functions.
One complication is that I don't want to disable the cache, since this frame appears to all visitors. I only want the focus on that part of the page when the visitor comes from Google.
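Here is a hedged sketch of how the snippet above might be repaired. It closes the function, guards against a missing Referer header, and drops the undefined $postID. The function names, the host pattern, and the from_google query marker are all assumptions made for this example. The marker is there because the #fragment is never sent to the server and the browser may send the same Referer again after the redirect, which could otherwise cause a redirect loop.

```php
<?php
/**
 * True when the referring URL's host looks like google.<tld> or a subdomain.
 * Hypothetical helper; the host pattern is a rough assumption, not an
 * official list of Google domains, and the Referer header can be spoofed.
 */
function so_is_google_referer( ?string $referer ): bool {
    if ( ! $referer ) {
        return false;
    }
    $host = parse_url( $referer, PHP_URL_HOST );
    return is_string( $host )
        && 1 === preg_match( '/(^|\.)google\.[a-z.]{2,6}$/i', $host );
}

function so_google_anchor_redirect() {
    // 'from_google' is a made-up marker that prevents a second redirect.
    if ( ! is_singular() || isset( $_GET['from_google'] ) ) {
        return;
    }
    $referer = isset( $_SERVER['HTTP_REFERER'] ) ? $_SERVER['HTTP_REFERER'] : null;
    if ( so_is_google_referer( $referer ) ) {
        $target = add_query_arg( 'from_google', '1', get_permalink() );
        wp_safe_redirect( $target . '#welcome-source-google' );
        exit;
    }
}

// Only register the hook when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'template_redirect', 'so_google_anchor_redirect' );
}
```

This does not solve the caching concern: a full-page cache usually answers before PHP runs, so the hook would never fire for cached pages.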

Wordpress: Incomplete URL redirects to actual page

Strange behavior for a Wordpress site. When I enter an incomplete URL, instead of getting a 404, I'm getting redirected to a page whose URL starts with the incomplete entry.
For example, when I enter this in my browser:
http://www.launchmoxie.com/jv/timeless
I'm redirected to:
http://www.launchmoxie.com/jv/timelessrhythm/timeless-rhythm-optin-confirmation/
There are several pages that begin with the initial URL, but I'd prefer that the user be given a 404, or I'd be okay with being able to set which of the pages gets served.
This behavior occurs for other pages with similar structures.
I'm pretty mystified. Any help/suggestions would be appreciated.
This is standard WordPress behaviour, and is part of the URL canonicalisation process—it's in redirect_canonical. There's a ticket to make just this auto-completion bit override-able, but it's not made it into a release yet.
In the meantime, there's a workaround suggested in that ticket:
function remove_redirect_guess_404_permalink( $redirect_url ) {
    if ( is_404() )
        return false;
    return $redirect_url;
}
add_filter( 'redirect_canonical', 'remove_redirect_guess_404_permalink' );
...which a helpful soul has also made into a plugin.
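Since this answer was written, WordPress 5.5 added a do_redirect_guess_404_permalink filter that switches off only the URL guessing. A minimal sketch, assuming WordPress 5.5 or later; the callback name is made up for this example.

```php
<?php
// Hypothetical callback; returning false turns the permalink guessing off
// while leaving the rest of redirect_canonical in place.
function so_disable_guess_404_permalink(): bool {
    return false;
}

// Only register the filter when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'do_redirect_guess_404_permalink', 'so_disable_guess_404_permalink' );
}
```

Unlike the redirect_canonical workaround, this leaves other canonical redirects on 404 requests untouched.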

Can I edit my .htaccess to rewrite some WordPress URLs (custom rewrites)?

So here's the problem: We don't like the fact that WordPress doesn't allow duplicate slugs, even for subcategories, meaning we cannot have URLs like:
product-1/guides
product-1/articles
product-2/guides
product-2/articles
That's very annoying! One solution we are considering is setting up our slugs like this:
product-1/product-1-guides
product-1/product-1-articles
product-2/product-2-guides
product-2/product-2-articles
Can we then use our .htaccess to pick up such URLs and rewrite them as prettier URLs that have the product name removed from the subfolder? We don't mind hard coding these as we'll only ever have 5-10 products on the site.
This would keep the WordPress install happy with unique slugs, while still giving us the SEO tick in the box with better-looking URLs.
I just need a hand with the syntax, please.
EDIT 1:
After looking at the WordPress Rewrite API, I'm failing to get anywhere with what I think is a really simple test. I have the following code in my functions.php which is running as I tested an echo, but no rewriting is taking place?
add_action( 'init', 'productRewrites' );
function productRewrites() {
add_rewrite_rule('^wordpress/james?','index.php?author_name=jwilson','top');
}
Nothing happens when I hit:
mysite.com/wordpress/james
Edit 2:
Cool, I realise I now have to click save each time. The problem I now have is that the following does not work when I use $matches[1]. It only works if I hard code the author_name value (to jwilson for example):
function productRewrites() {
add_rewrite_rule(
"writer/([^/]+)/?",
"index.php?author_name=$matches[1]",
"top");
}
When I use $matches[1] it just returns everything! So clearly isn't using ([^/]+) in the url?!
You have to flush the permalink structure after adding or changing a rewrite rule.
To do that, go to Settings -> Permalinks and press the Save Changes button.
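A likely cause of the Edit 2 symptom is the double-quoted string. PHP interpolates $matches[1] while add_rewrite_rule is being called, when that variable does not exist yet, so WordPress receives an empty author_name. The sketch below keeps the placeholder literal by using single quotes. The function names are made up for this example, and so_resolve_rewrite only imitates WordPress's substitution to show the effect.

```php
<?php
// Single quotes keep $matches[1] literal so WordPress can fill it in later.
function so_product_rewrite_rules(): array {
    return array(
        '^writer/([^/]+)/?$' => 'index.php?author_name=$matches[1]',
    );
}

// Illustration only: imitate how a matched rule gets its placeholders filled.
function so_resolve_rewrite( string $path, array $rules ): ?string {
    foreach ( $rules as $regex => $query ) {
        if ( preg_match( '#' . $regex . '#', $path, $matches ) ) {
            return preg_replace_callback(
                '/\$matches\[(\d+)\]/',
                function ( $m ) use ( $matches ) {
                    $i = (int) $m[1];
                    return isset( $matches[ $i ] ) ? $matches[ $i ] : '';
                },
                $query
            );
        }
    }
    return null;
}

function so_register_product_rewrites() {
    foreach ( so_product_rewrite_rules() as $regex => $query ) {
        add_rewrite_rule( $regex, $query, 'top' );
    }
}

// Only register the hook when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'init', 'so_register_product_rewrites' );
}
```

The permalinks still need to be saved once after the rule changes.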

How do I prevent Wordpress from stripping the "at" sign (@) from the URL query string?

I am trying to pass an email address to a wordpress page like so:
http://www.website.com/?email=fakeemail@yeahwho.com
However, Wordpress turns it into this:
http://www.website.com/?email=fakeemailyeahwho.com
I even try URL encoding it like so:
http://www.website.com/?email=fakeemail%40yeahwho.com
But Wordpress is too smart and still removes the %40.
I understand that @ is a reserved character, but I should be able to still use the URL encoded version. Alas, Wordpress does not want it to be so.
How can I force Wordpress to respect the @ sign? I'm guessing I'll either have to hack the internals, or do some mod_rewrite magic.
from http://www.webopius.com/content/137/using-custom-url-parameters-in-wordpress
First, add this to your theme's functions.php file (or make a custom plugin to do it):
add_filter('query_vars', 'parameter_queryvars' );
function parameter_queryvars( $qvars )
{
    $qvars[] = 'email';
    return $qvars;
}
Next, try passing ?email=fakeemail-AT-yeahwho.com in the URL and then converting it back with something like this:
global $wp_query;
if (isset($wp_query->query_vars['email']))
{
    $getemail = str_replace( '-AT-', '@', $wp_query->query_vars['email']);
}
// now use $getemail
This would only fail in the very rare case of an email address that actually has "-AT-" in it. You could use an even more obscure string like '-AT6574892654738-' if you are concerned about this.
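The placeholder round trip can be sketched as a pair of small helpers, one for building the link and one for reading it back. The function names are made up for this example, and the token defaults to the '-AT-' string suggested above.

```php
<?php
// Hypothetical helper: swap the at sign for the placeholder before building the URL.
function so_encode_email_param( string $email, string $token = '-AT-' ): string {
    return str_replace( '@', $token, $email );
}

// Hypothetical helper: restore the at sign after reading the query variable.
function so_decode_email_param( string $value, string $token = '-AT-' ): string {
    return str_replace( $token, '@', $value );
}
```

Inside WordPress the decoded value could come from so_decode_email_param( get_query_var( 'email' ) ) once the query has been parsed.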
Whatever your final solution, don't hack the core to get it to work. :)
I was having a similar problem and was able to isolate the issue to an SEO plugin. I'm sure the plugin added a filter to functions.php, but as the plugin wasn't being used, uninstalling it also resolved the issue.
I also had this problem, but it wasn't caused by a plugin. It was a result of the 301 redirect that WordPress does with regard to your Site URL having, or not having, a www. in it.
If my Site URL was defined as http://www.mydomain.com, then this would work as expected: http://www.mydomain.com/?email=user@domain.com
If the user came to the site as: http://mydomain.com/?email=user@domain.com (NOTE: no www), then WordPress would redirect to this: http://www.mydomain.com/?email=userdomain.com (NOTE: the stripped @ symbol)
My solution was to hard code the www redirect in the htaccess file, so WordPress would never have the opportunity to mess with my URL. This page gives example htaccess lines to redirect non www to www and vice versa: http://dense13.com/blog/2008/02/27/redirecting-non-www-to-www-with-htaccess/
I was having a similar problem today when trying to pass Mailchimp data through to a Gravity Form in Wordpress. I found a solution. The original question stated that Wordpress was also stripping %40, but it didn't for me in this instance.
1) In Mailchimp create a new Merge tag. I called mine 'Email Param' with the tag *|EMAIL2|*
2) Export your list of subscribers
3) Copy the normal 'email' column content into the new 'Email Param' column.
4) Do a Find and Replace for all @ symbols to %40
5) Import your list and tick the box that Auto-updates that list
6) Update your URL to include the new parameter
*|EMAIL2|*
That worked for me.

Wordpress Plugin to Generate non-numeric slug / permalink for posts without titles? (1 post)

I've been looking for this all over, and simply cannot find it.
I have a blog that has no titles in its blog posts, but I'd like, for various usability reasons, to have the permalinks use the first few words from entries that do not have titles as the permalink slug.
ie, if the post on sample.com/blog is
Title: (no title)
Content: Ten Easy Ways to Lose Weight
The permalink could be sample.com/blog/ten-easy-ways-to-lose-weight.
Are there any plugins that do this? For the life of me, I cannot find one. (xposted to WP support, but no one is responding)
You could enter in titles, and then not display them in your view template.
I doubt there's anything like this already built for wordpress. To get your blog to do this, you have to write a plugin that does the following:
Generates the slug while checking for uniqueness should you ever start more than one entry with the same words
Processes URL requests to recognize slug permalinks and then updates the query step to locate the correct post in the database. This might involve a new db table of slugs (which would also help with the uniqueness issue)
In short, WP is designed to retrieve almost everything by keys, and to support slugs like this you'd have to create a new key type.
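The slug-generation step with its uniqueness check can be sketched in plain PHP as follows. Both function names are made up for this example. Inside WordPress, sanitize_title() and wp_unique_post_slug() would be the more thorough built-in tools for the same two jobs.

```php
<?php
/**
 * Build a slug from the first few words of the post content.
 * Hypothetical helper that only handles plain ASCII text.
 */
function so_slug_from_content( string $content, int $max_words = 6 ): string {
    $text  = strtolower( strip_tags( $content ) );
    // Turn every run of non-alphanumeric characters into a single space.
    $text  = preg_replace( '/[^a-z0-9]+/', ' ', $text );
    $words = array_slice( preg_split( '/\s+/', trim( $text ) ), 0, $max_words );
    return implode( '-', $words );
}

/** Append -2, -3, ... until the slug is not in the list of taken slugs. */
function so_unique_slug( string $slug, array $taken ): string {
    $candidate = $slug;
    $suffix    = 2;
    while ( in_array( $candidate, $taken, true ) ) {
        $candidate = $slug . '-' . $suffix;
        $suffix++;
    }
    return $candidate;
}
```

WordPress already stores a slug per post in the post_name field and resolves %postname% permalinks through it. Writing the generated slug into that field, for example from a wp_insert_post_data filter, may therefore be enough without a separate slug table.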
btw: if anything is retrieved by IDs (keys), it is technically not a permalink. so, wordpress probably fails in providing true permalinks.
ps: it's not that difficult to write a handler/dispatcher that would parse the URL, take out the unique permalink, and then match it to the DB by the string (not by the key!).
something like:
// $dispatcher is assumed to be your own object that maps a URL to a PHP file.
$url=$_SERVER["REQUEST_URI"];
echo 'URL called: ',$url,'<br />';
$dispatchfile=$dispatcher->Dispatch($url);
if ($dispatchfile)
{
    echo 'launching ',$dispatchfile,' inclusion<br />';
    require($dispatchfile);
}
else
{
    echo 'dispatcher failed to find module, will check physical file<br />';
    if (file_exists($url)) echo 'dispatcher found physical file<br />';
    else echo 'nada, throw 404!';
}
You can get a permalink redirect plugin from
http://scott.yang.id.au/code/permalink-redirect/
Works fine with WP2.71
It takes the Title and auto-creates a slug from that so you would have to manually enter the slug you wanted for each page if you have a Blank Title.
You should be able to hack Scott's PHP file (it is one page only) to look up the page code and select a portion of it to use as a slug though.
In addition, I solve incorrect page requests using a .htaccess rewrite file to bring up the index page upon an incorrect page request.
Download a copy of my rewrite file here
https://oulixes.com/htaccess_example.zip
Unzip the txt file and rename as .htaccess and upload into your root directory
Hope this helps!
Cheers,
Billy
