I'm using permalinks in WP as: domain.com/category/post_name
The issue is that post names have non-latin characters such as chinese, hebrew, and arabic. So it encodes them to something like: %20%18%6b%20 therefore it counts every symbol's character as an actual character, ending up with 3x times more length that it truncates some very short slugs.
How to fix that? Or at least how to extend the length limit at least? I've tried to extend the length of the database field "post_name" from 200 to 500, But it's still truncating short.
You can change post_name by appling the filters for sanitize_title...
Short example:
add_filter('sanitize_title', 'sanitize_title_custom', 10, 3);
function sanitize_title_custom($title, $raw_title, $context){
// do some proccesing with title or raw_title
// assign new result to $title ($title = str_replace(" ","-", $raw_title);// as example )
return $title;
}
but, be careful... bad sanitizing can be security risk... sql injections etc...
Wordpress should not be encoding your post slugs like that. I use utf8 characters in titles and slugs all the time for clients. It works fine.
Are you sure that your database table's charset is utf8? If so, has it been overridden for any of the columns? Also check wp-config.php for define('DB_CHARSET', 'utf8');
I would also disable any plugins and test your permalinks again. Maybe one of your plugins is screwing with your post slugs.
This is a common situation that frustrates me too.
Check this plugin. Clean Trunks
You can set the the maximum url length on plugin configuration page.
(Default: 35 characters.)
Also you have the option to edit stopword list supplied.
Hope this will help. Cheers !!!
For non English URL:
I'm using IIS with fcgi and I found the solution for non english slug in different places on the web:
for Hebrew: here
for more info about IIS7 URL Rewrite and Symbols : here
generally except the IIS configuration for pretty url you need to add to the end of the wp-config.php:
if (isset($_SERVER['UNENCODED_URL']))
$_SERVER['REQUEST_URI'] = $_SERVER['UNENCODED_URL'];
here you will find more about UNENCODED_URL
Why not remove these bad characters when creating/saving the permalink?
Related
i want to remove this character after the domain name https://bolorims.com/?v=bf7410a9ee72 so it should be only https://bolorims.com/ in wordpress
I have try the to permalink but is not working
The permalink doesn't work if you already select the "post name" under the "common settings" and save and i'ts doesn't work check the plugin or custom code you may need to disable it thats all i remember just ask question if you don't understand or it doesn't
Sorry my english is bad because its my 3th language
Python program to remove all the
characters other than alphabets
Function to remove special characters
and store it in another variable
def removeSpecialCharacter(s):
t = ""
for i in s:
if(ord(i) in range(97,123) or ord(i) in range(65,91)):
t+=i
print(t)
s = "$Gee*k;s..fo, r'Ge^eks?"
removeSpecialCharacter(s)
I wrote an RSS2 feed on WordPress a while back, but for some reason, some of the URLs aren't working anymore. The current version of WP is 4.7.2.
For example, https://justhoodsbyawdis.com/product/jh001/feed/ works, but https://justhoodsbyawdis.com/brands/feed/ does not.
Note that https://justhoodsbyawdis.com/product/jh001/ is a valid page on the site, but that https://justhoodsbyawdis.com/brands/ is not, because it is only valid for feeds. The latter results in an "ERROR: This is not a valid feed." message.
Is there a way to make an URL for a RSS2 feed, even without an associated WP page (i.e. without the "/feed/" at the end).
Thanks!
Rob
EDIT 1:
I added a post called "brands", which fixed the problem. The only thing is that the dummy post is viewable by anyone. Any ideas how to block it, but not the feed?
Another problem is that query strings break the feed, for instance:
https://justhoodsbyawdis.com/products/feed/?name=hoodies
doesn't work, although it does without the "?name=hoodies".
How would I make that work?
EDIT 2:
It would appear that the name query string parameter is now causing problems - see:
https://codex.wordpress.org/Function_Reference/register_taxonomy#Reserved_Terms
Is there a way to make it backwards compatible? Otherwise, the existing app that calls the feed will also have to be changed...
I wound up creating dummy pages to fix the invalid feed error.
I had to change the "name" query string parameter to "prod_name" so as to not conflict with reserved terms.
Rob
I got lots of not followed page on Google Webmaster. I check them and is because lots of url are like http://www.mysite.net/2013/06/burn-notice-7%C3%9702-sub-espanol-online.html
whe the correct url have to be http://www.mysite.net/2013/06/burn-notice-7x02-sub-espanol-online.html
Im try to post a title wit many "x" on it and the only that weird %C3%97 when I post for example a new serie episode like this title: Burn Notice 7x02 Sub Español Online. When the x is between number appear %C3%97 and that made my posts duplicate.
So I try to fix changed the database collation from latin1_swedish_ci to utf8_general_ci but is still the same happend. I check as well my wp-config.php and is define('DB_CHARSET', 'utf8');
Please, some body know any good solution to fix all this situation? The database is quite big and supouse if I find a solution I need update the old url.
Thank you on advance
The URL you say Google is using:
http://www.mysite.net/2013/06/burn-notice-7%C3%9702-sub-espanol-online.html
is almost the same as the URL:
http://www.mysite.net/2013/06/burn-notice-7x02-sub-espanol-online.html
as the percent encoded characters actually repreesent Unicode Character 'MULTIPLICATION SIGN' aka it's an '×' not an 'x'. Google is just using the percent encoded version to be safe. That means that your database is probably fine, as it is showing URLs as valid UTF8.
The problem probably lies in how you're interpreting the requested URL and trying to match it to the database. PHP should already be decoding the percent encoded value to '×', so either:
Something is breaking the string (e.g. calling a non-multibyte safe function like strtolower() instead of mb_strtolower()).
Your PHP code is connecting to the database in a character set other than UTF8, please check that your my.cnf file contains 'default-character-set=utf8' in the client section.
or there's some other issue. The URL does appear valid though.
Couldn't find an answer to this and thought it might be a quick answer.
My company, a local news site, is working on migrating to WordPress from a proprietary CMS. Part of the challenge is we are restructuring URLs. I will be utilizing 301 redirects but my issue is as follows:
Example Page name: Story Name: is "this"
Example Old CMS Page URL: /story-name--is--this-/
New CMS Page URL: /news/2012/09/12/story-name-is-this/
The old CMS turned special characters and spaces into hyphens. WordPress will be configured to instead ignore special characters and simply turn spaces into hyphens. Additionally, the old CMS did not include the date in the URL, and I'm not sure the best route to take regarding adding the date.
Thanks!
You're either going to have to write a script that takes all of your old links, does a lookup in your database to transform it into the new link, and redirect the browser to the new link. Or you'll have to enumerate the entire mapping of old links -> new links and create a 301 redirect for each of them (in either your vhost/server config or in an htaccess file):
Redirect 301 /story-name--is--this-/ /news/2012/09/12/story-name-is-this/
It's not clear what is your real question? I am also not sure what Regular expressions have to do with the problem.
There is no information about what your old CMS is capable of, assuming that you can intercept the calls to old articles when they are accessed via the browser, but before they are rendered you can form and send the redirect back to the browser dynamically generating the url using the programming mechanisms available in your proprietary CMS.
Again, assuming you have access to Java:
A. When generating the redirect URL you can access the article's date and form the
2012/09/12 from the date, you can use SimpleDateFormatter to format Dates into a string representation like YYYY/MM/DD.
B. You can use similar approach with the titles and replace the list of special characters in the title string with empty spaces. For example Apache StringUtils library can let you specify a set of characters to look for and if any are found they will be replaced with the target character.
C. You concatenate the output of A and B to create the target redirect URL and send it back to the browser instead of the article itself.
-I'm using a number of WordPress rewrite rules to allow for the injection of country-codes immediately at the beginning of the URL path, which are used to determine a timezone offset. An example:
add_rewrite_rule('^([A-Za-z]{2})/days/([0-9]+)/?$', 'index.php?geo=$matches[1]&m=$matches[2]&post_type=days','top');
This takes a request like www.daysoftheyear.com/days/2011/ (which would usually return all valid content for this request) and allows for, e.g., www.daysoftheyear.com/us/days/2011/ to return the same content but with support for a timezone offset based on the country-code.
This works fine in almost all places, with the exception of a single query type - one for 'days' custom post type pages, e.g., http://www.daysoftheyear.com/days/waffle-day/.
The rules I have in place are:
add_rewrite_rule('^([A-Za-z]{2})/?$', 'index.php?geo=$matches[1]','top');
add_rewrite_rule('^([A-Za-z]{2})/days/([0-9]+)/?$', 'index.php?geo=$matches[1]&m=$matches[2]&post_type=days','top');
add_rewrite_rule('^([A-Za-z]{2})/days/([0-9]+)/([0-9]+)/?$', 'index.php?geo=$matches[1]&m=$matches[2]$matches[3]&post_type=days','top');
add_rewrite_rule('^([A-Za-z]{2})/days/([0-9]+)/([0-9]+)/([0-9]+)/?$', 'index.php?geo=$matches[1]&m=$matches[2]$matches[3]$matches[4]&post_type=days','top');
add_rewrite_rule('^([A-Za-z]{2})/days/([A-Za-z\-].*)/?$', 'index.php?geo=$matches[1]&page=$matches[2]','top');
add_rewrite_rule('^([A-Za-z]{2})/([A-Za-z\-].*)/?$', 'index.php?geo=$matches[1]&pagename=$matches[2]','top');
The fifth rule shoud match http://www.daysoftheyear.com/gb/days/waffle-day/ in much the same way as above, but redirects - I suspect that it's confliucting with the inbuilt rules which attempt to redirect to a correct URL if it's malformed (e.g., if I type a close structural match to a correct URL, it'll redirect me to the correct resource).
I can confirm that the 'raw' URL for this request works - e.g., http://www.daysoftheyear.com/index.php?geo=en&name=soup-month&post_type=days returns a valid and expected result.
I'm not convinced this is a regex rule, rather than a specific challenge with the way WP manages custom post types?
EDIT
Updated to allow for hyphens - no change in behaviour, though regexpal reports that the regex works against the example URL.
Updated after disabling WP canonical redirects functionality - now 404'ing rather than 301'ing to the page.
Updated to use 'page' rather than 'pagename', based on the information here: http://codex.wordpress.org/Class_Reference/WP_Query#Post_.26_Page_Parameters - no change in behaviour.
Updated the code, added a linebreak and clarified that I'm actually referencing line 5, rather than line 4.
This request http://www.daysoftheyear.com/days/waffle-day/ won't match your fourth rule since you didn't allow - inside the group cature : ([A-Za-z].*). Replace this group with ([A-Za-z\-].*) and it should match.
HTH
Resolved; it appears that the above ruleset now works correctly - thanks all!