WordPress/Apache rewrite/redirect rule and regex

I need Apache rewrite/redirect rules that send requests to an external web service when a 404 error occurs for specific file extensions: .jpg, .png, etc. The site runs WordPress.
So, if 404 occurs at:
https://test.com/folder/subfolder/year/month/filename.jpg
I want to redirect it to:
https://test1.com/folder/subfolder/year/month/filename.jpg (external web service, not the same physical server)
I tried the following configuration in .htaccess, but it didn't work as expected:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*) test1.com/folder/subfolder//$year$\/$month$\/([^\s]+(\.(?i)(png | jpg | gif | svg))$)/
Do you have any ideas how to do it the right way?
Any suggestions appreciated.

Based on the samples and attempts you have shown, please try the following .htaccess rules. They are written for the sample domain names shown, so change the values to match your actual setup before using them. You also need to make sure that both hosts (test.com and test1.com) share the same directory structure on your actual Apache servers.
Also make sure to clear your browser cache before testing your URLs.
RewriteEngine ON
RewriteCond %{HTTP_HOST} ^(?:www\.)?test\.com$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/?$ https://test1.com/$1 [R=301,L]

To "redirect" URLs of the form /folder/subfolder/<year>/<month>/<file>.<png|jpg|gif|svg>, where /folder/subfolder/ is static, the other elements are variable, and the file does not exist on the filesystem, you would need to do something like the following before the WordPress code block, i.e. before the # BEGIN WordPress section.
# Redirect certain non-existent image files to another server
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^folder/subfolder/\d{4}/\d\d/[\w-]+\.(png|jpg|gif|svg)$ https://test1.com/$0 [R=302,L]
# BEGIN WordPress
:
The <year> is a 4-digit number and <month> is a 2-digit number. The filename can consist of the characters 0-9, a-z, A-Z, _ (underscore) and - (hyphen).
This should presumably be a 302 (temporary) redirect, not a 301 (permanent), otherwise if the resource should become available at the source domain then it won't be accessible to those users who have visited the URL before (they will be redirected from cache).
To avoid the external redirect it may be preferable to "proxy" the request to the other domain, which is invisible to the end user. However, this potentially involves additional server-side configuration, as you would need to configure the source server as a "reverse proxy". You can then replace the R=302 flag in the above rule with P (proxy).
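As a rough sketch of that proxy variant, the redirect rule with the flag swapped might look like the following. It assumes mod_proxy and mod_proxy_http are loaded on the source server and, because the target uses https, that SSLProxyEngine On has been set in the server or virtual host configuration. None of that is confirmed for your setup, so treat it as a starting point only.

```apache
# Sketch only: proxy certain non-existent image files to another server.
# Assumes mod_proxy and mod_proxy_http are enabled, and that
# "SSLProxyEngine On" is set in the server/vhost config for the https:// target.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^folder/subfolder/\d{4}/\d\d/[\w-]+\.(png|jpg|gif|svg)$ https://test1.com/$0 [P]
```

The P flag implies L, so no separate L flag is needed.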

Related

Redirecting URL with parameter to top-level domain

I have been trying to search but no luck yet. Found different options about redirecting but nothing like what I'm looking for.
So currently the website URLs contain lang parameter like ?lang=en, ?lang=ru, ?lang=fi. Parameters are on the very end of the URL.
Idea is to move languages to top level domain. So basically I'm looking for a way to redirect all URLs that contain parameter ?lang=ru to top-level .ru domain. Same with other languages.
Can I do it via .htaccess or it shouldn't be done at all? Moving site to different domains should need redirection to pass the link juice and authority to new domains.
Hopefully, someone can lead me to the correct way of doing it.
Much appreciated!
If there are no other URL parameters that need to be preserved then you can do it like the following using mod_rewrite near the top of your .htaccess file:
RewriteEngine On
# Redirect "/foo?lang=xx" to "example.xx/foo"
RewriteCond %{QUERY_STRING} ^lang=([a-z]{2})$
RewriteRule ^ https://example.%1%{REQUEST_URI} [QSD,R=302,L]
The above matches any language code that consists of 2 lowercase letters. To match only specific language codes, use alternation in the regex and change the RewriteCond directive to read:
RewriteCond %{QUERY_STRING} ^lang=(en|ru|fi|abc|etc)$
The QSD flag discards the original lang=xx query string from the redirect response (Apache 2.4). Otherwise this will be copied onto the target URL by default.
The %1 backreference contains the value of the lang URL parameter captured in the preceding condition. The REQUEST_URI server variable contains the full URL-path (no query string) from the request.
This does assume that the language specific TLD domains are hosted elsewhere. In other words, we do not need to check whether we are already at the required TLD domain.
Test with a 302 (temporary) redirect and only change to a 301 (permanent) redirect - if that is the intention - once you have confirmed that this works OK.
UPDATE: Any specific redirects, eg. lang=en to .com will need to appear first. For example:
# Redirect languages to .com
RewriteCond %{QUERY_STRING} ^lang=(en)$
RewriteRule ^ https://example.com%{REQUEST_URI} [QSD,R=302,L]
# All other language codes...
# Redirect "/foo?lang=xx" to "example.xx/foo"
RewriteCond %{QUERY_STRING} ^lang=([a-z]{2})$
RewriteRule ^ https://example.%1%{REQUEST_URI} [QSD,R=302,L]
Use alternation (as mentioned above) if there are more language codes. eg. (en|gb).
You should be able to do this via .htaccess. You can use the code below in your .htaccess file:
RewriteEngine on
# The lang parameter is part of the query string, not REQUEST_URI
RewriteCond %{QUERY_STRING} (^|&)lang=en(&|$)
RewriteRule ^ https://www.yourwebsite.ext [L,R]
I think it should work fine if you write the same for all the others. Do give it a try and let me know.

htaccess restrict access to ALL pages but referrer

I managed to restrict access to my site using the .htaccess directives below. It works pretty well, BUT I found that visitors who do not come from the referrer can still reach inner pages directly, like https://example.com/pages/, and from there navigate back to the home page. How can I restrict the whole site (the entire tree under my root URL) to everyone except the referrer?
# Serve everyone from specific-domain (and internal requests)
RewriteCond %{HTTP_REFERER} ^https?://www\.your-domain\.com/ [OR]
RewriteCond %{HTTP_REFERER} ^https?://www\.specific-domain\.com/
RewriteRule ^ - [L]
# everybody else receives a forbidden
RewriteRule ^ - [F]
From discussion on your other question, it seems you have been putting these directives in the wrong place. It is a WordPress site and the directives have been placed after the WordPress front-controller, ie. after the # BEGIN WordPress ... # END WordPress code block.
This is actually a very common mistake. But order matters.
By placing them at the end of the file they are simply never going to be processed for requests to example.com/<wordpress-url>, because the request has already been routed to the WordPress front-controller (index.php).
These blocking directives need to go at the very top of the .htaccess file. Importantly they must go before the # BEGIN WordPress section.
You should NOT place these directives inside the WordPress code block since WordPress maintains this section and will likely overwrite any custom directives you place here.
You do not need to repeat the RewriteEngine On directive (which appears later in the file - the order of this directive does not matter). In fact, if there are multiple RewriteEngine directives then the last directive wins and controls the entire file/context.
UPDATE#1:
is there a way to exclude a single page from the directives so that this page can still be available even from non referrer - it would be a login page
Yes, you can add an additional condition to the first block that checks for this URL. For example:
# Serve everyone from specific-domain (and internal requests)
RewriteCond %{REQUEST_URI} ^/login$ [OR]
RewriteCond %{HTTP_REFERER} ^https?://www\.your-domain\.com/ [OR]
RewriteCond %{HTTP_REFERER} ^https?://www\.specific-domain\.com/
RewriteRule ^ - [L]
# Everybody else receives a forbidden
RewriteRule ^ - [F]
UPDATE#2:
However, since this is a WordPress site, you still need processing to continue to the front-controller (the # BEGIN WordPress section later in the file) in order to route the URLs. This would explain why you are seeing 404s for /<page> and other WordPress URLs despite the Referer presumably being set correctly.
To resolve this, change the [L] flag in the first RewriteRule to [S=1] (skip 1 rule). Instead of stopping further processing (the effect of the L / last flag), this simply skips the following rule that blocks access for everyone else and continues on to the WordPress front-controller.
For example:
:
RewriteRule ^ - [S=1]
# Everybody else receives a forbidden
RewriteRule ^ - [F]
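Put together, the ordering described above could be sketched as follows. The domain names are the placeholders used earlier, and the WordPress block shown is the commonly generated default, included here only to illustrate position; your own block may differ and should be left as WordPress wrote it.

```apache
# Sketch of the overall file order (placeholder domains).

# Serve everyone from specific-domain (and internal requests):
# skip the blocking rule and carry on to the WordPress block
RewriteCond %{HTTP_REFERER} ^https?://www\.your-domain\.com/ [OR]
RewriteCond %{HTTP_REFERER} ^https?://www\.specific-domain\.com/
RewriteRule ^ - [S=1]

# Everybody else receives a forbidden
RewriteRule ^ - [F]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```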
Alternatively, you could reverse the logic...
# Block everyone from "other" domains except for specific URLs
RewriteCond %{REQUEST_URI} !^/login$
RewriteCond %{HTTP_REFERER} !^https?://www\.your-domain\.com/
RewriteCond %{HTTP_REFERER} !^https?://www\.specific-domain\.com/
RewriteRule ^ - [F]
# BEGIN WordPress
:

mod rewrite exploding %{HTTP_HOST}

I have a situation I've not come across before that calls for some interesting mod_rewrite rules, and I can't find any examples of someone trying to achieve the same thing in a similar configuration.
Currently I have two domain names which are configured to share the same document root. In that document root is a dynamic PHP application which, based on the incoming hostname, displays content specific to that domain.
The domains for example purposes are:
www.example1.com
and
www.example2.co.uk
(one being a TLD the other not)
In addition to this application there are two WordPress installations, one for each of the two domain names. As we are not using WordPress MU here, I need some fancy rewrites to firstly hide the wordpress folder, and secondly route the request to the correct folder based on the HTTP_HOST.
Currently I have the following:
RewriteRule ^wp-content(.*) wordpress/example1$1 [L]
RewriteRule ^wp-admin(.*) wordpress/example1/wp-admin$1 [L]
RewriteRule ^wp-login.php$ wordpress/example1/wp-login.php [L,R=301]
And similar rules for content specific pages.
This works well for the single WordPress installation, but obviously not for the second. What I was hoping to do here was something like the following:
RewriteRule ^wp-admin(.*) wordpress/${HTTP_HOST}/wp-admin$1 [L]
However I need to remove the www. and .com from the ${HTTP_HOST} variable (or the www. and .co.uk )
Any suggestions on a way to achieve this or a better approach would be appreciated.
You can use RewriteCond to check for a pattern in HTTP_HOST and then capture part of that pattern.
For instance:
RewriteCond %{HTTP_HOST} ^(?:www\.)?([a-zA-Z0-9_-]+)\.(?:com|co\.uk)$
RewriteRule ^wp-admin(.*) wordpress/%1/wp-admin$1 [L]
The RewriteCond directive above checks to see whether HTTP_HOST fits a domain pattern ending ".com" or ".co.uk" and optionally beginning with "www.". If it does, it captures the interesting part of the domain name.
Then the RewriteRule (which only fires if the RewriteCond does match) is able to refer to the captured part of the RewriteCond pattern by using the %1 back-reference.
The pattern I've used in the RewriteCond above might not suit your needs perfectly, but once you know you can use a back-reference to a pattern captured by RewriteCond, it should be easy for you to use this to get the effect you need.
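One detail worth keeping in mind when extending this: a RewriteCond applies only to the single RewriteRule that immediately follows it, so the host check has to be repeated for each rule that uses %1. A minimal sketch, assuming the installations live in folders named after the captured part of the host (wordpress/example1 and wordpress/example2, which is an assumption based on the question), and using an internal rewrite for wp-login.php rather than the redirect in the question:

```apache
# Sketch: the condition is repeated because it only covers the next rule.
# Folder names wordpress/example1 and wordpress/example2 are assumed.

RewriteCond %{HTTP_HOST} ^(?:www\.)?([a-zA-Z0-9_-]+)\.(?:com|co\.uk)$
RewriteRule ^wp-admin(.*) wordpress/%1/wp-admin$1 [L]

RewriteCond %{HTTP_HOST} ^(?:www\.)?([a-zA-Z0-9_-]+)\.(?:com|co\.uk)$
RewriteRule ^wp-login\.php$ wordpress/%1/wp-login.php [L]
```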

Redirecting subdomains to wordpress content

I have one single Wordpress application on domain www.mywpsite.com.
I have static pages:
a. www.mywpsite.com/subsite
b. www.mywpsite.com/subsite/child1
c. www.mywpsite.com/subsite/child2
and need that content to be respectively addressed using a subdomain like this:
a. subsite.mywpsite.com
b. subsite.mywpsite.com/child1
c. subsite.mywpsite.com/child2
Note that there is no physical structure for the subdomain, it is just a way of aliasing, and there is only one Wordpress installation under one document root. Also, I have a few pages and just one subdomain to work with, so a 'case by case' solution is valid.
Eventually I will also use the same subdomain for some dynamic content:
d. www.mywpsite.com/category/subsitenews => subsite.mywpsite.com/news
Finally, if possible, I need that the url retained in the address bar is the one using the subdomain, I mean:
The user types subsite.mywpsite.com/some-static-or-dynamic-content.
Redirection is performed to the appropriate wordpress content, and content is delivered.
The url in the user is still the same using subdomain, instead of the redirected url.
How can I do this?
I think I must use .htaccess but I have no idea how it works. I'm not sure if other questions I have found have to do with my problem.
redirect subdomain and retain url structure
.htaccess redirect ~ [fake-subdomain.domain.com/*] to [domain.com/*]
Thank you very much.
If the subdomains aren't on the same server and under the same document root as the main domain, then you'll need to use a reverse proxy, or the P flag using mod_rewrite. You can try adding these rules to the .htaccess file in your document root, preferably above any rules that may already be there:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/?([^/]+)/([^/]+)/?$ http://$1.mywpsite.com/$2 [L,P]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/?([^/]+)/?$ http://$1.mywpsite.com/ [L,P]
The problem here is that dynamic content, i.e. pages that WordPress generates dynamically, will also be sent to the subdomains. There is no way to tell at the .htaccess level what is valid dynamic content and what isn't. To check for that, you'll need to do it from within WordPress itself. There is probably some way of doing that, but it will require custom code or at least a plugin, and possibly customizing the plugin.

Hide Page Extensions (Like StackOverflow)

I want to hide page extensions like stackoverflow does. How does the following work?
http://stackoverflow.com/tags/foo
http://stackoverflow.com/tags/bar
I've seen a lot of sites that do this, but I still don't know how this is accomplished (I have a LAMP stack).
When a web server gets a request for a URL, it has to decide how to handle it. The classic method was to map the head of the URL to a directory in the file system, then let the rest of the URL navigate to a file in the filesystem. As a result, URLs had file extensions.
But there's no need to do it that way, and most new web frameworks don't. They let the programmer define how to map a URL to code to run, so there's no need for file extensions, because there is no single file providing the response.
In your example, there isn't a "tags" directory containing files "foo" and "bar". The "tags" URL is mapped to code that uses the rest of the URL ("foo" or "bar") as a parameter in a query against the database of tag data.
What you want is clean URLs, and you can do it with Apache and .htaccess. There may be a better way, but here's how I have been doing it:
http://evolt.org/Making_clean_URLs_with_Apache_and_PHP
That's the beauty and the work of ASP.NET MVC.
No "hiding" - it's just the way ASP.NET MVC handles URLs and maps those "routes" to controller actions on your controller classes.
Quite a big step away from the "classic" ASP.NET Webforms way of doing things.
There are a couple of ways to do it under Apache+PHP, but the essential principle is to make a set of URIs (perhaps all URIs, depending on your site, but you may want different scripts to handle different portions of the site) translate to a single PHP file, which is told what object the user has requested.
The conceptually simplest way is to rewrite every URL to a script, which gets the URI through $_SERVER['REQUEST_URI'] and interprets it as it likes.
The URI rewriting can be done with various methods including mod_rewrite, mod_alias and ErrorDocument (see Apache docs).
Another way is to set up more complex URL rewriting (probably using mod_rewrite) to add the path as a GET variable.
There is also the $_SERVER['PATH_INFO'] variable which is loaded with the non-existent portion of the path. This option requires little or no modification to Apache config files, but reduces the flexibility of your URLs a little.
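The conceptually simplest option above, rewriting every URL to a single script, could be sketched in .htaccess roughly like this. The script name index.php is only a placeholder, and the two conditions that leave real files and directories alone are a common convention rather than something the approach requires.

```apache
# Sketch of the "rewrite every URL to one script" approach.
# index.php is a hypothetical front-controller script.
RewriteEngine On

# Leave requests for real files and directories alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Send everything else to the script, which can read the
# originally requested path from $_SERVER['REQUEST_URI']
RewriteRule ^ index.php [L]
```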
Modern web development frameworks have support for elegant URLs. Check out Django or Ruby on Rails.
If you're using Apache and you simply want to hide the file extensions of static HTML files you can use this .htaccess code:
<IfModule mod_rewrite.c>
RewriteEngine on
# if the requested URL is not a file that exists
RewriteCond %{REQUEST_FILENAME} !-f
# and it isn't a directory that exists either
RewriteCond %{REQUEST_FILENAME} !-d
# but when you put ".html" on the end it is a file that exists
RewriteCond %{REQUEST_FILENAME}\.html -f
# then serve that file
RewriteRule ^(.+)$ $1\.html [QSA]
</IfModule>
Apache mod_rewrite has been called "voodoo, but seriously cool voodoo".
The actual .htaccess code I use on a few sites is like that, but not identical:
<IfModule mod_rewrite.c>
RewriteEngine on
#RewriteRule ^$ index.php [QSA]
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.+)$ $1\.php [QSA]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)$ index.php/$1 [QSA]
</IfModule>
And here is some much longer but far more readable code to do the same thing on a Zeus server. On Zeus, it's called rewrite.script.
# http://drupal.org/node/46508
# get the document root
map path into SCRATCH:DOCROOT from /
# initialize our variables
set SCRATCH:ORIG_URL = %{URL}
set SCRATCH:REQUEST_URI = %{URL}
match URL into $ with ^(.*)\?(.*)$
if matched then
  set SCRATCH:REQUEST_URI = $1
  set SCRATCH:QUERY_STRING = $2
endif
# prepare to search for file, rewrite if its not found
set SCRATCH:REQUEST_FILENAME = %{SCRATCH:DOCROOT}
set SCRATCH:REQUEST_FILENAME . %{SCRATCH:REQUEST_URI}
# check to see if the file requested is an actual file or
# a directory with possibly an index. don't rewrite if so
look for file at %{SCRATCH:REQUEST_FILENAME}
if not exists then
  look for dir at %{SCRATCH:REQUEST_FILENAME}
  if not exists then
    look for file at %{SCRATCH:REQUEST_FILENAME}.php
    if exists then
      set URL = %{SCRATCH:REQUEST_URI}.php?%{SCRATCH:QUERY_STRING}
    else
      set URL = /index.php/%{SCRATCH:REQUEST_URI}?%{SCRATCH:QUERY_STRING}
    endif
  endif
endif
goto END
