Rewriting with lighttpd - how to remove file extensions

I would like to use lighttpd's mod_rewrite to allow requests without a specific file extension. For instance, I would like the following mappings to automatically work:
Requesting "/index" would serve "/index.php".
"/dir/file" => "/dir/file.php"
"/dir/file?args" => "/dir/file.php?args"
Can this be easily done with a single rewrite rule for a given extension (e.g. ".php")?

Cassy and natbro got this very nearly right, but as user102008 commented, this erroneously rewrites any directory index. Adding a url.rewrite-once matching anything ending with a '/' seems to make it work.
url.rewrite-once = ( "^(.*)/$" => "$1/" )
url.rewrite-if-not-file = ( "^([^?]*)(\?.*)?$" => "$1.php$2" )
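A rough way to see how the two rules interact is to simulate them in Python. This is only a sketch under assumptions: the `rewrite` helper and the `existing_files` set are made up for illustration, Python's `re` module stands in for lighttpd's regex matching, and the "stop after rewrite-once matches" behaviour is modelled the way the answer describes it.

```python
import re

# Hypothetical set of paths that exist on disk (stands in for the docroot).
existing_files = {"/style.css", "/index.php", "/dir/file.php"}

def rewrite(url):
    """Simulate the two rules: leave directory URLs alone, then append
    .php to anything that is not an existing file."""
    # url.rewrite-once: a URL ending in '/' is rewritten to itself,
    # and no further rewriting is applied to it.
    if re.search(r"^(.*)/$", url):
        return url
    # url.rewrite-if-not-file: only applies when the path is not a file.
    path = url.split("?", 1)[0]
    if path in existing_files:
        return url
    m = re.match(r"^([^?]*)(\?.*)?$", url)
    return m.group(1) + ".php" + (m.group(2) or "")
```

With this model, `rewrite("/dir/")` and `rewrite("/style.css")` come back unchanged, while `rewrite("/dir/file?args")` gives `/dir/file.php?args`.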

I haven't tested it, but you can give it a shot:
url.rewrite-once = (
"^([^?]*)(\?.*)?$" => "$1.php$2",
)
Basically it means
take everything but a question mark
and, if it exists, take the question mark and everything following it
and rewrite it to the first part, append .php, and add the last part again.
Again: I haven't tested it yet.
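For what it's worth, the pattern itself can be tried outside lighttpd. The snippet below is a small Python sketch (the `add_php` name is made up here, and Python's `re` is only an approximation of the regex engine lighttpd uses) that applies the same two capture groups:

```python
import re

# Group 1: everything up to the first '?'. Group 2: the optional query string.
PATTERN = re.compile(r"^([^?]*)(\?.*)?$")

def add_php(url):
    """Return the URL with .php inserted between the path and the query."""
    m = PATTERN.match(url)
    path = m.group(1)
    query = m.group(2) or ""  # group 2 is None when there is no '?'
    return path + ".php" + query
```

So `add_php("/dir/file?args")` yields `/dir/file.php?args`, and `add_php("/index")` yields `/index.php`.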

cassie's answer above is just about right. I would suggest dropping the trailing comma and using url.rewrite-if-not-file (available since lighttpd 1.4.x). This lets you serve other files that exist in the same directory without them getting rewritten.
url.rewrite-if-not-file = ( "^([^?]*)(\?.*)?$" => "$1.php$2" )

Yes.
RewriteRule ^(.*)\.php$ $1 [L,R,NC,QSA]
That would be for .htaccess in a directory.
RewriteRule ^/(.*)\.php$ http://same.site/$1 [L,R,NC,QSA]
where your domain is 'same.site', because it needs to redirect for the URL to change (as opposed to proxying).


Routing algorithm for Wordpress

I don't know the exact meaning of the question, but someone asked me this in an interview. I just want to know whether there is such a thing: do we use any routing algorithm in WordPress?
This could be a trick question, because routing means mapping an HTTP request to a specific function or method that handles the request, which is not something WordPress does (there is a section about WordPress at the bottom). In simple words, you read the HTTP request information to decide which function is going to be triggered.
Bit more details in simple words
If you are building a PHP project from scratch and want to display specific content or trigger a method/function, there are usually two options (without routing):
Using POST, GET or REQUEST variables and complex conditional statements to achieve what you want, so the resulting URL could be something like this
http://example.com/index.php?view=publications&per_page=5
Setting a PHP file for each type of content
http://example.com/publications.php?per_page=5
However, suppose you created a router (a routing algorithm, as you named it, or routing system), pushed all requests to index.php, and had the latter include something like this:
// Include the Router class
require('classes/router.php');
// Include functions responsible for display our content
require('view/display.php');
I'll not go into how to build a router, just giving examples assuming that you already have one just to give you an idea how routing works.
So assuming you have a router and function to display a contact form for example, you'd also include something like this:
// Pass the handler names so the router can call them when a route matches
Router::add('/contact-us', 'get_contact_form', 'get');
Router::add('/contact-us', 'handle_contact_form', 'post');
Then initialize the Router
Router::initialize('/');
Again assuming you have a complete Router, the above function would tell the index.php file to handle HTTP requests on this URL differently:
http://example.com/contact-us
If it's the default request type GET, trigger this function get_contact_form(), but if the request type is POST trigger this one handle_contact_form() which will act and display content differently depending on your needs.
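To make the idea concrete without building a full router, here is a minimal sketch in Python rather than PHP. Everything in it is hypothetical: the `Router` class, its `add` and `dispatch` methods, and the two handler bodies are stand-ins for whatever a real router would provide.

```python
# Hypothetical handlers standing in for get_contact_form() and
# handle_contact_form() from the example above.
def get_contact_form():
    return "<form>...</form>"

def handle_contact_form():
    return "Thanks for your message"

class Router:
    """Tiny dispatch table keyed by (HTTP method, path)."""

    def __init__(self):
        self.routes = {}

    def add(self, path, handler, method="get"):
        # Store the handler itself (not its result) so it runs on dispatch.
        self.routes[(method.lower(), path)] = handler

    def dispatch(self, method, path):
        handler = self.routes.get((method.lower(), path))
        if handler is None:
            return "404 Not Found"
        return handler()

router = Router()
router.add("/contact-us", get_contact_form, "get")
router.add("/contact-us", handle_contact_form, "post")
```

The same URL then maps to different handlers depending on the request type: `router.dispatch("GET", "/contact-us")` returns the form, and `router.dispatch("POST", "/contact-us")` returns the handler's response.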
That's great because it replaces something like
http://example.com/index.php?page=contact-us
where, since there is no router, index.php has to handle the request itself:
// Include functions responsible for display our content
require('view/display.php');
if ( isset($_GET['page']) && $_GET['page'] == 'contact-us' ) {
// A submitted form is handled; otherwise the empty form is shown
if ( isset($_POST['contact_submit']) ) {
echo handle_contact_form();
} else {
echo get_contact_form();
}
}
Imagine how long and ugly this would look if you had a lot of pages and a complex site.
So back to WordPress
If you have a new installation, you'll notice that the URLs look something like this:
http://example.com/?p=62
http://example.com/?cat=1
http://example.com/?author=3
So it just takes the URL parameters and builds a WP_Query based on them: if it is p, look for posts in the database by ID; if cat, look for categories by ID; and so on (that's the simple explanation; there is of course a lot going on in the back end, but this gives the idea).
You might notice after changing permalink structure that the above examples would now look something like this:
http://example.com/post-slug
http://example.com/author/name
http://example.com/category/uncategorized
This might look like routing, but it isn't. Let's go into a bit more detail about how this works.
When a (pretty-link) URL is requested on WordPress, the first thing that happens is that the .htaccess rules check for a folder or file with the same name on the server. If it exists, it will be served; if not, the request is sent to the index.php file, which does one thing:
/** Loads the WordPress Environment and Template */
require( dirname( __FILE__ ) . '/wp-blog-header.php' );
That loads the wp-blog-header.php file, which makes a small check to ensure the code only runs once and then does the following:
// Load the WordPress library.
require_once( dirname(__FILE__) . '/wp-load.php' );
// Set up the WordPress query.
wp();
// Load the theme template.
require_once( ABSPATH . WPINC . '/template-loader.php' );
Let's not go deeper into these files; what concerns us most is what 'wp-load.php' and 'template-loader.php' do.
wp-load.php
Among other things, this one looks for wp-config, makes sure everything is set correctly, then connects to the database, after a lot of initialization: setting up constants and loading many files that handle different parts of the WordPress structure. As part of this process, WordPress tries to match the request URL against a large set of rules called rewrite rules, which are a set of regular expressions. When a match is found, WordPress translates that URL into a database query using the WP_Query class, located at wp-includes/class-wp-query.php, and this class saves the query results among other things (query type, etc.).
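The "match the URL against a list of regular expressions, then turn the match into query variables" step can be sketched roughly like this. It is a Python illustration, not WordPress code: the three rules and the `resolve` function are simplified assumptions, and a real install generates a much longer rule list.

```python
import re

# Illustrative rules only: (pattern, query variable filled from group 1).
REWRITE_RULES = [
    (r"^author/([^/]+)/?$", "author_name"),
    (r"^category/(.+?)/?$", "category_name"),
    (r"^([^/]+)/?$", "name"),
]

def resolve(path):
    """Return the query variables for the first rule that matches."""
    path = path.lstrip("/")  # rules are written without a leading slash
    for pattern, query_var in REWRITE_RULES:
        m = re.match(pattern, path)
        if m:
            return {query_var: m.group(1)}
    return {}  # no rule matched
```

Here `resolve("/post-slug")` gives `{"name": "post-slug"}`, which is the kind of parameter set a query would then be built from, much like `?p=62` in the plain URL form.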
template-loader.php
This one handles the display part. It uses some WordPress functions that rely on WP_Query (e.g. is_home()) to find out what type of content is to be displayed, then loads the correct template based on that, and finally the template uses WP_Query to show the result.

add_action not working in wordpress

I am setting up a WordPress site so that all requests made to a sub-directory are redirected to a file inside that directory.
For example, a visit to example.com/link/me should send the user to /link/index.php?me
To achieve this, I am trying to add a rewrite rule in the theme's functions file.
Here's what my code looks like:
function moin_add_rewrite_rules() {
add_rewrite_rule(
'^([^/]*)/(link)/(?:[a-z][a-z0-9_]*)?$',
'/link/index.php?$matches[1]',
'top'
);
}
add_action( 'init', 'moin_add_rewrite_rules' );
It is not working: visiting example.com/link/me shows a 404 (as link is outside WP).
Is there a problem with the regex, or anywhere else? What other solutions are possible?
Looking at the example in the Codex, the matched expression doesn't have the leading /.
Stripping the leading bit out of your regex, then, I think this should work (sorry, I don't have a test environment handy):
add_rewrite_rule('^link/([a-z][a-z0-9_]*)/?$','/link/index.php?$matches[1]','top');
Alternatively, you could do something similar in your .htaccess file.
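As a side note, as it's currently coded I think you would want $matches[3] rather than $matches[1], as the value you're interested in is in the third set of parentheses (the Codex notes "capture group data starts at 1, not 0"). Note, though, that the (?: prefix makes that third group non-capturing, so it would have to become a plain ( ... ) group before $matches[3] could hold anything.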
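The two patterns can be compared quickly outside WordPress. This is a Python sketch, assuming (as the Codex example suggests) that the request path is matched without its leading slash; the variable names are made up for the illustration.

```python
import re

original = r"^([^/]*)/(link)/(?:[a-z][a-z0-9_]*)?$"
suggested = r"^link/([a-z][a-z0-9_]*)/?$"

# Assumed form of the path the rule is tested against (no leading slash).
request = "link/me"

original_match = re.match(original, request)
suggested_match = re.match(suggested, request)

# Number of capturing groups in the original pattern; (?:...) does not count.
original_group_count = re.compile(original).groups
```

Under that assumption `original_match` is `None`, while `suggested_match.group(1)` is `"me"`. The original pattern also has only two capturing groups, since `(?:...)` is non-capturing.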

Different lighttpd rewrite rules in subdirectory

I'm having a few issues getting rewrite rules to work for a specific subdirectory (different from the webroot), and I'm at a loss as to where to put them.
/var/www/ (webroot containing WordPress)
/var/www/subdirectory (containing another app which requires its own rewrite rules)
Below are the rules I have for WordPress, which is in the webroot dir (/var/www, or http://mywebsite.com):
$HTTP["host"] =~ "mywebsite.com" {
url.rewrite-final = (
# Exclude some directories from rewriting
"^/(wp-admin|wp-includes|wp-content|gallery2)/(.*)" => "$0",
# Exclude .php files at root from rewriting
"^/(.*.php)" => "$0",
# Handle permalinks and feeds
"^/(.*)$" => "/index.php/$1"
)
}
Then I have a second app that sits in a subdirectory of the webroot (/var/www/subdirectory, or http://mywebsite.com/subdirectory) with the following rules:
url.rewrite = (
"(index.php|test.php|favicon.ico)" => "/$1",
"(css|files|img|js)/(.*)" => "/$1/$2",
"^([^\?]*)(\?(.+))?$" => "/index.php?url=$1&$3",
)
What I need is for the 1st set of rewrite rules above to be applied to everything except the other directories in the #Exclude rule above (as well as /subdirectory), and then the 2nd set of rules applied only to /subdirectory.
I got the first set of rules for WordPress from a blog somewhere online, and they may well not be exactly what I need (in regard to matching "mywebsite.com", and they could probably be simplified).
I've Googled myself out trying multiple variations (mostly just stabbing in the dark guided by random forum posts that are slightly related), but I just can't wrap my head round it.
So how would I go about having the 2nd set of rules applied to the subdirectory while maintaining the Wordpress rules for the root?
Note: I have no access to subdomains (that would be too easy).
I am not quite sure what you want to express - I will explain what your current ruleset does.
# do not use this, use rewrite-once instead
url.rewrite = (
# you map any index.php and test.php (even foo-test.php) back to the webroot
"(index.php|test.php|favicon.ico)" => "/$1",
# you rewrite any subfolder or file containing css, files, img or js in its URL back to the webroot
"(css|files|img|js)/(.*)" => "/$1/$2",
# this matches anything in your webroot (no subdirs!) and keeps GET parameters
"^([^\?]*)(\?(.+))?$" => "/index.php?url=$1&$3",
)
Update
This should do it (untested)
url.rewrite-once = (
"^/subdir/(?:index.php|test.php|favicon.ico)" => "$0",
"^/subdir/(?:.+/)?(css|files|img|js)/(.*)" => "/subdir/$1/$2", #update #2
"^/subdir/(?:/(?:.+/))?([^\?]*)(?:\?(.+))?$" => "/subdir/index.php?url=$1&$2",
"^/(wp-admin|wp-includes|wp-content|gallery2)/(.*)" => "$0",
"^/(.*.php)" => "$0",
"^/(.*)$" => "/index.php/$1"
)
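Since the rule list is untested, one way to sanity-check the ordering is to replay it in Python. This is only an approximation: `rewrite_once` is a made-up helper that returns the target of the first matching rule, Python's `re` stands in for lighttpd's regex engine, and `$0` is treated as the whole match.

```python
import re

# The rule list from the answer above, in order; the first match wins.
RULES = [
    (r"^/subdir/(?:index.php|test.php|favicon.ico)", "$0"),
    (r"^/subdir/(?:.+/)?(css|files|img|js)/(.*)", "/subdir/$1/$2"),
    (r"^/subdir/(?:/(?:.+/))?([^\?]*)(?:\?(.+))?$", "/subdir/index.php?url=$1&$2"),
    (r"^/(wp-admin|wp-includes|wp-content|gallery2)/(.*)", "$0"),
    (r"^/(.*.php)", "$0"),
    (r"^/(.*)$", "/index.php/$1"),
]

def rewrite_once(url):
    """Return the rewritten URL for the first rule that matches."""
    for pattern, target in RULES:
        m = re.search(pattern, url)
        if m:
            # Expand $0..$9; an unmatched optional group becomes "".
            return re.sub(
                r"\$(\d)",
                lambda ref: m.group(int(ref.group(1))) or "",
                target,
            )
    return url  # no rule matched
```

In this model `/subdir/users/list?page=2` becomes `/subdir/index.php?url=users/list&page=2`, and `/hello-world` falls through to the WordPress rule and becomes `/index.php/hello-world`.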

Migration to new domain

I'm working on a drupal 6 site at mydomain.com/drupalsite, and the designer has put a lot of hardcoded image paths in there, for instance a custom img folder in mydomain.com/drupalsite/img. So a lot of the site uses links to /drupalsite/img/myimg1.png.
Here's the problem -- the site is eventually moving to finaldomain.com, via pointing finaldomain.com to mydomain.com/drupalsite. So now paths like /drupalsite/img/myimg1.png will resolve to finaldomain.com/drupalsite/img/myimg1.png, instead of what should be finaldomain.com/img/myimg1.png. The finaldomain.com site has to point to that subdirectory so it hits the index.php.
My first instinct is to use an .htaccess file to replace the /drupalsite with "", but I've tried about a dozen different solutions and they haven't worked. My hack of a solution was to use some ln -s links but I really don't like it :) tia
Andrew
The best method, in hindsight, is to ensure folks use Drupal functions to make all links:
l (that's the letter L)
drupal_get_path()
base_path()
The l() function takes care of base path worries, and provides a systematic way to define your URLs. Using things like theme_image() plus the l() function is a sure win. Use the second and third functions above if you have to write your own <a> tags and for use inside theme functions like theme_image().
But for your current situation:
As regards Andy's solution, it would be better if you could limit your changes to certain database fields where you know the links are located.
So write a query to select all those fields (e.g. all body fields):
$my_query = db_query("SELECT vid, body FROM {node_revisions}");
This, for example, will get you every body field in the node_revisions table, so even your old revisions would have proper links.
Then run through those results, do str_replace() on each, and then write the changes back:
while($node = db_fetch_object($my_query)) {
$new_body = str_replace('what you have', 'what you want', $node->body);
db_query("UPDATE {node_revisions} SET body = '%s' WHERE vid = %d", $new_body, $node->vid);
}
I'd obviously try it on one record first, to make sure your code behaves as intended (just add a WHERE vid = 5, for example, to narrow it down to one revision). Furthermore, I haven't taken advantage of node_load and node_save, which are better for loading and saving nodes properly, so as to provide a more general solution (for you to replace text in blocks, etc.).
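The select, replace, write-back loop can be rehearsed safely before touching the real database. The sketch below uses Python with an in-memory SQLite table as a stand-in for Drupal's database; the `replace_in_bodies` helper and the sample rows are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for the node_revisions table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO node_revisions (vid, body) VALUES (?, ?)",
    [(1, '<img src="/drupalsite/img/myimg1.png">'), (2, "no images here")],
)

def replace_in_bodies(conn, old, new):
    """Replace old with new in every body; return how many rows changed."""
    rows = conn.execute("SELECT vid, body FROM node_revisions").fetchall()
    changed = 0
    for vid, body in rows:
        new_body = body.replace(old, new)
        if new_body != body:
            # Parameterized update, so quotes in the body are handled safely.
            conn.execute(
                "UPDATE node_revisions SET body = ? WHERE vid = ?",
                (new_body, vid),
            )
            changed += 1
    conn.commit()
    return changed

changed = replace_in_bodies(conn, "/drupalsite/img/", "/img/")
```

Only rows whose body actually contains the old path are written back, which keeps the number of updates small on a large table.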
For your files, I'd suggest a good ol' sed command, by running something like the following from within your "sites" folder:
find ./ -type f -exec sed -i 's/string1/string2/' {} \;
Nabbed that from here, so take a look on that site for more explanation. If you're going to be working with paths, you'll either need to escape the / of the paths in your version of the sed command, or use a different sed separator (i.e. you can write s#string1#string2# instead of s/string1/string2/, so you could write s#/drupalsite/img/#/img/# instead of s/\/drupalsite\/img\//\/img\// :-). See also the Drupal handbook page for quick sed commands: http://drupal.org/node/128513.
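If sed isn't available, or you'd like to see which files change, roughly the same replacement can be done in Python. This is a sketch, not a drop-in equivalent of the find/sed command: `replace_in_tree` is a made-up helper, it does plain string replacement rather than regex substitution, and it skips files that aren't valid UTF-8 text.

```python
from pathlib import Path

def replace_in_tree(root, old, new):
    """Replace old with new in every text file under root.

    Returns the list of paths that were modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files such as images
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `replace_in_tree("sites", "/drupalsite/img/", "/img/")` would then return the list of files it touched, and no path escaping is needed because the strings are not regular expressions.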
A bit of a mess, which is why I try to enforce using the proper functions up front. But this is difficult if you want themers to create Drupal content but you don't want to give them access to the "PHP Filter" input format, or they simply don't know PHP. Proper Drupal theming, at any point past basic HTML/CSS work, requires a knowledge of PHP and Drupal's theme-related functions.
I've done this before by taking a full database dump, opening it in a text editor, and doing a global search and replace on the paths. Then on the new host, load the modified dump file, and it will have the correct paths in.
You could try Pathologic, it should be able to correct paths like this.

How to provide a default image file for ImageCache to use and process when the original does not exist?

Is there a way to get ImageCache to use a default image? Or to use htaccess to provide a default image for ImageCache to process? Some of our clients' sites are >4GB, and it's very painful dealing with all of their images that we don't need for development. I've tried using htaccess, but ImageCache does not process the file and just ends up using the file's dimensions, which screws up the layout.
Any thoughts?
As I understand ImageCache responds to URIs like
http://www.yourdomain.com/default/files/imagecache/set/images/pic.png
where http://www.yourdomain.com is your domain, files/imagecache is the imagecache path, set is the predefined set of image manipulation settings and the rest (here: images/pic.png) is the actual relative path of the original image.
So, if pic.png doesn't exist, another file (default.png) should be served to ImageCache. An .htaccess solution for nonexistent files could be:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^.]+)\.(gif|jpg|png)$ /images/default.png [L]
Now ImageCache requests images/pic.png which does not exist and gets images/default.png served, processes it and saves it at default/files/imagecache/set/images/pic.png.
Well, at least this is my theory.
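One detail worth checking in a rule like this is how the extension is written: square brackets form a character class that matches a single character, whereas parentheses with | match one of several whole alternatives. A short Python sketch (the names are made up, and Python's `re` is used as a stand-in for Apache's regex engine) shows the difference:

```python
import re

# Character class: matches exactly ONE of the characters g, i, f, |, j, p, n.
bracket_form = re.compile(r"^([^.]+)\.[gif|jpg|png]$")
# Alternation group: matches one of the three extensions.
group_form = re.compile(r"^([^.]+)\.(gif|jpg|png)$")

def matches(pattern, path):
    """Return True if the pattern matches the given relative path."""
    return pattern.search(path) is not None
```

With these, `matches(bracket_form, "images/pic.png")` is false while `matches(group_form, "images/pic.png")` is true, so the grouped form is the one that catches real image file names.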
Regards, Paul
-###########-
EDIT regarding first comment:
Ok, I looked into the module. In imagecache.module, line 386 starts the helper function _imagecache_cache($presetname, $path). Within this function is a check for the existence of the original file (line 403). Change this block
// Check if the path to the file exists.
if (!is_file($src) && !is_file($src = file_create_path($src))) {
watchdog('imagecache', '404: Unable to find %image ', array('%image' => $src), WATCHDOG_ERROR);
header("HTTP/1.0 404 Not Found");
exit;
};
to
// Check if the path to the file exists.
if (!is_file($src) && !is_file($src = file_create_path($src))) {
watchdog('imagecache', '404: Unable to find %image ', array('%image' => $src), WATCHDOG_ERROR);
/*header("HTTP/1.0 404 Not Found");
exit;*/
$src = 'sites/all/modules/imagecache/sample.png';
};
(Notes: I left the original code lines as comments. You can set $src to any default file you want.)
I wrote a module for this because I too hated working with broken layouts and I hated pulling down giant files directories to get 1-10GB+ of images just to fix the layout.
It works on the theme layer by wrapping theme('imagecache') and theme('image_style') calls in a bit of logic to detect broken paths. Also works with image formatters as well as theme functions.
http://drupal.org/project/imagecache_defaults
Works for Drupal 6 and 7.
Hitting the file system extra times for every image can be slow for some server configurations (http://drupal.org/node/908282) so imagecache_defaults persistently caches everything it discovers about files on your server and a few other things (use a non-db cache implementation for best results).
