Start with...htaccess

by ivan at 2013-08-10

The .htaccess file is a very important file. It allows us to do things before the request starts to be processed. The content of this file can be included as part of the VirtualHost item in your Apache configuration, but sometimes we don't have access to those files, and .htaccess offers an alternative for doing these things.

But...what are the things that we are talking about? OK, yes, it's true: I offered you a meeting with a guy, but you only know his name. But this has a solution. We are talking about redirections, access control, modifying headers, setting the expire time and compression.

This file is applied when a request targets a resource that sits at the same level as the .htaccess file, or a resource that is deeper in the file system.

Redirections

To use the redirection functionality we need the mod_rewrite.c module.

It's possible that this is the most used functionality of the .htaccess file. It's very simple: it redirects the traffic of one URL to another. But we also need to take into account that this kind of redirection isn't the only one that we can manage via .htaccess. What?

There are two types of redirections: external redirections and internal redirections. The first ones create a new request, and the second ones map the traffic from one place to another without the need for a new request. Confusing? Below I explain it better.

External Redirections

As we said before, an external redirection tells the browser that the current request points to a place that isn't correct, so we need to know the new place that holds the requested content. In effect, we say to the browser "go here and your user will find the content that they want". This is a very simple concept to understand, I think, but why is the user requesting a URL that doesn't exist? There are several reasons, but two of them account for most cases:

  • The user is browsing and finds a link to our website, but it's an old link. We changed our URLs to be more SEO friendly (for example), but this link was created before our change. What happens? The user goes to our domain, but with this URL...the browser says that this content doesn't exist.
  • The same case, but the link isn't on a website...it's a bookmark.

OK, now we know two of the many possible reasons for a request to a URL that no longer exists. Well, what do we need to know to redirect this traffic to the new URLs?

We need to know about status codes. In the HTTP protocol, every request is answered with a status code that tells the browser the outcome of the request. By default, the status code is 200, which is the OK code. There are lots of different codes (you can view them all here), but in this article we focus on only one: 301.

301 is the external redirect par excellence. It's the most used and the most useful. Why? Because with the 301 code (Moved Permanently) we tell the browser that this content is now in another place and, with this, we also say that the SEO attributes of the old page need to be passed to the new one.

If we have an old domain and we need to send all the traffic to the new one, or if we had a section in our website that was removed but has a good position in search engines...301 is our solution. We can tell the browser and the robots that this whole set of pages is now one page (for example the home page) or several (mapping one to one or many to one), and ensure that the quality of our page isn't lost.

Internal Redirections

Internal redirections are a little bit more "complex". Not too much, but we need to understand the default behaviour of a website.

By default, every website has a folder architecture similar to the directory system of any operating system. This means that if we request this page:

http://www.mydomain.com/article/posts/example.html

We are trying to access the example.html file that is inside the folder posts, which is inside the root folder article. OK, this is perfect but...what happens when our website isn't composed only of static files? Or what happens when we want to use URLs without extensions? We have an issue.

To solve this situation, we can use internal redirections. This technique allows us to map any URL to a file, saying that this file can resolve this URL or set of URLs. And when we do this, we serve the result without needing to create a new request. Why? Because if we created a new request, we would need to expose a file that can handle it under its own URL, and if we said that we don't want extensions in our URLs...we can't do it, or we can start a very funny infinite loop of redirections, but nobody wants this for their website, right?


Well, now we know the theory, but we need to know how to implement our redirections. And, as with the redirection types, we have two solutions: one to one and many to one.

One to One redirections

Sometimes, when we change our website URLs, the mapping between the old URLs and the new URLs is one to one: for every old URL we have a new one. To tell the browser that the old URL should be answered with the new one, we write one statement per URL.
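
A minimal sketch of such a statement, assuming the Redirect directive (which is provided by mod_alias, not by mod_rewrite). The paths and the domain are hypothetical:

```apache
# Syntax: Redirect [status] old-path new-URL
# /old-page.html and /new-page are hypothetical paths
Redirect 301 /old-page.html http://www.mydomain.com/new-page
```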

As we can see, we can use this type of redirection for all types of URLs (with extension, without extension...), and if we need to send another type of response code, we only need to change 301 to the status code that we want. It is no more difficult than writing one line for every URL that we want to redirect.

Many to One redirections

The One to One redirection is a good solution, but sometimes we have lots of URLs, and writing one line for every URL can be very hard. For this, Many to One redirections exist. We call them Many to One, but the real meaning is that we set a rule and all the URLs that match this rule go to one page, which can also take params.

That said, it's possible that every URL goes to a different URL, or that several URLs match the same rule and, if the target URL doesn't take params, all of them go to the same URL.

This type of redirection can be written with different syntaxes, but in this article we will talk about one that we consider can cover all your needs: the combination of the RewriteCond and RewriteRule directives.

RewriteCond allows us to set conditions that affect our redirection rules. We can use a set of conditions for the same rule, creating a complex rule. The syntax of these statements is very simple, but it's a little bit tricky.

To explain it more easily, we start with an example.
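
A minimal sketch of what such an example could look like. The domains, the "mobile" User Agent check and the target URL are hypothetical, chosen only to show three conditions feeding one rule:

```apache
RewriteEngine On

# Condition 1: the request is for www.mydomain.com (case-insensitive)
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$ [NC]
# Condition 2: the User Agent contains "mobile"
RewriteCond %{HTTP_USER_AGENT} mobile [NC]
# Condition 3: the request does not point to an existing file
RewriteCond %{REQUEST_FILENAME} !-f

# Only if all three conditions match, redirect to the (hypothetical) mobile site
RewriteRule ^(.*)$ http://m.mydomain.com/$1 [R=302,L]
```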

In this example we can see that we have 3 conditions. All of them have the same syntax, which we can define as:

RewriteCond Variable Value [Flags]

OK, so every statement starts with RewriteCond and then has three custom fields:

  1. Variable
    The variable is the attribute of the request that we want to analyse in the condition. It can be the host, the entire URL, the URI...lots. The most used are:
    • %{HTTP_REFERER}: Contains the URL of the page that linked to the requested one.
    • %{HTTP_HOST}: Contains the host name of the request.
    • %{HTTP_USER_AGENT}: Contains the browser User Agent.
    • %{REQUEST_FILENAME}: Contains the file-system path of the file that is being requested.
  2. Value
    Here, the value to compare the variable with. It can be a literal or a regular expression. In the case of the literal, we need to consider some aspects.
    • Some variables have their own special values, like %{REQUEST_FILENAME}, whose value can be the name of the file that we are requesting or a test of the type of this file, which can be defined with:

      • -s: Means that the requested file is a regular file with a size greater than zero
      • -l: Means that the requested file is a symbolic link
      • -d: Means that the requested file is a directory
      • ...

      As you can see, there are lots of possible values and every variable has its own. This article doesn't intend to describe them all, only to talk about the .htaccess file and its basic functionality and configuration.

    • Some values are operators that need an operand of their own, like -strmatch, which must be followed by an expression before the flags.
  3. Flags

    Flags aren't necessary. By default, consecutive conditions are combined with an implicit AND, which stays in effect unless we set the [OR] flag. We can set more than one flag by separating them with commas. The flags that we can define are:

    • NC: Means no-case. If we enable this flag, the value of the condition is matched case-insensitively.
    • OR: Combines this condition with the next one so that, if either of them matches, the next RewriteRule is executed.
    • NV: Means no-vary. It is the least frequently used. If the condition uses an HTTP header, this flag keeps that header out of the Vary header of the response, which might break proper caching of the response.

With this, we are ready to define the RewriteRule functionality. After defining some RewriteCond statements (if we like, we can define no conditions at all and work directly with the RewriteRule), we put a RewriteRule statement, which has the next syntax.

RewriteRule Pattern Result [Flags]

As we can see, it is very similar to the RewriteCond syntax, but it has some differences. In this case, the different parts are defined as follows:

  1. Pattern:

    The pattern is a regular expression that defines which of the requested URLs are processed by this rule.

  2. Result:

    The result of the rewrite: a literal string that can contain back-references to the groups captured by the pattern (such as $1). If the result is -, it means do nothing.

  3. Flags:

    As in the RewriteCond statement, flags aren't necessary, and we can combine as many as we like by separating them with commas. The most used are:

    • NC: Means no-case. If we enable this flag, the pattern is matched case-insensitively.
    • C: Means Chain. It ties this rule to the next one: if this rule matches, processing continues with the next rule as usual; if it doesn't match, all the following chained rules are skipped.
    • F: Means Forbidden. Sets a 403 status code.
    • G: Means Gone. Sets a 410 status code.
    • R: Means Redirect. It makes the redirection external. If we set this flag like R=301, the response uses the status code that we put after the equals sign.
    • L: Means Last. If this rule matches, no further rules in the set are processed in the current pass. Note that RewriteCond statements only ever apply to the RewriteRule that immediately follows them, so later rules are never affected by the conditions that we defined before.
    • QSA: Means Query String Append. If we enable this flag, the params of the requested URL (GET params) are appended to the ones in the Result.
    • QSD: Means Query String Discard. It is the opposite of the previous one: in this case, the params of the requested URL are discarded.

OK, with this we have all the information that we think is necessary. Now, some explained examples to complement this information.

Redirect all the traffic of one domain to another domain
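
A minimal sketch of this case, assuming the hypothetical domains olddomain.com and newdomain.com. Every path requested on the old domain is sent, with a 301, to the same path on the new one:

```apache
RewriteEngine On

# Match the old domain, with or without www (hypothetical domain)
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]

# Keep the requested path ($1) and send it to the new domain
RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]
```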

Redirect all the traffic of one domain to one file
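
A minimal sketch of this case, assuming an internal redirection, a hypothetical domain and a hypothetical maintenance.html file in the document root. The second condition excludes the target file itself, so we don't start the infinite loop mentioned earlier:

```apache
RewriteEngine On

# Only for this (hypothetical) domain
RewriteCond %{HTTP_HOST} ^(www\.)?mydomain\.com$ [NC]

# Skip the target file itself to avoid an endless loop
RewriteCond %{REQUEST_URI} !^/maintenance\.html$

# Serve the same file for every other URL, without a new request
RewriteRule ^ /maintenance.html [L]
```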

Redirect all the traffic that doesn't match an existing file or directory
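
A minimal sketch of this case, assuming a hypothetical index.php that receives the requested path in a hypothetical path param. Existing files and directories keep the default behaviour, and everything else is mapped internally to that file:

```apache
RewriteEngine On

# Leave real files and real directories alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Map everything else to index.php, keeping the original GET params (QSA)
RewriteRule ^(.*)$ index.php?path=$1 [QSA,L]
```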

If you have more questions about these types of redirections, or you would like to know more about all the possibilities that this Apache module offers, here you have the official documentation.

Access Control

Another common use of this file is to define access control via user and password for some parts of our website. To do it, we need to add these lines to our .htaccess file.
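
A minimal sketch of these lines, assuming Basic authentication. The realm name is hypothetical, and the path is the same placeholder used in the htpasswd commands of this section:

```apache
# Ask the browser for a user and a password
AuthType Basic
# Text shown in the login prompt (hypothetical)
AuthName "Restricted Area"
# File that contains the users and passwords
AuthUserFile /path/to/.htpasswd
# Any user listed in that file can enter
Require valid-user
```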

As we can see, we have an AuthUserFile. What type of file is this?

.htpasswd

The .htpasswd file is a file that contains users and passwords. To generate it, we need to execute the next line:

htpasswd -c /path/to/.htpasswd username

After this, the console asks us for the password of the user username, and when we type it, the .htpasswd file is generated with this user. If we need to add more users, we need to type the next line for every one of them, and type the password when the prompt appears.

htpasswd /path/to/.htpasswd username

If we type a username that already exists, we change its password.

In the configuration lines that we put before, we can see a line that says Require valid-user. This means that any user in the file is allowed to enter this section. If we want to allow only some of the users that are in the file, we can use Require user followed by their usernames separated by spaces.
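
A minimal sketch of that variant, with two hypothetical usernames. This line takes the place of Require valid-user:

```apache
# Only these users from the .htpasswd file can enter (hypothetical names)
Require user alice bob
```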

Modify Headers

To use the header functionality we need the mod_headers.c module.

The .htaccess file can also be used to change some headers of our responses. To do it...we need to add code like this:
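
A minimal sketch of such code, assuming the Cache-Control header. The file extensions and the max-age values are only illustrative:

```apache
<IfModule mod_headers.c>
  # Images: cache for 30 days (2592000 seconds)
  <FilesMatch "\.(jpg|jpeg|png|gif|ico)$">
    Header set Cache-Control "max-age=2592000, public"
  </FilesMatch>

  # Stylesheets and scripts: cache for 7 days (604800 seconds)
  <FilesMatch "\.(css|js)$">
    Header set Cache-Control "max-age=604800, public"
  </FilesMatch>
</IfModule>
```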

This example sets the max-age value of the Cache-Control header depending on the extension of the file that is being requested. If we need to set another header, we only have to choose which one we want and what its value is. Here you have the entire list of the possible headers in the HTTP protocol.

Set the expire time

To use the expire time functionality we need the mod_expires.c module.

When we work with Google Page Speed, this is one of the typical things that it reports as not implemented yet. Browsers need to know how long we want them to keep our static resources without requesting them again. Below, an example of setting this information in the .htaccess file.
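
A minimal sketch of this configuration. The MIME types and the periods are only illustrative, so adjust them to how often each kind of resource changes:

```apache
<IfModule mod_expires.c>
  ExpiresActive On

  # Fallback for every type that isn't listed below
  ExpiresDefault "access plus 1 month"

  # Images rarely change, stylesheets and scripts change more often
  ExpiresByType image/png "access plus 1 year"
  ExpiresByType image/jpeg "access plus 1 year"
  ExpiresByType text/css "access plus 1 week"
  ExpiresByType application/javascript "access plus 1 week"
</IfModule>
```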

Misc

In this section, two easy and useful functionalities that the .htaccess file offers and that we think we all need to have in our .htaccess files.

Compress the Response
To use the gzip functionality we need the mod_gzip.c module.

If we write the next lines at the beginning of our .htaccess file, all our responses are compressed with the gzip algorithm, which reduces their weight.

To use the deflate functionality we need the mod_deflate.c module.

Another useful functionality. If we put the next lines at the beginning of our .htaccess file, the server compresses our HTML, CSS and JavaScript files before sending them, which reduces the weight of our responses.
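
A minimal sketch of these lines, assuming the AddOutputFilterByType directive (in Apache 2.4 it also needs the mod_filter module). The list of MIME types is only illustrative:

```apache
<IfModule mod_deflate.c>
  # Compress text-based responses before sending them to the browser
  AddOutputFilterByType DEFLATE text/html text/plain text/css
  AddOutputFilterByType DEFLATE application/javascript
</IfModule>
```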


That's all folks! With this we end this entry. If you have more questions, or if you would like to suggest a subject that you think would be interesting for us to talk about, please fill in our contact form and we will try to reply as soon as possible.

Thanks for your attention and we hope to see you again in the next article!

To enable modules in Apache, we need console access to the server. If you don't have this type of access, you need to talk with your system administrators. However, if you have access, it's very simple to do it on your own. On Debian-based systems, you only need to execute this line for every module that you need to enable, and then restart Apache:
a2enmod name_of_the_module
The module name is the name that we put in the info boxes, but without mod_ and without the .c extension. So, if you need to enable mod_rewrite.c, you need to write:
a2enmod rewrite