About the .htaccess file

In this post, we will cover some fundamentals of the .htaccess file. We’ll learn about the .htaccess file in general, but then also look at some practical examples that you may find useful.

What is the .htaccess file

The .htaccess file is a configuration file used by the Apache web server. It controls how the server responds to requests for a particular directory. It lets you take directives that would normally go in Apache’s main configuration files and put them in a directory-specific file instead, provided the main configuration permits this through the AllowOverride directive. When a request arrives, Apache looks for a .htaccess file in each directory on the path to the requested resource and applies the directives it finds, so a file affects its own directory and every subdirectory below it. Because the server processes these directives itself, this is typically less resource intensive than doing the same work in an application-level plugin.

In a nutshell, the .htaccess file lets you change the behavior of the Apache web server for a directory. You can enable, disable, and modify features at run time, without restarting the server, because Apache re-reads the file on each request.

What does .htaccess mean

.htaccess is short for hypertext access. The original purpose of the .htaccess file was to control user access to files in specific directories. Note that ‘.htaccess’ is not an extension; it is the complete file name. By default, Apache will not treat a file such as sample.htaccess as an .htaccess file.

Where can you find it

The .htaccess file is usually found in your website’s root folder, for example /var/www/html/. However, every directory on the web server can have its own .htaccess file, and each directory can have only one. Each file applies to the directory it sits in and to the subdirectories below it.

Why can’t you see it

The .htaccess file is specific to Apache and has no effect on other web servers such as Nginx. If you cannot see the file, that is usually because Linux hides files whose names begin with a dot by default. A quick solution is to open your hosting control panel’s file manager and turn on the “Show hidden files” option. Alternatively, you can use the ‘ls -a’ command on Linux.

What can you do with it

You can use the .htaccess file in many ways. Below is a small list of the possibilities.

For example, you can:

  • Block specific IP addresses, or allow only specific IP addresses to access your website. This is useful for restricting your website’s secure pages, such as the admin panel, to a few known addresses. An unauthorized person will get an error if they try to access the page.
  • Create custom error pages. By default, the web server displays pre-defined error pages. You can replace them with your own pages for specific errors.
  • Enable basic HTTP authentication on your entire site or specific directories.
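
As an illustration of the last item, basic HTTP authentication for a directory might be sketched as follows. The realm name and the password file path are placeholders, not values from this post. The password file is assumed to have been created beforehand with Apache’s htpasswd utility and stored outside the web root.

```apache
# Ask for a username and password before serving anything in this directory
AuthType Basic
AuthName "Restricted Area"
# Hypothetical path: point this at your own password file
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Placing this in a subdirectory’s .htaccess file protects only that directory and the folders below it.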

Right! Now that we know the theory, let’s dive into some practical uses.

How to add a custom header and value

We can use Apache’s Header directive to add our custom header.

The syntax is as follows:

Header add Sample-Header "My Value"

You can add the above example to your website’s root .htaccess file. Replace “Sample-Header” with the header name you want and “My Value” with the value to send.

This line instructs the Apache server to add a response header named “Sample-Header” with the value “My Value” to the responses it sends from that directory. The Header directive is provided by the mod_headers module, which must be enabled.
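
A slightly more defensive variant is sketched below. It assumes you would rather skip the header than trigger a server error when mod_headers is not loaded. The second header is only an illustrative choice and does not come from the example above.

```apache
<IfModule mod_headers.c>
    # "set" replaces any existing header of the same name,
    # whereas "add" can produce duplicate headers
    Header set Sample-Header "My Value"
    # Illustrative second header, shown only to demonstrate the pattern
    Header set X-Content-Type-Options "nosniff"
</IfModule>
```

You can check the result by inspecting the response headers in your browser’s developer tools.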

Blocking Users based on their IP addresses

You can restrict users with specific IP addresses from accessing your website. For example, you can restrict everyone except yourself from opening the dashboard of your site. So even if a hacker knows your admin panel’s password, they will not be able to open the admin panel page. They will see an error page instead. Note that your own IP address may change over time unless you have been assigned a static IP, in which case you would need to update the rule.

Open up your site’s root .htaccess file and input the following commands to it:

To deny a specific IP address:

Deny from 121.212.121.212

Here, replace 121.212.121.212 with the IP you want to block. If a user accesses your page from that IP, they will receive a 403 Forbidden error.

To deny multiple IP addresses:

Deny from 1.2.2.1 2.3.3.2 3.4.4.3 4.5.5.4

This directive will block the above-stated IP addresses from viewing your website.
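
Deny from belongs to the older Apache 2.2 access-control syntax. On Apache 2.4 it only works through the mod_access_compat module. Assuming your server runs 2.4, a rough equivalent using the newer Require directive could look like this, with the same placeholder addresses as above:

```apache
<RequireAll>
    # Let everyone in...
    Require all granted
    # ...except these addresses
    Require not ip 121.212.121.212
    Require not ip 1.2.2.1
    Require not ip 2.3.3.2
</RequireAll>
```

The RequireAll container is needed because a negated Require cannot grant access on its own.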

Allowing Users based on their IP addresses

This works similarly to blocking IP addresses. The only difference is that you allow specific IP addresses to access your site or your web pages.

Open up your site’s root .htaccess file and input the following directives to it:

To allow specific IP addresses:

Allow from 121.232.121.232

To allow multiple IP addresses:

Allow from 1.2.3.4 2.1.3.4 3.1.2.4 4.1.2.3

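On its own, Allow from does not block anyone else. To let only the above-written IP addresses view your website, combine it with “Order Deny,Allow” and “Deny from all” so that every other address is refused. You can add as many IP addresses as you want.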
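
On Apache 2.4, an allow-list is usually written with the Require directive instead. The following is a minimal sketch under that assumption, reusing the placeholder addresses from above:

```apache
# Apache 2.4: only these addresses may access the directory.
# Several Require lines are OR-ed together by default,
# so a visitor matching any one of them is admitted.
Require ip 121.232.121.232
Require ip 1.2.3.4 2.1.3.4
```

Everyone who does not match one of the listed addresses receives a 403 Forbidden response.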

How to block users by domain

You also have the power to block certain domains. Any request that arrives with the specified domain in its Referer header will receive a 403 Forbidden error. Let’s look at how you can block requests referred by certain domains.

Blocking domain:

Open up your site’s root .htaccess file and input the following commands to it:

SetEnvIfNoCase Referer "sample-domain.com" bad_referer
Order Allow,Deny
Allow from all
Deny from env=bad_referer

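Edit the above code by replacing “sample-domain.com” with the target domain you want to block. Now every request whose Referer header contains the target domain will be blocked. Keep in mind that the Referer header is sent by the client and can be omitted or forged, so this is a filter rather than a security control.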
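
The Order, Allow, and Deny directives are Apache 2.2 syntax. Assuming an Apache 2.4 server, the same idea can be sketched with Require. The variable name bad_referer is reused from the example above.

```apache
# Flag requests whose Referer header mentions the unwanted domain
SetEnvIfNoCase Referer "sample-domain\.com" bad_referer

<RequireAll>
    # Allow everyone...
    Require all granted
    # ...except requests flagged above
    Require not env bad_referer
</RequireAll>
```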

How to block by referrers

Websites (or referrers) can link directly to your images and other resources, without any benefit to you. Let us see how we can block these referrers.

Blocking a single referrer

Open up your site’s root .htaccess file and input the following commands to it:

RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} sample-domain\.com
RewriteRule .* - [F]

The above code tells the Apache server to reject, with a 403 Forbidden response, any request whose Referer header contains “sample-domain.com.” You can replace “sample-domain.com” with the desired domain.

Blocking multiple referrers:

Open up your site’s root .htaccess file and input the following commands to it:

RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} sample-domain\.com [NC,OR]
RewriteCond %{HTTP_REFERER} another-sample-domain\.com [NC,OR]
RewriteCond %{HTTP_REFERER} another-domain\.com [NC]
RewriteRule .* - [F]

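This code will block requests referred by any of the domains listed above. Every condition except the last needs the OR flag; without it, Apache requires the conditions to match at the same time.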
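
When the list is short, the separate conditions can also be folded into a single regular expression. This is only a sketch of that alternative, using the same placeholder domains:

```apache
RewriteEngine on
# One condition with alternation instead of several OR-ed conditions
RewriteCond %{HTTP_REFERER} (sample-domain|another-sample-domain|another-domain)\.com [NC]
RewriteRule .* - [F]
```

A single condition avoids the risk of forgetting an OR flag on one of the lines.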

Blocking bots

Bots can be good or bad! Let us see how you block bad bots that scour your site to download your content.

Open up your site’s root .htaccess file and input the following directives to it:

ErrorDocument 403 /403.html

RewriteEngine On
RewriteBase /

# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) [NC,OR]

# STARTS WITH WEB
RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) [NC,OR]

# ANYWHERE IN UA -- GREEDY REGEX
RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ [NC]

# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule . - [F,L]

Reference: https://www.askapache.com/htaccess/blocking-bad-bots-and-scrapers-with-htaccess/.

Setting default pages

When a visitor requests a directory, the server looks for a specifically named file, called the index file, and serves it as the home page. You can change which file is used as the home page or default page by setting the DirectoryIndex directive in the .htaccess file.

Open up your site’s root .htaccess file and input the following directives to it:

DirectoryIndex your_new_index_file.php

Replace your_new_index_file.php with the name of the file you want to set as your default page.
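
DirectoryIndex also accepts several file names, which Apache tries from left to right. The sketch below assumes you want a fallback when the first file does not exist. The names after the first one are common defaults, not requirements.

```apache
# Try the custom file first, then fall back to the usual index files
DirectoryIndex your_new_index_file.php index.php index.html
```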

Setting the default directory

On many shared hosts, the root directory of your website is public_html. This folder is your document root directory. The document root itself cannot be changed from a .htaccess file, but you can get the same effect by rewriting every request to a subfolder.

Open up your site’s root .htaccess file and input the following directives to it:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/new_folder/
RewriteRule (.*) /new_folder/$1 [L]

Replace domain.com and www.domain.com with your website’s domain name, keeping the backslash before each dot. Finally, replace new_folder, in both places where it appears, with the name of the folder to use as your default directory.

Blocking referrers (hotlink protection)

Spam referrers can flood your analytics and make the results inaccurate. You can filter these referrers in your analytics tool, or you can block them through the .htaccess file.

Open up your site’s root .htaccess file and input the following directives to it:

RewriteEngine on
RewriteCond %{HTTP_REFERER} site1\.com [NC,OR]
RewriteCond %{HTTP_REFERER} site2\.com [NC,OR]
RewriteCond %{HTTP_REFERER} site3\.com [NC]
RewriteRule .* - [F]

Replace site1.com, site2.com, and site3.com with the domains you want to block. You can add as many domains as you want, as long as every condition except the last carries the OR flag.
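
Hotlink protection in the narrower sense, meaning other sites embedding your images, is usually written the other way round: instead of listing bad referrers, you allow only your own domain. A minimal sketch follows. The domain example.com and the list of image extensions are assumptions to adapt to your own site.

```apache
RewriteEngine on
# Allow requests that send no Referer at all (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred by your own site (placeholder domain)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Refuse everything else that asks for an image
RewriteRule \.(jpe?g|png|gif)$ - [F,NC]
```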

Adding MIME types

MIME types tell the browser how to treat a specific type of file. With AddType, you tell the Apache server which MIME type to send for a given file extension. For example, you can tell the server to serve .mp3 files as audio files.

Open up your site’s root .htaccess file and input the following directives to it:

AddType audio/mpeg .mp3
AddType video/mp4 .mp4
AddType application/x-chrome-extension .crx

There are various MIME types you can add. The ones mentioned above are just a few examples.
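
A few more mappings are sketched below for newer file formats. They assume your server does not already know these extensions, which you can verify by checking the Content-Type header of a response.

```apache
# Modern image and font formats
AddType image/webp .webp
AddType image/svg+xml .svg
AddType font/woff2 .woff2
```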

Specify error documents

To create your custom error documents and link them to the error codes, you need to be familiar with the HTTP status codes the server returns. The most common are 400 (Bad Request), 401 (Unauthorized), 403 (Forbidden), 404 (Not Found), and 500 (Internal Server Error).

Open up your site’s root .htaccess file and input the following directives to it:

ErrorDocument 400 /errors/badrequestpage.html
ErrorDocument 401 /errors/authreqpage.html
ErrorDocument 403 /errors/forbidpage.html
ErrorDocument 404 /errors/notfoundpage.html
ErrorDocument 500 /errors/serverpage.html

Here the error pages are stored in the errors directory and referenced by local paths. A local path lets Apache serve the page with the original status code, whereas a full URL makes Apache redirect the browser, which hides the original code and does not work for 401. You can name the error documents anything and link them, as shown above.
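
ErrorDocument can also return a short inline message instead of a file. The sketch below shows both forms side by side. The message text is made up for illustration.

```apache
# Serve a local page; the visitor still receives the original 404 status code
ErrorDocument 404 /errors/notfoundpage.html
# Or return a short inline message instead of a file
ErrorDocument 403 "Sorry, you are not allowed to view this page."
```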

Leveraging browser caching

Leveraging browser caching means instructing browsers to keep copies of your site’s frequently used files in the user’s local storage. This speeds up page loads on repeat visits, because the browser reuses the local copy instead of downloading the file again. It works best for static content such as images, stylesheets, and scripts.

Open up your site’s root .htaccess file and input the following directives to it:

<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access plus 1 year"
ExpiresByType image/jpeg "access plus 1 year"
ExpiresByType image/gif "access plus 1 year"
ExpiresByType image/png "access plus 1 year"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/pdf "access plus 1 month"
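ExpiresByType text/javascript "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType text/x-javascript "access plus 1 month"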
ExpiresByType application/x-shockwave-flash "access plus 1 month"
ExpiresByType image/x-icon "access plus 1 year"
ExpiresDefault "access plus 7 days"
</IfModule>

You can adjust the time duration according to your website.
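
If mod_expires is not available, a similar effect can be approximated with mod_headers by setting the Cache-Control header directly. This is a sketch of that alternative, not a replacement for the block above. The extension list and the one-year lifetime are assumptions.

```apache
<IfModule mod_headers.c>
    # Cache common image formats for one year (31536000 seconds)
    <FilesMatch "\.(jpe?g|png|gif|ico)$">
        Header set Cache-Control "max-age=31536000, public"
    </FilesMatch>
</IfModule>
```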

Conclusion

The .htaccess file gives you great control over your Apache website’s behaviour. There are many things that you can do with it, and it is quite flexible, allowing you to manage everything on a per-folder basis. Why not read our post on tools to scan for website problems to check that your site is working properly?



About the Authors

Each member of Anto's editorial team is a Cloud expert in their own right. Anto Online takes great pride in helping fellow Cloud enthusiasts. Let us know if you have an excellent idea for the next topic!
