NetEngine AR600, AR6100, AR6200, and AR6300 V300R019 CLI-based Configuration Guide - Security

Understanding URL Filtering

URL filtering allows the device to control which network resources users can access through HTTP.

URL Matching Mode

After the device receives an HTTP request, it parses the request and matches the requested URL against the configured URL rules using the specified matching mode. If a rule matches, the device processes the HTTP request according to the action configured for that URL.

A URL is the address of a web page or an accessible resource on the Internet.

The standard format of a URL is protocol://hostname[:port]/path[?query]. Table 7-7 describes the fields.

Table 7-7 URL fields

| Field | Description |
|---|---|
| protocol | Application protocol in use, typically HTTP. |
| hostname | Domain name or IP address of the web server. |
| :port | (Optional) Communication port. Application protocols have default ports. For example, the default port for HTTP is 80. If the server uses the default port, you do not need to configure the port number in a URL filtering rule. If the server uses a non-default port, the port number is mandatory in a URL filtering rule. |
| path | Directory or file path on the web server. The path is a character string that can be separated by slashes (/). |
| ?query | (Optional) Parameters passed to dynamic web pages. |
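
The fields in Table 7-7 can be illustrated with a short Python sketch. It is not device code: it uses the standard urllib.parse module, and the helper name split_url and the DEFAULT_PORTS table are assumptions made for this example. Only the HTTP default port (80) comes from the table.

```python
from urllib.parse import urlsplit

# Default ports per protocol; only HTTP's port 80 is taken from Table 7-7.
DEFAULT_PORTS = {"http": 80}

def split_url(url):
    """Split a URL into the fields listed in Table 7-7 (hypothetical helper)."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL carries no explicit port
    if port is None:
        port = DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parts.query,
    }

fields = split_url("http://www.example.com:8080/news/index.html?id=7")
print(fields)
```

Because port 8080 is not the HTTP default, a URL filtering rule for this server would need to include the port number.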

URL matching modes include prefix matching, suffix matching, keyword matching, and exact matching. Table 7-8 describes the four matching modes.

Table 7-8 URL matching modes

| Matching Mode | Function | Example | Result |
|---|---|---|---|
| Prefix matching | Matches URLs that start with a specified character string. | test.com* | All URLs that start with test.com are matched, for example test.com, test.com.cn, and test.com/solutions.do. The following URL does not match: www.test.com. |
| Suffix matching | Matches URLs that end with a specified character string. | *test.com | All URLs that end with test.com are matched, for example www.test.com and newtest.com. The following URLs do not match: www.test.com.cn and www.test.com/news. |
| Keyword matching | Matches URLs that include a specified character string. | *test* | All URLs that include test are matched, for example sports.test.com/news/solutions.aspx and newtest1.com/it/. |
| Exact matching | First checks whether the URL matches the specified character string. If it does not, the device removes the last directory from the URL and checks again. If the URL still does not match, the device removes the next directory and checks again, repeating until only the domain name is left to compare with the character string. | www.example.com | The following URLs match www.example.com: www.example.com, www.example.com/news, and www.example.com/news/en/. The following URLs do not match www.example.com: www.example.com.cn/news and www.example.org/news/. |
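
The four matching modes in Table 7-8 can be sketched in Python as follows. This is an illustration under stated assumptions, not the device's implementation: the function names classify, exact_match, and matches are hypothetical, the mode is inferred from where the asterisk appears in the rule, and the URL is assumed to have no leading http://.

```python
def classify(rule):
    """Return (mode, text) for a rule written in the Table 7-8 notation."""
    if rule.startswith("*") and rule.endswith("*") and len(rule) > 2:
        return "keyword", rule[1:-1]
    if rule.startswith("*"):
        return "suffix", rule[1:]
    if rule.endswith("*"):
        return "prefix", rule[:-1]
    return "exact", rule

def exact_match(url, text):
    """Compare the URL with the rule, dropping the last directory each round."""
    candidate = url.rstrip("/")
    target = text.rstrip("/")
    while True:
        if candidate == target:
            return True
        if "/" not in candidate:
            return False  # only the domain name was left and it did not match
        candidate = candidate.rsplit("/", 1)[0]

def matches(url, rule):
    """Check one URL against one rule; comparison is case insensitive."""
    mode, text = classify(rule)
    url, text = url.lower(), text.lower()
    if mode == "prefix":
        return url.startswith(text)
    if mode == "suffix":
        return url.endswith(text)
    if mode == "keyword":
        return text in url
    return exact_match(url, text)

for url in ("www.example.com/news/en/", "www.example.com.cn/news"):
    print(url, matches(url, "www.example.com"))
```

In the exact matching branch, www.example.com/news/en/ is shortened to www.example.com/news and then to www.example.com before it matches, which mirrors the directory-by-directory check described in the table.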

The priorities of URL matching modes are as follows:

Exact matching > suffix matching > prefix matching > keyword matching

For example, URL www.example.com/news first matches the exact matching rule www.example.com/news among the following rules:
  • Exact matching: www.example.com/news
  • Prefix matching: www.example.com/*
  • Keyword matching: *example*
In the same matching mode, a longer rule has a higher priority than a shorter one. For example, URL www.example.com/news/index.html first matches www.example.com/news/* in the following prefix matching rules:
  • www.example.com/news/*
  • www.example.com/*
If the matching rules have the same length in the same matching mode, the action mode determines which rule the URL matches. As shown in Table 7-9, the two URL rules are in keyword matching mode and have the same length (4). For URL www.example.com/welcome.html:
  • If the action mode is strict mode, the URL will match the category with a stricter action. In this example, the URL matches category B whose action is block.
  • If the action mode is loose mode, the URL will match the category with a looser action. In this example, the URL matches category A whose action is permit.
Table 7-9 Action mode

| URL Rule | URL Category | Control Action |
|---|---|---|
| *.com* | URL category A | Permit |
| *html* | URL category B | Block |
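
The priority rules above (matching mode first, then rule length, then action mode) can be sketched as a small selection function. This is a sketch under assumptions: pick_rule and its tuple layout are hypothetical, the input list is assumed to be non-empty and to contain only rules that already matched the URL, and rule length is counted without the asterisks, which is consistent with the length of 4 given for *.com* and *html*.

```python
# Lower number = higher priority: exact > suffix > prefix > keyword.
MODE_PRIORITY = {"exact": 0, "suffix": 1, "prefix": 2, "keyword": 3}
# Higher number = stricter action.
STRICTNESS = {"permit": 0, "block": 1}

def rule_length(pattern):
    """Length of a rule without its leading or trailing asterisks."""
    return len(pattern.strip("*"))

def pick_rule(matched, action_mode="strict"):
    """Choose one rule from (mode, pattern, action) tuples that matched a URL."""
    best_mode = min(MODE_PRIORITY[mode] for mode, _, _ in matched)
    same_mode = [r for r in matched if MODE_PRIORITY[r[0]] == best_mode]
    longest = max(rule_length(pattern) for _, pattern, _ in same_mode)
    tied = [r for r in same_mode if rule_length(r[1]) == longest]
    # Same mode and same length: the action mode breaks the tie.
    chooser = max if action_mode == "strict" else min
    return chooser(tied, key=lambda r: STRICTNESS[r[2]])

table_7_9 = [("keyword", "*.com*", "permit"), ("keyword", "*html*", "block")]
print(pick_rule(table_7_9, "strict"))
print(pick_rule(table_7_9, "loose"))
```

With the two rules from Table 7-9, strict mode selects the block rule (category B) and loose mode selects the permit rule (category A).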

  • In all of the matching modes, the device removes http:// from the beginning of the entered character string. In the exact matching mode, the device adds a slash (/) at the end of character strings that do not contain any slashes (/) after the hostname.
  • URLs are case insensitive.
  • This section mainly describes URL matching details. For details on URL control actions, see Working Mechanism of URL Filtering.
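
The normalization described in the first two notes can be sketched as follows. The function name normalize and its exact parameter are assumptions for illustration, and lowercasing is only one possible way to make comparisons case insensitive.

```python
def normalize(text, exact=False):
    """Normalize an entered rule string (hypothetical helper)."""
    text = text.lower()                  # URLs are case insensitive
    if text.startswith("http://"):
        text = text[len("http://"):]     # the leading http:// is removed
    if exact and "/" not in text:
        text += "/"                      # exact mode: no slash after the hostname, so add one
    return text

print(normalize("http://www.example.com", exact=True))
```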

Working Mechanism of URL Filtering

When an HTTP request matches a URL, the device processes the HTTP request according to the URL Filtering Mode and URL Filtering Process.

URL Filtering Mode

The device provides URL filtering based on the blacklist, whitelist, and URL category:

  • Blacklist and whitelist: The device filters received HTTP requests according to the manually configured URL whitelist or URL blacklist.

  • URL category: After receiving an HTTP request, the device looks up the category of the requested URL. After the category is found, the device processes the HTTP request according to the action configured for that category.

The AR611W, AR611W-LTE4CN, AR617VW, AR617VW-LTE4, AR617VW-LTE4EA models do not support remote URL query. Currently, URL filtering uses a built-in local library and does not support remote dynamic update of URLs. The local library covers some mainstream websites, grouped into predefined categories, and is used to control access to common websites. As the network develops, predefined URL categories may not cover new websites. If a URL category is found for a request, the device takes the action configured in the URL filtering profile. If no URL category is found, the device processes the request based on the default action. To refine filtering beyond URL categories, you can configure a URL blacklist or whitelist.

The blacklist, whitelist, URL category, and action are described as follows:

  • URL whitelist: Enterprises can add URLs of network resources that employees are allowed to access to a URL whitelist. When URLs in HTTP requests match the URL whitelist, the device allows the HTTP requests to pass through.

  • URL blacklist: Enterprises can add URLs of network resources that employees are not allowed to access to a URL blacklist. When URLs in HTTP requests match the URL blacklist, the device rejects the HTTP requests and displays the blacklist blocking page.

  • URL category: URL categories fall into user-defined and predefined URL categories.

    • User-defined categories are configured and maintained by administrators and enable more refined control over URLs than predefined categories.
    • Predefined categories contain commonly used URLs. Predefined categories enable administrators to easily control accessible and inaccessible URL categories.
      Predefined URL category query has one mode:
      • Local URL filtering: A predefined URL category database is loaded to the local cache of the device before delivery. You can use this database if you have no special requirements. If the predefined URL category database fails to load, or if the security service center updates it, manually load the new predefined URL category database to the local cache. After obtaining the URL information, the device first queries the cache for the category matching the URL. If a category is found, the device takes the action configured in the URL filtering profile. If no matching category is found, the device takes the default action.

  • Action: After the device finds a URL category matching an HTTP request, it processes the HTTP request according to the action taken for the URL category. Currently, the device supports the following actions:

    • allow: The device does not process the HTTP request, and allows the HTTP request to pass through.
    • alert: The device allows the HTTP request to pass through and generates a log.
    • block: The device rejects the HTTP request and generates a log. In addition, the device displays the blocking page.
If a URL belongs to multiple categories, the device takes an action based on the action mode.
  • Strict: The device takes the strictest action among all matched categories. For example, a URL belongs to two categories, and the actions are Alert and Block. In this case, the device takes the Block action.
  • Loose: The device takes the loosest action among all matched categories. For example, a URL belongs to two categories, and the actions are Alert and Block. In this case, the device takes the Alert action.
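
The actions and the strict and loose action modes can be sketched as follows. The names resolve_action and apply_action are hypothetical. Ranking allow below alert is an assumption, since the text only compares Alert with Block, and the absence of a log for allow is inferred from the action descriptions rather than stated.

```python
# Actions ordered from loosest to strictest (allow < alert is an assumption).
ACTION_ORDER = ["allow", "alert", "block"]

def resolve_action(actions, action_mode):
    """Pick one action when a URL belongs to several categories."""
    ranked = sorted(actions, key=ACTION_ORDER.index)
    return ranked[-1] if action_mode == "strict" else ranked[0]

def apply_action(action):
    """Return (request_passes, log_generated, blocking_page_shown)."""
    effects = {
        "allow": (True, False, False),   # passes through; no log is mentioned
        "alert": (True, True, False),    # passes through and is logged
        "block": (False, True, True),    # rejected, logged, blocking page shown
    }
    return effects[action]

chosen = resolve_action(["alert", "block"], "strict")
print(chosen, apply_action(chosen))
```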

URL Filtering Process

URL filtering enables the device to filter HTTP requests passing through the device. Figure 7-10 shows the URL filtering process.

Figure 7-10 URL filtering process
If a user accesses a network resource using HTTP through the device, the device applies the filter as follows:
  1. A user initiates a URL access request. If the data flow matches a security policy, and the action of the policy is permit, the system implements URL filtering.
  2. The device checks the HTTP packet.

    • If the HTTP packet is abnormal, the device blocks the request.
    • If the HTTP packet is normal, the device proceeds to the next step.
  3. The device matches the URL with the whitelist.

    • If the URL matches the whitelist, the device permits the URL access request.
    • If the URL does not match the whitelist, the device proceeds to the next step.
  4. The device matches the URL with the blacklist.

    • If the URL matches the blacklist, the device blocks the URL access request.
    • If the URL does not match the blacklist, the device proceeds to the next step.
  5. The device matches the URL with user-defined categories.

    • If the URL matches a user-defined category, the device processes the request based on the action for this user-defined category.

      URLs that an administrator manually adds to a predefined category are still treated as user-defined URLs.

    • If the URL does not match user-defined categories, the device proceeds to the next step.
  6. The device matches the URL with predefined categories in the cache.

    • If a predefined category is matched, the device processes the request based on the action for this predefined category.

    • If no predefined category is matched, the device processes the request based on the default action.
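
The order of checks in steps 2 to 6 can be summarized in a short Python sketch. It assumes that step 1 has already passed (the security policy permits the flow). The function filter_request and its parameters are hypothetical, and plain set and dictionary lookups stand in for the URL matching modes to keep the example short.

```python
def filter_request(url, packet_ok, whitelist, blacklist,
                   user_categories, predefined_categories, default_action):
    """Return the action for one HTTP request, following steps 2 to 6."""
    if not packet_ok:                      # step 2: abnormal HTTP packet
        return "block"
    if url in whitelist:                   # step 3: whitelist permits the request
        return "allow"
    if url in blacklist:                   # step 4: blacklist blocks the request
        return "block"
    if url in user_categories:             # step 5: action of the user-defined category
        return user_categories[url]
    if url in predefined_categories:       # step 6: action of the predefined category
        return predefined_categories[url]
    return default_action                  # no category matched

# Illustrative data only; none of these entries come from a real device.
whitelist = {"www.example.com/news"}
blacklist = {"www.example.org"}
user_categories = {"www.example.net": "alert"}
predefined_categories = {"www.example.com": "block"}

for url in ("www.example.com/news", "www.example.com", "www.example.edu"):
    print(url, filter_request(url, True, whitelist, blacklist,
                              user_categories, predefined_categories, "allow"))
```

Because the whitelist is checked before the blacklist and the categories, a whitelisted URL is permitted even if it would also match a later stage.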