How to parse a URI...

Have you ever been presented with the task of building a security policy on top of your web applications? How about being asked to route certain applications to specific nodes based on the time of day? Or, how about blocking all requests to a web application that are missing certain query string attributes?

This article illustrates how to use the various iRules commands to extract the components of a URI and build that security or high-availability policy you need. An HTTP request URL breaks down into the following format:

protocol://[host][uri]

For this article, we will use the following request:

http://www.example.com/dir1/dir2/file.ext?arg1=val1&arg2=val2&arg3

where the protocol is typically http or https, the host is the address (and optional port) the request is directed to, and the URI is everything after the host. The host and URI can be retrieved with the HTTP::host and HTTP::uri commands, respectively:

when HTTP_REQUEST {
  # This would be "www.example.com"
  set host [HTTP::host]
  # This will be "/dir1/dir2/file.ext?arg1=val1&arg2=val2&arg3"
  set uri [HTTP::uri]
}
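If you want to experiment with the protocol://[host][uri] split outside of BIG-IP, where HTTP::host and HTTP::uri are not available, it can be approximated in plain Tcl. The following is a minimal sketch for a standalone tclsh 8.5 or later (it uses lassign, which may not exist in the iRules runtime); the proc name split_url is hypothetical and is not an iRules command.

```tcl
# Hypothetical helper (not an iRules command): split a full request URL
# into protocol, host and URI, following protocol://[host][uri].
proc split_url {url} {
    # protocol = scheme before "://"
    # host     = everything up to the first "/", "?" or "#" (may include a port)
    # uri      = the remainder
    if {![regexp {^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$} $url -> protocol host uri]} {
        error "not an absolute URL: $url"
    }
    # A bare "http://host" request still asks for "/".
    if {$uri eq ""} {
        set uri "/"
    }
    return [list $protocol $host $uri]
}

lassign [split_url "http://www.example.com/dir1/dir2/file.ext?arg1=val1&arg2=val2&arg3"] protocol host uri
puts "protocol: $protocol"
puts "host:     $host"
puts "uri:      $uri"
```

The host group deliberately keeps any ":port" suffix, matching the description of the host as the address plus an optional port.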

Dissecting the URI

Now, let's say you have a requirement to make policy decisions based on specific components within the URI. You'd most likely need to dissect the URI into its components. If you're a TCL guru who enjoys string parsing, you could use the built-in TCL string commands to break the URI down. But for the rest of you who have better things to do with your time, we've included some utility commands to help you out. A URI consists of a path and an optional query component:

uri -> [path]?[query]

These values can be extracted from any URI string with the URI::path and URI::query commands, or read directly from the current request with HTTP::path and HTTP::query.

when HTTP_REQUEST {
  # These will be "/dir1/dir2/file.ext"
  set path [URI::path [HTTP::uri]]
  set path2 [HTTP::path]
  # These will be "arg1=val1&arg2=val2&arg3"
  set query [URI::query [HTTP::uri]]
  set query2 [HTTP::query]
}
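The uri -> [path]?[query] rule amounts to splitting at the first "?". Here is a minimal plain-Tcl sketch of that split for off-box experimentation (tclsh 8.5 or later); split_uri is a hypothetical name, and it returns the full path including the file name, as in the HTTP::path comment above. It is not a reimplementation of URI::path.

```tcl
# Hypothetical helper: split a URI at the first "?" into path and query.
# The query is "" when the URI has no "?".
proc split_uri {uri} {
    set idx [string first "?" $uri]
    if {$idx < 0} {
        return [list $uri ""]
    }
    set path  [string range $uri 0 [expr {$idx - 1}]]
    set query [string range $uri [expr {$idx + 1}] end]
    return [list $path $query]
}

lassign [split_uri "/dir1/dir2/file.ext?arg1=val1&arg2=val2&arg3"] path query
puts "path:  $path"
puts "query: $query"
```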

OK, that's all well and good, but most likely you'll need more detail than that. Maybe you need to know whether the second directory is "/apps" or whether the value of the uid query parameter is a number. So you'll need to tear apart the path and query components to get at that information. For the path, you can pass optional arguments to the URI::path command to get the depth (number of directories) or a range of directories, and use the URI::basename command to pull out the final file name.

when HTTP_REQUEST {
  # This will be 2
  set depth [URI::path [HTTP::uri] depth]
  # This will be "/dir1"
  set path1 [URI::path [HTTP::uri] 1 1]
  # This will be "/dir2"
  set path2 [URI::path [HTTP::uri] 2 2]
  # This will be "file.ext"
  set basename [URI::basename [HTTP::uri]]
}
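The same depth/directory/basename breakdown can be approximated in plain Tcl, which may help when testing expected values off-box. This is a sketch under stated assumptions (standalone tclsh 8.5 or later; path_parts is a hypothetical name): it counts only directories toward the depth, so the example path has depth 2 as in the comment above, and it returns bare directory names without slashes. The slash handling of the real URI::path may differ.

```tcl
# Hypothetical helper: break the path portion of a URI into its directory
# segments and the trailing file name (basename).
proc path_parts {uri} {
    # Drop the query string, if any.
    set path [lindex [split $uri "?"] 0]
    # Splitting on "/" yields empty elements for the leading slash
    # (and for any "//" runs); skip those.
    set segments {}
    foreach seg [split $path "/"] {
        if {$seg ne ""} {
            lappend segments $seg
        }
    }
    # A path ending in "/" has no basename; otherwise the last segment is it.
    if {[string index $path end] eq "/" || [llength $segments] == 0} {
        set basename ""
        set dirs $segments
    } else {
        set basename [lindex $segments end]
        set dirs [lrange $segments 0 end-1]
    }
    return [list $dirs $basename]
}

lassign [path_parts "/dir1/dir2/file.ext?arg1=val1&arg2=val2&arg3"] dirs basename
puts "depth: [llength $dirs]"
set i 0
foreach d $dirs {
    incr i
    puts "dir\[$i\]: /$d"
}
puts "basename: $basename"
```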

Well, that takes care of the path; now on to the query. In v9.2, we introduced an extension to the URI::query command that accepts an optional query parameter name and returns the value of that parameter.

when HTTP_REQUEST {
  # This will be "val1"
  set v1 [URI::query [HTTP::uri] "arg1"]
  # This will be "val2"
  set v2 [URI::query [HTTP::uri] "arg2"]
  # This will be ""
  set v3 [URI::query [HTTP::uri] "arg3"]
}
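To see what a by-name lookup involves, here is a minimal plain-Tcl sketch (standalone tclsh; query_value is a hypothetical name, not the URI::query implementation). It returns the first matching parameter's value, or "" when the parameter is absent or has no "=value" part like arg3, so it cannot tell those two cases apart. It splits each pair at the first "=" and performs no URL-decoding.

```tcl
# Hypothetical helper: return the value of the first query parameter named
# $name, or "" when it is absent or has no "=value" part.
proc query_value {query name} {
    foreach pair [split $query "&"] {
        set eq [string first "=" $pair]
        if {$eq < 0} {
            # Parameter without a value, e.g. "arg3".
            set key $pair
            set val ""
        } else {
            set key [string range $pair 0 [expr {$eq - 1}]]
            set val [string range $pair [expr {$eq + 1}] end]
        }
        if {$key eq $name} {
            return $val
        }
    }
    return ""
}

set query "arg1=val1&arg2=val2&arg3"
puts "arg1 -> [query_value $query arg1]"
puts "arg2 -> [query_value $query arg2]"
puts "arg3 -> '[query_value $query arg3]'"
```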

But this assumes you know the names of the various query parameters, and assuming is generally not a good thing... At the time of this writing, there is no command that returns a list of all the query parameters, so here's a little bit of string parsing to split the query string into its components.

when HTTP_REQUEST {
  # Split the query into name-value pairs delimited by "&"
  set namevals [split [HTTP::query] "&"]
  # A TCL for loop - if you ever wondered what one looks like...
  for {set i 0} {$i < [llength $namevals]} {incr i} {
    # Split name-value pair into name and value delimited by "="
    set params [split [lindex $namevals $i] "="]
    set name [lindex $params 0]
    set val [lindex $params 1]
  }
}
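One limitation of the loop above is that split on "=" cuts at every "=", so a value such as "abc==" (base64 padding, for example) comes back as just "abc". The following plain-Tcl sketch (standalone tclsh 8.5 or later; query_pairs is a hypothetical name) splits each pair at the first "=" instead and collects the results into an ordered list of {name value} pairs, which keeps duplicate names that a dict would collapse. It does no URL-decoding and skips empty items produced by "&&".

```tcl
# Hypothetical helper: turn a query string into an ordered list of
# {name value} pairs, splitting each item at the FIRST "=" only.
proc query_pairs {query} {
    set pairs {}
    foreach item [split $query "&"] {
        # Skip empty items, e.g. from "a=1&&b=2".
        if {$item eq ""} {
            continue
        }
        set eq [string first "=" $item]
        if {$eq < 0} {
            lappend pairs [list $item ""]
        } else {
            lappend pairs [list [string range $item 0 [expr {$eq - 1}]] \
                                [string range $item [expr {$eq + 1}] end]]
        }
    }
    return $pairs
}

foreach pair [query_pairs "arg1=val1&arg2=val2&arg3"] {
    lassign $pair name val
    puts "$name = '$val'"
}
```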

Putting it all together

Here's an iRule that breaks apart the URI and logs all of its components:

when HTTP_REQUEST {
  log local0. "----------------------"
  log local0. "URI Information"
  log local0. "----------------------"
  log local0. "HTTP::uri: [HTTP::uri]"
  log local0. "----------------------"
  log local0. "Path Information"
  log local0. "----------------------"
  log local0. "HTTP::path: [HTTP::path]"
  set depth [URI::path [HTTP::uri] depth]
  for {set i 1} {$i <= $depth} {incr i} {
    set dir [URI::path [HTTP::uri] $i $i]
    log local0. "dir\[$i\]: $dir"
  }
  log local0. "Basename: [URI::basename [HTTP::uri]]"
  log local0. "----------------------"
  log local0. "Query Information"
  log local0. "----------------------"
  log local0. "HTTP::query: [HTTP::query]"
  set namevals [split [HTTP::query] "&"]
  for {set i 0} {$i < [llength $namevals]} {incr i} {
    set params [split [lindex $namevals $i] "="]
    set pnum [expr {$i + 1}]
    log local0. "Param\[$pnum\]: [lindex $params 0]"
#    log local0. "Value\[$pnum\]: [lindex $params 1]"
    log local0. "Value\[$pnum\]: [URI::query [HTTP::uri] [lindex $params 0]]"
  }
}

Well, now you've got the tools to rip a URI apart. Parse away...

Published Feb 06, 2007
Version 1.0
