iRule Security 101 - #03 - HTML Comments

In this session of iRules Security 101, I'll walk you through a process to strip unnecessary content from your outbound application responses.  Section 3.2.5 of RFC 1866 (Hypertext Markup Language - 2.0) allows comments to be enclosed within HTML content.  In certain cases, this can lead to unwanted information being exposed.

<body>
  ...
  <!-- Pull user info from Users table in database and format a list from that data -->
  ...
</body>


In this article, I'll show you how to remove all those HTML comments from all HTTP traffic that's leaving your network.

An early part of every (respectable) developer's training is to thoroughly comment one's code.  Typically this is a safe practice, as source code is either compiled or obfuscated before being made available to the client.  But with the advent of web-based applications, situations can occur that cause your source code to be leaked.  While stripping out HTML comments will not prevent every content breach, it does a small part in making sure that internal information included during asp/jsp/html development does not reach the masses.

The following example will inspect all HTML responses for patterns matching HTML comments and replace those characters with spaces, effectively erasing them from the outside world.

 

when HTTP_REQUEST {
  # Don't allow data to be chunked. This ensures we don't get
  # a comment that is spread across two chunked boundaries.
  if { [HTTP::version] eq "1.1" } {
    if { [HTTP::header is_keepalive] } {
      HTTP::header replace "Connection" "Keep-Alive"
    }
    HTTP::version "1.0"
  }
}
when HTTP_RESPONSE {
  # Ensure all of the HTTP response is collected
  if { [HTTP::header exists "Content-Length"] } {
     set content_length [HTTP::header "Content-Length"]
  } else {
     set content_length 1000000
  }
  if { $content_length > 0 } {
     HTTP::collect $content_length
  }
}
when HTTP_RESPONSE_DATA {
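  # Find the HTML comments. The inner group is non-capturing, (?:...), so that
  # -inline -indices returns exactly one {start end} pair per comment rather
  # than an extra pair (possibly {-1 -1}) for the subexpression.
  set indices [regexp -all -inline -indices {<![ \r\n\t]*--(?:[^\-]|[\r\n]|-[^\-])*[^/][^/]--[ \r\n\t]*>} [HTTP::payload]]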
  # Replace the comments with spaces in the response
  #log local0. "Indices: $indices"
  foreach idx $indices {
     set start [lindex $idx 0]
     set len [expr {[lindex $idx 1] - $start + 1}]
     #log local0. "Start: $start, Len: $len"
     HTTP::payload replace $start $len [string repeat " " $len]
  }
}

The special sauce in here is the regular expression used to search for the comments.  I'll leave it to you all to figure out how the regular expression works and possibly rehash it when I start the "iRules Ninja" series. 

Bonus points to anyone who can comment on why I added the "[^/][^/]" towards the end of the regexp.
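
One way to explore that question away from a BIG-IP is to run the same expression in a plain tclsh. The following is only a sketch under a few assumptions: blank_comments is a hypothetical helper written for this illustration (it is not an iRules command), the sample page is made up, and the inner group is written as non-capturing, (?:...), so that each match yields a single index pair. It mirrors the blank-with-spaces loop from the iRule using string replace instead of HTTP::payload replace.

```tcl
# Standalone tclsh sketch; nothing here requires BIG-IP.
# Pattern from the iRule, with the inner group written as non-capturing.
set comment_re {<![ \r\n\t]*--(?:[^\-]|[\r\n]|-[^\-])*[^/][^/]--[ \r\n\t]*>}

# blank_comments is a hypothetical helper for this illustration.
# It overwrites every match of $re in $text with the same number of spaces.
proc blank_comments {re text} {
    foreach idx [regexp -all -inline -indices $re $text] {
        set start [lindex $idx 0]
        set end   [lindex $idx 1]
        set blanks [string repeat " " [expr {$end - $start + 1}]]
        # Same-length replacement, so the remaining indices stay valid.
        set text [string replace $text $start $end $blanks]
    }
    return $text
}

# Made-up sample: one ordinary comment, one comment used to hide script
# from old browsers (it closes with "//-->").
set page "<p>hi</p><!-- internal note --><script><!--\nrun();\n//--></script>"
puts [blank_comments $comment_re $page]
```

With this sample, the ordinary comment is overwritten with spaces while the block that closes with "//-->" is left alone, because the two characters immediately before the closing "--" must both be something other than a slash. The same requirement means a comment whose body is shorter than two characters, such as "<!--a-->", is not matched either.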

Published Aug 29, 2007
Version 1.0

5 Comments

  • Is the trailing negated match to disqualify javascript comments?
  • Somehow I knew citizen_elah would be quick to answer...

    Yes, typically embedded javascript is enclosed in html comments, and in every case I can think of it is undesirable to erase the client-side javascript before sending the page to the browser.

    -Joe
  • Mike_Lowell_108
    Historic F5 Account
    Actually that part of the regex seems incorrect. My memory is telling me that bracket expressions contain a list of characters, not a string, so the 2nd slash is redundant. "man 7 regex" seems to confirm this:

    "If the list begins with `^', it matches any single character [...]"
  • Mike_Lowell_108
    Historic F5 Account
    Oh, and I almost forgot: the following PCRE has served me well...

    
    s///msg;

    You'll note that I took a different approach to matching. I'm not sure it's better, but it is different, for sure. 🙂 The idea was to make sure I was allowing "anything" except sequences that would mean something special. I settled on this approach while trying to match a bunch of fairly complex and malformed-looking comments. I'm not sure I solved the problem perfectly, but I did solve it for the test cases I was interested in.
  • Is there an existing method to strip out the C-style comments of CSS?

    /* this is a comment */