DNS records, TTLs and SORBS.

May 23 2010

Another common point of contention over SORBS’ policies is the perceived TTL requirement.

The TTL requirement applies only to the PTR record for single IP delistings, and even then only for delisting from the SORBS DUHL database.

So what is a TTL and why is it an issue?

TTL means ‘Time To Live’ and is the time a record is held as valid and cached before rechecking.  TTLs apply to many things such as web pages and DNS records; web page TTLs tell your browser not to reload the page from the server for a minimum amount of time.  Browsers can ignore this value, but it is recommended that they do not.  In DNS the TTL comes into play during name lookups: when you enter “http://www.sorbs.net/” the computer knows to look up www.sorbs.net, contact the host on port 80 and request a web page from the site www.sorbs.net.  In technical terms this is done in the following steps, completely transparently to the user:

First the browser looks at the protocol requested (in this case http) and knows it is an HTTP (web) request.  As there is no port specified, it knows to use the default port of 80.
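
As a rough illustration of this first step, the sketch below splits the URL with Python's standard `urllib.parse` and falls back to a default port when none is given.  The helper name `scheme_host_port` and the small `DEFAULT_PORTS` table are our own, not anything a browser exposes.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two common web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def scheme_host_port(url):
    """Split a URL into (scheme, hostname, port), filling in the default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port


print(scheme_host_port("http://www.sorbs.net/"))  # ('http', 'www.sorbs.net', 80)
```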

Next it grabs the hostname and looks up www.sorbs.net by sending the request “Give me the A record for www.sorbs.net” to the local DNS server(s).  The DNS server will then consult its authoritative records and, if it’s not a SORBS name server, will not find it.  The server will then consult its “cache” entries and, if it finds an entry, check the TTL.  The TTL check has two parts, the TTL value and the time the record was last retrieved: if the value added to the time retrieved is in the past, the server will expunge the record and check its “root zone” file.  If the TTL+retrieval time is in the future, it returns the record to the host, which in turn gives it to the browser.
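
The cache check can be sketched in a few lines.  This is a toy model, assuming a simple in-memory dictionary; the class name `DnsCache` and its methods are hypothetical and real resolvers are considerably more involved.

```python
import time


class DnsCache:
    """Toy cache: each entry stores the value, its TTL and when it was retrieved."""

    def __init__(self):
        self._entries = {}  # (name, rtype) -> (value, ttl_seconds, retrieved_at)

    def put(self, name, rtype, value, ttl, now=None):
        retrieved_at = time.time() if now is None else now
        self._entries[(name, rtype)] = (value, ttl, retrieved_at)

    def get(self, name, rtype, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, ttl, retrieved_at = entry
        if retrieved_at + ttl <= now:
            # TTL + retrieval time is in the past: expunge and report a miss.
            del self._entries[(name, rtype)]
            return None
        return value


cache = DnsCache()
cache.put("www.sorbs.net", "A", "111.125.160.134", ttl=600, now=1000)
print(cache.get("www.sorbs.net", "A", now=1500))  # still valid: 111.125.160.134
print(cache.get("www.sorbs.net", "A", now=1700))  # expired: None
```

A miss (`None`) is the point at which the server would go on to the root hints.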

The ‘root zone’ file is a list of ‘hints’.  Hints are a static list of where to go to find the answers; in this case the hints return the list of root servers.  When queried, the root servers refer the resolver to the servers for ‘net’, which in turn return the NS records for the domain ‘sorbs.net’ along with the IP addresses associated with the names in those NS records.  These NS records can then be used to locate the servers with the authoritative information for the domain ‘sorbs.net’.  The resolving server (the one local to the browser) will then ask the authoritative servers for ‘sorbs.net’ for the A record for ‘www.sorbs.net’.  The result, the time it was retrieved and the TTL of the record are then added to the local ‘cache’ to save going through the process again for any other requests for the same information within the same time.  The resolving server then sends the information back to the host with the browser, which in turn passes it to the browser.
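
The walk from the hints down to the authoritative server can be simulated without touching the network.  Everything in the tables below is made up for illustration (the `*.example` server names in particular), except the A record value and its 600 second TTL, which are the ones quoted in this article.

```python
# Hypothetical server names; only the www.sorbs.net record comes from the text.
ROOT_HINTS = ["a.root.example"]

SERVERS = {
    "a.root.example": {"referrals": {"net": ["ns.net.example"]}},
    "ns.net.example": {"referrals": {"sorbs.net": ["ns.sorbs.example"]}},
    "ns.sorbs.example": {
        "answers": {("www.sorbs.net", "A"): ("111.125.160.134", 600)}
    },
}


def resolve(name, rtype="A", max_hops=10):
    """Follow referrals from the root hints until a server has the answer."""
    servers = list(ROOT_HINTS)
    path = []
    for _ in range(max_hops):
        server = servers[0]
        path.append(server)
        data = SERVERS[server]
        answer = data.get("answers", {}).get((name, rtype))
        if answer is not None:
            value, ttl = answer
            return value, ttl, path
        # No answer here: follow the most specific delegation covering the name.
        referrals = data.get("referrals", {})
        matches = [d for d in referrals if name == d or name.endswith("." + d)]
        if not matches:
            break
        servers = referrals[max(matches, key=len)]
    raise LookupError(f"no {rtype} record found for {name}")


value, ttl, path = resolve("www.sorbs.net")
print(value, ttl, path)
```

The returned value and TTL are what the resolving server would store in its cache, together with the retrieval time.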

At this point the browser has the request for ‘http://www.sorbs.net/’ and a DNS record that says ‘www.sorbs.net is an A record with value 111.125.160.134’ and will remember the record for the next 600 seconds (600 seconds being the TTL of www.sorbs.net).  The browser will then use the record to make an HTTP request to the server 111.125.160.134 on port 80 for the web page / on host www.sorbs.net.  The server sends the result and the browser displays it to the user.
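
For the curious, the request itself is only a few lines of text.  The sketch below builds a minimal HTTP/1.1 request without sending it; the helper name `build_get_request` is ours, and real browsers send many more headers than this.

```python
def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request for `path` on `host`."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",  # names the site, since one IP address can serve many hosts
        "Connection: close",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.sorbs.net")
print(request.decode("ascii"))
```

A client would write these bytes to a TCP connection opened to the address from the A record (111.125.160.134 in the example above) on port 80.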

Nice and simple you might think, and so nice that it’s all done completely transparently in the background so you don’t have to know what is going on.  So why is there an issue with the TTL?  Well there are many records in DNS, not just the ‘www’ records, some will tell you what the hostname of the machine is, some will tell you where to send email to, and as we have already seen, some will tell you where you can get more information from.  TTLs just tell you how long that information is valid for.

SORBS uses this information to approve or deny requests, and has on a number of occasions had people try to fake the information for malicious purposes (e.g. spammers getting delisted, then faking the information to divert attention to someone innocent when they send spam).  For this reason we mandated that requests for delisting of single IPs from the SORBS database must meet a minimum requirement: the information must be valid for at least 12 hours, and preferably 24 hours.  We believe that it should be valid for longer, but operationally this doesn’t make sense.  If a host is part of a cluster of servers and one of those servers has a problem, the administrators don’t want hundreds of people remembering where it is for hours at a time; they want all the requests to go to the remaining servers.  For this reason we don’t require TTLs to be any minimum value on the MX (Mail eXchanger) or A (Address) records, only on the PTR (reverse PoinTeR) records.

Now why can the TTLs on PTR records be high without being a problem?  Well, consider the following setup for SORBS.

SORBS has 7 mail servers in its data centers.  4 of these are ‘Mail eXchanger’ (MX) servers, and 2 of them are higher priority than the others.  They are as follows:

desperado.sorbs.net priority 10

scorpion.sorbs.net priority 10

catapillar.sorbs.net priority 5

anaconda.sorbs.net priority 5

The lower the number the higher the priority, and therefore all email for sorbs.net should get sent to catapillar.sorbs.net and anaconda.sorbs.net, only getting sent to scorpion.sorbs.net and desperado.sorbs.net if the first two are too busy to answer requests.  MX records give a list of the hosts that handle email for sorbs.net, and those hosts when looked up will provide IP addresses (in the same way as the lookup for the browser’s web page request described above).
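
The ordering rule is easy to show in code.  This sketch takes the four MX hosts listed above and groups them into tiers, lowest number first; the function name `delivery_tiers` is our own.

```python
from itertools import groupby

# (host, priority) pairs as listed in this article.
MX_RECORDS = [
    ("desperado.sorbs.net", 10),
    ("scorpion.sorbs.net", 10),
    ("catapillar.sorbs.net", 5),
    ("anaconda.sorbs.net", 5),
]


def delivery_tiers(records):
    """Group MX hosts into tiers, most preferred (lowest number) first."""
    ordered = sorted(records, key=lambda rec: rec[1])
    return [
        [host for host, _ in group]
        for _, group in groupby(ordered, key=lambda rec: rec[1])
    ]


for tier in delivery_tiers(MX_RECORDS):
    print(tier)
# ['catapillar.sorbs.net', 'anaconda.sorbs.net']
# ['desperado.sorbs.net', 'scorpion.sorbs.net']
```

A sending server would normally spread its attempts across the hosts within a tier before moving on to the next one.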

If anaconda.sorbs.net were to suffer a real problem (e.g. the power supply fried and caught fire, taking out the whole host) the remote servers would try it every second time, time out and then retry with catapillar; if catapillar is too busy they’ll fall back to either desperado or scorpion.  This means mail might be delayed, so we would probably want to update the server list to exclude anaconda, and if we have set the TTL to 86400 it means some servers will remember that anaconda is one of the servers for up to 1 day.  Setting it longer results in a bigger delay for the change.  This is of course undesirable and is often used as the excuse for not changing the TTLs to the length SORBS requires for single IP delistings.
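
The “up to 1 day” figure follows directly from the arithmetic.  In the sketch below (the function name and the sample times are purely illustrative), a resolver keeps serving the old record from the moment of the change until its cached copy expires.

```python
def seconds_of_staleness(cached_at, ttl, changed_at):
    """How long after the change a resolver keeps serving the old record."""
    expires_at = cached_at + ttl
    return max(0, expires_at - changed_at)


# Worst case: the record was cached one second before it was changed.
print(seconds_of_staleness(cached_at=99_999, ttl=86400, changed_at=100_000))  # 86399
# A copy cached 20 hours before the change only has 4 hours left to run.
print(seconds_of_staleness(cached_at=28_000, ttl=86400, changed_at=100_000))  # 14400
# With a 600 second TTL the worst case drops to just under 10 minutes.
print(seconds_of_staleness(cached_at=99_999, ttl=600, changed_at=100_000))    # 599
```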

The argument is flawed.

SORBS requests only that the PTR records be set to 86400 seconds (1 day), not the A or MX records, so if you want to move the servers, or reconfigure because of an outage or other issue, you can do whatever you want.  The PTR record is the reverse PoinTeR record that translates the IP address back to a hostname.  The PTR record is used when your host contacts one of SORBS’ servers, such as when you try to send email to it.  It is also used (for records) when you request a webpage, or when you register on SORBS.  How is it used, you might ask?  Well, in most cases it’s just recorded to prevent/identify abuse; in other cases it might be used to block or allow access.  So as you can see it is not used for anything that would affect you operationally (unless you are trying to abuse something), and therefore the TTL should not really matter.  In fact the only time we have ever seen a TTL matter is when a mistake is made, the operator/admin wishes to correct the mistake, and hosts remember the mistake for hours/days.
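
To make “translates the IP address back to a hostname” concrete: a PTR lookup for an IPv4 address is an ordinary DNS query for a name built from the reversed octets under `in-addr.arpa`.  Python's standard `ipaddress` module can build that name; the wrapper `ptr_query_name` is just for illustration, and the address is the one used earlier in this article.

```python
import ipaddress


def ptr_query_name(ip):
    """Return the DNS name that is queried for the PTR record of `ip`."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_query_name("111.125.160.134"))  # 134.160.125.111.in-addr.arpa
```

The PTR record found at that name carries its own TTL, separate from the TTLs on the A and MX records.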

Many people use the “but I need to migrate my networks” excuse to try to invalidate the policy.  There are two reasons why this is not valid:

  • First, you don’t migrate every day, and when you do migrate you should be planning it well in advance so you change the TTLs on everything to smaller and smaller ones as the migration approaches.
  • Second, remember the PTR record is only used when your server contacts our server(s).  If you move that server to another IP, the new IP’s PTR record should be set up accordingly (and in advance).  When the migration takes place the old and disused IP will retain the PTR record in caches for the length of the TTL (usually 1 day), but as there is no server on it, it’ll never be seen or recorded by anyone.
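
The “smaller and smaller” approach from the first point can be planned with simple arithmetic.  The sketch below is one way to do it, assuming a hypothetical ladder of TTLs (86400, then 3600, then 300) and a made-up migration time; a lowered TTL only takes full effect once copies cached under the previous, longer TTL have expired.

```python
def latest_publish_times(migration_at, ttl_ladder):
    """For each step down the ladder, the latest time the lower TTL can be published.

    ttl_ladder lists TTLs from the current value down to the final one.
    Each step must be published at least one previous-TTL before the next
    step (or before the migration itself, for the last step).
    """
    deadlines = []
    deadline = migration_at
    steps = list(zip(ttl_ladder, ttl_ladder[1:]))
    # Walk backwards from the migration: the last lowering first.
    for previous_ttl, new_ttl in reversed(steps):
        deadline -= previous_ttl
        deadlines.append((new_ttl, deadline))
    return list(reversed(deadlines))


# Migration at t=1,000,000 seconds (an arbitrary clock for the example).
print(latest_publish_times(1_000_000, [86400, 3600, 300]))
# [(3600, 910000), (300, 996400)]
```

Once the migration is done, the TTLs can be raised again to their normal values.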

Another common argument against the policy comes from ISPs, who say ‘but our customers might be disconnected and another customer allocated their IP address’.

Good network management would result in a customer leaving and their IP addresses going into the ‘unallocated’ pool at the back of the list, so an address is not re-used until all the others have been cycled.  There are no ISPs that we know of that have a turnover of customers on their static IP addresses where all the available space is rotated within 24 hours.

There are many other arguments for and against the policy, but we at SORBS have considered each one and can find no valid reason to have PTR records with TTLs of 60 seconds except where it comes to abuse, where someone will fake information, do something bad and then change it back.  We therefore mandate that if you want to remove a single IP address from the SORBS DUHL, the PTR record has to have a minimum TTL of 12 hours (43200 seconds).
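
Expressed as code, the rule is a simple threshold check.  The two constants are the 12 hour minimum and 24 hour preference stated in this article; the function name and the labels it returns are our own shorthand, not SORBS terminology.

```python
MIN_PTR_TTL = 43200        # 12 hours: the stated minimum
PREFERRED_PTR_TTL = 86400  # 24 hours: the stated preference


def classify_ptr_ttl(ttl):
    """Classify a PTR record's TTL (in seconds) against the two thresholds."""
    if ttl < MIN_PTR_TTL:
        return "too low"
    if ttl < PREFERRED_PTR_TTL:
        return "acceptable"
    return "preferred"


for ttl in (60, 43200, 86400):
    print(ttl, classify_ptr_ttl(ttl))
# 60 too low
# 43200 acceptable
# 86400 preferred
```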

Note: Throughout this document we refer to a single IP address.  This is because networks of 256 addresses (/24) or larger do not have the same policy, as it is likely the ISP is performing a ‘mass update’ of data.  Should anyone request delisting from the SORBS database, the support robot looks at the PTR records for all addresses in the request and will reject the request if the TTLs are lower than 43200 seconds (even if it’s a network request).  This is because the robot is simple, and is only for sorting the rubbish from the good requests.  Any ISP representative requesting a network delisting would need to reply to the robot response, which will result in the request automatically being given to a SORBS administrator (a real human being) who can use a lot more complex logic to analyse and formulate a response to the request.
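
The flow described in this note can be sketched as follows.  This is not the robot's actual code: the function name, the decision labels and the sample addresses (taken from the documentation range 192.0.2.0/24) are all assumptions, and what happens to a request that passes the check is not covered here.

```python
MIN_PTR_TTL = 43200  # seconds, as stated above


def robot_triage(ptr_ttls, is_reply_to_robot=False):
    """Return (decision, offending_ips) for a delisting request.

    ptr_ttls maps each IP address in the request to the TTL of its PTR record.
    """
    if is_reply_to_robot:
        # A reply to the robot's response is handed to a human administrator.
        return "hand to administrator", []
    too_low = [ip for ip, ttl in ptr_ttls.items() if ttl < MIN_PTR_TTL]
    if too_low:
        return "reject", too_low
    return "passed TTL check", []


request = {"192.0.2.10": 86400, "192.0.2.11": 300}
print(robot_triage(request))                          # ('reject', ['192.0.2.11'])
print(robot_triage(request, is_reply_to_robot=True))  # ('hand to administrator', [])
```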
