Contents
- Introduction
- What is a sample?
- What is mean utilization?
- How is my 95th percentile rate calculated?
- Detailed example
- view RRD file information
- plot RRD data
- retrieve sample data
- compute a 95th percentile rate
- plot RRD data with a 95th percentile rate
- How many gigabytes are in a megabit/sec?
- I read on webhostingtalk that there are 324GB in a Mbit/sec. Are you saying that's not true?
- My host offers per-GB and 95th percentile billing. Which is better?
- How do I sample my traffic for cost projection or billing purposes?
- Acknowledgements
Introduction
Burstable Internet access is typically billed on what the industry terms the 95th percentile billing model. This applies to most forms of burstable connectivity, from high speed SONET or gigabit ethernet circuits from major transit networks to 10Mb/sec or 100Mb/sec ethernet hand-offs at dedicated server and co-location hosting facilities.
Most end-users are familiar with paying for what they use on a per-unit basis. Residential and small commercial power users pay by the kilowatt-hour, motorists buy gasoline in gallons, and most people pay for long distance telephone calls based on the number of minutes they talk. 95th percentile billing doesn't fit these common, universally understood models.
What is a sample?
Hosting companies and ISPs poll or sample customer interface ports on their routers and switches at regular intervals. Each sample contains the number of bytes transmitted to the customer (that's you) and bytes received from the customer since the previous sample took place.
For example purposes, we will examine only traffic the ISP receives from you. This is your "egress" or "outbound" traffic, and is all most webmasters and content providers need to concern themselves with. Let's assume that you transmit 86MB of traffic to your ISP in a five minute sample interval. The ISP will convert that into a rate for the interval. 86MB of traffic is 2.293Mb/sec mean traffic for five minutes.
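The conversion from a byte count to a rate can be sketched in a few lines of Python. This is only an illustration: the function name is ours, and it assumes decimal units (1 MB = 1,000,000 bytes) and a five minute (300 second) sample interval, as in the example above.

```python
def mean_rate_mbps(megabytes, interval_seconds=300):
    """Convert the traffic counted in one sample interval to a mean rate in Mbit/sec.

    Assumes decimal units: 1 MB = 8 Mbit.
    """
    megabits = megabytes * 8
    return megabits / interval_seconds


rate = mean_rate_mbps(86)
print(f"{rate:.3f} Mb/sec")  # 2.293 Mb/sec
```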
These samples are taken throughout your billing cycle. Your ISP will accumulate 8,640 samples in a 30 day calendar month!
What is mean utilization?
When you figure the number of miles-per-gallon you get when driving your car, you are calculating its mean, or average, fuel economy: miles traveled per gallon of fuel expended. A mean value is one kind of statistical average. The other two are median and mode, and all three terms mean different things. When most people say average, they intend to say mean. In this document, when you read the phrase mean utilization, you can think of that as average utilization. We're just being specific about which kind of average we've used.
How is my 95th percentile rate calculated?
The 95th in the name of this billing model has specific meaning. In order to calculate the traffic rate for which you will be billed, your ISP sorts the samples taken during your billing period, then ignores the highest five percent of those samples. The highest remaining sample is your billable rate. Using our example assumptions (five minute samples over a 30 day billing period), the top 432 samples are not relevant to your bill. This means that, in a 30 day billing period, you can burst for 36 hours without affecting your bill.
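The sort-and-discard step can be sketched as follows. This is a minimal illustration rather than any particular ISP's billing code: the function name and the synthetic traffic figures are ours, and providers may differ on rounding details at the cut-off.

```python
def percentile_95(samples):
    """Sort the samples, ignore the highest 5%, and return the largest one left."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    ignored = len(ordered) * 5 // 100   # samples in the top five percent
    return ordered[len(ordered) - ignored - 1]


# Assumed: five minute samples over a 30 day billing period.
SAMPLE_MINUTES = 5
samples_per_period = 30 * 24 * 60 // SAMPLE_MINUTES      # 8640
ignored_samples = samples_per_period * 5 // 100          # 432
burst_hours = ignored_samples * SAMPLE_MINUTES / 60      # 36.0

# Synthetic traffic in Mbit/sec: a steady 2.0 with bursts to 100.0.
within_allowance = [2.0] * 8208 + [100.0] * 432   # exactly 36 hours of bursting
over_allowance = [2.0] * 8207 + [100.0] * 433     # one sample too many

print(percentile_95(within_allowance))  # 2.0
print(percentile_95(over_allowance))    # 100.0
```

The second case shows why the 36 hour figure matters: a single burst sample beyond the ignored five percent becomes the billed rate.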
Detailed example
Let's look at a detailed example of how this process works. To the right you'll find both a text box with information about a particular rrdtool file and a graph generated based on the data in that file. Click on the radio buttons to bring up different data.
rrdinfo - view RRD file information
The rrdinfo label shows how the RRD file, containing traffic samples, is structured. Note that the step parameter of our example file is 60, a value stored in seconds. This RRD file can accept samples as frequently as one per minute. You can also see the name of each field stored in the RRD file, in this case, rx_bits, rx_packets, tx_bits, and tx_packets. The RRD file can store as many fields as you choose, and not all the data needs to be utilized in every graph.
rrdgraph#1 - plot RRD data
Clicking on the rrdgraph#1 button to the right displays the command used to produce the example graph named figure 1. The start and end timestamps, in unix time_t format, are supplied to give bounds to the graph. A label is specified by using the -v argument, and DEF statements retrieve sample data from the RRD file. CDEF statements perform calculations based on the stored data. LINEx statements plot the computed data points on the graph. GPRINT emits text into the graph's footer.
rrdfetch - retrieve sample data
Now we'll get down to real detail. Click the rrdfetch radio button to bring up raw data from the RRD file. The text box will display a series of sample timestamps and transfer rates during that sample interval. The rx_packets and tx_packets fields have been edited out of our example for simplicity.
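If you want to work with rrdfetch output yourself, a small parser is enough. The sketch below assumes the usual rrdtool fetch text layout (a header line of field names, then one "timestamp: value value" line per sample, with nan marking unknown samples). The timestamps and rates in SAMPLE_OUTPUT are made up for illustration and are not taken from our example file.

```python
import math

# Hypothetical rrdfetch output; real values will differ.
SAMPLE_OUTPUT = """\
                        rx_bits             tx_bits

1096588800: 5.1234500000e+06 1.2345600000e+06
1096588860: nan nan
1096588920: 4.9876500000e+06 1.1111100000e+06
"""


def parse_rrdfetch(text):
    """Return a list of (timestamp, {field: rate}) tuples, skipping unknown samples."""
    lines = [line for line in text.splitlines() if line.strip()]
    fields = lines[0].split()
    rows = []
    for line in lines[1:]:
        stamp, _, values = line.partition(":")
        rates = [float(v) for v in values.split()]
        if any(math.isnan(r) for r in rates):
            continue  # the RRD had no data for this interval
        rows.append((int(stamp), dict(zip(fields, rates))))
    return rows


for stamp, rates in parse_rrdfetch(SAMPLE_OUTPUT):
    print(stamp, rates["rx_bits"], rates["tx_bits"])
```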
compute a 95th percentile rate
These raw samples are used to calculate your 95th percentile usage number for billing. Click on the compute radio button to bring up some example Perl code for performing this computation. Note that we make several adjustments from the rrd file used for our example. Don't use this code verbatim somewhere else; your setup may have different quirks than ours.
Notice the 95th percentile figure calculated by our Perl code. The larger of the two values, Rx or Tx, is typically used for billing, while the other value is ignored. Some providers add the two values together, and others combine the Rx and Tx rates in their samples to produce a full-duplex 95th percentile rate. For simplicity's sake, we'll go with the most common scenario, and utilize only the larger of the two.
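The three approaches mentioned above can be compared side by side. This Python sketch is not the Perl code from the example; the function names, the method labels, and the tiny synthetic data set are ours, chosen only to show that the methods can produce different billable rates from the same samples.

```python
def percentile_95(samples):
    """Sort the samples, ignore the highest 5%, and return the largest one left."""
    ordered = sorted(samples)
    ignored = len(ordered) * 5 // 100
    return ordered[len(ordered) - ignored - 1]


def billable_rate(rx, tx, method="max"):
    """Compute a billable rate from paired Rx and Tx samples.

    method="max":      larger of the two 95th percentile values (most common)
    method="sum":      Rx and Tx 95th percentile values added together
    method="combined": 95th percentile of Rx + Tx taken sample by sample
    """
    if method == "max":
        return max(percentile_95(rx), percentile_95(tx))
    if method == "sum":
        return percentile_95(rx) + percentile_95(tx)
    if method == "combined":
        return percentile_95([r + t for r, t in zip(rx, tx)])
    raise ValueError(f"unknown method: {method}")


# Synthetic samples: Rx ramps up while Tx ramps down.
rx = list(range(1, 101))
tx = list(range(100, 0, -1))

print(billable_rate(rx, tx, "max"))       # 95
print(billable_rate(rx, tx, "sum"))       # 190
print(billable_rate(rx, tx, "combined"))  # 101
```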
rrdgraph#2 - plot RRD data with a 95th percentile line
5501732.669680 bits per second is our 95th percentile utilization for the example data. To make that useful, we need to represent it on the graph. You will doubtless want to store this information in a billing database for invoicing purposes as well. Click on rrdgraph#2 to view the rrdgraph command used to generate the graph labeled figure 2. This graph contains an HRULE corresponding to our 95th percentile value, as well as a text COMMENT with a less precise rendition of that number, divided by one million to represent megabits per second. I've placed it next to the Rx label because we used data in the Rx direction.
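For readers who script their graphs, here is a rough sketch of assembling such a command as an argument list. It is not the command behind figure 2: the file names, colors, label text, and timestamps are placeholders of ours, and only the rx_bits field name and the 95th percentile value come from the example. The list is built but not executed.

```python
def graph_args(rrd_path, png_path, start, end, p95_bps):
    """Build an rrdtool graph argument list with a 95th percentile HRULE."""
    p95_mbps = p95_bps / 1_000_000  # bits/sec to Mbit/sec for the label
    return [
        "rrdtool", "graph", png_path,
        "--start", str(start), "--end", str(end),
        "-v", "bits per second",                      # vertical axis label
        f"DEF:rx={rrd_path}:rx_bits:AVERAGE",          # fetch stored samples
        "LINE1:rx#0000FF:Rx",                          # plot the Rx rate
        f"HRULE:{p95_bps:.6f}#FF0000",                 # horizontal 95th line
        f"COMMENT:95th percentile {p95_mbps:.2f} Mbit/sec",
    ]


# Placeholder paths and time bounds (assumptions, not the example's values).
args = graph_args("traffic.rrd", "figure2.png", 1096588800, 1099267200,
                  5501732.669680)
print(" ".join(args))
```

Passing the list to something like subprocess.run(args) would invoke rrdtool, assuming it is installed and the RRD file exists.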
How many gigabytes are in a megabit/sec?
| Example Customer Utilization |
| 95th utilization | 5.50 Mbit/sec |
| mean utilization | 3.11 Mbit/sec |
| GBytes transferred | 1,042.62 |
| GBytes per 95th Mbit/sec | 189.57 |
The question most webmasters and other end-users of dedicated server and co-location products have when they hear the term 95th percentile is, "how many gigabytes are in a megabit/sec?" Unfortunately, there is no easy answer. Your traffic patterns throughout each day and month will differ from the example here. Typical web hosting traffic results in about 190GB of data transferred for each Mbit/sec billed under a 95th percentile model.
If you already have a server, uplink port, or other connectivity that samples and plots data in a manner similar to our example, you can do the calculations yourself using the same method described here. We've already shown the 95th percentile usage, 5.50 Mbit/sec, for our example customer. To determine the number of gigabytes transferred you simply multiply the mean utilization, represented on the example graph at 3.11 Mbit/sec, by the time span of the billing period.
Our example covers the month of October 2004, a 31 day calendar month, and includes an extra hour due to daylight saving time. (31 * 24 hours) + 1 hour yields 745 hours for the month. Our transfer rate is in Mbit/sec, so we further multiply 745 hours * 3600 seconds/hour, yielding 2,682,000 seconds in the invoice period. Divide by 8,000 to convert from Mbit to GByte units (8 bits per byte, 1,000 MBytes per GByte), and you have a ratio of 335.25 GBytes per mean 1 Mbit/sec for this invoice period. Multiply by 3.11 Mbit/sec mean utilization rate, as calculated in the graph, and you have approximately 1,042.62 GBytes transferred in our example month. Divide by 5.50 Mbit/sec 95th percentile utilization rate, and we find that our example customer transferred 189.57 GBytes per 1 Mbit/sec 95th percentile. This is very close to the 190 GBytes figure typical of hosting traffic.
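The arithmetic above is easy to reproduce for your own numbers. The sketch below simply restates it in Python. The function name is ours, and the inputs are the rounded 3.11 and 5.50 Mbit/sec figures, so the last decimal place may differ slightly from values computed from unrounded data.

```python
def gbytes_summary(hours, mean_mbps, p95_mbps):
    """Return (GB per mean Mbit/sec, GB transferred, GB per 95th percentile Mbit/sec)."""
    seconds = hours * 3600
    gb_per_mean_mbit = seconds / 8000           # Mbit to GByte: 8 bits/byte, 1000 MB/GB
    gb_transferred = gb_per_mean_mbit * mean_mbps
    gb_per_p95_mbit = gb_transferred / p95_mbps
    return gb_per_mean_mbit, gb_transferred, gb_per_p95_mbit


hours_in_october_2004 = 31 * 24 + 1             # 745, including the extra DST hour
per_mean, transferred, per_p95 = gbytes_summary(hours_in_october_2004, 3.11, 5.50)

print(per_mean)               # 335.25
print(f"{transferred:.1f}")   # 1042.6
print(f"{per_p95:.2f}")       # 189.57
```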
I read on webhostingtalk that there are 324GB in a Mbit/sec. Are you saying that's not true?
Yes and no. That 324GB per Mbit/sec figure is accurate for a 30 day calendar month of continuous 1 Mbit/sec utilization. If you are billed based on 95th percentile usage, as are most dedicated and co-location products, numbers like 324GB are totally irrelevant to you. Typical hosting traffic patterns result in approximately 190GB of data transferred per 1 Mbit/sec billed on a 95th percentile model. Our example customer is very close to this typical profile at 189.57GB, but it may surprise you to learn that this data set was selected at random.
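To see where a figure like 324GB comes from, you can compute the transfer for a continuous rate over different month lengths. A short sketch, using decimal units and a function name of our own choosing:

```python
def gb_at_continuous_rate(days, mbit_per_sec=1.0):
    """GBytes transferred when a link runs at a constant rate for the given number of days."""
    seconds = days * 86_400
    return seconds * mbit_per_sec / 8_000   # Mbit to GByte (decimal units)


for days in (28, 30, 31):
    print(days, gb_at_continuous_rate(days))
```

A 30 day month works out to 324 GB per continuous Mbit/sec, while 28 and 31 day months give 302.4 GB and 334.8 GB.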
My host offers per-GB and 95th percentile billing. Which is better?
| Example Customer Equivalent Costs |
| traffic efficiency | 56% or 1 / 1.76 |
| 5.50 Mbit/sec 95th | $40 per Mbit/sec |
| 3.11 Mbit/sec mean | $70.40 per Mbit/sec |
| 1,042.62 GB transfer | $0.21 per GB |
The answer to this question depends on the unit price of each product and your traffic patterns. We can address this question in detail using our example customer, above, by calculating the ratio of mean utilization to 95th percentile utilization. We can then apply that same ratio to pricing.
Consider that the example customer's mean utilization during the billing period graphed above is 3.11 Mbit/sec, and that their 95th percentile utilization is 5.50 Mbit/sec. This gives us a ratio of 3.11 / 5.50 == 0.56. Their traffic profile is about 56% efficient, whereas a 100% efficient traffic profile would have 1 Mbit/sec of mean usage for every 1 Mbit/sec of 95th percentile usage. In the other direction, we can divide 95th percentile utilization by mean utilization, 5.50 / 3.11 == 1.76, to determine how many 95th percentile Mbit/sec we are billed for every 1 Mbit/sec of mean utilization. This is the ratio of 95th percentile Mbit/sec billed to mean Mbit/sec transferred.
How are these ratios useful? They tell us how much more you should be willing to pay for a per-GB, or mean Mbit/sec, billed hosting product versus a 95th percentile Mbit/sec product. If a 95th percentile Mbit/sec was sold by your hosting company for $50/Mbit, you could pay as much as 50*1.76, or $88/Mbit, for a mean Mbit/sec product. Using a conservative 28 day invoice month, where 1 Mbit/sec mean equals 302.4GB transferred, you could also pay up to $0.29/GB and still break even. Any price below $88/Mbit or $0.29/GB will be a cost savings for our example customer.
If you are already billed on a per-GB or mean Mbit/sec basis, and are looking at 95th percentile billed products, you need to turn the calculation around. If paying $100 per mean Mbit/sec, our example customer could save money by purchasing a 95th percentile billed product at any price below $100*0.56 or $56/Mbit.
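These break-even comparisons can be sketched as small functions so you can plug in your own utilization and prices. The function names are ours. Note that the code uses the unrounded ratios, so its results differ by a few cents from the figures in the text, which use the two-decimal ratios 0.56 and 1.76.

```python
def break_even_mean_price(p95_price, mean_mbps, p95_mbps):
    """Highest price per mean Mbit/sec that matches a given 95th percentile price."""
    return p95_price * p95_mbps / mean_mbps


def break_even_p95_price(mean_price, mean_mbps, p95_mbps):
    """Highest price per 95th percentile Mbit/sec that matches a given mean price."""
    return mean_price * mean_mbps / p95_mbps


def price_per_gb(mean_price, days=28):
    """Convert a price per mean Mbit/sec into a price per GByte for a month of `days`."""
    gb_per_mean_mbit = days * 86_400 / 8_000   # 302.4 for a 28 day month
    return mean_price / gb_per_mean_mbit


mean_mbps, p95_mbps = 3.11, 5.50               # example customer

mean_limit = break_even_mean_price(50, mean_mbps, p95_mbps)
print(f"${mean_limit:.2f} per mean Mbit/sec")                 # about $88
print(f"${price_per_gb(mean_limit):.2f} per GB")              # about $0.29
print(f"${break_even_p95_price(100, mean_mbps, p95_mbps):.2f} per 95th Mbit/sec")  # about $56
```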
How do I sample my traffic for cost projection or billing purposes?
As an end-user, you should ask your hosting company to provide you with copies of their data for your ports or servers. This is much easier than setting up your own polling and graphing system, and will save you from maintaining yet another service on all of your servers. If your hosting company won't give you copies of the raw samples you should find another vendor, but you can still sample servers yourself. That is well outside the scope of this document, but an experienced system administrator will have no difficulty setting this up for you.
As an ISP or hosting company providing burstable services, there are many options available to you. Many billing packages include an integrated SNMP poller, grapher, and billing calculator. If yours does not, you may wish to use Cacti or RTG. Both are popular polling and graphing packages. Again, your systems administration staff should be able to set these up.
Acknowledgements
Thanks to all those WebHostingTalk Forums posters who provided feedback to improve the initial version of this document. "cbtrussell" is especially deserving of my thanks for his numerous, thoughtful suggestions, many of which were incorporated into this revision. Further feedback can be sent via email to the address at the top of this page.