WAN Optimization Part 3: Overcoming Bandwidth Limitations

In my last blog post, I described issues to consider when selecting a WAN optimization product. Now, I want to focus on the fundamentals of WAN optimization.

As I wrote in my first blog post, there are three factors that can impact the performance of any application's connection: the amount of bandwidth, percentage of packet loss and latency. By addressing and correcting each of those factors, WAN optimization products profoundly improve protocol and application performance. I'll examine each of these items over several posts, starting with bandwidth.

How Much Optimized Bandwidth Is Enough?

What is the most important factor in determining the capacity of an optimization product? Most organizations would probably say the maximum amount of optimized bandwidth, and to some extent that's true. An optimization product that can optimize 1 Gbps of WAN bandwidth technically has greater capacity than one that can optimize 50 Mbps of WAN bandwidth.

But as we've seen, the constraints of latency and loss limit the amount of data an application can send over one IP connection. One application connection will probably not be able to consume the entire capacity of an optimization product. It's more likely that many simultaneous connections are needed to utilize the product's full capacity.

This is particularly true when optimizing a branch office. Each user typically generates 10 to 15 connections over the WAN. (Just run "NETSTAT" at your Windows command prompt and see for yourself.) Multiply that across all branch office users, and the number of simultaneous connections a product can optimize becomes critical to realizing its peak capacity. But how do you fully utilize each connection? That's where deduplication comes into play.
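As a rough back-of-the-envelope check of that point, the sketch below assumes the standard bound that a single unoptimized TCP connection can move at most one window of data per round trip. The 64 KB window, 100 ms round-trip time, and 12 connections per user are illustrative assumptions, not figures from any product, and packet loss would lower the per-connection number further.

```python
import math

def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: one window per round trip."""
    return window_bytes * 8 / rtt_seconds

def connections_to_fill(capacity_bps: float, per_connection_bps: float) -> int:
    """Number of such connections needed to reach the given capacity."""
    return math.ceil(capacity_bps / per_connection_bps)

# Illustrative assumptions (not vendor figures).
WINDOW_BYTES = 64 * 1024      # assumed TCP window without window scaling
RTT_SECONDS = 0.100           # assumed 100 ms WAN round-trip time
CONNECTIONS_PER_USER = 12     # within the 10 to 15 range cited above
CAPACITY_BPS = 1_000_000_000  # the 1 Gbps product from the example

per_connection = max_tcp_throughput_bps(WINDOW_BYTES, RTT_SECONDS)
needed = connections_to_fill(CAPACITY_BPS, per_connection)
users = math.ceil(needed / CONNECTIONS_PER_USER)

print(f"per-connection ceiling: {per_connection / 1e6:.2f} Mbps")
print(f"connections to fill 1 Gbps: {needed}")
print(f"users at {CONNECTIONS_PER_USER} connections each: {users}")
```

Under these assumptions each connection tops out at a little over 5 Mbps, so it takes on the order of a couple hundred simultaneous connections to approach 1 Gbps.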

Shopping Lists and WAN Optimization

At a high level, deduplication works in much the same way my wife creates my weekly grocery shopping list. A while back, she became so fed up with my tendency to misread her handwriting that she printed a standard shopping list. Now she just ticks off the stuff I need to purchase, only scribbling in a few new items every once in a while. (A printed list is very '90s, I know; we've since graduated to Paprika, an iPhone/Android recipe-grocery list manager.)

Deduplication works in much the same way. WAN optimization products inspect incoming data and create a unique "fingerprint" for each data pattern, usually with a hashing algorithm. These fingerprints are then shared with the other optimization systems in the network, creating a single, coherent dictionary. (At one time, this was not the case, but today it's fairly common.) On subsequent passes, the optimization systems detect repetitive patterns by comparing fingerprints and replace the outgoing data with small tokens or instructions. The original data is reinserted on the receiving side. In this way, effective bandwidth can be dramatically expanded, enabling, for example, a 10-Mbps connection to carry 200 Mbps of data.
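The mechanism can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: it assumes fixed 1,024-byte chunks, SHA-256 fingerprints, and a dictionary that each side builds from the traffic it sees, which simplifies the sharing of fingerprints described above. All function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 1024  # assumed chunk size, chosen only for illustration

def fingerprint(chunk: bytes) -> bytes:
    """32-byte SHA-256 digest used as the chunk's fingerprint."""
    return hashlib.sha256(chunk).digest()

def encode(data: bytes, dictionary: dict) -> list:
    """Sender side: send a token for known chunks, raw bytes otherwise."""
    messages = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp in dictionary:
            messages.append(("ref", fp))      # small token replaces the data
        else:
            dictionary[fp] = chunk            # learn the new pattern
            messages.append(("raw", chunk))
    return messages

def decode(messages: list, dictionary: dict) -> bytes:
    """Receiver side: reinsert the original data for each token."""
    parts = []
    for kind, payload in messages:
        if kind == "raw":
            dictionary[fingerprint(payload)] = payload
            parts.append(payload)
        else:
            parts.append(dictionary[payload])
    return b"".join(parts)

def wire_size(messages: list) -> int:
    """Bytes that actually cross the WAN (payloads only)."""
    return sum(len(payload) for _, payload in messages)

sender_dict, receiver_dict = {}, {}
payload = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))  # 8 distinct chunks

first = encode(payload, sender_dict)
restored_first = decode(first, receiver_dict)
second = encode(payload, sender_dict)          # same data sent again
restored_second = decode(second, receiver_dict)

print(wire_size(first), wire_size(second))
```

On the first pass every chunk crosses the WAN in full; on the second pass only the 32-byte fingerprints do. By the same arithmetic, carrying 200 Mbps of data over a 10-Mbps link implies roughly a 20:1 reduction in bytes on the wire.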

Deduplication, Compression and Caching

Of course, that description could apply to compression as well, but while compression works on data patterns over a short horizon--typically within a file--deduplication algorithms identify data patterns across a much larger timeframe--typically across files.
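The difference in horizon can be seen with a general-purpose compressor. The sketch below uses zlib's DEFLATE, which looks back at most 32 KB, purely as a familiar example of a short-horizon compressor; it is not a claim about what any WAN optimization product uses. The 100,000-byte block and 1,000-byte chunk size are arbitrary assumptions.

```python
import random
import zlib

def pseudo_random_bytes(n: int, seed: int = 0) -> bytes:
    """Deterministic, effectively incompressible test data."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

block = pseudo_random_bytes(100_000)
twice = block + block  # the repeat sits 100,000 bytes back, beyond 32 KB

compressed_once = len(zlib.compress(block))
compressed_twice = len(zlib.compress(twice))

# A chunk-level view of the same data: how many chunks are actually unique?
CHUNK = 1000
chunks = [twice[i:i + CHUNK] for i in range(0, len(twice), CHUNK)]
unique_chunks = set(chunks)

print(compressed_once, compressed_twice)
print(len(chunks), len(unique_chunks))
```

The compressor cannot see that the second half repeats the first, so the output roughly doubles. A dictionary that remembers chunks over a longer span would recognize every chunk in the second copy.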

Deduplication also sounds a lot like caching, but it's not. Like deduplication, a cache compares incoming traffic to a library of data and, if a match is found, delivers the data locally, saving the time and bandwidth of traversing the WAN. But that's where the similarity ends. Caches are typically specific to a particular environment--Web caches accelerate HTTP, file caches accelerate CIFS, NFS or some other file service, and object caches are specific to a given application. So organizations end up needing a separate cache for each application being accelerated. In contrast, deduplication can be protocol-agnostic.

There also is the matter of cache coherency. In general, caching is meant for static data because dynamic data changes too frequently to be cached. Even slightly changed data would require the entire dataset to be retrieved from across the WAN. Deduplication, though, can detect more granular data patterns.
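A small sketch of that contrast, under simplifying assumptions: the cache is modeled as serving an object only when it is byte-for-byte identical to the stored copy, and deduplication is modeled with fixed 1,024-byte chunks and 32-byte SHA-256 tokens. Real caches and deduplication engines are more elaborate; the names and sizes here are illustrative.

```python
import hashlib

CHUNK_SIZE = 1024   # assumed chunk size
TOKEN_SIZE = 32     # SHA-256 digest length in bytes

def chunk_fingerprints(data: bytes) -> list:
    """Fingerprint each fixed-size chunk of the data."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()
            for i in range(0, len(data), CHUNK_SIZE)]

# A 100-chunk object, each chunk distinct.
original = b"".join(i.to_bytes(4, "big") * (CHUNK_SIZE // 4) for i in range(100))

# The same object with a single byte changed.
changed = bytearray(original)
changed[5000] ^= 0xFF
modified = bytes(changed)

# Whole-object cache: any change is a miss, so the full object is fetched.
cache_hit = hashlib.sha256(modified).digest() == hashlib.sha256(original).digest()
cache_bytes = 0 if cache_hit else len(modified)

# Chunk-level deduplication: only chunks not seen before are sent in full.
known = set(chunk_fingerprints(original))
fingerprints = chunk_fingerprints(modified)
missed = [fp for fp in fingerprints if fp not in known]
dedup_bytes = (len(missed) * CHUNK_SIZE
               + (len(fingerprints) - len(missed)) * TOKEN_SIZE)

print(cache_bytes, dedup_bytes)
```

With one byte changed, the modeled cache retrieves the whole object again, while the chunk-level approach resends a single chunk plus tokens for the rest.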

Evaluating Deduplication Approaches

There are a number of differences between deduplication approaches. As one might imagine, the granularity at which the optimization product inspects the incoming data stream is very important. The fewer bytes that are needed to form a fingerprint, the higher the probability of a "hit" in the optimization system's database. Byte-level granularity is ideal, but vendors are typically limited to looking at 16-byte data patterns because they need to create a unique hash to avoid erroneously substituting the wrong data. An alternative approach avoids the problem by using indexes to indicate shifts in data patterns, allowing for true byte-level granularity.
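The effect of granularity on hit rate can be illustrated with a toy measurement. This sketch assumes fixed-size chunks compared directly (standing in for fingerprints) and a 64 KB data set in which one byte changes every 4 KB; it does not model the index-based approach mentioned above, and the sizes are arbitrary.

```python
import random

def pseudo_random_bytes(n: int, seed: int = 0) -> bytes:
    """Deterministic test data with no accidental repeats."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

def hit_rate(old: bytes, new: bytes, chunk_size: int) -> float:
    """Fraction of chunks in `new` already present in `old`."""
    known = {old[i:i + chunk_size] for i in range(0, len(old), chunk_size)}
    chunks = [new[i:i + chunk_size] for i in range(0, len(new), chunk_size)]
    hits = sum(1 for chunk in chunks if chunk in known)
    return hits / len(chunks)

old = pseudo_random_bytes(64 * 1024)

# Flip one byte every 4,096 bytes: 16 small, scattered changes.
changed = bytearray(old)
for pos in range(0, len(old), 4096):
    changed[pos] ^= 0xFF
new = bytes(changed)

rates = {size: hit_rate(old, new, size) for size in (16, 1024, 4096)}
for size, rate in rates.items():
    print(f"{size:>5}-byte chunks: {rate:.1%} hit rate")
```

With these inputs, 16-byte chunks still match almost everything, 1,024-byte chunks match three quarters of the data, and 4,096-byte chunks match nothing. The trade-off is that finer granularity means more fingerprints to store for the same amount of data.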

The size of the dictionary is also critical since the more fingerprints that can be stored, the greater the likelihood of detecting a repetitive data pattern. It should be no surprise then that WAN optimization vendors talk about the amount of their disk space as a rough indication of the size and effectiveness of the dictionary.

Of course, dictionary size doesn't matter if the dictionary isn't populated with the right data patterns. By looking across more applications and protocols, there's a greater likelihood that additional data patterns can be detected, eliminating even more traffic from the WAN. This is a major reason why data that's been deduplicated by data replication systems can be further deduplicated by WAN optimization products. They can detect data patterns from other TCP-, UDP- and IP-based protocols running over the WAN, patterns that are ignored by a system that deduplicates only the data being replicated between locations.

David Greenfield is a long-time technology analyst. He currently works in product marketing for Silver Peak.
