Game Development Reference
In-Depth Information
Imagine you have a 30 MB data file included with an initial release of your prod-
uct, and in the next update 200 KB is modified. Would it be feasible to have every-
one download the new data file? Such a large file would take a long time to down-
load on a dial-up connection, especially when only a small fraction of the original
data was modified. A better method would be to transmit only the bytes that
changed, and merge them into the existing data file.
An ideal solution would be a utility that could take an old and a new version of a
file, generate a list of differences, and offer a method to both apply and deploy the
data changes to users.
To start, we need an algorithm that can determine a list of differences between two
arbitrary sets of data that are sequentially similar, and do so in a manner that can
transform the old dataset as efficiently as possible.
Note
The large segments of source code are available on the Companion Web site. Please refer to the
source code of the example in order to clarify any implementation questions.
What Is Levenshtein Distance?
Devised in 1965 by a Russian scientist named Vladimir Levenshtein, Levenshtein
distance (LD) is an algorithm designed to measure the similarity between two
strings. The metric is also known as edit distance by people who cannot pro-
nounce “Levenshtein,” and it is a measure of the smallest number of deletions,
substitutions, and insertions required to transform one string into another, which
we will refer to as source (s) and target (t), respectively. An edit is either deleting a
character, substituting one character for another, or inserting a character.
Levenshtein distance is a [theta](m x n) algorithm, where m and n are the lengths
of the strings, that computes the edit distance in time proportional to the length
of the source times the length of the target. The greater the edit distance, the more
differences are present between the two strings.
Given:
(s) = “Apple”
(t) = “Apple”
LD(s, t) = 0 because both s and t are identical.
(s) = “Apple”