How to generate a deterministic GUID from two strings in Java

Currently working on MSI packaging for our software and using WiX tool suite to automate the process. Since it was decided that we don’t want to use patches, I had to implement pseudo-major upgrades for each version.  Now, the problem with that is that MSI framework doesn’t really care for version number so much as for the product GUID. In other words, if the version is the same but the product GUID is different (which happens because our build process generates new MSIs every time a module is updated, even if others are not, and the product ID is generated anew with every build), it will run the full uninstall old/install new routine regardless of other factors. This costs extra time during the installation, so I wanted to avoid it if possible.

The idea I had was to write an algorithm that created a unique GUIDs out of a string (or two strings: module/product name and version number). This way I could ensure that the package whose version has not changed will always have the same product GUID, regardless how many times it is rebuilt. The only difficult part was to find a way to make the implementation of said algorithm as simple and stupid as possible.

As I set off, I discovered that Java (my primary programming language at the moment) already has a nifty UUID class. However, it could only create random UUIDs (the universal term describing what Microsoft calls “GUID”) with the static randomUUID() method. Now, there was also nameUUIDFromBytes(), which made the algorithm pathetically easy:

String tempResult = UUID.nameUUIDFromBytes(aSourceString.getBytes()).toString();
System.out.println(tempResult);

However, the problem with that is that it generates Version 3 UUIDs (read more about versions on the Wikipedia) and WiX only accepts Version 4 (like the ones generated by randomUUID(), which are, well, too random). Then I took a look at the standard constructors and saw that there’s only one and it generates a Version 4 GUID out of two Long numbers (basically, a GUID consists of two 64 bit arrays, read two Long numbers: least significant long and most significant long). Yay! Now, where can I get an injective function that would map a string to a particular Long number?

Immediately, I thought of the MD5 hashes but a bit of googling revealed that there is no single pretty string-to-MD5 hashing function in the standard Java libraries. Unwilling to make my own implementation (simple-stupid, remember), however, I soon found the next best thing: the CRC32 class. Now, the beauty of this solution lies in the fact that it’s a standard library class and it’s relatively easy to use:

private CRC32 localCRC32Generator = new CRC32();

public String generateGUID(String anApplication, String aVersion) {
  localCRC32Generator.update(anApplication.getBytes());
  Long tempApplicationChecksum = localCRC32Generator.getValue();
  localCRC32Generator.update(aVersion.getBytes());
  Long tempVersionChecksum = localCRC32Generator.getValue();
  localCRC32Generator.reset();

Basically, you initialize the CRC32 instance, update it with a byte array generated from the source string, read the Long value, then reset the instance. The last part is essential if you want to generate identical Longs during the entire runtime.

The rest seemed easy at first:

    return UUID(tempApplicationChecksum, tempVersionChecksum).toString();

However, I immediately noticed that the resulting UUIDs had a lot of nulls in them. This is because the return values of CRC32.getValue() are 10 digits long and the 64 bit Long is 20 digits. Hence, half of the numbers (higher registers) are padded with nulls in the UUID. So I thought, how to produce two 20 digit Longs out of two 10 digit Longs. The answer is trivial if you know some math:

  Long tempMSBits = tempApplicationChecksum * tempVersionChecksum;
  Long tempLSBits = tempVersionChecksum * tempVersionChecksum;
  return UUID(tempMSBits, tempLSBits).toString();

I chose to use tempVersionChecksum as the multiplier in both cases, because it is what actually changes across versions, whereas the tempApplicationChecksum is just there to avoid collisions between product GUIDs of different product packages with the same version. Apropos collisions: I decided that CRC32 has good enough collision resistance for my purposes and the products of the results are distinct enough on such scale.

However, even after that, I was again frustrated by failure. This time, the WiX compiler again claimed that the GUID is invalid. I consulted the Wikipedia again and remembered that the Version 4 GUIDs had several reserved bits, which, of course, are not present in the UUIDs generated by my algorithm thus far. However, that was a minor problem: all I had to do was to wrap the GUID string I got into a StringBuffer, replace the 15th character with a “4” and the 20th, with an “8”, “9”, “A”, or “B”. Actually, it doesn’t matter which of the four characters is in the 20th position, it can even be always the same, but I wrote a small subroutine for additional collision resistance. In the end my class looked like this:

import java.util.UUID;
import java.util.zip.CRC32;

public class VersionToGuidConverter {

  private CRC32 localCRC32Generator = new CRC32();

  public static void main(String[] args) {
    VersionToGuidConverter tempConverter = new VersionToGuidConverter();
    System.out.println(tempConverter.generateGUID(args[0], args[1]));
  }

  public String generateGUID(String anApplication, String aVersion) {
    localCRC32Generator.update(anApplication.getBytes());
    Long tempApplicationChecksum = localCRC32Generator.getValue();
    localCRC32Generator.update(aVersion.getBytes());
    Long tempVersionChecksum = localCRC32Generator.getValue();
    localCRC32Generator.reset();

    Long tempMSBits = tempApplicationChecksum * tempVersionChecksum;
    Long tempLSBits = tempVersionChecksum * tempVersionChecksum;

    StringBuffer tempUUID =
      new StringBuffer(new UUID(tempMSBits, tempLSBits).toString());
    tempUUID.replace(14, 15, "4");
    tempUUID.replace(19, 20,
      convertSecondReservedCharacter(tempUUID.substring(19, 20)));

    return tempUUID.toString();
  }

  private String convertSecondReservedCharacter(String aString) {
    switch (aString.charAt(0) % 4) {
      case 0: return "8";
      case 1: return "9";
      case 2: return "a";
      case 3: return "b";
      default: return aString;
    }
  }
}

For a cherry tapping, you can add a feature that generates truly random UUIDs (with UUID.randomUUID()) if the Version or Product have a specific value. But that should be easy. Now go and have fun.

This entry was posted in java. Bookmark the permalink.

4 Responses to How to generate a deterministic GUID from two strings in Java

  1. Steve says:

    Thanks for this! This should be in the UUID by default!

    • Koveras says:

      Yeah, it should, but who are we to decide things for the almighty Oracle? 😀

    • Domingo Galdos says:

      Cool stuff! However, this probably should not be in UUID class by default, as it’s just a hack to get around the below (apparent) defect in WiX. There is no good reason to actually generate type 3 UUIDs that are misrepresented as type 4 UUIDs, unless you are trying to feed a type 3 UUID into a library that for some reason insists on type 4.

      “However, the problem with that is that it generates Version 3 UUIDs (read more about versions on the Wikipedia) and WiX only accepts Version 4 (like the ones generated by randomUUID(), which are, well, too random).”

      Do you know why WiX is limited to type 4? Is it an accident or intentional?

      • Koveras says:

        I am pretty sure it’s intentional because they only wanted to have to support one UUID implementation, with the assumption that the users keep track of their own UUIDs and use patches to update packages. The way I used MSI was dictated more by the specifics of our existing deployment process than any standards considerations. 🙂

Comments are closed.