Meson WrapDB for Google's cityhash.
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

196 lines
7.3 KiB

  1. CityHash, a family of hash functions for strings.
  2. Introduction
  3. ============
  4. CityHash provides hash functions for strings. The functions mix the
  5. input bits thoroughly but are not suitable for cryptography. See
  6. "Hash Quality," below, for details on how CityHash was tested and so on.
  7. We provide reference implementations in C++, with a friendly MIT license.
  8. CityHash32() returns a 32-bit hash.
  9. CityHash64() and similar return a 64-bit hash.
  10. CityHash128() and similar return a 128-bit hash and are tuned for
  11. strings of at least a few hundred bytes. Depending on your compiler
  12. and hardware, it's likely faster than CityHash64() on sufficiently long
  13. strings. It's slower than necessary on shorter strings, but we expect
  14. that case to be relatively unimportant.
  15. CityHashCrc128() and similar are variants of CityHash128() that depend
  16. on _mm_crc32_u64(), an intrinsic that compiles to a CRC32 instruction
  17. on some CPUs. However, none of the functions we provide are CRCs.
  18. CityHashCrc256() is a variant of CityHashCrc128() that also depends
  19. on _mm_crc32_u64(). It returns a 256-bit hash.
  20. All members of the CityHash family were designed with heavy reliance
  21. on previous work by Austin Appleby, Bob Jenkins, and others.
  22. For example, CityHash32 has many similarities with Murmur3a.
  23. Performance on long strings: 64-bit CPUs
  24. ========================================
  25. We are most excited by the performance of CityHash64() and its variants on
  26. short strings, but long strings are interesting as well.
  27. CityHash is intended to be fast, under the constraint that it hash very
  28. well. For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't
  29. designed as a hash function and shouldn't be used as one. CityHashCrc128()
  30. is not a CRC, but it uses the CRC32 machinery.
  31. On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about
  32. 5 to 5.5 bytes/cycle. The other CityHashCrc functions are wrappers around
  33. CityHashCrc256 and should have similar performance on long strings.
  34. (CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as thorough
  35. as it should be.) CityHash128 peaks at about 4.3 bytes/cycle. The fastest
  36. Murmur variant on that hardware, Murmur3F, peaks at about 2.4 bytes/cycle.
  37. We expect the peak speed of CityHash128 to dominate CityHash64, which is
  38. aimed more toward short strings or use in hash tables.
  39. For long strings, a new function by Bob Jenkins, SpookyHash, is just
  40. slightly slower than CityHash128 on Intel x86-64 CPUs, but noticeably
  41. faster on AMD x86-64 CPUs. For hashing long strings on AMD CPUs
  42. and/or CPUs without the CRC instruction, SpookyHash may be just as
  43. good or better than any of the CityHash variants.
  44. Performance on short strings: 64-bit CPUs
  45. =========================================
  46. For short strings, e.g., most hash table keys, CityHash64 is faster than
  47. CityHash128, and probably faster than all the aforementioned functions,
  48. depending on the mix of string lengths. Here are a few results from that
  49. same hardware, where we (unrealistically) tested a single string length over
  50. and over again:
  51. Hash Results
  52. ------------------------------------------------------------------------------
  53. CityHash64 v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes
  54. Murmur2 (64-bit) 6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes
  55. Murmur3F 14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes
  56. We don't have CityHash64 benchmarks results for v1.1, but we expect the
  57. numbers to be similar.
  58. Performance: 32-bit CPUs
  59. ========================
  60. CityHash32 is the newest variant of CityHash. It is intended for
  61. 32-bit hardware in general but has been mostly tested on x86. Our benchmarks
  62. suggest that Murmur3 is the nearest competitor to CityHash32 on x86.
  63. We don't know of anything faster that has comparable quality. The speed rankings
  64. in our testing: CityHash32 > Murmur3f > Murmur3a (for long strings), and
  65. CityHash32 > Murmur3a > Murmur3f (for short strings).
  66. Installation
  67. ============
  68. We provide reference implementations of several CityHash functions, written
  69. in C++. The build system is based on autoconf. It defaults the C++
  70. compiler flags to "-g -O2", which is probably slower than -O3 if you are
  71. using gcc. YMMV.
  72. On systems with gcc, we generally recommend:
  73. ./configure
  74. make all check CXXFLAGS="-g -O3"
  75. sudo make install
  76. Or, if your system has the CRC32 instruction, and you want to build everything:
  77. ./configure --enable-sse4.2
  78. make all check CXXFLAGS="-g -O3 -msse4.2"
  79. sudo make install
  80. Note that our build system doesn't try to determine the appropriate compiler
  81. flag for enabling SSE4.2. For gcc it is "-msse4.2". The --enable-sse4.2
  82. flag to the configure script controls whether citycrc.h is installed when
  83. you "make install." In general, picking the right compiler flags can be
  84. tricky, and may depend on your compiler, your hardware, and even how you
  85. plan to use the library.
  86. For generic information about how to configure this software, please try:
  87. ./configure --help
  88. Failing that, please work from city.cc and city*.h, as they contain all the
  89. necessary code.
  90. Usage
  91. =====
  92. The above installation instructions will produce a single library. It will
  93. contain CityHash32(), CityHash64(), and CityHash128(), and their variants,
  94. and possibly CityHashCrc128(), CityHashCrc128WithSeed(), and
  95. CityHashCrc256(). The functions with Crc in the name are declared in
  96. citycrc.h; the rest are declared in city.h.
  97. Limitations
  98. ===========
  99. 1) CityHash32 is intended for little-endian 32-bit code, and everything else in
  100. the current version of CityHash is intended for little-endian 64-bit CPUs.
  101. All functions that don't use the CRC32 instruction should work in
  102. little-endian 32-bit or 64-bit code. CityHash should work on big-endian CPUs
  103. as well, but we haven't tested that very thoroughly yet.
  104. 2) CityHash is fairly complex. As a result of its complexity, it may not
  105. perform as expected on some compilers. For example, preliminary reports
  106. suggest that some Microsoft compilers compile CityHash to assembly that's
  107. 10-20% slower than it could be.
  108. Hash Quality
  109. ============
  110. We like to test hash functions with SMHasher, among other things.
  111. SMHasher isn't perfect, but it seems to find almost any significant flaw.
  112. SMHasher is available at http://code.google.com/p/smhasher/
  113. SMHasher is designed to pass a 32-bit seed to the hash functions it tests.
  114. No CityHash function is designed to work that way, so we adapt as follows:
  115. For our functions that accept a seed, we use the given seed directly (padded
  116. with zeroes); for our functions that don't accept a seed, we hash the
  117. concatenation of the given seed and the input string.
  118. The CityHash functions have the following flaws according to SMHasher:
  119. (1) CityHash64: none
  120. (2) CityHash64WithSeed: none
  121. (3) CityHash64WithSeeds: did not test
  122. (4) CityHash128: none
  123. (5) CityHash128WithSeed: none
  124. (6) CityHashCrc128: none
  125. (7) CityHashCrc128WithSeed: none
  126. (8) CityHashCrc256: none
  127. (9) CityHash32: none
  128. Some minor flaws in 32-bit and 64-bit functions are harmless, as we
  129. expect the primary use of these functions will be in hash tables. We
  130. may have gone slightly overboard in trying to please SMHasher and other
  131. similar tests, but we don't want anyone to choose a different hash function
  132. because of some minor issue reported by a quality test.
  133. For more information
  134. ====================
  135. http://code.google.com/p/cityhash/
  136. cityhash-discuss@googlegroups.com
  137. Please feel free to send us comments, questions, bug reports, or patches.