Wednesday, January 20, 2010

Aurora uses Chinese error-checking algorithm?

See Operation Aurora: Clues in the Code.

... "Operation Aurora" is the latest in a series of attacks originating out of Mainland China. Previous attacks have been known as "GhostNet" and "Titan Rain." Operation Aurora takes its name directly from the hackers this time: the name was coined after virus analysts found unique strings in some of the malware involved in the attack. These strings are debug symbol file paths in source code that has apparently been custom-written for these attacks.

... The compiler often offers other clues to a malware sample’s origin. For instance, if the binary uses a PE resource section, the resource’s headers will often provide a language code. The Hydraq component does use a resource section, but in this case the author was careful either to compile the code on an English-language system or to edit the language code in the binary after the fact. So outside of the fact that PRC IP addresses have been used as control servers in the attacks, there is no "hard evidence" of involvement of the PRC or any agents thereof.

There is one interesting clue in the Hydraq binary that points back to mainland China, however. While analyzing the samples, I noticed a CRC (cyclic redundancy check) algorithm that seemed somewhat unusual. CRCs are used to check for errors that might have been introduced into stored or transferred data. There are many different CRC algorithms and implementations of those algorithms, but this is one I had not previously seen in any of my reverse-engineering efforts.

... The CRC algorithm used in Hydraq uses a table of only 16 constants; basically a truncated version of the typical 256-value table. By decompiling the algorithm and searching the Internet for source code with similar constants, operations and a 16-value CRC table size, I was able to locate one instance of source code that fully matched the structural code implementation in Hydraq and also produced the same output when given the same input ...
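To make the "16 constants instead of 256" idea concrete, here is a minimal sketch of a nibble-at-a-time CRC-16/XMODEM in Python. It is not the Hydraq code and not the source referred to above; the function names are hypothetical, and it only assumes the standard XMODEM parameters (polynomial 0x1021, initial value 0).

```python
POLY = 0x1021  # CRC-16/XMODEM generator polynomial


def build_nibble_table(poly=POLY):
    """Build the 16-entry table: one 16-bit constant per 4-bit value."""
    table = []
    for nibble in range(16):
        reg = nibble << 12  # place the nibble in the top 4 bits
        for _ in range(4):  # process those 4 bits one at a time
            if reg & 0x8000:
                reg = ((reg << 1) ^ poly) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
        table.append(reg)
    return table


NIBBLE_TABLE = build_nibble_table()


def crc16_xmodem_nibble(data: bytes) -> int:
    """CRC-16/XMODEM computed 4 bits per step with a 16-entry table."""
    crc = 0
    for byte in data:
        # High nibble first, then low nibble.
        for nibble in (byte >> 4, byte & 0x0F):
            index = (crc >> 12) ^ nibble  # top 4 bits of the CRC select the entry
            crc = ((crc << 4) & 0xFFFF) ^ NIBBLE_TABLE[index]
    return crc


if __name__ == "__main__":
    print(hex(crc16_xmodem_nibble(b"123456789")))
```

Each byte costs two table lookups instead of one, which is the trade made for a table one sixteenth the size of the usual 256-entry one.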

... This source code was created to implement a 16-bit CRC algorithm compatible with the implementation known as "CRC-16 XMODEM", while requiring only a 16-value CRC table. It is actually a clever optimization of the standard CRC-16 reference code that allows the CRC-16 algorithm to be used in applications where memory is at a premium, such as hobby microcontrollers. Because the author used the C "int" type to store the CRC value, the number of bits in the output is dependent on the platform on which the code is compiled. In the case of Hydraq, which is a 32-bit Windows DLL, this CRC-16 implementation actually outputs a 32-bit value, which makes it compatible with neither existing CRC-16 nor CRC-32 implementations.
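The effect of the integer width can be sketched by simulating a 16-bit and a 32-bit accumulator with a mask. This is an illustration only: `nibble_crc` and `width_bits` are hypothetical names, and the sketch assumes the table index is taken from bits 12 to 15 of the accumulator, which the text above does not confirm for Hydraq.

```python
# Standard 16-entry table for polynomial 0x1021 (CRC-16/XMODEM).
NIBBLE_TABLE = [
    0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7,
    0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF,
]


def nibble_crc(data: bytes, width_bits: int) -> int:
    """Nibble-wise CRC with an accumulator of the given width.

    width_bits=16 models a 16-bit C "int"; width_bits=32 models a
    32-bit one, where bits shifted past bit 15 are kept, not dropped.
    """
    mask = (1 << width_bits) - 1
    crc = 0
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            # Assumption: only bits 12..15 select the table entry.
            index = ((crc >> 12) & 0x0F) ^ nibble
            crc = ((crc << 4) & mask) ^ NIBBLE_TABLE[index]
    return crc


if __name__ == "__main__":
    sample = b"\x01\x00"
    print(hex(nibble_crc(sample, 16)))  # 16-bit accumulator
    print(hex(nibble_crc(sample, 32)))  # 32-bit accumulator
```

Under that assumption the wider accumulator keeps bits that a 16-bit one would discard, so the two results differ as whole values even though, in this sketch, their low 16 bits agree.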

Perhaps the most interesting aspect of this source code sample is that it is of Chinese origin, released as part of a Chinese-language paper on optimizing CRC algorithms for use in microcontrollers. The full paper was published in simplified Chinese characters, and all existing references and publications of the sample source code seem to be exclusively on Chinese websites. This CRC-16 implementation seems to be virtually unknown outside of China, as shown by a Google search for one of the key variables, "crc_ta[16]". At the time of this writing, almost every page with meaningful content concerning the algorithm is Chinese ...


Steven Li said...

The same algorithm was quoted by some American guy in 2007 already, in Oregon no less.

Steve Hsu said...

Has anyone pointed this out to Joe Stewart?

bobby fletcher said...

Joe Stewart is correct only in that this CRC16 algorithm came from single-chip processor applications.

Here's some REALLY old TI calculator code with a nearly identical algorithm (look for the RemoteCRC16Lookup and RemoteByte functions):

bobby fletcher said...

Here's another example, like the one Steve L found:

This is from IDG in 2003.

bobby fletcher said...

Here's another problem with the claim that the code was lifted from this Chinese whitepaper:

The nibble CRC sample code in the Chinese white paper does not match the code used: it is missing the 12-bit shift optimization and instead relies on two divisions to obtain the top 4 bits:

