
MD5 hash calculation in C

Creating an MD5 hash function from scratch in plain C requires an understanding of bit-level operations and byte ordering, and strict adherence to the MD5 specification (RFC 1321).

MD5 (Message Digest Algorithm 5) is widely used to create 128-bit hash values from arbitrary pieces of data. Although it has been deemed insecure for cryptographic uses due to vulnerabilities to collision attacks, it remains prevalent in various non-secure applications, like checksums in data storage.

The MD5 algorithm operates in four main stages. The processing stage applies a fixed sequence of logical and arithmetic operations to each 512-bit block of the padded input:

  1. Initialization:

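    • Define four 32-bit variables (often termed A, B, C, and D) and initialize them with the fixed constants from RFC 1321: 0x67452301, 0xefcdab89, 0x98badcfe, and 0x10325476.
    • Also, define a lookup table of 64 constants derived from the sine function: entry i (for i = 1 to 64) is the integer part of 2^32 × |sin(i)|, with i in radians. One constant is added in each of the 64 steps of the processing stage.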
  2. Preprocessing:

    • Pad the input so that its bit-length is congruent to 448 modulo 512 (i.e., 64 bits short of a multiple of 512) by appending a single '1' bit followed by as many '0' bits as needed. Padding is always applied, even when the length already satisfies the condition. Then append the length of the original message in bits as a 64-bit little-endian integer, which brings the total to an exact multiple of 512 bits.
    • Divide the input into blocks of 512 bits.
  3. Processing:

    • For each 512-bit block, break it into sixteen 32-bit words.
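    • Perform four rounds of 16 steps each on the block. Every step combines a nonlinear function of B, C, and D (built from AND, OR, NOT, and XOR) with one message word and one table constant using addition modulo 2^32, followed by a left rotation. Each round uses a different function and visits the sixteen words in a different order.
    • After all four rounds, add the resulting A, B, C, and D to the values they held before the block was processed (modulo 2^32). The sums become the starting state for the next block.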
  4. Output:

    • Concatenate A, B, C, and D, in that order and each written low-order byte first, to get the 128-bit (16-byte) MD5 hash.
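
The four stages above can be sketched end to end in portable C. This is a minimal, unoptimized illustration rather than a production implementation. The function names md5, md5_block, md5_hex, and rotate_left are hypothetical (they do not come from any library), the whole message is assumed to be in memory, and the rotation amounts and sine-derived constants are the ones published in RFC 1321.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Left-rotation amount for each of the 64 steps (RFC 1321). */
static const uint8_t S[64] = {
    7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
    5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
    4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
    6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
};

/* K[i] = floor(2^32 * |sin(i + 1)|), precomputed (RFC 1321). */
static const uint32_t K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
    0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
    0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
    0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
    0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
    0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
};

/* Rotate left; n is always in 4..23 here, so both shifts are well defined. */
static uint32_t rotate_left(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* Stage 3: mix one 64-byte block into the running state A, B, C, D. */
static void md5_block(uint32_t state[4], const uint8_t block[64]) {
    uint32_t m[16];
    for (int i = 0; i < 16; i++)            /* sixteen little-endian words */
        m[i] = (uint32_t)block[4 * i]
             | (uint32_t)block[4 * i + 1] << 8
             | (uint32_t)block[4 * i + 2] << 16
             | (uint32_t)block[4 * i + 3] << 24;

    uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
    for (int i = 0; i < 64; i++) {
        uint32_t f;
        int g;                              /* index of the message word used */
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
        uint32_t next_b = b + rotate_left(a + f + K[i] + m[g], S[i]);
        a = d; d = c; c = b; b = next_b;
    }
    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
}

/* Stages 1, 2 and 4: initialize, pad, process every block, serialize. */
void md5(const uint8_t *msg, size_t len, uint8_t digest[16]) {
    uint32_t state[4] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };
    size_t full = len / 64, rem = len % 64;
    for (size_t i = 0; i < full; i++)
        md5_block(state, msg + 64 * i);

    /* Padding: 0x80, zeros up to byte 56 of a block, then the bit length
       as a 64-bit little-endian integer. */
    uint8_t tail[128] = { 0 };
    size_t tail_len = (rem < 56) ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    if (rem > 0)
        memcpy(tail, msg + 64 * full, rem);
    tail[rem] = 0x80;
    for (int i = 0; i < 8; i++)
        tail[tail_len - 8 + i] = (uint8_t)(bits >> (8 * i));
    for (size_t off = 0; off < tail_len; off += 64)
        md5_block(state, tail + off);

    for (int i = 0; i < 4; i++)             /* A, B, C, D, low byte first */
        for (int j = 0; j < 4; j++)
            digest[4 * i + j] = (uint8_t)(state[i] >> (8 * j));
}

/* Convenience wrapper: hash a C string, write 32 hex digits plus NUL. */
void md5_hex(const char *text, char out[33]) {
    uint8_t digest[16];
    md5((const uint8_t *)text, strlen(text), digest);
    for (int i = 0; i < 16; i++)
        sprintf(out + 2 * i, "%02x", digest[i]);
}
```

A quick sanity check is to compare against the RFC 1321 test suite: with a 33-byte buffer buf, md5_hex("abc", buf) should produce 900150983cd24fb0d6963f7d28e17f72, and the empty string should produce d41d8cd98f00b204e9800998ecf8427e. Handling the tail in a separate 128-byte buffer keeps the padding logic in one place, at the cost of requiring the full message up front; a streaming design would instead keep a partial-block buffer in a context structure.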

In C, several aspects of this algorithm translate directly:

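  • Bitwise operations: C provides &, |, ~, ^ for AND, OR, NOT, XOR respectively, and << and >> for bit shifts. There is no rotate operator, so a rotation is written as two shifts combined with |. Using uint32_t from <stdint.h> makes additions wrap modulo 2^32 as the algorithm requires.
  • Byte ordering: MD5 is defined in little-endian terms. Message words, the appended length, and the output digest are all stored low-order byte first. Assembling words from individual bytes with shifts keeps the code correct on big-endian hosts as well.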
  • Block processing: Loop structures, memory operations (like memcpy), and array indexing allow efficient block processing.
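
As a small illustration of the first two points, the helpers below show one common way to write a 32-bit rotation and host-independent little-endian conversions. The names rotl32, load_le32, and store_le32 are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits. Masking keeps both shift counts
   in 0..31, so n == 0 (or n >= 32) never causes an out-of-range shift. */
uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Read four bytes as a little-endian word, whatever the host byte order. */
uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Write a word as four bytes, low-order byte first. */
void store_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}
```

Working byte by byte avoids casting a byte buffer to uint32_t *, which would depend on the host's endianness and alignment rules. Many compilers recognize the shift-and-OR rotation pattern and emit a single rotate instruction, though that is an optimization detail rather than a guarantee.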

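mbedTLS is a widely used cryptographic library in C. Here's a basic example demonstrating how you might use mbedTLS to calculate the MD5 hash of a string in a C program. The function names ending in _ret belong to the Mbed TLS 2.x API (2.7 and later); Mbed TLS 3.x drops the _ret suffix, so there the equivalent calls are mbedtls_md5_starts, mbedtls_md5_update, and mbedtls_md5_finish.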

#include <stdio.h>
#include <string.h>
#include <mbedtls/md5.h>

void print_hash(unsigned char hash[16]) {
    for (int i = 0; i < 16; i++) {
        printf("%02x", hash[i]);
    }
    printf("\n");
}

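void compute_md5(const char *input) {
    unsigned char output[16]; // MD5 output is 16 bytes
    mbedtls_md5_context ctx;
    int ret;

    mbedtls_md5_init(&ctx);
    // Each call returns 0 on success; stop at the first failure.
    ret = mbedtls_md5_starts_ret(&ctx);
    if (ret == 0)
        ret = mbedtls_md5_update_ret(&ctx, (const unsigned char *) input, strlen(input));
    if (ret == 0)
        ret = mbedtls_md5_finish_ret(&ctx, output);
    mbedtls_md5_free(&ctx);

    if (ret != 0) {
        printf("MD5 computation failed (error %d)\n", ret);
        return;
    }
    print_hash(output);
}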

int main(void) {
    const char *input = "Hello, World!"; // string literals must not be modified
    printf("Original string: %s\n", input);

    printf("MD5 Hash: ");
    compute_md5(input);

    return 0;
}

 

Explanation:

  1. Includes: Include the necessary header files (stdio.h, string.h, and mbedtls/md5.h).

  2. print_hash Function: This function takes an MD5 hash (an unsigned char array) and prints it in a hexadecimal format. It iterates over each byte of the hash, printing two hexadecimal digits.

  3. compute_md5 Function:

    • Takes a string input.
    • Declares an output buffer output[16] to hold the 16-byte MD5 hash.
    • Initializes an MD5 context variable ctx using mbedtls_md5_init.
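    • Processes the input string in three steps: mbedtls_md5_starts_ret resets the state, mbedtls_md5_update_ret feeds in the data (it can be called repeatedly when the input arrives in pieces), and mbedtls_md5_finish_ret applies the padding and writes the digest.
    • Clears the context using mbedtls_md5_free. The context here lives on the stack, so this wipes its contents rather than releasing heap memory.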
    • Calls print_hash to print the calculated hash.
  4. main Function:

    • Defines a string input to be hashed.
    • Prints the original string.
    • Calls compute_md5 to calculate and print the hash.

Note:

Ensure mbedTLS is correctly installed and linked. If you’re using gcc for compilation, you might link mbedTLS like so:

gcc your_source_file.c -o output_file -lmbedtls -lmbedcrypto

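Make sure to adapt paths and file names to your use case. MD5 itself lives in the mbedcrypto library, so -lmbedcrypto alone is normally sufficient for this program. If the mbedTLS headers or libraries are installed in a non-default location, point the compiler at them with -I and -L.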
