MD5 hash calculation in C

Creating an MD5 hash function from scratch in plain C requires an understanding of bit-level operations and byte ordering, and a careful reading of the MD5 specification (RFC 1321).

MD5 (Message Digest Algorithm 5) produces a 128-bit hash value from an input of arbitrary length. It is considered insecure for cryptographic use because practical collision attacks exist, but it remains common in non-security applications, such as checksums that detect accidental corruption in stored data.

The MD5 algorithm works in four main stages, processing the input in 512-bit blocks with a fixed sequence of arithmetic and logical operations:

  1. Initialization:

    • Define four 32-bit variables (often termed A, B, C, and D) and initialize them with specific constant values.
    • Also, define a lookup table of 64 32-bit constants derived from the sine function: entry i (counting from 0) is floor(2^32 × abs(sin(i + 1))), with the angle in radians. One constant is added in each of the 64 processing steps.
  2. Preprocessing:

    • Pad the message so that its length in bits is congruent to 448 modulo 512 (that is, 64 bits short of a multiple of 512): append a single '1' bit, then as many '0' bits as needed. Padding is always added, even if the message already has that length. Then append the original message length in bits as a 64-bit little-endian integer, which brings the total to a multiple of 512 bits.
    • Divide the input into blocks of 512 bits.
  3. Processing:

    • For each 512-bit block, break it into sixteen 32-bit words.
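    • Perform four rounds of 16 steps each (64 steps in total) on working copies of A, B, C, and D. Each step applies one of four nonlinear functions built from bitwise operations (AND, OR, NOT, XOR) to three of the working values, adds the fourth value, one message word, and one table constant, rotates the sum left by a fixed amount, and adds the result to B. The four values then rotate roles for the next step.
    • After all four rounds, add the working values back into A, B, C, and D (modulo 2^32), so that the state accumulates across blocks.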
  4. Output:

    • Concatenate A, B, C, and D, in that order and each written low byte first, to get the 128-bit (16-byte) MD5 hash.
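
The four stages above can be sketched from scratch as follows. This is a minimal, unoptimized sketch rather than a reference implementation: the names md5 and md5_block are hypothetical, the whole message is assumed to be in memory, and the padded copy is allocated with calloc. The shift amounts, the sine-derived constants, and the initial values are the ones given in RFC 1321.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Left-rotation amount for each of the 64 steps (RFC 1321). */
static const uint8_t S[64] = {
    7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
    5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
    4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
    6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
};

/* K[i] = floor(2^32 * abs(sin(i + 1))), with i + 1 in radians. */
static const uint32_t K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
    0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
    0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
    0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
    0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
    0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
};

/* Rotate left; only called with 0 < n < 32 here. */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* Stage 3: mix one 64-byte block into the four-word state. */
static void md5_block(uint32_t st[4], const uint8_t *p) {
    uint32_t M[16], a = st[0], b = st[1], c = st[2], d = st[3];
    for (int i = 0; i < 16; i++)  /* assemble words low byte first */
        M[i] = (uint32_t)p[4 * i] | ((uint32_t)p[4 * i + 1] << 8) |
               ((uint32_t)p[4 * i + 2] << 16) | ((uint32_t)p[4 * i + 3] << 24);
    for (int i = 0; i < 64; i++) {
        uint32_t f;
        int g;  /* index of the message word used in this step */
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
        uint32_t t = d;
        d = c;
        c = b;
        b = b + rotl32(a + f + K[i] + M[g], S[i]);
        a = t;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d;
}

/* Hypothetical one-shot helper: returns 0 on success, -1 if allocation fails. */
int md5(const uint8_t *msg, size_t len, uint8_t digest[16]) {
    assert(msg != NULL || len == 0);

    /* Stage 1: initial values of A, B, C, D. */
    uint32_t st[4] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };

    /* Stage 2: message, 0x80, zero bytes, then the 64-bit bit length. */
    size_t padded = ((len + 8) / 64 + 1) * 64;
    uint8_t *buf = calloc(padded, 1);
    if (buf == NULL) return -1;
    if (len > 0) memcpy(buf, msg, len);
    buf[len] = 0x80;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        buf[padded - 8 + i] = (uint8_t)(bits >> (8 * i));

    /* Stage 3: process every 64-byte (512-bit) block. */
    for (size_t off = 0; off < padded; off += 64)
        md5_block(st, buf + off);
    free(buf);

    /* Stage 4: write A, B, C, D in order, each low byte first. */
    for (int i = 0; i < 16; i++)
        digest[i] = (uint8_t)(st[i / 4] >> (8 * (i % 4)));
    return 0;
}
```

A quick sanity check is to compare against the RFC 1321 test vectors: the empty string should hash to d41d8cd98f00b204e9800998ecf8427e and "abc" to 900150983cd24fb0d6963f7d28e17f72. Copying the whole message keeps the sketch short; a streaming design that buffers one block at a time avoids the extra allocation.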

In C, several aspects of this algorithm translate directly:

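  • Bitwise operations: C provides &, |, ~, ^ for AND, OR, NOT, XOR respectively, and << and >> for bit shifts. There is no rotate operator, so a left rotation is built from two shifts on an unsigned 32-bit type such as uint32_t.
  • Byte ordering: MD5 reads each 32-bit word of the input low byte first (little-endian) and writes the length field and the final digest the same way. Assembling words with shifts, rather than casting byte pointers, gives the same result on big-endian hosts.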
  • Block processing: Loop structures, memory operations (like memcpy), and array indexing allow efficient block processing.
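
As an illustration of the first two points, here is a minimal sketch of the helpers a from-scratch implementation typically needs. The names rotl32, load32_le, and store32_le are hypothetical; they are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits. Valid for 0 < n < 32: a shift by 32 would be
   undefined behavior, and MD5 only uses amounts between 4 and 23. */
static uint32_t rotl32(uint32_t x, unsigned n) {
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Read a 32-bit word stored low byte first, whatever the host byte order. */
static uint32_t load32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit word low byte first. */
static void store32_le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}
```

For example, storing the value 0x67452301 with store32_le yields the bytes 01 23 45 67, and load32_le reads them back as the same word on any host.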

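mbedTLS is a widely used cryptographic library written in C. Here's a basic example showing how you might use it to calculate the MD5 hash of a string. The _ret-suffixed functions used here belong to the mbedTLS 2.x API; in mbedTLS 3.x they were renamed without the suffix (for example, mbedtls_md5_update), so adjust the names to match your installed version.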

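#include <stdio.h>
#include <string.h>
#include <mbedtls/md5.h>

// Print a 16-byte hash as 32 hexadecimal digits.
void print_hash(const unsigned char hash[16]) {
    for (int i = 0; i < 16; i++) {
        printf("%02x", hash[i]);
    }
    printf("\n");
}

void compute_md5(const char *input) {
    unsigned char output[16]; // MD5 output is 16 bytes
    mbedtls_md5_context ctx;

    mbedtls_md5_init(&ctx);       // prepare the context
    mbedtls_md5_starts_ret(&ctx); // begin a new hash computation
    mbedtls_md5_update_ret(&ctx, (const unsigned char*) input, strlen(input));
    mbedtls_md5_finish_ret(&ctx, output);
    mbedtls_md5_free(&ctx);       // release the context

    print_hash(output);
}

int main() {
    const char *input = "Hello, World!";
    printf("Original string: %s\n", input);

    printf("MD5 Hash: ");
    compute_md5(input);

    return 0;
}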

  1. Includes: Include the necessary header files (stdio.h, string.h, and mbedtls/md5.h).

  2. print_hash Function: This function takes an MD5 hash (an unsigned char array) and prints it in a hexadecimal format. It iterates over each byte of the hash, printing two hexadecimal digits.

  3. compute_md5 Function:

    • Takes a string input.
    • Declares an output buffer output[16] to hold the 16-byte MD5 hash.
    • Initializes an MD5 context variable ctx using mbedtls_md5_init.
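    • Processes the input string in three steps: starts (mbedtls_md5_starts_ret), update (mbedtls_md5_update_ret), and finish (mbedtls_md5_finish_ret). Each of these returns 0 on success, so production code should check the return values.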
    • Frees the context memory using mbedtls_md5_free.
    • Calls print_hash to print the calculated hash.
  4. main Function:

    • Defines a string input to be hashed.
    • Prints the original string.
    • Calls compute_md5 to calculate and print the hash.
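
If the hash needs to be stored or compared rather than printed, the same loop as in print_hash can write into a string instead. The sketch below assumes a hypothetical helper named hash_to_hex; it is not part of mbedTLS.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a 16-byte hash as 32 lowercase hex digits.
   The caller provides room for 33 characters (32 digits plus the NUL). */
void hash_to_hex(const unsigned char hash[16], char out[33]) {
    assert(hash != NULL && out != NULL);
    for (int i = 0; i < 16; i++) {
        /* Each call writes two digits and a terminating NUL. */
        snprintf(out + 2 * i, 3, "%02x", hash[i]);
    }
}
```

Each snprintf call overwrites the NUL left by the previous one, so the final string is terminated exactly once, at out[32].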


Ensure mbedTLS is correctly installed and linked. If you’re using gcc for compilation, you might link mbedTLS like so:

gcc your_source_file.c -o output_file -lmbedtls -lmbedcrypto

Adapt the file names to your project. If mbedTLS is installed outside the compiler's default search paths, add -I and -L options pointing at its include and library directories.

