Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected Label Inconsistencies in SHA-1 and SHA-256 Computations Compared to AES-128 in emp-sh2pc #39

Open
Stu-Yang opened this issue Sep 9, 2024 · 1 comment

Comments

@Stu-Yang
Copy link

Stu-Yang commented Sep 9, 2024

Hello, @wangxiao1254 , thank you very much to your team for providing such an excellent MPC library. During my development work using EMP-Toolkit, I encountered a few issues and would like to seek your advice and guidance on these matters.

Issue Description

I have successfully installed emp-tool, emp-ot, and emp-sh2pc, and I am able to run the provided test examples without issue. My current project involves designing a secure two-party computation protocol based on emp-sh2pc. Specifically, I need to understand how output wire labels are computed in the half-gates-based garbled circuits used by emp-sh2pc, and how the relationships between the output wire labels held by Alice (the circuit generator) and Bob (the circuit evaluator) are established.

To investigate this, I modified emp-sh2pc/test/circuit_file.cpp to explicitly output:

  • The global key delta from Alice (the circuit generator)
  • The output wire labels held by both Alice and Bob
  • The final result (result.reveal<string>(BOB)) of the computation

In theory, following the secure computation protocol, Bob's output wire label should either match Alice's output wire label, or it should be the XOR of Alice's output wire label and the global key delta. This relationship holds true for the AES-128 circuit (/bristol_format/AES-non-expanded.txt) based on my testing. However, when testing with the SHA-1 circuit (/bristol_format/sha-1.txt) and the SHA-256 circuit (/bristol_format/sha-256-big.txt), the expected label relationships do not hold, and I am unable to reconcile the output labels between Alice and Bob.

I have tried several approaches to investigate this issue, but so far, none have resolved the problem. Could you please provide guidance on how to address this inconsistency, or offer some insights into what might be causing the deviation in behavior for the SHA-1 and SHA-256 circuits?

1. Environment Setup

I am using the following setup to compile and test the project:

  • OS and CPU: Ubuntu 20.04.2 LTS, Intel(R) Xeon(R) Platinum 8369HB CPU @ 3.30GHz, and 16 vCPU 128 GiB
  • GCC Version: 9.4.0
  • CMake Version: 3.16.3
  • GNU Make Version: 4.2.1
  • Python Virtual Environment: Conda 4.11.0 (with an environment named emp)

The specific versions of the repositories I am using are:

To compile the emp-tool, emp-ot, and emp-sh2pc projects, I used the following commands:

cmake .
make -j 4
sudo make install

I tested the code using two machines, with the following results:

# Terminal 1
(emp) root@aliyun:~/emp-toolkit/emp-sh2pc# ./bin/test_example 1 12345 123
connected
ALICE larger?   0
32

# Terminal 2
(emp) root@aliyun:~/emp-toolkit/emp-sh2pc# ./bin/test_example 2 12345 124
connected
ALICE larger?   0
32

These results matched the expected outcome.

2. Test Case Details

To better understand the half-gate garbled circuit computation process in emp-sh2pc, I tested emp-sh2pc/test/circuit_file.cpp. During the testing, I aimed to understand the behavior of the garbled circuit computation, specifically focusing on the output wire labels.

I identified that the core computation occurs in the following lines of code:

Integer a(128, 2, ALICE);
Integer b(128, 3, BOB);
Integer c(128, 1, PUBLIC);
for(int i = 0; i < 10000; ++i) {
    cf.compute((block*)c.bits.data(), (block*)a.bits.data(), (block*)b.bits.data());
}

The function cf.compute((block*)c.bits.data(), (block*)a.bits.data(), (block*)b.bits.data()) handles the garbled circuit computation. Through my analysis, I found that c.bits corresponds to the output wire labels of the circuit. Specifically:

  • For Alice (the circuit generator), the output wire labels correspond to the label for the mask 0.
  • For Bob (the circuit evaluator), the output wire labels are either the same as Alice's labels or the XOR of Alice's labels with the global key delta.

I modified the code to output the hexadecimal values of the following:

  • The labels in c.bits (for both Alice and Bob in the two-machine mode).
  • Alice's global key delta (in hexadecimal).
  • The final computation result (result).

Below is the modified version of the code:

#include "emp-sh2pc/emp-sh2pc.h"
#include "emp-sh2pc/semihonest.h"
#include <iostream>
using namespace emp;
using namespace std;

const string circuit_file_location = macro_xstr(EMP_CIRCUIT_PATH);  

int port, party;
string file = circuit_file_location + "/bristol_format/AES-non-expanded.txt";  
BristolFormat cf(file.c_str());

void print_delta(block delta) {
    bool data[128];

    uint64_t* ptr = (uint64_t*)(&delta);

    for (int i = 0; i < 64; ++i) {
        data[i] = (ptr[0] & (1ULL << i)) != 0;
        data[64 + i] = (ptr[1] & (1ULL << i)) != 0;
    }

    uint64_t high = 0;
    uint64_t low = 0;

    for (int i = 0; i < 64; ++i) {
        if (data[i]) {
            low |= (1ULL << i);
        }
        if (data[64 + i]) {
            high |= (1ULL << i);
        }
    }

    std::cout << "delta = 0x" 
              << std::hex << std::setw(16) << std::setfill('0') << high
              << std::hex << std::setw(16) << std::setfill('0') << low
              << std::endl;
}

void print_labels(const std::vector<emp::Bit>& bits) {
    for (size_t i = 0; i < bits.size(); ++i) {
        bool data[128];
        uint64_t* ptr = (uint64_t*)&bits[i].bit;

        for (int j = 0; j < 64; ++j) {
            data[j] = (ptr[0] & (1ULL << j)) != 0;
            data[64 + j] = (ptr[1] & (1ULL << j)) != 0;
        }

        uint64_t low = 0, high = 0;
        for (int j = 0; j < 64; ++j) {
            low |= (static_cast<uint64_t>(data[j]) << j);
            high |= (static_cast<uint64_t>(data[64 + j]) << j);
        }

        std::cout << "\"" << "0x" << std::hex << std::setw(16) << std::setfill('0') << high 
                  << std::setw(16) << std::setfill('0') << low << "\","
                  << std::endl;
    }
}


void test(int num_iter, SemiHonestParty<NetIO>* party_obj) {
    double total_time = 0.0;

    Integer a(128, 2, ALICE);
    Integer b(128, 3, BOB);
    Integer c(128, 1, PUBLIC);

    if (party == ALICE) {
        SemiHonestGen<NetIO>* gen_party = dynamic_cast<SemiHonestGen<NetIO>*>(party_obj);
        if (gen_party != nullptr) {
            print_delta(gen_party->gc->delta);
        }
    }

    for (int i = 0; i < num_iter; ++i) {
        auto start = clock_start();

        cf.compute((block*)c.bits.data(), (block*)a.bits.data(), (block*)b.bits.data()); 
        total_time += time_from(start);
        
        print_labels(c.bits);

    }

    cout << "******************** " << "party " << party << " ********************\n"
        << "the number of iterations: " << std::dec << num_iter << "\n"
        << "total num_and: " << (CircuitExecution::circ_exec->num_and() / num_iter) << " * " << num_iter << "\n" << endl;

    cout << std::fixed << std::setprecision(3)
         << "total running times: " << (total_time) / (1000.0) << " ms" << "\n"
         << "average running times: " << (total_time / num_iter) / (1000.0) << " ms\n" << endl;

    cout << "result = "  << c.reveal<string>(BOB) << "\n" << endl;
}


int main(int argc, char** argv) {
    cout << "circuit file: " << file << endl;
    int num_iter = 100;

    parse_party_and_port(argv, &party, &port);
    NetIO* io = new NetIO(party == ALICE ? nullptr : "127.0.0.1", port);
    
    SemiHonestParty<NetIO>* party_obj = setup_semi_honest(io, party);
    test(num_iter, party_obj);

    size_t comm = io->counter;
    cout << std::fixed << std::setprecision(3)
         << "total comm. cost: " << static_cast<float>(comm) / (1024.0) << " KB" << "\n"
         << "average comm. cost: " << static_cast<float>(comm) / (1024.0 * num_iter) << " KB" << endl; 
    
    finalize_semi_honest();
    delete io;
}

I believe my analysis is correct, and the modified code seems fine, though a confirmation is needed.

3. Expected Behavior

With the computation iteration count set to 1, I ran the modified code in a LAN environment (configured using throttle.sh, with bandwidth set to 3GB/s and latency set to 0.32ms). The results are as follows:

Terminal 1

(emp) root@aliyun:~/emp-toolkit/emp-sh2pc# ./bin/test_circuit_file 1 12345
circuit file: /usr/local/include/emp-tool/circuits/files//bristol_format/AES-non-expanded.txt
connected
delta = 0x95387a32e17c61e59dc7bd36f072c473
"0xd1200ebbb21e0e18d0a068eeebeccfff",
"0x67e2ebaad80ae50d1e3cdfb27fbc8b94",
"0x4a7a4fd1cbaac1e459679c3508cdfa92",
"0x4fabd7105431154801f22f6894fd481d",
"0xc7f3f6aff938b19220edfedefb3f4a66",
"0xff842c5213451fea4f70d4b42a5098bf",
"0xed2efd9854627e853363742f38329f15",
"0x86de931aad982d4a98ff532b86ceeb94",
"0x370f9e08f5fc61d501b33c90ce357da0",
"0x6739679f3d2382f68c114311792279ae",
"0xe734b228835b86475e5ab0765b69877a",
......
******************** party 1 ********************
the number of iterations: 1
total num_and: 6800 * 1

total running times: 0.497 ms
average running times: 0.497 ms

result = 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

total comm. cost: 221.312 KB
average comm. cost: 221.312 KB

Terminal 2

(emp) root@aliyun:~/emp-toolkit/emp-sh2pc# ./bin/test_circuit_file 2 12345
circuit file: /usr/local/include/emp-tool/circuits/files//bristol_format/AES-non-expanded.txt
connected
"0x4418748953626ffd4d67d5d81b9e0b8c",
"0x67e2ebaad80ae50d1e3cdfb27fbc8b94",
"0x4a7a4fd1cbaac1e459679c3508cdfa92",
"0xda93ad22b54d74ad9c35925e648f8c6e",
"0xc7f3f6aff938b19220edfedefb3f4a66",
"0xff842c5213451fea4f70d4b42a5098bf",
"0x781687aab51e1f60aea4c919c8405b66",
"0x86de931aad982d4a98ff532b86ceeb94",
"0xa237e43a148000309c7481a63e47b9d3",
"0x6739679f3d2382f68c114311792279ae",
......
******************** party 2 ********************
the number of iterations: 1
total num_and: 6800 * 1

total running times: 4.901 ms
average running times: 4.901 ms

result = 10010010101110100001110111010001001111011111101001001000001110101110010100110010011011111111101100100010001111110011101100110010

total comm. cost: 264.255 KB
average comm. cost: 264.255 KB

To verify the correctness of the labels computed by Alice and Bob, I wrote a Python program that checks if Bob’s label is either equal to Alice’s label (corresponding to an output value of 0), or equal to the XOR of Alice’s label and the global key delta (corresponding to an output value of 1).

In the case of the AES-128 circuit, I observed that this relationship holds true, meaning the labels between Alice and Bob are consistent with the expected behavior. Furthermore, the output string computed based on the relationship between their labels matched the result produced by the original code.

The Python program I used for this validation is as follows:

def validate_labels(participant1_labels, participant2_labels, halfgate_delta):
    """
    Validate whether each label in participant2_labels matches either the corresponding label in participant1_labels
    or the XOR of that label with halfgate_delta.

    :param participant1_labels: List of hexadecimal strings representing participant 1's labels.
    :param participant2_labels: List of hexadecimal strings representing participant 2's labels.
    :param halfgate_delta: Hexadecimal string representing the halfgate delta value.
    :return: List of booleans indicating if each label in participant2_labels is valid.
    """
    valid_results = []
    delta = int(halfgate_delta, 16)
    
    for label1, label2 in zip(participant1_labels, participant2_labels):
        label1_int = int(label1, 16)
        label2_int = int(label2, 16)
        valid = label2_int == label1_int or label2_int == (label1_int ^ delta)
        valid_results.append(valid)
    
    return valid_results

alice_labels = [
    "0xd1200ebbb21e0e18d0a068eeebeccfff",
    "0x67e2ebaad80ae50d1e3cdfb27fbc8b94",
    "0x4a7a4fd1cbaac1e459679c3508cdfa92",
    ...
]

bob_labels = [
    "0x4418748953626ffd4d67d5d81b9e0b8c",
    "0x67e2ebaad80ae50d1e3cdfb27fbc8b94",
    "0x4a7a4fd1cbaac1e459679c3508cdfa92",
    ...
]

# halfgate_delta = "0x95387a32e17c61e59dc7bd36f072c473"  # Use this value; if all results are True, labels match the protocol
halfgate_delta = "0x0"                                   # Using this value gives the computed circuit output based on labels

# Running the validation
results = validate_labels(alice_labels, bob_labels, halfgate_delta)

print(results)

bit_str = ''.join(['0' if bit else '1' for bit in results])

print(bit_str)

The program compares the labels from Alice and Bob, checking if Bob's labels match either Alice's label or its XOR with the global key delta. In the AES-128 circuit computation, the program output was consistent with the expected relationship, and the resulting bit string generated by comparing the labels matched the result produced by the secure computation code.

4. Observed Behavior and Attempts to Resolve

However, when I switched the circuit file to /bristol_format/sha-1.txt or /bristol_format/sha-256-big.txt, the output labels were entirely inconsistent. None of the labels matched between Alice and Bob, and using the Python program to validate these labels resulted in complete failure. This is very puzzling, as there should not be any fundamental difference between the AES-128 circuit computation and the SHA-1 or SHA-256 circuit computations.

Terminal 1

(emp) root@aliyun:~/emp-toolkit/emp-sh2pc# ./bin/test_circuit_file 1 12345
circuit file: /usr/local/include/emp-tool/circuits/files//bristol_format/sha-256-big.txt
connected
delta = 0xca6709be88382e9b3fea0a7f9c7e403b
"0xd373afccf9e6244be3d7335c6a1e93be",
"0xe86021edab338ccb3391231d7e3d163a",
"0x1e382cf0580b5f2cd797e1ab04a64de4",
"0xca3fb29e5031b5d8f156da5e87af6b72",
"0x60ae378b2b28436e72fa0767afa51ec6",
"0x7cc74e6fbe2ec7a24b23ef4cab13e8a0",
"0xde79040f29cc3c5a995b66b692216c59",
"0x48a551abc88eafe72c9daddfdbe09a2b",
"0x4f4122539d576150919bf031c1653b10",
"0x1185fb4c88a75476ca17c769dec9531f",
......
******************** party 1 ********************
the number of iterations: 1
total num_and: 90828 * 1

total running times: 7.603 ms
average running times: 7.603 ms

result = 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

total comm. cost: 2847.219 KB
average comm. cost: 2847.219 KB

Terminal 2

(emp) root@aliyun:~/emp-toolkit/emp-sh2pc# ./bin/test_circuit_file 2 12345
circuit file: /usr/local/include/emp-tool/circuits/files//bristol_format/sha-256-big.txt
connected
"0x017c3e61a54e50db9245dc9d62da6bc2",
"0xaf78a43788d2368b9c9a9bdb1386026f",
"0xcccdf29f6d6da7b5d6a79d598ca46d25",
"0x58b2778bd87d483e60941f69ecb3635f",
"0x1ca11e7ae9fd1428c87a914284f1844c",
"0x60f9867bb3f0714e0677be4dabe5d23a",
"0xc4cf50101ddff6efc6d886df11b8064b",
"0x3849295d036853acb5172d3971cff48e",
"0x99d6ff6d3d16ad95b898af2e7a12d9bd",
"0xa5ca3055732add239126fc7c03920ef4",
......
******************** party 2 ********************
the number of iterations: 1
total num_and: 90828 * 1

total running times: 13.708 ms
average running times: 13.708 ms

result = 0111000111001000010011000000110001111110010101110001111100100110100100010110001110001000010101011110010111001000100110100110000100011010010001000001111111000010000000111110110101000100011011101011110000000010000001001110110001011001101001110001111101001111

total comm. cost: 264.380 KB
average comm. cost: 264.380 KB

To investigate further, I used gdb to debug the above code but did not encounter any obvious anomalies. Additionally, we checked the example inputs and outputs provided for SHA-1 and SHA-256 on Nigel Smart’s MPC Circuits page. For instance, we tested the following input and output for SHA-256:

  • Input: 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
  • Expected Output: da5698be17b9b46962335799779fbeca8ce5d491c0d26243bafef9ea1837a9d8

We set the input values a, b, and c in emp-sh2pc/test/circuit_file.cpp to 0, 0, and 0 (256 bits), respectively. However, the output generated by the secure computation (71c84c0c7e571f2691638855e5c89a611a441fc203ed446ebc0204ec59a71f4f) was not equal to the expected result. This further adds to the confusion, as we could not identify the reason for this discrepancy.

At this point, I am unsure if the issue is with the SHA-1 or SHA-256 circuits themselves or with how the emp-sh2pc framework handles them.

5. Request for Assistance

In summary, I would like to consult on the following questions:

  1. Is there any issue with my analysis above, particularly with my understanding of the labels held by Alice (the circuit generator) and Bob (the circuit evaluator)? Additionally, does c.bits indeed correspond to the output wire labels as I expect?

  2. In the AES-128 example, does my analysis hold true and correctly reflect the label relationships between Alice and Bob? If so, why do inconsistencies arise when using the SHA-1 and SHA-256 examples? Is there any specific reason for these discrepancies that I may have overlooked?

Any guidance or insight into these questions would be highly appreciated. I am particularly interested in understanding whether this behavior stems from something specific to the SHA circuits or from a potential misunderstanding on my part. Any further discussions can take place under this issue, and feel free to contact me via email at [email protected] for any clarifications or additional information.

@wangxiao1254
Copy link
Member

Let me take a look. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants