When you run a Diffie-Hellman-Merkel exchange, geting a shared key, you cannot be sure a MITM attack might not be fooling you by actually creating two different keys, one with each end. The simple way to check for this would be that the endpoints compare their derived keys. Now, you can be fancy and perhaps use oblivious transfer to compare keys without spilling the key, but there is an easier way. In principle if each endpoint takes a hash of their key and sends part of the hash to the other end (part, so the hash can't be used to tell what the whole key is. One can use the whole thing) then each end can compare the hash with the local computation of it, and if they are alike, this shows the keys match and no MITM attack as described is going on. The worry is however that a MITM might notice these hash values and replace them to maintain deception. The MITM has of course the problem that for most traffic he needs to send data across transparently, to avoid notice. He can't know about possible later checksums or the like, as long as traffic is not completely predictable. So let's try and use this to hide the sending of hashes. Wait here till the connection has entered "normal" mode so user traffic begins. Suppose each end computes the hash of the shared key, H, a random number K, and several other random numbers P,Q, R, S, T. Each end computes the encrypted value of H using K. Call this E. The number of random numbers sent also should be randomly picked and not constant. Obviously they all need to be the same length and really random so entropy tests show nothing. Now each end sends, in random order, E, K, P, Q, R, S, T, sending one number and waiting for the other end to send one (so the MITM cannot just store the whole sequence). When out of random numbers the site sends a simple "out of data" message and when both are out they stop sending these. Now each end sends a message to the other "the message sent 5 messages ago (or however many it was) is the value E, and the one sent 7 ago is the value K". Note the MITM now can tell the hash, but he cannot keep the other end (in both cases) from receiving K and E since they were already sent. They are not distinguishable values so cannot be altered on the fly, and it is OK for the ends to send additional check values to detect tampering, referring to data already sent. The endpoints however now have enough information to compute H by decrypting E with K so they can now detect the MITM. If the MITM tried to store all messages, the exchange would fail due to each side waiting for replies (after the first). Note a "first" is defined in D-H exchange and can just be used. By adding a few more computations on random numbers sent the ends can detect other tampering. But unless the values of H match on both ends, the tampering MITM is detected. If a MITM should be detected of course, the channel is known compromised. Responses would be either to run a new D-H exchange and the detection scheme, or quit. If a good D-H key should be computed and checked it can be used. If a MITM is there, he will see noise.. Now there is a possible problem with the above: The MITM might notice that this is going on and decide to send some extra numbers including a fake H and K, and then alter the message about which messages have the real H and K (actually, E and K). We need a better way to tell when each side is done sending numbers so that at any rate by the end each side can tell what the valid range of numbers for holding E and K is. If something tampers with this, then, the parties would know the conversation is tapped and the MITM is detected. A fixed message won't do, but perhaps sending a hash of all numbers first will. Recipient computes hash with every new number and expects no more when this matches, so each knows when the real data stops. If the MITM notices that numbers are being sent and decides to just send his own early in the sequence, too, this makes it hard for him to match this hash using a faked E and K. He could send them but the parties will know they are wrong. The MITM can also possibly intercept and alter the final message indicating which mesage had the value of E and which had K, but doing this will also not produce matching H and thus the interception will be detected. There may be other weak points but this seems worth considering as a possible starting point to strengthen D-H. The foregoing is better, but if this is done immediately on completion of D-H the MITM could possibly just replace the whole sequence. If he can send a fake hash that might spoil the scheme. So we need to ensure the MITM must start sending data through transparently. Do this by starting to send user packets, and ensuring our numbers packets are padded some so they are harder to recognize: some user packets will be in their size range. Then we send things through as before, after a few user packets, and with user packets interspersed. At the end the message would need to have the information about which prior packets were part of the checking sequence. This will allow test of the hash. Note too the message about prior packets needs to be recognizable and the ends have to keep records of packets that might have been part of the sequence. If the MITM tries to send a fake message about what's what, he still has the problem that the whole sequence is there and he has to maintain a consistent story about what it all is, or the interference is noticed. Thus the sequence would be: 1. Generate H,E,K,PQRST... as many as you want to send. Number of items to send should be random but above, say, 4. 2. Figure the order you will send these and remember which is which 3. Compute a hash of all you will send. 4. Start sending user data in with this all. 4. Send the hash 5. Send the other items. Recipient on receiving anything saves any packets that might be part of the sequence. Note all these sends are interleaved with receives of other end so MITM cannot just store them all and send edited ones... 6. Send something to other end to indicate which message had E and K and other mappings (This must have mapping of all that was sent including the hash) When this message is received the hash gets checked and additional checks for structure (detail TBD) of other numbers get done. These need to be aware the MITM might try to append his own messages so do reasonableness checks of numbers of messages seen, so skipping a valid sequence can be detected. Having the hash cover this "mapping" message is a good idea too. Limiting amount of user data so skipping a whole sequence is infeasible might be a good avenue. 7. Compute other end's H from the messages he sent you. 8. Compare H values. If they are alike there is no interference. If they are not or if something else failed, this is evidence the connection is not secure. 9. At this point it is no longer necessary to keep copies of traffic. It is of course advantageous for step 6 to be obscured but in any case including it in the hash might be useful. You want some sort of MAC to guard against at least noise. It is also useful to send additional hashes of some other numbers, or operations that should work (say, let R be the ciphertext of S encrypted with T) to give additional problems to anyone wanting to tamper. In the above you note that the difficulty is in ensuring that the MITM cannot just replace the whole sequence of messages with his own. If he knows you have some code that runs this system right after a d-h exchange, he can possibly get away with this unless there's some user traffic that looks a lot like the hash and so on. If what is going back and forth is a human conversation, having that get messed with will be noticeable. Otherwise the MITM may be able to pick out the validation traffic well enough to put his own in, and then point at that at the end. (Closing every possible hole here is hard; what we do is make more work for the MITM.) It could be noted that burying the sequence in a stego message might work more easily. The cover mesage would tend to show up changes if it were not passed as is. Again though, if everyone does this every time, the MITM will know how to look in the stego message and replace bits. If you want to make the MITM's job hard, you have to fix it so he can tell the messages have been sent only after the fact or only after analysis that is hard to do as a middleman. Starting with a hash is good in that if it gets through, the MITM cannot alter anything undetected, and if alteration is detected the MITM is exposed. But you need to know the hash isn't just replaced along with what follows. Anything that makes the MITM unsure when he can get away with altering traffic if a good mechanism here as detecting the MITM is what we want. The final mapping message is a weak point here, since a MITM can see it before it goes to its destination. If everything else is sent and the MITM blocks or scrambles the message it would detect him, but it means he can tell when more checking ends. By using MACs on all messages and periodic MACs that cover multiple messages you make it harder to alter the message stream of course, but only by having something unpredictable in the stream (like natural language messages that can express meaning in huge numbers of ways) can you make the MITM fail at mechanized intervention. We can suggest sending a hash of material sent after it all gets sent, which could inform the other side that the stuff it received was what was sent...IF the MITM doesn't just send a replacement. He can, especially if this hash is always sent in the same position. However it is perhaps possible to send multiple hashes, after additional stuff is sent, so that to avoid detection the MITM must recognize all the hashes and replace them. The real endpoints have to keep track of the data too, but they need only do this. The MITM needs to record his lies also and be able to recognize the correct hashes so he can change them. This is not wonderful, but does give a way which may make detecting tampering give an advantage to the real communicating folks and make it hard for a MITM to avoid detection. Thus the scheme here gives perhaps asymptotic detection of MITM, and certainly raises the effort a MITM needs notably. Whether it is worth it is a user decision. Glenn Everhart 7/18/2014