Bytes IT Community

RC4 Encryption Algorithm for VBA and VBScript

Rabbit
Expert Mod 10K+
P: 12,334
09/22/2015 Update: A bug was found in the code. The code block has been updated with the fixed code.

Description
RC4 is one of the most widely used ciphers in the world. It is used in WEP, WPA, SSL, BitTorrent, PDF, etc. It is one of the simplest to understand and implement. It is also one of the fastest algorithms available.

What is RC4
It is a stream cipher, which means that it encrypts a stream of data byte by byte, as opposed to a block cipher, which encrypts groups of bytes at a time, usually with the inclusion of byte shifting. Because of that extra mixing, block ciphers are often regarded as more secure than stream ciphers. One of the weaknesses of a stream cipher, more so than of a block cipher, is that the layout of the data is unchanged. Therefore, if you know the layout of the data, you can flip a bit in the encrypted message to change its meaning without knowing the key. For example, if you know the message format is "Deposit: ###", then you can flip one of the bits at byte 10 and hope to change 100 to 900. What is needed, then, is a way to authenticate a message. Usually, a message authentication code built from a cryptographic hash function and a secret key is used for this purpose. In essence, this message authentication code allows you to verify both the sender and the message.
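
The bit-flipping problem described above can be sketched in a few lines of VBScript (it should also run as VBA). This is only an illustration under stated assumptions: the pad array is a made-up stand-in for a keystream, not real RC4 output, and ToCodes, XorCodes, FromCodes and BitFlipDemo are hypothetical helper names.

```vbscript
' Convert a string to an array of character codes.
Function ToCodes(text)
    Dim i, codes()
    ReDim codes(Len(text) - 1)
    For i = 1 To Len(text)
        codes(i - 1) = Asc(Mid(text, i, 1))
    Next
    ToCodes = codes
End Function

' XOR two equal-length arrays of byte values (0-255).
Function XorCodes(a, b)
    Dim i, result()
    ReDim result(UBound(a))
    For i = 0 To UBound(a)
        result(i) = a(i) Xor b(i)
    Next
    XorCodes = result
End Function

' Convert an array of character codes back to a string.
Function FromCodes(codes)
    Dim i
    FromCodes = ""
    For i = 0 To UBound(codes)
        FromCodes = FromCodes & Chr(codes(i))
    Next
End Function

Sub BitFlipDemo()
    Dim plain, pad, cipher
    plain = ToCodes("Deposit: 100")
    ' Made-up keystream bytes (an assumption for this demo, not RC4 output)
    pad = Array(23, 201, 77, 5, 180, 66, 240, 19, 98, 154, 31, 7)
    cipher = XorCodes(plain, pad)

    ' The attacker flips one bit of byte 10 (array index 9) without knowing pad.
    ' "1" is code 49 and "9" is code 57; they differ only in the bit worth 8.
    cipher(9) = cipher(9) Xor 8

    ' The recipient decrypts as usual and sees the altered amount.
    MsgBox FromCodes(XorCodes(cipher, pad))   ' Deposit: 900
End Sub
```

The attacker never needs the pad or the key, only the position of the digit, which is why a message authentication code is needed on top of the cipher.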

It is a secret key algorithm, which means that it uses a key that should only be known by the intended sender and recipient. A major problem with this is the secure transmission of the key. Public key cryptography solves this by using so-called "hard" math, where calculating in one direction is easy while solving in the opposite direction takes much, much longer. One example is multiplication versus factorization: multiplying two large prime numbers is easy, but finding the two numbers that were multiplied to get the result takes a long time. This type of "hard" math allows you to transmit public information that the sender and recipient use to compute a key without having to transmit the key itself. So why isn't public key cryptography used exclusively? It is a lot slower than secret key cryptography.

General Weaknesses of Cryptography
A general weakness of ciphers is that they can be attacked using brute force. This means that a message can stay secret only for as long as it takes a computer to try every password until it finds the one you used. Each byte that you add to a password means that it will take roughly 256 times longer to crack your password. So the question is, how long do you need that message to stay a secret?

Now, that isn't actually a weakness of the algorithm per se. An actual weakness is that many algorithms are subject to mathematical analysis that may reveal what key was used to encrypt the data. Given enough data encrypted with the same or similar keys, such analysis can crack the key more quickly than brute force would. One way to mitigate this is the use of a nonce, initialization vector, or salt. These are basically random bits that are combined with the key so that, even though you are using the same key, each message is encrypted differently because the random bits in effect change the key that is used.

This is why you should choose a longer password and why you should change your passwords often.

How RC4 Works
RC4 encrypts a stream of data by generating a pseudorandom stream of bytes, the keystream, from the key. The keystream is XORed with the message.

To initialize the keystream generator, you start out with a 256-item array filled with the sequence 0-255. Looping 256 times, the loop counter provides one index and the key bytes are used to calculate a second index, both between 0 and 255. The two numbers at those indexes are then swapped.

With this, each keystream byte is generated by calculating two indexes, swapping the numbers at those indexes, adding those two numbers mod 256, and then returning the number at the resulting index.

This number is then XORed with one byte of the message. This continues until the end of the message is reached.
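
The steps above can be sketched as two small routines, one for the key setup and one for producing the next keystream byte. This is a non-inlined sketch written for readability rather than speed. InitState, NextByte and KeystreamDemo are hypothetical names, and the code assumes a non-empty key made of single-byte characters.

```vbscript
' Key setup: build the 256-entry array and shuffle it using the key.
Function InitState(strKey)
    Dim s(255), i, j, temp
    For i = 0 To 255
        s(i) = i
    Next
    j = 0
    For i = 0 To 255
        ' i is the loop counter; j is driven by the key bytes
        j = (j + s(i) + Asc(Mid(strKey, (i Mod Len(strKey)) + 1, 1))) Mod 256
        temp = s(i)
        s(i) = s(j)
        s(j) = temp
    Next
    InitState = s
End Function

' Keystream: update both indexes, swap, and return one keystream byte.
' s, x and y are passed by reference (the default), so the caller's
' state advances with every call.
Function NextByte(s, x, y)
    Dim temp
    x = (x + 1) Mod 256
    y = (y + s(x)) Mod 256
    temp = s(x)
    s(x) = s(y)
    s(y) = temp
    NextByte = s((s(x) + s(y)) Mod 256)
End Function

Sub KeystreamDemo()
    Dim state, x, y, i, out
    state = InitState("Key")
    x = 0
    y = 0
    out = ""
    For i = 1 To 4
        out = out & Right("0" & Hex(NextByte(state, x, y)), 2) & " "
    Next
    MsgBox out   ' EB 9F 77 81 for the key "Key", with no bytes dropped
End Sub
```

XORing each of these bytes with one byte of the message gives the encrypted output, and XORing the result with the same bytes again restores the message.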

Specific Weaknesses of RC4
Mathematical analysis has shown that the first few bytes of the keystream follow a pattern that could reveal the key or keystream that was used to create it. It is suggested that the first few bytes of the keystream be discarded to reduce this relationship. A common default is 768 bytes, but the more conservative recommended value is 3072 bytes.

I mention the following weakness because it could be a major weakness in the algorithm but I do not understand enough about the attack to say anything worthwhile about it. Souradyuti Paul and Bart Preneel gave a proof of a prior weakness posited by Itsik Mantin and Adi Shamir about combinatorial analysis of the cipher stream. In their paper, they discuss the weakness and offer a modification to the algorithm that should mitigate it.

Sample Implementation of RC4
This is a function that works in many VB implementations. It was coded specifically for VBScript, but it should be directly portable to VBA. It is an inlined version because calling functions takes a lot of overhead. One thing I did to make it more secure is to drop the first 3072 bytes of the keystream. Encryption and decryption are done using the same function. Note that the result is returned as a comma-separated list of decimal byte values rather than as a string of characters. I validated the pre-drop version against test values.
  1. Function RunRC4(sMessage, strKey)
  2.     Dim kLen, x, y, i, j, temp
  3.     Dim s(256), k(256)
  4.  
  5.     'Init keystream
  6.     klen = Len(strKey)
  7.     For i = 0 To 255
  8.         s(i) = i
  9.         k(i) = Asc(Mid(strKey, (i Mod klen) + 1, 1))
  10.     Next
  11.  
  12.     j = 0
  13.     For i = 0 To 255
  14.         j = (j + k(i) + s(i)) Mod 256
  15.         temp = s(i)
  16.         s(i) = s(j)
  17.         s(j) = temp
  18.     Next
  19.  
  20.     x = 0
  21.     y = 0
  22.  
  23.     'Drop n bytes from keystream
  24.     For i = 1 To 3072
  25.         x = (x + 1) Mod 256
  26.         y = (y + s(x)) Mod 256
  27.         temp = s(x)
  28.         s(x) = s(y)
  29.         s(y) = temp
  30.     Next
  31.  
  32.     'Encode/Decode
  33.     For i = 1 To Len(sMessage)
  34.         x = (x + 1) Mod 256
  35.         y = (y + s(x)) Mod 256
  36.         temp = s(x)
  37.         s(x) = s(y)
  38.         s(y) = temp
  39.  
  40.         RunRC4 = RunRC4 & (s((s(x) + s(y)) Mod 256) Xor Asc(Mid(sMessage, i, 1))) & ","
  41.     Next
  42. End Function
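
Because the function returns its result as text such as "191,250,4,245," (with a trailing comma), a little glue code is needed to display or decrypt it. The helpers below are a minimal sketch of that glue. CodesToHex, CodesToString and OutputFormatDemo are hypothetical names that are not part of the original code, and CodesToString assumes a single-byte ANSI code page so that Chr and Asc round-trip values above 127.

```vbscript
' Convert "191,250,4,245," style output to a hex string.
Function CodesToHex(csv)
    Dim parts, i
    CodesToHex = ""
    parts = Split(csv, ",")
    ' The trailing comma leaves an empty last item, so stop one short.
    For i = 0 To UBound(parts) - 1
        CodesToHex = CodesToHex & Right("0" & Hex(CInt(parts(i))), 2)
    Next
End Function

' Convert the same output back to a string of characters so that it
' can be fed to the function again for decryption.
Function CodesToString(csv)
    Dim parts, i
    CodesToString = ""
    parts = Split(csv, ",")
    For i = 0 To UBound(parts) - 1
        CodesToString = CodesToString & Chr(CInt(parts(i)))
    Next
End Function

Sub OutputFormatDemo()
    ' Sample input: the byte values for "Test" with the key "Key" when
    ' no keystream bytes are dropped (BFFA04F5 in hex).
    MsgBox CodesToHex("191,250,4,245,")   ' BFFA04F5
End Sub
```

With these in place, something like CodesToString(RunRC4(CodesToString(encrypted), strKey)) should give back the original message, provided the same key and the same number of dropped bytes are used on both sides.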
** Edit **
Download an example to play with.
Jan 21 '11 #1
33 Comments


mshmyob
Expert 100+
P: 903
Thanks Rabbit. This looks interesting and I look forward to fooling with it.

cheers,
Jan 21 '11 #2

ADezii
Expert 5K+
P: 8,615
@Rabbit - By any chance, did you break the Enigma Code during WWII?(LOL). On the serious side, and out of utter curiosity, why was that Encryption Algorithm almost impossible to break?
Jan 22 '11 #3

mshmyob
Expert 100+
P: 903
@ADezii -
One of the most interesting books I have read is called "The CODE BOOK - The Science of Secrecy from Ancient Egypt to Quantum Cryptography" by Simon Singh.

I highly recommend it.

There is quite a bit on the Enigma machine but basically it used a "scrambler" to encrypt each letter but the thing that made it more difficult was that this "scrambler" would rotate 1/26 (26 letters) of a rotation after each letter was entered. This in essence caused the cipher alphabet to change after each encryption (after each letter was encrypted) and therefore even if you encrypted the letter "S" twice in a row, the letter "S" would never come out encrypted the same. It was therefore hard to find repeating patterns without knowing the original key.

This setup of the scrambler defined 26 cipher alphabets. The flaw was that if a letter was repeated 26 times, the scrambler would return to its original position, thereby creating a pattern for that specific letter.

A subsequent version of the Enigma used 2 scramblers giving 676 cipher alphabets.

An improved version had 3 scramblers giving 17,576 cipher alphabets. A reflector was also implemented (too long to explain its function).

But you can see that without another Enigma machine and trying to decipher by brain power alone would take forever.

cheers,
Jan 22 '11 #4

ADezii
Expert 5K+
P: 8,615
@mshmyob - Thanks for the explanation.
Jan 22 '11 #5

Rabbit
Expert Mod 10K+
P: 12,334
Indeed it was a difficult system to crack but the Polish had it cracked before the start of WW2.
Jan 23 '11 #6

mshmyob
Expert 100+
P: 903
Through their spy network Poland managed to build an Enigma machine. Again through their spy network they discovered that they were the next target of the Nazis and called a secret meeting with England and gave them the Enigma machine so the Nazis wouldn't get a hold of it.

Most people give England credit for duplicating the Enigma machine but it was actually Poland. England for the first few years used brute strength (trial and error, not an easy task in and of itself) to guess the keys to decipher the encrypted German codes but without the Enigma machine itself the keys would have been useless.

As the young kids today would say "History rocks"

cheers,
Jan 23 '11 #7

NeoPa
Expert Mod 15k+
P: 31,299
Rabbit:
Indeed it was a difficult system to crack but the Polish had it cracked before the start of WW2.
My understanding is that two Polish chaps were able to get hold of an early Enigma machine in the early 1930s. The Polish determined the mechanics of it and passed replicas on to both Britain and France prior to the start of WWII. I must admit that my readings on the matter before today indicated that this was as far as the Poles got, and that the mathematical work was mainly done by British mathematicians, but that seems now to be contested by the Poles.

My understanding of the writing of history leads me to believe the claim, but I believe the cracking of subsequent versions of the code (and particularly the Naval version which remained unbroken until very near the end of the war), as well as the techniques developed to enable rebreaking the code on an almost daily basis, were managed by the British Intelligence Services based at Bletchley Park. Interestingly though, the three main Polish mathematicians who were originally responsible for breaking the code, were working at Bletchley throughout much of the war.

Their contribution, as well as the quite incredible decision in 1939 for the Poles to share their current information on the cipher machines themselves, was absolutely critical in enabling so much of the German coded traffic to be broken throughout the war. Without this contribution alone, and with the fourth largest allied force in Europe during the war this was certainly not their only contribution to the victory, it seems clear that Nazi Germany would have been ultimately victorious in the conflict. The more I learn of the Polish contributions, the more impressed I am.
Jan 24 '11 #8

Rabbit
Expert Mod 10K+
P: 12,334
The Polish didn't have an original Enigma, but instead, they recreated one from information they obtained through a German spy. They were also able to obtain a table of daily keys. But largely, it was because of poor operating procedures that allowed the Polish to make the first breakthroughs in cracking the Enigma. The German Navy used much more stringent procedures and it wasn't until Alan Turing developed the Bombe (a machine used to reduce the number of possible "passwords" for a particular ciphertext) that the German Naval Enigma was cracked. At some point during the war, the Germans switched from a 3-rotor Enigma to a 4-rotor Enigma and there was a long period of time when the Allies were unable to decipher the messages. But that too was eventually cracked. Some people say that had the Germans used correct operating procedures, the Allies would never have cracked the Enigma.
Jan 24 '11 #9

NeoPa
Expert Mod 15k+
P: 31,299
Rabbit:
The Polish didn't have an original Enigma, but instead, they recreated one from information they obtained through a German spy.
That doesn't fit with my original recollection, but I must admit that I first heard of this many, many years ago and have probably misremembered the details with the passing of the decades. I expect you're quite right.

I believe the cracking of the naval codes was largely down to the promoting of Admiral Doenitz away from direct command of the U-Boats. He took charge of the whole navy, but the U-Boat fleets, who to that point had maintained procedure very strictly, started to become slacker. To be honest, although I seem to remember that point, and that the actual capture of a naval Enigma Machine from a surface craft also had an enormous effect at some time, I don't remember the details clearly enough to paint a true picture.

There were alternating periods during the war in the Atlantic, where each side took and maintained a semblance of superiority over the other for a while. The reasons for this were many, but some of the times were directly related to Enigma developments of one form or another, both on the part of the Allies and the Axis.
Jan 24 '11 #10

Rabbit
Expert Mod 10K+
P: 12,334
There were commercial versions of the Enigmas before the war started. I believe the Polish had one of the commercial variants but the military Enigma differed from the commercial version.
Jan 24 '11 #11

NeoPa
Expert Mod 15k+
P: 31,299
BTW. The code was interesting too. I'm incorporating it (a variation) into my projects going forward. It's better than my simple 'NOT based' alternative.
Jan 24 '11 #12

Rabbit
Expert Mod 10K+
P: 12,334
Three dozen lines for encryption is simple. I like it for its ease of use and its simple algorithm. You could even save some more code space by not inlining the functions. If you want a more secure algorithm, you should take a look at the AES thread I posted. Not so simple lol, it's 400 lines of code.
Jan 24 '11 #13

NeoPa
Expert Mod 15k+
P: 31,299
I may do that.

In the meantime, I incorporated the (amended) routine into a simple spreadsheet that I've attached. I encapsulated the function call with some code to convert the result to a Hex string after encryption, to avoid any dodgy characters in the sheet, but it's fundamentally just a spreadsheet to illustrate what it does.

If you like, I'll add the attachment to your OP, but only if you're happy with it. Let me know.
Attached Files
File Type: zip RC4.Zip (10.4 KB, 2870 views)
Jan 25 '11 #14

Rabbit
Expert Mod 10K+
P: 12,334
Looks good, thanks!
Jan 25 '11 #15

P: 5
Hi,

I have been looking at your RC4 example above and believe that I have it working. For example to encrypt the string "Test" with the key "Key" (both without quotes) it returns:

9F2C10F8

This will decrypt fine within the tool as well (as will any other data I throw into it) however when I try and validate the work against an independent implementation of RC4 on the internet, such as http://rc4.online-domain-tools.com/ it returns:

tgy

Now, I'll be the first to admit that I know very little about encryption, so I'm assuming that I am missing something very obvious here?

The purpose of this is to provide encryption to CSV files. They are created by Excel and sent to an Oracle DB that will need to decrypt them and load the files, however I don't want to suggest a solution to the Oracle Devs before I understand how I am going to implement it myself ;-)

Thanks
Sep 21 '15 #16

Rabbit
Expert Mod 10K+
P: 12,334
The first output you have is in hexadecimal format. The second output you have there is in ASCII format. However, they're not going to match up anyways because my implementation drops the first 3072 bytes of the key stream for additional security.
Sep 21 '15 #17

P: 5
Sorry, I meant that when I encrypt the string with the VBA it returns the Hex above, but when I put that Hex output through a different decrypting tool it returns the ASCII which is obviously not the original message, anyway...

What you're saying is that because your implementation drops the first 3072 bytes for added security it won't produce the same output as the site I mentioned. What would I need to do to replicate the site's output? I tried simply swapping the 3 instances of 3072 for 0 and 768 (as you mentioned that in the text above) but I'm clearly missing something key as I still do not get the same results.

The reason I'm asking isn't because I actually want to implement a less secure version, but I know that the first thing the Oracle Devs will do is pass test outputs through an online tool to validate the solution and if I can't match up a baseline, then there'll be no chance of implementing your more secure version.

Thanks for your help!
Sep 21 '15 #18

Rabbit
Expert Mod 10K+
P: 12,334
Comment out lines 23-30 (the "Drop n bytes from keystream" loop) in the code block above.

It validates fine with the link you posted. I just checked it.
Sep 21 '15 #19

P: 5
hmmm, thanks for getting back so quick, I'm still however not able to validate it. So that you know I'm using Excel 2013 on Windows 10. I have created a new blank spreadsheet, pasted the code in the article (not from the attached sheet) into the code module for Sheet1 and commented out the lines you mentioned. I have then added a quick sub with one line, Debug.Print RunRC4("test", "Key"). It outputs:

_J€R

This is different to the site (9F FA 04 F5). Are you running the code in Excel as well? I was wondering if this was an instance of VBA badly interpreting characters somewhere along the line. To double check this I pulled the input from the last line to chr() and got:

95, 74, 128, 82 or 5F 4A 80 52
Sep 21 '15 #20

ADezii
Expert 5K+
P: 8,615
For what it is worth, here is a Demo that I use for 2 Versions of RC4. To the best of my knowledge, I based this Demo on Rabbit's Posts regarding this Thread.
Attached Files
File Type: zip RC4.zip (22.2 KB, 177 views)
Sep 22 '15 #21

P: 5
Thanks, I took a look and at the heart of it, your RC4 is (as you say) based on Rabbit's and produces the same results that I see in my version... which as far as I can work out is different to others on the internet???

For the record, I don't think that Rabbit's code is wrong, if there was an error in it, I'm sure it would have been spotted in the nearly 5 years since it was created ;-) I am assuming that I must be misinterpreting the output somewhere along the line... I just can't seem to work out where!

So that you can all see what I'm working with, here is the Excel file (saved as 2003 xls format) that I have been working on to share with the DB guys. (It doesn't have any error handling in it, so for instance if you click to decrypt a file and the file doesn't exist you will get a nasty VBA error... sorry!)

For the record the output that I get for "Test" with the key "Key" (and dropping 0 bytes of the keystream) is 7F4A8052 whereas I believe the correct output is BFFA04F5.

Any help is much appreciated!
Sep 22 '15 #22

Rabbit
Expert Mod 10K+
P: 12,334
I was using a vbs file to test the values. And when I went to compare the code with the code in this thread, I discovered a discrepancy. I must have discovered the flaw in the code a while back but forgot to post an update to this thread. I have updated the code above. Anyone using this code should update their code as well. Sorry for all the confusion.
Sep 22 '15 #23

P: 5
Ah, well at least we have got to the bottom of it!

I have added a new version of the Excel tool that I built around your code.

Thanks very much for all your work on this it will really help me out!!!
Attached Files
File Type: xls Encrypt-Decrypt.xls (74.5 KB, 561 views)
Sep 22 '15 #24

NeoPa
Expert Mod 15k+
P: 31,299
Good work ShippWreck, and delicately handled ;-)

@Rabbit.
Can you give details of the change(s) so that I can update my code and post a replacement attachment?
Sep 23 '15 #25

Rabbit
Expert Mod 10K+
P: 12,334
Glad you got everything working for you and thanks for helping to discover that the code in the thread was outdated!

@NeoPa, references to Mod 255 were changed to Mod 256.
Sep 23 '15 #26

Rabbit
Expert Mod 10K+
P: 12,334
A note on the base RC4 algorithm. It's considered very insecure. It's one of the reasons why WEP is no longer recommended for wifi security. It can be broken with a few minutes of data from scanning the encrypted wifi packets. At the very least, you should drop the first few thousand bytes of the keystream. There are also variations on the RC4 algorithm that may be slightly more secure.

And the next piece of advice is for all encryption algorithms, you should incorporate a "salt" or "initialization vector" into the algorithm. What this is, is a known value that is used to change the key so that multiple encryptions of the same value with the same key result in different encrypted outputs.
Sep 23 '15 #27

NeoPa
Expert Mod 15k+
P: 31,299
Right. I've quickly updated that now so that it should show the correct hex value. If someone could run a quick eye over it to confirm it does what it should then I'll add it to the OP (replacing the original).
Attached Files
File Type: zip RC4.Zip (12.1 KB, 2357 views)
Sep 25 '15 #28

Rabbit
Expert Mod 10K+
P: 12,334
Looks good. Thanks, NeoPa.
Sep 25 '15 #29

NeoPa
Expert Mod 15k+
P: 31,299
Cheers Rabbit. OP updated.
Sep 25 '15 #30

ADezii
Expert 5K+
P: 8,615
What is the specific purpose of the Disp() Function in the example?
Sep 27 '15 #31

NeoPa
Expert Mod 15k+
P: 31,299
It shows the result ADezii.

If you look at the code you'll see it has a second parameter so that you can use it to convert one way or another.

In the worksheet you'll see it used to take whatever is in A1, and with a key from A2, convert it to a value in A3. It is then used to take that value (in A3) and convert it back - again using the same key in A2 - to a value in A4. If all goes to plan (It does.) then A4 is the same as A1, regardless of what that is.

You can change the values of A1, and even the key in A2, but A3 & A4 are formula results using =Disp(...).
Sep 27 '15 #32

ADezii
Expert 5K+
P: 8,615
Thank you for the explanation, apparently just went blank for awhile. Getting old I guess!
Sep 28 '15 #33

NeoPa
Expert Mod 15k+
P: 31,299
I must be too then :-D

I had to update it recently and I was a while trying to work out how the darn thing even used the code.
Sep 29 '15 #34