Not long ago I came across a piece of code that sparked my interest. I’ve always been interested in code obfuscation, and this one is a really nice example of it.
For those who don’t know, an obfuscated program is a program that is made with the intention of it being unreadable or incomprehensible.
These programs often feature some incredibly convoluted logic and exotic syntax which the anyone reading it is unlikely to know.
Here is the program:
#include <stdio.h>
main()
{
int a,b,c;
int count = 1;
for (b=c=10;a="- FIGURE?, UMKC,XYZHello Folks,\
TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!" [b+++21]; )
for(; a-- > 64 ; )
putchar ( ++c=='Z' ? c = c/ 9:33^b&1);
return 0;
}
What does this code do? Why, it prints an ascii map of India!
Of course!
!!!!!!
!!!!!!!!!!
!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!
!!!!!!!!!!!!
!!!!!!!!!!!!
!!!!!!!!!!!!
!!!!!!!!
!!!!!!!!!!
!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!! !!!!!
!!!!!!!!!!!!!!!!!!! !!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!! ! !!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! !!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! !!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!
!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !
!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!
!!!!!!!!!!!!
!!!!!!!!!!!!
!!!!!!!!!!!!
!!!!!!!!
!!!!!!
!!!!
The first time I saw it I was amazed; how could something like that be contained in that piece of code?
So I set out to understand its workings and, why not, modify it to make it print what I want!
So here we go:
Code readability
The first thing we need to do is to simplify the weird syntax in the code.
It is obvious that the image is encoded in that long string of gibberish which is being read in some way, so let’s put it in a variable of its own.
Let’s also add some indentation to those loops.
int a,b,c;
int count = 1;
const char *str="- FIGURE?, UMKC,XYZHello Folks,\
TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!";
for(b=c=10;a=str[b+++21]; ){
for(; a-- > 64 ; ){
putchar( ++c=='Z' ? c = c/ 9:33^b&1);
}
}
Let’s take a look at it; the variable count is declared and then never used again, maybe to confound the reader, so we can erase it entirely. The variables b and c are initialized to 10 in the loop, and a is assigned some characters from the string.
In the loop, we have an assignment in the condition statement. This might seem weird, but one should keep in mind that an assignment used as an expression evaluates to the value of its right side.
In this case, the condition is equivalent to saying “until a is 0”, that is, until the end of the string.
The counter b is incremented each time the loop executes. Note that it is post-incremented, so its value is incremented AFTER it is evaluated in the expression.
Since 21 is added to b, the index of the string is initially 31 and is incremented by 1 on every iteration.
Thus, the entire first line of the string is not read.
We can rewrite this code like this.
int a, b=0, c=10;
const char *str=
"TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!";
do{
a=str[b];
b++;
[Loop contents...]
} while(a>0);
Now let’s take a look at the inner loop.
The putchar command gets executed a-64 times. If a<64, it doesn’t execute. The loop can be rewritten this way.
int len=a-64;
while(len>0){
len--;
putchar( ++c=='Z' ? c = c/ 9:33^b&1);
}
Now let’s break down the putchar.
Its argument is a ternary if statement. Here, we see that c is incremented each time the putchar function is called.
When c reaches 90 (ascii for ‘Z’), it is divided by 9, which is equivalent to the assignment c=10.
This assignment statement here evaluates to 10, which is the ascii code for ‘\n’.
The second expression is a bitwise expression: it is evaluated like 33^(b & 1), where ^ means XOR and & means AND.
b & 1 equals 0 if b is even and 1 if it is odd.
This result is XORed with the number 33, so the final result is 33 if b is even or 32 if b is odd.
In ascii the codes 32 and 33 correspond to the whitespace and ‘!’ respectively.
So, the final code is like this:
int a,b=0,c=10;
const char *str=
"TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!";
do{
a=str[b]; //get char from string
b++;
int len=a-64; //times the loop is executed
while(len>0){ //print as many corresponding chars
len--;
c++;
char ch;
if(c==90){ //each 80 chars: ignore the character that should be
c=10; //printed and print a \n instead
ch='\n';
}
else{
if(b%2==1) //if b is odd, print spaces
ch=' ';
else
ch='!'; //if b is even, print !
}
putchar(ch);
}
} while(a>0); //Until \0, the end of the string
Now that the code is somewhat more readable, let’s see the logic behind it.
The algorithm
This program uses a run-length encoding of the image stored in the string.
In a run-length encoding, a sequence of equal bytes can be represented simply by its length and the value.
For example, we have the string aaaaabbbaaaaaabbbb. Its run-length encoding would be 5a3b6a4b, as it is composed of a sequence of 5 a, 3 b and so on…
Here we have an image where sequences of only two characters alternate themselves: sequences of whitespaces and sequences of !.
Thus, only the length of each sequence can be indicated, while the value can be simply changed back and forth between ‘ ‘ and ‘!’ and left implicit.
Each character of the string encodes one of these sequences.
For example, the first char is ‘T’, which in ascii is 84. The program subtracts 64 and prints the resulting number of characters. Since b at this point is 1, it will print 20 whitespaces. Those are indeed the first characters of the image, followed by a series of 6 exclamation marks (because F=64+6).
So that’s it.
This program isn’t really that complicated, but is written in a way that confuses the reader.
What’s next? Making it print what we want, obviously!
Having some fun!
So, now for the fun part; I’ve readied a file with this charming ascii art of Groucho Marx. All we need to do is codify it in such a way that the original program can read it, and substitute the strings!
!!!!!!! !!!!!!!!!!!
!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!! !!! !!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!! !!!!!!!!!!!!!!!!!
!!!!!!!!! !!!!!!!!!!!!!!!!!!
!!!!!!!! !!!!!!!!!!!!!!!!!
!!!!!! !!!!!!!!!!!!!!!!
!!! !!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!
!!!!!!!!! !!!!!!!!!!!!!!!!!!
!!!!! !!!!!!!!!!!! !!!!!!!!!!!!!!!!!!
!!!!!!!!!!!! !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!
!! !!!!! !!!!! !!!!!!!!! !!!!!!!!!!!!!!!
!!! !!!!! !! !! !!!!!!! !!!!!!!!!!!!!! !!!!
! !!!!!!!!! !! !!!!!!!!!!! !!!!!!!!!!!!! ! !!
! ! !!! ! ! ! ! !!!!!!!!!!! !! !!
! !!! !!! ! ! ! !!!!!!!!! !!!! !
!! ! !! ! ! !!! !!!!!!! !!!!!!!!!
!! !! !! !!!!!! !!!!! !!!!!!!!!
!! !! !!! !! !! !! !!!!!!!!!!!!
!!! !! !!! !! ! !! ! !!!!!!!!
!!!!!!! !!!!!!! ! !! ! !!!!!!!!
! !! !! !!!!!!!!!!!
! ! !!!! !! !!!!!!!! !
!! !!!!!! ! !!!!!!!!!! !!
!! !!! !!!!!!!! ! !!!!!! !
!! !!!!!!!!!!!!!!!! !! !!!!!!! !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! !!!! ! !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! ! ! !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ! ! !! !
!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!! ! ! !! !
!!!!!!!!!!! !!!!!!!!!!!! !! ! ! !!! !!
!!!!!! !!!! !! !
!!!!!! !! !! !! !!
!!!!!! !! !! !
!!!! !! !! !!
! ! !!! !
! !! !!
!!!!!!! !!
! !!
!! !!
!!!!!!! !! !!
!! !
!! !!
!! !
!!! !!!!
!!! !!!!!!
! !!!
! !!
!! !!
!! !!
! !!
! !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Doing it by hand would be rather lengthy, so here is a simple C++ program that does the job.
#include <iostream>
#include <fstream>
#include <sstream>
std::ifstream fin("input.txt");
std::ofstream fout("output.txt");
std::stringstream buffer;
std::string img, enc="";
int main(){
buffer<<fin.rdbuf();
img=buffer.str();
int len=1;
for(int i=1;i<img.length();i++){
if(img[i]!=img[i-1]){
//Replace \ with \\ so we don't have to do it by hand
if(len+64=='\\')
enc+="\\\\";
else
enc+=len+64;
len=1;
}
else len++;
}
fout<<enc;
}
The file “input.txt” is read and put into the string img. Then, we check each character. If it is equal to the one before it, we increment the length of the current sequence. If it isn’t, we write this length+64 into the output string and reset the counter to 1.
Note that the code will recognize the newline characters at the end of each line as a new sequence of length 1. So in theory this could mess things up if it is between characters that are different, but this isn’t our case and there’s no point in fixing it as it’s a one-time program.
I’ll leave this bugfix as an exercise for the reader :-).
Running the program gives us the string
ZGIKZAQOFRWAMSCVVAKTCZSAGVB^RAGTC`QAFRFbOAFNMCCYOAFK_QNAGI`RMAGHcQLAHFePLAJCgQJAtRIAcIHRIAPELLGRIAKLHPERIAGPGSCRIAFQGSDQIAGBFELEAIGOJAFCFEBBGBEGJNBDDAFAGIGBEKGMCACBBAFAFABCCAGAEAIAIKCBDBAAFAECECGAFAHAJICDDAAAFBFADBIAGAECKGCIBAGBUBKBEFDEBICAHBHBKCHBEBDBFLDAICEBNCEBFAFBEABHEAKGQGGAGBDAAHFAQA_BGBCKEAQAQANDEBBHBAEAQBOFMAFJCBDARBGCCHLAIFEADASBEPKBHGDBCAN]KBGDBAECAAN]LBHACAGAAAN]MAHABBGAAAN\\GFAAHAABHAAAOKCLGBDAAAHCHBAAlFDDHBHABA\\FIBDBFBPBBAZFJBFBWACA]DHBHBUBCA`AHABC[ADAiADBYBDAiGXBEAjA\\BFAiB\\BFALGUB\\BGAgB]AHAfB]BHAeB^AIAcC\\DIAaCYFLAaAYCQAaAXBSAaBVBTAbBTBUAcASBVAcASAWA_cMA_cMA_cMA_cMA_cN
This string looks a little bit different from the original one, but keep in mind that different encoding strings can lead to the same image, so the original is probably generated in some slightly different way and hand-tuned to make it look nice (apart from the fact that it encodes a different image…).
We’re not interested in making it look nice though.
Our final program is this
#include <stdio.h>
main()
{
int a,b,c;
int count = 1;
for (b=c=10;a="- FIGURE?, UMKC,XYZHello Folks,\
ZGIKZAQOFRWAMSCVVAKTCZSAGVB^RAGT\
C`QAFRFbOAFNMCCYOAFK_QNAGI`RMAG\
HcQLAHFePLAJCgQJAtRIAcIHRIAPELL\
GRIAKLHPERIAGPGSCRIAFQGSDQIAGBF\
ELEAIGOJAFCFEBBGBEGJNBDDAFAGIGB\
EKGMCACBBAFAFABCCAGAEAIAIKCBDBA\
AFAECECGAFAHAJICDDAAAFBFADBIAGA\
ECKGCIBAGBUBKBEFDEBICAHBHBKCHBE\
BDBFLDAICEBNCEBFAFBEABHEAKGQGGA\
GBDAAHFAQA_BGBCKEAQAQANDEBBHBAE\
AQBOFMAFJCBDARBGCCHLAIFEADASBEP\
KBHGDBCAN]KBGDBAECAAN]LBHACAGAA\
AN]MAHABBGAAAN\\GFAAHAABHAAAOKC\
LGBDAAAHCHBAAlFDDHBHABA\\FIBDBF\
BPBBAZFJBFBWACA]DHBHBUBCA`AHABC\
[ADAiADBYBDAiGXBEAjA\\BFAiB\\BF\
ALGUB\\BGAgB]AHAfB]BHAeB^AIAcC\\\
DIAaCYFLAaAYCQAaAXBSAaBVBTAbBTB\
UAcASBVAcASAWA_cMA_cMA_cMA_cMA_cN" [b+++21]; )
for(; a-- > 64 ; )
putchar ( ++c=='Z' ? c = c/ 9:33^b&1);
return 0;
Which when executed shows our chosen string in all of its glory.
I hope you enjoyed this post and that I got someone interested in obfuscated programming! All the files are zipped and available for download here: http://www.mediafire.com/download/2k5j7l291zp242g/india.zip