Programming With DOS Debugger
Introduction
MS-DOS as well as Windows 98 come with a debugger program
named DEBUG.EXE
that can be used to work with assembly
language instructions and machine code. In MS-DOS version 6.22, this
program is named DEBUG.EXE
and it is typically present
at C:\DOS\DEBUG.EXE
. On Windows 98, this program is
usually present at C:\Windows\Command\Debug.exe
. It is
a line-oriented debugger that supports various useful features to
work with and debug binary executable programs consisting of machine
code.
In this post, we see how we can use this debugger program to assemble a few minimal programs that print some characters to standard output. We first create a 7-byte program that prints a single character. Then we create a 23-byte program that prints the "hello, world" string. All the steps provided in this post work well with Windows 98 too.
Contents
Print Character
Let us first see how to create a tiny 7-byte program that prints the
character A
to standard output. The
following DEBUG.EXE
session shows how we do it.
C:\>DEBUG -A 1165:0100 MOV AH, 2 1165:0102 MOV DL, 41 1165:0104 INT 21 1165:0106 RET 1165:0107 -G A Program terminated normally -N A.COM -R CX CX 0000 :7 -W Writing 00007 bytes -Q C:\>
Now we can execute this program as follows:
C:\>A A C:\>
The debugger command A
creates machine executable code
from assembly language instructions. The machine code created is
written to the main memory at address CS:0100 by default. The first
three instructions generate the software interrupt 0x21 (decimal 33)
with AH set to 2 and DL set to 0x41 (decimal 65) which happens to be
the ASCII code of the character A
. Interrupt 0x21
offers a wide variety of DOS services. Setting AH to 2 tells this
interrupt to invoke the function that prints a single character to
standard output. This function expects DL to be set to the ASCII
code of the character we want to print.
The command G
executes the program in memory from the
current location. The current location is defined by the current
value of CS:IP which is CS:0100 by default. We use this command to
confirm that the program runs as expected.
Next we prepare to write the machine code to a binary executable
file. The command N
is used to specify the name of the
file. The command W
is used to write the machine code
to the file. This command expects the registers BX and CX to contain
the number of bytes to be written to the file. When the DOS debugger
starts, BX is already initialised to 0, so we only set the register
CX to 7 with the R CX
command. Finally, we use the
command Q
to quit the debugger and return to MS-DOS.
Hello, World
The following DEBUG.EXE
session shows how to create a
program that prints a string.
C:\>DEBUG -A 1165:0100 MOV AH, 9 1165:0102 MOV DX, 108 1165:0105 INT 21 1165:0107 RET 1165:0108 DB 'hello, world', D, A, '$' 1165:0117 -G hello, world Program terminated normally -N HELLO.COM -R CX CX 0000 :17 -W Writing 00017 bytes -Q C:\>
Now we can execute this 23-byte program like this:
C:\>HELLO hello, world C:\>
In the program above we use the pseudo-instruction DB
to define the bytes of the string we want to print. We add the
trailing bytes 0xD and 0xA to print the carriage return (CR) and the
line feed (LF) characters so that the string is terminated with a
newline. Finally, the string is terminated with the byte for dollar
sign ('$'
) because the software interrupt we generate
next expects the string to be terminated with this symbol's byte
value.
We use the software interrupt 0x21 again. However, this time we set
AH to 9 to invoke the function that prints a string. This function
expects DS:DX to point to the address of a string terminated with
the byte value of '$'
. The register DS
has
the same value as that of CS
, so we only
set DX
to the offset at which the string begins.
Debugger Scripting
We have already seen above how to assemble a "hello, world" program in the previous section. We started the debugger program, typed some commands, and typed assembly language instructions to create our program. It is also possible to prepare a separate input file with all the debugger commands and assembly language instructions in it. We then feed this file to the debugger program. This can be useful while writing more complex programs where we cannot afford to lose our assembly language source code if we inadvertently crash the debugger by executing an illegal instruction.
To create a separate input file that can be fed to the debugger, we
may use the DOS command EDIT HELLO.TXT
to open a new
file with MS-DOS Editor, then type in the following debugger
commands, and then save and exit the editor.
A
MOV AH, 9
MOV DX, 108
INT 21
RET
DB 'hello, world', D, A, '$'
N HELLO.COM
R CX
17
W
Q
This is almost the same as the inputs we typed into the debugger in
the previous section. The only difference from the previous section
is that we omit the G
command here because we don't
really need to run the program while assembling it, although we
could do so if we really wanted to.
Then we can run the DOS command DEBUG < HELLO.TXT
to
assemble the program and create the binary executable file. Here is
a DOS session example that shows what the output of this command
looks like:
C:\>DEBUG < HELLO.TXT -A 1165:0100 MOV AH, 9 1165:0102 MOV DX, 108 1165:0105 INT 21 1165:0107 RET 1165:0108 DB 'hello, world', D, A, '$' 1165:0117 -N HELLO.COM -R CX CX 0000 :17 -W Writing 00017 bytes -Q C:\>
The output is in fact very similar to the debugger session in the previous section.
Disassembly
Now that we have seen how to assemble simple programs into binary executable files using the debugger, we will now briefly see how to disassemble the binary executable files. This could be useful when we want to debug an existing program.
C:\>DEBUG A.COM -U 100 106 117C:0100 B402 MOV AH,02 117C:0102 B241 MOV DL,41 117C:0104 CD21 INT 21 117C:0106 C3 RET
The debugger command U
(unassemble) is used to
translate the binary machine code to assembly language mnemonics.
C:\>DEBUG HELLO.COM -U 100 116 117C:0100 B409 MOV AH,09 117C:0102 BA0801 MOV DX,0108 117C:0105 CD21 INT 21 117C:0107 C3 RET 117C:0108 68 DB 68 117C:0109 65 DB 65 117C:010A 6C DB 6C 117C:010B 6C DB 6C 117C:010C 6F DB 6F 117C:010D 2C20 SUB AL,20 117C:010F 776F JA 0180 117C:0111 726C JB 017F 117C:0113 64 DB 64 117C:0114 0D0A24 OR AX,240A -D 100 116 117C:0100 B4 09 BA 08 01 CD 21 C3-68 65 6C 6C 6F 2C 20 77 ......!.hello, w 117C:0110 6F 72 6C 64 0D 0A 24 orld..$
INT 20 vs RET
Another way to terminate a .COM program is to simply use the
instruction INT 20
. This consumes two bytes in the
machine code: CD 20
. While producing the smallest
possible executables was not really the goal of this post, the code
examples above indulge in a little bit of size reduction by using
the RET
instruction to terminate the program. This
consumes only one byte: C3
. This works because when a
.COM file starts, the register SP contains FFFE. The stack memory
locations at offset FFFE and FFFF contain 00 and 00, respectively.
Further, the memory address offset 0000 contains the
instruction INT 20
. Here is a demonstration of these
facts using the debugger program:
C:\>DEBUG HELLO.COM -R SP SP FFFE : -D FFFE 117C:FFF0 00 00 -U 0 1 117C:0000 CD20 INT 20
As a result, executing the RET
instruction pops 0000
off the stack at FFFE and loads it into IP. This results in the
instruction INT 20
at offset 0000 getting executed
which leads to program termination.
While both INT 20
and RET
lead to
successful program termination both in DOS as well as while
debugging with DEBUG.EXE
, there is some difference
between them which affects the debugging experience. Terminating the
program with INT 20
allows us to run the program
repeatedly within the debugger by repeated applications of
the G
debugger command. But when we terminate the
program with RET
, we cannot run the program repeatedly
in this manner. The program runs and terminates successfully the
first time we run it in the debugger but the stack does not get
reinitialised with zeros to prepare it for another execution of the
program within the debugger. Therefore when we try to run the
program the second time using the G
command, the
program does not terminate successfully. It hangs instead. It is
possible to work around this by reinitialising the stack with the
debugger command E FFFE 0 0
before
running G
again.
Conclusion
Although the DOS debugger is very limited in features in comparison with sophisticated assemblers like NASM, MASM, etc., this humble program can perform some of the basic operations involved in working with assembly language and machine code. It can read and write binary executable files, examine memory, execute machine instructions in memory, modify registers, edit binary files, etc. The fact that this debugger program is always available with MS-DOS or Windows 98 system means that these systems are ready for some rudimentary assembly language programming without requiring any additional tools.