Writing Boot Sector Code
Introduction
In this article, we discuss how to write our own "hello, world"
program into the boot sector. At the
time of this writing, most such code examples available on the web
were meant for the Netwide Assembler (NASM). Very little material
was available that could be tried with the readily available GNU
tools like the GNU assembler (as) and the GNU linker (ld). This
article is an effort to fill this gap.
Boot Sector
When the computer starts, the processor starts executing
instructions at the memory address 0xffff:0x0000 (CS:IP). This is
an address in the BIOS ROM. The machine instructions at this
address begin the boot sequence. In practice, this memory address
contains a JMP instruction to another address,
typically 0xf000:0xe05b. This latter address contains the code to
perform power-on self test (POST), perform several initialisations,
find the boot device, load the code from the boot sector into
memory, and execute it. From here, the code in the boot sector
takes control. In IBM-compatible PCs, the boot sector is the first
sector of a data storage device. This is 512 bytes in length. The
following table shows what the boot sector contains.
Address (hex) | Address (dec) | Description | Size in bytes |
---|---|---|---|
000 | 0 | Code | 440 |
1b8 | 440 | Optional disk signature | 4 |
1bc | 444 | 0x0000 | 2 |
1be | 446 | Four 16-byte entries for primary partitions | 64 |
1fe | 510 | 0xaa55 | 2 |
This type of boot sector found in IBM-compatible PCs is also known as a master boot record (MBR). The next two sections explain how to write executable code into the boot sector. Two programs are discussed in these two sections: one that merely prints a character and another that prints a string.
The reader is expected to have a working knowledge of x86 assembly language programming using the GNU assembler. The details of assembly language are not discussed here; only how to write code for the boot sector is.
The code examples were verified by using the following tools while writing this article:
- Debian GNU/Linux 4.0 (etch)
- GNU assembler (GNU Binutils for Debian) 2.17
- GNU ld (GNU Binutils for Debian) 2.17
- dd (coreutils) 5.97
- DOSBox 0.65
- QEMU 0.8.2
Print Character
The following code prints the character 'A' in yellow on a blue background:
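.code16
.section .text
.globl _start
_start:
  mov $0xb800, %ax      # segment of colour text mode video memory
  mov %ax, %ds          # DS = 0xb800
  mov $0x1e41, %ax      # AH = attribute 0x1e, AL = 'A' (0x41)
  xor %di, %di          # DI = 0, the first character cell
  mov %ax, (%di)        # write character and attribute to 0xb800:0x0000
idle:
  hlt                   # halt until the next interrupt
  jmp idle              # then halt again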
We save the above code in a file, say a.s, then
assemble and link this code with the following commands:
as -o a.o a.s
ld --oformat binary -o a.com a.o
The above commands should generate a 15-byte output file
named a.com. The .code16 directive in the
source code tells the assembler that this code is meant for 16-bit
mode. The _start label is meant to tell the linker
that this is the entry point in the program.
The video memory of the VGA is mapped to various segments between 0xa000 and 0xc000 in the main memory. The colour text mode is mapped to the segment 0xb800. The first two instructions copy 0xb800 into the data segment register, so that any data offset we specify is an offset in this segment. Then the ASCII code for the character 'A' (i.e., 0x41 or 65) is copied into the first location in this segment and the attribute (0x1e) of this character into the second location. The higher nibble (0x1) is the attribute for the background colour and the lower nibble (0xe) is that of the foreground colour. The highest bit of each nibble is the intensifier bit. Depending on the video mode setup, the highest bit of the attribute byte may instead represent a blinking character. The other three bits represent red, green, and blue. This is shown in tabular form below.
Attribute nibble | I | R | G | B | Value |
---|---|---|---|---|---|
Background | 0 | 0 | 0 | 1 | 0x1 |
Foreground | 1 | 1 | 1 | 0 | 0xe |
We can see from the table that the background colour is dark blue
and the foreground colour is bright yellow. We assemble and link
the code with the as and ld commands
mentioned earlier and generate an executable binary consisting of
machine code.
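The way the two attribute nibbles and the character code combine into the 16-bit value 0x1e41 can be sketched with shell arithmetic. The function name vga_cell is hypothetical and exists only for this illustration; it is not part of any tool used in this article.

```shell
# Compose the 16-bit value for one text-mode cell.
# $1 = background nibble, $2 = foreground nibble, $3 = character code.
# vga_cell is a made-up helper name used only for this illustration.
vga_cell() {
    printf '0x%04x\n' $(( (($1 & 0xf) << 12) | (($2 & 0xf) << 8) | ($3 & 0xff) ))
}

vga_cell 0x1 0xe 0x41   # blue background, yellow foreground, 'A'
```

This prints 0x1e41, the immediate operand loaded into AX. Since x86 is little-endian, the low byte (0x41) lands at offset 0 of the segment and the high byte (0x1e) at offset 1.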
Before writing the executable binary into the boot sector, we might
want to verify whether the code works correctly with an emulator.
DOSBox is a pretty good emulator for this purpose. It is available
as the dosbox package in Debian. Here is one way to
run the executable binary file using DOSBox:
dosbox -c cls a.com
The letter A printed in yellow on a blue background
should appear in the first column of the first row of the screen.
In the ld command used earlier to generate the executable
binary, we used the extension name com for the binary
file to make DOSBox believe that it is a DOS COM file, i.e., merely
machine code and data with no headers. In fact, the --oformat binary
option in the ld command ensures that the
output file contains only machine code. This is why we are able to
run the binary with DOSBox for verification. If we do not use
DOSBox, any extension name or no extension name for the binary would
suffice.
Once we are satisfied with the output of a.com running
in DOSBox, we create a boot image file with these commands:
cp a.com a.img
echo 55 aa | xxd -r -p | dd seek=510 bs=1 of=a.img
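Before booting an image, it can be worth checking that it has the shape described in the boot sector table: 512 bytes in total, ending with the bytes 55 aa. The following is a minimal sketch using only wc, od, and tr; the function name check_boot_image is hypothetical.

```shell
# Return success only if the file is exactly 512 bytes long and
# holds the bytes 55 aa at offsets 510 and 511.
# check_boot_image is a hypothetical helper name, not an existing tool.
check_boot_image() {
    size=$(wc -c < "$1" | tr -d ' ')
    sig=$(od -An -tx1 -j510 -N2 "$1" 2>/dev/null | tr -d ' \n')
    [ "$size" -eq 512 ] && [ "$sig" = "55aa" ]
}
```

Calling check_boot_image with the path of an image file succeeds only when both conditions hold, so it can be chained with && in front of a command that boots the image.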
This boot image can be tested with DOSBox using the following command:
dosbox -c cls -c 'boot a.img'
Yet another way to test this image would be to make QEMU x86 system emulator boot using this image. Here is the command to do so:
qemu-system-i386 -fda a.img
Finally, if you are feeling brave enough, you could write this image
to the boot sector of an actual physical storage device, such as a
USB flash drive, and then boot your computer with it. To do so, you
first need to determine the device file that represents the storage
device. There are many ways to do this. A couple of commands that
may be helpful to locate the storage device are mount
and fdisk -l. Assuming that there is a USB flash drive
at /dev/sdx, the boot image can be written to its boot
sector using this command:
cp a.img /dev/sdx
CAUTION: You need to be absolutely sure of the device path of the
device being written to. The device path /dev/sdx is
only an example here. If the boot image is written to the wrong
device, access to the data on that device would be lost.
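A small guard can catch some mistakes before anything is written, for instance a mistyped path that is not a block device at all. The sketch below is only an illustration under that assumption; check_target is a hypothetical helper name, it never writes anything itself, and it cannot tell whether the block device is the intended one, which still has to be verified by hand.

```shell
# Succeed only if $1 is a 512-byte regular file and $2 is a block device.
# check_target is a hypothetical helper; it performs checks only.
check_target() {
    [ -f "$1" ] || { echo "not a regular file: $1" >&2; return 1; }
    [ "$(wc -c < "$1" | tr -d ' ')" -eq 512 ] || { echo "image is not 512 bytes: $1" >&2; return 1; }
    [ -b "$2" ] || { echo "not a block device: $2" >&2; return 1; }
}
```

The write command can then be chained after the check with &&, so that it runs only when the check succeeds.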
Now booting the computer with this device should display the letter 'A' in yellow on a blue background.
Print String
The following code prints the string "hello, world" in yellow on a blue background:
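.code16
.section .text
.globl _start
_start:
  ljmp $0, $start         # far jump to set CS to 0
start:
  mov $0xb800, %ax        # segment of colour text mode video memory
  mov %ax, %ds            # DS = 0xb800
  xor %di, %di            # DI = offset of the current cell in video memory
  mov $message, %si       # SI = offset of the string in the code segment
  mov $0x1e, %ah          # AH = attribute: yellow on blue
print:
  mov %cs:(%si), %al      # AL = next character of the string
  mov %ax, (%di)          # write character and attribute
  inc %si                 # advance to the next character
  inc %di                 # advance DI by two bytes,
  inc %di                 #   i.e., one character cell
  cmp $24, %di            # 12 characters * 2 bytes written?
  jne print
idle:
  hlt
  jmp idle
.section .data
message:
  .ascii "hello, world"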
The BIOS reads the code from the first sector of the boot device into the memory at physical address 0x7c00 and jumps to that address. While most BIOS implementations jump to 0x0000:0x7c00 (CS:IP) to execute the boot sector code loaded at this address, unfortunately there are some BIOS implementations that jump to 0x07c0:0x0000 instead to reach this address. We will soon see that we are going to use offsets relative to the code segment to locate our string and copy it to video memory. While the physical address of the string is always going to be the same regardless of which of the two types of BIOS implementations run our program, the offset of the string is going to differ based on the BIOS implementation. If the register CS is set to 0 and the register IP is set to 0x7c00 when the BIOS jumps to our program, the offset of the string is going to be greater than 0x7c00. But if CS and IP are set to 0x07c0 and 0, respectively, when the BIOS jumps to our program, the offset of the string is going to be much smaller.
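That both CS:IP pairs refer to the same physical address follows from real-mode address arithmetic, where the physical address is the segment multiplied by 16 plus the offset. A minimal sketch of this calculation, using a hypothetical helper named phys_addr:

```shell
# Real-mode physical address = segment * 16 + offset.
# phys_addr is a hypothetical helper name used only for illustration.
phys_addr() {
    printf '0x%05x\n' $(( ($1 << 4) + $2 ))
}

phys_addr 0x0000 0x7c00
phys_addr 0x07c0 0x0000
```

Both calls print 0x07c00. The physical address is the same, but an offset such as the address of our string is interpreted relative to a different segment base in each case.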
We cannot know in advance which type of BIOS implementation is going
to load our program into memory, so we need to prepare our program
to handle both scenarios: one in which the BIOS executes our program
by jumping to 0x0000:0x7c00 as well as the other in which the BIOS
jumps to 0x07c0:0x0000 to execute our program. We do this by using
a very popular technique of setting the register CS to 0 ourselves
by executing a far jump instruction to the code segment 0. The very
first instruction in this program, ljmp $0, $start,
accomplishes this.
There are two sections in this code. The text section has the
executable instructions. The data section has the string we want to
print. The code copies the first byte of the string to the memory
location 0xb800:0x0000, its attribute to 0xb800:0x0001, the second
byte of the string to 0xb800:0x0002, its attribute to 0xb800:0x0003
and so on until it has advanced to 0xb800:0x0018 after having
written 24 bytes for the 12 characters we need to print. The
instruction mov %cs:(%si), %al copies one character
from the string indexed by the SI register in the code segment into
the AL register. We are reading the characters from the code
segment because we will place the string in the code segment using
the linker commands discussed later.
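The value 24 used in the comparison comes from the length of the string: every character takes two bytes of video memory. As a quick check of this arithmetic, here is a sketch with a hypothetical helper named cell_bytes:

```shell
# Each character occupies two bytes in video memory: the character
# code followed by its attribute. cell_bytes is a hypothetical helper.
cell_bytes() {
    printf '%d\n' $(( ${#1} * 2 ))
}

cell_bytes "hello, world"
```

This prints 24 (0x18), which is the value of DI at which the loop stops. A string of a different length would need a correspondingly different operand in the cmp instruction.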
However, while testing with DOSBox, things are a little different.
In DOS, the text section is loaded at an offset 0x0100 in the code
segment. This should be specified to the linker while linking so
that it can correctly resolve the value of the label named message.
Therefore, we will assemble and link our
program twice: once for testing it with DOSBox and once again for
creating the boot image.
To understand the offset at which the data section can be put, it is worth looking at what the binary code looks like after a trial link with the following commands:
as -o hello.o hello.s
ld --oformat binary -Ttext 0 -Tdata 40 -o hello.com hello.o
objdump -bbinary -mi8086 -D hello.com
xxd -g1 hello.com
The -Ttext 0 option tells the linker to assume that the
text section should be loaded at offset 0x0 in the code segment.
Similarly, the -Tdata 40 option tells the linker to assume
that the data section is at offset 0x40.
The objdump command mentioned above disassembles the
generated binary file. This shows where the text section and data
section are placed.
$ objdump -bbinary -mi8086 -D hello.com
hello.com:     file format binary
Disassembly of section .data:
00000000 <.data>:
   0:   ea 05 00 00 00    ljmp   $0x0,$0x5
   5:   b8 00 b8          mov    $0xb800,%ax
   8:   8e d8             mov    %ax,%ds
   a:   31 ff             xor    %di,%di
   c:   be 40 00          mov    $0x40,%si
   f:   b4 1e             mov    $0x1e,%ah
  11:   2e 8a 04          mov    %cs:(%si),%al
  14:   89 05             mov    %ax,(%di)
  16:   46                inc    %si
  17:   47                inc    %di
  18:   47                inc    %di
  19:   83 ff 18          cmp    $0x18,%di
  1c:   75 f3             jne    0x11
  1e:   f4                hlt
  1f:   eb fd             jmp    0x1e
        ...
  3d:   00 00             add    %al,(%bx,%si)
  3f:   00 68 65          add    %ch,0x65(%bx,%si)
  42:   6c                insb   (%dx),%es:(%di)
  43:   6c                insb   (%dx),%es:(%di)
  44:   6f                outsw  %ds:(%si),(%dx)
  45:   2c 20             sub    $0x20,%al
  47:   77 6f             ja     0xb8
  49:   72 6c             jb     0xb7
  4b:   64                fs
Note that the ... above indicates zero bytes skipped
by objdump. The text section is above these zero bytes
and the data section is below them. Let us also see the output of
the xxd command:
$ xxd -g1 hello.com
00000000: ea 05 00 00 00 b8 00 b8 8e d8 31 ff be 40 00 b4  ..........1..@..
00000010: 1e 2e 8a 04 89 05 46 47 47 83 ff 18 75 f3 f4 eb  ......FGG...u...
00000020: fd 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000040: 68 65 6c 6c 6f 2c 20 77 6f 72 6c 64              hello, world
Both outputs above show that the text section occupies the first 0x21 bytes (33 bytes). The data section is 0xc bytes (12 bytes) in length. Let us create a binary where the region from offset 0x0 to offset 0x20 contains the text section and the region from offset 0x21 to offset 0x2c contains the data section. The total length of the binary would then be 0x2d bytes (45 bytes). We will create a new binary as per this plan.
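The arithmetic behind this plan is simply the load offset plus the size of the text section. The sketch below prints the corresponding linker options for a few load offsets; section_layout is a hypothetical helper name, and the offsets 0x100 and 0x7c00 are the DOS and BIOS load offsets mentioned in this article.

```shell
# Print ld options that place the data section right after a text
# section of the given size. $1 = load offset, $2 = text size in bytes.
# section_layout is a hypothetical helper name; the values are printed
# in hexadecimal without a 0x prefix, as passed to -Ttext and -Tdata here.
section_layout() {
    base=$(( $1 ))
    data=$(( base + $2 ))
    printf '%s %x %s %x\n' '-Ttext' "$base" '-Tdata' "$data"
}

section_layout 0x0 0x21
section_layout 0x100 0x21
section_layout 0x7c00 0x21
```

The three calls print "-Ttext 0 -Tdata 21", "-Ttext 100 -Tdata 121", and "-Ttext 7c00 -Tdata 7c21", respectively.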
However, while creating the new binary, we should remember that DOS
would load the binary at offset 0x100, so we need to tell the linker
to assume 0x100 as the offset of the text section and 0x121 as the
offset of the data section, so that it resolves the value of the
label named message accordingly. Moreover, while testing with DOS, we
must remove the far jump instruction at the top of our program
because DOS does not load our program at physical address 0x7c00 of
the memory. Removing this 5-byte instruction makes the text section
shorter, so a few zero bytes of padding remain before offset 0x121,
which is harmless. We create a new binary in this manner with these
commands:
grep -v ljmp hello.s > dos-hello.s
as -o hello.o dos-hello.s
ld --oformat binary -Ttext 100 -Tdata 121 -o hello.com hello.o
Now we can test this program with DOSBox with the following command:
dosbox -c cls hello.com
If everything looks fine, we assemble and link our program once again for the boot sector and create a boot image with these commands:
as -o hello.o hello.s
ld --oformat binary -Ttext 7c00 -Tdata 7c21 -o hello.img hello.o
echo 55 aa | xxd -r -p | dd seek=510 bs=1 of=hello.img
Now we can test this image with DOSBox like this:
dosbox -c cls -c 'boot hello.img'
We can also test the image with QEMU with the following command:
qemu-system-i386 -fda hello.img
Finally, this image can be written to the boot sector as follows:
cp hello.img /dev/sdx
CAUTION: Again, one needs to be very careful with the commands
here. The device path /dev/sdx is only an example.
This path must be changed to the path of the actual device one
wants to write the boot sector binary to.
Once written to the device successfully, the computer may be booted with this device to display the "hello, world" string on the screen.