Writing Boot Sector Code
Introduction
In this article, we discuss how to write our own "hello, world" program into the boot sector. At the
time of this writing, most such code examples available on the web
were meant for the Netwide Assembler (NASM). Very little material
was available that could be tried with the readily available GNU
tools like the GNU assembler (as) and the GNU linker (ld). This
article is an effort to fill this gap.
Boot Sector
When the computer starts, the processor begins executing instructions at the reset vector, which on IBM-compatible PCs is traditionally described as physical address 0xffff0. This is usually a location in the BIOS ROM, so the processor executes the BIOS code first. The BIOS runs several checks, including the POST (power-on self-test), and then finds the boot device. It loads the code from the boot sector of that device into memory and executes it. From here on, the code in the boot sector is in control. In IBM-compatible PCs, the boot sector is the first sector of a data storage device and is 512 bytes long. The following table shows what the boot sector contains.
Address (hex) | Address (dec) | Description | Size in bytes
---|---|---|---
000 | 0 | Code | 440
1b8 | 440 | Optional disk signature | 4
1bc | 444 | 0x0000 | 2
1be | 446 | Four 16-byte entries for primary partitions | 64
1fe | 510 | 0xaa55 | 2
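Note that the last row gives the signature as the 16-bit value 0xaa55. The x86 stores multi-byte values with the least significant byte first, so on disk the byte at offset 510 is 0x55 and the byte at offset 511 is 0xaa. A small sketch in shell arithmetic shows the split; the variable names sig, low and high are chosen only for illustration:

```shell
sig=$((0xaa55))
low=$((sig & 0xff))    # least significant byte, stored first
high=$((sig >> 8))     # most significant byte, stored second
printf 'offset 510: 0x%02x\n' "$low"    # prints offset 510: 0x55
printf 'offset 511: 0x%02x\n' "$high"   # prints offset 511: 0xaa
```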
This type of boot sector found in IBM-compatible PCs is also known as a master boot record (MBR). The next two sections explain how to write executable code into the boot sector. They discuss two programs: one that merely prints a character and another that prints a string.
The reader is expected to have a working knowledge of x86 assembly language programming using the GNU assembler. The details of assembly language won't be discussed here. Only how to write code for the boot sector will be discussed.
The code examples were verified by using the following tools while writing this article:
- GNU assembler (GNU Binutils for Debian) 2.18
- GNU ld (GNU Binutils for Debian) 2.18
- dd (coreutils) 5.97
- DOSBox 0.72
Print Character
The following code prints a single character in yellow color on a blue background:
.code16
.section .text
.globl _start
_start:
mov $0xb800, %ax
mov %ax, %ds
movb $'A', 0
movb $0x1e, 1
idle:
jmp idle
We save the above code in a file, say char.s, then assemble and link this code with the following commands:
as -o char.o char.s
ld --oformat binary -o char.com char.o
The .code16 directive tells the assembler that this code is meant for 16-bit mode. The _start label tells the linker that this is the entry point of the program.
The video memory of the VGA is mapped to various segments between 0xa000 and 0xc000 in the main memory. The color text mode is mapped to the segment 0xb800. The first two instructions move 0xb800 into the data segment register, so that any data offset specified is an offset in this segment. Then the code for the character 'A' (0x41 or 65 in ASCII) is moved into the first location in this segment and the attribute (0x1e) of this character into the second location. The higher nibble (0x1) is the attribute for the background color and the lower nibble (0xe) is that of the foreground color. The highest bit of each nibble is the intensifier bit. (In the background nibble, this bit makes the character blink instead when the adapter's blink mode is enabled.) The other three bits represent red, green, and blue. This is shown in tabular form below.
Nibble | I | R | G | B | Hex
---|---|---|---|---|---
Background | 0 | 0 | 0 | 1 | 0x1
Foreground | 1 | 1 | 1 | 0 | 0xe
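The attribute byte can also be worked out with a little shell arithmetic. The following is only an illustrative sketch; the variable names bg, fg and attr are made up for this example and do not appear in the assembly code.

```shell
# Background nibble: I=0 R=0 G=0 B=1 (dark blue).
bg=$((0x1))
# Foreground nibble: I=1 R=1 G=1 B=0 (bright yellow).
fg=$((0xe))
# The background nibble goes into the upper four bits of the byte.
attr=$(( (bg << 4) | fg ))
printf '0x%02x\n' "$attr"   # prints 0x1e
```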
We can see from the table that the background color is dark blue and the foreground color is bright yellow. We assemble and link the code with the as and ld commands mentioned earlier to generate an executable binary consisting of machine code.
Before writing the executable binary into the boot sector, we might want to verify that the code works correctly with an emulator. DOSBox is a pretty good emulator for this purpose. It is available as the dosbox package in Debian. The ld command above already names the binary char.com, so we can run it with DOSBox with the following command:
dosbox -c cls char.com
The letter A printed in yellow on a blue background should appear in the first column of the first row of the screen.
In the ld command used earlier to generate the executable binary, we used the extension name com for the binary file to make DOSBox believe that it is a DOS COM file, i.e., merely machine code and data with no headers. In fact, the --oformat binary option in the ld command was meant for generating a binary with merely machine code and data without any headers. This is why we are able to run the binary with DOSBox for verification. If we do not use DOSBox, any extension name or no extension name for the binary would suffice.
Once we are satisfied with the output of char.com running in DOSBox, we write the binary and the MBR signature into the boot sector with these commands:
dd if=char.com of=/dev/sdb
printf '\x55\xaa' | dd seek=510 bs=1 of=/dev/sdb
Caution: One needs to be absolutely sure of the device path of the device being written to. The device path /dev/sdb is only an example here. If the dd command is used to write to the wrong device, access to the data on it would be lost.
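A cautious way to rehearse these two steps is to build the sector in an ordinary file first. The sketch below is one possible way to do that, not part of the procedure above. The name make_boot_image is a hypothetical helper, not an existing tool, and it uses octal escapes in printf because not every printf implementation accepts \x escapes.

```shell
# make_boot_image BIN IMG: build a 512-byte image file from a flat binary.
# Hypothetical helper for illustration; it writes only to regular files.
make_boot_image() {
    bin=$1
    img=$2
    # Arithmetic expansion strips any padding that wc may print.
    size=$(($(wc -c < "$bin")))
    # The code area of the boot sector is 440 bytes long.
    if [ "$size" -gt 440 ]; then
        echo "error: $bin is $size bytes, more than 440" >&2
        return 1
    fi
    # Start from 512 zero bytes, then copy the code to offset 0.
    dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
    dd if="$bin" of="$img" conv=notrunc 2>/dev/null
    # Signature: byte 0x55 at offset 510, byte 0xaa at offset 511.
    printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null
}
```

For example, make_boot_image char.com char.img would produce a 512-byte file named char.img that can be inspected with xxd before anything is written to a real device.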
Now booting the computer with this device should display the letter A in yellow on a blue background.
Print String
The following code prints a string in yellow color on a blue background:
.code16
.section .data
message:
.asciz "hello, world"
.section .text
.globl _start
_start:
nop
xor %di, %di
mov $0xb800, %ax
mov %ax, %ds
mov $message, %si
move:
xor %dx, %dx
mov %cs:(%si), %dl
cmp $0, %dl
idle:
jz idle
mov %dl, (%di)
inc %di
movb $0x1e, (%di)
inc %di
inc %si
jmp move
There are two sections in this code. The data section has the null-terminated string to be displayed. The text section has the code. The code moves the first byte of the string to the location 0xb800:0x0000, its attribute to 0xb800:0x0001, the second byte of the string to 0xb800:0x0002, its attribute to 0xb800:0x0003, and so on until the string terminates, which is detected by the null byte at the end. The statement mov %cs:(%si), %dl moves one character from the string indexed by the SI register in the code segment into the DL register. The reason why we are reading the characters from the code segment will become clear after understanding the linker commands discussed below.
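The pattern of offsets described above can be tabulated with a short shell sketch. It assumes the usual real-mode rule that a physical address is the segment multiplied by 16 plus the offset; the function name cell_addresses is hypothetical and exists only for this illustration.

```shell
# cell_addresses I: print where character number I (counting from 0)
# and its attribute byte land in the color text segment 0xb800.
cell_addresses() {
    i=$1
    seg=$((0xb800))
    char_off=$((2 * i))          # each screen cell takes two bytes
    attr_off=$((char_off + 1))   # the attribute follows the character
    phys=$(( (seg << 4) + char_off ))
    printf 'char %d: b800:%04x (physical 0x%x), attribute: b800:%04x\n' \
        "$i" "$char_off" "$phys" "$attr_off"
}

for i in 0 1 2; do
    cell_addresses "$i"
done
# First line printed:
# char 0: b800:0000 (physical 0xb8000), attribute: b800:0001
```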
While booting, the BIOS reads the code from the first sector of the boot device into memory at physical address 0x7c00 and jumps to that address. However, while testing with DOSBox, things are a little different. In DOS, the text section is loaded at offset 0x100 in the code segment. The load address has to be specified to the linker while linking so that it can correctly resolve the value of the label named message. Therefore the object file has to be linked twice: once for testing it with DOSBox and once again before writing it into the boot sector. In both cases the string is part of the same binary as the code and is addressed relative to CS; DS cannot be used for it because DS already points to the video memory.
To understand the offset at which the data section can be put, it is worth looking at what the binary looks like after a trial link with the following commands:
as -o string.o string.s
ld --oformat binary -Ttext 0 -Tdata 100 -o string.com string.o
objdump -bbinary -mi8086 -D string.com
xxd -g1 string.com
The -Ttext 0 option tells the linker to assume that the text section should be loaded at offset 0x0 in the code segment. Similarly, the -Tdata 100 option tells the linker to assume that the data section is at offset 0x100. The linker reads these addresses as hexadecimal numbers even without a leading 0x.
The objdump command is used to disassemble the file. This shows where the text section and data section are placed. Let us take a close look at this portion of the output:
1b: 47 inc %di
1c: 46 inc %si
1d: eb ec jmp 0xb
...
ff: 00 68 65 add %ch,0x65(%bx,%si)
102: 6c insb (%dx),%es:(%di)
103: 6c insb (%dx),%es:(%di)
The output of the xxd command mentioned above looks like this (repeated sequences of zeros have been replaced with ... for the sake of brevity):
00000000: 90 31 ff b8 00 b8 8e d8 be 00 01 31 d2 2e 8a 14 .1.........1....
00000010: 80 fa 00 74 fe 88 15 47 c6 05 1e 47 46 eb ec 00 ...t...G...GF...
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
...
000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000100: 68 65 6c 6c 6f 2c 20 77 6f 72 6c 64 00 hello, world.
Both outputs above show that the text section occupies the first 0x1f bytes (31 bytes), from offset 0x0 to offset 0x1e. The data section is 0xd bytes (13 bytes) in length. We have 0x1b8 bytes (440 bytes) in the boot sector where we can put our binary. To fit the entire binary into the first 440 bytes, let us create a binary where the region from offset 0x0 to offset 0x1e contains the text section and the region from offset 0x20 to offset 0x2c contains the data section. The byte at offset 0x1f is going to remain unused. The total length of the binary would then be 0x2d bytes (45 bytes). We will create a new binary as per this plan.
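The arithmetic behind this plan can be checked with a few lines of shell. This is only a sketch: it counts the bytes at offsets 0x0 through 0x1e of the dump as the text section, and the variable names are chosen for illustration.

```shell
text_len=$((0x1f))   # bytes at offsets 0x00..0x1e in the dump
data_len=$((0x0d))   # "hello, world" plus the terminating null byte
data_off=$((0x20))   # planned start of the data section
gap=$((data_off - text_len))
total=$((data_off + data_len))
printf 'unused bytes between sections: %d\n' "$gap"         # prints 1
printf 'total length: 0x%x (%d bytes)\n' "$total" "$total"  # prints 0x2d (45 bytes)
printf 'room left in the 440-byte code area: %d bytes\n' "$((440 - total))"
```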
However, while creating the new binary, we should remember that DOS would load the binary at offset 0x100, so we need to tell the linker to assume 0x100 as the offset of the text section and 0x120 as the offset of the data section, so that it resolves the value of the label named message accordingly. We create a new binary in this manner and test it with DOSBox with these commands:
ld --oformat binary -Ttext 100 -Tdata 120 -o string.com string.o
dosbox -c cls string.com
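The section addresses passed to the linker are simply the load offset plus the position of each section within the binary. The sketch below derives them for the DOS offset 0x100 and for the boot-time address 0x7c00 mentioned earlier. The helper name link_opts is hypothetical, and 0x20 is the data offset chosen in the plan above.

```shell
# link_opts BASE: print the -Ttext and -Tdata options for a load offset.
link_opts() {
    base=$(($1))
    # %x prints the values without a leading 0x, the form used with ld here.
    printf '%s %x %s %x\n' -Ttext "$base" -Tdata "$((base + 0x20))"
}

link_opts 0x100    # prints -Ttext 100 -Tdata 120
link_opts 0x7c00   # prints -Ttext 7c00 -Tdata 7c20
```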
If everything looks fine, we link it once again for the boot sector and write it to the boot sector of our device.
ld --oformat binary -Ttext 7c00 -Tdata 7c20 -o string string.o
dd if=string of=/dev/sdb
printf '\x55\xaa' | dd seek=510 bs=1 of=/dev/sdb
Caution: Again, one needs to be very careful with the dd commands here. The device path /dev/sdb is only an example. This path must be changed to the path of the actual device one wants to write the boot sector binary to.
Once written to the device successfully, the computer may be booted with this device to display the "hello, world" string on the screen.