Choosing an Embedded Debugger

This article discusses the different stages of software development for a typical embedded system, and outlines the debugging needs in each stage. Herein, the term “application development” collectively refers to 1) board bringup, 2) kernel porting (specifically embedded Linux), and 3) “user-mode” applications.

Board Bringup

Board bringup actually means different things to different people. You could be dealing with an entirely new processor architecture with unreliable, prototypical hardware (most likely an FPGA instead of the actual ASIC), and no existing toolchain. That being the case, 1) your first problem isn’t the lack of a debugger, and 2) you are probably too smart to use/need one anyway. For this article, take a scenario where there is a stable toolchain for the architecture in question, and your assignment is to port a bootloader to it.

At this stage, your board may or may not have running firmware (or what I referred to earlier as the “bootloader”). The first thing you may want to do is write some sort of diagnostic program to verify that the hardware is stable. Later on, the diagnostic program may be transmogrified into a bootloader. Alternatively, you can choose from various publicly available bootloaders from the net (RedBoot, PMON, ARMboot, etc.), and modify one for your board. Your best option at this stage, whether you choose to port a bootloader or write one from scratch, is a debugger that supports some type of JTAG/BDM connectivity option. Let’s be careful here. JTAG/BDM debuggers are not all the same, and no one of them is simply better than the others; it depends on your needs. I recommend checking the following:

Is your JTAG hardware electrically and physically compatible with the connector on your board? More than once I have purchased JTAG hardware, only to find that there was no way it would plug into the connector on my board, let alone work.

Verify the type of memory used by your board. If it’s DRAM, and that’s the only type of memory you have, you may want to make sure that the JTAG device used has the capability (or at least a user accessible interface) to initialize DRAM. Otherwise, you won’t be able to download anything to the target memory. If your board has some SRAM, or if it boots out of ROM/FLASH and you have the ability to put DRAM initialization code there, then your problem isn’t so perilous…

Verify that it works with your host system of choice. Some JTAG devices only have Windows drivers, and if you use Linux as your development host, they won’t work.
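
The diagnostic program mentioned earlier often starts life as a memory test, run once DRAM has been initialized. The following is a minimal sketch, not a complete diagnostic: the function names are made up for illustration, and a real test would be pointed at the board's actual RAM range rather than an ordinary buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-ones test on a single location: catches stuck or shorted
 * data lines. Returns 0 on success, or the first bit pattern that
 * failed to read back. */
uint32_t test_data_bus(volatile uint32_t *addr)
{
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Address-in-address test: store each word's own index, then read
 * everything back. Stuck or shorted address lines make two indices
 * alias the same cell, so one of them reads back the wrong value.
 * Returns the index of the first bad word, or nwords if all passed. */
size_t test_address_bus(volatile uint32_t *base, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        base[i] = (uint32_t)i;
    for (size_t i = 0; i < nwords; i++)
        if (base[i] != (uint32_t)i)
            return i;
    return nwords;
}
```

On a host machine the two functions can be exercised against a plain array; on the target, the same code would be given the base address of the memory under test.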

Most JTAG devices don’t come with a GUI, and if they do, it’s usually minimal. If you are command line oriented, gdb may suit you well. Most JTAG vendors have a gdb port for their product, or soon will. There are also various GUI front ends to gdb that are freely available. DDD is one that comes to mind, although I can never understand why someone would want to graphically see the relationship between two structures. Wouldn’t a tree view be sufficient, and much more space efficient? If you insist on having a graphical front end for your JTAG, some of the things to look for are:

The ability to create user defined mapping for your memory I/O. Most embedded processors, unlike x86 with its separate I/O port space, map their I/O devices into a range of addressable memory. That way, you don’t need special instructions, like the x86 in and out instructions (exposed by some C compilers as inp and outp), to access I/O. A nice JTAG GUI would allow you to create mappings for these devices, and assign logical names to them, so that you can select, for example, “PCI registers” from the menu and see them.

The level of integration with the debugger. How much of the capability of the JTAG device does the GUI expose? Does it let you set hardware breakpoints?

Performance. You don’t want to wait 3-5 seconds for each single step to complete. Choose the wrong GUI debugger, and you may do just that. Connections between the JTAG box and your host system come in many varieties, including parallel, Ethernet, and more recently USB. However, the bottleneck is often the connection between your JTAG box and the target board, which is relatively slow. Each time your target board hits a debug stop point (like after a single step), the GUI needs to refresh its multiple displays, including that of memory, registers, variables, etc. A well designed GUI would only access the minimum information required from JTAG, resulting in less delay and better response time.
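
To make the memory-mapped I/O point from the list above concrete, here is a small sketch of what a named register map amounts to underneath. Everything in it is an assumption for illustration: the register names, the offsets, and the helper functions are hypothetical, and a real map would come from your board's documentation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry in a user-defined register map (all values hypothetical). */
struct reg_desc {
    const char *name;    /* logical name shown to the user */
    uintptr_t   offset;  /* byte offset from the device's base address */
};

static const struct reg_desc pci_regs[] = {
    { "PCI_VENDOR_DEVICE",  0x00 },
    { "PCI_COMMAND_STATUS", 0x04 },
    { "PCI_CLASS_REVISION", 0x08 },
};

/* Memory-mapped I/O is an ordinary load or store through a volatile
 * pointer; no special I/O instruction is involved. */
uint32_t mmio_read32(volatile void *base, uintptr_t offset)
{
    return *(volatile uint32_t *)((volatile uint8_t *)base + offset);
}

void mmio_write32(volatile void *base, uintptr_t offset, uint32_t value)
{
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}

/* Look a register up by its logical name and read it.
 * Returns 0 on success, -1 if the name is not in the map. */
int read_named_reg(volatile void *base, const char *name, uint32_t *value)
{
    for (size_t i = 0; i < sizeof pci_regs / sizeof pci_regs[0]; i++) {
        if (strcmp(pci_regs[i].name, name) == 0) {
            *value = mmio_read32(base, pci_regs[i].offset);
            return 0;
        }
    }
    return -1;
}
```

A GUI that supports user defined mappings is doing essentially this lookup for you, with the reads going out over JTAG instead of through a local pointer.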

Once the bootloader is running and stable on your board, a kernel port is next.

Kernel Porting

At this stage, we can safely assume (or maybe not) that 1) your bootloader can download and run another program image over serial or Ethernet, and 2) there is an existing port of the embedded Linux kernel and respective toolchain that comes with the reference platform on which your board is based. After some changes to the kernel, which may involve one or more of the following, you would have a running kernel with Ethernet and serial console devices:

Configuring or modifying the memory mapping for your board

Configuring your serial console

Configuring and enabling your board’s Ethernet and respective TCP/IP networking components

It’s difficult to be specific here, as your to-do list may be more expansive if, for example, you are dealing with a board with a PCI chipset currently unsupported by your kernel. Very likely, your kernel porting effort involves the development or port of kernel drivers for devices unique to your board. Debugging at this stage of development generally falls into one of the following areas of code:

Trap handlers: code that handles software generated events such as divide by zero, access misalignment, floating point emulation, etc.

Interrupt handlers or ISRs (Interrupt Service Routines): code that handles asynchronous events that occur as a result of device input/output. An ISR is further divided into the top half, which runs in the context of the handler, and the bottom half, which is scheduled for subsequent execution by the kernel.
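
The top half / bottom half split described above can be sketched as a ring buffer shared between the two. This is a host-side simulation under stated assumptions, not kernel code: the function names are hypothetical, a checksum stands in for the real processing, and an actual driver would use the kernel's own deferral mechanisms and memory barriers.

```c
#include <stdint.h>

#define RX_RING_SIZE 16u  /* power of two, so indices wrap with a mask */

static volatile uint8_t  rx_ring[RX_RING_SIZE];
static volatile unsigned rx_head;      /* advanced only by the top half */
static volatile unsigned rx_tail;      /* advanced only by the bottom half */
static unsigned long     rx_checksum;  /* stand-in for real processing */

/* Top half: runs in the context of the handler, so it does the bare
 * minimum (store the byte) and returns.
 * Returns 0 on success, -1 if the ring is full and the byte is dropped. */
int uart_top_half(uint8_t data_from_device)
{
    if (rx_head - rx_tail == RX_RING_SIZE)
        return -1;
    rx_ring[rx_head & (RX_RING_SIZE - 1u)] = data_from_device;
    rx_head++;
    return 0;
}

/* Bottom half: scheduled for later execution, drains whatever the top
 * half queued. Returns the number of bytes processed. */
unsigned uart_bottom_half(void)
{
    unsigned n = 0;
    while (rx_tail != rx_head) {
        rx_checksum += rx_ring[rx_tail & (RX_RING_SIZE - 1u)];
        rx_tail++;
        n++;
    }
    return n;
}
```

The -1 return path is the situation to keep in mind when a debugger halts the system: if the bottom half cannot run, the ring fills and incoming data is dropped.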

The top half of the kernel consists of less than 1% of the kernel source code, while the remaining body of code that runs in the bottom half takes up the balance. Additionally, it seems likely to me that most bottom half code is some type of device driver. I have not done a line count of the kernel source to verify this. The key point here is that if you are writing or debugging top half code, you probably don’t need a debugger. When you do, a JTAG based solution is probably the way to go. If you choose to go this route, make sure that the JTAG debugger and GUI that you choose can correctly handle addresses that are mapped by the target processor’s MMU (memory management unit).

For the remaining body of code, i.e. device drivers, JTAG may or may not be a good choice. Be sure that you take the following into consideration:

Does the debugger require modification to the kernel source files? Most debuggers, including JTAG based ones, require the kernel source files to be modified or patched in order to provide debugging functionality. This is undesirable because such debuggers cannot be used in a production oriented environment. On the one hand, leaving the kernel patches in place poses a security risk, as hackers can exploit the debug channel for malicious purposes. On the other, enabling the patches only when debugging is needed requires the kernel to be reconfigured and recompiled, which is not at all practical for a deployed product.

Does the debugger preempt servicing of ISRs? When you are debugging a driver within the target, the rest of the system should ideally be allowed to operate. Most JTAG based debuggers, or even software patches like kgdb, will halt the system while debugging is in effect. This causes incoming I/O to be dropped, with negative consequences. Take the scenario where you are debugging a video driver, and the debugger preempts the network ISR from servicing incoming packets. In this case, your system may appear dead to the outside world, and may be marked as such.

Does the debugger offer high speed connectivity, such as Ethernet? This is simple. You experience less delay and faster response time if your host debugger offers a high speed method to communicate with your target board.

Does the debugger properly handle debugging of loadable modules? Kernel modules are loaded on the fly, which means that their program code and data locations are dynamically set at load time. This information needs to be correctly communicated by the kernel to the debugger so that it can properly recalculate addresses in the symbol table for the respective modules.
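
The address recalculation for loadable modules, described in the last item above, comes down to rebasing each symbol once the load address is known. The sketch below assumes a single section and a flat symbol list; the structure, function names, and addresses are hypothetical and do not reflect any particular debugger's internals.

```c
#include <stddef.h>
#include <stdint.h>

/* A module symbol as the debugger sees it. The object file only knows
 * an offset from the start of the section; the run-time address is
 * unknown until the kernel loads the module. */
struct mod_symbol {
    const char *name;
    uintptr_t   offset;   /* section-relative offset from the object file */
    uintptr_t   address;  /* run-time address, filled in after load */
};

/* Rebase every symbol once the kernel reports where the section landed. */
void relocate_symbols(struct mod_symbol *syms, size_t count,
                      uintptr_t load_base)
{
    for (size_t i = 0; i < count; i++)
        syms[i].address = load_base + syms[i].offset;
}

/* Reverse lookup, e.g. when the target stops at some pc: pick the
 * symbol with the highest address that is still <= pc.
 * Returns NULL if pc lies below every symbol. */
const struct mod_symbol *symbol_for_pc(const struct mod_symbol *syms,
                                       size_t count, uintptr_t pc)
{
    const struct mod_symbol *best = NULL;
    for (size_t i = 0; i < count; i++)
        if (syms[i].address <= pc &&
            (best == NULL || syms[i].address > best->address))
            best = &syms[i];
    return best;
}
```

With gdb, the manual equivalent is the add-symbol-file command, which takes the module's object file and the address at which it was loaded.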

You really need to know what you are doing when debugging kernel code. A good debugger makes the task more convenient and far easier. Printk, perhaps the most often used means of debugging for kernel developers, is not without flaws. I have seen instances where code behaves incorrectly when printk calls are inserted. The reason for this is that kernel code runs on a very small stack, which can be as little as 4K on some (or most?) CPU architectures. Calling printk from within the most deeply nested leaf functions can overrun the stack and destroy other kernel code/data located in the adjacent pages, causing the system to crash or function improperly. Conversely, I have also seen cases where code behaves incorrectly when printk calls are removed. For example, code that incorrectly references a local array beyond its bounds would sometimes work correctly when a printk is added. The reason? Other local variables were defined next to the out of bounds array as a result of (or to facilitate) the printk call. Such variables effectively “padded” the out of bounds reference, which makes the code work.
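
One way to see how close code comes to overrunning a small stack is stack painting, a general technique that is not specific to printk or to any one kernel. The sketch below simulates it on an ordinary buffer; the fill value and function names are arbitrary choices for illustration, and it assumes a stack that grows downward.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* arbitrary, recognizable fill byte */

/* Paint the whole stack region with a known pattern before it is used. */
void stack_paint(uint8_t *stack, size_t size)
{
    for (size_t i = 0; i < size; i++)
        stack[i] = STACK_FILL;
}

/* For a stack that grows downward from stack + size, the bytes that
 * were never touched sit at the low end. Counting them gives the
 * headroom left at the deepest point of execution so far. */
size_t stack_headroom(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL)
        n++;
    return n;
}
```

A headroom of zero means the stack reached (and possibly passed) its lower bound. The count can be slightly optimistic if the code under test happens to store the fill byte at its deepest stack locations.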


Developing software for embedded systems is challenging. There is no substitute for true understanding of, and a good intuition for, your program/application. A good debugger should be the single most powerful weapon in your development arsenal, helping to improve the quality of your software, and saving you time. A true test of how well a debugger performs is to use it to debug your application and/or kernel code. A superior debugger should have a reasonably modern looking user interface, be 100% compatible with the cross compiler, allow you to debug both kernel and user mode code from a single connection, require no modification to the kernel, and be loadable and unloadable on demand.
