How Debuggers Work




The first thing I do when I create a project is to create a debugger launch config in the `.vscode` folder. Debuggers save me from sprinkling print statements through the code and rebuilding the program again and again. I always wondered how a debugger can stop a program at exactly the line I want and let me inspect variables. How debuggers work has always been dark magic to me. At last, I managed to learn the dark art by reading several articles and grokking the source code of Delve.

In this post, I'll share what I learned while demystifying the dark art of debuggers.

Problem statement

Let's define the problem before writing any code. I have a sample Go program that prints a random integer every second. The goal is for our debugger program to print `breakpoint hit` before the sample program prints each random integer.

Here is the sample program that prints a random integer every second.

    package main
1.  import (
2.      "fmt"
3.      "math/rand"
4.      "time"
5.  )
6.  func main() {
7.      for {
8.          variableToTrace := rand.Int()
9.          fmt.Println(variableToTrace)
10.         time.Sleep(time.Second)
11.     }
12. }
Solution

Now that we know what we want to achieve, let's solve the problem step by step.

The first step is to pause the sample program before it prints the random integer. That means we have to set a breakpoint at line 8.

To set a breakpoint at line 8, we need the address of the first machine instruction generated for line 8.

As some of us learned in school, a compiled high-level program ends up as machine instructions. So how do we find the address of the instruction that corresponds to a given source line?

  

Luckily, compilers can emit debug information alongside the machine code in the output binary. Among other things, this debug information maps instruction addresses back to source file names and line numbers.

For Linux binaries, debug information is usually encoded in the DWARF format. 

DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments. 

DWARF data can be decoded with the `objdump` tool.

The command below prints the instruction addresses and their mapping to file names and line numbers.

    objdump --dwarf=decodedline ./sample

The output of `objdump` looks similar to this:

File:  /home/debugger-example/sample.go

File name       Line number    Starting address    View    Stmt
sample.go       6              0x498200                    x
sample.go       6              0x498213                    x
sample.go       7              0x498221                    x
sample.go       8              0x498223                    x
sample.go       8              0x498225
sample.go       9              0x498233                    x
sample.go       9              0x498236
sample.go       10             0x4982be                    x
sample.go       10             0x4982cb
sample.go       8              0x4982cd                    x
sample.go       9              0x4982d2
sample.go       6              0x4982d9                    x
sample.go       6              0x4982de
sample.go       6              0x4982e0                    x
sample.go       6              0x4982e5                    x

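The output shows that `0x498223` is the starting address of line 8 in sample.go. Line 8 appears in several rows; we use the first one, whose `Stmt` column is marked `x`, meaning the address is a recommended breakpoint location.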

The next step is to pause the program at address `0x498223`.

Trick to pause the program execution

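On x86, the CPU traps into the kernel whenever it executes the `INT 3` instruction, whose one-byte opcode is `0xCC`. The kernel then stops the traced program with a `SIGTRAP` signal and notifies the debugger. So to pause the program, we just have to overwrite the first byte of the instruction at `0x498223` with `0xCC` (`[]byte{0xCC}` in Go).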
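To make the trick concrete, here is a toy sketch that treats a plain byte slice as if it were the program's code. It is only a simulation with no real process involved, and the names `breakpoint`, `insertBreakpoint`, and `removeBreakpoint` are made up for illustration. The important part is that we remember the byte we overwrite, because we need it to undo the breakpoint.

```go
package main

// int3 is the x86 opcode of INT 3, the one-byte breakpoint instruction.
const int3 byte = 0xCC

// breakpoint records where a breakpoint was planted and the byte it replaced.
type breakpoint struct {
	offset int
	orig   byte
}

// insertBreakpoint saves the byte at offset and overwrites it with INT 3.
func insertBreakpoint(code []byte, offset int) breakpoint {
	bp := breakpoint{offset: offset, orig: code[offset]}
	code[offset] = int3
	return bp
}

// removeBreakpoint writes the saved byte back, undoing insertBreakpoint.
func removeBreakpoint(code []byte, bp breakpoint) {
	code[bp.offset] = bp.orig
}
```

After `insertBreakpoint`, only the first byte of the instruction changes; `removeBreakpoint` leaves the slice byte-for-byte identical to what it was before.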

In computing and operating systems, a trap, also known as an exception or a fault, is typically a type of synchronous interrupt caused by an exceptional condition (e.g., breakpoint, division by zero, invalid memory access). Source: Wikipedia

Does that mean we have to patch the binary file on disk at `0x498223`? No, we can write directly into the memory of the running process using ptrace.

Ptrace to the rescue

ptrace is a system call found in Unix and several Unix-like operating systems. By using ptrace (the name is an abbreviation of "process trace") one process can control another, enabling the controller to inspect and manipulate the internal state of its target. ptrace is used by debuggers and other code-analysis tools, mostly as aids to software development. Source: Wikipedia

ptrace is a syscall that lets the tracer read and write the tracee's registers and the memory at a given address.
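One detail worth knowing: the raw `PTRACE_PEEKDATA` and `PTRACE_POKEDATA` requests move one machine word (8 bytes on x86-64) at a time. Changing a single byte therefore means reading a word, patching one byte, and writing the whole word back. Go's `PtracePeekData`/`PtracePokeData` wrappers do this for us and align the word to an 8-byte boundary; for `0x498223` that boundary is `0x498220`. The sketch below shows only that arithmetic against fake memory; `pokeByte` is a hypothetical helper and the `peek` callback stands in for the real syscall.

```go
package main

import "encoding/binary"

// wordSize is the number of bytes one PTRACE_PEEKDATA/POKEDATA call moves on x86-64.
const wordSize = 8

// pokeByte shows the read-modify-write needed to change one byte through a
// word-sized interface. peek stands in for PTRACE_PEEKDATA; the returned
// aligned address and word are what would be handed to PTRACE_POKEDATA.
func pokeByte(peek func(aligned uint64) uint64, addr uint64, b byte) (aligned, word uint64) {
	aligned = addr &^ (wordSize - 1) // round down to a word boundary
	var buf [wordSize]byte
	binary.LittleEndian.PutUint64(buf[:], peek(aligned))
	buf[addr-aligned] = b // x86-64 is little-endian: lowest address = lowest byte
	return aligned, binary.LittleEndian.Uint64(buf[:])
}
```

With the word `0x1122334455667788` at `0x498220`, patching `0x498223` to `0xCC` produces `0x11223344CC667788`: only the fourth byte from the bottom changes.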

Now we know which address to pause at, how to find that address from the line table, and how to modify the sample program's memory. Let's put all this knowledge into action.

First, exec the sample program with the `Ptrace` flag set to true, so that the child process is traced from the start and we can use ptrace on it.

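// ptrace requests must come from the OS thread that started the tracee,
// so pin this goroutine to its current thread first.
runtime.LockOSThread()
process := exec.Command("./sample")
process.SysProcAttr = &syscall.SysProcAttr{Ptrace: true, Setpgid: true, Foreground: false}
process.Stdout = os.Stdout
if err := process.Start(); err != nil {
    panic(err)
}
// The traced child stops with SIGTRAP right after exec; wait for that stop.
var ws unix.WaitStatus
if _, err := unix.Wait4(process.Process.Pid, &ws, 0, nil); err != nil {
    panic(err)
}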

We set the breakpoint at `0x498223` by replacing the first byte of the original instruction with `0xCC` (`INT 3`). `PtracePokeData` does the write.

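// setBreakpoint saves the original byte at addr and replaces it with
// 0xCC (INT 3). It returns the saved byte so the breakpoint can be undone.
func setBreakpoint(pid int, addr uintptr) []byte {
    data := make([]byte, 1)
    if _, err := unix.PtracePeekData(pid, addr, data); err != nil {
        panic(err)
    }
    if _, err := unix.PtracePokeData(pid, addr, []byte{0xCC}); err != nil {
        panic(err)
    }
    return data
}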

You may be wondering why there is a `PtracePeekData` call in addition to `PtracePokeData`. `PtracePeekData` reads the memory at the given address; I'll explain shortly why we save the original byte at `0x498223`.

With the breakpoint set, we resume the program with `PtraceCont` and wait for the trap with `Wait4`.

if err := unix.PtraceCont(pid, 0); err != nil {
    panic(err.Error())
}
/* wait for the interrupt to come. */
var status unix.WaitStatus
if _, err := unix.Wait4(pid, &status, 0, nil); err != nil {
    panic(err.Error())
}
fmt.Println("breakpoint hit")

Once the breakpoint is hit, we want the program to carry on as usual. But the byte at `0x498223` is still `0xCC`, so the original instruction can't run until we put the original byte back. Remember that we saved it with `PtracePeekData` while setting the breakpoint; let's restore it.

if _, err := unix.PtracePokeData(pid, addr, data); err != nil {
    panic(err.Error())
}

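Restoring the original byte alone isn't enough to run the program as usual. By the time the breakpoint is reported, the CPU has already executed the one-byte `INT 3`, so the instruction pointer has moved past it to `0x498224`, the middle of the original instruction.

So we want the CPU to execute the instruction at `0x498223` again, starting from its first byte.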


The CPU always executes the instruction that the instruction pointer (`RIP` on x86-64) points to. If you studied microprocessors at university, you might remember this.


So if we set the instruction pointer to `0x498223`, the CPU will execute the instruction at `0x498223` again. CPU registers can be read and written with `PtraceGetRegs` and `PtraceSetRegs`.

regs := &unix.PtraceRegs{}
if err := unix.PtraceGetRegs(pid, regs); err != nil {
    panic(err)
}
regs.Rip = uint64(addr)
if err := unix.PtraceSetRegs(pid, regs); err != nil {
    panic(err)
}
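The code above sets `Rip` straight to the breakpoint address. A slightly more defensive variant first checks that the trap really came from our `INT 3`: since that instruction is one byte long, `Rip` should be exactly one past the breakpoint address. The helper name `trappedAt` is hypothetical; this is a sketch of the check, not code from the original project.

```go
package main

// int3Len is the length in bytes of the INT 3 instruction (opcode 0xCC).
const int3Len = 1

// trappedAt reports whether a SIGTRAP stop with instruction pointer rip was
// caused by the INT 3 planted at bpAddr. If so, it returns bpAddr as the value
// the instruction pointer must be rewound to before resuming.
func trappedAt(rip, bpAddr uint64) (rewindTo uint64, ours bool) {
	if rip-int3Len != bpAddr {
		return rip, false // some other stop: leave the instruction pointer alone
	}
	return bpAddr, true
}
```

In the debugger, `rip` would be `regs.Rip` as read by `PtraceGetRegs`; only when `ours` is true would it write `rewindTo` back with `PtraceSetRegs`.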

Now that the instruction pointer is back at `0x498223`, continuing the program would run the normal flow, but the breakpoint would be gone. We want to hit it again on the next iteration, so we ask ptrace to execute only the next instruction and then set the breakpoint again. `PtraceSingleStep` executes exactly one instruction and then stops the tracee.

// resetBreakpoint steps the tracee over the breakpoint at addr and re-arms it.
func resetBreakpoint(pid int, addr uintptr, originaldata []byte) {
    /* revert to the original data */
    if _, err := unix.PtracePokeData(pid, addr, originaldata); err != nil {
        panic(err.Error())
    }
    /* set the instruction pointer to execute the instruction again */
    regs := &unix.PtraceRegs{}
    if err := unix.PtraceGetRegs(pid, regs); err != nil {
        panic(err)
    }
    regs.Rip = uint64(addr)
    if err := unix.PtraceSetRegs(pid, regs); err != nil {
        panic(err)
    }
    if err := unix.PtraceSingleStep(pid); err != nil {
        panic(err)
    }
    /* wait for the single step to finish, then set the breakpoint again */
    var status unix.WaitStatus
    if _, err := unix.Wait4(pid, &status, 0, nil); err != nil {
        panic(err.Error())
    }
    setBreakpoint(pid, addr)
}
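The order of operations in `resetBreakpoint` is what makes the breakpoint reusable: restore the byte, rewind the instruction pointer, single-step, then re-arm. One way to make that sequence testable without a live process is to put the ptrace calls behind a small interface. The `tracee` interface, the `recorder` fake, and `stepOver` below are hypothetical; a real `SingleStep` implementation would call `PtraceSingleStep` and then wait with `Wait4` before returning.

```go
package main

import "fmt"

// tracee is the minimal set of operations the step-over needs. A real
// implementation would wrap ptrace; SingleStep must not return until the
// single-stepped tracee has stopped again (PtraceSingleStep + Wait4).
type tracee interface {
	PokeByte(addr uint64, b byte)
	SetPC(addr uint64)
	SingleStep()
}

// stepOver runs the original instruction under the breakpoint at addr
// and re-arms the breakpoint afterwards.
func stepOver(t tracee, addr uint64, orig byte) {
	t.PokeByte(addr, orig) // 1. put the original byte back
	t.SetPC(addr)          // 2. rewind to the start of the instruction
	t.SingleStep()         // 3. execute only the restored instruction
	t.PokeByte(addr, 0xCC) // 4. re-arm the breakpoint with INT 3
}

// recorder is a fake tracee that only logs the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) PokeByte(addr uint64, b byte) {
	r.calls = append(r.calls, fmt.Sprintf("poke %#x=%#x", addr, b))
}

func (r *recorder) SetPC(addr uint64) {
	r.calls = append(r.calls, fmt.Sprintf("pc=%#x", addr))
}

func (r *recorder) SingleStep() {
	r.calls = append(r.calls, "step")
}
```

Keeping the sequence in one function means the ordering can be checked with the fake, while the ptrace-backed implementation stays a thin wrapper.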

So far we have learned how to set breakpoints and manipulate registers. Let's put it all together in a loop that drives the program.

pid := process.Process.Pid
data := setBreakpoint(pid, 0x498223)
for {
    if err := unix.PtraceCont(pid, 0); err != nil {
        panic(err.Error())
    }
    /* wait for the interrupt to come. */
    var status unix.WaitStatus
    if _, err := unix.Wait4(pid, &status, 0, nil); err != nil {
        panic(err.Error())
    }
    fmt.Println("breakpoint hit")
    /* reset the breakpoint */
    resetBreakpoint(pid, 0x498223, data)
}

Phew! Finally, we are able to print `breakpoint hit` before our sample program prints each random integer.

breakpoint hit
6129484611666145821
breakpoint hit
4037200794235010051
breakpoint hit
3916589616287113937
breakpoint hit
6334824724549167320
breakpoint hit
605394647632969758
breakpoint hit
1443635317331776148
breakpoint hit
894385949183117216

You can find the full source code at https://github.com/poonai/debugger-example

That's all for now. I hope you folks learned something new. In the next post, I'll write about how to extract the values of variables by reading the DWARF info.

Plug

By the way, I've built a free VS Code extension that lets developers set logpoints and get logs from production systems straight into their VS Code console. You can check it out at quicklog.dev or discuss it on our Discord server: https://discord.gg/suk99uC5fa



   

Balaji is a systems engineer with experience in storage and networking systems.
