The instructions for gcc vs clang couldn't be more different, so be sure you know which toolchain you're using.
# mingw-gcc
Debugging on Windows is tricky. I'm not aware of a mechanism to get symbols to work. There are several approaches and failed attempts; I detail them in the Symbols Appendix below, but I never got any of them to work. In theory, it should be possible to debug on Windows, with symbols, using gdb, because the mingw gcc toolchain can produce debugging information in the DWARF format. When we switch to mingw-clang, it might be possible to generate PDBs, in which case we would have symbols that WinDBG understands.
## General
You'll want to get familiar with the following environment variables for debugging:
- MOZ_DEBUG_CHILD_PROCESS - should breakpoint child processes, but doesn't really work in Windows
- MOZ_DEBUG_CHILD_PAUSE - will pause child processes upon start and print their pid so you can connect to them
- MOZ_LOG - for outputting logging information
- MOZ_IPC_MESSAGE_LOG - additional logging related to IPC
You'll also want to be debugging a build with MOZ_ASSERTs - these are usually what we use to orient ourselves. So make sure you're building with --enable-debug, which is not the default for tor.
## Debugging with gdb
The most important thing, if you're going to attempt to debug with gdb, is to use a MinGW-compiled gdb. You _cannot_ use the cygwin gdb. If you just want a pre-compiled gdb, you can get one here: <http://www.equation.com/servlet/equation.cmd?fa=gdb>
## Debugging with WinDBG
WinDBG commands make no sense. Get <http://windbg.info/doc/1-common-cmds.html> open and ready.
### Working Backwards from a Crash
First off, we have to figure out if the crash is in the parent or child. Try launching the process through WinDBG and see if you get the crash. Otherwise try attaching to the child (using MOZ_DEBUG_CHILD_PAUSE). Once you have a process broken in WinDBG, **g** will tell it to continue. ('g' for 'go')
Okay, we've crashed:
```plaintext
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\cygwin64\home\Tom\working\ff-transfer\debugging\14 svg logging\xul.dll -
```
Why did we crash? This code is trying to move 7Ah into the memory address stored in rax. What's in rax? We can look at registers with **r** (for registers.)
Note that the command I type starts with '0:000>'.
Okay. rax is 0x0000000000000000 - yeah, trying to store to a null pointer is going to crash. Why is it doing that?!??! (There's an easy answer below, you should learn this signature, but for now let's keep going.)
Once you've crashed, you can get a stacktrace with **k**.
Okay, time to explore the frames. Our primary goal is to understand where we are. To do that, we're going to go up the frames until we find something with an ASSERT or a LOG. Basically, we're looking for something that's referencing a hardcoded string we can search for. ASSERTs tend to be more common.
First, let's look at our crashing frame. That's 'xul!XRE_GetBootstrap+0x2508847'. We want to disassemble the code we are crashing at. We can do that with **u** (u for unassemble.) This will show the assembly code that we crash on:
Notice that although I ask it to unassemble a relative address, the memory address it shows me is absolute: 071da737.
This shows us the crashing instruction and the next few we would execute, but really we'd like to get more context. So we're going to unassemble the entire function of this frame with **uf**.
Note that the first line after the 'uf' instruction shows us the _start_ of the function. xul!XRE_GetBootstrap+0x2508847 is where we crashed; but xul!XRE_GetBootstrap+0x2508680 is where the function began. And the absolute address of xul!XRE_GetBootstrap+0x2508847 is 071da737 - if you control+f for 071da737 on this page you'll see the same instructions we unassembled.
Also: why does it break the function up with newlines in between? Each of those sections of the function is part of a branch. In IDA you'd get a nice graph view like so: <http://hexblog.com/ida_pro/pix/idaqt_preview_100310_1.html> (picture is an illustrative example, not this particular function). Take 'xul!XRE_GetBootstrap+0x2508852' for example. If you control+f on this page for 0x2508852, you'll see there are a couple of places in this function we jump to this location.
Now, let's pretend this frame doesn't tell us anything useful. We'd need to look at the address of one frame up: xul!XRE_GetBootstrap+0x250891c
In this situation we see the instruction immediately following the 'call' we made to go one stack frame down. Again, we need more context, so disassemble the whole function with **uf**.
```plaintext
00000000`071da892 4883ec80 sub rsp,0FFFFFFFFFFFFFF80h
00000000`071da896 5d pop rbp
00000000`071da897 c3 ret
```
Now, what we're looking for is a sequence of instructions like the below. We actually hit this in the first stack frame, I just wanted to demonstrate going up a stack frame.
```plaintext
xul!XRE_GetBootstrap+0x2508829: <----- START OF A BRANCH, USUALLY PRESENT
00000000`071da719 41b87a000000 mov r8d,7Ah <------ MOV OF A CONSTANT INTO A REGISTER
00000000`071da71f 488d156a058905 lea rdx,[xul!workerlz4_maxCompressedSize+0x9069b0 (00000000`0ca6ac90)] <-------- LEA OF A MEMORY ADDRESS INTO A REGISTER
00000000`071da726 488d0dbb078905 lea rcx,[xul!workerlz4_maxCompressedSize+0x906c08 (00000000`0ca6aee8)] <-------- LEA OF A MEMORY ADDRESS INTO A REGISTER
```
At this point, we know what file and line we are dealing with. By comparing the callsites in the code with the 'call' instructions, we are almost always able to identify which function we have traversed into, until we finally have a decent idea what the crash is.
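For orientation, here is a rough sketch of the kind of source code that could compile down to that signature. It assumes the Windows x64 calling convention, where the first three integer or pointer arguments travel in rcx, rdx and r8. `ReportFailure` and `DEMO_ASSERT` are hypothetical names made up for this illustration, not the actual Firefox macros, and the argument order is an assumption too.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the reporting helper an assertion macro calls.
// Under the Windows x64 calling convention the arguments travel as:
//   aExpr -> rcx, aFile -> rdx, aLine -> r8d
int ReportFailure(const char* aExpr, const char* aFile, int aLine) {
  assert(aExpr && aFile);
  return std::fprintf(stderr, "Assertion failure: %s, at %s:%d\n",
                      aExpr, aFile, aLine);
}

// Hypothetical assertion macro: report the failure, then crash on purpose by
// storing the line number through a null pointer.
#define DEMO_ASSERT(expr)                          \
  do {                                             \
    if (!(expr)) {                                 \
      ReportFailure(#expr, __FILE__, __LINE__);    \
      *((volatile int*)nullptr) = __LINE__;        \
    }                                              \
  } while (0)
```

If the real code looks anything like this, the two lea instructions load the addresses of two string constants, and the mov loads the line number (7Ah is 122 in decimal). Dumping both lea targets with **da** should show which one is the file name. A deliberate store of the line number through a null pointer would also explain a crash that writes 7Ah to the address in rax while rax is zero.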
### Breaking where we want to break
The above section is good if you have a crash and want to go backwards. What if you want to break somewhere in particular, not related to a crash? Without symbols, how do you do it? There's a trick. Here's the code you'll want:
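```plaintext
// Trap only when the environment variable is set, so normal runs are unaffected.
if (getenv("XXX_FINDME1")) {
#ifdef __MINGW32__
  __builtin_trap();  // MinGW build
#else
  DebugBreak();      // non-MinGW build
#endif
}
```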
This will let you break (conditionally, depending on the environment variable) right where you want.
You'll need to do the following dance:
1. Go into System -> Advanced System Settings -> Environment Variables and set your env var
2. THEN START WINDBG. This is very important. The Env Var change will not be picked up unless you start WinDBG after you set the variable (and close the dialog boxes I think)
3. Attach/Execute the program and get the address (as seen below)
4. CLOSE WINDBG
5. Delete the environment variable
6. REOPEN WINDBG
7. Attach/Execute the process and set your breakpoint
If you're comparing to a non-MinGW build using DebugBreak(); the fault will look like the below. Note that I need to go one frame up in the call stack to get the correct function.
If you see an address like xul!XRE_GetBootstrap+0x2508847 - that's a relative address (it's relative to xul.dll!XRE_GetBootstrap - It's not the _function_ XRE_GetBootstrap, it's 0x2508847 bytes after the start of that function). If you see an address like 00000000\`071da837 or 071da837 - that's an absolute address.
If a relative address points to code (it usually does), then that address will be the same across process executions. xul.dll will move around in memory due to ASLR, but as long as you're running the same build, the instructions at xul!XRE_GetBootstrap+0x2508847 will be the same.
The same is not true for absolute addresses. These addresses are not going to be the same across process executions.
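The arithmetic behind this is simple enough to sketch. The snippet below uses the pair from the crash walkthrough (xul!XRE_GetBootstrap+0x2508847 at 071da737). The helper names and the 0x10000 shift for the second run are made up for illustration, not values observed in a real session.

```cpp
#include <cstdint>

// Absolute address = where the symbol landed in this run + the fixed offset.
constexpr std::uint64_t ToAbsolute(std::uint64_t aSymbolAddress,
                                   std::uint64_t aOffset) {
  return aSymbolAddress + aOffset;
}

// Recover where the symbol was loaded from one absolute/relative pair.
constexpr std::uint64_t SymbolAddress(std::uint64_t aAbsolute,
                                      std::uint64_t aOffset) {
  return aAbsolute - aOffset;
}

// From the walkthrough: xul!XRE_GetBootstrap+0x2508847 was at 071da737.
constexpr std::uint64_t kCrashOffset = 0x2508847;
constexpr std::uint64_t kRun1Symbol = SymbolAddress(0x071da737, kCrashOffset);

// Hypothetical second run: ASLR moved xul.dll by 0x10000 (made-up value).
constexpr std::uint64_t kRun2Symbol = kRun1Symbol + 0x10000;

// Same relative address, different absolute address in each run.
static_assert(ToAbsolute(kRun1Symbol, kCrashOffset) == 0x071da737, "run 1");
static_assert(ToAbsolute(kRun2Symbol, kCrashOffset) == 0x071ea737, "run 2");
```

This is why a breakpoint written as symbol+offset survives a restart, while one written as a raw absolute address generally does not.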
## Symbols Appendix
Currently, building without stripping symbols for Windows produces a xul.dll over 2 GB. This is larger than is acceptable by Windows and thus it won't load the dll and firefox.exe will error very early on.
Using objcopy --add-gnu-debuglink one can move the debug sections to another file and tell gdb the symbols are there. This is explained in <https://stackoverflow.com/questions/866721/how-to-generate-gcc-debug-symbol-outside-the-build-target/866731#866731>
Once that is done, we can run firefox, and in theory tell gdb where the symbols are. However, gdb choked on reading the symbols. This is detailed at <https://ritter.vg/misc/ff/dwarf-error.html> . Further investigation (in <https://ritter.vg/misc/ff/dwarf-error-2.html> ) revealed the culprit was yasm being passed -g dwarf2. The bypass for this is to comment out the -g dwarf2 line in <https://searchfox.org/mozilla-central/rev/08df4e6e11284186d477d7e5b0ae48483ecc979c/python/mozbuild/mozbuild/frontend/context.py#375> Once that is done, you will get a symbol file gdb can, supposedly, read.
All is not well. Following this fix, objcopy segfaulted: <https://sourceware.org/bugzilla/show_bug.cgi?id=23061> As explained in the bug, the suspicion is that something in gcc is overflowing (probably because it's a signed int and we're overflowing that) and resulting in corrupt DWARF information. At this point I ceased investigation.
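To make the suspicion concrete: an offset of 2 GiB or more no longer fits in a signed 32-bit integer. The sketch below only illustrates that general failure mode. It does not reproduce the gcc or objcopy bug, whose actual cause was never confirmed, and the helper names are made up.

```cpp
#include <cstdint>
#include <limits>

// True when an offset still fits in a signed 32-bit integer.
constexpr bool FitsInInt32(std::uint64_t aOffset) {
  return aOffset <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// What a signed 32-bit field holds after a larger value is squeezed into it.
// The narrowing result is implementation-defined before C++20; mainstream
// compilers wrap around (two's complement).
constexpr std::int32_t TruncateToInt32(std::uint64_t aOffset) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(aOffset));
}

constexpr std::uint64_t kTwoGiB = std::uint64_t{1} << 31;

static_assert(FitsInInt32(kTwoGiB - 1), "just under 2 GiB still fits");
static_assert(!FitsInInt32(kTwoGiB), "2 GiB does not fit");
```

With an unstripped xul.dll over 2 GB, any offset past that boundary stored in such a field would come out negative, which is one plausible way to end up with corrupt DWARF data.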
Besides investigating the suspected gcc bug(s), there are other ways that are worth investigating for symbols:
- `-gz` (not supported in MinGW??)
- `-gsplit-dwarf` - This is supposed to move the DWARF data to separate files.
- `-g1` - This produces less debugging information. It produced a xul.dll that is within the size limits, but it didn't run. The dwarf2 fix may be needed.
# mingw-clang
## Creating debug symbols
With Clang it is possible to create PDBs and use them to debug with WinDBG (and also with Visual Studio, to some extent).
You need to add `MOZ_COPY_PDBS=1` to `mozconfig-windows-x86_64`; some other options may also be useful:
```
ac_add_options MOZ_COPY_PDBS=1
# This enables asserts
ac_add_options --enable-debug
# Make debug easier without optimizations
ac_add_options --disable-optimize
```
In this way, all the PDBs are created, but sadly not all are copied (`xul.pdb` is a notable exception; see #31546).
However, there is a workaround to get the missing ones as well.
Add the following line after `./mach build --verbose` in `projects/firefox/build`:
Please note that symbols are quite heavy (`xul.pdb` alone is almost 1 GB), and copying them makes creating the installer take considerably longer.
Also notice that all references to the source code are absolute paths in the build environment. However if you open the source file manually, it is recognized without any problem.
## WinDBG Cheatsheet
There are three versions of WinDBG: WinDbg x86, WinDbg x64, and WinDbg (or WinDbg Preview).
As far as I know, the commands in all of them are the same. But WinDbg Preview is the newest version, the easiest to install, and has a better interface for viewing multiple things at once. (That doesn't mean it's good, just that it's better.)
HOWEVER, the one thing that WinDbg Preview lacks is the Processes and Threads window, which lets you switch which process you're debugging. That's important, so I usually use WinDbg x64/x86.
### Debugging Multi-Process Firefox
WinDbg Preview lets you automatically attach to all child processes. You'll hit a cc (an int 3 breakpoint instruction) as each new process is launched, and you can set breakpoints then.
In regular WinDbg, you can enable/disable the child process debugging setting by:
```plaintext
.childdbg 1 (or 0)
```
To switch processes, open the Processes and Threads window (which is not available in WinDbg Preview) and choose the process from that.
You can orient yourself when you land inside a process with the command
```plaintext
!peb
```
This will print the Process Environment Block, which will contain the PID, command line (good for telling what type of process you're in) and other information like Environment Variables.
If you're annoyed by the initial breakpoint on content process start, or the breakpoint on process end, you can disable them. This is especially useful if you've edited the source to include ::DebugBreak() where you want to break and thus don't need to set breakpoints.
```plaintext
sxi ibp - Disable the process starting breakpoint
sxi epr - Disable the process ending breakpoint
```
### WinDbg Commands
```plaintext
kn - call stack frames
.frame ## - switch to the specified frame
```
```plaintext
d? <addr> - dump the memory at <addr> (choose one of the below:)
db <addr> - dump the memory at <addr> as bytes
da <addr> - dump the memory at <addr> as an ASCII string
da /c100 <addr> - an ASCII string in 100 char columns
```
```plaintext
u <addr> - unassemble the memory at <addr>
u <addr> L10 - unassemble next 10 lines
uf <addr> - unassemble as function
```
You can work backwards by specifying u -1 and so on. But when you do that, there is no guarantee you're unassembling on an instruction boundary, so you can wind up getting garbage instructions by beginning disassembly in the middle of an instruction. (They'll usually show up as real assembly, but they don't make sense.) Because of the nature of assembly, after some smallish number of instructions (5-15), they will almost always reconverge onto the correct assembly sequence. So try doing -x, -x+1, -x+2, and so on until you see something that seems consistent.
uf is more reliable: it will disassemble the entire function block that the address is contained in, unless it can't identify the address as being inside a function, in which case it will fail and you'll have to use u.
```plaintext
bp <addr> - place a breakpoint at <addr>
```
There are variants of bp: bu and bm. They have to do with unresolved symbols or symbols matching a pattern. But I've never really had a problem just using bp - I think it gets converted to the correct type if it needs to be in most cases.
```plaintext
ba <r/w/e/i> <size> <addr>
```
Place a memory breakpoint at `<addr>`. Read or Write makes the most sense. `<size>` is probably 4 or 8 bytes.
Search in memory for a string with the `s` command; for an ASCII string it has the form `s -a <addr> L?<length> "<string>"`. You'll need to specify an address. A good bet is to specify the start of the xul.dll module, unless of course the string you're looking for is in another module, like mozglue. You can also specify 0 for the start and 0xffffffffffffffff for the length. The ? in the command is supposed to be there, you need that. Additionally, the string (if it is a string and not a byte value) needs to be in double quotes.
You can also loop over the results and dump them for easy skimming.
```plaintext
p - single step execution (step over)
t - single step execution (step into)
pt - step to next return
pc - step to next call
```
In assembly mode, p will execute a single assembly instruction. When you have source code, it will execute a single line. I'm a little fuzzy on this, so experiment and correct as needed.
```plaintext
x xul!*symbolname* - Search for Symbols
```
If the address is relative to a function or module, then ASLR won't matter. If it's absolute, then it probably will.
```plaintext
dt <variablename> - displays the value of a variable
dv - dump all local variables
dt -v <variablename> - verbose information about a variable.
```
dt -v is useful when you need to get the address of a pointer, not the value of what it points to.
```plaintext
wt -oR
```
This is a complicated command that will execute the current function until its return AND print out all functions it calls (recursively) along with their return values. If you have a Windows system call that's succeeding on one build but failing on another, this is useful to compare what those calls are doing and where they diverge.
### Conditional Breakpoints
Conditional breakpoints are ugly as sin. Here are the most common ones you'll need. Fortunately, if you get the syntax wrong, it will always break and tell you "Syntax error at "...
```plaintext
bp <address/function> "<complicated-condition-string>"
```
The 'gc' command you'll see all over the place is 'go from conditional breakpoint'. It means "Don't break". The absence of 'gc' means "break".
Checking member variables:
```plaintext
bp xul!mozilla::net::CacheFileMetadata::OnDataRead ".if(((@@c++(this->mBuf))==0x00)){.echo \"hit\"} .else{gc}"
bp xul!mozilla::net::CacheFileMetadata::GetElement ".if((@@c++(this->mBuf))==0x00 & (@@c++(this->mElementsSize))!=0x00) {.echo \"hit\"} .else{gc}"
```
This is the same one as above (value in a register), except you specifically target the rax register (where the return value is stored) and you place the breakpoint in the right location (at the end of the function).
See <https://blogs.msdn.microsoft.com/jigarme/2007/10/28/how-to-break-in-windbg-when-particular-function-returns-specific-value/> Note the instructions there to get the return address actually get the first address after the function returns; which is specific to each callsite. It's better to use uf on the function in question and place a breakpoint on the ret instruction (of which there may be multiple).
Conditional breakpoint if a particular function is in the stack:
<https://stackoverflow.com/a/7800435>
That plus checking a return value:
```plaintext
bp 06f480da "r $t0 = 0;.foreach (v { k }) { .if ($spat(\"v\", \"*CacheFileMetadata::OnDataRead*\")) { r $t0 = 1;.break } }; .if( $t0 = 0 & rax = 0x80040111 ) { gc }"
```
That version is too slow. I refactored it to this version:
```plaintext
bp 06f480da ".if( @rax = 0x80040111 ) { r $t0 = 0;.foreach (v { k }) { .if ($spat(\"v\", \"*CacheFileMetadata::OnDataRead*\")) { r $t0 = 1;.break } }; .if( $t0 = 0 ) { gc }}"
```
## Debugging with Visual Studio
It is also possible to debug with Visual Studio, to some extent, even without the solution.
When you open VS, you can choose "Continue without code". At that point, you can attach to a process and debug it.
If you have debug symbols, in case of crashes, Visual Studio automatically asks for the path to the source file (since the one in the symbols is the absolute one in the build environment). Then it will recognize the source directory and load other files from there automatically.
If you need to set a breakpoint, you can set the source directory manually. Even though we do not have a VS solution, when you attach the exe, VS creates a temporary one. You can even modify its properties, where you can find «Debug Source Files». At that point you can add breakpoints even without crashes.
The only caveat is that I could not find a way to work with child processes. There is an extension to do that, but in my quick test it did not work, whereas I could get the job done with WinDbg Preview.