qira is cool, but it doesn't scale. It single-steps the program, recording the effects of each instruction in a fairly naive way. gdb's built-in reverse execution is similar. That approach costs roughly a 1000x slowdown and requires massive storage.
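To make the storage problem concrete, here is a minimal sketch of single-step recording on a made-up three-instruction register machine. Everything in it (the instruction set, `step`, `record`, `reverse_step`, `demo_program`) is hypothetical and invented for illustration; it is not qira's or gdb's actual trace format, only the general idea of logging every instruction's effects so they can be undone.

```python
# Toy illustration only: a made-up register machine, not qira's or gdb's
# real recording format. All names here are hypothetical.

def step(regs, pc, program):
    """Execute one instruction; return (new_pc, [(reg, old, new), ...])."""
    op, *args = program[pc]
    if op == "set":                      # set reg, constant
        reg, value = args
        change = (reg, regs.get(reg, 0), value)
        next_pc = pc + 1
    elif op == "add":                    # add dst, src  (dst += src)
        dst, src = args
        old = regs.get(dst, 0)
        change = (dst, old, old + regs.get(src, 0))
        next_pc = pc + 1
    elif op == "loop":                   # loop reg, target: decrement, jump if nonzero
        reg, target = args
        old = regs.get(reg, 0)
        change = (reg, old, old - 1)
        next_pc = target if old - 1 != 0 else pc + 1
    else:
        raise ValueError(f"unknown op {op!r}")
    regs[change[0]] = change[2]
    return next_pc, [change]


def record(program):
    """Single-step the whole program, logging one trace entry per instruction."""
    regs, pc, trace = {}, 0, []
    while pc < len(program):
        executed_pc = pc
        pc, changes = step(regs, pc, program)
        trace.append((executed_pc, changes))
    return regs, trace


def reverse_step(regs, trace):
    """Undo the most recent instruction by restoring the logged old values."""
    pc, changes = trace.pop()
    for reg, old, _new in reversed(changes):
        regs[reg] = old
    return pc


def demo_program(iterations):
    """A counting loop: acc is incremented `iterations` times."""
    return [
        ("set", "n", iterations),
        ("set", "one", 1),
        ("set", "acc", 0),
        ("add", "acc", "one"),           # loop body
        ("loop", "n", 3),                # jump back to the add while n != 0
    ]


if __name__ == "__main__":
    regs, trace = record(demo_program(1000))
    print(len(trace), "trace entries for a 5-instruction program")
    print("stepping back lands on pc", reverse_step(regs, trace))
```

The trace gets one entry per executed instruction, so its size tracks how long the program runs rather than how big the program is. Reverse stepping is trivial with such a log, which is the appeal, but every instruction pays for the bookkeeping.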
Getting the overhead down to < 2x (for rr) or 10x (for TTD), with reasonable trace sizes, requires much higher tech ... and is absolutely necessary for most users.