  • #2270
Closed
Open
Issue created Mar 07, 2017 by Derek Bruening (@derekbruening, Contributor)

CRASH during detach in now-native thread

An app with a statically-linked DR often crashes during detach:

<Starting application myapp (595523)>
<Attached to 653 threads in application myapp (595523)>
<Detaching from application myapp (595523)>
SIGSEGV received by PID 595523 (TID 598205)
PC: @          0x4b3dc5d  (unknown)  safe_read_tls_magic
    @          0xfee3683        880  appSignalHandler()
    @     0x7f8574c47390       1440  (unknown)
    @          0x4bf5836        992  master_signal_handler_C
    @          0x4b3d883     321120  (unknown)
(gdb) thread find 598205                                                   
Thread 1 has target id 'LWP 598205'

(gdb) dps $rsp $rsp+8000
<...>
0x00007f76be49ddc8  0x000000000fee36d3  appSignalHandler
<...>
0x00007f76be49e6c8  0x0000000004be9559  get_thread_private_dcontext + 89 in section .text
0x00007f76be49e6d0  0x00007f76be49eab0  No symbol matches (void *)$retaddr.
0x00007f76be49e6d8  0x0000000004bf5836  master_signal_handler_C + 54 in section .text
0x00007f76be49e6e0  0x0000000000000000  No symbol matches (void *)$retaddr.
<...>
0x00007f76be49eab8  0x0000000004b3d883  dynamorio_sigreturn in section .text

    (gdb) dps 0x00007f76be49eab8 0x00007f76be49eab8+512
    0x00007f76be49eab8  0x0000000004b3d883  dynamorio_sigreturn in section .text
    0x00007f76be49eac0  0x0000000000000001  No symbol matches (void *)$retaddr.
    0x00007f76be49eac8  0x0000000000000000  No symbol matches (void *)$retaddr.
sp  0x00007f76be49ead0  0x00007f76be48f000  No symbol matches (void *)$retaddr.
    <...>
rsp 0x00007f76be49eb60  0x00007f76be4ecf50  No symbol matches (void *)$retaddr.
rip 0x00007f76be49eb68  0x00000000073fc4dc  appfunc
    <...>
sig 0x00007f76be49ebf0  0x000000000000001b  No symbol matches (void *)$retaddr.
    0x00007f76be49ebf8  0x0000000000000080  No symbol matches (void *)$retaddr.

This is just a SIGPROF (sig 0x1b = 27) arriving at a random point in a thread that has been detached and is now native. Our handler is still in place, and it calls get_thread_private_dcontext().

Looking back down the stack at the SIGSEGV:

rsp 0x00007f76be49e1e0  0x00007f76be49e6c8  No symbol matches (void *)$retaddr.
rip 0x00007f76be49e1e8  0x0000000004b3dc5d  safe_read_tls_magic in section .text
    <...>
sig 0x00007f76be49e270  0x000000000000000b  No symbol matches (void *)$retaddr.
    0x00007f76be49e278  0x0000000000000001  No symbol matches (void *)$retaddr.

(gdb) x/2i 0x0000000004b3dc5d
   0x4b3dc5d <safe_read_tls_magic>:     mov    %gs:0x60,%eax

So it's the expected fault after we've removed our segment. So why didn't the SIGSEGV hit our safe_read_tls_magic check in the handler and get redirected from there to safe_read_tls_magic_recover? Is it a race where we removed our handler before the SIGSEGV was delivered, and that's why it went to the app? We remove the handler once we detach from the final thread. Actually, it's once we have also done thread exit for the detaching thread, right?

(gdb) p dynamo_exited
$2 = 0
(gdb) p doing_detach
$3 = 256
(gdb) p dynamo_detaching_flag
$4 = -1
(gdb) p dynamo_exited_and_cleaned
$5 = 0
(gdb) p num_known_threads
$6 = 0
(gdb) p dynamo_initialized
$7 = 0
(gdb) p do_once_generation
$8 = 2

It looks like dynamo_exit_post_detach() has run, though maybe the detacher made further progress while the fault was being processed.
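For readers unfamiliar with the recovery path being asked about, here is a minimal sketch of a fault-tolerant read. It is not DR's code: per the description above, DR's handler compares the fault PC against safe_read_tls_magic and redirects to safe_read_tls_magic_recover, whereas this sketch swaps in sigsetjmp/siglongjmp to get the same effect. All names (safe_read_int, fault_handler, make_unreadable_page) are hypothetical. The point it illustrates is the suspected race: recovery only works while our own handler is still the one installed when the fault is delivered.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf recover_point;           /* where a faulting read resumes */
static volatile sig_atomic_t in_safe_read; /* set only around the guarded load */

/* Hypothetical handler: recover only if the fault came from the guarded load. */
static void fault_handler(int sig)
{
    if (in_safe_read) {
        in_safe_read = 0;
        siglongjmp(recover_point, 1);
    }
    /* Not our fault: restore the default action. On return the faulting
     * instruction re-executes and the default action is taken. */
    signal(sig, SIG_DFL);
}

/* Reads *addr into *out. Returns false (leaving *out untouched) on a fault. */
static bool safe_read_int(const volatile int *addr, int *out)
{
    struct sigaction sa, old_segv, old_bus;
    volatile bool ok = false;

    assert(out != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask=1 so the signal mask is restored when we jump out of the handler. */
    if (sigsetjmp(recover_point, 1) == 0) {
        in_safe_read = 1;
        int value = *addr; /* may fault */
        in_safe_read = 0;
        *out = value;
        ok = true;
    }

    /* If these were restored before the fault was delivered, the fault would
     * go to whatever handler the app had installed instead of ours. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}

/* Test helper: maps one page with no access rights so any read faults. */
static int *make_unreadable_page(void)
{
    void *p = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (int *)p;
}
```

Calling safe_read_int() on a valid address returns true and fills in the value. Calling it on the page from make_unreadable_page() returns false instead of crashing, as long as fault_handler is still installed at the moment the fault is delivered.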

Proposal: check doing_detach in master_signal_handler_C and, if it is set and the signal is some alarm signal, just drop it on the floor? Or try to invoke the app handler if it's not SIGUSR2 (or a fault)?
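A rough sketch of what that decision could look like, written as a standalone classifier rather than as a patch to master_signal_handler_C. It is one possible combination of the two alternatives above, not a settled design: the enum, the helper names, and the exact sets of "alarm" and "fault" signals are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical outcomes for a signal that arrives in our handler. */
typedef enum {
    SIG_ACTION_HANDLE,         /* normal processing by our own handler */
    SIG_ACTION_DROP,           /* discard the signal */
    SIG_ACTION_FORWARD_TO_APP  /* invoke the app's handler directly */
} sig_action_t;

/* Assumed set of timer-driven signals. */
static bool is_alarm_signal(int sig)
{
    return sig == SIGALRM || sig == SIGVTALRM || sig == SIGPROF;
}

/* Assumed set of synchronous fault signals. */
static bool is_fault_signal(int sig)
{
    return sig == SIGSEGV || sig == SIGBUS || sig == SIGILL || sig == SIGFPE;
}

/* 'detaching' stands in for a check of doing_detach. */
static sig_action_t classify_signal_during_detach(int sig, bool detaching)
{
    assert(sig > 0);
    if (!detaching)
        return SIG_ACTION_HANDLE;
    if (is_alarm_signal(sig))
        return SIG_ACTION_DROP; /* e.g. the SIGPROF in this report */
    if (sig == SIGUSR2 || is_fault_signal(sig))
        return SIG_ACTION_HANDLE; /* still needs our own processing */
    return SIG_ACTION_FORWARD_TO_APP;
}
```

With this reading, the SIGPROF from the trace above would be dropped while doing_detach is set, so the handler would never reach get_thread_private_dcontext() on the now-native thread. Whether dropping an app's profiling signal is acceptable is exactly the open question in the proposal.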
