Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • D dynamorio
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 1,467
    • Issues 1,467
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 44
    • Merge requests 44
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • DynamoRIO
  • dynamorio
  • Issues
  • #4067
Closed
Open
Issue created Jan 30, 2020 by Derek Bruening@derekbrueningContributor

HANG on Appveyor hitting 1-hour timeout with logs ending much sooner

While moving Appveyor to VS2017/Win10 (#2924 (closed)) I hit several cases where Appveyor failed because the build hit their 1-hour time limit. Yet the full log ended at much less than 1 hour: 40 minutes, 20 minutes, and in the latest case 17 minutes. It ended during the 64-bit build a few times, and during the 32-bit build a few times. I ran locally to see if I could repro any popups or something and I did find a few, which I fixed in PR #3999. I thought it was all behind us but now I just saw another timeout:

https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/30450667

I've attached the full log. It ends with:

[00:17:28] test 261
[00:17:28]         Start 261: code_api,disable_traces|client.events
[00:17:28] 
[00:17:28] 261: Test command: C:\projects\dynamorio\build\build_debug-internal-32\bin32\drrun.exe "-s" "90" "-quiet" "-debug" "-use_dll" "C:/projects/dynamorio/build/build_debug-internal-32/lib32/debug/dynamorio.dll" "-exit0" "-stderr_mask" "0xC" "-msgbox_mask" "0" "-dumpcore_mask" "0x7d" "-staged" "-code_api" "-disable_traces" "-c" "C:/projects/dynamorio/build/build_debug-internal-32/suite/tests/bin/client.events.dll.dll" "--" "C:/projects/dynamorio/build/build_debug-internal-32/suite/tests/bin/client.events.exe" "C:/projects/dynamorio/build/build_debug-internal-32/suite/tests/bin/client.events.appdll.dll"
[00:17:28] 261: Test timeout computed to be: 600
[00:17:28] 261: appdll initialized
[00:17:28] 261: exception event redirect
[00:17:28] 261: Redirect success!
[00:17:28] 261: exit event 2
[00:17:28] 261: exit event 1
[00:17:28] 261: module load event 1 is called 1 time
[00:17:28] 261: module load event 2 is called 1 time
[00:17:28] 261: thread init event 1 is called 1 time
[00:17:28] 261: thread init event 2 is called 1 time
[00:17:28] 261: bb event 1 is called 1 time
[00:17:28] 261: bb event 2 is called 1 time
[00:17:28] 261: delete event 1 is called 1 time
[00:17:28] 261: delete event 2 is called 1 time
[00:17:28] 261: filter syscall event 1 is called 1 time
[00:17:28] 261: filter syscall event 2 is called 1 time
[00:17:28] 261: pre syscall event 1 is called 1 time
[00:17:28] 261: pre syscall event 2 is called 1 time
[00:17:28] 261: post syscall event 1 is called 1 time
[00:17:28] 261: post syscall event 2 is called 1 time

The time prefix is the total time since the build started: so if the log itself is not truncated, it sat there for 43 minutes from that point.

Two tests have not finished:

$ egrep 'Test #|Start [0-9]' ~/Downloads/log\ \(24\).txt | cut -c 31-90 | awk '{print $1}' | sed 's/\r$//' | sort | uniq -u
code_api,disable_traces|client.events
code_api|tool.drcachesim.invariants

The invariants test:

[00:14:25]         Start 221: code_api|tool.drcachesim.invariants
[00:14:25] 221: Test command: C:\projects\dynamorio\build\build_debug-internal-32\bin32\drrun.exe "-s" "90" "-quiet" "-debug" "-use_dll" "C:/projects/dynamorio/build/build_debug-internal-32/lib32/debug/dynamorio.dll" "-exit0" "-stderr_mask" "0xC" "-msgbox_mask" "0" "-dumpcore_mask" "0x7d" "-staged" "-code_api" "-t" "drcachesim" "-ipc_name" "drtestpipe_invariants" "-test_mode" "--" "C:/projects/dynamorio/build/build_debug-internal-32/suite/tests/bin/client.winxfer.exe"
[00:14:25] 221: Test timeout computed to be: 600
[00:14:25] 221: About to create thread
[00:17:15] 221: <Timeout expired - 1st wait, possible deadlock (or you were debugging)
[00:17:15] 221: version 7.91.18291, custom build
[00:17:15] 221: -no_dynamic_options -client_lib 'C:\projects\dynamorio\build\build_debug-internal-32/clients/lib32/debug/drmemtrace.dll;0;-ipc_name drtestpipe_invariants -test_mode' -code_api -probe_api -dumpcore_mask 125 -stderr_mask 12 -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -staged -native_exec_

Looks very similar to an earlier 1-hour timeout where I thought it was from a VCRT abort messagebox I saw over Appveyor RDP (could not repro locally; did not attempt to debug over RDP). I configured VCRT to not popup on abort via _CrtSetReportMode in PR #2999. That abort was:

---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Assertion Failed!

Program: ...io\build\build_debug-internal-64\clients\bin64\drcachesim.exe
File: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\vector
Line: 1742

Expression: vector subscript out of range

For information on how your program can cause an assertion
failure, see the Visual C++ documentation on asserts.

(Press Retry to debug the application)

---------------------------
Abort   Retry   Ignore   
---------------------------

In a local run there was one other msgbox I did not yet try to suppress, b/c the suite kept going and the app was killed properly:

win32.tls.exe popped this up but it disappeared after ~30s:

---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Error!

Program: ...d_suite\build_debug-internal-32\suite\tests\bin\win32.tls.exe

abort() has been called

(Press Retry to debug the application)

---------------------------
Abort   Retry   Ignore   
---------------------------

Several other tests have timed out (our CTest 90s timeout) with their last output 'About to create thread' on this VS2017/Win10 setup, presumably covered by #4058: 32-bit drutil-test; 64-bit callback; 32-bit drmgr-test up above.

So we have several action items:

  • Add the _CrtSetReportMode to tools.h to avoid popups in test apps too
  • Try to repro the hang in either client.events or drcachesim or the invariants test app
  • If the app dies, on Windows drcachesim seems to wait forever for the pipe: can we add a timeout there
  • Figure out and fix the underlying cause of the drcachesim vector abort()
  • Figure out and fix the underlying cause of the win32.tls.exe abort()
  • Try to repro and fix the problem at 'About to create thread'

Some of that may overlap with #4058.

Assignee
Assign to
Time tracking