Commit 27c98c9
propagate proc status in supervision events for proc failures (#1877)
Summary:
Pull Request resolved: #1877
Currently, when a proc fails, we report the agent as Stopped. This is accurate but confusing, and the failure could be better attributed.
Here, we synthesize an actor failure by:
1) attributing the fault to the corresponding actor in the monitored actor mesh;
2) elevating the proc_status (which contains the mode of failure, exit code, etc.) into the actor failure, making it clear that it is a process failure.
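The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `ProcStatus`, `ActorFailure`, and `synthesize_actor_failure` are hypothetical names, and the real types in hyperactor_mesh may look quite different.

```rust
// Hypothetical types for illustration only; these names are assumptions
// and are not taken from hyperactor_mesh.

/// How a process ended, as reported by whatever monitors it.
#[derive(Debug, Clone, PartialEq)]
enum ProcStatus {
    Stopped,
    Exited { code: i32 },
    Killed { signal: i32 },
}

/// A failure event delivered to the monitor of an actor mesh.
#[derive(Debug, Clone, PartialEq)]
struct ActorFailure {
    /// The actor in the monitored mesh that the fault is attributed to.
    actor_id: String,
    /// Human-readable description that names the process failure.
    message: String,
    /// The process status elevated into the actor failure.
    proc_status: ProcStatus,
}

/// Step 1: attribute the fault to `actor_id`.
/// Step 2: carry `proc_status` inside the failure and mention it in the message.
fn synthesize_actor_failure(actor_id: &str, proc_status: ProcStatus) -> ActorFailure {
    let message = match &proc_status {
        ProcStatus::Stopped => "process failure: proc stopped".to_string(),
        ProcStatus::Exited { code } => {
            format!("process failure: proc exited with code {code}")
        }
        ProcStatus::Killed { signal } => {
            format!("process failure: proc killed by signal {signal}")
        }
    };
    ActorFailure {
        actor_id: actor_id.to_string(),
        message,
        proc_status,
    }
}
```

With this shape, a monitor that receives the event sees both which actor was affected and why the underlying process went away, rather than a bare Stopped status.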
In the future:
1) We will have a more general "Failure" struct, explicitly capturing host, proc, actor, etc., failures.
2) We will still report the actor failure, since it is the most proximate, but move the proc failure into its "cause" (i.e., the proc failure caused the actor to fail), which is the most correct and clearest attribution.
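One possible shape for that future design is sketched below, purely as an assumption about where this could go. `Failure`, `FailureKind`, `caused_by`, and `root_cause` are hypothetical names; nothing here exists in the codebase as of this commit.

```rust
// Hypothetical sketch of a general failure type with a cause chain.
// All names are assumptions made for illustration.

/// Which layer of the system failed.
#[derive(Debug, Clone, PartialEq)]
enum FailureKind {
    Host,
    Proc,
    Actor,
}

/// A failure at some layer, optionally caused by a failure at another layer.
#[derive(Debug, Clone, PartialEq)]
struct Failure {
    kind: FailureKind,
    /// Identifier of the host, proc, or actor that failed.
    subject: String,
    /// Free-form detail, such as an exit code or error text.
    detail: String,
    /// The failure that led to this one, if any.
    cause: Option<Box<Failure>>,
}

impl Failure {
    fn new(kind: FailureKind, subject: &str, detail: &str) -> Self {
        Failure {
            kind,
            subject: subject.to_string(),
            detail: detail.to_string(),
            cause: None,
        }
    }

    /// Record `cause` as the reason this failure happened.
    fn caused_by(mut self, cause: Failure) -> Self {
        self.cause = Some(Box::new(cause));
        self
    }

    /// Follow the cause chain to the failure that has no recorded cause.
    fn root_cause(&self) -> &Failure {
        let mut current = self;
        while let Some(next) = current.cause.as_deref() {
            current = next;
        }
        current
    }
}
```

Under this sketch, the reported event would be the actor failure (the most proximate), and a caller interested in the underlying reason would walk `cause` to reach the proc failure.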
ghstack-source-id: 323429315
exported-using-ghexport
Reviewed By: dulinriley
Differential Revision: D86993889
fbshipit-source-id: 03578a230d155a4a9307e7468b00482ce9f36e98
Parent: eb2aeca
3 files changed (+40, -12):
- hyperactor_mesh/src/v1/host_mesh
- monarch_hyperactor/src/v1
- python/tests