False ENOENT Report
Scope
This note summarizes the false-ENOENT investigation that informed later metadata TTL handling in Copper.
Primary Finding
The observed mount failures were not caused by Slurm CPU binding. The important
behavior was repeated metadata ENOENT handling during Python and Torch
startup through the Copper mount. Some of those ENOENT results were
legitimate optional-loader probes, but Copper was also doing too much repeated
work on exact-path negative lookups.
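As an illustration of what a legitimate optional-loader probe looks like, the sketch below walks an ordered search path and stats a file name in each directory until it finds a match. The probe and demo_probe helpers, the directory names, and the libexample.so file name are hypothetical and are not taken from Copper, Python, or Torch; the example only shows that ENOENT on the earlier directories is expected behavior rather than a failure.

```python
import os
import tempfile


def probe(search_dirs, name):
    """Stat `name` in each directory in order; return (found_path, missed_paths)."""
    missed = []
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        try:
            os.stat(candidate)
        except FileNotFoundError:
            # Legitimate ENOENT: the file simply is not in this directory.
            missed.append(candidate)
            continue
        return candidate, missed
    return None, missed


def demo_probe():
    """Build three search directories and place the file only in the last one."""
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, n) for n in ("first", "second", "third")]
        for directory in dirs:
            os.mkdir(directory)
        # Hypothetical library name, present only in the third directory.
        open(os.path.join(dirs[2], "libexample.so"), "w").close()
        found, missed = probe(dirs, "libexample.so")
        return os.path.basename(os.path.dirname(found)), len(missed)
```

Each miss in this sketch is a distinct exact path, so a well-behaved probe by itself does not repeat the same negative lookup.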
Changes Introduced by That Investigation
The investigation produced four important runtime changes:
path-status coordination cleanup for completed entries
root-only metadata ENOENT TTL
broader exact-path metadata ENOENT TTL reuse
configurable metadata ENOENT TTL through -md_enoent_ttl_ms and launch_copper.sh -E <value>
Operational Takeaway
The metadata ENOENT TTL improves suppression of repeated rechecks for the
same exact missing path, but it does not eliminate legitimate missing-path
metadata events generated by Python, Conda, or dynamic loader behavior.
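The distinction can be sketched with a small exact-path negative-lookup cache. This is a minimal illustration under stated assumptions, not Copper's implementation: the EnoentTtlCache class, its lookup method, the backend_stat callback, and the example paths are all hypothetical, and the injected clock exists only to make the demo deterministic.

```python
import errno
import time


class EnoentTtlCache:
    """Hypothetical exact-path ENOENT cache; not Copper's actual code."""

    def __init__(self, ttl_ms, clock=time.monotonic):
        self.ttl_s = ttl_ms / 1000.0
        self.clock = clock
        self._expires = {}  # exact path -> time at which the cached ENOENT expires

    def lookup(self, path, backend_stat):
        """Return (result, hit_backend); result is attrs or -errno.ENOENT."""
        now = self.clock()
        expiry = self._expires.get(path)
        if expiry is not None and now < expiry:
            # Same exact path, still inside the TTL: suppress the recheck.
            return -errno.ENOENT, False
        self._expires.pop(path, None)
        try:
            attrs = backend_stat(path)
        except FileNotFoundError:
            # Remember the miss for this exact path only.
            self._expires[path] = now + self.ttl_s
            return -errno.ENOENT, True
        return attrs, True


def demo():
    """Return the list of paths that actually reached the backend."""
    backend_calls = []

    def missing_backend(path):
        backend_calls.append(path)
        raise FileNotFoundError(path)

    fake_now = [0.0]
    cache = EnoentTtlCache(ttl_ms=1000, clock=lambda: fake_now[0])
    cache.lookup("/lib/a.so", missing_backend)  # first miss reaches the backend
    cache.lookup("/lib/a.so", missing_backend)  # repeat within TTL is suppressed
    cache.lookup("/lib/b.so", missing_backend)  # different path still reaches it
    fake_now[0] = 2.0                           # advance past the 1000 ms TTL
    cache.lookup("/lib/a.so", missing_backend)  # expired entry is rechecked
    return backend_calls
```

In this sketch only the repeated lookup of the same path is absorbed by the cache. A lookup for a different missing path, or for the same path after the TTL expires, still produces a backend ENOENT event.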
Current Relevance
Copper retains the configurable metadata ENOENT TTL code path. That logic is
complementary to the startup, readiness, and address-book scaling work; it
addresses a different class of startup overhead.