Environment Path Analysis
Scope
This page integrates the maintained findings from the PyTorch dependency and environment-path evaluations. The source material came from:
the 2-node profiling evaluation with python3 -c "import torch"
the environment-path analysis for the same workload
the full-path profiling evaluation in the final experiment set
The goal is to explain why a simple import touches so many files and how to reduce unnecessary path fan-out without breaking the active runtime.
Workload Context
The maintained conclusions come from profiling-enabled runs that launched a
Python environment through Copper and then executed a minimal
python3 -c "import torch" workload. Although the user-visible workload is
small, the runtime behavior is not. The interpreter, import system, package
manager environment, dynamic loader, and native Torch dependency graph all
participate in the startup path.
The corresponding profiling outputs showed the mounted environment dominating the lookup stream, especially under:
the environment root itself
lib/
lib/python3.12
lib/python3.12/site-packages
site-packages/torch
The maintained iter3 artifacts under docs/source/iter3 also preserved a
full-path usage analysis for the active Conda environment. That analysis is
useful because it complements the hot-path tables with a coarse “used versus
available” estimate for the exact same workload family.
Why import torch Touches So Many Paths
The import is simple at the Python source level, but not at the runtime level. In one launch, the following all happen before user code does meaningful work:
Python interpreter startup
import-system path discovery under lib/python* and site-packages
package discovery inside torch and its transitive imports
dynamic-loader resolution for compiled extension modules
ROCm shared-library loading
repeated probes for optional or absent libraries and helper paths
In practice, the observed path fan-out is the combined effect of:
the selected python3 executable from PATH
Conda activation variables such as CONDA_PREFIX
Python import search rules and site.py processing
LD_LIBRARY_PATH search behavior for native libraries
PyTorch's compiled ROCm dependency graph
repeated missing-path probes that are normal for loader startup
This is why a nominally simple import often fans out into:
Python interpreter startup work
standard-library discovery
site.py processing
package-directory walks inside site-packages
import of many Torch Python subpackages
loading of compiled extension modules
dynamic-loader resolution of large native dependency sets
optional-library probing that is expected to fail in many cases
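You can see the Python-side part of this fan-out by comparing sys.modules before and after an import. The sketch below is a minimal illustration and is not part of the profiling toolchain; the function name import_fanout is hypothetical. It counts the newly loaded modules and the directories that hold their files.

This is only a lower bound on filesystem activity. It does not see failed lookups, site.py processing, or native-library loading, so a syscall-level profile is still needed for those.

```python
import importlib
import os
import sys


def import_fanout(module_name: str) -> dict:
    """Import a module and report the modules and directories it newly pulled in.

    Modules already loaded before the call are not counted, so the result
    depends on what this interpreter imported earlier.
    """
    before = set(sys.modules)
    importlib.import_module(module_name)
    new_modules = sorted(set(sys.modules) - before)

    # Directories holding the files behind the newly loaded modules.
    # Built-in and namespace modules without a __file__ are skipped.
    directories = set()
    for name in new_modules:
        path = getattr(sys.modules.get(name), "__file__", None)
        if path:
            directories.add(os.path.dirname(path))
    return {"new_modules": new_modules, "directories": sorted(directories)}


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "json"
    report = import_fanout(target)
    print(f"{target}: {len(report['new_modules'])} new modules "
          f"from {len(report['directories'])} directories")
```

Passing torch as the argument in the active environment gives a rough count of the Torch subpackages involved. For per-module timing rather than counts, CPython's built-in python3 -X importtime -c "import torch" prints an import timing tree to stderr.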
Observed Path Classes
The full-path profiling run clearly identified the hottest filesystem path classes. Across the four-rank cluster summary, the dominant classes were:
| Path class | Total events | Meaning |
|---|---|---|
| | | broad activity under the environment root and its parent directories |
| | | interpreter startup and stdlib discovery |
| | | Python-side Torch package import activity |
| | | package-discovery traffic in the active environment |
| | | compiled Torch and ROCm libraries loaded during startup |
| | | optional or absent shared-library probes |
Representative missing-path classes included:
libhsa-amd-aqlprofile64.so
python312.zip
glibc-hwcaps
pyvenv.cfg
These are usually normal probes rather than application bugs.
The profiling notes also showed heavy data reads from native libraries such as:
libtorch_cpu.so
libtorch_python.so
libamdhip64.so
libMIOpen.so
libmagma.so
librocblas.so
librocsolver.so
librocsparse.so
That pattern is consistent with a large GPU-enabled Torch stack rather than a small pure-Python package import.
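On Linux, you can confirm which of these native libraries a process actually mapped by reading /proc/self/maps from inside that process. The sketch below is a hypothetical helper, not part of the profiling tooling. It lists each distinct shared object once.

It shows which libraries were loaded, not how many bytes were read from them. The data-read volumes above come from the profiler.

```python
import os


def mapped_shared_libraries(maps_path: str = "/proc/self/maps") -> list:
    """Return the distinct shared-object files mapped into this process.

    Linux-specific: each line of /proc/self/maps may end with the path of the
    file backing that mapping. Returns an empty list if the file is unavailable.
    """
    if not os.path.exists(maps_path):
        return []
    libraries = set()
    with open(maps_path) as maps:
        for line in maps:
            fields = line.split(maxsplit=5)
            # Six fields means a pathname (or a pseudo-name such as [heap]) is present.
            if len(fields) == 6:
                path = fields[5].strip()
                if path.startswith("/") and ".so" in os.path.basename(path):
                    libraries.add(path)
    return sorted(libraries)


if __name__ == "__main__":
    for lib in mapped_shared_libraries():
        print(lib)
```

Calling it right after import torch in the same interpreter, and filtering for names such as libtorch_cpu.so or libamdhip64.so, shows how much of the native dependency graph one import brings in.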
Path Coverage in the Iter3 Environment Copy
The iter3 path-usage analysis compared the observed full-path outputs against the full existing path universe under the selected Conda environment root:
| Measure | Value |
|---|---|
| All existing paths under the selected root | |
| Existing files under the selected root | |
| Existing directories under the selected root | |
| Existing paths observed in the run | |
| Missing probe paths | |
| Existing paths not observed in the run | |
The same summary expressed that as coverage of the selected root:
| Coverage metric | Value |
|---|---|
| Observed files | |
| File coverage | |
| Observed directories | |
| Directory coverage | |
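The measures in these tables can be reproduced from a set of observed paths and a walk of the environment root. The sketch below assumes the profiler output has already been reduced to normalized absolute paths. The function and key names are hypothetical, and the iter3 analysis may have counted edge cases differently, such as symlinks or the root itself.

```python
import os
import tempfile


def coverage_report(root: str, observed: set) -> dict:
    """Compare the paths observed in one profiling run with the tree under root.

    `observed` holds normalized absolute paths seen in the run, both hits and
    misses. The root directory itself is counted as an existing directory.
    """
    root = os.path.abspath(root)
    files, dirs = set(), {root}
    for dirpath, dirnames, filenames in os.walk(root):
        dirs.update(os.path.join(dirpath, d) for d in dirnames)
        files.update(os.path.join(dirpath, f) for f in filenames)
    existing = files | dirs

    under_root = {p for p in observed if p == root or p.startswith(root + os.sep)}
    seen_existing = under_root & existing
    seen_files = seen_existing & files
    seen_dirs = seen_existing & dirs
    return {
        "existing_paths": len(existing),
        "existing_files": len(files),
        "existing_dirs": len(dirs),
        "observed_existing": len(seen_existing),
        "missing_probes": len(under_root - existing),
        "not_observed": len(existing - seen_existing),
        "observed_files": len(seen_files),
        "file_coverage": len(seen_files) / len(files) if files else 0.0,
        "observed_dirs": len(seen_dirs),
        "dir_coverage": len(seen_dirs) / len(dirs),
    }


if __name__ == "__main__":
    # Tiny synthetic environment: two libraries, one of them observed,
    # plus one missing probe for pyvenv.cfg.
    with tempfile.TemporaryDirectory() as env:
        os.makedirs(os.path.join(env, "lib"))
        for name in ("a.so", "b.so"):
            open(os.path.join(env, "lib", name), "w").close()
        seen = {os.path.join(env, "lib", "a.so"), os.path.join(env, "pyvenv.cfg")}
        print(coverage_report(env, seen))
```

The tree is walked after the run. Files created or removed between the run and the walk will therefore be misclassified.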
That result is useful, but it needs to be interpreted carefully. It does not
mean the remaining roughly 95% of files are safe to delete in general.
It means only that, in this same-app, same-node-count, same-configuration run,
the observed import path touched a relatively small fraction of the total
environment tree.
Operationally, the main value of this result is:
it shows that the active workload depends on a minority of the available file tree during this exact startup path
it supports using observed paths as an initial allowlist for cloned or filtered follow-up experiments
it argues for evidence-driven pruning rather than assuming the whole environment is equally active
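One way to turn observed paths into an initial allowlist is to keep every observed path that still exists, plus its parent directories below the root. This keeps the directory structure the import system and loader walked through. The sketch below uses a hypothetical helper name. Like the coverage figures, the result is only valid for the same app, node count, and configuration that produced the observations.

```python
import os
import tempfile


def build_allowlist(root: str, observed: set) -> list:
    """Turn observed, still-existing paths under root into a relative allowlist.

    Every parent directory below root is included as well, so a copy built from
    the list keeps the directory structure the import system and loader walked.
    """
    root = os.path.abspath(root)
    keep = set()
    for path in observed:
        # Skip paths outside the root and missing probes (they cannot be copied).
        if not path.startswith(root + os.sep) or not os.path.lexists(path):
            continue
        rel = os.path.relpath(path, root)
        while rel:
            keep.add(rel)
            rel = os.path.dirname(rel)
    return sorted(keep)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as env:
        os.makedirs(os.path.join(env, "lib", "python3.12"))
        stdlib_file = os.path.join(env, "lib", "python3.12", "os.py")
        open(stdlib_file, "w").close()
        seen = {stdlib_file, os.path.join(env, "pyvenv.cfg")}
        print("\n".join(build_allowlist(env, seen)))
```

A list of root-relative paths like this could, for example, feed rsync's --files-from option when building a filtered copy. Symlinks are kept as links, so check that their targets are also on the list.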
Environment Variables That Matter Most
PATH: Chooses which python3 is launched. Once the interpreter comes from the Conda environment mounted through Copper, many later paths are derived from that prefix automatically.
CONDA_PREFIX: Anchors the active environment root, including bin, lib, and lib/python*/site-packages.
PYTHONPATH: Adds optional import roots. It matters, but it is not the whole story; Python still derives a large built-in search path from the interpreter and its install prefix.
VIRTUAL_ENV: Usually not the main driver for Conda-based runs, but Python still probes for virtual-environment markers such as pyvenv.cfg while establishing its runtime layout.
LD_LIBRARY_PATH: Controls the shared-library search order for compiled extensions and ROCm libraries. Duplicate or stale entries here can create large probe storms.
srun --export=ALL: Replicates the activated environment across all ranks. This is necessary for correctness, but it also multiplies import and loader discovery activity.
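Because these settings interact, it helps to record them from inside the interpreter that actually runs the workload. The sketch below is a hypothetical diagnostic, not part of Copper. It reports the python3 found on PATH and the interpreter's own prefixes, along with the relevant variables and the final sys.path.

```python
import os
import shutil
import sys


def split_env_list(name: str) -> list:
    """Split a PATH-style variable into entries; an unset or empty variable gives []."""
    value = os.environ.get(name)
    return value.split(os.pathsep) if value else []


def describe_python_environment() -> dict:
    """Collect the settings that shape path lookups during interpreter startup."""
    return {
        "python3 on PATH": shutil.which("python3"),
        "sys.executable": sys.executable,
        "sys.prefix": sys.prefix,
        "sys.base_prefix": sys.base_prefix,  # differs from sys.prefix inside a venv
        "CONDA_PREFIX": os.environ.get("CONDA_PREFIX"),
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
        "PYTHONPATH": split_env_list("PYTHONPATH"),
        "LD_LIBRARY_PATH": split_env_list("LD_LIBRARY_PATH"),
        "sys.path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print(f"{key}: {value}")
```

Running it as a job step, for example srun --export=ALL python3 describe_env.py (the script name is hypothetical), lets you confirm that every rank resolves the same interpreter and the same search lists before comparing profiles.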
Path Sources by Subsystem
Different path classes come from different subsystems, so path reduction works best when those subsystems are considered separately.
Python import machinery contributes:
interpreter-prefix discovery
stdlib and lib-dynload walks
site-packages scanning
package and subpackage traversal for torch and its transitive imports
Environment activation contributes:
active environment prefixes from CONDA_PREFIX and related variables
path insertion in PATH
optional import roots from PYTHONPATH
propagated shell state when tasks are launched with full environment export
The dynamic loader contributes:
shared-library searches under the active environment
probing across LD_LIBRARY_PATH entries
optional-library probes for features that may not be installed
hardware-capability directory probes such as glibc-hwcaps
The iter3 path-class summary is consistent with that subsystem view. The largest path classes were:
environment_prefix with 2,828,400 events
python_stdlib with 197,232 events
torch_python_package with 188,608 events
python_site_packages with 163,484 events
torch_native_library with 57,503 events
Those totals show that most activity remains concentrated in a small set of environment, interpreter, package, and native-library regions rather than being evenly distributed across the full environment tree.
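A classification like this can be approximated with a short ordered list of path patterns. The sketch below reuses the class names from the iter3 summary, but the matching rules are assumptions for illustration, not the profiler's actual definitions. Rules are checked in order, so the more specific Torch classes win over the generic site-packages and stdlib classes. Anything else under the environment root, or on the way to it, falls into environment_prefix.

```python
import re
from collections import Counter

# Hypothetical classification rules, checked in order (first match wins).
CLASS_RULES = [
    ("torch_native_library", re.compile(r"/site-packages/torch/lib/[^/]+\.so")),
    ("torch_python_package", re.compile(r"/site-packages/torch(/|$)")),
    ("python_site_packages", re.compile(r"/lib/python3\.\d+/site-packages(/|$)")),
    ("python_stdlib", re.compile(r"/lib/python3\.?\d+")),  # also matches python312.zip
]


def classify(path: str, env_root: str) -> str:
    """Assign one lookup path to a path class."""
    for name, pattern in CLASS_RULES:
        if pattern.search(path):
            return name
    root = env_root.rstrip("/")
    # Anything else under the root, or any parent directory of the root.
    if (path == root
            or path.startswith(root + "/")
            or (root + "/").startswith(path.rstrip("/") + "/")):
        return "environment_prefix"
    return "other"


def summarize(paths, env_root: str) -> Counter:
    """Count lookup events per path class (one entry in `paths` per event)."""
    return Counter(classify(p, env_root) for p in paths)
```

Tallying per-rank event lists with summarize and adding the resulting counters gives a cluster-level table of the same shape as the summaries above.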
Why Missing Paths Repeat
The profiling data shows many repeated negative probes. This is expected for Python and shared-library startup:
the first lookup discovers a path is absent
later lookups ask for the same exact path again
Copper can then serve the repeated miss from the metadata ENOENT TTL
This is why high TTL-serve counts are a positive signal. They mean Copper is collapsing repeated negative metadata work that the workload would otherwise reissue.
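The idea behind an ENOENT TTL can be shown with a small negative-lookup cache. This is a conceptual sketch, not Copper's implementation; the class and counter names are hypothetical. A miss is remembered for ttl seconds, and repeated lookups of the same path inside that window are answered from memory and counted as TTL serves.

```python
import os
import time


class NegativeLookupCache:
    """Answer repeated 'does not exist' lookups from memory for ttl seconds."""

    def __init__(self, ttl: float, clock=time.monotonic, exists=os.path.lexists):
        self.ttl = ttl
        self.clock = clock
        self.exists = exists          # backend check, e.g. a real stat call
        self._misses = {}             # path -> time the miss was recorded
        self.backend_lookups = 0
        self.ttl_serves = 0

    def lookup(self, path: str) -> bool:
        recorded = self._misses.get(path)
        if recorded is not None and self.clock() - recorded < self.ttl:
            self.ttl_serves += 1      # repeated miss answered without the backend
            return False
        self.backend_lookups += 1
        found = self.exists(path)
        if found:
            self._misses.pop(path, None)
        else:
            self._misses[path] = self.clock()
        return found
```

The trade-off is staleness. A file created inside the TTL window keeps looking absent until the entry expires, which is acceptable for read-mostly environment trees during startup.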
The version4 path-analysis note highlighted several repeated examples:
libhsa-amd-aqlprofile64.so
python312.zip
glibc-hwcaps
pyvenv.cfg
The iter3 artifacts preserved the same pattern in both the TTL top-path tables and the missing-probe lists. Representative repeated probe paths included:
.../torch/lib/libhsa-amd-aqlprofile64.so
.../lib/python312.zip
.../lib/glibc-hwcaps
.../conda_env/pyvenv.cfg
.../conda_env/bin/pyvenv.cfg
These should generally be interpreted as normal startup probes first and optimization opportunities second.
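A top-N tally of repeated misses makes that triage concrete. The sketch below assumes profiler events have already been reduced to (path, found) pairs; that input format is hypothetical. It flags the probe names the profiling notes identified as normal startup misses, so that unfamiliar repeated misses stand out.

```python
import os
from collections import Counter

# Probe names that the profiling notes identify as normal startup misses.
KNOWN_STARTUP_PROBES = {"libhsa-amd-aqlprofile64.so", "python312.zip", "glibc-hwcaps", "pyvenv.cfg"}


def top_repeated_misses(events, n: int = 10) -> list:
    """Return up to n (path, miss_count, known_startup_probe) tuples.

    `events` is an iterable of (path, found) pairs; only paths reported
    missing more than once are included, most frequent first.
    """
    misses = Counter(path for path, found in events if not found)
    repeated = [(p, c) for p, c in misses.most_common() if c > 1]
    return [(p, c, os.path.basename(p) in KNOWN_STARTUP_PROBES) for p, c in repeated[:n]]
```

Entries flagged False are the ones worth tracing back to a specific PATH, PYTHONPATH, or LD_LIBRARY_PATH entry.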
Pruning and Cleanup Guidance
The safest cleanup sequence is:
remove duplicate path entries first
remove obviously nonexistent path entries
remove stale environment or toolchain directories
only then experiment with a reduced or allowlist-based environment copy
The environment-path and full-path profiling evaluations support the following practical rules:
keep the active environment core intact first: environment root, bin, lib, lib/python*, and site-packages
prefer trimming duplicate or stale LD_LIBRARY_PATH entries before touching Torch library directories
prefer trimming duplicate or unnecessary PYTHONPATH additions before modifying the interpreter tree
treat python*.zip, pyvenv.cfg, and glibc-hwcaps misses as optimization hints, not as correctness failures
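The first two cleanup steps (duplicates, then nonexistent entries) can be applied mechanically to any PATH-style variable. The sketch below is a hypothetical helper. It preserves order, so the first occurrence of each entry keeps its precedence, and it reports why each dropped entry was removed.

Empty entries are also dropped. An empty entry usually means the current working directory, so removing one is a behavior change and should be deliberate.

```python
import os


def clean_search_path(value: str):
    """Drop empty, duplicate, and nonexistent entries from a PATH-style value.

    Returns (cleaned_value, removed), where removed is a list of (entry, reason).
    """
    kept, removed, seen = [], [], set()
    for entry in value.split(os.pathsep):
        if not entry:
            removed.append((entry, "empty"))
            continue
        key = os.path.normpath(entry)  # treat "/a/b" and "/a/b/" as the same entry
        if key in seen:
            removed.append((entry, "duplicate"))
        elif not os.path.exists(entry):  # exists() rather than isdir() keeps zip entries
            removed.append((entry, "nonexistent"))
        else:
            seen.add(key)
            kept.append(entry)
    return os.pathsep.join(kept), removed


if __name__ == "__main__":
    for var in ("LD_LIBRARY_PATH", "PYTHONPATH"):
        value = os.environ.get(var)
        if not value:
            continue
        cleaned, removed = clean_search_path(value)
        for entry, reason in removed:
            print(f"{var}: drop {entry!r} ({reason})")
        print(f"export {var}={cleaned}")
```

Stale but existing directories, such as an old toolchain prefix, are not detected here. They still need the profiling evidence to decide.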
Minimization Priorities
The maintained guidance from the path-analysis work is to minimize the active environment in layers rather than trying to remove all path fan-out at once.
The safest order is:
eliminate duplicate path entries
eliminate obviously nonexistent path entries
remove stale toolchain or environment references that are no longer active
preserve the active runtime core while measuring again
only then consider more aggressive allowlist-style environment reduction
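Before re-measuring a reduced environment, a quick guard can confirm that the runtime core named in the pruning rules is still present. The sketch below is a hypothetical check. It only tests that the directories exist, not that their contents are complete, so the follow-up profile remains the real test.

```python
import glob
import os
import sys

# Core layout named in the pruning guidance; lib/python* is matched with a glob.
CORE_ENTRIES = ["bin", "lib", "lib/python*", "lib/python*/site-packages"]


def missing_core_entries(env_root: str) -> list:
    """Return the core layout entries that have no matching directory under env_root."""
    missing = []
    for entry in CORE_ENTRIES:
        matches = glob.glob(os.path.join(env_root, entry))
        if not any(os.path.isdir(m) for m in matches):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Defaults to this interpreter's prefix, which is the environment root
    # when python3 comes from the active Conda environment.
    target = sys.argv[1] if len(sys.argv) > 1 else sys.prefix
    print(missing_core_entries(target) or "core layout present")
```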
This approach keeps the debugging loop tied to observed profiling evidence instead of guessing which paths are safe to remove.
Operational Interpretation
The right question is usually not “why is Python probing so many files?” but “which of those probes are avoidable in the active environment?”
The maintained guidance from these evaluations is:
keep the active environment small and purpose-built
route only the necessary environment prefixes through Copper
keep the metadata ENOENT TTL enabled
use profiling outputs to identify duplicate, stale, or noisy environment paths before changing package contents