Profiling Reference
Profiling Modes
Copper supports three profiling levels:
- Aggregate-only profiling
Records cluster-wide totals and per-operation summaries.
- Top-path profiling
Records the hottest paths per category, bounded by
profile_top_n.- Full-path profiling
Preserves more detailed path-level artifacts for deeper investigation.
Runtime Options
Option |
Meaning |
|---|---|
|
enable profiling collection |
|
keep the hottest |
|
enable fuller path-level forensic output |
|
write periodic snapshots while Copper is still running |
Output Layout
With profiling enabled, the job output directory typically contains:
profiling/final/- final per-rank profiling summaries and CSV filesprofiling/cluster/- aggregated cluster-level summaries produced byscripts/aggregate_profiling.pytables/final/- raw cache or table outputs associated with the run
Common Files
File |
Meaning |
|---|---|
|
human-readable per-rank profiling summary |
|
per-operation counts and timing |
|
aggregate counters and cache metrics |
|
hottest path list when top-path profiling is enabled |
Key Terms
cache hitCopper served the request from a local or forwarded cache path without repeating the underlying storage operation.
Metadata ENOENT TTL storeCopper inserted an exact-path negative metadata result into the temporary ENOENT TTL cache.
Metadata ENOENT TTL serveCopper reused that temporary missing-path result instead of repeating the backend metadata lookup.
Metadata ENOENT TTL expireAn ENOENT TTL entry aged out and was removed after its configured lifetime.
FUSE operationA filesystem request issued through the Copper mount, such as
getattr,read,readdir, oropen.MetadataFile information rather than file content, including existence, type, size, and permissions.
DataFile content bytes read from a file, such as the bytes of a shared library or Python module.
Negative resultA request that ends in a missing-path or similar unsuccessful result rather than a successful metadata or data response.
ENOENTThe standard Unix “No such file or directory” result. In profiling output, repeated
ENOENTpaths are often informative rather than erroneous.Top-path profilingA bounded hotspot view that retains the busiest paths instead of dumping the full path population.
Full-path profilingA deeper forensic mode that preserves more complete per-path evidence and can become large at scale.
Pre-destroy snapshotA profiling snapshot written before the final Copper teardown sequence.
Reading the Metrics
The maintained profiling evaluations support the following interpretations:
high
getattrcounts usually indicate metadata-heavy startup or import behaviorhigh cumulative
readlatency usually indicates that content reads, not metadata probes, dominate elapsed service timehigh
Metadata ENOENT TTL servecounts are generally a positive sign that repeated negative metadata probes are being collapsed successfullyrepeated missing shared-library probes,
python*.zipchecks, andpyvenv.cfglookups are often expected runtime behavior rather than correctness bugs
Validated Output Families
The version4 profiling validation confirmed that a profiling-enabled run can produce:
per-rank Markdown summaries
per-rank operation CSVs
aggregate CSVs
bounded top-path CSVs
raw cache-event outputs
matching
pre-destroyvariants of those artifacts
That output structure is what makes profiling useful both for quick workload inspection and for deeper environment-path debugging.
Operational Guidance
Use aggregate profiling first for scalable runs.
Add top-path profiling when path hot spots matter.
Reserve full-path profiling for smaller or targeted forensic runs.
Do not rely on normal log verbosity to infer whether profiling ran; rely on the profiling output files themselves.
Cache Usage Metrics
Copper also supports ioctl-based table-size dumps for the three main cache tables:
data cache
tree cache
metadata cache
The raw table-size outputs are produced by the existing scripts under
scripts/get_copper_stats_ioctl/. A post-analysis helper can then summarize
them into one combined cache-usage report:
Recommended workflow:
cd scripts/get_copper_stats_ioctl
# Step 1: add this to env.sh, or export it in the current shell
export VIEW_DIR=/mnt/bb/${USER}/copper_mount
# Step 2: while Copper is still running, collect the raw cache-size outputs
bash ./get_cache_usage_summary.sh /lustre/orion/proj-shared/ums046/some_output_dir
# Step 3: for offline re-summarization of an existing directory
python3 ./summarize_cache_usage.py /path/to/dir --csv /path/to/dir/cache_usage_summary.csv
The summary reports:
used bytes for each table
entry counts for each table
combined used bytes across the three tables
The raw files are:
data_cache_size.outputtree_cache_size.outputmd_cache_size.output
Memory model:
the main Copper tables are dynamically growing containers, not fixed-capacity pools
cache.husesstd::unordered_map<Key, Value>with no global size limittl_cache.hdoes the same for the Thallium-side cachesthe tables grow as entries are inserted
memory usage is bounded by normal process and node limits, including DRAM, allocator behavior, and any job or cgroup memory limits
max_cacheable_byte_sizeis only a per-file admission threshold for data caching, not a total cache budget
It does not currently report a true remaining cache available value,
because Copper does not yet maintain a fixed global cache budget for those
tables.
Sample output:
| Table | Files Found | Used Bytes | Used Human | Entries |
| -------- | ----------: | ---------: | ---------: | ------: |
| data | 1 | 70296558 | 67.04 MiB | 1025 |
| tree | 1 | 27772 | 27.12 KiB | 140 |
| metadata | 1 | 340848 | 332.86 KiB | 2367 |
| combined | - | 70665178 | 67.39 MiB | 3532 |