numpy-io
File I/O operations including binary formats (npy/npz), text processing (csv), and memory-mapping for huge datasets. Triggers: io, load, save, npz, genfromtxt, memmap, loadtxt.
Install
$ git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/data/numpy-io ~/.claude/skills/claude-skill-registry/
Tip: Run this command in your terminal to install the skill.
SKILL.md
name: numpy-io
description: File I/O operations including binary formats (npy/npz), text processing (csv), and memory-mapping for huge datasets. Triggers: io, load, save, npz, genfromtxt, memmap, loadtxt.
Overview
NumPy I/O handles the transition of data between system memory and persistent storage. It supports highly efficient binary formats (.npy, .npz), flexible text parsers for messy data, and memory-mapping for datasets that exceed available RAM.
When to Use
- Storing model weights or large datasets in a compressed binary format.
- Reading messy CSV files with missing data or mixed types.
- Processing multi-gigabyte datasets without loading the entire file into memory.
- Bundling multiple related arrays into a single archive file.
Decision Tree
- Storing data for future NumPy use? Use `.npy` for single arrays, `.npz` for multiple arrays (see the sketch after this list).
- Is the file larger than your RAM? Use `np.memmap` to access disk segments lazily.
- Reading a clean CSV? Use `np.loadtxt` (fast).
- Reading a messy CSV with missing values? Use `np.genfromtxt` (feature-rich but slower).
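The two fast-path branches are easy to exercise end to end. A minimal sketch, assuming throwaway placeholder file names (`single.npy`, `clean.csv`):

```python
import numpy as np

# Single array destined for NumPy reuse -> .npy round trip
arr = np.arange(12, dtype=np.float64).reshape(3, 4)
np.save("single.npy", arr)          # placeholder file name
restored = np.load("single.npy")
assert np.array_equal(arr, restored)

# Clean, fully numeric CSV -> np.loadtxt is the fast parser
np.savetxt("clean.csv", arr, delimiter=",")
table = np.loadtxt("clean.csv", delimiter=",")
print(table.shape)                  # (3, 4)
```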
Workflows
- Handling Large Disk-Bound Arrays (sketched below)
  - Create a memory-mapped file with `np.memmap(filename, dtype='float32', mode='w+', shape=shape)`.
  - Process or fill the array as if it were in memory.
  - Call `.flush()` to ensure all data is written to the physical disk.
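A minimal sketch of this workflow; the file name, shape, and chunk size are illustrative placeholders, not part of the skill's scripts:

```python
import numpy as np

shape = (100_000, 128)                     # hypothetical dimensions
filename = "big_array.dat"                 # placeholder path

# Step 1: create a disk-backed array ('w+' creates or overwrites the file)
mm = np.memmap(filename, dtype="float32", mode="w+", shape=shape)

# Step 2: fill it in chunks as if it were an ordinary in-memory ndarray
chunk = 10_000
for start in range(0, shape[0], chunk):
    stop = min(start + chunk, shape[0])
    mm[start:stop] = np.random.rand(stop - start, shape[1])

# Step 3: force pending writes out to the physical file
mm.flush()

# Later: reopen read-only without pulling the whole file into RAM
ro = np.memmap(filename, dtype="float32", mode="r", shape=shape)
print(float(ro[:1000].mean()))
```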
- Importing Messy CSV Data (sketched below)
  - Define a dictionary of converters for specific columns.
  - Use `np.genfromtxt(csv_file, delimiter=',', skip_header=1, converters=converters)`.
  - Access the data, which now has NaNs or default values where entries were missing.
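A minimal sketch of the converter approach, using an in-memory `StringIO` standing in for `csv_file`; the column layout, the `N/A` sentinel, and the `to_float` helper are hypothetical:

```python
import numpy as np
from io import StringIO

# Hypothetical messy input: header row, an empty price, an "N/A" quantity
csv_file = StringIO(
    "price,quantity\n"
    "1.50,10\n"
    ",5\n"
    "2.75,N/A\n"
)

def to_float(field, default=np.nan):
    """Convert one raw field to float; empty or N/A becomes `default`."""
    text = field.decode() if isinstance(field, bytes) else field
    text = text.strip()
    return float(text) if text and text != "N/A" else default

# Step 1: converters keyed by column index
converters = {0: to_float, 1: lambda f: to_float(f, default=-1.0)}

# Step 2: parse, skipping the header row
data = np.genfromtxt(csv_file, delimiter=",", skip_header=1,
                     converters=converters)

# Step 3: missing entries are now NaN (price) or -1.0 (quantity)
print(data)
```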
- Bundling Multiple Arrays for Storage (sketched below)
  - Identify several related ndarrays.
  - Save them into a single archive with `np.savez_compressed('data.npz', arr1=arr1, arr2=arr2)`.
  - Load them later using `data = np.load('data.npz')` and access individual arrays via `data['arr1']`.
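A minimal sketch of the bundle-and-restore round trip; the array names and the `layer.npz` path are placeholders:

```python
import numpy as np

# Hypothetical related arrays, e.g. parameters belonging to one model layer
weights = np.random.rand(64, 32).astype(np.float32)
bias = np.zeros(32, dtype=np.float32)

# Bundle into one compressed archive; keyword names become the archive keys
np.savez_compressed("layer.npz", weights=weights, bias=bias)

# np.load returns a lazy NpzFile; each array is read when its key is accessed
with np.load("layer.npz") as data:
    print(data.files)                         # e.g. ['weights', 'bias']
    restored = data["weights"]
    assert np.array_equal(weights, restored)
```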
Non-Obvious Insights
- Manual Flush Requirement: Changes to a `memmap` array are not guaranteed to persist on disk until `.flush()` is called.
- NPZ Lazy Loading: `.npz` files are ZIP archives; `np.load` on an `.npz` does not read the data until you access a specific key.
- Parser Capability: `genfromtxt` handles column selection, comment characters, and missing-value substitution automatically, making it the primary tool for "real-world" text data (see the sketch below).
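The parser-capability point also holds without custom converters: `genfromtxt` can substitute missing values declaratively. A minimal sketch, with made-up data and sentinel strings:

```python
import numpy as np
from io import StringIO

raw = StringIO(
    "# sensor dump (comment line is skipped automatically)\n"
    "1.50,10\n"
    ",5\n"
    "2.75,N/A\n"
)

# missing_values / filling_values are keyed by column index:
# treat "N/A" in column 1 as missing and fill it with -1.0;
# empty fields default to NaN in the float output.
data = np.genfromtxt(raw, delimiter=",", comments="#",
                     missing_values={1: "N/A"},
                     filling_values={1: -1.0})
print(data)
```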
Evidence
- "Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory." Source
- "savez_compressed... Save several arrays into a single file in compressed .npz format." Source
Scripts
- `scripts/numpy-io_tool.py`: Functions for memmap creation and compressed npz saving.
- `scripts/numpy-io_tool.js`: Basic CSV line parser simulation.
Dependencies
- `numpy` (Python)
References
- Repository: majiayu000/claude-skill-registry (skills/data/numpy-io), by majiayu000