lammpskit.io package

The lammpskit.io package provides essential I/O functionality for reading and parsing LAMMPS molecular dynamics simulation output files. This module is designed for robust handling of trajectory data with comprehensive error checking and memory-efficient processing.

Key Features

  • Robust LAMMPS dump file parsing with automatic format detection

  • Memory-efficient coordinate loading with selective column reading

  • Batch processing capabilities for time-series analysis

  • Comprehensive error handling with descriptive failure messages

  • Support for large datasets with optimized memory management

Core Functions

The I/O module provides two primary functions for trajectory data access:

  • read_structure_info - Extract metadata from LAMMPS dump files

  • read_coordinates - Load atomic coordinates with selective data access

Performance Considerations

Memory usage scales as O(F × N × C), where:

  • F = number of files

  • N = number of atoms

  • C = number of columns read

For large datasets (>1GB), use DEFAULT_COLUMNS_TO_READ instead of EXTENDED_COLUMNS_TO_READ to reduce memory footprint by ~60%.
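
As a rough illustration of the scaling above, the sketch below estimates the size of the loaded coordinate array. It assumes each value is stored as an 8-byte float (NumPy's default float64), which this page does not state. Both helper names are hypothetical and are not part of lammpskit.

```python
def estimate_coordinate_bytes(n_files, n_atoms, n_columns, bytes_per_value=8):
    """Estimate memory for an (n_files, n_atoms, n_columns) array.

    bytes_per_value=8 assumes float64 storage (an assumption, not
    confirmed by the lammpskit documentation).
    """
    return n_files * n_atoms * n_columns * bytes_per_value


def column_savings(n_columns_small, n_columns_large):
    """Fraction of memory saved by reading fewer columns per atom."""
    return 1.0 - n_columns_small / n_columns_large


# Example: 100 files of 50,000 atoms with 6 columns each.
size = estimate_coordinate_bytes(100, 50_000, 6)
print(f"{size / 1e6:.0f} MB")  # 240 MB
```

Because the estimate is linear in C, the saving from a smaller column set depends only on the ratio of the two column counts.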

Module contents

I/O utilities for LAMMPSKit.

This module provides general-purpose file reading and writing utilities for LAMMPS simulation data that can be used across different analysis types.

lammpskit.io.read_structure_info(filepath)

Extract simulation metadata from LAMMPS trajectory file header.

Parses LAMMPS dump file to extract timestep, atom count, and simulation box dimensions. Essential for setting up analysis workflows and validating trajectory consistency. Robust to common file format variations and provides detailed error diagnostics.

Parameters:

filepath (str) – Path to LAMMPS trajectory file (.lammpstrj or .dump format). Supports both absolute and relative paths.

Return type:

Tuple[int, int, float, float, float, float, float, float]

Returns:

  • timestep (int) – Simulation timestep number. Used for temporal analysis and file sequencing.

  • total_atoms (int) – Total number of atoms in simulation. Critical for memory allocation and validation.

  • xlo, xhi (float) – Lower and upper x-bounds of simulation box (Angstroms). Defines spatial domain for analysis.

  • ylo, yhi (float) – Lower and upper y-bounds of simulation box (Angstroms). Used for periodic boundary condition handling.

  • zlo, zhi (float) – Lower and upper z-bounds of simulation box (Angstroms). Essential for layer analysis in electrochemical cells.

Raises:
  • FileNotFoundError – If trajectory file doesn’t exist at specified path.

  • EOFError – If file is truncated or missing required header sections.

  • ValueError – If header data is malformed or non-numeric values found.

  • OSError – If file permissions or disk I/O errors occur.
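
One way to use the exceptions listed above is to separate readable files from problematic ones before starting an analysis. The sketch below is a hypothetical helper, not part of lammpskit. It takes the reader as a parameter, so `read_structure_info` can be passed in directly, and the demo uses a stub reader so it runs without any trajectory files.

```python
def partition_readable(paths, reader):
    """Split paths into (results, failures) using the given reader callable.

    reader is any function that takes a path and raises one of the
    documented exceptions on failure, e.g. lammpskit.io.read_structure_info.
    """
    results, failures = {}, {}
    for path in paths:
        try:
            results[path] = reader(path)
        except (EOFError, ValueError, OSError) as exc:
            # FileNotFoundError is a subclass of OSError, so it is caught here too.
            failures[path] = f"{type(exc).__name__}: {exc}"
    return results, failures


def _stub_reader(path):
    # Stand-in for a real reader (illustration only).
    if path == "missing.lammpstrj":
        raise FileNotFoundError(path)
    return (0, 10, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0)


ok, bad = partition_readable(["a.lammpstrj", "missing.lammpstrj"], _stub_reader)
print(sorted(ok), sorted(bad))
```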

Notes

Function expects standard LAMMPS dump format with fixed header structure. Box bounds are returned in simulation units (typically Angstroms for MD). For triclinic cells, only orthogonal bounds are extracted.
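
To make the fixed header structure concrete, here is a minimal sketch of how the nine header lines of a dump snapshot could be parsed. It is not the lammpskit implementation; `parse_dump_header` is a hypothetical name. It assumes the header layout shown in the sample string and takes only the first two values on each bounds line, ignoring a third (tilt) value if a triclinic file supplies one.

```python
def parse_dump_header(lines):
    """Parse the first 9 lines of a LAMMPS dump snapshot.

    Returns (timestep, total_atoms, xlo, xhi, ylo, yhi, zlo, zhi).
    """
    header = [line.strip() for line in lines[:9]]
    if len(header) < 9:
        raise EOFError("header is truncated: expected 9 lines")
    timestep = int(header[1])        # line after "ITEM: TIMESTEP"
    total_atoms = int(header[3])     # line after "ITEM: NUMBER OF ATOMS"
    bounds = []
    for line in header[5:8]:         # three lines after "ITEM: BOX BOUNDS"
        lo, hi = line.split()[:2]    # ignore a third (tilt) value if present
        bounds.extend((float(lo), float(hi)))
    return (timestep, total_atoms, *bounds)


sample = """\
ITEM: TIMESTEP
100000
ITEM: NUMBER OF ATOMS
4
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 20.0
ITEM: ATOMS id type q x y z
"""

info = parse_dump_header(sample.splitlines())
print(info)  # (100000, 4, 0.0, 10.0, 0.0, 10.0, 0.0, 20.0)
```

Non-numeric header values surface as ValueError from `int` or `float`, and a short header raises EOFError, mirroring the exception types documented above.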

Performance

  • Computational complexity: O(1), since only the file header is read

  • Memory usage: O(1), minimal memory footprint

  • Typical execution time: <1 ms for standard trajectory files

Examples

Extract metadata for single trajectory:

>>> from lammpskit.io import read_structure_info
>>> timestep, atoms, xlo, xhi, ylo, yhi, zlo, zhi = read_structure_info('dump.100000.lammpstrj')
>>> box_size_z = zhi - zlo  # Box height along z (electrode separation)
>>> print(f"Timestep {timestep}: {atoms} atoms, box height {box_size_z:.2f} Å")

Validate trajectory sequence:

>>> import glob
>>> files = sorted(glob.glob('dump.*.lammpstrj'))
>>> for f in files:
...     ts, atoms, *box = read_structure_info(f)
...     print(f"File {f}: timestep {ts}, {atoms} atoms")

lammpskit.io.read_coordinates(file_list, skip_rows, columns_to_read)

Load atomic coordinates and metadata from multiple LAMMPS trajectory files.

Efficiently reads trajectory sequences for time-series analysis, extracting atomic coordinates and simulation parameters. Optimized for batch processing in electrochemical cell analysis workflows. Provides comprehensive validation and memory-efficient loading with selective column reading.

Parameters:
  • file_list (list of str) – Trajectory files in temporal order. Typically generated from glob patterns like ‘dump.*.lammpstrj’ or timestep sequences.

  • skip_rows (int) – Header lines to skip before atomic data section. Standard LAMMPS format uses 9. Accounts for TIMESTEP, NUMBER OF ATOMS, BOX BOUNDS, and ATOMS header lines.

  • columns_to_read (tuple of int) – Column indices for atomic properties. Standard LAMMPS format: (0=id, 1=type, 2=charge, 3=x, 4=y, 5=z, 6=vx, 7=vy, 8=vz, …) Use DEFAULT_COLUMNS_TO_READ or EXTENDED_COLUMNS_TO_READ from config.
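
The interaction between skip_rows and columns_to_read can be sketched with plain Python. This is an illustration only, not the lammpskit loader; `select_columns` is a hypothetical name. It assumes a single-snapshot dump whose ATOMS line lists the columns as `id type q x y z`. In practice the column order is whatever the LAMMPS dump command wrote, so indices should be checked against that line.

```python
def select_columns(lines, skip_rows, columns_to_read, n_atoms):
    """Return one list of floats per atom, keeping only the chosen columns."""
    rows = []
    for line in lines[skip_rows:skip_rows + n_atoms]:
        fields = line.split()
        try:
            rows.append([float(fields[i]) for i in columns_to_read])
        except IndexError:
            raise ValueError(f"column index out of range for line: {line!r}") from None
    if len(rows) < n_atoms:
        raise EOFError(f"expected {n_atoms} atom lines, found {len(rows)}")
    return rows


# Two-atom snapshot; the ATOMS line defines the column order for this file.
dump_text = """\
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 20.0
ITEM: ATOMS id type q x y z
1 2 0.5 1.0 2.0 3.0
2 1 -0.5 4.0 5.0 6.0
"""

# Keep id, type, x, y, z and drop the charge column (index 2).
rows = select_columns(dump_text.splitlines(), skip_rows=9,
                      columns_to_read=(0, 1, 3, 4, 5), n_atoms=2)
print(rows[0])  # [1.0, 2.0, 1.0, 2.0, 3.0]
```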

Return type:

Tuple[ndarray, ndarray, int, float, float, float, float, float, float]

Returns:

  • coordinates (np.ndarray, shape (n_files, n_atoms, n_columns)) – Atomic coordinate arrays for all files. First dimension indexes files, second dimension indexes atoms, third dimension indexes properties.

  • timestep_arr (np.ndarray, shape (n_files,)) – Simulation timesteps corresponding to each file. Used for temporal analysis.

  • total_atoms (int) – Number of atoms per file. Validated for consistency across all files.

  • xlo, xhi (float) – Simulation box x-bounds in Angstroms. Used for periodic boundary calculations.

  • ylo, yhi (float) – Simulation box y-bounds in Angstroms. Essential for spatial analysis setup.

  • zlo, zhi (float) – Simulation box z-bounds in Angstroms. Critical for electrode separation in electrochemical cell analysis.

Raises:
  • ValueError – If file_list is empty, column indices are invalid, or atomic data is malformed.

  • EOFError – If any file has fewer atomic lines than expected from header.

  • FileNotFoundError – If any file in file_list doesn’t exist (raised by validate_file_list).

Performance Notes

  • Memory complexity: O(F × N × C), where F = files, N = atoms, C = columns

  • Time complexity: O(F × N) for coordinate loading

  • Memory optimization: use column selection to reduce the memory footprint by ~70%

For large datasets (>1GB):

  • Use DEFAULT_COLUMNS_TO_READ instead of EXTENDED_COLUMNS_TO_READ

  • Process files in smaller batches if memory constraints exist

  • Consider chunked reading for very large trajectories

Electrochemical Cell Applications

Typical usage patterns for HfTaO electrochemical analysis:

  • Electrode separation: zhi - zlo (typically 20-100 Angstroms)

  • Atom types: 2=Hf, odd=O, even(≠2)=Ta, {5,6,9,10}=electrodes

  • Time series: multiple files representing voltage cycling or SET/RESET processes
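
The atom-type convention listed above can be written as a small lookup. The sketch below is a hypothetical helper, not part of lammpskit. The electrode set {5, 6, 9, 10} overlaps the odd/even rule, and this page does not say which takes precedence, so the sketch reports the species and the electrode flag separately instead of choosing.

```python
ELECTRODE_TYPES = frozenset({5, 6, 9, 10})


def classify_atom_type(atom_type):
    """Return (species, is_electrode) for a positive integer atom type.

    Species follows the stated rule: 2 = Hf, odd = O, even (not 2) = Ta.
    is_electrode is reported separately because precedence between the
    two rules is not specified in the source (assumption-free by design).
    """
    if atom_type == 2:
        species = "Hf"
    elif atom_type % 2 == 1:
        species = "O"
    else:
        species = "Ta"
    return species, atom_type in ELECTRODE_TYPES


for t in (1, 2, 4, 5, 6):
    print(t, classify_atom_type(t))
```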

Examples

Load coordinate sequence for filament analysis:

>>> import glob
>>> from lammpskit.io import read_coordinates
>>> from lammpskit.config import DEFAULT_COLUMNS_TO_READ
>>> files = sorted(glob.glob('trajectory_*.lammpstrj'))
>>> coords, timesteps, atoms, xlo, xhi, ylo, yhi, zlo, zhi = read_coordinates(
...     files, skip_rows=9, columns_to_read=DEFAULT_COLUMNS_TO_READ)
>>> print(f"Loaded {len(files)} files: {coords.shape}")
>>> electrode_separation = zhi - zlo
>>> print(f"Electrode separation: {electrode_separation:.1f} Å")

Memory-efficient loading for large trajectories:

>>> # Use core columns only: id, type, charge, x, y, z
>>> core_columns = (0, 1, 2, 3, 4, 5)
>>> coords, timesteps, atoms, *box = read_coordinates(
...     files[:10], skip_rows=9, columns_to_read=core_columns)  # First 10 files only
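
The `files[:10]` slice above loads only one batch. To walk an entire trajectory in batches of bounded size, a generator along the following lines could be used. `iter_batches` is a hypothetical helper, not part of lammpskit, and the file names are made up for the demonstration.

```python
def iter_batches(file_list, batch_size):
    """Yield consecutive slices of file_list with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(file_list), batch_size):
        yield file_list[start:start + batch_size]


# Hypothetical file names, used only to show the batch sizes.
example_files = [f"dump.{i}.lammpstrj" for i in range(25)]
batch_lengths = [len(batch) for batch in iter_batches(example_files, 10)]
print(batch_lengths)  # [10, 10, 5]
```

Each batch can then be passed as file_list to read_coordinates and reduced to summary quantities before the next batch is loaded, so that only one batch of coordinates is held in memory at a time.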

Validate trajectory consistency:

>>> coords, timesteps, atoms, *box = read_coordinates(files, 9, DEFAULT_COLUMNS_TO_READ)
>>> print(f"Trajectory spans timesteps {timesteps[0]} to {timesteps[-1]}")
>>> print(f"Consistent atom count: {atoms} across {len(files)} files")