lammpskit.io.lammps_readers module

LAMMPS trajectory file readers for molecular dynamics analysis. This module provides the core I/O functionality for parsing LAMMPS dump files and extracting simulation metadata with robust error handling.

Functions Overview

Core functions for LAMMPS trajectory file reading:

  • read_structure_info - Extract metadata from LAMMPS dump files

  • read_coordinates - Load atomic coordinates with selective data access

File Format Support

Supports standard LAMMPS dump format with headers:

ITEM: TIMESTEP
ITEM: NUMBER OF ATOMS
ITEM: BOX BOUNDS [units]
ITEM: ATOMS [column headers]

Standard Column Layout: id type charge x y z vx vy vz fx fy fz

Usage Examples

Extract simulation metadata:

from lammpskit.io import read_structure_info

timestep, atoms, xlo, xhi, ylo, yhi, zlo, zhi = read_structure_info('dump.lammpstrj')
electrode_separation = zhi - zlo

Load coordinates for time-series analysis:

import glob
from lammpskit.io import read_coordinates
from lammpskit.config import DEFAULT_COLUMNS_TO_READ

files = sorted(glob.glob('dump.*.lammpstrj'))
coords, timesteps, atoms, *box = read_coordinates(
    files, skip_rows=9, columns_to_read=DEFAULT_COLUMNS_TO_READ)

Error Handling

Functions use fail-fast approach with descriptive error messages:

Module Documentation

LAMMPS trajectory file readers for molecular dynamics analysis.

This module provides robust I/O functionality for parsing LAMMPS dump files and extracting simulation metadata. Functions are designed for general-purpose use across different MD analysis workflows, with particular emphasis on electrochemical cell simulations.

Key Features

  • Automatic LAMMPS dump format detection and parsing

  • Robust error handling for malformed or incomplete files

  • Memory-efficient coordinate loading with column selection

  • Batch processing capabilities for multi-trajectory analysis

  • Simulation box parameter extraction for spatial analysis

File Format Support

Supports standard LAMMPS dump format with headers: - ITEM: TIMESTEP - ITEM: NUMBER OF ATOMS - ITEM: BOX BOUNDS [units] - ITEM: ATOMS [column headers]

Performance Notes

Memory usage: O(N * M * C) where N=atoms, M=files, C=columns For large trajectories (>10^6 atoms), use column selection to minimize memory footprint. Batch processing is optimized for time-series analysis workflows.

Error Handling Philosophy

Functions use fail-fast approach with descriptive error messages to catch file format issues early. Validation ensures data consistency across multi-file analysis workflows.

Examples

Extract simulation metadata:

>>> timestep, atoms, xlo, xhi, ylo, yhi, zlo, zhi = read_structure_info('dump.lammpstrj')

Load coordinates for analysis:

>>> from lammpskit.config import DEFAULT_COLUMNS_TO_READ
>>> coords, timesteps, atoms, xlo, xhi, ylo, yhi, zlo, zhi = read_coordinates(
...     ['dump1.lammpstrj', 'dump2.lammpstrj'], skip_rows=9,
...     columns_to_read=DEFAULT_COLUMNS_TO_READ)
lammpskit.io.lammps_readers.read_structure_info(filepath)[source]

Extract simulation metadata from LAMMPS trajectory file header.

Parses LAMMPS dump file to extract timestep, atom count, and simulation box dimensions. Essential for setting up analysis workflows and validating trajectory consistency. Robust to common file format variations and provides detailed error diagnostics.

Parameters:

filepath (str) – Path to LAMMPS trajectory file (.lammpstrj or .dump format). Supports both absolute and relative paths.

Return type:

Tuple[int, int, float, float, float, float, float, float]

Returns:

  • timestep (int) – Simulation timestep number. Used for temporal analysis and file sequencing.

  • total_atoms (int) – Total number of atoms in simulation. Critical for memory allocation and validation.

  • xlo, xhi (float) – Lower and upper x-bounds of simulation box (Angstroms). Defines spatial domain for analysis.

  • ylo, yhi (float) – Lower and upper y-bounds of simulation box (Angstroms). Used for periodic boundary condition handling.

  • zlo, zhi (float) – Lower and upper z-bounds of simulation box (Angstroms). Essential for layer analysis in electrochemical cells.

Raises:
  • FileNotFoundError – If trajectory file doesn’t exist at specified path.

  • EOFError – If file is truncated or missing required header sections.

  • ValueError – If header data is malformed or non-numeric values found.

  • OSError – If file permissions or disk I/O errors occur.

Notes

Function expects standard LAMMPS dump format with fixed header structure. Box bounds are returned in simulation units (typically Angstroms for MD). For triclinic cells, only orthogonal bounds are extracted.

Performance

Computational complexity: O(1) - reads only file header Memory usage: O(1) - minimal memory footprint Typical execution time: <1ms for standard trajectory files

Examples

Extract metadata for single trajectory:

>>> timestep, atoms, xlo, xhi, ylo, yhi, zlo, zhi = read_structure_info('dump.100000.lammpstrj')
>>> box_size_z = zhi - zlo  # Calculate electrode separation
>>> print(f"Timestep {timestep}: {atoms} atoms, box height {box_size_z:.2f} Å")

Validate trajectory sequence:

>>> import glob
>>> files = sorted(glob.glob('dump.*.lammpstrj'))
>>> for f in files:
...     ts, atoms, *box = read_structure_info(f)
...     print(f"File {f}: timestep {ts}, {atoms} atoms")
lammpskit.io.lammps_readers.read_coordinates(file_list, skip_rows, columns_to_read)[source]

Load atomic coordinates and metadata from multiple LAMMPS trajectory files.

Efficiently reads trajectory sequences for time-series analysis, extracting atomic coordinates and simulation parameters. Optimized for batch processing in electrochemical cell analysis workflows. Provides comprehensive validation and memory-efficient loading with selective column reading.

Parameters:
  • file_list (list of str) – Trajectory files in temporal order. Typically generated from glob patterns like ‘dump.*.lammpstrj’ or timestep sequences.

  • skip_rows (int) – Header lines to skip before atomic data section. Standard LAMMPS format uses 9. Accounts for TIMESTEP, NUMBER OF ATOMS, BOX BOUNDS, and ATOMS header lines.

  • columns_to_read (tuple of int) – Column indices for atomic properties. Standard LAMMPS format: (0=id, 1=type, 2=charge, 3=x, 4=y, 5=z, 6=vx, 7=vy, 8=vz, …) Use DEFAULT_COLUMNS_TO_READ or EXTENDED_COLUMNS_TO_READ from config.

Return type:

Tuple[ndarray, ndarray, int, float, float, float, float, float, float]

Returns:

  • coordinates (np.ndarray, shape (n_files, n_atoms, n_columns)) – Atomic coordinate arrays for all files. First dimension indexes files, second dimension indexes atoms, third dimension indexes properties.

  • timestep_arr (np.ndarray, shape (n_files,)) – Simulation timesteps corresponding to each file. Used for temporal analysis.

  • total_atoms (int) – Number of atoms per file. Validated for consistency across all files.

  • xlo, xhi (float) – Simulation box x-bounds in Angstroms. Used for periodic boundary calculations.

  • ylo, yhi (float) – Simulation box y-bounds in Angstroms. Essential for spatial analysis setup.

  • zlo, zhi (float) – Simulation box z-bounds in Angstroms. Critical for electrode separation in electrochemical cell analysis.

Raises:
  • ValueError – If file_list is empty, column indices are invalid, or atomic data is malformed.

  • EOFError – If any file has fewer atomic lines than expected from header.

  • FileNotFoundError – If any file in file_list doesn’t exist (raised by validate_file_list).

  • Performance Notes

  • -----------------

  • Memory complexity – O(F * N * C) where F=files, N=atoms, C=columns:

  • Time complexity – O(F * N) for coordinate loading:

  • Memory optimization – Use column selection to reduce memory footprint by ~70%:

  • For large datasets (>1GB):

  • - Use DEFAULT_COLUMNS_TO_READ instead of EXTENDED_COLUMNS_TO_READ

  • - Process files in smaller batches if memory constraints exist

  • - Consider chunked reading for very large trajectories

  • Electrochemical Cell Applications

  • ---------------------------------

  • Typical usage patterns for HfTaO electrochemical analysis:

  • - Electrode separation – zhi - zlo (typically 20-100 Angstroms):

  • - Atom types – 2=Hf, odd=O, even(≠2)=Ta, {5,6,9,10}=electrodes:

  • - Time series – Multiple files representing voltage cycling or SET/RESET processes:

Examples

Load coordinate sequence for filament analysis:

>>> import glob
>>> from lammpskit.config import DEFAULT_COLUMNS_TO_READ
>>> files = sorted(glob.glob('trajectory_*.lammpstrj'))
>>> coords, timesteps, atoms, xlo, xhi, ylo, yhi, zlo, zhi = read_coordinates(
...     files, skip_rows=9, columns_to_read=DEFAULT_COLUMNS_TO_READ)
>>> print(f"Loaded {len(files)} files: {coords.shape}")
>>> electrode_separation = zhi - zlo
>>> print(f"Electrode separation: {electrode_separation:.1f} Å")

Memory-efficient loading for large trajectories:

>>> # Use core columns only: id, type, charge, x, y, z
>>> core_columns = (0, 1, 2, 3, 4, 5)
>>> coords, timesteps, atoms, *box = read_coordinates(
...     files[:10], skip_rows=9, columns_to_read=core_columns)  # First 10 files only

Validate trajectory consistency:

>>> coords, timesteps, atoms, *box = read_coordinates(files, 9, DEFAULT_COLUMNS_TO_READ)
>>> print(f"Trajectory spans timesteps {timesteps[0]} to {timesteps[-1]}")
>>> print(f"Consistent atom count: {atoms} across {len(files)} files")