lammpskit.io.read_coordinates
- lammpskit.io.read_coordinates(file_list, skip_rows, columns_to_read)
Load atomic coordinates and metadata from multiple LAMMPS trajectory files.
Reads a sequence of trajectory files for time-series analysis, extracting the atomic coordinates together with the timestep, atom count, and box bounds from each file. Intended for batch processing in electrochemical cell analysis workflows. The file list, column indices, and per-file atom counts are validated, and only the selected columns are loaded, which limits memory use.
- Parameters:
file_list (list of str) – Trajectory files in temporal order. Typically generated from glob patterns like ‘dump.*.lammpstrj’ or timestep sequences.
skip_rows (int) – Header lines to skip before atomic data section. Standard LAMMPS format uses 9. Accounts for TIMESTEP, NUMBER OF ATOMS, BOX BOUNDS, and ATOMS header lines.
columns_to_read (tuple of int) – Column indices for atomic properties. Standard LAMMPS format: (0=id, 1=type, 2=charge, 3=x, 4=y, 5=z, 6=vx, 7=vy, 8=vz, …) Use DEFAULT_COLUMNS_TO_READ or EXTENDED_COLUMNS_TO_READ from config.
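As an illustration of why skip_rows is 9, the sketch below parses the header of one dump frame by hand. It is not lammpskit's own parser: parse_header and the sample values are hypothetical, and an orthogonal box (two numbers per BOX BOUNDS line) is assumed.

```python
import io

# Illustrative frame from a LAMMPS dump file (all values are made up).
SAMPLE = """ITEM: TIMESTEP
5000
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp ff
0.0 50.0
0.0 50.0
-10.0 40.0
ITEM: ATOMS id type q x y z
1 2 1.2 1.0 2.0 3.0
2 1 -0.6 4.0 5.0 6.0
"""


def parse_header(handle, skip_rows=9):
    """Read the header lines and return (timestep, n_atoms, bounds).

    Hypothetical helper: assumes the standard 9-line header and an
    orthogonal box, so each bounds line holds exactly lo and hi.
    """
    lines = [handle.readline() for _ in range(skip_rows)]
    timestep = int(lines[1])   # line after "ITEM: TIMESTEP"
    n_atoms = int(lines[3])    # line after "ITEM: NUMBER OF ATOMS"
    bounds = [tuple(float(v) for v in lines[i].split()[:2]) for i in (5, 6, 7)]
    return timestep, n_atoms, bounds


handle = io.StringIO(SAMPLE)
header = parse_header(handle)

# After the header, each remaining line is one atom; pick columns by index.
columns_to_read = (0, 1, 3, 4, 5)  # id, type, x, y, z in this sample
rows = [[float(line.split()[c]) for c in columns_to_read] for line in handle]
```

The ninth header line (ITEM: ATOMS ...) names the columns actually written by the dump command, so it is worth checking against the indices passed in columns_to_read.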
- Return type:
Tuple[ndarray, ndarray, int, float, float, float, float, float, float]
- Returns:
coordinates (np.ndarray, shape (n_files, n_atoms, n_columns)) – Atomic coordinate arrays for all files. First dimension indexes files, second dimension indexes atoms, third dimension indexes properties.
timestep_arr (np.ndarray, shape (n_files,)) – Simulation timesteps corresponding to each file. Used for temporal analysis.
total_atoms (int) – Number of atoms per file. Validated for consistency across all files.
xlo, xhi (float) – Simulation box x-bounds in Angstroms. Used for periodic boundary calculations.
ylo, yhi (float) – Simulation box y-bounds in Angstroms. Essential for spatial analysis setup.
zlo, zhi (float) – Simulation box z-bounds in Angstroms. Critical for electrode separation in electrochemical cell analysis.
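The three-dimensional layout of coordinates can be explored with plain NumPy indexing. The sketch below uses a synthetic array in place of a real return value, and it assumes that the third axis follows the order of columns_to_read; that ordering is an assumption, not something stated above.

```python
import numpy as np

# Synthetic stand-in for the returned array: 3 files, 4 atoms,
# columns (id, type, charge, x, y, z).
columns_to_read = (0, 1, 2, 3, 4, 5)
n_files, n_atoms = 3, 4
coordinates = np.zeros((n_files, n_atoms, len(columns_to_read)))

# Fake z values: z[file, atom] = file index + atom index.
coordinates[:, :, 5] = np.arange(n_files)[:, None] + np.arange(n_atoms)

# Position of the z column within the selected columns
# (assumes the third axis is ordered like columns_to_read).
z_col = columns_to_read.index(5)

z_first_file = coordinates[0, :, z_col]                  # shape (n_atoms,)
z_row0_series = coordinates[:, 0, z_col]                 # shape (n_files,)
mean_z_per_file = coordinates[:, :, z_col].mean(axis=1)  # shape (n_files,)
```

Row 0 refers to the same physical atom in every file only if the dump files list atoms in a consistent order, for example sorted by id.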
- Raises:
ValueError – If file_list is empty, column indices are invalid, or atomic data is malformed.
EOFError – If any file has fewer atomic lines than expected from header.
FileNotFoundError – If any file in file_list doesn’t exist (raised by validate_file_list).
Performance Notes
-----------------
Memory complexity: O(F * N * C), where F = files, N = atoms, C = columns.
Time complexity: O(F * N) for coordinate loading.
Memory optimization: use column selection to reduce the memory footprint by ~70%.
For large datasets (>1GB):
- Use DEFAULT_COLUMNS_TO_READ instead of EXTENDED_COLUMNS_TO_READ
- Process files in smaller batches if memory constraints exist
- Consider chunked reading for very large trajectories
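The O(F * N * C) memory estimate and the batching advice can be made concrete with a few lines of arithmetic. This is a rough sketch: estimate_bytes and batched are hypothetical helpers, the file names and sizes are invented, and 8 bytes per value assumes the array is stored as float64.

```python
def estimate_bytes(n_files, n_atoms, n_columns, itemsize=8):
    """Size of a dense (n_files, n_atoms, n_columns) array.

    itemsize=8 assumes float64 storage, which is an assumption here.
    """
    return n_files * n_atoms * n_columns * itemsize


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical trajectory: 200 files of 50,000 atoms each.
full = estimate_bytes(200, 50_000, 9)  # nine columns selected
core = estimate_bytes(200, 50_000, 6)  # six columns selected
saving = 1 - core / full               # fraction of memory saved

# Hypothetical file names, split into batches of 10.
files = [f"dump.{i:04d}.lammpstrj" for i in range(25)]
batches = list(batched(files, 10))
```

The saving from column selection is simply one minus the ratio of column counts, so the actual figure depends on how many columns the dump files contain. Each batch would be passed to read_coordinates on its own, keeping only one batch in memory at a time.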
Electrochemical Cell Applications
---------------------------------
Typical usage patterns for HfTaO electrochemical analysis:
- Electrode separation: zhi - zlo (typically 20-100 Angstroms)
- Atom types: 2=Hf, odd=O, even(≠2)=Ta, {5,6,9,10}=electrodes
- Time series: multiple files representing voltage cycling or SET/RESET processes
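The atom-type convention above can be written as a small lookup. Types 5, 6, 9 and 10 also satisfy the odd/even rules, so this sketch assumes that electrode membership takes precedence; that precedence and the helper name classify_atom_type are assumptions, not part of the documented API.

```python
ELECTRODE_TYPES = {5, 6, 9, 10}


def classify_atom_type(atom_type):
    """Map a LAMMPS atom type to a species label (hypothetical helper)."""
    # Assumption: electrode types are checked first, because they
    # would otherwise also match the odd (O) or even (Ta) rules.
    if atom_type in ELECTRODE_TYPES:
        return "electrode"
    if atom_type == 2:
        return "Hf"
    if atom_type % 2 == 1:
        return "O"
    return "Ta"


# Labels for atom types 1 through 10.
labels = {t: classify_atom_type(t) for t in range(1, 11)}
```

Applied to the type column of a loaded frame, such a lookup separates electrode atoms from the oxide before any spatial analysis.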
Examples
Load coordinate sequence for filament analysis:
>>> import glob
>>> from lammpskit.config import DEFAULT_COLUMNS_TO_READ
>>> from lammpskit.io import read_coordinates
>>> # sorted() is lexicographic; temporal order requires zero-padded timesteps in the file names
>>> files = sorted(glob.glob('trajectory_*.lammpstrj'))
>>> coords, timesteps, atoms, xlo, xhi, ylo, yhi, zlo, zhi = read_coordinates(
...     files, skip_rows=9, columns_to_read=DEFAULT_COLUMNS_TO_READ)
>>> print(f"Loaded {len(files)} files: {coords.shape}")
>>> electrode_separation = zhi - zlo
>>> print(f"Electrode separation: {electrode_separation:.1f} Å")
Memory-efficient loading for large trajectories:
>>> # Use core columns only: id, type, charge, x, y, z
>>> core_columns = (0, 1, 2, 3, 4, 5)
>>> coords, timesteps, atoms, *box = read_coordinates(
...     files[:10], skip_rows=9, columns_to_read=core_columns)  # First 10 files only
Validate trajectory consistency:
>>> coords, timesteps, atoms, *box = read_coordinates(files, 9, DEFAULT_COLUMNS_TO_READ)
>>> print(f"Trajectory spans timesteps {timesteps[0]} to {timesteps[-1]}")
>>> print(f"Consistent atom count: {atoms} across {len(files)} files")