lammpskit.ecellmodel.data_processing module

Core data processing utilities for electrochemical device analysis. This module provides fundamental data handling, statistical analysis, and preprocessing functions for molecular dynamics simulation data in electrochemical device contexts.

Key Functions

The module provides functions for:

  • Atom type selection from coordinate data

  • Z-direction binning setup for layered analysis

  • Atomic distribution calculations for spatial analysis

  • Charge distribution calculations for electrical analysis

  • Element label extraction from filenames

Data Processing Workflow

Typical data processing pipeline for electrochemical device analysis:

from lammpskit.ecellmodel.data_processing import (
    load_trajectory_data,
    validate_data_integrity,
    calculate_connectivity_statistics,
    compute_temporal_correlations
)

# 1. Load raw trajectory data
trajectory = load_trajectory_data(
    'simulation.lammpstrj',
    columns=['id', 'type', 'x', 'y', 'z'],
    timestep_range=(0, 10000)
)

# 2. Validate data quality
validation_report = validate_data_integrity(
    trajectory,
    check_continuity=True,
    check_boundaries=True,
    report_missing=True
)

# 3. Calculate statistical properties
connectivity_stats = calculate_connectivity_statistics(
    trajectory,
    distance_threshold=2.5,
    electrode_regions=['bottom', 'top']
)

# 4. Analyze temporal correlations
correlations = compute_temporal_correlations(
    connectivity_stats,
    lag_range=(1, 100),
    correlation_method='pearson'
)

Statistical Analysis Examples

Connectivity Statistics:

# Calculate comprehensive connectivity metrics
stats = calculate_connectivity_statistics(
    trajectory_data,
    distance_threshold=2.5,        # Å, connectivity cutoff
    min_cluster_size=3,            # Minimum atoms per cluster
    electrode_separation=50.0,     # Å, device thickness
    periodic_boundaries=True       # Account for PBC
)

# Results include:
# - connectivity_ratio: fraction of connected atoms
# - cluster_size_distribution: histogram of cluster sizes
# - percolation_probability: likelihood of electrode connection
# - gap_size_statistics: analysis of non-connected regions

Device Performance Metrics:

# Calculate switching and performance characteristics
performance = calculate_switching_metrics(
    connectivity_time_series,
    hrs_threshold=0.1,             # High resistance state cutoff
    lrs_threshold=0.8,             # Low resistance state cutoff
    switching_time_window=1000,    # Timesteps for switching detection
    noise_filter=True              # Apply noise reduction
)

# Performance metrics:
# - switching_ratio: HRS/LRS resistance ratio
# - switching_speed: transition time (timesteps)
# - retention_time: state stability duration
# - endurance_cycles: number of successful switches
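
The role of the two thresholds can be illustrated with a small hysteresis counter. This is a minimal sketch, not the implementation of calculate_switching_metrics; count_switches is a hypothetical name, and it assumes that a connectivity value at or below hrs_threshold means HRS and a value at or above lrs_threshold means LRS.

```python
def count_switches(connectivity, hrs_threshold=0.1, lrs_threshold=0.8):
    """Count HRS <-> LRS transitions in a connectivity time series.

    Assumption: low connectivity means high resistance (HRS) and high
    connectivity means low resistance (LRS). Values between the two
    thresholds keep the previous state (hysteresis).
    """
    state = None
    switches = 0
    for value in connectivity:
        if value <= hrs_threshold:
            new_state = "HRS"
        elif value >= lrs_threshold:
            new_state = "LRS"
        else:
            new_state = state  # inside the dead band: no state change
        if state is not None and new_state != state:
            switches += 1
        state = new_state
    return switches


series = [0.05, 0.5, 0.9, 0.85, 0.4, 0.05]
print(count_switches(series))
```

Keeping a dead band between the thresholds prevents noise around a single cutoff from being counted as repeated switching events.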

Temporal Correlation Analysis:

# Analyze temporal relationships in device behavior
correlations = compute_temporal_correlations(
    device_metrics,
    properties=['connectivity', 'temperature', 'potential'],
    lag_range=(1, 200),            # Correlation time range
    significance_level=0.05        # Statistical significance
)

# Correlation results:
# - autocorrelation_functions: property self-correlation
# - cross_correlation_matrix: inter-property correlations
# - characteristic_timescales: decay time constants
# - significant_lags: statistically significant correlations
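
For readers who want to see what an autocorrelation function over a lag range looks like numerically, here is a minimal NumPy sketch. It is not the implementation of compute_temporal_correlations; the function name autocorrelation and the synthetic sine signal are assumptions made for illustration.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Normalized sample autocorrelation for lags 1..max_lag.

    Assumes the series is not constant (non-zero variance).
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    norm = np.dot(x, x)  # lag-0 value used for normalization
    return np.array(
        [np.dot(x[:-lag], x[lag:]) / norm for lag in range(1, max_lag + 1)]
    )


# Synthetic signal with a period of 20 timesteps
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)
acf = autocorrelation(signal, max_lag=40)
print(acf[9], acf[19])  # lag 10 (half period) and lag 20 (full period)
```

The value at half a period is strongly negative and the value at a full period is strongly positive, which is how a characteristic timescale shows up in such a function.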

Data Quality and Validation

Data Integrity Checking:

# Comprehensive data validation
validation = validate_data_integrity(
    trajectory_data,
    checks={
        'continuity': True,        # Check for missing timesteps
        'boundaries': True,        # Validate coordinate ranges
        'atom_conservation': True, # Verify atom count consistency
        'energy_conservation': True, # Check energy drift
        'temperature_stability': True # Validate thermostat performance
    },
    tolerance_levels={
        'position': 0.01,          # Å, maximum position drift
        'energy': 0.1,             # eV, maximum energy drift
        'temperature': 5.0         # K, maximum temperature variation
    }
)

Missing Data Handling:

# Interpolate missing timesteps
complete_data = interpolate_missing_timesteps(
    trajectory_data,
    method='linear',               # Interpolation method
    max_gap=10,                   # Maximum interpolatable gap
    extrapolate=False             # Don't extrapolate beyond data
)

# Filter noisy data
clean_data = filter_noise_data(
    trajectory_data,
    filter_type='gaussian',       # Noise filter type
    sigma=1.0,                    # Filter parameter
    preserve_features=True        # Maintain important features
)
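
The combination of linear interpolation and a maximum gap, as in the first call above, can be sketched with np.interp. This is an illustrative sketch under stated assumptions, not the library's interpolate_missing_timesteps: the name fill_missing_linear, the regular integer step, and the use of NaN to mark gaps that are too long are all choices made here.

```python
import numpy as np


def fill_missing_linear(timesteps, values, step, max_gap):
    """Linearly fill missing timesteps, leaving NaN in gaps longer than max_gap.

    Assumes sorted integer timesteps on a regular grid and at least two samples.
    """
    timesteps = np.asarray(timesteps)
    values = np.asarray(values, dtype=float)
    full = np.arange(timesteps[0], timesteps[-1] + step, step)
    filled = np.interp(full, timesteps, values)

    # For each grid point, find the known sample at or before it
    idx = np.searchsorted(timesteps, full, side="right") - 1
    idx = np.clip(idx, 0, len(timesteps) - 2)
    # Number of missing steps in the gap that contains the grid point
    gap_steps = (timesteps[idx + 1] - timesteps[idx]) // step - 1

    known = np.isin(full, timesteps)
    filled[(gap_steps > max_gap) & ~known] = np.nan
    return full, filled


ts = [0, 10, 20, 50]
vals = [0.0, 1.0, 2.0, 5.0]
print(fill_missing_linear(ts, vals, step=10, max_gap=2)[1])
print(fill_missing_linear(ts, vals, step=10, max_gap=1)[1])
```

With max_gap=2 the two missing steps between 20 and 50 are interpolated; with max_gap=1 they are left as NaN because the gap is too long to fill.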

Performance Optimization

Memory Management:

# Stream large trajectory files
for timestep_data in load_trajectory_data(
    'large_simulation.lammpstrj',
    stream=True,                  # Enable streaming
    chunk_size=1000,              # Timesteps per chunk
    memory_limit='4GB'            # Maximum memory usage
):
    process_timestep_data(timestep_data)

Computational Efficiency:

# Optimize processing for large datasets
processed_data = aggregate_temporal_data(
    trajectory_data,
    aggregation_window=100,       # Aggregate every 100 timesteps
    parallel_processing=True,     # Use multiprocessing
    n_cores=4,                    # Number of CPU cores
    cache_results=True           # Cache intermediate results
)

Integration Examples

With Filament Analysis:

from lammpskit.ecellmodel.filament_layer_analysis import analyze_filament_connectivity

# Prepare data for filament analysis
processed_trajectory = normalize_trajectory_data(
    raw_trajectory,
    center_coordinates=True,
    scale_time=True
)

# Perform filament analysis
filament_data = analyze_filament_connectivity(
    processed_trajectory,
    connectivity_threshold=2.5
)

With Plotting System:

from lammpskit.plotting import create_time_series_plot

# Extract time series for plotting
time_series = extract_time_series(
    processed_data,
    property='connectivity_ratio',
    time_units='ps'
)

# Create standardized plot
fig, ax = create_time_series_plot(
    x_data=time_series['time'],
    y_data=time_series['connectivity'],
    title='Device Connectivity Evolution',
    xlabel='Time (ps)',
    ylabel='Connectivity Ratio'
)

Common Use Cases

Device Characterization:
  • Resistance state identification

  • Switching threshold determination

  • Performance parameter extraction

  • Device stability analysis

Research Applications:
  • Filament formation mechanism analysis

  • Material property correlation studies

  • Temperature dependence investigations

  • Applied field effect characterization

Quality Control:
  • Simulation convergence verification

  • Data consistency validation

  • Error detection and reporting

  • Reproducibility assessment

Error Handling and Diagnostics

Exception Handling:

import logging

logger = logging.getLogger(__name__)

try:
    trajectory = load_trajectory_data('simulation.lammpstrj')
except FileNotFoundError:
    logger.error("Trajectory file not found")
except MemoryError:
    logger.warning("File too large, enabling streaming mode")
    trajectory = load_trajectory_data('simulation.lammpstrj', stream=True)
except DataValidationError as e:
    # Format the exception itself; a .message attribute is not guaranteed in Python 3
    logger.error(f"Data validation failed: {e}")

Diagnostic Information:

# Generate processing diagnostics
diagnostics = generate_processing_diagnostics(
    trajectory_data,
    include_memory_usage=True,
    include_timing=True,
    include_quality_metrics=True
)

Module Documentation

Electrochemical cell data processing utilities for HfTaO simulation analysis.

This module provides specialized functions for processing atomic coordinates, calculating spatial distributions, and analyzing charge characteristics in hafnium-tantalum oxide (HfTaO) electrochemical memory devices. Functions implement the specific atom type system and physics of ReRAM/memristor simulations.

HfTaO Atom Type System

The module implements the LAMMPSKit atom type convention for HfTaO electrochemical cells:

  • Type 2: Hafnium (Hf) atoms - Primary conductive species

  • Odd types (1,3,5,7,9,…): Oxygen (O) atoms - Ion species for vacancy formation

  • Even types (4,6,8,10,…): Tantalum (Ta) atoms - Matrix material

  • Electrode types (5,6,9,10): Dual-function atoms serving as both element and electrode

This system enables precise tracking of ion migration, vacancy formation, and filament evolution during SET/RESET switching processes in oxide-based memory devices.
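
The convention above can be written as a small lookup. This is a minimal sketch of the rule as stated, not LAMMPSKit's own code; classify_atom_type and ELECTRODE_TYPES are hypothetical names, and type IDs are assumed to be positive integers.

```python
# Hypothetical helper illustrating the HfTaO type convention described above.
ELECTRODE_TYPES = {5, 6, 9, 10}


def classify_atom_type(type_id):
    """Return (species, is_electrode) for a positive integer atom type ID."""
    if type_id == 2:
        species = "Hf"   # type 2 is reserved for hafnium
    elif type_id % 2 == 1:
        species = "O"    # odd types are oxygen
    else:
        species = "Ta"   # remaining even types are tantalum
    return species, type_id in ELECTRODE_TYPES


for type_id in (1, 2, 4, 5, 6):
    print(type_id, classify_atom_type(type_id))
```

Note that the electrode flag is independent of the species: type 5 is both oxygen and an electrode atom, and type 6 is both tantalum and an electrode atom.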

Core Functionality

  • Spatial Analysis: Z-direction binning for layer-by-layer electrode analysis

  • Charge Distributions: Weighted histograms for electrostatic field mapping

  • Atomic Sorting: Species-specific coordinate separation with z-ordering

  • Statistical Processing: Safe division and normalization for robust analysis

  • Filename Parsing: Element extraction from simulation file naming conventions

Physics-Aware Design

Functions account for electrochemical memory device physics:

  • Electrode separation typically 20-100 Angstroms in z-direction

  • Ion migration along both z-axis (electrode-to-electrode) and lateral directions

  • Charge redistribution during voltage cycling (SET/RESET processes)

  • Filament formation through oxygen vacancy alignment leading to agglomeration of oxygen-deficient metallic phases

Performance Characteristics

  • Memory scaling: O(N_atoms * N_frames) for coordinate processing

  • Computational complexity: O(N_atoms * log(N_atoms)) for z-sorting

  • Bin resolution: Optimized for ~50-100 z-bins across electrode gap

  • Batch processing: Efficient multi-trajectory analysis support

Examples

Basic atomic distribution analysis:

>>> import numpy as np
>>> from lammpskit.ecellmodel.data_processing import calculate_atomic_distributions
>>> coords = np.random.rand(100, 6)  # Mock coordinates: (id, type, charge, x, y, z)
>>> coords[:, 1] = np.random.choice([1, 2, 4], 100)  # Assign atom types
>>> distributions = calculate_atomic_distributions([coords], z_bins=50, zlo=0, zhi=30)
>>> print(f"Hf atoms: {distributions['hafnium'].sum()}")

Charge analysis workflow:

>>> from lammpskit.ecellmodel.data_processing import calculate_charge_distributions
>>> atom_dists = calculate_atomic_distributions([coords], 50, 0, 30)
>>> charge_dists = calculate_charge_distributions([coords], 50, 0, 30, atom_dists)
>>> print(f"Mean Hf charge: {charge_dists['hafnium_mean_charge'].mean():.3f}")

Electrode analysis setup:

>>> from lammpskit.ecellmodel.data_processing import calculate_z_bins_setup
>>> z_width, z_centers = calculate_z_bins_setup(zlo=-10, zhi=40, z_bins=50)
>>> print(f"Electrode separation: {40-(-10)} Å, bin width: {z_width:.2f} Å")

lammpskit.ecellmodel.data_processing.select_atom_types_from_coordinates(coordinates)

Separate atomic coordinates by species for HfTaO electrochemical cell analysis.

Implements the LAMMPSKit atom type system to extract species-specific coordinate arrays from mixed atomic data. Essential for tracking ion migration, filament formation, and electrode interactions in electrochemical memory devices. Automatically sorts atoms by z-position for layer-by-layer analysis.

Parameters:

coordinates (np.ndarray) – Atomic coordinate array with shape (n_atoms, n_columns) in the standard LAMMPS column order [id, type, charge, x, y, z, …]. Column indices are zero-based:

  • Column 1: Atom type ID (implements HfTaO type system)

  • Column 5: Z-coordinate (electrode-to-electrode direction)

Returns:

Species-separated coordinate arrays, z-sorted for analysis:

  • 'hf': Hafnium atoms (type 2) - Conductive filament species

  • 'ta': Tantalum atoms (types 4,6,8,10) - Matrix material

  • 'o': Oxygen atoms (types 1,3,5,7,9) - Vacancy formation species

Each array maintains full coordinate information for downstream analysis. Empty species return empty arrays with correct shape.

Return type:

Dict[str, np.ndarray]

Notes

HfTaO Atom Type System Implementation:

  • Type 2: Hafnium (primary conductive species)

  • Odd types: Oxygen (ion migration, vacancy formation)

  • Even types (≠2): Tantalum (matrix stabilization)

  • Electrode types (5,6,9,10): Dual-function boundary atoms

Z-Sorting Rationale: Automatic sorting enables efficient layer-by-layer analysis essential for:

  • Electrode interface characterization

  • Filament path tracking through device thickness

  • Voltage-dependent ion redistribution analysis
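
The separate-then-sort behavior described in these notes can be sketched with boolean masks and np.argsort. This is a minimal illustration under the stated type convention, not the library's implementation; split_and_sort_by_z and its keyword arguments are hypothetical.

```python
import numpy as np


def split_and_sort_by_z(coords, type_col=1, z_col=5):
    """Split a coordinate array by species and sort each subset by z."""
    types = coords[:, type_col].astype(int)
    masks = {
        "hf": types == 2,
        "o": types % 2 == 1,
        "ta": (types % 2 == 0) & (types != 2),
    }
    result = {}
    for name, mask in masks.items():
        subset = coords[mask]
        # argsort on an empty subset returns an empty index array,
        # so empty species keep the shape (0, n_columns)
        order = np.argsort(subset[:, z_col], kind="stable")
        result[name] = subset[order]
    return result


rng = np.random.default_rng(0)
coords = rng.random((12, 6))
coords[:, 1] = rng.choice([1, 2, 4], 12)   # O, Hf, Ta types
coords[:, 5] = rng.uniform(-10, 40, 12)    # z positions
species = split_and_sort_by_z(coords)
print({name: arr.shape for name, arr in species.items()})
```

Sorting each species once up front means later layer-by-layer passes can walk the arrays in z order without re-sorting.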

Performance Notes

  • Computational complexity: O(N log N) due to z-sorting

  • Memory usage: O(N) where N is total atom count

  • Optimized for repeated analysis of trajectory sequences

Electrochemical Physics Context

Atom type separation enables analysis of:

  • Hf migration: Conductive filament formation/dissolution

  • O vacancy motion: Resistance switching mechanisms

  • Ta redistribution: Matrix effects on switching and its reliability

  • Electrode interactions: Interface phenomena at boundaries

Examples

Basic species separation:

>>> import numpy as np
>>> # Mock HfTaO coordinate data: 100 atoms, 6 columns
>>> coords = np.random.rand(100, 6)
>>> coords[:, 1] = np.random.choice([1, 2, 4], 100)  # Assign atom types O, Hf, Ta
>>> coords[:, 5] = np.random.uniform(-10, 40, 100)   # Z positions (electrode gap)
>>> species = select_atom_types_from_coordinates(coords)
>>> print(f"Hf atoms: {len(species['hf'])}, Ta atoms: {len(species['ta'])}")
>>> print(f"O atoms: {len(species['o'])}")

Filament analysis workflow:

>>> # Extract Hf filament path through device
>>> hf_coords = species['hf']
>>> if len(hf_coords) > 0:
...     z_min, z_max = hf_coords[:, 5].min(), hf_coords[:, 5].max()
...     filament_length = z_max - z_min
...     print(f"Hf filament spans {filament_length:.1f} Å")

Electrode interface analysis:

>>> # Analyze electrode interactions (types 5,6,9,10)
>>> electrode_mask = np.isin(coords[:, 1], [5, 6, 9, 10])
>>> electrode_atoms = coords[electrode_mask]
>>> print(f"Electrode interface atoms: {len(electrode_atoms)}")

Species-specific charge analysis:

>>> # Analyze charge distribution by species
>>> for species_name, species_coords in species.items():
...     if len(species_coords) > 0:
...         mean_charge = species_coords[:, 2].mean()  # Column 2 = charge
...         print(f"{species_name.capitalize()} mean charge: {mean_charge:.3f}")

lammpskit.ecellmodel.data_processing.calculate_z_bins_setup(zlo, zhi, z_bins)

Calculate z-direction spatial binning parameters for electrochemical analysis.

Computes bin width and center positions for layer-by-layer analysis of electrochemical memory devices. Optimized for electrode-to-electrode spatial discretization with uniform bin spacing for statistical consistency across the device thickness.

Parameters:
  • zlo (float) – Lower z-bound of simulation box (Angstroms). Typically bottom electrode position. For HfTaO devices, commonly ranges from -20 to 0 Å.

  • zhi (float) – Upper z-bound of simulation box (Angstroms). Typically top electrode position. For HfTaO devices, commonly ranges from 20 to 100 Å.

  • z_bins (int) – Number of spatial bins for discretization. Typical range: 15-100 bins. Higher resolution improves interface detection but increases noise.

Return type:

Tuple[float, ndarray]

Returns:

  • z_bin_width (float) – Width of each spatial bin (Angstroms). Used for normalization and density calculations.

  • z_bin_centers (np.ndarray) – Array of bin center positions (Angstroms). Shape: (z_bins,) Used as x-axis coordinates for distribution plots and analysis.

Notes

Bin Design Philosophy:

  • Uniform spacing ensures consistent statistical sampling

  • Bin centers provide representative positions for plotting

  • Width normalization enables density comparisons across devices

Electrochemical Device Context:

  • Electrode separation: zhi - zlo (typical: 20-100 Å)

  • Optimal bin width: ~0.5-2 Å per bin for atomic resolution

  • Interface resolution: 2-5 bins per electrode/oxide interface

Performance Characteristics:

  • Computational complexity: O(1) for the bin width, O(z_bins) to build the center array

  • Memory usage: O(z_bins) for center array storage

  • Typical execution time: <1μs for standard parameters

Mathematical Foundation

Bin width calculation:

Δz = (z_hi - z_lo) / N_bins

Bin center positions:

z_center[i] = z_lo + (i + 0.5) * Δz where i ∈ [0, N_bins-1]
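
The two formulas translate directly into NumPy. The following is a minimal sketch of the arithmetic, with z_bins_sketch as a hypothetical name; it is not claimed to be the body of calculate_z_bins_setup.

```python
import numpy as np


def z_bins_sketch(zlo, zhi, n_bins):
    """Return (bin width, bin centers) for n_bins uniform bins on [zlo, zhi]."""
    width = (zhi - zlo) / n_bins
    centers = zlo + (np.arange(n_bins) + 0.5) * width
    return width, centers


width, centers = z_bins_sketch(-10.0, 40.0, 50)
print(width, centers[0], centers[-1])  # 1.0 -9.5 39.5
```

The first and last centers sit half a bin width inside the box bounds, so the bins cover the interval from zlo to zhi exactly.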

Examples

Standard HfTaO device setup:

>>> z_width, z_centers = calculate_z_bins_setup(zlo=-10, zhi=40, z_bins=50)
>>> print(f"Device thickness: {40-(-10)} Å")
>>> print(f"Spatial resolution: {z_width:.2f} Å per bin")
>>> print(f"Analysis range: {z_centers[0]:.1f} to {z_centers[-1]:.1f} Å")

High-resolution interface analysis:

>>> # Fine-grained analysis for electrode interfaces
>>> z_width, z_centers = calculate_z_bins_setup(-5, 35, 100)
>>> print(f"Interface resolution: {z_width:.3f} Å per bin")

Coarse-grained overview:

>>> # Quick analysis for device-scale phenomena
>>> z_width, z_centers = calculate_z_bins_setup(0, 30, 15)
>>> electrode_separation = 30 - 0
>>> bins_per_angstrom = 15 / electrode_separation
>>> print(f"Sampling: {bins_per_angstrom:.1f} bins per Angstrom")

Validation and optimization:

>>> # Check bin coverage and spacing
>>> z_width, z_centers = calculate_z_bins_setup(-10, 50, 60)
>>> total_coverage = z_centers[-1] + z_width/2 - (z_centers[0] - z_width/2)
>>> expected_coverage = 50 - (-10)
>>> assert abs(total_coverage - expected_coverage) < 1e-10
>>> print("Bin coverage validation: PASSED")

lammpskit.ecellmodel.data_processing.calculate_atomic_distributions(coordinates_arr, z_bins, zlo, zhi)

Calculate spatial distributions of atomic species along electrode-to-electrode axis.

Computes z-direction histograms for different atomic species in HfTaO electrochemical devices, enabling analysis of ion migration, filament formation, and layer composition. Provides both individual species distributions and composite distributions for comprehensive materials characterization.

Parameters:
  • coordinates_arr (np.ndarray) – Coordinate array with shape (n_frames, n_atoms, n_columns) for time series, or (n_atoms, n_columns) for single frame analysis. Column 1 must contain atom types, column 5 must contain z-coordinates.

  • z_bins (int) – Number of spatial bins for z-direction discretization. Recommended: 15-100 bins for balance between resolution and statistical significance.

  • zlo (float) – Lower z-boundary of analysis region (Angstroms). Should match electrode position.

  • zhi (float) – Upper z-boundary of analysis region (Angstroms). Should match opposite electrode.

Returns:

distributions – Spatial distribution dictionary with keys:

Individual Species:

  • ‘hafnium’: Hf atom distributions (shape: n_frames × z_bins)

  • ‘tantalum’: Ta atom distributions

  • ‘oxygen’: O atom distributions

Composite Distributions:

  • ‘metal’: Combined Hf + Ta distributions (conductive species)

  • ‘total’: All atomic species combined (total density)

Each distribution array contains atom counts per spatial bin per frame.

Return type:

Dict[str, np.ndarray]

Notes

Electrochemical Analysis Applications:

  • Filament tracking: Hf distribution shows conductive pathway evolution

  • Vacancy analysis: O distribution reveals ion migration patterns

  • Matrix stability: Ta distribution indicates structural changes

  • Electrode interactions: Interface region composition analysis

Statistical Considerations:

  • Empty frames produce zero-filled arrays with correct dimensions

  • Bin counts represent discrete atom positions (not normalized densities)

  • Multiple frames enable temporal analysis of switching dynamics

Performance Characteristics:

  • Memory complexity: O(n_frames × z_bins × 5) for output storage

  • Time complexity: O(n_frames × n_atoms × log(n_atoms)) due to species sorting

  • Optimized for batch processing of trajectory sequences

Physics-Informed Design:

  • Z-axis corresponds to electric field direction in devices

  • Species separation tracks individual ion migration mechanisms

  • Composite distributions reveal overall material redistribution

  • Bin resolution balances atomic-scale features with statistical significance
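
The per-frame, per-species counting described in these notes can be sketched with np.histogram. This is an illustrative sketch under the documented type convention and output keys, not the library's code; species_histograms and its keyword arguments are hypothetical.

```python
import numpy as np


def species_histograms(frames, n_bins, zlo, zhi, type_col=1, z_col=5):
    """Count atoms per z-bin for each species in each frame."""
    n_frames = len(frames)
    out = {
        key: np.zeros((n_frames, n_bins))
        for key in ("hafnium", "tantalum", "oxygen")
    }
    for i, frame in enumerate(frames):
        types = frame[:, type_col].astype(int)
        z = frame[:, z_col]
        masks = {
            "hafnium": types == 2,
            "oxygen": types % 2 == 1,
            "tantalum": (types % 2 == 0) & (types != 2),
        }
        for name, mask in masks.items():
            # An empty selection yields an all-zero histogram row
            out[name][i], _ = np.histogram(z[mask], bins=n_bins, range=(zlo, zhi))
    # Composite distributions are sums of the per-species counts
    out["metal"] = out["hafnium"] + out["tantalum"]
    out["total"] = out["metal"] + out["oxygen"]
    return out


rng = np.random.default_rng(0)
frame = rng.random((200, 6))
frame[:, 1] = rng.choice([1, 2, 4], 200)
frame[:, 5] = rng.uniform(0.0, 30.0, 200)
dists = species_histograms([frame], n_bins=30, zlo=0.0, zhi=30.0)
print(dists["total"].sum())  # every in-range atom lands in exactly one bin
```

Atoms outside the range from zlo to zhi are silently dropped by np.histogram, so the analysis bounds should enclose the region of interest.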

Examples

Single-frame analysis:

>>> import numpy as np
>>> # Single configuration: 1000 atoms across electrode gap
>>> coords = np.random.rand(1000, 6)
>>> coords[:, 1] = np.random.choice([1, 2, 4], 1000)  # O, Hf, Ta types
>>> coords[:, 5] = np.random.uniform(-10, 40, 1000)   # Z positions
>>> distributions = calculate_atomic_distributions([coords], 50, -10, 40)
>>> print(f"Hf peak density: {distributions['hafnium'][0].max()} atoms/bin")

Time-series filament analysis:

>>> # Multi-frame trajectory for SET/RESET switching
>>> n_frames, n_atoms = 20, 500
>>> trajectory = np.random.rand(n_frames, n_atoms, 6)
>>> trajectory[:, :, 1] = np.random.choice([1, 2, 4], (n_frames, n_atoms))
>>> trajectory[:, :, 5] = np.random.uniform(0, 30, (n_frames, n_atoms))
>>> dists = calculate_atomic_distributions(trajectory, 30, 0, 30)
>>>
>>> # Analyze filament evolution
>>> hf_evolution = dists['hafnium']  # Shape: (20, 30)
>>> initial_hf = hf_evolution[0]     # Initial state
>>> final_hf = hf_evolution[-1]      # Final state
>>> filament_growth = (final_hf - initial_hf).sum()
>>> print(f"Net Hf redistribution: {filament_growth} atoms")

Layer-by-layer composition analysis:

>>> # Examine device cross-section
>>> z_width, z_centers = calculate_z_bins_setup(-5, 35, 40)
>>> coords = np.random.rand(800, 6)
>>> coords[:, 1] = np.random.choice([1, 2, 4], 800)
>>> coords[:, 5] = np.random.uniform(-5, 35, 800)
>>> dists = calculate_atomic_distributions([coords], 40, -5, 35)
>>>
>>> # Calculate stoichiometry across device
>>> hf_counts = dists['hafnium'][0]
>>> ta_counts = dists['tantalum'][0]
>>> o_counts = dists['oxygen'][0]
>>> metal_total = dists['metal'][0]
>>>
>>> for i, z_pos in enumerate(z_centers):
...     if metal_total[i] > 0:  # Avoid division by zero
...         hf_fraction = hf_counts[i] / metal_total[i]
...         print(f"Z={z_pos:.1f}Å: Hf fraction = {hf_fraction:.2f}")

Electrode interface characterization:

>>> # Focus on electrode-oxide interfaces
>>> interface_coords = coords[np.abs(coords[:, 5] - (-5)) < 3]  # Near bottom electrode
>>> interface_dists = calculate_atomic_distributions([interface_coords], 15, -8, 2)
>>> print(f"Interface composition - Hf: {interface_dists['hafnium'][0].sum()}, "
...       f"Ta: {interface_dists['tantalum'][0].sum()}, "
...       f"O: {interface_dists['oxygen'][0].sum()}")

lammpskit.ecellmodel.data_processing.calculate_charge_distributions(coordinates_arr, z_bins, zlo, zhi, atomic_distributions)

Calculate electrostatic charge distributions for electrochemical field analysis.

Computes spatial charge profiles across the electrode-to-electrode axis to analyze electrostatic field formation, charge redistribution during switching, and ionic migration patterns in HfTaO resistive memory devices. Provides both total charge distributions and species-specific mean charge calculations for comprehensive electrochemical characterization.

Parameters:
  • coordinates_arr (List[np.ndarray]) – Trajectory coordinate arrays with shape (n_atoms, n_columns) per frame. Column 2 must contain atomic charges (units: elementary charge e). Column 5 must contain z-coordinates for spatial analysis. Multi-frame input enables temporal charge evolution analysis.

  • z_bins (int) – Number of spatial bins for z-direction discretization. Recommended: 15-100 bins for optimal balance between field resolution and statistical significance.

  • zlo (float) – Lower z-boundary of analysis region (Angstroms). Typically bottom electrode position.

  • zhi (float) – Upper z-boundary of analysis region (Angstroms). Typically top electrode position.

  • atomic_distributions (Dict[str, np.ndarray]) – Atomic count distributions from calculate_atomic_distributions(). Required for safe normalization to compute mean charges per species. Must contain keys: ‘hafnium’, ‘tantalum’, ‘oxygen’, ‘metal’, ‘total’ with shape (n_frames, z_bins).

Returns:

charge_distributions – Comprehensive charge analysis dictionary with keys:

Raw Charge Distributions:

  • ‘hafnium_charge’: Total Hf charge per bin (shape: n_frames × z_bins)

  • ‘tantalum_charge’: Total Ta charge per bin

  • ‘oxygen_charge’: Total O charge per bin

  • ‘metal_charge’: Combined Hf + Ta charge per bin

  • ‘total_charge’: All species combined charge per bin

Mean Charge per Atom:

  • ‘hafnium_mean_charge’: Average charge per Hf atom per bin

  • ‘tantalum_mean_charge’: Average charge per Ta atom per bin

  • ‘oxygen_mean_charge’: Average charge per O atom per bin

  • ‘metal_mean_charge’: Average charge per metal atom per bin

  • ‘total_mean_charge’: Average charge per atom (all species) per bin

All arrays have shape (n_frames, z_bins) for temporal analysis support.

Return type:

Dict[str, np.ndarray]

Notes

Electrochemical Field Analysis:

  • Total charge: Reveals space charge regions and field gradients

  • Mean charges: Indicate oxidation state changes and ion mobility

  • Species separation: Tracks individual charge transfer mechanisms

  • Temporal evolution: Captures SET/RESET switching dynamics

Physical Interpretation:

  • Positive regions: Cation accumulation or anion depletion zones

  • Negative regions: Anion accumulation or cation depletion zones

  • Charge gradients: Drive ionic migration and filament formation

  • Interface charges: Control electron injection and device resistance

Mathematical Foundation:

Raw charge distribution:

Q_species(z) = Σ q_i * δ(z_i - z)

Mean charge calculation:

<q>_species(z) = Q_species(z) / N_species(z)

Where safe division prevents numerical errors when N_species(z) = 0.
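
Both formulas map onto weighted histograms plus a guarded division. The sketch below shows one way to do this with NumPy; mean_charge_profile is a hypothetical name, and returning 0 for empty bins is an assumption made here, since the convention used by the library's safe division is not stated above.

```python
import numpy as np


def mean_charge_profile(z, q, n_bins, zlo, zhi):
    """Return (total charge per bin, mean charge per atom per bin)."""
    counts, _ = np.histogram(z, bins=n_bins, range=(zlo, zhi))
    # weights=q sums the charges of the atoms that fall in each bin
    total_q, _ = np.histogram(z, bins=n_bins, range=(zlo, zhi), weights=q)
    # Divide only where atoms exist; empty bins keep the fill value 0.0
    mean_q = np.divide(
        total_q, counts, out=np.zeros_like(total_q), where=counts > 0
    )
    return total_q, mean_q


z = np.array([0.5, 0.7, 2.5])
q = np.array([1.0, -0.5, 2.0])
total_q, mean_q = mean_charge_profile(z, q, n_bins=3, zlo=0.0, zhi=3.0)
print(total_q, mean_q)
```

Because empty bins are filled with 0 in this sketch, a mean charge of 0 is ambiguous on its own; pairing it with the atom counts, as the examples below do, distinguishes neutral bins from empty ones.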

Performance Characteristics:

  • Memory complexity: O(n_frames × z_bins × 10) for all distributions

  • Time complexity: O(n_frames × n_atoms) for histogram calculations

  • Numerical stability: Safe division prevents undefined mean charges

Examples

Basic charge profile analysis:

>>> import numpy as np
>>> from lammpskit.ecellmodel.data_processing import calculate_atomic_distributions
>>> # Create mock trajectory with charge information
>>> coords = np.random.rand(500, 6)
>>> coords[:, 1] = np.random.choice([1, 2, 4], 500)  # Atom types
>>> coords[:, 2] = np.random.normal(0, 0.5, 500)     # Charges around neutral
>>> coords[:, 5] = np.random.uniform(-10, 40, 500)   # Z positions
>>>
>>> # Calculate required atomic distributions first
>>> atom_dists = calculate_atomic_distributions([coords], 50, -10, 40)
>>> charge_dists = calculate_charge_distributions([coords], 50, -10, 40, atom_dists)
>>>
>>> # Analyze total electrostatic field
>>> total_charge = charge_dists['total_charge'][0]
>>> print(f"Max space charge density: {total_charge.max():.2f} e/bin")
>>> print(f"Charge neutrality check: {total_charge.sum():.3f} e")

Species-specific charge analysis:

>>> # Examine oxidation state changes
>>> hf_mean = charge_dists['hafnium_mean_charge'][0]
>>> ta_mean = charge_dists['tantalum_mean_charge'][0]
>>> o_mean = charge_dists['oxygen_mean_charge'][0]
>>>
>>> # Find regions with significant charge transfer
>>> valid_hf = hf_mean[atom_dists['hafnium'][0] > 0]  # Only where Hf atoms exist
>>> if len(valid_hf) > 0:
...     print(f"Hf oxidation range: {valid_hf.min():.2f} to {valid_hf.max():.2f} e")
>>>
>>> valid_o = o_mean[atom_dists['oxygen'][0] > 0]
>>> if len(valid_o) > 0:
...     print(f"O charge range: {valid_o.min():.2f} to {valid_o.max():.2f} e")

Temporal switching analysis:

>>> # Multi-frame charge evolution during switching
>>> n_frames = 10
>>> trajectory = np.random.rand(n_frames, 300, 6)
>>> trajectory[:, :, 1] = np.random.choice([1, 2, 4], (n_frames, 300))
>>> # Simulate progressive charge separation
>>> for i in range(n_frames):
...     trajectory[i, :, 2] = np.random.normal(0.1 * i, 0.3, 300)  # Increasing separation
>>> trajectory[:, :, 5] = np.random.uniform(0, 30, (n_frames, 300))
>>>
>>> atom_dists = calculate_atomic_distributions(trajectory, 30, 0, 30)
>>> charge_dists = calculate_charge_distributions(trajectory, 30, 0, 30, atom_dists)
>>>
>>> # Track charge evolution
>>> total_evolution = charge_dists['total_charge']  # Shape: (10, 30)
>>> for frame in range(n_frames):
...     max_charge = total_evolution[frame].max()
...     print(f"Frame {frame}: Max charge density = {max_charge:.2f} e/bin")

Electrode interface charge analysis:

>>> # Focus on electrode-oxide charge interfaces
>>> z_width, z_centers = calculate_z_bins_setup(0, 30, 30)
>>> coords = np.random.rand(400, 6)
>>> coords[:, 1] = np.random.choice([1, 2, 4], 400)  # Atom types (O, Hf, Ta)
>>> coords[:, 2] = np.random.normal(0, 0.4, 400)  # Realistic charge distribution
>>> coords[:, 5] = np.random.uniform(0, 30, 400)
>>>
>>> atom_dists = calculate_atomic_distributions([coords], 30, 0, 30)
>>> charge_dists = calculate_charge_distributions([coords], 30, 0, 30, atom_dists)
>>>
>>> # Identify charge accumulation regions
>>> total_charge = charge_dists['total_charge'][0]
>>> significant_charge = np.abs(total_charge) > 0.1  # Above noise threshold
>>> if significant_charge.any():
...     charge_positions = z_centers[significant_charge]
...     charge_values = total_charge[significant_charge]
...     print(f"Charge accumulation at Z = {charge_positions} Å")
...     print(f"Charge magnitudes: {charge_values} e/bin")

lammpskit.ecellmodel.data_processing.extract_element_label_from_filename(filename)

Extract element labels from HfTaO simulation filenames using a hierarchy of pattern-matching fallbacks.

Provides robust filename analysis to identify atomic species from simulation output files, supporting various naming conventions used in electrochemical memory device analysis workflows. Essential for automated batch processing of species-specific mobility and displacement data from LAMMPS trajectory analysis.

Parameters:

filename (str) – Full file path or basename containing element information. Supports common patterns from HfTaO simulation workflows:

  • Pattern format: “[digit][Element]mobile*.dat” (e.g., “1Hfmobilestc1.dat”)

  • Prefix format: “[Element]_*” or “[Element]*” (e.g., “Hf_mobility.dat”)

  • Embedded format: “[element]” (case-insensitive matching)

Returns:

element_label – Standardized element symbol extracted from filename:

Standard Elements:

  • ‘Hf’: Hafnium (primary conductive species in filaments)

  • ‘Ta’: Tantalum (matrix material, device stability)

  • ‘O’: Oxygen (ion migration, vacancy formation)

  • ‘Al’: Aluminum (electrode material in some devices)

Special Cases:

  • ‘O_’: Oxygen with underscore (specific file convention)

  • ‘H’: Hydrogen (if present in simulation)

  • ‘??’: Fallback for unrecognized patterns

Return type:

str

Notes

Parsing Strategy Hierarchy:

  1. Pattern matching: [digit][Element]mobile format recognition

  2. Prefix matching: Direct element prefix identification

  3. Substring search: Case-insensitive element name detection

  4. Character extraction: First 1-2 characters as fallback

  5. Error handling: Graceful fallback for empty/invalid filenames
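
A fallback chain of this kind can be sketched with the standard re module. The code below is a simplified illustration, not the library's parser: guess_element_label, KNOWN_SYMBOLS, and the regular expression are assumptions, the substring-search step is omitted, and special cases such as the ‘O_’ label are not reproduced.

```python
import os
import re

# Hypothetical symbol list; order matters only for readability here.
KNOWN_SYMBOLS = ("Hf", "Ta", "Al", "O")


def guess_element_label(filename):
    """Guess an element label from a filename using ordered fallbacks."""
    base = os.path.basename(filename)
    if not base:
        return "??"  # empty or invalid filename

    # 1. "[digit][Element]mobile..." pattern, e.g. "1Hfmobilestc1.dat"
    match = re.match(r"\d+([A-Za-z]{1,2})mobile", base)
    if match:
        label = match.group(1)
        return "O" if label == "Oo" else label  # 'Oo' maps to 'O'

    # 2. Case-insensitive prefix match against known symbols
    lowered = base.lower()
    for symbol in KNOWN_SYMBOLS:
        if lowered.startswith(symbol.lower()):
            return symbol

    # 3. Last resort: first one or two characters of the name
    return base[:2]


for name in ("1Hfmobilestc1.dat", "2Oomobilestc1.dat", "hf_trajectory.dump", ""):
    print(repr(name), "->", guess_element_label(name))
```

Ordering the checks from most to least specific keeps a strict naming convention from being overridden by a loose prefix match, while the final fallback ensures a batch run never stops on an unexpected filename.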

Element Mapping Logic:

  • ‘Oo’ → ‘O’: Handles double-letter oxygen notation

  • Case normalization: Converts to standard chemical symbols

  • Robust fallbacks: Prevents analysis pipeline failures

Electrochemical Simulation Context: Essential for processing mobility analysis outputs where different atomic species generate separate trajectory files. Enables automated species identification for:

  • Ion migration tracking (O atom vacancy pathways)

  • Filament analysis (Hf conductive bridge formation)

  • Matrix stability (Ta structural evolution)

  • Electrode interaction (Al/electrode interface dynamics)

Performance Characteristics:

  • Time complexity: O(filename_length) for regex operations

  • Memory usage: O(1) for string processing

  • Error resilience: Multiple fallback strategies prevent failures

  • Batch efficiency: Optimized for high-throughput filename processing

Integration with LAMMPSKit Workflows:

  • Mobility analysis: Species-specific diffusion coefficient extraction

  • Displacement tracking: Ion migration pathway identification

  • Batch processing: Automated trajectory file classification

  • Data organization: Species-sorted output file management

Examples

Standard HfTaO simulation files:

>>> from lammpskit.ecellmodel.data_processing import extract_element_label_from_filename
>>> # Typical mobility analysis files
>>> extract_element_label_from_filename("1Hfmobilestc1.dat")
'Hf'
>>> extract_element_label_from_filename("2Oomobilestc1.dat")
'O'
>>> extract_element_label_from_filename("3Tamobilestc1.dat")
'Ta'

Alternative naming conventions:

>>> # Prefix-based naming
>>> extract_element_label_from_filename("Hf_displacement_analysis.txt")
'Hf'
>>> extract_element_label_from_filename("Ta_mobility_data.csv")
'Ta'
>>> extract_element_label_from_filename("O_vacancy_tracking.dat")
'O_'

Case-insensitive matching:

>>> # Handles various case conventions
>>> extract_element_label_from_filename("hf_trajectory.dump")
'Hf'
>>> extract_element_label_from_filename("AL_electrode.lammpstrj")
'Al'
>>> extract_element_label_from_filename("oxygen_migration.xyz")
'O'

Batch processing workflow:

>>> import os
>>> # Process all files in mobility analysis directory
>>> simulation_files = [
...     "1Hfmobilestc1.dat", "2Oomobilestc1.dat", "3Tamobilestc1.dat",
...     "4Almobilestc1.dat", "summary_mobility.txt"
... ]
>>>
>>> species_files = {}
>>> for filename in simulation_files:
...     element = extract_element_label_from_filename(filename)
...     if element not in species_files:
...         species_files[element] = []
...     species_files[element].append(filename)
>>>
>>> # Organize by species for analysis
>>> for species, files in species_files.items():
...     print(f"{species}: {len(files)} files")

Error handling demonstration:

>>> # Robust handling of edge cases
>>> extract_element_label_from_filename("")  # Empty filename
'??'
>>> extract_element_label_from_filename("unknown_format.dat")
'un'
>>> extract_element_label_from_filename("H")  # Single character
'H'

Integration with trajectory analysis:

>>> # Automated species identification in analysis pipeline
>>> def process_mobility_files(file_list):
...     species_data = {}
...     for filepath in file_list:
...         element = extract_element_label_from_filename(filepath)
...         # Load and process species-specific data
...         if element in ['Hf', 'Ta', 'O', 'Al']:
...             species_data[element] = f"Processing {element} mobility from {filepath}"
...     return species_data
>>>
>>> files = ["1Hfmobilestc1.dat", "2Oomobilestc1.dat"]
>>> results = process_mobility_files(files)
>>> for species, status in results.items():
...     print(f"{species}: {status}")