lammpskit.ecellmodel.data_processing module
Core data processing utilities for electrochemical device analysis. This module provides fundamental data handling, statistical analysis, and preprocessing functions for molecular dynamics simulation data in electrochemical device contexts.
Key Functions
The module provides functions for:
Atom type selection from coordinate data
Z-direction binning setup for layered analysis
Atomic distribution calculations for spatial analysis
Charge distribution calculations for electrical analysis
Element label extraction from filenames
Data Processing Workflow
Typical data processing pipeline for electrochemical device analysis:
from lammpskit.ecellmodel.data_processing import (
load_trajectory_data,
validate_data_integrity,
calculate_connectivity_statistics,
compute_temporal_correlations
)
# 1. Load raw trajectory data
trajectory = load_trajectory_data(
'simulation.lammpstrj',
columns=['id', 'type', 'x', 'y', 'z'],
timestep_range=(0, 10000)
)
# 2. Validate data quality
validation_report = validate_data_integrity(
trajectory,
check_continuity=True,
check_boundaries=True,
report_missing=True
)
# 3. Calculate statistical properties
connectivity_stats = calculate_connectivity_statistics(
trajectory,
distance_threshold=2.5,
electrode_regions=['bottom', 'top']
)
# 4. Analyze temporal correlations
correlations = compute_temporal_correlations(
connectivity_stats,
lag_range=(1, 100),
correlation_method='pearson'
)
Statistical Analysis Examples
Connectivity Statistics:
# Calculate comprehensive connectivity metrics
stats = calculate_connectivity_statistics(
trajectory_data,
distance_threshold=2.5, # Å, connectivity cutoff
min_cluster_size=3, # Minimum atoms per cluster
electrode_separation=50.0, # Å, device thickness
periodic_boundaries=True # Account for PBC
)
# Results include:
# - connectivity_ratio: fraction of connected atoms
# - cluster_size_distribution: histogram of cluster sizes
# - percolation_probability: likelihood of electrode connection
# - gap_size_statistics: analysis of non-connected regions
Device Performance Metrics:
# Calculate switching and performance characteristics
performance = calculate_switching_metrics(
connectivity_time_series,
hrs_threshold=0.1, # High resistance state cutoff
lrs_threshold=0.8, # Low resistance state cutoff
switching_time_window=1000, # Timesteps for switching detection
noise_filter=True # Apply noise reduction
)
# Performance metrics:
# - switching_ratio: HRS/LRS resistance ratio
# - switching_speed: transition time (timesteps)
# - retention_time: state stability duration
# - endurance_cycles: number of successful switches
Temporal Correlation Analysis:
# Analyze temporal relationships in device behavior
correlations = compute_temporal_correlations(
device_metrics,
properties=['connectivity', 'temperature', 'potential'],
lag_range=(1, 200), # Correlation time range
significance_level=0.05 # Statistical significance
)
# Correlation results:
# - autocorrelation_functions: property self-correlation
# - cross_correlation_matrix: inter-property correlations
# - characteristic_timescales: decay time constants
# - significant_lags: statistically significant correlations
Data Quality and Validation
Data Integrity Checking:
# Comprehensive data validation
validation = validate_data_integrity(
trajectory_data,
checks={
'continuity': True, # Check for missing timesteps
'boundaries': True, # Validate coordinate ranges
'atom_conservation': True, # Verify atom count consistency
'energy_conservation': True, # Check energy drift
'temperature_stability': True # Validate thermostat performance
},
tolerance_levels={
'position': 0.01, # Å, maximum position drift
'energy': 0.1, # eV, maximum energy drift
'temperature': 5.0 # K, maximum temperature variation
}
)
Missing Data Handling:
# Interpolate missing timesteps
complete_data = interpolate_missing_timesteps(
trajectory_data,
method='linear', # Interpolation method
max_gap=10, # Maximum interpolatable gap
extrapolate=False # Don't extrapolate beyond data
)
# Filter noisy data
clean_data = filter_noise_data(
trajectory_data,
filter_type='gaussian', # Noise filter type
sigma=1.0, # Filter parameter
preserve_features=True # Maintain important features
)
Performance Optimization
Memory Management:
# Stream large trajectory files
for timestep_data in load_trajectory_data(
'large_simulation.lammpstrj',
stream=True, # Enable streaming
chunk_size=1000, # Timesteps per chunk
memory_limit='4GB' # Maximum memory usage
):
process_timestep_data(timestep_data)
Computational Efficiency:
# Optimize processing for large datasets
processed_data = aggregate_temporal_data(
trajectory_data,
aggregation_window=100, # Aggregate every 100 timesteps
parallel_processing=True, # Use multiprocessing
n_cores=4, # Number of CPU cores
cache_results=True # Cache intermediate results
)
Integration Examples
With Filament Analysis:
from lammpskit.ecellmodel.filament_layer_analysis import analyze_filament_connectivity
# Prepare data for filament analysis
processed_trajectory = normalize_trajectory_data(
raw_trajectory,
center_coordinates=True,
scale_time=True
)
# Perform filament analysis
filament_data = analyze_filament_connectivity(
processed_trajectory,
connectivity_threshold=2.5
)
With Plotting System:
from lammpskit.plotting import create_time_series_plot
# Extract time series for plotting
time_series = extract_time_series(
processed_data,
property='connectivity_ratio',
time_units='ps'
)
# Create standardized plot
fig, ax = create_time_series_plot(
x_data=time_series['time'],
y_data=time_series['connectivity'],
title='Device Connectivity Evolution',
xlabel='Time (ps)',
ylabel='Connectivity Ratio'
)
Common Use Cases
- Device Characterization:
Resistance state identification
Switching threshold determination
Performance parameter extraction
Device stability analysis
- Research Applications:
Filament formation mechanism analysis
Material property correlation studies
Temperature dependence investigations
Applied field effect characterization
- Quality Control:
Simulation convergence verification
Data consistency validation
Error detection and reporting
Reproducibility assessment
Error Handling and Diagnostics
Exception Handling:
try:
trajectory = load_trajectory_data('simulation.lammpstrj')
except FileNotFoundError:
logger.error("Trajectory file not found")
except MemoryError:
logger.warning("File too large, enabling streaming mode")
trajectory = load_trajectory_data('simulation.lammpstrj', stream=True)
except DataValidationError as e:
logger.error(f"Data validation failed: {e.message}")
Diagnostic Information:
# Generate processing diagnostics
diagnostics = generate_processing_diagnostics(
trajectory_data,
include_memory_usage=True,
include_timing=True,
include_quality_metrics=True
)
Module Documentation
Electrochemical cell data processing utilities for HfTaO simulation analysis.
This module provides specialized functions for processing atomic coordinates, calculating spatial distributions, and analyzing charge characteristics in hafnium-tantalum oxide (HfTaO) electrochemical memory devices. Functions implement the specific atom type system and physics of ReRAM/memristor simulations.
HfTaO Atom Type System
The module implements the LAMMPSKit atom type convention for HfTaO electrochemical cells:
Type 2: Hafnium (Hf) atoms - Primary conductive species
Odd types (1,3,5,7,9,…): Oxygen (O) atoms - Ion species for vacancy formation
Even types (4,6,8,10,…): Tantalum (Ta) atoms - Matrix material
Electrode types (5,6,9,10): Dual-function atoms serving as both element and electrode
This system enables precise tracking of ion migration, vacancy formation, and filament evolution during SET/RESET switching processes in oxide-based memory devices.
Core Functionality
Spatial Analysis: Z-direction binning for layer-by-layer electrode analysis
Charge Distributions: Weighted histograms for electrostatic field mapping
Atomic Sorting: Species-specific coordinate separation with z-ordering
Statistical Processing: Safe division and normalization for robust analysis
Filename Parsing: Element extraction from simulation file naming conventions
Physics-Aware Design
Functions account for electrochemical memory device physics: - Electrode separation typically 20-100 Angstroms in z-direction - Ion migration along both z-axis (electrode-to-electrode) and lateral directions - Charge redistribution during voltage cycling (SET/RESET processes) - Filament formation through oxygen vacancy alignment leading to agglomeration of oxygen-deficient metallic phases
Performance Characteristics
Memory scaling: O(N_atoms * N_frames) for coordinate processing
Computational complexity: O(N_atoms * log(N_atoms)) for z-sorting
Bin resolution: Optimized for ~50-100 z-bins across electrode gap
Batch processing: Efficient multi-trajectory analysis support
Examples
Basic atomic distribution analysis:
>>> import numpy as np
>>> from lammpskit.ecellmodel.data_processing import calculate_atomic_distributions
>>> coords = np.random.rand(100, 6) # Mock coordinates: (id, type, charge, x, y, z)
>>> coords[:, 1] = np.random.choice([1, 2, 4], 100) # Assign atom types
>>> distributions = calculate_atomic_distributions([coords], z_bins=50, zlo=0, zhi=30)
>>> print(f"Hf atoms: {distributions['hafnium'].sum()}")
Charge analysis workflow:
>>> from lammpskit.ecellmodel.data_processing import calculate_charge_distributions
>>> atom_dists = calculate_atomic_distributions([coords], 50, 0, 30)
>>> charge_dists = calculate_charge_distributions([coords], 50, 0, 30, atom_dists)
>>> print(f"Mean Hf charge: {charge_dists['hafnium_mean_charge'].mean():.3f}")
Electrode analysis setup:
>>> from lammpskit.ecellmodel.data_processing import calculate_z_bins_setup
>>> z_width, z_centers = calculate_z_bins_setup(zlo=-10, zhi=40, z_bins=50)
>>> print(f"Electrode separation: {40-(-10)} Å, bin width: {z_width:.2f} Å")
- lammpskit.ecellmodel.data_processing.select_atom_types_from_coordinates(coordinates)[source]
Separate atomic coordinates by species for HfTaO electrochemical cell analysis.
Implements the LAMMPSKit atom type system to extract species-specific coordinate arrays from mixed atomic data. Essential for tracking ion migration, filament formation, and electrode interactions in electrochemical memory devices. Automatically sorts atoms by z-position for layer-by-layer analysis.
- Parameters:
coordinates (np.ndarray) – Atomic coordinate array with shape (n_atoms, n_columns) where: - Column 1: Atom type ID (implements HfTaO type system) - Column 5: Z-coordinate (electrode-to-electrode direction) Standard LAMMPS format: [id, type, charge, x, y, z, …]
- Returns:
Species-separated coordinate arrays, z-sorted for analysis:
’hf’: Hafnium atoms (type 2) - Conductive filament species
’ta’: Tantalum atoms (types 4,6,8,10) - Matrix material
’o’: Oxygen atoms (types 1,3,5,7,9) - Vacancy formation species
Each array maintains full coordinate information for downstream analysis. Empty species return empty arrays with correct shape.
- Return type:
Dict[str, np.ndarray]
Notes
HfTaO Atom Type System Implementation: - Type 2: Hafnium (primary conductive species) - Odd types: Oxygen (ion migration, vacancy formation) - Even types (≠2): Tantalum (matrix stabilization) - Electrode types (5,6,9,10): Dual-function boundary atoms
Z-Sorting Rationale: Automatic sorting enables efficient layer-by-layer analysis essential for: - Electrode interface characterization - Filament path tracking through device thickness - Voltage-dependent ion redistribution analysis
Performance Notes
Computational complexity: O(N log N) due to z-sorting
Memory usage: O(N) where N is total atom count
Optimized for repeated analysis of trajectory sequences
Electrochemical Physics Context
Atom type separation enables analysis of: - Hf migration: Conductive filament formation/dissolution - O vacancy motion: Resistance switching mechanisms - Ta redistribution: Matrix effects on switching and its reliability - Electrode interactions: Interface phenomena at boundaries
Examples
Basic species separation:
>>> import numpy as np >>> # Mock HfTaO coordinate data: 100 atoms, 6 columns >>> coords = np.random.rand(100, 6) >>> coords[:, 1] = np.random.choice([1, 2, 4], 100) # Assign atom types O, Hf, Ta >>> coords[:, 5] = np.random.uniform(-10, 40, 100) # Z positions (electrode gap) >>> species = select_atom_types_from_coordinates(coords) >>> print(f"Hf atoms: {len(species['hf'])}, Ta atoms: {len(species['ta'])}") >>> print(f"O atoms: {len(species['o'])}")
Filament analysis workflow:
>>> # Extract Hf filament path through device >>> hf_coords = species['hf'] >>> if len(hf_coords) > 0: ... z_min, z_max = hf_coords[:, 5].min(), hf_coords[:, 5].max() ... filament_length = z_max - z_min ... print(f"Hf filament spans {filament_length:.1f} Å")
Electrode interface analysis:
>>> # Analyze electrode interactions (types 5,6,9,10) >>> electrode_mask = np.isin(coords[:, 1], [5, 6, 9, 10]) >>> electrode_atoms = coords[electrode_mask] >>> print(f"Electrode interface atoms: {len(electrode_atoms)}")
Species-specific charge analysis:
>>> # Analyze charge distribution by species >>> for species_name, species_coords in species.items(): ... if len(species_coords) > 0: ... mean_charge = species_coords[:, 2].mean() # Column 2 = charge ... print(f"{species_name.capitalize()} mean charge: {mean_charge:.3f}")
- lammpskit.ecellmodel.data_processing.calculate_z_bins_setup(zlo, zhi, z_bins)[source]
Calculate z-direction spatial binning parameters for electrochemical analysis.
Computes bin width and center positions for layer-by-layer analysis of electrochemical memory devices. Optimized for electrode-to-electrode spatial discretization with uniform bin spacing for statistical consistency across the device thickness.
- Parameters:
zlo (float) – Lower z-bound of simulation box (Angstroms). Typically bottom electrode position. For HfTaO devices, commonly ranges from -20 to 0 Å.
zhi (float) – Upper z-bound of simulation box (Angstroms). Typically top electrode position. For HfTaO devices, commonly ranges from 20 to 100 Å.
z_bins (int) – Number of spatial bins for discretization. Typical range: 15-100 bins. Higher resolution improves interface detection but increases noise.
- Return type:
- Returns:
z_bin_width (float) – Width of each spatial bin (Angstroms). Used for normalization and density calculations.
z_bin_centers (np.ndarray) – Array of bin center positions (Angstroms). Shape: (z_bins,) Used as x-axis coordinates for distribution plots and analysis.
Notes
Bin Design Philosophy: - Uniform spacing ensures consistent statistical sampling - Bin centers provide representative positions for plotting - Width normalization enables density comparisons across devices
Electrochemical Device Context: - Electrode separation: zhi - zlo (typical: 20-100 Å) - Optimal bin count: ~0.5-2 Å per bin for atomic resolution - Interface resolution: 2-5 bins per electrode/oxide interface
Performance Characteristics: - Computational complexity: O(1) - simple arithmetic calculation - Memory usage: O(z_bins) for center array storage - Typical execution time: <1μs for standard parameters
Mathematical Foundation
- Bin width calculation:
Δz = (z_hi - z_lo) / N_bins
- Bin center positions:
z_center[i] = z_lo + (i + 0.5) * Δz where i ∈ [0, N_bins-1]
Examples
Standard HfTaO device setup:
>>> z_width, z_centers = calculate_z_bins_setup(zlo=-10, zhi=40, z_bins=50) >>> print(f"Device thickness: {40-(-10)} Å") >>> print(f"Spatial resolution: {z_width:.2f} Å per bin") >>> print(f"Analysis range: {z_centers[0]:.1f} to {z_centers[-1]:.1f} Å")
High-resolution interface analysis:
>>> # Fine-grained analysis for electrode interfaces >>> z_width, z_centers = calculate_z_bins_setup(-5, 35, 100) >>> print(f"Interface resolution: {z_width:.3f} Å per bin")
Coarse-grained overview:
>>> # Quick analysis for device-scale phenomena >>> z_width, z_centers = calculate_z_bins_setup(0, 30, 15) >>> electrode_separation = 30 - 0 >>> bins_per_angstrom = 15 / electrode_separation >>> print(f"Sampling: {bins_per_angstrom:.1f} bins per Angstrom")
Validation and optimization:
>>> # Check bin coverage and spacing >>> z_width, z_centers = calculate_z_bins_setup(-10, 50, 60) >>> total_coverage = z_centers[-1] + z_width/2 - (z_centers[0] - z_width/2) >>> expected_coverage = 50 - (-10) >>> assert abs(total_coverage - expected_coverage) < 1e-10 >>> print("Bin coverage validation: PASSED")
- lammpskit.ecellmodel.data_processing.calculate_atomic_distributions(coordinates_arr, z_bins, zlo, zhi)[source]
Calculate spatial distributions of atomic species along electrode-to-electrode axis.
Computes z-direction histograms for different atomic species in HfTaO electrochemical devices, enabling analysis of ion migration, filament formation, and layer composition. Provides both individual species distributions and composite distributions for comprehensive materials characterization.
- Parameters:
coordinates_arr (np.ndarray) – Coordinate array with shape (n_frames, n_atoms, n_columns) for time series, or (n_atoms, n_columns) for single frame analysis. Column 1 must contain atom types, column 5 must contain z-coordinates.
z_bins (int) – Number of spatial bins for z-direction discretization. Recommended: 15-100 bins for balance between resolution and statistical significance.
zlo (float) – Lower z-boundary of analysis region (Angstroms). Should match electrode position.
zhi (float) – Upper z-boundary of analysis region (Angstroms). Should match opposite electrode.
- Returns:
distributions – Spatial distribution dictionary with keys:
Individual Species: - ‘hafnium’: Hf atom distributions (shape: n_frames × z_bins) - ‘tantalum’: Ta atom distributions - ‘oxygen’: O atom distributions
Composite Distributions: - ‘metal’: Combined Hf + Ta distributions (conductive species) - ‘total’: All atomic species combined (total density)
Each distribution array contains atom counts per spatial bin per frame.
- Return type:
Dict[str, np.ndarray]
Notes
Electrochemical Analysis Applications: - Filament tracking: Hf distribution shows conductive pathway evolution - Vacancy analysis: O distribution reveals ion migration patterns - Matrix stability: Ta distribution indicates structural changes - Electrode interactions: Interface region composition analysis
Statistical Considerations: - Empty frames produce zero-filled arrays with correct dimensions - Bin counts represent discrete atom positions (not normalized densities) - Multiple frames enable temporal analysis of switching dynamics
Performance Characteristics: - Memory complexity: O(n_frames × z_bins × 5) for output storage - Time complexity: O(n_frames × n_atoms × log(n_atoms)) due to species sorting - Optimized for batch processing of trajectory sequences
Physics-Informed Design: - Z-axis corresponds to electric field direction in devices - Species separation tracks individual ion migration mechanisms - Composite distributions reveal overall material redistribution - Bin resolution balances atomic-scale features with statistical significance
Examples
Single-frame analysis:
>>> import numpy as np >>> # Single configuration: 1000 atoms across electrode gap >>> coords = np.random.rand(1000, 6) >>> coords[:, 1] = np.random.choice([1, 2, 4], 1000) # O, Hf, Ta types >>> coords[:, 5] = np.random.uniform(-10, 40, 1000) # Z positions >>> distributions = calculate_atomic_distributions([coords], 50, -10, 40) >>> print(f"Hf peak density: {distributions['hafnium'][0].max()} atoms/bin")
Time-series filament analysis:
>>> # Multi-frame trajectory for SET/RESET switching >>> n_frames, n_atoms = 20, 500 >>> trajectory = np.random.rand(n_frames, n_atoms, 6) >>> trajectory[:, :, 1] = np.random.choice([1, 2, 4], (n_frames, n_atoms)) >>> trajectory[:, :, 5] = np.random.uniform(0, 30, (n_frames, n_atoms)) >>> dists = calculate_atomic_distributions(trajectory, 30, 0, 30) >>> >>> # Analyze filament evolution >>> hf_evolution = dists['hafnium'] # Shape: (20, 30) >>> initial_hf = hf_evolution[0] # Initial state >>> final_hf = hf_evolution[-1] # Final state >>> filament_growth = (final_hf - initial_hf).sum() >>> print(f"Net Hf redistribution: {filament_growth} atoms")
Layer-by-layer composition analysis:
>>> # Examine device cross-section >>> z_width, z_centers = calculate_z_bins_setup(-5, 35, 40) >>> coords = np.random.rand(800, 6) >>> coords[:, 1] = np.random.choice([1, 2, 4], 800) >>> coords[:, 5] = np.random.uniform(-5, 35, 800) >>> dists = calculate_atomic_distributions([coords], 40, -5, 35) >>> >>> # Calculate stoichiometry across device >>> hf_counts = dists['hafnium'][0] >>> ta_counts = dists['tantalum'][0] >>> o_counts = dists['oxygen'][0] >>> metal_total = dists['metal'][0] >>> >>> for i, z_pos in enumerate(z_centers): ... if metal_total[i] > 0: # Avoid division by zero ... hf_fraction = hf_counts[i] / metal_total[i] ... print(f"Z={z_pos:.1f}Å: Hf fraction = {hf_fraction:.2f}")
Electrode interface characterization:
>>> # Focus on electrode-oxide interfaces >>> interface_coords = coords[np.abs(coords[:, 5] - (-5)) < 3] # Near bottom electrode >>> interface_dists = calculate_atomic_distributions([interface_coords], 15, -8, 2) >>> print(f"Interface composition - Hf: {interface_dists['hafnium'][0].sum()}, " ... f"Ta: {interface_dists['tantalum'][0].sum()}, " ... f"O: {interface_dists['oxygen'][0].sum()}")
- lammpskit.ecellmodel.data_processing.calculate_charge_distributions(coordinates_arr, z_bins, zlo, zhi, atomic_distributions)[source]
Calculate electrostatic charge distributions for electrochemical field analysis.
Computes spatial charge profiles across the electrode-to-electrode axis to analyze electrostatic field formation, charge redistribution during switching, and ionic migration patterns in HfTaO resistive memory devices. Provides both total charge distributions and species-specific mean charge calculations for comprehensive electrochemical characterization.
- Parameters:
coordinates_arr (List[np.ndarray]) – Trajectory coordinate arrays with shape (n_atoms, n_columns) per frame. Column 2 must contain atomic charges (units: elementary charge e). Column 5 must contain z-coordinates for spatial analysis. Multi-frame input enables temporal charge evolution analysis.
z_bins (int) – Number of spatial bins for z-direction discretization. Recommended: 15-100 bins for optimal balance between field resolution and statistical significance.
zlo (float) – Lower z-boundary of analysis region (Angstroms). Typically bottom electrode position.
zhi (float) – Upper z-boundary of analysis region (Angstroms). Typically top electrode position.
atomic_distributions (Dict[str, np.ndarray]) – Atomic count distributions from calculate_atomic_distributions(). Required for safe normalization to compute mean charges per species. Must contain keys: ‘hafnium’, ‘tantalum’, ‘oxygen’, ‘metal’, ‘total’ with shape (n_frames, z_bins).
- Returns:
charge_distributions – Comprehensive charge analysis dictionary with keys:
Raw Charge Distributions: - ‘hafnium_charge’: Total Hf charge per bin (shape: n_frames × z_bins) - ‘tantalum_charge’: Total Ta charge per bin - ‘oxygen_charge’: Total O charge per bin - ‘metal_charge’: Combined Hf + Ta charge per bin - ‘total_charge’: All species combined charge per bin
Mean Charge per Atom: - ‘hafnium_mean_charge’: Average charge per Hf atom per bin - ‘tantalum_mean_charge’: Average charge per Ta atom per bin - ‘oxygen_mean_charge’: Average charge per O atom per bin - ‘metal_mean_charge’: Average charge per metal atom per bin - ‘total_mean_charge’: Average charge per atom (all species) per bin
All arrays have shape (n_frames, z_bins) for temporal analysis support.
- Return type:
Dict[str, np.ndarray]
Notes
Electrochemical Field Analysis: - Total charge: Reveals space charge regions and field gradients - Mean charges: Indicate oxidation state changes and ion mobility - Species separation: Tracks individual charge transfer mechanisms - Temporal evolution: Captures SET/RESET switching dynamics
Physical Interpretation: - Positive regions: Cation accumulation or anion depletion zones - Negative regions: Anion accumulation or cation depletion zones - Charge gradients: Drive ionic migration and filament formation - Interface charges: Control electron injection and device resistance
Mathematical Foundation:
Raw charge distribution:
Q_species(z) = Σ q_i * δ(z_i - z)
- Mean charge calculation:
<q>_species(z) = Q_species(z) / N_species(z)
Where safe division prevents numerical errors when N_species(z) = 0.
Performance Characteristics: - Memory complexity: O(n_frames × z_bins × 10) for all distributions - Time complexity: O(n_frames × n_atoms) for histogram calculations - Numerical stability: Safe division prevents undefined mean charges
Examples
Basic charge profile analysis:
>>> import numpy as np >>> from lammpskit.ecellmodel.data_processing import calculate_atomic_distributions >>> # Create mock trajectory with charge information >>> coords = np.random.rand(500, 6) >>> coords[:, 1] = np.random.choice([1, 2, 4], 500) # Atom types >>> coords[:, 2] = np.random.normal(0, 0.5, 500) # Charges around neutral >>> coords[:, 5] = np.random.uniform(-10, 40, 500) # Z positions >>> >>> # Calculate required atomic distributions first >>> atom_dists = calculate_atomic_distributions([coords], 50, -10, 40) >>> charge_dists = calculate_charge_distributions([coords], 50, -10, 40, atom_dists) >>> >>> # Analyze total electrostatic field >>> total_charge = charge_dists['total_charge'][0] >>> print(f"Max space charge density: {total_charge.max():.2f} e/bin") >>> print(f"Charge neutrality check: {total_charge.sum():.3f} e")
Species-specific charge analysis:
>>> # Examine oxidation state changes >>> hf_mean = charge_dists['hafnium_mean_charge'][0] >>> ta_mean = charge_dists['tantalum_mean_charge'][0] >>> o_mean = charge_dists['oxygen_mean_charge'][0] >>> >>> # Find regions with significant charge transfer >>> valid_hf = hf_mean[atom_dists['hafnium'][0] > 0] # Only where Hf atoms exist >>> if len(valid_hf) > 0: ... print(f"Hf oxidation range: {valid_hf.min():.2f} to {valid_hf.max():.2f} e") >>> >>> valid_o = o_mean[atom_dists['oxygen'][0] > 0] >>> if len(valid_o) > 0: ... print(f"O charge range: {valid_o.min():.2f} to {valid_o.max():.2f} e")
Temporal switching analysis:
>>> # Multi-frame charge evolution during switching >>> n_frames = 10 >>> trajectory = np.random.rand(n_frames, 300, 6) >>> trajectory[:, :, 1] = np.random.choice([1, 2, 4], (n_frames, 300)) >>> # Simulate progressive charge separation >>> for i in range(n_frames): ... trajectory[i, :, 2] = np.random.normal(0.1 * i, 0.3, 300) # Increasing separation >>> trajectory[:, :, 5] = np.random.uniform(0, 30, (n_frames, 300)) >>> >>> atom_dists = calculate_atomic_distributions(trajectory, 30, 0, 30) >>> charge_dists = calculate_charge_distributions(trajectory, 30, 0, 30, atom_dists) >>> >>> # Track charge evolution >>> total_evolution = charge_dists['total_charge'] # Shape: (10, 30) >>> for frame in range(n_frames): ... max_charge = total_evolution[frame].max() ... print(f"Frame {frame}: Max charge density = {max_charge:.2f} e/bin")
Electrode interface charge analysis:
>>> # Focus on electrode-oxide charge interfaces >>> z_width, z_centers = calculate_z_bins_setup(0, 30, 30) >>> coords = np.random.rand(400, 6) >>> coords[:, 2] = np.random.normal(0, 0.4, 400) # Realistic charge distribution >>> coords[:, 5] = np.random.uniform(0, 30, 400) >>> >>> atom_dists = calculate_atomic_distributions([coords], 30, 0, 30) >>> charge_dists = calculate_charge_distributions([coords], 30, 0, 30, atom_dists) >>> >>> # Identify charge accumulation regions >>> total_charge = charge_dists['total_charge'][0] >>> significant_charge = np.abs(total_charge) > 0.1 # Above noise threshold >>> if significant_charge.any(): ... charge_positions = z_centers[significant_charge] ... charge_values = total_charge[significant_charge] ... print(f"Charge accumulation at Z = {charge_positions} Å") ... print(f"Charge magnitudes: {charge_values} e/bin")
- lammpskit.ecellmodel.data_processing.extract_element_label_from_filename(filename)[source]
Extract element labels from HfTaO simulation filenames using intelligent parsing.
Provides robust filename analysis to identify atomic species from simulation output files, supporting various naming conventions used in electrochemical memory device analysis workflows. Essential for automated batch processing of species-specific mobility and displacement data from LAMMPS trajectory analysis.
- Parameters:
filename (str) – Full file path or basename containing element information. Supports common patterns from HfTaO simulation workflows: - Pattern format: “[digit][Element]mobile*.dat” (e.g., “1Hfmobilestc1.dat”) - Prefix format: “[Element]_*” or “[Element]*” (e.g., “Hf_mobility.dat”) - Embedded format: “[element]” (case-insensitive matching)
- Returns:
element_label – Standardized element symbol extracted from filename:
Standard Elements: - ‘Hf’: Hafnium (primary conductive species in filaments) - ‘Ta’: Tantalum (matrix material, device stability) - ‘O’: Oxygen (ion migration, vacancy formation) - ‘Al’: Aluminum (electrode material in some devices)
Special Cases: - ‘O_’: Oxygen with underscore (specific file convention) - ‘H’: Hydrogen (if present in simulation) - ‘??’: Fallback for unrecognized patterns
- Return type:
Notes
Parsing Strategy Hierarchy: 1. Pattern matching: [digit][Element]mobile format recognition 2. Prefix matching: Direct element prefix identification 3. Substring search: Case-insensitive element name detection 4. Character extraction: First 1-2 characters as fallback 5. Error handling: Graceful fallback for empty/invalid filenames
Element Mapping Logic: - ‘Oo’ → ‘O’: Handles double-letter oxygen notation - Case normalization: Converts to standard chemical symbols - Robust fallbacks: Prevents analysis pipeline failures
Electrochemical Simulation Context: Essential for processing mobility analysis outputs where different atomic species generate separate trajectory files. Enables automated species identification for: - Ion migration tracking (O atom vacancy pathways) - Filament analysis (Hf conductive bridge formation) - Matrix stability (Ta structural evolution) - Electrode interaction (Al/electrode interface dynamics)
Performance Characteristics: - Time complexity: O(filename_length) for regex operations - Memory usage: O(1) for string processing - Error resilience: Multiple fallback strategies prevent failures - Batch efficiency: Optimized for high-throughput filename processing
Integration with LAMMPSKit Workflows: - Mobility analysis: Species-specific diffusion coefficient extraction - Displacement tracking: Ion migration pathway identification - Batch processing: Automated trajectory file classification - Data organization: Species-sorted output file management
Examples
Standard HfTaO simulation files:
>>> from lammpskit.ecellmodel.data_processing import extract_element_label_from_filename >>> # Typical mobility analysis files >>> print(extract_element_label_from_filename("1Hfmobilestc1.dat")) 'Hf' >>> print(extract_element_label_from_filename("2Oomobilestc1.dat")) 'O' >>> print(extract_element_label_from_filename("3Tamobilestc1.dat")) 'Ta'
Alternative naming conventions:
>>> # Prefix-based naming >>> print(extract_element_label_from_filename("Hf_displacement_analysis.txt")) 'Hf' >>> print(extract_element_label_from_filename("Ta_mobility_data.csv")) 'Ta' >>> print(extract_element_label_from_filename("O_vacancy_tracking.dat")) 'O_'
Case-insensitive matching:
>>> # Handles various case conventions >>> print(extract_element_label_from_filename("hf_trajectory.dump")) 'Hf' >>> print(extract_element_label_from_filename("AL_electrode.lammpstrj")) 'Al' >>> print(extract_element_label_from_filename("oxygen_migration.xyz")) 'O'
Batch processing workflow:
>>> import os >>> # Process all files in mobility analysis directory >>> simulation_files = [ ... "1Hfmobilestc1.dat", "2Oomobilestc1.dat", "3Tamobilestc1.dat", ... "4Almobilestc1.dat", "summary_mobility.txt" ... ] >>> >>> species_files = {} >>> for filename in simulation_files: ... element = extract_element_label_from_filename(filename) ... if element not in species_files: ... species_files[element] = [] ... species_files[element].append(filename) >>> >>> # Organize by species for analysis >>> for species, files in species_files.items(): ... print(f"{species}: {len(files)} files")
Error handling demonstration:
>>> # Robust handling of edge cases >>> print(extract_element_label_from_filename("")) # Empty filename '??' >>> print(extract_element_label_from_filename("unknown_format.dat")) 'un' >>> print(extract_element_label_from_filename("H")) # Single character 'H'
Integration with trajectory analysis:
>>> # Automated species identification in analysis pipeline >>> def process_mobility_files(file_list): ... species_data = {} ... for filepath in file_list: ... element = extract_element_label_from_filename(filepath) ... # Load and process species-specific data ... if element in ['Hf', 'Ta', 'O', 'Al']: ... species_data[element] = f"Processing {element} mobility from {filepath}" ... return species_data >>> >>> files = ["1Hfmobilestc1.dat", "2Oomobilestc1.dat"] >>> results = process_mobility_files(files) >>> for species, status in results.items(): ... print(f"{species}: {status}")