# Architecture

This page explains the main components in NeXusCreator and how data flows from the initial inputs to the final NeXus (`.nxs`) file.

## Components

- `NeXusCreator.py`: command-line entry point.
  - Parses options and either generates a NeXus definition (`.nxd`) via the plugin system or performs a full conversion using `NeXusCreatorClass`.
  - With `--yaml` in generation mode (`-g`), it emits YAML instead of `.nxd` using `libraries/NeXusYaml.py`.
- `NeXusCreatorClass.py`: orchestrates conversions.
  - Normalises output paths, handles ICAT subfolders, and selects a `DataParserPlugin` via the plugin manager (with fallbacks to built-in parsers).
  - Supports beamline-specific modes (for example, `batteries` and `ikft`).
  - Parses the input into a variable library, loads the `.nxd`, injects values, and writes the final `.nxs`.
  - Handles SPEC per-scan output and builds a master file with HDF5 external links.
- `libraries/NeXusHDF5.py`: data injection and writing.
  - `NexusValueInjector` replaces placeholders in the NeXus definition with real arrays and expands scan templates marked with `@scan_template`.
  - `NexusHDF5Writer` converts `@dtype`/`@value` descriptors to datasets, applies attributes, and recognises `@link` (soft links) and `@extlink` (HDF5 external links).
- `libraries/NeXusDefinition.py`: loads `.nxd` files into the nested objects consumed by the injector and writer. It supports internal (`name: --> /abs/path`) and external (`name: --> file | /abs/path`) links.
- `libraries/NeXusYaml.py`: reads and writes the YAML representation, using `link:` for internal links and `external: {file, path}` for external links.
- `libraries/eis_processing.py`: enriches DTA/DAT variable libraries with derived electrochemical datasets (charge, state of charge, current/voltage curves, EIS metrics). Activated by the `--batteries-analysis` flag.
- `libraries/mpes_utils.py`: locates and validates MPES HDF5 input files for the MPES plugin.
- `parsers/`: format-specific parsers reused by plugins (SPEC, DTA RAW/non-RAW/temp, TIFF, etc.).
  - `parsers/base.py` defines the `BaseParser` class and the `ParserManager` for dynamic discovery and registration.
  - Each parser inherits from `BaseParser` and implements the `can_parse` and `parse` methods.
- `generators/`: helpers to construct NeXus definition objects from input files and folders.
  - `generators/base.py` defines the `BaseGenerator` class and the `GeneratorManager` for dynamic discovery and registration.
  - `generators/base.py` also exports the `make_entry_obj()` and `get_or_create_group()` helper functions for plugin authors.
  - Each generator inherits from `BaseGenerator` and implements the `can_generate` and `generate` methods.
- `nexuscreator/constants.py`: central constants (`NXClass`, `FileExt`) for NeXus class names and file extensions, shared across plugins and generators.
- `plugins/`: extension point for third parties.
  - `plugins/base.py` defines the `DefinitionGeneratorPlugin` and `DataParserPlugin` interfaces.
  - Discovery imports every module under `plugins/`, instantiates the subclasses, and orders them by `priority`.
  - Built-in plugins: `spec_plugin.py`, `dta_plugin.py`, `hdf5_plugin.py`, `tiff_plugin.py`, `peaxis_plugin.py`, `yaml_plugin.py`, `mpes_plugin.py`, `jsonld_plugin.py`, `diamond_ascii_plugin.py`.

## Data Flow

1. The CLI parses arguments, then chooses between generation (`-g`) and conversion (`-n`).
2. For conversion, `NeXusCreatorClass`:
   - Chooses a parser via `Plugins.get_plugin_manager().get_parser(...)` (with fallbacks).
   - Produces a flat variable library (keys such as `general_*`, `scan12_*`).
   - Reads the `.nxd`, expands scan templates, injects values with `NexusValueInjector`, and writes the result via `NexusHDF5Writer`.
   - With `-f` and SPEC input, splits the output into per-scan files and creates a master file with external links to `/entry` in each per-scan file.
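The injection step can be pictured as a walk over the nested definition. The walk swaps placeholders for values from the flat library and replicates each scan template once per scan. The sketch below is a hypothetical stand-in for `NexusValueInjector`, not its actual code. This page does not specify the `.nxd` placeholder format, so three details are assumptions made for illustration: the `$key` placeholder syntax, the `{n}` marker in template names and keys, and the `scan<N>_` key convention (inferred from keys like `scan12_*`).

```python
import re
from typing import Any, Dict, List, Optional

# Hypothetical placeholder syntax: a string "$name" is replaced by library["name"].
# Braces are allowed so template keys like "$scan{n}_counts" can be formatted.
PLACEHOLDER = re.compile(r"^\$([\w{}]+)$")
# Assumed naming convention for per-scan keys in the flat library, e.g. "scan12_counts".
SCAN_KEY = re.compile(r"^scan(\d+)_")


def scan_numbers(library: Dict[str, Any]) -> List[int]:
    """Return the sorted, de-duplicated scan numbers found in the library keys."""
    numbers = set()
    for key in library:
        match = SCAN_KEY.match(key)
        if match:
            numbers.add(int(match.group(1)))
    return sorted(numbers)


def inject(node: Any, library: Dict[str, Any], scan: Optional[int] = None) -> Any:
    """Return a copy of node with placeholders resolved and scan templates expanded."""
    if isinstance(node, dict):
        result: Dict[str, Any] = {}
        for name, child in node.items():
            if isinstance(child, dict) and child.get("@scan_template"):
                # Replicate the template group once per scan; "{n}" in the group
                # name (e.g. "scan_{n}") becomes the scan number.
                template = {k: v for k, v in child.items() if k != "@scan_template"}
                for n in scan_numbers(library):
                    result[name.format(n=n)] = inject(template, library, scan=n)
            else:
                result[name] = inject(child, library, scan)
        return result
    if isinstance(node, str):
        match = PLACEHOLDER.match(node)
        if match:
            key = match.group(1)
            if scan is not None:
                key = key.format(n=scan)  # "scan{n}_counts" -> "scan12_counts"
            # Unresolved placeholders are left as-is rather than raising.
            return library.get(key, node)
    return node


if __name__ == "__main__":
    definition = {
        "entry": {
            "@NX_class": "NXentry",
            "title": "$general_title",
            "scan_{n}": {
                "@scan_template": True,
                "@NX_class": "NXdata",
                "counts": "$scan{n}_counts",
            },
        }
    }
    library = {"general_title": "demo", "scan1_counts": [1, 2], "scan2_counts": [3, 4]}
    print(inject(definition, library))
```

Keeping the library flat means that expanding a template only needs the set of scan numbers, which can be read straight off the key prefixes.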
## External Links for SPEC Per-Scan Outputs

When `-f/--file_per_scan` is used, the tool writes one `.nxs` file per scan plus a master file. The master file contains HDF5 external links under `/entry/scan_XX`, each pointing to `/entry` in the corresponding per-scan file. This keeps each scan self-contained while providing a single entry point.

## Output Path Rules

- If `-o` is a directory, outputs are created inside it using the `.nxd` base name.
- If `-o` is a filename, it is used verbatim.
- If `-I NUM` is provided, a `proposal_/` subfolder is appended to the output path before writing. This is used in ICAT data-ingestion workflows, where each proposal's outputs must reside in a dedicated subdirectory.

## Parser and Generator System

The parser and generator system provides a consistent, extensible way to handle different input formats and to generate NeXus definitions.

### Base Classes

- **BaseParser**: defines the interface for all parsers. Each parser must implement:
  - `can_parse(input_path: str) -> bool`: checks whether the parser can handle the given input path.
  - `parse(input_path: str) -> Dict[str, object]`: parses the input file and returns a flat library.
- **BaseGenerator**: defines the interface for all generators. Each generator must implement:
  - `can_generate(input_path: str) -> bool`: checks whether the generator can handle the given input path.
  - `generate(input_path: str) -> dict`: generates a NeXus definition object from the input.

### Managers

- **ParserManager**: manages the discovery and registration of parsers.
  - Discovers all parsers in the `parsers` package.
  - Provides `get_parser()`, which returns a matching parser for a given input path.
- **GeneratorManager**: manages the discovery and registration of generators.
  - Discovers all generators in the `generators` package.
  - Provides `get_generator()`, which returns a matching generator for a given input path.

### Discovery and Registration

The system automatically discovers and registers parsers and generators by:

1. Importing all modules in the `nexuscreator.parsers` or `nexuscreator.generators` package.
2. Finding all classes that inherit from `BaseParser` or `BaseGenerator`.
3. Instantiating these classes and sorting them by priority.

### Usage

To use the parser and generator system:

```python
from nexuscreator.parsers import get_parser_manager
from nexuscreator.generators import get_generator_manager

# Get a parser for a specific file type
parser_manager = get_parser_manager()
parser = parser_manager.get_parser("test.dta")
if parser:
    library = parser.parse("test.dta")

# Get a generator for a specific file type
generator_manager = get_generator_manager()
generator = generator_manager.get_generator("test.dta")
if generator:
    nexus_object = generator.generate("test.dta")
```

### Creating a New Parser or Generator

To create a new parser or generator:

1. **Create a new parser**:

   ```python
   from typing import Dict

   from nexuscreator.parsers.base import BaseParser


   class MyParser(BaseParser):
       id: str = 'my-parser'
       priority: int = 10

       def can_parse(self, input_path: str) -> bool:
           return input_path.lower().endswith('.myformat')

       def parse(self, input_path: str) -> Dict[str, object]:
           # Parse the file and return a flat {name: value} library
           return {"key": "value"}
   ```

2. **Create a new generator**:

   ```python
   from nexuscreator.generators.base import BaseGenerator


   class MyGenerator(BaseGenerator):
       id: str = 'my-generator'
       priority: int = 10

       def can_generate(self, input_path: str) -> bool:
           return input_path.lower().endswith('.myformat')

       def generate(self, input_path: str) -> dict:
           # Generate a NeXus definition object from the input
           return {"entry": {"@NX_class": "NXentry"}}
   ```

### Benefits

- **Consistency**: All parsers and generators follow the same interface.
- **Extensibility**: New parsers and generators can be added by inheriting from the base classes.
- **Discoverability**: The system automatically discovers and registers parsers and generators.
- **Testability**: The system is covered by tests, and the uniform interface makes each parser and generator easy to test in isolation.

## Plugin Overview

- **Generators** decide whether they can build an `.nxd` for the given input and, if so, return the NeXus definition object.
- **Parsers** decide whether they can parse the input and, if so, return the flat dictionary whose keys fill the placeholders in the `.nxd`.

Refer to the `plugins/` package for built-in examples.
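Plugins and parsers share the same selection pattern. Each candidate reports whether it can handle the input, and candidates are consulted in `priority` order. The sketch below illustrates that pattern and is not the project's `ParserManager`. `SketchParser`, `SketchManager`, `GenericTextParser` and `DtaParser` are hypothetical names. The sketch finds classes via `__subclasses__()`, which only sees direct subclasses that are already imported, instead of importing every module in a package. It also assumes that higher `priority` values are tried first, which this page does not state.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SketchParser(ABC):
    """Hypothetical stand-in for BaseParser, with the interface described above."""

    id: str = "base"
    priority: int = 0

    @abstractmethod
    def can_parse(self, input_path: str) -> bool: ...

    @abstractmethod
    def parse(self, input_path: str) -> Dict[str, object]: ...


class SketchManager:
    """Instantiates every direct subclass of `base` and orders them by priority."""

    def __init__(self, base: type) -> None:
        # Assumption: a higher priority value means the parser is consulted earlier.
        self.parsers: List[SketchParser] = sorted(
            (cls() for cls in base.__subclasses__()),
            key=lambda p: p.priority,
            reverse=True,
        )

    def get_parser(self, input_path: str) -> Optional[SketchParser]:
        """Return the first parser whose can_parse() accepts the path, else None."""
        for parser in self.parsers:
            if parser.can_parse(input_path):
                return parser
        return None


class GenericTextParser(SketchParser):
    """Hypothetical low-priority fallback that accepts several text formats."""

    id = "generic-text"
    priority = 1

    def can_parse(self, input_path: str) -> bool:
        return input_path.lower().endswith((".txt", ".dta"))

    def parse(self, input_path: str) -> Dict[str, object]:
        return {"general_source": input_path}


class DtaParser(SketchParser):
    """Hypothetical specialised parser that should win over the fallback for .dta."""

    id = "dta"
    priority = 10

    def can_parse(self, input_path: str) -> bool:
        return input_path.lower().endswith(".dta")

    def parse(self, input_path: str) -> Dict[str, object]:
        return {"general_source": input_path, "general_format": "DTA"}


if __name__ == "__main__":
    manager = SketchManager(SketchParser)
    for path in ("run.DTA", "notes.txt", "image.tif"):
        parser = manager.get_parser(path)
        print(path, "->", parser.id if parser else None)
```

Ordering by priority lets a specialised parser override a generic fallback for the same extension without any explicit registration table.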