Architecture

This page explains the main components in NeXusCreator and how data flows from the initial inputs to the final NeXus (.nxs) file.

Components

  • NeXusCreator.py — command-line entry point.

    • Parses options and either generates a NeXus definition (.nxd) via the plugin system, or performs a full conversion using NeXusCreatorClass.

    • With --yaml in generation mode (-g), it emits YAML instead of .nxd, using libraries/NeXusYaml.py.

  • NeXusCreatorClass.py — orchestrates conversions.

    • Normalises output paths, handles ICAT subfolders, and selects a DataParserPlugin via the plugin manager (with fallbacks to built-in parsers).

    • Supports beamline-specific modes (for example batteries, ikft).

    • Parses the input into a variable library, loads the .nxd, injects values, and writes the final .nxs.

    • Handles SPEC per-scan output and builds a master file with HDF5 external links.

  • libraries/NeXusHDF5.py — data injection and writing.

    • NexusValueInjector replaces placeholders in the NeXus definition with real arrays and expands scan templates marked with @scan_template.

    • NexusHDF5Writer converts @dtype/@value descriptors to datasets, applies attributes, and recognises @link (soft links) and @extlink (HDF5 external links).

  • libraries/NeXusDefinition.py — loads .nxd files into the nested objects consumed by the injector and writer; supports internal (name: --> /abs/path) and external (name: --> file | /abs/path) links.

  • libraries/NeXusYaml.py — reads and writes the YAML representation with link: for internal links and external: {file, path} for external links.

  • libraries/eis_processing.py — enriches DTA/DAT variable libraries with derived electrochemical datasets (charge, state-of-charge, current/voltage curves, EIS metrics). Activated by the --batteries-analysis flag.

  • libraries/mpes_utils.py — locates and validates MPES HDF5 input files for the MPES plugin.

  • parsers/ — format-specific parsers reused by plugins (SPEC, DTA RAW/non-RAW/temp, TIFF, etc.).

    • parsers/base.py defines the BaseParser class and ParserManager for dynamic discovery and registration.

    • Each parser inherits from BaseParser and implements can_parse and parse methods.

  • generators/ — helpers to construct NeXus definition objects from input files and folders.

    • generators/base.py defines the BaseGenerator class and GeneratorManager for dynamic discovery and registration.

    • generators/base.py also exports make_entry_obj() and get_or_create_group() helper functions for plugin authors.

    • Each generator inherits from BaseGenerator and implements can_generate and generate methods.

  • nexuscreator/constants.py — central constants (NXClass, FileExt) for NeXus class names and file extensions, shared across plugins and generators.

  • plugins/ — extension point for third parties.

    • plugins/base.py defines the DefinitionGeneratorPlugin and DataParserPlugin interfaces.

    • Discovery imports every module under plugins/, instantiates subclasses, and orders them by priority.

    • Built-in plugins: spec_plugin.py, dta_plugin.py, hdf5_plugin.py, tiff_plugin.py, peaxis_plugin.py, yaml_plugin.py, mpes_plugin.py, jsonld_plugin.py, diamond_ascii_plugin.py.
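The two link notations listed for NeXusDefinition.py and NeXusYaml.py can be illustrated with a small sketch. It converts the value part of a .nxd link (the text after the name and colon) into the YAML form. The function name link_to_yaml_form and the whitespace handling are assumptions made for this example; the actual logic in the libraries may differ.

```python
from typing import Any, Dict

ARROW = "-->"  # marker that introduces a link target in a .nxd value


def link_to_yaml_form(value: str) -> Dict[str, Any]:
    """Map a .nxd link value to the YAML link representation.

    Hypothetical helper, not part of NeXusCreator.
    """
    text = value.strip()
    if not text.startswith(ARROW):
        raise ValueError(f"not a link value: {value!r}")
    target = text[len(ARROW):].strip()
    if "|" in target:
        # External link: "file | /abs/path"
        file_name, path = (part.strip() for part in target.split("|", 1))
        return {"external": {"file": file_name, "path": path}}
    # Internal link: "/abs/path"
    return {"link": target}


internal = link_to_yaml_form("--> /entry/instrument/detector/data")
external = link_to_yaml_form("--> scan_0012.nxs | /entry")
```

Here internal holds a single link key, while external holds a nested mapping with file and path, mirroring the two cases described above.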

Data Flow

  1. The CLI parses arguments, then chooses between generation (-g) and conversion (-n).

  2. For conversion, NeXusCreatorClass:

    • Chooses a parser via Plugins.get_plugin_manager().get_parser(...) (with fallbacks).

    • Produces a flat variable library (keys such as general_*, scan12_*).

    • Reads the .nxd, expands scan templates, injects values with NexusValueInjector, and writes the result via NexusHDF5Writer.

    • With -f and SPEC, splits into per-scan files and creates a master with external links to /entry in each per-scan file.
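The relationship between the flat variable library and the definition can be sketched as follows. This page does not specify the placeholder syntax, so the sketch assumes that any string value equal to a library key counts as a placeholder. The functions inject and scan_numbers are hypothetical and are not the NexusValueInjector API.

```python
import re
from typing import Any, Dict, List


def scan_numbers(library: Dict[str, Any]) -> List[int]:
    """Collect the scan numbers that appear in keys such as scan12_energy."""
    numbers = set()
    for key in library:
        match = re.match(r"scan(\d+)_", key)
        if match:
            numbers.add(int(match.group(1)))
    return sorted(numbers)


def inject(node: Any, library: Dict[str, Any]) -> Any:
    """Return a copy of the definition with placeholders replaced.

    Assumption: a placeholder is a string that matches a library key exactly.
    """
    if isinstance(node, dict):
        return {name: inject(child, library) for name, child in node.items()}
    if isinstance(node, str) and node in library:
        return library[node]
    return node


library = {
    "general_title": "demo run",
    "scan12_energy": [1.0, 2.0],
    "scan3_energy": [5.0],
}
definition = {
    "entry": {
        "@NX_class": "NXentry",
        "title": "general_title",
        "energy": {"@dtype": "float64", "@value": "scan12_energy"},
    }
}
filled = inject(definition, library)
```

The original definition is left untouched; filled is a new nested object in which title and the @value of energy carry the library values.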

Output Path Rules

  • If -o is a directory, outputs are created inside it using the .nxd base name.

  • If -o is a filename, it is used verbatim.

  • If -I NUM is provided, a proposal_<NUM>/ subfolder is appended to the output path before writing. This is used for ICAT data ingestion workflows where each proposal’s outputs must reside in a dedicated subdirectory.
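The three rules above can be summarised in a short sketch. The function resolve_output_path is hypothetical. It assumes that a directory is recognised by already existing on disk, that the output file takes the .nxs extension, and that for a filename the proposal_<NUM>/ subfolder is placed next to the file; the real path normalisation in NeXusCreatorClass.py may differ.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(output: str, nxd_path: str,
                        proposal: Optional[str] = None) -> Path:
    """Apply the output path rules to the -o value (sketch only)."""
    out = Path(output)
    if out.is_dir():
        # -o is a directory: name the file after the .nxd base name.
        directory = out
        filename = Path(nxd_path).stem + ".nxs"
    else:
        # -o is a filename: keep it verbatim.
        directory = out.parent
        filename = out.name
    if proposal is not None:
        # -I NUM: add the ICAT proposal subfolder.
        directory = directory / f"proposal_{proposal}"
    return directory / filename


plain = resolve_output_path("results/run.nxs", "defs/sample.nxd")
icat = resolve_output_path("results/run.nxs", "defs/sample.nxd", proposal="42")
```

The sketch only computes the path; creating the proposal_<NUM>/ directory before writing would be a separate step.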

Parser and Generator System

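The parser and generator system handles every input format through a common interface: parsers turn input files into a flat variable library, and generators build NeXus definition objects. Implementations are discovered and registered automatically.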

Base Classes

  • BaseParser: Defines the interface for all parsers. Each parser must implement:

    • can_parse(input_path: str) -> bool: Checks if the parser can handle the given input path.

    • parse(input_path: str) -> Dict[str, object]: Parses the input file and returns a flat library.

  • BaseGenerator: Defines the interface for all generators. Each generator must implement:

    • can_generate(input_path: str) -> bool: Checks if the generator can handle the given input path.

    • generate(input_path: str) -> dict: Generates a NeXus-definition object from the input.

Managers

  • ParserManager: Manages the discovery and registration of parsers.

    • Discovers all parsers in the parsers package.

    • Provides get_parser(input_path) to obtain a parser that can handle a given input file.

  • GeneratorManager: Manages the discovery and registration of generators.

    • Discovers all generators in the generators package.

    • Provides get_generator(input_path) to obtain a generator that can handle a given input file.
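How a manager might pick a parser can be sketched as a first-match lookup over a priority-sorted list. The classes SketchParser and SketchParserManager are hypothetical stand-ins, and the rule that a higher priority value wins is an assumption; this page only states that instances are ordered by priority.

```python
from typing import Dict, List, Optional


class SketchParser:
    """Hypothetical stand-in for a BaseParser subclass."""

    def __init__(self, parser_id: str, suffix: str, priority: int) -> None:
        self.id = parser_id
        self.suffix = suffix
        self.priority = priority

    def can_parse(self, input_path: str) -> bool:
        return input_path.lower().endswith(self.suffix)

    def parse(self, input_path: str) -> Dict[str, object]:
        return {"general_source": input_path}


class SketchParserManager:
    """Keeps parsers sorted and returns the first one that accepts a path."""

    def __init__(self) -> None:
        self._parsers: List[SketchParser] = []

    def register(self, parser: SketchParser) -> None:
        self._parsers.append(parser)
        # Assumption: higher priority values are consulted first.
        self._parsers.sort(key=lambda p: p.priority, reverse=True)

    def get_parser(self, input_path: str) -> Optional[SketchParser]:
        for parser in self._parsers:
            if parser.can_parse(input_path):
                return parser
        return None  # no parser accepts this input


manager = SketchParserManager()
manager.register(SketchParser("generic-dta", ".dta", priority=1))
manager.register(SketchParser("raw-dta", ".dta", priority=10))
chosen = manager.get_parser("cell_A.DTA")
missing = manager.get_parser("image.tiff")
```

Returning None when nothing matches is why calling code should check the result before calling parse.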

Discovery and Registration

The system automatically discovers and registers parsers and generators by:

  1. Importing all modules in the nexuscreator.parsers or nexuscreator.generators package.

  2. Finding all classes that inherit from BaseParser or BaseGenerator.

  3. Instantiating these classes and sorting them by priority.
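The three discovery steps above can be sketched with the standard library. The names iter_package_modules and collect_instances are hypothetical, the BaseParser here is a minimal stand-in, and the sketch assumes that subclasses can be constructed without arguments and that higher priority values sort first.

```python
import importlib
import inspect
import pkgutil
from types import ModuleType
from typing import Iterable, Iterator, List


class BaseParser:
    """Minimal stand-in for the real base class."""
    id: str = "base"
    priority: int = 0


def iter_package_modules(package: ModuleType) -> Iterator[ModuleType]:
    """Step 1: import every module found directly inside a package."""
    prefix = package.__name__ + "."
    for info in pkgutil.iter_modules(package.__path__, prefix):
        yield importlib.import_module(info.name)


def collect_instances(modules: Iterable[ModuleType], base_cls: type) -> List[object]:
    """Steps 2 and 3: find subclasses, instantiate them, sort by priority."""
    found = {}
    for module in modules:
        for _, obj in inspect.getmembers(module, inspect.isclass):
            if obj is base_cls or not issubclass(obj, base_cls):
                continue
            if inspect.isabstract(obj):
                continue  # cannot be instantiated
            # Key by class so a class re-exported by two modules is used once.
            found.setdefault(obj, obj())
    return sorted(found.values(), key=lambda inst: inst.priority, reverse=True)
```

With a real package, the two functions combine as collect_instances(iter_package_modules(package), BaseParser).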

Usage

To use the parser and generator system:

from nexuscreator.parsers import get_parser_manager
from nexuscreator.generators import get_generator_manager

# Get a parser for a specific file type
parser_manager = get_parser_manager()
parser = parser_manager.get_parser("test.dta")
if parser:
    library = parser.parse("test.dta")

# Get a generator for a specific file type
generator_manager = get_generator_manager()
generator = generator_manager.get_generator("test.dta")
if generator:
    nexus_object = generator.generate("test.dta")

Creating a New Parser or Generator

To create a new parser or generator:

  1. Create a new parser:

from typing import Dict

from nexuscreator.parsers.base import BaseParser

class MyParser(BaseParser):
    id: str = 'my-parser'
    priority: int = 10

    def can_parse(self, input_path: str) -> bool:
        # Accept files by extension, ignoring case
        return input_path.lower().endswith('.myformat')

    def parse(self, input_path: str) -> Dict[str, object]:
        # Parse the file and return a flat library
        return {"key": "value"}

  2. Create a new generator:

from nexuscreator.generators.base import BaseGenerator

class MyGenerator(BaseGenerator):
    id: str = 'my-generator'
    priority: int = 10

    def can_generate(self, input_path: str) -> bool:
        return input_path.lower().endswith('.myformat')

    def generate(self, input_path: str) -> dict:
        # Generate a NeXus-definition object from the input
        return {"entry": {"@NX_class": "NXentry"}}

Benefits

  • Consistency: All parsers and generators follow a consistent interface.

  • Extensibility: New parsers and generators can be easily added by inheriting from the base classes.

  • Discoverability: The system automatically discovers and registers parsers and generators.

  • Testability: Each parser and generator can be tested in isolation through its small, uniform interface, and the system itself is covered by tests.

Plugin Overview

  • Generators decide whether they can build an .nxd for the given input and return the NeXus definition object.

  • Parsers decide whether they can parse the input and return the flat dictionary consumed by placeholders in the .nxd.

Refer to the plugins/ package for built-in examples.