# Usage Guide

NeXusCreator generates NeXus definitions (`.nxd` or YAML) from a variety of inputs and uses them to
convert those inputs into NeXus HDF5 files (`.nxs`). This guide shows the common CLI flows and
illustrates what happens under the hood.

## Installation Recap

```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install .
```

This provides both the `nexuscreator` and `nxc` entry points. You can also invoke the module
directly using `python3 -m nexuscreator`.

## Choose Options by Relevance

Use this quick map before running commands:

- Always relevant: `-i/--input`, `-o/--output_path`, `-b/--beamline_name`
- Converting to `.nxs` (`-n`): `-n/--nexus_definition`, `-f/--file_per_scan`, `-I/--icat_proposal_number`, `--auto-generate-nxd`, `--pair-dta-raw`, `--batteries-analysis`
- Generating definitions (`-g`): `-g/--generate_nexus_definition`, `-t/--template`, `--single-file`, `--multi-file`, `--yaml`, `--hdf5-option`, `--pair-dta-raw`, `--user-name`
- Directory/batch work: `-r/--recursive`, `--glob`, `--glob-spec`, `--glob-dta`, `--dry-run`, `--summary-only`, `--no-group-dta-folders`
- Metadata/schema placement: `--metadata-csv`, `--jsonld-structure`, `--nxdl-root`, `--app-def`, `--export-vars-csv`, `-d/--dictionary`, `-D/--debug`
- Validation/value export: `--validate`, `--export-values-csv`, `--export-values-prefix`, `--csv-delimiter`

## Generate a NeXus Definition

```bash
# From a SPEC file
nexuscreator -g out.nxd -i data.spec

# Produce a single-scan template for later multi-scan conversion
nexuscreator -g template.nxd -i data.spec -t

# Batteries workflow (DTA/DAT folder)
nexuscreator -g out.nxd -i /path/to/folder -b batteries
```

Add `--yaml` to build YAML templates instead of `.nxd`. When generating from directories you can
choose how many outputs to emit:

```bash
# Single combined .nxd for the entire folder
nexuscreator -g out.nxd -i /path/to/folder --single-file

# One .nxd per supported input file
nexuscreator -g out_prefix_ -i /path/to/folder --multi-file

# YAML variants
nexuscreator -g out_dir/ -i /path/to/folder --single-file --yaml
nexuscreator -g out_dir/ -i /path/to/folder --multi-file --yaml
```

When targeting HDF5 or NeXus inputs, pick how placeholders should be generated:

```bash
# Links mode (default for generation): keep references to datasets
nexuscreator -g out.nxd -i data.nxs --hdf5-option links

# Extract mode: replace values with placeholders and emit a variable library
nexuscreator -g out.nxd -i data.nxs --hdf5-option extract
```

## Convert to NeXus HDF5

```bash
# Single output from SPEC
nexuscreator -n def.nxd -i data.spec -o out.nxs

# One file per scan plus a master with external links
nexuscreator -n def.nxd -i data.spec -o out.nxs -f

# Batteries or single DTA/DAT
nexuscreator -n def.nxd -i /path/to/folder -b batteries -o out.nxs
nexuscreator -n def.nxd -i data.dta -o out.nxs

# Single DTA + matching RAW sibling as one output
nexuscreator -n def.nxd -i EIS_CH_#1_#1.dta -o out.nxs --pair-dta-raw

# Diamond B18 XAFS (ASCII or NeXus input)
nexuscreator -g diamond.nxd -i "data/cdi-ddi/.../ascii/263814_PtSn_OCA_1.dat"
nexuscreator -n diamond.nxd -i "data/cdi-ddi/.../nexus/263814_PtSn_OCA_1.nxs" -o out.nxs

# MPES workflow
nexuscreator -g mpes.nxd -i /path/to/mpes_data -b mpes
nexuscreator -n mpes.nxd -i /path/to/mpes_data -b mpes -o out.nxs
```

Generation mode also supports pairing for a single `.dta` input:

```bash
nexuscreator -g pair.nxd -i EIS_CH_#1_#1.dta --pair-dta-raw
```

To inject ICAT proposal folders or auto-generate definitions during conversion:

```bash
# Append ICAT proposal number to the output path
# Outputs are placed under <output_path>/proposal_12345/
nexuscreator -n def.nxd -i data.spec -o ./outputs -I 12345

# Directory conversion with auto-generated definitions per file
nexuscreator -n def.nxd -i /path/to/folder -o ./out/ --auto-generate-nxd
```

## Batteries Electrochemical Analysis

Add `--batteries-analysis` to any batteries conversion to enrich the output with derived
electrochemical datasets, namely state-of-charge (SoC), current/voltage (IV) curves, and EIS metrics
(`libraries/eis_processing.py`):

```bash
# Generate definition with EIS structure
nexuscreator -g eis.nxd -i /path/to/batteries_folder -b batteries

# Convert with full electrochemical analysis
nexuscreator -n eis.nxd -i /path/to/batteries_folder -b batteries -o out.nxs \
  --batteries-analysis
```

Without `--batteries-analysis`, the batteries parser produces the raw DTA/DAT variable library
only. With it, additional datasets (charge, state-of-charge, IV curves) are computed and
injected automatically.

### DTA/RAW pair templates

When generating a paired DTA/RAW template, optionally supply the user name to avoid an
interactive prompt:

```bash
nexuscreator -g pair.nxd -i EIS_CH_#1_#1.dta --pair-dta-raw --user-name "Jane Smith"
```

If `--user-name` is omitted and the template requires a user field, the CLI prompts once and
caches the answer for the remainder of the run.

## MPES Workflow

The MPES plugin handles multidimensional photoemission spectroscopy HDF5 data. Activate it with
`-b mpes`:

```bash
# Generate a NeXus definition from MPES data
nexuscreator -g mpes.nxd -i /path/to/mpes_data -b mpes

# Convert to NeXus HDF5
nexuscreator -n mpes.nxd -i /path/to/mpes_data -b mpes -o out.nxs
```

The plugin uses `libraries/mpes_utils.py` to locate the HDF5 file within the input path and
`parsers/mpes_parser.py` to extract the variable library. The generator emits a definition
via `generators/mpes_to_nexus.py`.

## Directory Scans and Batch Processing

```bash
# Process a directory non-recursively; one output per input
nexuscreator -n def.nxd -i /path/to/dir -o ./out/

# Recurse into subfolders
nexuscreator -n def.nxd -i /data/root -o ./out/ -r

# Restrict to files matching patterns
nexuscreator -n def.nxd -i /data/root -o ./out/ -r --glob "*.spec"
nexuscreator -n def.nxd -i /data/root -o ./out/ -r \
  --glob-spec "*.spec" --glob-dta "EIS_*.dta"

# Dry run with a concise summary
nexuscreator -n def.nxd -i /data/root -o ./out/ -r \
  --glob-spec "*.spec" --glob-dta "EIS_*.dta" \
  --dry-run --summary-only
```

For DTA/DAT inputs with `-r`, NeXusCreator groups files by parent folder and produces one combined
output per folder. Disable this behaviour with `--no-group-dta-folders`.
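
The grouping rule can be pictured with a short standalone sketch. It is not NeXusCreator's implementation: `group_by_parent` and its parameters are hypothetical names, and it only assumes that files are first filtered by a glob pattern and then keyed by their parent folder (or by their own path when grouping is disabled).

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def group_by_parent(paths, pattern="*.dta", group_folders=True):
    """Group matching files by parent folder, or one group per file if disabled."""
    groups = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        if not fnmatchcase(path.name, pattern):
            continue  # the glob pattern does not select this file
        # Key on the folder when grouping, otherwise on the file itself.
        key = str(path.parent) if group_folders else str(path)
        groups[key].append(str(path))
    return {key: sorted(files) for key, files in sorted(groups.items())}


files = [
    "/data/root/cell_a/EIS_1.dta",
    "/data/root/cell_a/EIS_2.dta",
    "/data/root/cell_b/EIS_1.dta",
    "/data/root/cell_b/notes.txt",
]
grouped = group_by_parent(files, pattern="EIS_*.dta")
ungrouped = group_by_parent(files, pattern="EIS_*.dta", group_folders=False)
print(len(grouped), "combined outputs vs", len(ungrouped), "individual outputs")
```

In this sketch the two folders yield two combined outputs, while disabling grouping yields one output per matching file.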

## Placement Controls and Metadata

```bash
# Bias placement using base classes (always) + an application definition when requested
nexuscreator -g out.nxd -i data.spec \
  --nxdl-root /path/to/nxdl --app-def NXxas  # without --app-def only base classes are searched

# Enrich variables with metadata from CSV (name, description, units)
nexuscreator -g out.nxd -i /path/to/folder --metadata-csv femtospex.csv

# Use a JSON-LD descriptor + text file
nexuscreator -g out.nxd -i data/Se_Na2SeO4_rt_01.xdi \
  --jsonld-structure data/se_na2so4-testschemaorg-cdiv3.jsonLD

# Convert using the JSON-LD parser
nexuscreator -n out.nxd -i data/Se_Na2SeO4_rt_01.xdi \
  --jsonld-structure data/se_na2so4-testschemaorg-cdiv3.jsonLD

# Combine JSON-LD with SchemaPlacer to map variables into NXDL classes
nexuscreator -g out.nxd -i data/Se_Na2SeO4_rt_01.xdi \
  --jsonld-structure data/se_na2so4-testschemaorg-cdiv3.jsonLD \
  --nxdl-root /path/to/nxdl --app-def NXxas
```

### JSON-LD Parser

- Supply `--jsonld-structure FILE` alongside `-i FILE` to describe how to read fixed-width or delimited text files using CDI/Schema.org JSON-LD.
- When combined with `--nxdl-root`/`--app-def`, the parsed variables are run through SchemaPlacer before the fallback logic kicks in.
- If `--nxdl-root` is omitted, NeXusCreator uses `external_references/nexus/nexus_definitions` when present.
- Even without NXDL hints, `energy` is mapped to `/entry/instrument/monochromator/energy`, `i0` to `/entry/instrument/incoming_beam/data`, and `itrans` to `/entry/instrument/absorbed_beam/data`; any remaining arrays land in `/entry/instrument/logs/<name>`.
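
As a rough illustration of the fallback rule in the last bullet, the mapping can be written as a lookup table with a default. This is a sketch only: `FALLBACK_PATHS` and `fallback_path` are hypothetical names, and exact, case-sensitive matching of variable names is an assumption.

```python
# Hypothetical lookup table mirroring the fallback paths listed above.
FALLBACK_PATHS = {
    "energy": "/entry/instrument/monochromator/energy",
    "i0": "/entry/instrument/incoming_beam/data",
    "itrans": "/entry/instrument/absorbed_beam/data",
}


def fallback_path(name):
    """Return the fallback HDF5 path for a parsed variable name."""
    # Assumption: names are matched exactly; anything else goes to the logs group.
    return FALLBACK_PATHS.get(name, f"/entry/instrument/logs/{name}")


for variable in ("energy", "i0", "itrans", "temperature"):
    print(variable, "->", fallback_path(variable))
```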

Enable debug output or inspect the variable library:

```bash
# Print the NeXus definition line currently being processed
nexuscreator -n def.nxd -i data.spec -D

# Inspect the variable dictionary produced by the parser
nexuscreator -g out.nxd -i data.spec -d
```

### Inspecting variables: `-d` vs `--export-vars-csv`

| Flag | Output | Includes metadata |
|------|--------|-------------------|
| `-d, --dictionary` | Prints to terminal during run | No |
| `--export-vars-csv FILE` | CSV file (variable\_name, variable\_description, units) | Yes |

Use `--export-vars-csv` when you want a machine-readable catalogue of all variables produced
by the parser, including units and descriptions from `__attrs__` or a `--metadata-csv` source:

```bash
nexuscreator -i data.spec --export-vars-csv vars.csv
nexuscreator -i /path/to/folder --export-vars-csv vars.csv -r --glob-dta "EIS_*.dta"
```

## Prompt Literals

Prompt literals allow `.nxd` templates to interactively ask for a value at conversion time.
A dataset value that starts with `?` followed by a quoted string is a prompt literal:

```text
sample_name:NX_CHAR = ?"Sample name"
user:NX_CHAR        = ?'User name'
```

When NeXusCreator encounters a prompt literal during conversion, it prints the prompt text and
waits for user input. The entered value is **cached for the entire run**, so batch processing a
folder of 50 files will prompt only once per unique literal — not once per file.

Prompt literals are especially useful for fields that vary per experiment but are not present in
the data file (user name, sample description, proposal ID, etc.).
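
The prompt-once behaviour can be sketched in a few lines of standalone Python. Everything named here (`parse_prompt`, `PromptCache`, the `ask` parameter) is hypothetical and only illustrates the idea of recognising `?"..."` values and caching answers by prompt text; the real parser may accept a different grammar.

```python
import re

# A prompt literal: '?' immediately followed by a single- or double-quoted string.
PROMPT_RE = re.compile(r"""^\?(["'])(.*)\1$""")


def parse_prompt(value):
    """Return the prompt text if value is a prompt literal, else None."""
    match = PROMPT_RE.match(value.strip())
    return match.group(2) if match else None


class PromptCache:
    """Ask each unique prompt once and reuse the answer afterwards."""

    def __init__(self, ask=input):
        self._ask = ask          # injectable so the sketch can run unattended
        self._answers = {}

    def resolve(self, value):
        prompt = parse_prompt(value)
        if prompt is None:
            return value         # ordinary value, nothing to ask
        if prompt not in self._answers:
            self._answers[prompt] = self._ask(f"{prompt}: ")
        return self._answers[prompt]


calls = []


def fake_ask(text):
    """Stand-in for input() that records how often it is called."""
    calls.append(text)
    return "Jane Smith"


cache = PromptCache(ask=fake_ask)
answers = [cache.resolve("?'User name'") for _ in range(50)]
print(len(calls))  # 1: fifty files, one question
```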

See `nexus-description-syntax` for the full specification.

## Validate Outputs

Use `--validate` to run `punx` validation immediately after writing a `.nxs` file. This is skipped
for `.nxd`/YAML generation. If `punx` is not installed, NeXusCreator looks for it under
`external_references/punx`.

```bash
nexuscreator -n def.nxd -i data.spec -o out.nxs --validate
```

## Export Dataset Values to CSV

Export a `.nxs` (or any HDF5) file to CSV with datasets expanded row-by-row. Scalars repeat to
match the longest column.

```bash
# Export every dataset (can be large)
nexuscreator -i data.nxs --export-values-csv values.csv

# Filter to a single HDF5 prefix and change the delimiter
nexuscreator -i data.nxs --export-values-csv values.csv \
  --export-values-prefix /entry/experiments/open_circuit_potential/ \
  --csv-delimiter ';'
```
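
The row-by-row expansion can be illustrated with the standard `csv` module. This sketch does not read HDF5; it starts from a plain dict of paths to values, and `expand_rows` and `to_csv` are hypothetical helpers. Padding shorter arrays with empty cells is an assumption, since the guide only states that scalars repeat.

```python
import csv
import io


def expand_rows(columns):
    """Turn {path: scalar-or-list} into a header plus equal-length rows."""
    names = list(columns)
    arrays = {n for n, v in columns.items() if isinstance(v, (list, tuple))}
    n_rows = max((len(columns[n]) for n in arrays), default=1)
    rows = []
    for i in range(n_rows):
        row = []
        for name in names:
            value = columns[name]
            if name in arrays:
                # Assumption: arrays shorter than the longest get empty cells.
                row.append(value[i] if i < len(value) else "")
            else:
                row.append(value)  # scalars repeat on every row
        rows.append(row)
    return names, rows


def to_csv(columns, delimiter=","):
    """Render the expanded rows as CSV text."""
    header, rows = expand_rows(columns)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


text = to_csv(
    {"/entry/sample/name": "PtSn_OCA", "/entry/data/energy": [1.0, 2.0, 3.0]},
    delimiter=";",
)
print(text)
```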

## Links in `.nxd` and YAML

```text
# .nxd internal link
scopeP: --> /entry/instrument/detector/scope/scopeP

# YAML internal link
scopeP:
  link: /entry/instrument/detector/scope/scopeP

# .nxd external link
calibration: --> ../calibration/run_001.nxs | /entry/

# YAML external link
calibration:
  external:
    file: ../calibration/run_001.nxs
    path: /entry/
```

For a complete syntax reference, see `nexus-description-syntax`.

During conversion, internal links become HDF5 `SoftLink`s and external links become `ExternalLink`s.
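
To make the `.nxd` link notation concrete, here is a minimal standalone parser for the two link forms shown above. It is a sketch, not the reader NeXusCreator ships: `parse_link_line` is a hypothetical name, and the returned `@link` / `@extlink` descriptors follow the dict layout described in the Python API section.

```python
def parse_link_line(line):
    """Parse 'name: --> target' or 'name: --> file | path' into (name, descriptor)."""
    name, arrow, target = line.partition("-->")
    if not arrow:
        raise ValueError(f"not a link line: {line!r}")
    name = name.strip().rstrip(":").strip()
    target = target.strip()
    if "|" in target:
        # External link: file on the left of '|', HDF5 path on the right.
        file_part, _, path_part = target.partition("|")
        descriptor = {"@extlink": {"file": file_part.strip(), "path": path_part.strip()}}
    else:
        # Internal link: the target is a path inside the same file.
        descriptor = {"@link": target}
    return name, descriptor


internal = parse_link_line("scopeP: --> /entry/instrument/detector/scope/scopeP")
external = parse_link_line("calibration: --> ../calibration/run_001.nxs | /entry/")
print(internal)
print(external)
```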

## Python API (Advanced)

```python
from nexuscreator.plugins import get_plugin_manager
from nexuscreator.libraries.NeXusHDF5 import NexusValueInjector, NexusHDF5Writer

pm = get_plugin_manager()
flags = {"dictionary": False, "template_for_all_scans": False}

# One-liner helpers: find the right plugin and run it
nexus_object = pm.generate_definition("data.spec", beamline=None, flags=flags)
library = pm.parse_to_library("data.spec", beamline=None, flags={})

NexusValueInjector(library).inject(nexus_object)
NexusHDF5Writer(nexus_object).write("out.nxs")
```

### Using Parsers Directly

Every parser can be imported and called from your own scripts. They all return the flat variable
dictionary that the CLI would inject:

```python
from nexuscreator.parsers.diamond_ascii_parser import DiamondAsciiParser

parser = DiamondAsciiParser()
library = parser.parse("data/cdi-ddi/.../ascii/263814_PtSn_OCA_1.dat")

print(library["diamond_qexafs_energy"][:3])
print(library["general_sample_name"])
```

The `library` dict is what `NexusValueInjector` consumes. Keys such as `__attrs__` hold units and
descriptions that will become dataset attributes automatically.

Typical structure:

```python
{
    "general_sample_name": "PtSn_OCA",
    "general_command": "qexafs_energy 11364.0 13000.0 3266 63.59 qexafs_counterTimer01",
    "diamond_qexafs_energy": [11364.22, 11364.66, ...],
    "diamond_lni0it": [-0.77336, -0.77270, ...],
    "nexus_entry1_instrument_qexafs_counterTimer01_time": [0.019014, 0.018883, ...],
    "__attrs__": {
        "diamond_qexafs_energy": {"@units": "eV"},
        "nexus_entry1_instrument_qexafs_counterTimer01_time": {"@units": "s"},
    },
}
```
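
Given a library shaped like the one above, a variable catalogue similar to what `--export-vars-csv` writes can be assembled by pairing each key with its `__attrs__` entry. This is an illustrative sketch: `variable_catalogue` is a hypothetical helper, and the `@description` attribute name is an assumption (only `@units` appears in the example above).

```python
def variable_catalogue(library, description_key="@description"):
    """Return (name, description, units) tuples for every variable in a library."""
    attrs = library.get("__attrs__", {})
    rows = []
    for name in sorted(library):
        if name.startswith("__"):
            continue  # skip bookkeeping keys such as __attrs__
        meta = attrs.get(name, {})
        # Missing metadata becomes an empty string rather than an error.
        rows.append((name, meta.get(description_key, ""), meta.get("@units", "")))
    return rows


library = {
    "general_sample_name": "PtSn_OCA",
    "diamond_qexafs_energy": [11364.22, 11364.66],
    "__attrs__": {"diamond_qexafs_energy": {"@units": "eV"}},
}
for row in variable_catalogue(library):
    print(row)
```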

### Building NeXus Objects Programmatically

To generate a definition object without touching the CLI, use the plugin generators or call the
helpers directly:

```python
from nexuscreator.plugins import get_plugin_manager

pm = get_plugin_manager()
nexus_object = pm.generate_definition("data/cdi-ddi/.../ascii/263814_PtSn_OCA_1.dat",
                                      beamline=None,
                                      flags={"app_def": "NXxas"})
```

If you already have a `.nxd` file, load it with `NexusDefinitionReader().read(...)` from
`nexuscreator.libraries.NeXusDefinition`. You can also build the object manually; it’s just a nested
dict mimicking the NeXus hierarchy:

```python
nexus_object = {
    "@default": "entry",
    "entry": {
        "@NX_class": "NXentry",
        "@default": "data",
        "instrument": {
            "@NX_class": "NXinstrument",
            "source": {
                "@NX_class": "NXsource",
                "name": {"@dtype": "NX_CHAR", "@value": '"synchrotron"'},
            },
        },
        "data": {
            "@NX_class": "NXdata",
            "@signal": "counts",
            "@axes": "energy",
            "energy": {"@dtype": "NX_FLOAT64[]", "@value": "diamond_qexafs_energy"},
            "counts": {"@dtype": "NX_FLOAT64[]", "@value": "diamond_it"},
        },
    },
}
```

This hand-written object can be fed straight into `NexusHDF5Writer` after `NexusValueInjector`
replaces the placeholders (`diamond_*` keys) with actual arrays from your parser.
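
Conceptually, placeholder replacement is a recursive walk over the nested dict. The sketch below is a simplified stand-in for `NexusValueInjector`, not its actual code: the `inject` function is hypothetical, and it assumes that a double-quoted `@value` is a literal while any other string is looked up in the library.

```python
def inject(node, library):
    """Return a copy of node with placeholder @value strings resolved."""
    if not isinstance(node, dict):
        return node  # attribute values and other leaves are kept as they are
    result = {}
    for key, child in node.items():
        if key == "@value" and isinstance(child, str):
            if len(child) >= 2 and child[0] == child[-1] == '"':
                result[key] = child[1:-1]      # quoted literal: drop the quotes
            elif child in library:
                result[key] = library[child]   # placeholder: take the parsed value
            else:
                result[key] = child            # unknown name: leave untouched
        else:
            result[key] = inject(child, library)
    return result


tree = {
    "entry": {
        "@NX_class": "NXentry",
        "name": {"@dtype": "NX_CHAR", "@value": '"synchrotron"'},
        "energy": {"@dtype": "NX_FLOAT64[]", "@value": "diamond_qexafs_energy"},
    }
}
library = {"diamond_qexafs_energy": [11364.22, 11364.66]}
filled = inject(tree, library)
print(filled["entry"]["energy"]["@value"])
```

Returning a new dict keeps the template reusable for the next input file, which is a design choice of this sketch rather than a statement about the library.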

For plotting-friendly outputs, expose an `NXdata` view that links to the real datasets:

```python
nexus_object = {
    "entry": {
        "@NX_class": "NXentry",
        "measurement": {  # real data lives here
            "@NX_class": "NXcollection",
            "energy": {"@dtype": "NX_FLOAT64[]", "@value": "diamond_qexafs_energy"},
            "counts": {"@dtype": "NX_FLOAT64[]", "@value": "diamond_it"},
        },
        "data": {  # plotting-friendly NXdata group
            "@NX_class": "NXdata",
            "@signal": "counts",
            "@axes": ["energy"],
            "energy": {"@link": "/entry/measurement/energy"},
            "counts": {"@link": "/entry/measurement/counts"},
        },
    }
}
```

Here `@signal` and `@axes` define the plot, while `@link` ensures the `NXdata` group points at the
datasets stored elsewhere in the file.
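
Before writing a file it can be useful to confirm that every `@link` points at something that exists in the same object. The following check is a hypothetical helper, not part of NeXusCreator; it assumes link targets are absolute paths into the nested dict.

```python
def resolve(tree, path):
    """Follow an absolute path such as /entry/data through nested dicts."""
    node = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def dangling_links(tree, root=None, prefix=""):
    """Return (location, target) pairs for @link entries whose target is missing."""
    root = tree if root is None else root
    missing = []
    for key, child in tree.items():
        if key.startswith("@") or not isinstance(child, dict):
            continue  # attributes are not groups or datasets
        here = f"{prefix}/{key}"
        target = child.get("@link")
        if isinstance(target, str) and resolve(root, target) is None:
            missing.append((here, target))
        missing.extend(dangling_links(child, root, here))
    return missing


nexus_object = {
    "entry": {
        "@NX_class": "NXentry",
        "measurement": {
            "@NX_class": "NXcollection",
            "energy": {"@dtype": "NX_FLOAT64[]", "@value": "diamond_qexafs_energy"},
        },
        "data": {
            "@NX_class": "NXdata",
            "energy": {"@link": "/entry/measurement/energy"},
            "counts": {"@link": "/entry/measurement/counts"},
        },
    }
}
print(dangling_links(nexus_object))
```

Here `counts` is reported because `/entry/measurement/counts` was never defined, while the `energy` link resolves.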

### Creating and Modifying `nexus_object` Safely

In practice, you usually modify `nexus_object` at one of these stages:

1. **Before injection** (recommended for structure edits): add/remove groups, add links, set
   `@NX_class`, `@signal`, `@axes`, `@units`, and placeholder `@value` keys.
2. **After injection** (for final tweaks): adjust literal metadata only (for example `@long_name` or
   other attributes), not the placeholder keys the injector expects.

Typical lifecycle:

```python
from nexuscreator.plugins import get_plugin_manager
from nexuscreator.libraries.NeXusDefinition import NexusDefinitionReader
from nexuscreator.libraries.NeXusHDF5 import NexusValueInjector, NexusHDF5Writer

obj = NexusDefinitionReader().read("template.nxd")  # create/load

# modify structure before injection
entry = obj.setdefault("entry", {"@NX_class": "NXentry"})
entry.setdefault("instrument", {"@NX_class": "NXinstrument"})
entry["instrument"]["name"] = {"@dtype": "NX_CHAR", "@value": '"My instrument"'}

# parse the input into a variable library (same helper as in the Python API example)
library = get_plugin_manager().parse_to_library("data.spec", beamline=None, flags={})

# inject parser library values into placeholders
NexusValueInjector(library).inject(obj)

# optional post-injection metadata tweaks
entry.setdefault("data", {"@NX_class": "NXdata"})
entry["data"]["@long_name"] = "Processed dataset"

NexusHDF5Writer(obj).write("out.nxs")
```

Recommended rules:

- Keep dataset descriptors as dicts with both `@dtype` and `@value`.
- Use `@link` / `@extlink` for links instead of embedding resolved paths as plain strings.
- For `NX_CHAR`, use literal text in `@value`; for numeric arrays, use placeholders that exist in
  the parsed `library`.
- When batch-processing folders, prompt literals (`?"..."`) are resolved once per run and reused
  for all files in that run.
- If you mutate placeholder names, update the parser library keys (or template placeholders) to
  match, otherwise injection will miss values.
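
Two of these rules lend themselves to an automated check before injection: descriptors carrying only one of `@dtype` / `@value`, and placeholders with no matching library key. The `lint` function below is a hypothetical sketch under the assumption that any `@value` string not starting with a quote or `?` is a placeholder; a bare numeric string would therefore be flagged too.

```python
def lint(node, library, path=""):
    """Collect human-readable problems found in a nexus_object tree."""
    problems = []
    if not isinstance(node, dict):
        return problems
    # Rule: dataset descriptors carry both @dtype and @value.
    if ("@dtype" in node) != ("@value" in node):
        problems.append(f"{path}: needs both @dtype and @value")
    # Rule: placeholder names must exist in the parsed library.
    value = node.get("@value")
    if isinstance(value, str) and not value.startswith(('"', "'", "?")):
        if value not in library:
            problems.append(f"{path}: placeholder {value!r} not in library")
    for key, child in node.items():
        if not key.startswith("@"):
            problems.extend(lint(child, library, f"{path}/{key}"))
    return problems


obj = {
    "entry": {
        "@NX_class": "NXentry",
        "energy": {"@dtype": "NX_FLOAT64[]", "@value": "diamond_qexafs_energy"},
        "counts": {"@dtype": "NX_FLOAT64[]", "@value": "diamond_it"},
        "title": {"@value": '"run"'},
        "user": {"@dtype": "NX_CHAR", "@value": '?"User name"'},
    }
}
library = {"diamond_qexafs_energy": [11364.22, 11364.66]}
for problem in lint(obj, library):
    print(problem)
```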

### Describing Links in `nexus_object`

Use `@link` for internal soft links and `@extlink` for external links:

```python
nexus_object = {
    "entry": {
        "@NX_class": "NXentry",
        "data": {
            "@link": "/entry/measurement/counts"           # SoftLink inside this file
        },
        "calibration": {
            "@extlink": {                                 # ExternalLink to another file
                "file": "../calibration/run_001.nxs",
                "path": "/entry/"
            }
        },
    }
}
```

`NexusHDF5Writer` turns `@link` into an HDF5 `SoftLink` and `@extlink` into an `ExternalLink`.

### Writing `.nxs` Files from a Script

You can reuse the high-level class that powers the CLI when you want the whole conversion pipeline in
Python code:

```python
from nexuscreator.creator import NeXusCreator

creator = NeXusCreator()
creator.execute_conversion({
    "input_path": "data/spec/sample.spec",
    "nexus_definition_file": "defs/sample.nxd",
    "output_path": "out/sample.nxs",
})
```

The dictionary accepts the same options as the CLI (`beamline_name`, `auto_generate_nxd`, etc.). For
finer control, use `get_plugin_manager()`, `NexusValueInjector`, and `NexusHDF5Writer` directly, as
shown above.

## CLI Reference (Grouped)

### Core

- `-h, --help` — show help and exit.
- `-v, --version` — print the package version.
- `--license`, `--notice` — print licensing information.
- `--list-beamlines` — list accepted values for `-b/--beamline_name`.

### Input / Output

- `-i, --input PATH` — source data (file or directory).
- `-o, --output_path PATH` — target `.nxs` file or output directory.

### Conversion (use an existing `.nxd`)

- `-n, --nexus_definition FILE`
- `-f, --file_per_scan` — for SPEC, one `.nxs` per scan plus master.
- `-I, --icat_proposal_number NUM` — append ICAT subfolder to outputs.
- `--auto-generate-nxd` — with directory inputs, generate per-file `.nxd` automatically.

### Generation (build a new definition)

- `-g, --generate_nexus_definition FILE`
- `-t, --template` — emit a single-scan template.
- `--single-file`, `--multi-file` — control directory generation mode.
- `--yaml` — write YAML definitions instead of `.nxd`.
- `--hdf5-option MODE` — `links` (default for `-g`) or `extract` (default for `-n`).

### Scanning & Batch Flags

- `-r, --recursive`
- `--glob PATTERN` — limit matches for all supported types.
- `--glob-spec`, `--glob-dta` — type-specific patterns.
- `--dry-run`, `--summary-only`
- `--no-group-dta-folders` — process each DTA/DAT file individually.

### Metadata & Placement

- `-b, --beamline_name NAME` — beamline-specific context (for example `ikft`, `batteries`).
- `--metadata-csv FILE` — supply descriptions/units for variables.
- `--export-vars-csv FILE` — export parsed variables to CSV.
- `--nxdl-root PATH`, `--app-def NAME` — NXDL placement hints. Base classes are always indexed; `--app-def` adds the selected application definition to the search. `--nxdl-root` defaults to `external_references/nexus/nexus_definitions` when that path exists.
- `--jsonld-structure FILE` — CDI/Schema.org JSON-LD document describing how to read the input file; pairs nicely with `--nxdl-root`.
- `-d, --dictionary` — print the parsed variable dictionary.
- `-D, --debug` — show the current `.nxd` line processed during conversion.

### Validation & Export

- `--validate` — validate the generated `.nxs` with `punx` when available.
- `--export-values-csv FILE` — export dataset values from a `.nxs`/HDF5 file to CSV.
- `--export-values-prefix PATH` — restrict CSV export to datasets under a prefix (must start with `/`).
- `--csv-delimiter CHAR` — override CSV delimiter (default `,`).