Multi-Material Files

An ENDF-6 file may contain several materials, stored one after another; such a file is traditionally called a tape. The ordinary parser expects a single material per file, but endf-parserpy provides a dedicated interface for tapes. It also covers the PENDF tapes produced by processing codes, which repeat the same material at several temperatures. On this page, we explain how to read, write and navigate such files.
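The boundaries between the materials of a tape can be recognised from the control fields that every 80-column ENDF-6 record carries: MAT in columns 67-70, MF in columns 71-72 and MT in columns 73-75. The following minimal sketch is independent of endf-parserpy and is not how the package is implemented; the helper names make_line, control_fields and record_kind are hypothetical and serve only as an illustration. The first record of a tape (the tape head) also has MF=0 and MT=0 and has to be treated separately.

```python
def make_line(text, mat, mf, mt, ns=0):
    """Build one 80-column record (illustration only, hypothetical helper)."""
    return f"{text:<66}{mat:4d}{mf:2d}{mt:3d}{ns:5d}"


def control_fields(line):
    """Return (MAT, MF, MT) read from columns 67-75 of a record."""
    return int(line[66:70]), int(line[70:72]), int(line[72:75])


def record_kind(line):
    """Classify a record by its control fields."""
    mat, mf, mt = control_fields(line)
    if mat == -1:
        return 'TEND'   # end of the tape
    if mat == 0:
        return 'MEND'   # end of a material
    if mf == 0:
        return 'FEND'   # end of a file (MF)
    if mt == 0:
        return 'SEND'   # end of a section (MT)
    return 'DATA'


print(record_kind(make_line('', 2925, 3, 2, 1)))   # DATA
print(record_kind(make_line('', 0, 0, 0)))         # MEND
```

A record with MAT=0 therefore closes one material, and a record with MAT=-1 closes the tape.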

Reading and writing a tape

The parse_tape_file() function reads a multi-material file and returns a list with one entry per material:

from endf_parserpy import parse_tape_file
materials = parse_tape_file('tape.endf')  # one entry per material
len(materials)                            # number of materials

Each tape operation comes as a pair: the function with the _file suffix works on a file path, and the one without it works on an ENDF-6 tape held in a string. Thus parse_tape_file() reads a file, whereas parse_tape() parses a string. Each entry of the returned list is an ordinary dictionary, identical to what the parsefile() method returns for a single-material file, and is therefore indexed first by MF and then by MT number:

material = materials[0]      # the first material, a dict
section = material[3][2]     # its MF=3/MT=2 section, also a dict
section['AWR']               # a field of that section

As with parsefile(), the include and exclude arguments restrict parsing to parts of each material; sections that are not parsed are kept as lists of raw strings:

# parse only MF=3 of every material, keep the rest as raw text
materials = parse_tape_file('tape.endf', include=[3])
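After a partial parse it is often useful to know which sections were parsed and which were kept as raw text. The sketch below tells them apart by type, assuming the {MF: {MT: section}} layout described above. The helper split_parsed_raw is hypothetical, and the material shown is a hand-built stand-in with illustrative values rather than real parser output.

```python
def split_parsed_raw(material):
    """Return two lists of (MF, MT) pairs: parsed sections and raw ones."""
    parsed, raw = [], []
    for mf, sections in material.items():
        for mt, section in sections.items():
            # an unparsed section is a list of raw strings, a parsed one a dict
            (raw if isinstance(section, list) else parsed).append((mf, mt))
    return parsed, raw


# hand-built stand-in for one material parsed with include=[3]
material = {
    1: {451: ['raw line 1', 'raw line 2']},
    3: {1: {'AWR': 62.39}, 2: {'AWR': 62.39}},
}
parsed, raw = split_parsed_raw(material)
print(parsed)   # [(3, 1), (3, 2)]
print(raw)      # [(1, 451)]
```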

Because each material is an ordinary dictionary, modifying the data before writing it back is a plain assignment. To change, for instance, the atomic weight ratio in the MF1/MT451 section of the first material:

materials[0][1][451]['AWR'] = 63.5   # modify a value in place

The guide on ENDF-6 file plumbing covers modifying, adding and deleting data in more depth; the same operations apply to every material of a tape.

The reverse operation is the same pair the other way round: write_tape() assembles the materials into an ENDF-6 string, and write_tape_file() writes that tape to a file:

from endf_parserpy import write_tape, write_tape_file
write_tape_file(materials, 'output.endf')   # write to a file
text = write_tape(materials)                # or obtain the string

If a material cannot be parsed, the on_error argument decides what happens. With the default 'mark', the offending material is returned as a FailedMaterial object instead of a dictionary, and the remaining materials are still parsed. Because the FailedMaterial keeps the raw content of the material, the tape can be written back without loss:

from endf_parserpy import FailedMaterial

materials = parse_tape_file('tape.endf')  # on_error='mark' is the default
for material in materials:
    if isinstance(material, FailedMaterial):
        # .mat is the MAT number, .exception the error that
        # occurred and .raw_lines the original text of the material
        print(material.mat, material.exception)
    else:
        ...   # an ordinary material dictionary
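The difference between the two error policies can be illustrated without the library. In this minimal sketch, Failed is a hypothetical stand-in for FailedMaterial, and parse_all and toy_parse are made-up helpers; it shows the general pattern only, not the package's implementation.

```python
from dataclasses import dataclass


@dataclass
class Failed:
    """Hypothetical stand-in: the raw text of a material plus the error."""
    raw_lines: list
    exception: Exception


def parse_all(chunks, parse_one, on_error='mark'):
    """Parse every chunk; mark failures or re-raise the first one."""
    results = []
    for raw_lines in chunks:
        try:
            results.append(parse_one(raw_lines))
        except Exception as exc:
            if on_error == 'raise':
                raise                       # abort at the first failure
            results.append(Failed(raw_lines, exc))   # keep the raw text
    return results


def toy_parse(raw_lines):
    """Toy parser that rejects any chunk containing the line 'bad'."""
    if 'bad' in raw_lines:
        raise ValueError('cannot parse')
    return {'nlines': len(raw_lines)}


results = parse_all([['a'], ['bad'], ['b', 'c']], toy_parse)
print([type(r).__name__ for r in results])   # ['dict', 'Failed', 'dict']
```

Keeping the raw lines in the marker object is what allows a failed material to be written back unchanged.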

With on_error='raise' the first failure aborts the operation instead:

materials = parse_tape_file('tape.endf', on_error='raise')

For large tapes, the iter_parse_tape_file() function yields one material at a time instead of returning the complete list, so that the peak memory consumption stays bounded by the size of the largest material:

from endf_parserpy import iter_parse_tape_file
for material in iter_parse_tape_file('tape.endf'):
    ...   # one material, a dict or a FailedMaterial
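Why iteration bounds the memory use can be seen from a generator that hands out the records of one material at a time. The sketch below is a simplified illustration and not the library's implementation; it assumes well-formed 80-column records, and the helpers rec and iter_material_lines are hypothetical.

```python
def rec(mat, mf, mt, text=''):
    """Build one 80-column record (illustration only, hypothetical helper)."""
    return f"{text:<66}{mat:4d}{mf:2d}{mt:3d}{0:5d}"


def iter_material_lines(lines):
    """Yield the records of one material at a time from an iterable of lines."""
    lines = iter(lines)
    next(lines, None)                  # skip the tape head record
    current = []
    for line in lines:
        mat = int(line[66:70])         # MAT sits in columns 67-70
        if mat == -1:                  # TEND: end of the tape
            break
        if mat == 0:                   # MEND: the current material is complete
            if current:
                yield current
            current = []
        else:
            current.append(line.rstrip('\n'))
    if current:                        # tolerate a missing MEND record
        yield current


tape = [rec(1, 0, 0, 'head'),
        rec(2925, 1, 451), rec(2925, 1, 0), rec(2925, 0, 0), rec(0, 0, 0),
        rec(3000, 1, 451), rec(3000, 1, 0), rec(3000, 0, 0), rec(0, 0, 0),
        rec(-1, 0, 0)]
for material_lines in iter_material_lines(tape):
    print(int(material_lines[0][66:70]), len(material_lines))
```

Because an open file object is itself an iterable of lines, the same generator could be fed a file directly, and only one material would be held in memory at any time.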

Lazy access with EndfFile

When only some materials or sections of a large tape are relevant, parsing the complete file is wasteful. The EndfFile class indexes the file on construction and reads and parses an individual section from disk only when it is accessed:

from endf_parserpy import EndfFile
endf_file = EndfFile('tape.endf')
len(endf_file)                # number of materials on the tape
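The idea of indexing first and reading later can be sketched with the standard library alone. The example below scans a tape once, records where each section starts and how long it is, and reads a section back with a seek only when it is requested. It is an illustration under simplifying assumptions (well-formed records, newline-terminated lines), not the index that EndfFile actually builds; rec, build_section_index and read_section are hypothetical names.

```python
import io


def rec(mat, mf, mt, text=''):
    """Build one 80-column record plus newline (illustration only)."""
    return f"{text:<66}{mat:4d}{mf:2d}{mt:3d}{0:5d}\n".encode('ascii')


def build_section_index(stream):
    """Map (position, MF, MT) to the (byte offset, byte length) of a section."""
    index = {}
    position = 0
    stream.readline()                      # skip the tape head record
    offset = stream.tell()
    for raw in iter(stream.readline, b''):
        mat, mf, mt = int(raw[66:70]), int(raw[70:72]), int(raw[72:75])
        if mat == -1:                      # TEND
            break
        if mat == 0:                       # MEND: the next material follows
            position += 1
        elif mf != 0 and mt != 0:          # a data record of section MF/MT
            start, length = index.get((position, mf, mt), (offset, 0))
            index[(position, mf, mt)] = (start, length + len(raw))
        offset += len(raw)
    return index


def read_section(stream, index, key):
    """Read the raw lines of one section from the stream on demand."""
    start, length = index[key]
    stream.seek(start)
    return stream.read(length).decode('ascii').splitlines()


tape = b''.join([
    rec(1, 0, 0, 'head'),
    rec(2925, 1, 451, 'd'), rec(2925, 1, 0), rec(2925, 0, 0),
    rec(2925, 3, 2, 'a'), rec(2925, 3, 2, 'b'), rec(2925, 3, 0),
    rec(2925, 0, 0), rec(0, 0, 0),
    rec(3000, 1, 451, 'c'), rec(3000, 1, 0), rec(3000, 0, 0), rec(0, 0, 0),
    rec(-1, 0, 0),
])
stream = io.BytesIO(tape)
index = build_section_index(stream)
print(sorted(index))                       # [(0, 1, 451), (0, 3, 2), (1, 1, 451)]
print(len(read_section(stream, index, (0, 3, 2))))   # 2
```

The index holds only a few integers per section, which is why building it is cheap compared with parsing the data.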

A material is addressed by its zero-based position on the tape. Indexing an EndfFile returns a MaterialView, a lightweight handle to one material; iterating over the file yields these handles in turn:

material = endf_file[0]            # a MaterialView
for material in endf_file:         # iterate over all materials
    print(material.position, material.mat, material.za)

Besides position, mat, za and awr, a MaterialView reports the sections the material contains:

material.sections()        # list of the (MF, MT) pairs present

A section is addressed on a material by an (MF, MT) pair. Accessing it parses that section and returns it as a dictionary; a section for which no recipe exists is returned as a list of raw strings instead:

section = endf_file[0][3, 2]       # parsed MF=3/MT=2 section, a dict

A whole material can also be lifted out of the tape as an ordinary single-material tape dictionary with the to_tape_dict() method. The result is a {MF: {MT: section}} mapping, the same form a single-material parse produces and complete with its MF=0/MT=0 tape head, so it can be handed straight to the parser’s writer or to write_tape():

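from endf_parserpy import EndfParserPy
parser = EndfParserPy()                       # a parser object is needed for writing
material_dict = endf_file[0].to_tape_dict()   # one material as a tape dict
text = parser.write(material_dict)            # render it on its own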

Because the same material number (MAT) may occur several times on a tape (a PENDF tape repeats it for every temperature), the position is the primary way to identify a material. To look materials up by their identifiers instead, use the by_mat(), by_za() and find() methods:

material = endf_file.by_mat(2925)     # the single material with MAT 2925
materials = endf_file.by_za(29063)    # a list of materials with that ZA
materials = endf_file.find(mat=2925)  # a list matching every criterion

by_mat() returns a single MaterialView, whereas by_za() and find() return a list of them. If the MAT number is not unique, by_mat() raises AmbiguousMaterialError, and the copy of interest must then be selected with the occurrence argument:

material = endf_file.by_mat(2925, occurrence=0)   # the first such material
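The lookup rule just described (a unique MAT resolves directly, a repeated MAT needs an explicit occurrence) can be written down in a few lines. The sketch below works on a plain list of MAT numbers in tape order; AmbiguousMaterial is a hypothetical stand-in for the library's AmbiguousMaterialError and position_by_mat is a made-up helper, so this mirrors the documented behaviour rather than the actual implementation.

```python
class AmbiguousMaterial(LookupError):
    """Hypothetical stand-in for the library's AmbiguousMaterialError."""


def position_by_mat(mats, mat, occurrence=None):
    """Return the tape position of the material with the given MAT number."""
    positions = [i for i, m in enumerate(mats) if m == mat]
    if not positions:
        raise KeyError(mat)
    if occurrence is not None:
        return positions[occurrence]       # zero-based copy of that MAT
    if len(positions) > 1:
        raise AmbiguousMaterial(
            f'MAT {mat} occurs {len(positions)} times; pass occurrence=')
    return positions[0]


mats = [2925, 2925, 3000]                  # MAT numbers in tape order
print(position_by_mat(mats, 3000))                 # 2
print(position_by_mat(mats, 2925, occurrence=1))   # 1
```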

The sections of a material can be replaced, added or deleted, and whole materials can be deleted, appended or reordered. Every edit is kept in memory until the tape is written back:

endf_file[0][3, 2] = section          # replace (or add) a section
del endf_file[0][3, 18]               # delete a section
del endf_file[1]                      # delete the second material

A new material (an ordinary {MF: {MT: section}} mapping, such as one entry of a parse_tape_file() result) is appended with append_material(), which returns a MaterialView of the added material:

donor = parse_tape_file('other.endf')[0]            # a material dictionary
mat = donor[1][451]['MAT']                          # the MAT it carries
new_material = endf_file.append_material(donor, mat=mat)

The mat argument must equal the MAT number the material carries in its own records; it is rejected otherwise, since the records, not the argument, are what gets written to the tape.

The materials can be reordered by passing a permutation of their positions to reorder():

endf_file.reorder([1, 0])             # swap the first two materials

Finally, export() writes the edited tape to a file and to_string() returns it as an ENDF-6 string, the same memory/file pairing as the module functions. Sections that were not edited keep their data records verbatim from disk; the SEND/FEND/MEND framing and the column 76-80 sequence numbers are regenerated either way. Every data field is therefore preserved byte for byte, but the tape as a whole is not necessarily byte-identical to the original:

endf_file.export('edited.endf')               # write to a new file
text = endf_file.to_string()                  # or obtain the string

Exporting onto the very file the EndfFile was opened from is allowed, but it leaves the in-memory index out of step with the rewritten file. The object is therefore invalidated: any further use raises StaleSourceError, and the file must be re-opened to continue working with it:

endf_file.export('tape.endf', overwrite=True)  # overwrites the source
endf_file = EndfFile('tape.endf')              # re-open to continue

Note

The structural index that EndfFile builds on construction is faster to compute when NumPy is available. Installing the package with the fast extra pulls in this optional dependency; without it a pure-Python fallback is used.

Selecting a material by its content

On a tape that repeats the same material, the position is often not the most convenient way to pick a particular copy. A PENDF tape, for example, stores the same material at a series of temperatures, and one usually wants the copy at a specific temperature. The query() method selects materials by the value of a field in one of their sections and returns the matches as a list of MaterialView objects:

from endf_parserpy import EndfParserCpp, EndfFile

parser = EndfParserCpp(endf_format='pendf')
endf_file = EndfFile('file.pendf', parser=parser)

# the materials whose MF1/MT451 temperature is 293.6 K
room_temp = endf_file.query('1/451/TEMP', 293.6, tol=1.0)
xs = room_temp[0][3, 1]      # MF=3/MT=1 of the first match

The first argument is a path into an MF/MT section (here the TEMP field of the MF1/MT451 section), and the second the value to match; the tol argument allows for a numerical tolerance. Instead of a value, a predicate callable can be supplied to match on an arbitrary condition:

hot = endf_file.query('1/451/TEMP', predicate=lambda t: t > 1000.0)
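The two matching modes can be pictured as follows. This sketch operates on a plain list of temperatures, one per material, and assumes that tol is an absolute tolerance; whether the library interprets it exactly this way is an assumption. The helpers matches and select are hypothetical.

```python
def matches(value, target=None, tol=0.0, predicate=None):
    """Match by predicate if one is given, else by value within tol."""
    if predicate is not None:
        return bool(predicate(value))
    return abs(value - target) <= tol      # assumed: absolute tolerance


def select(values, target=None, tol=0.0, predicate=None):
    """Return the positions whose value matches."""
    return [pos for pos, value in enumerate(values)
            if matches(value, target, tol, predicate)]


temperatures = [293.6, 600.0, 1200.0, 293.0]   # illustrative values in kelvin
print(select(temperatures, 293.6, tol=1.0))                   # [0, 3]
print(select(temperatures, predicate=lambda t: t > 1000.0))   # [2]
```

A tolerance is useful because temperatures written by different processing codes may differ in the last digits.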

If the same lookup is needed repeatedly, the build_index() method parses the section once per material and returns a dictionary that maps each field value to the list of material positions carrying it:

temperatures = endf_file.build_index('1/451/TEMP')
# e.g. {293.6: [0, 3], 600.0: [1, 4], ...}
positions = temperatures[293.6]

Passing a list of section paths instead of a single one builds a composite index: the key becomes the tuple of the values at the given paths, in order. The paths may address fields in different sections, and a material that lacks any of them is left out:

index = endf_file.build_index(['1/451/ZA', '1/451/TEMP'])
# e.g. {(29063.0, 293.6): [0], (30064.0, 293.6): [1], ...}
positions = index[(29063.0, 293.6)]
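The shape of such an index is that of an inverted index: field values as keys, lists of positions as values. The following sketch builds one from plain dictionaries that stand in for the relevant fields of each material; build_value_index is a hypothetical helper and the records are hand-made, so it illustrates the data structure rather than the library's code.

```python
from collections import defaultdict


def build_value_index(records, keys):
    """Map field values (or tuples of them) to the positions carrying them."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        try:
            values = tuple(record[key] for key in keys)
        except KeyError:
            continue                       # a material lacking a field is left out
        # a single key maps the bare value, several keys map a tuple
        index[values if len(keys) > 1 else values[0]].append(position)
    return dict(index)


records = [
    {'ZA': 29063.0, 'TEMP': 293.6},
    {'ZA': 30064.0, 'TEMP': 293.6},
    {'ZA': 29063.0},                       # no temperature recorded
]
print(build_value_index(records, ['TEMP']))
# {293.6: [0, 1]}
print(build_value_index(records, ['ZA', 'TEMP']))
# {(29063.0, 293.6): [0], (30064.0, 293.6): [1]}
```

Since the keys are floating-point values, a dictionary lookup succeeds only for an exactly equal value; where the values may differ slightly, a tolerance-based search is the safer choice.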

A single value can also be retrieved directly with the get() method and a material-qualified path. Such a path, described by the EndfMaterialPath class, extends an ordinary EndfPath with a leading material selector. The selector is a MAT number, MAT#k for the k-th material carrying that MAT number, or #k for the material at position k:

endf_file.get('#0/1/451/AWR')        # AWR of the material at position 0
endf_file.get('2925#0/3/2')          # MF=3/MT=2 of the 1st MAT-2925 material
endf_file.get('2925#1/1/451/TEMP')   # a field of the 2nd MAT-2925 material

A bare MAT number with no #k selector picks the material with that MAT only when it is unique on the tape; if the MAT number repeats, as it does on a PENDF tape, it must be qualified with #k or the lookup raises AmbiguousMaterialError.

The path may stop at a section, in which case the whole section is returned, or continue into it to address a single field.
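To make the selector syntax concrete, the sketch below splits a material-qualified path string into its selector and the remaining path components. It is a hypothetical parser written for illustration and is not the EndfMaterialPath implementation; split_material_path is a made-up name, and the sketch handles only the three selector forms listed above.

```python
import re

# optional MAT number, optionally followed by '#' and an index k
_SELECTOR = re.compile(r'^(?P<mat>\d+)?(?:#(?P<k>\d+))?$')


def split_material_path(path):
    """Return (mat, k, components) for a path such as '2925#1/1/451/TEMP'."""
    selector, _, rest = path.partition('/')
    match = _SELECTOR.match(selector)
    if not selector or match is None:
        raise ValueError(f'invalid material selector: {selector!r}')
    mat = int(match['mat']) if match['mat'] is not None else None
    k = int(match['k']) if match['k'] is not None else None
    components = rest.split('/') if rest else []
    return mat, k, components


print(split_material_path('#0/1/451/AWR'))        # (None, 0, ['1', '451', 'AWR'])
print(split_material_path('2925#1/1/451/TEMP'))   # (2925, 1, ['1', '451', 'TEMP'])
print(split_material_path('2925'))                # (2925, None, [])
```

With mat equal to None, k is a position on the tape; with both present, k counts the copies of that MAT; with only mat present, the MAT has to be unique.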

Path-addressed access and editing

The get() method has a shorter spelling: an EndfFile can be indexed directly with an EndfMaterialPath. The [], []=, del and in operators all accept such a path (a string or an EndfMaterialPath object) in addition to an integer material position, so a tape reads and edits like a path-addressable mapping:

awr = endf_file['2925#1/3/2/AWR']     # read a field
endf_file['2925#1/3/2/AWR'] = 63.5    # write a field
section = endf_file['#0/3/2']         # read a whole section
del endf_file['#0/3/18']              # delete a section
del endf_file['#1']                   # delete a material
present = '#0/1/451/TEMP' in endf_file  # test for presence

Every such edit, whether a field write, a section or material deletion, an append_material() or a reorder(), only changes the in-memory tape. The file on disk is never touched until the tape is written out explicitly with export() (or to_string()); without that call the edits are discarded when the EndfFile object goes away.

endf_file.get(path) is the explicit-method synonym of endf_file[path]; both return the same thing: a MaterialView for a material-depth path, a section for an MF/MT path, and the value at the field for a deeper path.

A retrieved section is not a plain dictionary but a view over the tape, and what that view permits is governed by the check_edits argument of the EndfFile constructor:

from endf_parserpy import EndfFile

strict = EndfFile('tape.endf')                          # check_edits='eager'
relaxed = EndfFile('tape.endf', check_edits='deferred')

With check_edits='eager' (the default) every edit is rendered through the parser’s writer immediately, so a change that breaks the ENDF recipe raises SectionRenderError at the offending assignment. A section retrieved in this mode is a read-only view; to edit it, take a standalone copy with its detach() method, change the copy and assign it back:

section = strict['#0/3/2'].detach()   # a plain, mutable dict
section['QI'] = 0.0
strict['#0/3/2'] = section            # rendered and checked here

With check_edits='deferred' a retrieved section is instead a live view: assigning into it writes straight through to the tape, exactly as for an EndfDict. Recipe-conformity is then checked only when the tape is written out, or on demand via invalid_edits(), which returns the edited sections that fail to render:

relaxed['#0/3/2']['QI'] = 0.0         # writes through to the tape
if not relaxed.invalid_edits():       # empty list -> every edit is valid
    ...

A view, whether frozen or live, is itself path-addressable: a string key is read as an EndfPath relative to the view, so relaxed['#0/3/2']['xstable/E'] and relaxed['#0/3/2/xstable/E'] reach the same data.

Bounded memory and parallel processing

Because EndfFile parses sections lazily, it can open a tape far larger than the available memory. Parsed and raw sections are kept in two caches of a fixed byte budget, set by the parsed_cache_bytes and raw_cache_bytes constructor arguments; once a budget is exhausted the least-recently-used entries are evicted and re-read on the next access:

# 16 MiB for each cache tier instead of the 64 MiB default
endf_file = EndfFile('huge.endf', parsed_cache_bytes=16 << 20,
                     raw_cache_bytes=16 << 20)

The cache_nbytes property reports the current (raw, parsed) cache occupancy, and the unload() method drops the cached sections of one material (or, with no argument, of the whole tape) without discarding any pending edits.
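The eviction policy described above, least-recently-used with a byte budget, can be sketched with an OrderedDict. The class below is an illustration only: ByteBudgetCache is a hypothetical name, the size of an entry is simply taken as its length in bytes, and the caches inside EndfFile may be organised differently.

```python
from collections import OrderedDict


class ByteBudgetCache:
    """LRU cache bounded by the total size of its values (illustration)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.nbytes = 0
        self._items = OrderedDict()        # key -> (value, size)

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss."""
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            return self._items[key][0]
        value = load(key)
        size = len(value)
        self._items[key] = (value, size)
        self.nbytes += size
        # evict least-recently-used entries until the budget is respected,
        # but always keep the entry that was just loaded
        while self.nbytes > self.max_bytes and len(self._items) > 1:
            _, (_, old_size) = self._items.popitem(last=False)
            self.nbytes -= old_size
        return value


loads = []


def load(key):
    """Toy loader: record the call and return four bytes."""
    loads.append(key)
    return b'x' * 4


cache = ByteBudgetCache(max_bytes=10)
for key in ['a', 'b', 'a', 'c', 'b']:
    cache.get(key, load)
print(loads)          # ['a', 'b', 'c', 'b']
print(cache.nbytes)   # 8
```

In this run the second access to 'a' is served from the cache, loading 'c' evicts 'b', and 'b' therefore has to be loaded again.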

The parser objects are picklable, so a configured parser can be shipped to a pool of worker processes. Together with the fast, index-only construction of EndfFile, this makes it straightforward to scan or parse a whole library of files in parallel:

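from concurrent.futures import ProcessPoolExecutor
from functools import partial
from endf_parserpy import EndfParserFactory, EndfFile

def material_count(path, parser):
    # runs in a worker process: index the file and count its materials
    return path, len(EndfFile(path, parser=parser))

if __name__ == '__main__':   # needed on platforms that start workers by spawning
    parser = EndfParserFactory.create(select='fastest')
    library_files = ['a.endf', 'b.endf']   # placeholder paths of the files to scan
    with ProcessPoolExecutor() as pool:
        worker = partial(material_count, parser=parser)  # parser is pickled
        counts = dict(pool.map(worker, library_files))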

Tip

Two runnable scripts in the source repository exercise this interface end to end: examples/example-002-multimaterial-tapes.py builds, explores and edits a multi-material tape, and examples/example-003-bounded-memory.py demonstrates opening, editing and exporting a tape larger than the available memory with a bounded memory footprint.