pyissm.model.classes.cluster

Cluster classes for ISSM.

Classes

`gadi`([config_file, other])	Gadi HPC cluster interface for ISSM job submission and management.
`generic`([config_file, other])	Generic cluster class for ISSM.

class pyissm.model.classes.cluster.gadi(config_file=None, other=None)

Bases: manage_state

Gadi HPC cluster interface for ISSM job submission and management.

This class represents the Gadi HPC cluster at the National Computational Infrastructure (NCI) and provides methods for configuring cluster parameters, building PBS queue scripts, and managing job submission and file transfers.

The Gadi cluster uses PBS Pro for job scheduling and supports parallel execution via MPI. Configuration can be provided via YAML config files or programmatically through object attributes.

Parameters:

config_file (str, optional) – Path to a YAML configuration file containing cluster parameters. If provided, values from the file override the default parameters.
other (object, optional) – Another cluster object to inherit matching fields from.

name

Hostname of the cluster. Defaults to ‘gadi.nci.org.au’ if not on Gadi.

Type: str

login

Login username for the cluster. Must be provided for cluster access.

Type: str

np

Number of processors to use for job execution. Default is 16.

Type: int

memory

Memory per node in GB. Default is 40.

Type: int

port

SSH port number for cluster connection. Default is 0.

Type: int

queue

PBS queue name. Options include ‘normal’, ‘express’, ‘hugemem’. Default is ‘normal’.

Type: str

time

Walltime limit for job execution in minutes. Default is 60.

Type: int

codepath

Path to the ISSM executable directory (e.g., $ISSM_DIR/bin). Must be provided.

Type: str

executionpath

Path to the execution/working directory on the cluster. Must be provided.

Type: str

project

NCI project code for job submission. Must be provided.

Type: str

storage

Storage paths to access (e.g., ‘gdata/XXX+scratch/XXX’). Must be provided.

Type: str

moduleload

List of module load commands needed for PBS job execution.

Type: list of str

moduleuse

List of module use commands to specify module paths.

Type: list of str

Notes

All required attributes (login, codepath, executionpath, project, storage) must be set either via configuration file or programmatically before building and launching jobs. The moduleload and moduleuse lists must have equal length.
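
The requirements above can be checked before a job is built. The following is a minimal sketch and not part of pyissm: `missing_required` and `modules_consistent` are hypothetical helpers, and a `SimpleNamespace` stands in for a `gadi` object. It assumes an unset attribute is `None` or an empty string, which may differ from the class's actual defaults.

```python
from types import SimpleNamespace

# Required attributes as listed in the Notes above.
REQUIRED = ("login", "codepath", "executionpath", "project", "storage")


def missing_required(cluster):
    """Return the names of required attributes that are unset (None or '')."""
    return [name for name in REQUIRED if getattr(cluster, name, None) in (None, "")]


def modules_consistent(cluster):
    """Return True if moduleload and moduleuse have the same length."""
    return len(cluster.moduleload) == len(cluster.moduleuse)


# Stand-in object for illustration only; every value is a placeholder.
cluster = SimpleNamespace(
    login="abc123",
    codepath="/path/to/issm/bin",
    executionpath="",
    project=None,
    storage="gdata/XXX+scratch/XXX",
    moduleload=["openmpi"],
    moduleuse=["/path/to/modules"],
)

print(missing_required(cluster))    # ['executionpath', 'project']
print(modules_consistent(cluster))  # True
```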

Queue specifications:

normal: 48 hours on up to 3072 cores

express: 2 hours on up to 960 cores

hugemem: 48 hours on up to 3072 cores
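
As a rough illustration, the limits listed above can be compared against `np` and `time` (which is given in minutes). `QUEUE_LIMITS` and `fits_queue` are hypothetical names introduced here. The documentation does not say whether `check_consistency` performs this check, so treat this as a standalone sketch.

```python
# Limits copied from the queue specifications above:
# queue name -> (maximum walltime in hours, maximum number of cores).
QUEUE_LIMITS = {
    "normal": (48, 3072),
    "express": (2, 960),
    "hugemem": (48, 3072),
}


def fits_queue(queue, np, time_minutes):
    """Return True if np cores for time_minutes fit within the listed limits."""
    max_hours, max_cores = QUEUE_LIMITS[queue]
    return np <= max_cores and time_minutes <= max_hours * 60


print(fits_queue("express", 32, 60))    # True
print(fits_queue("express", 32, 180))   # False: exceeds the 2-hour limit
print(fits_queue("normal", 4096, 60))   # False: exceeds 3072 cores
```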

Examples

>>> cluster = gadi(config_file='gadi_config.yaml')
>>> cluster.np = 32
>>> cluster.queue = 'express'

build_queue_script(dir_name, model_name, solution, io_gather, is_valgrind, is_gprof, is_dakota, is_ocean_coupling, executable='issm.exe')

Generate a PBS queue submission script for running ISSM models on the Gadi cluster. The script includes resource specifications, module loading, and execution commands.

Parameters:

dir_name (str) – Directory name where the model execution files are stored.
model_name (str) – Name of the model, used for output file naming.
solution (str) – Solution type or identifier to pass to the executable.
io_gather (bool) – If True, output files are pre-gathered. If False, output binary files are concatenated after execution.
is_valgrind (bool) – If True, raises NotImplementedError as Valgrind is not supported.
is_gprof (bool) – If True, raises NotImplementedError as gprof is not supported.
is_dakota (bool) – If True, raises NotImplementedError as DAKOTA is not supported.
is_ocean_coupling (bool) – If True, raises NotImplementedError as ocean coupling is not supported.
executable (str, optional) – Name of the executable to run. Default is ‘issm.exe’.

Raises:

IOError – If Python wrappers are not installed.
NotImplementedError – If any of the unsupported features (DAKOTA, ocean coupling, Valgrind, gprof) are requested.

Returns:

Writes a queue script file named ‘{model_name}.queue’ to the current directory.

Return type:

None
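
To give a feel for what such a script might contain, here is a sketch that assembles PBS Pro directives from the cluster attributes. It is not the text pyissm generates. `pbs_script` is a hypothetical function, the mapping of `memory` onto the `mem` request and the argument order of the `mpiexec` line are assumptions, and the `module use` / `module load` lines are omitted.

```python
def pbs_script(project, queue, np, memory_gb, time_minutes, storage,
               codepath, executionpath, dir_name, model_name, solution,
               executable="issm.exe"):
    """Return the text of a PBS Pro script (illustrative layout only)."""
    # The time attribute is in minutes; PBS expects HH:MM:SS.
    hours, minutes = divmod(time_minutes, 60)
    run_dir = f"{executionpath}/{dir_name}"
    lines = [
        "#!/bin/bash",
        f"#PBS -P {project}",
        f"#PBS -q {queue}",
        f"#PBS -l ncpus={np}",
        f"#PBS -l mem={memory_gb}GB",
        f"#PBS -l walltime={hours:02d}:{minutes:02d}:00",
        f"#PBS -l storage={storage}",
        f"cd {run_dir}",
        # Assumed argument order: solution, run directory, model name.
        f"mpiexec -np {np} {codepath}/{executable} {solution} {run_dir} {model_name}",
    ]
    return "\n".join(lines) + "\n"


# All values below are placeholders.
script = pbs_script(
    project="ab12", queue="normal", np=16, memory_gb=40, time_minutes=90,
    storage="gdata/XXX+scratch/XXX", codepath="/path/to/issm/bin",
    executionpath="/scratch/XXX/execution", dir_name="run_dir",
    model_name="simulation_01", solution="StressbalanceSolution",
)
print(script)
```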

check_consistency(md, solution, analyses)

Check consistency of the [cluster.gadi] parameters.

Parameters:

md (pyissm.model.Model) – The model object to check.
solution (str) – The solution name to check.
analyses (list of str) – List of analyses to check consistency for.

Returns:

md – The model object with any consistency errors noted.

Return type:

pyissm.model.Model

download(dir_name, file_list)

Download files from a remote cluster to the local machine.

Parameters:

dir_name (str) – The name of the directory on the remote cluster containing the files to download.
file_list (list of str) – A list of filenames to download from the remote cluster directory.

Return type:

None

launch_queue_job(model_name, dir_name, restart=None, batch=False)

Launch a job on the Gadi cluster queue system.

This method submits a job to the Gadi PBS queue system. It handles both fresh job submissions and job restarts, with optional batch processing mode.

Parameters:

model_name (str) – Name of the model to be executed on the cluster.
dir_name (str) – Name of the directory where the job will be executed.
restart (bool or None, optional) – If not None, indicates this is a restart of an existing job. When restarting, the method assumes the job directory already exists and only submits the queue script via qsub. Default is None.
batch (bool, optional) – Flag indicating whether to run in batch mode. Currently unused for Gadi cluster but maintained for interface compatibility. Default is False.

Return type:

None

Notes

The method performs different operations based on the restart parameter:

If restart is not None: Changes to the existing execution directory and submits the queue script using qsub.

If restart is None: Removes any existing directory, creates a new one, moves and extracts the tar.gz file, then submits the queue script using qsub.

The job is launched via SSH connection to the cluster using the cluster’s name, login credentials, and port configuration.
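
The two branches and the SSH wrapping described in these Notes might be composed roughly as follows. This is a sketch under stated assumptions, not the pyissm implementation: `remote_command` and `ssh_argv` are hypothetical helpers, the exact shell command is a guess at what the description implies, and a port of 0 is assumed to mean that no `-p` option is passed.

```python
def remote_command(executionpath, dir_name, model_name, restart=None):
    """Compose the shell command run on the cluster (illustrative)."""
    if restart is not None:
        # Restart: reuse the existing directory and resubmit the script.
        return f"cd {executionpath}/{dir_name} && qsub {model_name}.queue"
    # Fresh run: recreate the directory, unpack the archive, then submit.
    return (
        f"cd {executionpath} && rm -rf ./{dir_name} && mkdir {dir_name} && "
        f"cd {dir_name} && mv ../{dir_name}.tar.gz ./ && "
        f"tar -zxf {dir_name}.tar.gz && qsub {model_name}.queue"
    )


def ssh_argv(name, login, port, command):
    """Wrap the command in an ssh invocation; port 0 adds no -p option here."""
    argv = ["ssh", "-l", login]
    if port:
        argv += ["-p", str(port)]
    return argv + [name, command]


command = remote_command("/scratch/XXX/execution", "run_dir", "simulation_01", restart=True)
print(command)  # cd /scratch/XXX/execution/run_dir && qsub simulation_01.queue
print(ssh_argv("gadi.nci.org.au", "abc123", 0, command))
```

Because the documented test is `restart is not None`, passing `restart=False` would also take the restart branch under this reading.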

Examples

# Launch a new job:
>>> cluster.launch_queue_job('simulation_01', 'run_dir')
# Restart an existing job:
>>> cluster.launch_queue_job('simulation_01', 'run_dir', restart=True)

upload_queue_job(model_name, dir_name, file_list)

Upload job files to the cluster queue system.

This method compresses the specified files into a tar.gz archive and transfers it to the cluster using SCP. If running in interactive mode, also includes error and output log files in the archive.

Parameters:

model_name (str) – Name of the model. Other cluster classes use it to name log files in interactive mode; it is not used here.
dir_name (str) – Name of the directory/archive to be created (without extension).
file_list (list of str) – List of file paths to be included in the compressed archive.

Notes

The function creates a tar.gz archive with the name {dir_name}.tar.gz containing all files in file_list. The compressed archive is then transferred to the cluster using the cluster’s configured connection parameters (name, execution path, login, and port).
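
The archiving step can be reproduced with the standard-library `tarfile` module. This sketch covers only the creation of `{dir_name}.tar.gz`; the SCP transfer is not shown. `make_archive` is a hypothetical helper, the file names are placeholders, and storing members under their base names is an assumption.

```python
import os
import tarfile
import tempfile


def make_archive(dir_name, file_list):
    """Create {dir_name}.tar.gz containing file_list and return its path."""
    archive = f"{dir_name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in file_list:
            # Store each file under its base name, without leading directories.
            tar.add(path, arcname=os.path.basename(path))
    return archive


# Demonstration in a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ["model.bin", "model.toolkits", "model.queue"]:
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write("placeholder\n")
        paths.append(path)

    archive = make_archive(os.path.join(tmp, "run_dir"), paths)
    with tarfile.open(archive, "r:gz") as tar:
        members = sorted(tar.getnames())

print(members)  # ['model.bin', 'model.queue', 'model.toolkits']
```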

class pyissm.model.classes.cluster.generic(config_file=None, other=None)

Bases: manage_state

Generic cluster class for ISSM.

This class provides a generic interface for managing cluster configurations and job execution in the ISSM framework. It handles cluster parameters, queue script generation, job submission, and result retrieval.

Parameters:

config_file (str, optional) – Path to a YAML configuration file containing cluster parameters. If provided, values from the file override the default parameters.
other (object, optional) – Another cluster object to inherit matching fields from.

name

Name of the cluster (defaults to hostname).

Type: str

login

Login username for the cluster (defaults to current username).

Type: str

np

Number of processors to use (default: 1).

Type: int

port

Port number for connections (default: 0).

Type: int

interactive

Interactive mode flag (default: 1).

Type: int

codepath

Path to the ISSM executables directory (default: $ISSM_DIR/bin).

Type: str

executionpath

Path to the execution directory on the cluster (default: $ISSM_DIR/execution).

Type: str

valgrind

Path to valgrind executable for memory debugging (default: $ISSM_DIR/externalpackages/valgrind/bin/valgrind).

Type: str

valgrindlib

Path to valgrind MPI debug library (default: $ISSM_DIR/externalpackages/valgrind/install/lib/libmpidebug.so).

Type: str

valgrindsup

List of valgrind suppression files (default: $ISSM_DIR/externalpackages/valgrind/issm.supp).

Type: list of str

verbose

Verbose output flag (default: 1).

Type: int

shell

Shell to use for command execution (default: ‘/bin/sh’).

Type: str

Notes

Configuration parameters can be overridden via YAML configuration files or by inheriting from other cluster objects.
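
One plausible reading of "inherit matching fields" is that only attributes already present on the new object are copied from `other`. The sketch below illustrates that reading with plain `SimpleNamespace` objects. `inherit_matching_fields` is a hypothetical helper, and the precedence between `config_file` and `other` is not stated in this documentation, so it is not modelled here.

```python
from types import SimpleNamespace


def inherit_matching_fields(target, other):
    """Copy attributes from other onto target where target already has that name."""
    for name, value in vars(other).items():
        if hasattr(target, name):
            setattr(target, name, value)
    return target


# Stand-in objects; the attribute values are placeholders.
target = SimpleNamespace(name="localhost", np=1, shell="/bin/sh")
other = SimpleNamespace(name="my_cluster", np=8, queue="normal")

inherit_matching_fields(target, other)
print(target.name, target.np, target.shell)  # my_cluster 8 /bin/sh
print(hasattr(target, "queue"))              # False
```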

Examples

>>> cluster = generic()
>>> cluster.np = 4
>>> cluster.name = 'my_cluster'

>>> cluster = generic(config_file='cluster_config.yaml')

build_kriging_queue_script(model_name, solution, io_gather, is_valgrind, is_gprof, executable='kriging.exe')

Build a queue script for executing kriging models on the cluster.

This method generates platform-specific execution scripts (bash for Linux/Mac, batch for Windows) that handle kriging model execution with various configurations including MPI, debugging tools, and profiling.

Parameters:

model_name (str) – Name of the kriging model to execute.
solution (str) – Solution type or configuration parameter.
io_gather (bool) – Flag indicating whether to gather I/O operations. If False, output files will be concatenated.
is_valgrind (bool) – Flag to enable Valgrind memory debugging tool execution.
is_gprof (bool) – Flag to enable gprof profiling tool execution.
executable (str, optional) – Name of the executable file to run. Default is ‘kriging.exe’.

Raises:

IOError – If Python wrappers are not installed.

Notes

On Linux/Mac systems, creates a ‘.queue’ bash script
On Windows systems, creates a ‘.bat’ batch script
Automatically handles MPI execution for kriging operations
In interactive mode, creates empty error and output log files
Supports memory debugging with Valgrind and profiling with gprof
Specifically designed for kriging executable execution

build_queue_script(dir_name, model_name, solution, io_gather, is_valgrind, is_gprof, is_dakota, is_ocean_coupling, executable='issm.exe')

Build a queue script for executing ISSM models on the cluster.

This method generates platform-specific execution scripts (bash for Linux/Mac, batch for Windows) that handle model execution with various configurations including MPI, debugging tools, and specialized executables.

Parameters:

dir_name (str) – Directory name where the model files are located.
model_name (str) – Name of the model to execute.
solution (str) – Solution type or configuration parameter.
io_gather (bool) – Flag indicating whether to gather I/O operations. If False, output files will be concatenated.
is_valgrind (bool) – Flag to enable Valgrind memory debugging tool execution.
is_gprof (bool) – Flag to enable gprof profiling tool execution.
is_dakota (bool) – Flag to use DAKOTA optimization executable.
is_ocean_coupling (bool) – Flag to use ocean coupling executable.
executable (str, optional) – Name of the executable file to run. Default is ‘issm.exe’.

Raises:

IOError – If Python wrappers are not installed or if DAKOTA support is requested but not available in the ISSM build.

Notes

On Linux/Mac systems, creates a ‘.queue’ bash script
On Windows systems, creates a ‘.bat’ batch script
Automatically handles MPI execution when available
In interactive mode, creates empty error and output log files
Supports various debugging and profiling tools integration
Handles different executable types based on coupling requirements
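
The platform-dependent choice of script type can be pictured as below. This is a sketch only: `queue_script_name` is a hypothetical helper, detecting the platform with `platform.system()` is an assumption about how the choice might be made, and naming the file `model_name` plus suffix is inferred from the `gadi` class description.

```python
import platform


def queue_script_name(model_name, system=None):
    """Return the script file name for a given platform.system() value."""
    system = system or platform.system()
    # '.bat' batch script on Windows, '.queue' bash script otherwise.
    suffix = ".bat" if system == "Windows" else ".queue"
    return model_name + suffix


print(queue_script_name("simulation_01", "Linux"))    # simulation_01.queue
print(queue_script_name("simulation_01", "Darwin"))   # simulation_01.queue
print(queue_script_name("simulation_01", "Windows"))  # simulation_01.bat
```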

check_consistency(md, solution, analyses)

Check consistency of the [cluster.generic] parameters.

Parameters:

md (pyissm.model.Model) – The model object to check.
solution (str) – The solution name to check.
analyses (list of str) – List of analyses to check consistency for.

Returns:

md – The model object with any consistency errors noted.

Return type:

pyissm.model.Model

download(dir_name, file_list)

Download files from a remote cluster to the local machine.

This method retrieves specified files from a remote cluster directory to the current local directory. On Windows systems, this operation is skipped as it’s not supported.

Parameters:

dir_name (str) – The name of the directory on the remote cluster containing the files to download.
file_list (list of str) – A list of filenames to download from the remote cluster directory.

Return type:

None

Notes

This method does nothing on Windows platforms and returns immediately.
Files are copied from the cluster’s execution path combined with the specified directory name.
The actual file transfer is handled by the model.io.issm_scp_in function.
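
The path handling described in these Notes can be sketched without performing any transfer. `plan_download` is a hypothetical helper that only computes which remote paths would be requested. It does not call `issm_scp_in`, whose signature is not documented here, and the file names are placeholders.

```python
import posixpath


def plan_download(executionpath, dir_name, file_list, system):
    """Return the remote paths to fetch, or an empty list on Windows."""
    if system == "Windows":
        # Per the Notes, the download is skipped on Windows.
        return []
    # Remote path = execution path + directory name + file name.
    return [posixpath.join(executionpath, dir_name, name) for name in file_list]


files = ["model.outbin", "model.outlog"]
print(plan_download("/home/user/execution", "run_dir", files, "Linux"))
# ['/home/user/execution/run_dir/model.outbin', '/home/user/execution/run_dir/model.outlog']
print(plan_download("/home/user/execution", "run_dir", files, "Windows"))  # []
```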

launch_queue_job(model_name, dir_name, restart=None, batch=False)

Launch a job on the cluster queue system.

This method builds and executes the appropriate launch command for submitting a job to the cluster’s queue system. It handles both fresh job submissions and job restarts, with optional batch processing mode.

Parameters:

model_name (str) – Name of the model to be executed on the cluster.
dir_name (str) – Name of the directory where the job will be executed.
restart (bool or None, optional) – If not None, indicates this is a restart of an existing job. When restarting, the method assumes the job directory already exists and only executes the queue script. Default is None.
batch (bool, optional) – Flag indicating whether to run in batch mode. When True, only extracts the tar.gz file without executing the queue script. When False (default), extracts and immediately executes the job. Only relevant when restart is None.

Notes

The method performs different operations based on the restart parameter:

If restart is not None: Changes to the execution directory and runs the existing queue script.

If restart is None: Removes any existing directory, creates a new one, moves and extracts the tar.gz file, and optionally runs the queue script depending on the batch parameter.

The job is launched via SSH connection to the cluster using the cluster’s name, login credentials, and port configuration.
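
The interaction between `restart` and `batch` can be summarised as a small decision function. `launch_steps` is a hypothetical helper that returns the described actions as plain text rather than shell commands. It mirrors the Notes above and makes no claim about the actual command pyissm builds.

```python
def launch_steps(restart=None, batch=False):
    """List the actions described in the Notes for a restart/batch combination."""
    if restart is not None:
        # Restart: batch is not consulted in this branch.
        return ["cd into existing directory", "run queue script"]
    steps = [
        "remove existing directory",
        "create directory",
        "move and extract tar.gz",
    ]
    if not batch:
        # Outside batch mode, the job is executed immediately after extraction.
        steps.append("run queue script")
    return steps


print(launch_steps())                          # four steps, ending with 'run queue script'
print(launch_steps(batch=True))                # three steps, extraction only
print(launch_steps(restart=True, batch=True))  # ['cd into existing directory', 'run queue script']
```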

Examples

# Launch a new job:
>>> cluster.launch_queue_job('simulation_01', 'run_dir')
# Restart an existing job:
>>> cluster.launch_queue_job('simulation_01', 'run_dir', restart=True)
# Launch in batch mode (extract only, no execution):
>>> cluster.launch_queue_job('simulation_01', 'run_dir', batch=True)

upload_queue_job(model_name, dir_name, file_list)

Upload job files to the cluster queue system.

Compresses the specified files into a tar.gz archive and transfers it to the cluster using SCP. If running in interactive mode, also includes error and output log files in the archive.

Parameters:

model_name (str) – Name of the model, used for naming log files in interactive mode.
dir_name (str) – Name of the directory/archive to be created (without extension).
file_list (list of str) – List of file paths to be included in the compressed archive.

Notes

The function creates a tar.gz archive with the name {dir_name}.tar.gz containing all files in file_list. In interactive mode, it also includes {model_name}.errlog and {model_name}.outlog files. The compressed archive is then transferred to the cluster using the cluster’s configured connection parameters (name, execution path, login, and port).
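
The effect of interactive mode on the archive contents can be shown as a list computation. `archive_members` is a hypothetical helper and the input file names are placeholders. Only the `{model_name}.errlog` and `{model_name}.outlog` names come from the Notes above.

```python
def archive_members(model_name, file_list, interactive):
    """Return the files that would go into {dir_name}.tar.gz."""
    members = list(file_list)
    if interactive:
        # Interactive mode adds the error and output log files.
        members += [f"{model_name}.errlog", f"{model_name}.outlog"]
    return members


files = ["simulation_01.bin", "simulation_01.queue"]
print(archive_members("simulation_01", files, interactive=0))
# ['simulation_01.bin', 'simulation_01.queue']
print(archive_members("simulation_01", files, interactive=1))
# ['simulation_01.bin', 'simulation_01.queue', 'simulation_01.errlog', 'simulation_01.outlog']
```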