Design notes
Pipelines
Internally, S1 Tiling defines a series of pipelines. More precisely, it distinguishes pipeline descriptions from actual pipelines. The actual pipelines are generated from their description and from the input files, and they are handled internally; they won't be described here.
Each pipeline corresponds to a series of processings. The original, intended design is a direct match: one processing == one OTB application, which permits chaining OTB applications in memory through the OTB Python bindings.
However, a processing doesn't always turn into the execution of an OTB application; sometimes we need to run other computations, such as calling a Python function or executing an external program. Other times, we just need to perform some analysis whose result will be reused later in the pipeline.
When files need to be produced at some point, we end the pipeline there; the next one(s) can take over from that point.
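The split between pipeline descriptions and actual pipelines can be sketched as follows; the class shapes and the instantiate()/run() methods below are illustrative stand-ins, not the actual S1 Tiling API:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch: a pipeline *description* only lists the processings
# (step factories) and a name; an *actual* pipeline is built later, once the
# input files for a given S2 tile are known.
@dataclass
class PipelineDescription:
    factories: List[Callable]          # one factory per processing
    name: str
    product_required: bool = False

    def instantiate(self, input_files: List[str]) -> "Pipeline":
        # The real code also wires metadata, output filenames, etc.
        return Pipeline(self, input_files)

@dataclass
class Pipeline:
    description: PipelineDescription
    inputs: List[str]

    def run(self) -> str:
        # Chain every processing on the input; here each "factory" is just
        # a function transforming a filename, standing in for an OTB app.
        result = self.inputs[0]
        for factory in self.description.factories:
            result = factory(result)
        return result

calibrate = lambda f: f.replace(".tiff", "_calib.tiff")
cut = lambda f: f.replace(".tiff", "_cut.tiff")
desc = PipelineDescription([calibrate, cut], "PrepareForOrtho")
print(desc.instantiate(["s1a.tiff"]).run())  # s1a_calib_cut.tiff
```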
The PipelineDescriptionSequence class is the main entry point to describe pipelines.
Simple pipelines
In simple cases, we can chain the output of an in-memory pipeline of OTB applications into the next pipeline.
At this moment, the following sequence of pipelines is defined:
pipelines = PipelineDescriptionSequence(config)
pipelines.register_pipeline([AnalyseBorders, Calibrate, CutBorders], 'PrepareForOrtho', product_required=False)
pipelines.register_pipeline([OrthoRectify], 'OrthoRectify', product_required=False)
pipelines.register_pipeline([Concatenate], product_required=True)
if config.mask_cond:
    pipelines.register_pipeline([BuildBorderMask, SmoothBorderMask], 'GenerateMask', product_required=True)
For instance, to minimize disk usage, we could chain the orthorectification in memory directly after the border cutting by removing the second pipeline and registering the following step into the first pipeline instead:
pipelines.register_pipeline([AnalyseBorders, Calibrate, CutBorders, OrthoRectify],
                            'OrthoRectify', product_required=False)
Complex pipelines
In more complex cases, the product of a pipeline is used as input of several other pipelines. Also, a pipeline can have several inputs coming from different other pipelines.
To do so, we name each pipeline, so that the name can be used as an input of other pipelines.
For instance, the LIA-producing pipelines are described this way:
pipelines = PipelineDescriptionSequence(config, dryrun=dryrun)
dem = pipelines.register_pipeline([AgglomerateDEMOnS1],
                                  'AgglomerateDEMOnS1',
                                  inputs={'insar': 'basename'})
demproj = pipelines.register_pipeline([ExtractSentinel1Metadata, SARDEMProjection],
                                      'SARDEMProjection',
                                      is_name_incremental=True,
                                      inputs={'insar': 'basename', 'indem': dem})
xyz = pipelines.register_pipeline([SARCartesianMeanEstimation],
                                  'SARCartesianMeanEstimation',
                                  inputs={'insar': 'basename', 'indem': dem, 'indemproj': demproj})
lia = pipelines.register_pipeline([ComputeNormals, ComputeLIAOnS1],
                                  'Normals|LIA',
                                  is_name_incremental=True,
                                  inputs={'xyz': xyz})

# The "inputs" parameter doesn't need to be specified in all the following
# pipeline declarations, but we still use it for clarity!
ortho = pipelines.register_pipeline([filter_LIA('LIA'), OrthoRectifyLIA],
                                    'OrthoLIA',
                                    inputs={'in': lia},
                                    is_name_incremental=True)
concat = pipelines.register_pipeline([ConcatenateLIA],
                                     'ConcatLIA',
                                     inputs={'in': ortho})
select = pipelines.register_pipeline([SelectBestCoverage],
                                     'SelectLIA',
                                     product_required=True,
                                     inputs={'in': concat})
ortho_sin = pipelines.register_pipeline([filter_LIA('sin_LIA'), OrthoRectifyLIA],
                                        'OrthoSinLIA',
                                        inputs={'in': lia},
                                        is_name_incremental=True)
concat_sin = pipelines.register_pipeline([ConcatenateLIA],
                                         'ConcatSinLIA',
                                         inputs={'in': ortho_sin})
select_sin = pipelines.register_pipeline([SelectBestCoverage],
                                         'SelectSinLIA',
                                         product_required=True,
                                         inputs={'in': concat_sin})
Pipeline inputs
In order to build the Directed Acyclic Graph (DAG) of tasks that will be executed through the described pipelines, we need to inject inputs.
Pipeline inputs need to be registered explicitly. This is done through FirstStepFactories passed to PipelineDescriptionSequence.register_inputs.
Each FirstStepFactory takes care of returning a list of FirstSteps. These FirstSteps are expected to hold metadata that will be used to generate the DAG of tasks. They may also obtain related products on the fly: for instance, s1_raster_first_inputs_factory() and eof_first_inputs_factory() first check which products are already on disk before trying to download the missing ones.
e.g.:
pipelines.register_inputs('basename', s1_raster_first_inputs_factory)
pipelines.register_inputs('basename', tilename_first_inputs_factory)
pipelines.register_inputs('basename', eof_first_inputs_factory)
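The general shape of such a factory can be sketched in plain Python; the function below is a toy stand-in (the real FirstStep objects are richer than the metadata dicts used here, and the "on-disk" product list is faked):

```python
# Illustrative sketch of a first-step factory: a callable that returns the
# list of first steps (here reduced to metadata dicts) for inputs already
# on disk, downloading nothing in this toy version.
from pathlib import Path
from typing import Any, Dict, List

def toy_raster_first_inputs_factory(tile_name: str, dryrun: bool = False,
                                    **extra: Any) -> List[Dict[str, Any]]:
    # The real factories receive the Configuration plus the extra parameters
    # registered with register_extra_parameters_for_input_factories().
    available = ["s1a_33NWB_vv.tiff", "s1a_33NWB_vh.tiff"]  # pretend on-disk
    return [{"basename": Path(f).stem, "tile_name": tile_name}
            for f in available]

steps = toy_raster_first_inputs_factory(tile_name="33NWB")
print([s["basename"] for s in steps])
```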
As the PipelineDescriptionSequence tries to be as independent of the actual domain as possible, it doesn't know which information is expected by all the registered FirstStepFactories. By default, Configuration information is passed, but any other information needs to be declared in one or several calls to PipelineDescriptionSequence.register_extra_parameters_for_input_factories.
e.g.:
pipelines.register_extra_parameters_for_input_factories(
    tile_name=tilename,                # Used by all factories
)
pipelines.register_extra_parameters_for_input_factories(
    dag=dag,                           # Used by eof_first_inputs_factory
    s1_file_manager=s1_file_manager,   # Used by s1_raster_first_inputs_factory
    dryrun=dryrun,                     # Used by all factories
)
Note
In simplified developer jargon: we use the Factory Method design pattern to invert dependencies.
Dask: tasks
Given the pipeline descriptions, a requested S2 tile, and its intersecting S1 images, S1 Tiling builds a set of dependent Dask tasks. Each task corresponds to an actual pipeline, which will transform a given image into another named image product.
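Conceptually, the result is close to Dask's dict-based task graphs, where each key names a product and maps to the function producing it plus the keys of its inputs. The miniature stand-in below (not the real scheduler) illustrates the idea:

```python
# Toy dict-based task graph in the style dask.get() consumes:
# key -> (function, *argument-keys). Each task stands in for one actual
# pipeline producing one named image product.
def run(graph, key):
    func, *deps = graph[key]
    return func(*(run(graph, d) for d in deps))

graph = {
    "basename": (lambda: "s1a_raw", ),
    "ortho":    (lambda src: src + "_ortho", "basename"),
    "concat":   (lambda src: src + "_concat", "ortho"),
}
print(run(graph, "concat"))  # s1a_raw_ortho_concat
```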
Processing Classes
Again, the processing classes are split into two families:
the factories: StepFactory
the instances: Step
Step Factories
Step factories are the main entry point to add new processings. They are meant to inherit from one of OTBStepFactory, AnyProducerStepFactory, or ExecutableStepFactory.
They describe processings, and they are used to instantiate the actual steps that do the processing.
StepFactory: abstract factory for Steps.
_FileProducingStepFactory: abstract class that factorizes filename transformations and parameter handling for Steps that produce files, either with OTB or through external calls.
OTBStepFactory: abstract StepFactory for all OTB Applications.
AnyProducerStepFactory: abstract StepFactory for executing any Python-made step.
ExecutableStepFactory: abstract StepFactory for executing any external program.
A factory for an artificial Step that forces the result of the previous application sequence to be stored on disk by breaking the in-memory connection.
Steps
Step types are usually instantiated automatically. They are documented for convenience, but they are not expected to be extended.
FirstStep is instantiated automatically by the program from existing files (downloaded, or produced by a pipeline earlier in the sequence of pipelines).
MergeStep is also instantiated automatically, as an alternative to FirstStep, in the case of steps that expect several input files of the same type. This is for instance the case of Concatenate inputs. A step is recognized to await several inputs when the dependency analysis phase finds several possible inputs that lead to a product.
Step is the main class for steps that execute an OTB application.
AnyProducerStep is the main class for steps that execute a Python function.
ExecutableStep is the main class for steps that execute an external application.
AbstractStep is the root class of the step hierarchy. It still gets instantiated automatically for steps not related to any kind of application.
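The rule used to choose between FirstStep and MergeStep during dependency analysis can be sketched as follows (toy logic, not the actual implementation):

```python
# Toy dependency-analysis rule: if several registered inputs lead to the
# same product, the step is fed a MergeStep; otherwise a plain FirstStep.
def make_input_step(candidate_inputs):
    if len(candidate_inputs) > 1:
        return ("MergeStep", sorted(candidate_inputs))
    return ("FirstStep", candidate_inputs[0])

print(make_input_step(["t1.tif"]))            # ('FirstStep', 't1.tif')
print(make_input_step(["t2.tif", "t1.tif"]))  # ('MergeStep', ['t1.tif', 't2.tif'])
```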
AbstractStep: internal root class for all actual steps.
FirstStep: first step instances are the pipeline starting points.
MergeStep: kind of FirstStep that merges the results of one or several other steps of the same kind.
The root class for all Steps that produce files.
A Step that has a reference to an OTB application.
Step: internal specialized Step that holds a binding to an OTB Application.
A kind of OTB Step that forwards the OTB application of the previous step in the pipeline.
AnyProducerStep: generic step for running any Python code that produces files.
ExecutableStep: generic step for calling any external application.
An artificial Step that takes care of executing the last OTB application in the pipeline.
Existing processings
The domain processings are defined through StepFactory subclasses, which in turn will instantiate domain-unaware subclasses of AbstractStep for the actual processing.
Main processings
Factory that takes care of extracting metadata from S1 input files.
StepFactory that analyses whether image borders need to be cut, as described in the Margins cutting documentation.
Factory that prepares steps that run SARCalibration, as described in the SAR Calibration documentation.
Factory that prepares steps that run ResetMargin, as described in the Margins cutting documentation.
Factory that prepares steps that run OrthoRectification, as described in the Orthorectification documentation.
Abstract factory that prepares steps that run Synthetize, as described in the Concatenation documentation.
Factory that prepares the first step that generates border masks, as described in the Border mask generation documentation.
Factory that prepares the first step that smoothes border masks, as described in the Border mask generation documentation.
Factory that prepares the step that applies spatial despeckle filtering, as described in the Spatial despeckle filtering documentation.
Processings for advanced calibration
These processings permit producing the Local Incidence Angle maps needed for σ0 NORMLIM calibration.
Factory that prepares steps that run Applications/app_SARComputeGroundAndSatPositionsOnDEM, as described in the Compute ECEF ground and satellite positions on S2 documentation, to obtain the XYZ ECEF coordinates of the ground and of the satellite positions associated with the pixels of the input height file.
Factory that prepares steps that run ExtractNormalVector on images in S2 geometry, as described in the Normals computation documentation.
Factory that prepares steps that run SARComputeLocalIncidenceAngle on images in S2 geometry, as described in the LIA maps computation documentation.
Factory that concludes the σ0 NORMLIM calibration.
Deprecated processings for advanced calibration
The following processings were used in v1.0 of S1Tiling, alongside some of the previous ones. Starting from v1.1, they are deprecated.
Factory that prepares steps that run SARDEMProjection, as described in the Normals computation documentation.
Factory that prepares steps that run SARCartesianMeanEstimation, as described in the Normals computation documentation.
Factory that prepares steps that run OrthoRectification on LIA maps.
Factory that prepares steps that run ExtractNormalVector on images in S1 geometry, as described in the Normals computation documentation.
Factory that prepares steps that run SARComputeLocalIncidenceAngle on images in S1 geometry, as described in the LIA maps computation documentation.
Factory that prepares steps that run Synthetize on LIA images.
StepFactory that helps select only one path after LIA concatenation: the one that has the best coverage of the target S2 tile.
Filename generation
At each step, product filenames are automatically generated by the StepFactory.update_filename_meta function. This function is first used to generate the task execution graph. (It's still used a second time, live, but this should change eventually.)
The exact filename generation is handled by the StepFactory.build_step_output_filename and StepFactory.build_step_output_tmp_filename functions, which define the final filename and the working filename (used while the associated product is being computed).
In some very specific cases, where no product is generated, these functions need to be overridden. Otherwise, a default behaviour is provided by the _FileProducingStepFactory constructor.
It is done through these parameters:
gen_tmp_dir: defines where temporary files are produced.
gen_output_dir: defines where final files are produced. When this parameter is left unspecified, the final product is considered to be an intermediary file and it will be stored in the temporary directory. The distinction is useful for final and required products.
gen_output_filename: defines the naming policy for both temporary and final filenames.
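As a rough illustration of a naming policy, a template-based generator in the spirit of TemplateOutputFilenameGenerator can be emulated with str.format() over the meta dictionary; the class, the template, and the meta keys below are made up:

```python
# Minimal stand-in for a template-based output filename generator: the
# template placeholders are resolved against the step's meta dictionary.
class ToyTemplateFilenameGenerator:
    def __init__(self, template: str) -> None:
        self.template = template

    def generate(self, basename: str, meta: dict) -> str:
        return self.template.format(basename=basename, **meta)

gen = ToyTemplateFilenameGenerator("{tile_name}_{polarisation}_{basename}.tif")
meta = {"tile_name": "33NWB", "polarisation": "vv"}
print(gen.generate("s1a_20200108", meta))  # 33NWB_vv_s1a_20200108.tif
```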
Important
As the filenames are used to define the task execution graph, it’s important that every possible product (and associated production task) can be uniquely identified without any risk of ambiguity. Failure to comply will destabilise the data flows.
If for some reason you need to define a complex data flow where an output can be used several times as input in different Steps, or where a Step has several inputs of same or different kinds, or where several products are concurrent and only one would be selected, please check all the StepFactories related to the LIA dataflow.
Available naming policies
Three filename generators are available by default. They apply a transformation on the basename meta information: one works from a given pair of patterns, one expands a given template (see TemplateOutputFilenameGenerator), and a third covers steps that produce several products.
Hooks
StepFactory._update_filename_meta_pre_hook
Sometimes it's necessary to analyse the input files and/or their names before being able to build the output filename(s). This is meant to be done by overriding the StepFactory._update_filename_meta_pre_hook method. Only lightweight analysis is meant to be done here; its result can then be stored into the meta dictionary and returned.
It's typically used alongside TemplateOutputFilenameGenerator.
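A toy illustration of the pre-hook mechanism; the class bodies, meta keys, and template are invented, only the hook names mimic the real ones:

```python
# Toy illustration: a lightweight analysis of the input name runs before
# filename generation and stores its findings in meta, so that a template
# generator can use them afterwards.
class ToyStepFactory:
    def _update_filename_meta_pre_hook(self, meta: dict) -> dict:
        return meta  # default: nothing to analyse

    def update_filename_meta(self, meta: dict) -> dict:
        meta = dict(meta)                        # don't mutate caller's dict
        meta = self._update_filename_meta_pre_hook(meta)
        meta["out_filename"] = "{tile_name}_{orbit_direction}.tif".format(**meta)
        return meta

class ToyOrbitAwareFactory(ToyStepFactory):
    def _update_filename_meta_pre_hook(self, meta: dict) -> dict:
        # Pretend we inspected the product name to find the orbit direction.
        meta["orbit_direction"] = "DES" if "_des_" in meta["basename"] else "ASC"
        return meta

meta = ToyOrbitAwareFactory().update_filename_meta(
    {"basename": "s1a_des_0007", "tile_name": "33NWB"})
print(meta["out_filename"])  # 33NWB_DES.tif
```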
StepFactory._update_filename_meta_post_hook
StepFactory.update_filename_meta provides various values to the metadata. This hook permits overriding the values associated with task names, product existence tests, and so on.