:orphan:

.. _slep_006_other:

Alternative solutions to sample-aligned meta-data
=================================================

This page describes alternative solutions that were discussed but
ultimately not adopted in the SLEP.

Solution sketches require these definitions:

.. literalinclude:: defs.py

Status quo solution 0a: additional feature
------------------------------------------

Without changing scikit-learn, the following hack can be used:

Additional numeric features representing sample props can be appended to the
data and passed around, being handled specially in each consumer of features
or sample props.

.. literalinclude:: cases_opt0a.py
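
As a rough illustration of this hack, independent of ``cases_opt0a.py``, the
sketch below carries a sample weight as the last column of ``X`` and strips it
off again inside the consumer. The helpers ``append_prop``, ``split_prop`` and
``weighted_column_means`` are hypothetical names introduced here, and plain
lists stand in for arrays:

```python
def append_prop(X, prop):
    """Return a copy of X with one prop value appended to each row."""
    if len(X) != len(prop):
        raise ValueError("prop must have one value per sample")
    return [list(row) + [value] for row, value in zip(X, prop)]


def split_prop(X_aug):
    """Undo append_prop: return (features, prop) from the augmented rows."""
    X = [row[:-1] for row in X_aug]
    prop = [row[-1] for row in X_aug]
    return X, prop


def weighted_column_means(X_aug):
    """Toy consumer: treats the last column as sample weights, not a feature."""
    X, weights = split_prop(X_aug)
    total = sum(weights)
    n_features = len(X[0])
    return [
        sum(w * row[j] for row, w in zip(X, weights)) / total
        for j in range(n_features)
    ]


X = [[1.0, 10.0], [3.0, 30.0]]
sample_weight = [1.0, 3.0]
X_aug = append_prop(X, sample_weight)
means = weighted_column_means(X_aug)
```

Every consumer of ``X_aug`` has to know that the last column is not a feature,
which is the special handling referred to above.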

Status quo solution 0b: Pandas Index and global resources
---------------------------------------------------------

Without changing scikit-learn, the following hack can be used:

If `y` is represented with a Pandas datatype, then its index can be used to
access required elements from props stored in a global namespace (or otherwise
made available to the estimator before fitting). This is possible everywhere
that a ground-truth `y` is passed, including fit, split, score, and metrics.
A similar solution with `X` is also possible (except for metrics), if all
Pipeline components retain the original Pandas Index.

Issues:

* use of global data source
* requires Pandas data types and indices to be maintained

.. literalinclude:: cases_opt0b.py

Solution 1: Pass everything
---------------------------

This proposal passes all props to all consumers (estimators, splitters,
scorers, etc.). Each consumer would use the props it recognises by name and
disregard the rest.

We may consider providing syntax for the user to control the interpretation of
incoming props:

* to require that some prop is provided (for an estimator where that prop is
  otherwise optional)
* to disregard some provided prop
* to treat a particular prop key as having a certain meaning (e.g. locally
  interpreting 'scoring_sample_weight' as 'sample_weight').

These constraints would be checked by calling a helper at the consumer.
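
Such a helper might look roughly like the following. This is a minimal sketch
under assumed names: ``resolve_props``, its ``required``, ``ignored`` and
``aliases`` arguments, and ``ToyScorer`` are all hypothetical and not part of
scikit-learn.

```python
def resolve_props(props, required=(), ignored=(), aliases=None):
    """Hypothetical consumer-side helper for the "pass everything" scheme.

    props    -- dict of everything passed down by the caller
    required -- local names that must be present after aliasing
    ignored  -- incoming keys to drop even if the consumer knows them
    aliases  -- mapping from incoming key to local name
    """
    aliases = aliases or {}
    resolved = {}
    for key, value in props.items():
        if key in ignored:
            continue
        resolved[aliases.get(key, key)] = value
    missing = [name for name in required if name not in resolved]
    if missing:
        raise ValueError(f"missing required props: {missing}")
    return resolved


class ToyScorer:
    """Toy consumer that only understands 'sample_weight'."""

    def score(self, y_true, y_pred, **props):
        local = resolve_props(
            props,
            required=["sample_weight"],
            ignored=["sample_weight"],
            aliases={"scoring_sample_weight": "sample_weight"},
        )
        # Props this consumer does not know (e.g. 'groups') are disregarded.
        weights = local["sample_weight"]
        hits = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
        return hits / sum(weights)


score = ToyScorer().score(
    [0, 1, 1],
    [0, 1, 0],
    sample_weight=[1.0, 1.0, 1.0],
    scoring_sample_weight=[1.0, 1.0, 2.0],
    groups=[0, 0, 1],
)
```

Here the incoming ``sample_weight`` is disregarded, ``scoring_sample_weight``
is interpreted locally as ``sample_weight``, and a call that supplies neither
raises an error because the prop is marked as required.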

Issues:

* Error handling: if a key is optional in a consumer, no error will be
  raised for misspelling. An introspection API might change this, allowing a
  user or meta-estimator to check if all keys passed are to be used in at least
  one consumer.
* Forwards compatibility: newly supporting a prop key in a consumer will change
  behaviour. Other than a ChangedBehaviorWarning, I don't see any way around
  this.
* Introspection: not inherently supported. Would need an API like
  ``get_prop_support(names: List[str]) -> Dict[str, Literal["supported", "required", "ignored"]]``.
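
To make the introspection idea concrete, the sketch below implements the
``get_prop_support`` signature above on a toy consumer and uses it to find keys
that no consumer would use, such as a misspelled ``sample_weight``.
``ToyConsumer`` and ``unused_props`` are hypothetical names for illustration
only.

```python
from typing import Dict, List, Literal

Support = Literal["supported", "required", "ignored"]


class ToyConsumer:
    """Toy consumer that declares which prop names it understands."""

    def __init__(self, supported=(), required=()):
        self._supported = set(supported)
        self._required = set(required)

    def get_prop_support(self, names: List[str]) -> Dict[str, Support]:
        result: Dict[str, Support] = {}
        for name in names:
            if name in self._required:
                result[name] = "required"
            elif name in self._supported:
                result[name] = "supported"
            else:
                result[name] = "ignored"
        return result


def unused_props(consumers, names):
    """Return the names that every consumer would ignore."""
    unused = []
    for name in names:
        statuses = [c.get_prop_support([name])[name] for c in consumers]
        if all(status == "ignored" for status in statuses):
            unused.append(name)
    return unused


estimator = ToyConsumer(supported=["sample_weight"])
splitter = ToyConsumer(required=["groups"])
unused = unused_props(
    [estimator, splitter], ["sample_weight", "groups", "sample_wieght"]
)
```

A user or meta-estimator could raise an error when ``unused`` is non-empty,
which would address the misspelling issue listed above.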

In short, this is a simple solution, but prone to risk.

.. literalinclude:: cases_opt1.py


Solution 2: Specify routes at call
----------------------------------

Similar to the legacy behavior of fit parameters in
:class:`sklearn.pipeline.Pipeline`, this requires the user to specify the
path for each "prop" to follow when calling `fit`.  For example, to pass
a prop named 'weights' to a step named 'spam' in a Pipeline, you might use
`my_pipe.fit(X, y, props={'spam__weights': my_weights})`.

SLEP004's syntax to override the common routing scheme falls under this
solution.
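
The key convention can be sketched as follows. ``route_props`` is a
hypothetical helper, not scikit-learn API; it assumes keys of the form
``{step_name}__{prop_name}`` and splits on the first ``__`` only, so that a
nested key can be forwarded to an inner router unchanged.

```python
def route_props(props, step_names):
    """Group {'step__prop': value} entries by destination step (hypothetical)."""
    routed = {name: {} for name in step_names}
    for key, value in props.items():
        step, sep, prop = key.partition("__")
        if not sep or not prop:
            raise ValueError(f"prop key {key!r} does not name a destination step")
        if step not in routed:
            raise ValueError(f"prop key {key!r} refers to unknown step {step!r}")
        routed[step][prop] = value
    return routed


my_weights = [0.5, 2.0]

# Flat pipeline with steps 'eggs' and 'spam'.
flat = route_props({"spam__weights": my_weights}, ["eggs", "spam"])

# Wrapper with a single child named 'estimator' that is itself a pipeline.
nested = route_props({"estimator__spam__weights": my_weights}, ["estimator"])
```

An unknown step name raises immediately, which is where the robustness to
misspellings comes from. The nested case also shows how keys grow longer as
estimators are wrapped.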

Advantages:

* Very explicit and robust to misspellings.

Issues:

* The user needs to know the nested internal structure, or it is easy to fail
  to pass a prop to a specific estimator.
* A corollary is that prop keys need changing when the developer modifies their
  estimator structure (see case C).
* This gets especially tricky or impossible where the available routes
  change mid-fit, such as where a grid search considers estimators with
  different structures.
* We would need to find a different solution for :issue:`2630` where a Pipeline
  could not be the base estimator of AdaBoost because AdaBoost expects the base
  estimator to accept a fit param keyed 'sample_weight'.
* This may not work if a meta-estimator has the role of changing a
  prop, e.g. a meta-estimator that passes `sample_weight` corresponding to
  balanced classes on to its base estimator.  The meta-estimator would need a
  list of destinations to pass modified props to, or a list of keys to modify.
* We would need to develop naming conventions for different routes, which may
  be more complicated than the current conventions. A GridSearchCV wrapping a
  Pipeline currently takes parameters with keys like
  `{step_name}__{prop_name}`, but explicit routing, together with the need to
  avoid conflicts with GridSearchCV's own routing destinations, implies keys
  like `estimator__{step_name}__{prop_name}`.

.. literalinclude:: cases_opt2.py


Solution 3: Specify routes on metaestimators
--------------------------------------------

Each meta-estimator is given a routing specification which it must follow in
passing only the required parameters to each of its children. In this context,
a GridSearchCV has children including `estimator`, `cv` and (each element of)
`scoring`.

Pull request :pr:`9566` and its extension in :pr:`15425` are partial
implementations of this approach.

A major benefit of this approach is that it may require modifying only the
meta-estimators that route props, not the prop consumers.

All consumers would be required to check that 

Issues:

* Routing may be hard to get one's head around, especially since the prop
  support belongs to the child estimator but the parent is responsible for the
  routing.
* Need to design an API for specifying routings.
* As in Solution 2, each local destination for routing props needs to be given
  a name.
* Every router along the route will need consistent instructions to pass a
  specific prop to a consumer. If the prop is optional in the consumer, routing
  failures may be hard to identify and debug.
* For estimators to be cloned, this routing information needs to be cloned with
  them. This implies either that the routing information is stored as a
  constructor parameter, or that `clone` is extended to copy routing
  information explicitly.

Possible public syntax:

Each meta-estimator has a `prop_routing` parameter to encode local routing
rules, and a set of named children which it routes to. In :pr:`9566`, the
`prop_routing` entry for each child may be a white list or black list of
named keys passed to the meta-estimator.

.. literalinclude:: cases_opt3.py
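
The white list / black list idea can also be sketched in isolation. The
encoding below, a ``(mode, names)`` pair per child, is an assumption made for
illustration and does not reproduce the syntax of :pr:`9566`. ``select_props``
and ``ToyRouter`` are hypothetical names.

```python
def select_props(props, rule):
    """Filter props for one child according to a white list or black list.

    rule is a hypothetical encoding: ("include", names) keeps only the
    named keys; ("exclude", names) keeps everything except them.
    """
    mode, names = rule
    names = set(names)
    if mode == "include":
        return {k: v for k, v in props.items() if k in names}
    if mode == "exclude":
        return {k: v for k, v in props.items() if k not in names}
    raise ValueError(f"unknown routing mode: {mode!r}")


class ToyRouter:
    """Toy meta-estimator holding per-child routing rules."""

    def __init__(self, prop_routing):
        # Stored as a constructor parameter so that cloning would carry it along.
        self.prop_routing = prop_routing

    def route(self, props):
        """Return the props each named child should receive."""
        return {
            child: select_props(props, rule)
            for child, rule in self.prop_routing.items()
        }


router = ToyRouter(
    {
        "estimator": ("include", ["sample_weight"]),
        "cv": ("include", ["groups"]),
        "scoring": ("exclude", ["groups"]),
    }
)
routed = router.route({"sample_weight": [1.0, 2.0], "groups": [0, 1]})
```

Only the router inspects ``prop_routing``. The children receive ordinary
keyword props and need no changes, which is the benefit noted at the start of
this section.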