schema

ModelSchema

Bases: BaseSchema

Schema for a machine learning model.

Schema

Operations for working with machine learning model schemas.

from_df classmethod

from_df(
    problem_type: ProblemType,
    df: pd.DataFrame,
    target_column_name: Optional[str] = None,
    timestamp_column_name: Optional[str] = None,
    prediction_column_name: Optional[str] = None,
    prediction_score_column_name_or_mapping: Optional[
        Union[str, Dict[str, str]]
    ] = None,
    identifier_column_name: Optional[str] = None,
    feature_columns: Dict[str, FeatureType] = {},
    ignore_column_names: Union[str, Collection[str]] = (),
) -> ModelSchema

Create a schema from a pandas dataframe.

Sends a sample of the dataframe to the NannyML Cloud API to inspect the schema. Heuristics are used to identify what each column represents. The schema is then modified according to the provided arguments.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| problem_type | ProblemType | The problem type of the model. | required |
| df | DataFrame | The pandas dataframe to create a schema from. | required |
| target_column_name | Optional[str] | The name of the target column. Any column that heuristics identified as target will be changed to a feature column. | None |
| timestamp_column_name | Optional[str] | The name of the timestamp column. Any column that heuristics identified as timestamp will be changed to a feature column. | None |
| prediction_column_name | Optional[str] | The name of the prediction column. Any column that heuristics identified as prediction will be changed to a feature column. | None |
| prediction_score_column_name_or_mapping | Optional[Union[str, Dict[str, str]]] | This parameter accepts two formats depending on problem type. For binary classification and regression, this should be the name of the prediction score column. For multiclass classification, it should be a dict mapping prediction score column names to class names, e.g. {'prediction_score_1': 'class_1', 'prediction_score_2': 'class_2'}. | None |
| identifier_column_name | Optional[str] | The name of the identifier column. Any column that heuristics identified as identifier will be changed to a feature column. | None |
| feature_columns | Dict[str, FeatureType] | A dictionary specifying whether features are CATEGORICAL or CONTINUOUS. Feature columns that are not specified will retain their original type. | {} |
| ignore_column_names | Union[str, Collection[str]] | The names of columns to ignore. | () |

Returns:

| Type | Description |
|------|-------------|
| ModelSchema | The inspected schema with any modifications applied. |
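
The override behaviour described for from_df can be illustrated with a small local model. This is a minimal sketch, not the SDK's implementation: the column dictionaries, the "role" and "feature_type" keys, and the apply_overrides helper are all hypothetical stand-ins, and the "inspected" list is hard-coded in place of the result the NannyML Cloud API would return.

```python
from copy import deepcopy
from typing import Dict, List, Optional

# Hypothetical stand-in for an inspected schema: one dict per column.
Column = Dict[str, str]


def apply_overrides(
    columns: List[Column],
    target_column_name: Optional[str] = None,
    feature_columns: Optional[Dict[str, str]] = None,
) -> List[Column]:
    """Return a copy of `columns` with from_df-style overrides applied."""
    result = deepcopy(columns)
    for col in result:
        name = col["name"]
        if target_column_name is not None:
            if name == target_column_name:
                col["role"] = "TARGET"
            elif col["role"] == "TARGET":
                # A column the heuristics picked as target becomes a feature.
                col["role"] = "FEATURE"
        if feature_columns and name in feature_columns:
            # Only listed features change type; the rest keep theirs.
            col["feature_type"] = feature_columns[name]
    return result


# Assumed heuristic result (hard-coded for illustration).
inspected = [
    {"name": "age", "role": "FEATURE", "feature_type": "CONTINUOUS"},
    {"name": "zip_code", "role": "FEATURE", "feature_type": "CONTINUOUS"},
    {"name": "label_guess", "role": "TARGET", "feature_type": "CATEGORICAL"},
    {"name": "churned", "role": "FEATURE", "feature_type": "CATEGORICAL"},
]

schema = apply_overrides(
    inspected,
    target_column_name="churned",
    feature_columns={"zip_code": "CATEGORICAL"},
)
roles = {col["name"]: col["role"] for col in schema}
types = {col["name"]: col["feature_type"] for col in schema}
```

Here "churned" ends up as the target, "label_guess" is demoted to a feature, and only "zip_code" changes its feature type. The input list is left untouched because the helper works on a copy.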

set_feature classmethod

set_feature(
    schema: ModelSchema, column_name: str, feature_type: FeatureType
) -> ModelSchema

Set a feature column in a schema.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| schema | ModelSchema | The schema to modify. | required |
| column_name | str | The name of the feature column. | required |
| feature_type | FeatureType | Whether the feature is CATEGORICAL or CONTINUOUS. | required |

Returns:

| Type | Description |
|------|-------------|
| ModelSchema | The modified schema. |

set_identifier classmethod

set_identifier(schema: ModelSchema, column_name: str) -> ModelSchema

Set the identifier column in a schema.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| schema | ModelSchema | The schema to modify. | required |
| column_name | str | The name of the identifier column. Any column that was previously set as identifier will be changed to a feature column. | required |

Returns:

| Type | Description |
|------|-------------|
| ModelSchema | The modified schema. |

set_ignored classmethod

set_ignored(
    schema: ModelSchema, column_names: Union[str, Collection[str]]
) -> ModelSchema

Set one or more columns to be ignored.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| schema | ModelSchema | The schema to modify. | required |
| column_names | Union[str, Collection[str]] | The name of the column or columns to ignore. | required |

Returns:

| Type | Description |
|------|-------------|
| ModelSchema | The modified schema. |
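
Because column_names accepts either a single string or a collection of strings, code that handles it has to check for a bare string first: a string is itself iterable, so passing it to list() would split it into characters. The helper below is a hypothetical sketch of that normalization and is not taken from the SDK.

```python
from typing import Collection, List, Union


def normalize_column_names(column_names: Union[str, Collection[str]]) -> List[str]:
    """Return column names as a list, treating a bare string as one name."""
    if isinstance(column_names, str):
        # Without this check, list("notes") would yield single characters.
        return [column_names]
    return list(column_names)


single = normalize_column_names("customer_notes")
several = normalize_column_names(("customer_notes", "debug_flag"))
```

Both call styles produce a list of whole column names, which is the shape a caller would typically want before looping over the columns to ignore.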

set_prediction classmethod

set_prediction(schema: ModelSchema, column_name: str) -> ModelSchema

Set the prediction column in a schema.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| schema | ModelSchema | The schema to modify. | required |
| column_name | str | The name of the prediction column. Any column that was previously set as prediction will be changed to a feature column. | required |

Returns:

| Type | Description |
|------|-------------|
| ModelSchema | The modified schema. |

set_prediction_score classmethod

set_prediction_score(
    schema: ModelSchema, column_name_or_mapping: Union[str, Dict[str, str]]
) -> ModelSchema

Set the prediction score column(s) in a schema.

Binary classification and regression problems require a single prediction score column. Multiclass classification problems require a dictionary mapping class names to prediction score columns, e.g. {'class_1': 'prediction_score_1', 'class_2': 'prediction_score_2'}.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| schema | ModelSchema | The schema to modify. | required |
| column_name_or_mapping | Union[str, Dict[str, str]] | The name of the prediction score column or a dictionary mapping class names to prediction score column names. Any existing prediction score columns will be changed to feature columns. | required |

Returns:

| Type | Description |
|------|-------------|
| ModelSchema | The modified schema. |
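
Since the accepted form of column_name_or_mapping depends on the problem type, a local pre-check can catch a mismatch before the schema is modified. The sketch below is an assumption-laden illustration: the prediction_score_form helper is hypothetical, the problem type strings are assumed spellings, and nothing here reflects how the SDK itself validates or reports errors.

```python
from typing import Dict, Union

# Assumed spelling of the multiclass problem type value.
MULTICLASS = "MULTICLASS_CLASSIFICATION"


def prediction_score_form(
    problem_type: str, value: Union[str, Dict[str, str]]
) -> str:
    """Return 'single' or 'mapping', raising if the form does not fit."""
    if problem_type == MULTICLASS:
        if not isinstance(value, dict):
            raise TypeError("multiclass classification expects a dict")
        return "mapping"
    if not isinstance(value, str):
        raise TypeError("this problem type expects a single column name")
    return "single"


binary_form = prediction_score_form("BINARY_CLASSIFICATION", "prediction_score")
multiclass_form = prediction_score_form(
    MULTICLASS,
    {"class_1": "prediction_score_1", "class_2": "prediction_score_2"},
)
```

The check only looks at the type of the argument. It does not interpret the dictionary's keys or values, so it makes no claim about which side holds class names and which holds column names.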

set_target classmethod

set_target(schema: ModelSchema, column_name: str) -> ModelSchema

Set the target column in a schema.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| schema | ModelSchema | The schema to modify. | required |
| column_name | str | The name of the target column. Any column that was previously set as target will be changed to a feature column. | required |

Returns:

| Type | Description |
|------|-------------|
| ModelSchema | The modified schema. |

set_timestamp classmethod

set_timestamp(schema: ModelSchema, column_name: str) -> ModelSchema

Set the timestamp column in a schema.

Note

The timestamp column will be coerced to a datetime data type.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| schema | ModelSchema | The schema to modify. | required |
| column_name | str | The name of the timestamp column. Any column that was previously set as timestamp will be changed to a feature column. | required |

Returns:

| Type | Description |
|------|-------------|
| ModelSchema | The modified schema. |
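
The note on set_timestamp says the column is coerced to a datetime data type. How that coercion is performed is not described here, so the following is only a stand-in using the standard library's datetime.fromisoformat: it shows what coercion means for string values and how a value that cannot be parsed would surface. The coerce_to_datetime helper and the sample values are hypothetical.

```python
from datetime import datetime
from typing import List


def coerce_to_datetime(values: List[str]) -> List[datetime]:
    """Parse ISO 8601 strings into datetime objects (raises ValueError on bad input)."""
    return [datetime.fromisoformat(value) for value in values]


raw = ["2024-01-31 08:15:00", "2024-02-01 09:30:00"]
parsed = coerce_to_datetime(raw)

# A value that is not a valid timestamp cannot be coerced.
try:
    coerce_to_datetime(["not a timestamp"])
    bad_value_rejected = False
except ValueError:
    bad_value_rejected = True
```

Running a check like this on a candidate column before choosing it as the timestamp can reveal values that would not survive coercion.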