schema

ModelSchema

Bases: BaseSchema

Schema for a machine learning model.

Schema

Operations for working with machine learning model schemas.

from_df `classmethod`

from_df(problem_type: Literal['BINARY_CLASSIFICATION', 'REGRESSION'], df: pd.DataFrame, target_column_name: Optional[str] = ..., timestamp_column_name: Optional[str] = ..., prediction_column_name: Optional[str] = ..., prediction_score_column_name_or_mapping: Optional[str] = ..., identifier_column_name: Optional[str] = ..., feature_columns: Dict[str, FeatureType] = ..., ignore_column_names: Union[str, Collection[str]] = ..., segment_column_names: Union[str, Collection[str]] = ...) -> ModelSchema

from_df(problem_type: Literal['MULTICLASS_CLASSIFICATION'], df: pd.DataFrame, target_column_name: Optional[str] = ..., timestamp_column_name: Optional[str] = ..., prediction_column_name: Optional[str] = ..., prediction_score_column_name_or_mapping: Dict[str, str] = ..., identifier_column_name: Optional[str] = ..., feature_columns: Dict[str, FeatureType] = ..., ignore_column_names: Union[str, Collection[str]] = ..., segment_column_names: Union[str, Collection[str]] = ...) -> ModelSchema

from_df(problem_type: ProblemType, df: pd.DataFrame, target_column_name: Optional[str] = None, timestamp_column_name: Optional[str] = None, prediction_column_name: Optional[str] = None, prediction_score_column_name_or_mapping: Optional[Union[str, Dict[str, str]]] = None, identifier_column_name: Optional[str] = None, feature_columns: Dict[str, FeatureType] = {}, ignore_column_names: Union[str, Collection[str]] = (), segment_column_names: Union[str, Collection[str]] = ()) -> ModelSchema

Create a schema from a pandas dataframe.

Sends a sample of the dataframe to the NannyML Cloud API to inspect the schema. Heuristics are used to identify what each column represents. The schema is then modified according to the provided arguments.

Parameters:

Name	Type	Description	Default
`problem_type`	`ProblemType`	The problem type of the model.	required
`df`	`DataFrame`	The pandas dataframe to create a schema from.	required
`target_column_name`	`Optional[str]`	The name of the target column. Any other column that the heuristics identified as the target is changed to a feature column.	`None`
`timestamp_column_name`	`Optional[str]`	The name of the timestamp column. Any other column that the heuristics identified as the timestamp is changed to a feature column.	`None`
`prediction_column_name`	`Optional[str]`	The name of the prediction column. Any other column that the heuristics identified as the prediction is changed to a feature column.	`None`
`prediction_score_column_name_or_mapping`	`Optional[Union[str, Dict[str, str]]]`	This parameter accepts two formats depending on problem type. For binary classification and regression, this should be the name of the prediction score column. For multiclass classification, it should be a dict mapping prediction score column names to class names, e.g. `{'prediction_score_1': 'class_1', 'prediction_score_2': 'class_2'}`.	`None`
`identifier_column_name`	`Optional[str]`	The name of the identifier column. Any other column that the heuristics identified as the identifier is changed to a feature column.	`None`
`feature_columns`	`Dict[str, FeatureType]`	A dictionary specifying whether features are `CATEGORICAL` or `CONTINUOUS`. Feature columns that are not specified will retain their original type.	`{}`
`ignore_column_names`	`Union[str, Collection[str]]`	The names of columns to ignore.	`()`
`segment_column_names`	`Union[str, Collection[str]]`	The names of columns to mark as segment sources. Their values will be used to segment the data. Each column keeps its original type.	`()`

Returns:

Type	Description
`ModelSchema`	The inspected schema with any modifications applied.
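
The overloads above tie the type of `prediction_score_column_name_or_mapping` to the problem type: a single column name for `BINARY_CLASSIFICATION` and `REGRESSION`, a dictionary for `MULTICLASS_CLASSIFICATION`. A minimal sketch of a local pre-flight check that mirrors this constraint before calling `from_df` follows. `check_prediction_score_arg` is a hypothetical helper, not part of the SDK; it inspects only the argument's type and makes no claim about the validation the SDK itself performs.

```python
from typing import Dict, Optional, Union

# Problem types named in the from_df overloads.
SINGLE_SCORE_PROBLEMS = {"BINARY_CLASSIFICATION", "REGRESSION"}
MULTI_SCORE_PROBLEMS = {"MULTICLASS_CLASSIFICATION"}


def check_prediction_score_arg(
    problem_type: str,
    value: Optional[Union[str, Dict[str, str]]],
) -> None:
    """Raise TypeError if `value` does not fit the overload for `problem_type`.

    Hypothetical helper: it mirrors the documented signatures only.
    None is accepted everywhere because the argument has a default.
    """
    if value is None:
        return
    if problem_type in SINGLE_SCORE_PROBLEMS and not isinstance(value, str):
        raise TypeError(f"{problem_type} expects a single column name (str)")
    if problem_type in MULTI_SCORE_PROBLEMS and not isinstance(value, dict):
        raise TypeError(f"{problem_type} expects a dict of str to str")
    # Other problem types are not covered by the overloads shown,
    # so this sketch leaves them unchecked.
```

Running such a check first surfaces a mismatched argument without sending any data to the API.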

set_feature `classmethod`

set_feature(schema: ModelSchema, column_name: str, feature_type: FeatureType) -> ModelSchema

Set a feature column in a schema.

Parameters:

Name	Type	Description	Default
`schema`	`ModelSchema`	The schema to modify.	required
`column_name`	`str`	The name of the feature column.	required
`feature_type`	`FeatureType`	Whether the feature is `CATEGORICAL` or `CONTINUOUS`.	required

Returns:

Type	Description
`ModelSchema`	The modified schema.

set_identifier `classmethod`

set_identifier(schema: ModelSchema, column_name: str) -> ModelSchema

Set the identifier column in a schema.

Parameters:

Name	Type	Description	Default
`schema`	`ModelSchema`	The schema to modify.	required
`column_name`	`str`	The name of the identifier column. Any column that was previously set as identifier will be changed to a feature column.	required

Returns:

Type	Description
`ModelSchema`	The modified schema.
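
The "previous column becomes a feature" rule, which also applies to the prediction, target, and timestamp setters, can be illustrated with a toy model. The sketch below assumes a schema is just a mapping from column name to role; that representation, the role strings, and `set_exclusive_role` are all hypothetical and do not reflect the real `ModelSchema` structure or the SDK's implementation. Returning a copy rather than mutating the input is a choice of the sketch.

```python
from typing import Dict

# Toy stand-in for a schema: column name -> role (hypothetical layout).
ToySchema = Dict[str, str]


def set_exclusive_role(schema: ToySchema, column_name: str, role: str) -> ToySchema:
    """Give `role` to `column_name`; any previous holder becomes a FEATURE."""
    if column_name not in schema:
        raise KeyError(f"unknown column: {column_name}")
    # Demote whichever column currently holds the role.
    updated = {
        name: ("FEATURE" if current == role else current)
        for name, current in schema.items()
    }
    # Assign the role to the requested column.
    updated[column_name] = role
    return updated


schema = {"id": "IDENTIFIER", "customer_id": "FEATURE", "age": "FEATURE"}
updated = set_exclusive_role(schema, "customer_id", "IDENTIFIER")
print(updated)
# {'id': 'FEATURE', 'customer_id': 'IDENTIFIER', 'age': 'FEATURE'}
```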

set_ignored `classmethod`

set_ignored(schema: ModelSchema, column_names: Union[str, Collection[str]]) -> ModelSchema

Set one or more columns to be ignored.

Parameters:

Name	Type	Description	Default
`schema`	`ModelSchema`	The schema to modify.	required
`column_names`	`Union[str, Collection[str]]`	The name of the column or columns to ignore.	required

Returns:

Type	Description
`ModelSchema`	The modified schema.
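
A parameter typed `Union[str, Collection[str]]` accepts either one name or several. A common way to handle such an argument is to normalize it to a tuple, testing for `str` first because a string is itself iterable. The helper below is a hypothetical illustration of that pattern, not the SDK's code.

```python
from typing import Collection, Tuple, Union


def normalize_column_names(
    column_names: Union[str, Collection[str]],
) -> Tuple[str, ...]:
    """Return column names as a tuple, treating a bare string as one name."""
    if isinstance(column_names, str):
        # Without this branch, tuple("age") would yield ('a', 'g', 'e').
        return (column_names,)
    return tuple(column_names)


print(normalize_column_names("age"))             # ('age',)
print(normalize_column_names(["age", "income"]))  # ('age', 'income')
```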

set_prediction `classmethod`

set_prediction(schema: ModelSchema, column_name: str) -> ModelSchema

Set the prediction column in a schema.

Parameters:

Name	Type	Description	Default
`schema`	`ModelSchema`	The schema to modify.	required
`column_name`	`str`	The name of the prediction column. Any column that was previously set as prediction will be changed to a feature column.	required

Returns:

Type	Description
`ModelSchema`	The modified schema.

set_prediction_score `classmethod`

set_prediction_score(schema: ModelSchema, column_name_or_mapping: Union[str, Dict[str, str]]) -> ModelSchema

Set the prediction score column(s) in a schema.

Binary classification and regression problems require a single prediction score column. Multiclass classification problems require a dictionary mapping class names to prediction score columns, e.g. `{'class_1': 'prediction_score_1', 'class_2': 'prediction_score_2'}`.

Parameters:

Name	Type	Description	Default
`schema`	`ModelSchema`	The schema to modify.	required
`column_name_or_mapping`	`Union[str, Dict[str, str]]`	The name of the prediction score column or a dictionary mapping class names to prediction score column names. Any existing prediction score columns will be changed to feature columns.	required

Returns:

Type	Description
`ModelSchema`	The modified schema.

set_segment `classmethod`

set_segment(schema: ModelSchema, column_name: str) -> ModelSchema

Set the SEGMENT column flag for a column in a schema.

Parameters:

Name	Type	Description	Default
`schema`	`ModelSchema`	The schema to modify.	required
`column_name`	`str`	The name of the column to mark as a segment column. The column will keep its original type.	required

Returns:

Type	Description
`ModelSchema`	The modified schema.
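
Unlike the setters that assign a column type, marking a segment adds a flag and leaves the type alone. A minimal sketch of that distinction follows, using a toy column record; `ToyColumn`, its field names, and the `CATEGORICAL_FEATURE` type string are assumptions made for illustration and are not taken from the real `ModelSchema`.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class ToyColumn:
    """Hypothetical column record: one type plus a set of flags."""
    name: str
    column_type: str
    flags: FrozenSet[str] = frozenset()


def mark_segment(column: ToyColumn) -> ToyColumn:
    """Return a copy flagged as SEGMENT; column_type is left unchanged."""
    return replace(column, flags=column.flags | {"SEGMENT"})


country = ToyColumn(name="country", column_type="CATEGORICAL_FEATURE")
segmented = mark_segment(country)
print(segmented.column_type)          # CATEGORICAL_FEATURE
print("SEGMENT" in segmented.flags)   # True
```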

set_target `classmethod`

set_target(schema: ModelSchema, column_name: str) -> ModelSchema

Set the target column in a schema.

Parameters:

Name	Type	Description	Default
`schema`	`ModelSchema`	The schema to modify.	required
`column_name`	`str`	The name of the target column. Any column that was previously set as target will be changed to a feature column.	required

Returns:

Type	Description
`ModelSchema`	The modified schema.

set_timestamp `classmethod`

set_timestamp(schema: ModelSchema, column_name: str) -> ModelSchema

Set the timestamp column in a schema.

Note

The timestamp column will be coerced to a datetime data type.

Parameters:

Name	Type	Description	Default
`schema`	`ModelSchema`	The schema to modify.	required
`column_name`	`str`	The name of the timestamp column. Any column that was previously set as timestamp will be changed to a feature column.	required

Returns:

Type	Description
`ModelSchema`	The modified schema.
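
Because the timestamp column is coerced to a datetime type, it can help to confirm locally that its values parse before marking it. The sketch below assumes the values are ISO 8601 strings and uses only the standard library; how the service performs the coercion and which formats it accepts are not specified here, and `find_unparseable_timestamps` is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, List


def find_unparseable_timestamps(values: Iterable[str]) -> List[str]:
    """Return the values that datetime.fromisoformat cannot parse."""
    bad = []
    for value in values:
        try:
            datetime.fromisoformat(value)
        except ValueError:
            bad.append(value)
    return bad


sample = ["2024-01-31 12:00:00", "2024-02-01T08:30:00", "not a date"]
print(find_unparseable_timestamps(sample))  # ['not a date']
```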
