schema
ModelSchema
Bases: BaseSchema
Schema for a machine learning model.
Schema
Operations for working with machine learning model schemas.
from_df
classmethod
from_df(problem_type: ProblemType, df: pd.DataFrame, target_column_name: Optional[str] = None, prediction_score_column_name_or_mapping: Optional[Union[str, Dict[str, str]]] = None, identifier_column_name: Optional[str] = None, ignore_column_names: Union[str, Collection[str]] = ()) -> ModelSchema
Create a schema from a pandas dataframe.
Sends a sample of the dataframe to the NannyML Cloud API to inspect the schema. Heuristics are used to identify what each column represents. The schema is then modified according to the provided arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
problem_type | ProblemType | The problem type of the model. | required |
df | DataFrame | The pandas dataframe to create a schema from. | required |
target_column_name | Optional[str] | The name of the target column. Any other column that the heuristics identified as the target will be changed to a feature column. | None |
prediction_score_column_name_or_mapping | Optional[Union[str, Dict[str, str]]] | The prediction score column(s), in one of two formats depending on problem type: a single column name for binary classification and regression problems, or a dictionary mapping class names to prediction score column names for multiclass classification problems. | None |
identifier_column_name | Optional[str] | The name of the identifier column. Any other column that the heuristics identified as the identifier will be changed to a feature column. | None |
ignore_column_names | Union[str, Collection[str]] | The name of the column or columns to ignore. | () |
Returns:
Type | Description |
---|---|
ModelSchema | The inspected schema with any modifications applied. |
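The "inspect, then override" behaviour described above can be illustrated locally, without calling the API. This is a minimal sketch and not the SDK's implementation: it assumes a simplified schema held as a plain dict mapping column names to role strings, and the helper `apply_overrides` and the role names (`target`, `identifier`, `feature`) are hypothetical. Ignored columns and prediction scores are left out for brevity.

```python
from typing import Dict, Optional


def apply_overrides(
    inspected: Dict[str, str],
    target_column_name: Optional[str] = None,
    identifier_column_name: Optional[str] = None,
) -> Dict[str, str]:
    """Apply explicit arguments on top of heuristically assigned roles.

    `inspected` is a hypothetical, simplified schema: column name -> role.
    """
    roles = dict(inspected)  # work on a copy so the input stays untouched

    def assign_unique(role: str, column: str) -> None:
        # Demote whichever column currently holds the role to a feature,
        # then give the role to the requested column.
        for name, current in roles.items():
            if current == role:
                roles[name] = "feature"
        roles[column] = role

    if target_column_name is not None:
        assign_unique("target", target_column_name)
    if identifier_column_name is not None:
        assign_unique("identifier", identifier_column_name)
    return roles


# Roles as a heuristic might have guessed them (made-up example data).
guessed = {"id": "identifier", "label": "target", "outcome": "feature"}
fixed = apply_overrides(guessed, target_column_name="outcome")
print(fixed)
```

In this sketch `label` ends up as a feature once `outcome` is named as the target, which mirrors the demotion rule stated in the parameter descriptions.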
set_identifier
classmethod
set_identifier(schema: ModelSchema, column_name: str) -> ModelSchema
Set the identifier column in a schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name | str | The name of the identifier column. Any column that was previously set as identifier will be changed to a feature column. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
set_ignored
classmethod
set_ignored(schema: ModelSchema, column_names: Union[str, Collection[str]]) -> ModelSchema
Set one or more columns to be ignored.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_names | Union[str, Collection[str]] | The name of the column or columns to ignore. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
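A parameter typed `Union[str, Collection[str]]` has one subtlety worth seeing in code: a string is itself iterable, so a single name has to be wrapped before looping. The sketch below shows one way to handle that, under the assumption of a simplified schema stored as a dict of column name to role. The helpers `normalize_column_names` and `mark_ignored` and the role string `ignored` are hypothetical and are not part of the SDK.

```python
from typing import Collection, Dict, List, Union


def normalize_column_names(column_names: Union[str, Collection[str]]) -> List[str]:
    """Return a list of names whether one name or several were passed."""
    if isinstance(column_names, str):
        # Without this check, iterating "debug" would yield single characters.
        return [column_names]
    return list(column_names)


def mark_ignored(
    roles: Dict[str, str],
    column_names: Union[str, Collection[str]],
) -> Dict[str, str]:
    """Return a copy of the simplified schema with the given columns ignored."""
    updated = dict(roles)
    for name in normalize_column_names(column_names):
        updated[name] = "ignored"
    return updated


roles = {"debug": "feature", "row_hash": "feature", "age": "feature"}
print(mark_ignored(roles, "debug"))
print(mark_ignored(roles, ["debug", "row_hash"]))
```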
set_prediction_score
classmethod
set_prediction_score(schema: ModelSchema, column_name_or_mapping: Union[str, Dict[str, str]]) -> ModelSchema
Set the prediction score column(s) in a schema.
Binary classification and regression problems require a single prediction score column.
Multiclass classification problems require a dictionary mapping class names to prediction score columns, e.g. {'class_1': 'prediction_score_1', 'class_2': 'prediction_score_2'}.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name_or_mapping | Union[str, Dict[str, str]] | The name of the prediction score column or a dictionary mapping class names to prediction score column names. Any existing prediction score columns will be changed to feature columns. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
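The two accepted argument shapes can be made concrete with a small validation sketch. It is illustrative only: the function `prediction_score_columns` is hypothetical, and the problem type strings (`BINARY_CLASSIFICATION`, `MULTICLASS_CLASSIFICATION`, `REGRESSION`) are assumed values that should be checked against the SDK's `ProblemType` definition.

```python
from typing import Dict, Optional, Union


def prediction_score_columns(
    problem_type: str,
    column_name_or_mapping: Union[str, Dict[str, str]],
) -> Dict[str, Optional[str]]:
    """Check the argument shape and return column name -> class name.

    The class name is None when the problem type has a single score column.
    """
    if problem_type == "MULTICLASS_CLASSIFICATION":  # assumed problem type string
        if not isinstance(column_name_or_mapping, dict):
            raise TypeError("multiclass problems need a dict of class name -> column name")
        # Invert the mapping so each score column knows which class it scores.
        return {column: cls for cls, column in column_name_or_mapping.items()}
    if not isinstance(column_name_or_mapping, str):
        raise TypeError("binary classification and regression need a single column name")
    return {column_name_or_mapping: None}


binary = prediction_score_columns("BINARY_CLASSIFICATION", "y_pred_proba")
multiclass = prediction_score_columns(
    "MULTICLASS_CLASSIFICATION",
    {"class_1": "prediction_score_1", "class_2": "prediction_score_2"},
)
print(binary)
print(multiclass)
```

Rejecting the wrong shape early is a design choice of this sketch. How the SDK itself reacts to a mismatched argument is not stated in this reference.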
set_target
classmethod
set_target(schema: ModelSchema, column_name: str) -> ModelSchema
Set the target column in a schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name | str | The name of the target column. Any column that was previously set as target will be changed to a feature column. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |