schema
ModelSchema
Bases: BaseSchema
Schema for a machine learning model.
Schema
Operations for working with machine learning model schemas.
from_df
classmethod
from_df(problem_type: Literal['BINARY_CLASSIFICATION', 'REGRESSION'], df: pd.DataFrame, target_column_name: Optional[str] = ..., timestamp_column_name: Optional[str] = ..., prediction_column_name: Optional[str] = ..., prediction_score_column_name_or_mapping: Optional[str] = ..., identifier_column_name: Optional[str] = ..., feature_columns: Dict[str, FeatureType] = ..., ignore_column_names: Union[str, Collection[str]] = ..., segment_column_names: Union[str, Collection[str]] = ...) -> ModelSchema
from_df(problem_type: Literal['MULTICLASS_CLASSIFICATION'], df: pd.DataFrame, target_column_name: Optional[str] = ..., timestamp_column_name: Optional[str] = ..., prediction_column_name: Optional[str] = ..., prediction_score_column_name_or_mapping: Dict[str, str] = ..., identifier_column_name: Optional[str] = ..., feature_columns: Dict[str, FeatureType] = ..., ignore_column_names: Union[str, Collection[str]] = ..., segment_column_names: Union[str, Collection[str]] = ...) -> ModelSchema
from_df(problem_type: ProblemType, df: pd.DataFrame, target_column_name: Optional[str] = None, timestamp_column_name: Optional[str] = None, prediction_column_name: Optional[str] = None, prediction_score_column_name_or_mapping: Optional[Union[str, Dict[str, str]]] = None, identifier_column_name: Optional[str] = None, feature_columns: Dict[str, FeatureType] = {}, ignore_column_names: Union[str, Collection[str]] = (), segment_column_names: Union[str, Collection[str]] = ()) -> ModelSchema
Create a schema from a pandas dataframe.
Sends a sample of the dataframe to the NannyML Cloud API to inspect the schema. Heuristics are used to identify what each column represents. The schema is then modified according to the provided arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
problem_type | ProblemType | The problem type of the model. | required |
df | DataFrame | The pandas dataframe to create a schema from. | required |
target_column_name | Optional[str] | The name of the target column. Any other column that the heuristics identified as the target is changed to a feature column. | None |
timestamp_column_name | Optional[str] | The name of the timestamp column. Any other column that the heuristics identified as the timestamp is changed to a feature column. | None |
prediction_column_name | Optional[str] | The name of the prediction column. Any other column that the heuristics identified as the prediction is changed to a feature column. | None |
prediction_score_column_name_or_mapping | Optional[Union[str, Dict[str, str]]] | Accepts two formats depending on the problem type: a single prediction score column name for binary classification and regression, or a dictionary mapping class names to prediction score column names for multiclass classification. | None |
identifier_column_name | Optional[str] | The name of the identifier column. Any other column that the heuristics identified as the identifier is changed to a feature column. | None |
feature_columns | Dict[str, FeatureType] | A dictionary mapping feature column names to their FeatureType. | {} |
ignore_column_names | Union[str, Collection[str]] | The names of columns to ignore. | () |
segment_column_names | Union[str, Collection[str]] | The names of columns to mark as segment sources. Their values are used to segment the data. Each column keeps its original type. | () |
Returns:
Type | Description |
---|---|
ModelSchema | The inspected schema with any modifications applied. |
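The overloads above tie the shape of prediction_score_column_name_or_mapping to the problem type. The following is a minimal local pre-check sketch, not part of the SDK: the helper name check_prediction_score_arg is hypothetical, and treating None as acceptable for every problem type is an assumption based on the default in the implementation signature.

```python
from typing import Dict, Optional, Union

ScoreArg = Optional[Union[str, Dict[str, str]]]


def check_prediction_score_arg(problem_type: str, value: ScoreArg) -> ScoreArg:
    """Raise TypeError if `value` has the wrong shape for `problem_type`.

    Hypothetical helper for illustration; it never contacts the API.
    """
    if value is None:
        # Assumption: None means "leave the choice to the heuristics".
        return value
    if problem_type == "MULTICLASS_CLASSIFICATION":
        if not isinstance(value, dict):
            raise TypeError(
                "multiclass problems expect a dict mapping class names "
                "to prediction score column names"
            )
    elif isinstance(value, dict):
        raise TypeError(
            f"{problem_type} expects a single column name, not a mapping"
        )
    return value
```

Running such a check before calling from_df surfaces a mismatched argument locally, before any data sample is sent.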
set_feature
classmethod
set_feature(schema: ModelSchema, column_name: str, feature_type: FeatureType) -> ModelSchema
Set a feature column in a schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name | str | The name of the feature column. | required |
feature_type | FeatureType | The feature type to assign to the column. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
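Because every setter takes a schema and returns the modified schema, calls can be chained by reassigning the result. The sketch below illustrates that pattern using only the signatures documented in this section. The Schema class is passed in as a parameter so the flow can be exercised without API access; RecordingSchema is a hypothetical stand-in, and the column names are invented for illustration. Valid FeatureType values are not listed in this section, so the feature type is passed through unchanged.

```python
class RecordingSchema:
    """Hypothetical stand-in for the SDK's Schema class.

    It records calls instead of contacting the API.
    """

    calls = []

    @classmethod
    def from_df(cls, problem_type, df, **kwargs):
        cls.calls.append(("from_df", problem_type, kwargs))
        return {"columns": []}  # placeholder for a real ModelSchema

    @classmethod
    def set_feature(cls, schema, column_name, feature_type):
        cls.calls.append(("set_feature", column_name, feature_type))
        return schema

    @classmethod
    def set_ignored(cls, schema, column_names):
        cls.calls.append(("set_ignored", column_names))
        return schema


def configure_schema(Schema, df, feature_type):
    """Create a schema from `df`, then refine it with the documented setters."""
    schema = Schema.from_df(
        "BINARY_CLASSIFICATION",
        df,
        target_column_name="repaid",        # hypothetical column name
        timestamp_column_name="timestamp",  # hypothetical column name
    )
    # Always keep the returned schema rather than relying on in-place changes.
    schema = Schema.set_feature(schema, "age", feature_type)
    schema = Schema.set_ignored(schema, ["internal_note"])
    return schema
```

With the real SDK, the Schema class documented here would be passed in place of RecordingSchema.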
set_identifier
classmethod
set_identifier(schema: ModelSchema, column_name: str) -> ModelSchema
Set the identifier column in a schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name | str | The name of the identifier column. Any column that was previously set as identifier will be changed to a feature column. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
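The rule that a previous identifier falls back to a feature column also applies to the target, prediction, and timestamp setters. It can be pictured with a toy model. In this sketch a schema is reduced to a plain dictionary from column name to role; that representation, the role labels, and the function name set_exclusive_role are all assumptions made for illustration and do not reflect the real ModelSchema structure.

```python
from typing import Dict


def set_exclusive_role(
    roles: Dict[str, str], column_name: str, role: str
) -> Dict[str, str]:
    """Return a copy of `roles` in which only `column_name` holds `role`.

    Any other column that held `role` is demoted to "FEATURE".
    """
    if column_name not in roles:
        raise KeyError(f"unknown column: {column_name}")
    updated = {
        name: ("FEATURE" if current == role else current)
        for name, current in roles.items()
    }
    updated[column_name] = role
    return updated


# Hypothetical columns: row_id starts out as the identifier.
roles = {"row_id": "IDENTIFIER", "customer_id": "FEATURE", "age": "FEATURE"}
updated_roles = set_exclusive_role(roles, "customer_id", "IDENTIFIER")
```

After the call, customer_id is the only identifier and row_id has become a feature, while the input dictionary is left unchanged.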
set_ignored
classmethod
set_ignored(schema: ModelSchema, column_names: Union[str, Collection[str]]) -> ModelSchema
Set one or more columns to be ignored.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_names | Union[str, Collection[str]] | The name of the column or columns to ignore. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
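A parameter typed Union[str, Collection[str]] accepts either one name or several. How the SDK handles this internally is not shown here; the sketch below only illustrates the usual way such an argument is normalized, with a hypothetical helper name. The string check must come first, because a str is itself a collection of characters.

```python
from typing import Collection, List, Union


def normalize_column_names(column_names: Union[str, Collection[str]]) -> List[str]:
    """Turn a single name or a collection of names into a list of names."""
    if isinstance(column_names, str):
        # Without this branch, "notes" would become ['n', 'o', 't', 'e', 's'].
        return [column_names]
    return list(column_names)
```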
set_prediction
classmethod
set_prediction(schema: ModelSchema, column_name: str) -> ModelSchema
Set the prediction column in a schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name | str | The name of the prediction column. Any column that was previously set as prediction will be changed to a feature column. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
set_prediction_score
classmethod
set_prediction_score(schema: ModelSchema, column_name_or_mapping: Union[str, Dict[str, str]]) -> ModelSchema
Set the prediction score column(s) in a schema.
Binary classification and regression problems require a single prediction score column.
Multiclass classification problems require a dictionary mapping class names to prediction score columns, e.g. `{'class_1': 'prediction_score_1', 'class_2': 'prediction_score_2'}`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name_or_mapping | Union[str, Dict[str, str]] | The name of the prediction score column or a dictionary mapping class names to prediction score column names. Any existing prediction score columns will be changed to feature columns. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
set_segment
classmethod
set_segment(schema: ModelSchema, column_name: str) -> ModelSchema
Sets the SEGMENT column flag for a column in a schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name | str | The name of the column to mark as a segment column. The column will keep its original type. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
set_target
classmethod
set_target(schema: ModelSchema, column_name: str) -> ModelSchema
Set the target column in a schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name | str | The name of the target column. Any column that was previously set as target will be changed to a feature column. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
set_timestamp
classmethod
set_timestamp(schema: ModelSchema, column_name: str) -> ModelSchema
Set the timestamp column in a schema.
Note
The timestamp column will be coerced to a datetime data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | ModelSchema | The schema to modify. | required |
column_name | str | The name of the timestamp column. Any column that was previously set as timestamp will be changed to a feature column. | required |
Returns:
Type | Description |
---|---|
ModelSchema | The modified schema. |
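Since the timestamp column is coerced to a datetime data type, it can help to confirm locally that its values parse as datetimes before marking the column. The coercion rules used by the service are not described here, so the sketch below assumes ISO 8601 strings and relies only on the standard library; the helper name unparseable_timestamps is hypothetical.

```python
from datetime import datetime
from typing import Iterable, List


def unparseable_timestamps(values: Iterable[str]) -> List[str]:
    """Return the values that `datetime.fromisoformat` cannot parse."""
    bad = []
    for value in values:
        try:
            datetime.fromisoformat(value)
        except ValueError:
            bad.append(value)
    return bad
```

An empty result suggests the column is a reasonable timestamp candidate under the ISO 8601 assumption; any returned values are worth inspecting before calling set_timestamp.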