Commit f93f99d

Author: github-actions
Message: sync model docs: 2024-06-21T01:13:55Z
1 parent f152415 commit f93f99d

4 files changed

Lines changed: 24 additions & 20 deletions

module/model/user/generated/10_model_schema.md

Lines changed: 4 additions & 2 deletions
@@ -15,7 +15,8 @@ Detail specification is defined by using `InferenceSchema` class, following are
 |-------|------|-------------|-----------|
 | `feature_types` | Dict[str, ValueType] | Mapping from feature name to the type of the feature | True |
 | `model_prediction_output` | PredictionOutput | Prediction specification, which differs between model types, e.g. BinaryClassificationOutput, RegressionOutput, RankingOutput | True |
-| `prediction_id_column` | str | The name of the column that contains the prediction id value | True |
+| `session_id_column` | str | The name of the column that holds the unique identifier of a request | True |
+| `row_id_column` | str | The name of the column that holds the unique identifier of a row within a request | True |
 | `tag_columns` | Optional[List[str]] | List of column names that contain additional information about the prediction; you can treat them as metadata | False |

 The `model_prediction_output` field above has type `PredictionOutput`. It specifies the prediction that the model generates, which depends on its model type. The schema currently supports 3 model types:
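The table rows changed in this hunk describe `session_id_column` as identifying a request and `row_id_column` as identifying a row within that request, which suggests that the pair is what identifies a single prediction. A minimal sketch of that idea follows. It assumes rows are plain dictionaries, and `find_duplicate_keys` is a hypothetical helper written for illustration; it is not part of the Merlin SDK.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def find_duplicate_keys(
    rows: List[Dict[str, Any]],
    session_id_column: str = "session_id",
    row_id_column: str = "row_id",
) -> List[Tuple[Any, Any]]:
    """Return every (session id, row id) pair that occurs more than once.

    Hypothetical helper: it only illustrates treating the two columns as a
    composite key. The real observability system may validate differently.
    """
    keys = [(row[session_id_column], row[row_id_column]) for row in rows]
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


# Illustrative data: the same row id may appear in different sessions.
rows = [
    {"session_id": "s1", "row_id": "r1", "score": 0.5},
    {"session_id": "s1", "row_id": "r2", "score": 0.9},
    {"session_id": "s2", "row_id": "r1", "score": 0.1},
]

duplicates = find_duplicate_keys(rows)
```

Under this reading, a row id only needs to be unique within its own request, because the session id disambiguates rows across requests.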
@@ -73,7 +74,8 @@ from merlin.observability.inference import InferenceSchema, ValueType, BinaryCla
         "featureC": ValueType.STRING,
         "featureD": ValueType.BOOLEAN
     },
-    prediction_id_column="prediction_id",
+    session_id_column="session_id",
+    row_id_column="row_id",
     model_prediction_output=BinaryClassificationOutput(
         prediction_score_column="score",
         actual_label_column="target",

module/model/user/generated/11_model_observability.md

Lines changed: 8 additions & 8 deletions
@@ -33,17 +33,17 @@ Beside changes in signature, you can see some of those methods returning new typ

 | Field | Type | Description|
 |-------|------|------------|
-| `prediction_ids` | List[str] | Unique identifier for each row in prediction |
-| `features` | Union[Values, pandas.DataFrame] | Feature values used by the model to generate predictions. The length of `features` should be the same as that of `prediction_ids` |
+| `row_ids` | List[str] | Unique identifier for each row in prediction |
+| `features` | Union[Values, pandas.DataFrame] | Feature values used by the model to generate predictions. The length of `features` should be the same as that of `row_ids` |
 | `entities` | Optional[Union[Values, pandas.DataFrame]] | Additional data that is not used for prediction but is used to retrieve other features, e.g. `driver_id`, from which the features associated with that `driver_id` can be retrieved |
-| `session_id` | str | Identifier for the request. This value is used together with `prediction_ids` as the prediction identifier in the model observability system |
+| `session_id` | str | Identifier for the request. This value is used together with `row_ids` as the prediction identifier in the model observability system |

 `ModelInput` data is essential for model observability because it contains the feature values and the prediction identifier. Feature values are used to calculate feature drift, and the identifier is used as the join key between features, prediction data, and ground truth data. `ModelOutput`, on the other hand, is the class that represents the raw model prediction output, not the final output of the PyFunc model. The `ModelOutput` class contains the following fields:

 | Field | Type | Description |
 |-------|------|-------------|
 | `predictions` | Values | Prediction output from ml_predict. It may contain multiple columns, e.g. for multiclass classification, or for binary classification that returns both a prediction score and a label |
-| `prediction_ids` | List[str] | Unique identifier for each row in prediction output |
+| `row_ids` | List[str] | Unique identifier for each row in prediction output |

 Like `ModelInput`, `ModelOutput` is essential for model observability: it can be used to calculate prediction drift and, more importantly, performance metrics.
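The text above says the identifier is the join key between prediction data and ground truth, and that `ModelOutput` makes performance metrics possible. The following is a rough sketch of that join using only the standard library. `join_on_prediction_key` and `accuracy` are hypothetical helpers, the 0.5 threshold and binary labels are assumptions chosen for illustration, and none of this reflects how the observability backend is actually implemented.

```python
from typing import Dict, List, Tuple

# Assumed composite key: (session id, row id).
Key = Tuple[str, str]


def join_on_prediction_key(
    session_id: str,
    row_ids: List[str],
    scores: List[float],
    ground_truth: Dict[Key, int],
) -> List[Tuple[float, int]]:
    """Pair each predicted score with its ground-truth label, if one exists."""
    if len(row_ids) != len(scores):
        raise ValueError("row_ids and scores must have the same length")
    joined = []
    for row_id, score in zip(row_ids, scores):
        key = (session_id, row_id)
        if key in ground_truth:  # rows without ground truth are skipped
            joined.append((score, ground_truth[key]))
    return joined


def accuracy(pairs: List[Tuple[float, int]], threshold: float = 0.5) -> float:
    """Fraction of pairs whose thresholded score matches the label."""
    if not pairs:
        raise ValueError("no joined rows to evaluate")
    correct = sum(1 for score, label in pairs if (score >= threshold) == bool(label))
    return correct / len(pairs)


# Illustrative ground truth; the third entry belongs to another session.
ground_truth = {
    ("session_id", "prediction_1"): 0,
    ("session_id", "prediction_2"): 1,
    ("other_session", "prediction_1"): 1,
}

pairs = join_on_prediction_key(
    "session_id", ["prediction_1", "prediction_2"], [0.5, 0.9], ground_truth
)
metric = accuracy(pairs)
```

Keying on the pair rather than on the row id alone keeps the `other_session` entry from being matched by mistake.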
@@ -61,21 +61,21 @@ There is not much change on the deployment part, users just needs to set `enable
 * featureC that has string type
 * featureD that has boolean type

-The model type is ranking, with the prediction group id located in the `session_id` column, the prediction id in the `prediction_id` column, the rank score in the `score` column and the relevance score in the `relevance_score` column. Below is a snippet of the Python code
+The model type is ranking, with the prediction group id located in the `session_id` column, the row id in the `row_id` column, the rank score in the `score` column and the relevance score in the `relevance_score` column. Below is a snippet of the Python code

 ```python
 class ModelObservabilityModel(PyFuncV3Model):

     def preprocess(self, request: dict, **kwargs) -> ModelInput:
         return ModelInput(
             session_id="session_id",
-            prediction_ids=["prediction_1", "prediction_2"],
+            row_ids=["prediction_1", "prediction_2"],
             features=pd.DataFrame([[0.7, 200, "ID", True], [0.99, 250, "SG", False]], columns=["featureA", "featureB", "featureC", "featureD"]),
         )

     def infer(self, model_input: ModelInput) -> ModelOutput:
         return ModelOutput(
-            prediction_ids=model_input.prediction_ids,
+            row_ids=model_input.row_ids,
             predictions=Values(columns=["score"], data=[[0.5], [0.9]]),
         )
     def postprocess(self, model_output: ModelOutput, request: dict) -> dict:
@@ -90,7 +90,7 @@ model_schema = ModelSchema(spec=InferenceSchema(
         "featureD": ValueType.BOOLEAN
     },
     session_id_column="session_id",
-    row_id_column="prediction_id",
+    row_id_column="row_id",
     model_prediction_output=RankingOutput(
         rank_score_column="score",
         prediction_group_id_column="session_id",

module/model/user/templates/10_model_schema.md

Lines changed: 4 additions & 2 deletions
@@ -15,7 +15,8 @@ Detail specification is defined by using `InferenceSchema` class, following are
 |-------|------|-------------|-----------|
 | `feature_types` | Dict[str, ValueType] | Mapping from feature name to the type of the feature | True |
 | `model_prediction_output` | PredictionOutput | Prediction specification, which differs between model types, e.g. BinaryClassificationOutput, RegressionOutput, RankingOutput | True |
-| `prediction_id_column` | str | The name of the column that contains the prediction id value | True |
+| `session_id_column` | str | The name of the column that holds the unique identifier of a request | True |
+| `row_id_column` | str | The name of the column that holds the unique identifier of a row within a request | True |
 | `tag_columns` | Optional[List[str]] | List of column names that contain additional information about the prediction; you can treat them as metadata | False |

 The `model_prediction_output` field above has type `PredictionOutput`. It specifies the prediction that the model generates, which depends on its model type. The schema currently supports 3 model types:
@@ -73,7 +74,8 @@ from merlin.observability.inference import InferenceSchema, ValueType, BinaryCla
         "featureC": ValueType.STRING,
         "featureD": ValueType.BOOLEAN
     },
-    prediction_id_column="prediction_id",
+    session_id_column="session_id",
+    row_id_column="row_id",
     model_prediction_output=BinaryClassificationOutput(
         prediction_score_column="score",
         actual_label_column="target",

module/model/user/templates/11_model_observability.md

Lines changed: 8 additions & 8 deletions
@@ -33,17 +33,17 @@ Beside changes in signature, you can see some of those methods returning new typ

 | Field | Type | Description|
 |-------|------|------------|
-| `prediction_ids` | List[str] | Unique identifier for each row in prediction |
-| `features` | Union[Values, pandas.DataFrame] | Feature values used by the model to generate predictions. The length of `features` should be the same as that of `prediction_ids` |
+| `row_ids` | List[str] | Unique identifier for each row in prediction |
+| `features` | Union[Values, pandas.DataFrame] | Feature values used by the model to generate predictions. The length of `features` should be the same as that of `row_ids` |
 | `entities` | Optional[Union[Values, pandas.DataFrame]] | Additional data that is not used for prediction but is used to retrieve other features, e.g. `driver_id`, from which the features associated with that `driver_id` can be retrieved |
-| `session_id` | str | Identifier for the request. This value is used together with `prediction_ids` as the prediction identifier in the model observability system |
+| `session_id` | str | Identifier for the request. This value is used together with `row_ids` as the prediction identifier in the model observability system |

 `ModelInput` data is essential for model observability because it contains the feature values and the prediction identifier. Feature values are used to calculate feature drift, and the identifier is used as the join key between features, prediction data, and ground truth data. `ModelOutput`, on the other hand, is the class that represents the raw model prediction output, not the final output of the PyFunc model. The `ModelOutput` class contains the following fields:

 | Field | Type | Description |
 |-------|------|-------------|
 | `predictions` | Values | Prediction output from ml_predict. It may contain multiple columns, e.g. for multiclass classification, or for binary classification that returns both a prediction score and a label |
-| `prediction_ids` | List[str] | Unique identifier for each row in prediction output |
+| `row_ids` | List[str] | Unique identifier for each row in prediction output |

 Like `ModelInput`, `ModelOutput` is essential for model observability: it can be used to calculate prediction drift and, more importantly, performance metrics.
@@ -61,21 +61,21 @@ There is not much change on the deployment part, users just needs to set `enable
 * featureC that has string type
 * featureD that has boolean type

-The model type is ranking, with the prediction group id located in the `session_id` column, the prediction id in the `prediction_id` column, the rank score in the `score` column and the relevance score in the `relevance_score` column. Below is a snippet of the Python code
+The model type is ranking, with the prediction group id located in the `session_id` column, the row id in the `row_id` column, the rank score in the `score` column and the relevance score in the `relevance_score` column. Below is a snippet of the Python code

 ```python
 class ModelObservabilityModel(PyFuncV3Model):

     def preprocess(self, request: dict, **kwargs) -> ModelInput:
         return ModelInput(
             session_id="session_id",
-            prediction_ids=["prediction_1", "prediction_2"],
+            row_ids=["prediction_1", "prediction_2"],
             features=pd.DataFrame([[0.7, 200, "ID", True], [0.99, 250, "SG", False]], columns=["featureA", "featureB", "featureC", "featureD"]),
         )

     def infer(self, model_input: ModelInput) -> ModelOutput:
         return ModelOutput(
-            prediction_ids=model_input.prediction_ids,
+            row_ids=model_input.row_ids,
             predictions=Values(columns=["score"], data=[[0.5], [0.9]]),
         )
     def postprocess(self, model_output: ModelOutput, request: dict) -> dict:
@@ -90,7 +90,7 @@ model_schema = ModelSchema(spec=InferenceSchema(
         "featureD": ValueType.BOOLEAN
     },
     session_id_column="session_id",
-    row_id_column="prediction_id",
+    row_id_column="row_id",
     model_prediction_output=RankingOutput(
         rank_score_column="score",
         prediction_group_id_column="session_id",
