Installation
git clone https://github.com/secondsky/sap-skills /tmp/sap-skills && cp -r /tmp/sap-skills/plugins/sap-hana-ml/skills/sap-hana-ml ~/.claude/skills/sap-skills/
name: sap-hana-ml
description: |
  SAP HANA Machine Learning Python Client (hana-ml) development skill.
  Use when: Building ML solutions with SAP HANA's in-database machine learning using the Python hana-ml library for PAL/APL algorithms, DataFrame operations, AutoML, model persistence, and visualization.
  Keywords: hana-ml, SAP HANA, machine learning, PAL, APL, predictive analytics, HANA DataFrame, ConnectionContext, classification, regression, clustering, time series, ARIMA, gradient boosting, AutoML, SHAP, model storage
license: GPL-3.0
metadata:
  version: 1.1.0
  last_verified: 2025-11-27
  package_version: 2.22.241011
SAP HANA ML Python Client (hana-ml)
Package Version: 2.22.241011
Last Verified: 2025-11-27
Installation & Setup
pip install hana-ml
Requirements: Python 3.8+, SAP HANA 2.0 SPS03+ or SAP HANA Cloud
Quick Start
Connection & DataFrame
from hana_ml import ConnectionContext
# Connect
conn = ConnectionContext(
address='<hostname>',
port=443,
user='<username>',
password='<password>',
encrypt=True
)
# Create DataFrame
df = conn.table('MY_TABLE', schema='MY_SCHEMA')
print(f"Shape: {df.shape}")
df.head(10).collect()
PAL Classification
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
# Train model
clf = UnifiedClassification(func='RandomDecisionTree')
clf.fit(train_df, features=['F1', 'F2', 'F3'], label='TARGET')
# Predict & evaluate
predictions = clf.predict(test_df, features=['F1', 'F2', 'F3'])
score = clf.score(test_df, features=['F1', 'F2', 'F3'], label='TARGET')
APL AutoML
from hana_ml.algorithms.apl.classification import AutoClassifier
# Automated classification
auto_clf = AutoClassifier()
auto_clf.fit(train_df, label='TARGET')
predictions = auto_clf.predict(test_df)
Model Persistence
from hana_ml.model_storage import ModelStorage
ms = ModelStorage(conn)
clf.name = 'MY_CLASSIFIER'
ms.save_model(model=clf, if_exists='replace')
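A saved model can be listed and reloaded in a later session. A minimal sketch, assuming the model name `MY_CLASSIFIER` from above and an open `ConnectionContext` named `conn`:

```python
from hana_ml.model_storage import ModelStorage

ms = ModelStorage(conn)

# List all models persisted in this storage
print(ms.list_models())

# Reload the saved classifier; the returned object
# can be used for predict() directly
clf = ms.load_model(name='MY_CLASSIFIER')
predictions = clf.predict(test_df, features=['F1', 'F2', 'F3'])
```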
Core Libraries
PAL (Predictive Analysis Library)
- 100+ algorithms executed in-database
- Categories: Classification, Regression, Clustering, Time Series, Preprocessing
- Key classes: `UnifiedClassification`, `UnifiedRegression`, `KMeans`, `ARIMA`
- See: `references/PAL_ALGORITHMS.md` for the complete list
APL (Automated Predictive Library)
- AutoML capabilities with automatic feature engineering
- Key classes: `AutoClassifier`, `AutoRegressor`, `GradientBoostingClassifier`
- See: `references/APL_ALGORITHMS.md` for details
DataFrames
- Lazy evaluation - builds SQL until `collect()` is called
- In-database processing for optimal performance
- See: `references/DATAFRAME_REFERENCE.md` for the complete API
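The lazy-evaluation behavior can be seen by inspecting the generated SQL before collecting. A sketch, assuming a table `MY_TABLE` with columns `ID` and `AGE` (placeholder names):

```python
df = conn.table('MY_TABLE', schema='MY_SCHEMA')

# Each operation only rewrites the underlying SELECT statement
subset = df.filter('AGE > 30').select('ID', 'AGE')

# Inspect the SQL that would run in HANA - nothing has executed yet
print(subset.select_statement)

# collect() finally executes the query and returns a pandas DataFrame
result = subset.collect()
```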
Visualizers
- EDA plots, model explanations, metrics
- SHAP integration for model interpretability
- See: `references/VISUALIZERS.md` for 14 visualization modules
Common Patterns
Train-Test Split
from hana_ml.algorithms.pal.partition import train_test_val_split
train, test, val = train_test_val_split(
data=df,
training_percentage=0.7,
testing_percentage=0.2,
validation_percentage=0.1
)
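Since the split returns three HANA DataFrames, the proportions can be sanity-checked without pulling data to the client. A small sketch:

```python
# count() runs a row count in-database for each partition
total = df.count()
for name, part in [('train', train), ('test', test), ('val', val)]:
    print(f"{name}: {part.count()} rows ({part.count() / total:.0%})")
```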
Feature Importance
# APL models
importance = auto_clf.get_feature_importances()
# PAL models
from hana_ml.algorithms.pal.preprocessing import FeatureSelection
fs = FeatureSelection()
fs.fit(train_df, features=features, label='TARGET')
Pipeline
from hana_ml.algorithms.pal.pipeline import Pipeline
from hana_ml.algorithms.pal.preprocessing import Imputer, FeatureNormalizer
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
pipeline = Pipeline([
('imputer', Imputer(strategy='mean')),
('normalizer', FeatureNormalizer()),
('classifier', UnifiedClassification(func='RandomDecisionTree'))
])
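The assembled pipeline is fit and applied like a single estimator. A hedged sketch, assuming the `train_df`/`test_df` and feature columns from the earlier examples and a key column `ID`:

```python
# Fit preprocessing steps and the classifier in one call
pipeline.fit(train_df, key='ID', features=['F1', 'F2', 'F3'], label='TARGET')

# The fitted pipeline applies the same transformations before predicting
predictions = pipeline.predict(test_df, key='ID')
```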
Best Practices
- Use lazy evaluation - operations build SQL without execution until `collect()`
- Leverage in-database processing - keep data in HANA for performance
- Use unified interfaces - consistent APIs across algorithms
- Save models - use `ModelStorage` for persistence
- Explain predictions - use SHAP explainers for interpretability
- Monitor AutoML - use `PipelineProgressStatusMonitor` for long-running jobs
Bundled Resources
Reference Files
- `references/DATAFRAME_REFERENCE.md` (479 lines) - ConnectionContext API, DataFrame operations, SQL generation
- `references/PAL_ALGORITHMS.md` (869 lines) - Complete PAL algorithm reference (100+ algorithms): Classification, Regression, Clustering, Time Series, Preprocessing
- `references/APL_ALGORITHMS.md` (534 lines) - AutoML capabilities, automated feature engineering: AutoClassifier, AutoRegressor, GradientBoosting classes
- `references/VISUALIZERS.md` (704 lines) - 14 visualization modules (EDA, SHAP, metrics, time series): plot types, configuration, export options
- `references/SUPPORTING_MODULES.md` (626 lines) - Model storage, spatial analytics, graph algorithms: text mining, statistics, error handling
Error Handling
from hana_ml.ml_exceptions import Error
try:
clf.fit(train_df, features=features, label='TARGET')
except Error as e:
print(f"HANA ML Error: {e}")
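Connections should also be released when work is done. A minimal pattern combining error handling with cleanup:

```python
try:
    clf.fit(train_df, features=features, label='TARGET')
except Error as e:
    print(f"HANA ML Error: {e}")
finally:
    conn.close()  # release the HANA session
```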
Documentation
- Official Docs: https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.07/en-US/hana_ml.html
- PyPI Package: https://pypi.org/project/hana-ml/