將python scikit學習模型導入pmml

我想將python scikit-learn模型導出爲PMML。將python scikit學習模型導入pmml

什麼python軟件包最適合？

我讀了約Augustus，但我無法找到任何使用scikit-learn模型的例子。

來源

2015-10-19 Selva

您可以轉換使用[sklearn2pmml] Scikit，學習模式和變壓器PMML（https://github.com/jpmml/sklearn2pmml）包。 – user1808924

JPMML-SkLearn也支持Python 2.7，但目前尚未公佈。 – user1808924

jpmml-sklearn包支持python 3.4。有沒有一種替代方案支持python 2.7 – Selva

SkLearn2PMML是

圍繞JPMML-SkLearn命令行應用程序的薄包裝。有關支持的Scikit-Learn Estimator和Transformer類型的列表，請參閱JPMML-SkLearn項目的文檔。

正如@ user1808924所說，它支持Python 2.7或3.4+。它也需要Java 1.7+

通過安裝：（需要git）如何分類樹導出到PMML

pip install git+https://github.com/jpmml/sklearn2pmml.git

例。首先，蘋果樹在成長：

# example tree & viz from http://scikit-learn.org/stable/modules/tree.html 
from sklearn import datasets, tree 
iris = datasets.load_iris() 
clf = tree.DecisionTreeClassifier() 
clf = clf.fit(iris.data, iris.target)

有兩個部分的SkLearn2PMML轉換，一個估計（我們clf）和映射器（用於預處理步驟，如離散或PCA）。我們的映射器是非常基本的，因爲我們沒有做任何轉換。

from sklearn_pandas import DataFrameMapper 
default_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']]) 

from sklearn2pmml import sklearn2pmml 
sklearn2pmml(estimator=clf, 
      mapper=default_mapper, 
      pmml="D:/workspace/IrisClassificationTree.pmml")

這是可能的（雖然不是記錄）通過mapper=None，但你會看到預測名迷路（返回x1沒有sepal length等）。

讓我們看看.pmml文件：

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3"> 
    <Header> 
     <Application name="JPMML-SkLearn" version="1.1.1"/> 
     <Timestamp>2016-09-26T19:21:43Z</Timestamp> 
    </Header> 
    <DataDictionary> 
     <DataField name="sepal length (cm)" optype="continuous" dataType="float"/> 
     <DataField name="sepal width (cm)" optype="continuous" dataType="float"/> 
     <DataField name="petal length (cm)" optype="continuous" dataType="float"/> 
     <DataField name="petal width (cm)" optype="continuous" dataType="float"/> 
     <DataField name="Species" optype="categorical" dataType="string"> 
      <Value value="setosa"/> 
      <Value value="versicolor"/> 
      <Value value="virginica"/> 
     </DataField> 
    </DataDictionary> 
    <TreeModel functionName="classification" splitCharacteristic="binarySplit"> 
     <MiningSchema> 
      <MiningField name="Species" usageType="target"/> 
      <MiningField name="sepal length (cm)"/> 
      <MiningField name="sepal width (cm)"/> 
      <MiningField name="petal length (cm)"/> 
      <MiningField name="petal width (cm)"/> 
     </MiningSchema> 
     <Output> 
      <OutputField name="probability_setosa" dataType="double" feature="probability" value="setosa"/> 
      <OutputField name="probability_versicolor" dataType="double" feature="probability" value="versicolor"/> 
      <OutputField name="probability_virginica" dataType="double" feature="probability" value="virginica"/> 
     </Output> 
     <Node id="1"> 
      <True/> 
      <Node id="2" score="setosa" recordCount="50.0"> 
       <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/> 
       <ScoreDistribution value="setosa" recordCount="50.0"/> 
       <ScoreDistribution value="versicolor" recordCount="0.0"/> 
       <ScoreDistribution value="virginica" recordCount="0.0"/> 
      </Node> 
      <Node id="3"> 
       <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/> 
       <Node id="4"> 
        <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.75"/> 
        <Node id="5"> 
         <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.95"/> 
         <Node id="6" score="versicolor" recordCount="47.0"> 
          <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.6500001"/> 
          <ScoreDistribution value="setosa" recordCount="0.0"/> 
          <ScoreDistribution value="versicolor" recordCount="47.0"/> 
          <ScoreDistribution value="virginica" recordCount="0.0"/> 
         </Node> 
         <Node id="7" score="virginica" recordCount="1.0"> 
          <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.6500001"/> 
          <ScoreDistribution value="setosa" recordCount="0.0"/> 
          <ScoreDistribution value="versicolor" recordCount="0.0"/> 
          <ScoreDistribution value="virginica" recordCount="1.0"/> 
         </Node> 
        </Node> 
        <Node id="8"> 
         <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.95"/> 
         <Node id="9" score="virginica" recordCount="3.0"> 
          <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.55"/> 
          <ScoreDistribution value="setosa" recordCount="0.0"/> 
          <ScoreDistribution value="versicolor" recordCount="0.0"/> 
          <ScoreDistribution value="virginica" recordCount="3.0"/> 
         </Node> 
         <Node id="10"> 
          <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.55"/> 
          <Node id="11" score="versicolor" recordCount="2.0"> 
           <SimplePredicate field="sepal length (cm)" operator="lessOrEqual" value="6.95"/> 
           <ScoreDistribution value="setosa" recordCount="0.0"/> 
           <ScoreDistribution value="versicolor" recordCount="2.0"/> 
           <ScoreDistribution value="virginica" recordCount="0.0"/> 
          </Node> 
          <Node id="12" score="virginica" recordCount="1.0"> 
           <SimplePredicate field="sepal length (cm)" operator="greaterThan" value="6.95"/> 
           <ScoreDistribution value="setosa" recordCount="0.0"/> 
           <ScoreDistribution value="versicolor" recordCount="0.0"/> 
           <ScoreDistribution value="virginica" recordCount="1.0"/> 
          </Node> 
         </Node> 
        </Node> 
       </Node> 
       <Node id="13"> 
        <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/> 
        <Node id="14"> 
         <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.8500004"/> 
         <Node id="15" score="virginica" recordCount="2.0"> 
          <SimplePredicate field="sepal width (cm)" operator="lessOrEqual" value="3.1"/> 
          <ScoreDistribution value="setosa" recordCount="0.0"/> 
          <ScoreDistribution value="versicolor" recordCount="0.0"/> 
          <ScoreDistribution value="virginica" recordCount="2.0"/> 
         </Node> 
         <Node id="16" score="versicolor" recordCount="1.0"> 
          <SimplePredicate field="sepal width (cm)" operator="greaterThan" value="3.1"/> 
          <ScoreDistribution value="setosa" recordCount="0.0"/> 
          <ScoreDistribution value="versicolor" recordCount="1.0"/> 
          <ScoreDistribution value="virginica" recordCount="0.0"/> 
         </Node> 
        </Node> 
        <Node id="17" score="virginica" recordCount="43.0"> 
         <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/> 
         <ScoreDistribution value="setosa" recordCount="0.0"/> 
         <ScoreDistribution value="versicolor" recordCount="0.0"/> 
         <ScoreDistribution value="virginica" recordCount="43.0"/> 
        </Node> 
       </Node> 
      </Node> 
     </Node> 
    </TreeModel> 
</PMML>

第一分割（節點1）是花瓣寬度在0.8。節點2（花瓣寬度< = 0.8）捕獲所有的setosa，沒有別的。

您可以比較PMML輸出到graphviz輸出：

from sklearn.externals.six import StringIO 
import pydotplus # this might be pydot for python 2.7 
dot_data = StringIO() 
tree.export_graphviz(clf, 
        out_file=dot_data, 
        feature_names=iris.feature_names, 
        class_names=iris.target_names, 
        filled=True, rounded=True, 
        special_characters=True) 
graph = pydotplus.graph_from_dot_data(dot_data.getvalue()) 
graph.write_pdf("D:/workspace/iris.pdf") 
# for in-line display, you can also do: 
# from IPython.display import Image 
# Image(graph.create_png())

來源

2016-09-26 19:31:19 C8H10N4O2

有沒有一種方法可以在不使用映射器時保留預測變量名稱？我真的需要在評估者方面瞭解他們，但僅僅爲此構建一個映射器太過於過分了。 – KidCrippler

@K我無法弄清楚如何保留沒有映射器的預測變量名稱。您可以嘗試發佈問題。 – C8H10N4O2

答案似乎已過時：'sklearn2pmml'現在使用'PMMLPipeline'。 – sds

將python scikit學習模型導入pmml

回答

相關問題