2016-04-30 64 views

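I generated a decision tree model for the iris dataset using bigml.com. I downloaded this decision tree model as PMML and want to use it for predictions on my local machine. How can I use a model downloaded from BigML for local prediction?

PMML model from BigML: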

<?xml version="1.0" encoding="utf-8"?> 
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    <Header description="Generated by BigML"/> 
    <DataDictionary> 
     <DataField dataType="double" displayName="Sepal length" name="000001" optype="continuous"/> 
     <DataField dataType="double" displayName="Sepal width" name="000002" optype="continuous"/> 
     <DataField dataType="double" displayName="Petal length" name="000003" optype="continuous"/> 
     <DataField dataType="double" displayName="Petal width" name="000004" optype="continuous"/> 
     <DataField dataType="string" displayName="Species" name="000005" optype="categorical"> 
      <Value value="Iris-setosa"/> 
      <Value value="Iris-versicolor"/> 
      <Value value="Iris-virginica"/> 
     </DataField> 
    </DataDictionary> 
    <TreeModel algorithmName="mtree" functionName="classification" modelName=""> 
     <MiningSchema> 
      <MiningField name="000001"/> 
      <MiningField name="000002"/> 
      <MiningField name="000003"/> 
      <MiningField name="000004"/> 
      <MiningField name="000005" usageType="target"/> 
     </MiningSchema> 
     <Node recordCount="150" score="Iris-setosa"> 
      <True/> 
      <ScoreDistribution recordCount="50" value="Iris-setosa"/> 
      <ScoreDistribution recordCount="50" value="Iris-versicolor"/> 
      <ScoreDistribution recordCount="50" value="Iris-virginica"/> 
      <Node recordCount="100" score="Iris-versicolor"> 
       <SimplePredicate field="000003" operator="greaterThan" value="2.45"/> 
       <ScoreDistribution recordCount="50" value="Iris-versicolor"/> 
       <ScoreDistribution recordCount="50" value="Iris-virginica"/> 
       <Node recordCount="46" score="Iris-virginica"> 
        <SimplePredicate field="000004" operator="greaterThan" value="1.75"/> 
        <ScoreDistribution recordCount="45" value="Iris-virginica"/> 
        <ScoreDistribution recordCount="1" value="Iris-versicolor"/> 
        <Node recordCount="43" score="Iris-virginica"> 
         <SimplePredicate field="000003" operator="greaterThan" value="4.85"/> 
         <ScoreDistribution recordCount="43" value="Iris-virginica"/> 
        </Node> 
        <Node recordCount="3" score="Iris-virginica"> 
         <SimplePredicate field="000003" operator="lessOrEqual" value="4.85"/> 
         <ScoreDistribution recordCount="2" value="Iris-virginica"/> 
         <ScoreDistribution recordCount="1" value="Iris-versicolor"/> 
         <Node recordCount="1" score="Iris-versicolor"> 
          <SimplePredicate field="000002" operator="greaterThan" value="3.1"/> 
          <ScoreDistribution recordCount="1" value="Iris-versicolor"/> 
         </Node> 
         <Node recordCount="2" score="Iris-virginica"> 
          <SimplePredicate field="000002" operator="lessOrEqual" value="3.1"/> 
          <ScoreDistribution recordCount="2" value="Iris-virginica"/> 
         </Node> 
        </Node> 
       </Node> 
       <Node recordCount="54" score="Iris-versicolor"> 
        <SimplePredicate field="000004" operator="lessOrEqual" value="1.75"/> 
        <ScoreDistribution recordCount="49" value="Iris-versicolor"/> 
        <ScoreDistribution recordCount="5" value="Iris-virginica"/> 
        <Node recordCount="6" score="Iris-virginica"> 
         <SimplePredicate field="000003" operator="greaterThan" value="4.95"/> 
         <ScoreDistribution recordCount="4" value="Iris-virginica"/> 
         <ScoreDistribution recordCount="2" value="Iris-versicolor"/> 
         <Node recordCount="3" score="Iris-versicolor"> 
          <SimplePredicate field="000004" operator="greaterThan" value="1.55"/> 
          <ScoreDistribution recordCount="2" value="Iris-versicolor"/> 
          <ScoreDistribution recordCount="1" value="Iris-virginica"/> 
          <Node recordCount="1" score="Iris-virginica"> 
           <SimplePredicate field="000003" operator="greaterThan" value="5.45"/> 
           <ScoreDistribution recordCount="1" value="Iris-virginica"/> 
          </Node> 
          <Node recordCount="2" score="Iris-versicolor"> 
           <SimplePredicate field="000003" operator="lessOrEqual" value="5.45"/> 
           <ScoreDistribution recordCount="2" value="Iris-versicolor"/> 
          </Node> 
         </Node> 
         <Node recordCount="3" score="Iris-virginica"> 
          <SimplePredicate field="000004" operator="lessOrEqual" value="1.55"/> 
          <ScoreDistribution recordCount="3" value="Iris-virginica"/> 
         </Node> 
        </Node> 
        <Node recordCount="48" score="Iris-versicolor"> 
         <SimplePredicate field="000003" operator="lessOrEqual" value="4.95"/> 
         <ScoreDistribution recordCount="47" value="Iris-versicolor"/> 
         <ScoreDistribution recordCount="1" value="Iris-virginica"/> 
         <Node recordCount="1" score="Iris-virginica"> 
          <SimplePredicate field="000004" operator="greaterThan" value="1.65"/> 
          <ScoreDistribution recordCount="1" value="Iris-virginica"/> 
         </Node> 
         <Node recordCount="47" score="Iris-versicolor"> 
          <SimplePredicate field="000004" operator="lessOrEqual" value="1.65"/> 
          <ScoreDistribution recordCount="47" value="Iris-versicolor"/> 
         </Node> 
        </Node> 
       </Node> 
      </Node> 
      <Node recordCount="50" score="Iris-setosa"> 
       <SimplePredicate field="000003" operator="lessOrEqual" value="2.45"/> 
       <ScoreDistribution recordCount="50" value="Iris-setosa"/> 
      </Node> 
     </Node> 
    </TreeModel> 
</PMML> 

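I generally use R for machine learning and would like to load this model and use it for prediction on my own system.

R has a pmml package, but it does not seem possible to use it for prediction. Is there another way to use this PMML model for prediction in R? If not, can this PMML model be used with another language or tool, such as Python or Weka? If so, how do I do it (code needed)?

Python model from BigML: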

def predict_species(sepal_width=None, 
        petal_length=None, 
        petal_width=None): 
    """ Predictor for Species from 

     This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic 
     in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes 
     of 50 instances each, where each class refers to a type of iris plant. 
     Source 
     Iris Data Set[*] 
     Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository[*]. Irvine, CA: University of California, School of Information and Computer Science. 

     [*]Iris Data Set: http://archive.ics.uci.edu/ml/datasets/Iris 
     [*]UCI Machine Learning Repository: http://archive.ics.uci.edu/ml 
    """ 
    if (petal_length is None): 
     return u'Iris-setosa' 
    if (petal_length > 2.45): 
     if (petal_width is None): 
      return u'Iris-versicolor' 
     if (petal_width > 1.75): 
      if (petal_length > 4.85): 
       return u'Iris-virginica' 
      if (petal_length <= 4.85): 
       if (sepal_width is None): 
        return u'Iris-virginica' 
       if (sepal_width > 3.1): 
        return u'Iris-versicolor' 
       if (sepal_width <= 3.1): 
        return u'Iris-virginica' 
     if (petal_width <= 1.75): 
      if (petal_length > 4.95): 
       if (petal_width > 1.55): 
        if (petal_length > 5.45): 
         return u'Iris-virginica' 
        if (petal_length <= 5.45): 
         return u'Iris-versicolor' 
       if (petal_width <= 1.55): 
        return u'Iris-virginica' 
      if (petal_length <= 4.95): 
       if (petal_width > 1.65): 
        return u'Iris-virginica' 
       if (petal_width <= 1.65): 
        return u'Iris-versicolor' 
    if (petal_length <= 2.45): 
     return u'Iris-setosa' 

Answers


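The easiest way to perform local predictions that give the same results as BigML is to download the model (ensemble, cluster, anomaly detector, etc.) directly via an API call.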

For example, for a classification or regression model, using BigML's Python Bindings you would do something like this:

from bigml.model import Model 
model = Model('model/570f4b6e84622c5ed10095a9') 
model.predict({'feature_1': 1, 'feature_2': 2}) 

To find the closest centroid using a local cluster:

from bigml.cluster import Cluster 
cluster = Cluster('cluster/572500b849c4a15c9d00451f') 
cluster.centroid({'feature_1': 1, 'feature_2': 2}) 

To score a new data point using a local anomaly detector:

from bigml.anomaly import Anomaly 
anomaly_detector = Anomaly('anomaly/570f4c333bbd21090101e79f') 
anomaly_detector.anomaly_score({'feature_1': 1, 'feature_2': 2}) 

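The classes above (Model, Cluster and Anomaly) download the JSON PML code that defines each model and turn it into a local function (in this case, in Python). Since you probably don't want to use R to implement a real-world application, it is better to perform the predictions in the language you will use for your application: Python, Node.js, Java, etc. BigML provides open-source bindings for all of them.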
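If only the downloaded PMML file is at hand, the tree can also be walked directly with the Python standard library. The following is a minimal sketch, not BigML's implementation and not a complete PMML evaluator. It assumes a pruned, hypothetical excerpt of the PMML from the question (top levels only), handles only `True` and numeric `SimplePredicate` elements, and, as a simplification that mirrors the generated Python function, returns the score of the last matched node when a field is missing. The names `PMML_EXCERPT`, `matches` and `predict` are made up for illustration.

```python
import operator
import xml.etree.ElementTree as ET

# Namespace used by the PMML 4.2 file in the question.
NS = "{http://www.dmg.org/PMML-4_2}"

# Numeric comparison operators of SimplePredicate handled by this sketch.
OPS = {
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
}

# Hypothetical excerpt: the question's tree pruned to its top levels.
PMML_EXCERPT = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <TreeModel functionName="classification">
    <Node recordCount="150" score="Iris-setosa">
      <True/>
      <Node recordCount="100" score="Iris-versicolor">
        <SimplePredicate field="000003" operator="greaterThan" value="2.45"/>
        <Node recordCount="46" score="Iris-virginica">
          <SimplePredicate field="000004" operator="greaterThan" value="1.75"/>
        </Node>
        <Node recordCount="54" score="Iris-versicolor">
          <SimplePredicate field="000004" operator="lessOrEqual" value="1.75"/>
        </Node>
      </Node>
      <Node recordCount="50" score="Iris-setosa">
        <SimplePredicate field="000003" operator="lessOrEqual" value="2.45"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""


def matches(node, record):
    """Return True if the node's predicate holds for the given record."""
    if node.find(NS + "True") is not None:
        return True
    pred = node.find(NS + "SimplePredicate")
    if pred is None:
        return False  # other predicate types are not handled in this sketch
    value = record.get(pred.get("field"))
    if value is None:
        return False  # missing input: no child matches, so the walk stops
    return OPS[pred.get("operator")](value, float(pred.get("value")))


def predict(root_node, record):
    """Descend from the root node and return the score of the last match."""
    node = root_node
    while True:
        for child in node.findall(NS + "Node"):
            if matches(child, record):
                node = child
                break
        else:
            # No child matched (leaf reached or value missing).
            return node.get("score")


# Records are keyed by the PMML field names: 000003 is petal length,
# 000004 is petal width.
tree_root = ET.fromstring(PMML_EXCERPT).find(NS + "TreeModel").find(NS + "Node")
print(predict(tree_root, {"000003": 1.4, "000004": 0.2}))
print(predict(tree_root, {"000003": 5.0, "000004": 2.0}))
```

To use the real file instead of the excerpt, the same `predict` function could be given the root node obtained from `ET.parse(path).getroot()`, where `path` points to the downloaded PMML.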
