2016-11-08

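Integrating Apache Aurora with DC/OS

Only two Mesos frameworks support GPU resources: Marathon and Aurora. I want to launch batch jobs on Mesos agents that have GPU resources, so Aurora is the only framework that supports this type of workload. However, Aurora is not officially supported on DC/OS at the moment. I tried to integrate the two, without success. The DC/OS Mesos master does not register the Aurora framework, although Exhibitor does create a record for Aurora. I could not find any entries about Aurora in the Mesos master logs. Here is my Aurora scheduler configuration: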

#!/bin/bash 

GLOG_v=0 
LIBPROCESS_PORT=8083 
#LIBPROCESS_IP=127.0.0.1 

JAVA_HOME=/opt/mesosphere/active/java/usr/java 

JAVA_OPTS="-server -Djava.library.path='/opt/mesosphere/lib;/usr/lib;/usr/lib64'" 

PATH=$PATH:/opt/mesosphere/bin 

MESOS_NATIVE_JAVA_LIBRARY=/opt/mesosphere/lib/libmesos.so 

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mesosphere/lib 

JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/mesosphere/lib 

# Flags control the behavior of the Aurora scheduler. 
# For a full list of available flags, run /usr/lib/aurora/bin/aurora-scheduler -help 
AURORA_FLAGS=(
    # The name of this cluster. 
    -cluster_name='My Cluster' 

    # The HTTP port upon which Aurora will listen. 
    -http_port=8088 

    # The ZooKeeper URL of the ZNode where the Mesos master has registered. 
    -mesos_master_address=zk://master_ip1:2181,master_ip2:2181,master_ip3:2181/mesos 

    # The ZooKeeper quorum to which Aurora will register itself. 
    -zk_endpoints=master_ip1:2181,master_ip1:2181,master_ip1:2181 

    # The ZooKeeper ZNode within the specified quorum to which Aurora will register its 
    # ServerSet, which keeps track of all live Aurora schedulers. 
    -serverset_path='/aurora/scheduler' 

    # Allows the scheduling of containers of the provided type. 
    -allowed_container_types='DOCKER,MESOS' 

    -allow_docker_parameters=true 
    -allow_gpu_resource=true 
    -executor_user=root 
    ### Native Log Settings ### 

    # The native log serves as a replicated database which stores the state of the 
    # scheduler, allowing for multi-master operation. 

    # Size of the quorum of Aurora schedulers which possess a native log. If running in 
    # multi-master mode, consult the following document to determine appropriate values: 
    # 
    # https://aurora.apache.org/documentation/latest/deploying-aurora-scheduler/#replicated-log-configuration 
    -native_log_quorum_size=2 
    # The ZooKeeper ZNode to which Aurora will register the locations of its replicated log. 
    -native_log_zk_group_path='/aurora/replicated-log' 
    # The local directory in which an Aurora scheduler can find Aurora's replicated log. 
    -native_log_file_path='/var/lib/aurora/scheduler/db' 
    # The local directory in which Aurora schedulers will place state backups. 
    -backup_dir='/var/lib/aurora/scheduler/backups' 

    ### Thermos Settings ### 

    # The local path of the Thermos executor binary. 
    -thermos_executor_path='/usr/bin/thermos_executor' 
    # Flags to pass to the Thermos executor. 
    -thermos_executor_flags='--announcer-ensemble 127.0.0.1:2181') 
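
A quick way to confirm whether the scheduler registered at all is to look at the frameworks list in the Mesos master's state JSON. The helper below is a minimal sketch: it reads that JSON on stdin and reports whether a framework with the given name is present. The framework name and the master address in the usage line are assumptions, not values taken from the configuration above.

```shell
# Sketch only: reads Mesos master state JSON on stdin and exits 0 if a
# framework with the given name (first argument) is listed, 1 otherwise.
# The framework name to look for is an assumption; pass whatever name
# your scheduler registers under.
framework_registered() {
    python3 -c '
import json
import sys

name = sys.argv[1]
state = json.load(sys.stdin)
# "frameworks" is the list of currently registered frameworks.
names = [f.get("name") for f in state.get("frameworks", [])]
sys.exit(0 if name in names else 1)
' "$1"
}
```

For example, assuming the leading master is reachable at leader.mesos:5050: `curl -sL http://leader.mesos:5050/master/state | framework_registered Aurora && echo registered`.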

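I haven't heard of anyone trying to run Aurora on DC/OS, so you may be the first. GPU resource support is planned for Metronome, but realistically it probably won't arrive in the next few months. – KarlKFI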

Answer

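I managed to launch the Aurora framework on DC/OS 1.8. Because Mesos and Java are embedded in DC/OS with a custom configuration, paths in particular, I had to isolate Aurora with Docker. You can find Docker images for the Aurora components in my Docker repository: Aurora scheduler and Aurora executor (the krot/aurora-scheduler and krot/aurora-executor images used in the JSON below). This may also let me or someone else create a Universe package.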

Steps to deploy Aurora on DC/OS:

  1. Create the folder /var/lib/aurora on every DC/OS agent.

  2. Start the Aurora executor on all DC/OS agents using the following JSON:

    {
      "id": "/aurora/aurora-executor",
      "env": {
        "MESOS_ROOT": "/var/lib/mesos/slave"
      },
      "instances": 20,
      "cpus": 1,
      "mem": 128,
      "disk": 0,
      "gpus": 0,
      "constraints": [
        ["hostname", "UNIQUE"]
      ],
      "container": {
        "docker": {
          "image": "krot/aurora-executor",
          "forcePullImage": true,
          "privileged": false,
          "network": "HOST"
        },
        "type": "DOCKER",
        "volumes": [
          {
            "containerPath": "/var/lib/mesos/slave",
            "hostPath": "/var/lib/mesos/slave",
            "mode": "RW"
          },
          {
            "containerPath": "/var/lib/aurora",
            "hostPath": "/var/lib/aurora",
            "mode": "RW"
          }
        ]
      }
    }
    

    Note: set "instances" to the number of agents in your cluster.

    2a. An alternative way to deploy the Aurora executor (this must be done on every DC/OS agent):

    sudo yum install -y python2 wget 
    wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.16.0-1.el7.centos.aurora.x86_64.rpm 
    sudo rpm -Uhv --nodeps aurora-executor-0.16.0-1.el7.centos.aurora.x86_64.rpm
    

    Then make one modification: add the --mesos-root flag, so that the result looks like this:

    grep -A5 OBSERVER_ARGS /etc/sysconfig/thermos 
    OBSERVER_ARGS=(
        --port=1338 
        --mesos-root=/var/lib/mesos/slave 
        --log_to_disk=NONE 
        --log_to_stderr=google:INFO 
    ) 
    
  3. Start the Aurora scheduler using the following JSON (3 or more instances are recommended for fault tolerance):

    {
      "id": "/aurora/aurora-scheduler",
      "env": {
        "CLUSTER_NAME": "YourCluster",
        "ZK_ENDPOINTS": "master.mesos:2181",
        "MESOS_MASTER": "zk://master.mesos:2181/mesos",
        "QUORUM_SIZE": "2",
        "EXTRA_SCHEDULER_ARGS": "-allow_gpu_resource=true"
      },
      "instances": 3,
      "cpus": 1,
      "mem": 1024,
      "disk": 0,
      "gpus": 0,
      "constraints": [
        ["hostname", "UNIQUE"]
      ],
      "container": {
        "docker": {
          "image": "krot/aurora-scheduler",
          "forcePullImage": true,
          "privileged": false,
          "network": "HOST"
        },
        "type": "DOCKER",
        "volumes": [
          {
            "containerPath": "/var/lib/aurora",
            "hostPath": "/var/lib/aurora",
            "mode": "RW"
          }
        ]
      }
    }
    

    Note: -allow_gpu_resource=true enables GPU support. The Aurora scheduler can be configured through environment variables. See the documentation for details.
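
Both JSON documents above are Marathon app definitions, so one way to submit them is through the DC/OS CLI. The sketch below assumes the CLI is installed and attached to the cluster, and that the definitions were saved under file names of your choosing (aurora-executor.json and aurora-scheduler.json in the usage note are hypothetical names). It checks that a file parses as JSON before handing it to `dcos marathon app add`.

```shell
# Sketch only: validates a Marathon app definition, then submits it.
# Assumes python3 and the dcos CLI are on PATH.

# Exit 0 if the file is well-formed JSON, non-zero otherwise.
validate_json() {
    python3 -m json.tool "$1" > /dev/null
}

# Submit one app definition to Marathon via the DC/OS CLI.
deploy_app() {
    local file="$1"
    if ! validate_json "$file"; then
        echo "invalid JSON: $file" >&2
        return 1
    fi
    dcos marathon app add "$file"
}
```

Usage would be `deploy_app aurora-executor.json` followed by `deploy_app aurora-scheduler.json`; `dcos marathon app list` then shows whether both apps are running.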


Nice. Push it to the [DCOS Universe](https://github.com/mesosphere/universe) – janisz