2017-05-15 38 views
0

我們的應用程序需要使用HAProxy來負載均衡和路由流量(每AZ一個),ALB和ELB對於我們的目的來說配置不夠。當通過AWS CodeDeploy部署新代碼時,我們希望修補的實例進入維護模式(從負載平衡中刪除,連接耗盡)。我們已經修改了缺省的CodeDeploy生命週期bash腳本,通過從相關實例向HAProxy發送SSM運行命令來從各自的HAProxy實例中刪除實例。目前這種修改不起作用,失敗的原因不明。該腳本在手動執行時逐步執行(至少到當前的故障點)。失敗的部分或者是返回「$ INSTANCE_ID似乎不是帶有HAProxy實例的AZ,跳過註銷」的測試,或者上述測試依賴的$ HAPROXY_ID設置。該腳本運行得很好,直到那一刻,但此時退出,因爲它找不到HAProxy實例ID。在AWS CodeDeploy期間從HAProxy刪除實例

我已檢查IAM角色權限/憑據,環境變量和文件權限,這些權限都顯示爲正確。通常情況下,我會將更多的日誌記錄放入腳本中進行調試,但部署對於我們來說實在太少而且太少了。

我的問題:有沒有更好的方法來做到這一點?我只能猜測我們並不是唯一使用CodeDeploy的HAProxy,並且必須有一個可靠的方法來做到這一點。以下是當前正在使用的代碼無效。

#!/bin/bash 
# 
# Copyright 2014 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
# 
# Licensed under the Apache License, Version 2.0 (the "License"). 
# You may not use this file except in compliance with the License. 
# A copy of the License is located at 
# 
# http://aws.amazon.com/apache2.0 
# 
# or in the "license" file accompanying this file. This file is distributed 
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either 
# express or implied. See the License for the specific language governing 
# permissions and limitations under the License. 

. $(dirname $0)/common_functions.sh 

if [[ "$DEPLOYMENT_GROUP_NAME" != "redacted" ]]; then 
    msg "ELB Deregistration doesn't need to happen when not on redacted." 
    exit 
fi 

msg "Running AWS CLI with region: $(get_instance_region)" 

# get this instance's ID 
INSTANCE_ID=$(get_instance_id) 
if [ $? != 0 -o -z "$INSTANCE_ID" ]; then 
    error_exit "Unable to get this instance's ID; cannot continue." 
fi 

# Get current time 
msg "Started $(basename $0) at $(/bin/date "+%F %T")" 
start_sec=$(/bin/date +%s.%N) 

msg "Checking if instance $INSTANCE_ID is part of an AutoScaling group" 
asg=$(autoscaling_group_name $INSTANCE_ID) 
if [ $? == 0 -a -n "${asg}" ]; then 
    msg "Found AutoScaling group for instance $INSTANCE_ID: ${asg}" 

    msg "Checking that installed CLI version is at least at version required for AutoScaling Standby" 
    check_cli_version 
    if [ $? != 0 ]; then 
     error_exit "CLI must be at least version ${MIN_CLI_X}.${MIN_CLI_Y}.${MIN_CLI_Z} to work with AutoScaling Standby" 
    fi 

    msg "Attempting to put instance into Standby" 
    autoscaling_enter_standby $INSTANCE_ID "${asg}" 
    if [ $? != 0 ]; then 
     error_exit "Failed to move instance into standby" 
    else 
     msg "Instance is in standby" 
    fi 
fi 

msg "Instance is not part of an ASG, continuing..." 

## Get the instanceID of the HAProxy instance in this AZ and ENVIRONMENT - Will there ever be more than one??? 

HAPROXY_ID=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text | \ 
grep INSTANCES | \ 
awk '{print $8}') 

HAPROXY_IP=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text | \ 
grep INSTANCES | \ 
awk '{print $13}') 

if test -z "$HAPROXY_ID"; then 
    msg "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration." 
    exit 
fi 

## Put the current instance into MAINT mode with the HAProxy instance via SSM 

msg "Deregistering $INSTANCE_ID from HAProxy $HAPROXY_ID" 

DEREGCMD="{\"commands\":[\"haproxyctl disable server bk_app_servers/$INSTANCEID\"],\"executionTimeout\":[\"3600\"]}" 

/usr/local/bin/aws ssm send-command \ 
--document-name "AWS-RunShellScript" \ 
--instance-ids "$HAPROXY_ID" \ 
--parameters "$DEREGCMD" \ 
--timeout-seconds 600 \ 
--output-s3-bucket-name "redacted" \ 
--output-s3-key-prefix "haproxy-codedeploy/deregister" \ 
--region us-east-1 

if [ $? != 0 ]; then 
    error_exit "Failed to send SSM command to deregister instance $INSTANCE_ID from HAProxy $HAPROXY_ID" 
fi 

## Wait for all connections to drain from instance 

SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}') 
DRAIN_TIME=60 

msg "Initial session count: $SESS_COUNT" 

while [[ "$SESS_COUNT" -gt 0 ]]; do 
    if [[ "$COUNTER" -gt "$DRAIN_TIME" ]]; then 
     msg "Instance failed to drain all connections within $DRAIN_TIME seconds. Continuing to deploy anyway." 
     break 
    fi 
    msg $SESS_COUNT 
    sleep 1 
    COUNTER=$(($COUNTER + 1)) 
    SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}') 
done 

msg "Finished $(basename $0) at $(/bin/date "+%F %T")" 

end_sec=$(/bin/date +%s.%N) 
elapsed_seconds=$(echo "$end_sec - $start_sec" | /usr/bin/bc) 

msg "Elapsed time: $elapsed_seconds" 

回答

0

目前唯一的選擇是添加更多日誌記錄併發出部署來測試此腳本,然後查看部署日誌。這聽起來像你不知道它爲什麼失敗,只有日誌可以告訴你。

嘗試添加日誌記錄並查看發生了什麼。我們應該只是按照原樣執行腳本,因此不應該的行爲有所不同,但是很難說沒有看到日誌。

祝你好運, -Asaf