我們的應用程序需要使用HAProxy來負載均衡和路由流量(每AZ一個),ALB和ELB對於我們的目的來說配置不夠。當通過AWS CodeDeploy部署新代碼時,我們希望修補的實例進入維護模式(從負載平衡中刪除,連接耗盡)。我們已經修改了缺省的CodeDeploy生命週期bash腳本,通過從相關實例向HAProxy發送SSM運行命令來從各自的HAProxy實例中刪除實例。目前這種修改不起作用,失敗的原因不明。該腳本在手動執行時逐步執行(至少到當前的故障點)。失敗的部分或者是返回「$ INSTANCE_ID似乎不是帶有HAProxy實例的AZ,跳過註銷」的測試,或者上述測試依賴的$ HAPROXY_ID設置。該腳本運行得很好,直到那一刻,但此時退出,因爲它找不到HAProxy實例ID。在AWS CodeDeploy期間從HAProxy刪除實例
我已檢查IAM角色權限/憑據,環境變量和文件權限,這些權限都顯示爲正確。通常情況下,我會將更多的日誌記錄放入腳本中進行調試,但部署對於我們來說實在太少而且太少了。
我的問題:有沒有更好的方法來做到這一點?我只能猜測我們並不是唯一使用CodeDeploy的HAProxy,並且必須有一個可靠的方法來做到這一點。以下是當前正在使用的代碼無效。
#!/bin/bash
#
# Copyright 2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.
. $(dirname $0)/common_functions.sh
if [[ "$DEPLOYMENT_GROUP_NAME" != "redacted" ]]; then
msg "ELB Deregistration doesn't need to happen when not on redacted."
exit
fi
msg "Running AWS CLI with region: $(get_instance_region)"
# get this instance's ID
INSTANCE_ID=$(get_instance_id)
if [ $? != 0 -o -z "$INSTANCE_ID" ]; then
error_exit "Unable to get this instance's ID; cannot continue."
fi
# Get current time
msg "Started $(basename $0) at $(/bin/date "+%F %T")"
start_sec=$(/bin/date +%s.%N)
msg "Checking if instance $INSTANCE_ID is part of an AutoScaling group"
asg=$(autoscaling_group_name $INSTANCE_ID)
if [ $? == 0 -a -n "${asg}" ]; then
msg "Found AutoScaling group for instance $INSTANCE_ID: ${asg}"
msg "Checking that installed CLI version is at least at version required for AutoScaling Standby"
check_cli_version
if [ $? != 0 ]; then
error_exit "CLI must be at least version ${MIN_CLI_X}.${MIN_CLI_Y}.${MIN_CLI_Z} to work with AutoScaling Standby"
fi
msg "Attempting to put instance into Standby"
autoscaling_enter_standby $INSTANCE_ID "${asg}"
if [ $? != 0 ]; then
error_exit "Failed to move instance into standby"
else
msg "Instance is in standby"
fi
fi
msg "Instance is not part of an ASG, continuing..."
## Get the instanceID of the HAProxy instance in this AZ and ENVIRONMENT - Will there ever be more than one???
HAPROXY_ID=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text | \
grep INSTANCES | \
awk '{print $8}')
HAPROXY_IP=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text | \
grep INSTANCES | \
awk '{print $13}')
if test -z "$HAPROXY_ID"; then
msg "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration."
exit
fi
## Put the current instance into MAINT mode with the HAProxy instance via SSM
msg "Deregistering $INSTANCE_ID from HAProxy $HAPROXY_ID"
DEREGCMD="{\"commands\":[\"haproxyctl disable server bk_app_servers/$INSTANCEID\"],\"executionTimeout\":[\"3600\"]}"
/usr/local/bin/aws ssm send-command \
--document-name "AWS-RunShellScript" \
--instance-ids "$HAPROXY_ID" \
--parameters "$DEREGCMD" \
--timeout-seconds 600 \
--output-s3-bucket-name "redacted" \
--output-s3-key-prefix "haproxy-codedeploy/deregister" \
--region us-east-1
if [ $? != 0 ]; then
error_exit "Failed to send SSM command to deregister instance $INSTANCE_ID from HAProxy $HAPROXY_ID"
fi
## Wait for all connections to drain from instance
SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}')
DRAIN_TIME=60
msg "Initial session count: $SESS_COUNT"
while [[ "$SESS_COUNT" -gt 0 ]]; do
if [[ "$COUNTER" -gt "$DRAIN_TIME" ]]; then
msg "Instance failed to drain all connections within $DRAIN_TIME seconds. Continuing to deploy anyway."
break
fi
msg $SESS_COUNT
sleep 1
COUNTER=$(($COUNTER + 1))
SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}')
done
msg "Finished $(basename $0) at $(/bin/date "+%F %T")"
end_sec=$(/bin/date +%s.%N)
elapsed_seconds=$(echo "$end_sec - $start_sec" | /usr/bin/bc)
msg "Elapsed time: $elapsed_seconds"