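How to tell from an Npgsql exception whether the call is worth retrying (transient fault strategy)
I am writing a service that will connect to a remote Postgres server. I am looking for a good way to determine which exceptions should be treated as transient (worth retrying), and how to define an appropriate policy for connecting to the remote database.
The service uses Npgsql for data access. The documentation says that Npgsql throws a PostgresException for SQL errors and an NpgsqlException for "server-related issues".
So far, the best I have come up with is to assume that every exception that is not a PostgresException may be transient and is worth retrying, whereas a PostgresException means something is wrong with the query and a retry will not help. Am I correct in this assumption?
I am using Polly to create the retry and circuit breaker policies, so my policy looks like this: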
    // A PostgresException is assumed not to succeed even with a retry, so it is not handled.
    Policy.Handle<Exception>(AllButPostgresExceptions())
        .WaitAndRetryAsync(new[]
        {
            TimeSpan.FromSeconds(1),
            TimeSpan.FromSeconds(2),
            TimeSpan.FromSeconds(4)
        }, onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
        .WrapAsync(
            Policy.Handle<Exception>(AllButPostgresExceptions())
                .AdvancedCircuitBreakerAsync(
                    failureThreshold: .7,
                    samplingDuration: TimeSpan.FromSeconds(30),
                    minimumThroughput: 20,
                    durationOfBreak: TimeSpan.FromSeconds(30),
                    onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                    onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "),
                    onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
                ));

    private static Func<Exception, bool> AllButPostgresExceptions()
    {
        // True for every exception that is not a PostgresException.
        return ex => !(ex is PostgresException);
    }
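Is there a better way to determine which errors are likely to be transient?
UPDATE:
Following the advice I was given, I opened a new issue on Npgsql and updated my policy to look like this: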
    public static Policy PostgresTransientFaultPolicy
    {
        get
        {
            // Built once on first access and cached in postgresTransientPolicy.
            return postgresTransientPolicy ?? (postgresTransientPolicy = Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                .WaitAndRetryAsync(
                    retryCount: 10,
                    sleepDurationProvider: retryAttempt => ExponentialBackoff(retryAttempt, 1.4),
                    onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
                .WrapAsync(
                    Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                        .AdvancedCircuitBreakerAsync(
                            failureThreshold: .4,
                            samplingDuration: TimeSpan.FromSeconds(30),
                            minimumThroughput: 20,
                            durationOfBreak: TimeSpan.FromSeconds(30),
                            onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                            onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "),
                            onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
                        )));
        }
    }
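    private static TimeSpan ExponentialBackoff(int retryAttempt, double exponent)
    {
        // TODO: add a random 20% variance on the exponent
        // Note: the delay is retryAttempt^exponent seconds, so it grows polynomially with the attempt number.
        return TimeSpan.FromSeconds(Math.Pow(retryAttempt, exponent));
    }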
    private static Func<Exception, bool> PostgresDatabaseTransientErrorDetectionStrategy()
    {
        return (ex) =>
        {
            // If it is not a PostgresException we must assume it will be transient.
            if (!(ex is PostgresException pgex))
                return true;

            switch (pgex.SqlState)
            {
                case "53000": // insufficient_resources
                case "53100": // disk_full
                case "53200": // out_of_memory
                case "53300": // too_many_connections
                case "53400": // configuration_limit_exceeded
                case "57P03": // cannot_connect_now
                case "58000": // system_error
                case "58030": // io_error
                // I am not sure whether the next few should be treated as transient, but I am guessing so.
                case "55P03": // lock_not_available
                case "55006": // object_in_use
                case "55000": // object_not_in_prerequisite_state
                case "08000": // connection_exception
                case "08003": // connection_does_not_exist
                case "08006": // connection_failure
                case "08001": // sqlclient_unable_to_establish_sqlconnection
                case "08004": // sqlserver_rejected_establishment_of_sqlconnection
                case "08007": // transaction_resolution_unknown
                    return true;
            }

            // Any other SQLSTATE is treated as a non-transient error.
            return false;
        };
    }
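The TODO in ExponentialBackoff could be handled roughly as follows. This is only a sketch: the class name RetryDelays, the method name BackoffWithJitter, and the variance parameter are hypothetical and are not part of Polly or Npgsql. It also applies the random variance to the resulting delay rather than to the exponent. That is a deliberate change from what the TODO describes, because Math.Pow(1, x) is always 1, so varying the exponent would have no effect on the first retry.

```csharp
using System;

public static class RetryDelays
{
    // Hypothetical helper: computes the same base delay as ExponentialBackoff
    // (retryAttempt raised to `exponent`, in seconds) and scales it by a
    // random factor in [1 - variance, 1 + variance).
    // System.Random is not thread-safe, so the caller is assumed to supply an
    // instance that is not shared across threads without synchronization.
    public static TimeSpan BackoffWithJitter(
        int retryAttempt, double exponent, Random random, double variance = 0.2)
    {
        double baseSeconds = Math.Pow(retryAttempt, exponent);

        // NextDouble() returns a value in [0, 1); map it to [-variance, +variance).
        double jitter = (random.NextDouble() * 2.0 - 1.0) * variance;

        return TimeSpan.FromSeconds(baseSeconds * (1.0 + jitter));
    }
}
```

It could then be passed to the retry policy in place of the existing call, for example sleepDurationProvider: retryAttempt => RetryDelays.BackoffWithJitter(retryAttempt, 1.4, random), where random is an instance owned by the caller. The point of the jitter is to keep many clients from retrying at the same instant after a shared outage.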
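Thanks for your help and suggestions. I will update my policy to look for some specific error codes and then post the revised policy here.
Great. Please open an issue at https://github.com/npgsql/npgsql - we could incorporate your policy into Npgsql itself.
In my experience, the most important transient error is code 40001 (transaction serialization failure).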
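As a rough illustration of that last comment, the SQLSTATE check could be kept in a lookup set that includes 40001. The class name TransientSqlStates and the method IsTransient are hypothetical, and the list is deliberately short rather than complete. Including 40P01 (deadlock_detected) is my own assumption; nobody in this thread suggested it.

```csharp
using System;
using System.Collections.Generic;

public static class TransientSqlStates
{
    // Illustrative subset of SQLSTATE codes treated as transient.
    private static readonly HashSet<string> Codes = new HashSet<string>(StringComparer.Ordinal)
    {
        "40001", // serialization_failure (from the comment above)
        "40P01", // deadlock_detected (assumption, not mentioned in the thread)
        "53300", // too_many_connections
        "57P03", // cannot_connect_now
        "08006", // connection_failure
    };

    // Returns true when the given SQLSTATE is in the transient set.
    // A null SQLSTATE is treated as non-transient.
    public static bool IsTransient(string sqlState)
    {
        return sqlState != null && Codes.Contains(sqlState);
    }
}
```

Inside the detection predicate this would be called as TransientSqlStates.IsTransient(pgex.SqlState). One caveat for 40001: the server has already aborted the transaction, so the retry has to re-run the whole transaction from the start, not just the statement that failed.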