PowerShell Out-File: preventing encoding changes

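I'm currently working on some search-and-replace operations that I'm trying to automate with PowerShell. Unfortunately, I realized yesterday that our codebase contains files with different encodings (UTF8 and ASCII). Because we run these search-and-replace operations in different branches, we can't change the file encodings at this stage.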

If I run the following lines, every file gets converted to UCS-2 Little Endian, even though the default PowerShell encoding is set to iso-8859-1 (Western European (Windows)).

$content = Get-Content $_.Path 
$content -replace 'myOldText' , 'myNewText' | Out-File $_.Path

Is there a way to stop PowerShell from changing the file's encoding?


2012-02-02 Pete

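Out-File has a default encoding unless you override it with the -Encoding parameter. What I did to solve this was to try to get the original file's encoding by reading its byte order mark (BOM) and passing that as the -Encoding parameter value.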

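Here is an example that processes a set of text file paths: it gets each file's original encoding, processes the content, and writes it back to the file using the original encoding.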

function Get-FileEncoding { 
    param ([string] $FilePath) 

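    # Read the first 4 bytes. (-Encoding Byte works in Windows PowerShell;
    # PowerShell 6+ replaced it with -AsByteStream.)
    [byte[]] $byte = get-content -Encoding byte -ReadCount 4 -TotalCount 4 -Path $FilePath 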

    if ($byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf) 
     { $encoding = 'UTF8' } 
    elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff) 
     { $encoding = 'BigEndianUnicode' } 
    elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe) 
     { $encoding = 'Unicode' } 
    elseif ($byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff) 
     { $encoding = 'UTF32' } 
    elseif ($byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76) 
     { $encoding = 'UTF7'} 
    else 
     { $encoding = 'ASCII' }  # No BOM found. BOM-less UTF-8 files also land here.
    return $encoding 
} 

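# $textFiles is assumed to hold the paths to process, e.g.
# $textFiles = Get-ChildItem -Recurse -Filter *.txt | ForEach-Object FullName
foreach ($textFile in $textFiles) { 
    $encoding = Get-FileEncoding $textFile 
    $content = Get-Content -Path $textFile -Encoding $encoding 
    # Process content here, e.g. the replacement from the question:
    $content = $content -replace 'myOldText', 'myNewText'
    $content | Set-Content -Path $textFile -Encoding $encoding 
}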
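One caveat when passing Get-FileEncoding's result straight to Set-Content: in PowerShell 6 and later the name UTF8 means UTF-8 without a BOM (utf8BOM writes one), so on those versions the loop above would strip the BOM from UTF-8 files. A minimal sketch of a mapping helper, assuming you want such files to keep their BOM; Get-WriteEncodingName is a hypothetical name, not a built-in cmdlet:

```powershell
# Hypothetical helper: maps the encoding name returned by Get-FileEncoding
# to the name Set-Content should receive on the running PowerShell version.
function Get-WriteEncodingName {
    param ([Parameter(Mandatory)] [string] $Name)

    # Windows PowerShell (5.1 and earlier): 'UTF8' already writes a BOM.
    if ($PSVersionTable.PSVersion.Major -lt 6) { return $Name }

    # PowerShell 6+: 'UTF8' means no BOM, so request the BOM explicitly.
    if ($Name -eq 'UTF8') { return 'utf8BOM' }
    return $Name
}

# Usage inside the loop (sketch):
# $content | Set-Content -Path $textFile -Encoding (Get-WriteEncodingName $encoding)
```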

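Update: Here is an example that uses the StreamReader class to get the original file's encoding. The example reads the first few characters of the file so that the CurrentEncoding property is set from the result of the reader's internal BOM-detection routine.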

http://msdn.microsoft.com/en-us/library/9y86s1a9.aspx

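The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the UTF8Encoding is used. See the Encoding.GetPreamble method for more information.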

http://msdn.microsoft.com/en-us/library/system.text.encoding.getpreamble.aspx

$text = @" 
This is 
my text file 
contents. 
"@ 

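#Create text file (demo path in the temp directory; any writable path works). 
$filePath = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.txt'
[IO.File]::WriteAllText($filePath, $text, [System.Text.Encoding]::BigEndianUnicode) 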

#Create a stream reader to get the file's encoding and contents. 
$sr = New-Object System.IO.StreamReader($filePath, $true) 
[char[]] $buffer = new-object char[] 3 
[void] $sr.Read($buffer, 0, 3)  # Reading triggers BOM detection; the returned char count is discarded.
$encoding = $sr.CurrentEncoding 
$sr.Close() 

#Show the detected encoding. 
$encoding 

#Update the file contents. 
$content = [IO.File]::ReadAllText($filePath, $encoding) 
$content2 = $content -replace "my" , "your" 

#Save the updated contents to file. 
[IO.File]::WriteAllText($filePath, $content2, $encoding) 

#Display the result. 
Get-Content $filePath
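The documentation quoted above points to Encoding.GetPreamble. Instead of relying on StreamReader's internal detection, a minimal sketch (assuming PowerShell 5 or later for the ::new() syntax) can compare a file's leading bytes with the preamble each candidate encoding would write. Get-EncodingFromPreamble is a hypothetical helper name, and the candidate list is an assumption you may want to adjust:

```powershell
# Sketch: detect a file's encoding by comparing its first bytes with the
# preamble (BOM) that each candidate .NET encoding would write.
function Get-EncodingFromPreamble {
    param ([Parameter(Mandatory)] [string] $Path)

    $bytes = [System.IO.File]::ReadAllBytes($Path)

    # Longest preambles first: the UTF-32 LE BOM (FF FE 00 00) begins with
    # the UTF-16 LE BOM (FF FE), so UTF-32 must be tested before UTF-16.
    $candidates = @(
        [System.Text.UTF32Encoding]::new($false, $true),   # UTF-32 LE: FF FE 00 00
        [System.Text.UTF32Encoding]::new($true, $true),    # UTF-32 BE: 00 00 FE FF
        [System.Text.UTF8Encoding]::new($true),            # UTF-8:     EF BB BF
        [System.Text.UnicodeEncoding]::new($false, $true), # UTF-16 LE: FF FE
        [System.Text.UnicodeEncoding]::new($true, $true)   # UTF-16 BE: FE FF
    )

    foreach ($enc in $candidates) {
        $preamble = $enc.GetPreamble()
        if ($bytes.Length -lt $preamble.Length) { continue }
        $match = $true
        for ($i = 0; $i -lt $preamble.Length; $i++) {
            if ($bytes[$i] -ne $preamble[$i]) { $match = $false; break }
        }
        if ($match) { return $enc }
    }
    return $null  # No BOM: the caller decides (e.g. ASCII or BOM-less UTF-8).
}
```

The returned object can be passed directly to [IO.File]::ReadAllText and WriteAllText, as in the StreamReader example.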


2012-02-02 23:35:16

I had thought of that, but there must be a simpler way, no? For now, though, this works for me. Thanks, Andy! – Pete 2012-02-03 03:34:20

@Pete You will have to get the encoding yourself; no cmdlet provides it. I updated my answer with a different method. Both approaches use BOM detection. – 2012-02-03 05:20:13

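'Set-Content -Path BOM_Utf32.txt -Value $null -Encoding UTF32' writes the _UTF-32, little-endian_ BOM, i.e. the 'FF FE 00 00' byte sequence. However, the function Get-FileEncoding returns Unicode for it. On the other hand, the '00 00 FE FF' byte sequence is recognized as 'UTF32', but according to the [Unicode Consortium](http://unicode.org/faq/utf_bom.html#BOM) that is the _UTF-32, big-endian_ BOM. Am I wrong? Where is the error? – JosefZ 2016-05-05 21:55:41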
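JosefZ's reading matches the Unicode FAQ: FF FE 00 00 is the UTF-32 little-endian BOM and 00 00 FE FF the big-endian one. In Get-FileEncoding the FF FE test runs before any UTF-32 test, so a UTF-32 LE file is reported as Unicode, and the 00 00 FE FF branch returns UTF32, the name Set-Content uses for little-endian UTF-32. A sketch of the same function with the tests reordered; it reads the bytes through .NET rather than Get-Content so it also runs on PowerShell 6+, and it assumes BigEndianUTF32 as the -Encoding name for the big-endian case (Windows PowerShell 5.1 accepts it; check that your version does too):

```powershell
# Revised sketch of Get-FileEncoding: UTF-32 LE (FF FE 00 00) is tested
# before UTF-16 LE (FF FE), and 00 00 FE FF is reported as big-endian UTF-32.
function Get-FileEncoding {
    param ([string] $FilePath)

    # Read at most the first 4 bytes; $n is how many were actually read.
    $b = New-Object byte[] 4
    $stream = [System.IO.File]::OpenRead($FilePath)
    try { $n = $stream.Read($b, 0, 4) } finally { $stream.Dispose() }

    if ($n -ge 4 -and $b[0] -eq 0xff -and $b[1] -eq 0xfe -and $b[2] -eq 0 -and $b[3] -eq 0) {
        return 'UTF32'             # UTF-32 LE; must precede the FF FE test
    }
    if ($n -ge 4 -and $b[0] -eq 0 -and $b[1] -eq 0 -and $b[2] -eq 0xfe -and $b[3] -eq 0xff) {
        return 'BigEndianUTF32'    # UTF-32 BE
    }
    if ($n -ge 3 -and $b[0] -eq 0xef -and $b[1] -eq 0xbb -and $b[2] -eq 0xbf) {
        return 'UTF8'
    }
    if ($n -ge 2 -and $b[0] -eq 0xff -and $b[1] -eq 0xfe) {
        return 'Unicode'           # UTF-16 LE
    }
    if ($n -ge 2 -and $b[0] -eq 0xfe -and $b[1] -eq 0xff) {
        return 'BigEndianUnicode'  # UTF-16 BE
    }
    if ($n -ge 3 -and $b[0] -eq 0x2b -and $b[1] -eq 0x2f -and $b[2] -eq 0x76) {
        return 'UTF7'
    }
    return 'ASCII'                 # No BOM recognized.
}
```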
