Regex: how to find all "unconsecutive" suffixes

I have many lines of text that mix two languages. They look like this one from the file (look at עשמ and טקסט):

<a href="http://www.example.co.il/search/index.aspx?sQuery=ID:עשמ@111/13&CaseType=טקסט" />

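Goal:
I want to replace the "other language" parts of the text with their encoded form.

Problem:
I only get the first letter of the "other language" text.

I use this regex pattern: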

((href=\"http://.+?sQuery=[^\"]*)([א-ת]+)([^\"]*\"))+?

This is the full code of the method:

using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Web; // HttpUtility

string[] files = Directory.GetFiles(@"C:\Test", "*.html", SearchOption.AllDirectories); 
foreach (string file in files) 
{ 
    // Read the file as Windows-1255 (Hebrew) and relabel its charset as UTF-8
    string fileContent = File.ReadAllText(file, Encoding.GetEncoding(1255)); 
    fileContent = fileContent.Replace("windows-1255", "utf-8");  
    Regex hrefRegex = new Regex("((href=\"http://.+?sQuery=[^\"]*)([א-ת]+)([^\"]*\"))+?"); 

    fileContent = Regex.Replace(fileContent, hrefRegex.ToString(), delegate(Match match) 
    { 
        // Group 3 is supposed to be the Hebrew text; groups 2 and 4 are what comes before and after it
        string textToEncode = match.Groups[3].Value; 
        string encodedText = HttpUtility.UrlEncode(textToEncode, new UTF8Encoding(false)).ToUpper(); 
        return match.Groups[2].Value + encodedText + match.Groups[4].Value; 
    });   

    // Write the result next to the original file as UTF-8 without a BOM
    File.WriteAllText(file + "_fix.html", fileContent, new UTF8Encoding(false)); 
}

What am I doing wrong?

How can I update my regex pattern so that it finds all the "other language" parts in the href? Right now I only get the first one.

2013-09-24 Dvir

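What are you trying to achieve? –

Because of the other language and the request each browser generates, I have problems with these links in various browsers. The text gets decoded into another encoding and I can't use it. – Dvir

Could you use 'POST' with form data instead of 'GET' with the parameters passed in the 'URL'? –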

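There is only one match, and it is the whole string. If you want to convert character by character, use this regex: ([א-ת]); if you want to convert each word, use this one: ([א-ת]+).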
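To see the single-match behaviour described above, here is a small C# sketch that runs the question's pattern and the per-word pattern ([א-ת]+) against the sample href from the question. The class and method names are illustrative, not from the thread. Assuming the standard .NET regex engine, the greedy [^"]* runs all the way to the closing quote and then gives back just one character, so group 3 ends up holding a single Hebrew letter; Regex.Matches with [א-ת]+ instead returns each Hebrew word as its own match.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative sketch (hypothetical names): compares the question's pattern
// with the answer's per-word pattern on the sample href from the question.
public static class HebrewMatchDemo
{
    public const string Sample =
        "<a href=\"http://www.example.co.il/search/index.aspx?sQuery=ID:עשמ@111/13&CaseType=טקסט\" />";

    // The question's pattern produces one match spanning the whole href.
    // [^"]* greedily consumes everything up to the closing quote and backtracks
    // only far enough for [א-ת]+ to match one letter, so group 3 is one character.
    public static string QuestionGroup3()
    {
        Match m = Regex.Match(Sample, "((href=\"http://.+?sQuery=[^\"]*)([א-ת]+)([^\"]*\"))+?");
        return m.Groups[3].Value;
    }

    // The per-word pattern: every run of Hebrew letters is a separate match.
    public static string[] HebrewWords()
    {
        return Regex.Matches(Sample, "[א-ת]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With the sample string, QuestionGroup3() returns only "ט", while HebrewWords() returns both "עשמ" and "טקסט".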

Edit: to convert only the characters inside the href part, do this:

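  // hrefRegex is a Regex object, so call its instance Replace method
  fileContent = hrefRegex.Replace(fileContent, delegate(Match match) 
  { 
      // match is one whole href="..." value, so only text inside hrefs is touched
      string textToEncode = match.ToString(); 
      // Encode every Hebrew letter in the href, one character at a time
      textToEncode = Regex.Replace(textToEncode, "[א-ת]", delegate(Match smallMatch) 
      { 
          return HttpUtility.UrlEncode(smallMatch.ToString(), new UTF8Encoding(false)).ToUpper(); 
      }); 
      return textToEncode; 
  });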
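If you would rather use a single regex than a nested Replace, .NET's support for variable-length lookbehind lets you restrict the Hebrew-run pattern to text that sits inside an href value. This is an alternative, not the approach from the answer above. HrefHebrewEncoder and Encode are hypothetical names, and Uri.EscapeDataString is swapped in for HttpUtility.UrlEncode(...).ToUpper() because it already produces upper-case UTF-8 percent escapes; for Hebrew letters the two give the same output.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the thread: single-pass alternative that relies
// on .NET's variable-length lookbehind (many other regex engines lack it).
public static class HrefHebrewEncoder
{
    // A run of Hebrew letters matches only if, looking backwards, we can reach
    // href="http:// ... sQuery= without crossing a double quote, i.e. the run
    // lies inside the quoted href value.
    private static readonly Regex HebrewInHref = new Regex(
        "(?<=href=\"http://[^\"]*sQuery=[^\"]*)[א-ת]+");

    public static string Encode(string html)
    {
        // Uri.EscapeDataString percent-encodes the run as UTF-8 with
        // upper-case hex digits, so no ToUpper() is needed.
        return HebrewInHref.Replace(html, m => Uri.EscapeDataString(m.Value));
    }
}
```

For the sample href, Encode turns ID:עשמ and CaseType=טקסט into ID:%D7%A2%D7%A9%D7%9E and CaseType=%D7%98%D7%A7%D7%A1%D7%98, while Hebrew text outside an href is left unchanged.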

2013-09-24 08:36:20

I only want to convert the text inside the href, not the whole document. – Dvir

Look at my edit ;) –

Brilliant. Thanks a lot, Florian! – Dvir
