2011-02-15
0

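C# regex for movie filenames

I have been trying, without success, to use a C# regex to remove certain strings from movie names.

Examples of the file names I'm working with:

EuroTrip (2004) [SD]

Event Horizon (1997) [720]

Fast & Furious (2009) [1080p]

Star Trek (2009) [Unknown]

I'd like to remove anything in square brackets or parentheses (including the brackets themselves).

So far I'm using: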

movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([*\\(\\d{4}\\)])", ""); 

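This seems to remove the year and the parentheses fine, but I can't work out how to remove the square brackets and their contents without affecting other parts of the name. I've had a variety of results, and the closest so far has been: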

movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([?\\[+A-Z+\\]])", ""); 

This left me with:

urorip (2004)

Instead of:

EuroTrip (2004) [SD]

Any whitespace left at either end is fine, because I just perform

movieTitleToFetch = movieTitleToFetch.Trim(); 

at the end.

Thanks in advance,

Alex

Answers

3

This regex pattern should work fine, though it may need a little tweaking:

"[\[\(].+?[\]\)]" 

Regex.Replace(movieTitleToFetch, @"[\[\(].+?[\]\)]", ""); 

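This should match anything from either a "[" or a "(" up to the next occurrence of "]" or ")".

If that doesn't work, try removing the escape characters from the parentheses, like so: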

Regex.Replace(movieTitleToFetch, @"[\[(].+?[\])]", ""); 
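
To see how this pattern behaves on the titles from the question, here is a minimal sketch. The class and method names (`BracketStripper`, `Strip`) are invented for illustration, and the inputs assume the titles contain spaces, as in `Star Trek (2009) [Unknown]`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // Pattern from the answer above: an opening "[" or "(", then as few
    // characters as possible, up to the first "]" or ")".
    private static readonly Regex Bracketed = new Regex(@"[\[\(].+?[\]\)]");

    // Hypothetical helper: removes every bracketed group, then trims
    // the whitespace left behind at the ends.
    public static string Strip(string title)
    {
        return Bracketed.Replace(title, "").Trim();
    }

    public static void Demo()
    {
        var samples = new[] { "EuroTrip (2004) [SD]", "Fast & Furious (2009) [1080p]" };
        foreach (var sample in samples)
            Console.WriteLine(Strip(sample));
    }
}
```

Note that the verbatim string (`@"..."`) is what allows single backslashes here; in a regular string literal each backslash would have to be doubled.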
+0

Thanks Craigt, the first version worked a treat! (I just had to add an extra escape character for each "\".) Your help is much appreciated :) – Flexage 2011-02-15 12:31:36

0

Can't we use this instead:

if(movieTitleToFetch.Contains("(")) 
     movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("(")); 

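The code above will return the correct movie title for each of these strings:

EuroTrip (2004) [SD]

Event Horizon (1997) [720]

Fast & Furious (2009) [1080p]

Star Trek (2009) [Unknown]

If there are cases where you have no year but only the type, i.e.:

EuroTrip [SD]

Event Horizon [720]

Fast & Furious [1080]

Star Trek [Unknown]

then use this: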

if(movieTitleToFetch.Contains("(")) 
     movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("(")); 
else if(movieTitleToFetch.Contains("[")) 
     movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("[")); 
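
The same idea can be written without the if/else chain. The sketch below is one possible variant, not the answerer's code: it uses `String.IndexOfAny` to cut at whichever opening bracket comes first, and `TitleCutter` and `CutAtFirstBracket` are hypothetical names.

```csharp
using System;

public static class TitleCutter
{
    private static readonly char[] OpeningBrackets = { '(', '[' };

    // Hypothetical helper: keeps everything before the first '(' or '['.
    public static string CutAtFirstBracket(string title)
    {
        int idx = title.IndexOfAny(OpeningBrackets);
        // IndexOfAny returns -1 when neither character is present,
        // in which case the title is returned as-is (trimmed).
        return (idx >= 0 ? title.Substring(0, idx) : title).Trim();
    }
}
```

One difference from the if/else version: for a name such as `Star Trek [Unknown] (2009)`, where the square bracket comes first, this cuts at the `[` rather than at the `(`.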
0

This should do the trick:

@"(\[[^\]]*\])|(\([^\)]*\))" 

It removes anything from a "[" to the next "]", and anything from a "(" to the next ")".
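
One case where this pattern and the lazy `.+?` pattern from the first answer seem to differ is an empty pair of brackets, because `.+?` needs at least one character and so consumes the closing bracket. The sketch below compares the two; the class and method names are hypothetical, and collapsing whitespace afterwards is an extra step added here for readability.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Pattern from this answer: each bracket type is matched separately,
    // and the body may not contain the closing character.
    private static readonly Regex Paired = new Regex(@"(\[[^\]]*\])|(\([^\)]*\))");

    // Pattern from the first answer, kept here only for comparison.
    private static readonly Regex Lazy = new Regex(@"[\[\(].+?[\]\)]");

    // Hypothetical helper: strip the matches, then collapse runs of whitespace.
    private static string Clean(Regex pattern, string title)
    {
        return Regex.Replace(pattern.Replace(title, ""), @"\s+", " ").Trim();
    }

    public static string WithPaired(string title) { return Clean(Paired, title); }
    public static string WithLazy(string title) { return Clean(Lazy, title); }
}
```

With the input `Title () Extra [SD]`, `WithPaired` keeps the word `Extra`, while `WithLazy` removes everything from `(` through `]`.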

1

@Craigt is pretty much there, but it is probably cleaner to make sure the brackets match.

([\[].*?[\]]|[\(].*?[\)]) 
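
To illustrate what matching the brackets changes, the sketch below runs both patterns over a deliberately malformed name in which a `(` is closed by a `]`. `MatchedBracketDemo` and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchedBracketDemo
{
    // First answer's pattern: any opener may pair with any closer.
    private static readonly Regex AnyCloser = new Regex(@"[\[\(].+?[\]\)]");

    // This answer's pattern: "[" only closes with "]", "(" only with ")".
    private static readonly Regex SameCloser = new Regex(@"([\[].*?[\]]|[\(].*?[\)])");

    public static string StripAny(string s) { return AnyCloser.Replace(s, "").Trim(); }
    public static string StripSame(string s) { return SameCloser.Replace(s, "").Trim(); }
}
```

For `Movie (1999] [SD]`, `StripAny` treats `(1999]` as a group and removes it, whereas `StripSame` leaves the mismatched `(1999]` in place and removes only `[SD]`. Which behaviour is preferable depends on how messy the real file names are.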
0

Can't you just use:

string MovieTitle = "Star Trek (2009) [Unknown]";
// Cut at whichever bracket comes first; this assumes both '(' and '[' are present.
string movieTitleToFetch = MovieTitle.IndexOf('(') > MovieTitle.IndexOf('[')
        ? MovieTitle.Substring(0, MovieTitle.IndexOf('['))
        : MovieTitle.Substring(0, MovieTitle.IndexOf('('));
0

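I came up with .+\s(?<year>\(\d{4}\))\s(?<format>\[\w+\]), which matches any of your examples and contains named capture groups for the year and the format to help with the replacement.

This pattern translates as:

Any character, one or more repetitions
Whitespace
Literal '(' followed by 4 digits followed by literal ')' (year)
Whitespace
Literal '[' followed by alphanumeric characters, one or more repetitions, followed by literal ']' (format)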
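
As a rough sketch of how the named groups could be used, the code below reads the `year` and `format` captures and removes them from the input. The class and method names are hypothetical, and removing the captures with `String.Replace` is just one option; the answer itself does not say how the replacement should be done.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    private static readonly Regex TitlePattern =
        new Regex(@".+\s(?<year>\(\d{4}\))\s(?<format>\[\w+\])");

    // Hypothetical helper: returns the title with the captured year and
    // format removed, or the input unchanged when the pattern does not match.
    public static string Strip(string title)
    {
        Match m = TitlePattern.Match(title);
        if (!m.Success) return title;
        return title
            .Replace(m.Groups["year"].Value, "")
            .Replace(m.Groups["format"].Value, "")
            .Trim();
    }

    // Returns the raw "year" capture, which includes the parentheses.
    public static string Year(string title)
    {
        return TitlePattern.Match(title).Groups["year"].Value;
    }
}
```

Because the pattern requires both groups, separated by whitespace, a name with only a format tag such as `Star Trek [Unknown]` does not match and is returned unchanged.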

0

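I know I'm late to this thread, but I wrote a simple algorithm to sanitize the file names of downloaded movies.

It runs the following steps:

  1. Removes everything in brackets (if a year is found, it tries to keep that information)
  2. Removes a list of common words (720p, bdrip, h264 and so on)
  3. Assumes that the title may contain language information and removes it when it appears at the end of the remaining string (before the special words)
  4. If a year was not found inside brackets, looks for one at the end of the remaining string (as for languages)

Along the way it replaces dots and underscores with spaces, so that the title is ready to be used, for example, as an API query.

Here are the xUnit tests (I mostly tested with Italian titles):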

using Grappachu.Movideo.Core.Helpers.TitleCleaner; 
using SharpTestsEx; 
using Xunit; 

namespace Grappachu.MoVideo.Test 
{ 
    public class TitleCleanerTest 
    { 
     [Theory] 
     [InlineData("Avengers.Confidential.La.Vedova.Nera.E.Punisher.2014.iTALiAN.Bluray.720p.x264 - BG.mkv", 
      "Avengers Confidential La Vedova Nera E Punisher", 2014)] 
     [InlineData("Fuck You, Prof! (2013) BDRip 720p HEVC ITA GER AC3 Multi Sub PirateMKV.mkv", 
      "Fuck You, Prof!", 2013)] 
     [InlineData("Il Libro della Giungla(2016)(BDrip1080p_H264_AC3 5.1 Ita Eng_Sub Ita Eng)by siste82.avi", 
      "Il Libro della Giungla", 2016)] 
     [InlineData("Il primo dei bugiardi (2009) [Mux by Little-Boy]", "Il primo dei bugiardi", 2009)] 
     [InlineData("Il.Viaggio.Di.Arlo-The.Good.Dinosaur.2015.DTS.ITA.ENG.1080p.BluRay.x264-BLUWORLD", 
      "il viaggio di arlo", 2015)] 
     [InlineData("La Mafia Uccide Solo D'estate 2013 .avi", 
      "La Mafia Uccide Solo D'estate", 2013)] 
     [InlineData("Ip.Man.3.2015.iTA.AC3.5.1.448.Chi.Aac.BluRay.m1080p.x264.Sub.[scambiofile.info].mkv", 
      "Ip Man 3", 2015)] 
     [InlineData("Inferno.2016.BluRay.1080p.AC3.ITA.AC3.ENG.Subs.x264-WGZ.mkv", 
      "Inferno", 2016)] 
     [InlineData("Ghostbusters.2016.iTALiAN.BDRiP.EXTENDED.XviD-HDi.mp4", 
      "Ghostbusters", 2016)] 
     [InlineData("Transcendence.mkv", "Transcendence", null)] 
     [InlineData("Being Human (Forsyth, 1994).mkv", "Being Human", 1994)] 
     public void Clean_should_return_title_and_year_when_possible(string filename, string title, int? year) 
     { 
      var res = MovieTitleCleaner.Clean(filename); 

      res.Title.ToLowerInvariant().Should().Be.EqualTo(title.ToLowerInvariant()); 
      res.Year.Should().Be.EqualTo(year); 
     } 
    } 
} 

And the code:

using System; 
using System.Globalization; 
using System.IO; 
using System.Linq; 
using System.Text.RegularExpressions; 

namespace Grappachu.Movideo.Core.Helpers.TitleCleaner 
{ 
    public class MovieTitleCleanerResult 
    { 
     public string Title { get; set; } 
     public int? Year { get; set; } 
     public string SubTitle { get; set; } 
    } 

    public class MovieTitleCleaner 
    { 
     private const string SpecialMarker = "§=§"; 
     private static readonly string[] ReservedWords; 
     private static readonly string[] SpaceChars; 
     private static readonly string[] Languages; 

     static MovieTitleCleaner() 
     { 
      ReservedWords = new[] 
      { 
       SpecialMarker, "hevc", "bdrip", "Bluray", "x264", "h264", "AC3", "DTS", "480p", "720p", "1080p" 
      }; 
      var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures); 
      var l = cultures.Select(x => x.EnglishName).ToList(); 
      l.AddRange(cultures.Select(x => x.ThreeLetterISOLanguageName)); 
      Languages = l.Distinct().ToArray(); 


      SpaceChars = new[] {".", "_", " "}; 
     } 


     public static MovieTitleCleanerResult Clean(string filename) 
     { 
      var temp = Path.GetFileNameWithoutExtension(filename); 
      int? maybeYear = null; 

      // Remove what's inside brackets trying to keep year info. 
      temp = RemoveBrackets(temp, '{', '}', ref maybeYear); 
      temp = RemoveBrackets(temp, '[', ']', ref maybeYear); 
      temp = RemoveBrackets(temp, '(', ')', ref maybeYear); 

      // Remove special markers (codecs, formats, etc.)
      var tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries); 
      var title = string.Empty; 
      for (var i = 0; i < tokens.Length; i++) 
      { 
       var tok = tokens[i]; 
       if (ReservedWords.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase))) 
       { 
        if (title.Length > 0) 
         break; 
       } 
       else 
       { 
        title = string.Join(" ", title, tok).Trim(); 
       } 
      } 
      temp = title; 

      // Remove language info found just before the special markers (should not remove "English" if it's inside the title)
      tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries); 
      for (var i = tokens.Length - 1; i >= 0; i--) 
      { 
       var tok = tokens[i]; 
       if (Languages.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase))) 
        tokens[i] = string.Empty; 
       else 
        break; 
      } 
      title = string.Join(" ", tokens).Trim(); 


      // If the year was not found inside brackets, try to catch it at the end, just after the title
      if (!maybeYear.HasValue) 
      { 
       var resplit = title.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries); 
       var last = resplit.Last(); 
       if (LooksLikeYear(last)) 
       { 
        maybeYear = int.Parse(last); 
        title = title.Replace(last, string.Empty).Trim(); 
       } 
      } 


      // TODO: review this. A single dash is assumed to separate the main title from the subtitle
      var res = new MovieTitleCleanerResult(); 
      res.Year = maybeYear; 
      if (title.Count(x => x == '-') == 1) 
      { 
       var sp = title.Split('-'); 
       res.Title = sp[0]; 
       res.SubTitle = sp[1]; 
      } 
      else 
      { 
       res.Title = title; 
      } 


      return res; 
     } 

     private static string RemoveBrackets(string inputString, char openChar, char closeChar, ref int? maybeYear) 
     { 
      var str = inputString; 
      while (str.IndexOf(openChar) > 0 && str.IndexOf(closeChar) > 0) 
      { 
       var dataGraph = str.GetBetween(openChar.ToString(), closeChar.ToString()); 
       if (LooksLikeYear(dataGraph)) 
       { 
        maybeYear = int.Parse(dataGraph); 
       } 
       else 
       { 
        var parts = dataGraph.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries); 
        foreach (var part in parts) 
         if (LooksLikeYear(part)) 
         { 
          maybeYear = int.Parse(part); 
          break; 
         } 
       } 
       str = str.ReplaceBetween(openChar, closeChar, string.Format(" {0} ", SpecialMarker)); 
      } 
      return str; 
     } 

     private static bool LooksLikeYear(string dataRound) 
     { 
      // Anchor both ends so that int.Parse only ever sees a four-digit year.
      return Regex.IsMatch(dataRound, "^(19|20)[0-9][0-9]$");
     } 
    } 


    public static class StringUtils 
    { 
     public static string GetBetween(this string src, string a, string b, 
      StringComparison comparison = StringComparison.Ordinal) 
     { 
      var idxStr = src.IndexOf(a, comparison); 
      var idxEnd = src.IndexOf(b, comparison); 
      if (idxStr >= 0 && idxEnd > 0) 
      { 
       if (idxStr > idxEnd) 
        Swap(ref idxStr, ref idxEnd); 
       return src.Substring(idxStr + a.Length, idxEnd - idxStr - a.Length); 
      } 
      return src; 
     } 

     private static void Swap<T>(ref T idxStr, ref T idxEnd) 
     { 
      var temp = idxEnd; 
      idxEnd = idxStr; 
      idxStr = temp; 
     } 

     public static string ReplaceBetween(this string s, char begin, char end, string replacement = null) 
     { 
      var regex = new Regex(string.Format("\\{0}.*?\\{1}", begin, end)); 
      return regex.Replace(s, replacement ?? string.Empty); 
     } 
    } 
}