2014-12-02 36 views
2

I am trying to compute the difference between two strings.

For example:

string val1 = "Have a good day"; 
string val2 = "Have a very good day, Joe"; 

The result would be a list of strings holding the differences between the two, with 2 items: "very" and "Joe".

So far, my research on this task has not turned up much.

Edit: the result will most likely need to be 2 separate lists of strings, one holding the additions and one holding the removals.

+2

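What do you mean by "my research"? Have you written any code? Then share it. Have you not written anything? In that case, I don't think it is fair to expect us to write the code for you. Make an effort! – mason 2014-12-02 16:53:34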

+0

I have written code and done research. My code does not even come close to working as expected – mrb398 2014-12-02 16:54:31

+0

Take a look at https://github.com/mmanela/diffplex – haim770 2014-12-02 16:54:50

Answers

1

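Actually, I followed these steps:

(i) Obtain all the words from the two sentences, ignoring special characters.

(ii) Find the difference between the two lists.

CODE: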

    static void Main()
    {
        string s1 = "Have a good day";
        string s2 = "Have a very good day, Joe";
        IEnumerable<string> diff;

        // Extract the words of each sentence; empty matches are filtered out.
        MatchCollection matches = Regex.Matches(s1, @"\b[\w']*\b");
        IEnumerable<string> first = from m in matches.Cast<Match>()
                                    where !string.IsNullOrEmpty(m.Value)
                                    select TrimSuffix(m.Value);
        MatchCollection matches1 = Regex.Matches(s2, @"\b[\w']*\b");
        IEnumerable<string> second = from m in matches1.Cast<Match>()
                                     where !string.IsNullOrEmpty(m.Value)
                                     select TrimSuffix(m.Value);

        // Subtract the shorter word list from the longer one.
        if (second.Count() > first.Count())
        {
            diff = second.Except(first).ToList();
        }
        else
        {
            diff = first.Except(second).ToList();
        }
    }

    // Cuts a word at its first apostrophe, e.g. "Joe's" becomes "Joe".
    static string TrimSuffix(string word)
    {
        int apostropheLocation = word.IndexOf('\'');
        if (apostropheLocation != -1)
        {
            word = word.Substring(0, apostropheLocation);
        }
        return word;
    }

OUTPUT: very Joe
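The if/else above only computes the difference in one direction, depending on which sentence has more words. Since the question's edit asks for two separate lists (additions and removals), here is a minimal sketch that runs `Except` in both directions. The names `WordDiff`, `Words`, `Added` and `Removed` are made up for illustration, and the `\w+` pattern is an assumption about what counts as a word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class WordDiff
{
    // Hypothetical helper: extracts runs of word characters, so punctuation is ignored.
    public static List<string> Words(string text)
    {
        return Regex.Matches(text, @"\w+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    // Words that appear in 'after' but not in 'before' (duplicates collapsed by Except).
    public static List<string> Added(string before, string after)
    {
        return Words(after).Except(Words(before)).ToList();
    }

    // Words that appear in 'before' but not in 'after'.
    public static List<string> Removed(string before, string after)
    {
        return Words(before).Except(Words(after)).ToList();
    }
}

class WordDiffDemo
{
    static void Main()
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        Console.WriteLine("Added: " + string.Join(", ", WordDiff.Added(val1, val2)));     // Added: very, Joe
        Console.WriteLine("Removed: " + string.Join(", ", WordDiff.Removed(val1, val2))); // Removed:
    }
}
```

Note that this is a set difference, not a positional diff: a word that occurs in both sentences is never reported, even if it moved or its count changed.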

+1

Your code does not produce the OP's expected result. – 2014-12-02 16:58:56

+0

@ErikPhilips I have revised the answer – Sajeetharan 2014-12-02 17:22:50

-1

You have to remove the ',' in order to get the expected result:

    string s1 = "Have a good day";
    string s2 = "Have a very good day, Joe";
    int index = s2.IndexOf(','); // get the index of the char to be removed
    IEnumerable<string> diff;
    IEnumerable<string> first = s1.Split(' ').Distinct();
    IEnumerable<string> second = s2.Remove(index, 1).Split(' ').Distinct(); // remove it
    if (second.Count() > first.Count())
    {
        diff = second.Except(first).ToList();
    }
    else
    {
        diff = first.Except(second).ToList();
    }
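One caveat with the snippet above: `IndexOf(',')` returns -1 when the string has no comma, and `Remove(-1, 1)` then throws an `ArgumentOutOfRangeException`. It also handles only a single comma. A rough sketch of a more tolerant split follows; the class name `PunctuationTolerantSplit`, the method `Tokens`, and the particular punctuation set are assumptions for illustration, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PunctuationTolerantSplit
{
    // Assumed punctuation set; extend it to suit the input.
    private static readonly char[] Punctuation = { ',', '.', ';', ':', '!', '?' };

    // Splits on spaces, then trims leading/trailing punctuation from every word.
    public static List<string> Tokens(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(word => word.Trim(Punctuation))
                   .Where(word => word.Length > 0)
                   .ToList();
    }
}

class SplitDemo
{
    static void Main()
    {
        var first = PunctuationTolerantSplit.Tokens("Have a good day");
        var second = PunctuationTolerantSplit.Tokens("Have a very good day, Joe");

        Console.WriteLine(string.Join(", ", second.Except(first))); // very, Joe
    }
}
```

`Trim` only strips punctuation at the ends of a word, so something like "don't" keeps its apostrophe.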

2

This is the simplest version I could think of:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        MatchCollection words1 = Regex.Matches(val1, @"\b(\w+)\b");
        MatchCollection words2 = Regex.Matches(val2, @"\b(\w+)\b");

        var hs1 = new HashSet<string>(words1.Cast<Match>().Select(m => m.Value));
        var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value));

        // Optionally, you can use a custom comparer for the words.
        // var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value), new MyComparer());

        // After this operation, hs2 contains only 'very' and 'Joe'.
        hs2.ExceptWith(hs1);
    }
}

Custom comparer:

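public class MyComparer : IEqualityComparer<string>
{
    public bool Equals(string one, string two)
    {
        return one.Equals(two, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string item)
    {
        // The hash must agree with Equals: strings that differ only in case
        // have to produce the same hash code, or the HashSet will not match them.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(item);
    }
}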
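If ordinal case-insensitive matching is all that is needed, the built-in `StringComparer.OrdinalIgnoreCase` could be passed to the `HashSet<string>` constructor instead of a hand-written comparer. A small sketch, assuming the words have already been extracted; `CaseInsensitiveDiff` and `Added` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class CaseInsensitiveDiff
{
    // Returns the words of 'after' that do not occur in 'before', ignoring case.
    public static HashSet<string> Added(IEnumerable<string> before, IEnumerable<string> after)
    {
        var result = new HashSet<string>(after, StringComparer.OrdinalIgnoreCase);
        // ExceptWith uses the comparer of 'result', so "have" removes "Have".
        result.ExceptWith(before);
        return result;
    }
}

class CaseInsensitiveDemo
{
    static void Main()
    {
        var before = new[] { "have", "a", "good", "day" };
        var after = new[] { "Have", "a", "very", "good", "day", "Joe" };

        // HashSet does not guarantee enumeration order, so only the contents are meaningful.
        foreach (var word in CaseInsensitiveDiff.Added(before, after))
        {
            Console.WriteLine(word);
        }
    }
}
```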

1

This code:

enum Where { None, First, Second, Both } // somewhere in your source file 

//... 
var val1 = "Have a good calm day calm calm calm"; 
var val2 = "Have a very good day, Joe Joe Joe Joe"; 

var words1 = from m in Regex.Matches(val1, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>() 
       where m.Success 
       select m.Value.ToLower(); 
var words2 = from m in Regex.Matches(val2, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>() 
       where m.Success 
       select m.Value.ToLower(); 

var dic = new Dictionary<string, Where>(); 
foreach (var s in words1) 
{ 
    dic[s] = Where.First; 
} 
foreach (var s in words2) 
{ 
    Where b; 
    if (!dic.TryGetValue(s, out b)) b = Where.None; 

    switch (b) 
    { 
     case Where.None: 
      dic[s] = Where.Second; 
      break; 
     case Where.First: 
      dic[s] = Where.Both; 
      break; 
    } 
} 

foreach (var kv in dic.Where(x => x.Value != Where.Both)) 
{ 
    Console.WriteLine(kv.Key); 
} 

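gives us 'calm', 'very', ', joe' and 'joe' as the differences between the two strings: 'calm' comes from the first string, and 'very', ', joe' and 'joe' from the second. (The ', joe' entry appears because the second regex alternative matches the comma together with the word that follows it.) It also eliminates the repeated words.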

And to get two separate lists that show which text each word came from:

var list1 = dic.Where(x => x.Value == Where.First).ToList(); 
var list2 = dic.Where(x => x.Value == Where.Second).ToList(); 

foreach (var kv in list1) 
{ 
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value); 
} 

foreach (var kv in list2) 
{ 
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value); 
} 

0

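Separate the characters into two sets, then compute the relative complement of those sets.

The relative complement will be available in any good collection library.

You may need to take care to preserve the order of the characters.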
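To illustrate the point about order, here is one possible sketch of a relative complement that keeps the original sequence (and any repeats) intact, applied to words rather than characters to match the question. The names `OrderedComplement` and `Of` are invented for this example. It uses a `HashSet<string>` only for lookups and filters the source sequence in place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderedComplement
{
    // Items of 'source' that are not in 'other', in their original order.
    // Unlike Enumerable.Except, repeated items in 'source' are kept.
    public static List<string> Of(IEnumerable<string> source, IEnumerable<string> other)
    {
        var exclude = new HashSet<string>(other);
        return source.Where(item => !exclude.Contains(item)).ToList();
    }
}

class ComplementDemo
{
    static void Main()
    {
        var first = new[] { "Have", "a", "good", "day" };
        var second = new[] { "Have", "a", "very", "good", "day", "Joe", "Joe" };

        Console.WriteLine(string.Join(", ", OrderedComplement.Of(second, first))); // very, Joe, Joe
    }
}
```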