2016-05-12

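How do I search a file for certain words and then do something if any of them is found?

I want to do something if, for example, any of the words banana, horse, window, or whatever is found in a file.

This is my latest attempt: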

Dim thefile As String = "C:\application\thefile"

If File.Exists(thefile) Then
    Using reader As New StreamReader(thefile)
        While Not reader.EndOfStream
            Dim line As String = reader.ReadLine()

            If line.Contains("Banana") OrElse line.Contains("horse") OrElse line.Contains("window") OrElse line.Contains("whatever") Then
                MsgBox("Word(s) found " & line)
                Do_this_and_that()
            Else
                MsgBox("Word(s) not found")

                Exit While
            End If
        End While
    End Using
Else
    MsgBox("File not found")
End If

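There seem to be so many variations of doing this, but I can't get any of them to work with multiple words rather than just one. What is the simplest and cleanest way to do this?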

+0

Don't you want it before each condition? You only have it on the first one. –

+0

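Can you clarify your question? I don't understand what you mean by "certain words in a file *and then use an if statement*" - do you want to find the words and then do something else? Or do you (only) want to see whether a certain word is in a certain set of words? – Default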

+1

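Sure. I've updated my question. If the words are found in the file (just one word is enough), I do want to do something else. If they are not found, do nothing. – MadsTheMan

Answers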

1

This may have some performance issues, but you could try using a List(Of String):

' Requires: Imports System.IO and Imports System.Collections.Generic
Dim thefile As String = "C:\application\thefile"
Dim toCheck As New List(Of String)
' You can fill the list however you want
toCheck.Add("banana")
toCheck.Add("horse")
'...
Dim FoundWords As New List(Of String)

If File.Exists(thefile) Then
    Using reader As New StreamReader(thefile)
        While Not reader.EndOfStream
            Dim line As String = reader.ReadLine()

            ' Check every word in the list against the current line.
            ' Contains is case-sensitive; each word is recorded only once.
            For Each item As String In toCheck
                If line.Contains(item) AndAlso Not FoundWords.Contains(item) Then
                    FoundWords.Add(item)
                End If
            Next
        End While
    End Using

    If FoundWords.Count > 0 Then
        MsgBox(FoundWords.Count.ToString() & " Word(s) found")
        Do_this_and_that()
    Else
        MsgBox("Word(s) not found")
    End If
Else
    MsgBox("File not found")
End If

Now, this could be improved, but if you don't have thousands of words to look for, it should do the trick...
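Since the asker only needs to know whether at least one word occurs, a variant that stops reading at the first hit may be enough. The following is a minimal sketch, not part of the answer above: `FileContainsAny` is a hypothetical helper name, the match is assumed to be case-insensitive, and it does substring matching just like `Contains` (so "horse" also matches "horses").

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module WordCheck

    ' Hypothetical helper: returns True as soon as any line of the file
    ' contains any of the given words (case-insensitive substring match).
    Function FileContainsAny(filePath As String, words As IEnumerable(Of String)) As Boolean
        ' File.ReadLines reads lazily, so the rest of the file is not read after a hit
        For Each line As String In File.ReadLines(filePath)
            For Each w As String In words
                If line.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    Return True
                End If
            Next
        Next
        Return False
    End Function

    Sub Main()
        Dim thefile As String = "C:\application\thefile"
        Dim toCheck As String() = {"banana", "horse", "window", "whatever"}

        If Not File.Exists(thefile) Then
            Console.WriteLine("File not found")
        ElseIf FileContainsAny(thefile, toCheck) Then
            Console.WriteLine("Word(s) found")
            ' Call Do_this_and_that() here
        End If
        ' Nothing happens when no word is found, as requested in the question
    End Sub

End Module
```

Compared with collecting every match in a list, this gives up the count of found words in exchange for not reading more of the file than necessary.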

+0

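Well, it's a very small file I'm working with, so performance probably won't be an issue in my case. Anyway, does this work if only one of the words is in the file? Both message boxes show up (words not found and words found). – MadsTheMan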

+0

Updated... so now it puts all the found words in a list and then tells you how many words it found –

+0

Perfect! Thank you. :) – MadsTheMan

2

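You need to tokenize the lines and use a HashSet. This is the fastest way. Put all the words in a HashSet, then check whether each word from the file is in it: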

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var file = @"C:\application\thefile";
        var hashSet = new HashSet<string>(new[] { "banana", "horse", "window", "whatever" }.Select(x => x.ToLower()));

        foreach (var word in GetWords(file))
        {
            Console.WriteLine(word);

            if (hashSet.Contains(word))
            {
                //DoSomething();
                Console.WriteLine("\tFound!!");
                //Continue or Break;
            }
        }
    }

    // Streams the file in blocks and yields each lower-cased word
    private static IEnumerable<string> GetWords(string file)
    {
        var rg = new Regex(@"[^\p{L}]"); // matches any non-letter character
        const int bufferLen = 512;

        using (var reader = File.OpenText(file))
        {
            var word = new StringBuilder();

            while (!reader.EndOfStream)
            {
                var buffer = new char[bufferLen];

                var readChars = reader.ReadBlock(buffer, 0, bufferLen);

                for (int i = 0; i < readChars; i++)
                {
                    if (rg.IsMatch(buffer[i].ToString())) // end of the word
                    {
                        if (word.Length > 0)
                        {
                            yield return word.ToString();
                            word = new StringBuilder();
                        }
                    }
                    else
                        word.Append(Char.ToLowerInvariant(buffer[i]));
                }
            }

            // Emit the last word if the file does not end with a separator
            if (word.Length > 0)
                yield return word.ToString();
        }
    }
}

And here it is in VB:

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim filename = "C:\application\thefile"
        Dim words() As String = {"banana", "horse", "window", "whatever"}
        Dim bagOfWords = New HashSet(Of String)(words.Select(Function(x) x.ToLower()))

        For Each word As String In GetWords(filename)
            Console.WriteLine(word)

            If bagOfWords.Contains(word) Then
                'DoSomething()
                Console.WriteLine(vbTab & "Found!!")

                'Exit For if you need to terminate here
            End If
        Next
    End Sub

    ' Streams the file in blocks and yields each lower-cased word
    Private Iterator Function GetWords(filename As String) As IEnumerable(Of String)
        Dim rg = New Regex("[^\p{L}]") ' matches any non-letter character
        Const bufferLen As Integer = 512

        Using reader As New StreamReader(filename)
            Dim word = New StringBuilder()

            While Not reader.EndOfStream
                Dim buffer = New Char(bufferLen - 1) {}

                Dim readChars = reader.ReadBlock(buffer, 0, bufferLen)

                For i As Integer = 0 To readChars - 1
                    If rg.IsMatch(buffer(i).ToString()) Then
                        'end of the word
                        If word.Length > 0 Then
                            Yield word.ToString()
                            word = New StringBuilder()
                        End If
                    Else
                        word.Append([Char].ToLowerInvariant(buffer(i)))
                    End If
                Next
            End While

            ' Emit the last word if the file does not end with a separator
            If word.Length > 0 Then
                Yield word.ToString()
            End If
        End Using
    End Function

End Module
+0

If you need more sophisticated word splitting, you should look at something like nltk, but for .NET –

+0

'Tokenize' and 'HashSet' are both unfamiliar words to me, so I'll do some research on them. Thanks. – MadsTheMan

+1

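Can you translate it from C#? I've posted a small code snippet, but unfortunately it's in C#. To get the words I used a regular expression. As for the hash set, it's a very simple concept; you should take a look here: https://msdn.microsoft.com/query/dev12.query?appId=Dev12IDEF1&l=EN-US&k=k(System.Collections.Generic.HashSet%601.%23);k(TargetFrameworkMoniker-.NETFramework,Version%3Dv4.5);k(DevLang-csharp)&rd=true –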
