2014-01-08 39 views
2

What is the best way to extract inner substrings from a string in Golang? Extracting text content from HTML in Golang

Input:

"Hello <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2" 

Output:

"this is paragraph \n 
this is paragraph 2" 

Is there a string package or library in Go that already does something like this?

package main

import "fmt"

func main() {
    longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

    newString := getInnerStrings("<p>", "</p>", longString)

    fmt.Println(newString)
    // Desired output:
    // this is paragraph
    // this is paragraph 2
}

// getInnerStrings should return the text found between each start/end pair.
func getInnerStrings(start, end, str string) string {
    // Brain freeze: regex? A loop over the bytes?
    return "" // placeholder so the stub compiles; not implemented yet
}

Thanks

+1

[Here](http://golang.org/pkg/regexp). Read the section on submatches; it should help you. – tenub

+0

Yes, I saw that, but I'm not sure it's the right choice. Bookmarking it for future reference, though. – user3173591

Answers

0

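StrExtract retrieves the string between two delimiters.

StrExtract(sExper, sAdelim, sCdelim, nOccur)

sExper: the expression to search.

sAdelim: the string that delimits the beginning of the text to extract.

sCdelim: the string that delimits the end of the text to extract.

nOccur: which occurrence of sAdelim in sExper to start extracting from.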

Go Play

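package main

import (
    "fmt"
    "strings"
)

func main() {
    s := "a11ba22ba333ba4444ba55555ba666666b"
    // The 5th "a" is followed by "55555b", so this extracts "55555".
    fmt.Println("StrExtract1: ", StrExtract(s, "a", "b", 5))
}

// StrExtract returns the text between the nOccur-th occurrence of sAdelim
// and the next sCdelim, or "" if there is no such pair.
func StrExtract(sExper, sAdelim, sCdelim string, nOccur int) string {
    // Occurrences are counted from 1; element 0 of the split below is the
    // text before the first start delimiter, and a negative index would panic.
    if nOccur < 1 {
        return ""
    }

    // Element nOccur begins right after the nOccur-th start delimiter.
    aExper := strings.Split(sExper, sAdelim)
    if len(aExper) <= nOccur {
        return ""
    }

    // Keep only the part before the first end delimiter.
    aExper = strings.Split(aExper[nOccur], sCdelim)
    if len(aExper) == 1 {
        return "" // no end delimiter after this start delimiter
    }

    return aExper[0]
}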

0

Here is my function, which I have used a lot.

// GetInnerSubstring returns the text between the first occurrence of prefix
// and the next occurrence of suffix. It requires: import "strings".
func GetInnerSubstring(str string, prefix string, suffix string) string {
    // An empty prefix matches at index 0, so extraction starts at the beginning.
    beginIndex := strings.Index(str, prefix)
    if beginIndex == -1 {
        return "" // prefix not found
    }
    beginIndex += len(prefix)

    // An empty suffix means "take everything up to the end of the string".
    if len(suffix) == 0 {
        return str[beginIndex:]
    }

    endIndex := strings.Index(str[beginIndex:], suffix)
    if endIndex == -1 {
        if len(prefix) == 0 {
            return str // no prefix given and no suffix found: whole string
        }
        return "" // prefix found, but no suffix after it
    }

    return str[beginIndex : beginIndex+endIndex]
}

You can try it on the playground: https://play.golang.org/p/Xo0SJu0Vq4