2015-08-27 100 views
9

Split a string keeping the separators

Is there a simple way to split a string while keeping the separators? Instead of this:

let texte = "Ten. Million. Questions. Let's celebrate all we've done together."; 
let v: Vec<&str> = texte.split(|c: char| !(c.is_alphanumeric() || c == '\'')).filter(|s| !s.is_empty()).collect(); 

which gives ["Ten", "Million", "Questions", "Let's", "celebrate", "all", "we've", "done", "together"],

I would like something that gives me:

["Ten", ".", " ", "Million", ".", " ", "Questions", ".", " ", "Let's", " ", "celebrate", " ", "all", " ", "we've", " ", "done", " ", "together", "."]

I came up with code like this (it assumes the string begins with a letter and ends with a non-letter):

let texte = "Ten. Million. Questions. Let's celebrate all we've done together. "; 
let v1: Vec<&str> = texte.split(|c: char| !(c.is_alphanumeric() || c == '\'')).filter(|s| !s.is_empty()).collect(); 
let v2: Vec<&str> = texte.split(|c: char| c.is_alphanumeric() || c == '\'').filter(|s| !s.is_empty()).collect(); 
let mut w: Vec<&str> = Vec::new(); 

let mut j = 0; 
for i in v2 { 
    w.push(v1[j]); 
    w.push(i); 
    j = j+1; 
} 

This gives me almost the result I wrote above, and it is good enough:

["Ten", ". ", "Million", ". ", "Questions", ". ", "Let's", " ", "celebrate", " ", "all", " ", "we've", " ", "done", " ", "together", ". "]

But is there a better way to write this? I tried to enumerate v2, but it did not work, and using j inside the for loop looks crude.
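A minimal sketch of the same interleaving without the manual index j, under the same assumption as the loop above: the string starts with a word character and ends with a separator. `Iterator::zip` pairs each word with the separator run that follows it. The names `is_word_char` and `interleave_words_and_separators` are made up for this illustration.

```rust
/// Hypothetical helper: the characters that count as part of a word.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '\''
}

/// Interleaves the word runs and the separator runs of `text`.
/// Assumes `text` starts with a word character and ends with a separator;
/// `zip` stops at the shorter of the two iterators, so a trailing word
/// without a separator after it would be dropped.
fn interleave_words_and_separators(text: &str) -> Vec<&str> {
    let words = text
        .split(|c: char| !is_word_char(c))
        .filter(|s| !s.is_empty());
    let separators = text.split(is_word_char).filter(|s| !s.is_empty());

    let mut pieces = Vec::new();
    for (word, separator) in words.zip(separators) {
        pieces.push(word);
        pieces.push(separator);
    }
    pieces
}
```

Unlike indexing with v1[j], this version cannot panic on an out-of-range index, but it still relies on the two sequences lining up.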

+1

A regular expression is what you need. – Onilol

Answers

3

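I wasn't able to find anything in the standard library, so I wrote my own:

This version uses the unstable pattern API because it is more flexible, but the link above has a fallback for stable Rust that I hard-coded for my specific use case.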

#![feature(pattern)] 

use std::str::pattern::{Pattern,Searcher}; 

#[derive(Copy,Clone,Debug,PartialEq)] 
pub enum SplitType<'a> { 
    Match(&'a str), 
    Delimiter(&'a str), 
} 

pub struct SplitKeepingDelimiter<'p, P> 
    where P: Pattern<'p> 
{ 
    searcher: P::Searcher, 
    start: usize, 
    saved: Option<usize>, 
} 

impl<'p, P> Iterator for SplitKeepingDelimiter<'p, P> 
    where P: Pattern<'p>, 
{ 
    type Item = SplitType<'p>; 

    fn next(&mut self) -> Option<SplitType<'p>> {
        // The whole haystack has been consumed.
        if self.start == self.searcher.haystack().len() {
            return None;
        }

        // A delimiter was found on the previous call, after the text
        // that preceded it was returned; emit the delimiter now.
        if let Some(end_of_match) = self.saved.take() {
            let s = &self.searcher.haystack()[self.start..end_of_match];
            self.start = end_of_match;
            return Some(SplitType::Delimiter(s));
        }

        match self.searcher.next_match() {
            Some((start, end)) => {
                if self.start == start {
                    // The delimiter starts right at the current position.
                    let s = &self.searcher.haystack()[start..end];
                    self.start = end;
                    Some(SplitType::Delimiter(s))
                } else {
                    // Return the text before the delimiter first and
                    // remember where the delimiter ends.
                    let s = &self.searcher.haystack()[self.start..start];
                    self.start = start;
                    self.saved = Some(end);
                    Some(SplitType::Match(s))
                }
            },
            None => {
                // No more delimiters: the rest of the haystack is one match.
                let s = &self.searcher.haystack()[self.start..];
                self.start = self.searcher.haystack().len();
                Some(SplitType::Match(s))
            },
        }
    }
} 

pub trait SplitKeepingDelimiterExt: ::std::ops::Index<::std::ops::RangeFull, Output = str> {
    fn split_keeping_delimiter<P>(&self, pattern: P) -> SplitKeepingDelimiter<P>
        where P: for <'a> Pattern<'a>
    {
        SplitKeepingDelimiter { searcher: pattern.into_searcher(&self[..]), start: 0, saved: None }
    }
}

impl SplitKeepingDelimiterExt for str {} 

#[cfg(test)]
mod test {
    use super::SplitKeepingDelimiterExt;

    #[test]
    fn split_with_delimiter() {
        use super::SplitType::*;
        let delims = &[',', ';'][..];
        let items: Vec<_> = "alpha,beta;gamma".split_keeping_delimiter(delims).collect();
        assert_eq!(&items, &[Match("alpha"), Delimiter(","), Match("beta"), Delimiter(";"), Match("gamma")]);
    }

    #[test]
    fn split_with_delimiter_allows_consecutive_delimiters() {
        use super::SplitType::*;
        let delims = &[',', ';'][..];
        let items: Vec<_> = ",;".split_keeping_delimiter(delims).collect();
        assert_eq!(&items, &[Delimiter(","), Delimiter(";")]);
    }
}

You'll notice that I needed to track whether each piece was a delimiter or not, but it should be easy to adapt if you don't need that.
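For readers on stable Rust, the same tagging idea can be sketched without the pattern API. This is not the author's stable fallback (which is only linked, not shown here); it is an eager variant that assumes delimiters are single characters chosen by a predicate and builds on `str::match_indices`. The names `Piece` and `split_keeping_delimiters` are hypothetical.

```rust
/// Hypothetical counterpart of `SplitType` for this sketch.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Piece<'a> {
    Match(&'a str),
    Delimiter(&'a str),
}

/// Splits `text` at every character accepted by `is_delimiter`,
/// keeping the delimiters and tagging each piece.
/// Assumption: delimiters are single characters selected by a predicate.
pub fn split_keeping_delimiters<'a, F>(text: &'a str, is_delimiter: F) -> Vec<Piece<'a>>
where
    F: FnMut(char) -> bool,
{
    let mut pieces = Vec::new();
    let mut last = 0; // byte offset just past the previous delimiter
    for (index, delimiter) in text.match_indices(is_delimiter) {
        if last != index {
            // Non-empty text between two delimiters.
            pieces.push(Piece::Match(&text[last..index]));
        }
        pieces.push(Piece::Delimiter(delimiter));
        last = index + delimiter.len();
    }
    if last < text.len() {
        // Trailing text after the final delimiter.
        pieces.push(Piece::Match(&text[last..]));
    }
    pieces
}
```

Calling `split_keeping_delimiters("alpha,beta;gamma", |c| c == ',' || c == ';')` yields the same sequence of pieces as the first test above, collected into a `Vec` rather than produced lazily.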

+0

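Wow, I need to learn more about Rust to understand this code. However, I figured that by splitting the string twice I get the words and then the opposite pattern. What do you think of my new code? – Keho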

+1

This will be simpler once [`str::match_indices`](http://doc.rust-lang.org/nightly/std/primitive.str.html#method.match_indices) is stable. – bluss

3

Using str::match_indices:

let text = "Ten. Million. Questions. Let's celebrate all we've done together."; 

let mut result = Vec::new(); 
let mut last = 0; 
for (index, matched) in text.match_indices(|c: char| !(c.is_alphanumeric() || c == '\'')) {
    if last != index {
        result.push(&text[last..index]);
    }
    result.push(matched);
    last = index + matched.len();
}
if last < text.len() { 
    result.push(&text[last..]); 
} 

println!("{:?}", result); 

This prints:

["Ten", ".", " ", "Million", ".", " ", "Questions", ".", " ", "Let\'s", " ", "celebrate", " ", "all", " ", "we\'ve", " ", "done", " ", "together", "."]
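The output above emits every separator character on its own, while the question's second listing groups consecutive separators (". " as one piece). A minimal sketch of that grouped variant, assuming pieces are simply maximal runs of characters on the same side of a word predicate; `split_into_runs` is a hypothetical name, not a standard library method.

```rust
/// Splits `text` into maximal runs of word characters and non-word
/// characters, in order, so that nothing is dropped.
/// `is_word_char` decides which side a character belongs to.
fn split_into_runs<'a>(text: &'a str, is_word_char: impl Fn(char) -> bool) -> Vec<&'a str> {
    let mut runs = Vec::new();
    let mut run_start = 0; // byte offset where the current run began
    let mut previous_class: Option<bool> = None;

    for (index, c) in text.char_indices() {
        let class = is_word_char(c);
        if let Some(previous) = previous_class {
            if previous != class {
                // The class changed: close the current run here.
                runs.push(&text[run_start..index]);
                run_start = index;
            }
        }
        previous_class = Some(class);
    }
    if run_start < text.len() {
        // Final run up to the end of the string.
        runs.push(&text[run_start..]);
    }
    runs
}
```

Concatenating the returned pieces gives back the original string, and the function makes no assumption about whether the text starts or ends with a word.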
