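I know you wanted to use `CharacterSet` rather than `String`, but `CharacterSet` does not (yet, at least) support characters that are composed of more than one `Unicode.Scalar`. See the "family" emoji that Apple demonstrated in the string discussion of the WWDC 2017 video What's New in Swift, or the international flag characters (e.g. the Japanese flag, 🇯🇵, or the Jamaican flag, 🇯🇲). Emoji with skin-tone modifiers exhibit this behavior too (a skin-toned emoji versus its unmodified base emoji).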
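So I would be wary of using `CharacterSet` (which is a set of "Unicode character values for use in search operations"). Or, if you want to provide this method for the sake of convenience, be aware that it will not work correctly with characters represented by multiple Unicode scalars.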
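To make the problem concrete, here is a small sketch (the constant names are only illustrative) showing that a flag is one `Character` built from two Unicode scalars, and that two different flags can share their leading scalar. A scalar-based membership test that only looks at the first scalar could therefore confuse them:

```swift
// Two regional-indicator scalars (J + P) form the single Character for the Japanese flag.
let japan: Character = "\u{1F1EF}\u{1F1F5}"
// J + M forms the Jamaican flag; it shares its first scalar with the Japanese flag.
let jamaica: Character = "\u{1F1EF}\u{1F1F2}"

let japanScalars = Array(japan.unicodeScalars)
let jamaicaScalars = Array(jamaica.unicodeScalars)

print(String(japan).count)                    // 1 Character
print(japanScalars.count)                     // 2 Unicode scalars
print(japan == jamaica)                       // false: different Characters
print(japanScalars[0] == jamaicaScalars[0])   // true: same leading scalar
```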
So you might offer a scanner that provides `skip` methods in both `CharacterSet` and `String` renditions:
class MyScanner {
    let string: String
    var index: String.Index

    init(_ string: String) {
        self.string = string
        index = string.startIndex
    }

    /// The portion of the string that has not been scanned yet.
    var remains: String { return String(string[index...]) }

    /// Skip characters in a string
    ///
    /// This rendition is safe to use with strings that have characters
    /// represented by more than one unicode scalar.
    ///
    /// - Parameter skipString: A string with all of the characters to skip.
    func skip(charactersIn skipString: String) {
        while index < string.endIndex, skipString.contains(string[index]) {
            index = string.index(after: index)
        }
    }

    /// Skip characters in character set
    ///
    /// Note, character sets cannot (yet) include characters that are represented by
    /// more than one unicode scalar (e.g. the family emoji, flags, or skin-toned
    /// emoji). If you want to test for these multi-scalar characters, you have to
    /// use the `String` rendition of this method.
    ///
    /// This will simply stop scanning if it encounters a multi-scalar character in
    /// the string being scanned (because it knows the `CharacterSet` can only represent
    /// single-scalar characters) and you want to avoid false positives (e.g., mistaking
    /// the Jamaican flag, 🇯🇲, for the Japanese flag, 🇯🇵).
    ///
    /// - Parameter characterSet: The character set to check for membership.
    func skip(charactersIn characterSet: CharacterSet) {
        while index < string.endIndex,
            string[index].unicodeScalars.count == 1,
            let character = string[index].unicodeScalars.first,
            characterSet.contains(character) {
                index = string.index(after: index)
        }
    }
}
Thus, your simple example will still work:
let scanner = MyScanner("fizz buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print(scanner.remains) // "buzz fizz"
But use the `String` rendition if the characters you want to skip might include multiple Unicode scalars:
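let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}" // family emoji: man, woman, girl, boy joined by ZWJ
let boy = "\u{1F466}" // boy emoji
let charactersToSkip = family + boy
let string = boy + family + "foobar" // boy, family, then "foobar"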
let scanner = MyScanner(string)
scanner.skip(charactersIn: charactersToSkip)
print(scanner.remains) // foobar
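As an aside, the same `Character`-based comparison can be sketched without a scanner class at all. The following is not part of the answer's `MyScanner`; it is a hypothetical helper (`dropLeading` is a made-up name) that assumes a `Set<Character>` and the standard library's `drop(while:)`. Because it compares whole characters, a skin-toned emoji is not mistaken for its base emoji:

```swift
// Hypothetical helper, not part of the MyScanner class above.
// Drops leading characters that are members of `skip`, comparing whole Characters.
func dropLeading(_ skip: Set<Character>, from string: String) -> Substring {
    return string.drop(while: { skip.contains($0) })
}

let thumbsUp: Character = "\u{1F44D}"                // base emoji, one scalar
let tonedThumbsUp: Character = "\u{1F44D}\u{1F3FD}"  // base emoji + skin-tone modifier, two scalars
let input = String(thumbsUp) + String(tonedThumbsUp) + "foobar"

// Only the base emoji is in the skip set, so scanning stops at the toned one.
let rest = dropLeading([thumbsUp], from: input)
print(rest.first == tonedThumbsUp)   // true
print(rest.count)                    // 7
```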
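As Michael Waterfall points out in the comments below, `CharacterSet` has a bug and does not even handle 32-bit `Unicode.Scalar` values correctly, meaning that it does not even handle single-scalar characters properly if the value exceeds `0xffff` (including emoji, among others). The `String` rendition, above, handles these correctly, though.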
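For illustration, here is a minimal sketch of what "a single scalar above `0xffff`" looks like, using the grinning-face emoji as an assumed example. It does not exercise `CharacterSet` itself (whose behavior here may depend on the Foundation version); it only shows that such a character is one scalar that needs two UTF-16 code units, and that `Character`-based membership is unaffected:

```swift
let grinning: Character = "\u{1F600}"   // grinning face, U+1F600
let scalars = Array(grinning.unicodeScalars)

print(scalars.count)                    // 1: a single Unicode scalar
print(scalars[0].value > 0xFFFF)        // true: outside the 16-bit range
print(String(grinning).utf16.count)     // 2: needs a UTF-16 surrogate pair

// Character-based membership does not depend on the scalar's width.
let skipString = "\u{1F600}\u{1F601}"
print(skipString.contains(grinning))    // true
```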
Source
2017-08-25 03:56:35
Rob
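There is a system class called `NSScanner`, bridged into Swift as `Scanner`. Have you checked it out? –
`NSScanner` certainly looks like the wheel I am reinventing. I'm not crazy about the NS semantics (the `NSString?` parameters), but it might do the trick. Out of curiosity, I browsed the [source](https://github.com/apple/swift-corelibs-foundation/blob/master/Foundation/Scanner.swift): it converts the `String` to an `Array`, and its `skip` function then uses `set.contains(UnicodeScalar(currentCharacter)!)`. – PocketLogic
If you don't like the NS semantics of `NSScanner`, use Foundation's `Scanner`, which doesn't use NS types. And, of course, don't define your own class with the name of an existing class. That just leads to confusion. – Rob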