2017-08-05 14 views
0

Parsing and fixing times in a database of messy strings

I have to parse 10K entries from a database.

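The database has a field called working hours that shows the office hours of car dealerships.

The problem is that this field contains free-form descriptive text like this: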

Office working hours from 10 am - 4 pm 
Open from 9AM to 5PM 
Main Showroom from 10:00AM - 5:00PM 
Open 10 AM to 13 PM 
Office 10AM to 3PM -- Showroom 9AM to 4PM 

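As you can see, the styles vary: am/pm in lowercase and uppercase, with and without spaces, times with and without zeros and colons, and even a mix of 12-hour and 24-hour styles that produces invalid times. In other words, a mess. On top of that, a line may or may not contain more than one time range.

I want to convert the whole thing to 24-hour format.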

Office working hours from 10:00 - 16:00 hours 
Open from 9:00 to 17:00 hours 
Main Showroom from 10:00 - 17:00 hours 
Open 10:00 to 13:00 hours 
Office 10:00 to 15:00 -- Showroom 9:00 to 16:00 hours 

I could write an endless series of ifs like this one, one for every hour:

if ([text containsString:@"7 PM"]) {
    text = [text stringByReplacingOccurrencesOfString:@"7 PM" withString:@"19:00"];
}

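But that would take billions of lines and would not be efficient. I would have to test for uppercase, lowercase, with and without spaces, and invalid entries.

There must be a simpler way to do this...

Any ideas?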

+0

Check out NSDateFormatter. – shallowThought

+0

I don't think that will work. These times are too messy. I'm trying it and it barely catches any of the times. Most of the time it crashes. – SpaceDog

+0

I wonder if `NSDataDetector` can help. – Larme

Answers

1

A combination of NSRegularExpression and NSDateFormatter will come in handy. The result looks like this:

Results

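13PM needs a manual edit, or there may be a way to fix it automatically.

The regular expression is not perfect: it will also capture things like 36 AM, but the date formatter will return nil for those.

The code is below; you can run it as is: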

NSString *test1 = @"Office working hours from 10 am - 4 pm"; 
NSString *test2 = @"Open from 9AM to 5PM"; 
NSString *test3 = @"Main Showroom from 10:00AM - 5:00PM"; 
NSString *test4 = @"Open 10 AM to 13 PM"; 
NSString *test5 = @"Office 10AM to 3PM -- Showroom 9AM to 4PM"; 

NSMutableArray *newStrings = [NSMutableArray array]; 

// [0-9]+          -> match 1 or more digits
// (?:\\:[0-9]+)?  -> optionally match ":" followed by 1 or more digits
// ( )*            -> match 0 or more spaces
// (am|pm)         -> am or pm; with the case-insensitive option this also matches aM, Pm, AM

NSString *hourPattern = @"([0-9]+(?:\\:[0-9]+)?( )*(am|pm))";
NSError *error = nil; 
NSRegularExpression *miniFormatter = [NSRegularExpression
             regularExpressionWithPattern:hourPattern
             options:NSRegularExpressionCaseInsensitive
             error:&error];

// Check the returned object rather than the error pointer.
if(!miniFormatter)
{
    NSLog(@"%@", error.localizedDescription);
    return;
}

for(NSString *text in @[test1, test2, test3, test4, test5])
{
    NSArray<NSTextCheckingResult *> *matches = [miniFormatter matchesInString:text
                                                                      options:kNilOptions
                                                                        range:NSMakeRange(0, text.length)];

    NSString *textToChange = [text copy];

    for(NSTextCheckingResult *result in matches)
    {
        NSString *foundTime = [text substringWithRange:result.range];

        // Keep the untouched match; it is used to locate the time in textToChange.
        NSString *foundTimeOriginal = [foundTime copy];

        // Step 1: Remove whitespace for parsing.
        foundTime = [foundTime stringByReplacingOccurrencesOfString:@" " withString:@""];

        // Step 2: Make am/pm uppercase.
        foundTime = [foundTime uppercaseString];

        NSDateFormatter *dateFormatter = [NSDateFormatter new];
        // A fixed locale keeps the user's region and 12/24-hour settings from affecting parsing.
        [dateFormatter setLocale:[NSLocale localeWithLocaleIdentifier:@"en_US_POSIX"]];
        [dateFormatter setTimeZone:[NSTimeZone timeZoneWithName:@"GMT"]]; // You may change it accordingly.

        // Step 3: Detect if it's in hh:mm format or hh format.
        if([foundTime containsString:@":"])
        {
            [dateFormatter setDateFormat:@"hh:mma"]; // hh:mm format
        }
        else
        {
            [dateFormatter setDateFormat:@"hha"]; // hh format
        }

        NSDate *foundDate = [dateFormatter dateFromString:foundTime];

        if(!foundDate)
        {
            // There's a problem with parsing (such as 13PM).
            // Skip it; it has to be fixed manually.
            continue;
        }

        // Step 4: Convert to 24-hour format.
        [dateFormatter setDateFormat:@"HH:mm"];
        NSString *convertedTime = [dateFormatter stringFromDate:foundDate];

        NSRange currentRange = [textToChange rangeOfString:foundTimeOriginal];
        textToChange = [textToChange stringByReplacingCharactersInRange:currentRange withString:convertedTime];
    }

    [newStrings addObject:textToChange];
}

for(NSString *text in newStrings) 
{ 
    NSLog(@"%@", text); 
} 
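
For the 13PM case mentioned above, one possible automatic fix is to do the conversion arithmetically instead of going through NSDateFormatter. The following is only a sketch: `ConvertTo24Hour` is a hypothetical helper, not part of the code above, and it assumes that an hour above 12 followed by PM (such as "13 PM") was meant as a 24-hour value. Whether that assumption holds for your data is something you would have to check. Out-of-range values such as "36 AM" still return nil so they can be reviewed by hand.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: converts one matched token such as @"10 am",
// @"5:00PM" or @"13 PM" to @"HH:mm". Returns nil for anything that is
// not a plausible time (for example @"36 AM").
static NSString *ConvertTo24Hour(NSString *token)
{
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"^([0-9]{1,2})(?::([0-9]{2}))?\\s*(am|pm)$"
                             options:NSRegularExpressionCaseInsensitive
                               error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:token
                                                    options:0
                                                      range:NSMakeRange(0, token.length)];
    if (!match) return nil;

    NSInteger hour = [[token substringWithRange:[match rangeAtIndex:1]] integerValue];

    // Group 2 (the minutes) is optional; its location is NSNotFound when absent.
    NSRange minuteRange = [match rangeAtIndex:2];
    NSInteger minute = (minuteRange.location == NSNotFound)
        ? 0
        : [[token substringWithRange:minuteRange] integerValue];

    NSString *suffix = [[token substringWithRange:[match rangeAtIndex:3]] lowercaseString];
    BOOL isPM = [suffix isEqualToString:@"pm"];

    if (hour < 1 || hour > 23 || minute > 59) return nil;

    if (hour > 12) {
        // Assumption: "13 PM" is a 24-hour value with a redundant PM.
        // "13 AM" is contradictory, so it is rejected.
        if (!isPM) return nil;
    } else if (hour == 12) {
        hour = isPM ? 12 : 0;   // 12 AM is midnight, 12 PM is noon
    } else if (isPM) {
        hour += 12;
    }

    return [NSString stringWithFormat:@"%02ld:%02ld", (long)hour, (long)minute];
}
```

In the loop above, this could stand in for the date-formatter steps: pass `foundTimeOriginal` to the helper and `continue` when it returns nil.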

Hope this helps.

+0

Wow! Brilliant! Thank you! I tried to decipher the pattern you used, but I don't understand this part: `:\\:` – SpaceDog

+1

You're welcome! :) I realized the escape isn't necessary, `(?::[0-9]+)?` works too, but it parses like this: `?:` -> optional, `\\:` -> escaped ":" – EDUsta

+1

Ah, I see. Thank you! – SpaceDog
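
A small check of the pattern discussed in these comments may help. In the ICU syntax that NSRegularExpression uses, `(?: ... )` is a non-capturing group, and it is the `?` after the closing parenthesis that makes the group optional. The sketch below uses two hypothetical helpers, `CaptureGroupCount` and `MatchesWhole`, written only for this illustration, to show that the escaped and unescaped colon behave the same and that `(?: ... )` adds no capture group.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: number of capture groups in a pattern,
// or -1 if the pattern does not compile.
static NSInteger CaptureGroupCount(NSString *pattern)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    return regex ? (NSInteger)regex.numberOfCaptureGroups : -1;
}

// Hypothetical helper: YES when the first match of the pattern covers all of `text`.
static BOOL MatchesWhole(NSString *pattern, NSString *text)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    if (!regex) return NO;

    NSRange whole = NSMakeRange(0, text.length);
    NSRange found = [regex rangeOfFirstMatchInString:text options:0 range:whole];
    return NSEqualRanges(found, whole);
}
```

For example, `CaptureGroupCount(@"[0-9]+(?:\\:[0-9]+)?")` is 0 while `CaptureGroupCount(@"[0-9]+(:[0-9]+)?")` is 1, and `MatchesWhole` returns YES for @"10:30" with either `\\:` or a plain `:` inside the group.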