2013-07-17
2

NSURLResponse returns nil for textEncodingName when detecting HTML encoding

I load a website's HTML with this call:

NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
[request setValue:@"utf-8" forHTTPHeaderField:@"Accept-Encoding"];
[request setValue:@"text/html" forHTTPHeaderField:@"Accept"];
[NSURLConnection sendAsynchronousRequest:request
                                   queue:[NSOperationQueue currentQueue]
                       completionHandler:^(NSURLResponse *response, NSData *data, NSError *error) { ... }];

Then, to convert the NSData to an NSString, I need to know the encoding, so I call:

NSString *textEncoding = [response textEncodingName]; 

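from within the completion block, but it returns nil for websites that do not specify the "Content-Encoding" header field.

If I don't know the encoding, [[NSString alloc] initWithData:data encoding:responseEncoding] doesn't give me readable HTML.

How can I detect the correct encoding for websites that don't send the "Content-Encoding" header field?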

Answers

2

You can try different encodings and see which one produces readable text, like this:

static const NSStringEncoding encodingPriority[] = {
    NSUTF8StringEncoding, 
    NSASCIIStringEncoding, 
    NSISOLatin1StringEncoding, 
    NSISOLatin2StringEncoding, 
    NSUnicodeStringEncoding, 
    NSWindowsCP1251StringEncoding, 
    NSWindowsCP1252StringEncoding, 
    NSWindowsCP1253StringEncoding, 
    NSWindowsCP1254StringEncoding, 
    NSWindowsCP1250StringEncoding, 
    NSNEXTSTEPStringEncoding, 
    NSJapaneseEUCStringEncoding, 
    NSNonLossyASCIIStringEncoding, 
    NSShiftJISStringEncoding,   /* kCFStringEncodingDOSJapanese */ 
    NSISO2022JPStringEncoding,  /* ISO 2022 Japanese encoding for e-mail */ 
    NSMacOSRomanStringEncoding, 
    NSUTF16BigEndianStringEncoding, 
    NSUTF16LittleEndianStringEncoding, 
    NSUTF32StringEncoding, 
    NSUTF32BigEndianStringEncoding, 
    NSUTF32LittleEndianStringEncoding 
}; 

#define REQUIRED_HTML_STRING @"<html" 

- (NSString *)htmlStringForUnknownEncodingData:(NSData *)data detectedEncoding:(NSStringEncoding *)detectedEncoding
{
    // sizeof(array) is a byte count; divide by the element size to get the number of entries
    const size_t count = sizeof(encodingPriority) / sizeof(encodingPriority[0]);

    for (size_t i = 0; i < count; i++) {
        NSStringEncoding encoding = encodingPriority[i];

        // Try this encoding; initWithData:encoding: returns nil if the bytes are not valid for it
        NSString *html = [[NSString alloc] initWithData:data encoding:encoding];

        // Require a known marker, because a wrong encoding can still decode to unreadable text
        if (html && [html rangeOfString:REQUIRED_HTML_STRING options:NSCaseInsensitiveSearch].location != NSNotFound) {
            if (detectedEncoding) {
                *detectedEncoding = encoding;
            }
            return html;
        }
    }
    return nil;
}

Then, to detect which encoding the HTML in your NSData uses, call:

NSStringEncoding encoding;
NSString *html = [self htmlStringForUnknownEncodingData:data detectedEncoding:&encoding];

if (html)
    NSLog(@"Encoding detected!");
else
    NSLog(@"No encoding detected");

0

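I tried @Kof's code and noticed that the response encoding I got was UTF-8. If you pass the encoding name directly, as in [[NSString alloc] initWithData:data encoding:@"utf-8"], it will return nil. That is because the encoding parameter takes an NSStringEncoding, which is an integer (enum-style) type, not an NSString. If you try [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] instead, it returns the result.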
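
The mismatch described here (a charset name string versus an NSStringEncoding value) can be bridged with Core Foundation when textEncodingName is not nil. The following is a minimal sketch, not part of either answer: EncodingForCharsetName and HTMLStringFromData are hypothetical helper names, and it assumes the name reported by the response is an IANA charset name such as "utf-8".

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answers): maps an IANA charset
// name such as @"utf-8" to an NSStringEncoding. Returns 0 when the name is
// nil, empty, or unrecognized (0 is not one of the NSStringEncoding constants).
static NSStringEncoding EncodingForCharsetName(NSString *name)
{
    if (name.length == 0) {
        return 0;
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return 0;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}

// Hypothetical helper: decodes the body with the named charset, or returns
// nil when the charset is unknown so the caller can fall back to guessing.
static NSString *HTMLStringFromData(NSData *data, NSString *charsetName)
{
    NSStringEncoding encoding = EncodingForCharsetName(charsetName);
    if (encoding == 0) {
        return nil;
    }
    return [[NSString alloc] initWithData:data encoding:encoding];
}
```

Inside the completion handler, this could be called as HTMLStringFromData(data, [response textEncodingName]). When it returns nil, the trial-and-error method from the first answer remains available as a fallback.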