2013-05-20 45 views

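I'm new to iOS development. I have implemented an NSXMLParser, but I don't know how to handle tags that share the same name yet carry different content, such as <description>. In some feeds this tag contains only a summary; in others it also contains an "img src", which I want to extract as well (with or without CDATA). How can I implement this in my NSXMLParser to extract the images?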

Examples of description tags from which I need to grab the images and then pass them to my UIImageView:

<description><![CDATA[ <p>Roger Craig Smith and Troy Baker to play Batman and the Joker respectively in upcoming action game; Deathstroke confirmed as playable character. </p><p><img src="http://image.com.com/gamespot/images/2013/139/ArkhamOrigins_29971_thumb.jpg" 

<description>&lt;img src=&quot;http://cdn.gsmarena.com/vv/newsimg/13/05/samsung-galaxy-s4-active-photos/thumb.jpg&quot; width=&quot;70&quot; height=&quot;92&quot; hspace=&quot;3&quot; alt=&quot;&quot; border=&quot;0&quot; align=left style="background:#333333;padding:0px;margin:0px 4px 0px 0px;border-style:solid;border-color:#aaaaaa;border-width:1px" /&gt; &lt;p&gt; 

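I think @Rob's example solves my case, but I don't know how to include it in my NSXMLParser, shown below, so that the text and the images are separated. With this parser I can only grab the text (the summary).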

My NSXMLParser:

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    // Remember the current element so foundCharacters knows where to append.
    element = [elementName copy];

    if ([elementName isEqualToString:@"item"])
    {
        elements = [[NSMutableDictionary alloc] init];
        title = [[NSMutableString alloc] init];
        date = [[NSMutableString alloc] init];
        summary = [[NSMutableString alloc] init];
        link = [[NSMutableString alloc] init];
        img = [[NSMutableString alloc] init];
        imageLink = [[NSMutableString alloc] init];
    }

    // These elements carry the image URL in their "url" attribute.
    // mutableCopy keeps imageLink mutable for the later appendString: calls.
    if ([elementName isEqualToString:@"media:thumbnail"]) {
        NSLog(@"thumbnails media:thumbnail: %@", attributeDict);
        imageLink = [[attributeDict objectForKey:@"url"] mutableCopy];
    }

    if ([elementName isEqualToString:@"media:content"]) {
        NSLog(@"thumbnails media:content: %@", attributeDict);
        imageLink = [[attributeDict objectForKey:@"url"] mutableCopy];
    }

    if ([elementName isEqualToString:@"enclosure"]) {
        NSLog(@"thumbnails Enclosure %@", attributeDict);
        imageLink = [[attributeDict objectForKey:@"url"] mutableCopy];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if ([element isEqualToString:@"title"])
    {
        [title appendString:string];
    }
    else if ([element isEqualToString:@"pubDate"])
    {
        [date appendString:string];
    }
    else if ([element isEqualToString:@"description"])
    {
        [summary appendString:string];
    }
    else if ([element isEqualToString:@"media:description"])
    {
        [summary appendString:string];
    }
    else if ([element isEqualToString:@"link"])
    {
        [link appendString:string];
    }
    else if ([element isEqualToString:@"url"]) {
        [imageLink appendString:string];
    }
    else if ([element isEqualToString:@"src"]) {
        [imageLink appendString:string];
    }
    else if ([element isEqualToString:@"content:encoded"]) {
        // Note: the parser may deliver an element's text in several chunks,
        // so an <img> tag split across two chunks is not found here.
        NSString *imgString = [self getImage:string];
        if (imgString != nil) {
            [img appendString:imgString];
            NSLog(@"Content of img:%@", img);
        }
    }
}

// Returns the src value of the first <img> tag in htmlString, or nil if there is none.
- (NSString *)getImage:(NSString *)htmlString {
    NSString *url = nil;

    NSScanner *theScanner = [NSScanner scannerWithString:htmlString];

    [theScanner scanUpToString:@"<img" intoString:nil];
    if (![theScanner isAtEnd]) {
        [theScanner scanUpToString:@"src" intoString:nil];
        NSCharacterSet *charset = [NSCharacterSet characterSetWithCharactersInString:@"\"'"];
        [theScanner scanUpToCharactersFromSet:charset intoString:nil];
        [theScanner scanCharactersFromSet:charset intoString:nil];
        [theScanner scanUpToCharactersFromSet:charset intoString:&url];
    }
    return url;
}

@end 

Answers


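In your example you just have two description elements, each with an img tag embedded in it. Simply parse description as you normally would, then pull out the img tags (using a regular expression, as in my retrieveImageSourceTagsViaRegex below, or a scanner).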

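Note that you don't have to handle the CDATA and non-CDATA renditions separately if you don't want to. While NSXMLParserDelegate provides a foundCDATA routine, I'd actually be inclined not to implement it. In the absence of foundCDATA, the standard foundCharacters routine of NSXMLParser gracefully handles both renditions of your description tag (with and without CDATA) seamlessly.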
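If foundCDATA does have to be implemented for some other reason, one way to avoid losing the summary text is to decode the block and append it to the same buffer that foundCharacters fills, rather than assigning over that buffer. The following is only a sketch: the class name DescriptionCollector, its properties, and the helper DescriptionsInXML are invented for illustration, and it assumes the CDATA bytes are UTF-8.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate (not from the original posts): collects the text of
// every <description>, whether it arrives as plain characters or as CDATA.
@interface DescriptionCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *descriptions;
@property (nonatomic, strong) NSMutableString *buffer;
@end

@implementation DescriptionCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"description"])
        self.buffer = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // buffer is nil outside <description>, so this is a no-op there.
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    // Assumption: the block is UTF-8. Append instead of overwriting, so any
    // text already collected for this element is kept.
    NSString *text = [[NSString alloc] initWithData:CDATABlock encoding:NSUTF8StringEncoding];
    if (text)
        [self.buffer appendString:text];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"description"] && self.buffer) {
        [self.descriptions addObject:[self.buffer copy]];
        self.buffer = nil;
    }
}

@end

// Hypothetical convenience wrapper: parses an XML string and returns the
// collected description strings.
static NSArray *DescriptionsInXML(NSString *xml)
{
    DescriptionCollector *collector = [[DescriptionCollector alloc] init];
    collector.descriptions = [NSMutableArray array];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    parser.delegate = collector;
    [parser parse];
    return collector.descriptions;
}
```

Deleting the foundCDATA method from this sketch should leave the collected strings unchanged, which is the simpler route recommended above.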

Consider the following hypothetical XML:

<xml> 
    <descriptions> 
     <description><![CDATA[ <p>Roger Craig Smith and Troy Baker to play Batman and the Joker respectively in upcoming action game; Deathstroke confirmed as playable character. </p><p><img src="http://image.com.com/gamespot/images/2013/139/ArkhamOrigins_29971_thumb.jpg">]]></description> 
     <description>&lt;img src=&quot;http://cdn.gsmarena.com/vv/newsimg/13/05/samsung-galaxy-s4-active-photos/thumb.jpg&quot; width=&quot;70&quot; height=&quot;92&quot; hspace=&quot;3&quot; alt=&quot;&quot; border=&quot;0&quot; align=left style="background:#333333;padding:0px;margin:0px 4px 0px 0px;border-style:solid;border-color:#aaaaaa;border-width:1px" /&gt; &lt;p&gt;</description> 
    </descriptions> 
</xml> 

The following parser parses both description entries and grabs the image URLs out of them. As you can see, no special handling of CDATA is needed:

@interface ViewController() <NSXMLParserDelegate> 

@property (nonatomic, strong) NSMutableString *description; 
@property (nonatomic, strong) NSMutableArray *results; 

@end 

@implementation ViewController 

- (void)viewDidLoad 
{ 
    [super viewDidLoad]; 
    // Do any additional setup after loading the view, typically from a nib. 

    NSURL *filename = [[NSBundle mainBundle] URLForResource:@"test" withExtension:@"xml"]; 
    NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:filename]; 
    parser.delegate = self; 
    [parser parse]; 

    // full array of dictionary entries 

    NSLog(@"results = %@", self.results); 
} 

- (NSMutableArray *)retrieveImageSourceTagsViaRegex:(NSString *)string
{
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];

    NSMutableArray *results = [NSMutableArray array];

    [regex enumerateMatchesInString:string
                            options:0
                              range:NSMakeRange(0, [string length])
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
                             // Capture group 2 is the value of the src attribute.
                             [results addObject:[string substringWithRange:[result rangeAtIndex:2]]];
                         }];

    return results;
}

#pragma mark - NSXMLParserDelegate 

- (void)parserDidStartDocument:(NSXMLParser *)parser 
{ 
    self.results = [NSMutableArray array]; 
} 

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"description"])
        self.description = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (self.description)
        [self.description appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"description"])
    {
        NSArray *imgTags = [self retrieveImageSourceTagsViaRegex:self.description];
        NSDictionary *result = @{@"description": self.description, @"imgs" : imgTags};
        [self.results addObject:result];
        self.description = nil;
    }
}

@end 

This produces the following results (note, no CDATA):

results = (
     { 
     description = " <p>Roger Craig Smith and Troy Baker to play Batman and the Joker respectively in upcoming action game; Deathstroke confirmed as playable character. </p><p><img src=\"http://image.com.com/gamespot/images/2013/139/ArkhamOrigins_29971_thumb.jpg\">"; 
     imgs =   (
      "http://image.com.com/gamespot/images/2013/139/ArkhamOrigins_29971_thumb.jpg" 
     ); 
    }, 
     { 
     description = "<img src=\"http://cdn.gsmarena.com/vv/newsimg/13/05/samsung-galaxy-s4-active-photos/thumb.jpg\" width=\"70\" height=\"92\" hspace=\"3\" alt=\"\" border=\"0\" align=left style=\"background:#333333;padding:0px;margin:0px 4px 0px 0px;border-style:solid;border-color:#aaaaaa;border-width:1px\" /> <p>"; 
     imgs =   (
      "http://cdn.gsmarena.com/vv/newsimg/13/05/samsung-galaxy-s4-active-photos/thumb.jpg" 
     ); 
    } 
) 

So, bottom line: just parse the XML as you normally would, don't worry about CDATA, and simply use NSScanner or NSRegularExpression to parse out the image URLs.
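As a rough sketch of the scanner route, the function below collects the src value of every img tag in an already-parsed description string. The name ImageSourcesInHTML is made up for illustration and is not part of the original answer; the sketch also assumes that src values are quoted and that no attribute value contains a ">" character.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the src value of each <img> tag found in an
// HTML fragment, in document order.
static NSArray *ImageSourcesInHTML(NSString *html)
{
    NSMutableArray *sources = [NSMutableArray array];
    if (html.length == 0)
        return sources;

    NSCharacterSet *quotes = [NSCharacterSet characterSetWithCharactersInString:@"\"'"];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil;  // treat whitespace like any other character
    scanner.caseSensitive = NO;           // also match <IMG SRC=...>

    while (![scanner isAtEnd]) {
        // Move to the next "<img" and consume it.
        [scanner scanUpToString:@"<img" intoString:NULL];
        if (![scanner scanString:@"<img" intoString:NULL])
            break;

        // Isolate the rest of this tag so the src lookup cannot run into a later tag.
        NSString *tag = nil;
        if (![scanner scanUpToString:@">" intoString:&tag])
            continue;

        NSScanner *tagScanner = [NSScanner scannerWithString:tag];
        tagScanner.charactersToBeSkipped = nil;
        tagScanner.caseSensitive = NO;

        // Skip to "src", then to its opening quote, then read up to the closing quote.
        NSString *url = nil;
        [tagScanner scanUpToString:@"src" intoString:NULL];
        [tagScanner scanUpToCharactersFromSet:quotes intoString:NULL];
        if ([tagScanner scanCharactersFromSet:quotes intoString:NULL] &&
            [tagScanner scanUpToCharactersFromSet:quotes intoString:&url]) {
            [sources addObject:url];
        }
    }
    return sources;
}
```

Calling a helper like this from didEndElement, once the whole description has been accumulated, avoids missing a tag that the parser happened to deliver in two separate foundCharacters chunks.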


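I'm sorry I wasn't clear enough. What I meant is that in some XML files the description tag has the image inside CDATA, while in others it does not. My description tag examples above come from different RSS feeds, not from one XML file with two description tags in it. When I implement the foundCDATA method in my NSXMLParser, it apparently overrides my summary and picks up the "img src" image, but I need both. Please see my parser here: [link](https://dl.dropboxusercontent.com/u/1216970/RSSParser.rtf). Thanks, I really appreciate your help. – Edward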


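@Edward You don't have to implement 'foundCDATA'. If you don't, the standard 'foundCharacters' will automatically parse it for you, correctly extracting the characters from your 'CDATA' (but without the 'CDATA' start and end markers). Especially if your feeds are mixed, sometimes with 'CDATA' and sometimes without, just don't implement 'foundCDATA', and 'foundCharacters' will handle it quite gracefully. See my implementation: a single XML file in which one 'description' tag has 'CDATA' and the other doesn't, yet the standard 'foundCharacters' parses both correctly. – Rob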


Let's continue this discussion in chat: http://chat.stackoverflow.com/rooms/30287/chat-with-edward – Rob
