Parsing part of an HTML page with TFHpple/hpple (iOS)

I am loading an entire HTML page and want to get all of the content between specific tags. To do that, I run:
articleXpathQueryString = @"//article/div[@class='entry breadtext']";
articleNodes = [articleParser searchWithXPathQuery:articleXpathQueryString];
item.content = [self recursiveHTMLIterator:articleNodes content:@""];
I then have a recursive function that tries to concatenate the content of all child nodes together with their HTML tags:
- (NSString *)recursiveHTMLIterator:(NSArray *)elementArray content:(NSString *)content {
    for (TFHppleElement *element in elementArray) {
        if (![element hasChildren]) {
            // The element has no children
        } else {
            // The element has children
            NSString *tmpStr = [[element firstChild] content];
            if (tmpStr != nil) {
                NSString *css = [element tagName];
                content = [content stringByAppendingString:[self createOpenTag:css]];
                content = [content stringByAppendingString:tmpStr];
                content = [content stringByAppendingString:[self createCloseTag:css]];
            }
            NSString *missingStr = [[element firstTextChild] content];
            if (![missingStr isEqualToString:tmpStr]) {
                if (missingStr != nil) {
                    NSString *css = [element tagName];
                    content = [content stringByAppendingString:[self createOpenTag:css]];
                    content = [content stringByAppendingString:missingStr];
                    content = [content stringByAppendingString:[self createCloseTag:css]];
                }
            }
            content = [self recursiveHTMLIterator:element.children content:content];
        }
    }
    return content;
}
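However, although the result is partly usable, it does not pick up img tags, and the output gets scrambled when the HTML is structured like this: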
<p>
<strong>-</strong>
This text is not parsed because it skips it after it acquires <strong>-</strong>, this is why I have the second if-statement which catches up "missing strings", but they are inserted in the wrong order
</p>
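For reference, the ordering problem seems to come from reading `firstChild` and `firstTextChild` separately instead of visiting every child in document order. Below is a minimal sketch of an in-order walk, not a definitive implementation. `EscapeHTML` and `AppendHTML` are hypothetical helper names, and the sketch assumes that `TFHppleElement` exposes `isTextNode`, `content`, `tagName`, `attributes` and `children` the way current hpple versions do.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // hpple; assumed to be on the header search path

// Hypothetical helper: escape characters that would otherwise be read as markup.
static NSString *EscapeHTML(NSString *s) {
    s = [s stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    s = [s stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    s = [s stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
    return [s stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
}

// Hypothetical helper: append the markup for `nodes` to `out`, visiting
// text nodes and element nodes in the order they appear in the document.
static void AppendHTML(NSArray *nodes, NSMutableString *out) {
    static NSSet *voidTags;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        // Elements that never take a closing tag.
        voidTags = [NSSet setWithObjects:@"img", @"br", @"hr", @"input", @"meta", @"link", nil];
    });

    for (TFHppleElement *node in nodes) {
        if ([node isTextNode]) {
            // Text is emitted exactly where it occurs between sibling elements.
            [out appendString:EscapeHTML(node.content)];
            continue;
        }
        NSString *tag = node.tagName;
        if (tag == nil) continue;

        [out appendFormat:@"<%@", tag];
        NSDictionary *attrs = node.attributes;
        for (NSString *name in attrs) {
            [out appendFormat:@" %@=\"%@\"", name, EscapeHTML(attrs[name])];
        }
        [out appendString:@">"];

        // Childless elements such as img are still written out, just without a close tag.
        if ([voidTags containsObject:tag.lowercaseString]) continue;

        AppendHTML(node.children, out);
        [out appendFormat:@"</%@>", tag];
    }
}

// Usage, with articleNodes being the result of searchWithXPathQuery:
//   NSMutableString *html = [NSMutableString string];
//   AppendHTML(articleNodes, html);
//   item.content = html;
```

One known gap in this sketch: comment nodes are not treated specially, so they would be written out as if they were ordinary elements.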
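So my question is: should I keep trying to make the recursive approach parse correctly, or is there a simpler way to get the HTML I need (which I then want to display in a web view)? What I am looking for is all of the content within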
<article> THIS </article>.
In other words, I would like to do something like this with TFHpple (although this code does not work):
articleXpathQueryString = @"//article/div[@class='entry breadtext']";
articleNodes = [articleParser searchWithXPathQuery:articleXpathQueryString];
item.content = [articleParser allContentAsString]; //I simply want everything in articleParser in a string format
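The closest thing I am aware of to the imaginary `allContentAsString` is the `raw` property on `TFHppleElement`, which in current hpple versions holds the serialized markup of the matched node, child tags included. The following is a minimal sketch under that assumption. `HTMLForXPath` is a hypothetical helper name, not part of hpple.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // hpple; assumed to be on the header search path

// Hypothetical helper: return the markup of every node matched by `xpath`,
// concatenated in the order the parser returns them.
static NSString *HTMLForXPath(NSData *htmlData, NSString *xpath) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *nodes = [parser searchWithXPathQuery:xpath];

    NSMutableString *html = [NSMutableString string];
    for (TFHppleElement *node in nodes) {
        // Assumption: `raw` is the node's own serialized HTML, including its children.
        NSString *raw = node.raw;
        if (raw != nil) {
            [html appendString:raw];
        }
    }
    return html;
}

// Usage:
//   item.content = HTMLForXPath(pageData, @"//article/div[@class='entry breadtext']");
```

Because the serialization is left to the parser, nested tags, img elements and text that sits between sibling elements should come out in document order without any manual recursion.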
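I am facing the same challenge, but I am trying to avoid the load time and memory cost of a web view (which depend on the content). My only option so far is to nest searchWithXPathQuery calls with more restrictive, targeted XPath strings to pull out the data I want. It would be helpful to have a way to abstract part of a query and pass its result on to another XPath query. – Dan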
Yes, the current solution is really poor, and slow as well. I am looking into it now and will come back when I find a better solution. – Oberheim