Parsing part of an HTML page with TFHpple/hpple (iOS)

I am loading an entire HTML page and want to get all the content between specific tags. To do this I use:

articleXpathQueryString = @"//article/div[@class='entry breadtext']"; 
articleNodes = [articleParser searchWithXPathQuery:articleXpathQueryString]; 
item.content = [self recursiveHTMLIterator:articleNodes content:@""];

Then I have a recursive function that tries to concatenate the content of all child nodes together with their HTML tags:

- (NSString *)recursiveHTMLIterator:(NSArray *)elementArray content:(NSString *)content {
    for (TFHppleElement *element in elementArray) {
        if (![element hasChildren]) {
            // The element has no children
        } else {
            // The element has children
            NSString *tmpStr = [[element firstChild] content];

            if (tmpStr != nil) {
                NSString *css = [element tagName];
                content = [content stringByAppendingString:[self createOpenTag:css]];
                content = [content stringByAppendingString:tmpStr];
                content = [content stringByAppendingString:[self createCloseTag:css]];
            }

            NSString *missingStr = [[element firstTextChild] content];
            if (![missingStr isEqualToString:tmpStr]) {
                if (missingStr != nil) {
                    NSString *css = [element tagName];
                    content = [content stringByAppendingString:[self createOpenTag:css]];
                    content = [content stringByAppendingString:missingStr];
                    content = [content stringByAppendingString:[self createCloseTag:css]];
                }
            }

            content = [self recursiveHTMLIterator:element.children content:content];
        }
    }
    return content;
}

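However, even though the result is somewhat satisfactory, it does not pick up img tags, and it gets a bit scrambled when the HTML is structured like this: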

<p> 
<strong>-</strong> 
This text is not parsed because it skips it after it acquires <strong>-</strong>, this is why I have the second if-statement which catches up "missing strings", but they are inserted in the wrong order 
</p>

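So my question is: should I keep trying to get the recursive approach to parse correctly, or is there an easier way to get the HTML I need (to then use it in a web view)? What I am looking for is all the content within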

<article> THIS </article>.

In other words, I would like to do something like this with TFHpple (although this code does not work):

articleXpathQueryString = @"//article/div[@class='entry breadtext']"; 
articleNodes = [articleParser searchWithXPathQuery:articleXpathQueryString]; 
item.content = [articleParser allContentAsString]; //I simply want everything in articleParser in a string format


2012-12-07 Oberheim

OK, I finally figured this out... I hope this helps if anyone else is as stupid as I am:

All you need to do is load the URL into a web view and then run a simple JavaScript query like the following (in webViewDidFinishLoad):

NSString *bread_text = [webView stringByEvaluatingJavaScriptFromString:@"document.getElementsByClassName('entry breadtext')[0].innerHTML"];

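This gets everything inside the element with the named class. Now I need to figure out how to load the page without displaying the web view, but that seems easier than traversing the XML structure :)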
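For the "without displaying the web view" part, here is a minimal sketch. It assumes a UIWebView still loads its request when it is never added to a view hierarchy, and that everything runs on the main thread. `ArticleFetcher`, `fetchURL:completion:` and the completion block are hypothetical names made up for illustration; only the JavaScript query comes from the answer above.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper class: loads a page in a UIWebView that is never shown,
// then passes the extracted markup to a completion block.
@interface ArticleFetcher : NSObject <UIWebViewDelegate>
@property (nonatomic, strong) UIWebView *webView;
@property (nonatomic, copy) void (^completion)(NSString *html);
- (void)fetchURL:(NSURL *)url completion:(void (^)(NSString *html))completion;
@end

@implementation ArticleFetcher

- (void)fetchURL:(NSURL *)url completion:(void (^)(NSString *html))completion {
    self.completion = completion;
    // The web view is created with an empty frame and not added to any view.
    self.webView = [[UIWebView alloc] initWithFrame:CGRectZero];
    self.webView.delegate = self;
    [self.webView loadRequest:[NSURLRequest requestWithURL:url]];
}

- (void)webViewDidFinishLoad:(UIWebView *)webView {
    // Same query as in the answer: inner HTML of the first matching element.
    NSString *html = [webView stringByEvaluatingJavaScriptFromString:
        @"document.getElementsByClassName('entry breadtext')[0].innerHTML"];
    if (self.completion) {
        self.completion(html);
    }
}

- (void)webView:(UIWebView *)webView didFailLoadWithError:(NSError *)error {
    // Report failure as nil so the caller can tell it apart from real markup.
    if (self.completion) {
        self.completion(nil);
    }
}

@end
```

The caller has to keep a strong reference to the fetcher until the completion block runs. Also note that webViewDidFinishLoad can be called more than once for pages with frames, so the block may fire several times unless you guard against that.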


2012-12-07 16:19:10 Oberheim

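I am facing the same challenge, but I am trying to avoid the load time and memory cost of a webView (depending on the content). My only option is nested searchWithXPathQuery calls with more restrictive/targeted XPath strings to grab the target data. It would be helpful to develop a way to abstract part of a query and pass its result on to another XPath query. – Dan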

Yes, the current solution is really bad, and slow as well. I am looking into it now and will come back when I find a better solution. – Oberheim
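One possible route that avoids the web view entirely, sketched under assumptions: recent versions of hpple expose a `raw` property on TFHppleElement that holds the node's serialized markup, which would make the recursive tag rebuilding from the question unnecessary. Whether `raw` is available depends on the hpple version in use, so check TFHppleElement.h first. `HTMLForXPath` is a hypothetical helper name, not part of the library.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of hpple): returns the markup of every node
// matched by `xpath`, concatenated in document order.
static NSString *HTMLForXPath(NSData *htmlData, NSString *xpath) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *nodes = [parser searchWithXPathQuery:xpath];

    NSMutableString *html = [NSMutableString string];
    for (TFHppleElement *element in nodes) {
        // Assumption: `raw` is the node's serialized markup, including its
        // own opening and closing tags and all of its descendants.
        NSString *raw = element.raw;
        if (raw != nil) {
            [html appendString:raw];
        }
    }
    return html;
}
```

With the query from the question this would be used as `item.content = HTMLForXPath(pageData, @"//article/div[@class='entry breadtext']");`, where `pageData` stands for the downloaded page. Unlike the innerHTML query in the answer, the result includes the matched div itself, and the markup is whatever libxml2 serializes, so it may not match the original source byte for byte.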
