2011-08-25 60 views
0

AppleScript: Substring or format HTML

I'm currently working on an AppleScript and I'm stuck here. Take this snippet as an example of the HTML code:

<body><div>Apple don't behave accordingly <a href = "http://apple.com">apple</a></div></body>

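What I need now is to return the words without the HTML tags, either by deleting the brackets along with everything inside them, or by some other way of reformatting HTML into plain text.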

The result should be:

Apple don't behave accordingly apple

Answers

0

How about using textutil:

on run -- example (don't forget to escape quotes) 
    removeMarkup from "<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>" 
end run 

to removeMarkup from someText -- strip HTML using textutil 
    set someText to quoted form of ("<!DOCTYPE HTML PUBLIC>" & someText) -- fake a HTML document header 
    return (do shell script "echo " & someText & " | /usr/bin/textutil -stdin -convert txt -stdout") -- strip HTML 
end removeMarkup 

Works like a charm.. thanks.. – sicKo
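
For cases where shelling out to textutil is not wanted, the other idea raised in the question (deleting each bracket together with everything inside it) can be sketched in plain AppleScript using text item delimiters. This is only a rough sketch: the handler name stripTags is made up for illustration, it assumes every "<" in the input starts a tag, and unlike textutil it does not decode entities such as &amp; or handle "<" or ">" inside attribute values.

```applescript
on run -- example call, using the snippet from the question
	stripTags("<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>")
end run

-- stripTags is a hypothetical helper name, not part of any library
on stripTags(someText)
	set savedDelims to AppleScript's text item delimiters
	set plainText to ""
	try
		-- split on "<": every piece after the first one begins inside a tag
		set AppleScript's text item delimiters to "<"
		set pieces to text items of someText
		set plainText to item 1 of pieces
		repeat with i from 2 to (count pieces)
			-- keep only what follows the tag's closing ">"
			set AppleScript's text item delimiters to ">"
			set parts to text items of (item i of pieces)
			if (count parts) > 1 then
				-- rejoining with ">" preserves any ">" that appears in the text itself
				set plainText to plainText & ((items 2 thru -1 of parts) as text)
			end if
		end repeat
	end try
	-- always restore the delimiters, even if something above failed
	set AppleScript's text item delimiters to savedDelims
	return plainText
end stripTags
```

A piece with no closing ">" is treated as an unterminated tag and dropped. For anything beyond simple markup, the textutil approach is likely the more robust choice.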

0
on findStrings(these_strings, search_string) 
    set the foundList to {} 
    repeat with this_string in these_strings 
     considering case 
      if the search_string contains this_string then set the end of the foundList to this_string 
     end considering 
    end repeat 
    return the foundList 
end findStrings 

findStrings({"List","Of","Strings","To","find..."}, "...in String to search") 

I don't want to search for strings.. I'm trying to remove the HTML tags from HTML code.. and the code will be different every time.. – sicKo

1

Thought I'd add an extra answer because I ran into a problem. If you don't want UTF-8 characters to get lost, you need:

set plain_text to do shell script "echo " & quoted form of ("<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">" & html_string) & space & "| textutil -convert txt -stdin -stdout" 

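You basically need to add the <meta charset="UTF-8"> meta tag (with the quotes escaped as \" inside the AppleScript string) to make sure textutil treats the input as a UTF-8 encoded document.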
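
To make that one-liner reusable, it could be folded into a handler shaped like the removeMarkup handler from the textutil answer. The following is a minimal sketch under that assumption. The handler name removeMarkupUTF8 and the sample text are invented for this example, and it needs macOS, where /usr/bin/textutil is available.

```applescript
on run -- example call with non-ASCII text (remember to escape quotes)
	removeMarkupUTF8 from "<body><div>Café déjà vu <a href = \"http://apple.com\">apple</a></div></body>"
end run

-- removeMarkupUTF8 is a hypothetical name; the body combines both textutil answers
to removeMarkupUTF8 from someText
	-- fake an HTML document header and declare the encoding up front
	set htmlDoc to "<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">" & someText
	-- quoted form of protects the markup from being interpreted by the shell
	return (do shell script "echo " & quoted form of htmlDoc & " | /usr/bin/textutil -stdin -convert txt -stdout")
end removeMarkupUTF8
```

Putting the charset declaration before the caller's markup means it applies whatever the rest of the HTML contains, which matters here because the input is different every time.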