R htmlTreeParse: partial, unwanted translation
So, I'm in Italy, playing around in R with IMDb's list of "Best Picture" Oscar winners. I run this code:
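library(XML)
# Keep the URL on one line: a line break inside the string would become part of the URL.
fileUrl <- "http://www.imdb.com/search/title?count=100&groups=oscar_best_picture_winners&sort=year%2Cdesc&ref_=nv_ch_osc_3"
# useInternalNodes = TRUE returns a document that xpathSApply() can query.
doc <- htmlTreeParse(fileUrl, useInternalNodes = TRUE)
scores <- xpathSApply(doc, "//td[@class='title']", xmlValue)
head(scores, 3)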
which produces the following output:
[1] "\n \n\n\n\n 12 anni schiavo\n (2013)\n\n\n\n \n \n\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n\n8.2/10\nX\n \n\n\nIn the antebellum United States, Solomon Northup, a free black man from upstate New York, is abducted and sold into slavery.\n\n Dir: Steve McQueen\n With: Chiwetel Ejiofor, Michael K. Williams, Michael Fassbender\n\n Biography | Drama | History\n \n 134 mins.\n"
[2] "\n \n\n\n\n Argo\n (2012)\n\n\n\n \n \n\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n\n7.8/10\nX\n \n\n\nActing under the cover of a Hollywood producer scouting a location for a science fiction film, a CIA agent launches a dangerous operation to rescue six Americans in Tehran during the U.S. hostage crisis in Iran in 1980.\n\n Dir: Ben Affleck\n With: Ben Affleck, Bryan Cranston, John Goodman\n\n Drama | Thriller\n \n 120 mins.\n"
[3] "\n \n\n\n\n The Artist\n (2011)\n\n\n\n \n \n\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n\n8.0/10\nX\n \n\n\nA silent movie star meets a young dancer, but the arrival of talking pictures sends their careers in opposite directions.\n\n Dir: Michel Hazanavicius\n With: Jean Dujardin, Bérénice Bejo, John Goodman\n\n Comedy | Drama | Romance\n \n 100 mins.\n"
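Look at the first field after the line breaks. Notice how the title of movie 1 is translated into Italian (the English title is '12 Years a Slave'), while movie 3 is given only in English? Fast-forwarding a bit, here is a snippet from further along, just to give an idea (intermediate steps omitted):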
> head(scores.df[,1],10)
[1] "12 anni schiavo" "Argo"
[3] "The Artist" "Il discorso del re"
[5] "The Hurt Locker" "The Millionaire"
[7] "Non è un paese per vecchi" "The Departed - Il bene e il male"
[9] "Million Dollar Baby" "Crash: Contatto fisico"
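The intermediate steps are not shown in the question. As a minimal sketch of one way the title column could be obtained, assuming the title is always the first non-empty line of each scraped cell (as it is in the output above), something like the following would work. The helper name `extract_title` is hypothetical and is not taken from the original code.

```r
# Hypothetical helper (not from the original question): return the first
# non-empty line of one raw cell, which in the output above is the title.
extract_title <- function(raw) {
  lines <- trimws(strsplit(raw, "\n", fixed = TRUE)[[1]])
  lines[nzchar(lines)][1]
}

# A shortened sample cell, copied from the start of the first result above.
raw <- "\n \n\n\n\n 12 anni schiavo\n (2013)\n\n\n\n \n"
title <- extract_title(raw)
print(title)

# Applied to the whole character vector it would look like:
# titles <- vapply(scores, extract_title, character(1), USE.NAMES = FALSE)
```

This only recovers the title as IMDb served it, so Italian titles stay Italian; it does not address why some titles are localized.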
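I run a web proxy, so naturally when I visit the site in Chrome it gives me everything in English. But even in incognito mode and in Internet Explorer it serves everything in English. So why are some titles partially translated when I fetch the page from R, and how can I force it to stop?
Thanks!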
So the parsed values are not present at the URL passed to the parser? – 2014-10-01 00:20:57
Can you add the URL to the question? – 2014-10-01 00:21:31
Correct, if I go to the URL I see all English names. The URL is the one given in fileUrl, so 'http://www.imdb.com/search/title?count=100&groups=oscar_best_picture_winners&sort=year%2Cdesc&ref_=nv_ch_osc_3' – 2014-10-02 11:27:29
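One plausible explanation, which the thread does not confirm, is that the site picks the title language from the request's `Accept-Language` header (falling back to the client's location when the header is missing). A browser sends that header, whereas `htmlTreeParse(fileUrl, ...)` fetches the page without one. A minimal sketch of requesting English explicitly, assuming the `httr` package is installed; the function name `fetch_title_cells` and the header value are illustrative choices, not something taken from the question:

```r
library(XML)
library(httr)

fileUrl <- "http://www.imdb.com/search/title?count=100&groups=oscar_best_picture_winners&sort=year%2Cdesc&ref_=nv_ch_osc_3"

# Hypothetical helper: download the page while asking for English content,
# then run the same XPath query used in the question.
fetch_title_cells <- function(url, lang = "en-US,en;q=0.8") {
  resp <- GET(url, add_headers("Accept-Language" = lang))
  stop_for_status(resp)                        # fail loudly on HTTP errors
  html <- content(resp, as = "text", encoding = "UTF-8")
  doc <- htmlParse(html, asText = TRUE)        # parse the downloaded text
  xpathSApply(doc, "//td[@class='title']", xmlValue)
}

# Requires network access:
# scores <- fetch_title_cells(fileUrl)
# head(scores, 3)
```

If the titles still come back partly in Italian with this header set, the localization is probably driven by something else (for example IP-based geolocation), and the header alone would not be enough. The XPath is the one from the question and may not match the site's current markup.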