2014-09-02

Programmatically find a stock ticker in R

I have a field containing company names, such as:

company <- data.frame(Company = c("Microsoft", "Apple", "Cloudera", "Ford"))
> company
    Company
1 Microsoft
2     Apple
3  Cloudera
4      Ford

and so on.

tm.plugin.webmining lets you query data from Yahoo! Finance based on a ticker symbol:

require(tm.plugin.webmining) 
results <- WebCorpus(YahooFinanceSource("MSFT")) 

I'm missing the intermediate step. How can I look up the ticker symbol programmatically from the company name?

Answer


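I couldn't find a way to do this with the tm.plugin.webmining package, but I came up with a rough solution: parse the data from this file: ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt. I say rough because my call to httr::content(httr::GET(...)) did not work every time. I suspect it has to do with the type of URL (ftp://), but I don't do much web scraping, so I can't really explain it. It seemed to work better on my Linux machine than on my Mac, but that may be irrelevant. Update: thanks to @thelatemail's comment, reading the file directly with read.csv works much more smoothly: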

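library(quantmod) ## optional here; needed later for getSymbols()
symbolData <- read.csv(
    "ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt", 
    sep = "|",
    stringsAsFactors = FALSE) ## keep Symbol and Security.Name as character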
## 
> head(symbolData,10) 
    Symbol             Security.Name Market.Category Test.Issue Financial.Status Round.Lot.Size 
1 AAIT iShares MSCI All Country Asia Information Technology Index Fund    G   N    N   100 
2  AAL     American Airlines Group, Inc. - Common Stock    Q   N    N   100 
3 AAME     Atlantic American Corporation - Common Stock    G   N    N   100 
4 AAOI     Applied Optoelectronics, Inc. - Common Stock    G   N    N   100 
5 AAON          AAON, Inc. - Common Stock    Q   N    N   100 
6 AAPL          Apple Inc. - Common Stock    Q   N    N   100 
7 AAVL     Avalanche Biotechnologies, Inc. - Common Stock    G   N    N   100 
8 AAWW      Atlas Air Worldwide Holdings - Common Stock    Q   N    N   100 
9 AAXJ    iShares MSCI All Country Asia ex Japan Index Fund    G   N    N   100 
10 ABAC      Aoxin Tianli Group, Inc. - Common Shares    S   N    N   100 
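The file is pipe-delimited with a single header row. The live file may change or be unreachable, so here is a minimal offline sketch that parses a few rows copied from the output above using read.csv(text = ...). Two parts of it are assumptions rather than part of the original answer. First, I believe the live file ends with a "File Creation Time" footer line, but check yours. Second, stripping the " - Common Stock" suffix to get a plain company name is my own addition.

```r
# A small inline sample in the same pipe-delimited layout as nasdaqlisted.txt
# (rows copied from the output above).
# ASSUMPTION: the live file ends with a footer line like the last one here.
sample_txt <- "Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size
AAL|American Airlines Group, Inc. - Common Stock|Q|N|N|100
AAPL|Apple Inc. - Common Stock|Q|N|N|100
AAWW|Atlas Air Worldwide Holdings - Common Stock|Q|N|N|100
File Creation Time: 0902201421:32|||||"

symbolSample <- read.csv(text = sample_txt, sep = "|",
                         stringsAsFactors = FALSE)

# Drop the footer row and keep only real (non-test) issues.
symbolSample <- subset(symbolSample,
                       !grepl("^File Creation Time", Symbol) & Test.Issue == "N")

# Strip the " - Common Stock" style suffix to get a plain company name.
symbolSample$Company <- sub(" - .*$", "", symbolSample$Security.Name)
symbolSample[, c("Symbol", "Company")]
```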

Edit: as @GSee suggested, a (possibly) more robust way to get the source data is the stockSymbols() function from the TTR package:

> symbolData2 <- stockSymbols(exchange="NASDAQ") 
Fetching NASDAQ symbols... 
> ## 
> head(symbolData2) 
    Symbol               Name LastSale MarketCap IPOyear   Sector 
1 AAIT iShares MSCI All Country Asia Information Technology Index Fun 34.556  6911200  NA   <NA> 
2 AAL         American Airlines Group, Inc. 40.500 29164164453  NA Transportation 
3 AAME         Atlantic American Corporation 4.020  83238028  NA  Finance 
4 AAOI         Applied Optoelectronics, Inc. 20.510 303653114 2013  Technology 
5 AAON              AAON, Inc. 18.420 1013324613  NA Capital Goods 
6 AAPL              Apple Inc. 103.300 618546661100 1980  Technology 
         Industry Exchange 
1       <NA> NASDAQ 
2 Air Freight/Delivery Services NASDAQ 
3     Life Insurance NASDAQ 
4     Semiconductors NASDAQ 
5 Industrial Machinery/Components NASDAQ 
6   Computer Manufacturing NASDAQ 
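To fill in the missing step from the question (company name to ticker), here is a minimal offline sketch. It does not call stockSymbols(). Instead, toySymbols is a hypothetical stand-in holding only the Symbol and Name columns, with rows copied from the output above. lookupTicker is also a hypothetical helper. It does a case-insensitive prefix match on Name and returns the first hit, or NA.

```r
# Toy stand-in for stockSymbols() output (rows copied from the output above).
# With live data you would pass symbolData2 instead.
toySymbols <- data.frame(
  Symbol = c("AAL", "AAME", "AAOI", "AAON", "AAPL"),
  Name = c("American Airlines Group, Inc.", "Atlantic American Corporation",
           "Applied Optoelectronics, Inc.", "AAON, Inc.", "Apple Inc."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each company name, return the ticker of the first
# security whose Name starts with that name (case-insensitive), else NA.
lookupTicker <- function(company, symbols) {
  lowerNames <- tolower(symbols$Name)
  vapply(company, function(nm) {
    hits <- which(startsWith(lowerNames, tolower(nm)))
    if (length(hits) == 0) NA_character_ else symbols$Symbol[hits[1]]
  }, character(1), USE.NAMES = FALSE)
}

company <- data.frame(Company = c("Microsoft", "Apple", "Cloudera", "Ford"),
                      stringsAsFactors = FALSE)
company$Ticker <- lookupTicker(company$Company, toySymbols)
company
```

Only Apple is in the toy table, so the other three tickers come back as NA. A prefix match avoids matching "Applied Optoelectronics" for "Apple", but a shorter prefix such as "Appl" is still ambiguous and returns the first row that matches.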

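I don't know whether you only want the ticker symbol from the name, but if you are also looking for actual stock price data, you could do something like this: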

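namedStock <- function(name = "Microsoft",
                       start = Sys.Date() - 365,
                       end = Sys.Date() - 1) {
    ## fuzzy-match the name against the Security.Name column;
    ## note that agrep() can return more than one ticker
    ticker <- as.character(symbolData[agrep(name, symbolData$Security.Name), "Symbol"])
    ## getSymbols() comes from quantmod; the xts result is
    ## assigned into env rather than returned
    getSymbols(
        Symbols = ticker,
        src = "yahoo",
        env = .GlobalEnv,
        from = start, to = end)
}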
## 
## an xts object named MSFT will be added to 
## the global environment, no need to assign 
## to an object 
namedStock() 
## 
> str(MSFT) 
An ‘xts’ object on 2013-09-03/2014-08-29 containing: 
    Data: num [1:251, 1:6] 31.8 31.4 31.1 31.3 31.2 ... 
- attr(*, "dimnames")=List of 2 
    ..$ : NULL 
    ..$ : chr [1:6] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ... 
    Indexed by objects of class: [Date] TZ: UTC 
    xts Attributes: 
List of 2 
$ src : chr "yahoo" 
$ updated: POSIXct[1:1], format: "2014-09-02 21:51:22.792" 
> chartSeries(MSFT) 
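namedStock() relies on agrep(), which does approximate (fuzzy) matching. With the default max.distance = 0.1, a 5-character pattern like "Apple" tolerates one edit, so it also matches "Applied Optoelectronics", and getSymbols() would be asked for two tickers. Here is a small sketch of the difference, using security names from the table above. pickIndex is a hypothetical helper that prefers an exact substring match and only falls back to agrep().

```r
secNames <- c("Applied Optoelectronics, Inc. - Common Stock",
              "AAON, Inc. - Common Stock",
              "Apple Inc. - Common Stock")

# Fuzzy: "Apple" is within one edit of "Appli" in "Applied"
fuzzy <- agrep("Apple", secNames)
# Exact substring match only
exact <- grep("Apple", secNames, fixed = TRUE)

# Hypothetical helper: prefer exact matches, fall back to fuzzy ones,
# and return a single index (NA if nothing matches at all).
pickIndex <- function(pattern, x) {
  hit <- grep(pattern, x, fixed = TRUE)
  if (length(hit) == 0) hit <- agrep(pattern, x)
  hit[1]
}

fuzzy
exact
pickIndex("Apple", secNames)
```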


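So, as I said, this is not a clean solution, but hopefully it helps. Also note that my data source only covers companies traded on NASDAQ (which includes many, though by no means all, major companies), but you could easily combine it with other sources.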
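For combining sources, here is a minimal sketch. nasdaqPart and otherPart are made-up, hypothetical stand-ins for two listings with different column layouts. The code harmonizes the columns, stacks them with rbind(), and keeps the first row per ticker. If you use TTR, note that the exchange argument of stockSymbols() covers several exchanges by default, including AMEX, NASDAQ and NYSE, so that may save you the merging altogether.

```r
# Two hypothetical symbol tables from different sources.
nasdaqPart <- data.frame(Symbol = c("AAPL", "MSFT"),
                         Security.Name = c("Apple Inc. - Common Stock",
                                           "Microsoft Corporation - Common Stock"),
                         stringsAsFactors = FALSE)
otherPart <- data.frame(Symbol = c("F", "AAPL"),
                        Name = c("Ford Motor Company", "Apple Inc."),
                        stringsAsFactors = FALSE)

# Harmonize to a common layout and record where each row came from.
combined <- rbind(
  data.frame(Symbol = nasdaqPart$Symbol,
             Name = sub(" - .*$", "", nasdaqPart$Security.Name),
             Source = "nasdaqlisted", stringsAsFactors = FALSE),
  data.frame(Symbol = otherPart$Symbol, Name = otherPart$Name,
             Source = "other", stringsAsFactors = FALSE)
)

# Keep the first occurrence of each ticker (the NASDAQ row wins for AAPL).
combined <- combined[!duplicated(combined$Symbol), ]
combined
```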


I'm not sure why you're bothering with all the 'httr' and extension packages: 'read.csv("ftp://path/file.csv", sep="|")' will get it fine. 'file=' can be any accessible connection type. – thelatemail 2014-09-03 02:21:54


@thelatemail Thanks for pointing that out; I've updated my answer. – nrussell 2014-09-03 02:34:23


Or you could use 'stockSymbols()'. – GSee 2014-09-03 02:36:44
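thelatemail's point that the file argument of read.csv() accepts more than a plain file name can be checked offline. The sketch below reads the same two pipe-delimited lines once through a textConnection() and once from a temporary file, and gets the same data frame both times. According to ?read.table, a complete URL (including ftp://) is accepted in the same way, but that path is not exercised here.

```r
txt <- c("Symbol|Security Name", "AAPL|Apple Inc. - Common Stock")

# 1. A connection object (read.csv opens and closes it itself)
fromConn <- read.csv(textConnection(txt), sep = "|", stringsAsFactors = FALSE)

# 2. A path on disk; per ?read.table a full ftp:// or http:// URL also works
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)
fromFile <- read.csv(tmp, sep = "|", stringsAsFactors = FALSE)
unlink(tmp)

all.equal(fromConn, fromFile)
```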