2016-03-04 124 views
2

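C++ file reading and splitting columns

I'm not familiar with file reading in C++, but I have done a lot of it with pyspark. I have a txt file with the following content: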

1 52 Hayden Smith  18:16 15 M Berlin 

2 54 Mark Puleo  18:25 15 M Berlin 

3 97 Peter Warrington 18:26 29 M New haven 

4 305 Matt Kasprzak  18:53 33 M Falls Church 

5 272 Kevin Solar  19:17 16 M Sterling 

6 394 Daniel Sullivan  19:35 26 M Sterling 

7 42 Kevan DuPont  19:58 18 M Boylston 

8 306 Chris Goethert  20:00 43 M Falls Church 

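As you can see, there are 8 columns and 351 rows (I'm only showing 8 of the rows). For each row, [0] is the rank, [1] is the BIB, [2] is the first name, [3] is the last name, [4] is the time, [5] is the age, [6] is the gender, and [7] is the town. For example, in the first row the rank is 1, the BIB is 52, the name is Hayden Smith, the time is 18:16, the age is 15, M means male, and Berlin is the town.

I have a sorted linked structure, let's call it class SortedLinked, and an item type class called class Runner.

You don't have to worry about the SortedLinked class.

Class Runner has four private attributes: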

string name, int age, int min, int sec 

In my driver file, I can do this:

SortedLinked mylist;     // initialize a sorted list 

Runner M("Jordan", 22, 20, 20);  // initialize a Runner called Jordan, who is 22 years old, and finished the race in 20 mins and 20 sec 

mylist.add(M); // add Runner M into my sorted list 

So I need to read the text file and, for each row, create a Runner object from the runner's name, age, minutes, and seconds, then insert that Runner into the sorted linked list.

If this were pyspark, I could do this:

file = sc.textFile("hdfs")    # we usually use hdfs in pyspark 

newfile = file.map(lambda line: line.split('\t'))  # columns are separated by tabs, except columns [2] and [3], which are separated by a space 

ColumnIneed = newfile.map(lambda r: [r[2], r[3], r[4], r[5]])  # I only need columns [2][3][4][5] 

mylist = ColumnIneed.collect()  # transform the RDD into a list 

Then I can just transform every row into a Runner object. 

But in C++, this is all I know:

ifstream, infile; 

string s, sAll; 

if(infile.is_open()) 
{ 

    while(getline(line, s)) 

    { 

     s = s.rstrip('\n')  //does NOT work in C++ 
     name, age, time = s.split('\t') // Does NOT work in C++ and I dont need all the columns 

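So, my questions are:

1. I need to access each line and strip the newline character

2. I only need columns [2] [3] [4] [5] // the columns are separated by tabs

3. Column [4] is the time, which is a string in the text file; I need to split it on ':' and store the parts as minutes and seconds

4. Columns [2] [3] are the first and last name; I need to combine them into the string name

5. Columns [2] [3] are separated by a space

Ideally, I would like to do something like this: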

while(I need a loop) 
{ 

    eachline = access each line; 

    eachline.strip('\n') //strip newline 

    eachline.split('\t') //split Tabs 

    string name = eachline[2][3]; 

    string time = eachline[4]; 

    int min; 

    int sec; 

    min, sec = time.split(':") 

    int age = eachline[5]; 

    Runner M(name, age, min, sec) //I don't know if this works, because it looks like you are overwriting the Runner M each time you access a new line. 

    mylist.add(M)  //add M into my linkedlist, this step you don't need to worry, I already finished. 

} 

If you have a better way to do this, I would really appreciate it.

+0

Please fix the formatting. – muXXmit2X

+0

A similar question was asked earlier today. It might help. http://stackoverflow.com/questions/35786613/populating-a-string-vector-with-tab-delimited-text – user4581301

Answers

0

Some code snippets

#include <fstream>
#include <iostream>
#include <sstream>  // needed for std::istringstream
#include <string>

std::ifstream in;
in.open(/*path to file*/);
std::string line;
if (in.is_open())
{
    while (std::getline(in, line)) // get one row as a string (the '\n' is already removed)
    {
        std::istringstream iss(line); // put the line into a string stream
        std::string word;
        while (iss >> word) // read word by word (splits on any whitespace)
        {
            std::cout << word << std::endl;
        }
        /*
        int row;
        int age;
        std::string name;
        iss >> row >> age >> name; // adapt to your input line
        // By common convention, variable names shouldn't start with a capital letter.
        // You don't overwrite M: each iteration creates a new local Runner, a copy of
        // it goes into the container, and M is destroyed at the end of the block.
        // You could probably use move semantics, but you need the C++ basics first.
        Runner M(name, age, min, sec);
        mylist.add(M);
        */
    }
}
+0

Let me try it. – JY078

+0

I recommend adjusting the code to actually handle tabs. Right now it splits on all whitespace, not just tabs. – user4581301

+0

std::istringstream iss(line) // "incomplete type is not allowed", what happened? Never mind, I forgot the include. – JY078