2017-07-14
13-JUL-17                  


Bank User      Space Occupied(GB)        
------------------------------ ------------------        
CKYC_MNSB        .004211426        
CORE_AMARNATH_ASP      8.75262451        
CORE_AMBUJA       6.80389404        
CORE_AMBUJA_ASP      10.0085449        
CORE_ANAND_MERC_ASP     18.9866333        
CORE_BALOTRA       17.8280029        
CORE_BASODA       4.55432129        
CORE_CHHAPI_ASP      11.9767456        
CORE_DHANGDHRA_ASP      13.1849976        
CORE_IDAR_ASP       13.3209229        
CORE_JANTA_HALOL_ASP     12.7955933        

Bank User      Space Occupied(GB)        
------------------------------ ------------------        
CORE_JHALOD_URBAN_ASP     9.19219971        
CORE_MANINAGAR       5.36090088        
CORE_MANINAGAR_ASP      6.31414795        
CORE_SANKHEDA       20.4329834        
CORE_SMCB_ANAND_ASP     11.3191528        
CORE_TARAPUR_ASP      8.24627686        
CORE_VUCB        .000610352        
TBA_TEMP        5.39910889        
TEST_DUNIA        4.15698242        

20 rows selected. 


TABLESPACE NAME    Free Space in GB         
------------------------------ ----------------         
TBAPROJ        33.2736816         

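I have the text file above. How can I convert it and save it as a CSV?

How do I store the CSV file so that the values are separated into columns?

I have loaded the file, but I am having trouble removing the whitespace from it.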

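Answers

You want to keep each line that matches this pattern: a word made of uppercase letters and underscores, then whitespace, then a number containing a decimal point. This grep selects those lines: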

> file_raw <- readLines('file.txt')
> read.table(
    text = paste(
      file_raw[grep("^[A-Z_].*\\s*\\.", file_raw)],
      collapse = "\n"),
    sep = "", header = FALSE)
                      V1           V2
1              CKYC_MNSB  0.004211426
2      CORE_AMARNATH_ASP  8.752624510
3            CORE_AMBUJA  6.803894040
4        CORE_AMBUJA_ASP 10.008544900
5    CORE_ANAND_MERC_ASP 18.986633300
6           CORE_BALOTRA 17.828002900
7            CORE_BASODA  4.554321290
8        CORE_CHHAPI_ASP 11.976745600
9     CORE_DHANGDHRA_ASP 13.184997600
10         CORE_IDAR_ASP 13.320922900
11  CORE_JANTA_HALOL_ASP 12.795593300
12 CORE_JHALOD_URBAN_ASP  9.192199710
13        CORE_MANINAGAR  5.360900880
14    CORE_MANINAGAR_ASP  6.314147950
15         CORE_SANKHEDA 20.432983400
16   CORE_SMCB_ANAND_ASP 11.319152800
17      CORE_TARAPUR_ASP  8.246276860
18             CORE_VUCB  0.000610352
19              TBA_TEMP  5.399108890
20            TEST_DUNIA  4.156982420
21               TBAPROJ 33.273681600

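Note that if you expect first tokens of a different form, such as CORE_999lower_case, you may need to adjust the pattern. Without a formal specification, all we can do is go on what you have supplied.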
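The question also asks for a CSV file, which the snippet above does not write. Here is a minimal sketch of one way to finish the job, not a definitive implementation. The sample lines stand in for `readLines('file.txt')`, and the column names `bank_user` and `space_gb`, the stricter pattern, and the temporary output path are all assumptions of mine. Like the grep above, it also picks up the TBAPROJ tablespace row, so drop that row if you only want bank users.

```r
# Assumption: a few sample lines stand in for readLines('file.txt')
file_raw <- c(
  "13-JUL-17",
  "Bank User      Space Occupied(GB)",
  "------------------------------ ------------------",
  "CKYC_MNSB        .004211426",
  "CORE_AMBUJA       6.80389404",
  "20 rows selected. ",
  "TABLESPACE NAME    Free Space in GB",
  "TBAPROJ        33.2736816"
)

# Stricter pattern: one token of uppercase letters, digits or underscores,
# then whitespace, then a decimal number, then optional trailing whitespace
keep <- grepl("^[A-Z][A-Z0-9_]*\\s+[0-9]*\\.[0-9]+\\s*$", file_raw)

# Parse the kept lines into two named columns (names are hypothetical)
df <- read.table(text = paste(file_raw[keep], collapse = "\n"),
                 header = FALSE,
                 col.names = c("bank_user", "space_gb"),
                 stringsAsFactors = FALSE)

# Hypothetical output path; row.names = FALSE avoids an extra index column
out_file <- tempfile(fileext = ".csv")
write.csv(df, out_file, row.names = FALSE)
```

Passing a data frame to `write.csv` gives one CSV column per data frame column, so the name and the number end up in separate columns.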

There may be a more elegant way, but this does the trick:

# read raw file in lines 
file_raw <- readLines('file.txt') 

# remove whitespace 
file_trim <- trimws(file_raw,which = 'both') 

# remove empty lines 
file_trim <- file_trim[file_trim != ''] 

# replace runs of two or more whitespace characters with a comma
file_csv <- gsub('\\s{2,}',',',file_trim) 

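A few non-data lines will still be left, such as the -- separator lines and 20 rows selected., but you can easily filter those out if you want, either before writing or after reading: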

file_clean <- file_csv[!grepl('(-){3,}|rows selected',file_csv)] 

write.csv(file_clean,'file_cleaned.csv') 




> read.csv('file_cleaned.csv')
    X                                x
1   1                        13-JUL-17
2   2     Bank User,Space Occupied(GB)
3   3             CKYC_MNSB,.004211426
4   4     CORE_AMARNATH_ASP,8.75262451
5   5           CORE_AMBUJA,6.80389404
6   6       CORE_AMBUJA_ASP,10.0085449
7   7   CORE_ANAND_MERC_ASP,18.9866333
8   8          CORE_BALOTRA,17.8280029
9   9           CORE_BASODA,4.55432129
10 10       CORE_CHHAPI_ASP,11.9767456
11 11    CORE_DHANGDHRA_ASP,13.1849976
12 12         CORE_IDAR_ASP,13.3209229
13 13  CORE_JANTA_HALOL_ASP,12.7955933
14 14     Bank User,Space Occupied(GB)
15 15 CORE_JHALOD_URBAN_ASP,9.19219971
16 16        CORE_MANINAGAR,5.36090088
17 17    CORE_MANINAGAR_ASP,6.31414795
18 18         CORE_SANKHEDA,20.4329834
19 19   CORE_SMCB_ANAND_ASP,11.3191528
20 20      CORE_TARAPUR_ASP,8.24627686
21 21             CORE_VUCB,.000610352
22 22              TBA_TEMP,5.39910889
23 23            TEST_DUNIA,4.15698242
24 24 TABLESPACE NAME,Free Space in GB
25 25               TBAPROJ,33.2736816
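In the output above, each name and its number sit together in the single column x. That is because `write.csv` treats every element of a character vector as one cell. One possible way around it, sketched here under my own assumptions rather than taken from the original answer, is to write the comma-joined strings verbatim with `writeLines`. The sample vector stands in for `file_clean`, and the header names, the extra filter pattern, and the temporary path are hypothetical.

```r
# Assumption: a few already-cleaned lines stand in for file_clean
file_clean <- c(
  "13-JUL-17",
  "Bank User,Space Occupied(GB)",
  "CKYC_MNSB,.004211426",
  "CORE_AMBUJA,6.80389404",
  "Bank User,Space Occupied(GB)",
  "TABLESPACE NAME,Free Space in GB",
  "TBAPROJ,33.2736816"
)

# Keep only lines of the form "<name>,<decimal number>"; this drops the
# date line and the repeated header lines
rows <- file_clean[grepl("^[^,]+,[0-9]*\\.[0-9]+$", file_clean)]

# Hypothetical output path
out_file <- tempfile(fileext = ".csv")

# writeLines writes each string as-is, so the commas act as field separators;
# the header line uses hypothetical column names
writeLines(c("bank_user,space_gb", rows), out_file)

# Reading the file back now yields two columns
result <- read.csv(out_file)
print(result)
```

If you would rather keep the file exactly as written by `write.csv`, the alternative is to split the x column on the comma after reading.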
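Thanks Val. When I save the file, the bank name and the occupied GB are both stored in a single column. How do I separate them? – Ree

@Ree See my edit: I just changed the regex so that it only replaces runs of 2 or more consecutive whitespace characters, leaving the single spaces between words alone so that those words fall into a single column – Val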