2013-07-26 22 views
2

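Extract information from a column and print it in a separate, aligned column

I am trying to extract the year from each line and print it in a new, separate column, keeping that new column aligned.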

Here is the input file:

0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980) 
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring (2001) 
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest (1975) 
0000000124 733447 8.7 Inception (2010) 
0000000233 411397 8.7 Goodfellas (1990) 
000000.7 Star Wars (1977) 
0000000124 146841 8.7 Shichinin no samurai (1954) 
000000.7 Forrest Gump (1994) 
000000.7 The Matrix (1999) 
000000.7 The Lord of the Rings: The Two Towers (2002) 
0000000233 309137 8.7 Cidade de Deus (2002) 
0000000232 548307 8.6 Se7en (1995) 
0000000232 459707 8.6 The Silence of the Lambs (1991) 

How can I get the years into a separate column, like this?

0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back     1980 
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring    2001 
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest         1975 
0000000124 733447 8.7 Inception              2010 
0000000233 411397 8.7 Goodfellas              1990 
000000.7 Star Wars              1977 
0000000124 146841 8.7 Shichinin no samurai           1954 
000000.7 Forrest Gump             1994 
000000.7 The Matrix              1999 
000000.7 The Lord of the Rings: The Two Towers       2002 
0000000233 309137 8.7 Cidade de Deus             2002 
0000000232 548307 8.6 Se7en               1995 
0000000232 459707 8.6 The Silence of the Lambs          1991 
+8

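Maybe you could try something yourself and then come back with an actual programming question. This is a Q&A site, not a "do my work for me" site. Questions should show research. – shodanex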

+2

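Questions must **demonstrate a minimal understanding of the problem being solved**. Tell us what you tried to do, why it didn't work, and how it should work. See also: [Stack Overflow question checklist](http://meta.stackexchange.com/questions/156810/stack-overflow-question-checklist) – devnull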

Answers

4

Here is a quick hack to do it:

$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF}1' file | column -s'{' -t 
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back  1980 
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring 2001 
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest      1975 
0000000124 733447 8.7 Inception           2010 
0000000233 411397 8.7 Goodfellas           1990 
000000.7 Star Wars           1977 
0000000124 146841 8.7 Shichinin no samurai        1954 
000000.7 Forrest Gump          1994 
000000.7 The Matrix           1999 
000000.7 The Lord of the Rings: The Two Towers    2002 
0000000233 309137 8.7 Cidade de Deus          2002 
0000000232 548307 8.6 Se7en            1995 
0000000232 459707 8.6 The Silence of the Lambs       1991 

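awk removes the parentheses from the last field and inserts a { character in front of it. The output is then piped to column, which builds the table using { as the delimiter. I chose { because I think it is unlikely to appear anywhere else in the data; if that is not the case, choose another character.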
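For readers without `column`, the same two-step idea (isolate the year, then pad every row to the widest title) can be sketched in Python. This is only an illustration: the function name `align_years` and the two-space gap are assumptions, and it expects every non-blank line to end in a `(YYYY)` field.

```python
def align_years(lines):
    """Move a trailing "(YYYY)" field into its own aligned column."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        year = fields[-1].strip("()")   # similar to gsub(/[()]/, "", $NF)
        text = " ".join(fields[:-1])    # everything before the last field
        rows.append((text, year))
    if not rows:
        return []
    # Pad every row to the widest text, then leave a two-space gap (arbitrary choice).
    width = max(len(text) for text, _ in rows)
    return [f"{text:<{width}}  {year}" for text, year in rows]


if __name__ == "__main__":
    sample = [
        "0000000124 733447 8.7 Inception (2010) ",
        "0000000232 548307 8.6 Se7en (1995)",
    ]
    print("\n".join(align_years(sample)))
```

Because the text and the year are kept as separate values, no delimiter character has to be reserved, so the "pick a character that does not occur in the data" concern does not arise here.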

If I were you, I would also quote the movie titles:

$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF;$4=q$4;$(NF-1)=$(NF-1)q}1' q='"' file | .. 
0000000124 462910 8.8 "Star Wars: Episode V - The Empire Strikes Back"  1980 
0000000124 698356 8.8 "The Lord of the Rings: The Fellowship of the Ring" 2001 
0000000233 393855 8.8 "One Flew Over the Cuckoo's Nest"      1975 
0000000124 733447 8.7 "Inception"           2010 
0000000233 411397 8.7 "Goodfellas"           1990 
000000.7 "Star Wars"           1977 
0000000124 146841 8.7 "Shichinin no samurai"        1954 
000000.7 "Forrest Gump"          1994 
000000.7 "The Matrix"           1999 
000000.7 "The Lord of the Rings: The Two Towers"    2002 
0000000233 309137 8.7 "Cidade de Deus"          2002 
0000000232 548307 8.6 "Se7en"            1995 
0000000232 459707 8.6 "The Silence of the Lambs"       1991 

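A better approach is to use a language like Python.

You can use the string method rfind() to compute the padding. If you have Python available, you could use the following script: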

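from __future__ import print_function  # lets the script run on Python 2.7 and 3

import os
import sys

# Optional second argument: approximate column at which the year is placed.
try:
    n = int(sys.argv[2])
except IndexError:
    n = 78

try:
    if os.path.isfile(sys.argv[1]):
        with open(sys.argv[1], 'r') as f:
            for line in f:
                line = line.strip()
                # Pad relative to the last "(" so titles containing "(" still work.
                pad = n - line.rfind("(")
                # line[:-7] drops " (YYYY)"; line[-5:-1] is the four-digit year.
                print(line[:-7], ' ' * pad, line[-5:-1])
    else:
        print("Please provide a file.")
except IndexError:
    print("Please provide a file.")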

Save it to a file such as table.py and run it like this:

$ python table.py file 
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back  1980 
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring  2001 
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest      1975 
0000000124 733447 8.7 Inception            2010 
0000000233 411397 8.7 Goodfellas           1990 
000000.7 Star Wars            1977 
0000000124 146841 8.7 Shichinin no samurai         1954 
000000.7 Forrest Gump           1994 
000000.7 The Matrix           1999 
000000.7 The Lord of the Rings: The Two Towers     2002 
0000000233 309137 8.7 Cidade de Deus          2002 
0000000232 548307 8.6 Se7en             1995 
0000000232 459707 8.6 The Silence of the Lambs        1991 
000000.9 The best file (of all time)       2025 

Note the added movie:

000000.9 The best file (of all time) (2025) 

If you need to move the release-year column further to the right, pass the position as a second argument:

$ python table.py file 100 
0

Here is a Python 2.X solution:

$ python --version 
Python 2.7.3 
$ echo "0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980)" | python -c "import sys;s=sys.stdin.readlines()[0]; print '%s\t%s' % (s[:-7], s[-6:-2])" 
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980 

If your strings are in a file named tmpfile, then:

$ cat tmpfile | python -c "import sys;map(lambda i: sys.stdout.write('%s %s %s\n' % (i[:-8], ' '*(100-len(i)), i[-6:-2])), sys.stdin.readlines())" 
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back      1980 
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring     2001 
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest          1975 
0000000124 733447 8.7 Inception               2010 
0000000233 411397 8.7 Goodfellas               1990 
000000.7 Star Wars               1977 
0000000124 146841 8.7 Shichinin no samurai            1954 
000000.7 Forrest Gump              1994 
000000.7 The Matrix               1999 
000000.7 The Lord of the Rings: The Two Towers        2002 
0000000233 309137 8.7 Cidade de Deus              2002 
0000000232 548307 8.6 Se7en                1995 
0000000232 459707 8.6 The Silence of the Lambs           1991 
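The fixed slice offsets in the one-liners above assume that `)` is immediately followed by the newline, whereas the sample input has a trailing space on each line. A minimal Python 3 sketch that strips trailing whitespace before slicing is shown below; the helper names `split_year` and `format_line` and the default width of 100 are illustrative assumptions, and every line is assumed to end in ` (YYYY)`.

```python
def split_year(line):
    """Return (text_before_year, year) for a line ending in " (YYYY)"."""
    line = line.rstrip()  # drop the newline and any trailing spaces first
    # " (YYYY)" is 7 characters long; the year is the 4 characters before ")".
    return line[:-7], line[-5:-1]


def format_line(line, width=100):
    """Pad the text to `width` characters so the year starts at that offset."""
    text, year = split_year(line)
    return text.ljust(width) + year


if __name__ == "__main__":
    print(format_line("0000000233 411397 8.7 Goodfellas (1990) \n", width=60))
```

Stripping first means the offsets no longer depend on how much whitespace follows the closing parenthesis.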
+1

Throwing in a tab works for a single line... but how do you line up the column across multiple lines? –

+0

I fixed it, see the update –

+0

@MichaelKazarian There is a small problem. Look at the first line of the output. :) There is a trailing space in the input. – Kent

5
sed 's/)\s*$//' file|column -s '(' -t 

will work on the given input and give you the expected output.

Tested here:

kent$ echo "0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980) 
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring (2001) 
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest (1975) 
0000000124 733447 8.7 Inception (2010) 
0000000233 411397 8.7 Goodfellas (1990) 
000000.7 Star Wars (1977) 
0000000124 146841 8.7 Shichinin no samurai (1954) 
000000.7 Forrest Gump (1994) 
000000.7 The Matrix (1999) 
000000.7 The Lord of the Rings: The Two Towers (2002) 
0000000233 309137 8.7 Cidade de Deus (2002) 
0000000232 548307 8.6 Se7en (1995) 
0000000232 459707 8.6 The Silence of the Lambs (1991)"|sed 's/)\s*$//'|column -s '(' -t 
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back  1980 
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring 2001 
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest      1975 
0000000124 733447 8.7 Inception           2010 
0000000233 411397 8.7 Goodfellas           1990 
000000.7 Star Wars           1977 
0000000124 146841 8.7 Shichinin no samurai        1954 
000000.7 Forrest Gump          1994 
000000.7 The Matrix           1999 
000000.7 The Lord of the Rings: The Two Towers    2002 
0000000233 309137 8.7 Cidade de Deus          2002 
0000000232 548307 8.6 Se7en            1995 
0000000232 459707 8.6 The Silence of the Lambs       1991 
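If titles may themselves contain parentheses, one option is to match only a four-digit year in parentheses at the very end of the line. The following Python sketch illustrates that; the pattern and the name `extract_year` are assumptions for illustration, not part of the sed answer.

```python
import re

# A parenthesised four-digit year at the end of the line, allowing whitespace around it.
YEAR_AT_END = re.compile(r"\s*\((\d{4})\)\s*$")


def extract_year(line):
    """Return (text, year); year is None when no trailing "(YYYY)" is found."""
    match = YEAR_AT_END.search(line)
    if match is None:
        return line.rstrip(), None
    return line[:match.start()], match.group(1)


if __name__ == "__main__":
    print(extract_year("000000.9 The best file (of all time) (2025) \n"))
```

Anchoring the pattern with `$` leaves any earlier parentheses in the title untouched.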
+0

Note: this requires that the movie titles do not contain parentheses, which is why I used a brace instead. –

+2

@sudo_O I noticed that, which is why I said "the given input". I also did a quick title search on imdb for '(' and found no results, so I just posted the answer :D – Kent

+0

If you don't care about that for the movies, then check @MiklosAubert's answer; I only used this approach to stay flexible about the movie titles. –

4

Here is a solution with awk that works with your sample data:

$ awk -F\( '{printf("%-77s %d\n", $1, $2)}' movies.txt

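Adjust the format to your liking. Here the year is placed at column 78 (counting from 0); you can change that in the format specifier. For example, if you want it to start at column 100, use %-99s.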
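The fixed-width format can be mirrored in Python with a left-justified format spec. This is a sketch under assumptions: `place_year` and its `column` parameter are illustrative names, each line is assumed to contain a `(YYYY)` field, and unlike the awk command it splits on the last `(` rather than the first.

```python
def place_year(line, column=78):
    """Left-justify the text before the last "(" and put the year at `column` (0-based)."""
    text, _, rest = line.rstrip().rpartition("(")
    year = rest.rstrip(")")
    # Mirrors printf's "%-77s %d": pad the text to column - 1 characters, then one space.
    return "{:<{w}} {}".format(text, year, w=column - 1)


if __name__ == "__main__":
    print(place_year("0000000124 733447 8.7 Inception (2010) "))
    print(place_year("0000000232 548307 8.6 Se7en (1995) ", column=100))
```

Changing `column` plays the same role as changing the width in the format specifier.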

+0

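This would probably have been my approach too, but I was trying to make sure the solution would cope with titles that have '(' in them. +1, because this is probably the best solution if you ignore that caveat. –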
