2010-03-09 18 views
0

Regular expression to split text into 6 columns. This is part of a text file that I want to split up so that I can store it as:

Column 1 = distribution 
Column 2 = votes 
Column 3 = rank 
Column 4 = title 
Column 5 = year 
Column 6 = Subtitle (but only where there is a subtitle) 

The regular expression I am using is:

regexp = 
    "([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)"; 

But as you can tell, it does not seem to work. Any ideas on how I could fix it?

1000000103  50 4.5 #1 Single (2006) {THis would be a subtitle example} 
2...1.2.12  8 2.7 $1,000,000 Chance of a Lifetime (1986) 
11..2.2..2  8 5.0 $100 Taxi Ride (2001) 
....13.311  9 7.1 $100,000 Name That Tune (1984) 
3..21...22  10 4.6 $2 Bill (2002) 
30010....3  18 2.7 $25 Million Dollar Hoax (2004) 
2000010002  111 5.6 $40 a Day (2002) 
2000000..4  26 1.6 $5 Cover (2009) 
.0..2.0122  15 7.8 $9.99 (2003) 
..2...1113  8 7.5 $weepstake$ (1979) 
0000000125 3238 8.7 Allo Allo! (1982) 
1....22.12  8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7) 

The code I'm using:

try { 
     FileInputStream file_stream = new FileInputStream("/Users/angadsoni/Desktop/ratings-1.txt"); 
     DataInputStream data_stream = new DataInputStream(file_stream); 
     BufferedReader bf = new BufferedReader(new InputStreamReader(data_stream)); 
     ResultSet rs; 
     Statement stmt; 
     Connection con = null; 
     Class.forName("org.gjt.mm.mysql.Driver").newInstance(); 
     String url = "jdbc:mysql://localhost/mynewdatabase"; 
     con = DriverManager.getConnection(url,"root",""); 
     stmt = con.createStatement(); 
    try{ 
    stmt.executeUpdate("DROP TABLE myTable"); 
    }catch(Exception e){ 
    System.out.print(e); 
    System.out.println("No existing table to delete"); 

    //Create a table in the database named mytable 
    stmt.executeUpdate("CREATE TABLE mytable(distribution char(20)," + "votes integer," + "rank float," + "title char(250)," + "year integer," + "sub char(250));"); 
String rege= "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?"; 
    Pattern pattern = Pattern.compile(rege); 
    String line; 
    String data= ""; 
    while ((line = bf.readLine()) != null) { 
    data = line.replaceAll("'", ""); 

Matcher matcher = pattern.matcher(data);

if (matcher.find()) { 
     System.out.println("hello"); 
     String distribution = matcher.group(1); 
     String votes = matcher.group(2); 
     String rank = matcher.group(3); 
     String title = matcher.group(4); 
     String year = matcher.group(5); 
     String sub = matcher.start(6) != -1 ? matcher.group(6) : ""; 
     System.out.printf("%s %8s %6s%n%s (%s) %s%n%n", 
     matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4), matcher.group(5), 
     matcher.start(6) != -1 ? matcher.group(6) : ""); 
     String todo = ("INSERT into mytable " + 
      "(Distribution, Votes, Rank, Title, Year, Sub) "+ 
      "values ('"+distribution+"', '"+votes+"', '"+rank+"', '"+title+"', '"+year+", '"+sub+"');"); 
     int r = stmt.executeUpdate(todo); 
    }//end if statement 
    }//end while loop 
} 
+0

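Well, I have it split into 4 different columns (distribution, votes, rank, title), and I would like the title split into 3 parts so that it is easier to find things by year. The regular expression: "([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([0-9]\\.[0-9])[\\s]+([^\\s].*$)". This works fine for 4 columns – 2010-03-09 17:47:25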

+0

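Aren't you ever going to give up using the wrong tool for the job? You have been struggling with this for **over** a **week** already. In response to one of your earlier questions, I wrote a complete working parser example, including the JDBC code, in less than 10 minutes; you may only need to edit it slightly to match the column positions in your file. What is it with that big *regex* tag in front of you? :) http://stackoverflow.com/questions/2360418/would-a-regex-like-this-work-for-these-lines-of-text/2363260#2363260 – BalusC 2010-03-11 02:48:48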

Answers

0

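My first thought is that it might be easier to split off the first few fields on whitespace with a StringTokenizer, and then use a regular expression for the remaining 3 fields. That way you can simplify the regex you need.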

+0

split() is simpler: `String[] parts = s.split("\\s+", 4);` – 2010-03-10 07:17:59
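To make the suggestion above concrete, here is a minimal sketch that peels off the first three fields with `split("\\s+", 4)` and then runs a much smaller regex over the remainder. The class name `SplitThenMatch` and the method `parse` are made up for illustration; the pattern for the remainder is just the tail of the full regex posted in another answer on this page, so treat it as an assumption about what the file looks like, not as a tested parser for the whole file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitThenMatch {
    // Title, year in parentheses, then an optional subtitle after "{".
    // (Hypothetical pattern, adapted from the tail of the full regex in this thread.)
    private static final Pattern REST = Pattern.compile(
        "^(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?");

    /** Returns {distribution, votes, rank, title, year, subtitle}, or null if the line does not fit. */
    static String[] parse(String line) {
        // Limit 4: three whitespace-separated fields, then everything else untouched.
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4) {
            return null;
        }
        Matcher m = REST.matcher(parts[3]);
        if (!m.find()) {
            return null;
        }
        // Group 3 is null when the line has no subtitle.
        String sub = m.group(3) != null ? m.group(3).trim() : "";
        return new String[] { parts[0], parts[1], parts[2], m.group(1), m.group(2), sub };
    }

    public static void main(String[] args) {
        String[] f = parse("2...1.2.12  8 2.7 $1,000,000 Chance of a Lifetime (1986)");
        System.out.println(String.join(" | ", f));
    }
}
```

The point of the split is that the regex no longer has to describe the distribution, votes and rank columns at all; it only has to find the year and the optional subtitle.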

1

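There may be other problems, but the first hurdle is that the backslashes are not making it through to the regex engine. You need to double them.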
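A small illustration of the doubling, in case it helps (the class name `EscapeDemo` and the two constants are arbitrary names for this sketch): the Java compiler consumes one level of backslashes in a string literal, so the source text `"\\d"` is what hands `\d` to the regex engine.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // In Java source, "\\d+" is a three-character string: backslash, 'd', '+'.
    static final String DIGITS = "\\d+";
    // "\\(" reaches the regex engine as \( and matches a literal parenthesis.
    static final String YEAR = "\\((\\d{4})\\)";

    public static void main(String[] args) {
        System.out.println(DIGITS.length());                 // 3
        System.out.println(Pattern.matches(DIGITS, "3238")); // true
        System.out.println(Pattern.matches(YEAR, "(1982)")); // true
    }
}
```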

0

I tried to come up with a regex for the title part onwards, similar to what you came up with for the beginning:

(.*)\\s+(\\([0-9]{4}\\))\\s+(.*$) 

Maybe you could provide some more code to clarify exactly what you need the regex to do? Also, is there a problem with this answer?

0

This regex works correctly with the data you provided:

^([\d.]+)\s+(\d+)\s+([\d.]+)\s+(.+?)\s+\((\d+)\)(?:\s+\{([^{}]+))? 

If there is no subtitle, the last group (group #6) will be null.

EDIT: Here is a complete example:

import java.io.*; 
import java.util.*; 
import java.util.regex.*; 

public class Test
{
    public static void main(String[] args) throws Exception
    {
        Pattern p = Pattern.compile(
            "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?"
        );
        Matcher m = p.matcher("");
        Scanner sc = new Scanner(new File("test.txt"));
        while (sc.hasNextLine())
        {
            String s = sc.nextLine();
            if (m.reset(s).find())
            {
                System.out.printf("%s %8s %6s%n%s (%s) %s%n%n",
                    m.group(1), m.group(2), m.group(3), m.group(4), m.group(5),
                    m.start(6) != -1 ? m.group(6) : "");
            }
        }
    }
}

Partial output:

1000000103  50 4.5 
#1 Single (2006) THis would be a subtitle example 

2...1.2.12  8 2.7 
$1,000,000 Chance of a Lifetime (1986) 

...etc.

+0

Unfortunately it doesn't seem to work for me – 2010-03-10 18:40:33

+0

@angad: It works for me; see my edit. – 2010-03-10 19:52:00

+0

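Yeah, it prints out just fine. But here is the problem I'm running into, and maybe you can help: I want to store the value of each m.group() in a separate variable and then insert it into a MySQL table. I have added my code to the question in case it is useful, so you can tell me where I'm going wrong. – 2010-03-10 21:54:36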
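Two things stand out in the code posted in the question: the `catch` block after `DROP TABLE` is never closed before the `CREATE TABLE` call, and the concatenated INSERT string is missing the closing quote after `year` (`'"+year+", '` should end in `"', '`). One way to sidestep the quoting problem entirely is a `PreparedStatement`. Below is a rough sketch under some assumptions: `RatingsLoader`, `parse` and `load` are names invented here, the table and column names are copied from the question, and the regex is the one from the answer above. The database part has not been run against a real MySQL server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RatingsLoader {
    // Same pattern as in the answer above.
    static final Pattern LINE = Pattern.compile(
        "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?");

    // Column names taken from the CREATE TABLE statement in the question.
    static final String INSERT_SQL =
        "INSERT INTO mytable (distribution, votes, rank, title, year, sub) "
        + "VALUES (?, ?, ?, ?, ?, ?)";

    /** Returns the six fields of one line, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Group 6 is null when there is no subtitle.
        String sub = m.group(6) != null ? m.group(6).trim() : "";
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), sub };
    }

    /** Reads every line, inserts the ones that match, and returns the row count. */
    static int load(Connection con, BufferedReader in) throws SQLException, IOException {
        int rows = 0;
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = parse(line);
                if (f == null) {
                    continue; // skip lines that do not fit the pattern
                }
                // The driver does the quoting, so apostrophes in titles are safe.
                ps.setString(1, f[0]);
                ps.setInt(2, Integer.parseInt(f[1]));
                ps.setFloat(3, Float.parseFloat(f[2]));
                ps.setString(4, f[3]);
                ps.setInt(5, Integer.parseInt(f[4]));
                ps.setString(6, f[5]);
                ps.executeUpdate();
                rows++;
            }
        }
        return rows;
    }
}
```

With bound parameters there should be no need for the `replaceAll("'", "")` step, since the titles are never spliced into the SQL text. If your MySQL version treats `rank` as a reserved word, quote that column name with backticks in both the CREATE TABLE and the INSERT.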