Regex to split text into 6 columns

This is the text file (well, part of it), and I want to split it into 6 columns, so that I have it as:

Column 1 = distribution 
Column 2 = votes 
Column 3 = rank 
Column 4 = title 
Column 5 = year 
Column 6 = Subtitle (but only where there is a subtitle)

The regex I am using is:

regexp = 
    "([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)";

But as you can tell, it doesn't seem to work. Any ideas on how I might fix it?

1000000103  50 4.5 #1 Single (2006) {THis would be a subtitle example} 
2...1.2.12  8 2.7 $1,000,000 Chance of a Lifetime (1986) 
11..2.2..2  8 5.0 $100 Taxi Ride (2001) 
....13.311  9 7.1 $100,000 Name That Tune (1984) 
3..21...22  10 4.6 $2 Bill (2002) 
30010....3  18 2.7 $25 Million Dollar Hoax (2004) 
2000010002  111 5.6 $40 a Day (2002) 
2000000..4  26 1.6 $5 Cover (2009) 
.0..2.0122  15 7.8 $9.99 (2003) 
..2...1113  8 7.5 $weepstake$ (1979) 
0000000125 3238 8.7 Allo Allo! (1982) 
1....22.12  8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7)

The code I'm using:

try {
    FileInputStream file_stream = new FileInputStream("/Users/angadsoni/Desktop/ratings-1.txt");
    DataInputStream data_stream = new DataInputStream(file_stream);
    BufferedReader bf = new BufferedReader(new InputStreamReader(data_stream));
    ResultSet rs;
    Statement stmt;
    Connection con = null;
    Class.forName("org.gjt.mm.mysql.Driver").newInstance();
    String url = "jdbc:mysql://localhost/mynewdatabase";
    con = DriverManager.getConnection(url, "root", "");
    stmt = con.createStatement();
    try {
        // Drop the table left over from a previous run, if there is one
        stmt.executeUpdate("DROP TABLE mytable");
    } catch (Exception e) {
        System.out.print(e);
        System.out.println("No existing table to delete");
    }

    // Create a table in the database named mytable
    stmt.executeUpdate("CREATE TABLE mytable(distribution char(20)," + "votes integer," + "rank float," + "title char(250)," + "year integer," + "sub char(250));");
    String rege = "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?";
    Pattern pattern = Pattern.compile(rege);
    String line;
    String data = "";
    while ((line = bf.readLine()) != null) {
        // Strip single quotes so they cannot break the SQL string built below
        data = line.replaceAll("'", "");
        Matcher matcher = pattern.matcher(data);
        if (matcher.find()) {
            System.out.println("hello");
            String distribution = matcher.group(1);
            String votes = matcher.group(2);
            String rank = matcher.group(3);
            String title = matcher.group(4);
            String year = matcher.group(5);
            // Group 6 (the subtitle) is optional; start(6) is -1 when it did not match
            String sub = matcher.start(6) != -1 ? matcher.group(6) : "";
            System.out.printf("%s %8s %6s%n%s (%s) %s%n%n",
                distribution, votes, rank, title, year, sub);
            // Every value, including year, needs both its opening and closing quote
            String todo = ("INSERT into mytable " +
                "(Distribution, Votes, Rank, Title, Year, Sub) " +
                "values ('" + distribution + "', '" + votes + "', '" + rank + "', '" + title + "', '" + year + "', '" + sub + "');");
            int r = stmt.executeUpdate(todo);
        } // end if statement
    } // end while loop
} catch (Exception e) {
    e.printStackTrace();
}

2010-03-09 angad Soni

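Well, I split it into 4 different columns (distribution, votes, rank, title), and I want the title split into 3 parts so that it is easier to find things depending on the year. Regex: "([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([.0-9]\\[0-9])[\\s]+([^\\s]*$)". This works fine for the 4 columns – 2010-03-09 17:47:25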

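Aren't you going to give up on using the wrong tool for the job? You've already been struggling with this for **over a week**. In response to one of your previous questions, I wrote a complete working parser example, with full JDBC code, in less than 10 minutes; you may only need to edit it slightly to match the positions of the columns in your file. What's with that big *regex* label in front of you? :) http://stackoverflow.com/questions/2360418/would-a-regex-like-this-work-for-these-lines-of-text/2363260#2363260 – BalusC 2010-03-11 02:48:48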

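My first thought is that it may be easier to split off the first few fields on whitespace with a StringTokenizer, and then use a regex for the remaining 3 fields. That way you can simplify the regex you need.
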

2010-03-09 17:44:12

split() is simpler: 'String[] parts = s.split("\\s+", 4);' – 2010-03-10 07:17:59
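The split-then-regex idea from this answer and comment can be sketched as follows. This is only an illustration under assumptions: the class name `SplitDemo`, the `parse` helper, and the title pattern are made up here and are not code from the thread. The first three fields come from `split("\\s+", 4)`, and a smaller regex handles only the remaining "title (year) {subtitle" part.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitDemo {
    // Applied only to the 4th field: "title (year)" plus an optional "{subtitle" tail.
    private static final Pattern TITLE =
        Pattern.compile("^(.+?)\\s+\\((\\d{4})\\)(?:\\s+\\{([^{}]+))?");

    /** Returns {distribution, votes, rank, title, year, subtitle}, or null if the line does not fit. */
    static String[] parse(String line) {
        // A limit of 4 keeps everything after the third field together in parts[3]
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4) {
            return null;
        }
        Matcher m = TITLE.matcher(parts[3]);
        if (!m.find()) {
            return null;
        }
        // group(3) is null when the line has no subtitle
        String sub = m.group(3) != null ? m.group(3) : "";
        return new String[] { parts[0], parts[1], parts[2], m.group(1), m.group(2), sub };
    }

    public static void main(String[] args) {
        String[] cols = parse("1....22.12  8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7)");
        System.out.println(String.join(" | ", cols));
    }
}
```

Because the whitespace-separated fields never reach the regex, the pattern only has to describe the title, the year, and the optional subtitle.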

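There may be other problems, but the first hurdle is that a single backslash in a Java string literal never makes it to the regex engine. You need to double them.
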
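As a small illustration of the doubling rule (the class and method names below are invented for this sketch): in Java source, `"\\d"` is a two-character string, a backslash followed by `d`, which is what the regex engine needs to see. A literal written with a single backslash, `"\d"`, does not compile at all, because `\d` is not a valid Java string escape.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Source text "\\d+\\.\\d" produces the 7-character regex \d+\.\d
    static final String RATING = "\\d+\\.\\d";

    /** True when the whole input looks like a rating such as 4.5 */
    static boolean isRating(String s) {
        return Pattern.matches(RATING, s);
    }

    public static void main(String[] args) {
        System.out.println(RATING);          // what the regex engine receives
        System.out.println(isRating("4.5")); // true
    }
}
```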

2010-03-09 17:45:04

I tried to come up with a regex for the part from the title onward, similar to the one you started with, and came up with

(.*)\\s+(\\([0-9]{4}\\))\\s+(.*$)

Maybe you could provide some more code to clarify exactly what you are doing with the regex? Also, is there a problem with this answer?


2010-03-10 06:23:46 user286640

This regex works correctly with the data you provided:

^([\d.]+)\s+(\d+)\s+([\d.]+)\s+(.+?)\s+\((\d+)\)(?:\s+\{([^{}]+))?

If there is no subtitle, the last group (group #6) will be null.

EDIT: here is a complete example:

import java.io.*; 
import java.util.*; 
import java.util.regex.*; 

public class Test 
{ 
    public static void main(String[] args) throws Exception 
    { 
    Pattern p = Pattern.compile(
     "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?" 
    ); 
    Matcher m = p.matcher(""); 
    Scanner sc = new Scanner(new File("test.txt")); 
    while (sc.hasNextLine()) 
    { 
     String s = sc.nextLine(); 
     if (m.reset(s).find()) 
     { 
     System.out.printf("%s %8s %6s%n%s (%s) %s%n%n", 
      m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), 
      m.start(6) != -1 ? m.group(6) : ""); 
     } 
    } 
    } 
}

Partial output:

1000000103  50 4.5 
#1 Single (2006) THis would be a subtitle example 

2...1.2.12  8 2.7 
$1,000,000 Chance of a Lifetime (1986)

...etc.

2010-03-10 07:16:30

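Unfortunately it doesn't seem to work for me – 2010-03-10 18:40:33

@angad: It works for me; see my edit. – 2010-03-10 19:52:00

Yeah, it prints out just fine. But here is where I run into trouble, and maybe you can help me: I want to save the value of each m.group() in a separate variable and then insert it into a MySQL table. My code might be useful, so you can tell me where I am going wrong. – 2010-03-10 21:54:36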
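One possible way to do what this comment asks is sketched below, under assumptions: the class name `RatingInsertSketch` and both helper methods are hypothetical, the table and column names are taken from the CREATE TABLE statement in the question, and the insert has not been run against a real database. It uses a `PreparedStatement`, so the values need no manual quoting and apostrophes in titles do not have to be stripped.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RatingInsertSketch {
    // Same pattern as in the answer above
    private static final Pattern LINE = Pattern.compile(
        "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?");

    /** Copies groups 1..6 into an array; a missing subtitle becomes "". Returns null on no match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] cols = new String[6];
        for (int i = 0; i < cols.length; i++) {
            String g = m.group(i + 1);
            cols[i] = g != null ? g : "";
        }
        return cols;
    }

    /** Inserts one parsed row; assumes the mytable schema from the question. */
    static void insert(Connection con, String[] c) throws SQLException {
        String sql = "INSERT INTO mytable (distribution, votes, rank, title, year, sub) "
                   + "VALUES (?, ?, ?, ?, ?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, c[0]);
            ps.setInt(2, Integer.parseInt(c[1]));
            ps.setDouble(3, Double.parseDouble(c[2]));
            ps.setString(4, c[3]);
            ps.setInt(5, Integer.parseInt(c[4]));
            ps.setString(6, c[5]);
            ps.executeUpdate();
        }
    }
}
```

Inside the read loop this would be used as `String[] cols = RatingInsertSketch.parse(line); if (cols != null) RatingInsertSketch.insert(con, cols);`.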
