I have a text file, part of which is shown below, and I want to use a regular expression to split each line into 6 columns:
Column 1 = distribution
Column 2 = votes
Column 3 = rank
Column 4 = title
Column 5 = year
Column 6 = Subtitle (but only where there is a subtitle)
The regular expression I am using is:
regexp =
"([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)";
But as you can tell, it doesn't seem to work. Any ideas on how I could fix it?
1000000103 50 4.5 #1 Single (2006) {THis would be a subtitle example}
2...1.2.12 8 2.7 $1,000,000 Chance of a Lifetime (1986)
11..2.2..2 8 5.0 $100 Taxi Ride (2001)
....13.311 9 7.1 $100,000 Name That Tune (1984)
3..21...22 10 4.6 $2 Bill (2002)
30010....3 18 2.7 $25 Million Dollar Hoax (2004)
2000010002 111 5.6 $40 a Day (2002)
2000000..4 26 1.6 $5 Cover (2009)
.0..2.0122 15 7.8 $9.99 (2003)
..2...1113 8 7.5 $weepstake$ (1979)
0000000125 3238 8.7 Allo Allo! (1982)
1....22.12 8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7)
The code I'm using:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

try {
    // Open the ratings file for line-by-line reading.
    FileInputStream file_stream = new FileInputStream("/Users/angadsoni/Desktop/ratings-1.txt");
    BufferedReader bf = new BufferedReader(new InputStreamReader(file_stream));
    Statement stmt;
    Connection con = null;
    Class.forName("org.gjt.mm.mysql.Driver").newInstance();
    String url = "jdbc:mysql://localhost/mynewdatabase";
    con = DriverManager.getConnection(url, "root", "");
    stmt = con.createStatement();
    try {
        // Same spelling as in CREATE TABLE; table names can be case-sensitive.
        stmt.executeUpdate("DROP TABLE mytable");
    } catch (Exception e) {
        System.out.print(e);
        System.out.println("No existing table to delete");
    }
    // Create a table in the database named mytable
    stmt.executeUpdate("CREATE TABLE mytable(distribution char(20)," + "votes integer," + "rank float," + "title char(250)," + "year integer," + "sub char(250));");
    // Groups: 1 distribution, 2 votes, 3 rank, 4 title, 5 year, 6 optional subtitle.
    String rege = "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?";
    Pattern pattern = Pattern.compile(rege);
    String line;
    String data = "";
    while ((line = bf.readLine()) != null) {
        // Strip apostrophes so they cannot break the quoted SQL values below.
        data = line.replaceAll("'", "");
        Matcher matcher = pattern.matcher(data);
        if (matcher.find()) {
            String distribution = matcher.group(1);
            String votes = matcher.group(2);
            String rank = matcher.group(3);
            String title = matcher.group(4);
            String year = matcher.group(5);
            // Group 6 did not participate in the match when there is no subtitle.
            String sub = matcher.start(6) != -1 ? matcher.group(6) : "";
            System.out.printf("%s %8s %6s%n%s (%s) %s%n%n",
                    distribution, votes, rank, title, year, sub);
            // The closing quote after year was missing in the original statement.
            String todo = ("INSERT into mytable " +
                    "(Distribution, Votes, Rank, Title, Year, Sub) " +
                    "values ('" + distribution + "', '" + votes + "', '" + rank + "', '" + title + "', '" + year + "', '" + sub + "');");
            stmt.executeUpdate(todo);
        } // end if statement
    } // end while loop
    bf.close();
    stmt.close();
    con.close();
} catch (Exception e) {
    e.printStackTrace();
}
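To check the pattern independently of the database code above, here is a minimal, self-contained sketch of a six-group pattern applied to a few of the sample lines. The class name `RatingsRegexSketch` and the `parse` helper are hypothetical, and the pattern is a variant of the one in the code above: it assumes a four-digit year and whitespace-separated fields as in the sample, and it tolerates a missing closing brace on the subtitle. It is not meant as a definitive solution.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class RatingsRegexSketch {

    // Groups: 1 distribution, 2 votes, 3 rank, 4 title, 5 year, 6 optional subtitle.
    // Assumption: the year is always four digits in parentheses.
    static final Pattern LINE = Pattern.compile(
            "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d{4})\\)"
            + "(?:\\s+\\{([^{}]+)\\}?)?\\s*$");

    // Returns the six columns, or null when the line does not fit the pattern.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // group(6) is null when the line has no subtitle.
        String sub = m.group(6) != null ? m.group(6) : "";
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), sub
        };
    }

    public static void main(String[] args) {
        String[] samples = {
            "1000000103 50 4.5 #1 Single (2006) {THis would be a subtitle example}",
            ".0..2.0122 15 7.8 $9.99 (2003)",
            "1....22.12 8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7)"
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

Because the title group is lazy (`.+?`), it stops at the first parenthesised four-digit number, so a subtitle such as `(#7.7)` is not mistaken for the year.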
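Well, I have split it into 4 different columns (distribution, votes, rank, title), and I would like the title split into 3 parts so that it is easier to find things by year. Regex: "([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([.0-9]\\[0-9])[\\s]+([^\\s]*$)". This works fine for the 4 columns. – 2010-03-09 17:47:25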
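Won't you give up on using the wrong tool for the job? You have been struggling with this for **over a week** already. In response to one of your earlier questions, I wrote a complete working parser example, including the full JDBC code, in less than 10 minutes; you may only need to edit it slightly to match the column positions in your file. What is it with that big *regex* tag of yours? :) http://stackoverflow.com/questions/2360418/would-a-regex-like-this-work-for-these-lines-of-text/2363260#2363260 – BalusC 2010-03-11 02:48:48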
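The linked answer is not reproduced here. As a separate illustration of parsing these lines without a regular expression for the whole line, here is a minimal sketch that is not BalusC's code. The class name `RatingsSplitSketch` and its helpers are hypothetical. It assumes the first three fields contain no spaces and that the year is the first parenthesised four-digit number after the title.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class RatingsSplitSketch {

    // True when every character in s[from, to) is a digit.
    static boolean isDigits(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns {distribution, votes, rank, title, year, sub}, or null if the line does not fit.
    static String[] parse(String line) {
        // Split off the first three fields; the fourth element keeps the rest of the line.
        String[] head = line.trim().split("\\s+", 4);
        if (head.length < 4) {
            return null;
        }
        String rest = head[3];

        // Look for the first "(dddd)" group, which is taken to be the year.
        int open = rest.indexOf('(');
        while (open >= 0) {
            int close = open + 5;
            if (close < rest.length() && rest.charAt(close) == ')'
                    && isDigits(rest, open + 1, close)) {
                break;
            }
            open = rest.indexOf('(', open + 1);
        }
        if (open < 0) {
            return null;
        }

        String title = rest.substring(0, open).trim();
        String year = rest.substring(open + 1, open + 5);
        String tail = rest.substring(open + 6).trim();

        // Optional subtitle in braces; the closing brace may be missing.
        String sub = "";
        if (tail.startsWith("{")) {
            sub = tail.substring(1);
            if (sub.endsWith("}")) {
                sub = sub.substring(0, sub.length() - 1);
            }
        }
        return new String[] { head[0], head[1], head[2], title, year, sub };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                parse("2...1.2.12 8 2.7 $1,000,000 Chance of a Lifetime (1986)")));
        System.out.println(Arrays.toString(
                parse("1....22.12 8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7)")));
    }
}
```

If the real file uses fixed column positions, as the comment suggests, `substring` calls with those positions could replace the `split` step.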