2012-05-02
1

Matrix to PostgreSQL tables

I have a file, universities.txt, that looks like this:

 
Alabama 

Air University 
Alabama A&M University 
Alabama State University 
Concordia College-Selma 
Faulkner University 
Huntingdon College 
Jacksonville State University 
Judson College 
Miles College 
Oakwood College 
Samford University 
Southeastern Bible College 
Southern Christian University 
Spring Hill College 
Stillman College 
Talladega College 
University of North Alabama 
University of South Alabama 
University of West Alabama 

Alaska 

Alaska Bible College 
Alaska Pacific University 
Sheldon Jackson College 
University of Alaska - Anchorage 
University of Alaska - Fairbanks 
University of Alaska - Southeast 

Arizona 

American Indian College of the Assemblies of God 
Arizona State University 
Arizona State University East 
Arizona State University West 
DeVry University-Phoenix 
Embry-Riddle Aeronautical University 
Grand Canyon University 
Northcentral University 
Northern Arizona University 

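...and so on. In this case Alabama, Alaska and Arizona are locations, and everything else is a university. What I want to do is load the locations into a table named Location and the universities into a table named University, where the Id of the Location table is a foreign key in the University table, like this: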

CREATE TABLE Location (
Id   SERIAL PRIMARY KEY, 
Name  TEXT 
); 

CREATE TABLE University (
Id   SERIAL PRIMARY KEY, 
Location INTEGER REFERENCES Location (Id) NOT NULL, 
Name  TEXT 
); 

So what I want to do in Postgres is something like this:

for (int i = 0; i < universities.size(); i++) {
    // each entry in the universities vector is a tuple with the first entry being the country/state
    // and the second entry being a vector of the universities as Strings
    Vector tuple = (Vector) universities.get(i);
    // insert into location table
    String state = (String) tuple.get(0);
    Vector u = (Vector) tuple.get(1);
    for (int j = 0; j < u.size(); j++) {
        // insert into university table with i as FK to location table
    }
}

Does anyone know how to do this?

Answers

1

Here is a pure SQL solution.

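Use COPY to import the file into a temporary table, then do the rest with a single DML statement that uses data-modifying CTEs (requires PostgreSQL 9.1 or later). Both steps should be fast.

Temporary table with a single text column (dropped automatically at the end of the session):

CREATE TEMP TABLE tmp (txt text); 

Import the data from the file:

COPY tmp FROM '/path/to/file.txt';

If you are doing this from a remote client, use the psql meta-command \copy instead.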

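My solution depends on the data format shown in the question, i.e. there is an empty line before and after each city (the location header). I assume the import file contains actual empty strings on those lines. Make sure there is a leading empty line before the first city, to avoid a special case.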

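The rows are inserted in file order, and I rely on that for the following window functions, which have no ORDER BY. This leans on how a freshly loaded table is read back in practice rather than on anything SQL guarantees.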

WITH x AS (
    SELECT txt
         , row_number() OVER () AS rn
         , lead(txt) OVER () = '' AND
           lag(txt)  OVER () = '' AS city
    FROM   tmp                    -- don't remove empty rows just yet
    ), y AS (
    SELECT txt, city
         , sum(city::int) OVER w AS id
    FROM   x
    WHERE  txt <> ''              -- remove empty rows now
    WINDOW w AS (ORDER BY rn)
    ), l AS (
    INSERT INTO location (id, name)
    SELECT id, txt
    FROM   y
    WHERE  city
    ), u AS (
    INSERT INTO university (location, name)
    SELECT id, txt
    FROM   y
    WHERE  NOT city
    )
SELECT setval('location_id_seq', max(id))
FROM   y;

Voilà.

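  • CTE x marks cities, based on an empty string value in the rows before and after them.

  • CTE y adds a running sum of cities (id), thereby forming a perfectly valid id for every city and its universities.

  • CTEs l and u do the inserts, which is easy now.

  • The final SELECT sets the next value for the sequence attached to location.id. We did not use the sequence, so we have to set it to the current maximum, or we would run into duplicate key errors on future INSERTs into location.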
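For readers who find the window functions hard to follow, the same classification can be traced outside the database. The following Python sketch is not part of the answer above; it mirrors CTEs x and y on an in-memory list of lines. The function name `classify` and the sample data are made up for illustration, and it assumes, like the SQL, that empty lines are real empty strings and that the input starts with an empty line.

```python
def classify(lines):
    """Mirror CTEs x and y: return (id, is_city, text) for every non-empty line."""
    n = len(lines)
    rows = []
    running = 0
    for i, txt in enumerate(lines):
        # x: a line is a "city" when the lines before and after it are empty.
        # At the edges lag/lead yield NULL in SQL; this sketch simply treats
        # those rows as non-cities (assumption: input starts with an empty line).
        city = 0 < i < n - 1 and lines[i - 1] == "" and lines[i + 1] == ""
        if txt == "":
            continue  # y: empty rows are dropped before the running sum
        # y: the running sum of the city flag gives each city and its
        # universities the same id.
        running += int(city)
        rows.append((running, city, txt))
    return rows


sample = ["", "Alabama", "", "Air University", "Judson College",
          "", "Alaska", "", "Alaska Bible College"]
rows = classify(sample)
locations = [(i, t) for i, c, t in rows if c]         # what CTE l would insert
universities = [(i, t) for i, c, t in rows if not c]  # what CTE u would insert
print(locations)
print(universities)
```

The two lists correspond to the rows that CTEs l and u would insert into location and university.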

+0

Please see the comment I made on the other answer. Thanks for your help.

1

Munging the raw file into tabular form first is the safest approach; after that you can load it with COPY.

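# header: current state name; body: 1 while inside its list; bl: 1 after a blank line
BEGIN { bl=0; body=0; header="" }
# a blank line after a list ends the block, so forget the header
$0 == "" && body==1 && header!="" { header=""; body=0; bl=1; next }
$0 == "" && body==0 { bl=1; next }
# the first non-blank line of a block is the header
$0 != "" && header=="" { header=$0; bl=0; next }
# every further non-blank line is a list entry; concatenate so no spaces pad the comma
$0 != "" && bl==1 && header!="" { body=1; print header "," $0 }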

Something like that awk script will turn your file into tabular form, which you can then load with a plain COPY statement from psql:

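COPY university_data_file_table (country, university) FROM '/path/to/awk-mashed-file' WITH (FORMAT csv);

Then you can transform that table into your separate ones:

CREATE TABLE country AS SELECT row_number() OVER (ORDER BY country) AS id, country FROM (SELECT DISTINCT country FROM university_data_file_table) AS c;
CREATE TABLE university AS SELECT country.id AS country_id, udft.university FROM country JOIN university_data_file_table udft ON udft.country = country.country;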

Something like that is easy to script with psql. As I said, you have to do the initial transformation first.
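If awk is not at hand, the same reshaping can be done in a few lines of Python. This is only a sketch under the same layout assumptions as the awk script (a header line, an empty line, the list entries, then another empty line). The function name `to_rows` and the sample data are hypothetical. Unlike the awk script, it strips surrounding whitespace, and it uses the csv module so that names containing commas or quotes are escaped.

```python
import csv
import io


def to_rows(lines):
    """Yield (header, entry) pairs from header / blank / entries / blank blocks."""
    header = None
    in_body = False
    for raw in lines:
        line = raw.strip()          # tolerate trailing spaces in the file
        if line == "":
            if in_body:             # a blank line after the entries ends the block
                header, in_body = None, False
            continue
        if header is None:          # first non-blank line of a block
            header = line
        else:                       # every further non-blank line is an entry
            in_body = True
            yield header, line


sample = """
Alabama

Air University
Alabama A&M University

Alaska

Alaska Bible College
""".splitlines()

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(to_rows(sample))
print(buf.getvalue(), end="")
```

Writing to a real file instead of `io.StringIO` gives a CSV that COPY can read in csv format.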

+0

The table definition for university_data_file_table is left as an exercise for the reader.

+0

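Looks cool. I had already started working on this task, since the question was posted last Tuesday. My solution ended up generating an SQL file full of INSERT INTO statements from my Java code. Not the cleanest solution, but it works. Sorry for not saying so earlier. I guess I thought this post was dead. But thanks for your help anyway!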
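The commenter's code is not shown. As a rough illustration of that approach only, here is a Python sketch that turns the state/university tuples from the question into INSERT statements. The function names and sample data are hypothetical. It assumes standard_conforming_strings is on (the PostgreSQL default since 9.1), so doubling single quotes is enough to escape a literal, and it assigns the location ids itself, which is why it finishes with the same setval call as the first answer.

```python
def sql_literal(text):
    """Quote a string as an SQL literal by doubling single quotes.

    Assumes standard_conforming_strings = on, so backslashes need no escaping.
    """
    return "'" + text.replace("'", "''") + "'"


def insert_statements(groups):
    """Build INSERT statements from [(state, [university, ...]), ...]."""
    out = []
    for loc_id, (state, unis) in enumerate(groups, start=1):
        out.append(
            f"INSERT INTO Location (Id, Name) VALUES ({loc_id}, {sql_literal(state)});"
        )
        for uni in unis:
            out.append(
                "INSERT INTO University (Location, Name) "
                f"VALUES ({loc_id}, {sql_literal(uni)});"
            )
    # The ids were supplied explicitly, so move the serial sequence past them.
    if groups:
        out.append(f"SELECT setval('location_id_seq', {len(groups)});")
    return out


groups = [
    ("Alabama", ["Air University", "Judson College"]),
    ("Alaska", ["Alaska Bible College"]),
]
statements = insert_statements(groups)
print("\n".join(statements))
```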