2015-05-29 136 views

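Split a file by XML tags

I'm an amateur Java coder and I'm stuck on an assignment. I've written most of the code except for the essential part, and I'm not sure how to go about it. I hope someone can point me in the right direction.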

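I made a class called Splitter. Its job is to read an XML file and split it into smaller files based on a specific XML start/end tag, and each smaller file must also be smaller than a given maxfilesize.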

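In addition, the old versions of the files must be moved into an archive folder with a timestamp. I mostly have that working. What I don't know is how to split by the start/end tags. I have a getXML method that reads everything between these tags, but I'm not sure what to do with it once I call it from the split method.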
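For the archiving step, here is a minimal sketch using java.nio.file instead of File.renameTo, assuming the archive folder is Output/Archive next to the output files. Archiver and archiveIfExists are hypothetical names. Two differences from renameTo: Files.createDirectories creates the Archive folder if it is missing (renameTo simply fails in that case), and Files.move throws an IOException explaining the failure instead of returning false. The pattern uses HH (24-hour clock); with hh, a run at 09:00 and one at 21:00 would get the same timestamp.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class Archiver {
    // HH = 24-hour clock, so morning and evening runs get different stamps
    static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm_ss");

    /**
     * Moves an existing output file into archiveDir as "<name>-<timestamp>".
     * Returns the archived path, or null if there was nothing to archive.
     */
    static Path archiveIfExists(Path file, Path archiveDir, LocalDateTime when) throws IOException {
        if (!Files.exists(file)) {
            return null; // first run: nothing to archive yet
        }
        Files.createDirectories(archiveDir); // no-op if the folder already exists
        Path target = archiveDir.resolve(file.getFileName() + "-" + STAMP.format(when));
        // throws an IOException on failure instead of returning false like File.renameTo
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In the question's loop this would be called as Archiver.archiveIfExists(Paths.get(directory, "Output", fileName + "." + fileCount), Paths.get(directory, "Output", "Archive"), now), with now captured once before the loop so that every file archived in one run shares the same timestamp.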

Does anyone have any information they could share to point me in the right direction?

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class Splitter {

    public static void split(String directory, String fileName,
            String transactionTag, int fileSize) throws IOException {
        String startTag = "<" + transactionTag + ">";
        String endTag = "</" + transactionTag + ">";
        File f = new File(directory + fileName);
        File output = new File(directory + "Output/" + fileName);
        BufferedInputStream in = new BufferedInputStream(new FileInputStream(f));
        Splitter sp = new Splitter();
        int fileCount = 0;
        int len;
        int maxFileSize = fileSize;
        byte[] buf = new byte[maxFileSize];
        // HH is the 24-hour clock; "hh" would give 09:00 and 21:00 the same timestamp
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy_MM_dd_HH_mm_ss");
        String fileTime = sdf.format(new Date());
        while ((len = in.read(buf)) > 0) {
            fileCount++;
            // move the previous run's output file (if any) into Output/Archive/ with a timestamp;
            // renameTo returns false if, for example, the Archive folder does not exist
            try {
                File afile = new File(directory + "Output/" + fileName + "." + fileCount);
                if (afile.exists()) {
                    File archived = new File(directory + "Output/Archive/" + fileName + "." + fileCount + "-" + fileTime);
                    if (!afile.renameTo(archived)) {
                        System.out.println("Files failed to be archived. ");
                    }
                } else {
                    System.out.println("This file does not exist.");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output + "." + fileCount));
            // newInput holds at most maxFileSize bytes of input, so a tag can be cut in half at the buffer boundary
            String newInput = new String(buf, 0, len);
            String value = sp.getXML(newInput, transactionTag);

            //This part is incomplete.
            //Do something with value to make this class split the file by XML tags.
            //Also make sure any leftover content before the first start tag and after the last end tag is also put into smaller files.

            int start = value.indexOf(startTag);
            int end = value.lastIndexOf(endTag);

            out.write(buf, 0, len);
            out.close();
        }
        in.close();
    }

    // returns the text between the first <tagName> and the following </tagName>, or "" if either is missing
    public String getXML(String content, String tagName) {
        String startTag = "<" + tagName + ">";
        String endTag = "</" + tagName + ">";
        int startposition = content.indexOf(startTag);
        if (startposition == -1) return "";
        int endposition = content.indexOf(endTag, startposition);
        if (endposition == -1) return "";
        startposition += startTag.length();
        return content.substring(startposition, endposition);
    }

    public static void main(String[] args) throws IOException {
        int num = 100;
        int kb = num * 1024;
        split("C:/SplitUp/", "fileSplit.xml", "blah1", kb);
        System.out.println("Program ran");
    }
}

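IIUC, your single input file ('fileSplit.xml') has multiple 'start' and 'end' tags, and you want to split the content between each pair of 'start' and 'end' tags into its own separate file, right?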

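Yes, that's exactly right. I already have this code splitting the file by fileSize, but I also need it to split by these start and end tags. I have the getXML method, which finds the content between the start and end tags, and I know I need to call it from the split method and loop somehow to split everything, but I don't know how to go about doing that. I also need to handle the "leftovers", meaning the content before the first start tag and the content after the last end tag must go into files of their own. I'd appreciate any insight. – Galvatron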
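The loop described here can be built from repeated indexOf calls, each starting its search where the previous element ended; getXML on its own only ever finds the first element, because it always searches from index 0. A minimal sketch, assuming plain <tag>...</tag> elements with no attributes and no nesting of the same tag; TagSplitter and splitByTag are hypothetical names. Text before the first start tag, between elements, and after the last end tag becomes its own chunk, which covers the leftovers; chunks that are only whitespace are dropped.

```java
import java.util.ArrayList;
import java.util.List;

final class TagSplitter {
    /**
     * Splits content into: the text before the first start tag, each complete
     * <tag>...</tag> element (tags included), any text between elements, and the
     * text after the last end tag. Whitespace-only pieces are skipped.
     */
    static List<String> splitByTag(String content, String tagName) {
        String startTag = "<" + tagName + ">";
        String endTag = "</" + tagName + ">";
        List<String> chunks = new ArrayList<>();
        int pos = 0; // start of the text not yet assigned to a chunk
        while (true) {
            int start = content.indexOf(startTag, pos);
            if (start == -1) {
                break; // no more elements
            }
            int end = content.indexOf(endTag, start + startTag.length());
            if (end == -1) {
                break; // unmatched start tag: the rest is treated as leftover
            }
            addIfNotBlank(chunks, content.substring(pos, start)); // leftover before this element
            end += endTag.length();
            chunks.add(content.substring(start, end)); // one complete element
            pos = end;
        }
        addIfNotBlank(chunks, content.substring(pos)); // leftover after the last element
        return chunks;
    }

    private static void addIfNotBlank(List<String> chunks, String text) {
        if (!text.trim().isEmpty()) {
            chunks.add(text);
        }
    }
}
```

Each returned chunk could then be written to its own numbered sub-file, after the maxfilesize check.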

Answer

Based on your comment, I assume your fileSplit.xml looks something like this:

<header> 
    <!-- Some XML metadata --> 
</header> 
<start> 
    <!-- Some XML data --> 
</start> 
<start> 
    <!-- Some XML data --> 
</start> 
<start> 
    <!-- Some XML data --> 
</start> 
<start> 
    <!-- Some XML data --> 
</start> 
<footer> 
    <!-- Some XML metadata --> 
</footer> 

where each <start>, <header> and <footer> tag, and its corresponding end tag, sits on a line of its own.

You can simplify the code by using:

  1. java.nio.file.Files.readAllLines(Path path, Charset cs) to read your C:/SplitUp/fileSplit.xml
  2. java.io.FileWriter to write all the sub-files.

Essentially (for Java 7+), you can do this:

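// at the top of Splitter.java
import java.io.FileWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// inside split(): directory, fileName, fileCount and maxFileSize are the variables defined there
// read the entire fileSplit.xml into a list of strings, one per line
List<String> fileContent = Files.readAllLines(Paths.get("C:/SplitUp/fileSplit.xml"), StandardCharsets.UTF_8);

// iterate through the lines to split the file content into sub-files
StringBuilder subFileContent = new StringBuilder();
for (String line : fileContent) {
    String tag = line.trim();
    if (!tag.equalsIgnoreCase("<start>") && !tag.equalsIgnoreCase("<footer>")) { // keep reading if this line is neither a <start> nor a <footer>
        subFileContent.append(line).append(System.lineSeparator());
    }
    else { // this line is a <start> or a <footer>: write all the content so far into a new sub-file
        if (subFileContent.length() > 0) {
            // sub-file names taken from your code above. Make sure they are unique!
            try (FileWriter fileWriter = new FileWriter(directory + "Output/" + fileName + "." + (++fileCount))) {
                // this writes at most maxFileSize characters.
                // how do you want to handle spillover?
                fileWriter.write(subFileContent.toString(), 0, Math.min(subFileContent.length(), maxFileSize));
            }
        }
        // start the next sub-file with the current line
        subFileContent.setLength(0);
        subFileContent.append(line).append(System.lineSeparator());
    }
}
// the loop never writes the last chunk (the <footer> and everything after it), so write it here
if (subFileContent.length() > 0) {
    try (FileWriter fileWriter = new FileWriter(directory + "Output/" + fileName + "." + (++fileCount))) {
        fileWriter.write(subFileContent.toString(), 0, Math.min(subFileContent.length(), maxFileSize));
    }
}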
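To test the grouping logic without touching the file system, the same line-based idea can be pulled out into a method that returns the chunks instead of writing them. A minimal sketch: LineSplitter and splitLines are hypothetical names, boundary tags are matched ignoring case and surrounding whitespace, lines are rejoined with '\n', and, as assumed above, each boundary tag must sit on its own line.

```java
import java.util.ArrayList;
import java.util.List;

final class LineSplitter {
    /**
     * Groups lines into chunks, starting a new chunk whenever a line (trimmed,
     * compared ignoring case) equals one of the boundary tags.
     */
    static List<String> splitLines(List<String> lines, List<String> boundaryTags) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (isBoundary(line, boundaryTags) && current.length() > 0) {
                chunks.add(current.toString()); // close the previous chunk
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (current.length() > 0) {
            chunks.add(current.toString()); // the trailing chunk, e.g. the footer
        }
        return chunks;
    }

    private static boolean isBoundary(String line, List<String> boundaryTags) {
        String trimmed = line.trim();
        for (String tag : boundaryTags) {
            if (trimmed.equalsIgnoreCase(tag)) {
                return true;
            }
        }
        return false;
    }
}
```

Passing the result of Files.readAllLines(...) as lines and Arrays.asList("<start>", "<footer>") as boundaryTags yields one chunk for the header, one per <start> element, and one for the footer; writing each chunk to a file is then a separate, simple step.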

As for meeting the requirement that "...each smaller file must also be smaller than a given maxfilesize":

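You can change the final else into an else if that also fires when subFileContent's length() exceeds maxFileSize: write subFileContent out, and make sure the remainder goes into a second sub-file. That said, I'd get the splitting into sub-files working first, before starting on the second requirement.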
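One way to handle that spillover is to cut each chunk by its encoded size before writing it. This is a sketch under assumptions, not part of the answer's code: SizeSplitter and splitBySize are hypothetical names, maxBytes is measured in UTF-8 bytes (the question's maxFileSize of 100 * 1024 is a byte count, while String.length() counts UTF-16 chars), and "smaller than" is read as "at most", so pass maxFileSize - 1 if the limit must be strict.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SizeSplitter {
    /**
     * Cuts text into pieces whose UTF-8 encoding is at most maxBytes each.
     * Cuts fall on code-point boundaries, so a multi-byte character is never
     * split, but a cut can land inside an XML element.
     */
    static List<String> splitBySize(String text, int maxBytes) {
        if (maxBytes < 4) {
            // one code point can need up to 4 bytes in UTF-8
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int charCount = Character.charCount(codePoint);
            String ch = text.substring(i, i + charCount);
            int chBytes = ch.getBytes(StandardCharsets.UTF_8).length;
            if (currentBytes + chBytes > maxBytes) {
                pieces.add(current.toString()); // non-empty here, since chBytes <= 4 <= maxBytes
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(ch);
            currentBytes += chBytes;
            i += charCount;
        }
        if (current.length() > 0) {
            pieces.add(current.toString()); // the remainder goes into the last piece
        }
        return pieces;
    }
}
```

Each piece can then be written to its own numbered sub-file. Because cuts can land inside an element, a piece is not necessarily well-formed XML on its own; if every file must stay parseable, the alternative is to pack whole elements into a file until the next one would exceed the limit, which fails only when a single element is itself larger than maxFileSize.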