2015-11-22

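How do I replace the first space followed by an uppercase letter with a line break?

I have text like this (it has more than a thousand lines):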

ABO blood group antigens Carbohydrate antigens attached mainly to cell surface proteins or lipids that are present on many cell types, including red blood cells. These antigens differ among individuals, depending on inherited alleles encoding the enzymes required for synthesis of the carbohydrate antigens. The ABO antigens act as alloantigens that are responsible for blood transfusion reactions and hyperacute rejection of allografts. 

Acquired immunodeficiency A deficiency in the immune system that is acquired after birth, usually because of infection (e.g., AIDS), and that is not related to a genetic defect. Synonymous with secondary immunodeficiency. 

Acquired immunodeficiency syndrome (AIDS) A disease caused by human immunodeficiency virus (HIV) infection that is characterized by depletion of CD4+ T cells, leading to a profound defect in cell-mediated immunity. Clinically, AIDS includes opportunistic infections, malignant tumors, wasting, and encephalopathy. 

Activation-induced cell death (AICD) Apoptosis of activated lymphocytes, generally used for T cells. 

Activation-induced (cytidine) deaminase (AID) An enzyme expressed in B cells that catalyzes the conversion of cytosine into uracil in DNA, which is a step required for somatic hypermutation and affinity maturation of antibodies and for Ig class switching. 

Activation protein 1 (AP-1) A family of DNA-binding transcription factors composed of dimers of two proteins that bind to one another through a shared structural motif called a leucine zipper. The best-characterized AP-1 factor is composed of the proteins Fos and Jun. AP-1 is involved in transcriptional regulation of many different genes that are important in the immune system, such as cytokine genes. 

And I want it to look like this:

ABO blood group antigens 
Carbohydrate antigens attached mainly to cell surface proteins or lipids that are present on many cell types, including red blood cells. These antigens differ among individuals, depending on inherited alleles encoding the enzymes required for synthesis of the carbohydrate antigens. The ABO antigens act as alloantigens that are responsible for blood transfusion reactions and hyperacute rejection of allografts. 

Acquired immunodeficiency 
A deficiency in the immune system that is acquired after birth, usually because of infection (e.g., AIDS), and that is not related to a genetic defect. Synonymous with secondary immunodeficiency. 

Acquired immunodeficiency syndrome (AIDS) 
A disease caused by human immunodeficiency virus (HIV) infection that is characterized by depletion of CD4+ T cells, leading to a profound defect in cell-mediated immunity. Clinically, AIDS includes opportunistic infections, malignant tumors, wasting, and encephalopathy. 

Activation-induced cell death (AICD) 
Apoptosis of activated lymphocytes, generally used for T cells. 

Activation-induced (cytidine) deaminase (AID) 
An enzyme expressed in B cells that catalyzes the conversion of cytosine into uracil in DNA, which is a step required for somatic hypermutation and affinity maturation of antibodies and for Ig class switching. 

Activation protein 1 (AP-1) 
A family of DNA-binding transcription factors composed of dimers of two proteins that bind to one another through a shared structural motif called a leucine zipper. The best-characterized AP-1 factor is composed of the proteins Fos and Jun. AP-1 is involved in transcriptional regulation of many different genes that are important in the immune system, such as cytokine genes. 

Is there a way to do this? I am not a programmer. Thank you.

Answers


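I tested this locally on your text and it works. I'm not a regex expert, so it may not be the most efficient solution.

Use the Replace tab (Ctrl+H):

Find what: ^(.*?) ([A-Z].*$)
Replace with: \1\r\n\2

Make sure that both "Match case" and "Regular expression" are checked.

Explanation
Find what: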

^   starts with 
.   anything 
*   repeated 0 or more times 
?   lazy match so that it stops at the capital letter (next group) 
(.*?)  remember that part (group 1) 
      followed by a space 
[A-Z]  match the capital letter 
.   anything 
*   repeated 0 or more times 
$   ends with 
([A-Z].*$) remember that part (group 2) 

Replace with:

\1   group 1 
\r   carriage return 
\n   new line 
\2   group 2 
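
For readers who would rather script this than use an editor, the same find/replace can be sketched in Python. This is only an illustration under a few assumptions: the function name `split_term_from_definition` is made up for this example, `\n` is used in place of `\r\n`, and the sample text is shortened from the question. As with the editor version, a term that itself contains a space followed by an uppercase letter would be split too early.

```python
import re

# Same pattern as the Notepad++ answer. re.MULTILINE makes ^ and $ match at
# every line, and "." does not cross newlines by default in Python.
PATTERN = re.compile(r"^(.*?) ([A-Z].*$)", re.MULTILINE)


def split_term_from_definition(text: str) -> str:
    """Break each line at the first space that is followed by A-Z."""
    # Group references are written \1 and \2; "\n" is used here instead of
    # "\r\n" (an assumption: Unix-style line endings are wanted).
    return PATTERN.sub(r"\1\n\2", text)


sample = (
    "Activation-induced cell death (AICD) Apoptosis of activated "
    "lymphocytes, generally used for T cells.\n"
    "\n"
    "Activation protein 1 (AP-1) A family of DNA-binding transcription factors."
)
print(split_term_from_definition(sample))
```

Because the first group is lazy, each line is split only once, at the earliest qualifying space; blank lines and lines without such a space are left unchanged.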

I have no idea what you did, but it works like a charm! Thank you very much! – bernardomdd


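You need to do a replacement using regular expressions (find a space followed by an uppercase letter).

In Notepad++, use Find/Replace with regular expressions enabled (be sure to check "Match case").

Find: ([^ ]) ([A-Z])
Replace with: \1\r\n\2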
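
The idea of "a space followed by an uppercase letter" can also be sketched in Python. This is not the answerer's exact pattern: it uses a lookahead instead of capture groups, and the helper names `break_all` and `break_first` are hypothetical. It is meant to show the difference between replacing every such space and replacing only the first one on a line, which is what the question asks for.

```python
import re


def break_all(line: str) -> str:
    # Replace every space that is followed by an uppercase letter.
    # The lookahead (?=[A-Z]) checks the letter without consuming it.
    return re.sub(r" (?=[A-Z])", "\n", line)


def break_first(line: str) -> str:
    # count=1 limits the substitution to the first match only.
    return re.sub(r" (?=[A-Z])", "\n", line, count=1)


line = (
    "Acquired immunodeficiency A deficiency in the immune system. "
    "Synonymous with secondary immunodeficiency."
)
print(break_all(line))
print(break_first(line))
```

With this sample, `break_all` also breaks before "Synonymous", while `break_first` only separates the term from its definition.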


Yes,

use a Perl script. I think this works...

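#!/usr/bin/perl
use strict;
use warnings;

# Number of line breaks already inserted in the current entry.
my $cestbon = 0;
while (<>) {
    my @line = split(" ", $_);
    # A blank line separates entries: reset the counter and keep the blank line.
    if (/^$/) {
        $cestbon = 0;
        print "\n";
    }
    foreach (@line) {
        # Break before the first two capitalized words of an entry
        # (the first word of the term, then the first word of the definition).
        # All-caps words such as "ABO" do not match this pattern.
        if (/\b[A-Z][a-z0-9]*\b/ && $cestbon < 2) {
            print "\n$_ ";
            $cestbon++;
        } else {
            print "$_ ";
        }
    }
}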

To run it (I did this on a MacBook Pro running OS X, i.e. UNIX):

cat sample.txt | ./sample.pl

ABO blood group antigens 
Carbohydrate antigens attached mainly to cell surface proteins or lipids that are present on many cell types, including red blood cells. 
These antigens differ among individuals, depending on inherited alleles encoding the enzymes required for synthesis of the carbohydrate antigens. The ABO antigens act as alloantigens that are responsible for blood transfusion reactions and hyperacute rejection of allografts. 

Acquired immunodeficiency 
A deficiency in the immune system that is acquired after birth, usually because of infection (e.g., AIDS), and that is not related to a genetic defect. Synonymous with secondary immunodeficiency. 

Acquired immunodeficiency syndrome (AIDS) 
A disease caused by human immunodeficiency virus (HIV) infection that is characterized by depletion of CD4+ T cells, leading to a profound defect in cell-mediated immunity. Clinically, AIDS includes opportunistic infections, malignant tumors, wasting, and encephalopathy. 

Activation-induced cell death (AICD) 
Apoptosis of activated lymphocytes, generally used for T cells. 

Activation-induced (cytidine) deaminase (AID) 
An enzyme expressed in B cells that catalyzes the conversion of cytosine into uracil in DNA, which is a step required for somatic hypermutation and affinity maturation of antibodies and for Ig class switching. 

Activation protein 1 (AP-1) 
A family of DNA-binding transcription factors composed of dimers of two proteins that bind to one another through a shared structural motif called a leucine zipper. The best-characterized AP-1 factor is composed of the proteins Fos and Jun. AP-1 is involved in transcriptional regulation of many different genes that are important in the immune system, such as cytokine genes. 

It's probably not perfect, but I wrote it in 10 minutes, so give me a break :)
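
The extra break before "These" in the first entry of the output above appears to come from the word pattern itself: "ABO" is all uppercase and does not match it, so "Carbohydrate" and "These" become the two words that trigger a break. A small Python check of the same pattern illustrates this (the helper name `triggers_break` is made up for this sketch, and the word list is taken from the sample text):

```python
import re

# The same word pattern as in the Perl script above.
CAPITALIZED = re.compile(r"\b[A-Z][a-z0-9]*\b")


def triggers_break(word: str) -> bool:
    """Return True if the script's pattern matches somewhere in the word."""
    return CAPITALIZED.search(word) is not None


for word in ["ABO", "Carbohydrate", "These", "(AICD)", "Activation-induced", "A"]:
    print(f"{word!r}: {triggers_break(word)}")
```

Entries whose term starts with an ordinary capitalized word, such as "Acquired" or "Activation", use up the first break on the term and the second on the definition, which is why they come out as intended.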
