2017-02-11 147 views

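Remove duplicate/repeated values within the same data frame column in R

I have an odd data frame whose Player column holds each player's name. The problem is that the first name appears twice: Roy Sievers shows up as RoyRoy Sievers, when the name should obviously be Roy Sievers.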

Does anyone know how to do this?

Here is the full data frame; it isn't very long:

   Year   Player                   Team                    Position
 1 1949   RoyRoy Sievers           St. Louis Browns        OF
 2 1950   WaltWalt Dropo           Boston Red Sox          1B
 3 1951   GilGil McDougald         New York Yankees        3B
 4 1952   HarryHarry Byrd          Philadelphia Athletics  P
 5 1953   HarveyHarvey Kuenn       Detroit Tigers          SS
 6 1954   BobBob Grim              New York Yankees        P
 7 1955   HerbHerb Score           Cleveland Indians       P
 8 1956   LuisLuis Aparicio        Chicago White Sox       SS
 9 1957   TonyTony Kubek           New York Yankees        SS
10 1958   AlbieAlbie Pearson       Washington Senators     OF
11 1959   BobBob Allison           Washington Senators     OF
12 1960   RonRon Hansen            Baltimore Orioles       SS
13 1961   DonDon Schwall           Boston Red Sox          P
14 1962   TomTom Tresh             New York Yankees        SS
15 1963   GaryGary Peters          Chicago White Sox       P
16 1964   TonyTony Oliva           Minnesota Twins         OF
17 1965   CurtCurt Blefary         Baltimore Orioles       OF
18 1966   TommieTommie Agee        Chicago White Sox       OF
19 1967   RodRod Carew             Minnesota Twins         2B
20 1968   StanStan Bahnsen         New York Yankees        P
21 1969   LouLou Piniella          Kansas City Royals      OF
22 1970   ThurmanThurman Munson    New York Yankees        C
23 1971   ChrisChris Chambliss     Cleveland Indians       1B
24 1972   CarltonCarlton Fisk      Boston Red Sox          C
25 1973   AlAl Bumbry              Baltimore Orioles       OF
26 1974   MikeMike Hargrove        Texas Rangers           1B
27 1975   FredFred Lynn            Boston Red Sox          OF
28 1976   MarkMark Fidrych         Detroit Tigers          P
29 1977   EddieEddie Murray        Baltimore Orioles       DH
30 1978   LouLou Whitaker          Detroit Tigers          2B
31 1979*  JohnJohn Castino         Minnesota Twins         3B
32 1979*  AlfredoAlfredo Griffin   Toronto Blue Jays       SS
33 1980   JoeJoe Charboneau        Cleveland Indians       OF
34 1981   DaveDave Righetti        New York Yankees        P
35 1982   CalCal Ripken            Baltimore Orioles       SS
36 1983   RonRon Kittle            Chicago White Sox       OF
37 1984   AlvinAlvin Davis         Seattle Mariners        1B
38 1985   OzzieOzzie Guillén       Chicago White Sox       SS
39 1986   JoseJose Canseco         Oakland Athletics       OF
40 1987   MarkMark McGwire         Oakland Athletics       1B
41 1988   WaltWalt Weiss           Oakland Athletics       SS
42 1989   GreggGregg Olson         Baltimore Orioles       P
43 1990   Sandy Alomar Jr          Cleveland Indians       C
44 1991   ChuckChuck Knoblauch     Minnesota Twins         2B
45 1992   PatPat Listach           Milwaukee Brewers       SS
46 1993   TimTim Salmon            California Angels       OF
47 1994   BobBob Hamelin           Kansas City Royals      DH
48 1995   MartyMarty Cordova       Minnesota Twins         OF
49 1996   DerekDerek Jeter         New York Yankees        SS
50 1997   NomarNomar Garciaparra   Boston Red Sox          SS
51 1998   BenBen Grieve            Oakland Athletics       OF
52 1999   CarlosCarlos Beltrán     Kansas City Royals      OF
53 2000   KazuhiroKazuhiro Sasaki  Seattle Mariners        P
54 2001   IchiroIchiro Suzuki      Seattle Mariners        OF
55 2002   EricEric Hinske          Toronto Blue Jays       3B
56 2003   ÁngelÁngel Berroa        Kansas City Royals      SS
57 2004   BobbyBobby Crosby        Oakland Athletics       SS
58 2005   HustonHuston Street      Oakland Athletics       P
59 2006   JustinJustin Verlander   Detroit Tigers          P
60 2007   DustinDustin Pedroia     Boston Red Sox          2B
61 2008   EvanEvan Longoria        Tampa Bay Rays          3B
62 2009   Andrew Bailey            Oakland Athletics       P
63 2010   NeftalíNeftalí Feliz     Texas Rangers           P
64 2011   JeremyJeremy Hellickson  Tampa Bay Rays          P
65 2012   MikeMike Trout           Los Angeles Angels      OF
66 2013   WilWil Myers             Tampa Bay Rays          OF
67 2014   JoséJosé Abreu           Chicago White Sox       1B
68 2015   CarlosCarlos Correa      Houston Astros          SS
69 2016   MichaelMichael Fulmer    Detroit Tigers          P

Are there ever any middle names? Can we always count on this pattern? –

Answers


You can solve this by finding a repeated run of at least three word characters and replacing it with a single copy:

gsub("(\\w{3,})\\1", "\\1", Players$Player) 

If you want to overwrite the old column, just use:

Players$Player = gsub("(\\w{3,})\\1", "\\1", Players$Player) 
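
As a quick check, here is the same call on a few names copied from the table (the vector players is only an illustrative stand-in for Players$Player). Doubled first names are collapsed, while names that were never doubled, such as Sandy Alomar Jr and Andrew Bailey, should come through unchanged:

```r
# Illustrative sample copied from the Player column above
players <- c("RoyRoy Sievers", "BobBob Grim", "Sandy Alomar Jr", "Andrew Bailey")

# (\\w{3,}) captures a run of at least three word characters;
# \\1 then requires that exact run to appear again immediately after it
fixed <- gsub("(\\w{3,})\\1", "\\1", players)
print(fixed)
```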

Why is the "at least 3" rule necessary? – jdobres


Just to eliminate spurious matches. For example, the name "Harry" appears above; you wouldn't want it changed to "Hary". – G5W
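
The effect G5W describes is easy to reproduce. Without a minimum length, any doubled letter counts as a repeat, so the "ll" in Allison and the "ss" in Weiss get collapsed along with the doubled first names (sample names copied from the table; the variable names are illustrative):

```r
players <- c("BobBob Allison", "WaltWalt Weiss")

# No minimum: a single repeated letter such as "ll" is enough to match
loose  <- gsub("(\\w+)\\1", "\\1", players)
# Minimum of three characters: only the doubled first name matches
strict <- gsub("(\\w{3,})\\1", "\\1", players)
print(loose)
print(strict)
```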


But wouldn't you be better off assuming capitalization matters? For example: gsub('([A-Z][a-z]+)\\1', '\\1', myData$Player) – jdobres


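G5W's answer gets you most of the way there, but it will miss two-letter names such as "Al". This version relies on capitalization instead of character count: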

myData$Player <- gsub('([A-Z][a-z]+)\\1', '\\1', myData$Player) 
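
The difference shows up on the one two-letter first name in the data, AlAl Bumbry: the three-character rule cannot match "Al", while the capitalization rule can. Both patterns handle a longer name such as HarryHarry Byrd (sample names copied from the table; variable names are illustrative):

```r
players <- c("AlAl Bumbry", "HarryHarry Byrd")

# Character-count rule: "Al" is too short for (\\w{3,}), so "AlAl" survives
by_length <- gsub("(\\w{3,})\\1", "\\1", players)
# Capitalization rule: one capital followed by lowercase letters, repeated
by_case   <- gsub("([A-Z][a-z]+)\\1", "\\1", players)
print(by_length)
print(by_case)
```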

For those less fluent in regular expressions:

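library(stringr)

# Split each name at its first space, halve the first word when it is an
# exact doubling ("RoyRoy" -> "Roy"), then glue the rest of the name back on
fun1 <- function(string) {
  parts <- str_split_fixed(string, " ", 2)   # column 1: first word, column 2: the rest
  first <- parts[, 1]
  half  <- str_sub(first, 1, str_length(first) %/% 2)
  doubled <- paste0(half, half) == first      # leave names like "Sandy Alomar Jr" alone
  first[doubled] <- half[doubled]
  paste(first, parts[, 2])
}

fun1(df$Player)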