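Increasing the performance of a multithreaded brute-force bot based on Selenium WebDriver

Hi everyone! I've just created a brute-force bot that uses WebDriver and multithreading to brute-force a 4-digit code. Four digits means a range of 0000 - 9999 possible String values. In my case, after clicking the "submit" button, it takes no less than 7 seconds before the client gets a response from the server. So I decided to use Thread.sleep(7200) to let the response page load fully.

Then I found out that I couldn't afford to wait 9999 * 7.5 seconds for the task to finish, so I had to use multithreading. I have a quad-core AMD machine with one virtual core per hardware core, which lets me run 8 threads simultaneously. I split the whole job of 9999 combinations evenly across the 8 threads, so each thread got a scope of work of 1249 combinations, plus a remainder thread for the leftover codes.

Now I get the job done in 1.5 hours (because the right code appears to be in the middle of a scope of work). This is much better, but it could be better still! You know, Thread.sleep(7500) is a pure waste of time. Because of the limited number of hardware cores, my machine could be switching to other threads that are in wait(). How can I do this? Any ideas?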
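To put rough numbers on the timings above, here is a minimal back-of-the-envelope sketch. It assumes every attempt costs a constant 7.5 seconds and that threads scale perfectly, which a real setup will not quite match; the class and method names are made up for illustration.

```java
final class BruteForceEstimate {
    // Worst-case wall-clock seconds when `codes` attempts, each costing
    // `secondsPerAttempt`, are split evenly across `threads` workers.
    // Assumption: constant per-attempt cost and perfect scaling.
    static double worstCaseSeconds(int codes, double secondsPerAttempt, int threads) {
        int perThread = (codes + threads - 1) / threads; // ceiling division
        return perThread * secondsPerAttempt;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 8, 20}) {
            double hours = worstCaseSeconds(9999, 7.5, threads) / 3600.0;
            System.out.println(threads + " thread(s): about " + hours + " hours worst case");
        }
    }
}
```

Under these assumptions the worst case is roughly 20.8 hours on one thread, 2.6 hours on 8 threads and 1.0 hour on 20 threads. Finding the code halfway through a range on 8 threads gives about 1.3 hours, which is in line with the 1.5 hours reported.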
Below are two classes that represent my architectural approach:
    public class BruteforceBot extends Thread {
        // All the necessary implementation, blah-blah

        public void run() {
            brutforce();
        }

        private void brutforce() {
            initDriver();
            int counter = start;
            while (counter <= finish) {
                try {
                    webDriver.get(gatewayURL);
                    webDriver.findElement(By.name("code")).sendKeys(codes.get(counter));
                    webDriver.findElement(By.name("code")).submit();
                    Thread.sleep(7200);
                    String textFound = "";
                    try {
                        do {
                            textFound = Jsoup.parse(webDriver.getPageSource()).text();
                            // we need to be sure that the page is fully loaded
                        } while (textFound.contains("XXXXXXXXXXXXX"));
                    } catch (org.openqa.selenium.JavascriptException je) {
                        System.err.println("JavascriptException: TypeError: "
                                + "document.documentElement is null");
                        continue;
                    }
                    // Test if the page returns XXXXXXXXXXXXX below
                    if (textFound.contains("XXXXXXXXXXXXXXXx") && !textFound.contains("YYYYYYY")) {
                        System.out.println("Not " + codes.get(counter));
                        counter++;
                    // Test if the page contains "YYYYYYY" string below
                    } else if (textFound.contains("YYYYYYY")) {
                        System.out.println("Correct Code is " + codes.get(counter));
                        botLogger.writeTheLogToFile("We have found it: " + textFound
                                + " ... at the code of " + codes.get(counter));
                        break;
                    // Test if any other case of response below
                    } else {
                        System.out.println("WTF?");
                        botLogger.writeTheLogToFile("Strange response for code "
                                + codes.get(counter));
                        continue;
                    }
                } catch (InterruptedException intrrEx) {
                    System.err.println("Interrupted exception: ");
                    intrrEx.printStackTrace();
                }
            }
            destroyDriver();
        } // end of brutforce() method
    }
And
    public class ThreadMaster {
        // All the necessary implementation, blah-blah

        public ThreadMaster(int amountOfThreadsArgument,
                ArrayList<String> customCodes) {
            this();
            this.codes = customCodes;
            this.amountOfThreads = amountOfThreadsArgument;
            this.lastCodeIndex = codes.size() - 1;
            this.remainderThread = codes.size() % amountOfThreads;
            this.scopeOfWorkForASingleThread
                    = codes.size() / amountOfThreads;
        }

        public static void runThreads() {
            do {
                bots = new BruteforceBot[amountOfThreads];
                System.out.println("Bots array is populated");
            } while (bots.length != amountOfThreads);
            for (int j = 0; j <= amountOfThreads - 1;) {
                int finish = start + scopeOfWorkForASingleThread;
                try {
                    bots[j] = new BruteforceBot(start, finish, codes);
                } catch (Exception e) {
                    System.err.println("Putting a bot into a threads array failed");
                    continue;
                }
                bots[j].start();
                start = finish;
                j++;
            }
            try {
                for (int j = 0; j <= amountOfThreads - 1; j++) {
                    bots[j].join();
                }
            } catch (InterruptedException ie) {
                System.err.println("InterruptedException has occurred "
                        + "while a Bot was joining a thread ...");
                ie.printStackTrace();
            }
            // if there are any codes that still remain to be tested,
            // this last bot/thread will take care of them
            if (remainderThread != 0) {
                try {
                    int remainderStart = lastCodeIndex - remainderThread;
                    int remainderFinish = lastCodeIndex;
                    BruteforceBot remainderBot
                            = new BruteforceBot(remainderStart, remainderFinish, codes);
                    remainderBot.start();
                    remainderBot.join();
                } catch (InterruptedException ie) {
                    System.err.println("The remainder Bot has failed to "
                            + "create or start or join a thread ...");
                }
            }
        }
    }
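One possible alternative to fixed per-thread ranges plus a remainder thread is to let a pool of workers pull the next untested index from a shared counter. The sketch below is only an illustration under stated assumptions: `SharedCounterSearch` and the `attempt` predicate are hypothetical names, and `attempt` stands in for one full WebDriver round trip (in a real bot each worker would own its own driver instance, as each BruteforceBot does above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

final class SharedCounterSearch {
    /** Returns the first code accepted by `attempt`, or null if none is accepted. */
    static String search(List<String> codes, int workers, Predicate<String> attempt) {
        AtomicInteger next = new AtomicInteger(0);
        AtomicReference<String> found = new AtomicReference<>(null);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                // Each worker claims the next untested index, so there are
                // no precomputed ranges and no remainder to handle.
                while (found.get() == null) {
                    int i = next.getAndIncrement();
                    if (i >= codes.size()) {
                        return;
                    }
                    String code = codes.get(i);
                    if (attempt.test(code)) {
                        found.compareAndSet(null, code);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return found.get();
    }

    public static void main(String[] args) {
        List<String> codes = new ArrayList<>();
        for (int i = 0; i <= 9999; i++) {
            codes.add(String.format("%04d", i));
        }
        // Hypothetical stand-in for submitting one code through WebDriver.
        Predicate<String> attempt = code -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return code.equals("4321");
        };
        System.out.println(search(codes, 20, attempt)); // prints 4321
    }
}
```

With this layout every code is tested at most once, all workers stop once a match is recorded, and changing 8 workers to 20 is a single argument.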
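I need your advice on how to improve the architecture of this application so that it runs successfully with, say, 20 threads instead of 8. My problem is this: when I simply remove Thread.sleep(7200) and order 20 Thread instances to run at once instead of 8, each thread constantly fails to get a response from the server, because it doesn't wait for the 7 seconds. As a result, the performance is not just lower, it == 0. Which approach would you choose?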
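One common way to drop the fixed sleep without losing the response is a bounded polling wait: check a condition at short intervals and return as soon as it holds, or give up after a timeout. Selenium offers this pattern through WebDriverWait; the sketch below is a plain-Java stand-in so that it runs without a browser. `PollingWait` is a hypothetical helper, and the `pageText` supplier only simulates what `Jsoup.parse(webDriver.getPageSource()).text()` would return.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class PollingWait {
    /**
     * Polls `source` every `interval` until `done` accepts the value or
     * `timeout` elapses. Returns the accepted value, or null on timeout
     * or interruption.
     */
    static <T> T until(Supplier<T> source, Predicate<T> done,
                       Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T value = source.get();
            if (done.test(value)) {
                return value;
            }
            if (System.nanoTime() - deadline >= 0) {
                return null; // timed out
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    public static void main(String[] args) {
        long readyAt = System.currentTimeMillis() + 300; // simulated server delay
        // Stand-in for reading the page text through WebDriver.
        Supplier<String> pageText = () ->
                System.currentTimeMillis() < readyAt ? "loading" : "response ready";
        String text = until(pageText, t -> !t.contains("loading"),
                Duration.ofSeconds(10), Duration.ofMillis(50));
        System.out.println(text);
    }
}
```

The thread still waits for the server, but only for as long as the response actually takes, and a timeout result can be logged and retried instead of looping forever.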
PS: I set the number of threads from the main() method:
    public static void main(String[] args)
            throws InterruptedException, org.openqa.selenium.SessionNotCreatedException {
        System.setProperty("webdriver.gecko.driver", "lib/geckodriver.exe");
        ThreadMaster tm = new ThreadMaster(8, new CodesGenerator().getListOfCodesFourDigits());
        tm.runThreads();
    }
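'Thread.sleep(7500) is a pure waste of time. My machine could be switching to other threads that are in wait()' I don't understand this. If a thread calls sleep(), the OS blocks it and frees the core it was running on. If another thread is ready, it is dispatched immediately onto the now-free core. If your thread code has a sleep(7200) call in it, you could run 800 threads with no problem, and you would not notice any slowdown.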
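The claim in this comment is easy to check with a small experiment, sketched here with a hypothetical `SleepingThreadsDemo` class: start many threads that do nothing but sleep and measure the total wall-clock time. If sleeping threads occupied cores, the total would grow with the thread count; the exact figure will vary by machine.

```java
import java.util.ArrayList;
import java.util.List;

final class SleepingThreadsDemo {
    /** Starts `count` threads that each sleep `sleepMillis`; returns elapsed milliseconds. */
    static long runSleepers(int count, long sleepMillis) {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = runSleepers(800, 500);
        System.out.println("800 sleepers x 500 ms took " + elapsed + " ms");
    }
}
```

On typical hardware the total should land far closer to 500 ms than to 800 x 500 ms, since the sleeping threads are blocked rather than running.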
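@MartinJames, unfortunately, `sleep()` does not release the locks on its resources. This is discussed [here](https://stackoverflow.com/questions/1036754/difference-between-wait-and-sleep). During Thread.sleep() the physical core is not released; it stays busy executing Thread.sleep(). As far as I understand, only wait() can help here. – Slavick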
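The difference discussed at that link concerns monitor locks, which are a separate matter from CPU cores: a thread paused in either call is blocked and is not running. The sketch below, using a hypothetical `SleepVsWaitDemo` class, shows only the lock side. A holder thread enters a synchronized block and pauses with wait() or sleep(), and a second thread measures how long it takes to enter the same monitor.

```java
import java.util.concurrent.CountDownLatch;

final class SleepVsWaitDemo {
    /**
     * A holder thread enters the monitor, then pauses via wait() or sleep().
     * Returns how many milliseconds a second thread needed to enter the same
     * monitor, or -1 if the measuring thread was interrupted.
     */
    static long contenderDelayMillis(boolean useWait, long pauseMillis) {
        Object lock = new Object();
        CountDownLatch holderInside = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                holderInside.countDown();
                try {
                    if (useWait) {
                        lock.wait(pauseMillis);    // releases the monitor while paused
                    } else {
                        Thread.sleep(pauseMillis); // keeps the monitor while paused
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        try {
            holderInside.await();
            long start = System.nanoTime();
            synchronized (lock) {
                // Entered the monitor; nothing to do inside.
            }
            long delay = (System.nanoTime() - start) / 1_000_000;
            holder.join();
            return delay;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("wait():  contender delayed "
                + contenderDelayMillis(true, 1000) + " ms");
        System.out.println("sleep(): contender delayed "
                + contenderDelayMillis(false, 1000) + " ms");
    }
}
```

The wait() run should report a delay near zero and the sleep() run a delay close to the full pause. Since the bot code in the question does not sleep inside a synchronized block, this lock behaviour would not by itself explain the slowdown.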
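@MartinJames, do you think it is because geckodriver.exe and chromedriver.exe are both standalone Windows programs that have little to do with my Java application and tie up my threads? Maybe this is not a Java multithreading issue but a Windows multi-process programming issue... Anyway, my hope of getting some advice still stands :) – Slavick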