Spreadsheet Toolkit | |||||||||
java.lang.Object
|
+--corpus.gobbler.SearchMethod
|
+--corpus.gobbler.GoogleHTTPGetSearch
Copyright: Copyright (c) 2002
Company: VUW:MCS
| Field Summary |
| Fields inherited from class corpus.gobbler.SearchMethod |
fileType, numPerPage, result, searchString |
| Constructor Summary | |
GoogleHTTPGetSearch(java.lang.String searchString,
java.lang.String fileType,
int numPerPage,
Gobbler gobbler)
|
|
| Method Summary | |
protected int |
EstimateTotalResults()
|
protected java.lang.String |
sendSocket(java.lang.String query,
java.lang.String filetype,
int numPerPage,
int startnum,
java.lang.String host,
int port)
Opens a socket connection to the host (Google) and sends an HTTP GET request with the search string. |
protected SearchResult |
StartSearch(int startNumber)
Performs an HTTP-based search. |
protected java.lang.String[] |
stripURLs(java.lang.String htmlpage,
java.lang.String[] excludes)
Parses the HTML and uses a regular expression to extract the http://....... |
| Methods inherited from class corpus.gobbler.SearchMethod |
html, newEstimate, performSearch, status, urlCountTick |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
public GoogleHTTPGetSearch(java.lang.String searchString,
java.lang.String fileType,
int numPerPage,
Gobbler gobbler)
| Method Detail |
protected int EstimateTotalResults()
EstimateTotalResults in class SearchMethod
protected SearchResult StartSearch(int startNumber)
StartSearch in class SearchMethod
protected java.lang.String sendSocket(java.lang.String query,
java.lang.String filetype,
int numPerPage,
int startnum,
java.lang.String host,
int port)
query - The query terms, separated by spaces.
filetype - Which file type to ask for.
numPerPage - How many results to ask for on each page.
startnum - Ask for results starting from this point.
host - Should be www.google.com at this stage.
port - Should be 80.
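The source of sendSocket is not shown on this page, so the following is only a minimal sketch of how a method with these parameters might build and send an HTTP GET over a plain socket. The class name GetRequestSketch, the /search path, the q, num and start parameter names, and the filetype: query operator are all assumptions made for illustration; they are not taken from GoogleHTTPGetSearch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.Socket;
import java.net.URLEncoder;

// Hypothetical helper, not part of corpus.gobbler.
final class GetRequestSketch {

    // Builds the raw HTTP/1.0 request text. The path and the parameter
    // names (q, num, start) are assumptions, not the documented behaviour.
    static String buildRequest(String query, String filetype, int numPerPage,
                               int startnum, String host) {
        String terms = query + " filetype:" + filetype;
        String encoded;
        try {
            // Spaces become '+', reserved characters are percent-encoded.
            encoded = URLEncoder.encode(terms, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return "GET /search?q=" + encoded
                + "&num=" + numPerPage
                + "&start=" + startnum
                + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Sends the request over a plain socket and returns the full response
    // (status line, headers and body) as one string.
    static String send(String request, String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes("ISO-8859-1"));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toString("ISO-8859-1");
        }
    }
}
```

Splitting request construction from the socket I/O keeps the string-building step testable without a network connection; a caller would pass the result of buildRequest to send together with the host and port.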
protected java.lang.String[] stripURLs(java.lang.String htmlpage,
java.lang.String[] excludes)
htmlpage - The source code of the web page to extract the URLs from.
excludes - Any URLs that match the start of one of these URLs will be excluded from the results.
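As a rough illustration of the behaviour described for stripURLs, the sketch below pulls http:// URLs out of a page with a regular expression and drops any that begin with one of the excluded prefixes. The class name StripUrlsSketch and the regular expression are assumptions; the pattern actually used by GoogleHTTPGetSearch is not shown on this page, and the prefix-match reading of excludes is one interpretation of the parameter description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of corpus.gobbler.
final class StripUrlsSketch {

    // Assumed pattern: "http://" followed by anything up to whitespace,
    // a quote, or an angle bracket.
    private static final Pattern URL = Pattern.compile("http://[^\\s\"'<>]+");

    static String[] stripURLs(String htmlpage, String[] excludes) {
        List<String> kept = new ArrayList<>();
        Matcher matcher = URL.matcher(htmlpage);
        while (matcher.find()) {
            String url = matcher.group();
            if (!isExcluded(url, excludes)) {
                kept.add(url);
            }
        }
        return kept.toArray(new String[0]);
    }

    // A URL is excluded when it starts with any of the given prefixes.
    private static boolean isExcluded(String url, String[] excludes) {
        for (String prefix : excludes) {
            if (url.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Passing the search engine's own address as an exclude prefix, for example, would filter out links back to the results site while keeping the external hits.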