Spreadsheet Toolkit | |||||||||
java.lang.Object
  |
  +--corpus.gobbler.SearchMethod
        |
        +--corpus.gobbler.GoogleHTTPGetSearch
Title:
Description:
Copyright: Copyright (c) 2002
Company: VUW:MCS
Field Summary |
Fields inherited from class corpus.gobbler.SearchMethod |
fileType, numPerPage, result, searchString |
Constructor Summary | |
GoogleHTTPGetSearch(java.lang.String searchString, java.lang.String fileType, int numPerPage, Gobbler gobbler)
Method Summary | |
protected int | EstimateTotalResults()
protected java.lang.String | sendSocket(java.lang.String query, java.lang.String filetype, int numPerPage, int startnum, java.lang.String host, int port)
    Opens a socket connection to the host (Google) and sends an HTTP GET with the search string.
protected SearchResult | StartSearch(int startNumber)
    Performs an HTTP-based search.
protected java.lang.String[] | stripURLs(java.lang.String htmlpage, java.lang.String[] excludes)
    Parses the HTML and uses a regular expression to rip out the http://.......
Methods inherited from class corpus.gobbler.SearchMethod |
html, newEstimate, performSearch, status, urlCountTick |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
public GoogleHTTPGetSearch(java.lang.String searchString, java.lang.String fileType, int numPerPage, Gobbler gobbler)
Method Detail |
protected int EstimateTotalResults()
EstimateTotalResults in class SearchMethod
protected SearchResult StartSearch(int startNumber)
StartSearch in class SearchMethod
protected java.lang.String sendSocket(java.lang.String query, java.lang.String filetype, int numPerPage, int startnum, java.lang.String host, int port)
query - The query terms, separated by spaces.
numPerPage - How many results to ask for on each page.
filetype - Which file type to ask for.
startnum - Ask for results starting from this point.
host - Should be www.google.com at this stage.
port - Should be 80.
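
As a rough illustration of what a method like sendSocket might do, the sketch below builds a plain HTTP GET request from the same parameters and writes it to a socket. This page does not document how the original class forms its request, so the class name HttpGetSketch, the helper buildRequest, the /search path, and the query parameter names (q, num, start, as_filetype) are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.Socket;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the actual GoogleHTTPGetSearch implementation.
final class HttpGetSketch {

    // Builds the raw request text. The path and parameter names are assumed.
    static String buildRequest(String query, String filetype, int numPerPage,
                               int startnum, String host) {
        String path;
        try {
            // URLEncoder turns the spaces between query terms into '+'.
            path = "/search?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&num=" + numPerPage
                    + "&start=" + startnum
                    + "&as_filetype=" + URLEncoder.encode(filetype, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Opens a socket to host:port, sends the GET, and returns the full
    // response (status line, headers, and body) as a single string.
    static String sendSocket(String query, String filetype, int numPerPage,
                             int startnum, String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            String request = buildRequest(query, filetype, numPerPage, startnum, host);
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }
}
```

Keeping the request construction separate from the socket I/O means the request text can be inspected without a network connection, for example with HttpGetSketch.buildRequest("corpus linguistics", "pdf", 10, 20, "www.google.com").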
protected java.lang.String[] stripURLs(java.lang.String htmlpage, java.lang.String[] excludes)
htmlpage - The source code of the web page to extract the URLs from.
excludes - Any URLs that start with one of these URLs will be excluded from the results.
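
The following is a minimal sketch of the kind of extraction stripURLs describes: find http:// links with a regular expression, then drop any that begin with one of the excluded prefixes. The regular expression used by the original class is not shown on this page, so the pattern below and the class name StripUrlsSketch are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual GoogleHTTPGetSearch implementation.
final class StripUrlsSketch {

    // Assumed pattern: "http://" followed by anything up to whitespace,
    // a quote character, or an angle bracket.
    private static final Pattern URL = Pattern.compile("http://[^\\s\"'<>]+");

    static String[] stripURLs(String htmlpage, String[] excludes) {
        List<String> found = new ArrayList<>();
        Matcher matcher = URL.matcher(htmlpage);
        while (matcher.find()) {
            String url = matcher.group();
            if (!isExcluded(url, excludes)) {
                found.add(url);
            }
        }
        return found.toArray(new String[0]);
    }

    // A URL is excluded when it starts with any of the given prefixes.
    private static boolean isExcluded(String url, String[] excludes) {
        for (String prefix : excludes) {
            if (url.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Passing new String[] {"http://www.google.com"} as excludes would, under these assumptions, filter out links that point back to the search engine itself while keeping the result links.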