Spreadsheet Toolkit

corpus.fetcher
Class Fetcher

java.lang.Object
  |
  +--java.lang.Thread
        |
        +--corpus.fetcher.Fetcher
All Implemented Interfaces:
java.lang.Runnable

public class Fetcher
extends java.lang.Thread

Reads the URLs to be downloaded from a file, fetches the files, and stores them. See also: http://www.javaworld.com/javaworld/javatips/jw-javatip19.html


Field Summary
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Constructor Summary
Fetcher(int me)
          Constructor for the Fetcher object.
Fetcher(int me, java.lang.String auth)
           
 
Method Summary
 void addURLs(java.lang.String[] newurls)
          Add an array of URLs to the pending tasks.
static corpus.fetcher.Task getWork()
          The threads get their next job here.
static void main(java.lang.String[] args)
          The main program for the Fetcher class
static java.util.Vector possible(java.lang.String path)
          Given a path to look in, this method will return all the search files.
 void run()
          Main processing method for the Fetcher object
static void setup(java.util.Vector st)
          Add a vector of URLs to fetch.
static void startThreads(int num)
          Create a pool of threads to do some work.
 void store(java.lang.String outputfile, byte[] outfile, java.lang.String searchTerm, java.lang.String url)
          Store a byte array version of the file on the hard disk.
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getContextClassLoader, getName, getPriority, getThreadGroup, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setName, setPriority, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Fetcher

public Fetcher(int me)
Constructor for the Fetcher object. Each Fetcher object runs as its own thread.

Parameters:
me - thread number

Fetcher

public Fetcher(int me,
               java.lang.String auth)

Method Detail

getWork

public static corpus.fetcher.Task getWork()
The threads get their next job here. This method is synchronized, so only one thread at a time can take a task.

Returns:
The work to do (id and URL to scan)
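
The hand-out described above can be sketched as a synchronized static method over a shared queue. This is a minimal illustration, not the toolkit's code: SketchTask stands in for corpus.fetcher.Task (whose fields are not shown on this page), and WorkQueueSketch, add, and the null-when-empty convention are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for corpus.fetcher.Task: an id plus the URL to scan.
final class SketchTask {
    final int id;
    final String url;

    SketchTask(int id, String url) {
        this.id = id;
        this.url = url;
    }
}

final class WorkQueueSketch {
    // Shared queue of pending tasks, guarded by the class lock.
    private static final Deque<SketchTask> pending = new ArrayDeque<>();
    private static int nextId = 0;

    // Queues a URL under the same lock that getWork uses.
    static synchronized void add(String url) {
        pending.addLast(new SketchTask(nextId++, url));
    }

    // Hands each task to at most one caller; returns null when nothing is left
    // (an assumed convention, the documented method does not say).
    static synchronized SketchTask getWork() {
        return pending.pollFirst();
    }

    public static void main(String[] args) {
        add("http://example.com/a");
        add("http://example.com/b");
        SketchTask first = getWork();
        System.out.println(first.id + " " + first.url);
    }
}
```

Because both methods lock on the class, two threads calling getWork at the same moment cannot receive the same task.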

main

public static void main(java.lang.String[] args)
The main program for the Fetcher class

Parameters:
args - usage: java corpus.fetcher.Fetcher number-of-threads

setup

public static void setup(java.util.Vector st)
Add a vector of URLs to fetch.


addURLs

public void addURLs(java.lang.String[] newurls)
Add an array of URLs to the pending tasks. Callers may need to check whether any threads are still running; if none are, it may be necessary to create new threads, since a finished thread cannot be restarted.

Parameters:
newurls - Array of input URLs as Strings.
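
One way to handle the "are any threads still running" concern is to count active workers under the same lock as the queue and top the pool back up whenever URLs arrive. The sketch below is an assumption-laden illustration: AddUrlsSketch, POOL_SIZE, drain, done, and awaitIdle are hypothetical names, the method is static here although the documented one is an instance method, and the "download" is replaced by recording the URL.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class AddUrlsSketch {
    private static final int POOL_SIZE = 2; // assumed pool size
    private static final Deque<String> pending = new ArrayDeque<>();
    private static final List<Thread> started = new ArrayList<>();
    private static int active = 0; // workers that have not yet seen an empty queue
    static final List<String> done = new CopyOnWriteArrayList<>();

    // A worker that receives null is about to exit, so it is retired here,
    // under the lock, rather than relying on Thread.isAlive() later.
    static synchronized String getWork() {
        String url = pending.pollFirst();
        if (url == null) active--;
        return url;
    }

    // Queues the URLs, then creates fresh threads for any retired workers.
    static synchronized void addURLs(String[] newurls) {
        for (String u : newurls) pending.addLast(u);
        while (active < POOL_SIZE) {
            Thread t = new Thread(AddUrlsSketch::drain);
            active++;
            started.add(t);
            t.start();
        }
    }

    // Worker body: take tasks until the queue is empty.
    private static void drain() {
        String url;
        while ((url = getWork()) != null) {
            done.add(url); // a real worker would download and store here
        }
    }

    // Waits for every thread started so far; joins outside the lock so
    // workers can still call getWork.
    static void awaitIdle() {
        List<Thread> snapshot;
        synchronized (AddUrlsSketch.class) {
            snapshot = new ArrayList<>(started);
        }
        for (Thread t : snapshot) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        addURLs(new String[] {"http://example.com/a", "http://example.com/b"});
        awaitIdle();
        System.out.println("processed: " + done.size());
    }
}
```

Checking isAlive alone can miss a worker that has already seen an empty queue but has not yet terminated, which is why the sketch retires workers inside getWork.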

startThreads

public static void startThreads(int num)
Create a pool of threads to do some work.

Parameters:
num - number of threads to start
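
Creating the pool might look roughly like the following. This is a sketch only: PoolSketch, ran, and joinAll are hypothetical, and unlike the documented method (which returns void) this version returns the started threads so the caller can wait for them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolSketch {
    // Counts workers that actually ran; present only to make the sketch observable.
    static final AtomicInteger ran = new AtomicInteger();

    // Creates and starts num worker threads, numbered 0..num-1.
    static List<Thread> startThreads(int num) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < num; i++) {
            Thread t = new Thread(() -> ran.incrementAndGet(), "fetcher-" + i);
            t.start();
            threads.add(t);
        }
        return threads;
    }

    // Waits for all given threads, restoring the interrupt flag if interrupted.
    static void joinAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        joinAll(startThreads(4));
        System.out.println("workers run: " + ran.get());
    }
}
```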

possible

public static java.util.Vector possible(java.lang.String path)
Given a path to look in, this method will return all the search files.


store

public void store(java.lang.String outputfile,
                  byte[] outfile,
                  java.lang.String searchTerm,
                  java.lang.String url)
Store a byte array version of the file on the hard disk.

Parameters:
outputfile - File name for disk
outfile - The data to store
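
Writing the bytes to disk could be done along these lines. The sketch covers only the two documented parameters; searchTerm and url are left out because their role is not described here. StoreSketch and the choice to create missing parent directories are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class StoreSketch {
    // Writes data to outputfile, creating missing parent directories first.
    // An existing file at that path is overwritten.
    static void store(String outputfile, byte[] data) {
        try {
            Path target = Paths.get(outputfile);
            Path parent = target.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(target, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("store-sketch");
        Path file = dir.resolve("page.bin");
        store(file.toString(), new byte[] {1, 2, 3});
        System.out.println("wrote " + Files.size(file) + " bytes to " + file);
    }
}
```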

run

public void run()
Main processing method for the Fetcher object

Specified by:
run in interface java.lang.Runnable
Overrides:
run in class java.lang.Thread
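
The page does not say what run does internally. Since the class is described as fetching files, one plausible step is downloading a URL into a byte array that can then be passed to store. The following is a guess at that step, not the toolkit's implementation; FetchSketch, readAll, and fetch are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;

final class FetchSketch {
    // Reads the stream to its end and returns everything as one byte array.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the URL and downloads its content; the stream is always closed.
    static byte[] fetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java FetchSketch <url>");
            return;
        }
        System.out.println(fetch(args[0]).length + " bytes");
    }
}
```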
