Program Structure
DICE is written exclusively in Python. To keep configuration to a minimum, it relies heavily on naming conventions for the files that make up your web scrapers.
Every database should contain a download.py and an extract.py. All the steps that follow are generic enough that no further part of the process should be database-specific. You can add whatever other files you want. It is sometimes convenient to add a file containing a class that encapsulates what an article looks like and to share that class between download.py and extract.py.
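As a rough illustration of such a shared class, here is a minimal sketch. The file name article.py, the class name Article, and all of its fields and methods are assumptions made for this example; DICE itself does not prescribe them.

```python
# article.py (hypothetical file name; not required by DICE)
import json
from dataclasses import dataclass, asdict


@dataclass
class Article:
    """Illustrative record shared by download.py and extract.py."""

    url: str         # where the article was fetched from
    raw_html: str    # page source as saved by download.py
    title: str = ""  # left empty by download.py, filled in by extract.py
    text: str = ""   # left empty by download.py, filled in by extract.py

    def to_json(self) -> str:
        """Serialize the article so download.py can write it to disk."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "Article":
        """Rebuild an article from the JSON string written by to_json."""
        return cls(**json.loads(payload))
```

Under these assumptions, download.py would create Article objects and store the result of to_json, while extract.py would read them back with Article.from_json and fill in the title and text fields. Keeping the definition in one file means both scripts agree on what an article looks like.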