Please find below a short summary of my experience relevant to this project.
* Several years' experience developing text mining and information extraction solutions for web crawling, scraping, extraction and aggregation from unstructured big data such as web pages and text corpora, then assembling and loading the results into databases, datastores and search indexes (Lucene, Solr) for analysis, search, reporting and dashboards.
* Have worked extensively on text mining techniques for automatic processing, classification, prediction, clustering and categorization, as well as citation and linkage analysis based on referential connections and link structure (using a Markov model).
* Extensive experience using Perl, PHP, C, Java and .NET with MySQL, Oracle and MS SQL Server.
* Information extraction tools: Weka, R, Excel, Perl CPAN packages for extraction.
Estimated budget: ~$340 (7-12 days)
Price, milestones and timelines are flexible and negotiable, depending on the exact project specifications and details, or on any additional project work.
I would use PHP and MySQL for this project, and/or Perl (CPAN packages) for text extraction and analysis.
Are you running your server in a Linux environment?
Would you need the extraction/population to run on a set schedule?