While I am adept at using R for machine learning algorithms such as random forest, XGBoost, and neural networks, and for NLP-based sentiment analysis, I am not sure I fully understand your requirement. Correct me if I am wrong. I went to the first link you provided (Oregon) and clicked the "Permit" hyperlink of the first record (Absorbent Technologies). That opened a 67-page PDF document. Are you looking to make the PDF documents searchable? What kind of tabular data do you want extracted from these documents? What specific NLP functionality are you looking for? Are you expecting me to scrape these sites to extract all the PDF documents? If so, how many sites are involved, and which documents are to be scraped?
I am an R programmer with very limited scraping skills. If you can provide the documents and specify what you want done with them, I can help you out. I look forward to hearing from you soon.