I am looking for a PHP script that batch-processes text articles. It should clean them up, then turn each input article into two CSV files.
## Deliverables
Here is how I would like it to work:
1. The interface has a single upload field where I can upload multiple .txt files from my computer at once. (The same way Gmail does it - one upload field, but you can add multiple files from the same dialog.)
2. Some articles contain non-standard characters like fancy quotes. I want a settings file in the script directory that lists the problem characters and their replacements. The script should go through each text article file, finding and replacing those characters. The settings file needs to be easily editable by me so that I can add more find-and-replace rules as I identify additional characters that cause problems.
3. The CSV files will have the column headings: title, sentence, para, number.
4. Copy the title from the input article and put it in the first column, "title". This will be the same for every entry in the CSV.
5. The second column, "sentence", will contain one individual sentence from the article per row. The sentences will be listed in order from the first sentence to the last.
6. The third column, "para", will contain the number of the paragraph that the sentence is in. The title does not count as a paragraph.
7. The fourth column, "number", contains the number of the sentence in the article (not within the paragraph), so number 1 would be the first sentence, number 10 would be the 10th sentence. The counting of "number" does NOT start over when a new paragraph starts.
8. All entries in which a sentence contains asterisks (*) need to be transferred to a separate CSV file with the same column headings.
9. The script would then provide a download link to a zip file containing all the generated CSVs.
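
To make the cleanup and column rules more concrete, here is a rough sketch of how I picture the processing of one article. It is only an illustration, not a required design. The function names (`parseRules`, `articleToRows`) are made up, and it rests on assumptions that are mine: the settings file holds one `find=>replace` rule per line, the first line of an article is its title, paragraphs are separated by blank lines, and a sentence ends at `.`, `!` or `?` followed by whitespace. It also assumes that "number" is counted before the asterisk sentences are moved to their own file.

```php
<?php
// Hypothetical helpers; names and file formats are assumptions, not requirements.

// Parse rules of the assumed form "find=>replace", one rule per line.
function parseRules(string $settings): array
{
    $rules = [];
    foreach (preg_split('/\r\n|\r|\n/', $settings) as $line) {
        if ($line === '' || strpos($line, '=>') === false) {
            continue; // skip blank or malformed lines
        }
        [$find, $replace] = explode('=>', $line, 2);
        if ($find === '') {
            continue; // a rule with nothing to find is ignored
        }
        $rules[$find] = $replace;
    }
    return $rules;
}

// Build the CSV rows for one article, split into sentences without ("plain")
// and with ("starred") asterisks. Each row is [title, sentence, para, number].
function articleToRows(string $text, array $rules): array
{
    if ($rules) {
        $text = strtr($text, $rules); // apply every find/replace pair
    }
    $text = trim(str_replace(["\r\n", "\r"], "\n", $text));

    // Assumed layout: first line is the title, the rest is the body.
    [$title, $body] = array_pad(explode("\n", $text, 2), 2, '');
    $title = trim($title);

    $rows = ['plain' => [], 'starred' => []];
    $para = 0;   // paragraph counter; the title is not counted
    $number = 0; // sentence counter; never resets between paragraphs

    foreach (preg_split('/\n(?:[ \t]*\n)+/', trim($body)) as $paragraph) {
        $paragraph = trim(preg_replace('/[ \t\n]+/', ' ', $paragraph));
        if ($paragraph === '') {
            continue;
        }
        $para++;
        // Naive sentence split: break after . ! or ? followed by a space.
        foreach (preg_split('/(?<=[.!?]) /', $paragraph) as $sentence) {
            $number++;
            $key = strpos($sentence, '*') !== false ? 'starred' : 'plain';
            $rows[$key][] = [$title, $sentence, $para, $number];
        }
    }
    return $rows;
}

// Usage with a tiny made-up article:
$rules = parseRules("“=>\"\n”=>\"");
$rows = articleToRows("My Title\n\nFirst “one”. Second *kw* here!\n\nThird?", $rules);
// $rows['plain']   holds 'First "one".' (para 1, number 1) and 'Third?' (para 2, number 3)
// $rows['starred'] holds 'Second *kw* here!' (para 1, number 2)
```

The simple sentence split will trip over abbreviations such as "Dr." and over sentences that end with a closing quote, so the real script will probably need something smarter.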
The CSV files will be named (articlename)[login to view URL] for the sentences without asterisks and (articlename)[login to view URL] for the sentences with asterisks. So if an input article is named [login to view URL], then the CSVs would be named [login to view URL] and 030110-keywords.csv.
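
As a rough idea of how the CSV writing and the zip download could fit together, here is a sketch using PHP's built-in `fputcsv` and `ZipArchive`. The function names (`rowsToCsv`, `buildZip`) and the shape of the `$articles` array are my own assumptions. The two file name suffixes are passed in as parameters rather than hard-coded. It needs PHP 7.4 or later (for the empty escape argument to `fputcsv`) and the zip extension.

```php
<?php
// Sketch only: function names, parameters and array shapes are assumptions.

const CSV_HEADER = ['title', 'sentence', 'para', 'number'];

// Serialize rows (each [title, sentence, para, number]) to a CSV string
// that starts with the heading row.
function rowsToCsv(array $rows): string
{
    $handle = fopen('php://memory', 'w+');
    foreach (array_merge([CSV_HEADER], $rows) as $row) {
        // Empty escape string: quotes inside a field are simply doubled.
        fputcsv($handle, $row, ',', '"', '');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);
    return $csv;
}

// $articles maps an input file name to ['plain' => rows, 'starred' => rows].
// Two CSVs are added per article, even if one of them has only the headings.
function buildZip(string $zipPath, array $articles, string $plainSuffix, string $starredSuffix): void
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $zipPath");
    }
    foreach ($articles as $fileName => $rows) {
        $base = pathinfo($fileName, PATHINFO_FILENAME); // file name without extension
        $zip->addFromString($base . $plainSuffix, rowsToCsv($rows['plain']));
        $zip->addFromString($base . $starredSuffix, rowsToCsv($rows['starred']));
    }
    $zip->close();
}
```

The script could then call `buildZip` once per batch and link to the resulting file for download.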
I have attached three files: one input article, and the two output CSVs that should be produced by the script.
A MySQL database may be used, but it is not necessary.
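
Since a database is optional, the uploaded files could be read straight from PHP's temporary upload location. Below is a small sketch of how the multi-file upload field might be handled. The field name `articles` and the function name `uploadedTextFiles` are only assumptions for illustration. The form would contain something like `<input type="file" name="articles[]" multiple>`.

```php
<?php
// Sketch only: "articles" and uploadedTextFiles() are assumed names.

// PHP groups multi-file uploads by column ($field['name'][0], $field['name'][1], ...).
// This turns one such $_FILES entry into one record per file and keeps only
// successfully uploaded .txt files.
function uploadedTextFiles(array $field): array
{
    $files = [];
    foreach ($field['name'] as $i => $name) {
        $isTxt = strtolower(pathinfo($name, PATHINFO_EXTENSION)) === 'txt';
        if ($field['error'][$i] === UPLOAD_ERR_OK && $isTxt) {
            $files[] = [
                'name'     => basename($name),        // original file name
                'tmp_name' => $field['tmp_name'][$i], // where PHP stored the upload
            ];
        }
    }
    return $files;
}
```

In the real script this would be called as `uploadedTextFiles($_FILES['articles'])`, and each file's text read with `file_get_contents($file['tmp_name'])`.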