Convert PDF, RTF & MS Word DOC to HTML

Cancelled. Posted Apr 21, 2004. Paid on delivery.

PDF, RTF, TXT to HTML Converter

We need a Perl or Python module that converts PDF, RTF, and TXT files to HTML files.

Images, pictures, WMF files, etc. can be ignored; we are only interested in the text itself and its logical layout: paragraphs, bullets, lists, tables, etc.

TXT is translated to paragraphs only: a single \n means a line break.

\n followed by one or more empty lines means a new paragraph.
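The TXT rule above could be sketched roughly as follows in Python. This is only an illustration, not a required implementation: it assumes that a single \n maps to a `<br>` tag and that blank-line-separated blocks map to `<p>` elements, and the function name `txt_to_html_body` is made up for the example.

```python
import html
import re


def txt_to_html_body(text: str) -> str:
    """Convert plain text to HTML paragraphs.

    Assumed mapping (not confirmed by the posting):
    a single newline -> <br>, one or more empty lines -> new <p>.
    """
    # Normalise Windows and old Mac line endings to \n
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline followed by one or more empty (or whitespace-only) lines
    # separates two paragraphs
    blocks = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    paragraphs = []
    for block in blocks:
        if not block.strip():
            continue  # skip empty input
        # Escape &, < and > so the text cannot be read as markup
        lines = [html.escape(line) for line in block.split("\n")]
        paragraphs.append("<p>" + "<br>\n".join(lines) + "</p>")
    return "\n".join(paragraphs)
```

The text is escaped before the tags are added, so characters such as & and < in the source file survive the conversion as literal text.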

Do not use Word's object model; the same goes for Adobe Acrobat.

The module has a simple interface, Convert, that takes the filename and the directory and returns the filename of the HTM file.

Example of how it will be used (if you use Perl):

my $convertor = ModuleName->new;
my $file = $convertor->Convert('[url removed, login to view]', 'c:\\documents');
if (!$file)
{
    # Convert returns 0 on failure; report the stored error message
    print "*** Error: " . $convertor->GetLastError() . "\n";
}

where $file will be '[url removed, login to view]' if everything is OK.

If the conversion fails, the return value will be 0.

The error string should then be returned by the GetLastError() function.
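If Python is used instead of Perl, the same contract might look something like the sketch below. Everything here is hypothetical: the class name `HtmlConverter`, the method names, and the choice to write the .htm file next to the source file are assumptions. Only TXT input is handled, and the body is a placeholder that wraps the text in `<pre>`; PDF and RTF parsing are left out.

```python
import html
import os


class HtmlConverter:
    """Hypothetical Python counterpart of the requested interface."""

    def __init__(self):
        self._last_error = ""

    def get_last_error(self) -> str:
        """Return the message stored by the most recent failed convert()."""
        return self._last_error

    def convert(self, filename: str, directory: str):
        """Return the name of the generated .htm file, or 0 on failure."""
        self._last_error = ""
        try:
            base, ext = os.path.splitext(filename)
            if ext.lower() != ".txt":
                # PDF and RTF parsing is out of scope for this sketch
                raise ValueError("unsupported file type: " + ext)
            with open(os.path.join(directory, filename), "r", encoding="utf-8") as src:
                text = src.read()
            target = base + ".htm"
            # Assumption: the output is written next to the source file
            with open(os.path.join(directory, target), "w", encoding="utf-8") as out:
                out.write("<html><body><pre>" + html.escape(text) + "</pre></body></html>")
            return target
        except (OSError, ValueError) as exc:
            # UnicodeDecodeError is a ValueError, so bad encodings land here too
            self._last_error = str(exc)
            return 0
```

Returning 0 rather than raising keeps the calling code identical in shape to the Perl usage: test the return value, then ask the object for the error text.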

The module should handle UTF-8 encoding as well as 8-bit encodings (UTF-16 is a bonus if you offer it).
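One possible way to cope with mixed encodings, sketched in Python: check for a byte order mark first, then try UTF-8, then fall back to an 8-bit code page. The function name `decode_text` and the cp1252 fallback are assumptions; the posting does not say which 8-bit encodings are expected.

```python
import codecs


def decode_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode raw file bytes to str (hypothetical helper)."""
    # UTF-8 files may start with a byte order mark (BOM); strip it
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # A UTF-16 BOM tells the "utf-16" codec which byte order to use
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: treat as the assumed 8-bit code page and
        # substitute a replacement character for any undefined byte
        return data.decode(fallback, errors="replace")
```

UTF-16 text without a BOM is not detected by this sketch and would need a separate heuristic or an explicit encoding argument.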

The code should run unattended, and you should create a log file recording all the errors.
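For the error log, Python's standard logging module would probably be enough. The sketch below is one way to set it up; the logger name `html_converter`, the function name, and the line format are illustrative choices, not requirements from the posting.

```python
import logging


def make_error_logger(log_path: str) -> logging.Logger:
    """Return a logger that appends errors to log_path (hypothetical helper)."""
    logger = logging.getLogger("html_converter")
    # Only errors and worse are recorded
    logger.setLevel(logging.ERROR)
    if not logger.handlers:
        # Attach the file handler once, even if this function is called again
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

A batch run would then call something like `logger.error("Conversion failed for %s: %s", filename, message)` for each failed file and carry on with the next one, so no user interaction is needed.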

We should receive all the source code, documented, along with all copyrights, so that we can do whatever we want with the code, including changing it, reselling it, or eating it ..:-)

The module and all of its dependencies should be compatible with MS Windows. It must be based only on open source code; no special modules that cost money or limit our ability to distribute the code are allowed!

We want simple code that is easy to maintain.

We have several other modules we need, so if you do well on this one you may get others too.

Perl Python

Project ID: #1145

About the project

Remote project. Active Apr 21, 2004.