Is there software that can build a database by reading info from PDF files??

  1. Is there such a thing as software that could read through a number of PDF files and look for specific pieces of text (such as "Policy Number: 123456789") and use those pieces of information to create entries in a database??

    I'm trying to find a simpler way to create a database from thousands of PDF documents from scanned client files... Typing them all into Access would be quite a project... and error prone...

    Help? Please? Thanks!

    (deniz, Panama)

  2. It depends on the PDF file.

    A PDF file can be just an image with no text: a picture of dots that looks like "Policy Number: 123456789" to a human but is just a TIFF or JPG to the computer. The string "Policy Number: 123456789" doesn't exist in the file, so you'd need to use OCR to get it out. Depending on the quality of the scan, OCR may or may not be accurate enough to be viable.

    Or, it can be an image with text over (or under) it. In this case, the "Policy Number: 123456789" string is in the PDF file, and it's a matter of finding it -- is it designated as a field and a response? Or do you have to do a full-text search over the document to look for that pattern?

    You can open a PDF file in Acrobat to see how it's set up. If you have thousands, then you should probably sample a few dozen to make sure they're all the same.

    From that, you'll know what it is you need to do specifically. And you can start working on finding software that will help you do it.

    (meltem, Burkina Faso)

  3. I highly doubt there is any generic software, for the reasons the last poster gave. However, the PDF format is reasonably straightforward and very well documented, so, if you're a coder, creating a custom 'datapump' would be far easier than manual input. Otherwise, hiring a software developer to do it for you is an option; one that tends to be expensive (common rates in my area are 250-300 GBP per day). This all assumes that the files have a uniform structure, of course.

    (senanur, China)

    Someone at the forum below might be able to provide more help, since it's dedicated to PDF, or they could point you in the direction of someone who can.

    http://forum.planetpdf.com/wb/default.asp

    Enjoy
    AMDbuilder

    (Ece, Iceland)

  5. How about....?
     
    Omniform has been around for years...

    http://www.nuance.com/omniform/premium/

    (muhammet, Korea, Democratic People's Republic of)

  6. OmniForm has all of the limitations I describe above. It's a decent OCR, but OCR is never perfect. You can see a project I've been doing with OmniForm here. The product is easy to use, but their customer service in my experience has been completely negative.

    (gökhan, Ecuador)

  7. Quote:

    Originally Posted by mikeblas (Post 1031173568)
    OmniForm has all of the limitations I describe above. It's a decent OCR, but OCR is never perfect. You can see a project I've been doing with OmniForm here. The product is easy to use, but their customer service in my experience has been completely negative.

    Hi, which part of your site did you use OmniForm for? I see a few PDFs and some stats; were the stats captured via OmniForm?

    (ufuk, Guadeloupe)

    All of the old PDFs were scanned and converted with OmniForm. The new ones (from 2004 on) were produced in PDF directly, so no scanning was necessary.

    I haven't collected any stats from the old documents because I don't trust any of the data. The scans from the 1990s clearly show how bad OCR can be. Stuff printed on dot-matrix printers is not at all reliably recognizable by OCR software.

    (ali, Nicaragua)
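
For anyone who wants to try the custom 'datapump' route from posts 2 and 3, here is a minimal sketch in Python. It assumes the text has already been pulled out of each PDF (from a text layer or by OCR) and only covers the "search for the pattern, write a row" step. The table name, column names, file names, and sample text are all made up for illustration, and the pattern is taken from the "Policy Number: 123456789" example in the question; real files may well need a different one. It uses SQLite rather than Access only because SQLite ships with Python.

```python
import re
import sqlite3

# Assumed label format, based on the example string in the question.
POLICY_RE = re.compile(r"Policy Number:\s*(\d+)")


def extract_policy_numbers(text):
    """Return every policy number found in one document's extracted text."""
    return POLICY_RE.findall(text)


def load_documents(conn, documents):
    """Insert one row per (file name, policy number) pair.

    `documents` maps a file name to text already extracted from that PDF.
    Returns the file names with no match so they can be checked by hand.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS policies (source_file TEXT, policy_number TEXT)"
    )
    unmatched = []
    for name, text in documents.items():
        numbers = extract_policy_numbers(text)
        if not numbers:
            unmatched.append(name)
            continue
        conn.executemany(
            "INSERT INTO policies (source_file, policy_number) VALUES (?, ?)",
            [(name, number) for number in numbers],
        )
    conn.commit()
    return unmatched


# Hypothetical sample input: one file with a usable text layer, one without.
sample = {
    "client_a.pdf": "Name: A. Client\nPolicy Number: 123456789\n",
    "client_b.pdf": "(scan with no usable text layer)",
}
conn = sqlite3.connect(":memory:")
needs_review = load_documents(conn, sample)
rows = conn.execute("SELECT source_file, policy_number FROM policies").fetchall()
```

Returning the unmatched file names matters given the OCR caveats in this thread: files where the pattern was not found get set aside for manual review instead of being dropped silently.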


