Moving from paper-based documents to paperless documents can offer a number of benefits. In addition to saving physical space, paperless documents are easier to search and to share. They also aren’t bound by the physical limitations of paper, which enables workflows that simply aren’t possible with physical copies. For example, multiple people in different places can view a paperless document simultaneously, while a single paper copy can be read by only a handful of people at once, and all of them must be in the same room.
There are many ways to create and manage paperless documents. It can be as simple as scanning a sheet of paper and putting the file in a directory on your hard drive, or as complicated as a multimillion-dollar document management system that handles security and automatic scanning of bound documents.
Chances are high that at some point you’ll be working with or creating paperless documents, so it is good to understand some of the basic concepts.
To understand what an image-based document is, think about the difference between a fax and a word processor document. The fax is basically a picture of a page. You can’t edit the text, and what was faxed on page one is always going to be on page one. The image-based document is a permanent record of what the document looked like at a particular moment in time, so it is a good way to capture things like contracts or legal documents.
One of the biggest problems with image-based documents is that they are very difficult to search. To the computer, the page is just a picture, not words.
The word processor document, on the other hand, is text based. You can edit words, delete paragraphs, and move sections around. If you increase the font size, the document will take more pages. If you decrease the margins, you may be able to make the document take fewer pages. It is much more fluid and dynamic. You can also search in a text-based document.
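The way a text-based document reflows can be illustrated with a rough sketch. The function name `page_count` and the two layouts are hypothetical, chosen only for illustration; a real word processor also accounts for fonts, margins, and paragraph breaks. The idea is simply that a larger font fits fewer characters per line and fewer lines per page, so the same text needs more pages.

```python
import textwrap

def page_count(text, chars_per_line, lines_per_page):
    """Estimate how many pages `text` fills when wrapped to a given width."""
    lines = textwrap.wrap(text, width=chars_per_line)
    # Ceiling division: a partly filled last page still counts as a page.
    return max(1, -(-len(lines) // lines_per_page))

# 2000 short words standing in for the body of a document.
text = "word " * 2000

# Hypothetical layouts: the larger font holds fewer characters per line
# and fewer lines per page than the smaller one.
small_font = page_count(text, chars_per_line=90, lines_per_page=50)
large_font = page_count(text, chars_per_line=60, lines_per_page=35)

print(small_font, large_font)
```

The text itself never changes between the two calls; only the layout does, which is exactly what makes this kind of document fluid.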
Text-based documents have some very big advantages, but their fluid nature makes them less than ideal for signed contracts and other documents where you need a permanent record.
One way to get a bit more permanence in a document is to make it less fluid while retaining the ability to search. Saving a word processor document as a PDF is a good example of this. The document is still text based, but you can’t go in and rearrange things other than making small text edits. PostScript is another format that accomplishes the same thing.
A fixed-layout, text-based file like this is generally what you’ll end up with if you download receipts and bills from a company website, and it is great for keeping records. The file is small because it doesn’t require an image for each page. Since the document is text based, you can copy text, search, and even hyperlink a table of contents.
These types of documents are often a good choice when the source is in a digital form and where formatting is important. For example, if you print out a tax form to send to the IRS and want to keep a copy for your records, printing another copy to a PDF to save on your computer would be a good option. This preserves the formatting as well as the text search capabilities. However, there isn’t a good way to get a document to this state if all you have is a physical copy.
So what do you do if you have documents that need to be scanned in, but you need the text search capabilities of a formatted and text-based document? This can be solved by using a document format that supports layers. For example, a PDF can have one layer that is image based and another that is text based. Often the way this works is that you scan in a document to create the image layer and then run some type of optical character recognition (OCR) to create an invisible text layer. So if you view the document on your screen, you’ll be looking at the image layer, but if you do a search, it will look in the text layer. If a match is found, it will take you to the part of the image layer that generated that text.
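The lookup described above can be sketched in a few lines. This is only an illustration of the idea, not how any particular PDF viewer works: the `Word` class, the `search` function, and the sample coordinates are all hypothetical. The text layer is modeled as a list of recognized words, each remembering the page and the rectangle of the scanned image it came from.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str    # what the OCR step recognized
    page: int    # page of the scan the word appears on
    box: tuple   # (left, top, right, bottom) on the page image, in pixels

def search(text_layer, query):
    """Return (page, box) for every word in the text layer that matches `query`."""
    q = query.lower()
    return [
        (w.page, w.box)
        for w in text_layer
        if w.text.lower().strip(".,;:") == q
    ]

# Made-up OCR output for a two-page scan (assumed values, for illustration only).
text_layer = [
    Word("Lease", 1, (72, 90, 140, 110)),
    Word("Agreement", 1, (148, 90, 260, 110)),
    Word("Tenant,", 2, (72, 300, 136, 320)),
]

# The search runs against the invisible text layer, and the result tells
# the viewer which region of the image layer to scroll to and highlight.
hits = search(text_layer, "agreement")
print(hits)
```

The reader only ever sees the scanned image; the text layer exists purely so that a search has something to match against and a location to report.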
This type of setup generally gives you the best of both worlds for digitizing existing physical documents. In fact, some copier/scanner machines do this type of OCR automatically when you scan a document and save it as a PDF.
Managing paperless documents can be a very complicated affair. However, just knowing the advantages and disadvantages of each of these storage formats puts you well ahead of the average person when it comes to understanding paperless documents.