University Secretary
Scanning
1. Introduction
Scanning paper documents provides many benefits, such as improved access to information and reduced storage requirements (if the original hard copies are discarded). Before carrying out any scanning, however, it is advisable to assess the cost-effectiveness. For instance, it may be inappropriate to scan the following:
- Documents with very short retention periods (e.g. 6 months or less);
- Documents that are rarely consulted;
- Documents with long or permanent retention periods (see note below).
2. Resolution
When scanning documents, the following resolutions should be used:
| dpi | Type of document |
|---|---|
| 200 | suitable for most text documents |
| 300 | ideal for documents with small font sizes or handwritten documents |
| 300/400 | may be necessary for plans and drawings |
| 300/400 | will be appropriate if optical or intelligent character recognition (OCR/ICR) or optical mark recognition (OMR) is being used |
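As a rough illustration of what these resolutions mean in practice, the following Python sketch converts a dpi setting into pixel dimensions for an A4 page. The dictionary keys and function name are illustrative only, and where the table gives 300/400 the lower figure is assumed as a starting point.

```python
# Resolutions taken from the table above. The category names are
# illustrative labels, not terms defined by the guidance. Where the
# table gives "300/400", the lower value is assumed here.
RECOMMENDED_DPI = {
    "text": 200,
    "small_font_or_handwritten": 300,
    "plans_and_drawings": 300,
    "ocr_icr_omr": 300,
}

# A4 paper is 210 mm x 297 mm, roughly 8.27 x 11.69 inches.
A4_INCHES = (8.27, 11.69)


def pixel_dimensions(dpi, page_inches=A4_INCHES):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    width_in, height_in = page_inches
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    for doc_type, dpi in RECOMMENDED_DPI.items():
        print(doc_type, dpi, pixel_dimensions(dpi))
```

Because pixel count grows with the square of the resolution, moving from 200 to 400 dpi quadruples the number of pixels per page, which is one reason to avoid scanning at a higher resolution than the material needs.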
3. Colour
A decision should be made whether to scan material in black and white, greyscale or colour. Greyscale will be ideal for documents that have poor definition or low contrast. For colour documents, tests should be conducted to establish the level of accuracy required.
| Colour depth | Description |
|---|---|
| 8 bit colour | provides 256 different colours |
| 24 bit colour | provides approximately 16 million different colours. Using 24 bit colour will increase file sizes considerably and should only be used if it is essential to reproduce every tonal variation exactly. |
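The figures in the table follow from the number of bits stored per pixel, and the same arithmetic gives an estimate of uncompressed file size. The sketch below is illustrative only: the function names are hypothetical, and the size estimate ignores file headers and any row padding a real image format may add.

```python
def colour_count(bits_per_pixel):
    """Number of distinct values that can be stored in the given bit depth."""
    return 2 ** bits_per_pixel


def uncompressed_size_bytes(width_px, height_px, bits_per_pixel):
    """Approximate raw image size in bytes, rounded up to a whole byte."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


if __name__ == "__main__":
    # An A4 page at 200 dpi is roughly 1654 x 2338 pixels.
    width, height = 1654, 2338
    for label, bits in [("black and white", 1), ("8 bit", 8), ("24 bit", 24)]:
        size = uncompressed_size_bytes(width, height, bits)
        print(f"{label}: {colour_count(bits)} values, {size} bytes uncompressed")
```

Before compression, a 24 bit image is three times the size of an 8 bit image of the same dimensions.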
4. Formats
The scanned copies will need to be saved as either text files (e.g. PDF) or image files (e.g. TIFF). The format selected should be appropriate for the type of document and the length of its retention.
The longer data is held, the greater the risk that it could become unreadable. Wherever possible it is best to use standard file formats in order to reduce the danger of information becoming trapped in obsolete technology: in particular, it is advisable to use "open source" standards (e.g. TIFF, PDF, PDF/A); these are standards for which the underlying programming code has been published, so that they are not dependent on the continued support of one particular company.
File sizes of images can be reduced by using compression, either lossless (which compresses the file but ensures that, once decompressed, it is identical to the original) or lossy (which removes some information from the file so it is not an exact copy). Lossy compression may compromise the evidential value of the images, and should not be used if it is essential to preserve every detail of a document. In some cases, it may be helpful to store two sets of images: one set of compressed files suitable for everyday use (which can be printed and retrieved quickly), and one set of uncompressed files that could be made available for evidential purposes (should it be necessary to prove the integrity of the information, for example, in the event of a legal dispute).
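The defining property of lossless compression, that the decompressed data is byte-for-byte identical to the original, can be checked directly. The sketch below uses Python's standard `zlib` module purely as a stand-in for whichever lossless scheme the scanning software offers; the function name and sample data are illustrative.

```python
import zlib


def lossless_round_trip(data):
    """Compress and decompress data, returning True if nothing was lost."""
    compressed = zlib.compress(data, 9)  # 9 = maximum compression level
    restored = zlib.decompress(compressed)
    return restored == data


if __name__ == "__main__":
    # Stand-in for raw image data: a mostly blank page compresses well.
    page = bytes(100_000)
    compressed = zlib.compress(page, 9)
    print("original bytes:  ", len(page))
    print("compressed bytes:", len(compressed))
    print("identical after decompression:", lossless_round_trip(page))
```

A lossy format cannot pass this kind of check, which is why it may be unsuitable where every detail of the document must be preserved.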
5. Quality Control
When the scanning is completed, the copies should be checked to make sure they are legible and everything has been captured, including the smallest details. It is also important to ensure that no pages have been omitted from multi-page or double-sided documents, and that they have been scanned in order. Depending on the quantity of material, it will be necessary to check either every image or a representative sample. If the quality of any of the images is not adequate, the papers should be rescanned and the first copies replaced.
Alternatively, image enhancement or editing software can be used to improve the quality of the copies: for example, deskewing will improve the alignment, and despeckling will remove random black marks. When enhancing an image, however, care should be taken to ensure that it remains an accurate representation of the original; image processing may affect the evidential weight of an electronic document and, in some cases, it may therefore be advisable to retain two copies - one before enhancement and one after.
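Some of the checks described in this section can be partly automated. The sketch below assumes each scanned page has been given a page number and each image a file name. The function names, the 10% sampling fraction and the fixed random seed are illustrative choices, not requirements of this guidance. Legibility still has to be judged by a person.

```python
import random


def missing_pages(scanned_pages, expected_count):
    """Return the page numbers from 1..expected_count that were not scanned."""
    return sorted(set(range(1, expected_count + 1)) - set(scanned_pages))


def in_order(scanned_pages):
    """Return True if the pages were scanned in strictly ascending order."""
    return all(a < b for a, b in zip(scanned_pages, scanned_pages[1:]))


def qc_sample(file_names, fraction=0.1, seed=0):
    """Pick a reproducible random sample of files for manual inspection."""
    names = list(file_names)
    if not names:
        return []
    sample_size = max(1, round(len(names) * fraction))
    return random.Random(seed).sample(names, sample_size)


if __name__ == "__main__":
    pages = [1, 2, 4, 3, 6]
    print("missing:", missing_pages(pages, 6))
    print("in order:", in_order(pages))
    print("to inspect:", qc_sample([f"scan_{n:03d}.tif" for n in range(1, 21)]))
```

Fixing the seed means the same sample can be regenerated later, which may help if the quality control procedure itself needs to be evidenced.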
6. Indexing
A unique reference should be allocated to each text or image file to aid retrieval. In addition, the entire collection of files will need to be indexed (e.g. by date, surname or subject), so that it will be possible to locate and retrieve individual items easily and quickly.
It may also be helpful to record some additional metadata (such as the date and time when the images were captured, who carried out the scanning, and the settings used), especially if the files are likely to be required for evidential purposes. The index and other metadata could be stored in a database, a spreadsheet or an image management system.
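A minimal sketch of such an index, written as a CSV file that a spreadsheet can open, might look like the following. The field names and the use of a random UUID as the unique reference are assumptions made for illustration. The SHA-256 checksum is an addition not mentioned in the guidance, included here as one possible way to show later that a file has not changed.

```python
import csv
import hashlib
import io
import uuid
from datetime import datetime, timezone

# Illustrative field names; adapt to local indexing needs.
FIELDS = ["reference", "file_name", "subject", "scanned_at",
          "scanned_by", "dpi", "colour_mode", "sha256"]


def make_index_record(file_name, content, subject, scanned_by,
                      dpi, colour_mode, scanned_at=None):
    """Build one index row for a scanned file held in memory as bytes."""
    scanned_at = scanned_at or datetime.now(timezone.utc)
    return {
        "reference": uuid.uuid4().hex,  # unique reference for retrieval
        "file_name": file_name,
        "subject": subject,
        "scanned_at": scanned_at.isoformat(),
        "scanned_by": scanned_by,
        "dpi": dpi,
        "colour_mode": colour_mode,
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def write_index(records, stream):
    """Write index records as CSV to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    record = make_index_record("invoice_0001.tif", b"example image bytes",
                               "Invoices", "A. Scanner", 200, "greyscale")
    buffer = io.StringIO()
    write_index([record], buffer)
    print(buffer.getvalue())
```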
7. Discarding hard copies
Before discarding any hard copies following scanning, a risk assessment should be carried out to balance the consequences of losing original evidence against the costs of retaining it. In most cases it will be acceptable to dispose of the originals, provided they have been checked for quality and indexed, and there are adequate procedures in place concerning security, disaster recovery and migration.
If it is likely that the scanned images will be required for evidential purposes, then clear, consistent procedures must be developed that are compliant with the British Standards Code of Practice for Legal Admissibility and Evidential Weight of information stored electronically, so the copies can be authenticated adequately. Compliance with this code will involve documenting in detail various aspects of the scanning (such as the settings and quality control criteria), as well as maintaining audit trails. In addition, the procedures may need to be audited by a relevant regulatory body.
In a few cases, it will be advisable to retain the hard copies because of their legal nature and the importance of being able to produce an original signed document. Although electronic documents are legally admissible under the terms of the Civil Evidence Act 1995 (provided their authenticity can be demonstrated), the evidential weight of an original signed document is still likely to be greater.
Note: The average life-span of a file format is five to seven years. To ensure electronic documents can continue to be read, they need to be converted to new file formats at regular intervals. Over a period of decades the effort required to prevent them from becoming trapped in obsolete technology is likely to be considerable. Furthermore, the more times data is migrated, the greater the risk that it may become corrupted. Unless records need to be accessed frequently, therefore, it may prove cheaper and less onerous to retain the original paper documents.
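To give a sense of scale for the note above, the following sketch estimates how many format migrations a record might undergo over its retention period. It assumes, as a simplification, that a record is migrated once each time its current format reaches the end of its life-span. The function name is hypothetical.

```python
def migrations_needed(retention_years, format_lifespan_years):
    """Estimate migrations over the retention period, assuming one
    migration each time the current format reaches the end of its life."""
    if retention_years <= 0:
        return 0
    # Ceiling division gives the number of formats used in total;
    # the first format is the original, so subtract one.
    formats_used = -(-retention_years // format_lifespan_years)
    return formats_used - 1


if __name__ == "__main__":
    for lifespan in (5, 7):
        count = migrations_needed(30, lifespan)
        print(f"30-year retention, {lifespan}-year life-span: {count} migrations")
```

On these assumptions a record kept for 30 years would be migrated four or five times, each migration carrying some cost and some risk of corruption.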