| POLICY SETTING |
Policy factors influencing web archiving include political mandates,
organizational mission, financial parameters, and technical capabilities. |
| SELECTION |
| Selection |
The choice of web-published materials for archiving is shaped by the focus of the collection,
the unit of selection, web boundaries, copyright obligations, and the authenticity of the materials. |
| Acquisition |
Web-published materials are acquired, or 'harvested', using crawling tools that capture
them either globally or selectively. |
| CURATION |
| Description |
Baseline metadata is machine-generated and gathered by a crawler at the time of data capture.
Enriched metadata is generally specific to an organization and combines machine-generated metadata
with human-generated metadata added after data capture. |
| Organization |
Digital archives of web-published materials typically either retain the organizational
structure of the materials as they existed on the web at the time of capture or modify the organizational structure
to suit the archive's mission or constraints. |
| Presentation |
Presentation of web archive materials is related to how the content was captured and to post-harvest
descriptive and organizational analysis. For example, archived materials might mirror the web as it was at the time of capture
or might be categorized according to selection criteria, such as image files presented by subject. |
| Maintenance |
Several maintenance functions are critical to ensuring the successful use of materials in web archives:
software and hardware training for archive support staff; hardware and software maintenance, performance optimization,
backups, and upgrades; and duplicate detection. |
| Deselection |
Materials may be removed from a web archive for several reasons: duplication, errors, or legal or
social considerations (e.g., offensive materials). The risks of removal and retention are weighed against policy and storage costs. |
| PRESERVATION |
| Preservation |
Preservation challenges are numerous. They include persistent naming, format migration and/or emulation,
inventory management, volatility, replication, re-validation, curator-operator error, and storage. |
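
Two of the functions listed above, duplicate detection (Maintenance) and re-validation (Preservation), are commonly approached with content digests. The following is a minimal sketch under that assumption, not a description of any particular archive's tooling: the function names, the capture identifiers, and the choice of SHA-256 are all illustrative.

```python
import hashlib
from collections import defaultdict


def digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a captured payload."""
    return hashlib.sha256(payload).hexdigest()


def find_duplicates(captures: dict[str, bytes]) -> list[list[str]]:
    """Group capture identifiers whose payloads are byte-identical."""
    by_digest = defaultdict(list)
    for capture_id, payload in captures.items():
        by_digest[digest(payload)].append(capture_id)
    # Only groups with more than one member are duplicates.
    return [sorted(ids) for ids in by_digest.values() if len(ids) > 1]


def revalidate(payload: bytes, recorded_digest: str) -> bool:
    """Check that a stored payload still matches the digest recorded at capture."""
    return digest(payload) == recorded_digest


# Hypothetical captures keyed by made-up identifiers (assumption for illustration).
captures = {
    "example.org/2004-01-01": b"<html>home</html>",
    "example.org/2004-02-01": b"<html>home</html>",
    "example.org/about/2004-01-01": b"<html>about</html>",
}

duplicate_groups = find_duplicates(captures)
recorded = digest(captures["example.org/about/2004-01-01"])
still_valid = revalidate(captures["example.org/about/2004-01-01"], recorded)
```

Byte-level digests only detect exact duplicates; pages that differ in a timestamp or advertisement would need a looser comparison, which this sketch does not attempt.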