About the New Zealand Electronic Text Centre

About the NZETC

The New Zealand Electronic Text Centre has four aims:

  • To create a digital library providing open access to significant New Zealand and Pacific Island texts and materials. This encompasses both digitised heritage material and born-digital resources.

  • To effectively partner with other organisations, as a collaborator and service provider, on a variety of digitisation and digital content projects.

  • To build a wider community skilled in the use and creation of digital materials through teaching and training activities and by publishing and presenting the results of research.

  • To work at the intersection of computing tools with textual material and investigate how these tools may be used to make new knowledge from our cultural inheritance.

Acting on these goals, NZETC is engaged in an ongoing programme of digitisation and hosts an expanding online library. The standards-based collection is delivered through an Open Source framework and offers full and free access to a range of materials in multiple formats for download or online browsing. Today the NZETC collection contains over 2,600 texts (around 65,000 pages) and receives over 10,000 visits each day. Information on the NZETC selection policy for digitisation is available here.

Since it was created in 2002, the NZETC has worked successfully with a range of partners on a variety of external projects. More information about those projects can be found here. As part of Victoria University of Wellington Library, one of the Centre's key relationships is with the wider University. A large part of the funding for the Centre is provided by the University, and the NZETC works closely with Library colleagues and academic staff to identify and deliver digital content which will support research within the institution. The NZETC actively collaborates with VUW staff and students on digital humanities research projects. NZETC staff undertake teaching work in digital resource management and electronic publishing, and support internships within the Centre for students wishing to learn about various aspects of digital text.

The NZETC is a founding contributing partner in the Matapihi project. We are active members of the National Digital Forum, the Text Encoding Initiative Consortium, and the Australia New Zealand Digital Encyclopedias Group.

News

To read the news on developments at the NZETC go to the NZETC blog. Here you can find out what texts have been added to the collection and what projects we have been working on. The blog also acts as an archive with news stories going back to the creation of the Centre in 2002.

Subscribe to the NZETC Blog RSS Feed

Research papers and reports produced by NZETC staff can be found in VUW's ResearchArchive.

Projects

Projects with partners within the University

Projects with external partners

In addition to internal VUW projects, the NZETC provides expertise and services to a range of heritage institutions, government departments and commercial organisations in the areas of cultural heritage digitisation and e-publishing.

Technology

XML and TEI are the document mark-up standards which underpin the work of the NZETC. Information on TEI can be found through the Text Encoding Initiative. Other key technologies used at the NZETC include topic maps and XTM, XSLT, Apache Cocoon and Lucene. More information is given below.

Books, images, and collections are navigable through a dynamically-generated semantic framework, which represents the first release of a large-scale XML Topic Map (XTM) site in New Zealand. Users are able to move around the resources on the site tracking topics of interest rather than merely browsing the material linearly or through text searching. In a topic map, web-based resources are grouped around items called "topics", each of which represents some subject of interest. In the NZETC topic map, the topics represent books, chapters, and illustrations, and also people and places mentioned in those books.

Topics in a topic map are linked together with hyperlinks called "associations". There can be different types of association in a topic map, representing the different kinds of relationship in the real world. For instance, in the NZETC topic map, the topic which represents a particular person may be linked to a topic which represents a chapter of a book which mentions that person. This association would be labelled to indicate that it represents a "mention". Similarly, the same person's topic might be linked to a particular photograph topic, via a "depiction" association.
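The topic and association model described above can be sketched as a small data structure. This is only an illustration in Python, not the XTM format or the NZETC's own code: the class names, the identifiers such as `person-1`, and the placeholder topic names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    topic_id: str  # hypothetical identifier, e.g. "person-1"
    kind: str      # e.g. "person", "chapter", "photograph"
    name: str


@dataclass(frozen=True)
class Association:
    kind: str    # e.g. "mention" or "depiction"
    source: str  # topic_id of the resource (chapter, photograph)
    target: str  # topic_id of the subject (person, place)


class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def add_topic(self, topic):
        self.topics[topic.topic_id] = topic

    def associate(self, kind, source, target):
        # Both ends of an association must already be topics in the map.
        for topic_id in (source, target):
            if topic_id not in self.topics:
                raise KeyError(topic_id)
        self.associations.append(Association(kind, source, target))

    def related(self, topic_id, kind=None):
        """Return (association kind, topic) pairs linked to topic_id in either direction."""
        found = []
        for assoc in self.associations:
            if kind is not None and assoc.kind != kind:
                continue
            if assoc.source == topic_id:
                found.append((assoc.kind, self.topics[assoc.target]))
            elif assoc.target == topic_id:
                found.append((assoc.kind, self.topics[assoc.source]))
        return found


def build_example():
    """Build a tiny map with placeholder topics (not real collection data)."""
    tm = TopicMap()
    tm.add_topic(Topic("person-1", "person", "Example Person"))
    tm.add_topic(Topic("chapter-1", "chapter", "Example chapter"))
    tm.add_topic(Topic("photo-1", "photograph", "Example photograph"))
    tm.associate("mention", "chapter-1", "person-1")
    tm.associate("depiction", "photo-1", "person-1")
    return tm
```

Calling `build_example().related("person-1")` returns the chapter and the photograph linked to that person, which is roughly the information a topic page would need in order to list its associated topics.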

To construct our topic map, we use XSLT stylesheets to extract metadata from each of our XML text files, and express it in the XTM format. In this way we automatically create hundreds of topic maps, each of which describes one of our texts. We also harvest information about people, places and organisations from an entity authority file which we construct from what is mentioned in our collection. Finally we merge the harvested topic maps together to create a unified topic map which describes our entire website.
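The harvest-and-merge process described above can be approximated in a few lines. The NZETC does this with XSLT stylesheets and the XTM format; the sketch below swaps in Python and plain dictionaries purely for illustration. The simplified markup (`text`, `title`, `name` with `type` and `key` attributes) is a hypothetical stand-in for the real TEI files, and the `key` values play the role of identifiers from an authority file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified texts; real TEI documents are far richer.
TEXT_A = """<text id="text-a"><title>Example Text A</title><body>
<p><name type="person" key="person-1">Example Person</name> visited
<name type="place" key="place-1">Example Place</name>.</p></body></text>"""

TEXT_B = """<text id="text-b"><title>Example Text B</title><body>
<p>A letter from <name type="person" key="person-1">E. Person</name>.</p>
</body></text>"""


def harvest(xml_source):
    """Extract a small topic map (topics plus "mention" associations) from one text."""
    root = ET.fromstring(xml_source)
    text_id = root.get("id")
    topics = {text_id: {"kind": "text", "name": root.findtext("title")}}
    associations = set()
    for name in root.iter("name"):
        key = name.get("key")
        label = "".join(name.itertext()).strip()
        topics[key] = {"kind": name.get("type"), "name": label}
        associations.add(("mention", text_id, key))
    return {"topics": topics, "associations": associations}


def merge(maps):
    """Merge per-text maps; topics sharing an identifier collapse into one topic."""
    merged = {"topics": {}, "associations": set()}
    for topic_map in maps:
        for topic_id, topic in topic_map["topics"].items():
            # Keep the first description seen for each shared identifier.
            merged["topics"].setdefault(topic_id, topic)
        merged["associations"] |= topic_map["associations"]
    return merged
```

Merging the maps harvested from `TEXT_A` and `TEXT_B` yields a single `person-1` topic with a "mention" association to each text, which is the effect the unified topic map relies on.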

Each page on the website represents one of these topics, along with any associated topics.

The Topic Map framework for the NZETC website was presented at the launch of the new information architecture on 5 May 2005. PowerPoint slides from the presentation are available.

Papers on the NZETC technical infrastructure are available through the Victoria University ResearchArchive.

We use the open source TM4J Topic Map engine for merging and querying our topic map.

We use an XML publishing framework called Apache Cocoon to publish the NZETC website.

Cocoon is a Java servlet and hence it can be deployed on a wide variety of systems. We run Cocoon inside the Apache Tomcat servlet container (the official reference implementation for the Java Servlet specification), using JVM version 1.4 from Sun Microsystems.

Cocoon offers a flexible environment based on the separation of concerns between content, logic and style.

Cocoon can deliver documents in a variety of formats, including HTML, PDF, RTF, SVG, JPEG, PNG, and any other XML-based format. We have also integrated software to produce Microsoft's eBook Reader format.

We use Cocoon to transform our XML texts into readable documents using XSLT stylesheets.

Cocoon can perform these transformations on demand; i.e. when a request is received from a web browser. Each request is handled by reading the appropriate XML document or documents, and processing the XML data in a succession of stages, first applying logical, then presentational transformations. Each stage is distinct and can be effectively managed by different people. Our web designer can edit the look of the site, the web developer can edit the structure of the site, and the text-editors can edit the content of the site (the e-texts), all independently of each other.

To install a new text, the editors can simply upload the XML document and associated image files into the webserver via FTP. The document will then be automatically converted to HTML and divided into separate pages for each chapter, and scaled-down thumbnail versions of the JPEG graphics will be created using the XML graphics format SVG.

To change the overall look of the site, the web-designer can upload new design elements such as CSS stylesheets, new versions of the logo, navigation menu, etc., in the same way. When a document is displayed to the reader, the content will be automatically inserted into this new design.
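The separation between a logical stage and a presentational stage can be illustrated with a short sketch. It is not Cocoon's API (Cocoon pipelines are configured in XML and driven by XSLT); it is a Python approximation, and the element names (`chapter`, `head`, `p`), the function names and the `site.css` filename are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document with two chapters.
SAMPLE_TEXT = (
    "<text>"
    "<chapter><head>First chapter</head><p>Opening paragraph.</p></chapter>"
    "<chapter><head>Second chapter</head><p>Closing paragraph.</p></chapter>"
    "</text>"
)


def split_chapters(document_xml):
    """Logical stage: divide the document into (title, paragraphs) per chapter."""
    root = ET.fromstring(document_xml)
    return [
        (chapter.findtext("head", default=""),
         [p.text or "" for p in chapter.findall("p")])
        for chapter in root.iter("chapter")
    ]


def render_page(title, paragraphs, stylesheet="site.css"):
    """Presentational stage: insert the content into the current site design."""
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        f"<html><head><title>{escape(title)}</title>"
        f'<link rel="stylesheet" href="{escape(stylesheet)}"></head>'
        f"<body><h1>{escape(title)}</h1>{body}</body></html>"
    )


def handle_request(document_xml, chapter_number, stylesheet="site.css"):
    """Run both stages on demand for one requested chapter (1-based)."""
    title, paragraphs = split_chapters(document_xml)[chapter_number - 1]
    return render_page(title, paragraphs, stylesheet)
```

Because the stages are separate functions, passing a different `stylesheet` to `handle_request` changes the design without touching the text, and uploading a new text changes the content without touching the design.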

We use Lucene for searching. Lucene is a full-text search engine written entirely in Java, published by the Apache Software Foundation.

Services to External Partners

The NZETC provides expertise and services to other institutions, including commercial organisations, in XML, document conversion and repurposing, digitisation, Open Source e-publishing, metadata, digital imaging, and digitisation project development and management.

The income earned from commercial projects is used to support research and digitisation.

Digitisation, project-management and consultancy services

  • Full imaging services
  • Conversion from print to XML or other desired formats
  • Expertise across a range of metadata standards and protocols, including TEI, EAD, Dublin Core, MADS, OAI-PMH, MARC
  • Migration between metadata standards, dynamic creation of multiple metadata records
  • Expertise in XML and XSLT (stylesheets for manipulating and delivering XML)
  • Expertise in configuring proven open source XML-based platforms for managing and delivering digital content online. Features can include:
    • Versioning control repository
    • Multiple format delivery, accessible formats for visually impaired community
    • dynamic thumbnail generation
    • full text searching
    • image galleries and searching
    • topic mapping of resources (semantic web)

XML-based publishing solutions

  • DocBook and TEI-based systems
  • Maintenance of technical manuals and large catalogues
  • Adding value to existing content with structured tagging
  • Schema and taxonomy development
  • XML for content exchange
  • Migration between schemas
  • Content management
  • Stylesheet ( CSS / XSLT / XSL-FO) development for templating and transformation
  • Single-source multi-channel publishing - Automated web publishing; Systems/database integration

People

The people at the NZETC come from a wide range of backgrounds - computing science, publishing, information management, literary scholarship, library science. This mix of skills and interests fosters a dynamic and creative environment.

NZETC staff have experience and internationally recognised skills in online publication of digital heritage materials using XML, semantic web technologies, and open source systems.

Alison Stevenson, Director (currently on maternity leave)

Jason Darwin, Project Manager and acting Director

Jamie Norrish, Analyst Programmer, Jamie.Norrish@vuw.ac.nz

Stuart Yeates, Lead Architect

Samantha Callaghan, Research Assistant

Edmund King, Research Assistant

Max Sullivan, Research Assistant

Jane Hornibrook, Research Assistant

Louise Grenside, Research Assistant

Contact Information

Alison Stevenson, Director NZETC
Email: director@nzetc.org
Phone: +64 4 463 6847
Postal Address: New Zealand Electronic Text Centre, Victoria University of Wellington, P O Box 3438, Wellington, New Zealand

NZETC Digitisation Selection Policy

Introduction

“One of the most important services performed by archives, libraries, and museums is selection, choosing from the many products of the living those few items which will best tell their stories. Digitization means that cultural caretakers will find themselves conducting yet another series of selections among collections that have been winnowed time and again”

North Carolina ECHO, 2005

“Considering the bourgeoning volume and heterogeneity of information on the web, selection and appraisal of resources for digitization is one of the most difficult tasks in the digital resources management life cycle”

Hartman et al., 2005

There are many aspects to the creation of a “content-rich New Zealand”. This paper focuses on the digitisation of heritage material1. New Zealand has significant stores of formal content held in local, regional and national institutions, ranging from manuscripts and printed material to film, video and sound recording. Much of this material is not in digital form. Unlocking this content through digitisation is important because it enables New Zealanders to access information about our histories, cultures, languages and identities – and tells our stories to the world.

New Zealand’s efforts to date in putting such content online have been sporadic and lacking in national oversight or coordination. The New Zealand government has now proposed through the Draft Digital Content Strategy that action be taken to “significantly increase the store of New Zealand digital content on-line through a nationwide digitisation programme of key local, regional and national content”2.

It is expensive to select, create, and maintain digital resources. There are limits to financial resources and to technical capabilities. It is not currently feasible to digitise everything, and intellectual property rights and cultural preferences mean that not everything should be digitised and made available online. A process of selection and prioritisation is required which takes account of these factors along with the value of the materials and the interest in their content. This process takes place to some degree in every institution or community embarking on digitisation work but it should also take place at a national level. A 2004 report on the piecemeal and uncoordinated approach to digitisation in the UK highlighted resulting issues including risk of duplication, use of diverse standards, lost opportunities for collaboration, lack of user awareness of existing resources and poor gap analysis3.

This paper articulates ideas about how the New Zealand Electronic Text Centre can select and prioritise material for digitisation and what criteria should be taken into account when doing so4. It is intended to be a resource both for NZETC staff and the NZETC Text Selection Advisory Group. This paper does not attempt to address other aspects of the work of the NZETC such as criteria for taking on commercial digitisation work or the decision making framework around selection of digital humanities research projects.

Statement of Principles

  • The primary purpose of digitisation is to facilitate access. The aim is to enable people, regardless of location, to directly access content relating to New Zealand’s documentary and cultural heritage. A secondary purpose may be to preserve rare and fragile items, by providing digital surrogates of the items for use.

  • The highest priority for digitisation is material relating to New Zealand and New Zealanders.

  • As part of Victoria University of Wellington, the New Zealand Electronic Text Centre has a responsibility to develop an online collection which supports the University’s strategic objectives around teaching, learning and research. Selection of material for digitisation should therefore fit within the overarching principles of the VUW Library Collection Development and Management Policy.

  • Special consideration needs to be given to the digitisation and online delivery of resources which are considered to be Mātauranga Māori.

    Māori share with other indigenous peoples a legitimate concern and apprehension when uninitiated enter their cultural world. Not only is there a need for respect, but also for caution about the dangers inherent in ‘getting on the bandwagon but starting at the top’ without having first served an appropriate apprenticeship in learning about the culture, its history, cosmogony, customs and language. Too often, the lack of these attributes has led to subsequent misuse and even abuse of superficially acquired knowledge, thus reinforcing the reluctance of many Māori to share their knowledge with the uninitiated.5

    These concerns must be addressed. There is a clear risk that if they are not, and if the majority of resources detailing aspects of Māori history, culture and language are therefore excluded from a nationwide digitisation programme, then part of the essence of New Zealand will be invisible to us and to the world.

    The NZETC has developed a policy to cover the display of images of tupuna especially in relation to mokamokai.

  • Digitisation has to take account of the provisions of the 1994 Copyright Act.

Selection Criteria6

Value

The value of the materials’ content and the benefits derived from access to digital versions justify the expenditure of time and effort of carrying out a digitization project. The content should have sufficient intrinsic value to ensure ongoing use by a defined constituency for a significant period of time.

Many factors contribute, but they include:

  • intellectual content;
  • historical significance7;
  • rarity;
  • importance for the understanding of the relevant subject area;
  • broad or deep coverage of the relevant subject area;
  • useful and accurate content;
  • information on subjects or groups that are otherwise poorly documented;
  • access to the material currently restricted due to its condition, value, vulnerability or location.

Demand

To justify the effort and expense, there should be a reasonable expectation that the product will have immediate utility for the New Zealand community and/or other appropriate audiences. Thus factors to be considered might include:

  • an active, current audience for the materials;
  • advocacy for the project from part of the community;
  • realistic expectation of attracting new users even if current use is low;
  • requests from potential partners in collaborative or consortial efforts.

Note, however, that a 2005 paper looking at 21 digitisation projects for historical photograph collections cautions against using existing demand as the sole justification for digitisation:

“Criteria for selection are often made on the perceived needs of the targeted viewer. Hence there is a danger of producing a ‘turn-of-the-century view’ shaped, as one archivist interviewee put it, by ‘today’s trends for nostalgia’ rather than by online resources that will have sustainability over time. …The question here .. is one of authenticity and representation of historical material being accessed by the public”8

Non-Duplication

There is no identical or similar digital resource that can reasonably meet the expressed needs.

Collaborative Potential

The following factors could be considered:

  • part of a collection split among a number of institutions that could be united online as a virtual collection;
  • contribution to development of a "critical mass" of digital materials in a subject area;
  • flexible integration and synthesis of a variety of formats, or of related materials scattered among many locations.

Enhancement of intellectual access

The following factors could be considered:

  • enhancement of intellectual control through creation of new finding aids, links to bibliographic records, and development of indices and other tools;
  • ability to search widely, manipulate images and text, and study disparate images in new contexts;
  • widespread dissemination of local or unique collections.

Enhancement of resource quality

Improved quality of access to resource content, e.g., through improved legibility of faded or stained documents, enhanced images or restored sound quality through digitisation processes.

Preservation

While digitization does not in itself constitute preservation, there are preservation aspects to be considered, since the creation of digital surrogates will allow:

  • significant reduction in handling of fragile materials;
  • access to materials that cannot otherwise be easily used;
  • protection of materials at high risk of theft or mutilation.

Technical Feasibility

Potential projects should be evaluated as to whether it is technically possible with current equipment and software to capture, present, and store digital resources in ways that meet user needs.

Considerations include:

  • degree to which a digital version can represent the full content of the original;
  • understanding of how people will use the digital versions and the level of quality that that implies;
  • whether the materials will display well digitally;
  • anticipation of future users with better equipment, to avoid a need to rescan in a few years;
  • staff and resources to support programming, user interface design, and search engine development to assure that the project can fulfil the functions for which digitization is planned;
  • long term storage requirements.

Materials that require special consideration include:

  • materials that require unusually high resolution;
  • materials for which fidelity to original colour is essential;
  • oversize items;
  • items with poor legibility;
  • material with a complex graphic layout intertwined with text.

Intellectual Control Criteria

Potential projects should be evaluated as to whether appropriate intellectual control can be provided for the original materials and the digital versions:

  • cataloguing, processing and related organizational work already accomplished or to be accomplished as part of the project;
  • staff and resources to support creation of appropriate metadata relating to document identification, technical capture information, provenance, and easy navigation within the information resource;
  • accordance with the provisions of the 1994 Copyright Act and any amendments to it.

Consideration of special requirements around traditional knowledge

“Although digitization is ideal for sharing, exchanging, educating and preserving indigenous cultures, it also creates ample opportunities for illicit access to and misuse of traditional knowledge. It is essential that traditional owners be able to define and control the rights and access to their resources, in order to uphold traditional laws; prevent the misuse of indigenous heritage in culturally inappropriate or insensitive ways; and receive proper compensation for their cultural and intellectual property. Finally, it is essential that indigenous communities be able to describe and contextualize their culturally and historically significant collections in their own words and from their own perspectives.”

J. Hunter, B. Koopman, J. Sledge, “Software Tools for Indigenous Knowledge Management”, Museums and the Web 2003

“A cornerstone of an Indigenous Digital Library is that the indigenous communities themselves control the rights management of their cultural intellectual property. Local cultural protocols need to be documented and followed prior to the creation of digital content, and communities must be consulted with regard to the digitization of content already gathered by institutions of social memory.”

Robert Sullivan, “Indigenous Cultural and Intellectual Property Rights”, D-Lib May 2002

Selected Bibliography of Digitisation Selection Policies

(Given in chronological order)

Selecting Research Collections for Digitization by Dan Hazen, Jeffrey Horrell, Jan Merrill-Oldham, 1998

University of Oxford Assessment Criteria for Digitisation, 1999

A Handbook for Digital Projects: A Management Tool for Preservation and Access, edited by Maxine K. Sitts, Northeast Document Conservation Center, Andover, Massachusetts, 2000

Columbia University Selection Criteria for Digital Imaging, 2001

DEF (Denmark’s Electronic Research Library) Final Report. National Digitisation Programme and Policy by Brian Robinson and Simon Tanner, 2001

North Carolina Echo (Exploring Cultural Heritage Online), 2005

National Library of Australia Collection Digitisation Programme 2006

Policy Review

Policy Created: August 2007

Policy Due for Review: August 2009

1 The creation and wide availability of accurate catalogues, indexes and finding aids to enable the discovery of content which is not digitised is another important piece of work required to improve access to heritage content. However, this document is focused on digitisation of the content itself.

4 This paper is a slightly revised version of a selection policy document prepared by the NZETC for the National Digital Forum.

5 M. Roberts, W. Norman, N. Minhinnick, D. Wihongi, C. Kirkwood, Kaitiakitanga: Maori Perspectives on Conservation, University of Auckland, 1995, pp 1–2

6 Based largely on the criteria developed and published by Columbia University Libraries.

7 For an expanded discussion on the idea of significance see D. Dorner, S. Young, “A Regional Approach to Identifying Items of National Significance Held by Small Culture Institutions: A Research Report”, 2004.

NZETC Privacy Policy

This website uses Google Analytics, a web analytics service provided by Google, Inc. ("Google"). Google Analytics uses "cookies", which are text files placed on your computer, to help the website analyze how users use the site. The information generated by the cookie about your use of the website (including your IP address) will be transmitted to and stored by Google on servers in the United States. Google will use this information for the purpose of evaluating your use of the website, compiling reports on website activity for website operators and providing other services relating to website activity and internet usage. Google may also transfer this information to third parties where required to do so by law, or where such third parties process the information on Google's behalf. Google will not associate your IP address with any other data held by Google. You may refuse the use of cookies by selecting the appropriate settings on your browser. By using this website, you consent to the processing of data about you by Google in the manner and for the purposes set out above.

The NZETC makes use of Google Analytics in order to evaluate the usage of our site, and this information is useful in allowing us to:

  • determine which resources are heavily used, and so indicate areas that we should consider focusing future digitisation efforts upon;
  • determine which resources are lightly used, and so indicate areas where we should consider improving navigation and promotion of these resources;
  • measure the usage of particular resources so that we can provide feedback to those parties that are assisting us in making these resources available through financial or other support.

If you wish to opt out of cookies from Google, you can do so on the Google site.

Should you like further information about this privacy policy please contact us.
