How did we digitise FZ?
- To create a database from FZ, the text was first obtained in digital form, as an MS Word document supplied on CD by the editorial team. Where no soft copy of a volume was available, the published bound volume was outsourced for digitisation using Optical Character Recognition (OCR).
- Once the Flora was in digital format, the document was 'tagged' or 'marked up': repeating word patterns were identified and formatted with named MS Word paragraph and character styles created in the style gallery. These styles were applied to the various components of the Flora using a combination of macros and manual mark-up with keyboard shortcuts and the mouse.
- Once the text had been fully marked up, it was saved as an XML document. In XML, the marked-up sections of text appear between XML tags whose names match the styles applied in the previous step. The document was then restructured using XSLT stylesheets so that the flat Flora text matched the schema of the final database.
- The information in the XML file was imported into a MySQL database, with each XML tag corresponding to a field in the database schema.
- Once the database was formed, the data was manipulated for the website output and for data analysis.
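
The restructuring step in the list above can be illustrated with a small sketch. It is not the project's XSLT: it uses Python's standard library instead, and the tag names (`SpeciesName`, `Description`, `Distribution`) and sample text are hypothetical placeholders, not taken from FZ. It assumes that each record starts with a known tag and that the elements that follow belong to that record until the next one begins.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat export: one element per styled run of text,
# named after the Word style that was applied to it.
FLAT_XML = """
<flora>
  <SpeciesName>Examplea prima</SpeciesName>
  <Description>Placeholder description one.</Description>
  <Distribution>Region A, Region B</Distribution>
  <SpeciesName>Examplea secunda</SpeciesName>
  <Description>Placeholder description two.</Description>
</flora>
"""


def nest_records(flat_root, record_tag="SpeciesName"):
    """Group flat sibling elements into one <Species> element per record.

    A new record starts every time an element named `record_tag` is seen.
    """
    nested = ET.Element("Flora")
    current = None
    for child in flat_root:
        if child.tag == record_tag:
            current = ET.SubElement(nested, "Species")
        if current is None:
            continue  # ignore anything that precedes the first record
        field = ET.SubElement(current, child.tag)
        field.text = (child.text or "").strip()
    return nested


flat_root = ET.fromstring(FLAT_XML.strip())
nested_root = nest_records(flat_root)
```

After this step every `Species` element carries its own fields as children, which is the shape a database import needs. A record with no `Distribution` element simply lacks that child.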
Microsoft Word, Instant Saxon, Java and MySQL were used to build the database.
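
As a rough illustration of the import step, where each XML tag maps to a database field, the sketch below loads nested records into a table. It makes several assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table name `species`, the field names and the sample records are hypothetical, not the actual FZ schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical restructured XML: one <Species> element per record.
NESTED_XML = """
<Flora>
  <Species>
    <SpeciesName>Examplea prima</SpeciesName>
    <Description>Placeholder description one.</Description>
    <Distribution>Region A, Region B</Distribution>
  </Species>
  <Species>
    <SpeciesName>Examplea secunda</SpeciesName>
    <Description>Placeholder description two.</Description>
  </Species>
</Flora>
"""

# Each tag name doubles as a column name (assumed schema).
FIELDS = ("SpeciesName", "Description", "Distribution")


def import_records(conn, root):
    """Insert one row per <Species> element; missing tags become NULL."""
    columns = ", ".join(f"{name} TEXT" for name in FIELDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS species ({columns})")
    rows = [
        tuple(record.findtext(name) for name in FIELDS)
        for record in root.iter("Species")
    ]
    placeholders = ", ".join("?" for _ in FIELDS)
    conn.executemany(f"INSERT INTO species VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
count = import_records(conn, ET.fromstring(NESTED_XML.strip()))
```

Keeping the tag names and column names identical means the mapping needs no lookup table, at the cost of tying the database schema to the style names chosen during mark-up.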