Ensure all permits, safety plans and approvals (e.g. Animal Ethics) have been obtained. Any research undertaken within AMPs requires a research permit issued by Parks Australia. See Appendix B for a list of potential permits that may be required.

Confirm sampling design meets survey objectives, is achievable with planned equipment and time, and has been communicated to all key scientists and managers. Generally, the sampling design in an ecological study should be statistically sound with adequate spatial coverage and replication, and it should use an explicit randomization procedure to ensure that independent replicates are obtained (Durden et al. 2016a). Increasing sample size where possible will also help to better inform models, and increase the study’s robustness (Mitchell et al. 2017). See Chapter 2 for further details on sampling design.

Define the sampling area to be surveyed in terms of space and time and identify any categorical constraints that may need to be imposed (e.g. acceptance of only those images captured within an altitude range of 2–4 m above the seabed) (Durden et al. 2016a).
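A categorical constraint of this kind can be applied as a simple filter once altitude has been logged for each image. The following is a minimal sketch only: the `ImageRecord` structure, its field names and the example values are hypothetical, and the 2–4 m limits are taken from the example above rather than being a prescribed range.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical per-image record; field names are illustrative only."""
    filename: str
    altitude_m: float  # camera altitude above the seabed, in metres


def within_altitude(records, min_m=2.0, max_m=4.0):
    """Keep only images whose altitude lies within [min_m, max_m] (inclusive)."""
    return [r for r in records if min_m <= r.altitude_m <= max_m]


# Made-up example values.
records = [
    ImageRecord("img_0001.jpg", 1.6),
    ImageRecord("img_0002.jpg", 2.0),
    ImageRecord("img_0003.jpg", 3.2),
    ImageRecord("img_0004.jpg", 4.5),
]
accepted = within_altitude(records)
print([r.filename for r in accepted])
```

Recording the limits used alongside the survey-level metadata keeps the acceptance rule reproducible.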

Determine sampling unit (what to quantify within an image) and sample size (number of images, number of transects) to sample the habitat of interest. A complication in the determination of sample size in image-based studies using towed camera systems is variability in the physical size represented by respective images as the camera-to-subject distance often varies (Durden et al. 2016a).
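To illustrate how strongly the imaged area depends on camera-to-subject distance, the sketch below estimates the seabed footprint of a single image from altitude and field of view. It assumes a flat seabed, a camera pointing straight down, and known in-water field-of-view angles; the function name and the 60° × 45° angles are illustrative assumptions, not properties of any particular camera.

```python
import math


def footprint_m(altitude_m, hfov_deg, vfov_deg):
    """Return (width, height, area) of seabed imaged by a downward-facing camera.

    Simplifying assumptions: flat seabed, camera pointing straight down,
    and field-of-view angles that already account for in-water optics.
    """
    width = 2.0 * altitude_m * math.tan(math.radians(hfov_deg) / 2.0)
    height = 2.0 * altitude_m * math.tan(math.radians(vfov_deg) / 2.0)
    return width, height, width * height


# Illustrative field-of-view angles only.
for altitude in (2.0, 3.0, 4.0):
    w, h, area = footprint_m(altitude, hfov_deg=60.0, vfov_deg=45.0)
    print(f"{altitude:.1f} m altitude: {w:.2f} m x {h:.2f} m = {area:.2f} m^2")
```

Under these assumptions, doubling the altitude quadruples the imaged area, which is why counts per image generally need to be standardised to area before images are compared.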

Determine appropriate imagery system based on metric to be quantified. For seafloor imagery, some of the most important operational factors for the design of a platform and its deployment are depth, bottom topography, duration and spatial extent of survey, current speed, altitude control, turbidity and surface sea conditions (Barker et al. 1999). The specific configuration of equipment will depend on the scientific objectives of the survey and the type of data required. For example, high-definition video is commonly used to assess the spatial distribution, abundance and behaviour of benthic epifauna, and is also well-suited to identifying the spatial extent of substratum types and biological habitats (Bowden and Jones 2016). High-resolution images from stereo-cameras on the other hand are necessary for detailed species identification and precise sizing of individual organisms and quantifying specific seabed features (see Dunlop et al. 2015, Durden et al. 2016a, Sheehan et al. 2016).

Determine appropriate camera orientation. Camera orientation for towed systems is a critical parameter for quantitative interpretation of imagery (Bowden and Jones 2016). Images captured perpendicular to the seabed (i.e. downward-facing) are commonly used for spatial benthic ecological studies of sessile organisms and of substratum or seabed composition (Durden et al. 2016a), whereas images captured at oblique angles tend to be used for studies of motile fauna, such as demersal fish, as the image frame captures a greater area of seabed (or a larger volume of the water column) (see Bowden and Jones 2016, Durden et al. 2016a). Oblique camera orientation typically introduces inherent gradients of both lens-to-subject distance and illumination intensity, while a vertical orientation generally provides more even illumination and uniform subject-to-camera distance (Bowden and Jones 2016). These properties make vertically oriented (i.e. downward-facing) images better suited to quantitative analyses of benthic substrata and sessile or sedentary biota. We recommend combining high-definition oblique video with high-resolution downward-facing camera/s, as this makes full use of both the descriptive potential of oblique-facing video (N.B. stereo-video is required for examining fish metrics) and the potential for accurate quantitative analyses from vertical images, as well as reducing the risk of collision with seabed obstacles (Bowden and Jones 2016). Downward-facing camera/s, coupled with accurate geographic positioning (e.g. USBL, motion sensor), can facilitate mosaicking of images similar to that achievable with AUV platforms.

Particular care should be taken when selecting platform and optics, especially when developing a long-term ecological monitoring program. For example, it is not recommended to change the gear specifications over the monitoring period if the purpose of the study is to detect change over space and time (Sheehan et al. 2016).

Ensure accurate geo-referencing (position, position, position!). The geographic position and orientation of the camera(s) at the time of image capture is critical for ensuring accurate geo-referencing of an image (and the objects within it). This geographic position must be integrated with other sensor data to develop habitat maps or interpolations (see below). It is also critical for relating the sampled area to environmental covariates extracted from hydro-acoustic (Mitchell et al. 2017) and other platform sensors (Shortis et al. 2007).
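As an illustration of how a position might be assigned to an image, the sketch below linearly interpolates logged position fixes (e.g. from a USBL) to the image capture time. It is not a prescribed method: the function name, the tuple layout and the coordinates are hypothetical, and linear interpolation of latitude and longitude is assumed to be adequate only over the short interval between consecutive fixes.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def interpolate_position(fixes, when):
    """Linearly interpolate (lat, lon) at time `when`.

    `fixes` is a time-sorted list of (datetime, lat, lon) tuples, for example
    from a USBL log. Returns None if `when` lies outside the logged period.
    """
    times = [t for t, _, _ in fixes]
    if not fixes or when < times[0] or when > times[-1]:
        return None
    i = bisect_right(times, when)
    if i == len(fixes):  # `when` equals the time of the last fix
        return fixes[-1][1], fixes[-1][2]
    t0, lat0, lon0 = fixes[i - 1]
    t1, lat1, lon1 = fixes[i]
    frac = (when - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)


# Made-up fixes two seconds apart, and an image captured 0.5 s after the first.
utc = timezone.utc
fixes = [
    (datetime(2020, 1, 1, 0, 0, 0, tzinfo=utc), -43.000, 147.000),
    (datetime(2020, 1, 1, 0, 0, 2, tzinfo=utc), -43.002, 147.004),
]
image_time = datetime(2020, 1, 1, 0, 0, 0, 500000, tzinfo=utc)
print(interpolate_position(fixes, image_time))
```

Returning `None` outside the logged period, rather than extrapolating, makes images without a reliable position easy to flag.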

Ensure synchronisation of time stamps. The time standard (typically UTC) for a given survey needs to be pre-determined and strictly adhered to. Synchronisation of timestamps across all systems (e.g. USBL and other platform sensors, PC time(s), ship navigation, video and still camera systems) is critical for ensuring accurate geo-referencing of images. Time accuracy to three decimal places of a second (i.e. milliseconds) is optimal.
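Where a device has logged local time, or its clock is known to have drifted, timestamps can be normalised to UTC after the survey. The sketch below shows one possible approach using only the Python standard library; the function name, the offset and the clock-error values are hypothetical and would need to come from the survey's own records.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(local_time, utc_offset_hours, clock_error_s=0.0):
    """Convert a naive device timestamp to an ISO 8601 UTC string (ms precision).

    `utc_offset_hours` is the device's offset from UTC; `clock_error_s` is how
    far the device clock ran ahead of the reference clock (positive = fast).
    """
    aware = local_time.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    corrected = aware - timedelta(seconds=clock_error_s)
    return corrected.astimezone(timezone.utc).isoformat(timespec="milliseconds")


# Made-up example: a camera set to UTC+10 whose clock ran 1.5 s fast.
stamp = to_utc_iso(
    datetime(2020, 3, 14, 10, 30, 15, 250000),
    utc_offset_hours=10,
    clock_error_s=1.5,
)
print(stamp)  # 2020-03-14T00:30:13.750+00:00
```

Retaining the original device timestamps alongside the corrected values allows the correction to be audited later.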

Determine real-time annotation protocols, if desired. Although real-time annotation is not required for this field manual, it is recognised that this is an established practice for many individuals and agencies. If a real-time imagery feed is available, follow agency-specific protocols for onboard annotation. At the least, a qualitative description can be written for each station, thus ensuring some information is immediately available for post-survey reporting and to guide subsequent analysis (see Appendix C) [Recommended].

Stereo-cameras should be pre- or post-calibrated in shallow water using the techniques outlined in Shortis and Harvey (2009). Typical requirements of a multi-station, self-calibration network include multiple convergent photographs, camera roll at each location and a 3D target array (see Shortis et al. 2009). If housings or mounts are changed or damaged during deployment, re-calibration is required.

Paired calibrated lasers should be used if not using stereo-cameras, with a known separation distance used as a reference for scaling objects. This can enhance the performance of 2-D and 3-D imaging systems/reconstructions (Caimi et al. 2008) and align video and stills by time.
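The scaling step can be sketched as follows. This assumes the two laser beams are parallel, the object lies in approximately the same plane as the laser dots, and that plane is roughly perpendicular to the camera axis; the function names, pixel coordinates and 100 mm separation are illustrative only. `math.dist` requires Python 3.8 or later.

```python
import math


def mm_per_pixel(laser_a_px, laser_b_px, laser_separation_mm):
    """Image scale from the pixel positions of two parallel laser dots."""
    pixel_distance = math.dist(laser_a_px, laser_b_px)
    if pixel_distance == 0:
        raise ValueError("laser dots coincide; cannot derive a scale")
    return laser_separation_mm / pixel_distance


def object_length_mm(end_a_px, end_b_px, scale_mm_per_px):
    """Length of an object lying in the same plane as the laser dots."""
    return math.dist(end_a_px, end_b_px) * scale_mm_per_px


# Made-up pixel coordinates (x, y) digitised from a single image.
scale = mm_per_pixel((900, 540), (1100, 540), laser_separation_mm=100.0)
length = object_length_mm((300, 200), (600, 600), scale)
print(f"{scale:.2f} mm/px, object ~ {length:.0f} mm")
```

Because the scale is only valid in the plane of the laser dots, measurements of objects well above or below that plane, or in oblique imagery, will be biased.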

Consider potential spatial and temporal errors that may result from the choice of towed camera system and how these errors may potentially affect habitat mapping and modelling of data (e.g. Monk et al. 2012, Rattray et al. 2014). It is important to take into account errors from vessel motion (i.e. heave, pitch, roll and yaw), USBL beacon positioning, GPS, and measurement inaccuracies resulting from the application of stereo-camera calibrations carried out in shallow water to imagery gathered at greater depths (see Shortis et al. 2009). It is also important to ensure that the recording frequency of sensor data is matched to the intended use of the sensor data – e.g. pitch recorded at 1 s intervals may not be sufficient to correct for changes in the field of view in a video as the camera is towed.

Consider locational uncertainty in occurrence data. To generate realistic predictions, species distribution models require accurate geo-referencing of occurrence data with environmental variables (Mitchell et al. 2017). Although some high-performing, fine-scale models can be generated from data containing locational uncertainty, their predictions can be misleading if interpreted at scales similar to the spatial errors (Mitchell et al. 2017). See Foster et al. (2012) and Stoklasa et al. (2015) for a more statistical view of this issue in an ecological context.

Consider onboard data formats and establish workflow for data transfer and battery charging prior to survey commencement. This field manual does not mandate particular data formats as these may differ depending on the choice of annotation software and process for specific extensions. For example, video data may require transcoding into web-viewable format (e.g. H.264). Common formats include .mp4 and .avi for video data and .jpeg and .tiff for still imagery. Several video containers (e.g. Quicktime) allow embedding of timecode and/or closed caption tracks into the video file and are frame-accurate during playback. Where possible, such formats are preferable. The H.264 codec is suboptimal for high-speed transects, so original video file copies should be kept for reference during analysis. In some instances, saving information in raw format may be necessary for the purpose of post-processing. Files may also need to be compressed for public accessibility. Regardless of data formats, it is essential to establish a documented workflow for data transfer and battery charging prior to survey commencement.

Consider the metadata required for subsequent data post-processing, storage and release, such as the video or image location, camera attributes, date, time (in UTC), altitude (in m), angle of acceptance, motion of towed platform (i.e. heave, pitch, roll and yaw in degrees) and the precision required of each (Durden et al. 2016a). Consider size, location and access of final imagery and video datasets and where these will be archived. Metadata must be adequate to satisfy conformance checks for data release via open access data portals such as the Australian Ocean Data Network (AODN; http://imos.org.au/facilities/aodn/aodn-submit-data/).

Consider metadata at various levels:

  • Archived survey (project) level: to specify the decisions regarding sampling design, image selection, platform used etc.
  • Imagery platform level: camera types, camera orientation, sensors, instrumentation settings (these should be kept stable throughout a survey, but metadata needs to reflect any adjustments/changes, with a timestamp of when they were made in the survey).
  • Image/video level (as per below).

Consider how metadata will link to media type. The most effective way to link visual imagery with metadata is by incorporation into a spatially enabled relational database (Bowden and Jones 2016), using the synchronised time stamps and GIS position for linking imagery and sensor data. Important considerations include:

  • Archived file names should include Platform, Survey, Deployment, Date and Start-Time (e.g. Platform name_survey name_deployment or site number_YYYY-MM-DDTHH:MM:SSZ_descriptor.json).
  • If possible, we recommend writing image metadata into EXIF fields embedded in the digital image file to ensure metadata are not separated from images.
  • Geotagging video imagery is less established, but various options exist, including: i) embedding position, date and time on the imagery itself (we suggest using an inconspicuous location within the field of view); ii) utilizing the video audio track or closed-caption track to record position, date and time using a geostamping device; iii) proprietary video recording and playback equipment and/or software that associates position metadata with recorded video files (e.g. Streampix https://www.norpix.com/products/streampix/modules/gps.php; GeoDVR https://www.remotegeo.com/geospatial-video-recorders/geodvr-gen3/); and iv) embedding UTC timecode into the video media file (e.g. Quicktime .mov files recorded by AJA KiPro devices can have timecode generated and embedded by a GPS-timecode generator).
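
The file-naming convention in the first point above could be generated programmatically to avoid manual inconsistencies. The following is a minimal sketch: the function name, its parameters and the example values are hypothetical. The `colon_free` option is an optional variation that is not part of the convention above; it is included only because some file systems (e.g. Windows) do not accept ":" in file names.

```python
from datetime import datetime, timezone


def archive_name(platform, survey, deployment, start, descriptor, ext,
                 colon_free=False):
    """Build a name of the form Platform_Survey_Deployment_Start_descriptor.ext.

    `start` must be timezone-aware; it is converted to UTC before formatting.
    `colon_free=True` switches to the ISO 8601 basic time format, an optional
    variation for file systems that reject ":" in names.
    """
    if start.tzinfo is None:
        raise ValueError("start time must be timezone-aware")
    fmt = "%Y%m%dT%H%M%SZ" if colon_free else "%Y-%m-%dT%H:%M:%SZ"
    stamp = start.astimezone(timezone.utc).strftime(fmt)
    parts = [platform, survey, deployment, stamp, descriptor]
    # Underscore is the field delimiter, so replace spaces and underscores
    # inside each field with hyphens to keep the name parseable.
    cleaned = [p.replace(" ", "-").replace("_", "-") for p in parts]
    return "_".join(cleaned) + "." + ext


# Made-up platform, survey and deployment identifiers.
start = datetime(2020, 3, 14, 0, 30, 13, tzinfo=timezone.utc)
print(archive_name("TowCam", "SurveyA", "D03", start, "metadata", "json"))
print(archive_name("TowCam", "SurveyA", "D03", start, "metadata", "json",
                   colon_free=True))
```

Keeping the underscore as the only field delimiter means the components can later be recovered by splitting the name on "_".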