Big data: an opportunity to do things differently


Recent years have seen huge expansion in offshore activities, with a variety of sectors now making use of the seabed (e.g. fishing, aggregate dredging, renewable energy generation, oil and gas, cables and pipelines). For many of these activities there is a requirement to collect benthic environmental data for characterisation and licence compliance monitoring. In addition, government generates significant quantities of such data through its own activities (research and monitoring).

Whilst collected data are used for their original purpose, they also have wider value and utility. In recognition, data providers have become increasingly willing, and in some cases compelled, to share information. There are now a host of repositories, including the Crown Estate’s Marine Data Exchange (MDE), where data can be freely obtained. The challenge now is to turn these disparate datasets into new information, providing benefits for industry and government and leading to greater efficiency and improved sustainability. In addition, bringing data together can facilitate harmonisation and coordination of monitoring activities across the different sectors (Barrio Froján, Cooper & Bolam, 2016).

A framework for achieving the above aspirations has been developed by Cefas scientists. Individual datasets are obtained, on an ongoing basis, from providers and online repositories. Data are checked and standardized before being entered into OneBenthic, a dedicated PostgreSQL database. The data held in OneBenthic (currently 35k samples from 780 surveys) are used in research that directly benefits data providers, both in industry and government.
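The checking and standardizing step can be pictured with a minimal Python sketch. It is illustrative only and does not describe the actual OneBenthic pipeline: the field names, the alias table and the `standardise` helper are all assumptions made for the example, and no database connection is shown.

```python
from dataclasses import dataclass

# Hypothetical aliases: different providers may label the same field differently.
COLUMN_ALIASES = {
    "lat": "latitude", "latitude": "latitude",
    "lon": "longitude", "long": "longitude", "longitude": "longitude",
    "taxon": "taxon", "species": "taxon",
    "count": "abundance", "abundance": "abundance",
}
REQUIRED = {"latitude", "longitude", "taxon", "abundance"}


@dataclass(frozen=True)
class BenthicRecord:
    """One standardized faunal record (hypothetical schema)."""
    latitude: float
    longitude: float
    taxon: str
    abundance: int


def standardise(raw: dict) -> BenthicRecord:
    """Map provider-specific keys to common names, then validate the values."""
    row = {}
    for key, value in raw.items():
        name = COLUMN_ALIASES.get(key.strip().lower())
        if name is not None:
            row[name] = value

    missing = REQUIRED - row.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    lat, lon = float(row["latitude"]), float(row["longitude"])
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")

    abundance = int(row["abundance"])
    if abundance < 0:
        raise ValueError("abundance cannot be negative")

    # Collapse stray whitespace and use a consistent capitalisation for names.
    taxon = " ".join(str(row["taxon"]).split()).capitalize()
    return BenthicRecord(lat, lon, taxon, abundance)


# Two providers, two layouts, one common record format.
provider_a = {"Lat": "52.1", "Long": "1.9", "Species": " nephtys  cirrosa", "Count": "3"}
provider_b = {"latitude": 50.7, "longitude": -1.2, "taxon": "Abra alba", "abundance": 12}
records = [standardise(r) for r in (provider_a, provider_b)]
```

Rejecting a record outright when a check fails, rather than silently correcting it, is one possible design choice; a real pipeline might instead flag such records for manual review.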

[Figure: framework model for data integration, from individual datasets to a single database to research and app outputs]

This approach can be used for a variety of assessment needs.

For example, in Cooper and Barry (2017), data were used to produce a baseline assessment of the UK seabed macrofauna, and to develop an entirely new approach to monitoring the impacts of activities affecting seabed sediments. This new monitoring approach, which forms the basis of the marine aggregate industry’s Regional Seabed Monitoring Programme (RSMP), has both improved the sustainability of dredging, by ensuring seabed conditions remain suitable for recolonization, and reduced compliance monitoring costs by 50% (BMAPA, 2015). This one example illustrates the benefits of closer working between industry and government, and the power of big data for addressing environmental issues.

In another study (Cooper et al., 2019), the combined dataset was used to develop a new habitat classification in which seabed biota are used to identify meaningful habitats. The habitat strata identified are less biologically variable than those of the widely used, physically based EUNIS classification. As a result, far fewer samples are required to detect change, thereby reducing the costs of monitoring.
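The link between lower variability and smaller sample sizes can be illustrated with a standard power calculation. The Python sketch below uses the textbook normal approximation for comparing two group means; it is not the method used in Cooper et al. (2019), and the standard deviations and effect size are invented purely for illustration.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(sd: float, difference: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group needed to detect `difference` in means.

    Uses the two-sided, two-sample normal approximation:
        n = 2 * ((z_(1 - alpha/2) + z_power) * sd / difference) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


# Illustrative values only: the same change to detect, in a more variable
# stratum and in a less variable one.
n_variable = samples_per_group(sd=10.0, difference=5.0)
n_homogeneous = samples_per_group(sd=5.0, difference=5.0)
print(n_variable, n_homogeneous)
```

Because the required sample size scales with the square of the standard deviation, halving the variability within a stratum cuts the number of samples needed to roughly a quarter.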


To allow data providers (and other interested parties) to interact with the combined dataset and research outputs, a suite of simple web applications is being developed using the ‘R Shiny’ package. R Shiny allows complex data analysis undertaken in the statistical programming language R to be turned into simple and intuitive web apps, allowing users to view outputs in the context of their own areas of interest. All apps are linked to the OneBenthic database, ensuring outputs are based on the very latest available data. Apps are made available through the Cefas Open Science portal and currently include:

  1. Baseline Tool
  2. M-test tool
  3. Faunal cluster ID tool 
  4. Benthic non-native species tool