Dragonfly Data Science: Report on WCPFC project 107b: Improved stock assessment and structural uncertainty grid for Southwest Pacific blue shark

Citation

Neubauer, P., Carvalho, F., Ducharme-Barth, N., Large, K., Brouwer, S., Day, J., & Hamer, P. (2022). Report on WCPFC project 107b: Improved stock assessment and structural uncertainty grid for Southwest Pacific blue shark. WCPFC-SC18-2022/SA-WP-03 (Rev.01). Report to the Western and Central Pacific Fisheries Commission Scientific Committee. Eighteenth Regular Session, 10–18 August 2022. Electronic meeting.

Summary

This analysis presents additional work to constrain the model grid used for the 2021 south Pacific blue shark (BSH) stock assessment in the Western and Central Pacific Ocean (WCPO). The 2021 stock assessment for BSH was accepted by the Scientific Committee at SC17. However, owing to several uncertainties about the relative merit (model fit, plausibility) of individual models within the large (3888-model) grid, SC17 was hesitant to use such a large grid to provide management advice. SC17 recommended improving the way the grid was selected before approving the results for providing management advice.

The present analysis attempted to address these concerns by running a number of standard diagnostics across all grid model runs to ensure that:

models had sufficiently converged and results were robust to jittering of starting values;
models were consistent and did not show large retrospective patterns (as evidenced by Mohn’s ρ); and
models had reasonable predictive skill.
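Mohn's ρ, the retrospective diagnostic listed above, averages the relative difference between each "peel" (a refit with the most recent years of data removed) and the full-data model at the peel's terminal year. The sketch below is a minimal Python illustration. It assumes each run's annual estimates (e.g., spawning biomass) are available as plain lists. The function name and data layout are hypothetical and are not taken from the assessment code.

```python
def mohns_rho(full, peels):
    """Mohn's rho for a retrospective analysis (hypothetical helper).

    full  : annual estimates (e.g. spawning biomass) from the model fitted
            to all data, oldest year first.
    peels : peels[i] is the series from the refit with the last i + 1
            years of data removed, so it is i + 1 years shorter than `full`.
    """
    rel_diffs = []
    for i, peel in enumerate(peels):
        terminal = len(full) - (i + 1) - 1  # index of the peel's last year
        if len(peel) != terminal + 1:
            raise ValueError(f"peel {i + 1} has unexpected length {len(peel)}")
        reference = full[terminal]  # full-data estimate for the same year
        rel_diffs.append((peel[terminal] - reference) / reference)
    return sum(rel_diffs) / len(rel_diffs)


# Illustrative (made-up) numbers, not assessment output.
full_run = [100.0, 90.0, 80.0, 85.0, 95.0]
peel_runs = [
    [100.0, 90.0, 80.0, 90.0],  # last year removed
    [100.0, 90.0, 88.0],        # last two years removed
]
rho = mohns_rho(full_run, peel_runs)
```

A positive ρ means the peels systematically estimate higher biomass than the full run for the same years. A value near zero, as found for most runs in the 2021 grid, indicates no consistent retrospective pattern.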

Acknowledging that the stock was not unfished at the start of the assessment period, and that references to unfished biomass may be misleading because B₀ is likely poorly estimated, we also explored the results using an alternative reference point, SB/SB_F=0 (spawning biomass relative to that predicted in the absence of fishing).
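The sketch below shows why SB/SB_F=0 can differ from SB/SB₀ when the stock was already fished at the start of the assessment period. It uses a toy Schaefer surplus production model, not the age-structured model used in the assessment. It assumes the no-fishing trajectory starts from the same already-depleted biomass as the fished trajectory. Unlike an assessment's SB_F=0, it also carries no recruitment variability. All parameter values are made up.

```python
def schaefer(b_init, r, k, catches):
    """Biomass trajectory of a toy Schaefer model (hypothetical helper)."""
    biomass = [b_init]
    for catch in catches:
        b = biomass[-1]
        nxt = b + r * b * (1.0 - b / k) - catch
        biomass.append(max(nxt, 1e-6 * k))  # keep biomass positive
    return biomass


K = 1000.0      # carrying capacity; plays the role of B0 in this toy
R = 0.3         # intrinsic growth rate
B_INIT = 600.0  # stock already below K at the start (initial F > 0)
catches = [80.0] * 10

fished = schaefer(B_INIT, R, K, catches)
no_fishing = schaefer(B_INIT, R, K, [0.0] * len(catches))

# Analogue of SB/SB0: status relative to the theoretical unfished level.
depletion_b0 = [b / K for b in fished]
# Analogue of SB/SB_F=0: status relative to the same stock had catches been zero.
depletion_f0 = [b / b_unf for b, b_unf in zip(fished, no_fishing)]
```

The no-fishing trajectory starts below K and never reaches it within the period, so the final SB/SB_F=0 analogue exceeds the SB/SB₀ analogue. The first measure isolates the impact of fishing. The second also reflects the depleted starting state, which is itself uncertain when B₀ is poorly estimated.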

Our initial investigation of the 2021 assessment grid found that all models appeared to have converged to global solutions, with small gradients for all estimated parameters. All models also had positive definite Hessians, and jittering did not lead any model to an alternative optimum, likely because of the low number of estimated parameters. Retrospective analyses of the 2021 grid showed that only a small number of models had large retrospective patterns, and these models were not consistently associated with a particular uncertainty axis. The majority of models had Mohn’s ρ values near zero. Filtering by these diagnostics therefore did not substantially reduce the spread of outcomes from the initial 2021 model grid.
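The diagnostic screening described above amounts to a per-run filter. The following sketch assumes that each run's gradients, Hessian, Mohn's ρ, and best jittered objective value have already been extracted. The field names and thresholds, including the |ρ| cut-off, are hypothetical placeholders, not the criteria used in the report. Positive definiteness is checked with a Cholesky factorisation, which succeeds only for positive definite matrices.

```python
import math


def is_positive_definite(h):
    """True if the symmetric matrix h (list of lists) is positive definite.

    Attempts a Cholesky factorisation; a non-positive pivot means failure.
    """
    n = len(h)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = h[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True


def passes_diagnostics(run, max_gradient=1e-4, max_abs_rho=0.2, nll_tol=1e-3):
    """Hypothetical screening rule combining the diagnostics in the text."""
    converged = max(abs(g) for g in run["gradients"]) < max_gradient
    hessian_ok = is_positive_definite(run["hessian"])
    # Jittering found no better optimum than the reported fit.
    jitter_ok = run["best_jitter_nll"] >= run["nll"] - nll_tol
    retro_ok = abs(run["mohns_rho"]) < max_abs_rho
    return converged and hessian_ok and jitter_ok and retro_ok


# Made-up example runs; run_b has an indefinite Hessian.
runs = {
    "run_a": {"gradients": [2e-6, -5e-7], "hessian": [[2.0, 1.0], [1.0, 2.0]],
              "nll": 512.3, "best_jitter_nll": 512.3, "mohns_rho": 0.05},
    "run_b": {"gradients": [1e-6, 3e-6], "hessian": [[1.0, 2.0], [2.0, 1.0]],
              "nll": 498.7, "best_jitter_nll": 498.7, "mohns_rho": -0.02},
}
kept = [name for name, run in runs.items() if passes_diagnostics(run)]
```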

Because these diagnostics did not reduce the overall uncertainty in the model grid, we further examined the model assumptions and inputs that drove the spread in outcomes, namely CPUE and natural mortality (M). Assuming low M, for example, accounted for most high estimates of SB/SB_F=0. Alternative CPUE assumptions had a large impact, driven largely by inconsistent trends in early CPUE and by differences in recent recovery rates among alternative indices. The 2021 stock assessment grid ignored process error, thereby placing high weight on the CPUE indices (i.e., assuming high signal and low uncertainty). As a result, differences among indices were accentuated in grid runs that re-weighted or used alternative CPUE indices.

Two important decisions led to a strong reduction in both the number of assessment models in the grid and the spread of outcomes. First, estimating M with an informative prior meant that one structural uncertainty axis could be dropped from the analysis. Two additional axes that contributed little to overall outcome uncertainty were also dropped, substantially reducing the size of the initial (i.e., pre-diagnostic) grid. Second, we allowed for process error in CPUE, which may be large given unknown trends in the reporting of sharks. Acknowledging this process error in the models led to less extreme trends, both in the diagnostic assessment scenario and in the new model grid.
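The effect of allowing for CPUE process error can be seen in the likelihood of the index. The minimal sketch below assumes a lognormal index likelihood and assumes that the observation and process variances simply add. Other formulations exist, and the variance structure used in the assessment is not restated here. Under these assumptions, a larger process-error SD shrinks the penalty for departing from the index, so the index pulls less strongly on the estimated trend. All numbers are illustrative.

```python
import math


def index_nll(observed, predicted, obs_sd, process_sd=0.0):
    """Lognormal negative log-likelihood of a CPUE index (constants dropped).

    obs_sd     : observation-error SD of log(CPUE) for each year
    process_sd : additional SD allowing for process error (assumed here to
                 add in quadrature to the observation SD)
    """
    nll = 0.0
    for obs, pred, sd in zip(observed, predicted, obs_sd):
        sigma = math.sqrt(sd ** 2 + process_sd ** 2)
        resid = math.log(obs) - math.log(pred)
        nll += math.log(sigma) + 0.5 * (resid / sigma) ** 2
    return nll


cpue = [1.0, 1.4, 0.8, 1.2]
sds = [0.1] * 4
tracks_index = cpue       # trajectory that follows every fluctuation
smooth_trend = [1.1] * 4  # trajectory that ignores the fluctuations


def penalty(process_sd):
    """Extra NLL incurred by the smooth trend relative to tracking the index."""
    return (index_nll(cpue, smooth_trend, sds, process_sd)
            - index_nll(cpue, tracks_index, sds, process_sd))


no_process = penalty(0.0)
with_process = penalty(0.3)
```

Here the total log-scale variance rises from 0.01 to 0.10 when a process SD of 0.3 is added, so the penalty for the smooth trend falls by a factor of ten. With no process error, by contrast, every fluctuation in the index is treated as signal.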

Estimating M also allowed a closer inspection of the relationship between growth and M. In the previous formulation of the grid, fixing both growth and M allowed biologically inappropriate combinations of fast growth and low M. Estimating M alleviated this to some extent. However, it also revealed that M had to be implausibly large for the fast growth hypothesis to fit the existing data. The grid was therefore further reduced by excluding the fast growth scenario.

Lastly, following recent analyses that have used various metrics to weight models in an uncertainty grid, we propose an iterative procedure. First, models that fail the diagnostic criteria are excluded. The input axes of the remaining models are then weighted according to prior probabilities derived from either input analyses or analysts' assessments of the relative utility of different inputs (e.g., CPUE time series). This a priori weighting can then be supplemented with a posteriori weighting for model fit or predictive skill.
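The first two steps of this procedure (exclusion, then prior weighting of input axes) can be sketched as follows. The sketch assumes that a model's prior weight is the product of the prior weights of its levels on each axis, renormalised over the models that pass the diagnostics. The axis names, levels, weights and the excluded combination are all hypothetical.

```python
from itertools import product


def prior_model_weights(axes, passed):
    """Normalised prior weights for grid models (hypothetical helper).

    axes   : dict axis name -> dict level -> prior weight of that level
    passed : function(model) -> bool, the diagnostic filter
    A model is a tuple of (axis, level) pairs, one per axis.
    """
    names = sorted(axes)
    weights = {}
    for levels in product(*(axes[name] for name in names)):
        model = tuple(zip(names, levels))
        if not passed(model):
            continue  # step 1: drop models failing the diagnostics
        w = 1.0
        for name, level in model:
            w *= axes[name][level]  # step 2: product of axis-level priors
        weights[model] = w
    total = sum(weights.values())
    return {model: w / total for model, w in weights.items()}


# Hypothetical two-axis grid; one combination fails the diagnostics.
axes = {
    "cpue": {"A": 0.7, "B": 0.3},
    "init_f": {"low": 0.5, "high": 0.5},
}
failed = (("cpue", "B"), ("init_f", "high"))
weights = prior_model_weights(axes, lambda model: model != failed)
```

An a posteriori weight (e.g., for model fit) could in principle be multiplied into each model's weight before renormalising. As the text notes, how best to do that remains an open question.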

We investigated a range of possible a posteriori weighting measures for the model grid, namely inverse variance weights, MASE (mean absolute scaled error) weights and stacking weights. Using predictive skill in the form of the MASE criterion did not substantially reduce the outcome space. We suggest that MASE largely measures the degree to which a stock is production-driven rather than recruitment-driven (“regime”-driven). The MASE criterion will therefore likely favour production-driven over recruitment-driven models, which may or may not be desirable. We show that stacking weights, which weight the model ensemble directly to maximise predictive skill, do not appear to share this property. Overall, none of these model-weighting approaches led to substantial changes in the range of outcomes from the reduced uncertainty grid. We suggest that more research is required on model ensemble weighting, and we therefore base our recommendations on prior (input axis) weighting only.
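Two of these a posteriori measures can be sketched briefly. MASE compares the absolute error of a model's hindcast predictions with that of a naive persistence forecast (the last observed value), so values below 1 indicate predictive skill. Stacking chooses ensemble weights on the simplex that maximise the summed log predictive density of held-out observations. Here that maximisation uses a simple EM-style fixed-point update, which is one optimiser choice among several. Both functions are hypothetical helpers. They assume that hindcast predictions and held-out predictive densities have already been computed for each model. How the report converted MASE into model weights is not restated here, so no such conversion is shown.

```python
def mase(observed, predicted, naive):
    """Mean absolute scaled error of hindcast predictions.

    observed  : index values for the hindcast years
    predicted : model predictions for those years, each made by a fit
                that excluded that year and all later years
    naive     : persistence forecasts (last observation available to each fit)
    Both means have the same number of terms, so the ratio of sums suffices.
    """
    model_err = sum(abs(p - o) for p, o in zip(predicted, observed))
    naive_err = sum(abs(n - o) for n, o in zip(naive, observed))
    return model_err / naive_err


def stacking_weights(densities, n_iter=500):
    """Stacking weights via EM-style updates (hypothetical helper).

    densities[i][k] : positive predictive density of held-out observation i
                      under model k
    Returns simplex weights that approximately maximise
    sum_i log(sum_k w_k * densities[i][k]).
    """
    n_obs, n_models = len(densities), len(densities[0])
    w = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        totals = [0.0] * n_models
        for row in densities:
            mix = sum(wk * pk for wk, pk in zip(w, row))
            for k in range(n_models):
                totals[k] += w[k] * row[k] / mix  # share of obs i credited to model k
        w = [t / n_obs for t in totals]
    return w


# Made-up inputs for illustration only.
skill = mase([1.0, 1.2, 0.9], [1.1, 1.1, 1.0], [0.8, 1.0, 1.2])
dominant = stacking_weights([[1.0, 0.1], [0.8, 0.2], [1.2, 0.1]])
balanced = stacking_weights([[1.0, 0.1], [0.1, 1.0]])
```

Because stacking optimises the weights of the ensemble jointly, a model that predicts well only where others fail can keep weight. Per-model scores such as MASE, by contrast, rank each model in isolation.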

Taken together, these analyses reduce the number of candidate models from 3888 in the 2021 uncertainty grid to 228 in the revised uncertainty grid, and lead to lower uncertainty than the 2021 model grid. Nevertheless, the overall conclusions and recommendations from the 2021 blue shark assessment remain valid. Substantial uncertainties about inputs and biological parameters remain. Our analyses underscore that for low- to medium-information stocks, such as most sharks, uncertainties in model outcomes are not necessarily reducible in the short term. Only improved biological data collection and better recording of interactions with bycatch species will improve the precision of stock assessments. Nevertheless, we suggest that the consistency of the estimated recent recovery trends, and the robustness of these trends to alternative model assumptions, provide evidence for the effectiveness of recent non-retention measures for sharks, and for BSH in particular.

Although the sensitivity analysis highlighted several uncertainties, we found consistent patterns in the outcomes. Based on these consistent trends, and using a restricted, weighted set of 228 uncertainty grid runs, we conclude that:

The most influential axis within the reduced uncertainty grid was the initial F assumption.
The stock biomass was low throughout the region through the early 2000s, following the expansion of longline fishing effort in the region, but the estimates across the uncertainty grid of 228 models largely indicate that the stock has been recovering since then.
All 228 model runs indicate that fishing mortality at the end of the assessment period was below F_MSY, and 87% of (weighted) model runs show that the biomass is above SB_MSY (median SB_recent/SB_MSY = 1.64, 90th percentiles 0.88 and 1.87; Table 6). The median estimated depletion SB_recent/SB_F=0 is 0.71 (90th percentiles 0.37 and 0.82), and the median SB_recent/SB₀ is 0.80 (90th percentiles 0.43 and 0.90).
Fishing mortality has declined over the last decade and is currently relatively low, with median F_recent/F_MSY = 0.65 (90th percentiles 0.43 and 0.86; Table 6). This may be a result of most sharks being released upon capture by most longline fleets.
Finally, considered against all conventional reference points the stock on average does not appear to be overfished and overfishing is not occurring.
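Statements such as "87% of (weighted) model runs" and the weighted medians and percentiles above are summaries of a discrete, weighted distribution over grid runs. The minimal sketch below assumes one value per run and one weight per run, which need not sum to 1. The quantile convention used here (the first value whose cumulative weight reaches q) is one of several. The exact percentiles behind the reported ranges are not restated here, so the 5th and 95th percentiles below are an illustrative choice. The values and weights are made up.

```python
def weighted_quantile(values, weights, q):
    """Quantile q of a discrete weighted distribution (hypothetical helper).

    Returns the smallest value whose cumulative weight reaches q * total.
    """
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]  # guard against rounding when q is 1


def weighted_share_above(values, weights, threshold):
    """Weighted proportion of runs whose value exceeds the threshold."""
    total = sum(weights)
    return sum(w for v, w in zip(values, weights) if v > threshold) / total


# Made-up SB_recent/SB_MSY values and run weights (not assessment output).
ratios = [0.8, 1.2, 1.6, 1.9]
run_weights = [0.1, 0.2, 0.4, 0.3]

median = weighted_quantile(ratios, run_weights, 0.5)
lower = weighted_quantile(ratios, run_weights, 0.05)
upper = weighted_quantile(ratios, run_weights, 0.95)
share_above_msy = weighted_share_above(ratios, run_weights, 1.0)
```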

Given some of the uncertainties highlighted above, we recommend that SC18 consider:

Providing more time, either through inter-sessional projects or by extending time-frames for shark analyses. This would allow a more thorough investigation of input data quality and trends, which shape assessment choices, and would allow input analyses to be completed in time for presentation at the pre-assessment workshop. Allowing more time for the assessments themselves would also permit a more thorough investigation of alternative model structures, which may include comparisons with low-information methods such as spatial risk assessments.
Increased effort to reconstruct catch histories for sharks (and other bycatch species) from a range of sources. Our catch reconstruction models showed that model assumptions and formulation can have important implications for reconstructed catches. Additional data sources, such as logsheet-reported captures from reliably reporting vessels, could be incorporated into integrated catch-reconstruction models to fill gaps in observer coverage.
Additional satellite tagging in a range of locations, especially known nursery grounds in South-East Australia and New Zealand, as well as high seas areas to the north and east of New Zealand, where catch rates are high. Such tagging may help to resolve questions about the degree of natal homing and mixing of the stock.
Tagging may also help to obtain better estimates of natural mortality, if carried out in sufficient numbers. This could be taken up as part of the WCPFC Shark Research Plan to assess the feasibility and scale of such an analysis.
Additional growth studies from a range of locations could help build a better understanding of typical growth, as well as regional growth differences. Current growth data are conflicting, despite evidence that populations at locations of current tagging studies are likely connected or represent individuals from the same population.
Genetic/genomic studies could be undertaken to augment the tagging work to help resolve these stock/sub-stock structure patterns. To support this work, a strategic tissue sampling program for sharks is recommended with samples to be stored and curated in the Pacific Marine Specimen Bank.