Automatic Pooling

Carl James Schwarz

2024-01-25

1 Introduction

Several vignettes explored how to pool rows and/or columns to deal with the near singularity of the recapture matrix. A new function has been introduced SPAS.autopool() that attempts to automate this process.

The automatic pooling algorithm follows recommendations in Schwarz and Taylor (1998) and proceeds as follows:

The values of min.released,min.inspected,min.recaps can be passed as arguments to the function.

The function returns a suggested pooling vector for the rows and columns and a reduced data matrix if there are rows that are all zero or columns that are all zero.

The SPAS.fit.model() function also now has a autopool argument where an automated pooling will be attempted rather than you passing the pooling vectors.

2 Example.

The example from the Harrison River will be used:

2.1 Load the SPAS package

library(SPAS)
#> ***** SPAS: Stratified Petersen Analysis System - Version 2024.1.31 2024-01-31) ***** 
#> 
#>       Help available with  help(package='SPAS') 
#>       Several vignettes are available. See browseVignettes(package="SPAS")

This will load the SPAS fitting functions and any associated packages needed by SPAS.

2.2 Import the data.

The data should be stored as an \(s+1\) by \(t+1\) (before pooling) matrix. The \(s \times t\) upper left matrix is the number of animals released in row stratum \(i\) and recovered in column stratum \(j\). Row \(s+1\) contains the total number of UNMARKED animals recovered in column stratum \(j\). Column \(t+1\) contains the number of animals marked in each row stratum but not recovered in any column stratum. The \(rawdata[s+1, t+1]\) is not used and can be set to 0 or NA. The sum of the entries in each of the first \(s\) rows is then the number of animals marked in each row stratum. The sum of the entries in each of the first \(t\) columns is then the number of animals captured (marked and unmarked) in each column stratum. The row/column names of the matrix may be set to identify the entries in the output.

Here is the raw data for the Harrison River. Notice that very small number of releases and recoveries in week 6.

harrison.2011.chinook.F.csv <- textConnection("
  4   ,      2   ,      1   ,     1   ,     0   ,     0   ,   130
 12   ,      7   ,     14   ,     1   ,     3   ,     0   ,   330
  7   ,     11   ,     41   ,     9   ,     1   ,     1   ,   790
  1   ,     13   ,     40   ,    12   ,     9   ,     1   ,   667
  0   ,      1   ,      8   ,     8   ,     3   ,     0   ,   309
  0   ,      0   ,      0   ,     0   ,     0   ,     1   ,    65
744   ,   1187   ,   2136   ,   951   ,   608   ,   127   ,     0")

har.data <- as.matrix(read.csv(harrison.2011.chinook.F.csv, header=FALSE))
har.data
#>       V1   V2   V3  V4  V5  V6  V7
#> [1,]   4    2    1   1   0   0 130
#> [2,]  12    7   14   1   3   0 330
#> [3,]   7   11   41   9   1   1 790
#> [4,]   1   13   40  12   9   1 667
#> [5,]   0    1    8   8   3   0 309
#> [6,]   0    0    0   0   0   1  65
#> [7,] 744 1187 2136 951 608 127   0

A total of 138 fish were tagged and released in week 1. Of these fish, 4 were recovered down stream in column stratum 1; 2 were recovered in column stratum 2; and 130 were never seen again. A total of 744 UNMARKED fish were recovered in column stratum 1.

You can add row and column names to the matrix which will used when the data are printed.

2.3 Automate Pooling

We invoke the SPAS.autopool() function and look at the output:

res <- SPAS.autopool(har.data)
res
#> $rawdata
#>       V1   V2   V3  V4  V5  V6  V7
#> [1,]   4    2    1   1   0   0 130
#> [2,]  12    7   14   1   3   0 330
#> [3,]   7   11   41   9   1   1 790
#> [4,]   1   13   40  12   9   1 667
#> [5,]   0    1    8   8   3   0 309
#> [6,]   0    0    0   0   0   1  65
#> [7,] 744 1187 2136 951 608 127   0
#> 
#> $reddata
#>       V1   V2   V3  V4  V5  V6  V7
#> [1,]   4    2    1   1   0   0 130
#> [2,]  12    7   14   1   3   0 330
#> [3,]   7   11   41   9   1   1 790
#> [4,]   1   13   40  12   9   1 667
#> [5,]   0    1    8   8   3   0 309
#> [6,]   0    0    0   0   0   1  65
#> [7,] 744 1187 2136 951 608 127   0
#> 
#> $row.pool
#> [1] 1 2 3 4 6 6
#> 
#> $col.pool
#> [1] 1 2 3 4 5 6
#> 
#> $summary
#>       V1   V2   V3  V4  V5  V6  V7   
#> [1,]   4    2    1   1   0   0 130  1
#> [2,]  12    7   14   1   3   0 330  2
#> [3,]   7   11   41   9   1   1 790  3
#> [4,]   1   13   40  12   9   1 667  4
#> [5,]   0    1    8   8   3   0 309  6
#> [6,]   0    0    0   0   0   1  65  6
#> [7,] 744 1187 2136 951 608 127   0 NA
#> [8,]   1    2    3   4   5   6  NA NA

There were no rows or columns that were all zero, so the reduced data (res) is the same as the original data.

The suggested pooling vector for columns indicates no pooling was done (all entries are the same). The suggested pooling vector for rows, suggests rows 5 and 6 be pooled as indicated by the duplicate entries of 6 in the pooling vector.

Finally, the reduced data and suggest pooling vectors are presented in one display.

2.4 Fitting allowing for automated pooling.

We can use the autopool argument in the SPAS.fit.model to do automated pooling prior to the fit.

mod..1 <- SPAS.fit.model(har.data,
                       model.id="Automated pooling",
                       autopool=TRUE)
#> Using nlminb to find conditional MLE
#> outer mgc:  1881.784 
#> outer mgc:  4389.186 
#> outer mgc:  1468.236 
#> outer mgc:  422.6119 
#> outer mgc:  113.8934 
#> outer mgc:  26.50151 
#> outer mgc:  43.40223 
#> outer mgc:  55.72318 
#> outer mgc:  15.80935 
#> outer mgc:  22.2736 
#> outer mgc:  3.903875 
#> outer mgc:  0.8192351 
#> outer mgc:  10.92489 
#> outer mgc:  6.158485 
#> outer mgc:  2.88843 
#> outer mgc:  9.989736 
#> outer mgc:  7.862851 
#> outer mgc:  19.72383 
#> outer mgc:  4.490593 
#> outer mgc:  1.984147 
#> outer mgc:  1.565641 
#> outer mgc:  5.72284 
#> outer mgc:  15.25525 
#> outer mgc:  5.787902 
#> outer mgc:  16.57644 
#> outer mgc:  4.078164 
#> outer mgc:  19.07256 
#> outer mgc:  2.186887 
#> outer mgc:  3.40589 
#> outer mgc:  3.06535 
#> outer mgc:  7.242766 
#> outer mgc:  2.785986 
#> outer mgc:  5.301617 
#> outer mgc:  1.128007 
#> outer mgc:  2.102554 
#> outer mgc:  0.5954426 
#> outer mgc:  2.630831 
#> outer mgc:  0.3571193 
#> outer mgc:  3.785503 
#> outer mgc:  0.1577663 
#> outer mgc:  0.98001 
#> outer mgc:  0.5945414 
#> outer mgc:  1.18846 
#> outer mgc:  0.3909604 
#> outer mgc:  1.043338 
#> outer mgc:  0.204174 
#> outer mgc:  0.8648443 
#> outer mgc:  0.1053468 
#> outer mgc:  0.520781 
#> outer mgc:  0.07018663 
#> outer mgc:  0.2067039 
#> outer mgc:  0.0523693 
#> outer mgc:  0.05568652 
#> outer mgc:  0.02337035 
#> outer mgc:  0.01194841 
#> outer mgc:  0.005044315 
#> Convergence codes from nlminb  0 relative convergence (4) 
#> Finding conditional estimate of N
SPAS.print.model(mod..1)
#> Model Name: Automated pooling 
#>    Date of Fit: 2024-01-25 12:19 
#>    Version of OPEN SPAS used : SPAS-R 2023-03-31 
#>  
#> Raw data 
#>       V1   V2   V3  V4  V5  V6  V7
#> [1,]   4    2    1   1   0   0 130
#> [2,]  12    7   14   1   3   0 330
#> [3,]   7   11   41   9   1   1 790
#> [4,]   1   13   40  12   9   1 667
#> [5,]   0    1    8   8   3   0 309
#> [6,]   0    0    0   0   0   1  65
#> [7,] 744 1187 2136 951 608 127   0
#> 
#> Row pooling setup : 1 2 3 4 6 6 
#> Col pooling setup : 1 2 3 4 5 6 
#> Physical pooling  : TRUE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  70135  (SE  4503  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>       pool1 pool2 pool3 pool4 pool5 pool6  V7
#> pool1     4     2     1     1     0     0 130
#> pool2    12     7    14     1     3     0 330
#> pool3     7    11    41     9     1     1 790
#> pool4     1    13    40    12     9     1 667
#> pool6     0     1     8     8     3     1 374
#>         744  1187  2136   951   608   127   0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  3461.147 
#> Condition number of XX' after logical pooling                   3461.147 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 47415.98    ;  np: 40 ;  AICc: -94751.97 
#> 
#>   Code/Message from optimization is:  0 relative convergence (4) 
#> 
#> Estimates
#>              pool1  pool2  pool3 pool4 pool5 pool6 psi cap.prob exp factor
#> pool1          3.6    2.7    0.8   0.8   0.0   0.1 130    0.005      195.8
#> pool2         12.0    7.0   14.0   1.0   3.0   0.0 330    1.000        0.0
#> pool3          7.0   11.0   41.0   9.0   1.0   1.0 790    1.000        0.0
#> pool4          1.0   13.8   38.0  11.3  10.6   1.3 667    0.022       44.1
#> pool6          0.0    1.1    7.6   7.6   3.5   1.3 374    0.025       39.6
#> est unmarked 744.0 1185.0 2139.0 952.0 606.0 126.0   0       NA         NA
#>              Pop Est
#> pool1          27162
#> pool2            367
#> pool3            860
#> pool4          33542
#> pool6          16035
#> est unmarked   77966
#> 
#> SE of above estimates
#>              pool1 pool2 pool3 pool4 pool5 pool6  psi cap.prob exp factor
#> pool1          1.5   1.3   0.8   0.8   0.0   0.5 11.4    0.002       85.5
#> pool2          3.5   2.6   3.7   1.0   1.7   0.0 18.2    0.000        0.0
#> pool3          2.6   3.3   6.4   3.0   1.0   1.0 28.1    0.000        0.0
#> pool4          1.0   3.7   6.2   3.2   2.5   1.3 25.8    0.009       17.6
#> pool6          0.0   1.1   2.6   3.0   2.0   1.3 19.3    0.035       57.1
#> est unmarked    NA    NA    NA    NA    NA    NA  0.0       NA         NA
#>              Pop Est
#> pool1          11794
#> pool2              0
#> pool3              0
#> pool4          13070
#> pool6          22539
#> est unmarked   15462
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 1.131646 
#> Chisquare gof df      : 1 
#> Chisquare gof p       : 0.2874245

The automated pooling combined the last two rows, but the fit is not entire satisfactory because of the high condition number on X’X.

Some further bespoke pooling is likely necessary.

3 References

Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238