Measurement Error

No measurements are perfect, so quantifying repeatability is important
module 6
week 12
Author
Affiliation

School of Life Sciences, University of Hawaii

Published

March 30, 2023

Acknowledgements

Material for this lecture was borrowed and adapted from

Learning objectives

At the end of this lesson you will:

  • Be able to estimate measurement error and repeatability

Overview

Measurement Error and Repeatability

Morphometrics is all about assessing variability, within and between individuals. One source of that variability is measurement error.

Measurement Error (ME) itself comes from many potential sources:

  • the measurement device (precision)
  • definition of the measure
  • quality of the measured material
  • the measurer
  • the environment of the measurer (hopefully small!)
  • measurement protocol

We try to minimize ME so that we can reveal the underlying patterns we are interested in, but there will always be some ME. So it is important to quantify it at least once, at the beginning of the study.

Protocol for assessing ME

The percentage of measurement error is defined as the within-group component of variance divided by the total (within + between group) variance (Claude 2008):

\[ \%ME = \frac{s^{2}_{within}}{s^{2}_{within} + s^{2}_{among}} \times 100 \]

We can get the components of variance \(s^{2}\) from the mean squares (\(MSS\)) of an ANOVA with individual as the factor. Individual is the grouping factor here: variation among individuals is the among-group component, and variation among repeated measurements of the same individual is the within-group component. The among and within variance can be estimated from the mean squares and \(m\), the number of repeated measurements per individual:

\[ s^{2}_{among} = \frac{MSS_{among} - MSS_{within}}{m} \]

and

\[ s^{2}_{within} = MSS_{within} \]
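The two formulas above can be wrapped in a small helper function. This is a minimal sketch: the name `percent_me()` and its arguments are our own (hypothetical), and it assumes a balanced design with the same number of repeated measurements \(m\) for every individual.

```r
# Hypothetical helper: percentage of measurement error from one-way ANOVA mean squares.
# ms_among:  mean square for the individual (among-group) term
# ms_within: residual (within-group) mean square
# m:         number of repeated measurements per individual (balanced design assumed)
percent_me <- function(ms_among, ms_within, m) {
  s2_within <- ms_within
  s2_among  <- (ms_among - ms_within) / m
  s2_within / (s2_within + s2_among) * 100
}

# Worked example with made-up mean squares:
# s2_among = (10 - 2) / 2 = 4, so %ME = 2 / (2 + 4) * 100
percent_me(ms_among = 10, ms_within = 2, m = 2)
# [1] 33.33333
```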

Example

Suppose we are taking photographs of specimens, and then collecting landmark data from the photos. This is a pretty typical data collection pipeline.

Because we are taking 2D photos from 3D objects, one potential issue is whether the shape variation we obtain is real, or whether it is introduced by placing either the object or the camera at slightly different angles.

Another potential issue is whether we are placing the digitized landmarks in exactly the same place.

There may be additional issues as well: for example, a landmark may be slightly ambiguous on the physical object, or the specimens or photos may vary in quality.

Plan your data management

I always recommend storing your metadata in the filenames. That way you never lose the information.

Photo files: a good data-management strategy is to name each file with the pattern id_picture_replicate.jpg, where:

  • id refers to the specimen,
  • picture is the replicate photo (photo1 or photo2), and
  • replicate is the replicate set of landmark coordinates (rep1 or rep2).
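Because the naming convention is systematic, the complete set of filenames expected for a balanced design can be generated and compared against what is actually on disk. A minimal sketch: the helper `expected_files()` is hypothetical and assumes that ids, photos and replicates are numbered consecutively from 1 (id1, photo1, rep1, ...).

```r
# Hypothetical helper: build every filename expected for a balanced design
# following the id_picture_replicate.jpg convention.
expected_files <- function(n_id, n_photo = 2, n_rep = 2) {
  # expand.grid varies its first argument fastest, so files are ordered
  # by id, then photo, then rep
  design <- expand.grid(
    rep   = seq_len(n_rep),
    photo = seq_len(n_photo),
    id    = seq_len(n_id)
  )
  sprintf("id%d_photo%d_rep%d.jpg", design$id, design$photo, design$rep)
}

expected_files(1)  # the four files for specimen id1
```

With real data, `setdiff(expected_files(20), list.files())` would list any expected files that are missing from the current directory.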

We can parse the metadata from the filenames by code such as:

# files <- list.files(pattern = "\\.jpg$")  # read the .jpg file names from the current directory
files <- c("id1_photo1_rep1.jpg", 
           "id1_photo1_rep2.jpg", 
           "id1_photo2_rep1.jpg", 
           "id1_photo2_rep2.jpg"
           )  # made-up example to practice 

# Collect metadata, approach 1 - strsplit
meta <- strsplit(files, "_|\\.")  # split each filename at "_" or "."
                                  # the "." must be escaped as \\. in the regex
id <- sapply(meta, "[[", 1)     # first piece: specimen id
photo <- sapply(meta, "[[", 2)  # second piece: photo replicate
rep <- sapply(meta, "[[", 3)    # third piece: digitizing replicate

# Collect metadata, approach 2 - sub
# using sub() and a regular expression that captures (string1)_(string2)_(string3)
# and ignores the final .jpg, where . is escaped by \\
pattern <- "^([a-zA-Z0-9]+)_([a-zA-Z0-9]+)_([a-zA-Z0-9]+)(\\.jpg)"

id <- sub(pattern, "\\1", files)     # keep capture group 1
photo <- sub(pattern, "\\2", files)  # keep capture group 2
rep <- sub(pattern, "\\3", files)    # keep capture group 3

We can use these vectors along with the coordinates to test for measurement error with ANOVA.
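Before fitting any model, it can help to collect the parsed metadata in one data frame and confirm that the design is balanced. This is a sketch with made-up filenames for two specimens; the names `parts`, `meta_df` and `design_counts` are hypothetical, and the rows of the coordinate data would need to be in the same order as `files`.

```r
# Made-up filenames following the id_picture_replicate.jpg convention
files <- c("id1_photo1_rep1.jpg", "id1_photo1_rep2.jpg",
           "id1_photo2_rep1.jpg", "id1_photo2_rep2.jpg",
           "id2_photo1_rep1.jpg", "id2_photo1_rep2.jpg",
           "id2_photo2_rep1.jpg", "id2_photo2_rep2.jpg")

# Split each filename at "_" or "." into a character matrix (one row per file)
parts <- do.call(rbind, strsplit(files, "_|\\."))
meta_df <- data.frame(
  file  = files,
  id    = factor(parts[, 1]),
  photo = factor(parts[, 2]),
  rep   = factor(parts[, 3])
)

# A balanced design has exactly one file per id x photo x rep combination
design_counts <- table(meta_df$id, meta_df$photo, meta_df$rep)
all(design_counts == 1)
# [1] TRUE
```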

Statistical methods for Measurement Error:

We will assess measurement error at two levels, photography error and digitizing error:

Photography error: Take two sets of photos, each time placing and positioning the specimen in front of the camera from scratch (i.e., repeating the entire process, so that we get a good estimate of photo capture error).

Landmark digitizing error: Collect landmarks twice, ideally in different sessions on different days or weeks.

Data: In this example we will have 4 sets of landmark data for each specimen, 2 photos x 2 digitizing replicates, allowing assessment of error associated with the digitization as well as error in capturing the shapes via the photographs.

Model: We will use a nested ANOVA to estimate the repeatability (and measurement error) of the landmarks, separating the variation introduced by photography and digitization from the other sources of variation.

Analyze with ANOVA:

A nested ANOVA accounts for the nested structure of replicates within groups: rep1 of photo1 has nothing to do with rep1 of photo2, so rep is nested within photo (and photo, in turn, is nested within id).

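In R we specify a nested model formula with the / operator: id/photo/rep expands to id + id:photo + id:photo:rep, where : denotes an interaction term with no separate main effect, so photo enters only within id and rep only within id:photo:

fit <- aov(coords ~ id/photo/rep)  # nested ANOVA: rep within photo within id
summary(fit)                       # ANOVA table, including the Mean Sq column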

Data and model term objects:

  • coords is the data object (a vector or array)
  • id is a vector containing labels for each specimen
  • photo is a vector (photo is 1 or 2)
  • rep is a vector (digitizing replicate 1 or 2)

Look at the values in the Mean Squares (MS) column of the ANOVA table. Compare the values for id:photo and id:photo:rep with the value for id.
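To see what such a table looks like, here is a minimal sketch on simulated data for a single landmark coordinate. Everything in it is an assumption for illustration: the number of specimens, the error sizes (`sd = 0.3` for photography, `sd = 0.2` for digitizing), the seed, and the object names `design`, `fit`, `tab` and `ms`. It first checks how `id/photo/rep` expands, then fits the model. Because each id:photo:rep cell holds a single observation, no residual degrees of freedom remain, so the table has just the three rows to compare.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# The nesting operator / expands into the three nested terms
attr(terms(y ~ id/photo/rep), "term.labels")  # "id" "id:photo" "id:photo:rep"

n_id <- 10  # hypothetical number of specimens
# Balanced design: every specimen x 2 photos x 2 digitizing replicates
design <- expand.grid(rep = factor(1:2), photo = factor(1:2), id = factor(1:n_id))

# Simulated coordinate = true specimen value + photo error + digitizing error
true_val  <- rnorm(n_id, mean = 20, sd = 3)          # one value per specimen
photo_err <- rnorm(2 * n_id, mean = 0, sd = 0.3)     # one error per specimen x photo
digit_err <- rnorm(nrow(design), mean = 0, sd = 0.2) # one error per digitization

id_index    <- as.integer(design$id)
photo_index <- (id_index - 1) * 2 + as.integer(design$photo)
design$coords <- true_val[id_index] + photo_err[photo_index] + digit_err

fit <- aov(coords ~ id/photo/rep, data = design)
tab <- summary(fit)[[1]]  # rows: id, id:photo, id:photo:rep (no Residuals row)
tab[, "Df"]               # 9, 10 and 20 degrees of freedom for n_id = 10
ms <- tab[, "Mean Sq"]    # mean squares to compare
ms
```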

Repeatability

To calculate the repeatability of our digitizing ability, we subtract the MS of the id:photo:rep term from the MS of the individual (id) term and divide by two (because we have two digitizing replicates):

\[ \frac{MS_{id} - MS_{id:photo:rep}}{2} \]

Then we calculate the ratio of this value to the total MS:

\[ \frac{(MS_{id} - MS_{id:photo:rep})/2}{MS_{id} + MS_{id:photo} + MS_{id:photo:rep}} \]

The result is the repeatability, which under good circumstances is above 0.95, i.e., less than 5% measurement error.
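The ratio described above translates directly into a short function. This is a sketch only: the name `repeatability_ms()` is hypothetical, and the `n_rep` argument (the divisor, 2 in the text because there are two digitizing replicates) is our own generalization.

```r
# Hypothetical helper implementing the repeatability ratio from the text:
# ((MS(id) - MS(id:photo:rep)) / 2) / (MS(id) + MS(id:photo) + MS(id:photo:rep))
repeatability_ms <- function(ms_id, ms_photo, ms_rep, n_rep = 2) {
  ((ms_id - ms_rep) / n_rep) / (ms_id + ms_photo + ms_rep)
}

# Made-up mean squares: (10 - 2) / 2 = 4, and 4 / (10 + 4 + 2) = 0.25
repeatability_ms(ms_id = 10, ms_photo = 4, ms_rep = 2)
# [1] 0.25
```

With a fitted nested model, the three inputs would be the values of the Mean Sq column of `summary(fit)[[1]]`, in table order (id, id:photo, id:photo:rep).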

Simplified simulated example:

Simplified example: 20 specimens, 1 photo, 2 digitizing reps:

20 specimens (single measurement dataset). 2 repetitions: digitize each photo twice (once in each of two sessions on different days).
How repeatable are the measurements?

Simulate the data:

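true_m <- rnorm(20, 20, 3)  # true values for 20 specimens (mean 20, sd 3)
m1 <- true_m + rnorm(20, 0, 0.5)  # measurement 1: true value + error (sd 0.5)
m2 <- true_m + rnorm(20, 0, 0.5)  # measurement 2: independent error
# (no seed is set, so your numbers will differ from the output below)

id <- as.factor(rep(1:20, times = 2))  # specimen labels, repeated for both sessions
rep <- gl(2, 20)  # session factor: 20 x "1", then 20 x "2"
total_m <- c(m1, m2)
cbind(id, total_m, rep)  # the data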
      id  total_m rep
 [1,]  1 24.15721   1
 [2,]  2 23.51620   1
 [3,]  3 20.32648   1
 [4,]  4 21.13753   1
 [5,]  5 19.63922   1
 [6,]  6 18.90770   1
 [7,]  7 20.80480   1
 [8,]  8 21.50574   1
 [9,]  9 19.54461   1
[10,] 10 18.38292   1
[11,] 11 16.51972   1
[12,] 12 22.56470   1
[13,] 13 20.90189   1
[14,] 14 27.00296   1
[15,] 15 19.14295   1
[16,] 16 22.80917   1
[17,] 17 21.36119   1
[18,] 18 17.27635   1
[19,] 19 20.43382   1
[20,] 20 22.27521   1
[21,]  1 24.22700   2
[22,]  2 23.89706   2
[23,]  3 19.29717   2
[24,]  4 21.86834   2
[25,]  5 20.88725   2
[26,]  6 18.54368   2
[27,]  7 20.65837   2
[28,]  8 21.74068   2
[29,]  9 18.87857   2
[30,] 10 17.97394   2
[31,] 11 16.30139   2
[32,] 12 22.07608   2
[33,] 13 21.17731   2
[34,] 14 26.78290   2
[35,] 15 19.46328   2
[36,] 16 21.98306   2
[37,] 17 21.04685   2
[38,] 18 17.49118   2
[39,] 19 20.60442   2
[40,] 20 22.55779   2

Is there a difference between the measurement sessions?

summary(aov(total_m ~ rep))
            Df Sum Sq Mean Sq F value Pr(>F)
rep          1   0.01   0.014   0.002  0.962
Residuals   38 230.77   6.073               

No (that's good!)

Is there a difference between individual specimens?

mod <- summary(aov(total_m ~ id))
mod
            Df Sum Sq Mean Sq F value   Pr(>F)    
id          19 228.00  12.000   86.17 3.82e-15 ***
Residuals   20   2.79   0.139                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes, and the residual mean square looks small too (good!). How big is the measurement error?

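s2_within <- ms_within <- mod[[1]][2,3]  # residual (within-specimen) mean square: row 2, column 3 ("Mean Sq")
s2_within
[1] 0.1392685
ms_among <- mod[[1]][1,3]                # among-specimen mean square: row 1 ("id")
s2_among <- (ms_among-ms_within)/2       # divide by m = 2 repeated measurements
ME <- s2_within/(s2_within+s2_among) * 100  # percent measurement error
ME
[1] 2.294503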

Not bad. A common rule of thumb is that ME of 5% or less (repeatability of 95% or more) is good. If we want to reduce ME further, we can use the average of the two measurements in our analyses.
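Averaging the replicate measurements per specimen takes one line in R. A minimal sketch with made-up values: the data frame `dat`, its columns and `avg_by_id` are hypothetical. `tapply()` returns one mean per specimen, which could then stand in for the individual replicates in later analyses.

```r
# Hypothetical repeated measurements: 3 specimens x 2 digitizing sessions
dat <- data.frame(
  id      = factor(rep(1:3, times = 2)),
  session = factor(rep(1:2, each = 3)),
  value   = c(10.1, 12.0, 9.8,   # session 1
              10.3, 11.6, 9.9)   # session 2
)

# Mean of the two sessions for each specimen (10.2, 11.8 and 9.85 here)
avg_by_id <- tapply(dat$value, dat$id, mean)
avg_by_id
```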

References

Claude, Julien. 2008. Morphometrics with R. 1st ed. New York, NY: Springer-Verlag.