Biometric Score Calibration for Forensic Face Recognition

Tauseef Ali

Research output: Thesis › PhD Thesis - Research UT, graduation UT › Academic

Abstract

When two biometric specimens are compared using an automatic biometric recognition system, a similarity metric called a “score” can be computed. In forensics, one of the specimens comes from an unknown source, for example CCTV footage or a fingermark found at a crime scene, and the other is obtained from a known source, for example a suspect. Automatic biometric recognition systems are gradually replacing forensic examiners’ manual comparison of the two specimens. In forensics, there is great interest in a suitable measure for reporting the outcome of this comparison. This has led to the use of the likelihood-ratio, P(s|Hp)/P(s|Hd), where s is the score computed by an automatic biometric recognition system, Hp is the hypothesis of the prosecution (the two specimens originate from the same source) and Hd is the hypothesis of the defense (the two specimens originate from different sources). Generally, two sets of training scores, one under Hp and the other under Hd, are needed to compute a likelihood-ratio from a score. In this thesis, we review several methods of likelihood-ratio computation, focusing mainly on the sampling variability in the sets of training scores and on the specific conditioning imposed on the pairs of biometric specimens used to compute them. Three methods are considered in detail: kernel density estimation, logistic regression and pool adjacent violators. The effect of the sampling variability is quantified by varying: 1) the shapes of the probability density functions that model the distributions of the scores under Hp and under Hd; 2) the sizes of the training sets under Hp and under Hd; 3) the value of the score for which the likelihood-ratio is computed.
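
As an illustration of the first of the three methods named above, the following is a minimal sketch of score-to-likelihood-ratio conversion with kernel density estimation. It is not the thesis's implementation: the Gaussian kernel, the normal-reference bandwidth rule, the function names and the synthetic training scores are all assumptions made for this example.

```python
import math
import random

def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth 1.06 * std * n^(-1/5) (an assumption of this sketch)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * std * n ** (-0.2)

def kde_density(x, samples):
    """Gaussian kernel density estimate of the score distribution at x."""
    n = len(samples)
    h = normal_reference_bandwidth(samples)
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return total / (n * h * math.sqrt(2.0 * math.pi))

def likelihood_ratio(score, same_source_scores, different_source_scores):
    """Estimate P(s|Hp) / P(s|Hd) from the two sets of training scores."""
    p_hp = kde_density(score, same_source_scores)
    p_hd = kde_density(score, different_source_scores)
    return p_hp / p_hd

# Synthetic training scores (hypothetical): same-source scores tend to be
# higher than different-source scores.
rng = random.Random(0)
same_source = [rng.gauss(2.0, 1.0) for _ in range(500)]
different_source = [rng.gauss(-2.0, 1.0) for _ in range(500)]

lr_high = likelihood_ratio(2.0, same_source, different_source)
lr_low = likelihood_ratio(-2.0, same_source, different_source)
print("LR at s = 2.0:", lr_high)   # expected to support Hp (greater than 1)
print("LR at s = -2.0:", lr_low)   # expected to support Hd (less than 1)
```

A likelihood-ratio above 1 supports Hp and one below 1 supports Hd; the estimate depends on the training sets and on the bandwidth chosen.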
The study proposes a simulation framework that can be used to study several properties of a likelihood-ratio computation method and to quantify the effect of the sampling variability on a likelihood-ratio. This supports an appropriate and informed choice of a likelihood-ratio computation method. It is shown that sampling variability is a serious concern when only small sets of training scores are available for likelihood-ratio computation. Our study of likelihood-ratio computation also focuses on the specific conditioning imposed on the pairs of biometric specimens used to compute the sets of training scores. In general, the two sets of training scores are obtained from same-source and different-sources comparisons of biometric specimens. However, the same-source and different-sources conditions can either be anchored to a specific suspect in a forensic case or be generic, that is, independent of the suspect involved in the case. This results in two likelihood-ratios that differ in the nature of the training scores they use and therefore imply slightly different interpretations of the two hypotheses. An empirical study is carried out to quantify how much and how frequently the two likelihood-ratios differ, considering a speaker, a face and a fingerprint recognition system. The study shows that there are significant variations between the two likelihood-ratios, and that an explicit definition of the training sets and of the hypotheses implied by them is therefore very important. The state of the art in automated forensic face recognition is reviewed and the concept of the likelihood-ratio is applied to several existing biometric face recognition systems.
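
The idea of quantifying sampling variability by simulation can be sketched as follows. This is only an illustrative setup under stated assumptions, not the framework proposed in the thesis: scores under Hp and Hd are drawn from Gaussian distributions chosen for the example, the likelihood-ratio is computed with a simple Gaussian kernel density estimate, and the names `kde_density` and `log10_lr_spread` are hypothetical.

```python
import math
import random
import statistics

def kde_density(x, samples):
    """Gaussian KDE at x with a rule-of-thumb bandwidth (assumption of this sketch)."""
    n = len(samples)
    h = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return total / (n * h * math.sqrt(2.0 * math.pi))

def log10_lr_spread(score, n_train, repetitions, rng):
    """Resample the training sets many times and return the mean and
    standard deviation of log10(LR) at a fixed score."""
    values = []
    for _ in range(repetitions):
        # Assumed score distributions: N(1, 1) under Hp, N(-1, 1) under Hd.
        same = [rng.gauss(1.0, 1.0) for _ in range(n_train)]
        diff = [rng.gauss(-1.0, 1.0) for _ in range(n_train)]
        lr = kde_density(score, same) / kde_density(score, diff)
        values.append(math.log10(lr))
    return statistics.mean(values), statistics.stdev(values)

rng = random.Random(42)
mean_small, spread_small = log10_lr_spread(0.5, 20, 200, rng)
mean_large, spread_large = log10_lr_spread(0.5, 500, 200, rng)
print("n = 20:  mean log10 LR =", mean_small, " spread =", spread_small)
print("n = 500: mean log10 LR =", mean_large, " spread =", spread_large)
```

The same loop could be repeated for other distribution shapes, training-set sizes and score values, which are the three factors the abstract lists.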
In forensic situations, e.g., when an image from a crime scene is compared with an image of a suspect, forensic face recognition is currently a manual process referred to as “forensic facial comparison”, performed by forensic examiners based on their experience and a limited set of guidelines. A step is taken towards automation of forensic face recognition by studying the discriminating power of different facial features such as the eyes, eyebrows and nose. This kind of regional comparison is the essence of forensic facial comparison and proves very useful in situations where only a part of the face is available for comparison. Besides such automation, it might also be feasible to use existing automatic face recognition systems for forensic comparison and reporting. To this end, several face recognition systems are calibrated so that they produce likelihood-ratios, and their performance is evaluated using likelihood-ratio assessment tools.
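
The abstract does not name the assessment tools used. As one example of how a set of likelihood-ratios can be assessed, the sketch below computes the log-likelihood-ratio cost (Cllr), a metric commonly used for this purpose; treating it as relevant here is an assumption, and the input values are made up for illustration.

```python
import math

def cllr(same_source_lrs, different_source_lrs):
    """Log-likelihood-ratio cost: penalises small LRs for same-source
    comparisons and large LRs for different-sources comparisons."""
    penalty_ss = sum(math.log2(1.0 + 1.0 / lr) for lr in same_source_lrs)
    penalty_ds = sum(math.log2(1.0 + lr) for lr in different_source_lrs)
    return 0.5 * (penalty_ss / len(same_source_lrs)
                  + penalty_ds / len(different_source_lrs))

# Hypothetical likelihood-ratios, for illustration only.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])      # LR = 1 everywhere gives 1.0
informative = cllr([100.0, 50.0], [0.01, 0.02])   # LRs point the right way
misleading = cllr([0.01, 0.02], [100.0, 50.0])    # LRs point the wrong way
print(uninformative, informative, misleading)
```

Lower values are better: a system that always reports a likelihood-ratio of 1 scores exactly 1, and values above 1 indicate likelihood-ratios that mislead more than they inform.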

Original language: Undefined
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Veldhuis, Raymond N.J., Supervisor
  • Meuwly, Didier, Supervisor
  • Spreeuwers, Lieuwe Jan, Advisor
Award date: 19 Jun 2014
Place of Publication: Enschede
Publisher: Universiteit Twente
Print ISBNs: 978-90-365-3689-9
DOIs: 10.3990/1.9789036536899
Publication status: Published - 19 Jun 2014

Keywords

  • SCS-Safety
  • METIS-303759
  • Biometric
  • EWI-25054
  • Calibration
  • Forensic
  • Face Recognition
  • IR-91252

Cite this

Ali, Tauseef. / Biometric Score Calibration for Forensic Face Recognition. Enschede : Universiteit Twente, 2014. 133 p.
@phdthesis{8b29b3b21e5240d091ebd2c09af03e9f,
title = "Biometric Score Calibration for Forensic Face Recognition",
keywords = "SCS-Safety, METIS-303759, Biometric, EWI-25054, Calibration, Forensic, Face Recognition, IR-91252",
author = "Tauseef Ali",
year = "2014",
month = "6",
day = "19",
doi = "10.3990/1.9789036536899",
language = "Undefined",
isbn = "978-90-365-3689-9",
publisher = "Universiteit Twente",
school = "University of Twente",

}
