Thursday, March 19, 2009

Small survey of Objective Image Quality metrics

PDF version of this post is here
The importance of objective quality metric methods cannot be overestimated: such methods are used in automated image restoration algorithms, for comparison of image compression algorithms, and so on. Quality metrics are graphically presented in Fig. 1.

All proposed quality metrics can be divided into two general classes: subjective and objective [2].

Subjective evaluation of image quality is oriented toward the Human Visual System (HVS). As mentioned in [3], the best way to assess the quality of an image is perhaps to look at it, because human eyes are the ultimate receivers in most image processing environments. The subjective quality measurement Mean Opinion Score (MOS) has been used for many years.

Objective metrics include the Mean Squared Error (MSE), or $L_p$-norm [4,5], and measures that mimic the HVS, such as [6,7,8,9,10,11]. In particular, it is well known that a large number of neurons in the primary visual cortex are tuned to visual stimuli with specific spatial locations, frequencies, and orientations. Image quality metrics that incorporate perceptual quality measures by considering the human visual system (HVS) were proposed in [12,13,14,15,16]. An image quality measure (IQM) that computes image quality based on the 2-D spatial frequency power spectrum of an image was proposed in [10]. Still, such metrics perform poorly in real applications and are widely criticized for not correlating well with perceived quality [3].

As promising techniques for image quality measurement, the Universal Quality Index [17,3], the Structural SIMilarity index [18,19], and the Multidimensional Quality Measure Using SVD [1] are worth mentioning.

Figure 1: Types of images quality metrics.

So there are three objective methods of image quality estimation to be discussed below: the UQI, the SSIM, and the MQMuSVD. Brief information about the main ideas of these metrics is given. But first of all, let me render homage to the mean squared error (MSE) metric.


A Good-Old MSE

Considering that $x=\{x_i \mid i = 1,2,\dots,N\}$ and $y=\{y_i \mid i = 1,2,\dots,N\}$ are two images, where $N$ is the number of pixels, the MSE between these images is:

$$\mathrm{MSE}(x,y) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2 \qquad (1)$$
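As a quick illustration, the plain MSE takes a couple of lines of NumPy (the function name is mine):

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two equal-sized images."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - y) ** 2)
```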
Of course, there is a more general and well-suited formulation of the MSE for image processing, given by Fienup [5]:

$$\mathrm{NRMSE} = \left[\frac{\sum_{i}\left(\alpha\,\hat{f}_i - f_i\right)^2}{\sum_{i} f_i^2}\right]^{1/2} \qquad (2)$$

where $f$ is the reference image, $\hat{f}$ is the reconstruction, and the scale factor

$$\alpha = \frac{\sum_{i} f_i\,\hat{f}_i}{\sum_{i} \hat{f}_i^2} \qquad (3)$$

minimizes the error over an arbitrary multiplicative constant. Such an NRMSE metric allows one to estimate image quality especially in various applications of digital deconvolution techniques. Although Eq. (2) is better than the plain MSE, the NRMSE metric has been criticized a lot as well.
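A minimal NumPy sketch of Eqs. (2)-(3) (it ignores the translation search of Fienup's full invariant metric; the function name is mine):

```python
import numpy as np

def nrmse(f, f_hat):
    """Normalized RMS error with the optimal least-squares scale factor.

    f is the reference image, f_hat the reconstruction; the scale
    alpha makes the metric invariant to a multiplicative constant.
    """
    f = np.asarray(f, dtype=float).ravel()
    f_hat = np.asarray(f_hat, dtype=float).ravel()
    alpha = (f @ f_hat) / (f_hat @ f_hat)   # optimal scale, Eq. (3)
    return np.sqrt(np.sum((alpha * f_hat - f) ** 2) / np.sum(f ** 2))
```

Note that `nrmse` returns zero for any reconstruction that differs from the reference only by a constant factor, which is exactly the invariance the metric was designed for.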

As it was written in the remarkable paper [19], the MSE is commonly used for many reasons. The MSE is simple, parameter-free, and easy to compute. Moreover, the MSE has a clear physical meaning as the energy of the error signal. Such an energy measure is preserved under any orthogonal linear transformation, such as the Fourier transform. The MSE is widely used in optimization tasks and in the deconvolution problem [21,22,23]. Finally, competing algorithms have most often been compared using the MSE or the peak signal-to-noise ratio (PSNR).

But problems arise when one tries to predict human perception of image fidelity and quality using the MSE. As it was shown in [19], the MSE can be nearly the same for images with very different types of distortion. That is why there have been many attempts to overcome the MSE's limitations and find new image quality metrics. Some of them are briefly discussed below.

Multidimensional Quality Measure Using SVD

The new metric of image quality called ``Multidimensional Quality Measure Using SVD'' was proposed in [1]. The main idea is that every real matrix $A$ can be decomposed into a product of three matrices, $A = USV^T$, where $U$ and $V$ are orthogonal matrices, $U^TU = I$, $V^TV = I$, and $S = \mathrm{diag}(s_1, s_2, \dots)$. The diagonal entries of $S$ are called the singular values of $A$, the columns of $U$ are called the left singular vectors of $A$, and the columns of $V$ are called the right singular vectors of $A$. This decomposition is known as the Singular Value Decomposition (SVD) of $A$ [24]. If the SVD is applied to the full image, we obtain a global measure, whereas if a smaller block is used, we compute the local error in that block:

$$D_i = \sqrt{\sum_{j=1}^{N}\left(s_j - \hat{s}_j\right)^2} \qquad (4)$$

where $s_j$ are the singular values of the original block, $\hat{s}_j$ are the singular values of the distorted block, and $N$ is the block size. If the image size is $K \times K$, we have $(K/N) \times (K/N)$ blocks. The set of distances, when displayed in a graph, represents a ``distortion map''.
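A NumPy sketch of the per-block computation (the block size and function name are my own choices):

```python
import numpy as np

def svd_distortion_map(orig, dist, n=8):
    """Distortion map: singular-value distance between corresponding
    n-by-n blocks of the original and distorted images."""
    K = orig.shape[0]
    m = K // n                       # blocks per dimension
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            a = orig[i*n:(i+1)*n, j*n:(j+1)*n].astype(float)
            b = dist[i*n:(i+1)*n, j*n:(j+1)*n].astype(float)
            s = np.linalg.svd(a, compute_uv=False)
            s_hat = np.linalg.svd(b, compute_uv=False)
            D[i, j] = np.sqrt(np.sum((s - s_hat) ** 2))
    return D
```

Identical images give an all-zero map; the larger an entry, the more distorted the corresponding block.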


A universal image quality index (UQI)

As a more promising new paradigm of image quality measurement, a universal image quality index was proposed in [17]. This image quality metric is based on the following idea:
The main function of the human eyes is to extract structural information from the viewing field, and the human visual system is highly adapted for this purpose. Therefore, a measurement of structural distortion should be a good approximation of perceived image distortion.
The key point of the new philosophy is the switch from error measurement to structural distortion measurement. The problem, then, is how to define and quantify structural distortions. First, let us define the necessary mathematics [17] for an original image $x$ and a test image $y$. The universal quality index can be written as [3]:

$$Q = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2\bar{x}\bar{y}}{\bar{x}^2 + \bar{y}^2} \cdot \frac{2\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \qquad (5)$$

where $\bar{x}$, $\bar{y}$ are the mean values, $\sigma_x^2$, $\sigma_y^2$ are the variances, and $\sigma_{xy}$ is the covariance of $x$ and $y$.

The first component is the linear correlation coefficient between $x$ and $y$, i.e., a measure of the loss of correlation. The second component measures how close the mean values of $x$ and $y$ are, i.e., luminance distortion. The third component measures how similar the variances of the signals are, i.e., contrast distortion.


The UQI quality measurement method is applied to local regions using a sliding-window approach. To obtain the overall quality index, the average value of the local quality indexes $Q_j$ is calculated:

$$Q = \frac{1}{M}\sum_{j=1}^{M} Q_j \qquad (6)$$

where $M$ is the number of window positions.

As mentioned in [17], the average quality index UQI agrees well with the mean subjective ranks of observers. That gives researchers a very powerful tool for image quality estimation.
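A single-window NumPy sketch of the index (a full implementation slides, e.g., an 8x8 window over the images and averages the local values per Eq. (6); the function name is mine):

```python
import numpy as np

def uqi(x, y):
    """Universal Quality Index of Eq. (5) over one window."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    # Product of the three components collapses to 4*cov*mx*my / (...)
    return 4 * cov * mx * my / ((vx + vy) * (mx**2 + my**2))
```

For identical non-constant windows the index equals 1; any luminance, contrast, or correlation distortion pulls it below 1.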

Structural SIMilarity (SSIM) index

The Structural Similarity index (SSIM) proposed in [18] is a generalized form of the Universal Quality Index [17]. As above, $x$ and $y$ are discrete non-negative signals; $\mu_x$, $\sigma_x^2$, and $\sigma_{xy}$ are the mean value of $x$, the variance of $x$, and the covariance of $x$ and $y$, respectively (and likewise $\mu_y$, $\sigma_y^2$ for $y$). According to [18], the luminance, contrast, and structure comparison measures are given as follows:

$$l(x,y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \qquad (7)$$

$$c(x,y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \qquad (8)$$

$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} \qquad (9)$$

where $C_1$, $C_2$ and $C_3$ are small constants given by $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, and $C_3 = C_2/2$. Here $L$ is the dynamic range of the pixel values, and $K_1 \ll 1$ and $K_2 \ll 1$ are two scalar constants. The general form of the Structural SIMilarity (SSIM) index between signals $x$ and $y$ is defined as:

$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha} \cdot [c(x,y)]^{\beta} \cdot [s(x,y)]^{\gamma} \qquad (10)$$

where $\alpha$, $\beta$, and $\gamma$ are parameters that define the relative importance of the three components [18]. If $\alpha = \beta = \gamma = 1$, the resulting SSIM index is given by:

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (11)$$

SSIM attains its maximal value of 1 only when the two images coincide (i.e., $\mathrm{SSIM} \le 1$). The universal image quality index proposed in [17] corresponds to the case of $C_1 = C_2 = 0$ and is therefore a special case of Eq. (11).
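For illustration, here is Eq. (11) computed over whole images in NumPy (the published SSIM instead averages a local, Gaussian-windowed index; this simplified global variant and its name are mine):

```python
import numpy as np

def ssim_global(x, y, L=255, K1=0.01, K2=0.03):
    """Single-window SSIM of Eq. (11) with the default constants of [18]."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx**2 + my**2 + C1) * (vx + vy + C2))
```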

A drawback of the basic SSIM index is its sensitivity to relative translations, scalings and rotations of images [18]. To handle such situations, a wavelet-domain version of SSIM, called the complex wavelet SSIM (CW-SSIM) index, was developed [25]. The CW-SSIM index is also inspired by the fact that local phase contains more structural information than magnitude in natural images [26], while rigid translations of image structures lead to consistent phase shifts.

Despite its simplicity, the SSIM index performs remarkably well [18] across a wide variety of image and distortion types, as has been shown in extensive subjective studies [27].

Instead of conclusion

As it was said in [18], ``we hope to inspire signal processing engineers to rethink whether the MSE is truly the criterion of choice in their own theories and applications, and whether it is time to look for alternatives.'' And I think that such articles provide a great deal of precious information for deciding whether to give up the MSE.


Useful links:
A very good and brief survey of image quality metrics, with links to MATLAB examples. Zhou Wang's page with a huge amount of articles and MATLAB source code for the UQI and SSIM. Another useful link for HDR image quality metrics.


Bibliography


1
Aleksandr Shnayderman, Alexander Gusev, and Ahmet M. Eskicioglu.
A multidimensional image quality measure using singular value decomposition.
In Image Quality and System Performance. Edited by Miyake, Yoichi; Rasmussen, D. Rene. Proceedings of the SPIE, Volume 5294, pp. 82-92, 2003.
2
A. M. Eskicioglu and P. S. Fisher.
A survey of image quality measures for gray scale image compression.
In Proceedings of 1993 Space and Earth Science Data Compression Workshop, pp. 49-61, Snowbird, UT, April 2, 1993.
3
Zhou Wang, Alan C. Bovik, and Ligang Lu.
Why is image quality assessment so difficult?
In Proceedings of the ICASSP'02, vol. 4, pp. IV-3313-IV-3316, 2002.
4
W. K. Pratt.
Digital Image Processing.
John Wiley and Sons, Inc., USA, 1978.
5
J.R. Fienup.
Invariant error metrics for image reconstruction.
Applied Optics, vol. 36, no. 32:8352-8357, 1997.
6
J. L. Mannos and D. J. Sakrison.
The effects of a visual fidelity criterion on the encoding of images,.
IEEE Transactions on Information Theory, Vol. 20, No. 4:525-536, July 1974.
7
J. O. Limb.
Distortion criteria of the human viewer.
IEEE Transactions on Systems, Man, and Cybernetics, Vol. 9, No. 12:778-793, December 1979.
8
H. Marmolin.
Subjective MSE measures.
IEEE Transactions on Systems, Man, and Cybernetics, Vol. 16, No. 3:486-489, May/June 1986.
9
J. A. Saghri, P. S. Cheatham, and A. Habibi.
Image quality measure based on a human visual system model.
Optical Engineering, Vol. 28, No. 7:813-818, July 1989.
10
B. N. Norman and H. B. Brian.
Objective image quality measure derived from digital image power spectra.
Optical Engineering, 31(4):813-825, 1992.
11
A.A. Webster, C. T. Jones, M. H. Pinson, S. D. Voran, and S. Wolf.
An objective video quality assessment system based on human perception.
In Proceedings of SPIE, Vol. 1913, 1993.
12
T. N. Pappas and R. J. Safranek.
in book ``Handbook of Image and Video Processing'' (A.Bovik, ed.), chapter Perceptual criteria for image quality evaluation.
Academic Press, May 2000.
13
B. Girod.
in book Digital Images and Human Vision (A. B. Watson, ed.), chapter What's wrong with mean-squared error, pages 207-220.
the MIT press, 1993.
14
S. Daly.
The visible difference predictor: An algorithm for the assessment of image fidelity.
In in Proceedings of SPIE, vol. 1616, pp. 2-15, 1992.
15
A. B. Watson, J. Hu, and J. F. III. McGowan.
Digital video quality metric based on human vision.
Journal of Electronic Imaging, vol. 10, no. 1:20-29, 2001.
16
J.-B. Martens and L. Meesters.
Image dissimilarity.
Signal Processing, vol. 70:155-176, Nov. 1998.
17
Z. Wang and A.C. Bovik.
A universal image quality index.
IEEE Signal Processing Letters, vol. 9, no. 3:81-84, Mar. 2002.
18
Z. Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli.
Image quality assessment: From error visibility to structural similarity.
IEEE Transactions on Image Processing, vol. 13, no. 4:600-612, Apr. 2004.
19
Zhou Wang and Alan C. Bovik.
Mean squared error: Love it or leave it?
IEEE Signal Processing Magazine, vol. 26, no. 1:98-117, January 2009.
20
D.M. Chandler and S.S. Hemami.
VSNR: A wavelet-based visual signal-to-noise ratio for natural images.
IEEE Transactions on Image Processing, vol. 16, no. 9:2284-2298, Sept. 2007.
21
N. Wiener.
The extrapolation, interpolation and smoothing of stationary time series.
New York: Wiley, 163 p., 1949.
22
J.R. Fienup.
Refined wiener-helstrom image reconstruction.
Annual Meeting of the Optical Society of America, Long Beach, CA, October 18, 2001.
23
James R. Fienup, Douglas K. Griffith, L. Harrington, A. M. Kowalczyk, Jason J. Miller, and James A. Mooney.
Comparison of reconstruction algorithms for images from sparse-aperture systems.
In Proc. SPIE, Image Reconstruction from Incomplete Data II, volume 4792, pages 1-8, 2002.
24
D. Kahaner, C. Moler, and S. Nash.
Numerical Methods and Software.
Prentice-Hall, Inc., 1989.
25
Z. Wang and E.P. Simoncelli.
Translation insensitive image similarity in complex wavelet domain.
In Proceedings of IEEE International Conference of Acoustics, Speech, and Signal Processing, pp. 573-576., Mar. 2005.
26
T.S. Huang, J.W. Burdett, and A.G. Deczky.
The importance of phase in image processing filters.
IEEE Transactions on Acoustic, Speech, and Signal Processing, vol. 23, no. 6:529-542, Dec. 1975.
27
H.R. Sheikh, M.F. Sabir, and A.C. Bovik.
A statistical evaluation of recent full reference image quality assessment algorithms.
IEEE Transactions on Image Processing, vol. 15, no. 11:3449-3451, Nov. 2006.

Monday, March 16, 2009

Interesting facts about snakes' vision

The more I learn about the vision systems of animals, the more I think it is advisable for specialists in artificial imaging to read and study biology and biophysics. So the main topic of this post is snakes, which have an ability of thermal vision.

Not all snakes have the ability of heat vision, but some groups of pythons and rattlesnakes can see both in the visible and in the far-IR band [1]. Snakes use infra-red radiation with wavelengths centred on 10 micrometers (the wavelength of radiation emitted by warm-blooded animals). As it was written in [1],

certain groups of snakes do what no other animals or artificial devices can do. They form detailed images of extremely small heat signatures. What is most fascinating is that they do this with receptors that are microscopic in size, extraordinarily sensitive, uncooled, and are able to repair themselves. Snake infra-red imagers are at least 10 times more sensitive than the best artificial infra-red sensors...[1]

Several papers give us a better understanding of how snakes can actually see and attack prey using only heat vision. A brief survey of articles devoted to snakes' vision, as well as some thoughts, is given below.

How does the snake see?

The detection system, which consists of cavities located on each side of the head called pit organs, operates on a principle similar to that of a pinhole camera [2]. Pit vipers and boids, the two snake types that possess this ability, have heat-sensitive membranes that can detect the difference in temperature between a moving prey and its surroundings on the scale of mK. If the radiation intensity hitting the membrane at some point is larger than the emitted thermal radiation of the membrane itself, the membrane heats up at that location [2]. The picture of such cavities is presented in Fig. 1.

Figure 1: Snake's heat vision: a) head of a pit viper with nostril, pit hole, and eye, left to right. Photograph courtesy of Guido Westhoff; b) A pit viper's infra-red-sensitive pit organ works like a pinhole camera. The image from the paper [2].


According to the Planck radiation law as an approximation of the emitted heat intensity, 99% of the radiation is emitted at wavelengths under 75 micrometers, and the radiation intensity is maximal at 9.5 micrometers [3], which is within the 8-12 micrometer IR atmospheric transmittance window [4].

Because the pit hole is very large compared to the membrane size, the radiation strikes many points. The optical quality of the infra-red vision is much too blurry to allow snakes to strike prey with the observed accuracy of about 5 degrees. Most fascinating are the number of heat-sensitive sensors and their precision:

In pit vipers, which have only two pit holes (one in front of each eye), a block of about 1600 sensory cells lie on a membrane which has a field of view of about 100 degrees . This means the snake's brain would receive an image resolution of about 2.5 degrees for point-like objects, such as eyes, which are one of the hottest points on mammals... [2]
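The quoted 2.5-degree figure follows directly from the cell count and the field of view; a back-of-envelope check (variable names are mine):

```python
import math

cells = 1600         # sensory cells on the pit membrane (from [2])
fov_deg = 100        # membrane field of view in degrees (from [2])

cells_per_axis = math.sqrt(cells)          # ~40 cells along one axis
resolution_deg = fov_deg / cells_per_axis  # angular spacing per cell
print(resolution_deg)  # 2.5
```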
If the aperture was very small, the amount of energy per unit time (second) reaching the membrane would also be small. The need to gather a reasonable amount of thermal energy per second necessitates the ``pinhole'' of the pit organ to be very large, thus greatly reducing its optical performance. If on the other hand the aperture of the organ is large, the image of a point source of heat is disc-shaped rather than point-like. Since, however, the size of the disc-shaped image may be determined by the detectors on the membrane, it is still possible to tell from which direction the radiation comes, ensuring directional sensitivity of the system [3]. The aperture size was probably an evolutionary trade-off between image sharpness and radiant flux [2]. Although the image that is formed on the pit membrane has a very low quality, the information that is needed to reconstruct the original temperature distribution in space is still available [3].

So how could a snake possibly use such poorly focused IR input to find its prey in darkness with a surprising angular precision of 5 degrees? How might the snake extract information about the location of the prey from the blurred image formed on the pit membrane?


What does the snake see?

Without the ability of real-time imaging, the IR organ would be of little use to the snake. So Dr. van Hemmen showed that it is possible to reconstruct the original heat distribution using the blurred image on the membrane [3].

The image on the membrane resulting from the total heat distribution in space will be some complicated shape that consists of the superposition of the contributions of all heat sources [3]. A superposition of edge detectors in the brain can then reconstruct the heat distribution by using the whole image on the membrane for each point in space to be reconstructed. Reconstruction is thus possible because the information is still available in the blurred image on the pit membrane, where the receptors are [2]. As a demonstration of the model, a sample image (see Fig. 2) was used.


Figure 2: The famous hare by Durer (left) was converted into 8- bit gray levels at a resolution of 32x32 (right). The image from the paper [2].

Since a snake has limited computational resources (all ``calculations'' must be realizable in neuronal ``hardware'') the reconstruction model must be simple. Our model [5] thus uses only one computational step (it is noniterative) to estimate the input image from the measured response on the pit membrane. It resembles a Wiener filter and is akin to, but different from, some of the algorithms used in image reconstruction [6].
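To give a flavour of such a one-pass, Wiener-like estimator (this is not the authors' actual neuronal model; the PSF handling, noise-to-signal ratio, and names below are my own illustrative choices):

```python
import numpy as np

def wiener_reconstruct(blurred, psf, nsr=0.01):
    """One-pass (non-iterative) Fourier-domain Wiener-style deconvolution.

    blurred: observed (blurred) image; psf: blur kernel placed at the
    origin; nsr: assumed noise-to-signal power ratio.
    """
    H = np.fft.fft2(psf, s=blurred.shape)        # transfer function
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)      # Wiener-type filter
    return np.real(np.fft.ifft2(W * G))
```

The whole estimate is a single filtering step, which is the point of the analogy: the snake's reconstruction must also be realizable as one pass through fixed neuronal "hardware".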

So it is highly remarkable that snakes can perform some kind of an image processing, like our artificial devices based on ``wavefront coding''[7,8] and ``pupil engineering''[9,10] techniques.


Image processing in nature

A neuronal algorithm was developed [11] that accurately reconstructs the heat image from the membrane. The most vital requirements are accurate detectors and the ability to detect edges in the images produced on the pit membrane [2]. That is similar to the situation with ``wavefront coding'' devices: the dynamic range and accuracy of the ADC are much more important than the number of elements.

I would like to introduce an analogy here: such imaging is like drawing a picture in sand. The finer the sand, the more accurate and delicate the pictures one can draw; that is the case of a detector with high dynamic range. And vice versa: in coarse and stony sand it is difficult to draw a fine tracery; that is the case of a detector with low dynamic range [12,13].

But let us get back to the model of snakes' vision:

The model has a fairly high noise tolerance. For input noise levels up to 50%, the hare is recognizable. Sensitivity to measurement errors is larger. In our calculations, one pixel of the reconstructed image corresponds to about 3 degrees . For detector noise levels up to about 1% of the membrane heat intensity, a good reconstruction is possible, meaning that the edge of the hare may be determined with about one pixel accuracy. At detector noise levels beyond about 1%, the image is not so easily recognizable, but the presence of an object is still evident...[5]

The assumptions that went into the calculations are a ``worst case scenario''. For instance, we assumed [3] that the input to the pit organ is totally uncorrelated, meaning that the snake has no idea what heat distribution to expect. In reality, important information about the environment is always available. For example, typical temperature and size of a prey animal may be encoded in the neuronal processing structure. If the snake ``knows'' what kind of images to expect, the reconstruction process can be enhanced considerably [3].

How does the reconstruction matrix become imprinted on the snake's neural circuitry in the first place? ``It can't be genetic coding,'' says van Hemmen. ``The snake would need a suitcase full of genes to encode such detail. Besides we know that snakes ...need a season of actual learning, not just anatomical maturation, to acquire their extraordinary skills.''... [11]

Fig. 3 shows deconvolution results that give us a sense of the capabilities of snake vision.


Figure 3: On the left, this figure displays the membrane heat intensity as captured by the ``pithole camera''. On the right are reconstructions for four different membrane noise levels. The pit membrane was taken as a flat square containing 41x41 receptors. The model works equally well if applied to other membrane shapes. The membrane noise term was taken to be Gaussian with SIGMA= 25, 100, 200, and 500 from left to right and top to bottom, corresponding to 0.25%, 1%, 2%, and 5% of the maximal membrane intensity. The image from the paper [2]

Ultimately, a snake's ability to utilize information from the pit organs depends on its capability to detect edges in the image produced on the pit membrane. If the snake performed no reconstruction, but instead simply targeted bloblike ``hot spots'' on the membrane, it would still have to be able to discern the edge of the blob. The present model performs edge detection for all spatial positions and hence automatically creates a full reconstruction. A level of neuronal processing beyond what is represented in our model is unlikely to be beneficial since the quality of the system is fundamentally limited by the relatively small number of heat receptors.[5]

Conclusion

Snakes' heat vision presents such a clear image when reconstructed that it surpasses even many human-made devices: it is far better than any technical uncooled infra-red camera with a similar number of detector cells [2].

Bibliography


1
Liz Tottenham.
Infrared imaging research targets 'snake vision'.
web publication - Discovery: Florida Tech, DE-402-901:4-5, 2002.
2
Lisa Zyga.
Snakes' heat vision enables accurate attacks on prey.
PhysOrg.com, www.physorg.com/news76249412.html, page 2, 2006.
3
Andreas B. Sichert, Paul Friedel, and J. Leo van Hemmen.
Modelling imaging performance of snake infrared sense.
In Proceedings of the 13th Congress of the Societas Europaea Herpetologica. pp. 219-223; M. Vences, J. Kohler, T. Ziegler, W. Bohme (eds): Herpetologia Bonnensis II., 2006.
4
David A. Allen.
Infrared: The New Astronomy.
1975.
5
Andreas B. Sichert, Paul Friedel, and J. Leo van Hemmen.
Snake's perspective on heat: Reconstruction of input using an imperfect detection system.
Physical Review Letters, 97:068105, 2006.
6
R. C. Puetter, T. R. Gosnell, and Amos Yahil.
Digital image reconstruction: Deblurring and denoising.
Annu. Rev. Astron. Astrophys., 43:139, 2005.
7
J. van der Gracht, E.R. Dowski, M. Taylor, and D. Deaver.
New paradigm for imaging systems.
Optics Letters, Vol. 21, No 13:919-921, July 1, 1996.
8
Jr. Edward R. Dowski and Gregory E. Johnson.
Wavefront coding: a modern method of achieving high-performance and/or low-cost imaging systems.
In Proc. SPIE, Current Developments in Optical Design and Optical Engineering VIII, volume 3779, pages 137-145, 1999.
9
R. J. Plemmons, M. Horvath, E. Leonhardt, V. P. Pauca, S. Prasad, S. B. Robinson, H. Setty, T. C. Torgersen, J. van der Gracht, E. Dowski, R. Narayanswamy, and P. E. X. Silveira.
Computational imaging systems for iris recognition.
In Proc. SPIE, Advanced Signal Processing Algorithms, Architectures, and Implementations XIV, volume 5559, pages 346-357, 2004.
10
Sudhakar Prasad, Todd C. Torgersen, Victor P. Pauca, Robert J. Plemmons, and Joseph van der Gracht.
Engineering the pupil phase to improve image quality.
In Proc. SPIE, Visual Information Processing XII, volume 5108, pages 1-12, 2003.
11
Bertram Schwarzschild.
Neural-network model may explain the surprisingly good infrared vision of snakes.
Physics Today, pages 18-20, September 2006.
12
Konnik M.V.
Image's linearization from commercial cameras used in optical-digital systems with optical coding.
In Proceedings of 5th International Conference of young scientists ``Optics-2007'', Saint-Petersburg, pages 354-355, 2007.
13
M.V. Konnik, E.A. Manykin, and S.N. Starikov.
Increasing linear dynamic range of commercial digital photocamera used in imaging systems with optical coding.
In OSAV'2008 Topical meeting, Saint-Petersburg, Russia, 2008.

