Monday, May 16, 2011

A note on Waffle Modes

Wavefront sensors are insensitive to certain phase aberration patterns, collectively called waffle. The Shack-Hartmann (SH) sensor in particular suffers from blind or null modes: wavefront functions that yield zero or very small response in the SH WFS output[1].

The square Fried geometry[2] used in most Shack-Hartmann-based adaptive optics (AO) systems is insensitive to a checkerboard-like pattern of phase error called waffle. This zero-mean-slope phase error is low over one set of WFS sub-apertures, arranged as the black squares of a checkerboard, and high over the white squares.

Waffle mode is essentially an astigmatic pattern repeated at the period of the SH sampling over the entire aperture. This wavefront has zero average slope over every sub-aperture and hence produces zero SH WFS output. The waffle pattern on the DM is illustrated in Fig. 1.
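This invisibility is easy to check numerically. The following MATLAB sketch (names and grid size are illustrative) builds a unit-amplitude checkerboard phase and computes Fried-geometry slopes as the average of the two first differences across each square of four phase points; every slope comes out exactly zero:

N = 8;
[m, n] = meshgrid(0:N-1);
phi = (-1).^(m + n);   % waffle phase: unit-amplitude checkerboard
% Fried geometry: each slope is the average of the two first differences
% on the corresponding square of four phase points.
sx = 0.5*((phi(2:N,1:N-1) - phi(1:N-1,1:N-1)) + (phi(2:N,2:N) - phi(1:N-1,2:N)));
sy = 0.5*((phi(1:N-1,2:N) - phi(1:N-1,1:N-1)) + (phi(2:N,2:N) - phi(2:N,1:N-1)));
disp(max(abs([sx(:); sy(:)])))   % prints 0: the sensor returns no signal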

Figure 1: Waffle mode on the DM [the picture from[3]]


Efforts have been made to understand how to control this and other ``blind'' reconstructor modes[1,4,5], and advances in wavefront reconstruction approaches have improved AO system performance considerably. Several physical processes excite waffle: misregistration, high noise levels on the WFS photosensor, and so on.

Atmospheric turbulence and the Waffle modes

The initial uncertainty of the atmospheric phase[1] is given by Kolmogorov statistics[6,7]. The atmosphere is unlikely to contain strong high-spatial-frequency components relative to low-spatial-frequency ones: atmospheric turbulence is fractal-like and has much more power at large spatial scales, so the idea that the atmospheric wavefront can have big swings between adjacent phase points is physically unrealistic. This acts fortuitously to suppress waffle, which is a high-spatial-frequency behaviour.
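For reference (a standard result, not taken from the papers cited above), the steep fall-off can be read directly from the Kolmogorov phase power spectral density:

$$\Phi_\phi(\kappa) = 0.023\, r_0^{-5/3}\, \kappa^{-11/3},$$

where $r_0$ is the Fried parameter and $\kappa$ the spatial frequency. The $\kappa^{-11/3}$ law leaves very little atmospheric power at the waffle spatial frequency, which sits at the very edge of the sensed band.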

Waffle modes during wavefront sensing

The first step is to define the phase differences in terms of the phase measurements. Most gradient and curvature wavefront sensors do not sense all possible modes. For instance, the Fried geometry cannot measure the so-called waffle mode, which occurs when the actuators are moved in a checkerboard pattern: all white squares change position while the black squares remain stationary. This pattern can build up over time; it is reduced by using different actuator geometries or by filtering the control signals so as to suppress this mode.

The curvature WFS is not sensitive to waffle-mode error, although it has its own particular null space. Curvature sensors are typically not used for AO systems with many sensing channels because of their noise-propagation properties[8].

We stress here that waffle mode error does not depend fundamentally on the actuator location and geometry: it only depends on the sensor geometry[9].

Waffle modes in Deformable Mirrors

The best DMs use a quasi-hexagonal actuator geometry, which reduces waffle-pattern noise. In continuous-facesheet mirrors the actuators push and pull on the surface. Since the surface is continuous, there is some mechanical crosstalk from one actuator to the next; this is the so-called influence function. It can, however, be controlled in software within the control computer. Because most DMs have regular arrays of actuators, in either square or hexagonal geometries, the alternate pushing and pulling of adjacent actuators can impart a patterned surface that resembles a waffle. This ``waffle mode'' can appear in images as a regular pattern that acts as an unwanted diffraction grating[10].

Removing the waffle modes

Gavel[1] proposed a weighting approach that actually changes the mode-space structure. An actuator weighting is introduced, but instead of a scalar regularisation parameter a positive-definite matrix is used. A priori statistical information about the quantities to be determined and about the measurement noise can be used to set the weighting matrices in the cost-function terms.

The technique penalises all local waffle behaviour through the weighting matrix. It was found[1] that only a modification of the cost function by a penalty term on the actuators actually modifies the mode space. With a suitable actuator weighting matrix, waffle behaviour can be sorted into modes that have the least influence in the reconstruction matrix, thus suppressing waffle in the wavefront solutions.

The problem is that when waffle modes go unsensed, they are not corrected by the DM and can accumulate during closed-loop operation, leading to serious performance problems. The simplest remedy is for the adaptive-optics operator to open the feedback loop in order to clear the DM commands and remove the waffle.

The waffle pattern accumulates in a closed-loop feedback system because the sensor cannot sense the mode and therefore cannot compensate it. Spatially filtering the DM commands makes it possible to keep a conventional reconstructor while still being able to mitigate the waffle mode when it occurs[3]. In those experiments the authors used a leaky integrator.

The paper[3] analyses a technique of sensing the waffle mode in the deformable-mirror commands and applying a spatial filter to those commands in order to mitigate it. Directly filtering the DM commands preserves the reconstruction of the high-frequency phase of interest while retaining the ability to remove the waffle-like pattern when it arises[3].

The paper[3] presents a technique that alleviates the waffle only when it is detected and does not affect the reconstruction of the phase while the mode has not yet accumulated to a degree that degrades system performance.


Bibliography


[1] D.T. Gavel. Suppressing anomalous localized waffle behavior in least-squares wavefront reconstructors. Proc. SPIE, 4839:972, 2003.

[2] D.L. Fried. Least-square fitting a wave-front distortion estimate to an array of phase-difference measurements. JOSA, 67(3):370-375, 1977.

[3] K.P. Vitayaudom, D.J. Sanchez, D.W. Oesch, P.R. Kelly, C.M. Tewksbury-Christle, and J.C. Smith. Dynamic spatial filtering of deformable mirror commands for mitigation of the waffle mode. In SPIE Optics & Photonics Conference, Advanced Wavefront Control: Methods, Devices, and Applications VII, Vol. 7466:746609, 2009.

[4] L.A. Poyneer. Advanced techniques for Fourier transform wavefront reconstruction. Proc. SPIE, 4839:1023, 2003.

[5] R.B. Makidon, A. Sivaramakrishnan, L.C. Roberts Jr, B.R. Oppenheimer, and J.R. Graham. Waffle mode error in the AEOS adaptive optics point-spread function. Proc. SPIE, 4860:315, 2003.

[6] A.N. Kolmogorov. Dissipation of energy in the locally isotropic turbulence. Proceedings: Mathematical and Physical Sciences, 434(1890):15-17, 1991.

[7] A.N. Kolmogorov. The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers. Proceedings: Mathematical and Physical Sciences, 434(1890):9-13, 1991.

[8] A. Glindemann, S. Hippler, T. Berkefeld, and W. Hackenberg. Adaptive optics on large telescopes. Experimental Astronomy, 10(1):5-47, 2000.

[9] R.B. Makidon, A. Sivaramakrishnan, M.D. Perrin, L.C. Roberts Jr, B.R. Oppenheimer, R. Soummer, and J.R. Graham. An analysis of fundamental waffle mode in early AEOS adaptive optics images. Publications of the Astronomical Society of the Pacific, 117:831-846, August 2005.

[10] R.K. Tyson. Adaptive optics engineering handbook. CRC Press, 2000.

Monday, May 9, 2011

A note on speed comparison of wavefront reconstruction approaches

The two main reconstruction geometries are the Hudgin and the Fried geometry. In the Hudgin geometry, gradients are first differences between neighbouring actuators\cite{poyneer2003advanced}. In the Fried geometry, gradients are the average of the two first differences on a square. Many adaptive optics systems use vector-matrix-multiply (VMM) reconstructors to convert gradient measurements to wavefront phase estimates. The computation time of VMM reconstruction scales as $O(n^2)$\cite{PoyneerFastFFTreconWFR}.

Wavefront reconstruction by means of the FFT was proposed by Freischlad and Koliopoulos\cite{freischlad1985wavefront} for square apertures in the Hudgin geometry. In a further paper\cite{FreischladFFTreconWFR} the authors derived methods for additional geometries, including the Fried geometry, which uses one Shack-Hartmann (SH) sensor. Freischlad also considered the case of small circular apertures\cite{freischladwfrfft}, where the boundary problem was identified.


Computational speed comparison
The computational speed of Hudgin-FT and Fried-FT is limited only by the FFT\cite{poyneer2003advanced}; the extra processing to solve the boundary problem is of a lower order of growth. FFT implementations therefore have computational costs that scale as $O(n \log n)$. However, the Fried-FT implementation requires up to twice as much total computation as the Hudgin-FT. For a $64 \times 64$ grid, the FFT can be computed on currently available systems in around 1 ms\cite{poyneer2003advanced}. If $N$ is a power of two, the spatial-filter operations can be implemented with FFTs very efficiently: the computational requirements then scale as $O(N^2 \log_2 N)$ rather than the $O(N^4)$ of the direct vector-matrix-multiply approach. The modified Hudgin geometry takes half as much computation as the Fried geometry model\cite{poyneer2003advanced}.
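The scaling difference is easy to observe directly. Below is a rough MATLAB timing sketch (illustrative only: the random arrays merely stand in for a real slope signal and a real dense VMM reconstructor):

N = 64;
S = randn(N);                     % stand-in for an N-by-N slope signal
R = randn(N^2);                   % stand-in for a dense VMM reconstructor
tic; for k = 1:100, P1 = real(ifft2(fft2(S))); end; t_fft = toc/100;   % trivial identity filter
tic; for k = 1:100, P2 = reshape(R*S(:), N, N); end; t_vmm = toc/100;
fprintf('FFT: %.2g s, VMM: %.2g s per reconstruction\n', t_fft, t_vmm);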


Comparison of FFT WFR and Zernike reconstruction speed

A very interesting paper appeared in the Journal of Refractive Surgery\cite{dai2006comparison}, comparing Fourier and Zernike reconstructions. In the paper\cite{dai2006comparison}, noise-free random wavefronts were simulated with up to the 15th order of Zernike polynomials. Fourier full reconstruction was more accurate than Zernike reconstruction from the 6th to the 10th orders for low-to-moderate noise levels, and Fourier reconstruction was found to be approximately \textbf{100 times faster than Zernike reconstruction}. For Zernike reconstruction, however, the optimal number of orders must be chosen manually; the optimal Zernike order is lower for smaller pupils than for larger pupils. The paper\cite{dai2006comparison} concludes that the FFT WFR is faster and more accurate than Zernike reconstruction, makes optimal use of slope information, and better represents the ocular aberrations of highly aberrated eyes.


Noise propagation

Analysis and simulation show that for apertures just smaller than the square reconstruction grid (the DFT case), the noise propagation of the FT methods is favourable. For the Hudgin geometry, the noise propagator grows as $O(\ln n)$. For the Fried geometry, the noise propagator is best fit by a curve quadratic in $\ln n$, i.e. $O(\ln^2 n)$. For fixed power-of-two grid sizes (required to obtain the speed of the FFT for all aperture sizes) the noise propagator becomes worse when the aperture is much smaller than the grid\cite{poyneer2003advanced}.


Shack-Hartmann sensor gain

The Shack-Hartmann WFS produces a measurement which deviates from the exact wavefront slope. The exact shape of this response curve depends on the number of pixels used per sub-aperture and on the centroid computation method (see \cite{hardyAObook}, section 5.3.1, for a representative set of response curves). The most important feature of the response curve is that even within the linear response range the gain of the sensor is not unity\cite{poyneer2003advanced}. This gain matters in open loop; in closed loop the problem is mitigated by the overall control-loop gain, which can be adjusted instead.

References:

\begin{thebibliography}{1}

\bibitem{poyneer2003advanced} L.A. Poyneer. \newblock {Advanced techniques for Fourier transform wavefront reconstruction}. \newblock In {\em Proceedings of SPIE}, volume 4839, page 1023, 2003.

\bibitem{PoyneerFastFFTreconWFR} Lisa~A. Poyneer, Donald~T. Gavel, and James~M. Brase. \newblock Fast wave-front reconstruction in large adaptive optics systems with use of the Fourier transform. \newblock {\em J. Opt. Soc. Am. A}, 19(10):2100--2111, Oct 2002.

\bibitem{freischlad1985wavefront} K.~Freischlad and C.L. Koliopoulos. \newblock {Wavefront reconstruction from noisy slope or difference data using the discrete Fourier transform}. \newblock In {\em Proceedings of SPIE}, volume 551, pages 74--80, 1985.

\bibitem{FreischladFFTreconWFR} Klaus~R. Freischlad and Chris~L. Koliopoulos. \newblock Modal estimation of a wave front from difference measurements using the discrete Fourier transform. \newblock {\em J. Opt. Soc. Am. A}, 3(11):1852--1861, Nov 1986.

\bibitem{freischladwfrfft} Klaus~R. Freischlad. \newblock Wave-front integration from difference data. \newblock {\em Interferometry: Techniques and Analysis}, 1755(1):212--218, 1993.

\bibitem{dai2006comparison} G.~Dai. \newblock {Comparison of wavefront reconstructions with Zernike polynomials and Fourier transforms}. \newblock {\em Journal of Refractive Surgery}, 22(9):943--948, 2006.

\bibitem{hardyAObook} John~W. Hardy. \newblock {\em Adaptive optics for astronomical telescopes}. \newblock Oxford University Press, USA, 1998.

\end{thebibliography}


Monday, April 25, 2011

Reconstruction of the wavefront: Boundary problems on a circular aperture

The ``boundary problem'' leads to large uncorrectable errors: if the slopes in the aperture are simply zero-padded, large errors occur across the aperture. An example of these errors can be seen below, where the wavefront is reconstructed with the FFT WFR modified-Hudgin method on a circular aperture with a central obscuration.
Reconstruction of the wavefront using the FFT WFR modified-Hudgin method on a circular aperture with central obscuration (outer radius 0.9, inner radius 0.1).


In the case of astronomical telescopes, the gradients are typically available on a circular aperture. The measurement data cannot simply be zero-padded, as this leads to large errors at the boundaries, and such errors do not decrease with system size\cite{PoyneerFastFFTreconWFR}.

Because of the spatial-periodicity and closed-path-loop conditions (see below), zero-padding the gradient measurements is incorrect: doing so violates both conditions in general\cite{PoyneerFastFFTreconWFR}. These inconsistencies manifest themselves as errors that span the aperture. The errors do not become less significant as the aperture size increases: unlike the square-aperture case, the amplitude of the error remains large and spans the circular aperture.



Assumptions in the wavefront reconstruction that are not always true
There are two key assumptions in the wavefront reconstruction that must be satisfied for the algorithm to work properly\cite{PoyneerFastFFTreconWFR}:
  1. the gradients are spatially periodic (necessary for use of the DFT method, and it must be maintained for a set of gradient measurements): check that the sum of every row or column in the $N\times N$ gradient signal equals zero (a sketch of this check is given below);
  2. any closed path of gradients must sum to zero (based on the modelling of the gradients as first differences).
Hence two conditions must be satisfied for correct reconstruction:
  1. all loops (under the Hudgin or Fried geometry) must sum to zero;
  2. both slope signals must be spatially periodic (for the DFT).
This spatial-periodicity assumption does not require that the sensed wavefront be inherently periodic\cite{PoyneerFastFFTreconWFR}. Just as VMM reconstructors estimate only the phase values inside the aperture, FT methods accurately reconstruct on grids of aperture size.
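A minimal MATLAB sketch of the periodicity check in condition 1 (an assumption-level sketch: sx and sy hold the measured $N\times N$ slope signals, with rows of sx taken to run along the x direction):

tol = 1e-10;                            % numerical tolerance for "zero"
rows_ok = all(abs(sum(sx, 2)) < tol);   % each row of x-slopes must sum to zero
cols_ok = all(abs(sum(sy, 1)) < tol);   % each column of y-slopes must sum to zero
if ~(rows_ok && cols_ok)
    warning('Gradient set violates spatial periodicity: DFT model inconsistent.');
end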

The boundary problem is solved by using specific methods to make the gradient sets consistent\cite{poyneer2003advanced}. These methods set the values of specific gradients outside the aperture in a way that guarantees correct reconstruction inside the aperture. With these methods, FT reconstructors accurately reconstruct all sensed modes inside the aperture.


Solutions to the boundary problem: boundary method
The first method is the boundary method: it estimates the gradients that cross the boundary of the aperture\cite{PoyneerFastFFTreconWFR}. It follows directly from the division of the gradients into inside, boundary, and outside gradients. Only the inside gradients are known from the measurement. The outside gradients can all be set to zero, leaving the boundary gradients undetermined. A loop-continuity equation can be written for each of the two smallest loops that involve a boundary gradient.
Boundary method: setting each closed loop across the aperture edge to zero yields an equation relating the unknown boundary gradients to the measured inside gradients and the zeroed outside gradients [picture from the paper \cite{PoyneerFastFFTreconWFR}].

All of these loop continuity equations involving the boundary gradients combine to form a linear system:

$\mathbf{M}u = c$

Here the matrix $\mathbf{M}$ is specific to the selected geometry, $u$ is the vector of all boundary gradients, and $c$ is a vector containing sums of measured gradients (the combination of gradients is fixed, but their values depend on the actual measurement\cite{PoyneerFastFFTreconWFR}). In the case of noisy measurements, the solution can be found using the pseudoinverse of $\mathbf{M}$.
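In MATLAB this least-squares solve is a one-liner, and since $\mathbf{M}$ depends only on the aperture geometry its pseudoinverse can be precomputed once (a sketch, with M and c as defined above):

Mp = pinv(M);   % precompute once: M is fixed by the chosen geometry
u  = Mp * c;    % boundary gradients; least-squares optimal for noisy c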


Solutions to the boundary problem: extension method
The second method is the extension method: it extends the gradients from inside the aperture outwards, effectively extending the wavefront shape beyond the aperture while preserving loop continuity\cite{PoyneerFastFFTreconWFR}. The x gradients are extended up and down out of the aperture, while the y gradients are extended to the left and the right.


Extension method, shown for $N=6$. The values of the gradients closest to the aperture edge are repeated outside the aperture [picture from the paper \cite{PoyneerFastFFTreconWFR}].


The extension method produces a completely consistent set of gradients, providing perfect reconstruction of the phase (up to piston) when there is no noise. If there is noise, the same procedure is applied, though loop continuity will not hold on loops involving the seam gradients, just as the boundary gradients in the boundary method were the best, but not an exact, solution under noise\cite{PoyneerFastFFTreconWFR}.
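As a sketch of the extension step, simplified to a square aperture occupying rows top through bot of the $N\times N$ grid (top, bot and the rows-along-y indexing are assumptions of this illustration, not part of the paper's notation):

sx(1:top-1, :) = repmat(sx(top, :), top-1, 1);   % copy the edge row upward
sx(bot+1:N, :) = repmat(sx(bot, :), N-bot, 1);   % copy the edge row downward
% The y-slopes sy are extended analogously, column-wise to the left and right.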

References:

\begin{thebibliography}{1}

\bibitem{PoyneerFastFFTreconWFR} Lisa~A. Poyneer, Donald~T. Gavel, and James~M. Brase. \newblock Fast wave-front reconstruction in large adaptive optics systems with use of the Fourier transform. \newblock {\em J. Opt. Soc. Am. A}, 19(10):2100--2111, Oct 2002.

\bibitem{poyneer2003advanced} L.A. Poyneer. \newblock {Advanced techniques for Fourier transform wavefront reconstruction}. \newblock In {\em Proceedings of SPIE}, volume 4839, page 1023, 2003.

\end{thebibliography}

Monday, April 18, 2011

Simple model of ADC, noise in ADCs, and ADC non-linearity

An analogue-to-digital converter (ADC) transforms a continuous quantity (a voltage or current) into discrete-time digital numbers proportional to the magnitude of that quantity. Most ADCs convert an input voltage to a digital word, but the definition of an ADC includes the possibility of an input current.

An ADC has an analogue reference voltage $V_{REF}$ (or reference current) against which the analogue input is compared. The digital output tells us what fraction of the reference the input is. So for an N-bit ADC there are $2^N$ possible output codes, and the difference between adjacent output codes is $\frac{V_{REF}}{2^N}$.

Resolution of ADC
Every time the input voltage increases by $V_{REF}/2^N$ volts, the output code increases by one. This means that the least significant bit (LSB) represents $V_{REF}/2^N$ volts, which is the smallest increment the ADC can actually \textit{resolve}. One can therefore say that the \textit{resolution} of this converter is $V_{REF}/2^N$, since it can resolve voltages as small as that value. Resolution may also be stated in bits.

Because the resolution is $V_{REF}/2^N$, better accuracy (lower error) can be achieved in two ways: 1) use a higher-resolution ADC and/or 2) use a smaller reference voltage. The problem with a high-resolution ADC is cost. Moreover, the higher the resolution (the smaller $V_{REF}/2^N$), the harder it is to detect a small signal, as it becomes lost in the noise, reducing the SNR performance of the ADC. The problem with reducing $V_{REF}$ is a loss of input dynamic range.

Hence the ADC's resolution indicates the number of discrete values that can be produced over the range of analogue values and can be expressed as:
$$ K_{ADC} = (V_ \mathrm {REF} - V_ \mathrm {min})/N_{max}$$

where $V_ \mathrm {REF}$ is maximum (reference) voltage that can be quantified, $V_ \mathrm {min}$ is minimum quantifiable voltage, and $N_{max} = 2^M$ is the number of voltage intervals ($M$ is ADC's resolution in bits). The ADC's output code can be represented as:

$$ADC_ \mathrm {Code} = \textrm{round}\left[ (V_ \mathrm {input}-V_ \mathrm {min})/K_{ADC} \right]$$

The lower the reference voltage $V_{REF}$, the smaller the range of voltages that can be measured, but with greater accuracy. This is a common way to get better precision from an ADC without buying a more expensive one with higher resolution.
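As a quick numeric sketch of the two formulas above (all values hypothetical: a 10-bit ADC with $V_{REF} = 5$ V and $V_{min} = 0$ V):

M    = 10;  Vref = 5;  Vmin = 0;        % hypothetical converter
Kadc = (Vref - Vmin) / 2^M;             % LSB size: about 4.88 mV
Vin  = 1.234;                           % input voltage to convert
code = round((Vin - Vmin) / Kadc);      % quantised output code: 253
code = min(max(code, 0), 2^M - 1);      % clamp to the valid code range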



Quantisation error
As the input voltage increases towards $V_{REF}/2^N$, the output remains at zero because a range of input voltages is represented by a single output code. When the input reaches $V_{REF}/2^N$, the output code changes from 000 to 001.

The maximum error in an ADC is 1 LSB. This 0 to 1 LSB range is known as the ``quantisation uncertainty'': there is a range of analogue input values that could have caused any given code.


Linearity and Linearity errors
The accuracy of analogue-to-digital conversion has an impact on overall system quality and efficiency. To improve accuracy, you need to understand the errors associated with the ADC and the parameters affecting them.


Differential Linearity Error

Differential Linearity Error (DLE) describes the error in the step size of an ADC: it is the maximum deviation between the actual steps and the ideal steps (code width). Here ``ideal steps'' are defined not by the ideal transfer curve but by the resolution of the ADC. The input code width is the range of input values that produces the same digital output code.

Ideally, an analogue input change of 1 LSB should cause a change in the digital code. If an analogue input change of more than 1 LSB is required for a change in digital code, the ADC has differential linearity error.



Here each input step should be precisely 1/8 of the reference voltage. The first code transition, from 000 to 001, is caused by an input change of 1 LSB, as it should be. The second transition, from 001 to 010, has an input change of 1.2 LSB, too large by 0.2 LSB. The input change for the third transition is exactly the right size. The digital output remains constant when the input voltage changes from 4 LSB to 5 LSB, so the code 101 can never appear at the output.

When no value of input voltage will produce a given output code, that code is missing from the ADC transfer function. Many ADC data sheets specify ``no missing codes''; this can be critical in some applications, such as servo systems.



Integral Linearity Error
Integral Linearity Error (ILE) describes the maximum deviation of the ADC transfer function from a linear transfer curve (i.e. from a straight line between two points along the input-output transfer curve). It is a measure of the straightness of the transfer function and can be greater than the differential non-linearity. In specifications, an ADC is sometimes described as being ``x bits linear.''


The ADC input is usually connected to an operational amplifier, perhaps a summing or differential amplifier, which are linear circuits processing an analogue signal. As the ADC is included in the signal chain, we would like the same linearity to be maintained at the ADC level as well. However, inherent technological limitations make the ADC non-linear to some extent, and this is where the ILE comes into play.

For each voltage at the ADC input there is a corresponding word at the ADC output. If the ADC were ideal, the steps would be perfectly superimposed on a straight line. But most real ADCs exhibit deviation from the straight line, which can be expressed as a percentage of the reference voltage or in LSBs.

The ILE is important because it cannot be calibrated out. Moreover, ADC non-linearity is often unpredictable: it is difficult to say where on the ADC scale the maximum deviation from the ideal line will be. Say the electronic device we design has an ADC that needs to measure the input signal with a precision of $\alpha$\% of the reference voltage. Due to quantisation, if we choose an N-bit ADC, the initial measurement error is $E_{ADC} = \pm 0.5$ LSB, which is called the quantisation error:
$$ADC_{quantiz.error} = E_{ADC}/2^N $$

For instance, with a 12-bit ADC and a measurement error of $\pm 1/2$ LSB, the quantisation error will be $ADC_{quantiz.error} = \frac{1}{2 \cdot 2^{12}} \approx 0.012 \%$. If we need to measure the input signal with a precision of 0.1\% of $V_{REF}$, a 12-bit ADC does a good job. However, if the INL is large, the actual ADC error may come close to the design requirement.


Signal-to-Noise ratio for ADC
Signal-to-Noise Ratio (SNR) is the ratio of the output signal amplitude to the output noise level. The SNR of an ADC usually degrades as frequency increases because the accuracy of the comparator(s) within the ADC degrades at higher input slew rates. This loss of accuracy shows up as noise at the ADC output.

Noise in an ADC comes from four sources:
  • quantisation noise;
  • noise generated by the ADC itself;
  • application circuit noise;
  • jitter.

The non-linearity of the ADC can be described by an input-output transfer function of the form $Output = Input^{\alpha}$.



Quantisation noise

Quantisation noise results from the quantisation process, i.e. the process of assigning an output code to a range of input values. The amplitude of the quantisation noise decreases as resolution increases, because the size of an LSB is smaller at higher resolutions, which reduces the maximum quantisation error. The theoretical maximum SNR for an ADC with a full-scale sine-wave input derives from the quantisation noise and is about $6.02\cdot N + 1.76$ dB (about 74 dB for a 12-bit converter).


Noise generated by the ADC itself

The problem can be high capacitance at the ADC output or input. Device input capacitances cause noise on the supply line: discharging these capacitances adds noise to the ADC substrate and supply bus, which can appear at the input as noise.

Minimising the load capacitance at the output will minimise the currents needed to charge and discharge them, lowering the noise produced by the converter itself. This implies that one output pin should drive only one input device pin (use a fan-out of one) and that the ADC output lines should be as short as possible.


Application circuit noise
Application circuit noise is the noise seen by the converter as a result of the way the circuit is designed and laid out, for instance noisy components or circuitry: noisy amplifiers, noise in resistors.

Amplifier noise is an obvious source, but it is extremely difficult to find an amplifier whose noise and distortion performance will not degrade the system noise performance of a high-resolution (12-bit or higher) ADC.

We often think of resistors as noisy devices, but choosing resistor values as low as practical can keep noise to a level where system performance is not compromised.



Jitter
A clock signal that has cycle-to-cycle variation in the timing of its edges is said to exhibit jitter. Clock jitter causes uncertainty in the precise sampling time, resulting in reduced dynamic performance.

Jitter can result from the use of a poor clock source, or from poor layout and grounding. In the presence of jitter, the steps of the digitised signal are rough and non-uniform.


SNR performance decreases at higher input frequencies because the effects of jitter get worse.


Types of ADC
There are different types of ADC schemes, each with its own advantages and disadvantages.

For example, a Flash ADC has drifts and uncertainties associated with the comparator levels, which lead to poor uniformity in channel width and therefore poor linearity.

Poor linearity is also apparent in SAR ADCs, though it is less pronounced than in flash ADCs. Non-linearity in a SAR ADC arises from accumulating errors in the subtraction processes.

Wilkinson ADCs can be characterised as the most linear ones; in particular, they have the best (smallest) differential non-linearity.


Flash (parallel encoder) ADC
Flash ADCs are made by cascading high-speed comparators. For an N-bit ADC, the circuit employs $2^N - 1$ comparators, and a resistive divider with $2^N$ resistors provides the reference voltages. The reference voltage for each comparator is one least significant bit (LSB) greater than that for the comparator immediately below it. Each comparator produces a 1 when its analogue input voltage is higher than its reference voltage; otherwise, the comparator output is 0.

[Flash ADC architecture; image (c) Maxim-ic]
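The comparator outputs form a thermometer code whose height gives the output code. A minimal sketch of an ideal flash quantiser (a hypothetical 3-bit converter with $V_{REF} = 1$ V; all values illustrative):

N = 3;  Vref = 1;  Vin = 0.63;           % hypothetical 3-bit flash ADC
thresholds = (1:2^N-1)' * Vref / 2^N;    % resistor-ladder tap voltages
thermometer = Vin > thresholds;          % one ideal comparator per tap
code = sum(thermometer);                 % -> 5, i.e. binary 101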

The key advantage of this architecture is very fast conversion times, which is suitable for high-speed low-resolution applications.

The main disadvantage is its high power consumption, and the flash architecture becomes prohibitively expensive at higher resolutions. [reference used: (c) Maxim-ic]


Successive approximation register ADC

Successive-approximation-register (SAR) ADCs account for the majority of the ADC market for medium- and high-resolution converters. SAR ADCs provide up to 5 Msps sampling rates with resolutions from 8 to 18 bits. The SAR architecture allows high-performance, low-power ADCs to be packaged in small form factors. The basic architecture is shown in Fig.~\ref{fig:SARADC}.

[SAR ADC architecture; image (c) Maxim-ic]

The analogue input voltage (VIN) is held on a track/hold. To implement the binary search algorithm, the N-bit register is first set to midscale. This forces the DAC output (VDAC) to be VREF/2, where VREF is the reference voltage provided to the ADC.

A comparison is then performed to determine if VIN is less than, or greater than, VDAC. If VIN is greater than VDAC, the comparator output is a logic high, or 1, and the MSB of the N-bit register remains at 1. Conversely, if VIN is less than VDAC, the comparator output is a logic low and the MSB of the register is cleared to logic 0.

The SAR control logic then moves to the next bit down, forces that bit high, and does another comparison. The sequence continues all the way down to the LSB. Once this is done, the conversion is complete and the N-bit digital word is available in the register.

Generally speaking, an N-bit SAR ADC requires N comparison periods and will not be ready for the next conversion until the current one is complete. This serial operation is why these ADCs are power- and space-efficient. Another advantage is that power dissipation scales with the sample rate. [reference used: (c) Maxim-ic]
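The binary search described above is compact enough to sketch directly in MATLAB (ideal components assumed; N, Vref and Vin are hypothetical example values):

N = 8;  Vref = 5;  Vin = 3.21;        % hypothetical 8-bit SAR conversion
code = 0;
for bit = N-1:-1:0
    trial = code + 2^bit;             % force the current bit high
    Vdac  = Vref * trial / 2^N;       % ideal DAC output for the trial code
    if Vin >= Vdac
        code = trial;                 % keep the bit set
    end                               % otherwise the bit is cleared
end
% code = 164 here; code/2^N*Vref recovers Vin to within 1 LSB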

Monday, April 11, 2011

Avalanche Photodiodes in Astronomical Adaptive Optics

An avalanche photodiode (APD) is a photodiode that internally amplifies the photocurrent by an avalanche process. In an APD, incoming photons produce electron-hole pairs; the APD is operated with a large reverse bias (up to 2 kV), which accelerates the photon-generated electrons. These collide with the atomic lattice, releasing additional electrons via secondary ionisation; the secondary electrons are also accelerated, resulting in an avalanche of carriers.

Photons entering the diode first pass through the silicon dioxide layer and then through the n and p layers before entering the depletion region where they excite free electrons and holes, which then migrate to the cathode and anode, respectively. When a semiconductor diode has a reverse voltage bias applied and the crystal junction between the p and n layers is illuminated, then a current will flow in proportion to the number of photons incident upon the junction.



As these electrons collide with other electrons in the semiconductor material, a fraction of those become part of the photocurrent; this process is called avalanche multiplication.

The gain of the APD is set by the applied reverse-bias voltage (the larger the bias voltage, the larger the gain). However, a larger reverse bias also results in increased noise levels. Excess noise from the avalanche multiplication process places a limit on the useful gain of the APD: the avalanche process introduces excess noise because not every photogenerated carrier undergoes the same multiplication. Avalanche photodiodes are capable of modest gain (500-1000) but exhibit substantial dark current, which increases markedly as the bias voltage is increased.

Overall, avalanche photodiodes are compact and immune to magnetic fields, require low currents, are difficult to overload, and have a high quantum efficiency that can reach 90 percent.


Avalanche Photodiodes in Wavefront Sensors
It has been reported\cite{takami2004performance} that avalanche photodiodes are used in wavefront sensors for adaptive optics.


The modulated light intensity signal is sampled at each sub-aperture of a microlens array and fed through optical fibers to photon-counting avalanche photodiode (APD) modules. The current Subaru AO system has 36 sub-apertures.

A benefit of this type of sensor is that one can use photon-counting avalanche photodiodes without readout noise\cite{takami2004performance}, while for Shack-Hartmann sensors the readout noise of the CCD detector becomes the dominant error source for faint guide stars\cite{takami2004performance}. A 1000x1000 cooled CCD (EEV CCD47-20) camera covering a 20 field is used to monitor the guide star.

Another proposal\cite{aull2010adaptive} comes from Lincoln Laboratory, which has been investigating Geiger-mode avalanche photodiode arrays integrated with CMOS readout circuits. This type of sensor counts photons digitally within the pixel, enabling data to be read out at high rates without the penalty of readout noise.



A high-fill-factor, 16x16 quad-cell APD array with monotonic centroiding response has been demonstrated\cite{aull2010adaptive}. Devices with a 32x32 format are in fabrication.

However, the dark-count rate of the high-fill-factor devices is high compared with other detector technologies, and the mechanisms for this have been investigated. The dark-count rate is hypothesised to arise from a combination of tunnelling current at the junction periphery, unsuppressed thermal dark current, and optical self-retriggering\cite{aull2010adaptive}.


Advances in APD for AO
Development of novel circuitry for solid-state photon-counting devices based on avalanche photodiodes (PC-APD) has also been reported\cite{bonaccini1994novel}. Recent development has improved silicon APDs considerably, reducing afterpulsing effects, improving the effective Q.E., and reducing the dark current to negligible values. These new APDs make it possible to conceive new quenching circuitry and new applications of solid-state photon counters for improved adaptive-optics performance\cite{bonaccini1994novel}.

The development of APDs is focusing on high bandwidth, low excess noise, and high gain-bandwidth product\cite{campbell2004recent}. It has been shown that lower noise and higher gain-bandwidth products can be achieved by submicron scaling of the multiplication-region thickness and by replacing InP in the multiplication layer with AlInAs\cite{campbell2004recent}. However, spatial uniformity of the photoresponse at very high gains remains work in progress.


References
\begin{thebibliography}{1}

\bibitem{takami2004performance}
H.~Takami, N.~Takato, Y.~Hayano, M.~Iye, S.~Oya, Y.~Kamata, T.~Kanzawa,
Y.~Minowa, M.~Otsubo, K.~Nakashima, et~al.
\newblock {Performance of Subaru Cassegrain adaptive optics system}.
\newblock {\em Publications of the Astronomical Society of Japan}, 56(1):225--234,
2004.

\bibitem{aull2010adaptive}
B.F. Aull.
\newblock {Adaptive optics wavefront sensors based on photon-counting detector
arrays}.
\newblock 2010.

\bibitem{bonaccini1994novel}
D.~Bonaccini, S.D. Cova, M.~Ghioni, R.~Gheser, S.~Esposito, and
G.~Brusa-Zappellini.
\newblock {Novel avalanche photodiode for adaptive optics}.
\newblock 2201:650, 1994.

\bibitem{campbell2004recent}
J.C. Campbell, S.~Demiguel, F.~Ma, A.~Beck, X.~Guo, S.~Wang, X.~Zheng, X.~Li,
J.D. Beck, M.A. Kinch, et~al.
\newblock {Recent advances in avalanche photodiodes}.
\newblock {\em Selected Topics in Quantum Electronics, IEEE Journal of},
10(4):777--787, 2004.

\end{thebibliography}

Saturday, April 2, 2011

Modified Hudgin geometry for Wavefront reconstruction using Fast Fourier Transform

One of the attractive features of the FFT WFR is that the filtering construct provides flexibility: reconstruction is accomplished by filtering in the frequency domain, and one can modify this filter with negligible computational overhead. It is easy to incorporate options into the reconstruction filters, such as noise reduction, modal removal, or misalignment and DM-geometry compensation.

Misalignment of the WFS data and the DM geometry.


For example, misalignment can be compensated: the WFS grid and the DM actuators may be misaligned by shifts along the x or y dimensions. If the amount is known, the slope estimate can be shifted by a fraction of an actuator spacing by multiplying the transformed estimate by $\exp[-i 2\pi (\Delta_x k + \Delta_y l)/ N]$.


Modified Hudgin geometry
For the Shack-Hartmann sensor, the best reconstructor is the modified Hudgin geometry, for which the slopes are $s_x[m,n] = \phi[m+1,n+0.5] - \phi[m,n+0.5]$.
The reconstruction of the wavefront is performed as:
$$\hat{\Phi}[k,l] = \begin{cases} 0 & \mbox{if } k = l = 0 \\ S_x[k,l]\cdot H_x[k,l] + S_y[k,l]\cdot H_y[k,l] & \mbox{otherwise.} \end{cases}$$

The spatial filters differ from the standard Hudgin ones: each slope signal has to be shifted half a sample along the orthogonal direction:
$$ H_x[k,l] = \frac{ \left(\exp\left[ - \frac{2\pi i k }{N} \right] -1\right) \exp(-i\pi l / N) }{ 4\left( \sin^2\frac{\pi k}{N} + \sin^2\frac{\pi l}{N} \right)}, \qquad H_y[k,l] = \frac{ \left(\exp\left[ - \frac{2\pi i l }{N} \right] -1\right) \exp(-i\pi k / N) }{ 4\left( \sin^2\frac{\pi k}{N} + \sin^2\frac{\pi l}{N} \right)}. $$

Taking the inverse transform $\mathcal{F}^{-1}\hat{\Phi}[k,l]$ produces the estimate of the wavefront $\hat{\phi}[m, n]$. Estimates from this geometry (see Fig.~\ref{fig:ao2004_3670_modified_Hudgin}) are of high quality, and it does not suffer from global or local waffle as the Fried geometry does.

Modified Hudgin geometry.


This was validated\cite{poyneer2003experimental} in on-sky testing at Palomar Observatory. In the experiments, the authors of \cite{poyneer2003experimental} tried several options for geometries and filtering, and the modified Hudgin performed best. Notably, the regular Hudgin geometry suffered from misalignment-like errors, while the Fried geometry had excessive local waffle.

Moreover, the modified Hudgin takes half as much computation as the Fried geometry model.

Limitations and Disadvantages of the FTR


Disadvantages
First, if the aperture size in sub-apertures is not a power of 2, performance can suffer: extensive padding up to a power of 2 leads to increased noise. The FTR requires a square or pseudo-hex DM geometry, and a non-integer ratio of sub-aperture size to actuator spacing requires correct re-sampling of the estimate.

Advantages
The FTR is fast enough for ExAO systems and large simulation codes. It provides adaptability through filtering: one can compensate misalignment and other errors. The modified Hudgin method does not suffer significantly from global or local waffle.


Problems in implementing the modified Hudgin
The modified Hudgin still requires suppression of the unsensed piston mode, i.e. zeroing the DC term of the filtered spectrum:

if (rownum == 0) && (colnum == 0)
    wavefront(rownum+1, colnum+1) = 0;   % zero the k = l = 0 (piston) term
else
    % ... apply the spatial filters H_x, H_y as usual
end
else

The original formula for the spatial filters does not work as published: the additional shift must be $\exp(-i2\pi l / N)$ instead of simply $\exp(-i\pi l / N)$.

Moreover, the code for the different shifts gives slightly different results. The results with the spatial filter:

$$H_x[k,l] = \frac{ (\exp\left[ - \frac{2\pi i \cdot k }{N} \right] -1) \exp(-i2\pi l / N) }{ 4\left( \sin^2\frac{\pi k}{N} + \sin^2\frac{\pi l}{N} \right)} $$

and implemented in MATLAB (numerator only; the common denominator $4(\sin^2\frac{\pi k}{N} + \sin^2\frac{\pi l}{N})$ is assumed to be applied separately):

H_row = (exp(-(i*2*rownum*pi)/N_row) - 1)*exp(-(i*2*colnum*pi)/N_row);    %% X-axis filter numerator: k = rownum, shift along l = colnum
H_column = (exp(-(i*2*colnum*pi)/N_col) - 1)*exp(-(i*2*rownum*pi)/N_col); %% Y-axis filter numerator: l = colnum, shift along k = rownum


gives the reconstruction shown in Fig.~\ref{fig:wfr_modified_Hudgin_row-col}a. In contrast, the spatial filter of:

$$H_x[k,l] = \frac{ (\exp\left[ - \frac{2\pi i \cdot l }{N} \right] -1) \exp(-i2\pi
k / N) }{ 4\left( \sin^2\frac{\pi k}{N} + \sin^2\frac{\pi l}{N} \right)} $$

with the code (note the swapped roles of rownum and colnum, matching the formula above; the original post repeated the first code block here verbatim, which contradicts the surrounding text):

H_row = (exp(-(i*2*colnum*pi)/N_row) - 1)*exp(-(i*2*rownum*pi)/N_row);    %% X-axis filter numerator with k and l swapped
H_column = (exp(-(i*2*rownum*pi)/N_col) - 1)*exp(-(i*2*colnum*pi)/N_col); %% Y-axis filter numerator with k and l swapped

gives a better result in the reconstruction, as seen in Fig.~\ref{fig:wfr_modified_Hudgin_row-col}b.

The results of reconstruction for the modified Hudgin algorithm for the row spatial filter: a) row-column relationships with shifts, and b) column-row relationships.


The results of the modified Hudgin geometry are indeed better than those of the standard Hudgin geometry; compare with the reconstruction using the conventional Hudgin FFT WFR below.

The results of reconstruction with the conventional FFT Hudgin.

Monday, March 14, 2011

Adding one probability distribution to another: superimposing probability distributions

We all learned some probability theory at university, but learning pure theory is one thing and applying it is slightly different. There is learning, there is understanding, and there is acceptance, as O'Brien in 1984 put it. The topic of this post is how to superimpose two (or more) distributions.

Introduction
For instance, suppose we have a skewed probability density function and data generated according to it. Let it be the log-normal distribution:

$p_{LogNorm}(x;\mu, \sigma) = \frac{1}{x\sqrt{2\pi\sigma^2}}\, e^{-\frac{\left(\ln x-\mu\right)^2}{2\sigma^2}}$

with parameters $\mu = 4.0246$ and $\sigma = 0.0350$. The data look like this:
So far so good: the data are noisy, with values from about 50 to about 64, consistent with the probability density function:

The pdf was numerically estimated using the hist function in MATLAB.

The Problem:
I need to make a new pdf with a uniform distribution superimposed; the idea is to produce a distribution with a very long ``tail''. We could do this exactly, using the theoretical expressions for sums of random variables (see, for instance, Probability and Random Processes by Stark and Woods, Chapter 3). But in the end we need to generate data from it, not just manipulate theory.


The solution
Let's say we need to add a uniform distribution in order to make the tail of the combined distribution longer. We can easily generate uniformly distributed random numbers on a specified interval [a,b] with the rand function in MATLAB or Octave: multiply the output of rand by (b-a), then add a. Here is the MATLAB code:
Isize = 256;                 % size of the generated sample array
a = 5;
b = 50;
I = a + (b-a)*rand(Isize);   % uniform on [a,b], Isize-by-Isize
It is not obvious at first, but if we just add the two data sets point-wise (either in the spatial domain or in the Fourier domain), we get the wrong result:

What has happened? Because the uniform distribution spans the whole range, it simply wiped out our gentle log-normal data. Hence we need to attenuate the amount of the uniform distribution that we add.

One solution is to add, say, every second data point. A nice idea, but not clever: our data would become periodic, which is not desirable. A better solution is to generate another uniform distribution on [0,1] and use it as a mask for the addition of the desired data points.

In order to attenuate the influence of the other distribution, we add only a fraction of its points. Here is the example:

addition_percentage = 0.6;            %% fraction of mask points to keep
z = rand(Isize);                      %% uniform [0,1] mask, Isize-by-Isize
z(z > addition_percentage) = 0;       %% zero out the points above the threshold

%%% Erasing the unnecessary points from the other distribution
I = I.*z;
Here our mask will look like:

0      0      0.1157 0.2088 0.5074
0.3981 0      0.3826 0.3936 0.1435
0      0.0476 0.5989 0.5155 0.5239
0      0      0      0.4117 0
0      0      0.0858 0      0
0      0.4345 0.4846 0.4592 0
0      0.4591 0.3690 0.1043 0

We could use ceil(z) to make the mask strictly 0 or 1, but in our case we want the ``tail'' to decay. Note that the product of two uniform random variables is not itself uniform, as one might think :-) For two independent U(0,1) variables the product has density $-\ln x$ on (0,1), and repeated multiplication tends to a log-normal distribution, by the CLT applied to the sum of the logarithms:
The Central Limit Theorem (CLT) states conditions under which the mean of a sufficiently large number of independent and identically distributed random variables, each with finite mean and variance, will be approximately normally distributed.
Then we just add:
I2 = I + Ilong;
where I is the uniform data multiplied by the uniform mask (no longer uniform) and Ilong is the long-tailed distribution. The result pleases the eye, the mind and the soul:
That's exactly what we desired, and the data are appropriate:
Now it's a good time to write a huge article for a SIAM journal entitled "Recent ground-breaking advances in Advanced Probability Theory" :-)


Various thoughts and jots

The sum of two independent normal random variables is itself normal:

N(mean1, variance1) + N(mean2, variance2) ~ N(mean1 + mean2, variance1 + variance2)

This is all on the Wikipedia page. Be careful that these really are variances and not standard deviations.
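A quick numerical check of this rule in MATLAB (the means and variances here are arbitrary example values):

x = 3 + 2*randn(1e6, 1);        % N(3, 4): mean 3, std 2
y = -1 + 1.5*randn(1e6, 1);     % N(-1, 2.25): mean -1, std 1.5
[mean(x+y), var(x+y)]           % approximately [2, 6.25]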

Related Posts Plugin for WordPress, Blogger...