How autofocus works on cameras

The autofocus system is currently one of a camera's most important and most valued features.

What is focusing on a camera?

In the entry on depth of field we saw that, in photography, an element of the scene is considered to be in focus when the points of the image that correspond to that element are very small (they form a circle of confusion so small that the human eye perceives it as a point).

In practice, when we look at an image, the elements in focus show strong contrast: their edges and lines are clearly visible, and the separations between their parts are sharp and well differentiated.

Elements that are not in focus appear blurred, to the point that they may become unrecognizable.

The focusing process involves moving the lens relative to the sensor plane:

In the case of a simple lens, when we move the lens away from the sensor we focus on closer objects.

If we bring the lens closer to the sensor, we focus on more distant objects.

At some point the focal point of the lens practically coincides with the plane of the sensor; in that case we are focusing at infinity: all distant objects (beyond a certain distance) will be in focus.

At the other extreme, to focus on objects very close to the camera, the lens has to move farther away from the plane of the sensor.
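To get a feel for the distances involved, here is a minimal sketch using the thin-lens equation, 1/f = 1/d_o + 1/d_i (the 50 mm focal length and the subject distances are arbitrary example values, not taken from any particular lens):

```python
def image_distance(f_mm, subject_mm):
    """Thin-lens equation 1/f = 1/d_o + 1/d_i, solved for the
    lens-to-sensor distance d_i."""
    return 1.0 / (1.0 / f_mm - 1.0 / subject_mm)

f = 50.0  # hypothetical 50 mm lens
for d_o in (float("inf"), 10_000.0, 1_000.0, 200.0):  # subject distance in mm
    print(f"subject at {d_o} mm -> lens {image_distance(f, d_o):.2f} mm from sensor")
```

The output shows the lens sitting exactly 50 mm from the sensor for a subject at infinity and moving progressively farther away as the subject gets closer (66.67 mm for a subject at only 20 cm), which is exactly why macro work needs extra extension.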

In macro photography (photography of very small objects), specific lenses are used that allow focusing from very close up (a small distance between the camera and the object), or extension tubes are used with normal lenses to 'move away' the optical center, letting us focus with the camera very close to the subject and achieve a higher magnification.

Camera lenses are actually made up of a system of lens elements, but the principle of focusing is the same.

What usually happens is that not all the elements move within the lens; rather, a specialized group of elements moves, and its adjustment is equivalent to moving the optical center of the system.

On interchangeable-lens cameras (SLR and mirrorless cameras), most lenses include a manual focus ring.

Autofocus

The first autofocus cameras emerged around 1980.

Nowadays virtually all cameras include an autofocus system; some do not even offer the possibility of manual focus.

In advanced compact and interchangeable-lens cameras, there is the option of working with autofocus (the usual choice in most situations) or with manual focus using the lens's focus ring.

The autofocus system works as follows:

  • The camera incorporates a detector that normally analyzes a small part of the image (the area of the scene that we want to focus on).
  • The electronic system decides whether that part of the image shows contrast or is blurred.
  • Contrast is usually detected from abrupt transitions between elements of the scene: edges, lines, textures...
  • If the system determines that the image is blurred, it orders the focusing lens to move slightly, and then re-evaluates.
  • Eventually the system determines that it has achieved maximum contrast, that is, maximum focus, for the area we want to focus on (see the sketch after this list).
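As a rough illustration of this feedback loop, here is a minimal hill-climbing sketch in Python; the lens positions and the contrast curve are toy stand-ins, not any camera's real firmware:

```python
# Toy simulation: contrast peaks when the lens sits at position 3.2.
pos = 0.0

def move_lens(delta):
    global pos
    pos += delta

def measure_contrast():
    return -(pos - 3.2) ** 2  # hypothetical unimodal contrast curve

def autofocus(step=1.0, min_step=0.01):
    """Generic AF feedback loop: nudge the lens, keep whatever
    direction increases contrast, refine the step near the peak."""
    best = measure_contrast()
    direction = 1
    while step >= min_step:
        move_lens(direction * step)
        contrast = measure_contrast()
        if contrast > best:
            best = contrast               # better: keep going this way
        else:
            move_lens(-direction * step)  # worse: undo the move,
            direction = -direction        # reverse direction,
            step /= 2                     # and use a finer step
    return pos

print(f"focused at lens position {autofocus():.2f}")  # ~3.2
```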

What is the ideal focus system like?

It would be a system that:

  • Achieves focus really fast, the faster the better, ideally instantaneously
  • Achieves precise focus on the exact point of the scene we want to focus on
  • Achieves focus under any circumstances

Such an ideal system does not exist, although focusing systems are becoming faster, more precise and more versatile.

It should also be noted that focusing speed depends on the system as a whole: detector precision, the algorithm for adjusting the position of the lens elements (which way and how much the lens has to move), and the speed and precision of the focusing motor.

And it also depends on external conditions: the amount of light in the scene, the texture of the object we focus on, and so on.

Let's look at the techniques currently used in focusing systems, with their pros and cons.

Phase detection focus (SLR)

It is the system used by most SLR cameras.

The mirror of an SLR camera is actually made up of two mirrors.

The main mirror sends the image to the optical viewfinder, but it lets a certain amount of light pass through to a second mirror, called the secondary mirror or sub-mirror, which reflects the image towards the phase detector.

The phase detector is a light sensor, which works similarly to the image sensor.

However, this sensor only receives a very small part of the scene, for example an area in the center of the image (or the area indicated by the focus point selected on the camera).

The focus sensor is specialized in detecting light transitions in the scene, for example the edge of an object, a line, a texture: something that generates a significant contrast between two points of light. This transition is converted into an electrical signal, which we can imagine as a spike.

For each focus point there are two separate sensors that triangulate.

Each of them receives the same image of the area we want to focus on. When the image is in focus the peaks of the two electrical signals coincide. When the image is out of focus, the peaks don’t match and the electronics can calculate exactly where the lens needs to move and how far we are from the focus point.
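As a hedged illustration of the principle, the sketch below finds the offset between two such signals by brute-force correlation; the sensor readouts are invented numbers, and real AF modules use far more refined electronics:

```python
def phase_shift(signal_a, signal_b, max_shift=10):
    """Find the relative shift between two AF line-sensor readouts.
    0 means in focus; sign and size tell the camera which way,
    and how far, to drive the focusing lens."""
    n = len(signal_a)
    best_shift, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        # Correlate the overlapping samples at this trial shift.
        overlap = range(max(0, -s), min(n, n - s))
        score = sum(signal_a[i] * signal_b[i + s] for i in overlap)
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift

# Toy example: the 'spike' of an edge, seen 3 samples apart by the pair.
a = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
print(phase_shift(a, b))  # -> 3: out of focus, by a known amount
```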

Some focus points only detect vertical transitions (vertical lines or edges in the scene), others only horizontal transitions, and those that detect both are called cross-type focus points (cross-type AF points).

The phase detection focusing system is very fast and quite accurate.

Since the system knows at all times where the lens has to move, it works very well both for fast focusing and for tracking moving objects, because the electronics can even introduce a certain margin of prediction.
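That prediction margin could be as simple as a linear extrapolation of the subject's distance; the following sketch is purely illustrative (the timings and distances are made up):

```python
def predict_distance(samples, lead_time):
    """Linearly extrapolate the subject distance from the last two
    (time, distance) samples reported by the phase detector."""
    (t0, d0), (t1, d1) = samples[-2:]
    velocity = (d1 - d0) / (t1 - t0)
    return d1 + velocity * lead_time

# A subject approaching the camera, measured every 33 ms.
history = [(0.000, 10.0), (0.033, 9.7), (0.066, 9.4)]
print(predict_distance(history, 0.050))  # focus for ~8.95 m, 50 ms ahead
```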

For phase detection to work properly, a certain amount of light is required in the scene. It is also necessary that the scene contains those horizontal or vertical lines, borders … in short, that the scene (at least at the point of focus) has a certain texture.

One of the disadvantages of the system is its constructional complexity.

The problem arises because the phase detection sensors are located in a different plane from the image sensor.

They do not detect exactly what reaches the main sensor; they are independent elements, and therefore the whole system has to be perfectly built (the mechanical part) and synchronized (the mechanical and electronic parts). Each camera, one by one, has to be calibrated with great precision; otherwise it will present back-focus or front-focus problems, that is, all the images will appear out of focus.

Another problem with the traditional phase detection approach in SLRs is that when the mirror is raised, this system is no longer operational.

For this reason, when we use the screen (live view) for photography instead of the optical viewfinder, the focus is usually slower, and sometimes much slower depending on the camera.

And the same is true when using the SLR camera for video, as the mirror remains raised all the time. That is, the pure phase detection focusing system (using separate sensors) is not suitable for video.

Phase detection focusing is known as PDAF (Phase Detection Auto Focus). This nomenclature is also used to refer to systems that use hybrid focus, phase + contrast, both in cameras and in mobile phones.

Contrast detection focus

It is the system used by most compact cameras, and by many SLRs when working in live view mode (through the screen). From a technical point of view it is a very simple system: it needs no external elements, no additional sensors, no complex electronics, and no calibration.

Once we select the area we want to focus on in the scene, the processor analyzes it directly from the image generated by the sensor.

The system performs a sweep, moving the focusing lens, and at each position it calculates the contrast level of the image. The sweep stops when the maximum contrast level has been determined, and the processor moves the lens to that position.

In principle it is a process of trial and error: the system does not know in which direction to move the lens, or how much to move it, and it is therefore relatively slow compared to phase detection.
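A minimal sketch of such a sweep, using a common contrast proxy (the sum of squared differences between neighbouring pixels) on a one-dimensional toy 'readout' per lens position; everything here is a stand-in for the real sensor data:

```python
def contrast(pixels):
    """Simple sharpness metric: sum of squared neighbour differences.
    Sharp edges produce big jumps; blur smooths them away."""
    return sum((b - a) ** 2 for a, b in zip(pixels, pixels[1:]))

def sweep_focus(readout_at, positions):
    """Try every lens position and keep the one with maximum contrast."""
    return max(positions, key=lambda p: contrast(readout_at(p)))

def readout_at(p):
    """Toy lens: an ideal edge, blurred more the farther p is from 5."""
    blur = abs(p - 5) + 1
    edge = [0] * 8 + [100] * 8
    return [sum(edge[max(0, i - blur):i + blur]) // (2 * blur)
            for i in range(len(edge))]

print(sweep_focus(readout_at, range(11)))  # -> 5, the sharpest position
```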

The lens movement of traditional contrast detection focusing is characteristic: travel forward, travel backward, forward a little... a kind of back and forth until focus is achieved. This is known as autofocus hunting.

When continuous focus is activated, this back and forth is constant, because the camera has to verify in real time that the distance between the camera and the object we are focusing on has not changed.

In photography, this effect (with continuous focus) can be a bit annoying when viewing the scene through the electronic viewfinder or the rear screen. The photograph itself, the final image, appears perfectly in focus.

In video (with continuous focus) it is more problematic, because the algorithms have to track the contrast of the area of interest and at the same time minimize the focus hunting effect, which can become very annoying.

A balance has to be found between the system's response to changes in the scene and the precision of focus. That is why video systems based on contrast detection tend to react more slowly, and transitions (going from focusing on one object to another located on a different plane) are not as smooth.

However, contrast detection also has advantages:

  • The focus plane is the sensor plane, so there is no back focus / front focus problem. The process feeds back on itself, so when focus is achieved it is usually very precise (maximum contrast)
  • There is no need for specific focus points; any area of the image can be used to focus
  • Focus can be achieved with less light in the scene
  • Focus can be found in scenes with no very sharp vertical / horizontal edges
  • Very complex recognition and prediction algorithms can be applied, for example face recognition for faster focusing and tracking

The contrast detection approach is known by the acronym CDAF (Contrast Detection Auto Focus). This nomenclature is also used for systems that do not take a hybrid approach, for example cameras or phones that rely only on contrast detection.

Hybrid focus built into the image sensor

This is probably the system of the future, which is already used in practically all cameras today with different variants.

The idea is very simple: instead of using a separate sensor to do phase detection, why not use the image sensor itself?

Image sensors using hybrid technology include areas (pixels) dedicated exclusively to phase detection. These special pixels are distributed throughout the sensor area.

There may be many phase detection zones on the sensor. For example, the Sony a6000 mirrorless camera includes 179 phase detection points spread across the entire area of its APS-C sensor.

In general, these phase detectors built into the image sensor are not as effective as the independent phase detectors of SLRs.

Keep in mind that the independent detectors are specialized sensors, with very specific and very fast electronics, internal optics also optimized for phase detection, and a separation between the sensor pairs that allows more precise triangulation.

But the advantage of the hybrid system is that the phase detectors lie exactly in the plane of the sensor (they are part of it), so there are no calibration error problems.

Another great advantage is that the two focusing techniques can be combined. Phase detection tells the processor very quickly where to move the lens, and contrast detection takes care of fine-tuning the focus to achieve the highest possible contrast.

Furthermore, very powerful prediction algorithms can be implemented, using for example many phase detection points and many contrast detection areas at the same time.
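A rough sketch of how the two stages can be chained, with every component hypothetical: phase detection supplies one coarse position estimate, and a short contrast sweep around it does the fine tuning:

```python
def hybrid_autofocus(phase_estimate, move_lens_to, measure_contrast,
                     fine_step=0.05, span=0.5):
    """Two-stage hybrid AF: a coarse jump from phase detection,
    then a small contrast sweep around it for fine tuning."""
    coarse = phase_estimate()           # phase detection: one fast estimate
    best_pos, best_c = coarse, float("-inf")
    steps = round(span / fine_step)
    for i in range(-steps, steps + 1):  # contrast detection: short sweep
        candidate = coarse + i * fine_step
        move_lens_to(candidate)
        c = measure_contrast()
        if c > best_c:
            best_pos, best_c = candidate, c
    move_lens_to(best_pos)              # settle on the sharpest position
    return best_pos

# Toy test: true focus at 3.27; the phase estimate is close but not exact.
pos = [0.0]
def move_to(p): pos[0] = p
def contrast(): return -(pos[0] - 3.27) ** 2

print(f"{hybrid_autofocus(lambda: 3.1, move_to, contrast):.2f}")  # -> 3.25
```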

All manufacturers are developing new technologies and algorithms based on hybrid phase / contrast detection.

The speed and precision of focus based on this technology is increasing.

One of the drawbacks of hybrid focusing systems is that the cells dedicated to the focusing system do not contribute to light detection to generate the final image. That is, we can imagine an image with ‘gaps’ that correspond to the position of those focus points.

The camera internally has to do some kind of interpolation to reconstruct a complete image. And this process can lead to a banding effect that is especially noticeable when using a very high ISO or when recovering shadows in the development / editing process. In these extreme situations, a pattern can be seen in the image, usually in the form of bands of different tones or colors.
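As an illustration of the kind of reconstruction involved (real demosaicing pipelines are far more sophisticated), here is a naive sketch that fills each focus-pixel 'gap' in a sensor row with the average of its horizontal neighbours:

```python
def fill_af_gaps(row, af_columns):
    """Replace masked phase-detection pixels in one sensor row with
    the average of their left/right neighbours (naive interpolation)."""
    out = list(row)
    for c in af_columns:
        left = out[c - 1] if c > 0 else out[c + 1]
        right = out[c + 1] if c < len(out) - 1 else out[c - 1]
        out[c] = (left + right) // 2
    return out

row = [120, 118, 0, 121, 119, 0, 117, 120]   # 0 = PDAF pixel, no image data
print(fill_af_gaps(row, [2, 5]))  # -> [120, 118, 119, 121, 119, 118, 117, 120]
```

A mismatch between this kind of interpolation and the surrounding noise is precisely what can surface as banding at very high ISO or in deeply lifted shadows.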

Under normal conditions these effects or patterns in the image are totally invisible.

In video, the advantage of the hybrid focus system is that the phase detection part knows at all times whether the distance between the camera and the object in the scene has changed, so the camera does not have to constantly analyze the contrast (trial and error). The focus hunting effect is greatly minimized.

With face (and eye) detection algorithms, the phase detection system helps determine how far away the main subject is, and the contrast-based system is then in charge of analyzing and detecting the patterns (face, eyes, etc.). In a system based solely on contrast detection, it is sometimes very difficult to identify a face in an image that is totally out of focus, without distinctive features or patterns.

Dual Pixel CMOS AF focus

This Canon technology is basically a hybrid focus system.

It uses phase detection built into the image sensor, but in this system all the pixels in the sensor are used for phase detection focusing.

Each pixel of the sensor is actually made up of two independent cells (two photodiodes, A and B) sharing the pixel's micro-lens.

At the moment of focusing, each pair of cells (in the area we are focusing on) works as a phase detector, making it possible to triangulate the distance to the object and focus.

Another way of looking at it is to think that the camera has two images: one formed from the A cells and the other formed from the B cells. By superimposing these two images and seeing their differences, the camera can determine where it has to move the focusing lens.

Then the contrast detection system and the higher-level algorithms can take care of the fine adjustment, or of the detection and tracking of patterns (e.g. faces, eyes, etc.).

When the shutter button is pressed, each pair of pixels is combined to generate the image information, as if it were a single photodiode.
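Conceptually this is the same correlation problem as in phase detection, now fed by the image sensor itself. A toy sketch, assuming we take one row of the focus area from the A image and from the B image:

```python
def dual_pixel_defocus(row_a, row_b, max_shift=5):
    """Estimate defocus from one row of the A and B sub-images:
    the shift that best aligns them. 0 means the two views
    coincide, i.e. the area is in focus."""
    n = len(row_a)
    def mismatch(s):
        overlap = range(max(0, -s), min(n, n - s))
        return sum((row_a[i] - row_b[i + s]) ** 2 for i in overlap) / len(overlap)
    return min(range(-max_shift, max_shift + 1), key=mismatch)

a = [10, 10, 80, 90, 80, 10, 10, 10, 10]  # edge seen by the A photodiodes
b = [10, 10, 10, 10, 80, 90, 80, 10, 10]  # same edge, displaced in B
print(dual_pixel_defocus(a, b))           # -> 2: defocused by 2 pixels

# For the final photograph each pair is simply summed: A + B behaves
# like a single photodiode.
image_row = [x + y for x, y in zip(a, b)]
```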

This type of approach works very well for video, for example, and for tracking objects. Once the object we want to focus on is 'locked', it allows fairly precise and fast tracking across the entire scene, since the 'focus points' are evenly distributed over the whole sensor.

The Dual Pixel system is not as fast for photography as the traditional phase detection approach (that of SLR cameras with specialized sensors), although these systems evolve with each generation of cameras.

For the same reasons we discussed in the hybrid approach: the independent phase detector is optimized for this task and the physical separation of each pair of sensors makes triangulation easier.

One drawback of the system is the price. Building a Dual Pixel sensor is more expensive than building a traditional sensor or a hybrid sensor (Hybrid CMOS).

In video, the Dual Pixel system behaves similarly to the generic hybrid system.

The phase detection part gives the initial push to estimate where the main object of the scene is located, and the face, eye, or object detection and tracking algorithms do the fine-tuning. The two systems, phase + contrast, constantly provide information to the camera.

Panasonic DFD focus

Panasonic cameras (starting with the Panasonic GH4) use a system known as DFD (Depth from Defocus) based on contrast detection.

As we saw in the corresponding section, contrast detection has the problem that the system does not know where to move the focusing lens, or how much to move it.

The basic contrast detection algorithm analyzes the image as the focus moves until the contrast level (in the area we are focusing on) reaches a maximum. Let’s assume that 10-15 images of the scene are analyzed until the exact point of focus is found.

The DFD algorithm is based on the following:

The camera analyzes a first image and compares it with the next (with the focusing lens in another position).

The camera searches its database for the characterization of the lens it is using and from that information, and by analyzing the images, it can calculate quite precisely where and how far the focusing lens has to move.

Once the focus lens is in position, fine tuning is performed by trial and error (as in basic contrast detection).

The advantage of the Depth from Defocus system is that the camera only has to analyze 4-5 images, and the travel of the focusing lens is much shorter in most cases: a direct movement towards the estimated position and then a couple of correction movements (compared with the 10-15 movements it would make in pure contrast detection).
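A very schematic sketch of the idea, where the 'lens characterization' is reduced to a known blur-versus-position curve (real DFD profiles are full optical models; every number here is invented):

```python
def dfd_estimate(blur1, blur2, pos1, pos2, blur_curve, candidates):
    """Depth from Defocus, toy version: given the blur measured at two
    lens positions, pick the focus position whose characterized
    blur curve best explains both measurements."""
    def error(p):
        return ((blur_curve(pos1, p) - blur1) ** 2 +
                (blur_curve(pos2, p) - blur2) ** 2)
    return min(candidates, key=error)

def blur_curve(lens_pos, focus_pos):
    """Hypothetical characterization: blur grows linearly with the
    distance between the lens position and the in-focus position."""
    return abs(lens_pos - focus_pos)

candidates = [p / 10 for p in range(101)]  # lens positions 0.0 .. 10.0
b1 = blur_curve(2.0, 6.3)  # blur measured with the lens at 2.0
b2 = blur_curve(2.5, 6.3)  # ...and again after a small move to 2.5
print(dfd_estimate(b1, b2, 2.0, 2.5, blur_curve, candidates))  # -> 6.3
```

Comparing two images taken at different lens positions is what removes the ambiguity of a single blur measurement: blur alone does not reveal on which side of focus the subject lies.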

That translates into shorter response times.

The downside of DFD is that it only works when the camera uses certain lenses: the ones that Panasonic has characterized, which are precisely its own lenses.

DFD does not work with non-Panasonic lenses. And when a new lens comes out, the camera firmware needs to be updated so that it is recognized by the DFD system.

With other lenses the camera uses the base contrast detection focusing system.

In video, the DFD system is faster when making focus transitions, for example when going from focusing on a close object to a distant one.

But once the object is in focus (if we use continuous focus), the system has to periodically check that the distance between the object and the camera has not changed, and it has to minimize the focus hunting effect (micro-variations of focus) as much as possible.

Again, that balance makes DFD systems less responsive to changes in the scene than cameras with hybrid focus systems.