In a post on the Google AI Blog, Google software engineer Bartlomiej Wronski and computational imaging lead scientist Peyman Milanfar have laid out how they built the new Super Res Zoom technology in the Pixel 3 and Pixel 3 XL. Over the last year or so, many smartphone manufacturers have added multiple cameras to their phones, with 2x or even 3x optical zoom lenses. Google, however, has taken a different path, deciding instead to stick with a single main camera in its new Pixel 3 models and implementing a new feature it calls Super Res Zoom.
Unlike conventional digital zoom, Super Res Zoom does not simply upscale a crop from a single image. Instead, the technology merges many slightly offset frames to create a higher resolution image.
Compared to the standard demosaicing pipeline, which needs to interpolate missing colors because of the Bayer color filter array (top), gaps can be filled by shifting multiple images one pixel horizontally or vertically. Some dedicated cameras implement this by physically shifting the sensor in one-pixel increments, but the Pixel 3 does it cleverly by essentially finding the correct alignment in software after collecting multiple, randomly shifted samples. Illustration: Google
The Google engineers are using the photographer's hand motion - and the resulting movement between individual frames of a burst - to their advantage. After optical stabilization removes macro movements (5-20 pixels), the remaining high-frequency motion due to hand tremor naturally shifts the image on the sensor by just a few pixels. Since any shift is unlikely to be exactly (a multiple of) one pixel, scene detail can be localized with sub-pixel precision, provided you interpolate between pixels when synthesizing the super resolution image.
When the device is mounted on a tripod or otherwise stabilized, natural hand motion is simulated by slightly moving the camera's OIS module between shots.
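To see why shifts that are not whole pixels matter, here is a minimal one-dimensional sketch. It is a toy model, not Google's pipeline: the helper names `capture` and `place_on_fine_grid` are made up for illustration, the "scene" is a short list of numbers, and the shifts are assumed to be known and to land exactly on a grid twice as fine as the sensor. Real hand-shake shifts are arbitrary fractions of a pixel, which is why interpolation is needed in practice.

```python
def capture(scene, offset, scale=2):
    """Simulate a low-res frame: keep every `scale`-th fine sample.

    `offset` is measured in fine-grid samples, so `offset / scale` is the
    shift expressed in low-res pixels (offset=1, scale=2 is half a pixel).
    """
    return scene[offset::scale]


def place_on_fine_grid(frames, offsets, length, scale=2):
    """Put each low-res sample back at its known fine-grid position."""
    grid = [None] * length  # None marks positions no frame has sampled
    for frame, offset in zip(frames, offsets):
        for k, value in enumerate(frame):
            grid[offset + k * scale] = value
    return grid


# Hypothetical scene detail at twice the sensor resolution.
scene = [0, 1, 4, 9, 16, 25, 36, 49]

# Two frames shifted by a whole low-res pixel sample the same positions,
# so half of the fine grid stays empty.
whole_pixel = place_on_fine_grid(
    [capture(scene, 0), capture(scene, 2)], [0, 2], len(scene)
)

# Two frames shifted by half a low-res pixel fill in each other's gaps.
half_pixel = place_on_fine_grid(
    [capture(scene, 0), capture(scene, 1)], [0, 1], len(scene)
)

print(whole_pixel)
print(half_pixel)
```

In this toy case the half-pixel pair reconstructs the fine scene completely, while the whole-pixel pair adds no new sample positions.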
The images from a burst - of up to 15 frames on the Pixel 3 - are aligned onto a base grid of higher resolution than that of each individual frame. First a reference frame is chosen, and then the remaining frames are aligned relative to it with sub-pixel precision. This leads to increased detail - albeit ultimately limited by the lens' resolving power - and cleaner images, since frame averaging reduces noise. When there are objects that have moved relative to the reference frame, the software only merges information from other frames if it has found the correct corresponding feature, thus avoiding ghosting.
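The two effects described here, noise reduction through averaging and ghost avoidance through selective merging, can be illustrated with a rough sketch. It assumes the frames are already aligned, uses a flat synthetic 1D scene with Gaussian noise, and rejects samples with a simple fixed threshold against the reference frame. The function `merge_burst` and the threshold value are illustrative assumptions, not the criterion Google's software actually uses.

```python
import math
import random


def merge_burst(frames, reference_index=0, reject_threshold=None):
    """Average already-aligned frames, sample by sample.

    If `reject_threshold` is set, a sample is only merged when it is within
    that distance of the reference frame's value at the same position.
    """
    reference = frames[reference_index]
    merged = []
    for i, ref_value in enumerate(reference):
        samples = [
            frame[i]
            for frame in frames
            if reject_threshold is None
            or abs(frame[i] - ref_value) <= reject_threshold
        ]
        # The reference sample always passes, so `samples` is never empty.
        merged.append(sum(samples) / len(samples))
    return merged


def rms_error(signal, truth):
    """Root-mean-square difference between a signal and the true scene."""
    return math.sqrt(
        sum((s - t) ** 2 for s, t in zip(signal, truth)) / len(truth)
    )


random.seed(0)
truth = [100.0] * 200  # flat synthetic scene
burst = [[v + random.gauss(0.0, 2.0) for v in truth] for _ in range(15)]

# Simulate an object that is only present in one non-reference frame.
for i in range(50, 60):
    burst[3][i] += 80.0

naive = merge_burst(burst)                          # plain averaging
robust = merge_burst(burst, reject_threshold=10.0)  # skips mismatches

print("single frame RMS error:", rms_error(burst[0], truth))
print("robust merge RMS error:", rms_error(robust, truth))
print("value at sample 55, naive vs robust:", naive[55], robust[55])
```

Plain averaging leaves a faint "ghost" of the moving object at samples 50-59, while the thresholded merge ignores the mismatching frame there and still averages the noise down everywhere else.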
As a bonus there's no more need to demosaic, resulting in even more image detail and less noise. With enough frames in a burst, any scene element will have fallen on a red, a green, and a blue pixel on the image sensor. After alignment, R, G, and B information is then available for every scene element, removing the need for demosaicing.
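The coverage argument can be checked with a small sketch. It assumes an RGGB Bayer layout and whole-pixel shifts for simplicity (the layout choice and the helper names `bayer_color` and `colors_seen` are assumptions made for this example); in a real burst the shifts are random and fractional, so coverage is only reached with enough frames.

```python
# Assumed 2x2 Bayer tile (RGGB), indexed as BAYER[row % 2][column % 2].
BAYER = (("R", "G"),
         ("G", "B"))


def bayer_color(x, y):
    """Color filter covering sensor pixel (x, y)."""
    return BAYER[y % 2][x % 2]


def colors_seen(scene_x, scene_y, shifts):
    """Colors that sample one scene point across a burst.

    Each (dx, dy) in `shifts` is how far the image moved on the sensor for
    that frame, in whole pixels.
    """
    return {bayer_color(scene_x + dx, scene_y + dy) for dx, dy in shifts}


single_frame = [(0, 0)]
four_frames = [(0, 0), (1, 0), (0, 1), (1, 1)]

print(colors_seen(0, 0, single_frame))  # only one color for this point
print(colors_seen(0, 0, four_frames))   # all three colors for this point
```

With a single frame each scene point is measured through only one color filter, which is what forces demosaicing; with the four shifted frames every point has been measured in red, green, and blue.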
Furthermore, Google's merge algorithm takes into account edges in the image and adapts accordingly, merging pixels along the direction of edges rather than across them. This approach provides a good trade-off between increased resolution and noise suppression, and avoids the artifacts that less sophisticated techniques introduce (see the dots and increased perception of noise in the 'Dynamic Pixel Shift' crop on right, here).
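One common way to merge "along edges rather than across them" is to weight neighboring samples with a Gaussian that is stretched along the edge direction and squeezed across it. The sketch below shows that idea only; the function `edge_aware_weight`, its parameters, and the sigma values are assumptions chosen for illustration, and the article does not specify the kernel Google actually uses.

```python
import math


def edge_aware_weight(dx, dy, gx, gy, sigma_along=2.0, sigma_across=0.5):
    """Weight for a sample at offset (dx, dy) from the output pixel.

    (gx, gy) is the local image gradient, which points across the edge.
    Samples displaced along the edge keep a high weight; samples displaced
    across the edge are strongly down-weighted.
    """
    magnitude = math.hypot(gx, gy)
    if magnitude == 0.0:
        # Flat region: no preferred direction, so use an isotropic kernel.
        return math.exp(-0.5 * (dx * dx + dy * dy) / sigma_along ** 2)
    nx, ny = gx / magnitude, gy / magnitude  # unit vector across the edge
    across = dx * nx + dy * ny               # displacement across the edge
    along = -dx * ny + dy * nx               # displacement along the edge
    return math.exp(
        -0.5 * ((along / sigma_along) ** 2 + (across / sigma_across) ** 2)
    )


# Vertical edge: brightness changes along x, so the gradient is (1, 0).
w_along = edge_aware_weight(0, 1, gx=1.0, gy=0.0)   # neighbor on the edge
w_across = edge_aware_weight(1, 0, gx=1.0, gy=0.0)  # neighbor over the edge

print(w_along, w_across)
```

With these example sigmas, a neighbor one pixel away along the edge contributes far more than one the same distance across it, which preserves edge sharpness while still averaging noise along the edge.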
One might initially think 'wouldn't it be much easier to just put an optical 2x zoom in the phone', but perhaps that's not the question to ask. Super resolution can increase the resolution of even the standard camera with no zoom, as well as all zoom levels between a wide and a tele module. And any technique that makes a single camera better will make multi-camera approaches that much better. Imagine a smartphone with 3 or 4 lens modules that lets you zoom smoothly between each focal length, with super-resolution ensuring that focal lengths in between those of each lens module stay detailed.