An image as a two-dimensional array of data. 2D graphics

04.02.2019

The first computers of the 1940s ("ABC" (1942), "ENIAC" (1946), "EDSAC" (1949), "MESM" (1950)) were developed and used strictly for calculations and had no separate graphics facilities. Even then, however, enthusiasts tried to use these first-generation vacuum-tube machines to produce and process images. By programming the memory of early computers and output devices built around a matrix of electric lamps, simple patterns could be obtained: incandescent lamps were switched on and off in a certain order, forming images of various figures.

In the late 1940s and early 1950s, many computers began to use cathode-ray tubes (CRTs), either as oscilloscope displays or as Williams tubes serving as RAM. In theory, by writing 0s and 1s to such memory in a certain order, an image could be shown on the screen, but in practice this was not done. In 1952, however, the British engineer Alexander Shafto "Sandy" Douglas wrote the playful program "OXO" (tic-tac-toe) for the programmable EDSAC computer (1949), which became the first computer game in history. The image of the grid, noughts, and crosses was built by programming the Williams tube or drawn on an adjacent CRT.

In the 1950s, the computing power of machines and the capabilities of peripherals did not allow highly detailed images to be drawn, but they did make it possible to output images character by character on monitor screens and standard printers. Images on these devices were built from alphanumeric characters (character graphics, later called ASCII graphics or ASCII art). The idea is simple: differences in the visual density of alphanumeric characters, combined with the eye's inability to perceive fine detail at a distance, made it possible to create drawings and pseudo-graphic objects on a computer. Before computers, similar images were created on paper by typists on typewriters as early as the late 19th century.

In 1950, Benjamin (Ben) Laposky, a mathematician, artist, and draftsman, began experimenting with the oscilloscope screen, creating complex dynamic figures he called oscillons. These dances of light were produced by elaborate adjustments of the cathode-ray device. High-speed photography and special lenses were used to capture the images; pigmented filters were later added to fill the pictures with color.

In 1950, the Whirlwind-I military computer, part of the US SAGE air-defense system, became the first to use a monitor as a means of displaying visual and graphic information.

In 1955, the light pen was invented in a laboratory of the Massachusetts Institute of Technology (MIT). A light pen is a light-sensitive computer input device, essentially a stylus, used to select text, draw images, and interact with user-interface elements on a computer screen or monitor. The pen works well only with CRT monitors, because such monitors scan the screen one pixel at a time, which gives the computer a way to track the expected electron-beam scan time and determine the pen's position from the last scan timestamp. At the tip of the pen is a photocell that reacts to the peak glow produced at the moment the electron beam passes; it is enough to synchronize this pulse with the position of the electron gun to determine exactly where the pen is pointing.

Light pens were widely used in computing terminals of the 1960s. With the advent of LCD monitors in the 1990s they fell out of use almost entirely, since a light pen cannot work with the screens of these devices.

In 1957, engineer Russell A. Kirsch of the US National Bureau of Standards invented the first scanner for the SEAC computer and obtained the first digital image on it: a scanned photograph of his infant son Walden.

The 1960s saw the real flowering of computer graphics. With the advent of new high-performance computers with monitors, based on transistors (the 2nd generation of computers) and later on microchips (the 3rd generation), computer graphics became not just a field for enthusiasts but a serious scientific and practical direction in the development of computing. The first supercomputers appeared (the CDC 6600 and Cray-1), which made it possible to work not only with fast calculations but also with computer graphics at a new level.

In 1960, design engineer William Fetter of the Boeing aircraft corporation first introduced the term "computer graphics". Fetter, drawing the design of an aircraft cockpit on a computer, decided to describe this kind of activity in the technical documentation. In 1964, Fetter also created a wireframe computer model of a person, "Boeing Man", also known as "The First Man", which was later used in television advertisements of the 1960s.

In 1962, programmer Steve Russell at MIT created, on a DEC PDP-1 computer, a standalone program with graphics: the computer game "Spacewar!". Creating the game took about 200 man-hours. The game used a joystick and had interesting physics with pleasing graphics. The first computer game overall, however, though without such graphics, is considered to be Alexander Douglas's "OXO" (tic-tac-toe, 1952).

In 1963, the American software engineer and computer-graphics pioneer Ivan Edward Sutherland of MIT created, on the TX-2 computer, the Sketchpad hardware-software system, which allowed points, lines, and circles to be drawn on the tube with a light pen. Basic operations on primitives were supported: moving, copying, and so on. In effect, it was the first vector editor implemented on a computer, and it became the prototype of modern CAD (computer-aided design) systems such as AutoCAD or Kompas-3D. The program can also be called the first graphical interface, appearing 10 years before the Xerox Alto computer (1973) and before the term itself existed. In 1968, Sutherland went on to create the prototype of the first virtual-reality helmet, calling it the "Sword of Damocles" after the ancient Greek legend.

In the mid-1960s, industrial applications of computer graphics appeared. Under the leadership of T. Mofett and N. Taylor, Itek developed a digital electronic drawing machine (graph plotter).

In 1963, Bell Labs programmer Edward E. Zajac made the first computer animation: the motion of a satellite around the Earth. The animation showed a theoretical satellite that used gyroscopes to maintain its orientation relative to the Earth. All computer processing was done on IBM 7090 and 7094 series computers using the ORBIT program.

In subsequent years other, more complex and significant animations were released: "Tesseract" (a hypercube, 1965) by A. Michael Noll of Bell Labs, "Hummingbird" (1967) by Charles Csuri and James Shaffer, "Kitty" (1968) by Nikolai Konstantinov, "Metadata" (1971) by Peter Foldes, and others.

1964 saw the release of the IBM 2250, the first commercial graphics terminal for the IBM/360 mainframe.

In 1964, General Motors, together with IBM, introduced the DAC-1 computer-aided design system.

In 1967, Professor Douglas Carl Engelbart designed the first computer mouse (an X-Y pointer) and demonstrated its capabilities in San Francisco in 1968.

In 1967, IBM employee Arthur Appel described an algorithm for removing hidden edges (including partially hidden ones), later called ray casting: the starting point of modern 3D graphics and photorealism.

In 1968, computer graphics made significant progress with the advent of the ability to store images and display them on a computer display (a cathode-ray tube). The first raster monitors appeared.

In the 1970s, computer graphics made a new breakthrough. The first color monitors and color graphics appeared. Supercomputers with color displays began to be used to create special effects in films: the 1977 fantasy epic "Star Wars" directed by George Lucas, the 1979 20th Century Fox science-fiction horror film "Alien" directed by Ridley Scott, and the later, underrated 1982 Walt Disney sci-fi film "Tron" directed by Steven Lisberger. During this period computers became even faster and were taught to draw 3D images; three-dimensional graphics arose, along with a new direction of visualization, fractal graphics. Personal computers with graphical interfaces operated by a mouse appeared (the Xerox Alto, 1973).

In 1971 the mathematician Henri Gouraud, and in 1973 Bui Tuong Phong, developed shading models that allowed graphics to move beyond flat facets and convincingly convey the depth of a scene; Jim Blinn soon extended this line of work. Blinn also pioneered bump mapping, a technique for modeling uneven surfaces. Phong's algorithm later became a staple of computer games.
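For reference, the Phong model in its textbook form sums ambient, diffuse, and specular terms (the notation here is the standard one, not tied to any source in this article: N is the surface normal, L the direction to the light, R the reflection of L about N, V the direction to the viewer, the k's are material coefficients, the i's light intensities, and alpha the shininess exponent):

$$I = k_a i_a + k_d (\hat{L} \cdot \hat{N})\, i_d + k_s (\hat{R} \cdot \hat{V})^{\alpha}\, i_s$$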

In 1972, computer-graphics pioneer Edwin Catmull created the first 3D render: a wireframe and textured model of his own left hand.

In 1975, the French-American mathematician Benoit B. Mandelbrot, programming an IBM computer, produced an image of the results of computing a complex mathematical formula (the Mandelbrot set); analyzing the repeating patterns obtained, he gave such beautiful images a name: fractal (from the Latin fractus, fractional, broken). Thus arose fractal geometry and a new, promising direction of computer graphics: fractal graphics.
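The computation behind such images is remarkably small: each point c of the plane is classified by how quickly the iteration z → z² + c escapes. A minimal Python sketch (the grid resolution and the iteration cap of 100 are arbitrary choices):

```python
def mandelbrot_escape(c: complex, max_iter: int = 100) -> int:
    """Return the number of iterations before z = z*z + c escapes |z| > 2."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter  # treated as "inside" the set

# Render a coarse ASCII view: '#' marks points that never escaped.
for im in range(20, -21, -2):
    row = ""
    for re in range(-40, 21):
        n = mandelbrot_escape(complex(re / 20.0, im / 20.0))
        row += "#" if n == 100 else " "
    print(row)
```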

In the late 1970s, with the advent of personal computers (the 4th generation, based on microprocessors), graphics moved from industrial systems to individual workplaces and the homes of ordinary users. The video-game and computer-game industry was born. The first mass-produced personal computer with color graphics was the Apple II (1977), followed later by the Apple Macintosh (1984).

In the 1980s, with the development of the IBM PC video system (1981), graphics became more detailed and richer in color (image resolution increased and the color palette expanded). The first video standards appeared: MDA, CGA, EGA, VGA, SVGA. The first standard graphics file formats were developed, for example GIF (1987), and graphic modeling arose.

Current state

Main applications

Scientific graphics: the first computers were used only to solve scientific and industrial problems. To understand the results better, they were processed graphically: graphs, diagrams, and drawings of calculated structures were plotted. The first machine graphics were obtained in symbolic printing mode. Later, special devices appeared, graph plotters, for drawing figures and graphs with an ink pen on paper. Modern scientific computer graphics makes it possible to conduct computational experiments with a visual presentation of their results.

Business graphics: an area of computer graphics intended for the visual representation of various indicators of an organization's work. Planned indicators, reporting documentation, statistical summaries are the objects for which illustrative materials are created using business graphics. Business-graphics software is included in spreadsheets.

Design graphics are used in the work of design engineers, architects, and inventors of new technology. This type of computer graphics is an indispensable element of CAD (computer-aided design) systems. With design graphics one can obtain both flat images (projections, sections) and spatial, three-dimensional images.

Illustrative graphics are arbitrary drawing and sketching on the monitor screen. Illustrative graphics packages are general-purpose application software. The simplest software tools for illustrative graphics are called graphic editors.

Artistic and advertising graphics became popular largely thanks to television. Commercials, cartoons, computer games, video tutorials, and video presentations are created with the computer. Graphics packages for these purposes demand large computing resources in speed and memory. A distinctive feature of these packages is the ability to create realistic images and "moving pictures". Producing drawings of three-dimensional objects, with their rotation, approach, recession, and deformation, involves a large amount of computation. Conveying an object's illumination depending on the position of the light source, the placement of shadows, and the texture of the surface requires calculations that take the laws of optics into account.

Pixel art (pixel graphics), a major form of digital art, is created with raster graphics software, where images are edited at the pixel level. In an enlarged part of such an image, the individual pixels appear as squares and are easy to see. In digital images, a pixel (picture element) is a single point in a bitmap. Pixels are placed on a regular two-dimensional grid and are often represented by dots or squares. The graphics in most older (or relatively limited) computer and video games, graphing-calculator games, and many mobile-phone games are mostly pixel art.

Computer animation is the production of moving images on a display screen. The artist creates on-screen drawings of the initial and final positions of the moving objects; all intermediate states are calculated and rendered by the computer, performing calculations based on a mathematical description of the given kind of motion. Such animation is called keyframe animation. There are also other kinds of computer animation: procedural animation, shape animation, programmed animation, and animation in which the artist draws all the frames by hand. The resulting drawings, displayed in sequence on the screen at a certain rate, create the illusion of movement.
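In the simplest case, the computer's calculation of intermediate states is linear interpolation between keyframes. A minimal Python sketch (the (time, value) keyframe format is a simplification for illustration):

```python
# Keyframes: (time, value) pairs, e.g. the x-coordinate of an object.
keys = [(0.0, 10.0), (1.0, 50.0), (2.0, 30.0)]

def interpolate(keys, t):
    """Linearly interpolate a value at time t between surrounding keyframes."""
    if t <= keys[0][0]:
        return keys[0][1]
    for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)   # normalized position between keys
            return v0 + a * (v1 - v0)  # linear blend
    return keys[-1][1]

# 25 frames per second over the two-second clip:
frames = [interpolate(keys, f / 25.0) for f in range(51)]
```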

Multimedia is the combination of a high-quality image on a computer screen with sound. Multimedia systems are most widely used in education, advertising, and entertainment.

Scientific work

Computer graphics is also one of the areas of scientific activity. In the field of computer graphics, dissertations are defended, and various conferences are held:

  • the SIGGRAPH conference, held in the USA
  • the Eurographics conferences, held annually by the Eurographics association in Europe
  • the GraphiCon conference, held in Russia
  • CG Event, held in Russia
  • CG Wave (2008), held in Russia
Technical side

According to the way images are specified, graphics can be divided into the following categories:

2D graphics

Two-dimensional (2D, from the English "two dimensions") computer graphics are classified by the type of representation of graphic information and the image-processing algorithms that follow from it. Computer graphics are usually divided into vector and raster, although a fractal type of image representation is also distinguished.

Vector graphics

Vector graphics represent an image as a set of geometric primitives (points, lines, curves, polygons). At the same time, not every image can be represented this way. The method is good for diagrams; it is used for scalable fonts and business graphics, and very widely for creating cartoons and videos of all kinds.

Raster graphics

Bitmap example

Fractal graphics

Fractal tree

CGI graphics

CGI (English: computer-generated imagery) refers to images produced by a computer on the basis of computation and used in fine art, printing, cinematic special effects, television, and simulators. Moving images are created by means of computer animation, which is a narrower field within CGI.

Representation of colors in the computer

To transmit and store color in computer graphics, various forms of representation are used. In general, a color is a set of numbers: coordinates in some color system.

The standard ways of storing and processing color in a computer are due to the properties of human vision. The most common systems are RGB for displays and CMYK for printing.
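As an illustration of color coordinates, below is a naive RGB-to-CMYK conversion in Python. This is the idealized textbook formula; real print workflows use ICC device profiles instead:

```python
def rgb_to_cmyk(r: int, g: int, b: int) -> tuple:
    """Naive conversion of 8-bit RGB to CMYK fractions in [0, 1]."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0  # pure black: only the K channel
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(rf, gf, bf)          # black generation
    c = (1.0 - rf - k) / (1.0 - k)
    m = (1.0 - gf - k) / (1.0 - k)
    y = (1.0 - bf - k) / (1.0 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 128, 0))  # orange -> (0.0, ~0.498, 1.0, 0.0)
```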

Sometimes a system with more than three components is used: the reflection or emission spectrum of the source is measured, which makes it possible to describe the physical properties of the color more accurately. Such schemes are used in photorealistic 3D rendering.

The real side of graphics

Any image on a monitor, by virtue of the screen's flatness, becomes a raster: the monitor is a matrix of columns and rows. Three-dimensional graphics exist only in our imagination, since what we see on the monitor is a projection of a three-dimensional figure; the space we perceive is constructed by ourselves. Thus an image can be specified as a raster or as vectors, but visualization itself is always raster (a set of pixels), and the fidelity of the displayed image depends on the number of those pixels.

The world is three-dimensional; its image is two-dimensional. An important task of painting, and now of photography, is to convey the three-dimensionality of space. The Romans already commanded some of the techniques; they were then forgotten and began to return to classical painting with the Renaissance.

The main technique for creating three-dimensional space in painting is perspective. Railway rails receding from the viewer visually narrow; in painting, the rails can be physically narrowed. In photography, perspective arises automatically: the camera shoots the rails as narrow as the eye sees them. However, do not let them nearly converge: the result will no longer read as perspective but as a strange figure. Between the rails, the sides of a street, or the banks of a river, a noticeable gap should be maintained.

It is important to understand that linear perspective is the most primitive, realistic way to convey the world. It is no coincidence that its appearance is associated with theatrical scenery (Florensky, "Reverse Perspective"). The conventionality and ease of rendering a theatrical scene of small depth suit photography well, deprived as it is of the variety of techniques available to painting.

There are perspectives far more interesting than the linear one. In the works of Chinese masters there is a floating perspective, in which objects are depicted simultaneously from below, from above, and from the front. It was not a technical mistake of incompetent artists: the legendary author of this technique, Guo Xi, wrote that such a display allows one to grasp the world in its totality. The technique of Russian icon painting is similar: the viewer can see the face and the back of a figure at the same time. An interesting icon-painting technique, also found among Western European artists, is reverse perspective, in which distant objects are, on the contrary, larger than near ones, emphasizing their importance. Only in our time has it been established that such a perspective is correct: unlike distant objects, the foreground really is perceived in reverse perspective (Rauschenbach). In Photoshop one can achieve reverse perspective by enlarging background objects. To a viewer accustomed to the laws of photography, such an image will look strange.

Bringing the corner of a building into the frame, with the walls receding in both directions, creates a semblance of isometric perspective. The brain understands that the walls are at right angles and arranges the rest of the image accordingly. Such a perspective is more dynamic than the frontal one and more natural for the foreground. Simply include the end corners of objects and closely spaced buildings in the frame.

Because it expands, isometric perspective reads in a major key, which is rarely suitable for a classical portrait. Linear perspective, because it narrows, better conveys minor-key emotions.

At the shooting stage, the photographer has a number of tools to emphasize perspective. Objects of equal width receding into the distance (a path, a street, columns, furrows) indicate the three-dimensionality of space by their narrowing and their simple recession. The effect is stronger when shooting from a low angle, which increases perspective distortion. This is enough for landscape work, but with the small scene depth of interior shooting the effect is barely noticeable; it can be slightly enhanced in post-processing by narrowing the top of the image (Transform - Perspective). In a landscape, though, even a hypertrophied perspective can look interesting.

Depth can be explicit in the meaning of the image: buildings separated by a street or a river. A diagonal, such as a bridge over a river, emphasizes three-dimensionality.

Objects of a size known to the viewer, placed in the background, set the scale and thus form the perspective. In landscape photography such an object might be a car; in portrait photography, try having the model bend the far leg and tuck it under the chair so that, while remaining visible, it appears smaller. You can even slightly reduce this leg in post-processing.

An ornament conveys perspective by the visual reduction of its elements. Examples are large floor tiles or marking lines on a road.

There is a technique of the hypertrophied foreground: disproportionately large, it creates image depth. Comparing the scale of the foreground and the model, the eye concludes that the model is much farther away than it seems. The hypertrophy should remain subtle, so that the image is not perceived as an error. The technique suits not only post-processing but shooting itself: distort the proportions by shooting with a 35 or 50 mm lens. Shooting with a wide-angle lens stretches the space, enhancing its three-dimensionality by violating proportions. The effect is stronger if you shoot the model at close range, but beware of grotesque proportions: only the authors of religious images may depict a person larger than a building.

Overlap works excellently. If an apple partially covers a pear, the brain is not mistaken: the apple is in front of the pear. A model partially covering the furniture likewise creates the depth of an interior.

The alternation of light and dark spots also gives an image depth. The brain knows from experience that nearby objects are lit approximately equally, so it interprets differently lit objects as being at different distances. For this effect, the spots should alternate along the axis of perspective, into the depth of the image, not across it. For example, when shooting a model lying away from the camera in a dark frame, place highlights near the hips and near the legs. You can lighten or darken areas in post-processing.

A succession of progressively darker objects is perceived as receding. By gradually shading objects along the active line, you can obtain a subtle sense of perspective. Depth is conveyed similarly by attenuating light: run a streak of light across furniture or along the floor.

A three-dimensional image can be obtained through contrast not only of light but of color. This technique was known to the Flemish painters, who placed bright color spots in their still lifes: a red pomegranate and a yellow lemon side by side look three-dimensional even in flat frontal lighting. They stand out especially well against a background of purple grapes: a warm color against a cold background. Bright colored surfaces emerge well from darkness even in the weak light typical of a still life. Color contrast works better with the primary colors, red, yellow, and blue, than with tints.

On a black background, yellow comes forward and blue recedes; on a white background, the opposite. Color saturation enhances the effect. Why does this happen? Yellow is never dark, so the brain refuses to believe that a yellow object can be immersed in a dark background and not illuminated. Blue, on the other hand, is dark.

Perspective enhancement in post-processing comes down to simulating atmospheric perception: distant objects appear to us lighter and blurrier, with reduced contrast in brightness, saturation, and hue.

Besides great distances, atmospheric effects look natural in morning haze, fog, or a smoky bar. Consider the weather: on an overcast day or at dusk there can be no significant difference between foreground and background.

The strongest of these factors is contrast in brightness; in the settings, this is the ordinary contrast control. Reduce the contrast of distant objects and raise the contrast of the foreground, and the image becomes convex. This is not about the contrast between foreground and background, but about the contrast of the background itself, which should be lower than that of the foreground. The method suits not only landscape and genre shooting but also studio portraits: raise the contrast on the front of the face and reduce it on the hair, cheekbones, and clothing. Portrait filters do something similar, blurring the subject's skin while leaving the eyes and lips sharp.

Contrast adjustment is the easiest way to add three-dimensionality in post-processing. Unlike other operations, the viewer will hardly notice the change, which preserves maximum naturalness.

Blurring is akin to reducing contrast, but they are different processes: an image can be low-contrast while remaining sharp. Thanks to limited depth of field, blurring distant objects remains the most popular way of conveying three-dimensionality in photography, and it is easy to enhance by blurring the background in post-processing. Accordingly, fewer details should be placed in the background: the brain does not expect distinguishable objects in the distance. Lowering contrast, meanwhile, corresponds better to natural perception: distant mountains are seen as low-contrast, not blurry, because in scanning a landscape the eye constantly refocuses, and the problem of depth of field is alien to it. While blurring the background, you can simultaneously sharpen the foreground and additionally strengthen its lines (High Pass filter or Clarity). It is precisely the high sharpness of the foreground that explains the characteristic convexity of images from high-quality lenses. Caution: for the sake of a slight gain in three-dimensionality, you can make the image too harsh.

Lighter objects appear more distant. This is because in nature we see distant objects through a thickness of light-scattering air; distant mountains seem pale. In landscape photography, therefore, one should be careful about placing light objects in the foreground.

Lighten distant objects. The farther away they are, the more they merge with the sky in brightness and hue. Note that horizontal surfaces (land, sea) are lit better than vertical ones (walls, trees), so do not overdo the lightening of the latter. In any case, objects should remain noticeably less bright than the sky.


Note, too, that lightening is another way of reducing the brightness contrast of the background. Darken the foreground a little to enhance the convexity effect.

In interiors, it would seem, the opposite holds. If outdoors the eye is accustomed to the distance being light, in a room the light is often concentrated on the person and the interior is immersed in darkness; the brain is used to light in the foreground, not the background. In interior images with shallow scene depth, unlike landscapes, an illuminated model stands out from a dark background. But there is an opposite factor as well: for 99% of human evolution we observed perspective in open terrain, and with the advent of rooms the brain has not yet had time to adjust. Vermeer preferred a light background for portraits, and they are genuinely convex. Illuminating a vertical background, as recommended in photography, not only separates the model from it but, by lightening the background, lends the image a slight three-dimensionality. Here we meet the fact that the brain judges the location of objects by several factors, and these can conflict.

Studio lighting looks interesting when the light spots fall on the parts of the model farther from the camera: for example, highlighting the breast farther from the lens.

Lower the color saturation of distant objects: because of the thickness of air separating us from them, distant mountains are desaturated almost to monochrome and covered with a blue haze. Foreground saturation can be increased.

Since yellow is light and blue and red are dark, the color contrast is also a brightness contrast.

When desaturating a distant background, do not let it disappear from view. Often, on the contrary, you need to raise the saturation of the background to bring it out; that matters more than three-dimensionality.

Many tips on three-dimensionality in photography concern temperature contrast. In reality this effect is very weak and easily overridden by brightness contrast. Moreover, temperature contrast is irritating and conspicuous.

Very distant objects appear cooler, because warm orange light is absorbed by the air. When photographing a model on a beach with ships on the horizon, lower the color temperature of the distant sea and ships in post-processing. A model in a red swimsuit emerges from the blue sea, just as a model in the yellow light of a streetlamp emerges from the bluish twilight.

This is split toning: we make the model warmer and the background colder. The brain understands that different color temperatures do not occur in the same plane and perceives such an image as three-dimensional, with the model standing out from the background. Split toning also adds depth to landscapes: make the foreground warmer and the background colder.

An important exception to split toning: at sunrise and sunset the distant background is not cold at all but warm, with yellow and red-orange tones. The obvious remedy, a fair-skinned model in a purple swimsuit, does not work, because the sunset light lays a warm tint on the model's body as well.

To summarize: to give a photograph three-dimensionality based on atmospheric effects, the foreground and background must be set in opposition. The main opposition is the usual contrast: the foreground contrasty, the background low-contrast. The second is sharpness: the foreground sharp, the background blurred. The third is lightness: the foreground dark, the background light. The fourth is saturation: the foreground colors saturated, the background desaturated. The fifth is temperature: the foreground warm, the background cold.
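These oppositions are easy to prototype. A minimal numpy sketch, assuming a float RGB image `img` in [0, 1] and a rough foreground mask `fg` (both hypothetical inputs; in a real editor this would be done with layers and masks):

```python
import numpy as np

def fake_atmosphere(img: np.ndarray, fg: np.ndarray) -> np.ndarray:
    """Push the background toward light, low-contrast, desaturated, cool.

    img: HxWx3 float RGB in [0, 1]; fg: HxW float mask, 1 = foreground.
    The sharpness opposition would need a blur kernel; omitted for brevity.
    """
    bg = 1.0 - fg[..., None]              # background weight per pixel
    out = img.copy()
    # 1) lower background contrast toward mid-gray, raise foreground slightly
    out = out + bg * 0.5 * (0.5 - out) + (1 - bg) * 0.1 * (out - 0.5)
    # 2) lighten the background
    out = out + bg * 0.15 * (1.0 - out)
    # 3) desaturate the background toward its luminance
    lum = out.mean(axis=2, keepdims=True)
    out = out + bg * 0.5 * (lum - out)
    # 4) cool the background: a touch less red, a touch more blue
    out[..., 0] -= 0.05 * bg[..., 0]
    out[..., 2] += 0.05 * bg[..., 0]
    return np.clip(out, 0.0, 1.0)
```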

These factors often pull in different directions. Yellow is lighter than blue, and light objects appear farther than dark ones; one would naturally expect yellow to recede and blue to approach the viewer. In fact, the opposite happens: a warm color emerges from a cold background. Color, then, turns out to be a stronger factor than lightness. On reflection, this is not surprising: yellow and red are clearly distinguishable only up close, and the viewer does not expect to meet them at a great distance.

Bottom line: keep the background low-contrast, washed out, light, desaturated, and bluish. And be prepared for the viewer, accustomed to hypertrophied 3D movies, to find the three-dimensionality you have created barely noticeable or absent altogether.

In portrait photography, it is better to rely on the proven chiaroscuro effect, the play of light and shadow on the model's face, which makes the image quite convex. In genre photography, perspective gives the most noticeable three-dimensional effect. In a still life, the main factor is the overlap (intersection) of objects.

Do not get carried away with perspective; it is only a background for the frontal plane on which your image unfolds. In modern painting, far from realism, perspective is not held in high esteem.


3D and 2D Images: Models, Algorithms, and Domains of Analysis

P.A. Chochia

Abstract - The issues of modification of a two-scale model and processing algorithms in the transition from two-dimensional to three-dimensional images are considered. Changes in the analysis area, filtering algorithms, image decomposition and object detection are shown. Fast algorithms for calculating the local average and order statistics over a sliding window for 3D images are proposed.

Keywords - Image processing, 3D image, image model, processing algorithm, analysis area, fast algorithms.

I Introduction

When the term "three-dimensional image" is used, quite different types of data are often meant; the main ones are as follows.

1. Data given by a function of three coordinates, forming a homeomorphic representation of a certain region of three-dimensional space together with all the objects contained in it.

2. A stereoscopic image, consisting of a pair of two-dimensional images which, through disparity, give the observer an idea of the location of objects.

3. An image that is a projection of a three-dimensional scene (for example, an axonometric one), which allows you to evaluate the shape and location of objects, but at the same time remains two-dimensional.

4. A two-dimensional image in which each point is assigned a coordinate in three-dimensional space, such as a range map or terrain elevation.

5. Images formed in a special way that create images of objects, such as holograms.

6. Video sequences containing a set of frames of objects. Such data can be represented as a three-dimensional array, but one of the coordinates in this case is not a spatial, but a time coordinate.

In what follows, a three-dimensional or 3D image (continuous or discrete) will mean only images of the first type. Essentially, a 3D image extends an ordinary 2D image by adding another spatial dimension.

Methods of forming a three-dimensional image may differ; the best known is tomographic scanning: X-ray computed tomography or magnetic resonance imaging. A 3D image can also be obtained from seismic exploration in geological surveys, in microscopy using a zoom lens, in computer simulation of three-dimensional objects and scenes, or in other ways.

P.A. Chochia, Ph.D., senior researcher, Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow (e-mail: [email protected]).

Processing and analysis of three-dimensional images currently play a significant role in many areas of research, especially in medicine and geology. In this paper, we consider the issues of extending the two-dimensional image model and applying it to three-dimensional images, the issues of modifying the operations of frequency and spatial filtering during the transition to 3D, the issues of smoothing and decomposition of images, noise filtering, detection of contours and objects, as well as computational aspects of the implementation of some algorithms for 3D images.

Most filtering algorithms are relatively easy to modify in the transition from two-dimensional to three-dimensional signals. We show this with examples of the most common algorithms based on frequency and spatial filtering.

II. Features of a three-dimensional image

In discrete form, a 3D image is represented by an array X of size M×N×K. As in 2D, the value of each element x_mnk is the logarithm of brightness (energy), quantized into (x_max+1) gradations, 0 ≤ x_mnk ≤ x_max, which for brevity we will simply call brightness. A discrete element of a 3D image is customarily called a voxel. A three-dimensional image depicting some scene can be viewed as consisting of densely packed connected three-dimensional regions (objects) corresponding to the details of the scene. A region, or object, is a maximal connected set of image elements with close, possibly smoothly varying, brightness values. Regions may touch in arbitrary ways; in particular, one region may be completely surrounded by another. At the boundaries of neighboring regions the brightness values must differ noticeably. Regions that do not touch may have arbitrary, even coinciding, brightnesses. The spatial boundaries between neighboring regions, as objects of differing brightness, are called contours. 3D images are characterized by


the following properties:

a 3D image is a set of objects densely filling the image space;

contours in a 3D image are the spatial boundaries between objects;

sectioning a 3D image by a plane, or projecting it onto a plane of any orientation, yields a two-dimensional signal with all the properties of an ordinary 2D image.

III. Areas of Analysis

The analysis area is the subset of the input data used in estimating parameters. Methods in which each point (or small fragment) of an image is processed with its own parameters, determined from a limited analysis area centered, as a rule, at the given point, are called local.

Consider a connected set of elements x_ijl ∈ V_d(x_mnk) that lie at a distance of no more than d from the central element x_mnk and together form a figure of some given shape. For small d (on the order of two or three sampling steps), the set V_d(x_mnk) surrounding the central element (voxel) x_mnk will be called a neighborhood and denoted V_mnk; for d >> 1 it will be called a fragment and denoted W_mnk. Note that, depending on the operations performed, the central element x_mnk itself may or may not belong to V_d(x). Accordingly, operations of the form

$$y_{mnk} = f\left( x_{ijl} : x_{ijl} \in V_d(x_{mnk}) \right), \qquad (1)$$

in which the result at each point (m,n,k) depends only on the values of the elements x_ijl belonging to V_d(x_mnk), are called local operations.

In moving from 2D to 3D, the variants of symmetric neighborhoods and element adjacency change as follows. A neighborhood of 2×2 elements (4 pixels) becomes a neighborhood of 2×2×2 elements (8 voxels), with every voxel adjacent to every other. In a two-dimensional neighborhood of 3×3 elements (9 pixels), as is well known, two adjacency types can be considered: 4-adjacency (across pixel sides only) and 8-adjacency (across sides and vertices). The analogue of the first in 3D is 6-adjacency of voxels (Fig. 1a); the analogue of the second is 26-adjacency (Fig. 1c). An intermediate variant, 18-adjacency, is also possible (Fig. 1b). The choice of adjacency is usually dictated by the problem context and the algorithm used.

Fig. 1. Voxel neighborhoods and adjacencies in 3D: a) 6-adjacency; b) 18-adjacency; c) 26-adjacency.
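The three adjacency types are conveniently encoded by how many coordinates of an offset are nonzero; a small Python sketch:

```python
from itertools import product

def neighborhood_offsets(kind: int):
    """Offsets (di, dj, dk) of the 6-, 18-, or 26-adjacency of a voxel.

    kind: 6 keeps face neighbors (one nonzero coordinate),
          18 adds edge neighbors (up to two nonzero coordinates),
          26 adds vertex neighbors (any nonzero offset).
    """
    max_nonzero = {6: 1, 18: 2, 26: 3}[kind]
    return [d for d in product((-1, 0, 1), repeat=3)
            if d != (0, 0, 0) and sum(c != 0 for c in d) <= max_nonzero]

assert len(neighborhood_offsets(6)) == 6
assert len(neighborhood_offsets(18)) == 18
assert len(neighborhood_offsets(26)) == 26
```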

In some cases we are interested not in the entire set of points falling into the analysis area but only in some subset of it containing the central element, which we will call the membership area. How the membership area is chosen depends on the task; some options are discussed in Section VI.

IV. Two-scale multicomponent model of a 3D image

To formalize information about the main properties of images, topological (the shapes and sizes of regions and the contour transitions between them) and statistical (the relationships between element values), an appropriate image model is needed. To be useful, it should describe image properties at the distances dictated by the problems at hand and lend itself to the construction of efficient algorithms for image processing and analysis.

The statistical relationships between image elements located at large distances differ significantly from the analogous properties of nearby elements and cannot be described by the same relations. For ordinary two-dimensional images, a two-scale multicomponent model was developed that describes the relationships of elements well both at small distances of a few sampling steps and at large distances commensurate with the size of image objects. It transfers successfully to 3D images.

The values of the elements of a 3D image X = {x_mnk} are represented as a sum of statistically independent components:

$$x_{mnk} = s_{mnk} + t_{mnk} + \xi_{mnk} \qquad (2)$$

The first term is the piecewise-smooth component s_mnk, which determines the brightness levels of extended regions of the image; t_mnk is the texture-detail component, carrying information about texture and fine details; ξ_mnk is the noise component, determined by the noise of the recorder, the analog-to-digital converter, and so on. All components are assumed independent and additive, with t_mnk and ξ_mnk normally distributed and zero-mean.

A. Small size scale

At the small-size scale (the scale of a neighborhood), a relatively small connected set of elements lying within a few sampling steps is considered. As in the two-dimensional model, the elements of a three-dimensional image are divided into two disjoint sets: those falling on boundary (contour) sections and those that do not (interior elements); together they cover the entire image. The neighborhood V_mnk of an element x_mnk is considered as the group of R elements x^r_mnk ∈ V_mnk, r = 1,...,R, nearest to x_mnk and belonging to the same set (contour or interior) as x_mnk (Fig. 2).

By the least-squares method, a hyperplane is fitted to the values of the elements from V_mnk; its tilt relative to the coordinate axes M, N, K at the point (m,n,k) is characterized in magnitude and direction by the vector g_mnk. At point r of the neighborhood, the fitted hyperplane differs from the value x^r_mnk by a random amount γ^r_mnk. This representation relates the values of the neighborhood elements x^r_mnk ∈ V_mnk by the formula

$$x^r_{mnk} = p_{mnk} + \rho^r g^r_{mnk} + \gamma^r_{mnk}, \qquad (3)$$

where p_mnk is the value of the fitted hyperplane at the central point of the neighborhood (m,n,k), ρ^r is the distance between the central element x_mnk and x^r_mnk, g^r_mnk is the projection of g_mnk onto the vector from x_mnk to x^r_mnk, and γ^r_mnk is a random value.

The notion of a contour mask E = {e_mnk} is introduced: e_mnk = 1 for contour elements and e_mnk = 0 for interior ones. Denoting g^r_mnk separately for contour and for interior elements, and likewise γ^r_mnk, both quantities are represented as sums of a contour part and an interior part weighted by the mask.

3. From the elements of V_mnk, those z elements x^r_mnk ∈ V_mnk (r = 1,...,z) are selected that fall within the interval (X^V − Δ^V, X^V + Δ^V), where Δ^V is its half-width. From the x^r_mnk values in this interval the average is computed:

$$\bar{x}_{mnk} = A(V_{mnk}, X^V_{mnk}, n^V, \Delta^V) = \frac{1}{z} \sum_{r=1}^{z} x^r_{mnk}, \quad X^V - \Delta^V < x^r_{mnk} < X^V + \Delta^V. \qquad (24)$$

4. Similarly to step 2, from the histogram H^W_mnk of the fragment and a given n^W, the values R^W_1 = R^W(n^W/L³) and R^W_2 = R^W(1 − n^W/L³) are found. The value X^W is found by comparing x̄_mnk with R^W_1 and R^W_2:

X^W = R^W_1 if x̄_mnk < R^W_1;  X^W = x̄_mnk if R^W_1 ≤ x̄_mnk ≤ R^W_2;  X^W = R^W_2 if x̄_mnk > R^W_2.

5. The smoothed value s_mnk is found as the median of the truncated histogram of the fragment H^W_mnk, the part of it lying in the interval (X^W − Δ^W, X^W + Δ^W):

$$s_{mnk} = \mathrm{med}(W_{mnk}, X^W_{mnk}, n^W, \Delta^W). \qquad (25)$$

Fig. 4. Decomposition result: original image (left), smoothed component s (right), and profiles along the marked line.

The resulting value s_mnk is taken as the desired smoothed component. A simplified version of this algorithm was later published under the name bilateral filtering. Fig. 4 shows an example of the decomposition of a two-dimensional 256×256 image with a smoothing fragment W_mnk of 15×15 elements.
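For intuition, here is a minimal 2D Python sketch of the core idea of steps 3-5: average only the window elements whose brightness lies within a fixed interval around the central value (in the spirit of the sigma filter and of bilateral filtering; the window size and half-width below are arbitrary choices):

```python
import numpy as np

def interval_smooth(img: np.ndarray, half: int = 7, delta: float = 20.0) -> np.ndarray:
    """Edge-preserving smoothing: mean of window values within +/-delta of the center."""
    out = np.empty_like(img, dtype=float)
    padded = np.pad(img.astype(float), half, mode="edge")
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * half + 1, j:j + 2 * half + 1]
            center = padded[i + half, j + half]
            selected = window[np.abs(window - center) <= delta]
            out[i, j] = selected.mean()  # never empty: includes the center itself
    return out
```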

E. Detection of objects of a given volume

It has been shown that the above decomposition algorithm can be used to detect objects in an image. Similarly to 2D images, for which the problem of detecting objects by their area is solved, in the three-dimensional modification the problem of detecting objects by their volume is posed. By analogy with the two-dimensional case, the three-dimensional problem can be formulated in three variants: detection of objects whose volume (i.e., number of elements) N_j is less than a given T1, greater than a given T2, or lying in the interval T1 < N_j < T2.

1) Detection of objects with N_j > T. It is assumed that the image consists of a fairly uniform background (a large region U_0) on which there is a set of small regions U_1,...,U_J spaced sufficiently far apart, and that one can choose a fragment size L (with L³/2 > T) such that any fragment W_mnk contains at most one region with N_j > T, or several smaller ones whose total volume satisfies ΣN_j < T (U_j ⊂ W). In step 5 of the decomposition algorithm (25), choose n^W = T, R^W_1 = R^W(T/L³), and R^W_2 = R^W(1 − T/L³). Processing the image with these values of R^W_1 and R^W_2 yields the smoothed component of the original image, on which the regions with N_j > T remain; they are found by a detector with threshold S(U_0) ± δ, where S(U_0) is the average background brightness and δ < min_j{|S(U_j) − S(U_0)|} (S(U_j) being the brightnesses of the corresponding regions).

2) Detection of objects with N_j < T. The smoothed component s_mnk in (25) contains only regions with N_j > T, while regions with N_j < T are contained in the difference component t_mnk = (x_mnk − ξ_mnk) − s_mnk. Objects with N_j < T are detected by thresholding at the points where |t_mnk| > δ (δ being the detection threshold).

3) Detection of objects with T1 < N_j < T2. Two solutions are possible.

In the first, choose n^W = T1; then the smoothed component s_mnk in (25) contains objects with N_j > T1. Process it again with algorithm (25), now with n^W = T2 (T2 > T1); the newly obtained smoothed component s′_mnk contains only objects with N_j > T2. Taking the difference y_mnk = |s_mnk − s′_mnk| gives a signal containing the objects in the range T1 < N_j < T2. The drawback of this solution is that the algorithm is two-pass.

The second option. Note that two different thresholds (n^V and n^W) are used when analyzing the histograms over the neighborhood and over the fragment. Choose the neighborhood size l and the fragment size L larger than usual, such that l³ > 2T1 and L³ > 2T2. Setting R^V_1 = R^V(T1/l³) and R^V_2 = R^V(1 − T1/l³), operation (24) yields x̄_mnk, which no longer contains regions with N_j < T1. Then, in step 5 of the decomposition algorithm, when analyzing H^W, set R^W_1 = R^W(T2/L³) and R^W_2 = R^W(1 − T2/L³). Having obtained s_mnk in (25), take the difference y_mnk = |x̄_mnk − s_mnk|, on which the objects are extracted by threshold detection.

Note that both the notion of area in the two-dimensional case and the notion of volume in the three-dimensional variant are used here in a somewhat unusual sense: as a "local" volume, i.e., the volume of the part of an object that falls inside the fragment W_mnk.
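For comparison, outside the decomposition framework the same detect-by-volume task has a simple baseline via connected-component labeling; this is a different technique from the algorithm above, and the sketch assumes an already binarized image:

```python
import numpy as np
from scipy import ndimage

def objects_in_volume_range(binary: np.ndarray, t1: int, t2: int) -> np.ndarray:
    """Keep connected 3D components whose voxel count N satisfies t1 < N < t2."""
    # 26-adjacency: a 3x3x3 structuring element of all ones.
    structure = np.ones((3, 3, 3), dtype=bool)
    labels, count = ndimage.label(binary, structure=structure)
    sizes = np.bincount(labels.ravel())      # sizes[0] is the background
    keep = (sizes > t1) & (sizes < t2)
    keep[0] = False
    return keep[labels]                      # boolean mask of retained objects
```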

VII. Some computational algorithms

A. Sum over a rectangular parallelepiped

Introduce a notation for the sum over a fragment of a 2D image:

$$S_{(ij)(mn)} = \sum_{u=i}^{m-1} \sum_{v=j}^{n-1} x_{uv},$$

i.e., S_(ij)(mn) is the sum of the values of the elements x_uv falling into the rectangular fragment whose diagonal corner points have coordinates (i,j) and (m−1,n−1). Note that the fragment does not include the point with coordinates (m,n) or the corresponding row and column. Similarly, for a 3D image, S_(ijl)(mnk) is the sum of the values of the elements x_uvw in the rectangular parallelepiped with corner coordinates (i,j,l) and (m−1,n−1,k−1):

$$S_{(ijl)(mnk)} = \sum_{u=i}^{m-1} \sum_{v=j}^{n-1} \sum_{w=l}^{k-1} x_{uvw}.$$

For a two-dimensional image, the classical way of computing the sum S_(m,n)(m+H,n+L) over a sliding rectangular fragment of H×L elements in the transition from element (m,n) to element (m,n+1) reduces to the formula

$$S_{(m,n+1)(m+H,n+L+1)} = S_{(m,n)(m+H,n+L)} - S_{(m,n)(m+H,n+1)} + S_{(m,n+L)(m+H,n+L+1)}, \qquad (26)$$

where the last two terms are the sums of the elements over the left (removed) and right (added) columns of the fragment. The algorithm requires 4 operations regardless of the fragment size: 2 operations in expression (26) and two operations to update each of the column sums S_(m,n)(m+H,n+1) when moving from row m to row m+1. Additionally, N cells are required to store the column sums.

In moving to a 3D image, formula (26) is modified for a sliding rectangular parallelepiped of size H×L×J:

$$S_{(m,n,k+1)(m+H,n+L,k+J+1)} = S_{(m,n,k)(m+H,n+L,k+J)} - S_{(m,n,k)(m+H,n+L,k+1)} + S_{(m,n,k+J)(m+H,n+L,k+J+1)}, \qquad (27)$$

where the last two terms are the sums of the elements over the left (removed) and right (added) faces of the parallelepiped. This algorithm requires 6 arithmetic operations regardless of the fragment size: 2 operations in expression (27), two operations to update each of the face sums, and two to update the column sums. In addition, K cells are needed to store the array of face sums and N×K cells to store the column sums.

For a two-dimensional image, another algorithm is also known for computing the sum over a rectangle of arbitrary size. Let the sums S_mn = S_(0,0)(m,n) be computed for each point (m,n) over the rectangle with diagonal elements x_00 and x_{m−1,n−1}. Then the sum S_(ij)(mn) of the element values inside the rectangle with corner coordinates (i,j) and (m−1,n−1) is

$$S_{(ij)(mn)} = S_{mn} - S_{mj} - S_{in} + S_{ij}, \qquad (28)$$

which on average requires, per image element, 2 operations to compute the sums S_mn and 3 operations for formula (28). However, storing the sums S_mn already requires M×N cells, equal in number to the size of the image. The advantage is that, for the same 3 operations, S_(ij)(mn) can be computed for any coordinate values, not only for a sliding fragment.

Algorithm (28) can also be modified for a 3D image. Let the sums S_mnk = S_(000)(mnk) over the rectangular parallelepiped with diagonal elements x_000 and x_{m−1,n−1,k−1} be computed for each point (m,n,k). It is easy to show that the sum S_(ijl)(mnk) of the element values inside the box with corner coordinates (i,j,l) and (m−1,n−1,k−1) is then computed as

$$S_{(ijl)(mnk)} = S_{mnk} - S_{mjk} - S_{mnl} - S_{ink} + S_{mjl} + S_{ijk} + S_{inl} - S_{ijl}. \qquad (29)$$

Thus, taking into account that 3 arithmetic operations are required to compute each of the values S_mnk, about 10 arithmetic operations per element of a three-dimensional image are required to compute the sum S_(ijl)(mnk). The additional memory required is M×N×K cells of sufficient capacity for the values of the sums S_mnk.
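A direct numpy rendering of this scheme, as a sketch: a padded cumulative-sum table plays the role of S_mnk, and the eight-term formula (29) is applied verbatim:

```python
import numpy as np

def summed_volume_table(x: np.ndarray) -> np.ndarray:
    """S[m, n, k] = sum of x over the box [0:m, 0:n, 0:k] (hence the padding)."""
    s = x.cumsum(0).cumsum(1).cumsum(2)
    return np.pad(s, ((1, 0), (1, 0), (1, 0)))  # so S[0, ...] etc. are zero

def box_sum(S: np.ndarray, i: int, j: int, l: int, m: int, n: int, k: int) -> float:
    """Sum of x over the box [i:m, j:n, l:k], by formula (29)."""
    return (S[m, n, k] - S[m, j, k] - S[m, n, l] - S[i, n, k]
            + S[m, j, l] + S[i, j, k] + S[i, n, l] - S[i, j, l])

x = np.random.rand(8, 8, 8)
S = summed_volume_table(x)
assert np.isclose(box_sum(S, 2, 3, 1, 6, 7, 5), x[2:6, 3:7, 1:5].sum())
```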

Similarly, one can find the variance over the fragment, D_(ijl)(mnk), by computing for each point (m,n,k) the sums of squares and, for the rectangular parallelepiped, S_(ijl)(mnk)(x²), and then using the formula

$$D_{(ijl)(mnk)} = \left( S_{(ijl)(mnk)}(x^2) - \bigl(S_{(ijl)(mnk)}(x)\bigr)^2 / N_{(ijl)(mnk)} \right) / N_{(ijl)(mnk)}, \qquad (30)$$

where S_(ijl)(mnk)(x²) is the sum of the squares of the element values falling into the box, and N_(ijl)(mnk) = (m−i)×(n−j)×(k−l) is the total number of points in the box.

B. Order statistics over a rectangular parallelepiped

The basis for computing order statistics over a fragment W of both 2D and 3D images is the histogram h^W(x) of the distribution of brightness values over this fragment, together with its cumulative characteristic F^W(x):

$$F^W(x) = \sum_{i=0}^{x} h^W(i); \qquad F^W(x_{max}) = N^W, \qquad (31)$$

where x_max is the maximum possible brightness value and N^W is the number of points in the fragment W. The order statistic R^W(n), where 0 < n ≤ N^W, is defined by the relation

$$R^W(n) = z \quad \text{if} \quad F^W(z-1) < n \le F^W(z). \qquad (32)$$
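In code, (31) and (32) reduce to a cumulative sum and a search. A minimal Python sketch:

```python
import numpy as np

def order_statistic(hist: np.ndarray, n: int) -> int:
    """Return R(n): the smallest brightness z with cumulative count F(z) >= n."""
    F = np.cumsum(hist)               # F[z] = number of points with value <= z
    return int(np.searchsorted(F, n)) # first z where F[z-1] < n <= F[z]

# Example: the median of a window is the order statistic with n = (N + 1) // 2.
hist = np.bincount(np.random.randint(0, 256, size=15 * 15 * 15), minlength=256)
median = order_statistic(hist, (hist.sum() + 1) // 2)
```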

The algorithm for the sliding computation of the histogram over a fragment is constructed similarly to formulas (26) and (27): when the fragment moves to the next point, the points on one face of the fragment are removed and the points on the opposite face are added. In the two-dimensional case, sliding histogram computation when moving from point (m,n) to the neighboring point (m,n+1) requires on average 2H operations (H being the number of rows in the fragment). In the three-dimensional case, moving from point (m,n,k) to point (m,n,k+1) requires 2H×L operations, where H×L is the number of points in the face of the parallelepiped perpendicular to the direction of motion K.

In the three-dimensional case, the number of required operations grows in proportion to the product H×L. If, however, (H×L) > (x_max+1), then instead of operating on the values of individual points it is advantageous to pre-form histograms for the faces of the parallelepiped and then subtract and add these face histograms:

$$h^W_{(i,j,k+1)(m,n,k+J+1)}(x) = h^W_{(i,j,k)(m,n,k+J)}(x) - h^F_{(i,j,k)(m,n,k+1)}(x) + h^F_{(i,j,k+J)(m,n,k+J+1)}(x), \qquad (33)$$

where J is the size of the fragment in the direction of motion K. Operations by formula (33) require on average 2(x_max+1) arithmetic operations per image point, regardless of the box size. Updating the face histograms h^F requires, in addition, an average of 2L operations per point and (x_max+1)×K memory cells to store K histograms.

The (x_max+1) operations required to add or subtract each face histogram can be reduced further by exploiting the spatial correlation of the data being processed. In regions of slowly changing brightness, which usually make up the majority of real images, the range of element values is relatively small: several times smaller than the full range of (x_max+1) gradations. By adding to each face histogram two cells that store the minimum and maximum values of its distribution, and accordingly processing only that range of gradations, the total number of operations can be reduced severalfold.
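A sketch of the sliding update in Python (the window slides along the K axis; for brevity the face histograms are recomputed from the data rather than maintained incrementally, which formula (33) would avoid; x is assumed to be an integer-valued array, e.g. uint8):

```python
import numpy as np

def face_histogram(x: np.ndarray, k: int, levels: int) -> np.ndarray:
    """Histogram of the H x L face of the volume at depth index k."""
    return np.bincount(x[:, :, k].ravel(), minlength=levels)

def sliding_histograms(x: np.ndarray, J: int, levels: int = 256):
    """Yield the histogram of the H x L x J window as it slides along axis K."""
    hist = sum(face_histogram(x, k, levels) for k in range(J))
    yield hist.copy()
    for k in range(x.shape[2] - J):
        hist -= face_histogram(x, k, levels)      # face leaving the window
        hist += face_histogram(x, k + J, levels)  # face entering the window
        yield hist.copy()
```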

VIII. Conclusion

The modification of two-dimensional models and image-processing algorithms as applied to three-dimensional images has been considered. Variants of the analysis area, filtering algorithms, image decomposition, and object detection have been shown to be relatively easy to adapt in moving from 2D to 3D images. Computational algorithms have been proposed that reduce the number of operations needed to determine the local mean and order statistics over a sliding fragment of a 3D image.

Bibliography

Toriwaki J., Yoshida H. Fundamentals of Three-Dimensional Digital Image Processing. New York: Springer, 2009.

Krasilnikov N. Digital Processing of 2D and 3D Images. St. Petersburg: BHV-Petersburg, 2011.

Gonzalez R., Woods R. Digital Image Processing. Moscow: Technosfera, 2012.

Chochia P.A. "Two-Scale Image Model". In: Image Coding and Processing. Moscow: Nauka, 1988, pp. 69-87.

Chochia P.A. Processing and Analysis of Images on the Basis of a Two-Scale Model. Preprint, IPPI AN USSR. Moscow: VINITI, 1986.

Ahmed N., Rao K. Orthogonal Transforms for Digital Signal Processing. Moscow: Svyaz, 1980.

Chochia P.A. "Image Enhancement Using Sliding Histograms". Computer Vision, Graphics, and Image Processing, 1988, vol. 44, no. 2, pp. 211-229.

Pratt W. Digital Image Processing. Moscow: Mir, 1982. Vols. 1-2.

Roberts L. "Machine Perception of Three-Dimensional Solids". In: Integrated Robots. Moscow: Mir, 1973, pp. 162-208.

Chochia P.A. "Digital Filtering of Impulse Noise in Television Images". Communication Technology, ser. Television Technique, 1984, no. 1, pp. 26-36.

Chochia P.A. "Image Smoothing with Edge Preservation". In: Image Coding and Processing. Moscow: Nauka, 1988, pp. 87-98.

Chochia P.A. "Some Object Detection Algorithms Based on a Two-Scale Image Model". Information Processes, 2014, vol. 14, no. 2, pp. 117-136.

Lee J.S. "Digital Image Smoothing and the Sigma Filter". Computer Vision, Graphics, and Image Processing, 1983, vol. 24, no. 2, pp. 255-269.

Tomasi C., Manduchi R. "Bilateral Filtering for Gray and Color Images". Proc. IEEE 6th Int. Conf. on Computer Vision, Bombay, India, 1998, pp. 839-846.

Chochia P.A. "Parallel Algorithm for Computing a Sliding Histogram". Avtometriya, 1990, no. 2, pp. 40-44.

Chochia P.A. "Modification of the Model and Processing Algorithms in the Transition from Two-Dimensional to Three-Dimensional Images". In: IX International Scientific-Practical Conference "Modern Information Technologies and IT Education": Collection of Selected Papers. Moscow: MSU, 2014, pp. 820-833.


Three-dimensional and two-dimensional images: modification of the image model, analysis area, and processing algorithms

Abstract - The specific features of three-dimensional images are formulated. The adaptation of the analysis area and of the two-scale image model to 3D images is studied. Modifications of various image-processing algorithms for 3D images are demonstrated. Fast algorithms for calculating the local mean and order statistics in a moving window of a 3D image are proposed.

Keywords - image processing, three-dimensional image, image model, image-processing algorithm, analysis area, fast algorithms.

2D (two-dimensional) graphics is a type of computer graphics. Such an image always looks flat because it uses only two dimensions: width and height. It is used for logos, maps, websites, advertising banners, game and application interfaces, cartoons, and videos. Although 2D graphics looks flat, shading can create the impression of three-dimensional objects (though not photorealism).
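As a minimal sketch of this idea, the following Python fragment builds a 64×64 pixel array (an XOR test pattern of our own choosing, not from the article) and saves it in PGM, an elementary grayscale raster format:

    # A grayscale raster image is just a height x width array of pixel values.
    width, height = 64, 64
    pixels = [[(x ^ y) & 0xFF for x in range(width)] for y in range(height)]

    # Save it as a plain-text PGM file, one of the simplest raster formats.
    with open('xor.pgm', 'w') as f:
        f.write('P2\n%d %d\n255\n' % (width, height))
        for row in pixels:
            f.write(' '.join(str(v) for v in row) + '\n')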

Advertising material layouts created for the Museo Argentino de Ciencias Naturales (Museum of Natural Sciences, Buenos Aires). Author: Lucas Rod.


Video for one of the Rijksmuseum projects. The video sequence includes images of 211 works from the museum's online collection.


An interactive game for children about the life of dinosaurs, drawn in 2D graphics.

There are three types of 2D graphics:

  • Vector graphics: the image is represented as geometric shapes, which gives maximum accuracy to the constructed image. This format is easily edited, scaled, rotated, and deformed, and it can simulate three-dimensionality. Its shortcomings are a lack of realism and the difficulty of applying raster-style effects. Vector graphics are suitable for technical drawings and diagrams; they are used for scalable fonts, business graphics, brand-book elements (logos, decorative patterns, etc.), for cartoons and commercials, and in printing, where they provide high image quality.
  • Raster graphics: the picture is formed from dots of different colors (pixels) arranged in rows and columns. Such images can be highly realistic, thanks to the wide variety of effects that can be applied to them. The drawback of the raster format is poor scalability: when the image is reduced or enlarged, quality is lost.
    Raster formats are used for web pages, mobile applications, interfaces of all kinds, digital painting, etc.

    An example illustrating the difference between a vector and a raster image.

    A sample museum app for touch devices.

  • Fractal graphics: the image is made up of parts that are, in a certain sense, similar to the whole: enlarged parts of an object resemble the object itself and one another. In computer graphics, fractals are used to construct images of natural objects such as trees, bushes, mountain landscapes, sea surfaces, and so on (see the sketch after this list).

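To make the contrast concrete, here is a small Python sketch of our own (all parameters are illustrative): it recursively generates a self-similar fractal tree and writes it as SVG, a vector format, so the result can be rescaled without the quality loss a raster copy would suffer.

    import math

    def branch(x, y, angle, length, depth, segments):
        # Each branch spawns two smaller copies of itself, so any enlarged
        # part of the tree resembles the whole -- the self-similarity that
        # defines fractal graphics.
        if depth == 0:
            return
        x2 = x + length * math.cos(angle)
        y2 = y - length * math.sin(angle)   # SVG's y axis points down
        segments.append((x, y, x2, y2))
        branch(x2, y2, angle + 0.45, length * 0.72, depth - 1, segments)
        branch(x2, y2, angle - 0.45, length * 0.72, depth - 1, segments)

    segments = []
    branch(200, 380, math.pi / 2, 100, 10, segments)

    # Emit the segments as SVG: being vector, the file scales losslessly.
    svg = ['<svg xmlns="http://www.w3.org/2000/svg" width="400" height="400">']
    svg += ['<line x1="%.1f" y1="%.1f" x2="%.1f" y2="%.1f" stroke="black"/>' % s
            for s in segments]
    svg.append('</svg>')
    with open('tree.svg', 'w') as f:
        f.write('\n'.join(svg))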

From a two-dimensional picture, a person is able to form a rather complete idea of the distances to the depicted objects and of their shape and size, and thus to perceive the three-dimensional world in all its depth. How do we achieve this?

As is known, the eyes directly deliver a two-dimensional picture. What we see can be captured with a camera, printed on a sheet of paper (that is, on a two-dimensional plane), and hung on a wall, so the image that reaches the brain from the eyes is two-dimensional.

Yet whether we look at real objects, at photographs, or at video, we manage to extract so much information from these two-dimensional pictures that they begin to seem three-dimensional to us; through sight alone we perceive the relative position of objects in space very well. Vision that allows one to perceive the shape, size, and distance of objects is called stereoscopic vision. Humans achieve it through the following effects:

  • Binocular vision. A human has two eyes, and a slightly different 2D image of the same 3D scene forms on the retina of each. Drawing on life experience and enormous computational ability, the brain compares these two slightly different images and forms an idea of the scene's three-dimensionality (a rough numeric sketch of this triangulation follows this list). The effect works best for close objects, at distances at least somewhat comparable to the distance between the eyes; for objects more than about five meters away it contributes almost nothing. Note, too, that since binocular vision is not the only factor enabling 3D perception, and since its range is limited to a few meters, the loss of one eye would not be a disaster: we would still see in 3D, it would simply take more experience and time to learn to rely on the remaining effects. This is easy to verify: just close one eye. Did you stop seeing in 3D? No.
  • Displacement of objects as the observer moves. When the observer moves, the picture he sees changes continuously: close objects shift their position in the field of view much faster than distant ones. Again, life experience and the brain's computational abilities make it possible to judge distance from how quickly objects move across the field of view. In fact, by moving one eye sideways by the distance between the eyes, one can substitute for binocular vision, since the brain eventually gets to compare the same two pictures it would receive from two eyes at once. But this requires effort and constant movement, and the two pictures are not captured at the same instant, so they may already differ. Binocular vision therefore remains a very useful option that helps greatly when working with close objects, which is what a person usually does.
  • Life experience. Most people have a good idea of the size of many familiar objects: trees, other people, cars, windows, doors, and so on. Knowing this, one can estimate the distance to such an object (and hence to the objects near it) from the share of the field of view it occupies. For example, you will immediately guess that the girl in the photo below is much closer to the observer than the tower whose top she seems to touch...
  • Haze over distant objects. The atmosphere is not perfectly transparent, so very distant objects appear hazy. The degree of haze thus indicates which of two distant objects is farther away and which is closer. This is a particularly useful cue, because the other mechanisms of depth perception work poorly for distant objects.
  • Perspective, shadows and lighting. From the configuration of shadows and the degree of illumination of different parts of an object, the brain, drawing on extensive life experience, reads the object's shape well. Perspective is the effect whereby, for example, two lines that are parallel in space converge toward a point in the image as they recede from the observer. The brain extracts depth information from this effect very effectively.
  • The eye's ability to focus at only one distance. Like any optical device, the eye cannot see a scene equally sharply at all depths; it can focus only on a particular range. So we see most clearly the objects we are currently focused on, while nearer and farther objects appear slightly blurred. The brain knows at what distance the eyes are currently focused, so by refocusing at different distances we can, in effect, scan the whole space in depth.
  • Close objects obscure distant ones. This obvious effect, simple as it seems, contributes a great deal to building a three-dimensional picture: nothing is easier than concluding that one object is farther than another when it is partially hidden behind it.
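How strongly binocular disparity depends on distance can be seen from the standard triangulation formula Z = f·B/d, where B is the baseline between the eyes, f the focal length, and d the disparity between the two retinal images. The pinhole-camera model and all the numbers in this Python sketch are our own illustrative assumptions, not from the article:

    def depth_from_disparity(baseline_m, focal_px, disparity_px):
        # Triangulated depth for an idealized stereo pair: Z = f * B / d.
        if disparity_px <= 0:
            return float('inf')          # zero disparity: point at infinity
        return baseline_m * focal_px / disparity_px

    # Assumed numbers: ~6.5 cm between the eyes, a 1500-pixel focal length.
    for d in (100, 10, 1):               # disparity in pixels
        print('disparity %4d px -> depth %6.2f m'
              % (d, depth_from_disparity(0.065, 1500, d)))

With these numbers, 100 pixels of disparity correspond to about 1 m of depth, while a scene 10 m away shifts by only some 10 pixels between the eyes, which is why the binocular cue fades beyond a few meters, just as described above.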

Having surveyed the effects through which our vision perceives a three-dimensional picture, one small remark can be made about 3D cinema.

In an ordinary film, all of the effects listed above are at work except the first one, binocular vision. In 3D cinema, special technology adds binocularity as well: the glasses deliver a slightly different image to each eye.

It should be noted, however, that this does not improve the picture dramatically. As already mentioned, with one eye and enough life experience one can perceive the full depth of a picture without loss of quality, thanks to the other six effects present in any film.

Moreover, binocular vision helps at short distances, while films mostly show wide scenes rather than small objects examined at close range, so the added effect is often not noticeable at all.


