What are H.264 and H.265?

H.264 is the tenth part of MPEG-4. It is a highly compressed digital video codec standard developed by the Joint Video Team (JVT), formed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard is therefore often referred to as H.264/AVC (or AVC/H.264, H.264/MPEG-4 AVC, MPEG-4/H.264 AVC), a name that explicitly acknowledges both developing bodies. The main elements of the H.264 standard include the Access Unit Delimiter, SEI (Supplemental Enhancement Information), the Primary Coded Picture and the Redundant Coded Picture, as well as Instantaneous Decoding Refresh (IDR), the Hypothetical Reference Decoder (HRD) and the Hypothetical Stream Scheduler (HSS). Today, however, H.264 is gradually being replaced by H.265.

In August 2012, Ericsson introduced the first H.265 codec. Six months later, the International Telecommunication Union (ITU) officially approved the HEVC/H.265 standard, formally called High Efficiency Video Coding, a considerable improvement over the earlier H.264 standard. Huawei holds the most core patents and is the leader of this standard. H.265 is designed to transmit higher-quality network video over limited bandwidth, needing only half the original bandwidth to play video of the same quality. The standard also supports 4K (4096×2160) and 8K (8192×4320) ultra-high-definition video. The coding architecture of H.265/HEVC is broadly similar to that of H.264/AVC, including intra prediction, inter prediction, transform, quantization, deblocking filter and entropy coding modules. In the HEVC architecture, however, the picture is partitioned into three basic units: the Coding Unit (CU), the Prediction Unit (PU) and the Transform Unit (TU).

H.265 is the video coding standard formulated by ITU-T VCEG after H.264. It builds on the existing H.264 standard, retaining some of the original technologies while improving others. The new techniques improve the trade-off between bit rate, encoding quality, delay and algorithm complexity to reach an optimal configuration. Thanks to algorithm optimization, H.264 can deliver standard-definition digital video at under 1 Mbit/s; H.265 can deliver 720p (1280×720) high-definition audio and video at a transmission speed of 1-2 Mbit/s.

What are Session Initiation Protocol, Media Server and Wide Dynamic Technology?

Session Initiation Protocol, SIP

The Session Initiation Protocol (SIP), formulated by the Internet Engineering Task Force (IETF), is a framework protocol for multi-party multimedia communication. It is a text-based application-layer control protocol, independent of the underlying transport protocol, used to establish, modify and terminate two-party or multi-party multimedia sessions over an IP network.

This protocol is widely used in the video networking platforms of Safe City and Xueliang ("Sharp Eyes") projects.
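
Because SIP is text-based, a session request is nothing more than ASCII header lines. Below is a minimal Python sketch of what an INVITE request looks like on the wire; all addresses, tags and identifiers are made-up placeholders, not values from any real platform.

```python
# Illustrative only: a minimal SIP INVITE request assembled as text.
# Every address, tag and branch value below is a made-up placeholder.
invite = "\r\n".join([
    "INVITE sip:camera01@192.0.2.10 SIP/2.0",
    "Via: SIP/2.0/UDP 192.0.2.1:5060;branch=z9hG4bK776asdhds",
    "From: <sip:platform@192.0.2.1>;tag=1928301774",
    "To: <sip:camera01@192.0.2.10>",
    "Call-ID: a84b4c76e66710",
    "CSeq: 314159 INVITE",
    "Contact: <sip:platform@192.0.2.1>",
    "Content-Type: application/sdp",
    "Content-Length: 0",
    "",  # blank line separates headers from the (empty) body
    "",
])
print(invite)
```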

Media Server, MS

A media server (MS) is mostly used in large-scale video networking projects. It provides real-time media stream forwarding, media storage, historical media retrieval and on-demand services. The media server receives media data from SIP devices, gateways or other media servers, and forwards the data on instruction to one or more SIP clients and media servers.

Wide Dynamic Technology, WDR

Wide dynamic range (WDR): the "dynamic" here refers to dynamic range, the span over which a given characteristic can vary. For a camera, dynamic range refers to its ability to adapt to the illumination of the scene being shot; the index is quantified and expressed in decibels (dB). For example, the dynamic range of an ordinary CCD camera is 3dB, while wide-dynamic cameras generally reach 80dB and good ones 100dB. Even so, this is far inferior to the human eye, whose dynamic range can reach 1000dB; more remarkable still, an eagle's vision is 3.6 times that of the human eye.

So what do "super wide dynamic" and "ultra wide dynamic" mean? In fact these labels are artificial: some manufacturers add a "super" to distinguish themselves from others or to advertise their own wide-dynamic performance. In reality there are only so-called first- and second-generation differences. To improve the dynamic range of their cameras, early manufacturers used double exposure followed by superimposed output: first expose the brighter background quickly to get a relatively clear background, then expose the subject slowly to get a relatively clear subject, and finally superimpose the two images in video memory and output the result. This approach has inherent disadvantages: the camera's output is delayed, with serious smearing when shooting fast-moving objects, and sharpness is still insufficient, especially when the background illumination is very strong and the contrast between subject and background is large.
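
The dual-exposure idea described above can be sketched in a few lines. The following toy Python/NumPy example assumes two already-captured frames of the same scene, a short (fast) exposure and a long (slow) exposure, and blends them with a simple brightness mask; real cameras implement far more sophisticated fusion in hardware.

```python
import numpy as np

def fuse_exposures(short_exp, long_exp, threshold=200):
    """Toy sketch of the early dual-exposure WDR idea: take bright
    regions from the short (fast) exposure and dark regions from the
    long (slow) exposure, then superimpose them into one frame."""
    short_exp = short_exp.astype(np.float32)
    long_exp = long_exp.astype(np.float32)
    # Where the long exposure is blown out, trust the short exposure.
    mask = (long_exp >= threshold).astype(np.float32)
    fused = mask * short_exp + (1.0 - mask) * long_exp
    return fused.clip(0, 255).astype(np.uint8)

# Toy usage with synthetic 8-bit frames:
short = np.full((4, 4), 120, np.uint8)    # background properly exposed
long = np.full((4, 4), 255, np.uint8)     # background blown out
print(fuse_exposures(short, long)[0, 0])  # 120: short exposure wins here
```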

Wide dynamic range was especially popular in early analog and digital systems and was regarded as an important product selling point. Even in the AI era, this technology has not been eliminated.

What are the common graphic (image) formats?

Generally speaking, current graphic (image) formats fall into two categories: bitmaps, which describe an image as a grid of dots, and drawing, vector or object-oriented formats, which describe an image mathematically as geometric elements. The latter render images in a precise, realistic way, retain full resolution after scaling, and are widely used in professional-grade graphics.

Before introducing the formats themselves, it helps to understand some related technical indicators of graphics (images): resolution, number of colors, and grayscale.

Resolution: divided into screen resolution and output resolution. The former is expressed as the number of lines per inch; the larger the value, the better the image quality. The latter is the precision of the output device, expressed as the number of pixels per inch.

Number of colors and grayscale: expressed in bits, generally written as 2 to the nth power, where n is the number of bits. When a graphic (image) reaches 24 bits, it can express 16.77 million colors, i.e. true color. Grayscale is expressed in the same way. Let's review the common graphic file formats one by one through their characteristic file suffixes (e.g. .bmp): BMP, DIB, PCP, DIF, WMF, GIF, JPG, TIF, EPS, PSD, CDR, IFF, TGA, PCD, MPT.
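
The arithmetic behind the "number of colors" figure is simply 2 raised to the bit depth, as this snippet illustrates:

```python
# Number of representable colors for an n-bit image is 2**n.
for bits in (1, 8, 16, 24):
    print(f"{bits}-bit: {2 ** bits:,} colors")
# 24-bit: 16,777,216 colors -- the "16.77 million" true-color figure.
```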

BMP (bit map picture): the most common bitmap format on the PC, with both compressed and uncompressed forms. The format can express color depths from 2-bit to 24-bit, with resolutions from 480×320 to 1024×768. It is quite stable in the Windows environment and is widely used where file size is not a constraint.

DIB (device independent bitmap): essentially the same image-description capability as BMP, and it can run on a variety of hardware platforms, but its files are larger.

PCP (PC paintbrush): a compressed, disk-saving PC bitmap format created by ZSoft that can represent up to 24-bit graphics (images). It once had a certain market, but with the rise of JPEG its status has gradually declined.

DIF (drawing interchange format): AutoCAD's graphic file, which stores graphics in ASCII form, represents dimensions very accurately, and can be edited by large packages such as CorelDraw and 3DS.

WMF (Windows metafile format): the Microsoft Windows metafile, characterized by small files and template-style modeling. Graphics of this type are relatively crude and can only be edited in Microsoft Office.

GIF (graphics interchange format): a compressed graphics format that can be handled by graphics software on various platforms. Its disadvantage is that it can store at most 256 colors.

JPG (joint photographic expert group): a format that compresses graphic files heavily. For the same picture, a JPG file is 1/10 to 1/20 the size of other graphic file types, while the color depth can still reach 24 bits, so it is widely used on web homepages and in online picture libraries.

TIF (tagged image file format): files are huge, but so is the amount of stored information; it preserves fine tonal detail, which helps reproduce the tone and color of the original. The format has compressed and uncompressed forms, and the number of supported colors can reach 16M.

EPS (encapsulated PostScript): an ASCII graphic file described in the PostScript language that can print high-quality graphics (images) on a PostScript printer and can represent up to 32-bit graphics (images). The format comes in Photoshop EPS, Adobe Illustrator EPS and standard EPS variants, which can in turn be divided into graphic and image formats.

PSD (Photoshop standard): the standard file format of Photoshop, optimized for work within Photoshop.

CDR (CorelDraw): the file format of CorelDraw. In addition, CDX is a graphics (image) file usable by all CorelDraw applications, a mature variant of the CDR file.

IFF (image file format): used on large super-graphics processing platforms such as AMIGA machines; Hollywood special-effects blockbusters are mostly processed in this format. It reproduces graphic (image) effects, including color and texture, faithfully to the original scene. Of course, the computer resources it consumes, such as memory and external storage, are correspondingly huge.

TGA (tagged graphic): a graphic file format developed early on by Truevision for its display cards; the maximum color depth reaches 32 bits. VDA, PIX, WIN, BPX, ICB and others are related variants.

What are “full-duplex” and “half-duplex”, “brightness”, “hue” and “saturation”, and search for pictures by picture?

What are “full duplex” and “half duplex”

Full-duplex: can send and receive at the same time. Full-duplex requires separate channels for receiving and sending. It can be used for communication between two stations and in star and ring networks, but not in bus networks.

Half-duplex: cannot send and receive at the same time; sending and receiving are time-shared. Half-duplex allows the transmitter and receiver to share the same channel, and it can be used in LANs of various topologies, most commonly bus networks. Half-duplex throughput is theoretically half that of full-duplex.

What are “brightness”, “hue” and “saturation”

Any color can be described by its brightness, hue and saturation; any colored light seen by the human eye is the combined effect of these three characteristics. So what do brightness, hue and saturation mean?

Brightness: the sensation of lightness produced when light acts on the human eye; it is related to the luminous intensity of the observed object.

Hue: the color sensation produced when the human eye sees light of one or more wavelengths. It reflects the category of a color and is the basic characteristic that distinguishes one color from another; red and brown, for example, are hues.

Saturation: the purity of a color, that is, the degree to which white light is mixed in, or the depth of the color.

For colored light of the same hue, the higher the saturation, the more vivid or pure the color. Hue and saturation together are commonly referred to as chroma.

It can be seen that luminance indicates how bright a colored light is, while chromaticity indicates the kind and depth of the color. In addition, the various colors of light found in nature can be produced by mixing red (R), green (G) and blue (B) in different proportions; conversely, the vast majority of colored light can be decomposed into red, green and blue components. This is the most basic principle of colorimetry: the principle of three primary colors (RGB).
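
As a concrete illustration of how luminance relates to the three primaries, the weighted sum used in SD television (the ITU-R BT.601 weighting) derives a brightness value Y from R, G and B; the weights reflect the eye's differing sensitivity to the three primaries.

```python
def luma_bt601(r, g, b):
    """Luminance (Y) from R, G, B per the ITU-R BT.601 weighting
    used in SD television systems such as PAL and NTSC."""
    return 0.299 * r + 0.587 * g + 0.114 * b

print(luma_bt601(255, 255, 255))  # pure white  -> 255.0
print(luma_bt601(0, 255, 0))      # pure green  -> 149.685 (eye is most sensitive to green)
print(luma_bt601(0, 0, 255))      # pure blue   -> 29.07  (least sensitive to blue)
```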

Search for pictures by picture

Search by image has become a basic function of intelligent video surveillance systems. It is a specialized search-engine capability that retrieves related graphic and image data, within the surveillance system or on the Internet, by matching image text or visual features; it is a subdivision of search engines. One can search by entering keywords similar to the image's name or content, or by uploading an image or image URL similar to the desired results.

Broadly speaking, image features include text-based features (keywords, annotations, etc.) and visual features (color, logos, texture, shape, etc.). Visual features can be further divided into general visual features and domain-specific (locally specific) visual features. The former describe properties common to all images, regardless of an image's specific type or content, mainly color, texture and shape; the latter rest on prior knowledge (or assumptions) about the content of the image being described and are closely tied to specific applications, such as human facial features, license plates or vehicle characteristics.

Search by image is already a basic function in AI applications. By providing a global or local feature, such as a photo of a vehicle, a license plate, a face or a body attribute, the user can quickly retrieve matches from the video image information database.
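
As a rough sketch of how the retrieval step can work once features have been extracted, the example below assumes every gallery image has already been reduced to a fixed-length feature vector (for example a face or vehicle embedding; all names here are illustrative) and ranks the gallery by cosine similarity to the query.

```python
import numpy as np

def search_by_image(query_vec, gallery, top_k=5):
    """Minimal sketch of feature-based image retrieval: rank a
    gallery of feature vectors by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q                       # cosine similarity per image
    order = np.argsort(-scores)[:top_k]  # best matches first
    return order, scores[order]

# Usage with random stand-in vectors (128-dim "embeddings"):
gallery = np.random.rand(1000, 128)
query = gallery[42] + 0.01 * np.random.rand(128)  # near-duplicate of item 42
idx, sims = search_by_image(query, gallery)
print(idx[0])  # expected: 42
```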

What are “black level”, “white level” and signal-to-noise ratio?

What are "black level", "white level" and signal-to-noise ratio?

What are “Black Level” and “White Level”

Black level: defines the signal level corresponding to image data 0. Adjusting the black level does not change the amplification of the signal; it only shifts the signal up or down. Raising the black level makes the image darker; lowering it makes the image brighter. When a camera's black level is 0, any level below 0V is converted to image data 0, while levels above 0V are converted according to the amplification defined by the gain, up to a maximum of 255. The black level (also called the absolute black level) setting is the lowest point of black. This "lowest point of black" relates to the energy of the electron beam emitted by a CRT picture tube: when the beam energy falls below the minimum needed to make the phosphor emit light, the screen displays the deepest black. The US NTSC color television system places the absolute black level at 7.5IRE, meaning signals below 7.5IRE are displayed as black, while the Japanese TV system places the absolute black level at 0IRE.

The white level is the counterpart of the black level: it defines the signal level corresponding to image data 255. The difference between the two defines the gain from another angle. In quite a few applications the user never sees a white-level adjustment, because the white level is fixed in the hardware circuit.
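
The black/white-level mapping described above can be summarized in a few lines. This sketch assumes an idealized linear digitizer; the 0.7V default white level is only an illustrative figure borrowed from common analog video practice.

```python
import numpy as np

def digitize(voltage, black_level=0.0, white_level=0.7):
    """Sketch of the mapping described in the text: voltages at or
    below the black level become 0, voltages at or above the white
    level become 255, and the span in between is scaled linearly
    (the gain). Shifting black_level up darkens the image."""
    scale = 255.0 / (white_level - black_level)      # the gain
    code = (np.asarray(voltage) - black_level) * scale
    return np.clip(code, 0, 255).astype(np.uint8)

print(digitize([-0.1, 0.0, 0.35, 0.7, 0.9]))  # [  0   0 127 255 255]
```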

What is the signal-to-noise ratio

Signal-to-noise ratio (S/N) is the ratio of the signal strength of the maximum undistorted sound a source can produce to the noise strength present at the same time. That is, it is the ratio of useful signal power (Signal) to noise power (Noise), usually written S/N and expressed in decibels (dB). The same calculation applies to image systems.
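
In code, the decibel form of this definition is a one-liner:

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(S / N)."""
    return 10 * math.log10(signal_power / noise_power)

print(snr_db(1.0, 0.001))  # signal 1000x the noise power -> 30.0 dB
```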

It is the ratio, in dB, of a signal's maximum undistorted output to the unavoidable electronic noise; the larger the value, the better. Below about 75dB, noise may become audible during silence. In general, a sound card's signal-to-noise ratio is often unsatisfactory because of heavy high-frequency interference inside a computer.

The signal-to-noise ratio of a camera's image and the image's sharpness are both important measures of image quality. Image signal-to-noise ratio is the ratio of the video signal level to the noise signal level. The two are generated together and cannot be separated: the noise signal is useless and degrades the useful signal, yet it cannot be stripped from the video signal. When choosing a camera, therefore, it is enough that the useful signal be sufficiently larger than the noise, which is why the ratio of the two is used as the yardstick. If the image's signal-to-noise ratio is high, the picture is clean and free of visible noise ("snow"), and is comfortable to watch; if it is low, the picture is full of snow, which spoils normal viewing.

What are “line”, “progressive” and “interlaced”, illuminance/sensitivity and IRE?

“Line”, “progressive” and “interlaced”

In traditional CRT analog TV, the sweep of the electron beam in the horizontal direction is called a “line”, or “line scan”.

Each TV frame is composed of a number of horizontal scan lines: 625 lines/frame in the PAL system and 525 lines/frame in NTSC. If all the lines of a frame are scanned continuously from top to bottom, line by line, i.e. in the order 1, 2, 3, …, 525, the method is called progressive scanning.

In fact, one frame of an ordinary TV picture is completed in two passes. The first pass scans only the odd lines (1, 3, 5, …, 525) and the second only the even lines (2, 4, 6, …, 524). This is interlaced scanning. A picture containing only odd or only even lines is called a “field”: the odd-line field is the “odd field” or “top field”, and the even-line field is the “even field” or “bottom field”. An odd field plus an even field makes one “frame” (one complete image).
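
In digital terms, separating an interlaced frame into its two fields is just a matter of taking every other row, as this NumPy sketch shows (array rows are 0-based, while the text counts lines from 1):

```python
import numpy as np

def split_fields(frame):
    """Split an interlaced frame into its two fields: the top (odd)
    field holds lines 1, 3, 5, ... and the bottom (even) field holds
    lines 2, 4, 6, ... (1-based, as in the text)."""
    top_field = frame[0::2]     # rows 0, 2, 4, ... = lines 1, 3, 5, ...
    bottom_field = frame[1::2]  # rows 1, 3, 5, ... = lines 2, 4, 6, ...
    return top_field, bottom_field

frame = np.arange(12).reshape(6, 2)  # toy 6-line "frame"
top, bottom = split_fields(frame)
print(top.shape, bottom.shape)       # (3, 2) (3, 2): two half-height fields
```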

Illumination/sensitivity

Illuminance is a unit reflecting the intensity of light. Its physical meaning is the luminous flux falling on a unit area. The unit of illuminance is lumens (Lm) per square meter, also called Lux: 1Lux = 1Lm/m². Here Lm is the unit of luminous flux, defined as the amount of light radiated within a solid angle of 1 steradian by a surface of pure platinum of area 1/600,000 m² at its melting temperature (about 1770°C).

To get an intuitive feel for illuminance, consider an example. A 100W incandescent lamp emits a total luminous flux of about 1200Lm. If this flux is assumed to be evenly distributed over a hemisphere, the illuminance at 1m and at 5m from the source can be obtained as follows: the area of a hemisphere of radius 1m is 2π×1² = 6.28m², so the illuminance at 1m from the source is 1200Lm / 6.28m² = 191Lux; similarly, the area of a hemisphere of radius 5m is 2π×5² = 157m², so the illuminance at 5m is 1200Lm / 157m² = 7.64Lux. It can be seen that illuminance from a point source obeys the inverse square law.
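
The same hemisphere calculation in Python, reproducing the two figures above:

```python
import math

def hemisphere_lux(flux_lm, radius_m):
    """Illuminance if a source's luminous flux spreads evenly over a
    hemisphere of the given radius (the example in the text)."""
    return flux_lm / (2 * math.pi * radius_m ** 2)

print(round(hemisphere_lux(1200, 1)))     # ~191 Lux at 1 m
print(round(hemisphere_lux(1200, 5), 2))  # ~7.64 Lux at 5 m
```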

1Lux is approximately the illuminance produced by one candle at a distance of 1m. The minimum illumination figure commonly found in camera specifications means the camera can obtain a clear image at the stated Lux value; the smaller the value, the better, indicating a more sensitive CCD. Under the same conditions, a black-and-white camera requires far less illumination, more than ten times less, than a color camera, which must additionally process color information.

What is IRE

IRE is the abbreviation of the Institute of Radio Engineers; the video signal unit defined by this body is named IRE after it. Today IRE values are commonly used to express picture brightness: for example, 10IRE is darker than 20IRE, and the brightest level is 100IRE. So what is the difference between setting the absolute black level to 0IRE and to 7.5IRE? Because of the limited performance of early monitors, areas of the screen below 7.5IRE in brightness basically could not show any detail and simply looked black. Setting the black level at 7.5IRE allowed some signal components to be discarded, simplifying the circuitry to a certain extent. Modern monitors, however, perform far better and can render shadow detail well; setting the black level to 0IRE then reproduces the picture faithfully.

What are “PAL format” and “NTSC format”, “field” and “frame”?

“PAL” and “NTSC”

Although the question of broadcast “standard” is rarely raised now, it was a very important concept in the era of analog video surveillance, much like the basic rule of whether motor vehicles drive on the left or the right.

PAL (Phase Alternating Line) is a TV standard established in 1965, used mainly in China, Hong Kong, the Middle East and Europe. The color bandwidth of this format is 4.43MHz, the audio bandwidth is 6.5MHz, and the picture is 25 frames per second.

The NTSC (National Television System Committee) format is a color television broadcasting standard formulated by the US National Television System Committee in 1952. The United States, Canada, as well as Taiwan of China, South Korea, the Philippines and other countries and regions use this format. Its color bandwidth is 3.58MHz, its audio bandwidth is 6.0MHz, and the picture is 30 frames per second.

NTSC runs at 30 frames per second and PAL at 25 because of mains frequency: in NTSC countries the mains supply is 110V/60Hz, so the TV's field-frequency signal directly follows the 60Hz AC supply. Since two fields make one frame, 60 divided by 2 gives 30, exactly the frame rate. China's mains supply is 220V/50Hz, so by the same reasoning PAL runs at 25 frames per second.

“Field” and “Frame”

In traditional CRT analog TV, one complete scan in the vertical direction is called a “field”, or “field scan”. Each TV frame is produced by scanning the screen twice, the lines of the second pass filling the gaps left by the first. A 25 frames/s TV picture is therefore actually 50 fields/s (30 frames/s and 60 fields/s for NTSC).

The idea of the “frame” comes from early cinema: a single still image is called a frame. Film runs at 24 frames per second, a rate at which the persistence of vision of the human eye already perceives continuous motion. Simply put, the frame rate is the number of picture frames transmitted per second; it can also be understood as how many times per second the graphics processor can refresh, usually expressed in FPS (frames per second). Each frame is a still image, and displaying frames in rapid succession creates the illusion of motion. The higher the frame rate, the smoother and more realistic the displayed motion.

When a computer plays video on a monitor, it simply displays a series of full frames, without the TV trick of interleaving fields; video formats and MPEG compression techniques designed for computer monitors therefore do not use fields. Traditional analog systems monitor on a CRT (similar to a TV), which involves “fields” and “frames”; digital systems use LCDs or more advanced displays (similar to computer monitors) and process images with computer techniques, so they involve only “frames”. This, too, is a difference between digital and analog monitoring systems.

Even in the era of artificial intelligence, the “frame” remains a very important concept, and how to extract effective frames from a continuous picture stream is crucial. When extracting features of the same face, license plate, human body or vehicle, avoiding repeated extraction and picking the clearest picture both come down to “frame extraction” technology.

Four eras of video surveillance system development

Since 2017, people have clearly felt that they are living in the fourth industrial revolution, whose core is artificial intelligence: no security without AI, no intelligence without AI, no video without AI. Artificial intelligence has been developing for more than 70 years, but its real historical impact on video surveillance systems came in 2017, which can be called the first year of this era; the commercially mature application of face recognition technology was the trigger for its birth.

Looking back over more than 60 years of video surveillance, its development can be divided roughly into four eras: the analog era, the digital era, the intelligent era and the data era.

① The analog era (1957-2004)
Three "cannots": cannot see, cannot see clearly, cannot understand
The analog era had three characteristics:
Cannot see. Limited by cost, technology and other factors, many places that should have had monitoring equipment in the analog era did not, so they simply could not be seen.
Cannot see clearly. Analog surveillance resolution was based on TV standards: 380TVL, 420TVL, 480TVL and 540TVL were the mainstream resolutions, all far below high-definition surveillance. The degradation was even more obvious after analog-to-digital conversion (especially the lossy compression of DVR hard disk video recorders).
Cannot understand. The analog video signal had almost no analysis or intelligent functions, usually offering only motion detection (over the full picture or a partial area).
The main functions of video surveillance in the analog era were live monitoring, recording and playback of recordings.

② The digital era (2004-2017)
Four "fulls": full-area coverage, full-network sharing, full-time availability, full-process control
Thanks to advances in network and IT technology, the digital era remedied some shortcomings of the analog era, with four characteristics:
Full-area coverage. Video surveillance coverage of key public areas reaches 100%, and the proportion of new or upgraded HD cameras reaches 100%; coverage of important parts of key industries and fields reaches 100%, with the number of new and upgraded HD cameras gradually increasing.
Full-network sharing. The networking rate of video surveillance in key public areas reaches 100%; the networking rate of video image resources in key industries and fields involving public areas reaches 100%.
Full-time availability. The serviceability rate of video surveillance cameras in key public areas reaches 98%, and that of cameras in key industries and fields involving public areas reaches 95%, realizing all-weather use of video image information.
Full-process control. A hierarchical security system for the networked application of public security video surveillance has been basically completed, so that important video image information does not go out of control and sensitive video image information does not leak.
The digital era expanded large-scale cluster applications, emphasizing coverage and large-scale networking, turning each "closed-circuit" surveillance system into a powerful networked video surveillance resource.

③ The intelligent era (from 2017)
Three "cans": can see, can see clearly, can understand
Thanks to the development of artificial intelligence technology, video surveillance underwent a qualitative change on entering the intelligent era:
Can see. With full-area coverage, video surveillance essentially has no blind spots, ensuring that every place that needs monitoring can be seen. Video surveillance is to a city what eyes are to a person; without it there are blind spots in management.
Can see clearly. Mainstream cameras are now networked, with 2-megapixel or 3-megapixel sensors, resolutions of 1080p, 4K or even higher (such as 8K), plus wide-angle options such as panoramic and fisheye cameras, ensuring that the picture is high-definition, wide, and usable for intelligent analysis.
Can understand. Thanks to the development in recent years of computer vision technologies such as license plate, vehicle, face, human body and object feature recognition, the computer can automatically interpret video and images, with functions similar to the human eye and intelligence similar to the human brain.
In the intelligent era, video surveillance already possesses some functions of the human brain, realizing video analysis and intelligent applications.

④ The data era (from 2018)
Four "datas": panoramic data, full data, global data, holographic data
Cloud computing and big data are no longer fashionable vocabulary; they have penetrated every aspect of social governance. Once unstructured video image data is structured, it forms video image big data, which can be divided into four categories:
Panoramic data. Spatial-dimension data covering people, vehicles, objects, mobile phones, access control, WiFi, IoT sensing, maps, addresses, house numbers, grids, population, housing, organizations, urban components and so on.
Full data. On the basis of panoramic data, this adds the time dimension, forming full-time-and-space data that includes trajectories, activities, events and so on.
Global data. Associations among data built on top of panoramic data: multi-dimensional correlation information collected from multiple channels, perspectives and sides, forming a model that contains all of the system's information and realizes data association, collision and multi-dimensional perception.
Holographic data. Global data fused with video images, yielding three-dimensional, multi-dimensional, interconnected full-time-and-space data. Typical applications include 3D holographic projection, virtual reality (VR) and augmented reality (AR).

The Industrial Revolution

There have been four recognized industrial revolutions, and we are now in the fourth. Seemingly unrelated, the fourth industrial revolution has genuinely driven video surveillance systems toward intelligence.

The first industrial revolution refers to the technological revolution that began in Britain in the 1760s, a huge turning point in the history of technology that opened the era of machines replacing manual labor. It was not only a technological transformation but a profound social change. It began with the birth of working machines and was marked by the widespread use of the steam engine as a power source. This technological revolution, and the changes in social relations that accompanied it, are called the first industrial revolution, or simply the Industrial Revolution. It allowed the factory system to replace manual workshops and machines to replace handwork; in terms of social relations, it eliminated the yeoman class attached to backward modes of production, while the industrial bourgeoisie and the industrial proletariat formed and grew.

The second industrial revolution followed the completion of bourgeois revolutions or reforms in Europe, the United States and Japan in the mid-nineteenth century, which promoted economic development. In the 1870s the second industrial revolution began and mankind entered the "Electrical Age". It greatly advanced social productivity and had a profound impact on the economy, politics, culture, military affairs, science and technology of human society. The socialization of capitalist production was greatly strengthened and monopoly organizations emerged. The second industrial revolution deepened the unevenness of the capitalist countries' economic, cultural, political and military development, and the imperialist struggle for markets and world hegemony intensified. It also promoted the formation of the world colonial system, finally establishing the capitalist world system, and the world gradually became a whole.

The third industrial revolution was another major leap in science and technology in the history of human civilization, after the steam and electrical revolutions. This scientific and technological revolution was marked by the invention and application of atomic energy, electronic computers, space technology and bioengineering: a revolution in information and control technology. It not only greatly advanced the transformation of human society, economy, politics and culture, but also changed how people live and think. With the continuous advance of science and technology, major changes have taken place in every aspect of daily life. The third scientific and technological revolution also aggravated the uneven development of the capitalist countries and brought new changes to their international status, sharpened their rivalry with the socialist countries, gradually widened the gap between rich and poor, and promoted changes in social production relations worldwide.

The fourth industrial revolution is a new technological revolution based on artificial intelligence, clean energy, robotics, quantum information technology, virtual reality, augmented reality and biotechnology. It is also called Industry 4.0, a term first proposed in Germany. Opening the door to the future with artificial intelligence is exactly what this book discusses in detail, especially its impact on video surveillance systems.

Video surveillance system development history

In 1951, several companies, led by RCA (Radio Corporation of America), began to develop video recorders and videotape. In December 1953, the Bing Crosby laboratory took the lead in demonstrating color multi-track videotape and its playback system using the multi-track method, but the playback picture was relatively blurry and could not be put into immediate use. In April 1956, the American company Ampex developed the world's first practical commercial videotape recorder, naming it the "Ampex VRX-1000".

In 1957, Panasonic developed its first electronic tube camera, followed by the 1-inch WV-010 camera in 1962, a three-tube color camera in 1970, a CCD color camera in 1985, the WJ-FS50 video recorder in 1992 and the WJ-HD500 16-channel hard disk video recorder in 2001. For nearly 40 years Panasonic stayed at the forefront of the security monitoring technology revolution; its super dynamic technology, automatic dark-area compensation, automatic back-focus adjustment, lock tracking and intelligent analysis technologies led the industry's development all the way. When it comes to video surveillance there is another company that cannot be ignored, Sony, whose core influence in the field comes from the CCD.

In 1969, Bell Telephone Laboratories in the United States invented the CCD, a magician that converts "light" information into "electrical" information. In Sony's development team at the time, a young man named Shigeyuki Ochi became deeply interested in the CCD and began researching it. Because the work was still far from commercialization, Ochi could only carry on his research quietly, alone. In 1973 Kazuo Iwama, then vice president of Sony Corporation and a manager of unique discernment, discovered Ochi's research and said with excitement: "This should be a project for Sony's semiconductor division! Good, let's do it, and cultivate this seedling!" At that point Ochi had only managed a rough "S" drawn with 64 pixels. Yet Iwama left him a puzzling instruction: "Use the CCD to make a camera." From that, the booming surveillance industry would later be born.

In November 1973, the CCD finally became an official project at Sony, and a development team centered on Ochi was established. In March 1978, Sony built an integrated device once considered "impossible", with 110,000 components on a circuit board, creating the world's first CCD color camera. In 1985 the first 8mm camcorder, the "CCD-V8", was born. After that, CCD development and product launches were on track: Sony introduced HAD sensors in the early 1980s, ON-CHIP MICRO LENS technology in the late 1980s, SUPER HAD CCDs in the mid-1990s, NEW STRUCTURE CCDs in 1998 and EXVIEW HAD CCDs in 1999. The CCD also came to be widely used in the field of video surveillance.

CMOS image sensors appeared at almost the same time as CCD image sensors, in the 1960s. After 1990, passive-pixel CMOS image sensors entered the market as the first generation of CMOS imagers, and CMOS gradually became the mainstream sensor solution for cameras. But that is a later story.

In 1996, Axis launched the world's first network camera. In 2008, Axis, together with Bosch and Sony, announced the establishment of ONVIF (Open Network Video Interface Forum) to jointly formulate open industry standards, which promoted the development and popularization of networked video surveillance systems. In October 2011, GoTV released the first video synopsis system; in May 2016, Jiadu Technology launched its "Video Cloud+" application platform V1.0; in March 2018, Yuncong Technology officially released its high-performance AI face-recognition camera, the "Torch Eye".

In the history of video surveillance, two latecomers must also be mentioned: China's Hikvision and Dahua. According to the Nikkei (Chinese edition: Nikkei Chinese website) 2017 "Survey on Global Market Share of Major Goods and Services", Chinese companies held the largest market share in 9 of the 71 categories surveyed. In surveillance cameras, first-placed Hikvision held a 31.3% share and second-placed Dahua 11.8%, a combined 43.1%, which means that almost half of the world's cameras are made by these two companies.

Hikvision was founded in 2001. In 2002 it released the DS-4000M audio/video compression card and the DS-8000 network hard disk video recorder, beginning a video surveillance journey that has now lasted 16 years. In 2003 it was the first in the industry to implement the H.264 algorithm on a DSP, and on that basis it was the first to launch the DS-4000H board and the DS-8000M network hard disk video recorder with independent intellectual property rights, opening a new H.264 era. In 2007 it launched its first infrared bullet camera series, with which it broke into the Chinese camera market and quickly became one of the mainstream infrared camera brands, completing the leap from DVRs to cameras. In 2009 it formally established its Public Security Industry Solutions Business Department and released the iVMS-8200 Safe City comprehensive application management platform, successfully transforming from a product supplier into a full-solution provider. In 2012 the launch of its first consumer product, "Xiaoweishi", marked the extension of video surveillance into small and micro enterprises and home micro-video applications. In 2015, under the "Smart Security 2.0" concept, it released cross-over products such as industrial cameras and drones, pushing security from safety toward efficiency and benefit. In 2017, Hikvision released its AI Cloud edge-cloud integration strategy, marching toward AI and promoting the development of the industry.

Dahua was founded in 1993 and became a joint-stock company in 2001. In its early days Dahua mainly produced hard disk video recorders. At that time the DVR products on the market were all board-based, with all core technology imported; prices were so high that ordinary consumers could not afford them, and domestic companies were suppressed for a long time. In 2002, Dahua launched the industry's first self-developed 8-channel embedded DVR with synchronized audio and video, creating the leading embedded DVR brand and helping security blossom everywhere. Since then Dahua has continued to focus on video surveillance technology and has gradually grown into the world's second-largest company by camera market share. Several milestones stand out in Dahua's history: going overseas in 2003; launching the all-in-one intelligent transportation unit in 2007, which became the de facto standard for China's electronic police; focusing on image processing and network technology in 2008; introducing CMOS technology and launching a high-definition, high-zoom camera module in 2010; introducing the concept of the "cloud" in 2011; having its HDCVI technology adopted by the HDcctv alliance in 2012 as the first international standard from China's security industry; establishing the Le Orange brand in 2014 to enter the consumer market, and setting up its first overseas subsidiary in the United States; and founding an artificial intelligence research institute in 2016 while launching a cloud ecology and smart future strategy of "all intelligence, all computing, all sensing, all ecology". Going forward, with video capability at its core and artificial intelligence as its backing, Dahua expects to provide smart IoT solutions and operation services and to actively deploy smart video surveillance systems.