Author Topic: Novel Object Removal in Video Using Patch Sparsity  (Read 2224 times)



Novel Object Removal in Video Using Patch Sparsity
« on: April 23, 2011, 02:05:40 pm »
Author : B. Vidhya, S. Valarmathy
International Journal of Scientific & Engineering Research, IJSER - Volume 2, Issue 4, April-2011
ISSN 2229-5518

Abstract Video inpainting is the process of repairing damaged areas or removing specific regions in a video. Addressing this problem requires not only a robust image inpainting algorithm but also a structure-generation technique to fill in the missing parts of a video sequence taken from a static camera. Most automatic video inpainting techniques are computationally intensive and unable to repair large holes. To overcome this problem, this paper proposes extending the inpainting method by incorporating the sparsity of natural image patches in the spatio-temporal domain. First, the video is converted into individual image frames. Second, the edges of the object to be removed are identified by the Sobel edge detection method. Third, the inpainting procedure is performed separately on each frame. Finally, the inpainted frames are displayed in sequence to form an inpainted video. For each frame, the confidence of a patch located on image structure (e.g., a corner or edge) is measured by the sparseness of its nonzero similarities to neighbouring patches, yielding the patch's structure sparsity. Patches with larger structure sparsity are assigned higher inpainting priority. The patch to be inpainted is represented as a sparse linear combination of candidate patches. The algorithm performs patch propagation automatically, propagating image patches from the source region into the interior of the target region patch by patch. Compared with other inpainting methods, structure sparsity gives better discrimination between texture and structure, and the patch sparse representation produces sharper inpainted regions. This work can be extended to a wide range of applications, including video special effects and the restoration and enhancement of damaged videos.

Index Terms Candidate patches, edge detection, inpainting, linear sparse representation, patch sparsity, patch propagation, texture synthesis.

Video inpainting is the process of filling in missing regions in a video; inpainting modifies an image in a visually undetectable way. Video has become an important medium of communication. Image inpainting is performed in the spatial domain, whereas video inpainting is performed in the spatio-temporal domain.

Video inpainting plays as vital a role in image processing and computer vision as image inpainting does. Its goals and applications are numerous, from the restoration of damaged videos to the removal or replacement of selected objects in the video. Video inpainting removes objects, or restores missing or tainted regions in a video, by utilising spatial and temporal information from neighbouring frames. The overriding objective is to generate an inpainted area that merges seamlessly into the video, so that visual coherence is maintained throughout and no distortion in the affected area is observable to the human eye when the video is played as a sequence.

A video is a sequence of image frames, normally displayed at twenty-five frames per second; at much lower rates the sequence appears to the human eye as a succession of still images rather than continuous motion. The main difference between inpainting and texture synthesis lies in the size and characteristics of the region to be filled. In texture synthesis, the region can be much larger, the main focus being the filling in of two-dimensional repeating patterns with some associated stochasticity, i.e. textures. In contrast, inpainting algorithms concentrate on filling in much smaller regions characterised by linear structures such as lines and object contours. Criminisi et al. [3] presented a single algorithm that handles both cases, provided both textures and structures are present in the image.

Nowadays, much research is carried out in the field of video inpainting because of the varied and important applications of automatic video inpainting. The main applications include undesired object removal, such as removing unwanted objects like birds or aeroplanes that appear during filming, or censoring an obscene gesture or action not deemed appropriate for the target audience when re-shooting the scene would be infeasible or expensive.

Another main application is the restoration of videos damaged by scratches or dust spots, or whose frames are corrupted or missing. When video is transmitted over unreliable networks, significant portions of frames can be lost. To view the video in its original form, these damaged scenes must be repaired in a manner that is visually coherent to the viewer.

Initially, all the above tasks were performed manually by restoration professionals, which was painstaking, slow and very expensive. Automatic video restoration has therefore attracted both commercial organisations (such as broadcasters and film studios) and private individuals who wish to edit and maintain the quality of their video collections.
Bertalmio et al. [1], [2] designed frame-by-frame video inpainting based on partial differential equations (PDEs), which laid the foundation for subsequent research in video inpainting. PDE-based methods are mainly edge-continuing methods. In [2], the PDE is applied spatially and the video is inpainted frame by frame; the temporal information of the video is not considered in the inpainting process.

Wexler et al. [6] proposed a method for space-time completion of large damaged areas in a video sequence. The authors treat inpainting as sampling spatio-temporal patches (sets of pixels at frame t) from other frames to fill in the missing data. Global consistency is enforced for all patches surrounding the missing data, ensuring coherence of the surrounding space-time patches; this avoids artefacts such as multiple recovery of the same background object and inconsistent object trajectories. The method provides decent results, but it suffers from a high computational load and requires a long video sequence of similar scenes to increase the probability of correct matches. The results shown are for very low-resolution videos, and the inpainted static background differed from one frame to another, creating a ghosting effect; significant over-smoothing is observed as well.
Video inpainting for repairing damaged video was analysed in [4], [7]; these works involve a gamut of different techniques, which makes the process very complicated. They combine motion-layer estimation and segmentation with warping and region filling-in. We seek a simpler, more fundamental approach to the problem of video inpainting.
Inpainting for a stationary background and moving foreground in videos was suggested by Patwardhan et al. [5]. To inpaint the stationary background, a relatively simple spatio-temporal priority scheme is employed: undamaged pixels are copied from frames temporally close to the damaged frame, followed by a spatial filling-in step that replaces the damaged region with a best-matching patch, so as to maintain a consistent background throughout the sequence. Zhang et al. [7] proposed motion-layer-based object removal in videos with a few illustrations.
In this paper, video inpainting for a static camera with a stationary background and moving foreground is considered in the spatio-temporal domain. First, the video is converted into image frames. Second, edges are found using the Sobel edge detection method. Next, the object to be removed is inpainted using a novel exemplar-based image inpainting method that exploits patch sparsity: the known patch values are propagated into the missing region of every frame to reproduce the original image. Last, the inpainted frames are displayed to form the inpainted video. A video of short duration is considered for inpainting, and the temporal-domain information of each frame is used to display the inpainted frames as a video.
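The frame-wise pipeline above can be sketched in Python with numpy. This is a minimal illustration, not the paper's implementation: the naive double loop stands in for an optimised Sobel convolution, and `inpaint_frame` is a hypothetical placeholder for the exemplar-based routine described in Section 2.

```python
import numpy as np

def sobel_edges(frame, threshold=100.0):
    """Sobel edge detection: threshold the gradient magnitude."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical-gradient kernel
    h, w = frame.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):          # skip the 1-pixel border
        for j in range(1, w - 1):
            window = frame[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (window * kx).sum()
            gy[i, j] = (window * ky).sum()
    return np.hypot(gx, gy) > threshold  # boolean edge map

def inpaint_video(frames, masks, inpaint_frame):
    """Per-frame pipeline: inpaint each frame independently, then
    reassemble the repaired frames into the output sequence."""
    return [inpaint_frame(f, m) for f, m in zip(frames, masks)]
```

Here `masks` marks, per frame, the pixels of the object to be removed; in practice the edge map helps delineate that object before the mask is built.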
The rest of this paper is organised as follows. Section 2 gives an overview of image inpainting using the extended exemplar-based inpainting method. Section 3 describes the video inpainting method. The experiments and results are discussed in Section 4. Finally, conclusions and future research directions are given in Section 5.

The most fundamental method of image inpainting is diffusion-based inpainting, in which the unknown region is filled by diffusing pixel values from the known region of the image. Another approach is exemplar-based inpainting, in which the region is filled by propagating patch values from the known region into the unknown region. In our previous work, an exemplar-based image inpainting method was proposed that incorporates the sparsity of natural image patches.
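A diffusion-based filler of the kind described can be sketched as follows. This is a minimal numpy illustration under assumed conventions (grayscale image, boolean mask marking the unknown region); real diffusion inpainting methods use more careful PDE discretisations.

```python
import numpy as np

def diffusion_inpaint(image, mask, iterations=200):
    """Fill the masked (unknown) region by repeatedly replacing each
    unknown pixel with the average of its four neighbours, so known
    values diffuse inward from the region boundary."""
    result = image.astype(float)
    result[mask] = result[~mask].mean()  # neutral initial guess
    for _ in range(iterations):
        padded = np.pad(result, 1, mode='edge')
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        result[mask] = avg[mask]  # update only the unknown pixels
    return result
```

As the text notes, such diffusion tends to blur large holes, which is what motivates the exemplar-based alternative.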

Fig. 1 Patch selection (download the full paper to view the figure)

Fig. 1(a) shows the missing region Ω, the known region (the complement of Ω), and the fill-front ∂Ω of patch propagation. Fig. 1(b) shows two examples of surrounding patches, Ψp and Ψp′, located on an edge and in a flat texture region respectively.

The process of filling in a missing region using image information from the known region is called image inpainting. Let I be the given image with missing (target) region Ω. In exemplar-based image inpainting, the boundary of the missing region is also called the fill front and is denoted by ∂Ω. The patch centred at a pixel p is denoted by Ψp.
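As an illustration, the fill-front ∂Ω can be computed directly from a boolean mask of Ω. This is a small numpy sketch under assumed conventions (`fill_front` is a hypothetical helper name; 4-connectivity is assumed).

```python
import numpy as np

def fill_front(mask):
    """Return dOmega: pixels of the missing region Omega (mask == True)
    that have at least one 4-connected neighbour in the known region."""
    # pad with True so image-border pixels do not count as known
    padded = np.pad(mask, 1, mode='constant', constant_values=True)
    known_neighbour = (~padded[:-2, 1:-1] | ~padded[2:, 1:-1] |
                       ~padded[1:-1, :-2] | ~padded[1:-1, 2:])
    return mask & known_neighbour
```

For a square hole, this returns the ring of boundary pixels; interior pixels of Ω only join the front after the hole has been partially filled and the mask shrunk.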
Exemplar-based image inpainting is based on patch propagation: the algorithm automatically propagates image patches from the source region into the interior of the target region patch by patch. Patch selection and patch inpainting are the two basic steps of patch propagation, and they are iterated until the inpainting process is complete.
