BEHIND THE ZONE OF AVOIDANCE OF THE MILKY WAY: WHAT CAN WE RESTORE BY DIRECT AND INDIRECT METHODS?

Purpose: to present a brief overview of methods for restoring the large-scale structure of the Universe behind the Zone of Avoidance (ZoA) of the Milky Way; to propose a new “algorithm of darning the ZoA” and a new approach based on the generative adversarial network (GAN) to recover the galaxy distribution in the ZoA, using optical surveys as an additional platform for training artificial neural networks. Design/methodology/approach: Owing to extensive monitoring observations in the radio (DOGS project, HI line), infrared (IRAS and 2MASS surveys), and X-ray spectral ranges, the ZoA has decreased significantly in size, and the obscured part is now about 10 % of the sky in the visible spectral range. Cosmic Microwave Background (CMB) measurements showed a $180^\circ$ asymmetry known as the dipole. Although the Local Group velocity vector calculated from the known galaxies lies within $20^\circ$ of the observed CMB dipole, the calculations remain highly ambiguous, partly because the galaxies in the ZoA are not taken into account, and the concept of “attractors” should be reconsidered. Hence, the analysis of the spatial distribution of galaxies and their groups in the regions surrounding and behind the ZoA of the Milky Way remains a complex and unresolved problem, and estimating the “invisible” content of the spatial galaxy distribution obscured by this absorption zone becomes highly topical. Restoring the ZoA is possible by indirect methods (signal processing applied to obscured and incomplete data, Voronoi tessellation, etc.). These recovery methods, however, work only for large-scale structures in the ZoA; they are practically insensitive to individual galaxies and small galaxy systems. We suggest applying a machine learning technique such as the GAN to model the “invisible” spatial galaxy distribution behind the ZoA.
Findings: We present “the algorithm of darning the ZoA” for dividing real extragalactic surveys (e.g., the SDSS DR14 galaxy sample) into slices by redshift, stellar magnitude, coordinates, and other parameters to form a training sample, together with a general GAN scheme for filling the ZoA. We discuss the principal tasks in generating galaxy distributions and their properties in the ZoA from a latent space of features, and describe how the discriminative network will compare the resulting artificial survey with the real one and evaluate how realistic it is. Conclusions: The wavelength-dependent incompleteness of the data indicates that there are still unresolved problems, such as the dynamics of the Local Group and the nearby Universe; the large-scale structure of the Universe in the sky region obscured by the Milky Way; the velocity flow fields towards the Great Attractor; and the CMB dipole. Here, we propose a new “algorithm of darning the ZoA” and a general GAN scheme as an additional machine learning platform to recover the spatial distribution of galaxies behind the ZoA of our Galaxy.


Introduction
The Zone of Avoidance of the Milky Way was first noted by R. Proctor in his paper (1878) concerning the "General Catalogue of Nebulae" by J. Herschel: he called it the Zone of Few Nebulae. In 1922, using the data from the "New General Catalogue" by Dreyer (1888, 1895), Charlier was the first to treat this sky region as a scientific problem of recognizing the distribution of nebulae in the area of the sky obscured by the Milky Way. In 1961, Shapley proposed calling this region the Zone of Avoidance (ZoA), delimited by "the isopleth of five galaxies per square degree from the Lick and Harvard surveys". For a long time this zone was avoided by astronomers interested in extragalactic objects because of 1) the small number of known objects; 2) the decreasing brightness of extragalactic objects towards the Galactic equator; 3) the increasing concentration of stars along the line of sight, which increases the overlap of extragalactic objects with stars [1]. Because the Solar System is not located at the centre of our Galaxy, the ZoA is also heterogeneous and varies with Galactic longitude.
Since the 1990s the picture of the galaxy distribution in the ZoA has changed significantly. It was previously believed that this area hides about 20 % of the spatial distribution of galaxies from an observer in the optical range; this value is now about 10 %. This change is due primarily to studies in the infrared and radio spectral ranges: because light absorption decreases with increasing wavelength, the Zone of Avoidance becomes more transparent in these ranges. As for the incompleteness of ZoA galaxy catalogs as a function of the foreground extinction, we note that optical ZoA surveys are complete to an apparent diameter of $D = 14''$ (where the diameters correspond to an isophote of $24.5^{m}\,\mathrm{arcsec}^{-2}$) for extinction levels less than $A_B = 3.0^{m}$.
The wavelength-dependent incompleteness of the galaxy sample is an issue for studying the dynamical properties of the Local Group; the large-scale structure of the Universe (voids, filaments, walls, galaxy clusters, etc.); the velocity flow fields towards the Great Attractor; the dipole in the Cosmic Microwave Background (CMB); and other important problems.
In this paper, 1) we briefly describe in Chapters 2 and 3 the direct (observational) and indirect (data mining plus confirmation through observations) methods that have been used to detect celestial bodies in the ZoA; 2) we propose a new approach, the algorithm of darning the ZoA, based on machine learning, to reconstruct the galaxy distribution in the ZoA, which takes into account the 3D distribution of galaxies and their photometry. The algorithm and a general scheme of the GAN machine learning method are given in Chapter 4. The conclusions are given in Chapter 5.

A Brief Review of the Direct and Indirect Methods for Restoring the Zone of Avoidance
Because Galactic gas and dust hide a significant part of the sky from visual observation, the detection of sources in this area is problematic. Because the sample on which the velocity field is constructed is incomplete in the area of absorption, we cannot assume that the field is homogeneous, which introduces an error in the direction of motion of our Galaxy determined by this method. The discrepancy between the vectors of motion of the Local Group galaxies relative to the coordinate system associated with the CMB suggests that a significant number of galaxies lie in the area of absorption of our Galaxy.
The methodology for solving this problem includes both direct and indirect techniques. Direct methods are whole-sky survey observations in different spectral ranges in the band near the Galactic equator ($b \in [-20^\circ, 20^\circ]$).
For example, a method in active use today is the search, in the microwave range, for bright sources associated with regions of heated gas and with star-forming HI regions. These areas are also monitored with radio telescopes in order to confirm the presumed presence of galaxies. In some cases, when a source is also visible in the optical range, this allows us to supplement the data on the source and to apply the Tully-Fisher method to determine the distance to the galaxy.

Observational Programs in IR-, Radio-, and X-ray Spectral Ranges
The first qualitative breakthrough in the study of the ZoA belongs primarily to the Italian astronomer P. Maffei, who in 1968 discovered two galaxies in the ZoA using observations in the IR range (see the paper by Maffei, 2003, for a review of his own works [2]). The elliptical galaxy Maffei 1, together with its companion, the spiral galaxy Maffei 2, was discovered on a hyper-sensitized I-N photographic plate exposed on 29 Sept 1967 with the Schmidt telescope at Asiago Observatory. These galaxies were named after him: Maffei 1 and Maffei 2. For example, the last updated data about Maffei … Maffei 1 is located only $0.55^\circ$ from the Galactic plane in the middle of the ZoA and suffers from about $4.7^{m}$ of extinction (a factor of about 1/70) in the visible range. Without this absorption, it would be one of the largest and brightest elliptical galaxies in the sky (about 3/4 the size of the full Moon). Maffei's discovery had a revolutionary effect on our modern picture of the Local Universe and prompted a lively discussion, first of all about the possible membership of these galaxies in the Local Group. We note several important papers on the determination of the distance to Maffei …
Significant progress in reducing the ZoA came from the exploration of the IRAS and 2MASS surveys. For example, in 2000, Jarrett et al. [8] reported the detection of newly discovered sources from the 2MASS Extended Survey in fields covering the Galactic plane at $40^\circ < l < 70^\circ$ and predicted an area-normalized detection rate of ~1-2 galaxies per deg$^2$ brighter than $12.1^{m}$ (10 mJy). See also the earlier paper by Lu et al. [9], with results on identifying the HI spectra of galaxies observed by IRAS.
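The extinction figures quoted for Maffei 1 follow from the standard magnitude-flux relation: an extinction of $A$ magnitudes dims the flux by a factor of $10^{0.4A}$. The short sketch below only illustrates this arithmetic; the function names are hypothetical and are not taken from the cited works.

```python
def extinction_flux_factor(a_mag: float) -> float:
    """Factor by which the flux is dimmed by an extinction of a_mag magnitudes."""
    return 10.0 ** (0.4 * a_mag)


def corrected_magnitude(m_observed: float, a_mag: float) -> float:
    """Apparent magnitude the source would have without foreground extinction."""
    return m_observed - a_mag


if __name__ == "__main__":
    # Extinction of about 4.7 mag, the value quoted in the text for Maffei 1.
    factor = extinction_flux_factor(4.7)
    print(f"flux is dimmed by a factor of about {factor:.0f}")
```

For 4.7 magnitudes the factor is roughly 76, which is consistent with the "about 1/70" quoted in the text.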
Observations of neutral hydrogen (21 cm) within the DOGS project revealed new galaxies in the ZoA, Dwingeloo 1 [10] and Dwingeloo 2 [11] (for estimates of their kinematic and dynamic parameters see, for example, [4,12,13]). The DOGS project was conducted with the 25 m Dwingeloo radio telescope and covered almost the whole observable region of the Northern Galactic Plane, $30^\circ \le l \le 200^\circ$, at Galactic latitudes $|b| \le 5^\circ$. Because the Galaxy is transparent to the 21 cm radiation of neutral hydrogen, systematic HI surveys are particularly powerful for mapping large-scale structure (LSS) in this part of the sky. It should be noted that the absence of a signal does not always indicate the absence of a galaxy; it may instead reflect a low HI content [14]. Although this method is slow and time-consuming, the combination of HI surveys and 2MASS will greatly increase the current census of galaxies hidden behind the Milky Way. Supplementary to these surveys, the Parkes Multibeam HI ZoA Survey, a systematic deep blind HI survey of the southern Milky Way, was begun in 1997 with the Multibeam receiver at the 64 m Parkes telescope. The survey is centered on the southern Galactic Plane, from $l = 196^\circ$ through $l = 0^\circ$ to $l = 52^\circ$, at $|b| \le 5^\circ$ (see, for example, [15]).
The X-ray spectral range is an excellent window for studies of large-scale structure in the ZoA, because the Milky Way is transparent to hard X-ray emission above a few keV and rich clusters are strong X-ray emitters. Since the X-ray luminosity is roughly proportional to the cluster mass as $L_X \propto M^{3/2}$ or $L_X \propto M^{2}$, depending on the still uncertain scaling law between the X-ray luminosity and temperature (see our works [16-19] and references therein), massive clusters hidden by the Milky Way should be easily detectable through their X-ray emission [20,21]. This method is particularly attractive because clusters are primarily composed of early-type galaxies, which are not recovered by IR galaxy surveys or by systematic HI surveys.

A Brief Review of the Mathematical Simulation, Data Mining, and Machine Learning Methods
Indirect methods consist in applying mathematical simulation and data mining methods to fill the ZoA, as well as in determining the gravitational potentials of the nearest galaxies in order to predict the positions of galaxies and galaxy systems in the area of Milky Way absorption. Machine learning techniques are now also receiving considerable attention.
The inhomogeneously distributed mass in the ZoA surrounding the Local Group (LG) may exert an unbalanced gravitational pull on the LG in one direction. The expected velocity of the Local Group can be calculated from the sum of the gravitational forces from all known LG galaxies [22,23]. Although the resulting vector lies within $20^\circ$ of the observed cosmic background dipole, the calculations remain highly ambiguous, partly because galaxies in the ZoA are not taken into account [24,25].
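To illustrate why galaxies missing from the ZoA bias such a calculation, the following minimal sketch sums the Newtonian pulls $M_i \hat{r}_i / r_i^2$ over a toy catalogue, once for the full sky and once with a latitude band masked out. The catalogue, the tuple layout `(l, b, distance, mass)`, and all function names are assumptions made for illustration only; they do not reproduce the calculations of [22-25].

```python
import math


def unit_vector(l_deg, b_deg):
    """Cartesian unit vector for Galactic longitude l and latitude b (degrees)."""
    l_rad, b_rad = math.radians(l_deg), math.radians(b_deg)
    return (math.cos(b_rad) * math.cos(l_rad),
            math.cos(b_rad) * math.sin(l_rad),
            math.sin(b_rad))


def pull_vector(galaxies, zoa_half_width_deg=None):
    """Sum M_i * r_hat_i / r_i**2 over (l_deg, b_deg, distance, mass) tuples.

    If zoa_half_width_deg is given, galaxies with |b| below it are skipped,
    mimicking a catalogue that is empty inside the ZoA.
    """
    total = [0.0, 0.0, 0.0]
    for l_deg, b_deg, distance, mass in galaxies:
        if zoa_half_width_deg is not None and abs(b_deg) < zoa_half_width_deg:
            continue
        weight = mass / distance ** 2
        for k, component in enumerate(unit_vector(l_deg, b_deg)):
            total[k] += weight * component
    return tuple(total)


def angle_between_deg(u, v):
    """Angle between two 3-vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


if __name__ == "__main__":
    # Toy catalogue (arbitrary units): one galaxy at the pole, one in the plane.
    toy = [(0.0, 90.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
    full = pull_vector(toy)
    masked = pull_vector(toy, zoa_half_width_deg=20.0)
    print(f"direction shift caused by the mask: {angle_between_deg(full, masked):.1f} deg")
```

In this toy case a single equal-mass galaxy hidden in the plane shifts the direction of the summed pull by 45 degrees, which shows how sensitive the vector is to nearby obscured objects.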
CMB measurements showed a $180^\circ$ asymmetry known as the dipole. It manifests itself as a heating of the CMB radiation by 0.1 % relative to the average in one direction and an equal cooling in the opposite direction. These measurements were confirmed by the COBE (1989-1990) studies, which indicate that the Milky Way and the Local Group are moving at a velocity $V_p \approx 627$ km/s towards $l = 276^\circ$, $b = 30^\circ$, in the direction of the Hydra constellation [26]. This motion arises from the distribution of matter $M_i$ in the Local Group and depends on the cosmological parameter $\Omega_0$ [27]. The absence of objects in the absorption zone also plays a key role in determining the dipole of the collective velocity of galaxies. Filling the zone $|b| \le 20^\circ$ with galaxies changes the direction of motion measured within a volume of 6000 km/s by $31^\circ$ [28,29]. Unknown galaxies that are closer to us in the ZoA can contribute more to the collective velocity vector than whole clusters at large distances, since the contribution of a galaxy scales with its apparent flux ($\propto 10^{-0.4m}$).
What is the reason for this motion, which manifests itself as a slight deviation from the homogeneous expansion of the Universe? To overcome the discrepancy between the direction of the dipole and the expected velocity vector, it was necessary to introduce the concept of "attractors" (the Great Attractor at a distance of about 60 Mpc). The Local Group is located at the same distance above the Perseus-Fornax cluster (both of which are components of a long chain of galaxies known as the Supergalactic Plane). However, many well-known nearby large-scale structures are bisected by the Galactic Plane, such as the Local Supercluster, the Perseus-Pisces chain, and the Great Attractor. "What is their true extent and their mass? It is curious that the two major superclusters in the Local Universe, i.e. Perseus-Pisces and the Great Attractor overdensity, lie at similar distances on opposite sides of the Local Group, both partially obscured by the ZoA. Which one of the two is dominant in the tug-of-war on the Local Group? Do these features continue across the Galactic Plane and are there other massive structures hidden in the ZoA for which so far no indication exists? What is the size of the largest coherent structures?" These questions remain unanswered. For instance, the Great Wall and the Perseus-Pisces chain may be connected across the ZoA, as suggested by Giovanelli & Haynes [26], indicating structures of $(50\text{-}200)\,h^{-1}$ Mpc. "The latter would be incompatible with the angular extent over which fluctuations -the seeds of current large-scale structures -have been measured in the CMB". To answer these questions, the superclusters need to be fully mapped across the ZoA. We should take into account that the ZoA is highly incomplete at low Galactic latitudes in the larger Galactic Bulge area ($l = 0^\circ \pm 90^\circ$), including the Great Attractor region. Even if the obscured galaxies can be identified, their redshifts are very difficult, if not impossible, to obtain at the higher extinction levels.
Attempts to resolve the incompatibility between the apex of the Local Group motion determined from the CMB and that determined from the velocity field have not given a positive result, since the method assumes that field galaxies fill the sky uniformly, and randomly filling the zone with fictitious objects leads to the formation of non-existent fields [1]. Machine learning methods offer a possible way to solve this problem.
Thus, the intensive multi-wavelength surveys of the ZoA in the last decades were aimed at addressing such key problems as the dynamics of the Local Group, the possible existence of nearby hidden massive galaxies, the dipole determinations based on luminous galaxies, the continuity and size of nearby superclusters, and the mapping of cosmic flow fields. Their solution is possible by indirect methods, which include signal processing applied to obscured and incomplete data; indirect estimates of averaged variables; mask inversion using Wiener filtering in spherical harmonic analysis; reconstruction of the projected galaxy distribution in the IR, radio, and X-ray spectral ranges; extension of the 2-D Wiener reconstruction to 3-D; Voronoi tessellation, cluster analysis, and fractal analysis; and machine learning techniques.
In this way, for example, the coordinates and masses of new galaxy clusters in the Puppis and Vela constellations were calculated [29], as well as the length of the Supergalactic Plane in the ZoA. The velocities of galaxies near the two edges of the ZoA were used to estimate the mass distribution within it. For example, the center of the Great Attractor was predicted to lie on a line joining the constellations Centaurus and Pavo. The Norma Supercluster occupies the region from $360^\circ$ to $290^\circ$, with a weakly visible extension towards Vela (~$270^\circ$).
Hence, the analysis of the spatial distribution of galaxies and their systems in the areas surrounding the Zone of Avoidance of the Milky Way still remains a complex and unresolved problem, as does the estimation of the "invisible" content of the spatial galaxy distribution obscured by this absorption zone. The most recent successful results, based on the 2MASS Tully-Fisher Survey and HI observational surveys, are presented in the works by Said et al. [32-34], where an optimized Tully-Fisher relation allowing accurate measurements of galaxy distances and peculiar velocities for dust-obscured galaxies is also applied.
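The step from a Tully-Fisher absolute magnitude to a distance and a peculiar velocity can be sketched with the standard distance modulus, $m - A - M = 5\log_{10}(d / 10\,\mathrm{pc})$, and the relation $v_{pec} = cz - H_0 d$. In the sketch below the absolute magnitude is simply taken as an input (no particular Tully-Fisher calibration is assumed), the value $H_0 = 70$ km/s/Mpc is an illustrative assumption, and the function names are hypothetical.

```python
def distance_mpc(m_apparent: float, m_absolute: float, extinction: float = 0.0) -> float:
    """Distance in Mpc from the extinction-corrected distance modulus."""
    modulus = m_apparent - extinction - m_absolute
    distance_pc = 10.0 ** (modulus / 5.0 + 1.0)
    return distance_pc / 1.0e6


def peculiar_velocity_kms(cz_kms: float, dist_mpc: float, h0: float = 70.0) -> float:
    """Radial peculiar velocity: observed cz minus the Hubble-flow velocity H0 * d."""
    return cz_kms - h0 * dist_mpc


if __name__ == "__main__":
    # Illustrative numbers only: a galaxy with M = -20 seen at m = 15 through 5 mag of extinction.
    d = distance_mpc(15.0, -20.0, extinction=5.0)
    print(f"distance: {d:.1f} Mpc, peculiar velocity: {peculiar_velocity_kms(1000.0, d):.0f} km/s")
```

The example makes explicit why an accurate extinction estimate matters in the ZoA: an error in the adopted extinction propagates directly into the distance and hence into the peculiar velocity.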

Scientific Problems Concerning the Incompleteness of the Data on the Spatial Distribution in the Sky Area Obscured by the Milky Way. Gaps in Spectroscopic Observations
A state-of-the-art approach to the incompleteness of the data is to use observations of the galaxies and galaxy systems that surround the ZoA to reconstruct the information missing in it [25,35,36]. The study should be conducted within constrained simulations of the Local Universe: unlike ordinary cosmological simulations, these simulations have initial conditions restricted by observational data. These observations may be either peculiar radial velocities of galaxies or redshift catalogues.
Classical 3D reconstruction of the extragalactic objects behind the Milky Way, aimed at preserving the coherence of the large-scale structure, was triggered by the search for the Great Attractor in the 1990s [31]. The problem of ZoA reconstruction is related to that of dealing with gaps in spectroscopic observations in order to restore homogeneous sky coverage. A good example is VIMOS, the wide-field imager and multi-object spectrograph at the European Southern Observatory's Very Large Telescope. It consists of 4 CCDs with a 2 arcmin space between them; the total area coverage is 290 arcmin$^2$. Almost 25 % of the field is unobserved because of the gaps between the CCDs. This means that for about 25 % of the galaxies in the sample it is not possible to obtain spectroscopic redshifts; only less accurate photometric ones are available. Such a regular pattern, which corresponds to the footprint of the spectrograph, creates issues for the VIMOS Public Extragalactic Redshift Survey (VIPERS). In particular, it produces systematic violations of the local Poisson hypothesis in cell-counting statistics and makes measurements of galaxy counts in cells invalid [37].
The existence of unobserved zones on scales comparable to the size of the investigated zone can have a serious impact on the study of galaxy properties and local environment. In this case a local and deterministic recovery of the missing data is needed [38]. Common techniques for small-scale reconstruction include direct cloning [39], wavelet analysis [40,41], cluster analysis [42,43,44], randomized cloning of objects into unobserved areas or application of Wiener filtering [45,46], and Voronoi tessellation [47-50]. Cucciati et al. proposed two algorithms [51] that use the photometric redshifts of target objects and assign redshifts based on the spectroscopic redshifts of the nearest galaxies. A Wiener filter applied in that work was also very efficient at reconstructing the continuous density field instead of individual galaxy positions. This is a Bayesian method whose basic assumptions are that both the distribution of the overdensity field and the likelihood of observing galaxies are Gaussian. The true density distribution is reconstructed by maximizing the posterior distribution given by Bayes' formula. These methods can clearly separate underdense from overdense regions on scales of …, which is important for studies of cosmic variance and rare populations of galaxy systems.
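As a rough illustration of the Wiener-filter idea, the following one-dimensional sketch estimates the overdensity inside a gap from noisy values observed on either side of it. For Gaussian signal and noise the posterior maximum equals the posterior mean, $\hat{s} = S_{go}(S_{oo} + N)^{-1} d$. The squared-exponential signal covariance, the parameter values, and all function names are assumptions made for this example; the cited works use their own covariance models and a full 3-D setting.

```python
import math


def sq_exp_cov(x1, x2, amplitude=1.0, length=1.0):
    """Assumed signal covariance between two positions (squared-exponential form)."""
    return amplitude * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def wiener_fill(x_obs, d_obs, x_gap, noise_var=0.1, amplitude=1.0, length=1.0):
    """Posterior-mean overdensity at the gap positions: S_go (S_oo + N)^-1 d."""
    n = len(x_obs)
    cov = [[sq_exp_cov(x_obs[i], x_obs[j], amplitude, length)
            + (noise_var if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    weights = solve_linear(cov, list(d_obs))
    return [sum(sq_exp_cov(xg, x_obs[j], amplitude, length) * weights[j]
                for j in range(n)) for xg in x_gap]


if __name__ == "__main__":
    # Overdensities observed on both sides of a gap between x = 1 and x = 3.
    x_obs = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
    d_obs = [0.8, 1.0, 0.9, 0.7, 0.5, 0.4]
    print(wiener_fill(x_obs, d_obs, [1.5, 2.0, 2.5]))
```

The estimate is smooth by construction and decays towards the mean far from the data, which is why such a reconstruction recovers large-scale structure but not individual galaxies.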

Generative Adversarial Neural Networks for Recovering Damaged Astronomical Images and Surveys
Optical observations of extended objects are limited by random and systematic noise from the detector, the telescope system, and the sky background. An image from a telescope can be interpreted as a convolution of the real image with the point spread function (PSF), plus some noise. The Shannon-Nyquist sampling theorem sets limits on how far deconvolution techniques can improve the observed images [52]. On the other hand, deconvolution has no unique solution [53,54]. Schawinski et al. estimated the possibility of recovering artificially degraded, high-noise images better than simple deconvolution can [55]. They proposed using a state-of-the-art machine learning method, namely a deep learning generative adversarial network (GAN). In the case of galaxy images, where we know how they should look, this information can help in choosing among the many possible solutions.
Generative adversarial neural networks, a type of unsupervised machine learning algorithm, were first introduced by Goodfellow et al. in 2014 [56]. The core of this class of algorithms is two neural networks contesting with each other in a zero-sum game. The first neural network, called "generative" (typically deconvolutional), generates candidate images, and the second (a convolutional discriminative network) evaluates them. The generative network learns to map from a space of features to a particular data distribution. At the same time, the discriminative network discriminates between the produced candidates and real examples. Schawinski et al. [55] applied the GAN to 4550 galaxies from the Sloan Digital Sky Survey DR12 (SDSS DR12). The authors showed that this method can reliably recover features in images of galaxies and can go well beyond the limitations of deconvolution. As the training sample they used image pairs: an original image of a galaxy and the same image artificially degraded (convolved with a PSF). In general, the GAN learns how to recover the degraded image by minimizing the difference between the recovered and true images. A key feature of this approach is how the difference between these two images, called the loss function, is measured. For this purpose, the authors used a second neural network, whose aim is to distinguish the synthetic recovered image from the true image. These two neural networks are trained simultaneously. Therefore, by training on higher-quality images, the GAN method can learn how to recover information from lower-quality data by building priors. Such an approach has potential for recovering partially damaged images with gaps, dead CCD chips, the ZoA, etc.
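The zero-sum game described above is usually written as the minimax objective of the original GAN formulation by Goodfellow et al. [56], reproduced here for reference. $D(x)$ is the probability that the discriminator assigns to $x$ being a real example, and $G(z)$ is the sample generated from the latent vector $z$. How $x$ and $z$ map onto survey slices with and without the ZoA is not specified by this formula and would be a design choice.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes $V$ by labeling real and generated samples correctly, while the generator minimizes it by producing samples that the discriminator accepts as real.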
We propose to apply this approach to the sample of galaxies from the SDSS DR14 (the general galaxy distribution at redshifts $z < 1$ is shown in Fig. 1(a), and the galaxy distribution in the ZoA is shown in Fig. 1(b)). A general scheme of the GAN approach for filling the ZoA in extragalactic surveys is shown in Fig. 2. The principal problem with a whole-sky galaxy distribution is that we have only one sample of galaxies, i.e. a single set for training. We cannot use a set of many images for training, as in the approach described above. A solution could be to prepare mock catalogues from numerical simulations that reproduce the target sample. In this case we may generate as many pairs (a real survey and a survey with the ZoA) as needed. Additionally, the position of the ZoA could be randomized over the survey field.
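The idea of generating many training pairs from mock catalogues with a randomized mask position can be sketched as follows. The mock here is only a placeholder of points distributed uniformly on the sphere, whereas a real application would draw it from a numerical simulation. The mask is simplified to a latitude band of fixed half-width, and the tuple layout `(lon, lat, z)` and function names are assumptions made for this example.

```python
import math
import random


def make_mock(n, rng):
    """Placeholder mock catalogue: n points uniform on the sphere with toy redshifts."""
    mock = []
    for _ in range(n):
        lon = rng.uniform(0.0, 360.0)
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        mock.append((lon, lat, rng.random()))
    return mock


def make_training_pair(mock, half_width_deg=10.0, rng=None):
    """Return (masked survey, full survey, band centre) for a randomly placed band.

    The masked survey plays the role of the survey with a ZoA; the full survey
    is the target that the generator should learn to restore.
    """
    rng = rng or random.Random()
    limit = 90.0 - half_width_deg
    centre = rng.uniform(-limit, limit)
    masked = [gal for gal in mock if abs(gal[1] - centre) >= half_width_deg]
    return masked, list(mock), centre


if __name__ == "__main__":
    rng = random.Random(42)
    mock = make_mock(1000, rng)
    pairs = [make_training_pair(mock, 10.0, rng) for _ in range(5)]
    for masked, full, centre in pairs:
        print(f"band centre {centre:6.1f} deg: {len(full) - len(masked)} galaxies hidden")
```

Each call yields a different mask position over the same mock, so a single simulation can provide many distinct input-target pairs.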
The goal of the generative artificial neural network (ANN) is to generate galaxy distributions and their properties in the ZoA from a latent space of features. At the same time, the discriminative network will compare the obtained survey with the real one and evaluate how realistic it is. The generative network produces better surveys with each iteration, while the discriminative one becomes more experienced at labeling the synthetic ones. In this way the system learns sophisticated loss functions automatically, without their being predefined.
The "algorithm of darning the ZoA" for dividing real extragalactic surveys (for example, SDSS, 2MASS, or the future LSST survey) into slices by redshift, stellar magnitude, coordinates, and other parameters to form a training sample is given in Fig. 3. To apply the algorithm, we should prepare a sample of galaxies surrounding the ZoA that is complete in stellar magnitude. To obtain a 3D spatial distribution of galaxies in this sample, we must obtain their photometric redshifts and divide the sample into slices by coordinates, taking into account the cosmological parameters. Each of these slices will contain a real distribution and a damaged image (part of the ZoA region), which will require darning. As a preliminary step, the way the algorithm works and restores a galaxy distribution should be tested on subsamples of real galaxies selected from undamaged regions. Information about the morphological classification of galaxies will be useful at this step and can be obtained by another machine learning technique (see, for example, our works [57] on applying Random Forest methods to obtain a binary morphological classification (early and late types) or a ternary classification, which requires knowledge of the color indices and photometry of galaxies). Other data mining methods, such as data visualization, self-organizing maps, classification, and Bayesian analysis, as well as 3D-printed models, will also be used.
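The slicing step described above can be sketched as follows: a magnitude cut enforces completeness, galaxies are binned into redshift slices, and within each slice the part falling into the masked band is separated from the rest. This is only an illustration under stated assumptions, not the procedure of Fig. 3: the tuple layout `(l, b, z_phot, mag)`, the magnitude limit, the band half-width, and the function name are all hypothetical.

```python
from bisect import bisect_right


def slice_survey(galaxies, z_edges, mag_limit, zoa_half_width_deg=10.0):
    """Split a catalogue of (l_deg, b_deg, z_phot, mag) tuples into redshift slices.

    Galaxies fainter than mag_limit are dropped to keep the sample complete.
    Each slice stores the galaxies outside the masked band and those inside it
    (the region that would need darning) separately.
    """
    n_slices = len(z_edges) - 1
    slices = [{"outside": [], "zoa": []} for _ in range(n_slices)]
    for gal in galaxies:
        l_deg, b_deg, z_phot, mag = gal
        if mag > mag_limit:
            continue
        index = bisect_right(z_edges, z_phot) - 1
        if not 0 <= index < n_slices:
            continue  # redshift outside the requested range
        region = "zoa" if abs(b_deg) < zoa_half_width_deg else "outside"
        slices[index][region].append(gal)
    return slices


if __name__ == "__main__":
    catalogue = [
        (10.0, 30.0, 0.05, 15.0),
        (10.0, 2.0, 0.05, 15.0),
        (10.0, 30.0, 0.15, 15.0),
        (10.0, 30.0, 0.05, 19.0),  # fainter than the limit, dropped
    ]
    for i, s in enumerate(slice_survey(catalogue, [0.0, 0.1, 0.2], mag_limit=17.5)):
        print(i, len(s["outside"]), len(s["zoa"]))
```

For the preliminary test on undamaged regions mentioned in the text, the same function can be applied with the band placed over a well-observed area, so that the "zoa" lists hold real galaxies against which the restored distribution can be compared.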

Fig. 1. Distribution of galaxies from the SDSS DR14 at redshifts $z < 1$ in Galactic coordinates: (a) whole-sky distribution (Mollweide projection); (b) in the ZoA