See discussions, st ats, and author pr ofiles f or this public ation at : https:www .researchgate.ne tpublic ation325391073 [622840]

See discussions, st ats, and author pr ofiles f or this public ation at : https://www .researchgate.ne t/public ation/325391073
Sparse non-negative su per-resolution-simplified and stabilised
Preprint · April 2018
CITATIONS
0READS
60
5 author s, including:
Armin Eft ekhari
École P olyt echnique F édér ale de Lausanne
20 PUBLICA TIONS    4 CITATIONS    
SEE PROFILE
Bog dan T oader
Univ ersity of Oxf ord
3 PUBLICA TIONS    3 CITATIONS    
SEE PROFILE
Hemant T yagi
Alan T uring Instit ute, London
19 PUBLICA TIONS    101 CITATIONS    
SEE PROFILE
All c ontent f ollo wing this p age was uplo aded b y Armin Eft ekhari on 27 May 2018.
The user has r equest ed enhanc ement of the do wnlo aded file.

Sparse non-negative super-resolution — simplified and stabilised
Armin Eftekhari1,2, Jared Tanner1,3, Andrew Thompson3, Bogdan Toader3, and Hemant
Tyagi1,2
1Alan Turing Institute
2University of Edinburgh
3University of Oxford
April 5, 2018
To David L. Donoho, a uniquely positive gentleman,
in celebration of his 60th birthday and with thanks for his inspiration and support.
Abstract
The convolution of a discrete measure, x°k
i1aiti, with a local window function, pstq,
is a common model for a measurement device whose resolution is substantially lower than that of the
objectsbeingobserved. Super-resolutionconcernslocalisingthepointsources tai;tiuk
i1withanaccuracy
beyond the essential support of pstq, typically from msamplesypsjq°k
i1aipsjtiqj, where
jindicates an inexactness in the sample value. We consider the setting of xbeing non-negative and seek
to characterise all non-negative measures approximately consistent with the samples. We first show that
xis the unique non-negative measure consistent with the samples provided the samples are exact, i.e.
j0,m¥2k1samples are available, and pstqgenerates a Chebyshev system. This is independent
of how close the sample locations are and does not rely on any regulariser beyond non-negativity ; as such,
it extends and clarifies the work by Schiebinger et al. in [1] and De Castro et al. in [2], who achieve the
same results but require a total variation regulariser, which we show is unnecessary.
Moreover, we characterise non-negative solutions ^xconsistent with the samples within the bound°m
j12
j¤2. Any such non-negative measure is within Op1{7qof the discrete measure xgenerating the
samples in the generalised Wasserstein distance. Similarly, we show using somewhat different techniques
that the integrals of ^xandxoverpti;tiqare similarly close, converging to one another as and
approach zero. We also show how to make these general results, for windows that form a Chebyshev
system, precise for the case of pstqbeing a Gaussian window. The main innovation of these results
is that non-negativity alone is sufficient to localise point sources beyond the essential sensor resolution
and that, while regularisers such as total variation might be particularly effective, they are not required
in the non-negative setting.
1 Introduction
Super-resolution concerns recovering a resolution beyond the essential size of the point spread function of
a sensor. For instance, a particularly stylised example concerns multiple point sources which, because of
the finite resolution orbandwidth of the sensor, may not be visually distinguishable. Various instances of
this problem exist in applications such as astronomy [3], imaging in chemistry, medicine and neuroscience
[4, 5, 6, 7, 8, 9, 10, 11], spectral estimation [12, 13], geophysics [14], and system identification [15]. Often in
theseapplicationmuchisknownaboutthepointspreadfunctionofthesensor, orcanbeestimatedand, given
such model information, it is possible to identify point source locations with accuracy substantially below
the essential width of the sensor point spread function. Recently there has been substantial interest from
the mathematical community in posing algorithms and proving super-resolution guarantees in this setting,
see for instance [16, 17, 18, 19, 20, 21, 22, 23]. Typically these approaches borrow notions from compressed
sensing [24, 25, 26]. In particular, the aforementioned contributions to super-resolution consider what is
1arXiv:1804.01490v1 [math.OC] 4 Apr 2018

known as the Total Variation norm minimisation over measures which are consistent with the samples. In
this manuscript we show first that, for suitable point spread functions, such as the Gaussian, any discrete
non-negative measure composed of kpoint sources is uniquely defined from 2k1of its samples, and
moreover that this uniqueness is independent of the separation between the point sources. We then show
that by simply imposing non-negativity, which is typical in many applications, any non-negative measure
suitably consistent with the samples is similarly close to the discrete non-negative measure which would
generate the noise free samples. These results substantially simply results by [1, 2] and show that, while
regularisers such as Total Variation may be particularly effective, in the setting of non-negative point sources
such regularisers are not necessary to achieve stability.
1.1 Problem setup
Throughout this manuscript we consider non-negative measures in relation to discrete measures. To be
concrete, let xbe ak-discrete non-negative Borel measure supported on the interval Ir0;1s€R, given by
xk¸
i1aitiwithai¡0andtiPintpIqfor alli: (1)
Consider also real-valued and continuous functions tjum
j1and lettyjum
j1be the possibly noisy measure-
ments collected from xby convolving against sampling functions jptq:
yj»
Ijptqxpdtqjk¸
i1aijptiqj; (2)
wherejwith}}2¤can represent additive noise. Organising the msamples from (2) in matrix notation
by letting
y:ry1ymsTPRm; ptq:r1ptq mptqsTPRm(3)
allows us to state the program we investigate:
findz¥0subject toy»
Iptqzpdtq
2¤1; (4)
with¤1. Herein we characterise non-negative measures consistent with measurements (2) in relation to
the discrete measure (1). That is, we consider any non-negative Borel measure zfrom the Program (4)1and
show that any such zis close toxgiven by (1) in an appropriate metric, see Theorems 4, 5, 11, 12 and 13.
Moreover, we show that the xfrom (1) is the unique solution to Program (4) when 10; e.g. in the setting
of exact samples, i0for alli. Program (4) is particularly notable in that there is no regulariser of z
beyond imposing non-negativity and, rather than specify an algorithm to select a zwhich satisfies Program
(4), we consider all admissible solutions. The admissible solutions of Program (4) are determined by the
source and sample locations, which we denote as
Tttiuk
i1€intpIqandStsjum
j1„I (5)
respectively, as well as the particular functions jptqused to sample the k-sparse non-negative measure x
from (1). Lastly, we introduce the notions of minimum separation and sample proximity, which we use to
characterise solutions of Program (4).
Definition 1. (Minimum separation and sample proximity) For finite ~TTYt0;1u €I, let
pTq¡0be the minimum separation between the points in Talong with the endpoints of I, namely
pTq min
Ti;TjP~T;ij|TiTj|: (6)
1An equivalent formulation of Program (4) minimises }y³
Iptqzpdtq}2over all non-negative measures on I(without any
constraints). In this context, however, we find it somewhat more intuitive to work with Program (4), particularly considering
the importance of the case 0.
2

We define the sample proximity to be the number Pp0;1
2qsuch that, for each source location ti, there exists
a closest sample location slpiqPStotiwith
|tislpiq|¤pTq: (7)
We describe the nearness of solutions to Program (4) in terms of an additional parameter associated
with intervals around the sources T; that is we let ¤pTq{2and define intervals as:
Ti;:
t:|tti|¤(
XI; iPrks; T:k¤
i1Ti;; (8)
whererks1;2;:::;k, and setTC
i;andTC
to be the complements of these sets with respect to I. In order
to make the most general result of Theorems 11 and 12 more interpretable, we turn to presenting them in
Section 1.2 for the case of jptqbeing shifted Gaussians.
1.2 Main results simplified to Gaussian window
In this section we consider jptqto be shifted Gaussians with centres at the source locations sj, specifically
jptqgptsjqeptsjq2
2: (9)
We might interpret (9) as the “point spread function” of the sensing mechanism being a Gaussian window
andsjthe sample locations in the sense that
»
Ijptqxpdtq»
Igptsjqxpdtqpgxqpsjq;@jPrms; (10)
evaluates the “filtered” copy of xat locations sjwheredenotes convolution.
As an illustration, Figure 1 shows the discrete measure xin blue for k3, the continuous function
ypsq  pgxqpxqin red, and the noisy samples ypsjqat the sample locations Srepresented as the black
circles.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
t-0.200.20.40.60.811.2yx
noise-free y
noisy measurements at {si}m
i=1
Figure 1: Example of discrete measure xand measurements where jptq ptsjqforsjPSand the
Gaussian kernel ptqet2
2.
The conditions we impose to ensure stability of Program (4) for ptqGaussian as in (9) are as follows:
Conditions 2. (Gaussian window conditions) When the window function is a Gaussian ptqet2
2,
we require its width and the source and sampling locations from (5)to satisfy the following conditions:
3

1. Samples define the interval boundaries: s10andsm1,
2. Samples near sources: for every iPrks, there exists a pair of samples s;s1€S, one on each side of ti,
such that|sti|¤ands1s, for¤2small enough; which is quantified in Lemma 24.
3. Sources away from the boundary: a
logp1{3q ¤ti;sj¤1a
logp1{3qfor everyiP rksand
jPr2:m1s,
4. Minimum separation of sources: ¤?
2andpTq¡b
logp34
2q, where the minimum separation
pTqof the sources is defined in Definition 1.
The four properties in Conditions 2 can be interpreted as follows: Property 1 imposes that the sources are
within the interval defined by the minimum and maximum sample; Property 2 ensures that there is a pair
of samples near each source which translates into a sampling density condition in relation to the minimum
separation between sources and in particular requires the number of samples m¥2k2; Property 3 is a
technical condition to ensure sources are not overly near the sampling boundary; and Property 4 relates the
minimum separation between the sources to the width of the Gaussian window.
We can now present our main results on the robustness of Program (4) as they apply to the Gaussian
window; these are Theorem 4, which follows from Theorem 11, and Theorem 5, which follows from Theorem
12. However, before stating the stability results, it is important to note that, in the setting of exact samples,
i0, the solution of Program (4) is unique when 10.
Proposition 3. (Uniqueness of exactly sampled sparse non-negative measures for ptqGaussian)
Letxbe a non-negative k-sparse discrete measure supported on I, see (1). If 0,m¥2k1andtjum
j1
are shifted Gaussians as in (9), thenxis the unique solution of Program (4) with 10.
Proposition 3 states that Program (4) successfully localises the kimpulses present in xgiven only 2k1
measurements when jptqare shifted Gaussians whose centres are in I. Theorems 4 and 5 extend this
uniqueness condition to show that any solution to Program (4) with 1¡0is proportionally close to the
unique solution when 10.
Theorem 4. (Wasserstein stability of Program (4)forptqGaussian) LetIr0;1sand consider a
k-sparse non-negative measure xsupported on T€intpIq. Consider also an arbitrary increasing sequence
tsjum
j1€Rand, for positive , lettjptqum
j1be defined in (9), which form according to (3). If m¥2k2
and Conditions 2 hold, then Program (4) with 1is stable in the sense that
dGWpx;pxq¤F1}x}TV (11)
for all¤pTq{2wheredGWis the generalised Wasserstein distance as defined in (19)and the exact
expression of F1F1pk;pTq;1
;1
;qis given in the proof (see (64)in Section 3.4.2). In particular, for
 1?
3andpTq¡b
log5
2, we have:
F1pk;pTq;1
;1
;q c1kC1p1
q
2c2
6p132q2k
; (12)
if
¤min#
c36p132q
pk1q3
2;c4C1
62
3
pk1q1
3+
; (13)
wherec1;c2;c3;c4are universal constants and C11

is given by (59)in Section 3.4.1
The central feature of Theorem 4 is that the proportionality to andof the Wasserstein distance
between any solution to Program (4) and the unique solution for 10is of the form (11). The particular
form ofF1pqis not believed to be sharp; in particular, the exponential dependence on kin (12) follows
from bounding the determinant of a matrix similar to (see (128)) by a lower bound on the minimum
eigenvalue to the kthpower. The scaling with respect to 2is a feature of 1in Program (4) not being
4

normalized with respect to }y}2which, forTandSfixed, decays with due to the increased localisation of
the Gaussian. Note that the dependence is a feature of the proof and the which minimises the bound in
(11) is proportional to to some power as determined by C1p1qfrom (12). Theorem 4 follows from the
more general result of Theorem 11, whose proof is given in Section 3 and the appendices.
As an alternative to showing stability of Program (4) in the Wasserstein distance, we also prove in
Theorem 5 that any solution to Program (4) is locally consistent with the discrete measure in terms of local
averages over intervals Ti;as given in (8). Moreover, for Theorem 5, we make Property 2 of Conditions 2
more transparent by using the sample proximity pTqfrom Definition 1; that is, defined in Conditions
2 is related to the sample proximity from Definition 1 by pTq¤{2.
Theorem 5. (Average stability of Program (4)forptqGaussian: source proximity dependence)
LetI r0;1sand consider a k-sparse non-negative measure xsupported on Tand sample locations Sas
given in (5)and for positive , lettjptqum
j1as defined in (9). If the Conditions 2 hold, then, in the presence
of additive noise, Program (4)is stable and it holds that, for any solution ^xof Program (4)with1:
»
Ti;^xpdtqai¤
pc1F2qc2}^x}TV
2
F3; (14)
»
TC^xpdtq¤F2; (15)
where the exact expressions of F2F2pk;pTq;1
;1
qandF3F3ppTq;;qare given in the proof (see
(70)in Section 3.4.3), provided that ,pTqandsatisfy(27). In particular, for  1?
3,pTq¡b
log5
2
and 0:4, we haveF3ppTq;;q c5and:
F2pk;pTq;1
;1
q c3kC2p1
q
2c4
6p132q2k
: (16)
Above,c1;c2;c3;c4;c5are universal constants and C2p1
qis given by (60)in Section 3.4.1.
The bounds in Theorems 4 and 5 are intentionally similar, and though their proofs make use of the same
bounds, they have some fundamental differences. While both (11) and (14) have the same proportionality
toand, the role of in particular differs substantially in that Theorem 5 considers averages of ^xover
Ti;. Also different in their form is the dependence on }x}TVand}^x}TVin Theorems 4 and 5 respectively.
The presence of }^x}TVin Theorem 5 is a feature of the proof which we expect can be removed and replaced
with}x}TVby proving any solution of Program (4) is necessarily bounded due to the sampling proximity
condition of Definition 1. It is also worth noting that (14) avoids an unnatural 1dependence present in
(11). Theorem 5 follows from the more general result of Theorem 12, whose proof is given in Section 3.4.3.
Lastly, we give a corollary of Theorems 4 and 5 where we show that, for ¡0but sufficiently small, one
can equate the anddependent terms in Theorems 4 and 5 to show that their respective errors approach
zero asgoes to zero.
Corollary 6. Under the conditions in Theorems 4 and 5 and for  1?
3,pTq¡b
log5
2and 0:4,
there exists 0such that:
dGWpx;^xq¤C11
7; (17)»
Ti;^xpdtqai¤C21
6; (18)
for allPp0;0q, where C1and C2are given in the proof in Section 3.4.4.
1.3 Organisation and summary of contributions
Organisation: The majority of our contributions were presented in the context of Gaussian windows in
Section 1.2. These are particular examples of a more general theory for windows that form a Chebyshev
5

system, commonly abbreviated as T-system , see Definition 7. A T-system is a collection of continuous
functions that loosely behave like algebraic monomials. It is a general and widely-used concept in classical
approximation theory [27, 28, 29] that has also found applications in modern signal processing [1, 2]. The
framework we use for these more general results is presented in Section 2.1, the results presented in Section
2.2, and their proof sketched in Section 3. Proofs of the lemmas used to develop the results are deferred to
the appendices.
Summary of contributions: We begin discussing results for general window function with Proposition
8, which establishes that for exact samples, namely 0,tjum
j1a T-system, and from m¥2k1
measurements, the unique solution to Program (4) with 10is thek-sparse measure xgiven in (1). In
other words, we show that the measurement operator in (3) is an injective map from k-sparse non-negative
measures on ItoRmwhentjum
j1form a T-system. No minimum separation between impulses is necessary
here andtjum
j1need only to be continuous. As detailed in Section 1.4, Proposition 8 is more general and
its derivation is far simpler and more intuitive than what the current literature offers. Most importantly, no
explicit regularisation is needed in Program (4) to encourage sparsity: the solution is unique.
Our main contributions are given in Theorems 11 and 12, namely that solutions to Program (4) with
1¡0are proportionally close to the unique solution (1) with 10; these theorems consider nearness
in terms of the Wasserstein distance and local averages respectively. Furthermore, Theorem 11 allows xto
be a general non-negative measure, and shows that solutions to Program (4) must be proportional to both
how wellxmight be approximated by a k-sparse measure, , with minimum source separation 2, and a
proportional distance between and solutions to Program (4). These theorems require m¥2k2and
loosely-speaking the measurement apparatus forms a T*-system, which is an extension of a T-system to
allow the inclusion of an additional function which may be discontinuous, and enforcing certain properties
of minors of . To derive the bounds in Theorems 4 and 5 we show that shifted Gaussians as given in (9)
augmented with a particular piecewise constant function form a T*-system.
Lastly, in Section 2.2.1, we consider an extension of Theorem 12 where the minimum separation between
sources pTqis smaller than . We extend the intervals Ti;from (8) to ~Ti;in (31), where intervals Ti;
which overlap are combined. The resulting Theorem 13 establishes that, while sources closer than may not
be identifiable individually by Program (4), the local average over ~Ti;of bothxin (1) and any solution to
Program (4) will be proportionally within of each other.
To summarise, the results and analysis in this work simplify, generalise and extend the existing results
for grid-free and non-negative super-resolution. These extensions follow by virtue of the non-negativity
constraint in Program (4), rather than the common approach based on the TV norm as a sparsifying penalty.
We further put these results in the context of existing literature in Section 1.4.
1.4 Comparison with other techniques
We show in Proposition 8 that a non-negative k-sparse discrete measure can be exactly reconstructed from
m¥2k1samples (provided that the atoms form a T-system, a property satisfied by Gaussian windows
for example) by solving a feasibility problem. This result is in contrast to earlier results in which a TV
norm minimisation problem is solved. De Castro and Gamboa [2] proved exact reconstruction using TV
norm minimisation, provided the atoms form a homogeneous T-system (one which includes the constant
function). An analysis of TV norm minimisation based on T-systems was subsequently given by Schiebinger
et al. in [1], where it was also shown that Gaussian windows satisfy the given conditions. We show in
this paper that the TV norm can be entirely dispensed with in the case of non-negative super-resolution.
Moreover, analysis of Program (4) is substantially simpler than its alternatives. In particular, Proposition 8
for noise-free super-resolution immediately follows from the standard results in the theory of T-systems. The
fact that Gaussian windows form a T-system is immediately implied by well-known results in the T-system
theory, in contrast to the heavy calculations involved in [1].
While neither of the above works considers the noisy setting or model mismatch, Theorems 11 and 12
in our work show that solutions to the non-negative super-resolution problem which are both stable to
measurement noise and model inaccuracy can also be obtained by solving a feasibility program. The most
closely related prior work is by Doukhan and Gamboa [30], in which the authors bound the maximum
distance between a sparse measure and any other measure satisfying noise-corrupted versions of the same
6

measurements. While [30] does not explicitly consider reconstruction using the TV norm, the problem is
posed over probability measures, that is those with TV norm equal to one. Accuracy is captured according
to the Prokhorov metric. It is shown that, for sufficiently small noise the Prokhorov distance between the
measures is bounded by c, whereis the noise level and cdepends upon properties of the window function.
In contrast, we do not make any total variation restrictions on the underlying sparse measure, we extend to
consider model inaccuracy and we consider different error metrics (the generalised Wasserstein distance and
the local averaged error).
More recent results on noisy non-negative super-resolution all assume that an optimisation problem
involvingtheTVnormissolved. Denoyelleetal.[21]considerthenon-negativesuper-resolutionproblemwith
a minimum separation tbetween source locations. They analyse a TV norm-penalized least squares problem
and show that a k-sparse discrete measure can be stably approximated provided the noise scales with t2k1,
showing that the minimum separation condition exhibits a certain stability to noise. In the gridded setting,
stability results for noisy non-negative super-resolution were obtained in the case of Fourier convolution
kernels in [31] under the assumption that the spike locations satisfy a Rayleigh regularity property, and
these results were extended to the case of more general convolution kernels in [32].
Super-resolution in the more general setting of signedmeasures has been extensively studied. In this
case, the story is rather different, and stable identification is only possible if sources satisfy some separation
condition. The required minimum separation is dictated by the resolution of the sensing system, e.g., the
Rayleigh limit of the optical system or the bandwidth of the radar receiver. Indeed, it is impossible to
resolve extremely close sources with equal amplitudes of opposite signs; they nearly cancel out, contributing
virtually nothing to the measurements. A non-exhaustive list of references is [33, 17, 18, 19, 20, 22, 23].
In Theorem 12 we give an explicit dependence of the error on the sampling locations. This result relies
on local windows, hence it requires samples near each source, and we give a condition that this distance
must satisfy. The condition that there are samples near each source in order to guarantee reconstruction
also appears in a recent manuscript on sparse deconvolution [34]. However, this work relies on the minimum
separation and differentiability of the convolution kernel, which we overcome in Theorem 12.
2 Stability of Program (4)to inexact samples for jptqT-systems
The main results stated in the introduction, Theorems 4 and 5, are for Gaussian windows, which allows the
results to omit technical details of the more general results of Theorems 11-13. These more general results
apply to windows that form Chebyshev systems, see Definition 7, and an extension to T-systems, see
Definition 9, which allows for explicit control of the stability of solutions to Program (4). These Chebyshev
systems and other technical notions needed are introduced in Section 2.1 and our most general contributions
are presented using these properties in Section 2.2.
2.1 Chebyshev systems and sparse measures
Before establishing stability of Program (4) to inexact samples, we show that solutions to Program (4) with
10, that is with yiin (2) having i0, hasxfrom (1) as its unique solution once m¥2k1. This
result relies on jptqforming a Chebyshev system, commonly abbreviated T-system [27].
Definition 7. (Chebyshev, T-system [27]) Real-valued and continuous functions tjum
j1form a T-
system on the interval Iif themmmatrixrjplqsm
l;j1is nonsingular for any increasing sequence tlum
l1€
I.
Example of T-systems include the monomials t1;t;;tm1uon any closed interval of the real line.
In fact, T-systems generalise monomials and in many ways preserve their properties. For instance, any
“polynomial”°m
j1bjjof a T-system tjum
j1has at most m1distinct zeros on I. Or, given mdistinct
points onI, there exists a unique polynomial in tjum
j1that interpolates these points. Note also that linear
independence of tjuis a necessary condition for forming a T-system, but not sufficient. Let us emphasise
that T-system is a broad and general concept with a range of applications in classical approximation theory
and modern signal processing. In the context of super-resolution for example, translated copies of the
Gaussian window, as given in (9), and many other measurement windows form a T-system on any interval.
7

We refer the interested reader to [27, 29] for the role of T-systems in classical approximation theory and to
[35] for their relationship to totally positive kernels .
2.1.1 Sparse non-negative measure uniqueness from exact samples
Our analysis based on T-Systems has been inspired by the work by Schiebinger et al. [1], where the authors
use the property of T-Systems to construct the dual certificate for the spike deconvolution problem and
to show uniqueness of the solution to the TV norm minimisation problem without the need of a minimum
separation. The theory of T-Systems has also been used in the same context by De Castro and Gamboa in
[2]. However, both [1] and [2] focus on the noise-free problem exclusively, while we will extend the T-Systems
approach to the noisy case as well, as we will see later.
Our work, in part, simplifies the prior analysis considerably by using readily available results on T-
Systems and we go one step further to show uniqueness of the solution of the feasibility problem, which
removes the need for TV norm regularisation in the results of Schiebinger et al. [1]; this simplification in the
presence of exact samples is given in Proposition 8.
Proposition 8. (Uniqueness of exactly sampled sparse non-negative measures) Letxbe a non-
negativek-sparse discrete measure supported on Ias given in (1). Lettjum
j1form a T-system on I, and
givenm¥2k1measurements as in (2), thenxis the unique solution of Program (4) with 10.
Proposition 8 states that Program (4) successfully localises the kimpulses present in xgiven only 2k1
measurements when tjum
j1form a T-system on I. Note thattjum
j1only need to be continuous and no
minimum separation is required between the impulses. Moreover, as discussed in Section 1.4, the noise-free
analysis here is substantially simpler as it avoids the introduction of the TV norm minimisation and is more
insightful in that it shows that it is not the sparsifying property of TV minimisation which implies the result,
but rather it follows from the non-negativity constraint and the T-system property, see Section 3.1.
2.1.2 T*-systems in terms of source and sample configuration
While Proposition 8 implies that T-systems ensure unique non-negative solutions, more is needed to ensure
stability of these results to inexact samples; that is ¡0. This is to be expected as T-systems imply
invertibility of the linear system in (3) for any configuration of sources and samples as given in (5), but
doe not limit the condition number of such a system. We control the condition number of by imposing
further conditions on the source and sample configuration, such as those stated in Conditions 2, which is
analogous to imposing conditions that there exists a dual polynomial which is sufficiently bounded away
from zero in regions away from sources, see Section 2.2. In particular, we extend the notion of T-system in
Definition 7 to a T*-system which includes conditions on samples at the boundary of the interval, additional
conditions on the window function, and a condition ensuring that there exist samples sufficiently near sources
as given by the notation (8) but stated in terms of a new variable so as to highlight its different role here.
Definition 9. (T*-system) For an even integer m, real-valued functions tjum
j0form a T*-system on
Ir0;1sif the following holds for every Ttt1;t2;:::;tku€Iwhen¡0is sufficiently small. For any
increasing sequence tlum
l0€Isuch that
00,m1,
except exactly three points, namely 0,m, and saylPintpIq, the other points belong to T,
everyTi;contains an even number of points,
we have that
1. the determinant of the pm1qpm1qmatrixM:rjplqsm
l;j0is positive, and
2. the magnitudes of all minors of Malong the row containing lapproach zero at the same rate2when
Ñ0.
2A function u:RÑRapproaches zero at the rate PwhenupqpPq. See, for example [36], page 44.
8

Let us briefly discuss T*-systems as an alternative to T-systems in Definition 7. The key property of
a T-system to our purpose is that an arbitrary polynomial°m
j0bjjof a T-system tjum
j0onIhas at
mostmzeros. Polynomials of a T*-system may not have such a property as T-systems allow arbitrary
configurations of points while T*-systems only ensure the determinant in condition 1 of Definition 9 be
positive for configurations where the majority of points in are paired in T. However, as the analysis later
shows, condition 1 in Definition 9 is designed for constructing dual certificates for Program (4). We will also
see later that condition 2 in Definition 9 is meant to exclude trivial polynomials that do not qualify as dual
certificates. Lastly, rather than any increasing sequence tlulPr0:ms€I, Definition 9 only considers subsets
that mainly cluster around the support T, whereas in our use all but one entry in is taken from the set
of samples S; this is only intended to simplify the burden of verifying whether a family of functions form
a T*-system. While the first and third bullet points in Definition 9 require that there need to be at least
two samples per interval Ti;as well as samples which define the interval endpoints which gives a sampling
complexity m2k2, we typically require Sto include additional samples, m¡2k2, due to the location
ofTbeing unknown. In fact, as Tis unknown, the third bullet point imposes a sampling density of mbeing
proportional to the inverse of the minimum separation of the sources pTq. The additional point lis not
taken from the set S, it instead acts as a free parameter to be used in the dual certificate. In Figure 2, we
show an example of points tlu10
l0which satisfy the conditions in Definition 9 for k3sources.
Figure 2: Example of tlum
l1that satisfy the conditions in Definition 9 for m10andk3.
We will state some of our more general stability results for solutions of Program (4) in terms of the
generalised Wasserstein distance [37] between x1andx2, both non-negative measures supported on I, defined
as
dGWpx1;x2qinf
z1;z2
}x1z1}TVdWpz1;z2q}x2z2}TV
; (19)
where the infimum is over all non-negative Borel measures z1;z2onIsuch that}z1}TV }z2}TV. Here,
}z}TV³
I|zpdtq|is thetotal variation of measure z, akin to the `1-norm in finite dimensions, and dWis
the standard Wasserstein distance, namely
dWpz1;z2qinf

»
I|12|
pd1;d2q; (20)
where the infimum is over all measures
onIIthat produce z1andz2as marginals. In a sense, dGW
extendsdWto allow for calculating the distance between measures with different masses.3
Moreover, in some of our most general results we consider the extension to where xneed not be a
discrete measure, see Theorem 11. In that setting, we introduce an intermediate k-discrete measure which
approximates xin thedGWmetric. That is, given an integer kand positive , letxk;be ak-sparse
2-separated measure supported on Tk;€intpIqof sizekand with pTk;q¥2such that, for ¥1,
Rpx;k;q:dGWpx;xk;q¤ inf
dGWpx;q; (21)
where the infimum is over all k-sparse 2-separated non-negative measures supported on intpIqand the
parameter allows for near projections of xonto the space of k-sparse 2-separated measures.
Lastly, we also assume that the measurement operator in (3) is Lipschitz continuous, namely there
existsL¥0such that »
Iptqpx1pdtqx2pdtqq
2¤LdGWpx1;x2q; (22)
for every pair of measures x1;x2supported on I.
3In [37], the authors consider the p-Wasserstein distance, where popular choices of pare1and2. In our work, we only use
the 1-Wasserstein distance.
9

2.2 Stability of Program (4)
Equipped with the definitions of T and T*-systems, Definitions 7 and 9 respectively, we are able to char-
acterise any solution to Program (4) for jptqwhich form a T-system and suitable source and sample
configurations (5). We control the stability to inexact measurements by introducing two auxiliary functions
in Definition 10, which quantify the dual polynomials qptqandqptqassociated with Program (4) to be at
least faway from the necessary constraints for all values of tat leastaway from the sources. Specifically,
forFandFdefined below, we will require that qptq¥Fptqandqptq¥Fptqfor alltPr0;1s.
Definition 10. (Dual polynomial separators) Letf:RÑRbe a bounded function with fp0q0,
f;f0;f1be positive constants, and tTi;uk
i1the neighbourhoods as defined in (8). We then define
Fptq:$
''''&
''''%f0; t0;
f1; t1;
fpttiq;when there exists iPrkssuch thattPTi;;
f; elsewhere on intpIq:(23)
Moreover, let Pt1ukbe an arbitrary sign pattern. We define Fas
Fptq:$
''''&
''''%f0; t 0;
f1; t 1;
1fpttiq;when there exists iPrkssuch thattPTi;andi1;
f; everywhere else on intpIq:(24)
We defer the introduction of dual polynomials qandqand the precise role of the above dual polynomial
separators to Section 3, but state our most general results characterising the solutions to Program (4) in
terms of these separators.
Theorem 11. (Wasserstein stability of Program (4)forjptqa T-system) Consider a non-negative
measurexsupported on intpIqp 0;1qand assume that the measurement operator isL-Lipschitz, see (3)
and (22). Consider a k-sparse non-negative discrete measure supported on Tttiuk
i1€intpIqand fix
¤pTq{2, see (6), and consider functions FptqandFptqas defined in Definition 10. For m¥2k2,
suppose that
tjum
j1form a T-system on I,
tFuYtjum
j1form a T*-system on I, and
tFuYtjum
j1form a T*-system on Ifor any sign pattern .
Letpxbe a solution of Program (4) with
1LdGWpx;q: (25)
Then there exist vectors b;tbu€Rmsuch that
dGWpx;pxq¤
62
f
}b}26 min
}b}2
1}}TVdGWpx;q: (26)
where the minimum is over all sign patterns and the vectors b;tbu€Rmabove are the vectors of
coefficients of the dual polynomials qandqassociated with Program (4), see Lemmas 16 and 17 in Section
3 for their precise definitions.
Theorem 4 follows from Theorem 11 by considering jptqGaussian as stated in (9) which is known to
be a T-system [27], and introducing Conditions 2 on the source and sample configuration (5) such that the
conditions of Theorem 11 can be proved and the dual coefficients bandbbounded; the details of these
proofs and bounds are deferred to Section 3 and the appendices.
10

The particular form of FandtFuin Theorem 11, constant away from the support Tofxk;, is purely
to simplify the presentation and proofs. Note also that the error dGWpx;pxqdepends both on the noise
leveland the residual Rpx;k;q, not unlike the standard results in finite-dimensional sparse recovery and
compressed sensing [24, 38]. In particular, when ;;Rpx;k;qÑ0, we approach the setting of Proposition
8, where we have uniqueness of k-sparse non-negative measures from exact samples.
Note that the noise level and the residual Rpx;k;qare not independent; that is, specifies confidence
in the samples and the model for how the samples are taken while Rpx;k;qreflects nearness to the model
ofk-discrete measures. Corollary 6 show that the parameter can be removed, for jptqshifted Gaussians,
in the setting where xisk-discrete, that is Rpx;k;q0, in which case dGWpx;^xqis bounded byOp1{7q.
The more general variant of Theorem 5 follows from Theorem 12 by introducing alternative conditions
on the source and sample configuration and omitting the need for the functions F, which is the cause of
the unnatural 1dependence in Theorem 4.
Theorem 12. (Average stability for Program (4)forjptqa T-system) Let^xbe a solution of Program
(4)and consider the function Fptqas defined in Definition 10. Suppose that:
tjum
j1form a T-system on I,
tFuYtjum
j1form a T*-system on I, and
pTqand0Pp0;1{2qfrom Definition 1 satisfy
pqpqpq1
»1{2
pxqdx1
»1{2
pxqdx: (27)
Then, for any Pp0;{2qand for all iPrks,
»
Ti;^xpdtqai¤
2
18}b}2
f
L}^x}TV k¸
j1pA1qij; (28)
»
TC^xpdtq¤2}b}2
f; (29)
where:
bPRmis the same vector of coefficients of the dual certificate qas in Theorem 11 and fis given in
Definition 10, which is used to construct the dual certificate q, as described in Lemma 16 in Section 3,
8maxs;tPI|pstq|,
Lis the Lipschitz constant of ,
APRkkis the matrix
A
|1pt1q| |1pt2q|:::|1ptkq|
|2pt1q| |2pt2q|:::|2ptkq|
…………
|kpt1q| |kpt2q|:::|kptkq|
; (30)
withiptiqptislpiqqevaluated at slpiqas defined in (7).
Theorem 12 bounds the difference between the average over the interval Ti;of any solution to Program
(4) and the discrete measure whose average is simply ai. The condition on to satisfy (27) is used to
ensure the matrix from (30) is strictly diagonally dominant. It relies on the windows jptqbeing sufficiently
localised about zero. Though Theorem 12 explicitly states that the location of the closest samples to each
source is less than 0pTq, this can be achieved without knowing the locations of the sources by placing the
samples uniformly at interval 20pTqwhich gives a sampling complexity of mp20pTqq1. Lastly, a
similar bound on the integral of ^xoverTC
is given by Lemma 16 in Section 3.
11

2.2.1 Clustering of indistinguishable sources
Theorems 11 and 12 give uniform guarantees for all sources in terms of the minimum separation condition
pTq, which measures the worst proximity of sources. One might imagine that, for example, if all but two
sources are sufficiently well separated, then Theorem 12 might hold for the sources that are well separated;
moreover, assuming is fixed, then if two sources tiandti1with magnitudes aiandai1are closer than
2, namely|titi1| 2, we might imagine that a variant of Theorem 12 might hold but with sources ti
andti1approximated with source tneartiandti1and withaaiai1.
In this section we extend Theorem 12 to this setting by considering fixed and alternative intervals
t~Ti;u~k
i1a partition of Tsuch that each ~Ti;contains a group of consecutive sources ti1;:::;tiri(with
weightsai1;:::;airirespectively) which are within at most 2of each other. Define
~Ti;ki¤
l1Til;;wheretilPTil;and|til1til| 2;@lPrki1s; (31)
for°~k
i1kik, so that we have
T~k¤
i1~Ti;and ~Ti;£~Tj;H;@ij: (32)
Theorem 13. (Average stability for Program (4): grouped sources) Let^xbe a solution of Program
(4)andIr0;1sbe partitioned as described by (31). If the samples are placed uniformly at interval 20
where0satisfies (27)with 2, then there exist tiuiPr~kswithiP~Ti;such that
»
~Ti;^xpdtqki¸
r1air¤
2
18}b}2
f
p2k1qL}^x}TV k¸
j1p~A1qij; (33)
where the constants are the same as in (12)and the matrix ~APR~k~kis
~A
|1p1q| |1p2q|:::|1p~kq|
|2p1q| |2p2q|:::|2p~kq|
…………
|~kp1q| |~kp2q|:::|~kp~kq|
:
Note that Lemma 16 still holds if we replace any group of sources from an interval ~Ti;with someiP~Ti;,
so the bound from Lemma 16 on TC
remains valid without modification.
As an exemplar source location where Theorem 13 might be applied, consider the situation where the k
source locations comprising Tare drawn uniformly at random in p0;1q, where we have that (from [39] page
42, Exercise 22)
PppTq¡qr1pk1qsk; P
0;1
k1
:
Then, the cumulative distribution function is
FpqPppTq¤q1r1pk1qsk;
and so the distribution of pTqis
fpqF1pqpk1qkr1pk1qsk1;
with an expectation of
EppTqq»1
k1
0PppTq¡qd1
pk1q2: (34)
That is, for xfrom (1) with sources Tdrawn uniformly at random in p0;1q, the expected value of pTqis
given by (34) and, in Theorems 11 and 12, the corresponding number of samples mwould scale quadratically
withthenumberofsources kduetothescalingof mpTq1. Alternatively, Theorem13allowsmeaningful
results formproportional to kby grouping the sources that are within k2of one another.
12

3 Dual polynomials for stability of non-negative measures
The results in Section 2 are developed by establishing dual polynomials of Program (4) which are non-
negative except at the source locations, which implies that the solution to Program (4) is unique when
10, see Proposition 8, and then showing that the dual polynomials are sufficiently non-negative away
from the source locations and using this property to develop Theorems 11, 12, and 13. In this section we
state the key lemmas used to prove the aforementioned results and then bound the quantities involving the
dual polynomials in order to establish Theorems 4 and 5 for the case of Gaussian windows.
3.1 Uniqueness of non-negative sparse measures from exact samples: proof of
Proposition 8
Proposition 8 states that if xis a non-negative k-sparse discrete measure supported on I, see (1), provided
m¥2k1andtjum
j1are a T-system, then xis the unique non-negative solution to Program (4) with
10. This follows from the existence of a dual polynomial as stated in Lemma 14, the proof of which is
given in Appendix A.
Lemma 14. (Dual polynomial and uniqueness of non-negative sparse measure equivalence) Let
xbe a non-negative k-sparse discrete measure supported on I, see (1). Then, xis the unique solution of
Program (4) with 10if
thekmmatrixrjptiqsik;jm
i1;j1is full rank, and
there exist real coefficients tbjum
j1andqptq °m
j1bjjptqsuch thatqptqis non-negative on Iand
vanishes only on T.
Figure 3 shows an example of such a dual certificate using Gaussian jas defined in (9). It remains
to show that such a dual polynomial exists. To do this, we employ the concept of T-system introduced in
Definition 7. Of particular interest to us is Theorem 5.1 in [27], slightly simplified below, which immediately
proves Proposition 8.
Lemma15. (DualpolynomialexistenceforT-systems)[27,Theorem5.1,pp. 28] Withm¥2k1,
suppose thattjum
j1form a T-system on I. Then there exists a polynomial qptq°m
j1bjjptqthat is non-
negative on Iand vanishes only on T.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
t00.10.20.30.40.50.60.70.80.91qq(t)
source locations {ti}k
i=1
Figure 3: Example ofdual certificate qptqrequired inLemma 14. Here, we have k3andt10:27;t20:59
andt30:82.
13

3.2 Stabilising dual polynomials for non-negative sparse measures: proof of
Theorem 11
We develop the proof of Theorem 11 by using a dual polynomial analogous to that in Lemma 14, but with
further guarantees that away from the sources the dual polynomial must be sufficiently bounded away from
the constraint bounds. However, first, let us bring the generality of Theorem 11 to discrete measures by
introducing an intermediate measure which isk-discrete and whose support is 2separated. Noting that
the measurement operator isL-Lipschitz, see (22), and using the triangle inequality, it follows that
y»
Iptqpdtq
2¤y»
Iptqxpdtq
2»
Iptqpxpdtqpdtqq
2
¤LdGWpx;q:1: (35)
Therefore, a solution pxof Program (4) with 1specified above can be considered as an estimate of . In
the rest of this section, we first bound the error dGWp;pxqand then use the triangle inequality to control
dGWpx;pxq.
To controldGWp;pxqin turn, we will first show that the existence of certain dual certificates leads to sta-
bility of Program (4). Then we see that these certificates exist under certain conditions on the measurement
operator . Turning now to the details, the following result is slightly more general than the one in [40]
and guarantees the stability of Program (4) if a prescribed dual certificate qexists. The proof is provided in
Appendix B.
Lemma 16. (Error away from the support) Letpxbe a solution of Program (4) with 1specified in
(35) and set hpxto be the error. Consider Fptqgiven in Definition 10 and suppose that there exist a
positive¤pTq{2, real coefficients tbjum
j1, and a polynomial q°m
j1bjjsuch that
qptq¥Fptq;
where the equality holds on T. Then we have that
f»
TChpdtqk¸
i1»
Ti;fpttiqhpdtq¤2}b}21; (36)
wherebPRmis the vector formed by the coefficients tbjum
j1.
There is a natural analogy here with the case of exact samples. In the setting where i0in (2), the
dual certificate qin Lemma 14 was required to be positive off the support T. In the presence of inexact
samples however, Lemma 16 loosely-speaking requires the dual certificate to be bounded4awayfrom zero
(see example in Figure 4) for tPTC
.
Note also that Lemma 16 controls the error haway from the support T, as it guarantees that
»
TChpdtq¤2}b}21
f; (37)
if the dual certificate qexists. Indeed, (37) follows directly from (36) because the sum in (36) is non-negative.
This is in turn the case because fp0q0and the error his non-negative off the support T. Another key
observation is that Lemma 16 is almost silent about the error near the impulses in . Indeed, because
fp0q0by assumption, (36) completely fails to control the error on the support T. However, as the next
result states, Lemma 16 can be strengthened near the support provided that an additional dual certificate
q0exists. The proof, given in Appendix C, is not unsimilar to the analysis in [41].
4Note the scale invariance of (36) under scaling of fand f. Indeed, by changing f;fto f; ffor positive , the proof
dictates that bchanges to band consequently cancels out from both sides of (36). Similarly, if we change to in (3),
the proof dictates that bchanges to b{ and again cancels out, leaving (36) unchanged.
14

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
t-20246810qq(t)
source locations {ti}k
i=1
F(t)(a)
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
t00.10.20.30.40.50.60.7qq(t)
source locations {ti}k
i=1
F(t) (b)
Figure 4: Example of dual certificate qptqthat satisfies the conditions in Lemma 16 where the window
function is the Gaussian kernel ptqet2{2. We takek3andtiPt0:27;0:59;0:82uand the function
Fptqsuch thatqptq¥Fptq.
Lemma 17. (Error near the support) Suppose that the dual certificate in Lemma 16 exists. Consider a
functionF0ptqdefined like Fptqin Definition 10 for the sign pattern 0such that
0
i$
&
%1;when there exists iPrkssuch thattPTi;and³
Ti;hpdtq¡0;
1;when there exists iPrkssuch thattPTi;and³
Ti;hpdtq¤0;
and suppose also that there exist real coefficients tb0
jujPrmsand a polynomial q0°m
j1b0
jjsuch that
q0ptq¥F0ptq;
where the equality holds on T. Then we have that

i1»
Ti;hpdtq¤2
}b}2}b0}2
1: (38)
In words, Lemma 17 controls the error hnear the support T, provided that a certain dual certifi-
cateq0exists (see example in Figure 5). Note that (38) does not control the massof the error, namely°

Ti;|hpdtq|³
T|hpdtq|, but rather it controls°
i|³
Ti;hpdtq|. Of course, the latter is always bounded
by the former, that is

i1»
Ti;hpdtq¤»
T|hpdtq|: (39)
However, the two sides of (39) might differ significantly. For instance, it might happen that the solution px
returns a slightly incorrect impulse at ti{4(rather than ti) but with the correct amplitude of ai. As a
result, the mass of the error is large in this case (³
Ti;|hpdtq|2ai) but the left-hand side of (39) vanishes,
namely|³
Ti;hpdtq| 0. Note that we cannot hope to strengthen (38) by replacing its left-hand side with
the mass of the error, namely³
T|hpdtq|. This is the case mainly because the total variation is not the
appropriate error metric for this context.
Indeed, while the mass of the error³
I|hpdtq|might not be small in general, we can instead control the
generalised Wasserstein distance between the true and estimated measures, namely xandpx, see (19) and
(20). The following result is proved by combining Lemmas 16 and 17, see Appendix D.
Lemma 18. (Stability of Program (4)in the Generalised Wasserstein distance) Suppose that the
dual certificates in Lemmas 16 and 17 exist. Then it holds that
dGWp;pxq¤
62
f
}b}26}b0}2
1}}TV: (40)
15

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
t-50050100150200qq0(t)
source locations {ti}k
i=1
F0(t)(a)
0.2 0.3 0.4 0.5 0.6 0.7 0.8
t-202468101214qq0(t)
source locations {ti}k
i=1
F0(t) (b)
Figure 5: Example of dual certificate q0ptqthat satisfies the conditions in Lemma 17 where the window
function is the Gaussian kernel ptqet2{2. We takek3andtiPt0:27;0:59;0:82uand the function
F0ptqfor the sign pattern 0t 1;1;1usuch thatq0ptq¥F0ptq. For the Gaussian kernel, the existence of
qptqfor any sign pattern guarantees the existence of q0ptqin Lemma 17.
An application of triangle inequality now yields that
dGWpx;pxq¤dGWpx;qdGWp;pxq
¤dGWpx;q
62
f
}b}26}b0}2
1}}TV:(see Lemma 18) (41)
In words, Program (4) is stable xif the certificates qandq0exist. Let us now study the existence of these
certificates. Proposition 19, proved in Appendix E, guarantees the existence of the dual certificate qrequired
in Lemma 16 and heavily relies on the concept of T*-system in Definition 9. We remark that the proof
benefits from the ideas in [27]. Similarly, Proposition 20, stated without proof, ensures the existence of the
certificateq0required in Lemma 17.
Proposition 19. (Existence of q)Form¥2k2, suppose that tjum
j1form a T-system on Iand that
tFuYtjum
j1form a T*-system on I, whereFptqis the function given in Definition 10. Then the dual
certificateqin Lemma 16 exists and consequently Program (4) is stable in the sense that (36) holds.
Note that to ensure the success of Program (4), it suffices that there exists a polynomial q°m
j1bjj
such thatqptq¥Fptqwith the equality met on the support T, see Lemma 16. Equivalently, it suffices that
there exists a non-negative polynomial 9qb0F°m
j1bjjthat vanishes on Tsuch thatb0¡0and at
least one other coefficient, say bj0, is nonzero. This situation is reminiscent of Lemma 15. In contrast to
Lemma 15, however, such 9qexists whentFuYtjum
j1is a T*-system rather than a T-system. The more
subtle T*-system requirement is to avoid trivial or unbounded polynomials.
Proposition 20. (Existence of q0)Form¥2k2, suppose that tjum
j1form a T-system on Iand that
tF0uYtjum
j1form a T*-system on I, whereF0ptqis the function defined in Lemma 17. Then the dual
certificateq0in Lemma 17 exists and consequently Program (4) is stable in the sense that (38) holds.
Having constructed the necessary dual certificates in Propositions 19 and 20, the proof of Theorem 11 is
now complete in light of (41).
16

3.3 Proof of Theorem 12 (Average stability for Program (4))
In this section we give an overview of the main ideas involved in proving Theorem 12. To start with, let
APRkkbe defined as in (30):
A
|1pt1q| |1pt2q|:::|1ptkq|
|2pt1q| |2pt2q|:::|2ptkq|
…………
|kpt1q| |kpt2q|:::|kptkq|
;
whereiptiqptislpiqqis evaluated at the source tiand the closest sample to it, as defined in (7).
The proof of Theorem 12 consists of two steps. We first show that we can bound the error if the matrix A
is strictly diagonally dominant. It is easy to see that, if the window function is localised, then the entries
on the main diagonal are larger in absolute value than the off-diagonal entries. If, moreover, we choose the
sampling locations tsiuiPrmssuch thatAis strictly diagonally dominant (which means that for each source,
there is a sampling location that is "close enough" to it), then the bound (28) is guaranteed.
Proposition 21. For each source ti, selectslpiqto be the closest sample as defined in (7), and define the
matrix A in (30)using the sequences ttiuk
i1,tslpiquk
i1. IfAis strictly diagonally dominant, then the error
around the support is bounded according to (28).
Then, we want to go further and see what it means exactly for Ato be strictly diagonally dominant, so
the second step in the proof of Theorem 12 is to give an upper bound for the distance between the sources
ttiuiPrksand the closest sampling locations tslpiquiPrkssuch thatAis strictly diagonally dominant.
Given an even positive function that is localised at 0and with fast decay, let andas given in
Definition 1, so
|tislpiq|¤ (42)
We want to find 0such that
pslpiqtiq¥¸
jipslpiqtjq;@Pp0;0q;@iPrks; (43)
namely, we want the matrix Ato be strictly diagonally dominant. From the conditions (43), we can obtain
a more general equality, depending on and, that0must satisfy such that, for  0,Ais strictly
diagonally dominant. The equality is given by (27):
p0qp0qp0q1
»1{20
0pxqdx1
»1{20
0pxqdx:
Proposition 22. Let0Pp0;1
2qsuch thattislpiq¤0for alliPrks. If0satisfies (27), then the
matrix A defined in (30)is strictly diagonally dominant.
Finally, we note that the proof of Theorem 13 involves the same ideas as the ones discussed in this section,
with a few modifications. The detailed proofs of Proposition 21 and Proposition 22 are given in Appendices
F and G respectively. The proof of Theorem 13 is similar to the proof presented in the current section, so
we only show the differences in Appendix H.
3.4 Proofs of Theorems 4 and 5 (Gaussian with sparse measure)
In this section we give the main steps taken to obtain the explicit bounds in Theorems 4 and 5 for the
Gaussian window function. These are particular cases of the more general Theorems 11 and 12 respectively,
where the window function is taken to be the Gaussian jptq eptsjq2{2, given in (9), and the true
measurexis a k-discrete non-negative measure as in (1).
17

3.4.1 Bounds on the coefficients of the dual certificates for Gaussian window
We will now give explicit bounds on the vectors of coefficients }b}2and}b}2of the dual certificates qandq
from Lemmas 16 and 17 in terms of the parameters of the problem k;T;Sand(the width of the Gaussian
window).
Firstly, we introduce a more specific form of the dual polynomial separators Fptq,Fptqfrom Definition
10. Here, we take fptq0fortPp;qand f;f1positive constants with f 1. Then,f0is defined to be
greater that both fandf1, with the exact relationship between f0and fgiven in the proof of Lemma 23.
Therefore, for ¡0and a sign pattern Pt1uk,FptqandFptqare:
Fptq$
''''&
''''%f0; t0;
f1; t1;
0;when there exists iPrkssuch thattPTi;;
f;elsewhere on I:(44)
Fptq$
''''&
''''%f0; t0;
f1; t1;
1;when there exists iPrkssuch thattPTi;andi1;
f;elsewhere on I:(45)
With the above definitions, both Theorem 11 and Theorem 12 require that tFuYtjum
j1form a T*-system
onI. Likewise, Theorem 11 requires that tFuYtjum
j1form a T*-system for any sign pattern . We show
in Lemma 23 that both these requirements are satisfied for the choice in (9) of jptqeptsjq2{2. The
proof is given in Appendix J.
Lemma 23. Consider the function Fptqdefined in (44)and suppose that m¥2k2. ThentFuYtjum
j1
form a T*-system on I, withextended totally positive, even Gaussian and jdefined as in (9), provided
thatf0"f,f0"f1and f;f0;f1"0. These requirements are made precise in the proof and are dependent
on. Moreover, for an arbitrary sign pattern andFas defined in (45),tFuYtjum
j1form a T*-system
onIwhen, in addition, f0"1.
In this setting, consider a subset of m2k2samplestsjum
j1€S(since in the proof of Lemma 23 we
select the 2k2samples that are the closest to the sources) such that they satisfy Conditions 2. Therefore,
we have that s10,sms2k21, and
|s2iti|¤; s 2i1s2i;@iPrks; (46)
for a small ¤2, see (9). That is, we collect two samples close to each impulse tiinx, one on each side of
ti. Suppose also that
¤?
2;¡d
log
34
2
; ¤2; (47)
namely the width of the Gaussian is much smaller than the separation of the impulses in x. Lastly, assume
that the impulse locations T ttiuk
i1and sampling points tsjum1
j2are away from the boundary of I,
namely
a
logp1{3q¤ti¤1a
logp1{3q;@iPrks;
a
logp1{3q¤sj¤1a
logp1{3q;@jPr2:m1s: (48)
Wecannowgiveexplicitboundson }b}2and}b}2fortheGaussianwindowfunction,asrequiredbyTheorems
11 and 12. The following result is proved in Appendix K.
18

Lemma 24. Suppose that the window function is Gaussian, as defined in (9), the assumptions (46),(47)
and(48)(namely Conditions 2) hold and satisfies:
¤min$
''''''&
''''''%8Fminp;1
q
34p2k2q
80k8kP1
3
1e2
2 1
2;Cpf0;f1q1
6

4k44k
2 1
3,
//////.
//////-: (49)
Then we have the following bounds:
}b}2¤c
p2k2q
4k54k
4
1?e
2Cpf0;f1q5
4
Fmax
;1

Fmin
;1
2
k
; (50)
}b}2¤?
2k2

1?e
2 Cpf0;f1q2k3
2
Fmax
;1

Fmin
;1
2
k
; (51)
where
Cpf0;f1qf2
0f2
12f02f12; (52)
P1

4
413
42
24
4
2
9
412
48
6
2
: (53)
Fmax
;1


8
14
4
2
1e2
2 1
2
321
42
62
8
24
1e2
2 1
2
;(54)
Fmin
;1

1
12
2
2e2
2
1e2
2: (55)
To obtain the final bounds for the Gaussian window function, we will substitute the above bounds in
the right hand side of (26) in Theorem 11 and in (28) and (29) in Theorem 12. We will then obtain F1in
Theorem 4 (see (64)) and F2in Theorem 5 (see (71)).
For more clarity, in the following lemma we simplify F1andF2further, in the case when stronger
conditions apply to ,pTqand.
Lemma 25. If the conditions in Lemma 24 hold and, in addition,  1?
3,¡b
log5
2and f 1, then
Fmax
;1

Fmin
;1
2 c1
6p132q2; (56)

p62
fqc
4k54k
4Cpf0;f1q5
46
pCpf0;f1q2kq3
2 ?
2k2
1?e
2 c2kC1p1
q
2; (57)
c
p2k2q
4k54k
4
1?e
2Cpf0;f1q5
4
f c3kC2p1
q
2; (58)
where
C11

pCpf0;f1q2kq3
2
f; (59)
C21

Cpf0;f1q5
4
f; (60)
19

and, similarly, the condition (49)is simplified to condition (13). Moreover, if in Theorem 12 satisfies
 2
4, then
1
e22
2e22
2e2
2e22
2
1e2
2e2p1q2
2 c4: (61)
Above,c1;c2;c3;c4are universal constants.
More specifically, (56), (57) will be used to bound F1to obtain (12) and (56), (58), will be used to bound
F2to obtain (16). Lastly, (61) will be used to bound F3, which appears in the bound given by Theorem 5.
The proof of Lemma 25 is given in Appendix N. Note that we give C1andC2as functions of1
because,
asÑ0,C1andC2grow at a rate dependent on , as indicated in the Lemma 27 in Section 3.4.4.
3.4.2 Proof of Theorem 4 (Wasserstein stability of Program (4)forptqGaussian)
We consider Theorem 11 and restrict ourselves to the case x, a k-discrete non-negative measure as in (1)
with support T. Let pTq¥2, withTas the support of x. We begin with estimating the Lipschitz
constant of the measurement operator according to (22), see Appendix I for the proof of the next result.
Lemma 26. ConsiderS tsjum
j1€Randtjum
j1specified in (9). Then the operator :IÑRm
defined in (3) is2?m
?
2e-Lipschitz with respect to the generalised Wasserstein distance, namely (22) holds with
L2?m
?
2e.
We may now invoke Theorem 11 to conclude that, for an arbitrary k-sparse 2-separated non-negative
measurexand arbitrary sampling points tsjum
j1€R, Program (4) with 1is stable in the sense that
there exist vectors b;tbu€Rmsuch that
dGWpx;pxq¤
62
f
}b}26 min
}b}2
}x}TV (62)
provided that m¥2k2andf0"f1"f"0. The exact relationships between f;f0;f1are given in the
proof of Lemma 23 in Appendix J.
Combining Lemma 24 with (62) yields a final bound on the stability of Program (4) with Gaussian
window and completes the proof of the first part of Theorem 4:
dGWpx;pxq¤F1pk;pTq;1
;1
;q}x}TV; (63)
where
F1pk;pTq;1
;1
;q
p62
fqc
4k54k
4C5
46
pC2kq3
2 ?
2k2
1?e
2
Fmax
;1

Fmin
;1
2

k
;(64)
with C;F max;Fmingiven in (52), (54), (55) respectively and f0f0p1
qdepends on (see the proof of Lemma
27).
Finally, to show that (12) holds when  1?
3, and ¡b
log5
2, we apply the first part of Lemma 25
with f 1and the proof of Theorem 4 is complete.
Remark. In particular, f0increases as Ñ0andf0also depends on the other parameters of the problem,
namely;;pTq;k. See Section 3.5 for a detailed discussion. Furthermore, f1and fare considered fixed
positive constants with f1 f0.
20

3.4.3 ProofofTheorem5(AveragestabilityofProgram (4)forptqGaussian: sourceproximity
dependence)
We now apply Theorem 12 with ptqgptqet2{2. We have that 81, the Lipschitz constant Lofg
onr1;1sisL2
2, and

j1pA1qij¤}A1}8 1
minj
gpsjtjq°
ijgpsjtiq : (65)
The last inequality comes from the definition of Ain (30) with ptq gptqandsj:slpjqas given in
Definition 1, and [42]. Then, by assumption, |sjtj|¤for an arbitrary jPrksandgis decreasing, so
gpsjtjq¥gpqe22
2:
We now assume without loss of generality that sj tj. Then, it follows that
|sjti|¥|ji|;ifi jand|sjti|¥|ji|;ifi¡j:
This, in turn, leads to
¸
ijgpsjtiqj1¸
i1gpsjtiqk¸
ij1gpsjtiq
¤j1¸
i1gppjiqqk¸
ij1gppijqq
¤8¸
i1gppiqq8¸
i1gppiqq: (66)
We now bound each sum in (66) as follows

i1gppiqq8¸
i1epiq22
2
e2p1q2
2e22
28¸
i2
e2
2
i22i
¤e2p1q2
2e22
28¸
i2
e2
2
i
pi22i¡ifori¥2q
e2p1q2
2e22
2e22
2
1e2
2; (67)
and similarly, we have that

i1gppiqq8¸
i1epiq22
2¤e22
2e2
2
1e2
2: (68)
By combining (66), (67) and (68), we obtain:
gpsjtjq¸
ijgpsjtiq¥e22
2e22
2e2
2e22
2
1e2
2e2p1q2
2: (69)
The above inequality also holds when we take the minimum over jPrksand, inserting it in (65) and using
this result and the bound on }b}2from Lemma 24 in (28), we obtain (14):
»
Ti;^xpdtqai¤
pc1F2qc2}^x}TV
2
F3; (70)
21

where
F2pk;pTq;1
;1
qc
p2k2q
4k54k
4
1?e
2Cpf0;f1q5
4
f
Fmax
;1

Fmin
;1
2
k
; (71)
F3ppTq;;q1
e22
2e22
2e2
2e22
2
1e2
2e2p1q2
2; (72)
and C;F max;Fminare given in (52), (54), (55) respectively. The error bound away from the sources (15) is
obtained by applying Lemma 16 with the same bounds on }b}2.
Then, by using Lemma 25 with f 1, we obtain (16). Note that we can apply Lemma 25 because, for
 1?
3, we have that5
2¡34
2and, therefore, ¡b
log5
2¡b
logp34
2q.
3.4.4 Proof of Corollary 6
First we give an explicit dependence of C1p1
qandC2p1
qonfor small¡0in the following lemma, proved
in Appendix O.
Lemma 27. Iff1 f0,1 f0and f 1, then there exists 0¡0such that:
C11

 pc1C2
2k4q3
2
f1
6(73)
C21

 c2C5
21
5; (74)
for allPp0;0q, where c1andc2are universal constants and Cis defined in the proof, see (236).
To prove the first part of the corollary, we first let 1
7in the bound on C1p1
qin Lemma 27:
C11

 pc1C2
2k4q3
2
f1
6
7;@ 7
0; (75)
and we substitute the above inequality in the bound (12) in Theorem 4 to obtain:
dGWpx;^xq C11
7;@ 7
0;
where
C1c1kpc1C2
2k4q3
2
2fc2
6p132q2k
}x}TV: (76)
Similarly, let 1
6in the bound on C2p1
qin Lemma 27:
C21

 c2C5
21
5
6;@ 6
0; (77)
which we substitute in the bound (16) in Theorem 5 to obtain:
»
Ti;^xpdtqai¤C21
6;@ 6
0;
where
C2c1c3kc2C5
2
2c4
6p132q2k
c2}^x}TV
2: (78)
Note that we apply Lemma 27 with f 1and that both inequalities in the corollary hold for  07
0,
where0is given by Lemma 27.
22

3.5 Discussion
In this section, we discuss a few issues regarding the robustness of our construction of the dual certificate
from Appendix E. There are two points that need to be raised: the construction itself and the proof that we
indeed have a T*-System.
At the moment, we do not use any samples that are away from sources in the the construction of the
dual certificate. If the sources are close enough compared to , then this is not an issue. However, for 
small relative to the distance between samples, in light of the proof of Lemma 23 (see Appendix J), if we
consider the dual certificate as the expansion of the determinant NofMin (122) along the lrow:
NFplq 0m¸
j1p1qj1 jgplsjq; (79)
then the terms gplqbecome exponentially small (as lis far from all samples sj) and, therefore, the value
ofNis close toFplq(which isfiflPTC
). This is problematic, as we require that N¡0. We can
overcome this by adding “fake” sources at intervals 1so that they cover the regions where we have no true
sources, together with two close samples for each extra source. The determinant Nbecomes:
N f0fps1q  gpsmq
0gpt1s1q gpt1smq
0g1pt1s1q g1pt1smq
………
f gpjs1q gpjsmq
f g1pjs1q g1pjsmq
………
Fplqgpls1q gplsmq
………
0gptks1q gptksmq
0g1ptks1q g1ptksmq
f1gp1s1q gp1smq : (80)
Here, therowsareorderedaccordingtotheorderingofthesetcontaining ti;j;l. Thetermsintheexpansion
of (80) along the row with ldo not approach 0exponentially with this construction, since for any lthere
existssiclose enough so that gplsiq¡ffor somef¡0.
More specifically, consider also the expansion of Nalong the first column:
Nf0N1;1f1Nm1;1FplqNl;1f¸
j lpNj;1Nj1;1qf¸
j¡lpNj;1Nj1;1q:(81)
We use this expansion in the proof of Lemma 23 in Appendix J to show that the functions FYtgjum
j1form
a T*-System. For lPTC
,Fplqfand the setup in Lemma 23, we require that (see (127)):
f0
f¥Nl;1
minlPTCN1;1: (82)
In the construction (80), if we upper bound the pairs Nj;1Nj1;1in the two sums in (81) (a separate
problem by itself), then we can impose a similar condition to (82) for f0and f. From here, we obtain that
23

f0Cfwhere finding C¥Nl;1
minlPTCN1;1involves finding a lower bound on N1;1:
N gpt1s1q gpt1smq
g1pt1s1q g1pt1smq
……
gpjs1q gpjsmq
g1pjs1q g1pjsmq
……
gpls1q gplsmq
……
gptks1q gptksmq
g1ptks1q g1ptksmq
gp1s1q gp1smq : (83)
The structure of the above determinant is similar to the denominator in Appendix K but only up to the row
withl. The rows after it do not preserve the diagonally dominant structure of the matrix, as each source
becomes associated with one close sample to it and the first sample corresponding to the next source. This
is an issue in both the construction described in the proof of Proposition 19 in Appendix E (and detailed
in the proof of Lemma 23) and the construction described in the current section (which would result from
considering a determinant with “fake” sources like (80)). However, by adding extra “fake” sources, one could
argue that the determinant (80) is better behaved, as the distance between a source and the first sample
corresponding to the next source is smaller, which we leave for further work.
Acknowledgements
AE is grateful to Gongguo Tang and Tamir Bendory for enlightening discussions and helpful comments.
This publication is based on work supported by the EPSRC Centre For Doctoral Training in Industrially
Focused Mathematical Modelling (EP/L015803/1) in collaboration with the National Physical Laboratory
and by the Alan Turing Institute under the EPSRC grant EP/N510129/1 and the Turing Seed Funding grant
SF019. The authors thank Stephane Chretien, Alistair Forbes and Peter Harris from the National Physical
Laboratory for the support and insight regarding the associated applications of super-resolution.
References
[1] Geoffrey Schiebinger, Elina Robeva, and Benjamin Recht. Superresolution without separation. Infor-
mation and Inference: a journal of the IMA , 00:1–30, 2017.
[2] Yohann De Castro and Fabrice Gamboa. Exact reconstruction using beurling minimal extrapolation.
Journal of Mathematical Analysis and applications , 395(1):336–354, 2012.
[3] Klaus G. Puschmann and Franz Kneer. On super-resolution in astronomical imaging. Astronomy &
Astrophysics , 436(1):373–378, 2005.
[4] Eric Betzig, George H. Patterson, Rachid Sougrat, O. Wolf Lindwasser, Scott Olenych, Juan S. Boni-
facino, Michael W. Davidson, Jennifer Lippincott-Schwartz, and Harald F. Hess. Imaging intracellular
fluorescent proteins at nanometer resolution. Science, 313(5793):1642–1645, 2006.
[5] Samuel T. Hess, Thanu P. K. Girirajan, and Michael D. Mason. Ultra-high resolution imaging by
fluorescence photoactivation localization microscopy. Biophysical journal , 91(11):4258–4272, 2006.
[6] Michael J. Rust, Mark Bates, and Xiaowei Zhuang. Sub-diffraction-limit imaging by stochastic optical
reconstruction microscopy (storm). Nature methods , 3(10):793–796, 2006.
24

[7] Charles W. Mccutchen. Superresolution in microscopy and the abbe resolution limit. JOSA,
57(10):1190–1192, 1967.
[8] Hayit Greenspan. Super-resolution in medical imaging. The Computer Journal , 52(1):43–63, 2009.
[9] Chaitanya Ekanadham, Daniel Tranchina, and Eero P. Simoncelli. Neural spike identification with
continuous basis pursuit. Computational and Systems Neuroscience (CoSyNe), Salt Lake City, Utah ,
2011.
[10] Stefan Hell. Primer: fluorescence imaging under the diffraction limit. Nature Methods , 6(1):19, 2009.
[11] Ronen Tur, Yonina C. Eldar, and Zvi Friedman. Innovation rate sampling of pulse streams with
application to ultrasound imaging. IEEE Transactions on Signal Processing , 59(4):1827–1842, 2011.
[12] David J. Thomson. Spectrum estimation and harmonic analysis. Proceedings of the IEEE , 70(9):1055–
1096, 1982.
[13] Gongguo Tang, Badri Narayan Bhaskar, and Benjamin Recht. Near minimax line spectral estimation.
IEEE Transactions on Information Theory , 61(1):499–512, 2015.
[14] Valery Khaidukov, Evgeny Landa, and Tijmen Jan Moser. Diffraction imaging by focusing-defocusing:
An outlook on seismic superresolution. Geophysics , 69(6):1478–1490, 2004.
[15] Parikshit Shah, Badri Narayan Bhaskar, Gongguo Tang, and Benjamin Recht. Linear system iden-
tification via atomic norm regularization. In Decision and Control (CDC), 2012 IEEE 51st Annual
Conference on , pages 6265–6270. IEEE, 2012.
[16] EmmanuelJ.CandèsandCarlosFernandez-Granda. Towardsamathematicaltheoryofsuper-resolution.
Communications on Pure and Applied Mathematics , 67(6):906–956, 2014.
[17] Gongguo Tang, Badri Narayan Bhaskar, Parikshit Shah, and Benjamin Recht. Compressed sensing off
the grid. IEEE Transactions on Information Theory , 59(11):7465–7490, 2013.
[18] Karsten Fyhn, Hamid Dadkhahi, and Marco F. Duarte. Spectral compressive sensing with polar interpo-
lation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP) , 2013.
[19] Laurent Demanet, Deanna Needell, and Nam Nguyen. Super-resolution via superset selection and prun-
ing.Proceedings of the 10th International Conference on Sampling Theory and Applications (SampTA) ,
2013.
[20] Vincent Duval and Gabriel Peyré. Exact support recovery for sparse spikes deconvolution. Foundations
of Computational Mathematics , pages 1–41, 2015.
[21] Quentin Denoyelle, Vincent Duval, and Gabriel Peyré. Support recovery for sparse super-resolution of
positive measures. Journal of Fourier Analysis and Applications , 23(5):1153–1194, 2017.
[22] Jean-Marc Azais, Yohann De Castro, and Fabrice Gamboa. Spike detection from inaccurate samplings.
Applied and Computational Harmonic Analysis , 38(2):177–195, 2015.
[23] Tamir Bendory, Shai Dekel, and Arie Feuer. Robust recovery of stream of pulses using convex optimiza-
tion.Journal of Mathematical Analysis and Applications , 442(2):511–536, 2016.
[24] David L. Donoho. Compressed sensing. IEEE Transactions on information theory , 52(4):1289–1306,
2006.
[25] Emmanuel J. Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal
reconstructionfromhighlyincompletefrequencyinformation. IEEE Transactions on information theory ,
52(2):489–509, 2006.
25

[26] Emmanuel J. Candès and Terence Tao. Decoding by linear programming. Information Theory, IEEE
Transactions on , 12(51):4203–4215, Dec 2005.
[27] SamuelKarlinandWilliamJ.Studden. Tchebycheff systems: with applications in analysis and statistics .
Pure and applied mathematics. Interscience Publishers, 1966.
[28] Samuel Karlin. Total Positivity, Volume 1 . Total Positivity. Stanford University Press, 1968.
[29] Mark G. Krein, Adolf A. Nudelman, and David Louvish. The Markov Moment Problem And Extremal
Problems .
[30] Paul Doukhan and Fabrice Gamboa. Superresolution rates in Prokhorov metric. Canad. Math. Bull. ,
48(2):316–329, 1996.
[31] Veniamin I. Morgenshtern and Emmanuel J. Candès. Stable super-resolution of positive sources: the
discrete setup. SIAM J. on Imaging Sciences , 9(1):412–444, 2016.
[32] Tamir Bendory. Robust recovery of positive stream of pulses. IEEE Transactions on Signal Processing ,
65(8):2114–2122, 2017.
[33] David L. Donoho. Superresolution via sparsity constraints. SIAM J. Math. Anal. , 23(5):1309–1331,
1992.
[34] Brett Bernstein and Carlos Fernandez-Granda. Deconvolution of Point Sources: A Sampling Theorem
and Robustness Guarantees. (July), 2017.
[35] Allan Pinkus. Spectral properties of totally positive kernels and matrices. Total positivity and its
applications , 359:1–35, 1996.
[36] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algo-
rithms (3rd ed.) . MIT Press, 2009.
[37] BenedettoPiccoliandFrancescoRossi. GeneralizedWassersteindistanceanditsapplicationtotransport
equations with source. Archive for Rational Mechanics and Analysis , 211(1):335–358, 2014.
[38] Emmanuel J. Candès and Michael B. Wakin. An introduction to compressive sampling. IEEE signal
processing magazine , 25(2):21–30, 2008.
[39] William Feller. An Introduction to Probability Theory and its Applications, Vol 2 . Wiley, 1968.
[40] Emmanuel J. Candès and Carlos Fernandez-Granda. Super-resolution from noisy data. Journal of
Fourier Analysis and Applications , 19(6):1229–1254, 2013.
[41] Carlos Fernandez-Granda. Support detection in super-resolution. Proceedings of the 10th International
Conference on Sampling Theory and Applications (SampTA) , pages 145–148, 2013.
[42] James M. Varah. A lower bound for the smallest singular value of a matrix. Linear Algebra and its
Applications , 11(1):3–5, 1975.
[43] Robert J. Plemmons. M-Matrix Characterizations. I-Nonsingular M-Matrices. Linear Algebra and its
Applications , 18(2):175–188, 1977.
[44] Filippo Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs,
and Modeling . Progress in Nonlinear Differential Equations and Their Applications. Springer Interna-
tional Publishing, 2015.
[45] Roger A. Horn and Charles R. Johnson. Matrix Analysis . Cambridge University Press, 1990.
[46] Izrail S. Gradshteyn and Iosif M. Ryzhik. Tables of Integrals, Series, and Products (6th ed) . San Diego,
CA: Academic Press, 2000.
26

A Proof of Lemma 14
Let^xbe a solution of Program 4 with 0and leth^xxbe the error. Then, by feasibility of both x
andpxin Program (4), we have that
»
Ijptqhpdtq0; jPrms: (84)
LetTCbe the complement of Tttiuk
i1with respect to I. By assumption, the existence of a dual certificate
allows us to write that»
TCqptqhpdtq»
Iqptqhpdtq»
Tqptqhpdtq

Iqptqhpdtq
qptiq0; iPrks
m¸
j1bj»
Ijptqhpdtq
0:(see (84))
Sincex0onTC, thenh^xonTC, so the last equality is equivalent to
»
TCqptq^xpdtq0: (85)
Butqis strictly positive on TC, so it must be that h^x0onTCand, therefore, h°k
i1citifor
some coefficients tciu. Now (84) reads°k
i1cijptiq0for everyjPrms. This gives ci0for alliPrks
becauserjptiqsi;jis, by assumption, full rank. Therefore h0and^xxonI, which completes the proof
of Lemma 14.
B Proof of Lemma 16
Let^xbe a solution of Program (4) and set h^xto be the error. Then, by feasibility of both andpx
in Program (4) and using the triangle inequality, we have that
»
Iptqhpdtq
2¤21: (86)
Next, the existence of the dual certificate qallows us to write that
f»
TChpdtqk¸
i1»
Ti;fpttiqhpdtq
¤»
TCqptqhpdtqk¸
i1»
Ti;qptqhpdtq

TCqptqhpdtq»
Tqptqhpdtq
TYk
i1Ti;

Iqptqhpdtq
m¸
j1bj»
Ijptqhpdtq
¤}b}2»
Iptqhpdtq
2(Cauchy-Schwarz inequality)
¤}b}221;(see (86))
which completes the proof of Lemma 16.
27

C Proof of Lemma 17
The existence of the dual certificate q0allows us to write that

i1»
Ti;hpdtq
k¸
i1»
Ti;sihpdtqsisign»
Ti;hpdtq
k¸
i1»
Ti;
siq0ptq
hpdtqk¸
i1»
Ti;q0ptqhpdtq
k¸
i1»
Ti;
siq0ptq
hpdtq»
Iq0ptqhpdtq»
TCq0ptqhpdtq

si1»
Ti;
1q0ptq
hpdtq¸
si1»
Ti;
1q0ptq
hpdtq
»
Iq0ptqhpdtq»
TCq0ptqhpdtq
¤¸
si1»
Ti;fpttiqhpdtq¸
si1»
Ti;fpttiqhpdtq»
Iq0ptqhpdtqf»
TChpdtq
k¸
i1»
Ti;fpttiqhpdtqf»
TChpdtq»
Iq0ptqhpdtq
¤2}b}21»
Iq0ptqhpdtq(see Lemma 16)
2}b}21m¸
j1b0

Ijptqhpdtq
¤2}b}21}b0}221;(see (86))
which completes the proof of Lemma 17.
D Proof of Lemma 18
Our strategy is as follows. We first argue that
dGWp;pxqdGWp;rxq;
whererxis the restriction of pxto the-neighbourhood of the support of , namelyTdefined in (8). This,
loosely speaking, reduces the problem to that of computing the distance between two discrete measures
supported on T. We control the latter distance using a particular suboptimal choice of measure
in (20).
Let us turn to the details.
Letrxbe the restriction of pxtoT, namely rxpx|T, or more specifically
rxpdtq#
pxpdtqtPT;
0tPTC
:
Then, using the triangle inequality, we observe that
dGWp;pxq¤dGWp;rxqdGWprx;pxq: (87)
28

The last distance above is easy to control: We write that
dGWprx;pxqinf
~z;^z
}rxrz}TV}pxpz}TVdWprz;pzq
;(see (19))
¤}rxpx}TV}pxpx}TVdWppx;pxq przpzpxq
}rxpx}TV
px|TC
TV

TCpxpdtq ppxis non-negative q

TChpdtq
hpdtqpxpdtqpdtqpxpdtqwhentPTC
¤2}b}21
f:(see Lemma 16) (88)
We next control the term dGWp;rxqin (87) by writing that
dGWp;rxqinf
~z;^z
}z}TV}rxrz}TVdWpz;rzq
(see (19))
¤}}TVrxrx}}TV
}rx}TV
TVdWp;rzq
z;rzrx}}TV
}rx}TV
}rx}TV}}TVdWp;rzq
rx|TC
TVrx|T
TV|T
TVdWp;rzq
px|T
TV|T
TVdWp;rzq
rxpx|T
»
Tpxpdtq»
TpdtqdWp;rzq pandpxare non-negative q
»
ThpdtqdWp;rzq
¤k¸
i1»
Ti;hpdtqdWp;rzq(triangle inequality)
¤2
}b}2}b0}2
1dWp;rzq:(see Lemma 17) (89)
For future use, we record an intermediate result that is obvious from studying (89), namely
}rx}TV}}TV¤2
}b}2b0
2
1: (90)
It remains to control dWp;rzqabove where
dWp;rzqinf»
I2|r|
pd;drq; (91)
is the Wasserstein distance between the measures andrz. The infimum above is over all measures
on
I2IIthat produce andrzas marginals, namely we have:
»
AI
pd;drqpAqand»
IB
pd;drqrzpBq;@A;B€I (92)
For everyiP rks, let also rxirx|Ti;andrzirz|Ti;be the restrictions of rxandrztoTi;, respectively.
Because of our choice of z;rzin the second line of (89), note that rzirxi}z}TV{}rx}TV. Recalling that is
29

supported on Tttiuk
i1, we write that °k
i1aitifor non-negative amplitudes taiuk
i1. Then, noting
thatis supported on Tand~zis supported on T, then any feasible
in (91) is supported on TTand
we can construct a feasible but suboptimal
pd;drqin a two-step approach, where we first extract up to ai
weight of rzion eachti, for example let

1k¸
i1rzipdrq1Tiwhere 1Ti$
&
%Ti; if³
Ti;zipdq¤ai;
rtii;tiisif otherwise ;(93)
whereiis defined such that³tii
tiizipdqai. As a result,
1hasdmarginal equal to rzion the support
of
1and the drmarginal no more than the desired ai. We then construct
2by partitioning the remaining
rziinto the drsubsets in order to make up the ~zmarginal, which is exactly achievable using all of ~zdue to³
~zpdq°k
i1ai. Then, we take

1
2.
Intuitively, this is a transport plan according to which we move as much mass as possible inside each
Ti;(fromaitorzi, the minimum of the masses the two) and the remaining mass is moved outside Ti;.
Therefore, for this choice of
, we have that
dWp;rzq¤»
I2|r|
pd;drq
k¸
i1»
ttiuTi;|r|
pd;drqk¸
i1»
ttiuTC
i;|r|
pd;drq
¤k¸
i1»
ttiuTi;
1pd;drqk¸
i1»
ttiuTC
i;
2pd;drq (94)
The third line above uses the fact that tiand, ifrPTi;, then|r|¤and|r|¤1otherwise. By
evaluating the integrals in the last line above, we find that
dWp;rzq¤k¸
i1mintai;}rzi}TVuk¸
i1
aimintai;}rzi}TVu
¤k¸
i1aik¸
i1ai}rzi}TV
}}TVk¸
i1ai}rzi}TV
k¸
i1ai}}TV

: (95)
Then, we have that
dWp;rzq¤}}TVk¸
i1»
Ti;pdtq}}TV
}rx}TV»
Ti;rxpdtq
see the second line of (89)
¤}}TVk¸
i1»
Ti;pdtqrxpdtqk¸
i1}rxi}TV1}}TV
}rx}TV(triangle inequality)
}}TVk¸
i1»
Ti;hpdtq}rx}TV}}TV
rxpx|T
¤}}TVk¸
i1»
Ti;hpdtq2
}b}2}b0}2
1(see (90))
¤}}TV2
}b}2}b0}2
12
}b}2}b0}2
1:(see Lemma 17) (96)
Substituting the above bound back into (89), we find that
dGWp;rxq¤6
}b}2}b0}2
1}}TV: (97)
30

Then combining (88) and (97) yields
dGWp;pxq¤dGWpx;rxqdGWprx;pxq
¤2}b}21
f6
}b}2}b0}2
1}}TV;
which completes the proof of Lemma 18.
E Proof of Proposition 19
Without loss of generality and for better clarity, suppose that Tttiuk
i1is an increasing sequence. Consider
a positive scalar such that¤¤{2. Consider also an increasing sequence tlum
l1€Ir0;1ssuch
that10,m1, and every Ti;contains an even and nonzero number of the remaining points. Let us
define the polynomial
qptqFptq1ptq mptq
Fp1q1p1q mp1q
Fp2q1p2q mp2q
…………
Fpmq1pmq mpmq; tPI: (98)
Note thatqptq0whentPtlum
l1. By assumption, tFuYtjum
j1form a T*-system on I. Therefore,
invoking the first part of Definition 9, we find that qis non-negative on TC
. We represent this polynomial
withq 
0F°m
j1p1qj 
jjand note that 
0|jpiq|m
i;j1. By assumption also tjum
j1form a
T-system on Iand therefore 
0¡0. This observation allows us to form the normalized polynomial
9q:q

0Fm¸
j1p1qj 
j

0j:Fm¸
j1p1qjb
jj:
Note also that the coefficients t 
jujPr0:mscorrespond to the minors in the second part of Definition 9.
Therefore, for each jP r0:ms, we have that | 
j|approaches zero at the same rate, as Ñ0. So for
sufficiently small 0, everyb
jis bounded in magnitude when ¤0; in particular, |b
j|p1q. This means
that for sufficiently small 0,t9q:¤0uis bounded. Therefore, we can find a subsequence tlul€r0;0s
such thatlÑ0and the subsequence t9qlulconverges to the polynomial
9q:Fm¸
j1bjj:
Note thatbj0for everyjPrms; in particular, |bj|p1q. Hence the polynomial°m
j1bjjis nontrivial,
namely does not uniformly vanish on I. (It would have sufficed to have some nonzero coefficient, say bj0,
rather than requiring all tbjujto be nonzero. However that would have made the statement of Definition 9
more cumbersome.) Lastly observe that 9qis non-negative on Iand vanishes on T(as well as on the boundary
ofI). This completes the proof of Proposition 19.
F Proof of Proposition 21
In this proof, we will use the following result for strictly diagonally dominant matrices from [43]:
Lemma 28. IfAis a strictly diagonally dominant matrix with positive entries on the main diagonal and
negative entries otherwise, then Ais invertible and A1has non-negative entries.
31

Proof of Proposition 21
Let^xbe a solution of (4) and hx^x. Then, with jptq ptsjqfor somej, by reverse triangle
inequality we have
¥
m¸
j1
ypsjq»1
0jptq^xpdtq 2

1{2
¥
m¸
j1
jptqhpdtqj2

1{2
¥
m¸
j1
jptqhpdtq2

1{2
}}2
¥
m¸
j1
jptqhpdtq2

1{2
;
and so

j1»1
0jptqhpdtq 2
¤42ùñ»1
0jptqhpdtq¤2;@jPrms: (100)
We apply the reverse triangle inequality again to find a lower bound of the left-hand side term in (100):
»1
0jptqhpdtq¥»
Ti;jptq^xpdtqaijptiq(101a)

li»
Tl;jptq^xpdtqaljptlq(101b)
»
TCjptq^xpdtq: (101c)
We now need to lower bound the term in (101a) and upper bound the terms in (101b), (101c). For the first
one, we obtain:
»
Ti;jptq^xpdtqaijptiq»
Ti;jptq^xpdtqaijptiqjptiq»
Ti;^xpdtqjptiq»
Ti;^xpdtq
¥jptiq»
Ti;^xpdtqai»
Ti;jptqjptiq^xpdtq
¥jptiq»
Ti;^xpdtqaiL»
Ti;|tti|^xpdtq
¥jptiq»
Ti;^xpdtqaiL»
Ti;^xpdtq: (102)
Therefore, from (102), we obtain:
»
Ti;jptq^xpdtqaijptiq¥jptiq»
Ti;^xpdtqaiL}^x}TV: (103)
32

For the term (101b), we have:
»
Tl;jptq^xpdtqaljptlq»
Tl;jptq^xpdtqaljptlqjptlq»
Tl;^xpdtqjptlq»
Tl;^xpdtq
¤jptlq»
Tl;^xpdtqal»
Tl;jptqjptlq^xpdtq
¤jptlq»
Tl;^xpdtqalL»
Tl;|ttl|^xpdtq
¤jptlq»
Tl;^xpdtqalL»
Tl;^xpdtq;
so
¸
li»
Tl;jptq^xpdtqaljptlq¤¸
li
jptlq»
Tl;^xpdtqal

L¸
li»
Tl;^xpdtq:(104)
Finally, for the term (101c), we have:
»
TCjptq^xpdtq¤max
tPTCjptq»
TC^xpdtq¤82}b}2
f
: (105)
Let us denote
zi»
Ti;^xpdtqai;
for alliPrks. Then, by combining (101) with the bounds (102),(104) and (105), we obtain the j-th row of
a linear system:
2¥jptiqziL»
Ti;^xpdtq¸
lijptlqzlL¸
li»
Tl;^xpdtq82}b}2
f: (106)
By using (106) along with
»
Ti;^xpdtq¸
li»
Tl;^xpdtq»
T^xpdtq¤»
I^xpdtq} ^x}TV;
we obtain:
2
18}b}2
f
L}^x}TV¥jptiqzi¸
lijptlqzl: (107)
Now, for all i, we select the jlpiq, the index corresponding to the closest sample as defined in Definition
1. The inequalities in (107) can be written as
Az¤v; (108)
whereA;zandvare defined as
A
|1pt1q| |1pt2q|:::|1ptkq|
|2pt1q| |2pt2q|:::|2ptkq|
…………
|kpt1q| |kpt2q|:::|kptkq|
; z
z1
z2

zk
; v
2
18}b}2
f
L}^x}TV 
1
1

1
:
BecauseAisstrictlydiagonallydominant, Lemma28holdsandtherefore A1existsandhasnon-negative
entries, so when we multiply (108) by A1, the sign does not change:
z¤A1v; (109)
33

where we can bound the entries of A1[42]:
}A1}8 1
minjpjptjq°
ijjptiqq:
The proof of Proposition 21 is now complete, since (109) is equivalent to our error bound (28).
G Proof of Proposition 22
For the sake of simplicity, let sibe the closest sample slpiqto the source ti, as defined in Definition 1. For a
fixedi, assume (without loss of generality, as we will see later) that si ti. It follows that
|sitl|sitl¥pilq;@l i;
|sitl|tlsi¥pliq;@l¡i;
and so
p|sitl|q¤ppilqq;@l i;
p|sitl|q¤ppliqq;@l¡i:
Then we have
¸
lip|sitl|qi1¸
l1p|sitl|qk¸
li1p|sitl|q
¤i1¸
l1ppilqqk¸
li1ppliqq
i1¸
l1plqki¸
l1plq: (110)
We now want to find upper bounds for each of the two sums in (110). We will derive the bound for the first
term, as the second one is similar. We have that
i1¸
l1plqpq1
i1¸
l2plq; (111)
and the sum in the previous equation is a lower Riemann sum (note that is decreasing in r0;1s)
Si1¸
l2px
lqpxlxl1q
ofpxqoverr;pi1qs, with partition and x
lchosen as follows:
rxl;xl1srl;pl1qs; l 2:::;i1;
x
ll; l 2;:::;i1:
Therefore, the sum Sis less than or equal to the integral:
i1¸
l2plq¤»pi1q
pxqdx: (112)
By substituting (112) into (111), we obtain
i1¸
l1plq¤pq1
»pi1q
pxqdx:
34

We can obtain a similar upper bound for the second sum in (110) and then
¸
lip|sitl|q¤pqpq1
»pi1q
pxqdx1
»pkiq
pxqdx;@iPrks:(113)
We can further upper bound the right hand side over all iPrksand this bound corresponds to the case when
the sourcetiis in the middle of the unit interval (at1
2) and the sources t1andtkare at 0and1respectively:
»pi1q
pxqdx»pkiq
pxqdx¤»1{2
pxqdx»1{2
pxqdx;@iPrks;
and therefore we have
¸
lip|sitl|q¤pqpq1
»1{2
pxqdx1
»1{2
pxqdx;@iPrks:
In order to find 0, we solve (27) since |siti|¤impliesp|siti|q¥pq.
We note that if we only have three sources, then the integral terms should not be included, and if we
have four sources, then the last integral term should not be included.
H Proof of Theorem 13
The proof of Theorem 13 involves the same ideas as Theorem 12. The differences are in the analysis of
Proposition 21. We continue this analysis from (101), where we lower bound the left-hand side term of
(100):
2¥»1
0jptqhpdtq¥»
~Ti;jptq^xpdtq¸
rPrkisairjptirq(114a)

li»
~Tl;jptq^xpdtq¸
rPrklsalrjptlrq(114b)
»
TCjptq^xpdtq; (114c)
where, in each term, the sum from r1tokiis over all the true sources in ~Ti;(for alliPr~ks). In order to
obtain bounds for the terms (114a) and (114b), we need the following fact:
DiPrargmin
rPrkisjptirq;argmax
rPrkisjptirqssuch that jpiqki¸
r1airkr¸
r1arkjptirq;@rPr~ks:(115)
This comes from the continuity of jand intermediate value theorem, since:
min
kjptirq¤°ki
r1airjptirq
°ki
r1air¤max
kjptirq:
35

We proceed as before to find a lower bound for (114a) and an upper bound for (114b), while the upper
bound for (114c) is the same. For (114a):
»
~Ti;jptq^xpdtqki¸
r1airjptirq»
~Ti;jptq^xpdtqjpiqki¸
r1air
»
~Ti;jptq^xpdtqjpiqki¸
r1airjpiq»
~Ti;^xpdtqjpiq»
~Ti;^xpdtq
¥jpiq»
~Ti;^xpdtqki¸
r1air»
~Ti;jptqjpiq^xpdtq
¥jpiq»
~Ti;^xpdtqki¸
r1airL»
~Ti;|ti|^xpdtq
¥jpiq»
~Ti;^xpdtqki¸
r1airLp2ki1q»
~Ti;^xpdtq;
where the width of ~Ti;is at most 2kiandiP~Ti;is chosen according to (115), so the distance |ti|for
tP~Ti;is at mostp2ki1q. For the second term (114b):
»
~Tl;jptq^xpdtqkl¸
r1alrjptlrq»
~Tl;jptq^xpdtqjplqkl¸
r1alr
»
~Tl;jptq^xpdtqjplqkl¸
r1alrjplq»
~Tl;^xpdtqjplq»
~Tl;^xpdtq
¤jplq»
~Tl;^xpdtqkl¸
r1alr»
~Tl;jptqjplq^xpdtq
¤jptlq»
~Tl;^xpdtqkl¸
r1alrL»
~Tl;|tl|^xpdtq
¤jptlq»
~Tl;^xpdtqkl¸
r1alrLp2kl1q»
~Tl;^xpdtq
so
¸
li»
~Tl;jptq^xpdtqkl¸
r1alrjptlrq¤¸
li
jplq»
~Tl;^xpdtqkl¸
r1alr

L¸
lip2kl1q»
~Tl;^xpdtq:
Let
~zi»
~Ti;^xpdtqki¸
r1air
and we obtain an inequality as before:
2
18}b}2
f
p2k1qL}^x}TV¥jpiq~zi¸
lijplq~zl;
36

where we obtained the second constant as follows:
Lp2ki1q»
~Ti;^xpdtqL¸
lip2kl1q»
~Tl;^xpdtqL~k¸
l1p2kl1q»
~Tl;^xpdtq¤L}^x}TV~k¸
l1p2kl1q¤p 2k1qL}^x}TV
and, for all iPr~ks, we selectjpiqargminj|sji|. The linear system is
~A~z¤~v;
with:
~A
|1p1q| |1p2q|:::|1p~kq|
|2p1q| |2p2q|:::|2p~kq|
…………
|~kp1q| |~kp2q|:::|~kp~kq|
;~z
z1
z2

z~k
;~v
2
18}b}2
f
p2k1qL}^x}TV 
1
1

1
:
In Appendix G we discuss what the choice of should be so that, if |tisi| ¤, the matrix Ais
strictly diagonally dominant. Here, the matrix ~Ais similar to Aexcept that we evaluate at|isi|, where
icorresponds to a group of sources in ~Ti;that are located within distances smaller than 2andsiis the
closest sample to i. Given that the minimum separation between isources is 2, the analysis in Appendix
G is the same, so ~Ais strictly diagonally dominant if
|isi|¤2;@iPr~ks
andis chosen to satisfy (27) where we take 2. For the value of found this way, we select the
sampling locations uniformly at intervals of 2.
I Proof of Lemma 26
For reals, let us first study the Lipschitz constant of the operator that takes a non-negative measure x
supported on Ito³
Igpstqxpdtq, wheregptqet2
2. To that end, consider a pair of non-negative measures
z1;z2supported on Isuch that}z1}TV}z2}TV. Their Wasserstein distance dWpz1;z2qwas defined in (20).
The dual of Program (20) is in fact (see, for example [44], Chapter 5)
max
!t»
I!ptqpz1z2qpdtq:!is1-Lipschitzu: (116)
where the supremum is over all (measurable) functions !:IÑR. Consider the particular choice of
!ptq gpstqfor positive to be set shortly. Let us see if this choice of !is feasible for Program (116).
Fort1;t2PI, we write that
!pt1q!pt2q
gpst1qgpst2q
 »t2
t1g1pstqdt
¤ max
tPr1;1s|g1ptq||t2t1|

c
2
e|t2t1|
|t2t1|; (117)
where we set ?
2e
2in the last line above, for small enough (specifically, for  ?
2). Therefore, with
this choice of , the function !specified above is feasible for Program (116). From this observation and the
strong duality between Programs (20) and (116), it follows that
?
2e

Igpstqpz1pdtqz2pdtqq¤dWpz1;z2q: (118)
37

Combined with the other direction, we find that
?
2e
2»
Igpstqpz1pdtqz2pdtqq¤dWpz1;z2q; (119)
namely that the map zѳ
Igpstqzpdtqisp2{?
2eq-Lipschitz with respect to the Wasserstein distance.
It is easy to extend the above conclusion to the generalised Wasserstein distance. Consider a pair of
non-negative measures x1;x2supported on I, and a pair of non-negative measures z1;z2onIsuch that
}z1}TV}z2}TV. Then, using the triangle inequality, we write that
»
Igpstqpx1pdtqx2pdtqq
¤»
Igpstqpx1pdtqz1pdtqq»
Igpstqpz1pdtqz2pdtqq»
Igpstqpz2pdtqx2pdtqq
¤»
Ix1pdtqz1pdtq»
Igpstqpz1pdtqz2pdtqq»
Iz2pdtqx2pdtq
gptq¤1
}x1z1}TV»
Igpstqpz1pdtqz2pdtqq}z2x2}TV
¤}x1z1}TV2
?
2edWpz1;z2q}z2x2}TV(see (119))
¤2
?
2e
}x1z1}TVdWpz1;z2q}z2x2}TV
for¤2?
2e
: (120)
The choice of z1;z2above was arbitrary and therefore, recalling the definition of dGWin (19), we find that
»
Igpstqpx1pdtqx2pdtqq¤2
?
2edGWpx1;x2q; (121)
namely the map xѳ
Igpstqxpdtqis a2
?
2e-Lipschitz operator. It immediately follows that the operator
that was formed using the sampling points Stsjum
j1in (3) is2?m
?
2e-Lipschitz. This completes the proof
of Lemma 26.
J Proof of Lemma 23
Following the definition of T*-systems in Definition 9, consider an increasing sequence tlum
l0€Isuch that
00,m1, and except one more point (say l), the rest of points belong to T, the-neighbourhood of
the support T€intpIq.
We also select the subset of samples Sof size 2k2that is closest to the support T(in Hausdorff
distance), so without loss of generality, we set m2k2. With this assumption, the setup in Definition
9 forces that every neighbourhood Ti;contain exactly two points, say tiandti;:tito simplify the
38

presentation. Then the determinant in part 1 of Definition 9 can be written as
MFp0qgps1q  gpsmq
Fpt1qgpt1s1q gpt1smq
F
t1;
g
t1;s1
g
t1;sm

F
l
g
ls1
g
lsm

Fptkqgptks1q gpsmq
F
tk;
g
tk;s1
g
tk;sm
Fp1qgp1s1q gp1smq
f0gps1qgpsmq
0gpt1s1q gpt1smq
0g
t1;s1
g
t1;sm

F
l
g
ls1
g
lsm

0gptks1q gpsmq
0g
tk;s1
g
tk;sm
f1gp1s1q gp1smq:pevaluatingFptqas in (44)q(122)
We will now need the following lemma in order to simplify the determinant above. The result is proved in
Appendix L.
Lemma 29. LetA;BPRmmwithm¥2anddetpAq¡0. If0¤¤8
34mpA1Bq, then
detpAq
117?e
8mpA1Bq
¤detpABq¤detpAq
117?e
8mpA1Bq
;
wherepXqis the spectral radius of the matrix X. In particular, for the stated choice of ,
detpAq
1?e
2
¤detpABq¤detpAq
1?e
2
:
AsÑ0, note that gpti;sjqgptisjqg1ptisjq2
2g2pq, for somePrtisj;tisjs.
After applying this expansion, we subtract the rows with gptisjqfrom the rows with gpti;sjq, takek
outside of the determinant and we can write Mas:
MkdetpMNMPq;
MNis a matrix with entries independent of and with determinant:
NdetpMNqf0gps1q  gpsmq
0gpt1s1q gpt1smq
0g1pt1s1q g1pt1smq

F
l
g
ls1
g
lsm

0gptks1q gpsmq
0g1ptks1q g1ptksmq
f1gp1s1q gp1smq: (123)
39

Moreover, while the entries of MPdepend on , the magnitude of each entry can be bounded from above
independently of . Consequently,kMPkFis bounded from above independently of . Let us assume for
the moment that N¡0and soMNis invertible. Since pM1
NMPq ¤

M1
N

2kMPkF, this implies that
pM1
NMPqis bounded from above independently of . We can then apply the stronger result of Lemma 29
toMand obtain:
0 p1CNqkN¤M¤p1CNqkN; (124)
whereCN¡0is a constant that does not depend on . Note that we do not need write the condition on
required by Lemma 29 explicitly because Ñ0and also that (124) applies to the minors of MandN
along the row containing l. ThatNis indeed positive (and therefore we can apply Lemma 29) is established
below By its definition in (44), Fplqcan take two values above. Either
Fplq0, which happens when there exists i0Prkssuch thatlPTi0;, namely when lis close to the
supportT. In this case, by applying the Laplace expansion to N, we find that
Nf0N1;1f1Nm1;1; (125)
whereN1;1areNm1;1are the corresponding minors in (123). Note that both N1;1andNm1;1are
positive because the Gaussian window is extended totally positive , see Example 5 in [27]. Recalling
thatf0;f1¡0, we conclude that Nis positive. Therefore, when is sufficiently small, (123) implies
thatMis non-negative when Fplq0. Or
Fplqf, which happens when lPTC
, namely when lis away from the support T. Suppose that
f0"fso thatNis dominated by its first minor, namely N1;1. More precisely, by applying the Laplace
expansion to Nin (123), we find that
Nf0N1;1fNl;1f1Nm1;1; (126)
in which all three minors are positive because the Gaussian window is extended totally positive. Also,
note thatNl;1doesnotdepend onland recall also that f0;f;f1are all positive. Therefore Nin (126)
is positive if
f0
f¡Nl;1
minlN1;1; (127)
where the minimum is over lPTC
. The right-hand side above is well-defined because N1;1N1;1plq
is positive for every lPI,N1;1plqis a continuous function of l, andIis compact. Indeed, N1;1plqis
positive because the Gaussian window is extended totally positive. As before, Nbeing positive implies
thatMis non-negative when is sufficiently small, see (123).
By combining both cases above, we conclude that Mis non-negative for sufficiently small provided that
(127) holds, thereby verifying part 1 of Definition 9. To verify part 2 of that definition, consider the minors
along the row containing linM, see (123). Starting with the first minor along this row and applying the
same arguments as before for M, we observe, after applying Lemma 29, that
M
l;1¥p1Cl;1qkgps1q  gpsmq
gpt1s1q gpt1smq
g1pt1s1q g1pt1smq
……
gptks1q gpt1smq
g1ptks1q g1ptksmq
gp1s1q gp1smq:p1Cl;1qkNl;1; (128)
and alsoM
l;1¤ p1Cl;1qkNl;1asÑ0. HereCl;1¡0is a constant that does not depend on
. Moreover, Nl;1does not depend on and is positive because the Gaussian window is extended totally
positive. Therefore M
l;1in (128) approaches zero at the rate k.
40

Consider next the pj1qth minor along the row containing lofMin (123), namely M
l;j1with
j1;:::;m. Using the same arguments as before, we obtain after applying Lemma 29 that
M
l;j1¥p1Cl;j1qkf0gps1q  gpsj1qgpsj1q  gpsmq
0gpt1s1q gpt1sj1qgpt1sj1q gpt1smq
0g1pt1s1q g1pt1sj1qg1pt1sj1q g1pt1smq

0gptks1q gptksj1qgptksj1q gptksmq
0g1ptks1q g1ptksj1qg1ptksj1q g1ptksmq
f1gp1s1q gp1sj1qgp1sj1q gp1smq
p1Cl;j1qkf0gpt1s1q g
t1sj1
g
t1sj1
gpt1smq
g1pt1s1qg1
t1sj1
g1
t1sj1
g1pt1smq

gptks1q g
tksj1
g
tksj1
gptksmq
g1ptks1q g1
tksj1
g1
tksj1
g1ptksmq
gp1s1q g
1sj1
g
1sj1
gp1smq
p1Cl;j1qkf1gps1q  g
sj1
g
sj1
gpsmq
gpt1s1q g
t1sj1
g
t1sj1
gpt1smq
g1pt1s1q g1
t1sj1
g1
t1sj1
g1pt1smq

gptks1q g
tksj1
g
tksj1
gptksmq
g1ptks1q g1
tksj1
g1
tksj1
g1ptksmq
:p1Cl;j1qk
f0Nl;j1;0f1Nl;j1;1
:p1Cl;j1qkNl;j1; (129)
and alsoM
l;j1¤p1Cl;j1qkNl;j1asÑ0, providedNl;j1¡0. Here,Nl;j1is the determinant
on the first line of (129) and Cl;j1¡0is a constant independent of . Note that Nl;j1;0andNl;j1;1are
both positive because the Gaussian window is extended totally positive. To ensure Nl;j1¡0, we require
f0"f1, or more precisely, the following to hold.
f0
f1¡Nl;j1;1
Nl;j1;0: (130)
It then follows that M
l;j1approaches zero at the rate kfor everyj, thereby verifying part 2 in Definition
9 forlPintpIq. In conclusion, we find that tFuYtjum
j1form a T*-system on Iwithjptqgptsjq
eptsjq2
2as in (9), provided that (127) and (130) hold.
To establish that tFuYtjum
j1form a T*-system on Iwith the Gaussian window, we note that Fplq
is replaced by by Fplq, which takes values 1whenlPTi;for somei, as indicated in (45). The previous
argument, thus, goes through similarly, showing that tFuYtjum
j1is a T*-system on Ifor arbitrary sign
patternwhen (127) and (130) hold and f0"1. The extra condition f0"1comes from the only difference
between the two proofs, namely that, instead of (125), we have
Nf0N1;11Nl;1f1Nm1;1: (131)
Therefore to ensure N¡0, we now require the following to hold.
f0¡Nl;1
minlN1;1: (132)
We leave out the mostly repetitive details. Hence if (127), (130) and (132) hold, then tFuYtjum
j1form
a T*-system on Iwith the Gaussian window. This completes the proof of Lemma 23.
Remark 1. While we do not require that f1"f, if we impose that both f0"fandf1"f, then (126)
holds for smallerf0f, so it is useful in practice. Similarly, from (131) we want that also f1"1.
41

Remark 2. From this proof and in light of the first remark, we see that we only need to specify one end
point rather than both, so we only need m2k1. However, the dual polynomial is better in practice if
we have conditions at both end points (if we specify both f0andf1).
K Proof of Lemma 24
Recall our assumptions that
m2k2; s 10; sms2k21; (133)
and that
|s2iti|¤; s 2i1s2i;@iPrks; (134)
That is, we collect two samples near each impulse in x, supported on T. In addition, we make the following
assumptions on and:
¤?
2;¡d
log
34
2
; ¤2: (135)
After studying Appendix E, it becomes clear that the entries of bPRmare specified as
bjlim
Ñ0p1qj1M
l;j1
M
l;1p 1qj1lim
Ñ0M
l;j1
M
l;1; jPrms; (136)
where the numerator and the denominator are the minors tM
l;jum1
j1ofMin (122) along the row containing
l. Using the upper and lower bounds on these quantities derived earlier, we obtain:
p1Cl;j1qNl;j1
p1Cl;1qNl;1¤M
l;j1
M
l;1¤p1Cl;j1qNl;j1
p1Cl;1qNl;1; jPrms; (137)
which in turn implies the following expression for bj:
bjp 1qj1Nl;j1
Nl;1; jPrms: (138)
Recall that Nl;1;Nl;j1¡0and so,|bj|Nl;j1
Nl;1forjPrms. Therefore, in order to upper bound each |bj|,
we will respectively lower bound the denominator Nl;1and upper bound the numerators Nl;j1.
K.1 Bound on the Denominator of (138)
We now find a lower bound for the first minor, namely Nl;1in (138). Let us conveniently assume that the
spike locations Tttiuk
i1and the sampling points Stsjum1
j2are away from the boundary of interval
Ir0;1s, namely
a
logp1{3q¤ti¤1a
logp1{3q;@iPrks;
a
logp1{3q¤sj¤1a
logp1{3q;@jPr2:m1s: (139)
In particular, (139) implies that
gptiq¤3; gp1tiq¤3; iPrks;
gpsjq¤3; gp1sjq¤3; jPr2:m1s: (140)
For the derivatives, we have that:
|g1ptiq|2ti
2gptiq¤23
2;
42

where we used the fact that 0¤ti¤1and (140). Similarly, for the 1ti,sjand1sj:
|g1ptiq|¤23
2;|g1p1tiq|¤23
2; iPrks;
|g1psjq|¤23
2;|g1p1sjq|¤23
2; jPr2:m1s: (141)
With the assumptions in (133), (134), (140), (141) and the fact that gp0q1, we have that the determinant
Nl;1in (128) is equal to
Nl;11Op3qOp3q  Op3q
…………
Op3q gptis2uqgptis2u1q Op3q
Op3{2q g1ptis2uqg1ptis2u1q Op3{2q
…………
Op3q  Op3qOp3q  1; (142)
where we wrote
gptiqOp3qandg1ptiqOp3
2q ðñ |gptiq|¤M13and|g1ptiq|¤M23
2; (143)
fortiwithin the bounds defined in (139) and some M1;M2¡0. Here, we can take M1M22and we
use the same notation for 1ti,sjand1sjwith the same constants M1M22. We then take the
Taylor expansion of gptis2u1qandg1ptis2u1qaroundtis2u, subtract the columns with tis2ufrom
the columns where we performed the expansion, take koutside of the determinant and we obtain:
Nl;1k|CC1|; (144)
where
C
1 0 0  0
…………
0gptis2uq g1ptis2uq  0
0g1ptis2uq g2ptis2uq  0
…………
0 0 0  1
(145)
C1
0Op2qOp2q  Op2q
…………
Op2q  01
2g2pi;uq Op2q
Op2{2q  01
2g3p1
i;uq Op2{2q
…………
Op2q Op2qOp2q  0
; (146)
for somei;u;1
i;uPrtis2u;tis2usfor alli;u1;:::;k. Note that, using the notation in (143), if
we subtract a function that is Opqwith constant M1¡0from another function that is Opqwith constant
M2¡0, we obtain a function that is Opqwith constant M1M2, which is why we wrote Op2qon the
first and last columns where we subtracted two functions Opqwith the same constant M1(soOp2qimplies
¤2M1). Next, we apply Taylor expansion around tituin the terms with tis2uinCas follows:
gptis2uqgptitutus2uqgptituqptus2uqg1p
i;uq; (147)
43

for some
i;uPrtitu|tus2u|;titu|tus2u|sfor alli;u1;:::;k. Note that|tus2u|¤according
to (134). By applying a similar Taylor expansion to g1andg2, we can write
CAA1; (148)
where
A
1 0 0  0
…………
0gptituq g1ptituq  0
0g1ptituq g2ptituq  0
…………
0 0 0  1
(149)
and
A1
0 0 0  0
…………
0tus2u
g1p
i;uq tus2u
g2p1
i;uq  0
0tus2u
g2p1
i;uq tus2u
g3p2
i;uq  0
…………
0 0 0  0
(150)
for some
i;u;1
i;u;2
i;uPrtitu|tus2u|;titu|tus2u|sfor alli;u1;:::;k. We now substitute
(148) into (144) and we obtain:
Nl;1k|ApA1C1q|; (151)
Assuming for the moment that |A|¡0holds, we obtain via Lemma 29 the following bound:
k
1?e
2
detpAq¤Nl;1¤k
1?e
2
detpAq (152)
if
¤8
34p2k2qpA1pA1C1qq: (153)
We look closer at the condition (153) on in Section K.3. That |A|¡0(and therefore our application of
Lemma 29 above is valid) is established below. We can in fact write Amore compactly as follows. For scalar
t, let
Hptq:
gptq g1ptq
g1ptq g2ptq
PR22; (154)
which allows us to rewrite Aas
A
1 0 12k 0
02k1B 02k1
0 0 12k 1
PRmm; (155)
B:
Hp0qHpt1t2qHpt1t3q Hpt1tkq
Hpt2t1qHp0qHpt2t3q Hpt2tkq

Hptkt1qHptkt2qHptkt3q  Hp0q
PR2k2k; (156)
where 0abis the matrix of zeros of size ab. It follows from (155) that |A||B|by Laplace expansion of
|A|. In particular, the eigenvalues of Aare:1;1and the eigenvalues of B. Let us note that Bis a symmetric
matrix5, sinceHptqHptqT; hence,Ais also symmetric. We now proceed to lower bound |B|, the details
of which are given in Appendix M. The main observation is that Hp0qis a diagonal matrix while the entries
ofHptitjqforijdecay with the separation of sources .
5Indeed,g;g2are even functions and g1is an odd function.
44

Lemma 30. Let¤?
2,¡b
log
34
2
and
0 Fmin
;1

1
12
2
2e2
2
1e2
2 1:
Then, for each i1;:::; 2k, it holds that
ipBq¥Fmin
;1

:
Since|A||B|, we obtain via Lemma 30 that |A|¥Fmin
;1
2k¡0. Using this in (152), leads to the
following bound:
Nl;1¥k
1?e
2
Fmin
;1

2k
: (157)
K.2 Bound on the Numerator of (138)
Since (124) holds for all the minors M
l;jandNl;j, let us now upper bound the Nl;jforj2;:::;m1in
the numerator of (138). Note that we distinguish two cases: jPt3;:::;muandjPt2;m1u. To simplify
the presentation for the first case, suppose, for example, that j3. Using the assumptions in (133), (134),
(140), (141) and the fact that gp0q1,
Nl;3 f0 1Op3q  Op3qOp3q  Op3q

0Op3qgptis3q gptis2uqgptis2u1q Op3q
0Op3{2qg1ptis3q g1ptis2uqg1ptis2u1q Op3{2q

f1Op3qOp3q  Op3qOp3q  1 :(158)
We now expand gptis2u1qaroundgptis2uq, subtract the columns and take out of the determinant as
before:
Nl;3k1 f0 1Op3q  Op3qOp22q  Op3q

0Op3qgptis3q gptis2uq g1pi;uq Op3q
0Op3{2qg1ptis3q g1ptis2uq g2p1
i;uq Op3{2q

f1Op3qOp3q  Op3qOp22q  1 :k1detp~N3q;
(159)
wherei;u;1
i;uPrtis2u;tis2usand denote the matrix in (159) by ~N3. Note the k1since we only
perform column operations on k1columns and detp~N3q¡0(due to the choice of f0andf1). Next let
CPRm3consist of the first, second, and last columns of ~N3andDPRmpm3qconsist of the rest of the
columns of ~N3. Then, we may write that
detprCDsq2
C
D
rCDs
det
CC CD
DC DD
¤detpCCqdetpDDq; (160)
where we note that swapping columns only changes the sign in a determinant (here, detprCDsq detp~N3q)
and in the last inequality we applied Fischer’s inequality (see, for example, Theorem 7.8.3 in [45]), which
works because the matrix rCDsrCDsis Hermitian positive definite. Therefore, we have that
detp~N3q| detprCDsq|¤ detpCCq1
2detpDDq1
2; (161)
and it suffices to bound the determinants on the right-hand side above.
45

K.2.1 Bounding detpCCq
We now write Cas follows:
C
f01 0
………
0 0 0
0 0 0
………
f10 1

0 0 Op3q
………
0Op3qOp3q
0Op3{2qOp3{2q
………
0Op3q 0
:XrZ; (162)
where we denote the first matrix by Xand the second matrix by rZ. We have that
detpCCqdetppXrZqpXrZqq detpXXrZ1q; (163)
withrZ1XrZrZXrZrZand we apply Weyl’s inequality to CCto obtain:
detpCCq¤p1pXXqmaxprZ1qqp2pXXqmaxprZ1qqp3pXXqmaxprZ1qq:(164)
Then
XX
f2
0f2
1f0f1
f0 1 0
f1 0 1
with3pXXq0; (165)
and
}XX}F¤Cpf0;f1q;where Cpf0;f1qf2
0f2
12f02f12; (166)
so
}X}2
2}XX}2¤}XX}F¤Cpf0;f1qso}X}2b
Cpf0;f1q (167)
and
}rZ1}2}XrZrZXrZrZ}2
¤2}X}2}rZ}2}rZ}2
2
2b
Cpf0;f1q}rZ}2
}rZ}2
¤3b
Cpf0;f1q}rZ}2; (168)
where the last inequality holds if
}rZ}2¤b
Cpf0;f1q: (169)
Noting that rZ1is symmetric and maxprZ1q¤}rZ1}2, we substitute (165) and (168) into (164) and obtain:
detpCCq¤
Cpf0;f1q3b
Cpf0;f1q}rZ}2
2
3b
Cpf0;f1q}rZ}2; (170)
if (169) holds. Further applying (169) in the parentheses, we obtain:
detpCCq¤48Cpf0;f1q5
2}rZ}2: (171)
Now, using (143) with M1M22, we are able to upper bound }rZ}F, which is also an upper bound for
}rZ}2:
}rZ}2¤}rZ}F¤d
p2k2qp23q22k23
2
2
¤p2k2q232k23
23
4k44k
2
: (172)
46

Therefore, to satisfy (169), it is sufficient to find such that:
3
4k44k
2
¤b
Cpf0;f1q: (173)
With this choice of , by substituting (172) into (171), we obtain:
detpCCq¤48Cpf0;f1q5
2
4k44k
2
3: (174)
Note that all our calculations so far will be used for bounding }b}2. In case of}b}2, everything is the
same except that we have:
X
X
f2
0f2
12k f 0f1
f0 1 0
f1 0 1
; (175)
which does not have a zero eigenvalue, so we will not obtain the 3factor. We omit the calculations, but we
obtain:
detpC
Cq¤64Cpf0;f1q2k3(176)
with a condition on that is weaker than (173).
K.2.2 Bounding detpDDq
Then, since DDis Hermitian positive definite, we can apply Hadamard’s inequality (Theorem 7.8.1 in [45])
to bound its determinant by the product of its main diagonal entries (i.e. the squared 2-norms of the columns
ofD), so we obtain, after we use (143) with M1M22:
detpDDq¤
86k¸
i1gptis3q2k¸
i1g1ptis3q2

k¹
u2
86k¸
i1gptis2uq2k¸
i1g1ptis2uq2

k¹
u2
324k¸
i1g1pi;uq2k¸
i1g2p1
i;uq2

; (177)
wherei;u;1
i;uPrtis2u;tistus.
For fixeduPt1;:::;ku, we may write that

i1gptis2uq2¤28¸
i0ei22
22
1e2
2: (178)
To see the inequality, if we note that s2i¤ti¤s2i1, we have that
gptis2iq¤gp0q; gptis2pi1qq¤gp0q;
gptis2pi1qq¤gpq; gptis2pi2qq¤gpq;
……
and by adding the inequalities we obtain (178). Likewise, it holds that

i1g1ptis2uq2¤4
42
1e2
2;
47


i1g2ptis2uq2¤2
24
4
2
2
1e2
2: (179)
and, fori;u;1
u;uPrtis2u;tis2us:

i1g1pi;uq2¤4
43
1e2
2;

i1g2p1
i;uq2¤2
24
4
2
3
1e2
2: (180)
Note that we obtained (180) in the same way as (178), except that we obtain a factor of 3instead of 2:
gpi;iq¤gp0q; gpi;i1q¤gp0qgpi;i1q¤gp0q
gpi;i2q¤gpqgpi;i2q¤gpq
gpi;i3q¤gp2qgpi;i3q¤gp2q
……
Lastly, the above bounds also hold if we have gptis2u1qinstead ofgptis2uq. Substituting these bounds
back into (177) and using (143) with M1M22, we obtain:
detpDDq¤
86
14
4
2
1e2
2
k¹
u2
86
14
4
2
1e2
2
k¹
u2
324
4
42
24
4
2
3
1e2
2

¤
8
14
4
2
1e2
2k
32
4
42
24
4
2
3
1e2
2
k1
F1
;1

k
F2
;1

k1
; (181)
where
F1
;1

8
14
4
2
1e2
2;
F2
;1

321
42
62
8
24
1e2
2: (182)
Note that the bound (181) on detpDDqis the same for all j3;:::;m. Combining (174) with (181) in
(159), we obtain:
Nl;j¤k1
2Cpf0;f1q1
2
4k44k
2
1
2
F1
;1

k
2
F2
;1

k1
2
(183)
forj3;:::;mif (173) holds.
48

Finally, we need to upper bound Nl;jforj2andjm1. For simplicity, consider j2. Applying
the same assumptions and operations as in (159), we have
Nl;2k f0Op3q  Op3qOp22q  Op3q

0gptis2q gptis2uq g1pi;uq Op3q
0g1ptis2q g1ptis2uq g2p1
i;uq Op3{2q

f1Op3q  Op3qOp22q  1 ; (184)
wherei;u;1
i;uPrtis2u;tis2us. We bound Nl;2by using Hadamard’s inequality (the more general
version, see [46]) and, after we use (143) with M1M22, we obtain:
Nl;2¤kb
f2
0f2
1
14pk1q64k6
41
2
k¹
u1
86k¸
i1gptis2uq2k¸
i1g1ptis2uq2
1
2
k¹
u1
324k¸
i1g1pi;uq2k¸
i1g2p1
i;uq2
1
2
; (185)
and, by applying the bounds on the sums and ¤1, we obtain:
Nl;2¤kb
f2
0f2
1
4k54k
4
1
2
F1
;1

k
2
F2
;1

k
2
; (186)
forF1andF2defined as in (182), and note that the same bound also holds for Nl;m1.
To conclude, from (183) and (186), we can derive a general bound valid for all j:
Nl;j¤kCpf0;f1q1
2
4k54k
4
1
2
Fmax
;1

k
(187)
for allj2;:::;m1if (173) holds, where
Fmax
;1


8
14
4
2
1e2
2 1
2
321
42
62
8
24
1e2
2 1
2
:(188)
K.3 Condition on 
We now return to the condition (153) that must satisfy so that our application of Lemma 29 is valid. Since
A is positive definite, we have

A1

2¤1{minpAq. Using this, we obtain:
pA1pA1C1qq¤}A1pA1C1q}2¤}A1}2}A1C1}2¤1
minpAq}A1C1}F;(189)
and
A1C1
0Op2q Op2q  Op2q
…………
Op2q tus2u
g1p
i;uq tus2u
g2p1
i;uq1
2g2pi;uq Op2q
Op2{2q tus2u
g2p1
i;uq tus2u
g3p2
i;uq1
2g3p1
i;uq Op2{2q
…………
Op2q  Op2q Op2q  0
;(190)
49

where
i;u;1
i;uPrtis2u;tis2us;

i;u;1
i;u;2
i;uPrtitu|tus2u|;titu|tus2u|s: (191)
for alli;u1;:::;k.
Because the eigenvalues of Aare1;1and the eigenvalues of BandminpBq ¥Fminp;1
qwith 0 
Fminp;1
q   1, then we have that that minpAq ¥Fmin
;1

. Moreover, after applying (143) with
M1M22, we have that:
}A1C1}2
F¤8pk1q48k4
4(from the first and last columns)
8k432k2(from the first and last rows)

i;u1g1p
i;uq2k¸
i;u1g2p1
i;uq2

i;u1
tus2u
g2p1
i;uq1
2g2pi;uq
2

i;u1
tus2u
g3p2
i;uq1
2g3p1
i;uq
2
:(192)
We upper bound this using the inequalities in (180) and, similarly, for g3:

i1g3pi;uq2¤12
48
6
2
3
1e2
2; (193)
and note that these inequalities hold for all numbers in (191). By expanding the parentheses and applying
Cauchy-Schwartz to the products, and using that |tus2u|¤, we obtain:
}A1C1}2
F¤8p2k1q432k28k4
4
k4
43
1e2
2k2
24
4
2
3
1e2
2
9k
42
24
4
2
3
1e2
2
9k
412
48
6
2
3
1e2
2(194)
Using the assumption that ¤2to write4
4¤4, and then by applying ¤1and¤?
2, we can write
}A1C1}2
F¤80k8kP1

3
1e2
2; (195)
whereP1

is a polynomial in1
defined as follows:
P1

4
413
42
24
4
2
9
412
48
6
2
: (196)
Inserting the above observations in the condition (153), we finally obtain
¤8Fminp;1
q
34p2k2q
80k8kP1
3
1e2
2 1
2: (197)
With this choice of , the condition (153) is satisfied.
50

K.4 Bound for (136)
Combining the results from K.1 and K.2, we arrive at
|bj|Nl;j1
Nl;1(see (138))
¤Cpf0;f1q5
4
4k54k
4 1
2
1?e
2
Fmax
;1

Fmin
;1
2
k
;(see (187) and (157)) (198)
and|b
j|¤Cpf0;f1q2k3
2

1?e
2 
Fmax
;1

Fmin
;1
2
k
;(see (176)) (199)
for everyjPrms, provided that the conditions (173) and (197) hold. Consequently,
}b}2gffem¸
j1b2
j¤c
p2k2q
4k54k
4
1?e
2Cpf0;f1q5
4F
;1

k
; (200)
}b}2¤?
2k2

1?e
2 Cpf0;f1q2k3
2F
;1

k
; (201)
where we write F
;1

Fmaxp;1
q
Fminp;1
q2if,¤?
2and¡b
log
34
2
and the conditions (173) and
(197) hold. This completes the proof of Lemma 24 since m2k2by assumption.
L Proof of Lemma 29
To begin with, let us note that
detpABqdetpAqdetpIA1Bq
We will now upper and lower bound the second term in this equality. Denoting CA1B, consider
log detpICq¸
iPRlogp1ipCqq¸
iPIlogp12RepipCqq2|ipCq|2q;
where
RtiPrms:ImpipCqq 0uand
ItiPrms:i1pCq;i2pCqare complex conjugate and Impi1pCqq 0u:
1. ForiPR, if 1
|ipCq|, use apply Taylor expansion for logp1xqand obtain:
logp1ipCqqipCq22
ipCq
22
i;;wherei;P
1|ipCq|;1|ipCq|
:(202)
2. ForiPC, we apply the same Taylor expansion and, writing y2RepipCqq2|ipCq|2, for|y| 1
we obtain:
logp1yqy
i;;wherei;P
1|y|;1|y|
:
Then, we have that
2|ipCq|2|ipCq|2¤4|ipCq| 1;
51

where the first inequality is true if ¤2
|ipCq|and the second inequality is true if  1
4|ipCq|. From
the condition on i;and noting that |RepipCqq|¤|ipCq|, we have that
i;¤1|y|¤14|ipCq|andi;¥1|y|¥14|ipCq|¥1
2;(203)
where the last inequality holds if ¤1
8|ipCq|. Therefore,1
i;¤2if¤1
8|ipCq|.
We now use (202), (203) and ¤1
8|ipCq|. Let¤1
8|ipCq|for all eigenvalues ipCq, then:
log detpICq¸
iPRipCq2

iPR2
ipCq
2
i;¸
iPI1
i;
2RepipCqq2|ipCq|2
;(204)
and, by using that i;¥1|ipCq|¥7
8foriPRand1
i;¤2foriPCand the fact that the index iPI
accounts for two eigenvalues, we obtain:
|log detpICq|¤¸
iPR|ipCq|322
49¸
iPR|ipCq|24¸
iPI|ipCq|22¸
iPI|ipCq|2
¤2m¸
i1|ipCq|2m¸
i1|ipCq|2
¤mpCqp2pCqq
¤mpCqp21
8q17
8mpCq
¤1
2(205)
where the second last inequality holds if ¤1
8pCqand the last inequality holds if ¤8
34mpCq, which is also
the dominating condition for in the proof if m¥2.
Now, note that for |x| ¤1
2, we have that ex1xefor someP r|x|;|x|s „ r1
2;1
2sby Taylor
expansion, and by taking xlog detpICq, we obtain:
detpICqex¤1|x|e1
2¤117?e
8mpCq¤1?e
2;
and similarly
detpICqex¥1|x|e1
2¥117?e
8mpCq¥1?e
2:
From the last two inequalities, by multiplying by detpAq, we obtain the result of our lemma.
M Proof of Lemma 30
Recalling the definition of Bin (156), we apply Gershgorin disc theorem to find the discs Dpaii;°
ijaijq
which contain the eigenvalues of B. Due to the structure of H, we consider two cases:
1. On odd rows, the centre of the disc is aoddiigp0q1and the radius is
Roddik¸
j1
ji|gptitjq||g1ptitjq| (206a)
k¸
j1
jigptitjq
12|titj|
2
;fori1;:::;k: (206b)
52

2. On even rows, the centre of the disc is aeveniig2p0q2
2and the radius is
Revenik¸
j1
ji|g1ptitjq||g2ptitjq| (207a)
k¸
j1
jigptitjq
2|titj|
22
24ptitjq2
4

;fori1;:::;k: (207b)
Since the eigenvalues of Bare real, they are lower bounded by
min
i1;:::;k1Roddior min
i1;:::;k2
2Reveni: (208)
Because|titj|¤1, we have that
1Roddi¥1
12
2

j1
jigptitjq (209)
Since4ptitjq2
42
2¡42
42
2¡0due to our assumptions on (see (135)), we obtain
2
2Reveni¥2
24
4k¸
j1
jigptitjq; (210)
for alli1;:::;k. Using the fact that gis decreasing and |titj|¥|ij|, we obtain

j1
jigptitjq¤k¸
j1
jigp|ij|q (211a)
¤28¸
j1gpjq28¸
j1ej22
2¤28¸
j1
e2
2
j
(211b)
2e2
2
1e2
2;fori1;:::;k: (211c)
The sum of of the series is valid because e2
2 1. Combining this with (209) and (210), we obtain:
1Roddi¥1
12
2
2e2
2
1e2
2; (212a)
2
2Reveni¥2
24
42e2
2
1e2
2; (212b)
for alli1;:::;k. By using the assumption that ¡b
log
34
2
, we can check that the lower bound
in (212a) is greater than zero, and by also using that ¤?
2, we can check that it is smaller than than the
lower bound in (212b).
To conclude, since all the eigenvalues of the matrix Bare real and in the union of the discs Dp1;Roddiq
andDp2
2;Reveniqfori1;:::;k, then, by using the above observation and the lower bound in (212a), we
obtain a lower bound of all the eigenvalues of B:
jpBq¥1
12
2
2e2
2
1e2
2;@j1;:::; 2k: (213)
Note that we may be able to obtain better bounds given by (212b) for keigenvalues if we scale the Gershgorin
discs so that Dp1;RoddiqandDp1;Reveniqbecome disjoint.
53

N Proof of Lemma 25
Because ¡b
log5
2, we have that
e2
2 2
5; (214)
which implies that
1
1e2
2 5
52: (215)
Then,2e2
2
1e2
2¡22
52, so
1p12
2q2e2
2
1e2
2¡132
52;i.e.Fmin
;1

¡132
52: (216)
Similarly, we have that:
8
14
4
2
1e2
2 8650440
4p52q 98
4p52q(217)
and
321
42
62
8
24
1e2
2 3210160812042402240
8p52q 792
8p52q;(218)
where in the last two inequalities we used  1. Combining (216),(217) and (218), we obtain (56):
Fmax
;1

Fmin
;1
2 c1
6p132q2: (219)
Then, using that f 1and thatC5
4
f¤pC2kq3
2
f;pC2kq3
2¤pC2kq3
2
f;we have that:

p62
fqc
4k54k
4C5
46
pC2kq3
2 ?
2k2
1?e
2 c2pC2kq3
2
f?
2k2?
4k4544k2
2
 c2pC2kq3
2
f?
2k2?
8k51
2p 1q
 c2pC2kq3
2
fk
2; (220)
for a large enough constant c2and, from (219) and (220), we obtain (57). Similarly we show that (58) holds:
c
p2k2q
4k54k
4
1?e
2Cpf0;f1q5
4
f c5a
pk1qp4k4544kq
2C5
4
f
 c5a
pk1qp8k5q
2C5
4
fp 1q
 c5k
2C5
4
f: (221)
54

To show that (49) is satisfied if (13) holds, from (215) and (216), we have that:
8Fminp;1
q
34p2k2q
80k8kP1
3
1e2
2 1
2¡8p132q
34p52qp2k2q
80k815kPp1
q
52
1
2
8p132q
34p2k2q
p80k8qp52q215p52qkP1
 1
2
¡8p132q
34p2k2q
2000k20075kP1
 1
2p52 5q
¡8p132q6
34p2k2qp2000k1220012c3kq1
2
P1

 c3
12
¡8p132q6
34p2k2qpc3k200q1
2p 1q
¡c36p132q
pk1q3
2; (222)
and similarly:
C1
6

4k44k
2 1
3C1
62
3
p4k2424kq1
3¡C1
62
3
p8k4q1
3¡c4C1
62
3
pk1q1
3; (223)
for some constants c3;c4. Finally, from (214) and (215), we also obtain:
e2
2e22
2
1e2
2 
24
5
1
52 16
210; (224)
where, in the last inequality, we used  1?
3. Furthermore, if  2
5, then 12¡1
5and
e2p12q
2 
2
5 12
 1
1512 1
151{5; (225)
so we can combine the last two inequalities to obtain:
1e2
2e22
2
1e2
2e2p12q
2¡116
2101
151{5¡1
3: (226)
Finally, to bound e22
2, note from the definition of andthat 
2, and from the assumption that
¤2, we obtain that:
e22
2 e2
42¤e2{4 c5; (227)
where we used that  1. Combining the last two inequalities, we obtain that (61) holds for some constant
c5¡0.
O Proof of Lemma 27
In the proof of Lemma 23 in Appendix J we require that f0"f,f0"f1andf0"1. These conditions come
from equations (126), (130) and (132) respectively. We can, therefore, fix fandf1such that f 1and
55

1 f1 f0, and give an expression of f0as a function of fasÑ0. From equation (126), we have that
f0
f¡Nl;1
minlPTCN1;1plq: (228)
which is required by the condition that detpMNqin (123) is positive when lPTC
. WhileNl;1does not
depend on , we can argue that, for Ñ0, the minimum in the denominator is lower bounded by N1;1plq,
where
lPTtt1;t1;:::;tk;tku: (229)
Therefore, a sufficient condition to ensure (228) is:
f0
f¡Nl;1
minlPTN1;1plq: (230)
Note that minlPTN1;1plqÑ0asÑ0, since two rows of the determinant become equal. More explicitly,
let us assume, for simplicity, that the minimum over the finite set of points Tis attained t1. Then, we
subtract the row with lfrom the first row of N1;1plq, and then expand the determinant along this row. By
takinglt1, we obtain
N1;1pt1qm¸
j1p1qj1N1;1;j
gpt1sjqgpt1sjq
;
where the minors N1;1;jare fixed (i.e. independent of ). Therefore, as Ñ0, we have that N1;1pt1qÑ0,
withN1;1pt1q¡0;@. Then, everything else in N1;1pt1qbeing fixed, there exists 0¡0such that6
min
lPTCN1;1plqN1;1pt1q;@ 0: (231)
We will now find the exact rate at which N1;1pt1qÑ0for 0. In the row with linN1;1pt1q, we
Taylor expand the entries in the columns j1;:::;mas follows:
gplsjqgpt1sjqgpt1sjqg1pt1sjq2
2g2pjq; (232)
for somejPrt1sj;t1sjs, and note that jÑt1sjasÑ0. Then
N1;1pt1qgpt1s1q  gpt1smq
g1pt1s1q  g1pt1smq
……
g1pt1s1q
2g2p1q g1pt1smq
2g2pmq
……
gptks1q  gpsmq
g1ptks1q  g1ptksmq
gp1s1q  gp1smq(subtract 1st row from lrow)
2
2gpt1s1q gpt1smq
g1pt1s1qg1pt1smq
……
g2p1q  g2pmq
……
gptks1q gpsmq
g1ptks1q g1ptksmq
gp1s1q gp1smq:2
2N
1;1;(subtract 2nd row from lrow) (233)
6To see this, take fpqN1;1pt1qand we know that fis continuous, fp0q0andfpq¡0;@. IffpqC2onr0;1sfor
some1¡0andC¡0, which we show later in the proof (see (233)), then take Bmin¥1fpq, and then there exists 0¤1
such thatfpq¤B;@ 0. So we have that fpq¤min¥fpq;@ 0, which implies that min¥fpqfpq;@ 0.
56

for 0and note that swapping the lrow with the third row involves an even number of adjacent row
swaps, so the sign of the determinant remains the same. Also, for  0, we have that:
N
1;1Ñgpt1s1q gpt1smq
g1pt1s1q g1pt1smq
g2pt1s1q g2pt1smq
……
gptks1q gpsmq
g1ptks1q g1ptksmq
gp1s1q gp1smq:N1
1;1¡0; (234)
where the last inequality is true because Gaussians form an extended T-system (see [27]) and the determinant
in the limit does not depend on .7
Substituting (233) and (231) into (228), and noting the the minimum in (231) can be attained at any
lPTdefined in (229) (not necessarily at t1, which we assumed above for simplicity), we obtain:
f0
f¡2Nl;1
2minlPTN
1;1;@ 0; (235)
which is the condition we must impose on f0{fso that detpMNq¡0for 0instead of (127). Therefore,
for 0, we set
f0Cf
2;whereC3Nl;1
minlPTN
1;1and lim
Ñ0CPp0;8q: (236)
Using that f1 f0and1 f0, we bound Cpf0;f1qin (52):
Cpf0;f1q c1f2
0; (237)
where c1is a universal constant. Finally, we insert (236) and (237) into (59) and obtain:
C11

 pc1f2
02kq3
2
fpc1C2
f2{42kq3
2
f
 pc1C2
2k4q3
2
f1
6;@ 0; (238)
where we have used that f 1. Similarly, from (60) we obtain
C21

 c2f5
2
0
fc2C5
2f3
21
5
 c2C5
21
5;@ 0; (239)
wherec2is a universal constant, and this concludes the proof.
7Note that in the previous footnote we assumed that fpqC2, whereCis independent of , but actually from (233) we
have thatfpqN1;1pt1q2
2N
1;1, whereN
1;1ÑN1
1;1¡0asÑ0. Our conclusion that the N1;1goes to zero at the
rate2does not change because, for small enough E¡0, there exists 1¡0such that:
0 N1
1;1E N
1;1 N1
1;1E;@ 1;
sofpq¥C22for all 1, whereC2pN1
1;1Eq{2¡0is independent of . Then, using the argument in the previous
footnote, there exists 0¡0such that min¥fpq¥C22for all 0, and therefore the obtain a version of (235) where the
factor in front of 1{2is independent of .
57
View publication statsView publication stats

Similar Posts