Theoretical and Experimental Analysis of a Randomized Algorithm for Sparse Fourier Transform Analysis
We analyze a sublinear RAlSFA (Randomized Algorithm for Sparse Fourier
Analysis) that finds a near-optimal B-term Sparse Representation R for a given
discrete signal S of length N, in time and space poly(B,log(N)), following the
approach given in \cite{GGIMS}. Its time cost poly(log(N)) should be compared
with the superlinear O(N log N) time requirement of the Fast Fourier Transform
(FFT). A straightforward implementation of the RAlSFA, as presented in the
theoretical paper \cite{GGIMS}, turns out to be very slow in practice. Our main
result is a greatly improved and practical RAlSFA. We introduce several new
ideas and techniques that speed up the algorithm. Both rigorous and heuristic
arguments for parameter choices are presented. Our RAlSFA constructs, with
probability at least 1-delta, a near-optimal B-term representation R in time
poly(B) log(N) log(1/delta) log(M) / epsilon^{2} such that
||S-R||^{2} <= (1+epsilon) ||S-R_{opt}||^{2}. Furthermore, this RAlSFA
implementation already beats FFTW for values of N that are not unreasonably
large. We extend the algorithm to higher-dimensional cases both theoretically
and numerically. The crossover point lies at N=70000 in one dimension, and at
N=900 for data on an N*N grid in two dimensions, for small-B signals in the
presence of noise.
Comment: 21 pages, 8 figures, submitted to Journal of Computational Physics
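As a point of reference, the baseline that a sublinear algorithm such as RAlSFA competes with can be sketched in a few lines: compute the full FFT in O(N log N) time and keep the B largest-magnitude coefficients, which is exactly the optimal B-term representation R_opt. The snippet below illustrates only that baseline (the test signal and sizes are invented), not RAlSFA itself:

```python
import numpy as np

def best_b_term(signal, B):
    """Optimal B-term Fourier representation via a full FFT:
    keep the B largest-magnitude coefficients, zero the rest.
    This is the O(N log N) baseline a sublinear algorithm is
    measured against."""
    coeffs = np.fft.fft(signal) / len(signal)
    keep = np.argsort(np.abs(coeffs))[-B:]   # indices of the top-B coefficients
    r = np.zeros_like(coeffs)
    r[keep] = coeffs[keep]
    return r

rng = np.random.default_rng(0)
N, B = 1024, 4
freqs = rng.choice(N, size=B, replace=False)
t = np.arange(N)
s = sum(np.exp(2j * np.pi * f * t / N) for f in freqs)  # exactly B-sparse signal
r = best_b_term(s, B)
approx = N * np.fft.ifft(r)
print(np.allclose(approx, s))   # True: exact recovery for a B-sparse signal
```

For a signal that is only approximately B-sparse, the same truncation yields R_opt, and the (1+epsilon) guarantee in the abstract is measured against its error.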
Approximate Sparse Recovery: Optimizing Time and Measurements
An approximate sparse recovery system consists of parameters k, epsilon, N; an
m-by-N measurement matrix, Phi; and a decoding algorithm, D.
Given a vector, x, the system approximates x by x~ = D(Phi x), which must
satisfy ||x~ - x|| <= (1+epsilon) ||x - x_k||, where x_k denotes the optimal
k-term approximation to x. For each vector x, the system must succeed with
probability at least 3/4. Among the goals in designing such systems are
minimizing the number m of measurements and the runtime of the decoding
algorithm, D.
In this paper, we give a system with m = O(k log(N/k)/epsilon)
measurements--matching a lower bound, up to a constant factor--and decoding
time k poly(log N, 1/epsilon), matching a lower bound up to logarithmic
factors.
We also consider the encode time (i.e., the time to multiply Phi by x),
the time to update measurements (i.e., the time to multiply Phi by a
1-sparse x), and the robustness and stability of the algorithm (adding noise
before and after the measurements). Our encode and update times are optimal up
to logarithmic factors.
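To make the encode/decode roles concrete, here is a toy sparse-recovery sketch under stated assumptions: a random Gaussian measurement matrix (the paper's actual construction is different) and a naive correlation decoder for the 1-sparse case only. It illustrates the interface — measure with Phi, then recover from y = Phi x with m much smaller than N — not the paper's fast decoder:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 1000, 60                                  # m << N measurements (illustrative sizes)
Phi = rng.standard_normal((m, N)) / np.sqrt(m)   # random Gaussian measurement matrix (assumption)

# a 1-sparse signal with one nonzero coordinate
x = np.zeros(N)
x[123] = 5.0

y = Phi @ x                                      # encode: m linear measurements of x

# decode by correlation: for a 1-sparse x, the column of Phi most
# correlated with y identifies the support; this brute-force scan
# stands in for the paper's poly(k, log N)-time decoder
scores = np.abs(Phi.T @ y)
i_hat = int(np.argmax(scores))
x_hat = np.zeros(N)
x_hat[i_hat] = Phi[:, i_hat] @ y / (Phi[:, i_hat] @ Phi[:, i_hat])
print(i_hat)   # 123
```

Updating a measurement vector after a 1-sparse change to x, as discussed in the abstract, is just `y += Phi @ delta`, which is why update time reduces to the cost of multiplying Phi by a 1-sparse vector.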
Non-Symbolic Fragmentation
This paper reports on the use of non-symbolic fragmentation of data for securing communications. Non-symbolic fragmentation, or NSF, relies on breaking data up into non-symbolic fragments: (usually irregularly sized) chunks whose boundaries do not necessarily coincide with the boundaries of the symbols making up the data. For example, ASCII data is broken into fragments that may include 8-bit chunks but also fragments of many other sizes. The fragments are then separated using a form of path diversity. The secrecy of the transmission relies on the secrecy of one or more of the following: the ordering of the fragments, the sizes of the fragments, and the use of path diversity. Once NSF is in place, it can help secure many forms of communication, and is useful for exchanging sensitive information and for commercial transactions. A sample implementation is described, together with an evaluation of the technology.
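A minimal sketch of the fragmentation step, assuming fragment sizes drawn from an arbitrary small range (the paper does not fix one) and ignoring the path-diversity transport entirely:

```python
import random

def fragment_bits(data: bytes, rng: random.Random):
    """Split a byte string into irregularly sized bit-level fragments
    whose boundaries need not align with 8-bit symbol boundaries.
    A toy illustration of non-symbolic fragmentation (NSF); a real
    system would also disperse the fragments across diverse paths."""
    bits = ''.join(f'{b:08b}' for b in data)
    fragments, i = [], 0
    while i < len(bits):
        size = rng.randint(3, 13)        # irregular fragment sizes (assumed range)
        fragments.append(bits[i:i + size])
        i += size
    return fragments

def reassemble(fragments):
    """Inverse operation: concatenate the bit fragments in order and
    regroup them into 8-bit symbols."""
    bits = ''.join(fragments)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

rng = random.Random(42)
frags = fragment_bits(b'secret message', rng)
assert reassemble(frags) == b'secret message'
print(len(frags), 'fragments')
```

An eavesdropper who captures only some fragments, or does not know the fragment sizes and ordering, cannot regroup the bit stream into symbols — which is the intuition behind keeping those quantities secret.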
Three-dimensional dynamic rupture simulation with a high-order discontinuous Galerkin method on unstructured tetrahedral meshes
Accurate and efficient numerical methods to simulate dynamic earthquake rupture and wave propagation in complex media and complex fault geometries are needed to address fundamental questions in earthquake dynamics, to integrate seismic and geodetic data into emerging approaches for dynamic source inversion, and to generate realistic physics-based earthquake scenarios for hazard assessment. Modeling of spontaneous earthquake rupture and seismic wave propagation by a high-order discontinuous Galerkin (DG) method combined with an arbitrarily high-order derivatives (ADER) time integration method was introduced in two dimensions by de la Puente et al. (2009). The ADER-DG method enables high accuracy in space and time and discretization by unstructured meshes. Here we extend this method to three-dimensional dynamic rupture problems. The high geometrical flexibility provided by the use of tetrahedral elements and the absence of spurious mesh reflections in the ADER-DG method allow the mesh to be refined close to the fault, so that the rupture dynamics are modeled adequately while computational resources are concentrated only where needed. Moreover, ADER-DG does not generate spurious high-frequency perturbations on the fault and hence does not require artificial Kelvin-Voigt damping. We verify our three-dimensional implementation by comparing results on the SCEC TPV3 test problem with two well-established numerical methods, finite differences and spectral boundary integral. Furthermore, a convergence study is presented to demonstrate the systematic consistency of the method. To illustrate the capabilities of the high-order accurate ADER-DG scheme on unstructured meshes, we simulate an earthquake scenario, inspired by the 1992 Landers earthquake, that includes curved faults, fault branches, and surface topography.
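A convergence study of the kind mentioned above boils down to checking that the error decays like h^p as the mesh spacing h is refined. A small, generic sketch of that check, with synthetic error data standing in for real ADER-DG measurements:

```python
import numpy as np

def convergence_order(hs, errors):
    """Least-squares fit of log(error) = p*log(h) + c, returning the
    observed order p. This is the standard diagnostic behind a
    convergence study verifying a high-order scheme."""
    p, _ = np.polyfit(np.log(hs), np.log(errors), 1)
    return p

# synthetic errors for an (assumed) 4th-order scheme: error ~ C * h^4
hs = np.array([0.2, 0.1, 0.05, 0.025])
errors = 3.0 * hs**4
print(round(convergence_order(hs, errors), 2))   # 4.0
```

If the fitted p matches the design order of the DG basis, the implementation is consistent; a lower observed order typically points to a bug or to insufficient mesh resolution near the fault.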
Reallocation Problems in Scheduling
In traditional on-line problems, such as scheduling, requests arrive over
time, demanding available resources. As each request arrives, some resources
may have to be irrevocably committed to servicing that request. In many
situations, however, it may be possible or even necessary to reallocate
previously allocated resources in order to satisfy a new request. This
reallocation has a cost. This paper shows how to service the requests while
minimizing the reallocation cost. We focus on the classic problem of scheduling
jobs on a multiprocessor system. Each unit-size job has a time window in which
it can be executed. Jobs are dynamically added and removed from the system. We
provide an algorithm that maintains a valid schedule, as long as a sufficiently
feasible schedule exists. The algorithm reschedules only a total number of
O(min{log^* n, log^* Delta}) jobs for each job that is inserted or deleted from
the system, where n is the number of active jobs and Delta is the size of the
largest window.
Comment: 9 pages, 1 table; extended abstract version to appear in SPAA 201
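To see concretely what "rescheduling previously placed jobs" means, here is a toy sketch for unit jobs with slot windows, using an augmenting-path search that displaces already-scheduled jobs when a window is full. It is not the paper's algorithm and gives no O(log* n) bound on the number of reschedules; the job names and windows are invented:

```python
def insert_job(schedule, windows, job):
    """Try to place unit-size `job` (windows[job] = set of admissible
    slots) into `schedule` (slot -> job), displacing already-scheduled
    jobs along an augmenting path if necessary. A matching-style toy:
    unlike the paper's algorithm, the number of jobs moved here is
    not bounded by O(min(log* n, log* Delta))."""
    def try_place(j, visited):
        for slot in windows[j]:
            if slot in visited:
                continue
            visited.add(slot)
            # place j if the slot is free, or if its occupant can move
            if slot not in schedule or try_place(schedule[slot], visited):
                schedule[slot] = j
                return True
        return False
    return try_place(job, set())

windows = {'a': {1, 2}, 'b': {1}, 'c': {2, 3}}
schedule = {}
for j in ['a', 'b', 'c']:
    assert insert_job(schedule, windows, j)
print(schedule)
```

Inserting 'b' forces 'a' out of slot 1 and into slot 2, which is exactly the reallocation cost the paper seeks to bound: every displaced job counts as one reschedule.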
List decoding of noisy Reed-Muller-like codes
First- and second-order Reed-Muller (RM(1) and RM(2), respectively) codes are
two fundamental error-correcting codes which arise in communication as well as
in probabilistically-checkable proofs and learning. In this paper, we take the
first steps toward extending the quick randomized decoding tools of RM(1) into
the realm of quadratic binary and, equivalently, Z_4 codes. Our main
algorithmic result is an extension of the RM(1) techniques from Goldreich-Levin
and Kushilevitz-Mansour algorithms to the Hankel code, a code between RM(1) and
RM(2). That is, given a signal s of length N, we find a list that is a superset
of all Hankel codewords phi with dot product to s at least (1/sqrt(k)) times
the norm of s, in time polynomial in k and log(N). We also give a new and
simple formulation of a known Kerdock code as a subcode of the Hankel code. As
a corollary, we can list-decode Kerdock too, and we get a quick algorithm
for finding a sparse Kerdock approximation. That is, for k small compared with
1/sqrt{N} and for epsilon > 0, we find, in time polynomial in (k
log(N)/epsilon), a k-Kerdock-term approximation s~ to s with Euclidean error at
most the factor (1+epsilon+O(k^2/sqrt{N})) times that of the best such
approximation.
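For the RM(1) case, the correlations in question are exactly Walsh-Hadamard coefficients, so the brute-force version of the Goldreich-Levin/Kushilevitz-Mansour task can be sketched directly. The +/-1 codewords below are unnormalized (norm sqrt(N)), so the threshold ||s||/sqrt(k) from the text picks up a factor of sqrt(N); the signal parameters are invented:

```python
import numpy as np

def walsh_hadamard(s):
    """Fast Walsh-Hadamard transform, O(N log N). Entry v is the dot
    product of s with the +/-1 linear codeword chi_v, i.e. with every
    RM(1)-style character at once; the sublinear decoders discussed
    in the text find the large entries without computing them all."""
    a = np.array(s, dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x, y = a[i:i+h].copy(), a[i+h:i+2*h].copy()
            a[i:i+h], a[i+h:i+2*h] = x + y, x - y
        h *= 2
    return a

rng = np.random.default_rng(2)
n = 8
N = 2 ** n
u = 77                                       # hidden codeword index (invented)
chi_u = np.array([(-1) ** bin(t & u).count('1') for t in range(N)], dtype=float)
s = chi_u + 0.5 * rng.standard_normal(N)     # one codeword plus noise

corr = walsh_hadamard(s)
k = 4
# |<s, chi_v>| >= ||s|| * ||chi_v|| / sqrt(k), with ||chi_v|| = sqrt(N)
threshold = np.linalg.norm(s) * np.sqrt(N) / np.sqrt(k)
candidates = [v for v in range(N) if abs(corr[v]) >= threshold]
print(candidates)   # contains 77
```

The list-decoding algorithms for the Hankel and Kerdock codes extend this idea to quadratic codewords, where no single N log N transform computes all correlations at once.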
Bias in culture-independent assessments of microbial biodiversity in the global ocean
On the basis of 16S rRNA gene sequencing, the SAR11 clade of marine bacteria has an almost universal distribution, being detected as abundant sequences in all marine provinces. Yet SAR11 sequences are rarely detected in fosmid libraries, suggesting that the widespread abundance may be an artefact of PCR cloning and that SAR11 in fact has a relatively low abundance. Here the relative abundance of SAR11 is explored in both a fosmid library and a metagenomic sequence data set from the same biological community, taken from fjord surface water near Bergen, Norway. Pyrosequenced data and 16S clone data confirmed an 11-15% relative abundance of SAR11 within the community. In contrast, not a single SAR11 fosmid was identified in a pooled shotgun-sequenced data set of 100 fosmid clones. This under-representation was evident in the comparative abundances of SAR11 sequences assessed by taxonomic annotation, functional metabolic profiling, and fragment recruitment. Analysis revealed a similar under-representation of low-GC Flavobacteriaceae. We speculate that the fosmid bias may be caused by DNA fragmentation during library preparation, owing to the low GC content of SAR11 and the other under-represented taxa. This study suggests that while fosmid libraries can be extremely useful, caution must be used when directly inferring community composition from metagenomic fosmid libraries.
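The low-GC hypothesis above rests on a quantity that is simple to compute. A minimal GC-fraction sketch, with invented toy sequences rather than real SAR11 data:

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence. Low-GC taxa such
    as SAR11 are hypothesized above to fragment more readily during
    fosmid library preparation."""
    seq = seq.upper()
    return sum(seq.count(b) for b in 'GC') / len(seq)

# toy sequences (illustrative only, not real SAR11 data)
print(round(gc_fraction('ATATTAGCATTA'), 2))   # low GC
print(round(gc_fraction('GCGCGGCCATGC'), 2))   # high GC
```

Comparing the GC distribution of reads recovered in a fosmid library against that of shotgun reads from the same sample is one direct way to test for the bias the authors describe.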
