
27 research outputs found

Lightweight Lempel-Ziv Parsing

Author: D. Okanohara
E. Ohlebusch
G. Chen
G. Navarro
J. Barbay
J. Fischer
J. Kärkkäinen
J. Ziv
M. Crochemore
M.I. Abouelhoda
P. Ferragina
R. Cánovas
S. Kreft
S. Kuruppu
T. Gagie
T. Kasai
T. Starikovskaya
U. Manber
W.I. Chang
Publication date: 01/01/2013

We introduce a new approach to LZ77 factorization that uses O(n/d) words of working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet sizes). We also describe carefully engineered implementations of alternative approaches to lightweight LZ77 factorization. Extensive experiments show that the new algorithm is superior in most cases, particularly at the lowest memory levels and for highly repetitive data. As a part of the algorithm, we describe new methods for computing matching statistics which may be of independent interest. Comment: 12 pages
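For readers unfamiliar with the term, the following is a minimal sketch of what an LZ77 factorization is. It is a brute-force illustration only, not the lightweight algorithm of the abstract above; the function names `lz77_factorize` and `lz77_decode` and the `(position, length)` / `(character, 0)` factor layout are assumptions made for this example.

```python
def lz77_factorize(text):
    """Greedy LZ77 factorization by brute force (illustrative, slow).

    Each factor is either (char, 0) for a character with no earlier
    occurrence, or (pos, length) for the longest match starting at an
    earlier position pos. Matches may overlap the current position.
    """
    factors = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_pos = 0, -1
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_pos = length, j
        if best_len == 0:
            factors.append((text[i], 0))  # literal factor
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Rebuild the text from the factors produced above."""
    out = []
    for first, length in factors:
        if length == 0:
            out.append(first)
        else:
            for k in range(length):  # char-by-char copy handles overlaps
                out.append(out[first + k])
    return "".join(out)
```

On a highly repetitive input such as `"abababab"` this yields only three factors, which is why repetitive data is the interesting case for such parsers.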

arXiv.org e-Print Archive

Crossref

Wave Energy: a Pacific Perspective

Author: D. Gusfield
D. Okanohara
G. Manzini
J. Fischer
J. Kärkkäinen
K. Sadakane
M.I. Abouelhoda
P. Ferragina
R. Dementiev
R. Sinha
S.J. Puglisi
T. Kasai
U. Manber
V. Mäkinen
Publication venue: The Royal Society
Publication date: 01/01/2009

This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by The Royal Society and can be found at: http://rsta.royalsocietypublishing.org/. This paper illustrates the status of wave energy development in Pacific Rim countries by characterizing the available resource and introducing the region's current and potential future leaders in wave energy converter development. It also describes the existing licensing and permitting process as well as potential environmental concerns. Capabilities of Pacific Ocean testing facilities are described in addition to the region's vision of the future of wave energy.

CiteSeerX

Crossref

ScholarsArchive@OSU

Research Repository RMIT University

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

Author: D Okanohara
J Fischer
J Kärkkäinen
J Sirén
JI Munro
JS Vitter
K Sadakane
P Ferragina
R Dementiev
T Beller
T Kasai
U Manber
W Hon
W Szpankowski
Publication date: 01/01/2016

The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n bits plus asymptotically negligible support data structures. External memory construction algorithms for the LCP array have been proposed, but those proposed so far have a space requirement of O(n) words (i.e. O(n log n) bits) in external memory. This space requirement is in some practical cases prohibitively expensive. We present an external memory algorithm for constructing the 2n bit version of the LCP array which uses O(n log σ) bits of additional space in external memory when given a (compressed) BWT with alphabet size σ and a sampled inverse suffix array at sampling rate O(log n). This is often a significant space gain in practice where σ is usually much smaller than n or even constant. We also consider the case of computing succinct LCP arrays for circular strings.
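To make the 2n-bit representation concrete, here is a small in-memory sketch. It assumes the commonly used encoding in which the permuted LCP value of text position i (called `plcp[i]` here) is stored by setting bit `plcp[i] + 2*i`; the abstract does not spell out the encoding, and this is not the external memory construction it describes. All function names are hypothetical, and the suffix array and select operation are computed naively.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def plcp_array(text, sa):
    """plcp[i] = LCP of suffix i with its predecessor in suffix-array order."""
    n = len(text)
    plcp = [0] * n
    for r in range(1, n):
        i, j = sa[r], sa[r - 1]
        length = 0
        while i + length < n and j + length < n and text[i + length] == text[j + length]:
            length += 1
        plcp[i] = length
    return plcp


def encode_plcp(plcp):
    """Set bit plcp[i] + 2*i; since plcp[i] >= plcp[i-1] - 1, positions strictly increase."""
    bits = [0] * (2 * len(plcp))
    for i, value in enumerate(plcp):
        bits[value + 2 * i] = 1
    return bits


def decode_plcp(bits):
    """Recover plcp[i] as (position of the i-th set bit) - 2*i (naive select)."""
    ones = [pos for pos, bit in enumerate(bits) if bit]
    return [pos - 2 * i for i, pos in enumerate(ones)]


def lcp_from_bits(bits, sa):
    """With the SA available, the usual LCP array is plcp permuted into SA order."""
    plcp = decode_plcp(bits)
    return [plcp[i] for i in sa]
```

A real implementation would replace the list of bits with a packed bit vector and the linear scan in `decode_plcp` with a constant-time select structure, which is the "asymptotically negligible support" the abstract refers to.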

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Accelerating the annotation of sparse named entities by dynamic sentence selection

Author: A Culotta
A Globerson
A Vlachos
AA Morgan
B Settles
CA Thompson
D Okanohara
D Shen
EF Tjong Kim Sang
I Dagan
J Lafferty
J Nocedal
JD Kim
Jun'ichi Tsujii
K Tomanek
L Tanabe
LR Rabiner
S Engelson
S Kulick
S Sarawagi
Sophia Ananiadou
Yoshimasa Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Previous studies of named entity recognition have shown that a reasonable level of recognition accuracy can be achieved by using machine learning models such as conditional random fields or support vector machines. However, the lack of training data (i.e. annotated corpora) makes it difficult for machine learning-based named entity recognizers to be used in building practical information extraction systems. Results: This paper presents an active learning-like framework for reducing the human effort required to create named entity annotations in a corpus. In this framework, the annotation work is performed as an iterative and interactive process between the human annotator and a probabilistic named entity tagger. Unlike active learning, our framework aims to annotate all occurrences of the target named entities in the given corpus, so that the resulting annotations are free from the sampling bias which is inevitable in active learning approaches. Conclusion: We evaluate our framework by simulating the annotation process using two named entity corpora and show that our approach can reduce the number of sentences which need to be examined by the human annotator. The cost reduction achieved by the framework could be drastic when the target named entities are sparse. © 2008 Tsuruoka et al; licensee BioMed Central Ltd

Crossref

Springer - Publisher Connector

PubMed Central

The University of Manchester - Institutional Repository

How to make the most of NE dictionaries in statistical NER

Author: A McCallum
B Settles
D Okanohara
EF Tjong Kim Sang
GD Zhou
J Aoe
J Finkel
J Kazama
J Lafferty
J-D Kim
JD Kim
John McNaught
K Franzen
K Fukuda
K Yamamoto
K-M Park
KJ Lee
L Tanabe
LE Baum
M Rössler
N Collier
S Kim
Sophia Ananiadou
T Kudo
TH Tsai
Y Song
Yoshimasa Tsuruoka
Yutaka Sasaki
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

The University of Manchester - Institutional Repository

Partially Decodable Compression with Static PPM

Author: D. Okanohara
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Succinct backward-DAWG-matching

Author: Fredriksson K.
Golynski A.
González R.
Grossi R.
Kimmo Fredriksson
Okanohara D.
Witten I. H.
Publication venue: Association for Computing Machinery (ACM)

Crossref

Faster Lightweight Lempel-Ziv Parsing

Author: A Lempel
D Belazzougui
D Okanohara
E Ohlebusch
G Navarro
J Fischer
J Kärkkäinen
MA Bender
S Burkhardt
T Beller
T Hagerup
T Starikovskaya
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

We present an algorithm that computes the Lempel-Ziv decomposition in O(n(log σ + log log n)) time and n log σ + εn bits of space, where ε is a constant rational parameter, n is the length of the input string, and σ is the alphabet size. The n log σ bits in the space bound are for the input string itself, which is treated as read-only. © Springer-Verlag Berlin Heidelberg 2015

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Broadword Implementation of Rank/Select Queries

Author: A. Golynski
D. Okanohara
D.K. Kim
G. Jacobson
L. Lamport
M.L. Fredman
O. Delpratt
P. Elias
R.F. Geary
R.J. Fisher
R.M. Fano
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Crossref

Sentiment Classification with Support Vector Machines and Multiple Kernel Functions

Author: A. Graf
A. Kennedy
B. Schölkopf
D. Okanohara
D.B. Fogel
H.-G. Beyer
J. Shawe-Taylor
O. Bousquet
V. Kecman
V.N. Vapnik
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Crossref