2024/08/16 更新

イ トモヒロ
井 智弘
I Tomohiro
Scopus 論文情報  
総論文数: 0  総Citation: 0  h-index: 15

Citation Countは当該年に発表した論文の被引用数

所属
大学院情報工学研究院 知能情報工学研究系
職名
准教授
外部リンク

取得学位

  • 九州大学  -  博士(理学)   2012年04月

学内職務経歴

  • 2019年01月 - 現在   九州工業大学   大学院情報工学研究院   知能情報工学研究系     准教授

  • 2015年10月 - 2018年12月   九州工業大学   若手研究者フロンティア研究アカデミー     特任助教

論文

  • Computing Longest (Common) Lyndon Subsequences 査読有り 国際誌

    Hideo Bannai, Tomohiro I, Tomasz Kociumaka, Dominik Köppl, Simon J. Puglisi

    International Workshop on Combinatorial Algorithms (IWOCA) 2022 ( Springer-Verlag )   13270 LNCS   128 - 142   2022年06月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Given a string T with length n whose characters are drawn from an ordered alphabet of size σ, its longest Lyndon subsequence is a longest subsequence of T that is a Lyndon word. We propose algorithms for finding such a subsequence in O(n3) time with O(n) space, or online in O(n3σ) space and time. Our first result can be extended to find the longest common Lyndon subsequence of two strings of length n in O(n4σ) time using O(n3) space.

    DOI: 10.1007/978-3-031-06678-8_10

    DOI: 10.1007/978-3-031-06678-8_10

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85131911688&origin=inward

  • The "runs" theorem 査読有り 国際誌

    Bannai H., I T., Inenaga S., Nakashima Y., Takeda M., Tsuruta K.

    SIAM Journal on Computing   46 ( 5 )   1501 - 1514   2017年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    Copyright © by SIAM. We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the "runs" conjecture [R. M. Kolpakov andG.Kucherov, Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, Los Alamitos, CA, 1999, pp. 596-604]), which states that the maximum number of runs ρ(n) in a string of length n is less than n. The proof is remarkably simple, considering the numerous endeavors to tackle this problem in the last 15 years, and significantly improves our understanding of how runs can occur in strings. In addition, we obtain an upper bound of 3n for the maximum sum of exponents σ(n) of runs in a string of length n, improving on the best known bound of 4.1n by Crochemore et al. [J. Discrete Algorithms, 14 (2012), pp. 29-36], as well as other improved bounds on related problems. The characterization also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string. We also establish a relationship between runs and nodes of the Lyndon tree, which gives a simple optimal solution to the 2-period query problem that was recently solved by Kociumaka et al. [Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA) 2015, San Diego, CA, SIAM, Philadelphia, 2015, pp. 532-551].

    DOI: 10.1137/15M1011032

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85032917335&origin=inward

  • Computing Longest Lyndon Subsequences and Longest Common Lyndon Subsequences 査読有り

    Bannai H., Tomohiro I., Kociumaka T., Köppl D., Puglisi S.J.

    Algorithmica   86 ( 3 )   735 - 756   2024年03月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    Given a string T of length n whose characters are drawn from an ordered alphabet of size σ, its longest Lyndon subsequence is a maximum-length subsequence of T that is a Lyndon word. We propose algorithms for finding such a subsequence in O(n3) time with O(n) space, or online in O(n3) space and time. Our first result can be extended to find the longest common Lyndon subsequence of two strings of length at most n in O(n4σ) time using O(n2) space.

    DOI: 10.1007/s00453-023-01125-z

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85158153704&origin=inward

  • Longest bordered and periodic subsequences 査読有り 国際誌

    Bannai H., I T., Köppl D.

    Information Processing Letters   182   2023年08月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    We present an algorithm computing the longest periodic subsequence of a string of length n in O(n7) time with O(n3) space. We obtain improvements when restricting the exponents or extending the search allowing the reported subsequence to be subperiodic down to O(n2) time and O(n) space. By allowing subperiodic subsequences in the output, the task becomes finding the longest bordered subsequence, for which we devise a conditional lower bound.

    DOI: 10.1016/j.ipl.2023.106398

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85153571879&origin=inward

  • PalFM-Index: FM-Index for Palindrome Pattern Matching 査読有り 国際誌

    Nagashita S., Tomohiro I.

    Leibniz International Proceedings in Informatics, LIPIcs   259   2023年06月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    The palindrome pattern matching (pal-matching) is a kind of generalized pattern matching, in which two strings x and y of same length are considered to match (pal-match) if they have the same palindromic structures, i.e., for any possible 1 ≤ i < j ≤ |x| = |y|, x[i..j] is a palindrome if and only if y[i..j] is a palindrome. The pal-matching problem is the problem of searching for, in a text, the occurrences of the substrings that pal-match with a pattern. Given a text T of length n over an alphabet of size σ, an index for pal-matching is to support, given a pattern P of length m, the counting queries that compute the number occ of occurrences of P and the locating queries that compute the occurrences of P. The authors in [I et al., Theor. Comput. Sci., 2013] proposed an O(n lg n)-bit data structure to support the counting queries in O(m lg σ) time and the locating queries in O(m lg σ + occ) time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies 2n lg min(σ, lg n) + 2n + o(n) bits of space and supports the counting queries in O(m) time. The PalFM-indexes can support the locating queries in O(m + ∆occ) time by adding ∆n lg n + n + o(n) bits of space, where ∆ is a parameter chosen from {1, 2, . . ., n} in the preprocessing phase.

    DOI: 10.4230/LIPIcs.CPM.2023.23

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85165939140&origin=inward

  • Substring Complexities on Run-Length Compressed Strings 査読有り 国際誌

    Kawamoto A., I T.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   13617 LNCS   132 - 143   2022年11月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Let ST(k) denote the set of distinct substrings of length k in a string T, then its cardinality | ST(k) | is called the k-th substring complexity of T. Recently, δ= max { | ST(k) | / k: k≥ 1 } has been shown to be a good compressibility measure of highly-repetitive strings. In this paper, given T of length n in the run-length compressed form of size ρ, we show that δ can be computed in Csort(ρ, n) time and O(ρ) space, where Csort(ρ, n) = O(min (ρlg lg ρ, ρlg ρn) ) is the time complexity for sorting ρ integers with O(lg n) bits each in O(ρ) space in the Word-RAM model with word size Ω(lg n).

    DOI: 10.1007/978-3-031-20643-6_10

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85142675895&origin=inward

  • Privacy-Preserving Feature Selection with Fully Homomorphic Encryption 査読有り 国際誌

    Ono S., Takata J., Kataoka M., Tomohiro I., Shin K., Sakamoto H.

    Algorithms   15 ( 7 )   2022年07月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    For the feature selection problem, we propose an efficient privacy-preserving algorithm. Let D, F, and C be data, feature, and class sets, respectively, where the feature value x(Fi) and the class label x(C) are given for each x ∈ D and Fi ∈ F. For a triple (D, F, C), the feature selection problem is to find a consistent and minimal subset F′ ⊆ F, where ‘consistent’ means that, for any x, y ∈ D, x(C) = y(C) if x(Fi) = y(Fi) for Fi ∈ F′, and ‘minimal’ means that any proper subset of F′ is no longer consistent. On distributed datasets, we consider feature selection as a privacy-preserving problem: assume that semi-honest parties A and B have their own personal DA and DB. The goal is to solve the feature selection problem for DA ∪ DB without sacrificing their privacy. In this paper, we propose a secure and efficient algorithm based on fully homomorphic encryption, and we implement our algorithm to show its effectiveness for various practical data. The proposed algorithm is the first one that can directly simulate the CWC (Combination of Weakest Components) algorithm on ciphertext, which is one of the best performers for the feature selection problem on the plaintext.

    DOI: 10.3390/a15070229

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85133509105&origin=inward

  • Space-Efficient B Trees via Load-Balancing 査読有り 国際誌

    Tomohiro I, Dominik Köppl

    International Workshop on Combinatorial Algorithms (IWOCA) 2022 ( Springer-Verlag )   13270 LNCS   327 - 340   2022年06月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    We study succinct variants of B trees in the word RAM model that require s+ o(s) bits of space, where s is the number of bits essentially needed for storing keys and possibly other satellite values. Assuming that elements are sorted by keys (not necessarily in the order of their integer representations), our B trees support standard operations such as searching, insertion and deletion of elements. In some applications it is useful to associate a satellite value to each element, and to support aggregate operations such as computing the sum of values, the minimum/maximum value in a given range, or search operations based on those values. We propose a B tree representation storing n elements in s+ O(s/ lg n) bits of space and supporting all mentioned operations in O(lg n) time.

    DOI: 10.1007/978-3-031-06678-8_24

    DOI: 10.1007/978-3-031-06678-8_24

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85131937834&origin=inward

  • A Compression-Based Multiple Subword Segmentation for Neural Machine Translation 査読有り

    Nonaka K., Yamanouchi K., I T., Okita T., Shimada K., Sakamoto H.

    Electronics (Switzerland)   11 ( 7 )   2022年04月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    In this study, we propose a simple and effective preprocessing method for subword segmentation based on a data compression algorithm. Compression-based subword segmentation has recently attracted significant attention as a preprocessing method for training data in neural machine translation. Among them, BPE/BPE-dropout is one of the fastest and most effective methods compared to conventional approaches; however, compression-based approaches have a drawback in that generating multiple segmentations is difficult due to the determinism. To overcome this difficulty, we focus on a stochastic string algorithm, called locally consistent parsing (LCP), that has been applied to achieve optimum compression. Employing the stochastic parsing mechanism of LCP, we propose LCP-dropout for multiple subword segmentation that improves BPE/BPE-dropout, and we show that it outperforms various baselines in learning from especially small training data.

    DOI: 10.3390/electronics11071014

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85126910361&origin=inward

  • Converting RLBWT to LZ77 in smaller space 査読有り 国際誌

    Masaki Shigekuni, Tomohiro I

    Data Compression Conference (DCC) 2022 ( IEEE Computer Society Press CPS Online )   2022-March   242 - 251   2022年03月

     詳細を見る

    担当区分:最終著者, 責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Since highly repetitive strings, which can be compressed greatly, are ubiquitous nowadays, algorithms working on compressed space have gained much more attention. In particular, converting one compressed format to another without explicit decompression is of great interest as we can take advantage of various compressed formats with small cost of space. In this paper, we consider algorithms to convert run-length encoded Burrows-Wheeler Transform (RLBWT) to Lempel-Ziv 77 (LZ77). We implement a simplified version of the algorithm proposed by Nishimoto and Tabei and propose a two-pass algorithm to reduce its peak memory usage in practice. Experimental results show that the two-pass algorithm uses 23% to 37% less space with up to 10% increase of computational time when we use 8 threads in the second pass.

    DOI: 10.1109/DCC52660.2022.00032

    DOI: 10.1109/DCC52660.2022.00032

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85134417981&origin=inward

  • PHONI: Streamed Matching Statistics with Multi-Genome References 査読有り 国際誌

    Christina Boucher, Travis Gagie, Tomohiro I, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, Massimiliano Rossi

    Data Compression Conference   2021-March   193 - 202   2021年03月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at http://github.com/koeppl/phoni.

    DOI: 10.1109/DCC50243.2021.00027

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85106006610&origin=inward

  • A Separation of γ and b via Thue–Morse Words 査読有り

    Bannai H., Funakoshi M., I T., Köppl D., Mieno T., Nishimoto T.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   12944 LNCS   167 - 178   2021年01月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    We prove that for n≥ 2, the size b(tn) of the smallest bidirectional scheme for the nth Thue–Morse word tn is n+ 2. Since Kutsukake et al. [SPIRE 2020] show that the size γ(tn) of the smallest string attractor for tn is 4 for n≥ 4, this shows for the first time that there is a separation between the size of the smallest string attractor γ and the size of the smallest bidirectional scheme b, i.e., there exist string families such that γ= o(b).

    DOI: 10.1007/978-3-030-86692-1_14

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85116783139&origin=inward

  • Extracting the Sparse Longest Common Prefix Array from the Suffix Binary Search Tree 査読有り

    I T., Irving R.W., Köppl D., Love L.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   12944 LNCS   143 - 150   2021年01月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Given a text T of length n, the sparse suffix sorting problem asks for the lexicographic order of suffixes starting at m selectable text positions P. The suffix binary search tree [Irving and Love, JDA’03] is a dynamic data structure that can answer this problem dynamically in the sense that insertions and deletions of positions in P are allowed. While a standard binary search tree on strings needs to store two longest-common prefix (LCP) values per node for providing the same query bounds, each suffix binary search tree node only stores a single LCP value and a bit flag. Its tree topology induces the sorting of the m suffixes by an Euler tour in O(m) time. However, it has not been addressed how to compute the lengths of the longest common prefixes of two suffixes with neighboring ranks with this data structure. We show that we can compute these lengths again by an Euler tour in O(m) time.

    DOI: 10.1007/978-3-030-86692-1_12

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85116834863&origin=inward

  • Re-Pair in Small Space 査読有り 国際誌

    Dominik Köppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai, Keisuke Goto

    Algorithms   14 ( 1 )   1 - 20   2020年12月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    Re-Pairis a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large-scale data sets. As a solution for this problem, we present, given a text of length n whose characters are drawn from an integer alphabet with size σ = nO(1), an O(min(n2, n2 lg logτ n lg lg lg n/ logτ n)) time algorithm computing Re-Pair with max((n/c) lg n, n⌈lg τ⌉) + O(lg n) bits of working space including the text space, where c ≥ 1 is a fixed user-defined constant and τ is the sum of σ and the number of non-terminals. We give variants of our solution working in parallel or in the external memory model. Unfortunately, the algorithm seems not practical since a preliminary version already needs roughly one hour for computing Re-Pair on one megabyte of text.

    DOI: 10.3390/a14010005

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85099168551&origin=inward

  • Practical Random Access to SLP-Compressed Texts 査読有り 国際誌

    Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Louisa Seelbach Benkner, Yoshimasa Takabatake

    Lecture Notes in Computer Science   12303 LNCS   221 - 231   2020年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs, and in this paper we turn our attention to one of the features that make grammar-based compression so attractive: the possibility of supporting fast random access. This is an essential primitive in many algorithms that process grammar-compressed texts without decompressing them and so many theoretical bounds have been published about it, but experimentation has lagged behind. We give a new encoding of grammars that is about as small as the practical state of the art (Maruyama et al., SPIRE 2013) but with significantly faster queries.

    DOI: 10.1007/978-3-030-59212-7_16

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85092111359&origin=inward

  • RePair in Compressed Space and Time 査読有り 国際誌

    Dominik Köppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai, Keisuke Goto

    The Prague Stringology Conference   134 - 147   2020年08月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

  • Deterministic Sparse Suffix Sorting in the Restore Model 査読有り 国際誌

    Fischer J., I T., Köppl D.

    ACM Transactions on Algorithms   16 ( 4 )   Article No. 50   2020年07月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    Given a text T of length n, we propose a deterministic online algorithm computing the sparse suffix array and the sparse longest common prefix array of T in O(c g lg n + m lg m lg n lg∗ n) time with O(m) words of space under the premise that the space of T is rewritable, where m ≤ n is the number of suffixes to be sorted (provided online and arbitrarily), and c is the number of characters with m ≤ c ≤ n that must be compared for distinguishing the designated suffixes.

    DOI: 10.1145/3398681

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85025816771&origin=inward

  • Refining the r-index 査読有り

    Bannai H., Gagie T., I T.

    Theoretical Computer Science   812   96 - 108   2020年04月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2019 Elsevier B.V. Gagie, Navarro and Prezza's r-index (SODA, 2018) promises to speed up DNA alignment and variation calling by allowing us to index entire genomic databases, provided certain obstacles can be overcome. In this paper we first strengthen and simplify Policriti and Prezza's Toehold Lemma (DCC '16; Algorithmica, 2017), which inspired the r-index and plays an important role in its implementation. We then show how to update the r-index efficiently after adding a new genome to the database, which is likely to be vital in practice. As a by-product of this result, we obtain an online version of Policriti and Prezza's algorithm for constructing the LZ77 parse from a run-length compressed Burrows-Wheeler Transform. Our experiments demonstrate the practicality of all three of these results. Finally, we show how to augment the r-index such that, given a new genome and fast random access to the database, we can quickly compute the matching statistics and maximal exact matches of the new genome with respect to the database.

    DOI: 10.1016/j.tcs.2019.08.005

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85070385571&origin=inward

  • Dynamic index and LZ factorization in compressed space 査読有り

    Nishimoto T., I T., Inenaga S., Bannai H., Takeda M.

    Discrete Applied Mathematics   274   116 - 129   2020年03月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2019 Elsevier B.V. In this paper, we propose a new dynamic compressed index of O(w) space for a dynamic text T, where w=O(min(zlogNlog∗M,N)) is the size of the signature encoding of T, z is the size of the Lempel–Ziv77 (LZ77) factorization of T, N is the length of T, and M≥N is the maximum length of T. Our index supports searching for a pattern P in T in O(|P|fA+logwlog|P|log∗M(logN+log|P|log∗M)+occlogN) time and insertion/deletion of a substring of length y in O((y+logNlog∗M)logwlogNlog∗M) time, where occ is the number of occurrences of P in T and [Formula presented]. Also, we propose a new space-efficient LZ77 factorization algorithm for a given text T, which runs in O(NfA+zlogwlog3N(log∗N)2) time with O(w) working space.

    DOI: 10.1016/j.dam.2019.01.014

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85061439333&origin=inward

  • Re-pair in small space 査読有り 国際誌

    Koppl D., Tomohiro I., Furuya I., Takabatake Y., Sakai K., Goto K.

    Data Compression Conference Proceedings   2020-March   2020年03月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a solution for this problem we present, given a text of length n whose characters are drawn from an integer alphabet, an O(n^2) time algorithm computing Re-Pair in n lg max(n, τ) bits of working space including the text space, where τ is the number of terminals and non-terminals.

    DOI: 10.1109/DCC47342.2020.00092

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85086859811&origin=inward

  • Faster privacy-preserving computation of edit distance with moves 査読有り

    Yoshimoto Y., Kataoka M., Takabatake Y., I T., Shin K., Sakamoto H.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   12049 LNCS   308 - 320   2020年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer Nature Switzerland AG 2020. We consider an efficient two-party protocol for securely computing the similarity of strings w.r.t. an extended edit distance measure. Here, two parties possessing strings x and y, respectively, want to jointly compute an approximate value for EDM(x, y), the minimum number of edit operations including substring moves needed to transform x into y, without revealing any private information. Recently, the first secure two-party protocol for this was proposed, based on homomorphic encryption, but this approach is not suitable for long strings due to its high communication and round complexities. In this paper, we propose an improved algorithm that significantly reduces the round complexity without sacrificing its cryptographic strength. We examine the performance of our algorithm for DNA sequences compared to previous one.

    DOI: 10.1007/978-3-030-39881-1_26

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85080940455&origin=inward

  • Rpair: Rescaling RePair with Rsync 査読有り

    Gagie T., I T., Manzini G., Navarro G., Sakamoto H., Takabatake Y.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   11811 LNCS   35 - 44   2019年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © 2019, Springer Nature Switzerland AG. Data compression is a powerful tool for managing massive but repetitive datasets, especially schemes such as grammar-based compression that support computation over the data without decompressing it. In the best case such a scheme takes a dataset so big that it must be stored on disk and shrinks it enough that it can be stored and processed in internal memory. Even then, however, the scheme is essentially useless unless it can be built on the original dataset reasonably quickly while keeping the dataset on disk. In this paper we show how we can preprocess such datasets with context-triggered piecewise hashing such that afterwards we can apply RePair and other grammar-based compressors more easily. We first give our algorithm, then show how a variant of it can be used to approximate the LZ77 parse, then leverage that to prove theoretical bounds on compression, and finally give experimental evidence that our approach is competitive in practice.

    DOI: 10.1007/978-3-030-32686-9_3

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85075670034&origin=inward

  • k-Abelian Pattern Matching: Revisited, Corrected, and Extended 査読有り 国際誌

    Golnaz Badkobeh, Hideo Bannai, Maxime Crochemore, Tomohiro I, Shunsuke Inenaga and Shiho Sugimoto

    The Prague Stringology Conference   29 - 40   2019年08月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

  • RePair in Compressed Space and Time 査読有り

    Sakai K., Ohno T., Goto K., Takabatake Y., I T., Sakamoto H.

    Data Compression Conference Proceedings   2019-March   518 - 527   2019年05月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © 2019 IEEE. Given a string T of length N, the goal of grammar compression is to construct a small context-free grammar generating only T. Among existing grammar compression methods, RePair (recursive paring) [Larsson and Moffat, 1999] is notable for achieving good compression ratios in practice. In this paper, we propose the first RePair algorithm working in compressed space, i.e., potentially o(N) space for highly compressible texts. The key idea is to give a new way to restructure an arbitrary (context-free) grammar S for T into RePair(T) in compressed space and time. We propose an algorithm for RePair(T) running in O(min(N, nm log N)) space and expected O(min(N, nm log N) m) time or O(min(N, nm log N) log log N) time, where n is the size of S and m is the number of variables in RePair(T). We implemented our O(min(N, nm log N) m)-time algorithm and show it can actually run in compressed space. We also present a new approach to reduce the peak memory usage of existing RePair algorithms combining with our algorithms, and show that the new approach outperforms, both in computation time and space, the most space efficient linear-time RePair implementation to date.

    DOI: 10.1109/DCC.2019.00060

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85066320396&origin=inward

  • Improved upper bounds on all maximal α-gapped repeats and palindromes 査読有り

    I T., Köppl D.

    Theoretical Computer Science   753   1 - 15   2019年01月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2018 Elsevier B.V. We show that the number of all maximal α-gapped repeats and palindromes of a word of length n is at most 3(π2/6+5/2)αn and 7(π2/6+1/2)αn−3n−1, respectively.

    DOI: 10.1016/j.tcs.2018.06.033

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85049535167&origin=inward

  • K-abelian pattern matching: Revisited, corrected, and extended 査読有り 国際誌

    Badkobeh G., Bannai H., Crochemore M., Tomohiro I., Inenaga S., Sugimoto S.

    Proceedings of the Prague Stringology Conference, PSC 2019   29 - 40   2019年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Two strings of equal length are called k-Abelian equivalent, if they share the same multi-set of factors of length at most k. Ehlers et al. [JDA, 2015] considered the k-Abelian pattern matching problem, where the task is to find all factors in a text T that are k-Abelian equivalent to a pattern P. They claimed a number of algorithmic results for the off-line and on-line versions of the k-Abelian pattern matching problem. In this paper, we first argue that some of the claimed results by Ehlers et al. [JDA, 2015] contain major errors, and then we present a new algorithm that correctly solves the offline version of the problem within the same bounds claimed by Ehlers et al., in O(n + m) time and O(m) space, where n = |T| and m = |P|. We also show how to correct errors in their online algorithm, and errors in their real-time algorithms for a slightly different problem called the extended k-Abelian pattern matching problem.

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85086064946&origin=inward

  • A faster implementation of online RLBWT and its application to LZ77 parsing 査読有り

    Ohno T., Sakai K., Takabatake Y., I T., Sakamoto H.

    Journal of Discrete Algorithms   52-53   18 - 28   2018年09月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2018 The Authors Run-length encoded Burrows–Wheeler Transformed strings, resulting in Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in O(nlg⁡r) time and O(rlg⁡n) bits of space, where n is the length of input string S received so far and r is the number of runs in the BWT of the reversed S. We improve the state-of-the-art algorithm for online RLBWT in terms of empirical construction time. Adopting the dynamic list for maintaining a total order, we can replace rank queries in a dynamic wavelet tree on a run-length compressed string by the direct comparison of labels in a dynamic list. Enlisting the proposed online RLBWT, we can efficiently compute the LZ77 factorization in run-compressed space. The empirical results show the efficiencies of both our online RLBWT and LZ77 parsing, especially for highly repetitive strings.

    DOI: 10.1016/j.jda.2018.11.002

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85056312105&origin=inward

  • Lempel–Ziv Factorization Powered by Space Efficient Suffix Trees 査読有り

    Fischer J., I T., Köppl D., Sadakane K.

    Algorithmica   80 ( 7 )   2048 - 2081   2018年07月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2017, Springer Science+Business Media, LLC. We show that both the Lempel–Ziv-77 and the Lempel–Ziv-78 factorization of a text of length n on an integer alphabet of size σ can be computed in O(n) time with either O(nlg σ) bits of working space, or (1 + ϵ) nlg n+ O(n) bits (for a constant ϵ> 0) of working space (including the space for the output, but not the text).

    DOI: 10.1007/s00453-017-0333-1

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85025816771&origin=inward

  • Faster online elastic degenerate string matching 査読有り 国際誌

    Aoyama K., Nakashima Y., I T., Inenaga S., Bannai H., Takeda M.

    Leibniz International Proceedings in Informatics, LIPIcs   105   91 - 910   2018年05月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY. An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm √mlogm+ N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm2 + N) time.

    DOI: 10.4230/LIPIcs.CPM.2018.9

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85048268511&origin=inward

  • Lyndon factorization of grammar compressed texts revisited 査読有り 国際誌

    Furuya I., Nakashima Y., I T., Inenaga S., Bannai H., Takeda M.

    Leibniz International Proceedings in Informatics, LIPIcs   105   241 - 2410   2018年05月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY. We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n,N) + Q(n,N)n log logN) time and O(n logN + S(n,N)) space where P(n,N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

    DOI: 10.4230/LIPIcs.CPM.2018.24

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85048271134&origin=inward

  • Online LZ77 parsing and matching statistics with RLBWTs 査読有り 国際誌

    Bannai H., Gagie T., I T.

    Leibniz International Proceedings in Informatics, LIPIcs   105   71 - 712   2018年05月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY. Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse TR of a text T, to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of TR. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further -albeit making it static again and increasing its space by a factor proportional to the size of the alphabet -such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.

    DOI: 10.4230/LIPIcs.CPM.2018.7

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85048304169&origin=inward

  • Tighter Bounds and Optimal Algorithms for All Maximal α-gapped Repeats and Palindromes: Finding All Maximal α-gapped Repeats and Palindromes in Optimal Worst Case Time on Integer Alphabets 査読有り

    Gawrychowski P., I T., Inenaga S., Köppl D., Manea F.

    Theory of Computing Systems   62 ( 1 )   162 - 191   2018年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2017, The Author(s). An α-gapped repeat (α ≥ 1) in a word w is a factor uvu of w such that |uv| ≤ α|u|; the two occurrences of u are called arms of this α-gapped repeat. An α-gapped repeat is called maximal if its arms cannot be extended simultaneously with the same character to the right nor to the left. We show that the number of all maximal α-gapped repeats occurring in words of length n is upper bounded by 18αn. In the case of α-gapped palindromes, i.e., factors uvu⊺ with |uv|≤ α|u|, we show that the number of all maximal α-gapped palindromes occurring in words of length n is upper bounded by 28αn + 7n. Both upper bounds allow us to construct algorithms finding all maximal α-gapped repeats and/or all maximal α-gapped palindromes of a word of length n on an integer alphabet of size nO(1) in O(αn) time. The presented running times are optimal since there are words that have Θ(αn) maximal α-gapped repeats/palindromes.

    DOI: 10.1007/s00224-017-9794-5

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85028529401&origin=inward

  • A faster implementation of online run-length burrows-wheeler transform 査読有り

    Ohno T., Takabatake Y., I T., Sakamoto H.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   10765 LNCS   409 - 419   2018年01月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer International Publishing AG, part of Springer Nature 2018. Run-length encoding Burrows-Wheeler Transformed strings, resulting in Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in O(n lg r) time and O(r lg n) bits of space, where n is the length of input string S received so far and r is the number of runs in the BWT of the reversed S. We improve the state-of-the-art algorithm for online RLBWT in terms of empirical construction time. Adopting the dynamic list for maintaining a total order, we can replace rank queries in a dynamic wavelet tree on a run-length compressed string by the direct comparison of labels in a dynamic list. The empirical result for various benchmarks show the efficiency of our algorithm, especially for highly repetitive strings.

    DOI: 10.1007/978-3-319-78825-8_33

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85046002496&origin=inward

  • Block palindromes: A new generalization of palindromes 査読有り

    Goto K., I T., Bannai H., Inenaga S.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   11147 LNCS   183 - 190   2018年01月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer Nature Switzerland AG 2018. We study a new generalization of palindromes and gapped palindromes called block palindromes. A block palindrome is a string that becomes a palindrome when identical substrings are replaced with a distinct character. We investigate several properties of block palindromes and in particular, study substrings of a string which are block palindromes. In so doing, we introduce the notion of a maximal block palindrome, which leads to a compact representation of all block palindromes that occur in a string. We also propose an algorithm which enumerates all maximal block palindromes that appear in a given string T in O(|T|+||MBP(T)||) time, where ||MBP(T)|| is the output size, which is optimal unless all the maximal block palindromes can be represented in a more compact way.

    DOI: 10.1007/978-3-030-00479-8_15

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85054822428&origin=inward

  • Approximate Frequent Pattern Discovery in Compressed Space 査読有り

    FUKUNAGA Shouhei, TAKABATAKE Yoshimasa, I Tomohiro, SAKAMOTO Hiroshi

    IEICE Transactions on Information and Systems   101 ( 3 )   593 - 601   2018年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    <p>A grammar compression is a restricted context-free grammar (CFG) that derives a single string deterministically. The goal of a grammar compression algorithm is to develop a smaller CFG by finding and removing duplicate patterns, which is simply a frequent pattern discovery process. Any frequent pattern can be obtained in linear time; however, a huge working space is required for longer patterns, and the entire string must be preloaded into memory. We propose an <i>online</i> algorithm to address this problem approximately within compressed space. For an input sequence of symbols, <i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>,..., let <i>G<sub>i</sub></i> be a grammar compression for the string <i>a</i><sub>1</sub><i>a</i><sub>2</sub>…<i>a<sub>i</sub></i>. In this study, an online algorithm is considered one that can compute <i>G</i><sub><i>i</i>+1</sub> from (<i>G<sub>i</sub></i>,<i>a</i><sub><i>i</i>+1</sub>) without explicitly decompressing <i>G<sub>i</sub></i>. Here, let <i>G</i> be a grammar compression for string <i>S</i>. We say that variable <i>X</i> approximates a substring <i>P</i> of <i>S</i> within approximation ratio <i>δ</i> iff for any interval [<i>i</i>,<i>j</i>] with <i>P</i>=<i>S</i>[<i>i</i>,<i>j</i>], the parse tree of <i>G</i> has a node labeled with <i>X</i> that derives <i>S</i>[<i>l</i>,<i>r</i>] for a subinterval [<i>l</i>,<i>r</i>] of [<i>i</i>,<i>j</i>] satisfying |[<i>l</i>,<i>r</i>]|≥<i>δ</i>|[<i>i</i>,<i>j</i>]|. Then, <i>G</i> solves the frequent pattern discovery problem approximately within <i>δ</i> iff for any frequent pattern <i>P</i> of <i>S</i>, there exists a variable that approximates <i>P</i> within <i>δ</i>. Here, <i>δ</i> is called the approximation ratio of <i>G</i> for <i>S</i>. Previously, the best approximation ratio obtained by a polynomial time algorithm was Ω(1/lg<sup>2</sup>|<i>P</i>|). The main contribution of this work is to present a new lower bound Ω(1/<<sup>*</sup>|<i>S</i>|lg|<i>P</i>|) that is smaller than the previous bound when lg<sup>*</sup>|<i>S</i>|<lg|<i>P</i>|. Experimental results demonstrate that the proposed algorithm extracts sufficiently long frequent patterns and significantly reduces memory consumption compared to the offline algorithm in the previous work.</p>

    DOI: 10.1587/transinf.2017FCP0010

    CiNii Article

    その他リンク: https://ci.nii.ac.jp/naid/130006414054

  • LZ-ABT: A practical algorithm for α-balanced grammar compression 査読有り

    Ohno T., Goto K., Takabatake Y., I T., Sakamoto H.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   10979 LNCS   323 - 335   2018年01月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer International Publishing AG, part of Springer Nature 2018. We propose a new LZ78-style grammar compression algorithm, named LZ-ABT, which is a simple online algorithm to create, given a string of length N over an alphabet of size σ, an α -balanced grammar in O(N logN log σ) time and O(n) space in addition to the input string, where n is the grammar size to output. LZ-ABT can avoid the lower-bound of Ω(N5/4) time of the naive algorithms for LZMW and LZD, other LZ78-style compression algorithms, which was observed in [Badkobeh et al. SPIRE 2017, pp. 51–67]. We also show that the algorithm can be executed in compressed space, i.e., without storing the whole input string explicitly in memory: in O(N log2 N log σ) time and O(n) space, or O(N logN log σ) time and O(n log* N) space. We implement LZ-ABT running in O(N logN log σ) time and O(N) space and empirically show that its performance is competitive to LZD. This is the first practical implementation of α -balanced grammar compression to the best of our knowledge.

    DOI: 10.1007/978-3-319-94667-2_27

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85049890021&origin=inward

  • The Runs Theorem and Beyond 招待有り 査読有り

    I T.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   11088 LNCS   18 - 23   2018年01月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © 2018, Springer Nature Switzerland AG. Repetitions in a string are fundamental properties the string possesses, which have been extensively studied in word combinatorics and utilized in many efficient string processing algorithms. Particularly maximal repetitions, also known as runs, are useful for representing all the repetitions in the string. Since it was shown that the number of runs in a string of length n is upper bounded by O(n) [Kolpakov and Kucherov, FOCS, pp. 596–604, 1999], the following conjecture (known as the “runs” conjecture) have been attracting the attention of many researchers: The number of runs in a string of length n is upper bounded by n. This conjecture was recently solved affirmatively using a characterization based on Lyndon words [Bannai et al., SIAM J Comput, pp. 1501–1514, 2017]. The characterization not only gives a surprisingly simple proof to the 15-years open problem but also provides completely new insights on how repetitions are packed into a string. In this article, we will briefly review the runs theorem and some related topics.

    DOI: 10.1007/978-3-319-98654-8_2

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85053865754&origin=inward

  • Privacy-Preserving String Edit Distance with Moves 査読有り

    Nakagawa S., Sakamoto T., Takabatake Y., I T., Shin K., Sakamoto H.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   11223 LNCS   226 - 240   2018年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © 2018, Springer Nature Switzerland AG. We propose the first two-party protocol for securely computing an extended edit distance. The parties possessing their respective strings x and y want to securely compute the edit distance with move operations (EDM), that is, the minimum number of insertions, deletions, renaming of symbols, or substring moves required to transform x to y. Although computing the exact EDM is NP-hard, there exits an almost linear-time algorithm within the approximation ratio *NN for. We extend this algorithm to the privacy-preserving computation enlisting the homomorphic encryption scheme so that the party can obtain the approximate EDM without revealing their privacy under the semi-honest model.

    DOI: 10.1007/978-3-030-02224-2_18

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85055117003&origin=inward

  • A space-optimal grammar compression 査読有り

    Takabatake Y., I T., Sakamoto H.

    Leibniz International Proceedings in Informatics, LIPIcs   87   2017年09月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    A grammar compression is a context-free grammar (CFG) deriving a single string deterministically. For an input string of length N over an alphabet of size σ, the smallest CFG is O(lgN)-Approximable in the offline setting and O(lgN lg∗ N)-Approximable in the online setting. In addition, an information-Theoretic lower bound for representing a CFG in Chomsky normal form of n variables is lg(n!/nσ) + n + o(n) bits. Although there is an online grammar compression algorithm that directly computes the succinct encoding of its output CFG with O(lgN lg∗ N) approximation guarantee, the problem of optimizing its working space has remained open. We propose a fully-online algorithm that requires the fewest bits of working space asymptotically equal to the lower bound in O(N lg lg n) compression time. In addition we propose several techniques to boost grammar compression and show their efficiency by computational experiments.

    DOI: 10.4230/LIPIcs.ESA.2017.67

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85030564164&origin=inward

  • Inferring strings from Lyndon factorization 査読有り

    Nakashima Y., Okabe T., I T., Inenaga S., Bannai H., Takeda M.

    Theoretical Computer Science   689   147 - 156   2017年08月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2017 Elsevier B.V. The Lyndon factorization of a string w is a unique factorization ℓ1p1,…,ℓmpm of w such that ℓ1,…,ℓm is a sequence of Lyndon words that is monotonically decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem on Lyndon factorization: Given a sequence S=((s1,p1),…,(sm,pm)) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to the input sequence S, i.e., the Lyndon factorization of w is in a form of ℓ1p1,…,ℓmpm with |ℓi|=si for all 1≤i≤m. Firstly, we show that there exists a simple O(n)-time algorithm if the size of the alphabet is unbounded, where n is the length of the output string. Secondly, we present an O(n)-time algorithm to compute a string over an alphabet of the smallest size. Thirdly, we show how to compute only the size of the smallest alphabet in O(m) time. Fourthly, we give an O(m)-time algorithm to compute an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S.

    DOI: 10.1016/j.tcs.2017.05.038

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85020825934&origin=inward

  • Longest common extensions with recompression 査読有り

    I T.

    Leibniz International Proceedings in Informatics, LIPIcs   78   2017年07月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Tomohiro I. Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg(N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(nlg(N/n)) time and O(n + z lg(N/z)) space. Our algorithms are deterministic and always return correct answers.

    DOI: 10.4230/LIPIcs.CPM.2017.18

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85027268431&origin=inward

  • Faster Lyndon factorization algorithms for SLP and LZ78 compressed text 査読有り

    I T., Nakashima Y., Inenaga S., Bannai H., Takeda M.

    Theoretical Computer Science   656   215 - 224   2016年12月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2016 Elsevier B.V. We present two efficient algorithms which, given a compressed representation of a string w of length N, compute the Lyndon factorization of w. Given a straight line program (SLP) S of size n that describes w, the first algorithm runs in O(n2+P(n,N)+Q(n,N)nlog⁡n) time and O(n2+S(n,N)) space, where P(n,N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Given the Lempel–Ziv 78 encoding of size s for w, the second algorithm runs in O(slog⁡s) time and space.

    DOI: 10.1016/j.tcs.2016.03.005

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84961175476&origin=inward

  • Closed factorization 査読有り

    Badkobeh G., Bannai H., Goto K., I T., Iliopoulos C., Inenaga S., Puglisi S., Sugimoto S.

    Discrete Applied Mathematics   212   23 - 29   2016年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2016 Elsevier B.V. A closed string is a string with a proper substring that occurs in the string as a prefix and a suffix, but not elsewhere. Closed strings were introduced by Fici (2011) as objects of combinatorial interest in the study of Trapezoidal and Sturmian words. In this paper we present algorithms for computing closed factors (substrings) in strings. First, we consider the problem of greedily factorizing a string into a sequence of longest closed factors. We describe an algorithm for this problem that uses linear time and space. We then consider the related problem of computing, for every position in the string, the longest closed factor starting at that position. We describe a simple algorithm for the problem that runs in O(nlogn/loglogn) time, where n is the length of the string. This also leads to an algorithm to compute the maximal closed factor containing (i.e. covering) each position in the string in O(nlogn/loglogn) time. We also present linear time algorithms to factorize a string into a sequence of shortest closed factors of length at least two, to compute the shortest closed factor of length at least two starting at each position of the string, and to compute a minimal closed factor of length at least two containing each position of the string.

    DOI: 10.1016/j.dam.2016.04.009

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84967167017&origin=inward

  • オンライン文法圧縮による頻出パターン発見 査読有り

    福永 祥平, 高畠 嘉将, 井 智弘, 坂本 比呂志

    人工知能学会研究会資料 人工知能基本問題研究会 ( 一般社団法人 人工知能学会 )   101 ( 0 )   06   2016年08月

     詳細を見る

    記述言語:日本語   掲載種別:研究論文(学術雑誌)

    DOI: 10.11517/jsaifpai.101.0_06

    CiNii Article

    CiNii Research

    その他リンク: http://id.ndl.go.jp/bib/027587749

  • Fully dynamic data structure for LCE queries in compressed space 査読有り

    Nishimoto T., I T., Inenaga S., Bannai H., Takeda M.

    Leibniz International Proceedings in Informatics, LIPIcs   58   2016年08月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding G of size w = O(min(z logN log∗M,N)) [Mehlhorn et al., Algorithmica 17(2):183- 198, 1997] of T, which can be seen as a compressed representation of T, has a capability to support LCE queries in O(logN + log ℓ log∗M) time, where ℓ is the answer to the query, z is the size of the Lempel-Ziv77 (LZ77) factorization of T, and M ≥ 4N is an integer that can be handled in constant time under word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: After processing G in O(wfA) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + logN log∗M)fA) time, where fA = O(min{log log M log log w/log log log M, √log w/log log w}). This yields the first fully dynamic LCE data structure working in compressed space. We also present efficient construction algorithms from various types of inputs: We can construct G in O(NfA) time from uncompressed string T; in O(n log log(n log∗M) logN log∗M) time from grammar-compressed string T represented by a straight-line program of size n; and in O(zfA logN log∗M) time from LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing.

    DOI: 10.4230/LIPIcs.MFCS.2016.72

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85012885943&origin=inward

  • Deterministic sub-linear space LCE data structures with efficient construction 査読有り

    Tanimura Y., I T., Bannai H., Inenaga S., Puglisi S., Takeda M.

    Leibniz International Proceedings in Informatics, LIPIcs   54   1.1 - 1.10   2016年06月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Given a string S of n symbols, a longest common extension query LCE(i, j) asks for the length of the longest common prefix of the ith and jth suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:65-76) described several data structures for answering LCE queries that offers a trade-off between data structure size and query time. In particular, for a parameter 1 ≤ τ ≤ n, their best deterministic solution is a data structure of size O(n/τ) which allows LCE queries to be answered in O(τ) time. However, the construction time for all deterministic versions of their data structure is quadratic in n. In this paper, we propose a deterministic solution that achieves a similar space-time trade-off of O(τ min{log τ, log n/τ}) query time using O(n/τ) space, but we significantly improve the construction time to O(nτ).

    DOI: 10.4230/LIPIcs.CPM.2016.1

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85011977600&origin=inward

  • Efficiently finding all maximal α-gapped repeats 査読有り

    Gawrychowski P., I T., Inenaga S., Köppl D., Manea F.

    Leibniz International Proceedings in Informatics, LIPIcs   47   2016年02月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Paweł Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea; licensed under Creative Commons License CC-BY. For α ≥ 1, an α-gapped repeat in a word w is a factor uvu of w such that |uv| ≤ α|u|; the two occurrences of a factor u in such a repeat are called arms. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right nor to the left. We show that the number of all maximal α-gapped repeats occurring in words of length n is upper bounded by 18αn, allowing us to construct an algorithm finding all maximal α-gapped repeats of a word on an integer alphabet of size nO(1); in O(αn) time. This result is optimal as there are words that have Θ(αn) maximal α-gapped repeats. Our techniques can be extended to get comparable results in the case of α-gapped palindromes, i.e., factors uvuT with |uv| ≤ α|u|.

    DOI: 10.4230/LIPIcs.STACS.2016.39

    Scopus

    CiNii Research

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84961615105&origin=inward

  • Dynamic index and LZ factorization in compressed space 査読有り

    Nishimoto T., I T., Inenaga S., Bannai H., Takeda M.

    Proceedings of the Prague Stringology Conference, PSC 2016   158 - 171   2016年01月

     詳細を見る

    担当区分:責任著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    In this paper, we propose a new dynamic compressed index of O(w) space for a dynamic text T, where w = O(min(z logN log-M,N)) is the size of the signature encoding of T, z is the size of the Lempel-Ziv77 (LZ77) factorization of T, N is the length of T, and M ≥ 4N is an integer that can be handled in constant time under word RAM model. Our index supports searching for a pattern P in T in O(|P|fA + log w log |P| log-M(logN + log |P| log-M) + occ logN) time and insertion/deletion of a substring of length y in O((y + logN log-M) log w logN log-M) time, where fA = O(min{log logM log log w log log logM , √ log w log log w}). Also, we propose a new spaceefficient LZ77 factorization algorithm for a given text of length N, which runs in O(NfA + z log w log3 N(log- N)2) time with O(w) working space.

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85027267678&origin=inward

  • Deterministic sparse suffix sorting on rewritable texts 査読有り

    Fischer J., I T., Köppl D.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9644   483 - 496   2016年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer-Verlag Berlin Heidelberg 2016. Given a rewritable text T of length n on an alphabet of size σ, we propose an online algorithm computing the sparse suffix array and the sparse longest common prefix array of T in (formula presented) time by using the text space and O(m) additional working space, where m ≤ n is the number of suffixes to be sorted (provided online and arbitrarily), and c ≥ m is the number of characters that must be compared for distinguishing the designated suffixes.

    DOI: 10.1007/978-3-662-49529-2_36

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84961711921&origin=inward

  • Constructing LZ78 tries and position heaps in linear time for large alphabets 査読有り

    Nakashima Y., I T., Inenaga S., Bannai H., Takeda M.

    Information Processing Letters   115 ( 9 )   655 - 659   2015年09月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2015 Elsevier B.V. We present the first worst-case linear-time algorithm to compute the Lempel-Ziv 78 factorization of a given string over an integer alphabet. Our algorithm is based on nearest marked ancestor queries on the suffix tree of the given string. We also show that the same technique can be used to construct the position heap of a set of strings in worst-case linear time, when the set of strings is given as a trie.

    DOI: 10.1016/j.ipl.2015.04.002

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84927723236&origin=inward

  • Compressed automata for dictionary matching 査読有り

    I T., Nishimoto T., Inenaga S., Bannai H., Takeda M.

    Theoretical Computer Science   578   30 - 41   2015年05月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)

    © 2015 Elsevier B.V.. We address a variant of the dictionary matching problem where the dictionary is represented by a straight line program (SLP). For a given SLP-compressed dictionary D of size n and height h representing m patterns of total length N, we present an O(n2log N)-size representation of Aho-Corasick automaton which recognizes all occurrences of the patterns in D in amortized O(h+m) running time per character. We also propose an algorithm to construct this compressed Aho-Corasick automaton in O(n3log n log N) time and O(n2log N) space. In a spacial case where D represents only a single pattern, we present an O(n log N)-size representation of the Morris-Pratt automaton which permits us to find all occurrences of the pattern in amortized O(h) running time per character, and we show how to construct this representation in O(n3log n log N) time with O(n2log N) working space.

    DOI: 10.1016/j.tcs.2015.01.019

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84951866541&origin=inward

  • Detecting regularities on grammar-compressed strings 査読有り

    I T., Matsubara W., Shimohira K., Inenaga S., Bannai H., Takeda M., Narisawa K., Shinohara A.

    Information and Computation   240   74 - 89   2015年02月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © 2014 Elsevier Inc. All rights reserved. We address the problems of detecting and counting various forms of regularities in a string represented as a straight-line program (SLP) which is essentially a context free grammar in the Chomsky normal form. Given an SLP of size n that represents a string s of length N, our algorithm computes all runs and squares in s in O(n3h) time and O(n2) space, where h is the height of the derivation tree of the SLP. We also show an algorithm to compute all gapped-palindromes in O(n3h + gnh log N) time and O(n2) space, where g is the length of the gap. As one of the main components of the above solution, we propose a new technique called approximate doubling which seems to be a useful tool for a wide range of algorithms on SLPs. Indeed, we show that the technique can be used to compute the periods and covers of the string in O(n2h) time and O(nh(n + log2 N)) time, respectively.

    DOI: 10.1016/j.ic.2014.09.009

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85027953511&origin=inward

  • A new characterization of maximal repetitions by Lyndon trees 査読有り

    Bannai H., I T., Inenaga S., Nakashima Y., Takeda M., Tsuruta K.

    Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms   2015-January ( January )   562 - 571   2015年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    Copyright © 2015 by the Society for Industrial and Applied Mathmatics. We give a new characterization of maximal repetitions (or runs) in strings, using a tree defined on recursive standard factorizations of Lyndon words, called the Lyndon tree. The characterization leads to a remarkably simple novel proof of the linearity of the maximum number of runs p(n) in a string of length n. Furthermore, we show an upper bound of p(n) < 1.5n, which improves on the best upper bound 1.6n (Crochemore & Hie 2008) that does not rely on computational verification. The proof also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string.

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84938221386&origin=inward

  • Beyond the runs theorem 査読有り

    Fischer J., Holub Š., I T., Lewenstein M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9309   277 - 286   2015年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer International Publishing Switzerland 2015. In [3], a short and elegant proof was presented showing that a word of length n contains at most n - 3 runs. Here we show, using the same technique and a computer search, that the number of runs in a binary word of length n is at most 22/23n < 0.957n.

    DOI: 10.1007/978-3-319-23826-5_27

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84944682455&origin=inward

  • Arithmetics on suffix arrays of Fibonacci words 査読有り

    Köppl D., I T.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9304   135 - 146   2015年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer International Publishing Switzerland 2015. We study the sequence of Fibonacci words and some of its derivatives with respect to their suffix array, inverse suffix array and Burrows-Wheeler transform based on the respective suffix array.We show that the suffix array is a rotation of its inverse under certain conditions, and that the factors of the LZ77 factorization of any Fibonacci word yield again similar characteristics.

    DOI: 10.1007/978-3-319-23660-5_12

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84945955196&origin=inward

  • A faster algorithm for computing maximal α-gapped repeats in a string 査読有り

    Tanimura Y., Fujishige Y., I T., Inenaga S., Bannai H., Takeda M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9309   124 - 136   2015年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer International Publishing Switzerland 2015. A string x = uvu with both u, v being non-empty is called a gapped repeat with period p = |uv|, and is denoted by pair (x, p). If p ≤ α(|x| − p) with α > 1, then (x, p) is called an α-gapped repeat. An occurrence [i, i+|x|−1] of an α-gapped repeat (x, p) in a string w is called a maximal α-gapped repeat of w, if it cannot be extended either to the left or to the right in w with the same period p. Kolpakov et al. (CPM 2014) showed that, given a string of length n over a constant alphabet, all the occurrences of maximal α-gapped repeats in the string can be computed in O(α2n+occ) time, where occ is the number of occurrences. In this paper, we propose a faster O(αn + occ)-time algorithm to solve this problem, improving the result of Kolpakov et al. by a factor of α.

    DOI: 10.1007/978-3-319-23826-5_13

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84944682942&origin=inward

  • Inferring strings from full abelian periods 査読有り

    Nishida M., I T., Inenaga S., Bannai H., Takeda M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9472   768 - 779   2015年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer-Verlag Berlin Heidelberg 2015. Strings u, v are said to be Abelian equivalent if u is a permutation of the characters appearing in v. A string w is said to have a full Abelian period p if w = w1 …wk, where all wi’s are of length p each and are all Abelian equivalent. This paper studies reverse-engineering problems on full Abelian periods. Given a positive integer n and a set D of divisors of n, we show how to compute in O(n) time the lexicographically smallest string of length n which has all elements of D as its full Abelian periods and has the minimum number of full Abelian periods not in D. Moreover, we give an algorithm to enumerate all such strings in amortized constant time per output after O(n)-time preprocessing. Also, we show how to enumerate the strings which have all elements of D as its full Abelian periods in amortized constant time per output after O(n)-time preprocessing.

    DOI: 10.1007/978-3-662-48971-0_64

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84951950947&origin=inward

  • Semi-dynamic compact index for short patterns and succinct van Emde Boas tree 査読有り

    Matsuoka Y., I T., Inenaga S., Bannai H., Takeda M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9133   355 - 366   2015年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer International Publishing Switzerland 2015. We present a compact semi-dynamic text index which allows us to find short patterns efficiently. For parameters k ≤ q ≤ logσ n − logσ logσ n and alphabet size σ = O(polylog(n)), all occ occurrences of a pattern of length at most q−k+1 can be obtained in O(k×occ +logσ n) time, where n is the length of the text. Adding characters to the end of the text is supported in amortized constant time. Our index requires (n/k) log(n/k) + n log σ + o(n) bits of space, which is compact (i.e., O(n log σ)) when k = Θ(logσ n). As a byproduct, we present a succinct van Emde Boas tree which supports insertion, deletion, predecessor, and successor on a dynamic set of integers over the universe [0,m − 1] in O(log logm) time and requires only m + o(m) bits of space.

    DOI: 10.1007/978-3-319-19929-0_30

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84948993636&origin=inward

  • Lempel Ziv computation in small space (LZ-CISS) 査読有り

    Fischer J., I T., Köppl D.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9133   172 - 184   2015年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Springer International Publishing Switzerland 2015. For both the Lempel Ziv 77- and 78-factorization we propose factorization algorithms using (1 + ε)n lg n + O(n) bits (for any positive constant ε ≤ 1) working space (including the space for the output) for any text of size n over an integer alphabet in O(n/ε2) time.

    DOI: 10.1007/978-3-319-19929-0_15

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84948952875&origin=inward

  • Faster compact on-line Lempel-ziv factorization 査読有り

    Yamamoto J., I T., Bannai H., Inenaga S., Takeda M.

    Leibniz International Proceedings in Informatics, LIPIcs   25   675 - 686   2014年03月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in O(N logN) time and uses only O(N log σ) bits of working space, where N is the length of the string and σ is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either O(N log3N) time (Okanohara & Sadakane 2009) or O(N log2N) time (Starikovskaya 2012). The key to our new algorithm is in the utilization of an elegant but less popular index structure called Directed Acyclic Word Graphs, or DAWGs (Blumer et al. 1985). We also present an opportunistic variant of our algorithm, which, given the run length encoding of size m of a string of length N, computes the Lempel-Ziv factorization of the string on-line, in O (m · min{n (log logm)(log logN)/log log logN , √ logm/log logm o})time and O(mlogN) bits of space.

    DOI: 10.4230/LIPIcs.STACS.2014.675

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84907858098&origin=inward

  • Faster sparse suffix sorting 査読有り

    I T., Kärkkäinen J., Kempa D.

    Leibniz International Proceedings in Informatics, LIPIcs   25   386 - 396   2014年03月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Tomohiro I, Juha Kärkkäinen, and Dominik Kempa. The sparse suffix sorting problem is to sort b = o(n) arbitrary suffixes of a string of length n using o(n) words of space in addition to the string. We present an O(n) time Monte Carlo algorithm using O(b log b) space and an O(n log b) time Las Vegas algorithm using O(b) space. This is a significant improvement over the best prior solutions by Bille et al. (ICALP 2013): a Monte Carlo algorithm running in O(n log b) time and O(b1+ε) space or O(n log2b) time and O(b) space, and a Las Vegas algorithm running in O(n log2b+b2log b) time and O(b) space. All the above results are obtained with high probability not just in expectation.

    DOI: 10.4230/LIPIcs.STACS.2014.386

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84907811406&origin=inward

  • Inferring strings from suffix trees and links on a binary alphabet 査読有り

    I T., Inenaga S., Bannai H., Takeda M.

    Discrete Applied Mathematics   163 ( PART 3 )   316 - 325   2014年01月

     詳細を見る

    担当区分:筆頭著者   記述言語:英語   掲載種別:研究論文(学術雑誌)

    A suffix tree, which provides us with a linear space full-text index of a given string, is a fundamental data structure for string processing and information retrieval. In this paper we consider the reverse engineering problem on suffix trees: given an unlabeled ordered rooted tree T accompanied with a node-to-node transition function f, infer a string whose suffix tree and its suffix links for inner nodes are isomorphic to T and f, respectively. Also, we consider the enumeration problem in which we enumerate all strings corresponding to an input tree and links. By introducing new characterizations of suffix trees, we show that the reverse engineering problem and the enumeration problem on suffix trees on a binary alphabet can be solved in optimal time. © 2013 Elsevier B.V. All rights reserved.

    DOI: 10.1016/j.dam.2013.02.033

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84889665144&origin=inward

  • Inferring strings from Lyndon factorization 査読有り

    Nakashima Y., Okabe T., I T., Inenaga S., Bannai H., Takeda M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8635 LNCS ( PART 2 )   565 - 576   2014年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    The Lyndon factorization of a string w is a unique factorization ℓp11,⋯, ℓpmm of w s.t. ℓ1,⋯, ℓm is a sequence of Lyndon words that is monotonically decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem on Lyndon factorization: Given a sequence S = ((s1, p1),⋯, (sm, p m)) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to the input sequence S, i.e., the Lyndon factorization of w is in a form of ℓp11,⋯, ℓpmm with |ℓi| = si for all 1 ≤ i ≤ m. Firstly, we show that there exists a simple O(n)-time algorithm if the size of the alphabet is unbounded, where n is the length of the output string. Secondly, we present an O(n)-time algorithm to compute a string over an alphabet of the smallest size. Thirdly, we show how to compute only the size of the smallest alphabet in O(m) time. Fourthly, we give an O(m)-time algorithm to compute an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S. © 2014 Springer-Verlag Berlin Heidelberg.

    DOI: 10.1007/978-3-662-44465-8_48

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84906230308&origin=inward

  • Computing palindromic factorizations and palindromic covers on-line 査読有り

    I T., Sugimoto S., Inenaga S., Bannai H., Takeda M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8486   150 - 161   2014年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    A palindromic factorization of a string w is a factorization of w consisting only of palindromic substrings of w. In this paper, we present an on-line O(n logn)-time O(n)-space algorithm to compute smallest palindromic factorizations of all prefixes of w, where n is the length of a given string w. We then show how to extend this algorithm to compute smallest maximal palindromic factorizations of all prefixes of w, consisting only of maximal palindromes (non-extensible palindromic substring) of each prefix, in O(n logn) time and O(n) space, in an on-line manner. We also present an on-line O(n)-time O(n)-space algorithm to compute a smallest palindromic cover of w. © 2014 Springer International Publishing Switzerland.

    DOI: 10.1007/978-3-319-07566-2_16

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84958527699&origin=inward

  • Closed factorization 査読有り

    Badkobeh G., Bannai H., Goto K., I T., Iliopoulos C., Inenaga S., Puglisi S., Sugimoto S.

    Proceedings of the Prague Stringology Conference 2014, PSC 2014   162 - 168   2014年01月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)

    © Czech Technical University in Prague, Czech Republic. A closed string is a string with a proper substring that occurs in the string as a prefix and a suffix, but not elsewhere. Closed strings were introduced by Fici (Proc. WORDS, 2011) as objects of combinatorial interest in the study of Trapezoidal and Sturmian words. In this paper we consider algorithms for computing closed factors (substrings) in strings, and in particular for greedily factorizing a string into a sequence of longest closed factors. We describe an algorithm for this problem that uses linear time and space. We then consider the related problem of computing, for every position in the string, the longest closed factor starting at that position. We describe a simple algorithm for the problem that runs in O(n log n/ log log n) time.

    Scopus

    その他リンク: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84928791807&origin=inward

▼全件表示

著書

  • Information Processing on Compressed Data

    Takabatake Y., Tomohiro I., Sakamoto H.(共著)

    Sublinear Computation Paradigm: Algorithmic Revolution in the Big Data Era  2021年01月  ( ISBN:9789811640957, 9789811640971

     詳細を見る

    記述言語:英語

    We survey our recent work related to information processing on compressed strings. Note that a “string” here contains any fixed-length sequence of symbols and therefore includes not only ordinary text but also a wide range of data, such as pixel sequences and time-series data. Over the past two decades, a variety of algorithms and their applications have been proposed for compressed information processing. In this survey, wemainly focus on two problems: recompression and privacy-preserving computation over compressed strings. Recompression is a framework in which algorithms transform a given compressed data into another compressed format without decompression. Recent studies have shown that a higher compression ratio can be achieved at lower cost by using an appropriate recompression algorithm such as preprocessing. Furthermore, various privacy-preserving computation models have been proposed for information retrieval, similarity computation, and pattern mining.

    DOI: 10.1007/978-981-16-4095-7_6

    Scopus

学術関係受賞

  • IWOCA best paper award

    Program Committee of the 33rd International Workshop on Combinatorial Algorithms   Computing Longest (Common) Lyndon Subsequences   2022年06月

    Hideo Bannai, Tomohiro I, Tomasz Kociumaka, Dominik Köppl, Simon J. Puglisi

     詳細を見る

    受賞国:ドイツ連邦共和国

    We present an algorithm computing the longest periodic subsequence of a string of length n in O(n^7) time with O(n^4) words of space. We obtain improvements when restricting the exponents or extending the search allowing the reported subsequence to be subperiodic down to O(n^3) time and O(n^2) words of space.

科研費獲得実績

  • 高効率で多様な文字列処理を実現する圧縮変換の理論

    研究課題番号:16K16009  2016年04月 - 2019年03月   若手研究(B)

担当授業科目(学内)

  • 2023年度   圧縮情報処理特論MI

  • 2023年度   圧縮情報処理特論AI

  • 2023年度   圧縮情報処理特論DS

  • 2023年度   知能情報工学特別講義

  • 2023年度   文字列データ処理

  • 2023年度   知能情報工学基礎実験

  • 2023年度   オートマトンと言語理論

  • 2023年度   情報工学概論

  • 2023年度   離散数学Ⅰ

  • 2022年度   離散数学Ⅰ

  • 2022年度   情報工学概論

  • 2022年度   オートマトンと言語理論

  • 2022年度   知能情報工学基礎実験

  • 2022年度   文字列データ処理

  • 2022年度   圧縮情報処理特論DS

  • 2022年度   圧縮情報処理特論AI

  • 2022年度   圧縮情報処理特論MI

  • 2021年度   圧縮情報処理特論

  • 2021年度   知能情報工学特別講義

  • 2021年度   文字列データ処理

  • 2021年度   オートマトンと言語理論

  • 2021年度   離散数学Ⅰ

  • 2020年度   圧縮情報処理特論

  • 2020年度   文字列データ処理

  • 2020年度   離散数学Ⅰ

  • 2019年度   離散数学Ⅰ

▼全件表示

学会・委員会等活動

  • 31st International Symposium on String Processing and Information Retrieval (SPIRE) 2024   31st International Symposium on String Processing and Information Retrieval (SPIRE) 2024 プログラム委員  

    2024年06月 - 2024年09月

  • 28th International Conference on Developments in Language Theory (DLT) 2024   28th International Conference on Developments in Language Theory (DLT) 2024 プログラム委員  

    2024年03月 - 2024年08月

  • 電子情報通信学会   『Special Section on Foundations of Computer Science』 小特集編集委員会 編集委員  

    2024年01月 - 2025年03月

  • 35th Annual Symposium on Combinatorial Pattern Matching (CPM) 2024   Local organizing committee, member  

    2023年09月 - 2024年06月

  • 電子情報通信学会   『Special Section on Foundations of Computer Science』 小特集編集委員会 編集委員  

    2023年01月 - 2024年03月

  • 電子情報通信学会   『Special Section on Foundations of Computer Science』 小特集編集委員会 編集委員  

    2022年03月 - 2023年03月

  • 33rd Annual Symposium on Combinatorial Pattern Matching (CPM) 2022   33rd Annual Symposium on Combinatorial Pattern Matching (CPM) 2022 プログラム委員  

    2022年01月 - 2022年07月

  • 28th International Symposium on String Processing and Information Retrieval (SPIRE) 2021   28th International Symposium on String Processing and Information Retrieval (SPIRE) 2021 プログラム委員  

    2021年05月 - 2021年11月

  • 電子情報通信学会   『Special Section on Foundations of Computer Science』 小特集編集委員会 編集委員  

    2021年01月 - 2022年03月

  • 32nd Annual Symposium on Combinatorial Pattern Matching (CPM) 2021   32nd Annual Symposium on Combinatorial Pattern Matching (CPM) 2021 プログラム委員  

    2021年01月 - 2021年08月

  • 27th International Symposium on String Processing and Information Retrieval (SPIRE) 2020   27th International Symposium on String Processing and Information Retrieval (SPIRE) 2020 プログラム委員  

    2020年05月 - 2020年10月

  • 24th International Conference on Developments in Language Theory (DLT) 2020   24th International Conference on Developments in Language Theory (DLT) 2020 プログラム委員  

    2020年01月 - 2020年05月

  • 23rd International Conference on Developments in Language Theory (DLT) 2019   23rd International Conference on Developments in Language Theory (DLT) 2019 プログラム委員  

    2019年03月 - 2019年08月

  • 30th Annual Symposium on Combinatorial Pattern Matching (CPM) 2019   30th Annual Symposium on Combinatorial Pattern Matching (CPM) 2019 プログラム委員  

    2019年01月 - 2019年06月

  • 25th International Symposium on String Processing and Information Retrieval (SPIRE) 2018   25th International Symposium on String Processing and Information Retrieval (SPIRE) 2018 プログラム委員  

    2018年05月 - 2018年10月

  • 情報処理学会 アルゴリズム研究会   運営委員  

    2018年04月 - 2022年03月

  • 24th International Symposium on String Processing and Information Retrieval (SPIRE) 2017   24th International Symposium on String Processing and Information Retrieval (SPIRE) 2017 プログラム委員  

    2017年05月 - 2017年09月

  • 27th Annual Symposium on Combinatorial Pattern Matching (CPM) 2016   27th Annual Symposium on Combinatorial Pattern Matching (CPM) 2016 プログラム委員  

    2016年01月 - 2016年06月

▼全件表示

国際交流窓口担当

  • 大連理工大学  中華人民共和国  2024年06月 - 現在