Proudly Powered by Wikipedia.

Written Japanese Corpus

Outline

This website uses article data from the Japanese Wikipedia as its corpus.

The corpus contains 13,828,652 sentences, 384,648,362 words, and 1,502,987 unique words.

With this corpus, you can study word collocations and also learn written Japanese.

Example of whole sentence display

Display example 1

Example of N-gram display

Display example 2

Display Format

Two display formats are available.

  • Display "whole sentences"
  • Display "N-gram"

Available N-gram options are 3g, 5g, 7g, 9g, 11g and 13g.

Display Format    Description
Whole Sentence    Displays sentences containing the search query.
3g                The search query plus 1 word before and after it, 3 words in total.
5g                The search query plus 2 words before and after it, 5 words in total.
7g                The search query plus 3 words before and after it, 7 words in total.
9g                The search query plus 4 words before and after it, 9 words in total.
11g               The search query plus 5 words before and after it, 11 words in total.
13g               The search query plus 6 words before and after it, 13 words in total.
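
The N-gram options can be illustrated with a minimal Python sketch. It assumes each sentence has already been split into a list of words; the function name `ngram_windows` and the choice to shorten a window at the edge of a sentence are illustrative assumptions, not a description of how this website is implemented.

```python
from typing import List


def ngram_windows(tokens: List[str], query: str, n: int) -> List[List[str]]:
    """Return the n-word windows centered on each occurrence of `query`.

    Hypothetical helper: `tokens` is one sentence, already split into words.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd number, 3 or greater")
    half = (n - 1) // 2  # words shown on each side of the query
    windows = []
    for i, token in enumerate(tokens):
        if token == query:
            # Assumption: a window near the sentence edge is simply shorter.
            start = max(0, i - half)
            windows.append(tokens[start:i + half + 1])
    return windows


tokens = ["吾輩", "は", "猫", "で", "ある"]
print(ngram_windows(tokens, "猫", 3))  # [['は', '猫', 'で']]
```

With `n=5` the same call would return the whole five-word sentence, since two words are available on each side of 猫.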

Searching for two or more words

Separate the words with a space.

Example: 猫 と
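
As a rough Python sketch of how a space-separated query such as the one above might be handled: the query is split into words, and the words are then looked up as a consecutive sequence in a sentence that has already been split into words. Both function names are hypothetical, and treating the words as a consecutive sequence is an assumption; the website may match them differently.

```python
from typing import List


def parse_query(query: str) -> List[str]:
    """Split a search query into words.

    str.split() with no argument splits on any whitespace, so both the
    half-width space and the full-width space (U+3000) work as separators.
    """
    return query.split()


def find_phrase(tokens: List[str], words: List[str]) -> List[int]:
    """Return the start positions where `words` occur consecutively in `tokens`."""
    k = len(words)
    if k == 0:
        return []
    return [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == words]


words = parse_query("猫 と")
print(words)  # ['猫', 'と']
print(find_phrase(["犬", "と", "猫", "と", "鳥"], words))  # [2]
```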

Search Results

Parts of a sentence that match the query are highlighted in red.

With the whole sentence option, the number of matches is the number of sentences that match the search query.

With an N-gram option, the number of matches is also the number of sentences that match the search query. However, if a sentence contains more than one part that matches the query, the number of displayed results may exceed the number of matching sentences.
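
The difference between the number of matches and the number of displayed results can be shown with a small Python sketch. It assumes sentences are already split into words and uses a hypothetical function `count_matches`; it illustrates the counting rule described above and is not the website's actual code.

```python
from typing import List, Tuple


def count_matches(
    sentences: List[List[str]], query: str, n: int
) -> Tuple[int, List[List[str]]]:
    """Return (number of matching sentences, list of N-gram rows to display)."""
    half = (n - 1) // 2  # words shown on each side of the query
    matched_sentences = 0
    rows = []
    for tokens in sentences:
        hits = [i for i, token in enumerate(tokens) if token == query]
        if hits:
            matched_sentences += 1  # a sentence is counted once
        for i in hits:
            # Every occurrence produces its own displayed row.
            rows.append(tokens[max(0, i - half):i + half + 1])
    return matched_sentences, rows


sentences = [
    ["猫", "と", "犬", "と", "鳥"],  # the query occurs twice here
    ["猫", "と", "遊ぶ"],            # and once here
]
matched, rows = count_matches(sentences, "と", 3)
print(matched)    # 2
print(len(rows))  # 3
```

Here two sentences match, but three rows are displayed because the first sentence contains the query twice.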

ASCII characters and languages other than Japanese are not covered.

About Data

A separate page summarizes how the corpus data was prepared, from obtaining the Japanese Wikipedia article data to processing it. → Link: How to create a corpus from Wikipedia article data

As of June 1, 2015, the corpus contains 13,828,652 sentences, 384,648,362 words, and 1,502,987 unique words (excluding ASCII characters).

Notice

This website uses Wikipedia's data and is made available under the Creative Commons Attribution-ShareAlike License 3.0 or later.

NEITHER THIS WEBSITE NOR I SHALL BE LIABLE TO YOU OR TO ANY OTHER PARTY FOR ANY DAMAGES, COSTS, OR LOSSES. This website and I are not affiliated with Wikipedia.