BookBot: A Chatbot for Reading Books

(Original Japanese title: BookBot:読書のためのチャットボット)


2020/1/1
Akito Fujita


This document has been translated from the Japanese version using DeepL.


It's been a very long time.
What have I been doing during these four months of silence?
...I was building this.

bookbot.glitch.me

Contrary to many people's expectations, it is a pure JavaScript web application. In this article, I will give an overview of this system, BookBot.


BookBot: A Chatbot for Reading Books

BookBot, as the name suggests, is a chatbot for reading: it aims to be a chatbot that users can chat with about books. The idea came up when I was talking about chatbots with a close friend of mine, apparently a prolific reader, who said, "If the book itself could talk about what it says, it would save me the trouble of reading it."

In fact, when we try to develop a dialogue system using machine learning techniques, the dialogue corpus becomes a problem. A dialogue corpus is a set of question-answer pairs that must be prepared as training data. It is said that for a machine-learning dialogue system to carry on a reasonable conversation, about 100,000 question-answer pairs must be collected as training data, and dialogue records of that size are not easy to come by.

However, if you prepare a dialogue corpus by hand in order to develop a chatbot, you do not need nearly that much data. According to a New York Times interview with Loebner Prize winner Richard Wallace, in the early development of Alicebot the bot was first piloted in an office, and Wallace reportedly filled in a response for every question from users (in the office) that the bot could not answer.

If he taught Alice a new response every time he saw it baffled by a question, he would eventually cover all the common utterances and even many unusual ones. Wallace figured the magic number was about 40,000 responses. Once Alice had that many preprogrammed statements, it -- or ''she,'' as he'd begun to call the program fondly -- would be able to respond to 95 percent of what people were saying to her.

He found that if you assume a personality for a chatbot and prepare appropriate response texts for it, the chatbot can respond appropriately without using machine learning techniques. In other words, "We have failed to master human conversation, not because we are too complex, but because we are too simple." This is the insight Richard Wallace arrived at.

However, "manually inputting 40,000 questions and responses"...

...and that's when my friend's idea of "personifying books" came to mind.

To begin with, books (and blogs, for that matter) are statements by the author. If you look to a book for a chatbot's responses, the chatbot's character becomes the author's own. And if the author has written a substantial amount of text, shouldn't it be possible to find the most appropriate sentence for any given question without writing new responses from scratch?

In the end, his comment led me to the idea of using the original text of books as a dialogue corpus. In a nutshell, the idea is to create a pseudo-dialogue corpus by applying information retrieval techniques.
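
To make the idea concrete, here is a deliberately naive sketch (my own illustration, not BookBot's actual retrieval code): treat every sentence of the original text as a candidate response and return the one that shares the most words with the user's question. A real implementation would use proper information-retrieval scoring such as TF-IDF and would need Japanese tokenization, but the principle is the same.

// bestSentence -- return the sentence of the source text that shares the
// most words with the question (a stand-in for real IR scoring)
function bestSentence(question, sentences) {
  var words = question.toLowerCase().split(/\W+/).filter(Boolean);
  var best = null;
  var bestScore = -1;
  sentences.forEach(function (sentence) {
    var score = sentence.toLowerCase().split(/\W+/).filter(function (w) {
      return words.indexOf(w) >= 0;
    }).length;
    if (score > bestScore) {
      bestScore = score;
      best = sentence;
    }
  });
  return best;
}

// The "book" answers a question with one of its own sentences.
console.log(bestSentence(
  'Who developed ELIZA?',
  ['ELIZA was developed by Joseph Weizenbaum.',
   'A dialogue corpus is a set of question-answer pairs.']
));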


System Overview

Here is an overview of the system as it currently stands.

First of all, the platform is Glitch. It is basically a typical web application written in JavaScript with node.js + express.js. It is based on the hello-express and hello-sqlite starter projects provided by Glitch, so their basic structure has been retained. I also used BotUI because I wanted a chatbot-like UI, and it does make it look like a chatbot.
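
For reference, the server side is little more than the usual hello-express pattern. The sketch below is illustrative only; the route and file names are my assumptions, not the actual BookBot code.

var express = require('express');
var app = express();

// Serve the BotUI-based chat page and client scripts from /public
app.use(express.static('public'));

// Hypothetical API: return the script entry for one section of a book
app.get('/api/section/:id', function (req, res) {
  res.json({ id: req.params.id, text: '...' });
});

// Glitch supplies the port via the PORT environment variable
var listener = app.listen(process.env.PORT, function () {
  console.log('BookBot is listening on port ' + listener.address().port);
});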

Unlike most chatbots, BookBot starts out as a book browser. The startup screen looks like this.


(Screenshot: BookBot startup screen)

Thanks to BotUI, when you view it in a phone's web browser it looks like a native phone app. You can then use BotUI's button feature to move to the next page, the previous page, or the previous/next heading.
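
In BotUI terms, the paging buttons boil down to something like the following sketch. The container id, button labels, and values are placeholders of mine, and the actual handling of the script is omitted.

var botui = new BotUI('bookbot-app');   // id of the BotUI container element in the page

// Show the current passage, then offer navigation buttons.
botui.message.add({ content: 'Section 1 ...' }).then(function () {
  return botui.action.button({
    action: [
      { text: 'Next page', value: 'next' },
      { text: 'Previous page', value: 'prev' },
      { text: 'Next heading', value: 'nextHeading' },
      { text: 'Previous heading', value: 'prevHeading' }
    ]
  });
}).then(function (res) {
  // res.value identifies the pressed button; move through the script accordingly
  console.log(res.value);
});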


Blog posts currently included in the collection

Unfortunately, it is still a prototype, so you can only browse blog posts I have written in the past. At the moment the following seven are included.

I'm sorry. They are all written in Japanese.

Chapter Title
1 When did AI (Artificial Intelligence) start? The Inside Story of the Dartmouth Conference
2 Do you know what Artificial Brainlessness is?
3 ELIZA (1) What is a program that "mimics listening"?
4 ELIZA (2) What is Client-Centered Therapy?
5 ELIZA (3) Scripting - A mechanism to create a response
6 ELIZA (4) DOCTOR Scripts
7 ELIZA (5) What is the background of the development?

I have about 30 blog posts in total, and I plan to add more as needed. I also plan to implement a function to call up the chatbot from any page.


Script generation from Markdown documents

I spent a lot of time designing BookBot's script format. In the end, I settled on a flat structure based on the chapter structure of the original article, that is, a "chain of sections." The main reason is that such a structure can be generated mechanically from the original article, but the decision was also heavily influenced by Weizenbaum's second paper.*1
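
As a rough illustration of what a "chain of sections" means (the script format is still in flux, so the field names below are only assumptions), a script is essentially a flat array of sections kept in the order of the original article:

// One entry per section, chained in the order of the original article.
var script = [
  { id: 'sec-1', heading: 'First heading',  paragraphs: ['...'], prev: null,    next: 'sec-2' },
  { id: 'sec-2', heading: 'Second heading', paragraphs: ['...'], prev: 'sec-1', next: 'sec-3' }
  // ...
];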

For now, I assume that articles are written in Markdown, and I wrote a program that automatically generates BookBot scripts from Markdown documents. For interpreting the Markdown I used remark, which I introduced in an earlier post. The source of the module used for parsing, parse-markdown.js, is attached at the end of this article.

The parseMarkdown module expands a Markdown document into a syntax tree called a Markdown Abstract Syntax Tree (MDAST). Two plugins are worth noting.

One is 'remark-gfm'. remark's Markdown parser interprets Markdown documents according to the CommonMark Spec; this plugin additionally interprets the Markdown used on GitHub, i.e. the GitHub Flavored Markdown Spec.

The other is 'remark-footnotes', a plugin that supports the following three types of footnote notation in Markdown.

Here is a footnote reference,[^1]
another,[^longnote],
and optionally there are inline
notes.^[you can type them inline, which may be easier, since you don’t
have to pick an identifier and move down to type the note.]

[^1]: Here is the footnote.

[^longnote]: Here’s one with multiple blocks.

    Subsequent paragraphs are indented to show that they
belong to the previous footnote.

        { some.code }

    The whole paragraph can be indented, or just the first
    line.  In this way, multi-paragraph footnotes work like
    multi-paragraph list items.

This paragraph won’t be part of the note, because it
isn’t indented.


To create a glossary

The underlined parts of BookBot's responses are links. For example, when you see the following...

(Screenshot: before clicking the link)

Click on the underlined "smart speaker"...

(Screenshot: after clicking the link)

...and a description of the "smart speaker" will be displayed.

I'm actually using Markdown's Link Reference Definitions for this (I believe they are called "indirect links" in Japanese).

Example 161 of the CommonMark Spec:

[foo]: /url "title"

[foo]

...is the format; BookBot uses the "title" part as the "glossary" entry, as shown below.

[スマートスピーカー]: https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%9E%E3%83%BC%E3%83%88%E3%82%B9%E3%83%94%E3%83%BC%E3%82%AB%E3%83%BC "用語: スマートスピーカー(英: Smart Speaker)とは、対話型の音声操作に対応したAIアシスタント機能を持つスピーカー。内蔵されているマイクで音声を認識し、情報の検索や連携家電の操作を行う。日本ではAIスピーカーとも呼ばれる。"

The URL in this notation is a link to Wikipedia, and the "Glossary" contains the first sentence of the Wikipedia page.

Note that "term" at the top of the "glossary" is a category that is aware of the Named-entity tag. This link definition is not displayed when Markdown is converted to HTML or PDF, but the term is linked to. For example, if you write the following in Markdown...

突如、[人工無脳][]に執着し始めた理由は[スマートスピーカー][]の登場でした。

If you open it in a normal browser, it will look like this

突如、人工無脳に執着し始めた理由はスマートスピーカーの登場でした。

In the case of BookBot, it displays the "term description" written in the title.

There is another use for Link Reference Definitions. For example, if you write the following in Markdown...

[Alexa][スマートスピーカー]のアプリ([スキル][]って言うのかな?)の開発者の世代では「対話」と言うとチャットボットのイメージが立ち上がるのでしょうか?

In the browser...

Alexaのアプリ(スキルって言うのかな?)の開発者の世代では「対話」と言うとチャットボットのイメージが立ち上がるのでしょうか?

...only "Alexa" is displayed, but clicking on it takes you to the Wikipedia page on "smart speakers". BookBot actually uses this notation to define synonyms. In this way, BookBot builds its glossary with a few small tricks.
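
Under the hood, these Link Reference Definitions appear in the MDAST tree as "definition" nodes carrying identifier, url, and title fields, so building the glossary is essentially a tree walk. The following is a minimal sketch of that idea over the tree produced by parse-markdown.js; it is not BookBot's actual code.

// collectGlossary -- gather every Link Reference Definition whose title
// carries a "用語: ..." description into a glossary lookup table
function collectGlossary(tree) {
  var glossary = {};
  (function walk(node) {
    if (node.type === 'definition' && node.title) {
      glossary[node.identifier] = { url: node.url, description: node.title };
    }
    (node.children || []).forEach(walk);
  })(tree);
  return glossary;
}

// Example: glossary['スマートスピーカー'].description holds the text that is
// shown when the underlined term is clicked.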


Future work

Since the essential dialogue function is still missing, the real work on BookBot is only just beginning.

I would like to start by investigating word2vec.

parse-markdown.js

'use strict'

var vfile = require('to-vfile');
var report = require('vfile-reporter');
var unified = require('unified');
var parse = require('remark-parse');
var gfm = require('remark-gfm');
var footnotes = require('remark-footnotes');


// rmposition -- remove position data from the MDAST tree

function rmposition() {
  return transformer;

  function transformer(tree, file) {
    traverse(tree);
  }

  function traverse(tree) {
    if (tree.position) delete tree.position;
    if (!tree.children) return
    for (var i = 0; i < tree.children.length; i++) {
      traverse(tree.children[i]);
    }
  }
}


// pluginCompile -- final (compiler) stage: return the tree itself as the result

function pluginCompile() {
  this.Compiler = compiler;

  function compiler(tree) {
    //console.log(tree);
    return(tree);
  }
}


// parseMarkdown -- parse a Markdown file and return its MDAST tree
//                  (position data removed, GFM and footnotes enabled)

function parseMarkdown(name) {
  var processor = unified()
      .use(parse)
      .use(gfm)
      .use(footnotes, {inlineNotes: true})
      .use(rmposition)
      .use(pluginCompile);

  //console.log("# %s", name);
  processor.process(vfile.readSync(name), done);

  var result;
  function done(err, file) {
    //console.error(report(err || file));
    //console.log(file.result);
    result = file.result;
  }

  return(result);
}

module.exports = parseMarkdown;
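
For completeness, calling the module looks like this (the file name is just a placeholder). Because all of the plugins above run synchronously, the done callback fires before process() returns, which is why result is already set when parseMarkdown returns it.

var parseMarkdown = require('./parse-markdown');

var tree = parseMarkdown('article.md');     // returns the MDAST root node
console.log(JSON.stringify(tree, null, 2)); // e.g. { "type": "root", "children": [...] }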

*1: In his second paper, Weizenbaum examines the structure of conversation by comparing "cocktail party chatter" with a "discussion between two physicists".

Cocktail party chatter, for example, has a rather straight line character. Context is constantly being changed -- there is considerable chaining of nodes -- but there is hardly any reversal of direction along already established structure. The conversation is inconsequential in that nothing being said has any effect on any questions raised on a higher level. Contrast this with a discussion between, say, two physicists trying to come to understand the results of some experiment. Their conversation tree would be not only deep but broad as well, i.e., they would ascend to an earlier contextual level in order to generate new nodes from there. The signal that their conversation terminated successfully might well be that they ascended (back to) the original node, i.e., that they are again talking about what they started to discuss.

This passage led me to spend a lot of time thinking about what the "conversation of reading" is. For example, if we regard reading as an activity for gaining new knowledge, we can liken it to a conversation between two people with different levels of knowledge, i.e. a conversation between a teacher and a student (a typical example being the so-called Zen dialogue). In this case, the conversation proceeds based on the teacher's knowledge. Unlike "cocktail party chatter" it is not aimless, but because the student's level of knowledge is taken into account, it is neither as broad nor as deep as a "discussion between two physicists."

Also, in the case of reading, the knowledge of the more knowledgeable speaker is static (it is not dialogue spun up on the spot), so if we call this act a conversation, it is characterized by less divergence than other kinds. I believe authors deliberately decide how to organize their chapters in order to promote the reader's understanding, but since readers' knowledge is not as uniform as authors assume (a reader may know this but not that), readers do not always follow the table of contents. This kind of reader guidance also has to be taken into account.

As I thought about how to structure the script so that it could handle these issues gracefully, I could not settle on a single best solution. In the end, I decided to follow the way most people actually read, i.e. the chapter structure of the book, and to assume that if you do not understand the passage (section) you are currently reading, you jump to another passage (section) of the book that explains it.

In terms of response generation, I felt the need to be able to set a combined strategy for each passage (section): rule-based responses like ELIZA, responses generated by machine learning, and procedural responses driven by an automaton. This idea is also an extension of the "hierarchy of scripts" proposed by Weizenbaum.
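
Concretely, I imagine a dispatch along the lines of the sketch below, where each section's script entry names the strategy that answers questions asked while the reader is in that section. All names here are assumptions; none of this is in the released code.

// One response strategy per section; the section's script entry selects which.
var strategies = {
  rule:      function (q, section) { return 'ELIZA-style rule-based reply'; },
  generated: function (q, section) { return 'reply produced by a learned model'; },
  automaton: function (q, section) { return 'fixed reply driven by a procedure'; }
};

function respond(question, section) {
  var strategy = strategies[section.strategy] || strategies.rule;
  return strategy(question, section);
}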

The implementation of this scripting layer is still in flux, so I will not be actively publishing the code for the time being. If you want to see it, please use Glitch's editor to view the source as-is.