SOFTWARE EMULATION OF A HARDWARE VOICE SYNTHESISER (2017)


ABSTRACT

The aim of this project was to develop a fully functional emulator of the Speech Plus CallText 5010 hardware voice synthesiser used by Professor Stephen Hawking. Successful completion of the project would allow him to preserve his voice and would greatly reduce the complexity of the communication system he had been using. Only two fully working hardware boards were known to exist, and these were already showing major signs of wear. The goal was to retain the exact characteristics of the voice and all the functionality of the original board. This was achieved by reverse engineering the Digital Signal Processor (DSP) chip on the board, developing an emulator of the chip and merging it with an existing custom-made CPU (Central Processing Unit) emulator. The operation of both emulators was carefully verified and validated at all stages of development by comparing their output with the hardware and making sure that the results were bit-perfect. The origin of the project dates back to 2010, and the final result is a collective effort by a number of people. There had been numerous attempts to copy the behaviour of the synthesiser in the past; however, the emulator created as part of this project was the first to be accepted and used by Professor Hawking.

Visiting Professor Hawking with the Intel Labs team (Lama Nachman, Mark Green, Sangita Sharma, Pawel Wozniak, Max Pinaroc).

KEYWORDS

CPU – Central Processing Unit
DAC – Digital-to-Analogue Converter
DR – Data Register
DSP – Digital Signal Processor
ISA – Industry Standard Architecture
RAM – Random Access Memory
ROM – Read-Only Memory
SPI – Signal Processing Interface
SR – Status Register
TTS – Text-To-Speech
USART – Universal Synchronous/Asynchronous Receiver/Transmitter

ACKNOWLEDGEMENTS

I would like to give special thanks to all the people involved in the project, namely: Peter Benie, Jonathan Wood, Sam Blackburn, Jon Peatfield, Eric Dorsey, Patti Price and Mark Green. The project has been a great challenge that has lasted many years. It would not have been possible to complete it without everyone’s teamwork and passion.
I am also very grateful to my team at Intel Corporation: Stephen Baldwin, Wieslawa Litke, Ali Aram and Martin Tschache, and my mentors at the University of Huddersfield: Pavlos Lazaridis and Violeta Holmes, for all the invaluable knowledge, support and guidance they have provided me with during my years of education.
The history of Professor Hawking's synthesiser is based on an unpublished paper provided, with permission, by Jonathan Wood, Professor Hawking's Graduate Assistant.
Article originally published by the University of Huddersfield Press.

INTRODUCTION

The Speech Plus CallText 5010 hardware voice synthesiser has been Professor Stephen Hawking's primary means of communication since the 1980s. The hardware was made in the 1980s and, due to its age, it had become obsolete, fragile and likely to break, so there was an urgent need for a backup in case it failed. The system also took up a lot of physical space and its power consumption was relatively high. The aim of this project was to create a software-emulated version of the synthesiser that would make it easier to develop additional functions or adjustments and would significantly reduce the complexity of the entire communication system. There had been a few attempts to provide a software-based solution in the past, but they were unsuccessful: the resulting programs were not accepted by Professor Hawking because of differences in voice characteristics. Therefore, a crucial requirement for the emulator was that it copy the exact behaviour of the original circuit. A low-level hardware emulation approach was chosen as the best solution after considering alternatives such as copying the circuit logic onto an FPGA (field-programmable gate array) chip, modifying an existing software-based synthesiser, or using Machine Learning algorithms.

The documentation for the synthesiser was very limited. At the beginning of the project, Sam Blackburn, Jon Peatfield and Peter Benie reverse engineered the hardware boards by following the track layout and analysing the program ROM. After months of hard work, they obtained a schematic and a peripheral and programmer's manual. There was still no official disassembler, programmer or simulator available for the Digital Signal Processor (DSP) chip, and the original source code had been lost. The work was therefore based solely on the hardware boards, a DSP datasheet, a previously developed disassembler and an Arduino-based DSP programmer.

The project objectives were to:
- reverse engineer the hardware components and the DSP present on the board,
- extract the contents of data and instruction ROM of the DSP,
- verify the disassembler and disassemble the DSP object code,
- develop an emulator for the NEC 77P20 DSP,
- merge the emulator with an existing CPU emulator written by Peter Benie,
- validate both emulators at all stages of development by comparing them to the hardware.

HISTORY OF THE SYNTHESISER

In 1962, Professor John Linvill conceived the Optacon system to help his blind daughter read ordinary print. The development was led by James Bliss and teams at Stanford University and Stanford Research Institute (Linvill & Bliss, 1966). Following a successful demonstration of the Optacon prototype, Telesensory Systems Inc (TSI) was founded in Palo Alto, California, in 1970. TSI focused on developing a line of products for the visually impaired.

In 1975, James Bliss spoke with Jonathan Allen, and this led to the development of letter-to-phoneme algorithms by the Natural Language Programming Group at MIT (Massachusetts Institute of Technology), led by Jonathan Allen, and of phoneme-to-speech algorithms by the MIT Speech Communications Group, led by Dennis Klatt. Both sets of algorithms were licensed by MIT to TSI. By 1980, TSI had produced five working prototypes of a real-time system based on custom LSI VTM (Large-Scale Integration Vocal Tract Model) chips.

In 1982, Dennis Klatt developed Klattalk (Klatt, 1987), a real-time, lab-based text-to-speech (TTS) system. The voices used in his system were modelled on members of his family, and the voice called ‘Perfect Paul’, used in the DECtalk system, was based on his own.

In 1982, a new company called Speech Plus Inc. was formed and developed multiple text-to-speech systems based on the TSI speech technologies. One of its first products, the Speech Plus CallText 5000 Telephone/Voice Module, retailed at $2,700. This synthesiser uses a method called formant speech synthesis (also called rules-based synthesis). Instead of using pre-recorded voice samples, the voice is generated on the fly. The speech output is created using additive synthesis and an acoustic model of the vocal tract: parameters such as fundamental frequency, voicing and noise levels are varied over time to create a speech waveform. The speech synthesis model was based on the voice of Dennis Klatt, including measurements of his own vocal tract, his phonetics and phonology, and on his duration rules (Klatt, 1979).
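To make the idea of formant synthesis more concrete, here is a minimal Python sketch of the second-order digital resonator commonly used in Klatt-style formant synthesisers: a periodic impulse train at the fundamental frequency (the voicing source) is passed through a cascade of resonators, one per formant. It illustrates the general technique only and is not the CallText firmware. The 10kHz sampling rate mirrors the DAC rate of the board, while the formant frequencies and bandwidths in the usage line are assumed values for an /a/-like vowel.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of a second-order digital resonator (Klatt-style).

    y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with a + b + c = 1 (unity gain at DC).
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0  # the previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesise_vowel(f0_hz, formants, fs_hz=10_000, duration_s=0.1):
    """Impulse train at f0 passed through cascaded (frequency, bandwidth) resonators."""
    n = int(round(fs_hz * duration_s))
    period = int(round(fs_hz / f0_hz))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


# Assumed formants (frequency, bandwidth) in Hz for an /a/-like vowel at a 100 Hz pitch.
vowel = synthesise_vowel(100, [(730, 60), (1090, 100), (2440, 120)])
```

Changing `f0_hz` alters the pitch while the resonator settings keep the vowel quality, which is the independent control over voice parameters that makes this family of synthesisers attractive.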

During an email discussion on the history of Speech Plus with Eric Dorsey, the software engineer who created intonation algorithms for the CallText boards, Eric mentioned that:
'Speech Plus modified the prosody, duration and thousands of the phonological rules that drove the selection of the formants, bandwidths and amplitudes of the various phonemes. English language has 44 phonemes and their formants, bandwidths and amplitudes depend greatly on the phoneme to left and to the phoneme to right of each target phoneme so there are literally thousands of combinations of formants, bandwidths, and amplitudes for each phoneme depending on its context. Speech Plus made thousands of changes to these contextual rules to improve the intelligibility and naturalness of the voice so over time it diverged from the original MITalk 79 voice that Dennis Klatt created. The voices on CallText and MITalk 79 have a lot of differences but they both have their genesis in Dennis Klatt’s voice.'

Synthesisers using formant speech synthesis tend to sound robotic, but formant synthesiser programs require less memory and fewer resources, which makes them suitable for embedded systems. Another advantage is that they allow complete control over different aspects of the voice, such as prosody and intonation (Klatt, 1979).

In 1985, Professor Hawking contracted pneumonia during a summer visit to CERN. His condition was life-threatening, and his wife Jane was asked to agree to terminating his life support. She refused and, as a result, he underwent a tracheotomy and lost his ability to speak.

In 1986, he started using a voice synthesiser and adopted the voice as we know it today. The synthesiser he used was a Speech Plus CallText 5000 board donated to him by Walt Woltosz of Words Plus. Professor Hawking also tried voice synthesisers by DECtalk and Votrax, but much preferred the Speech Plus one. In 1988, he was given a new version of the board, the CallText 5010, but did not like it as much as the older model because of differences in the voice. He also said, ‘I used to be able to dial and answer calls but I can’t now’. This was solved by replacing the ROM chips in the 5010 board with ones containing the firmware from the CallText 5000.

PROJECT BACKGROUND

Over time, the hardware synthesiser became obsolete and difficult to maintain. Sam Blackburn, who was a Graduate Assistant to Professor Hawking between 2006 and 2012, started looking into different ways of preserving the voice. An approach using Machine Learning was attempted by Phonetic Arts, and the resulting voice was very close to the original, but not close enough to be accepted by Professor Hawking. Another attempt, by Intel and Ed Bruckert, was to modify a DECtalk board, but again the resulting voice was not accepted. Both approaches failed to preserve original voice characteristics such as pitch, break timing, pronunciation and intonation.

After years of unsuccessful attempts, yet another concept was pursued. Eric Dorsey got in touch with Nuance, a company holding rights to the synthesiser that they had indirectly acquired from Speech Plus. Engineers at Nuance found the upgraded source code from the 1996 version of the CallText voice. After a few months of work, they managed to get it very close to the 1986 version of the voice. They recorded samples and sent them to Professor Hawking for evaluation. The match was very close but not perfect, and once again he did not accept it.

The low-level software emulation approach was started in 2011 by Sam Blackburn, Peter Benie and Jon Peatfield. Only two boards were available to them at the time: the one used by Professor Hawking and a spare backup. There was no documentation available for the hardware, so they made a significant effort to reverse engineer the backup board. It was not an easy task, since both boards had to be available at all times and experimenting with the backup posed a high risk of destroying it. They eventually identified most of the circuit components, the two main ones being an Intel 80188 CPU and a NEC 7720 DSP. Peter Benie wrote an emulator for the CPU from scratch but did not validate it at the time. Sam Blackburn managed to extract the ROM contents of the DSP using a custom-built programmer, and Jon Peatfield wrote a disassembler for the DSP machine code. Jon understood most of the DSP code and used an existing emulator to execute it, but got stuck when he found an illegal instruction in the ROM contents. The project dissolved in 2012 when Jon sadly passed away.

During my placement year at Intel, I provided technical support to Professor Hawking. I repaired some of his communication equipment, such as the ‘blink’ sensor used to detect the movements of his cheek muscle, an audio amplifier, and a spare backup synthesiser board. This allowed me to understand the design of the system and notice how fragile it was. I thought it would be a good idea to emulate the synthesiser board in software. After I discussed the idea with Professor Hawking’s Graduate Assistant, Jonathan Wood, he told me that it had been attempted in the past. I was given access to the files related to the project and saw how much work had already been put into it. I couldn’t understand why it had been abandoned. Jonathan put me in touch with Peter Benie, who told me more about the project background and explained that it had not been abandoned for technical reasons. Peter and I decided to bring the project back to life. Shortly afterwards, I was told that there were still two faulty CallText boards in existence, one of them owned by Intel and used for the development of the ACAT software. I repaired another unit, bringing the total to four working synthesiser boards and one broken one, and asked to borrow one of them for the duration of the project.

THE SYSTEM

The system used by Professor Hawking consisted of:
- Permobil F3 wheelchair provided by Permobil,
- Lenovo Yoga 260 laptop running Windows 10 provided by Lenovo,
- ACAT interface software developed by Intel,
- A blink sensor,
- CallText 5010 speech synthesiser with an Intel chassis,
- Speakers and amplifiers developed by Sound Research.
The full list can be found at http://www.hawking.org.uk/the-computer.html

ACAT

Assistive Context-Aware Toolkit (ACAT) is open-source software developed by Intel to give people with disabilities full control over a computer. It enables users to communicate easily with others through keyboard simulation, word prediction and speech synthesis. Users can perform a range of tasks such as editing and managing documents, navigating the Web and accessing emails. More information and a download page can be found at https://github.com/intel/acat/

Between 1986 and 2005, Professor Hawking had been using a hand switch. It acted as a one-button keyboard and, with the help of ACAT software, allowed him to control everything on the computer screen. However, as his condition progressed, he found it difficult to control his hand movements so a search for a new solution began.

Subsequently, Professor Hawking started using a commercially available Words+ infrared sensor that allowed him to control the computer with his right cheek. However, it had problems compensating for external light sources because the infrared LED (IR LED) was constantly on. In 2007, Sam Blackburn, his graduate assistant at the time, developed a ‘blink’ sensor that was a big improvement on the design. It solved this issue in a clever way by feeding a square wave into the IR LED (rapidly switching it on and off) and then applying a band-pass filter, which removed unwanted high and low frequencies that were not related to muscle movement from the output signal. Years later, Mark Green from Intel Corporation further improved the design by simplifying it and making it more efficient. He also created a few backup devices and a schematic. However, the backups felt too sensitive to Professor Hawking, even after adjustment.

In 2016 I joined the effort and worked on reverse engineering the circuit. I created a schematic of the blink sensor from scratch. It turned out that the schematic owned by Intel had some errors in the component values and that caused the copies to be significantly different. I used my schematic to fix the sensor backups.

Labelled PCB of the analogue blink sensor as used in 2017.
Updated schematic of the analogue blink sensor.

The 555 timer IC generates a square wave that drives an IR LED. The IR light is projected onto the right cheek and picked back up by an IR photodiode. The signal from the photodiode is AC coupled, amplified with a 741 op-amp, averaged and filtered, and then used as the input to a second 741 stage that acts as a comparator with threshold adjustment. The output of this last stage controls a relay. The relay is connected to the laptop through the EZ Keys USB device, and its signal is picked up by the ACAT software.
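The signal chain can be approximated in discrete time to show why modulating the LED makes the sensor insensitive to steady ambient light. In this minimal sketch a first-order DC blocker stands in for the AC coupling, a rectifier followed by a one-pole low-pass filter stands in for the averaging stage, and a fixed threshold stands in for the 741 comparator; amplifier gain is omitted. The 1kHz carrier, 20kHz sampling rate, filter coefficients and threshold are assumed, illustrative values, not measurements of the real circuit, which is entirely analogue.

```python
def photodiode_signal(reflectance, ambient, n, fs_hz=20_000, carrier_hz=1_000):
    """Hypothetical photodiode samples: steady ambient light plus the part of the
    square-wave-modulated IR LED light reflected by the cheek."""
    half_period = fs_hz // (2 * carrier_hz)  # samples per half period of the carrier
    return [ambient + (reflectance if (i // half_period) % 2 == 0 else 0.0)
            for i in range(n)]


def ac_couple(x, r=0.995):
    """AC coupling as a first-order DC blocker: y[n] = x[n] - x[n-1] + r*y[n-1]."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        y = v - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = v, y
    return out


def envelope(x, alpha=0.01):
    """Averaging stage: full-wave rectify, then smooth with a one-pole low-pass filter."""
    out, acc = [], 0.0
    for v in x:
        acc += alpha * (abs(v) - acc)
        out.append(acc)
    return out


def blink_detected(reflectance, ambient, threshold=0.2, n=20_000):
    """Comparator stage: True when the averaged carrier level exceeds the threshold."""
    level = envelope(ac_couple(photodiode_signal(reflectance, ambient, n)))[-1]
    return level > threshold
```

Because the DC blocker has zero gain at DC, raising the ambient level from 0 to 50 leaves the decision unchanged once the start-up transient has decayed; only the amplitude of the reflected carrier sets the level seen by the comparator.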

The blink sensor was built from analogue parts, which introduced some inconsistencies, and it was difficult to calibrate to the desired sensitivity. It had to be adjusted multiple times throughout the day to compensate for various factors.

Intel developed camera-based algorithms that detect facial gestures using a webcam; they are part of the open-source ACAT software. For Professor Hawking, they did not work as well as the blink sensor in terms of sensitivity and robustness to illumination. At the end of 2017, the Intel Labs team led by Lama Nachman finished working on a digital version of the blink sensor based on a microprocessor that emits and receives infrared light, so the basic principle of operation was very similar. It was, however, superior to the analogue blink sensor in terms of stability, control and replicability.

Professor Hawking editing text with ACAT software and a blink sensor.

CALLTEXT BOARD HARDWARE OVERVIEW

The CallText synthesiser is best described as a custom-designed computer system. The following components are present on the board:
- a power supply unit,
- Intel 80188 CPU,
- ROM (27512),
- RAM (HM6264A),
- DSP (NEC 77P20),
- DAC (AMD AM6012),
- USART (8251A),
- many passive components and operational amplifiers,
- an 8-channel Analog Multiplexer/Demultiplexer,
- a notch low-pass filter,
- D-type flip-flops,
- 8-Bit Identity and Magnitude Comparators,
- Octal Bus Transceivers, Buffers, and Line Drivers,
- Digital Clocks and Crystal Oscillators,
- a 4-bit Binary Counter,
- 74-series and Programmable Array Logic chips.

It was designed for use with an IBM PC, but it could also be supplied with external power and used as an entirely standalone device that communicates with a computer over a serial port using a crossover cable. Professor Hawking used this mode so that he could easily send text from his laptop. The voice output could be connected straight to a phone line or to a mono audio device, but only the latter option was used. Voice characteristics such as speed, volume or pitch can also be adjusted by sending escape characters followed by special commands, a feature that had to be preserved in the emulator as well.

The original Speech Plus CallText 5010 voice synthesiser board used by Professor Hawking, with an external power supply, in a chassis provided by Intel (standalone mode).

CPU

The CallText 5010 board runs on an Intel 80188 (16-bit x86) CPU with three 64kB ROM chips and two 64kbit (8kB) RAM chips. The first ROM contains the firmware, the second some unidentified binary data, and the third is a library ROM with routines for controlling the hardware. We tested and confirmed that the library ROM is used neither in the standalone mode nor in the IBM PC mode, so it can be ignored completely.

The firmware consists of a set of coroutines, each of which has both a read and a write buffer. They are cascaded so that the write buffer of one coroutine is the read buffer of the next. The first coroutine reads from the serial input buffer filled by the USART RdRdy interrupt handler. The CPU reads English text in ASCII form and processes it using the text-to-phoneme algorithms developed at MIT. In subsequent coroutines, the phonemes are used to create packets of formant parameters (e.g. frequency, bandwidth, pitch and amplitude). The packets are stored in the output buffer, which is passed on to the DSP by the DSP P0 interrupt handler routine. The flow of data through the program is regulated by the output stage.
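A cascade of coroutines in which each stage reads what the previous one wrote maps naturally onto Python generators: each stage pulls from the one before it, and nothing runs until the last stage asks for data, which mirrors how the output stage regulates the flow. The sketch is hypothetical; the stage names, the letter-for-phoneme stand-in and the five-symbol packets are illustrative and do not reproduce the MIT text-to-phoneme rules or the real formant packet format.

```python
from collections import deque


def serial_reader(rx_buffer):
    """Stage 1: drain characters left in the buffer by the (emulated) receive interrupt."""
    while rx_buffer:
        yield rx_buffer.popleft()


def words(chars):
    """Stage 2: group characters into lowercase words (stand-in for text normalisation)."""
    word = []
    for ch in chars:
        if ch.isalpha():
            word.append(ch.lower())
        elif word:
            yield "".join(word)
            word = []
    if word:
        yield "".join(word)


def phonemes(word_stream):
    """Stage 3: hypothetical letter-to-sound step; each letter stands in for a phoneme."""
    for word in word_stream:
        yield from word
        yield "_"  # hypothetical word-boundary marker


def packets(symbols, size=5):
    """Stage 4 (output stage): group symbols into fixed-size packets."""
    packet = []
    for symbol in symbols:
        packet.append(symbol)
        if len(packet) == size:
            yield tuple(packet)
            packet = []
    if packet:
        yield tuple(packet)


rx = deque("Hello world")
pipeline = packets(phonemes(words(serial_reader(rx))))
first = next(pipeline)  # the output stage pulls one packet
# Only "Hello " has been consumed so far; "world" is still waiting in the buffer.
```

Asking the pipeline for one packet consumes only as much serial input as that packet needs, which is the sense in which the output stage regulates the data flow.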

The 74-series logic and PAL devices control the data flow on the internal buses between the CPU, the IBM PC interface and the USART. The PAL chips are masked and registered, so it would not be possible to reverse engineer any of them. Fortunately, the IBM PC interface is not necessary for this application, and the USART could be replaced by emulating only its Rx/Tx registers, status flags and the RxFull and TxEmpty interrupt signals. Therefore, most of the logic components on the board could be ignored.
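What "emulating only the Rx/Tx registers, status flags and interrupt signals" could look like is sketched below as a small Python class. It is a hypothetical stand-in, not Peter Benie's emulator code: interrupts are modelled as callbacks, transmission is treated as instantaneous, and the status bit positions are assumptions for illustration that would need checking against the 8251A datasheet.

```python
class MinimalUsart:
    """Hypothetical stand-in for the 8251A USART exposing only what the firmware needs:
    Rx/Tx data registers, two status flags and the RxFull/TxEmpty interrupt lines."""

    RX_FULL_BIT = 0x02   # assumed status bit positions (illustrative only)
    TX_EMPTY_BIT = 0x04

    def __init__(self, on_rx_full, on_tx_empty):
        self.rx_register = 0
        self.rx_full = False
        self.tx_empty = True
        self.on_rx_full = on_rx_full      # callback standing in for the RxFull interrupt
        self.on_tx_empty = on_tx_empty    # callback standing in for the TxEmpty interrupt
        self.transmitted = []             # bytes sent back to the host

    def host_sends(self, byte):
        """A byte arrives from the laptop: latch it and raise RxFull."""
        self.rx_register = byte & 0xFF
        self.rx_full = True
        self.on_rx_full()

    def cpu_read_rx(self):
        """The CPU reads the Rx register, which clears RxFull."""
        self.rx_full = False
        return self.rx_register

    def cpu_write_tx(self, byte):
        """The CPU writes the Tx register; transmission is treated as instantaneous."""
        self.tx_empty = False
        self.transmitted.append(byte & 0xFF)
        self.tx_empty = True
        self.on_tx_empty()

    def status(self):
        """Status byte built from the two emulated flags."""
        return ((self.RX_FULL_BIT if self.rx_full else 0)
                | (self.TX_EMPTY_BIT if self.tx_empty else 0))


events = []
usart = MinimalUsart(on_rx_full=lambda: events.append("RxFull"),
                     on_tx_empty=lambda: events.append("TxEmpty"))
usart.host_sends(ord("H"))
status_after_rx = usart.status()
received = usart.cpu_read_rx()
usart.cpu_write_tx(0x06)
```

Keeping the model this small is what allows most of the glue logic on the board to be ignored: the firmware only ever observes the registers, the flags and the two interrupts.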

DSP

The DSP used in the system is a NEC 77P20 Signal Processing Interface (SPI) chip. It is a prototype version of the NEC 7720 SPI, meaning that the ROM is not masked and instead it is ultraviolet erasable and electrically programmable (EPROM). That made extracting the program from the chip relatively easy.

The SPI has a modified Harvard architecture with three separate memory areas: program ROM (512x23 bits), data ROM (510x13 bits) and data RAM (128x16 bits). Unlike the Intel CPU, the DSP can very quickly perform complicated arithmetic operations. One instruction can consist of multiple operations in one cycle: multiply, accumulate, move data, and adjust memory pointers.

Block diagram of the NEC 7720 DSP internal architecture (NEC Electronics Inc., 1985).

At a higher level, the DSP acts as a vocoder. It receives five formant packets of 16 bytes each (80 bytes in total) from the CPU and generates the voice waveforms using vocal tract model algorithms. There are two routines in the DSP firmware: the first receives the data packets from the CPU and writes output data into a queue, and the second outputs the resulting voice samples to the DAC. The sampling rate is set by a CPU timer running at 10kHz. After emptying the output queue, the DSP requests a new packet from the CPU by clearing the P0 flag in the status register. The flag is available externally as an output pin connected directly to the CPU interrupt pin, so clearing it triggers the CPU's interrupt handler routine. The final stage of the pipeline holds one generated packet, which is released when the interrupt is triggered. Immediately after the packet has been sent, the next one is prepared, so there is always one packet waiting to be released. The CPU interrupt routine sends the data over the parallel peripheral bus connected to the DSP’s parallel port by writing 40 pairs of bytes into the DSP’s data register (DR). After the first word is received or the request times out, the DSP sets the P0 flag back to 1.
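The request/response loop can be modelled as two cooperating objects. This simplified sketch is hypothetical: it ignores the microsecond-level timing and the timeout path, replaces the vocal tract model with a placeholder that cycles through the received words, and assumes 100 output samples per packet, derived from the 10kHz sample rate and the 100Hz P0 request rate measured with the logic analyser.

```python
SAMPLE_RATE_HZ = 10_000     # DSP INT driven by a 10 kHz timer
PACKET_RATE_HZ = 100        # one P0 request every 10 ms
SAMPLES_PER_PACKET = SAMPLE_RATE_HZ // PACKET_RATE_HZ
WORDS_PER_PACKET = 40       # 40 pairs of bytes = 80 bytes


class DspModel:
    """Hypothetical DSP model: one output sample per INT, P0 cleared when the queue runs dry."""

    def __init__(self, cpu_interrupt_handler):
        self.p0 = 1
        self.queue = []
        self.dac_output = []
        self.cpu_interrupt_handler = cpu_interrupt_handler

    def write_dr(self, words):
        """The CPU has written a whole packet to DR; P0 goes back to 1."""
        assert len(words) == WORDS_PER_PACKET
        self.p0 = 1
        # Placeholder for the vocal tract model: cycle through the 40 words to fill 100 samples.
        self.queue.extend(words[i % WORDS_PER_PACKET] for i in range(SAMPLES_PER_PACKET))

    def on_int(self):
        """Called at 10 kHz: output one sample, request a new packet if the queue is empty."""
        if self.queue:
            self.dac_output.append(self.queue.pop(0))
        if not self.queue:
            self.p0 = 0                           # P0 low requests data ...
            self.cpu_interrupt_handler(self)      # ... and interrupts the CPU


class CpuModel:
    """Hypothetical CPU side: the P0 handler releases the packet waiting in the final stage."""

    def __init__(self):
        self.packets_sent = 0

    def p0_interrupt(self, dsp):
        self.packets_sent += 1
        dsp.write_dr([self.packets_sent] * WORDS_PER_PACKET)


cpu = CpuModel()
dsp = DspModel(cpu.p0_interrupt)
for _ in range(SAMPLE_RATE_HZ):    # one second of 10 kHz interrupts
    dsp.on_int()
```

One second of simulated interrupts produces 100 packet requests, matching the 100Hz P0 rate, and each block of 100 consecutive output samples comes from a single packet.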

I analysed the timing of the hardware system with the help of a Zeroplus LAP-C 16128 Logic Cube Analyser, logging signals on pins that were directly connected to the CPU. The snapshot below shows a ~1ms window with 80 bytes of data being written to the DSP Data Register (DR).

Logic analysis of the timing of the hardware voice synthesiser board.

I have noted the following:
- DSP interrupt pin INT is being driven with a 10kHz timer,
- P0 DSP output connected to the CPU interrupt pin (requesting data from the CPU) goes down for 210us and up for 9.79ms (every 10ms, 100Hz @ 97.9% Duty Cycle),
- ~117.5us after P0 going down the CPU checks the Status Register (SR),
- ~277.5us after that the CPU does one more SR read,
- then, 40 times in a row: after ~10us SR is read, ~7us after that 1 byte is written to DR, ~4us after that another byte is written (total of 80 bytes),
- at the end, the CPU does 2 more reads and waits for another P0 interrupt.
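The listed figures are consistent with each other. A rough estimate from the approximate values above, assuming the three delays in each of the 40 iterations add up sequentially and that each request supplies the samples for one 10ms period, gives the request period, the P0 duty cycle, the samples per request and the length of the write burst:

```latex
\begin{aligned}
T_{P0} &= 0.21\,\text{ms} + 9.79\,\text{ms} = 10\,\text{ms}
  \quad\Rightarrow\quad f_{P0} = 100\,\text{Hz},\\
D &= \frac{9.79\,\text{ms}}{10\,\text{ms}} = 97.9\,\%,\\
N &= \frac{f_s}{f_{P0}} = \frac{10\,\text{kHz}}{100\,\text{Hz}} = 100\ \text{samples per request},\\
t_{\text{burst}} &\approx 40 \times (10 + 7 + 4)\,\mu\text{s} = 840\,\mu\text{s}.
\end{aligned}
```

The estimated burst of about 840µs fits within the ~1ms window shown in the logic analyser snapshot.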

DAC

The AMD AM6012 is a 12-bit multiplying Digital-to-Analogue Converter. It converts the digital waveform samples to an analogue signal before the amplification stage. The 12-bit parallel input of the DAC is fed by a combination of two HCT273 D-type flip-flops with clear pins and two HCT377 D-type flip-flops with clock enable pins. The two cascaded HCT377 flip-flops act as shift registers, converting the 16-bit serial output of the DSP to parallel. Once the shifting is done, the HCT273 flip-flops discard 4 bits of data and output the resulting 12-bit value to the DAC at a rate of 10kHz.

While reverse engineering this part of the circuit, we discovered that the two least significant bits had been swapped, probably due to an error in the wiring design. The resulting error is less than 0.05% and would not be perceived by the human ear. On average, the error is halved, because in half of the cases, when the two bits are 11 or 00, the swap leaves the output unchanged.
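The size of this error can be checked by brute force: every 12-bit code is passed through the bit swap and compared with the intended value. This sketch assumes the error is expressed relative to the full 12-bit range (4096 codes). Under that assumption the worst case is one least significant bit and the average is half of that, because codes ending in 00 or 11 are unaffected, which agrees with the bound given above.

```python
FULL_SCALE = 1 << 12  # number of codes of the 12-bit DAC


def swap_two_lsbs(code):
    """Swap bit 0 and bit 1 of a 12-bit DAC code, as the miswired board does."""
    bit0 = code & 1
    bit1 = (code >> 1) & 1
    return (code & ~0b11) | (bit0 << 1) | bit1


errors = [abs(swap_two_lsbs(code) - code) for code in range(FULL_SCALE)]
max_error_pct = 100 * max(errors) / FULL_SCALE
mean_error_pct = 100 * sum(errors) / len(errors) / FULL_SCALE
print(f"max {max_error_pct:.4f}%  mean {mean_error_pct:.4f}%")  # prints: max 0.0244%  mean 0.0122%
```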

DSP PROGRAMMER

The DSP programmer was originally developed by Sam Blackburn using an Arduino Mega 2560. He managed to read the contents of the ROM, program a blank chip and successfully generate speech by placing the freshly programmed DSP in the original synthesiser. His intention was to use the programmer as a runtime, but he did not manage to figure out how to perform I/O before abandoning the project. The programmer source code came with the data and program ROM files extracted from the DSP, stored in hexadecimal. We noticed that the data ROM contents did not match the instruction ROM addresses, and we found an illegal instruction in the disassembled code. The programmer therefore needed to be validated again to make sure that the data and instruction files fully matched the ROM contents of the DSP.

HARDWARE

The programmer is based on an Arduino Mega 2560. It incorporates two pushbuttons for user control, LEDs to indicate the state of the pins, and a quad NOR gate which, together with a quartz crystal, generates the clock for the DSP’s clock pin. The VPP (21V) necessary for programming the ROM on the DSP is provided externally and enabled with a transistor. All the other pins are driven directly from the Arduino.

Arduino Mega-based NEC 7720 DSP programmer.

DSP ROM CONTENTS

The programmer source code was based on Flash, an Arduino library for flash-based data collections written by Mikal Hart under a GNU licence. I carefully compared the program functions against the DSP datasheet to make sure that they met the chip specification and to become familiar with the structure of the ROM files.

The NEC 7720D DSP can store about 1.5kB of program ROM (512 x 23 bits) and about 1kB of data ROM (510 x 13 bits). It turned out that the instruction ROM file downloaded using the programmer stored the bits of each byte in reverse order (big endian instead of little endian) and had a dummy bit (logic 0) in the lowest bit of every middle byte. Both quirks could be dealt with by the disassembler or by the interpreter in the emulator.
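Dealing with these quirks amounts to reversing the bits of each byte and dropping the dummy bit to recover 23-bit instruction words. The exact layout below is an assumption for illustration: each instruction is taken to occupy three bytes stored high, middle, low, and the dummy bit is taken to sit at bit 0 of the middle byte after the bit reversal. The function names are hypothetical and not part of the original disassembler.

```python
def reverse_bits(byte):
    """Reverse the bit order of one byte, e.g. 0b00000001 -> 0b10000000."""
    return int(f"{byte:08b}"[::-1], 2)


def decode_instruction(raw):
    """Turn one 3-byte record from the programmer dump into a 23-bit instruction word.

    Assumed layout (hypothetical): bytes stored high, middle, low; the bits of each
    byte reversed; after reversal, bit 0 of the middle byte is the dummy 0.
    """
    hi, mid, lo = (reverse_bits(b) for b in raw)
    if mid & 1:
        raise ValueError("dummy bit is not zero")
    return (hi << 15) | ((mid >> 1) << 8) | lo


def encode_instruction(word):
    """Inverse of decode_instruction, useful for round-trip checks."""
    if not 0 <= word < 1 << 23:
        raise ValueError("instruction must fit in 23 bits")
    hi = word >> 15
    mid = ((word >> 8) & 0x7F) << 1
    lo = word & 0xFF
    return bytes(reverse_bits(b) for b in (hi, mid, lo))


def decode_rom(dump):
    """Decode a whole program ROM image (512 instructions x 3 bytes)."""
    return [decode_instruction(dump[i:i + 3]) for i in range(0, len(dump), 3)]
```

Raising an error on a non-zero dummy bit doubles as a cheap sanity check that the assumed layout really matches a given dump.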

It also turned out that the data ROM was not stored on the PC and Arduino in the same way as on the hardware DSP. On the DSP, the data is stored as 510 words of 13 bits each at addresses 002H-1FFH, accessed through a 9-bit ROM pointer (RP). The two lowest words (addresses 000H and 001H) should remain blank, since they normally contain test patterns used during development. The data ROM file downloaded with the programmer, however, was stored in hexadecimal as 512 x 2 bytes, which could lead to difficulties when running the object code in an emulator.

I wrote a small Python script that reformatted the file to match the physical structure of the IC using regular expressions and some basic string and list manipulation functions. The following things had to be adjusted:
- 2-byte word endianness was changed from big endian to little endian,
- Addresses were reversed so that the lowest address data was stored at the beginning of the file,
- The 3 LSBs of every low byte (all zeros, dummy data) were removed,
- The file was converted from hexadecimal to binary.
I also used the same script to save the file in hexadecimal and then plotted the DSP data in Matlab:

% read one hexadecimal value per line and convert it to decimal
data = hex2dec(textread('dspdata.txt', '%s'));
figure;
plot(data)
DSP data ROM contents plotted in Matlab.

We could clearly identify a sinusoidal wave, two exponential functions, and some other structures in the data ROM. They are used by the DSP for sample generation.
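The four reformatting steps listed earlier in this section can be sketched in a few lines of Python. This is not the author's original script, and the input format it assumes (one four-digit hexadecimal word per line, high and low bytes swapped, highest address first) is inferred from that list and should be treated as hypothetical.

```python
import re


def reformat_data_rom(text):
    """Convert a programmer dump of the data ROM into 13-bit binary words, lowest address first."""
    hex_words = re.findall(r"\b[0-9a-fA-F]{4}\b", text)
    words = []
    for w in reversed(hex_words):            # reverse addresses: lowest address first
        value = int(w[2:4] + w[0:2], 16)     # swap the two bytes of the 16-bit word
        if value & 0b111:
            raise ValueError(f"expected three zero dummy bits in {w}")
        words.append(format(value >> 3, "013b"))  # drop the 3 dummy LSBs, keep 13 bits
    return words
```

Checking that the discarded bits really are zero guards against applying the byte swap the wrong way round.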

A brand-new NEC 77P20D DSP chip was programmed using the original ROM dump files. The chip was then inserted into the hardware voice synthesiser to verify that the synthesised voice matched the original and that no glitches were present.

Programming a blank NEC 77P20 DSP chip.

INTEL 80188 CPU EMULATOR VALIDATION

The Intel 80188 CPU emulator is a console application written by Peter Benie. It uses the firmware extracted from the onboard ROM to process input strings and produce the output data for the DSP. The emulator skips certain hardware initialisation routines and does not handle hardware interrupts, since no peripheral devices are emulated. This part of the software solution had to be validated to make sure that it worked correctly before an emulator for the DSP was developed.

EMULATOR OUTPUT DATA

The CPU sends two types of data to the DSP. The first is the initialisation data, sent only once, upon boot, to configure the correct working mode; the second is streamed after the device is given a string of text to synthesise. The CPU emulator initialises with a default input string of ‘Hello. Welcome to the emulator’, printing the data in hexadecimal, and then waits for user input.

I adjusted the emulator source code to output the data in a format better suited for verification and recompiled it. The default input string and any peripheral console printout functions were removed. The DSP write function was modified to print all the data on one line as 1-byte values separated by spaces, and the byte order was changed from big endian to little endian to match the order in which the bytes appear on the hardware.

I collected multiple datasets from the emulator output: the initialisation header (output before the emulator asks for input) and data generated with different test input strings (e.g. ‘Hello world’, ‘Hello, how are you?’, a comma followed by a full stop, and so on). These datasets should cover enough different use cases. The results were saved into text files, with spaces replaced by newline characters using regular expressions.
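The two output conventions described above, 16-bit DSP writes printed as space-separated bytes in little-endian order and then split into one byte per line, can be illustrated with a short Python sketch. The real change was made in the emulator's own DSP write function; the function names below are hypothetical.

```python
import re


def format_dsp_writes(words):
    """Print 16-bit DSP writes as space-separated bytes, low byte first (little endian)."""
    parts = []
    for word in words:
        parts.append(f"{word & 0xFF:02x}")         # low byte first
        parts.append(f"{(word >> 8) & 0xFF:02x}")  # then the high byte
    return " ".join(parts)


def one_byte_per_line(line):
    """Replace the spaces between bytes with newlines using a regular expression."""
    return re.sub(r" +", "\n", line.strip())


emulator_line = format_dsp_writes([0x1234, 0xABCD])
```

With one byte per line, both the emulator output and the hardware capture can be compared line by line.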

HARDWARE OUTPUT DATA

I used a Zeroplus LAP-C 16128 logic analyser to capture the data on the bus between the CPU and DSP. I set it to synchronous mode with an external clock (connected to the DSP CLK pin, running at 8MHz). Since only the data written to the Data Register was of interest at this point, the analyser was connected to the Chip Select (CS’), Write (WR’) and Data Bus (D0-D7) pins.

Probing the DSP on the CallText 5010 board with a Zeroplus logic analyser.

I set the trigger to the CS’ signal going low and, since the analyser has only 256kB of memory, set the filter to WR’ low & CS’ low. With these filter settings, the logic analyser would keep capturing the data written to DR until the buffer filled up. That would never happen because the initialisation data is quite small, so I came up with a workaround: after the necessary data had been sent and the CPU had stopped writing to the DSP’s DR, the previously disconnected RST channel was connected to VCC to mark that point in time. Then, the WR’ and CS’ probes were disconnected to let the buffer fill up with all the data present on the synthesiser’s internal data bus (which is also shared by the RAM and ROM). This made it possible to mark exactly where the data of interest began and ended.

Analysing the data flow from the CPU to the DSP.

The CPU does not write any data to the DSP when idle, so the trigger settings did not have to be changed. I placed the bar at the point where the RST channel had been pulled high and exported all the data collected before the bar to text files.
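The capture strategy, keeping only bus values while CS’ and WR’ are both low and stopping at the RST marker, can be expressed as a small filter over exported samples. The sketch assumes a hypothetical export format of one (CS’, WR’, RST, data) tuple per 8MHz clock, not the Zeroplus file format. Because a single write can stay on the bus for more than one clock, it records each write only once, on its first active clock.

```python
def extract_dr_writes(samples):
    """Return the bytes written to DR before the RST marker.

    samples: iterable of (cs_n, wr_n, rst, data) tuples, one per 8 MHz clock
    (hypothetical export format). CS' and WR' are active low.
    """
    writes = []
    was_active = False
    for cs_n, wr_n, rst, data in samples:
        if rst:                        # RST channel tied to VCC: end of the region of interest
            break
        active = cs_n == 0 and wr_n == 0
        if active and not was_active:  # record each write once, on its first active clock
            writes.append(data)
        was_active = active
    return writes


capture = [
    (1, 1, 0, 0x00),   # bus idle
    (0, 0, 0, 0x12),   # a write of 0x12 spanning two clocks
    (0, 0, 0, 0x12),
    (1, 1, 0, 0xFF),   # other bus traffic, not a DR write
    (0, 0, 0, 0x34),
    (0, 0, 0, 0x34),
    (1, 1, 1, 0x00),   # RST marker reached
    (0, 0, 1, 0x56),   # ignored: after the marker
]
dr_bytes = extract_dr_writes(capture)
```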

CPU data sent to the DSP: exported formant packets in hexadecimal.

COMPARING THE DATA

The data collected from the hardware synthesiser and the CPU emulator had to be converted to the same format before they could be compared. I noticed that each byte read from the hardware was present on the bus for 2 clock cycles during every write cycle, so it appeared twice in the export. I wrote another Python script to process all the logic analyser exports in the following manner:
- strip the trailing newline, tab and space characters from each line,
- remove lines 1 to 6,
- remove every other line,
- convert the hexadecimal numbers into lowercase characters,
- remove the cycle counts and 0x prefixes,
- add new lines between single bytes and save the file.

# Python script to convert logic analyser export files
# Pawel Wozniak, November 2017
import argparse
import re


def main():
    # parse the input/output file name from the command line
    parser = argparse.ArgumentParser(description='Logic Analyser log file formatter')
    parser.add_argument('-f', '--file', help='Input/Output file name', required=True)
    args = parser.parse_args()
    filename = args.file
    print('Starting conversion of %s' % filename)
    # read all lines into a list and strip trailing whitespace
    with open(filename, 'r') as f:
        lines = [line.rstrip() for line in f]
    # remove lines 1-6, then keep every other line (each byte was captured twice)
    lines = lines[6::2]
    # remove the cycle count and 0X prefix (everything up to the last 'X'),
    # then convert the hexadecimal digits to lowercase
    lines = [re.sub(r'^.*X', '', line).lower() for line in lines]
    # overwrite the file with the raw bytes, one per line
    with open(filename, 'w') as o:
        o.write('\n'.join(lines))
    print('Done.')


if __name__ == '__main__':
    main()

I then compared the hardware- and software-generated data with the help of Beyond Compare. The white bar on the left of the figure below represents the entire file.

Software and hardware generated PCM samples compared using Beyond Compare.
Comparing the software generated packets to hardware

All the data generated by the emulator was consistent with the hardware dumps. The only inconsistency was that, on some occasions, the hardware generated more data than the emulator did. The CPU emulator does not wait for a response from the DSP: because there was no working DSP emulator yet, the response is simulated, so the timeout branch is never executed. The CPU emulator always carries on sending data to the DSP as if the system were truly synchronous, which is not the case in real hardware. At the time, the most likely explanation for the mismatch was that the DSP response was not emulated correctly, but this hypothesis could not be fully verified until the DSP emulator had been developed.
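Once both files hold one byte per line, the check done visually in Beyond Compare can also be expressed as a short Python sketch. It reports the first position where the two dumps differ within their common length, and how many bytes longer the hardware dump is. The function and file names are hypothetical.

```python
from typing import List, Optional, Tuple


def load_bytes(path: str) -> List[str]:
    """Read a file with one hexadecimal byte per line, ignoring blank lines."""
    with open(path) as f:
        return [line.strip().lower() for line in f if line.strip()]


def compare_dumps(software: List[str], hardware: List[str]) -> Tuple[Optional[int], int]:
    """Return (index of the first differing byte or None, extra bytes in the hardware dump)."""
    extra = len(hardware) - len(software)
    for i, (s, h) in enumerate(zip(software, hardware)):
        if s != h:
            return i, extra
    return None, extra


if __name__ == "__main__":
    print(compare_dumps(load_bytes("emulator.txt"), load_bytes("hardware.txt")))
```

A result of (None, 0) means the dumps are identical. (None, n) with n > 0 means the emulator output matches the start of the hardware dump, and the hardware produced n additional bytes.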

NEC 7720P DSP DISASSEMBLER

The disassembler was developed by Peter Benie. It was based entirely on the datasheet and written in C. It takes the hexadecimal instruction ROM dump file as input and converts it to binary. Due to the structure of the ROM files (as discussed in the DSP programmer section), it must reverse the bit order of every byte. After that, the LSB of every middle byte (dummy data for programming purposes) is removed, resulting in 23-bit instruction words. The instructions are then compared against the instruction tables and translated into assembly with the help of switch statements. The full list of instructions can be found in the uPD77C20A, 7720A, 77P20 Digital Signal Processors Datasheet by NEC Electronics Inc. (1985).
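The two preprocessing steps, plus the decoding of one instruction class, can be sketched in Python (the original disassembler is written in C). The sketch assumes that each instruction occupies three consecutive bytes, most significant first. That byte order is my assumption, and bytes.fromhex can perform the hexadecimal-to-binary step. The field layout used in decode_ld (type in bits 22-21, immediate in bits 20-5, destination in bits 3-0) and the destination names were inferred from the disassembly listing below rather than taken from the datasheet, so they should be checked against it.

```python
from typing import List, Optional

# Destination codes as they appear in the listing below (subset, inferred)
DST_NAMES = {0x0: "@non", 0x1: "@a", 0x2: "@b", 0x4: "@dp",
             0x6: "@dr", 0xA: "@k", 0xD: "@l", 0xF: "@mem"}


def reverse_bits(byte: int) -> int:
    """Reverse the bit order of one byte, e.g. 0b00000011 -> 0b11000000."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


def rom_words(raw: bytes) -> List[int]:
    """Assemble 23-bit instruction words from a raw dump (3 bytes per word, assumed MSB first)."""
    if len(raw) % 3:
        raise ValueError("dump length must be a multiple of 3")
    words = []
    for i in range(0, len(raw), 3):
        hi, mid, lo = (reverse_bits(b) for b in raw[i:i + 3])
        # drop the dummy LSB of the middle byte: 8 + 7 + 8 = 23 bits
        words.append((hi << 15) | ((mid >> 1) << 8) | lo)
    return words


def decode_ld(word: int) -> Optional[str]:
    """Decode a load-immediate word (type bits 22-21 = 0b11); return None for other types."""
    if ((word >> 21) & 0b11) != 0b11:
        return None
    immediate = (word >> 5) & 0xFFFF
    dst = DST_NAMES.get(word & 0xF, "@?")
    return f"ld 0x{immediate:04x}, {dst}"
```

For example, rom_words(bytes([0x03, 0x60, 0x22])) gives [0x600344], which decode_ld turns into ld 0x001a, @dp. That is the first instruction of the listing below.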

The disassembler also follows the execution order described in the datasheet:
- multiplication begins, data moves from source to destination,
- ALU operations,
- pointer modifications,
- multiply finishes, returns (if requested in OP).
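A practical consequence of this ordering is that a value placed on the internal data bus (IDB) in the first phase is already available to the ALU in the second. The instruction op mov @non, dr / add acca, idb in the listing below relies on this. The toy model below illustrates the order under simplifying assumptions of my own. Only ADD and SUB on the IDB are modelled, and flags are ignored. The multiply is assumed to use the K and L values present when the instruction starts, and M is taken as the sign bit plus the 15 higher bits of the product. It is not the logic of the real chip or of the emulator.

```python
from typing import Dict, Optional

MASK16 = 0xFFFF


def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def execute_op(state: Dict[str, int], src: str, dst: str,
               alu: Optional[str] = None, acc: str = "a", dpdec: bool = False) -> Dict[str, int]:
    """Run one simplified OP instruction in the datasheet's phase order (toy model)."""
    s = dict(state)
    # Phase 1: the multiply starts (assumed to use the current K and L), and the
    # source register drives the IDB; the value is moved to the destination
    product = to_signed16(s["k"]) * to_signed16(s["l"])
    idb = None if src == "non" else s[src]
    if dst != "non":
        if idb is None:
            raise ValueError("rule (2): when SRC is NON, DST must be @NON")
        s[dst] = idb
    # Phase 2: ALU operation; in this toy model the operand is always the IDB
    if alu is not None:
        if idb is None:
            raise ValueError("the IDB is undefined when SRC is NON")
        if alu == "add":
            s[acc] = (s[acc] + idb) & MASK16
        elif alu == "sub":
            s[acc] = (s[acc] - idb) & MASK16
        else:
            raise ValueError(f"unsupported ALU operation: {alu}")
    # Phase 3: pointer modification; DPDEC changes only the low 4 bits of DP (DPL)
    if dpdec:
        s["dp"] = (s["dp"] & ~0xF) | ((s["dp"] - 1) & 0xF)
    # Phase 4: the multiply finishes; M keeps the sign bit and the 15 higher bits
    s["m"] = (product >> 15) & MASK16
    return s
```

For example, with A = 0 and DR = 0x1234, execute_op(state, src="dr", dst="non", alu="add") leaves A = 0x1234. This is the behaviour the instruction at 0x0cb depends on.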

ILLEGAL INSTRUCTION

A single instruction at address 0x0db was found to be illegal. It moves data into the DSP RAM from no source (NON), which breaks the rules outlined in the datasheet:
“The instruction, which is acceptable using the NEC assembler (AS77201), has an inherent conflict in that data is simultaneously being moved into memory and fetched in one instruction. ALU instructions involving either ACCA or ACCB should not be used. In summary, observe the following rules.
(1) DST should not be @MEM when PSEL is RAM.
(2) When SRC is NON, DST must be @NON.
(3) A should not be used as both DST and ASL
(4) B should not be used as both DST and ASL”

That can lead to unpredictable behaviour of the IC. Depending on the internal structure of the chip, one of the following could happen:
- the IDB (internal data bus) goes high (0xffff),
- the IDB goes low (0x0000),
- the IDB retains its value due to parasitic capacitance,
- the decoder recognises the illegal instruction and no operation is performed.
I translated the disassembled code surrounding the illegal instruction into C-like pseudocode to understand its function as far as possible.

Disassembled DSP instruction ROM with illegal instruction
0c3: [600344]   ld      0x001a, @dp
0c4: [600001]   ld      0x0000, @a
0c5: [028000]   op      add     acca, ram
0c6: [48acb0]   jza     0x0cb
0c7: [61fc02]   ld      0x0fe0, @b
0c8: [600504]   ld      0x0028, @dp
0c9: [02c000]   op      add     accb, ram
0ca: [500d30]   jmp     0x0d3
0cb: [0a8080]   op      mov     @non, dr
                        add     acca, idb
0cc: [4a0cf0]   jnsa0   0x0cf
0cd: [600002]   ld      0x0000, @b
0ce: [500d30]   jmp     0x0d3
0cf: [63ffe1]   ld      0x1fff, @a
0d0: [0a0080]   op      mov     @non, dr
                        sub     acca, idb
0d1: [4a0d30]   jnsa0   0x0d3
0d2: [63ffe2]   ld      0x1fff, @b
0d3: [6faeea]   ld      0x7d77, @k
0d4: [600504]   ld      0x0028, @dp
0d5: [0000fd]   op      mov     @l, mem
0d6: [7ffb01]   ld      0xffd8, @a
0d7: [128000]   op      add     acca, m
0d8: [4a0db0]   jnsa0   0x0db
0d9: [60000f]   ld      0x0000, @mem
0da: [500dc0]   jmp     0x0dc
0db: [10000f]   op      mov     @mem, non
0dc: [600364]   ld      0x001b, @dp
0dd: [0000f4]   op      mov     @dp, mem
0de: [002000]   op      dpdec
0df: [4b2e20]   jdplf   0x0e2
0e0: [000046]   op      mov     @dr, dp
0e1: [500e30]   jmp     0x0e3
0e2: [6001a6]   ld      0x000d, @dr
0e3: [600384]   ld      0x001c, @dp
0e4: [000081]   op      mov     @a, dr
0e5: [0a00f0]   op      mov     @non, mem
                        sub     acca, idb
0e6: [48ae40]   jza     0x0e4
0e7: [600364]   ld      0x001b, @dp
0e8: [00008f]   op      mov     @mem, dr
0e9: [000084]   op      mov     @dp, dr
0ea: [00002f]   op      mov     @mem, b
0eb: [600784]   ld      0x003c, @dp
0ec: [0000f1]   op      mov     @a, mem
0ed: [040000]   op      dec     acca
0ee: [00001f]   op      mov     @mem, a
0ef: [4a3a20]   jsa0    0x1a2
0f0: [500180]   jmp     0x018
Disassembled DSP instruction ROM translated to pseudo-C code
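// 0x0c3
a = mem[26];
if (a == 0) {
	if (dr < 0)                     // DR treated as a signed 16-bit value
		b = 0;
	else if (dr > 8191)
		b = 8191;                   // otherwise b is left unchanged
} else {
	b = mem[40] + 4064;
}
m = (32119 * mem[40]) >> 15;        // K = 32119, L = mem[40]; only the sign bit + 15 higher bits of the product
if (m - 40 < 0)
	mem[40] = 0;
else
	mem[40] = undefined;            // illegal instruction at 0x0db
dp = mem[27] - 1;                   // DPDEC decrements only the low 4 bits of DP (DPL)
if ((dp & 0x0f) == 0x0f)            // jdplf: DPL == 0xF
	dr = 13;
else
	dr = dp;
do {
	a = dr - mem[28];               // loop waiting for data to be sent from the CPU
} while (a == 0);
mem[27] = dr;
mem[dr] = b;
mem[60] -= 1;
if (mem[60] < 0)
	goto L_1a2;                     // jump to 0x1a2
else
	goto L_018;                     // 0x0f0: jump to 0x018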

After a lengthy investigation, and after testing different replacement instructions on real hardware with no visible change in operation, we decided to leave the illegal instruction in the ROM and deal with the problem at a higher level: the emulator prints an error message if this portion of the code is ever executed.

DSP EMULATOR DEVELOPMENT

The DSP part was based on higan, an open-source multi-system emulator. The processor emulated in higan is the NEC 7725, which is very similar to the 7720. The release notes for higan suggested that the emulator had recently been updated to handle the OVA flags correctly and should be essentially bit-perfect. The licence under which it was released, GPLv3, allowed us to use and modify the source code.

There are a few differences between the 7725 and 7720 chips. The former has more ROM and RAM, one more temporary register, and three additional instructions. Its instructions are 24 bits long, as opposed to the 7720’s 23 bits, and its data ROM words are 16 bits long, while in the 7720 they are only 13 bits long. The emulator was developed by extracting the DSP part from higan and translating it into a standalone C program. It was then adapted by changing the sizes of the registers and modifying the instruction interpreter. A check was embedded into the code to notify the user if the program reached the illegal instruction we had found in the extracted DSP instruction ROM; in that case, the emulator would output an error message to help us trace it back to its origin.
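A check for rule (2) can be sketched as follows. It treats a word as an OP or RT instruction when bits 22-21 are 00 or 01, and it reads SRC from bits 7-4 and DST from bits 3-0. These positions agree with the disassembly listing above (0x10000f decodes as mov @mem, non), but the treatment of RT words is an assumption. The names are hypothetical, and only rule (2) is checked.

```python
import sys
from typing import List

OP, RT = 0b00, 0b01          # instruction types in bits 22-21 (assumed)
SRC_NON, DST_NON = 0x0, 0x0


def breaks_rule_2(word: int) -> bool:
    """True for an OP/RT word whose SRC is NON while its DST is not @NON."""
    kind = (word >> 21) & 0b11
    src = (word >> 4) & 0xF
    dst = word & 0xF
    return kind in (OP, RT) and src == SRC_NON and dst != DST_NON


def scan_rom(words: List[int]) -> List[int]:
    """Return the addresses of all instructions that break rule (2)."""
    return [addr for addr, word in enumerate(words) if breaks_rule_2(word)]


def warn_if_illegal(pc: int, word: int) -> None:
    """Runtime check: report the illegal instruction if execution ever reaches it."""
    if breaks_rule_2(word):
        print(f"error: illegal instruction 0x{word:06x} at 0x{pc:03x}", file=sys.stderr)
```

Among the instructions in the listing above, only the word at 0x0db (0x10000f) is flagged. The runtime check, rather than a patched ROM, matches the approach described earlier of leaving the instruction in place and reporting it if it is ever executed.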

The DSP emulator was then merged with the CPU emulator by piping the output of the CPU emulator into the input of the DSP emulator with the | operator. It was safe to assume that the timeout issues that can occur on hardware would not arise, so flow control did not need to be implemented. In the CPU emulator, the DSP response is faked, so the program generates as much data as it can and passes it on to the DSP. The DSP emulator reads the input packets in blocks of 80 bytes and feeds them into the data register DR one word at a time. It then processes the data and outputs the resulting waveform data.
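The reading side of the pipe can be sketched in Python as below (the actual emulator is a C program). The 80-byte block size comes from the description above. The assumption that each DR word is made of two bytes, low byte first, is mine, and the names are hypothetical.

```python
import sys
from typing import BinaryIO, Iterator

BLOCK_SIZE = 80  # bytes read from the pipe at a time


def dr_words(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[int]:
    """Yield 16-bit DR words from a byte stream read in fixed-size blocks.

    Assumes two bytes per word, low byte first; a trailing odd byte at the
    end of the stream is ignored.
    """
    pending = b""
    while True:
        block = stream.read(block_size)
        if not block:
            return
        data = pending + block
        usable = len(data) - (len(data) % 2)
        for i in range(0, usable, 2):
            yield data[i] | (data[i + 1] << 8)
        # keep an odd leftover byte for the next block
        pending = data[usable:]


if __name__ == "__main__":
    # e.g. ./cpu_emulator | python3 dsp_reader.py  (hypothetical program names)
    for word in dr_words(sys.stdin.buffer):
        print(f"{word:04x}")
```

Carrying an odd byte over to the next block keeps the pairing correct even if a read returns fewer bytes than requested.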

The DSP firmware outputs 0x7F0 to the DAC as its neutral value, which is slightly below the 12-bit mid-scale value of 0x800. Consequently, whenever the DAC input switched between its idle level with no input (0x800) and the neutral value (0x7F0), at the start and end of speech, pops and clicks could be heard. This problem was also present on the hardware board; it was fixed in the emulator by shifting the offset of the DSP output, at the cost of some gain.
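One way to apply such an offset shift, and to see why it costs gain, is sketched below. The exact method used in the emulator is not described here, so the conversion and the gain of 15 are hypothetical. Centring the 12-bit DAC codes on the firmware’s neutral value 0x7F0, instead of on 0x800, makes silence map to 0, so there is no step when speech starts or stops. The range then becomes asymmetric. The largest code, 0xFFF, lies 2063 steps above 0x7F0, so scaling by 16 (the factor that maps 12 bits onto 16 bits) would give 33008, which is beyond the signed 16-bit maximum of 32767. Scaling by 15 fits, at the cost of about 0.56 dB of gain.

```python
NEUTRAL = 0x7F0  # value the DSP firmware outputs when there is no speech
GAIN = 15        # hypothetical: 16 would overflow signed 16-bit PCM after re-centring


def dac_to_pcm16(code: int) -> int:
    """Convert a 12-bit DAC code to a signed 16-bit PCM sample centred on NEUTRAL."""
    if not 0 <= code <= 0xFFF:
        raise ValueError("DAC codes are 12-bit values (0x000-0xFFF)")
    return (code - NEUTRAL) * GAIN


if __name__ == "__main__":
    for code in (0x000, 0x7F0, 0x800, 0xFFF):
        print(f"0x{code:03x} -> {dac_to_pcm16(code)}")
```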

The output of the DSP emulator can be played through the computer’s DAC, which also controls the timing of the samples. This is done with the Linux play command from the SoX package, using the raw file type (play -t raw).

TESTING

I tested the emulated voice comprehensively by comparing it against the hardware-synthesised voice. At first, I noticed an issue that caused certain words to be pronounced incorrectly: the emulator paused for a split second halfway through the sentence. The glitch occurred only with specific words, and I realised that it was also present on the hardware board I had been using for validation. That was a backup board, and it quickly turned out that the behaviour was not present on the original synthesiser. I found a one-byte difference in the firmware between the two boards, most likely caused by repeated exposure of the ROM chips to sunlight. The problem was fixed by updating the firmware in the emulator with the version taken from the original board.
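Locating such a difference between two ROM images is straightforward. A minimal Python sketch is shown below. The file names are hypothetical, and the sketch assumes that both dumps are raw binary images of the same size.

```python
from typing import List, Tuple


def diff_images(a: bytes, b: bytes) -> List[Tuple[int, int, int]]:
    """List (offset, byte in a, byte in b) for every position where two ROM images differ."""
    if len(a) != len(b):
        raise ValueError("ROM images differ in size")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    with open("backup_board.bin", "rb") as f1, open("original_board.bin", "rb") as f2:
        for offset, old, new in diff_images(f1.read(), f2.read()):
            print(f"0x{offset:04x}: {old:02x} -> {new:02x}")
```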

To get the best possible results, I recorded digital samples from the hardware instead of the analogue signal, by capturing the 12-bit parallel input to the DAC with a logic analyser. I then converted the recorded data into an array of Pulse-Code Modulation (PCM) samples in hexadecimal using a Python script, and saved them to a text file, one sample per line. I used Matlab to generate a waveform:

% Matlab code for generating audio with PCM samples
% Pawel Wozniak, December 2017
% read the hexadecimal samples and remove the DC offset (12-bit mid-scale = 2048)
y = hex2dec(textread('hardware.txt', '%s')) - 2048;
% normalise y to the range -1 to +1 (divider chosen empirically, see below)
y = y ./ 1024;
% sampling frequency is 10 kHz
Fs = 10000;
% plot the sound wave
figure;
plot(y);
% play the waveform and save it as audio
filename = 'hardware.wav';
sound(y, Fs);
audiowrite(filename, y, Fs);

The above script reads the PCM samples from a text file and converts them to decimal values. Since the resolution is 12 bits, a sample can take 4096 values, from 0 to 4095. A value of 2048 (mid-scale) is subtracted from each sample to remove the DC offset. Each sample is then divided by 1024 to normalise the values for maximum amplitude. In theory, the divider should be 2048, but in practice the samples had a relatively small amplitude, and dividing them by 1024 gave a signal ranging from -1 to +1. The waveform was saved into a .wav file using the audiowrite function, with the sampling frequency set to 10 kHz.
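For readers without Matlab, the same conversion can be sketched in Python with the standard wave module. This is an illustration rather than the script used in the project. It mirrors the steps above (subtract 2048, divide by 1024, 10 kHz sampling). Like audiowrite with floating-point data, it clips values to the range -1 to +1, and it writes 16-bit mono PCM. The file names are hypothetical.

```python
import struct
import wave
from typing import Iterable, List


def hex_to_normalised(lines: Iterable[str], offset: int = 2048, divider: float = 1024.0) -> List[float]:
    """Convert hexadecimal 12-bit PCM samples to floats, removing the DC offset and clipping to [-1, 1]."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = (int(line, 16) - offset) / divider
        samples.append(max(-1.0, min(1.0, value)))
    return samples


def write_wav(path: str, samples: List[float], fs: int = 10000) -> None:
    """Write samples in [-1, 1] as a mono 16-bit PCM .wav file at fs Hz."""
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(frames)


if __name__ == "__main__":
    with open("hardware.txt") as f:
        write_wav("hardware.wav", hex_to_normalised(f))
```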

The voice synthesiser emulator was then used to generate waveforms from the same input strings for comparison. Some samples were found to be slightly misaligned in the time domain. This was because the data recorded from the hardware board contained more zero-value samples at the beginning of the file, due to the way the trigger was set on the logic analyser. I used Audacity to compare the timing.

Software and hardware generated voice samples misaligned in time domain.
Hardware and software test samples misaligned in the time domain

It was fixed by removing the extra zero-value samples from the hardware-generated text files and recreating waveform files to match the timing of the software-generated audio.
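The trimming step can be sketched as follows. The sketch assumes that both recordings are available as lists of integer sample values in which the silence before the speech is stored as 0. The function names are hypothetical.

```python
from typing import List


def leading_zeros(samples: List[int]) -> int:
    """Count the zero-value samples at the start of a recording."""
    count = 0
    while count < len(samples) and samples[count] == 0:
        count += 1
    return count


def align_to(reference: List[int], capture: List[int]) -> List[int]:
    """Drop extra leading zeros from capture so that it starts at the same point as reference."""
    extra = leading_zeros(capture) - leading_zeros(reference)
    return capture[extra:] if extra > 0 else capture
```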

Software and hardware generated voice samples aligned in time domain.
Hardware and software test samples after alignment in the time domain

I compared the hardware and software sample files using Matlab Signal Analyzer:

[software, fs] = audioread('emulator.wav');
[hardware, fs] = audioread('hardware.wav');
signalAnalyzer

Emulator-generated voice analysis in time and spectrum domains.
Comparison of hardware- and software-generated voice in the time and frequency domains

In terms of sound duration, the compared signals were an exact match. However, their amplitudes seemed to differ slightly in places, so I investigated further. More samples were taken from both the hardware and the software, and they always differed from each other; even hardware samples did not perfectly match other hardware samples. We noticed that the differences occurred in fricative sounds, such as ‘z’, ‘f’, ‘v’, or ‘s’. This turned out to be an effect of the pseudo-random number generator implemented on the DSP to generate these sounds (Klatt, 1979).
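The locations of the differences can be found by comparing short windows of the two aligned signals. The sketch below is illustrative only. The 100-sample window (10 ms at 10 kHz) and the threshold are arbitrary choices of mine, and the signals are assumed to be normalised sample lists of equal length.

```python
import math
from typing import List, Sequence


def rms_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("signals differ in length")
    if not a:
        return 0.0
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def differing_windows(a: Sequence[float], b: Sequence[float],
                      window: int = 100, threshold: float = 0.01) -> List[int]:
    """Return the start index of every window whose RMS difference exceeds threshold."""
    if len(a) != len(b):
        raise ValueError("signals differ in length")
    return [start for start in range(0, len(a), window)
            if rms_difference(a[start:start + window], b[start:start + window]) > threshold]
```

If the differences come from the noise generator, the flagged windows would be expected to coincide with fricatives.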

In the frequency domain the signals were a very close match, but small differences were noted. These could also be explained by the effects of the pseudo-random number generator. Another possible reason is that the two least significant bits are swapped on the hardware, and this behaviour is not implemented in the software. Either way, there was no perceivable difference in the sound, and the voice was judged to meet the criteria set by the project objectives.
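Modelling the LSB swap, for example to test whether it accounts for the remaining spectral differences, amounts to exchanging bits 0 and 1 of every 12-bit sample. A minimal sketch follows. The function name is hypothetical, and the swap was not implemented in the emulator.

```python
def swap_two_lsbs(sample: int) -> int:
    """Exchange bit 0 and bit 1 of a sample, leaving all other bits unchanged."""
    bit0 = sample & 1
    bit1 = (sample >> 1) & 1
    return (sample & ~0b11) | (bit0 << 1) | bit1


if __name__ == "__main__":
    print([hex(swap_two_lsbs(s)) for s in (0x7F0, 0x7F1, 0x7F2, 0x7F3)])
```

Applying the swap twice returns the original value, so the same function converts in either direction.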

During the testing, the illegal instruction contained in the DSP ROM was never reached by the emulator. Therefore, it did not affect the operation of the program or the voice characteristics.

The last stage of testing was to present the emulator to Professor Hawking himself. This took place in January 2018: a laptop running Linux drove the emulator, which output the audio to the original audio system mounted on the wheelchair. By then, I had already developed a way of running the emulator under Windows, but that solution was somewhat messier than the Linux one. The voice turned out to be much cleaner than the original, and the emulator was the first replacement for the hardware to be accepted by its owner.

Stephen Hawking's Voice Emulator running on Linux.
Console output of the working emulator

WINDOWS SUBSYSTEM FOR LINUX

Although the emulator ran natively on Linux, I investigated the possibility of running it under Windows using the Windows Subsystem for Linux (WSL), since Windows was the operating system used by Professor Hawking. The biggest issue was that Windows does not officially allow WSL to access any hardware on the host machine. This means that native Linux programs can run under Windows but cannot, for example, play audio. This can be solved by using a modified WSL GUI package that streams audio through PulseAudio from the client (Linux) side, and by installing a PulseAudio server to receive the audio stream on the host (Windows). VSPD (Virtual Serial Port Driver) software was used to emulate the serial connection between ACAT (or, in this case, PuTTY) and the emulator. Below are the original instructions I wrote on how to deploy the emulator on Windows.

Instructions to run the voice emulator under Windows Subsystem for Linux
1. Install WSL, install sox (apt-get install sox), then close WSL.
2. In the Windows command prompt, run: ubuntu config --default-user root
3. Open WSL again and run apt-get update and apt-get upgrade.
4. Install wsl_gui using the batch script.
5. In /etc/pulse/default.pa, add load-module module-native-protocol-tcp below load-module module-native-protocol-unix.
6. In Windows Firewall, allow all incoming TCP connections on port 22.
7. Install VSPD and set up a virtual connection between COM5 and COM6.
8. Move the emulator files to c:\emul; they will then be accessible at /mnt/c/emul.
9. Within WSL (ttyS6 corresponds to COM6):
(
   stty raw 9600 -parenb -cstopb -clocal cread crtscts
   ./emul_integrated_full >&0
) <>/dev/ttyS6
10. connect to COM5 as you would with ACAT, change serial port settings to 7/1/NON or change them from WSL side to 8/1/X
11. extract pulse6 from the link below to c:\soft and run c:\soft\pulse6\bin\pulseaudio.exe

Pulse6 for WSL
Virtual Serial Port Driver
wsl_gui_autoinstall

To automate the process, I wrote a batch script that sets everything up; the user only needs to install WSL from the Microsoft Store, extract a specially prepared package to C:\ and run the script.

Automated batch script to run the voice emulator under WSL
@ECHO OFF
SET "LINUXTMP=$(echo '%TMP:\=\\%' | sed -e 's|\\|/|g' -e 's|^\([A-Za-z]\)\:/\(.*\)|/mnt/\L\1\E/\2|')"
echo LINUXTMP = "%LINUXTMP%"
ECHO --- Running Linux installation.
echo yes ^| add-apt-repository ppa:aseering/wsl-pulseaudio > "%TMP%\script.sh"
echo apt-get update >> "%TMP%\script.sh"
echo apt-get -y install pulseaudio unzip >> "%TMP%\script.sh"
echo sed -i 's/; default-server =/default-server = 127.0.0.1/' /etc/pulse/client.conf >> "%TMP%\script.sh"
echo sed -i "s$<listen>.*</listen>$<listen>tcp:host=localhost,port=0</listen>$" /etc/dbus-1/session.conf >> "%TMP%\script.sh"
echo apt-get -y install sox >> "%TMP%\script.sh"
C:\Windows\System32\bash.exe -c "chmod +x '%LINUXTMP%/script.sh' ; tr -d $'\r' < '%LINUXTMP%/script.sh' | tee '%LINUXTMP%/script_clean.sh'; sudo '%LINUXTMP%/script_clean.sh'"
ECHO --- Setting PulseAudio to run at startup.
echo set ws=wscript.createobject("wscript.shell") > "%userprofile%\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\start_pulseaudio.vbe"
echo ws.run "%cd%\pulse6\bin\pulseaudio.exe --exit-idle-time=-1",0 >> "%userprofile%\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\start_pulseaudio.vbe"
ECHO --- Opening TCP port 22 in firewall.
netsh advfirewall firewall add rule name="Voice" dir=in action=allow protocol=TCP localport=22
ECHO --- When prompted, DO NOT allow 'pulseaudio' access to any of your networks.  It doesn't need access.
"%userprofile%\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\start_pulseaudio.vbe"
ECHO --- Please install Virtual Serial Port Driver and create a pair for COM ports 5 and 6.
"%cd%\vspd.exe"
pause
ECHO --- Setting Voice to run at startup.
copy /-y "%cd%\start_voice.bat" "%userprofile%\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\start_voice.bat"
copy /-y "%cd%\start_voice.bat" "%userprofile%\Desktop\start_voice.bat"
ubuntu config --default-user root
"%userprofile%\Desktop\start_voice.bat"
ECHO --- Please connect ACAT to port COM5.
pause

A while later, Peter also developed a native Windows emulator, but it was not finished in time: Professor Hawking sadly passed away on 14 March 2018, before he had a chance to try it.

AUDIO SAMPLES

The first ever audio generated by the emulator (December 25th, 2017). Note the popping sound at the beginning and the end of the sentence caused by the DC offset:

Soon after, we found a bug that caused random mid-sentence breaks. It turned out to be a problem with the CPU instruction ROM.
On top of that, the speech rate was slightly faster than the original voice, because we were not using the escape characters for speech-rate control:

The final comparison. In the recordings below you can easily hear the analogue noise and the popping sound caused by DC offset on the hardware synthesiser, while the emulated voice is much cleaner.

Hardware CallText 5010


Software emulator

CONCLUSIONS

The project was challenging in several ways. It was not easy to find the relevant documentation, reverse engineer the hardware board, develop the low-level software emulator, and verify its operation. Almost eight years after it started, the project was finally brought to a successful end. The emulator ran on a Raspberry Pi 3 with an external USB DAC for additional amplification. There were plans to port the code to run natively on Windows, but all work was suspended after the sad death of Professor Hawking. He used the emulator as his primary voice for the last two months of his life. When he first tried it, he simply said ‘I love it’.

The objectives of the project were met: the DSP emulator was developed and merged with the CPU emulator to form a fully software-based voice synthesiser. The voice characteristics were matched, as confirmed by analysis in the time and frequency domains; the small remaining differences were attributed to the pseudo-random number generator and were not audible. The voice was accepted by its owner as his own. All the planned work was carried out, although some changes were made to the original project plan. There was no longer a need to develop a way of injecting emulator-generated data packets into the DSP to test its functionality. Instead, samples of the traffic between the CPU and DSP were captured and compared against the software. This approach was less time-consuming and gave more detailed results.

There are multiple advantages to using the emulator over a hardware solution. The whole communication system takes up less physical space, and its power consumption is lower. It is easy to keep multiple backups and to make changes to the system, and the emulator does not break as easily as the hardware does. Testing also revealed an advantage that had not even been considered before, even though the voice itself remained unchanged: the constant, loud background hiss caused by the degradation of analogue components was gone. The popping and clicking sounds generated by the DSP firmware were also eliminated, making the voice sound clearer and more defined.

This project focused on replicating and preserving one of the world’s most recognisable voices. Since the voice itself is trademarked, there are no other potential applications for the emulator. This article is an accurate description of the system used by Professor Hawking over the years. Hopefully, it can help people with motor neurone disease and similar conditions to discover and build solutions tailored to their needs.

REFERENCES, LINKS, AND ARTICLES

Linvill, J.G., & Bliss, J.C. (1966). A Direct Translation Reading Aid for the Blind. Proceedings of the IEEE, 54 (1), 40–51.

Klatt, D. (1979). Software for a cascade/parallel formant synthesizer. The Journal of the Acoustical Society of America, 67 (3), 971–95.

Klatt, D. (1987). Review of text-to-speech conversion for English. The Journal of the Acoustical Society of America, 82 (3), 737–93.

NEC Electronics Inc. (1985). uPD77C20A, 7720A, 77P20 Digital Signal Processors Datasheet.

History of Professor Hawking's voice synthesizer - Peter Benie

Professor Hawking's Voice - Peter Benie

Assistive Context-Aware Toolkit (ACAT) - Intel Labs

My Computer - Jonathan Wood, Graduate Assistant to Professor Stephen Hawking

A brief history of placement: how one student helped Stephen Hawking find his voice - The Engineer

“I love it,” – Stephen Hawking’s reaction to new voice emulator - University of Huddersfield

The quest to save Stephen Hawking's voice - Jason Fagone, San Francisco Chronicle

The man who helped to preserve Stephen Hawking’s iconic voice - University of Cambridge, Medium

Software Emulation of a Hardware Voice Synthesiser - Pawel Wozniak, University of Huddersfield
