80-0206-R Datasheet, PDF(4/49 Page) List of Unclassifed Manufacturers – Speech Recognition Processor


RSC-4128

Data Sheet

Speech Technologies

Speech Recognition

The RSC-4128 is designed to operate in tandem with the FluentChip™ technology library, which includes speaker independent (SI), speaker dependent (SD), and speaker verification (SV) speech recognition. Combinations of these technologies may be used to create feature-rich applications. They are described below:

Speaker Independent recognition requires no user training. The RSC-4128 can recognize up to 30 words in an active set (the number of sets is limited only by internal ROM size). Text-to-SI (T2SI™), based on a hybrid of Hidden Markov Modeling and Neural Net technologies, allows accurate SI recognition sets to be created in seconds. SI requires on-chip ROM or off-chip parallel bus ROM, EPROM, or Flash to store the words to be recognized.

Speaker Dependent recognition allows the user to create names for products or to customize recognition sets. SD is implemented with DTW (dynamic time warping) pattern matching technology. SD requires programmable memory to store the personalized speech templates (trained patterns); this may be on-chip SRAM or off-chip serial or parallel bus EEPROM, Flash memory, or SRAM. Up to 100 templates can be recognized in an active set (the number of unique sets is limited only by programmable memory capacity). The RSC-4128 can store up to 7 SD templates in on-chip SRAM.
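Dynamic time warping aligns a stored template with a new utterance spoken at a different speed before comparing them. The Python sketch that follows shows the textbook form of DTW template matching under stated assumptions; it is not Sensory's implementation. The feature frames, the Euclidean frame distance, the `recognize` helper, and its optional `reject_above` threshold are all illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence

# A frame is one feature vector (e.g. spectral coefficients for a short slice of audio).
# The feature type and frame length are assumptions made for illustration.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template: Sequence[Frame], utterance: Sequence[Frame]) -> float:
    """Classic DTW: minimum cumulative frame distance over all monotonic alignments."""
    n, m = len(template), len(utterance)
    inf = float("inf")
    # cost[i][j] = best cumulative distance aligning template[:i] with utterance[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(template[i - 1], utterance[j - 1])
            # Predecessors (i-1, j), (i, j-1), (i-1, j-1): stretch either sequence or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates: Dict[str, Sequence[Frame]],
              utterance: Sequence[Frame],
              reject_above: Optional[float] = None) -> Optional[str]:
    """Return the closest trained template's name, or None if even the best match is too far."""
    best = min(templates, key=lambda name: dtw_distance(templates[name], utterance))
    if reject_above is not None and dtw_distance(templates[best], utterance) > reject_above:
        return None
    return best
```

With one-dimensional frames, `recognize({"lights": [[0.0], [1.0], [2.0]], "music": [[5.0], [5.0], [5.0]]}, [[0.0], [0.0], [1.0], [2.0], [2.0]])` picks `"lights"`, because the slower utterance still aligns with the template at zero cost. A threshold such as `reject_above` is one way to avoid forcing out-of-vocabulary speech onto the nearest template.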

Speaker Verification enables the RSC-4128 to authenticate the target user when a previously trained password is spoken. SV is also implemented with DTW technology. Five SV templates can be stored in on-chip SRAM, or more with external programmable memory such as that listed for SD above.

Word Spotting enables the RSC-4128 to spot a specific word surrounded by other speech within a phrase. This can be quite effective when the user's response may vary (e.g. spotting “telephone” in the phrases “ummm telephone” or “telephone call”). This option is available for SI and SD.
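The data sheet does not say how the RSC-4128 performs word spotting. One common textbook technique is subsequence DTW, which lets the template match any stretch of a longer utterance instead of the whole thing. The Python sketch that follows illustrates that technique only; it is not Sensory's algorithm. Scalar features, the absolute-difference distance, and the `spot_word` and `word_present` helpers are simplifying assumptions.

```python
from typing import Sequence, Tuple


def spot_word(template: Sequence[float], utterance: Sequence[float]) -> Tuple[float, int]:
    """Subsequence DTW: best match of the whole template against any stretch of the utterance.

    Features are scalars here for brevity; real front ends use feature vectors.
    Returns (distance, index of the last utterance frame in the best match).
    """
    n, m = len(template), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("template and utterance must be non-empty")
    inf = float("inf")
    # Row 0 is all zeros, so a match may begin at any utterance frame (free start).
    cost = [[0.0] * (m + 1)] + [[inf] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(template[i - 1] - utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Free end: take the best score anywhere in the last row.
    best_j = min(range(1, m + 1), key=lambda j: cost[n][j])
    return cost[n][best_j], best_j - 1


def word_present(template: Sequence[float], utterance: Sequence[float], threshold: float) -> bool:
    """Hypothetical decision rule: the word is spotted if the best match is close enough."""
    distance, _ = spot_word(template, utterance)
    return distance <= threshold
```

For example, `spot_word([1.0, 2.0, 3.0], [9.0, 9.0, 1.0, 2.0, 3.0, 9.0])` returns `(0.0, 4)`: the template is found intact even though unrelated frames surround it, much as “telephone” is found inside “ummm telephone”.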

Continuous Listening allows the chip to listen continuously for a specific word. This word may be used as a trigger that tells a device to listen for a command. This option is available for SI and SD.
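Continuous listening with a trigger word amounts to a two-state loop: wait for the trigger, then briefly accept commands. The Python sketch that follows models only that control flow. It assumes a hypothetical recognizer that emits one recognized word (or `None`) per step; the trigger word, command set, and step-counted timeout are illustrative choices, not RSC-4128 parameters.

```python
from enum import Enum, auto
from typing import Iterable, Optional


class State(Enum):
    WAITING_FOR_TRIGGER = auto()
    WAITING_FOR_COMMAND = auto()


class TriggerListener:
    """Two-state trigger-word controller fed by a (hypothetical) recognizer."""

    def __init__(self, trigger: str, commands: Iterable[str], timeout_steps: int = 3):
        self.trigger = trigger
        self.commands = set(commands)
        self.timeout_steps = timeout_steps  # recognizer results to wait before giving up
        self.state = State.WAITING_FOR_TRIGGER
        self.steps_left = 0

    def feed(self, word: Optional[str]) -> Optional[str]:
        """Process one recognizer result (None = nothing recognized); return an accepted command."""
        if self.state is State.WAITING_FOR_TRIGGER:
            if word == self.trigger:
                self.state = State.WAITING_FOR_COMMAND
                self.steps_left = self.timeout_steps
            return None  # commands are ignored until the trigger is heard
        if word in self.commands:
            self.state = State.WAITING_FOR_TRIGGER
            return word
        self.steps_left -= 1
        if self.steps_left <= 0:
            self.state = State.WAITING_FOR_TRIGGER  # timed out; go back to listening for the trigger
        return None
```

Feeding `["lights on", "computer", None, "lights on", "lights off"]` to `TriggerListener("computer", {"lights on", "lights off"}, timeout_steps=2)` accepts only the `"lights on"` that follows the trigger.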

Speech and Music Synthesis

The RSC-4128 provides high-quality speech compression using Sensory SX™ technology. Data rates from approximately 2.4 to 10.8 kbps may be selected to trade speech quality against allotted memory. The highest data rates use a 16 kHz sample rate to reproduce high-pitched voices with high quality. Speech and sound effects may also be encoded using 8-bit PCM (64 kbps) or 4-bit ADPCM (32 kbps).
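The quoted bit rates translate directly into memory budgets: storage in bytes is duration times bit rate divided by 8. The quoted PCM and ADPCM figures also both work out to 8,000 samples per second (64,000 / 8 and 32,000 / 4). The Python helpers that follow do this arithmetic; the SX endpoints are the approximate figures from the text, and the helper names are illustrative.

```python
import math

# Bit rates quoted in this data sheet, in bits per second (SX values are approximate).
RATES_BPS = {
    "SX, lowest": 2_400,
    "SX, highest": 10_800,
    "4-bit ADPCM": 32_000,
    "8-bit PCM": 64_000,
}


def storage_bytes(duration_s: float, rate_bps: int) -> int:
    """Bytes needed to hold duration_s seconds of audio at rate_bps, rounded up."""
    return math.ceil(duration_s * rate_bps / 8)


def seconds_of_audio(memory_bytes: int, rate_bps: int) -> float:
    """Seconds of audio that fit in memory_bytes at rate_bps."""
    return memory_bytes * 8 / rate_bps


def implied_sample_rate_hz(rate_bps: int, bits_per_sample: int) -> float:
    """Sample rate implied by a fixed-width encoding's bit rate."""
    return rate_bps / bits_per_sample


if __name__ == "__main__":
    for name, bps in RATES_BPS.items():
        print(f"{name:>12}: {storage_bytes(10, bps):>6} bytes for 10 s")
```

For 10 seconds of audio this gives 3,000 bytes at 2.4 kbps, 13,500 bytes at 10.8 kbps, 40,000 bytes at 32 kbps, and 80,000 bytes at 64 kbps, which is why the low SX rates suit long phrase libraries.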

The RSC-4128 also provides high-quality, eight-voice wavetable music synthesis, which allows multiple simultaneous instruments for harmonizing. The RSC-4128 uses a MIDI-like system to generate music. One or more of the eight voices may be used for speech playback instead of music, and one or more may be a drum track comprising multiple drums. In effect, drum tracks allow the number of simultaneous instruments to exceed eight.
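The data sheet describes the synthesizer only at the feature level. As a general illustration of wavetable synthesis, each voice can step through a stored waveform with a phase accumulator, and a drum-track voice can sum several drum samples inside one voice slot, which is how a drum track lets the instrument count exceed eight. In the Python sketch that follows, the sine table, table size, 1/8 output scaling, and the `DrumTrackVoice` structure are assumptions, not the RSC-4128's internal design.

```python
import math
from typing import List, Sequence, Tuple

TABLE_SIZE = 256
# One cycle of a sine wave; a real instrument would use a sampled waveform instead.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]
MAX_VOICES = 8  # the eight-voice limit stated in the data sheet


class WavetableVoice:
    """Plays a looping single-cycle table at a given pitch via a phase accumulator."""

    def __init__(self, table: Sequence[float], freq_hz: float, sample_rate_hz: float):
        self.table = table
        # Table entries to advance per output sample; this sets the pitch.
        self.step = freq_hz * len(table) / sample_rate_hz
        self.phase = 0.0

    def next_sample(self) -> float:
        value = self.table[int(self.phase)]  # truncating lookup, no interpolation
        self.phase = (self.phase + self.step) % len(self.table)
        return value


class DrumTrackVoice:
    """One voice slot that sums several one-shot drum samples (hypothetical structure)."""

    def __init__(self, hits: List[Tuple[int, Sequence[float]]]):
        self.hits = hits  # (start_sample, drum_samples) pairs
        self.t = 0

    def next_sample(self) -> float:
        total = 0.0
        for start, samples in self.hits:
            idx = self.t - start
            if 0 <= idx < len(samples):
                total += samples[idx]  # overlapping drums share this single voice
        self.t += 1
        return total


def mix(voices, n_samples: int) -> List[float]:
    """Sum up to eight voices, scaled by 1/8 to leave headroom against clipping."""
    if len(voices) > MAX_VOICES:
        raise ValueError(f"at most {MAX_VOICES} voices")
    return [sum(v.next_sample() for v in voices) / MAX_VOICES for _ in range(n_samples)]
```

A voice at 31.25 Hz with an 8,000 Hz output rate advances exactly one table entry per sample, and a single `DrumTrackVoice` with overlapping hits still occupies only one of the eight slots.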

Speech and music data may be stored in on-chip ROM or off-chip parallel bus ROM, EPROM, or Flash. Speech data may alternatively be stored in off-chip serial data ROM or serial data Flash for extended durations.

Easy-to-use tools allow developers to record their own voice talent and compress the recordings at the push of a button, and to create their own MIDI scores and instruments.

Record and Playback

The RSC-4128 can perform speech record and playback (sometimes called “voice memo”) using either 8 bits (64 kbps) or 4 bits (32 kbps) per sample, depending on the quantity and quality of playback desired. The record and playback technology can also optionally perform silence removal to reduce memory requirements.
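The data sheet does not describe how silence removal works. A common simple approach is to drop low-energy frames while keeping a few frames after speech ends, so word endings are not clipped. The Python sketch that follows assumes 160-sample frames (20 ms at the 8 kHz rate implied by the 64 kbps, 8-bit figure), a mean-absolute-amplitude threshold, and a hangover count; all three values are hypothetical and would need tuning.

```python
from typing import List, Sequence


def remove_silence(samples: Sequence[int],
                   frame_len: int = 160,
                   threshold: float = 500.0,
                   hangover_frames: int = 2) -> List[int]:
    """Drop quiet frames, keeping up to hangover_frames quiet frames after each loud one."""
    out: List[int] = []
    quiet_run = hangover_frames + 1  # start as if already in a long silence
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(s) for s in frame) / len(frame)  # mean absolute amplitude
        quiet_run = 0 if level >= threshold else quiet_run + 1
        if quiet_run <= hangover_frames:
            out.extend(frame)  # loud frame, or quiet frame still within the hangover
    return out
```

This sketch simply discards silent frames, so pauses are shortened on playback; a design that must preserve timing could store the length of each removed gap instead.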

External parallel or serial bus Flash or SRAM is required to store the compressed speech.


P/N 80-0206-R

© 2006 Sensory Inc.
