RSC-4128
Data Sheet
Speech Technologies
Speech Recognition
The RSC-4128 is designed to work with the FluentChip™ technology library, which provides speaker-independent (SI),
speaker-dependent (SD), and speaker verification (SV) speech recognition. These technologies may be combined to
create feature-rich applications. They are described below:
• Speaker Independent recognition requires no user training. The RSC-4128 can recognize up to 30 words in an
active set (the number of sets is limited only by internal ROM size). Text-to-SI (T2SI™), based on a hybrid of Hidden
Markov Model and neural network technologies, allows accurate SI recognition sets to be created in seconds. SI
requires on-chip ROM or off-chip parallel bus ROM, EPROM, or Flash to store the words to be recognized.
• Speaker Dependent recognition allows users to create their own names for products or to customize recognition
sets. SD is implemented with DTW (dynamic time warping) pattern matching technology. SD requires programmable
memory to store the personalized speech templates (trained patterns); this may be on-chip SRAM, or off-chip
serial or parallel bus EEPROM, Flash memory, or SRAM. Up to 100 templates can be recognized in an active
set (the number of unique sets is limited only by programmable memory capacity). The RSC-4128 can store up
to 7 SD templates in on-chip SRAM.
• Speaker Verification enables the RSC-4128 to confirm that a previously trained password was spoken by the
target user. SV is also implemented with DTW technology. Five SV templates can be stored in on-chip SRAM,
or more in external programmable memory, as described for SD above.
• Word Spotting enables the RSC-4128 to spot a specific word surrounded by other speech within a phrase. This
can be quite effective when the user's response may vary (e.g., spotting “telephone” in the phrases “ummm
telephone” or “telephone call”). This option is available for SI and SD.
• Continuous Listening allows the chip to listen continuously for a specific word. This may be used as a trigger
word that tells a device to listen for a command. This option is available for SI and SD.
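SD and SV are both described as DTW pattern matching. The Python sketch below illustrates the general technique under stated assumptions; it is not Sensory's FluentChip implementation, whose API is not described here. Utterances are assumed to be already converted into sequences of feature frames (tuples of floats), and `dtw_distance`, `recognize_sd`, `verify_sv`, `spot_distance`, and the verification `threshold` are hypothetical names. `spot_distance` is a subsequence-DTW variant, one common way to find a keyword inside a longer phrase; whether the RSC-4128 uses this method for word spotting is not stated.

```python
import math
from typing import Dict, List, Sequence

# Sketch of DTW template matching; all names here are hypothetical,
# not the FluentChip API. Frames are assumed to be feature vectors.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Cost of the best alignment of the whole query with the whole template."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # query advances, template frame reused
                                 cost[i][j - 1],      # template advances, query frame reused
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize_sd(query: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """SD recognition: label of the trained template closest to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


def verify_sv(query: List[Frame], password: List[Frame], threshold: float) -> bool:
    """SV: accept only if the utterance is close enough to the enrolled password.
    `threshold` is a hypothetical tuning value."""
    return dtw_distance(query, password) <= threshold


def spot_distance(phrase: List[Frame], keyword: List[Frame]) -> float:
    """Subsequence DTW: best cost of matching the whole keyword
    against any contiguous span of the phrase."""
    n, m = len(keyword), len(phrase)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        cost[0][j] = 0.0  # open begin: keyword may start anywhere in the phrase
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(keyword[i - 1], phrase[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return min(cost[n][1:])  # open end: keyword may finish anywhere
```

For example, with one-dimensional frames the query `[(0,), (0,), (1,), (2,)]` aligns with the template `[(0,), (1,), (2,)]` at zero cost, because DTW lets the first template frame absorb the repeated query frame. Whole-utterance DTW would penalize extra words around a keyword, which is why the spotting variant leaves the start and end of the phrase open.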
Speech and Music Synthesis
The RSC-4128 provides high-quality speech compression using Sensory SX™ technology. Data rates from approximately
2.4 to 10.8 Kbps may be selected to trade speech quality against allotted memory. The highest data rates use a
16 kHz sample rate to reproduce high-pitched voices with high quality. Speech and sound effects may also be encoded
using 8-bit PCM (64 Kbps) or 4-bit ADPCM (32 Kbps).
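To make the quality-versus-memory trade-off concrete, the Python sketch below computes how many seconds of audio fit in a given memory at each rate quoted above (duration = memory bits ÷ bit rate). The 512 KB memory budget is a hypothetical example, and treating the SX figures as constant bit rates is an assumption. Note that 64 Kbps at 8 bits per sample and 32 Kbps at 4 bits per sample both correspond to 8,000 samples per second.

```python
# Bit rates quoted in this data sheet, in bits per second.
# The SX figures are the approximate ends of its selectable range.
RATES_BPS = {
    "SX, lowest rate (~2.4 Kbps)": 2_400,
    "SX, highest rate (~10.8 Kbps)": 10_800,
    "4-bit ADPCM (32 Kbps)": 32_000,
    "8-bit PCM (64 Kbps)": 64_000,
}


def seconds_of_audio(memory_bytes: int, rate_bps: int) -> float:
    """Seconds of audio that fit in memory_bytes at a constant bit rate."""
    return memory_bytes * 8 / rate_bps


def durations(memory_bytes: int) -> dict:
    """Playback time for every encoding in RATES_BPS."""
    return {name: seconds_of_audio(memory_bytes, bps) for name, bps in RATES_BPS.items()}


if __name__ == "__main__":
    budget = 512 * 1024  # hypothetical 4-Mbit (512 KB) serial Flash
    for name, secs in durations(budget).items():
        print(f"{name:32s} {secs:7.1f} s")
```

With the hypothetical 512 KB budget this prints 1747.6 s (about 29 minutes) at 2.4 Kbps, 388.4 s at 10.8 Kbps, 131.1 s for 4-bit ADPCM, and 65.5 s for 8-bit PCM.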
The RSC-4128 also provides high-quality, eight-voice wavetable music synthesis, which allows multiple instruments
to play simultaneously in harmony. The RSC-4128 uses a MIDI-like system to generate music. One or more of the eight
voices may be used for speech playback instead of music, and one or more may be a drum track comprising multiple
drums. In effect, drum tracks allow the number of simultaneous instruments to exceed eight.
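Wavetable synthesis generally works by stepping through a stored waveform at a rate proportional to the desired pitch and then summing the active voices. The Python sketch below shows that generic phase-accumulator technique for up to eight voices; it is not Sensory's synthesis engine. The 16 kHz output rate, the 64-entry sine table standing in for an instrument sample, and the fixed 1/8 mixing headroom are assumptions for illustration, and speech voices and drum tracks are not modeled.

```python
import math
from dataclasses import dataclass
from typing import List

MAX_VOICES = 8        # the data sheet's eight-voice limit
SAMPLE_RATE = 16_000  # hypothetical output rate for this sketch


def sine_table(size: int = 64) -> List[float]:
    """One cycle of a sine wave, used here as a stand-in instrument sample."""
    return [math.sin(2 * math.pi * k / size) for k in range(size)]


@dataclass
class Voice:
    table: List[float]  # single-cycle waveform (the "wave table")
    freq_hz: float      # note pitch
    gain: float = 1.0
    phase: float = 0.0  # fractional read position into the table

    def next_sample(self) -> float:
        n = len(self.table)
        sample = self.table[int(self.phase)] * self.gain
        # Advance so the table is traversed freq_hz times per second.
        self.phase = (self.phase + self.freq_hz * n / SAMPLE_RATE) % n
        return sample


def mix(voices: List[Voice], num_samples: int) -> List[int]:
    """Sum all active voices and scale to signed 16-bit PCM."""
    if len(voices) > MAX_VOICES:
        raise ValueError(f"at most {MAX_VOICES} simultaneous voices")
    out = []
    for _ in range(num_samples):
        # Divide by the voice limit so eight full-scale voices cannot clip.
        s = sum(v.next_sample() for v in voices) / MAX_VOICES
        out.append(int(round(s * 32767)))
    return out
```

Dividing by the voice limit is the simplest way to guarantee headroom; a practical engine might instead apply per-voice volume or saturate the summed output.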
Speech and music data may be stored in on-chip ROM or off-chip parallel bus ROM, EPROM, or Flash. Speech
data may alternatively be stored in off-chip serial data ROM or serial data Flash for longer durations.
Easy-to-use tools allow developers to record and compress speech from their own voice talent with the push of a
button, or to create their own MIDI scores and instruments.
Record and Playback
The RSC-4128 can perform speech record and playback (sometimes called “voice memo”) using either 8 bits
(64Kbps) or 4 bits (32Kbps) per sample, depending on the quantity and quality of playback desired. The record and
playback technology also optionally performs silence removal to reduce memory requirements.
External parallel or serial bus Flash or SRAM is required to store the compressed speech.
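The data sheet does not say how silence removal is performed. A common approach, assumed here, is to split the recording into short frames and discard frames whose energy falls below a threshold. The Python sketch below applies that idea and then computes storage at 8 or 4 bits per sample; the 20 ms frame length and the energy threshold are hypothetical tuning values, and the 8 kHz sample rate is inferred from 8 bits × 8,000 samples/s = 64 Kbps.

```python
from typing import List, Sequence

SAMPLE_RATE = 8_000            # inferred: 8 bits x 8,000 samples/s = 64 Kbps
FRAME_LEN = SAMPLE_RATE // 50  # hypothetical 20 ms frames (160 samples)


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def remove_silence(samples: Sequence[int], threshold: float) -> List[int]:
    """Keep only frames whose energy exceeds `threshold` (hypothetical tuning value)."""
    kept: List[int] = []
    for start in range(0, len(samples), FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        if frame_energy(frame) > threshold:
            kept.extend(frame)
    return kept


def bytes_needed(num_samples: int, bits_per_sample: int) -> int:
    """Storage for num_samples at 8 (PCM) or 4 (ADPCM) bits per sample, rounded up."""
    return (num_samples * bits_per_sample + 7) // 8
```

For instance, 0.6 s of audio (4,800 samples) in which only the middle 0.2 s is speech shrinks to 1,600 samples, which needs 1,600 bytes at 8 bits per sample or 800 bytes at 4 bits per sample, compared with 4,800 bytes for the unedited 8-bit recording.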
P/N 80-0206-R
© 2006 Sensory Inc.