Transcribe audio channels with speech to text, synthesize messages with text to speech, and download your audio & transcription files.

SeaVoice STT & TTS Bot

Visit our website:

SeaVoice Discord Bot Homepage –>

STT Homepage –>

TTS Homepage –>

🐙 The SeaVoice Bot is a new speech-to-text and text-to-speech Discord integration brought to you by Seasalt.ai, a startup run by some of the world’s leading experts in deep speech recognition, neural speech synthesis, and natural language processing. 🐙

Watch the demo video: https://www.youtube.com/embed/drOVk_bexFY

SeaVoice is a voice intelligence bot that uses advanced AI technology to improve the Discord voice channel experience. One of the great things about Discord’s text channels is that they maintain a permanent log of the server’s conversations. But what about the voice channels? Once something is said verbally in the channel, it’s gone - you can’t catch up on part of the conversation you missed or search the conversation later.

Invite SeaVoice to the voice channel, and you get real-time speech transcriptions delivered to a chat channel as the conversation happens. You’ll also receive a final version of your transcript and voice recording in a DM after the session ends. SeaVoice is set apart from bots offering similar services because it’s backed by state-of-the-art deep learning models crafted by Seasalt.ai.

We feel that providing highly accurate transcriptions for voice channels is a huge accessibility improvement for Discord. Additionally, because transcriptions are automatically posted to a text channel, that means they are permanent, searchable, and shareable. Similarly, speech synthesis also boosts participation in voice channels by making them more accessible to people who can’t or don’t want to speak personally.

Capabilities

✍️ Speech-to-Text

Transcribe Audio from Discord Voice Channels

/recognize [language]

The bot joins the voice channel you’re currently in, then listens and posts transcriptions in real time to the text channel where the slash command was entered. It records and transcribes everyone in the voice channel. When the session ends, the bot DMs the session creator a final transcription file, an SRT-formatted transcript file (used for subtitles), and a link to a full audio download. The session wraps up automatically if all users leave the voice channel, or if the bot shuts down or restarts for any reason (such as when a new version is released).
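If you want to search or post-process the SRT transcript after a session, a short script is enough. The following is a minimal sketch in Python, assuming the file follows the common SubRip layout (a cue number, a "start --> end" timestamp line, then one or more text lines, with cues separated by blank lines). The names Cue and parse_srt are made up for this illustration, and the sample text is invented; how SeaVoice labels speakers inside each cue is not covered here.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int      # cue number as written in the file
    start_ms: int   # start time in milliseconds
    end_ms: int     # end time in milliseconds
    text: str       # cue text (may span several lines)


# SubRip timestamps look like 00:01:02,345 (some tools write a dot instead).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_ms(stamp: str) -> int:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(content: str) -> list[Cue]:
    """Parse SubRip text into a list of cues (hypothetical helper)."""
    cleaned = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # skip empty or incomplete blocks
        start, _, end = lines[1].partition("-->")
        cues.append(Cue(int(lines[0]), _to_ms(start), _to_ms(end), "\n".join(lines[2:])))
    return cues


# Invented sample, only to show the expected shape of the input.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello everyone.

2
00:00:04,000 --> 00:00:06,250
Can you hear me?
"""

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(cue.start_ms, cue.end_ms, cue.text)
```

Keeping the times as integer milliseconds makes it easy to filter cues by time range or to line them up with the audio recording.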

Language Support

SeaVoice currently supports 12 languages. The English and Taiwanese Mandarin models are our own in-house models trained from scratch; they are highly accurate and reliable. All other languages are supported using a multilingual open source model as the base. Its performance wasn’t great out of the box, so we integrated it into our own STT pipeline and tuned the model to improve it. One thing you may notice with the open source model is “hallucination”. This can show up in a few different ways: inserting words or phrases that weren’t said, transcribing in the wrong language, or translating the spoken language into a different language.

Language
English
Mandarin (Taiwan)
Spanish
Italian
Portuguese
German
Japanese
Korean
Russian
Hindi
Vietnamese

🗣 Text-to-Speech

Synthesize Speech from Chat to Voice Channel

Seasalt.ai also excels at speech synthesis. We offer a text-to-speech command, which allows users to type in a chat channel and have audio synthesized and played in a particular voice channel for them.

/speak [voice] [text]

To use this command, you should already be in a voice channel. In any text channel, type the /speak slash command, optionally specify which voice you would like to use, and enter the text that you would like synthesized. When the TTS is done speaking, a 🏁 reaction is applied to the command message. If no voice is specified, the default is Orca. You can also set your own default voice using the /user_config command. The available voices are listed below:

Name	Sex	Language
Orca	M	American English
Narwhal	M	British English
Angelfish	F	American English
Starfish	F	Mandarin (Taiwan)
Dolphin	F	Mandarin (Taiwan)
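The order in which a voice gets picked (the voice given to /speak, then your own default from /user_config, then Orca) can be sketched as follows. This is an illustration in Python of the behaviour described above, not the bot’s actual code; VOICES, BUILT_IN_DEFAULT and resolve_voice are hypothetical names, and rejecting unknown voice names is an assumption.

```python
from typing import Optional

# Voice table copied from the list above: name -> (sex, language).
VOICES = {
    "Orca": ("M", "American English"),
    "Narwhal": ("M", "British English"),
    "Angelfish": ("F", "American English"),
    "Starfish": ("F", "Mandarin (Taiwan)"),
    "Dolphin": ("F", "Mandarin (Taiwan)"),
}

BUILT_IN_DEFAULT = "Orca"


def resolve_voice(requested: Optional[str] = None,
                  user_default: Optional[str] = None) -> str:
    """Pick a voice: explicit request first, then the user's default, then Orca."""
    for candidate in (requested, user_default):
        if candidate is not None:
            if candidate not in VOICES:
                # Assumption: an unknown name is an error rather than a silent fallback.
                raise ValueError(f"unknown voice: {candidate!r}")
            return candidate
    return BUILT_IN_DEFAULT


if __name__ == "__main__":
    print(resolve_voice())                        # no voice given, no user default
    print(resolve_voice(user_default="Dolphin"))  # user default applies
    print(resolve_voice("Narwhal", "Dolphin"))    # explicit voice wins
```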

🎙️ Record & Download

Export Audio & Transcriptions from Voice Channels

Users are able to download their transcriptions and full audio recordings to a file.

When the STT session ends, the bot will DM the session creator a final transcription file, an SRT-formatted transcript file (used for subtitles), and a link to a full audio download. To download the audio, follow the link, then right-click in the web browser and select “Save as…”. Download links expire after 24 hours, so if you want a permanent copy of your file, download it to your computer.
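If you keep track of several sessions, it can help to know how long a download link has left. Here is a minimal sketch in Python, assuming the 24-hour window starts when the DM arrives (the exact starting point is not stated here, so treat the result as an estimate). LINK_LIFETIME, link_expires_at and time_left are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Lifetime of a download link, as stated above.
LINK_LIFETIME = timedelta(hours=24)


def link_expires_at(received_at: datetime) -> datetime:
    """Estimated expiry, assuming the clock starts when the DM was received."""
    return received_at + LINK_LIFETIME


def time_left(received_at: datetime, now: Optional[datetime] = None) -> timedelta:
    """Remaining time before the link expires; zero once it has expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return max(link_expires_at(received_at) - now, timedelta(0))


if __name__ == "__main__":
    received = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(link_expires_at(received))
    print(time_left(received, received + timedelta(hours=23)))
```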

Configuration

SeaVoice offers customizable settings for both servers and individual users.

Note: If you update any settings, you must stop and re-start any active /recognize sessions before the new configurations are applied.

👥 Server Settings

Configure settings for everyone in the server

/server_config [live_transcript] [transcript_recipients] [transcript_style] [ignore_bots] [censor]

Use the /server_config command to configure the settings for the server you are currently in. Only users with admin permissions in the server may use this command. Servers currently have the following settings: live_transcript, transcript_recipients, transcript_style, ignore_bots, and censor.

👤 User Settings

Configure settings for just yourself

/user_config [exclude_stt] [default_tts_voice]

Use the /user_config command to configure the personal settings for your Discord account. These settings persist no matter which server you are in. Users currently have the following settings: exclude_stt and default_tts_voice.

⚙️ Server / User Status

Check your current server or user configurations

/server_status

Run the /server_status command to get a breakdown of your current server configurations.

/user_status

Run the /user_status command to get a breakdown of your current user configurations.

Pricing

The SeaVoice Discord bot is completely free. No sign up required. Try it out and have fun!

About Seasalt.ai

Seasalt.ai is a Seattle-based startup founded by experts in speech and language technologies.

Data Usage

We collect anonymized voice data for the sole purpose of improving our speech and NLP models. We will never share or sell your data. You can read our full privacy policy here.
