Hold SHIFT to speak, release to type. Fast, accurate voice-to-text using Groq Whisper API.
- Hold SHIFT to record - Release to transcribe and type
- Fast transcription - Groq Whisper API (~0.5s)
- High accuracy - Whisper large-v3-turbo model
- System tray icon - Right-click for settings
- Auto-hide widget - Only shows when recording
- Easy settings - Mic picker, API key input in UI
- Privacy focused - Audio processed by Groq, not stored
- Accounting Mode - Converts spoken numbers to digits (e.g., "one hundred twenty three" → "123")
- Comma Formatting - Optional commas in large numbers (e.g., "1,234,567")
- Casual Mode - Lowercase output with informal punctuation
- Filter Words - Block unwanted phrases (e.g., "thank you" when nothing said)
- Blue Theme - Modern dark blue UI
- Custom Hotkeys - Change push-to-talk key in settings
- Emoji Voice Commands - Say "happy emoji" to insert 😊 (100+ emojis supported)
| Version | File | Description |
|---|---|---|
| Full | VoiceType.exe | All features, system tray, emoji support |
| Lite | VoiceTypeLite.exe | Optimized for older/slower computers |
- Uses the distil-whisper-large-v3-en model (faster)
- No system tray icon (less memory)
- No emoji conversion
- Simpler UI
- Smaller audio chunks
- Shares settings with Full version
Windows:
- Download VoiceType.exe (or VoiceTypeLite.exe for older computers) from the dist folder
- Double-click to run (no installation needed)
macOS:
- Download VoiceType.pkg
- Double-click it
- Click "Continue" then "Install"
- Done! VoiceType is in your Applications folder
- Clone the repo
  git clone https://github.com/yourusername/ai-speech-to-text.git
  cd ai-speech-to-text
- Create virtual environment
  python -m venv venv
- Activate venv
  - Windows: venv\Scripts\activate
  - Mac/Linux: source venv/bin/activate
- Install dependencies
  pip install -r requirements.txt
- Run
  python voice_type.py
  Or on Windows, just double-click run.bat
pip install pyinstaller
pyinstaller build/VoiceType.spec --noconfirm
# Or for Lite:
pyinstaller build/VoiceTypeLite.spec --noconfirm
The executable will be created at dist/VoiceType.exe
chmod +x build/build-mac.sh
./build/build-mac.sh
This creates:
- dist/VoiceType.app - The application bundle
- dist/VoiceType.pkg - PKG installer (share this file)
Note: You must build on the target platform. Windows builds only work on Windows, Mac builds only work on Mac.
- Get a free API key from Groq Console
- Right-click tray icon → Settings (or settings open automatically on first run)
- Paste your API key
- Select your microphone
- Configure features (Accounting Mode, Casual Mode, Filter Words)
- Click Save
- Place cursor where you want text
- Hold SHIFT and speak (widget appears)
- Release SHIFT to transcribe
- Text appears at cursor position
- Widget auto-hides after 2 seconds
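The hold-to-record flow above can be pictured as a small state machine. The sketch below is illustrative only and is not taken from voice_type.py: the class name `PushToTalk` and the injected `transcribe` and `type_text` callbacks are hypothetical stand-ins for the real audio, API, and keyboard code.

```python
from typing import Callable, List


class PushToTalk:
    """Collects audio chunks while the hotkey is held, then transcribes on release."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 type_text: Callable[[str], None]) -> None:
        self.transcribe = transcribe  # hypothetical: sends audio to the speech API
        self.type_text = type_text    # hypothetical: types text at the cursor
        self.recording = False
        self._chunks: List[bytes] = []

    def on_key_down(self) -> None:
        # Operating systems repeat key-down events while a key is held,
        # so only the first one starts a new recording.
        if not self.recording:
            self.recording = True
            self._chunks = []

    def on_audio_chunk(self, chunk: bytes) -> None:
        # Audio arriving while the key is up is discarded.
        if self.recording:
            self._chunks.append(chunk)

    def on_key_up(self) -> None:
        if not self.recording:
            return
        self.recording = False
        audio = b"".join(self._chunks)
        if audio:  # a quick tap with no audio produces no request
            text = self.transcribe(audio)
            if text:
                self.type_text(text)
```

Passing the callbacks in keeps the state logic testable without a microphone or network access.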
When enabled, converts spoken number words to digits:
- "one" → "1"
- "twenty five" → "25"
- "one hundred" → "100"
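As a rough illustration of how this kind of conversion can work, here is a minimal sketch. It is not VoiceType's actual implementation; the function names `words_to_number` and `convert_numbers` are made up for this example, and it only handles whole numbers up to the millions.

```python
import re

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(words):
    """Convert a list of number words, e.g. ["twenty", "five"], to an int."""
    total = current = 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = max(current, 1) * 100
        else:  # a scale word such as "thousand"
            total += max(current, 1) * SCALES[w]
            current = 0
    return total + current


# Longest words first so "sixteen" is tried before "six".
_WORDS = sorted(list(UNITS) + list(TENS) + ["hundred"] + list(SCALES),
                key=len, reverse=True)
_ALT = "|".join(_WORDS)
# A run of number words separated by spaces or hyphens ("twenty-five").
NUMBER_RUN = re.compile(rf"\b(?:{_ALT})(?:[\s-]+(?:{_ALT}))*\b", re.IGNORECASE)


def convert_numbers(text):
    """Replace each run of number words in text with its digits."""
    def repl(match):
        words = re.split(r"[\s-]+", match.group(0).lower())
        return str(words_to_number(words))
    return NUMBER_RUN.sub(repl, text)
```

Note that a simple approach like this also rewrites "one" in phrases such as "no one knows", and does not join numbers across "and".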
When enabled with Accounting Mode, adds commas to large numbers:
- "1000000" → "1,000,000"
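A minimal sketch of comma formatting, assuming it runs on text that already contains digits. The function name `add_commas` is hypothetical, and the choice to skip decimals and numbers with leading zeros is an assumption of this example, not documented VoiceType behavior.

```python
import re

# Four or more digits, not starting with 0, and not directly after a digit,
# a period, or a comma (so decimals and already-formatted numbers are left alone).
LARGE_NUMBER = re.compile(r"(?<![\d.,])[1-9]\d{3,}\b")


def add_commas(text: str) -> str:
    """Insert thousands separators into large whole numbers in text."""
    return LARGE_NUMBER.sub(lambda m: f"{int(m.group(0)):,}", text)
```

A rule this simple also formats years ("2024" becomes "2,024"), which may be one reason the feature is optional.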
When enabled, outputs lowercase text with informal punctuation:
- No capitalization
- Periods removed
- Multiple punctuation reduced
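The three rules above could be implemented roughly as follows. This is a sketch under assumptions, not the project's code: the function name `casual` is hypothetical, and keeping the decimal point in numbers like "3.5" is a choice made for this example.

```python
import re


def casual(text: str) -> str:
    """Lowercase text, collapse repeated punctuation, and drop periods."""
    text = text.lower()
    # Collapse runs of punctuation to the first mark: "!!!" -> "!", "?!" -> "?".
    text = re.sub(r"([!?.,])[!?.,]+", r"\1", text)
    # Remove periods, except a decimal point between two digits.
    text = re.sub(r"(?<!\d)\.|\.(?!\d)", "", text)
    return text.strip()
```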
Block unwanted phrases from being typed. Useful for blocking:
- "thank you" (common hallucination when nothing said)
- "thanks"
- Any custom words
Enter as comma-separated list in settings.
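One way to apply a comma-separated filter list is sketched below. The helper names `parse_filter_list` and `apply_filters` are hypothetical, and matching whole phrases case-insensitively is an assumption about the intended behavior.

```python
import re


def parse_filter_list(raw: str):
    """Split the comma-separated settings value into clean, lowercase phrases."""
    return [p.strip().lower() for p in raw.split(",") if p.strip()]


def apply_filters(text: str, phrases):
    """Remove each filtered phrase (and punctuation right after it) from text."""
    # Longest phrases first so "thank you" is handled before a shorter overlap.
    for phrase in sorted(phrases, key=len, reverse=True):
        # The lookarounds stop "thanks" from matching inside "thanksgiving".
        pattern = rf"(?<!\w){re.escape(phrase)}(?!\w)[.,!?]*"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Tidy up any double spaces left behind.
    return re.sub(r"\s{2,}", " ", text).strip()
```

For example, with the filter list "thank you, thanks", a transcription of just "Thank you." becomes an empty string, so nothing is typed.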
- Speak clearly for best results
- Works in any app (IDEs, browsers, editors)
- Right-click tray icon for settings anytime (Full version only)
- Say "emoji" after an emoji name to insert it (Full version only)
Speak emoji names to insert actual emojis! Just say the emoji name followed by "emoji":
| Say This | Get This |
|---|---|
| "happy emoji" | 😊 |
| "sad emoji" | 😢 |
| "angry emoji" | 😠 |
| "laughing emoji" | 😂 |
| "heart emoji" | ❤️ |
| "fire emoji" | 🔥 |
| "thumbs up emoji" | 👍 |
| "thinking emoji" | 🤔 |
| "party emoji" | 🎉 |
| "rocket emoji" | 🚀 |
Examples:
- "That's awesome fire emoji" → "That's awesome 🔥"
- "Great job thumbs up emoji" → "Great job 👍"
- "I'm confused thinking emoji" → "I'm confused 🤔"
More than 100 emojis are supported, covering emotions, animals, food, gestures, and more!
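The replacement itself can be sketched as a dictionary lookup driven by a regular expression. This is an illustrative example with only a handful of entries from the table above; the names `EMOJI` and `insert_emojis` are hypothetical and the real mapping in the app may be organized differently.

```python
import re

# A small subset of the spoken-name-to-emoji table.
EMOJI = {
    "happy": "😊",
    "sad": "😢",
    "fire": "🔥",
    "thumbs up": "👍",
    "thinking": "🤔",
}

# Longest names first so multi-word names win over shorter overlaps.
_NAMES = sorted(EMOJI, key=len, reverse=True)
EMOJI_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\s+emoji\b",
    re.IGNORECASE,
)


def insert_emojis(text: str) -> str:
    """Replace '<name> emoji' with the matching emoji character."""
    return EMOJI_PATTERN.sub(lambda m: EMOJI[m.group(1).lower()], text)
```

Names that are not in the table are left untouched, so "dinosaur emoji" would pass through as plain text in this sketch.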
- Python 3.8+
- Microphone
- Internet connection
- Groq API key (free tier available)
- API keys are stored locally in ~/.voice-type-config.json
- No data is sent anywhere except Groq API for transcription
- Audio is processed in real-time and not saved to disk permanently
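For readers curious what a local JSON settings file like this involves, here is a minimal sketch. Only the file path comes from this README; the key names in `DEFAULTS` and the functions `load_config` and `save_config` are assumptions for illustration and may not match the real file.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".voice-type-config.json"

# Hypothetical key names; the actual config file may use different ones.
DEFAULTS = {
    "api_key": "",
    "accounting_mode": False,
    "casual_mode": False,
    "filter_words": "",
}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Read settings from disk, falling back to defaults for anything missing."""
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        stored = {}
    if not isinstance(stored, dict):
        stored = {}
    return {**DEFAULTS, **stored}


def save_config(config: dict, path: Path = CONFIG_PATH) -> None:
    """Write settings to disk and restrict the file to the current user."""
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    try:
        path.chmod(0o600)  # has limited effect on Windows
    except OSError:
        pass
```

Because the API key is stored in plain text, restricting the file permissions is a sensible precaution on shared machines.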
- v1.2.0 - Accounting mode, casual mode, filter words, blue theme, Lite version
- v1.1.0 - Emoji support, custom hotkeys
- v1.0.0 - Initial release
MIT