AI & Machine Learning · 2025 · Completed

AI Assistant M3 - Conversational RAG Agent with Voice I/O

Production-ready AI assistant with reasoning agent, voice input/output (OpenAI Whisper & TTS), RAG document Q&A, cloud storage, and 7 integrated tools. Features conversational context awareness, natural voice interaction with 6 voice personalities, lazy loading, and ChatGPT-style session management.

Technologies Used

Python, Streamlit, LangChain, RAG, OpenAI GPT-4, OpenAI Whisper, OpenAI TTS, Google Cloud Firestore, Google Cloud Platform, ChromaDB, OpenAI Embeddings, Tavily, WebRTC, audio-recorder-streamlit

Project Overview

🤖 Conversational RAG Agent with Voice I/O & Cloud Storage

Built a production-grade AI assistant featuring a reasoning agent, RAG-based document Q&A, conversational context awareness, voice interaction through OpenAI Whisper and TTS, and Google Cloud Firestore storage with ChatGPT-style session management.

🧠 Intelligent Agent System

Reasoning agent that autonomously selects appropriate tools based on context
Conversational context awareness - understands pronouns and follow-up questions
Two-level memory: agent history (last 10 messages) + RAG history (last 20 messages)
Custom enhanced prompts for optimal tool selection
GPT-4 integration with LangChain agent framework
Handles multi-tool queries intelligently
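
The two-level memory above can be pictured as two bounded buffers with different limits. The sketch below is a minimal illustration, not the project's actual code: the `BoundedHistory` class and its method names are hypothetical, and only the limits (10 and 20 messages) come from the description.

```python
from collections import deque
from typing import Dict, List


class BoundedHistory:
    """Hypothetical helper: keeps only the most recent `limit` messages."""

    def __init__(self, limit: int) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._messages = deque(maxlen=limit)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def messages(self) -> List[Dict[str, str]]:
        return list(self._messages)


# Limits taken from the feature list: 10 for the agent, 20 for RAG.
agent_history = BoundedHistory(limit=10)
rag_history = BoundedHistory(limit=20)

for i in range(25):
    agent_history.add("user", f"message {i}")
    rag_history.add("user", f"message {i}")
```

Keeping the RAG buffer longer than the agent buffer would let question reformulation see more of the document conversation while the agent prompt stays short.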

🎤 Voice Input/Output System

OpenAI Whisper integration for high-accuracy speech recognition (99+ languages)
Natural text-to-speech with 6 voice personalities (OpenAI TTS)
Auto-speak mode for hands-free conversational experience
Real-time audio transcription with browser WebRTC API
Voice-enabled document Q&A - speak questions, hear responses
Manual playback controls for any AI message
Base64 audio encoding with HTML5 autoplay integration
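
Base64 encoding with HTML5 autoplay, as listed above, usually means embedding the synthesized audio in a data URI inside an `<audio>` tag. The following is a rough sketch under that assumption; the function name `audio_autoplay_html` and the default MIME type are illustrative choices, not taken from the project.

```python
import base64


def audio_autoplay_html(audio_bytes: bytes, mime_type: str = "audio/mpeg") -> str:
    """Wrap raw audio bytes in an <audio> tag that tries to play immediately."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return (
        "<audio autoplay controls>"
        f'<source src="data:{mime_type};base64,{encoded}" type="{mime_type}">'
        "</audio>"
    )


# Placeholder bytes stand in for real TTS output here.
html = audio_autoplay_html(b"abc")
```

In a Streamlit app the resulting string could be rendered with `st.markdown(html, unsafe_allow_html=True)`. Browsers may still block autoplay until the user has interacted with the page, which is one reason to keep manual playback controls.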

📄 RAG Document Q&A

Upload PDFs, Word docs, and text files for Q&A
ChromaDB vector store with OpenAI embeddings
Conversational RAG - reformulates questions using chat history
Document chunking with recursive text splitter (chunk size 1000, overlap 200)
Smart retrieval with top-3 similarity search
Supports follow-up questions like "summarize it", "tell me more"
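
To make the chunking and retrieval settings above concrete, here is a simplified sketch in plain Python. It uses a fixed sliding window rather than a recursive splitter (which also tries to break on paragraph and sentence boundaries), and it assumes the 1000/200 figures are measured in characters. The `top_k` function ranks by cosine similarity, standing in for what a vector store such as ChromaDB does internally. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 3) -> List[int]:
    """Return the indices of the k vectors most similar to the query."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.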

🛠️ Seven Integrated Tools

🔍 Web Search - Tavily API for current information
🌤️ Weather - Real-time data via OpenWeatherMap
💱 Currency Converter - Live exchange rates (50+ currencies)
📊 Stock Prices - Current market data lookup (SerpAPI)
🧮 Calculator - Safe mathematical expression evaluation
📄 Document Q&A - RAG-based conversational queries
🎤 Voice I/O - Speech recognition & natural TTS
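
"Safe mathematical expression evaluation" in the calculator tool generally means avoiding `eval` and walking the parsed expression instead. The sketch below shows one common way to do that with Python's `ast` module. It is an assumption about the approach, not the project's code, and `safe_eval` is a hypothetical name. Exponentiation is deliberately left out because very large powers can stall the process.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate basic arithmetic without executing arbitrary code."""

    def _eval(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    # mode="eval" accepts a single expression only; malformed input
    # raises SyntaxError before any evaluation happens.
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

Function calls, attribute access, and names all fall through to the `ValueError` branch, so input such as `__import__('os')` is rejected rather than run.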

☁️ Google Cloud Storage & Session Management

Google Cloud Firestore integration for persistent storage
Cloud-native architecture with service account authentication
ChatGPT-style UI with smart session titles (generated from first message)
Lazy loading - sessions load only when clicked (5-8x faster)
Auto-save functionality with session caching
Delete sessions with one click
Supports multiple conversations with seamless switching
Scalable cloud infrastructure supporting 100+ concurrent users
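
The lazy-loading and caching behavior described above can be sketched without any cloud dependency. In the example below, `InMemoryStore` is a hypothetical stand-in for a Firestore client, and `SessionManager` and `make_title` are illustrative names. The point is only the access pattern: list metadata once, fetch messages when a session is first opened, and derive the title from the first message.

```python
from typing import Dict, List, Optional


class InMemoryStore:
    """Stand-in for a cloud document store; counts message reads."""

    def __init__(self, sessions: Dict[str, dict]) -> None:
        self._sessions = sessions
        self.message_reads = 0

    def fetch_metadata(self) -> List[dict]:
        # Titles only, never the full message lists.
        return [{"id": sid, "title": s["title"]} for sid, s in self._sessions.items()]

    def fetch_messages(self, session_id: str) -> List[str]:
        self.message_reads += 1
        return list(self._sessions[session_id]["messages"])


class SessionManager:
    def __init__(self, store: InMemoryStore) -> None:
        self._store = store
        self._metadata: Optional[List[dict]] = None
        self._messages: Dict[str, List[str]] = {}

    def list_sessions(self) -> List[dict]:
        if self._metadata is None:  # cached after the first call
            self._metadata = self._store.fetch_metadata()
        return self._metadata

    def open_session(self, session_id: str) -> List[str]:
        if session_id not in self._messages:  # loaded only when clicked
            self._messages[session_id] = self._store.fetch_messages(session_id)
        return self._messages[session_id]


def make_title(first_message: str, max_len: int = 40) -> str:
    """Build a short session title from the first user message."""
    text = " ".join(first_message.split())
    return text if len(text) <= max_len else text[:max_len - 3].rstrip() + "..."
```

Rendering the sidebar then costs one metadata read regardless of how many messages the sessions contain.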

🏗️ Architecture & Design

Modular architecture: separate tools, agents, RAG, UI, voice, and utils packages
Tool decorator pattern for easy extensibility
Pydantic schemas for type-safe tool inputs
Session state management with Streamlit
Error handling and graceful API failure recovery
Configuration-driven design (easy to customize)
Separation of concerns - voice logic isolated from core chat
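
A tool decorator with typed inputs, as mentioned above, can be illustrated with the standard library alone. This sketch uses a frozen dataclass as a stand-in for a Pydantic schema, and every name in it (`tool`, `TOOL_REGISTRY`, `ConvertInput`, `convert_currency`) is hypothetical. The exchange rate is passed in by the caller here, whereas a live tool would presumably fetch it from an API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, Callable] = {}


def tool(name: str, description: str) -> Callable:
    """Register a function as a named tool the agent can look up."""

    def decorator(func: Callable) -> Callable:
        func.tool_name = name
        func.description = description
        TOOL_REGISTRY[name] = func
        return func

    return decorator


@dataclass(frozen=True)
class ConvertInput:
    """Validated input; a Pydantic model would play this role in practice."""
    amount: float
    rate: float

    def __post_init__(self) -> None:
        if self.amount < 0 or self.rate <= 0:
            raise ValueError("amount must be >= 0 and rate must be > 0")


@tool("currency_converter", "Convert an amount using a given exchange rate")
def convert_currency(args: ConvertInput) -> float:
    return args.amount * args.rate
```

Adding a new tool then only requires writing one decorated function; the agent setup can iterate over `TOOL_REGISTRY` without being edited.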

⚡ Performance Optimizations

Lazy loading for 5-8x faster startup
Session metadata caching (no repeated Firestore queries)
Title storage at creation (not generated each time)
Optimized Firestore reads (metadata only, not full messages)
Agent caching with @st.cache_resource
Configurable auto-load for speed vs UX balance
Efficient audio encoding with temporary file cleanup

🎯 Voice Module Technical Details

Browser WebRTC MediaRecorder for real-time audio capture
OpenAI Whisper API with language hints for transcription accuracy
OpenAI TTS API with configurable models (tts-1/tts-1-hd)
Base64 audio encoding for browser-native playback
Temporary file management for secure audio processing
Session state for voice preferences (auto-speak, selected voice)
Cost-optimized with configurable quality settings
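
The temporary file management noted above might look like the context manager below, which writes recorded audio to disk for an API call and always removes it afterwards. This is a sketch under that assumption; `temp_audio_file` is a hypothetical name and the `.wav` suffix is an arbitrary default.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_audio_file(audio_bytes: bytes, suffix: str = ".wav"):
    """Write audio to a private temp file and delete it when done."""
    # mkstemp creates the file with owner-only permissions on POSIX systems.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        yield path
    finally:
        # Runs on success and on error, so no recordings are left behind.
        if os.path.exists(path):
            os.remove(path)
```

Inside the `with` block the path can be opened and handed to a transcription call; once the block exits, the recording no longer exists on disk.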

🏆 Technical Achievements

Intelligent Agent: Built reasoning agent that autonomously selects and chains tools with conversational context awareness
Voice Integration: Implemented production-grade voice I/O system with OpenAI Whisper (speech recognition) and TTS (6 natural voices) for hands-free interaction
Conversational RAG: Implemented dual-memory system with question reformulation for natural document Q&A
Cloud Architecture: Integrated Google Cloud Firestore for production-grade data persistence with service account security
Production Design: Modular, scalable architecture with 7 tools, voice I/O, cloud storage, and ChatGPT-style UX
Performance: Achieved 5-8x faster load times through lazy loading and intelligent caching strategies

💡 Key Features

Reasoning Agent, Voice Input/Output, OpenAI Whisper & TTS, Conversational Context, RAG Document Q&A, Cloud Storage, ChatGPT-style UI, Lazy Loading, Multi-Tool Integration, Modular Architecture
