Phone Translation Glossary
Plain-English definitions for the terms that come up when teams evaluate phone translation, multilingual support, and AI voice tools. Written for operators, not for linguists.
Phone Translation Basics
Real-Time Phone Translation
#AI translation of a live phone call as it happens.
A service that translates the audio of a live phone call between two languages with sub-second to sub-2-second latency. The agent and caller speak naturally in their own languages; the system streams translated audio in both directions while the call is in progress. Distinct from voicemail translation (post-call) and document translation (text only).
AI Phone Translation
#Phone translation powered by speech-to-text, machine translation, and text-to-speech models — no human in the loop.
Phone translation that uses AI models for speech recognition (STT), machine translation (MT), and speech synthesis (TTS) to translate calls automatically, instead of routing to a human interpreter. Compared with OPI, it connects faster (instantly vs. 30–60 seconds), costs less ($0.20–$0.50/minute vs. $1.50–$3.50/minute), and is available 24/7 in all supported languages.
Over-the-Phone Interpretation (OPI)
#Traditional service where a human interpreter joins a three-way call to translate.
A long-established service where a business calls an interpreter agency, requests an interpreter for a specific language, and conferences them into the call as a third party. Major providers include LanguageLine Solutions, Boostlingo, CyraCom, and Interpretype. Pricing is typically $1.50–$3.50/minute. Used in healthcare, legal, government, and large-enterprise settings where certified human interpretation is required.
Multilingual IVR (Interactive Voice Response)
#An automated phone menu that lets callers choose their language before the call routes.
A phone menu played to inbound callers that presents language options (e.g., "For English, press 1. Para español, oprima 2.") and routes the call accordingly. Each option can map to a specific language queue, agent group, or downstream IVR. Eliminates the awkward language handoff at the start of every call.
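The keypress-to-queue mapping can be pictured as a small lookup table. This is a minimal sketch, not any vendor's API: the queue names, the `MenuOption` type, and the fall-back-to-English rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuOption:
    language: str  # language code for the option
    prompt: str    # what the caller hears
    queue: str     # hypothetical queue name the call routes to


# One entry per keypad digit, matching the example prompts above.
MENU = {
    "1": MenuOption("en", "For English, press 1.", "support-en"),
    "2": MenuOption("es", "Para español, oprima 2.", "support-es"),
}
DEFAULT_DIGIT = "1"  # assumption: unknown or missing input falls back to English


def menu_prompt() -> str:
    """Build the full prompt played to the caller."""
    return " ".join(option.prompt for option in MENU.values())


def route_keypress(digit: str) -> MenuOption:
    """Map the caller's keypress to a menu option, using the default if invalid."""
    return MENU.get(digit, MENU[DEFAULT_DIGIT])


print(menu_prompt())
print(route_keypress("2").queue)
```

In a real deployment the telephony provider collects the keypress and your application only supplies the mapping.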
Voicemail Translation
#AI-generated transcript and translation of voicemails left in non-English languages.
A process in which a non-English voicemail is automatically detected, transcribed in the original language, translated to English, and (typically) summarized by AI. Often includes extracted fields like caller name, callback number, reason for calling, and urgency level. Lets monolingual support teams handle multilingual voicemails without an interpreter.
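A processed voicemail might end up as a record like the one below. This is an illustrative sketch: the field names, the three urgency levels, and the sample message are assumptions, not a specific product's schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical urgency levels, most urgent first.
URGENCY_RANK = {"high": 0, "normal": 1, "low": 2}


@dataclass
class VoicemailRecord:
    detected_language: str
    original_transcript: str
    english_translation: str
    summary: str
    caller_name: Optional[str] = None
    callback_number: Optional[str] = None
    reason_for_calling: Optional[str] = None
    urgency: str = "normal"


def triage_order(records: list[VoicemailRecord]) -> list[VoicemailRecord]:
    """Sort voicemails so the most urgent ones are reviewed first."""
    return sorted(records, key=lambda r: URGENCY_RANK[r.urgency])


# Made-up example voicemail left in Spanish.
record = VoicemailRecord(
    detected_language="es",
    original_transcript="Hola, soy María López. Necesito cambiar mi cita de mañana.",
    english_translation="Hi, this is María López. I need to reschedule my appointment for tomorrow.",
    summary="Caller wants to reschedule tomorrow's appointment.",
    caller_name="María López",
    reason_for_calling="Reschedule appointment",
    urgency="high",
)
print(record.summary)
```

Keeping both the original transcript and the translation lets a bilingual reviewer spot-check the AI output later.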
Bilingual Transcript
#A side-by-side written record of a translated call in both languages.
A persistent record of a phone call that captures the original speech of both parties (one in their native language, one in English) along with the real-time translations, displayed side-by-side or interleaved. Used for quality assurance, CRM integration, training, and compliance. Differs from a single-language transcript by preserving exactly what each party said in their own language.
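One simple way to model this is a list of turns, each holding the original words and their translation. The sketch below is an assumption about shape only: the `Turn` fields, the column width, and the sample dialogue are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str      # "caller" or "agent"
    language: str     # language the speaker actually used
    original: str     # exactly what was said
    translation: str  # what the other party heard


def render_side_by_side(turns: list[Turn], width: int = 48) -> str:
    """Render each turn as 'original | translation' on one line."""
    lines = []
    for turn in turns:
        left = f"[{turn.speaker}/{turn.language}] {turn.original}"
        lines.append(f"{left:<{width}} | {turn.translation}")
    return "\n".join(lines)


# Made-up two-turn call between a Spanish-speaking caller and an English-speaking agent.
sample_turns = [
    Turn("caller", "es", "¿Dónde está mi pedido?", "Where is my order?"),
    Turn("agent", "en", "Let me check that for you.", "Permítame verificarlo."),
]
print(render_side_by_side(sample_turns))
```

An interleaved layout would use the same data and simply print the translation on the line below the original.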
AI Call Summary
#An automatically generated summary of what happened on a call.
A short text summary, generated by AI after a call ends, that extracts key topics discussed, action items, decisions made, and customer sentiment. Often pushed automatically to a CRM (Zendesk, HubSpot, Salesforce) so support managers can review or audit calls without listening to the full recording.
Technical Terms
Speech-to-Text (STT)
#AI that converts spoken audio into written text.
Also called Automatic Speech Recognition (ASR). The first step in an AI translation pipeline — converts the speaker's audio into text in their original language. Quality depends on microphone, background noise, accent, and model training data. Phone-tuned STT models are optimized for the narrow audio bandwidth of compressed phone calls.
Text-to-Speech (TTS)
#AI that generates spoken audio from written text.
The final step in an AI translation pipeline — synthesizes the translated text as natural-sounding speech in the target language. Modern neural TTS models (e.g., ElevenLabs, Google Cloud TTS, Amazon Polly) produce voices nearly indistinguishable from humans, with control over voice identity, speed, and emotional tone.
Machine Translation (MT)
#AI that translates text from one language to another.
The middle step in an AI phone translation pipeline. Modern neural machine translation models (transformer-based) handle most language pairs with high accuracy. For phone translation, the model must handle informal speech, partial utterances, and conversational context — different requirements than translating a formal document.
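The three steps defined in this section (STT, then MT, then TTS) chain together as shown in this sketch. The stage functions here are toy stand-ins, not real models or any vendor's API; a production system would also stream partial results rather than wait for a full utterance.

```python
from typing import Callable

# Each stage is passed in as a function so real models can be swapped in later.
SttFn = Callable[[bytes, str], str]      # (audio, source language) -> text
MtFn = Callable[[str, str, str], str]    # (text, source, target) -> translated text
TtsFn = Callable[[str, str], bytes]      # (text, target language) -> audio


def translate_utterance(
    audio: bytes, source_lang: str, target_lang: str,
    stt: SttFn, mt: MtFn, tts: TtsFn,
) -> bytes:
    """Run one utterance through the STT -> MT -> TTS pipeline."""
    text = stt(audio, source_lang)                      # step 1: speech to text
    translated = mt(text, source_lang, target_lang)     # step 2: translate the text
    return tts(translated, target_lang)                 # step 3: text back to speech


# Toy stand-ins: "audio" is just UTF-8 text, and the translator knows one word.
def fake_stt(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str, source: str, target: str) -> str:
    return {"hola": "hello"}.get(text, text)


def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


print(translate_utterance(b"hola", "es", "en", fake_stt, fake_mt, fake_tts))
```

A live call runs this pipeline twice, once in each direction.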
Translation Latency
#The delay between when one party speaks and when the other party hears the translation.
End-to-end time from one speaker finishing a phrase to the other speaker hearing the translation. Measured in seconds. Phone-call translation requires sub-2-second latency to feel natural; over 3 seconds, callers start talking over each other or hanging up. Latency is the sum of speech-to-text, machine translation, text-to-speech, and network round-trip times.
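Because latency is a sum of stages, a simple budget makes it easy to see where the time goes. In this sketch the per-stage numbers are invented for illustration, and the "borderline" label for the 2 to 3 second range is an assumption; only the under-2-second and over-3-second thresholds come from the definition above.

```python
def end_to_end_latency_s(stage_ms: dict[str, int]) -> float:
    """Sum per-stage latencies (milliseconds) and return the total in seconds."""
    return sum(stage_ms.values()) / 1000


def rate_latency(latency_s: float) -> str:
    """Classify a latency figure using the thresholds from the definition above."""
    if latency_s < 2.0:
        return "natural"
    if latency_s <= 3.0:
        return "borderline"  # assumption: the source does not name this range
    return "disruptive"


# Hypothetical per-stage measurements for one translated phrase.
budget_ms = {
    "speech_to_text": 300,
    "machine_translation": 150,
    "text_to_speech": 250,
    "network_round_trip": 200,
}
total = end_to_end_latency_s(budget_ms)
print(total, rate_latency(total))
```

When a call feels slow, measuring each stage separately shows which one to tune first.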
Telephony Provider
#A platform that lets software make and receive phone calls (e.g., Twilio, Vonage, Bandwidth).
A cloud service that provides phone numbers, voice infrastructure, and APIs for placing and receiving phone calls programmatically. TalkTool is built on top of Twilio. A custom-built phone translator (e.g., on the OpenAI Realtime API) would also need to integrate a telephony provider to handle the actual phone-call connection.
SIP / VoIP
#Internet-based phone-call protocols that carry voice over data networks.
SIP (Session Initiation Protocol) is the industry-standard protocol for setting up internet voice calls. VoIP (Voice over IP) is the broader category. Together they let modern phone systems carry calls over the internet instead of legacy phone networks. Most modern translation services, including TalkTool, use SIP/VoIP under the hood.
Customer & Compliance Terms
Limited English Proficiency (LEP)
#A person whose primary language is not English and who has limited ability to read, speak, write, or understand English.
A US legal and policy term used in healthcare, government, and education contexts. Federal law (Title VI of the Civil Rights Act, Section 1557 of the ACA, Executive Order 13166) requires meaningful language access for LEP individuals in many federally funded settings. According to the US Census Bureau (American Community Survey), over 25 million people in the US have LEP status.
Language Access
#Policy and practice of providing meaningful communication to people in their preferred language.
The umbrella term for ensuring people can understand and be understood by an organization regardless of their language. Includes translated documents, qualified interpreters, multilingual signage, and translated phone support. A legal requirement in many regulated industries (healthcare under Section 1557, federally funded programs under Title VI).
Section 1557 of the ACA
#Federal nondiscrimination law requiring healthcare providers to offer language access to patients.
A section of the Affordable Care Act prohibiting discrimination on the basis of race, color, national origin (which includes primary language), sex, age, or disability in health programs receiving federal funding. Effectively requires covered healthcare providers to offer qualified interpretation and translation services to LEP patients. Source: HHS Office for Civil Rights.
BAA (Business Associate Agreement)
#A HIPAA contract between a healthcare entity and a vendor that handles protected health information.
Required under HIPAA whenever a covered entity (e.g., a healthcare provider) shares protected health information (PHI) with a vendor. The BAA defines how the vendor must protect, use, and limit access to that data. Phone translation services that handle healthcare calls typically need to offer a BAA to their healthcare customers.
HIPAA-Ready
#Infrastructure and processes designed to meet HIPAA requirements when paired with a BAA.
A common industry term meaning the vendor's systems can be used in a HIPAA-compliant manner provided a BAA is in place. Distinguishes "can support a HIPAA workflow" from "the vendor is itself a covered entity." Should be confirmed with a signed BAA and the vendor's specific security documentation before handling PHI.
GDPR Compliance
#EU regulation governing personal data of EU residents.
The General Data Protection Regulation governs how personal data of EU residents is collected, processed, and stored. For phone translation services with EU users, this typically requires a lawful basis for processing, EU data residency or approved transfer mechanisms, support for data subject rights (access, deletion), and a data processing agreement with customers.
Adjacent Concepts
Conversational AI vs. Phone Translation
#A conversational AI talks to the customer; a phone translation service helps two humans talk to each other.
Conversational AI (e.g., a customer-service chatbot or a voice agent built on the OpenAI Realtime API) replaces a human agent in the conversation. Phone translation, in contrast, leaves both humans in the conversation and translates between their languages. Different use cases: a chatbot is better for self-service flows; phone translation is better when human judgment is required (sales, complex support, healthcare).
Translated Outbound Call
#An outbound call where the agent speaks one language and the recipient hears another, in real time.
Outbound counterpart to inbound translated calls. The agent dials a customer's number from a dashboard, selects the customer's language, and speaks English. The recipient hears their own language, and the agent hears the customer's responses in English. Used for proactive outreach, appointment reminders, and sales follow-up.
Language-Based Call Routing
#Routing inbound calls to specific agents or queues based on the caller's language.
A call-handling rule that sends an inbound call to a particular agent, group, or queue depending on the caller's selected or detected language. Typically combined with multilingual IVR, language detection, or per-contact preferences. Enables specialized teams (e.g., a Spanish-only escalation queue) without forcing every agent to handle every language.
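A routing rule like this boils down to "find the caller's language, then look up a queue." The sketch below assumes a precedence order (IVR selection, then saved contact preference, then automatic detection) and uses hypothetical queue names; neither comes from a specific product.

```python
from typing import Optional

# Hypothetical language-to-queue table.
QUEUES = {
    "es": "spanish-escalations",
    "en": "general-english",
}
FALLBACK_QUEUE = "general-translated"  # assumption: everything else gets a translated call


def pick_queue(
    ivr_selection: Optional[str],
    contact_preference: Optional[str],
    detected_language: Optional[str],
) -> str:
    """Return the queue for the first known language signal, in precedence order."""
    for language in (ivr_selection, contact_preference, detected_language):
        if language in QUEUES:
            return QUEUES[language]
    return FALLBACK_QUEUE


print(pick_queue("es", None, None))    # caller pressed 2 for Spanish
print(pick_queue(None, None, "vi"))    # detected a language with no dedicated queue
```

An explicit choice by the caller is treated as more reliable than automatic detection, which is why it is checked first here.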
Agent Availability Queue
#A real-time list of agents marked available to take calls, used to route inbound calls.
A backend mechanism that tracks which agents are available, ringing, or busy at any given moment. Inbound calls are routed to the next available agent in the right queue. Often heartbeat-based (agents send a periodic signal to maintain availability) so disconnected or offline agents are automatically removed.
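The heartbeat idea can be sketched in a few lines: an agent counts as available only if a signal arrived recently. The class name, the 30-second timeout, and the "earliest-registered agent first" ordering are assumptions for illustration, not a description of any particular product.

```python
import time
from typing import Callable, Optional


class AvailabilityQueue:
    """Tracks agents by their most recent heartbeat; stale agents are dropped."""

    def __init__(self, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen: dict[str, float] = {}
        self._busy: set[str] = set()

    def heartbeat(self, agent_id: str) -> None:
        """Record that the agent is still connected."""
        self._last_seen[agent_id] = self._clock()

    def set_busy(self, agent_id: str, busy: bool) -> None:
        """Mark an agent as on a call (busy) or free again."""
        if busy:
            self._busy.add(agent_id)
        else:
            self._busy.discard(agent_id)

    def next_available(self) -> Optional[str]:
        """Remove agents whose heartbeat expired, then return the first free agent."""
        now = self._clock()
        stale = [a for a, seen in self._last_seen.items()
                 if now - seen > self._timeout_s]
        for agent_id in stale:
            del self._last_seen[agent_id]
            self._busy.discard(agent_id)
        for agent_id in self._last_seen:  # dicts keep first-registration order
            if agent_id not in self._busy:
                return agent_id
        return None
```

Passing the clock in as a parameter makes the timeout behavior easy to test without waiting in real time.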
Call Recording Consent
#Legal requirement to inform (and in some jurisdictions, get permission from) call participants before recording.
US states are split between one-party consent jurisdictions, where only one party to the call (typically the recording party) needs to consent, and two-party (or all-party) consent jurisdictions, where every party must be informed and agree. Many companies play a recorded disclosure at the start of a call. For multilingual calls, the disclosure should be played in the caller's language to be effective.
See it in action
TalkTool turns these concepts into a real product: a phone number you can share with non-English-speaking callers, with translation, transcripts, and call routing built in.