Freelance Opportunity: Transcription Specialist
Project Description
Uber AI Solutions is seeking detail-oriented transcription specialists to support a large-scale generative AI training program. In this engagement, you will transcribe and annotate audio files (single-track and multitrack) accurately, capturing every utterance, stutter, and linguistic nuance exactly as spoken.
Supported Languages & Dialects
We are looking for freelancers who work in the following languages and dialects:
- Arabic: Modern Standard Arabic (ar-001 | ar-MSA), Saudi Arabia (ar-SA), United Arab Emirates (ar-AE | ar-UAE)
- Bengali: (bn-BD | bn-IN)
- Catalan: (ca-ES)
- Chinese: Simplified (zh-CN | zh-Hans), Traditional (zh-Hant), Hong Kong (zh-HK), Taiwan (zh-TW)
- Croatian: (hr-HR)
- Czech: (cs-CZ)
- Danish: (da-DK)
- Dutch: (nl-NL)
- English: United States (en-US), United Kingdom (en-GB)
- Estonian: (et-EE)
- Finnish: (fi-FI)
- French: France (fr-FR), Canada (fr-CA)
- German: Germany (de-DE), Switzerland (de-CH)
- Greek: (el-GR)
- Hebrew: (he-IL)
- Hindi: (hi-IN)
- Hungarian: (hu-HU)
- Indonesian: (id-ID)
- Italian: (it-IT)
- Japanese: (ja-JP)
- Kannada: (kn-IN)
- Korean: (ko-KR)
- Lithuanian: (lt-LT)
- Maithili: (mai-IN)
- Malay: (ms-MY)
- Malayalam: (ml-IN)
- Norwegian: (no-NO)
- Polish: (pl-PL)
- Portuguese: Portugal (pt-PT), Brazil (pt-BR)
- Romanian: (ro-RO)
- Russian: (ru-RU)
- Sinhala: (si-LK)
- Slovak: (sk-SK)
- Spanish: Spain (es-ES), United States (es-US), Latin America (es-419 | es-LATAM), Central America (es-419)
- Swedish: (sv-SE)
- Tagalog/Filipino: (tl-PH)
- Tamil: (ta-IN)
- Telugu: (te-IN)
- Thai: (th-TH)
- Turkish: (tr-TR)
- Ukrainian: (uk-UA)
- Urdu: (ur-PK)
- Vietnamese: (vi-VN)
Key Tasks
- Transcription: Transcribe audio with 98% accuracy, capturing every disfluency, filler word (um, uh), false start, and stutter exactly as heard.
- Precision Timestamping: Align text segments to the audio waveform with millisecond precision (maximum gap under 500 ms).
- Speaker Identification: Accurately identify and label speakers in multi-speaker audio files (2–8 interlocutors).
- Tagging and Annotation: Apply the correct tags for non-speech events, such as (laughs) or (applause), and for unintelligible segments.
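For freelancers who want to self-check their work before submitting, the timestamping and speaker-count requirements above could be verified with a short script along these lines. This is only a sketch under stated assumptions: the posting does not define the delivery format, so the `Segment` structure and its field names are hypothetical, and "max gap" is interpreted here as the silence between the end of the transcribed audio so far and the start of the next segment. The project guidelines may define it differently.

```python
from dataclasses import dataclass

# Limits taken from the posting. How "gap" is measured is an assumption.
MAX_GAP_MS = 500
MIN_SPEAKERS, MAX_SPEAKERS = 2, 8


@dataclass
class Segment:
    """Hypothetical transcript segment; not an official project format."""
    speaker: str
    start_ms: int
    end_ms: int
    text: str


def find_gap_violations(segments):
    """Return (gap_start_ms, gap_end_ms, gap_ms) for each gap of 500 ms or more."""
    ordered = sorted(segments, key=lambda s: s.start_ms)
    violations = []
    if not ordered:
        return violations
    # Track the latest end time seen so far, so overlapping speech
    # from different speakers is not reported as a gap.
    covered_until = ordered[0].end_ms
    for current in ordered[1:]:
        gap = current.start_ms - covered_until
        if gap >= MAX_GAP_MS:
            violations.append((covered_until, current.start_ms, gap))
        covered_until = max(covered_until, current.end_ms)
    return violations


def speaker_count_ok(segments):
    """Check that a multi-speaker file has between 2 and 8 distinct labels."""
    distinct_speakers = {s.speaker for s in segments}
    return MIN_SPEAKERS <= len(distinct_speakers) <= MAX_SPEAKERS


if __name__ == "__main__":
    demo = [
        Segment("S1", 0, 1200, "um, hello"),
        Segment("S2", 1500, 2000, "(laughs)"),
        Segment("S1", 2600, 3000, "I- I mean"),
    ]
    print(find_gap_violations(demo))
    print(speaker_count_ok(demo))
```

In the demo data, the 300 ms pause between the first two segments passes, while the 600 ms pause before the third segment is flagged. The speaker-count check applies only to multi-speaker files, so single-speaker audio would need to skip it.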