Freelance Opportunity: Transcription Specialist

Project Description

Uber AI Solutions is seeking detail-oriented transcription specialists to support a large-scale generative AI training program. In this engagement, you will accurately transcribe and annotate single-track and multitrack audio files, capturing every utterance, stutter, and linguistic nuance exactly as spoken.

Supported Languages & Dialects

We are looking for freelancers in the following languages:

Arabic: Modern Standard Arabic (ar-001 | ar-MSA), Saudi Arabia (ar-SA), United Arab Emirates (ar-AE | ar-UAE)
Bengali: (bn-BD | bn-IN)
Catalan: (ca-ES)
Chinese: Simplified (zh-CN | zh-Hans), Traditional (zh-Hant), Hong Kong (zh-HK), Taiwan (zh-TW)
Croatian: (hr-HR)
Czech: (cs-CZ)
Danish: (da-DK)
Dutch: (nl-NL)
English: United States (en-US), United Kingdom (en-GB)
Estonian: (et-EE)
Finnish: (fi-FI)
French: France (fr-FR), Canada (fr-CA)
German: Germany (de-DE), Switzerland (de-CH)
Greek: (el-GR)
Hebrew: (he-IL)
Hindi: (hi-IN)
Hungarian: (hu-HU)
Indonesian: (id-ID)
Italian: (it-IT)
Japanese: (ja-JP)
Kannada: (kn-IN)
Korean: (ko-KR)
Lithuanian: (lt-LT)
Maithili: (mai-IN)
Malay: (ms-MY)
Malayalam: (ml-IN)
Norwegian: (no-NO)
Polish: (pl-PL)
Portuguese: Portugal (pt-PT), Brazil (pt-BR)
Romanian: (ro-RO)
Russian: (ru-RU)
Sinhala: (si-LK)
Slovak: (sk-SK)
Spanish: Spain (es-ES), United States (es-US), Latin America (es-419 | es-LATAM), Central America (es-419)
Swedish: (sv-SE)
Tagalog/Filipino: (tl-PH)
Tamil: (ta-IN)
Telugu: (te-IN)
Thai: (th-TH)
Turkish: (tr-TR)
Ukrainian: (uk-UA)
Urdu: (ur-PK)
Vietnamese: (vi-VN)

Key Tasks

Transcription: Transcribe audio with 98% accuracy, capturing every disfluency, filler word (um, uh), false start, and stutter exactly as heard.
Precision Timestamping: Align text segments to the audio waveform with millisecond precision (maximum gap under 500 ms).
Speaker Identification: Accurately identify and label speakers in multi-speaker audio files (2–8 interlocutors).
Tagging and Annotation: Apply the correct tags to non-speech events, such as (laughs) or (applause), and to unintelligible segments.