
Freelance Opportunity: Transcription Specialist

Project Description

Uber AI Solutions is seeking detail-oriented transcription specialists to support a large-scale generative AI training program. In this engagement, you will transcribe and annotate audio files (single-track and multitrack) accurately, capturing every utterance, stutter, and linguistic nuance exactly as spoken.

Supported Languages & Dialects

We are looking for freelancers who work in the following languages:

  • Arabic: Modern Standard Arabic (ar-001 | ar-MSA), Saudi Arabia (ar-SA), United Arab Emirates (ar-AE | ar-UAE)
  • Bengali: (bn-BD | bn-IN)
  • Catalan: (ca-ES)
  • Chinese: Simplified (zh-CN | zh-Hans), Traditional (zh-Hant), Hong Kong (zh-HK), Taiwan (zh-TW)
  • Croatian: (hr-HR)
  • Czech: (cs-CZ)
  • Danish: (da-DK)
  • Dutch: (nl-NL)
  • English: United States (en-US), United Kingdom (en-GB)
  • Estonian: (et-EE)
  • Finnish: (fi-FI)
  • French: France (fr-FR), Canada (fr-CA)
  • German: Germany (de-DE), Switzerland (de-CH)
  • Greek: (el-GR)
  • Hebrew: (he-IL)
  • Hindi: (hi-IN)
  • Hungarian: (hu-HU)
  • Indonesian: (id-ID)
  • Italian: (it-IT)
  • Japanese: (ja-JP)
  • Kannada: (kn-IN)
  • Korean: (ko-KR)
  • Lithuanian: (lt-LT)
  • Maithili: (mai-IN)
  • Malay: (ms-MY)
  • Malayalam: (ml-IN)
  • Norwegian: (no-NO)
  • Polish: (pl-PL)
  • Portuguese: Portugal (pt-PT), Brazil (pt-BR)
  • Romanian: (ro-RO)
  • Russian: (ru-RU)
  • Sinhala: (si-LK)
  • Slovak: (sk-SK)
  • Spanish: Spain (es-ES), United States (es-US), Latin America (es-419 | es-LATAM), Central America (es-419)
  • Swedish: (sv-SE)
  • Tagalog/Filipino: (tl-PH)
  • Tamil: (ta-IN)
  • Telugu: (te-IN)
  • Thai: (th-TH)
  • Turkish: (tr-TR)
  • Ukrainian: (uk-UA)
  • Urdu: (ur-PK)
  • Vietnamese: (vi-VN)

Key Tasks

  • Transcription: Transcribe audio with 98% accuracy, capturing every disfluency, filler word (um, uh), false start, and stutter exactly as heard.
  • Precision Timestamping: Align text segments to the audio waveform with millisecond precision (maximum gap under 500 ms).
  • Speaker Identification: Accurately identify and label speakers in multi-speaker audio files (2–8 interlocutors).
  • Tagging and Annotation: Apply the correct tags for non-speech events, such as (laughs) or (applause), and for unintelligible segments.
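
For illustration only, the timestamping and speaker-labeling requirements above could be self-checked with a small script like the one below. This is a minimal sketch, not part of the project's tooling. The Segment fields and the check_segments helper are hypothetical. It also assumes that "max gap" means the silence between the end of one segment and the start of the next, and that the 2 to 8 speaker range applies to every multi-speaker file; the posting does not confirm either reading.

```python
from dataclasses import dataclass

MAX_GAP_MS = 500                     # from the posting: max gap < 500 ms
MIN_SPEAKERS, MAX_SPEAKERS = 2, 8    # from the posting: 2-8 interlocutors


@dataclass
class Segment:
    """Hypothetical record for one transcribed segment."""
    start_ms: int
    end_ms: int
    speaker: str
    text: str


def check_segments(segments):
    """Return a list of problems found; an empty list means none."""
    problems = []
    ordered = sorted(segments, key=lambda s: s.start_ms)

    # Every segment should end after it starts.
    for seg in ordered:
        if seg.end_ms <= seg.start_ms:
            problems.append(
                f"segment at {seg.start_ms} ms has non-positive duration")

    # Assumed reading of "max gap": silence between consecutive segments.
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.start_ms - prev.end_ms
        if gap >= MAX_GAP_MS:
            problems.append(
                f"gap of {gap} ms before segment at {cur.start_ms} ms")

    # Assumed to apply to multi-speaker files only.
    speakers = {s.speaker for s in segments}
    if not MIN_SPEAKERS <= len(speakers) <= MAX_SPEAKERS:
        problems.append(f"{len(speakers)} distinct speakers labeled")

    return problems
```

Called on three segments spanning 0-1200 ms, 1350-2900 ms, and 3600-4100 ms with two speaker labels, the helper reports a single problem: the 700 ms gap before the last segment.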