

Background & Problem Definition

1. Onboarding friction
Many TTS tools require a long sequence of steps (sign‑up → payment → settings) before users hear their first voice, which adds cognitive load.
2. Quality vs. control
Even with natural, expressive voices, users often lack fine‑grained control (speed, pitch, vibrato/variance).
3. Anxiety around cloning
Users need clear, trustworthy UX for consent, data security, and responsibility.
4. Multilingual hassles
Keeping the same voice character across languages without confusion (language selection, pronunciation guides, notation) is hard.
5. Credit/plan fatigue
Credit minutes, downloads, and commercial usage/attribution must be clearly surfaced.

Core Solutions

1) Three‑Step Mental Model: Voice Pick → Type → Listen

  • The first screen covers voice selection → text input → playback in a single three‑module layout.

  • Voice cards show emotion keywords and example use cases to reduce choice overload.

2) Live Emotion/Tone Controls (Pitch Shift / Pitch Variance / Speed)

  • Real‑time fine‑tuning from a side control panel.

  • When a guide voice recording is uploaded, the selected voice can follow it.

3) Voice Cloning Onboarding (BETA)

  • Script‑reading capture → quality check → one‑time training, then save to My Voice Library.

  • After creation, Korean, English, and Japanese (ko/en/ja) speech keeps the same voice character within a single information architecture (IA).

  • Consent & security layer: data encryption, private access only, persistent entry points for deletion/management.

4) Same Character Across Languages

  • Place the language toggle in the same context as the voice card/script editor.

  • Provide preview guidance for language‑specific quirks (numbers/units/emojis).

5) In‑Context Guidance on Usage Conditions

  • Repeatedly surface credits, download availability/limits, commercial usage per plan, and the attribution required on the Free plan in the download modal and header badges.

  • For API users, link to a help page for request rate guidance.

Results (Qualitative)

  • In usability tests, the path to first synthesis became clearer; emotion/speed controls felt “easy and fun.”

  • After cloning onboarding, the reuse loop became simpler, which improved motivation to revisit.

  • Brand consistency improved by keeping the same character across languages.

TTFV, conversion to first download, cloning onboarding completion rate, multilingual toggle usage rate, free→paid conversion, reduction in support tickets

Key Design Decisions

  • Three‑step mental model to shrink the learning curve versus typical TTS.

  • Control exposure strategy: great defaults for newcomers; instant micro‑tuning for power users.

  • Security/Ethics UI: Always‑visible paths for consent, protection, deletion, and inquiry for psychological safety.

  • Free plan guidance: Gentle reminders about attribution at the download moment.

  • Multilingual IA: Keep language switching within the same task context, not a separate page.

Target User


Content creators producing short videos (up to 3 minutes) who generate voiceovers with TTS and integrate them into their video editing.


MVP Version (2024)

sc2.png

Service Ver. 2025

sc1.png

Service Ver. 2025

PLAY

Starting from zero, I was the first to design the product, so I created problems along the way and accumulated design debt. As I hired amazing teammates one by one and we built the team together, we started solving those issues. The project has been a chaotic yet heartwarming human drama.
 

Year

2023 - Present

Role

Product Designer
@Supertone AI
