ptkw

An iOS app for learning European / Brazilian Portuguese pronunciation through typed input and photo-based OCR, powered by Microsoft Azure Cognitive Services.

Goal

Help a learner quickly hear how any Portuguese word or phrase should sound and see its phonetic form — whether typed in by hand or captured from a real-world sign, book, or menu photo.

Features

1. Text input mode

  • User types a Portuguese word or phrase into a text field.
  • App displays:
    • the input text,
    • the IPA (or SAPI) phonetic transcription,
    • a playback control that speaks the text aloud.
  • User can pick the Portuguese variant (Portugal pt-PT vs. Brazil pt-BR) and, optionally, a specific neural voice.
  • Playback speed can be adjusted (e.g. 0.75×, 1.0×).
  • Recent queries are kept in a local history for quick replay.
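The variant, voice, and rate choices above boil down to an SSML payload sent to Azure TTS. A minimal sketch (the `TTSRequest` type and its fields are illustrative, not the app's actual API; voice names follow Azure's published catalog, and real input text would need XML escaping):

```swift
// Sketch of an SSML payload for Azure Text-to-Speech.
// Assumed names: TTSRequest is hypothetical; voice identifiers such as
// "pt-PT-RaquelNeural" / "pt-BR-FranciscaNeural" come from Azure's catalog.
struct TTSRequest {
    let text: String
    let locale: String   // "pt-PT" or "pt-BR"
    let voice: String    // e.g. "pt-PT-RaquelNeural"
    let rate: Double     // prosody rate multiplier; 1.0 = normal speed

    var ssml: String {
        """
        <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="\(locale)">
          <voice name="\(voice)">
            <prosody rate="\(rate)">\(text)</prosody>
          </voice>
        </speak>
        """
    }
}
```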

2. Photo / OCR mode

  • User takes a new photo with the camera or picks an image from the library.
  • App runs OCR on the image and overlays detected word bounding boxes on top of the photo.
  • User can tap a single word, drag to select multiple adjacent words, or tap a whole line to select all of its words.
  • For the current selection the app shows:
    • the recognized text,
    • its phonetic transcription,
    • a play button that speaks it aloud.
  • Selections and their audio are cached, so re-tapping the same word does not hit the network again.
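The caching behavior above can be sketched as a memoizing wrapper around the synthesis call. This is a synchronous sketch (the real app would use an actor with async/await), and `synthesize` is a hypothetical stand-in for the Azure TTS request:

```swift
import Foundation

// Sketch: audio for a selection is fetched once, then replayed from memory.
// `synthesize` is a placeholder for the real Azure call; networkCalls is
// exposed only to make the cache-hit behavior observable.
final class AudioCache {
    private var store: [String: Data] = [:]
    private(set) var networkCalls = 0

    func audio(for text: String, synthesize: (String) -> Data) -> Data {
        if let cached = store[text] { return cached }   // re-tap: no network
        networkCalls += 1
        let data = synthesize(text)
        store[text] = data
        return data
    }
}
```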

3. Shared behavior

  • Works offline for anything already cached (playback of previously spoken words, previously OCR'd images).
  • Errors (no network, Azure quota exceeded, OCR failure) surface as inline messages, not modal alerts.
  • All Azure requests go through a thin service layer so the TTS / OCR providers can be swapped later without touching the UI.
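The thin service layer could be expressed as a pair of protocols the UI depends on, so Azure-backed implementations can be swapped for stubs in tests or another provider later. A sketch with illustrative names (none of these types are claimed to be the app's actual API):

```swift
import Foundation

// Sketch of the provider-agnostic service layer. View models talk to
// these protocols; AzureClient would supply the concrete conformances.
struct RecognizedWord {
    let text: String
    let boundingBox: CGRect   // word-level box from the OCR result
}

protocol SpeechService {
    func synthesize(_ text: String, voice: String) async throws -> Data
}

protocol OCRService {
    func recognizeWords(in imageData: Data) async throws -> [RecognizedWord]
}
```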

Tech stack

  • Platform: iOS 17+, Swift 5.9+, SwiftUI, Swift Concurrency (async/await).
  • Architecture: feature modules (TextMode, PhotoMode) over a shared AzureClient layer; view models expose @Observable state.
  • Azure Cognitive Services
    • Speech — Text-to-Speech with neural pt-PT / pt-BR voices; request SSML with <mstts:viseme> / phoneme output to obtain the phonetic transcription.
    • Computer Vision Read (or Document Intelligence) — OCR with word-level bounding boxes.
  • Storage: Core Data or SwiftData for history and cached audio/OCR blobs; audio files cached on disk keyed by (text, voice, rate) hash.
  • Secrets: Azure keys stay out of the repo; they are loaded from a local, gitignored Secrets.xcconfig, with a token-exchange service planned for production.
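The (text, voice, rate) cache key mentioned above needs a hash that is stable across launches; Swift's built-in `Hashable` is randomly seeded per process, so it cannot key an on-disk cache. A sketch using FNV-1a for illustration (SHA-256 via CryptoKit would work equally well; `audioCacheKey` is a hypothetical name):

```swift
// Sketch: derive a stable on-disk filename from the (text, voice, rate)
// triple. FNV-1a is used here only for illustration; any stable digest
// (e.g. SHA-256) would do.
func audioCacheKey(text: String, voice: String, rate: Double) -> String {
    let input = "\(text)|\(voice)|\(rate)"
    var hash: UInt64 = 0xcbf29ce484222325      // FNV-1a offset basis
    for byte in input.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3           // FNV-1a prime
    }
    return String(hash, radix: 16) + ".wav"
}
```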

Out of scope (for v1)

  • Translation to other languages.
  • Grammar analysis / dictionary lookup.
  • User accounts / cloud sync across devices.
  • Android or web client.

Build

  1. Install XcodeGen and CocoaPods:
    brew install xcodegen cocoapods
    
  2. Copy Secrets.xcconfig.example to Secrets.xcconfig and fill in your Azure keys/regions (Speech, Vision, Translator). Secrets.xcconfig is gitignored.
  3. Generate the Xcode project, then install the Speech SDK pod:
    xcodegen generate
    pod install
    
  4. Open ptkw.xcworkspace (not the .xcodeproj) in Xcode, or build from CLI:
    xcodebuild -workspace ptkw.xcworkspace -scheme ptkw \
      -destination 'platform=iOS Simulator,name=iPhone 15' build
    
  5. Run tests:
    xcodebuild test -workspace ptkw.xcworkspace -scheme ptkw \
      -destination 'platform=iOS Simulator,name=iPhone 15'
    

Speech SDK

The Microsoft Cognitive Services Speech SDK for iOS is distributed only via CocoaPods (MicrosoftCognitiveServicesSpeech-iOS). Re-run pod install whenever you regenerate the project with xcodegen generate.
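For reference, a minimal Podfile for this setup might look like the following (the target name is assumed to match the ptkw scheme, and no version pin is shown):

```ruby
# Sketch of a Podfile pulling in the Speech SDK; adjust the platform
# version and add a version constraint as appropriate.
platform :ios, '17.0'
use_frameworks!

target 'ptkw' do
  pod 'MicrosoftCognitiveServicesSpeech-iOS'
end
```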

Status

v1 scaffolding in place: shared Azure layer (Speech, Vision, Translator), SwiftData history, TextMode/PhotoMode/History/Settings tabs, unit tests for parsing and view models. Smoke-testable on device once Secrets.xcconfig is filled in.
