An iOS app for learning European / Brazilian Portuguese pronunciation through typed input and photo-based OCR, powered by Microsoft Azure Cognitive Services.
Help a learner quickly hear how any Portuguese word or phrase should sound and see its phonetic form — whether typed in by hand or captured from a real-world sign, book, or menu photo.
- User types a Portuguese word or phrase into a text field.
- App displays:
  - the input text,
  - the IPA (or SAPI) phonetic transcription,
  - a playback control that speaks the text aloud.
- User can pick the Portuguese variant (Portugal `pt-PT` vs. Brazil `pt-BR`) and, optionally, a specific neural voice.
- Playback speed can be adjusted (e.g. 0.75×, 1.0×).
- Recent queries are kept in a local history for quick replay.
- User takes a new photo with the camera or picks an image from the library.
- App runs OCR on the image and overlays detected word bounding boxes on top of the photo.
- User can tap a single word, drag to select multiple adjacent words, or tap a whole line to select all of its words.
- For the current selection the app shows:
  - the recognized text,
  - its phonetic transcription,
  - a play button that speaks it aloud.
- Selections and their audio are cached, so re-tapping the same word does not trigger another network request.
- Works offline for anything already cached (playback of previously spoken words, previously OCR'd images).
- Errors (no network, Azure quota exceeded, OCR failure) surface as inline messages, not modal alerts.
- All Azure requests go through a thin service layer so the TTS / OCR providers can be swapped later without touching the UI.
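
For illustration, here is a minimal sketch of what that seam could look like. The protocol and type names (`SpeechSynthesizing`, `TextRecognizing`, `AzureSpeechService`) are hypothetical, not the repo's actual API:

```swift
import Foundation
import CoreGraphics

// Hypothetical provider-agnostic seams; the UI and view models depend only on these protocols.
protocol SpeechSynthesizing {
    /// Returns synthesized audio plus a phonetic transcription for `text`.
    func synthesize(_ text: String, voice: String, rate: Double) async throws -> (audio: Data, phonetic: String)
}

protocol TextRecognizing {
    /// Returns recognized words together with their bounding boxes in image coordinates.
    func recognizeWords(in imageData: Data) async throws -> [(text: String, box: CGRect)]
}

// An Azure-backed implementation conforms to the protocol; swapping the TTS/OCR provider
// later only means adding another conforming type, with no UI changes.
struct AzureSpeechService: SpeechSynthesizing {
    let endpoint: URL
    let subscriptionKey: String

    func synthesize(_ text: String, voice: String, rate: Double) async throws -> (audio: Data, phonetic: String) {
        // The real implementation would call the Azure Speech SDK / REST API here.
        fatalError("Sketch only; not implemented")
    }
}
```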
- Platform: iOS 17+, Swift 5.9+, SwiftUI, Swift Concurrency (`async`/`await`).
- Architecture: feature modules (TextMode, PhotoMode) over a shared `AzureClient` layer; view models expose `@Observable` state (see the sketches after this list).
- Azure Cognitive Services:
  - Speech — Text-to-Speech with neural `pt-PT` / `pt-BR` voices; request SSML with `<mstts:viseme>` / phoneme output to obtain the transcription (an example payload is sketched below).
  - Computer Vision Read (or Document Intelligence) — OCR with word-level bounding boxes.
- Storage: Core Data or SwiftData for history and cached audio/OCR blobs; audio files are cached on disk keyed by a `(text, voice, rate)` hash.
- Secrets: Azure keys are kept out of the repo (loaded from a local `Secrets.xcconfig` that is gitignored, and ultimately from a token-exchange service for production).
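
To make the Speech bullet concrete, here is a rough sketch of the kind of SSML payload such a request could use. The helper name `makeSSML` and the default voice `pt-PT-RaquelNeural` are illustrative assumptions; the app's actual payload may differ:

```swift
/// Builds an SSML payload for a neural pt-PT/pt-BR voice with an adjustable speaking rate.
/// Note: `text` should be XML-escaped before interpolation; omitted here for brevity.
func makeSSML(text: String, voice: String = "pt-PT-RaquelNeural", rate: Double = 1.0) -> String {
    // Derive the locale ("pt-PT" or "pt-BR") from the voice name prefix.
    let locale = String(voice.prefix(5))
    return """
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="\(locale)">
      <voice name="\(voice)">
        <mstts:viseme type="redlips_front"/>
        <prosody rate="\(rate)">\(text)</prosody>
      </voice>
    </speak>
    """
}
```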
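
And a hedged sketch of how an `@Observable` view model could sit on top of the service layer while caching audio on disk under a `(text, voice, rate)` hash. The names (`TextModeViewModel`, `audioCacheKey`) are assumptions for illustration, and it builds on the `SpeechSynthesizing` protocol sketched earlier:

```swift
import Foundation
import CryptoKit
import Observation

/// Stable on-disk cache key derived from (text, voice, rate).
/// The app's actual key scheme may differ; this is one reasonable choice.
func audioCacheKey(text: String, voice: String, rate: Double) -> String {
    let digest = SHA256.hash(data: Data("\(text)|\(voice)|\(rate)".utf8))
    return digest.map { String(format: "%02x", $0) }.joined()
}

@Observable
final class TextModeViewModel {
    var phonetic: String = ""
    var errorMessage: String?          // surfaced inline, never as a modal alert
    private let speech: SpeechSynthesizing
    private let cacheDirectory: URL

    init(speech: SpeechSynthesizing, cacheDirectory: URL) {
        self.speech = speech
        self.cacheDirectory = cacheDirectory
    }

    func speak(_ text: String, voice: String, rate: Double) async {
        let file = cacheDirectory.appendingPathComponent(
            audioCacheKey(text: text, voice: voice, rate: rate) + ".wav")
        do {
            if !FileManager.default.fileExists(atPath: file.path) {
                let result = try await speech.synthesize(text, voice: voice, rate: rate)
                try result.audio.write(to: file)
                phonetic = result.phonetic
                // In the real app the transcription would be cached alongside the audio; omitted here.
            }
            // Hand `file` to an audio player (e.g. AVAudioPlayer) for playback; omitted here.
        } catch {
            errorMessage = error.localizedDescription
        }
    }
}
```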
- Translation to other languages.
- Grammar analysis / dictionary lookup.
- User accounts / cloud sync across devices.
- Android or web client.
- Install XcodeGen and CocoaPods:

  ```sh
  brew install xcodegen cocoapods
  ```

- Copy `Secrets.xcconfig.example` to `Secrets.xcconfig` and fill in your Azure keys/regions (Speech, Vision, Translator). `Secrets.xcconfig` is gitignored.
- Generate the Xcode project, then install the Speech SDK pod:

  ```sh
  xcodegen generate
  pod install
  ```

- Open `ptkw.xcworkspace` (not the `.xcodeproj`) in Xcode, or build from the CLI:

  ```sh
  xcodebuild -workspace ptkw.xcworkspace -scheme ptkw \
    -destination 'platform=iOS Simulator,name=iPhone 15' build
  ```

- Run tests:

  ```sh
  xcodebuild test -workspace ptkw.xcworkspace -scheme ptkw \
    -destination 'platform=iOS Simulator,name=iPhone 15'
  ```
The Microsoft Cognitive Services Speech SDK for iOS is distributed only via CocoaPods (`MicrosoftCognitiveServicesSpeech-iOS`). Re-run `pod install` whenever you regenerate the project with `xcodegen generate`.
v1 scaffolding in place: shared Azure layer (Speech, Vision, Translator),
SwiftData history, TextMode/PhotoMode/History/Settings tabs, unit tests for
parsing and view models. Smoke-testable on device once `Secrets.xcconfig` is
filled in.