A Swift tool that uses macOS Accessibility APIs to extract page content from the Apple Books application. It can extract single or multiple pages, detect chapters, and export content to JSON or audio formats.
- macOS 10.15 or later (for Intel builds) / macOS 11.0 or later (for Apple Silicon)
- Apple Books app
- Terminal with Accessibility permissions
- Swift compiler (comes with Xcode or Command Line Tools)
Before using this tool, you need to grant Terminal (or iTerm) accessibility permissions:
- Open System Preferences → Security & Privacy → Privacy tab (on macOS 13 and later: System Settings → Privacy & Security)
- Select "Accessibility" from the left sidebar
- Click the lock icon and authenticate
- Add Terminal (or your terminal app) to the list
- Ensure the checkbox is checked
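macOS provides `AXIsProcessTrustedWithOptions` for checking from code whether a process already has Accessibility access. It can also trigger the system prompt. The sketch below is minimal. The helper name `hasAccessibilityAccess` is hypothetical. When the tool runs from Terminal, the permission being checked is the one granted to the terminal app.

```swift
#if canImport(ApplicationServices)
import ApplicationServices
#endif

/// Hypothetical helper: true if this process may use the Accessibility API.
/// With `prompt: true`, macOS shows its "grant access" dialog when access is missing.
func hasAccessibilityAccess(prompt: Bool) -> Bool {
#if canImport(ApplicationServices)
    // "AXTrustedCheckOptionPrompt" is the string value of kAXTrustedCheckOptionPrompt.
    let options = ["AXTrustedCheckOptionPrompt": prompt] as CFDictionary
    return AXIsProcessTrustedWithOptions(options)
#else
    // Not on macOS: there is no Accessibility API to query.
    return false
#endif
}

if !hasAccessibilityAccess(prompt: false) {
    print("Accessibility access missing: add your terminal app under Privacy & Security → Accessibility.")
}
```

Passing `prompt: true` once is a convenient way to get the terminal app added to the Accessibility list. You still need to tick its checkbox.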
```bash
swiftc book2json.swift -o book2json
```

To build a universal binary instead, run the included script:

```bash
./build_book2json.sh
```

The `build_book2json.sh` script performs the following steps:
- Validates Environment: Checks that the source file exists
- Cleans Previous Builds: Removes any existing binaries
- Compiles for Intel (x86_64): Targets macOS 10.15+ for Intel Macs
- Compiles for Apple Silicon (arm64): Targets macOS 11.0+ for M1/M2/M3 Macs
- Creates Universal Binary: Uses `lipo` to combine both architectures
- Verifies Output: Shows binary info and supported architectures
- Sets Permissions: Makes the binary executable
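After the build, `lipo -info book2json` lists the architectures inside the universal binary. To confirm which slice actually runs on a particular Mac, a program can report the architecture it was compiled for. The small sketch below uses Swift's compile-time `arch(...)` conditions. `currentSlice` is an illustrative name and is not part of book2json.

```swift
/// Reports which architecture slice this code was compiled into.
/// In a universal binary, an Apple Silicon Mac normally runs the arm64 slice
/// and an Intel Mac runs the x86_64 slice.
func currentSlice() -> String {
#if arch(arm64)
    return "arm64"
#elseif arch(x86_64)
    return "x86_64"
#else
    return "other"
#endif
}

print("Running the \(currentSlice()) slice")
```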
```bash
# Extract single page (default)
./book2json

# Extract content and save to file
./book2json --output book.json
```

| Option | Description | Example |
|---|---|---|
| `--pages N` | Extract N pages from the book | `./book2json --pages 10` |
| `--delay MS` | Set delay between page turns in milliseconds (default: 300) | `./book2json --pages 5 --delay 500` |
| `--output FILE` | Save JSON output to file instead of stdout | `./book2json --output chapter1.json` |
| `--speak` | Generate audio file from extracted content | `./book2json --speak` |
| `debug` | Enable debug mode with diagnostic output | `./book2json debug` |
| `--diagnostic` | Deep diagnostic mode for troubleshooting | `./book2json --diagnostic` |
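The options in the table can be combined, for example `debug --pages 5 --speak`. The sketch below parses such arguments with plain `CommandLine.arguments`. It assumes the defaults stated above: 1 page and a 300 ms delay. The `Options` type, `OptionError`, and `parseOptions` are hypothetical and are not taken from book2json.swift.

```swift
/// Hypothetical container for the options in the table.
struct Options {
    var pages = 1              // single page by default
    var delayMs = 300          // default delay between page turns
    var outputPath: String?    // nil means "write JSON to stdout"
    var speak = false
    var debug = false
    var diagnostic = false
}

enum OptionError: Error {
    case missingValue(flag: String)
    case invalidNumber(String)
    case unknownArgument(String)
}

/// Parses arguments such as ["debug", "--pages", "5", "--speak"].
/// Call it with Array(CommandLine.arguments.dropFirst()) to skip the program name.
func parseOptions(_ args: [String]) throws -> Options {
    var opts = Options()
    var i = 0

    // Consumes and returns the value that follows the flag at index i.
    func value(after flag: String) throws -> String {
        i += 1
        guard i < args.count else { throw OptionError.missingValue(flag: flag) }
        return args[i]
    }

    // Same, but the value must be an integer of at least `minimum`.
    func integer(after flag: String, minimum: Int) throws -> Int {
        let raw = try value(after: flag)
        guard let n = Int(raw), n >= minimum else { throw OptionError.invalidNumber(raw) }
        return n
    }

    while i < args.count {
        let arg = args[i]
        switch arg {
        case "--pages": opts.pages = try integer(after: arg, minimum: 1)
        case "--delay": opts.delayMs = try integer(after: arg, minimum: 0)
        case "--output": opts.outputPath = try value(after: arg)
        case "--speak": opts.speak = true
        case "debug": opts.debug = true
        case "--diagnostic": opts.diagnostic = true
        default: throw OptionError.unknownArgument(arg)
        }
        i += 1
    }
    return opts
}
```

Rejecting unknown arguments, instead of silently ignoring them, makes a typo such as `--page 10` fail loudly. Otherwise it would quietly extract a single page.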
```bash
# Extract 10 pages (e.g. a short chapter) with a 500 ms delay
./book2json --pages 10 --delay 500 --output chapter.json

# Extract and generate audio with debug info
./book2json debug --pages 5 --speak

# Diagnostic mode to inspect UI hierarchy
./book2json --diagnostic
```

When extracting multiple pages:
- The tool automatically navigates through pages using keyboard shortcuts
- Chapter headings are detected and content is organized by chapters
- Duplicate-page detection stops extraction at the end of the book instead of looping forever
- Content from all pages is combined into a structured JSON output
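The multi-page loop described above can be sketched as follows. Reading a page and turning it are passed in as closures. In the real tool these would use the Accessibility API and a keyboard shortcut, which this sketch does not reproduce. The following are illustrative assumptions, not the tool's actual code:

- `Page`, `Chapter`, and `extractPages`
- how a heading is recognized
- the exact duplicate check (comparing each page with the previous page's text)

```swift
import Foundation

/// One extracted page. `heading` is non-nil when a chapter heading was found on it;
/// how headings are recognized is left to the caller (an assumption of this sketch).
struct Page {
    let heading: String?
    let text: String
}

/// Hypothetical chapter record, mirroring "chapter-title" / "chapter-content".
struct Chapter {
    var title: String
    var content: String
}

/// Reads up to `maxPages` pages, turning the page between reads, and groups the
/// text by chapter. Stops early when a page repeats the previous one, which is
/// what happens once the page-turn shortcut no longer advances (end of book).
func extractPages(maxPages: Int,
                  delayMs: Int,
                  readPage: () -> Page?,
                  turnPage: () -> Void) -> [Chapter] {
    guard maxPages > 0 else { return [] }
    var chapters: [Chapter] = []
    var previousText: String?

    for index in 0..<maxPages {
        guard let page = readPage() else { break }   // nothing readable
        if page.text == previousText { break }        // duplicate: end of book
        previousText = page.text

        if let heading = page.heading {
            chapters.append(Chapter(title: heading, content: page.text))
        } else if chapters.isEmpty {
            // Pages before the first detected heading get an empty title.
            chapters.append(Chapter(title: "", content: page.text))
        } else {
            chapters[chapters.count - 1].content += "\n" + page.text
        }

        if index < maxPages - 1 {
            turnPage()
            // Give Books time to render the next page (the --delay value).
            Thread.sleep(forTimeInterval: Double(delayMs) / 1000)
        }
    }
    return chapters
}
```

Because page access is injected, `readPage` can return pages from an array and `turnPage` can advance an index. This makes the end-of-book check easy to exercise without Books running.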
Single-page output:

```json
{
  "title": "Book Title",
  "content": "The extracted page content...",
  "words_count": 342,
  "chars_count": 2012,
  "extraction_time_ms": 50,
  "language": "nl",
  "chapter-title": "Chapter Name"
}
```

Multi-page output (content organized by chapter):

```json
{
  "title": "Book Title",
  "total_chars_count": 15234,
  "total_word_count": 2341,
  "extraction_time_ms": 3500,
  "language": "en",
  "content": [
    {
      "chapter-title": "Chapter 1",
      "chapter-content": "Chapter content...",
      "chars_count": 5234,
      "word_count": 823
    },
    {
      "chapter-title": "Chapter 2",
      "chapter-content": "Chapter content...",
      "chars_count": 10000,
      "word_count": 1518
    }
  ]
}
```

When using the `debug` flag:
- Diagnostic information is printed to stderr
- Shows UI hierarchy traversal
- Displays extraction progress
- Reports processing time in milliseconds
- JSON output still goes to stdout
- Helps troubleshoot extraction issues
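Keeping diagnostics on stderr means that `./book2json debug > page.json` still produces a clean JSON file. The sketch below shows that split. It also measures the extraction time in milliseconds and encodes some of the single-page JSON keys shown earlier. Several parts are assumptions:

- `PageResult`, `debugLog`, and `makeResult` are hypothetical.
- The word count here is a whitespace split, and the character count is Swift's `Character` count. The tool's own counting rules are not documented.
- `language` and `chapter-title` are omitted for brevity.

```swift
import Foundation

/// Hypothetical single-page result using key names from the JSON example.
struct PageResult: Codable {
    let title: String
    let content: String
    let wordsCount: Int
    let charsCount: Int
    let extractionTimeMs: Int

    enum CodingKeys: String, CodingKey {
        case title, content
        case wordsCount = "words_count"
        case charsCount = "chars_count"
        case extractionTimeMs = "extraction_time_ms"
    }
}

let debugEnabled = CommandLine.arguments.contains("debug")

/// Writes a diagnostic line to stderr so that stdout carries only JSON.
func debugLog(_ message: String) {
    guard debugEnabled else { return }
    FileHandle.standardError.write(Data(("[debug] " + message + "\n").utf8))
}

/// Runs `extract`, times it in milliseconds, and builds the result.
func makeResult(title: String, extract: () -> String) -> PageResult {
    let start = Date()
    let text = extract()
    let elapsedMs = Int(Date().timeIntervalSince(start) * 1000)
    debugLog("extracted \(text.count) characters in \(elapsedMs) ms")
    // Words = runs of non-whitespace characters (an assumed rule).
    let words = text.split(whereSeparator: { $0.isWhitespace }).count
    return PageResult(title: title, content: text, wordsCount: words,
                      charsCount: text.count, extractionTimeMs: elapsedMs)
}

let result = makeResult(title: "Book Title") { "The extracted page content..." }
let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
if let json = try? encoder.encode(result), let string = String(data: json, encoding: .utf8) {
    print(string)  // stdout: JSON only
}
```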
The `--diagnostic` flag provides deep inspection:
- Explores the complete accessibility tree
- Shows all available UI element attributes
- Helps identify content location in complex layouts
- Useful for debugging extraction failures
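This kind of deep inspection boils down to a recursive walk that prints every element's role and attribute names. The sketch below uses the Accessibility calls `AXUIElementCopyAttributeNames` and `AXUIElementCopyAttributeValue`. The `attribute` and `dumpTree` helpers, the depth limit, and the 80-character value preview are illustrative choices, not the tool's implementation. The code only compiles on macOS and needs the same Accessibility permission as the tool.

```swift
#if canImport(AppKit)
import AppKit
import ApplicationServices

/// Returns an attribute value, or nil if the element doesn't expose it
/// (or if Accessibility access has not been granted).
func attribute(_ element: AXUIElement, _ name: String) -> AnyObject? {
    var value: AnyObject?
    let error = AXUIElementCopyAttributeValue(element, name as CFString, &value)
    return error == .success ? value : nil
}

/// Recursively prints each element's role, its attribute names, and a short
/// preview of any string value. `maxDepth` guards against very deep trees.
func dumpTree(_ element: AXUIElement, depth: Int = 0, maxDepth: Int = 15) {
    guard depth <= maxDepth else { return }
    let indent = String(repeating: "  ", count: depth)
    let role = attribute(element, kAXRoleAttribute) as? String ?? "?"

    var names: CFArray?
    _ = AXUIElementCopyAttributeNames(element, &names)
    let attributeNames = (names as? [String]) ?? []
    print("\(indent)\(role): \(attributeNames.joined(separator: ", "))")

    if let text = attribute(element, kAXValueAttribute) as? String, !text.isEmpty {
        print("\(indent)  value: \(text.prefix(80))")
    }
    let children = attribute(element, kAXChildrenAttribute) as? [AXUIElement] ?? []
    for child in children {
        dumpTree(child, depth: depth + 1, maxDepth: maxDepth)
    }
}

// Start from the Books application element (bundle identifier from this README).
if let books = NSRunningApplication.runningApplications(withBundleIdentifier: "com.apple.iBooksX").first {
    dumpTree(AXUIElementCreateApplication(books.processIdentifier))
}
#endif
```

The printed roles, such as `AXStaticText`, show which elements actually hold the page text in a given book layout.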
When using the `--speak` flag:
- The extracted content is converted to speech and saved as `books_audio.aiff`
- Automatically selects the best voice for the detected language
- Prioritizes Siri voices when available
- The title and content are both included in the audio
- Uses macOS's built-in `say` command for reliable audio generation
- The audio file is saved in the current directory
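The sketch below shows the audio step. It assumes the title and content are passed to `say` through a temporary text file (`say -f`) and written with `say -o books_audio.aiff`. The voice choice simply takes the first installed voice from `say -v '?'` whose locale matches the language code. The tool's actual ranking, including its preference for Siri voices, is not reproduced. All function names and the temporary file name are hypothetical.

```swift
import Foundation

/// One installed voice as listed by `say -v '?'`.
struct Voice: Equatable {
    let name: String
    let locale: String   // e.g. "nl_NL"
}

/// Parses `say -v '?'` output. Each line has the form "<name> <locale> # <sample>",
/// and the name itself may contain spaces, so the locale is the last token before "#".
func parseVoices(_ listing: String) -> [Voice] {
    listing.split(separator: "\n").compactMap { line -> Voice? in
        let left = line.split(separator: "#", maxSplits: 1).first ?? ""
        var tokens = left.split(separator: " ")
        guard tokens.count >= 2, let locale = tokens.popLast() else { return nil }
        return Voice(name: tokens.joined(separator: " "), locale: String(locale))
    }
}

/// Picks the first voice whose locale matches the language code ("nl" → "nl_NL").
func chooseVoice(for language: String, from voices: [Voice]) -> Voice? {
    voices.first { $0.locale == language || $0.locale.hasPrefix(language + "_") }
}

/// Builds the `say` arguments: optional voice, input text file, AIFF output file.
func sayArguments(voice: Voice?, textFile: String, output: String = "books_audio.aiff") -> [String] {
    var args: [String] = []
    if let voice = voice { args += ["-v", voice.name] }
    args += ["-f", textFile, "-o", output]
    return args
}

/// Runs a command-line tool, waits for it, and returns its standard output.
func run(_ path: String, _ arguments: [String]) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
}

/// Speaks title and content into books_audio.aiff in the current directory (macOS only).
func speak(title: String, content: String, language: String) throws {
    let voices = try parseVoices(run("/usr/bin/say", ["-v", "?"]))
    let textFile = FileManager.default.temporaryDirectory
        .appendingPathComponent("book2json-speech.txt")
    try "\(title)\n\n\(content)".write(to: textFile, atomically: true, encoding: .utf8)
    let args = sayArguments(voice: chooseVoice(for: language, from: voices), textFile: textFile.path)
    _ = try run("/usr/bin/say", args)
}
```

Passing the text through a file instead of as a command-line argument avoids argument-length limits for long chapters.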
The tool uses Apple's Accessibility APIs to:
- Find the Books app process by its bundle identifier (`com.apple.iBooksX`)
- Access the app's window hierarchy
- Search for the main content area containing book text
- Recursively extract text from all text elements
- Output the combined page content
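The steps above can be sketched as a recursive walk over the accessibility tree. To keep the traversal testable, this version walks a small `TextNode` protocol and, on macOS, adapts live `AXUIElement`s to it. The protocol, the function names, and the roles treated as book text (`AXStaticText`, `AXTextArea`) are assumptions. The bundle identifier is the one given above. The tool also narrows the search to the main content area, which this sketch does not attempt.

```swift
import Foundation

/// Hypothetical abstraction over one accessibility element.
protocol TextNode {
    var role: String { get }
    var text: String? { get }
    var children: [any TextNode] { get }
}

/// Depth-first walk collecting trimmed text from text-bearing roles, in tree order.
func collectText(from node: any TextNode,
                 roles: Set<String> = ["AXStaticText", "AXTextArea"]) -> [String] {
    var parts: [String] = []
    if roles.contains(node.role),
       let text = node.text?.trimmingCharacters(in: .whitespacesAndNewlines),
       !text.isEmpty {
        parts.append(text)
    }
    for child in node.children {
        parts += collectText(from: child, roles: roles)
    }
    return parts
}

#if canImport(AppKit)
import AppKit
import ApplicationServices

/// Adapter from a live AXUIElement to TextNode.
struct AXNode: TextNode {
    let element: AXUIElement

    func value(_ name: String) -> AnyObject? {
        var result: AnyObject?
        let error = AXUIElementCopyAttributeValue(element, name as CFString, &result)
        return error == .success ? result : nil
    }

    var role: String { value(kAXRoleAttribute) as? String ?? "" }
    var text: String? { value(kAXValueAttribute) as? String }
    var children: [any TextNode] {
        (value(kAXChildrenAttribute) as? [AXUIElement] ?? []).map { AXNode(element: $0) }
    }
}

/// Finds Books by bundle identifier, walks its windows, and joins the text found.
func extractBooksText() -> String? {
    let apps = NSRunningApplication.runningApplications(withBundleIdentifier: "com.apple.iBooksX")
    guard let books = apps.first else { return nil }          // Books not running
    let app = AXNode(element: AXUIElementCreateApplication(books.processIdentifier))
    let windows = app.value(kAXWindowsAttribute) as? [AXUIElement] ?? []
    guard !windows.isEmpty else { return nil }                  // no windows found
    let parts = windows.flatMap { collectText(from: AXNode(element: $0)) }
    return parts.isEmpty ? nil : parts.joined(separator: "\n")
}
#endif
```

Separating the traversal from the AX adapter lets a mock tree stand in for Books. This is handy when tuning which roles count as content.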
If the tool doesn't work:
- "Books app is not running" - The tool will try to launch Books automatically
- "No windows found" - Make sure you have a book open in Books
- "Could not extract page content" - Ensure:
  - A book is open and visible
  - The page has loaded completely
  - Terminal has accessibility permissions
- Empty or partial content - Books may render content in a way that's not fully accessible. Try:
  - Scrolling to ensure content is loaded
  - Switching to a different view mode in Books
  - Using a different book format (EPUB vs PDF)
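Launching Books when it is not running can be sketched with `NSRunningApplication` and `NSWorkspace`. The `openApplication(at:configuration:completionHandler:)` call requires macOS 10.15 or later. The helper name `ensureBooksRunning` and the 10-second timeout are assumptions, not the tool's code. A freshly launched Books may not have a book open yet, so the "No windows found" advice above can still apply.

```swift
#if canImport(AppKit)
import AppKit

let booksBundleID = "com.apple.iBooksX"

/// Hypothetical helper: returns the running Books app, launching it if needed.
/// Waits up to `timeout` seconds for the launch to complete.
func ensureBooksRunning(timeout: TimeInterval = 10) -> NSRunningApplication? {
    if let running = NSRunningApplication.runningApplications(withBundleIdentifier: booksBundleID).first {
        return running
    }
    guard let url = NSWorkspace.shared.urlForApplication(withBundleIdentifier: booksBundleID) else {
        return nil  // Books is not installed
    }
    let done = DispatchSemaphore(value: 0)
    var launched: NSRunningApplication?
    NSWorkspace.shared.openApplication(at: url, configuration: NSWorkspace.OpenConfiguration()) { app, error in
        if let error = error {
            FileHandle.standardError.write(Data("Could not launch Books: \(error)\n".utf8))
        }
        launched = app
        done.signal()
    }
    // Wait in short slices and let the run loop process pending work in between,
    // so the completion handler is delivered whichever queue it is scheduled on.
    let deadline = Date().addingTimeInterval(timeout)
    while done.wait(timeout: .now() + .milliseconds(100)) == .timedOut {
        if Date() >= deadline { return nil }
        RunLoop.current.run(until: Date())
    }
    return launched
}
#endif
```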
- `book2json.swift` - Main Swift source code
- `build_book2json.sh` - Build script for universal binary
- `find_books_info.swift` - Helper script to find Books app information
- Each extraction reads only the currently visible page; multi-page mode (`--pages N`) turns pages and repeats the extraction
- PDF books may have different accessibility structures than EPUB books
- Some DRM-protected content may not be accessible
- The tool respects system accessibility settings