EPUB Reader Documentation
Overview
This project implements two major enhancements to pyWebLayout:
- Enhanced Page Class: Moved HTML rendering logic from the browser into the Page class for better separation of concerns
- Tkinter EPUB Reader: A complete EPUB reader application with pagination support
Files Created/Modified
1. Enhanced Page Class (pyWebLayout/concrete/page.py)
New Features Added:
- load_html_string() - Load HTML content directly into a Page
- load_html_file() - Load HTML from a file
- Private conversion methods that transform abstract blocks into renderables
- Integration with the existing HTML extraction system
Key Methods:
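```python
from pyWebLayout.concrete.page import Page

html_content = "<p>Hello</p>"  # any HTML string

page = Page(size=(800, 600))
page.load_html_string(html_content)  # Load HTML string
page.load_html_file("file.html")     # Load HTML file
image = page.render()                # Render to PIL Image
```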
Benefits:
- Reuses the existing html_extraction.py infrastructure
- Converts abstract blocks to concrete renderables
- Supports headings, paragraphs, lists, images, etc.
- Proper error handling with fallback rendering
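The conversion step described above can be pictured as a type-based dispatch table with a fallback for blocks that fail to convert. This is an illustrative sketch only: HeadingBlock, ParagraphBlock, ImageBlock, TextRenderable and convert_blocks are hypothetical stand-ins, not pyWebLayout's abstract or concrete classes, and not the private Page conversion methods themselves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins for abstract blocks (not pyWebLayout's classes).
@dataclass
class HeadingBlock:
    level: int
    text: str


@dataclass
class ParagraphBlock:
    text: str


@dataclass
class ImageBlock:
    src: str
    alt: str = ""


# Hypothetical concrete renderable: a kind label plus the text to draw.
@dataclass
class TextRenderable:
    kind: str
    text: str


def _heading(block: HeadingBlock) -> TextRenderable:
    return TextRenderable(f"h{block.level}", block.text)


def _paragraph(block: ParagraphBlock) -> TextRenderable:
    return TextRenderable("p", block.text)


def _image(block: ImageBlock) -> TextRenderable:
    # No image decoding here: fall back to alt text, then to the source path.
    return TextRenderable("img", block.alt or f"[image: {block.src}]")


# Dispatch table from block type to converter.
CONVERTERS: Dict[type, Callable[[object], TextRenderable]] = {
    HeadingBlock: _heading,
    ParagraphBlock: _paragraph,
    ImageBlock: _image,
}


def convert_blocks(blocks: List[object]) -> List[TextRenderable]:
    """Convert every block; one bad block yields a placeholder, not a crash."""
    renderables = []
    for block in blocks:
        try:
            converter = CONVERTERS[type(block)]  # KeyError for unknown types
            renderables.append(converter(block))
        except Exception as exc:
            # Fallback rendering: record the problem and keep going.
            renderables.append(TextRenderable(
                "error", f"[unrenderable {type(block).__name__}: {exc!r}]"))
    return renderables
```

Keeping the fallback inside the loop means a single malformed block degrades to a placeholder while the rest of the page still renders.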
2. EPUB Reader Application (epub_reader_tk.py)
Features:
- Complete Tkinter-based GUI
- EPUB file loading using the existing epub_reader.py
- Chapter navigation with dropdown selection
- Page-by-page display with navigation controls
- Adjustable font size (8-24pt)
- Keyboard shortcuts (arrow keys, Ctrl+O)
- Status bar with loading feedback
- Scrollable content display
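Page-by-page display and the 8-24pt font range interact: a larger font leaves room for fewer lines on each page. A minimal sketch, assuming every line is already wrapped to the page width and has a fixed height of font size × line spacing (treating 1 pt as 1 px); clamp_font_size and paginate are hypothetical helpers, not functions from epub_reader_tk.py.

```python
from typing import List

# Font-size range stated in the feature list.
MIN_FONT_PT, MAX_FONT_PT = 8, 24


def clamp_font_size(pt: int) -> int:
    """Keep a requested font size inside the supported range."""
    return max(MIN_FONT_PT, min(MAX_FONT_PT, pt))


def paginate(lines: List[str], page_height: int, font_pt: int,
             line_spacing: float = 1.2) -> List[List[str]]:
    """Split pre-wrapped lines into pages that fit page_height pixels.

    Assumes every line has the same height, font_pt * line_spacing pixels.
    """
    line_height = clamp_font_size(font_pt) * line_spacing
    lines_per_page = max(1, int(page_height // line_height))
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]
```

With a 90 px page, 20 pt text and 1.5 line spacing, each page holds 3 lines, so 10 lines become 4 pages.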
GUI Components:
- File open dialog for EPUB selection
- Chapter dropdown and navigation buttons
- Page navigation controls
- Font size adjustment
- Canvas with scrollbars for content display
- Status bar for feedback
Navigation:
- Left/Right arrows: Previous/Next page
- Up/Down arrows: Previous/Next chapter
- Ctrl+O: Open file dialog
- Mouse: Dropdown chapter selection
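The keyboard shortcuts above can be modelled as a small position object plus a table of Tk event sequences. In this sketch ReaderPosition, KEY_ACTIONS and bind_navigation are hypothetical, and letting Left/Right cross chapter boundaries is an assumption: the source does not say what happens at the end of a chapter. Ctrl+O would be bound the same way, with the sequence "<Control-o>" pointing at the file dialog.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReaderPosition:
    """Hypothetical model of the current chapter and page indices."""
    pages_per_chapter: List[int]  # page count of each chapter (each >= 1)
    chapter: int = 0
    page: int = 0

    def next_page(self) -> None:
        # Assumption: moving past a chapter's last page opens the next chapter.
        if self.page + 1 < self.pages_per_chapter[self.chapter]:
            self.page += 1
        else:
            self.next_chapter()

    def prev_page(self) -> None:
        # Assumption: moving before page 0 opens the previous chapter's last page.
        if self.page > 0:
            self.page -= 1
        elif self.chapter > 0:
            self.chapter -= 1
            self.page = self.pages_per_chapter[self.chapter] - 1

    def next_chapter(self) -> None:
        if self.chapter + 1 < len(self.pages_per_chapter):
            self.chapter += 1
            self.page = 0

    def prev_chapter(self) -> None:
        if self.chapter > 0:
            self.chapter -= 1
            self.page = 0


# Tk event sequences for the arrow keys, mapped to ReaderPosition methods.
KEY_ACTIONS: Dict[str, str] = {
    "<Left>": "prev_page",
    "<Right>": "next_page",
    "<Up>": "prev_chapter",
    "<Down>": "next_chapter",
}


def bind_navigation(widget, position: ReaderPosition) -> None:
    """Bind the arrow keys on any object with a Tk-style bind(sequence, func)."""
    for sequence, action in KEY_ACTIONS.items():
        # Tk passes an event object; the default argument freezes `action`.
        widget.bind(sequence, lambda event, a=action: getattr(position, a)())
```

Keeping the position logic separate from Tkinter lets it be tested without opening a window; in a real app each handler would also redraw the canvas.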
3. Test Suite (test_enhanced_page.py)
Test Coverage:
- HTML string loading and rendering
- HTML file loading and rendering
- EPUB reader app import and instantiation
- Error handling verification
Technical Architecture
HTML Processing Flow
HTML String/File → parse_html_string() → Abstract Blocks → Page._convert_block_to_renderable() → Concrete Renderables → Page.render() → PIL Image
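To illustrate the first arrow in this flow, the sketch below turns HTML into a flat list of (tag, text) pairs using Python's standard html.parser module. It is a stand-in for parse_html_string() and the abstract block classes, not their implementation: BlockExtractor and parse_blocks are hypothetical, only headings, paragraphs and list items are recognized, and images and inline styling are discarded.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Stand-in for an abstract block: (tag name, plain text).
Block = Tuple[str, str]


class BlockExtractor(HTMLParser):
    """Collect the text of block-level elements in document order."""

    BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Block] = []
        self._open_tag: Optional[str] = None  # block element currently open
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.finish()  # an opening block tag implicitly ends the previous one
            self._open_tag = tag

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.finish()

    def handle_data(self, data):
        # Inline tags (<strong>, <em>, ...) are ignored, so their text lands here.
        if self._open_tag is not None:
            self._chunks.append(data)

    def finish(self) -> None:
        """Emit the open block, if any, with whitespace collapsed."""
        text = " ".join("".join(self._chunks).split())
        if self._open_tag is not None and text:
            self.blocks.append((self._open_tag, text))
        self._open_tag = None
        self._chunks = []


def parse_blocks(html: str) -> List[Block]:
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser.finish()  # flush a block left unclosed at end of input
    return parser.blocks
```

For the HTML in the usage example below, this yields [("h1", "Hello World"), ("p", "This is a test paragraph.")].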
EPUB Reading Flow
EPUB File → read_epub() → Book → Chapters → Abstract Blocks → Page Conversion → Tkinter Display
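The first stage of this flow has to find the chapters inside the EPUB archive. In the EPUB container format, META-INF/container.xml points at an OPF package file whose spine lists content documents in reading order. The sketch below reads that order with zipfile and ElementTree; chapter_paths is a hypothetical helper for illustration, not how read_epub() is implemented, and it does not decode percent-encoded hrefs.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from typing import IO, List, Union

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def chapter_paths(epub_file: Union[str, IO[bytes]]) -> List[str]:
    """Return archive paths of an EPUB's content documents in reading order."""
    with zipfile.ZipFile(epub_file) as zf:
        # container.xml names the OPF package file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
    # Manifest hrefs are relative to the directory containing the OPF file.
    base = posixpath.dirname(opf_path)
    hrefs = {item.get("id"): item.get("href")
             for item in opf.findall("opf:manifest/opf:item", OPF_NS)}
    # The spine lists manifest ids in reading order.
    return [posixpath.join(base, hrefs[ref.get("idref")])
            for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)]
```

The spine, not the manifest, defines reading order, which is why the manifest is only used as an id-to-path lookup.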
Usage Examples
Basic HTML Page Rendering
```python
from pyWebLayout.concrete.page import Page

# Create a page and load HTML into it
page = Page(size=(800, 600))
page.load_html_string("""
<h1>Hello World</h1>
<p>This is a <strong>test</strong> paragraph.</p>
""")

# Render to a PIL Image and save it
image = page.render()
image.save("output.png")
```
EPUB Reader Application
```shell
# Run the EPUB reader
python epub_reader_tk.py
```
Or import and use it programmatically:
```python
from epub_reader_tk import EPUBReaderApp

app = EPUBReaderApp()
app.run()
```
Features Demonstrated
HTML Parsing & Rendering
- ✅ Paragraphs with inline formatting (bold, italic)
- ✅ Headings (H1-H6) with proper sizing
- ✅ Lists (ordered and unordered)
- ✅ Images with alt text fallback
- ✅ Error handling for malformed content
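Heading sizing usually scales the body font by a per-level factor. The factors below mirror the default h1-h6 sizes that browsers apply (2em down to 0.67em); they are an assumption, not values taken from pyWebLayout's style system, and heading_font_size is a hypothetical helper.

```python
# Per-level scale factors relative to body text. These mirror common browser
# defaults for h1-h6 and are an assumption, not pyWebLayout.style values.
HEADING_SCALE = {1: 2.0, 2: 1.5, 3: 1.17, 4: 1.0, 5: 0.83, 6: 0.67}


def heading_font_size(level: int, base_pt: int = 12) -> int:
    """Return the point size for an <h1>-<h6> heading at the given body size."""
    if level not in HEADING_SCALE:
        raise ValueError(f"heading level must be 1-6, got {level}")
    return round(base_pt * HEADING_SCALE[level])
```

Scaling from the body size means headings grow along with the reader's font-size setting instead of staying fixed.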
EPUB Processing
- ✅ Full EPUB metadata extraction
- ✅ Chapter-by-chapter navigation
- ✅ Table of contents integration
- ✅ Multi-format content support
User Interface
- ✅ Intuitive navigation controls
- ✅ Responsive layout with scrolling
- ✅ Font size customization
- ✅ Keyboard shortcuts
- ✅ Status feedback
Dependencies
The EPUB reader leverages existing pyWebLayout infrastructure:
- pyWebLayout.io.readers.epub_reader - EPUB parsing
- pyWebLayout.io.readers.html_extraction - HTML to abstract blocks
- pyWebLayout.concrete.* - Renderable objects
- pyWebLayout.abstract.* - Abstract document model
- pyWebLayout.style.* - Styling system
Testing
Run the test suite to verify functionality:
```shell
python test_enhanced_page.py
```
Expected output:
- ✅ HTML String Loading: PASS
- ✅ HTML File Loading: PASS
- ✅ EPUB Reader Imports: PASS
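A runner that produces a PASS/FAIL report like this only needs to call each check and note whether it raised. This is a hypothetical sketch (run_checks is not taken from test_enhanced_page.py); in practice each check would wrap one of the scenarios listed under Test Coverage, such as loading an HTML string into a Page.

```python
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], None]]) -> Dict[str, bool]:
    """Run each named check; a check passes if it returns without raising."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = True
        except Exception as exc:  # report the failure and keep going
            results[name] = False
            print(f"   {name} raised {exc!r}")
        print(f"{name}: {'PASS' if results[name] else 'FAIL'}")
    return results
```

Catching exceptions per check lets one failure be reported without hiding the results of the remaining checks.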
Future Enhancements
- Advanced Pagination: Break long chapters across multiple pages
- Search Functionality: Full-text search within books
- Bookmarks: Save reading position
- Themes: Dark/light mode support
- Export: Save pages as images or PDFs
- Zoom: Variable zoom levels for accessibility
Integration with Existing Browser
The enhanced Page class can be used to improve the existing html_browser.py:
```python
from pyWebLayout.concrete.page import Page

# Before: complex parsing inside the browser
parser = HTMLParser()
page = parser.parse_html_string(html_content)

# After: use the new Page class
page = Page()
page.load_html_string(html_content)
```
This provides better separation of concerns and reuses the robust HTML extraction system.