EPUB Reader Documentation

Overview

This project implements two major enhancements to pyWebLayout:

  1. Enhanced Page Class: HTML rendering logic now lives in the Page class instead of the browser, which improves separation of concerns
  2. Tkinter EPUB Reader: A complete EPUB reader application with pagination support

Files Created/Modified

1. Enhanced Page Class (pyWebLayout/concrete/page.py)

New Features Added:

  • load_html_string() - Load HTML content directly into a Page
  • load_html_file() - Load HTML from a file
  • Private conversion methods to transform abstract blocks to renderables
  • Integration with existing HTML extraction system

Key Methods:

page = Page(size=(800, 600))
page.load_html_string(html_content)  # Load HTML string
page.load_html_file("file.html")     # Load HTML file
image = page.render()                # Render to PIL Image

Benefits:

  • Reuses existing html_extraction.py infrastructure
  • Converts abstract blocks to concrete renderables
  • Supports headings, paragraphs, lists, images, etc.
  • Proper error handling with fallback rendering
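
The fallback behaviour listed above might look roughly like the following. This is a minimal sketch and not pyWebLayout's code: `Placeholder`, `convert_block`, and `convert_with_fallback` are hypothetical names, and the real Page class may handle failures differently.

```python
from dataclasses import dataclass


@dataclass
class Placeholder:
    """Hypothetical stand-in for a renderable shown when conversion fails."""
    message: str


def convert_block(block):
    """Hypothetical converter: only knows how to handle plain strings."""
    if not isinstance(block, str):
        raise TypeError(f"unsupported block type: {type(block).__name__}")
    return block.strip()


def convert_with_fallback(blocks):
    """Convert every block, substituting a placeholder for any that fail."""
    renderables = []
    for block in blocks:
        try:
            renderables.append(convert_block(block))
        except Exception as exc:  # keep rendering the rest of the page
            renderables.append(Placeholder(f"[unrenderable content: {exc}]"))
    return renderables
```

The point of the design is that one bad block degrades to a visible placeholder instead of aborting the whole page.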

2. EPUB Reader Application (epub_reader_tk.py)

Features:

  • Complete Tkinter-based GUI
  • EPUB file loading using existing epub_reader.py
  • Chapter navigation with dropdown selection
  • Page-by-page display with navigation controls
  • Adjustable font size (8-24pt)
  • Keyboard shortcuts (arrow keys, Ctrl+O)
  • Status bar with loading feedback
  • Scrollable content display
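
The 8-24pt font range can be enforced with a simple clamp. The sketch below is an assumption about how such a limit could be applied; `adjust_font_size` and the two constants are hypothetical names, not part of epub_reader_tk.py.

```python
MIN_FONT_SIZE = 8   # lower bound of the documented 8-24pt range
MAX_FONT_SIZE = 24  # upper bound of the documented 8-24pt range


def adjust_font_size(current: int, delta: int) -> int:
    """Return the new font size, clamped to the supported range."""
    return max(MIN_FONT_SIZE, min(MAX_FONT_SIZE, current + delta))
```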

GUI Components:

  • File open dialog for EPUB selection
  • Chapter dropdown and navigation buttons
  • Page navigation controls
  • Font size adjustment
  • Canvas with scrollbars for content display
  • Status bar for feedback

Navigation:

  • Left/Right arrows: Previous/Next page
  • Up/Down arrows: Previous/Next chapter
  • Ctrl+O: Open file dialog
  • Mouse: Dropdown chapter selection
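
The key mapping above can be modelled without any GUI code, which also makes it easy to test. The following is a sketch under stated assumptions: `ReaderState` and `handle_key` are hypothetical, and clamping at chapter boundaries (rather than rolling over into the next chapter) is a guess about the intended behaviour.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReaderState:
    """Hypothetical reading position: one page count per chapter."""
    pages_per_chapter: List[int]
    chapter: int = 0
    page: int = 0

    def turn_page(self, step: int) -> None:
        # Stay inside the current chapter; clamp at its first/last page.
        last = self.pages_per_chapter[self.chapter] - 1
        self.page = max(0, min(last, self.page + step))

    def change_chapter(self, step: int) -> None:
        # Clamp at the first/last chapter and restart at its first page.
        last = len(self.pages_per_chapter) - 1
        new_chapter = max(0, min(last, self.chapter + step))
        if new_chapter != self.chapter:
            self.chapter = new_chapter
            self.page = 0


def handle_key(state: ReaderState, keysym: str) -> None:
    """Dispatch a Tk keysym ("Left", "Right", "Up", "Down") to an action."""
    if keysym == "Left":
        state.turn_page(-1)
    elif keysym == "Right":
        state.turn_page(+1)
    elif keysym == "Up":
        state.change_chapter(-1)
    elif keysym == "Down":
        state.change_chapter(+1)
```

In a Tkinter window this could be wired up with something like `root.bind("<Key>", lambda e: handle_key(state, e.keysym))`, with Ctrl+O bound separately through the `<Control-o>` sequence.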

3. Test Suite (test_enhanced_page.py)

Test Coverage:

  • HTML string loading and rendering
  • HTML file loading and rendering
  • EPUB reader app import and instantiation
  • Error handling verification

Technical Architecture

HTML Processing Flow

HTML String/File → parse_html_string() → Abstract Blocks → Page._convert_block_to_renderable() → Concrete Renderables → Page.render() → PIL Image
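
To make the stages of this flow concrete, here is a self-contained sketch that uses only the standard library. It does not reproduce pyWebLayout's classes: `AbstractBlock`, `TextRenderable`, `BlockExtractor`, `extract_blocks`, and `to_renderable` are hypothetical stand-ins, the heading size scale is arbitrary, and the final rendering step to a PIL image is left out.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional

BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}


@dataclass
class AbstractBlock:
    """Hypothetical abstract block: a tag name plus its flattened text."""
    tag: str
    text: str


@dataclass
class TextRenderable:
    """Hypothetical concrete renderable: text with a resolved font size."""
    text: str
    font_size: int


class BlockExtractor(HTMLParser):
    """Collect the text of each heading/paragraph into an AbstractBlock."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[AbstractBlock] = []
        self._tag: Optional[str] = None
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS and self._tag is None:
            self._tag, self._parts = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._parts.append(data)  # inline tags such as <strong> are flattened

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.blocks.append(AbstractBlock(tag, "".join(self._parts).strip()))
            self._tag = None


def extract_blocks(html: str) -> List[AbstractBlock]:
    """Stage 1: HTML string -> abstract blocks."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def to_renderable(block: AbstractBlock, base_size: int = 12) -> TextRenderable:
    """Stage 2: abstract block -> renderable; headings get larger text."""
    if block.tag.startswith("h"):
        level = int(block.tag[1])
        return TextRenderable(block.text, base_size + 2 * (7 - level))
    return TextRenderable(block.text, base_size)
```

Calling `[to_renderable(b) for b in extract_blocks(html)]` then yields the list a page would lay out and draw. Keeping the abstract and concrete stages separate is what lets the same extraction code serve both the browser and the EPUB reader.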

EPUB Reading Flow

EPUB File → read_epub() → Book → Chapters → Abstract Blocks → Page Conversion → Tkinter Display
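
For readers unfamiliar with the format, the first step of this flow amounts to opening a ZIP archive and following two XML files. The sketch below illustrates that step with the standard library only; it is not the implementation of read_epub(), and `list_spine_documents` is a hypothetical helper.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def list_spine_documents(epub_file):
    """Return the archive paths of the content documents in reading order.

    `epub_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as archive:
        # META-INF/container.xml points at the package (OPF) document.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")

        package = ET.fromstring(archive.read(opf_path))
        # The manifest maps item ids to hrefs relative to the OPF file.
        hrefs = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }
        base = posixpath.dirname(opf_path)
        # The spine lists item ids in reading order.
        return [
            posixpath.join(base, hrefs[ref.get("idref")])
            for ref in package.findall("opf:spine/opf:itemref", NS)
        ]
```

Each returned path names an XHTML document inside the archive, which is the kind of content the HTML extraction stage then turns into abstract blocks.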

Usage Examples

Basic HTML Page Rendering

from pyWebLayout.concrete.page import Page

# Create and load HTML
page = Page(size=(800, 600))
page.load_html_string("""
<h1>Hello World</h1>
<p>This is a <strong>test</strong> paragraph.</p>
""")

# Render to image
image = page.render()
image.save("output.png")

EPUB Reader Application

Run the EPUB reader from the command line:

python epub_reader_tk.py

Or import and use it programmatically from Python:

from epub_reader_tk import EPUBReaderApp
app = EPUBReaderApp()
app.run()

Features Demonstrated

HTML Parsing & Rendering

  • Paragraphs with inline formatting (bold, italic)
  • Headers (H1-H6) with proper sizing
  • Lists (ordered and unordered)
  • Images with alt text fallback
  • Error handling for malformed content

EPUB Processing

  • Full EPUB metadata extraction
  • Chapter-by-chapter navigation
  • Table of contents integration
  • Multi-format content support

User Interface

  • Intuitive navigation controls
  • Responsive layout with scrolling
  • Font size customization
  • Keyboard shortcuts
  • Status feedback

Dependencies

The EPUB reader leverages existing pyWebLayout infrastructure:

  • pyWebLayout.io.readers.epub_reader - EPUB parsing
  • pyWebLayout.io.readers.html_extraction - HTML to abstract blocks
  • pyWebLayout.concrete.* - Renderable objects
  • pyWebLayout.abstract.* - Abstract document model
  • pyWebLayout.style.* - Styling system

Testing

Run the test suite to verify functionality:

python test_enhanced_page.py

Expected output:

  • HTML String Loading: PASS
  • HTML File Loading: PASS
  • EPUB Reader Imports: PASS
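
A report in this "name: PASS" style can be produced by a very small harness. The sketch below is only an illustration of that reporting pattern; `run_checks` is a hypothetical helper, and test_enhanced_page.py may be organised quite differently.

```python
def run_checks(checks):
    """Run (name, callable) pairs and return 'name: PASS' or 'name: FAIL' lines."""
    lines = []
    for name, check in checks:
        try:
            check()
            lines.append(f"{name}: PASS")
        except Exception as exc:  # a failed assertion or any other error
            lines.append(f"{name}: FAIL ({exc})")
    return lines
```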

Future Enhancements

  1. Advanced Pagination: Break long chapters across multiple pages
  2. Search Functionality: Full-text search within books
  3. Bookmarks: Save reading position
  4. Themes: Dark/light mode support
  5. Export: Save pages as images or PDFs
  6. Zoom: Variable zoom levels for accessibility
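
As a starting point for the pagination item, one possible approach is a greedy pass over the measured heights of a chapter's blocks. This is a sketch of one option, not a planned design: `paginate` is a hypothetical function, and it assumes block heights are already known in pixels.

```python
from typing import List


def paginate(block_heights: List[int], page_height: int) -> List[List[int]]:
    """Greedily group block indices into pages without exceeding page_height.

    A block taller than the page gets a page of its own rather than being split.
    """
    pages: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, height in enumerate(block_heights):
        # Start a new page when the next block would overflow the current one.
        if current and used + height > page_height:
            pages.append(current)
            current, used = [], 0
        current.append(index)
        used += height
    if current:
        pages.append(current)
    return pages
```

Splitting a single oversized block (for example a very long paragraph) across pages would need line-level measurements and is not handled here.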

Integration with Existing Browser

The enhanced Page class can be used to improve the existing html_browser.py:

# Before: complex parsing inside the browser
parser = HTMLParser()
page = parser.parse_html_string(html_content)

# After: the Page class loads and converts the HTML itself
from pyWebLayout.concrete.page import Page

page = Page()
page.load_html_string(html_content)

This provides better separation of concerns and reuses the robust HTML extraction system.