more reorg

2025-11-07 19:14:37 +01:00 · f72c6015c6
commit f72c6015c6
parent 33e2cbc363
7 changed files with 4 additions and 711 deletions
--- a/BROWSER_README.md
+++ b/BROWSER_README.md
@@ -1,143 +0,0 @@
 # pyWebLayout HTML Browser
 A simple HTML browser built using the pyWebLayout library components from `pyWebLayout/io/` and `pyWebLayout/concrete/`.
 ## Features
 This browser demonstrates the capabilities of pyWebLayout by implementing:
 ### Rendering Components
 - **Text rendering** with various formatting (bold, italic, underline)
 - **Headers** (H1-H6) with proper sizing and styling
 - **Links** (clickable, with external browser opening for external URLs)
 - **Images** (local files and web URLs with error handling)
 - **Layout containers** for proper element positioning
 - **Basic HTML parsing** and element conversion
 ### User Interface
 - **Navigation controls**: Back, Forward, Refresh buttons
 - **Address bar**: Enter URLs or file paths
 - **File browser**: Open local HTML files
 - **Scrollable content area** with both vertical and horizontal scrollbars
 - **Mouse interaction**: Clickable links with hover effects
 - **Status bar**: Shows current operation status
 ## Usage
 ### Starting the Browser
 ```bash
 python html_browser.py
 ```
 ### Loading Content
 1. **Load the test page**: The browser starts with a welcome page showing various features
 2. **Open local files**: Click "Open File" to browse and select HTML files
 3. **Enter URLs**: Type URLs in the address bar and press Enter or click "Go"
 4. **Navigate**: Use back/forward buttons to navigate through history
 ### Test Files
 - `test_page.html` - A comprehensive test page demonstrating all supported features including:
  - Text formatting (bold, italic, underline)
  - Headers of all levels (H1-H6)
  - Links (both internal and external)
  - Images (includes the sample image from tests/data/)
  - Line breaks and paragraphs
 ## Architecture
 ### HTML Parser (`HTMLParser` class)
 - Simple regex-based HTML tokenizer
 - Converts HTML elements to pyWebLayout abstract objects
 - Handles font styling with a font stack for nested formatting
 - Supports basic HTML tags: h1-h6, b, strong, i, em, u, a, img, br, p, div, span
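The tokenizer and font stack described above can be sketched roughly as follows. This is a minimal illustration, not the `HTMLParser` class from `html_browser.py`: the names `TOKEN_RE`, `Style`, `TAG_STYLES`, and `styled_runs` are hypothetical, and only the inline formatting tags are handled.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical names throughout; this is not the HTMLParser from html_browser.py.
# One alternative matches an opening or closing tag, the other a run of text.
TOKEN_RE = re.compile(r"<(/?)(\w+)[^>]*>|([^<]+)")


@dataclass(frozen=True)
class Style:
    bold: bool = False
    italic: bool = False
    underline: bool = False


# Map each supported inline tag to the style attribute it switches on.
TAG_STYLES = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic", "u": "underline"}


def styled_runs(html):
    """Return (text, Style) pairs, tracking nested formatting with a stack."""
    stack = [Style()]  # the bottom entry is the default style
    runs = []
    for match in TOKEN_RE.finditer(html):
        closing, tag, text = match.groups()
        if text is not None:
            if text.strip():
                runs.append((text, stack[-1]))
        elif tag.lower() in TAG_STYLES:
            if closing:
                # Leave the nested style, but never pop the default entry.
                if len(stack) > 1:
                    stack.pop()
            else:
                # Enter a nested style derived from the current one.
                attribute = TAG_STYLES[tag.lower()]
                stack.append(replace(stack[-1], **{attribute: True}))
    return runs
```

Calling `styled_runs("<b>bold <i>both</i></b> plain")` yields three runs: bold text, bold italic text, and unstyled text. Pushing a derived style on each opening tag is what lets nested formatting combine.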
 ### Browser Window (`BrowserWindow` class)
 - Tkinter-based GUI with navigation controls
 - Canvas-based rendering of pyWebLayout Page objects
 - Mouse event handling for interactive elements
 - Navigation history management
 - File and URL loading capabilities
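Navigation history of the kind listed above is commonly kept as a list plus a cursor. The sketch below assumes that design; the `History` class and its method names are hypothetical and are not taken from `BrowserWindow`.

```python
class History:
    """Back/forward list: visiting a new location discards the forward entries."""

    def __init__(self):
        self._entries = []
        self._index = -1  # -1 means nothing has been visited yet

    def visit(self, location):
        # Drop anything ahead of the current entry, then append the new one.
        del self._entries[self._index + 1:]
        self._entries.append(location)
        self._index += 1

    def back(self):
        """Move one entry back and return it, or None at the start."""
        if self._index > 0:
            self._index -= 1
            return self._entries[self._index]
        return None

    def forward(self):
        """Move one entry forward and return it, or None at the end."""
        if self._index < len(self._entries) - 1:
            self._index += 1
            return self._entries[self._index]
        return None
```

Returning `None` at either end gives the Back and Forward buttons a simple way to decide whether anything needs to be reloaded.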
 ### pyWebLayout Integration
 The browser uses these pyWebLayout components:
 #### From `pyWebLayout/concrete/`:
 - `Page` - Top-level container for web page content
 - `Container` - Layout management for multiple elements
 - `Box` - Basic rectangular container with positioning
 - `Text` - Text rendering with font styling
 - `RenderableImage` - Image loading and display with scaling
 - `RenderableLink` - Interactive link elements
 - `RenderableButton` - Interactive button elements
 #### From `pyWebLayout/abstract/`:
 - `Link` - Abstract link representation with types (internal, external, API, function)
 - `Image` - Abstract image representation with dimensions and loading
 - Font and styling classes for text appearance
 #### From `pyWebLayout/style/`:
 - `Font` - Font management with size, weight, style, and decoration
 - `FontWeight`, `FontStyle`, `TextDecoration` - Typography enums
 - `Alignment` - Layout positioning options
 ## Supported HTML Features
 ### Text Elements
 - `<h1>` to `<h6>` - Headers with appropriate sizing
 - `<p>` - Paragraphs with spacing
 - `<b>`, `<strong>` - Bold text
 - `<i>`, `<em>` - Italic text
 - `<u>` - Underlined text
 - `<br>` - Line breaks
 ### Interactive Elements
 - `<a href="...">` - Links (opens external URLs in system browser)
 ### Media Elements
 - `<img src="..." alt="..." width="..." height="...">` - Images with scaling
 ### Container Elements
 - `<div>`, `<span>` - Generic containers (parsed but not specially styled)
 ## Example Usage
 ```python
 # Start the browser
 from html_browser import BrowserWindow
 browser = BrowserWindow()
 browser.run()
 ```
 ## Limitations
 This is a demonstration browser with simplified HTML parsing:
 - No CSS support (styling is done through pyWebLayout components)
 - No JavaScript execution
 - Limited HTML tag support
 - No form submission (forms can be rendered but not submitted)
 - No advanced layout features (flexbox, grid, etc.)
 ## Dependencies
 - `tkinter` - GUI framework (usually included with Python)
 - `PIL` (Pillow) - Image processing
 - `requests` - HTTP requests for web URLs
 - `pyWebLayout` - The core layout and rendering library
 ## Testing
 Load `test_page.html` to see all supported features in action:
 1. Run the browser: `python html_browser.py`
 2. Click "Open File" and select `test_page.html`
 3. Explore the different text formatting, links, and image rendering
 The test page includes:
 - Various header levels
 - Text formatting examples
 - Clickable links (try the Google link!)
 - A sample image from the test data
 - Mixed content demonstrations
--- a/EPUB_READER_README.md
+++ b/EPUB_READER_README.md
@@ -1,175 +0,0 @@
 # EPUB Reader Documentation
 ## Overview
 This project implements two major enhancements to pyWebLayout:
 1. **Enhanced Page Class**: Moved HTML rendering logic from the browser into the `Page` class for better separation of concerns
 2. **Tkinter EPUB Reader**: A complete EPUB reader application with pagination support
 ## Files Created/Modified
 ### 1. Enhanced Page Class (`pyWebLayout/concrete/page.py`)
 **New Features Added:**
 - `load_html_string()` - Load HTML content directly into a Page
 - `load_html_file()` - Load HTML from a file
 - Private conversion methods to transform abstract blocks to renderables
 - Integration with existing HTML extraction system
 **Key Methods:**
 ```python
 page = Page(size=(800, 600))
 page.load_html_string(html_content)  # Load HTML string
 page.load_html_file("file.html")     # Load HTML file
 image = page.render()                # Render to PIL Image
 ```
 **Benefits:**
 - Reuses existing `html_extraction.py` infrastructure
 - Converts abstract blocks to concrete renderables
 - Supports headings, paragraphs, lists, images, etc.
 - Proper error handling with fallback rendering
 ### 2. EPUB Reader Application (`epub_reader_tk.py`)
 **Features:**
 - Complete Tkinter-based GUI
 - EPUB file loading using existing `epub_reader.py`
 - Chapter navigation with dropdown selection
 - Page-by-page display with navigation controls
 - Adjustable font size (8-24pt)
 - Keyboard shortcuts (arrow keys, Ctrl+O)
 - Status bar with loading feedback
 - Scrollable content display
 **GUI Components:**
 - File open dialog for EPUB selection
 - Chapter dropdown and navigation buttons
 - Page navigation controls
 - Font size adjustment
 - Canvas with scrollbars for content display
 - Status bar for feedback
 **Navigation:**
 - **Left/Right arrows**: Previous/Next page
 - **Up/Down arrows**: Previous/Next chapter
 - **Ctrl+O**: Open file dialog
 - **Mouse**: Dropdown chapter selection
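The shortcuts above could be wired up with Tkinter's `bind` method along these lines. The event sequences (`<Left>`, `<Control-o>`, and so on) are standard Tkinter; the handler method names and the `bind_shortcuts` helper are assumptions for illustration and may not match `epub_reader_tk.py`.

```python
# Tkinter event sequences mapped to hypothetical handler method names.
KEY_BINDINGS = {
    "<Left>": "previous_page",
    "<Right>": "next_page",
    "<Up>": "previous_chapter",
    "<Down>": "next_chapter",
    "<Control-o>": "open_file",
}


def bind_shortcuts(widget, app):
    """Attach every shortcut to `widget`, forwarding to methods on `app`."""
    for sequence, method_name in KEY_BINDINGS.items():
        handler = getattr(app, method_name)
        # Tkinter passes an event object; the handlers here take no arguments.
        # Binding h as a default argument captures the current handler.
        widget.bind(sequence, lambda event, h=handler: h())
```

In a real application `widget` would be the `tk.Tk()` root window, so the shortcuts work regardless of which child widget has focus.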
 ### 3. Test Suite (`test_enhanced_page.py`)
 **Test Coverage:**
 - HTML string loading and rendering
 - HTML file loading and rendering
 - EPUB reader app import and instantiation
 - Error handling verification
 ## Technical Architecture
 ### HTML Processing Flow
 ```
 HTML String/File → parse_html_string() → Abstract Blocks → Page._convert_block_to_renderable() → Concrete Renderables → Page.render() → PIL Image
 ```
 ### EPUB Reading Flow
 ```
 EPUB File → read_epub() → Book → Chapters → Abstract Blocks → Page Conversion → Tkinter Display
 ```
 ## Usage Examples
 ### Basic HTML Page Rendering
 ```python
 from pyWebLayout.concrete.page import Page
 # Create and load HTML
 page = Page(size=(800, 600))
 page.load_html_string("""
 <h1>Hello World</h1>
 <p>This is a <strong>test</strong> paragraph.</p>
 """)
 # Render to image
 image = page.render()
 image.save("output.png")
 ```
 ### EPUB Reader Application
 ```python
 # Run the EPUB reader from a shell:
 #   python epub_reader_tk.py
 # Or import and use programmatically
 from epub_reader_tk import EPUBReaderApp
 app = EPUBReaderApp()
 app.run()
 ```
 ## Features Demonstrated
 ### HTML Parsing & Rendering
 - ✅ Paragraphs with inline formatting (bold, italic)
 - ✅ Headers (H1-H6) with proper sizing
 - ✅ Lists (ordered and unordered)
 - ✅ Images with alt text fallback
 - ✅ Error handling for malformed content
 ### EPUB Processing
 - ✅ Full EPUB metadata extraction
 - ✅ Chapter-by-chapter navigation
 - ✅ Table of contents integration
 - ✅ Multi-format content support
 ### User Interface
 - ✅ Intuitive navigation controls
 - ✅ Responsive layout with scrolling
 - ✅ Font size customization
 - ✅ Keyboard shortcuts
 - ✅ Status feedback
 ## Dependencies
 The EPUB reader leverages existing pyWebLayout infrastructure:
 - `pyWebLayout.io.readers.epub_reader` - EPUB parsing
 - `pyWebLayout.io.readers.html_extraction` - HTML to abstract blocks
 - `pyWebLayout.concrete.*` - Renderable objects
 - `pyWebLayout.abstract.*` - Abstract document model
 - `pyWebLayout.style.*` - Styling system
 ## Testing
 Run the test suite to verify functionality:
 ```bash
 python test_enhanced_page.py
 ```
 Expected output:
 - ✅ HTML String Loading: PASS
 - ✅ HTML File Loading: PASS  
 - ✅ EPUB Reader Imports: PASS
 ## Future Enhancements
 1. **Advanced Pagination**: Break long chapters across multiple pages
 2. **Search Functionality**: Full-text search within books
 3. **Bookmarks**: Save reading position
 4. **Themes**: Dark/light mode support
 5. **Export**: Save pages as images or PDFs
 6. **Zoom**: Variable zoom levels for accessibility
 ## Integration with Existing Browser
 The enhanced Page class can be used to improve the existing `html_browser.py`:
 ```python
 # Instead of complex parsing in the browser
 parser = HTMLParser()
 page = parser.parse_html_string(html_content)
 # Use the new Page class
 page = Page()
 page.load_html_string(html_content)
 ```
 This provides better separation of concerns and reuses the robust HTML extraction system.
--- a/docs/images/ereader_highlighting.gif
+++ b/docs/images/ereader_highlighting.gif
--- a/ereader_bookmarks/test_bookmarks.json
+++ b/ereader_bookmarks/test_bookmarks.json
@@ -1,12 +0,0 @@
 {
  "demo_bookmark": {
    "chapter_index": 0,
    "block_index": 27,
    "word_index": 0,
    "table_row": 0,
    "table_col": 0,
    "list_item_index": 0,
    "remaining_pretext": null,
    "page_y_offset": 0
  }
 }
--- a/ereader_bookmarks/test_position.json
+++ b/ereader_bookmarks/test_position.json
@@ -1,10 +0,0 @@
 {
  "chapter_index": 0,
  "block_index": 54,
  "word_index": 0,
  "table_row": 0,
  "table_col": 0,
  "list_item_index": 0,
  "remaining_pretext": null,
  "page_y_offset": 0
 }
--- a/examples/README_EREADER.md
+++ b/examples/README_EREADER.md
@@ -1,363 +0,0 @@
 # EbookReader - Simple EPUB Reader Application
 The `EbookReader` class provides a complete, user-friendly interface for building ebook reader applications with pyWebLayout. It wraps all the complex ereader infrastructure into a simple API.
 ## Features
 - 📖 **EPUB Loading** - Load EPUB files with automatic content extraction
 - ⬅️➡️ **Page Navigation** - Forward and backward page navigation
 - 🔖 **Position Management** - Save/load reading positions (stable across font changes)
 - 📑 **Chapter Navigation** - Jump to chapters by title or index
 - 🔤 **Font Size Control** - Increase/decrease font size with live re-rendering
 - 📏 **Spacing Control** - Adjust line and block spacing
 - 📊 **Progress Tracking** - Get reading progress and position information
 - 💾 **Context Manager Support** - Automatic cleanup with `with` statement
 ## Quick Start
 ```python
 from pyWebLayout.layout.ereader_application import EbookReader
 # Create reader
 reader = EbookReader(page_size=(800, 1000))
 # Load an EPUB
 reader.load_epub("mybook.epub")
 # Get current page as PIL Image
 page_image = reader.get_current_page()
 page_image.save("current_page.png")
 # Navigate
 reader.next_page()
 reader.previous_page()
 # Close reader
 reader.close()
 ```
 ## API Reference
 ### Initialization
 ```python
 reader = EbookReader(
    page_size=(800, 1000),           # Page dimensions (width, height) in pixels
    margin=40,                        # Page margin in pixels
    background_color=(255, 255, 255), # RGB background color
    line_spacing=5,                   # Line spacing in pixels
    inter_block_spacing=15,           # Space between blocks in pixels
    bookmarks_dir="ereader_bookmarks", # Directory for bookmarks
    buffer_size=5                     # Number of pages to cache
 )
 ```
 ### Loading EPUB
 ```python
 # Load EPUB file
 success = reader.load_epub("path/to/book.epub")
 # Check if book is loaded
 if reader.is_loaded():
    print("Book loaded successfully")
 # Get book information
 book_info = reader.get_book_info()
 # Returns: {
 #   'title': 'Book Title',
 #   'author': 'Author Name',
 #   'document_id': 'book',
 #   'total_blocks': 5000,
 #   'total_chapters': 20,
 #   'page_size': (800, 1000),
 #   'font_scale': 1.0
 # }
 ```
 ### Page Navigation
 ```python
 # Get current page as PIL Image
 page = reader.get_current_page()
 # Navigate to next page
 page = reader.next_page()  # Returns None at end of book
 # Navigate to previous page
 page = reader.previous_page()  # Returns None at beginning
 # Save current page to file
 reader.render_to_file("page.png")
 ```
 ### Position Management
 Positions are saved based on abstract document structure (chapter/block/word indices), making them stable across font size and styling changes.
 ```python
 # Save current position
 reader.save_position("my_bookmark")
 # Load saved position
 page = reader.load_position("my_bookmark")
 # List all saved positions
 positions = reader.list_saved_positions()
 # Returns: ['my_bookmark', 'chapter_2', ...]
 # Delete a position
 reader.delete_position("my_bookmark")
 # Get detailed position info
 info = reader.get_position_info()
 # Returns: {
 #   'position': {'chapter_index': 0, 'block_index': 42, 'word_index': 15, ...},
 #   'chapter': {'title': 'Chapter 1', 'level': 'H1', ...},
 #   'progress': 0.15,  # 15% through the book
 #   'font_scale': 1.0,
 #   'book_title': 'Book Title',
 #   'book_author': 'Author Name'
 # }
 # Get reading progress (0.0 to 1.0)
 progress = reader.get_reading_progress()
 print(f"You're {progress*100:.1f}% through the book")
 ```
 ### Chapter Navigation
 ```python
 # Get all chapters
 chapters = reader.get_chapters()
 # Returns: [('Chapter 1', 0), ('Chapter 2', 1), ...]
 # Get chapters with positions
 chapter_positions = reader.get_chapter_positions()
 # Returns: [('Chapter 1', RenderingPosition(...)), ...]
 # Jump to chapter by index
 page = reader.jump_to_chapter(1)  # Jump to second chapter
 # Jump to chapter by title
 page = reader.jump_to_chapter("Chapter 1")
 # Get current chapter info
 chapter_info = reader.get_current_chapter_info()
 # Returns: {'title': 'Chapter 1', 'level': HeadingLevel.H1, 'block_index': 0}
 ```
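Accepting either an index or a title, as `jump_to_chapter` does above, comes down to a small lookup over the `(title, index)` pairs. The sketch below is an assumption about how such a lookup might work, not the library's code; `resolve_chapter` is a hypothetical name, and the case-insensitive title match is a choice made for this example.

```python
def resolve_chapter(chapters, target):
    """Return the chapter index for `target`, or None if nothing matches.

    `chapters` is a list of (title, index) pairs, shaped like the
    get_chapters() return value shown above.
    """
    if isinstance(target, int):
        for _title, index in chapters:
            if index == target:
                return index
        return None
    # Assumption: titles are compared case-insensitively, ignoring outer spaces.
    wanted = target.strip().lower()
    for title, index in chapters:
        if title.strip().lower() == wanted:
            return index
    return None
```

Returning `None` for an unknown chapter leaves it to the caller to decide whether to stay on the current page or report an error.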
 ### Font Size Control
 ```python
 # Get current font size scale
 scale = reader.get_font_size()  # Default: 1.0
 # Set specific font size scale
 page = reader.set_font_size(1.5)  # 150% of normal size
 # Increase font size by 10%
 page = reader.increase_font_size()
 # Decrease font size by 10%
 page = reader.decrease_font_size()
 ```
 ### Spacing Control
 ```python
 # Set line spacing (spacing between lines within a paragraph)
 page = reader.set_line_spacing(10)  # 10 pixels
 # Set inter-block spacing (spacing between paragraphs, headings, etc.)
 page = reader.set_inter_block_spacing(20)  # 20 pixels
 ```
 ### Context Manager
 The reader supports Python's context manager protocol for automatic cleanup:
 ```python
 with EbookReader(page_size=(800, 1000)) as reader:
    reader.load_epub("book.epub")
    page = reader.get_current_page()
    # ... do stuff
 # Automatically saves position and cleans up resources
 ```
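For readers unfamiliar with the protocol, the behavior described above (save the position, then clean up) maps onto `__enter__` and `__exit__` roughly as follows. `ReaderSketch` is a hypothetical stand-in, and the bookmark name `__auto_resume__` is invented for the example; neither comes from `EbookReader`.

```python
class ReaderSketch:
    """Hypothetical stand-in for a reader that persists state on exit."""

    def __init__(self):
        self.saved_as = None
        self.closed = False

    def save_position(self, name):
        self.saved_as = name  # a real reader would write a JSON file here

    def close(self):
        self.closed = True  # a real reader would release caches and workers

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Save and clean up even if the body raised an exception.
        try:
            self.save_position("__auto_resume__")
        finally:
            self.close()
        return False  # do not swallow exceptions from the with-block
```

The `try`/`finally` ensures `close()` still runs when saving fails, and returning `False` lets any error from the `with` body propagate to the caller.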
 ## Complete Example
 ```python
 from pyWebLayout.layout.ereader_application import EbookReader
 # Create reader with custom settings
 with EbookReader(
    page_size=(800, 1000),
    margin=50,
    line_spacing=8,
    inter_block_spacing=20
 ) as reader:
    # Load EPUB
    if not reader.load_epub("my_novel.epub"):
        print("Failed to load EPUB")
         raise SystemExit(1)  # the exit() helper is only guaranteed in interactive sessions
    # Get book info
    info = reader.get_book_info()
    print(f"Reading: {info['title']} by {info['author']}")
    print(f"Total chapters: {info['total_chapters']}")
    # Navigate through first few pages
    for i in range(5):
        page = reader.get_current_page()
        page.save(f"page_{i+1:03d}.png")
        reader.next_page()
    # Save current position
    reader.save_position("page_5")
    # Jump to a chapter
    chapters = reader.get_chapters()
    if len(chapters) > 2:
        print(f"Jumping to: {chapters[2][0]}")
        reader.jump_to_chapter(2)
        reader.render_to_file("chapter_3_start.png")
    # Return to saved position
    reader.load_position("page_5")
    # Adjust font size
    reader.increase_font_size()
    reader.render_to_file("page_5_larger_font.png")
    # Get progress
    progress = reader.get_reading_progress()
    print(f"Reading progress: {progress*100:.1f}%")
 ```
 ## Demo Script
 Run the comprehensive demo to see all features in action:
 ```bash
 python examples/ereader_demo.py path/to/book.epub
 ```
 This will demonstrate:
 - Basic page navigation
 - Position save/load
 - Chapter navigation
 - Font size adjustments
 - Spacing adjustments
 - Book information retrieval
 The demo generates multiple PNG files showing different pages and settings.
 ## Position Storage Format
 Positions are stored as JSON files in the `bookmarks_dir` (default: `ereader_bookmarks/`):
 ```json
 {
  "chapter_index": 0,
  "block_index": 42,
  "word_index": 15,
  "table_row": 0,
  "table_col": 0,
  "list_item_index": 0,
  "remaining_pretext": null,
  "page_y_offset": 0
 }
 ```
 This format is tied to the abstract document structure, making positions stable across:
 - Font size changes
 - Line spacing changes
 - Inter-block spacing changes
 - Page size changes
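A round trip through this format can be sketched with a dataclass and the standard `json` module. The field names mirror the JSON above; the `PositionSketch` class, the two helper functions, the `<name>.json` file naming, and the `Optional[str]` type for `remaining_pretext` are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PositionSketch:
    """Hypothetical mirror of the stored fields; not the library's class."""
    chapter_index: int = 0
    block_index: int = 0
    word_index: int = 0
    table_row: int = 0
    table_col: int = 0
    list_item_index: int = 0
    remaining_pretext: Optional[str] = None  # type is an assumption
    page_y_offset: int = 0


def save_position(position, directory, name):
    """Write `position` to <directory>/<name>.json and return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    path.write_text(json.dumps(asdict(position), indent=2), encoding="utf-8")
    return path


def load_position(directory, name):
    """Read <directory>/<name>.json back into a PositionSketch."""
    path = Path(directory) / f"{name}.json"
    data = json.loads(path.read_text(encoding="utf-8"))
    return PositionSketch(**data)
```

Because every field is an index into the document structure rather than a pixel coordinate, the same file remains meaningful after the layout parameters change.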
 ## Integration Example: Simple GUI
 Here's a minimal example of integrating with Tkinter:
 ```python
 import tkinter as tk
 from tkinter import filedialog
 from PIL import ImageTk
 from pyWebLayout.layout.ereader_application import EbookReader
 class SimpleEreaderGUI:
    def __init__(self, root):
        self.root = root
        self.reader = EbookReader(page_size=(600, 800))
        # Create UI
        self.image_label = tk.Label(root)
        self.image_label.pack()
        btn_frame = tk.Frame(root)
        btn_frame.pack()
        tk.Button(btn_frame, text="Open EPUB", command=self.open_epub).pack(side=tk.LEFT)
        tk.Button(btn_frame, text="Previous", command=self.prev_page).pack(side=tk.LEFT)
        tk.Button(btn_frame, text="Next", command=self.next_page).pack(side=tk.LEFT)
        tk.Button(btn_frame, text="Font+", command=self.increase_font).pack(side=tk.LEFT)
        tk.Button(btn_frame, text="Font-", command=self.decrease_font).pack(side=tk.LEFT)
    def open_epub(self):
        filepath = filedialog.askopenfilename(filetypes=[("EPUB files", "*.epub")])
        if filepath:
            self.reader.load_epub(filepath)
            self.display_page()
    def display_page(self):
        page = self.reader.get_current_page()
        if page:
            photo = ImageTk.PhotoImage(page)
            self.image_label.config(image=photo)
            self.image_label.image = photo
    def next_page(self):
        if self.reader.next_page():
            self.display_page()
    def prev_page(self):
        if self.reader.previous_page():
            self.display_page()
    def increase_font(self):
        self.reader.increase_font_size()
        self.display_page()
    def decrease_font(self):
        self.reader.decrease_font_size()
        self.display_page()
 root = tk.Tk()
 root.title("Simple Ereader")
 app = SimpleEreaderGUI(root)
 root.mainloop()
 ```
 ## Performance Notes
 - The reader uses intelligent page caching for fast navigation
 - The first page load may take about 1 second; subsequent pages typically render in under 0.1 seconds
 - Background rendering attempts to pre-cache upcoming pages (you may see pickle warnings, which can be ignored)
 - Font size changes invalidate the cache and require re-rendering from the current position
 - Position save/load is nearly instantaneous
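One common way to get this kind of bounded page cache is a least-recently-used map. The sketch below assumes that approach and is not the library's implementation; `PageCacheSketch`, its methods, and the `render` callback are hypothetical, with `buffer_size` playing the role of the constructor argument of the same name.

```python
from collections import OrderedDict


class PageCacheSketch:
    """Keep at most `buffer_size` rendered pages, evicting the least recent."""

    def __init__(self, buffer_size=5):
        self.buffer_size = buffer_size
        self._pages = OrderedDict()

    def get(self, key, render):
        """Return the cached page for `key`, calling render(key) on a miss."""
        if key in self._pages:
            self._pages.move_to_end(key)  # mark as most recently used
            return self._pages[key]
        page = render(key)
        self._pages[key] = page
        if len(self._pages) > self.buffer_size:
            self._pages.popitem(last=False)  # evict the oldest entry
        return page

    def invalidate(self):
        """Drop everything, for example after a font size change."""
        self._pages.clear()
```

A key could be any hashable description of a page, such as a tuple of position indices; clearing the whole cache on a font change is the simplest way to honor the invalidation rule above.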
 ## Limitations
 - Currently supports EPUB files only (no PDF, MOBI, etc.)
 - Images in EPUBs may not render in some cases
 - Tables are skipped in rendering
 - Complex HTML layouts may not render perfectly
 - No text selection or search functionality (these would need to be added separately)
 ## See Also
 - `examples/ereader_demo.py` - Comprehensive feature demonstration
 - `pyWebLayout/layout/ereader_manager.py` - Underlying manager class
 - `pyWebLayout/layout/ereader_layout.py` - Core layout engine
 - `examples/README_EPUB_RENDERERS.md` - Lower-level EPUB rendering
--- a/pyWebLayout/concrete/text.py
+++ b/pyWebLayout/concrete/text.py
@@ -1,8 +1,12 @@
 from __future__ import annotations
 from pyWebLayout.core.base import Renderable, Queriable
 from pyWebLayout.core.query import QueryResult
 from .box import Box
 from pyWebLayout.style import Alignment, Font, FontStyle, FontWeight, TextDecoration
 from pyWebLayout.abstract import Word
 from pyWebLayout.abstract.inline import LinkedWord
 from pyWebLayout.abstract.functional import Link
 from .functional import LinkText, ButtonText
 from PIL import Image, ImageDraw, ImageFont
 from typing import Tuple, Union, List, Optional, Protocol
 import numpy as np
@@ -383,10 +387,6 @@ class Line(Box):
            - success: True if word/part was added, False if it couldn't fit
            - overflow_text: Remaining text if word was hyphenated, None otherwise
        """
        # Import LinkedWord here to avoid circular imports
        from pyWebLayout.abstract.inline import LinkedWord
        from pyWebLayout.concrete.functional import LinkText
        # First, add any pretext from previous hyphenation
        if part is not None:
            self._text_objects.append(part)
@@ -399,7 +399,6 @@ class Line(Box):
            # LinkText constructor needs: (link, text, font, draw, source, line)
            # But LinkedWord itself contains the link properties
            # We'll create a Link object from the LinkedWord properties
            from pyWebLayout.abstract.functional import Link
            link = Link(
                location=word.location,
                link_type=word.link_type,
@@ -572,9 +571,6 @@ class Line(Box):
        Returns:
            QueryResult from the text object at that point, or None
        """
        from pyWebLayout.core.query import QueryResult
        from .functional import LinkText, ButtonText
        point_array = np.array(point)
        # Check each text object in this line