diff --git a/BROWSER_README.md b/BROWSER_README.md deleted file mode 100644 index f102d68..0000000 --- a/BROWSER_README.md +++ /dev/null @@ -1,143 +0,0 @@ -# pyWebLayout HTML Browser - -A simple HTML browser built using the pyWebLayout library components from `pyWebLayout/io/` and `pyWebLayout/concrete/`. - -## Features - -This browser demonstrates the capabilities of pyWebLayout by implementing: - -### Rendering Components -- **Text rendering** with various formatting (bold, italic, underline) -- **Headers** (H1-H6) with proper sizing and styling -- **Links** (clickable, with external browser opening for external URLs) -- **Images** (local files and web URLs with error handling) -- **Layout containers** for proper element positioning -- **Basic HTML parsing** and element conversion - -### User Interface -- **Navigation controls**: Back, Forward, Refresh buttons -- **Address bar**: Enter URLs or file paths -- **File browser**: Open local HTML files -- **Scrollable content area** with both vertical and horizontal scrollbars -- **Mouse interaction**: Clickable links with hover effects -- **Status bar**: Shows current operation status - -## Usage - -### Starting the Browser -```bash -python html_browser.py -``` - -### Loading Content - -1. **Load the test page**: The browser starts with a welcome page showing various features -2. **Open local files**: Click "Open File" to browse and select HTML files -3. **Enter URLs**: Type URLs in the address bar and press Enter or click "Go" -4. 
**Navigate**: Use back/forward buttons to navigate through history - -### Test Files - -- `test_page.html` - A comprehensive test page demonstrating all supported features including: - - Text formatting (bold, italic, underline) - - Headers of all levels (H1-H6) - - Links (both internal and external) - - Images (includes the sample image from tests/data/) - - Line breaks and paragraphs - -## Architecture - -### HTML Parser (`HTMLParser` class) -- Simple regex-based HTML tokenizer -- Converts HTML elements to pyWebLayout abstract objects -- Handles font styling with a font stack for nested formatting -- Supports basic HTML tags: h1-h6, b, strong, i, em, u, a, img, br, p, div, span - -### Browser Window (`BrowserWindow` class) -- Tkinter-based GUI with navigation controls -- Canvas-based rendering of pyWebLayout Page objects -- Mouse event handling for interactive elements -- Navigation history management -- File and URL loading capabilities - -### pyWebLayout Integration - -The browser uses these pyWebLayout components: - -#### From `pyWebLayout/concrete/`: -- `Page` - Top-level container for web page content -- `Container` - Layout management for multiple elements -- `Box` - Basic rectangular container with positioning -- `Text` - Text rendering with font styling -- `RenderableImage` - Image loading and display with scaling -- `RenderableLink` - Interactive link elements -- `RenderableButton` - Interactive button elements - -#### From `pyWebLayout/abstract/`: -- `Link` - Abstract link representation with types (internal, external, API, function) -- `Image` - Abstract image representation with dimensions and loading -- Font and styling classes for text appearance - -#### From `pyWebLayout/style/`: -- `Font` - Font management with size, weight, style, and decoration -- `FontWeight`, `FontStyle`, `TextDecoration` - Typography enums -- `Alignment` - Layout positioning options - -## Supported HTML Features - -### Text Elements -- `
<h1>` to `<h6>` - Headers with appropriate sizing -- `<p>` - Paragraphs with spacing -- `<b>`, `<strong>` - Bold text -- `<i>`, `<em>` - Italic text -- `<u>` - Underlined text -- `<br>` - Line breaks - -### Interactive Elements -- `<a>` - Links (opens external URLs in system browser) - -### Media Elements -- `<img ...>` - Images with scaling - -### Container Elements -- `<div>
`, `<span>` - Generic containers (parsed but not specially styled) - -## Example Usage - -```python -# Start the browser -from html_browser import BrowserWindow - -browser = BrowserWindow() -browser.run() -``` - -## Limitations - -This is a demonstration browser with simplified HTML parsing: -- No CSS support (styling is done through pyWebLayout components) -- No JavaScript execution -- Limited HTML tag support -- No form submission (forms can be rendered but not submitted) -- No advanced layout features (flexbox, grid, etc.) - -## Dependencies - -- `tkinter` - GUI framework (usually included with Python) -- `PIL` (Pillow) - Image processing -- `requests` - HTTP requests for web URLs -- `pyWebLayout` - The core layout and rendering library - -## Testing - -Load `test_page.html` to see all supported features in action: -1. Run the browser: `python html_browser.py` -2. Click "Open File" and select `test_page.html` -3. Explore the different text formatting, links, and image rendering - -The test page includes: -- Various header levels -- Text formatting examples -- Clickable links (try the Google link!) -- A sample image from the test data -- Mixed content demonstrations diff --git a/EPUB_READER_README.md b/EPUB_READER_README.md deleted file mode 100644 index dd4361e..0000000 --- a/EPUB_READER_README.md +++ /dev/null @@ -1,175 +0,0 @@ -# EPUB Reader Documentation - -## Overview - -This project implements two major enhancements to pyWebLayout: - -1. **Enhanced Page Class**: Moved HTML rendering logic from the browser into the `Page` class for better separation of concerns -2. **Tkinter EPUB Reader**: A complete EPUB reader application with pagination support - -## Files Created/Modified - -### 1. 
Enhanced Page Class (`pyWebLayout/concrete/page.py`) - -**New Features Added:** -- `load_html_string()` - Load HTML content directly into a Page -- `load_html_file()` - Load HTML from a file -- Private conversion methods to transform abstract blocks to renderables -- Integration with existing HTML extraction system - -**Key Methods:** -```python -page = Page(size=(800, 600)) -page.load_html_string(html_content) # Load HTML string -page.load_html_file("file.html") # Load HTML file -image = page.render() # Render to PIL Image -``` - -**Benefits:** -- Reuses existing `html_extraction.py` infrastructure -- Converts abstract blocks to concrete renderables -- Supports headings, paragraphs, lists, images, etc. -- Proper error handling with fallback rendering - -### 2. EPUB Reader Application (`epub_reader_tk.py`) - -**Features:** -- Complete Tkinter-based GUI -- EPUB file loading using existing `epub_reader.py` -- Chapter navigation with dropdown selection -- Page-by-page display with navigation controls -- Adjustable font size (8-24pt) -- Keyboard shortcuts (arrow keys, Ctrl+O) -- Status bar with loading feedback -- Scrollable content display - -**GUI Components:** -- File open dialog for EPUB selection -- Chapter dropdown and navigation buttons -- Page navigation controls -- Font size adjustment -- Canvas with scrollbars for content display -- Status bar for feedback - -**Navigation:** -- **Left/Right arrows**: Previous/Next page -- **Up/Down arrows**: Previous/Next chapter -- **Ctrl+O**: Open file dialog -- **Mouse**: Dropdown chapter selection - -### 3. 
Test Suite (`test_enhanced_page.py`) - -**Test Coverage:** -- HTML string loading and rendering -- HTML file loading and rendering -- EPUB reader app import and instantiation -- Error handling verification - -## Technical Architecture - -### HTML Processing Flow -``` -HTML String/File → parse_html_string() → Abstract Blocks → Page._convert_block_to_renderable() → Concrete Renderables → Page.render() → PIL Image -``` - -### EPUB Reading Flow -``` -EPUB File → read_epub() → Book → Chapters → Abstract Blocks → Page Conversion → Tkinter Display -``` - -## Usage Examples - -### Basic HTML Page Rendering -```python -from pyWebLayout.concrete.page import Page - -# Create and load HTML -page = Page(size=(800, 600)) -page.load_html_string(""" -
<h1>Hello World</h1> -<p>This is a test paragraph.</p>
-""") - -# Render to image -image = page.render() -image.save("output.png") -``` - -### EPUB Reader Application -```python -# Run the EPUB reader -python epub_reader_tk.py - -# Or import and use programmatically -from epub_reader_tk import EPUBReaderApp -app = EPUBReaderApp() -app.run() -``` - -## Features Demonstrated - -### HTML Parsing & Rendering -- ✅ Paragraphs with inline formatting (bold, italic) -- ✅ Headers (H1-H6) with proper sizing -- ✅ Lists (ordered and unordered) -- ✅ Images with alt text fallback -- ✅ Error handling for malformed content - -### EPUB Processing -- ✅ Full EPUB metadata extraction -- ✅ Chapter-by-chapter navigation -- ✅ Table of contents integration -- ✅ Multi-format content support - -### User Interface -- ✅ Intuitive navigation controls -- ✅ Responsive layout with scrolling -- ✅ Font size customization -- ✅ Keyboard shortcuts -- ✅ Status feedback - -## Dependencies - -The EPUB reader leverages existing pyWebLayout infrastructure: -- `pyWebLayout.io.readers.epub_reader` - EPUB parsing -- `pyWebLayout.io.readers.html_extraction` - HTML to abstract blocks -- `pyWebLayout.concrete.*` - Renderable objects -- `pyWebLayout.abstract.*` - Abstract document model -- `pyWebLayout.style.*` - Styling system - -## Testing - -Run the test suite to verify functionality: -```bash -python test_enhanced_page.py -``` - -Expected output: -- ✅ HTML String Loading: PASS -- ✅ HTML File Loading: PASS -- ✅ EPUB Reader Imports: PASS - -## Future Enhancements - -1. **Advanced Pagination**: Break long chapters across multiple pages -2. **Search Functionality**: Full-text search within books -3. **Bookmarks**: Save reading position -4. **Themes**: Dark/light mode support -5. **Export**: Save pages as images or PDFs -6. 
**Zoom**: Variable zoom levels for accessibility - -## Integration with Existing Browser - -The enhanced Page class can be used to improve the existing `html_browser.py`: - -```python -# Instead of complex parsing in the browser -parser = HTMLParser() -page = parser.parse_html_string(html_content) - -# Use the new Page class -page = Page() -page.load_html_string(html_content) -``` - -This provides better separation of concerns and reuses the robust HTML extraction system. diff --git a/docs/images/ereader_highlighting.gif b/docs/images/ereader_highlighting.gif new file mode 100644 index 0000000..ca188b6 Binary files /dev/null and b/docs/images/ereader_highlighting.gif differ diff --git a/ereader_bookmarks/test_bookmarks.json b/ereader_bookmarks/test_bookmarks.json deleted file mode 100644 index 3d85313..0000000 --- a/ereader_bookmarks/test_bookmarks.json +++ /dev/null @@ -1,12 +0,0 @@ -{ - "demo_bookmark": { - "chapter_index": 0, - "block_index": 27, - "word_index": 0, - "table_row": 0, - "table_col": 0, - "list_item_index": 0, - "remaining_pretext": null, - "page_y_offset": 0 - } -} \ No newline at end of file diff --git a/ereader_bookmarks/test_position.json b/ereader_bookmarks/test_position.json deleted file mode 100644 index c8b76f3..0000000 --- a/ereader_bookmarks/test_position.json +++ /dev/null @@ -1,10 +0,0 @@ -{ - "chapter_index": 0, - "block_index": 54, - "word_index": 0, - "table_row": 0, - "table_col": 0, - "list_item_index": 0, - "remaining_pretext": null, - "page_y_offset": 0 -} \ No newline at end of file diff --git a/examples/README_EREADER.md b/examples/README_EREADER.md deleted file mode 100644 index a031acf..0000000 --- a/examples/README_EREADER.md +++ /dev/null @@ -1,363 +0,0 @@ -# EbookReader - Simple EPUB Reader Application - -The `EbookReader` class provides a complete, user-friendly interface for building ebook reader applications with pyWebLayout. It wraps all the complex ereader infrastructure into a simple API. 
- -## Features - -- 📖 **EPUB Loading** - Load EPUB files with automatic content extraction -- ⬅️➡️ **Page Navigation** - Forward and backward page navigation -- 🔖 **Position Management** - Save/load reading positions (stable across font changes) -- 📑 **Chapter Navigation** - Jump to chapters by title or index -- 🔤 **Font Size Control** - Increase/decrease font size with live re-rendering -- 📏 **Spacing Control** - Adjust line and block spacing -- 📊 **Progress Tracking** - Get reading progress and position information -- 💾 **Context Manager Support** - Automatic cleanup with `with` statement - -## Quick Start - -```python -from pyWebLayout.layout.ereader_application import EbookReader - -# Create reader -reader = EbookReader(page_size=(800, 1000)) - -# Load an EPUB -reader.load_epub("mybook.epub") - -# Get current page as PIL Image -page_image = reader.get_current_page() -page_image.save("current_page.png") - -# Navigate -reader.next_page() -reader.previous_page() - -# Close reader -reader.close() -``` - -## API Reference - -### Initialization - -```python -reader = EbookReader( - page_size=(800, 1000), # Page dimensions (width, height) in pixels - margin=40, # Page margin in pixels - background_color=(255, 255, 255), # RGB background color - line_spacing=5, # Line spacing in pixels - inter_block_spacing=15, # Space between blocks in pixels - bookmarks_dir="ereader_bookmarks", # Directory for bookmarks - buffer_size=5 # Number of pages to cache -) -``` - -### Loading EPUB - -```python -# Load EPUB file -success = reader.load_epub("path/to/book.epub") - -# Check if book is loaded -if reader.is_loaded(): - print("Book loaded successfully") - -# Get book information -book_info = reader.get_book_info() -# Returns: { -# 'title': 'Book Title', -# 'author': 'Author Name', -# 'document_id': 'book', -# 'total_blocks': 5000, -# 'total_chapters': 20, -# 'page_size': (800, 1000), -# 'font_scale': 1.0 -# } -``` - -### Page Navigation - -```python -# Get current page as PIL Image 
-page = reader.get_current_page() - -# Navigate to next page -page = reader.next_page() # Returns None at end of book - -# Navigate to previous page -page = reader.previous_page() # Returns None at beginning - -# Save current page to file -reader.render_to_file("page.png") -``` - -### Position Management - -Positions are saved based on abstract document structure (chapter/block/word indices), making them stable across font size and styling changes. - -```python -# Save current position -reader.save_position("my_bookmark") - -# Load saved position -page = reader.load_position("my_bookmark") - -# List all saved positions -positions = reader.list_saved_positions() -# Returns: ['my_bookmark', 'chapter_2', ...] - -# Delete a position -reader.delete_position("my_bookmark") - -# Get detailed position info -info = reader.get_position_info() -# Returns: { -# 'position': {'chapter_index': 0, 'block_index': 42, 'word_index': 15, ...}, -# 'chapter': {'title': 'Chapter 1', 'level': 'H1', ...}, -# 'progress': 0.15, # 15% through the book -# 'font_scale': 1.0, -# 'book_title': 'Book Title', -# 'book_author': 'Author Name' -# } - -# Get reading progress (0.0 to 1.0) -progress = reader.get_reading_progress() -print(f"You're {progress*100:.1f}% through the book") -``` - -### Chapter Navigation - -```python -# Get all chapters -chapters = reader.get_chapters() -# Returns: [('Chapter 1', 0), ('Chapter 2', 1), ...] - -# Get chapters with positions -chapter_positions = reader.get_chapter_positions() -# Returns: [('Chapter 1', RenderingPosition(...)), ...] 
- -# Jump to chapter by index -page = reader.jump_to_chapter(1) # Jump to second chapter - -# Jump to chapter by title -page = reader.jump_to_chapter("Chapter 1") - -# Get current chapter info -chapter_info = reader.get_current_chapter_info() -# Returns: {'title': 'Chapter 1', 'level': HeadingLevel.H1, 'block_index': 0} -``` - -### Font Size Control - -```python -# Get current font size scale -scale = reader.get_font_size() # Default: 1.0 - -# Set specific font size scale -page = reader.set_font_size(1.5) # 150% of normal size - -# Increase font size by 10% -page = reader.increase_font_size() - -# Decrease font size by 10% -page = reader.decrease_font_size() -``` - -### Spacing Control - -```python -# Set line spacing (spacing between lines within a paragraph) -page = reader.set_line_spacing(10) # 10 pixels - -# Set inter-block spacing (spacing between paragraphs, headings, etc.) -page = reader.set_inter_block_spacing(20) # 20 pixels -``` - -### Context Manager - -The reader supports Python's context manager protocol for automatic cleanup: - -```python -with EbookReader(page_size=(800, 1000)) as reader: - reader.load_epub("book.epub") - page = reader.get_current_page() - # ... 
do stuff -# Automatically saves position and cleans up resources -``` - -## Complete Example - -```python -from pyWebLayout.layout.ereader_application import EbookReader - -# Create reader with custom settings -with EbookReader( - page_size=(800, 1000), - margin=50, - line_spacing=8, - inter_block_spacing=20 -) as reader: - # Load EPUB - if not reader.load_epub("my_novel.epub"): - print("Failed to load EPUB") - exit(1) - - # Get book info - info = reader.get_book_info() - print(f"Reading: {info['title']} by {info['author']}") - print(f"Total chapters: {info['total_chapters']}") - - # Navigate through first few pages - for i in range(5): - page = reader.get_current_page() - page.save(f"page_{i+1:03d}.png") - reader.next_page() - - # Save current position - reader.save_position("page_5") - - # Jump to a chapter - chapters = reader.get_chapters() - if len(chapters) > 2: - print(f"Jumping to: {chapters[2][0]}") - reader.jump_to_chapter(2) - reader.render_to_file("chapter_3_start.png") - - # Return to saved position - reader.load_position("page_5") - - # Adjust font size - reader.increase_font_size() - reader.render_to_file("page_5_larger_font.png") - - # Get progress - progress = reader.get_reading_progress() - print(f"Reading progress: {progress*100:.1f}%") -``` - -## Demo Script - -Run the comprehensive demo to see all features in action: - -```bash -python examples/ereader_demo.py path/to/book.epub -``` - -This will demonstrate: -- Basic page navigation -- Position save/load -- Chapter navigation -- Font size adjustments -- Spacing adjustments -- Book information retrieval - -The demo generates multiple PNG files showing different pages and settings. 
- -## Position Storage Format - -Positions are stored as JSON files in the `bookmarks_dir` (default: `ereader_bookmarks/`): - -```json -{ - "chapter_index": 0, - "block_index": 42, - "word_index": 15, - "table_row": 0, - "table_col": 0, - "list_item_index": 0, - "remaining_pretext": null, - "page_y_offset": 0 -} -``` - -This format is tied to the abstract document structure, making positions stable across: -- Font size changes -- Line spacing changes -- Inter-block spacing changes -- Page size changes - -## Integration Example: Simple GUI - -Here's a minimal example of integrating with Tkinter: - -```python -import tkinter as tk -from tkinter import filedialog -from PIL import ImageTk -from pyWebLayout.layout.ereader_application import EbookReader - -class SimpleEreaderGUI: - def __init__(self, root): - self.root = root - self.reader = EbookReader(page_size=(600, 800)) - - # Create UI - self.image_label = tk.Label(root) - self.image_label.pack() - - btn_frame = tk.Frame(root) - btn_frame.pack() - - tk.Button(btn_frame, text="Open EPUB", command=self.open_epub).pack(side=tk.LEFT) - tk.Button(btn_frame, text="Previous", command=self.prev_page).pack(side=tk.LEFT) - tk.Button(btn_frame, text="Next", command=self.next_page).pack(side=tk.LEFT) - tk.Button(btn_frame, text="Font+", command=self.increase_font).pack(side=tk.LEFT) - tk.Button(btn_frame, text="Font-", command=self.decrease_font).pack(side=tk.LEFT) - - def open_epub(self): - filepath = filedialog.askopenfilename(filetypes=[("EPUB files", "*.epub")]) - if filepath: - self.reader.load_epub(filepath) - self.display_page() - - def display_page(self): - page = self.reader.get_current_page() - if page: - photo = ImageTk.PhotoImage(page) - self.image_label.config(image=photo) - self.image_label.image = photo - - def next_page(self): - if self.reader.next_page(): - self.display_page() - - def prev_page(self): - if self.reader.previous_page(): - self.display_page() - - def increase_font(self): - 
self.reader.increase_font_size() - self.display_page() - - def decrease_font(self): - self.reader.decrease_font_size() - self.display_page() - -root = tk.Tk() -root.title("Simple Ereader") -app = SimpleEreaderGUI(root) -root.mainloop() -``` - -## Performance Notes - -- The reader uses intelligent page caching for fast navigation -- First page load may take ~1 second, subsequent pages are typically < 0.1 seconds -- Background rendering attempts to pre-cache upcoming pages (you may see pickle warnings, which can be ignored) -- Font size changes invalidate the cache and require re-rendering from the current position -- Position save/load is nearly instantaneous - -## Limitations - -- Currently supports EPUB files only (no PDF, MOBI, etc.) -- Images in EPUBs may not render in some cases -- Tables are skipped in rendering -- Complex HTML layouts may not render perfectly -- No text selection or search functionality (these would need to be added separately) - -## See Also - -- `examples/ereader_demo.py` - Comprehensive feature demonstration -- `pyWebLayout/layout/ereader_manager.py` - Underlying manager class -- `pyWebLayout/layout/ereader_layout.py` - Core layout engine -- `examples/README_EPUB_RENDERERS.md` - Lower-level EPUB rendering diff --git a/pyWebLayout/concrete/text.py b/pyWebLayout/concrete/text.py index 4817075..acd66ac 100644 --- a/pyWebLayout/concrete/text.py +++ b/pyWebLayout/concrete/text.py @@ -1,8 +1,12 @@ from __future__ import annotations from pyWebLayout.core.base import Renderable, Queriable +from pyWebLayout.core.query import QueryResult from .box import Box from pyWebLayout.style import Alignment, Font, FontStyle, FontWeight, TextDecoration from pyWebLayout.abstract import Word +from pyWebLayout.abstract.inline import LinkedWord +from pyWebLayout.abstract.functional import Link +from .functional import LinkText, ButtonText from PIL import Image, ImageDraw, ImageFont from typing import Tuple, Union, List, Optional, Protocol import numpy as np @@ 
-383,10 +387,6 @@ class Line(Box): - success: True if word/part was added, False if it couldn't fit - overflow_text: Remaining text if word was hyphenated, None otherwise """ - # Import LinkedWord here to avoid circular imports - from pyWebLayout.abstract.inline import LinkedWord - from pyWebLayout.concrete.functional import LinkText - # First, add any pretext from previous hyphenation if part is not None: self._text_objects.append(part) @@ -399,7 +399,6 @@ class Line(Box): # LinkText constructor needs: (link, text, font, draw, source, line) # But LinkedWord itself contains the link properties # We'll create a Link object from the LinkedWord properties - from pyWebLayout.abstract.functional import Link link = Link( location=word.location, link_type=word.link_type, @@ -572,9 +571,6 @@ class Line(Box): Returns: QueryResult from the text object at that point, or None """ - from pyWebLayout.core.query import QueryResult - from .functional import LinkText, ButtonText - point_array = np.array(point) # Check each text object in this line