more reorg
All checks were successful
Python CI / test (push) Successful in 47s

Duncan Tourolle 2025-11-07 19:14:37 +01:00
parent 33e2cbc363
commit f72c6015c6
7 changed files with 4 additions and 711 deletions

@ -1,143 +0,0 @@
# pyWebLayout HTML Browser
A simple HTML browser built using the pyWebLayout library components from `pyWebLayout/io/` and `pyWebLayout/concrete/`.
## Features
This browser demonstrates the capabilities of pyWebLayout by implementing:
### Rendering Components
- **Text rendering** with various formatting (bold, italic, underline)
- **Headers** (H1-H6) with proper sizing and styling
- **Links** (clickable, with external browser opening for external URLs)
- **Images** (local files and web URLs with error handling)
- **Layout containers** for proper element positioning
- **Basic HTML parsing** and element conversion
### User Interface
- **Navigation controls**: Back, Forward, Refresh buttons
- **Address bar**: Enter URLs or file paths
- **File browser**: Open local HTML files
- **Scrollable content area** with both vertical and horizontal scrollbars
- **Mouse interaction**: Clickable links with hover effects
- **Status bar**: Shows current operation status
## Usage
### Starting the Browser
```bash
python html_browser.py
```
### Loading Content
1. **Load the test page**: The browser starts with a welcome page showing various features
2. **Open local files**: Click "Open File" to browse and select HTML files
3. **Enter URLs**: Type URLs in the address bar and press Enter or click "Go"
4. **Navigate**: Use back/forward buttons to navigate through history
### Test Files
- `test_page.html` - A comprehensive test page demonstrating all supported features, including:
  - Text formatting (bold, italic, underline)
  - Headers of all levels (H1-H6)
  - Links (both internal and external)
  - Images (includes the sample image from tests/data/)
  - Line breaks and paragraphs
## Architecture
### HTML Parser (`HTMLParser` class)
- Simple regex-based HTML tokenizer
- Converts HTML elements to pyWebLayout abstract objects
- Handles font styling with a font stack for nested formatting
- Supports basic HTML tags: h1-h6, b, strong, i, em, u, a, img, br, p, div, span
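The regex tokenizer and font stack described above can be illustrated with a small standalone sketch. It does not use the browser's actual `HTMLParser` class; `FontState`, `STYLE_TAGS`, and `styled_runs` are hypothetical names chosen for illustration, and only the inline styling tags are handled.

```python
import re
from dataclasses import dataclass, replace

# Matches either a tag (optional "/", tag name, ignored attributes) or a run of text.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")


@dataclass(frozen=True)
class FontState:
    """Hypothetical stand-in for a font: just three style flags."""
    bold: bool = False
    italic: bool = False
    underline: bool = False


# Assumed mapping from tag name to the flag it switches on.
STYLE_TAGS = {
    "b": {"bold": True}, "strong": {"bold": True},
    "i": {"italic": True}, "em": {"italic": True},
    "u": {"underline": True},
}


def styled_runs(html):
    """Return (text, FontState) pairs for each non-blank text run in `html`."""
    stack = [FontState()]          # the bottom entry is the default font
    runs = []
    for closing, tag, text in TOKEN_RE.findall(html):
        if text:
            if text.strip():
                runs.append((text, stack[-1]))
            continue
        tag = tag.lower()
        if tag not in STYLE_TAGS:
            continue               # structural tags are ignored in this sketch
        if closing:
            if len(stack) > 1:     # never pop the default font
                stack.pop()
        else:
            # Nested formatting: copy the current font and add one flag.
            stack.append(replace(stack[-1], **STYLE_TAGS[tag]))
    return runs
```

Pushing a modified copy of the top entry on each opening tag is what lets `<b>bold <i>both</i></b>` produce a bold-italic run without any tag needing to know about its parents.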
### Browser Window (`BrowserWindow` class)
- Tkinter-based GUI with navigation controls
- Canvas-based rendering of pyWebLayout Page objects
- Mouse event handling for interactive elements
- Navigation history management
- File and URL loading capabilities
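Navigation history of the kind the Back and Forward buttons rely on is commonly modeled as a list plus a cursor. The sketch below is an assumption about how such a history could work, not the code in `BrowserWindow`; the `History` class and its method names are hypothetical.

```python
class History:
    """Minimal back/forward history: a list of locations and a cursor."""

    def __init__(self):
        self._entries = []
        self._index = -1           # -1 means nothing has been visited yet

    @property
    def current(self):
        return self._entries[self._index] if self._index >= 0 else None

    def visit(self, location):
        # Visiting a new location discards any forward entries.
        del self._entries[self._index + 1:]
        self._entries.append(location)
        self._index += 1

    def back(self):
        """Move one step back; return the new location or None at the start."""
        if self._index > 0:
            self._index -= 1
            return self._entries[self._index]
        return None

    def forward(self):
        """Move one step forward; return the new location or None at the end."""
        if self._index < len(self._entries) - 1:
            self._index += 1
            return self._entries[self._index]
        return None
```

Returning `None` at either end gives the GUI a simple signal for disabling the corresponding button.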
### pyWebLayout Integration
The browser uses these pyWebLayout components:
#### From `pyWebLayout/concrete/`:
- `Page` - Top-level container for web page content
- `Container` - Layout management for multiple elements
- `Box` - Basic rectangular container with positioning
- `Text` - Text rendering with font styling
- `RenderableImage` - Image loading and display with scaling
- `RenderableLink` - Interactive link elements
- `RenderableButton` - Interactive button elements
#### From `pyWebLayout/abstract/`:
- `Link` - Abstract link representation with types (internal, external, API, function)
- `Image` - Abstract image representation with dimensions and loading
- Font and styling classes for text appearance
#### From `pyWebLayout/style/`:
- `Font` - Font management with size, weight, style, and decoration
- `FontWeight`, `FontStyle`, `TextDecoration` - Typography enums
- `Alignment` - Layout positioning options
## Supported HTML Features
### Text Elements
- `<h1>` to `<h6>` - Headers with appropriate sizing
- `<p>` - Paragraphs with spacing
- `<b>`, `<strong>` - Bold text
- `<i>`, `<em>` - Italic text
- `<u>` - Underlined text
- `<br>` - Line breaks
### Interactive Elements
- `<a href="...">` - Links (opens external URLs in system browser)
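One way to route a clicked link, sketched under the assumption that "external" means an `http` or `https` URL: external targets go to the system browser via the standard-library `webbrowser` module, everything else is handed to a local loader. `is_external`, `follow_link`, and the `load_local` callback are hypothetical names, not part of `html_browser.py`.

```python
import webbrowser
from urllib.parse import urlparse


def is_external(href):
    """Treat http(s) URLs as external; relative paths and other schemes are local."""
    return urlparse(href).scheme in ("http", "https")


def follow_link(href, load_local, open_external=webbrowser.open):
    """Dispatch a clicked link to the system browser or to a local loader."""
    if is_external(href):
        open_external(href)
    else:
        load_local(href)
```

Passing `open_external` as a parameter keeps the dispatch logic testable without launching a real browser.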
### Media Elements
- `<img src="..." alt="..." width="..." height="...">` - Images with scaling
### Container Elements
- `<div>`, `<span>` - Generic containers (parsed but not specially styled)
## Example Usage
```python
# Start the browser
from html_browser import BrowserWindow
browser = BrowserWindow()
browser.run()
```
## Limitations
This is a demonstration browser with simplified HTML parsing:
- No CSS support (styling is done through pyWebLayout components)
- No JavaScript execution
- Limited HTML tag support
- No form submission (forms can be rendered but not submitted)
- No advanced layout features (flexbox, grid, etc.)
## Dependencies
- `tkinter` - GUI framework (usually included with Python)
- `PIL` (Pillow) - Image processing
- `requests` - HTTP requests for web URLs
- `pyWebLayout` - The core layout and rendering library
## Testing
Load `test_page.html` to see all supported features in action:
1. Run the browser: `python html_browser.py`
2. Click "Open File" and select `test_page.html`
3. Explore the different text formatting, links, and image rendering
The test page includes:
- Various header levels
- Text formatting examples
- Clickable links (try the Google link!)
- A sample image from the test data
- Mixed content demonstrations

@ -1,175 +0,0 @@
# EPUB Reader Documentation
## Overview
This project implements two major enhancements to pyWebLayout:
1. **Enhanced Page Class**: Moved HTML rendering logic from the browser into the `Page` class for better separation of concerns
2. **Tkinter EPUB Reader**: A complete EPUB reader application with pagination support
## Files Created/Modified
### 1. Enhanced Page Class (`pyWebLayout/concrete/page.py`)
**New Features Added:**
- `load_html_string()` - Load HTML content directly into a Page
- `load_html_file()` - Load HTML from a file
- Private conversion methods to transform abstract blocks to renderables
- Integration with existing HTML extraction system
**Key Methods:**
```python
page = Page(size=(800, 600))
page.load_html_string(html_content) # Load HTML string
page.load_html_file("file.html") # Load HTML file
image = page.render() # Render to PIL Image
```
**Benefits:**
- Reuses existing `html_extraction.py` infrastructure
- Converts abstract blocks to concrete renderables
- Supports headings, paragraphs, lists, images, etc.
- Proper error handling with fallback rendering
### 2. EPUB Reader Application (`epub_reader_tk.py`)
**Features:**
- Complete Tkinter-based GUI
- EPUB file loading using existing `epub_reader.py`
- Chapter navigation with dropdown selection
- Page-by-page display with navigation controls
- Adjustable font size (8-24pt)
- Keyboard shortcuts (arrow keys, Ctrl+O)
- Status bar with loading feedback
- Scrollable content display
**GUI Components:**
- File open dialog for EPUB selection
- Chapter dropdown and navigation buttons
- Page navigation controls
- Font size adjustment
- Canvas with scrollbars for content display
- Status bar for feedback
**Navigation:**
- **Left/Right arrows**: Previous/Next page
- **Up/Down arrows**: Previous/Next chapter
- **Ctrl+O**: Open file dialog
- **Mouse**: Dropdown chapter selection
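The keyboard shortcuts above map naturally onto Tkinter event patterns. The following is a sketch rather than the code in `epub_reader_tk.py`: `bind_shortcuts` and the action names are hypothetical, and the only Tkinter API relied on is `widget.bind(sequence, callback)`.

```python
def bind_shortcuts(widget, actions):
    """Bind the reader's shortcuts on a Tkinter widget (typically the root window).

    `actions` maps the assumed action names below to zero-argument callables.
    """
    bindings = {
        "<Left>": "previous_page",
        "<Right>": "next_page",
        "<Up>": "previous_chapter",
        "<Down>": "next_chapter",
        "<Control-o>": "open_file",
    }
    for sequence, name in bindings.items():
        handler = actions[name]
        # Tk passes an event object to the callback; the default argument
        # pins `handler` to this iteration's value.
        widget.bind(sequence, lambda event, h=handler: h())
    return bindings
```

In an application this would be called once after the window is built, for example `bind_shortcuts(root, {"next_page": app.next_page, ...})`, where `app` is whatever object owns the navigation methods.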
### 3. Test Suite (`test_enhanced_page.py`)
**Test Coverage:**
- HTML string loading and rendering
- HTML file loading and rendering
- EPUB reader app import and instantiation
- Error handling verification
## Technical Architecture
### HTML Processing Flow
```
HTML String/File → parse_html_string() → Abstract Blocks → Page._convert_block_to_renderable() → Concrete Renderables → Page.render() → PIL Image
```
### EPUB Reading Flow
```
EPUB File → read_epub() → Book → Chapters → Abstract Blocks → Page Conversion → Tkinter Display
```
## Usage Examples
### Basic HTML Page Rendering
```python
from pyWebLayout.concrete.page import Page
# Create and load HTML
page = Page(size=(800, 600))
page.load_html_string("""
<h1>Hello World</h1>
<p>This is a <strong>test</strong> paragraph.</p>
""")
# Render to image
image = page.render()
image.save("output.png")
```
### EPUB Reader Application
```python
# Run the EPUB reader from a shell:
#   python epub_reader_tk.py
# Or import and use programmatically
from epub_reader_tk import EPUBReaderApp
app = EPUBReaderApp()
app.run()
```
## Features Demonstrated
### HTML Parsing & Rendering
- ✅ Paragraphs with inline formatting (bold, italic)
- ✅ Headers (H1-H6) with proper sizing
- ✅ Lists (ordered and unordered)
- ✅ Images with alt text fallback
- ✅ Error handling for malformed content
### EPUB Processing
- ✅ Full EPUB metadata extraction
- ✅ Chapter-by-chapter navigation
- ✅ Table of contents integration
- ✅ Multi-format content support
### User Interface
- ✅ Intuitive navigation controls
- ✅ Responsive layout with scrolling
- ✅ Font size customization
- ✅ Keyboard shortcuts
- ✅ Status feedback
## Dependencies
The EPUB reader leverages existing pyWebLayout infrastructure:
- `pyWebLayout.io.readers.epub_reader` - EPUB parsing
- `pyWebLayout.io.readers.html_extraction` - HTML to abstract blocks
- `pyWebLayout.concrete.*` - Renderable objects
- `pyWebLayout.abstract.*` - Abstract document model
- `pyWebLayout.style.*` - Styling system
## Testing
Run the test suite to verify functionality:
```bash
python test_enhanced_page.py
```
Expected output:
- ✅ HTML String Loading: PASS
- ✅ HTML File Loading: PASS
- ✅ EPUB Reader Imports: PASS
## Future Enhancements
1. **Advanced Pagination**: Break long chapters across multiple pages
2. **Search Functionality**: Full-text search within books
3. **Bookmarks**: Save reading position
4. **Themes**: Dark/light mode support
5. **Export**: Save pages as images or PDFs
6. **Zoom**: Variable zoom levels for accessibility
## Integration with Existing Browser
The enhanced Page class can be used to improve the existing `html_browser.py`:
```python
from pyWebLayout.concrete.page import Page
# Before: complex parsing in the browser
# parser = HTMLParser()
# page = parser.parse_html_string(html_content)
# After: use the new Page class
page = Page()
page.load_html_string(html_content)
```
This provides better separation of concerns and reuses the robust HTML extraction system.

Binary file not shown. Size: 170 KiB

@ -1,12 +0,0 @@
{
  "demo_bookmark": {
    "chapter_index": 0,
    "block_index": 27,
    "word_index": 0,
    "table_row": 0,
    "table_col": 0,
    "list_item_index": 0,
    "remaining_pretext": null,
    "page_y_offset": 0
  }
}

@ -1,10 +0,0 @@
{
  "chapter_index": 0,
  "block_index": 54,
  "word_index": 0,
  "table_row": 0,
  "table_col": 0,
  "list_item_index": 0,
  "remaining_pretext": null,
  "page_y_offset": 0
}

@ -1,363 +0,0 @@
# EbookReader - Simple EPUB Reader Application
The `EbookReader` class provides a complete, user-friendly interface for building ebook reader applications with pyWebLayout. It wraps all the complex ereader infrastructure into a simple API.
## Features
- 📖 **EPUB Loading** - Load EPUB files with automatic content extraction
- ⬅️➡️ **Page Navigation** - Forward and backward page navigation
- 🔖 **Position Management** - Save/load reading positions (stable across font changes)
- 📑 **Chapter Navigation** - Jump to chapters by title or index
- 🔤 **Font Size Control** - Increase/decrease font size with live re-rendering
- 📏 **Spacing Control** - Adjust line and block spacing
- 📊 **Progress Tracking** - Get reading progress and position information
- 💾 **Context Manager Support** - Automatic cleanup with `with` statement
## Quick Start
```python
from pyWebLayout.layout.ereader_application import EbookReader
# Create reader
reader = EbookReader(page_size=(800, 1000))
# Load an EPUB
reader.load_epub("mybook.epub")
# Get current page as PIL Image
page_image = reader.get_current_page()
page_image.save("current_page.png")
# Navigate
reader.next_page()
reader.previous_page()
# Close reader
reader.close()
```
## API Reference
### Initialization
```python
reader = EbookReader(
    page_size=(800, 1000),              # Page dimensions (width, height) in pixels
    margin=40,                          # Page margin in pixels
    background_color=(255, 255, 255),   # RGB background color
    line_spacing=5,                     # Line spacing in pixels
    inter_block_spacing=15,             # Space between blocks in pixels
    bookmarks_dir="ereader_bookmarks",  # Directory for bookmarks
    buffer_size=5                       # Number of pages to cache
)
### Loading EPUB
```python
# Load EPUB file
success = reader.load_epub("path/to/book.epub")
# Check if book is loaded
if reader.is_loaded():
    print("Book loaded successfully")
# Get book information
book_info = reader.get_book_info()
# Returns: {
# 'title': 'Book Title',
# 'author': 'Author Name',
# 'document_id': 'book',
# 'total_blocks': 5000,
# 'total_chapters': 20,
# 'page_size': (800, 1000),
# 'font_scale': 1.0
# }
```
### Page Navigation
```python
# Get current page as PIL Image
page = reader.get_current_page()
# Navigate to next page
page = reader.next_page() # Returns None at end of book
# Navigate to previous page
page = reader.previous_page() # Returns None at beginning
# Save current page to file
reader.render_to_file("page.png")
```
### Position Management
Positions are saved based on abstract document structure (chapter/block/word indices), making them stable across font size and styling changes.
```python
# Save current position
reader.save_position("my_bookmark")
# Load saved position
page = reader.load_position("my_bookmark")
# List all saved positions
positions = reader.list_saved_positions()
# Returns: ['my_bookmark', 'chapter_2', ...]
# Delete a position
reader.delete_position("my_bookmark")
# Get detailed position info
info = reader.get_position_info()
# Returns: {
# 'position': {'chapter_index': 0, 'block_index': 42, 'word_index': 15, ...},
# 'chapter': {'title': 'Chapter 1', 'level': 'H1', ...},
# 'progress': 0.15, # 15% through the book
# 'font_scale': 1.0,
# 'book_title': 'Book Title',
# 'book_author': 'Author Name'
# }
# Get reading progress (0.0 to 1.0)
progress = reader.get_reading_progress()
print(f"You're {progress*100:.1f}% through the book")
```
### Chapter Navigation
```python
# Get all chapters
chapters = reader.get_chapters()
# Returns: [('Chapter 1', 0), ('Chapter 2', 1), ...]
# Get chapters with positions
chapter_positions = reader.get_chapter_positions()
# Returns: [('Chapter 1', RenderingPosition(...)), ...]
# Jump to chapter by index
page = reader.jump_to_chapter(1) # Jump to second chapter
# Jump to chapter by title
page = reader.jump_to_chapter("Chapter 1")
# Get current chapter info
chapter_info = reader.get_current_chapter_info()
# Returns: {'title': 'Chapter 1', 'level': HeadingLevel.H1, 'block_index': 0}
```
### Font Size Control
```python
# Get current font size scale
scale = reader.get_font_size() # Default: 1.0
# Set specific font size scale
page = reader.set_font_size(1.5) # 150% of normal size
# Increase font size by 10%
page = reader.increase_font_size()
# Decrease font size by 10%
page = reader.decrease_font_size()
```
### Spacing Control
```python
# Set line spacing (spacing between lines within a paragraph)
page = reader.set_line_spacing(10) # 10 pixels
# Set inter-block spacing (spacing between paragraphs, headings, etc.)
page = reader.set_inter_block_spacing(20) # 20 pixels
```
### Context Manager
The reader supports Python's context manager protocol for automatic cleanup:
```python
with EbookReader(page_size=(800, 1000)) as reader:
    reader.load_epub("book.epub")
    page = reader.get_current_page()
    # ... do stuff
# Automatically saves position and cleans up resources
```
## Complete Example
```python
from pyWebLayout.layout.ereader_application import EbookReader

# Create reader with custom settings
with EbookReader(
    page_size=(800, 1000),
    margin=50,
    line_spacing=8,
    inter_block_spacing=20
) as reader:
    # Load EPUB
    if not reader.load_epub("my_novel.epub"):
        print("Failed to load EPUB")
        raise SystemExit(1)

    # Get book info
    info = reader.get_book_info()
    print(f"Reading: {info['title']} by {info['author']}")
    print(f"Total chapters: {info['total_chapters']}")

    # Navigate through first few pages
    for i in range(5):
        page = reader.get_current_page()
        page.save(f"page_{i+1:03d}.png")
        reader.next_page()

    # Save current position
    reader.save_position("page_5")

    # Jump to a chapter
    chapters = reader.get_chapters()
    if len(chapters) > 2:
        print(f"Jumping to: {chapters[2][0]}")
        reader.jump_to_chapter(2)
        reader.render_to_file("chapter_3_start.png")

    # Return to saved position
    reader.load_position("page_5")

    # Adjust font size
    reader.increase_font_size()
    reader.render_to_file("page_5_larger_font.png")

    # Get progress
    progress = reader.get_reading_progress()
    print(f"Reading progress: {progress*100:.1f}%")
```
## Demo Script
Run the comprehensive demo to see all features in action:
```bash
python examples/ereader_demo.py path/to/book.epub
```
This will demonstrate:
- Basic page navigation
- Position save/load
- Chapter navigation
- Font size adjustments
- Spacing adjustments
- Book information retrieval
The demo generates multiple PNG files showing different pages and settings.
## Position Storage Format
Positions are stored as JSON files in the `bookmarks_dir` (default: `ereader_bookmarks/`):
```json
{
  "chapter_index": 0,
  "block_index": 42,
  "word_index": 15,
  "table_row": 0,
  "table_col": 0,
  "list_item_index": 0,
  "remaining_pretext": null,
  "page_y_offset": 0
}
```
This format is tied to the abstract document structure, making positions stable across:
- Font size changes
- Line spacing changes
- Inter-block spacing changes
- Page size changes
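Because the stored fields are plain indices into the document structure, a position can be round-tripped through JSON without any layout information. The sketch below mirrors the field names from the format above; the `Position` dataclass is a hypothetical stand-in for illustration, not the library's own position class.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Position:
    """Illustrative position record using the field names from the JSON format."""
    chapter_index: int = 0
    block_index: int = 0
    word_index: int = 0
    table_row: int = 0
    table_col: int = 0
    list_item_index: int = 0
    remaining_pretext: Optional[str] = None
    page_y_offset: int = 0

    def to_json(self) -> str:
        # None is serialized as JSON null, matching "remaining_pretext": null.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Position":
        return cls(**json.loads(text))
```

Nothing in the record refers to pixels, line counts, or page numbers, which is why a saved position would still point at the same word after the text is re-laid out with different settings.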
## Integration Example: Simple GUI
Here's a minimal example of integrating with Tkinter:
```python
import tkinter as tk
from tkinter import filedialog
from PIL import ImageTk
from pyWebLayout.layout.ereader_application import EbookReader


class SimpleEreaderGUI:
    def __init__(self, root):
        self.root = root
        self.reader = EbookReader(page_size=(600, 800))
        # Create UI
        self.image_label = tk.Label(root)
        self.image_label.pack()
        btn_frame = tk.Frame(root)
        btn_frame.pack()
        tk.Button(btn_frame, text="Open EPUB", command=self.open_epub).pack(side=tk.LEFT)
        tk.Button(btn_frame, text="Previous", command=self.prev_page).pack(side=tk.LEFT)
        tk.Button(btn_frame, text="Next", command=self.next_page).pack(side=tk.LEFT)
        tk.Button(btn_frame, text="Font+", command=self.increase_font).pack(side=tk.LEFT)
        tk.Button(btn_frame, text="Font-", command=self.decrease_font).pack(side=tk.LEFT)

    def open_epub(self):
        filepath = filedialog.askopenfilename(filetypes=[("EPUB files", "*.epub")])
        if filepath:
            self.reader.load_epub(filepath)
            self.display_page()

    def display_page(self):
        page = self.reader.get_current_page()
        if page:
            photo = ImageTk.PhotoImage(page)
            self.image_label.config(image=photo)
            # Keep a reference so the image is not garbage-collected
            self.image_label.image = photo

    def next_page(self):
        if self.reader.next_page():
            self.display_page()

    def prev_page(self):
        if self.reader.previous_page():
            self.display_page()

    def increase_font(self):
        self.reader.increase_font_size()
        self.display_page()

    def decrease_font(self):
        self.reader.decrease_font_size()
        self.display_page()


root = tk.Tk()
root.title("Simple Ereader")
app = SimpleEreaderGUI(root)
root.mainloop()
```
## Performance Notes
- The reader uses intelligent page caching for fast navigation
- First page load may take ~1 second; subsequent pages typically render in under 0.1 seconds
- Background rendering attempts to pre-cache upcoming pages (you may see pickle warnings, which can be ignored)
- Font size changes invalidate the cache and require re-rendering from the current position
- Position save/load is nearly instantaneous
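The caching behavior described above can be approximated with a small least-recently-used cache. This is a sketch under assumptions: the real reader's cache is not shown in this document, `PageCache` and its `render` callback are hypothetical, and `buffer_size` is borrowed from the constructor parameter of the same name.

```python
from collections import OrderedDict


class PageCache:
    """Keep at most `buffer_size` rendered pages, evicting the least recently used."""

    def __init__(self, buffer_size=5):
        self.buffer_size = buffer_size
        self._pages = OrderedDict()

    def get(self, key, render):
        """Return the cached page for `key`, rendering it with `render(key)` on a miss."""
        if key in self._pages:
            self._pages.move_to_end(key)      # mark as most recently used
            return self._pages[key]
        page = render(key)
        self._pages[key] = page
        if len(self._pages) > self.buffer_size:
            self._pages.popitem(last=False)   # drop the least recently used page
        return page

    def invalidate(self):
        """Drop everything, e.g. after a font size change makes cached pages stale."""
        self._pages.clear()
```

A cache like this makes paging back and forth cheap while bounding memory, and `invalidate()` corresponds to the re-rendering cost noted for font size changes.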
## Limitations
- Currently supports EPUB files only (no PDF, MOBI, etc.)
- Images in EPUBs may not render in some cases
- Tables are skipped in rendering
- Complex HTML layouts may not render perfectly
- No text selection or search functionality (these would need to be added separately)
## See Also
- `examples/ereader_demo.py` - Comprehensive feature demonstration
- `pyWebLayout/layout/ereader_manager.py` - Underlying manager class
- `pyWebLayout/layout/ereader_layout.py` - Core layout engine
- `examples/README_EPUB_RENDERERS.md` - Lower-level EPUB rendering

@ -1,8 +1,12 @
 from __future__ import annotations
 from pyWebLayout.core.base import Renderable, Queriable
+from pyWebLayout.core.query import QueryResult
 from .box import Box
 from pyWebLayout.style import Alignment, Font, FontStyle, FontWeight, TextDecoration
 from pyWebLayout.abstract import Word
+from pyWebLayout.abstract.inline import LinkedWord
+from pyWebLayout.abstract.functional import Link
+from .functional import LinkText, ButtonText
 from PIL import Image, ImageDraw, ImageFont
 from typing import Tuple, Union, List, Optional, Protocol
 import numpy as np
@ -383,10 +387,6 @@ class Line(Box):
 - success: True if word/part was added, False if it couldn't fit
 - overflow_text: Remaining text if word was hyphenated, None otherwise
 """
-# Import LinkedWord here to avoid circular imports
-from pyWebLayout.abstract.inline import LinkedWord
-from pyWebLayout.concrete.functional import LinkText
 # First, add any pretext from previous hyphenation
 if part is not None:
     self._text_objects.append(part)
@ -399,7 +399,6 @@ class Line(Box):
 # LinkText constructor needs: (link, text, font, draw, source, line)
 # But LinkedWord itself contains the link properties
 # We'll create a Link object from the LinkedWord properties
-from pyWebLayout.abstract.functional import Link
 link = Link(
     location=word.location,
     link_type=word.link_type,
@ -572,9 +571,6 @@ class Line(Box):
 Returns:
     QueryResult from the text object at that point, or None
 """
-from pyWebLayout.core.query import QueryResult
-from .functional import LinkText, ButtonText
 point_array = np.array(point)
 # Check each text object in this line