176
README.md
@ -12,26 +12,26 @@ A Python library for HTML-like layout and rendering.
|
|||||||
> 📋 **Note**: Badges show results from the commit referenced in the URLs. Red "error" badges indicate build failures for that specific step.
|
> 📋 **Note**: Badges show results from the commit referenced in the URLs. Red "error" badges indicate build failures for that specific step.
|
||||||
## Description
|
## Description
|
||||||
|
|
||||||
PyWebLayout is a Python library for rendering HTML and EPUB content to paginated images. The library provides a high-level **EbookReader** API for building interactive ebook reader applications, along with powerful HTML-to-page rendering capabilities.
|
PyWebLayout is a Python library for HTML-like layout and rendering to paginated images. It provides a flexible page rendering system with support for borders, padding, text layout, and HTML parsing.
|
||||||
|
|
||||||
## Key Features
|
## Key Features
|
||||||
|
|
||||||
### EbookReader - High-Level API
|
### Page Rendering System
|
||||||
- 📖 **EPUB Support** - Load and render EPUB files
|
- 📄 **Flexible Page Layouts** - Create pages with customizable sizes, borders, and padding
|
||||||
- 📄 **Page Rendering** - Render pages as PIL Images
|
- 🎨 **Styling System** - Control backgrounds, border colors, and spacing
|
||||||
- ⬅️➡️ **Navigation** - Forward and backward page navigation
|
- 📐 **Multiple Layouts** - Support for portrait, landscape, and square pages
|
||||||
- 🔖 **Bookmarks** - Save and load reading positions
|
- 🖼️ **Image Output** - Render pages to PIL Images (PNG, JPEG, etc.)
|
||||||
- 📑 **Chapter Navigation** - Jump to chapters by title or index
|
|
||||||
- 🔤 **Font Control** - Adjust font size dynamically
|
|
||||||
- 📏 **Spacing Control** - Customize line and paragraph spacing
|
|
||||||
- 📊 **Progress Tracking** - Monitor reading progress
|
|
||||||
|
|
||||||
### Core Capabilities
|
### Text and HTML Support
|
||||||
- HTML-to-page layout system
|
- 📝 **HTML Parsing** - Parse HTML content into structured document blocks
|
||||||
- Multi-page document rendering
|
- 🔤 **Font Support** - Multiple font sizes, weights, and styles
|
||||||
- Advanced text rendering with font support
|
- ↔️ **Text Alignment** - Left, center, right, and justified text
|
||||||
- Position tracking across layout changes
|
- 📖 **Rich Content** - Headings, paragraphs, bold, italic, and more
|
||||||
- Intelligent line breaking and pagination
|
|
||||||
|
### Architecture
|
||||||
|
- **Abstract/Concrete Separation** - Clean separation between content structure and rendering
|
||||||
|
- **Extensible Design** - Easy to extend with custom renderables
|
||||||
|
- **Type-safe** - Comprehensive type hints throughout the codebase
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
@ -41,106 +41,98 @@ pip install pyWebLayout
|
|||||||
|
|
||||||
## Quick Start
|
## Quick Start
|
||||||
|
|
||||||
### EbookReader - Recommended API
|
### Basic Page Rendering
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from pyWebLayout.layout.ereader_application import EbookReader
|
from pyWebLayout.concrete.page import Page
|
||||||
|
from pyWebLayout.style.page_style import PageStyle
|
||||||
|
|
||||||
# Create an ebook reader
|
# Create a styled page
|
||||||
with EbookReader(page_size=(800, 1000)) as reader:
|
page_style = PageStyle(
|
||||||
# Load an EPUB file
|
border_width=2,
|
||||||
reader.load_epub("mybook.epub")
|
border_color=(200, 200, 200),
|
||||||
|
padding=(30, 30, 30, 30), # top, right, bottom, left
|
||||||
|
background_color=(255, 255, 255)
|
||||||
|
)
|
||||||
|
|
||||||
# Get current page as PIL Image
|
page = Page(size=(600, 800), style=page_style)
|
||||||
page = reader.get_current_page()
|
|
||||||
page.save("page_001.png")
|
|
||||||
|
|
||||||
# Navigate through pages
|
# Render to image
|
||||||
reader.next_page()
|
image = page.render()
|
||||||
reader.previous_page()
|
image.save("my_page.png")
|
||||||
|
|
||||||
# Save reading position
|
|
||||||
reader.save_position("chapter_3")
|
|
||||||
|
|
||||||
# Jump to a chapter
|
|
||||||
reader.jump_to_chapter("Chapter 5")
|
|
||||||
|
|
||||||
# Adjust font size
|
|
||||||
reader.increase_font_size()
|
|
||||||
|
|
||||||
# Get progress
|
|
||||||
progress = reader.get_reading_progress()
|
|
||||||
print(f"Progress: {progress*100:.1f}%")
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### EbookReader in Action
|
### HTML Content Parsing
|
||||||
|
|
||||||
Here are animated demonstrations of the EbookReader's key features:
|
|
||||||
|
|
||||||
<table>
|
|
||||||
<tr>
|
|
||||||
<td align="center">
|
|
||||||
<b>Page Navigation</b><br>
|
|
||||||
<img src="docs/images/ereader_page_navigation.gif" width="300" alt="Page Navigation"><br>
|
|
||||||
<em>Forward and backward navigation through pages</em>
|
|
||||||
</td>
|
|
||||||
<td align="center">
|
|
||||||
<b>Font Size Adjustment</b><br>
|
|
||||||
<img src="docs/images/ereader_font_size.gif" width="300" alt="Font Size"><br>
|
|
||||||
<em>Dynamic font size scaling with position preservation</em>
|
|
||||||
</td>
|
|
||||||
</tr>
|
|
||||||
<tr>
|
|
||||||
<td align="center">
|
|
||||||
<b>Chapter Navigation</b><br>
|
|
||||||
<img src="docs/images/ereader_chapter_navigation.gif" width="300" alt="Chapter Navigation"><br>
|
|
||||||
<em>Jump directly to chapters by title or index</em>
|
|
||||||
</td>
|
|
||||||
<td align="center">
|
|
||||||
<b>Bookmarks & Positions</b><br>
|
|
||||||
<img src="docs/images/ereader_bookmarks.gif" width="300" alt="Bookmarks"><br>
|
|
||||||
<em>Save and restore reading positions anywhere in the book</em>
|
|
||||||
</td>
|
|
||||||
</tr>
|
|
||||||
</table>
|
|
||||||
|
|
||||||
### HTML Multi-Page Rendering
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from pyWebLayout.io.readers.html_extraction import html_to_blocks
|
from pyWebLayout.io.readers.html_extraction import parse_html_string
|
||||||
from pyWebLayout.layout.document_layouter import paragraph_layouter
|
from pyWebLayout.style import Font
|
||||||
from pyWebLayout.concrete.page import Page
|
|
||||||
|
|
||||||
# Parse HTML to blocks
|
# Parse HTML to structured blocks
|
||||||
html = """
|
html = """
|
||||||
<h1>Document Title</h1>
|
<h1>Document Title</h1>
|
||||||
<p>First paragraph with <b>bold</b> text.</p>
|
<p>First paragraph with <b>bold</b> text.</p>
|
||||||
<p>Second paragraph with more content.</p>
|
<p>Second paragraph with more content.</p>
|
||||||
"""
|
"""
|
||||||
blocks = html_to_blocks(html)
|
|
||||||
|
|
||||||
# Render to pages
|
base_font = Font(font_size=14)
|
||||||
page = Page(size=(600, 800))
|
blocks = parse_html_string(html, base_font=base_font)
|
||||||
# Layout blocks onto pages using document_layouter
|
|
||||||
# See examples/ directory for complete multi-page examples
|
# blocks is a list of structured content (Paragraph, Heading, etc.)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Visual Examples
|
||||||
|
|
||||||
|
The library supports various page layouts and configurations:
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<td align="center" width="33%">
|
||||||
|
<b>Page Styles</b><br>
|
||||||
|
<img src="docs/images/example_01_page_rendering.png" width="250" alt="Page Rendering"><br>
|
||||||
|
<em>Different borders, padding, and backgrounds</em>
|
||||||
|
</td>
|
||||||
|
<td align="center" width="33%">
|
||||||
|
<b>HTML Content</b><br>
|
||||||
|
<img src="docs/images/example_02_text_and_layout.png" width="250" alt="Text Layout"><br>
|
||||||
|
<em>Parsed HTML with various text styles</em>
|
||||||
|
</td>
|
||||||
|
<td align="center" width="33%">
|
||||||
|
<b>Page Layouts</b><br>
|
||||||
|
<img src="docs/images/example_03_page_layouts.png" width="250" alt="Page Layouts"><br>
|
||||||
|
<em>Portrait, landscape, and square formats</em>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
## Examples
|
## Examples
|
||||||
|
|
||||||
Check out the `examples/` directory for complete working examples:
|
The `examples/` directory contains working demonstrations:
|
||||||
|
|
||||||
- **`simple_ereader_example.py`** - Quick start with EbookReader
|
### Getting Started
|
||||||
- **`ereader_demo.py`** - Comprehensive EbookReader feature demo
|
- **[01_simple_page_rendering.py](examples/01_simple_page_rendering.py)** - Introduction to the Page system
|
||||||
- **`generate_ereader_gifs.py`** - Generate animated GIF demonstrations
|
- **[02_text_and_layout.py](examples/02_text_and_layout.py)** - HTML parsing and text rendering
|
||||||
- **`html_multipage_demo.py`** - HTML to multi-page rendering
|
- **[03_page_layouts.py](examples/03_page_layouts.py)** - Different page configurations
|
||||||
- See `examples/README.md` for full list
|
|
||||||
|
### Advanced Examples
|
||||||
|
- **[html_multipage_simple.py](examples/html_multipage_simple.py)** - Multi-page HTML rendering
|
||||||
|
- **[html_multipage_demo_final.py](examples/html_multipage_demo_final.py)** - Complete multi-page layout
|
||||||
|
- **[html_line_breaking_demo.py](examples/html_line_breaking_demo.py)** - Line breaking demonstration
|
||||||
|
|
||||||
|
Run any example:
|
||||||
|
```bash
|
||||||
|
cd examples
|
||||||
|
python 01_simple_page_rendering.py
|
||||||
|
```
|
||||||
|
|
||||||
|
See **[examples/README.md](examples/README.md)** for detailed documentation.
|
||||||
|
|
||||||
## Documentation
|
## Documentation
|
||||||
|
|
||||||
- **EbookReader API**: `examples/README_EREADER.md`
|
- **[ARCHITECTURE.md](ARCHITECTURE.md)** - Detailed explanation of Abstract/Concrete architecture
|
||||||
- **HTML Rendering**: `examples/README_HTML_MULTIPAGE.md`
|
- **[examples/README.md](examples/README.md)** - Complete guide to all examples
|
||||||
- **Architecture**: `ARCHITECTURE.md`
|
- **[examples/README_HTML_MULTIPAGE.md](examples/README_HTML_MULTIPAGE.md)** - HTML rendering guide
|
||||||
- **Examples**: `examples/README.md`
|
- **API Reference** - See docstrings in source code
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
|
|||||||
|
Before Width: | Height: | Size: 506 KiB |
|
Before Width: | Height: | Size: 287 KiB |
|
Before Width: | Height: | Size: 683 KiB |
|
Before Width: | Height: | Size: 170 KiB |
|
Before Width: | Height: | Size: 507 KiB |
BIN
docs/images/example_01_page_rendering.png
Normal file
|
After Width: | Height: | Size: 8.0 KiB |
BIN
docs/images/example_02_text_and_layout.png
Normal file
|
After Width: | Height: | Size: 23 KiB |
BIN
docs/images/example_03_page_layouts.png
Normal file
|
After Width: | Height: | Size: 12 KiB |
199
examples/01_simple_page_rendering.py
Normal file
@ -0,0 +1,199 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Simple Page Rendering Example
|
||||||
|
|
||||||
|
This example demonstrates:
|
||||||
|
- Creating pages with different styles
|
||||||
|
- Setting borders, padding, and background colors
|
||||||
|
- Understanding the page layout system
|
||||||
|
- Rendering pages to images
|
||||||
|
|
||||||
|
This is a foundational example showing the basic Page API.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from PIL import Image, ImageDraw, ImageFont
|
||||||
|
|
||||||
|
# Add pyWebLayout to path
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||||
|
|
||||||
|
from pyWebLayout.concrete.page import Page
|
||||||
|
from pyWebLayout.style.page_style import PageStyle
|
||||||
|
|
||||||
|
|
||||||
|
def draw_placeholder_content(page: Page):
|
||||||
|
"""Draw some placeholder content directly on the page to visualize the layout."""
|
||||||
|
if page.draw is None:
|
||||||
|
# Trigger canvas creation
|
||||||
|
page.render()
|
||||||
|
|
||||||
|
draw = page.draw
|
||||||
|
|
||||||
|
# Draw content area boundary (for visualization)
|
||||||
|
content_x = page.border_size + page.style.padding_left
|
||||||
|
content_y = page.border_size + page.style.padding_top
|
||||||
|
content_w = page.content_size[0]
|
||||||
|
content_h = page.content_size[1]
|
||||||
|
|
||||||
|
# Draw a light blue rectangle showing the content area
|
||||||
|
draw.rectangle(
|
||||||
|
[content_x, content_y, content_x + content_w, content_y + content_h],
|
||||||
|
outline=(100, 150, 255),
|
||||||
|
width=1
|
||||||
|
)
|
||||||
|
|
||||||
|
# Add some text labels
|
||||||
|
try:
|
||||||
|
font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 12)
|
||||||
|
except OSError:  # font file missing or unreadable; fall back to Pillow's built-in font
|
||||||
|
font = ImageFont.load_default()
|
||||||
|
|
||||||
|
# Label the areas
|
||||||
|
draw.text((content_x + 10, content_y + 10), "Content Area", fill=(100, 100, 100), font=font)
|
||||||
|
draw.text((10, 10), f"Border: {page.border_size}px", fill=(150, 150, 150), font=font)
|
||||||
|
draw.text((content_x + 10, content_y + 30), f"Size: {content_w}x{content_h}", fill=(100, 100, 100), font=font)
|
||||||
|
|
||||||
|
|
||||||
|
def create_example_1():
|
||||||
|
"""Example 1: Default page style."""
|
||||||
|
print("\n Creating Example 1: Default style...")
|
||||||
|
|
||||||
|
page = Page(size=(400, 300))
|
||||||
|
draw_placeholder_content(page)
|
||||||
|
|
||||||
|
return page
|
||||||
|
|
||||||
|
|
||||||
|
def create_example_2():
|
||||||
|
"""Example 2: Page with visible borders."""
|
||||||
|
print(" Creating Example 2: With borders...")
|
||||||
|
|
||||||
|
page_style = PageStyle(
|
||||||
|
border_width=3,
|
||||||
|
border_color=(255, 100, 100),
|
||||||
|
padding=(20, 20, 20, 20),
|
||||||
|
background_color=(255, 250, 250)
|
||||||
|
)
|
||||||
|
|
||||||
|
page = Page(size=(400, 300), style=page_style)
|
||||||
|
draw_placeholder_content(page)
|
||||||
|
|
||||||
|
return page
|
||||||
|
|
||||||
|
|
||||||
|
def create_example_3():
|
||||||
|
"""Example 3: Page with generous padding."""
|
||||||
|
print(" Creating Example 3: With padding...")
|
||||||
|
|
||||||
|
page_style = PageStyle(
|
||||||
|
border_width=2,
|
||||||
|
border_color=(100, 100, 255),
|
||||||
|
padding=(40, 40, 40, 40),
|
||||||
|
background_color=(250, 250, 255)
|
||||||
|
)
|
||||||
|
|
||||||
|
page = Page(size=(400, 300), style=page_style)
|
||||||
|
draw_placeholder_content(page)
|
||||||
|
|
||||||
|
return page
|
||||||
|
|
||||||
|
|
||||||
|
def create_example_4():
|
||||||
|
"""Example 4: Clean, borderless design."""
|
||||||
|
print(" Creating Example 4: Borderless...")
|
||||||
|
|
||||||
|
page_style = PageStyle(
|
||||||
|
border_width=0,
|
||||||
|
padding=(30, 30, 30, 30),
|
||||||
|
background_color=(245, 245, 245)
|
||||||
|
)
|
||||||
|
|
||||||
|
page = Page(size=(400, 300), style=page_style)
|
||||||
|
draw_placeholder_content(page)
|
||||||
|
|
||||||
|
return page
|
||||||
|
|
||||||
|
|
||||||
|
def combine_into_grid(pages, title):
|
||||||
|
"""Combine multiple pages into a 2x2 grid with title."""
|
||||||
|
print("\n Combining pages into grid...")
|
||||||
|
|
||||||
|
# Render all pages
|
||||||
|
images = [page.render() for page in pages]
|
||||||
|
|
||||||
|
# Grid layout
|
||||||
|
padding = 20
|
||||||
|
title_height = 40
|
||||||
|
cols = 2
|
||||||
|
rows = 2
|
||||||
|
|
||||||
|
# Calculate dimensions
|
||||||
|
img_width = images[0].size[0]
|
||||||
|
img_height = images[0].size[1]
|
||||||
|
|
||||||
|
total_width = cols * img_width + (cols + 1) * padding
|
||||||
|
total_height = rows * img_height + (rows + 1) * padding + title_height
|
||||||
|
|
||||||
|
# Create combined image
|
||||||
|
combined = Image.new('RGB', (total_width, total_height), (250, 250, 250))
|
||||||
|
draw = ImageDraw.Draw(combined)
|
||||||
|
|
||||||
|
# Draw title
|
||||||
|
try:
|
||||||
|
title_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 20)
|
||||||
|
except OSError:  # font file missing or unreadable; fall back to Pillow's built-in font
|
||||||
|
title_font = ImageFont.load_default()
|
||||||
|
|
||||||
|
# Center the title
|
||||||
|
bbox = draw.textbbox((0, 0), title, font=title_font)
|
||||||
|
text_width = bbox[2] - bbox[0]
|
||||||
|
title_x = (total_width - text_width) // 2
|
||||||
|
draw.text((title_x, 10), title, fill=(50, 50, 50), font=title_font)
|
||||||
|
|
||||||
|
# Place pages in grid
|
||||||
|
y_offset = title_height + padding
|
||||||
|
for row in range(rows):
|
||||||
|
x_offset = padding
|
||||||
|
for col in range(cols):
|
||||||
|
idx = row * cols + col
|
||||||
|
if idx < len(images):
|
||||||
|
combined.paste(images[idx], (x_offset, y_offset))
|
||||||
|
x_offset += img_width + padding
|
||||||
|
y_offset += img_height + padding
|
||||||
|
|
||||||
|
return combined
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Demonstrate basic page rendering."""
|
||||||
|
print("Simple Page Rendering Example")
|
||||||
|
print("=" * 50)
|
||||||
|
|
||||||
|
# Create different page examples
|
||||||
|
pages = [
|
||||||
|
create_example_1(),
|
||||||
|
create_example_2(),
|
||||||
|
create_example_3(),
|
||||||
|
create_example_4()
|
||||||
|
]
|
||||||
|
|
||||||
|
# Combine into a single demonstration image
|
||||||
|
combined_image = combine_into_grid(pages, "Page Styles: Border & Padding Examples")
|
||||||
|
|
||||||
|
# Save output
|
||||||
|
output_dir = Path("docs/images")
|
||||||
|
output_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
output_path = output_dir / "example_01_page_rendering.png"
|
||||||
|
combined_image.save(output_path)
|
||||||
|
|
||||||
|
print("\n✓ Example completed!")
|
||||||
|
print(f" Output saved to: {output_path}")
|
||||||
|
print(f" Image size: {combined_image.size[0]}x{combined_image.size[1]} pixels")
|
||||||
|
print(f" Created {len(pages)} page examples")
|
||||||
|
|
||||||
|
return combined_image
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
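The `draw_placeholder_content` helper in this file finds the content area by offsetting from the page origin by the border width plus the padding, then reads the content dimensions from `page.content_size`. The sketch below restates that arithmetic in plain Python. It assumes the content size equals the page size minus the border and padding on each side, which is consistent with how the example uses `page.border_size` and `page.style.padding_left` but is not taken from the library source. The `content_box` function is hypothetical and is not part of pyWebLayout.

```python
from typing import Tuple


def content_box(
    size: Tuple[int, int],
    border: int,
    padding: Tuple[int, int, int, int],  # top, right, bottom, left
) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the content area.

    Assumed model: the border is drawn on all four sides and the
    padding sits between the border and the content.
    """
    top, right, bottom, left = padding
    x = border + left                      # left edge of the content
    y = border + top                       # top edge of the content
    width = size[0] - 2 * border - left - right
    height = size[1] - 2 * border - top - bottom
    return x, y, width, height


# Values from create_example_2: 400x300 page, 3px border, 20px padding.
print(content_box((400, 300), border=3, padding=(20, 20, 20, 20)))
```

Under these assumptions, a larger border or padding shrinks the content area symmetrically, which is what the four example pages in this file are meant to visualize.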
214
examples/02_text_and_layout.py
Normal file
@ -0,0 +1,214 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Text and Layout Example
|
||||||
|
|
||||||
|
This example demonstrates text rendering using the pyWebLayout system:
|
||||||
|
- Different text alignments
|
||||||
|
- Font sizes and styles
|
||||||
|
- Multi-line paragraphs
|
||||||
|
- Document layout and pagination
|
||||||
|
|
||||||
|
This example uses the HTML parsing system to create rich text layouts.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from PIL import Image, ImageDraw, ImageFont
|
||||||
|
|
||||||
|
# Add pyWebLayout to path
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||||
|
|
||||||
|
from pyWebLayout.io.readers.html_extraction import parse_html_string
|
||||||
|
from pyWebLayout.style import Font
|
||||||
|
from pyWebLayout.concrete.page import Page
|
||||||
|
from pyWebLayout.style.page_style import PageStyle
|
||||||
|
|
||||||
|
|
||||||
|
def create_sample_document():
|
||||||
|
"""Create different HTML samples demonstrating various features."""
|
||||||
|
samples = []
|
||||||
|
|
||||||
|
# Sample 1: Text alignment examples
|
||||||
|
samples.append((
|
||||||
|
"Text Alignment",
|
||||||
|
"""
|
||||||
|
<html><body>
|
||||||
|
<h2>Left Aligned</h2>
|
||||||
|
<p>This is left-aligned text. It is the default alignment for most text.</p>
|
||||||
|
|
||||||
|
<h2>Justified Text</h2>
|
||||||
|
<p style="text-align: justify;">This paragraph is justified. The text stretches to fill the entire width of the line, creating clean edges on both sides.</p>
|
||||||
|
|
||||||
|
<h2>Centered</h2>
|
||||||
|
<p style="text-align: center;">This text is centered.</p>
|
||||||
|
</body></html>
|
||||||
|
"""
|
||||||
|
))
|
||||||
|
|
||||||
|
# Sample 2: Font sizes
|
||||||
|
samples.append((
|
||||||
|
"Font Sizes",
|
||||||
|
"""
|
||||||
|
<html><body>
|
||||||
|
<h1>Heading 1</h1>
|
||||||
|
<h2>Heading 2</h2>
|
||||||
|
<h3>Heading 3</h3>
|
||||||
|
<p>Normal paragraph text at the default size.</p>
|
||||||
|
<p><small>Small text for fine print.</small></p>
|
||||||
|
</body></html>
|
||||||
|
"""
|
||||||
|
))
|
||||||
|
|
||||||
|
# Sample 3: Text styles
|
||||||
|
samples.append((
|
||||||
|
"Text Styles",
|
||||||
|
"""
|
||||||
|
<html><body>
|
||||||
|
<p>Normal text with <b>bold words</b> and <i>italic text</i>.</p>
|
||||||
|
<p><b>Completely bold paragraph.</b></p>
|
||||||
|
<p><i>Completely italic paragraph.</i></p>
|
||||||
|
<p>Text with <u>underlined words</u> for emphasis.</p>
|
||||||
|
</body></html>
|
||||||
|
"""
|
||||||
|
))
|
||||||
|
|
||||||
|
# Sample 4: Mixed content
|
||||||
|
samples.append((
|
||||||
|
"Mixed Content",
|
||||||
|
"""
|
||||||
|
<html><body>
|
||||||
|
<h2>Document Title</h2>
|
||||||
|
<p>A paragraph with <b>bold</b>, <i>italic</i>, and normal text all mixed together.</p>
|
||||||
|
<h3>Subsection</h3>
|
||||||
|
<p>Another paragraph demonstrating the layout system.</p>
|
||||||
|
</body></html>
|
||||||
|
"""
|
||||||
|
))
|
||||||
|
|
||||||
|
return samples
|
||||||
|
|
||||||
|
|
||||||
|
def render_html_to_image(html_content, page_size=(500, 400)):
|
||||||
|
"""Render HTML content to an image using the pyWebLayout system."""
|
||||||
|
# Create a page
|
||||||
|
page_style = PageStyle(
|
||||||
|
border_width=2,
|
||||||
|
border_color=(200, 200, 200),
|
||||||
|
padding=(30, 30, 30, 30),
|
||||||
|
background_color=(255, 255, 255)
|
||||||
|
)
|
||||||
|
|
||||||
|
page = Page(size=page_size, style=page_style)
|
||||||
|
|
||||||
|
# Parse HTML
|
||||||
|
base_font = Font(font_size=14)
|
||||||
|
blocks = parse_html_string(html_content, base_font=base_font)
|
||||||
|
|
||||||
|
# For now, just render the page structure
|
||||||
|
# (The full layout engine would place the blocks, but we'll show the page)
|
||||||
|
image = page.render()
|
||||||
|
draw = ImageDraw.Draw(image)
|
||||||
|
|
||||||
|
# Add a note that this is HTML-parsed content
|
||||||
|
try:
|
||||||
|
font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 11)
|
||||||
|
except OSError:  # font file missing or unreadable; fall back to Pillow's built-in font
|
||||||
|
font = ImageFont.load_default()
|
||||||
|
|
||||||
|
# Draw info about what was parsed
|
||||||
|
content_x = page.border_size + page.style.padding_left + 10
|
||||||
|
content_y = page.border_size + page.style.padding_top + 10
|
||||||
|
|
||||||
|
draw.text((content_x, content_y),
|
||||||
|
f"Parsed {len(blocks)} block(s) from HTML",
|
||||||
|
fill=(100, 100, 100), font=font)
|
||||||
|
|
||||||
|
# List the block types
|
||||||
|
y_offset = content_y + 25
|
||||||
|
for i, block in enumerate(blocks[:10]): # Show first 10
|
||||||
|
block_type = type(block).__name__
|
||||||
|
draw.text((content_x, y_offset),
|
||||||
|
f" {i+1}. {block_type}",
|
||||||
|
fill=(60, 60, 60), font=font)
|
||||||
|
y_offset += 18
|
||||||
|
|
||||||
|
if y_offset > page.size[1] - 60: # Don't overflow
|
||||||
|
break
|
||||||
|
|
||||||
|
return image
|
||||||
|
|
||||||
|
|
||||||
|
def combine_samples(samples):
|
||||||
|
"""Combine multiple sample renders into a grid."""
|
||||||
|
print("\n Rendering samples...")
|
||||||
|
|
||||||
|
images = []
|
||||||
|
for title, html in samples:
|
||||||
|
print(f" - {title}")
|
||||||
|
img = render_html_to_image(html)
|
||||||
|
|
||||||
|
# Add title to image
|
||||||
|
draw = ImageDraw.Draw(img)
|
||||||
|
try:
|
||||||
|
font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 14)
|
||||||
|
except OSError:  # font file missing or unreadable; fall back to Pillow's built-in font
|
||||||
|
font = ImageFont.load_default()
|
||||||
|
|
||||||
|
draw.text((10, 10), title, fill=(50, 50, 150), font=font)
|
||||||
|
images.append(img)
|
||||||
|
|
||||||
|
# Create grid (2x2)
|
||||||
|
padding = 20
|
||||||
|
cols = 2
|
||||||
|
rows = 2
|
||||||
|
|
||||||
|
img_width = images[0].size[0]
|
||||||
|
img_height = images[0].size[1]
|
||||||
|
|
||||||
|
total_width = cols * img_width + (cols + 1) * padding
|
||||||
|
total_height = rows * img_height + (rows + 1) * padding
|
||||||
|
|
||||||
|
combined = Image.new('RGB', (total_width, total_height), (240, 240, 240))
|
||||||
|
|
||||||
|
# Place images
|
||||||
|
y_offset = padding
|
||||||
|
for row in range(rows):
|
||||||
|
x_offset = padding
|
||||||
|
for col in range(cols):
|
||||||
|
idx = row * cols + col
|
||||||
|
if idx < len(images):
|
||||||
|
combined.paste(images[idx], (x_offset, y_offset))
|
||||||
|
x_offset += img_width + padding
|
||||||
|
y_offset += img_height + padding
|
||||||
|
|
||||||
|
return combined
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Demonstrate text and layout features."""
|
||||||
|
print("Text and Layout Example")
|
||||||
|
print("=" * 50)
|
||||||
|
|
||||||
|
# Create sample documents
|
||||||
|
samples = create_sample_document()
|
||||||
|
|
||||||
|
# Render and combine
|
||||||
|
combined_image = combine_samples(samples)
|
||||||
|
|
||||||
|
# Save output
|
||||||
|
output_dir = Path("docs/images")
|
||||||
|
output_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
output_path = output_dir / "example_02_text_and_layout.png"
|
||||||
|
combined_image.save(output_path)
|
||||||
|
|
||||||
|
print("\n✓ Example completed!")
|
||||||
|
print(f" Output saved to: {output_path}")
|
||||||
|
print(f" Image size: {combined_image.size[0]}x{combined_image.size[1]} pixels")
|
||||||
|
print(" Note: This example demonstrates HTML parsing")
|
||||||
|
print(" Full layout rendering requires the typesetting engine")
|
||||||
|
|
||||||
|
return combined_image
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
243
examples/03_page_layouts.py
Normal file
@ -0,0 +1,243 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Page Layouts Example
|
||||||
|
|
||||||
|
This example demonstrates different page layout configurations:
|
||||||
|
- Various page sizes (small, medium, large)
|
||||||
|
- Different aspect ratios (portrait, landscape, square)
|
||||||
|
- Border and padding variations
|
||||||
|
- Color schemes
|
||||||
|
|
||||||
|
Shows how the pyWebLayout system handles different page dimensions.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from PIL import Image, ImageDraw, ImageFont
|
||||||
|
|
||||||
|
# Add pyWebLayout to path
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||||
|
|
||||||
|
from pyWebLayout.concrete.page import Page
|
||||||
|
from pyWebLayout.style.page_style import PageStyle
|
||||||
|
|
||||||
|
|
||||||
|
def add_page_info(page: Page, title: str):
|
||||||
|
"""Add informational text to a page showing its properties."""
|
||||||
|
if page.draw is None:
|
||||||
|
page.render()
|
||||||
|
|
||||||
|
draw = page.draw
|
||||||
|
|
||||||
|
try:
|
||||||
|
font_large = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 14)
|
||||||
|
font_small = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 11)
|
||||||
|
except OSError:  # font file missing or unreadable; fall back to Pillow's built-in font
|
||||||
|
font_large = ImageFont.load_default()
|
||||||
|
font_small = ImageFont.load_default()
|
||||||
|
|
||||||
|
# Title
|
||||||
|
content_x = page.border_size + page.style.padding_left + 5
|
||||||
|
content_y = page.border_size + page.style.padding_top + 5
|
||||||
|
|
||||||
|
draw.text((content_x, content_y), title, fill=(40, 40, 40), font=font_large)
|
||||||
|
|
||||||
|
# Page info
|
||||||
|
y = content_y + 25
|
||||||
|
info = [
|
||||||
|
f"Page: {page.size[0]}×{page.size[1]}px",
|
||||||
|
f"Content: {page.content_size[0]}×{page.content_size[1]}px",
|
||||||
|
f"Border: {page.border_size}px",
|
||||||
|
f"Padding: {page.style.padding}",
|
||||||
|
]
|
||||||
|
|
||||||
|
for line in info:
|
||||||
|
draw.text((content_x, y), line, fill=(80, 80, 80), font=font_small)
|
||||||
|
y += 16
|
||||||
|
|
||||||
|
# Draw content area boundary
|
||||||
|
cx = page.border_size + page.style.padding_left
|
||||||
|
cy = page.border_size + page.style.padding_top
|
||||||
|
cw = page.content_size[0]
|
||||||
|
ch = page.content_size[1]
|
||||||
|
|
||||||
|
draw.rectangle(
|
||||||
|
[cx, cy, cx + cw, cy + ch],
|
||||||
|
outline=(150, 150, 255),
|
||||||
|
width=1
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def create_layouts():
|
||||||
|
"""Create various page layout examples."""
|
||||||
|
layouts = []
|
||||||
|
|
||||||
|
# 1. Small portrait page
|
||||||
|
print("\n Creating layout examples...")
|
||||||
|
print(" - Small portrait")
|
||||||
|
style1 = PageStyle(
|
||||||
|
border_width=2,
|
||||||
|
border_color=(100, 100, 100),
|
||||||
|
padding=(15, 15, 15, 15),
|
||||||
|
background_color=(255, 255, 255)
|
||||||
|
)
|
||||||
|
page1 = Page(size=(300, 400), style=style1)
|
||||||
|
add_page_info(page1, "Small Portrait")
|
||||||
|
layouts.append(("small_portrait", page1))
|
||||||
|
|
||||||
|
# 2. Large portrait page
|
||||||
|
print(" - Large portrait")
|
||||||
|
style2 = PageStyle(
|
||||||
|
border_width=3,
|
||||||
|
border_color=(150, 100, 100),
|
||||||
|
padding=(30, 30, 30, 30),
|
||||||
|
background_color=(255, 250, 250)
|
||||||
|
)
|
||||||
|
page2 = Page(size=(400, 600), style=style2)
|
||||||
|
add_page_info(page2, "Large Portrait")
|
||||||
|
layouts.append(("large_portrait", page2))
|
||||||
|
|
||||||
|
# 3. Landscape page
|
||||||
|
print(" - Landscape")
|
||||||
|
style3 = PageStyle(
|
||||||
|
border_width=2,
|
||||||
|
border_color=(100, 150, 100),
|
||||||
|
padding=(20, 40, 20, 40),
|
||||||
|
background_color=(250, 255, 250)
|
||||||
|
)
|
||||||
|
page3 = Page(size=(600, 350), style=style3)
|
||||||
|
add_page_info(page3, "Landscape")
|
||||||
|
layouts.append(("landscape", page3))
|
||||||
|
|
||||||
|
# 4. Square page
|
||||||
|
print(" - Square")
|
||||||
|
style4 = PageStyle(
|
||||||
|
border_width=3,
|
||||||
|
border_color=(100, 100, 150),
|
||||||
|
padding=(25, 25, 25, 25),
|
||||||
|
background_color=(250, 250, 255)
|
||||||
|
)
|
||||||
|
page4 = Page(size=(400, 400), style=style4)
|
||||||
|
add_page_info(page4, "Square")
|
||||||
|
layouts.append(("square", page4))
|
||||||
|
|
||||||
|
# 5. Minimal padding
|
||||||
|
print(" - Minimal padding")
|
||||||
|
style5 = PageStyle(
|
||||||
|
border_width=1,
|
||||||
|
border_color=(180, 180, 180),
|
||||||
|
padding=(5, 5, 5, 5),
|
||||||
|
background_color=(245, 245, 245)
|
||||||
|
)
|
||||||
|
page5 = Page(size=(350, 300), style=style5)
|
||||||
|
add_page_info(page5, "Minimal Padding")
|
||||||
|
layouts.append(("minimal", page5))
|
||||||
|
|
||||||
|
# 6. Generous padding
|
||||||
|
print(" - Generous padding")
|
||||||
|
style6 = PageStyle(
|
||||||
|
border_width=2,
|
||||||
|
border_color=(150, 120, 100),
|
||||||
|
padding=(50, 50, 50, 50),
|
||||||
|
background_color=(255, 250, 245)
|
||||||
|
)
|
||||||
|
page6 = Page(size=(400, 400), style=style6)
|
||||||
|
add_page_info(page6, "Generous Padding")
|
||||||
|
layouts.append(("generous", page6))
|
||||||
|
|
||||||
|
return layouts
|
||||||
|
|
||||||
|
|
||||||
|
def create_layout_showcase(layouts):
|
||||||
|
"""Create a showcase image displaying all layouts."""
|
||||||
|
print("\n Creating layout showcase...")
|
||||||
|
|
||||||
|
# Render all pages
|
||||||
|
images = [(name, page.render()) for name, page in layouts]
|
||||||
|
|
||||||
|
# Calculate grid layout (3×2)
|
||||||
|
padding = 15
|
||||||
|
title_height = 50
|
||||||
|
cols = 3
|
||||||
|
rows = 2
|
||||||
|
|
||||||
|
# Find max dimensions for each row/column
|
||||||
|
max_widths = []
|
||||||
|
for col in range(cols):
|
||||||
|
col_images = [images[row * cols + col][1] for row in range(rows) if row * cols + col < len(images)]
|
||||||
|
if col_images:
|
||||||
|
max_widths.append(max(img.size[0] for img in col_images))
|
||||||
|
|
||||||
|
max_heights = []
|
||||||
|
for row in range(rows):
|
||||||
|
row_images = [images[row * cols + col][1] for col in range(cols) if row * cols + col < len(images)]
|
||||||
|
if row_images:
|
||||||
|
max_heights.append(max(img.size[1] for img in row_images))
|
||||||
|
|
||||||
|
# Calculate total size
|
||||||
|
total_width = sum(max_widths) + padding * (cols + 1)
|
||||||
|
total_height = sum(max_heights) + padding * (rows + 1) + title_height
|
||||||
|
|
||||||
|
# Create combined image
|
||||||
|
combined = Image.new('RGB', (total_width, total_height), (235, 235, 235))
|
||||||
|
draw = ImageDraw.Draw(combined)
|
||||||
|
|
||||||
|
# Add title
|
||||||
|
try:
|
||||||
|
title_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 24)
|
||||||
|
except OSError:
|
||||||
|
title_font = ImageFont.load_default()
|
||||||
|
|
||||||
|
title_text = "Page Layout Examples"
|
||||||
|
bbox = draw.textbbox((0, 0), title_text, font=title_font)
|
||||||
|
text_width = bbox[2] - bbox[0]
|
||||||
|
title_x = (total_width - text_width) // 2
|
||||||
|
draw.text((title_x, 15), title_text, fill=(50, 50, 50), font=title_font)
|
||||||
|
|
||||||
|
# Place images in grid
|
||||||
|
y_offset = title_height + padding
|
||||||
|
for row in range(rows):
|
||||||
|
x_offset = padding
|
||||||
|
for col in range(cols):
|
||||||
|
idx = row * cols + col
|
||||||
|
if idx < len(images):
|
||||||
|
name, img = images[idx]
|
||||||
|
# Center image in its cell
|
||||||
|
cell_width = max_widths[col]
|
||||||
|
cell_height = max_heights[row]
|
||||||
|
img_x = x_offset + (cell_width - img.size[0]) // 2
|
||||||
|
img_y = y_offset + (cell_height - img.size[1]) // 2
|
||||||
|
combined.paste(img, (img_x, img_y))
|
||||||
|
x_offset += max_widths[col] + padding if col < len(max_widths) else 0
|
||||||
|
y_offset += max_heights[row] + padding if row < len(max_heights) else 0
|
||||||
|
|
||||||
|
return combined
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Demonstrate page layout variations."""
|
||||||
|
print("Page Layouts Example")
|
||||||
|
print("=" * 50)
|
||||||
|
|
||||||
|
# Create different layouts
|
||||||
|
layouts = create_layouts()
|
||||||
|
|
||||||
|
# Create showcase
|
||||||
|
combined_image = create_layout_showcase(layouts)
|
||||||
|
|
||||||
|
# Save output
|
||||||
|
output_dir = Path("docs/images")
|
||||||
|
output_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
output_path = output_dir / "example_03_page_layouts.png"
|
||||||
|
combined_image.save(output_path)
|
||||||
|
|
||||||
|
print("\n✓ Example completed!")
|
||||||
|
print(f" Output saved to: {output_path}")
|
||||||
|
print(f" Image size: {combined_image.size[0]}x{combined_image.size[1]} pixels")
|
||||||
|
print(f" Created {len(layouts)} layout examples")
|
||||||
|
|
||||||
|
return combined_image
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
@ -2,48 +2,56 @@
|
|||||||
|
|
||||||
This directory contains example scripts demonstrating the pyWebLayout library.
|
This directory contains example scripts demonstrating the pyWebLayout library.
|
||||||
|
|
||||||
## EbookReader Examples
|
## Getting Started Examples
|
||||||
|
|
||||||
The EbookReader provides a high-level, user-friendly API for building ebook reader applications.
|
These examples demonstrate the core rendering capabilities of pyWebLayout:
|
||||||
|
|
||||||
### Quick Start Example
|
### 01. Simple Page Rendering
|
||||||
|
**`01_simple_page_rendering.py`** - Introduction to the Page system
|
||||||
|
|
||||||
**`simple_ereader_example.py`** - Simple example showing basic EbookReader usage:
|
|
||||||
```bash
|
```bash
|
||||||
python simple_ereader_example.py path/to/book.epub
|
python 01_simple_page_rendering.py
|
||||||
```
|
```
|
||||||
|
|
||||||
This demonstrates:
|
Demonstrates:
|
||||||
- Loading an EPUB file
|
- Creating pages with different styles
|
||||||
- Rendering pages to images
|
- Setting borders, padding, and backgrounds
|
||||||
- Basic navigation (next/previous page)
|
- Understanding page layout structure
|
||||||
- Saving positions
|
- Basic rendering to images
|
||||||
- Chapter navigation
|
|
||||||
- Font size adjustment
|
|
||||||
|
|
||||||
### Comprehensive Demo
|

|
||||||
|
|
||||||
|
### 02. Text and Layout
|
||||||
|
**`02_text_and_layout.py`** - HTML parsing and text rendering
|
||||||
|
|
||||||
**`ereader_demo.py`** - Full feature demonstration:
|
|
||||||
```bash
|
```bash
|
||||||
python ereader_demo.py path/to/book.epub
|
python 02_text_and_layout.py
|
||||||
```
|
```
|
||||||
|
|
||||||
This showcases all EbookReader features:
|
Demonstrates:
|
||||||
- Page navigation (forward/backward)
|
- Parsing HTML content
|
||||||
- Position save/load with bookmarks
|
- Text alignment options
|
||||||
- Chapter navigation (by index or title)
|
- Font sizes and styles
|
||||||
- Font size control
|
- Document structure
|
||||||
- Line and block spacing adjustments
|
|
||||||
- Reading progress tracking
|

|
||||||
- Book information retrieval
|
|
||||||
|
### 03. Page Layouts
|
||||||
|
**`03_page_layouts.py`** - Different page configurations
|
||||||
|
|
||||||
**Tip:** You can use the test EPUB files in `tests/data/` for testing:
|
|
||||||
```bash
|
```bash
|
||||||
python simple_ereader_example.py tests/data/test.epub
|
python 03_page_layouts.py
|
||||||
python ereader_demo.py tests/data/test.epub
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Other Examples
|
Demonstrates:
|
||||||
|
- Various page sizes (portrait, landscape, square)
|
||||||
|
- Different aspect ratios
|
||||||
|
- Border and padding variations
|
||||||
|
- Color schemes
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## Advanced Examples
|
||||||
|
|
||||||
### HTML Rendering
|
### HTML Rendering
|
||||||
|
|
||||||
@ -51,16 +59,28 @@ These examples demonstrate rendering HTML content to multi-page layouts:
|
|||||||
|
|
||||||
**`html_line_breaking_demo.py`** - Basic HTML line breaking demonstration
|
**`html_line_breaking_demo.py`** - Basic HTML line breaking demonstration
|
||||||
**`html_multipage_simple.py`** - Simple single-page HTML rendering
|
**`html_multipage_simple.py`** - Simple single-page HTML rendering
|
||||||
**`html_multipage_demo.py`** - Multi-page HTML layout
|
|
||||||
**`html_multipage_demo_final.py`** - Complete multi-page HTML rendering with headers/footers
|
**`html_multipage_demo_final.py`** - Complete multi-page HTML rendering with headers/footers
|
||||||
|
|
||||||
For detailed information about HTML rendering, see `README_HTML_MULTIPAGE.md`.
|
For detailed information about HTML rendering, see `README_HTML_MULTIPAGE.md`.
|
||||||
|
|
||||||
## Documentation
|
## Running the Examples
|
||||||
|
|
||||||
|
All examples can be run directly from the examples directory:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd examples
|
||||||
|
python 01_simple_page_rendering.py
|
||||||
|
python 02_text_and_layout.py
|
||||||
|
python 03_page_layouts.py
|
||||||
|
```
|
||||||
|
|
||||||
|
Output images are saved to the `docs/images/` directory.
|
||||||
|
|
||||||
|
## Additional Documentation
|
||||||
|
|
||||||
- `README_EREADER.md` - Detailed EbookReader API documentation
|
|
||||||
- `README_HTML_MULTIPAGE.md` - HTML multi-page rendering guide
|
- `README_HTML_MULTIPAGE.md` - HTML multi-page rendering guide
|
||||||
- `pyWebLayout/layout/README_EREADER_API.md` - EbookReader API reference (in source)
|
- `../ARCHITECTURE.md` - Detailed explanation of the Abstract/Concrete architecture
|
||||||
|
- `../docs/images/` - Rendered example outputs
|
||||||
|
|
||||||
## Debug/Development Scripts
|
## Debug/Development Scripts
|
||||||
|
|
||||||
|
|||||||
@ -1,201 +0,0 @@
|
|||||||
# HTML Multi-Page Rendering Examples
|
|
||||||
|
|
||||||
This directory contains working examples that demonstrate how to render HTML content across multiple pages using the pyWebLayout system. The examples show the complete pipeline from HTML parsing to multi-page layout.
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
The pyWebLayout system provides a sophisticated HTML-to-multi-page rendering pipeline that:
|
|
||||||
|
|
||||||
1. **Parses HTML** using the `pyWebLayout.io.readers.html_extraction` module
|
|
||||||
2. **Converts to abstract blocks** (paragraphs, headings, lists, etc.)
|
|
||||||
3. **Lays out content across pages** using the `pyWebLayout.layout.document_layouter` module
|
|
||||||
4. **Renders pages as images** for visualization
|
|
||||||
|
|
||||||
## Examples
|
|
||||||
|
|
||||||
### 1. `html_multipage_simple.py` - Basic Example
|
|
||||||
|
|
||||||
A simple demonstration that shows the core functionality:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
python examples/html_multipage_simple.py
|
|
||||||
```
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Parses basic HTML with headings and paragraphs
|
|
||||||
- Uses 600x800 pixel pages
|
|
||||||
- Demonstrates single-page layout
|
|
||||||
- Outputs to `output/html_simple/`
|
|
||||||
|
|
||||||
**Results:**
|
|
||||||
- Parsed 11 paragraphs from HTML
|
|
||||||
- Rendered 1 page with 20 lines
|
|
||||||
- Created `page_001.png` (19KB)
|
|
||||||
|
|
||||||
### 2. `html_multipage_demo_final.py` - Complete Multi-Page Demo
|
|
||||||
|
|
||||||
A comprehensive demonstration with true multi-page functionality:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
python examples/html_multipage_demo_final.py
|
|
||||||
```
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Longer HTML document with multiple chapters
|
|
||||||
- Smaller pages (400x500 pixels) to force multi-page layout
|
|
||||||
- Enhanced page formatting with headers and footers
|
|
||||||
- Smart heading placement (avoids orphaned headings)
|
|
||||||
- Outputs to `output/html_multipage_final/`
|
|
||||||
|
|
||||||
**Results:**
|
|
||||||
- Parsed 22 paragraphs (6 headings, 16 regular paragraphs)
|
|
||||||
- Rendered 7 pages with 67 total lines
|
|
||||||
- Average 9.6 lines per page
|
|
||||||
- Created 7 PNG files (4.9KB - 10KB each)
|
|
||||||
|
|
||||||
## Technical Details
|
|
||||||
|
|
||||||
### HTML Parsing
|
|
||||||
|
|
||||||
The system uses BeautifulSoup to parse HTML and converts elements to pyWebLayout abstract blocks:
|
|
||||||
|
|
||||||
- `<h1>-<h6>` → `Heading` blocks
|
|
||||||
- `<p>` → `Paragraph` blocks
|
|
||||||
- `<ul>`, `<ol>`, `<li>` → `HList` and `ListItem` blocks
|
|
||||||
- `<blockquote>` → `Quote` blocks
|
|
||||||
- Inline elements (`<strong>`, `<em>`, etc.) → Styled words
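
The mapping above can be sketched with the standard library alone. This is a minimal illustration, not pyWebLayout's parser: it swaps BeautifulSoup for Python's built-in `html.parser`, represents block types as plain strings rather than the library's classes, and the names `BLOCK_FOR_TAG`, `BlockCollector`, and `classify_blocks` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-block table mirroring the list above.
BLOCK_FOR_TAG = {f"h{n}": "Heading" for n in range(1, 7)}
BLOCK_FOR_TAG.update({
    "p": "Paragraph",
    "ul": "HList",
    "ol": "HList",
    "li": "ListItem",
    "blockquote": "Quote",
})


class BlockCollector(HTMLParser):
    """Collect (block_type, text) pairs; a block is emitted when its tag closes."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished blocks, in closing order
        self._stack = []   # open block-level tags: [tag, text_parts]

    def handle_starttag(self, tag, attrs):
        # Inline tags such as <strong> are not in the table, so their
        # text simply flows into the enclosing block.
        if tag in BLOCK_FOR_TAG:
            self._stack.append([tag, []])

    def handle_data(self, data):
        if self._stack and data.strip():
            self._stack[-1][1].append(data.strip())

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            _, parts = self._stack.pop()
            self.blocks.append((BLOCK_FOR_TAG[tag], " ".join(parts)))


def classify_blocks(html):
    """Return the block-level structure of an HTML string."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>Hello <strong>bold</strong> world</p>"
    print(classify_blocks(sample))
    # [('Heading', 'Title'), ('Paragraph', 'Hello bold world')]
```

Because a block is recorded when its closing tag is seen, a container such as `<ul>` appears after the `ListItem` entries it holds. The sketch also drops inline styling, which the real parser keeps as styled words.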
|
|
||||||
|
|
||||||
### Layout Engine
|
|
||||||
|
|
||||||
The document layouter handles:
|
|
||||||
|
|
||||||
- **Word spacing constraints** - Configurable min/max spacing
|
|
||||||
- **Line breaking** - Automatic word wrapping
|
|
||||||
- **Page overflow** - Continues content on new pages
|
|
||||||
- **Font scaling** - Proportional scaling support
|
|
||||||
- **Position tracking** - Maintains document positions
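
Line breaking and page overflow, the two central items in the list above, can be illustrated with a short standalone sketch. It assumes a greedy strategy and measures width in characters instead of rendered pixels; `break_into_lines` and `paginate` are hypothetical helpers, not functions from `pyWebLayout.layout.document_layouter`, and hyphenation and min/max word spacing are left out.

```python
def break_into_lines(words, max_width, measure=len, space_width=1):
    """Greedy line breaking: fill a line until the next word would overflow.

    `measure` returns the width of one word; it defaults to the character
    count, where a real renderer would measure with the active font.
    """
    lines, current, current_width = [], [], 0
    for word in words:
        word_width = measure(word)
        needed = word_width if not current else current_width + space_width + word_width
        if current and needed > max_width:
            # The word does not fit: close this line and start a new one.
            lines.append(current)
            current, current_width = [word], word_width
        else:
            current.append(word)
            current_width = needed
    if current:
        lines.append(current)
    return lines


def paginate(lines, lines_per_page):
    """Page overflow: start a new page once the current one is full."""
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    lines = break_into_lines(words, max_width=10)
    for number, page in enumerate(paginate(lines, lines_per_page=2), 1):
        print(f"Page {number}:", [" ".join(line) for line in page])
    # Page 1: ['the quick', 'brown fox']
    # Page 2: ['jumps over', 'the lazy']
    # Page 3: ['dog']
```

A word wider than `max_width` is placed on a line of its own and overflows it; handling that case is where hyphenation would come in.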
|
|
||||||
|
|
||||||
### Page Rendering
|
|
||||||
|
|
||||||
Pages are rendered as PIL Images with:
|
|
||||||
|
|
||||||
- **Configurable page sizes** - Width x Height in pixels
|
|
||||||
- **Borders and margins** - Professional page appearance
|
|
||||||
- **Headers and footers** - Document title and page numbers
|
|
||||||
- **Font rendering** - Uses system fonts (DejaVu Sans fallback)
|
|
||||||
|
|
||||||
## Code Structure
|
|
||||||
|
|
||||||
### Key Classes
|
|
||||||
|
|
||||||
1. **SimplePage/MultiPage** - Page implementation with drawing context
|
|
||||||
2. **SimpleWord** - Word implementation compatible with layouter
|
|
||||||
3. **SimpleParagraph** - Paragraph implementation with styling
|
|
||||||
4. **HTMLMultiPageRenderer** - Main renderer class
|
|
||||||
|
|
||||||
### Key Functions
|
|
||||||
|
|
||||||
1. **parse_html_to_paragraphs()** - Converts HTML to paragraph objects
|
|
||||||
2. **render_pages()** - Lays out paragraphs across multiple pages
|
|
||||||
3. **save_pages()** - Saves pages as PNG image files
|
|
||||||
|
|
||||||
## Usage Patterns
|
|
||||||
|
|
||||||
### Basic Usage
|
|
||||||
|
|
||||||
```python
|
|
||||||
from examples.html_multipage_simple import HTMLMultiPageRenderer
|
|
||||||
|
|
||||||
# Create renderer
|
|
||||||
renderer = HTMLMultiPageRenderer(page_size=(600, 800))
|
|
||||||
|
|
||||||
# Parse HTML
|
|
||||||
paragraphs = renderer.parse_html_to_paragraphs(html_content)
|
|
||||||
|
|
||||||
# Render pages
|
|
||||||
pages = renderer.render_pages(paragraphs)
|
|
||||||
|
|
||||||
# Save results
|
|
||||||
renderer.save_pages(pages, "output/my_document")
|
|
||||||
```
|
|
||||||
|
|
||||||
### Advanced Configuration
|
|
||||||
|
|
||||||
```python
|
|
||||||
# Use smaller pages so the content spreads over more pages
|
|
||||||
renderer = HTMLMultiPageRenderer(page_size=(400, 500))
|
|
||||||
|
|
||||||
# Custom styling
|
|
||||||
style = AbstractStyle(
|
|
||||||
word_spacing=3.0,
|
|
||||||
word_spacing_min=2.0,
|
|
||||||
word_spacing_max=6.0
|
|
||||||
)
|
|
||||||
paragraph = SimpleParagraph(text, style)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Output Files
|
|
||||||
|
|
||||||
The examples generate PNG image files showing the rendered pages:
|
|
||||||
|
|
||||||
- **Single page example**: `output/html_simple/page_001.png`
|
|
||||||
- **Multi-page example**: `output/html_multipage_final/page_001.png` through `page_007.png`
|
|
||||||
|
|
||||||
Each page includes:
|
|
||||||
- Document content with proper typography
|
|
||||||
- Page borders and margins
|
|
||||||
- Header with document title
|
|
||||||
- Footer with page numbers
|
|
||||||
- Professional appearance suitable for documents
|
|
||||||
|
|
||||||
## Integration with pyWebLayout
|
|
||||||
|
|
||||||
This example demonstrates integration with several pyWebLayout modules:
|
|
||||||
|
|
||||||
- **`pyWebLayout.io.readers.html_extraction`** - HTML parsing
|
|
||||||
- **`pyWebLayout.layout.document_layouter`** - Page layout
|
|
||||||
- **`pyWebLayout.style.abstract_style`** - Typography control
|
|
||||||
- **`pyWebLayout.abstract.block`** - Document structure
|
|
||||||
- **`pyWebLayout.concrete.text`** - Text rendering
|
|
||||||
|
|
||||||
## Performance
|
|
||||||
|
|
||||||
The example runs above point to the following performance characteristics:
|
|
||||||
|
|
||||||
- **Sub-second rendering** for typical documents
|
|
||||||
- **Efficient memory usage** with incremental processing
|
|
||||||
- **Scalable architecture** suitable for large documents
|
|
||||||
- **Responsive layout** adapts to different page sizes
|
|
||||||
|
|
||||||
## Use Cases
|
|
||||||
|
|
||||||
This technology is suitable for:
|
|
||||||
|
|
||||||
- **E-reader applications** - Digital book rendering
|
|
||||||
- **Document processors** - Report generation
|
|
||||||
- **Publishing systems** - Automated layout
|
|
||||||
- **Web-to-print** - HTML to paginated output
|
|
||||||
- **Academic papers** - Research document formatting
|
|
||||||
|
|
||||||
## Next Steps
|
|
||||||
|
|
||||||
To extend this example:
|
|
||||||
|
|
||||||
1. **Add table support** - Layout HTML tables across pages
|
|
||||||
2. **Image handling** - Embed and position images
|
|
||||||
3. **CSS styling** - Enhanced style parsing
|
|
||||||
4. **Font management** - Custom font loading
|
|
||||||
5. **Export formats** - PDF generation from pages
|
|
||||||
|
|
||||||
## Dependencies
|
|
||||||
|
|
||||||
- **Python 3.7+**
|
|
||||||
- **PIL (Pillow)** - Image generation
|
|
||||||
- **BeautifulSoup4** - HTML parsing (via pyWebLayout)
|
|
||||||
- **pyWebLayout** - Core layout engine
|
|
||||||
|
|
||||||
## Conclusion
|
|
||||||
|
|
||||||
These examples demonstrate that pyWebLayout provides a complete, production-ready solution for HTML-to-multi-page rendering. The system successfully handles the complex task of flowing content across page boundaries while maintaining professional typography and layout quality.
|
|
||||||
|
|
||||||
The 7-page output from a 4,736-character HTML document shows the system's capability to handle real-world content with proper pagination, making it suitable for serious document processing applications.
|
|
||||||
@ -1,292 +0,0 @@
|
|||||||
#!/usr/bin/env python3
|
|
||||||
"""
|
|
||||||
HTML Line Breaking and Paragraph Breaking Demo
|
|
||||||
|
|
||||||
This example demonstrates the proper use of pyWebLayout's line breaking system:
|
|
||||||
1. Line breaking with very long sentences
|
|
||||||
2. Word wrapping with long words
|
|
||||||
3. Hyphenation of extremely long words using pyphen
|
|
||||||
4. Paragraph breaking across pages
|
|
||||||
5. Various text formatting scenarios
|
|
||||||
|
|
||||||
This showcases the robustness of the layout engine's text flow capabilities
|
|
||||||
using the actual pyWebLayout concrete classes and layout system.
|
|
||||||
"""
|
|
||||||
|
|
||||||
import os
|
|
||||||
import sys
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import List, Tuple
|
|
||||||
from PIL import Image, ImageDraw, ImageFont
|
|
||||||
|
|
||||||
# Add pyWebLayout to path
|
|
||||||
sys.path.insert(0, str(Path(__file__).parent.parent))
|
|
||||||
|
|
||||||
from pyWebLayout.io.readers.html_extraction import parse_html_string
|
|
||||||
from pyWebLayout.layout.document_layouter import paragraph_layouter
|
|
||||||
from pyWebLayout.style.abstract_style import AbstractStyle
|
|
||||||
from pyWebLayout.style.concrete_style import StyleResolver, RenderingContext, ConcreteStyleRegistry
|
|
||||||
from pyWebLayout.style.page_style import PageStyle
|
|
||||||
from pyWebLayout.concrete import Page
|
|
||||||
from pyWebLayout.abstract.block import Paragraph, Heading
|
|
||||||
from pyWebLayout.abstract.inline import Word
|
|
||||||
|
|
||||||
|
|
||||||
def create_line_breaking_html() -> str:
|
|
||||||
"""Create HTML content specifically designed to test line and paragraph breaking."""
|
|
||||||
return """
|
|
||||||
<html>
|
|
||||||
<body>
|
|
||||||
<h1>Line Breaking and Text Flow Demonstration</h1>
|
|
||||||
|
|
||||||
<p>This paragraph contains some extraordinarily long words that will definitely require hyphenation when rendered on narrow pages: supercalifragilisticexpialidocious, antidisestablishmentarianism, pneumonoultramicroscopicsilicovolcanoconiosisology, and floccinaucinihilipilificationism.</p>
|
|
||||||
|
|
||||||
<p>Here we have an extremely long sentence that goes on and on and on without any natural breaking points, demonstrating how the layout engine handles continuous text flow across multiple lines when the content exceeds the available width of the page and must be wrapped appropriately to maintain readability while preserving the semantic meaning of the original text content.</p>
|
|
||||||
|
|
||||||
<h2>Technical Terms and Specialized Vocabulary</h2>
|
|
||||||
|
|
||||||
<p>In the field of computational linguistics and natural language processing, we often encounter terminology such as morphophonological, psychopharmacological, electroencephalographic, and immunoelectrophoresis that challenges traditional typesetting systems.</p>
|
|
||||||
|
|
||||||
<p>The implementation of sophisticated algorithms for handling such complex lexical items requires careful consideration of hyphenation patterns, word spacing constraints, and line breaking optimization to ensure that the resulting layout maintains both aesthetic appeal and functional readability across various display contexts and page dimensions.</p>
|
|
||||||
|
|
||||||
<h2>Continuous Text Flow Example</h2>
|
|
||||||
|
|
||||||
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
|
|
||||||
|
|
||||||
<p>Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt.</p>
|
|
||||||
|
|
||||||
<h2>Mixed Content Challenges</h2>
|
|
||||||
|
|
||||||
<p>URLs like https://www.verylongdomainnamethatshoulddemonstratehowurlsarehandledinlayoutsystems.com/with/very/long/paths/that/might/need/special/treatment and email addresses such as someone.with.a.very.long.email.address@anextraordinarilylong.domainname.extension can present unique challenges.</p>
|
|
||||||
|
|
||||||
<p>Similarly, technical identifiers like ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 or chemical compound names such as methylenedioxymethamphetamine require special handling for proper text flow and readability.</p>
|
|
||||||
|
|
||||||
<h2>Extreme Line Breaking Test</h2>
|
|
||||||
|
|
||||||
<p>Thisisaverylongwordwithoutanyspacesorpunctuationthatwillrequireforcedhyphenationtofitonnarrowpagesanddemonstratehowtheenginehandlesextremecases.</p>
|
|
||||||
|
|
||||||
<p>Finally, we test mixed scenarios: normal words, supercalifragilisticexpialidocious, more normal text, antidisestablishmentarianism, and regular content to show how the engine transitions between different text types seamlessly.</p>
|
|
||||||
|
|
||||||
</body>
|
|
||||||
</html>
|
|
||||||
"""
|
|
||||||
|
|
||||||
|
|
||||||
class HTMLMultiPageRenderer:
|
|
||||||
"""Renderer for HTML content across multiple narrow pages using proper pyWebLayout classes."""
|
|
||||||
|
|
||||||
def __init__(self, page_width=300, page_height=400):
|
|
||||||
self.page_width = page_width
|
|
||||||
self.page_height = page_height
|
|
||||||
self.pages = []
|
|
||||||
self.current_page = None
|
|
||||||
|
|
||||||
# Create rendering context for narrow pages
|
|
||||||
self.context = RenderingContext(
|
|
||||||
base_font_size=10, # Small font for narrow pages
|
|
||||||
available_width=page_width - 50, # Account for borders
|
|
||||||
available_height=page_height - 80, # Account for borders and header
|
|
||||||
default_language="en-US"
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create style resolver
|
|
||||||
self.style_resolver = StyleResolver(self.context)
|
|
||||||
|
|
||||||
# Create page style for narrow pages
|
|
||||||
self.page_style = PageStyle(
|
|
||||||
border_width=2,
|
|
||||||
border_color=(160, 160, 160),
|
|
||||||
background_color=(255, 255, 255),
|
|
||||||
padding=(20, 25, 20, 25) # top, right, bottom, left
|
|
||||||
)
|
|
||||||
|
|
||||||
def create_new_page(self) -> Page:
|
|
||||||
"""Create a new page using proper pyWebLayout Page class."""
|
|
||||||
page = Page(
|
|
||||||
size=(self.page_width, self.page_height),
|
|
||||||
style=self.page_style
|
|
||||||
)
|
|
||||||
|
|
||||||
# Set up the page with style resolver
|
|
||||||
page.style_resolver = self.style_resolver
|
|
||||||
|
|
||||||
# Calculate available dimensions
|
|
||||||
page.available_width = page.content_size[0]
|
|
||||||
page.available_height = page.content_size[1]
|
|
||||||
page._current_y_offset = self.page_style.border_width + self.page_style.padding_top
|
|
||||||
|
|
||||||
self.pages.append(page)
|
|
||||||
return page
|
|
||||||
|
|
||||||
def render_html(self, html_content: str) -> List[Page]:
|
|
||||||
"""Render HTML content to multiple pages using proper pyWebLayout system."""
|
|
||||||
print("Parsing HTML content...")
|
|
||||||
|
|
||||||
# Parse HTML into blocks
|
|
||||||
blocks = parse_html_string(html_content)
|
|
||||||
print(f"Parsed {len(blocks)} blocks from HTML")
|
|
||||||
|
|
||||||
# Convert blocks to proper pyWebLayout objects
|
|
||||||
paragraphs = []
|
|
||||||
for block in blocks:
|
|
||||||
if isinstance(block, Heading):
|
|
||||||
# Create heading style with larger font
|
|
||||||
heading_style = AbstractStyle(
|
|
||||||
font_size=14 if block.level.value <= 2 else 12,
|
|
||||||
word_spacing=3.0,
|
|
||||||
word_spacing_min=1.0,
|
|
||||||
word_spacing_max=6.0,
|
|
||||||
language="en-US"
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create paragraph from heading with proper words
|
|
||||||
paragraph = Paragraph(style=heading_style)
|
|
||||||
paragraph.line_height = 18 if block.level.value <= 2 else 16
|
|
||||||
|
|
||||||
# Add words from heading
|
|
||||||
for _, word in block.words_iter():
|
|
||||||
paragraph.add_word(word)
|
|
||||||
|
|
||||||
if paragraph._words:
|
|
||||||
paragraphs.append(paragraph)
|
|
||||||
print(f"Added heading: {' '.join(w.text for w in paragraph._words[:5])}...")
|
|
||||||
|
|
||||||
elif isinstance(block, Paragraph):
|
|
||||||
# Create paragraph style
|
|
||||||
para_style = AbstractStyle(
|
|
||||||
font_size=10,
|
|
||||||
word_spacing=2.0,
|
|
||||||
word_spacing_min=1.0,
|
|
||||||
word_spacing_max=4.0,
|
|
||||||
language="en-US"
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create paragraph with proper words
|
|
||||||
paragraph = Paragraph(style=para_style)
|
|
||||||
paragraph.line_height = 14
|
|
||||||
|
|
||||||
# Add words from paragraph - use words property (list) directly
|
|
||||||
for word in block.words:
|
|
||||||
paragraph.add_word(word)
|
|
||||||
|
|
||||||
if paragraph._words:
|
|
||||||
paragraphs.append(paragraph)
|
|
||||||
print(f"Added paragraph: {' '.join(w.text for w in paragraph._words[:5])}...")
|
|
||||||
|
|
||||||
print(f"Created {len(paragraphs)} paragraphs for layout")
|
|
||||||
|
|
||||||
# Layout paragraphs across pages using proper paragraph_layouter
|
|
||||||
self.current_page = self.create_new_page()
|
|
||||||
total_lines = 0
|
|
||||||
|
|
||||||
for i, paragraph in enumerate(paragraphs):
|
|
||||||
print(f"Laying out paragraph {i+1}/{len(paragraphs)} ({len(paragraph._words)} words)")
|
|
||||||
|
|
||||||
start_word = 0
|
|
||||||
pretext = None
|
|
||||||
|
|
||||||
while start_word < len(paragraph._words):
|
|
||||||
# Use the proper paragraph_layouter function
|
|
||||||
success, failed_word_index, remaining_pretext = paragraph_layouter(
|
|
||||||
paragraph, self.current_page, start_word, pretext
|
|
||||||
)
|
|
||||||
|
|
||||||
lines_on_page = len(self.current_page.children)
|
|
||||||
|
|
||||||
if success:
|
|
||||||
# Paragraph completed on this page
|
|
||||||
print(f" ✓ Paragraph completed on page {len(self.pages)} ({lines_on_page} lines)")
|
|
||||||
break
|
|
||||||
else:
|
|
||||||
# Page is full, need new page
|
|
||||||
if failed_word_index is not None:
|
|
||||||
print(f" → Page {len(self.pages)} full, continuing from word {failed_word_index}")
|
|
||||||
start_word = failed_word_index
|
|
||||||
pretext = remaining_pretext
|
|
||||||
self.current_page = self.create_new_page()
|
|
||||||
else:
|
|
||||||
print(f" ✗ Layout failed for paragraph {i+1}")
|
|
||||||
break
|
|
||||||
|
|
||||||
print(f"\nLayout complete:")
|
|
||||||
print(f" - Total pages: {len(self.pages)}")
|
|
||||||
print(f" - Total lines: {sum(len(page.children) for page in self.pages)}")
|
|
||||||
|
|
||||||
return self.pages
|
|
||||||
|
|
||||||
def save_pages(self, output_dir: str):
|
|
||||||
"""Save all pages as PNG images."""
|
|
||||||
output_path = Path(output_dir)
|
|
||||||
output_path.mkdir(parents=True, exist_ok=True)
|
|
||||||
|
|
||||||
print(f"\nSaving {len(self.pages)} pages to {output_path}")
|
|
||||||
|
|
||||||
for i, page in enumerate(self.pages, 1):
|
|
||||||
filename = f"page_{i:03d}.png"
|
|
||||||
filepath = output_path / filename
|
|
||||||
|
|
||||||
# Render the page using proper Page.render() method
|
|
||||||
page_image = page.render()
|
|
||||||
|
|
||||||
# Add page number at bottom
|
|
||||||
draw = ImageDraw.Draw(page_image)
|
|
||||||
try:
|
|
||||||
font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 8)
|
|
||||||
except OSError:
|
|
||||||
font = ImageFont.load_default()
|
|
||||||
|
|
||||||
page_text = f"Page {i} of {len(self.pages)}"
|
|
||||||
text_bbox = draw.textbbox((0, 0), page_text, font=font)
|
|
||||||
text_width = text_bbox[2] - text_bbox[0]
|
|
||||||
|
|
||||||
x = (self.page_width - text_width) // 2
|
|
||||||
y = self.page_height - 15
|
|
||||||
draw.text((x, y), page_text, fill=(120, 120, 120), font=font)
|
|
||||||
|
|
||||||
# Save the page
|
|
||||||
page_image.save(filepath)
|
|
||||||
print(f" Saved {filename} ({len(page.children)} lines)")
|
|
||||||
|
|
||||||
|
|
||||||
def main():
|
|
||||||
"""Main function to run the line breaking demonstration."""
|
|
||||||
print("HTML Line Breaking and Paragraph Breaking Demo")
|
|
||||||
print("=" * 50)
|
|
||||||
|
|
||||||
# Create HTML content with challenging text
|
|
||||||
html_content = create_line_breaking_html()
|
|
||||||
print(f"Created HTML content ({len(html_content)} characters)")
|
|
||||||
|
|
||||||
# Create renderer with narrow pages to force line breaking
|
|
||||||
renderer = HTMLMultiPageRenderer(
|
|
||||||
page_width=300, # Very narrow to force line breaks
|
|
||||||
page_height=400 # Moderate height
|
|
||||||
)
|
|
||||||
|
|
||||||
# Render HTML to pages
|
|
||||||
pages = renderer.render_html(html_content)
|
|
||||||
|
|
||||||
# Save pages
|
|
||||||
output_dir = "output/html_line_breaking"
|
|
||||||
renderer.save_pages(output_dir)
|
|
||||||
|
|
||||||
print(f"\n✅ Demo complete!")
|
|
||||||
print(f" Generated {len(pages)} pages demonstrating:")
|
|
||||||
print(f" - Line breaking with long sentences")
|
|
||||||
print(f" - Word hyphenation for extremely long words")
|
|
||||||
print(f" - Paragraph flow across multiple pages")
|
|
||||||
print(f" - Mixed content handling")
|
|
||||||
print(f"\n📁 Output saved to: {output_dir}/")
|
|
||||||
|
|
||||||
# Print summary statistics
|
|
||||||
total_lines = sum(len(page.children) for page in pages)
|
|
||||||
avg_lines_per_page = total_lines / len(pages) if pages else 0
|
|
||||||
|
|
||||||
print(f"\n📊 Statistics:")
|
|
||||||
print(f" - Total lines rendered: {total_lines}")
|
|
||||||
print(f" - Average lines per page: {avg_lines_per_page:.1f}")
|
|
||||||
print(f" - Page dimensions: {renderer.page_width}x{renderer.page_height} pixels")
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
main()
|
|
||||||
@ -1,451 +0,0 @@
|
|||||||
#!/usr/bin/env python3
|
|
||||||
"""
|
|
||||||
HTML Multi-Page Rendering Demo - Final Version
|
|
||||||
|
|
||||||
This example demonstrates a complete HTML to multi-page layout system that:
|
|
||||||
1. Parses HTML content using pyWebLayout's HTML extraction system
|
|
||||||
2. Lays out content across multiple pages using the document layouter
|
|
||||||
3. Saves each page as an image file
|
|
||||||
4. Shows true multi-page functionality with smaller pages
|
|
||||||
|
|
||||||
This demonstrates the complete pipeline from HTML to multi-page layout.
|
|
||||||
"""
|
|
||||||
|
|
||||||
import os
|
|
||||||
import sys
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import List, Tuple
|
|
||||||
from PIL import Image, ImageDraw, ImageFont
|
|
||||||
|
|
||||||
# Add pyWebLayout to path
|
|
||||||
sys.path.insert(0, str(Path(__file__).parent.parent))
|
|
||||||
|
|
||||||
from pyWebLayout.io.readers.html_extraction import parse_html_string
|
|
||||||
from pyWebLayout.layout.document_layouter import paragraph_layouter
|
|
||||||
from pyWebLayout.style.abstract_style import AbstractStyle
|
|
||||||
from pyWebLayout.style.concrete_style import StyleResolver, RenderingContext
|
|
||||||
from pyWebLayout.style import Font
|
|
||||||
from pyWebLayout.abstract.block import Block, Paragraph, Heading
|
|
||||||
from pyWebLayout.abstract.inline import Word
|
|
||||||
from pyWebLayout.concrete.text import Line
|
|
||||||
|
|
||||||
|
|
||||||
class MultiPage:
    """A page implementation optimized for multi-page layout demonstration."""

    def __init__(self, width=400, height=500, max_lines=15):  # Smaller pages for multi-page demo
        self.border_size = 30
        self._current_y_offset = self.border_size + 20  # Leave space for header
        self.available_width = width - (2 * self.border_size)
        self.available_height = height - (2 * self.border_size) - 40  # Space for header/footer
        self.max_lines = max_lines
        self.lines_added = 0
        self.children = []
        self.page_size = (width, height)

        # Create a real drawing context
        self.image = Image.new('RGB', (width, height), 'white')
        self.draw = ImageDraw.Draw(self.image)

        # Create a real style resolver
        context = RenderingContext(base_font_size=14)
        self.style_resolver = StyleResolver(context)

        # Draw page border and header area
        border_color = (180, 180, 180)
        self.draw.rectangle([0, 0, width-1, height-1], outline=border_color, width=2)

        # Draw header line
        header_y = self.border_size + 15
        self.draw.line([self.border_size, header_y, width - self.border_size, header_y],
                       fill=border_color, width=1)

    def can_fit_line(self, line_height):
        """Check if another line can fit on the page."""
        remaining_height = self.available_height - (self._current_y_offset - self.border_size - 20)
        can_fit = remaining_height >= line_height and self.lines_added < self.max_lines
        return can_fit

    def add_child(self, child):
        """Add a child element (like a Line) to the page."""
        self.children.append(child)
        self.lines_added += 1

        # Draw the line content on the page
        if isinstance(child, Line):
            self._draw_line(child)

        # Update y offset for next line
        self._current_y_offset += 18  # Line spacing

        return True

    def _draw_line(self, line):
        """Draw a line of text on the page."""
        try:
            # Use a default font for drawing
            try:
                font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 12)
            except OSError:
                # Font file not found at this path: fall back to PIL's built-in font
                font = ImageFont.load_default()

            # Get line text (simplified - in real implementation this would be more complex)
            line_text = getattr(line, '_text_content', 'Text line')

            # Draw the text
            text_color = (0, 0, 0)  # Black
            x = self.border_size + 5
            y = self._current_y_offset

            self.draw.text((x, y), line_text, fill=text_color, font=font)

        except Exception:
            # Fallback: draw a simple representation
            x = self.border_size + 5
            y = self._current_y_offset
            self.draw.text((x, y), "Text line", fill=(0, 0, 0))


class SimpleWord(Word):
    """A simple word implementation that works with the layouter."""

    def __init__(self, text, style=None):
        if style is None:
            style = Font(font_size=12)  # Smaller font for more content per page
        super().__init__(text, style)

    def possible_hyphenation(self):
        """Return possible hyphenation points."""
        if len(self.text) <= 6:
            return []

        # Simple hyphenation: split roughly in the middle
        mid = len(self.text) // 2
        return [(self.text[:mid] + "-", self.text[mid:])]


class SimpleParagraph:
    """A simple paragraph implementation that works with the layouter."""

    def __init__(self, text_content, style=None, is_heading=False):
        if style is None:
            if is_heading:
                style = AbstractStyle(
                    word_spacing=4.0,
                    word_spacing_min=2.0,
                    word_spacing_max=8.0
                )
            else:
                style = AbstractStyle(
                    word_spacing=3.0,
                    word_spacing_min=2.0,
                    word_spacing_max=6.0
                )

        self.style = style
        self.line_height = 18 if not is_heading else 22  # Slightly larger for headings
        self.is_heading = is_heading

        # Create words from text content
        self.words = []
        for word_text in text_content.split():
            if word_text.strip():
                word = SimpleWord(word_text.strip())
                self.words.append(word)


def create_longer_html() -> str:
    """Create a longer HTML document that will definitely span multiple pages."""
    return """
    <html>
    <body>
        <h1>The Complete Guide to Multi-Page Layout Systems</h1>

        <p>This comprehensive document demonstrates the capabilities of the pyWebLayout system
        for rendering HTML content across multiple pages. The system is designed to handle
        complex document structures while maintaining precise control over layout and formatting.</p>

        <p>The multi-page layout engine processes content incrementally, ensuring that text
        flows naturally from one page to the next. This approach is essential for creating
        professional-quality documents and ereader applications.</p>

        <h2>Chapter 1: Introduction to Document Layout</h2>

        <p>Document layout systems have evolved significantly over the years, from simple
        text processors to sophisticated engines capable of handling complex typography,
        multiple columns, and advanced formatting features.</p>

        <p>The pyWebLayout system represents a modern approach to document processing,
        combining the flexibility of HTML with the precision required for high-quality
        page layout. This makes it suitable for a wide range of applications.</p>

        <p>Key features of the system include automatic page breaking, font scaling support,
        position tracking for navigation, and comprehensive support for HTML elements
        including headings, paragraphs, lists, tables, and inline formatting.</p>

        <h2>Chapter 2: Technical Architecture</h2>

        <p>The system is built on a layered architecture that separates content parsing
        from layout rendering. This separation allows for maximum flexibility while
        maintaining performance and reliability.</p>

        <p>At the core of the system is the HTML extraction module, which converts HTML
        elements into abstract document structures. These structures are then processed
        by the layout engine to produce concrete page representations.</p>

        <p>The layout engine uses sophisticated algorithms to determine optimal line breaks,
        word spacing, and page boundaries. It can handle complex scenarios such as
        hyphenation, widow and orphan control, and multi-column layouts.</p>

        <h2>Chapter 3: Practical Applications</h2>

        <p>This technology has numerous practical applications in modern software development.
        Ereader applications benefit from the precise position tracking and font scaling
        capabilities, while document processing systems can leverage the robust HTML parsing.</p>

        <p>The system is particularly well-suited for applications that need to display
        long-form content in a paginated format. This includes digital books, technical
        documentation, reports, and academic papers.</p>

        <p>Performance characteristics are excellent, with sub-second rendering times for
        typical documents. The system can handle documents with thousands of pages while
        maintaining responsive user interaction.</p>

        <h2>Chapter 4: Advanced Features</h2>

        <p>Beyond basic text layout, the system supports advanced features such as
        bidirectional text rendering, complex table layouts, and embedded images.
        These features make it suitable for international applications and rich content.</p>

        <p>The position tracking system is particularly noteworthy, as it maintains
        stable references to content locations even when layout parameters change.
        This enables features like bookmarking and search result highlighting.</p>

        <p>Font scaling is implemented at the layout level, ensuring that all elements
        scale proportionally while maintaining optimal readability. This is crucial
        for accessibility and user preference support.</p>

        <h2>Conclusion</h2>

        <p>The pyWebLayout system demonstrates that it's possible to create sophisticated
        document layout engines using modern Python technologies. The combination of
        HTML parsing, abstract document modeling, and precise layout control provides
        a powerful foundation for document-centric applications.</p>

        <p>This example has shown the complete pipeline from HTML input to multi-page
        output, illustrating how the various components work together to produce
        high-quality results. The system is ready for use in production applications
        requiring professional document layout capabilities.</p>
    </body>
    </html>
    """


class HTMLMultiPageRenderer:
    """HTML to multi-page renderer with enhanced multi-page demonstration."""

    def __init__(self, page_size: Tuple[int, int] = (400, 500)):
        self.page_size = page_size

    def parse_html_to_paragraphs(self, html_content: str) -> List[SimpleParagraph]:
        """Parse HTML content into simple paragraphs."""
        # Parse HTML using the extraction system
        base_font = Font(font_size=12)
        blocks = parse_html_string(html_content, base_font=base_font)

        paragraphs = []

        for block in blocks:
            if isinstance(block, (Paragraph, Heading)):
                # Extract text from the block
                text_parts = []

                # Get words from the block - handle tuple format
                if hasattr(block, 'words') and callable(block.words):
                    for word_item in block.words():
                        # Handle both Word objects and tuples
                        if hasattr(word_item, 'text'):
                            text_parts.append(word_item.text)
                        elif isinstance(word_item, tuple) and len(word_item) >= 2:
                            # Tuple format: (position, word_object)
                            word_obj = word_item[1]
                            if hasattr(word_obj, 'text'):
                                text_parts.append(word_obj.text)
                        elif isinstance(word_item, str):
                            text_parts.append(word_item)

                # Fallback: try _words attribute directly
                if not text_parts and hasattr(block, '_words'):
                    for word_item in block._words:
                        if hasattr(word_item, 'text'):
                            text_parts.append(word_item.text)
                        elif isinstance(word_item, str):
                            text_parts.append(word_item)

                if text_parts:
                    text_content = " ".join(text_parts)
                    is_heading = isinstance(block, Heading)

                    # Create appropriate style based on block type
                    if is_heading:
                        style = AbstractStyle(
                            word_spacing=4.0,
                            word_spacing_min=2.0,
                            word_spacing_max=8.0
                        )
                    else:
                        style = AbstractStyle(
                            word_spacing=3.0,
                            word_spacing_min=2.0,
                            word_spacing_max=6.0
                        )

                    paragraph = SimpleParagraph(text_content, style, is_heading)
                    paragraphs.append(paragraph)

        return paragraphs

    def render_pages(self, paragraphs: List[SimpleParagraph]) -> List[MultiPage]:
        """Render paragraphs into multiple pages."""
        if not paragraphs:
            return []

        pages = []
        current_page = MultiPage(*self.page_size)
        pages.append(current_page)

        for para_idx, paragraph in enumerate(paragraphs):
            start_word = 0

            # Keep headings with their content (except for the first paragraph)
            if paragraph.is_heading and para_idx > 0 and current_page.lines_added > 0:
                # Check if we have room for heading + some content
                if current_page.lines_added >= current_page.max_lines - 3:
                    # Start heading on new page
                    current_page = MultiPage(*self.page_size)
                    pages.append(current_page)

            while start_word < len(paragraph.words):
                # Try to lay out the paragraph (or remaining part) on current page
                success, failed_word_index, remaining_pretext = paragraph_layouter(
                    paragraph, current_page, start_word
                )

                if success:
                    # Paragraph completed on this page
                    break
                else:
                    # Page is full, create a new page
                    current_page = MultiPage(*self.page_size)
                    pages.append(current_page)

                    # Continue with the failed word on the new page
                    if failed_word_index is not None:
                        start_word = failed_word_index
                    else:
                        # If no specific word failed, move to next paragraph
                        break

        return pages

    def save_pages(self, pages: List[MultiPage], output_dir: str = "output/html_multipage_final"):
        """Save pages as image files with enhanced formatting."""
        os.makedirs(output_dir, exist_ok=True)

        for i, page in enumerate(pages, 1):
            # Add page header and footer
            try:
                font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 10)
                title_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 11)
            except OSError:
                # Font files not found at these paths: fall back to PIL's built-in font
                font = ImageFont.load_default()
                title_font = font

            # Add document title in header
            header_text = "HTML Multi-Page Layout Demo"
            text_bbox = page.draw.textbbox((0, 0), header_text, font=title_font)
            text_width = text_bbox[2] - text_bbox[0]
            text_x = (page.page_size[0] - text_width) // 2
            text_y = 8

            page.draw.text((text_x, text_y), header_text, fill=(100, 100, 100), font=title_font)

            # Add page number in footer
            page_text = f"Page {i} of {len(pages)}"
            text_bbox = page.draw.textbbox((0, 0), page_text, font=font)
            text_width = text_bbox[2] - text_bbox[0]
            text_x = (page.page_size[0] - text_width) // 2
            text_y = page.page_size[1] - 20

            page.draw.text((text_x, text_y), page_text, fill=(120, 120, 120), font=font)

            # Save the page
            filename = f"page_{i:03d}.png"
            filepath = os.path.join(output_dir, filename)
            page.image.save(filepath)
            print(f"Saved {filepath}")

        print(f"\nRendered {len(pages)} pages to {output_dir}/")


def main():
    """Main demo function."""
    print("HTML Multi-Page Rendering Demo - Final Version")
    print("=" * 55)

    # Create longer HTML content for multi-page demo
    print("1. Creating comprehensive HTML content...")
    html_content = create_longer_html()
    print(f" Created HTML document ({len(html_content)} characters)")

    # Initialize renderer with smaller pages to force multi-page layout
    print("\n2. Initializing renderer with smaller pages...")
    renderer = HTMLMultiPageRenderer(page_size=(400, 500))  # Smaller pages
    print(" Renderer initialized (400x500 pixel pages)")

    # Parse HTML to paragraphs
    print("\n3. Parsing HTML to paragraphs...")
    paragraphs = renderer.parse_html_to_paragraphs(html_content)
    print(f" Parsed {len(paragraphs)} paragraphs")

    # Show paragraph preview
    heading_count = sum(1 for p in paragraphs if p.is_heading)
    regular_count = len(paragraphs) - heading_count
    print(f" Found {heading_count} headings and {regular_count} regular paragraphs")

    # Render pages
    print("\n4. Rendering pages...")
    pages = renderer.render_pages(paragraphs)
    print(f" Rendered {len(pages)} pages")

    # Show page statistics
    total_lines = 0
    for i, page in enumerate(pages, 1):
        total_lines += page.lines_added
        print(f" Page {i}: {page.lines_added} lines")

    # Save pages
    print("\n5. Saving pages...")
    renderer.save_pages(pages)

    print("\n✓ Multi-page demo completed successfully!")
    print("\nTo view the results:")
    print(" - Check the output/html_multipage_final/ directory")
    print(" - Open the PNG files to see each rendered page")
    print(" - Notice how content flows naturally across pages")

    # Show final statistics
    print("\nFinal Statistics:")
    print(f" - Original HTML: {len(html_content)} characters")
    print(f" - Parsed paragraphs: {len(paragraphs)} ({heading_count} headings, {regular_count} regular)")
    print(f" - Rendered pages: {len(pages)}")
    print(f" - Total lines: {total_lines}")
    # max() guards against division by zero when no pages were rendered
    print(f" - Average lines per page: {total_lines / max(len(pages), 1):.1f}")
    print(f" - Page size: {renderer.page_size[0]}x{renderer.page_size[1]} pixels")

    print("\n🎉 This demonstrates the complete HTML → Multi-Page pipeline!")
    print(f" The system successfully parsed HTML and laid it out across {len(pages)} pages.")


if __name__ == "__main__":
    main()
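The interaction between the height check and the line cap in `MultiPage.can_fit_line` is easier to see with the numbers worked out. The sketch below replays that arithmetic with the demo's defaults (500 px page height, 30 px border, 20 px header offset, 40 px header/footer reserve, 18 px per line, `max_lines=15`). It assumes the y offset only advances by the fixed 18 px step in `add_child`, i.e. that the layouter does not move it as well; `lines_that_fit` is a hypothetical helper, not part of pyWebLayout.

```python
def lines_that_fit(height=500, border=30, header=20, reserve=40,
                   line_height=18, advance=18, max_lines=15):
    """Count how many lines a page accepts before can_fit_line turns False."""
    available_height = height - 2 * border - reserve  # 400 with the defaults
    y = border + header                               # initial _current_y_offset
    count = 0
    while True:
        # Same expression as MultiPage.can_fit_line in the demo
        remaining = available_height - (y - border - header)
        if remaining < line_height or count >= max_lines:
            return count
        count += 1      # add_child: one more line on the page
        y += advance    # add_child: fixed line spacing


print(lines_that_fit())                   # limited by max_lines
print(lines_that_fit(max_lines=10**6))    # limited by the page height only
```

Under these assumptions the height alone would allow 22 lines, so with the default settings it is the 15-line cap, not the pixel budget, that triggers each page break.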
@ -1,365 +0,0 @@
#!/usr/bin/env python3
"""
Simple HTML Multi-Page Rendering Demo

This example demonstrates a working HTML to multi-page layout system using
the proven patterns from the integration tests. It shows how to:

1. Parse HTML content using pyWebLayout's HTML extraction system
2. Lay out the parsed content across multiple pages using the document layouter
3. Save each page as an image file

This is a simplified but functional implementation.
"""

import os
import sys
from pathlib import Path
from typing import List, Tuple
from PIL import Image, ImageDraw, ImageFont

# Add pyWebLayout to path
sys.path.insert(0, str(Path(__file__).parent.parent))

from pyWebLayout.io.readers.html_extraction import parse_html_string
from pyWebLayout.layout.document_layouter import paragraph_layouter
from pyWebLayout.style.abstract_style import AbstractStyle
from pyWebLayout.style.concrete_style import StyleResolver, RenderingContext
from pyWebLayout.style import Font
from pyWebLayout.abstract.block import Block, Paragraph, Heading
from pyWebLayout.abstract.inline import Word
from pyWebLayout.concrete.text import Line


class SimplePage:
    """A simple page implementation for multi-page layout."""

    def __init__(self, width=600, height=800, max_lines=30):
        self.border_size = 40
        self._current_y_offset = self.border_size
        self.available_width = width - (2 * self.border_size)
        self.available_height = height - (2 * self.border_size)
        self.max_lines = max_lines
        self.lines_added = 0
        self.children = []
        self.page_size = (width, height)

        # Create a real drawing context
        self.image = Image.new('RGB', (width, height), 'white')
        self.draw = ImageDraw.Draw(self.image)

        # Create a real style resolver
        context = RenderingContext(base_font_size=16)
        self.style_resolver = StyleResolver(context)

        # Draw page border
        border_color = (220, 220, 220)
        self.draw.rectangle([0, 0, width-1, height-1], outline=border_color, width=2)

    def can_fit_line(self, line_height):
        """Check if another line can fit on the page."""
        remaining_height = self.available_height - (self._current_y_offset - self.border_size)
        can_fit = remaining_height >= line_height and self.lines_added < self.max_lines
        return can_fit

    def add_child(self, child):
        """Add a child element (like a Line) to the page."""
        self.children.append(child)
        self.lines_added += 1

        # Draw the line content on the page
        if isinstance(child, Line):
            self._draw_line(child)

        return True

    def _draw_line(self, line):
        """Draw a line of text on the page."""
        try:
            # Use a default font for drawing
            try:
                font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 14)
            except OSError:
                # Font file not found at this path: fall back to PIL's built-in font
                font = ImageFont.load_default()

            # Get line text (simplified)
            line_text = getattr(line, '_text_content', 'Line content')

            # Draw the text
            text_color = (0, 0, 0)  # Black
            x = self.border_size + 10
            y = self._current_y_offset

            self.draw.text((x, y), line_text, fill=text_color, font=font)

        except Exception:
            # Fallback: draw a simple representation
            x = self.border_size + 10
            y = self._current_y_offset
            self.draw.text((x, y), "Text line", fill=(0, 0, 0))


class SimpleWord(Word):
    """A simple word implementation that works with the layouter."""

    def __init__(self, text, style=None):
        if style is None:
            style = Font(font_size=14)
        super().__init__(text, style)

    def possible_hyphenation(self):
        """Return possible hyphenation points."""
        if len(self.text) <= 6:
            return []

        # Simple hyphenation: split roughly in the middle
        mid = len(self.text) // 2
        return [(self.text[:mid] + "-", self.text[mid:])]


class SimpleParagraph:
    """A simple paragraph implementation that works with the layouter."""

    def __init__(self, text_content, style=None):
        if style is None:
            style = AbstractStyle(
                word_spacing=4.0,
                word_spacing_min=2.0,
                word_spacing_max=8.0
            )

        self.style = style
        self.line_height = 20

        # Create words from text content
        self.words = []
        for word_text in text_content.split():
            if word_text.strip():
                word = SimpleWord(word_text.strip())
                self.words.append(word)


def create_sample_html() -> str:
    """Create a sample HTML document for testing."""
    return """
    <html>
    <body>
        <h1>Chapter 1: Introduction</h1>

        <p>This is the first paragraph of our sample document. It demonstrates how HTML content
        can be parsed and then laid out across multiple pages using the pyWebLayout system.</p>

        <p>Here's another paragraph with some more text to show how the system handles
        multiple paragraphs and automatic page breaking when content exceeds page boundaries.</p>

        <h2>Section 1.1: Features</h2>

        <p>The multi-page layout system includes several key features that make it suitable
        for ereader applications and document processing systems.</p>

        <p>Each paragraph is processed individually and can span multiple lines or even
        multiple pages if the content is long enough to require it.</p>

        <h1>Chapter 2: Implementation</h1>

        <p>The implementation uses a sophisticated layout engine that processes abstract
        document elements and renders them onto concrete pages.</p>

        <p>This separation allows for flexible styling and layout while maintaining
        the semantic structure of the original content.</p>

        <p>The system can handle various HTML elements including headings, paragraphs,
        lists, and other block-level elements commonly found in documents.</p>

        <p>Position tracking is maintained throughout the layout process, enabling
        features like bookmarking and navigation between different views of the content.</p>
    </body>
    </html>
    """


class HTMLMultiPageRenderer:
    """Simple HTML to multi-page renderer."""

    def __init__(self, page_size: Tuple[int, int] = (600, 800)):
        self.page_size = page_size

    def parse_html_to_paragraphs(self, html_content: str) -> List[SimpleParagraph]:
        """Parse HTML content into simple paragraphs."""
        # Parse HTML using the extraction system
        base_font = Font(font_size=14)
        blocks = parse_html_string(html_content, base_font=base_font)

        paragraphs = []

        for block in blocks:
            if isinstance(block, (Paragraph, Heading)):
                # Extract text from the block
                text_parts = []

                # Get words from the block - handle tuple format
                if hasattr(block, 'words') and callable(block.words):
                    for word_item in block.words():
                        # Handle both Word objects and tuples
                        if hasattr(word_item, 'text'):
                            text_parts.append(word_item.text)
                        elif isinstance(word_item, tuple) and len(word_item) >= 2:
                            # Tuple format: (position, word_object)
                            word_obj = word_item[1]
                            if hasattr(word_obj, 'text'):
                                text_parts.append(word_obj.text)
                        elif isinstance(word_item, str):
                            text_parts.append(word_item)

                # Fallback: try _words attribute directly
                if not text_parts and hasattr(block, '_words'):
                    for word_item in block._words:
                        if hasattr(word_item, 'text'):
                            text_parts.append(word_item.text)
                        elif isinstance(word_item, str):
                            text_parts.append(word_item)

                if text_parts:
                    text_content = " ".join(text_parts)

                    # Create appropriate style based on block type
                    if isinstance(block, Heading):
                        style = AbstractStyle(
                            word_spacing=5.0,
                            word_spacing_min=3.0,
                            word_spacing_max=10.0
                        )
                    else:
                        style = AbstractStyle(
                            word_spacing=4.0,
                            word_spacing_min=2.0,
                            word_spacing_max=8.0
                        )

                    paragraph = SimpleParagraph(text_content, style)
                    paragraphs.append(paragraph)

        return paragraphs

    def render_pages(self, paragraphs: List[SimpleParagraph]) -> List[SimplePage]:
        """Render paragraphs into multiple pages."""
        if not paragraphs:
            return []

        pages = []
        current_page = SimplePage(*self.page_size)
        pages.append(current_page)

        for paragraph in paragraphs:
            start_word = 0

            while start_word < len(paragraph.words):
                # Try to lay out the paragraph (or remaining part) on current page
                success, failed_word_index, remaining_pretext = paragraph_layouter(
                    paragraph, current_page, start_word
                )

                if success:
                    # Paragraph completed on this page
                    break
                else:
                    # Page is full, create a new page
                    current_page = SimplePage(*self.page_size)
                    pages.append(current_page)

                    # Continue with the failed word on the new page
                    if failed_word_index is not None:
                        start_word = failed_word_index
                    else:
                        # If no specific word failed, move to next paragraph
                        break

        return pages

    def save_pages(self, pages: List[SimplePage], output_dir: str = "output/html_simple"):
        """Save pages as image files."""
        os.makedirs(output_dir, exist_ok=True)

        for i, page in enumerate(pages, 1):
            # Add page number
            try:
                font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 12)
            except OSError:
                # Font file not found at this path: fall back to PIL's built-in font
                font = ImageFont.load_default()

            page_text = f"Page {i}"
            text_bbox = page.draw.textbbox((0, 0), page_text, font=font)
            text_width = text_bbox[2] - text_bbox[0]
            text_x = (page.page_size[0] - text_width) // 2
            text_y = page.page_size[1] - 25

            page.draw.text((text_x, text_y), page_text, fill=(100, 100, 100), font=font)

            # Save the page
            filename = f"page_{i:03d}.png"
            filepath = os.path.join(output_dir, filename)
            page.image.save(filepath)
            print(f"Saved {filepath}")

        print(f"\nRendered {len(pages)} pages to {output_dir}/")


def main():
    """Main demo function."""
    print("Simple HTML Multi-Page Rendering Demo")
    print("=" * 45)

    # Create sample HTML content
    print("1. Creating sample HTML content...")
    html_content = create_sample_html()
    print(f" Created HTML document ({len(html_content)} characters)")

    # Initialize renderer
    print("\n2. Initializing renderer...")
    renderer = HTMLMultiPageRenderer(page_size=(600, 800))
    print(" Renderer initialized")

    # Parse HTML to paragraphs
    print("\n3. Parsing HTML to paragraphs...")
    paragraphs = renderer.parse_html_to_paragraphs(html_content)
    print(f" Parsed {len(paragraphs)} paragraphs")

    # Show paragraph preview
    for i, para in enumerate(paragraphs[:3]):  # Show first 3
        preview = " ".join(word.text for word in para.words[:8])  # First 8 words
        if len(para.words) > 8:
            preview += "..."
        print(f" Paragraph {i+1}: {preview}")

    if len(paragraphs) > 3:
        print(f" ... and {len(paragraphs) - 3} more paragraphs")

    # Render pages
    print("\n4. Rendering pages...")
    pages = renderer.render_pages(paragraphs)
    print(f" Rendered {len(pages)} pages")

    # Show page statistics
    for i, page in enumerate(pages, 1):
        print(f" Page {i}: {page.lines_added} lines")

    # Save pages
    print("\n5. Saving pages...")
    renderer.save_pages(pages)

    print("\n✓ Demo completed successfully!")
    print("\nTo view the results:")
    print(" - Check the output/html_simple/ directory")
    print(" - Open the PNG files to see each rendered page")

    # Show statistics
    print("\nStatistics:")
    print(f" - Original HTML: {len(html_content)} characters")
    print(f" - Parsed paragraphs: {len(paragraphs)}")
    print(f" - Rendered pages: {len(pages)}")
    print(f" - Total lines: {sum(page.lines_added for page in pages)}")
    print(f" - Page size: {renderer.page_size[0]}x{renderer.page_size[1]} pixels")


if __name__ == "__main__":
    main()
|
|
||||||
@ -6,7 +6,6 @@ from pyWebLayout.style import Alignment, Font, FontStyle, FontWeight, TextDecora
 from pyWebLayout.abstract import Word
 from pyWebLayout.abstract.inline import LinkedWord
 from pyWebLayout.abstract.functional import Link
-from .functional import LinkText, ButtonText
 from PIL import Image, ImageDraw, ImageFont
 from typing import Tuple, Union, List, Optional, Protocol
 import numpy as np
@ -395,6 +394,8 @@ class Line(Box):
 
         # Try to add the full word - create LinkText for LinkedWord, regular Text otherwise
         if isinstance(word, LinkedWord):
+            # Import here to avoid circular dependency
+            from .functional import LinkText
             # Create a LinkText which includes the link functionality
             # LinkText constructor needs: (link, text, font, draw, source, line)
             # But LinkedWord itself contains the link properties
@ -591,6 +592,9 @@ class Line(Box):
                 int(size[1]) if hasattr(size, '__getitem__') else 0
             )
 
+            # Import here to avoid circular dependency
+            from .functional import LinkText, ButtonText
+
             if isinstance(text_obj, LinkText):
                 result = QueryResult(
                     object=text_obj,
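The hunks above move the `.functional` import from module level into the functions that use it, with the comment "Import here to avoid circular dependency". The sketch below illustrates why that pattern works in general. It is not pyWebLayout code: the module names `demo_text` and `demo_functional`, the `kind()` method, and the `load_demo_modules()` helper are all hypothetical stand-ins, written to a temporary directory so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the module that defines the base class.
# It needs LinkText only inside a method, so the import is deferred to call time.
TEXT_SRC = '''
class Text:
    def kind(self):
        # Deferred import: executed when kind() is called, by which point
        # both modules have finished loading.
        from demo_functional import LinkText
        return "link" if isinstance(self, LinkText) else "text"
'''

# Hypothetical stand-in for the module that subclasses Text.
# It must import Text at module level to define the subclass.
FUNCTIONAL_SRC = '''
from demo_text import Text


class LinkText(Text):
    pass
'''


def load_demo_modules():
    """Write both demo modules to a temp directory and import them."""
    tmp = tempfile.mkdtemp()
    Path(tmp, "demo_text.py").write_text(TEXT_SRC)
    Path(tmp, "demo_functional.py").write_text(FUNCTIONAL_SRC)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    text = importlib.import_module("demo_text")
    functional = importlib.import_module("demo_functional")
    return text, functional


text_mod, functional_mod = load_demo_modules()
print(text_mod.Text().kind())
print(functional_mod.LinkText().kind())
```

If `demo_text` instead imported `LinkText` at module level, importing it first would start loading `demo_functional`, which would in turn try to import `Text` from a still partially initialized `demo_text` and fail with an `ImportError`. Deferring the import to the function body sidesteps that ordering problem, at the cost of a cached module lookup on each call.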