BROWSER_README.md
@@ -0,0 +1,143 @@

# pyWebLayout HTML Browser

A simple HTML browser built from pyWebLayout components in `pyWebLayout/concrete/`, `pyWebLayout/abstract/`, and `pyWebLayout/style/`.

## Features

This browser demonstrates the capabilities of pyWebLayout by implementing:

### Rendering Components
- **Text rendering** with basic formatting (bold, italic, underline)
- **Headers** (H1-H6) with per-level sizing and bold styling
- **Links** (clickable; external URLs open in the system browser)
- **Images** (local files and web URLs, with error handling)
- **Layout containers** for element positioning
- **Basic HTML parsing** and conversion of elements to pyWebLayout objects

### User Interface
- **Navigation controls**: Back, Forward, and Refresh buttons
- **Address bar**: Enter URLs or file paths
- **File browser**: Open local HTML files
- **Scrollable content area** with vertical and horizontal scrollbars
- **Mouse interaction**: Clickable links, with a hand cursor on hover
- **Status bar**: Shows the status of the current operation
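Both the hover cursor and click handling reduce to a point-in-rectangle test over the clickable elements. The sketch below is a framework-free version of that check, assuming each target is stored as an `(element, (x, y), (width, height))` tuple, the shape the browser's `page_elements` list uses; `hit_test` and `cursor_for` are hypothetical helper names, not part of pyWebLayout.

```python
from typing import Any, List, Optional, Sequence, Tuple

# Each entry mirrors the (element, offset, size) tuples the browser collects
ClickTarget = Tuple[Any, Tuple[float, float], Tuple[float, float]]


def hit_test(targets: Sequence[ClickTarget], x: float, y: float) -> Optional[Any]:
    """Return the first element whose box contains (x, y), or None."""
    for element, (left, top), (width, height) in targets:
        # Edges are inclusive, matching the <= comparisons in the browser
        if left <= x <= left + width and top <= y <= top + height:
            return element
    return None


def cursor_for(targets: Sequence[ClickTarget], x: float, y: float) -> str:
    """Pick a Tk cursor name: a hand over clickable elements, an arrow elsewhere."""
    return "hand2" if hit_test(targets, x, y) is not None else "arrow"


if __name__ == "__main__":
    targets: List[ClickTarget] = [
        ("link-a", (10, 20), (100, 16)),
        ("link-b", (10, 50), (80, 16)),
    ]
    print(hit_test(targets, 50, 30))  # link-a
    print(cursor_for(targets, 5, 5))  # arrow
```

Keeping the geometry out of the Tk event handlers makes it testable without opening a window.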
## Usage

### Starting the Browser
```bash
python html_browser.py
```

### Loading Content

1. **Welcome page**: The browser starts with a built-in welcome page that showcases the supported features
2. **Open local files**: Click "Open File" to browse for and select an HTML file
3. **Enter URLs**: Type a URL or file path in the address bar and press Enter or click "Go"
4. **Navigate**: Use the back/forward buttons to move through your history
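Address-bar input is resolved in a fixed order: web URLs first, then existing file paths, then `file://` URLs or paths relative to the working directory. The hypothetical helper `classify_address` below isolates that decision from the Tkinter code as a sketch; it does not fetch or parse anything.

```python
import os
from typing import Tuple


def classify_address(address: str) -> Tuple[str, str]:
    """Classify address-bar input as ("web", url) or ("file", path).

    Follows the order the browser tries: http(s) URLs, then an existing
    file path, then a file:// URL or a path relative to the current directory.
    """
    address = address.strip()
    if address.startswith(("http://", "https://")):
        return "web", address
    if os.path.isfile(address):
        return "file", address
    # Strip a file:// prefix; otherwise resolve relative to the working directory
    if address.startswith("file://"):
        return "file", address[len("file://"):]
    return "file", os.path.abspath(address)
```

Returning a tag plus a normalized string keeps the branching in one place, so the caller only has to choose between fetching over HTTP and opening a file.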
### Test Files

- `test_page.html` - A comprehensive test page demonstrating the supported features, including:
  - Text formatting (bold, italic, underline)
  - Headers of all levels (H1-H6)
  - Links (both internal and external)
  - Images (includes the sample image from `tests/data/`)
  - Line breaks and paragraphs

## Architecture

### HTML Parser (`HTMLParser` class)
- Simple regex-based HTML tokenizer
- Converts HTML elements into pyWebLayout renderables (`Text`, `RenderableLink`, `RenderableImage`, `Box`)
- Handles font styling with a font stack for nested formatting
- Supports basic HTML tags: h1-h6, b, strong, i, em, u, a, img, br, p, div, span
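The tokenizing step can be sketched on its own. The function below is an illustration, not the project's exact code: it splits markup into text and tag tokens with two regular expressions, producing dictionaries similar in shape to those `HTMLParser._tokenize_html` builds (`type`, `name`, `closing`, `attributes`). `tokenize`, `TAG_RE`, and `ATTR_RE` are hypothetical names.

```python
import re
from typing import Any, Dict, List

TAG_RE = re.compile(r"<(/?)([^>]+)>")
# name=value with double-quoted, single-quoted, or unquoted values
ATTR_RE = re.compile(r"""(\w+)=(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")


def tokenize(html: str) -> List[Dict[str, Any]]:
    """Split HTML into text and tag tokens using regular expressions."""
    tokens: List[Dict[str, Any]] = []
    last_end = 0
    for match in TAG_RE.finditer(html):
        # Text between the previous tag and this one
        if match.start() > last_end:
            tokens.append({"type": "text", "content": html[last_end:match.start()]})
        last_end = match.end()

        # Drop a trailing "/" so self-closing tags like <br/> get a clean name
        body = match.group(2).strip().rstrip("/").strip()
        if not body:
            continue
        parts = body.split(None, 1)
        attr_text = parts[1] if len(parts) > 1 else ""
        # Exactly one of the three value groups matches; keep empty values like alt=""
        attributes = {
            m.group(1).lower(): next(g for g in m.groups()[1:] if g is not None)
            for m in ATTR_RE.finditer(attr_text)
        }
        tokens.append({
            "type": "tag",
            "name": parts[0].lower(),
            "closing": bool(match.group(1)),
            "attributes": attributes,
        })
    # Trailing text after the last tag
    if last_end < len(html):
        tokens.append({"type": "text", "content": html[last_end:]})
    return tokens
```

A regex tokenizer like this is adequate for a demo but does not handle comments inside attributes, `>` inside quoted values, or CDATA; Python's `html.parser.HTMLParser` is the standard-library alternative when those matter.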
### Browser Window (`BrowserWindow` class)
- Tkinter-based GUI with navigation controls
- Canvas-based rendering of pyWebLayout `Page` objects
- Mouse event handling for interactive elements
- Navigation history management
- File and URL loading capabilities
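History bookkeeping can be modeled independently of Tkinter. The hypothetical `History` class below sketches the usual back/forward-list semantics: a new visit discards forward entries, while stepping back or forward only moves a cursor. The second point matters: if stepping back re-records the URL as a fresh visit, the forward history is lost.

```python
from typing import List, Optional


class History:
    """Linear browser history with a cursor, like a tab's back/forward list."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.index = -1

    def visit(self, url: str) -> None:
        # A new visit discards any "forward" entries beyond the cursor
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index = len(self.entries) - 1

    def back(self) -> Optional[str]:
        # Only move the cursor; calling visit() here would drop forward history
        if self.index > 0:
            self.index -= 1
            return self.entries[self.index]
        return None

    def forward(self) -> Optional[str]:
        if self.index < len(self.entries) - 1:
            self.index += 1
            return self.entries[self.index]
        return None

    @property
    def can_go_back(self) -> bool:
        return self.index > 0

    @property
    def can_go_forward(self) -> bool:
        return self.index < len(self.entries) - 1
```

The two properties map directly onto enabling or disabling the Back and Forward buttons.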
### pyWebLayout Integration

The browser uses these pyWebLayout components:

#### From `pyWebLayout/concrete/`:
- `Page` - Top-level container for web page content
- `Container` - Layout management for multiple elements
- `Box` - Basic rectangular container with positioning
- `Text` - Text rendering with font styling
- `RenderableImage` - Image loading and display with scaling
- `RenderableLink` - Interactive link elements
- `RenderableButton` - Interactive button elements

#### From `pyWebLayout/abstract/`:
- `Link` - Abstract link representation with types (internal, external, API, function)
- `Image` - Abstract image representation with dimensions and loading
- Font and styling classes for text appearance

#### From `pyWebLayout/style/`:
- `Font` - Font management with size, weight, style, and decoration
- `FontWeight`, `FontStyle`, `TextDecoration` - Typography enums
- `Alignment` - Layout positioning options

## Supported HTML Features

### Text Elements
- `<h1>` to `<h6>` - Bold headers, sized from 24 (`<h1>`) down to 12 (`<h6>`)
- `<p>` - Paragraphs, preceded by a small vertical spacer
- `<b>`, `<strong>` - Bold text
- `<i>`, `<em>` - Italic text
- `<u>` - Underlined text
- `<br>` - Line breaks (rendered as vertical space)
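Nested formatting works by deriving each new style from the one on top of the stack and popping it at the matching closing tag. A minimal sketch of that idea, using a hypothetical immutable `Style` dataclass in place of pyWebLayout's `Font` (whose `with_size`/`with_weight`/`with_style`/`with_decoration` methods play the role of `dataclasses.replace` here); the header sizes are the ones listed above.

```python
from dataclasses import dataclass, replace
from typing import List

# Header sizes from the browser's size map (h1 = 24 ... h6 = 12)
HEADER_SIZES = {"h1": 24, "h2": 20, "h3": 18, "h4": 16, "h5": 14, "h6": 12}


@dataclass(frozen=True)
class Style:
    """Hypothetical stand-in for Font; frozen, so a push never mutates outer styles."""
    size: int = 14
    bold: bool = False
    italic: bool = False
    underline: bool = False


def style_for(tag: str, current: Style) -> Style:
    """Derive the style an opening tag applies on top of the current one."""
    if tag in HEADER_SIZES:
        return replace(current, size=HEADER_SIZES[tag], bold=True)
    if tag in ("b", "strong"):
        return replace(current, bold=True)
    if tag in ("i", "em"):
        return replace(current, italic=True)
    if tag == "u":
        return replace(current, underline=True)
    return current  # non-styling tags leave the style unchanged


class StyleStack:
    def __init__(self) -> None:
        self._stack: List[Style] = [Style()]  # the default style is never popped

    @property
    def current(self) -> Style:
        return self._stack[-1]

    def push(self, tag: str) -> None:
        self._stack.append(style_for(tag, self.current))

    def pop(self) -> None:
        if len(self._stack) > 1:
            self._stack.pop()
```

For `<h1>Title <i>here</i></h1>`, the text "here" is rendered with `Style(size=24, bold=True, italic=True)`, and the style reverts to the plain header style after `</i>`.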
### Interactive Elements
- `<a href="...">` - Links (external URLs open in the system browser)
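Link handling comes down to classifying the `href`. The sketch below is a variant built on stated assumptions: it checks the scheme parsed by `urllib.parse.urlparse` instead of a string prefix, and `follow_link` is a hypothetical helper that takes the opener as a parameter, so it can be exercised without launching a browser.

```python
import webbrowser
from typing import Callable
from urllib.parse import urlparse


def classify_link(href: str) -> str:
    """Return "external" for absolute http(s) URLs and "internal" for everything else."""
    scheme = urlparse(href).scheme.lower()
    return "external" if scheme in ("http", "https") else "internal"


def follow_link(href: str, opener: Callable[[str], object] = webbrowser.open) -> str:
    """Open external links with `opener`; return a status-bar message either way."""
    if classify_link(href) == "external":
        opener(href)
        return f"Opened {href} in the system browser"
    # Fragments (#section1) and relative paths stay inside the app
    return f"Navigate to: {href}"
```

Parsing the scheme avoids misclassifying a relative file whose name merely starts with "http".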
### Media Elements
- `<img src="..." alt="..." width="..." height="...">` - Images, scaled to fit the `width`/`height` attributes when these are plain numbers
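Relative image sources are resolved against the page's location: the containing directory for local files, or the page URL for web pages. The sketch below mirrors that logic with two hypothetical helpers; as in the browser, `width`/`height` count only when they consist of digits, so values like `50%` are treated as absent.

```python
import os
from typing import Optional, Tuple
from urllib.parse import urljoin


def resolve_image_src(src: str, base: str) -> str:
    """Resolve an <img> src against the page base (a directory or a URL)."""
    if not base or src.startswith(("http://", "https://")):
        return src
    if os.path.isdir(base):
        return os.path.join(base, src)
    return urljoin(base, src)


def parse_dimensions(width: Optional[str],
                     height: Optional[str]) -> Tuple[Optional[int], Optional[int]]:
    """Turn width/height attributes into integer limits; non-numeric values become None."""
    def to_int(value: Optional[str]) -> Optional[int]:
        return int(value) if value and value.isdigit() else None
    return to_int(width), to_int(height)
```

With `base = "https://example.com/docs/page.html"`, `urljoin` resolves `img/x.png` to `https://example.com/docs/img/x.png`.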
### Container Elements
- `<div>`, `<span>` - Generic containers (parsed but not specially styled)

## Example Usage

```python
# Start the browser
from html_browser import BrowserWindow

browser = BrowserWindow()
browser.run()
```

## Limitations

This is a demonstration browser with simplified HTML parsing:
- No CSS support (`<style>` blocks are stripped; styling comes from pyWebLayout components)
- No JavaScript execution (`<script>` blocks are stripped)
- Limited HTML tag support (unsupported tags such as `<ul>`/`<li>` are ignored, though their text is still rendered)
- No form support (the parser does not handle `<form>` or `<input>` tags)
- No advanced layout features (flexbox, grid, etc.)

## Dependencies

- `tkinter` - GUI framework (usually included with Python)
- `PIL` (Pillow) - Image processing
- `requests` - HTTP requests for web URLs
- `pyWebLayout` - The core layout and rendering library

## Testing

Load `test_page.html` to see all supported features in action:
1. Run the browser: `python html_browser.py`
2. Click "Open File" and select `test_page.html`
3. Explore the different text formatting, links, and image rendering

The test page includes:
- Various header levels
- Text formatting examples
- Clickable links (try the Google link!)
- A sample image from the test data
- Mixed content demonstrations
@@ -1,5 +1,5 @@
 <svg width="140" height="20" viewBox="0 0 140 20" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve" xmlns:serif="http://www.serif.com/" style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;stroke-miterlimit:2;">
-<title>interrogate: 94.6%</title>
+<title>interrogate: 92.0%</title>
 <g transform="matrix(1,0,0,1,22,0)">
 <g id="backgrounds" transform="matrix(1.32789,0,0,1,-22.3892,0)">
 <rect x="0" y="0" width="71" height="20" style="fill:rgb(85,85,85);"/>
@@ -12,8 +12,8 @@
 <g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,sans-serif" font-size="110">
 <text x="590" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="610">interrogate</text>
 <text x="590" y="140" transform="scale(.1)" textLength="610">interrogate</text>
-<text x="1160" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="370" data-interrogate="result">94.6%</text>
-<text x="1160" y="140" transform="scale(.1)" textLength="370" data-interrogate="result">94.6%</text>
+<text x="1160" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="370" data-interrogate="result">92.0%</text>
+<text x="1160" y="140" transform="scale(.1)" textLength="370" data-interrogate="result">92.0%</text>
 </g>
 <g id="logo-shadow" serif:id="logo shadow" transform="matrix(0.854876,0,0,0.854876,-6.73514,1.732)">
 <g transform="matrix(0.299012,0,0,0.299012,9.70229,-6.68582)">
@@ -1 +1 @@
-41.1%
+57.0%
@@ -15,7 +15,7 @@
 <g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,sans-serif" font-size="11">
 <text x="31.5" y="15" fill="#010101" fill-opacity=".3">coverage</text>
 <text x="31.5" y="14">coverage</text>
-<text x="80" y="15" fill="#010101" fill-opacity=".3">47%</text>
-<text x="80" y="14">47%</text>
+<text x="80" y="15" fill="#010101" fill-opacity=".3">57%</text>
+<text x="80" y="14">57%</text>
 </g>
 </svg>
coverage.xml

html_browser.py
@@ -0,0 +1,642 @@
#!/usr/bin/env python3
"""
Simple HTML Browser using pyWebLayout

This browser can render basic HTML content using the pyWebLayout concrete objects.
It supports text, images, links, and basic styling.
"""

import re
import tkinter as tk
from tkinter import ttk, messagebox, filedialog, simpledialog
from PIL import Image, ImageTk
from typing import Dict, List, Optional, Tuple, Any
import webbrowser
import os
from urllib.parse import urljoin, urlparse
import requests
from io import BytesIO

# Import pyWebLayout components
from pyWebLayout.concrete import (
    Page, Container, Box, Text, RenderableImage,
    RenderableLink, RenderableButton, RenderableForm, RenderableFormField
)
from pyWebLayout.abstract.functional import (
    Link, Button, Form, FormField, LinkType, FormFieldType
)
from pyWebLayout.style.fonts import Font, FontWeight, FontStyle, TextDecoration
from pyWebLayout.style.layout import Alignment
class HTMLParser:
    """Simple HTML parser that converts HTML to pyWebLayout objects"""

    def __init__(self):
        self.font_stack = [Font(font_size=14)]  # Default font
        self.current_container = None
        self.base_url = ""  # Directory or URL used to resolve relative image paths

    def parse_html_string(self, html_content: str, base_url: str = "") -> Page:
        """Parse HTML string and return a Page object"""
        # Create the main page
        page = Page(size=(800, 1600), background_color=(255, 255, 255))
        self.current_container = page
        self.base_url = base_url

        # Simple HTML parsing using regex (not production-ready, but works for demo)
        # Remove comments, scripts, and stylesheets (tags matched case-insensitively)
        html_content = re.sub(r'<!--.*?-->', '', html_content, flags=re.DOTALL)
        html_content = re.sub(r'<script.*?</script>', '', html_content, flags=re.DOTALL | re.IGNORECASE)
        html_content = re.sub(r'<style.*?</style>', '', html_content, flags=re.DOTALL | re.IGNORECASE)

        # Extract title
        title_match = re.search(r'<title>(.*?)</title>', html_content, re.DOTALL | re.IGNORECASE)
        if title_match:
            page.title = title_match.group(1)

        # Extract body content
        body_match = re.search(r'<body[^>]*>(.*?)</body>', html_content, re.DOTALL | re.IGNORECASE)
        if body_match:
            body_content = body_match.group(1)
        else:
            # If no body tag, use the entire content
            body_content = html_content

        # Parse the body content
        self._parse_content(body_content, page)

        return page

    def parse_html_file(self, file_path: str) -> Page:
        """Parse HTML file and return a Page object"""
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                html_content = f.read()
            base_url = os.path.dirname(os.path.abspath(file_path))
            return self.parse_html_string(html_content, base_url)
        except Exception as e:
            # Create error page
            page = Page(size=(800, 1600), background_color=(255, 255, 255))
            error_text = Text(f"Error loading file: {str(e)}", Font(font_size=16, colour=(255, 0, 0)))
            page.add_child(error_text)
            return page
    def _parse_content(self, content: str, container: Container):
        """Parse HTML content and add elements to container"""
        # Simple token-based parsing
        tokens = self._tokenize_html(content)
        self._parse_tokens(tokens, 0, len(tokens), container)

    def _parse_tokens(self, tokens, start, end, container):
        """Parse tokens[start:end], letting each tag consume its own content"""
        i = start
        while i < end:
            token = tokens[i]

            if token['type'] == 'text':
                if token['content'].strip():  # Only add non-empty text
                    text_obj = Text(token['content'].strip(), self.font_stack[-1])
                    container.add_child(text_obj)
                i += 1
            else:
                # Handle the tag and potentially the content up to its closing tag
                i = self._handle_tag_with_content(token, tokens, i, container, end)

    def _handle_tag_with_content(self, token, tokens, current_index, container, end=None):
        """Handle a tag (and, for styling tags and links, its content); return the next index"""
        if end is None:
            end = len(tokens)
        tag_name = token['name']

        if token['closing']:
            # Matched closing tags are consumed together with their opening tag below,
            # so a closing tag reaching here is stray; popping a font would unbalance the stack
            return current_index + 1

        if tag_name in ('b', 'strong', 'i', 'em', 'u', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'):
            # Push the new font, parse the enclosed tokens with it, then pop it again.
            # Parsing recursively lets nested tags (e.g. <b> inside <h1>) balance their own fonts.
            self._handle_tag(token, container)
            content_end = self._find_matching_closing_tag(tokens, current_index, tag_name, end)
            self._parse_tokens(tokens, current_index + 1, content_end, container)
            if len(self.font_stack) > 1:  # Never pop the default font
                self.font_stack.pop()
            return content_end + 1 if content_end < end else end

        if tag_name == 'a':
            # Use the enclosed text as the link label so it is not also rendered as plain text
            # (markup nested inside the link is not rendered)
            content_end = self._find_matching_closing_tag(tokens, current_index, tag_name, end)
            label = ' '.join(
                t['content'].strip() for t in tokens[current_index + 1:content_end]
                if t['type'] == 'text' and t['content'].strip()
            )
            self._handle_tag(token, container, link_text=label or None)
            return content_end + 1 if content_end < end else end

        # Other tags (img, br, p, div, span, ...) carry no styled content of their own
        self._handle_tag(token, container)
        return current_index + 1

    def _find_matching_closing_tag(self, tokens, start_index, tag_name, end=None):
        """Return the index of the tag closing tokens[start_index], or `end` if unmatched"""
        if end is None:
            end = len(tokens)
        open_count = 1
        i = start_index + 1

        while i < end and open_count > 0:
            token = tokens[i]
            if token['type'] == 'tag' and token['name'] == tag_name:
                if token['closing']:
                    open_count -= 1
                else:
                    open_count += 1
            i += 1

        return i - 1 if open_count == 0 else end

    def _tokenize_html(self, content: str) -> List[Dict]:
        """Simple HTML tokenizer"""
        tokens = []
        tag_pattern = r'<(/?)([^>]+)>'

        last_end = 0
        for match in re.finditer(tag_pattern, content):
            # Add text before tag
            text_content = content[last_end:match.start()]
            if text_content:
                tokens.append({'type': 'text', 'content': text_content})
            last_end = match.end()

            # Add tag (drop a trailing "/" so self-closing tags like <br/> are recognised)
            is_closing = bool(match.group(1))
            tag_content = match.group(2).strip().rstrip('/').strip()
            tag_parts = tag_content.split()
            if not tag_parts:
                continue  # Skip empty tags such as "< >"
            tag_name = tag_parts[0].lower()

            # Parse attributes
            attributes = {}
            if len(tag_parts) > 1:
                attr_text = ' '.join(tag_parts[1:])
                attr_pattern = r'(\w+)=(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))'
                for attr_match in re.finditer(attr_pattern, attr_text):
                    attr_name = attr_match.group(1).lower()
                    # Take whichever quoting style matched; keeps empty values such as alt=""
                    attr_value = next(g for g in attr_match.groups()[1:] if g is not None)
                    attributes[attr_name] = attr_value

            tokens.append({
                'type': 'tag',
                'name': tag_name,
                'closing': is_closing,
                'attributes': attributes,
                'content': tag_content
            })

        # Add remaining text
        if last_end < len(content):
            text_content = content[last_end:]
            if text_content:
                tokens.append({'type': 'text', 'content': text_content})

        return tokens

    def _handle_tag(self, token: Dict, container: Container, link_text: Optional[str] = None):
        """Handle a single opening tag; closing tags are consumed by _handle_tag_with_content"""
        if token['closing']:
            return

        tag_name = token['name']
        attributes = token['attributes']

        # Handle opening tags
        if tag_name in ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
            # Headers
            size_map = {'h1': 24, 'h2': 20, 'h3': 18, 'h4': 16, 'h5': 14, 'h6': 12}
            font = self.font_stack[-1].with_size(size_map[tag_name]).with_weight(FontWeight.BOLD)
            self.font_stack.append(font)

        elif tag_name in ['b', 'strong']:
            # Bold text
            font = self.font_stack[-1].with_weight(FontWeight.BOLD)
            self.font_stack.append(font)

        elif tag_name in ['i', 'em']:
            # Italic text
            font = self.font_stack[-1].with_style(FontStyle.ITALIC)
            self.font_stack.append(font)

        elif tag_name == 'u':
            # Underlined text
            font = self.font_stack[-1].with_decoration(TextDecoration.UNDERLINE)
            self.font_stack.append(font)

        elif tag_name == 'a':
            # Links
            href = attributes.get('href', '#')
            title = attributes.get('title', href)

            # Determine link type: absolute http(s) URLs are external, everything else internal
            if href.startswith(('http://', 'https://')):
                link_type = LinkType.EXTERNAL
            else:
                link_type = LinkType.INTERNAL

            # Create link callback
            def link_callback(location, **kwargs):
                return f"Navigate to: {location}"

            link = Link(href, link_type, link_callback, title=title)
            link_font = self.font_stack[-1].with_colour((0, 0, 255)).with_decoration(TextDecoration.UNDERLINE)

            # Label the link with its enclosed text, falling back to the title (or href)
            renderable_link = RenderableLink(link, link_text or title, link_font)
            container.add_child(renderable_link)

        elif tag_name == 'img':
            # Images
            src = attributes.get('src', '')
            alt = attributes.get('alt', 'Image')
            width = attributes.get('width')
            height = attributes.get('height')

            if src:
                # Resolve relative URLs
                if self.base_url and not src.startswith(('http://', 'https://')):
                    if os.path.isdir(self.base_url):
                        src = os.path.join(self.base_url, src)
                    else:
                        src = urljoin(self.base_url, src)

                try:
                    # Create abstract image
                    from pyWebLayout.abstract.block import Image as AbstractImage
                    abstract_img = AbstractImage(src, alt)

                    # Parse dimensions if provided
                    max_width = int(width) if width and width.isdigit() else None
                    max_height = int(height) if height and height.isdigit() else None

                    renderable_img = RenderableImage(abstract_img, max_width, max_height)
                    container.add_child(renderable_img)

                except Exception:
                    # Add error text if image fails to load
                    error_text = Text(f"[Image Error: {alt}]", Font(colour=(255, 0, 0)))
                    container.add_child(error_text)

        elif tag_name == 'br':
            # Line breaks - add some vertical space
            spacer = Box((0, 0), (1, 10))
            container.add_child(spacer)

        elif tag_name == 'p':
            # Paragraphs - add some vertical space
            spacer = Box((0, 0), (1, 5))
            container.add_child(spacer)

        elif tag_name in ['div', 'span']:
            # Generic containers - just continue parsing
            pass
class BrowserWindow:
    """Main browser window using Tkinter"""

    def __init__(self):
        self.root = tk.Tk()
        self.root.title("pyWebLayout HTML Browser")
        self.root.geometry("900x700")

        self.current_page = None
        self.history = []
        self.history_index = -1
        self.page_elements = []  # Clickable elements; filled in by render_page()

        self.setup_ui()

    def setup_ui(self):
        """Setup the user interface"""
        # Create main frame
        main_frame = ttk.Frame(self.root)
        main_frame.pack(fill=tk.BOTH, expand=True, padx=5, pady=5)

        # Navigation frame
        nav_frame = ttk.Frame(main_frame)
        nav_frame.pack(fill=tk.X, pady=(0, 5))

        # Navigation buttons
        self.back_btn = ttk.Button(nav_frame, text="←", command=self.go_back, state=tk.DISABLED)
        self.back_btn.pack(side=tk.LEFT, padx=(0, 5))

        self.forward_btn = ttk.Button(nav_frame, text="→", command=self.go_forward, state=tk.DISABLED)
        self.forward_btn.pack(side=tk.LEFT, padx=(0, 5))

        self.refresh_btn = ttk.Button(nav_frame, text="⟳", command=self.refresh)
        self.refresh_btn.pack(side=tk.LEFT, padx=(0, 10))

        # Address bar
        ttk.Label(nav_frame, text="URL:").pack(side=tk.LEFT)
        self.url_var = tk.StringVar()
        self.url_entry = ttk.Entry(nav_frame, textvariable=self.url_var, width=50)
        self.url_entry.pack(side=tk.LEFT, fill=tk.X, expand=True, padx=(5, 5))
        self.url_entry.bind('<Return>', self.navigate_to_url)

        self.go_btn = ttk.Button(nav_frame, text="Go", command=self.navigate_to_url)
        self.go_btn.pack(side=tk.LEFT, padx=(0, 10))

        # File operations
        self.open_btn = ttk.Button(nav_frame, text="Open File", command=self.open_file)
        self.open_btn.pack(side=tk.LEFT)

        # Content frame with scrollbars
        content_frame = ttk.Frame(main_frame)
        content_frame.pack(fill=tk.BOTH, expand=True)

        # Create canvas with scrollbars
        self.canvas = tk.Canvas(content_frame, bg='white')

        v_scrollbar = ttk.Scrollbar(content_frame, orient=tk.VERTICAL, command=self.canvas.yview)
        h_scrollbar = ttk.Scrollbar(content_frame, orient=tk.HORIZONTAL, command=self.canvas.xview)

        self.canvas.configure(yscrollcommand=v_scrollbar.set, xscrollcommand=h_scrollbar.set)

        v_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
        h_scrollbar.pack(side=tk.BOTTOM, fill=tk.X)
        self.canvas.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)

        # Status bar
        self.status_var = tk.StringVar(value="Ready")
        status_bar = ttk.Label(main_frame, textvariable=self.status_var, relief=tk.SUNKEN)
        status_bar.pack(fill=tk.X, pady=(5, 0))

        # Bind mouse events
        self.canvas.bind('<Button-1>', self.on_click)
        self.canvas.bind('<Motion>', self.on_mouse_move)

        # Load default page
        self.load_default_page()

    def load_default_page(self):
        """Load a default welcome page"""
        html_content = """
        <html>
        <head><title>pyWebLayout Browser - Welcome</title></head>
        <body>
        <h1>Welcome to pyWebLayout Browser</h1>
        <p>This is a simple HTML browser built using pyWebLayout components.</p>

        <h2>Features:</h2>
        <ul>
        <li>Basic HTML rendering</li>
        <li>Text formatting (bold, italic, underline)</li>
        <li>Headers (H1-H6)</li>
        <li>Links (clickable)</li>
        <li>Images</li>
        </ul>

        <h2>Try these features:</h2>
        <p><b>Bold text</b>, <i>italic text</i>, and <u>underlined text</u></p>

        <p>Sample link: <a href="https://www.example.com" title="External link">Visit Example.com</a></p>

        <h3>File Operations</h3>
        <p>Use the "Open File" button to load local HTML files.</p>

        <p>Or enter a URL in the address bar above.</p>
        </body>
        </html>
        """

        parser = HTMLParser()
        self.current_page = parser.parse_html_string(html_content)
        self.render_page()
        self.status_var.set("Welcome page loaded")
    def navigate_to_url(self, event=None, record_history=True):
        """Navigate to the URL in the address bar.

        Back/forward/refresh pass record_history=False so that revisiting an
        entry does not truncate or duplicate the history list.
        """
        url = self.url_var.get().strip()
        if not url:
            return

        self.status_var.set(f"Loading {url}...")
        self.root.update()

        try:
            if url.startswith(('http://', 'https://')):
                # Web URL
                response = requests.get(url, timeout=10)
                response.raise_for_status()
                html_content = response.text

                parser = HTMLParser()
                self.current_page = parser.parse_html_string(html_content, url)

            elif os.path.isfile(url):
                # Local file
                parser = HTMLParser()
                self.current_page = parser.parse_html_file(url)

            else:
                # Try to treat as a local file path
                if not url.startswith('file://'):
                    url = 'file://' + os.path.abspath(url)

                file_path = url[len('file://'):]
                if os.path.isfile(file_path):
                    parser = HTMLParser()
                    self.current_page = parser.parse_html_file(file_path)
                else:
                    raise FileNotFoundError(f"File not found: {file_path}")

            # Add to history (skipped when moving through existing history)
            if record_history:
                self.add_to_history(url)
            self.render_page()
            self.status_var.set(f"Loaded {url}")

        except Exception as e:
            self.status_var.set(f"Error loading {url}: {str(e)}")
            messagebox.showerror("Error", f"Failed to load {url}:\n{str(e)}")

    def open_file(self):
        """Open a local HTML file"""
        file_path = filedialog.askopenfilename(
            title="Open HTML File",
            filetypes=[("HTML files", "*.html *.htm"), ("All files", "*.*")]
        )

        if file_path:
            self.url_var.set(file_path)
            self.navigate_to_url()

    def render_page(self):
        """Render the current page to the canvas"""
        if not self.current_page:
            return

        # Clear canvas
        self.canvas.delete("all")

        # Render the page to PIL Image
        page_image = self.current_page.render()

        # Convert to PhotoImage (keep a reference so Tk does not garbage-collect it)
        self.photo = ImageTk.PhotoImage(page_image)

        # Display on canvas
        self.canvas.create_image(0, 0, anchor=tk.NW, image=self.photo)

        # Update scroll region
        self.canvas.configure(scrollregion=self.canvas.bbox("all"))

        # Store page elements for interaction
        self.page_elements = self._get_clickable_elements(self.current_page)

    def _get_clickable_elements(self, container, offset=(0, 0)) -> List[Tuple]:
        """Get list of clickable elements with their positions"""
        elements = []

        if hasattr(container, '_children'):
            for child in container._children:
                if hasattr(child, '_origin'):
                    child_offset = (offset[0] + child._origin[0], offset[1] + child._origin[1])

                    # Check if element is clickable
                    if isinstance(child, (RenderableLink, RenderableButton)):
                        elements.append((child, child_offset, child._size))

                    # Recursively check children
                    if hasattr(child, '_children'):
                        elements.extend(self._get_clickable_elements(child, child_offset))

        return elements

    def on_click(self, event):
        """Handle mouse clicks on the canvas"""
        # Convert canvas coordinates to image coordinates
        canvas_x = self.canvas.canvasx(event.x)
        canvas_y = self.canvas.canvasy(event.y)

        # Check if click is on any clickable element
        for element, offset, size in self.page_elements:
            element_x, element_y = offset
            element_w, element_h = size

            if (element_x <= canvas_x <= element_x + element_w and
                    element_y <= canvas_y <= element_y + element_h):

                # Handle the click
                if isinstance(element, RenderableLink):
                    result = element._callback()
                    if result:
                        self.status_var.set(result)
                    # For external links, open in system browser
                    if element._link.link_type == LinkType.EXTERNAL:
                        webbrowser.open(element._link.location)

                elif isinstance(element, RenderableButton):
                    result = element._callback()
                    if result:
                        self.status_var.set(f"Button clicked: {result}")

                break

    def on_mouse_move(self, event):
        """Handle mouse movement for hover effects"""
        # Convert canvas coordinates to image coordinates
        canvas_x = self.canvas.canvasx(event.x)
        canvas_y = self.canvas.canvasy(event.y)

        # Check if mouse is over any clickable element
        cursor = "arrow"
        for element, offset, size in self.page_elements:
            element_x, element_y = offset
            element_w, element_h = size

            if (element_x <= canvas_x <= element_x + element_w and
                    element_y <= canvas_y <= element_y + element_h):
                cursor = "hand2"
                break

        self.canvas.configure(cursor=cursor)

    def add_to_history(self, url):
        """Add URL to navigation history"""
        # Remove any forward history
        self.history = self.history[:self.history_index + 1]

        # Add new URL
        self.history.append(url)
        self.history_index = len(self.history) - 1

        # Update navigation buttons
        self.update_nav_buttons()

    def update_nav_buttons(self):
        """Update the state of navigation buttons"""
        self.back_btn.configure(state=tk.NORMAL if self.history_index > 0 else tk.DISABLED)
        self.forward_btn.configure(state=tk.NORMAL if self.history_index < len(self.history) - 1 else tk.DISABLED)

    def go_back(self):
        """Navigate back in history without recording a new entry"""
        if self.history_index > 0:
            self.history_index -= 1
            url = self.history[self.history_index]
            self.url_var.set(url)
            self.navigate_to_url(record_history=False)
            self.update_nav_buttons()

    def go_forward(self):
        """Navigate forward in history without recording a new entry"""
        if self.history_index < len(self.history) - 1:
            self.history_index += 1
            url = self.history[self.history_index]
            self.url_var.set(url)
            self.navigate_to_url(record_history=False)
            self.update_nav_buttons()

    def refresh(self):
        """Refresh the current page"""
        if self.current_page:
            current_url = self.url_var.get()
            if current_url:
                self.navigate_to_url(record_history=False)
            else:
                self.load_default_page()

    def run(self):
        """Start the browser"""
        self.root.mainloop()


def main():
    """Main function to run the browser"""
    print("Starting pyWebLayout HTML Browser...")

    try:
        browser = BrowserWindow()
        browser.run()
    except Exception as e:
        print(f"Error starting browser: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    main()
@@ -43,13 +43,35 @@ class Text(Renderable, Queriable):
         # The bounding box is (left, top, right, bottom)
         try:
             bbox = font.getbbox(self._text)
-            self._width = bbox[2] - bbox[0]
-            self._height = bbox[3] - bbox[1]
+            # Width is the difference between right and left
+            self._width = max(1, bbox[2] - bbox[0])
+            # Height needs to account for potential negative top values
+            # Use the full height from top to bottom, ensuring positive values
+            top = min(0, bbox[1])  # Account for negative ascenders
+            bottom = max(bbox[3], bbox[1] + font.size)  # Ensure minimum height
+            self._height = max(font.size, bottom - top)
             self._size = (self._width, self._height)
+
+            # Store the offset for proper text positioning
+            self._text_offset_x = max(0, -bbox[0])
+            self._text_offset_y = max(0, -top)
+
         except AttributeError:
             # Fallback for older PIL versions
-            self._width, self._height = font.getsize(self._text)
-            self._size = (self._width, self._height)
+            try:
+                self._width, self._height = font.getsize(self._text)
+                # Add some padding to prevent cropping
+                self._height = max(self._height, int(self._style.font_size * 1.2))
+                self._size = (self._width, self._height)
+                self._text_offset_x = 0
+                self._text_offset_y = 0
+            except Exception:
+                # Ultimate fallback
+                self._width = len(self._text) * self._style.font_size // 2
+                self._height = int(self._style.font_size * 1.2)
+                self._size = (self._width, self._height)
+                self._text_offset_x = 0
+                self._text_offset_y = 0
 
     @property
     def text(self) -> str:
@@ -123,8 +145,10 @@ class Text(Renderable, Queriable):
         if self._style.background and self._style.background[3] > 0:  # If alpha > 0
             draw.rectangle([(0, 0), self._size], fill=self._style.background)
 
-        # Draw the text
-        draw.text((0, 0), self._text, font=self._style.font, fill=self._style.colour)
+        # Draw the text using calculated offsets to prevent cropping
+        text_x = getattr(self, '_text_offset_x', 0)
+        text_y = getattr(self, '_text_offset_y', 0)
+        draw.text((text_x, text_y), self._text, font=self._style.font, fill=self._style.colour)
 
         # Apply any text decorations
         self._apply_decoration(draw)
|
||||
|
||||
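The new sizing code derives a width, a height and two draw offsets from the `(left, top, right, bottom)` box returned by the font's `getbbox()` (Pillow). The function below restates that arithmetic without Pillow so the rules can be checked with plain tuples. `measure_from_bbox` is a hypothetical helper written for illustration. It is not part of pyWebLayout.

```python
from typing import Tuple


def measure_from_bbox(
    bbox: Tuple[int, int, int, int], font_size: int
) -> Tuple[int, int, int, int]:
    """Return (width, height, offset_x, offset_y) using the rules from the diff.

    Hypothetical helper: mirrors the Text sizing logic, not a real pyWebLayout API.
    """
    left, top_raw, right, bottom_raw = bbox
    # Width is right minus left, never below 1 pixel.
    width = max(1, right - left)
    # Clamp the top to at most 0 so glyph parts above the origin are included.
    top = min(0, top_raw)
    # Guarantee at least one font_size of height below the reported top.
    bottom = max(bottom_raw, top_raw + font_size)
    height = max(font_size, bottom - top)
    # Shift drawing right/down by any negative left/top so nothing is cropped.
    offset_x = max(0, -left)
    offset_y = max(0, -top)
    return width, height, offset_x, offset_y


print(measure_from_bbox((0, 3, 50, 15), 12))    # (50, 15, 0, 0)
print(measure_from_bbox((-2, -4, 40, 10), 12))  # (42, 14, 2, 4)
```

In the second hunk, these offsets become the `(x, y)` origin passed to `draw.text`. That is why `(0, 0)` is replaced with `(text_x, text_y)`.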
46
test_page.html
Normal file
@ -0,0 +1,46 @@
<!DOCTYPE html>
<html>
<head>
    <title>Test Page for pyWebLayout Browser</title>
</head>
<body>
    <h1>pyWebLayout Browser Test Page</h1>
    <h3>Images</h3>
    <p>Here's a sample image:</p>
    <img src="tests/data/sample_image.jpg" alt="Sample Image" width="200" height="150">
    <h2>Text Formatting</h2>
    <p>This is a paragraph with <b>bold text</b>, <i>italic text</i>, and <u>underlined text</u>.</p>

    <h3>Links</h3>
    <p>Here are some test links:</p>
    <ul>
        <li><a href="https://www.google.com" title="Google">External link to Google</a></li>
        <li><a href="#section1" title="Section 1">Internal link to Section 1</a></li>
    </ul>

    <h3>Headers</h3>
    <h1>H1 Header</h1>
    <h2>H2 Header</h2>
    <h3>H3 Header</h3>
    <h4>H4 Header</h4>
    <h5>H5 Header</h5>
    <h6>H6 Header</h6>

    <h3>Line Breaks and Paragraphs</h3>
    <p>This is the first paragraph.</p>
    <br>
    <p>This is the second paragraph after a line break.</p>

    <h3 id="section1">Section 1</h3>
    <p>This is the content of section 1. You can link to this section using the internal link above.</p>

    <h3>Images</h3>
    <p>Here's a sample image:</p>
    <img src="tests/data/sample_image.jpg" alt="Sample Image" width="200" height="150">

    <h3>Mixed Content</h3>
    <p>This paragraph contains <b>bold</b> and <i>italic</i> text, as well as an <a href="https://www.example.com">external link</a>.</p>

    <p><strong>Strong text</strong> and <em>emphasized text</em> should also work.</p>
</body>
</html>
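To check that `test_page.html` exercises each feature the README lists (H1-H6 headers, bold/italic/underline, internal and external links, images), you could run a quick tag inventory with Python's standard `html.parser` module. `TagInventory` is a hypothetical helper and is not part of pyWebLayout. It also says nothing about how pyWebLayout itself parses the page.

```python
from collections import Counter
from html.parser import HTMLParser


class TagInventory(HTMLParser):
    """Hypothetical helper: counts start tags and sorts link targets."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = Counter()
        self.internal_links = []
        self.external_links = []

    def handle_starttag(self, tag, attrs):
        # Count every start tag, including void elements such as <img> and <br>.
        self.tags[tag] += 1
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # Anchor without a target
        # Fragment-only hrefs such as "#section1" point inside the same page.
        if href.startswith("#"):
            self.internal_links.append(href)
        else:
            self.external_links.append(href)


sample = (
    '<h1>T</h1><p><b>x</b> <a href="#section1">in</a> '
    '<a href="https://www.example.com">out</a></p><img src="a.jpg">'
)
inv = TagInventory()
inv.feed(sample)
print(inv.tags["a"], inv.internal_links, inv.external_links)
# prints: 2 ['#section1'] ['https://www.example.com']
```

On the full page, `inv.tags` would show all six heading levels, `b`/`i`/`u` plus `strong`/`em`, and two `img` start tags, because the "Images" section appears twice.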
6892
tests/data/Kimi Räikkönen - Wikipedia.html
Normal file
BIN
tests/data/Kimi Räikkönen - Wikipedia_files/Edit-clear.svg.webp
Normal file
1
tests/data/Kimi Räikkönen - Wikipedia_files/load.css
Normal file
23
tests/data/Kimi Räikkönen - Wikipedia_files/load.js
Normal file
1
tests/data/Kimi Räikkönen - Wikipedia_files/load_002.css
Normal file
@ -393,7 +393,7 @@ class TestRenderableFormField(unittest.TestCase):
     """ TODO: Fix test
     @patch('PIL.ImageDraw.Draw')
     def test_render_field_with_value(self, mock_draw_class):
-        """Test rendering field with value"""
+        #Test rendering field with value
         mock_draw = Mock()
         mock_draw_class.return_value = mock_draw
 
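The change above replaces the inner docstring with a comment because the disabled test is wrapped in a `""" TODO: Fix test` string literal. A nested `"""` would end that string early and leave the rest of the method as stray code. The sketch below shows the standard alternative, `unittest.skip`, which keeps the test parseable and reports it as skipped. The class name is a placeholder, not the project's `TestRenderableFormField`. The `patch` target is resolved only when the test runs, so a skipped test never imports PIL.

```python
import unittest
from unittest.mock import Mock, patch


class TestFormFieldPlaceholder(unittest.TestCase):
    """Placeholder class standing in for TestRenderableFormField."""

    @unittest.skip("TODO: Fix test")
    @patch("PIL.ImageDraw.Draw")
    def test_render_field_with_value(self, mock_draw_class):
        """Test rendering field with value"""  # A docstring is safe here
        mock_draw = Mock()
        mock_draw_class.return_value = mock_draw
```

With this approach, the test runner lists the test under "skipped" with the TODO reason instead of silently dropping it.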
118
tests/test_html_file_loader.py
Normal file
@ -0,0 +1,118 @@
"""
Test module for loading HTML files using the html_extraction module.

This test verifies that HTML files can be loaded from disk and processed
using the html_extraction.parse_html_string function.
"""

import os
import unittest
from pyWebLayout.io.readers.html_extraction import parse_html_string
from pyWebLayout.abstract.block import Block
from pyWebLayout.style import Font


class TestHTMLFileLoader(unittest.TestCase):
    """Test class for HTML file loading functionality."""

    def test_load_html_file(self):
        """Test loading and parsing an HTML file from disk."""
        # Path to the test HTML file
        html_file_path = os.path.join("tests", "data", "Kimi Räikkönen - Wikipedia.html")

        # Verify the test file exists
        self.assertTrue(os.path.exists(html_file_path), f"Test HTML file not found: {html_file_path}")

        # Read the HTML file
        with open(html_file_path, 'r', encoding='utf-8') as file:
            html_content = file.read()

        # Verify we got some content
        self.assertGreater(len(html_content), 0, "HTML file should not be empty")

        # Parse the HTML content using the html_extraction module
        try:
            blocks = parse_html_string(html_content)
        except Exception as e:
            self.fail(f"Failed to parse HTML file: {e}")

        # Verify we got some blocks
        self.assertIsInstance(blocks, list, "parse_html_string should return a list")
        self.assertGreater(len(blocks), 0, "Should extract at least one block from the HTML file")

        # Verify all returned items are Block instances
        for i, block in enumerate(blocks):
            self.assertIsInstance(block, Block, f"Item {i} should be a Block instance, got {type(block)}")

        print(f"Successfully loaded and parsed HTML file with {len(blocks)} blocks")

    def test_load_html_file_with_custom_font(self):
        """Test loading HTML file with a custom base font."""
        html_file_path = os.path.join("tests", "data", "Kimi Räikkönen - Wikipedia.html")

        # Skip if file doesn't exist
        if not os.path.exists(html_file_path):
            self.skipTest(f"Test HTML file not found: {html_file_path}")

        # Create a custom font
        custom_font = Font(font_size=14, colour=(100, 100, 100))

        # Read and parse with custom font
        with open(html_file_path, 'r', encoding='utf-8') as file:
            html_content = file.read()

        blocks = parse_html_string(html_content, base_font=custom_font)

        # Verify we got blocks
        self.assertGreater(len(blocks), 0, "Should extract blocks with custom font")

        print(f"Successfully parsed HTML file with custom font, got {len(blocks)} blocks")

    def test_load_html_file_content_types(self):
        """Test that the loaded HTML file contains expected content types."""
        html_file_path = os.path.join("tests", "data", "Kimi Räikkönen - Wikipedia.html")

        # Skip if file doesn't exist
        if not os.path.exists(html_file_path):
            self.skipTest(f"Test HTML file not found: {html_file_path}")

        with open(html_file_path, 'r', encoding='utf-8') as file:
            html_content = file.read()

        blocks = parse_html_string(html_content)

        # Check that we have different types of blocks
        block_type_names = [type(block).__name__ for block in blocks]
        unique_types = set(block_type_names)

        # A Wikipedia page should contain multiple types of content
        self.assertGreater(len(unique_types), 1, "Should have multiple types of blocks in Wikipedia page")

        print(f"Found block types: {sorted(unique_types)}")

    def test_html_file_size_handling(self):
        """Test that large HTML files can be handled gracefully."""
        html_file_path = os.path.join("tests", "data", "Kimi Räikkönen - Wikipedia.html")

        # Skip if file doesn't exist
        if not os.path.exists(html_file_path):
            self.skipTest(f"Test HTML file not found: {html_file_path}")

        # Get file size
        file_size = os.path.getsize(html_file_path)
        print(f"HTML file size: {file_size} bytes")

        # Read and parse
        with open(html_file_path, 'r', encoding='utf-8') as file:
            html_content = file.read()

        # This should not raise an exception even for large files
        blocks = parse_html_string(html_content)

        # Basic verification
        self.assertIsInstance(blocks, list)
        print(f"Successfully processed {file_size} byte file into {len(blocks)} blocks")


if __name__ == '__main__':
    unittest.main()
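Each test above builds the path `tests/data/...` relative to the current working directory and repeats the same exists, skip and read steps. As a result, the suite finds the fixture only when it is run from the repository root. The sketch below shows one way to factor that out by resolving paths against the test module's own directory. `default_data_dir`, `read_fixture` and the `base_dir` parameter are hypothetical names, not existing pyWebLayout code. The sketch also assumes the fixtures sit in a `data` folder beside the test module, as `tests/data` does here.

```python
import os
import unittest
from typing import Optional


def default_data_dir() -> str:
    # Assumption: fixtures live in a "data" folder next to this test module.
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), "data")


def read_fixture(
    testcase: unittest.TestCase, name: str, base_dir: Optional[str] = None
) -> str:
    """Return a fixture's text, or skip the calling test if the file is missing."""
    path = os.path.join(base_dir or default_data_dir(), name)
    if not os.path.exists(path):
        # skipTest raises unittest.SkipTest, so the read below never runs.
        testcase.skipTest(f"Test HTML file not found: {path}")
    with open(path, "r", encoding="utf-8") as file:
        return file.read()
```

Inside a test method, the four repeated steps would then reduce to `html_content = read_fixture(self, "Kimi Räikkönen - Wikipedia.html")`, and the path would no longer depend on the directory `unittest` is launched from.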