add some additional tests

2025-06-07 20:16:38 +02:00 · 899182152a
commit 899182152a
parent c981fbd1c0
83 changed files with 8168 additions and 928 deletions
--- a/BROWSER_README.md
+++ b/BROWSER_README.md
@@ -0,0 +1,143 @@
+# pyWebLayout HTML Browser
+
+A simple HTML browser built on the pyWebLayout library, using components from `pyWebLayout/concrete/`, `pyWebLayout/abstract/`, and `pyWebLayout/style/`.
+
+## Features
+
+This browser demonstrates the capabilities of pyWebLayout by implementing:
+
+### Rendering Components
+- **Text rendering** with various formatting (bold, italic, underline)
+- **Headers** (H1-H6) with proper sizing and styling
+- **Links** (clickable, with external browser opening for external URLs)
+- **Images** (local files and web URLs with error handling)
+- **Layout containers** for proper element positioning
+- **Basic HTML parsing** and element conversion
+
+### User Interface
+- **Navigation controls**: Back, Forward, Refresh buttons
+- **Address bar**: Enter URLs or file paths
+- **File browser**: Open local HTML files
+- **Scrollable content area** with both vertical and horizontal scrollbars
+- **Mouse interaction**: Clickable links with hover effects
+- **Status bar**: Shows current operation status
+
+## Usage
+
+### Starting the Browser
+```bash
+python html_browser.py
+```
+
+### Loading Content
+
+1. **View the welcome page**: The browser starts with a built-in welcome page that showcases the supported features
+2. **Open local files**: Click "Open File" to browse and select HTML files
+3. **Enter URLs**: Type URLs in the address bar and press Enter or click "Go"
+4. **Navigate**: Use back/forward buttons to navigate through history
+
+### Test Files
+
+- `test_page.html` - A comprehensive test page demonstrating all supported features including:
+  - Text formatting (bold, italic, underline)
+  - Headers of all levels (H1-H6)
+  - Links (both internal and external)
+  - Images (includes the sample image from tests/data/)
+  - Line breaks and paragraphs
+
+## Architecture
+
+### HTML Parser (`HTMLParser` class)
+- Simple regex-based HTML tokenizer
+- Converts HTML elements to pyWebLayout abstract objects
+- Handles font styling with a font stack for nested formatting
+- Supports basic HTML tags: h1-h6, b, strong, i, em, u, a, img, br, p, div, span
+
+### Browser Window (`BrowserWindow` class)
+- Tkinter-based GUI with navigation controls
+- Canvas-based rendering of pyWebLayout Page objects
+- Mouse event handling for interactive elements
+- Navigation history management
+- File and URL loading capabilities
+
+### pyWebLayout Integration
+
+The browser uses these pyWebLayout components:
+
+#### From `pyWebLayout/concrete/`:
+- `Page` - Top-level container for web page content
+- `Container` - Layout management for multiple elements
+- `Box` - Basic rectangular container with positioning
+- `Text` - Text rendering with font styling
+- `RenderableImage` - Image loading and display with scaling
+- `RenderableLink` - Interactive link elements
+- `RenderableButton` - Interactive button elements
+
+#### From `pyWebLayout/abstract/`:
+- `Link` - Abstract link representation with types (internal, external, API, function)
+- `Image` - Abstract image representation with dimensions and loading
+- Font and styling classes for text appearance
+
+#### From `pyWebLayout/style/`:
+- `Font` - Font management with size, weight, style, and decoration
+- `FontWeight`, `FontStyle`, `TextDecoration` - Typography enums
+- `Alignment` - Layout positioning options
+
+## Supported HTML Features
+
+### Text Elements
+- `<h1>` to `<h6>` - Headers with appropriate sizing
+- `<p>` - Paragraphs with spacing
+- `<b>`, `<strong>` - Bold text
+- `<i>`, `<em>` - Italic text
+- `<u>` - Underlined text
+- `<br>` - Line breaks
+
+### Interactive Elements
+- `<a href="...">` - Links (opens external URLs in system browser)
+
+### Media Elements
+- `<img src="..." alt="..." width="..." height="...">` - Images with scaling
+
+### Container Elements
+- `<div>`, `<span>` - Generic containers (parsed but not specially styled)
+
+## Example Usage
+
+```python
+# Start the browser
+from html_browser import BrowserWindow
+
+browser = BrowserWindow()
+browser.run()
+```
+
+## Limitations
+
+This is a demonstration browser with simplified HTML parsing:
+- No CSS support (styling is done through pyWebLayout components)
+- No JavaScript execution
+- Limited HTML tag support
+- No form submission (forms can be rendered but not submitted)
+- No advanced layout features (flexbox, grid, etc.)
+
+## Dependencies
+
+- `tkinter` - GUI framework (usually included with Python)
+- `PIL` (Pillow) - Image processing
+- `requests` - HTTP requests for web URLs
+- `pyWebLayout` - The core layout and rendering library
+
+## Testing
+
+Load `test_page.html` to see all supported features in action:
+1. Run the browser: `python html_browser.py`
+2. Click "Open File" and select `test_page.html`
+3. Explore the different text formatting, links, and image rendering
+
+The test page includes:
+- Various header levels
+- Text formatting examples
+- Clickable links (try the Google link!)
+- A sample image from the test data
+- Mixed content demonstrations
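The Architecture section of the README above describes a regex-based tokenizer that keeps a font stack for nested formatting. A minimal standalone sketch of that idea follows. `Style`, `derive`, and `styled_runs` are hypothetical names used only for illustration, and `Style` is a stand-in for pyWebLayout's `Font`, not the library's API.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Style:
    """Hypothetical stand-in for pyWebLayout's Font (not the library's API)."""
    size: int = 14
    bold: bool = False
    italic: bool = False
    underline: bool = False


HEADER_SIZES = {'h1': 24, 'h2': 20, 'h3': 18, 'h4': 16, 'h5': 14, 'h6': 12}
STYLE_TAGS = {'b', 'strong', 'i', 'em', 'u'} | set(HEADER_SIZES)
TAG_RE = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>')


def derive(style, tag):
    """Return a copy of `style` with the formatting implied by `tag` applied."""
    if tag in ('b', 'strong'):
        return replace(style, bold=True)
    if tag in ('i', 'em'):
        return replace(style, italic=True)
    if tag == 'u':
        return replace(style, underline=True)
    return replace(style, size=HEADER_SIZES[tag], bold=True)


def styled_runs(html):
    """Split `html` into (text, Style) pairs, tracking nesting with a stack."""
    stack = [Style()]  # the default style always stays at the bottom
    runs = []
    pos = 0
    for match in TAG_RE.finditer(html):
        text = html[pos:match.start()].strip()
        if text:
            runs.append((text, stack[-1]))
        pos = match.end()
        closing, name = bool(match.group(1)), match.group(2).lower()
        if name not in STYLE_TAGS:
            continue  # tags that do not affect text styling are skipped here
        if closing:
            if len(stack) > 1:  # never pop the default style
                stack.pop()
        else:
            stack.append(derive(stack[-1], name))
    tail = html[pos:].strip()
    if tail:
        runs.append((tail, stack[-1]))
    return runs


for text, style in styled_runs("plain <b>bold <i>both</i> bold</b> plain"):
    print(repr(text), style)
```

Each opening tag pushes a style derived from the top of the stack and each closing tag pops it, so text nested inside both `<b>` and `<i>` picks up both attributes, while text after the closing tags falls back to the default.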
--- a/coverage-docs.svg
+++ b/coverage-docs.svg
@@ -1,5 +1,5 @@
 <svg width="140" height="20" viewBox="0 0 140 20" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve" xmlns:serif="http://www.serif.com/" style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;stroke-miterlimit:2;">
-    <title>interrogate: 94.6%</title>
+    <title>interrogate: 92.0%</title>
    <g transform="matrix(1,0,0,1,22,0)">
        <g id="backgrounds" transform="matrix(1.32789,0,0,1,-22.3892,0)">
            <rect x="0" y="0" width="71" height="20" style="fill:rgb(85,85,85);"/>
@@ -12,8 +12,8 @@
    <g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,sans-serif" font-size="110">
        <text x="590" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="610">interrogate</text>
        <text x="590" y="140" transform="scale(.1)" textLength="610">interrogate</text>
-        <text x="1160" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="370" data-interrogate="result">94.6%</text>
-        <text x="1160" y="140" transform="scale(.1)" textLength="370" data-interrogate="result">94.6%</text>
+        <text x="1160" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="370" data-interrogate="result">92.0%</text>
+        <text x="1160" y="140" transform="scale(.1)" textLength="370" data-interrogate="result">92.0%</text>
    </g>
    <g id="logo-shadow" serif:id="logo shadow" transform="matrix(0.854876,0,0,0.854876,-6.73514,1.732)">
        <g transform="matrix(0.299012,0,0,0.299012,9.70229,-6.68582)">
--- a/coverage-summary.txt
+++ b/coverage-summary.txt
@@ -1 +1 @@
-41.1%
+57.0%
--- a/coverage.json
+++ b/coverage.json
--- a/coverage.svg
+++ b/coverage.svg
@@ -15,7 +15,7 @@
    <g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,sans-serif" font-size="11">
        <text x="31.5" y="15" fill="#010101" fill-opacity=".3">coverage</text>
        <text x="31.5" y="14">coverage</text>
-        <text x="80" y="15" fill="#010101" fill-opacity=".3">47%</text>
-        <text x="80" y="14">47%</text>
+        <text x="80" y="15" fill="#010101" fill-opacity=".3">57%</text>
+        <text x="80" y="14">57%</text>
    </g>
 </svg>
--- a/coverage.xml
+++ b/coverage.xml
--- a/html_browser.py
+++ b/html_browser.py
@@ -0,0 +1,642 @@
+#!/usr/bin/env python3
+"""
+Simple HTML Browser using pyWebLayout
+
+This browser can render basic HTML content using the pyWebLayout concrete objects.
+It supports text, images, links, forms, and basic styling.
+"""
+
+import re
+import tkinter as tk
+from tkinter import ttk, messagebox, filedialog, simpledialog
+from PIL import Image, ImageTk
+from typing import Dict, List, Optional, Tuple, Any
+import webbrowser
+import os
+from urllib.parse import urljoin, urlparse
+import requests
+from io import BytesIO
+
+# Import pyWebLayout components
+from pyWebLayout.concrete import (
+    Page, Container, Box, Text, RenderableImage, 
+    RenderableLink, RenderableButton, RenderableForm, RenderableFormField
+)
+from pyWebLayout.abstract.functional import (
+    Link, Button, Form, FormField, LinkType, FormFieldType
+)
+from pyWebLayout.style.fonts import Font, FontWeight, FontStyle, TextDecoration
+from pyWebLayout.style.layout import Alignment
+
+
+class HTMLParser:
+    """Simple HTML parser that converts HTML to pyWebLayout objects"""
+    
+    def __init__(self):
+        self.font_stack = [Font(font_size=14)]  # Default font
+        self.current_container = None
+        
+    def parse_html_string(self, html_content: str, base_url: str = "") -> Page:
+        """Parse HTML string and return a Page object"""
+        # Create the main page
+        page = Page(size=(800, 1600), background_color=(255, 255, 255))
+        self.current_container = page
+        self.base_url = base_url
+        
+        # Simple HTML parsing using regex (not production-ready, but works for demo)
+        # Remove comments and scripts
+        html_content = re.sub(r'<!--.*?-->', '', html_content, flags=re.DOTALL)
+        html_content = re.sub(r'<script.*?</script>', '', html_content, flags=re.DOTALL | re.IGNORECASE)
+        html_content = re.sub(r'<style.*?</style>', '', html_content, flags=re.DOTALL | re.IGNORECASE)
+        
+        # Extract title
+        title_match = re.search(r'<title>(.*?)</title>', html_content, re.IGNORECASE)
+        if title_match:
+            page.title = title_match.group(1)
+        
+        # Extract body content
+        body_match = re.search(r'<body[^>]*>(.*?)</body>', html_content, re.DOTALL | re.IGNORECASE)
+        if body_match:
+            body_content = body_match.group(1)
+        else:
+            # If no body tag, use the entire content
+            body_content = html_content
+        
+        # Parse the body content
+        self._parse_content(body_content, page)
+        
+        return page
+    
+    def parse_html_file(self, file_path: str) -> Page:
+        """Parse HTML file and return a Page object"""
+        try:
+            with open(file_path, 'r', encoding='utf-8') as f:
+                html_content = f.read()
+            base_url = os.path.dirname(os.path.abspath(file_path))
+            return self.parse_html_string(html_content, base_url)
+        except Exception as e:
+            # Create error page
+            page = Page(size=(800, 1600), background_color=(255, 255, 255))
+            error_text = Text(f"Error loading file: {str(e)}", Font(font_size=16, colour=(255, 0, 0)))
+            page.add_child(error_text)
+            return page
+    
+    def _parse_content(self, content: str, container: Container):
+        """Parse HTML content and add elements to container"""
+        # Simple token-based parsing
+        tokens = self._tokenize_html(content)
+        
+        i = 0
+        while i < len(tokens):
+            token = tokens[i]
+            
+            if token['type'] == 'text':
+                if token['content'].strip():  # Only add non-empty text
+                    text_obj = Text(token['content'].strip(), self.font_stack[-1])
+                    container.add_child(text_obj)
+            
+            elif token['type'] == 'tag':
+                # Handle the tag and potentially parse content between opening and closing tags
+                i = self._handle_tag_with_content(token, tokens, i, container)
+                continue
+            
+            i += 1
+    
+    def _handle_tag_with_content(self, token, tokens, current_index, container):
+        """Handle tags and their content, returning the new index position"""
+        tag_name = token['name']
+        is_closing = token['closing']
+        
+        if is_closing:
+            # Handle closing tags
+            if tag_name in ['b', 'strong', 'i', 'em', 'u', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
+                if len(self.font_stack) > 1:  # Don't pop the last font
+                    self.font_stack.pop()
+            return current_index + 1
+        
+        # For opening tags that affect text styling, parse their content with the new style
+        if tag_name in ['b', 'strong', 'i', 'em', 'u', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
+            # Remember the stack depth, then push the new font onto the stack
+            base_depth = len(self.font_stack)
+            self._handle_tag(token, container)
+            # Find the matching closing tag and parse content in between
+            content_start = current_index + 1
+            content_end = self._find_matching_closing_tag(tokens, current_index, tag_name)
+            # Parse the enclosed tokens with whatever font is on top of the stack
+            for j in range(content_start, content_end):
+                content_token = tokens[j]
+                if content_token['type'] == 'text':
+                    if content_token['content'].strip():
+                        text_obj = Text(content_token['content'].strip(), self.font_stack[-1])
+                        container.add_child(text_obj)
+                elif content_token['closing']:
+                    # Nested closing tag: pop only a font pushed inside this element
+                    if len(self.font_stack) > base_depth + 1:
+                        self._handle_tag_with_content(content_token, tokens, j, container)
+                else:
+                    self._handle_tag(content_token, container)
+
+            # Restore the stack to its depth before this tag (drops unclosed nested fonts)
+            del self.font_stack[base_depth:]
+            
+            return content_end + 1 if content_end < len(tokens) else len(tokens)
+        
+        else:
+            # Handle other tags normally
+            self._handle_tag(token, container)
+            return current_index + 1
+    
+    def _find_matching_closing_tag(self, tokens, start_index, tag_name):
+        """Find the index of the matching closing tag"""
+        open_count = 1
+        i = start_index + 1
+        
+        while i < len(tokens) and open_count > 0:
+            token = tokens[i]
+            if token['type'] == 'tag' and token['name'] == tag_name:
+                if token['closing']:
+                    open_count -= 1
+                else:
+                    open_count += 1
+            i += 1
+        
+        return i - 1 if open_count == 0 else len(tokens)
+    
+    def _tokenize_html(self, content: str) -> List[Dict]:
+        """Simple HTML tokenizer"""
+        tokens = []
+        tag_pattern = r'<(/?)([^>]+)>'
+        
+        last_end = 0
+        for match in re.finditer(tag_pattern, content):
+            # Add text before tag
+            text_content = content[last_end:match.start()]
+            if text_content:
+                tokens.append({'type': 'text', 'content': text_content})
+            
+            # Add tag
+            is_closing = bool(match.group(1))
+            tag_content = match.group(2)
+            tag_parts = tag_content.rstrip('/').split() or ['']  # tolerate <br/> and empty tags
+            tag_name = tag_parts[0].lower()
+            
+            # Parse attributes
+            attributes = {}
+            if len(tag_parts) > 1:
+                attr_text = ' '.join(tag_parts[1:])
+                attr_pattern = r'(\w+)=(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))'
+                for attr_match in re.finditer(attr_pattern, attr_text):
+                    attr_name = attr_match.group(1).lower()
+                    attr_value = next((g for g in attr_match.groups()[1:] if g is not None), '')
+                    attributes[attr_name] = attr_value
+            
+            tokens.append({
+                'type': 'tag',
+                'name': tag_name,
+                'closing': is_closing,
+                'attributes': attributes,
+                'content': tag_content
+            })
+            
+            last_end = match.end()
+        
+        # Add remaining text
+        if last_end < len(content):
+            text_content = content[last_end:]
+            if text_content:
+                tokens.append({'type': 'text', 'content': text_content})
+        
+        return tokens
+    
+    def _handle_tag(self, token: Dict, container: Container):
+        """Handle HTML tags"""
+        tag_name = token['name']
+        is_closing = token['closing']
+        attributes = token['attributes']
+        
+        if is_closing:
+            # Handle closing tags
+            if tag_name in ['b', 'strong']:
+                self.font_stack.pop()
+            elif tag_name in ['i', 'em']:
+                self.font_stack.pop()
+            elif tag_name == 'u':
+                self.font_stack.pop()
+            return
+        
+        # Handle opening tags
+        if tag_name in ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
+            # Headers
+            size_map = {'h1': 24, 'h2': 20, 'h3': 18, 'h4': 16, 'h5': 14, 'h6': 12}
+            font = self.font_stack[-1].with_size(size_map[tag_name]).with_weight(FontWeight.BOLD)
+            self.font_stack.append(font)
+            
+        elif tag_name in ['b', 'strong']:
+            # Bold text
+            font = self.font_stack[-1].with_weight(FontWeight.BOLD)
+            self.font_stack.append(font)
+            
+        elif tag_name in ['i', 'em']:
+            # Italic text
+            font = self.font_stack[-1].with_style(FontStyle.ITALIC)
+            self.font_stack.append(font)
+            
+        elif tag_name == 'u':
+            # Underlined text
+            font = self.font_stack[-1].with_decoration(TextDecoration.UNDERLINE)
+            self.font_stack.append(font)
+            
+        elif tag_name == 'a':
+            # Links
+            href = attributes.get('href', '#')
+            title = attributes.get('title', href)
+            
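+            # Determine link type: absolute http(s) URLs are external;
+            # fragments and relative paths are both treated as internal.
+            # Protocol-relative URLs (//host/path) are not handled here.
+            if href.startswith(('http://', 'https://')):
+                link_type = LinkType.EXTERNAL
+            else:
+                link_type = LinkType.INTERNAL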
+            
+            # Create link callback
+            def link_callback(location, **kwargs):
+                return f"Navigate to: {location}"
+            
+            link = Link(href, link_type, link_callback, title=title)
+            link_font = self.font_stack[-1].with_colour((0, 0, 255)).with_decoration(TextDecoration.UNDERLINE)
+            
+            # For now, just add the link text with link styling
+            link_text = attributes.get('title', href)
+            renderable_link = RenderableLink(link, link_text, link_font)
+            container.add_child(renderable_link)
+            
+        elif tag_name == 'img':
+            # Images
+            src = attributes.get('src', '')
+            alt = attributes.get('alt', 'Image')
+            width = attributes.get('width')
+            height = attributes.get('height')
+            
+            if src:
+                # Resolve relative URLs
+                if self.base_url and not src.startswith(('http://', 'https://')):
+                    if os.path.isdir(self.base_url):
+                        src = os.path.join(self.base_url, src)
+                    else:
+                        src = urljoin(self.base_url, src)
+                
+                try:
+                    # Create abstract image
+                    from pyWebLayout.abstract.block import Image as AbstractImage
+                    abstract_img = AbstractImage(src, alt)
+                    
+                    # Parse dimensions if provided
+                    max_width = int(width) if width and width.isdigit() else None
+                    max_height = int(height) if height and height.isdigit() else None
+                    
+                    renderable_img = RenderableImage(abstract_img, max_width, max_height)
+                    container.add_child(renderable_img)
+                    
+                except Exception:
+                    # Add error text if image fails to load
+                    error_text = Text(f"[Image Error: {alt}]", Font(colour=(255, 0, 0)))
+                    container.add_child(error_text)
+            
+        elif tag_name == 'br':
+            # Line breaks - add some vertical space
+            spacer = Box((0, 0), (1, 10))
+            container.add_child(spacer)
+            
+        elif tag_name == 'p':
+            # Paragraphs - add some vertical space
+            spacer = Box((0, 0), (1, 5))
+            container.add_child(spacer)
+            
+        elif tag_name in ['div', 'span']:
+            # Generic containers - just continue parsing
+            pass
+
+
+class BrowserWindow:
+    """Main browser window using Tkinter"""
+    
+    def __init__(self):
+        self.root = tk.Tk()
+        self.root.title("pyWebLayout HTML Browser")
+        self.root.geometry("900x700")
+        
+        self.current_page = None
+        self.history = []
+        self.history_index = -1
+        
+        self.setup_ui()
+        
+    def setup_ui(self):
+        """Setup the user interface"""
+        # Create main frame
+        main_frame = ttk.Frame(self.root)
+        main_frame.pack(fill=tk.BOTH, expand=True, padx=5, pady=5)
+        
+        # Navigation frame
+        nav_frame = ttk.Frame(main_frame)
+        nav_frame.pack(fill=tk.X, pady=(0, 5))
+        
+        # Navigation buttons
+        self.back_btn = ttk.Button(nav_frame, text="←", command=self.go_back, state=tk.DISABLED)
+        self.back_btn.pack(side=tk.LEFT, padx=(0, 5))
+        
+        self.forward_btn = ttk.Button(nav_frame, text="→", command=self.go_forward, state=tk.DISABLED)
+        self.forward_btn.pack(side=tk.LEFT, padx=(0, 5))
+        
+        self.refresh_btn = ttk.Button(nav_frame, text="⟳", command=self.refresh)
+        self.refresh_btn.pack(side=tk.LEFT, padx=(0, 10))
+        
+        # Address bar
+        ttk.Label(nav_frame, text="URL:").pack(side=tk.LEFT)
+        self.url_var = tk.StringVar()
+        self.url_entry = ttk.Entry(nav_frame, textvariable=self.url_var, width=50)
+        self.url_entry.pack(side=tk.LEFT, fill=tk.X, expand=True, padx=(5, 5))
+        self.url_entry.bind('<Return>', self.navigate_to_url)
+        
+        self.go_btn = ttk.Button(nav_frame, text="Go", command=self.navigate_to_url)
+        self.go_btn.pack(side=tk.LEFT, padx=(0, 10))
+        
+        # File operations
+        self.open_btn = ttk.Button(nav_frame, text="Open File", command=self.open_file)
+        self.open_btn.pack(side=tk.LEFT)
+        
+        # Content frame with scrollbars
+        content_frame = ttk.Frame(main_frame)
+        content_frame.pack(fill=tk.BOTH, expand=True)
+        
+        # Create canvas with scrollbars
+        self.canvas = tk.Canvas(content_frame, bg='white')
+        
+        v_scrollbar = ttk.Scrollbar(content_frame, orient=tk.VERTICAL, command=self.canvas.yview)
+        h_scrollbar = ttk.Scrollbar(content_frame, orient=tk.HORIZONTAL, command=self.canvas.xview)
+        
+        self.canvas.configure(yscrollcommand=v_scrollbar.set, xscrollcommand=h_scrollbar.set)
+        
+        v_scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
+        h_scrollbar.pack(side=tk.BOTTOM, fill=tk.X)
+        self.canvas.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
+        
+        # Status bar
+        self.status_var = tk.StringVar(value="Ready")
+        status_bar = ttk.Label(main_frame, textvariable=self.status_var, relief=tk.SUNKEN)
+        status_bar.pack(fill=tk.X, pady=(5, 0))
+        
+        # Bind mouse events
+        self.canvas.bind('<Button-1>', self.on_click)
+        self.canvas.bind('<Motion>', self.on_mouse_move)
+        
+        # Load default page
+        self.load_default_page()
+        
+    def load_default_page(self):
+        """Load a default welcome page"""
+        html_content = """
+        <html>
+        <head><title>pyWebLayout Browser - Welcome</title></head>
+        <body>
+            <h1>Welcome to pyWebLayout Browser</h1>
+            <p>This is a simple HTML browser built using pyWebLayout components.</p>
+            
+            <h2>Features:</h2>
+            <ul>
+                <li>Basic HTML rendering</li>
+                <li>Text formatting (bold, italic, underline)</li>
+                <li>Headers (H1-H6)</li>
+                <li>Links (clickable)</li>
+                <li>Images</li>
+                <li>Forms (basic support)</li>
+            </ul>
+            
+            <h2>Try these features:</h2>
+            <p><b>Bold text</b>, <i>italic text</i>, and <u>underlined text</u></p>
+            
+            <p>Sample link: <a href="https://www.example.com" title="External link">Visit Example.com</a></p>
+            
+            <h3>File Operations</h3>
+            <p>Use the "Open File" button to load local HTML files.</p>
+            
+            <p>Or enter a URL in the address bar above.</p>
+        </body>
+        </html>
+        """
+        
+        parser = HTMLParser()
+        self.current_page = parser.parse_html_string(html_content)
+        self.render_page()
+        self.status_var.set("Welcome page loaded")
+        
+    def navigate_to_url(self, event=None):
+        """Navigate to the URL in the address bar"""
+        url = self.url_var.get().strip()
+        if not url:
+            return
+            
+        self.status_var.set(f"Loading {url}...")
+        self.root.update()
+        
+        try:
+            if url.startswith(('http://', 'https://')):
+                # Web URL
+                response = requests.get(url, timeout=10)
+                response.raise_for_status()
+                html_content = response.text
+                
+                parser = HTMLParser()
+                self.current_page = parser.parse_html_string(html_content, url)
+                
+            elif os.path.isfile(url):
+                # Local file
+                parser = HTMLParser()
+                self.current_page = parser.parse_html_file(url)
+                
+            else:
+                # Try to treat as a local file path
+                if not url.startswith('file://'):
+                    url = 'file://' + os.path.abspath(url)
+                
+                file_path = url.replace('file://', '')
+                if os.path.isfile(file_path):
+                    parser = HTMLParser()
+                    self.current_page = parser.parse_html_file(file_path)
+                else:
+                    raise FileNotFoundError(f"File not found: {file_path}")
+            
+            # Add to history
+            self.add_to_history(url)
+            self.render_page()
+            self.status_var.set(f"Loaded {url}")
+            
+        except Exception as e:
+            self.status_var.set(f"Error loading {url}: {str(e)}")
+            messagebox.showerror("Error", f"Failed to load {url}:\n{str(e)}")
+    
+    def open_file(self):
+        """Open a local HTML file"""
+        file_path = filedialog.askopenfilename(
+            title="Open HTML File",
+            filetypes=[("HTML files", "*.html *.htm"), ("All files", "*.*")]
+        )
+        
+        if file_path:
+            self.url_var.set(file_path)
+            self.navigate_to_url()
+    
+    def render_page(self):
+        """Render the current page to the canvas"""
+        if not self.current_page:
+            return
+            
+        # Clear canvas
+        self.canvas.delete("all")
+        
+        # Render the page to PIL Image
+        page_image = self.current_page.render()
+        
+        # Convert to PhotoImage
+        self.photo = ImageTk.PhotoImage(page_image)
+        
+        # Display on canvas
+        self.canvas.create_image(0, 0, anchor=tk.NW, image=self.photo)
+        
+        # Update scroll region
+        self.canvas.configure(scrollregion=self.canvas.bbox("all"))
+        
+        # Store page elements for interaction
+        self.page_elements = self._get_clickable_elements(self.current_page)
+    
+    def _get_clickable_elements(self, container, offset=(0, 0)) -> List[Tuple]:
+        """Get list of clickable elements with their positions"""
+        elements = []
+        
+        if hasattr(container, '_children'):
+            for child in container._children:
+                if hasattr(child, '_origin'):
+                    child_offset = (offset[0] + child._origin[0], offset[1] + child._origin[1])
+                    
+                    # Check if element is clickable
+                    if isinstance(child, (RenderableLink, RenderableButton)):
+                        elements.append((child, child_offset, child._size))
+                    
+                    # Recursively check children
+                    if hasattr(child, '_children'):
+                        elements.extend(self._get_clickable_elements(child, child_offset))
+        
+        return elements
+    
+    def on_click(self, event):
+        """Handle mouse clicks on the canvas"""
+        # Convert canvas coordinates to image coordinates
+        canvas_x = self.canvas.canvasx(event.x)
+        canvas_y = self.canvas.canvasy(event.y)
+        
+        # Check if click is on any clickable element
+        for element, offset, size in self.page_elements:
+            element_x, element_y = offset
+            element_w, element_h = size
+            
+            if (element_x <= canvas_x <= element_x + element_w and
+                element_y <= canvas_y <= element_y + element_h):
+                
+                # Handle the click
+                if isinstance(element, RenderableLink):
+                    result = element._callback()
+                    if result:
+                        self.status_var.set(result)
+                        # For external links, open in system browser
+                        if element._link.link_type == LinkType.EXTERNAL:
+                            webbrowser.open(element._link.location)
+                        
+                elif isinstance(element, RenderableButton):
+                    result = element._callback()
+                    if result:
+                        self.status_var.set(f"Button clicked: {result}")
+                
+                break
+    
+    def on_mouse_move(self, event):
+        """Handle mouse movement for hover effects"""
+        # Convert canvas coordinates to image coordinates
+        canvas_x = self.canvas.canvasx(event.x)
+        canvas_y = self.canvas.canvasy(event.y)
+        
+        # Check if mouse is over any clickable element
+        cursor = "arrow"
+        for element, offset, size in self.page_elements:
+            element_x, element_y = offset
+            element_w, element_h = size
+            
+            if (element_x <= canvas_x <= element_x + element_w and
+                element_y <= canvas_y <= element_y + element_h):
+                cursor = "hand2"
+                break
+        
+        self.canvas.configure(cursor=cursor)
+    
+    def add_to_history(self, url):
+        """Add URL to navigation history"""
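+        # Back, forward and refresh re-load the entry at history_index; leave history intact
+        revisit = self.history_index >= 0 and self.history[self.history_index] == url
+        if not revisit:
+            # Remove any forward history, then add the new URL
+            self.history = self.history[:self.history_index + 1]
+            self.history.append(url)
+            self.history_index = len(self.history) - 1
+        # Update navigation buttons
+        self.update_nav_buttons()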
+    
+    def update_nav_buttons(self):
+        """Update the state of navigation buttons"""
+        self.back_btn.configure(state=tk.NORMAL if self.history_index > 0 else tk.DISABLED)
+        self.forward_btn.configure(state=tk.NORMAL if self.history_index < len(self.history) - 1 else tk.DISABLED)
+    
+    def go_back(self):
+        """Navigate back in history"""
+        if self.history_index > 0:
+            self.history_index -= 1
+            url = self.history[self.history_index]
+            self.url_var.set(url)
+            self.navigate_to_url()
+    
+    def go_forward(self):
+        """Navigate forward in history"""
+        if self.history_index < len(self.history) - 1:
+            self.history_index += 1
+            url = self.history[self.history_index]
+            self.url_var.set(url)
+            self.navigate_to_url()
+    
+    def refresh(self):
+        """Refresh the current page"""
+        if self.current_page:
+            current_url = self.url_var.get()
+            if current_url:
+                self.navigate_to_url()
+            else:
+                self.load_default_page()
+    
+    def run(self):
+        """Start the browser"""
+        self.root.mainloop()
+
+
+def main():
+    """Main function to run the browser"""
+    print("Starting pyWebLayout HTML Browser...")
+    
+    try:
+        browser = BrowserWindow()
+        browser.run()
+    except Exception as e:
+        print(f"Error starting browser: {e}")
+        import traceback
+        traceback.print_exc()
+
+
+if __name__ == "__main__":
+    main()
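
The history handling in `add_to_history`, `go_back`, and `go_forward` above can be exercised without Tk. The following is a minimal sketch, not part of this commit: `NavigationHistory` and its method names are hypothetical, and the starting index of `-1` is an assumption, since the initial value of `history_index` is not shown in this section. It mirrors the list-plus-index approach in the diff and illustrates that visiting a new URL after going back discards the forward entries.

```python
class NavigationHistory:
    """Hypothetical stand-in for the history fields kept on BrowserWindow."""

    def __init__(self):
        self.entries = []
        self.index = -1  # assumption: -1 means nothing has been visited yet

    def add(self, url):
        # Drop any forward entries, then append and point at the new URL.
        self.entries = self.entries[:self.index + 1]
        self.entries.append(url)
        self.index = len(self.entries) - 1

    def can_go_back(self):
        return self.index > 0

    def can_go_forward(self):
        return self.index < len(self.entries) - 1

    def back(self):
        # Returns the URL to show, or None when already at the oldest entry.
        if not self.can_go_back():
            return None
        self.index -= 1
        return self.entries[self.index]

    def forward(self):
        # Returns the URL to show, or None when already at the newest entry.
        if not self.can_go_forward():
            return None
        self.index += 1
        return self.entries[self.index]
```

Keeping this state in a plain class would let the back/forward button states be derived from `can_go_back()` and `can_go_forward()`, and would make the truncation rule testable without opening a window.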
--- a/pyWebLayout/concrete/text.py
+++ b/pyWebLayout/concrete/text.py
@ -43,13 +43,35 @@ class Text(Renderable, Queriable):
        # The bounding box is (left, top, right, bottom)
        try:
            bbox = font.getbbox(self._text)
-            self._width = bbox[2] - bbox[0]
-            self._height = bbox[3] - bbox[1]
+            # Width is the difference between right and left
+            self._width = max(1, bbox[2] - bbox[0])
+            # Height needs to account for potential negative top values
+            # Use the full height from top to bottom, ensuring positive values
+            top = min(0, bbox[1])  # A negative top means glyphs extend above the drawing origin
+            bottom = max(bbox[3], bbox[1] + font.size)  # Ensure minimum height
+            self._height = max(font.size, bottom - top)
            self._size = (self._width, self._height)
+            
+            # Store the offset for proper text positioning
+            self._text_offset_x = max(0, -bbox[0])
+            self._text_offset_y = max(0, -top)
+            
        except AttributeError:
            # Fallback for older PIL versions
-            self._width, self._height = font.getsize(self._text)
-            self._size = (self._width, self._height)
+            try:
+                self._width, self._height = font.getsize(self._text)
+                # Add some padding to prevent cropping
+                self._height = max(self._height, int(self._style.font_size * 1.2))
+                self._size = (self._width, self._height)
+                self._text_offset_x = 0
+                self._text_offset_y = 0
+            except Exception:
+                # Ultimate fallback
+                self._width = len(self._text) * self._style.font_size // 2
+                self._height = int(self._style.font_size * 1.2)
+                self._size = (self._width, self._height)
+                self._text_offset_x = 0
+                self._text_offset_y = 0
    
    @property
    def text(self) -> str:
@ -123,8 +145,10 @@ class Text(Renderable, Queriable):
        if self._style.background and self._style.background[3] > 0:  # If alpha > 0
            draw.rectangle([(0, 0), self._size], fill=self._style.background)
        
-        # Draw the text
-        draw.text((0, 0), self._text, font=self._style.font, fill=self._style.colour)
+        # Draw the text using calculated offsets to prevent cropping
+        text_x = getattr(self, '_text_offset_x', 0)
+        text_y = getattr(self, '_text_offset_y', 0)
+        draw.text((text_x, text_y), self._text, font=self._style.font, fill=self._style.colour)
        
        # Apply any text decorations
        self._apply_decoration(draw)
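
The sizing arithmetic in the `getbbox` branch above is easier to check in isolation. The following is a minimal sketch, not part of this commit: `measure_from_bbox` is a hypothetical helper that takes an already computed `(left, top, right, bottom)` tuple and a font size, and applies the same clamping rules as the diff, so it needs no PIL import.

```python
def measure_from_bbox(bbox, font_size):
    """Return (width, height, offset_x, offset_y) for a (left, top, right, bottom) box.

    Hypothetical helper that mirrors the clamping rules in the diff above.
    """
    left, raw_top, right, raw_bottom = bbox

    # Width is right minus left, never smaller than one pixel.
    width = max(1, right - left)

    # A negative top means glyphs extend above the drawing origin.
    top = min(0, raw_top)
    # The bottom is pushed down so the box spans at least one font size
    # below the reported top.
    bottom = max(raw_bottom, raw_top + font_size)
    height = max(font_size, bottom - top)

    # Offsets shift the draw position so negative extents are not cropped.
    offset_x = max(0, -left)
    offset_y = max(0, -top)
    return width, height, offset_x, offset_y


print(measure_from_bbox((0, 3, 40, 14), 16))    # (40, 19, 0, 0)
print(measure_from_bbox((-2, -1, 30, 12), 12))  # (32, 13, 2, 1)
print(measure_from_bbox((0, 0, 0, 0), 10))      # (1, 10, 0, 0)
```

The last call shows the effect of the two minimums: an empty string still gets a box one pixel wide and one font size tall.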
--- a/test_page.html
+++ b/test_page.html
@ -0,0 +1,46 @@
+<!DOCTYPE html>
+<html>
+<head>
+    <title>Test Page for pyWebLayout Browser</title>
+</head>
+<body>
+    <h1>pyWebLayout Browser Test Page</h1>
+    <h3>Images</h3>
+    <p>Here's a sample image:</p>
+    <img src="tests/data/sample_image.jpg" alt="Sample Image" width="200" height="150">
+    <h2>Text Formatting</h2>
+    <p>This is a paragraph with <b>bold text</b>, <i>italic text</i>, and <u>underlined text</u>.</p>
+    
+    <h3>Links</h3>
+    <p>Here are some test links:</p>
+    <ul>
+        <li><a href="https://www.google.com" title="Google">External link to Google</a></li>
+        <li><a href="#section1" title="Section 1">Internal link to Section 1</a></li>
+    </ul>
+    
+    <h3>Headers</h3>
+    <h1>H1 Header</h1>
+    <h2>H2 Header</h2>
+    <h3>H3 Header</h3>
+    <h4>H4 Header</h4>
+    <h5>H5 Header</h5>
+    <h6>H6 Header</h6>
+    
+    <h3>Line Breaks and Paragraphs</h3>
+    <p>This is the first paragraph.</p>
+    <br>
+    <p>This is the second paragraph after a line break.</p>
+    
+    <h3 id="section1">Section 1</h3>
+    <p>This is the content of section 1. You can link to this section using the internal link above.</p>
+    
+    <h3>Images</h3>
+    <p>Here's a sample image:</p>
+    <img src="tests/data/sample_image.jpg" alt="Sample Image" width="200" height="150">
+    
+    <h3>Mixed Content</h3>
+    <p>This paragraph contains <b>bold</b> and <i>italic</i> text, as well as an <a href="https://www.example.com">external link</a>.</p>
+    
+    <p><strong>Strong text</strong> and <em>emphasized text</em> should also work.</p>
+</body>
+</html>
--- a/tests/data/Kimi
+++ b/tests/data/Kimi
--- a/Wikipedia_files/Ambox_important.svg.webp
+++ b/Wikipedia_files/Ambox_important.svg.webp
--- a/Wikipedia_files/Ambox_important.svg_002.webp
+++ b/Wikipedia_files/Ambox_important.svg_002.webp
--- a/Wikipedia_files/Ambox_important.svg_003.webp
+++ b/Wikipedia_files/Ambox_important.svg_003.webp
--- a/Wikipedia_files/Commons-logo.svg.webp
+++ b/Wikipedia_files/Commons-logo.svg.webp
--- a/Wikipedia_files/Commons-logo.svg_002.webp
+++ b/Wikipedia_files/Commons-logo.svg_002.webp
--- a/Wikipedia_files/Commons-logo.svg_003.webp
+++ b/Wikipedia_files/Commons-logo.svg_003.webp
--- a/Wikipedia_files/Edit-clear.svg.webp
+++ b/Wikipedia_files/Edit-clear.svg.webp
--- a/Wikipedia_files/Edit-clear.svg_002.webp
+++ b/Wikipedia_files/Edit-clear.svg_002.webp
--- a/Wikipedia_files/Edit-clear.svg_003.webp
+++ b/Wikipedia_files/Edit-clear.svg_003.webp
--- a/Wikipedia_files/Flag_of_Argentina.svg.webp
+++ b/Wikipedia_files/Flag_of_Argentina.svg.webp
--- a/Wikipedia_files/Flag_of_Argentina.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Argentina.svg_002.webp
--- a/Wikipedia_files/Flag_of_Argentina.svg_003.webp
+++ b/Wikipedia_files/Flag_of_Argentina.svg_003.webp
--- a/Wikipedia_files/Flag_of_Australia_(converted).svg.webp
+++ b/Wikipedia_files/Flag_of_Australia_(converted).svg.webp
--- a/Wikipedia_files/Flag_of_Australia_(converted).svg_002.webp
+++ b/Wikipedia_files/Flag_of_Australia_(converted).svg_002.webp
--- a/Wikipedia_files/Flag_of_Austria.svg.webp
+++ b/Wikipedia_files/Flag_of_Austria.svg.webp
--- a/Wikipedia_files/Flag_of_Austria.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Austria.svg_002.webp
--- a/Wikipedia_files/Flag_of_Austria.svg_003.webp
+++ b/Wikipedia_files/Flag_of_Austria.svg_003.webp
--- a/Wikipedia_files/Flag_of_Barbados.svg.webp
+++ b/Wikipedia_files/Flag_of_Barbados.svg.webp
--- a/Wikipedia_files/Flag_of_Belgium_(civil).svg.webp
+++ b/Wikipedia_files/Flag_of_Belgium_(civil).svg.webp
--- a/Wikipedia_files/Flag_of_Belgium_(civil).svg_002.webp
+++ b/Wikipedia_files/Flag_of_Belgium_(civil).svg_002.webp
--- a/Wikipedia_files/Flag_of_Belgium_(civil).svg_003.webp
+++ b/Wikipedia_files/Flag_of_Belgium_(civil).svg_003.webp
--- a/Wikipedia_files/Flag_of_Brazil.svg.webp
+++ b/Wikipedia_files/Flag_of_Brazil.svg.webp
--- a/Wikipedia_files/Flag_of_Brazil.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Brazil.svg_002.webp
--- a/Wikipedia_files/Flag_of_Brazil.svg_003.webp
+++ b/Wikipedia_files/Flag_of_Brazil.svg_003.webp
--- a/Wikipedia_files/Flag_of_Canada_(Pantone).svg.webp
+++ b/Wikipedia_files/Flag_of_Canada_(Pantone).svg.webp
--- a/Wikipedia_files/Flag_of_Canada_(Pantone).svg_002.webp
+++ b/Wikipedia_files/Flag_of_Canada_(Pantone).svg_002.webp
--- a/Wikipedia_files/Flag_of_Finland.svg.webp
+++ b/Wikipedia_files/Flag_of_Finland.svg.webp
--- a/Wikipedia_files/Flag_of_Finland.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Finland.svg_002.webp
--- a/Wikipedia_files/Flag_of_Finland.svg_003.webp
+++ b/Wikipedia_files/Flag_of_Finland.svg_003.webp
--- a/Wikipedia_files/Flag_of_France.svg.webp
+++ b/Wikipedia_files/Flag_of_France.svg.webp
--- a/Wikipedia_files/Flag_of_France.svg_002.webp
+++ b/Wikipedia_files/Flag_of_France.svg_002.webp
--- a/Wikipedia_files/Flag_of_Germany.svg.webp
+++ b/Wikipedia_files/Flag_of_Germany.svg.webp
--- a/Wikipedia_files/Flag_of_Germany.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Germany.svg_002.webp
--- a/Wikipedia_files/Flag_of_Germany.svg_003.webp
+++ b/Wikipedia_files/Flag_of_Germany.svg_003.webp
--- a/Wikipedia_files/Flag_of_Ireland.svg.webp
+++ b/Wikipedia_files/Flag_of_Ireland.svg.webp
--- a/Wikipedia_files/Flag_of_Ireland.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Ireland.svg_002.webp
--- a/Wikipedia_files/Flag_of_Italy.svg.webp
+++ b/Wikipedia_files/Flag_of_Italy.svg.webp
--- a/Wikipedia_files/Flag_of_Italy.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Italy.svg_002.webp
--- a/Wikipedia_files/Flag_of_Italy.svg_003.webp
+++ b/Wikipedia_files/Flag_of_Italy.svg_003.webp
--- a/Wikipedia_files/Flag_of_Japan.svg.webp
+++ b/Wikipedia_files/Flag_of_Japan.svg.webp
--- a/Wikipedia_files/Flag_of_Japan.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Japan.svg_002.webp
--- a/Wikipedia_files/Flag_of_Mexico.svg.webp
+++ b/Wikipedia_files/Flag_of_Mexico.svg.webp
--- a/Wikipedia_files/Flag_of_Mexico.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Mexico.svg_002.webp
--- a/Wikipedia_files/Flag_of_Monaco.svg.webp
+++ b/Wikipedia_files/Flag_of_Monaco.svg.webp
--- a/Wikipedia_files/Flag_of_Monaco.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Monaco.svg_002.webp
--- a/Wikipedia_files/Flag_of_Norway.svg.webp
+++ b/Wikipedia_files/Flag_of_Norway.svg.webp
--- a/Wikipedia_files/Flag_of_Norway.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Norway.svg_002.webp
--- a/Wikipedia_files/Flag_of_Poland.svg.webp
+++ b/Wikipedia_files/Flag_of_Poland.svg.webp
--- a/Wikipedia_files/Flag_of_Poland.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Poland.svg_002.webp
--- a/Wikipedia_files/Flag_of_South_Africa_(1928-1982).svg.webp
+++ b/Wikipedia_files/Flag_of_South_Africa_(1928-1982).svg.webp
--- a/Wikipedia_files/Flag_of_South_Africa_(1928-1982).svg_002.webp
+++ b/Wikipedia_files/Flag_of_South_Africa_(1928-1982).svg_002.webp
--- a/Wikipedia_files/Flag_of_Sweden.svg.webp
+++ b/Wikipedia_files/Flag_of_Sweden.svg.webp
--- a/Wikipedia_files/Flag_of_Sweden.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Sweden.svg_002.webp
--- a/Wikipedia_files/Flag_of_Sweden.svg_003.webp
+++ b/Wikipedia_files/Flag_of_Sweden.svg_003.webp
--- a/Wikipedia_files/Flag_of_Switzerland_(Pantone).svg.webp
+++ b/Wikipedia_files/Flag_of_Switzerland_(Pantone).svg.webp
--- a/Wikipedia_files/Flag_of_Switzerland_(Pantone).svg_002.webp
+++ b/Wikipedia_files/Flag_of_Switzerland_(Pantone).svg_002.webp
--- a/Wikipedia_files/Flag_of_Venezuela.svg.webp
+++ b/Wikipedia_files/Flag_of_Venezuela.svg.webp
--- a/Wikipedia_files/Flag_of_Venezuela.svg_002.webp
+++ b/Wikipedia_files/Flag_of_Venezuela.svg_002.webp
--- a/Wikipedia_files/Flag_of_the_People's_Republic_of_China.svg.webp
+++ b/Wikipedia_files/Flag_of_the_People's_Republic_of_China.svg.webp
--- a/Wikipedia_files/Flag_of_the_People's_Republic_of_China.svg_002.webp
+++ b/Wikipedia_files/Flag_of_the_People's_Republic_of_China.svg_002.webp
--- a/Wikipedia_files/Flag_of_the_United_Kingdom.svg.webp
+++ b/Wikipedia_files/Flag_of_the_United_Kingdom.svg.webp
--- a/Wikipedia_files/Flag_of_the_United_Kingdom.svg_002.webp
+++ b/Wikipedia_files/Flag_of_the_United_Kingdom.svg_002.webp
--- a/Wikipedia_files/Flag_of_the_United_States.svg.webp
+++ b/Wikipedia_files/Flag_of_the_United_States.svg.webp
--- a/Wikipedia_files/Flag_of_the_United_States.svg_002.webp
+++ b/Wikipedia_files/Flag_of_the_United_States.svg_002.webp
--- a/Wikipedia_files/Flag_of_the_United_States.svg_003.webp
+++ b/Wikipedia_files/Flag_of_the_United_States.svg_003.webp
--- a/Wikipedia_files/OOjs_UI_icon_edit-ltr-progressive.svg.webp
+++ b/Wikipedia_files/OOjs_UI_icon_edit-ltr-progressive.svg.webp
--- a/Wikipedia_files/Symbol_category_class.svg.webp
+++ b/Wikipedia_files/Symbol_category_class.svg.webp
--- a/Wikipedia_files/Symbol_category_class.svg_002.webp
+++ b/Wikipedia_files/Symbol_category_class.svg_002.webp
--- a/Wikipedia_files/load.css
+++ b/Wikipedia_files/load.css
--- a/Wikipedia_files/load.js
+++ b/Wikipedia_files/load.js
--- a/Wikipedia_files/load_002.css
+++ b/Wikipedia_files/load_002.css
--- a/tests/test_concrete_functional.py
+++ b/tests/test_concrete_functional.py
@ -393,7 +393,7 @@ class TestRenderableFormField(unittest.TestCase):
    """ TODO: Fix test
    @patch('PIL.ImageDraw.Draw')
    def test_render_field_with_value(self, mock_draw_class):
-        """Test rendering field with value"""
+        # Test rendering field with value
        mock_draw = Mock()
        mock_draw_class.return_value = mock_draw
        
--- a/tests/test_html_file_loader.py
+++ b/tests/test_html_file_loader.py
@ -0,0 +1,118 @@
+"""
+Test module for loading HTML files using the html_extraction module.
+
+This test verifies that HTML files can be loaded from disk and processed
+using the html_extraction.parse_html_string function.
+"""
+
+import os
+import unittest
+from pyWebLayout.io.readers.html_extraction import parse_html_string
+from pyWebLayout.abstract.block import Block
+from pyWebLayout.style import Font
+
+
+class TestHTMLFileLoader(unittest.TestCase):
+    """Test class for HTML file loading functionality."""
+
+    def test_load_html_file(self):
+        """Test loading and parsing an HTML file from disk."""
+        # Path to the test HTML file
+        html_file_path = os.path.join("tests", "data", "Kimi Räikkönen - Wikipedia.html")
+        
+        # Verify the test file exists
+        self.assertTrue(os.path.exists(html_file_path), f"Test HTML file not found: {html_file_path}")
+        
+        # Read the HTML file
+        with open(html_file_path, 'r', encoding='utf-8') as file:
+            html_content = file.read()
+        
+        # Verify we got some content
+        self.assertGreater(len(html_content), 0, "HTML file should not be empty")
+        
+        # Parse the HTML content using the html_extraction module
+        try:
+            blocks = parse_html_string(html_content)
+        except Exception as e:
+            self.fail(f"Failed to parse HTML file: {e}")
+        
+        # Verify we got some blocks
+        self.assertIsInstance(blocks, list, "parse_html_string should return a list")
+        self.assertGreater(len(blocks), 0, "Should extract at least one block from the HTML file")
+        
+        # Verify all returned items are Block instances
+        for i, block in enumerate(blocks):
+            self.assertIsInstance(block, Block, f"Item {i} should be a Block instance, got {type(block)}")
+        
+        print(f"Successfully loaded and parsed HTML file with {len(blocks)} blocks")
+
+    def test_load_html_file_with_custom_font(self):
+        """Test loading HTML file with a custom base font."""
+        html_file_path = os.path.join("tests", "data", "Kimi Räikkönen - Wikipedia.html")
+        
+        # Skip if file doesn't exist
+        if not os.path.exists(html_file_path):
+            self.skipTest(f"Test HTML file not found: {html_file_path}")
+        
+        # Create a custom font
+        custom_font = Font(font_size=14, colour=(100, 100, 100))
+        
+        # Read and parse with custom font
+        with open(html_file_path, 'r', encoding='utf-8') as file:
+            html_content = file.read()
+        
+        blocks = parse_html_string(html_content, base_font=custom_font)
+        
+        # Verify we got blocks
+        self.assertGreater(len(blocks), 0, "Should extract blocks with custom font")
+        
+        print(f"Successfully parsed HTML file with custom font, got {len(blocks)} blocks")
+
+    def test_load_html_file_content_types(self):
+        """Test that the loaded HTML file contains expected content types."""
+        html_file_path = os.path.join("tests", "data", "Kimi Räikkönen - Wikipedia.html")
+        
+        # Skip if file doesn't exist
+        if not os.path.exists(html_file_path):
+            self.skipTest(f"Test HTML file not found: {html_file_path}")
+        
+        with open(html_file_path, 'r', encoding='utf-8') as file:
+            html_content = file.read()
+        
+        blocks = parse_html_string(html_content)
+        
+        # Check that we have different types of blocks
+        block_type_names = [type(block).__name__ for block in blocks]
+        unique_types = set(block_type_names)
+        
+        # A Wikipedia page should contain multiple types of content
+        self.assertGreater(len(unique_types), 1, "Should have multiple types of blocks in Wikipedia page")
+        
+        print(f"Found block types: {sorted(unique_types)}")
+
+    def test_html_file_size_handling(self):
+        """Test that large HTML files can be handled gracefully."""
+        html_file_path = os.path.join("tests", "data", "Kimi Räikkönen - Wikipedia.html")
+        
+        # Skip if file doesn't exist
+        if not os.path.exists(html_file_path):
+            self.skipTest(f"Test HTML file not found: {html_file_path}")
+        
+        # Get file size
+        file_size = os.path.getsize(html_file_path)
+        print(f"HTML file size: {file_size} bytes")
+        
+        # Read and parse
+        with open(html_file_path, 'r', encoding='utf-8') as file:
+            html_content = file.read()
+        
+        # This should not raise an exception even for large files
+        blocks = parse_html_string(html_content)
+        
+        # Basic verification
+        self.assertIsInstance(blocks, list)
+        print(f"Successfully processed {file_size} byte file into {len(blocks)} blocks")
+
+
+if __name__ == '__main__':
+    unittest.main()
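
Three of the four tests above repeat the same "skip if missing, then read as UTF-8" steps. The following is a minimal sketch of how that could be shared, not part of this commit: `read_fixture_or_skip` and `FixtureHelperExample` are hypothetical names, and the example writes its own temporary file rather than relying on the Wikipedia fixture.

```python
import os
import tempfile
import unittest


def read_fixture_or_skip(test_case, path, encoding="utf-8"):
    """Return the file's text, or skip the calling test if the file is missing."""
    if not os.path.exists(path):
        test_case.skipTest(f"Test HTML file not found: {path}")
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


class FixtureHelperExample(unittest.TestCase):
    """Hypothetical tests showing both outcomes of the helper."""

    def test_reads_existing_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "page.html")
            with open(path, "w", encoding="utf-8") as handle:
                handle.write("<p>Räikkönen</p>")
            self.assertEqual(read_fixture_or_skip(self, path), "<p>Räikkönen</p>")

    def test_skips_missing_file(self):
        # skipTest raises unittest.SkipTest, which assertRaises can observe.
        with self.assertRaises(unittest.SkipTest):
            read_fixture_or_skip(self, os.path.join("no", "such", "file.html"))
```

With a helper like this, each test body would start with a single call, and the choice between failing and skipping on a missing fixture would be made in one place.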
 @ -1 +1 @@
 .1%
 .0%