Four ways to convert HTML files to Word, PDF documents, or images with Python - BFW Blog


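AI large models can generate all kinds of designs: posters, UI screens, résumés, and just about anything else you can see. What they cannot do is directly produce electronic documents such as PDF or Word files. If we can convert HTML code into a Word document, then AI can effectively produce all kinds of polished, good-looking documents.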

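Python offers four ways to convert an HTML file into a Word document, an image, or other formats. The last one works best, because its output looks the same as the HTML does when displayed.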

1. BeautifulSoup + Document (python-docx)

from bs4 import BeautifulSoup
from docx import Document
import sys

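HEADING_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']
BLOCK_TAGS = HEADING_TAGS + ['p', 'div']

def html_to_word(html_file: str, output_file: str):
    """Convert an HTML file to a Word document"""
    # Read the HTML file
    with open(html_file, 'r', encoding='utf-8') as f:
        html_content = f.read()

    soup = BeautifulSoup(html_content, 'html.parser')

    # Create a blank Word document
    doc = Document()

    # If there is a <title> tag, add it as a heading
    if soup.title:
        title = ' '.join(soup.title.get_text().split())
        if title:
            doc.add_heading(title, level=1)

    # Visit headings, <p> and <div> in document order (find_all returns
    # matches in the order they appear), so the Word file keeps the
    # original sequence instead of listing all headings first
    for tag in soup.find_all(BLOCK_TAGS):
        # Skip a <div> that wraps other block tags: those children are
        # visited on their own, so adding the div would duplicate text
        if tag.name == 'div' and tag.find(BLOCK_TAGS):
            continue
        # Collapse runs of whitespace while keeping the space between
        # inline elements such as "Hello <b>world</b>"
        text = ' '.join(tag.get_text().split())
        if not text:
            continue
        if tag.name in HEADING_TAGS:
            # h1..h6 map to Word heading levels 1..6
            doc.add_heading(text, level=int(tag.name[1]))
        else:
            doc.add_paragraph(text)

    # Save the Word document
    doc.save(output_file)
    print(f"Conversion succeeded, file saved as: {output_file}")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python htmltoword.py input.html output.docx")
        sys.exit(1)
    html_to_word(sys.argv[1], sys.argv[2])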

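This approach does turn HTML into a Word document, but only the text and the heading levels carry over: the CSS styling, images, tables, and layout of the original HTML are not reproduced.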
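Approach 1 boils down to walking the block-level tags in document order and mapping each one to a Word heading or paragraph. To show just that mapping step without third-party packages, here is a minimal sketch that swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` (a substitution, not the article's method); the names `BlockExtractor` and `extract_blocks` are hypothetical. Each resulting tuple could then be handed to python-docx's `doc.add_heading(text, level=...)` or `doc.add_paragraph(text)`.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCKS = HEADINGS | {"p", "div"}


class BlockExtractor(HTMLParser):
    """Hypothetical helper: collect (kind, level, text) tuples in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # result: list of (kind, level, text)
        self._stack = []   # open block tags, each as [tag, text_parts]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKS:
            # Emit text gathered so far for the enclosing block, so a nested
            # block neither swallows nor duplicates its parent's text
            self._flush()
            self._stack.append([tag, []])

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            self._flush()
            self._stack.pop()

    def handle_data(self, data):
        # Text outside any block tag (e.g. inside <title>) is ignored
        if self._stack:
            self._stack[-1][1].append(data)

    def close(self):
        super().close()
        # Flush blocks left open by missing end tags
        while self._stack:
            self._flush()
            self._stack.pop()

    def _flush(self):
        if not self._stack:
            return
        tag, parts = self._stack[-1]
        # Collapse whitespace but keep spaces between inline elements
        text = " ".join("".join(parts).split())
        parts.clear()
        if text:
            if tag in HEADINGS:
                self.blocks.append(("heading", int(tag[1]), text))
            else:
                self.blocks.append(("paragraph", 0, text))


def extract_blocks(html: str):
    """Return the headings and paragraphs of an HTML string in order."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

For example, `extract_blocks("<h1>Title</h1><p>Hello <b>world</b></p>")` returns `[("heading", 1, "Title"), ("paragraph", 0, "Hello world")]`. Inline tags like `<b>` only contribute their text, which is exactly why this family of approaches loses the original styling.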

2. weasyprint

from selenium import webdriver
from docx import Document
from weasyprint import HTML
import pdfkit
import imgkit
import os

class HTMLConverter:
    def __init__(self):
        """Initialize the converter"""
        # Chrome headless mode configuration
        self.chrome_options = webdriver.ChromeOptions()
        self.chrome_options.add_argument('--headless')
        self.chrome_options.add_argument('--disable-gpu')

    def to_image(self, html_path: str, output_path: str):
        """Convert HTML to an image

        Args:
            html_path: path to the HTML file
            output_path: output image path (.png/.jpg)
        """
        # Use imgkit
        options = {
            'quality': 100,
            'form...
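The listing above is cut off at this point. Since its `HTMLConverter` class is built around headless Chrome, here is a hedged, selenium-free sketch of the same idea: Chrome's own command line can print a page to PDF or save a screenshot. It assumes a Chrome or Chromium binary is installed; the executable name (`google-chrome` by default here) varies by platform and is an assumption, and the helper names `build_chrome_command` and `render_with_chrome` are hypothetical. The switches used (`--headless`, `--disable-gpu`, `--print-to-pdf`, `--screenshot`, `--window-size`) are Chrome command-line switches.

```python
import shutil
import subprocess
from pathlib import Path


def build_chrome_command(chrome_bin: str, html_path: str, output_path: str,
                         width: int = 1280, height: int = 800) -> list:
    """Hypothetical helper: build a headless Chrome command line.

    A .pdf output path prints the page to PDF; anything else takes a screenshot.
    """
    # Chrome expects a URL, so turn the local file path into a file:// URI
    url = Path(html_path).resolve().as_uri()
    cmd = [chrome_bin, "--headless", "--disable-gpu"]
    if output_path.lower().endswith(".pdf"):
        # Print the rendered page to a PDF file
        cmd.append(f"--print-to-pdf={output_path}")
    else:
        # Capture a screenshot of a width x height viewport
        cmd += [f"--window-size={width},{height}", f"--screenshot={output_path}"]
    cmd.append(url)
    return cmd


def render_with_chrome(html_path: str, output_path: str,
                       chrome_bin: str = "google-chrome") -> None:
    """Run headless Chrome (assumed to be on PATH as chrome_bin)."""
    if shutil.which(chrome_bin) is None:
        raise FileNotFoundError(f"{chrome_bin} not found on PATH")
    subprocess.run(build_chrome_command(chrome_bin, html_path, output_path),
                   check=True)
```

For example, `render_with_chrome("page.html", "page.pdf")` would write a PDF, while an output path ending in `.png` produces a screenshot instead. Because a real browser does the rendering, the result is much closer to how the page looks on screen than the text-only output of approach 1.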
