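simple_html_dom.php: a PHP library for fetching and parsing web pages
PHP Simple HTML DOM Parser
An HTML DOM parser written in PHP.
Requires PHP 5+.
Supports invalid HTML.
Lets you manipulate HTML tags with jQuery-style selectors.
Extracts content from HTML, similar to innerHTML in JavaScript.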
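As a rough illustration of the features listed above, the sketch below parses a small piece of invalid HTML (the `<li>` tags are never closed) and reads it back with a jQuery-style selector. It assumes simple_html_dom.php sits in the same directory as the script; the sample markup and the `$labels` variable are made up for the example.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Invalid HTML on purpose: neither <li> has a closing tag.
$markup = '<ul id="menu"><li>Home<li>About</ul>';

$html = str_get_html($markup);
if ($html === false) {
    exit("Could not parse the markup\n");
}

// jQuery-style descendant selector: every <li> under the element with id "menu".
$labels = array();
foreach ($html->find('#menu li') as $li) {
    // plaintext returns the text of the element with tags stripped.
    $labels[] = trim($li->plaintext);
}
print_r($labels);

// Release the parsed tree once it is no longer needed.
$html->clear();
```

Working from an inline string keeps the example offline; swapping `str_get_html` for `file_get_html` should give the same loop over a real page.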
1. Fetch a page
// Load the library (the later snippets assume this include as well)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.bfw.wiki/');

// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';

2. Manipulate the page DOM
// Create DOM from string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

// Add a class to the second div (index 1)
$html->find('div', 1)->class = 'bar';

// Replace the inner content of the div with id "hello"
$html->find('div[id=hello]', 0)->innertext = 'foo';

echo $html;
// Output: <div id="hello">foo</div><div id="world" class="bar">World</div>

3. Export the page text
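To see what `plaintext` returns without making a network request, here is a small sketch that compares it with `outertext` and `innertext` on an inline string. The sample markup and the variable names are invented for illustration, and the script assumes simple_html_dom.php is in the same directory.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

$html = str_get_html('<div id="post"><h1>Title</h1><span>Body <b>text</b></span></div>');
if ($html === false) {
    exit("Could not parse the markup\n");
}

$post = $html->find('#post', 0);

$outer = $post->outertext; // the element including its own <div> tag
$inner = $post->innertext; // the markup inside the element
$text  = $post->plaintext; // text only, with all tags stripped

echo $outer, "\n";
echo $inner, "\n";
echo $text, "\n";
```

The exact whitespace in `plaintext` can differ between library versions, so it is safer to `trim()` or normalize the result before comparing it with anything.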
// Dump contents (without tags) from HTML
echo file_get_html('http://www.bfw.wiki/')->plaintext;

4. Scrape key information from a page
// Create DOM from URL
$html = file_get_html('http://slashdot.org/');

// Collect one entry per article block
$articles = array();
foreach ($html->find('div.article') as $article) {
    $item = array();
    $item['title'] = $article->find('div.title', 0)->plaintext;
    $item['intro'] = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);

Download: simple_html_dom.php
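The scraping loop in section 4 assumes that every article block contains all three child elements. When `find()` is called with an index and nothing matches, it returns null, so reading `->plaintext` from the result fails on pages whose markup differs. The sketch below is one possible way to guard against that; `first_text()` is a hypothetical helper written for this example (it is not part of the library), and the sample markup is invented so the script runs offline.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical helper (not part of the library): trimmed text of the first
// match for $selector under $node, or an empty string when nothing matches.
function first_text($node, $selector)
{
    $match = $node->find($selector, 0);
    return $match ? trim($match->plaintext) : '';
}

// Invented sample: the second article has no intro and neither has details.
$sample = '<div class="article"><div class="title">First</div>'
        . '<div class="intro">Intro A</div></div>'
        . '<div class="article"><div class="title">Second</div></div>';

$articles = array();
$html = str_get_html($sample);
if ($html !== false) {
    foreach ($html->find('div.article') as $article) {
        $articles[] = array(
            'title'   => first_text($article, 'div.title'),
            'intro'   => first_text($article, 'div.intro'),
            'details' => first_text($article, 'div.details'),
        );
    }
    // Release the parsed tree once the data has been copied out.
    $html->clear();
}

print_r($articles);
```

The same check on the return value applies to `file_get_html()`, which can also return false when the page cannot be loaded.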