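Jericho HTML Parser is a simple yet powerful Java library for analyzing and manipulating parts of an HTML document, including some common server-side tags, while reproducing any unrecognized or invalid HTML verbatim. It also provides a useful HTML form analyzer.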
Some of the features highlighted by the project itself:
- Complete rewrite of the parsing engine to allow the encapsulation of different tag types into the new TagType class.
- Requires Java 1.4 or later.
- All programs written for previous versions of the library will have to be recompiled with the new version, regardless of whether any changes are required. This is because several methods, including the Source constructor, now expect a CharSequence as an argument instead of a String.
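The two main changes above are that each tag type now carries its own parsing rules, and that text is accepted as a `CharSequence`. The stdlib-only sketch below illustrates both ideas. It is a hypothetical toy, not Jericho's API: the names `TagScanSketch`, `SketchTagType` and `scan` are invented for illustration, and the scanner is far simpler than a real parser. Each enum constant knows its own start and closing delimiters. `scan` accepts any `CharSequence`, so a `String`, `StringBuilder` or `CharBuffer` can all be passed in without conversion.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the Jericho API: a toy scanner showing
// (1) tag types that encapsulate their own delimiters and
// (2) a method that accepts any CharSequence instead of only String.
public class TagScanSketch {

    // Each tag type knows how it starts and how it ends.
    // Declaration order matters: NORMAL ("<") is a prefix of the others,
    // so it is declared last and therefore tried last by values().
    enum SketchTagType {
        COMMENT("<!--", "-->"),
        SERVER_TAG("<%", "%>"),          // e.g. ASP/JSP-style server-side tags
        PROCESSING_INSTRUCTION("<?", "?>"),
        NORMAL("<", ">");

        final String start;
        final String end;

        SketchTagType(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Returns the type names of the tags found, in document order.
    // Accepts String, StringBuilder, CharBuffer, or any other CharSequence.
    static List<String> scan(CharSequence text) {
        String s = text.toString(); // copying to a String keeps the sketch short
        List<String> found = new ArrayList<>();
        int pos = 0;
        while ((pos = s.indexOf('<', pos)) >= 0) {
            SketchTagType matched = SketchTagType.NORMAL; // '<' always matches NORMAL
            for (SketchTagType t : SketchTagType.values()) {
                if (s.startsWith(t.start, pos)) {
                    matched = t;
                    break;
                }
            }
            int close = s.indexOf(matched.end, pos + matched.start.length());
            if (close < 0) {
                break; // unterminated tag: stop scanning
            }
            found.add(matched.name());
            pos = close + matched.end.length();
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<p>Hi<!-- note --><% out.print(1); %></p>";
        // All three calls print [NORMAL, COMMENT, SERVER_TAG, NORMAL]
        System.out.println(scan(html));
        System.out.println(scan(new StringBuilder(html)));
        System.out.println(scan(CharBuffer.wrap(html)));
    }
}
```

This also shows why a recompile is needed even though no source changes are required. `String` implements `CharSequence`, so existing calls still compile. However, changing a parameter type from `String` to `CharSequence` changes the method's JVM descriptor. Class files compiled against the old `String` signature would therefore fail at run time with `NoSuchMethodError` until they are rebuilt.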
Java, middleware and open source community: http://www.matrix.org.cn
Official website:
http://jerichohtml.sourceforge.net/doc/index.html