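Android project: HtmlParse
Content parsed from websites is often full of HTML tags. When you run into this kind of data, the Html class (android.text.Html) can handle it.
For example: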
content.setText(Html.fromHtml("<html><body>" + title.getContent()+ "</body></html>", null, null));
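This parses the string returned by title.getContent() as HTML and converts it into styled text, so tags and character entities are rendered instead of being shown literally. To some extent, this solves the problem of stray special characters that appear when the text arrives in a different format.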
-------------------------------------------------------------------------------------------------------------------------
Official documentation: http://developer.android.com/reference/android/text/Html.html
public class Html extends Object
Summary
Class Overview
This class processes HTML strings into displayable styled text. Not all HTML tags are supported.
Nested Classes
interface Html.ImageGetter: Retrieves images for HTML <img> tags.
interface Html.TagHandler: Is notified when HTML tags are encountered that the parser does not know how to interpret.
Public Methods
static String escapeHtml(CharSequence text): Returns an HTML escaped representation of the given plain text.
static Spanned fromHtml(String source): Returns displayable styled text from the provided HTML string.
static Spanned fromHtml(String source, Html.ImageGetter imageGetter, Html.TagHandler tagHandler): Returns displayable styled text from the provided HTML string.
static String toHtml(Spanned text): Returns an HTML representation of the provided Spanned text.
Public Methods
public static String escapeHtml (CharSequence text)
Returns an HTML escaped representation of the given plain text.
public static Spanned fromHtml (String source)
Returns displayable styled text from the provided HTML string. Any <img> tags in the HTML will display as a generic replacement image which your program can then go through and replace with real images.
This uses TagSoup to handle real HTML, including all of the brokenness found in the wild.
public static Spanned fromHtml (String source, Html.ImageGetter imageGetter, Html.TagHandler tagHandler)
Returns displayable styled text from the provided HTML string. Any <img> tags in the HTML will use the specified ImageGetter to request a representation of the image (use null if you don't want this) and the specified TagHandler to handle unknown tags (specify null if you don't want this).
This uses TagSoup to handle real HTML, including all of the brokenness found in the wild.
public static String toHtml (Spanned text)
Returns an HTML representation of the provided Spanned text.
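To make "HTML escaped representation" concrete, here is a minimal plain-Java sketch of the idea behind escapeHtml. It is not the android.text.Html implementation, and the class name HtmlEscapeSketch and its escape method are hypothetical; it only replaces the three characters that would otherwise be read as markup.

```java
// Illustration only: a hand-written escaper, NOT android.text.Html.escapeHtml.
final class HtmlEscapeSketch {
    // Replaces '<', '>' and '&' with their HTML entities so the text
    // is displayed literally instead of being parsed as tags.
    static String escape(CharSequence text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

For instance, HtmlEscapeSketch.escape("<b>a & b</b>") yields &lt;b&gt;a &amp; b&lt;/b&gt;, which a renderer would display as the literal text <b>a & b</b> rather than as bold text.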
-------------------------------------------------------------------------------------------------------------------------
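However, if the fetched text contains many HTML tags as well as images, you need the Html class's fromHtml methods together with its nested interface for handling images (Html.ImageGetter).
Here is a helper that parses HTML text containing both text and images. The Spanned it returns can be passed directly to setText(...) on a TextView or EditText.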
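import android.graphics.drawable.Drawable;
import android.text.Html;
import android.text.Html.ImageGetter;
import android.text.Spanned;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

public class HtmlParser {
    // Resolves each <img> tag: downloads the image from its src URL.
    static ImageGetter imageGetter = new ImageGetter() {
        @Override
        public Drawable getDrawable(String source) {
            Drawable drawable = null;
            try {
                URL url = new URL(source);
                drawable = Drawable.createFromStream(url.openStream(), "");
                // createFromStream returns null if the stream cannot be decoded.
                if (drawable != null) {
                    // Set the image size to its intrinsic dimensions.
                    drawable.setBounds(0, 0, drawable.getIntrinsicWidth(), drawable.getIntrinsicHeight());
                }
            } catch (MalformedURLException e) {
                // src is not a valid absolute URL; fall through and return null.
            } catch (IOException e) {
                // Download failed; fall through and return null.
            }
            return drawable;
        }
    };

    // Call this method with an HTML string.
    public static Spanned GetString(String html) {
        Spanned s = Html.fromHtml(html, imageGetter, null);
        return s; // returns a Spanned
    }
}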
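One case the getter above silently gives up on is a relative src such as img/a.jpg: new URL(source) throws MalformedURLException for it, so no image is returned. A possible workaround, sketched here in plain Java, is to resolve the src against the URL of the page the HTML came from before opening it. The class name ImageUrlResolver and the pageUrl parameter are hypothetical, and the example.com addresses are placeholders.

```java
import java.net.URI;

// Hypothetical helper: turns a possibly relative <img src> into an absolute URL.
final class ImageUrlResolver {
    // pageUrl: absolute URL of the page the HTML was fetched from (assumed known).
    // src: the value of the <img src="..."> attribute.
    // Throws IllegalArgumentException if either string is not a valid URI
    // (for example, if it contains unescaped spaces).
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }
}
```

With pageUrl set to http://example.com/news/1.html, the src img/a.jpg resolves to http://example.com/news/img/a.jpg, /img/a.jpg resolves to http://example.com/img/a.jpg, and an already absolute src is returned unchanged.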
But if we also want the image and text processing to run in the background, this has to be combined with AsyncTask: process the HTML text and images in the doInBackground method.
/**
 * To display images, use this overload of Html.fromHtml:
 * public static Spanned fromHtml(String source, Html.ImageGetter imageGetter, Html.TagHandler tagHandler)
 * Html.ImageGetter is an interface. We implement it and return the image's
 * Drawable from its getDrawable(String source) method.
 */
ImageGetter imageGetter = new ImageGetter() {
    @Override
    public Drawable getDrawable(String source) {
        URL url;
        Drawable drawable = null;
        try {
            url = new URL(source);
            drawable = Drawable.createFromStream(url.openStream(), null);
            // createFromStream returns null if the stream cannot be decoded.
            if (drawable != null) {
                drawable.setBounds(0, 0, drawable.getIntrinsicWidth(), drawable.getIntrinsicHeight());
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return drawable;
    }
};

HotNewsInfo title = list.get(0);
test = Html.fromHtml(title.getContent(), imageGetter, null);
This takes care of both the special characters in the text and its .jpg images.
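The split that AsyncTask provides (blocking work in doInBackground, result handed over in onPostExecute) can also be sketched in plain Java with java.util.concurrent, which may be useful because AsyncTask was deprecated in API level 30. This is only a sketch under assumptions: the class name BackgroundParser and its methods are hypothetical, the parser argument stands in for a call such as Html.fromHtml(html, imageGetter, null), and on Android the callback would still have to be posted to the UI thread before touching a view.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: runs a blocking parse step on a worker thread.
final class BackgroundParser {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // parser: the slow step (the doInBackground part).
    // onDone: receives the result (the onPostExecute part). In this sketch it
    // runs on the worker thread, not on a UI thread.
    <T> Future<?> parseAsync(String html, Function<String, T> parser, Consumer<T> onDone) {
        return worker.submit(() -> {
            T result = parser.apply(html);
            onDone.accept(result);
        });
    }

    // Stops the worker thread once the submitted tasks have finished.
    void shutdown() {
        worker.shutdown();
    }
}
```

The caller keeps the returned Future only if it needs to wait for or cancel the work; otherwise the callback is the sole hand-off point, just as onPostExecute is with AsyncTask.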