Fetching sina and 51cto blog post content with jsoup + HttpClient

Demo download: RometePro.rar (UTF-8 encoded)

Introduction to the two JARs

HttpClient (fetches the page content to be parsed)

HttpClient features

The main features of HttpClient are listed below; see the HttpClient home page for the full details.

  • Implements all HTTP methods (GET, POST, PUT, HEAD, etc.)

  • Supports automatic redirects

  • Supports HTTPS

  • Supports proxy servers, among other things



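jsoup (a powerful HTML parser; it can also download pages, but its HTTP handling is less capable than HttpClient's)

  jsoup is a Java HTML parser that can directly parse a URL or a string of HTML. It offers a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods. (It is much better than HTMLParser.)

jsoup's main features are:

1. Parse HTML from a URL, a file, or a string;
2. Find and extract data using DOM traversal or CSS selectors;
3. Manipulate HTML elements, attributes, and text.
jsoup is released under the MIT license, so it can safely be used in commercial projects.
The main class hierarchy of jsoup is shown in the figure below:

[Figure: jsoup main class hierarchy]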


Download HttpClient

Official site download

Net disk download

Download jsoup

Official site download

Net disk download



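First, the result

 Parsed content of a sina blog post; original address: http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html


 The TextView displays the content with the following effects:

 1. Links are detected and linkified automatically (android:autoLink="all")

 2. A link address can be selected as in an EditText (drag to select the address); a long press then brings up the copy dialog

 3. Tapping a link opens it in the browser

[Screenshot: the parsed sina blog post displayed in the TextView]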



Fetching and parsing a sina blog post

Add the following permissions to AndroidManifest.xml

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />

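The layout uses a ScrollView

1) Enable hyperlinks in the TextView with android:autoLink="all"


2) android:fadingEdge="vertical" (optional)

Sets which edges fade while the content is scrolled: none (no fading), horizontal (the edges fade in the horizontal direction), vertical (the edges fade in the vertical direction).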


<?xml version="1.0" encoding="utf-8"?>
<ScrollView xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:background="@drawable/app_choose_btn_normalbg"
    android:fadingEdge="vertical"
    android:scrollbars="vertical">

    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:orientation="vertical">

        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:padding="5dp"
            android:orientation="horizontal"
            android:background="@drawable/grid_pictures_gdbg">

            <ImageView
                android:id="@+id/remote_searchhome"
                android:layout_width="40dp"
                android:layout_height="40dp"
                android:src="@drawable/remote_search_home" />

            <EditText
                android:id="@+id/remote_searedit"
                android:layout_width="0dp"
                android:layout_height="40dp"
                android:layout_weight="1"
                android:singleLine="true" />

            <ImageView
                android:id="@+id/remote_searchbtn"
                android:layout_width="40dp"
                android:layout_height="40dp"
                android:src="@drawable/search_btn_icon" />
        </LinearLayout>

        <TextView
            android:id="@+id/remotetext"
            android:layout_height="match_parent"
            android:layout_width="match_parent"
            android:gravity="top|left"
            android:background="@drawable/backmain_bg"
            android:textColor="@color/red"
            android:autoLink="all" />
    </LinearLayout>
</ScrollView>


Add httpclient and jsoup to libs (just drag the JARs into the libs folder)

[Screenshot: the httpclient and jsoup JARs in the project's libs folder]


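Writing the Java files

The sina blog post used as the example is http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html

The 51cto blog post used as the example is 《两年来的IT资源汇总》 ("A Two-Year Roundup of IT Resources")


Fetching the page content

The key to extracting the post body: jsoup has a method that selects elements by the value of their class attribute

Document myDocument = Jsoup.parse(str);
Elements links = myDocument.getElementsByClass(divclass);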


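Inspecting a sina blog page shows that the post body has the class articalContent;

[Screenshot: page source of the sina post showing the articalContent class]


Page fetching and article extraction: MySelfHttpClient.java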

import java.io.IOException;

import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MySelfHttpClient {
    // String divclass = "showContent"; // 51cto blog content
    String divclass = "articalContent"; // sina blog content

    public MySelfHttpClient() {
    }

    /**
     * Downloads the complete content of a web page.
     *
     * @param link    URL of the page
     * @param charSet character encoding of the page
     * @return the page source; "Request error" if the status is not 200,
     *         or an empty string if the request throws
     */
    public String getStringFromLink(String link, String charSet) {
        String str = "";
        HttpGet request = new HttpGet(link);
        HttpClient httpClient = new DefaultHttpClient();
        try {
            HttpResponse response = httpClient.execute(request);
            if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                str = EntityUtils.toString(response.getEntity(), charSet);
            } else {
                str = "Request error";
            }
        } catch (IOException e) {
            // Also covers ClientProtocolException, which is a subclass of IOException
            e.printStackTrace();
        } finally {
            // Release the connection resources held by this client
            httpClient.getConnectionManager().shutdown();
        }
        return str;
    }

    /**
     * Extracts the text of every element whose class is divclass.
     *
     * @param str the page source
     * @return the extracted article text
     */
    public String getContent(String str) {
        StringBuilder content = new StringBuilder();
        Document myDocument = Jsoup.parse(str);
        Elements links = myDocument.getElementsByClass(divclass);
        // Log.d("str", links.toString());
        for (Element link : links) {
            content.append(link.text());
        }
        return content.toString();
    }
}


 Checking whether the device is online

Network check: ConnectionDetector.java

import android.content.Context;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;

public class ConnectionDetector {

    private Context _context;

    public ConnectionDetector(Context context) {
        this._context = context;
    }

    /**
     * Checks whether the device is connected to a network.
     *
     * @return true if any network is in the CONNECTED state, false otherwise
     */
    public boolean isConnectingToInternet() {
        ConnectivityManager connectivity =
                (ConnectivityManager) _context.getSystemService(Context.CONNECTIVITY_SERVICE);
        if (connectivity != null) {
            NetworkInfo[] info = connectivity.getAllNetworkInfo();
            if (info != null) {
                for (int i = 0; i < info.length; i++) {
                    if (info[i].getState() == NetworkInfo.State.CONNECTED) {
                        return true;
                    }
                }
            }
        }
        return false;
    }
}


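The main .java implementation

  Key point 1: determine the encoding of the page you want to parse. I did not find an encoding declaration in either the sina or the 51cto pages, but most pages are utf-8 or gbk

 Key point 2: make the TextView behave like an EditText so that a link can be long-pressed and then copied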


/**************************/
// Let the TextView copy link text the way an EditText can
remoteText.setFocusableInTouchMode(true);
remoteText.setFocusable(true);
remoteText.setClickable(true);
remoteText.setLongClickable(true);
remoteText.setMovementMethod(ArrowKeyMovementMethod.getInstance());
/**************************/
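
For key point 1, one way to avoid hard-coding the encoding is to look for a charset declaration in the HTTP Content-Type header or in a <meta> tag near the top of the page, and to fall back to a default when neither declares one (which, as noted above, may well be the case for these two sites). The following is only a minimal sketch using the Java standard library; the class CharsetSniffer and its detect method are hypothetical names and are not part of the demo.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the original demo): guesses a page's charset. */
final class CharsetSniffer {
    // Matches charset=... in a Content-Type header value or an HTML <meta> tag.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9][\\w.:-]*)", Pattern.CASE_INSENSITIVE);

    private CharsetSniffer() {}

    /**
     * @param contentType value of the Content-Type response header, may be null
     * @param htmlHead    the beginning of the page source, may be null
     * @param fallback    charset name returned when nothing usable is declared
     */
    static String detect(String contentType, String htmlHead, String fallback) {
        // The header takes priority over the <meta> tag.
        for (String source : new String[] {contentType, htmlHead}) {
            if (source == null) {
                continue;
            }
            Matcher m = CHARSET.matcher(source);
            if (m.find() && Charset.isSupported(m.group(1))) {
                return m.group(1).toLowerCase(Locale.ROOT);
            }
        }
        return fallback;
    }
}
```

The result could then be passed as the charSet argument of getStringFromLink, with "utf-8" or "gbk" as the fallback depending on the site.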


 Main implementation: RemoteText.java

package com.remote;

import com.remotepro.R;

import android.app.Activity;
import android.app.ProgressDialog;
import android.os.Bundle;
import android.os.Handler;
import android.os.Message;
import android.text.method.ArrowKeyMovementMethod;
import android.view.View;
import android.view.View.OnClickListener;
import android.view.Window;
import android.widget.EditText;
import android.widget.ImageView;
import android.widget.TextView;
import android.widget.Toast;

public class RemoteText extends Activity {
    TextView remoteText;
    EditText myEditText;
    ImageView mySearchBtn;
    ImageView myHomeBtn;
    MySelfHttpClient mySelfHttpClient;
    String link = "http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html"; // sina blog
    String charSet = "utf-8"; // sina blog
    // String link = "http://7071976.blog.51cto.com/7061976/1289909";
    // String charSet = "gbk";

    String myText;
    // Prefix used to check the link type; a toast reports links that do not match
    String linktag = "http://blog.sina.com.cn"; // sina is used as the example

    ConnectionDetector myConnectionDetector; // checks whether the device is online
    ProgressDialog myProgressDialog = null;  // loading progress dialog

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        requestWindowFeature(Window.FEATURE_NO_TITLE);
        setContentView(R.layout.remotemain);
        init();
    }

    public void init() {
        remoteText = (TextView) findViewById(R.id.remotetext);
        myEditText = (EditText) findViewById(R.id.remote_searedit);
        mySearchBtn = (ImageView) findViewById(R.id.remote_searchbtn);
        myHomeBtn = (ImageView) findViewById(R.id.remote_searchhome);
        mySelfHttpClient = new MySelfHttpClient();
        myConnectionDetector = new ConnectionDetector(this);
        mySearchBtn.setOnClickListener(mySearcClick);
        myHomeBtn.setOnClickListener(myHomeClckListener);
        initText();
    }

    /***************************/
    public void initText() {
        if (myConnectionDetector.isConnectingToInternet()) {
            myProgressDialog = ProgressDialog.show(this, getString(R.string.waiting),
                    getResources().getString(R.string.loading));
            new InitTextThead().start();
        }
    }

    // Network access must not run on the UI thread, so the download runs here
    class InitTextThead extends Thread {
        @Override
        public void run() {
            super.run();
            // Fetch the page and extract the post body
            myText = mySelfHttpClient.getContent(mySelfHttpClient.getStringFromLink(link, charSet));
            myHandler.sendEmptyMessage(1);
        }
    }

    Handler myHandler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            super.handleMessage(msg);
            switch (msg.what) {
            case 1: // default post loaded
            case 2: // post from the input box loaded
                /**************************/
                // Let the TextView copy link text the way an EditText can
                remoteText.setFocusableInTouchMode(true);
                remoteText.setFocusable(true);
                remoteText.setClickable(true);
                remoteText.setLongClickable(true);
                remoteText.setMovementMethod(ArrowKeyMovementMethod.getInstance());
                /**************************/
                remoteText.setText(myText);
                myProgressDialog.dismiss();
                break;
            case 3: // the link does not match linktag
                myProgressDialog.dismiss();
                Toast.makeText(RemoteText.this, R.string.errorlingaddr, Toast.LENGTH_LONG).show();
                break;
            default:
                break;
            }
        }
    };

    /********************************/
    OnClickListener mySearcClick = new OnClickListener() {
        @Override
        public void onClick(View v) {
            searchclick();
        }
    };

    public void searchclick() {
        if (myConnectionDetector.isConnectingToInternet()) {
            myProgressDialog = ProgressDialog.show(this, getResources().getString(R.string.waiting),
                    getResources().getString(R.string.loading));
            new SearchThread().start();
        }
    }

    class SearchThread extends Thread {
        @Override
        public void run() {
            super.run();
            String link = myEditText.getText().toString();
            if (link.startsWith(linktag)) {
                myText = mySelfHttpClient.getContent(mySelfHttpClient.getStringFromLink(link, charSet));
                myHandler.sendEmptyMessage(2);
            } else {
                myHandler.sendEmptyMessage(3);
            }
        }
    }

    /********************************/
    OnClickListener myHomeClckListener = new OnClickListener() {
        @Override
        public void onClick(View v) {
            initText();
        }
    };
}


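Parsing a 51cto blog post

 Change which lines are commented out in MySelfHttpClient.java and RemoteText.java.


In MySelfHttpClient.java, enable the showContent line and comment out the articalContent line (the lines are shown here in their sina configuration):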

// String divclass = "showContent"; // 51cto blog content
String divclass = "articalContent"; // sina blog content



In RemoteText.java, swap the commented-out link and charSet lines in the same way; 51cto pages are encoded as gbk

String link = "http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html"; // sina blog
String charSet = "utf-8"; // sina blog
// String link = "http://7071976.blog.51cto.com/7061976/1289909";
// String charSet = "gbk";
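
Instead of swapping commented-out lines, the per-site values could be kept together in one place. The sketch below is only an illustration of that idea: the enum BlogSite and its field names are hypothetical and not part of the demo, while the class names and encodings are the ones given in this article.

```java
/** Hypothetical per-site settings (not part of the original demo). */
enum BlogSite {
    // Content class and page encoding for each site, as stated in the article.
    SINA("articalContent", "utf-8"),
    CTO51("showContent", "gbk");

    final String contentClass; // class name to pass to getElementsByClass(...)
    final String charSet;      // encoding to pass to getStringFromLink(link, charSet)

    BlogSite(String contentClass, String charSet) {
        this.contentClass = contentClass;
        this.charSet = charSet;
    }
}
```

A call such as getStringFromLink(link, BlogSite.CTO51.charSet) would then pick up the gbk encoding. MySelfHttpClient would still need a small change to accept contentClass in place of the hard-coded divclass, which is not shown here.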



 To load links typed into the app's input box, linktag in RemoteText.java must be changed as well.

RemoteText.java

String linktag = "http://blog.sina.com.cn"; // checks whether the link in the EditText is valid; sina is used as the example here


class SearchThread extends Thread {
    @Override
    public void run() {
        super.run();
        String link = myEditText.getText().toString();
        if (link.startsWith(linktag)) {
            myText = mySelfHttpClient.getContent(mySelfHttpClient.getStringFromLink(link, charSet));
            myHandler.sendEmptyMessage(2);
        } else {
            myHandler.sendEmptyMessage(3);
        }
    }
}
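
A prefix test with startsWith also accepts a host such as blog.sina.com.cn.example.com, and it rejects https links. If a stricter check is wanted, the host can be compared after parsing the link. This is a minimal sketch using java.net.URI; the class LinkChecker and the method isFromHost are hypothetical names, not part of the demo.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper (not part of the original demo): validates a link by its host. */
final class LinkChecker {
    private LinkChecker() {}

    /**
     * Returns true if link is an http or https URL whose host is expectedHost
     * or a subdomain of it (51cto blogs use a numeric subdomain per author).
     */
    static boolean isFromHost(String link, String expectedHost) {
        if (link == null || expectedHost == null) {
            return false;
        }
        try {
            URI uri = new URI(link.trim());
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return false;
            }
            boolean http = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
            String h = host.toLowerCase(Locale.ROOT);
            String expected = expectedHost.toLowerCase(Locale.ROOT);
            return http && (h.equals(expected) || h.endsWith("." + expected));
        } catch (URISyntaxException e) {
            return false; // not a well-formed URI
        }
    }
}
```

In SearchThread, link.startsWith(linktag) could then be replaced by LinkChecker.isFromHost(link, "blog.sina.com.cn"), or by "blog.51cto.com" when parsing 51cto posts.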


Parsing the post at http://7071976.blog.51cto.com/7061976/1289909 gives the following result

[Screenshot: the parsed 51cto blog post displayed in the TextView]



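Summary: taking blog post content as the example, this article uses HttpClient to fetch the page and jsoup to parse it and extract the post body. The text shown in the TextView looks a little messy, but that can be improved.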
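
The article does not say which improvement is meant. One possible first step, shown here purely as an assumption, is to normalize the whitespace of the extracted text before calling setText. The class TextTidy is a hypothetical name and is not part of the demo.

```java
/** Hypothetical helper (not part of the original demo): tidies extracted text. */
final class TextTidy {
    private TextTidy() {}

    static String tidy(String raw) {
        if (raw == null) {
            return "";
        }
        String s = raw.replace('\u00A0', ' ')   // non-breaking spaces (&nbsp;) become plain spaces
                      .replace("\r\n", "\n");   // normalize Windows line endings
        s = s.replaceAll("[ \\t]+", " ")        // collapse runs of spaces and tabs
             .replaceAll(" ?\\n ?", "\n")       // drop spaces around line breaks
             .replaceAll("\\n{3,}", "\n\n");    // keep at most one blank line in a row
        return s.trim();
    }
}
```

It would be applied as remoteText.setText(TextTidy.tidy(myText)).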

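Applying the technique: take the Jinggangshan University library management system as an example. The university rents this system from an outside company. The usual approach to building a client for it would be to have the back-end database expose a site that serves search data, but the company does not offer that service. In that case, HttpClient can be used to fetch and parse the web pages to implement login, search, renewal, and similar features, which is enough to build an Android client.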



Reposted from lilin9105's 51CTO blog; original link: http://blog.51cto.com/7071976/1297327. Contact the original author for permission to repost.

Original link: https://yq.aliyun.com/articles/447804
