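I use PHP's curl mainly to fetch data. Other approaches work too, such as fsockopen or file_get_contents, but they are convenient mainly for pages that can be accessed directly. Fetching pages behind access control, or pages that are only visible after logging in, is considerably harder with them.
1. Fetching a page with no access control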
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
// Return the response as a string; if this line is commented out, curl_exec() prints the response directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
?>
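curl_exec() returns false when the transfer fails, so it is worth checking the result before using it. Here is a minimal sketch of that check. The helper name fetch_page(), the shape of the returned array, and the 10-second default timeout are my own assumptions, not part of the original example.

```php
<?php
// Hypothetical helper: fetch a URL and report failures instead of ignoring them.
function fetch_page(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds); // assumed limit
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);        // assumed limit

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure: DNS, connection refused, timeout, bad scheme...
        return ['ok' => false, 'status' => 0, 'body' => null, 'error' => curl_error($ch)];
    }

    // The transfer worked; the HTTP status may still be 404, 500, and so on.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ['ok' => true, 'status' => $status, 'body' => $body, 'error' => ''];
    // The handle is released when $ch goes out of scope.
}
```

A caller can then test $r['ok'] and $r['status'] before touching $r['body'].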
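2. Fetching through a proxy
Why fetch through a proxy? Take Google as an example: if you request its pages too often within a short period, the requests stop succeeding because Google restricts your IP address. At that point you can switch to a proxy and fetch again.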
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://blog.51yip.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
// The proxy address must be passed as a quoted string
curl_setopt($ch, CURLOPT_PROXY, "125.21.23.6:8080");
// If the proxy requires a password, add this line:
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$result = curl_exec($ch);
curl_close($ch);
?>
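Switching proxies by hand gets tedious. One option is to loop over a list of proxies and stop at the first one that completes the transfer. The sketch below assumes two hypothetical helpers, proxy_options() and fetch_via_proxies(), and a 5-second connect timeout. None of these come from the original example.

```php
<?php
// Hypothetical helper: build the curl options for a single proxy.
function proxy_options(string $url, string $proxy, ?string $userPwd = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy,
        CURLOPT_CONNECTTIMEOUT => 5, // assumed value; tune as needed
    ];
    if ($userPwd !== null) {
        $options[CURLOPT_PROXYUSERPWD] = $userPwd; // "user:password"
    }
    return $options;
}

// Hypothetical helper: try each proxy in turn and return the first
// response body, or null if every proxy failed at the transport level.
function fetch_via_proxies(string $url, array $proxies): ?string
{
    foreach ($proxies as $proxy) {
        $ch = curl_init();
        curl_setopt_array($ch, proxy_options($url, $proxy));
        $body = curl_exec($ch);
        if ($body !== false) {
            return $body;
        }
    }
    return null;
}
```

This only treats transport failures as failures. A proxy or site that answers with a block page still counts as success here, so real code would probably also inspect the HTTP status or the body.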
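3. Posting data, then fetching the response
Submitting data deserves its own section: using curl very often involves exchanging data with the server, so this part matters.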
<?php
$ch = curl_init();
/* Note: the data to submit cannot be a two-dimensional (or deeper) array.
 * This works:  array('name'=>serialize(array('tank','zhang')),'sex'=>1,'birth'=>'20101010')
 * This raises an error: array('name'=>array('tank','zhang'),'sex'=>1,'birth'=>'20101010')
 */
$data = array('name' => 'test', 'sex' => 1, 'birth' => '20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
curl_close($ch);
?>
In upload.php, call print_r($_POST); curl then retrieves whatever upload.php prints:
Array ( [name] => test [sex] => 1 [birth] => 20101010 )
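The nested-array restriction applies when an array is handed directly to CURLOPT_POSTFIELDS. One common workaround is to encode the data with http_build_query() first. It flattens nested arrays into name[0]=...&name[1]=... pairs, which PHP on the receiving side rebuilds into an array in $_POST. A sketch follows; the helper names encode_post_fields() and post_form() are hypothetical.

```php
<?php
// Hypothetical helper: flatten (possibly nested) data into a urlencoded string.
function encode_post_fields(array $data): string
{
    // Separator given explicitly so the result does not depend on php.ini.
    return http_build_query($data, '', '&');
}

// Hypothetical helper: POST the encoded data and return the response body
// (false on failure).
function post_form(string $url, array $data)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // A string body is sent as application/x-www-form-urlencoded;
    // an array body would be sent as multipart/form-data instead.
    curl_setopt($ch, CURLOPT_POSTFIELDS, encode_post_fields($data));
    return curl_exec($ch);
}
```

With this, array('name'=>array('tank','zhang'),'sex'=>1,'birth'=>'20101010') is encoded as name%5B0%5D=tank&name%5B1%5D=zhang&sex=1&birth=20101010, and no serialize() step is needed.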
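4. Fetching pages protected by access control
I previously wrote an article on three methods of page access control; take a look if you are interested.
If you fetch such a page with the methods above, you get the following error: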
You are not authorized to view this page You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.
In this case, use CURLOPT_USERPWD to authenticate.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://club-china");
/* CURLOPT_USERPWD is mainly used to get past page access control,
 * for example the protection usually set up with htpasswd.
 * Uncomment the next line and supply your own "user:password". */
//curl_setopt($ch, CURLOPT_USERPWD, '231144:2091XTAjmd=');
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_REFERER, "http://club-china");
curl_setopt($ch, CURLOPT_HEADER, 0);
$result = curl_exec($ch);
curl_close($ch);
?>
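If the server uses HTTP Basic authentication (the scheme htpasswd files are used with), CURLOPT_USERPWD makes curl send an Authorization header built from user:password. The sketch below shows what that header looks like and how to check the status code afterwards. Both helper names are hypothetical, and CURLAUTH_BASIC is set on the assumption that Basic is the scheme in use.

```php
<?php
// Hypothetical helper: the header line Basic authentication produces
// for the given credentials (shown only to illustrate what curl sends).
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Hypothetical helper: fetch a protected page, returning [status, body].
// A 401 status means the credentials were rejected or not accepted.
function fetch_protected(string $url, string $user, string $password): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // assumes Basic auth
    curl_setopt($ch, CURLOPT_USERPWD, $user . ':' . $password);
    $body = curl_exec($ch);                              // false on failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);     // 0 if no response
    return [$status, $body];
}
```

Basic authentication only base64-encodes the credentials, so it should be used over HTTPS whenever possible.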
5. Simulating a login to sina
The data we want is often only available after logging in, and this is where curl's ability to simulate a login comes in.
<?php
function checklogin($user, $password)
{
    if (empty($user) || empty($password)) {
        return 0;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_REFERER, "http://mail.sina.com.cn/index.html");
    curl_setopt($ch, CURLOPT_HEADER, true); // include response headers so Location can be matched
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, USERAGENT);
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIEJAR); // save session cookies to this file
    curl_setopt($ch, CURLOPT_TIMEOUT, TIMEOUT);
    curl_setopt($ch, CURLOPT_URL, "http://mail.sina.com.cn/cgi-bin/login.cgi");
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "&logintype=uid&u=" . urlencode($user) . "&psw=" . urlencode($password));
    $contents = curl_exec($ch);
    curl_close($ch);
    // The code treats a redirect to /cgi/index.php?check_time=... as a successful login
    if (!preg_match("/Location: (.*)\\/cgi\\/index\\.php\\?check_time=(.*)\n/", $contents, $matches)) {
        return 0;
    } else {
        return 1;
    }
}

define("USERAGENT", $_SERVER['HTTP_USER_AGENT']);
define("COOKIEJAR", tempnam("/tmp", "cookie"));
define("TIMEOUT", 500);

echo checklogin("zhangying215", "xtaj227");
?>
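Once the login has succeeded, the cookies written by CURLOPT_COOKIEJAR can be sent with later requests through CURLOPT_COOKIEFILE, which is how the pages behind the login are actually fetched. The sketch below assumes two hypothetical helpers. One extracts the redirect target from the raw response headers, the other requests a page using the saved cookie file.

```php
<?php
// Hypothetical helper: pull the redirect target out of a raw response
// that includes headers (as returned when CURLOPT_HEADER is true).
function extract_location(string $rawResponse): ?string
{
    if (preg_match('/^Location:\s*(\S+)/mi', $rawResponse, $matches)) {
        return $matches[1];
    }
    return null;
}

// Hypothetical helper: fetch a page as the logged-in user by reusing
// the cookie file written during login. Returns the body, or false on failure.
function fetch_with_cookies(string $url, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send cookies saved at login
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // keep any updated cookies
    return curl_exec($ch);
}
```

To use it, pass the same path that was given to CURLOPT_COOKIEJAR during login (the COOKIEJAR constant in the example above) as $cookieFile.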