Help me finish a parser [Perl]

Discussion in 'PHP' started by $p01nt, 20 Feb 2009.

  1. $p01nt

    Hi everyone!
    Guys, I can't manage to parse user ids from hi5.com; the problem is with sending the requests.
    That is, the script should send a "find users" request, then a "show 10 more people for the same query" request, and so on.
    The "find users" request works fine and parses fine, but I'm stuck on what comes after it.
    Here is the code (it should parse the first and second pages of the search results, but it only parses the first):

    Code:
    use strict;
    use warnings;
    use HTTP::Cookies;
    use LWP::UserAgent;

    # The cookie jar carries the search session from one request to the next.
    my $browser = LWP::UserAgent->new();
    my $cookies = HTTP::Cookies->new();
    $browser->cookie_jar($cookies);

    open(my $id_fh, '>>', 'id.txt') or die "Cannot open id.txt: $!";

    my $url  = 'http://hi5.com/friend/processSearch.do?searchNew=1&fromPage=%2Ffriend%2FWEB-INF%2Fsearch%2FsearchTotal.jsp&fromEmail=0&oldSearchString=&email=&name=&ageFrom=25&ageTo=55&gender=0&loveStatus=1069&goals=&country=1030&zip=&city=&miles=0&miles=0';
    my $url2 = 'http://hi5.com/friend/processSearch.do?searchText=&searchType=advanced&offset=10&qx=People+Search+';

    # The first request starts the search; the second asks for the next 10 results.
    my $response = $browser->get($url);
    $response = $browser->get($url2)->as_string;

    # Dump the second response to a file for inspection.
    open(my $dump_fh, '>', 'gso.html') or die "Cannot open gso.html: $!";
    print $dump_fh $response;
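    The request sequence described above ("find users", then "10 more", and so on) can be sketched as a list of follow-up URLs. This is only an illustration: page_url is a hypothetical helper name, and it assumes that follow-up pages differ only in the offset parameter, in steps of 10, which the post suggests but does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the "next results" URL for a given offset.
# The query string is copied from $url2 in the post; only the offset varies.
sub page_url {
    my ($offset) = @_;
    return 'http://hi5.com/friend/processSearch.do?searchText=&searchType=advanced'
         . "&offset=$offset&qx=People+Search+";
}

# Assumption: 10 results per page, so pages 2, 3 and 4 start at 10, 20 and 30.
my @urls = map { page_url($_ * 10) } 1 .. 3;
print "$_\n" for @urls;
```

    Each of these URLs would be requested with the same $browser object, so that the cookies set by the first search request are sent along.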
     
  2. [dei]

    Code:
    use strict;
    use warnings;
    use HTTP::Cookies;
    use LWP::UserAgent;
    use IO::Handle;

    my $browser = LWP::UserAgent->new();
    my $cookies = HTTP::Cookies->new();
    $browser->cookie_jar($cookies);

    open(my $id_fh, '>', 'id.txt') or die "Cannot open id.txt: $!";
    $id_fh->autoflush(1);
    my $url = 'http://hi5.com/friend/processSearch.do?searchNew=1&fromPage=%2Ffriend%2FWEB-INF%2Fsearch%2FsearchTotal.jsp&fromEmail=0&oldSearchString=&email=&name=&ageFrom=25&ageTo=55&gender=0&loveStatus=1069&goals=&country=1030&zip=&city=&miles=0&miles=0';

    my $response = $browser->get($url)->as_string;

    while (1) {
        # Save every profile id on the current page, including the last one.
        while ($response =~ /title=".*?" href="\/friend\/p(\d+)/g) {
            print $id_fh "$1\n";
        }
        # The "Next" link carries the offset of the following page; stop when it is missing.
        last unless $response =~ /<a href="javascript:paginatePeople\('(\d+)',''\);" class="link_pagination_arrow"> Next &gt;<\/a>/;
        my $nx = $1;
        $response = $browser->get('http://hi5.com/friend/processSearch.do?searchText=&searchType=advanced&offset='.$nx.'&qx=People+Search+')->as_string;
    }
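    The two regular expressions in the loop can be tried without touching the network. The sketch below runs them on a made-up HTML fragment; extract_ids and next_offset are hypothetical helper names, and the sample markup is an assumption shaped only to match the patterns, not real hi5.com output.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the profile ids found on one results page.
sub extract_ids {
    my ($html) = @_;
    my @ids;
    while ($html =~ /title=".*?" href="\/friend\/p(\d+)/g) {
        push @ids, $1;
    }
    return @ids;
}

# Hypothetical helper: returns the offset from the "Next" link,
# or undef when the page has no such link (the last page).
sub next_offset {
    my ($html) = @_;
    if ($html =~ /paginatePeople\('(\d+)',''\);" class="link_pagination_arrow"> Next &gt;<\/a>/) {
        return $1;
    }
    return;
}

# Made-up sample markup (assumption), not copied from the real site.
my $sample = <<'HTML';
<a title="Alice" href="/friend/p111">Alice</a>
<a title="Bob" href="/friend/p222">Bob</a>
<a href="javascript:paginatePeople('10','');" class="link_pagination_arrow"> Next &gt;</a>
HTML

my @ids    = extract_ids($sample);
my $offset = next_offset($sample);
print "ids: @ids\n";
print "next offset: ", (defined $offset ? $offset : 'none'), "\n";
```

    On this sample the helpers find the ids 111 and 222 and the next offset 10. If the real page markup differs even slightly (attribute order, spacing around "Next"), the patterns will need adjusting.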