Checker for http://pastebin.com/

Discussion in 'Песочница' started by Шниперсон, 4 Apr 2016.

  1. Шниперсон

    Joined:
    14 May 2015
    Messages:
    63
    Likes Received:
    13
    Reputations:
    3
    Guys, is there a checker for http://pastebin.com/ around these days, and is it even still a thing?
     
  2. rct

    rct Active Member

    Joined:
    13 Jun 2015
    Messages:
    359
    Likes Received:
    107
    Reputations:
    7
    Check it for what?
     
  3. Шниперсон

    Joined:
    14 May 2015
    Messages:
    63
    Likes Received:
    13
    Reputations:
    3
    For hashes and emails, for example, and any other useful info (a rough sketch of such a check is below).
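    A minimal sketch of what such a check could look like, assuming "hashes" means 32-character MD5-style digests and "emails" means plain addresses; both patterns are assumptions, not something posted in this thread:

    Code:
    # -*- coding: utf-8 -*-
    # Hypothetical sketch: the two regexes are assumptions about what
    # "hashes" (32-char MD5-style) and "emails" mean here.
    import re

    EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
    MD5_RE = re.compile(r'\b[a-fA-F0-9]{32}\b')

    def check_paste(text):
        """Return the emails and MD5-style hashes found in a paste body."""
        return {
            'emails': EMAIL_RE.findall(text),
            'hashes': MD5_RE.findall(text),
        }

    if __name__ == '__main__':
        print(check_paste('user@example.com:5f4dcc3b5aa765d61d8327deb882cf99'))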
     
  4. rct

    rct Active Member

    Joined:
    13 Jun 2015
    Messages:
    359
    Likes Received:
    107
    Reputations:
    7
    And where would you get the list of paste URLs? There is no search method in the docs at http://pastebin.com/api. Otherwise it would not take long to write.
     
  5. Шниперсон

    Joined:
    14 May 2015
    Messages:
    63
    Likes Received:
    13
    Reputations:
    3
    At the very least, grab the URLs from http://pastebin.com/archive once a minute.
     
  6. blackbox

    blackbox Elder - Старейшина

    Joined:
    31 Dec 2011
    Messages:
    362
    Likes Received:
    62
    Reputations:
    11
    Here is a Python version: it grabs the links once a minute and saves everything as text files in the specified directory.

    Code:
    # -*- coding: utf-8 -*-
    # Polls http://pastebin.com/archive once a minute and saves every new
    # "Untitled" paste as a text file in the given directory.

    import os
    import re
    import socket
    import sys
    import time

    import requests
    import socks  # PySocks; only needed if the proxy lines below are uncommented

    # Pretend to be a regular browser so the requests are not rejected.
    HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

    def save_paste(url, out_dir):
        """Download a raw paste and write it to <out_dir>/<paste_id>.txt."""
        ext = 'txt'
        try:
            r = requests.get(url, headers=HEADERS, timeout=15)
        except Exception as e:
            print(e)
            return None
        html = r.text

        # The last path component of the URL becomes the file name.
        url = url.rstrip('/')
        pos = url.rfind('/')
        if pos == -1:
            print("Error getting file name from url")
            return None
        file_name = url[pos + 1:] + '.' + ext
        print('Saving to ' + file_name)

        try:
            if not os.path.exists(out_dir):
                os.mkdir(out_dir)
            with open(os.path.join(out_dir, file_name), 'wb') as f:
                f.write(html.encode('utf-8'))
        except Exception as e:
            print(e)
            return None
        return True

    archive_url = 'http://pastebin.com/archive'
    raw_pre_url = 'http://pastebin.com/raw/'
    out_dir = 'saved'
    sleep_time = 60  # seconds between archive polls

    # Uncomment to route the traffic through a local SOCKS5 proxy
    # (9150 is the default port of the Tor Browser bundle).
    #socks.set_default_proxy(socks.SOCKS5, "localhost", 9150)
    #socket.socket = socks.socksocket

    # Matches rows of the archive table that link to pastes named "Untitled"
    # and captures the paste id; non-greedy so a single match cannot swallow
    # several table rows at once.
    paste_link_re = re.compile('class="i_p0" alt="" /><a href="/(.+?)">Untitled</a></td>')

    while True:
        try:
            r = requests.get(archive_url, headers=HEADERS, timeout=10)
        except Exception as e:
            print(e)
            print('Exiting')
            sys.exit()

        for paste_id in paste_link_re.findall(r.text):
            save_paste(raw_pre_url + paste_id, out_dir)

        print('Sleeping for ' + str(sleep_time) + ' seconds ...')
        time.sleep(sleep_time)

    You will also need to install the requests and PySocks modules with: pip install module_name.
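    In full, the two install commands are:

    Code:
    pip install requests
    pip install PySocks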
     
    #6 blackbox, 8 Apr 2016
    Last edited: 8 Apr 2016
    Шниперсон and trolex like this.