http file size/mime prober

lifescore · 19 Jan 2021

Knocked together quickly; takes arguments separated by spaces or tabs.
Main features:
1. Does not require a URL scheme (http/https). Accepts both domain.tld/link and http(s)://domain.tld.
2. Drops the connection once it has been established. Only the first few KB of data are downloaded, so the files themselves never have to be fetched in full.
3. Runs anywhere.
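
To illustrate feature 1, here is a minimal sketch of how a missing scheme could be filled in before the request is made. The function name `normalize_url` and the choice of `http://` as the default scheme are my assumptions for the example, not necessarily what the script does internally.

```shell
#!/bin/sh
# Hypothetical helper: prepend a scheme only when the argument has none.
# Assumption: plain http:// is used as the default scheme.
normalize_url() {
    case "$1" in
        http://*|https://*)
            # Scheme already present, pass the URL through unchanged.
            printf '%s\n' "$1"
            ;;
        *)
            # Bare domain.tld/link form, add the assumed default scheme.
            printf 'http://%s\n' "$1"
            ;;
    esac
}
```

With this, `normalize_url domain.tld/link` prints `http://domain.tld/link`, while a URL that already starts with `https://` comes back untouched.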

P.S. With xargs running 200 processes, going through 400,000 file links took about 15 minutes.

There is also a .NET implementation with parallel threads (faster); if anyone is interested, I can share it on request.

Install on Debian/Ubuntu
▶ sudo apt install ripgrep curl
Install on macOS
▶ brew install ripgrep curl
Windows with WSL (supported)

Options
Usage of http_file_prober.sh:
Code:
▶ http_file_prober.sh [url] [url2] [urlX..]
       Pass one URL, or several space-delimited URLs, as arguments
▶ cat urls.txt | xargs -n10 -P25 /fullpath/http_file_prober.sh
       Run the script in multiple processes, where "-P25" is the number of processes. For help, see `man xargs`
Usage sample
Code:
▶ ./http_file_prober.sh https://pastebin.com/raw/{8XD3uxYQ,cydhSP1v,g5wTvVq7}
https://pastebin.com/raw/8XD3uxYQ                                                     | Null       | text/plain
https://pastebin.com/raw/cydhSP1v                                                     | Null       | text/plain
https://pastebin.com/raw/g5wTvVq7                                                     | Null       | text/plain
Code:
▶ ./http_file_prober.sh https://yandex.ru/news/quotes/2002.html https://yandex.ru/support/common/troubleshooting/main.html
https://yandex.ru/news/quotes/2002.html                                               | 6856       | text/html
https://yandex.ru/support/common/troubleshooting/main.html                            | 266851     | text/html
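
For anyone curious how rows like the ones above can be produced, below is a rough sketch that turns a raw HTTP header dump into one `URL | size | mime` line. The function name `format_row`, the use of awk (the script itself lists ripgrep as a dependency) and the 85-character URL column are assumptions made for illustration. The "Null" placeholder for a missing Content-Length and the 10-character size column follow the sample output.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the URL, stdin is a raw HTTP response header dump.
format_row() {
    # Strip carriage returns first, since HTTP header lines end in CRLF.
    tr -d '\r' | awk -v url="$1" '
        BEGIN { size = "Null"; mime = "Null" }
        {
            lower = tolower($0)   # header names are case-insensitive
            if (index(lower, "content-length:") == 1) {
                size = $0
                sub(/^[^:]*:[ \t]*/, "", size)   # keep only the value
            } else if (index(lower, "content-type:") == 1) {
                mime = $0
                sub(/^[^:]*:[ \t]*/, "", mime)
                sub(/;.*$/, "", mime)            # drop "; charset=..." parameters
            }
        }
        # Column widths are assumed to roughly match the sample output.
        END { printf "%-85s | %-10s | %s\n", url, size, mime }
    '
}
```

One way to try it is `curl -s -I "$url" | format_row "$url"`. Note that this sends a HEAD request, which is not the same as the connect-and-drop approach described in the post.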
Sources: https://github.com/h4r7w3l1/http_file_prober
