Curl and stream filters

raarkil · 29 Jan 2023

Use cURL to fetch the source code of the website "https://www.inlanefreight.com" and filter out all unique paths of that domain. Submit the number of these paths as the answer.

It looks like a simple task, but it has me completely stuck. Maybe I'm doing something wrong?
curl https://www.inlanefreight.com | grep src=https://www.inlanefreight.com/*/ | wc -l

P.S. I'm still just learning and haven't heard of the full range of commands and their arguments yet (if there is something on the topic worth reading, I'd be glad to have it).

b3 · 29 Jan 2023

Code:

┌──(kali㉿kali)-[/dev/shm]
└─$ curl -f -L -s https://curl.se/ | grep -Po '(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])'
http://www.w3.org/TR/html4/loose.dtd
https://github.com/curl/curl/releases.atom
https://github.com/curl/curl
https://lists.haxx.se/listinfo/curl-library
https://lists.haxx.se/listinfo/curl-users
https://github.com/curl/curl/issues
https://curl.se/logo/curl-logo.svg
https://bestpractices.coreinfrastructure.org/projects/63
https://github.com/curl/curl
https://www.fastly-insights.com/insights.js?k=8cb1247c-87c2-4af9-9229-768b1990f90b

That's a regex from Google for collecting all URLs. But there are also links that start with a slash or that simply point to a file, like <a href="/page.html">

Code:

┌──(kali㉿kali)-[/dev/shm]
└─$ curl -f -L -s https://curl.se/ | grep -Po 'href="(.*)"' | awk -F\" '{print $2}' | sort -u 
/book.html
/changes.html
/curl.css
/dashboard.html
/dev/
/dev/builds.html
/dev/code-review.html
/dev/code-style.html
/dev/contribute.html
/dev/deprecate.html
/dev/internals.html
/dev/release-notes.html
/dev/release-procedure.html
/dev/roadmap.html
/dev/runtests.html
/dev/secprocess.html
/dev/testcurl.html
/docs/
/docs/bugbounty.html
/docs/caextract.html
docs/copyright.html
/docs/faq.html
/docs/help-us.html
docs/help-us.html
/docs/http2.html
/docs/http-cookies.html
/docs/httpscripting.html
/docs/irc.html
/docs/knownbugs.html
/docs/manpage.html
/docs/manual.html
/docs/projdocs.html
/docs/protdocs.html
/docs/reldocs.html
/docs/releases.html
/docs/security.html
/docs/sslcerts.html
docs/thanks.html
/docs/todo.html
/docs/tooldocs.html
/docs/versions.html
/docs/videos/
/docs/vulnerabilities.html
/docs/whodocs.html
/donation.html
/download.html
download.html
/favicon.ico
/gethelp.html
https://bestpractices.coreinfrastructure.org/projects/63
https://github.com/curl/curl
https://github.com/curl/curl/issues
https://github.com/curl/curl/releases.atom
https://lists.haxx.se/listinfo/curl-library
https://lists.haxx.se/listinfo/curl-users
/libcurl/
/libcurl/abi.html
/libcurl/c/
/libcurl/c/example.html
/libcurl/c/libcurl-tutorial.html
/libcurl/competitors.html
/libcurl/features.html
/libcurl/relatedlibs.html
/libcurl/theysay.html
/libcurl/using/
/logo/curl-symbol.svg
/mail/
/mail/list.cgi?list=curl-library
/news.html
/rfc/
/sponsors.html
sponsors.html
/support.html
/tiny/
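
A side note on the pattern above, as a sketch rather than a fix to the run shown: `href="(.*)"` is greedy, so on a line that holds several links the match runs to the last quote on that line and `awk` then prints only the first value. Matching `[^"]*` instead stops at the closing quote of each attribute. The function name `extract_hrefs` is made up for illustration, and `-oE` is used here in place of `-Po` only because this pattern needs no Perl syntax.

```shell
# Print every href value from HTML read on stdin, one per line, deduplicated.
# [^"]* stops at the first closing quote, so several links on one line are
# each matched separately.
extract_hrefs() {
    grep -oE 'href="[^"]*"' | cut -d'"' -f2 | sort -u
}

# Usage (needs network access):
#   curl -f -L -s https://curl.se/ | extract_hrefs
```

This only covers double-quoted `href` attributes; single-quoted or unquoted values would need another pattern.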

b3 · 29 Jan 2023

But links like these also include paths to images and stylesheets:
/favicon.ico
/curl.css

Code:

┌──(kali㉿kali)-[/dev/shm]
└─$ curl -f -L -s https://curl.se/ | grep -Po 'href="(.*)"' | awk -F\" '{print $2}' | sort -u | grep -vP '\.(ico|svg|css)$'
/book.html
/changes.html
/dashboard.html
/dev/
/dev/builds.html
/dev/code-review.html
/dev/code-style.html
/dev/contribute.html
/dev/deprecate.html
/dev/internals.html
/dev/release-notes.html
/dev/release-procedure.html
/dev/roadmap.html
/dev/runtests.html
/dev/secprocess.html
/dev/testcurl.html
/docs/
/docs/bugbounty.html
/docs/caextract.html
docs/copyright.html
/docs/faq.html
/docs/help-us.html
docs/help-us.html
/docs/http2.html
/docs/http-cookies.html
/docs/httpscripting.html
/docs/irc.html
/docs/knownbugs.html
/docs/manpage.html
/docs/manual.html
/docs/projdocs.html
/docs/protdocs.html
/docs/reldocs.html
/docs/releases.html
/docs/security.html
/docs/sslcerts.html
docs/thanks.html
/docs/todo.html
/docs/tooldocs.html
/docs/versions.html
/docs/videos/
/docs/vulnerabilities.html
/docs/whodocs.html
/donation.html
/download.html
download.html
/gethelp.html
https://bestpractices.coreinfrastructure.org/projects/63
https://github.com/curl/curl
https://github.com/curl/curl/issues
https://github.com/curl/curl/releases.atom
https://lists.haxx.se/listinfo/curl-library
https://lists.haxx.se/listinfo/curl-users
/libcurl/
/libcurl/abi.html
/libcurl/c/
/libcurl/c/example.html
/libcurl/c/libcurl-tutorial.html
/libcurl/competitors.html
/libcurl/features.html
/libcurl/relatedlibs.html
/libcurl/theysay.html
/libcurl/using/
/mail/
/mail/list.cgi?list=curl-library
/news.html
/rfc/
/sponsors.html
sponsors.html
/support.html
/tiny/
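
The list above still contains the same page twice, for example `docs/help-us.html` and `/docs/help-us.html`. A rough sketch of one way to collapse them, assuming the page was fetched from the site root so that relative links resolve against `/`. The name `normalize_paths` and the extra extensions (`png`, `jpg`, `js`) are my own additions, not something from the run above.

```shell
# Read link targets on stdin and print unique site-local paths.
# Step 1: drop absolute URLs (scheme:...) and protocol-relative links (//host).
# Step 2: make every remaining path start with a single "/".
# Step 3: drop static assets by extension (the extension list is an assumption).
normalize_paths() {
    grep -vE '^([a-zA-Z][a-zA-Z0-9+.-]*:|//)' |
        sed -E 's#^/?#/#' |
        grep -vE '\.(ico|svg|css|png|jpg|js)$' |
        sort -u
}

# Usage (needs network access):
#   curl -f -L -s https://curl.se/ | grep -oE 'href="[^"]*"' | cut -d'"' -f2 | normalize_paths
```

For a page that is not at the site root, relative links would have to be resolved against that page's directory instead, which this sketch does not do.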

raarkil · 30 Jan 2023

Damn, I need to learn regular expressions. I wanted to get around them; by the time you get used to them your brain breaks :)
Thanks a lot.
Do I understand correctly that what we need is basically not the links that end in a file name, but links like:
http://www.w3.org/TR/html4/
without loose.dtd at the end?
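
Both readings of "unique paths" can be counted and compared. This is only a sketch: the helper names `domain_urls` and `domain_dirs` are hypothetical, and nothing here is confirmed to be what the exercise expects. The first function lists unique full URLs under a base URL, the second cuts off everything after the last slash first.

```shell
# Print unique URLs that start with the base URL given as $1 (HTML on stdin).
# A URL is taken to end at a quote, a space or ">".
# Dots in $1 are not escaped, so they match any character; not strict, but
# usually harmless for this purpose.
domain_urls() {
    grep -oE "${1}[^\"' >]*" | sort -u
}

# Same, but with the trailing file name removed, leaving directory paths only.
domain_dirs() {
    domain_urls "$1" | sed -E 's#[^/]*$##' | sort -u
}

# Usage (needs network access):
#   curl -s https://www.inlanefreight.com | domain_urls https://www.inlanefreight.com/ | wc -l
#   curl -s https://www.inlanefreight.com | domain_dirs https://www.inlanefreight.com/ | wc -l
```

A query string that itself contains a slash would be cut in the wrong place by `domain_dirs`.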

raarkil · 1 Feb 2023

Thanks, I've solved it all.
