As a web developer, it's often handy to crawl one of your sites and see if any links are broken, or return plain 500 errors because something is broken behind them.
A classic tool for this is Xenu's Link Sleuth. That tool is old though, no longer updated, and it's a pure GUI tool. Since I couldn't find what I wanted as a ready-to-use command-line tool, I sat down and wrote my own. It took a while, but recently it became functional enough to release as a v1 and open up to the world as an open-source tool.
So with this I present, *drumroll*, Sitecrawler: a command-line site crawler (yeah, I know, naming is hard).
What can it do?
- Crawl a site by following all links on a page. It only crawls internal links and HTML content.
- Crawl each link only once. No crawling loops, please.
- Export the crawled links to a CSV file, including the referring pages (handy for tracking down 404s).
- Limit crawl time to a fixed number of minutes for large sites.
- Set the number of parallel jobs to use for crawling.
- Add a delay to throttle requests on slow sites, or sites with rate limits (there's a rough sketch of this crawl loop right after this list).
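
To give an idea of what that boils down to, here's a minimal sketch of such a crawl loop in C# (.NET 6). This is not the actual Sitecrawler code; the example.com start URL, the batch size of four, the fixed delay and the naive regex link extraction are all placeholders, just to illustrate the idea of crawling internal HTML links once, a few pages at a time, with a delay in between.

```csharp
// Illustrative only: a bare-bones crawl loop, not the actual Sitecrawler code.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

var root = new Uri("https://example.com/");      // placeholder start URL
var delay = TimeSpan.FromMilliseconds(200);      // throttle for slow or rate-limited sites
var visited = new HashSet<string> { root.AbsoluteUri };
var frontier = new List<(Uri Url, string Referrer)> { (root, "") };
var results = new List<(string Url, int Status, string Referrer)>();

using var http = new HttpClient();

while (frontier.Count > 0)
{
    var next = new List<(Uri Url, string Referrer)>();

    // Fetch the current batch of pages a few at a time ("parallel jobs").
    foreach (var chunk in frontier.Chunk(4))
    {
        var pages = await Task.WhenAll(chunk.Select(async item =>
        {
            var response = await http.GetAsync(item.Url);
            var isHtml = response.Content.Headers.ContentType?.MediaType == "text/html";
            var html = response.IsSuccessStatusCode && isHtml
                ? await response.Content.ReadAsStringAsync()
                : "";
            return (item.Url, item.Referrer, Status: (int)response.StatusCode, Html: html);
        }));

        foreach (var page in pages)
        {
            results.Add((page.Url.AbsoluteUri, page.Status, page.Referrer));

            // Follow only internal links, and each URL only once.
            foreach (Match m in Regex.Matches(page.Html, "href=\"([^\"]+)\""))
            {
                if (Uri.TryCreate(page.Url, m.Groups[1].Value, out var link) &&
                    link.Host == root.Host &&
                    visited.Add(link.AbsoluteUri))
                {
                    next.Add((link, page.Url.AbsoluteUri));
                }
            }
        }

        await Task.Delay(delay); // be polite between batches
    }

    frontier = next;
}

// Dump a simple CSV: URL, status code, referring page (handy for hunting down 404s).
Console.WriteLine("url,status,referrer");
foreach (var (url, status, referrer) in results)
    Console.WriteLine($"{url},{status},{referrer}");
```

The real tool does a lot more than this (proper HTML parsing, time limits, configurable options), so treat this purely as a mental model of the crawl loop.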
It’s written in .NET 6, so it runs on Windows, Mac and Linux. Check it out on GitHub for more details and downloads. It’s proven useful for me already, so I hope it does the same for you.