Will anyone find me a decent web archival program? [M$20,000 bounty]
2026 · 61% chance

I want something like Pappet: very simple, doesn't waste my time reading documentation and debugging, just lets me save a website in its entirety. Pappet does not work for me because it cannot handle websites that are behind a login page, and any links in the .mhtml files it creates point to the live site rather than to the other archived files. I also haven't tested its robustness; I imagine it might fail on JS-rendered content, miss "links" that aren't <a> elements, etc.
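To illustrate the link-rewriting gap described above, here is a minimal, hypothetical Python sketch of what an archiver would need to do: given a mapping from archived live URLs to local filenames, repoint href attributes at the local copies instead of the live site. The function name and the naive regex are my own assumptions, not Pappet's implementation.

```python
import re

def rewrite_links(html, url_to_file):
    """Replace href values that match archived URLs with local filenames.

    url_to_file maps a live URL to the local file it was archived as.
    Unarchived URLs are left pointing at the live site.
    """
    def repl(match):
        url = match.group(2)
        local = url_to_file.get(url)
        return f"{match.group(1)}{local or url}{match.group(3)}"
    # Naive regex for illustration only; a real tool would use an HTML
    # parser and also handle src attributes, srcset, CSS url(...), etc.
    return re.sub(r'(href=")([^"]+)(")', repl, html)

archive = {"https://bugs-legacy.mojang.com/browse/MC-4": "MC-4.html"}
page = '<a href="https://bugs-legacy.mojang.com/browse/MC-4">MC-4</a>'
print(rewrite_links(page, archive))
```

This is exactly the step the .mhtml output skips, which is why its internal links escape back to the live site.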

The website I want to archive right now is https://bugs-legacy.mojang.com/, so while I'd ideally like the tool to be more broadly applicable, I'll resolve the market based on whether the tool can archive that particular website to my satisfaction.

Will give you 20k mana if you point me to a suitable tool.

Resolves NO if no one does before close.


I've been using ArchiveBox running on a Raspberry Pi, though it's the opposite of your non-functional requirements.

I think you can use the SingleFile CLI for this. See the last two examples in the linked README.

@GarrettBaker eg single-file https://www.wikipedia.org --crawl-links=true --crawl-inner-links-only=true --crawl-max-depth=1 --crawl-rewrite-rule="^(.*)\\?.*$ $1"
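A note on the last flag in that command: `--crawl-rewrite-rule="^(.*)\?.*$ $1"` normalizes URLs before crawling by dropping the query string, so variants like `page?sort=asc` and `page?sort=desc` collapse into one page. A small Python sketch of the same regex behavior (the function name is mine, for illustration):

```python
import re

def strip_query(url):
    # Equivalent of the rewrite rule "^(.*)\?.*$ $1": if the URL contains
    # a "?", keep only the part before it; otherwise leave it unchanged.
    return re.sub(r"^(.*)\?.*$", r"\1", url)

print(strip_query("https://bugs-legacy.mojang.com/browse/MC-4?focusedId=1"))
print(strip_query("https://bugs-legacy.mojang.com/browse/MC-4"))
```

For a bug tracker like bugs-legacy.mojang.com this matters, since filter and sort parameters would otherwise multiply the crawl space.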

i want to know one too

© Manifold Markets, Inc.