I want something like Pappet: very simple, doesn't make me waste time reading documentation and debugging, just lets me save a website in its entirety. Pappet doesn't work for me because it can't handle websites that are behind a login page, and any links in the .mhtml files it creates point to the live site rather than to the other archived files. I also haven't tested its robustness; I imagine it might fail on JS-rendered content, miss "links" that aren't <a> elements, etc.
I only need a single snapshot, not a constantly-updated archive.
The website I want to archive right now is https://bugs-legacy.mojang.com/, so while I'd ideally like the tool to be more broadly applicable, I'll resolve the market based on whether the tool can archive that particular website to my satisfaction.
Will give you 20k mana if you point me to a suitable tool. Resolves NO if no one does before close.
Try HTTrack perhaps: https://www.httrack.com/
Instructions here for archiving a site that requires login: https://superuser.com/questions/157331/mirroring-a-web-site-behind-a-login-form#1274008
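A minimal sketch of the terminal version, in case that's easier than the GUI (untested on this particular site; the mojira-mirror directory name is just a placeholder, and dropping a cookies.txt into the project directory is the usual way to carry a logged-in session into HTTrack):

# Export your logged-in session cookies from the browser in Netscape format,
# then copy them into the project directory; HTTrack reads cookies.txt from there.
# (~/Downloads/cookies.txt is a placeholder for wherever your export lands.)
mkdir -p mojira-mirror
cp ~/Downloads/cookies.txt mojira-mirror/cookies.txt

# Mirror the site into that directory, staying on the same domain, with verbose output.
httrack "https://bugs-legacy.mojang.com/" -O ./mojira-mirror "+bugs-legacy.mojang.com/*" -v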
@IsaacKing No, that part is for after you download it; they're explaining one way to view the downloaded site. The download itself is automated: you specify a page to start from and some settings for how deep to follow links recursively, and it proceeds automatically from there.
@A Seems broken; the resulting files display raw HTML or a blank page that says "click here". Did it work for you?
@AlexanderTheGreater Hmm, seems complicated. I'll count it if you can lay out a simple series of terminal commands that get me the end result I want.
@GarrettBaker e.g. single-file https://www.wikipedia.org --crawl-links=true --crawl-inner-links-only=true --crawl-max-depth=1 --crawl-rewrite-rule="^(.*)\\?.*$ $1"
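To run that end to end on the site in question, something like this should work (a sketch I haven't tested here; the flags are from the single-file-cli README as I understand it, and if I'm reading it right, --crawl-replace-URLs is the one that makes saved links point at the archived copies instead of the live site):

# Install the CLI (needs Node.js; it drives an existing Chrome/Chromium install).
npm install -g single-file-cli

# Crawl one level deep, staying on the same site, rewriting links between saved pages.
single-file "https://bugs-legacy.mojang.com/" --crawl-links=true --crawl-inner-links-only=true --crawl-max-depth=1 --crawl-replace-URLs=true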
@IsaacKing My understanding is that it acts through your already-installed Chrome or Chromium, and that it should work if you get automatically logged in to the relevant websites via the cookies saved there.
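If that's right, you could presumably also point it at your normal browser install and profile so the crawl runs with your login cookies already in place. A guess at the invocation (both paths are placeholders for wherever your Chrome and profile actually live, and Chrome may refuse to use a profile that's already open in another window):

single-file "https://bugs-legacy.mojang.com/" --browser-executable-path="/usr/bin/google-chrome" --browser-args='["--user-data-dir=/home/me/.config/google-chrome"]' --crawl-links=true --crawl-inner-links-only=true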
@GarrettBaker Same linking issue as Pappet; all internal links become external links in the archive.
It also doesn't seem to handle JavaScript properly; all JavaScript functionality breaks in the archived files.
