Just the Text, Huh?

If anyone knows of a proxy I could give a web URL to and receive a simple .txt version back of the article, please let me know! Otherwise, I might be tempted to create one. Maybe a gopher service?

I don't know about a proxy, but I wonder how far @m15o could get with the following command:

$ lynx -dump -nolist "${URL}" > "${FILENAME}.txt"

If a site depends on JS to render its content, this won't work. But if the text is already in the HTML and merely buried under entirely too much JS, this might be enough to extract it. You'll still want to massage the output using sed, though.

That's what I did when retrieving and cleaning the Limyaael Rants.
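
As a rough illustration of that fetch-and-massage step, here is a minimal sketch. It is not the exact cleanup used for the Limyaael Rants; the function names clean_text and url_to_txt are made up for this example, and the only lynx flags used are the two from the command above. The sed part just strips trailing whitespace and collapses runs of blank lines, which is an assumption about what "massaging" a dump usually involves.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from any existing tool.

# clean_text: read text on stdin, write cleaned text on stdout.
#   1. strip trailing whitespace from every line
#   2. drop leading blank lines and collapse each run of blank lines to one
clean_text() {
    sed -e 's/[[:space:]]*$//' -e '/./,/^$/!d'
}

# url_to_txt URL OUTFILE: dump a page as text with lynx, then clean it.
# Assumes lynx is installed and the page has its text in the HTML itself.
url_to_txt() {
    lynx -dump -nolist "$1" | clean_text > "$2"
}
```

Usage would be something like `url_to_txt "${URL}" "${FILENAME}.txt"` after sourcing the file. Real pages will likely need extra site-specific sed rules for navigation menus and footers.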

every

Lynx works OK and mine defaults to UTF-8. I use a sed filter I built to convert extended (non-ASCII) characters to plain US-ASCII. Here is my filter so far:

https://every.sdf.org/.webshare/TXT.txt
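
For anyone who can't reach that file, a filter of this general kind might look like the sketch below. To be clear, this is not the contents of TXT.txt; it is a hypothetical example that assumes UTF-8 input and only handles a few common typographic characters. The function name to_ascii and the variable names are invented for illustration. The characters are built from their UTF-8 bytes with printf so the script itself stays pure ASCII.

```shell
#!/bin/sh
# Hypothetical ASCII-folding filter; assumes the input is UTF-8.
# Each variable holds the UTF-8 byte sequence of one character (octal escapes).
lsq=$(printf '\342\200\230')       # U+2018 left single quote
rsq=$(printf '\342\200\231')       # U+2019 right single quote
ldq=$(printf '\342\200\234')       # U+201C left double quote
rdq=$(printf '\342\200\235')       # U+201D right double quote
endash=$(printf '\342\200\223')    # U+2013 en dash
emdash=$(printf '\342\200\224')    # U+2014 em dash
ellipsis=$(printf '\342\200\246')  # U+2026 horizontal ellipsis

# to_ascii: read stdin, replace the characters above with ASCII stand-ins.
# LC_ALL=C makes sed match plain byte sequences regardless of locale.
to_ascii() {
    LC_ALL=C sed \
        -e "s/${lsq}/'/g" \
        -e "s/${rsq}/'/g" \
        -e "s/${ldq}/\"/g" \
        -e "s/${rdq}/\"/g" \
        -e "s/${endash}/-/g" \
        -e "s/${emdash}/--/g" \
        -e "s/${ellipsis}/.../g"
}
```

It can be chained onto the lynx command, e.g. `lynx -dump -nolist "${URL}" | to_ascii > "${FILENAME}.txt"`. Anything not listed (accented letters, for instance) passes through unchanged.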

m15o

Thanks starbreaker! That's actually a very elegant way to do it. Always impressed to see the wonders of piping commands. Someone else mentioned:

textify.it

Which I still haven't tested.

rosie88

Could I use this as a sort of "mirror" for a Gemini "news" capsule?

starbreaker

I just tried textify.it with my websites. It seems to balk at processing tables that contain links.