Day 1: Building a tool to generate context pages

I want to share lists of links, but make them readable and archived

Close-up photo of keyboard keys. *'TYPE' by SarahDeer is licensed with CC BY 2.0*

Project Scope and To-Dos

Take a link and turn it into an oEmbed/Open Graph-style share card
Take a link and archive it in the most reliable way
When the link is a tweet, display the tweet and also its whole thread.
When the link is a tweet, archive the tweets, and display the archived copies if the live ones are not available.
Capture any embedded retweets in the thread, and capture their threads if they exist
Capture any links in the tweet
Wrap the process in an abstract function that returns the data in a savable format

Archive links on Archive.org and save the resulting archival links
Create link IDs that can be used to cache related content
Integrate it into the site so I can make context pages here.
Archive linked YouTube videos
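
For the link-ID to-do, one option is to hash the sanitized URL so the same link always maps to the same ID. This is a minimal sketch of something the post hasn't built yet: `linkId` is a hypothetical name, and SHA-256 truncated to 16 hex characters (via Node's built-in `crypto` module) is an assumed choice.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive a stable ID for a link by hashing its URL.
// Assumes the URL has already been sanitized, so equivalent links
// serialize to the same string and therefore share an ID.
function linkId(sanitizedUrl) {
  // new URL() lowercases the scheme and host and adds "/" to bare origins
  const normalized = new URL(sanitizedUrl).href;
  return crypto
    .createHash("sha256")
    .update(normalized)
    .digest("hex")
    .slice(0, 16); // first 16 hex characters (64 bits) of the digest
}
```

Because `new URL()` lowercases the scheme and host, `HTTPS://Example.com/a` and `https://example.com/a` get the same ID. It does not collapse differences like tracking parameters, which is one reason to run the sanitizer first.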

Day 1

Ok, so this is a thing that happens a lot: I collect a bunch of links on a particular topic, and I want to share them. But a bare list of links is hard to read, so how do I make it more readable?

I thought through some scope requirements and to-dos and put them at the top of this page first. My first goal is to take a list of links and turn it into something easier to read. I think the best way is to create an Open Graph-style share card for each link and replace each link in place with its card. So let's handle that request process.
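
Once a page's HTML has been fetched, most of the card data comes from its `og:*` meta tags (`og:title`, `og:description`, and so on). Here's a minimal, dependency-free sketch of that step under some loud assumptions: `extractOpenGraph`, `renderCard`, and the `link-card` class are hypothetical names, the regex assumes `property` comes before `content` in each tag, and a real HTML parser would be the sturdier choice. Fetching is left out.

```javascript
// Hypothetical sketch: collect <meta property="og:*" content="..."> values.
// Regex-based only to stay dependency-free; assumes `property` precedes `content`.
function extractOpenGraph(html) {
  const og = {};
  const metaTag =
    /<meta\s[^>]*property=["']og:([^"']+)["'][^>]*content=["']([^"']*)["'][^>]*>/gi;
  for (const [, key, value] of html.matchAll(metaTag)) {
    if (!(key in og)) og[key] = value; // keep the first value for each key
  }
  return og;
}

// Escape text before placing it inside markup or an attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a minimal card, falling back to the raw URL when og:title is missing.
function renderCard(url, og) {
  const title = escapeHtml(og.title || url);
  const description = og.description
    ? `<p>${escapeHtml(og.description)}</p>`
    : "";
  return (
    `<a class="link-card" href="${escapeHtml(url)}">` +
    `<strong>${title}</strong>${description}</a>`
  );
}
```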

Selecting a test tool

I think the easiest way forward is to write some tests first, so that I can run links through the function I'm building and check the output. I've written tests with Jest and Mocha before. Another popular library is Chai, so let's try that.

Archiving Tools Reference

It's also worth doing exactly the sort of thing I'm talking about here and recording some info about archiving links.

GitHub - palewire/savemy.news: Save My News: A personal, permanent clipping service

1/21/2022

GitHub - palewire/archiveis: A simple Python wrapper for the archive.is capturing service

1/21/2022

Websites change. Perma Links don't.

1/21/2022

Perma.cc helps scholars, journals, courts, and others create permanent records of the web sources they cite.

Conifer

1/21/2022

Collect and revisit web pages. Free, open-source web archiving service.

Help:Using the Wayback Machine - Wikipedia

1/21/2022

The Wayback Machine is a service which can be used to cite archived copies of web pages used by articles. This is useful if a web page has changed, moved, or disappeared; links to the original content can be retained. This process can be performed automatically, using the web interface for User:InternetArchiveBot.

Save Pages in the Wayback Machine – Internet Archive Help Center

2/6/2022

Many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web.

archive.is

1/21/2022

This document provides information about the Memento-compliant archive.is.

Memento Guide: Introduction

1/21/2022

Last updated: January 19, 2015

ArchiveTeam Warrior - Archiveteam

1/21/2022

If you have any issues or feedback, see the AT #warrior IRC channel on hackint.

Software - Archiveteam

1/21/2022

The WARC Ecosystem has information on tools to create, read and process WARC files.

GitHub - eloquence/freeyourstuff.cc: freeyourstuff.cc - universal content liberation

1/21/2022

GitHub - dhamaniasad/WARCTools: A list of tools related to W(eb)ARC(hive)

1/21/2022

Puppeteer HTML to PDF Generation with Node.js - RisingStack Engineering

8/23/2021

Learn to generate a Puppeteer PDF document from a heavily styled React page using Node.js, headless Chrome and Docker.

html-pdf-node

1/21/2022

Convert any HTML content or HTML page to PDF. Latest version: 1.0.8, last published: 2 months ago. Install with `npm i html-pdf-node`.

Keywords: html, pdf, nodejs, puppeteer, handlebars

GitHub - mozilla/readability: A standalone version of the readability lib

1/21/2022

This is all more extensive than I want to take on for my first run at this project, but it's good to have a list. To start, let's turn link lists into HTML cards.

Sanitizing the URL

Ok, the first thing is to sanitize the URL.

There's a fairly popular Node sanitization library, so I'll start there.

I'll pull the regex WordPress uses to clean URLs, as I've used that in PHP and it's fairly reliable.
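
Here's roughly what a JavaScript port of that looks like. Assumption: the regex meant is the character allowlist in WordPress's `esc_url()`, which also trims leading whitespace and encodes spaces as `%20` before applying it. `stripUnsafeUrlChars` is a hypothetical name. Since PHP matches bytes while JavaScript matches UTF-16 code units, the non-ASCII range is widened so characters like `é` survive the way their UTF-8 bytes do in PHP.

```javascript
// Adapted from the allowlist in WordPress's esc_url(): anything NOT in this
// class is removed. PHP's \x80-\xff byte range becomes \u0080-\uffff here so
// that non-ASCII characters (including surrogate pairs) are kept.
const UNSAFE_URL_CHARS = /[^a-z0-9\-~+_.?#=!&;,\/:%@$|*'()[\]\u0080-\uffff]/gi;

// Hypothetical helper name; mirrors esc_url()'s ltrim + space-to-%20 step first.
function stripUnsafeUrlChars(url) {
  const spaced = url.trimStart().replace(/ /g, "%20");
  return spaced.replace(UNSAFE_URL_CHARS, "");
}
```

This only strips characters; it doesn't check the protocol, so it still needs to sit alongside a protocol check.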

Finally, I want to strip the marketing parameters commonly added to links. I could write my own code here, but a quick search turned up someone who has already built some good regexes for this.
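
The post relies on someone else's regexes here, and I won't guess which ones. As a stand-in technique, this sketch uses the built-in `URL` and `URLSearchParams` APIs to drop query keys that match a short, assumed list of tracking patterns (`utm_*`, `fbclid`, `gclid`, Mailchimp's `mc_cid`/`mc_eid`). `stripTrackingParams` is a hypothetical name.

```javascript
// Hypothetical, non-exhaustive list of tracking-parameter patterns.
const TRACKING_PARAM_PATTERNS = [
  /^utm_/i, // campaign tags: utm_source, utm_medium, utm_campaign, ...
  /^fbclid$/i, // Facebook click ID
  /^gclid$/i, // Google Ads click ID
  /^mc_(cid|eid)$/i, // Mailchimp campaign/email IDs
];

function stripTrackingParams(rawUrl) {
  const url = new URL(rawUrl);
  // Collect matching keys first; deleting while iterating would skip entries.
  const toDelete = [...url.searchParams.keys()].filter((key) =>
    TRACKING_PARAM_PATTERNS.some((pattern) => pattern.test(key))
  );
  for (const key of toDelete) url.searchParams.delete(key);
  return url.href;
}
```

One side effect worth knowing: once a key is deleted, `URLSearchParams` re-serializes the remaining query as form-encoded, so a `%20` in another parameter comes back as `+`.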

Ok, this makes for a good test setup. It turns out Chai is an assertion library rather than a full test runner, so it's usually paired with Mocha; let's install that too.

Ok, it looks like Chai offers three assertion styles: should, expect, and assert.

Ok, let's make some bad links.

I also want to reject mailto links, so let's see if I can throw an error and catch it in Chai.
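
For context, here's a guess at the shape of the function under test. The post only establishes that `linkModule` throws `Invalid Mailto Link` for `mailto:` URLs; the `http`/`https` allowlist, the other error messages, and returning the normalized `href` are assumptions.

```javascript
// Hypothetical sketch of linkModule's validation step.
function linkModule(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl.trim());
  } catch {
    throw new Error("Invalid URL"); // assumed message
  }
  if (url.protocol === "mailto:") {
    throw new Error("Invalid Mailto Link"); // the message the test expects
  }
  // Assumed allowlist: anything other than http(s) is rejected.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("Unsupported protocol: " + url.protocol);
  }
  return url.href;
}
```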

I should be able to test this with `.should.Throw` or `expect(fn).to.throw(new Error('text'))`.

Hmm, that's not working.

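Ok, it looks like the format is different: the error-throwing call has to go inside another function, since Chai needs something it can call itself. I also can't pass an error object, just the error text: given an Error instance, Chai checks that the thrown error is that exact object, which a freshly thrown error never is. (Passing the constructor and the message, as in `.to.throw(Error, "Invalid Mailto Link")`, also works.) Also unclear from the docs.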

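const { expect } = require("chai");
const linkModule = require("../src/linkModule"); // hypothetical path to the sanitizer

// `it` is provided as a global by Mocha when it runs this file
it("should throw on mailto links", () => {
	expect(() => {
		linkModule("mailto:test@example.com?subject=hello+world");
	}).to.throw("Invalid Mailto Link");
});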
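
The wrapper matters because of evaluation order: written as `expect(linkModule(...))`, the error is thrown while JavaScript evaluates the argument, before `expect` is ever called. This hypothetical `expectToThrow` helper is a simplified illustration, not Chai's actual code. It shows the pattern: the assertion receives a function, calls it inside `try`, and checks whether the caught error's message contains the expected text, which is how Chai treats a string argument to `.throw`.

```javascript
// Simplified stand-in for expect(fn).to.throw(message); not Chai's implementation.
function expectToThrow(fn, expectedMessage) {
  if (typeof fn !== "function") {
    // A call like expectToThrow(linkModule("mailto:..."), ...) never reaches
    // this point: the argument throws first. Other non-callables land here.
    throw new TypeError("expected a function to call");
  }
  try {
    fn(); // the assertion invokes the function itself, inside try/catch
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (!message.includes(expectedMessage)) {
      throw new Error(`expected "${expectedMessage}" but got "${message}"`);
    }
    return true; // the expected error was thrown
  }
  throw new Error("expected function to throw, but it returned normally");
}
```

Passed `() => linkModule(...)`, the helper catches and checks the error; passed `linkModule(...)` directly, the original error escapes before the helper runs, which is the failure the post ran into.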

Ok, my sanitizer looks good, and I think I have decent test coverage. The next step will be handling the fetch step and building out the data model, but this is a good place to stop.

git commit -am "Set up sanitizer and unit tests"
