Day 3: Wrestling with OEmbed and Metadata

I want to share lists of links, but make them readable and archived
Image: close-up photo of keyboard keys. 'TYPE' by SarahDeer is licensed with CC BY 2.0.

Project Scope and ToDos

  1. Take a link and turn it into an oEmbed/Open Graph style share card
  2. Take a link and archive it in the most reliable way
  3. When the link is a tweet, display the tweet but also the whole tweet thread.
  4. When the link is a tweet, archive the tweets, and display them if the live ones are not available.
  5. Capture any embedded retweets in the thread. Capture their thread if one exists
  6. Capture any links in the Tweet
  7. Create the process as an abstract function that returns the data in a savable way
  • Archive links on Archive.org and save the resulting archival links
  • Create link IDs that can be used to cache related content
  • Integrate it into the site to be able to make context pages here.

Day 3

Ok, yesterday I was trying to knock down the oEmbed process from Facebook and getting nothing. Let's take this back to first principles and see if I can make a request outside of Node that gets what I need.

Ok, it looks like I don't have the right permissions for my Facebook app? Sort of taking the "o" out of oEmbed if I need an app, permissions and a key, isn't it, Facebook?

Ok, to get the oEmbed process working I need to have my app verified on Facebook... which means uploading a photo of my government-issued ID? Nope, fk that. Ok, just no Facebook oEmbeds in this process then.

Ok, let's grab the page data that tells us about a post now. To do that, I'm going to use a classic package I've done some work in before: JSDOM.

JSDOM can do its own requests, but I would prefer to handle that as a separate step.
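
A minimal sketch of what keeping the request separate could look like. `fetchPage` and its `fetchImpl` parameter are hypothetical names for illustration, and it assumes a runtime with a global `fetch` (Node 18+). The returned `html` string is what would then be handed to JSDOM.

```javascript
// Hypothetical helper: fetch a page and return its status plus raw HTML,
// leaving DOM parsing to a later step.
// fetchImpl is injectable so tests can pass a fake instead of hitting the network.
const fetchPage = async (url, fetchImpl = globalThis.fetch) => {
  const response = await fetchImpl(url);
  const html = await response.text();
  return { status: response.status, html };
};
```

Carrying the status along with the HTML means a failed request can be recorded on the link object rather than thrown away.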

First I'm going to build a basic object to hold the useful data about the page, with a few namespaces predefined. Let's pull the standard stuff from the meta tags and JSON-LD. Dublin Core is another potential source, and perhaps the h-card or h-entry microformats? We can try those out at some later point.
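
As a rough sketch, that container might start out like this. The key names are borrowed from the final output shown at the end of this post; `createLinkDataObject` is a hypothetical name and the default values are assumptions.

```javascript
// Hypothetical factory for the empty link data container.
// One key per metadata namespace, filled in as each scraping step runs.
const createLinkDataObject = (link, sanitizedLink = link) => ({
  originalLink: link,
  sanitizedLink,
  oembed: false, // no oEmbed data yet
  status: 0, // HTTP status, set after the request (0 is an assumed placeholder)
  metadata: {}, // plain <meta name="..."> tags, title, canonical
  opengraph: {}, // og:* properties
  twitter: {}, // twitter:* card tags
  dublinCore: {}, // Dublin Core tags, if any
  jsonLd: {}, // parsed JSON-LD block, if any
});
```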

Ok, so once I have the DOM set up how can I grab the data I need?

On the DOM object I can execute window.document.getElementsByTagName("meta"); and get a list back. Interestingly, tags that use the name attribute are accessible on the resulting collection by name. Open Graph tags use the property attribute instead, so for those we can use an attribute prefix selector (^=) with querySelectorAll.

const openGraphNodes = window.document.querySelectorAll(
  "meta[property^='og:']"
);
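
For the plain name-based tags, the named access mentioned above can be wrapped in a small helper. This is a sketch, not the project's actual code: `readNamedMeta` is a made-up name, and it relies on the standard `HTMLCollection.namedItem` method being available on the collection.

```javascript
// Hypothetical helper: look up <meta name="..."> tags by name.
// documentObj is a DOM Document (for example JSDOM's window.document).
const readNamedMeta = (documentObj, names) => {
  const metaTags = documentObj.getElementsByTagName("meta");
  return names.reduce((found, name) => {
    const node = metaTags.namedItem(name); // null when no tag has that name
    if (node) {
      found[name] = node.content;
    }
    return found;
  }, {});
};
```

Called as `readNamedMeta(window.document, ["author", "description"])`, it returns an object with an entry only for the names that were actually present.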

Can I use to.equal (Chai-style assertions, running in Mocha) to compare arrays?

result.metadata.keyvalues.equal([
  "jekyll",
  "social-media",
]);

Ok, did some searching around and it looks like this is the right way to compare the contents of an array:

expect(result.metadata.keyvalues).to.have.members([
  "jekyll",
  "social-media",
]);
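
The reason the first attempt can't work: even with should or expect in the chain, equal is a strict === check, and two separately created arrays are never === to each other no matter what they contain. members compares what is inside them, regardless of order. A plain JavaScript sketch of the difference (`sameMembers` is a hypothetical stand-in for illustration, not Chai's implementation):

```javascript
// Strict equality compares array identity, not contents.
const expected = ["jekyll", "social-media"];
const actual = ["social-media", "jekyll"];
const strictlyEqual = actual === expected; // false: two different array objects

// Hypothetical stand-in for an order-insensitive membership check.
const sameMembers = (a, b) => {
  if (a.length !== b.length) return false;
  const remaining = [...b];
  return a.every((item) => {
    const index = remaining.indexOf(item);
    if (index === -1) return false;
    remaining.splice(index, 1); // consume the match so duplicates are counted
    return true;
  });
};

const membersMatch = sameMembers(actual, expected); // true
```
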

And here's the function that pulls a namespace of meta tags into an object:

const pullMetadataFromRDFProperty = (documentObj, topNode) => {
  // Match every meta tag whose property starts with the namespace, e.g. "og:"
  const graphNodes = documentObj.querySelectorAll(
    `meta[property^='${topNode}:']`
  );
  const openGraphObject = Array.from(graphNodes).reduce((prev, curr) => {
    // Read the property attribute by name rather than by position,
    // then strip the namespace prefix: "og:title" becomes "title"
    const keyValue = curr
      .getAttribute("property")
      .replace(`${topNode}:`, "");
    if (Object.prototype.hasOwnProperty.call(prev, keyValue)) {
      // Repeated keys (several og:image tags, say) collect into an array
      const lastValue = prev[keyValue];
      if (Array.isArray(lastValue)) {
        prev[keyValue].push(curr.content);
      } else {
        prev[keyValue] = [lastValue, curr.content];
      }
    } else {
      prev[keyValue] = curr.content;
    }
    return prev;
  }, {});
  return openGraphObject;
};

Now I can use this function to capture the Twitter metadata as well!

git commit -am "Setting up scrape of twitter data"

Oh wait, I need to account for the fact that some tags are using name and some are using property.

git commit -am "Fix pullMetadataFromRDFProperty to have a prop type"
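
The commit itself isn't shown here, so the following is only a guess at the shape of that fix: a third parameter selecting which attribute to match. The `propType` parameter name and its default are assumptions.

```javascript
// Sketch of the fix: propType picks the attribute ("property" or "name")
// that carries the namespaced key, e.g. property="og:title" or name="twitter:card".
const pullMetadataFromRDFProperty = (
  documentObj,
  topNode,
  propType = "property"
) => {
  const graphNodes = documentObj.querySelectorAll(
    `meta[${propType}^='${topNode}:']`
  );
  return Array.from(graphNodes).reduce((prev, curr) => {
    // Strip the namespace prefix: "twitter:card" becomes "card"
    const keyValue = curr.getAttribute(propType).replace(`${topNode}:`, "");
    if (Object.prototype.hasOwnProperty.call(prev, keyValue)) {
      // Repeated keys collect into an array
      prev[keyValue] = [].concat(prev[keyValue], curr.content);
    } else {
      prev[keyValue] = curr.content;
    }
    return prev;
  }, {});
};
```

With that in place, `pullMetadataFromRDFProperty(window.document, "og")` keeps working as before, and `pullMetadataFromRDFProperty(window.document, "twitter", "name")` covers tags that use name.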

A few more modifications and I can get it to capture Dublin Core as well, when it's available.

I can even build some tests to prove some negative cases. That should be useful for more comprehensive testing.

Basically this should allow me to compose a bunch of different tests with different HTML.
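
One way to compose those cases is a tiny fixture builder that turns a list of tag descriptions into an HTML string for each test. `buildFixtureHtml` is a hypothetical helper sketched here, not something from the project's test suite, and it assumes the keys and content values contain no double quotes.

```javascript
// Hypothetical fixture builder: each entry becomes one <meta> tag.
// attr is the attribute holding the key ("property" or "name").
const buildFixtureHtml = (tags) => {
  const metaTags = tags
    .map(
      ({ attr, key, content }) => `<meta ${attr}="${key}" content="${content}">`
    )
    .join("\n");
  return `<!DOCTYPE html><html><head>\n${metaTags}\n</head><body></body></html>`;
};

// A positive case and a negative case (no Open Graph tags at all).
const withOpenGraph = buildFixtureHtml([
  { attr: "property", key: "og:title", content: "A title" },
]);
const withoutOpenGraph = buildFixtureHtml([
  { attr: "name", key: "description", content: "Only a description" },
]);
```

Each string can then be parsed into a DOM and run through the same scraping functions, with the negative case expected to produce an empty object.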

git commit -am "More extensive test coverage"

Looking good. Now I want to test it end to end.

describe("should create link objects from a domain requests", function () {
  this.timeout(5000);
  it("should resolve a basic URL", async function () {
    const result = await linkModule.getLinkData({
      sanitizedLink:
        "http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html",
      link: "http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html",
    });
    result.status.should.equal(200);
    result.metadata.title.should.equal(
      "How to make your Jekyll site show up on social"
    );
    result.metadata.author.should.equal("Aram Zucker-Scharff");
    result.metadata.description.should.equal(
      "Here's how to make Jekyll posts easier for others to see and share on social networks."
    );
    result.metadata.canonical.should.equal(
      "http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html"
    );
    expect(result.metadata.keywords).to.have.members([
      "jekyll",
      "social-media",
    ]);
    result.opengraph.title.should.equal(
      "How to make your Jekyll site show up on social"
    );
    result.opengraph.locale.should.equal("en_US");
    result.opengraph.description.should.equal(
      "Here's how to make Jekyll posts easier for others to see and share on social networks."
    );
    result.opengraph.url.should.equal(
      "http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html"
    );
    result.twitter.card.should.equal("summary_large_image");
    result.twitter.creator.should.equal("@chronotope");
    result.twitter.title.should.equal(
      "How to make your Jekyll site show up on social"
    );
    result.twitter.image.should.equal(
      "https://raw.githubusercontent.com/AramZS/aramzs.github.io/master/_includes/tumblr_nwncf1T2ht1rl195mo1_1280.jpg"
    );
    result.dublinCore.Format.should.equal("video/mpeg; 10 minutes");
    result.dublinCore.Language.should.equal("en");
    result.dublinCore.Publisher.should.equal("publisher-name");
    result.dublinCore.Title.should.equal("HYP");
    result.jsonLd["@type"].should.equal("BlogPosting");
    result.jsonLd.headline.should.equal(
      "How to make your Jekyll site show up on social"
    );
    result.jsonLd.description.should.equal(
      "Here's how to make Jekyll posts easier for others to see and share on social networks."
    );
    expect(result.jsonLd.image).to.have.members([
      "https://raw.githubusercontent.com/AramZS/aramzs.github.io/master/_includes/tumblr_nwncf1T2ht1rl195mo1_1280.jpg",
    ]);
  });
});

Ok, a few more tweaks, plus a reminder that I don't have Dublin Core tags on my actual site, and it should be good to go.

git commit -am "End to end unit test for building a link object"

Now I have a good looking data object I can use to build context cards:

{
  originalLink: 'http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html',
  sanitizedLink: 'http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html',
  oembed: false,
  jsonLd: {
    '@type': 'BlogPosting',
    headline: 'How to make your Jekyll site show up on social',
    description: "Here's how to make Jekyll posts easier for others to see and share on social networks.",
    image: [
      'https://raw.githubusercontent.com/AramZS/aramzs.github.io/master/_includes/tumblr_nwncf1T2ht1rl195mo1_1280.jpg'
    ],
    mainEntityOfPage: {
      '@type': 'WebPage',
      '@id': 'http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html'
    },
    datePublished: '2015-11-11 10:34:51 -0500',
    dateModified: '2015-11-11 10:34:51 -0500',
    isAccessibleForFree: 'True',
    isPartOf: {
      '@type': [ 'CreativeWork', 'Product', 'Blog' ],
      name: 'Fight With Tools',
      productID: 'aramzs.github.io'
    },
    discussionUrl: false,
    license: 'http://creativecommons.org/licenses/by-sa/4.0/',
    author: {
      '@type': 'Person',
      name: 'Aram Zucker-Scharff',
      description: 'Aram Zucker-Scharff is Director for Ad Engineering at Washington Post, lead dev for PressForward and a consultant. Tech solutions for journo problems.',
      sameAs: 'http://aramzs.github.io/aramzs/',
      image: {
        '@type': 'ImageObject',
        url: 'https://raw.githubusercontent.com/AramZS/aramzs.github.io/master/_includes/Aram-Zucker-Scharff-square.jpg'
      },
      givenName: 'Aram',
      familyName: 'Zucker-Scharff',
      alternateName: 'AramZS',
      publishingPrinciples: 'http://aramzs.github.io/about/'
    },
    publisher: {
      '@type': 'Organization',
      name: 'Fight With Tools',
      description: "A site discussing how to imagine, build, analyze and use cool code and web tools. Better websites, better stories, better developers. Technology won't save the world, but you can.",
      sameAs: 'http://aramzs.github.io',
      logo: {
        '@type': 'ImageObject',
        url: 'https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg'
      },
      publishingPrinciples: 'http://aramzs.github.io/about/'
    },
    editor: {
      '@type': false,
      name: false,
      description: false,
      sameAs: false,
      image: { '@type': false, url: false },
      givenName: false,
      familyName: false,
      alternateName: false,
      publishingPrinciples: false
    },
    '@context': 'http://schema.org'
  },
  status: 200,
  metadata: {
    author: 'Aram Zucker-Scharff',
    title: 'How to make your Jekyll site show up on social',
    description: "Here's how to make Jekyll posts easier for others to see and share on social networks.",
    canonical: 'http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html',
    keywords: [ 'jekyll', 'social-media' ]
  },
  dublinCore: {},
  opengraph: {
    title: 'How to make your Jekyll site show up on social',
    description: "Here's how to make Jekyll posts easier for others to see and share on social networks.",
    url: 'http://aramzs.github.io/jekyll/social-media/2015/11/11/be-social-with-jekyll.html',
    site_name: 'Fight With Tools by AramZS',
    locale: 'en_US',
    type: 'article',
    typeObject: {
      published_time: '2015-11-11 10:34:51 -0500',
      modified_time: false,
      author: 'http://facebook.com/aramzs',
      publisher: 'https://www.facebook.com/aramzs',
      section: 'Code',
      tag: [ 'jekyll', 'social-media' ]
    },
    image: 'https://raw.githubusercontent.com/AramZS/aramzs.github.io/master/_includes/tumblr_nwncf1T2ht1rl195mo1_1280.jpg'
  },
  twitter: {
    site: '@chronotope',
    description: "Here's how to make Jekyll posts easier for others to see and share on social networks.",
    card: 'summary_large_image',
    creator: '@chronotope',
    title: 'How to make your Jekyll site show up on social',
    image: 'https://raw.githubusercontent.com/AramZS/aramzs.github.io/master/_includes/tumblr_nwncf1T2ht1rl195mo1_1280.jpg'
  }
}
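
A context card mostly needs a title, a description, an image, and a URL, and in the object above several namespaces can supply each one. A minimal sketch of a fallback chain follows. `buildCardData`, `firstAvailable`, and the priority order are assumptions for illustration, not decisions recorded in this log.

```javascript
// Hypothetical: return the first non-empty string among the candidates,
// or false (matching the "missing" marker used in the data object).
const firstAvailable = (...candidates) =>
  candidates.find((value) => typeof value === "string" && value.length > 0) ||
  false;

// Hypothetical: flatten the scraped namespaces into the fields a card needs.
const buildCardData = (linkData) => {
  const { opengraph = {}, twitter = {}, jsonLd = {}, metadata = {} } = linkData;
  // JSON-LD images may arrive as an array; use the first entry in that case
  const jsonLdImage = Array.isArray(jsonLd.image)
    ? jsonLd.image[0]
    : jsonLd.image;
  return {
    title: firstAvailable(
      opengraph.title,
      twitter.title,
      jsonLd.headline,
      metadata.title
    ),
    description: firstAvailable(
      opengraph.description,
      twitter.description,
      jsonLd.description,
      metadata.description
    ),
    image: firstAvailable(opengraph.image, twitter.image, jsonLdImage),
    url: firstAvailable(
      metadata.canonical,
      opengraph.url,
      linkData.sanitizedLink
    ),
  };
};
```

Because every field falls back independently, a page with only Twitter card tags or only JSON-LD can still produce a usable card.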