Feature Request: getOpenGraph Func

Following the discussion in the topic “What all cool stuff can hugo + colly do?”, I discovered that there is an Opengraph Server written in Go that retrieves Open Graph data from websites and returns it as JSON.

I just installed Opengraph Server and tested it. It works beautifully with Instagram, Vimeo, YouTube, Twitter and even Speaker Deck.

Retrieving the Open Graph meta tags exposes more useful information than the official oEmbed APIs of these services do.

I am proposing a Hugo getOpenGraph function so that Open Graph data can be retrieved from within Hugo and saved in the /data/ folder as JSON objects, to be consumed by Hugo’s internal shortcodes.

For example, the Open Graph data of an Instagram post located at https://www.instagram.com/p/BWNjjyYFxVx/ could be saved under /data/BWNjjyYFxVx.json.

Basically, the ID of each media item would be used as the JSON filename.
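
To illustrate, here is roughly what such a data file might contain. This is only a sketch: I don’t know the exact field names that Opengraph Server emits, but conceptually the payload carries the standard Open Graph properties (the values below are made up):

    {
      "title": "A photo posted by some user",
      "type": "instapp:photo",
      "url": "https://www.instagram.com/p/BWNjjyYFxVx/",
      "image": "https://scontent.cdninstagram.com/some-photo.jpg",
      "description": "A sample caption pulled from the og:description tag"
    }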

A bonus with such an approach is that the photo URLs exposed by Instagram through Open Graph do not expire, unlike the ones exposed by the official oEmbed API.

This would be an amazing feature, and it would greatly assist us in creating Hugo internal shortcodes that are GDPR compliant and respect users’ privacy.

CC/ @bep @kaushalmodi @brunoamaral @lucperkins @RickCogley and everyone else.

6 Likes

That is very cool.

A couple of thoughts/questions:

  • Hopefully, the function would be flexible enough to let you specify a different root folder, since you might not want a whole bunch of files dumped in the root of /data?
  • When would this run relative to the rest of the build? One problem I found when using alternate output formats to build a referenceable category/tag list was that it was built into the /public/data folder, not the /data folder. Obvious in hindsight, of course, but not what I needed.
  • Could this also be added as a command-line option to Hugo, so that you could rebuild the data on request? That would mean it could be consumed in the Hugo build process but updated on a schedule if needed.

As I envision it, the data files would be stored under /data/REMOTE-HOST-NAME/*.json.
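
For example (a hypothetical listing; the Instagram ID is the one from my earlier example, the rest are placeholders):

    /data/
    ├── www.instagram.com/
    │   └── BWNjjyYFxVx.json
    ├── vimeo.com/
    │   └── <video-id>.json
    └── www.youtube.com/
        └── <video-id>.json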

Anyway, I just want to point out that I am not a dev (I’m getting there, though) and I couldn’t possibly develop a new Hugo function on my own and send a pull request to master.

This feature request is open game for anyone who would like to develop it.

But right now… I am already looking at ways of loading these JSON files from the local Open Graph Server into my /data/ folder with bash.

If I succeed with this approach, @TotallyInformation, I will share it.

I think it’s certainly an idea worth thinking about.

As I don’t know, or want to know, Golang, I would tend to do these tasks using Node.js as a pre-build task.

In fact, that’s what I’ve started thinking about with regard to capturing my Twitter interactions in my blog. Previously, I had a live Twitter feed in a sidebar, but it was horribly slow to load and exposed readers to tracking. So it is gone, but I would like to integrate my posts, retweets and perhaps even replies (with their original messages) as “asides” into the stream of updates.

It’s actually pretty simple to do this.

For example:

  1. First, download Open Graph Server.
  2. Compile it (you need to have Go installed).
  3. Then start opengraph-server (you can name the binary whatever you want when you compile it).
  4. The server is available under http://localhost:8000/.
  5. Create a folder under /data/social-media/.
  6. Within that folder, create a .txt file with the URLs you need (one per line).
  7. The URLs will have the form http://localhost:8000/?=<social-media-url> (see the example right after this list).
  8. We will be using xargs and cURL (if you don’t have either of these, now is the time to install them from your package manager).
  9. Then open a terminal in the folder where urls.txt is stored and run:
    xargs -a urls.txt -I{} curl -# -O {}; find . ! -name '*.txt' -type f -exec mv {} {}.json \;
  10. Enjoy! Now the Open Graph data is available for your Hugo project.
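
For example, with the Instagram post from earlier, urls.txt would contain lines like this (one URL per media item):

    http://localhost:8000/?=https://www.instagram.com/p/BWNjjyYFxVx/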

You can store the above steps in a bash script so that you can execute it whenever you need to update your data files.
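
Here is a minimal sketch of such a script, assuming opengraph-server is already running on localhost:8000 and that the script is executed from the Hugo project root (adjust the paths to your setup):

    #!/usr/bin/env bash
    set -euo pipefail

    # Work inside the Hugo data folder that holds urls.txt
    cd data/social-media

    # Remove the JSON files from the previous run, otherwise the rename
    # step below keeps appending .json extensions on every execution
    find . -name '*.json' -type f -delete

    # Fetch every URL listed in urls.txt; curl -O saves each response
    # under the media ID as the file name
    xargs -a urls.txt -I{} curl -# -O {}

    # Give every freshly downloaded file (everything except the .txt
    # list itself) a .json extension
    find . ! -name '*.txt' -type f -exec mv {} {}.json \;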

cURL stores these files with their ID as the file name, and then with mv I give them the .json file extension.

It’s a piece of cake to reference these JSON data files from a shortcode. It works the same way as with Hugo’s internal shortcodes.
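
For example, a minimal shortcode (called from content as {{< og-card "BWNjjyYFxVx" >}}) could look something like this. It’s only a sketch: og-card is a name I made up, and the url/image/title keys are assumptions, so use whatever keys your JSON files actually contain:

    <!-- layouts/shortcodes/og-card.html (hypothetical name) -->
    {{ $id := .Get 0 }}
    {{ $og := index (index .Page.Site.Data "social-media") $id }}
    <a href="{{ $og.url }}">
      <img src="{{ $og.image }}" alt="{{ $og.title }}">
    </a>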

Now, not everything you might need from Twitter is stored in those Open Graph tags. But still, this is a clean and free way to retrieve more than the official oEmbed endpoint allows.

A getOpenGraph Hugo function would eliminate the need to use Open Graph Server and simplify this process. But you know what… since I can do this right now with bash I’m a very happy camper! :sunglasses:

EDIT
I just updated the command above to exclude the .txt file when adding the .json extension.
Also, if you store this in a bash script, make sure to execute find . -name "*.json" -type f -delete before refreshing the files (otherwise another .json extension will be appended every time the above command runs).

Well, you lose me there, sadly. As I say, I don’t do Golang, and I am very reluctant to try and install yet another language on my working PC (that’s one of the reasons why I rapidly ditched Jekyll). Also, I work on Windows for development, not Linux.

However, I might try opengraph.io, which looks like it does the same thing but runs in the cloud. Maybe there is a Node.js equivalent as well; I’ll look when I get some time.

I also don’t code in Go… but I like to build the latest versions of all tools I use.

Oh, “installing” the Go toolchain is nothing like Ruby! It’s mainly extracting an archive (which contains the go binary) and setting the GOROOT, GOPATH and PATH env vars. That’s it! (my little blog post on that).
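
For the record, on Linux it boils down to something like this (the version number and install location are just examples):

    # Unpack the Go toolchain archive; it contains the go binary
    tar -C "$HOME/apps" -xzf go1.10.linux-amd64.tar.gz

    # Point the environment at the unpacked toolchain, e.g. in ~/.bashrc
    export GOROOT="$HOME/apps/go"
    export GOPATH="$HOME/go"
    export PATH="$GOROOT/bin:$GOPATH/bin:$PATH"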

2 Likes

Thanks for the info. Maybe I’ll get to it at some point; I am all too curious for my own good, after all! It’s just that I’m already using too many tools, and keeping everything up to date is already taking too much of my time. I am trying to simplify!

This is a very simple setup and also easy to keep up to date.

There was a time when I also felt intimidated by installing programming languages, compiling from source, etc.

But it’s not that difficult. You can always use a virtual machine to try things out without messing up your workstation.

Urm, that adds several more things to keep up to date :persevere:

However, I appreciate the attempts to lure me into the Golang sphere :wink: As I say, no time right now, but maybe later; I have more important things to focus on.

It’s a cool feature request that you’re making here! But before people jump into learning Go to build it, this feature first needs to be confirmed by the community leaders, right? Otherwise it runs the risk of becoming a pull request that never gets merged, which would be a waste of time and energy for all parties involved. :hugs:

1 Like

Well, that’s why I made this thread and didn’t open a GitHub issue.

But I am already doing this with the technique I outlined above.

1 Like

There are some issues with this approach:

  1. It falls into the “web scraping” category, which we have discussed as falling into a gray or black area.
  2. The services in question (e.g. YouTube) have exposed APIs for this for a reason, and I suspect they have some fine print about going the “scraping route” to get more data.

Well, these APIs basically track anyone who views their content.

This is a big privacy issue and hopefully it will be resolved in the future.

Anyway. I understand your approach.

I’m also closing this topic.