We’re seeing sites that place .md files next to the HTML/XML outputs. The .md files provide the content that can be fed to LLMs.
I asked ChatGPT, which suggested the following approach. It makes sense, but I wanted to ask the real experts:
Hugo doesn’t have a built-in feature to export the site as both XML and Markdown (MD) during its standard build process. However, you can achieve this by using custom outputs.
Option 1: Use Custom Output Formats
Hugo supports custom output formats which can be used to generate additional file types, like XML and MD, in parallel with the HTML.
Opinion. Not consensus or official. And without wandering into the whole “evil AI” territory:
An AI that requires a specific format to consume information is not an AI. HTML is structured markup. H1 is the most important headline. H2 through H6 are the subordinate headline levels. P tags denote paragraphs. IMG tags denote images. Attributes on those tags carry information such as which image to show (SRC) or what the image shows (TITLE and ALT attributes, FIGCAPTION tags). TABLE is for data, THEAD and TH are for the column/row setup, TBODY and TD are for the data.
My humble opinion: AI doesn’t need its own format to understand my website. If it can’t understand HTML, then it’s not AI.
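To illustrate how much structure is already sitting in plain HTML, here is a minimal sketch using only Python's standard library. It pulls the heading outline and the image descriptions out of a page. The class and function names (`OutlineParser`, `extract_structure`) are made up for this example, and a real crawler would of course do far more than this.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect headings and image src/alt pairs from an HTML document."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []     # list of (level, heading text)
        self.images = []      # list of (src, alt)
        self._current = None  # level of the heading currently open, if any
        self._buffer = []     # text fragments of the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = int(tag[1])
            self._buffer = []
        elif tag == "img":
            attributes = dict(attrs)
            self.images.append((attributes.get("src"), attributes.get("alt")))

    def handle_data(self, data):
        # Only keep text while a heading is open (includes nested inline tags).
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._current is not None:
            self.outline.append((self._current, "".join(self._buffer).strip()))
            self._current = None


def extract_structure(html):
    """Return (outline, images) for the given HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline, parser.images
```

Feeding it a rendered page gives back the headline hierarchy and the alt texts without any extra file format involved.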
Regarding ChatGPT’s answer:
The answer is based on old training data (the GPT-3.5 cutoff is January 2022, before that it was September 2021; the GPT-4 cutoff is April 2023). Hugo moves on a generation in 2 years. Code samples from 2 years ago might raise smirks among daily users of GoHugo.
The answer might have additional input from your previous conversations, so if “it” thinks that you don’t like shortcodes, it won’t prefer or offer such solutions.
Option 1 would be the way to go for any other output format, BUT you won’t get Markdown output from GoHugo, only the parsed format. You would have to somehow “re-format” that back into Markdown, which I would say is not an option but crazy. There is a cp command in the shell that can copy the Markdown source files as they are.
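If you really want the Markdown files next to the generated output, the plain copy approach could look roughly like the sketch below, run after the Hugo build. It assumes Hugo's default `content` and `public` directory names and simply mirrors relative paths, so it does not try to match Hugo's URL rules. The function name `copy_markdown` is made up for this example, and the copied files still contain their front matter.

```python
import shutil
from pathlib import Path


def copy_markdown(content_dir, output_dir):
    """Mirror every .md file under content_dir into output_dir, keeping relative paths."""
    content_dir = Path(content_dir)
    output_dir = Path(output_dir)
    copied = []
    for source in sorted(content_dir.rglob("*.md")):
        target = output_dir / source.relative_to(content_dir)
        # Create the matching subdirectory in the output tree if needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Assumed default Hugo directories; adjust to your project layout.
    for path in copy_markdown("content", "public"):
        print(path)
```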
Option 2 is what AI people call “hallucinations”.
Having looked deeper into these things for business reasons, I would say: lean back for 6 months, then look into it again, and keep the option of just not doing it. There is no serious discussion about dedicated file formats to help AI understand content or websites, because AI is supposed to understand them without that.
There are discussions about things like llmtxt, which again is just a collection of additional information about the pages you have online (think of a frontmatter parameter per content file that gets aggregated into a single file, which the AI can consume to weigh the content). It is also not connected to any real, existing AI tool that would make use of it. It’s just a bunch of people who want to make their content “better understandable” for AI (which, ehm, I don’t know).
Use the time saved to go through your existing content, using proper HTML to structure it and CSS to design it (not HTML to design it and CSS to put colors on it).
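Just to make the “one frontmatter parameter per content file, aggregated into a single file” idea concrete, here is a rough sketch. The parameter name `llm_summary` and the one-line-per-page output are pure assumptions for illustration and not taken from any specification. The parser only handles simple `key: value` lines in YAML front matter delimited by `---`.

```python
from pathlib import Path


def read_front_matter_value(text, key):
    """Return the value of a simple `key: value` line in YAML front matter, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no YAML front matter at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip().strip("\"'")
    return None


def aggregate(content_dir, key="llm_summary"):
    """Build one text block with a line per page that defines `key` (hypothetical name)."""
    content_dir = Path(content_dir)
    entries = []
    for path in sorted(content_dir.rglob("*.md")):
        value = read_front_matter_value(path.read_text(encoding="utf-8"), key)
        if value:
            entries.append(f"- {path.relative_to(content_dir).as_posix()}: {value}")
    return "\n".join(entries)
```

Pages without the parameter are skipped, so the aggregated file only lists content you have explicitly described.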