╔════════════════════════════════════════════════════════════════════╗
║ ISSUE: #02                                        DATE: 2026-01-06 ║
╟────────────────────────────────────────────────────────────────────╢
║                                                                    ║
║               ▀ █▄ █ ▀█▀ █▀▀ █▀█ █▀ █▀█ ▄▀█ █▀▀ █▀▀                ║
║               █ █ ▀█  █  ██▄ █▀▄ ▄█ █▀▀ █▀█ █▄▄ ██▄                ║
║                                                                    ║
║ AUTHOR: Irons, Sam                    PUBLISHER: Interspace Studio ║
║ TYPE: Newsletter                                   LANGUAGE: en-US ║
║ SUBJECTS: LLM Generation · Metadata Quality · SEO Best Practices   ║
║                                                                    ║
║ DESCRIPTION: Comprehensive guide exploring how content             ║
║ professionals can leverage LLMs for metadata generation, covering  ║
║ SEO best practices, Dublin Core mapping, and research-validated    ║
║ quality metrics including completeness, accuracy, and conformance  ║
║ to expectations.                                                   ║
╟────────────────────────────────────────────────────────────────────╢
║ Interspace can make mistakes. Consider checking important info.    ║
╚════════════════════════════════════════════════════════════════════╝

╔════════════════════════════════════════════════════════════════════╗
║ I N T R O D U C T I O N                                            ║
╚════════════════════════════════════════════════════════════════════╝

!! PLEASE FORWARD THIS TO WHOEVER YOU THINK MAY BE INTERESTED !!

Interspace is a content newsletter written by Sam Irons, founder of
Interspace Studio in Sydney, Australia. Interspace covers content
strategy, UX writing, technical writing, and content practices.

Interspace is also a community. You've probably received this from a
co-worker (if I didn't send it to you directly). Communities of
practice are essential to keeping disciplines resilient,
values-driven, and creative. If something I've written sparks a
discussion, then we're tending to and growing that community.
Welcome.

Happy new year! In this issue, I explore metadata. Can we use LLMs
(large language models) to generate useful metadata? This newsletter
covers three topics:

1. Search engine optimization basics and prompts
2. A strategy for modern site builders to transform Dublin Core
   metadata into JSON-LD and HTML markup
3. How to measure metadata quality and spot poor metadata using LLMs

This newsletter is for nerds. Word nerds, tech nerds, AI nerds. It's
packed with ideas, techniques, prompts, and experiments. I've
included metadata transformation code for Next.js and Astro. If you
work with structured content and site builders, this issue is for
you.

Subscribe to future issues, or view back issues:

<< http://interspacestudio.com.au/newsletter >>

Thank you for reading,
Lots of love,

Sam Irons
irons.sam@interspacestudio.com.au

!! PLEASE FORWARD THIS TO WHOEVER YOU THINK MAY BE INTERESTED !!

╔════════════════════════════════════════════════════════════════════╗
║ C O N T E N T S                                                    ║
╚════════════════════════════════════════════════════════════════════╝

1. Starting slow with SEO
2. Metadata in a vibecoded world
3. Measuring metadata quality
4. Discussion
5. This month's reading
6. Thanks!

╔════════════════════════════════════════════════════════════════════╗
║ S T A R T I N G   S L O W   W I T H   S E O                   [01] ║
╚════════════════════════════════════════════════════════════════════╝

Most content professionals working on websites or applications
probably get their first dose of metadata when they start thinking
about search engine optimization (SEO). It's always been a dark art,
constantly changing, with Google firmly in control. Google and other
search engines crawl pages and compare metadata against the content
of the page to validate it before ranking.
Throughout the constantly changing landscape of The Search
Algorithm, well-written and well-structured content - made for human
consumption - continues to top the search engine results page. LLMs,
trained on human-written content, also seem to prefer well-structured
and well-written content when searching and drawing from the web.

Naturally, one of the first problem spaces that excited content
professionals when ChatGPT began making waves was the generation of
metadata. Does LLM-generated metadata stack up to the expert curation
of content professionals and domain experts?

Researchers from Syracuse University and the University of Washington
ran a test with 26 educators, students, and other education
professionals. They were each given 15 lesson plans and their
associated metadata blocks, complete with Dublin Core and its
educational extension elements. Half were shown the lesson plan
first, then the metadata; the other half, vice versa.

You can read their paper in full here:

<< https://dl.acm.org/doi/epdf/10.1145/564376.564464 >>

They found that participants' satisfaction scores (whether the
metadata matched the content or not) varied only minimally between
the human-generated and machine-generated samples. And that study
predates LLMs entirely - the machine-generated metadata came from an
early NLP extraction tool.

LLMs are fantastic at summarizing. Given a page, any model off the
shelf can write you a good summary. New models, like GPT-5 and
Claude 4.5, are even more sophisticated. Their natural language
processing already considers industry best practices when they
"reason". Ask an LLM to write page titles and meta descriptions for
content, and it will return (most of the time) content that fits
within known character limits, using active calls to action and
benefit statements directed at specific audiences.

Still, here are a few tips and tricks to add to the countless pages
of advice already online:

* Give the LLM the role of an SEO expert.
* Tell it to follow industry best practices.
* Ask it to check its work.
* Use few-shot prompting: give examples of good and bad metadata.
* Ask it to optimize output for your target audience's questions.

The first three are pretty standard prompting practice these days,
based mostly on observation and anecdotal quality assessments. The
last two come from research-minded capitalism. Fidelity Investments,
Bangalore, published a paper describing their approach to building a
"smart data catalog" to improve metadata generation for data tables.

You can read their paper in full here:

<< https://arxiv.org/abs/2503.09003 >>

They found:

* Fine-tuning the dataset helps significantly. This means cleaning
  data and weighting it. For example, they eliminated audit columns.
  They weighted highly-ranked columns based on user popularity.
* Few-shot prompting helps. Providing examples of good and bad
  metadata improved quality.

The authors also found that making business glossaries and style
guides available to the LLM helped the models.

Put it all together and here's a little prompt for you to generate
SEO page titles and meta descriptions:

\\ INSTRUCTIONS
\\ You are an SEO expert. Generate a page title and meta
\\ description for the page content below.
\\
\\ First, analyze the content. Identify the target audience.
\\ Find the most important information for that audience.
\\
\\ Then, generate a page title and meta description. Follow all
\\ industry best practices.
\\
\\ RULES
\\ - Keep titles under 60 characters. Aim for 46 or less.
\\ - Write meta descriptions between 150-160 characters.
\\ - Use active voice in meta descriptions.
\\ - Include a call to action in meta descriptions.
\\
\\ PAGE TITLE EXAMPLES
\\ Poor: Our content management software
\\ Better (promises a benefit): Save 10 hours a week with our
\\ content management system
\\
\\ Poor: Guide to SEO/LMO
\\ Better (injects news): The 2025 Ultimate Guide to SEO and
\\ LMO
\\
\\ Poor: Writing Tips
\\ Better (includes numbers): 13 Easy Hacks for Better Business
\\ Writing
\\
\\ META DESCRIPTION EXAMPLES
\\ Poor (just a list of keywords): Sewing supplies, yarn,
\\ colored pencils, sewing machines, threads, bobbins, needles
\\ Better (specific and detailed): Get everything you need to
\\ sew your next garment. Open Monday-Friday 8-5pm, located in
\\ the Fashion District.
\\
\\ Poor (generic): Local news in Whoville, delivered to your
\\ doorstep. Find out what happened today.
\\ Better (specific and detailed): Upsetting the small town of
\\ Whoville, a local elderly man steals everyone's presents the
\\ night before an important event. Stay tuned for live updates
\\ on the matter.
\\
\\ Poor (too short): Mechanical pencil
\\ Better (specific and detailed): Self-sharpening mechanical
\\ pencil that autocorrects your penmanship. Includes 2B
\\ auto-replenishing lead. Available in both Vintage Pink and
\\ Schoolbus Yellow. Order 50+ pencils, get free shipping.
\\
\\ PAGE CONTENT
\\ {content}
\\
\\ GLOSSARY
\\ {glossary}
\\
\\ STYLE GUIDE
\\ {style guide}

These examples come from Google's documentation and my experience.
Tailor yours to what converts and what causes problems in your
content.

This prompt can automate part of your workflow. Write the content.
Focus on the page. Run the prompt during publishing. Better yet,
create an LLM agent. Set it to run automatically. Use your glossary
and style guide as knowledge sources.

╔════════════════════════════════════════════════════════════════════╗
║ M E T A D A T A   I N   A   V I B E C O D E D   W O R L D     [02] ║
╚════════════════════════════════════════════════════════════════════╝

I've noticed a shift in the last year or so, with the rise of
vibecoding. Most of these tools recommend and build sites using
Next.js, Astro, Gatsby, or similar static site generators (SSGs).
These systems use markdown to store content. Simple frontmatter
describes the markdown content. When the site builds, it transforms
frontmatter into structured JSON-LD and HTML markup.

If you haven't defined a metadata strategy and you use a modern SSG,
I recommend two approaches:

* Use Dublin Core standards for internal cataloging. Dublin Core is
  a metadata standard designed for "cross-disciplinary resource
  discovery."
* Map Dublin Core elements to Schema.org and OpenGraph when you
  build the site.

Dublin Core works well in markdown frontmatter. It stays
human-readable. Schema.org and OpenGraph markup help with SEO and
social media sharing.

Here are the basic frontmatter fields and mappings. They suit any
textual content. You can extend them later:

Dublin Core    Schema.org          OpenGraph
-----------    ----------          ---------
title          title               og:title
subject        keywords            -
description    description         og:description
type           type                og:type
coverage       temporalCoverage    -
creator        author              article:author
publisher      publisher           -
contributor    contributor         -
date           datePublished       article:published_time
identifier     url                 og:url
language       inLanguage          og:locale
image          -                   og:image
image-alt      -                   og:image:alt

Your developers can transform these into proper JSON-LD and HTML
markup.
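To make that concrete, here's a minimal, framework-agnostic sketch
in TypeScript. The DublinCoreFrontmatter shape and both function
names are my own illustration, not a published spec - adapt them to
however your frontmatter is actually parsed.

```
// Hypothetical shape of the Dublin Core frontmatter described above.
interface DublinCoreFrontmatter {
  title: string;
  subject: string;
  description: string;
  type: string;          // "Article", "TechArticle", "HowTo", ...
  coverage?: string;
  creator: string;
  publisher: string;
  contributor?: string;
  date: string;          // YYYY-MM-DD
  identifier: string;    // canonical URL of the published page
  language: string;      // e.g. "en-AU"
  image?: string;
  imageAlt?: string;
}

// Dublin Core -> Schema.org JSON-LD, following the table above.
// (Schema.org uses "headline" where Dublin Core uses "title".)
export function toJsonLd(fm: DublinCoreFrontmatter) {
  return {
    '@context': 'https://schema.org',
    '@type': fm.type,
    headline: fm.title,
    keywords: fm.subject,
    description: fm.description,
    temporalCoverage: fm.coverage,
    author: { '@type': 'Person', name: fm.creator },
    publisher: { '@type': 'Organization', name: fm.publisher },
    contributor: fm.contributor
      ? { '@type': 'Person', name: fm.contributor }
      : undefined,
    datePublished: fm.date,
    url: fm.identifier,
    inLanguage: fm.language,
  };
}

// Dublin Core -> OpenGraph <meta> properties for the <head>.
export function toOpenGraph(fm: DublinCoreFrontmatter): Record<string, string> {
  const og: Record<string, string> = {
    'og:title': fm.title,
    'og:description': fm.description,
    'og:type': 'article',
    'og:url': fm.identifier,
    'og:locale': fm.language.replace('-', '_'), // OpenGraph uses en_AU
    'article:author': fm.creator,
    'article:published_time': fm.date,
  };
  if (fm.image) og['og:image'] = fm.image;
  if (fm.imageAlt) og['og:image:alt'] = fm.imageAlt;
  return og;
}
```

Render toJsonLd() into a <script type="application/ld+json"> tag and
emit toOpenGraph() as <meta property="..."> tags in your document
head.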
As a New Year's gift, I asked Claude to generate templates for
Next.js and Astro. You can find them here:

<< https://github.com/ironssamuel/ssg-metadata-templates >>

With this in mind, we can improve our prompt. Instead of freeform
text, we generate structured metadata instances:

\\ INSTRUCTIONS
\\ You are an SEO expert. Generate metadata for the page content
\\ below.
\\
\\ First, analyze the content. Identify the target audience. Find
\\ the most important information for that audience.
\\
\\ Then, generate metadata as valid frontmatter. Follow this
\\ format:
\\
\\ ```
\\ title: "The name of the resource. 60 characters or less."
\\ subject: "Keywords or phrases describing the content."
\\ description: "Description of the content. 150-160 characters.
\\ Use active voice. Include a call to action."
\\ type: Article/TechArticle/HowTo
\\ coverage: "The spatial or temporal characteristics of the
\\ content."
\\ creator: // The person or organization who created the
\\ content.
\\ publisher: // The entity that made the resource available,
\\ such as a publishing house, university, or company.
\\ contributor: // A person or organization who made significant
\\ contributions but secondary to the creator (editor,
\\ transcriber, illustrator).
\\ date: YYYY-MM-DD - Creation or availability date.
\\ identifier: // String or number that uniquely identifies the
\\ resource. Examples: URLs, URNs, ISBNs.
\\ language: "Language code of the content."
\\ image: // public/images/path/to/open-graph/image
\\ image-alt: // Alternative description of Open Graph image
\\ ```
\\
\\ OUTPUT RULES
\\ - Return only the metadata frontmatter in plain text.
\\ - Complete all fields.
\\ - Wrap title, subject, coverage, creator, publisher,
\\ contributor, and image-alt in quotation marks.
\\ - Format date as YYYY-MM-DD with no quotation marks.
\\ - Format identifier as a JavaScript comment. Example:
\\ "identifier: // path-to-file/name"
\\ - Format image as a JavaScript comment. Example: "image: //
\\ public/docs/og-image.png"
\\ - Format image-alt as a JavaScript comment. Example:
\\ "image-alt: // 'Description of Open Graph image.'"
\\
\\ PAGE CONTENT
\\ {content}
\\
\\ GLOSSARY
\\ {glossary}
\\
\\ STYLE GUIDE
\\ {style guide}

I've commented out some fields that should be scrutinized closely,
like the identifier. If you can complete these or any other metadata
fields deterministically, do it. For example, the date field can be
completed from the file's last-updated date or the deploy time. The
creator field can be completed from the user data of the person
saving or committing the file. The identifier can be completed from
the file name or path. And so on - there's a small sketch of this at
the end of the section.

Again, this can be automated in your publishing workflow, which can
deliver serious efficiency gains at scale.

Can you trust an LLM to generate useful metadata at scale? To answer
that, we need to dive a little deeper into what makes for
high-quality metadata.
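First, though, here's that deterministic-completion sketch. It's a
rough Node/TypeScript helper, and it assumes a few things your setup
may not share: a git repository, markdown files under a content/
folder, and a simple path-to-URL convention. Adjust to taste.

```
import { statSync } from 'node:fs';
import { execSync } from 'node:child_process';

// Fill the fields we can derive deterministically, so the LLM only
// generates the fields that genuinely need judgement.
export function deterministicFields(filePath: string, siteUrl: string) {
  // date: the file's last-modified time (or swap in your deploy
  // timestamp if that better matches "availability")
  const date = statSync(filePath).mtime.toISOString().slice(0, 10);

  // creator: author of the last git commit that touched this file
  const creator = execSync(`git log -1 --format=%an -- "${filePath}"`)
    .toString()
    .trim();

  // identifier: canonical URL derived from the file path
  // (assumes content/foo/bar.md publishes at {siteUrl}/foo/bar)
  const slug = filePath.replace(/^content\//, '').replace(/\.md$/, '');
  const identifier = `${siteUrl}/${slug}`;

  return { date, creator, identifier };
}

// Usage (hypothetical paths):
// deterministicFields('content/metadata-guide.md', 'https://example.com');
```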
╔════════════════════════════════════════════════════════════════════╗
║ M E A S U R I N G   M E T A D A T A   Q U A L I T Y           [03] ║
╚════════════════════════════════════════════════════════════════════╝

Information scientists have studied auto-tagging and metadata
generation for years. Library science leads this work. In 2004,
Bruce and Hillman created a framework for Cornell Law School. They
suggested these metrics for evaluating metadata quality:

* Completeness - Metadata describes content as fully as possible.
* Accuracy - Metadata is as correct as possible.
* Conformance to expectations - Metadata fulfills user requirements
  for tasks like finding, identifying, and selecting resources.
* Logical consistency and coherence - Metadata follows domain
  standards for language and structure.
* Accessibility - Metadata is retrievable and understandable. I
  think "findability" describes this better.
* Timeliness - Metadata is current.
* Provenance - The source of the metadata is known and credible.

Read their full paper:

<< https://ecommons.cornell.edu/server/api/core/bitstreams/
   2b3e14fd-82a9-49ce-a8c4-9fd096010a08/content >>

These are great starting points. You can imagine giving users
metadata samples from a repository, then asking them to rate the
metadata on these dimensions.

In 2009, Ochoa (NYU) and Duval (KU Leuven) went further. They sought
to measure metadata quality programmatically. They created metrics
for evaluating each of Bruce and Hillman's characteristics.

Read their full paper:

<< https://www.researchgate.net/publication/220387581_
   Automatic_evaluation_of_metadata_quality_in_digital_
   libraries >>

Here's a brief summary:

* Completeness: A basic measure counts completed fields. A better
  measure weights important fields more heavily.
* Accuracy: A basic measure checks whether fields contain correct
  information (numbers in number fields) and sound data (no broken
  links). A better measure counts words shared between the metadata
  and the resource.
* Conformance to expectations: This measures how unique the metadata
  is compared to others in the set.
* Consistency: This checks whether metadata follows standards in
  structure (like Dublin Core) and language.
* Coherence: This checks whether all metadata fields describe the
  resource similarly.
* Findability/accessibility: A basic measure looks at explicit links
  (like "relates to" or "is a version of"). A better measure looks
  at implicit links by traversing a data graph.
* Timeliness: A basic measure checks currency (last updated date). A
  better measure compares currency to average quality over time.
* Provenance: This measures perceived trust.

The researchers created metrics to measure all these aspects
programmatically, using proxies to estimate some. But here's the
surprising finding: most of the metrics didn't correlate with how
humans rated quality.

They ran three studies to validate their findings. In the first
study, they tested whether their metrics matched human ratings. 22
researchers evaluated 20 metadata instances (10 manual, 10
auto-generated). They graded the metadata on a 7-point scale for
each parameter.

  "In general, the quality metrics do not correlate with their
  expected quality parameters as human [sic] rate them."

But one metric stood out. It influenced all quality measures AND
matched how humans rated other aspects.

  "If all the parameters are averaged, the final result could be
  mostly estimated (80%) by the Qtinfo metric in combination with
  the origin of the metadata."

Qtinfo is a conformance-to-expectations metric. It measures how well
metadata fulfills user requirements for finding, identifying,
selecting, and obtaining a resource. The researchers suggest that
usefulness depends on unique information in the metadata. Users
differentiate resources more easily when metadata instances aren't
similar.

They defined how to measure this programmatically. It would take
another full newsletter to explain the equations. In broad strokes,
the researchers measure the importance of a word in a document as
proportional to its frequency within that document, and inversely
proportional to how many documents in the corpus contain the word.
Or, more plainly: if a word appears frequently in one document but
rarely in others, it is more useful for finding, identifying, and
selecting that content.
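That's the classic TF-IDF weighting. Here's a toy sketch of the idea
in TypeScript - my own illustration of the textbook formula, not the
paper's exact Qtinfo definition:

```
// Toy TF-IDF: how distinctive is each term of one metadata record
// relative to the rest of the set?
export function tfidf(doc: string, corpus: string[]): Map<string, number> {
  const tokenize = (text: string) =>
    text.toLowerCase().match(/[a-z]+/g) ?? [];

  const terms = tokenize(doc);
  const scores = new Map<string, number>();
  if (terms.length === 0) return scores;

  for (const term of new Set(terms)) {
    // Term frequency: what share of this record the term makes up.
    const tf = terms.filter((t) => t === term).length / terms.length;

    // Inverse document frequency: discount terms that appear in
    // most of the other records.
    const docsWithTerm = corpus.filter((d) =>
      tokenize(d).includes(term)
    ).length;
    const idf = Math.log(corpus.length / (1 + docsWithTerm));

    scores.set(term, tf * idf);
  }
  return scores;
}

// A metadata record whose best-scoring terms are all near zero reads
// much like every other record in the set - a candidate for review.
```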
For content professionals, this means one thing: you can't evaluate
metadata in isolation. You must evaluate it against the complete
content set.

To create meaningful metadata, an LLM needs to evaluate all the
metadata instances in the set. That way it generates unique metadata
that differentiates resources rather than just summarizing documents.
Agentic design makes this easy. You can explicitly provide knowledge
sources to the LLM.

Ochoa and Duval's second study measured manual versus auto-generated
metadata using their metrics. Unsurprisingly:

  "In general, the metrics found that manual metadata set has higher
  quality than the automatic metadata set."

This happened before the LLM revolution. They used SAmgI, a simple
text analysis algorithm. One major difference: completeness. Human
experts filled more fields than the bot could. LLMs have changed
this significantly. They can help close the gap.

Even then, SAmgI generated more accurate metadata than humans. It
used text directly from the resource. Humans use synonyms. LLMs do
too.

What is interesting to content professionals is how all of this
scales. The researchers ran a third study looking at using
programmatic quality metrics as a "filter" to identify poor-quality
metadata entries. A few metrics proved extremely useful in this
regard:

* Completeness
* Weighted completeness
* Conformance to expectations

When you design evaluation systems, use these metrics to define
scorecards for measuring metadata across a set. Three deterministic
metrics that can be calculated programmatically feels lightweight to
me. It's certainly faster than regular sampling and surveying with
human panels.

╔════════════════════════════════════════════════════════════════════╗
║ D I S C U S S I O N                                           [04] ║
╚════════════════════════════════════════════════════════════════════╝

This investigation taught me a lot:

* LLMs generate metadata instances well when summarizing a page.
  They can generate structured metadata that follows industry
  standards like Dublin Core.
* Don't create metadata in a vacuum. Compare instances across the
  set. Generate unique, distinct metadata entries instead of just
  summarizing pages individually. Provide LLMs with glossaries and
  style guides for even better results.
* Markdown wins again. This makes me smile. Simple frontmatter
  transforms easily into structured markup during site generation.
  We can keep the author experience clean and templated. See
  examples in Astro and Next.js:
  << https://github.com/ironssamuel/ssg-metadata-templates >>
* Complete metadata fields automatically when you can.
* You can quantify and monitor metadata quality. Use quantitative
  frameworks to evaluate metadata across a set. Identify poorly
  formed entries.

I give LLMs a solid 5/6 on these approaches. Enterprises gain
massive efficiency by using LLM-generated metadata. They can spot
poorly formed metadata with quality checks and scripts. Freelancers
and contractors speed up workflows too. Use templated approaches to
static site generation.

Most importantly, better metadata helps users. It helps them find,
identify, and select the correct record, whether they search or
browse.
Isn't that what it's all about?

╔════════════════════════════════════════════════════════════════════╗
║ T H I S   M O N T H ' S   R E A D I N G                       [05] ║
╚════════════════════════════════════════════════════════════════════╝

Here's your monthly oracle reading from the Design Oracle!

YOUR JANUARY PRACTICE

This month invites you to build with structure and surrender. Map
the path forward with conviction. Break complexity into manageable
steps. But hold your plans lightly. The territory reveals itself as
you walk it. The best course adapts to discoveries along the way.
Planning serves your work. It shouldn't constrain it.

Progress comes through repetition with purpose. Each iteration
brings wisdom you can only learn through doing. Question each
version with care: Are you making meaningful improvements or just
making changes? Let scrutiny guide your refinement.

Trust the cyclical nature of growth. Excellence emerges through
layers. Each builds on what came before.

The cards suggest January is a month of disciplined flexibility.
Create the structure you need to move forward. Then refine through
cycles of making and questioning. The path to excellence is rarely
linear. Trust the spiral.

Get your own oracle deck. Drive personal insights. Get motivated
with design rituals. The Design Oracle is free in the public domain:

<< http://design-oracle.github.com/ >>

╔════════════════════════════════════════════════════════════════════╗
║ T H A N K S !                                                      ║
╚════════════════════════════════════════════════════════════════════╝

If you've read this far, thank you so much! I know these newsletters
are long, but hopefully they've given you something to think about
and discuss with your community of practice.

Subscribe to new issues or read back issues:

<< http://interspacestudio.com.au/newsletter >>

Check out my services and rates there too. I help businesses succeed
with their content. I consult, contract, coach, and speak. Reach out
to see what I can do for your business.

<< irons.sam@interspacestudio.com.au >>

Until next time!

!! PLEASE FORWARD THIS TO WHOEVER YOU THINK MAY BE INTERESTED !!