While setting up my Next.js blog, I've documented settings for metadata, favicons and canonical links, which all reside in the <head> element. Today's topic does not reside inside the <head> element but is just as crucial for any website or blog: a sitemap.
And because one requires the other, we will also be creating a robots.txt file.
If you are not familiar with the importance of sitemaps and robots.txt files, you can learn about sitemaps on the Google Developers site.
Two Ways to Build a Sitemap
I have found two ways to build a sitemap for a Next.js website. I tried both, and I will guide you through them. In the end, I'll share which one I chose for my blog and why. Of course, your needs may be different from mine, so learning about both will help you decide, too.
1. Generate a sitemap with next-sitemap
The npm package next-sitemap was how I first learned to build sitemaps in Next.js. It is a very useful dependency and is still maintained as of this writing (November 2023).
You can install next-sitemap like this:
npm i next-sitemap
Configure next-sitemap
After installing next-sitemap, you need to create a next-sitemap.config.js file in the root directory of your project. This is the same directory that holds your package.json file.
Here's what my next-sitemap.config.js file looks like:
/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://www.davegray.codes/',
  exclude: ['/icon.svg', '/apple-icon.png', '/manifest.webmanifest', '/tags/*'],
  generateRobotsTxt: true,
  generateIndexSitemap: false,
  robotsTxtOptions: {
    policies: [
      {
        userAgent: '*',
        allow: '/',
      }
    ]
  }
}
You can review what each setting above does on the next-sitemap npm page, but you can probably see at a glance that I'm excluding some files and the /tags/* path from the generated sitemap. I'm also telling it to generate a robots.txt file.
One key benefit of generating a Next.js sitemap with the next-sitemap package is that it supports a sitemap index. This can be beneficial for large websites and blogs. We're talking sites with thousands of pages.
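To make the idea of a sitemap index concrete, here is a minimal TypeScript sketch of what one contains: the URL list is split into chunks, each chunk becomes its own sitemap file, and an index file lists those files. The helper names (chunk, buildSitemapIndex), the sitemap-N.xml file naming, the example.com domain and the chunk size of 5,000 are all assumptions for illustration. next-sitemap handles this for you, and its actual output may differ.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1')
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Build a sitemap index document that points at one sitemap file per chunk.
// The `sitemap-N.xml` file names are hypothetical.
function buildSitemapIndex(siteUrl: string, chunkCount: number): string {
  const base = siteUrl.replace(/\/+$/, '') // drop trailing slashes
  const entries: string[] = []
  for (let i = 0; i < chunkCount; i++) {
    entries.push(`<sitemap><loc>${base}/sitemap-${i}.xml</loc></sitemap>`)
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n')
}

// Usage: 12,000 URLs in chunks of 5,000 need three sitemap files.
const allUrls: string[] = []
for (let i = 0; i < 12000; i++) {
  allUrls.push(`https://www.example.com/posts/${i}`)
}
const urlChunks = chunk(allUrls, 5000)
const indexXml = buildSitemapIndex('https://www.example.com/', urlChunks.length)
```

Search engines read the index first and then fetch each listed sitemap, which is why this layout suits sites with thousands of pages.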
Add to your post build
Your Next.js site needs to generate all static pages first, so creating the sitemap is not part of the build itself. Instead, it runs as a postbuild step: npm automatically runs a script named postbuild right after the build script finishes. We set that up by adding the postbuild script to package.json:
"scripts": {
  "dev": "next dev",
  "build": "next build",
  "start": "next start",
  "lint": "next lint",
  "postbuild": "next-sitemap"
},
Now you can type npm run build in your terminal to build your site and generate a sitemap.xml file and a robots.txt file, both of which you will find in your /public directory.
My sitemap.xml file generated by next-sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
<url><loc>https://www.davegray.codes</loc><lastmod>2023-11-14T19:24:05.794Z</lastmod><changefreq>daily</changefreq><priority>0.7</priority></url>
<url><loc>https://www.davegray.codes/posts/does-my-nextjs-blog-need-canonical-links</loc><lastmod>2023-11-14T19:24:05.794Z</lastmod><changefreq>daily</changefreq><priority>0.7</priority></url>
<url><loc>https://www.davegray.codes/posts/nextjs-favicon-svg-icon-apple-chrome</loc><lastmod>2023-11-14T19:24:05.794Z</lastmod><changefreq>daily</changefreq><priority>0.7</priority></url>
<url><loc>https://www.davegray.codes/posts/nextjs-ordering-merging-metadata</loc><lastmod>2023-11-14T19:24:05.794Z</lastmod><changefreq>daily</changefreq><priority>0.7</priority></url>
<url><loc>https://www.davegray.codes/posts/ssg-ssr</loc><lastmod>2023-11-14T19:24:05.794Z</lastmod><changefreq>daily</changefreq><priority>0.7</priority></url>
</urlset>
My robots.txt file generated by next-sitemap:
# *
User-agent: *
Allow: /
# Host
Host: https://www.davegray.codes/
# Sitemaps
Sitemap: https://www.davegray.codes/sitemap.xml
2. Generate a sitemap with Next.js
Next.js now offers built-in sitemap generation. This was first introduced in version 13.3 and updated in v13.5.
sitemap.ts
The sitemap.ts example in the Next.js docs illustrates the return type of the sitemap function with hard-coded data. When we write our own function, however, we can add the logic needed to generate that data instead of typing out the URLs one by one.
Here's what my sitemap.ts file looks like:
import { MetadataRoute } from 'next'
import { getPostsMeta } from '@/lib/posts'

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const allPosts = await getPostsMeta()

  const home = {
    url: 'https://www.davegray.codes/',
    // Use an ISO 8601 string; the output of toString() is not a valid sitemap date
    lastModified: new Date().toISOString(),
  }

  // No posts available: the sitemap contains only the home page
  if (!allPosts || allPosts.length === 0) return [home]

  const posts = allPosts.map(post => ({
    url: `https://www.davegray.codes/posts/${post.id}`,
    lastModified: post.modified,
  }))

  // Date of most recent post (assumes allPosts is sorted newest first)
  home.lastModified = allPosts[0].date

  return [home, ...posts]
}
Above, I'm calling my getPostsMeta function to get the front matter data from my MDX files. After confirming I received the data, I map over it and create a sitemap entry for each blog post on my site. Finally, I assign the date of the most recent blog post to the lastModified field for the home page because the home page will display a link to the latest blog post.
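The mapping logic can also be pulled into a pure function, which makes it easy to test without running Next.js. This is only a sketch under assumptions: the PostMeta shape (id, date and modified as ISO date strings) is an assumed stand-in for whatever a function like getPostsMeta returns, SitemapEntry is a local stand-in for the entries in MetadataRoute.Sitemap, and buildSitemapEntries and example.com are hypothetical names.

```typescript
// Local stand-in for one entry of Next.js's MetadataRoute.Sitemap.
type SitemapEntry = { url: string; lastModified?: string | Date }

// Assumed shape of the post metadata; dates are ISO strings like '2023-11-14'.
type PostMeta = { id: string; date: string; modified: string }

function buildSitemapEntries(baseUrl: string, posts: PostMeta[]): SitemapEntry[] {
  const base = baseUrl.replace(/\/+$/, '') // drop trailing slashes
  const home: SitemapEntry = { url: `${base}/`, lastModified: new Date() }

  // With no posts there is nothing to derive a date from.
  if (posts.length === 0) return [home]

  // Sort a copy newest first so the result does not depend on input order.
  // ISO date strings of the same format compare correctly as plain strings.
  const sorted = [...posts].sort((a, b) =>
    a.date < b.date ? 1 : a.date > b.date ? -1 : 0
  )

  // The home page links to the latest post, so it inherits that post's date.
  home.lastModified = sorted[0].date

  const postEntries = sorted.map((post) => ({
    url: `${base}/posts/${post.id}`,
    lastModified: post.modified,
  }))
  return [home, ...postEntries]
}

// Usage with two posts passed in oldest first.
const sampleEntries = buildSitemapEntries('https://www.example.com/', [
  { id: 'older-post', date: '2023-11-12', modified: '2023-11-13' },
  { id: 'newer-post', date: '2023-11-14', modified: '2023-11-14' },
])
```

Inside sitemap.ts, the default export would then only need to await the post metadata and return the result of this helper.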
The sitemap.ts file should be saved in the root of your app directory. This is the same location where you will find your globals.css file.
Generate a robots.txt file
Next.js also offers built-in robots.txt generation. This was first introduced in version 13.3.
Here is what my robots.ts file contains:
import { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
    },
    sitemap: 'https://www.davegray.codes/sitemap.xml',
  }
}
Your robots.ts file should also be saved in the root of your app directory.
The Output
After creating these files, you can check their output in dev mode by typing npm run dev in your terminal.
Here is the output for my sitemap found at localhost:3000/sitemap.xml:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.davegray.codes/</loc>
<lastmod>2023-11-14</lastmod>
</url>
<url>
<loc>https://www.davegray.codes/posts/does-my-nextjs-blog-need-canonical-links</loc>
<lastmod>2023-11-14</lastmod>
</url>
<url>
<loc>https://www.davegray.codes/posts/nextjs-favicon-svg-icon-apple-chrome</loc>
<lastmod>2023-11-13</lastmod>
</url>
<url>
<loc>https://www.davegray.codes/posts/nextjs-ordering-merging-metadata</loc>
<lastmod>2023-11-12</lastmod>
</url>
</urlset>
Here is the output for my robots.txt file found at localhost:3000/robots.txt:
User-Agent: *
Allow: /
Sitemap: https://www.davegray.codes/sitemap.xml
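To see how the object returned from robots.ts relates to the text above, here is a rough TypeScript sketch of that mapping. It is not the Next.js implementation: RobotsRule and RobotsConfig are local stand-ins covering only a subset of MetadataRoute.Robots, and renderRobotsTxt and example.com are hypothetical names used for illustration.

```typescript
// Local stand-ins for a subset of Next.js's MetadataRoute.Robots type.
type RobotsRule = { userAgent?: string; allow?: string; disallow?: string }
type RobotsConfig = { rules: RobotsRule | RobotsRule[]; sitemap?: string }

// Turn the config object into robots.txt text, one group per rule.
function renderRobotsTxt(config: RobotsConfig): string {
  const rules = Array.isArray(config.rules) ? config.rules : [config.rules]
  const lines: string[] = []
  for (const rule of rules) {
    lines.push(`User-Agent: ${rule.userAgent ?? '*'}`)
    if (rule.allow !== undefined) lines.push(`Allow: ${rule.allow}`)
    if (rule.disallow !== undefined) lines.push(`Disallow: ${rule.disallow}`)
    lines.push('') // a blank line separates rule groups
  }
  if (config.sitemap !== undefined) lines.push(`Sitemap: ${config.sitemap}`)
  return lines.join('\n')
}

// Usage with the same shape of object that robots.ts returns.
const robotsTxt = renderRobotsTxt({
  rules: { userAgent: '*', allow: '/' },
  sitemap: 'https://www.example.com/sitemap.xml',
})
```

Each key in a rule becomes one directive line, and the sitemap property becomes the Sitemap line that points crawlers at the generated sitemap.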
Note that the process above is different from generating those files with next-sitemap. It is not a postbuild process, so you can check the output while running in dev mode. In addition, it does not create static files in your public directory. You won't find these files on your computer, but you will be able to view them while your project is running.
Which Sitemap Generation Method Did I Choose for My Blog?
Both approaches have their benefits and there is no one specific correct answer here.
In the end, I chose to go with the first-class support that is now built into Next.js.
Notable Differences: next-sitemap vs. Next.js sitemap generation
In comparison, you can see the output above from the Next.js generation method is more minimalistic than that of next-sitemap. The next-sitemap approach provides links to more XML schemas that I don't need. However, depending on your content, you might need one or more of those sitemap extensions.
The next-sitemap default config that I used also provides the <changefreq> and <priority> values. Next.js also supports these values (as of v13.4.5), but Google ignores changefreq and priority, so I don't feel like I need them.
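If you do want those values with the built-in approach, the sitemap entries accept changeFrequency and priority fields. The sketch below uses a local stand-in type rather than importing MetadataRoute.Sitemap, and the withCrawlHints helper and example.com URL are hypothetical, so treat it as an illustration of the entry shape rather than a recommended setup.

```typescript
// Local stand-in for one entry of Next.js's MetadataRoute.Sitemap.
type ChangeFrequency =
  | 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never'

type HintedEntry = {
  url: string
  lastModified?: string | Date
  changeFrequency?: ChangeFrequency
  priority?: number
}

// Return copies of the entries with the same hints applied to each one.
function withCrawlHints(
  entries: HintedEntry[],
  changeFrequency: ChangeFrequency,
  priority: number
): HintedEntry[] {
  // The sitemap protocol only allows priorities from 0.0 to 1.0.
  if (priority < 0 || priority > 1) {
    throw new RangeError('priority must be between 0.0 and 1.0')
  }
  return entries.map((entry) => ({ ...entry, changeFrequency, priority }))
}

// Usage: reproduce the values that next-sitemap emitted by default above.
const hinted = withCrawlHints(
  [{ url: 'https://www.example.com/', lastModified: '2023-11-14' }],
  'daily',
  0.7
)
```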
Google does use the <lastmod> value "if it's consistently and verifiably accurate". For this property, I found that next-sitemap stamped every URL with the same new Date() value, but with Next.js generation, I could provide an accurate modified date value in the front matter of my MDX files and extract that value to provide the "consistently and verifiably accurate" data that Google wants here.
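As a rough illustration of pulling that date out of a file, here is a minimal TypeScript sketch that reads a modified value (falling back to date) from a simple front matter block. It assumes front matter delimited by --- lines with plain key: value pairs. The helper names are hypothetical, and a real project would more likely rely on a dedicated front matter parser.

```typescript
// Read one key from a simple `key: value` front matter block delimited by `---` lines.
function readFrontMatterValue(source: string, key: string): string | undefined {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source)
  if (!match) return undefined // no front matter block found

  for (const line of match[1].split(/\r?\n/)) {
    const separator = line.indexOf(':')
    if (separator === -1) continue
    if (line.slice(0, separator).trim() === key) {
      // Trim whitespace and strip one pair of surrounding quotes, if present.
      return line.slice(separator + 1).trim().replace(/^['"]|['"]$/g, '')
    }
  }
  return undefined
}

// Prefer the `modified` date and fall back to the publish `date`.
function lastModifiedFor(source: string): string | undefined {
  return readFrontMatterValue(source, 'modified') ?? readFrontMatterValue(source, 'date')
}

// Usage with a small in-memory MDX document.
const sampleMdx = [
  '---',
  'title: "Hello"',
  'date: 2023-11-12',
  'modified: "2023-11-14"',
  '---',
  '',
  '# Hello',
].join('\n')
const sampleLastModified = lastModifiedFor(sampleMdx)
```

Because the value comes from the file itself rather than from the build time, it only changes when you edit the post and update its front matter.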
In the robots.txt file, next-sitemap provides a Host value. I found that Google does not expect or require a Host value in your robots.txt file, and Host is not part of the original robots.txt specification. I didn't feel like I needed it in my robots.txt file after discovering this, and Next.js robots.txt generation does not include it.
If I had a large site (5000+ pages), I would have chosen next-sitemap. It currently makes it easy to generate a sitemap index, and as of this writing (November 2023), Next.js does not support this. The Next.js docs mention in a Good to know section that they plan to support sitemap indexes in the future.