I did use a search engine, and most of the results were like the site you mentioned, which compares dynamic URLs such as example.com/?page=52 with “SEO-friendly” ones such as example.com/category/page/.
They don’t compare URLs ending in “page-name/” with those ending in “page-name.html”, which are the two default options available with Hugo.
In my head, the only pro for uglyURLs is if you’re running in a setting where you have no choice, i.e. serving directly from a file system (in combination with relativeURLs=true).
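For reference, here’s a minimal config sketch for that file-system scenario (uglyURLs and relativeURLs are the actual Hugo option names; the paths in the comments are just illustrative):

```toml
# config.toml – rough sketch for a site browsed straight from disk, with no web server
uglyURLs     = true   # content/post/foo.md is published as /post/foo.html instead of /post/foo/index.html
relativeURLs = true   # links are rewritten as relative paths so they resolve over file://
```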
Pretty URLs are pretty ugly if you have multiple output formats. For example, instead of /foo/bar.json you’ll end up with /foo/bar/index.json. I’d love a way to have pretty URLs for HTML and ugly URLs for other stuff…
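To make that concrete, here’s a sketch of what happens if you enable a JSON output for regular pages (the format list is just an example):

```toml
# config.toml – enable a JSON output alongside HTML for regular pages
[outputs]
  page = ["HTML", "JSON"]

# With the default pretty URLs this publishes:
#   public/foo/bar/index.html   and   public/foo/bar/index.json
# With uglyURLs = true you'd get:
#   public/foo/bar.html         and   public/foo/bar.json
# As far as I know there's no built-in way to mix the two per output format.
```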
Does anyone know if there is a limit to how many directory levels a search engine spider will crawl?
Technically it depends on the spider program and how it is implemented, though I doubt the limit is easy to reach, if one exists at all.
I’m also wondering if search engines consider pages in different directories to be less related to each other than ones in the same directory.
I think they wouldn’t pay much attention to that, especially with all of the dynamic servers out there. If I understand correctly, a spider basically reads a page, does what it can to parse the content (microdata helps a lot with that), and follows the links it finds on the site. A link inside an <article> element would be viewed as related to that article, while a link with rel="next" is interpreted as the page that comes after the current one, which is a different sort of relationship.
If you use microdata, the search engine has a lot more information to go on, such as recognizing that two articles on different sites were written by the same person.
My guess is that you’re worried about SEO, which almost always means “optimizing for Google.” Relatedness isn’t based on directory structure as much as on the cross-linking you’re doing within the site, but that’s not necessarily your biggest concern as much as developing really solid content and improving your “link juice”; i.e., getting other reputable sites to link to your content while you also link to other reputable sites. You can also add some structured data (JSON-LD) to make your content a bit more machine-readable. As far as controlling what’s crawled, I would recommend looking into Google’s webmaster tools:
I’ve thought of another pro for pretty URLs: you can change the file extension to .php, .shtml or whatever after the site is live if you need to add dynamic features, without breaking any existing links (since the filename doesn’t appear in the URL).