What is the right way to arrange the content folder and md files?

The project I might be asked to work on (if I clear the interview, of course) looks messy to me, at least for now. There are 28 states in India, so we have 28 folders for them (numbered 0, 1, 2, 3, and so on).

Each of these states is geographically and politically divided into districts, so we have about 780 such folders, named 01, 02, 03 (for districts in state 0), 11, 12, 13 (for districts in state 1), and so on.

Each of these districts is further divided into sub-divisions (probably 6057 folders), named 01_1, 01_2, 01_3, and so on for the sub-divisions of district 1 in state 0.

Get the idea? Oh, and then we have sets of md files in all of these folders.

I have been informed that they have about 13k folders with almost 45k md files in them.

The earlier programmer kept all folders directly under the content directory, so we have something similar to the following screenshot (I do not have the exact data as of now; this is a snapshot based on what I have been briefed):

[screenshot: current-folder-structure]

My opinion is that it might be easier if we have folders that are grouped together. I had a discussion with the earlier guy and he said that he followed the above approach (keeping everything in separate folders) just because the URLs would be shorter, much shorter than with a nested approach.

I do realize that the URLs might get longer with nested structures, but won’t it be easier to write code and generate statistical data from folders arranged in a directory-subdirectory way? This is how I would like to arrange things, if that would be the better way to go:

[screenshot: Would this be better]

Which one would be your preference, and why?
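To make the statistics argument concrete, here is a minimal sketch in Python of how per-state counts could be pulled from a nested layout. It assumes each state is a top-level folder under the content directory; the function name `count_md_per_state` is made up for illustration and is not part of Hugo.

```python
from collections import Counter
from pathlib import Path


def count_md_per_state(content_dir):
    """Count .md files under each top-level (state) folder of content_dir."""
    root = Path(content_dir)
    counts = Counter()
    for md_file in root.rglob("*.md"):
        # The first path component below content_dir is taken to be the state folder.
        parts = md_file.relative_to(root).parts
        if len(parts) > 1:  # skip files sitting directly in content_dir
            counts[parts[0]] += 1
    return dict(counts)
```

With the flat layout, the same count would need the state to be parsed back out of every folder name, which is the part I expect to be error-prone.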

As the community here is experienced, I thought it would be better to ask for your opinions, so that I am better prepared for my interview scheduled for this coming Saturday.

Thank you once again for taking the time to read and help me.

Warm regards,
Sid.

You want to read up on Hugo’s “permalinks” configuration options.

By default, Hugo mimics the directory structure in “content” when it builds the site. You can, however, change this via the “permalinks” configuration.


Thank you @frjo for your response. Appreciated :slight_smile:

At the moment, it is not the permalinks that I am worried about (yes, I do believe that in my way of organizing data, it would generate longer URLs, but that issue can be handled later). I am more concerned about the way the data is organized. I hope you would agree we all have our own “ease” or “at home ground” factor when we code / organize stuff.

The earlier guy found it easier to work with everything as a “subfolder” of the content directory (maybe he had a vision or a plan for how he would handle things). I, on the other hand, believe it would be easier to keep all related subfolders together in a hierarchy, mainly because I think it would make pulling out stats easier.

What I am asking is which way would be better for Hugo.

Maybe there is some better way to get the data organized that neither of us might be aware of.

It does not matter which way you organise content, as long as it works and makes work easier for any future developers. What I don’t understand is why the previous dev used codes instead of state names (isn’t that confusing with all those pages?).

I have a similar project that shows the locations of healthcare centres, all grouped by location. So, for example, if there are 45 states, there is a main section folder content/hospitals, and under that section I have a folder for each state (with an _index.md file). Each state is a nested section that houses further folders for the lower sub-divisions, and the hospital files sit inside those sub-divisions.

So, a typical file path would look like content/hospitals/state/sub-state/sub-sub-state/hospital-name.md where some states can have more than one hospital.

In your case, I would do something similar to my example. That is, create a states folder inside the content directory, then move all 28 states in there as nested sections. Then, for each state, put its districts inside the state folder, the sub-divisions inside their districts in the same manner, and the respective markdown files in each sub-division. For example, content/states/state-name/district-name/sub-division/lorem.md.

Going this way also means that the permalinks will be long, especially if they are built from the content structure, but that can be rectified with the permalinks configuration (as @frjo mentioned). It would have been even better if the names of these areas, rather than codes, had been used to name the folders.
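The regrouping described above could be scripted rather than done by hand. Below is a minimal sketch in Python, assuming you first build a mapping from each flat folder name to its nested path; the mapping, the `states/...` paths, and the function name `nest_folders` are all hypothetical. It defaults to a dry run so nothing moves until the plan has been checked.

```python
import shutil
from pathlib import Path


def nest_folders(content_dir, mapping, dry_run=True):
    """Move flat folders under content_dir into nested paths.

    mapping: {flat_folder_name: nested_relative_path}, a hand-made,
    hypothetical table such as {"01": "states/state-name/district-name"}.
    Returns the (source, destination) pairs that were, or would be, moved.
    """
    root = Path(content_dir)
    moves = []
    # Shallowest destinations first, so a state lands before its districts.
    ordered = sorted(mapping.items(), key=lambda kv: len(Path(kv[1]).parts))
    for flat_name, nested_rel in ordered:
        src = root / flat_name
        dst = root / nested_rel
        if not src.is_dir() or dst.exists():
            continue  # skip missing sources; never overwrite a destination
        moves.append((src, dst))
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return moves
```

Run it once with `dry_run=True`, review the returned pairs, and only then call it again with `dry_run=False`, ideally on a copy of the content directory.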


Hello @Arif

Thanks for understanding what I am after. I completely agree that it would have been more meaningful (and a lot easier) if names had been used instead of numbers. But it is how it is, and I do not have control over that. Personally, I would never have opted for the number system. Arranging these folders and files with numeric names just makes things more confusing. I have told them that I would need the data either in named files and folders or arranged by state. Either way, they can expect my services only when those 13k folders and 48k files are grouped as per my requirement (just to avoid messing up their files and data). They have not confirmed anything yet.

If I do get to work on this, I will definitely follow your and @frjo’s suggestions for shortening the URLs.

Sorry to arrive late to the party, but… is there any reason you can’t stuff all this into a data file or files, and then render the pages from data?


@jmooring

Sir, honestly, I had not thought of this (stuffing all the data into a single data file or a few files). Until this moment, I was merely trying to build upon the work that had already been done, which is why I was asking whether arranging the data into folders and subfolders would be a better option.

I do believe that, the way the data is arranged at the moment (everything in its own folder under the content folder), it might not build on a normal system: with 13k folders and 48k md files between them, the build would have to read a very large number of files and would probably run into memory issues.

To test @regis’s simple approach to generating pages from data, I built a 20,000 page site in less than a minute, and most of that data was retrieved remotely instead of locally.

My thinking here is that data queries and analysis are much easier when working with a data set.
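As a rough illustration of that point, here is a minimal sketch in Python, assuming the pages were described as records in a JSON data file. The field names (`state`, `district`, `subdivision`, `title`) and the sample values are assumptions for illustration, not the project's real schema.

```python
import json
from collections import Counter

# Hypothetical records; the field names are assumptions, not a real schema.
SAMPLE_JSON = """
[
  {"state": "0", "district": "01", "subdivision": "01_1", "title": "Page A"},
  {"state": "0", "district": "01", "subdivision": "01_2", "title": "Page B"},
  {"state": "1", "district": "11", "subdivision": "11_1", "title": "Page C"}
]
"""


def pages_per_district(records):
    """Count records for each (state, district) pair."""
    return dict(Counter((r["state"], r["district"]) for r in records))


records = json.loads(SAMPLE_JSON)
print(pages_per_district(records))
```

Once everything is in records like these, a new question (pages per state, sub-divisions with no pages, and so on) is one more small function instead of another walk over the folder tree.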

It may be worth an experiment to verify applicability and performance.
