Sooner or later, every monorepo picks up tools and projects that generate code: types based on schemas, React or SOLID components generated by Storybook or some other design system tool, config files imported at build time, and so on. Let's dive into how to correctly handle generated code in a Rush monorepo.
Distinguish between two types of generated code
There are two types of generated code: "classic" auto-generated code, meaning temporary files produced as a side effect of your build, and "permanent" auto-generated files that you check into source control. In my repos I refer to these as "generated" and "codegen" respectively: codegen files are checked in, whereas generated files are rebuilt and thrown away on every build.
Which type is appropriate depends on your use case. Using transient generated files is cleaner, as there's a single source of truth checked in, so this is usually the default. But there are valid reasons to check in generated files: for example, if files are generated based on config files that come from an external repo, or the generated files are much easier for developers to read and reason about than the actual source of truth, then using the codegen approach could be preferred.
Classic (transient) generated code
These files should be treated as build outputs. They may be generated by a custom build script defined by a project, or by a Heft plugin that runs before the compile step, etc. Ideally, you want to use a single standard path for where these files live across your repo, and encourage all current and future project owners to follow this standard.
In my current repo, I encourage project owners to use the folder
src/generated for generated files. The root pattern would be
*/*/src/generated, so this is what goes in the root
Just like other build outputs, you'll need to add this folder to your list of output folder names for build caching. Assuming you've enabled phased builds and build caching, you may end up with a
rush-project.json that looks something like this:
"outputFolderNames": ["src/generated", "lib", "lib-commonjs", "dist", ".heft"]
"outputFolderNames": ["temp/coverage", "temp/jest-reports"]
This configuration ensures that when you restore a
_phase:build phase from cache for a project, it'll restore the
src/generated folder as well, just as if you'd built the project locally. Any files that your
_phase:test phase might rely on are then sitting where they belong.
Checked-in (codegen) generated code
Unlike the first type, these files should be treated as build inputs. In Rush, any file that isn't gitignored and isn't specifically excluded in the
rush-project.json config is by default considered a build input, so this doesn't require any additional configuration.
In my current repo, I suggest that projects use the standard folder
src/codegen to store these files. This pattern (
*/*/src/codegen) is pre-populated in the root
.prettierignore file, which ensures that changing Prettier settings doesn't cause changes in generated files. Depending on how complex these generated files are, you may also consider adding this pattern to other ignore files. For example, you may not want these files to skew your code coverage percentage, which you can avoid by excluding them in Jest's coverage settings. If you use a tool like SonarCloud or CodeScene, you may want to exclude these files in those tools as well.
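As a rough illustration of the Jest exclusion, here is a minimal sketch. It assumes a plain Jest setup with a jest.config.ts file and the src/codegen folder convention described above; a Heft-based project would set the same option in whatever Jest config file it already uses.

```typescript
// jest.config.ts (sketch; the file name and the codegen path are assumptions)
const config = {
  collectCoverage: true,
  // Each entry is a regular expression string matched against file paths.
  // Setting this option replaces Jest's default of ["/node_modules/"],
  // so that entry is repeated here.
  coveragePathIgnorePatterns: ["/node_modules/", "/src/codegen/"],
};

export default config;
```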
Avoiding drift in codegen files
The typical happy path for a codegen file is that the developer changes the source-of-truth files, runs
rushx build for that project, and then checks in both the updated source-of-truth file and the updated generated file in the same pull request.
However, if the developer forgets to check in the generated file, or maybe doesn't run a build at all, then you have "drift" -- another developer who checks out that commit and runs the build will have unexpected modified files in the
src/ folder. To prevent this, you want your pull request build to block any merge that would result in drift.
One way you could do this on a global level is to build the entire monorepo, and then check for any files in a
git diff: if any files were modified locally during the build, then someone has done something incorrectly, and you can fail the build. The big downside of this approach is that it can be difficult to track down exactly what caused the failure.
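The global check described above could be sketched roughly as follows. This version assumes the CI job captures the output of git status --porcelain after the build (a swap-in for a plain git diff, chosen here because it also lists new untracked files). The function names are hypothetical, and the parsing is deliberately simple: it does not handle renames or quoted paths.

```typescript
// Hypothetical helper: extracts file paths from `git status --porcelain` output.
// Each line has the form "XY path": two status characters, a space, then the path.
function findDriftedFiles(porcelainOutput: string): string[] {
  return porcelainOutput
    .split(/\r?\n/)
    .filter((line) => line.length > 3)
    .map((line) => line.slice(3));
}

// Hypothetical helper: builds the message a CI step could print before failing.
function describeDrift(files: string[]): string {
  if (files.length === 0) {
    return "No drift detected.";
  }
  return (
    "The build modified files that should already be checked in:\n" +
    files.map((file) => `  ${file}`).join("\n")
  );
}
```

A CI step would run the full build, pass the captured git output to findDriftedFiles, and exit with a non-zero code if the returned list is non-empty. Note that the message can only list which files changed, not which project or step changed them, which is the downside mentioned above.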
A better approach is to use Heft's default
--production flag. Running
rushx build in a project should generate the updated files; running
rushx build --production should generate the updated files and, if they don't match what's already there, fail the build with a detailed error. This pattern is used by other Rush Stack tools like API Extractor, and usually requires only a little extra code to enable. A nice advantage of this pattern is that since your PR build runs
rush build --production, any kind of drift must be corrected before a PR can merge -- and the developer can even run the command locally if they need to do more troubleshooting on why the drift occurred.
Note that comparing your new in-memory generated file to the file on the hard disk is made much more complicated if the file on the hard disk gets modified by Prettier or other linters/formatters -- this is one reason to make sure that codegen files are ignored by Prettier or other tools that would change formatting in these files.
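The "update locally, verify in production" pattern described above might look something like this minimal sketch. The names reconcileCodegen and CodegenResult are hypothetical, and how the production flag reaches your generator (a Heft plugin, a custom build script, and so on) is left out. The function is kept free of file system calls: the caller reads the checked-in file, passes its content in, and performs the write if one is requested.

```typescript
// Hypothetical result type: either the caller should write new content, or nothing changed.
type CodegenResult =
  | { action: "write"; content: string }
  | { action: "unchanged" };

// Compares freshly generated content against the checked-in file's content.
// `existing` is undefined when the file has not been checked in yet.
function reconcileCodegen(
  filePath: string,
  existing: string | undefined,
  generated: string,
  production: boolean
): CodegenResult {
  if (existing === generated) {
    return { action: "unchanged" };
  }
  if (production) {
    // Production builds never rewrite checked-in files; they fail with a pointer to the fix.
    throw new Error(
      `${filePath} is out of date. Run "rushx build" locally and commit the updated file.`
    );
  }
  // Local builds ask the caller to update the file so it can be committed.
  return { action: "write", content: generated };
}
```

Because the comparison is an exact string match, any formatter that rewrites the checked-in file will make a production build fail even when nothing meaningful changed, which is why these files are best excluded from Prettier.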
The suggestions above are Rush-oriented, but if you run a monorepo using other build tooling, the same general principles should apply:
- Come up with standardized paths for all projects to follow when they generate code
- Use different paths for "generated" and "codegen" files
- Don't autoformat generated files (and consider excluding them from other tools as well)
- Have a project's default build update its files, and have an optional flag that will fail instead if the checked-in file has drifted