November 24, 2022

Keeping the Growing Number of Flyway Migrations in Check

Flyway is a fantastic tool for managing an application’s database alongside the application’s source code. But even in small projects, the number of migrations inevitably grows. That is no problem for application startup. But if you use Flyway to initialise the database before every integration test to ensure your tests are independent and reliable, the tests take longer with every migration you add. For example, if you have 50 migrations that together take 1 second to run and there are 600 test methods, you spend 10 CPU minutes on creating test databases alone. There must be a better way.

Fortunately, there is. Unfortunately, it requires a fair number of custom Gradle tasks.

The idea is to automatically do the following before running the tests:

  1. Provision a database container.
  2. Apply all migrations to the database with Flyway.
  3. Create a database dump and place it on the classpath for future consumption.
  4. Remove the container.

Before a test runs, the test database can be initialised from the previously created database dump without any involvement of Flyway. That is much faster than applying each migration individually. Furthermore, the time needed to create the test database now grows with the number of database objects rather than with the number of migrations.
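On the test side, replaying the dump can be as simple as one annotation. A minimal sketch using Spring's @Sql support (the test class and method names are illustrative, not from the sample project):

```java
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.jdbc.Sql;

// Replays the dump from the test classpath before each test method,
// giving every test a freshly initialised schema without running Flyway.
@SpringBootTest
@Sql("classpath:/db.sql")
class SquashRepositoryTest {

	@Test
	void startsFromCleanSchema() {
		// exercise repositories against the restored schema here
	}
}
```

Because the dump was created with --clean and --if-exists, replaying it repeatedly on the same database is safe: each run drops and recreates the objects it is about to restore.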

The implementation is relatively straightforward thanks to the Gradle Docker plugin:

import com.bmuschko.gradle.docker.tasks.container.*
import com.bmuschko.gradle.docker.tasks.image.DockerPullImage
import org.flywaydb.gradle.task.FlywayMigrateTask

import java.nio.file.Files
import java.nio.file.StandardOpenOption

def dumpPath = project.buildDir.toPath().resolve('resources/test/db.sql')

task pullDatabaseImage(type: DockerPullImage) {
	// Writing the pulled image name to a state file gives this task an output,
	// so Gradle can consider it (and everything depending on it) up to date.
	def stateFile = project.buildDir.toPath().resolve("tmp/squash.txt")

	image = 'postgres:15' // illustrative tag; use the PostgreSQL version you target
	outputs.file(stateFile)

	doLast {
		Files.writeString(stateFile, getImage().get(), StandardOpenOption.CREATE,
				StandardOpenOption.TRUNCATE_EXISTING)
	}
}
task createDatabaseContainer(type: DockerCreateContainer) {
	dependsOn pullDatabaseImage

	onlyIf { pullDatabaseImage.didWork }

	targetImageId pullDatabaseImage.getImage()
	hostConfig.portBindings = ['5432:5432']
	hostConfig.autoRemove = true
	withEnvVar('POSTGRES_USER', 'flyway')
	withEnvVar('POSTGRES_PASSWORD', 'flyway')
	withEnvVar('POSTGRES_DB', 'squash')
}

task startDatabaseContainer(type: DockerStartContainer) {
	dependsOn createDatabaseContainer

	onlyIf { createDatabaseContainer.didWork }

	targetContainerId createDatabaseContainer.getContainerId()
}

task stopDatabaseContainer(type: DockerStopContainer) {
	dependsOn startDatabaseContainer

	onlyIf { startDatabaseContainer.didWork }

	targetContainerId startDatabaseContainer.getContainerId()
}

task runFlywayMigrations(type: FlywayMigrateTask) {
	dependsOn startDatabaseContainer

	onlyIf { startDatabaseContainer.didWork }

	driver = 'org.postgresql.Driver'
	url = 'jdbc:postgresql://localhost:5432/squash'
	user = 'flyway'
	password = 'flyway'
	connectRetries = 5
}

task dumpDatabase(type: DockerExecContainer) {
	dependsOn runFlywayMigrations

	onlyIf { runFlywayMigrations.didWork }

	targetContainerId startDatabaseContainer.getContainerId()
	withCommand(['pg_dump', '--clean', '--if-exists', '--inserts',
                 '--disable-dollar-quoting', '--no-owner', '-d', 'squash',
                 '-U', 'flyway', '--file', '/tmp/db.sql'])
}

task copyDatabaseDumpToHost(type: DockerCopyFileFromContainer) {
	dependsOn dumpDatabase
	finalizedBy stopDatabaseContainer

	onlyIf { dumpDatabase.didWork }

	targetContainerId startDatabaseContainer.getContainerId()
	remotePath = '/tmp/db.sql'
	hostPath = dumpPath.toString()
}

tasks.withType(ProcessResources).configureEach {
	dependsOn copyDatabaseDumpToHost
}

Two things to highlight:

  • The database dump is written to build/resources/test/db.sql. As such, it ends up on the test classpath of the project. Tests can pick it up from there by referencing classpath:/db.sql (Spring notation).
  • The tasks participate in incremental builds. As a result, they only run if any of the migrations change or the database dump is missing. Consequently, the tasks do not impact build times during development once the database dump has been created. And the database dump is never outdated.
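The up-to-date checks rely on the tasks declaring their inputs and outputs. A sketch of that wiring, assuming the Flyway default migration directory (adjust the path to your project layout):

```groovy
// Re-run the pipeline only when a migration changes
// or the dump is missing from the build directory.
runFlywayMigrations.inputs.dir('src/main/resources/db/migration')
copyDatabaseDumpToHost.outputs.file(dumpPath.toFile())
```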

The example uses PostgreSQL. It should be possible to adapt it to any other database system (such as MariaDB or even Oracle) as long as you can get hold of a container image for that database system.

A working implementation with the full source code can be found in my sample project on GitHub. The most relevant parts are the build file and the test that uses the created database dump.