As developers, we frequently find ourselves trying to understand features in an unfamiliar and massive codebase. Whether it’s your team’s internal project, someone else’s code, or an open-source repository - making sense of it all can be quite daunting. The first challenge? Simply figuring out where to begin!
Tools we will be using:
- ripgrep or rg: An alternative to grep that searches recursively by default (on GitHub).
- fd: An alternative to the find command (on GitHub).
- Any editor of your choice: VSCode, IntelliJ, Zed etc.
Let’s Start
Problem Statement: Let’s say you work on the Digital Ocean RDS team, which has a Java codebase, and you want to understand how the automated backup system works for Postgres.
The first thought would be to open the codebase in IntelliJ and search for “backup”. The search results have hundreds of matches, huh?
Let’s do better and try the CLI. We’ll start simple and add more filters as we go.
Search for “backup” with rg:
$ rg backup
backend/app/src/main/pkg/file1.java
12:import io.company.backup;
28: restore.backupUUID(id);
backend/app/test/resources/file2.json
5: "backup_id": null,
41: "backupData": null,
48: "backup_id": null,
frontend/app/test/resources/file2.tsx
...
more files here...
That is a lot of files, and we are only interested in java files at the moment, so let’s add a filter for that:
$ rg backup --type java
But wait, rg is case-sensitive by default, so this will not match words such as the Backup in class AutomatedBackup {}. Let’s do a case-insensitive search.
# -i makes the search case-insensitive
$ rg backup --type java -i
Well, let’s say there are 20+ files containing the backup keyword, and each file has 10-100 occurrences of the word backup.
It is overwhelming to figure out which file to look at.
Maybe we can ignore the matching lines for now and only look at the file names first.
# -l prints only the names of the files matching the query, not the matching lines
$ rg backup --type java -i -l
backend/app/src/main/pkg/AutomatedBackup.java
backend/app/src/main/pkg/PostgresRestore.java
backend/app/src/main/pkg/PostgresRestoreUtil.java
backend/app/src/main/pkg/AutomatedBackupImpl.java
backend/app/src/main/pkg2/BackupUtil.java
backend/app/src/test/pkg/AutomatedBackupImplTest.java
backend/app/src/test/pkg2/BackupUtilTest.java
more files ...
Now we have some idea: AutomatedBackupImpl could be a good candidate to start looking at.
Open the file in IntelliJ, look at the methods defined and you are good to start.
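If the file names alone do not make the starting point obvious, ranking files by how many lines match can help. The sketch below assumes rg's -c/--count flag and builds a throwaway directory with made-up file contents so that it can be run anywhere. The explicit . path is there so that rg searches the directory even when stdin is not a terminal.

```shell
# Throwaway demo tree; the file contents are hypothetical, for illustration only.
demo=$(mktemp -d)
mkdir -p "$demo/src/main"
printf 'class AutomatedBackupImpl {\n  void backup() {}\n  void scheduleBackup() {}\n}\n' \
  > "$demo/src/main/AutomatedBackupImpl.java"
printf 'class PostgresRestore {\n  void restoreFromBackup() {}\n}\n' \
  > "$demo/src/main/PostgresRestore.java"

# -c prints "path:count" (number of matching lines per file) instead of the lines.
# sort splits on ":" and orders by the count in field 2, highest first.
ranked=$(cd "$demo" && rg backup --type java -i -c . | sort -t: -k2,2nr)
echo "$ranked"
```

The file with the most matching lines comes first, which is often (though not always) where the core logic lives.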
We can even look up just the files that have backup in their names using fd:
# -i for case-insensitive search
$ fd -i backup
backend/app/src/main/pkg/AutomatedBackup.java
backend/app/src/main/pkg/AutomatedBackupImpl.java
backend/app/src/main/pkg2/BackupUtil.java
This may be useful at times, but not every file related to backup functionality will have backup in its file name.
Combining fd and rg
Now let’s say you want to search for the backup keyword only in files that have backup in their name.
The earlier rg results also included files like PostgresRestore.java, which we want to leave out here.
How to do it?
- List all files having backup in their name: $ fd -i backup --extension java
- Search all files for the keyword backup: $ rg -i backup
Combine the two:
$ rg -i backup $(fd -i backup --extension java)
# Command substitution $() executes fd first to get list of backup files
# Then rg searches for 'backup' only in those files
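One caveat with $(...): the shell splits fd's output on whitespace, so a path containing a space reaches rg as two broken arguments, and if fd finds nothing, rg receives no path at all and searches the whole directory instead. A possible alternative, assuming fd's -X/--exec-batch flag, is to let fd hand the paths to rg directly. The file names below are hypothetical, and one deliberately contains a space.

```shell
# Throwaway demo tree with made-up files.
demo=$(mktemp -d)
mkdir -p "$demo/pkg"
printf 'class BackupUtil { void runBackup() {} }\n' > "$demo/pkg/Backup Util.java"
printf 'class PostgresRestore { void fromBackup() {} }\n' > "$demo/pkg/PostgresRestore.java"

# -X runs rg once, passing every path fd found as its own argument,
# so a space inside a file name does not split the path.
# -l prints only the names of the files that matched.
matched=$(cd "$demo" && fd -i backup --extension java -X rg -i -l backup)
echo "$matched"
```

Only the file with backup in its name is searched; PostgresRestore.java contains the keyword but is never passed to rg.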
Multiline search
A very frequent use case: during incidents or debugging, we need to search for errors across the codebase. Editor search in VSCode or IntelliJ accepts regex, but matching a message that is split across several lines is awkward there, and the shortcuts are IDE-specific. I have one trick that may be helpful.
Sample code with multiline error message:
backend/app/src/main/pkg/BackupStorageManager.java
89: StorageMetrics metrics = getStorageMetrics();
90: log.error(
91: "S3 backup upload failed for instance {} to bucket {}. " +
92: "Available storage: {}GB, Required: {}GB",
93: instanceId, bucketName, metrics.getAvailable(), requiredSpace
94: );
Let’s say you encounter this error message in your logs:
S3 backup upload failed for instance i-123456 to bucket my-backup-bucket. Available storage: 10GB, Required: 50GB
Now if you search the code for this exact string, or a regex for it, you may not find anything because the error message is split across multiple lines in the code.
In such cases you can split the string you are searching for and use the -A/-C flag in grep/rg.
# -A 5 means print 5 more lines after the matching line
$ rg "S3 backup upload failed" -A 5 | rg "Available storage"
I know this is a hypothetical example but I have used this trick a few times and it works in practice.
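Another option is rg's multiline mode. This is a sketch assuming the -U/--multiline flag together with the (?s) regex flag, which lets . match newlines. The 200-character bound is an arbitrary choice to keep a match from running far past the log call. The sample file recreates the snippet above in a temporary directory.

```shell
# Recreate the sample log call in a throwaway directory.
demo=$(mktemp -d)
cat > "$demo/BackupStorageManager.java" <<'EOF'
log.error(
    "S3 backup upload failed for instance {} to bucket {}. " +
    "Available storage: {}GB, Required: {}GB",
    instanceId, bucketName, metrics.getAvailable(), requiredSpace
);
EOF

# -U allows a single match to span several lines.
# (?s) makes . match newlines too; {0,200}? keeps the gap short and non-greedy.
# -n prints the line number of every line that is part of the match.
found=$(rg -U -n '(?s)S3 backup upload failed.{0,200}?Available storage' "$demo")
echo "$found"
```

Compared with the -A pipe, this reports the file and line numbers of the match itself, at the cost of a slightly more involved pattern.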
That’s all I have for now :) Happy Hacking!