Suppose you have a directory with millions of files in it. Running commands on so many files in Linux is often challenging because of the limit on the size of the argument list that can be passed to a command.

Filtering files based on filename

Say you want to know how many of those files start with the character a. A simple way to see that would be this command:

ls a* | wc -l
The intention is to have ls list all files starting with a and then have wc -l count the lines ls produces. On a directory this large, however, the command fails with "Argument list too long": the shell expands the wildcard a* into one argument per matching file, and the resulting list exceeds the kernel's limit.
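
The limit in question is ARG_MAX: the maximum combined size of the argument list and environment that the kernel accepts when starting a new program. You can query it with getconf; the exact value varies between systems:

```shell
# Print the kernel's limit on the combined size of command-line
# arguments and environment variables for a new process. When a
# shell wildcard expands past this, exec fails and the shell
# reports "Argument list too long".
getconf ARG_MAX
```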

A nice alternative is find. Because the pattern is quoted, the shell passes it to find unexpanded; find then matches it against each filename itself and never builds an oversized argument list:

find . -maxdepth 1 -type f -name "a*" | wc -l

The option -maxdepth 1 tells find to stay in the current directory and not descend into subdirectories. The option -type f restricts the output to regular files. Check the find man page for details on its many other options.
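
As a quick sanity check (using a throwaway directory created with mktemp), the command counts only top-level regular files whose names start with a:

```shell
# Create a few files plus a subdirectory, then count only the
# top-level "a*" files.
dir=$(mktemp -d)
touch "$dir/apple" "$dir/avocado" "$dir/banana"
mkdir "$dir/sub"
touch "$dir/sub/anchovy"   # excluded by -maxdepth 1
find "$dir" -maxdepth 1 -type f -name "a*" | wc -l   # prints 2
rm -rf "$dir"
```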

A neat additional feature of find is that it can perform an operation on each file it finds, for example deleting them:

find . -maxdepth 1 -type f -name "a*" -exec rm -f {} \;
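
With millions of matches, \; is slow because it starts one rm process per file. The + terminator (also standard in POSIX find) batches many file names into each rm invocation, with find itself keeping every batch under the argument limit:

```shell
# One rm per batch of files instead of one rm per file; find
# splits the batches so they never exceed the argument limit.
find . -maxdepth 1 -type f -name "a*" -exec rm -f {} +
```

GNU find additionally offers -delete, which removes the matched files without spawning any external command at all.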

Filtering files based on contents

Now suppose you want to find all files containing the word penguin. find can't look inside files, but grep can. On its own, though, grep runs into the same limit:

grep -l "penguin" *

Combined with find, it works again:

find . -maxdepth 1 -type f -exec grep -l "penguin" {} \;
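
Here too, \; launches one grep process per file. Batching with + is much faster: grep receives many file names per invocation, and -l makes it print just the names of the files that match:

```shell
# grep runs once per batch of files rather than once per file;
# -l prints only the names of files containing a match.
find . -maxdepth 1 -type f -exec grep -l "penguin" {} +
```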

To run a further command on the files in that output, the xargs command helps: it reads file names from its standard input and appends them as arguments to the command you specify, splitting the list into batches small enough to stay under the argument limit. To delete all the penguin-containing files, we could do:

find . -maxdepth 1 -type f -exec grep -l "penguin" {} \; | xargs rm -f
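
One caveat: xargs splits its input on whitespace by default, so file names containing spaces or newlines will be mangled. A safer variant, using the GNU extensions grep -Z (print a NUL byte after each file name) and xargs -0 (split input on NUL), handles such names correctly:

```shell
# grep -lZ terminates each printed file name with a NUL byte, and
# xargs -0 splits on NUL, so spaces and newlines in names are safe.
find . -maxdepth 1 -type f -exec grep -lZ "penguin" {} + | xargs -0 rm -f
```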