The @trickest Inventory project is an interesting resource. It holds a massive set of hostnames, live services, spidered URLs, and cloud data, organized by Bug Bounty program. That is far more data than I want to store. In fact, the only thing I am interested in is the hostnames. Here is a quick and dirty way to pull the hostnames.txt file from every program without checking out the entire project.

💡
There is a good chance I am going to embarrass myself with this post because there is a better way. But this is part of learning and I embrace it. Please let me know if there is, and I will post the faster way at the top.

First, clone the project's latest commit without checking out any files:

git clone --no-checkout \
--depth 1 \
--single-branch --branch=main \
https://github.com/trickest/inventory.git

Above, we clone the project without checking out any files (--no-checkout). We also fetch only the most recent commit (--depth 1) and only the main branch (--single-branch --branch=main).

Note that even this shallow, single-commit clone of main takes up 336 MB 😲
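
If you want to see what those clone flags do without downloading anything, here is a minimal sketch against a throwaway local repository. The repository contents, the acme directory, and the demo identity are all made up for illustration; only the clone flags match the command above.

```shell
#!/bin/sh
# Sketch: build a tiny two-commit repo, then clone it with the same flags.
set -eu
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"
mkdir -p acme
echo "a.example.com" > acme/hostnames.txt
git add . && git commit -q -m "first"
echo "b.example.com" >> acme/hostnames.txt
git commit -q -am "second"
branch=$(git symbolic-ref --short HEAD)    # main or master, depending on git config

cd "$work"
# file:// is used so that --depth is honored for a local source.
git clone -q --no-checkout --depth 1 --single-branch --branch="$branch" \
  "file://$work/src" shallow

commit_count=$(git -C shallow rev-list --count HEAD)   # commits we actually fetched
file_count=$(ls shallow | wc -l | tr -d ' ')           # files in the working tree
echo "commits: $commit_count, checked-out files: $file_count"
```

The clone should report a single commit and an empty working directory, while git ls-tree can still list every path in that commit.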

Finally, we are going to download every hostnames.txt file. This is done by listing the HEAD tree, grep'ing for the filename, URL-encoding & as %26, and then downloading each file.

# Run this from inside the cloned inventory/ directory
git ls-tree --full-name --name-only -r HEAD | \
grep 'hostnames\.txt$' | \
sed -e 's/&/%26/g' | \
xargs -I {} sh -c 'curl -o "$(echo "{}" | cut -d/ -f1)_hostnames.txt" "https://raw.githubusercontent.com/trickest/inventory/main/{}"'

At this point you should have a directory full of the relevant files.
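
To check what the pipeline does to each path before hitting the network, a dry run along these lines may help. The three sample paths are invented for illustration (they are not taken from the real repository), and the loop only prints the output filename and URL that would be used.

```shell
#!/bin/sh
# Dry run: same stages (filter, encode, derive output name), but no download.
plan=$(
  printf '%s\n' 'acme/hostnames.txt' 'at&t/hostnames.txt' 'acme/live.txt' |
  grep 'hostnames\.txt$' |          # keep only the hostnames files
  sed -e 's/&/%26/g' |              # URL-encode every ampersand
  while IFS= read -r path; do
    out="${path%%/*}_hostnames.txt" # first path component becomes the prefix
    echo "$out <- https://raw.githubusercontent.com/trickest/inventory/main/$path"
  done
)
echo "$plan"
```

Swapping the printf line for the git ls-tree command gives a preview of the real run.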

💡
If you are using this technique with another project take care that you trust the input (directories and filenames). You are piping them into a subshell.
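
To make that risk concrete, here is a small sketch with a made-up hostile path and echo standing in for curl. It compares pasting {} into the script text with passing the path as a positional argument ($1), which is one possible way to reduce the exposure, not a guarantee of safety.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
evil='evil$(id>pwned)/hostnames.txt'   # hypothetical hostile path

# Unsafe: {} is pasted into the script text, so the $(...) is executed.
printf '%s\n' "$evil" | xargs -I {} sh -c 'echo "would fetch {}"' >/dev/null
unsafe_ran=$( [ -e pwned ] && echo yes || echo no )
rm -f pwned

# Safer: the path arrives as $1 and is only ever treated as data.
printf '%s\n' "$evil" | xargs -I {} sh -c 'echo "would fetch $1"' _ {} >/dev/null
safe_ran=$( [ -e pwned ] && echo yes || echo no )

echo "unsafe ran payload: $unsafe_ran, safe ran payload: $safe_ran"
```

In the first form the embedded command runs and creates the pwned file; in the second it does not.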

If you don't want to pipe into a subshell (yolo), you can use wget instead (dropping the -o subshell), but wget will name the files hostnames.txt, hostnames.txt.1, hostnames.txt.2, and so on, so you lose the program names:

# Run this from inside the cloned inventory/ directory
git ls-tree --full-name --name-only -r HEAD | \
grep 'hostnames\.txt$' | \
sed -e 's/&/%26/g' | \
xargs -I {} wget "https://raw.githubusercontent.com/trickest/inventory/main/{}"
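
One possible way to keep the program names with wget, still without a subshell, is to let it recreate the directory structure: -x forces directories, -nH drops the hostname directory, and --cut-dirs=3 strips the trickest/inventory/main prefix. The sketch below only prints the commands for two made-up paths; I have not run this against the full repository, so treat it as a starting point.

```shell
#!/bin/sh
# Print (not run) the wget commands; remove "echo" to actually download.
cmds=$(
  printf '%s\n' 'acme/hostnames.txt' 'globex/hostnames.txt' |
  xargs -I {} echo wget -x -nH --cut-dirs=3 \
    "https://raw.githubusercontent.com/trickest/inventory/main/{}"
)
echo "$cmds"
```

With echo removed, each file should end up as program/hostnames.txt under the current directory instead of a numbered hostnames.txt.X.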