Yesterday I needed to write another quick Ruby script to traverse a directory tree, extract source code from a variety of file types, and create one file containing all of the extracted source code. My initial thought was to use the Dir class; however, that turned into a wild goose chase.
It turns out you don’t have to roll your own directory traversal using the Dir class (which was more than a little clunky to attempt). Instead, you can use the Find module (documented here), which walks the tree for you.
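To give a feel for the Find module, here is a minimal sketch of a traversal that skips unwanted directories. The helper name list_files and its skip_dirs parameter are made up for illustration; only Find.find and Find.prune come from the standard library.

```ruby
require 'find'

# Hypothetical helper (not part of the original script): collect every file
# path under `root`, skipping any directory whose name is in `skip_dirs`.
def list_files(root, skip_dirs = [])
  files = []
  Find.find(root) do |path|
    if File.directory?(path)
      # Find.prune stops the traversal from descending into this directory.
      Find.prune if skip_dirs.include?(File.basename(path))
    else
      files << path
    end
  end
  files
end
```

Calling list_files(".", [".svn"]) should return the files under the current directory while ignoring any .svn folders. The point of the sketch is that Find.find yields every path recursively, so no hand-written recursion over Dir entries is needed.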
The nice thing was that once I found that module, the script was super easy to write (about 50 lines of code, including comments and whitespace). I chose to use a configuration file to hold some user-specified settings so that the tool could be used to process other types of projects (I processed a C# and ASP.NET project this time, but will need to do the same for a CF project soon).
Here is the code - if you see something that could be improved, please let me know in the comments.
    require 'find'

    class PatentBuilder
      def initialize
        # Read the config once. This assumes the layout shown below: each "::"
        # header line is followed by its value, so the values sit on lines
        # 1, 3 and 5 (0-based).
        config = IO.readlines("patent.cfg")
        @rootDirectory = config[1].strip
        # strip removes the newline that readlines keeps at the end of each line
        @includedFileTypes = config[3].strip.downcase.split(",")
        @excludedDirectoryNames = config[5].strip.downcase.split(",")
        puts @includedFileTypes
      end

      def rootDirectory
        @rootDirectory
      end

      def RecurseTree
        totalFiles = 0
        # The block form of File.open closes the output file when the walk ends
        File.open("patent_output.txt", "w") do |outfile|
          Find.find(@rootDirectory) do |path|
            if FileTest.directory?(path)
              # determine if this is a directory we don't like...
              if @excludedDirectoryNames.include?(File.basename(path).downcase)
                Find.prune # don't look in this directory
              else
                outfile << "// DIRECTORY: " << path << "\n"
              end
            else
              # we have a file; compare its extension (without the leading dot)
              filetype = File.extname(path).downcase.sub(".", "")
              if @includedFileTypes.include?(filetype)
                outfile << "// FILE: " << path << "\n"
                File.foreach(path) { |line| outfile << line }
                totalFiles += 1
                outfile << "\n"
              end
            end
          end
        end
        puts "total files = #{totalFiles}"
      end
    end

    PatentBuilder.new.RecurseTree
The configuration file (named patent.cfg) is pretty simple and needs to go in the same directory as the Ruby file (patent.rb):
    ::Root Directory
    c:\some\directory\path
    ::Included File Types (extensions)
    cs,aspx,vb,asmx,css,xml,resx,ascx,master,sitemap,sql,
    ::Excluded Directory Names
    Tests,Documentation,.svn,
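You may notice that each of my lists is comma-separated and that the final item in each list is followed by a trailing comma. Without that trailing comma, the final token in the list was lost when the string was split on the commas. The likely cause is that IO.readlines keeps each line's trailing newline, so the last token ends up as "sql" plus a newline and never matches a real extension; stripping the line before splitting should make the trailing comma unnecessary.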
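The trailing-comma behavior is easy to reproduce in isolation. The sketch below compares a plain split with a hypothetical parse_list helper (my name, not something from the original script) that strips the line first, so it works with or without the trailing comma.

```ruby
# Hypothetical helper (not part of the original script): turn one
# comma-separated config line into a clean array of lowercase tokens.
def parse_list(line)
  line.strip.downcase.split(",").map(&:strip).reject(&:empty?)
end

raw = "cs,aspx,sql\n"             # a line as IO.readlines returns it

naive = raw.downcase.split(",")   # the last token keeps the newline
clean = parse_list(raw)

p naive   # => ["cs", "aspx", "sql\n"]
p clean   # => ["cs", "aspx", "sql"]

# A trailing comma is harmless too, because split drops trailing empty tokens.
p parse_list("Tests,Documentation,.svn,\n")   # => ["tests", "documentation", ".svn"]
```

With a helper like this, the config lists could be written either way, and stray spaces after the commas would not matter.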