Yesterday I needed to write another quick Ruby script to traverse a directory tree, extract source code from a variety of file types, and create one file containing all of the extracted source code. My initial thought was to use the Dir class; however, that turned into a wild goose chase.

It turns out you don’t have to roll your own directory traversal with the Dir class, which was more than a little clunky to attempt. Instead, you can use the Find module (documented here).
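As a minimal sketch of what the Find module offers: Find.find yields every path under a starting directory, and Find.prune skips the directory currently being visited. The helper name list_files and its skip_dirs parameter are made up for illustration and are not part of the script in this post.

```ruby
require 'find'

# Hypothetical helper: collect every file under root, skipping any
# directory whose name appears in skip_dirs.
def list_files(root, skip_dirs = [])
  files = []
  Find.find(root) do |path|
    if File.directory?(path)
      # Find.prune stops Find from descending into this directory.
      Find.prune if skip_dirs.include?(File.basename(path))
    else
      files << path
    end
  end
  files
end
```

Calling list_files(".", [".svn"]) would return the files below the current directory while leaving out anything inside a .svn folder.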

The nice thing was that once I found that module, the script to accomplish my task was super easy to write (about 50 lines of code including comments and whitespace). I chose to use a configuration file to hold some user-specified settings so that the tool could be used to process other types of projects (I processed a C# and ASP.NET project this time but will need to do the same to a CF project soon).

Here is the code - if you see something that could be improved, please let me know in the comments.

require 'find'

class PatentBuilder
  attr_reader :root_directory

  def initialize
    # Read the config once; the values sit on lines 2, 4 and 6 of patent.cfg.
    config = IO.readlines("patent.cfg")
    # Use strip rather than strip!, which returns nil when nothing is removed.
    @root_directory = config[1].strip
    # Strip the trailing newline before splitting so the last item matches.
    @included_file_types = config[3].strip.downcase.split(",")
    @excluded_directory_names = config[5].strip.downcase.split(",")

    puts @included_file_types
  end

  def recurse_tree
    total_files = 0
    # The block form closes the output file even if an error is raised.
    File.open("patent_output.txt", "w") do |outfile|
      Find.find(@root_directory) do |path|
        if File.directory?(path)
          # determine if this is a directory we don't like...
          if @excluded_directory_names.include?(File.basename(path).downcase)
            Find.prune # don't look in this directory
          else
            outfile << "// DIRECTORY: " << path << "\n"
          end
        else # we have a file
          filetype = File.basename(path).downcase.split(".").last
          next unless @included_file_types.include?(filetype)

          outfile << "// FILE: " << path << "\n"
          # Copy the file line by line; File.foreach closes it when done.
          File.foreach(path) { |line| outfile << line }
          outfile << "\n"
          total_files += 1
        end
      end
    end
    puts "total files = #{total_files}"
  end
end

PatentBuilder.new.recurse_tree

The configuration file (named patent.cfg) is pretty simple. The script opens it by relative path, so it needs to sit in the directory you run the Ruby file (patent.rb) from, which is normally the same directory as the script:

::Root Directory
c:\some\directory\path
::Included File Types (extensions)
cs,aspx,vb,asmx,css,xml,resx,ascx,master,sitemap,sql,
::Excluded Directory Names
Tests,Documentation,.svn,

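You may notice that each of my lists is comma-separated and that the final item in each list is followed by a trailing comma. Without that trailing comma, the final item in the list never matched anything. The token is not really lost: IO.readlines keeps the newline at the end of each line, so the last item comes through as "sql\n" rather than "sql". Calling strip on the line before splitting it makes the trailing comma unnecessary.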
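The reason the last item misbehaves can be seen in a small sketch. IO.readlines returns each line with its newline still attached, so splitting an unstripped line leaves that newline on the final token. The helper name parse_list is hypothetical and only serves the illustration.

```ruby
# Hypothetical helper: turn one config line into a list of lowercase tokens.
def parse_list(line)
  # strip removes the trailing newline; reject drops any empty tokens.
  line.strip.downcase.split(",").map(&:strip).reject(&:empty?)
end

raw = "cs,aspx,sql\n"            # a line as IO.readlines returns it, no trailing comma
p raw.downcase.split(",")        # the last element still carries the newline
p parse_list(raw)                # the last element is a clean "sql"
p parse_list("cs,aspx,sql,\n")   # a trailing comma is harmless
```

With a helper along these lines, the config lists work with or without the trailing comma.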
