Tuesday, December 19, 2006

QuickSort

I was reading an article on sorting algorithms today and thought, "Hrm, I wonder how well CF compares to these others when running a hand written QuickSort."

The answer, not very good - however, the good news is that it isn't really important considering the built in array sorting method is VERY fast, even for large datasets (10,000 elements).

I gave up waiting on CF to finish the QuickSort for 10,000 elements but for just 1,000 it took over 4 seconds on average - with the built in sort the time was not-measurable.

Here is the QuickSort Algorithim I used:


<cffunction name="partition" returntype="struct">
<cfargument name="a" type="array" />
<cfargument name="first" type="numeric" />
<cfargument name="last" type="numeric" />

<cfset var pivot = arguments.a[arguments.first] />
<cfset var lastS1 = arguments.first />
<cfset var firstUnknown = arguments.first+1 />
<cfset var s = structNew() />

<cfloop condition="firstUnknown LTE arguments.last">
<cfif a[firstUnknown] LT pivot>
<cfset lastS1 = lastS1 + 1 />
<cfset arraySwap(arguments.a,firstUnknown,lastS1) />
</cfif>
<cfset firstUnknown = firstUnknown + 1 />
</cfloop>

<cfset arraySwap(arguments.a,arguments.first,lastS1) />

<cfset s.s1 = lastS1 />
<cfset s.a = arguments.a />
<cfreturn s />

</cffunction>

<cffunction name="quicksorter" returntype="array">
<cfargument name="a" type="array">
<cfargument name="first" type="numeric">
<cfargument name="last" type="numeric">

<cfset var pivotInfo = StructNew() />

<cfif arguments.first LT arguments.last>

<cfset pivotInfo = partition(a,arguments.first,arguments.last) />
<cfset arguments.a = pivotInfo.a />
<cfset arguments.a = quicksorter(arguments.a,arguments.first,pivotInfo.s1-1) />
<cfset arguments.a = quicksorter(arguments.a,pivotInfo.s1+1,arguments.last) />
</cfif>

<cfreturn a />
</cffunction>

<cffunction name="quicksort" returntype="array">
<cfargument name="a" type="array" />
<cfargument name="size" type="numeric" />

<cfreturn quicksorter(arguments.a,1,arguments.size) />
</cffunction>


You can call it like so. This can be used to sort smaller portions of the array if you want.

<cfset l1 = quicksort(l1,arrayLen(l1)) />


The linked article has a couple of other sorting algorithms you can look at and has them implemented in many languages (just not CF). CF, however, is best not included in the conversation with languages like C++ which completes the quicksort on 10,000 elements in 0.0038 seconds.

Friday, December 01, 2006

T-SQL Query Paging

Here is a tip to help you get pages of results back from SQL Server. For instance, let's say you were trying to display a listing of products with 20 products per page (and you have a couple hundred products).

On option (since this is a CF Blog) is to use the cfoutput startrow attribute. However, that solution isn't always viable because the size of your dataset may be too large. Instead, what you need is to just pull the correct 20 results back from the table. Sadly, there isn't an "OFFSET" function in T-SQL so we need to rephrase the problem a little in order to figure out the solution.

For example, let's say we want the first page, so the top 20 results. That is pretty easy, just use SELECT TOP 20 .... but what about the second or subsequent pages? How do you get the 21-40 items? It's easier than you might suspect. What you are actually trying to get is the bottom y of the top x results. To look at that another way you want the top y of the top x results ordered backwards.


SELECT * FROM (
SELECT TOP y * FROM (
SELECT TOP x * FROM sometable
ORDER BY somefield ASC
)
ORDER BY somefield DESC)
ORDER BY somefield


The innermost query, SELECT TOP x, grabs the first x number of rows, the second query SELECT TOP y, gets the last y of x rows, and the outermost query, SELECT * puts the results in the right order.

However, there is a caveat here, and is only a problem when you get to the last page of results. To keep the example simple lets say there are 26 products. If you use this technique as described your second page will show 14 of the products that were on the first page.

We need some new variables.
Let
y1 = max number of items to show on a page (20)
y2 = actual number of items to show on this page (calculated later)
p = page number (2)
x = subset of items we are getting the bottom y2 of = (y1*p) or (20*2)
t = total number of items (26)

We need to figure out a way to calculate y2 when p > 1. y2=y1-x-t

if(p==1){
y2=y1;
} else {
y2=y1-x-t
}

Lets replace those variables with values and see how it works out; starting with page
2.


if(p==1){
y2=20;
} else {
y2=20-(20*2)-26;
}


As you can see on page 2 y2 = 20 -(40-26) = 20-14 = 6 which is exactly what we want to show on page 2.