Creating a comma delimited list of column values without a SQL cursor at SQLBook.com
In the article how to avoid cursors we saw some methods for avoiding cursors. Another common scenario many new SQL developers use a cursor for is building a comma delimited list of column values. There may be many reasons you want to build a comma delimited list of values from a table column such as creating a list of values to use in an strong>IN clause in a dynamic SQL query.
Lets say we have the following rows in a Customer Table (the number of columns and rows in the table is abbreviated for this example):
We want to join the values in the CustomerID column into a comma delimited list which would result in:
3, 4, 5
One way to solve this would be to use a SQL Cursor and select the values from the column and then loop through the resultset one row at a time and append the value from the row. However this would be an inefficient solution:
DECLARE @List varchar(200) DECLARE @CustomerID int SET @List = '' DECLARE CustIDCursor CURSOR FOR SELECT CustomerID FROM Customer OPEN CustIDCursor FETCH NEXT FROM CustIDCursor INTO @CustomerID WHILE @@FETCH_STATUS = 0 BEGIN IF @List = '' BEGIN SET @List = Cast(@CustomerID As varchar(5)) END ELSE BEGIN SET @List = @List + ',' + Cast(@CustomerID As varchar(5)) END FETCH NEXT FROM CustIDCursor INTO @CustomerID END CLOSE CustIDCursor DEALLOCATE CustIDCursor SELECT @List As 'List'
Replacing the cursor with a SET based approach
Most of the time a set based solution is preferred over a cursor. A set based solution operates on multiple rows. Relational database systems such as SQL Server or Oracle use special algorithms which optimise the processing of multiple rows (resultsets). They are not so efficient at repeating a query one row at a time - row by row operations (such as when a cursor is implemented). See the box below for more of an explanation on this.
We can achieve our desired result by using a simple SELECT statement in combination with the COALESCE function. In the example below the @List variable has a value appended to it for each row in the resultset. The coalesce function takes a list of input parameters and returns the first non null paramter that is passed to it.
For the first row in the resultset the value of @List will be NULL and so the coalesce function returns an empty string (the first NON NULL parameter passed to it). The CustomerID value of this row is assigned to the @List variable. On subsequent rows in the resultset the @List variable will not be NULL and so the coalese function returns the value of @List with a comma contatenated (appended) to the end of the @List variable. The CustomerID is then appended after the comma.
DECLARE @List varchar(2000) SELECT @List = COALESCE(@List + ',', '') + Cast(CustomerID As varchar(5)) FROM Customer SELECT @List As 'List'
If you had a larger number of items in the resulting comma delimited list this solution may be further improved by removing the COALESCE function. This is because in the above example the conditional function COALESE must be called for each row in the resultset. In this instance removing this conditional aspect from each row will speed up the query. We can easily achieve the result the COALESE function was gaining for us (the correct placement of commas in the list) by simply appending the comma after each CustomerID and then knocking the last comma off when the list is complete. We use the SUBSTRING function (SQL Server) or SUBSTR function (Oracle / MySQL) to knock the last comma off.
DECLARE @List varchar(2000) SET @List = '' SELECT @List = @List + Cast(CustomerID As varchar(5)) + ',' FROM Customer SET @List = SUBSTRING(@List, 1, Len(@List) - 1) SELECT @List As 'List'
Why are set based queries more efficient than cursors
To understand this we must understand how data is stored in the database. Data rows of a table are stored in data pages. When we add indexes to our database the data is stored sequentially (either the data in the data pages is physically sequenced by column values or column values are sequenced and stored in index pages which point to the data pages). Many Relational Database Management Systems (RDBMS) such as SQL Server and Oracle are designed to efficiently access data by sequentially reading data and index pages.
When using a set based query approach the database management system frequently does not need to reread a data page for the next row as it will be on the same data page as the last row.
When using a row by row operation (such as a cursor) the RDBMS may have to read the same page multiple times (once for each row). This can slow the performance of your queries.
In this article we have demonstrated how to build a comma delimited list of column values using a set based SQL query. Cursors can be used to achieve the same result but are not as efficient and also require more complicated and lengthy coding. We have also given an insight into why set based queries are often more efficient than row by row (cursor) type queries.