Using SQL Cross Join - the report writers secret weapon
The SQL CROSS JOIN is an often overlooked part of SQL but it can prove invaluable if you are aware of which situations it can be useful in.
A Cross Join (also sometimes known as a Cartesian Join) results in every row of one table being joined to every row of another table. The CROSS JOIN does not establish a relationship between the two tables like an INNER JOIN or OUTER JOIN would by specifying which columns to join on.
The example below creates two basic tables and performs a cross join on them:
CREATE TABLE Customer ( CustomerID int, CustomerName varchar(50) ) CREATE TABLE Product ( ProductID int, ProductName varchar(50) ) INSERT INTO Customer VALUES (1, 'Joe Bloggs') INSERT INTO Customer VALUES (2, 'Bob Marley') INSERT INTO Customer VALUES (3, 'Jimi Hendrix') INSERT INTO Product VALUES (1, 'MP3 Player') INSERT INTO Product VALUES (2, 'LCD TV') SELECT * FROM Customer CROSS JOIN Product ======= Results: ======= CustomerID CustomerName ProductID ProductName ----------- --------------- ------------ ---------------------- 1 Joe Bloggs 1 MP3 Player 1 Joe Bloggs 2 LCD TV 2 Bob Marley 1 MP3 Player 2 Bob Marley 2 LCD TV 3 Jimi Hendrix 1 MP3 Player 3 Jimi Hendrix 2 LCD TV
You can see from the above example that a cross join gets a record from the first table and then creates a new row for every row in the 2nd table. It then does the same for the next record in the first table etc.
When there is no WHERE filters applied to the CROSS JOIN the number of records returned will be the number of records in Table1 x the number of records in Table2. i.e. in the above example this is 3 x 2 = 6. With even moderate sized tables of a few thousand records you can see this can quickly produce very large resultsets that take large amounts of memory and can greatly reduce the performance of the application. e.g. 2000 customers CROSS JOIN 2000 products with no WHERE clause = 4 million records.
Using the WHERE Clause with the CROSS JOIN
We can use the WHERE clause with the CROSS JOIN to filter the records returned:
SELECT * FROM Customer CROSS JOIN Product WHERE Customer.CustomerID = 2 ======= Results: ======= CustomerID CustomerName ProductID ProductName ----------- --------------- ------------ ---------------------- 2 Bob Marley 1 MP3 Player 2 Bob Marley 2 LCD TV
Many text books cite that CROSS JOINS should be avoided due to the very large recordsets they create and that the only reason for using CROSS JOIN is to create some dummy data that can be used for testing your queries with. There are however, much more useful and powerful reasons for using CROSS JOIN.
Using CROSS JOIN as part of a report query
For an example of how we can utilise the CROSS JOIN when writing a query to generate a report let us use the example that we want to generate a report of how many items were sold for each customer and each product. A common attempt to try this would be:
-- assumes a 'Sales' table with the following columns: -- CustomerID, ProductID, Quantity SELECT C.CustomerName, P.ProductName, SUM(S.Quantity) As NumberSold FROM Customer C INNER JOIN Sales S ON C.CustomerID = S.CustomerID INNER JOIN Product P ON P.ProductID = S.ProductID GROUP BY CustomerName, ProductName ORDER BY CustomerName, ProductName
Remember we want evidence of each customer and product in our resultset even if there have been no sales for that customer / product combination. The problem with the above query is that if a customer or product has no sales associated with it there would be no record of the customer or product in the resultset. i.e. If NumberSold is 0 then there is no evidence of the customer or product existing.
We can solve this problem by using a CROSS JOIN in combination with another sub query. If we use a CROSS JOIN query as the basis of our report we can guarantee that we will get every Customer / Product combination in our resultset:
SELECT C.CustomerName, P.ProductName FROM Customer C CROSS JOIN Product P
We now need to add the NumberSold column onto our resultset and we can do this by using a derived table and joining it to our original query by using a LEFT OUTER JOIN:
SELECT C.CustomerName, P.ProductName, S.NumberSold FROM Customer C CROSS JOIN Product P LEFT OUTER JOIN (SELECT CustomerID, ProductID, SUM(Quantity) As NumberSold FROM Sales GROUP BY CustomerID, ProductID) S ON C.CustomerID = S.CustomerID AND P.ProductID = S.ProductID ORDER BY C.CustomerName, P.ProductName ======== Results: ======== CustomerName ProductName NumberSold ------------- --------------- ----------- Bob Marley LCD TV 1 Bob Marley MP3 Player NULL Jimi Hendrix LCD TV 5 Jimi Hendrix MP3 Player NULL Joe Bloggs LCD TV NULL Joe Bloggs MP3 Player 3
Great! This query does what we wanted but if the NumberSold was 0 then the LEFT JOIN gives us NULL in the NumberSold column. We can convert NULL to 0 in our application or in SQL Server we can use the ISNULL() function to convert NULL to 0. This is demonstrated below where we amend the first line of our SELECT statement.
SELECT C.CustomerName, P.ProductName, IsNull(S.NumberSold, 0) As NumberSold
This join pattern of: Table1 CROSS JOIN Table2 LEFT OUTER JOIN DerivedTable3 is the secret weapon of the report writer and can be used to provide an efficient, elegant solution to the 'missing rows' problem.
Adding row filters to our report
WHERE clauses can be applied in both the outer query (the query with the CROSS JOIN) and the inner query (the aggregating part) to filter the data.
For instance to filter the data by Customer or Product we would add a WHERE clause to the outer query:
WHERE C.CustomerName LIKE 'J%'
To filter data by Sales characteristics such as DateOfSale or SalesValue we would add a WHERE clause to the inner query:
-- assumes Sales table has a column called SalesValue SELECT C.CustomerName, P.ProductName, IsNull(S.NumberSold, 0) As NumberSold FROM Customer C CROSS JOIN Product P LEFT OUTER JOIN (SELECT CustomerID, ProductID, SUM(Quantity) As NumberSold FROM Sales WHERE SalesValue > 100.00 GROUP BY CustomerID, ProductID) S ON C.CustomerID = S.CustomerID AND P.ProductID = S.ProductID WHERE C.CustomerName LIKE 'J%' ORDER BY C.CustomerName, P.ProductName
In this article we have seen how the SQL CROSS JOIN works by combining each row in one table with each row in another table. This can produce very large resultsets that can bring a database to it knees. To keep the size of the resultset down it is advisable to:
- Specify which fields to select from each table in the cross join rather than using SELECT *
- Use a WHERE clause to reduce the number of rows returned
The CROSS JOIN is a powerful tool when used for creating reports that require each row from one table to be displayed for one or more rows of another table. The CROSS JOIN solution is:
Table1 CROSS JOIN Table2 LEFT JOIN DerivedTable3 (where the DerivedTable3 is an aggregating sub query.)
A WHERE Clause can be added to both the outer query and the inner query to filter the data according to the needs of the report.
We hope you find this useful in helping you develop your reports.