Extracting SSRS Report RDL (XML) from the ReportServer database

Overview

11/30/2011 – I have created a CodePlex project based on this blog post with some scripts in response to the questions I got in this posts comments. The project is called “SQL Server Reporting Services (SSRS) Catalog Queries”. I have a separate blog post about the project here.

12/17/2011 – I have updated the CodePlex project project and included the SSIS project that extracts catalog contents directly to the file system! w00t!

06/01/2011 – REPOSTED TO FIX FORMATTING ISSUES. LET ME KNOW IF YOU CAN’T READ IT.

Before I even start this entry I should state that the methods I document here, while instructive and useful, are frowned upon by Microsoft, and they are not supported. You should consider getting to the SQL Server Reporting Services (SSRS) item contents via the Web Services provided with SSRS rather than extracting them directly from the database as I describe here. As SSRS is versioned the structure and storage methods used to manage the Report Server items may change, and the methods described here may no longer work. You will have a better chance (although no guarantee) that your code will continue to work against new versions of SSRS if you use the web services instead. Ok, the disclaimer is out of the way.

There have been numerous times that I have wanted to extract the XML of a report (RDL) or the definition of a Shared Data Source (RDS) from the ReportServer database. One of the first times I had this need was as part of a disaster recovery. A client lost the source files for their reports, as well as the reporting services installation. The quickest way to get their report definitions back was to just extract the XML from the ReportServer.dbo.Catalog table’s “Content” column. For a current project I’m working on, I want to determine the database objects (tables, stored procedures, views, functions) that are used by the Datasets in each report. Other times, I have just been curious and wanted a quick way (from within SQL) to view the item definitions.

This blog post, as well as a few to follow document the methods I have developed to do just this.

The ReportServer Database:

SQL Server Reporting Services (SSRS) keeps the various Reports, Data Sources, Images and other resources it provides inside a SQL Server Database. This database may or may not be on the same computer as the Reporting Services instance itself. Also, while the database is named “ReportServer” by default, it could be named something else, so you will need to determine for your installation what instance that database is on, as well as what it’s name is. The scripts below assume the default database name of “ReportServer”.

Inside the ReportServer database there is a Catalog table (ReportServer.dbo.Catalog). The Catalog table stores the items (Folders, Reports, Resources, Linked Reports, Data Sources, Report Models, Report Parts and Shared Datasets) that are available via the SSRS instance. There are numerous columns in the Catalog table, and I won’t describe them all here. However, the ones of interest to this article include:

Column Name Data Type Description
ItemID uniqueidentifier The Primary Key. A system generated GUID.
Path nvarchar(425) The root (“Home”) based path to the item (including its name) in the virtual folder structure provided by the site.
Name nvarchar(425) The name of the report item without the path.
Type int Identifies the type of the item being stored:1 = Folder
2 = Report
3 = Resources
4 = Linked Report
5 = Data Source
6 = Report Model
7 = Report Part (SQL 2008 R2, unverified)
8 = Shared Dataset (SQL 2008 R2)
Content image The actual bytes of the report item. If this is a folder or Linked Report, the content is null, otherwise it is stores the binary image of the report item.

Armed with that knowledge you can see that a query to retrieve the binary content for all reports would would like:

SELECT
   ItemID
  ,Name
  ,Type
  ,Content
FROM ReportServer.dbo.Catalog

And the result might look like:

ItemID Name                    Type  Content
------ ----------------------- ----- -------------------------
909... AdventureWorks2008R2        5 0xEFBBBF3C3F786D6C2076...
8CC... AdventureWorksDW2008R2      5 0xEFBBBF3C3F786D6C2076...
A20... AdventureWorksLT2008R2      5 0xEFBBBF3C3F786D6C2076...
7BE... Northwind                   5 0xEFBBBF3C3F786D6C2076...
630... Products                    2 0x3C3F786D6C2076657273...
DF6... AnnualSalesByMonth          2 0x3C3F786D6C2076657273...
06F... SalespersonReport           2 0x3C3F786D6C2076657273...
D9E... ParameterTest01             2 0x3C3F786D6C2076657273...

The UTF-8 Byte Order Mark (BOM):

Unicode strings can have a special character called the “Byte Order Mark” at the beginning to indicate the byte order (endianess, big endian, little endian) and encoding (UTF-8, UTF-16, UTF-32) that was used to create the string. SSRS stores the XML items (Report RDL and Data Source definitions) using the UTF-8 encoding. it just so happens that UTF-8 Unicode strings do not NEED to have a BOM and in fact ideally would not have one. However, you will see some report items in your SSRS that begin with a specific sequence of bytes (0xEFBBBF). That sequence is the UTF-8 Byte Order Mark. It’s character representation is the following three characters, “”. While it is supported, it can cause problems with the conversion to XML, so it may be worthwhile to remove it for Reports (Type=2), Data Sources (Type=5), Report Parts (Type=7), and Shared Datasets (Type=8). You can do that by testing to see if the first three bytes are 0xEFBBBF and then trimming them if they are. Before we do that though, we have to get rid of that pesky image data type used by the Content field.

Casting Out Image:

The ReportServer.dbo.Catalog.Content column uses the deprecated image data type. Unfortunately we can’t perform string manipulations or direct conversions to varchar or xml from the image data type. To fix the problem, we need to first convert (or cast) the image to a varbinary(max), and then perform any string manipulations or additional conversions on the varbinary value.

Getting the Content as Varbinary, Varchar and XML:

Ok, so putting all the stuff from above together, the following code will extract the contents of all Reports, Data Sources, Report Parts and Shared Datasets (Types 2,5,7, & 8) from the database. I’ve broken the query down into a two CTEs and an outer query to isolate the key parts. you should be able to copy the query below, and run it on the same server as your ReportServer database to view the results:

--The first CTE gets the content as a varbinary(max)
--as well as the other important columns for all reports,
--data sources and shared datasets.
WITH ItemContentBinaries AS
(
  SELECT
     ItemID,Name,[Type]
    ,CASE Type
       WHEN 2 THEN 'Report'
       WHEN 5 THEN 'Data Source'
       WHEN 7 THEN 'Report Part'
       WHEN 8 THEN 'Shared Dataset'
       ELSE 'Other'
     END AS TypeDescription
    ,CONVERT(varbinary(max),Content) AS Content
  FROM ReportServer.dbo.Catalog
  WHERE Type IN (2,5,7,8)
),
--The second CTE strips off the BOM if it exists...
ItemContentNoBOM AS
(
  SELECT
     ItemID,Name,[Type],TypeDescription
    ,CASE
       WHEN LEFT(Content,3) = 0xEFBBBF
         THEN CONVERT(varbinary(max),SUBSTRING(Content,4,LEN(Content)))
       ELSE
         Content
     END AS Content
  FROM ItemContentBinaries
)
--The outer query gets the content in its varbinary, varchar and xml representations...
SELECT
   ItemID,Name,[Type],TypeDescription
  ,Content --varbinary
  ,CONVERT(varchar(max),Content) AS ContentVarchar --varchar
  ,CONVERT(xml,Content) AS ContentXML --xml
FROM ItemContentNoBOM

Here is a screen shot of my results so you can see that the XML type is recognized:

Catalog Content as XML Query Results

Querying the RDL Contents:

Ok, so big deal. We did some trimming and casting and we got XML back. Where this really starts to become useful is when you start using SQL Server’s built in XML data type to parse the XML, and extract key pieces. For example, the following query builds on the query from above and extracts the actual commands used for the each dataset in each Report or Shared Dataset. Cool!

--The first CTE gets the content as a varbinary(max)
--as well as the other important columns for all reports,
--data sources and shared datasets.
WITH ItemContentBinaries AS
(
  SELECT
     ItemID,Name,[Type]
    ,CASE Type
       WHEN 2 THEN 'Report'
       WHEN 5 THEN 'Data Source'
       WHEN 7 THEN 'Report Part'
       WHEN 8 THEN 'Shared Dataset'
       ELSE 'Other'
     END AS TypeDescription
    ,CONVERT(varbinary(max),Content) AS Content
  FROM ReportServer.dbo.Catalog
  WHERE Type IN (2,5,7,8)
),
--The second CTE strips off the BOM if it exists...
ItemContentNoBOM AS
(
  SELECT
     ItemID,Name,[Type],TypeDescription
    ,CASE
       WHEN LEFT(Content,3) = 0xEFBBBF
         THEN CONVERT(varbinary(max),SUBSTRING(Content,4,LEN(Content)))
       ELSE
         Content
     END AS Content
  FROM ItemContentBinaries
)
--The old outer query is now a CTE to get the content in its xml form only...
,ItemContentXML AS
(
  SELECT
     ItemID,Name,[Type],TypeDescription
    ,CONVERT(xml,Content) AS ContentXML
 FROM ItemContentNoBOM
)
--now use the XML data type to extract the queries, and their command types and text....
SELECT
     ItemID,Name,[Type],TypeDescription,ContentXML
    ,ISNULL(Query.value('(./*:CommandType/text())[1]','nvarchar(1024)'),'Query') AS CommandType
    ,Query.value('(./*:CommandText/text())[1]','nvarchar(max)') AS CommandText
FROM ItemContentXML
--Get all the Query elements (The "*:" ignores any xml namespaces)
CROSS APPLY ItemContentXML.ContentXML.nodes('//*:Query') Queries(Query)

Here is a screenshot of the results. Again, this would be more impressive on your own machine with your own reports and queries being shown:

Querying the Content XML

I know that the text in the image the above is really small, but you may be able to see that the CommandText column contains the actual SELECT statements used in each report or shared dataset.

This is just the tip of the ice berg though. Once we have the XML, we can do a lot of different things with it. I hope in the future to post some additional entries that outline creating a set of functions that could be used to parse your report xml in interesting ways. till then, enjoy!