Extracting SSRS Report RDL (XML) from the ReportServer database

Overview

11/30/2011 – I have created a CodePlex project based on this blog post with some scripts in response to the questions I got in this posts comments. The project is called “SQL Server Reporting Services (SSRS) Catalog Queries”. I have a separate blog post about the project here.

12/17/2011 – I have updated the CodePlex project project and included the SSIS project that extracts catalog contents directly to the file system! w00t!

06/01/2011 – REPOSTED TO FIX FORMATTING ISSUES. LET ME KNOW IF YOU CAN’T READ IT.

Before I even start this entry I should state that the methods I document here, while instructive and useful, are frowned upon by Microsoft, and they are not supported. You should consider getting to the SQL Server Reporting Services (SSRS) item contents via the Web Services provided with SSRS rather than extracting them directly from the database as I describe here. As SSRS is versioned the structure and storage methods used to manage the Report Server items may change, and the methods described here may no longer work. You will have a better chance (although no guarantee) that your code will continue to work against new versions of SSRS if you use the web services instead. Ok, the disclaimer is out of the way.

There have been numerous times that I have wanted to extract the XML of a report (RDL) or the definition of a Shared Data Source (RDS) from the ReportServer database. One of the first times I had this need was as part of a disaster recovery. A client lost the source files for their reports, as well as the reporting services installation. The quickest way to get their report definitions back was to just extract the XML from the ReportServer.dbo.Catalog table’s “Content” column. For a current project I’m working on, I want to determine the database objects (tables, stored procedures, views, functions) that are used by the Datasets in each report. Other times, I have just been curious and wanted a quick way (from within SQL) to view the item definitions.

This blog post, as well as a few to follow document the methods I have developed to do just this.

The ReportServer Database:

SQL Server Reporting Services (SSRS) keeps the various Reports, Data Sources, Images and other resources it provides inside a SQL Server Database. This database may or may not be on the same computer as the Reporting Services instance itself. Also, while the database is named “ReportServer” by default, it could be named something else, so you will need to determine for your installation what instance that database is on, as well as what it’s name is. The scripts below assume the default database name of “ReportServer”.

Inside the ReportServer database there is a Catalog table (ReportServer.dbo.Catalog). The Catalog table stores the items (Folders, Reports, Resources, Linked Reports, Data Sources, Report Models, Report Parts and Shared Datasets) that are available via the SSRS instance. There are numerous columns in the Catalog table, and I won’t describe them all here. However, the ones of interest to this article include:

Column Name Data Type Description
ItemID uniqueidentifier The Primary Key. A system generated GUID.
Path nvarchar(425) The root (“Home”) based path to the item (including its name) in the virtual folder structure provided by the site.
Name nvarchar(425) The name of the report item without the path.
Type int Identifies the type of the item being stored:1 = Folder
2 = Report
3 = Resources
4 = Linked Report
5 = Data Source
6 = Report Model
7 = Report Part (SQL 2008 R2, unverified)
8 = Shared Dataset (SQL 2008 R2)
Content image The actual bytes of the report item. If this is a folder or Linked Report, the content is null, otherwise it is stores the binary image of the report item.

Armed with that knowledge you can see that a query to retrieve the binary content for all reports would would like:

SELECT
   ItemID
  ,Name
  ,Type
  ,Content
FROM ReportServer.dbo.Catalog

And the result might look like:

ItemID Name                    Type  Content
------ ----------------------- ----- -------------------------
909... AdventureWorks2008R2        5 0xEFBBBF3C3F786D6C2076...
8CC... AdventureWorksDW2008R2      5 0xEFBBBF3C3F786D6C2076...
A20... AdventureWorksLT2008R2      5 0xEFBBBF3C3F786D6C2076...
7BE... Northwind                   5 0xEFBBBF3C3F786D6C2076...
630... Products                    2 0x3C3F786D6C2076657273...
DF6... AnnualSalesByMonth          2 0x3C3F786D6C2076657273...
06F... SalespersonReport           2 0x3C3F786D6C2076657273...
D9E... ParameterTest01             2 0x3C3F786D6C2076657273...

The UTF-8 Byte Order Mark (BOM):

Unicode strings can have a special character called the “Byte Order Mark” at the beginning to indicate the byte order (endianess, big endian, little endian) and encoding (UTF-8, UTF-16, UTF-32) that was used to create the string. SSRS stores the XML items (Report RDL and Data Source definitions) using the UTF-8 encoding. it just so happens that UTF-8 Unicode strings do not NEED to have a BOM and in fact ideally would not have one. However, you will see some report items in your SSRS that begin with a specific sequence of bytes (0xEFBBBF). That sequence is the UTF-8 Byte Order Mark. It’s character representation is the following three characters, “”. While it is supported, it can cause problems with the conversion to XML, so it may be worthwhile to remove it for Reports (Type=2), Data Sources (Type=5), Report Parts (Type=7), and Shared Datasets (Type=8). You can do that by testing to see if the first three bytes are 0xEFBBBF and then trimming them if they are. Before we do that though, we have to get rid of that pesky image data type used by the Content field.

Casting Out Image:

The ReportServer.dbo.Catalog.Content column uses the deprecated image data type. Unfortunately we can’t perform string manipulations or direct conversions to varchar or xml from the image data type. To fix the problem, we need to first convert (or cast) the image to a varbinary(max), and then perform any string manipulations or additional conversions on the varbinary value.

Getting the Content as Varbinary, Varchar and XML:

Ok, so putting all the stuff from above together, the following code will extract the contents of all Reports, Data Sources, Report Parts and Shared Datasets (Types 2,5,7, & 8) from the database. I’ve broken the query down into a two CTEs and an outer query to isolate the key parts. you should be able to copy the query below, and run it on the same server as your ReportServer database to view the results:

--The first CTE gets the content as a varbinary(max)
--as well as the other important columns for all reports,
--data sources and shared datasets.
WITH ItemContentBinaries AS
(
  SELECT
     ItemID,Name,[Type]
    ,CASE Type
       WHEN 2 THEN 'Report'
       WHEN 5 THEN 'Data Source'
       WHEN 7 THEN 'Report Part'
       WHEN 8 THEN 'Shared Dataset'
       ELSE 'Other'
     END AS TypeDescription
    ,CONVERT(varbinary(max),Content) AS Content
  FROM ReportServer.dbo.Catalog
  WHERE Type IN (2,5,7,8)
),
--The second CTE strips off the BOM if it exists...
ItemContentNoBOM AS
(
  SELECT
     ItemID,Name,[Type],TypeDescription
    ,CASE
       WHEN LEFT(Content,3) = 0xEFBBBF
         THEN CONVERT(varbinary(max),SUBSTRING(Content,4,LEN(Content)))
       ELSE
         Content
     END AS Content
  FROM ItemContentBinaries
)
--The outer query gets the content in its varbinary, varchar and xml representations...
SELECT
   ItemID,Name,[Type],TypeDescription
  ,Content --varbinary
  ,CONVERT(varchar(max),Content) AS ContentVarchar --varchar
  ,CONVERT(xml,Content) AS ContentXML --xml
FROM ItemContentNoBOM

Here is a screen shot of my results so you can see that the XML type is recognized:

Catalog Content as XML Query Results

Querying the RDL Contents:

Ok, so big deal. We did some trimming and casting and we got XML back. Where this really starts to become useful is when you start using SQL Server’s built in XML data type to parse the XML, and extract key pieces. For example, the following query builds on the query from above and extracts the actual commands used for the each dataset in each Report or Shared Dataset. Cool!

--The first CTE gets the content as a varbinary(max)
--as well as the other important columns for all reports,
--data sources and shared datasets.
WITH ItemContentBinaries AS
(
  SELECT
     ItemID,Name,[Type]
    ,CASE Type
       WHEN 2 THEN 'Report'
       WHEN 5 THEN 'Data Source'
       WHEN 7 THEN 'Report Part'
       WHEN 8 THEN 'Shared Dataset'
       ELSE 'Other'
     END AS TypeDescription
    ,CONVERT(varbinary(max),Content) AS Content
  FROM ReportServer.dbo.Catalog
  WHERE Type IN (2,5,7,8)
),
--The second CTE strips off the BOM if it exists...
ItemContentNoBOM AS
(
  SELECT
     ItemID,Name,[Type],TypeDescription
    ,CASE
       WHEN LEFT(Content,3) = 0xEFBBBF
         THEN CONVERT(varbinary(max),SUBSTRING(Content,4,LEN(Content)))
       ELSE
         Content
     END AS Content
  FROM ItemContentBinaries
)
--The old outer query is now a CTE to get the content in its xml form only...
,ItemContentXML AS
(
  SELECT
     ItemID,Name,[Type],TypeDescription
    ,CONVERT(xml,Content) AS ContentXML
 FROM ItemContentNoBOM
)
--now use the XML data type to extract the queries, and their command types and text....
SELECT
     ItemID,Name,[Type],TypeDescription,ContentXML
    ,ISNULL(Query.value('(./*:CommandType/text())[1]','nvarchar(1024)'),'Query') AS CommandType
    ,Query.value('(./*:CommandText/text())[1]','nvarchar(max)') AS CommandText
FROM ItemContentXML
--Get all the Query elements (The "*:" ignores any xml namespaces)
CROSS APPLY ItemContentXML.ContentXML.nodes('//*:Query') Queries(Query)

Here is a screenshot of the results. Again, this would be more impressive on your own machine with your own reports and queries being shown:

Querying the Content XML

I know that the text in the image the above is really small, but you may be able to see that the CommandText column contains the actual SELECT statements used in each report or shared dataset.

This is just the tip of the ice berg though. Once we have the XML, we can do a lot of different things with it. I hope in the future to post some additional entries that outline creating a set of functions that could be used to parse your report xml in interesting ways. till then, enjoy!

45 Responses

  • This blogs contains some of rare and crucial information about SSRS, which is good.
    With the continuation to this I want to know if some one wants to eliminate BOM characters from XML produce by SSRS 2005, then what it can be.
    I have aware of RSReportServer.Config changes configuration, but I can’t find those XML attribute in this configuration file.
    I can found this file in my –
    C:\Program Files\Microsoft SQL Server\MSSQL.3\Reporting Services\ReportServer
    Can you bring this issue to some solution?

    Thanks

  • This is awesome…thanks for sharing.

  • @Anonymous, I'm not sure what you are asking. In the source code provided in my post the second CTE demonstrates how to test to see if the XML field starts with a BOM, and if so, how to strip if off. Look at the CASE statement in the CTE named "ItemContentNoBOM"

  • Awesome query thanks. Exactly what I was after.
    The same request – find any reports that access a certain SQL field.

    What's source control – if only!!!!

  • Awesome stuff, love it. Is there a way to find out the version of the report? I want to display it at the bottom of the report so if a client calles in I can ensure they have the correct version. So, is there a way to dynamicaly pull the filters into the report header, i want to always show what the user has selected but dont want to hard code each one every time.

  • @Kevin, I haven't found anything that works as a report "version". There isn't anything in the standard RDL markup. I have considered adding my own custom elements to store the version, but the designers would't support that and would require that your report authors be disciplined about entereing the current version in the RDL manually.

    As for the filters, you wouldn't want to get them from the database using the queries I describe here. Thats because the database has the static definition of the report, and wouldnt provide the runtime values provided for reports.

    You might look into using the Report.Parameters collection in a custom VB Function in the Code of the report.

  • yeah, thought about having a table that contained the current version just didnt knoe it that was stored somewhere and like you said getting people to update as they make changes never works. I did try using the report parameters and it works well, just wondered if there was a report part that i could add that would just look through then filters and add a list of what eas selected. Enjoy your postings, keep it up!!

  • Awesome. Just lost our reporting server and got all the reports back. Thanks.

  • HI
    Thanks for the nice blog..

    Though i have one good question for you.

    I wanted to store the names of users woh had accessed the report. we have use the active directory for users.
    Is there any way that can be used to know the history of that report.

    We need this scenario because we need to decommision reports if that hasnt been used for last 3 months.

    Thanks in advance..
    harshal

  • @Harshal, you can get report execution information from the ReportServer.dbo.ExecutionLog table.

    Robert Bruckner has a great blog post on how to query that table here:

    http://tinyurl.com/285a8h2

  • This is fantastic and happens to be /exactly/ what I was looking for.

    Thank you!

  • Hi Great post, in my case I would like to know that tables that are used by a report?

  • @luispimi – I have code that can do that (kind of) and will try to blog about it as soon as possible, however I am in a crunch over the next couple of weeks finishing a course I am writing. However, I can tell you that the hardest part about what you want to do is parsing the SQL statement that the report's data set uses. If the report uses queries in its data sets not stored procedures, there is no easy way to determine the dependencies of the query. I have written logic to attempt to parse the SQL statement to determine the objects that are used in a SELECT statement, but my logic is rudimentary. Parsing a SQL statement is no trivial matter. Anyhow, I promise to try and do a follow up post in the next month or so. Sorry if thats too late to help!

  • @Naomi, thanks for the link. I hope in the next few weeks to post an entry on how you can find any object dependencies of a report based on my lame, but typically effective SQL parsing logic.

  • hi….:)

    "ReportServer.dbo.Catalog table" includes ".rdl file information"

    and…which table includes ".rds file information"??

    i can't find that table…plz help me

  • @Anonymous – .rds (Data Source) information is in the same table. They are the records with a Type column value of 5 (read the blog post again, and that may be clearer). You can run the query I show in the blog post under the heading of "Getting the Content as Varbinary, Varchar and Xml:" and you will see the shared data sources (type=5) listed in there as well.

    If you just ran the LAST query in my blog post, it is specifically looking for data set definitions and so the data sources aren't displayed in that final result.

    HOWEVER – Only SHARED data sources will be documented there. If a report has an EMBEDDED data source you will have to extract that from the reports .rdl.

    Lastly, remember that the information in this blog post is exploratory. If you REALLY want to get the reports data source configuration you should be web services from code, not querying the database. Querying the report server catalog database directly is not supported.

  • I want to extract SSRS Report RDS from the ReportServer database.

    which table has that information ???

    ex)

    Test
    b58a6183-b5fd-4a37-98a8-8ea84bfc255b

    SQL
    data source=xxx.xxx.xxx.xxx;initial catalog=TEST

  • @Anonymous. Again, it's there in the Catalog table. The data source definition will be returned as some XML, and one of the elements in the XML will be the connection string. Here is a modified version of the last query that extract's all connection strings it finds (included those embedded in report .rdl, not just shared data sources). I'm not sure how well you'll be able to read this query from the comments page, so drop me an email (bret@netconnex.com) if you can't read it. Here it is:

    –The first CTE gets the content as a varbinary(max)
    –as well as the other important columns for all reports,
    –data sources and shared datasets.
    WITH ItemContentBinaries AS
    (
    SELECT
    ItemID,Name,[Type]
    ,CASE Type
    WHEN 2 THEN 'Report'
    WHEN 5 THEN 'Data Source'
    WHEN 7 THEN 'Report Part'
    WHEN 8 THEN 'Shared Dataset'
    ELSE 'Other'
    END AS TypeDescription
    ,CONVERT(varbinary(max),Content) AS Content
    FROM ReportServer.dbo.Catalog
    WHERE Type IN (2,5,8)
    ),
    –The second CTE strips off the BOM if it exists…
    ItemContentNoBOM AS
    (
    SELECT
    ItemID,Name,[Type],TypeDescription
    ,CASE
    WHEN LEFT(Content,3) = 0xEFBBBF
    THEN CONVERT(varbinary(max),SUBSTRING(Content,4,LEN(Content)))
    ELSE
    Content
    END AS Content
    FROM ItemContentBinaries
    )
    –The old outer query is now a CTE to get the content in its xml form only…
    ,ItemContentXML AS
    (
    SELECT
    ItemID,Name,[Type],TypeDescription
    ,CONVERT(xml,Content) AS ContentXML
    FROM ItemContentNoBOM
    )
    –now use the XML data type to extract the queries, and their command types and text….
    SELECT
    ItemID,Name,[Type],TypeDescription,ContentXML
    ,ConnectionString.value('(text())[1]','nvarchar(max)') AS ConnectionString
    FROM ItemContentXML
    –Get all the Query elements (The "*:" ignores any xml namespaces)
    CROSS APPLY ItemContentXML.ContentXML.nodes('//*:ConnectString') ConnectionStrings(ConnectionString)

  • @Anonymous, I didn't update the comments in the query I just posted. Here is the exact same query, just with more appropriate comments in it:

    –The first CTE gets the content as a varbinary(max)
    –as well as the other important columns for all reports,
    –data sources and shared datasets.
    WITH ItemContentBinaries AS
    (
    SELECT
    ItemID,Name,[Type]
    ,CASE Type
    WHEN 2 THEN 'Report'
    WHEN 5 THEN 'Data Source'
    WHEN 7 THEN 'Report Part'
    WHEN 8 THEN 'Shared Dataset'
    ELSE 'Other'
    END AS TypeDescription
    ,CONVERT(varbinary(max),Content) AS Content
    FROM ReportServer.dbo.Catalog
    WHERE Type IN (2,5,8)
    ),
    –The second CTE strips off the BOM if it exists…
    ItemContentNoBOM AS
    (
    SELECT
    ItemID,Name,[Type],TypeDescription
    ,CASE
    WHEN LEFT(Content,3) = 0xEFBBBF
    THEN CONVERT(varbinary(max),SUBSTRING(Content,4,LEN(Content)))
    ELSE
    Content
    END AS Content
    FROM ItemContentBinaries
    )
    –The old outer query is now a CTE to get the content in its xml form only…
    ,ItemContentXML AS
    (
    SELECT
    ItemID,Name,[Type],TypeDescription
    ,CONVERT(xml,Content) AS ContentXML
    FROM ItemContentNoBOM
    )
    –now use the XML data type to extract the connection strings (ConnectString) elements text….
    SELECT
    ItemID,Name,[Type],TypeDescription,ContentXML
    ,ConnectionString.value('(text())[1]','nvarchar(max)') AS ConnectionString
    FROM ItemContentXML
    –Get all the ConnectString elements (The "*:" ignores any xml namespaces)
    CROSS APPLY ItemContentXML.ContentXML.nodes('//*:ConnectString') ConnectionStrings(ConnectionString)

  • awesome…..Thank You very much…

  • Hi can we find how many fields are use in report design may be there are many fields in dataset but we are using few in report can we get this?

  • @Kirti, based on your request and those of others, I have created a CodePlex project that has a sample SSMS project with some scripts in it. You can download the project via CodePlex from:

    http://ssrscatalogqueries.codeplex.com/

    The last script in the project "06 – Determine Fields Actually Used.sql" provides an pretty cool answer to your question. Make sure to read the "00" first script in the project as well as run the "01" and "03" scripts as well to create the base objects needed. I hope you and others find this sample project useful.

  • Thank you Bret

  • Bret,
    These scripts are just what I need to try to retrieve the data I need for some SSRS reports that we are missing the report source.

    I have executed the first script to create the ReportQueries database with sucess. Though I have run into an issue with 02 -Query Catalog Content. I've include the where clause to include on the 2,5,7,8 types.
    I'm getting the Msg 9420 "XML parsing: line 906, character 10, illegal xml character".
    Can you offer any advise on how to trouble shoot this issue.
    Thanks, L.Cain

  • @L.Cain. My guess is that there is a specific item (Report, DataSource, etc) that is causing that problem. You might try making the WHERE Clause more specific (like adding a specific item name) to make sure that you can retrieve anything.

    I have run into a problem where some report RDLs have a an extra byte with a value of 0 (0x00) at the end. That could be the problem you are having as well. You might try adding a bit of code to strip of the right most character IF if it is a 0x00. Sorry I don't have an example handy, right now, but I'll try to update my CodePlex project in the next day or two.

    Finally, if you are trying to extract the contents to external files, you might look at another blog post of mine titled, "Extracting SSRS Content using the SSIS “Export Column” Component" (http://blogs.netconnex.com/2011/08/extracting-ssrs-content-using-ssis.html). There is an SSIS project you can download from that blog post that will extract your report catalog contents as files. Kinda cool.

    Lastly, if you are still having problems, drop me an email at bret@netconnex.com and we can take this conversation into a more convenient medium.

  • Thanks for the assistance Bret. I give your suggestions a try.

  • Very Nice Post..

  • This is awesome!!! Thanks for posting.
    We had a webservice version and it would take ages to get all the data (cause we have over 1600+ reports)
    Thanks

  • TP, glad it helped! If you need to extract the items out to file system files, you might be interested in a follow up post I have that uses this query in SSIS to extract the items as files titled "Extracting SSRS Content using the SSIS “Export Column” Component". You can find that post here:
    http://blogs.netconnex.com/2011/08/extracting-ssrs-content-using-ssis.html

  • This is some great stuff. I used the last query posted to try and identify if the Datasource settings had been changed between backups, however, it appeared not to give me updated info which I thought was odd. I ran it against our production ReportServer db but was not seeing the same connection strings as those posted in the DataSources on the ReportManager. They looked like old data for some reason. Any ideas what that might be due to?

  • @Anonymous, I'm not sure what to tell you. The information that is in the Catalog table is the data that Report Manager uses. I have an SSMS project that you can download from http://ssrscatalogqueries.codeplex.com/ that has some specific queries (numbers 07 & 08) that investigate the connection strings used by data sources and their components. You might give them a try. Communicating via comments in a blog post is difficult though so if you continue to have problems, drop me a note at bret@netconnex.com.

  • @L.Cain and others, I have updated the codeplex project to first limit the queries to item types 2,5,7 & 8, as well as adding another CTE to remove null-termination characters (0x00) if they exist. You can download the project here:
    http://ssrscatalogqueries.codeplex.com/releases/view/78926

  • AWESOME. Thank you.

  • Hi Bret, Is there an easy way to extract the filter criteria used in the report.
    Thanks

  • Hi,
    This is a fab article which has saved me loads of boring documentation writing :o) I have a question – I have been trying to get a list of the connection strings for my data sources from a reporting services database (we are using SQL 2008 R2) – and have been querying the content xml from the CatalogContentView view that you kindly provided

    The query I am using is as follows:

    SELECT
    ItemID,Name,[Path],TypeDescription,
    ISNULL(DSD.value(‘(./*:ConnectString/text())[1]’,’nvarchar(max)’),”) AS DataSetName

    FROM ReportQueries.dbo.CatalogContentView AS CCV
    CROSS APPLY CCV.ContentXML.nodes(‘//*:DataSourceDefinition’) DataSourceDefinition(DSD)

    I am getting back the correct list of DataSources – and when I add and delete them they are appearing and disappearing from the query as expected. However if I amend the query string in the data source (for example if I change the server name) – this is not being reflected in the query – the old values still persists – but when I look at the properties of the data source via the report server interface the changes I have made are there.

    This is driving me nuts!

    I presume that I am not picking the connection string up from the correct place – but I can’t see where else it would be? Can you help? I have also tried running your 08 – Extract Connection String Components sql and the same is happening with that (i.e. the changes are not being reflected). Is there some weird caching mode or something bleeding obvious that I am missing?

    Many Thanks
    Gill:-)

    • Gill,

      This is one of the limitations of querying the database directly. As mentioned in my post, the best (and only supported way) to query the Reporting Services catalog is through the web services, not directly against the DB as I describe here. The Data Source item that you are getting back from the CatalogContentView is the ORIGINAL data source as it was first created (likely in Visual Studio). The actual connection string is encrypted and stored in the dbo.DataSources.ConnectionString column. You can’t decrypt that value as part of the SQL query. You can however query the information via the Web Services and retrieve the CURRENT connection string details that way.

      You can fid out more about the Reporting Services web services here: http://msdn.microsoft.com/en-us/library/ms152787(v=sql.105).aspx

      Bret

  • Excellent article. Keep up the good work!

    Many thanks.

  • is Query.value a user defined function? could you please give some links to explain this function?

  • is Query.value a UDF? could you please give some info o fthis function?

  • Hi
    thanks for this post it is really helpful. have not worked on XML much so bit struggling. could you please give some clue how to get DataSourceReference using same code?
    Many thanks

  • This is perfect! Exactly what I was looking for and more. I hadn’t even considered converting the RDL to XML data type and then being able to query the DOM. I almost smacked my foreheard and said “Duh! its XML after all..”

    This is my first visit to your blog, but will definitley be exploring and reading more posts.

  • Question: The final output only has Reports. I was also hoping to extract Data Sources. The first CTE gets them, but they don’t appear thereafter. Why is that?

  • […] guts of query – http://bretstateham.com/extracting-ssrs-report-rdl-xml-from-the-reportserver-database/, matter of searching string in xml returned. (the string you’re searching on lastly […]

Leave a Reply to luispimi