Allocation Add-in Version 0.2

Version 0.2 of the Allocation Add-in has just been released.

It’s available from CodePlex here.

This new version includes support for SQL Server 2008 RTM and has a couple of new features.

allocationinfo_fitscan

The Allocation Map can now be set so that the entire file is rendered to fit the screen without the need for scrolling. This is the default view. The fit map is rendered asynchronously and is also cached.

 newbuttons

There’s also an option to toggle the table and the allocation map on and off.

Allocation Information Add-in

First of all, my apologies. I’ve been busy with work and other stuff so I haven’t had much spare time to update this blog or SQL Internals Viewer.

One thing I have been working on is a new free add-in for SQL Server Management Studio called the Allocation SQL Server Management Studio Add-in (maybe it needs a better name...)

Recently I’ve been working on refactoring a large database. I wanted a quick way of seeing which tables held the majority of the data, but that’s not easy without using sp_spaceused or querying DMVs.

The add-in makes space allocation easy to see by creating a new screen in SSMS (both 2005 and 2008) that has a table and displays things like table size and row numbers, and also the percentage of the database the table uses.

allocinfo

As you can see from the screenshot it includes an Allocation Map similar to SQL Internals Viewer. I see the possible* future of the add-in as a version of the Internals Viewer that is more geared towards DBA use. I’m going to add in fragmentation information soon.

The add-in is my entry to the SQL Heroes competition. I’ve created the project in CodePlex (http://www.codeplex.com/) so you can view and download the source code.

I’m warming to CodePlex, and releasing the source of SQL Internals Viewer is something I’ve had in mind for a while, so it’ll probably go in there too.

Release

Release 0.1 of the add-in can be downloaded here.

 

*I use possible as the add-in could also turn into Internals Viewer 2. Which would you prefer – standalone as it is now or integrated with SSMS?

Page compression - internals and examples

First of all, apologies for the delay in writing this. I meant to blog about page compression as soon as I added it in to Internals Viewer but I didn’t get round to it.

Going over it again for this blog post and testing things out with RC0 I found a few bugs, so along with this comes a new release, version 1.02. It can be downloaded here.

Syntax:  ALTER TABLE table REBUILD WITH (DATA_COMPRESSION = PAGE)

Page compression uses the following techniques to save space:

  • Row Compression (covered here)
  • Prefix compression
  • Dictionary compression

Applying Page compression to a table or index doesn’t necessarily mean that all of the techniques are used. Row compression will always be used if page compression is used, but prefix and dictionary compression are only used if SQL Server determines that space can be saved by using them.

For this post I’ve created an example script that can be used with SQL Server 2008 (RC0) and SQL Internals Viewer. The script will create and populate two tables, one called FirstNames_PrefixOnly and one called FirstNames_PrefixAndDictionary. It can be downloaded here.

Both page compression techniques work by cutting down on repetition in the page. Prefix compression identifies common prefixes on a per-column basis. SQL Server identifies a common prefix and then it assumes that all values start with that prefix. Records only need to store the differences from the prefix.

Here’s an example. All of the following names start with Antoni.  The prefix is stored as Antoni so only the suffixes need to be stored, saving space.

Antonia
Antonie
Antonietta
Antonina
Antonio

Dictionary compression is applied after prefix compression and identifies common patterns in the data. If a particular pattern occurs more than once, space can be saved by adding the pattern to the dictionary and putting a reference back to the dictionary entry in its place. Prefixes are specific to each column, but the dictionary can be used across columns.

Following on from the example of first names beginning with Antoni, there may be repetition in the table:

Antonietta
Antonietta
Antonietta
Antonietta
Antonina
Antonina
Antonina
Antonina

Prefix compression has already identified the common prefix (Antoni). Dictionary compression identifies the common patterns (etta and na), adds them to the dictionary and replaces them with a reference to the entry.

So this...

Data:       Antonietta
            Antonietta
            Antonina
            Antonina

Becomes this...

Prefix:      Antoni
Dictionary:  [0]etta, [1]na
Data:        [0]
             [0]
             [1]
             [1]
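The transformation above can be sketched in a few lines of Python. This is only a conceptual model of the two steps, not SQL Server's actual algorithm or on-disk format. The function name compress_column, the single column and the prefix being passed in are all assumptions made for illustration.

```python
from collections import Counter

def compress_column(values, prefix):
    """Conceptual model only: strip a shared prefix, then replace repeated
    suffixes with references into a dictionary."""
    # Step 1 (prefix compression): keep only what differs from the prefix.
    # Assumes every value starts with the prefix, as in the example above.
    suffixes = [v[len(prefix):] for v in values]

    # Step 2 (dictionary compression): a suffix seen more than once
    # gets a dictionary entry, in order of first appearance.
    counts = Counter(suffixes)
    dictionary = []
    for s in suffixes:
        if counts[s] > 1 and s not in dictionary:
            dictionary.append(s)

    # Replace each repeated suffix with a reference to its entry.
    data = [("ref", dictionary.index(s)) if s in dictionary else ("raw", s)
            for s in suffixes]
    return dictionary, data

dictionary, data = compress_column(
    ["Antonietta", "Antonietta", "Antonina", "Antonina"], "Antoni")
print(dictionary)  # ['etta', 'na']
print(data)        # [('ref', 0), ('ref', 0), ('ref', 1), ('ref', 1)]
```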

There’s more information on this on the Storage Engine Blog and also in Books Online.

Compression Info structure

If a table or index uses Page compression and it uses Prefix or Dictionary compression it will have something called a Compression Info (CI) structure just after the page header starting at byte 96.

The CI structure contains:

  • A header describing the CI structure and what it contains
  • The Anchor Record (for Prefix compression)
  • The Dictionary (if used)

In SQL Internals Viewer if a page has a CI structure an additional set of options will appear underneath the Offset table. Selecting one of the three items will colourise and decode it in the Page Viewer.


 
Header

The CI header is made of the following parts:

  • Status Bits – 1 byte
    • If bit 1 is 1 the CI has an anchor record
    • If bit 2 is 1 the CI has a dictionary
  • Page Mod Count – 2 byte short (I’m not sure what this does)
  • Length – Length of the anchor record (Only present if there is a dictionary) – 2 byte short
  • Size – Size of the CI structure – 2 byte short
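As a rough sketch, the header could be read like this in Python. Several things here are assumptions rather than facts from the post: bits are numbered from 0 (so "bit 1" is mask 0x02 and "bit 2" is mask 0x04), the shorts are little-endian, and the fields sit back to back in the order listed. The function name parse_ci_header and the sample bytes are made up for illustration.

```python
import struct

def parse_ci_header(page: bytes, start: int = 96) -> dict:
    """Decode the CI header fields in the order listed above.

    Assumptions: zero-based bit numbering, little-endian shorts,
    fields stored contiguously starting at byte 96 of the page.
    """
    status = page[start]
    has_anchor = bool(status & 0x02)      # "bit 1"
    has_dictionary = bool(status & 0x04)  # "bit 2"
    pos = start + 1

    (page_mod_count,) = struct.unpack_from("<H", page, pos)
    pos += 2

    anchor_length = None
    if has_dictionary:
        # The Length field is only present when there is a dictionary.
        (anchor_length,) = struct.unpack_from("<H", page, pos)
        pos += 2

    (size,) = struct.unpack_from("<H", page, pos)
    return {
        "has_anchor": has_anchor,
        "has_dictionary": has_dictionary,
        "page_mod_count": page_mod_count,
        "anchor_length": anchor_length,
        "size": size,
    }

# Made-up page: 96 header bytes, then a CI header with both flags set.
sample_page = bytes(96) + bytes([0x06]) + struct.pack("<HHH", 5, 20, 60)
header = parse_ci_header(sample_page)
print(header)
```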

Prefix Compression - Anchor Record

The anchor record is a record that uses the new row compression record format. The anchor record defines the prefix, if it exists, for each column.

For more on the new row compression record format see the previous post about row compression.

The screenshot above shows the anchor record for the first page of the example FirstNames_PrefixOnly table.

0x416E6E6162656C6C61 decodes to Annabella – this is the prefix for column 1.

 
 
In the example table, NameId 461 is Annemarie. The first name field is made up of two things. The first is the data offset. This is 1 or 2 bytes (depending on if the first bit is set) that determines at what offset the subsequent data starts. 03 decodes to 3 and 65 6D 61 72 69 65 is emarie. This means the prefix is used for the first three bytes, and then the rest of the data is taken from the field.

Annabella (prefix from the anchor record; the first 3 bytes, Ann, are used)
emarie (data from the row)
Annemarie (result)
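The same decoding can be sketched in Python using the bytes quoted above. The function name expand_prefixed_field is made up, and the sketch assumes the data offset fits in a single byte. The 2-byte form is not handled.

```python
def expand_prefixed_field(anchor_prefix: bytes, field: bytes) -> bytes:
    """Rebuild a column value from the anchor prefix and the row's field.

    Assumption: a 1-byte data offset (the 2-byte form is not handled).
    """
    offset = field[0]   # how many bytes of the prefix to reuse
    suffix = field[1:]  # the rest of the value, stored in the row
    return anchor_prefix[:offset] + suffix

anchor = bytes.fromhex("416E6E6162656C6C61")  # Annabella
field = bytes.fromhex("03656D61726965")       # offset 3, then "emarie"
print(expand_prefixed_field(anchor, field).decode("ascii"))  # Annemarie
```

An offset of 0 reuses nothing from the prefix, so the value comes entirely from the row.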

The record for Annabella (NameId 441) doesn’t use any space and the prefix provides all of the data:


 
When the prefix isn’t used at all the data offset is set to 0, which means the entire anchor prefix is ignored:

 
Dictionary Compression

The Dictionary is stored in the Compression Info structure.


 
The dictionary structure is as follows:

  • Entry count – 2 bytes
    • This defines the number of entries in the dictionary
  • Dictionary entry offset array – 2 * Entry count bytes
    • The array defines the end offset of each dictionary entry in a very similar way to the column offset array in a standard record with variable length fields.
  • Dictionary entries
    • Defined by the offset array
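Here's a minimal sketch of splitting a dictionary into its entries with the end-offset array. It assumes little-endian shorts and offsets counted from the first byte of the entry data. Neither is confirmed in the post, and the function name parse_dictionary and the sample bytes are invented for illustration.

```python
import struct

def parse_dictionary(buf: bytes) -> list:
    """Split a dictionary into entries using the end-offset array.

    Assumptions: little-endian shorts; each offset is the end of an
    entry, measured from the start of the entry data.
    """
    (count,) = struct.unpack_from("<H", buf, 0)      # entry count
    ends = struct.unpack_from(f"<{count}H", buf, 2)  # end offsets
    data = buf[2 + 2 * count:]                       # entry bytes

    entries, begin = [], 0
    for end in ends:
        entries.append(data[begin:end])
        begin = end
    return entries

# Made-up dictionary holding two entries, "etta" and "na".
sample = struct.pack("<H", 2) + struct.pack("<2H", 4, 6) + b"ettana"
print(parse_dictionary(sample))  # [b'etta', b'na']
```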

The example script creates a table called FirstNames_PrefixAndDictionary. The table has repetition in it which makes it suitable for dictionary compression.

How does a field refer back to the dictionary? The row structure discussed in the previous row compression post has a CD Array that contains information about each column in a record, stored as 4-bit integers, two to a byte. Row compression covers values 0-10 for the CD array. A value above 10 marks the value in the field as a symbol, a reference to a dictionary item. The size of the symbol is determined by the CD array value minus 11.

Here’s an example:

This record has two columns, so the CD Array is one byte (0xC3) that is split out into 3 (2 bytes short) and 12 (1 byte symbol). The 1 byte symbol value is 4 so we need to replace this with the data in dictionary entry 4.

Because this column has a prefix the first byte is the offset of the data, and the subsequent bytes need to be appended at the offset. This column has a prefix of 41 64 65 6C 61 69 64 61  (Adelaida).

Dictionary 4 has the value 04 65, which means 0x65 (e) is appended to the first 4 bytes of the prefix, giving 41 64 65 6C 65. This decodes to Adele. It's as simple as that!
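The walk-through can be reproduced with a short Python sketch. It assumes the low nibble of the CD byte describes the first column, which matches the 3-then-12 split described above. Only dictionary entry 4 is known from the example, so the dictionary here holds just that entry. The names are made up for illustration.

```python
def split_cd_byte(cd: int):
    """Split one CD Array byte into its two 4-bit entries.
    Assumption: low nibble = first column, high nibble = second column."""
    return cd & 0x0F, cd >> 4

first, second = split_cd_byte(0xC3)
symbol_size = second - 11  # CD value 12 means a 1 byte symbol

prefix = bytes.fromhex("4164656C61696461")  # Adelaida
dictionary = {4: bytes.fromhex("0465")}     # only entry 4 is given above
symbol = 4                                  # the 1 byte symbol value in the row

entry = dictionary[symbol]
offset, suffix = entry[0], entry[1:]  # data offset, then bytes to append
value = prefix[:offset] + suffix
print(first, second, symbol_size)     # 3 12 1
print(value.decode("ascii"))          # Adele
```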

SQL Internals Viewer - New version with sparse column support

I've just released a new version of SQL Internals Viewer that has support for 2008 sparse columns, a feature introduced in SQL Server 2008 CTP6.

There are also a few bug fixes and minor changes.

It's available to download from http://www.sqlinternalsviewer.com/

Thanks to Kalen for the help with the sparse vector complex header info.

SQL Internals Viewer 1.0 Released

I’m pleased to announce that SQL Internals Viewer 1.0 has been released. It can be downloaded from www.sqlinternalsviewer.com.

I've also put up the first part of a user guide that covers the main window and Allocation Map. The second part, covering the Page Viewer, will follow shortly. The user guide is available here.

If you've got an existing version installed it will need to be removed through Add/Remove Programs or Programs and Features in Control Panel.

New features

Encode and Find is a new feature in the Page Viewer that allows you to encode a value to a particular data type and then search for it in the page. It can be accessed using the Page – Encode and Find menu item or the button on the toolbar.

Encode and Find

There’s also a new feature on the hex viewer so that once you’ve found the data you are looking for you can select the record that it is contained in. This can be done by right clicking on the byte and selecting Select Record.

SQL Server 2008 Page and Row Compression

The Page Viewer can display the new SQL Server 2008 Page and Row compression row structures, including the CI (Compression Information) structure.

2008 Compression

At the moment the application only supports data pages.

Key

There is a new improved version of the Key for the allocation map.

Key

Clicking on an item on the Key will highlight it on the Allocation Map and fade the other items. Clicking on it again will clear the selection.

Improvements

There have been several bug fixes and performance fixes, including improvements to the load times for databases.

Clicking on the Allocation Map will open the page in the current Page Viewer. To open the page in a new Page Viewer hold down the shift button and click on the page. There are more details of the new changes in the User Guide.

SQL Server 2008 Support

There is still work to be done to get Internals Viewer working with all of the latest features of SQL Server 2008, including Page and Row compression on indexes and sparse columns. It’s something I’m working on at the moment and I hope to release in the next few months.

Row compression – internal structure

The CTP6 release of SQL Server 2008 includes row and page compression. It’s a feature that will only be in the Developer and Enterprise edition of SQL Server, so I wasn’t sure whether I should (or could!) add it in to SQL Internals Viewer. I had a look into it and thought it would be worth putting it in as it’s one of those things where it’ll be very useful to understand how it works.

So far I’ve only got row compression covered in SQL Internals Viewer, but I thought I’d better get down what I’ve found out so far.

There’s more on the new SQL Server 2008 compression on the SQL Server Storage Engine Blog here.

Row compression has a completely different row format. The size of a field is determined by the minimum amount of space needed to store it.  The Books Online topic ‘Row Compression Implementation’ has a good run down of how space is saved with different data types.

It’s very easy to add row compression to a table. The syntax is:

ALTER TABLE table REBUILD WITH (DATA_COMPRESSION = ROW)

The standard row structure is covered in Kalen Delaney’s Inside Microsoft SQL Server 2005: The Storage Engine, and there’s also an overview here (Storage Engine Blog).

Compressed row structure

(There may be mistakes in this as it’s currently undocumented, please let me know if anything needs correcting)

Compressed row example

Status Bits A
1st byte

This looks the same as a normal row, although there may be differences.

Number of columns 1 or 2 byte integer
2nd/2nd-3rd byte

This is the first instance where space can be saved. If the number of columns can fit into one byte (0-254) one byte will be used, if not two bytes are used. If the first (high-order) bit is 1 on the first byte this indicates a second byte is used.

CD Array
Next (Number of columns/2) + (Number of columns%2) bytes

I’m guessing CD stands for Column Description or Compression Description. It’s an array of 4-bit (nibble) integers, stored 2 per byte.  Every column in the row has a CD Array entry that determines if it is null, empty, stored ‘short’ (and if so the size) or stored ‘long’.
Short and long are the equivalents of fixed and variable length storage in the standard row format. Short CD Array entries represent fixed length storage (defined by the CD Array entry), but they use the optimal amount of storage. Long fields are similar to variable length fields: they have an entry in a row offset array, and these too use optimized storage.

Possible values for the CD array are:

0 – Null
1 – Empty
2 – 1 byte short
3 – 2 bytes short
4 – 3 bytes short
5 – 4 bytes short
6 – 5 bytes short
7 – 6 bytes short
8 – 7 bytes short
9 – 8 bytes short
10 – Long

If a field is a BIT data type the value of the CD Array is used as the value.

Row compression essentially turns every compressible field into a variable length field. It seems that the distinction between long and short columns is used so the extra overhead (column offset array entry) is only used when necessary. Below 9 bytes the CD array can be used to store the length. Above 8 bytes, an extra two bytes are used for the offset array entry.
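Here's a small Python sketch of reading a CD Array and interpreting each entry using the values listed above. It assumes the low nibble of each byte belongs to the earlier column. The function names are made up, and since the format is undocumented this is only an illustration.

```python
def cd_array_length(column_count: int) -> int:
    """Bytes needed for the CD Array: two 4-bit entries per byte, rounded up."""
    return column_count // 2 + column_count % 2

def read_cd_array(cd_bytes: bytes, column_count: int) -> list:
    """Unpack the 4-bit entries. Assumption: low nibble comes first."""
    values = []
    for b in cd_bytes:
        values.append(b & 0x0F)
        values.append(b >> 4)
    return values[:column_count]  # drop the unused nibble for odd counts

def describe(cd: int) -> str:
    """Translate a CD Array entry using the value list above."""
    if cd == 0:
        return "null"
    if cd == 1:
        return "empty"
    if 2 <= cd <= 9:
        return f"{cd - 1} byte(s) short"
    if cd == 10:
        return "long"
    return f"{cd - 11} byte symbol"  # values above 10: page compression only

for cd in read_cd_array(bytes([0x3A, 0x02]), 3):
    print(cd, describe(cd))
```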

Short Column Data
Next ∑ (short bytes in CD Array)

Unknown
?

Number of variable length columns
Next 2 bytes

Column offset array
Next (2 * Number of variable length columns) bytes

Each 2-byte integer defines the end offset of the variable length field

Long Column Data
The long/variable length fields with the offsets defined in the offset array.

Here’s an example in Internals Viewer:

No compression:

Row with no compression

With compression:

Row with compression

This only covers data records. I've still got to look into indexes and after that page compression (which when used also uses row compression). I'll also try to blog on how the data is actually stored and how to decode it.

Hopefully everything will be covered in version 1.0 of SQL Internals Viewer.

Server Alert - Trial version available

There is now a trial version of Server Alert available from http://www.internalexternal.com/ServerAlertTrial.aspx

New Product: Server Alert

I’m pleased to announce a new application called Server Alert.

The application is a small add-in for SQL Server Management Studio that shows a coloured bar at the side of all query windows. The coloured bar indicates which server the window is connected to. Different servers can be assigned different colours.

I’ve created this to make the current connection a lot clearer. Although the server name is on the status bar at the bottom of a query it can be all too easy to execute a query on the wrong server, especially if multiple queries are open on different connections. Server Alert makes it a lot more apparent what the current connection is to avoid the heart-stopping “was that the right server?” moments!

There is a small demo of it in action at the new website: www.internalexternal.com\serveralert.aspx.

It’s available through www.internalexternal.com for $16.

For example you can colour code green for test or dev environments...

Server Alert connected to Production server

 ...and red for production environments

Server Alert connected to a test database

Stored Procedure parameters

Here’s some more SQL that writes SQL. One way of debugging a stored procedure is to chop off the CREATE PROCEDURE at the top and replace it with DECLARE and SET statements for the variables, then step through the stored procedure.

The following SQL gives an easy way of extracting the stored procedure parameters and creating variables based on the parameters, including the data type.  Just copy-paste the output into a query window.

The variable initialization is output as template parameters so you can press Ctrl+Shift+M and easily populate the variables using the Template Parameter window.

The first version is a simple query, the second is a UDF that you can keep in the master database.

Query version:

DECLARE @StoredProcName VARCHAR(100)

SET @StoredProcName = '(Stored Proc Name)'

-- sys.parameters.max_length is in bytes, so it is halved for
-- nchar/nvarchar; a max_length of -1 means (max)
SELECT 'DECLARE ' + c.name +
       ' ' + t.name +
       CASE WHEN t.name LIKE '%char%'
            THEN '(' + CASE WHEN c.max_length = -1 THEN 'max'
                            WHEN t.name IN ('nchar', 'nvarchar')
                            THEN CONVERT(varchar, c.max_length / 2)
                            ELSE CONVERT(varchar, c.max_length) END + ')'
            ELSE '' END
FROM   sys.parameters c
       INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
WHERE  c.object_id = OBJECT_ID(@StoredProcName)
UNION ALL
SELECT ' '
UNION ALL
SELECT 'SET ' + c.name +
       ' = <' + c.name +
       ',' + t.name +
       CASE WHEN t.name LIKE '%char%'
            THEN '(' + CASE WHEN c.max_length = -1 THEN 'max'
                            WHEN t.name IN ('nchar', 'nvarchar')
                            THEN CONVERT(varchar, c.max_length / 2)
                            ELSE CONVERT(varchar, c.max_length) END + ')'
            ELSE '' END +
       ',>'
FROM   sys.parameters c
       INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
WHERE  c.object_id = OBJECT_ID(@StoredProcName)

Example

Output from SET @StoredProcName = 'HumanResources.uspUpdateEmployeePersonalInfo' in the AdventureWorks db:

DECLARE @EmployeeID int
DECLARE @NationalIDNumber nvarchar(15)
DECLARE @BirthDate datetime
DECLARE @MaritalStatus nchar(1)
DECLARE @Gender nchar(1)

SET @EmployeeID = <@EmployeeID,int,>
SET @NationalIDNumber = <@NationalIDNumber,nvarchar(15),>
SET @BirthDate = <@BirthDate,datetime,>
SET @MaritalStatus = <@MaritalStatus,nchar(1),>
SET @Gender = <@Gender,nchar(1),>

User-defined function version:

CREATE FUNCTION dbo.uFn_StoredProcVariables(@StoredProcName SYSNAME)
                                              RETURNS NVARCHAR(MAX) AS
BEGIN
    DECLARE @Declares NVARCHAR(MAX)
    DECLARE @Sets     NVARCHAR(MAX)

    -- sys.parameters.max_length is in bytes, so it is halved for
    -- nchar/nvarchar; a max_length of -1 means (max)
    SET @Declares =
        (SELECT 'DECLARE ' + c.name +
                ' ' + t.name +
                CASE WHEN t.name LIKE '%char%'
                     THEN '(' + CASE WHEN c.max_length = -1 THEN 'max'
                                     WHEN t.name IN ('nchar', 'nvarchar')
                                     THEN CONVERT(varchar, c.max_length / 2)
                                     ELSE CONVERT(varchar, c.max_length) END + ')'
                     ELSE '' END + CHAR(10)
         FROM   sys.parameters c
                INNER JOIN sys.types t
                             ON c.user_type_id = t.user_type_id
         WHERE  c.object_id = OBJECT_ID(@StoredProcName)
         FOR XML PATH(''))

    SET @Sets =
        (SELECT 'SET ' + c.name +
                ' = <' + c.name +
                ',' + t.name +
                CASE WHEN t.name LIKE '%char%'
                     THEN '(' + CASE WHEN c.max_length = -1 THEN 'max'
                                     WHEN t.name IN ('nchar', 'nvarchar')
                                     THEN CONVERT(varchar, c.max_length / 2)
                                     ELSE CONVERT(varchar, c.max_length) END + ')'
                     ELSE '' END + ',>' + CHAR(10)
         FROM   sys.parameters c
                INNER JOIN sys.types t
                             ON c.user_type_id = t.user_type_id
         WHERE  c.object_id = OBJECT_ID(@StoredProcName)
         FOR XML PATH(''))

    -- FOR XML escapes the angle brackets in the template parameters
    RETURN @Declares
           + CHAR(10)
           + REPLACE(REPLACE(@Sets, '&gt;', '>'), '&lt;', '<')
END

Example

PRINT dbo.uFn_StoredProcVariables('HumanResources.uspUpdateEmployeePersonalInfo')

Scuffling with ‘String or binary data would be truncated’

The error ‘String or binary data would be truncated’ can be annoying.  It occurs when you try to insert or update a string or binary column with a value that is too large. Recently I was trying to INSERT from a SELECT from one table to another and I got this error. It can be a pain tracking down the cause, especially if there are a large number of columns or a large dataset involved.

In the past I’ve written queries to give me the LEN for each column, but again if there are a large number of columns involved this can be very time consuming.

Below is a way of identifying which rows are causing the problem. This doesn’t help if you’ve got a large number of columns, as you still need to work out which field is causing the problem, but it will help if you have a large dataset and the problem rows are very sparse.

For this example I’ll create a couple of tables and generate some data. The source table has a column of VARCHAR(50), whereas the destination has VARCHAR(25):

CREATE TABLE SourceTable
    (
    RowId  INT
   ,Chars  INT
   ,String VARCHAR(50)
    )
GO

CREATE TABLE DestinationTable
    (
    RowId  INT
   ,Chars  INT
   ,String VARCHAR(25)
    )
GO

Next the source table is populated with strings made up of a random number of ‘X’s, between 0 and 50. In theory you should get about 50% with a length above 25 characters and 50% below.

DECLARE @i INT
DECLARE @RandomNumber INT

SET @i=0
WHILE @i <= 50
BEGIN
    SET @RandomNumber = ROUND(50 * RAND(), 0)

    INSERT INTO SourceTable
    SELECT @i, @RandomNumber, REPLICATE('X', @RandomNumber)

    SET @i=@i+1
END
GO

Next try inserting from SourceTable to DestinationTable:

INSERT INTO DestinationTable
SELECT * FROM SourceTable
GO

This results in the error:

Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.

It’s possible to ignore the 'String or binary data would be truncated' message by setting ANSI_WARNINGS to OFF. This will truncate fields where they don’t fit. ANSI_WARNINGS OFF has drawbacks and it is better to correct a problem rather than ignore it.

The following can be used to work out which rows are causing the issue:

1. Take a copy of the destination table:

SELECT * INTO #Destination FROM DestinationTable WHERE 1=2
GO

2. Set ANSI_WARNINGS OFF and perform the insert into the copy of the destination table, then set ANSI_WARNINGS ON again:

SET ANSI_WARNINGS OFF
GO

INSERT INTO #Destination
SELECT * FROM SourceTable
GO

SET ANSI_WARNINGS ON
GO

As ANSI_WARNINGS is off, SQL Server truncates the fields rather than raising the error.

3. Next compare what you would like to insert against what was inserted with the ANSI_WARNINGS OFF truncating. By using EXCEPT you only select the rows that don't match, and have therefore been truncated:

SELECT * FROM SourceTable
EXCEPT
SELECT * FROM #Destination
GO

The rows returned are the ones that have been truncated, and they are the cause of the ‘String or binary data would be truncated’ error.
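The logic of the three steps can be modelled outside SQL Server. This Python sketch is only an illustration of the idea: the function name find_truncated_rows and the sample rows are made up, and the truncation is applied to the third field only, mirroring the String column.

```python
def find_truncated_rows(source_rows, max_len):
    """Mimic the steps above: build a truncated copy of the rows, then
    return the source rows that have no exact match in the copy."""
    # Steps 1 and 2: the copy, with the string cut to the destination size.
    truncated_copy = {(row_id, chars, s[:max_len])
                      for row_id, chars, s in source_rows}
    # Step 3: the equivalent of source EXCEPT copy.
    return [row for row in source_rows if row not in truncated_copy]

source = [
    (0, 10, "X" * 10),
    (1, 30, "X" * 30),  # too long for a 25 character column
    (2, 25, "X" * 25),
]
problem_rows = find_truncated_rows(source, 25)
print([row_id for row_id, _, _ in problem_rows])  # [1]
```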

(Note - The use of EXCEPT limits this to 2005/2008. The final query could be re-written for SQL Server 2000 and below.)

This isn’t the most elegant solution, and as I said if there were a large number of columns you’d still need to hunt through for the offender(s), but at least this gives an idea of where to look. I may have missed some glaringly obvious solution to this problem, so I’d be interested to know if anyone has any other ways of dealing with it.

 

 

 

SQL Internals Viewer on LearnSQLServer.com

Scott Whigham at LearnSQLServer.com has featured SQL Internals Viewer in a new series of video tutorials. The site has a whole range of video tutorials on SQL Server covering the basics right up to advanced topics.

I've seen the videos and they are a good introduction to the app and what you can do with it.

The videos are available here (requires subscription).

SQL Server 2008 TIME data type

The new TIME type stores a time with a specified scale that defines the fractional second precision.

The scale ranges from 0 to 7, giving 0 to 7 digits for the fractional seconds. The default is TIME(7), which gives 7 digits, a range of .0000000 to .9999999.

TIME is stored as an integer of various sizes, depending on the scale. For a scale of 0-2 it is stored as a 3 byte integer, 3-4 a 4 byte integer, and for scale 5-7 it is stored as a 5 byte integer.

The scale is then used to calculate the time since midnight, with an accuracy ranging from 1 second to 100 nanoseconds.

If t is the value stored in the time column and n is the scale, the time from midnight in seconds can be calculated as t / 10^n.

Here’s a summary of the storage and scaling (seconds, milliseconds, and nanoseconds are the respective duration t is multiplied by):

Scale    Storage (bytes)  Seconds    Milliseconds  Nanoseconds
TIME(0)  3                1          1000          1000000000
TIME(1)  3                0.1        100           100000000
TIME(2)  3                0.01       10            10000000
TIME(3)  4                0.001      1             1000000
TIME(4)  4                0.0001     0.1           100000
TIME(5)  5                0.00001    0.01          10000
TIME(6)  5                0.000001   0.001         1000
TIME(7)  5                0.0000001  0.0001        100
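The storage sizes and the t / 10^n relationship can be written down as a short Python sketch. The function names are made up for illustration, and Fraction is used so that the division stays exact.

```python
from fractions import Fraction

def time_storage_bytes(scale: int) -> int:
    """Storage size for TIME(scale), following the table above."""
    if not 0 <= scale <= 7:
        raise ValueError("scale must be between 0 and 7")
    if scale <= 2:
        return 3
    if scale <= 4:
        return 4
    return 5

def seconds_since_midnight(t: int, scale: int) -> Fraction:
    """Convert the stored integer t to seconds: t / 10^scale."""
    return Fraction(t, 10 ** scale)

print(time_storage_bytes(3))                 # 4
print(float(seconds_since_midnight(15, 1)))  # 1.5
```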

It’s possible to extract the unscaled value from a TIME value, although it requires a few steps.

DECLARE @Time TIME(7) = '00:01:00' -- Format HH:mm:SS[.nnnnnnn]
DECLARE @BinaryTime VARBINARY(8)

SET @BinaryTime = SUBSTRING(CONVERT(VARBINARY, REVERSE(CONVERT(VARBINARY, @Time))),
                            1,
                            DATALENGTH(@Time))

SELECT CONVERT(BIGINT, @BinaryTime) -- Unscaled TIME value
-- Result: 600000000

The above example gives a result of 600000000, which makes sense given the scale. The scale is 7, so a time of 1 minute past midnight is 60 seconds = 600000000 / 10^7.

DECLARE @Time TIME(3) = '00:01:00' -- Format HH:mm:SS[.nnnnnnn]
DECLARE @BinaryTime VARBINARY(8)

SET @BinaryTime = SUBSTRING(CONVERT(VARBINARY, REVERSE(CONVERT(VARBINARY, @Time))),
                            1,
                            DATALENGTH(@Time))

SELECT CONVERT(BIGINT, @BinaryTime) -- Unscaled TIME value
-- Result: 60000


A scale of 3 gives a result of 60000, as 60 seconds = 60000 / 10^3.
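The byte reversal in the T-SQL suggests the integer is stored little-endian. Under that assumption the decoding is a one-liner in Python. The byte strings below are built from the two results above, not captured from a server, and the function name is made up.

```python
def unscaled_time_value(stored: bytes) -> int:
    """Read the stored TIME integer. Assumption: little-endian byte order,
    which is why the T-SQL reverses the bytes before converting."""
    return int.from_bytes(stored, "little")

t7 = unscaled_time_value(bytes.fromhex("0046C32300"))  # TIME(7), 5 bytes
t3 = unscaled_time_value(bytes.fromhex("60EA0000"))    # TIME(3), 4 bytes

print(t7, t7 / 10 ** 7)  # 600000000 60.0
print(t3, t3 / 10 ** 3)  # 60000 60.0
```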

Books Online has more information about the new TIME type here.

SQL Server 2008 DATE data type

SQL Server 2008 has several new data types, including new date and time types.  In a series of short posts I’ll go into how these data types are structured. All of these new types are supported in SQL Internals Viewer, and a new data type viewer is coming up in a future version of the app.

The new date and time data types are:

  • DATE – Stores a date value
  • TIME – Stores a time value with an accuracy of up to 100 nanoseconds
  • DATETIME2 – Stores a date and time value with the higher TIME accuracy
  • DATETIMEOFFSET – Stores a date and time value with a time zone offset

DATE type internals

The date type simply stores a date, ranging from January 1st 0001 (1 AD) to December 31st 9999. Internally the type is stored as a 3 byte (24-bit) integer. The integer value is the number of days since the base date of 01/01/0001.
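The day count is easy to reproduce in Python, whose date ordinals use the same calendar but start at 1 for 0001-01-01. The function names are made up, and the little-endian byte order in days_to_bytes is an assumption rather than something stated above.

```python
from datetime import date

def date_to_days(d: date) -> int:
    """Days since the base date 0001-01-01 (Python ordinals start at 1)."""
    return d.toordinal() - 1

def days_to_date(days: int) -> date:
    """Inverse of date_to_days."""
    return date.fromordinal(days + 1)

def days_to_bytes(days: int) -> bytes:
    """Pack the day count into 3 bytes. Assumption: little-endian."""
    return days.to_bytes(3, "little")

print(date_to_days(date(1, 1, 1)))       # 0
print(date_to_days(date(9999, 12, 31)))  # 3652058
print(days_to_date(date_to_days(date(2008, 8, 6))))
```

The largest value, 3652058, fits comfortably in 24 bits (maximum 16777215).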

It isn’t possible to convert from an INT to DATE directly. Running SELECT CONVERT(DATE, 1) will result in the following error:

Msg 529, Level 16, State 2, Line 1

Explicit conversion from data type int t