Monday, August 18, 2008

Microsoft SQL Server is a relational database management system (RDBMS) produced by Microsoft. Its primary query language is Transact-SQL, an implementation of the ANSI/ISO standard Structured Query Language (SQL) used by both Microsoft and Sybase

Architecture

The architecture of Microsoft SQL Server is broadly divided into three components: SQLOS which implements the basic services required by SQL Server, including thread scheduling, memory management and I/O management; the Relational Engine, which implements the relational database components including support for databases, tables, queries and stored procedures as well as implementing the type system; and the Protocol Layer which exposes the SQL Server functionality.

SQLOS

SQLOS is the base component in the SQL Server architecture. It implements functions normally associated with the Operating System - thread scheduling, memory management, I/O management, buffer pool management, resource management, synchronization primitives and locking, and deadlock detection. Because the requirement of SQL Server is highly specialized, SQL Server implements its own memory and thread management system, rather than using the generic one already implemented in the operating system. It divides all the operations it performs into a series of Tasks - both background maintenance jobs as well as processing requests from clients. Internally, a pool of worker threads is maintained, onto which the tasks are scheduled. A task is associated with the thread until it is completed, only after its completion is the thread freed and returned to the pool. If there are no free threads to assign the task to, the task is temporarily blocked. Each worker thread is mapped onto either an Operating System thread or a fiber. Fibers are user mode threads that implement co-operative multitasking. Using fibers means SQLOS does all the book-keeping of thread management itself, but then it can optimize them for its particular use. SQLOS also includes synchronization primitives for locking as well as monitoring for the worker threads to detect and recover from deadlocks.

SQLOS handles the memory requirements of SQL Server as well. Reducing disc I/O is one of the primary goals of specialized memory management in SQL Server. It maintains a buffer pool, which is used to cache data pages from the disc, and to satisfy the memory requirements for the query processor, and for other internal data structures. SQLOS monitors all the memory allocated from the buffer pool, ensuring that the components return unused memory to the pool, and shuffles data out of the cache to make room for newer data. For changes that are made to the data in buffer, SQLOS writes the data back to the disc lazily, that is when the disc subsystem is either free, or there have significant number of changes made to the cache, while still serving requests from the cache. For this, it implements a Lazy Writer, which handles the task of writing the data back to persistent storage.

SQL Server normally supports up to 2 GB memory on x86 hardware, though it can be configured to use up to 64 GB if the Address Windowing Extension is used in the supporting operating system. For x64 hardware, it supports 8 TB of memory, and 7 TB for IA-64 systems (currently it is limited by Windows Server 2003 SP1 to 1TB). However, when running x86 versions of SQL Server on x64 hardware, it can access 4 GB of memory without any special configuration.

Relational engine

The Relational engine implements the relational data store using the capabilities provided by SQLOS, which is exposed to this layer via the private SQLOS API. It implements the type system, to define the types of the data that can be stored in the tables, as well as the different types of data items (such as tables, indexes, logs etc) that can be stored. It includes the Storage Engine, which handles the way data is stored on persistent storage devices and provides methods for fast access to the data. The storage engine implements log-based transaction to ensure that any changes to the data are ACID compliant. It also includes the query processor, which is the component that retrieves data. SQL queries specify what data to retrieve, and the query processor optimizes and translates the query into the sequence of operations needed to retrieve the data. The operations are then performed by worker threads, which are scheduled for execution by SQLOS.

Protocol layer

Protocol layer implements the external interface to SQL Server. All operations that can be invoked on SQL Server are communicated to it via a Microsoft-defined format, called Tabular Data Stream (TDS).Tabular Data Stream (TDS) is an application layer protocol, used to transfer data between a database server and a client. Initially designed and developed by Sybase Inc. for their Sybase SQL Server relational database engine in 1984, and later by Microsoft in Microsoft SQL Server. TDS packets can be encased in other physical transport dependent protocols, including TCP/IP, Named pipes, and Shared memory. Consequently, access to SQL Server is available over these protocols. In addition, the SQL Server API is also exposed over web services.

Data storage

The main unit of data storage is a database, which is a collection of tables with typed columns. SQL Server supports different data types, including primary types such as Integer, Float, Decimal, Char (including character strings), Varchar (variable length character strings), binary (for unstructured blobs of data), Text (for textual data) among others. It also allows user-defined composite types (UDTs) to be defined and used. SQL Server also makes server statistics available as virtual tables and views (called Dynamic Management Views or DMVs). A database can also contain other objects including views, stored procedures, indexes and constraints, in addition to tables, along with a transaction log. An SQL Server database can contain a maximum of 231 objects, and can span multiple OS-level files with a maximum file size of 220 TB. The data in the database are stored in primary data files with an extension .mdf. Secondary data files, identified with an .ndf extension, are used to store optional metadata. Log files are identified with the .ldf extension.

Storage space allocated to a database is divided into sequentially numbered pages, each 8 KB in size. A page is the basic unit of I/O for SQL Server operations. A page is marked with a 96-byte header which stores metadata about the page including the page number, page type, free space on the page and the ID of the object that owns it. Page type defines the data contained in the page - data stored in the database, index, allocation map which holds information about how pages are allocated to tables and indexes, change map which holds information about the changes made to other pages since last backup or logging, or contain large data types such as image or text. While page is the basic unit of an I/O operation, space is actually managed in terms of an extent which consists of 8 pages. A database object can either span all 8 pages in an extent ("uniform extent") or share an extent with up to 7 more objects ("mixed extent"). A row in a database table cannot span more than one page, so is limited to 8 KB in size. However, if the data exceeds 8 KB and the row contains Varchar or Varbinary data, the data in those columns are moved to a new page (or possible a sequence of pages, called Allocation unit) and replaced with a pointer to the data.

For physical storage of a table, its rows are divided into a series of partitions (numbered 1 to n). The partition size is user defined; by default all rows are in a single partition. A table is split into multiple partitions in order to spread a database over a cluster. Rows in each partition are stored in either B-tree or heap structure. If the table has an associated index to allow fast retrieval of rows, the rows are stored in-order according to their index values, with a B-tree providing the index. The data is in the leaf node of the leaves, and other nodes storing the index values for the leaf data reachable from the respective nodes. If the index is non-clustered, the rows are not sorted according to the index keys. An indexed view has the same storage structure as an indexed table. A table without an index is stored in an unordered heap structure. Both heaps and B-trees can span multiple allocation units.

Buffer management

SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-memory, and the set of all pages currently buffered is called the buffer cache. The amount of memory available to SQL Server decides how many pages will be cached in memory. The buffer cache is managed by the Buffer Manager. Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version. The page is updated on the disc by the Buffer Manager only if the in-memory cache has not been referenced for some time. While writing pages back to disc, asynchronous I/O is used whereby the I/O operation is done in a background thread so that other operations do not have to wait for the I/O operation to complete. Each page is written along with its checksum when it is written. When reading the page back, its checksum is computed again and matched with the stored version to ensure the page has not been damaged or tampered with in the mean time.

Logging and Transaction

SQL Server ensures that any change to the data is ACID-compliant, i.e., it uses transactions to ensure that any operation either totally completes or is undone if fails, but never leave the database in an intermediate state. Using transactions, a sequence of actions can be grouped together, with the guarantee that either all actions will succeed or none will. SQL Server implements transactions using a write-ahead log. Any changes made to any page will update the in-memory cache of the page, simultaneously all the operations performed will be written to a log, along with the transaction ID which the operation was a part of. Each log entry is identified by an increasing Log Sequence Number (LSN) which ensure that no event overwrites another. SQL Server ensures that the log will be written onto the disc before the actual page is written back. This enables SQL Server to ensure integrity of the data, even if the system fails. If both the log and the page were written before the failure, the entire data is on persistent storage and integrity is ensured. If only the log was written (the page was either not written or not written completely), then the actions can be read from the log and repeated to restore integrity. If the log wasn't written, then also the integrity is maintained, even though the database is in a state when the transaction as if never occurred. If it was only partially written, then the actions associated with the unfinished transaction are discarded. Since the log was only partially written, the page is guaranteed to have not been written, again ensuring data integrity. Removing the unfinished log entries effectively undoes the transaction. SQL Server ensures consistency between the log and the data every time an instance is restarted.

Concurrency and locking

SQL Server allows multiple clients to use the same database concurrently. As such, it needs to control concurrent access to shared data, to ensure data integrity - when multiple clients update the same data, or clients attempt to read data that is in the process of being changed by another client. SQL Server provides two modes of concurrency control: pessimistic concurrency and optimistic concurrency. When pessimistic concurrency control is being used, SQL Server controls concurrent access by using locks. Locks can be either shared or exclusive. Exclusive lock grants the user exclusive access to the data - no other user can access the data as long as the lock is held. Shared locks are used when some data is being read - multiple users can read from data locked with a shared lock, but not acquire an exclusive lock. The latter would have to wait for all shared locks to be released. Locks can be applied on different levels of granularity - on entire tables, pages, or even on a per-row basis on tables. For indexes, it can either be on the entire index or on index leaves. The level of granularity to be used is defined on a per-database basis by the database administrator. While a fine grained locking system allows more users to use the table or index simultaneously, it requires more resources. So it does not automatically turn into higher performing solution. SQL Server also includes two more lightweight mutual exclusion solutions - latches and spinlocks - which are less robust than locks but are less resource intensive. SQL Server uses them for DMVs and other resources that are usually not busy. SQL Server also monitors all worker threads that acquire locks to ensure that they do not end up in deadlocks - in case they do, SQL Server takes remedial measures, which in many cases is to kill one of the threads entangled in a deadlock and rollback the transaction it started. To implement locking, SQL Server contains the Lock Manager. The Lock Manager maintains an in-memory table that manages the database objects and locks, if any, on them along with other metadata about the lock. Access to any shared object is mediated by the lock manager, which either grants access to the resource or blocks it.

SQL Server also provides the optimistic concurrency control mechanism, which is similar to the multiversion concurrency control used in other databases. The mechanism allows a new version of a row to be created whenever the row is updated, as opposed to overwriting the row, i.e., a row is additionally identified by the ID of the transaction that created the version of the row. Both the old as well as the new versions of the row are stored and maintained, though the old versions are moved out of the database into a system database identified as Tempdb. When a row is in the process of being updated, any other requests are not blocked (unlike locking) but are executed on the older version of the row. If the other request is an update statement, it will result in two different versions of the rows - both of them will be stored by the database, identified by their respective transaction IDs.

Data retrieval

The main mode of retrieving data from an SQL Server database is querying for it. The query is expressed using a variant of SQL called T-SQL, a dialect Microsoft SQL Server shares with Sybase SQL Server due to its legacy. The query declaratively specifies what is to be retrieved. It is processed by the query processor, which figures out the sequence of steps that will be necessary to retrieve the requested data. The sequence of actions necessary to execute a query is called a query plan. There might be multiple ways to process the same query. For example, for a query that contains a join statement and a select statement, executing join on both the tables and then executing select on the results would give the same result as selecting from each table and then executing the join, but result in different execution plans. In such case, SQL Server chooses the plan that is supposed to yield the results in the shortest possible time. This is called query optimization and is performed by the query processor itself.

SQL Server includes a cost-based query optimizer which tries to optimize on the cost, in terms of the resources it will take to execute the query. Given a query, the query optimizer looks at the database schema, the database statistics and the system load at that time. It then decides which sequence to access the tables referred in the query, which sequence to execute the operations and what access method to be used to access the tables. For example, if the table has an associated index, whether the index should be used or not - if the index is on a column which is not unique for most of the columns (low "selectivity"), it might not be worthwhile to use the index to access the data. Finally, it decides whether to execute the query concurrently or not. While a concurrent execution is more costly in terms of total processor time, because the execution is actually split to different processors might mean it will execute faster. Once a query plan is generated for a query, it is temporarily cached. For further invocations of the same query, the cached plan is used. Unused plans are discarded after some time.

SQL Server also allows stored procedures to be defined. Stored procedures are parameterized T-SQL queries, that are stored in the server itself (and not issued by the client application as is the case with general queries). Stored procedures can accept values sent by the client as input parameters, and send back results as output parameters. They can also call other stored procedures, and can be selectively provided access to. Unlike other queries, stored procedures have an associated name, which is used at runtime to resolve into the actual queries. Also because the code need not be sent from the client every time (as it can be accessed by name), it reduces network traffic and somewhat improves performance. Execution plans for stored procedures are also cached as necessary.

SQL CLR

Microsoft SQL Server 2005 includes a component named SQL CLR via which it integrates with .NET Framework. Unlike most other applications that leverage .NET Framework functionality, SQL Server itself hosts the .NET Framework runtime, i.e., memory, threading and resource management requirements of .NET Framework are satisfied by SQLOS itself, rather than the underlying Windows operating system. SQLOS provides deadlock detection and resolution services for .NET code as well. With SQL CLR, stored procedures and triggers can be written in any managed .NET language, including C# and VB.NET. Managed code can also be used to define UDTs which can be persisted in the database. Managed code is compiled to .NET assemblies and after being verified for type safety, registered at the database. After that, they can be invoked like any other procedure.[9] However, only a subset of the Base Class Library is available, when running code under SQL CLR. Most APIs relating to user interface functionality are not available.

When writing code for SQL CLR, data stored in SQL Server databases can be accessed using the ADO.NET APIs like any other managed application that accesses SQL Server data. However, doing that creates a new database session, different from the one in which the code is executing. To avoid this, SQL Server provides some enhancements to the ADO.NET provider that allows the connection to be redirected to the same session which already hosts the running code. Such connections are called context connections and are set by setting context connection parameter to true in the connection string. SQL Server also provides several other enhancements to the ADO.NET API, including classes to work with tabular data or a single row of data as well as classes to work with internal metadata about the data stored in the database. It also provides access to the XML features in SQL Server, including XQuery support. These enhancements are available only when running code under SQL CLR, and not available generally.

Services

SQL Server also includes an assortment of add-on services. While these are not essential for the operation for the database system, these provide value added services on top of the core database management system. These services either run as a part of some SQL Server component or out-of-process as Windows Service and presents their own API to control and interact with them.

Service Broker

The Service Broker, which runs as a part of the database engine, provides a reliable messaging and message queuing platform for SQL Server applications. Used inside an instance, it is used to provide an asynchronous programming environment. For cross instance applications, Service Broker communicates over TCP/IP and allows the different components to be synchronized together, via exchange of messages.

Replication Services

SQL Server Replication Services are used by SQL Server to replicate and synchronize database objects, either in entirety or a subset of the objects present, across replication agents, which might be other database servers across the network, or database caches on the client side. Replication follows a publisher/subscriber model, i.e., the changes are sent out by one database server ("publisher") and are received by others ("subscribers"). SQL Server supports three different types of replication:

Transaction replication

Each transaction made to the publisher database (master database) is synced out to subscribers, who update their databases with the transaction. Transactional replication synchronizes databases in near real time.

Merge replication

Changes made at both the publisher and subscriber databases are tracked, and periodically the changes are synchronized bi-directionally between the publisher and the subscribers. If the same data has been modified differently in both the publisher and the subscriber databases, synchronization will result in a conflict which has to be resolved - either manually or by using pre-defined policies.

Snapshot replication

Snapshot replication published a copy of the entire database (the then-snapshot of the data) and replicates out to the subscribers. Further changes to the snapshot are not tracked.

Analysis Services

SQL Server Analysis Services adds OLAP and data mining capabilities for SQL Server databases. The OLAP engine supports MOLAP, ROLAP and HOLAP storage modes for data. Analysis Services supports the XML for Analysis standard as the underlying communication protocol. The cube data can be accessed using MDX queries.[17] Data mining specific functionality is exposed via the DMX query language. Analysis Services includes various algorithms - Decision trees, clustering algorithm, Naive Bayes algorithm, time series analysis, sequence clustering algorithm, linear and logistic regression analysis, and neural networks - for use in data mining.

Reporting Services

SQL Server Reporting Services is a report generation environment for data gathered from SQL Server databases. It is administered via a web interface. Reporting services features a web services interface to support the development of custom reporting applications. Reports are created as RDL files.

Reports can be designed using recent versions of Microsoft Visual Studio(including Visual Studio.NET 2003 onwards) with Business Intelligence Development Studio, installed or with the included Report Builder. Once created, RDL files can be rendered in a variety of formats including Excel, PDF, CSV, XML, TIFF (and other image formats), and HTML Web Archive.

Notification Services

SQL Server Notification Services is a platform for generating notifications, which are sent to Notification Services subscribers. A subscriber registers for a specific event or transaction (which is registered on the database server as a trigger); when the event occurs, Notification Services uses Service Broker to send a message to the subscriber informing about the occurrence of the event.

Integration Services

SQL Server Integration Services is used to integrate data from different data sources. It is used for the ETL capabilities for SQL Server for data warehousing needs. Integration Services includes GUI tools to build data extraction workflows integration various functionality such as extracting data from various sources, querying data, transforming data including aggregating, duplication and merging data, and then loading the transformed data onto other sources, or sending e-mails detailing the status of the operation.

Full Text Search Service

The SQL Server Full Text Search service architecture

The SQL Server Full Text Search service architecture

SQL Server Full Text Search service is a specialized indexing and querying service for unstructured text stored in SQL Server databases. The full text search index can be created on any column with character based text data. It allows for words to be searched for in the text columns. While it can be performed with the SQL LIKE operator, using SQL Server Full Text Search service can be more efficient. Full Text Search (FTS) allows for inexact matching of the source string, indicated by a Rank value which can range from 0 to 1000 - a higher rank means a more accurate match. It also allows linguistic matching ("inflectional search"), i.e., linguistic variants of a word (such as a verb in a different tense) will also be a match for a given word (but with a lower rank than an exact match). Proximity searches are also supported, i.e., if the words searched for do not occur in the sequence they are specified in the query but are near each other, they are also considered a match. T-SQL exposes special operators that can be used to access the FTS capabilities.

The Full Text Search engine is divided into two processes - the Filter Daemon process (msftefd.exe) and the Search process (msftesql.exe). These processes interact with the SQL Server. The Search process includes the indexer (that creates the full text indexes) and the full text query processor. The indexer scans through text columns in the database. It can also index through binary columns, and use iFilters to extract meaningful text from the binary blob (for example, when a Microsoft Word document is stored as an unstructured binary file in a database). The iFilters are hosted by the Filter Daemon process. Once the text is extracted, the Filter Daemon process breaks it up into a sequence of words and hands it over to the indexer. The indexer filters out noise words, i.e., words like A, And etc, which occur frequently and are not useful for search. With the remaining words, an inverted index is created, associating each word with the columns they were found in. SQL Server itself includes a Gatherer component that monitors changes to tables and invokes the indexer in case of updates.

When a full text query is received by the SQL Server query processor, it is handed over to the FTS query processor in the Search process. The FTS query processor breaks up the query into the constituent words, filters out the noise words, and uses an inbuilt thesaurus to find out the linguistic variants for each word. The words are then queried against the inverted index and a rank of their accurateness is computed. The results are returned to the client via the SQL Server process.

History

Genesis

SQL Server Release History

Version

Year

Release Name

Codename

1.0
(OS/2)

1989

SQL Server 1.0

-

4.21
(WinNT)

1993

SQL Server 4.21

-

6.0

1995

SQL Server 6.0

SQL95

6.5

1996

SQL Server 6.5

Hydra

7.0

1998

SQL Server 7.0

Sphinx

-

1999

SQL Server 7.0
OLAP Tools

Plato

8.0

2000

SQL Server 2000

Shiloh

8.0

2003

SQL Server 2000
64-bit Edition

Liberty

9.0

2005

SQL Server 2005

Yukon

10.0

2008?

SQL Server 2008

Katmai

The code base for MS SQL Server (prior to version 7.0) originated in Sybase SQL Server, and was Microsoft's entry to the enterprise-level database market, competing against Oracle, IBM, and, later, Sybase itself. Microsoft, Sybase and Ashton-Tate originally teamed up to create and market the first version named SQL Server 1.0 for OS/2 (about 1989) which was essentially the same as Sybase SQL Server 3.0 on Unix, VMS, etc. Microsoft SQL Server 4.2 was shipped around 1992 (available bundled with Microsoft OS/2 version 1.3). Later Microsoft SQL Server 4.21 for Windows NT was released at the same time as Windows NT 3.1. Microsoft SQL Server v6.0 was the first version designed for NT, and did not include any direction from Sybase.

About the time Windows NT was released, Sybase and Microsoft parted ways and pursued their own design and marketing schemes. Microsoft negotiated exclusive rights to all versions of SQL Server written for Microsoft operating systems. Later, Sybase changed the name of its product to Adaptive Server Enterprise to avoid confusion with Microsoft SQL Server. Until 1994 Microsoft's SQL Server carried three Sybase copyright notices as an indication of its origin.

Since parting ways, several revisions have been done independently. SQL Server 7.0 was the first true GUI based database server and was a rewrite from the legacy Sybase code. It was succeeded by SQL Server 2000, which was the first edition to be launched in a variant for the IA-64 architecture.

In the six years since release of Microsoft's previous SQL Server product (SQL Server 2000), advancements have been made in performance, the client IDE tools, and several complementary systems that are packaged with SQL Server 2005. These include: an ETL tool (SQL Server Integration Services or SSIS), a Reporting Server, an OLAP and data mining server (Analysis Services), and several messaging technologies, specifically Service Broker and Notification Services.

SQL Server 2005

SQL Server 2005, released in November 2005, is the successor to SQL Server 2000. It included native support for managing XML data, in addition to relational data. For this purpose, it defined an xml data type that could be used either as a data type in database columns or as literals in queries. XML columns can be associated with XSD schemas; XML data being stored is verified against the schema. XML is converted to an internal binary data type before being stored in the database. Specialized indexing methods were made available for XML data. XML data is queried using XQuery; SQL Server 2005 added some extensions to the T-SQL language to allow embedding XQuery queries in T-SQL. In addition, it also defines a new extension to XQuery, called XML DML, that allows query-based modifications to XML data. SQL Server 2005 also allows a database server to be exposed over web services using TDS packets encapsulated within SOAP requests. When the data is access over web services, results are returned as XML.

For relational data, T-SQL has been augmented with error handling features and support for recursive queries. SQL Server 2005 has also been enhanced with new indexing algorithms and better error recovery systems. Data pages are checksummed for better error resiliency, and optimistic concurrency support has been added for better performance. Permissions and access control have been made more granular and the query processor handles concurrent execution of queries in a more efficient way. Partitions on tables and indexes are supported natively, so scaling out a database onto a cluster is easier. SQL CLR was introduced with SQL Server 2005 to let it integrate with the .NET Framework.

SQL Server 2008

The next version of SQL Server is SQL Server 2008, code-named "Katmai", slated to launch on February 27, 2008 and release (RTM) in Q3 2008.The most recent CTP was made available on February 19, 2008. SQL Server 2008 aims to make data management self-tuning, self organizing, and self maintaining with the development of SQL Server Always On technologies, to provide near-zero downtime. SQL Server 2008 will also include support for structured and semi-structured data, including digital media formats for pictures, audio, video and other multimedia data. In current versions, such multimedia data can be stored as BLOBs (binary large objects), but they are generic bitstreams. Intrinsic awareness of multimedia data will allow specialized functions to be performed on them. According to Paul Flessner, senior Vice President, Server Applications, Microsoft Corp., SQL Server 2008 can be a data storage backend for different varieties of data: XML, email, time/calendar, file, document, spatial, etc as well as perform search, query, analysis, sharing, and synchronization across all data types.

Other new data types include specialized date and time types and a Spatial data type for location-dependent data.[34] Better support for unstructured and semi-structured data is provided using the FILESTREAM[35] data type has been added, which can be used to reference any file stored on the file system.[36] Structured data and metadata about the file is stored in SQL Server database, whereas the unstructured component is stored in the file system. Such files can be accessed both via Win32 file handling APIs as well as via SQL Server using T-SQL; doing the latter accesses the file data as a binary BLOB. Backing up and restoring the database backs up or restores the referenced files as well. SQL Server 2008 also natively supports hierarchical data, and included T-SQL constructs to directly deal with them, without using recursive queries.

Spatial data will be stored in two types. A "Flat Earth" (GEOMETRY or planar) data type represents geospatial data which has been projected from its native, spherical, coordinate system into a plane A "Round Earth" data type (GEOGRAPHY) uses an ellipsoidal model in which the Earth is defined as a single continuous entity which does not suffer from the singularities such as the international dateline, poles, or map projection zone "edges". Approximately 70 methods will be available when SQL Server "Katmai" ships to represent spatial operations for the Open Geospatial Consortium Simple Features for SQL, Version 1.1

SQL Server includes better compression features, which also helps in improving scalability. It also includes Resource Governor that allows reserving resources for certain users or workflows. It will also include capabilities for transparent encryption of data as well as compression of backups. SQL Server Katmai will support the ADO.NET Entity Framework and the reporting tools, replication, and data definition will be build around the Entity Data Model.[39] SQL Server Reporting Services will gain charting capabilities from the integration of the data visualization products from Dundas Data Visualization Inc., which was acquired by Microsoft. On the management side, SQL Server 2008 will include the Declarative Management Framework which allows configuring policies and constraints, on the entire database or certain tables, declaratively. The version of SQL Server Management Studio included with SQL Server 2008 supports IntelliSense for SQL queries. However, with the current CTP, it is limited to SELECT queries; it will be expanded to other T-SQL constructs over later releases. SQL Server 2008 will also make the databases available via Windows PowerShell providers and management functionality available as Cmdlets, so that the server and all the running instances can be managed from Windows PowerShell.

Editions

Microsoft makes SQL Server available in multiple versions, with different feature sets and targeting different users. These versions are:

SQL Server Compact Edition (SQL CE)

The compact edition is an embedded database engine. Due to its small size (2MB DLL footprint), it has a slightly reduced feature set compared to the other editions. It is limited to 4GB maximum database size and cannot be run as a Windows service, Compact Edition must be hosted by the application using it.

SQL Server Express Edition

SQL Server Express Edition is a scaled down, free edition of SQL Server, which includes the core database engine. While there are no limitations on the number of databases or users supported, it is limited to using one processor, 1 GB memory and 4 GB database files. The entire database is stored in a single .mdf file, and thus making it suitable for XCOPY deployment. It is intended as a replacement for MSDE. The Full-Text Search Service is made available as a free add-on to SQL Server Express edition as well. In addition, a stripped down version of SQL Server Management Studio is also available for this edition.

SQL Server Workgroup Edition

SQL Server Workgroup Edition includes the core database functionality but does not include the additional services.

SQL Server Standard Edition

SQL Server Standard edition includes the core database engine, along with the stand-alone services. It differs from Enterprise edition in that it supports less number of active instances (number of nodes in a cluster) and does not include some high-availability functions such as hot-add memory (allowing memory to be added while the server is still running), and parallel indexes.

SQL Server Enterprise Edition

SQL Server Enterprise Edition is the full-featured version of SQL Server, including both the core database engine and add-on services, while including a range of tools for creating and managing an SQL Server cluster.

SQL Server Developer Edition

SQL Server Developer Edition includes the same features as SQL Server Enterprise Edition, but is limited by the license to be only used as a development and test system, and not as production server. This version is available to download by students free of charge as a part of Microsoft's DreamSpark program.

Tools

SQLCMD

SQLCMD is a command line application that comes with Microsoft SQL Server, and exposes the management features of SQL Server. It allows SQL queries to be written and executed from the command prompt. It can also act as a scripting language to create and run a set of SQL statements as a script. Such scripts are stored as a .sql file, and are used either for management of databases or to create the database schema during the deployment of a database.

Visual Studio

Microsoft Visual Studio includes native support for data programming with Microsoft SQL Server. It can be used to write and debug code to be executed by SQL CLR. It also includes a data designer that can be used to graphically create, view or edit database schemas. Queries can be created either visually or using code. SSMS 2008 onwards, provides intellisense for SQL queries as well.

SQL Server Management Studio

Microsoft SQL Server Management Studio showing a query, the results of the query, and the Object Explorer pane while connected to a SQL Server database engine instance.

Microsoft SQL Server Management Studio showing a query, the results of the query, and the Object Explorer pane while connected to a SQL Server database engine instance.

SQL Server Management Studio is a GUI tool included with SQL Server 2005 and later for configuring, managing, and administering all components within Microsoft SQL Server. The tool includes both script editors and graphical tools which work with objects and features of the server. SQL Server Management Studio replaces Enterprise Manager as the primary management interface for Microsoft SQL Server since SQL Server 2005. A version of SQL Server Management Studio is also available for SQL Server Express Edition, for which it is known as SQL Server Management Studio Express (SSMSE).

A central feature of SQL Server Management Studio is the Object Explorer, which allows the user to browse, select, and act upon any of the objects within the server. It can be used to visually observe and analyze query plans and optimize the database performance, among others. SQL Server Management Studio can also be used to create a new database, alter any existing database schema by adding or modifying tables and indexes or analyze performance. It includes the query windows which provides a GUI based interface to write and execute queries.

Business Intelligence Development Studio

Business Intelligence Development Studio (BIDS) is the IDE from Microsoft used for developing data analysis and Business Intelligence solutions utilizing the Microsoft SQL Server Analysis Services, Reporting Services and Integration Services. It is based on the Microsoft Visual Studio development environment but customizes with the SQL Server services-specific extensions and project types, including tools, controls and projects for reports (using Reporting Services), Cubes and data mining structures (using Analysis Services).

Programmability

T-SQL

T-SQL (Transact-SQL) is the primary means of programming and managing SQL Server. It exposes keywords for the operations that can be performed on SQL Server, including creating and altering database schemas, entering and editing data in the database as well as monitoring and managing the server itself. Client applications, both which consume data or manage the server, leverage SQL Server functionality by sending T-SQL queries and statements which are then processed by the server and results (or errors) returned to the client application. SQL Server allows it to be managed using T-SQL. For this it exposes read only tables from which server statistics can be read. Management functionality is exposed via system-defined stored procedures which can be invoked from T-SQL queries to perform the management operation.

SQL Native Client

SQL Native Client is the native client side data access library for Microsoft SQL Server, version 2005 onwards. It natively implements support for the SQL Server features including the TDS implementation, support for mirrored SQL Server databases, full support for all data types supported by SQL Server, asynchronous operations, query notifications, encryption support, as well as receiving multiple result sets in a single database session. SQL Native Client is used under the hood by SQL Server plug-ins for other data access technologies, including ADO or OLE DB. It is also used by the native SQL Server ADO.NET provider, SqlClient. The SQL Native Client can also be directly used, bypassing the generic data access layers.

Transact-SQL

Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary extension to the SQL language. Microsoft's implementation ships in the Microsoft SQL Server product. Sybase uses the language in its Adaptive Server Enterprise, the successor to Sybase SQL Server.

Transact-SQL enhances SQL with these additional features:

Flow control

Keywords for flow control in Transact-SQL include BEGIN and END, BREAK, CONTINUE, GOTO, IF and ELSE, RETURN, WAITFOR, and WHILE.

IF and ELSE allow conditional execution. This batch statement will print "weekend" if the current date is a weekend day, or "weekday" if the current date is a weekday.

IF DATEPART(dw, GETDATE()) = 7 OR DATEPART(dw, GETDATE()) = 1
   PRINT 'It is the weekend.'
ELSE
   PRINT 'It is a weekday.'


BEGIN and END mark a block of statements. If more than one statement is to be controlled by the conditional in the example above, we can use BEGIN and END like this:

IF DATEPART(dw, GETDATE()) = 7 OR DATEPART(dw, GETDATE()) = 1
BEGIN
   PRINT 'It is the weekend.'
   PRINT 'Get some rest!'
END
ELSE
BEGIN
   PRINT 'It is a weekday.'
   PRINT 'Get to work!'
END

WAITFOR will wait for a given amount of time, or until a particular time of day. The statement can be used for delays or to block execution until the set time.

RETURN is used to immediately return from a stored procedure or function.

BREAK ends the enclosing WHILE loop, while CONTINUE causes the next iteration of the loop to execute. An example of a WHILE loop is given below.

Local variables

Local variables are so named because they're local to the script executing them. Transact SQL doesn't support user-defined global variables.

DECLARE will declare a variable, giving it a name and a type. The SET statement can be used to provide a value, and the variable may be used in a statement by referencing its name.

This script declares a variable as an integer, initializes it, then uses WHILE to execute a loop.

DECLARE @Counter INT
SET @Counter = 10
WHILE @Counter > 0
BEGIN
   PRINT 'The count is ' + CONVERT(VARCHAR(10), @Counter)
   SET @Counter = @Counter - 1
END

The body of the loop will print a message including the value of the variable, and then decrement the counter.

A variable may be initialized as the result of a statement, like this:

DECLARE @ArticleCount INT
SELECT @ArticleCount = COUNT(*) FROM Articles
 
INSERT INTO SizeLog (SampleTime, ArticleCount) VALUES (GETDATE(), @ArticleCount)

which will get the count of rows in the Articles table, then insert a row including that count and the current clock time into the SizeLog table.

Changes to DELETE and UPDATE statements

In Transact-SQL, both the DELETE and UPDATE statements allow a FROM clause to be added which allows joins to be included.

This example deletes all users who have been flagged with the 'Idle' flag.

DELETE users 
  FROM users as u
  JOIN user_flags as f
    ON u.id=f.id
 WHERE f.name = 'Idle'

BULK INSERT

BULK INSERT is a Transact-SQL statement which implements a bulk data-loading process, inserting multiple rows into a table, reading data from an external sequential file. Use of BULK INSERT results in better performance than processes which issue individual INSERT statements for each row to be added.

No comments: