Section 1

Preview this deck

How is a table stored if a clustered index has not been defined on the table?

Front

Star 0%
Star 0%
Star 0%
Star 0%
Star 0%

0.0

0 reviews

5
0
4
0
3
0
2
0
1
0

Active users

0

All-time users

0

Favorites

0

Last updated

2 years ago

Date created

Mar 1, 2020

Cards (27)

Section 1

(27 cards)

How is a table stored if a clustered index has not been defined on the table?

Front

SQL Server essentially supports two types of tables: a clustered table, one on which a clustered index has been defined, and a heap table, or just plain heap. Unlike a clustered table, data within a heap is not ordered in any way. It is essentially a pile of data. If you add a row to the table, the database engine simply tacks it at the end of the page. When the page fills, data is added to a new page. In most cases, you'll want to create a clustered index on a table to take advantage of the sorting capabilities and query benefits they can deliver. (Consider what it would be like to find a number in a phone book if it were not sorted in any way.) However, if you choose not to create a clustered index, you can still create non-clustered indexes on the heap. In such cases, each row in the index includes a pointer that identifies the row being referenced in the heap. The pointer includes the data file ID, page number, and row number for the targeted data.

Back

What is Microsoft DTS?

Front

Microsoft SQL Server Data Transformation Services (DTS) is a set of graphical tools and programmable objects that lets you extract, transform, and consolidate data from disparate sources into single or multiple destinations.

Back

Explain Consolidation in the context of BI?

Front

Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions.

Back

What is the difference between a Primary Key and a Unique Key constraint?

Front

• Primary key: - Doesn't allow null values. - By default it adds a clustered index - A table can have only 1 primary key - Are the target of foreign key constraints. • Unique key: - Allows null value, but only one key can be null. - By default it adds a unique non-clustered index. - You can define multiple unique keys.

Back

What does OLAP consists of?

Front

Consolidation (Roll-up), Drill down, slicing and dicing.

Back

Why would you use a covering index instead of a composite index?

Front

First, let's make sure we understand the differences between them. A composite index is simply one in which you include more than one key column. Multiple key columns can be useful for uniquely identifying a row, as can be the case when a unique cluster is defined on a primary key, or you're trying to optimize an often-used query that references multiple columns. In general, however, the more key columns an index contains, the less efficient that index, which means composite indexes should be used judiciously. That said, there are times when a query would benefit greatly if all the referenced columns were located on the same leaf nodes as the index. This is not an issue for clustered indexes because all data is already three. (That's why it's so important to give plenty of thought to how you create your clustered indexes.) But a nonclustered index includes only the key column values in the leaf nodes. For all other data, the optimizer must take additional steps to retrieve that data from elsewhere, which could represent significant overhead for your most common queries. That's where the covering index comes in. When defining your non-clustered index, you can include columns in addition to the key columns. For example, suppose one of your application's primary queries retrieves data from both the OrderID and OrderDate columns in the Sales table: SELECT OrderID, OrderDate FROM Sales WHERE OrderID = 12345; You can create a composite non-clustered index on both columns, but the OrderDate column only adds overhead to the index and serves no purpose as a key column. A better solution is to create a covering index with OrderID as the key column and OrderDate as the included column: CREATE NON-CLUSTERED INDEX ix_orderid ON dbo.Sales(OrderID) INCLUDE (OrderDate); This way, you avoid the disadvantages of indexing a column unnecessarily, while still benefiting your query. The included column is not part of the key, but the data is still stored in the leaf nodes. This can improve performance without incurring more overhead. Plus, there are fewer restrictions on columns used as included columns, compared to those used as key columns.

Back

Given the many benefits of clustered tables, why even bother with heaps?

Front

Clustered tables are great, and most of your queries will probably perform best of your tables are configured with clustered indexes. But in some cases you might want to leave the table in its natural state, as a heap, and create only non-clustered index to support your queries. A heap, as you'll recall, stores data in an unspecified order. Normally, the database engine adds the data in the order the rows are inserted into the table, although the engine likes to move rows around on occasion to store them more efficiently. As a result, you have no way to predict how the data will be ordered. If the query engine must find data without the benefit of a nonclustered index, it does a full table scan to locate the target rows. On a very small table, this is usually not a big deal, but as a heap grows in size, performance is likely to quickly degrade. A nonclustered index can help by using a pointer that directs the query q u e s t i o n s a n d a n s w e r s engine to the file, page, and row where the data is stored— normally a far better alternative to a table scan. Even so, it's still hard to beat the benefits of a clustered index when weighing query performance. Yet heaps can help improve performance in certain situations. Consider the table that has a lot of insert activity, but few updates and deletes, if any. For example, a table that stores log data is likely restricted mostly to insert operations, until perhaps the data is archived. On a heap, you won't see the type of page splits and fragmentation you would with a clustered index (depending on the key columns) because rows are simply added to the end of the heap. Too much page splitting can have a significant effect on performance, and not in a good way. In general, heaps make insert operations relatively painless, and you don't have to contend with the storage or maintenance overhead you find with clustered indexes. But the lack of updates and deletions should not be the only considerations. The way in which data is retrieved is also an important factor. For example, you should not use a heap if you frequently query ranges of data or the queried data must often be sorted or grouped. What all this means is that you should consider using a heap only when you're working with ultra-light tables or your DML operations are limited to inserts and your queries are fairly basic (and you're still using non-clustered indexes). Otherwise, stick with a well-designed clustered index. One defined on a simple ascending key, such as the ubiquitous IDENTITY column.

Back

Can you create a non-clustered index on a subset of data in your key column?

Front

By default, a non-clustered index contains one row for every row in the table. Of course, you can say the same about a clustered index, given that the index is the table, but in terms of a nonclustered index, the one-to-one relationship is an important concept because, since SQL Server 2008, you've been able to create filtered indexes that limit the rows included in the index. A filtered index can improve query performance because it is smaller and includes filtered statistics, which are more accurate than full-table statistics, resulting in better execution plans. A filtered index also reduces storage and maintenance costs. The index is updated only when the applicable underlying data changes. Be aware, however, that SQL Server places a number of restrictions on filtered indexes, such as not being able to create a filtered index on a view, so be sure to check out the SQL Server documentation.

Back

What does denormalization mean?

Front

Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, denormalization is a means of addressing performance or scalability in relational database software. Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence

Back

What are the types of indexes in SQL Server?

Front

1. Hash With a hash index, data is accessed through an in-memory hash table. Hash indexes consume a fixed amount of memory, which is a function of the bucket count. 2. Memory-optimized non-clustered indexes For memory-optimized non-clustered indexes, memory consumption is a function of the row count and the size of the index key columns. 3. Clustered A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values. 4. Non-clustered A non-clustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the non-clustered index contains the non-clustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table. 5. Unique A unique index ensures that the index key contains no duplicate values and therefore every row in the table or view is in some way unique. Uniqueness can be a property of both clustered and nonclustered indexes. 6. Columnstore An in-memory columnstore index stores and manages data by using column-based data storage and column-based query processing. Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size. 7. Index with included columns A non-clustered index that is extended to include nonkey columns in addition to the key columns. 8. Index on computed columns An index on a column that is derived from the value of one or more other columns, or certain deterministic inputs. 9. Filtered An optimized non-clustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A welldesigned filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes. 10. Spatial A spatial index provides the ability to perform certain operations more efficiently on spatial objects (spatial data) in a column of the geometry data type. The spatial index reduces the number of objects on which relatively costly spatial operations need to be applied. 11. XML A shredded, and persisted, representation of the XML binary large objects (BLOBs) in the xml data type column. 12. Full-text A special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string.

Back

Does a clustered index have to be created on the primary key column?

Front

You can create a clustered index on any qualified columns. True, a clustered index and primary key constraint is usually a match made in heaven, so well suited in fact that when you define a primary key, a clustered index is automatically created, if one doesn't already exist. Still, you might decide that the clustered index would be better matched elsewhere, and often your decision would be justified. The main purpose of a clustered index is to sort all the rows in your table, based on the key columns in your index definition, and provide quick and easy access to the table's data. The table's primary key can be a good choice because it uniquely identifies every row in the table, without the need for additional data. In some cases, a surrogate primary key can be an even better choice because, in addition to being unique, the values are small and added sequentially, making the non-clustered indexes that reference those values more efficient as well. The query optimizer also loves such an arrangement because joins can be processed faster, as can queries that in some other way reference the primary key and its associated clustered index. As I said, a match made in heaven. In the end, however, your clustered index should take into account a number of factors, such as how many non-clustered indexes will be pointing to the clustered index, how often the clustered key values will change and how large those key columns are. When the values in a clustered index change or the index doesn't perform well, all of the other tables' indexes can be impacted. A clustered index should be based on relatively stable columns that grow in an orderly fashion, as opposed to growing randomly. The index should also support the queries most commonly accessing the table's data so they can take full advantage of the data being sorted and available in the leaf nodes. If the primary key fits this scenario, then use it. Otherwise, use a different set of columns.

Back

What is a FACT table in data warehousing?

Front

A fact table is the central table in a "star schema" of a data warehouse. A fact table stores quantitative information for analysis and is often denormalized. A fact table is mainly made up of Foreign key column which references to various dimension and numeric measure values on which aggregation will be performed. Fact tables are of different types, E.g. Transactional, Cumulative and Snapshot.

Back

Why can't a table have two clustered indexes?

Front

The short answer? A clustered index is the table. When you define a clustered index on a table, the database engine sorts all the rows in the table, in ascending or descending order, based on the columns identified in the index definition (the key columns). The clustered index is not a separate entity like it is with other index types, but rather a mechanism for sorting the table and facilitating quick data access.

Back

If indexes are so great, why not just create them on every column?

Front

No good deed goes unpunished. At least that's how it works with indexes. Sure, they're great as long as all you run are SELECT statements against the database, but throw in a lot of INSERT, UPDATE, and DELETE statements, and the landscape quickly changes. When you issue a SELECT statement, the query engine finds the index, navigates the B-tree structure, and locates the desired data. What could be simpler? But that all changes if you issue a data modification statement, such as UPDATE. True, for the first part of the UPDATE operation, the query engine can again use the index to locate the row to be modified. That's the good news. And if it's a simple update and no key values are involved, chances are the process will be fairly painless. But if the update forces a page split or key values change and get moved to different nodes, the index might need to be reorganized, impacting other indexes and operations and resulting in slower performance all around. Same with a DELETE statement. An index can help locate the data to be deleted, but the deletion itself might result in page reshuffling. And as for the INSERT statement, it's the sworn enemy of all indexes. You start adding a lot of data and your indexes have to be modified and reorganized and everybody suffers. So the way in which your database is queried must be uppermost in your thinking when determining what sort of indexes to add and how many. More is not necessarily better. Before you throw another index at a table, consider the costs, not only on query performance, but also on disk space, index maintenance, and the domino effects on other operations. Your index strategy is one of the most important aspects of a successful database implementation and should take into account a number of considerations, from index size to the number of unique values to the type of queries being supported.

Back

What is the difference between DELETE, TRUNCATE and DROP in SQL Server?

Front

The DELETE command is used to remove rows from a table. A WHERE clause can be used to only remove some rows. If no WHERE condition is specified, all rows will be removed. After performing a DELETE operation, you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it. Note that this operation will cause all DELETE triggers on the table to fire. TRUNCATE removes all rows from a table. The operation cannot be rolled back and no triggers will be fired. As such, TRUCATE is faster and doesn't use as much undo space as a DELETE. The DROP command removes a table from the database. All the table rows, indexes and privileges will also be removed. No DML triggers will be fired. The operation cannot be rolled back.

Back

What does database normalization refer to?

Front

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships.

Back

What are the types of JOINS in SQL Server?

Front

1. Inner Join Inner join returns only those records/rows that match/exists in both the tables. 2. Left Outer Join A left outer join returns all records/rows from the left table and from the right table returns only matched records. If there are no columns matching in the right table, it returns NULL values. 3. Right Outer Join A right outer join returns all records/rows from the right table and from the left table returns only matched records. If there are no columns matching in the left table, it returns NULL values. 4. Full Outer Join A full outer join combines a left outer join and a right outer join. This join returns all records/rows from both the tables. If there are no columns matching in either table, it returns NULL values. 5. Self Join A self join is used to join a database table to itself, particularly when the table has a Foreign key that references its own Primary Key. Basically we have only three types of joins: Inner join, Outer join and Cross join. We use any of these three JOINS to join a table to itself. Hence, self join is not a "type" of sql join 6. Cross Join A cross join is a Cartesian join which means a Cartesian product of both the tables. This join does not need any condition to join two tables. This join returns records/rows that are a multiplication of records from both tables.

Back

How can an index improve performance if the query engine has to negotiate through all those index nodes?

Front

First off, indexes do not always improve performance. Too many of the wrong type of indexes can bog down a system and make query performance worse. That said, if indexes have been carefully implemented, they can provide a significant performance boost. Think of a big fat book about SQL Server performance tuning. Imagine that you want to find information about configuring the Resource Governor. You can thumb through the book one page at a time, or you can go to the index and find the exact page number where the information is located (assuming the book has been properly indexed). Undoubtedly, this could save you a considerable amount of time, despite the fact that you must refer to an entirely different structure (the index) to get the information you need from the primary structure (the book). Just like a book's index, a SQL Server index lets you perform targeted queries instead of scanning all of a table's data. For small tables, a full scan is usually not a big deal, but large tables spread across many data pages can result in excessively longrunning queries if no index exists to point the query engine in the right direction. Imagine being lost on the Los Angeles freeways at rush hour without a map, and you will get the idea.

Back

What are the types of SQL Server constraints?

Front

1. Unique Constraint: Column may not contain duplicate values 2. CHECK Constraint: Allow you to specify a value that the database will use to populate fields that are created and left blank 3. NOT NULL constraint: Allow you to specify that a column may not contain NULL values. 4. PRIMARY KEY constraint: specify fields that uniquely identify each record in the table. 5. FOREIGN KEY constraint: are fields in a relational database table that match the primary key column of another table. Foreign keys can be used to cross-reference tables.

Back

What are tuples in BI?

Front

A tuple uniquely identifies a slice of data from a cube. The tuple is formed by a combination of dimension members, as long as there are no two or more members that belong to the same hierarchy.

Back

If you index a view is it still a view?

Front

A view is a virtual table made up a data from one or more other tables. It is essentially a named query that retrieves the data from the underlying tables when you call that view. You can improve a view's performance by creating clustered and non-clustered indexes on that view, just like you create indexes on a table, the main caveat being that you must create a unique clustered index before you can create a non-clustered one. When creating an indexed view (also referred to as a materialized view), the view definition itself remains a separate entity. It is, after all, merely a hard-coded SELECT statement stored in the database. The indexes are a different story. Whether you create a clustered or non-clustered index, the data is persisted to disk, just like a regular index. In addition, when the data in the underlying tables change, the indexes are automatically updated (which means you might want to avoid indexing views where the underlying data changes frequently). In any case, the view itself still remains a view, but one that just happens to have indexes associated with it. Before you can create an index on a view, it must meet a number of restrictions. For example, the view can reference only base tables, not other views, and those tables must be within the same database. There are lots more restrictions, of course, so be sure to refer to the SQL Server documentation for all the sordid details.

Back

Does it matter how many duplicate values a key column contains?

Front

Whenever you create an index, you should try to minimize the number of duplicate values contained in your key columns, or more precisely, try to keep the ratio of duplicate values as low as possible, when compared to the entire set of values. If you're working with a composite index, that duplication refers to the key columns as a whole. The individual columns can contain lots of duplicates, but duplication across the columns should be at a minimum. For example, if you create a composite non-clustered index on the FirstName and LastName columns, you can have multiple John values and multiple Doe values, but you want to have as few John Doe values as possible, or better still, only one John Doe. The ratio of unique values within a key column is referred to as index selectivity. The more unique the values, the higher the selectivity, which means that a unique index has the highest possible selectivity. The query engine loves highly selective key columns, especially if those columns are referenced in the WHERE clause of your frequently run queries. The higher the selectivity, the faster the query engine can reduce the size of the result set. The flipside, of course, is that a column with relatively few unique values is seldom a good candidate to be indexed.

Back

Can you create a clustered index on a column with duplicate values?

Front

Yes and no. Yes, you can create a clustered index on key columns that contain duplicate values. No, the key columns cannot remain in a non-unique state. Let me explain. If you create a non-unique clustered index on a column, the database engine adds a fourbyte integer (uniquifier) to duplicate values to ensure their uniqueness and, subsequently, to provide a way to identify each row in the clustered table. For example, you might decide to create a clustered index on the LastName column of a table that contains customer data. The column includes the values Franklin, Hancock, Washington, and Smith. You then insert the values Adams, Hancock, Smith, and Smith. Because the values in the key column must ultimately be unique, the database engine will modify the duplicates so that the values look something like this: Adams, Franklin, Hancock, Hancock1234, Washington, Smith, Smith4567, and Smith5678. On the surface, this might seem an okay approach, but the integer increases the size of the key values, which could start becoming an issue if you have a lot of duplicate values and those values are being referenced by foreign keys and non-clustered indexes. For this reason, you should try to create unique clustered indexes whenever possible. If not possible, at least go for columns that have a high percentage of unique values.

Back

What is the relationship between unique and primary key constraints and a table's indexes?

Front

Primary key and unique constraints ensure that the values in the key columns are unique. You can define only one primary key on a table and it cannot contain null values. You can create multiple unique constraints on a table and each one can contain a single null value. When you create a primary key constraint, the database engine also creates a unique clustered index, if a clustered index doesn't already exist. However, you can override the default behavior and specify that a non-clustered index be created. If a clustered index does exist when you create the primary key, the database engine creates a unique non-clustered index. When you create a unique constraint, the database engine creates a unique non-clustered index. However, you can specify that a unique clustered index be created if a clustered index does not already exist. For all practical purposes, a unique constraint and unique index are one in the same.

Back

What are measure and measure groups in BI?

Front

A metrics value stored in your Fact Table is called a measure. Measures are used to analyze performance of the business. A measure usually contains numeric data, which can be aggregated against the usage of an associated dimensions. A measure group holds a collection of related measures.

Back

What does fill factor refer to?

Front

Fill factor is the value that determines the percentage of space on each leaf-level page to be filled with data. In SQL Server, the smallest unit is a page. Every page can store one or more rows based on the size of the row. The default value of the Fill factor is 100, which is same as a value of 0. The default Fill Factor (100 or 0) will allow the SQL Server to fill the leaf-level pages of an index with the maximum numbers of the rows it can fit. There will be no or very little empty space left in the page when the fill factor is 100.

Back

In SQL Server , why are clustered and non-clustered indexes considered B-tree indexes?

Front

A basic SQL Server index, whether clustered or non-clustered, is spread across a set of pages, referred to as the index nodes. These pages are organized into a hierarchical B-tree structure. At the top sits the root node, at the bottom, the leaf nodes, with intermediate nodes in between. The root node represents the main entry point for queries trying to locate data via the index. From that node, the query engine negotiates the hierarchy down to the appropriate leaf node, where the actual data resides. For example, suppose your query is looking for the row that contains the key value 82. The query engine starts at the root node, which points to the correct intermediate node, in this case, the 1-100 node. From the 1-100 node, the engine proceeds to the 51-100 node, and from there, goes to the 76-100 leaf node. If this is a clustered index, the leaf node will contain the entire row of data associated with the key value 82. If this is a non-clustered index, the leaf node will point to the clustered table or heap where the row exists.

Back