Friday, April 16, 2010

interview questions

(Q) What is a Database or Database Management System (DBMS)?
Twist: What is the difference between a file and a database? Can files qualify as a database?

Note: These questions are probably too basic for experienced SQL SERVER guys. But from a fresher's point of view, they can be the difference between getting a job and being jobless.

1. A database provides a systematic and organized way of storing, managing and retrieving a collection of logically related information.
2. Secondly, the information has to be persistent, which means that even after the application is closed the information should persist.
3. Finally, it should provide an independent way of accessing data and should not be dependent on the application to access the information.
Ok, let me spend a few more sentences on explaining the third aspect. Below is a simple figure of a text file that has personal detail information. The first column of the information is Name, second Address and finally Phone Number. This is a simple text file, which was designed by a programmer for a specific application.


Figure 1.1: Non-Uniform Text File
It works fine in the boundary of the application. Now, some years down the line a third party application has to be integrated with this file. In order for the third party application to be integrated properly, it has the following options:

•Use the interface of the original application.
•Understand the complete details of how the text file is organized, for example that the first column is Name, then Address and finally Phone Number. After analyzing, write code which can read the file, parse it etc. Hmm, a lot of work, right?
That's the main difference between a simple file and a database: a database has an independent way (SQL) of accessing information while simple files do not (that answers my twisted question above). A file meets the storing, managing and retrieving part of a database, but not the independent way of accessing data.
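To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a database engine. The file layout, the person table and the sample names are all made up for illustration; the point is only that the file reader must know the delimiter and column order, while the database client just names the column in SQL.

```python
import sqlite3

# Hypothetical file contents in a fixed layout: Name|Address|Phone.
RAW_FILE = "Shiv|Mumbai|111-2222\nRaju|Delhi|333-4444\n"


def phone_from_file(raw, name):
    """Every consumer must know the delimiter and the column order."""
    for line in raw.splitlines():
        person, _address, phone = line.split("|")
        if person == name:
            return phone
    return None


def phone_from_database(conn, name):
    """Any client that speaks SQL can ask for the column by name."""
    row = conn.execute(
        "SELECT phone FROM person WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, address TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [line.split("|") for line in RAW_FILE.splitlines()],
)

print(phone_from_file(RAW_FILE, "Raju"))
print(phone_from_database(conn, "Raju"))
```

If the file layout ever changes, phone_from_file breaks in every application that copied it, while the SQL query keeps working as long as the column name stays the same.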

Note: Many experienced programmers think that the main difference is that a file cannot provide the multi-user capabilities which a DBMS provides. But if you look at some old COBOL and C programs where files were the only means of storing data, you can see functionalities like locking, multi-user etc. provided very efficiently. So it's a matter of debate. If some interviewers think of this as the main difference between files and databases, accept it. Going into a debate probably means losing a job.

(Just a note for freshers: Multi-user capabilities mean that at one moment of time more than one user should be able to add, update, view and delete data. All DBMSs provide this as in-built functionality, but if you are storing information in files, it's up to the application to write logic to achieve this functionality.)

(Q) What is the Difference between DBMS and RDBMS?
As mentioned before, a DBMS provides a systematic and organized way of storing, managing and retrieving a collection of logically related information. An RDBMS also provides what a DBMS provides, but above that, it provides referential integrity. So in short, we can say:

RDBMS = DBMS + REFERENTIAL INTEGRITY

For example, in the above Figure 1.1, every person should have an Address. This is a referential integrity between Name and Address. If we break this referential integrity in a DBMS or in files, it will not complain, but an RDBMS will not allow you to save this data if you have defined the referential integrity between persons and addresses. These relations are defined by using "Foreign Keys" in any RDBMS.
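Here is a small sketch of a foreign key doing its job, again using Python's sqlite3 module rather than SQL SERVER. The person and address tables and their columns are hypothetical. Note that the constraint here is declared on the address side, so what gets rejected is an address pointing at a person that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE address ("
    " address_id INTEGER PRIMARY KEY,"
    " person_id INTEGER NOT NULL REFERENCES person(person_id),"
    " city TEXT NOT NULL)"
)
conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'Shiv')")
conn.execute("INSERT INTO address (person_id, city) VALUES (1, 'Mumbai')")


def try_orphan_address(conn):
    """Try to save an address for person 99, who does not exist."""
    try:
        conn.execute("INSERT INTO address (person_id, city) VALUES (99, 'Delhi')")
        return "saved"
    except sqlite3.IntegrityError:
        return "rejected"


result = try_orphan_address(conn)
print(result)
```

A plain file would happily store the orphan row; here the engine itself refuses it, with no checking code in the application.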

Many DBMS companies claimed that their DBMS product was RDBMS compliant, but according to industry rules and regulations, if the DBMS fulfills the twelve CODD rules, it's truly an RDBMS. Almost all DBMS (SQL SERVER, ORACLE etc.) fulfill all the twelve CODD rules and are considered truly as RDBMS.

Note: One of the biggest debates is whether Microsoft Access is an RDBMS. We will be answering this question in a later section.

(DB) What are CODD Rules?
Twist: Does SQL SERVER support all the twelve CODD rules?

Note: This question is usually asked in only two situations: when the interviewer expects you to take up a DBA job, or when you are a complete fresher. Yes, and not to mention a third one: the interviewer treats CODD rules as a religion. We will try to answer this question from the perspective of SQL SERVER.

In 1985, Dr. E. F. Codd laid down 12 rules, which a DBMS should adhere to in order to get the logo of a true RDBMS.

Rule 1: Information Rule
"All information in a relational database is represented explicitly at the logical level and in exactly one way - by values in tables."

In SQL SERVER, all data exists in tables and is accessed only by querying the tables.

Rule 2: Guaranteed Access Rule
"Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name."

In flat files, we have to parse and know the exact location of field values. But if a DBMS is truly an RDBMS, you can access the value by specifying the table name and field name, for instance Customers.Fields["Customer Name"].

SQL SERVER also satisfies this rule. In ADO.NET we can access field information using table name and field names.

Rule 3: Systematic Treatment of Null Values
"Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type."

In SQL SERVER, if there is no data existing, NULL values are assigned to it. Note NULL values in SQL SERVER do not represent spaces, blanks or a zero value; it is a distinct representation of missing information and thus satisfies rule 3 of CODD.
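The difference between NULL, an empty string and zero is easy to show in a few lines. This is only a sketch using Python's sqlite3 module and a made-up customer table, not SQL SERVER itself, but the standard SQL behaviour shown here (IS NULL finds missing values, while "= NULL" matches nothing) is the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, phone TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("A", None, None),      # phone and balance are unknown (NULL)
        ("B", "", 0),           # phone is an empty string, balance is zero
        ("C", "555-0100", 25),
    ],
)

missing_phone = [r[0] for r in conn.execute(
    "SELECT name FROM customer WHERE phone IS NULL")]
empty_phone = [r[0] for r in conn.execute(
    "SELECT name FROM customer WHERE phone = ''")]
zero_balance = [r[0] for r in conn.execute(
    "SELECT name FROM customer WHERE balance = 0")]
# A comparison with NULL is never true, so this matches no rows at all.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE phone = NULL").fetchone()[0]

print(missing_phone, empty_phone, zero_balance, equals_null)
```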

Rule 4: Dynamic On-line Catalog Based on the Relational Model
"The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data."

The Data Dictionary is held within the RDBMS. Thus, there is no need for off-line volumes to tell you the structure of the database.
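As a rough illustration of "the catalog is just more tables", the sketch below asks a database about its own structure using ordinary queries. It uses Python's sqlite3 module, where the catalog table is called sqlite_master; SQL SERVER exposes the same kind of information through views such as INFORMATION_SCHEMA.TABLES. The customer and sales tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER)")

# The list of tables is itself queried with a plain SELECT.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Column details for one table; the second field of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

print(tables)
print(columns)
```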

Rule 5: Comprehensive Data Sub-language Rule
"A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items:

•Data Definition
•View Definition
•Data Manipulation (Interactive and by program)
•Integrity Constraints
•Authorization
•Transaction boundaries ( Begin, commit and rollback)"
SQL SERVER uses SQL to query and manipulate data, which has a well-defined syntax and is being accepted as an international standard for RDBMS.

Note: According to this rule, CODD has only mentioned that some language should be present to support it, not necessarily that it should be SQL. In the early days, different database vendors were providing their own flavor of syntax, until ANSI-SQL came in (the first ANSI standard was published in 1986) to standardize this variation between vendors. As ANSI-SQL is quite limited, every vendor including Microsoft introduced their own additional SQL syntax on top of the support for ANSI-SQL. You can see SQL syntax varying from vendor to vendor.

Rule 6: View-updating Rule
"All views that are theoretically updatable are also updatable by the system."

In SQL SERVER, views can be updated not only by the user but also by SQL SERVER itself.

Rule 7: High-level Insert, Update and Delete
"The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update and deletion of data."

SQL SERVER allows you to update views that in turn affect the base tables.

Rule 8: Physical Data Independence
"Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods."

Any application program (C#, VB.NET, VB6, VC++ etc.) does not need to be aware of where the SQL SERVER is physically stored or what type of protocol it is using. The database connection string encapsulates everything.

Rule 9: Logical Data Independence
"Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables."

Application programs written in C# or VB.NET do not need to know about structure changes in the SQL SERVER database that preserve the existing information, for example adding a new field.

Rule 10: Integrity Independence
"Integrity constraints specific to a particular relational database must be definable in the relational data sub-language and storable in the catalog, not in the application programs."

In SQL SERVER, you can specify data types (integer, nvarchar, Boolean etc.) which put in data type checks in SQL SERVER rather than through application programs.
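Besides data types, rules such as "price cannot be negative" or "name is mandatory" can also live in the table definition. The sketch below shows this with Python's sqlite3 module and a hypothetical product table. One caveat: SQLite is much more relaxed about column data types than SQL SERVER, so the example sticks to NOT NULL and CHECK constraints, which it does enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rules are stored with the table, not in the application code.
conn.execute(
    "CREATE TABLE product ("
    " product_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " price INTEGER NOT NULL CHECK (price >= 0))"
)


def try_insert(conn, name, price):
    """Return True if the engine accepted the row, False if it refused."""
    try:
        conn.execute(
            "INSERT INTO product (name, price) VALUES (?, ?)", (name, price)
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(conn, "Pen", 10),   # valid row
    try_insert(conn, "Ink", -5),   # violates the CHECK constraint
    try_insert(conn, None, 3),     # violates NOT NULL
]
print(results)
```

Whichever program inserts the data, the same rules apply, because they are checked by the engine.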

Rule 11: Distribution Independence
"A relational DBMS has distribution independence."

SQL SERVER can be spread across more than one physical computer and across several networks, but for application programs it makes little difference: they just specify the SQL SERVER name and the computer on which it is located.

Rule 12: Non-subversion Rule
"If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time)."

In SQL SERVER, whatever integrity rules are applied on every record are also applicable when you process a group of records using an application program in any other language (example: C#, VB.NET, J# etc.).

Readers can see from the above explanation that SQL SERVER satisfies all the CODD rules. Some database gurus consider SQL SERVER as not truly being an RDBMS, but that's a matter of debate.

(Q) Is Access Database a RDBMS?
Access fulfills all rules of CODD, so from this point of view, yes, it's truly an RDBMS. However, many people may dispute this, as a large community of Microsoft professionals think that Access is not an RDBMS.

(Q) What is the Main Difference between ACCESS and SQL SERVER?
As mentioned before, Access fulfills all the CODD rules and behaves as a true RDBMS. But there's a huge difference from an architecture perspective, due to which many developers prefer to use SQL SERVER as the major database rather than Access. Following is the list of architecture differences between them:

•Access uses file server design and SQL SERVER uses the Client / Server model. This forms the major difference between SQL SERVER and ACCESS.
Note: Just to clarify what is client server and file server I will make a quick description of widely accepted architectures. There are three types of architectures:
◦Main frame architecture (This is not related to the above explanation but just mentioned as it can be useful during an interview and also for comparing with other architectures)
◦File sharing architecture (Followed by ACCESS)
◦Client Server architecture (Followed by SQL SERVER).
In Main Frame architecture, all the processing happens on a central host server. The user interacts through a dumb terminal that only sends keystrokes and information to the host. So the advantage of this type of architecture is that the clients need only a minimal configuration. But the disadvantage is that you need a robust central host server like a Main Frame.

In File sharing architecture, which is followed by Access database, all the data is sent to the client terminal and then processed. For instance, if you want to see customers who stay in India, in File Sharing architecture all customer records will be sent to the client PC regardless whether the customer belongs to India or not. On the client PC customer records from India are sorted/filtered out and displayed, in short all processing logic happens on the client PC. Therefore, in this architecture, the client PC should have heavy configuration and it increases network traffic as a lot of data is sent to the client PC. However, the advantage of this architecture is that your server can be of a low configuration.


Figure 1.2: File Server Architecture of Access
In client server architecture, the above limitation of the file server architecture is removed. In client server architecture, you have two entities, client and the database server. File server is now replaced by database server. Database server takes up the load of processing any database related activity and the client does any validation aspect of database. As the work is distributed between the entities it increases scalability and reliability. Second, the network traffic also comes down as compared to file server. For example if you are requesting customers from India, database server will sort/ filter and send only Indian customer details to the client, thus bringing down the network traffic tremendously. SQL SERVER follows the client-server architecture.


Figure 1.3: Client Server Architecture of SQL SERVER
•The second issue comes in terms of reliability. In Access, the client directly interacts with the Access file, so if there is some problem in the middle of a transaction, there are chances that the Access file can get corrupted. But in SQL SERVER, the engine sits in between the client and the database, so in case of any problems in the middle of a transaction, it can revert back to its original state.
Note: SQL SERVER maintains a transaction log by which you can revert back to your original state in case of any crash.
•When your application has to cater to a huge load demand, a highly transactional environment and high concurrency, then it's better to go for SQL SERVER or MSDE.
•But when it comes to cost and support, Access stands better than SQL SERVER. In case of SQL SERVER, you have to pay for per client license, but Access runtime is free.
Summarizing: SQL SERVER gains points in terms of network traffic, reliability and scalability whereas Access gains points in terms of cost factor.
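Two of the points summarized above, network traffic and reliability, can be sketched in a few lines. This is only an illustration with Python's sqlite3 module and a made-up customer table; an in-memory database has no real network, so the "rows sent to the client" are simply the rows each query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("A", "India"), ("B", "USA"), ("C", "India"), ("D", "UK")],
)
conn.commit()

# File-server style: every row travels to the client, which filters locally.
all_rows = conn.execute(
    "SELECT name, country FROM customer ORDER BY name").fetchall()
client_filtered = [name for name, country in all_rows if country == "India"]

# Client-server style: the engine filters, so only matching rows travel.
server_rows = conn.execute(
    "SELECT name FROM customer WHERE country = ? ORDER BY name", ("India",)
).fetchall()
server_filtered = [row[0] for row in server_rows]

# Reliability: a failure in the middle of a transaction is undone by a rollback.
try:
    conn.execute("DELETE FROM customer WHERE country = 'India'")
    raise RuntimeError("simulated crash in the middle of a transaction")
except RuntimeError:
    conn.rollback()
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(len(all_rows), len(server_rows), remaining)
```

Both approaches give the same answer, but the first one moves the whole table to the client to get it, and the rollback leaves the table exactly as it was before the failed delete.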

(Q) What is the Difference between MSDE and SQL SERVER 2000?
MSDE is a royalty free, redistributable and cut short version of the giant SQL SERVER database. It is primarily provided as a low cost option for developers who need a database server, which can easily be shipped and installed. It can serve as a good alternative for Microsoft Access database as it overcomes quite a few problems which Access has.

Below is a complete list, which can give you a good idea of the differences:

•Size of database: Microsoft Access and MSDE have a limitation of 2 GB while SQL SERVER has 1,048,516 TB.
•Performance degrades in MSDE 2000 when the number of concurrent operations reaches eight or more. It does not mean that you cannot have more than eight concurrent operations, but the performance degrades. This eight-connection performance degradation is implemented by the SQL SERVER 2000 workload governor (we will look in more detail at how it works). By comparison, SQL SERVER 2000 allows 32,767 concurrent connections.
•MSDE does not provide OLAP and Data warehousing capabilities.
•MSDE does not have support facility for SQL mail.
•MSDE 2000 does not have GUI administrative tools such as Enterprise Manager, Query Analyzer or Profiler. But there are roundabout ways by which you can manage MSDE 2000:
◦Old command line utility OSQL.EXE
◦VS.NET IDE Server Explorer: Inside the VS.NET IDE, you have a functionality which can give you a nice GUI administrative tool to manage MSDE.
◦SQL SERVER WEB Data administrator installs a web based GUI which you can use to manage your database.
For any details refer here.
•SQL-DMO objects can be used to build your custom UI
•There are many third party tools, which provide administrative capability GUI, which is out of scope of the book as it is only meant for interview questions.
•MSDE does not support Full text search.
Summarizing: There are two major differences: The first is the size limitation (2 GB) of the database and second is the concurrent connections (eight concurrent connections) which are limited by using the workload governor. During an interview, this answer will suffice if the interviewer is really testing your knowledge.

(Q) What is SQL SERVER Express 2005 Edition?
Twist: What is the difference between SQL SERVER Express 2005 and MSDE 2000?

Note: Comparison questions normally come up when a product is migrating from one version to another. When SQL SERVER 7.0 was migrating to SQL 2000, asking for the differences was one of the favorite questions.

SQL SERVER Express edition is a scaled down version of SQL SERVER 2005 and the next evolution of MSDE.

Listed below are some major differences between them:

•MSDE maximum database size is 2GB while SQL SERVER Express has around 4GB.
•In terms of programming language support MSDE has only TSQL, but SQL SERVER Express has TSQL and .NET. In SQL SERVER Express 2005, you can write your stored procedures using .NET.
•SQL SERVER Express does not have connection limitation, which MSDE had and was controlled through the workload governor.
•There was no XCOPY support for MSDE, SQL SERVER Express has it.
•DTS is not present in SQL SERVER express while MSDE has it.
•SQL SERVER Express has reporting services while MSDE does not.
•SQL SERVER Express has native XML support and MSDE does not.
Note: Native XML support means now in SQL SERVER 2005:

•You can create a field with data type XML.
•You can provide SCHEMA to the SQL SERVER fields with XML data type.
•You can use new XML manipulation techniques like XQUERY also called as XML QUERY.
There is a complete chapter on SQL SERVER XML Support, so till then this will suffice.

Summarizing: The major differences are the database size (2 GB and 4 GB), .NET support in stored procedures and native support for XML. This much can convince the interviewer that you are clear about the differences.

(DB) What is SQL Server 2000 Workload Governor?
Workload governor limits the performance of SQL SERVER Desktop Engine (MSDE) if the SQL engine receives more load than MSDE is meant for. MSDE was always meant for trial purposes and non-critical projects. Microsoft always wanted companies to buy the full-blown version of SQL SERVER, so to limit MSDE performance and the number of connections, they introduced the Workload governor.

Workload governor sits between the client and the database engine and counts the number of connections per database instance. If Workload governor finds that the number of connections exceeds eight connections, it starts stalling the connections and slowing down the database engine.

Note: It does not limit the number of connections but makes the connection request go slow. By default 32,767 connections are allowed both for SQL SERVER and MSDE. But it just makes the database engine go slow above eight connections.

What is the Difference between SQL SERVER 2000 and 2005?
Twist: What is the difference between Yukon and SQL SERVER 2000?

Note: This question will be one of the favorites during SQL SERVER interviews. I have marked the points which should be mentioned by developers as PG and DBA for Database Administrator.

Following are some major differences between the two versions:

•(PG) The most significant change is the .NET integration with SQL SERVER 2005. Stored procedures, user-defined functions, triggers, aggregates, and user-defined types can now be written using your own favorite .NET language (VB.NET, C#, J# etc.). This support was not there in SQL SERVER 2000 where the only language was T-SQL. In SQL 2005, you have support for two languages T-SQL and .NET.
•(PG) SQL SERVER 2005 has reporting services for reports as a newly added built-in feature. For SQL SERVER 2000 it was not part of the product and came as a separate installation.
•(PG) SQL SERVER 2005 has introduced two new data types varbinary (max) and XML. If you remember in SQL SERVER 2000, we had image and text data types. Problem with image and text data types is that they assign the same amount of storage irrespective of what the actual data size is. This problem is solved using varbinary (max) which acts depending on amount of data. One more new data type is included XML which enables you to store XML documents and does schema verification. In SQL SERVER 2000, developers used varchar or text data type and all validation had to be done programmatically.
•(PG) SQL SERVER 2005 can now process direct incoming HTTP request without IIS Web server. In addition, stored procedure invocation is enabled using the SOAP protocol.
•(PG) Asynchronous mechanism is introduced using server events. In Server event model the server posts an event to the SQL Broker service, later the client can come and retrieve the status by querying the broker.
•For huge databases, SQL SERVER has provided a cool feature called "Data partitioning". In data partitioning, you break a single database object such as a table or an index into multiple pieces. But for the client application accessing the single database object, "partitioning" is transparent.
•In SQL SERVER 2000, if you rebuilt clustered indexes, even the non-clustered indexes were rebuilt. But in SQL SERVER 2005, rebuilding the clustered indexes does not rebuild the non-clustered indexes.
•Bulk data uploading in SQL SERVER 2000 was done using BCP (Bulk Copy Program) format files. Now in SQL SERVER 2005, bulk data uploading uses XML file format.
•In SQL SERVER 2000 there were maximum 16 instances, but in 2005 you can have up to 50 instances.
•SQL SERVER 2005 has support for "Multiple Active Result Sets", also called "MARS". In SQL SERVER 2000 and earlier versions, in one connection you could only have one result set. Now in one SQL connection, you can query and have multiple result sets.
•In previous versions of SQL SERVER, the system catalog was stored in the master database. In SQL SERVER 2005, it's stored in a resource database which is stored as sys object. You cannot access the sys object directly the way we accessed the master database in the older version.
•This is one of the hardware benefits which SQL SERVER 2005 has over SQL SERVER 2000: support for hyper threading. WINDOWS 2003 supports hyper threading; SQL SERVER 2005 can take advantage of the feature, unlike SQL SERVER 2000 which did not support hyper threading.
Note: Hyper threading is a technology developed by INTEL which creates two logical processors on a single physical hardware processor.
•SMO will be used for SQL Server Management.
•AMO (Analysis Management Objects) is used to manage Analysis Services servers, data sources, cubes, dimensions, measures, and data mining models. You can map AMO to DSO (Decision Support Objects) in the older SQL SERVER.
•Replication is now managed by RMO (Replication Management Objects).
Note: SMO, AMO and RMO are all using .NET Framework.
•SQL SERVER 2005 uses current user execution context to check rights rather than ownership link chain, which was done in SQL SERVER 2000.
Note: There is a question on this later; see the execution context questions.
•In previous versions of SQL SERVER the schema and the user name were the same, but in the current version, the schema is separated from the user. Now the user owns the schema.
Note: There are questions on this; refer to "Schema" later.
Note: Ok, below are some GUI changes.
•Query analyzer is now replaced by query editor.
•Business Intelligence development studio will be used to create Business intelligence solutions.
•OSQL and ISQL command line utility is replaced by SQLCMD utility.
•SQL SERVER Enterprise manager is now replaced by SQL SERVER Management studio.
•SERVER Manager which was running in system tray is now replaced by SQL Computer manager.
•Database mirror concept is supported in SQL SERVER 2005, which was not present in SQL SERVER 2000.
•In SQL SERVER 2005 Indexes can be rebuilt online when the database is in actual production. If you look back in SQL SERVER 2000, you cannot do insert, update, and delete operations when you are building indexes.
•(PG) Other than Serializable, Repeatable Read, Read Committed, and Read Uncommitted isolation levels, there is one more new isolation level, "Snapshot Isolation level".
Note: We will see "Snapshot Isolation level" in detail in the coming questions.
Summarizing: The major differences between SQL SERVER 2000 and SQL SERVER 2005 are in terms of support of .NET integration, Snapshot isolation level, native XML support, handling HTTP requests, Web service support and Data partitioning. You do not have to really say all the above points during an interview. A sweet summary and you will rock.

(Q) What are E-R diagrams?
An E-R diagram, also termed an Entity-Relationship diagram, shows the relationship between various tables in the database. Example: Tables Customer and Customer Addresses have a one-to-many relationship (i.e. one customer can have multiple addresses), and this can be shown using the ER diagram. ER diagrams are drawn during the initial stages of a project to forecast how the database structure will shape up. Below is a screen shot of a sample ER diagram of "Asset Management" which ships free with Access.


Figure 1.4: Asset management ER diagram.
(Q) How many Types of Relationship Exist in Database Designing?
There are three major relationship models:

•One-to-one


Figure 1.5: One-to-One relationship ER diagram
•One-to-many
In this, many records in one table correspond to one record in another table.
Example: One customer can have multiple sales. So there exists a one-to-many relationship between the customer and sales tables.
One Asset can have multiple Maintenance records. So the Asset entity has a one-to-many relationship with Maintenance, as the ER model shows below.


Figure 1.6: One-to-Many Relationship ER diagram
•Many-to-many
In this, one record in one table corresponds to many rows in another table and also vice-versa.
For instance: In a company, one employee can have many skills like Java, C# etc. and also one skill can belong to many employees.
Given below is a sample of a many-to-many relationship. One employee can have knowledge of multiple Technologies. So in order to implement this, we have one more table, Employee Technology, which is linked to the primary keys of the Employee and Technology tables.


Figure 1.7: Many-to-Many Relationship ER diagram
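The Employee / Technology example can be sketched as follows, using Python's sqlite3 module. The table and column names and the sample rows are assumptions made for this illustration, not the actual Access sample schema. Notice that the link table is really two one-to-many relationships back to back: one employee has many link rows, and one technology has many link rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE technology (technology_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Link table: one row per (employee, technology) pair.
CREATE TABLE employee_technology (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    technology_id INTEGER NOT NULL REFERENCES technology(technology_id),
    PRIMARY KEY (employee_id, technology_id)
);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO technology VALUES (1, 'Java'), (2, 'C#');
INSERT INTO employee_technology VALUES (1, 1), (1, 2), (2, 2);
""")

# One employee, many technologies.
skills_of_asha = [row[0] for row in conn.execute(
    "SELECT t.name FROM technology t"
    " JOIN employee_technology et ON et.technology_id = t.technology_id"
    " WHERE et.employee_id = 1 ORDER BY t.name")]

# One technology, many employees.
employees_with_csharp = [row[0] for row in conn.execute(
    "SELECT e.name FROM employee e"
    " JOIN employee_technology et ON et.employee_id = e.employee_id"
    " WHERE et.technology_id = 2 ORDER BY e.name")]

print(skills_of_asha)
print(employees_with_csharp)
```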
