Thursday, June 7, 2012

GIST NOTES 20 - SQL



[DISCLAIMER: This is solely for non-commercial use. I don't claim ownership of this content. This is the crux of all my readings, studies and analysis. Some of it consists of excerpts from famous books/sources on the subject. Some of it is my contemplation upon experiments with hand-coded samples using an IDE or Notepad.


I've created this mainly to reduce an entire book into a few pages of critical content that we should never forget. Even after years, you don't need to read the entire book again to get back its philosophy. I hope these notes will help you to replay the entire book in your mind once again.]


RDBMS - Relational DB Management System
SQL - Structured Query Language

#SQL keywords are not case sensitive
#SQL DML and DDL

SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data Definition Language (DDL).

The query and update commands form the DML part of SQL:

SELECT - extracts data from a database
UPDATE - updates data in a database
DELETE - deletes data from a database
INSERT INTO - inserts new data into a database

The DDL part of SQL permits database tables to be created or deleted. It also defines indexes (keys), specifies links between tables, and imposes constraints between tables. The most important DDL statements in SQL are:

CREATE DATABASE - creates a new database
ALTER DATABASE - modifies a database
CREATE TABLE - creates a new table
ALTER TABLE - modifies a table
DROP TABLE - deletes a table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index
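
As a rough illustration of the DDL/DML split, the sketch below runs a few of these statements against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite, the Persons table and the idx_city index name are assumptions made for the example; other database systems differ in details.

```python
import sqlite3

# In-memory SQLite database; table, column and index names are illustrative.
conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, City TEXT)")
conn.execute("CREATE INDEX idx_city ON Persons (City)")

# DML: work with the data held in that structure.
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 'Sandnes')")
conn.execute("UPDATE Persons SET City = 'Stavanger' WHERE P_Id = 1")
rows = conn.execute("SELECT LastName, City FROM Persons").fetchall()
conn.execute("DELETE FROM Persons WHERE P_Id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]

# DDL again: remove the structure itself.
conn.execute("DROP INDEX idx_city")
conn.execute("DROP TABLE Persons")

print(rows, remaining)
```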

#KEYS:

The relationships between columns located in different tables are usually described through the use of keys.

Each table has a PRIMARY KEY. The primary key uniquely identifies a row, so it serves as a mechanism to refer to the other fields within the same row. In this case, the primary key is used to identify a relationship between the row under consideration and the person whose name is stored in the 'names' table. We use the AUTO_INCREMENT attribute only for the 'names' table, as we need to use the generated contact_id number in all the other tables to identify their rows.

This type of table design where one table establishes a relationship with several other tables is known as a 'one to many' relationship.
In a 'many to many' relationship we could have several Auto Incremented Primary Keys in various tables with several inter-relationships.

Foreign Key:

A foreign key is a field in a table that refers to the Primary Key of another table. The guarantee that every foreign key value matches an existing primary key value is commonly known as 'referential integrity'.


#FULL TEXT INDEXING and Searching

Since version 3.23.23, Full Text Indexing and Searching has been available in MySQL. FULLTEXT indexes can be created on CHAR, VARCHAR and TEXT columns. FULLTEXT searches are performed with the MATCH() ... AGAINST syntax. MATCH() evaluates a natural language query against a text collection and returns a relevance value for each row in the table. The resulting rows are organized in order of relevance.

Full Text searches are a very powerful way to search through text. However, they are not ideal for small tables of text, where they may produce inconsistent results. They work best with large collections of textual data.

Optimizing your Database

Well, databases do tend to get large at some point or other. This is where the issue of database optimization arises. Queries take longer and longer as the database grows, and certain things can be done to speed them up.

Clustering

The easiest method is that of 'clustering'. Suppose you run a certain kind of query often; it would be faster if the database contents were arranged in the same order in which the data is requested. To keep a table in sorted order you need a clustering index. Some databases keep data sorted automatically.

Ordered Indices

These are 'lookup' tables of sorts. For each column that may be of interest to you, you can create an ordered index.
Note that these optimization techniques add system load, because every index has to be brought up to date each time the data is re-arranged.
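
A small sketch of an index in action, assuming SQLite through Python's sqlite3 module (the table contents and the idx_persons_city name are made up for the example). EXPLAIN QUERY PLAN is SQLite-specific; it reports whether the query would use the index rather than scan the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Sandnes"), (2, "Svendson", "Sandnes"), (3, "Pettersen", "Stavanger")],
)

# Build an ordered index on the column we expect to search by.
conn.execute("CREATE INDEX idx_persons_city ON Persons (City)")

# Ask SQLite how it would run the query; the last field of each row is a description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Persons WHERE City = 'Sandnes'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan should name idx_persons_city, which indicates that the lookup goes through the index.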

There are additional methods, such as B-Trees and Hashing, which you may like to read up on but which will not be discussed here.


Replication

Replication is the term given to the process where databases synchronize with each other. In this process one database updates its own data with respect to another, or with reference to certain update criteria specified by the programmer. Replication can be used under various circumstances, for example for safety and backup, or to provide a closer database location for certain users.


What are Transactions ?

In an RDBMS, when several people access the same data or if a server dies in the middle of an update, there has to be a mechanism to protect the integrity of the data. Such a mechanism is called a Transaction. A transaction groups a set of database actions into a single instantaneous event. This event can either succeed or fail, i.e. either the whole job gets done or none of it does.

The definition of a transaction can be provided by an Acronym called 'ACID'.

(A)tomicity: Even if an action consists of multiple steps, it is treated as one operation: either all of the steps take effect or none of them do.

(C)onsistency: The database is in a valid and accurate operating state before and after a transaction.

(I)solation: Operations within one transaction are independent of, and cannot interfere with, those in other transactions.

(D)urability: Changes made by a completed transaction are permanent.
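
Atomicity is the easiest of the four to see in code. The sketch below assumes SQLite via Python's sqlite3 module and a hypothetical accounts table; the transfer function is invented for the illustration. A transfer either applies both updates or, if anything fails midway, neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit."""
    # 'with conn' commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

transfer(conn, "alice", "bob", 30)       # succeeds: both updates are kept
after_ok = balances(conn)

try:
    transfer(conn, "alice", "bob", 500)  # fails after the first update
except ValueError:
    pass
after_failed = balances(conn)            # the first update was rolled back

print(after_ok, after_failed)
```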

To enable transactions, a mechanism called 'Logging' needs to be introduced. Logging involves the DBMS writing details of the tables, columns and results of a particular transaction, both before and after, to a log file. This log file is used in the process of recovery. To protect a database resource (e.g. a table) from being used and written to simultaneously, several techniques are used. One of them is 'Locking'; another is to put a 'time stamp' on an action. In the case of Locking, to complete an action, the DBMS needs to acquire locks on all the resources needed to complete the action. The locks are released only when the transaction is completed.

Now if a large number of tables were involved in a particular action, say 50, all 50 tables would be locked until the transaction is completed.

To improve things a bit, there is another technique used called 2 Phase Locking or 2PL. In this method of locking, locks are acquired only when needed but are released only when the transaction is completed.

This is done to make sure that altered data can be safely restored if the transaction fails for any reason.

This technique can also result in problems such as "deadlocks".

In this case, 2 processes requiring the same resources lock each other up, each preventing the other from completing its action. The options here are to abort one of them, or to let the programmer handle it.

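Early versions of MySQL implemented transactions by building the Berkeley DB libraries into their own code, so it was the source version you'd want for such a MySQL installation. Later versions provide transactions through the InnoDB storage engine. Read the MySQL manual for your version on implementing this.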


Beyond MySQL


What are Views ?

A view allows you to expose the result of a query as a virtual table. This table is given the name used in your CREATE VIEW statement.
MySQL versions before 5.0 do not support views; a sample SQL VIEW construct statement would look like:

CREATE VIEW TESTVIEW AS SELECT * FROM names;
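
To see the statement at work, here is a minimal sketch assuming SQLite through Python's sqlite3 module. The columns of the names table (contact_id, first_name, last_name) are guesses made for the example. It also shows that the view is re-evaluated on every query, so rows added after the view was created still appear in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names (contact_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO names VALUES (1, 'Ola', 'Hansen')")

# The view stores the query, not a copy of the data.
conn.execute("CREATE VIEW TESTVIEW AS SELECT * FROM names")

# A row inserted after the view was defined is still visible through it.
conn.execute("INSERT INTO names VALUES (2, 'Tove', 'Svendson')")
view_rows = conn.execute("SELECT first_name FROM TESTVIEW ORDER BY contact_id").fetchall()
print(view_rows)
```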


What are Triggers ?

A trigger is a pre-programmed set of actions that the database runs automatically when a specified event occurs. Triggers can be programmed to execute before or after the event. Triggers are very useful as they increase efficiency and accuracy in performing operations on databases, and they also increase productivity by reducing the time needed for application development. Triggers do, however, carry a price in terms of processing overhead.
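
A minimal sketch of a trigger, assuming SQLite's trigger syntax via Python's sqlite3 module (other systems use a different dialect). The audit_log table and the log_city_change trigger are hypothetical names: every change of City is recorded automatically, without the application doing anything extra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE audit_log (P_Id INTEGER, old_city TEXT, new_city TEXT);

-- Runs after every update of the City column.
CREATE TRIGGER log_city_change AFTER UPDATE OF City ON Persons
BEGIN
    INSERT INTO audit_log VALUES (OLD.P_Id, OLD.City, NEW.City);
END;
""")

conn.execute("INSERT INTO Persons VALUES (1, 'Sandnes')")
conn.execute("UPDATE Persons SET City = 'Stavanger' WHERE P_Id = 1")

log = conn.execute("SELECT * FROM audit_log").fetchall()
print(log)
```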


What are Procedures ?

Like triggers, Procedures or 'Stored' Procedures are productivity enhancers. Suppose you needed to perform an action using a programming interface to the database in, say, Perl and ASP. If the programmed action is stored at the database level, it has to be written only once and can be called by any programming language interacting with the database.

Procedures are executed by calling them explicitly (for example with a CALL or EXECUTE statement) and can also be invoked from triggers.


Beyond RDBMS

Distributed Databases (DDB)

A distributed database is a collection of several logically interrelated databases located at multiple locations of a computer network. A distributed database management system permits the management of such a database and makes the operation transparent to the user. Good examples of distributed databases would be those utilized by banks and by multinational firms with several office locations, where each distributed data system works only with the data that is relevant to its operations. DDBs have the full functionality of any DBMS. It's also important to know that a distributed database is considered to be actually one database rather than a set of discrete files, and the data within it is logically interrelated.

Object Database Management Systems or ODBMS

When the capabilities of a database are integrated with object programming language capabilities, the resulting product is an ODBMS. Database objects appear as programming objects in an ODBMS. Using an ODBMS offers several advantages. The ones that can be most readily appreciated are:

1. Efficiency
When you use an ODBMS, you're using data the way you store it. You will use less code as you're not dependent on an intermediary like SQL or ODBC. When this happens you can create highly complex data structures through your programming language.

2. Speed
When data is stored the way you'd like it to be stored (i.e. natively) there is a massive performance increase as no to-and-fro translation is required.


#The SQL SELECT DISTINCT Statement

In a table, some of the columns may contain duplicate values. This is not a problem; however, sometimes you will want to list only the different (distinct) values in a table.

The DISTINCT keyword can be used to return only distinct (different) values.

SQL SELECT DISTINCT Syntax

SELECT DISTINCT column_name(s)
FROM table_name
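
A quick sketch of the effect, assuming SQLite through Python's sqlite3 module and a cut-down Persons table (the column choice is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Sandnes"), (2, "Svendson", "Sandnes"), (3, "Pettersen", "Stavanger")],
)

# Without DISTINCT every row contributes a value, duplicates included.
all_cities = [row[0] for row in conn.execute("SELECT City FROM Persons ORDER BY City")]

# With DISTINCT each value is returned once.
distinct_cities = [
    row[0] for row in conn.execute("SELECT DISTINCT City FROM Persons ORDER BY City")
]
print(all_cities, distinct_cities)
```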

##The WHERE Clause

The WHERE clause is used to extract only those records that fulfill a specified criterion.

SQL WHERE Syntax

SELECT column_name(s)
FROM table_name
WHERE column_name operator value

SELECT * FROM Persons
WHERE City='Sandnes'

##Quotes Around Text Fields

SQL uses single quotes around text values (some database systems will also accept double quotes).

However, numeric values should not be enclosed in quotes.

For text values:

This is correct:

SELECT * FROM Persons WHERE FirstName='Tove'

This is wrong:

SELECT * FROM Persons WHERE FirstName=Tove

For numeric values:

This is correct:

SELECT * FROM Persons WHERE Year=1965

This is wrong:

SELECT * FROM Persons WHERE Year='1965'

Operators Allowed in the WHERE Clause

With the WHERE clause, the following operators can be used:

Operator   Description
=          Equal
<>         Not equal
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
BETWEEN    Between an inclusive range
LIKE       Search for a pattern
IN         Match any value in a list of possible values
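
The last three operators are the least self-explanatory, so here is a short sketch of each, assuming SQLite via Python's sqlite3 module. The last_names helper and the extra city 'Oslo' are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Sandnes"), (2, "Svendson", "Sandnes"), (3, "Pettersen", "Stavanger")],
)

def last_names(where):
    """Return the last names matching a WHERE condition, ordered by P_Id."""
    sql = "SELECT LastName FROM Persons WHERE " + where + " ORDER BY P_Id"
    return [row[0] for row in conn.execute(sql)]

between = last_names("P_Id BETWEEN 1 AND 2")           # both ends are included
like = last_names("LastName LIKE 'S%'")                # % stands for any run of characters
in_list = last_names("City IN ('Stavanger', 'Oslo')")  # any value from the list matches
print(between, like, in_list)
```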

##AND and OR operators

SELECT * FROM Persons
WHERE FirstName='Tove'
AND LastName='Svendson'

SELECT * FROM Persons
WHERE FirstName='Tove'
OR FirstName='Ola'

SELECT * FROM Persons WHERE
LastName='Svendson'
AND (FirstName='Tove' OR FirstName='Ola')
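
The parentheses in the last query matter because AND binds more tightly than OR. A small sketch, assuming SQLite via Python's sqlite3 module, runs the query with and without them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Ola"), (2, "Svendson", "Tove"), (3, "Pettersen", "Kari")],
)

with_parens = conn.execute(
    "SELECT LastName, FirstName FROM Persons "
    "WHERE LastName='Svendson' AND (FirstName='Tove' OR FirstName='Ola') "
    "ORDER BY P_Id"
).fetchall()

# Without parentheses this reads as
# (LastName='Svendson' AND FirstName='Tove') OR FirstName='Ola'.
without_parens = conn.execute(
    "SELECT LastName, FirstName FROM Persons "
    "WHERE LastName='Svendson' AND FirstName='Tove' OR FirstName='Ola' "
    "ORDER BY P_Id"
).fetchall()
print(with_parens, without_parens)
```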

##The ORDER BY Keyword

The ORDER BY keyword is used to sort the result-set by a specified column.

The ORDER BY keyword sorts the records in ascending order by default.

If you want to sort the records in a descending order, you can use the DESC keyword.

SQL ORDER BY Syntax

SELECT column_name(s)
FROM table_name
ORDER BY column_name(s) ASC|DESC

SELECT * FROM Persons
ORDER BY LastName DESC
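
A short sketch of both sort directions, assuming SQLite via Python's sqlite3 module and a two-column Persons table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [(1, "Hansen"), (2, "Svendson"), (3, "Pettersen")],
)

# Ascending is the default, so ASC may be left out.
ascending = [r[0] for r in conn.execute("SELECT LastName FROM Persons ORDER BY LastName")]
descending = [r[0] for r in conn.execute("SELECT LastName FROM Persons ORDER BY LastName DESC")]
print(ascending, descending)
```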

##The INSERT INTO Statement

The INSERT INTO statement is used to insert a new row in a table.

SQL INSERT INTO Syntax

It is possible to write the INSERT INTO statement in two forms.

The first form doesn't specify the column names where the data will be inserted, only their values:

INSERT INTO table_name
VALUES (value1, value2, value3,...)

The second form specifies both the column names and the values to be inserted:

INSERT INTO table_name (column1, column2, column3,...)
VALUES (value1, value2, value3,...)

INSERT INTO Persons
VALUES (4,'Nilsen', 'Johan', 'Bakken 2', 'Stavanger')

INSERT INTO Persons (P_Id, LastName, FirstName)
VALUES (5, 'Tjessem', 'Jakob')
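
Running the two statements shows what happens to the columns the second form leaves out. This sketch assumes SQLite via Python's sqlite3 module and a Persons table with no column defaults, so the omitted columns end up NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, "
    "FirstName TEXT, Address TEXT, City TEXT)"
)

# First form: one value per column, in table order.
conn.execute("INSERT INTO Persons VALUES (4,'Nilsen', 'Johan', 'Bakken 2', 'Stavanger')")

# Second form: only the named columns get values.
conn.execute("INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (5, 'Tjessem', 'Jakob')")

inserted = conn.execute("SELECT * FROM Persons ORDER BY P_Id").fetchall()
print(inserted)
```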

##The UPDATE Statement

The UPDATE statement is used to update existing records in a table.

SQL UPDATE Syntax

UPDATE table_name
SET column1=value1, column2=value2,...
WHERE some_column=some_value

Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause specifies which record or records should be updated. If you omit the WHERE clause, all records will be updated!
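
The difference is easy to measure. In this sketch (SQLite via Python's sqlite3 module; the city names are arbitrary), the cursor's rowcount attribute reports how many rows each UPDATE touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Sandnes"), (2, "Svendson", "Sandnes"), (3, "Pettersen", "Stavanger")],
)

# With WHERE: only the matching row changes.
targeted = conn.execute("UPDATE Persons SET City = 'Oslo' WHERE LastName = 'Hansen'").rowcount

# Without WHERE: every row changes.
untargeted = conn.execute("UPDATE Persons SET City = 'Bergen'").rowcount

cities = [row[0] for row in conn.execute("SELECT City FROM Persons ORDER BY P_Id")]
print(targeted, untargeted, cities)
```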

##The DELETE Statement

The DELETE statement is used to delete rows in a table.

SQL DELETE Syntax

DELETE FROM table_name
WHERE some_column=some_value

Note: Notice the WHERE clause in the DELETE syntax. The WHERE clause specifies which record or records should be deleted. If you omit the WHERE clause, all records will be deleted!

Delete All Rows

It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and indexes will be intact:

DELETE FROM table_name

or, in MS Access only:

DELETE * FROM table_name

Note: Be very careful when deleting records. Outside of a transaction, you cannot undo this statement!
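
A sketch of both kinds of DELETE, assuming SQLite via Python's sqlite3 module. The final check against sqlite_master (SQLite's catalogue of tables) confirms that the empty table itself still exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [(1, "Hansen"), (2, "Svendson"), (3, "Pettersen")],
)

def count_rows():
    return conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]

conn.execute("DELETE FROM Persons WHERE LastName = 'Hansen'")  # one row goes
after_where = count_rows()

conn.execute("DELETE FROM Persons")                            # all rows go
after_all = count_rows()

# The table definition survives the deletion of its rows.
table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'Persons'"
).fetchone()[0] == 1
print(after_where, after_all, table_still_exists)
```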

###SQL PRIMARY KEY Constraint

The PRIMARY KEY constraint uniquely identifies each record in a database table.

Primary keys must contain unique values.

A primary key column cannot contain NULL values.

Each table should have a primary key, and each table can have only ONE primary key.
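
The uniqueness and NOT NULL rules can be checked with a small experiment. The sketch assumes SQLite via Python's sqlite3 module, where a violated constraint surfaces as sqlite3.IntegrityError; the rejected helper is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (P_Id int NOT NULL, LastName varchar(255) NOT NULL, "
    "PRIMARY KEY (P_Id))"
)
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen')")

def rejected(sql):
    """Return True if the database refuses the statement on constraint grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_rejected = rejected("INSERT INTO Persons VALUES (1, 'Svendson')")  # P_Id 1 is taken
null_rejected = rejected("INSERT INTO Persons VALUES (NULL, 'Pettersen')")   # key may not be NULL
print(duplicate_rejected, null_rejected)
```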

SQL PRIMARY KEY Constraint on CREATE TABLE

The following SQL creates a PRIMARY KEY on the "P_Id" column when the "Persons" table is created:

MySQL:

CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
PRIMARY KEY (P_Id)
)

SQL Server / Oracle / MS Access:

CREATE TABLE Persons
(
P_Id int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)
)

Note: In the example above there is only ONE PRIMARY KEY (pk_PersonID). However, the value of the pk_PersonID is made up of two columns (P_Id and LastName).

SQL PRIMARY KEY Constraint on ALTER TABLE

To create a PRIMARY KEY constraint on the "P_Id" column when the table is already created, use the following SQL:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons
ADD PRIMARY KEY (P_Id)

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons
ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)

Note: If you use the ALTER TABLE statement to add a primary key, the primary key column(s) must already have been declared to not contain NULL values (when the table was first created).

To DROP a PRIMARY KEY Constraint

To drop a PRIMARY KEY constraint, use the following SQL:

MySQL:

ALTER TABLE Persons
DROP PRIMARY KEY

SQL Server / Oracle / MS Access:

ALTER TABLE Persons
DROP CONSTRAINT pk_PersonID

###SQL FOREIGN KEY Constraint

A FOREIGN KEY in one table points to a PRIMARY KEY in another table.

Let's illustrate the foreign key with an example. Look at the following two tables:

The "Persons" table:

P_Id  LastName   FirstName  Address       City
1     Hansen     Ola        Timoteivn 10  Sandnes
2     Svendson   Tove       Borgvn 23     Sandnes
3     Pettersen  Kari       Storgt 20     Stavanger

The "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    2
4     24562    1

Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons" table.

The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.

The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.

The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.

The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column, because the value has to be one of the values contained in the table it points to.
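
Both protections can be tried out in a few lines. The sketch assumes SQLite via Python's sqlite3 module; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which is a SQLite peculiarity and not part of the syntax in these notes. The rejected helper is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.execute(
    "CREATE TABLE Persons (P_Id int NOT NULL, LastName varchar(255) NOT NULL, "
    "PRIMARY KEY (P_Id))"
)
conn.execute(
    "CREATE TABLE Orders (O_Id int NOT NULL, OrderNo int NOT NULL, P_Id int, "
    "PRIMARY KEY (O_Id), FOREIGN KEY (P_Id) REFERENCES Persons(P_Id))"
)
conn.execute("INSERT INTO Persons VALUES (3, 'Pettersen')")
conn.execute("INSERT INTO Orders VALUES (1, 77895, 3)")  # fine: person 3 exists

def rejected(sql):
    """Return True if the database refuses the statement on constraint grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# An order for a person who does not exist is invalid data.
bad_insert_rejected = rejected("INSERT INTO Orders VALUES (5, 34764, 15)")

# Deleting a person who still has orders would destroy the link.
parent_delete_rejected = rejected("DELETE FROM Persons WHERE P_Id = 3")
print(bad_insert_rejected, parent_delete_rejected)
```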

SQL FOREIGN KEY Constraint on CREATE TABLE

The following SQL creates a FOREIGN KEY on the "P_Id" column when the "Orders" table is created:

MySQL:

CREATE TABLE Orders
(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
PRIMARY KEY (O_Id),
FOREIGN KEY (P_Id) REFERENCES Persons(P_Id)
)

SQL Server / Oracle / MS Access:

CREATE TABLE Orders
(
O_Id int NOT NULL PRIMARY KEY,
OrderNo int NOT NULL,
P_Id int FOREIGN KEY REFERENCES Persons(P_Id)
)

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

CREATE TABLE Orders
(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
PRIMARY KEY (O_Id),
CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)
)

SQL FOREIGN KEY Constraint on ALTER TABLE

To create a FOREIGN KEY constraint on the "P_Id" column when the "Orders" table is already created, use the following SQL:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Orders
ADD FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Orders
ADD CONSTRAINT fk_PerOrders
FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)

To DROP a FOREIGN KEY Constraint

To drop a FOREIGN KEY constraint, use the following SQL:

MySQL:

ALTER TABLE Orders
DROP FOREIGN KEY fk_PerOrders

SQL Server / Oracle / MS Access:

ALTER TABLE Orders
DROP CONSTRAINT fk_PerOrders


###SQL JOIN

The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.

Tables in a database are often related to each other with keys.

A primary key is a column (or a combination of columns) with a unique value for each row. Each primary key value must be unique within the table. The purpose is to bind data together, across tables, without repeating all of the data in every table.

Different SQL JOINs

Before we continue with examples, we will list the types of JOIN you can use, and the differences between them.

JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
FULL JOIN: Return rows when there is a match in one of the tables

###SQL INNER JOIN Keyword

The INNER JOIN keyword returns rows when there is at least one match in both tables.

SQL INNER JOIN Syntax

SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name

PS: INNER JOIN is the same as JOIN.

SQL INNER JOIN Example

The "Persons" table:

P_Id  LastName   FirstName  Address       City
1     Hansen     Ola        Timoteivn 10  Sandnes
2     Svendson   Tove       Borgvn 23     Sandnes
3     Pettersen  Kari       Storgt 20     Stavanger

The "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    1
4     24562    1
5     34764    15

Now we want to list all the persons with any orders.

We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName   FirstName  OrderNo
Hansen     Ola        22456
Hansen     Ola        24562
Pettersen  Kari       77895
Pettersen  Kari       44678

The INNER JOIN keyword returns rows when there is at least one match in both tables. If there are rows in "Persons" that do not have matches in "Orders", those rows will NOT be listed.
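
The example can be reproduced with the sketch below, which assumes SQLite via Python's sqlite3 module and keeps only the columns the query needs. One addition that is not in the query above: Orders.OrderNo is used as a second sort key, so that the order of rows within each person is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE Orders (O_Id INTEGER PRIMARY KEY, OrderNo INTEGER, P_Id INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Ola"), (2, "Svendson", "Tove"), (3, "Pettersen", "Kari")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15)],
)

inner_rows = conn.execute(
    "SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo "
    "FROM Persons INNER JOIN Orders ON Persons.P_Id = Orders.P_Id "
    "ORDER BY Persons.LastName, Orders.OrderNo"
).fetchall()

# Svendson (no orders) and order 34764 (no such person) are both absent.
print(inner_rows)
```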

###LEFT JOIN

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
LEFT JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName   FirstName  OrderNo
Hansen     Ola        22456
Hansen     Ola        24562
Pettersen  Kari       77895
Pettersen  Kari       44678
Svendson   Tove

The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there are no matches in the right table (Orders).
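
A runnable version of this query, assuming SQLite via Python's sqlite3 module, with Orders.OrderNo added as a second sort key to make the row order fixed. The missing OrderNo for Svendson comes back as NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE Orders (O_Id INTEGER PRIMARY KEY, OrderNo INTEGER, P_Id INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Ola"), (2, "Svendson", "Tove"), (3, "Pettersen", "Kari")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15)],
)

left_rows = conn.execute(
    "SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo "
    "FROM Persons LEFT JOIN Orders ON Persons.P_Id = Orders.P_Id "
    "ORDER BY Persons.LastName, Orders.OrderNo"
).fetchall()

# Every person appears; Svendson has no order, so OrderNo is None.
print(left_rows)
```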

###RIGHT JOIN

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
RIGHT JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName   FirstName  OrderNo
Hansen     Ola        22456
Hansen     Ola        24562
Pettersen  Kari       77895
Pettersen  Kari       44678
                      34764

The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if there are no matches in the left table (Persons).
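
A RIGHT JOIN gives the same rows as a LEFT JOIN with the two tables swapped. The sketch below uses that equivalence on purpose: it assumes SQLite via Python's sqlite3 module, and SQLite releases before 3.39 do not accept RIGHT JOIN at all. The rows are sorted by OrderNo here (an assumption made to keep the output order fixed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE Orders (O_Id INTEGER PRIMARY KEY, OrderNo INTEGER, P_Id INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Ola"), (2, "Svendson", "Tove"), (3, "Pettersen", "Kari")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15)],
)

# Equivalent to: FROM Persons RIGHT JOIN Orders ON Persons.P_Id = Orders.P_Id
right_rows = conn.execute(
    "SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo "
    "FROM Orders LEFT JOIN Persons ON Persons.P_Id = Orders.P_Id "
    "ORDER BY Orders.OrderNo"
).fetchall()

# Every order appears; order 34764 has no matching person, so the names are None.
print(right_rows)
```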

###FULL JOIN

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
FULL JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName   FirstName  OrderNo
Hansen     Ola        22456
Hansen     Ola        24562
Pettersen  Kari       77895
Pettersen  Kari       44678
Svendson   Tove
                      34764

The FULL JOIN keyword returns all the rows from the left table (Persons), and all the rows from the right table (Orders). If there are rows in "Persons" that do not have matches in "Orders", or if there are rows in "Orders" that do not have matches in "Persons", those rows will be listed as well.

