Similar presentations:
Sql Chapter Two
1.
SQL Chapter Two2.
Overview• Basic Structure
• Verifying Statements
• Specifying Columns
• Specifying Rows
3.
IntroductionSQL is a modular language that uses
statements and clauses.
4.
Basic structure of PROC SQL:PROC SQL;
statement (select)
clauses (from, where, group by, having, order by);
QUIT;
Note: place semicolon at the end of the last clause only.
5.
Statementsselect - specifies the columns to be selected
Select statement has the following features:
-selects data that meets certain conditions
-groups data
-specifies an order for the data
-formats data
-calculates new variables
6.
Clausesfrom - specifies the tables to be queried
where - subsets the data based on a condition - optional
group by - classifies the data into groups - optional
having - subsets groups of data based on a group condition
order by - sorts row by the values of specific columns
Note: the order of the clauses are significant.
7.
Overview• Basic Structure
• Verifying Statements
• Specifying Columns
• Specifying Rows
8.
Verifying StatementsTwo functions that can be used to verify if your statement syntax
are:
validate - used to check the select statement syntax
noexec - checks for invalid syntax in all types of SQL statements
9.
Validateproc sql;
validate
select timemile, restpulse,
maxpulse
from project.fitness
where timemile gt 7;
proc sql;
validate
select timemile, restpulse,
maxpulse,
from project.fitness
where timemile gt 7;
NOTE: PROC SQL statement
has valid syntax.
Syntax error, expecting one
of the following: a quoted
string, !, !!, &...
10.
NoExectproc sql noexec;
select timemile, restpulse, maxpulse
from project.fitness
where timemile gt 7;
NOTE: Statement not executed due to NOEXEC option.
11.
ContrastingFeatures of validate:
-tests syntax of query without
executing the query
-checks the validity of column
name
-prints error messages for
invalid queries
-is only used for select
statements
Features of noexec:
-Checks for invalid syntax in all
types of SQL statements
12.
Overview• Basic Structure
• Verifying Statements
• Specifying Columns
• Specifying Rows
13.
Specifying ColumnsObjectives
-Displaying columns directly from a table
-Displaying columns calculated from other
columns
-Calculating columns using a CASE expression
14.
Displaying data from a tableTo print all of a table columns in the order that they were stored, use an
asterisk in the SELECT statement: PATIENT PULSE TEMP BPS BPD
PROC SQL;
SELECT *
FROM VITALS;
QUIT;
101
101
101
102
102
102
103
103
103
103
104
104
72
75
74
81
77
78
77
77
78
75
72
69
98.5
98.6
98.5
99
98.7
98.7
98.3
98.5
98.6
99.2
98.8
99.1
130
133
136
141
144
142
137
133
140
147
128
131
88
92
90
93
97
93
79
74
80
89
83
86
15.
Printing Specify ColumnsIf you do not want to print out all columns in a table in the order that
they were stored, you can specify the columns to be printed in the
order that you want them in the SELECT statement or CASE
EXPRESSION in the select statement .
PROC SQL;
CREATE TABLE TESTMED AS
SELECT PATIENT,
CASE ((PATIENT/2 =
INT(PATIENT/2)) +
(PATIENT = .))
WHEN 1 THEN 'Med A'
WHEN 0 THEN 'Med B'
ELSE 'Error'
END AS DOSEGRP
LENGTH=5
FROM VITALS
ORDER BY PATIENT;
QUIT;
PATIENT
101
101
101
102
102
102
103
103
103
103
104
104
104
DOSEGRP
Med B
Med B
Med B
Med A
Med A
Med A
Med B
Med B
Med B
Med B
Med A
Med A
Med A
16.
Calculating ColumnsWe can calculate a new column by using data in an existing column
and then naming the new column using the as function.
Calculate the proportion of Units form each country
CODE:
OUTPUT:
17.
Calculated columns using SAS DatesRecall from previous chapters in our SAS book that dates
are stored in a different format when run through SAS.
We will then use these dates to calculate new columns.
18.
Example: Calculate the range of dates in a Dailypricesdataset.
CODE:
OUTPUT:
19.
Creating new columnsThe use of CASE expression can be used to create a new column
CODE:
OUTPUT:
20.
Creating a tableTo create and populate a table with the rows from an SQL query,
use create table.
proc sql;
create table states as
select state_code,
state_name
from d2data.state;
quit;
State_ Obs Code State_Name
99
UT
Utah
100
VT
Vermont
101
VA
Virginia
102
WA
Washington
103
WV
West Virginia
104
WI
Wisconsin
105
WY
Wyoming
106
N/A
21.
Overview• Basic Structure
• Verifying Statements
• Specifying Columns
• Specifying Rows
22.
Specifying Rows in a tableObjectives
-Selecting a subset of rows
-Removing duplicate rows
-Subsetting using where clauses, escape clauses, and
calculated values
23.
Selecting a subset of rowsproc sql;
title 'large orders';
select Product_ID,
total_retail_price
from d2data.order_item
where total_retail_price >
1000;
quit;
Large orders
Total Retail Price
Product ID
For This Product
240200100076
$1,796.00
240400200097
$1,250.40
240100400043
$1,064.00
240200200013
$1,266.00
240300100032
$1,200.20
240300300070
$1,514.40
230100700009
$1,687.50
230100700008
$1,542.60
240300300090
$1,561.80
230100700009
$1,136.20
230100200025
$1,103.60
240200100173
$1,937.20
24.
Where clauseUse a where to specify a condition that data must fulfill before
being selected.
CODE:
OUTPUT:
Where clauses uses common comparisons (lt, gt, eq, etc)
and logical operators (OR, Not, And, In, Is Null, ...).
25.
Removing duplicationsUse distinct keyword to eliminate duplications.
CODE (without DISTINCT):
OUTPUT:
CODE (with DISTINCT):
26.
Escape ClauseThe escape clause allows you to designate a single character that will
indicate how proc sql will interpret LIKE wildcards when SAS is
searching within a character string.
Example: Select observations from a string variable
containing an underscore ('_').
CODE:
OUTPUT:
27.
Subsetting calculated valuesSince the where clause is evaluated before the select, it's
possible for an error to show up since the columns used in the
where clause must exist in the table or be derived from an
existing column.
There are two fixes for this, the first would be repeating the
calculation in the where clause. The alternative method would
be using CALCULATED keyword to refer to an already calculated
column in the select.
28.
Subsetting calculated valuesproc sql;
title 'Lack of profit';
select Product_ID,
((total_retail_price/quan
tity) - costprice_per_Unit) as
profit
from d2data.order_item
where calculated profit < 3;
quit;
title;
Lack of profit
Product ID profit
230100500045
0.7
230100500068
0.9
240100100433
1.85
240700200004
2
240200100021
1.5
240100100031
2.4
240700200007
2.9
240100100232
1.9
230100500004
1.85
230100500004
1.85
240700100017 -1.41
29.
SummaryBasic Structure
• PROC SQL;
statement (select)
clauses (from, where, group by, having, order by);
QUIT;
Verifying Statements
• validate - used to check the select statement syntax
• noexec - checks for invalid syntax in all types of SQL statements
Specifying Columns
• Displaying columns directly from a table
• Displaying columns calculated from other columns
• Calculating columns using a CASE expression
Specifying Rows
• Selecting a subset of rows
• Removing duplicate rows
• Subsetting using where clauses, escape clauses, and calculated values