5.21M
Categories: informaticsinformatics softwaresoftware

Window Functions. Lecture 7

1.

Lecture 7
Window Functions
Assistant professor: Yermaganbetova Madina

2.

PostgreSQL Window Functions
• A window function performs a calculation across a set of table rows
that are somehow related to the current row. This is comparable to
the type of calculation that can be done with an aggregate function.
But unlike regular aggregate functions, use of a window function does
not cause rows to become grouped into a single output row — the
rows retain their separate identities. Behind the scenes, the window
function is able to access more than just the current row of the query
result.

3.

PostgreSQL Window Functions
• Window functions like aggregate functions operate on a set of rows
called a window frame. Unlike aggregate functions, window functions
return a single value for each row from the underlying query. The
window is defined using the OVER() clause. This allows us to define
the window based on a specific column, similar to GROUP BY in
aggregate functions. You can use aggregate functions with window
functions but you will need to use them with the OVER() clause.

4.

Window Functions and Aggregate Functions
Both window functions and aggregate functions:
• Operate on a set of rows
• Calculate aggregate amounts
• Group or partition data on one or more columns
Aggregate functions differ from window functions in:
• Using GROUP BY to define a set of rows for aggregation
• Group rows based on column values
• Collapses rows to a single row for each defined group
Window functions differ from aggregate functions in:
• Use OVER() instead of GROUP BY to define the set of rows
• Use more functions in addition to aggregates eg RANK(), LAG(), LEAD()
• Can group rows on the rows rank, percentile, etc. in addition to its column value
• Does not collapse rows to a single row per group
• Might use a sliding window frame based on the current row

5.

SQL window function types
Value window
functions
• FIRST_VALUE()
• LAG()
• LAST_VALUE()
• LEAD()
Ranking window
Aggregate window
functions
functions
• CUME_DIST()
• AVG()
• DENSE_RANK()
• COUNT()
• NTILE()
• MAX()
• PERCENT_RANK() • MIN()
• RANK()
• SUM()
• ROW_NUMBER()

6.

Window Functons in Action
• Lets assume that you wanted to find the
highest paid person in each department.
There's a chance you could do this by
creating
a
complicated
stored
procedure, or maybe even some very
complex SQL. Most developers would
even opt for pulling the data back into
their preferred language and then
looping over results. With window
functions this gets much easier.

7.

Hopefully its clear from here how we can filter and find only the top paid
employee in each department:
The best part of this is Postgres will
optimize the query for you versus
parsing over the entire result set if you
were to do this your self in plpgsql or in
your applications code.

8.

Analytic (Window) Functions
• Window functions applies aggregate and ranking functions over a
particular window (set of rows).
• Unlike aggregate functions, they can return multiple rows for each
group
• OVER clause is used with window functions to define that window.
• OVER clause does two things :
Partitions rows into form set of rows. (PARTITION BY clause is used)
Orders rows within those partitions into a particular order. (ORDER BY
clause is used)

9.

• Syntax:
window_function(arg1, arg2,..)
OVER ( [PARTITION BY partition_expression]
[ORDER BY expression] );
The window_function is the name of the window function. Some window functions do not accept
any argument.
The PARTITION BY clause divides rows into multiple groups or partitions to which the window
function is applied. Like the example above, we used the product group to divide the products
into groups (or partitions).
The PARTITION BY clause is optional. If you skip the PARTITION BY clause, the window function
will treat the whole result set as a single partition.
The ORDER BY clause specifies the order of rows in each partition to which the window function
is applied.
The ORDER BY clause uses the NULLS FIRST or NULLS LAST option to specify whether nullable
values should be first or last in the result set. The default is NULLS LAST option.
The frame_clause defines a subset of rows in the current partition to which the window function
is applied. This subset of rows is called a frame.

10.

If you use multiple window functions in a query:
you can use the WINDOW clause to shorten the query as shown in the following query:
It is also possible to use the WINDOW clause even though you call one window function in a query:

11.

ROW_NUMBER Function
• Assigns numbering to the rows of the
result set data.
• The set of rows on which
the ROW_NUMBER() function operates
is called a window.
Syntax:
ROW_NUMBER() OVER( [PARTITION BY
column_1, column_2,…] [ORDER BY
column_3,column_4,…] )

12.

Row_number()Example:

13.

RANK() Function
• The RANK() function assigns a ranking
within an ordered partition.
• The rank of the first row is 1.
• The RANK() function adds the number of
tied rows to the tied rank to calculate the
rank of the next row, so the ranks may not
be sequential. In addition, rows with the
same values will get the same rank.
Syntax:
RANK() OVER ( [PARTITION BY
partition_expression, ... ] ORDER
BY sort_expression [ASC | DESC],
... )

14.

RANK() Example:
In the laptop product group, both Dell Vostro and
Sony VAIO products have the same price,
therefore, they receive the same rank 1. The next
row in the group is HP Elite that receives the rank
3 because the rank 2 is skipped.

15.

DENSE_RANK() Function
• The DENSE_Rank() function assigns
ranking within an ordered partition BUT Syntax:
DENSE_RANK() OVER ( [PARTITION BY
the ranks are consecutive.
partition_expression, ... ] ORDER
• For
each
partition, BY sort_expression [ASC | DESC],
the DENSE_RANK() function returns the ... )
same rank for the rows which have the
same values.

16.

DENSE_RANK() Example:
Within the laptop product group, rank 1 is
assigned twice to Dell Vostro and Sony VAIO.
The next rank is 2 assigned to HP Elite.

17.

CUME_DIST()
• Sometimes, you may want to create a report that shows the top or
bottom x% values from a data set, for example, top 1% of products by
revenue. Fortunately, PostgreSQL provides us with the CUME_DIST()
function to calculate it.
• CUME_DIST = RANK/COUNT

18.

CUME_DIST() Example:
SELECT *, cume_dist() OVER ( PARTITION BY subject ORDER BY grade )
FROM student_grade;

19.

PERCENT_RANK()
• The PERCENT_RANK() function is like the CUME_DIST() function. The
PERCENT_RANK() function evaluates the relative standing of a value
within a set of values.
• PERCENT_RANK = (RANK – 1)/(COUNT -1)
• The following illustrates the syntax of the PERCENT_RANK() function:

20.

PERCENT_RANK() Example:
SELECT name, amount, PERCENT_RANK() OVER ( PARTITION BY year ORDER BY
amount ) FROM sales_stats;

21.

NTILE
• The PostgreSQL NTILE() function allows you to divide ordered rows in
the partition into a specified number of ranked groups as equal size as
possible. These ranked groups are called buckets.
• Number of rows in each group = number of rows in set / number of
specified groups

22.

NTILE example:
SELECT name, amount, NTILE(3) OVER(PARTITION BY year ORDER BY amount )
FROM sales_stats;

23.

FIRST_VALUE() Function
• The function returns the first value from the first row of the
ordered set.
Syntax:
FIRST_VALUE ( expression ) OVER ( [PARTITION BY
partition_expression, ... ] ORDER BY sort_expression)

24.

FIRST_VALUE() Example:
The following statement uses the FIRST_VALUE()
to return the lowest price for every product
group.

25.

LAST_VALUE() Function
• The function returns the last value in an ordered partition of a result set.
Syntax:
LAST_VALUE ( expression ) OVER ( [PARTITION
BY partition_expression, ... ] ORDER BY
sort_expression)

26.

LAST_VALUE()
Example:
The following statement uses the LAST_VALUE() function to return
the highest price for every product group.

27.

LEAD() Function
• The LEAD() function has the ability to access data from the next row.
• The LEAD() function is very useful for comparing the value of the current
row with the value of the row that following the current row.
Syntax:
LEAD(expression [,offset
[,default_value]]) OVER ( [PARTITION BY
partition_expression, ... ] ORDER BY
sort_expression [ASC | DESC], ... )

28.

LEAD() Example:
The following statement uses the LEAD()
function to return the prices from the next row
and calculates the difference between the price
of the current row and the next row.

29.

LAG() Function
• The LAG() function has the ability to access data from the previous row.
• The LAG() function will be very useful for comparing the values of the
current and the previous row.
Syntax:
LAG( expression [,offset [,default_value]])
OVER ( [PARTITION BY partition_expression,
... ] ORDER BY sort_expression [ASC | DESC],
... )

30.

LAG()
Example:
The following statement uses the LAG()
function to return the prices from the
previous row and calculates the difference
between the price of the current row and the
previous row.

31.

Assignment
SELECT
ROW_NUMBER() OVER (
ORDER BY E_id
) row_num,
E_name
FROM
Employee;

32.

Assignment
SELECT
first_name,
last_name,
salary,
FIRST_VALUE (first_name)
OVER (
ORDER BY salary
) lowest_salary
FROM
employees e;

33.

Assignment
SELECT
employee_id,
fiscal_year,
salary,
LAG(salary) OVER (
PARTITION BY
employee_id
ORDER BY
fiscal_year) previous_salary
FROM
basic_pays;

34.

Assignment
SELECT
first_name,
last_name,
hire_date,
LEAD(hire_date, 1) OVER (
ORDER BY
hire_date
) AS next_hired
FROM
employees;

35.

Assignment
SELECT
first_name,
last_name,
salary,
RANK() OVER (ORDER BY
salary) salary_rank
FROM
employees;

36.

Assignment
select
sellerid, qty,
percent_rank()
over (partition by
sellerid order by qty)
from winsales;

37.

Assignment
select sellerid, qty,
cume_dist()
over (partition by
sellerid order by qty)
from winsales;

38.

Assignment
select salesid, sellerid,
qty,
row_number() over
(partition by sellerid
order by qty asc) as
row
from winsales
order by 2,4;

39.

Assignment
select eventname, caldate,
pricepaid, ntile(4)
over(order by pricepaid
desc) from sales, event, date
where
sales.eventid=event.eventid
and
event.dateid=date.dateid
and eventname='Hamlet'
and caldate='2008-08-26'
order by 4;
English     Русский Rules