[자료구조와 알고리즘] Heaps

13 분 소요

10. Heaps

heap을 이용하여 sorting을 하는 것을 heap sort라고 함.

heap이 주로 사용되는 곳은 priority queue임.

10-2

Complete binary tree

Complete binary tree has the maximum number of entries for its height
- 첫 번째 그림(height = 0) 에서 height 0에 대해 node를 maximum으로 채우는 경우는 1개가 들어가는 구조.

Binary tree is nearly complete if
- it has the minimum height for its nodes and
- all nodes in the last level are found on the left

10-3

Properties of Nearly complete binary trees

h = [log₂N], where N is the number of nodes and h is the height of the tree
- [] means Gauss’s notation
Leaf nodes are [N/2] + 1, [N/2] + 2, …, N
- 1 ~ [N/2] 은 자식노드를 가지는 internal node, [N/2] + 1 ~ N 는 leaf node이다.
In a complete binary tree, N = 2^h+1 – 1

10-4

6.1 Heaps

Heap is a nearly complete binary tree that satisfies the heap property
There are two kinds of binary heaps:
- max-heaps and min-heaps.
In both kinds, the values in the nodes satisfy a heap property, the specifics of which depend on the kind of heap.
In a max-heap, the max-heap property is that for every node i other than the root,
- 어떤 root 노드의 값이 그 노드의 자식 노드의 값보다 항상 크거나 같다.
- A[Parent(i)] >= A[i],
- that is, the value of a node is at most the value of its parent. Thus, the largest element in a max-heap is stored at the root
- max-heap에서 가장 큰 element는 root에 담겨있다.
A min-heap is organized in the opposite way;
- The smallest element in a min-heap is at the root

10-5

(a) 그림의 tree는 nearly complete binary tree이면서 max heap property를 만족한다.

10-6

6.2 Maintaining the heap property

In order to maintain the max-heap property, we call the procedure MAX-HEAPIFY.

Its inputs are an array A and an index i into the array. When it is called, MAX-HEAPIFY assumes that the binary trees rooted at LEFT(i) and RIGHT(i) are max-heaps, but that A[i] might be smaller than its children, thus violating the max-heap property.

MAX-HEAPIFY lets the value at A[i] “float down” in the max-heap so that the subtree rooted at index i obeys the max-heap property.

어떤 node의 값을 max heap property를 만족하도록 구조를 조정하는 것

10-7

max heap을 만들어야 하는데 2번 노드를 보니까 max heap property를 만족하지 않는 것을 볼 수 있다. (자식 노드보다 내가 더 크지 않음)
heap property를 만족할 때까지 해당 노드를 끌어 내린다.
2번 노드와 그 자식 노드인 4번 노드의 값을 비교하여 자식 노드의 값이 더 크다면 두 노드의 값을 바꾼다.
또 다시 4번 노드를 확인하여 위의 과정을 진행한다.
- 만족하지 않기 때문에 4번 노드와 9번 노드의 값을 바꿔준다.
만족!

10-8

6.3 Building a heap

We can use the procedure MAX-HEAPIFY in a bottom-up manner to convert an array A[1 .. n]

where n = A.length, into a max-heap.

10-9

internal node 중에서 가장 큰 애부터 루트 노드까지에 대하여 Max- heapify를 진행한다.

10-10

Implementation of Max Heap

class MaxHeap:  # 1
    def __init__(self, data_list = None):
        self.h = [0]	# the heap
        if data_list is not None:
            self.h.extend(data_list)
            
        # 7; build the max heap
        # apply max_heapify() only for internal nodes
        for i in range(self.size() // 2, 0, -1): # internal node의 제일 끝에 있는 애부터 1까지
            self.max_heapify(i)
            

a = [4, 8, 7, 2, 9, 10, 5, 1, 3, 6] # 66
h1 = MaxHeap(a)
print(h1.h)

10-11

	# 12; return the current size of the heap
    def size(self):
        return len(self.h) - 1 # 0 제외
    
    # 16; move item k as down as possible
    def max_heapify(self, k): # 17 ; k는 heapify를 진행할 노드 number
        left = k * 2
        right = k * 2 + 1
        
        largest = k # 21
        if left <= self.size() and self.h[left] > self.h[largest]:
            largest = left
        if right <= self.size() and self.h[right] > self.h[largest]:
            largest = right
         
        if largest != k: # 27
            self.h[k], self.h[largest] = self.h[largest], self.h[k]
            self.max_heapify(largest) # recursion

largest는 현재 부모 노드를 말하고 left node가 존재하면서(and) 그 left 노드가 부모 노드보다 크면 largest를 바꾼다.(right도 마찬가지로)
둘의 값을 바꿔준다.
sub tree로 들어가기 위해서 재귀 호출을 진행한다.()
left를 먼저 확인하고 right를 확인하기 때문에 둘 다 max-heapify를 만족하지 않는 경우 마지막에 largest를 update한 right node와 교환을 진행한다.

# max_heapify 한 번 진행할 때마다 바뀌는 과정
[4, 8, 7, 2, 9, 10, 5, 1, 3, 6] # 초기 배열

[0, 4, 8, 7, 2, 9, 10, 5, 1, 3, 6] 
[0, 4, 8, 7, 3, 9, 10, 5, 1, 2, 6] 2-3
[0, 4, 8, 10, 3, 9, 7, 5, 1, 2, 6] 7-10
[0, 4, 9, 10, 3, 8, 7, 5, 1, 2, 6] 8-9
[0, 10, 9, 7, 3, 8, 4, 5, 1, 2, 6] 10-7-4(recursion)
[0, 10, 9, 7, 3, 8, 4, 5, 1, 2, 6] 

한 노드에 대해서 max-heapify를 돌리면 나오는 결과를 보여줌
10개의 노드가 있기 때문에 10/2 부터 1까지 for문이 돈다. (5~1)

# 재귀 전까지 print()를 찍어본 것인데 참고
[0, 4, 8, 7, 2, 9, 10, 5, 1, 3, 6] # 초기 상태
[0, 4, 8, 7, 3, 9, 10, 5, 1, 2, 6] # 4번째 노드 (2,3 교환)
[0, 4, 8, 7, 3, 9, 10, 5, 1, 2, 6] # 재귀를 했지만 다음 노드가 없어서 변화 x
[0, 4, 8, 10, 3, 9, 7, 5, 1, 2, 6] # 3번째 노드 (7,10 교환)
[0, 4, 8, 10, 3, 9, 7, 5, 1, 2, 6] # 재귀를 했지만 다음 노드가 없어서 변화 x
[0, 4, 9, 10, 3, 8, 7, 5, 1, 2, 6] # 2번째 노드 (8,9 교환)
[0, 4, 9, 10, 3, 8, 7, 5, 1, 2, 6] # 재귀를 했지만 이미 만족하고 있어서 변화 x(9, 6)
[0, 10, 9, 7, 3, 8, 4, 5, 1, 2, 6] # 1번째 노드 (4, 10 교환)
[0, 10, 9, 7, 3, 8, 4, 5, 1, 2, 6] # 
[0, 10, 9, 7, 3, 8, 4, 5, 1, 2, 6]
[0, 10, 9, 7, 3, 8, 4, 5, 1, 2, 6]

아래 것이 맞는 거임(재귀까지 고려)

[0, 4, 8, 7, 2, 9, 10, 5, 1, 3, 6] # 초기 상태
[0, 4, 8, 7, 2, 9, 10, 5, 1, 3, 6]
[0, 4, 8, 7, 3, 9, 10, 5, 1, 2, 6] # 4번째 노드 (2,3 교환)
[0, 4, 8, 7, 3, 9, 10, 5, 1, 2, 6]
[0, 4, 8, 10, 3, 9, 7, 5, 1, 2, 6] # 3번째 노드 (7,10 교환)
[0, 4, 8, 10, 3, 9, 7, 5, 1, 2, 6]
[0, 4, 9, 10, 3, 8, 7, 5, 1, 2, 6] # 2번째 노드 (8,9 교환)
[0, 4, 9, 10, 3, 8, 7, 5, 1, 2, 6]
[0, 10, 9, 4, 3, 8, 7, 5, 1, 2, 6] # 1번째 노드 (4, 10 교환)
[0, 10, 9, 7, 3, 8, 4, 5, 1, 2, 6] # (재귀) 3번째 노드(4, 7 교환)
[0, 10, 9, 7, 3, 8, 4, 5, 1, 2, 6] 

코드 주고 다음을 실행하였을 때 배열의 값들이 어떻게 변화하는지 시험에 나올 것 같음.

10-12

6.4 The heapsort algorithm

asending order(오름차순)
내가 가진 값들 중 가장 큰 data를 가진 노드는 항상 1번 index 일 것인데 이 아이디어를 이용

제일 큰 값인 1번 index를 제일 작은 값인 마지막 index 간의 자리를 서로 바꾼다.
heap의 길이를 하나 줄인다.
- 이 과정을 통해 1번 index는 유일하게 heap property를 만족하지 않는 노드가 된다.
맨 위에 있는 1번 index의 노드(최솟값)를 MAX_HEAPIFY를 통해 아래로 끌어내린다.
위의 3 과정을 계속해서 반복(재귀적으로)

10-13

10-14

시험에 100% 나옴

	def heap_sort(self): # 31
        save = self.h[:] # shallow copy
        sorted_list = self.h[1:]
        
        for i in range(self.size(), 0, -1): # 35 ; 제일 마지막 노드부터
            self.h[1], self.h[i] = self.h[i], self.h[1]
            sorted_list[i - 1] = self.h[i] #제일 큰 걸 가져와서 list에 넣는다.
            self.h.pop() #list에 넣은 건 제거
            self.max_heapify(1) # 39 ; root에 대한 heapify
            
        self.h = save # 41 ; 망가진 h 복구
        return sorted_list

print(h1.heap_sort())
print(h1.h)

save에 h를 copy를 하고나서 heapify를 진행하면 h에 대해 heapify가 진행 되어 h가 망가지기 때문에 저장했던 save를 h에 다시 복구해 준다.
max-heapify는 클래스에 내장된 h를 이용하기 때문에 save에 대해서 heapify가 진행되지 않고 무조건 h로 heapify가 진행된다. 그래서 root 노드와 가장 마지막 노드를 바꾸는 것 까지는 되지만 제일 중요한 heapify를 못하기 때문에 내가 질문 한 건 틀린 것이다.
- 내가 한 질문: save로 heap sort를 진행하고 41번 라인을 없애면 안되나?

10-15

6.5 Priority queues

In this section, we present one of the most popular applications of a heap:

as an efficient priority queue.

As with heaps, priority queues come in two forms:

max-priority queues
min-priority queues.

A priority queue is a data structure for maintaining a set S of elements, each with an associated value called a key. A max-priority queue supports the following operation:

INSERT(S, x) inserts the element x into the set S, which is equivalent to the operation

MAXIMUM(S) returns the element of S with the largest key.
EXTRACT-MAX(S) removes and returns the element of S with the largest key.

10-16

procedure HEAP-EXTRACT-MAX

psuedo code

#HEAP-EXTRACT-MAX(A) # max heap 리스트 
if A.heap_size < 1: # 1
    error "heap underflow"
max = A[1]
A[1] = A[A.heap_size] # 제일 끝에 있는 노드를 제일 첫 노드에 넣는다.
A.heap_size = A.heap_size - 1 # heap size를 하나 줄인다.
MAX-HEAPIFY(A, 1) # 제일 위에 있는 노드는 heap property를 만족하지 않으므로 끌어 내린다.
return max # 7

10-17

	def pop(self): # 44
        if self.size() == 0:
            return None
        
        item = self.h[1] # 48 
        # copy the last item to root and remove it
        self.h[1] = self.h[self.size()]
        self.h.pop()
        self.max_heapify(1)
        
        return item # 54

for _ in range(h1.size()): # 73
    print(h1.pop(), end=' ')
print()
print(h1.h) # 76

10-18

procedure MAX-HEAP-INSERT

데이터를 추가할 때는 그냥 집어 넣으면 안 되고 적절히 자기 자리를 찾아서 넣어주어야 한다.

heap property를 만족하도록 15를 추가하려고 한다.
새로 추가할 노드를 무조건 맨 밑, 맨 왼쪽에 붙인다. -> 항상 nearly complete binary tree (heap 만족)
붙인 15를 heap property를 만족하도록 위로 올린다. (heapify)
위로 올리는 과정은 아래로 내리는 heapify보다 더 쉽다.
- 부모 노드랑만 비교하면 되기 때문에 (부모 노드보다 크면 자식 노드보다는 무조건 클 것이기 때문에)

10-19

	def insert(self, item): # 56
        self.h.append(item)
        
        # move up the item to keep the max-heap property
        k = self.size() # 60
        while k > 1 and self.h[k] > self.h[k//2]:
            self.h[k], self.h[k//2] = self.h[k//2], self.h[k]
            k //= 2 # 내 index를 부모 node index와 바꿈.

h1 = MaxHeap() # 80
for d in (4, 8, 7, 2, 9, 10, 5, 1, 3, 6):
    h1.insert(d)
print(h1.h)

먼저 heap의 제일 마지막에 붙이면 되기 때문에 그냥 append를 하고
heap이 비지 않았으면서(and) 새로 추가하는 노드가 부모 노드보다 크다면 둘이 바꿔줘야 한다.
바꿔주고 나서 내 인덱스를 바꾼 위치로 update해서 반복문을 계속 돌면서 올라간다.
이 과정을 거치면 새로 추가하는 노드는 자기의 위치를 찾아갈 것이다.

10-20

Time Complexity of Building a Heap

MaxHeap()의 time complexity는?

Each call to MAX-HEAPIFY costs O(logn) time, and BUILD-MAX-HEAP makes O(n) such calls.

Thus, the running time is O(nlogn).
MAX-HEAPIFY -> root 노드부터 시작해서 재귀적으로 heapify
BUILD-MAX-HEAP -> internal node의 가장 큰 애부터 root node까지 max-heapify를 진행하는 과정

This upper bound, though correct, is not asymptotically(점근적으로) tight.

We can derive a tighter bound by observing that the time for MAX-HEAPIFY to run at a node varies with the height of the node in the tree, and the heights of most nodes are small.

Out tighter analysis relies on the properties that an n-element heap has height [logn] and at most [n/2^h+1] nodes of any height h

[n/2^h+1ㄱ -> 반올림(ceiling)
- It means nodes of any height h
- 각 층에 있을 수 있는 노드의 최대 갯수
height = ㄴlogn] -> 가우스 (괄호에서 위가 잘림 - ㄴ 모양)
- It means height
제일 아래 층이 h = 0! / n은 노드의 전체 갯수를 의미!

worst time complexity

앞에 곱해지는 term
- Max heapify에서 loop를 도는 최대 횟수
- 0층은 leaf node이므로 loop를 0번 도는 것이고
- 1층은 그 위의 노드로 0층에 대한 loop를 1번 돌면 된다.
뒤에 곱해지는 term(ex.n/2 by 위에 있는 n/2^h+1)
- n/2 : h=0에 있을 수 있는 노드의 최대 갯수
- n/4: h=1에 있을 수 있는 노드의 최대 갯수

10-21

The time required by MAX-HEAPIFY when called on a node of height h is O(h), and so we can express the total cost of BUILD-MAX-HEAP as being bounded from above by

-> O(h) : max-heapify에서 loop를 도는 횟수

0x , 1x, 2x, …. 를 묶어 놓은 것

오른쪽 항에 앞에 n뒤에 붙은 sum의 의미는 n이 어떤 값을 갖던지 상관없이 2보다 작거나 같은 상수를 뜻한다.
- O(n) 으로 퉁칠 수 있음

We evaluate the last summation by substituting x = 1/2 in the formula (A.8), yielding

10-22

Why should one ever use a heap instead of a sorted array?

정렬된 배열 대신 힙을 사용해야 하는 이유?

Thus, we can bound the running time of BUILD-MAX-HEAP as

Yes, building a heap takes linear time and inserts/deletes take O(logn). While peeking the heap, which means finding the min/max stored element is, of course, O(1), popping the head is O(logn).

The answer is simple. In some use cases, you don’t care about sorting all of your data, but you do care about constantly maintaining the smallest/largest element, which corresponds to a Priority Queue, a concept that is used extensively and in many applications.

삽입을 하는 상황에서
- sorted list의 경우 -> O(n)
  - 하나하나 밀어야 되니까
- priority queue의 경우 -> O(logn)
큰 데이터를 뽑아내야 하는 상황에서
- sorting의 경우, 정렬을 하는데 O(nlogn)이고
- priority queue의 경우 O(n)이기 때문에 prioriy queue가 훨씬 빠르다.

10-23

heapq - HEAP queue algorithm

python 내장 heap module

heapq.heappush(heap, item)

Push the value item onto the heap, maintaining the heap invariant
- insert 하는 함수
- 값이 제일 작은 애가 제일 앞에 나와있는 구조

heapq.heappop(heap)

Pop and return the smallest item from the heap, maintaining the heap invariant. If the heap is empty, IndexError is raised. To access the smallest item without popping it, use heap[0].
제일 앞에 있는 놈을 추출

heapq.heappushpop(heap, item)

Push item on the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().
item을 insert하고, 제일 앞에 있는 data를 pop

heapq.heapify(x)

Transform list x into a heap, in-place, in linear time.
주의할 점: minheap 이라는 점이다. (우리가 배운 maxheap과 다르게)

https://docs.python.org/3.8/library/heapq.html

10-24

import heapq # 1


# A heap sort algorithm
def heapsort(iterable): # 5
    h = []
    for value in iterable: # heapify 하는 과정
        heapq.heappush(h, value)
    return [heapq.heappop(h) for _ in range(len(h))] # sorting 과정


# You can transform a populated list into a heap
a = [3, 5, 1, 2, 6, 8, 7] # 13 
heapq.heapify(a)
print(a)
print(heapsort(a))

# Or use a list initialized to []
# Heap elements can be tuples.
b = [] # 20
heapq.heappush(b, (5, 'write code'))
heapq.heappush(b, (7, 'release product'))
heapq.heappush(b, (1, 'write spec'))
print(heapq.heappop(b)) # 24

실행 결과

[1, 2, 3, 5, 6, 8, 7]
[1, 2, 3, 5, 6, 7, 8]
(1, 'write spec')

튜플로 heap data가 구성되는 경우 무조건 맨 앞에 있는 element를 기준으로 우선순위가 결정 된다.
- 그래서 가장 작은 값을 가지는 1의 튜플을 반환한다.(min-heap이기 때문에)

(별첨)max-heapify 되는 과정

[0, 3, 5, 1, 2, 6, 8, 7] 
[0, 3, 5, 8, 2, 6, 1, 7]
[0, 3, 5, 8, 2, 6, 1, 7]
[0, 3, 6, 8, 2, 5, 1, 7]
[0, 3, 6, 8, 2, 5, 1, 7]
[0, 8, 6, 3, 2, 5, 1, 7]
[0, 8, 6, 7, 2, 5, 1, 3]

heap의 크기가 7이므로 7//2 = 3번부터 1번까지 min-heapify가 진행된다.

10-25

https://www.acmicpc.net/problem/14235

10-26

max heap으로 priority queue를 만들어서 큐가 비었으면 -1 아니면 queue에서 꺼내서 준다.
python은 minheap이기 때문에 max heap으로 하기 위해서는 queue에 저장할 데이터를 저장할 때 -를 붙였다가 다시 출력할 때 -를 붙여주면 된다.
2개를 충전하는데 첫번째는 3짜리 두번째는 2짜리
- 우선순위로 3을 먼저 줄 것이다.

import heapq
import sys

sys.stdin = open('bj14235_in.txt', 'r')

n = int(input())
present = []

for i in range(n):
    a = list(map(int, input().split()))

    if a[0] == 0:
        if len(present) == 0:
            print(-1)
        else:
            tmp = -heapq.heappop(present)
            print(tmp)
    else:
        for j in range(a[0]):
            heapq.heappush(present, -a[j+1])

10-27

https://www.acmicpc.net/problem/14241

10-28

큰 점수를 얻으려면 최대한 먼저 큰 놈끼리 합쳐야 한다.
queue에 데이터가 하나 남을 때까지 반복
- 두 놈을 꺼내서 점수 계산하고 합친 놈을 다시 queue에 넣고

Twitter Facebook LinkedIn

Kang Chang Ryong