Spring 2012
Master of Computer Application (MCA) – Semester IV
MC0080 – Analysis and design of algorithms
Assignment Set – 2
- Dijkstra's algorithm, conceived by Dutch computer scientist Edsger Dijkstra in 1956 and published in 1959, is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge path costs, producing a shortest path tree. This algorithm is often used in routing and as a subroutine in other graph algorithms.
For a given source vertex (node) in the graph, the algorithm finds the path with lowest cost (i.e. the shortest path) between that vertex and every other vertex. It can also be used for finding costs of shortest paths from a single vertex to a single destination vertex by stopping the algorithm once the shortest path to the destination vertex has been determined. For example, if the vertices of the graph represent cities and edge path costs represent driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other cities.
Let the node at which we are starting be called the initial node. Let the distance of node Y be the distance from the initial node to Y. Dijkstra's algorithm will assign some initial distance values and will try to improve them step by step.
- Assign to every node a tentative distance value: set it to zero for our initial node and to infinity for all other nodes.
- Mark all nodes unvisited. Set the initial node as current. Create a set of the unvisited nodes called the unvisited set consisting of all the nodes except the initial node.
- For the current node, consider all of its unvisited neighbors and calculate their tentative distances. For example, if the current node A is marked with a tentative distance of 6, and the edge connecting it with a neighbor B has length 2, then the distance to B (through A) will be 6+2=8. If this distance is less than the previously recorded tentative distance of B, then overwrite that distance. Even though a neighbor has been examined, it is not marked as visited at this time, and it remains in the unvisited set.
- When we are done considering all of the neighbors of the current node, mark the current node as visited and remove it from the unvisited set. A visited node will never be checked again; its distance recorded now is final and minimal.
- If the destination node has been marked visited (when planning a route between two specific nodes) or if the smallest tentative distance among the nodes in the unvisited set is infinity (when planning a complete traversal), then stop. The algorithm has finished.
- Set the unvisited node marked with the smallest tentative distance as the next "current node" and go back to step 3.
- a) Binary Search Trees: In computer science, a binary search tree (BST), which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structure which has the following properties:
- The left subtree of a node contains only nodes with keys less than the node's key.
- The right subtree of a node contains only nodes with keys greater than or equal to the node's key.
- Both the left and right subtrees must also be binary search trees.
Generally, the information represented by each node is a record rather than a single data element.
However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records.
The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms such as in-order traversal can be very efficient.
Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays. Operations on a binary search tree require comparisons between nodes. These comparisons are made with calls to a comparator, which is a subroutine that computes the total order (linear order) on any two values. This comparator can be explicitly or implicitly defined, depending on the language in which the BST is implemented.
The height of the Binary Search Tree equals the number of links from the root node to the deepest node.
The reason that binary trees are used more often than n-ary trees for searching is that with every comparison in a (balanced) binary tree, you eliminate about half the tree from consideration. In contrast, with
an n-ary tree, you can eliminate (n-1)/n of the tree using log(n) comparisons (using a binary search).
Applications of binary trees
- Binary Search Tree - Used in many search applications where data is constantly entering/leaving, such as the map and set objects in many languages' libraries.
- Binary Space Partition - Used in almost every 3D video game to determine what objects need to be rendered.
- Binary Tries - Used in almost every high-bandwidth router for storing router-tables.
- Hash Trees - used in p2p programs and specialized image-signatures in which a hash needs to be verified, but the whole file is not available.
- Heaps - Used in heap-sort; fast implementations of Dijkstra's algorithm; and in implementing efficient priority-queues, which are used in scheduling processes in many operating systems, Quality-of-Service in routers, and A* (path-finding algorithm used in AI applications, including video games).
- Huffman Coding Tree (Chip Uni) - used in compression algorithms, such as those used by the .jpeg and .mp3 file-formats.
- GGM Trees - Used in cryptographic applications to generate a tree of pseudo-random numbers.
- Syntax Tree - Constructed by compilers and (implicitly) calculators to parse expressions.
- Treap - Randomized data structure used in wireless networking and memory allocation.
- T-tree - Though most databases use some form of B-tree to store data on the drive, databases which
- keep all (most) their data in memory often use T-trees to do so.
b) Red Black Trees: A red–black tree is a type of self-balancing binary search tree, a data structure used in computer science, typically to implement associative arrays.
Since it is a balanced tree, it guarantees insertion, search and delete to be O(log n) in time, where n is the total number of elements in the tree. A red–black tree is a special type of binary tree, used in computer science to organize pieces of comparable data, such as text fragments or numbers.
The leaf nodes of red–black trees do not contain data. These leaves need not be explicit in computer memory—a null child pointer can encode the fact that this child is a leaf—but it simplifies some algorithms for operating on red–black trees if the leaves really are explicit nodes. To save memory, sometimes a single sentinel node performs the role of all leaf nodes; all references from internal nodes to leaf nodes then point to the sentinel node.
Red–black trees, like all binary search trees, allow efficient in-order traversal (that is: in the order Left–Root–Right) of their elements. The search-time results from the traversal from root to leaf, and therefore a balanced tree, having the least possible tree height, results in O(log n) search time. A red–black tree is a binary search tree where each node has a color attribute, the value of which is either red or black. In addition to the ordinary requirements imposed on binary
search trees, the following requirements apply to red–black trees:
- A node is either red or black.
- The root is black. (This rule is sometimes omitted from other definitions. Since the root can always be changed from red to black, but not necessarily vice-versa, this rule has little effect on analysis.)
- All leaves are the same color as the root.
- Both children of every red node are black.
- Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.
- Context-free grammars : In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form V → w; where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty).
The languages generated by context-free grammars are known as the context-free languages.
Context-free grammars are important in linguistics for describing the structure of sentences and words in natural language, and in computer science for describing the structure of programming languages and other formal languages.
In linguistics, some authors use the term phrase structure grammar to refer to context-free grammars, whereby phrase structure grammars are distinct from dependency
grammars. In computer science, a popular notation for context-free grammars is Backus–Naur Form, or BNF. A context-free grammar provides a simple and mathematically precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks, capturing the "block structure" of sentences in a natural way. Its simplicity makes the formalism amenable to rigorous mathematical study.
Important features of natural language syntax such as agreement and reference are not part of the context-free grammar, but the basic recursive structure of sentences, the way in which clauses nest inside other clauses, and the way in which lists of adjectives and adverbs are swallowed by nouns and verbs, is described exactly. The formalism of context-free grammars was developed in the mid-1950s by Noam Chomsky, and also their classification as a special type of formal grammar (which he called phrase-structure grammars). What Chomsky called a phrase structure grammar is also known now as a constituency grammar, whereby constituency grammars stand in contrast to dependency grammars. In Chomsky's generative grammar framework, the syntax of natural language was described by context-free rules combined with transformation rules.
Context-free grammars are simple enough to allow the construction of efficient parsing algorithms which, for a given string, determine whether and how it can be generated from the grammar. An Earley parser is an example of such an algorithm, while the widely used LR and
LL parsers are simpler algorithms that deal only with more restrictive subsets of context-free grammars.
- Turing machines: A Turing machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechanical computation. A Turing machine that is able to simulate any other Turing machine is called a universal Turing machine (UTM, or simply a universal machine). The Turing machine mathematically models a machine that mechanically operates on a tape. On this tape are symbols which the machine can read and write, one at a time, using a tape head. Operation is fully determined by a finite set of elementary instructions such as "in state 42, if the symbol seen is 0, write a 1; if the symbol seen is 1, shift to the right, and change into state 17; in state 17, if the symbol seen is 0, write a 1 and change to state 6;" etc.
Examples of the Turing machine:
- Turing's very first machine
- Copy routine
- 3-state busy beaver
Formal definition: Hopcroft and Ullman (1979, p. 148) formally define a (one-tape) Turing machine as a 7-tuple
M= \langle Q, \Gamma, b, \Sigma, \delta, q_0, F \rangle where
- Q is a finite, non-empty set of states
- \Gamma is a finite, non-empty set of the tape alphabet/symbols
- b \in \Gamma is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step during the computation)
- \Sigma\subseteq\Gamma\setminus\{b\} is the set of input symbols
- q_0 \in Q is the initial state
- F \subseteq Q is the set of final or accepting states
- \delta: Q \setminus F \times \Gamma \rightarrow Q \times \Gamma \times \{L,R\} is a partial function called the transition function, where L is left shift, R is right shift. (A relatively uncommon variant allows "no shift", say N, as a third element of the latter set.)
Anything that operates according to these specifications is a Turing machine.
The 7-tuple for the 3-state busy beaver looks like this (see more about this busy beaver at Turing machine examples):
- Q = \{ \mbox{A}, \mbox{B}, \mbox{C}, \mbox{HALT} \}
- \Gamma = \{ 0, 1 \}
- b = 0 ("blank")
- \Sigma = \{ 1 \}
- q_0 = \mbox{A} (the initial state)
- F = \{ \mbox{HALT} \}
- \delta = see state-table below
Initially all tape cells are marked with 0.
- Matrix Chain Multiplication Algorithm: Matrix chain multiplication is an optimization problem that can be solved using dynamic programming. Given a sequence of matrices, we want to find the most efficient way to multiply these matrices together. The problem is not actually to perform the multiplications, but merely to decide in which order to perform the multiplications. Matrix multiplication is associative:
For four matrices A, B, C, and D: (ABC)D = (AB)(CD) = A(BCD) = A(BC)D = ....
Let's assume that the minimum cost, or minimum number of arithmetic operations, needed to multiply out the matrices. If we're only multiplying two matrices, there's only one way to multiply them, so the minimum cost is the cost of doing this. In general, we can find the minimum cost using the following recursive algorithm:
- Take the sequence of matrices and separate it into two subsequence.
- Find the minimum cost of multiplying out each subsequence.
- Add these costs together, and add in the cost of multiplying the two result matrices.
- Do this for each possible position at which the sequence of matrices can be split, and take the minimum over all of them.
For example, if we have four matrices ABCD, we compute the cost required to find each of (A)(BCD), (AB)(CD), and
(ABC)(D), making recursive calls to find the minimum cost to compute ABC, AB, CD, and BCD. We then choose the best one. Better still, this yields not only the minimum cost, but also demonstrates the best way of doing the multiplication: just group it the way that yields the lowest total cost, and do the same for each factor.
- clique problem: The clique problem refers to any of the problems related to finding particular complete subgraphs ("cliques") in a graph, i.e., sets of elements where each pair of elements is connected. For example, the maximum clique problem arises in the following real-world setting. Consider a social network, where the graph’s vertices represent people, and the graph’s edges represent mutual acquaintance. To find a largest subset of people who all know each other, one can systematically inspect all subsets, a process that is too time-consuming to be practical for social networks comprising more than a few dozen people. Although this brute-force search can be improved by more efficient algorithms, all of these algorithms take exponential time to solve the problem. Therefore, much of the theory about the clique problem is devoted to identifying special types of graph that admit more efficient algorithms, or to establishing the computational difficulty of the general problem in various models of computation. Along with its applications in social networks, the clique problem also has many applications in
- bioinformatics and computational chemistry. Clique problems include:
- finding the maximum clique (a clique with the largest number of vertices),
- finding a maximum weight clique in a weighted graph,
- listing all maximal cliques (cliques that cannot be enlarged)
- Solving the decision problem of testing whether a graph contains a clique larger than a given size.
These problems are all hard: the clique decision problem is NP-complete (one of Karp's 21 NP-complete problems), the problem of finding the maximum clique is both fixed-parameter intractable and hard to approximate, and listing all maximal cliques may require exponential time as there exist graphs with exponentially many maximal cliques. Nevertheless, there are algorithms for these problems that run in exponential time or that handle certain more specialized input graphs in polynomial time.
Karp forms a graph that has a vertex for every pair (v,c), where v is a variable or its negation and c is a clause in the formula that contains v. Vertices are connected by an edge if they represent compatible variable assignments for different clauses: that is, there is an edge from (v,c) to (u,d) whenever c ≠ d and u and v are not each other’s negations. If k denotes the number of clauses in the CNF formula, then the k-vertex cliques in this graph represent ways of assigning truth values to some of its variables in order to
satisfy the formula; therefore, the formula is satisfiable if and only if a k-vertex clique exists.
No comments:
Post a Comment