
Saturday 23 June 2012

MC0077-Spring Drive Assignment-2012



  1. Describe the Following with suitable Examples
a) Cost Estimation
b) Measuring Index Selectivity
a) Cost Estimation:- Cost estimation models are mathematical algorithms or parametric equations used to estimate the cost of a product or project. The results of these models are typically needed to obtain approval to proceed, and are factored into business plans, budgets, and other financial planning and tracking mechanisms.
These algorithms were originally applied manually but are now almost universally computerized. They may be standardized (available in published texts or purchased commercially) or proprietary, depending on the type of business, product, or project in question. Simple models may be built with standard spreadsheet products. Models typically function through the input of parameters that describe the attributes of the product or project in question, and possibly its physical resource requirements; the model then provides as output estimates of the required resources, cost, and time. Cost modeling practitioners often have the titles of cost estimators, cost engineers, or parametric analysts.
Typical applications include:
Construction
Software Development
Manufacturing
New product development
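To make the parametric idea concrete, here is a minimal Python sketch of a parametric cost model for the software development case. It is not drawn from the text above: the constants are the commonly published basic COCOMO "organic mode" values, and the 32 KLOC project size is a made-up input.

```python
# A minimal sketch of a parametric cost model, using the basic COCOMO equations:
# effort = a * (KLOC ** b) person-months, development time = c * (effort ** d) months.
# Constants are the commonly cited basic-COCOMO organic-mode values (assumption);
# the project size is invented for illustration.
def basic_cocomo(kloc: float, a: float = 2.4, b: float = 1.05,
                 c: float = 2.5, d: float = 0.38) -> tuple[float, float]:
    effort = a * kloc ** b        # estimated effort in person-months
    duration = c * effort ** d    # estimated elapsed schedule in months
    return effort, duration

effort, duration = basic_cocomo(32.0)
print(f"effort ~ {effort:.1f} person-months, schedule ~ {duration:.1f} months")
```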
b) Measuring Index Selectivity:- B*TREE indexes improve the performance of queries that select a small percentage of rows from a table. As a general guideline, we should create indexes on tables that are often queried for less than 15% of the table's rows. This value may be higher in situations where all data can be retrieved from an index, or where the indexed columns can be used for joining to other tables. The ratio of the number of distinct values in the indexed column(s) to the number of records in the table represents the selectivity of an index. The ideal selectivity is 1; such selectivity can be reached only by unique indexes on NOT NULL columns.
Example with good selectivity:- If a table has 100'000 records and one of its indexed columns has 88'000 distinct values, then the selectivity of this index is 88'000 / 100'000 = 0.88. Oracle implicitly creates indexes on the columns of all unique and primary keys that you define with integrity constraints. These indexes are the most selective and the most effective in optimizing performance. The selectivity of an index is the percentage of rows in a table having the same value for the indexed column; an index's selectivity is good if few rows have the same value.
Example with bad selectivity:- If an index on a table of 100'000 records has only 500 distinct values, then the index's selectivity is 500 / 100'000 = 0.005, and a query that uses such an index will return 100'000 / 500 = 200 records for each distinct value. In this case a full table scan is more efficient than using the index, because much more I/O is needed to repeatedly scan the index and the table.
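As a quick illustration of the ratio used in the two examples above, this small Python sketch computes the selectivity of an indexed column and the expected number of rows returned per distinct value, using the same figures as the text:

```python
def index_selectivity(distinct_values: int, total_rows: int) -> float:
    """Selectivity = number of distinct values in the indexed column / rows in the table."""
    return distinct_values / total_rows

def rows_per_value(distinct_values: int, total_rows: int) -> float:
    """Average number of rows returned for each distinct indexed value."""
    return total_rows / distinct_values

# Good selectivity: 88'000 distinct values in a 100'000-row table.
print(index_selectivity(88_000, 100_000))   # 0.88 -> index is useful
# Bad selectivity: only 500 distinct values in a 100'000-row table.
print(index_selectivity(500, 100_000))      # 0.005
print(rows_per_value(500, 100_000))         # 200 rows per value -> full scan likely cheaper
```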
  1. Describe the Following with suitable Examples
  1. Graphics Vs Declarative Data Models.
  2. Structural semantic data Model SSDM.
  1. Graphics Vs Declarative Data Models:-A data model is a tool used to specify the structure and (some) semantics of the information to be represented in a database. Depending on the model type used, a data model can be expressed in diverse formats, including:
  • Graphic, as used in most semantic data model types, such as ER and extended/enhanced ER (EER) models.
  • Lists of declarative statements, as used in
    • The relational model for relation definitions,
    • AI/deductive systems for specification of facts and rules,
    • Metadata standards such as Dublin Core for specification of descriptive attribute-value pairs,
    • Data definition languages (DDL).
  • Tabular, as used to present the content of a DB schema.
Even the implemented and populated DB is only a model of the real world as represented by the data. Studies of the utility of different model forms indicate that:
  • Graphic models are easier for a human reader to interpret and check for completeness and correctness than list models (Kim & March, 1995; Nordbotton & Crosby, 1999), while list-form models are more readily converted to a set of data definition statements for compilation and construction of a DB schema.
  • These observations support the common practice of using two model types: a graphic model type for requirements analysis and a list model type (relational, functional, or OO) for implementation.
  • Translation of an ER-based graphic model to list form, or directly to a set of DDL (data definition language) statements, is so straightforward that most CASE (computer-aided software engineering) tools include support for the translation. The problem with automated translations is that the designer may not be sufficiently aware of the semantics lost in the translation.
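To illustrate how mechanical this kind of translation can be, here is a minimal Python sketch that turns an ER-style entity description into a CREATE TABLE statement. The entity, its attributes, and the generated DDL are made-up examples, not the output of any particular CASE tool; note that relationship cardinalities and other semantics are exactly the kind of information such a simple translation drops.

```python
# Hypothetical ER-style entity description used only for illustration.
entity = {
    "name": "Employee",
    "attributes": [("emp_id", "NUMBER"), ("name", "VARCHAR(100)"), ("dept_id", "NUMBER")],
    "primary_key": "emp_id",
}

def entity_to_ddl(entity: dict) -> str:
    """Translate an ER-style entity description into a CREATE TABLE statement."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in entity["attributes"])
    return (
        f"CREATE TABLE {entity['name']} (\n"
        f"  {cols},\n"
        f"  PRIMARY KEY ({entity['primary_key']})\n"
        f");"
    )

print(entity_to_ddl(entity))
```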
  2. Structural Semantic Data Model:-A data model in software engineering is an abstract model that describes how data are represented and accessed. Data models formally define data elements and the relationships among data elements for a domain of interest. According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment." A data model explicitly determines the structure of data or structured data. Typical applications of data models include database models, design of information systems, and enabling the exchange of data. Usually data models are specified in a data modeling language.
Communication and precision are the two key benefits that make a data model important to applications that use and exchange data. A data model is the medium through which project team members from different backgrounds and with different levels of experience can communicate with one another. Precision means that the terms and rules on a data model can be interpreted in only one way and are not ambiguous. A data model is sometimes referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.
A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world; it is sometimes called a conceptual data model. The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data, because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores; the semantic data model defines how those stored symbols relate to the real world, and thus the model must be a true representation of the real world.
The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the starting point for interface or database design. Data architecture is the design of data for use in defining the target state and the subsequent planning needed to reach that target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.
A data architecture describes the data structures used by a business and/or its applications. There are descriptions of data in storage and data in motion; descriptions of data stores, data groups and data items; and mappings of those data artifacts to data qualities, applications, locations etc.
  1. Discuss the following
  1. Query Processing in object-oriented Database System
  2. Query Processing Architecture
  1. Query Processing in object-oriented Database System:-Object-oriented database systems have been proposed as an effective solution for providing the data management facilities of complex applications. Proving this claim, and investigating related issues such as query processing, has been hampered by the absence of a formal object-oriented data and query model. One line of research presents a model of queries for object-oriented databases and uses it to develop a query processing methodology. Two formal query languages are developed: a declarative object calculus and a procedural object algebra. The query processing methodology assumes that queries are initially specified as object calculus expressions. Algorithms are developed to prove the safety of calculus expressions and to translate them to their object algebra equivalents. Object algebra expressions represent sets of objects which may not all be of the same type, which can cause type violations when the expressions are nested.
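As a loose illustration (not the formal object algebra referred to above), the sketch below defines a tiny selection operator over a set of Python objects. The classes and the predicate are made up; the point is simply that the result of such an operator can contain objects of more than one type, which is the source of the typing issues mentioned.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Book:          # hypothetical object type for illustration
    title: str
    year: int

@dataclass(frozen=True)
class Journal:       # a second, structurally different object type
    name: str
    year: int

def select(objects, predicate):
    """Object-algebra-style selection: keep the objects satisfying the predicate."""
    return {obj for obj in objects if predicate(obj)}

db = {Book("DB Systems", 2001), Journal("TODS", 2001), Book("OO Design", 1995)}
recent = select(db, lambda o: o.year >= 2000)
# The result mixes Book and Journal instances: a heterogeneous set of objects.
print(recent)
```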
  2. Query Processing Architecture:-SQL statements are the only commands sent from applications to Microsoft SQL Server 2000. All of the work done by an instance of SQL Server is the result of accepting, interpreting, and executing SQL statements. The processes by which SQL Server executes SQL statements include the following (a sketch illustrating plan reuse follows this list):
Single SQL statement processing.
Batch processing.
Stored procedure and trigger execution.
Execution plan caching and reuse.
Parallel query processing.
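The execution plan caching and reuse item can be illustrated from the client side: sending the same parameterized statement repeatedly lets the server reuse a cached plan instead of recompiling for every literal value. The Python sketch below is a minimal example, assuming the pyodbc package and an ODBC driver are available; the connection string, table, and column names are made-up assumptions, not part of the source text.

```python
# Minimal sketch: repeated execution of one parameterized statement, which allows
# the server to reuse a cached execution plan. Connection details and the
# orders/customer_id schema are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=mydb;Trusted_Connection=yes;"
)
cursor = conn.cursor()

query = "SELECT order_id, total FROM orders WHERE customer_id = ?"
for customer_id in (101, 102, 103):
    # Same statement text plus a parameter marker -> one plan, reused per value.
    cursor.execute(query, customer_id)
    print(cursor.fetchall())

conn.close()
```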
  1. Describe the following
  1. Data mining functions
  2. Data mining Technique
  1. Data mining functions:-Data mining refers to the broadly defined set of techniques for finding meaningful patterns, or information, in large amounts of raw data.
At a very high level, data mining is performed in the following stages (note that the terminology and the steps taken in the data mining process vary by practitioner):
    1. Data collection: gathering the input data you intend to analyze.
    2. Data scrubbing: removing missing records and filling in missing values where appropriate.
    3. Pre-testing: determining which variables might be important for inclusion during the analysis stage.
    4. Analysis/Training: analyzing the input data to look for patterns.
    5. Model building: drawing conclusions from the analysis phase and determining a mathematical model to be applied to future sets of input data.
    6. Application: applying the model to new data sets to find meaningful patterns.
Data mining can be used to classify or cluster data into groups, or to predict likely future outcomes based upon a set of input variables/data. Common data mining techniques and tools include, for example:
    a. decision tree learning
    b. Bayesian classification
    c. neural networks
During the analysis phase (sometimes also called the training phase), it is customary to set aside some of the input data so that it can be used to cross-validate and test the model. This is an important step taken in order to avoid "over-fitting" the model to the original data set used to train the model, which would make it less applicable to real-world applications.
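As an illustration of the hold-out idea described above, the following Python sketch trains a decision tree on one portion of the data and evaluates it on the portion that was set aside. It is a minimal example assuming scikit-learn is installed; the data set is synthetic, generated only for the demonstration.

```python
# Minimal sketch: hold out part of the data to test the model and guard against
# over-fitting. Assumes scikit-learn is available; the data set is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic input data standing in for the collected and scrubbed records.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Set aside 30% of the records for testing instead of training on everything.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=5, random_state=0)
model.fit(X_train, y_train)                                # analysis/training phase
print("held-out accuracy:", model.score(X_test, y_test))   # evaluation on unseen data
```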
  2. Data mining Technique:-Several major data mining techniques have been developed and used in data mining projects recently, including association, classification, clustering, prediction, and sequential patterns. We will briefly examine these data mining techniques with examples to get a good overview of them.
Association:-Association is one of the best known data mining techniques. In association, a pattern is discovered based on a relationship between a particular item and other items in the same transaction. For example, the association technique is used in market basket analysis to identify which products customers frequently purchase together. Based on this data, businesses can run corresponding marketing campaigns to sell more products and make more profit.
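As a toy illustration of the market-basket idea, the sketch below counts how often pairs of products appear in the same transaction. The transactions are invented, and this simple pair counting is only a stand-in for a full association-rule algorithm such as Apriori.

```python
# Count co-occurring product pairs across hypothetical market-basket transactions.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "butter", "coffee"},
]

pair_counts = Counter()
for basket in transactions:
    # Count every unordered pair of items bought together in this transaction.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Items most frequently purchased together, e.g. ('bread', 'butter').
print(pair_counts.most_common(3))
```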
Classification:- Classification is a classic data mining technique based on machine learning. Basically, classification is used to assign each item in a set of data to one of a predefined set of classes or groups. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks, and statistics. In classification, we build software that can learn how to classify data items into groups. For example, we can apply classification in an application that, given all past records of employees who left the company, predicts which current employees are likely to leave in the future. In this case, we divide the employee records into two groups, "leave" and "stay", and then ask our data mining software to classify the employees into each group.
Clustering:-Clustering is a data mining technique that groups objects with similar characteristics into meaningful or useful clusters using an automatic technique. Unlike classification, the clustering technique also defines the classes and puts objects into them, whereas in classification objects are assigned to predefined classes. To make the concept clearer, we can take a library as an example. In a library, books cover a wide range of topics. The challenge is how to keep those books in a way that readers can find several books on a specific topic without hassle. By using a clustering technique, we can keep books that have some kind of similarity in one cluster, or on one shelf, and label it with a meaningful name. If readers want books on a topic, they go only to that shelf instead of searching the whole library.
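A minimal sketch of clustering follows, assuming scikit-learn is available; the two-dimensional points are invented. The key point is that k-means discovers the groups automatically rather than assigning items to predefined classes.

```python
# Minimal clustering sketch: k-means discovers the groups itself.
# Assumes scikit-learn is installed; the points below are made up.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],     # one natural group
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.7],     # another natural group
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assigned to each point, e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)  # the "shelf labels": the centre of each cluster
```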
Prediction:-Prediction, as its name implies, is a data mining technique that discovers the relationship between independent variables, and between dependent and independent variables. For instance, the prediction technique can be used in sales analysis to predict future profit: if we consider sales as an independent variable, profit could be a dependent variable. Then, based on the historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.
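A minimal sketch of this idea follows; the sales and profit figures are invented, and a straight-line least-squares fit via numpy stands in for the fitted regression curve mentioned above.

```python
# Minimal prediction sketch: fit a regression line to historical sales/profit
# data and use it to predict profit for a future sales figure (numbers made up).
import numpy as np

sales  = np.array([10, 20, 30, 40, 50])      # independent variable
profit = np.array([ 2,  5,  7, 11, 13])      # dependent variable

slope, intercept = np.polyfit(sales, profit, deg=1)   # least-squares straight line
predicted_profit = slope * 60 + intercept             # predict profit at sales = 60
print(round(predicted_profit, 2))
```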
Sequential Patterns:-Sequential pattern analysis is a data mining technique that seeks to discover similar patterns in data transactions over a business period. The uncovered patterns are used for further business analysis to recognize relationships among data.
  1. Describe the following
  1. Statement and transaction in a Distributed Database.
  2. Heterogeneous Distributed Database Systems.
  1. Statement and Transaction in a Distributed Database:-A distributed transaction is a transaction that updates data on two or more networked computer systems. Distributed transactions extend the benefits of transactions to applications that must update distributed data. Implementing robust distributed applications is difficult because these applications are subject to multiple failures, including failure of the client, the server, and the network connection between the client and server. In the absence of distributed transactions, the application program itself must detect and recover from these failures.
For distributed transactions, each computer has a local transaction manager. When a transaction does work at multiple computers, the transaction managers interact with other transaction managers via either a superior or subordinate relationship. These relationships are relevant only for a particular transaction. Each transaction manager performs all the enlistment, prepare, commit, and abort calls for its enlisted resource managers (usually those that reside on that particular computer). Resource managers manage persistent or durable data and work in cooperation with the DTC (Distributed Transaction Coordinator) to guarantee atomicity and isolation to an application.
In a distributed transaction, each participating component must agree to commit a change action (such as a database update) before the transaction can occur. The DTC performs the transaction coordination role for the components involved and acts as a transaction manager for each computer that manages transactions. When committing a transaction that is distributed among several computers, the transaction manager sends prepare, commit, and abort messages to all its subordinate transaction managers. In the two-phase commit algorithm for the DTC, phase one involves the transaction manager requesting each enlisted component to prepare to commit; in phase two, if all successfully prepare, the transaction manager broadcasts the commit decision.
In general, transactions involve the following steps:
1) Applications call the transaction manager to begin a transaction.
2) When the application has prepared its changes, it asks the transaction manager to commit the transaction. The transaction manager keeps a sequential transaction log so that its commit or abort decisions will be durable.
    1. If all components are prepared, the transaction manager commits the transaction and the log is cleared.
    2. If any component cannot prepare, the transaction manager broadcasts an abort decision to all elements involved in the transaction.
    3. While a component remains prepared but not committed or aborted, it is in doubt about whether the transaction committed or aborted. If a component or transaction manager fails, it reconciles in-doubt transactions when it reconnects.
When a transaction manager is in-doubt about a distributed transaction, the transaction manager queries the superior transaction manager. The root transaction manager, also referred to as the global commit coordinator, is the transaction manager on the system that initiates a transaction and is never in-doubt. If an in-doubt transaction persists for too long, the system administrator can force the transaction to commit or abort.
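A minimal, in-memory sketch of the two-phase commit flow described above follows. Plain Python classes stand in for real transaction and resource managers; no durable log, networking, or in-doubt recovery is modelled, so this is an illustration of the prepare/commit/abort decision, not the DTC protocol itself.

```python
# Toy two-phase commit coordinator: phase one asks every participant to prepare;
# phase two commits only if all prepared, otherwise broadcasts an abort.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self) -> bool:
        print(f"{self.name}: prepared" if self.can_commit else f"{self.name}: cannot prepare")
        return self.can_commit

    def commit(self):
        print(f"{self.name}: committed")

    def abort(self):
        print(f"{self.name}: aborted")

def two_phase_commit(participants) -> bool:
    # Phase one: every enlisted participant must agree to commit.
    if all(p.prepare() for p in participants):
        for p in participants:        # Phase two: broadcast the commit decision.
            p.commit()
        return True
    for p in participants:            # Any failure to prepare aborts the transaction.
        p.abort()
    return False

two_phase_commit([Participant("db_server_a"), Participant("db_server_b")])
two_phase_commit([Participant("db_server_a"), Participant("db_server_b", can_commit=False)])
```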
  2. Heterogeneous Distributed Database Systems:- In a heterogeneous distributed database system, at least one of the databases is a non-Oracle Database system. To the application, the heterogeneous distributed database system appears as a single, local Oracle Database; the local Oracle Database server hides the distribution and heterogeneity of the data. The Oracle Database server accesses the non-Oracle Database system using Oracle Heterogeneous Services in conjunction with an agent. If you access the non-Oracle Database data store using an Oracle Transparent Gateway, then the agent is a system-specific application. For example, if you include a Sybase database in an Oracle Database distributed system, then you need to obtain a Sybase-specific transparent gateway so that the Oracle Database in the system can communicate with it. Alternatively, you can use generic connectivity to access non-Oracle Database data stores as long as the non-Oracle Database system supports the ODBC or OLE DB protocols.