Systems Analysis &
Design: Topic 11
Database and File Design
- File design and database design are among the most important
steps in the design process.
- The process of database design parallels the process of system
analysis and design.
- First, we must identify the requirements--in this case, data
- Second, we must model the requirements using one of the available
- Finally, we translate the logical model of the database into
a physical method of implementation, choosing among different
storage media, file types, and access methods.
- The objectives of data base design include
- data availability
- data integrity (accuracy and consistency)
- efficient data storage
- efficient updating and retrieval
- purposeful information retrieval (usable format)
Approaches to Data Storage
- There are two approaches to the storage of data in a computer-based
- The first method is to store the data in individual files,
each unique to a particular application.
- The second approach to the storage of data in a computer-based
system involves building a database, which is a formally defined
and centrally controlled store of data intended for use in many
- Although database management systems are widely used, conventional
file structures are still viable alternatives.
- For less complex systems, traditional file systems can be
cheaper to install and easier to maintain.
- They also have an advantage in operating speed.
- The two general methods of file organization are sequential
and random access.
- Records in sequential files must be read and written in the
sequence in which they are physically stored in the files.
- This means that a user must always start at the beginning
of the file and work to the end.
- With sequential files, users cannot jump into the middle even
if they are looking for a single record.
- Sequential files reside on tape or disk and are used primarily
in batch environments.
- Sequential files are quite useful and provide the fastest
access when data must be processed in batches, or groups, when
an immediate response is not necessary.
- For instance, a sequential file might be adequate for a payroll
system in which employee records are updated just once each week.
- Sequential files cannot be used when rapid access to specific
records is required.
- Random access files allow the user to get any record without
first reading the records preceding it.
- Random files reside on a disk and are required for on-line
- Random access files are necessary when the user must be able
to access an individual record rapidly, such as customer queries.
Advantages of Conventional Files:
- Files can be designed and built quite rapidly.
- Processing speed is better when using files.
Disadvantages of Conventional Files:
- When it becomes important to query the system for a combination
of some of the attributes, these attributes may be contained in
separate files or may not even exist.
- Files necessarily involve data redundancy.
- Updating files is more time-consuming.
- Data integrity is a concern, since a change in one file will
also require modification of the same data in other files.
- A database is a central source of data meant to be shared
by many users for a variety of applications.
- The heart of a database is the DBMS (database management system),
which allows the creation, modification, and updating of the database;
the retrieval of data; and the generation of reports.
- The effectiveness objectives of the database include:
- Ensuring that data can be shared among users for a variety
- Maintaining data that are both accurate and consistent.
- Ensuring that all data required for current and future applications
will be readily available.
- Allowing the database to evolve as the needs of the users
- Allowing users to construct their personal view of the data
without concern for the way the data are physically stored.
Advantages of Databases:
- The sharing of the data means that data need to be stored
- Data integrity is more likely since changes to data are accomplished
more easily and reliably if the data appear once rather than in
many different files.
- The data have a better chance of being available in a database
than in a conventional file system.
- A database can evolve as the needs of users and applications
- The database approach allows users their own view of the data.
- DBMS software is generally accompanied by a query language,
which is a generalized language that allows users to interrogate
a database easily.
- DBMS software is generally accompanied by report generation
facilities that can be used to easily produce desired reports.
Disadvantages of Databases:
- All of the data are stored in one place and are therefore
more vulnerable to catastrophes and require complete backup.
- A database cannot be optimized for retrieving data for a specific
application, since it may be shared by many users for various
- Additional software for the DBMS is required, meaning greater
- DBMS software sometimes requires computer upgrades.
- Any object or event about which someone chooses to collect
data is an entity.
- An entity may be a person, place, or thing--for example, a
salesperson, a city, or a product.
- Any entity can also be an event or unit of time such as a
machine breakdown, a sale, or a month or year.
- Relationships are associations between entities.
- Relationships can be one-to-one, one-to-many, or many-to-many.
- An attribute is some characteristic of an entity.
- There can be many attributes for each entity.
- The words data item and field are used interchangeably with
attribute, and are the same as the data elements that are included
in the data dictionary.
- Attributes can have values, can be of fixed or variable length,
and can be alphabetic, numeric, or alphanumeric.
- A record is a collection of data items that have something
in common with the entity described.
- An illustration of a record is shown below (Figure 17.9).
- A key is one of the data items in a record that is used to
identify a record.
- When a key uniquely identifies a record, it is called a primary
- When a key does not uniquely identify a record,
but can still be used for data retrieval purposes, it is called a
- Metadata is data that provides details about the data in
- Metadata provides such details as the data item name as well
as the data type and length of each.
- Entity-relationship diagrams provide a way of modeling the
data items and the relationships among them.
- An ERD offers several advantages:
- It is a model of logical data, independent of the actual physical
implementation of data storage and retrieval.
- It provides a tool for communicating with the database designers
and administrators. The analysts define the logical requirements
of the database in a set of specifications, which is then passed
on to database designers who figure out exactly how to accomplish
the physical details.
- It defines the data schema precisely, with no ambiguity.
- The model converts easily into a physical implementation.
- Entities are represented by rectangles
- Relationships are represented by diamonds.
- Relationships are explicitly named with a verb.
- Two entities and a connecting relationship can be interpreted
as a sentence, such as "PATIENT experiences TREATMENTS"
or conversely "TREATMENTS are experienced by PATIENT."
- An entity-relationship example is shown below (Figure 17.7).
- A relational structure consists of one or more two-dimensional
tables, which are referred to as relations.
- The rows of the table represent the records, and the columns
- Figure 17.19 shows three tables to (1) describe the items
and keep track of the current price of compact discs (ITEM-PRICE),
(2) describe the details of the order (ORDER), and (3) identify
the status of the order (ITEM-STATUS).
- To determine the price of an item, we need to know the item
number to be able to find it in the relation ITEM-PRICE.
- To update "G. MacRae's" credit-card number, we can
search the ORDER relation for MacRae and correct it only once,
even though he ordered many compact discs.
- To find out the status of part of an order, however, we must
know the ITEM-# and ORDER-#, and locate that information in the
- One of the primary advantages of the relational
is that ad-hoc queries are efficiently handled.
- When referring to relational structures a file is called a
relation, a record is usually
referred to as a tuple, and
the attribute value set is called a domain.
- Normalization is the transformation of complex data stores
to a set of smaller, stable data structures.
- Third Normal form can be attained by following the three steps
- The first stage of the process includes removing all
groups and identifying the primary key.
- The second step ensures that all nonkey attributes are fully
dependent on the primary key. All partial
dependencies are removed and placed in another relation.
- The third step removes any transitive
dependencies. A transitive dependency is one in which
nonkey attributes are dependent on other nonkey attributes.
(SALESPERSON-NUMBER, SALESPERSON-NAME, SALES-AREA,
CUSTOMER-NAME, WAREHOUSE-NUMBER, WAREHOUSE-LOCATION, SALES-AMOUNT)
* the inner set of parentheses represents the repeating group.
First Normal Form (1NF)
- The first step in normalizing a relation is to remove the
- Figure 17.26 shows how the original, unnormalized relation
is normalized by separating the relation into two new relations:
- The relation SALESPERSON contains
the primary key SALESPERSON-NUMBER
and all of the nonrepeating-attributes.
- The second relation, SALESPERSON-CUSTOMER,
contains the primary key from the relation SALESPERSON
as well as all of the attributes that were part of the repeating
group. In one approach, one must use a concatenated key (both
to access the rest of the information. A more detailed approach will be
covered in CIS 480.
- These relations can be written as
SALESPERSON (SALESPERSON-NUMBER, SALESPERSON-NAME, SALES-AREA)
CUSTOMER-NUMBER, CUSTOMER-NAME, WAREHOUSE-NUMBER, WAREHOUSE-LOCATION,
- The relation SALESPERSON-CUSTOMER
is a first normal relation, but it is not in its ideal form because
some of the attributes are not functionally dependent on the primary
key, SALESPERSON-NUMBER, CUSTOMER-NUMBER.
- This means that some of the nonkey attributes are dependent
only on CUSTOMER-NUMBER,
and not on the entire key.
- For example SALES-AMOUNT
is dependent on both SALESPERSON-NUMBER
but the other three attributes are dependent only on CUSTOMERNUMBER.
Second Normal Form (2NF)
- In the second normal form, all of the attributes will be functionally
dependent on the primary key.
- Therefore, the next step is to remove all of the partially
dependent attributes and place them in another relation.
- Figure 17.28 shows how the relation SALESPERSON-CUSTOMER
is split into two new relations: SALES
- These relations can also be expressed as:
CUSTOMER-WAREHOUSE (CUSTOMER-NUMBER, CUSTOMER-NAME, WAREHOUSE-NUMBER, WAREHOUSE-LOCATION)
- Although the relation CUSTOMER-WAREHOUSE
is in second normal form it can be simplified further because
some of the nonkey attributes are dependent not only on the primary
key, but also on a nonkey attribute.
- This type of dependency is referred to as a transitive dependency.
is dependent not only on CUSTOMER-NUMBER,
but also on WAREHOUSE-NUMBER.
Third Normal Form (3NF)
- Third normal form requires that all of the nonkey attributes
are fully functionally dependent on the primary key and there
are no transitive (nonkey) dependencies.
- It is possible to break apart the relation CUSTOMER-WAREHOUSE
into two relations, as shown in Figure 17.30, to get two new relations
called CUSTOMER and WAREHOUSE:
is a foreign key in the relation CUSTOMER.
- A foreign key is any attribute that is nonkey in relation,
but serves as a primary key in another relation.
- Normalizing SALES-REPORT
to third normal form results in four 3NF relations:
SALESPERSON (SALESPERSON-NUMBER, SALESPERSON-NAME, SALES-AREA)
- The third normal form is adequate for most database design
- The simplification gained from transforming an unnormalized
relation into a set of 3NF relations is a tremendous benefit in
the insertion, deletion, and updating of information in the database.
- An entity-relationship diagram for the database is shown in
Guidelines for Database Design
- Do not combine two distinct entities in one file.
- A specific data field should exist only in one master file.
- Each database relation should have programs to Read, Add,
Modify, and Delete records.
- When designing the data storage for a new system, we must
first decide whether the access should be sequential or on-line.
- If it is to be on-line, we have to choose among the indexed
files, direct access files, and database management systems.
- If the DBMS category is chosen, we must then choose among
several DBMS packages available.
- In all cases the analyst must evaluate the strengths and weaknesses
of each alternative.
- Compatibility with any existing databases may also factor
into the decision, as will the ability of the database to handle
an appropriate volume of data with acceptable speed and efficiency.
- In the ideal database, each piece of data is stored in exactly
- The more redundancy there is, the greater the maintenance
difficulty and the greater the waste of storage space.
- In some cases, however, we may choose to tolerate a certain
amount of redundancy in order to gain convenience or faster performance.
- In any case, we should be aware that redundancy degrades maintainability,
and should be used with care.
Planning for Performance
- A database must allow fast and efficient access to the required
- This is especially critical since input and output operations
are traditionally among the slowest of computer tasks.
- We must identify the types of data access that will be required
most often, and then (if possible) design the database in such a way as to facilitate
those types of access.
Building in Flexibility
- The business environment changes by the minute, and the database
must change to match.
- The database should be set up so that it can easily accommodate
revisions in data structure and access methods without requiring
modifications of existing programs.
- It should also allow ad hoc requests for information--which
are spur-of-the-moment and not likely to occur again.
- This type of request can be very important in making management
decisions, yet its very nature makes it impossible to plan far
enough in advance to write a specific program to retrieve the
- Most database management systems have a query language that
allows ad hoc requests.
Insuring Integrity and Security
- Information in a database is of little value if it is unreliable.
- At the same time it must be possible to exclude unauthorized
users from accessing certain areas of the database.