Data gives rise to information, which in turn gives rise to knowledge. Knowledge leads to understanding, and understanding leads to wisdom.
Data may be univariate if it has only one variable, bivariate if it has two variables (allowing correlation), or
multivariate if it has several variables (allowing more sophisticated analyses).
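
As an illustration of what two variables make possible, the following minimal sketch computes a Pearson correlation coefficient
for a bivariate data set. The function name pearson_r and the age and blood pressure values are hypothetical, chosen only for
the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical bivariate data: age (years) and systolic blood pressure (mmHg).
ages = [30, 40, 50, 60, 70]
pressures = [115, 120, 128, 135, 140]
r = pearson_r(ages, pressures)
print(f"r = {r:.3f}")
```

A univariate data set (ages alone) could only be summarized; the second variable is what allows a statement about association.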
A document is stored data in any form: paper, book, letter, message, image, e-mail, voice, or sound. Some documents are ephemeral
but can still be retrieved during the brief time that they exist.
Data is physically stored as bytes. A byte has 8 bits and can therefore represent 2^8 = 256 distinct values, for example 256
characters. ASCII is a 7-bit character code that defines 128 codes (95 printable character codes and 33 control codes). ANSI is
an extension of ASCII used by Microsoft. Different languages use different numbers of codes; for example, Greek uses 219
characters, Cyrillic uses 259 characters, Arabic uses 196 characters, and Chinese uses 65,536 characters.
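
The arithmetic above can be checked with a short sketch. It counts the values one byte can hold, splits the ASCII range into
printable and control codes, and shows the byte values of an ASCII string. The use of UTF-8 for the Greek letter is an
assumption made for illustration; the text does not name a specific multi-byte encoding.

```python
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE  # number of distinct bit patterns in one byte

# ASCII is a 7-bit code, so it defines codes 0 through 127.
ascii_codes = range(2 ** 7)
printable = [c for c in ascii_codes if 32 <= c <= 126]      # space through tilde
control = [c for c in ascii_codes if c < 32 or c == 127]    # control codes and DEL

# Each ASCII character occupies exactly one byte.
text = "MEDLINE"
code_points = list(text.encode("ascii"))

# A character outside ASCII needs more than one byte in UTF-8 (assumed encoding).
alpha_bytes = "α".encode("utf-8")

print(values_per_byte, len(printable), len(control))
print(code_points)
print(len(alpha_bytes))
```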
Data compression makes data storage and document retrieval easier because the search is carried out in a smaller space. Character,
image, and sound data can all be compressed; however, compression may involve loss of some data.
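
A minimal sketch of compression, using Python's standard zlib module. This shows the lossless case only, where the original
bytes are recovered exactly; lossy methods, as often used for images and sound, are not shown. The record text is hypothetical
and deliberately repetitive so that it compresses well.

```python
import zlib

# Hypothetical, highly repetitive character data.
record = ("patient_id=001;diagnosis=hypertension;" * 50).encode("ascii")

compressed = zlib.compress(record, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)

ratio = len(compressed) / len(record)
print(f"{len(record)} bytes -> {len(compressed)} bytes (ratio {ratio:.2f})")
print("lossless:", restored == record)
```

How much space is saved depends on how repetitive the input is; data that is already compressed or random will shrink little, if at all.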
Data may be formatted, as in the tables of several types of databases (relational, hierarchical, and network). It may also be
unformatted, such as images, sound, or electronic monitoring output in the hospital. Formatted documents are easier to retrieve.
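
To illustrate why formatted data is easier to retrieve, here is a minimal sketch of a relational table queried with Python's
standard sqlite3 module. The table name admissions, its columns, and the rows are hypothetical.

```python
import sqlite3

# An in-memory relational database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions (patient_id INTEGER, ward TEXT, systolic_bp INTEGER)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?)",
    [(1, "medical", 150), (2, "surgical", 118), (3, "medical", 135)],
)

# Because the data is formatted into named columns, retrieval is a declarative query.
rows = conn.execute(
    "SELECT patient_id FROM admissions "
    "WHERE ward = ? AND systolic_bp > ? ORDER BY patient_id",
    ("medical", 140),
).fetchall()
conn.close()

print(rows)
```

An equivalent search over unformatted data, such as a scanned chart or a sound recording, would first need the values to be extracted.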
may be described as sequential, indexed, tree-structured, or clustered.
MEDLINE and PDQ are examples of medical databases. MEDLINE was established in 1971. Every year 400,000 articles from 3,700 journals
are added and indexed using Medical Subject Headings (MeSH). GRATEFUL MED is a search interface used to query MEDLINE.
PDQ is a database about cancer.
Document surrogates used in data retrieval are identifiers, abstracts, extracts, reviews, indexes, and queries.
Queries are short documents used to retrieve larger documents by matching, mapping, or use of Boolean logic (and, or, not). Queries
may be in natural or probabilistic language. Fuzzy queries are deliberately not rigid to increase the probability of retrieval.
Other forms of data retrieval are term extraction (based on the low frequency of important terms), term association (based on
terms that normally occur together), lexical measures (using specialized formulas), trigger phrases (like figure, table, conclusion),
synonyms (same meaning), antonyms (opposite meaning), homographs (same spelling but different meaning), and homophones (same
sound but different spelling). Stemming algorithms help in retrieval by removing the ends of words, leaving only the roots. Specialized
mathematical techniques are used to assess the effectiveness of data retrieval.
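
The ideas of Boolean queries, stemming, and retrieval effectiveness can be sketched together. The stemmer below is a
deliberately naive suffix stripper, not a published algorithm; the documents, the suffix list, and all function names are
hypothetical. Precision and recall are used as the effectiveness measures, which is an assumption, since the text does not
name the techniques it has in mind.

```python
import re

SUFFIXES = ("ing", "ed", "es", "s")  # tiny illustrative list, checked in this order


def stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and stem each word."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def lookup(index, word):
    """Return the ids of documents containing the stemmed form of word."""
    return index.get(stem(word.lower()), set())


def precision_recall(retrieved, relevant):
    """Fraction of retrieved documents that are relevant, and of relevant ones retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


docs = {
    1: "Screening for breast cancer",
    2: "Cancer treatments and outcomes",
    3: "Treating hypertension in adults",
}
index = build_index(docs)

# Boolean operators map directly onto set operations.
both = lookup(index, "cancer") & lookup(index, "treatment")        # AND
either = lookup(index, "cancer") | lookup(index, "hypertension")   # OR
without = lookup(index, "cancer") - lookup(index, "screening")     # AND NOT

print(both, either, without)
print(precision_recall(both, {1, 2}))
```

Stemming lets the query word "treatment" match "treatments" in document 2, although this naive stemmer does not reduce
"treatment" and "treating" to the same root, which a fuller stemming algorithm might.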
Data warehousing is a method of extracting data from various sources and storing it as historical, integrated data for use in
decision-support systems. Metadata is a term for the definitions of the data stored in the data warehouse (i.e., data about
data). A data model is a graphic representation of the data as diagrams or charts. The data model reflects the essential
features of an organization. The purpose of a data model is to facilitate communication between the analyst and the user. It
also helps create a logical discipline in database design.
4.0 DATA MINING
Data mining is the discovery part of knowledge discovery in data (KDD), involving knowledge
engineering, classification, and problem solving. KDD starts with selection, cleaning, enrichment, and coding. The products
of data mining are recognized patterns. These patterns are then applied to new situations for prediction and profiling. Artificial
intelligence (AI), based on machine learning, imbues computers with some creativity and decision-making capabilities using
Key words: Data coding, Data compression, Data encryption,
Data mining, Data modeling, Data processing, Data protection, Data recovery, Data reduction, Data replication, Data retrieval,
Data storage, Data structures, and Data value