Chapter 1

INTRODUCTION

INTRODUCTION TO REAL TIME SYSTEMS

Real-time computing systems are to be found in many contemporary applications. From nuclear reactors to animation software, from medical imaging systems to automobiles, real-time software controls the time-dependent operations.

An important characteristic of most real-time environments is that they must operate continuously, and that computations must be delivered on time and be accurate.

Faults may be life-threatening (e.g. a nuclear reactor failing to shut down properly, or a failure in the flight control system of an airplane), highly expensive (e.g. in a manufacturing plant) or merely annoying (e.g. short interruptions during a teleconference).

Many factors must be accounted for in designing such systems:

Real time (i.e. guaranteed delivery of results within prescribed time limits).
Protection against faults and damage.
Human factors.

REAL TIME SYSTEMS EXAMPLES

AVIONICS

Computers have been introduced at an ever-accelerating rate to control and manage flight.

Early aircraft employed simple control loops, mostly mechanical or hydraulic, to control the flight.

Modern aircraft are fly-by-wire systems. This means that all pilot inputs are intercepted by a computer, which in turn processes them and activates the appropriate actuators.

Software is now responsible for progressively more complex decisions, in many instances reducing the pilot's role to monitoring and routine communications with ground control.

Fly by wire Aircraft

Drawbacks of extreme flight-automation

Loss of skill and preparedness on the pilot's part.
Faults in the logic of the specifications are difficult to find. They are mostly uncovered only after an incident, which quite frequently results in loss of life (e.g. the A320 accident in Warsaw).
Human Factors Engineering is becoming crucial in managing the information that the avionics can provide, and in focusing attention on the crucial part of that information.

SPACE SHUTTLE COMPUTER SYSTEMS

The shuttle carries five IBM AP-101 (since 1991, version AP-101S) General Purpose Computers (GPCs). These have a 360/370 architecture. The CPU is augmented by 25 I/O processors to accommodate the increased I/O requirements of the space shuttle systems.

Memory is limited: the AP-101S has 524,288 16-bit words of memory. Memory employs error detection and correction and is battery backed. (The original shuttle computers had core memory.)

The GPCs are housed in air transport racks (ATRs), whose dimensions are 10.2 inches wide, 7.625 inches high and 19.52 inches long. The original AP-101B was packaged in two such boxes, one for the CPU and another for the Input/Output Processor (IOP), but the newer AP-101S combines both processors into one ATR. The AP-101S consumes 560 watts.

During ascent and entry, four computers run the Primary Avionics Shuttle Software (PASS) and the fifth runs the Backup Flight Software (BFS). The two software packages were written by different teams to minimize the possibility that the same software error will appear in both systems.

Both software and hardware have undergone extensive testing and the hardware is fully space qualified.

SOME MORE ENVIRONMENTS

Automotive
Manufacturing
Monitoring and Control

REAL TIME INTRODUCTION

Time

Time is a critical resource that has to be managed. In most computer systems, fairness forces time to be allocated to all the tasks, so the completion time of a given task may come earlier or later depending on the load. In real-time systems, by contrast, tasks have deadlines. The result of a computation, if produced after its deadline, is irrelevant. A good example of a computation where deadlines are relevant is weather prediction. The computation must complete before the time being predicted; otherwise it has no meaning apart from checking the validity of the computation model.

A deadline is said to be hard if its violation implies a catastrophe (e.g. deadlines in nuclear plant control).

A deadline is said to be soft if its violation does not imply the destruction of a system (or subsystem). An example is the image acquisition and image telemetry system of a spacecraft: images are acquired periodically and sent to Earth. If, because of a higher-priority job, the image acquisition task is delayed and loses a frame, this is not catastrophic to the spacecraft (Voyager sent back approximately 6000 images from Jupiter).
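The distinction between hard and soft deadlines can be sketched in a few lines of Python. This is only an illustration: the names Task, DeadlineKind and judge_completion are hypothetical, and the policy of dropping a late soft result is an assumption taken from the image-telemetry example rather than a general rule.

```python
from dataclasses import dataclass
from enum import Enum


class DeadlineKind(Enum):
    HARD = "hard"   # a miss implies a catastrophe
    SOFT = "soft"   # a miss degrades service but the system survives


@dataclass
class Task:
    name: str
    deadline: float        # absolute time by which the result is needed
    kind: DeadlineKind


def judge_completion(task: Task, completion_time: float) -> str:
    """Classify a finished task by comparing its completion time to its deadline."""
    if completion_time <= task.deadline:
        return "on time"
    if task.kind is DeadlineKind.HARD:
        # A late result is irrelevant, and the miss itself is a system failure.
        return "hard deadline missed: system failure"
    # Assumed policy: a late soft result (e.g. one image frame) is discarded.
    return "soft deadline missed: result dropped"


shutdown = Task("reactor shutdown", deadline=10.0, kind=DeadlineKind.HARD)
frame = Task("image frame", deadline=10.0, kind=DeadlineKind.SOFT)
print(judge_completion(shutdown, 9.5))
print(judge_completion(frame, 12.0))
```

In a real system the check would be made by the scheduler before the deadline expires, not after the fact; the sketch only shows how the two kinds of deadline differ in consequence.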

Fault Tolerance

The real-time system must be designed and constructed to be fault tolerant. Failure of parts of the system must not bring the whole system to a halt.

There are two major categories of faults: permanent and transient.

Permanent faults affect parts of a system permanently, and the only way to deal with them is to isolate the affected parts.

Transient faults have no lasting effect, but their existence must be recognized so that the affected computation can be invalidated and the fault does not propagate.

Detection of and protection against faults must be integrated into the design of a real-time system from its inception, and must be applied in several layers in order to be economical.

A flipped bit in a DRAM is effectively detected and corrected by using coding (through an EDC unit).
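How coding can detect and correct a flipped bit may be illustrated with the classic Hamming(7,4) code. This is a minimal sketch, not a description of any particular EDC unit: real memory systems protect wider words and usually add an extra parity bit to detect double errors, and the function names below are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4      # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4      # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # position (1..7) of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the flipped bit
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                         # simulate a single bit flip in memory
recovered, position = hamming74_decode(stored)
print(recovered == word, position)
```

The cost is three check bits for every four data bits plus the encode and decode logic on each access, which is acceptable on a memory path but, as noted next, not everywhere.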

A flipped bit in a register, on the other hand, cannot be effectively dealt with through coding since it would add significantly to the calculation time.

Similarly, an arithmetic/logic component failure cannot be dealt with easily by coding. One solution is to replicate the component and vote in hardware. But this is quite expensive, since replicating the circuitry multiplies the power and volume/area requirements. For certain applications, power, mass and volume are critical resources that cannot be allocated freely.
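The replicate-and-vote idea can be sketched in software, although the text describes a hardware voter. In the sketch below each replica is modelled as a Python callable standing in for an independent copy of the component; majority_vote, run_replicated and the two adder functions are hypothetical names used only for illustration.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of replicas.

    Raises RuntimeError when no value has a strict majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


def run_replicated(replicas, *args):
    """Run every replica on the same inputs and vote on the outputs."""
    return majority_vote([replica(*args) for replica in replicas])


def adder(a, b):
    return a + b


def faulty_adder(a, b):
    return (a + b) ^ 0x4   # models a unit whose bit 2 is stuck inverted


# One faulty unit out of three is outvoted by the two healthy ones.
result = run_replicated([adder, faulty_adder, adder], 2, 3)
print(result)
```

With three replicas a single faulty unit is masked; the price is three copies of the component plus the voter, which is exactly the expense described above.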

In addition, not all the computations need to be protected by triplication and voting. If the error rate is not exceedingly high for certain non-critical computations, simple detection of the error may be sufficient. For other applications, not even fault detection is necessary (e.g., image telemetry). Such a system must provide primitives and a framework through which the system integration and applications developer may tailor the target system.
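One way such primitives might look is a single entry point that takes the desired protection level as a parameter. The sketch below is an assumption about the shape of such a framework, not a description of an existing one: Protection, FaultDetected and run_protected are hypothetical names, and each replica is modelled as a Python callable.

```python
from enum import Enum


class Protection(Enum):
    NONE = 1     # run once and accept the result
    DETECT = 2   # run two replicas and flag any mismatch
    MASK = 3     # run three replicas and take the majority


class FaultDetected(Exception):
    """Raised when replicas disagree and the fault cannot be masked."""


def run_protected(level, replicas, *args):
    """Run as many replicas as the protection level requires and combine results."""
    if len(replicas) < level.value:
        raise ValueError("not enough replicas for the requested protection level")
    results = [replica(*args) for replica in replicas[:level.value]]
    if level is Protection.NONE:
        return results[0]
    if level is Protection.DETECT:
        if results[0] != results[1]:
            raise FaultDetected("replica outputs differ")
        return results[0]
    # Protection.MASK: accept any value that at least two replicas agree on.
    a, b, c = results
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise FaultDetected("no two replicas agree")


def scale(x):
    return 2 * x


def faulty_scale(x):
    return 2 * x + 1   # models a unit that produces a wrong result


print(run_protected(Protection.MASK, [scale, faulty_scale, scale], 3))
```

The application developer pays for only as much redundancy as each computation needs: critical control code would request MASK, while something like image telemetry could run with NONE.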

A well-thought-out framework is therefore of paramount importance, since software errors are easily introduced and are the most difficult to detect and correct.

Environmental Integration

Real-time systems are intimately integrated into their environment, which is usually harsh. This, in turn, imposes limits on power, volume and weight. It also introduces requirements for radiation immunity, damage tolerance and the ability of the system to degrade gradually (graceful degradation).

Damage is the physical alteration of a system caused by an external agent. Damage is normally localized.

A distributed system offers the best solution for fault tolerance, for expandability under weight/power/volume limitations, and for damage tolerance. (A meteorite that hits one part of the spacecraft may destroy one node but not the whole system.)

DESIGN AT ALL LEVELS

CONSIDER ALL ASPECTS.

PROVE THE DESIGN AT EVERY LEVEL