MULTIPLE REGRESSION
MODELS AND THE USE OF THE MODEL TO EXAMINE THE RELATIONSHIP BETWEEN BLOOD PRESSURE AND
WEIGHT, AGE AND HEIGHT
CHAPTER
ONE
1.0 INTRODUCTION
A common factor of many
scientific investigations is that variation in the value of one variable is
caused, to a great extent, by variation in the values of other related
variables. For instance, variation in crop yield can largely be explained in
terms of variation in the amount of rainfall and the quantity of fertilizer
applied. The amount of fuel consumed by a certain brand of car over a given
distance varies according to the age and the speed of the car and so forth.
Therefore, a primary goal of many statistical investigations is to establish
relationships which make it possible to predict one variable in terms of
others.
Regression analysis is a
statistical investigation of the relationship between a dependent variable Y
and one or more independent variable(s) X or X’s, and the use of the modeled
relationship to predict, control or optimize the value of the dependent
variable Y. The relationship is formulated in an equation that expresses the
values of Y in terms of the corresponding values of the X's, and enables future
values of Y to be predicted from observed values of the X's, or to be
controlled or optimized by choosing suitable values of the X's. The independent
variables X's are also called explanatory variables or controlled variables,
while the dependent variable Y is also called the response variable.
Regression models are of various
kinds. A regression study involving only two variables, a dependent variable Y
and one independent variable X, is called a simple linear regression or
univariate regression, while a study involving a Y-variable and two or more
X-variables is called a multiple regression. The terms bivariate regression and
multivariate regression are often used to distinguish between multiple
regressions involving two X-variables and those involving more than two
X-variables. If a regression is linear in the X's and in the parameters, we refer
to it as a simple linear regression or a multiple linear regression, depending on
whether it involves one X-variable or more than one. An example of a
simple linear regression model is:
Y = β₀ + β₁X + ε …………………………………… (1)
While
an example of a multiple linear regression model is:
Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + ε …………………………………… (2)
A regression being linear in the X's and the parameters means
that no term in the model involves second or higher powers of the X's or of the
parameters, or a product or quotient of two X's or of two parameters.
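The two model types above can be fitted by least squares. A minimal sketch in Python with NumPy; the data here are made-up toy values for illustration, not the project's data:

```python
import numpy as np

# Toy data (hypothetical values): y depends on two explanatory variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1.0 + 2.0 * x1 + 0.5 * x2  # exact linear relationship, no noise

# Simple linear regression (model 1): Y = b0 + b1*X
X_simple = np.column_stack([np.ones_like(x1), x1])
b_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple linear regression (model 2): Y = b0 + b1*X1 + b2*X2
X_multi = np.column_stack([np.ones_like(x1), x1, x2])
b_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)

print(b_multi)  # recovers [1.0, 2.0, 0.5] since the data are exactly linear
```

Note how each design matrix carries a leading column of ones for the intercept β₀; the remaining columns are the X-variables the model is linear in.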
1.1 AIMS
AND OBJECTIVES OF THE STUDY
The main aim of this project is to
derive the multiple regression model and to use the model to examine the relationship
between blood pressure and the weight, age and height of 100 individuals.
1.2 SCOPE OF THE STUDY
This
project is restricted to data obtained from the Federal Medical Centre, Owo, Ondo State,
covering the blood pressure, weight, age and height of 100 individuals in
2010.
1.3 IMPORTANCE
OF STUDY
In Statistics, regression analysis includes
many techniques for modeling and analyzing several variables, when the focus is
on the relationship between a dependent variable and one or more independent
variables. More specifically, regression analysis helps one to understand how the typical value of the dependent
variable changes when any one of the independent variables is varied, while the
other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the
independent variables, i.e. the average value of the dependent variable when the
independent variables are fixed. Less commonly, the focus is on a quantile or
other location parameter of the conditional distribution of the dependent
variable given the independent variables. In all cases, the quantity estimated is a
function of the independent variables called the regression function.
Regression analysis is widely used for
forecasting, where its use has substantial overlap with the field of machine
learning. Regression analysis is also used to understand which among the
independent variables are related to the dependent variable, and to explore the
forms of these relationships.
1.4 DEFINITION
OF TERMS
During the course of this
research, many terminologies and abbreviations were encountered; they are
precisely defined below:
DATA:
Facts or pieces of information, especially when examined and used to
find things out or to make decisions.
PARAMETERS:
The unknown constants in a statistical model (here β₀, β₁, …, βₖ) whose values
determine the form of the relationship and are estimated from the data.
ERROR:
A random variable with a mean of zero conditional on the explanatory
variables.
REGRESSION
ANALYSIS: A statistical
tool which helps to predict one variable from another variable or variables
on the basis of the assumed nature of the relationship between the variables.
DBP:
Diastolic Blood Pressure
SBP:
Systolic Blood Pressure
TSS:
Total Sum of Squares
SSE:
Sum of Squares due to Error
DF:
Degrees of Freedom
CHAPTER
TWO
2.1 LITERATURE REVIEW
Regression analysis is a
statistical methodology that utilizes the relationship between two or more
quantitative variables so that one variable can be predicted from the other(s).
This methodology is widely used in business, the social and behavioral
sciences, biological sciences and many other disciplines.
The term “regression” was coined by Sir
Francis Galton in the nineteenth century to describe a biological phenomenon.
The phenomenon was that the heights of descendants of tall ancestors tend to
regress down towards a normal average. For Galton, regression had only this
biological meaning, but his work was later extended by Udny Yule and Karl Pearson
to a more general statistical context. In the work of Yule and Pearson,
the joint distribution of the response and explanatory variables is assumed to
be Gaussian. This assumption was weakened by R. A. Fisher in his works of 1922
and 1925. Fisher assumed that the conditional distribution of the response
variable is Gaussian, but that the joint distribution need not be. In this
respect, Fisher's assumption is closer to Gauss's formulation of 1821.
The earliest form of regression was the
method of least squares which was published by Legendre in 1805 and by Gauss in
1809. Legendre and Gauss both applied the method to the problem of determining,
from astronomical observations, the orbits of bodies about the sun (mostly
comets, but also later the then newly discovered minor planets). Gauss
published a further development of the theory of least squares in 1821,
including a version of the Gauss-Markov theorem.
Blair
[1962] described regression analysis as a mathematical measure of the average
relationship between two or more variables in terms of the original units of
the data.
Hamburg [1970] said
regression analysis refers to the methods by which estimates are made of the
values of a variable from knowledge of the values of one or more other
variables and to the measurement of the errors involved in this estimation
process.
Yamane [1974] and Karylowski [1985] both stated
in presentations that regression is one of the most frequently used techniques in economics
and business research to find a relationship between two or more variables that are
related causally.
Chou [1978] said that regression analysis attempts
to establish the 'nature of the relationship' between variables, i.e. to study
the functional relationship between the variables and thereby provide a
mechanism for prediction or forecasting.
Multiple regression methods continue to
be an area of research. In recent decades, various definitions and derivations
have been made, and the derivations have been applied to various real-life problems.
Koutsoyiannis
[1973], in his book 'Theory of Econometrics', used the cofactor approach to derive
the multiple regression model and then used the derivation to show that, as
economic theory postulates, the quantity demanded of a given commodity
depends on its price and on consumers' income.
Schaeffer and McClave [1982], in their
book 'Statistics for Engineers', used the inverse matrix method to derive the
multiple regression model and then used the derivation to show that the
average amount of energy required to heat a house depends not only on the air
temperature, but also on the size of the house, the amount of insulation, and
the type of heating unit.
Okeke [2009], in his book 'Fundamentals
of Analysis of Variance in Statistical Designed Experiments', used Cramer's
rule to derive the multiple regression model and then used it to show that a chemical
process may depend on temperature, pressure and the concentration of the catalyst.
CHAPTER
THREE
3.0 SOURCE OF DATA
The data used in this project
work were collected from the Federal Medical Centre, Owo, Ondo State.
3.1 METHOD OF DATA COLLECTION
The data
used in this project work are from a secondary source.
3.2 METHOD OF ANALYSIS
MULTIPLE REGRESSION ANALYSIS
The aim of multiple regression
is to examine the nature of the relationship between a given dependent variable
and two or more independent variables. The model describing the relationship
between the dependent variable Y and a set of k independent variables can be expressed as

Yᵢ = β₀ + β₁Xᵢ₁ + β₂Xᵢ₂ + … + βₖXᵢₖ + εᵢ,  i = 1, 2, …, n

Here, n is the number of observations on both the dependent and the independent
variables, Yᵢ is the ith observation on the dependent variable Y, the
Xᵢ₁, Xᵢ₂, …, Xᵢₖ are known constants representing the ith observations on the
independent variables, β₀, β₁, …, βₖ are the unknown parameters, and the εᵢ are
error terms assumed to be independent and normally distributed with mean zero.
In this project work, we are
going to use multiple regression to examine the nature of the relationship
between blood pressure and weight, age and height.
3.3 PARAMETER ESTIMATION IN MULTIPLE REGRESSION
The model

Yᵢ = β₀ + β₁Xᵢ₁ + … + βₖXᵢₖ + εᵢ,  i = 1, 2, …, n

can be written in matrix notation as

Y = Xβ + ε

where Y is n × 1, X is n × (k+1), β is (k+1) × 1 and ε is n × 1. The sum of squared residuals is

S(β) = ε′ε = (Y − Xβ)′(Y − Xβ) = Y′Y − 2β′X′Y + β′X′Xβ

Differentiating with respect to β and equating to zero, we have

−2X′Y + 2X′Xβ̂ = 0

Dividing both sides by two and rearranging gives the normal equations

X′Xβ̂ = X′Y

so that

β̂ = (X′X)⁻¹X′Y

where X′ denotes the transpose of X and (X′X)⁻¹ the inverse of X′X.
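The closed-form estimator β̂ = (X′X)⁻¹X′Y can be checked numerically. A minimal sketch with NumPy, using simulated noise-free data; the true coefficient values below are placeholders echoing the scale of the model fitted in Chapter Four, not real estimates:

```python
import numpy as np

np.random.seed(0)
n, k = 100, 3
X0 = np.random.rand(n, k)               # hypothetical predictors (stand-ins for weight, age, height)
X = np.column_stack([np.ones(n), X0])   # design matrix with intercept column, n x (k+1)
beta_true = np.array([110.0, 0.06, 0.27, 0.68])
y = X @ beta_true                       # noise-free response, so the estimator is exact

# Normal equations: (X'X) beta_hat = X'y  =>  beta_hat = (X'X)^(-1) X'y.
# np.linalg.solve is preferred over forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

With noise-free data and a full-rank design matrix, `beta_hat` reproduces `beta_true` up to floating-point error.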
3.4 HYPOTHESIS TO BE TESTED
We are going to test the hypothesis that weight,
age or height cannot single-handedly cause a change in blood pressure (H₀)
against the alternative that it can (H₁).
DECISION RULE:
If |t_cal| < t_tab, we accept
H₀; otherwise we reject H₀ and accept H₁.
3.5 ANOVA TABLE FOR MULTIPLE REGRESSION

Source      DF         SS    MS                     F-ratio
Regression  k          SSR   MSR = SSR/k            F = MSR/MSE
Error       n − k − 1  SSE   MSE = SSE/(n − k − 1)
Total       n − 1      SST

Where SSR = SST − SSE is the sum of squares due to regression, SSE is the sum
of squares due to error, and SST is the total sum of squares.
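The table's quantities follow mechanically from SSR, SSE, n and k. A short sketch; the numbers plugged in below are the actual sums of squares reported in Chapter Four:

```python
# ANOVA quantities for a multiple regression with k predictors and n observations.
def regression_anova(ssr, sse, n, k):
    msr = ssr / k              # regression mean square
    mse = sse / (n - k - 1)    # error mean square
    f_ratio = msr / mse        # F statistic for joint significance
    sst = ssr + sse            # total sum of squares
    return msr, mse, f_ratio, sst

# Values from the MINITAB output analysed in Chapter Four.
msr, mse, f, sst = regression_anova(2675.57, 8135.47, n=100, k=3)
print(round(msr, 2), round(mse, 2), round(f, 2))  # 891.86 84.74 10.52
```

These reproduce the MS and F columns of the analysis-of-variance table in Chapter Four.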
3.6 COEFFICIENT OF MULTIPLE CORRELATION
The coefficient of multiple determination R² measures the proportion of the total
variation in the dependent variable Y that is attributed to the regression of Y
on the independent variables included in the model. It is given as:

R² = SSR/SST = 1 − SSE/SST
CHAPTER
FOUR
DATA
ANALYSIS
4.0 INTRODUCTION
This chapter
presents the summary of the data to be studied and analysed. Since the procedure
for solving a multiple regression problem has been outlined in the
previous chapter, we shall now concentrate on a real-life problem, using the data
obtained from the Federal Medical Centre, Owo, Ondo State as a case study.
Furthermore, the steps and formulae
stated in Chapter Three could also be applied here, but we shall focus only
on the results extracted from the statistical software MINITAB used in
analysing the data.
4.1 THE MINITAB RESULTS SHOWING THE RELATIONSHIP
BETWEEN THE SYSTOLIC BLOOD PRESSURE AND THE WEIGHT, AGE, AND HEIGHT OF 100
INDIVIDUALS
The regression equation is:

SBP = 110 + 0.060 WT(Kg) + 0.270 AGE + 0.68 HT(M)
4.2 ADEQUACY OF THE MODEL
To test the significance of each parameter, we compare t_cal = coefficient /
standard error with the critical value t_tab = 1.980 from the t-table
(two-sided, α = 5%, 96 degrees of freedom).
HYPOTHESIS FOR WEIGHT
t_cal = 0.49, P-value = 0.624
Decision: Since |t_cal| = 0.49 < critical value 1.980 (P-value = 0.624 > 0.05), we
accept H₀ and conclude that weight cannot single-handedly cause a significant
change in SBP when age and height are held fixed; it is not significant at α = 5%.
HYPOTHESIS FOR AGE
t_cal = 5.38, P-value = 0.000
Decision: Since |t_cal| = 5.38 > critical value 1.980 (P-value = 0.000 < 0.05),
we reject H₀ and conclude that age is a major factor for
change in SBP, i.e. the older we become, the nearer we are to high blood
pressure; it is significant at α = 5%.
HYPOTHESIS FOR HEIGHT
t_cal = 0.15, P-value = 0.884
Decision: Since |t_cal| = 0.15 < critical value 1.980 (P-value = 0.884 > 0.05), we
accept H₀ and conclude that height cannot single-handedly
contribute to a change in SBP; it is not significant at α = 5%.
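Each decision above reduces to computing t_cal = coefficient / standard error and comparing it with t_tab. A quick check using the coefficient table from the MINITAB output in Appendix B:

```python
# (coefficient, standard error) for each predictor, from the MINITAB output.
coefs = {"WT": (0.0603, 0.1227), "AGE": (0.27008, 0.05021), "HT": (0.683, 4.678)}
t_tab = 1.980  # two-sided 5% critical value used in the text (df = 96)

for name, (coef, se) in coefs.items():
    t_cal = coef / se
    decision = "reject H0" if abs(t_cal) > t_tab else "fail to reject H0"
    print(f"{name}: t = {t_cal:.2f} -> {decision}")
```

The printed t values (0.49, 5.38, 0.15) match the T column of the MINITAB output, and only AGE exceeds the critical value.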
4.3 ANALYSIS OF VARIANCE

Source      DF    SS         MS       F       P
Regression   3    2675.57    891.86   10.52   0.000
Error       96    8135.47     84.74
Total       99   10811.04
TEST
OF HYPOTHESIS
DECISION
Since the P-value = 0.000 is less than α = 0.05, we reject H₀ and conclude
that weight, age and height, taken jointly, are factors to
be considered for a change in SBP.
4.4 COEFFICIENT OF MULTIPLE CORRELATION
From the ANOVA table, we have

R² = SSR/SST = 2675.57/10811.04 = 0.2475 = 24.75%

This implies that 24.75% of the total variation in systolic blood pressure is
explained by the factors considered, i.e. weight (X₁), age (X₂) and height (X₃).
In other words, the remaining 75.25% of the variation is due to other factors
not included in the model.
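The figure can be verified directly from the sums of squares in the ANOVA table:

```python
# Coefficient of multiple determination from the ANOVA table:
# R^2 = SSR / SST = 1 - SSE / SST.
ssr, sse, sst = 2675.57, 8135.47, 10811.04
r_squared = ssr / sst
print(f"R^2 = {r_squared:.4f} = {r_squared * 100:.2f}%")
```

This agrees with the R-Sq = 24.7% line of the MINITAB output.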
CHAPTER FIVE
5.1 SUMMARY
Regression analysis is a statistical
tool used to establish a linear relationship between predetermined (independent)
variables and a dependent variable, and thereby to predict future values of the
dependent variable from the other variables.
In this project, multiple regression
was discussed in detail and was used in analysing the effect of weight, age and
height on blood pressure, and conclusions were drawn.
5.2 CONCLUSION
From the results of the statistical
software MINITAB used in analysing the data, and from the interpretation of the
results in Chapter Four, it was observed that age contributes majorly to
change in blood pressure when weight and height are held fixed, whereas neither
weight nor height individually contributes significantly to change in blood
pressure when the other variables are held fixed. However, the three variables
(weight, age and height) jointly contribute significantly to a change in blood
pressure.
5.3 RECOMMENDATION
Since it has been shown that weight, age and height jointly influence blood
pressure, and that advancing age in particular is associated with a change in
blood pressure, it is advised that we watch our weight to avoid becoming
overweight and, as we advance in age, always go for regular checks of our blood
pressure.
The health sector should also endeavour
to always give adequate advice to the public on the effects of the food we eat
and of excessive worry (too much thinking) on our blood pressure.
REFERENCES
Arua, A.I. and Okafor, F.C. (1997); Fundamentals of Statistics for Higher Education: Fijac Academic Press.
Dixon, W.J. and Massey, F.J. (1969); Introduction to Statistical Analysis: New York, McGraw-Hill Book Company.
Draper, N.R. and Smith, H. (1988); Applied Regression Analysis: New York, John Wiley & Sons.
Francis, A. (1986); Business Mathematics and Statistics: DP Publications, Aldine House, Aldine Place 142/144, Uxbridge Road, London.
Gupta, S.P. (1969); Statistical Methods: Sultan Chand and Sons, 23, Daryaganj, New Delhi.
Kleinbaum, D.G., Kupper, L.L. and Muller, K.E. (1988); Applied Regression Analysis and Other Multivariable Methods, 2nd Edition: Boston, PWS-Kent Publishing Company.
Koutsoyiannis, A. (1973); Theory of Econometrics: Palgrave, Houndmills, Basingstoke, Hampshire and New York.
Okeke, A.O. (2009); Fundamentals of Analysis of Variance in Statistical Designed Experiments: Macro Academic Publishers; 1, Anigbogu Close, Achara Layout, Enugu.
Schaeffer, R.L. and McClave, J.T. (1982); Statistics for Engineers: PWS Publishers, a division of Wadsworth, Inc., USA.
Spiegel, M.R. and Stephens, L.J. (1999); Schaum's Outline Series, 3rd Edition: New York, McGraw-Hill.
APPENDIX A
S/N   Weight (kg)   SBP   DBP   AGE   Height (m)   BMI (kg/m^2)
1     72            164   82    72    1.76         23.24380
2     65            108   70    24    1.71         22.22906
3     67            128   75    17    1.62         25.52964
4     70            124   70    23    1.67         25.09950
5     75            139   74    42    1.48         34.24032
6     67            144   65    51    1.83         20.00657
7     73            127   75    22    1.89         20.43616
8     75            136   78    41    2.1          17.0068
9     78            138   76    64    1.55         32.46618
10    78            112   68    28    1.71         26.67487
11    71            129   80    24    1.62         27.0538
12    71            140   90    63    1.22         48.3741
13    72            115   63    22    1.71         18.80921
14    55            120   80    24    1.89         19.03642
15    68            123   85    27    2.0          17.75000
16    71            135   64    23    2.01         16.08871
17    84            134   97    57    1.62         32.00732
18    60            125   88    23    1.68         21.2585
19    77            122   80    26    1.65         28.28283
20    56            115   70    19    1.92         15.19097
21    52            129   80    22    1.55         21.71166
22    70            130   80    26    1.65         21.64412
23    78            126   73    26    1.82         23.54788
24    75            112   65    24    1.98         19.1307
25    66            129   80    53    1.52         28.56648
26    70            125   76    23    2.04         16.82045
27    67            126   79    60    1.82         20.22703
28    59            113   74    24    1.55         24.55775
29    65            123   72    23    1.80         20.06173
30    55            128   72    33    1.77         17.55562
31    60            136   67    42    1.74         19.81768
32    88            140   90    76    1.77         28.08899
33    80            119   76    25    1.55         33.29865
34    70            124   76    25    1.83         20.90239
35    78            116   66    27    1.83         23.29123
36    84            140   80    55    1.98         21.42639
37    82            130   80    29    1.86         23.70216
38    63            125   70    20    1.74         20.80856
39    74            134   92    65    1.58         27.6469
40    55            147   84    69    1.07         48.03913
41    63            133   87    52    1.77         20.10916
42    70            131   95    75    1.70         24.22145
43    73            117   70    22    2.01         18.66886
44    75            125   68    24    1.86         21.67881
45    59            108   63    28    1.49         26.57538
46    76            130   77    26    1.92         20.61632
47    67            117   74    40    1.83         20.00657
48    60            124   77    23    1.71         20.51913
49    80            128   75    60    1.86         23.12326
50    70            132   80    55    1.98         17.85532
51    75            132   73    71    1.79         23.10751
52    62            160   89    75    1.21         42.34683
53    73            123   63    23    1.89         20.43616
54    75            136   80    60    1.34         41.76877
55    60            117   71    25    1.49         27.02581
56    82            134   83    44    1.89         22.95568
57    72            127   74    28    1.52         31.16343
58    64            130   72    80    1.55         26.63892
59    75            139   81    49    1.98         19.1307
60    68            131   76    63    1.74         22.46003
61    84            127   92    27    1.74         27.744745
62    75            128   82    21    1.56         30.81854
63    70            149   96    68    1.74         23.12062
64    78            128   77    24    1.71         26.67487
65    80            116   77    26    1.71         27.35885
66    52            120   80    25    1.74         17.17532
67    64            105   60    23    1.71         22.22906
68    60            129   74    21    1.65         22.03857
69    70            122   76    27    1.39         36.23001
70    60            118   84    22    1.59         24.97399
71    69            129   82    33    1.23         45.60777
72    74            132   81    40    1.83         22.09681
73    70            121   80    23    1.77         22.34352
74    65            121   73    35    1.70         20.74755
75    80            122   77    26    1.89         22.39579
76    74            123   79    34    1.79         23.09541
77    56            124   75    31    1.72         18.92915
78    70            128   80    43    1.68         24.80159
79    70            124   77    23    1.74         23.12062
80    69            115   69    59    1.68         24.44728
81    70            121   79    38    1.18         50.27291
82    72            126   76    23    1.98         18.36547
83    79            116   76    30    1.77         25.22625
84    68            125   81    24    1.77         21.70513
85    71            112   68    21    1.55         29.55255
86    75            112   72    35    1.77         23.93948
87    66            112   70    25    1.71         22.57105
88    67            125   85    77    1.63         25.21736
89    75            120   70    24    1.89         20.99605
90    72            121   74    23    1.77         22.9819
91    68            118   65    81    1.83         20.30517
92    72            113   75    22    1.89         20.15621
93    75            127   70    63    1.07         65.50790
94    77            113   76    24    1.55         32.04995
95    80            102   68    26    1.71         27.35885
96    75            115   65    24    1.98         19.1307
97    70            118   80    40    1.62         26.67276
98    78            110   76    63    1.22         42.40527
99    60            120   63    72    1.71         20.51913
100   59            115   80    90    1.69         20.65751

Abbreviations
WT = Weight
SBP = Systolic Blood Pressure
DBP = Diastolic Blood Pressure
BMI = Body Mass Index = Weight (kg) / [Height (m)]²
HT = Height.
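The BMI column can be reproduced from the weight and height columns; a one-line check against the first record (72 kg, 1.76 m):

```python
# BMI = weight (kg) divided by the square of height (m).
def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2

print(round(bmi(72, 1.76), 5))  # 23.2438, matching the tabulated 23.24380
```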
APPENDIX B
Regression Analysis

The regression equation is
SBP = 110 + 0.060 WT(Kg) + 0.270 AGE + 0.68 HT(M)

Predictor   Coef      StDev     T       P
Constant    109.63    10.89     10.06   0.000
WT(Kg)      0.0603    0.1227    0.49    0.624
AGE         0.27008   0.05021   5.38    0.000
HT(M)       0.683     4.678     0.15    0.884

S = 9.206   R-Sq = 24.7%   R-Sq(adj) = 22.4%

Analysis of Variance

Source      DF    SS         MS       F       P
Regression   3    2675.57    891.86   10.52   0.000
Error       96    8135.47     84.74
Total       99   10811.04

Source   DF   Seq SS
WT(Kg)    1     62.03
AGE       1   2611.74
HT(M)     1      1.81

Unusual Observations