DERIVATION OF MULTIPLE REGRESSION MODELS AND THEIR USE IN STUDYING THE RELATIONSHIP BETWEEN BLOOD PRESSURE AND WEIGHT, AGE AND HEIGHT
CHAPTER ONE
1.0 INTRODUCTION
A common feature of many scientific investigations is that variation in the value of one variable is caused, to a great extent, by variation in the values of other related variables. For instance, variation in crop yield can largely be explained in terms of variation in the amount of rainfall and the quantity of fertilizer applied, and the amount of fuel consumed by a certain brand of car over a given distance varies with the age and the speed of the car. Therefore, a primary goal of many statistical investigations is to establish relationships which make it possible to predict one variable in terms of others.
Regression analysis is a statistical investigation of the relationship between a dependent variable Y and one or more independent variables X or X's, and the use of the modelled relationship to predict, control or optimize the value of the dependent variable Y. The relationship is formulated as an equation that expresses the values of Y in terms of the corresponding values of the X's; it enables future values of Y to be predicted from observed values of the X's, or to be controlled or optimized by adjusting the values of the X's. The independent variables X's are also called explanatory variables or controlled variables, while the dependent variable Y is also called the response variable.
Regression models are of various kinds. A regression study involving only two variables, a dependent variable Y and one independent variable X, is called a simple linear regression or univariate regression, while a study involving a Y-variable and two or more X-variables is called a multiple regression. The terms bivariate regression and multivariate regression are often used to distinguish between multiple regressions involving two X-variables and those involving more than two X-variables. If a regression is linear in the X's and the parameters, we refer to it as a simple linear regression or a multiple linear regression depending on whether it involves one X-variable or more than one X-variable. An example of a simple linear regression model is:

Y = β₀ + β₁X + ε ……………………………………(1)

while an example of a multiple linear regression model is:

Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + ε ……………………………………(2)
A regression being linear in the X's and the parameters means that no term in the model involves second or higher powers of the X's or the parameters, or a product or quotient of two X's or of two parameters.
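To make the distinction above concrete, the following sketch (with made-up illustrative data, not this project's hospital data) fits both a simple and a multiple linear regression by ordinary least squares using NumPy:

```python
import numpy as np

# Illustrative data (hypothetical): Y depends linearly on X1 and X2.
rng = np.random.default_rng(0)
n = 50
X1 = rng.uniform(0, 10, n)                       # e.g. amount of rainfall
X2 = rng.uniform(0, 5, n)                        # e.g. quantity of fertilizer
Y = 2.0 + 1.5 * X1 + 0.8 * X2 + rng.normal(0, 0.1, n)

# Simple linear regression: Y = b0 + b1*X1, design matrix [1, X1]
A1 = np.column_stack([np.ones(n), X1])
b_simple, *_ = np.linalg.lstsq(A1, Y, rcond=None)

# Multiple linear regression: Y = b0 + b1*X1 + b2*X2, design matrix [1, X1, X2]
A2 = np.column_stack([np.ones(n), X1, X2])
b_multi, *_ = np.linalg.lstsq(A2, Y, rcond=None)

print(b_simple)   # two coefficients (intercept and slope)
print(b_multi)    # three coefficients, close to (2.0, 1.5, 0.8)
```

Both models are linear in the X's and the parameters; only the number of X-variables differs.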
1.1 AIMS AND OBJECTIVES OF THE STUDY
The main aim of this project is to derive the multiple regression model and to use the model to study the relationship between the blood pressure and the weight, age and height of 100 individuals.
1.2 SCOPE OF THE STUDY
This project is restricted to the data obtained from the Federal Medical Centre, Owo, Ondo State, which cover the blood pressure, weight, age and height of 100 individuals in 2010.
1.3 IMPORTANCE OF STUDY
In statistics, regression analysis includes many techniques for modelling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one to understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, i.e. the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function.
Regression analysis is widely used for
forecasting, where its use has substantial overlap with the field of machine
learning. Regression analysis is also used to understand which among the
independent variables are related to the dependent variable, and to explore the
forms of these relationships.
1.4 DEFINITION OF TERMS
In the course of this research, many terminologies and abbreviations were encountered, which are precisely defined below:
DATA: Data are facts or pieces of information, especially when examined and used to find things out or to make decisions.
PARAMETER: This is a quantity that decides or limits the way in which something can be done; in a regression model, the parameters are the unknown constants to be estimated.
ERROR: The error is a random variable with a mean of zero conditional on the explanatory variables.
REGRESSION ANALYSIS: This is a statistical tool which helps to predict one variable from another variable or variables on the basis of the assumed nature of the relationship between the variables.
DBP: Diastolic Blood Pressure
SBP: Systolic Blood Pressure
TSS: Total Sum of Squares
SSE: Sum of Squares due to Error
DF: Degrees of Freedom
CHAPTER TWO
2.1 LITERATURE REVIEW
Regression analysis is a
statistical methodology that utilizes the relationship between two or more
quantitative variables so that one variable can be predicted from the other(s).
This methodology is widely used in business, the social and behavioral
sciences, biological sciences and many other disciplines.
The term "regression" was coined by Sir Francis Galton in the nineteenth century to describe a biological phenomenon: the heights of descendants of tall ancestors tend to regress down towards a normal average. For Galton, regression had only this biological meaning, but his work was later extended by Udny Yule and Karl Pearson to a more general statistical context. In the work of Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to be Gaussian. This assumption was weakened by R. A. Fisher in his works of 1922 and 1925. Fisher assumed that the conditional distribution of the response variable is Gaussian, but the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss' formulation of 1821.
The earliest form of regression was the
method of least squares which was published by Legendre in 1805 and by Gauss in
1809. Legendre and Gauss both applied the method to the problem of determining,
from astronomical observations, the orbits of bodies about the sun (mostly
comets, but also later the then newly discovered minor planets). Gauss
published a further development of the theory of least squares in 1821,
including a version of the Gauss-Markov theorem.
Blair
[1962] described regression analysis as a mathematical measure of the average
relationship between two or more variables in terms of the original units of
the data.
Hamburg [1970] said
regression analysis refers to the methods by which estimates are made of the
values of a variable from knowledge of the values of one or more other
variables and to the measurement of the errors involved in this estimation
process.
Yamane [1974] and Karylowski [1985] both said in a presentation that regression is one of the most frequently used techniques in economics and business research to find a relation between two or more variables that are related causally.
Chou [1978] said regression analysis attempts to establish the 'nature of the relationship' between variables, i.e. to study the functional relationship between the variables and thereby provide a mechanism for prediction or forecasting.
Multiple regression methods continue to be an area of research. In recent decades, various definitions and derivations have been made, and the derivations have been applied to various real-life problems.
Koutsoyiannis [1973] in his book 'Theory of Econometrics' used the cofactor approach to derive the multiple regression model and then used the derivation to show that economic theory postulates that the quantity demanded of a given commodity depends on its price and on consumers' income.
Schaeffer and McClave [1982] in their book 'Statistics for Engineers' used the inverse matrix method to derive the multiple regression model and then went further, using the derivation to show that the average amount of energy required to heat a house depends not only on the air temperature but also on the size of the house, the amount of insulation, and the type of heating unit.
Okeke [2009] in his book 'Fundamentals of Analysis of Variance in Statistically Designed Experiments' used Cramer's rule to derive the multiple regression model and then used it to show that a chemical process may depend on temperature, pressure and concentration of the catalyst.
CHAPTER THREE
3.0 SOURCE OF DATA
The data used in this project work were collected from the Federal Medical Centre, Owo, Ondo State.
3.1 METHOD OF DATA COLLECTION
The data used in this project work are from a secondary source.
3.2 METHOD OF ANALYSIS
MULTIPLE REGRESSION ANALYSIS
The aim of multiple regression is to examine the nature of the relationship between a given dependent variable and two or more independent variables. The model describing the relationship between the dependent variable Y and a set of k independent variables can be expressed as

Yᵢ = β₀ + β₁Xᵢ₁ + β₂Xᵢ₂ + … + βₖXᵢₖ + εᵢ,   i = 1, 2, …, n

Here, n is the number of observations on both the dependent and the independent variables, Yᵢ is the ith observation on the dependent variable Y, and Xᵢ₁, Xᵢ₂, …, Xᵢₖ are known constants representing, respectively, the ith observations on the k independent variables. The εᵢ are random error terms, assumed to be independent and normally distributed with mean zero and constant variance.
In this project work, we are going to use multiple regression to examine the nature of the relationship between blood pressure and weight, age and height.
3.3 PARAMETER ESTIMATION IN MULTIPLE REGRESSION
The model Yᵢ = β₀ + β₁Xᵢ₁ + … + βₖXᵢₖ + εᵢ can be written in matrix notation as

Y = Xβ + ε

where Y is the n × 1 vector of observations, X is the n × (k+1) matrix of known constants (with a leading column of ones), β is the (k+1) × 1 vector of parameters and ε is the n × 1 vector of errors.
The sum of squared residuals is

S(β) = ε′ε = (Y − Xβ)′(Y − Xβ) = Y′Y − 2β′X′Y + β′X′Xβ

Differentiating with respect to β and equating to zero, we have

∂S/∂β = −2X′Y + 2X′Xβ̂ = 0

Dividing both sides by two and rearranging gives the normal equations

X′Xβ̂ = X′Y

so that

β̂ = (X′X)⁻¹X′Y

where (X′X)⁻¹ is the inverse of the matrix X′X.
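The least-squares solution above can be sketched numerically. The data below are illustrative (hypothetical, not the hospital data), and the normal-equations estimate is cross-checked against NumPy's own least-squares solver:

```python
import numpy as np

# Numerical sketch of the normal equations: beta_hat = (X'X)^{-1} X'Y.
rng = np.random.default_rng(1)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1)
beta_true = np.array([110.0, 0.6, 0.27, 0.7])                # hypothetical values
Y = X @ beta_true + rng.normal(0.0, 1.0, n)

# Solve X'X beta = X'Y; np.linalg.solve is numerically safer than
# forming an explicit matrix inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)
```

The two routes give the same estimate; solving the normal equations directly is what the derivation describes, while `lstsq` uses a more numerically stable factorization internally.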
3.4 HYPOTHESIS TO BE TESTED
We are going to test whether weight, age, or height cannot single-handedly cause a change in blood pressure, i.e. H₀: βⱼ = 0 against H₁: βⱼ ≠ 0 for each parameter, using the statistic t_cal = β̂ⱼ/se(β̂ⱼ).
DECISION:
If |t_cal| < t_tab, we accept H₀; otherwise we reject H₀ and accept H₁.
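This decision rule can be sketched by applying t_cal = coefficient / standard error to the Coef and StDev columns reported in the MINITAB output of Appendix B (the critical value 1.980 is the two-sided 5% point for roughly 96 degrees of freedom):

```python
import numpy as np

# Per-coefficient t-test using the estimates from the MINITAB output.
coef = np.array([109.63, 0.0603, 0.27008, 0.683])    # Constant, WT, AGE, HT
se   = np.array([10.89, 0.1227, 0.05021, 4.678])     # standard errors
t_cal = coef / se

t_tab = 1.980   # two-sided 5% critical value, ~96 degrees of freedom

for name, t in zip(["Constant", "WT(Kg)", "AGE", "HT(M)"], t_cal):
    decision = "reject H0" if abs(t) > t_tab else "accept H0"
    print(f"{name}: t_cal = {t:.2f} -> {decision}")
```

Only the constant and AGE exceed the critical value; WT and HT do not.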
3.5 ANOVA TABLE FOR MULTIPLE REGRESSION

Source | DF | SS | MS | F-ratio
Regression | k | SSR | MSR = SSR/k | F = MSR/MSE
Error | n−k−1 | SSE | MSE = SSE/(n−k−1) |
Total | n−1 | SST | |

where SSR = SST − SSE is the sum of squares due to regression, SSE = Σ(Yᵢ − Ŷᵢ)² is the sum of squares due to error, and SST = Σ(Yᵢ − Ȳ)² is the total sum of squares.
3.6 COEFFICIENT OF MULTIPLE DETERMINATION
The coefficient of multiple determination R² measures the proportion of the total variation in the dependent variable Y that is ascribed or attributed to the regression of Y on the independent variables that are included in the model. It is given as:

R² = SSR/SST = 1 − SSE/SST
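A minimal sketch of the ANOVA decomposition and R² above, on illustrative data with k = 3 regressors as in this project:

```python
import numpy as np

# ANOVA decomposition SST = SSR + SSE and R^2 = SSR/SST for a fitted
# multiple regression (illustrative data, not the hospital data).
rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([5.0, 1.0, -2.0, 0.0]) + rng.normal(0.0, 2.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat

SST = np.sum((Y - Y.mean()) ** 2)   # total sum of squares, df = n - 1
SSE = np.sum((Y - Y_hat) ** 2)      # error sum of squares, df = n - k - 1
SSR = SST - SSE                     # regression sum of squares, df = k

MSR, MSE = SSR / k, SSE / (n - k - 1)
F = MSR / MSE                       # F-ratio for the joint test
R2 = SSR / SST                      # coefficient of multiple determination
print(F, R2)
```

The identity SST = SSR + SSE holds (up to rounding) whenever the model includes an intercept, which is why the table's sums of squares add up.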
CHAPTER FOUR
DATA ANALYSIS
4.0 INTRODUCTION
This chapter presents the summary of the data to be studied and analyzed. Since the procedures for solving a multiple regression problem have been outlined in the previous chapter, we shall now focus on a real-life problem, using the data obtained from the Federal Medical Centre, Owo, Ondo State as a case study.
Furthermore, the steps and formulae stated in chapter three can also be applied here, but we shall be focusing only on the results extracted from the statistical software MINITAB used in analyzing this data.
4.1 THE MINITAB RESULTS SHOWING THE RELATIONSHIP BETWEEN THE SYSTOLIC BLOOD PRESSURE AND THE WEIGHT, AGE, AND HEIGHT OF 100 INDIVIDUALS
The regression equation is:

SBP = 110 + 0.60 WT(Kg) + 0.270 AGE + 0.68 HT(M)
4.2 ADEQUACY OF THE MODEL
To test the significance of each parameter, we test H₀: βⱼ = 0 against H₁: βⱼ ≠ 0 using t_cal = β̂ⱼ/se(β̂ⱼ), with critical value t_tab = 1.980 from the t-table (two-sided, α = 5%, 96 degrees of freedom).
HYPOTHESIS FOR WEIGHT (β₁)
t_cal = 0.49, P-value = 0.624
Decision: Since |t_cal| = 0.49 < 1.980, we accept H₀ and conclude that weight cannot single-handedly cause a significant change in SBP; it is not significant at α = 5%.
HYPOTHESIS FOR AGE (β₂)
t_cal = 5.38, P-value = 0.000
Decision: Since |t_cal| = 5.38 > 1.980, we reject H₀ and conclude that age is a major factor for change in SBP, i.e. the older we become, the nearer we are to high blood pressure; it is significant at α = 5%.
HYPOTHESIS FOR HEIGHT (β₃)
t_cal = 0.15, P-value = 0.884
Decision: Since |t_cal| = 0.15 < 1.980, we accept H₀ and conclude that height cannot single-handedly contribute to a change in SBP; it is not significant at α = 5%.
4.3 ANALYSIS OF VARIANCE

Source | DF | SS | MS | F | P
Regression | 3 | 2675.57 | 891.86 | 10.52 | 0.000
Error | 96 | 8135.47 | 84.74 | |
Total | 99 | 10811.04 | | |
TEST OF HYPOTHESIS
We test H₀: β₁ = β₂ = β₃ = 0 against H₁: at least one βⱼ ≠ 0.
DECISION
Since the P-value = 0.000 is less than α = 0.05, it falls in the rejection region; we reject H₀ and conclude that weight, age and height are jointly significant factors to be considered for a change in SBP.
4.4 COEFFICIENT OF MULTIPLE DETERMINATION
From the ANOVA table, we have

R² = SSR/SST = 2675.57/10811.04 = 0.2475, i.e. about 24.7%

This implies that about 24.7% of the total variation in systolic blood pressure is explained by the factors weight (X₁), age (X₂) and height (X₃) considered in the model. In other words, the remaining 75.3% of the variation is due to other factors not included in the model.
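The figures quoted in this chapter can be checked with a few lines of arithmetic, taking the ANOVA values of section 4.3 as printed:

```python
# Arithmetic check of the chapter-four figures from the ANOVA table.
SSR, SSE, SST = 2675.57, 8135.47, 10811.04
k, n = 3, 100

MSR = SSR / k              # mean square for regression
MSE = SSE / (n - k - 1)    # mean square for error
F = MSR / MSE              # F-ratio
R2 = SSR / SST             # coefficient of multiple determination

print(round(MSR, 2), round(MSE, 2), round(F, 2), round(R2 * 100, 1))
# reproduces 891.86, 84.74, 10.52 and about 24.7%
```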
CHAPTER FIVE
5.1 SUMMARY
Regression analysis is a statistical tool used to establish a linear relationship between predetermined independent variables and a dependent variable, and thereby to predict future values of the dependent variable from the other variables.
In this project, multiple regression was exhaustively discussed and was used in analyzing the effect of weight, age and height on blood pressure, and conclusions were drawn.
5.2 CONCLUSION
From the results of the statistical software MINITAB used in analyzing the data, and from the interpretation of the results in chapter four, it was observed that age contributes significantly to change in blood pressure when weight and height are held fixed, while neither weight nor height contributes significantly to change in blood pressure when the other two variables are held fixed. The three variables (weight, age and height), however, jointly contribute significantly to a change in blood pressure.
5.3 RECOMMENDATION
Since it has been shown that changes in our weight, in conjunction with the advancement of our age, can jointly cause a change in our blood pressure, it is advised that we watch our weight to avoid becoming overweight and, as we advance in age, always go for regular checks of our blood pressure.
The health sector should also endeavour to always give adequate advice to the public on the effects on our blood pressure of the food we eat and of engaging in excessive worry.
REFERENCES
Arua, A.I. and Okafor, F.C. (1997); Fundamentals of Statistics for Higher Education: Fijac Academic Press.
Dixon, W.J. and Massey, F.J. (1969); Introduction to Statistical Analysis: New York, McGraw-Hill Book Company.
Draper, N.R. and Smith, H. (1988); Applied Regression Analysis: New York, John Wiley & Sons.
Francis, A. (1986); Business Mathematics and Statistics: DP Publications, Aldine House, Aldine Place 142/144, Uxbridge Road, London.
Gupta, S.P. (1969); Statistical Methods: Sultan Chand and Sons, 23, Daryaganj, New Delhi.
Kleinbaum, D.G., Kupper, L.L. and Muller, K.E. (1988); Applied Regression Analysis and Other Multivariable Methods, 2nd Edition: Boston, PWS-Kent Publishing Company.
Koutsoyiannis, A. (1973); Theory of Econometrics: Palgrave, Houndmills, Basingstoke, Hampshire and New York.
Okeke, A.O. (2009); Fundamentals of Analysis of Variance in Statistically Designed Experiments: Macro Academic Publishers, 1, Anigbogu Close, Achara Layout, Enugu.
Schaeffer, R.L. and McClave, J.T. (1982); Statistics for Engineers: PWS Publishers, a division of Wadsworth, Inc., USA.
Spiegel, M.R. and Stephens, L.J. (1999); Schaum's Outline Series, 3rd Edition: New York, McGraw-Hill.
APPENDIX A

S/N | Weight (kg) | SBP | DBP | AGE | Height (m) | BMI (kg/m2)
1 | 72 | 164 | 82 | 72 | 1.76 | 23.24380
2 | 65 | 108 | 70 | 24 | 1.71 | 22.22906
3 | 67 | 128 | 75 | 17 | 1.62 | 25.52964
4 | 70 | 124 | 70 | 23 | 1.67 | 25.09950
5 | 75 | 139 | 74 | 42 | 1.48 | 34.24032
6 | 67 | 144 | 65 | 51 | 1.83 | 20.00657
7 | 73 | 127 | 75 | 22 | 1.89 | 20.43616
8 | 75 | 136 | 78 | 41 | 2.1 | 17.0068
9 | 78 | 138 | 76 | 64 | 1.55 | 32.46618
10 | 78 | 112 | 68 | 28 | 1.71 | 26.67487
11 | 71 | 129 | 80 | 24 | 1.62 | 27.0538
12 | 71 | 140 | 90 | 63 | 1.22 | 48.3741
13 | 72 | 115 | 63 | 22 | 1.71 | 18.80921
14 | 55 | 120 | 80 | 24 | 1.89 | 19.03642
15 | 68 | 123 | 85 | 27 | 2.0 | 17.75000
16 | 71 | 135 | 64 | 23 | 2.01 | 16.08871
17 | 84 | 134 | 97 | 57 | 1.62 | 32.00732
18 | 60 | 125 | 88 | 23 | 1.68 | 21.2585
19 | 77 | 122 | 80 | 26 | 1.65 | 28.28283
20 | 56 | 115 | 70 | 19 | 1.92 | 15.19097
21 | 52 | 129 | 80 | 22 | 1.55 | 21.71166
22 | 70 | 130 | 80 | 26 | 1.65 | 21.64412
23 | 78 | 126 | 73 | 26 | 1.82 | 23.54788
24 | 75 | 112 | 65 | 24 | 1.98 | 19.1307
25 | 66 | 129 | 80 | 53 | 1.52 | 28.56648
26 | 70 | 125 | 76 | 23 | 2.04 | 16.82045
27 | 67 | 126 | 79 | 60 | 1.82 | 20.22703
28 | 59 | 113 | 74 | 24 | 1.55 | 24.55775
29 | 65 | 123 | 72 | 23 | 1.80 | 20.06173
30 | 55 | 128 | 72 | 33 | 1.77 | 17.55562
31 | 60 | 136 | 67 | 42 | 1.74 | 19.81768
32 | 88 | 140 | 90 | 76 | 1.77 | 28.08899
33 | 80 | 119 | 76 | 25 | 1.55 | 33.29865
34 | 70 | 124 | 76 | 25 | 1.83 | 20.90239
35 | 78 | 116 | 66 | 27 | 1.83 | 23.29123
36 | 84 | 140 | 80 | 55 | 1.98 | 21.42639
37 | 82 | 130 | 80 | 29 | 1.86 | 23.70216
38 | 63 | 125 | 70 | 20 | 1.74 | 20.80856
39 | 74 | 134 | 92 | 65 | 1.58 | 27.6469
40 | 55 | 147 | 84 | 69 | 1.07 | 48.03913
41 | 63 | 133 | 87 | 52 | 1.77 | 20.10916
42 | 70 | 131 | 95 | 75 | 1.70 | 24.22145
43 | 73 | 117 | 70 | 22 | 2.01 | 18.66886
44 | 75 | 125 | 68 | 24 | 1.86 | 21.67881
45 | 59 | 108 | 63 | 28 | 1.49 | 26.57538
46 | 76 | 130 | 77 | 26 | 1.92 | 20.61632
47 | 67 | 117 | 74 | 40 | 1.83 | 20.00657
48 | 60 | 124 | 77 | 23 | 1.71 | 20.51913
49 | 80 | 128 | 75 | 60 | 1.86 | 23.12326
50 | 70 | 132 | 80 | 55 | 1.98 | 17.85532
51 | 75 | 132 | 73 | 71 | 1.79 | 23.10751
52 | 62 | 160 | 89 | 75 | 1.21 | 42.34683
53 | 73 | 123 | 63 | 23 | 1.89 | 20.43616
54 | 75 | 136 | 80 | 60 | 1.34 | 41.76877
55 | 60 | 117 | 71 | 25 | 1.49 | 27.02581
56 | 82 | 134 | 83 | 44 | 1.89 | 22.95568
57 | 72 | 127 | 74 | 28 | 1.52 | 31.16343
58 | 64 | 130 | 72 | 80 | 1.55 | 26.63892
59 | 75 | 139 | 81 | 49 | 1.98 | 19.1307
60 | 68 | 131 | 76 | 63 | 1.74 | 22.46003
61 | 84 | 127 | 92 | 27 | 1.74 | 27.744745
62 | 75 | 128 | 82 | 21 | 1.56 | 30.81854
63 | 70 | 149 | 96 | 68 | 1.74 | 23.12062
64 | 78 | 128 | 77 | 24 | 1.71 | 26.67487
65 | 80 | 116 | 77 | 26 | 1.71 | 27.35885
66 | 52 | 120 | 80 | 25 | 1.74 | 17.17532
67 | 64 | 105 | 60 | 23 | 1.71 | 22.22906
68 | 60 | 129 | 74 | 21 | 1.65 | 22.03857
69 | 70 | 122 | 76 | 27 | 1.39 | 36.23001
70 | 60 | 118 | 84 | 22 | 1.59 | 24.97399
71 | 69 | 129 | 82 | 33 | 1.23 | 45.60777
72 | 74 | 132 | 81 | 40 | 1.83 | 22.09681
73 | 70 | 121 | 80 | 23 | 1.77 | 22.34352
74 | 65 | 121 | 73 | 35 | 1.70 | 20.74755
75 | 80 | 122 | 77 | 26 | 1.89 | 22.39579
76 | 74 | 123 | 79 | 34 | 1.79 | 23.09541
77 | 56 | 124 | 75 | 31 | 1.72 | 18.92915
78 | 70 | 128 | 80 | 43 | 1.68 | 24.80159
79 | 70 | 124 | 77 | 23 | 1.74 | 23.12062
80 | 69 | 115 | 69 | 59 | 1.68 | 24.44728
81 | 70 | 121 | 79 | 38 | 1.18 | 50.27291
82 | 72 | 126 | 76 | 23 | 1.98 | 18.36547
83 | 79 | 116 | 76 | 30 | 1.77 | 25.22625
84 | 68 | 125 | 81 | 24 | 1.77 | 21.70513
85 | 71 | 112 | 68 | 21 | 1.55 | 29.55255
86 | 75 | 112 | 72 | 35 | 1.77 | 23.93948
87 | 66 | 112 | 70 | 25 | 1.71 | 22.57105
88 | 67 | 125 | 85 | 77 | 1.63 | 25.21736
89 | 75 | 120 | 70 | 24 | 1.89 | 20.99605
90 | 72 | 121 | 74 | 23 | 1.77 | 22.9819
91 | 68 | 118 | 65 | 81 | 1.83 | 20.30517
92 | 72 | 113 | 75 | 22 | 1.89 | 20.15621
93 | 75 | 127 | 70 | 63 | 1.07 | 65.50790
94 | 77 | 113 | 76 | 24 | 1.55 | 32.04995
95 | 80 | 102 | 68 | 26 | 1.71 | 27.35885
96 | 75 | 115 | 65 | 24 | 1.98 | 19.1307
97 | 70 | 118 | 80 | 40 | 1.62 | 26.67276
98 | 78 | 110 | 76 | 63 | 1.22 | 42.40527
99 | 60 | 120 | 63 | 72 | 1.71 | 20.51913
100 | 59 | 115 | 80 | 90 | 1.69 | 20.65751
Abbreviations
WT = Weight
SBP = Systolic Blood Pressure
DBP = Diastolic Blood Pressure
BMI = Body Mass Index = weight (kg) / [height (m)]²
HT = Height.
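The BMI column in Appendix A follows the standard formula above, which can be sketched as:

```python
# BMI as used in Appendix A: weight in kilograms divided by the
# square of height in metres.
def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2

# First record of Appendix A: 72 kg, 1.76 m -> BMI recorded as 23.24380
print(round(bmi(72, 1.76), 5))
```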
APPENDIX B
Regression Analysis

The regression equation is
SBP = 110 + 0.60 WT(Kg) + 0.270 AGE + 0.68 HT(M)

Predictor     Coef      StDev     T       P
Constant      109.63    10.89     10.06   0.000
WT(Kg)        0.0603    0.1227    0.49    0.624
AGE           0.27008   0.05021   5.38    0.000
HT(M)         0.683     4.678     0.15    0.884

S = 9.206   R-Sq = 24.7%   R-Sq(adj) = 22.4%

Analysis of Variance

Source       DF    SS         MS       F       P
Regression    3    2675.57    891.86   10.52   0.000
Error        96    8135.47    84.74
Total        99    10811.04

Source    DF    Seq SS
WT(Kg)     1      62.03
AGE        1    2611.74
HT(M)      1       1.81

Unusual Observations