**CHAPTER ONE**

**INTRODUCTION**

An extension of the probit model is the tobit model, originally developed by James Tobin (1958). The probit model estimates the probability of owning a house as a function of some socioeconomic variables. In the tobit model our interest is in finding out the amount of money a person or family spends on a house in relation to those socioeconomic variables. We have these data only on consumers who actually purchase a house.

Thus consumers are divided into two groups: one consisting of n_1 consumers, about whom we have information on the regressors (say, income, mortgage interest rate, number of people in the family, etc.) as well as the regressand (amount of expenditure on housing), and another consisting of n_2 consumers, about whom we have information only on the regressors but not on the regressand.
A sample in which information on the regressand is available only for some observations is known as a censored sample.

Therefore, the tobit model is also known as a censored regression model, or a limited dependent variable regression model, because of the restriction put on the values taken by the regressand (Gujarati, 2004). Statistically, we can express the tobit model as

Y_i = β_1 X_i + u_i   if RHS > 0
Y_i = 0               otherwise

[Figure 1: Plot of the amount of money a consumer spends in buying a house versus income. Crosses mark consumers for whom expenditure data are not available but income data are; dots mark consumers for whom both expenditure and income data are available.]

Consider Figure 1. If y is not observed (because of censoring), all such observations (= n_2), denoted by crosses, will lie on the horizontal axis. If y is observed, the observations (= n_1), denoted by dots, will lie in the X-Y plane.
It is intuitively clear that if we estimate a regression line based on the n_1 observations only, the resulting intercept and slope coefficients are bound to be different from those obtained if all (n_1 + n_2) observations were taken into account.
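This can be checked with a small simulation on synthetic data (the numbers below are illustrative assumptions, not drawn from any housing survey): fitting OLS either to the n_1 uncensored observations only, or to all observations with the censored values recorded as zero, understates the true slope.

```python
# Simulation: OLS on censored data understates the true slope (here 2.0).
# Synthetic, illustrative data -- not from any real housing survey.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
x = rng.normal(size=n)                         # e.g. standardized income
y_star = 1.0 + 2.0 * x + rng.normal(size=n)    # latent desired expenditure
y = np.where(y_star > 0, y_star, 0.0)          # observed value, censored at zero

uncensored = y > 0
slope_n1 = np.polyfit(x[uncensored], y[uncensored], 1)[0]  # n_1 points only
slope_all = np.polyfit(x, y, 1)[0]                         # all n_1 + n_2 points

print(slope_n1, slope_all)   # both noticeably below the true slope of 2.0
```

Both fits are biased toward zero, which is exactly why a censored-regression estimator is needed.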
One can estimate tobit or censored regression models by maximum likelihood (ML). But James Heckman has proposed an alternative to the ML method which is comparatively simple. This alternative consists of a two-step estimating procedure. In step 1, we first estimate the probability of a consumer owning a house, which is done on the basis of the probit model. In step 2, we estimate the model by adding to it a variable (called the inverse Mills ratio, or the hazard rate) that is derived from the probit estimates.

The Heckman procedure yields consistent estimates of the parameters, but they are not as efficient as the ML estimates. Since most modern statistical software packages have routines for ML estimation of the tobit model, the ML method is generally used in practice.
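A minimal sketch of the two-step idea on synthetic data follows (the variable names and data-generating numbers are illustrative assumptions, not Fair's or Tobin's data): step 1 fits a probit for the probability of purchase by maximum likelihood, and step 2 runs OLS on the observed subsample with the inverse Mills ratio added as a regressor.

```python
# Heckman-style two-step sketch on synthetic censored data.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5_000
income = rng.normal(size=n)
y_star = 1.0 + 2.0 * income + rng.normal(size=n)   # latent expenditure
purchased = y_star > 0                              # y observed only if positive

X = np.column_stack([np.ones(n), income])

# Step 1: fit a probit for P(purchase | income) by maximum likelihood.
def neg_loglik(g):
    p = np.clip(norm.cdf(X @ g), 1e-10, 1 - 1e-10)  # clip to avoid log(0)
    return -(np.log(p[purchased]).sum() + np.log1p(-p[~purchased]).sum())

gamma = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
index = X @ gamma
imr = norm.pdf(index) / norm.cdf(index)             # inverse Mills ratio

# Step 2: OLS of observed expenditure on income plus the IMR.
X2 = np.column_stack([X[purchased], imr[purchased]])
beta = np.linalg.lstsq(X2, y_star[purchased], rcond=None)[0]
print(beta[0], beta[1])    # roughly recovers intercept 1.0 and slope 2.0
```

The IMR term corrects the conditional mean of the selected subsample, which is why plain OLS on the same subsample would be biased.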

**CHAPTER TWO**

**ILLUSTRATION OF TOBIT MODEL**

Ray Fair's model of extramarital affairs provides an illustration. In an interesting and theoretically innovative article, Ray Fair collected a sample of 601 men and women married for the first time and analyzed their responses to a question about extramarital affairs.

The variables used in this study are defined as follows:

Y = number of affairs in the past year

Z1 = gender: 0 for female, 1 for male

Z2 = number of years married

Z4 = children: 0 if no children, 1 if there are children

Z5 = religiousness, on a scale of 1 to 5, 1 being anti-religion

Z6 = education, in years: grade school = 9, high school = 12, PhD or other = 20

Z7 = occupation, on the "Hollingshead" scale, 1-7

Z8 = self-rating of marriage: 1 = very unhappy, 5 = very happy

Of the 601 responses, 451 individuals had no extramarital affairs and 150 individuals had one or more affairs.

In terms of Figure 1, if we plot the number of affairs on the vertical axis and, say, education on the horizontal axis, there will be 451 observations lying along the horizontal axis. Thus we have a censored sample, and a tobit model may be appropriate.

The model supposes that there is a latent (i.e., unobservable) variable y*_i. This variable depends linearly on x_i via a parameter (vector) β which determines the relationship between the independent variable (or vector) x_i and the latent variable y*_i (just as in a linear model). In addition, there is a normally distributed error term u_i to capture random influences on this relationship. The observable variable y_i is defined to be equal to the latent variable whenever the latent variable is above zero, and zero otherwise:
y_i = y*_i   if y*_i > 0
y_i = 0      if y*_i ≤ 0

where y*_i is a latent variable:

y*_i = βx_i + u_i,   u_i ~ N(0, σ²)

**Consistency**

If the relationship parameter β is estimated by regressing the observed y_i on x_i, the resulting ordinary least squares estimator is inconsistent: it will yield a downwards-biased estimate of the slope coefficient and an upwards-biased estimate of the intercept. Takeshi Amemiya (1973) has proven that the maximum likelihood estimator suggested by Tobin for this model is consistent.

**Interpretation**

The β coefficient should not be interpreted as the effect of x_i on y_i, as one would with a linear regression model; this is a common error. Instead, it should be interpreted as the combination of (1) the change in y_i of those above the limit, weighted by the probability of being above the limit, and (2) the change in the probability of being above the limit, weighted by the expected value of y_i if above.

**Variations of the Tobit model**

Variations of the Tobit model can be produced by changing where and when censoring occurs. Amemiya (1985) classifies these variations into five categories (Tobit type I to Tobit type V), where Tobit type I stands for the first model described above. Schnedler (2005) provides a general formula to obtain consistent likelihood estimators for these and other variations of the Tobit model.

**Type 1**

The Tobit model is a special case of a censored regression model, because the latent variable y*_i cannot always be observed while the independent variable x_i is observable. A common variation of the Tobit model is censoring at a value y_L different from zero:

y_i = y*_i   if y*_i > y_L
y_i = y_L    if y*_i ≤ y_L
Another example is censoring of values above y_U:

y_i = y*_i   if y*_i < y_U
y_i = y_U    if y*_i ≥ y_U
Yet another model results when y*_i is censored from above and below at the same time:

y_i = y*_i   if y_L < y*_i < y_U
y_i = y_L    if y*_i ≤ y_L
y_i = y_U    if y*_i ≥ y_U
The rest of the models will be presented as being bounded from below at 0, though this can be generalized as we have done for type I.
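Each of these three censoring schemes is a one-line transformation of the latent variable; a small numeric sketch (the latent values below are made up for illustration):

```python
# Type I censoring variants applied to a small latent sample.
import numpy as np

y_latent = np.array([-2.0, -0.5, 0.3, 1.2, 3.5])   # y*_i (made-up values)
y_L, y_U = 0.0, 2.0

from_below = np.maximum(y_latent, y_L)    # y_i = y_L whenever y*_i <= y_L
from_above = np.minimum(y_latent, y_U)    # y_i = y_U whenever y*_i >= y_U
from_both  = np.clip(y_latent, y_L, y_U)  # censored at both limits

print(from_both)   # [0.  0.  0.3 1.2 2. ]
```

Note that censoring replaces out-of-range latent values with the limit itself, unlike truncation, which would drop them from the sample entirely.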

Type II Tobit models introduce a second latent variable:

y_i = y*_2i   if y*_1i > 0
y_i = 0       if y*_1i ≤ 0
The Heckman (1987) selection model falls into the Type II Tobit class. In Type I Tobit, the latent variable absorbs both the process of participation and the 'outcome' of interest. Type II Tobit allows the process of participation/selection and the process of 'outcome' to be independent, conditional on x.

A lot of problems related to this are available in the literature. The following is one example, which we have taken from the website http://www.ats.ucla.edu/stat/sas/dae/tobit.htm.

**Example 1:** Consider the situation in which we have a measure of academic aptitude (scaled 200-800) which we want to model using reading and math test scores, as well as the type of program the student is enrolled in (academic, general, or vocational). The students who answer all questions on the academic aptitude test correctly receive a score of 800, even though it is likely that these students are not "truly" equal in aptitude. The same is true of students who answer all of the questions incorrectly; all such students would have a score of 200 (i.e., the lowest score possible). This means that even though censoring from below was possible, it does not occur in the dataset.
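The dataset itself is not reproduced here, but the kind of check involved can be sketched on synthetic scores (the mean 600, spread 120, and sample size 200 are made-up assumptions): count how many observations sit exactly at each limit.

```python
# Sketch: detecting ceiling censoring in an aptitude-style variable.
# The scores are synthetic -- mean 600, sd 120, n = 200 are assumptions.
import numpy as np

rng = np.random.default_rng(1)
apt = rng.normal(600, 120, size=200).round()
apt = np.clip(apt, 200, 800)    # scores are recorded within 200-800

at_ceiling = int((apt == 800).sum())
at_floor = int((apt == 200).sum())
print(at_ceiling, at_floor)     # a spike of scores exactly at 800
```

A pile-up of observations exactly at a limit is the signature of censoring that makes a tobit model worth considering.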

**HYPOTHESIS TESTING**

**What is Hypothesis Testing?**

A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. Hypothesis testing refers to the formal procedures used by statisticians to accept or reject statistical hypotheses.

A hypothesis test is a method of making decisions using data from a scientific study.

So the method by which we select samples to learn more about the characteristics of a given population is also known as hypothesis testing. Hypothesis testing is really a systematic way to test claims or ideas about a group or population.

**Types of hypothesis**

There are two types of hypotheses:

1. Null hypothesis: the null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.

2. Alternative hypothesis: the alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause. It is a statement that directly contradicts the null hypothesis by stating that the actual value of the population parameter is less than (<), greater than (>), or not equal to (≠) the value stated in the null hypothesis.

For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis might be that half of the flips would result in heads and half in tails. The alternative hypothesis might be that the number of heads and tails would be very different. Symbolically, these hypotheses would be expressed as

H0: p = 0.5
Ha: p ≠ 0.5

Suppose we flipped the coin 50 times, resulting in 40 heads and 10 tails. Given this result, we would be inclined to reject the null hypothesis; we would conclude, based on the evidence, that the coin was probably not fair and balanced.
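For this coin example, the strength of the evidence can be quantified directly from the binomial distribution; a short sketch:

```python
# Two-sided test of H0: p = 0.5 given 40 heads in 50 flips.
from scipy.stats import binom

n, heads, p0 = 50, 40, 0.5
# Under H0 the binomial is symmetric, so the two-sided p-value is
# P(X >= 40) + P(X <= 10).
p_value = binom.sf(heads - 1, n, p0) + binom.cdf(n - heads, n, p0)

print(p_value < 0.05)   # True: reject H0, the coin looks unfair
```

The p-value here is far below any conventional significance level, matching the informal conclusion above.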

So the alternative hypothesis establishes where to place the level of significance. The level of significance refers to a criterion of judgment upon which a decision is made regarding the value stated in the null hypothesis. Note that the only reason we test the null hypothesis is because we think it is wrong; we then state what we think is wrong about the null hypothesis in an alternative hypothesis.

For example, for children watching TV, we may have reason to believe that children watch more than (>) or less than (<) 3 hours of TV per week. When we are uncertain of the direction, we can state that the value in the null hypothesis is not equal to (≠) 3 hours.

In a courtroom, since the defendant is assumed to be innocent (this is the null hypothesis), the burden is on the prosecutor to conduct a trial to show evidence that the defendant is not innocent.

In a similar way, we assume the null hypothesis is true, placing the burden on the researcher to conduct a study to show evidence that the null hypothesis is unlikely to be true. Regardless of the outcome, we always make a decision about the null hypothesis (that it is likely or unlikely to be true).

An alternative hypothesis (H1), by contrast, is the statement that directly contradicts the null hypothesis by stating that the actual value of the population parameter is more than (>), less than (<), or not equal to (≠) the value stated in the null hypothesis. The alternative hypothesis states what we think is wrong about the null hypothesis.

**Four steps in hypothesis testing**

Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process, called hypothesis testing, consists of four steps:

· State the hypotheses: this involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive; i.e., if one is true, the other must be false.

· Formulate an analysis plan: the analysis plan describes how to use sample data to evaluate the null hypothesis. The evaluation often focuses on a single test statistic.

· Analyze sample data: find the value of the test statistic (mean score, proportion, t-score, z-score, etc.) described in the analysis plan.

· Interpret results: apply the decision rule described in the analysis plan. If the value of the test statistic is unlikely under the null hypothesis, reject the null hypothesis.
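The four steps can be walked through on a small made-up sample (the weekly TV-hours numbers below are hypothetical), testing H0: μ = 3 against Ha: μ ≠ 3 with a two-tailed z-test:

```python
# The four steps of hypothesis testing on hypothetical TV-hours data.
import math
from scipy.stats import norm

# Step 1: state the hypotheses.  H0: mu = 3, Ha: mu != 3.
mu0 = 3.0

# Step 2: analysis plan -- two-tailed z-test at the 5% significance level.
alpha = 0.05

# Step 3: analyze sample data (hypothetical sample of 10 children).
sample = [4.1, 2.5, 3.8, 5.0, 3.3, 4.4, 2.9, 4.7, 3.6, 4.2]
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
z = (mean - mu0) / (sd / math.sqrt(n))

# Step 4: interpret -- reject H0 if the two-tailed p-value is below alpha.
p_value = 2 * norm.sf(abs(z))
print(round(z, 2), round(p_value, 4))   # prints: 3.4 0.0007
```

Here the p-value is well below alpha, so the decision rule rejects H0: these children appear to watch a different amount than 3 hours per week.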

**Can the Null Hypothesis Be Accepted?**

Some researchers say that a hypothesis test can have one of two outcomes: you accept the null hypothesis or you reject it. Many statisticians, however, take issue with the notion of "accepting the null hypothesis"; instead, they say you reject the null hypothesis or you fail to reject it. Why the distinction between "acceptance" and "failure to reject"? Acceptance implies that the null hypothesis is true; failure to reject implies only that the data are not sufficiently persuasive for us to prefer the alternative hypothesis over the null hypothesis. Other approaches to decision making, such as Bayesian decision theory, attempt to balance the consequences of incorrect decisions across all possibilities rather than concentrating on a single null hypothesis. A number of other approaches to reaching a decision based on data are available via decision theory and optimal decisions, some of which have desirable properties; yet hypothesis testing remains a dominant approach to data analysis in many fields of science. Extensions to the theory of hypothesis testing include the study of the power of tests, which refers to the probability of correctly rejecting the null hypothesis when a given state of nature exists. Such considerations can be used for the purpose of sample size determination prior to the collection of data.

**Decision Errors**

Two types of errors can result from a hypothesis test.

Type I error: a Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha and is often denoted by α.

Type II error: a Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta and is often denoted by β. The probability of not committing a Type II error is called the power of the test.
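Both error rates can be checked by Monte Carlo simulation; the sketch below assumes a two-tailed z-test with known σ = 1, n = 25, and a shifted mean of 0.5 for the false-H0 case (all illustrative choices):

```python
# Monte Carlo estimate of the Type I error rate (alpha) and the power
# of a two-tailed z-test of H0: mu = 0, with known sigma = 1.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n, alpha, trials = 25, 0.05, 20_000
z_crit = norm.ppf(1 - alpha / 2)           # rejection boundary, about 1.96

def reject_rate(true_mu):
    samples = rng.normal(true_mu, 1.0, size=(trials, n))
    z = samples.mean(axis=1) * np.sqrt(n)  # z = (xbar - 0) / (1 / sqrt(n))
    return float((np.abs(z) > z_crit).mean())

type1 = reject_rate(0.0)    # share of true H0 wrongly rejected, about alpha
power = reject_rate(0.5)    # share of false H0 correctly rejected
print(type1, power)
```

The simulated Type I rate sits near the chosen α, while the power depends on how far the true mean is from the hypothesized value.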

**Decision rules**

The analysis plan includes decision rules for rejecting the null hypothesis. In practice, statisticians describe these decision rules in two ways: with reference to a P-value or with reference to a region of acceptance.

**P-value:** The strength of evidence in support of a null hypothesis is measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.

**Region of acceptance:** The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level.

The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say that the hypothesis has been rejected at the α level of significance.

These approaches are equivalent: some statistics texts use the P-value approach, while others use the region of acceptance approach. Subsequent examples illustrate each approach.

**One Tailed and Two Tailed Tests.**

A test of a statistical hypothesis where the region of rejection is on only one side of the sampling distribution is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10, and the region of rejection would consist of a range of numbers located on the right side of the sampling distribution; that is, a set of numbers greater than 10.

A test of a statistical hypothesis where the region of rejection is on both sides of the sampling distribution is called a two-tailed test. For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range of numbers located on both sides of the sampling distribution; that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.
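The practical difference shows up in the p-value: for the same test statistic, the two-tailed p-value is twice the one-tailed value (the statistic z = 1.8 below is an arbitrary illustrative number):

```python
# One-tailed versus two-tailed p-values for the same z statistic.
from scipy.stats import norm

z = 1.8                        # observed test statistic (illustrative)
p_right = norm.sf(z)           # Ha: mean > 10  (right tail only)
p_two = 2 * norm.sf(abs(z))    # Ha: mean != 10 (both tails)

print(round(p_right, 4), round(p_two, 4))   # 0.0359 0.0719
```

At α = 0.05 this statistic would be rejected by the one-tailed test but not by the two-tailed test, which is why the direction of the alternative hypothesis must be fixed before the data are examined.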

**REFERENCES**

Gujarati, Damodar N. (2004), *Basic Econometrics*.

Awoke, Michael Ugota (2001), *Econometrics: Theory and Application*.

International Encyclopedia of the Social Sciences (2008).

McDonald, John F. and Moffitt, Robert A. (1980), "The Uses of Tobit Analysis", *The Review of Economics and Statistics* (The MIT Press) 62 (2): 318-321.