An extension of the probit model is the tobit model, originally developed by James Tobin (1958). The probit model estimates the probability of owning a house as a function of some socioeconomic variables. In the tobit model, our interest is in finding out the amount of money a person or family spends on a house in relation to socioeconomic variables. However, we have these data only on consumers who actually purchase a house.
Thus consumers are divided into two groups: one consisting of n1 consumers about whom we have information on the regressors (say, income, mortgage interest rate, number of people in the family, etc.) as well as the regressand (amount of expenditure on housing), and another consisting of n2 consumers about whom we have information only on the regressors but not on the regressand.
A sample in which information on the regressand is available only for some observations is known as a censored sample.
Therefore, the tobit model is also known as a censored regression model. Some authors call such models limited dependent variable regression models because of the restriction put on the values taken by the regressand (Gujarati, 2004).
Figure 1. Plot of the amount of money a consumer spends in buying a house versus income (crosses: expenditure data not available, but income data available; dots: both expenditure and income data available).
Consider Figure 1. If Y is not observed (because of censoring), all such observations (= n2), denoted by crosses, will lie on the horizontal axis. If Y is observed, the observations (= n1), denoted by dots, will lie in the X–Y plane.
Statistically, the tobit model can be expressed as

Yi = β1 + β2Xi + ui    if RHS > 0
Yi = 0                 otherwise

where RHS denotes the right-hand side.
It is intuitively clear that if we estimate a regression line based on the n1 observations only, the resulting intercept and slope coefficient are bound to be different than if all the (n1 + n2) observations were taken into account.
One can estimate tobit or censored regression models by maximum likelihood (ML). But James Heckman has proposed an alternative to the ML method which is comparatively simple. This alternative consists of a two-step estimating procedure. In step 1, we first estimate the probability of a consumer owning a house, which is done on the basis of the probit model. In step 2, we estimate the model by adding to it a variable (called the inverse Mills ratio or the hazard rate) that is derived from the probit estimates.
The Heckman procedure yields consistent estimates of the parameters, but they are not as efficient as the ML estimates. Since most modern statistical software packages have routines to estimate tobit models by ML, that method is often used in practice.
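As a rough sketch of Heckman's two-step idea (simulated data, illustrative parameter values; for brevity the first-stage probit fit is replaced here by the true selection index, which a real step 1 would estimate):

```python
import numpy as np
from scipy.stats import norm

def inverse_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z), the 'hazard rate'
    regressor added in Heckman's second step."""
    return norm.pdf(z) / norm.cdf(z)

rng = np.random.default_rng(0)
n = 5000

income = rng.normal(50.0, 10.0, n)
# Step 1 would fit a probit for house ownership; here we simply
# use the true selection index in place of the fitted probit index.
select_index = -2.0 + 0.05 * income
u = rng.standard_normal(n)
owns = select_index + u > 0            # expenditure observed only if True

# Expenditure equation; its error is correlated with u, which is
# exactly what biases a naive OLS on the selected sample.
spend = 10.0 + 0.8 * income + 0.5 * u + rng.normal(0.0, 0.5, n)

# Step 2: OLS of expenditure on income plus the inverse Mills ratio.
lam = inverse_mills(select_index[owns])
X = np.column_stack([np.ones(owns.sum()), income[owns], lam])
beta, *_ = np.linalg.lstsq(X, spend[owns], rcond=None)
print("corrected slope:", beta[1])     # near the true value 0.8
```

With the inverse Mills ratio included, the second-step slope lands near the true value; omitting the `lam` column would leave the selection bias in the estimate.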
ILLUSTRATION OF THE TOBIT MODEL
Ray Fair's model of extramarital affairs: In an interesting and theoretically innovative article, Ray Fair collected a sample of 601 men and women who were married for the first time and analyzed their responses to a question about extramarital affairs.
The variables used in this study are defined as follows:
Y = number of affairs in the past year
Z1 = 0 for female, 1 for male
Z2 = number of years married
Z4 = children: 0 if no children, 1 if there are children
Z5 = religiousness, on a scale of 1 to 5, 1 being anti-religion
Z6 = education, in years: grade school = 9, high school = 12, Ph.D. or other = 20
Z7 = occupation, on the "Hollingshead" scale, 1–7
Z8 = self-rating of marriage, 1 = very unhappy, 5 = very happy
Of the 601 responses, 451 individuals had no extramarital affairs, and 150 individuals had one or more affairs.
In terms of Figure 1, if we plot the number of affairs on the vertical axis and, say, education on the horizontal axis, there will be 451 observations lying along the horizontal axis. Thus, we have a censored sample, and a tobit model may be appropriate.
The model supposes that there is a latent (i.e., unobservable) variable yi*. This variable depends linearly on xi via a parameter (vector) β which determines the relationship between the independent variable (or vector) xi and the latent variable yi* (just as in a linear model). In addition, there is a normally distributed error term ui to capture random influences on this relationship. The observable variable yi is defined to be equal to the latent variable whenever the latent variable is above zero, and zero otherwise:
yi = yi*    if yi* > 0
yi = 0      if yi* ≤ 0

where yi* is a latent variable:

yi* = βxi + ui,    ui ~ N(0, σ²)
If the relationship parameter β is estimated by regressing the observed yi on xi, the resulting ordinary least squares regression estimator is inconsistent: it will yield a downward-biased estimate of the slope coefficient and an upward-biased estimate of the intercept. Takeshi Amemiya (1973) has proven that the maximum likelihood estimator suggested by Tobin for this model is consistent.
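Both claims can be seen in a small simulation (parameter values are illustrative; the likelihood below is the standard censored-at-zero tobit likelihood, maximized numerically):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)
y_star = 1.0 + 2.0 * x + rng.standard_normal(n)  # latent variable
y = np.maximum(y_star, 0.0)                      # censored at zero

# Naive OLS on the censored data: biased.
X = np.column_stack([np.ones(n), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def negloglik(theta):
    """Negative tobit log-likelihood with censoring at 0."""
    b0, b1, log_s = theta
    s = np.exp(log_s)                            # keep sigma positive
    mu = b0 + b1 * x
    ll = np.where(y == 0,
                  norm.logcdf(-mu / s),          # P(y* <= 0)
                  norm.logpdf((y - mu) / s) - np.log(s))
    return -ll.sum()

res = minimize(negloglik, x0=[0.0, 1.0, 0.0], method="BFGS")
b_tobit = res.x
print("OLS slope:", b_ols[1], " Tobit ML slope:", b_tobit[1])
```

The OLS slope is pulled well below the true value of 2 (and the OLS intercept above the true value of 1), while the ML estimate recovers the slope closely.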
The β coefficient should not be interpreted as the effect of xi on yi, as one would with a linear regression model; this is a common error. Instead, it should be interpreted as the combination of (1) the change in yi of those above the limit, weighted by the probability of being above the limit; and (2) the change in the probability of being above the limit, weighted by the expected value of yi if above.
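For censoring at zero, E[y | x] = Φ(xβ/σ)·xβ + σφ(xβ/σ), and the total marginal effect works out to βΦ(xβ/σ), which is the weighted combination described above (the McDonald–Moffitt decomposition). A quick numerical check, using illustrative values β = 2 and σ = 1:

```python
from math import erf, exp, sqrt, pi

def phi_pdf(z):
    """Standard normal density."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

beta, sigma = 2.0, 1.0   # illustrative values

def e_y(x):
    """E[y | x] for a tobit censored at 0."""
    z = x * beta / sigma
    return phi_cdf(z) * x * beta + sigma * phi_pdf(z)

# Numerical derivative of E[y | x] versus the closed-form result.
x0, h = 0.5, 1e-6
numeric = (e_y(x0 + h) - e_y(x0 - h)) / (2 * h)
analytic = beta * phi_cdf(x0 * beta / sigma)
print(numeric, analytic)
```

The two agree, confirming that the marginal effect is β scaled down by the probability of being above the limit, not β itself.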
Variations of the Tobit model
Variations of the Tobit model can be produced by changing where and when censoring occurs. Amemiya (1985) classifies these variations into five categories (Tobit type I – Tobit type V), where Tobit type I stands for the first model described above. Schnedler (2005) provides a general formula to obtain consistent likelihood estimators for these and other variations of the Tobit model.
The Tobit model is a special case of a censored regression model, because the latent variable yi* cannot always be observed while the independent variable xi is observable. A common variation of the Tobit model is censoring at a value yL different from zero:
yi = yi*    if yi* > yL
yi = yL     if yi* ≤ yL
Another example is censoring of values above yU:

yi = yi*    if yi* < yU
yi = yU     if yi* ≥ yU
Yet another model results when yi* is censored from above and below at the same time:

yi = yi*    if yL < yi* < yU
yi = yL     if yi* ≤ yL
yi = yU     if yi* ≥ yU
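The three censoring schemes above can be summarized in one small helper (`censor` is a hypothetical name used for illustration):

```python
def censor(y_star, lower=None, upper=None):
    """Map a latent value y* to its observed tobit counterpart.

    lower/upper are the censoring points yL and yU; None means
    no censoring on that side.
    """
    if lower is not None and y_star <= lower:
        return lower
    if upper is not None and y_star >= upper:
        return upper
    return y_star

below = censor(-1.3, lower=0)               # censoring from below at 0
above = censor(912.0, upper=800)            # censoring from above at 800
both = censor(512.0, lower=200, upper=800)  # two-sided: observed unchanged
print(below, above, both)
```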
The rest of the models will be presented as being bounded from below at 0, though this can be generalized as we have done for Type I.
Type II Tobit models introduce a second latent variable.
yi = y2i*    if y1i* > 0
yi = 0       if y1i* ≤ 0

where y1i* and y2i* are the two latent variables.
The Heckman (1987) selection model falls into the Type II Tobit class. In Type I Tobit, the latent variable absorbs both the process of participation and the 'outcome' of interest. Type II Tobit allows the process of participation/selection and the process of 'outcome' to be independent, conditional on x.
Many problems related to this model are available in the literature. The following is one example, taken from the website http://www.ats.ucla.edu/stat/sas/dae/tobit.htm.
Example 1: Consider the situation in which we have a measure of academic aptitude (scaled 200–800) which we want to model using reading and math test scores, as well as the type of program the student is enrolled in (academic, general, or vocational). The students who answer all questions on the academic aptitude test correctly receive a score of 800, even though it is likely that these students are not "truly" equal in aptitude. The same is true of students who answer all of the questions incorrectly: all such students would have a score of 200 (i.e., the lowest score possible). This means that even though censoring from below was possible, it does not occur in the dataset.
What is Hypothesis Testing?
A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. Hypothesis testing refers to the formal procedures used by statisticians to accept or reject statistical hypotheses.
A hypothesis test is a method of making decisions using data from a scientific study.
Thus, the method by which we select samples to learn more about characteristics in a given population is also known as hypothesis testing. Hypothesis testing is really a systematic way to test claims or ideas about a group or population.
Types of hypothesis
There are two types of hypotheses:
1. Null hypothesis: The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.
2. Alternative hypothesis: The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause. It is a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than (<), greater than (>), or not equal to (≠) the value stated in the null hypothesis.
For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis might be that half of the flips would result in heads and half in tails. The alternative hypothesis might be that the numbers of heads and tails would be very different. Symbolically, these hypotheses would be expressed as
H0: p = 0.5
Ha: p ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 heads and 10 tails. Given this result, we would be inclined to reject the null hypothesis; we would conclude, based on the evidence, that the coin was probably not fair and balanced.
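The coin conclusion can be made precise with an exact binomial test; the following sketch computes the two-sided p-value for 40 heads in 50 flips of a fair coin:

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, k = 50, 40
# Under H0: p = 0.5 the distribution is symmetric, so the two-sided
# p-value doubles the one-sided upper-tail probability.
p_one_sided = sum(binom_pmf(i, n) for i in range(k, n + 1))
p_value = min(1.0, 2 * p_one_sided)
print(f"p-value = {p_value:.2e}")

alpha = 0.05
reject_h0 = p_value < alpha
```

The p-value is tiny, far below any conventional significance level, so the data strongly contradict the fair-coin hypothesis.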
So the alternative hypothesis establishes where to place the level of significance. The level of significance refers to a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. Note that the only reason we test the null hypothesis is because we think it is wrong; we then have to state what we think is wrong about the null hypothesis in an alternative hypothesis.
For example, for children watching TV, we may have reason to believe that children watch more than (>) or less than (<) 3 hours of TV per week. When we are uncertain of the direction, we can state that the value in the null hypothesis is not equal to (≠) 3 hours.
In a courtroom, since the defendant is assumed to be innocent (this is the null hypothesis), the burden is on the prosecutor to conduct a trial to show evidence that the defendant is not innocent.
In a similar way, we assume the null hypothesis is true, placing the burden on the researcher to conduct a study to show evidence that the null hypothesis is unlikely to be true. Regardless, we always make a decision about the null hypothesis (that it is likely or unlikely to be true).
An alternative hypothesis (H1), by contrast, is the statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is more than (>), less than (<), or not equal to (≠) the value stated in the null hypothesis. The alternative hypothesis states what we think is wrong about the null hypothesis.
Four steps in hypothesis testing
Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process, called hypothesis testing, consists of four steps:
· State the hypotheses: This involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive, i.e., if one is true the other must be false.
· Formulate an analysis plan: The analysis plan describes how to use sample data to evaluate the null hypothesis. The evaluation often focuses on a single test statistic.
· Analyze sample data: Find the value of the test statistic (mean score, proportion, t-score, z-score, etc.) described in the analysis plan.
· Interpret results: Apply the decision rule described in the analysis plan. If the value of the test statistic is unlikely, based on the null hypothesis, reject the null hypothesis.
Can the Null Hypothesis Be Accepted?
Some researchers say that a hypothesis test can have one of two outcomes: you accept the null hypothesis or you reject it. Many statisticians, however, take issue with the notion of "accepting the null hypothesis"; instead, they say you reject the null hypothesis or you fail to reject it. Why the distinction between "acceptance" and "failure to reject"? Acceptance implies that the null hypothesis is true. Failure to reject implies that the data are not sufficiently persuasive for us to prefer the alternative hypothesis over the null hypothesis.
Other approaches to decision making, such as Bayesian decision theory, attempt to balance the consequences of incorrect decisions across all possibilities rather than concentrating on a single null hypothesis. A number of other approaches to reaching a decision based on data are available via decision theory and optimal decisions, some of which have desirable properties; yet hypothesis testing is a dominant approach to data analysis in many fields of science. Extensions to the theory of hypothesis testing include the study of the power of tests, which refers to the probability of correctly rejecting the null hypothesis when a given state of nature exists. Such considerations can be used for the purpose of sample size determination prior to the collection of data.
Two types of errors can result from a hypothesis test:
Type I error: A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha and is often denoted by α.
Type II error: A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta and is often denoted by β. The probability of not committing a Type II error is called the power of the test.
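A small Monte Carlo sketch (illustrative settings: samples of n = 50, a two-tailed test with critical value 1.96) shows both error rates at work: when H0 is true, rejections are Type I errors and occur at roughly the significance level; when H0 is false, the rejection rate is the power of the test:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)

def z_test_rejects(sample, mu0, crit=1.96):
    """Two-tailed one-sample test of H0: mean == mu0."""
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return abs(z) > crit

trials, n = 2000, 50

# H0 true (true mean = 0): rejection rate estimates alpha.
alpha_hat = mean(z_test_rejects([random.gauss(0, 1) for _ in range(n)], 0)
                 for _ in range(trials))

# H0 false (true mean = 0.5): rejection rate estimates the power.
power_hat = mean(z_test_rejects([random.gauss(0.5, 1) for _ in range(n)], 0)
                 for _ in range(trials))
print("alpha ~", alpha_hat, " power ~", power_hat)
```

The simulated Type I error rate hovers near 0.05, while the power is much higher because the true mean really does differ from the hypothesized value.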
The analysis plan includes decision rules for rejecting the null hypothesis. In practice, statisticians describe these decision rules in two ways: with reference to a P-value, or with reference to a region of acceptance.
P-value: The strength of evidence in support of a null hypothesis is measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.
Region of acceptance: The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level.
The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say that the hypothesis has been rejected at the α level of significance.
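For the coin example (H0: p = 0.5, n = 50, α = 0.05), the region of acceptance can be computed with the normal approximation to the binomial; this is a sketch with illustrative numbers:

```python
from math import sqrt

# Acceptance region for the number of heads under H0: p = 0.5.
n, p0, z_crit = 50, 0.5, 1.96
mu = n * p0                        # mean under H0
sd = sqrt(n * p0 * (1 - p0))       # standard deviation under H0
lower, upper = mu - z_crit * sd, mu + z_crit * sd
print(f"accept H0 if heads in [{lower:.1f}, {upper:.1f}]")

observed = 40                      # 40 heads observed
rejected = not (lower <= observed <= upper)
```

Forty heads falls in the region of rejection, so the null hypothesis is rejected at the 0.05 level of significance, matching the P-value approach.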
These approaches are equivalent. Some statistics texts use the P-value approach; others use the region of acceptance approach. In subsequent lessons, this tutorial will present examples that illustrate each approach.
One-Tailed and Two-Tailed Tests
A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10. The region of rejection would consist of a range of numbers located on the right side of the sampling distribution; that is, a set of numbers greater than 10.
A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution, is called a two-tailed test. For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range of numbers located on both sides of the sampling distribution; that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.
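The choice of tails changes the P-value obtained from the same test statistic; a sketch using the normal distribution (z = 1.8 is an illustrative value):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = 1.8                               # illustrative test statistic
p_one_tailed = 1 - phi(z)             # H1: mean greater than mu0
p_two_tailed = 2 * (1 - phi(abs(z)))  # H1: mean not equal to mu0
print(p_one_tailed, p_two_tailed)
```

At α = 0.05 the same statistic rejects under the one-tailed alternative (p ≈ 0.036) but fails to reject under the two-tailed alternative (p ≈ 0.072), which is why the alternative hypothesis must be fixed before looking at the data.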
References
Damodar N. Gujarati (2004), Basic Econometrics.
Michael Ugota Awoke (2001), Econometrics: Theory and Application.
International Encyclopedia of the Social Sciences (2008).
McDonald, John F., and Moffitt, Robert A. (1980), "The Uses of Tobit Analysis", The Review of Economics and Statistics (The MIT Press) 62(2): 318–321.