Wittman 100c UCSC
Applied Economics Laboratory and Research Seminar: Section 3
wittman@cats.ucsc.edu

III. BASKETBALL PLAYER SALARIES

Introduction
Choice of variables
Regressions
    Linear specification
    Multiplicative Specification
    An alternate specification
Other ways of detecting discrimination
    Do whites play longer in general than their skills would suggest?
    Do cities with a higher percentage of whites play whites a higher percentage of the time?
    Do teams with more white players do worse than teams with more black players?
    Discrimination among fans and sportswriters
Opportunistic Empiricism
    White Men Can't Jump
    Predicting draft number
    How skilled are basketball scouts?
Data Files

A. Introduction

Sports statistics create a great opportunity to measure the relationship between productivity and income. The data is much more detailed than that typically available to economists. The basketball data set collected by Kahn and Sherer is very rich and allows us to test a number of hypotheses. Suppose that we want to find out the role of race in determining salaries. A simple-minded way of doing this is to run the following regression:

    ls SAL c RACE

where RACE is 1 if white; 0 otherwise. The results suggest that there is no discrimination against black basketball players since the coefficient of RACE is negative, implying that whites make less than blacks (please note that I sometimes use black and white as shorthand for the preferred African-American and European-American). While simple income comparisons (between ethnic backgrounds or genders) are commonly made, they are methodologically wrong, since one needs to control for productivity. In this case, productivity means how many baskets and rebounds each player makes. The work by Kahn and Sherer provides guidelines on proper econometric methodology.

B. Choice of variables

The Kahn and Sherer article, like most of the articles chosen for study in this course, is an exemplary model of research.
Its results are convincing for a variety of reasons:

(1) There is not one, but several related studies employing different data, all of which confirm in different ways the basic ideas.
(2) The authors undertook various formulations of the econometric model and the effect of RACE was robust to the alternative formulations.
(3) The authors have chosen a good data set -- the performance variables are relatively close to the ideal.
(4) The authors are aware of the possible biases inherent in the data and account for them.

The purpose of this course is to get you to think for yourself and develop critical understanding. You will not just replicate someone else's work (including mine). In this spirit, one should always critically assess others' work and try to improve on it. With regard to Kahn and Sherer's study, I believe that there is room for improvement in their choice of variables.

In choosing variables one should think carefully. One does not just throw in variables which seem to make sense. One chooses the formulation that makes the most sense. Furthermore one needs to carefully consider the data. I start with the last point first.

In this study income is a function of performance. If we do not include bonuses for playoff games, then income does not depend on this year's performance but rather on previous years' performance. That is, salary contracts are made before the start of the season and depend on previous years' performance, with the preceding year's performance being most influential (unless there was a multi-year contract). Ideally we would have salary as a function of lagged performance. In this data set, we are given the total points over all seasons. Thus this data set implicitly assumes that performance is the same each year. Such an assumption is incorrect. But that is what we have to work with.

In this study Kahn and Sherer use logs so that, in the original formulation, the variables are multiplied.
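To make the remark about logs concrete, a log-linear regression with two generic regressors can be written as below. The symbols X_1 and X_2, the coefficients and the error term u are placeholders introduced here for illustration; they are not the authors' notation.

```latex
\ln \mathrm{SAL} = \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2 + u
\quad\Longleftrightarrow\quad
\mathrm{SAL} = e^{\beta_0}\, X_1^{\beta_1}\, X_2^{\beta_2}\, e^{u}
```

Being additive in the logs is therefore the same as being multiplicative in the levels, and each slope coefficient can be read as an elasticity of salary with respect to its regressor.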
Suppose that one thought that salary (SAL) should be a function of total offensive rebounds (OFFREB) in a year. Then one might want to have either OFFREB per year as a summary or break it down into its constituent parts: OFFREB PER MINUTE * AVERAGE MINUTES PER GAME PLAYED * GAMES PER YEAR. The authors have these last two variables, denoted by MINS and GAMES respectively, but they have OFFREB per game, not per minute. Given MINS and GAMES, it makes more sense to have offensive rebounds per minute than per game. Also note that POINTS is career points scored. It should be in the same units as OFFREB (either per game, as the authors did, or per minute, as I have suggested).

I believe that the interesting variable is average minutes played per year, MINPYEAR, rather than its constituent parts, GAMES * MINS. Therefore MINPYEAR should be substituted, since the constituent parts give no clue as to worth, and we should save on degrees of freedom when there is no cost in doing so. Also, I think that the variables should be per minute rather than per game (with minutes rather than games as the measure of playing time), since a per game measure conflates productivity per minute with the number of minutes per game, and the variable GAMES may not vary as much as minutes played per game.

Also the negatives are more meaningful per minute. Someone who plays only a few minutes per game will have fewer fouls per game than someone who plays a lot of minutes per game; a measurement of fouls per game would make it look like the more fouls, the higher the pay. We want to capture the negatives, and one of the negatives is missing shots. The authors use career field goal percentages (fraction made) but this is already embodied in total points. Again one might want to think of this as a formula. Instead of total field goal points, the authors should have used field goal points attempted per minute times field goal percentage.
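Written out, the two decompositions described above look as follows. The labels on the left-hand sides are shorthand introduced here, MINS and GAMES are used as defined in the text (average minutes per game and games per year), and FGPCT is taken to be the fraction of attempts made.

```latex
\text{OFFREB per year} = \text{OFFREB per minute} \times \text{MINS} \times \text{GAMES}

\text{field goal points per minute} = \text{field goal points attempted per minute} \times \text{FGPCT}
```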
But better yet, instead of having FTPCT and FGPCT the authors should have had FTMISSED and FGMISSED (free throws missed per minute and field goals missed per minute). Once again, the negatives are then in the same unit of account as the positives.

I am somewhat skeptical about the use of CENTER and FORWARD. If players in these positions are better, this should be captured in the other variables such as OFFREB or ASSISTS. To also include CENTER would then be double counting. I do not see CENTER and FORWARD as proxies for other unmeasured variables, but those who know more about basketball may disagree and want to include them.

While the authors do not use height, some students wanted to include height because taller players would be more productive, other things being equal. However, we already have these measures of productivity (for example, rebounds) and therefore one should not include height.

STEALS and BLOCKS are such rare events that I doubt that they would add to someone's salary. Now they might be a proxy for other skills, but the rarity of observations suggests little confidence in the coefficients. I might be inclined to drop them from the equation. (1)

I would also be inclined to drop DRAFTNO since most of the other variables should be a good predictor of the number. If I were to keep it, it would be as a residual from the predicted DRAFTNO when the independent variables are the above productivity numbers (see section B2).

There are two kinds of approaches to econometrics: throwing everything into the soup (hoping that the econometrics will clarify the relationships) and carefully choosing the key ingredients (so that we know what we are eating intellectually). I prefer the latter approach. Hence I do not want both rebounds and height in my equations.

The authors also use several variables concerning the characteristics of the local area, including RACEMSA, POPSMA, INCSMA. I am skeptical that these variables would be relevant.
My skepticism does depend on how I characterize the market for basketball players. I believe that players are in competition with one another. To illustrate, suppose that there are two black players of equal skill and one player plays in a heavily white city and the other in a heavily black city and fans are prejudiced in favor of their own race. The team owner in the heavily black city will not pay more for the black player since he could get the other black player from the white city for less. Hence racial bias will not appear as variations in black pay across cities.

Now theory is a good guide to setting up equations and choosing variables, but ultimately theory needs to be confronted with data. These variables could be left in and we could let the data show whether Wittman is right. My own taste is to not do this regarding the variables under discussion. In general, I like to limit the number of questionable variables thrown into the equation. If it is a central issue, then I will keep such variables in, even if questionable, since that is the question. Here I feel that these other variables are not as central to the question I am trying to answer (is there discrimination, not whether fans are the source of discrimination) and I will choose to not include them.

HOMEATT is also a questionable variable. If the players draw the crowds because of their personalities or whatever beyond the wins implied by WINPCT or scoring, then it may be OK. But it may have nothing to do with the present players, or it may already be embodied in the other variables, and would therefore be useless. I would be inclined not to use it.

In a nutshell, SEASONS, GAMES, CENTER, FORWARD, FTPCT and FGPCT would be dropped, FGMISSED and FTMISSED would be added, and all measures of productivity would be per minute. I would also drop RACEMSA, POPSMA and INCSMA.

C. Regressions

1. Linear specification

Using the variables we have identified in the last section (and dropping those that I found objectionable), a priori (before looking at the data) my choice of independent variables is:

    POINTPM = (2*TLFGM + TRIPTM + TLFTM)/TLMINS

(It is useful to consider the equation for POINTPM in greater detail. Total field goals made (TLFGM) includes 2 pointers and 3 pointers while free throws are worth 1 point. Therefore a triple pointer gets 2 points for being a field goal plus 1 point for being a triple pointer, which adds up to 3.)

    OFFREBPM = OFFREB / TLMINS
    DEFREBPM = DEFREB / TLMINS
    ASSISTPM = ASSISTS / TLMINS
    PFOULSM = PFOULS / TLMINS
    MISFGPM = (TLFGA - TLFGM) / TLMINS
    MISFTPM = (TLFTA - TLFTM) / TLMINS

Note: All these variables can be generated by putting "genr" before the equation. Since SEASONS has a zero in it, we first must change the smpl to exclude that observation. Luckily, there were no values of TLMINS that were zero:

    smpl if SEASONS > 0
    genr MINPS = TLMINS/SEASONS
    ls SAL c POINTPM OFFREBPM DEFREBPM ASSISTPM PFOULSM MISFGPM MISFTPM MINPS RACE

The regression results are very encouraging concerning the quality of the model.

LS // Dependent Variable is SAL
Date: 4/27/94 / Time: 2:35
SMPL range: 1 - 235
SMPL condition: SEASONS > 0
Number of observations: 234

VARIABLE     COEFFICIENT    STD. ERROR     T-STAT.       2-TAIL SIG.
C            -620775.42     163593.67      -3.7946175    0.000
POINTPM      1332533.7      276639.32       4.8168631    0.000
OFFREBPM     1443412.4      1038525.7       1.3898668    0.165
DEFREBPM     2167483.6      473957.53       4.5731600    0.000
ASSISTPM     1258202.1      426914.68       2.9471979    0.003
PFOULSM      -1711873.7     659265.93      -2.5966360    0.009
MISFGPM      -430593.25     531381.41      -0.8103280    0.418
MISFTPM      187252.73      1552241.7       0.1206337    0.904
MINPS        112.57455      36.068379       3.1211424    0.002
RACE         108078.20      39311.358       2.7492869    0.006

R-squared             0.559371     Mean of dependent      407236.6
Adjusted R-squared    0.541667     S.D. of dependent      351579.2
S.E. of regression    238020.1     Sum of squared resid   1.27E+13
Durbin-Watson stat    1.829886     F-statistic            31.59603
Log likelihood        -3223.867

The R-square of .56 is very high for a cross section, especially considering the fact that the independent variables are not the same type of thing as the dependent variable. If one ran consumption against income, both are in dollars and consumption is a large part of income, so a high R-square would not be surprising. In time series, money might be regressed against money lagged. Again a high R-square would not be surprising. But here the high results are not guaranteed by the formulation of the data. The F-statistic, 31.6, is large and significant. More importantly, almost all of the coefficients have the correct sign, giving us considerable confidence in the results. The more points per minute, offensive rebounds per minute, defensive rebounds per minute, assists per minute and minutes played, the higher the salary; the more fouls per minute and missed field goals per minute, the lower the salary. The only wrong sign is associated with missed free throws. It should be negative, but it is positive although not at all significant (0.90 probability).

According to these results, being white is worth an extra $108,078 a year. The result is very significant (0.003 as a one tail test). Also according to the results, an extra point per minute is worth $1,332,533 (remember this is based on data for 1985-86, when salaries were considerably lower).

While the regression results are very supportive, one multiple regression is not conclusive. One should check whether the results are robust to alternative formulations, and other studies based on other data sets should be undertaken. I will now briefly discuss two alternative specifications based on the same data set.

2. Multiplicative Specification

In the regression just discussed, the independent variables had an additive effect.
I chose this because I felt that points and rebounds are additive in their effect on salary, not multiplicative (although minutes and points per minute are clearly multiplicative). Also a linear equation is easier to interpret. However, in many empirical studies it is common to assume a multiplicative effect between the independent variables (equivalently, that the variables are additive in their logs). Therefore, I took logs of all the variables considered in the previous multiple regression. Note that LWHITE = log(RACE + 1). This is because log(0) is undefined while log(1) = 0.

    ls LSAL c LPOINTPM LOFFREBPM LDEFREBPM LASSISTPM LPFOULSM LMISFGPM LMISFTPM LMINPS LWHITE

LS // Dependent Variable is LSAL
Date: 4/27/94 / Time: 2:35
SMPL range: 1 - 235
SMPL condition: SEASONS > 0
Number of observations: 234

VARIABLE     COEFFICIENT    STD. ERROR     T-STAT.       2-TAIL SIG.
C            10.553533      0.2910370      36.261831     0.000
LPOINTPM     2.5613401      0.4921478       5.2044122    0.000
LOFFREBPM    0.6260268      1.8475615       0.3388395    0.735
LDEFREBPM    3.5845057      0.8431815       4.2511673    0.000
LASSISTPM    0.5036242      0.7594912       0.6631074    0.507
LPFOULSM     -1.9291459     1.1728495      -1.6448367    0.100
LMISFGPM     -1.4192388     0.9453399      -1.5013000    0.133
LMISFTPM     -1.9459130     2.7614742      -0.7046646    0.481
LMINPS       0.0005184      0.0000642       8.0789216    0.000
LWHITE       0.2844901      0.1008961       2.8196356    0.005

R-squared             0.697809     Mean of dependent      12.63027
Adjusted R-squared    0.685667     S.D. of dependent      0.755267
S.E. of regression    0.423443     Sum of squared resid   40.16415
Durbin-Watson stat    1.816247     F-statistic            57.47251
Log likelihood        -125.8371

The regression results are a bit different from our earlier formulation. In general the coefficients are smaller, and the standard errors higher. LPFOULSM is only significant at the 10% level. However, in some ways the model suggests a better fit: the intercept is positive, the R-square is 0.6978, and LMISFTPM is negative. In any event, it remains true that whites again make more than blacks. (2)

3. An alternate specification

One student suggested a totally different formulation. The measured variables may not capture the true productivity of a basketball player. Sports professionals may be able to better assess productivity than students doing a multiple regression. Therefore the student suggested an equation somewhat similar to the following:

    SAL = A + B (TEAMSAL - SAL) + C ALLPRO/SEASONS + D DRAFTNO + E RACE

Because SAL is both the dependent and independent variable in this equation we must group SAL on the left of the equation:

    SAL = [ A / (1+B) ] + [ B / (1+B) ] (TEAMSAL - SAL) + [ C / (1+B) ] ALLPRO/SEASONS + [ D / (1+B) ] DRAFTNO + [ E / (1+B) ] RACE

    genr ALLPROPS = ALLPRO/SEASONS
    genr TSAL = TEAMSAL - SAL
    ls SAL c TSAL ALLPROPS DRAFTNO RACE

LS // Dependent Variable is SAL
Date: 4/27/94 / Time: 2:36
SMPL range: 1 - 235
SMPL condition: SEASONS > 0
Number of observations: 234

VARIABLE     COEFFICIENT    STD. ERROR     T-STAT.       2-TAIL SIG.
C            330198.59      61389.895       5.3787124    0.000
TSAL         0.0219649      0.0135413       1.6220707    0.105
ALLPROPS     1270227.8      105624.42      12.025891     0.000
DRAFTNO      -3163.2962     647.31520      -4.8867942    0.000
RACE         14224.233      38777.972       0.3668122    0.714

R-squared             0.470153     Mean of dependent      407236.6
Adjusted R-squared    0.460898     S.D. of dependent      351579.2
S.E. of regression    258142.0     Sum of squared resid   1.53E+13
Durbin-Watson stat    1.721033     F-statistic            50.80000
Log likelihood        -3245.441

The coefficient of DRAFTNO should be negative since a higher DRAFTNO means a later pick. SAL is subtracted from TEAMSAL so SAL is not partially regressed against itself. Note that the sign of E depends on the racism of sportswriters and basketball scouts relative to the racism occurring in salaries. For example, suppose that sportswriters tended to choose whites for ALLPRO and that they overrated whites more than owners of teams overpaid whites. Then the coefficient of RACE would be negative since payment to whites would be less than thought justified by sportswriters (even though owners tended to slightly overpay white players).
Still my a priori is that the coefficient of RACE will be positive. As can be seen, the coefficients are in the predicted direction, but the coefficient of RACE is insignificant (.357 as a one tail test). Once again the R square is quite high and the equation as a whole is very significant.

Note that before I ran the regression, I decided not to include ALLSTAR. This is because I felt that ALLSTAR and ALLPRO would be highly correlated, creating multicollinearity problems. The regression results, LS ALLPRO C ALLSTAR, suggest that I was right to be concerned.

One also needs to be aware of the potential biases that might arise when variables are only imperfect proxies. Consider the variable SAL -- 1985-1986 pro compensation. As the authors note, SAL does not include non-salary compensation such as bonuses. So what we might think of as yearly income may not be the same as the actual variable chosen. Suppose that SAL underestimates yearly income, that is, E[u] < 0. Then our assumptions justifying the use of least squares are violated and our least squares estimate of the intercept term is biased downwards from the true intercept. Suppose that the non-measured salary is likely to be greater for whites (which the authors argue is the case, but their argument is not that compelling; there is also little reason to believe that the reverse is true). Then the least squares assumption regarding independence between the error term and the variable RACE does not hold, and the least squares estimate of the coefficient on RACE (1 for white) is downward biased from the true relationship. Now if bonuses are not correlated with RACE, then the estimated coefficient of RACE is not biased but its variance is larger than otherwise.

D. Other ways of detecting discrimination

The following four tests would not only be interesting exercises, but also useful contributions to our knowledge. As far as I know, there is no published research on these particular questions.

1. Do whites play longer in general than their skills would suggest?

If this were the case, minutes played per season would be greater for whites. This could be examined by looking at:

    ls MINPS c POINTPM OFFREBPM DEFREBPM ASSISTPM PFOULSM RACE

LS // Dependent Variable is MINPS
Date: 07/31/96 Time: 15:33
Sample: 1 235
Included observations: 234
Excluded observations: 1

Variable     Coefficient    Std. Error     T-Statistic   Prob.
C            588.1130       282.4876        2.081907     0.0385
POINTPM      2524.891       311.3882        8.108500     0.0000
OFFREBPM     1461.030       1955.084        0.747298     0.4557
DEFREBPM     5306.779       858.6971        6.180036     0.0000
ASSISTPM     3257.324       808.7326        4.027690     0.0001
PFOULSM      -8804.913      1102.966       -7.982939     0.0000
RACE         -99.46619      76.36512       -1.302508     0.1941

R-squared             0.580260     Mean dependent var      1740.593
Adjusted R-squared    0.569166     S.D. dependent var      724.5671
S.E. of regression    475.5911     Akaike info criterion   12.35857
Sum squared resid     51344422     Schwarz criterion       12.46194
Log likelihood        -1770.985    F-statistic             52.30187
Durbin-Watson stat    1.343757     Prob(F-statistic)       0.000000

Note that I did not include MISFTPM since my earlier results suggested that this variable is unreliable.

2. Do cities with a higher percentage of whites play whites a higher percentage of the time?

This may have two components: more white players and playing them more often than justified. This is the key to discrimination in competitive markets -- segregation. One set of firms discriminates and the other non-discriminatory firms gain by reverse discrimination. The researcher needs to know economics in order to test for discrimination since salaries are part of labor markets. Also it is virtually impossible to test the a priori hypothesis of no discrimination (since statistical tests are designed to reject, not accept).

3. Do teams with more white players do worse than teams with more black players?

This could be fairly easily tested.

4. Discrimination among fans and sportswriters

A test for discrimination among fans or sportswriters would have ALLSTAR and ALLPRO as the dependent variables. (3)

E. Opportunistic Empiricism

As stated in earlier lectures, one purpose of this course is to make you into unrelenting empiricists so that whenever you hear a "factual" statement you ask the following: (1) how in principle the statement could be tested if any data were freely available and (2) how the statement can actually be tested given existing data.

1. White Men Can't Jump

To illustrate from my own personal experience, when I saw the movie "White Men Can't Jump," I immediately thought of some hypothetical tests. One could ask a random sample of black and white men (or black and white pro basketball players) to jump and record how high their feet got off the ground or how high their hands reached (controlling for the person's height). But more exciting from the viewpoint of today's lecture, we have data to indirectly test the hypothesis. (4) Consider the following data:

    genr REB = OFFREB + DEFREB
    genr REBMIN = REB/TLMINS

Note that REBMIN is only an imperfect measure of jumping ability since getting rebounds also depends on being in the right place at the right time. In econometrics we often have to make use of imperfect proxies. On the other hand, some might say that part of being a good jumper is being at the right place at the right time. RACE = 1 if white.
    genr HEIGHT = 12 * HEIGHTF + HEIGHTI

Height thus gives t[...]

Variable     Coefficient    Std. Error      T-Statistic   Prob.
RACE         [...]          [...]0069510    -2.7684331    0.0061
HEIGHT       0.0202774      0.0008790       23.068448     0.0000

R-squared             0.702210     Mean of dependent var   0.191381
Adjusted R-squared    0.699621     S.D. of dependent var   0.081876
S.E. of regression    0.044874     Sum of squared resid    0.463138
Log likelihood        394.1070     F-statistic             271.1784
Durbin-Watson stat    1.891042     Prob(F-statistic)       0.000000

Our a priori expectations are that the coefficient of RACE is negative and the coefficient of HEIGHT is positive. The results are very strong. Both coefficients have the right sign and are highly significant (0.003 and 0.0000, respectively). The R-square is 70%.

In the movie, the white player was not able to do a dunk shot but he was very good at shooting from a distance. Unfortunately, the data collected by Kahn and Sherer does not have statistics on dunk shots. However, other data may provide clues to jumping. Two point goals are shot close to the hoop, while 3 point goals and free throws are shot from farther away and are less likely to involve jumping. The next few regressions adjust the sample set to the following:

    smpl if SEASONS > 0 and TRIPTM > 0 and TLFTM > 0 and TLMINS > 0 and TLFGA > 0
    genr TLS1 = 2*(TLFGM - TRIPTM) / (3 * TRIPTM + TLFTM)

TLS1 is the ratio of points made from close up to points made at a distance. 2*(TLFGM - TRIPTM) assumes that the field goal measure includes 2 and 3 point shots.
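As a rough cross-check of the smpl and genr lines for TLS1, the same arithmetic can be sketched in Python. This is only an illustration under stated assumptions: the function names `in_sample` and `tls1` and the career totals in `player` are invented here and do not come from NBADATA.ASC.

```python
def in_sample(seasons, triptm, tlftm, tlmins, tlfga):
    """Mirror the smpl condition: every listed count must be strictly positive."""
    return all(x > 0 for x in (seasons, triptm, tlftm, tlmins, tlfga))


def tls1(tlfgm, triptm, tlftm):
    """Ratio of close-up points to distance points, following the genr line for TLS1."""
    close_points = 2 * (tlfgm - triptm)    # two-point field goals only
    distance_points = 3 * triptm + tlftm   # three-pointers plus free throws
    return close_points / distance_points


# Hypothetical career totals, made up for illustration only.
player = {"SEASONS": 4, "TLMINS": 6000, "TLFGA": 2500,
          "TLFGM": 1200, "TRIPTM": 50, "TLFTM": 450}

if in_sample(player["SEASONS"], player["TRIPTM"], player["TLFTM"],
             player["TLMINS"], player["TLFGA"]):
    ratio = tls1(player["TLFGM"], player["TRIPTM"], player["TLFTM"])
    print(round(ratio, 3))
```

The sample filter is applied before the ratio is computed because it guarantees a nonzero denominator: TRIPTM and TLFTM are both required to be positive.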
    ls TLS1 c HEIGHT RACE

LS // Dependent Variable is TLS1
Date: 3-25-1994 / Time: 21:42
SMPL range: 2 - 235
SMPL condition: SEASONS > 0 AND TRIPTM > 0 AND TLFTM > 0 AND TLMINS > 0 AND TLFGA > 0

[...]

                                   Mean of dependent var   4.003704
Adjusted R-squared    -0.004663    S.D. of dependent var   1.476499
S.E. of regression    1.479937     Sum of squared resid    350.4343
Log likelihood        -293.6690    F-statistic             0.624016
Durbin-Watson stat    1.966688     Prob(F-statistic)       0.537087

In this formulation the coefficient of HEIGHT should be positive and the coefficient of RACE should be negative. The results are only mildly confirming. The signs are in the correct direction, but the levels of significance are 0.134 and 0.305. The R-square is 0.008.

There is no one correct way of defining variables and setting up equations. I have combined several variables into one dependent variable measure (TLS1). The above equation looks for comparative advantage, not absolute advantage (a black player could be twice as good as a white player in two point field goals and three times as good in three pointers, and hence would look comparatively worse using the measure I have invented). Another possibility is to control for overall basketball ability, perhaps measured by minutes played in a season. One might then use one of the two following equations:

    genr TLS2 = TLMINS/SEASONS
    ls TLFGM c TLS2 HEIGHT RACE

For a copy of the printout see the full (paper) copy.

or,

    genr TLS3 = TLFGM/TLFGA

For a copy of the printout see the full (paper) copy.

On the other hand, the statement about white men not being able to jump may be a statement about basketball ability in general, measured on an absolute scale. In that case we would not want to control for ability in general since the statement would imply that blacks had a higher ability in general. Total points per minute might be regressed against RACE and height.
    genr POINTS = 2*(TLFGM - TRIPTM) + 3*TRIPTM + TLFTM
    genr POINTPM = POINTS/TLMINS
    ls POINTPM c RACE HEIGHT

For a copy of the printout see the full (paper) copy.

But here we know the answer already: since blacks make up 75% of the National Basketball Association players and only 11% of the population, blacks are on average better players than whites.

Which of these equations is best? Obviously, it depends on the question you are trying to ask. But one can also judge the question. The last equation is boring because we know the general answer already. Equation 1 answers the initial question most directly, but it is in the same spirit as equation 5. It is a judgment call, but my feeling is that equation 2 (where the dependent variable is TLS1) is best. It asks whether blacks play a different type of game than whites, not whether they are better. I think that this is a more interesting question with a more interesting answer since the answer is not so obvious. Equations 3 and 4 ask similar questions to 2, but not with such a direct and clear measure.

2. Predicting draft number

This data set also contains information about college performance. For example, CFGM stands for field goals made in college. One could predict draft number based on college performance (the better the college performance, the lower the draft number). Unfortunately, colleges play in different quality leagues so the numbers are not that meaningful (I do very well against my 8 year old). So if possible, one would want to have a proxy for quality of competition (FFOUR is a possibility).

3. How skilled are basketball scouts?

Even with the rudimentary skills taught in this course, I believe that students are capable of producing publishable research (in secondary journals) if they ask the right questions.
I know virtually nothing about statistical studies of sports, but I suspect that the following question has not been answered previously with econometric tools and, if cleverly done, might be publishable: What is the relation between draft choice and eventual performance? A rudimentary stab at this question might look at the following equation:

genr POINTPS = POINTS/SEASONS
genr REBOUNDS = (OFFREB + DEFREB)/SEASONS
genr ASSISTPS = ASSISTS/SEASONS
ls DRAFTNO c POINTPS REBOUNDS ASSISTPS

For a copy of the printout see the full (paper) copy.

A more sophisticated study and a better data set would account for the fact that some draft choices are no longer playing (a really bad choice if they were drafted recently). Alternatively, one might confine the study to the first 2 or 3 years after the draft. One should always be aware of missing data and how it might alter the observed empirical results.

Now that there are free agents, draft choice is not as important as in the past. One could test whether there is declining care in choice by seeing whether R-squared has declined over time.

I do not want to spend a great deal of time on this issue. I just wanted to suggest that there are lots of questions that can be answered with the data sets provided in this course.

Data Files

Data File: NBADATA.ASC
Source: Kahn, Lawrence M.; Sherer, Peter D., "Racial Differences in Professional Basketball Players' Compensation," Journal of Labor Economics v6, n1 (Jan. 1988):40-61.

Name      Format     Description
ABAGAMES  I3 (F3.0)  number of ABA games
ALLPRO    I2 (F2.0)  number of times all league 1st or 2nd team
ALLSTAR   I2 (F2.0)  number of times named to all-star team
ASSISTS   I4 (F4.0)  total pro assists
BLOCKS    I4 (F4.0)  total pro shots blocked
BYEAR     I2 (F2.0)  birth year (e.g. 55=1955)
CAWARDS   I1 (F1.0)  total college player of the year awards plus times named to first or second All-America Team
CFGA      I4 (F4.0)  total college field goals attempted
CFGM      I4 (F4.0)  total college field goals made
CFTA      I3 (F3.0)  total college free throws attempted
CFTM      I3 (F3.0)  total college free throws made
CGAMES    I3 (F3.0)  total college games
CHAMP     I2 (F2.0)  number of pro championship teams played on
CMINS     I4 (F4.0)  total college minutes
CONF      I2 (F2.0)  field not used
CREB      I4 (F4.0)  total college rebounds
CSEA      I1 (F1.0)  total college seasons
CTRPA     I3 (F3.0)  total college three point goals attempted
CTRPM     I2 (F2.0)  total college three point goals made
DEFREB    I5 (F5.0)  total pro defensive rebounds
DISQUAL   I2 (F2.0)  number of times disqualified
DRAFTNO   I3 (F3.0)  college draft number
EARLY     I1 (F1.0)  dummy variable for leaving college early
FFOUR     I1 (F1.0)  number of trips to final four (college)
GPLAY     I3 (F3.0)  number of pro playoff games played
HEIGHTI   I2 (F2.0)  inches to be added onto HEIGHTF
HEIGHTF   I1 (F1.0)  height in feet, e.g. 6 or 7
NOTCOL    I1 (F1.0)  dummy variable for not attending college
OFFREB    I4 (F4.0)  total pro offensive rebounds
PFOULS    I4 (F4.0)  total pro fouls committed
PLAYID    I3 (F3.0)  player ID number
POSITION  I1 (F1.0)  position (1 or 5 = center; 2, 4 or 7 = forward; 3 or 6 = guard)
PRODEF    I2 (F2.0)  number of times 1st or 2nd all-defensive team
RACE      I1 (F1.0)  race, 1 = white, 0 = black
SAL       I7 (F7.0)  1985-6 pro compensation
SEASONS   I2 (F2.0)  total pro seasons
STEALS    I4 (F4.0)  total pro steals
TEAM      I2 (F2.0)  NBA team (in alphabetical order: e.g. 1 = Atlanta, 2 = Boston, etc.)
TEAMCH    I2 (F2.0)  number of pro team changes
TLFGA     I5 (F5.0)  total pro field goals attempted
TLFGM     I5 (F5.0)  total pro field goals made
TLFTA     I5 (F5.0)  total pro free throws attempted
TLFTM     I5 (F5.0)  total pro free throws made
TLGAMES   I4 (F4.0)  total pro (NBA or ABA) games played
TLMINS    I5 (F5.0)  total pro minutes played
TRIPTA    I3 (F3.0)  total pro three point goals attempted
TRIPTM    I3 (F3.0)  total pro three point goals made
WEIGHT    I3 (F3.0)  weight in pounds
YPLAY     I2 (F2.0)  number of years in the pro playoffs

The following variables refer to the player's 1985-86 team:

Name      Format     Description
ARENA     I5 (F5.0)  arena capacity
COL83     F7.4       1983 SMSA cost of living index
HOMEAT    I6 (F6.0)  previous season's home attendance
INCOME    I5 (F5.0)  1983 SMSA per capita income in dollars
MAX       I4 (F4.2)  maximum ticket price in dollars
MIN       I4 (F4.2)  minimum ticket price in dollars
POPCIT    I5 (F5.1)  1980 city population (divided by 10000)
POPCMA    I5 (F5.1)  1980 Consolidated Metropolitan Area population (divided by 10000)
POPMSA    I5 (F5.1)  1980 Standard Metropolitan Statistical Area population (divided by 10000)
RACECIT   I3 (F3.1)  percent of 1980 population in the city that was black
RACECMA   I3 (F3.1)  percent of 1980 population in the Consolidated Metropolitan Area that was black
RACEMSA   I3 (F3.1)  percent of 1980 population in the Standard Metropolitan Statistical Area that was black
TEAMSAL   I8 (F8.0)  total team salary
TOTAT     I7 (F7.0)  previous season's total attendance (home plus away)
WINPCT    I3 (F3.3)  previous season's winning percentage

Notes:
(1) If a player only played a few minutes in a season, then our confidence in his output-per-minute variables would be reduced.
In such a situation, weighted least squares should be used.
(2) The R-squares of the two equations cannot be directly compared, since one measures the percentage of the variation in SAL that is explained and the other the percentage of the variation in LOG(SAL).

References:
Kahn, Lawrence M.; Sherer, Peter D., "Racial Differences in Professional Basketball Players' Compensation," Journal of Labor Economics v6, n1 (Jan. 1988):40-61. |
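Note (1) recommends weighted least squares when some players have very few minutes. The following is a minimal Python sketch of that idea for a single regressor. It assumes that the weight for each player is his total minutes played, which amounts to assuming that the variance of a per-minute measure falls in proportion to minutes; the function name `wls_line` and the five data points are hypothetical and not taken from NBADATA.ASC.

```python
# Sketch only: invented data; weights proportional to minutes are an assumption.

def wls_line(x, y, w):
    """Fit y = a + b*x by minimising sum(w * residual**2); returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical players: height in inches, points per minute, total minutes.
height = [75, 78, 80, 82, 84]
pointpm = [0.52, 0.47, 0.45, 0.90, 0.40]
minutes = [21000, 18000, 24000, 30, 15000]   # the fourth player barely played

weighted = wls_line(height, pointpm, minutes)
unweighted = wls_line(height, pointpm, [1] * len(height))
print("weighted (intercept, slope):  ", weighted)
print("unweighted (intercept, slope):", unweighted)
```

In this invented sample the fourth player's 30 minutes produce an extreme points-per-minute figure that pulls the unweighted slope upward, while weighting by minutes largely discounts him. Passing equal weights reduces the function to ordinary least squares.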
|