Weaknesses of the PISA survey identified in the volume edited by Stefan Thomas Hopmann, Gertrude Brinek, and Martin Retzl of the University of Vienna: PISA According to PISA, published by LIT Verlag GmbH & Co. KG, Vienna and Berlin, 2007


Scientific doubts about the validity of the PISA survey

For years, Gerald Bracey has published in the American journal Phi Delta Kappan a report on the state of public education in the United States, "The Condition of Public Education." The 18th report in the series, titled "Schools-Are-Awful Bloc Still Busy in 2008," appears in Phi Delta Kappan, vol. 90, no. 2, October 2008, pp. 103-114. In it, Bracey highlights various oddities of the school debate in the United States and exposes several bogus explanations of the state of American schools, among them those derived from the results achieved by American 15-year-olds in the PISA survey.

We reproduce here the chapter of Bracey's report that deals with PISA, because the author's argument is worth considering: it compels a scientific debate on the pertinence of the PISA data and on the validity of this large-scale comparative survey.

Bracey is known for the strong methodological reservations he has long raised about the PISA survey, as well as for contesting its results and denouncing the use made of them in the United States.

PISA LEANS, MAYBE FALLS

In the U.S., the Programme for International Student Assessment (PISA) has caused only momentary distress. Here, PISA, especially the 2006 administration, has been used as a general-purpose cudgel — especially since neither TIMSS, on which we rank higher, nor PIRLS, on which we rank high, can be used for that purpose. School critics present the ranks, nothing else, and only for OECD countries, although another 27 countries take part. The ranks supposedly prove that America can’t cut it in the global economy.

British economist S.J. Prais has a slightly different view:

"That the U.S., the world’s top economic performing country, was found to have schooling attainments that are only middling casts fundamental doubts on the value, and approach, of these surveys [e.g., PISA]. It could be that the hyper-involved statistical method of analysis used is, as many have suggested, wholly inappropriate. Or it could be, as two U.S. academics have suggested, that the level of schooling does not matter all that much for economic progress; rather it is ‘Adam Smithian’ factors such as economies of scale, and minimally regulated labor markets that allow U.S. ‘employers enormous agility in hiring, paying, and allocating workers." [1]

The Swiss-based Institute for Management Development and the World Economic Forum back Prais’ contentions about the economic performance of the U.S.

Prais’ comment comes in a chapter he wrote for PISA According to PISA, edited by Stefan Thomas Hopmann, Gertrude Brinek, and Martin Retzl, all of the University of Vienna. The book holds PISA up to its claims of reliability, validity, importance, etc. It might be, according to the editors, the first independent look at PISA, the first examination not done by PISA officials themselves. The study does not fare well.

In their introduction, Hopmann and Brinek paint a sad picture of how PISA operates. “What emerged [as we produced this book] was a picture not unlike that seen in the behavior of large companies when they encounter a potential scandal. . . . If some critique is voiced in public, the first response seems to be silence. Numerous PISA officials were invited to contribute to the book and all declined, one saying one doesn’t want to provide ‘a forum for unproven allegations.’”

“If that is not enough, the next step is often to raise doubts about the motives and the abilities of those who are critical of the enterprise,” write Hopmann and Brinek. “The next step is to acknowledge some problems, but to insist that they are very limited in nature and scope, not affecting the overall picture. . . . Finally, there is the statement that the criticism does not contain anything new, and nothing that has not been dealt with in the PISA research itself — and often this claim is accompanied by references to opaque technical reports that only insiders can understand, or to unpublished papers or reports.” [2]

In Europe, PISA has had far more impact on discussions of curriculum, structure, and instruction than here.

What is wrong with PISA? Lots.

Let’s start with the items, at least the ones we know about — PISA officials have exhibited extraordinary secrecy about the whole project. Peter Fensham, an Australian science educator involved in both PISA and TIMSS, deplored the secrecy: “By their decision to maintain most items in a test secret. . . TIMSS and PISA deny to curriculum authorities and to teachers the most immediate feedback the project could make, namely the release in detail of the items, that would indicate better than framework statements, what is meant by ‘science learning.’ The released items are tantalizingly few and can easily be misinterpreted.” [3]

Svein Sjøberg of the University of Oslo raises some of the same issues. A released math item shows a man’s foot about twice its actual size, contains typos, and, in the end, is impossible to answer because the picture on which the item is based presents contradictory information. “Students who simply insert numbers in the formula without thinking will get it right. More critical students who start thinking will, however, be confused and get in trouble.” [4]

Sjøberg wonders about the translations. PISA starts with “authentic text,” meaning that it has to have been published in one of the 60 countries involved. Well, he says, it might be authentic in the country of origin, but he is highly suspicious of what happens when it gets translated. He presents a science example about cloning from a newspaper article, “A Copying Machine for Living Beings?” about Dolly, the cloned sheep. The Norwegian version, Sjøberg says, not only contains errors of fact but translated the headline word for word, rendering it into complete nonsense.

Not only do items begin with authentic text; Sjøberg quotes the PISA web site that the items must have no “cultural bias” and be “unanimously approved.” Sjøberg then lists alphabetically the first 13 countries taking part in PISA: Argentina, Australia, Austria, Azerbaijan, Belgium, Brazil, Bulgaria, Canada, Chile, Colombia, Croatia, the Czech Republic, and Denmark. “We can only imagine the deliberation towards unanimous acceptance of all items among 60 countries with the demands that there should be no cultural bias and that the context of no country should be favored.” Maybe you can imagine it, Svein; I can’t.

Marcus Puchhammer of the University of Applied Sciences in Vienna also has concerns about the language, but he expresses them quantitatively. [5] He shows that items in German are substantially longer than the same items in English (this would hold true in French as well). That should make life more difficult for German kids. And the German items are not only longer, they also contain less frequently used words. Puchhammer checks where words from the items rank among the 10,000 most frequent words in each language. Fifteen of 17 comparisons favor English. Four of the words or phrases, such as “clips,” “bar graph,” and “communicate,” don’t even appear in the German top 10,000. “Average” is ranked 388th in English, 3,259th in German. Puchhammer notes that German grammar is considered more complex than English and that the German habit of injecting subordinate clauses into the middle of sentences likely degrades readability.
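Puchhammer's comparison boils down to a rank lookup: for each word in an item, find its position in a frequency list for each language and compare. The Python sketch below reproduces that logic; only the ranks for "average" and the absence of "bar graph" and "communicate" from the German top 10,000 come from the text above, while the English ranks assigned to those two words are invented placeholders.

```python
# Sketch of a Puchhammer-style vocabulary comparison: where does each word
# from a test item rank among the 10,000 most frequent words of a language?
TOP_N = 10_000

# Rank in the frequency list, or None if outside the top 10,000.
# "average" uses the figures quoted in the text (388 vs. 3,259);
# the English ranks for the other two entries are hypothetical.
ranks = {
    "average":     {"en": 388,  "de": 3259},
    "bar graph":   {"en": 4500, "de": None},  # hypothetical English rank
    "communicate": {"en": 1800, "de": None},  # hypothetical English rank
}

for word, r in ranks.items():
    en, de = r["en"], r["de"]
    if de is None:
        print(f"{word!r}: English rank {en}; not in the German top {TOP_N:,}")
    else:
        easier = "English" if en < de else "German"
        print(f"{word!r}: English {en} vs. German {de} -> easier in {easier}")
```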

Joachim Wuttke of the Jülich Research Center in Munich takes on some of the technical problems. [6] PISA, says Wuttke, claims to measure the “outcomes of education systems in terms of student achievements.” But some of the participating countries have fewer than 60% of their 15-year-olds in school. Obviously, PISA can’t say anything about education outcomes in those countries. Although schools were supposed to exclude no more than 5% of the students from testing, the decision was left to “the professional opinion of the school principal, or by other qualified staff.” Wuttke contends this produces a “completely uncontrollable source of uncertainty.” Searching the technical report, Wuttke finds inconsistent means of excluding students: Denmark, Finland, Ireland, Poland, and Spain excluded students with dyslexia, Denmark excluded students with dyscalculia, and Luxembourg excluded recent immigrants.

There were other technical problems. For example, students in special schools for those with learning disabilities were given a one-hour test (the regular test took two hours) that contained easy items. In Austria, students in vocational schools were underrepresented, something that was not discovered until a new government, suspicious of the results, ordered an investigation.

Also, Wuttke contends, some countries do not have consistent databases, which led to 102.5% of 15-year-old Swedes being tested and 107.7% of Tuscans. And, as Prais points out in his essay, making 15-year-olds the unit of testing is itself a problem. Countries differ in the percent of 15-year-olds in a given grade. Some will be in a class mostly with 14-year-olds and, if they’ve been held back twice, 13-year-olds. If they’ve been skipped ahead, most of their peers will be 16-year-olds.
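Wuttke's coverage figures are simple ratios: the weighted number of students the sample claims to represent divided by the official count of 15-year-olds. A minimal sketch, with invented absolute counts chosen only to reproduce the percentages quoted above:

```python
def coverage(weighted_tested: float, official_population: float) -> float:
    """Share of the official 15-year-old population the tested sample represents."""
    return weighted_tested / official_population

# Hypothetical absolute counts that reproduce the quoted percentages:
print(f"Sweden:  {coverage(102_500, 100_000):.1%}")  # 102.5%
print(f"Tuscany: {coverage(107_700, 100_000):.1%}")  # 107.7%
# A value above 100% is logically impossible: it means the sampling frame
# and the population register come from inconsistent databases.
```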

Response rates of schools were supposed to be 85%. In the U.S., only 64.9% agreed to participate, most replacement schools declined, and the final rate was 68.1%. Wuttke observes that the U.S. contributes 25% of OECD’s budget.

Wuttke finds that “Only one-third of the items that had reached the field trial [stage] were finally used in the main test. Items that did not fit into the idea that competence can be measured in a culturally neutral way on a one-dimensional scale were simply eliminated. Field test results remain unpublished, although one could imagine an open-ended analysis providing valuable insight into the diversity of education outcomes.” [7]

PISA officials have often argued that students learn in school mostly in specific disciplines, yet the real world mixes science and mathematical problems with other considerations. Sjøberg observes that no paper-and-pencil test can mimic these kinds of interactions.

Wuttke contends that the statistical analyses used in PISA are also a problem. In particular, Wuttke argues that the one-parameter Rasch model of Item Response Theory is wholly inappropriate. So why is it used? Wuttke thinks it is used because it’s the only model that yields unambiguous rankings. A multidimensional model could result in one country being #1 on dimension one, another #1 on dimension two, and so on. If that happened, no nation could claim unambiguously that “We’re #1!”
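Why a one-dimensional model always yields a clean ranking, while a multidimensional one need not, is easy to see in a toy example. The sketch below states the one-parameter Rasch model and then contrasts a single ability scale with a two-dimensional one; all numbers are illustrative, not PISA data.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """One-parameter Rasch model: probability of a correct answer depends
    only on the gap between a single ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# One latent dimension (invented country means): a total order always exists.
unidimensional = {"A": 0.6, "B": 0.4, "C": 0.1}
print(sorted(unidimensional, key=unidimensional.get, reverse=True))
# -> ['A', 'B', 'C']: an unambiguous ranking.

# Two dimensions (invented): the orderings need not agree, so no single "#1".
multidimensional = {"A": (0.9, 0.2), "B": (0.3, 0.8), "C": (0.5, 0.5)}
for dim in (0, 1):
    order = sorted(multidimensional, key=lambda c: multidimensional[c][dim], reverse=True)
    print(f"ranking on dimension {dim + 1}: {order}")
# -> ['A', 'C', 'B'] on dimension 1, but ['B', 'C', 'A'] on dimension 2.
```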

Fatigue and test-taking tactics also seem to play a role. Dutch students try to answer every item, but they often guess toward the end of the test. Austrian and German students skip many questions from the beginning on, leaving them enough time to finish without speeding up. Greek students either get tired or don’t have an internal sense of time. They start off well in the first block of questions, but by the time they get to the fourth (final) block of items, their non-reached items and missing responses top 35%.

In some countries, students apparently don’t understand that there can be only one right answer, and up to 10% of the items generate multiple responses from test takers. As Wuttke says, deciding if all five choices are correct takes more time than finding one correct answer and moving on.

In his epilogue, Hopmann notes PISA’s underlying assumptions:

The assumption that what PISA measures is somehow important knowledge for the future. There is no research available which proves this assertion. . . .

The assumption that the economic future is dependent on the knowledge base monitored by PISA: [it] relies on strong and unproven arguments, which have no basis when, for instance, comparing success in PISA’s predecessors and later economic development.

The assumption that PISA measures what is learned in schools: this is not [even] PISA’s own starting point, which is not to use national curricula as a point of reference.

The assumption that PISA measures competitiveness of schooling (most of the variance in PISA is attributable to background factors).

The assumption that PISA thus measures. . . school structures, teacher quality, the curriculum, etc.

In short: PISA relies on strong assumptions based on weak data. [8]

[1] S.J. Prais, “England: Poor Survey Response and No Sampling of Teaching Groups,” in: Stefan Thomas Hopmann, Gertrude Brinek, and Martin Retzl, eds., PISA According to PISA (Vienna: LIT Verlag, 2007), pp. 139-58.

[2] Gertrude Brinek and Stefan Thomas Hopmann, “PISA According to PISA: Does PISA Keep What It Promises?” in: PISA According to PISA, pp. 9-20.

[3] Svein Sjøberg, “PISA and ‘Real Life Challenges’: Mission Impossible?” in: PISA According to PISA, pp. 203-24.

[4] Svein Sjøberg, “PISA and ‘Real Life Challenges’: Mission Impossible?” in: PISA According to PISA, p. 217.

[5] Marcus Puchhammer, “Language-Based Item Analysis: Problems in Intercultural Comparisons,” in: PISA According to PISA, pp. 127-39.

[6] Joachim Wuttke, “Uncertainties and Bias in PISA,” in: PISA According to PISA, pp. 241-64.

[7] Joachim Wuttke, “Uncertainties and Bias in PISA,” in: PISA According to PISA, p. 251.

[8] Stefan Thomas Hopmann, “Epilogue,” in: PISA According to PISA, pp. 390-91.