Notes on: Cooper, B (1998) 'Using Bernstein and Bourdieu to understand children's difficulties with "realistic" mathematics testing: an exploratory study'.  Qualitative Studies in Education, 11 (4): 511-32.

Dave Harris

There has been an increasing stress on teaching maths in supposedly realistic contexts, but this might contribute to the 'differential validity' of the test items.  The problem arises because children face different difficulties 'when negotiating the boundary between esoteric mathematical knowledge and their everyday knowledge' (511), the inverse of the usual problem about difficulties in applying mathematics.  Detailed responses from two children, who differ in both gender and social class, are discussed, and then the results are discussed in connection with Bernstein and Bourdieu on the effects of socioeconomic status.

The maths curriculum was reformed in the 60s and 70s in favour of practical problem solving and investigational approaches, which would be tested in assessed coursework.  The reforms were supposed to address real life applications of mathematics, so assessment moved away from mathematical criteria as such towards looking at 'problems with a range of possible answers'.  The boundary between school maths and everyday life [classification for Bernstein] 'was to be weakened'.  These tendencies have been downgraded since, and the proposals were greeted with different degrees of enthusiasm - in a minority of schools, all mathematics is taught through investigations.

New forms of testing were to be developed as well, following new policy committees that included policy makers and educators.  The result was 'an attempt to argue for "authentic assessment"' (513), at the same time as an attempt to move towards 'simple paper and pencil tests of limited educational objectives'.  The model that resulted from the Task Group on Assessment and Testing emphasized continuous assessment by teachers, combined with externally set standard assessment tasks (SATs) at the end of the four key stages.  External assessment has become privileged over continuous assessment, but SATs will also include some practical and investigative work.  The original purpose of the SAT was to moderate teacher assessment, but there was a move towards controlled assessment for all attainment targets - leading to a fear of excessive workload and a teacher boycott, and only a partial take up of the tests, in most cases with the results not reported to government.

The result took the form of paper and pencil tests involving 'contrived tasks', a compromise with investigative practice.  These run the risk of disadvantaging pupils attempting to relate maths to the real world.  There was a critique that performing the tests might involve actually avoiding drawing on everyday knowledge, especially at Key Stage 2.  What resulted was a separation of providing the solution and having to demonstrate the process of working subsequently, and then pencil and paper tests that simulated realistic settings and contexts - but this also threatens validity because it involves different capacities for children with different socioeconomic backgrounds.

Subsequent simplification of national tests also threatens both assessment and effective pedagogy.  The Dearing Report suggested that assessment be reorganized in terms of levels, descriptors and clusters which were subsequently incorporated into the National Curriculum.  This might prevent fragmented teaching, but once more it was a policy adopted without regard for any particular research on the effects of assessment tests.

Data were accumulated following work carried out in 1994 in one primary school.  Fifteen interviews with year six children followed their work on a series of national curriculum tests.  The school had boycotted the tests, so there were no rehearsals.  'Many problems with the tests and the associated marking schemes' emerged (514), especially in terms of threats to validity.  In particular, a valid test should have '"minimal construct-irrelevant variance"' (515) - [in other words, everyday knowledge should not unduly affect the validity of testing mathematical knowledge].

[The children are described.  The claim is that any differences 'might be systematically distributed across the socioeconomic structure'.  However, we must see the cases as illustrations rather than proof.  Bernstein and Bourdieu might be discussed in support]

[Details of the 1994 tests follow.  In one, charts are presented showing the colours of socks worn on a particular school day by girls and boys.  The data are presented as pie charts representing percentages, but there are more girls than boys.  The test is to agree or not with the statement that more girls wore patterned socks than boys did, and the trick is that even though they have the same percentage of the population, the population of girls is bigger].
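
The arithmetic behind the sock item can be sketched as follows. The notes give neither the group sizes nor the percentage shown on the pie charts, so the figures below are purely hypothetical, and the function name is mine rather than anything in the test. The only point is that an equal share of a larger group is a larger number of children.

```python
def count_from_share(group_size: int, percent: float) -> float:
    """Number of children that a pie-chart percentage represents."""
    return group_size * percent / 100


# Hypothetical figures: not taken from the test item or the paper.
girls, boys = 40, 20
patterned_percent = 25  # same slice on both pie charts

girls_patterned = count_from_share(girls, patterned_percent)
boys_patterned = count_from_share(boys, patterned_percent)

print(girls_patterned, boys_patterned)  # 10.0 5.0
```

With these assumed numbers the slices look identical, yet twice as many girls as boys wore patterned socks, which is the step the item expects children to make without drawing on what they know about socks.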

The children gave different reasons for arriving at the answer, even though they both gave the right answer.  The working class boy introduced all sorts of extraneous detail about socks and what girls like, 'inappropriate use' (517) of real world experience.  Bernstein's data [with Holland] are then reviewed, showing children items organized in terms of context independent variables [a list of animal products, vegetable products, cereals].  The test was to group together food items.  Class differences emerged because middle class children 'were more likely to use general principles of classification' [eg both made from milk], while working class children referred to their everyday life [eg that's what we have for Sunday dinner].  Bernstein links this with his general theory of pedagogic codes, differentiating the organization of knowledge by pedagogues in terms of 'particular values of classification and framing', while learners have different access to 'recognition and realization rules'.  The recognition rule helps people generate text by connecting together suitable realizations.  There are other implications in that realizations are made public [so they can be shared, widely or not].  Recognition might include recognizing power relations involved in pedagogy [not critically, but as a definition of what is required], although this still might not produce legitimate realizations, as with '"many children of the marginal classes"' [quoting Bernstein].

Bernstein went on to discuss the findings at greater length, arguing that this was not just a difference between abstract and concrete thinking, because there was a social basis.  Marginal class classifications have a direct relation to the local context and local experience as their material base, while legitimate classifications have a less direct relation: both relate to the same material base [local experience] but in different ways.  Middle class children have two principles of classification, arranged in a hierarchy [and can choose the appropriate one].  Working class children tend to take a more literal interpretation of the coding instructions, '"a non specialized recognition rule"', reflecting local contexts [which are generalized even to special cases], while middle class children realise that they are operating in a specialised context.  The test instructions, to choose any form of grouping that they want, depend on these realization rules.  That in turn depends on middle class children seeing '"the strong classification between home and school"', which in turn reflects the domination of official pedagogic practice and meanings over local ones.

[In another test in this paper, children were asked to sort objects according to different criteria - sorting rubbish in this case, with one example provided.  The middle class girl sorted objects according to whether they were three dimensional or two dimensional, containers or not.  The working class boy originally thought of a classification depending on whether objects were metal or glass on the one hand, and paper and card on the other, and justified his decision according to whether items could be crushed, or whether they could be left out for the dustman or had to be recycled, accompanied by an anecdote about a neighbour disposing of some rubbish].  The girl's response is like the typical middle class response discussed above, and she seems to have privileged the rule about dimensionality over her everyday experience [which she also refers to].  The working class boy's answers are still tied to the material base of his local knowledge, even though he begins with some general properties.  'Diane knew explicitly at a metacognitive level what she had needed to do - and what she needed to censor - in order to produce "legitimate text"' (520), which is a recognition of the power relations of school.

[In a third test, the children are shown a diagram of a tree, and told that it measures 21,500 millimetres in height.  The test involves translating that measure into a 'more appropriate' metric unit, apparently to test the standardized attainment of using units in context.] Diane knew the rule that there were 1000 millimetres in a metre, but was puzzled because she saw it as taking off three noughts and there were only two.  The teacher intervened to explain that there were 21 thousand millimetres, and then she got it.  She also commented that measuring a tree in millimetres would be unrealistic in real life but explained that she was just doing what she had been told in the tests rather than questioning them - that would result in being told off by a teacher.
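
The conversion the item asks for can be written out as a short sketch. The constant and function names are my own; the only figures taken from the notes are the 21,500 millimetres and the rule of 1000 millimetres to a metre.

```python
MM_PER_M = 1000  # the rule Diane already knew


def mm_to_m(millimetres: float) -> float:
    """Convert a length in millimetres to metres."""
    return millimetres / MM_PER_M


tree_height_m = mm_to_m(21_500)
print(tree_height_m)  # 21.5
```

Dividing by 1000 moves the decimal point three places, which is not quite the same as 'taking off three noughts': 21,500 ends in only two noughts, so the answer is 21.5 metres rather than a whole number, and that seems to be where Diane was puzzled.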

[In the fourth test, the diagram shows a bag containing pieces of card with people's names on them.  The test is to pair up boys and girls, three of each, with nine possibilities.  The test was provided officially only for the most able children. One example is given in the test rubric.] Here, Mike 'produces an initial false negative', as did several of the other children.  What was required was something more like Piaget's formal operations, but the physical analogue of drawing names out of the hat was much more limited - it implied that once drawn out, the names would not be put back to be recombined, and if they were physically put back to be redrawn at random, a number of repetitions was likely, so it would take a long time before the nine different possibilities were actually produced.  Diane solved the problem by taking each boy's name and combining it with different girls' names, even using ditto marks in place of the names, although she was not so good at explaining why nine combinations resulted.  Mike began by muttering about actually putting hands into bags, taking out names on top, then taking out names further down - this gave some of the possibilities but not all, and Mike saw the results as a matter of luck, depending on whether names are on top or not [just as in the everyday example mentioned when balls are drawn out of a hat to decide opponents in the FA cup].  Nevertheless he came up somehow with nine combinations.  He also said he was confused by the instruction to pair names, thinking that this would just produce one pair.  This shows that Mike also had a metacognitive awareness of different choices, that he had appropriate realization rules, but had not recognized the requirement for them in the context - while the interview subsequently made this clear, the test did not.
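
The contrast between the two readings of the item can be sketched in a few lines. The names are placeholders, since the notes do not give the names on the cards, and the second half is a simplified stand-in for the physical draw (boys taken in order, girls drawn without replacement), not a model of what Mike actually did.

```python
from itertools import product
import random

boys = ["Boy A", "Boy B", "Boy C"]      # placeholder names
girls = ["Girl A", "Girl B", "Girl C"]  # placeholder names

# Systematic listing, as in Diane's approach: each boy with each girl.
all_pairs = list(product(boys, girls))
print(len(all_pairs))  # 9

# Simplified physical draw: each girl's name comes out once and is not
# put back, so a single round can only ever yield three pairs.
rng = random.Random(0)
drawn_girls = rng.sample(girls, k=len(girls))
one_round = list(zip(boys, drawn_girls))
print(len(one_round))  # 3
```

Every pair produced by a single round is one of the nine, but no single round reaches all nine, which may be why the narrative context of names in a bag works against the combinatorial answer the marking scheme wants.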

With the other children, those who had not produced nine pairs initially were allowed to try again.  The success rate rose, so this cannot be 'a very reliable or valid test item', and the narrative context seems to have a substantial effect (525).

We can connect this to Bourdieu on the responses to works of art in Distinction, to see if adults have the same sort of differences in rules as children do.  Bourdieu also looks at the boundary between 'everyday and esoteric concerns and frames of reference', even though the data are old.  However, organizing forms may not be that easy to change.  Bourdieu uses the term habitus rather than rule, and habitus is rooted in socioeconomic and cultural experience, to produce durable dispositions, but also a certain vagueness and indeterminacy - it produces generative capacities [rather than fixed rules].

Distinction discusses differences between cognitive and aesthetic frameworks of members of different social classes, where the main differences are in terms of whether people respond to function or form, abstract or everyday, whether art is seen as autonomous or to be reduced to everyday life.  [The example is the classic one of reactions to a photograph of work-worn hands of old women - which happens to be my favourite example as well!].  We can use Bourdieu to understand the children's responses, whether they focus on form, or whether examples are related to everyday life. Overall, what seems to be important is how the contexts and items in tests are read, and that there are differences between the social classes here.

In technical terms, we can see these factors as affecting test validity. A model can be devised [528, see below] to address these issues as problems for tests, allowing especially for cultural background.  Detailed analysis will be required to establish the strength of the causal relations, and the remaining issue is whether competence in general is the same as mathematical competence specifically.  Certainly, competence does need to be differentiated, allowing for sociocultural factors, and context for practice might be particularly important.




Constructivism in particular tends to 'operate with the model of an acultural child' (528), assuming some shared understandings developing specifically in classrooms.  Bernstein and Bourdieu need to be examined instead.  Although social class does seem important, other factors might also have to be considered, including gender [and some work by Cooper and others is cited].  Current testing is obviously based on a set of assumptions about what should count as school maths and what should be tested, but there is a 'growing concern that performance assessment in particular might be associated with unfairness in respect to cultural background' (529).  This paper has not discussed 'consequential validity' [all these types of validity are based on the work of Messick 1989], but that issue raises further concern that items used to improve pedagogy can also 'lead to less fair assessment outcomes'.
