Assignment II for Biostats Course VHM 801 at AVC - Fall semester 2021

The assignment is worth 10% of the final course mark. Please be aware that by handing in the assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

A laboratory routinely carries out measurements of the fibre content (in percent) in soy-bean cakes (animal feed). Two laboratory technicians, one experienced and one relatively inexperienced in this particular type of analysis, have analysed (different) samples of a particular batch submitted for testing. Their measurements are given in the listing below (also available in Minitab format and as a comma-separated file, for import into Stata and other statistical software). The technicians are labelled 1 and 2, and it is not revealed which of them has more experience.

Laboratory technician 1:
12.28 12.53 12.25 12.37 12.48 12.58 12.43 12.30 12.46 12.43

Laboratory technician 2:
12.25 12.45 12.31 12.41 12.30 12.20 12.25 12.64 12.26 12.73 12.42 12.17 12.09 12.23
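
For readers not using Minitab or Stata, the listing above can also be typed in directly. The following is a minimal sketch in Python using only the standard library; the variable names `tech1` and `tech2` are chosen here for illustration and are not part of the assignment files. It prints the sample size, mean and sample standard deviation for each technician as a quick check that the data were entered correctly.

```python
from statistics import mean, stdev

# Measurements copied from the listing above (fibre content in percent).
tech1 = [12.28, 12.53, 12.25, 12.37, 12.48, 12.58, 12.43, 12.30, 12.46, 12.43]
tech2 = [12.25, 12.45, 12.31, 12.41, 12.30, 12.20, 12.25, 12.64, 12.26, 12.73,
         12.42, 12.17, 12.09, 12.23]

# Basic descriptive summary per technician; stdev() is the sample
# standard deviation (divisor n - 1).
for label, values in (("Technician 1", tech1), ("Technician 2", tech2)):
    print(f"{label}: n = {len(values)}, "
          f"mean = {mean(values):.3f}, SD = {stdev(values):.3f}")
```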

The assignment has five questions to be answered. Make sure to include in your answers information about your model assumptions whenever relevant.

  1. For each of the technicians, compute a 99% confidence interval for the fibre content in the batch, and interpret your interval(s) carefully. Include in your interpretation whether you consider the interval(s) as approximate or exact.

  2. The (mean) fibre content for the batch was stated by the producer to be 12.36%. Based on the results for each technician, give a statistical assessment of whether each technician's measurements seem to be in agreement with the producer's value. State your conclusion(s) carefully.

  3. There was concern that the less experienced technician could have a bias in the measurements, that is, a higher or lower measured fibre content (beyond chance variation) than a more experienced technician. Carry out a statistical test to determine whether there is evidence of such a bias in the data.

  4. Further investigation of the two technicians' measurements was based on the expected range (or interval) for a single measurement. For this purpose, we will assume the producer's stated mean content (of 12.36%) to be correct and, based on past experience, that laboratory measurements of this type are approximately normally distributed with a laboratory error (standard deviation) of 0.10%. Use these assumptions to compute a 95% range (interval) for any single measurement, valid for both technicians. That is, with the assumptions made about the distribution of measurements, there should be a 95% probability that any (new) measurement falls inside the interval.

    Next, determine for each technician the number of their actual measurements falling outside the expected interval. Finally, use these counts to test (separately for each technician) whether the measurements agree with the model/assumptions, in the sense that the proportion of measurements outside the interval agrees (beyond chance variation) with what would be expected from the model.
    (Hint: First determine the distribution of the number of measurements that fall outside the interval, if the stated model is correct. Then set up a statistical test (with suitable hypotheses) in that distribution, and compute a P-value using the general principles of statistical tests.)

  5. Summarise the results of your investigations in the questions above into a conclusion about the performance of each technician. In this part, also include any other features of the distributions of the measurements by the technicians that you find relevant. You may also try to guess which of the two technicians is the less experienced one (note: you are not required to guess correctly to receive a full mark for this question).
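
The mechanics behind question 4 can be sketched in a few lines of Python (standard library only). This is an illustration under the assumptions stated in that question (normal distribution, mean 12.36, standard deviation 0.10), not a prescribed solution. The helper names `count_outside`, `binom_pmf` and `binom_upper_tail` are hypothetical, the values `n_demo` and `k_demo` are placeholders that are not taken from the data, and the upper-tail probability shown is only one possible way to form a P-value; the choice of hypotheses and tail(s) is left to the reader.

```python
from math import comb
from statistics import NormalDist

# Assumed model from question 4: mean 12.36, standard deviation 0.10.
model = NormalDist(mu=12.36, sigma=0.10)

# Central 95% range: the 2.5% and 97.5% quantiles of the model.
lower = model.inv_cdf(0.025)
upper = model.inv_cdf(0.975)
print(f"95% range: {lower:.3f} to {upper:.3f}")


def count_outside(values, low, high):
    """Number of values strictly below low or strictly above high."""
    return sum(1 for v in values if v < low or v > high)


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


# Placeholder values for illustration only (NOT taken from the data):
# n_demo independent measurements, k_demo of them outside the range.
n_demo, k_demo = 20, 3
p_outside = 0.05  # probability of falling outside a 95% range
tail = binom_upper_tail(k_demo, n_demo, p_outside)
print(f"P(X >= {k_demo}) with n = {n_demo}: {tail:.4f}")
```

The binomial model for the count relies on the measurements being independent and each having the same 5% probability of falling outside the range, which is exactly what the stated model implies.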

Henrik Stryhn (hstryhn@upei.ca) 2021-10-17