[Sigia-l] Learnability and its impact on testing

Tania tania_peakusability at optusnet.com.au
Sun Sep 21 22:36:11 EDT 2003


As part of my Masters in Human Factors course, we have recently been 
having an online discussion about covariance and learnability, which I 
thought I would share with subscribers of this list.

 >>Lecturer wrote:
 >>"In a nutshell, covariance is variation in a sample that you can't get 
rid of using ordinary experimental controls, e.g. participant selection, 
task structure, task allocation. You can't get rid of it because it is an 
inherent part of the people involved in the experiment. For example, you 
can't control what people learn, their age, their gender, their experience.

 >>For example, you want to see whether there is a difference in user 
performance using one website prototype compared to another prototype 
for a specialist group like, say, geothermal physicists...

 >>You recruit as many of them as you can, you give them tasks on the 
first prototype design, and you measure variables such as accuracy, 
errors and satisfaction.  Then you want to give them the second design. 
BUT when you give them the second prototype (content in both is
exactly the same) they have learnt something from having been exposed 
to the first prototype, for example what the content is. So the 
second prototype will have an unfair advantage. Sadly you can't get rid 
of this learning, but you can estimate its effect, if any. This is where 
Analysis of Covariance (ANCOVA) comes in.  ANCOVA, an extension of 
Analysis of Variance (ANOVA), allows you to statistically partition out 
the effects of covariates so that you can see any differences associated 
with the experimental conditions 'as if' there was no learning, 
experience etc."
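
To make this concrete, here is a rough sketch (my own illustration, not 
part of the course material) of how such an ANCOVA might be run in 
Python with pandas and statsmodels. The data frame, column names and 
scores below are purely made up, with years of domain experience 
standing in as the covariate:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: one accuracy score per participant per prototype,
# plus a pre-measured covariate (e.g. years of domain experience).
data = pd.DataFrame({
    "prototype":  ["A"] * 6 + ["B"] * 6,
    "experience": [2, 5, 3, 7, 4, 6, 2, 5, 3, 7, 4, 6],
    "accuracy":   [55, 70, 60, 80, 65, 75, 60, 78, 66, 88, 70, 82],
})

# ANCOVA: model accuracy as a function of prototype (the experimental
# factor) while statistically controlling for the covariate.
model = smf.ols("accuracy ~ C(prototype) + experience", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # variance split between covariate and condition

The resulting table shows how much variance is attributable to the 
covariate versus the prototype condition, which is the "partitioning 
out" the lecturer describes.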

My response:
Having done a fair bit of prototype and information architecture 
testing, I have noticed that test participants can skew results even 
within a single test session because they have learnt something. I have 
also found that the rate of learning, and the resulting accuracy of the 
results, differs substantially from one test participant to another.  As 
such, I am not convinced that the learning variable can be easily 
quantified and the results adjusted accordingly.

For instance, I recently ran a series of tests with users to assess the 
ease of navigation of a website prototype (using the method outlined by 
Donna Maurer; see: 
http://www.boxesandarrows.com/archives/cardbased_classification_evaluation.php).  
We spent about 20 minutes with each test participant, gave them about 
20-30 short scenarios, and asked them where they would go in the site to 
find that content.  

As participants learnt the structure of the site, they increasingly 
remembered where they had seen a particular menu label and were 
therefore more likely to go to the right section the first time. Some 
participants had little memory of where they had seen things and, as 
such, their level of learning did not really impact the results.  However, I 
remember one very intelligent woman who, after about 5 minutes, had almost 
memorised the whole structure of the site and was pretty excited that 
she got almost 100% task completion after that point. This pretty much 
invalidated her entire test session, as we were not measuring how easy it 
was to find content but rather how good her memory was.  
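
One rough way to check for this within a session (sketched below with 
made-up numbers and column names) is to compare each participant's 
first-try success rate in the first half of the session against the 
second half; a big jump suggests memorisation rather than findability 
is driving the later scores:

import pandas as pd

# Made-up first-try results, one row per scenario per participant.
results = pd.DataFrame({
    "participant": [1] * 10 + [2] * 10,
    "trial": list(range(1, 11)) * 2,
    "correct_first_try": [0, 0, 1, 0, 1, 1, 1, 1, 1, 1,   # learns the structure quickly
                          0, 1, 0, 1, 0, 1, 1, 0, 1, 1],  # learns more slowly
})

# Compare success in the first half of the session with the second half.
results["half"] = results["trial"].apply(lambda t: "first" if t <= 5 else "second")
print(results.groupby(["participant", "half"])["correct_first_try"].mean())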

The point of all this ramble?  Having read the articles on covariance, see
http://web.uccs.edu/lbecker/Psy590/ancova2.htm
http://www.cogs.susx.ac.uk/users/andyf/teaching/rm2/ancova.pdf

I am still not convinced how to measure and take into account the 
learning variable, as it varies significantly between people.  Even 
if we could measure it, I don't see how we could adjust the results 
accordingly for this type of study, i.e. in the case of superwoman, after 
the first few scenarios, when she had memorised the site structure, we 
were effectively not measuring her task completion rate at all, as it was 
pretty much 100%.  Even if we somehow measured her learning ability and 
adjusted our results accordingly, I would still question the value of the 
final result.

In a perfect academic world, we would get around this by testing a 
smaller number of scenarios with a larger number of users, or by some 
other method. However, in a commercial world this is not feasible, and 
you often have to do the best you can with your limited time and budget.

Some of the ways we tried to improve the validity of our results were by:
1) putting suspected problem navigation areas first;
2) mixing the scenarios around a lot so there weren't a lot of scenarios 
for one area of the site grouped together;
3) mixing the order of scenarios presented to different test 
participants, so the same scenarios were not always at the end (a rough 
sketch of one way to do this follows this list);
4) noting and disregarding some of the results that were obviously 
skewed by the participant's memory (easily noted through comments such as 
"Now I saw that in the X section before didn't I, so I will try there 
first").

I would be interested in any other recommendations that are achievable 
in a commercial world, or in hearing about the experiences of others.

Regards
Tania Lang
Peak Usability
Brisbane, Australia




