Processing PSM data
Contact: Amnon Silverstein
ABSTRACT
Elizabeth Pirrotta from HP Barcelona used a pattern strength meter I built,
and she collected some data from ten subjects, for 14 patterns, three trials each.
At first, this data seems very noisy, but it cleans up well with a bit of processing.
Here are tif files. They are 256x256 samples at 600 dpi, in linear reflectance.
A value of 0 corresponds to light-trap black and 255 to paper-white.
Ten subjects adjusted a dial until a pattern was just-visible. They did this
three times each for 14 different patterns, for a total of 420 adjustments.
The scatter of the data looks very noisy.
The horizontal axis has the 14 patterns used in the study. The vertical axis
shows the 30 contrast settings for each pattern made by the ten different
subjects. They were instructed to adjust the contrast until the pattern was
just-visible.

Averaging within the subjects unclutters the graph, but the subjects still do
not seem to show agreement as to the absolute contrast each pattern should have
to be just-visible. The axes are the same as in the previous figure, but the
average of each subject's three settings is shown for each pattern.
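
The within-subject averaging described above can be sketched as follows. This is a minimal illustration, not the original MATLAB code: the function name `average_trials`, the nested-list layout, and the toy numbers are all assumptions made for the example and are not the study's data.

```python
from statistics import mean

def average_trials(settings):
    """Average the repeated trials for every subject and pattern.

    settings[subject][pattern] is a list of contrast settings (one per trial).
    Returns a table where averaged[subject][pattern] is a single number.
    """
    return [[mean(trials) for trials in subject] for subject in settings]

# Hypothetical toy data: 2 subjects, 2 patterns, 3 trials each.
toy_settings = [
    [[0.30, 0.34, 0.32], [0.50, 0.52, 0.48]],
    [[0.24, 0.26, 0.25], [0.40, 0.38, 0.42]],
]
averaged = average_trials(toy_settings)
```

For the real data set the same function would reduce the 10 x 14 x 3 table of settings to a 10 x 14 table.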

Much of the disagreement is due to the subjects' criteria. Each subject
has a slightly different criterion for just-visible, and this makes it hard
to see that the relative visibility of the patterns is in good agreement.
To correct for this, each subject is compared to the average of all subjects, and
a single factor that normalizes that subject's criterion to the average is computed.
For the ten subjects, the normalization factors were:
1.1090
1.0481
0.9695
1.0546
0.9358
1.0032
1.1783
0.7166
1.1022
0.8828
The first subject (factor 1.1090) had a criterion for just-visible that required
111% of the contrast of the average subject's just-visible criterion. The most
stringent subject (factor 0.7166) wanted the pattern to have 72% of the average
subject's just-visible contrast.
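
The note does not say exactly how the single factor per subject was computed. One simple possibility, shown here purely as an assumption, is the ratio of a subject's mean setting (over all patterns) to the mean over all subjects. The function name `normalization_factors` and the toy numbers are hypothetical.

```python
from statistics import mean

def normalization_factors(averaged):
    """Return one criterion factor per subject.

    averaged[subject][pattern] holds trial-averaged contrast settings.
    Assumption: factor = subject's mean setting / mean setting of all subjects,
    so a factor of 1.10 means the subject needed 110% of the average contrast.
    """
    subject_means = [mean(row) for row in averaged]
    grand_mean = mean(subject_means)
    return [m / grand_mean for m in subject_means]

# Hypothetical toy data: 2 subjects, 2 patterns.
toy_averaged = [
    [0.33, 0.55],
    [0.27, 0.45],
]
factors = normalization_factors(toy_averaged)  # approximately [1.1, 0.9]
```

With this definition the factors always average to 1. Other choices, such as a least-squares fit or a ratio of geometric means, would give somewhat different values.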

Scaling all the measurements from each subject by that subject's factor reveals
the true agreement between the subjects. This agreement is quite good. The
standard deviation across subjects is +/- 8% contrast for each measurement.
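
A sketch of the scaling step and of one way to measure the remaining spread is given below. Two things are assumptions here: that a factor above 1 is removed by dividing the subject's settings by it, and that the spread is expressed as the standard deviation across subjects relative to the mean for each pattern. The function names and toy numbers are hypothetical.

```python
from statistics import mean, stdev

def normalize(averaged, factors):
    """Divide each subject's settings by that subject's criterion factor."""
    return [[value / f for value in row] for row, f in zip(averaged, factors)]

def relative_spread(normalized):
    """For each pattern, return (std dev across subjects) / (mean across subjects)."""
    per_pattern = zip(*normalized)  # regroup the table by pattern
    return [stdev(column) / mean(column) for column in per_pattern]

# Hypothetical toy data: two subjects who differ only in criterion.
toy_averaged = [
    [0.33, 0.55],
    [0.27, 0.45],
]
toy_factors = [1.1, 0.9]

normalized = normalize(toy_averaged, toy_factors)
spread = relative_spread(normalized)  # close to 0 for both patterns
```

In the toy data the two subjects agree perfectly once the criterion difference is removed, so the spread is essentially zero. Real data would leave a residual spread.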

The above plot shows how many times above threshold each pattern was.
This is simply the reciprocal of the previous data. For example, if a pattern
could be seen at 0.25 times its original contrast, it had 4 times the threshold
contrast. As can be seen, the patterns were not very far above threshold
contrast.
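
The reciprocal relationship can be written as a one-line helper. The name `times_threshold` is hypothetical. The setting is taken to be the fraction of the original contrast at which the pattern was just-visible.

```python
def times_threshold(setting):
    """Convert a just-visible contrast setting into a times-threshold value.

    setting is the fraction of the pattern's original contrast at which it was
    just-visible, so it must be greater than 0.
    """
    if setting <= 0:
        raise ValueError("setting must be positive")
    return 1.0 / setting

example = times_threshold(0.25)  # 4.0, matching the example in the text
```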
Files to transfer (including the data in matlab and
Lotus 1-2-3 formats, the matlab code to generate the figures, and eps figures)
PSM data troubleshooting guide
- Problems with subjects
- Instructions
Were the instructions clear? Did the subjects sometimes set it so the pattern was completely
undetectable, and thus only set a lower-bound on threshold instead of a threshold?
- Criteria
Were the subjects encouraged to take too severe a criterion for threshold? If the criterion was
too severe, they may have been uncertain of the stimulus strength.
- Were the subjects a homogeneous group?
Were some subjects corrected differently than others (some were wearing glasses and some were not)?
Were some subjects not sufficiently corrected? Did some subjects need bifocal correction?
- Instrumentation problems
- Was the lighting consistent?
- Were the viewing angles and head positions consistent?
- Were the same regions of each sample used in each trial?
- Was there any problem with a parameter that might be sensitive to slight positional differences
(such as specular reflection)?
- Were the samples different enough from each other? Were they too strong or too weak? The PSM cannot measure sub-threshold pattern strengths, and it has a small amount of leakage for very strong patterns even at a contrast setting of 0.
- Statistical stuff
- Normalize the subjects to the average subject by a scale factor to correct for each subject's criterion.
- If the subjects are normalized, and their three trials are averaged, how do they compare with each other?
- If the thresholds are converted into an ordering, how do the orderings compare across subjects?
- Troubleshooting ideas
- What samples had the least consistent settings? What is special about these samples?
- Plot error vs mean threshold. How are they related?
- Which subjects have the best agreement with each other?
- If subjects that had poor consistency are removed from the group, how does the cross-subject
consistency improve?
- Were subjects that had very severe criteria different from other subjects?
- If subjects are given stronger stimuli, does that improve consistency? Subjects may not have
been able to see the stimuli well enough to judge their strength, and thus had to choose too
strong a criterion.
- Construct simple stimuli that have known strengths, and see how subjects do with those. Bands with
different contrasts produced on a Fujix, for example.
- Maybe have a special sample to set the subject's criterion. Put the card in at a set level, and tell the
subject that this is what a threshold stimulus should look like. After every sample (or few samples),
show the subject the demo sample again.
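
For the question in the statistical list about how the orderings compare across subjects, one common choice is the Spearman rank correlation between two subjects' thresholds. The sketch below uses only the Python standard library. The helper names and toy numbers are hypothetical, and Spearman correlation is a suggested technique, not something the original analysis is known to have used.

```python
from statistics import mean

def ranks(values):
    """Rank values from 1 (smallest); ties share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical thresholds for 4 patterns from three subjects.
subject_a = [0.30, 0.50, 0.40, 0.20]
subject_b = [0.33, 0.60, 0.41, 0.25]   # same ordering as subject_a
subject_c = [0.50, 0.20, 0.30, 0.60]   # reversed ordering

same_order = spearman(subject_a, subject_b)
reverse_order = spearman(subject_a, subject_c)
```

A value near 1 means two subjects order the patterns the same way, regardless of their absolute criteria. Computing it for every pair of subjects would also show which subjects agree best with each other.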