March 31, 1998
TEL: (919) 515-6764
E-mail: mcclure@eos.ncsu.edu
MEMORANDUM
TO: ShootOut Participants
FROM: W. F. McClure, Professor
SUBJECT: RULES FOR the SOFTWARE SHOOTOUT at the IDRC98
("THE CHAMBERSBURG CONFERENCE")
Wilson College, Chambersburg, Pennsylvania, August 9-14, 1998.
INTRODUCTION
The Software Shootout has been a popular session at the IDRC. Its purpose
is to encourage chemometric studies and draw attention to the plethora of
software that exists with the hope of discovering the best method(s) of
analysis, both quantitative and qualitative. This year, for the first time,
the Shootout will be held as a formal part of the conference. There will
be at least two invited participants; the remaining time will be open to
anyone who would like to enter into the exchange of ideas.
Participants in the ShootOut are given two data sets (or files) consisting
of scans and chemistry obtained from fescue grass grown in a soil medium
where the moisture was wicked from liquid tanks containing four levels of
fertilization (0, 50, 250 and 500 ppm of nitrogen). Each level of fertilization
was duplicated giving a total of 8 tanks. The purpose of the experiment
was to address an environmental problem that confronts growers every year
with increasing intensity: How much fertilizer should be added to maximize
production while, at the same time, minimizing the environmental consequences
of over-fertilization?
There are at least two questions that this study should address: (1)
Can NIR spectrometry measure the nitrogen status of plant material? And,
(2) Is this information related to fertilization? (Of course, there are
other considerations that the Shooters may want to take into account after
they look at the data.)
DATA FILES
Files with a suffix of *.DA1 and *.CN1 (chemical data) are NSAS formatted
files for spectral and constituent data respectively. Files with *.txt suffix
are in ASCII format. There are eight (8) files. A short description of the
files is given as follows:
1. WD0.DA1 - (n = 282) - Wet/green samples with double
scans on 6500. The grass samples were scanned in their wet-green state within
12 hours after harvesting (two scans per sample with the second scan from
a repack aliquot). Blind-duplicate chemical (reference lab) analyses of
freeze-dried samples are attached; one analysis value is attached to each
of the duplicate scans.
2. WD0.CN1 - Constituent data for the above file.
3. wd0sp.txt - (n = 282) - Wet/green spectra in
ASCII MATLAB matrix format.
4. wd0cn.txt - Constituent data in ASCII format
for above samples.
5. PS0.DA1 - (n = 141) - Powdered (dry ground) with
one scan of each sample. The related chemical values are the average of
the blind duplicates.
6. PS0.CN1 - Constituent data for the above file.
7. ps0sp.txt - (n = 141) - Single scans of powdered
samples.
8. ps0cn.txt - Constituent data in ASCII format
for above samples.
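For participants who prefer to start from the ASCII files, the following is a minimal sketch of how a spectra file and its constituent file might be read and checked against the sample counts given above. It assumes each file is a plain whitespace-delimited numeric matrix with one row per sample and no header line; that layout is an assumption and should be verified against the posted files. The names read_matrix, load_pair and EXPECTED_ROWS are hypothetical helpers, not part of the data distribution.

```python
import io

# Sample counts stated in the memo for each data set.
EXPECTED_ROWS = {"wd0": 282, "ps0": 141}


def read_matrix(source):
    """Read a whitespace-delimited numeric matrix, one row per line.

    `source` is any iterable of text lines (an open file, for example).
    ASSUMPTION: no header line and one sample per row.
    """
    rows = []
    for line in source:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(x) for x in fields])
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError(f"ragged matrix: row lengths {sorted(widths)}")
    return rows


def load_pair(spectra_source, constituent_source, expected_rows=None):
    """Load spectra and constituents and confirm the row counts agree."""
    spectra = read_matrix(spectra_source)
    constituents = read_matrix(constituent_source)
    if len(spectra) != len(constituents):
        raise ValueError(
            f"{len(spectra)} spectra but {len(constituents)} constituent rows"
        )
    if expected_rows is not None and len(spectra) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {len(spectra)}")
    return spectra, constituents
```

With the files downloaded, a call such as load_pair(open("wd0sp.txt"), open("wd0cn.txt"), EXPECTED_ROWS["wd0"]) would raise an error if the wet/green files did not contain 282 matching rows.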
SPECTRAL DATA
Neither the spectral data nor the constituent data has been doctored in
any way. All spectra came from grass samples in a planned experiment as
stated above. Unlike in past years, there are no tricks this time. The spectra and
constituent values are real-life data. Thus, shooters can attack the data without
worrying about substitutions or reassigned values.
CONSTITUENT DATA
The chemistry was determined on a LECO CNS-2000 Carbon, Nitrogen and Sulphur
Analyzer. This instrument is a non-dispersive, infrared, microcomputer-based
instrument designed to measure the carbon, nitrogen and sulphur content
in a wide variety of organic compounds. Carbon and sulphur are measured
by infrared radiation detection; nitrogen is determined by conductivity.
Nitrogen, sulphur and carbon were analyzed in blind duplicates. Each constituent
is attached as three values: for example, nitrogen a (the average of the two
duplicates), nitrogen 1 and nitrogen 2 (the individual duplicates). Hence, there
are nine (9) constituent values associated with each spectral file, three for
each constituent, plus one fertilization parameter.
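Because each constituent carries both duplicates and their average, shooters may wish to estimate the repeatability of the reference method before judging a calibration. The sketch below uses the standard pooled formula for duplicate determinations, SD = sqrt(sum(d^2) / (2n)), where d is the difference within each pair and n is the number of pairs. This is a common approach and not a procedure prescribed by the ShootOut; the function names are hypothetical, and any numbers passed in are the user's own, not values from the data files.

```python
import math


def duplicate_mean(first, second):
    """Average of two duplicate determinations (the 'a' value in the files)."""
    return (first + second) / 2.0


def duplicate_sd(pairs):
    """Pooled standard deviation of a single determination.

    `pairs` is a sequence of (duplicate_1, duplicate_2) tuples.
    Uses sqrt(sum(d^2) / (2n)), with d the within-pair difference.
    """
    if not pairs:
        raise ValueError("at least one duplicate pair is required")
    sum_sq = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(sum_sq / (2 * len(pairs)))
```

A calibration whose prediction error is well below this figure deserves a second look, since NIR predictions are limited by the reference values they are fitted to.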
FILE CODES
The three data sets are in two formats: (1) NSAS and (2) ASCII (MATLAB).
Upper-case filenames are in NSAS (FOSS NIRSystems) format; lower-case filenames
are in ASCII (MATLAB) format. Letter-codes used in the filenames are defined
as follows:
1. First Letter in Filename (sample constitution):
W = wet green sample scanned directly after harvesting
P = dry sample ground through a 1 mm screen in a Wiley mill
Q = dry sample (P) scanned on a 19 filter instrument
2. Second Letter in Filename (number of scans per sample):
D = Double scans (repacks)
S = Single scans (no repacks)
3. Third Letter in Filename (pretreatments):
0 = no pretreatment
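The letter-codes above can be applied mechanically. The following is a small sketch that decodes the first three characters of a filename using only the codes listed in this memo, and infers the format from the case of the name. The function decode_filename and the dictionary names are hypothetical, introduced here for illustration.

```python
import os

# Codes exactly as defined in the FILE CODES list.
SAMPLE_STATE = {
    "W": "wet green sample scanned directly after harvesting",
    "P": "dry sample ground through a 1 mm screen in a Wiley mill",
    "Q": "dry sample (P) scanned on a 19 filter instrument",
}
SCANS_PER_SAMPLE = {
    "D": "double scans (repacks)",
    "S": "single scans (no repacks)",
}
PRETREATMENT = {"0": "no pretreatment"}


def decode_filename(filename):
    """Decode the three-character code at the start of a ShootOut filename."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    if len(stem) < 3:
        raise ValueError(f"filename too short to decode: {filename!r}")
    first, second, third = stem[:3].upper()
    try:
        return {
            "sample": SAMPLE_STATE[first],
            "scans": SCANS_PER_SAMPLE[second],
            "pretreatment": PRETREATMENT[third],
            # Upper-case names are NSAS; lower-case names are ASCII.
            "format": "NSAS" if stem.isupper() else "ASCII (MATLAB)",
        }
    except KeyError as err:
        raise ValueError(f"unknown code {err.args[0]!r} in {filename!r}") from None
```

For example, decode_filename("ps0sp.txt") reports a dry ground sample, single scans, no pretreatment, in ASCII (MATLAB) format.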
All data for the shootout have been posted on the IDRC website by Dr. Rob
Lodder and his associates at the University of Kentucky:
http://kerouac.pharm.uky.edu/asrg/cnirs/cnirs.html
Remember, there will be at least two invited presentations; volunteers will
be accommodated on a first-come, first-served basis. Invited presenters will have a maximum
of 30 minutes; volunteer presenters will have a maximum of 15 minutes. A
discussion period will be held at the end. Comments, observations and criticisms
by the audience will be considered at this time. Prizes will be awarded
according to the ruling of a panel of three judges. The decision of the
judges (based on originality, systematics and novelty of both qualitative
and quantitative analyses, and best presentation) will be final.
OBJECTIVE: All participants are asked to:
A. Develop their best calibration for each of the three constituents using
any software they choose.
B. Report both qualitative and quantitative results (outliers, etc.).
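For the quantitative part of a report, one possible set of summary figures is sketched below: bias, root-mean-square error of prediction (RMSEP), and the bias-corrected standard error of prediction (SEP). These are widely used definitions and are offered only as a suggestion; the ShootOut does not prescribe any particular statistic, and the function name prediction_stats is hypothetical.

```python
import math


def prediction_stats(reference, predicted):
    """Summarize agreement between reference values and predictions.

    bias  = mean(predicted - reference)
    rmsep = sqrt(mean((predicted - reference)^2))
    sep   = sqrt(sum((error - bias)^2) / (n - 1))
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    if len(reference) < 2:
        raise ValueError("at least two samples are required")
    errors = [p - r for r, p in zip(reference, predicted)]
    n = len(errors)
    bias = sum(errors) / n
    rmsep = math.sqrt(sum(e * e for e in errors) / n)
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    return {"bias": bias, "rmsep": rmsep, "sep": sep}
```

Reporting these figures on samples held out of the calibration, rather than on the calibration samples themselves, gives the audience a fairer basis for comparing methods.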
PREVIOUS WINNERS of the SHOOTOUT will be allowed to defend their position
in the ShootOut. Decisions of the Judges are final. Presentation of awards
will be made at the banquet on Thursday night. Winners MUST be present to
win.
VOLUNTEER PRESENTERS should contact Fred McClure as soon as possible:
email: mcclure@eos.ncsu.edu
FAX: 919-515-7760
TEL: 919-515-6764