This is a simple example illustrating the use of the Cromwell package (Coombes et al., Proteomics, 2005) for the analysis of SELDI/MALDI proteomics spectra.

***** WARNING. This example is not stand-alone. *****
To use this example you will need, in addition to the files supplied here, a working version of MATLAB (we have tested this only on version 6.5 and higher) and the Rice Wavelet Toolbox, available from www-dsp. A copy of Cromwell is already included.
***** END WARNING. This example is not stand-alone. *****

The spectra used here come from Pusztai et al. (2004), Cancer 100:1814-22. The full dataset is available from http://bioinformatics.mdanderson.org. The spectra here are 20 low-mass scans of QC samples derived from a pool of normal serum, so all of the spectra should be telling the same story. All of the spectra have intensities at 33885 m/z values, and the vector of m/z values is the same for all spectra. The machine was calibrated to a set of known peaks shortly before the entire set of spectra was run. The spectra were run in randomized order on a series of chips.

The files are provided in two formats.

RawBinary contains "Low_mass_serum_QC.xpt", which is the binary format used by the Ciphergen software. This file contains all of the spectra used here.

RawXML contains all of the spectra exported from the above xpt file using the Ciphergen software (version 3.1.1). This format contains the spectral intensities both before any processing (the integer counts in tofDataSamples) and after application of various correction factors (the m/z, intensity pairs in processedDataSamples). The XML files also contain all of the parameter settings used in processing the data, including the run times of the spectra (which in general can be used to confirm whether the run order was in fact random with respect to sample group).

For historical reasons, the scripts in Cromwell were written to deal with files in two-column .csv format, with the first column corresponding to m/z and the second to intensity. We extract these files from the XML files using the kludged script xml2txt.pl (this takes about 5-10 seconds on my laptop). Note that this script does not simply take the last half of the XML datafile, consisting of the m/z, intensity pairs returned by the Ciphergen software.
Rather, it takes these m/z values but draws the intensity values from the raw integer counts supplied in tofDataSamples, thus getting the data before any preprocessing has been applied. Running xml2txt.pl places .txt versions of the spectra in the folder RawSpectra/.

Finally, we process the raw spectra to produce baseline-corrected and smoothed spectra together with matrices of peak intensities. This procedure is described in far more detail in our m-file, processSpectra.m, and one or two illustrative pictures will be stored in Figs/. MATLAB binary .mat files (CorrectedSpectra.mat and Peaks.mat) will be produced for later analysis.

Hope that helps!
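
A quick sanity check on the extracted files: since every spectrum is supposed to have 33885 points on an identical m/z vector, it can be worth confirming this before running the MATLAB processing. The sketch below is not part of Cromwell. It is written in Python purely as a standalone check, the function names read_spectrum and check_spectra are hypothetical, and it assumes the files in RawSpectra/ are plain two-column text (m/z, intensity) separated by a comma or whitespace, which may not match the exact output of xml2txt.pl.

```python
import re
from pathlib import Path


def read_spectrum(path):
    """Read a two-column (m/z, intensity) text file into two lists of floats.

    Assumption: columns are separated by a comma and/or whitespace.
    """
    mz, intensity = [], []
    with open(path) as handle:
        for line in handle:
            fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            try:
                x, y = float(fields[0]), float(fields[1])
            except ValueError:
                continue  # skip a non-numeric header row, if present
            mz.append(x)
            intensity.append(y)
    return mz, intensity


def check_spectra(folder, expected_points=None):
    """Return a list of problem descriptions; an empty list means no problems found."""
    folder = Path(folder)
    if not folder.is_dir():
        return [f"{folder}: not a directory"]
    files = sorted(folder.glob("*.txt"))
    if not files:
        return [f"{folder}: no .txt files found"]

    problems = []
    reference = None  # m/z vector of the first file, used for comparison
    for path in files:
        mz, _ = read_spectrum(path)
        if expected_points is not None and len(mz) != expected_points:
            problems.append(
                f"{path.name}: {len(mz)} points, expected {expected_points}"
            )
        if reference is None:
            reference = mz
        elif mz != reference:
            problems.append(
                f"{path.name}: m/z vector differs from {files[0].name}"
            )
    return problems


if __name__ == "__main__":
    # 33885 is the number of m/z values per spectrum stated in this README.
    for problem in check_spectra("RawSpectra", expected_points=33885):
        print(problem)
```

Run from the directory that contains RawSpectra/, the script prints one line per problem and prints nothing if every file has the expected length and the same m/z vector.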