---
title: "Spectral Analysis"
always_allow_html: yes
date: "`r Sys.Date()`"
output: bookdown::html_document2
vignette: >
%\VignetteIndexEntry{Spectral Analysis}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
fig.pos = 'H',
comment = "#>"
)
```
```{r, message=F, warning =F , echo = FALSE }
library(webshot)
library(plotly)
```
# Package description
Infrared, near-infrared and Raman spectroscopic data measured during chemical
reactions, provide structural fingerprints by which molecules can be identified
and quantified. The application of these spectroscopic techniques as inline
process analytical tools (PAT), provides the (pharma-)chemical industry with
novel tools, allowing to monitor their chemical processes, resulting in a better
process understanding through insight in reaction rates, mechanistics,
stability, etc. Data can be read into R via the generic spc-format, which is
generally supported by spectrometer vendor software. Versatile pre-processing
functions are available to perform baseline correction by linking to the
'baseline' package; noise reduction via the 'signal' package; as well as time
alignment, normalization, differentiation, integration and interpolation.
Implementation based on the S4 object system allows storing a pre-processing
pipeline as part of a spectral data object, and easily transferring it to other
datasets. Interactive plotting tools are provided based on the 'plotly' package.
Non-negative matrix factorization (NMF) has been implemented to perform
multivariate analyses on individual spectral datasets or on multiple datasets at
once. NMF provides a parts-based representation of the spectral data in terms of
spectral signatures of the chemical compounds and their relative proportions.
The functionality to read in spc-files was adapted from the 'hyperSpec' package.
In this vignette we review the package functionality from loading data to
NMF-analysis.
# Basic functionality
## SpectraInTime-class
Spectral data are represented as en S3-class which containt a data matrix
together with meta data such as wavelengths, time points and preprocessing
steps executed on the data. An artificial example is added in the package by
which we will illustrate the basic functionality.
```{r, echo = TRUE, message = FALSE, results = TRUE}
library( spectralAnalysis )
spectralEx <- getSpectraInTimeExample( )
str( spectralEx)
```
You can extract slots via specific methods:
```{r, echo = TRUE, message = FALSE, results = TRUE}
dim( spectralEx)
getExperimentName(spectralEx)
getExtraInfo( spectralEx )
getStartTime( spectralEx )
getTimePoints( spectralEx )
getTimePoints( spectralEx , timePointsAlt = TRUE , timeUnit = "seconds" )
getTimePoints( spectralEx , timeUnit = "minutes" )
getTimePoints( spectralEx , timeUnit = "hours" )
getUnits( spectralEx )
getTimePoints( spectralEx , timePointsAlt = TRUE )
spectra <- getSpectra( spectralEx )
dim( spectra )
```
One can modify slot using specific methods:
```{r, echo = TRUE, message = FALSE, results = TRUE}
setTimePointsAlt( spectralEx ) <- getTimePoints( spectralEx ) - 200
```
Using these methods, will avoid you to make some errors since validity testing
is included.
```{r, echo = TRUE, message = FALSE, results = TRUE, error = TRUE}
setTimePointsAlt( spectralEx ) <- getTimePoints( spectralEx ) * 5
```
to get more information on the SpectraIntime class and its methods:
```{r , eval = FALSE }
?SpectraInTime
```
## Reading in data
The function *loadAllSPCFiles()* allows to read in all spectra data into a
list:
```{r, echo = TRUE, message = FALSE, results = TRUE , eval = FALSE}
allSPCFiles <- loadAllSPCFiles(directoryFiles)
```
## Subsetting
Different subsetting methods have been defined:
* logical, integer (by order)
* range `r()` function
* closest element matching `e()` function
* for time matching one can change the time unit between 'seconds' , 'minutes'
and 'hours'.
```{r, echo = TRUE, message = FALSE, results = TRUE}
print( r( 2.5 , 10.8) )
print( r( c(1 , 3 , 2 , 6 , 9 , 3 ) ) )
print( e( 1, 3 ,5 ,6 ,7 ,8 ) )
```
Note that `r()` and `e()` work similar to the usual `c()` function:
```{r, echo = TRUE, message = FALSE, results = TRUE}
# range subsetting
spectralEx <- getSpectraInTimeExample()
spectraSubset <- spectralEx[ r( 1000 , 30000 ) , r(130 , 135 ) ]
getTimePoints( spectraSubset )
getSpectralAxis( spectraSubset )
spectraTimeSubset <- spectralEx[ r( 1000 , 30000 ) , ]
spectraWavelengthSubset <- spectralEx[ , r(130 , 135 ) ]
# other types of subsetting
# logical
spectraSubsetLogical <- spectralEx[ getTimePoints( spectralEx ) > 300 ,
getSpectralAxis( spectralEx ) <= 500 ]
subsetLogTimeAlt <- spectralEx[
getTimePoints( spectralEx , timePointsAlt = TRUE ) > 0 ,
getSpectralAxis( spectralEx ) <= 500 ]
# integer
subsetInteger <- spectralEx[ c( 1, 5, 10) , c( 4 , 4 , 4 , 8 , 16) ]
# closest element matching
spectraSubsetElem <- spectralEx[ e( 1.234 , 3.579 ) ,
e( 200.001 , 466.96 ) , timeUnit = "hours" ]
getTimePoints( spectraSubsetElem , timeUnit = "hours" )
getSpectralAxis( spectraSubsetElem )
```
## Summary
```{r, echo = TRUE, message = FALSE, results = TRUE , eval = TRUE }
summarySpec <- summary( spectralEx )
str( summarySpec )
```
## Graphics
### One spectraInTime
4 different visualization options are implemented:
```{r plotBaseTime, fig.cap = "time view plot", echo = TRUE, message = FALSE, results = TRUE , eval = TRUE }
data = getSpectraInTimeExample()
library( plotly )
plot( x = data[ e( 1 , 2 , 3) , , timeUnit = "hours" ] , type = "time" , timeUnit = "hours" , timePointsAlt = FALSE )
```
```{r plotBaseWavelenght, fig.cap = "Wavelength view plot" , echo = TRUE, message = FALSE, results = TRUE , eval = TRUE }
plot( x = data[ , e( seq( 200 , 400 , 50 ) ) ] , type = "spectralAxis" , timeUnit = "minutes" , timePointsAlt = TRUE )
```
```{r plotBaseContour, fig.cap = "Contour plot" , echo = TRUE, message = FALSE, results = TRUE , eval = TRUE }
plot( x = data , type = "contour" , nColors = 200 , colors = "C" , timeUnit = "seconds", timePointsAlt = TRUE )
```
Note: 3D-plot not run to save space:
```{r plotBase3D, fig.cap = c("3D plot" , "time view plot" , "wavelength view plot" , "contour plot"), echo = TRUE, message = FALSE, results = TRUE , eval = FALSE }
plot( x = data , type = "3D" , timeUnit = "hours" , timePointsAlt = FALSE )
```
note that subsetting with element matching is useful for the time and wavelength
views.
### List of spectra in time
We have also 2 options to plot a list of SpectraInTime-objects directly.
Line plot:
```{r plotListLine, fig.cap = "Line plot for list of SpectraInTime" }
listOfSpectra <- getListOfSpectraExample()
plot( listOfSpectra , times = 1:3 , timeUnit = "hours" , colors = "A" )
```
and contour plot:
```{r plotListContour, fig.cap = "Contour plot for list of SpectraInTime"}
plot( listOfSpectra , timeUnit = "hours" , colors = "C" , type = "contour" )
```
## Time alignment
Time alignment of *SpectraInTime* refers to translating and possibly
subsetting the secondary time axis. Such that different experiments have a
comparable time origin.
Information to align *SpectraInTime* is contained in *processTimes* object for
one experiment:
```{r, echo = TRUE, message = FALSE, results = TRUE }
processTimes <- getProcessTimesExample()
processTimes
```
and *processTimesFrame* for multiple experiments:
```{r, echo = TRUE, message = FALSE, results = TRUE }
processTimesFrame <- getProcessTimesFrameExample()
processTimesFrame
```
There are 3 methods to align a *SpectraInTime*:
```{r, echo = TRUE, message = FALSE, results = TRUE }
spectra <- getSpectraInTimeExample()
listOfSpectra <- getListOfSpectraExample()
pathProcessTimes <- getPathProcessTimesExample()
ex1 <- timeAlign( x = spectra , y = processTimes ,
cutCooling = TRUE , cutBeforeMinTemp = TRUE )
ex2 <- timeAlign( x = listOfSpectra , y = processTimesFrame ,
cutCooling = TRUE , cutBeforeMinTemp = TRUE )
ex3 <- timeAlign( x = listOfSpectra , y = pathProcessTimes ,
cutCooling = TRUE , cutBeforeMinTemp = TRUE , timeFormat = "%Y-%m-%d %H:%M:%OS" )
```
# Real IR example
We used 2 example spectra from a chemical synthesis monitored by an IR-probe to illustrate preprocessing and NMF-analyis.
```{r , echo = TRUE}
exampleData1 <- readSpectra( system.file( "exampleData/exampleExperiment1.txt" ,
package = "spectralAnalysis") )
```
```{r plotReal, fig.cap = "Contour plot 2 real example experiments"}
plot( exampleData1, type = "contour" )
```
## Pre-processing
Pre-processing is applied to correct for any physical influences such as light
scattering. It allows for better interpretation and more precise modeling.
Usually several pre-processing steps are applied after each other and the ideal
pre-processing procedure depends on the technology and the measured system.
### Selection of time and wavelength range
Scientist can often indicate a relevant wavelength range. In this manner we can
reduce the number of covariates.
```{r}
dim( exampleData1 )
wavelengthRange <- r ( 800 , 1625 )
spectralDataSelect <- exampleData1[ r( 0 , 5 ) , wavelengthRange , timeUnit = "hours" ]
```
```{r plot4, fig.cap = "Selected spectra of an example experiment", fig.align = "center", fig.pos = "h" , echo = FALSE }
plot( spectralDataSelect , type = "contour" )
```
### Baseline correction
Baseline correction implies fitting a global function for each measured spectra
and subtracting it from the data.
```{r plot5, fig.cap = "Raw spectra example experiment", fig.align = "center", fig.pos = "h"}
timesToShow <- e( 0.5 , 5 )
plot( spectralDataSelect[ timesToShow , , timeUnit = "hours"] , type = "time" )
```
```{r plot6, fig.cap = "Baseline corrected spectra example experiment", fig.align = "center", fig.pos = "h"}
spectralDataBaseline <- baselineCorrect( spectralDataSelect ,
method = 'modpolyfit', degree = 4 )
plot( spectralDataBaseline[ timesToShow , , timeUnit = "hours"] , type = "time" )
```
### smoothing
Smoothing or filtering in the wavelength domain can reduce measurement noise,
the default smoothing is the Savitky-golay filter, which uses a local polynomial
approximation, and which can be also be used to calculate derivatives in the
wavelength domain. This can be useful to better detect peaks, especially in
NIR-spectra where peaks are not clearly distinguishable otherwise.
```{r plot7, fig.cap = "Smoothed spectra example experiment", fig.align = "center", fig.pos = "h"}
spectralDataSmooth <- smooth( spectralDataBaseline , window = 5 )
plot( spectralDataSmooth[ timesToShow , , timeUnit = "hours"] , type = "time" )
```
smoothing has little influence here, because the measurements contain little
noise.
```{r plot8, fig.cap = "Derivative spectra example experiment", fig.align = "center", fig.pos = "h"}
spectralDataDerivative <- smooth( spectralDataBaseline , derivative = 1 )
plot( spectralDataDerivative[ timesToShow , , timeUnit = "hours"] , type = "time" )
```
### Normalization
Normalization standardizes the intensities by:
* the area under the curve
* a reference peak, usually the peak of the solvent
* standardization by some mean and scale function
```{r plot9, fig.cap = "Integral normalized spectra example experiment", fig.align = "center", fig.pos = "h"}
spectralDataNormalized <- normalize( spectralDataSmooth , method = "integration" )
plot( spectralDataNormalized[ timesToShow , , timeUnit = "hours"] , type = "time" )
```
### Scatter correction
Multiplicative scatter correction (MSC) removes scatter effects by regressing a
spectra in the wavelength domain against a reference spectra and substracting
the found intercept and dividing by the slope for each spectra
```{r, eval = FALSE }
?scatterCorrrect
```
### Transfer preprocessing steps
preprocessing steps can be replicated on new data:
```{r}
allSpectraDataProcessed <- lapply( listOfSpectra , preprocess , with = spectralDataSmooth )
```
## NMF-analysis
We will perform NMF-ananalysis on the smoothed and baseline corrected spectra
for the time range: 0 - 5 hours assuning 3 chemical components.
```{r , results = FALSE}
spectralExSelect <- spectralDataSmooth[ r( 0 , 5 ) , , timeUnit = "hours" ]
nmfResult <- spectralNMF( spectralExSelect , rank = 3 , subsamplingFactor = 5 )
nmfObject <- getDimensionReduction( nmfResult , type = "NMF")$NMF
```
```{r plotNMF, fig.cap = "NMF-trends on example experiments" }
nmfTrends <- t( NMF::coef( nmfObject ) )
matplot( nmfTrends , type = "l" , x = getTimePoints( spectralExSelect , timeUnit = "hours" ) , xlab = "time in hours" )
```
NMF-analysis is a useful exploratory tool to get insight in the chemical
process.
Including pure-component spectra can increase the quality of NMF-analysis.
see
```{r, eval = FALSE }
?spectralNMF
```