Title: | Fast Covariance Estimation for Multivariate Sparse Functional Data |
---|---|
Description: | Multivariate functional principal component analysis via fast covariance estimation for multivariate sparse functional data or longitudinal data proposed by Li, Xiao, and Luo (2020) <doi: 10.1002/sta4.245>. |
Authors: | Cai Li [aut,cre] and Luo Xiao [aut] |
Maintainer: | Cai Li <[email protected]> |
License: | GPL-3 |
Version: | 0.1-4 |
Built: | 2025-01-04 03:09:06 UTC |
Source: | https://github.com/cli9/mfaces |
Fast Covariance Estimation for Multivariate Sparse Functional Data
Package: | mfaces |
Type: | Package |
Version: | 0.1-4 |
Date: | 2022-07-18 |
License: | GPL-3 |
Cai Li and Luo Xiao
Maintainer: Cai Li <[email protected]>
Cai Li, Luo Xiao, and Sheng Luo, 2020. Fast covariance estimation for multivariate sparse functional data. Stat, 9(1), p.e245, doi:10.1002/sta4.245.
The function is to estimate the mean and covariance function from a cluster of multivariate functions/longitudinal observations.
mface.sparse(data, newdata = NULL, center = TRUE, argvals.new = NULL, knots = 7, knots.option = "equally-spaced", p = 3, m = 2, lambda = NULL, lambda_mean = NULL, lambda_bps = NULL, search.length = 14, lower = -3, upper = 10, calculate.scores = FALSE, pve = 0.99)
mface.sparse(data, newdata = NULL, center = TRUE, argvals.new = NULL, knots = 7, knots.option = "equally-spaced", p = 3, m = 2, lambda = NULL, lambda_mean = NULL, lambda_bps = NULL, search.length = 14, lower = -3, upper = 10, calculate.scores = FALSE, pve = 0.99)
data |
a list containing all functional outcomes.
Each element is a data frame with three arguments:
(1) |
newdata |
of the same strucutre as |
center |
logical. If TRUE, then Pspline smoothing of the population mean will be conducted and subtracted from the data before covariance smoothing; if FALSE, then the population mean will be just 0s. |
argvals.new |
a vector of observation time points to evaluate mean function, covariance function, error variance and etc. If NULL, then 100 equidistant points in the range of data time points will be used. |
knots |
the number of knots for B-spline basis functions to be used; defaults to 7. The resulting number of basis functions is the number of interior knots plus the degree of B-splines. |
knots.option |
if |
p |
the degrees of B-splines; defaults to 3. |
m |
the order of differencing penalty; defaults to 2. |
lambda |
the value of the smoothing parameter for auto-covariance smoothing; defaults to NULL. |
lambda_mean |
the value of the smoothing parameter for mean smoothing; defaults to NULL. |
lambda_bps |
the value of the smoothing parameter for cross-covariance smoothing; defaults to NULL. |
search.length |
the number of equidistant (log scale) smoothing parameters to search; defaults to 14. |
lower , upper
|
bounds for log smoothing parameter; defaults are -3 and 10, respectively. |
calculate.scores |
if TRUE, scores will be calculated. |
pve |
Defaults 0.99. To select the number of eigenvalues by percentage of variance. |
This is a generalized version of bivariate P-splines (Eilers and Marx, 2003) for covariance smoothing of multivariate sparse functional or longitudinal data. It uses tensor product B-spline basis functions and employs differencing penalties on the assosciated parameter matrix. The smoothing parameters in the method are selected by leave-one-subject-out cross validation and is implemented with a fast algorithm.
If center
is TRUE, then the population means will be calculated and are smoothed by
univariate P-spline smoothing: pspline
(Eilers and Marx, 1996). This univariate
smoothing uses leave-one-subject-out cross validation to select the smoothing parameter.
If knots.option is "equally-spaced", then the differencing penalty in Eilers and Marx (2003) is used; if knots.option is "quantile" then the integrated squared second order derivative penalty in Wood (2016) is used.
fit |
Univariate FPCA fit for each function |
y.pred , mu.pred , Chat.diag.pred , se.pred , var.error.pred
|
Predicted/estimated objects at |
Theta |
Estimated parameter matrix |
argvals.new |
Vector of time points to evaluate population parameters |
Chat.new , Cor.new , Cor.raw.new , Chat.raw.diag.new , var.error.new
|
Estimated objects at |
eigenfunctions , eigenvalues
|
Estimated eigenfunctions (scaled eigenvector) and eigenvalues at |
var.error.hat |
Estimated objects for each outcome |
calculate.scores , rand_eff
|
if |
... |
... |
Cai Li <[email protected]> and Luo Xiao <[email protected]>
Cai Li, Luo Xiao, and Sheng Luo, 2020. Fast covariance estimation for multivariate sparse functional data. Stat, 9(1), p.e245, doi:10.1002/sta4.245.
Luo Xiao, Cai Li, William Checkley and Ciprian Crainiceanu, Fast covariance estimation for sparse functional data, Stat. Comput., doi:10.1007/s11222-017-9744-8.
Paul Eilers and Brian Marx, Multivariate calibration with temperature interaction using two-dimensional penalized signal regression, Chemometrics and Intelligent Laboratory Systems 66 (2003), 159-174.
Paul Eilers and Brian Marx, Flexible smoothing with B-splines and penalties, Statist. Sci., 11, 89-121, 1996.
Simon N. Wood, P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data, Stat. Comput., doi:10.1007/s11222-016-9666-x.
face.sparse
in face
## a toy example ## settings n <- 25 sigma <- 0.1 seed <- 118 set.seed(seed) ## data generation N1 <- sample(3:7,n,replace=TRUE) N2 <- sample(3:7,n,replace=TRUE) N3 <- sample(3:7,n,replace=TRUE) subj1 <- c() subj2 <- c() subj3 <- c() for(i in 1:n){ subj1 <- c(subj1,rep(i, N1[i])) subj2 <- c(subj2,rep(i, N2[i])) subj3 <- c(subj3,rep(i, N3[i])) } t1 <- runif(sum(N1)) t2 <- runif(sum(N2)) t3 <- runif(sum(N3)) tnew <- seq(0,1,length=100) y1 <- 5*sin(2*pi*t1) y2 <- 5*cos(2*pi*t2) y3 <- 5*(t3-1)^2 x1 <- t(matrix(rep(5*sin(2*pi*tnew),n),length(tnew),n)) x2 <- t(matrix(rep(5*cos(2*pi*tnew),n),length(tnew),n)) x3 <- t(matrix(rep(5*(tnew-1)^2,n),length(tnew),n)) psi11 <- function(x){sqrt(2/3)*sin(2*pi*x)} psi12 <- function(x){sqrt(2/3)*cos(4*pi*x)} psi13 <- function(x){sqrt(2/3)*sin(4*pi*x)} psi21 <- function(x){sqrt(2/3)*sin((1-1/2)*pi*x)} psi22 <- function(x){sqrt(2/3)*sin((2-1/2)*pi*x)} psi23 <- function(x){sqrt(2/3)*sin((3-1/2)*pi*x)} psi31 <- function(x){sqrt(2/3)*sin(1*pi*x)} psi32 <- function(x){sqrt(2/3)*sin(2*pi*x)} psi33 <- function(x){sqrt(2/3)*sin(3*pi*x)} Lambda <- c(2,1,0.5)*3 x <- matrix(NA,nrow=n*length(tnew),ncol=3) xi <- matrix(NA,nrow=n,ncol=3) for(k in 1:3){xi[,k] = rnorm(n)*sqrt(Lambda[k])} for(i in 1:n){ seq1 <- (sum(N1[1:i])-N1[i]+1):(sum(N1[1:i])) seq2 <- (sum(N2[1:i])-N2[i]+1):(sum(N2[1:i])) seq3 <- (sum(N3[1:i])-N3[i]+1):(sum(N3[1:i])) Xt = xi[i,1]*c(psi11(t1[seq1]),psi21(t2[seq2]),psi31(t3[seq3])) + xi[i,2]*c(psi12(t1[seq1]),psi22(t2[seq2]),psi32(t3[seq3])) + xi[i,3]*c(psi13(t1[seq1]),psi23(t2[seq2]),psi33(t3[seq3])) y1[seq1] = y1[seq1] + Xt[1:N1[i]] y2[seq2] = y2[seq2] + Xt[N1[i]+1:N2[i]] y3[seq3] = y3[seq3] + Xt[N1[i]+N2[i]+1:N3[i]] x[((i-1)*length(tnew)+1) :(length(tnew)*i),] = c(x1[i,], x2[i,], x3[i,]) + xi[i,1]*c(psi11(tnew),psi21(tnew),psi31(tnew)) + xi[i,2]*c(psi12(tnew),psi22(tnew),psi32(tnew)) + xi[i,3]*c(psi13(tnew),psi23(tnew),psi33(tnew)) } True_C <- Lambda[1]*c(psi11(tnew),psi21(tnew),psi31(tnew))%x% t(c(psi11(tnew), psi21(tnew), psi31(tnew))) + Lambda[2]*c(psi12(tnew), psi22(tnew), psi32(tnew))%x% t(c(psi12(tnew), psi22(tnew), psi32(tnew))) + Lambda[3]*c(psi13(tnew), psi23(tnew), psi33(tnew))%x% t(c(psi13(tnew), psi23(tnew), psi33(tnew))) ## observed data y1 <- y1 + rnorm(sum(N1))*sigma y2 <- y2 + rnorm(sum(N2))*sigma y3 <- y3 + rnorm(sum(N3))*sigma # true trajectories x1 <- t(matrix(x[,1],length(tnew),n)) x2 <- t(matrix(x[,2],length(tnew),n)) x3 <- t(matrix(x[,3],length(tnew),n)) true_eigenfunctions <- eigen(True_C)$vectors*sqrt(length(tnew)) true_eigenvalues <- eigen(True_C)$values/length(tnew) ## organize data and apply mFACEs data <- list("y1" = data.frame("subj"= subj1, "argvals" = t1, "y" = y1), "y2" = data.frame("subj"= subj2, "argvals" = t2, "y" = y2), "y3" = data.frame("subj"= subj3, "argvals" = t3, "y" = y3)) fit <- mface.sparse(data, argvals.new = tnew, knots = 5) ## set calculate.scores to TRUE if want to get scores fit <- mface.sparse(data, argvals.new = tnew, knots = 5, calculate.scores = TRUE) scores <- fit$rand_eff$scores ## prediction of several subjects for(i in 1:2){ sel <- lapply(data, function(x){which(x$subj==i)}) dat_i <- mapply(function(data, sel){data[sel,]}, data = data, sel = sel, SIMPLIFY = FALSE) dat_i_pred <- lapply(dat_i, function(x){ data.frame(subj=rep(x$subj[1],nrow(x) + length(tnew)), argvals = c(rep(NA,nrow(x)),tnew), y = rep(NA,nrow(x) + length(tnew))) }) for(j in 1:length(dat_i)){ dat_i_pred[[j]][1:nrow(dat_i[[j]]), ] <- dat_i[[j]] } pred <- predict(fit, dat_i_pred) y_pred <- mapply(function(pred_y.pred, dat_i){ pred_y.pred[nrow(dat_i)+1:length(tnew)]}, pred_y.pred = pred$y.pred, dat_i = dat_i, SIMPLIFY = TRUE) pre <- pred Ylim = c(-12,12) Xlim = c(0,1) Ylab = bquote(y^(1)) Xlab = "t" main = paste("Subject ", dat_i[[1]][1,1],sep="") idx = (nrow(dat_i[[1]])+1):(nrow(dat_i[[1]])+length(tnew)) plot(dat_i[[1]][,"argvals"],dat_i[[1]][,"y"],ylim=Ylim,xlim=Xlim,ylab=Ylab,xlab=Xlab, main=main,cex.lab=2.0,cex.axis = 2.0,cex.main = 2.0,pch=1) lines(tnew,pre$y.pred$y1[idx],col="red",lwd=2) lines(tnew,pre$y.pred$y1[idx]-1.96*pre$se.pred$y1[idx],col="blue",lwd=2,lty=2) lines(tnew,pre$y.pred$y1[idx]+1.96*pre$se.pred$y1[idx],col="blue",lwd=2,lty=2) lines(tnew,x1[i,],col="purple",lwd=2) Ylab = bquote(y^(2)) Xlab = "t" main = paste("Subject ", dat_i[[1]][1,1],sep="") idx = (nrow(dat_i[[2]])+1):(nrow(dat_i[[2]])+length(tnew)) plot(dat_i[[2]][,"argvals"],dat_i[[2]][,"y"],ylim=Ylim,xlim=Xlim,ylab=Ylab,xlab=Xlab, main=main,cex.lab=2.0,cex.axis = 2.0,cex.main = 2.0,pch=1) lines(tnew,pre$y.pred$y2[idx],col="red",lwd=2) lines(tnew,pre$y.pred$y2[idx]-1.96*pre$se.pred$y2[idx],col="blue",lwd=2,lty=2) lines(tnew,pre$y.pred$y2[idx]+1.96*pre$se.pred$y2[idx],col="blue",lwd=2,lty=2) lines(tnew,x2[i,],col="purple",lwd=2) Ylab = bquote(y^(3)) Xlab = "t" main = paste("Subject ", dat_i[[1]][1,1],sep="") idx = (nrow(dat_i[[3]])+1):(nrow(dat_i[[3]])+length(tnew)) plot(dat_i[[3]][,"argvals"],dat_i[[3]][,"y"],ylim=Ylim,xlim=Xlim,ylab=Ylab,xlab=Xlab, main=main,cex.lab=2.0,cex.axis = 2.0,cex.main = 2.0,pch=1) lines(tnew,pre$y.pred$y3[idx],col="red",lwd=2) lines(tnew,pre$y.pred$y3[idx]-1.96*pre$se.pred$y3[idx],col="blue",lwd=2,lty=2) lines(tnew,pre$y.pred$y3[idx]+1.96*pre$se.pred$y3[idx],col="blue",lwd=2,lty=2) lines(tnew,x3[i,],col="purple",lwd=2) }
## a toy example ## settings n <- 25 sigma <- 0.1 seed <- 118 set.seed(seed) ## data generation N1 <- sample(3:7,n,replace=TRUE) N2 <- sample(3:7,n,replace=TRUE) N3 <- sample(3:7,n,replace=TRUE) subj1 <- c() subj2 <- c() subj3 <- c() for(i in 1:n){ subj1 <- c(subj1,rep(i, N1[i])) subj2 <- c(subj2,rep(i, N2[i])) subj3 <- c(subj3,rep(i, N3[i])) } t1 <- runif(sum(N1)) t2 <- runif(sum(N2)) t3 <- runif(sum(N3)) tnew <- seq(0,1,length=100) y1 <- 5*sin(2*pi*t1) y2 <- 5*cos(2*pi*t2) y3 <- 5*(t3-1)^2 x1 <- t(matrix(rep(5*sin(2*pi*tnew),n),length(tnew),n)) x2 <- t(matrix(rep(5*cos(2*pi*tnew),n),length(tnew),n)) x3 <- t(matrix(rep(5*(tnew-1)^2,n),length(tnew),n)) psi11 <- function(x){sqrt(2/3)*sin(2*pi*x)} psi12 <- function(x){sqrt(2/3)*cos(4*pi*x)} psi13 <- function(x){sqrt(2/3)*sin(4*pi*x)} psi21 <- function(x){sqrt(2/3)*sin((1-1/2)*pi*x)} psi22 <- function(x){sqrt(2/3)*sin((2-1/2)*pi*x)} psi23 <- function(x){sqrt(2/3)*sin((3-1/2)*pi*x)} psi31 <- function(x){sqrt(2/3)*sin(1*pi*x)} psi32 <- function(x){sqrt(2/3)*sin(2*pi*x)} psi33 <- function(x){sqrt(2/3)*sin(3*pi*x)} Lambda <- c(2,1,0.5)*3 x <- matrix(NA,nrow=n*length(tnew),ncol=3) xi <- matrix(NA,nrow=n,ncol=3) for(k in 1:3){xi[,k] = rnorm(n)*sqrt(Lambda[k])} for(i in 1:n){ seq1 <- (sum(N1[1:i])-N1[i]+1):(sum(N1[1:i])) seq2 <- (sum(N2[1:i])-N2[i]+1):(sum(N2[1:i])) seq3 <- (sum(N3[1:i])-N3[i]+1):(sum(N3[1:i])) Xt = xi[i,1]*c(psi11(t1[seq1]),psi21(t2[seq2]),psi31(t3[seq3])) + xi[i,2]*c(psi12(t1[seq1]),psi22(t2[seq2]),psi32(t3[seq3])) + xi[i,3]*c(psi13(t1[seq1]),psi23(t2[seq2]),psi33(t3[seq3])) y1[seq1] = y1[seq1] + Xt[1:N1[i]] y2[seq2] = y2[seq2] + Xt[N1[i]+1:N2[i]] y3[seq3] = y3[seq3] + Xt[N1[i]+N2[i]+1:N3[i]] x[((i-1)*length(tnew)+1) :(length(tnew)*i),] = c(x1[i,], x2[i,], x3[i,]) + xi[i,1]*c(psi11(tnew),psi21(tnew),psi31(tnew)) + xi[i,2]*c(psi12(tnew),psi22(tnew),psi32(tnew)) + xi[i,3]*c(psi13(tnew),psi23(tnew),psi33(tnew)) } True_C <- Lambda[1]*c(psi11(tnew),psi21(tnew),psi31(tnew))%x% t(c(psi11(tnew), psi21(tnew), psi31(tnew))) + Lambda[2]*c(psi12(tnew), psi22(tnew), psi32(tnew))%x% t(c(psi12(tnew), psi22(tnew), psi32(tnew))) + Lambda[3]*c(psi13(tnew), psi23(tnew), psi33(tnew))%x% t(c(psi13(tnew), psi23(tnew), psi33(tnew))) ## observed data y1 <- y1 + rnorm(sum(N1))*sigma y2 <- y2 + rnorm(sum(N2))*sigma y3 <- y3 + rnorm(sum(N3))*sigma # true trajectories x1 <- t(matrix(x[,1],length(tnew),n)) x2 <- t(matrix(x[,2],length(tnew),n)) x3 <- t(matrix(x[,3],length(tnew),n)) true_eigenfunctions <- eigen(True_C)$vectors*sqrt(length(tnew)) true_eigenvalues <- eigen(True_C)$values/length(tnew) ## organize data and apply mFACEs data <- list("y1" = data.frame("subj"= subj1, "argvals" = t1, "y" = y1), "y2" = data.frame("subj"= subj2, "argvals" = t2, "y" = y2), "y3" = data.frame("subj"= subj3, "argvals" = t3, "y" = y3)) fit <- mface.sparse(data, argvals.new = tnew, knots = 5) ## set calculate.scores to TRUE if want to get scores fit <- mface.sparse(data, argvals.new = tnew, knots = 5, calculate.scores = TRUE) scores <- fit$rand_eff$scores ## prediction of several subjects for(i in 1:2){ sel <- lapply(data, function(x){which(x$subj==i)}) dat_i <- mapply(function(data, sel){data[sel,]}, data = data, sel = sel, SIMPLIFY = FALSE) dat_i_pred <- lapply(dat_i, function(x){ data.frame(subj=rep(x$subj[1],nrow(x) + length(tnew)), argvals = c(rep(NA,nrow(x)),tnew), y = rep(NA,nrow(x) + length(tnew))) }) for(j in 1:length(dat_i)){ dat_i_pred[[j]][1:nrow(dat_i[[j]]), ] <- dat_i[[j]] } pred <- predict(fit, dat_i_pred) y_pred <- mapply(function(pred_y.pred, dat_i){ pred_y.pred[nrow(dat_i)+1:length(tnew)]}, pred_y.pred = pred$y.pred, dat_i = dat_i, SIMPLIFY = TRUE) pre <- pred Ylim = c(-12,12) Xlim = c(0,1) Ylab = bquote(y^(1)) Xlab = "t" main = paste("Subject ", dat_i[[1]][1,1],sep="") idx = (nrow(dat_i[[1]])+1):(nrow(dat_i[[1]])+length(tnew)) plot(dat_i[[1]][,"argvals"],dat_i[[1]][,"y"],ylim=Ylim,xlim=Xlim,ylab=Ylab,xlab=Xlab, main=main,cex.lab=2.0,cex.axis = 2.0,cex.main = 2.0,pch=1) lines(tnew,pre$y.pred$y1[idx],col="red",lwd=2) lines(tnew,pre$y.pred$y1[idx]-1.96*pre$se.pred$y1[idx],col="blue",lwd=2,lty=2) lines(tnew,pre$y.pred$y1[idx]+1.96*pre$se.pred$y1[idx],col="blue",lwd=2,lty=2) lines(tnew,x1[i,],col="purple",lwd=2) Ylab = bquote(y^(2)) Xlab = "t" main = paste("Subject ", dat_i[[1]][1,1],sep="") idx = (nrow(dat_i[[2]])+1):(nrow(dat_i[[2]])+length(tnew)) plot(dat_i[[2]][,"argvals"],dat_i[[2]][,"y"],ylim=Ylim,xlim=Xlim,ylab=Ylab,xlab=Xlab, main=main,cex.lab=2.0,cex.axis = 2.0,cex.main = 2.0,pch=1) lines(tnew,pre$y.pred$y2[idx],col="red",lwd=2) lines(tnew,pre$y.pred$y2[idx]-1.96*pre$se.pred$y2[idx],col="blue",lwd=2,lty=2) lines(tnew,pre$y.pred$y2[idx]+1.96*pre$se.pred$y2[idx],col="blue",lwd=2,lty=2) lines(tnew,x2[i,],col="purple",lwd=2) Ylab = bquote(y^(3)) Xlab = "t" main = paste("Subject ", dat_i[[1]][1,1],sep="") idx = (nrow(dat_i[[3]])+1):(nrow(dat_i[[3]])+length(tnew)) plot(dat_i[[3]][,"argvals"],dat_i[[3]][,"y"],ylim=Ylim,xlim=Xlim,ylab=Ylab,xlab=Xlab, main=main,cex.lab=2.0,cex.axis = 2.0,cex.main = 2.0,pch=1) lines(tnew,pre$y.pred$y3[idx],col="red",lwd=2) lines(tnew,pre$y.pred$y3[idx]-1.96*pre$se.pred$y3[idx],col="blue",lwd=2,lty=2) lines(tnew,pre$y.pred$y3[idx]+1.96*pre$se.pred$y3[idx],col="blue",lwd=2,lty=2) lines(tnew,x3[i,],col="purple",lwd=2) }
Internal function.
No return value, called for internal usage
Predict subject-specific curves based on a fit from "mface.sparse".
## S3 method for class 'mface.sparse' predict(object, newdata, calculate.scores = T, ...)
## S3 method for class 'mface.sparse' predict(object, newdata, calculate.scores = T, ...)
object |
a fitted object from the R function "mface.sparse". |
newdata |
a list containing all functional outcomes. Each element is a data frame with three arguments:
(1) |
calculate.scores |
if TRUE, scores will be calculated. |
... |
further arguments passed to or from other methods. |
This function makes prediction based on observed data for each subject. So for each subject,
it requires at least one observed data. For the time points prediction is desired but no observation is available, just make the corresponding data$y
as NA.
object |
A "mface.sparse" fit |
newdata |
Input data |
y.pred , mu.pred , se.pred , Chat.diag.pred , var.error.pred
|
Predicted/estimated objects at the observation time points in |
rand_eff |
if |
... |
... |
Cai Li <[email protected]>
Cai Li, Luo Xiao, and Sheng Luo, 2020. Fast covariance estimation for multivariate sparse functional data. Stat, 9(1), p.e245, doi:10.1002/sta4.245.
# See the examples for "mface.sparse".
# See the examples for "mface.sparse".