"The Use of High-Dimensional Sparse Structural Equation Modeling" - An Engineering Article for the University of Miami



By Nancy Abramson | 06/17/19

Structural equation modeling (SEM) is a powerful and flexible toolbox for statistical inference – the practice of drawing conclusions about a population, typically on the basis of a random sample. SEM combines mathematical models, computer algorithms and statistical methods that serve as tools for that inference. However, despite the flexibility of structural equation models (SEMs), few inference methods are efficient and effective enough to handle problems with the many variables and possible values that commonly occur in contemporary fields such as genomics. Dr. Xiaodong Cai, a professor in the Department of Electrical and Computer Engineering, is developing more effective SEM methods that can be used in modern, highly complex research – specifically in genomics. The National Institute of General Medical Sciences is sponsoring his research.

What is SEM?

SEM is often used, particularly in the social sciences, to infer relationships between unobserved constructs from measurements of observable variables. That’s where the inference comes in: using these SEM math models, researchers can infer something unobservable from what has been observed. SEM has well-documented merits in biology, ecology, economics, psychology and social sciences.

Consider human intelligence, for example. Intelligence cannot be measured directly the way height or weight can be – it is an unobservable, or latent, variable. So, instead, psychologists develop a hypothesis of intelligence. They can then come up with measurement instruments (perhaps test questions designed to measure intelligence). Then, they use SEM to test their hypothesis using data gathered from people who took their intelligence test. The answers to the test questions are the observed variables, and the psychologists use SEM to infer, from those observed test results, something about the test taker’s level of intelligence.
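The intelligence-test example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a method described in this article: the four test questions, their loadings and the sample size are all hypothetical, the data are simulated, and the loading is estimated with a simple correlation "triad" formula that holds for a one-factor model rather than with a full SEM fit.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000                                    # hypothetical number of test takers
loadings = np.array([0.9, 0.8, 0.7, 0.6])   # hypothetical strength of each question

# The latent variable (never observed in a real study) and the observed scores.
latent = rng.standard_normal(n)
noise = rng.standard_normal((n, 4)) * np.sqrt(1 - loadings**2)
scores = latent[:, None] * loadings + noise

# Under a one-factor model, corr(q_i, q_j) = loading_i * loading_j, so the
# loading of question 1 can be recovered from observed correlations alone.
r = np.corrcoef(scores, rowvar=False)
est_loading_q1 = np.sqrt(r[0, 1] * r[0, 2] / r[1, 2])

# A crude stand-in for the latent score: the average of the observed answers.
composite = scores.mean(axis=1)
corr_with_latent = np.corrcoef(composite, latent)[0, 1]

print(f"estimated loading for question 1: {est_loading_q1:.2f}")
print(f"correlation of composite score with latent variable: {corr_with_latent:.2f}")
```

The point of the sketch is that nothing in the estimate uses the latent variable itself; it is used only at the end, to check how well the observed answers track it.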

Taking SEM High-Dimensional

Much current research, however, deals with problems that contain far more variables and far more possible values than an intelligence test. Fully describing such a problem requires many coordinates – such data is known as high-dimensional data, and it requires high-dimensional statistics, and therefore high-dimensional SEMs.
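To make "high-dimensional" concrete, the sketch below simulates a problem with more variables (300) than observations (100), where only three variables actually matter. It then uses the lasso, a generic sparse-regression technique, to pick them out. This is an assumption-laden illustration and not Cai's method, which this article does not describe: the data, the dimensions, the penalty `lam` and the helper `lasso_ista` are all hypothetical.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by iterative soft-thresholding."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n        # gradient of the least-squares term
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

rng = np.random.default_rng(0)
n, p = 100, 300                             # fewer observations than variables
X = rng.standard_normal((n, p))
true_b = np.zeros(p)
true_b[[3, 57, 120]] = [2.0, -1.5, 1.0]     # only three variables have an effect
y = X @ true_b + 0.1 * rng.standard_normal(n)

b_hat = lasso_ista(X, y, lam=0.2)
support = np.flatnonzero(np.abs(b_hat) > 0.1)
print("variables selected:", support.tolist())
```

With more unknowns than observations, ordinary least squares has no unique solution; the sparsity penalty is what makes the problem well posed, at the cost of shrinking the estimated effects toward zero.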

Cai says a truly focused effort is required to make necessary breakthroughs in high-dimensional SEMs and demonstrate their suitability in emerging research areas. He is working to develop more accurate and efficient inference methods for high-dimensional SEMs for use in inference of gene networks and optimized strategies for chemical genomics.

To do this, Cai and his team will first use novel algorithmic techniques and parallel computing to develop a set of efficient and robust inference methods for high-dimensional SEMs. Next, they will use those techniques to make SEM-based inferences about genes’ regulatory networks. They will then apply those inferences to optimized chemical genomics by integrating multiple types of data under the SEM framework. Finally, they will use experiments to test whether their inferences were correct.

“This research could have broad impact on human health,” Cai says. “It could help researchers understand the role of genes and their interactions in various diseases. That would enable the construction of more comprehensive small molecule libraries for use in new therapeutics.” He also notes that the applications will go beyond genomics, paving the way for using high-dimensional SEMs in areas including economics, psychology, ecology, bio behavior and other social sciences.
