Proteomics is nearly 30 years old. Increasingly sophisticated instruments and ever-improving methodologies keep pushing the boundaries of proteomic studies, and recent strides in precision medicine continue to expand the horizon of this burgeoning discipline.
1. Experiment Design
Traditional proteomics is oriented mainly toward basic research, and many such studies use cell lines, typically designed as one or more experimental groups compared against a control group. This kind of proteomics workflow is well established, exemplified by Multidimensional Protein Identification Technology (MudPIT).
The rapid growth of precision medicine and clinical proteomics brings opportunities and challenges alike. As a major route to precision medicine, proteomics plays an important role in clinical research, and the efficient experimental workflows of the past no longer suffice. Multi-cohort experiments, diverse sample types, analysis of large cohorts, joint analysis of data from different acquisition modes, and AI deep learning powered by proteomic big data pose problems never before seen in proteomics research. Data stability, batch effects, and missing values arising during sample collection and analysis are daunting challenges for researchers.
(1) Non-parallelism and instability in the sample preparation and MS analysis stages compromise the reproducibility of data from large-cohort experiments and impair the accuracy of the results.
(2) Batch effects: different experimental and reagent batches, different operators, and different instruments inevitably introduce batch effects, though these can be reduced. Severe batch effects compromise quantitative accuracy and ultimately bias the results.
(3) Missing values: proteomics data contain values that are Missing Not At Random (MNAR) and Missing Completely At Random (MCAR). MNAR values reflect proteins that are absent or present at low abundance in a sample, while MCAR values arise from mass spectrometry errors or data-retrieval errors. To obtain more reliable results, researchers apply various imputation strategies for the missing values (e.g., filling with the mean, maximum, or minimum) so that the data better reflect reality and support downstream analyses (a minimal sketch of two such strategies follows this list).
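By way of illustration only (the source does not describe a specific pipeline), here is a minimal Python sketch of the two imputation strategies named above, assuming a pandas protein-by-sample intensity matrix; the helper name `impute_intensities` is hypothetical:

```python
import numpy as np
import pandas as pd

def impute_intensities(df: pd.DataFrame, strategy: str = "min") -> pd.DataFrame:
    """Fill missing values in a protein x sample intensity matrix.

    'min' reflects a common MNAR assumption (protein at or below the
    detection limit); 'mean' is a simple choice for values assumed MCAR.
    """
    if strategy == "min":
        fill = df.min(axis=1)   # per-protein observed minimum
    elif strategy == "mean":
        fill = df.mean(axis=1)  # per-protein observed mean
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return df.apply(lambda row: row.fillna(fill[row.name]), axis=1)

# Toy matrix: 3 proteins x 3 samples with missing intensities.
mat = pd.DataFrame(
    {"s1": [10.2, np.nan, 8.1], "s2": [9.8, 7.5, np.nan], "s3": [np.nan, 7.9, 8.4]},
    index=["P1", "P2", "P3"],
)
print(impute_intensities(mat, strategy="min"))
```

Which strategy is appropriate depends on why a value is missing, which is exactly why no single imputation method is universally best.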
In theory there is no single best solution, and the most appropriate method emerges through continued exploration. Meticulous in-house validation is certainly a sound route, but in the interest of time it is advisable to rely on proven technical workflows, such as those of Westlake Omics.
Westlake Omics possesses mature proteomics technology. Its scientific service platform enforces strict quality control at every stage of the experimental workflow, yielding high-quality, stable MS data. The company offers a variety of methods to assess batch effects and solutions to correct them; applied at the experimental design stage, these tools keep batch effects well under control (a generic sketch of one common correction appears below). As for missing values, Westlake Omics is committed to minimizing losses from technical errors and has devised solutions for both MNAR and MCAR to preserve data authenticity.
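The source does not name Westlake Omics' specific correction methods; as a generic illustration, one common approach is to remove additive per-batch shifts by median centering. A minimal sketch on a sample-by-protein matrix of log intensities:

```python
import pandas as pd

def median_center_by_batch(df: pd.DataFrame, batches: pd.Series) -> pd.DataFrame:
    """Remove per-batch location shifts from a sample x protein matrix.

    Subtracts each batch's per-protein median and adds back the global
    median; a simple correction when batch effects are additive shifts
    on the log scale.
    """
    global_median = df.median()  # per-protein global median
    corrected = df.copy()
    for batch in batches.unique():
        idx = batches == batch
        corrected.loc[idx] = df.loc[idx] - df.loc[idx].median() + global_median
    return corrected

# Toy log-intensity matrix: 4 samples x 2 proteins from two batches.
data = pd.DataFrame(
    [[10.0, 8.0], [10.2, 8.1], [11.0, 9.0], [11.1, 9.2]],
    index=["s1", "s2", "s3", "s4"], columns=["P1", "P2"],
)
batch = pd.Series(["A", "A", "B", "B"], index=data.index)
print(median_center_by_batch(data, batch))
```

More elaborate corrections (e.g., ComBat-style empirical Bayes methods) follow the same idea of estimating and removing batch-specific parameters.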
2. Sample Preparation
Clinical tissue samples are often extremely minute and not readily available, and traditional manual grinding often fails to deliver the expected results because of large sample losses and poor parallelism. Clinical samples are also highly diverse, ranging from relatively well-studied tissues such as liver, kidney, lung, muscle, and brain to rare types such as hair, nail, stool, and bone, from which proteins are difficult to extract by conventional grinding. These rare sample types are increasingly recognized as important research targets. Westlake Omics' PCT-based (pressure cycling technology), high-throughput, highly reproducible micro-sample preparation maintains good protein yields and experimental parallelism, and it is compatible with a wide range of sample types, delivering accurate data and output.
3. MS Analysis Technology
In the past, labeling reagents such as TMT and iTRAQ were widely used in research-oriented quantitative proteomics. However, they support simultaneous comparison of only several to a dozen or so samples, and even when a pooled sample serves as a reference across large cohorts, batch effects persist. Clinical high-throughput proteomics has therefore shifted toward data-independent acquisition (DIA). With its patented Pulse DIA, Westlake Omics further improves the depth of protein identification over conventional DIA. Its algorithm-based cohort design, PCT sample preparation technology, and tailor-made data analysis services are all adapted to the requirements of clinical proteomics.
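One simple way to see why large cohorts stress quantitative stability is to track per-protein coefficients of variation (CVs) across repeated injections of a pooled quality-control sample; a minimal sketch (illustrative, not a described part of the Pulse DIA workflow):

```python
import pandas as pd

def protein_cv(qc: pd.DataFrame) -> pd.Series:
    """Coefficient of variation (%) per protein across repeated QC runs.

    Expects a protein x run matrix of raw (non-log) intensities; lower
    CVs indicate more stable quantification across the cohort.
    """
    return 100.0 * qc.std(axis=1) / qc.mean(axis=1)

# Toy matrix: 2 proteins quantified in 3 pooled-QC injections.
qc_runs = pd.DataFrame(
    {"run1": [1000.0, 520.0], "run2": [980.0, 610.0], "run3": [1050.0, 480.0]},
    index=["P1", "P2"],
)
print(protein_cv(qc_runs).round(1))
```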
4. Proteomic Big Data and AI Deep Learning for Clinical Disease Diagnosis
Westlake Omics introduces AI machine learning and integrates it with proteomic big data to advance precision medicine.
In traditional analyses, researchers tend to focus on the relationship between one or a few biomarkers and the label of interest, using direct and straightforward methods such as differential expression analysis. Clinical problems, however, are often complex: proteins are inextricably linked to one another and to genes, and traditional methods have clear limitations. First, relatively deep machine learning models that capture these underlying relationships can address problems that simple analyses cannot. Second, traditional biomarker studies mostly rely on integrating information already reported in the literature, whereas proteomic big data contain vastly more information; machine learning can help researchers uncover previously unknown biomarkers and the connections among them, exploring corners that biologists have not yet reached and complementing expert knowledge. Third, machine learning on complex features such as protein interaction networks and protein signaling pathways can characterize the onset and progression of disease more systematically and comprehensively. Most importantly, as data accumulate, increasingly sophisticated models can be built that better handle previously intractable ambiguous cases, such as the classification of thyroid nodules and type II diabetes.
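The source names no particular model; as an illustrative sketch, a cross-validated random forest on a protein feature matrix shows the general workflow (synthetic data, scikit-learn):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic cohort: 100 samples x 500 protein features, binary diagnosis.
X = rng.normal(size=(100, 500))
y = rng.integers(0, 2, size=100)
X[y == 1, :10] += 1.0  # make the first 10 proteins weakly discriminative

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")

# After fitting, feature importances point to candidate marker panels.
clf.fit(X, y)
top = np.argsort(clf.feature_importances_)[::-1][:10]
print("top protein feature indices:", top)
```

In practice the feature matrix would come from the quantified proteome, possibly augmented with network- or pathway-derived features, and any candidate markers would require independent validation.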
Westlake Omics has accumulated many years of experience in high-throughput, high-stability proteomic sample preparation and data analysis, having generated and analyzed over 100,000 proteomes. Integrating this expertise with AI deep learning, the company has established an AI-enabled proteomics technology platform. Its professional design of large-cohort studies supports deep, high-throughput proteomic analysis of minute and diverse clinical tissue samples, addressing the pain points of clinical-sample proteomics and assisting clinical diagnosis.