
Job Details

MD Anderson Cancer Center

Institute Research Scientist - Computational Biology / Data Engineering





Houston, Texas, United States

The University of Texas MD Anderson Cancer Center aims to eliminate cancer in Texas, the nation, and the world, through outstanding programs that integrate patient care, research, and prevention, and through education for undergraduate and graduate students, trainees, professionals, employees, and the public.

MD Anderson Therapeutics Discovery Division
Within The University of Texas MD Anderson Cancer Center lies the Therapeutics Discovery Division (TDD), a powerful engine driving the future of new targeted, immune- and cell-based therapies. Therapeutics Discovery eliminates the bottlenecks that hamper traditional drug discovery by employing a multidisciplinary team of dedicated researchers, doctors, drug developers, and scientific experts working together to develop small-molecule drugs, biologics, and cellular therapies. Our unique structure and collaborative approach allow the team to work with agility, bringing novel medicines from concept to clinic quickly and efficiently - all under the same roof.

The TRACTION platform
The Translational Research to AdvanCe Therapeutics and Innovation in ONcology (TRACTION) platform is an industrialized translational research group that aligns world-class drug discovery and development with the highly innovative science and clinical care research for which MD Anderson Cancer Center is known. Through an investment in patient-centric research, we have developed the infrastructure, platforms, and capabilities to enable transformative research. TRACTION's approach combines innovative cancer genetics, disruptive technologies, deep mechanistic biology, disease modeling, and pharmacology to accelerate the translation of novel discoveries into definitive clinical hypotheses. By partnering with the drug discovery engines within Therapeutics Discovery, we aim to advance a portfolio of small molecules, biologics, and cell therapies for our patients. We work in a fast-paced, milestone-driven environment with a focus on team science and interdisciplinary research. Our unique approach has created a biotech-like engine within the walls of the nation's leading cancer center to bring life-saving medicines to our patients more quickly and effectively.

We are seeking a highly skilled and detail-oriented data engineer to join our team as a Research Scientist of Computational Biology. This position will be responsible for the development, deployment, and maintenance of reproducible data analytics pipelines and interactive visualizations for large-scale datasets, improving their Findability, Accessibility, Interoperability, and Reuse (FAIR). The ideal candidate will coordinate the application and development of cutting-edge tools and methodologies to integrate diverse internal and public data assets, transforming data into knowledge that enables the advancement of Institute projects. The candidate works closely with biologists to identify and recommend analytical opportunities leading to the discovery of tumor-related genes and pathways to further advance scientific discovery and clinical therapeutic drug development. Success will be measured by the ability to drive the application of computational biology tools for data processing and integration within a highly collaborative, team-science environment to enable hypothesis-driven testing of oncology therapeutics and/or biomarker strategies in the clinical setting. Overall, the improvement of the team's data assets will have a rapid and direct impact on patient care.

Key Functions

1. Design, build, and test systems to reproducibly process, integrate, and interactively visualize large-scale datasets.
2. Deploy analytical workflows to support Institute projects and deliver data packages under defined timelines.
3. Coordinate and design molecular profiling studies with biologists, capturing metadata, managing their execution, and identifying novel analytical approaches that lead to mechanistic insight into compound or gene perturbations to inform on biomarkers for clinical applications.
4. Gain proficiency in the statistical methods and software solutions for next-gen sequencing and other large-scale data types and be able to adapt and deploy these techniques for internal purposes.
5. Maintain cleanly written code in GitLab and collaborate with team members to further optimize it.
6. Document development of analytical approaches with computational notebook applications (e.g., Jupyter Notebooks).
7. Develop strong collaborative relationships with internal and external groups.
8. Interpret, present, and report research findings at internal meetings and external scientific conferences.
9. Use independent thinking skills to manage resources.
10. Work well under pressure and drive projects that impact critical timelines.


Required: Master's degree in biology, biochemistry, molecular biology, cell biology, enzymology, pharmacology, chemistry or related field.

Preferred: Ph.D. in Computer Science, Engineering, Applied Mathematics, Biostatistics or a related discipline from an accredited university.


Required: Six years of relevant research experience in a laboratory. With a PhD in a natural science or a medical degree, two years of experience required.

The preferred candidate will possess the following:

1. MD or PhD with >3 years of relevant post-degree experience in a pharmaceutical/biotech environment.
2. Extensive experience with code management (GitLab or GitHub) and computational notebook (Jupyter Notebook, etc.) solutions.
3. Evidence of successfully contributing or leading software development projects in a professional environment.
4. Familiarity with the FAIR guiding principles and their application to data science projects.
5. Evidence of proficiency in programming languages (Python, R, JavaScript), scripting languages (Bash), high-performance computing, and database management systems.
6. Experience with cloud computing, container images, reproducible workflows, and interactive visualization.
7. Experience with machine-learning and/or data-mining algorithms (e.g., clustering, classification), and experience using common parametric and non-parametric statistical tests (e.g., t-test, ANOVA, Wilcoxon signed-rank test, Fisher's exact test) for data analysis.
8. Evidence of ability to develop or comprehensively assess statistical algorithms for the analysis of large multidimensional datasets, successful manipulation of large-volume datasets, and experience with high-performance computing are essential.
9. Knowledge and experience in areas of genomics, next-gen sequencing analytics (alignment tools, mutational variant callers, ChIP-seq, etc.), pathway analysis, and network analysis.
10. Outstanding organizational skills and the ability to effectively present results and conclusions to co-workers, collaborators, and managers.

It is the policy of The University of Texas MD Anderson Cancer Center to provide equal employment opportunity without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, protected veteran status, genetic information, or any other basis protected by institutional policy or by federal, state or local laws unless such distinction is required by law.

  • Requisition ID: 144195
  • Employment Status: Full-Time
  • Employee Status: Regular
  • FLSA: exempt and not eligible for overtime pay
  • Work Week: Days
  • Fund Type: Soft
  • Pivotal Position: Yes
  • Minimum Salary: US Dollar (USD) 98,000
  • Midpoint Salary: US Dollar (USD) 122,500
  • Maximum Salary: US Dollar (USD) 147,000
  • Science Jobs: Yes