Scientific Data Analyst | Chemistry Intelligence, Biomarker Analytics & AI-Driven Workflows
PhD • 10+ years in analytical science, measurement systems, and experimental data • Python • SQL • ML
I build robust, interpretable models for scientific and real-world data — especially where noise, drift, validation, and domain context determine whether machine learning is actually useful.
- PhD in Analytical Chemistry and 10+ years of hands-on work with analytical instrumentation and scientific data
- Experience with LC–MS, GC–MS, HPLC, spectroscopy, sensors, and diagnostics-related workflows
- Research leadership experience in Germany, including interdisciplinary R&D coordination at Hahn-Schickard
- Strong focus on data quality, reproducibility, explainability, and validation-first workflows
- Differentiator: I do not just model data; I understand how the data is generated, how systems drift, and how results must be validated in real workflows
Target roles: Applied Data Scientist, Scientific Data Analyst, or Product Analytics roles in data-heavy environments
Focus areas:
Instrumentation • Diagnostics • Scientific Software • Research Data • Manufacturing • Industry 4.0 / IIoT
Location: Germany (preferably English-first teams)
German: B1
End-to-end cheminformatics workflow for predicting molecular properties such as solubility, toxicity, and blood–brain barrier (BBB) penetration.
Focus: RDKit fingerprints, Random Forest, Lipinski Rule-of-Five checks, Streamlit UI
Repo: https://github.com/alexdbatista/data-science-portfolio/tree/main/featured/toxpred-explainable
Automated breast cancer pathology screening using label-free, quantum cascade laser (QCL) mid-infrared (mid-IR) hyperspectral image cubes.
Focus: out-of-core data pipeline, U-Net spatial segmentation, dimensionality reduction and manifold learning (PCA + UMAP)
Repo: https://github.com/alexdbatista/data-science-portfolio/tree/main/featured/qcl-breast-cancer-diagnostics
Diagnostics-style metabolomics workflow for preprocessing, feature selection, and interpretable modeling to support biomarker prioritization.
Focus: quality-control (QC) mindset, explainable ML, scientific interpretation, “what to validate next”
Repo: https://github.com/alexdbatista/data-science-portfolio/tree/main/featured/metabolomics-biomarker-discovery
Full portfolio: https://github.com/alexdbatista/data-science-portfolio
Python: pandas, NumPy, scikit-learn, PyTorch, SHAP, RDKit, Streamlit
Data: SQL, data cleaning, feature engineering, validation-first workflows
ML: tree-based models, linear models, SVMs, clustering, deep learning for spatial segmentation (U-Net)
Workflow: Git/GitHub, Docker, pytest, VS Code, Linux/Bash
Domain: sensors and instrumentation, chemometrics, assay data, analytical chemistry, physical-world measurement systems
LinkedIn: https://www.linkedin.com/in/alexdbatista/
Email: alex.domin.batista@gmail.com
Portfolio: https://github.com/alexdbatista/data-science-portfolio
Languages: Portuguese (native) • English (fluent) • German (B1)
Academic track record:
- 50 peer-reviewed publications
- ~1,266 citations
- h-index 18
- Humboldt Research Fellow (Ulm University)
- Former Professor (UFU, Brazil)
- Research Group Leader (Hahn-Schickard, Germany)