realnribal

Hi there 👋, I'm Henri

Welcome to my GitHub space! I'm a passionate technologist specializing in data science, machine learning, and big data engineering. I transform complex data into actionable insights and build scalable solutions to solve real-world problems.

🔍 Exploring cutting-edge technologies and methodologies
🤝 Collaborating on open-source projects and innovative solutions
💡 Creating impact through data-driven decision making

Feel free to reach out for discussions on data science, ML/AI projects, or the latest tech trends. Let's build something amazing together!

📂 Portfolio

🤖 Machine Learning & AI

Flight Delay Prediction System
End-to-end ML pipeline for predicting flight delays using weather data

Technologies: Apache Spark, Scala, MLflow, Docker, GCP Dataproc
ML Techniques: PCA feature engineering, Random Forest, k-fold cross-validation
Results: 85.8% accuracy, deployed through a complete CI/CD pipeline

SVM Optimization Learning
Advanced implementation of Support Vector Machine optimization techniques

Focus: Linear and non-linear separability scenarios
Methods: Hinge loss, ramp loss, hard margin optimization
Output: Comparative analysis with performance visualizations on 2D synthetic datasets
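
To make the difference between the two losses concrete, here is a minimal sketch in plain Python. It is not code from the repository: the function names are hypothetical, and the ramp loss is assumed to be the common "hinge clipped at 1" variant, which may differ from the exact formulation used in the project.

```python
def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a raw decision score."""
    return max(0.0, 1.0 - y * score)


def ramp_loss(y, score):
    """Ramp loss (assumed variant): the hinge loss clipped at 1."""
    # Clipping bounds the cost of badly misclassified points (outliers).
    return min(1.0, hinge_loss(y, score))


if __name__ == "__main__":
    # Compare both losses for a positive example at several decision scores.
    for score in (-2.0, 0.5, 2.0):
        print(score, hinge_loss(+1, score), ramp_loss(+1, score))
```

The two losses agree inside the margin and for correctly classified points. They differ only for strongly misclassified points, where the ramp loss stops growing.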

Honey Production Analysis & Forecasting
Time series analysis and predictive modeling of US honey production (1998-2012)

Objective: Forecast honey production trends for upcoming years
Repository: Part of AI Projects Collection

PageRank on Apache Spark
Scalable PageRank algorithm implementation with multi-scale Wikipedia graph analysis

Technologies: Scala, Apache Spark, GCP Dataproc, GitHub Actions
Scale: From wiki-chti (5K pages, 40K edges) to wiki-fr (400K pages, 5M edges)
Optimization: Performance comparison between baseline and partition-optimized implementations
Analysis: Interactive Jupyter notebooks for comparative performance metrics
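
For readers unfamiliar with the algorithm, the sketch below shows the core PageRank update as a single-machine power iteration in plain Python. It is an illustration only, not the Scala/Spark implementation: the function name, the damping factor of 0.85, and the even redistribution of rank from pages without outgoing links are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly (assumption).
        dangling = sum(ranks[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in nodes}
        # Every page splits its damped rank equally among its targets.
        for src in nodes:
            if out_links[src]:
                share = damping * ranks[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_ranks[dst] += share
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    print(pagerank([("a", "b"), ("c", "b"), ("b", "a")]))
```

In a distributed setting the inner loop becomes a join between the link table and the rank table followed by a reduce by target page, which is why partitioning choices matter for performance.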

LLM-GreenTune: Eco-Efficient Language Models
Sustainable LLM optimization through distillation, fine-tuning, and compression techniques

Distillation: Llama-3.2-3B → 1B student model (temperature-scaled softmax T=2.0, α=0.85)
Fine-tuning: LoRA (r=16, α=16) + QLoRA with 4-bit NF4 quantization on financial Q&A (7K samples)
Compression: Magnitude pruning + GPTQ quantization achieving 67% memory reduction
RAG System: SEC 10-K API, FAISS vector DB, HuggingFace embeddings for financial documents
Performance: 85%+ accuracy retention, evaluated with ROUGE, BLEU, and perplexity
Deployment: Production-ready Gradio chatbot for real-time financial Q&A
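
The distillation settings above (T=2.0, α=0.85) can be read through the standard knowledge-distillation objective. The sketch below is a hedged illustration in plain Python, not the project's training code: it assumes α weights the soft-target KL term and 1 − α weights the hard-label cross-entropy, and that the KL term is scaled by T², which is a common convention. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.85):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(p * math.log(p / q)
             for p, q in zip(soft_teacher, soft_student) if p > 0.0)
    # Ordinary cross-entropy against the ground-truth label (temperature 1).
    hard_ce = -math.log(softmax(student_logits)[label])
    # Assumption: alpha weights the soft term; T^2 keeps gradient scales comparable.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard_ce


if __name__ == "__main__":
    print(distillation_loss([1.0, 0.5, 0.2], [2.0, 1.0, 0.1], label=0))
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the hard-label term remains.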

H&M Fashion Recommendation Pipeline
End-to-end recommendation system for personalized fashion suggestions

Dataset: 31M+ transactions, 1.4M customers, 105K articles
Algorithm: LightFM with collaborative filtering (WARP/BPR loss functions)
Approach: Hybrid model combining collaborative and content-based features
Optimization: Grid search hyperparameter tuning
Deployment: Streamlit interface for real-time predictions
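
As a pointer to what the BPR option optimizes, here is a minimal sketch of the textbook Bayesian Personalized Ranking loss for one (user, purchased item, unpurchased item) triple. It is written in plain Python for illustration and is not LightFM's internal implementation; the function name and the example scores are hypothetical.

```python
import math


def bpr_loss(positive_score, negative_score):
    """BPR loss for one triple: -log(sigmoid(positive_score - negative_score))."""
    margin = positive_score - negative_score
    # Computed as softplus(-margin) in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss shrinks as the purchased item is scored further above the other one.
    print(bpr_loss(3.0, 1.0), bpr_loss(1.0, 3.0))
```

WARP follows the same pairwise idea but samples negatives until it finds one that violates the ranking, which puts more weight on the top of the list.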

📊 Data Analysis & Visualization

Electric Vehicle Charging Stations Analysis
Comprehensive analysis of EV charging infrastructure

Technologies: Python, pandas, data visualization libraries
Analysis: Station distribution, usage patterns, and infrastructure insights

☁️ Big Data Engineering

Common Crawl Domain Graph Analysis
Large-scale analysis of web domain relationships from Common Crawl dataset

Technologies: Apache Spark, Hadoop
Scale: Processing petabytes of web crawl data
Focus: Domain graph structure and connectivity patterns

Spark Connected Components Finder
Distributed graph algorithm implementation for finding connected components

Algorithm: Connected Components Finder (CCF)
Framework: Apache Spark for distributed processing
Application: Large-scale graph analysis and network clustering
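
The iterate-and-deduplicate idea behind CCF can be sketched on a single machine in plain Python. This is an illustration under assumptions, not the Spark code from the repository: the function names are hypothetical, a Python set stands in for the deduplication step, and node identifiers are assumed to be comparable (for example integers).

```python
from collections import defaultdict


def ccf_iterate(pairs):
    """One CCF round: link each node and its neighbours to the smallest neighbour."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        # "Map" step: emit the pair in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)

    emitted = set()  # using a set plays the role of the dedup step
    new_pairs = 0
    for node, adjacent in neighbours.items():
        smallest = min(adjacent)
        if smallest < node:
            emitted.add((node, smallest))
            for other in adjacent:
                if other != smallest:
                    emitted.add((other, smallest))
                    new_pairs += 1
    return emitted, new_pairs


def connected_components(edges):
    """Repeat rounds until no new pair appears; map each node to its component id."""
    pairs = set(edges)
    while True:
        pairs, new_pairs = ccf_iterate(pairs)
        if new_pairs == 0:
            break
    labels = {node: root for node, root in pairs}
    # The smallest node of each component labels itself.
    for root in set(labels.values()):
        labels.setdefault(root, root)
    return labels


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

Each component ends up identified by its smallest node. In Spark, the grouping by node becomes a shuffle and the round counter is typically tracked with an accumulator.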

🛠 Technical Skills

💻 Programming & Scripting

Languages: Python • Java • Go • Bash • PowerShell • SQL
Data Formats: YAML • JSON

🤖 Machine Learning & AI

Frameworks: Scikit-learn • MLflow • LightFM • HuggingFace Transformers
Deep Learning: LoRA • QLoRA • Model Distillation • Quantization (GPTQ, NF4)
Techniques: SVM • Random Forest • PCA • Cross-validation • Time Series Forecasting • Recommendation Systems
RAG & Vector DBs: FAISS • LangChain • Semantic Search

📊 Big Data & Analytics

Processing: Apache Spark • Hadoop • Scala
Platforms: Google Cloud Platform (Dataproc) • Databricks
Algorithms: PageRank • Connected Components • Graph Analysis
Tools: Pandas • Jupyter • Data Visualization

⚙️ DevOps & Infrastructure

Containerization: Docker • Podman
CI/CD: Jenkins • GitHub Actions
Automation: Ansible
Cloud: Google Cloud Platform (Dataproc, Compute Engine)
Deployment: Gradio • Streamlit

🐧 System Administration

OS: Ubuntu • Gentoo
Tools: systemd • Bash scripting • Network Configuration
Virtualization: VirtualBox

🌐 Networking & Security

Protocols: TCP/IP • DNS • DHCP • HTTP/S
Security: Wireshark
Automation: Ansible

🔧 Development Tools

Version Control: Git • GitHub • GitLab
IDEs: VS Code • PyCharm • Vim
Documentation: Markdown • Sphinx
