
realnribal/README.md

Hi there 👋, I'm Henri

Welcome to my GitHub space! I'm a passionate technologist specializing in data science, machine learning, and big data engineering. I transform complex data into actionable insights and build scalable solutions to solve real-world problems.

🔍 Exploring cutting-edge technologies and methodologies
🤝 Collaborating on open-source projects and innovative solutions
💡 Creating impact through data-driven decision making

Feel free to reach out for discussions on data science, ML/AI projects, or the latest tech trends. Let's build something amazing together!


📂 Portfolio

🤖 Machine Learning & AI

Flight Delay Prediction System
End-to-end ML pipeline for predicting flight delays using weather data

  • Technologies: Apache Spark, Scala, MLflow, Docker, GCP Dataproc
  • ML Techniques: PCA feature engineering, Random Forest, k-fold cross-validation
  • Results: 85.8% accuracy with complete CI/CD deployment pipeline
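As a rough illustration of the k-fold cross-validation step listed above, here is a minimal single-machine sketch in plain Python. It is not the project's Spark/Scala code: the function name `k_fold_indices`, the contiguous (unshuffled) fold layout, and the toy sizes are all assumptions made for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds.

    Hypothetical helper: folds are contiguous and unshuffled; a real pipeline
    would usually shuffle (or stratify) the rows first.
    """
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


# Toy usage: 10 samples split into 3 folds.
for train, validation in k_fold_indices(10, 3):
    print(len(train), "training rows, validation rows:", validation)
```

Each sample lands in exactly one validation fold, so averaging a metric over the folds uses every row for validation once.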

SVM Optimization Learning
Advanced implementation of Support Vector Machine optimization techniques

  • Focus: Linear and non-linear separability scenarios
  • Methods: Hinge loss, ramp loss, hard margin optimization
  • Output: Comparative analysis with performance visualizations on 2D synthetic datasets
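To make the difference between the two losses named above concrete, the sketch below evaluates hinge loss and ramp loss on a few signed margins m = y * f(x). It is an illustration only, not the repository's implementation; the clipping parameter `s` and its default of -1 are assumptions.

```python
def hinge_loss(margin: float) -> float:
    """Hinge loss max(0, 1 - m), where m = y * f(x) is the signed margin."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin: float, s: float = -1.0) -> float:
    """Ramp loss: the hinge loss clipped at 1 - s, so outliers have bounded cost.

    s is an assumed clipping parameter (s < 1); with s = -1 the loss is capped at 2.
    """
    return min(1.0 - s, hinge_loss(margin))


# Margins from "well classified" (2.0) to "badly misclassified" (-5.0).
margins = [2.0, 1.0, 0.5, 0.0, -1.0, -5.0]
hinge = [hinge_loss(m) for m in margins]
ramp = [ramp_loss(m) for m in margins]
for m, h, r in zip(margins, hinge, ramp):
    print(f"margin={m:5.1f}  hinge={h:4.1f}  ramp={r:4.1f}")
```

The two losses agree until the margin drops below `s`; past that point the hinge loss keeps growing linearly while the ramp loss stays flat, which is what makes it less sensitive to outliers (at the price of non-convexity).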

Honey Production Analysis & Forecasting
Time series analysis and predictive modeling of US honey production (1998-2012)

PageRank on Apache Spark
Scalable PageRank algorithm implementation with multi-scale Wikipedia graph analysis

  • Technologies: Scala, Apache Spark, GCP Dataproc, GitHub Actions
  • Scale: From wiki-chti (5K pages, 40K edges) to wiki-fr (400K pages, 5M edges)
  • Optimization: Performance comparison between baseline and partition-optimized implementations
  • Analysis: Interactive Jupyter notebooks for comparative performance metrics
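For readers unfamiliar with the algorithm, here is a minimal single-machine PageRank sketch using power iteration. It is not the Spark/Scala implementation described above and does not model partitioning; the damping factor of 0.85, the iteration count, and the toy graph are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}
        # Each page shares its damped rank equally among the pages it links to.
        for src in nodes:
            for dst in out_links[src]:
                new_ranks[dst] += damping * ranks[src] / len(out_links[src])
        ranks = new_ranks
    return ranks


# Hypothetical toy graph: C is linked from A, B and D; D has no inbound links.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]
ranks = pagerank(toy_edges)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In a distributed setting the inner loop becomes a join between the link table and the rank table followed by a reduce by key, which is where partitioning choices start to matter.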

LLM-GreenTune: Eco-Efficient Language Models
Sustainable LLM optimization through distillation, fine-tuning, and compression techniques

  • Distillation: Llama-3.2-3B → 1B student model (temperature-scaled softmax T=2.0, α=0.85)
  • Fine-tuning: LoRA (r=16, α=16) + QLoRA with 4-bit NF4 quantization on financial Q&A (7K samples)
  • Compression: Magnitude pruning + GPTQ quantization achieving 67% memory reduction
  • RAG System: SEC 10-K API, FAISS vector DB, HuggingFace embeddings for financial documents
  • Performance: 85%+ accuracy retention, evaluated with ROUGE, BLEU, and perplexity
  • Deployment: Production-ready Gradio chatbot for real-time financial Q&A
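The distillation settings above (T=2.0, α=0.85) can be read against a common formulation of the distillation loss, sketched below in plain Python. This is an assumption about the general technique, not the project's training code: in particular, which term α weights (here the soft teacher-matching term) and the T² scaling follow the usual convention and are not confirmed by the README.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures flatten the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.85):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy.

    Assumed convention: alpha weights the soft (teacher-matching) term.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(soft_teacher, soft_student) if p > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard_ce


# Toy logits over a 3-token vocabulary (hypothetical values).
teacher = [2.0, 1.0, 0.1]
print(distillation_loss([2.0, 1.0, 0.1], teacher, label=0))  # student matches teacher
print(distillation_loss([0.1, 1.0, 2.0], teacher, label=0))  # student disagrees
```

When the student reproduces the teacher's logits the KL term vanishes and only the down-weighted cross-entropy on the true label remains.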

H&M Fashion Recommendation Pipeline
End-to-end recommendation system for personalized fashion suggestions

  • Dataset: 31M+ transactions, 1.4M customers, 105K articles
  • Algorithm: LightFM with collaborative filtering (WARP/BPR loss functions)
  • Approach: Hybrid model combining collaborative and content-based features
  • Optimization: Grid search hyperparameter tuning
  • Deployment: Streamlit interface for real-time predictions
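Of the two loss functions mentioned above, BPR is the simpler to write down: it penalises cases where an item the customer did not buy scores higher than one they did. The sketch below shows only that pairwise loss in plain Python; it does not use LightFM, and the function name and example scores are hypothetical.

```python
import math


def bpr_loss(positive_score, negative_score):
    """BPR pairwise loss: -log(sigmoid(positive_score - negative_score))."""
    diff = positive_score - negative_score
    # Equivalent to log(1 + exp(-diff)), written to avoid overflow for large |diff|.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical model scores for a purchased item and a sampled non-purchased item.
print(bpr_loss(3.0, 1.0))  # purchased item ranked higher: small loss
print(bpr_loss(1.0, 3.0))  # purchased item ranked lower: larger loss
```

WARP follows the same pairwise idea but keeps sampling negatives until it finds one that violates the ranking, then weights the update by how many samples that took.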

📊 Data Analysis & Visualization

Electric Vehicle Charging Stations Analysis
Comprehensive analysis of EV charging infrastructure

  • Technologies: Python, pandas, data visualization libraries
  • Analysis: Station distribution, usage patterns, and infrastructure insights

☁️ Big Data Engineering

Common Crawl Domain Graph Analysis
Large-scale analysis of web domain relationships from Common Crawl dataset

  • Technologies: Apache Spark, Hadoop
  • Scale: Processing petabytes of web crawl data
  • Focus: Domain graph structure and connectivity patterns

Spark Connected Components Finder
Distributed graph algorithm implementation for finding connected components

  • Algorithm: Connected Components Finder (CCF)
  • Framework: Apache Spark for distributed processing
  • Application: Large-scale graph analysis and network clustering
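The CCF idea can be sketched on a single machine: every node repeatedly learns the smallest node id reachable through its neighbours, until nothing changes. The plain-Python version below imitates the map and reduce phases with a dictionary. It is a simplified illustration, not the repository's Spark code; the function names are hypothetical, and isolated nodes and self-loops are not handled.

```python
from collections import defaultdict


def ccf_iterate(pairs):
    """One iterate-and-deduplicate round over a set of (node, neighbour) pairs."""
    adjacency = defaultdict(set)
    for a, b in pairs:  # "map": emit every pair in both directions
        adjacency[a].add(b)
        adjacency[b].add(a)

    new_pairs, changed = set(), False  # using a set deduplicates emitted pairs
    for node, neighbours in adjacency.items():  # "reduce": one group per node
        smallest = min(neighbours)
        if smallest < node:
            new_pairs.add((node, smallest))
            for other in neighbours:
                if other != smallest:
                    # Tell the other neighbours about the smaller id.
                    new_pairs.add((other, smallest))
                    changed = True
    return new_pairs, changed


def connected_components(edges):
    """Map each node to the smallest node id in its connected component."""
    pairs, changed = set(edges), True
    while changed:
        pairs, changed = ccf_iterate(pairs)
    labels = {node: root for node, root in pairs}
    for root in set(labels.values()):
        labels[root] = root  # component representatives label themselves
    return labels


# Toy graph with two components: {1, 2, 3} and {4, 5}.
components = connected_components([(1, 2), (2, 3), (4, 5)])
print(components)
```

On Spark the dictionary grouping becomes a shuffle by key, and the `changed` flag becomes an accumulator or counter checked between iterations.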

🛠 Technical Skills

💻 Programming & Scripting

Languages: Python • Java • Go • Bash • PowerShell • SQL
Data Formats: YAML • JSON

🤖 Machine Learning & AI

Frameworks: Scikit-learn • MLflow • LightFM • HuggingFace Transformers
Deep Learning: LoRA • QLoRA • Model Distillation • Quantization (GPTQ, NF4)
Techniques: SVM • Random Forest • PCA • Cross-validation • Time Series Forecasting • Recommendation Systems
RAG & Vector DBs: FAISS • LangChain • Semantic Search

📊 Big Data & Analytics

Processing: Apache Spark • Hadoop • Scala
Platforms: Google Cloud Platform (Dataproc) • Databricks
Algorithms: PageRank • Connected Components • Graph Analysis
Tools: Pandas • Jupyter • Data Visualization

⚙️ DevOps & Infrastructure

Containerization: Docker • Podman
CI/CD: Jenkins • GitHub Actions
Automation: Ansible
Cloud: Google Cloud Platform (Dataproc, Compute Engine)
Deployment: Gradio • Streamlit

🐧 System Administration

OS: Ubuntu • Gentoo
Tools: systemd • Bash scripting • Network Configuration
Virtualization: VirtualBox

🌐 Networking & Security

Protocols: TCP/IP • DNS • DHCP • HTTP/S
Security: Wireshark
Automation: Ansible

🔧 Development Tools

Version Control: Git • GitHub • GitLab
IDEs: VS Code • PyCharm • Vim
Documentation: Markdown • Sphinx


📈 GitHub Stats

[Image: GitHub stats card]

Popular repositories

  1. git_practice (Public)

  2. realnribal.github.io (Public)

    Ruby

  3. Graph_using_Matplotlib (Public)

    Jupyter Notebook

  4. seaborn (Public)

    Jupyter Notebook

  5. AI-Projects (Public)

    Some Data Science Stuff using Python

    Jupyter Notebook

  6. henri_balamou_portofolio (Public)

    A brief description of the various projects I have worked on