realnribal/Analyze-Common-Crawl-Data

Common Crawl Domain Analysis

Overview

This project analyzes domain data from the Common Crawl dataset. Common Crawl is a non-profit organization that maintains petabytes of crawled web content and makes it freely available for research and educational purposes.
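Here is a minimal sketch of one domain-level analysis: counting how often each host appears in a list of crawled URLs. It assumes the input is a plain list of URL strings, such as the `url` field of index records. The function name `count_domains` is hypothetical and is not taken from this repository's `src/` code. Dropping a leading `www.` is also an assumed normalization choice.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_domains(urls):
    """Count URLs per host (hypothetical helper, not the project's own code).

    Hosts are lowercased and stripped of any port by urlsplit().hostname;
    a leading "www." is removed as an assumed normalization step.
    Strings without a network location (hostname is None) are skipped.
    """
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if host:
            counts[host.removeprefix("www.")] += 1
    return counts
```

With a larger URL list, `count_domains(urls).most_common(10)` returns the ten most frequent hosts and their counts.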

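About Common Crawl

Common Crawl crawls, archives, and analyses content from a large sample of public websites. Each crawl is released as raw page captures (WARC files), extracted metadata (WAT files), and extracted plain text (WET files), together with a URL index.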
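The URL index can be queried over HTTP through Common Crawl's CDX index server at index.commoncrawl.org. The sketch below builds a query URL for every capture under a domain and parses the JSON-lines response. It is not this repository's code. The crawl ID `CC-MAIN-2024-10` is only an example (available crawls are listed at https://index.commoncrawl.org/collinfo.json), and the helper names `build_index_query` and `parse_cdx_lines` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Base URL of Common Crawl's public CDX index server.
INDEX_SERVER = "https://index.commoncrawl.org"


def build_index_query(crawl_id: str, domain: str, limit: int = 100) -> str:
    """Build a CDX query URL for captures of `domain` and its subdomains.

    `crawl_id` is an example value such as "CC-MAIN-2024-10" (assumption:
    pick a current ID from collinfo.json).
    """
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains of `domain`
        "output": "json",       # server returns one JSON object per line
        "limit": limit,         # cap the number of returned records
    }
    return f"{INDEX_SERVER}/{crawl_id}-index?{urlencode(params)}"


def parse_cdx_lines(text: str) -> list[dict]:
    """Parse a JSON-lines CDX response body, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

To run a live query, fetch the URL (for example with `urllib.request.urlopen(url).read().decode()`) and pass the body to `parse_cdx_lines`. The index server may rate-limit heavy use, so keep `limit` small while experimenting.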

Structure

  • data/: Raw and processed data
  • notebooks/: Analysis notebooks
  • src/: Source code

Setup

  1. Install requirements: pip install -r requirements.txt
  2. Run Jupyter notebook: jupyter notebook notebooks/analysis.ipynb

Usage

[Add usage instructions here]