Incorrect Labels in Hadoop Log Data

Recently, I began working on a demo for our log analysis tool, [LogDelta](https://github.com/EvoTestOps/LogDelta), using your [Hadoop](https://github.com/logpai/loghub/tree/master/Hadoop) dataset. While building the demo, I grew increasingly suspicious of certain labels in the Hadoop data. What started as a simple demo turned into a label investigation that took far more effort than I had anticipated.

I focused solely on the PageRank application, so the WordCount application may still contain incorrect labels that I have not checked. The incorrect labels I identified, along with their fixes, are listed below:

| ID                  | Orig Label   | Fixed Label  |
| ------------------- | ------------ | ------------ |
| 1445144423722\_0024 | Normal       | Disk Full    |
| 1445182159119\_0017 | Machine Down | Normal       |
| 1445062781478\_0020 | Machine Down | Normal       |
| 1445182151478\_0015 | Machine Down | Disk Full    |
| 1445182159119\_0013 | Disk Full    | Machine Down |
| 1445182159119\_0011 | Disk Full    | Machine Down |
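For anyone who wants to apply these fixes programmatically, here is a minimal sketch in Python. It assumes the labels have already been loaded into a plain dict keyed by the IDs exactly as written in the table. The names `CORRECTIONS` and `apply_corrections` are hypothetical, and the sketch does not reflect how the dataset's label file is actually stored or parsed.

```python
# Corrections copied from the table above: ID -> (original label, fixed label).
CORRECTIONS = {
    "1445144423722_0024": ("Normal", "Disk Full"),
    "1445182159119_0017": ("Machine Down", "Normal"),
    "1445062781478_0020": ("Machine Down", "Normal"),
    "1445182151478_0015": ("Machine Down", "Disk Full"),
    "1445182159119_0013": ("Disk Full", "Machine Down"),
    "1445182159119_0011": ("Disk Full", "Machine Down"),
}


def apply_corrections(labels, corrections=CORRECTIONS):
    """Return a copy of `labels` with the corrections applied.

    `labels` is assumed to be a dict mapping ID -> label string.
    IDs missing from `labels` are skipped. If an ID is present but its
    current label differs from the expected original label, a ValueError
    is raised so that a mismatch is not silently overwritten.
    """
    fixed = dict(labels)
    for app_id, (orig_label, fixed_label) in corrections.items():
        if app_id not in fixed:
            continue
        if fixed[app_id] != orig_label:
            raise ValueError(
                f"{app_id}: expected {orig_label!r}, found {fixed[app_id]!r}"
            )
        fixed[app_id] = fixed_label
    return fixed
```

The check against the original label is deliberate: if the labels are ever updated upstream, the function fails loudly instead of applying a fix twice or to the wrong entry.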

If you're curious about how I reached these conclusions, the process is documented in a [YouTube playlist](https://www.youtube.com/playlist?list=PLTUjKYPvVhe6JhHBlkJN_yPhVDR5w2ej2). 

- The key part of the label correction is covered in the final [video](https://youtu.be/2GWZob7K5h0).
- The earlier videos provide details on how the suspicions began to arise.
- I have also shared the [text script](https://github.com/EvoTestOps/LogDelta/blob/main/demo/label_investigation/video_script.md) of the video, which includes some visuals.
