Data Compression
A project exploring different compression algorithms and approaches, such as Huffman coding and entropy manipulation. It also analyzes the symbol distribution of the files being compressed (images, electronic books, etc.).
Examples
1. Symbols Distribution
- Bytes Distribution
- Entropy
2. Text Compression
- Semantic Compression
- NLP Preprocessing Approach
- Decompression and Validation
3. File Compression
- Huffman Code from Scratch
- Compress Image with Huffman Code
- Compress Text file with Huffman Code
- Decompress file with Huffman Code
- Simple approach
- Probabilistic approach
- Pseudo-random approach
4. Compression & Entropy
- Compression with current Entropy
- Changing Entropy for higher Compression
- Restoring Entropy for Decompression
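The symbol-distribution analysis in section 1 boils down to counting byte frequencies and computing Shannon entropy. A minimal sketch (the function names are illustrative, not the notebooks' actual code):

```python
import math
from collections import Counter

def byte_distribution(data: bytes) -> dict:
    """Relative frequency of each byte value present in the data."""
    total = len(data)
    return {b: c / total for b, c in Counter(data).items()}

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    return -sum(p * math.log2(p) for p in byte_distribution(data).values())

# Eight equally likely symbols need exactly 3 bits each
assert abs(shannon_entropy(b"abcdefgh") - 3.0) < 1e-9
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes to these functions; files close to 8 bits/byte (e.g. PNGs, which are already compressed) leave little room for further lossless compression.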
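Section 2's semantic compression with NLP preprocessing is detailed in the notebooks; as a generic illustration of the underlying dictionary idea (not the project's actual pipeline), words can be replaced by indexes into a codebook and restored on decompression, with a round-trip check as validation:

```python
def compress_words(text: str):
    """Word-level dictionary coding: each word becomes an index into a
    codebook built from the text itself."""
    codebook, index, codes = [], {}, []
    for w in text.split():
        if w not in index:
            index[w] = len(codebook)
            codebook.append(w)
        codes.append(index[w])
    return codebook, codes

def decompress_words(codebook, codes) -> str:
    """Map indexes back to words (whitespace is normalized to single spaces)."""
    return " ".join(codebook[i] for i in codes)

text = "to be or not to be"
codebook, codes = compress_words(text)
assert decompress_words(codebook, codes) == text  # validation round trip
```

Note that this sketch normalizes whitespace, so the validation step only holds for single-spaced input; a production scheme would preserve the original spacing.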
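Section 3 builds Huffman coding from scratch. A compact sketch of the classic algorithm (repeatedly merge the two least frequent subtrees, then read codes off the tree) is shown below; the notebooks' own implementation may differ in structure:

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """Build a Huffman code table mapping each byte to a bitstring."""
    freq = Counter(data)
    if len(freq) == 1:  # edge case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (frequency, unique tiebreaker, tree); a tree is
    # either a leaf byte or a (left, right) pair.
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def huffman_decode(bits: str, codes: dict) -> bytes:
    """Decode a bitstring using the prefix-free code table."""
    inv = {v: k for k, v in codes.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in inv:
            out.append(inv[cur])
            cur = ""
    return bytes(out)

data = b"abracadabra"
codes = huffman_codes(data)
bits = "".join(codes[b] for b in data)
assert huffman_decode(bits, codes) == data  # lossless round trip
```

Because the code is prefix-free, decoding needs no separators between codewords; frequent bytes (here "a") get the shortest codes, which is where the compression comes from.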
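Section 4's idea is to lower the data's entropy with a reversible transform before compressing, then undo the transform after decompressing to restore the original. Delta coding is one generic illustration of such a transform (not necessarily the one used in the notebooks):

```python
import math
import zlib
from collections import Counter

def entropy_bits(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous one (mod 256)."""
    out, prev = bytearray(), 0
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)

def delta_decode(data: bytes) -> bytes:
    """Invert delta_encode, restoring the original bytes and their entropy."""
    out, prev = bytearray(), 0
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)

# A slowly rising ramp: raw bytes spread over many values (high entropy),
# but the deltas are almost all 0 or 1 (low entropy), so they compress better.
raw = bytes((i // 3) % 256 for i in range(3000))
deltas = delta_encode(raw)
assert entropy_bits(deltas) < entropy_bits(raw)
assert len(zlib.compress(deltas)) <= len(zlib.compress(raw))
assert delta_decode(deltas) == raw  # entropy restored, lossless round trip
```

The transform adds no information and loses none; it only reshapes the symbol distribution so that an entropy coder applied afterwards can do more work.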
Data
PNG images and plain-text books of different sizes and languages (English and Spanish).
Python Dependencies
import io
import math
import timeit
import numpy as np
import pandas as pd
from collections import Counter
from PIL import Image
from scipy.stats import entropy
import seaborn as sns
import matplotlib.pyplot as plt
Acknowledgment
To Ramses Coraspe for his good ideas and for validating the compression/decompression processes used in this project.
Contributing and Feedback
Any kind of feedback/criticism would be greatly appreciated: algorithm design, documentation, improvement ideas, spelling mistakes, etc.
Authors
- Created by Andrés Segura Tinoco
- Created on June 17, 2019
License
This project is licensed under the terms of the MIT license.