Files and Symbols Distribution

  • Created by Andrés Segura Tinoco
  • Created on June 17, 2019

In computer science, a file is a resource for recording data discretely in a computer storage device. Just as words can be written to paper, information can be written to a computer file. Files can be edited on the system that stores them and transferred between systems, for example over the internet. [1]

Some common types of files are text files, images, videos, and music. Regardless of type, every file is ultimately stored as a sequence (array) of bytes.
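To make the "sequence of bytes" idea concrete, here is a minimal sketch that writes a tiny text file and reads it back without any decoding. The helper name `read_raw_bytes` and the file name `hello.txt` are illustrative assumptions; the point is that a text file and an image are read the same way at this level.

```python
import tempfile
from pathlib import Path

def read_raw_bytes(path):
    """Return the full contents of a file as an immutable bytes object."""
    return Path(path).read_bytes()

# Write a small text file into a throwaway directory, then read it back as bytes.
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "hello.txt"
    text_file.write_text("Hi!", encoding="utf-8")
    raw = read_raw_bytes(text_file)

print(raw)        # b'Hi!'
print(list(raw))  # [72, 105, 33], the byte value of each character
```

Each character of the ASCII text maps to exactly one byte here; for an image, the bytes instead encode headers and (usually compressed) pixel data.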

In [1]:
# Load Python libraries
import io
import numpy as np
import pandas as pd
from collections import Counter
from PIL import Image
In [2]:
# Load Plot libraries
import seaborn as sns
import matplotlib.pyplot as plt

File: Image 1

In [3]:
# Loading an example image
file_path = "../data/img/example-1.png"
img = Image.open(file_path)
In [4]:
# Show image dimension (resolution)
img.size
Out[4]:
(1920, 1080)
In [5]:
# Show image format
img.format
Out[5]:
'PNG'
In [6]:
# Show image
img
Out[6]:
In [7]:
# Read the file at a low level (raw bytes)
def get_image_bytes(file_path):
    with open(file_path, 'rb') as f:
        return bytearray(f.read())
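Because the file is now just an array of bytes, its format can be checked directly from the content rather than from `img.format`. Every PNG file begins with the same fixed 8-byte signature. The helper `has_png_signature` below is a hypothetical sketch, not part of the original notebook.

```python
# The fixed 8-byte signature that opens every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def has_png_signature(data):
    """Return True if the byte sequence starts with the PNG file signature."""
    return bytes(data[:8]) == PNG_SIGNATURE
```

Applied to the notebook's data, `has_png_signature(get_image_bytes(file_path))` would be expected to return `True` for the example PNG.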
In [8]:
# Show size (KB)
low_byte_list1 = get_image_bytes(file_path)
round(len(low_byte_list1) / 1024, 2)
Out[8]:
2728.96
In [9]:
# Show size (MB)
round(len(low_byte_list1) / 1024 / 1024, 2)
Out[9]:
2.67
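Both conversions use binary units, where each step up divides by 1024. A small hypothetical helper `format_size` makes that explicit. The input 2,794,458 is the byte count implied by the notebook's own numbers (2232 × 1252 matrix cells minus a gap of 6), used here as an assumption.

```python
def format_size(num_bytes):
    """Return (kilobytes, megabytes) using binary units (1 KB = 1024 bytes)."""
    kb = num_bytes / 1024
    mb = num_bytes / 1024 ** 2
    return round(kb, 2), round(mb, 2)

print(format_size(2794458))  # (2728.96, 2.67)
```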
In [10]:
# Create a zero-filled matrix with at least one cell per byte
row_len = 2232
col_len = 1252
matrix = np.zeros((row_len, col_len))
matrix.shape
Out[10]:
(2232, 1252)
In [11]:
# Calculate the number of padding cells (matrix cells minus file bytes)
gap = np.prod(matrix.shape) - len(low_byte_list1)
gap
Out[11]:
6
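The 2232 × 1252 shape above is hard-coded so that it holds every byte with only 6 cells left over. One way to derive a shape automatically is sketched below with a hypothetical `grid_shape` helper: pick the column count from a desired rows/columns ratio, then take just enough rows. This is an illustrative alternative, not the method used to pick the original numbers.

```python
import math

def grid_shape(n_bytes, aspect=1.0):
    """Return (rows, cols, padding) for a grid that holds n_bytes cells.

    aspect is the desired rows/cols ratio (1.0 gives a roughly square grid).
    padding = rows * cols - n_bytes is always smaller than cols.
    """
    cols = max(1, math.ceil(math.sqrt(n_bytes / aspect)))
    rows = math.ceil(n_bytes / cols)
    return rows, cols, rows * cols - n_bytes

print(grid_shape(2794458))  # (1672, 1672, 1126)
```

A square grid wastes more cells here (1126) than the hand-picked shape, but it adapts to any file size without manual tuning.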
In [12]:
# Save bytes into matrix (row-major order)
data = np.array(low_byte_list1)
for i in range(len(data)):
    ix_row = i // col_len
    ix_col = i % col_len
    matrix[ix_row, ix_col] = data[i]
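The Python loop visits all ~2.8 million bytes one at a time. A vectorized sketch produces the same row-major layout by zero-padding the byte array and reshaping it. The function name `bytes_to_matrix` is hypothetical; note that it returns a `uint8` array, whereas the notebook's `np.zeros` matrix is `float64`.

```python
import numpy as np

def bytes_to_matrix(raw, n_rows, n_cols):
    """Lay raw bytes out row by row in an (n_rows, n_cols) array, zero-padding the tail."""
    data = np.frombuffer(bytes(raw), dtype=np.uint8)
    size = n_rows * n_cols
    if len(data) > size:
        raise ValueError("matrix is too small for the data")
    padded = np.pad(data, (0, size - len(data)))  # constant mode pads with zeros
    return padded.reshape(n_rows, n_cols)         # C order matches i // cols, i % cols
```

With the notebook's variables this would be called as `matrix = bytes_to_matrix(low_byte_list1, row_len, col_len)`.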
In [13]:
# Plot the file bytes as a heatmap
fig, ax = plt.subplots(figsize = (14, 14))
sns.heatmap(matrix, ax = ax)
ax.set_title("Bytes of the Image", fontsize = 16)
ax.set_xlabel('columns', fontsize = 12)
ax.set_ylabel('rows', fontsize = 12)
plt.show()
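The heatmap shows where byte values occur; the distribution of symbols (how often each of the 256 byte values appears) can be counted directly. A minimal sketch, assuming a hypothetical helper `byte_histogram`, uses `np.bincount`, which gives the same counts as `collections.Counter` in array form.

```python
from collections import Counter
import numpy as np

def byte_histogram(raw):
    """Count how often each byte value 0..255 occurs; returns a length-256 array."""
    data = np.frombuffer(bytes(raw), dtype=np.uint8)
    return np.bincount(data, minlength=256)

sample = b"abracadabra"
counts = byte_histogram(sample)
print(counts[ord("a")], counts[ord("b")])  # 5 2

# Counter yields the same non-zero counts, keyed by byte value.
print(Counter(sample)[ord("a")])  # 5
```

For the image, `byte_histogram(low_byte_list1)` could then be plotted with `plt.bar(range(256), counts)`.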