Web Development in Practice: Text to Image
No, you aren't having a stroke. At least, I sure hope not. Anyhow, you are likely reading this title correctly. Today we will have some fun with a simple data wrangling exercise that involves encoding text as an image. We will also cover the reverse operation: we'll read text that has been encoded as an image.
If you are in the market for improving your intuition about some practical CS information theory and are interested in looking at things from a data storage and encoding perspective, this article is for you!
Why (you may be whining) on God's green earth (you may be dramatically appending) are we going to practice encoding text as an image? Well, my darling, please shush it and allow me to explain. Being a web developer means that you get to do just about anything you want to do on your websites. The only limitation is your skill and imagination. Engaging with this exercise will help you to more deeply understand different ways that data can be represented, and at the same time you'll be gaining practice with essential JavaScript APIs.
Practice isn't the only reason we are here today. Actually, I am building something and I need to encode and decode text from an image. Since you asked, I am designing a portable and simple format for representing ASL (American Sign Language) gestures. The main guiding constraint is that the format fits within a flat WebP image in a way that can be easily understood by common sense in case someone can't find my software or documentation in the future, but still has access to the images. Each individual image of this format is a grid of low-resolution frames in sequence: an obvious animation. The last frame region will encode text as grayscale pixels, which I hope is an obvious metadata-encoding scheme. Of course I could encode text as EXIF metadata, but this is hidden and will surely be lost if the image is converted to another file format or subsequently re-saved by any system that doesn't involve complicated EXIF data-retention capabilities.
Bueno; now you know this is not only an exercise, but also a real-world practical application that I need to implement in order to help solve a real-life problem. Vamos.
Before we get into any JavaScript, let's quickly and simply think about how text and images can be interchangeable from a data perspective. Although text characters and image pixels appear differently when we look at them on a webpage, binary bytes are still the basic data type that represents them in computer memory.
Remember what binary is, and what a byte is? I'll remind you! Binary is the base-2 system for representing values. A byte is made up of 8 binary digits (each one a 0 or a 1), and each byte is capable of representing any unsigned integer value from 0 to 255. Think of a byte as a row of 8 light switches. The binary representation of the integer 0 is 00000000 (all the switches set to off). The binary representation of the integer 9 is 00001001 (note the two switches that are on). So on, and so forth, until we reach 255, represented in binary as 11111111 (all switches are on). It is only important to know that a binary byte (which is physically represented by on/off switches in computer memory) can be quickly translated to the integer value it corresponds to in the realm of base-10 numbers. So going forward here, we don't need to think in terms of switches and binary, but just in terms of bytes represented as base-10 integers from 0 to 255.
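If you'd like to flip those light switches yourself, here is a minimal sketch that converts between base-10 integers and 8-digit binary strings. The helper names toBinaryByte and fromBinaryByte are made up for this illustration; the built-ins they rely on (toString, padStart, parseInt) are standard JavaScript.

```javascript
// Hypothetical helper: turn an integer from 0 to 255 into an 8-digit binary string.
function toBinaryByte(n) {
  // toString(2) gives the base-2 digits; padStart fills in the leading "off" switches
  return n.toString(2).padStart(8, "0");
}

// Hypothetical helper: turn a binary string back into a base-10 integer.
function fromBinaryByte(bits) {
  return parseInt(bits, 2);
}

console.log(toBinaryByte(0));            // 00000000
console.log(toBinaryByte(9));            // 00001001
console.log(toBinaryByte(255));          // 11111111
console.log(fromBinaryByte("00001001")); // 9
```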
Text can be encoded using the value of each character's byte representation. All we have to do to translate between the two is assign one of these byte integers (0-255) to each character. This assignment has already been handled by conventional standards (like ASCII and Unicode), so we don't have to worry about which number corresponds to which character. It is enough just to know that these integers and the characters they represent are interchangeable. (Characters outside the basic ASCII set take more than one byte in UTF-8, but the examples here stick to single-byte characters.)
Images can also be encoded by using the integer representation of each pixel's intensity value. For color images, the final color of each pixel that you see rendered on your screen is actually determined by the interplay of 3 color intensity values that range from 0 to 255: one value each for red, green, and blue. You see, each color intensity value can be stored in computer memory as a byte (just like a text character can be). Note that if all of these individual color intensity values are equal, the resulting pixel will have no particular tint but instead will be on the gray scale.
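To make the pixel idea concrete, here is a small sketch that treats one pixel as three bytes and checks whether it lands on the gray scale. The helpers toCssRgb and isGray are hypothetical names invented for this example, not part of any library.

```javascript
// Hypothetical helper: build a CSS rgb() string from three channel values.
function toCssRgb(red, green, blue) {
  return `rgb(${red}, ${green}, ${blue})`;
}

// Hypothetical helper: a pixel is gray when all three channels are equal.
function isGray(red, green, blue) {
  return red === green && green === blue;
}

// One byte per channel: red, green, blue
const pixel = new Uint8Array([90, 90, 90]);

console.log(toCssRgb(...pixel)); // rgb(90, 90, 90)
console.log(isGray(...pixel));   // true
console.log(isGray(255, 0, 0));  // false (pure red has a tint)
```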
Since text and images can both be encoded in memory as binary, and each byte can be represented by an unsigned integer that ranges from 0 to 255, you can see how these disparate media types are essentially interchangeable at some level, and only really differ once we have higher-level systems interpret them as either text or an image.
Now that we have theory out of the way, we can get to practice. Let's start this practice by seeing how a particular alphanumeric character can be represented as the integer value that corresponds to its byte representation, and how that integer can then be stored as a gray pixel. This is our simplest example of encoding text as an image.
First, let's choose a character: Z (note it is definitely uppercase)
Now we need to find out what integer represents this character. There are already universal encoding standards that define which characters correspond to which integers. We'll use the default modern encoding standard of UTF-8 in these examples (more on this later). We can find out that the integer 90 conventionally corresponds to the alphanumeric character "Z" (as defined by both ASCII and UTF-8 encoding schemes) by employing this simple snippet of JavaScript that utilizes the TextEncoder interface:
// encode text with the TextEncoder interface
const text_code = new TextEncoder().encode("Z")
// print out the result
console.log(text_code)
// console output: Uint8Array(1) [ 90 ]
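The same call works on more than one character at a time. As a quick sketch (the string "Zz" is just an arbitrary pick for illustration), each ASCII character comes back as one byte, in order:

```javascript
// encode a two-character string; each ASCII character becomes one byte
const bytes = new TextEncoder().encode("Zz");

// Array.from gives a plain array that prints the same way in any console
console.log(Array.from(bytes)); // [ 90, 122 ]
console.log(bytes.length);      // 2
```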
Next, let's assign this value to something that can represent it as a gray pixel. We'll create a proper image later, when we have more text to encode. For now, we just need something that takes a red-green-blue value and can render a color from it. We can use HTML with the CSS background-color property to achieve this here. I'll write the code and then embed it so you can see the result:
<div style="background-color: rgb(90, 90, 90); height: 100px; width: 100px; margin: auto;">
</div>
How gray! As you can see, the color of this colored square is defined by integers that are stored somewhere. The user doesn't usually see those numbers, but they are the essence of how the thing is stored behind the scenes. Therefore, in order to reverse this curse, all we do is read any one of the color value numbers and figure out which alphanumeric character it corresponds to.
Let's make it slightly harder on ourselves. We already know that the integer 90 corresponds to the character "Z" in this scheme. But what if we have the code for a colored square and don't already know the character that its stored integers represent? Like this:
<div style="background-color: rgb(122, 122, 122); height: 100px; width: 100px; margin: auto;">
</div>
That square is even brighter, I do say! But whatever does it mean? I mean, what character does the integer 122 correspond to, and how can we find out? It is just as easy as before. Except to translate an integer back to text, we use the JavaScript TextDecoder interface:
// decode text with the TextDecoder interface
const text_decode = new TextDecoder().decode(new Uint8Array([122]))
// print out the result
console.log(text_decode)
// console output: z
Note that above, the decode method of the TextDecoder interface accepts a specific data type: the unsigned 8-bit integer array that can be constructed with the Uint8Array() constructor. Remember from earlier? There are 8 bits in a byte! This is the data type that can hold a list of integers that each range from 0 to 255 (lists are written with brackets [..] as you know).
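Putting the pieces together, here is a rough sketch of reading a few gray squares back into text. It assumes the colors arrive as CSS rgb() strings like the ones in the squares above; the helpers firstChannel and decodeGrays are hypothetical names made up for this illustration.

```javascript
// Hypothetical helper: pull the first channel value out of a CSS rgb() string.
// For a gray color, any channel would do, since they are all equal.
function firstChannel(cssColor) {
  const match = cssColor.match(/rgb\(\s*(\d+)/);
  if (!match) throw new Error(`Not an rgb() color: ${cssColor}`);
  return Number(match[1]);
}

// Hypothetical helper: decode a list of gray rgb() colors back into text.
function decodeGrays(cssColors) {
  const bytes = new Uint8Array(cssColors.map(firstChannel));
  return new TextDecoder().decode(bytes);
}

console.log(firstChannel("rgb(122, 122, 122)"));                       // 122
console.log(decodeGrays(["rgb(90, 90, 90)", "rgb(122, 122, 122)"]));   // Zz
```

Reading only the first channel is a shortcut that works because the squares are gray. A real image reader would want to handle colors that aren't perfectly gray, which this sketch does not attempt.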
And with that, I'll end part 1 of this miniseries. You should now understand the basic theory of encoding and decoding text to an image and I hope you are thinking about how we'll expand on this concept next. In the next installment, we'll get more into some code for an approach that will have us encoding and decoding more text to a portable image. Stay tuned, the link will appear here soon.
<3 Grant