Using Strings With Bytes in Python

In Python, bytes is a built-in data type that stores an ordered sequence of bytes in a contiguous area of memory. It is also immutable: once a bytes object is created, its contents cannot be changed.
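As a minimal sketch of what "ordered" and "immutable" mean in practice (the variable names here are only for illustration):

```python
data = bytes([72, 105])  # two byte values: 72 is 'H', 105 is 'i'
assignment_failed = False

print(data)       # b'Hi'
print(data[0])    # 72, because indexing gives the integer value of the byte
print(len(data))  # 2

try:
    data[0] = 74  # bytes objects do not support item assignment
except TypeError as exc:
    assignment_failed = True
    print("TypeError:", exc)
```

If you need to change individual bytes in place, a mutable type such as bytearray is the usual choice.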

When working with data, you might find yourself in a situation where you need to convert bytes to other datatypes, strings for instance. Fortunately, Python provides us with all the necessary tools to convert bytes to strings and vice versa.

In this article, we will learn to convert between bytes and strings. We will also provide you with multiple examples to make the understanding easier. Without further ado, let’s start!

Why do we need to convert?

You may be wondering when we need to convert between strings and bytes. Here is a simple example: reading from files.

Files store their contents as bytes, and data that is read or written in binary mode is always handled as bytes. Bytes are easy for machines to work with. That’s why it is important to learn how to encode a string into bytes.

But raw bytes are not human-friendly, since they are just numeric values. To work with this type of data, or even to read it, we need to convert it to a more human-friendly format such as a string. Hence we will need to decode it back to the string type.
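To make the file scenario concrete, here is a small sketch that writes a string to a file as bytes and reads it back. The temporary directory and the file name example.txt are assumptions made purely for this illustration.

```python
import os
import tempfile

text = "Hello Wörld!"

# Work inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    # Binary mode ("wb") only accepts bytes, so the string is encoded first.
    with open(path, "wb") as f:
        f.write(bytes(text, "utf-8"))

    # Binary mode ("rb") returns the raw bytes stored in the file.
    with open(path, "rb") as f:
        raw = f.read()

# Decode the raw bytes back into a readable string.
decoded = str(raw, "utf-8")

print(raw)      # b'Hello W\xc3\xb6rld!'
print(decoded)  # Hello Wörld!
```

Opening the file in text mode instead would let Python do the encoding and decoding for us, but the data on disk would still be bytes.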

How do we do this encoding and decoding? Read on to learn about it!

Converting strings to bytes

To convert a string-type object to a bytes-type object, we can use the bytes() constructor.

The bytes() constructor returns a bytes object, which is an immutable sequence of the encoded bytes.

Here’s a quick example:

my_string = "This string will be converted to a bytes object"

my_bytes = bytes(my_string, 'utf-8')

print(my_bytes)
b'This string will be converted to a bytes object'

The syntax for using the bytes() constructor is as follows:

bytes(source, encoding, errors)

The bytes() constructor takes 3 optional arguments:

  1. Source
  2. Encoding
  3. Errors

The source parameter determines how bytes() initializes the object. This is where we pass the data to be converted to a bytes object.

Since we’ll be converting strings today, our source parameter will be a string-type object.

The encoding parameter tells bytes() how we want the string to be encoded, for instance UTF-8, UTF-16, or ASCII.

Note: If the source is a string, then the encoding parameter is compulsory. For any other source type, such as an integer, the encoding must be left out; passing it raises a TypeError.
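Here is a short sketch of how the source argument changes what bytes() does. The variable names are only for illustration.

```python
error_message = None

# A string source without an encoding is rejected with a TypeError.
try:
    bytes("Hello")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# An integer source creates that many zero-valued bytes.
zeros = bytes(3)
print(zeros)      # b'\x00\x00\x00'

# An iterable of integers (each 0 to 255) becomes the bytes with those values.
from_ints = bytes([72, 105])
print(from_ints)  # b'Hi'

# With no arguments at all, the result is an empty bytes object.
empty = bytes()
print(empty)      # b''
```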

The errors parameter tells bytes() what to do when a character in the string cannot be encoded. There are six commonly used error handlers:

  1. strict – This is the default response and raises a UnicodeEncodeError.
  2. ignore – ignores the part which cannot be encoded in the result.
  3. replace – replaces the part that cannot be encoded with a question mark (?) in the result.
  4. xmlcharrefreplace – inserts an XML character reference instead of unencodable unicode.
  5. backslashreplace – inserts a backslash escape sequence (\xNN, \uNNNN, or \UNNNNNNNN) instead of unencodable unicode.
  6. namereplace – inserts a \N{…} escape sequence instead of unencodable unicode.
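The handlers listed above can be compared side by side with a quick sketch like the one below. The sample string "Wörld" and the loop are only for illustration.

```python
text = "Wörld"  # 'ö' (U+00F6) cannot be encoded as ASCII

handlers = ["ignore", "replace", "xmlcharrefreplace",
            "backslashreplace", "namereplace"]

results = {}
for handler in handlers:
    results[handler] = bytes(text, "ascii", handler)
    print(f"{handler:>18}: {results[handler]}")

# 'strict' is the default and raises an exception instead of returning bytes.
strict_raised = False
try:
    bytes(text, "ascii", "strict")
except UnicodeEncodeError as exc:
    strict_raised = True
    print(f"{'strict':>18}: raised {type(exc).__name__}")
```

Only ignore and replace throw information away. The last three handlers keep a readable trace of the original character in the output.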

Let’s look at a few more examples:

Encoding the string to a UTF-8 encoding:

my_string = "Hello"
print("The original string is:", my_string)

my_bytes = bytes(my_string, 'utf-8')
print("The encoded string is:", my_bytes)
The original string is: Hello
The encoded string is: b'Hello'

Encoding the string to an ASCII encoding

my_string = "Hello"
print("The original string is:", my_string)

my_bytes = bytes(my_string, 'ascii')
print("The encoded string is:", my_bytes)
The original string is: Hello
The encoded string is: b'Hello'
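The two examples above print the same bytes because "Hello" contains only ASCII characters, which UTF-8 encodes identically. A quick sketch with a non-ASCII character shows how the choice of encoding changes the result. The codec names 'utf-16-le' and 'latin-1' are used here only for comparison and do not appear in the examples above.

```python
text = "Wörld"

utf8 = bytes(text, "utf-8")       # 'ö' becomes two bytes: 0xc3 0xb6
utf16 = bytes(text, "utf-16-le")  # two bytes per character here, no byte-order mark
latin1 = bytes(text, "latin-1")   # 'ö' becomes the single byte 0xf6

for name, encoded in [("utf-8", utf8), ("utf-16-le", utf16), ("latin-1", latin1)]:
    print(f"{name:>9}: {len(encoded):2d} bytes  {encoded}")
```

The same five characters take 6, 10, and 5 bytes respectively, which is why the bytes must later be decoded with the same encoding that produced them.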

Encoding the string with an appropriate response to errors using the errors parameter:

Ignoring the unencodable code

my_string = "Hello Wörld!"
print("The original string is:", my_string)

my_bytes = bytes(my_string, 'ascii', 'ignore')
print("The encoded string is:", my_bytes)
The original string is: Hello Wörld!
The encoded string is: b'Hello Wrld!'

As we can see, the character ‘ö’, which is not part of ASCII, was dropped from the encoded bytes. No exception was raised, but keep in mind that the character is silently lost.

Replacing the unencodable code with a question mark

my_string = "Hello Wörld!"
print("The original string is:", my_string)

my_bytes = bytes(my_string, 'ascii', 'replace')
print("The encoded string is:", my_bytes)
The original string is: Hello Wörld!
The encoded string is: b'Hello W?rld!'

This time, instead of ignoring the unencodable character, we replaced it with a question mark. This allows developers to notice it and debug it quickly.

Converting bytes to string

We can convert a bytes-type object to a string-type with Python’s str() constructor. Here’s how you can do that:

my_bytes = bytes(b'Hello')
print("The original bytes object is:", my_bytes)

my_string = str(my_bytes, 'utf-8')
print("The converted string is:", my_string)
The original bytes object is: b'Hello'
The converted string is: Hello

The syntax for str() is very similar to the syntax for bytes():

str(object=b'', encoding='utf-8', errors='strict')

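Just like the bytes() constructor, the str() constructor also takes 3 parameters:

  1. Object – The bytes object to decode. If you pass it without an encoding or errors argument, str() returns its printable representation (such as "b'Hello'") instead of decoding it.
  2. Encoding – The encoding that the bytes object should be decoded from (this can be UTF-8, UTF-16, UTF-32, ASCII, etc).
  3. Errors – The response when the bytes cannot be decoded. The strict, ignore, replace, and backslashreplace handlers work as in bytes(), except that replace inserts the replacement character (U+FFFD) rather than a question mark. The xmlcharrefreplace and namereplace handlers apply only when encoding.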
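As a rough sketch of how these parameters behave when decoding, the snippet below decodes the same UTF-8 bytes as ASCII with different error handlers. The sample text "Inventör" and the variable names are only for illustration.

```python
data = bytes("Inventör", "utf-8")  # b'Invent\xc3\xb6r'

ignored = str(data, "ascii", "ignore")             # invalid bytes are skipped
replaced = str(data, "ascii", "replace")           # each invalid byte becomes U+FFFD
escaped = str(data, "ascii", "backslashreplace")   # each invalid byte becomes \xNN

print(ignored)   # Inventr
print(replaced)  # Invent, two replacement characters, then r
print(escaped)   # Invent\xc3\xb6r

# Without an encoding, str() does not decode at all.
# It returns the printable representation of the bytes object.
not_decoded = str(data)
print(not_decoded)  # b'Invent\xc3\xb6r'
```

The last case is an easy mistake to make: the result is a string that literally starts with b', not the decoded text.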

Let’s jump into the examples right away!

Decoding a normal bytes string

my_bytes = bytes("Embedded Inventor", 'utf-8')

my_string = str(my_bytes, 'ascii')
print(my_string)
Embedded Inventor

Decoding a numerical bytes string

my_bytes = bytes("100", 'utf-8')

my_string = str(my_bytes, 'ascii')
print(my_string)
100

Now let’s see what happens if we intentionally pass a bytes object that the chosen encoding cannot decode:

my_bytes = bytes('Embedded Inventör', encoding='utf-8')

my_string = (str(my_bytes, encoding='ascii', errors='ignore'))
print(my_string)
Embedded Inventr

Here, the errors parameter was set to ‘ignore’, so the two bytes that encode ‘ö’ in UTF-8 were skipped and the rest of the characters were decoded, resulting in ‘Embedded Inventr’.

Now let’s try the same with a different error parameter: ‘strict’

my_bytes = bytes('Embedded Inventör', encoding='utf-8')

my_string = (str(my_bytes, encoding='ascii', errors='strict'))
print(my_string)
Traceback (most recent call last):
  File "/home/main.py", line 3, in <module>
    my_string = (str(my_bytes, encoding='ascii', errors='strict'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

This time the str() constructor tries to decode the bytes of the ‘ö’ character strictly. The first of them, 0xc3, is not a valid ASCII byte, so str() raises a UnicodeDecodeError instead of returning a string.
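In real code we usually do not want the program to crash on such input. One possible approach, sketched below, is to catch the exception and fall back to another encoding. Falling back to UTF-8 is an assumption that only makes sense here because we know how the bytes were produced.

```python
my_bytes = bytes('Embedded Inventör', encoding='utf-8')
failed_at = None

try:
    my_string = str(my_bytes, encoding='ascii', errors='strict')
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that could not be decoded.
    failed_at = exc.start
    print(f"ASCII decoding failed at position {failed_at}: {exc.reason}")
    # Fallback chosen for this example: we know the data is UTF-8.
    my_string = str(my_bytes, encoding='utf-8')

print(my_string)  # Embedded Inventör
```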

This article focuses specifically on the usage of strings with the bytes class. If you haven’t already, check out our other detailed article on bytes.

Bytes: Everything you need to know!

Also I invite you to watch the YouTube video we made on ByteArray which is just the mutable cousin of the Bytes class!

And that is all for today’s article!

Kudos to you for reading the whole thing, only a few have the patience to do so!

I hope you enjoyed reading this article and found it helpful!

Also, feel free to share it with your friends and colleagues!

If your thirst for knowledge has not been quenched yet, here are some related articles that might spark your interest!

Related Articles

Python: bytearray vs bytes, Similarities & Differences Explained!

Python ByteArray: Everything you need to know!

Bytes: Append & Extend, Explained with Examples

Thanks to Namazi Jamal for his contributions in writing this article!

Editor
Balaji Gunasekaran
Balaji Gunasekaran is a Senior Software Engineer with a Master of Science degree in Mechatronics and a bachelor’s degree in Electrical and Electronics Engineering. He loves to write about tech and has written more than 300 articles. He has also published the book “Cracking the Embedded Software Engineering Interview”. You can follow him on LinkedIn