Unicode Obfuscation: Braille Patterns for Penetration Testing

0x00 Preface

---

In penetration testing, critical code (such as shellcode) is often obfuscated to evade static detection and analysis.

I recently encountered an interesting sample that uses Braille Patterns to obfuscate strings, posing significant challenges for static analysis.

The figure below shows the obfuscated strings:

[Figure: Braille-obfuscated strings in the sample]

Sample address:

https://www.virustotal.com/gui/file/06f90a471f65de9f9805a9e907d365a04f4ebed1bf28b458397ad19afdb9ac00/detection

This article introduces this method of obfuscating strings with Unicode characters. It covers programmatic encoding and decoding of Braille Patterns and shares ideas for doing the same with other Unicode character tables.

0x01 Introduction

---

This article will cover the following:

Implementation principles of the sample characters
Encoding via program
Decoding via program
Approach to encoding and decoding using other Unicode character tables

0x02 Implementation principle of sample characters

---

Basic Knowledge 1: Unicode

Unicode is an encoding scheme developed to overcome the limitations of traditional character encoding methods. It assigns a uniform and unique binary code to each character in every language, meeting the requirements for cross-language and cross-platform text conversion and processing.

Put simply, every character displayed on a computer corresponds to a unique Unicode code point.

Applied to the sample above: although the Braille Patterns make manual analysis difficult, converting each character into its Unicode code point removes that obstacle.
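As a minimal sketch of this idea (Python is used purely for illustration, and the helper name to_code_points is hypothetical rather than taken from the sample), each character can be turned into its code point with the built-in ord:

```python
def to_code_points(text):
    """Return the Unicode code point of every character as a 'U+XXXX' string."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # Three Braille symbols, written as escapes so the source stays ASCII.
    sample = "\u2820\u2801\u2803"
    for symbol, code in zip(sample, to_code_points(sample)):
        print(symbol, code)
```

Once the symbols are reduced to numbers like these, they can be compared, counted and mapped like any other data.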

Basic Knowledge 2: Braille Patterns

Braille is a tactile writing system that blind people read by touch.

Traditional Braille uses a six-dot cell, which gives 2^6 = 64 distinct patterns.

Braille Patterns occupy 256 positions in the Unicode table (U+2800 to U+28FF), meaning there are 256 Unicode code points corresponding to Braille Patterns. This covers the eight-dot cell (2^8 = 256 patterns).
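The 256 positions are the block U+2800 to U+28FF, and in that block each of the eight dots corresponds to one bit of the offset from U+2800 (dot 1 is the lowest bit). The following sketch shows the relationship; the function names dots_to_char and char_to_dots are hypothetical and chosen only for this example:

```python
BRAILLE_BASE = 0x2800  # first code point of the Braille Patterns block


def dots_to_char(dots):
    """Build a Braille character from dot numbers 1-8 (dot n sets bit n-1)."""
    mask = 0
    for d in dots:
        if not 1 <= d <= 8:
            raise ValueError(f"dot number out of range: {d}")
        mask |= 1 << (d - 1)
    return chr(BRAILLE_BASE + mask)


def char_to_dots(ch):
    """Return the raised dot numbers of a Braille Patterns character."""
    offset = ord(ch) - BRAILLE_BASE
    if not 0 <= offset <= 0xFF:
        raise ValueError("not a Braille Patterns character")
    return [d for d in range(1, 9) if offset & (1 << (d - 1))]


if __name__ == "__main__":
    print(hex(ord(dots_to_char([3, 4, 5, 6]))))  # 0x283c
    print(char_to_dots("\u2820"))                # [6]
```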

To represent more characters than the cell patterns alone allow, prefix symbols are used:

Lowercase English letters correspond to a single Unicode code point
Arabic numerals correspond to two Unicode code points, the first of which is always U+283C
Uppercase English letters correspond to two Unicode code points, the first of which is always U+2820

Note:

Strictly speaking, lowercase English letters also correspond to two Unicode code points, the first of which is always U+2830, but this prefix is usually omitted
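The prefix rules above can be sketched as a small letter-and-digit encoder. This is an illustrative Python sketch, not the code of the sample or of the tool discussed later; the names encode, LETTERS and DIGITS are hypothetical, and it assumes (as described above) that every digit carries its own U+283C prefix and that the lowercase prefix is omitted:

```python
BRAILLE_BASE = 0x2800
NUMBER_SIGN = "\u283C"   # prefix placed before each digit
CAPITAL_SIGN = "\u2820"  # prefix placed before each uppercase letter

# Dot numbers for the letters a-j; the other letters are derived from them.
_A_TO_J = [(1,), (1, 2), (1, 4), (1, 4, 5), (1, 5),
           (1, 2, 4), (1, 2, 4, 5), (1, 2, 5), (2, 4), (2, 4, 5)]


def _cell(dots):
    """Turn dot numbers into a character (dot n sets bit n-1 of the offset)."""
    return chr(BRAILLE_BASE + sum(1 << (d - 1) for d in dots))


def build_letter_table():
    table = {}
    for i, dots in enumerate(_A_TO_J):
        table["abcdefghij"[i]] = _cell(dots)
        table["klmnopqrst"[i]] = _cell(dots + (3,))   # k-t: a-j plus dot 3
    for i, ch in enumerate("uvxyz"):
        table[ch] = _cell(_A_TO_J[i] + (3, 6))        # u, v, x, y, z: a-e plus dots 3 and 6
    table["w"] = _cell((2, 4, 5, 6))                  # w does not follow the pattern
    return table


LETTERS = build_letter_table()
# Digits reuse the cells of a-j: 1 is a, 2 is b, ..., 9 is i, 0 is j.
DIGITS = {d: LETTERS[l] for d, l in zip("1234567890", "abcdefghij")}


def encode(text):
    """Encode ASCII letters and digits; anything else raises ValueError."""
    out = []
    for ch in text:
        if ch in DIGITS:
            out.append(NUMBER_SIGN + DIGITS[ch])
        elif ch.lower() in LETTERS:
            prefix = CAPITAL_SIGN if ch.isupper() else ""
            out.append(prefix + LETTERS[ch.lower()])
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(out)


if __name__ == "__main__":
    print(encode("aA1"))
```

Punctuation is deliberately left out of this sketch, because its mapping varies between Braille codes.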

When obfuscating code, we do not have to follow these rules; deviating from them makes analysis harder.

For example, first encode the code in base64 (whose alphabet has 64 characters), then map each character at random to one of the 256 Braille Patterns code points.
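A minimal Python sketch of this base64-plus-random-mapping idea follows. The function names, the use of a seed to make the mapping reproducible, and the inclusion of the "=" padding character in the mapped alphabet are all assumptions made for illustration; the sample's actual mapping is not known from this article:

```python
import base64
import random
import string

# The 64 base64 symbols plus the "=" padding character.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="


def make_mapping(seed):
    """Assign each base64 symbol a distinct, pseudo-randomly chosen Braille code point."""
    rng = random.Random(seed)
    cells = rng.sample(range(0x2800, 0x2900), len(B64_ALPHABET))
    return {sym: chr(cp) for sym, cp in zip(B64_ALPHABET, cells)}


def obfuscate(data, mapping):
    """bytes -> base64 text -> Braille symbols."""
    b64 = base64.b64encode(data).decode("ascii")
    return "".join(mapping[ch] for ch in b64)


def deobfuscate(text, mapping):
    """Braille symbols -> base64 text -> bytes, using the inverted mapping."""
    reverse = {cell: sym for sym, cell in mapping.items()}
    b64 = "".join(reverse[ch] for ch in text)
    return base64.b64decode(b64)


if __name__ == "__main__":
    mapping = make_mapping(seed=2024)
    hidden = obfuscate(b"calc.exe", mapping)
    print(hidden)
    print(deobfuscate(hidden, mapping))
```

The mapping is a plain substitution table, so this is obfuscation rather than encryption: anyone who recovers the table can reverse it.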

In summary, the implementation principle is: convert the characters to be obfuscated into Unicode code points, then map those code points to the symbols that are actually displayed.

Decoding is therefore also straightforward: however complex the symbols look, first convert them into Unicode code points, then analyze those.

To improve efficiency, the following sections show how to implement encoding and decoding programmatically.

To keep things intuitive, the program implementations all use Braille Grade 1 encoding, i.e., they convert letter by letter and leave out abbreviations and other word-level transformations.

0x03 Implementing Encoding via Program

---

For encoding, simplicity and practicality come first, so a web-based encoder is used.

Reference materials:

http://www.byronknoll.com/braille.html

This website supports Braille Grade 1 encoding.

Its page source shows that http://www.byronknoll.com/braille.html implements Braille Grade 1 encoding in JavaScript.

Therefore, only simple modifications are needed (fixing some conversion bugs and removing some features).

To check the conversion for bugs, we need the Unicode and HTML codes corresponding to each Braille character. A reference is available at:

https://www.ziti163.com/uni/2800-28ff.shtml?id=83#

The correspondence between Unicode codes and English characters can be referenced from:

http://www.doc88.com/p-695153826363.html

The original code supports uppercase and lowercase letters, numbers, and some special symbols, but there are some bugs in the handling of special symbols, such as incorrect conversion for + and !

The modified code has been uploaded to GitHub and can be accessed directly for encoding at the following address:

https://an-open-source-project/tool/BrailleGenerator.html

Supports the following characters: 1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ),!/-.?;'$

A test run is shown in the figure below:

[Figure: test output of BrailleGenerator.html]

Note:

My code only serves as a development template, so the conversion bugs for + and ! are not fixed, and + and = are not supported

0x04 Implementing Decoding via Program

---

So that it can be combined with exploitation techniques (e.g., in-memory loading of PE files), the decoder is implemented in C#.

For details on in-memory loading of PE files, refer to the previous article 'In-Memory Loading of PE Files via .NET'

The program implementation process is as follows:

1. Store the Braille characters obtained from BrailleGenerator.html in an array

2. Convert the Braille characters into Unicode code points, noting that Arabic numerals and uppercase letters each occupy two code points

3. Convert the Unicode code points into the actual characters through the corresponding mapping
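The three steps above can be sketched as follows. This is a Python illustration of the same process, not the C# program referred to in this section; the names decode, CELL_TO_LETTER and LETTER_TO_DIGIT are hypothetical, and only letters and digits are handled:

```python
import string

NUMBER_SIGN = "\u283C"   # precedes a digit
CAPITAL_SIGN = "\u2820"  # precedes an uppercase letter

# Offsets from U+2800 for the letters a-z, in alphabetical order.
_LETTER_OFFSETS = [
    0x01, 0x03, 0x09, 0x19, 0x11, 0x0B, 0x1B, 0x13, 0x0A, 0x1A,  # a-j
    0x05, 0x07, 0x0D, 0x1D, 0x15, 0x0F, 0x1F, 0x17, 0x0E, 0x1E,  # k-t
    0x25, 0x27, 0x3A, 0x2D, 0x3D, 0x35,                          # u-z
]
CELL_TO_LETTER = {chr(0x2800 + off): ch
                  for off, ch in zip(_LETTER_OFFSETS, string.ascii_lowercase)}
# After a number sign, the cells of a-j stand for 1-9 and 0.
LETTER_TO_DIGIT = dict(zip("abcdefghij", "1234567890"))


def decode(braille):
    """Decode letters and digits; a prefix cell consumes the cell that follows it."""
    out = []
    i = 0
    while i < len(braille):
        cell = braille[i]
        prefix = cell if cell in (NUMBER_SIGN, CAPITAL_SIGN) else None
        if prefix:
            i += 1
            if i >= len(braille):
                raise ValueError("prefix cell at end of input")
            cell = braille[i]
        letter = CELL_TO_LETTER.get(cell)
        if letter is None:
            raise ValueError(f"unsupported cell U+{ord(cell):04X}")
        if prefix == NUMBER_SIGN:
            if letter not in LETTER_TO_DIGIT:
                raise ValueError("number sign not followed by a digit cell")
            out.append(LETTER_TO_DIGIT[letter])
        elif prefix == CAPITAL_SIGN:
            out.append(letter.upper())
        else:
            out.append(letter)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(decode("\u2820\u2813\u2811\u2807\u2807\u2815\u283C\u2801"))  # Hello1
```

Reading the input one cell at a time and letting a prefix consume the next cell keeps the two-code-point cases (digits and uppercase letters) in a single loop.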

The code has been uploaded to GitHub at the following address:

An open-source project

Supports the following characters: 1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ),!/-.?;'$

Supports .NET 3.5 and newer versions

The compilation command is as follows:

C:\Windows\Microsoft.NET\Framework\v3.5\csc.exe BrailleToASCII.cs
or
C:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe BrailleToASCII.cs

0x05 Approach for Encoding and Decoding Using Other Unicode Character Tables

---

1. Encoding

Convert the code into Unicode code points, derive new code points through a custom mapping, and finally render those as the corresponding symbols
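A minimal Python sketch of such a custom mapping is shown here, together with its inverse. Everything specific in it is an assumption chosen for illustration: the target range (256 code points starting at U+4E00, inside CJK Unified Ideographs), the XOR key, and the function names. Any range with at least 256 assigned characters and any invertible mapping would serve equally well:

```python
BLOCK_START = 0x4E00  # assumption: example target range, not taken from any sample
DEFAULT_KEY = 0x5A    # assumption: arbitrary one-byte key for the custom mapping


def encode(text, key=DEFAULT_KEY):
    """Map every UTF-8 byte of the input to one symbol in the target range."""
    return "".join(chr(BLOCK_START + (b ^ key)) for b in text.encode("utf-8"))


def decode(symbols, key=DEFAULT_KEY):
    """Invert encode(): symbol -> offset -> original byte -> text."""
    raw = bytearray()
    for ch in symbols:
        offset = ord(ch) - BLOCK_START
        if not 0 <= offset <= 0xFF:
            raise ValueError(f"symbol outside the mapped range: U+{ord(ch):04X}")
        raw.append(offset ^ key)
    return raw.decode("utf-8")


if __name__ == "__main__":
    hidden = encode("calc.exe")
    print(hidden)
    print(decode(hidden))
```

Because the decoder must apply the exact inverse of the mapping, the key and the target range have to be shared between the encoding and decoding sides.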

Reference code:

An open-source project

2. Decoding

During exploitation, decode using the same mapping that was used for encoding

When analyzing a sample with obfuscated code, set a breakpoint just before the decoded code is loaded to obtain the decoded content

0x06 Summary

---

This article uses Braille Patterns as an example to introduce the basic method of obfuscating strings using Unicode encoding. It demonstrates encoding and decoding of Braille Patterns through programs and briefly discusses the approach for encoding and decoding using other Unicode character tables.
