0x00 Introduction

---

This article continues the series on image steganography techniques, this time focusing on the JPEG file format. Compared to PNG, the JPEG format is relatively simple. The methods for extracting hidden payloads are largely the same; the main difference lies in the file format itself, which changes the details that can be exploited.

Tools mentioned in this article:

  • Hex Editor: Hex Editor
  • Steganography Detection: Stegdetect

Download link:

https://github.com/abeluck/stegdetect

  • Edit EXIF Information: MagicEXIF

Download link:

http://www.magicexif.com/

  • Analyze JPEG Image Format: JPEGsnoop

Download link:

http://www.impulseadventure.com/photo/jpeg-snoop.html

0x01 Related Concepts

---

JPEG File

JPEG stands for Joint Photographic Experts Group

Supports lossy compression

Does not support transparency

Does not support animation

Non-vector

Difference between JPEG and JPG

JPEG can serve as both a file extension and represent the file format

JPG is an abbreviation of JPEG, representing the file extension

JPEG and JPG are essentially the same, and their formats are interchangeable

Color Model

Uses the YCrCb color model, which is more suitable for image compression than RGB

  • Y represents luminance
  • Cr represents the red-difference chroma component
  • Cb represents the blue-difference chroma component

The human eye is far more sensitive to changes in luminance Y than to changes in chrominance C. If each point stores an 8-bit luminance value Y, and every 2x2 points store one CrCb value, the perceived visual quality of the image will not change significantly, while saving half the space.

The RGB model requires 4x3=12 bytes for 4 points (3 bytes per point)

The YCrCb model requires 4+2=6 bytes for 4 points (4 Y bytes plus one shared Cr byte and one shared Cb byte)

[R,G,B] -> [Y,Cb,Cr] conversion:

Y = 0.299*R + 0.587*G + 0.114*B

Cb = -0.1687*R - 0.3313*G + 0.5*B + 128

Cr = 0.5*R - 0.4187*G - 0.0813*B + 128

[Y,Cb,Cr] -> [R,G,B] conversion:

R = Y + 1.402*(Cr-128)

G = Y - 0.34414*(Cb-128) - 0.71414*(Cr-128)

B = Y + 1.772*(Cb-128)
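
The two sets of formulas above can be checked with a short round trip. The following is a minimal Python sketch, not taken from the original article; the function names rgb_to_ycbcr and ycbcr_to_rgb are hypothetical, and clamping the result to 0..255 is an assumption about how out-of-range values should be handled.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr) using the formulas above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Convert (Y, Cb, Cr) back to an 8-bit RGB pixel."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    # Assumption: round to the nearest integer and clamp to the 8-bit range.
    return tuple(max(0, min(255, round(v))) for v in (r, g, b))


if __name__ == "__main__":
    pixel = (12, 200, 77)
    print(pixel, "->", rgb_to_ycbcr(*pixel), "->", ycbcr_to_rgb(*rgb_to_ycbcr(*pixel)))
```

Because the coefficients are rounded, the round trip is only approximately lossless; the real loss in a JPEG file comes from chroma subsampling and quantization, not from this conversion.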

File format

JPEG files can generally be divided into two parts: markers and compressed data

Markers:

Each marker consists of two bytes: the first byte is the fixed value 0xFF, and the second byte identifies the marker type

Any number of 0xFF fill bytes may precede a marker; consecutive 0xFF bytes are treated as a single 0xFF that indicates the start of a marker

Common markers:

  • SOI 0xD8 Start of Image
  • APP0 0xE0 Application Reserved Marker 0
  • APPn 0xE1 - 0xEF Application Reserved Marker n (n=1~15)
  • DQT 0xDB Quantization Table (Define Quantization Table)
  • SOF0 0xC0 Start of Frame (Start Of Frame)
  • DHT 0xC4 Huffman Table (Define Huffman Table)
  • DRI 0xDD Restart Interval (Define Restart Interval)
  • SOS 0xDA Start of Scan (Start Of Scan)
  • EOI 0xD9 End of Image

Compressed Data:

The first two bytes after a marker store the length of the segment. The length includes these two bytes but not the two marker bytes

Note:

The length is stored high-order byte first, low-order byte last (big-endian). Unlike the 4-byte length field of a PNG chunk, which counts only the chunk data, the JPEG length field is 2 bytes and counts itself

For example, if the length is 0x12AB, the storage order is 0x12, 0xAB
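
To make the marker and length rules concrete, here is a minimal Python sketch that walks the segments of a JPEG file up to the SOS marker. It is an illustration only: the name iter_segments and the STANDALONE set are hypothetical, and the sketch assumes a well-formed file (it does not decode the compressed scan data that follows SOS).

```python
import struct

# Markers that carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def iter_segments(data):
    """Yield (marker_code, offset, payload) for each segment up to SOS."""
    pos = 0
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos}")
        start = pos
        # Skip the leading 0xFF and any extra 0xFF fill bytes.
        while pos < len(data) and data[pos] == 0xFF:
            pos += 1
        if pos >= len(data):
            return
        marker = data[pos]
        pos += 1
        if marker in STANDALONE:
            yield marker, start, b""
            if marker == 0xD9:  # EOI
                return
            continue
        # Big-endian 16-bit length that includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        yield marker, start, data[pos + 2:pos + length]
        pos += length
        if marker == 0xDA:  # SOS: entropy-coded data follows, stop here
            return


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for marker, offset, payload in iter_segments(f.read()):
            print(f"0xFF{marker:02X} at offset {offset}, {len(payload)} payload bytes")
```

Running it against a file path given on the command line prints one line per segment, which is roughly the first part of what a format analysis tool reports.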

Exif Information

An Exif file is a JPEG file that complies with the JPEG standard but also stores shooting information and a thumbnail image in its header

JPEG files taken with a camera usually carry this information

Exif data is stored in the APP1 (0xFFE1) segment

The two bytes after the marker store the size of the APP1 data area (i.e., the Exif data area).

Followed by the Exif Header, a fixed structure: 0x457869660000.

Then comes the Exif data.
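
The layout just described (APP1 marker, two length bytes, the fixed Exif header, then the Exif data) can be located with a few lines of Python. This is a minimal sketch with a hypothetical function name, find_exif; it assumes there are no 0xFF fill bytes between header segments and it does not parse the Exif data itself.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"  # the fixed 6 bytes 0x457869660000


def find_exif(data):
    """Return the bytes that follow the Exif header in APP1, or None."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file: SOI marker missing")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: no more header segments
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        body = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and body.startswith(EXIF_HEADER):
            return body[len(EXIF_HEADER):]
        pos += 2 + length
    return None
```

For a real inspection of the fields, a dedicated tool such as exiftool is the better choice; the sketch only shows where the data sits in the file.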

Tool for viewing Exif information: exiftool.

Download address:

https://github.com/alchemy-fr/exiftool

Tool for editing Exif information: MagicEXIF.

Download address:

http://www.magicexif.com/

The figure shows how a field is added.

Alt text

0x02 Common Steganography Methods

---

  • DCT encryption
  • LSB encryption
  • DCT LSB
  • Average DCT
  • High Capacity DCT
  • High Capacity DCT - Algorithm

The above steganography methods are referenced from:

https://www.blackhat.com/docs/asia-14/materials/Ortiz/Asia-14-Ortiz-Advanced-JPEG-Steganography-And-Detection.pdf

There are already many open-source tools capable of implementing the above advanced steganography methods

Common steganography tools:

  • JSteg
  • JPHide
  • OutGuess
  • Invisible Secrets
  • F5
  • appendX
  • Camouflage

Of course, corresponding steganalysis tools have also existed for a long time

For example: Stegdetect

Download link:

https://github.com/abeluck/stegdetect

0x03 Hiding Payload Using JPEG File Format

---

The following hiding ideas come out of studying the file format:

1. Directly append data at the end

Alt text

As shown, it does not affect normal image viewing
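
Appending is plain concatenation; the interesting part is recovering the appended bytes. The Python sketch below is one possible way to do it, with hypothetical function names (end_of_image, trailing_data). It assumes a well-formed file without fill bytes in the header, and it finds the real EOI by walking the segments rather than searching for the bytes ff d9, since those bytes can also appear inside an embedded Exif thumbnail or inside the payload.

```python
import struct


def end_of_image(data):
    """Return the offset just past the EOI marker that ends the image."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file: SOI marker missing")
    pos = 2
    # Walk the header segments up to and including the first SOS.
    while True:
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
        if marker == 0xDA:
            break
    # Scan the compressed data for the next real marker.
    while True:
        pos = data.index(b"\xff", pos)
        code = data[pos + 1]
        if code == 0xD9:  # EOI
            return pos + 2
        if code == 0x00 or 0xD0 <= code <= 0xD7:
            pos += 2  # stuffed 0xFF byte or restart marker
        elif code == 0xFF:
            pos += 1  # fill byte
        else:
            # Another segment between scans (e.g. in progressive files).
            (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
            pos += 2 + length


def trailing_data(data):
    """Return whatever follows the EOI marker (empty if nothing was appended)."""
    return data[end_of_image(data):]
```

With this helper, hiding is just jpeg_bytes + payload, and trailing_data(stego_bytes) returns the payload again.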

2. Insert custom COM comment

The COM (comment) marker is 0xff 0xfe

The data to insert is 0x11111111

The data length is 0x04

The total length, including the two length bytes, is 0x06

The complete hexadecimal format is 0xffff000611111111

Insert position is before DHT, as shown in the figure

Alt text

After insertion, as shown in the figure, it does not affect normal image viewing

Alt text

Changing the second byte from ff to fe, so that the segment starts with the COM marker 0xfffe, also does not affect normal image viewing, as shown in the figure

Alt text
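
The manual hex editing in this step can be sketched in Python as follows. The helper names build_segment and insert_before_dht are hypothetical; the sketch uses the COM marker code 0xfe and assumes a file without fill bytes between header segments.

```python
import struct


def build_segment(marker, payload):
    """Return 0xFF, the marker code, a big-endian length (payload + 2), then the payload."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


def insert_before_dht(jpeg, segment):
    """Insert a ready-made segment in front of the first DHT (0xC4) segment."""
    pos = 2  # skip SOI
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xC4:
            return jpeg[:pos] + segment + jpeg[pos:]
        if marker == 0xDA:  # SOS reached without seeing DHT
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no DHT segment found before SOS")


if __name__ == "__main__":
    com = build_segment(0xFE, bytes.fromhex("11111111"))
    print(com.hex())
```

The same build_segment call with a different marker code produces a segment for any other marker that carries a length field.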

3. Insert ignorable marker codes

The principle is the same as above: replace the marker code with a special value that decoders ignore

For example:

  • 00
  • 01 *TEM
  • d0 *RST0
  • dc DNL
  • ef APP15

Testing shows that none of the above marker codes affects normal image viewing

4. Modify DQT

DQT: Define Quantization Table

The marker code is 0xdb

The next two bytes indicate the length

The next byte holds the QT configuration information

The low 4 bits are the QT number

The high 4 bits are the QT precision, 0=8bit, otherwise 16bit

Finally comes the QT data: 64 bytes for an 8-bit table, 128 bytes for a 16-bit table

View DQT information of the test image, as shown

Alt text

Length is 0x43, decimal 67

00 indicates QT number 0, precision 8bit

Next 64 bytes are QT information bytes
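
The structure of this segment can be read back with a small parser. The Python sketch below uses a hypothetical function name, parse_dqt, and takes the segment body (the bytes after the two length bytes). It assumes the layout defined in the JPEG standard, where the high 4 bits of the configuration byte hold the precision and the low 4 bits hold the table number.

```python
import struct


def parse_dqt(body):
    """Parse a DQT segment body into a list of (number, precision, values)."""
    tables = []
    pos = 0
    while pos < len(body):
        info = body[pos]
        precision = info >> 4  # 0 = 8-bit entries, otherwise 16-bit
        number = info & 0x0F   # table number
        pos += 1
        if precision == 0:
            values = list(body[pos:pos + 64])
            pos += 64
        else:
            values = list(struct.unpack(">64H", body[pos:pos + 128]))
            pos += 128
        tables.append((number, precision, values))
    return tables
```

For the test image, a body of 1 + 64 = 65 bytes plus the two length bytes gives the length 0x43 seen in the hex view.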

Note:

The DQT format here is referenced from http://www.opennet.ru/docs/formats/jpeg.txt

Try replacing these 64 bytes, as shown in the figure

Alt text

Comparing the image before and after the replacement reveals visible changes, as shown in the figure

Alt text

If only some of the bytes are replaced with the payload, how much does the image change? The comparison is shown in the figure

Alt text

By analogy, there are many positions available for modification
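
As an illustration of the DQT idea, the following Python sketch overwrites the first bytes of the first 8-bit quantization table with a payload and reads them back. The function names are hypothetical, and the sketch makes two assumptions: the file has no fill bytes between header segments, and the payload contains no zero bytes, because the JPEG standard defines 8-bit quantization values in the range 1..255.

```python
import struct


def first_dqt_table_offset(jpeg):
    """Return the offset of the first table byte in the first DQT segment."""
    pos = 2  # skip SOI
    while pos + 5 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDB:
            if jpeg[pos + 4] >> 4 != 0:
                raise ValueError("first quantization table is not 8-bit")
            return pos + 5  # marker (2) + length (2) + configuration byte (1)
        if marker == 0xDA:  # SOS reached without seeing DQT
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no DQT segment found")


def embed_in_dqt(jpeg, payload):
    """Overwrite the leading table bytes with payload; file size is unchanged."""
    if len(payload) > 64:
        raise ValueError("payload does not fit in one 8-bit table")
    if 0 in payload:
        raise ValueError("zero is not a valid quantization value")
    start = first_dqt_table_offset(jpeg)
    return jpeg[:start] + payload + jpeg[start + len(payload):]


def extract_from_dqt(jpeg, size):
    """Read size bytes back from the start of the first table."""
    start = first_dqt_table_offset(jpeg)
    return jpeg[start:start + size]
```

The fewer bytes are replaced, and the closer they are to the original values, the smaller the visible change in the decoded image.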

0x04 Detection and Identification

---

For the above hiding methods, traces can be discovered using JPEG image format analysis tools

For example, JPEGsnoop

Download address:

http://www.impulseadventure.com/photo/jpeg-snoop.html

Supports format analysis for the following files:

  • .JPG - JPEG Still Photo
  • .THM - Thumbnail for RAW Photo / Movie Files
  • .AVI* - AVI Movies
  • .DNG - Digital Negative RAW Photo
  • .PSD - Adobe Photoshop files
  • .CRW, .CR2, .NEF, .ORF, .PEF - RAW Photo
  • .MOV* - QuickTime Movies, QTVR (Virtual Reality / 360 Panoramic)
  • .PDF - Adobe PDF Documents

Actual test:

As shown below, the COM comment added to the image was discovered

Alt text

As shown below, the added payload was identified by examining the DQT data, where 0x11 corresponds to decimal 17

Alt text

Similarly, JPEGsnoop can parse the EXIF information of JPEG images, as shown below

Alt text

Note:

For testing purposes, the following values in the screenshot were manually added using MagicEXIF software:

EXIF Make/Model: OK [test] [???]
EXIF Makernotes: NONE
EXIF Software: OK [MagicEXIF Metadata Codec 1.02]
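
A very small part of what such a format analysis tool does can be approximated in Python. The sketch below is not JPEGsnoop's logic; it is a hypothetical helper, scan_jpeg, that reports the marker codes used for hiding in this article (the SUSPICIOUS table is an assumption chosen for illustration) and checks whether the file ends with the EOI marker.

```python
import struct

# Marker codes discussed in this article as hiding places (illustrative choice).
SUSPICIOUS = {0x01: "TEM", 0xDC: "DNL", 0xEF: "APP15", 0xFE: "COM"}


def scan_jpeg(data):
    """Return a list of human-readable findings for one JPEG file."""
    findings = []
    pos = 2  # skip SOI
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte
            pos += 1
            continue
        if marker in SUSPICIOUS:
            findings.append(f"{SUSPICIOUS[marker]} marker at offset {pos}")
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2  # standalone marker, no length field
            continue
        if marker == 0xDA:  # SOS: stop before the compressed data
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    if not data.endswith(b"\xff\xd9"):
        findings.append("file does not end with EOI (possible appended data)")
    return findings
```

A comment or an APP15 segment is not proof of a hidden payload, since legitimate software writes them too; the findings only mark places worth a closer look in a hex editor.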

0x05 Supplement

---

Compared to PNG files, adding payloads to JPEG files is much simpler: every PNG chunk carries a CRC that must be recalculated after a change, whereas JPEG files have no checksums for image data.

How to download a JPEG image, parse it, and execute the payload is not covered again here.

(Refer to https://an-open-source-project/%E9%9A%90%E5%86%99%E6%8A%80%E5%B7%A7-%E5%88%A9%E7%94%A8PNG%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E9%9A%90%E8%97%8FPayload)

0x06 Summary

---

This article introduces the JPEG format, focusing on how to hide payloads using specific marker codes based on the JPEG file format. While this method does not affect normal image viewing, details can still be detected with format analysis software. There is much more to learn in the official documentation on the JPEG format; the deeper the understanding, the more techniques available for research.