0x00 Preface
---
The previous article, 'Penetration Basics - Exchange Version Detection and Vulnerability Scanning', introduced two Python methods for version detection. To identify a version, known version information is first collected from the official website and stored in a list; the detailed Exchange version is then found by string matching. The open-source script Exchange_GetVersion_MatchVul.py received positive feedback. However, this approach has a drawback: the version list in the scanning script must be updated by hand, which means checking the official website regularly.
To improve efficiency further, this article introduces another implementation: the script accesses the official website and extracts the detailed version information directly from the returned data. The advantage is that the script no longer needs regular updates.
0x01 Introduction
---
This article will cover the following:
- Parsing webpage data with BeautifulSoup
- Implementation details
- Open-source code
0x02 Parsing Webpage Data with BeautifulSoup
---
BeautifulSoup is a Python library for extracting data from HTML or XML files, which can improve development efficiency.
Installation:
pip install bs4
1. Basic Usage
In Python, first fetch the webpage content with the requests library, then pass it to BeautifulSoup for parsing.
Test code:
from bs4 import BeautifulSoup
The above code accesses https://docs.microsoft.com/en-us/exchange/new-features/build-numbers-and-release-dates?view=exchserver-2019, passes the webpage data to BeautifulSoup, and displays the formatted result.
Example of partial output from executing the code:
Exchange Server 2019 CU12 May22SU
May 10, 2022
15.2.1118.9
15.02.1118.009
Exchange Server 2019 CU12 (2022H1)
April 20, 2022
15.2.1118.7
15.02.1118.007
In the above results, each "tr" node corresponds to one version entry, and its child "td" nodes hold the specific version details.
2. Filter for the content of "tr" nodes only
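As a rough illustration of this first step, the sketch below fetches a page with requests and formats it with BeautifulSoup's `prettify()`. The helper names `fetch_html` and `prettify_html`, the timeout value, and the inline `SAMPLE` row are assumptions made for this example and are not taken from the original script.

```python
import requests
from bs4 import BeautifulSoup

URL = ("https://docs.microsoft.com/en-us/exchange/new-features/"
       "build-numbers-and-release-dates?view=exchserver-2019")


def fetch_html(url=URL, timeout=10):
    """Download the page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def prettify_html(html):
    """Parse HTML and return an indented rendering with one tag per line."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.prettify()


# Small inline sample so the parsing step can be tried without network access.
SAMPLE = (
    "<table><tr>"
    "<td>Exchange Server 2019 CU12 (2022H1)</td><td>April 20, 2022</td>"
    "<td>15.2.1118.7</td><td>15.02.1118.007</td>"
    "</tr></table>"
)

print(prettify_html(SAMPLE))
```

To run it against the live page, call `print(prettify_html(fetch_html()))` instead of using the sample.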
Test code:
from bs4 import BeautifulSoup
Example output from executing the code:
| Exchange Server 2019 CU12 May22SU | May 10, 2022 | 15.2.1118.9 | 15.02.1118.009 |
| Exchange Server 2019 CU12 (2022H1) | April 20, 2022 | 15.2.1118.7 | 15.02.1118.007 |
| Exchange Server 2019 CU11 May22SU | May 10, 2022 | 15.2.986.26 | 15.02.0986.026 |
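One way to do this filtering is with `find_all("tr")`, sketched below on a small inline table. `SAMPLE_HTML` is a hypothetical, simplified stand-in for the real page; the markup of the live page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the real page.
SAMPLE_HTML = """
<h2>Exchange Server 2019</h2>
<table>
  <tr><th>Product name</th><th>Release date</th><th>Short build</th><th>Long build</th></tr>
  <tr><td>Exchange Server 2019 CU12 May22SU</td><td>May 10, 2022</td><td>15.2.1118.9</td><td>15.02.1118.009</td></tr>
  <tr><td>Exchange Server 2019 CU12 (2022H1)</td><td>April 20, 2022</td><td>15.2.1118.7</td><td>15.02.1118.007</td></tr>
</table>
"""


def find_rows(html):
    """Return every <tr> element in the document, header rows included."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("tr")


rows = find_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Note that the header row is still present at this point, which is why a clean-up step is still needed.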
Next, remove the invalid data.
3. Extract version information
Test code:
from bs4 import BeautifulSoup
Example of partial output from executing the code:
---
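A minimal sketch of this extraction step follows. It assumes that "invalid data" means rows without enough "td" cells, such as header rows, and that every valid row has at least three cells (product name, release date, build number). Both points are assumptions of this example, and `extract_versions` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the real page.
SAMPLE_HTML = """
<table>
  <tr><th>Product name</th><th>Release date</th><th>Short build</th><th>Long build</th></tr>
  <tr><td>Exchange Server 2019 CU12 May22SU</td><td>May 10, 2022</td><td>15.2.1118.9</td><td>15.02.1118.009</td></tr>
  <tr><td>Exchange Server 2019 CU12 (2022H1)</td><td>April 20, 2022</td><td>15.2.1118.7</td><td>15.02.1118.007</td></tr>
</table>
"""


def extract_versions(html):
    """Return one tuple of cell texts per data row, skipping rows without data cells."""
    soup = BeautifulSoup(html, "html.parser")
    versions = []
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 3:  # header rows use <th>, so they yield no <td> cells
            versions.append(tuple(cells))
    return versions


for entry in extract_versions(SAMPLE_HTML):
    print(entry)
```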
Next, try to match an exact version.
4. Exact Version Matching
Test code:
from bs4 import BeautifulSoup
Example output of executing code:
[+] Exchange Information
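Exact matching can be sketched as a lookup over the extracted rows. The example below assumes each row is a tuple of (product name, release date, short build, long build) and that a full four-part build string has already been obtained from the target; how that string is obtained is outside this sketch. `match_exact` and `VERSIONS` are hypothetical names, and the rows are the ones shown earlier in this article.

```python
# Rows as they might come out of the extraction step.
VERSIONS = [
    ("Exchange Server 2019 CU12 May22SU", "May 10, 2022", "15.2.1118.9", "15.02.1118.009"),
    ("Exchange Server 2019 CU12 (2022H1)", "April 20, 2022", "15.2.1118.7", "15.02.1118.007"),
    ("Exchange Server 2019 CU11 May22SU", "May 10, 2022", "15.2.986.26", "15.02.0986.026"),
]


def match_exact(build, versions):
    """Return (product, release_date) for the row whose short build equals `build`, else None."""
    for product, release_date, short_build, *_ in versions:
        if short_build == build:
            return product, release_date
    return None


print(match_exact("15.2.1118.7", VERSIONS))
```

Returning `None` for an unknown build lets the caller fall back to a looser match.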
For older versions of Exchange, an exact version number cannot be obtained, so a rough version matching function is also needed.
5. Rough Version Matching
Test code:
from bs4 import BeautifulSoup
Example output from executing the code:
[+] Exchange Information
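One possible form of rough matching is a prefix comparison: when only part of the build number is known, every row whose short build starts with that part is returned as a candidate. This is an assumption about what "rough" means here, not a description of the original script, and `match_rough` is a hypothetical helper name.

```python
# Rows as they might come out of the extraction step.
VERSIONS = [
    ("Exchange Server 2019 CU12 May22SU", "May 10, 2022", "15.2.1118.9", "15.02.1118.009"),
    ("Exchange Server 2019 CU12 (2022H1)", "April 20, 2022", "15.2.1118.7", "15.02.1118.007"),
    ("Exchange Server 2019 CU11 May22SU", "May 10, 2022", "15.2.986.26", "15.02.0986.026"),
]


def match_rough(partial_build, versions):
    """Return all rows whose short build begins with the given partial build.

    A trailing dot is appended before comparing, so that "15.2.98" does not
    accidentally match "15.2.986.26".
    """
    prefix = partial_build.rstrip(".") + "."
    return [row for row in versions if row[2].startswith(prefix)]


for candidate in match_rough("15.2.1118", VERSIONS):
    print(candidate[0], candidate[2])
```

A rough match narrows the target down to a cumulative update, while the exact patch level stays ambiguous.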
6. Extract webpage data timestamp
To obtain accurate version information, the update time of the webpage data should also be extracted.
Location of the timestamp in the webpage data:
06/29/2022
Code to locate this timestamp:
print(soup.find_all('time'))
Example output from executing the code:
[]
Code to extract the timestamp:
print(soup.find('time').text)
Example output from executing the code:
06/29/2022
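Because `soup.find('time')` returns `None` when the page has no "time" tag, calling `.text` on the result directly would raise an `AttributeError`. The sketch below adds that check. The `SAMPLE` markup and the helper name `extract_page_date` are assumptions for this example; the attributes on the real tag may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's <time> tag may carry other attributes.
SAMPLE = '<p>Article <time datetime="2022-06-29T00:00:00Z">06/29/2022</time></p>'


def extract_page_date(html):
    """Return the text of the first <time> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("time")
    if tag is None:
        return None
    return tag.get_text(strip=True)


print(extract_page_date(SAMPLE))
```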
Based on the above, new code can identify Exchange versions accurately by reading data from the official website. To support automated identification of multiple targets without accessing the website repeatedly, the code is structured so that it accesses https://docs.microsoft.com/en-us/exchange/new-features/build-numbers-and-release-dates?view=exchserver-2019 only once and keeps the webpage result in a variable. The code has been uploaded to GitHub at the following address:
An open-source project
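The "fetch once, reuse for every target" idea can be sketched as a small cache object. The class below is only an illustration and is not the structure of the published script. `VersionTable`, its `fetch` argument, and `fake_fetch` are hypothetical names. In real use, `fetch` would be a function that downloads the page, for example with requests.

```python
from bs4 import BeautifulSoup


class VersionTable:
    """Downloads and parses the page at most once, then answers lookups from memory."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that returns the page HTML as a string
        self._rows = None    # parsed rows, filled in on first use

    def rows(self):
        if self._rows is None:
            soup = BeautifulSoup(self._fetch(), "html.parser")
            self._rows = [
                tuple(td.get_text(strip=True) for td in tr.find_all("td"))
                for tr in soup.find_all("tr")
                if tr.find("td") is not None  # skip header rows
            ]
        return self._rows

    def lookup(self, build):
        """Return the row whose third cell equals `build`, or None."""
        for row in self.rows():
            if len(row) >= 3 and row[2] == build:
                return row
        return None


# Stand-in fetcher that counts how often the "website" is accessed.
calls = []


def fake_fetch():
    calls.append(1)
    return ("<table><tr><td>Exchange Server 2019 CU12 (2022H1)</td>"
            "<td>April 20, 2022</td><td>15.2.1118.7</td><td>15.02.1118.007</td></tr></table>")


table = VersionTable(fake_fetch)
for target_build in ["15.2.1118.7", "15.2.1118.7", "15.2.986.26"]:
    print(target_build, "->", table.lookup(target_build))
print("page fetched", len(calls), "time(s)")
```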
For cases where the internal network cannot reach the official website, we also implemented a method that parses a locally saved copy of the webpage to obtain accurate versions. The code has been uploaded to GitHub at the following address:
An open-source project
You can first visit the official website, save the webpage content as exchange.data, and then run the script Exchange_GetVersion_ParseFromFile.py.
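The offline variant differs only in where the HTML comes from. A minimal sketch follows, assuming the saved file is UTF-8 encoded HTML; `parse_versions` and `load_versions` are hypothetical helper names and do not come from Exchange_GetVersion_ParseFromFile.py.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def parse_versions(html):
    """Return one tuple of cell texts per table row that contains <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tuple(td.get_text(strip=True) for td in tr.find_all("td"))
        if cells:  # header rows have no <td> cells and are skipped
            rows.append(cells)
    return rows


def load_versions(path="exchange.data"):
    """Read a locally saved copy of the page and parse it (UTF-8 is assumed)."""
    html = Path(path).read_text(encoding="utf-8")
    return parse_versions(html)
```

Calling `load_versions()` in the directory that holds exchange.data returns the same kind of row list as the online variant, so the matching logic can be shared between both scripts.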
0x03 Summary
---
This article introduced an optimized method for Exchange version identification that removes the need to manually update the version list in scanning scripts. The open-source code includes Exchange_GetVersion_ParseFromWebsite.py and Exchange_GetVersion_ParseFromFile.py.