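Thank you for your company in 2023. Let's keep it up in 2024 ^_^
This series systematically organizes and studies system security, reverse analysis, and malicious code detection. The articles will be more focused, more systematic, and more in-depth, and they also record the author's slow growth. The road is long, and I head for Tiger Mountain anyway. Enjoy the process, and let's work hard together~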
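- I. Malware Concepts
- II. Batch-Labeling Malware with MS Defender
  - 1. Building a malware sample whitelist
  - 2. Labeling malware families
  - 3. Extracting sample family names
  - 4. Defender malware family naming rules
- III. Methods for Constructing Malware Sample Families
- IV. USENIX Sec20 Malware Labeling Work
- V. Summary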
The author's GitHub resources:
- Reverse analysis: https://github.com/eastmountyxz/SystemSecurity-ReverseAnalysis
- Network security: https://github.com/eastmountyxz/NetworkSecuritySelf-study
I. Malware Concepts
II. Batch-Labeling Malware with MS Defender
1. Building a malware sample whitelist
2. Labeling malware families
3. Extracting sample family names
# By: Li YXZ 2024-04-17
import argparse
import os

import pandas as pd


def print_and_export(parse_result):
    """Write one (family, sample) row per sample to a CSV file."""
    output_path = 'parse_result1.csv'
    # Remove the output of any earlier run
    if os.path.exists(output_path):
        os.remove(output_path)
    malware_families_samples = []
    for malware_family, malware_family_samples in parse_result.items():
        # Deduplicate the samples within each family
        malware_family_samples = set(malware_family_samples)
        for malware_family_sample in malware_family_samples:
            malware_families_samples.append([malware_family, malware_family_sample])
    malware_samples = pd.DataFrame(malware_families_samples, columns=['Malware_Family', 'Sample'])
    malware_samples.to_csv(output_path, index=False)
    print('Finished!')


def get_line_value(line: str):
    """Return the text after the first colon of a 'Key:Value' log line."""
    line = line.strip().rstrip('\n')
    first_colon_index = line.index(':')
    return line[first_colon_index + 1:]


def get_subpart_info_of_threat(sub_block: list):
    # In a block introduced by Threat Name, the Threat Name header takes 4 lines
    # and each file's info also takes 4 lines, so one function can read both
    threat_name_or_resource_schema = get_line_value(sub_block[0])
    id_value_or_resource_path = get_line_value(sub_block[1])
    severity_or_sigseq = get_line_value(sub_block[2])
    resources_number_or_sigsha = get_line_value(sub_block[3])
    return threat_name_or_resource_schema, id_value_or_resource_path, severity_or_sigseq, resources_number_or_sigsha


def parse_resource_scan_results(scan_results):
    """
    Layout of a Resource Scan block.

    Header:
        Begin Resource Scan
        Scan ID:{79739BA1-802B-4F5C-9FAD-FC0C19DC3A3C}
        Scan Source:
        Start Time:
        End Time:
        Explicit resource to scan
        Resource Schema:folder
        Resource Path:
        Result Count: (number of threat types, i.e. how many Threat Names the scanned files fall into)
    Each threat group:
        Threat Name:
        ID:
        Severity:
        Number of Resources: (several files may belong to this Threat Name)
        [Per file: Resource Schema, Resource Path, and two Extended Info fields (SigSeq, SigSha); with several files these groups repeat]
        Resource Schema:file
        Resource Path:
        Extended Info - SigSeq:
        Extended Info - SigSha:
        Resource Schema:containerfile
        Resource Path:
        Extended Info - SigSeq:
        Extended Info - SigSha:
        ...
    """
    resource_scan_attributes = ['Scan ID', 'Scan Source', 'Start Time', 'End Time']
    threat_name_sig = 'Threat Name'
    resource_scan_sig = {}
    for index, line in enumerate(scan_results):
        if ':' not in line:
            continue
        line = line.strip().rstrip('\n')
        first_colon_index = line.index(':')
        sig = line[: first_colon_index]
        if sig in resource_scan_attributes:
            # Keep the scan-level attributes (Scan ID, Start Time, ...)
            resource_scan_sig[sig] = line[first_colon_index + 1:]
        elif sig == threat_name_sig:
            scan_results = scan_results[index:]  # Drop the Resource Scan header
            break
    parse_result = {}
    line_num = 0
    while line_num < len(scan_results):
        threat_name, id_value, severity, resources_number = \
            get_subpart_info_of_threat(scan_results[line_num: line_num + 4])
        # Threat Name examples: TrojanDownloader:O97M/Donoff.PAB!MTB, Virus:X97M/Mailcab.B
        # The name has a fixed structure: the family sits between '/' and '.', so extract that part
        # print(threat_name)
        forward_slash_index = threat_name.index('/')
        try:
            try:
                end_index = threat_name.index('.', forward_slash_index)
            except ValueError:
                # No variant part: fall back to the '!' that starts the suffix
                end_index = threat_name.index('!', forward_slash_index)
            threat_name = threat_name[forward_slash_index + 1: end_index]
        except ValueError:
            threat_name = threat_name[forward_slash_index + 1:]
        threat_name = threat_name.lower()
        line_num += 4
        samples = []
        resources_number = int(float(resources_number))
        for _ in range(resources_number):
            schema, path, sigseq, sigsha = get_subpart_info_of_threat(scan_results[line_num: line_num + 4])
            # For xlsx, xlsb and xlsm files, Microsoft Defender can identify which sheet is malicious,
            # producing paths such as '...er-1303209212.xlsb->xl/worksheets/sheet3.bin'
            if '->' in path:
                path = path[: path.index('->')]
            file_name = os.path.basename(path)
            samples.append(file_name)
            line_num += 4
        if threat_name in parse_result:
            parse_result[threat_name].extend(samples)
        else:
            parse_result[threat_name] = samples
    # The result is a dict: each threat_name key maps to a list of all files in that family,
    # e.g. {'gozi': [1.xls, 2.xlsm, ...], ...}
    return parse_result


def main():
    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument('--file', type=str, help='The path of MPLog', metavar='FILE_PATH')
    args = arg_parser.parse_args()
    # Where MPLog lives on the system: C:\ProgramData\Microsoft\Windows Defender\Support\MPLog-[...]-[...].log
    if args.file:
        mplog_path = os.path.abspath(args.file)
    else:
        mplog_path = os.path.abspath('MPLog-20230718-182625.log')
    with open(mplog_path, 'r', encoding='utf-16') as mplog:
        mplog_content = mplog.readlines()
    resource_scan_begin = 'Begin Resource Scan'
    resource_scan_begin_indexes = []
    resource_scan_end = 'End Scan'
    resource_scan_end_indexes = []
    for line_num, line in enumerate(mplog_content):
        line = line.strip()
        if line.startswith(resource_scan_begin):
            resource_scan_begin_indexes.append(line_num)
        elif line.startswith(resource_scan_end):
            resource_scan_end_indexes.append(line_num)
    # Only the most recent scan block in the log is parsed
    latest_scan_begin_index = resource_scan_begin_indexes[-1]
    latest_scan_end_index = resource_scan_end_indexes[-1]
    latest_scan_results = mplog_content[latest_scan_begin_index + 1: latest_scan_end_index]
    samples_classification = parse_resource_scan_results(latest_scan_results)
    # Write the family names and sample file names to parse_result1.csv
    print_and_export(samples_classification)


if __name__ == '__main__':
    main()
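The two normalization steps in the script (pulling the family out of a Threat Name, and trimming a container path down to the sample file name) can be tried in isolation. The following is a minimal sketch, not the author's code: the helper names `extract_family` and `sample_name` and the example path are hypothetical, and `ntpath` is swapped in for `os.path` so that Windows-style paths split the same way on any host OS.

```python
import ntpath


def extract_family(threat_name: str) -> str:
    """Return the lowercase family part of a Defender Threat Name."""
    # The family follows the first '/'; it ends at the first '.',
    # or, if there is no '.', at the first '!'.
    rest = threat_name[threat_name.index('/') + 1:]
    for delimiter in ('.', '!'):
        if delimiter in rest:
            rest = rest[:rest.index(delimiter)]
            break
    return rest.lower()


def sample_name(resource_path: str) -> str:
    """Drop a '->inner/member' container suffix and return the file name."""
    outer_path = resource_path.split('->', 1)[0]
    # ntpath understands backslash separators regardless of the host OS
    return ntpath.basename(outer_path)


if __name__ == '__main__':
    print(extract_family('TrojanDownloader:O97M/Donoff.PAB!MTB'))
    print(extract_family('Virus:X97M/Mailcab.B'))
    # Hypothetical path, for illustration only
    print(sample_name(r'D:\malware\example.xlsb->xl/worksheets/sheet3.bin'))
```

Keeping these steps as small pure functions makes it easy to check them against a handful of known Threat Names before running the full log parser.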
Begin Resource Scan
Resource Path:D:\malware
Result Count:7
Threat Name:Virus:Win32/Nemim.A
Resource Path:D:\malware\pe-0a7c325973943
Threat Name:TrojanDownloader:O97M/ZLoader.ARJ!MTB
Resource Path:D:\malware\xlm-00e27f49734eb
Number of Resources:2
Threat Name:TrojanDropper:PowerShell/Cobacis.B
Resource Path:D:\malware\ps-beacon-
Resource Path:D:\malware\ps-beacon
Threat Name:TrojanDownloader:O97M/EncDoc.DK!MTB
Resource Path:D:\malware\xlm-00cf050ee5410d
Threat Name:Trojan:Win32/Tapaoux.A
Resource Path:D:\malware\pe-0a812976b9412
Threat Name:TrojanDownloader:Win32/Garveep.B
Resource Path:D:\malware\pe-0b269bdd4c2d1
Threat Name:TrojanDownloader:O97M/EncDoc.YAF!MTB
Resource Path:D:\malware\xlm-00aadd0cee3b5
End Scan
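To see how the seven Threat Names in the excerpt above collapse into families, they can be grouped with a counter. This is a rough sketch under stated assumptions: the list is copied by hand from the excerpt, `family_of` is a hypothetical helper, and the count is per Threat Name rather than per sample file (a threat with `Number of Resources:2` still counts once here).

```python
import re
from collections import Counter

# Threat Names copied from the log excerpt above
threat_names = [
    'Virus:Win32/Nemim.A',
    'TrojanDownloader:O97M/ZLoader.ARJ!MTB',
    'TrojanDropper:PowerShell/Cobacis.B',
    'TrojanDownloader:O97M/EncDoc.DK!MTB',
    'Trojan:Win32/Tapaoux.A',
    'TrojanDownloader:Win32/Garveep.B',
    'TrojanDownloader:O97M/EncDoc.YAF!MTB',
]


def family_of(threat_name: str) -> str:
    """Lowercase family: the text after '/' up to the first '.' or '!'."""
    after_slash = threat_name.split('/', 1)[1]
    return re.split(r'[.!]', after_slash, maxsplit=1)[0].lower()


family_counts = Counter(family_of(name) for name in threat_names)

if __name__ == '__main__':
    for family, count in family_counts.most_common():
        print(family, count)
```

The two EncDoc variants (`.DK` and `.YAF`) end up under the single key `encdoc`, which is the reason the script strips the variant and suffix before grouping.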
4. Defender malware family naming rules
Type:
* Adware
* Backdoor
* Behavior
* BrowserModifier
* Constructor
* DDoS
* Exploit
* HackTool
* Joke
* Misleading
* MonitoringTool
* Program
* Personal Web Server (PWS)
* Ransom
* RemoteAccess
* Rogue
* SettingsModifier
* SoftwareBundler
* Spammer
* Spoofer
* Spyware
* Tool
* Trojan
* TrojanClicker
* TrojanDownloader
* TrojanNotifier
* TrojanProxy
* TrojanSpy
* VirTool
* Virus
* Worm
Platform (operating systems):
* AndroidOS: Android operating system
* DOS: MS-DOS platform
* EPOC: Psion devices
* FreeBSD: FreeBSD platform
* iOS: iPhone operating system
* Linux: Linux platform
* macOS: MAC 9.x platform or earlier
* macOS_X: macOS X or later
* OS2: OS2 platform
* Palm: Palm operating system
* Solaris: System V-based Unix platforms
* SunOS: Unix platforms 4.1.3 or lower
* SymbOS: Symbian operating system
* Unix: general Unix platforms
* Win16: Win16 (3.1) platform
* Win2K: Windows 2000 platform
* Win32: Windows 32-bit platform
* Win64: Windows 64-bit platform
* Win95: Windows 95, 98 and ME platforms
* Win98: Windows 98 platform only
* WinCE: Windows CE platform
* WinNT: WinNT
Platform (scripting languages):
* ABAP: Advanced Business Application Programming scripts
* ALisp: ALisp scripts
* AmiPro: AmiPro script
* ANSI: American National Standards Institute scripts
* AppleScript: compiled Apple scripts
* ASP: Active Server Pages scripts
* AutoIt: AutoIT scripts
* BAS: Basic scripts
* BAT: Basic scripts
* CorelScript: Corelscript scripts
* HTA: HTML Application scripts
* HTML: HTML Application scripts
* INF: Install scripts
* IRC: mIRC/pIRC scripts
* Java: Java binaries (classes)
* JS: JavaScript scripts
* LOGO: LOGO scripts
* MPB: MapBasic scripts
* MSH: Monad shell scripts
* MSIL: .NET intermediate language scripts
* Perl: Perl scripts
* PHP: Hypertext Preprocessor scripts
* Python: Python scripts
* SAP: SAP platform scripts
* SH: Shell scripts
* VBA: Visual Basic for Applications scripts
* VBS: Visual Basic scripts
* WinBAT: Winbatch scripts
* WinHlp: Windows Help scripts
* WinREG: Windows registry scripts
Platform (macros):
* A97M: Access 97, 2000, XP, 2003, 2007, and 2010 macros
* HE: macro scripting
* O97M: Office 97, 2000, XP, 2003, 2007, and 2010 macros - those that affect Word, Excel, and PowerPoint
* PP97M: PowerPoint 97, 2000, XP, 2003, 2007, and 2010 macros
* V5M: Visio5 macros
* W1M: Word1Macro
* W2M: Word2Macro
* W97M: Word 97, 2000, XP, 2003, 2007, and 2010 macros
* WM: Word 95 macros
* X97M: Excel 97, 2000, XP, 2003, 2007, and 2010 macros
* XF: Excel formulas
* XM: Excel 95 macros
Suffixes:
* .dam: damaged malware
* .dll: Dynamic Link Library component of a malware
* .dr: dropper component of a malware
* .gen: malware that is detected using a generic signature
* .kit: virus constructor
* .ldr: loader component of a malware
* .pak: compressed malware
* .plugin: plug-in component
* .remnants: remnants of a virus
* .worm: worm component of that malware
* !bit: an internal category used to refer to some threats
* !cl: an internal category used to refer to some threats
* !dha: an internal category used to refer to some threats
* !pfn: an internal category used to refer to some threats
* !plock: an internal category used to refer to some threats
* !rfn: an internal category used to refer to some threats
* !rootkit: rootkit component of that malware
* @m: worm mailers
* @mm: mass mailer worm
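The lists above correspond to the parts of a Defender threat name. As a hedged sketch, assuming names follow the `Type:Platform/Family.Variant!Suffix` layout seen in examples such as `TrojanDownloader:O97M/ZLoader.ARJ!MTB`, a regular expression can split a name into its parts. The `DefenderName` tuple, its field names, and `parse_defender_name` are hypothetical and introduced only for illustration; dotted suffixes such as `.gen` are not separated from the variant here.

```python
import re
from typing import NamedTuple, Optional


class DefenderName(NamedTuple):
    threat_type: str
    platform: str
    family: str
    variant: Optional[str]
    suffix: Optional[str]


# Assumed layout: Type:Platform/Family[.Variant][!Suffix or @Suffix]
_NAME_PATTERN = re.compile(
    r'^(?P<threat_type>[^:/]+):(?P<platform>[^/]+)/(?P<family>[^.!@]+)'
    r'(?:\.(?P<variant>[^!@]+))?(?P<suffix>[!@].+)?$'
)


def parse_defender_name(name: str) -> DefenderName:
    """Split a threat name into type, platform, family, variant and suffix."""
    match = _NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f'Unrecognized threat name: {name!r}')
    return DefenderName(**match.groupdict())


if __name__ == '__main__':
    print(parse_defender_name('TrojanDownloader:O97M/ZLoader.ARJ!MTB'))
    print(parse_defender_name('Virus:Win32/Nemim.A'))
```

Parsing the full name, instead of only the family, lets a labeling pipeline also filter by type (for example, only `Ransom`) or by platform (for example, only `O97M` macro samples).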
III. Methods for Constructing Malware Sample Families
IV. USENIX Sec20 Malware Labeling Work
V. Summary
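The road is hard, with many a fork. Thank you to my family for their company. Xiaoluo has grown a lot lately, mischievous and adorable, singing even as she walks, and I hope she stays just like that. Thank you all for your support and attention in 2023. Let's keep going in 2024 and share more good articles. With gratitude: Na and Zhang, together until our hair turns white.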
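Originally published on the WeChat public account 娜璋AI安全之家: [System Security] 57. Malware Analysis (9): Batch-Labeling Malware Sample Families with MS Defender (with Academic Discussion)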