Obfuscated Files or Information: Invisible Unicode

Adversaries may abuse invisible or non-printing Unicode characters to conceal malicious content within files, scripts, or text. By inserting characters that do not visibly render, adversaries may hide data, alter how content is interpreted, or make malicious code appear as benign text or whitespace. Adversaries may encode these malicious payloads, using binary, Base64, or custom schemes, to be reconstructed at runtime through scripting features such as JavaScript Proxy traps, eval(), or other dynamic execution methods. This technique enables adversaries to evade visual inspection and basic static analysis by hiding malicious encoded content in innocuous text.[1][2][3]

Unicode is a standardized character encoding model that assigns a unique numerical value, known as a code point, to every character across writing systems, enabling consistent text representation across platforms, applications, and languages. Code points are represented as U+ followed by a hexadecimal value and may be encoded using formats such as UTF-8 or UTF-16. Adversaries may abuse the valid code points in Unicode that are not visibly rendered but still take up bytes, such as zero-width spaces, variation selectors, or bidirectional formatting controls, to conceal malicious payloads.[2][4][5]

Adversaries may additionally exploit Private Use Area (PUA) characters, a range of code points reserved for custom assignment. PUA characters that are not defined by a font or application are typically rendered blank.[1]

Unicode characters may also be leveraged in support of other techniques such as Phishing, Right-to-Left Override, or User Execution. For example, some adversaries may embed artificial intelligence (AI) prompt injections using invisible Unicode characters in emails or documents that appear benign when processed by AI systems.[6][7]

ID: T1027.018
Sub-technique of:  T1027
Tactic: Stealth
Platforms: Linux, Windows, macOS
Contributors: Menachem Goldstein; Rich Rafferty (NR Labs)
Version: 1.0
Created: 22 April 2026
Last Modified: 23 April 2026

Procedure Examples

ID Name Description
S9010 GlassWorm

GlassWorm has utilized invisible Unicode Private Use Area (PUA) characters to obfuscate its malicious code so that it does not render in code editors.[8][9][10]

Mitigations

This type of attack technique cannot be easily mitigated with preventive controls since it is based on the abuse of system features.

Detection Strategy

ID Name Analytic ID Analytic Description
DET0920 Detection Strategy for Invisible Unicode AN2063

Detection identifies execution of scripts or files that appear visually benign (low printable character ratio) but result in runtime decoding, dynamic evaluation, and subsequent process or network activity. Correlation links script execution with abnormal Unicode density and follow-on behavior such as child process creation or outbound connections.

AN2064

Detection identifies execution of scripts containing high concentrations of invisible Unicode characters followed by decoding or interpretation behaviors (e.g., base64 decode, eval) and subsequent process or network activity. Emphasis is placed on mismatch between file entropy/structure and execution output.

AN2065

Detection identifies execution of scripts or applications containing invisible Unicode payloads reconstructed at runtime, correlated with abnormal AppleScript, JavaScript for Automation, or shell execution and subsequent process or network behavior inconsistent with visible file content.

References