How to Create a Custom Strategy

MaskMe is extensible. If none of the built-in strategies fit your needs, you can create a custom strategy in minutes.

When to Create a Custom Strategy

Domain-specific transformations (medical codes, financial formats)
Industry-specific rules (NACE codes, zip code hierarchies)
Multi-field dependencies (redact based on another field’s value)
Compliance rules (e.g., mask if classification is “confidential”)

Example: You need to anonymize medical diagnoses using a custom grouping hierarchy instead of hashing.

Anatomy of a Strategy

Every MaskMe strategy is a simple Python function:

def apply(value, **kwargs):
    """
    Transform a value.

    Args:
        value: The value to transform
        **kwargs: Optional parameters (strategy-specific)

    Returns:
        Transformed value, or "__DROP__" to remove the field
    """
    # Your transformation logic
    return transformed_value

Signature requirements:

Function name: apply
First parameter: value (the data to transform)
Accepts **kwargs (for consistency with other strategies)
Returns: transformed value (any type) or the special string "__DROP__"
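
As a minimal sketch of a function that meets these requirements, the strategy below trims a value and drops the field when nothing is left. The `placeholder` parameter is hypothetical and only shows how an optional keyword argument sits alongside `**kwargs`.

```python
DROP = "__DROP__"  # sentinel described above: asks the engine to remove the field


def apply(value, placeholder=None, **kwargs):
    """Trim the value; drop the field (or use a placeholder) if it is blank.

    `placeholder` is a hypothetical parameter used only for illustration.
    """
    if value is None or str(value).strip() == "":
        # Use the configured placeholder if there is one, otherwise drop the field.
        return placeholder if placeholder is not None else DROP
    return str(value).strip()
```

Unknown keyword arguments are absorbed by `**kwargs`, so the function keeps working if a rule passes parameters it does not use.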

Example 1: Simple Redaction with Domain Logic

Let’s say you have medical diagnosis codes (ICD-10) and want to keep only the first character (broader category).

File: custom_strategies/icd_generalize.py

"""
Custom strategy: Generalize ICD-10 codes to their first character.

ICD-10: J10.01 (Influenza A) → J (Diseases of respiratory system)
"""

def apply(value, **kwargs):
    """
    Keep first character of ICD-10 code.

    Args:
        value: ICD-10 code (e.g., "J10.01")

    Returns:
        First character (e.g., "J")
    """
    if value is None:
        return ""

    str_val = str(value).strip()
    return str_val[0] if str_val else ""

Usage in rules.json:

{
  "diagnosis": "icd_generalize"
}

Using with Python API:

from maskme.core.engine import MaskMe

# Import your custom strategy
from custom_strategies import icd_generalize

# Create the engine with a rule that refers to the strategy by name
masker = MaskMe(rules={
    "diagnosis": "icd_generalize"
})
# Register the function under that same name in the engine's strategy registry
masker.strategies["icd_generalize"] = icd_generalize.apply

# Now use it
masked_records = list(masker.mask(records))
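
If your data contains lower-case codes or values that are not ICD-10 codes at all, a slightly stricter variant may be useful. The sketch below is one possible approach, not part of MaskMe: it upper-cases the leading letter and returns an empty string when the value does not start with an ASCII letter.

```python
def apply(value, **kwargs):
    """Return the upper-cased leading letter of an ICD-10 code.

    Returns "" for None, blank values, and values that do not start
    with an ASCII letter (an assumption about how to treat bad input).
    """
    if value is None:
        return ""
    text = str(value).strip()
    if not text:
        return ""
    first = text[0]
    if not (first.isascii() and first.isalpha()):
        return ""
    return first.upper()


print(apply("J10.01"))     # J
print(apply(" j10.01 "))   # J
print(repr(apply("123")))  # ''
```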

Example 2: Conditional Redaction

Suppose you want to redact sensitive data only if it’s marked as “confidential”:

File: custom_strategies/conditional_redact.py

"""
Custom strategy: Redact if marked confidential, otherwise hash.
"""

import hashlib
from maskme.strategies import redaction

def apply(value, is_confidential=False, **kwargs):
    """
    Redact if confidential, else hash.

    Args:
        value: Value to transform
        is_confidential: Boolean flag (passed from rules)

    Returns:
        Redacted value or hash
    """
    if value is None:
        return ""

    if is_confidential:
        # Redact completely
        return redaction.apply(value, char="*")
    else:
        # Hash instead (less destructive)
        return hashlib.sha256(str(value).encode()).hexdigest()

Usage in rules.json:

{
  "email": {"strategy": "conditional_redact", "is_confidential": true},
  "phone": {"strategy": "conditional_redact", "is_confidential": false}
}
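
The rules above come in two shapes: a bare strategy name, or an object whose extra keys become keyword arguments. The sketch below shows one way such a rule could be resolved into a function call. `call_strategy` is a hypothetical helper written for illustration; it mirrors the rule shapes in this guide, not MaskMe's internal implementation.

```python
def call_strategy(strategies, rule, value):
    """Resolve a rule (string or dict) and call the matching strategy.

    Hypothetical helper: `strategies` is a plain dict of name -> function.
    """
    if isinstance(rule, str):
        name, params = rule, {}
    else:
        params = dict(rule)            # copy so the rule itself is not mutated
        name = params.pop("strategy")  # everything left over is a parameter
    return strategies[name](value, **params)


def shout(value, suffix="", **kwargs):
    """Toy strategy used only to demonstrate parameter passing."""
    return str(value).upper() + suffix


registry = {"shout": shout}
print(call_strategy(registry, "shout", "abc"))                               # ABC
print(call_strategy(registry, {"strategy": "shout", "suffix": "!"}, "abc"))  # ABC!
```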

Example 3: Domain-Specific Transformation (PII Masking)

A realistic example: anonymize personal names using a lookup table of fake names.

File: custom_strategies/fake_names.py

"""
Custom strategy: Replace names with fake names (consistent).

Same real name → same fake name (deterministic for record linkage).
"""

import hashlib

# Simple lookup table (in production, use a larger database)
FAKE_NAMES = [
    "Alex", "Jordan", "Casey", "Morgan", "Riley",
    "Taylor", "Quinn", "Dakota", "Bailey", "Avery"
]

def apply(value, salt="", **kwargs):
    """
    Replace name with consistent fake name.

    Args:
        value: Real name (e.g., "Alice Johnson")
        salt: Salt for consistency

    Returns:
        Fake name (same input → same output)
    """
    if value is None:
        return ""

    # Create deterministic hash to pick from list
    hash_input = f"{value}{salt}".encode()
    hash_value = int(hashlib.sha256(hash_input).hexdigest(), 16)
    fake_index = hash_value % len(FAKE_NAMES)

    return FAKE_NAMES[fake_index]

Usage:

from maskme.core.engine import MaskMe
from custom_strategies import fake_names

masker = MaskMe(rules={
    "name": {"strategy": "fake_names", "salt": "my-org-salt"}
})
masker.strategies["fake_names"] = fake_names.apply

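# The same real name (with the same salt) always maps to the same fake name,
# e.g. "Alice Johnson" → one fixed entry of FAKE_NAMES on every run.
# With only 10 fake names, different real names can share a fake name,
# so use a larger list if records must stay distinguishable.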
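
To make the trade-off concrete, the sketch below repeats the strategy from this example and checks two properties: the mapping is deterministic, and with a 10-entry list any 11 distinct inputs must produce at least one collision. The `Person N` inputs are made up for the check.

```python
import hashlib

FAKE_NAMES = [
    "Alex", "Jordan", "Casey", "Morgan", "Riley",
    "Taylor", "Quinn", "Dakota", "Bailey", "Avery",
]


def apply(value, salt="", **kwargs):
    """Pick a fake name deterministically from the hash of value + salt."""
    if value is None:
        return ""
    digest = hashlib.sha256(f"{value}{salt}".encode()).hexdigest()
    return FAKE_NAMES[int(digest, 16) % len(FAKE_NAMES)]


names = [f"Person {i}" for i in range(11)]  # 11 distinct made-up inputs
mapped = {name: apply(name, salt="my-org-salt") for name in names}

# Deterministic: calling again with the same salt gives the same result.
assert all(apply(name, salt="my-org-salt") == mapped[name] for name in names)

# 11 inputs but only 10 possible outputs, so at least two inputs collide.
assert len(set(mapped.values())) < len(names)
```

Collisions are harmless if you only need plausible-looking names, but they merge distinct people when the fake name is used as a join key.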

Example 4: Field-Dependent Masking

Mask a field differently based on another field’s value:

File: custom_strategies/field_aware.py

"""
Custom strategy: Mask differently based on record classification.

Example: Salary anonymization
  - If classification="public": hash
  - If classification="internal": noise
  - If classification="confidential": drop
"""

import hashlib
import random

def apply(value, record=None, salt="", **kwargs):
    """
    Classification-aware masking.

    Args:
        value: Salary to mask
        record: Full record (for context), optional
        salt: Salt for hashing

    Returns:
        Masked value or "__DROP__"
    """
    if value is None:
        return ""

    # Simplified example: the engine does not currently pass the full record,
    # so `record` is None (and "internal" is assumed) unless you call this
    # function yourself and supply the record.

    classification = record.get("classification", "internal") if record else "internal"

    if classification == "public":
        return hashlib.sha256(f"{value}{salt}".encode()).hexdigest()[:8]
    elif classification == "internal":
        # Add ~5% noise
        noised = float(value) * (1 + random.uniform(-0.05, 0.05))
        return f"{noised:.0f}"
    else:  # confidential
        return "__DROP__"
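
Because the engine does not pass the record to a strategy, one workaround is to apply a record-aware function yourself before or after the engine runs. The sketch below assumes records are plain dicts; `mask_records` is a hypothetical helper, not a MaskMe API, and `mask_salary` is a condensed copy of the strategy above.

```python
import hashlib
import random

DROP = "__DROP__"


def mask_salary(value, record=None, salt="", **kwargs):
    """Condensed version of the classification-aware strategy."""
    classification = (record or {}).get("classification", "internal")
    if classification == "public":
        return hashlib.sha256(f"{value}{salt}".encode()).hexdigest()[:8]
    if classification == "internal":
        return f"{float(value) * (1 + random.uniform(-0.05, 0.05)):.0f}"
    return DROP


def mask_records(records, field, strategy, **params):
    """Yield copies of `records` with `field` masked, passing each record as context."""
    for record in records:
        masked = dict(record)  # never mutate the caller's data
        if field in masked:
            result = strategy(masked[field], record=record, **params)
            if result == DROP:
                del masked[field]
            else:
                masked[field] = result
        yield masked


rows = [
    {"salary": "1000", "classification": "confidential"},
    {"salary": "1000", "classification": "internal"},
    {"salary": "1000", "classification": "public"},
]
for row in mask_records(rows, "salary", mask_salary, salt="my-org-salt"):
    print(row)
```

The noise branch uses `random`, so its output changes between runs; seed the generator or derive the noise from a hash if you need repeatable results.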

Registering Your Custom Strategy

Option 1: Register in Python Code

from maskme.core.engine import MaskMe
from custom_strategies import my_strategy

masker = MaskMe(rules=my_rules)
masker.strategies["my_custom_strategy"] = my_strategy.apply

Option 2: Create a Module and Import

If you have multiple custom strategies:

File: my_company/anonymization/strategies.py

"""
My company's custom strategies.
"""

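def medical_code_generalize(value, **kwargs):
    # Check for None explicitly so falsy values such as 0 are still processed
    text = "" if value is None else str(value).strip()
    return text[0] if text else ""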

def pii_mask(value, **kwargs):
    # ... implementation
    pass

Then use:

from maskme.core.engine import MaskMe
from my_company.anonymization import strategies

masker = MaskMe(rules=my_rules)
masker.strategies["medical"] = strategies.medical_code_generalize
masker.strategies["pii"] = strategies.pii_mask
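
With more than a handful of strategies, registering them one line at a time gets repetitive. The sketch below is a hypothetical helper that registers each function under its own name and refuses to overwrite an existing entry. It is demonstrated on a plain dict; whether `masker.strategies` supports the same membership test is an assumption to verify against your MaskMe version.

```python
def register_all(registry, functions):
    """Add each function to `registry` under its __name__; never overwrite."""
    for func in functions:
        name = func.__name__
        if name in registry:
            raise ValueError(f"strategy {name!r} is already registered")
        registry[name] = func
    return registry


def medical_code_generalize(value, **kwargs):
    text = "" if value is None else str(value).strip()
    return text[0] if text else ""


def upper_case(value, **kwargs):
    """Toy strategy used only for this demonstration."""
    return "" if value is None else str(value).upper()


registry = {}  # stand-in for the engine's strategy registry
register_all(registry, [medical_code_generalize, upper_case])
print(sorted(registry))  # ['medical_code_generalize', 'upper_case']
```

Registering under the function name means the names used in your rules must match the function names exactly.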

Testing Your Custom Strategy

Always test before using in production:

from custom_strategies import my_strategy

# Test basic cases (replace each "expected" with the output your strategy should produce)
assert my_strategy.apply("test") == "expected"
assert my_strategy.apply(None) == ""
assert my_strategy.apply(123) == "123"  # Type conversion

# Test edge cases
assert my_strategy.apply("") == "expected"
assert my_strategy.apply("   ") == "expected"

# Test with parameters
result = my_strategy.apply("test", param1="value")
assert result == "expected"

# Test determinism (if required)
value1 = my_strategy.apply("consistent-input", salt="my-salt")
value2 = my_strategy.apply("consistent-input", salt="my-salt")
assert value1 == value2
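
The template above uses placeholders. As a concrete illustration, the sketch below tests a small made-up strategy that keeps the last few characters of a value. The tests are plain functions with `assert` statements, so they can be called directly or collected by a test runner such as pytest.

```python
def apply(value, keep=4, char="*", **kwargs):
    """Hypothetical strategy: mask everything except the last `keep` characters."""
    if value is None:
        return ""
    text = str(value)
    if keep <= 0:
        return char * len(text)  # text[-0:] would return the whole string
    return char * max(len(text) - keep, 0) + text[-keep:]


def test_basic_cases():
    assert apply("1234567890") == "******7890"
    assert apply(None) == ""
    assert apply(1234567) == "***4567"  # non-string input is converted


def test_edge_cases():
    assert apply("") == ""
    assert apply("ab") == "ab"          # shorter than `keep`: nothing to mask
    assert apply("abc", keep=0) == "***"


def test_parameters_and_determinism():
    assert apply("abcdef", keep=2, char="#") == "####ef"
    assert apply("abcdef") == apply("abcdef")


test_basic_cases()
test_edge_cases()
test_parameters_and_determinism()
```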

Full Example: End-to-End

Scenario: Anonymize student test scores. Round each score to the nearest 5, hash the student ID, drop the name, and keep the class unchanged.

File: custom_strategies/education.py

"""
Custom strategies for education datasets.
"""

import hashlib

def round_score(value, **kwargs):
    """Round score to nearest 5."""
    if value is None:
        return None

    score = float(value)
    return round(score / 5) * 5

def hash_student_id(value, salt="", **kwargs):
    """Hash student ID for anonymity."""
    if value is None:
        return ""

    input_str = f"{value}{salt}".encode()
    return hashlib.sha256(input_str).hexdigest()[:8]

Usage:

import csv
from maskme.core.engine import MaskMe
from custom_strategies.education import round_score, hash_student_id

# Set up engine
masker = MaskMe(rules={
    "student_id": "hash_student_id",
    "name": "drop",
    "score": "round_score",
    "class": "keep"
})

masker.strategies["hash_student_id"] = hash_student_id
masker.strategies["round_score"] = round_score

# Load and process data
with open("scores.csv", newline="") as f:
    records = list(csv.DictReader(f))

masked = list(masker.mask(records))

# Save (newline="" lets the csv module control line endings)
with open("scores_masked.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=masked[0].keys())
    writer.writeheader()
    writer.writerows(masked)

Input (scores.csv):

student_id,name,score,class
S001,Alice,87,Math
S002,Bob,92,Math
S003,Carol,78,Science

Output (scores_masked.csv; the hashed IDs are illustrative placeholders, not real digests):

student_id,score,class
a1b2c3d4,85,Math
e5f6a7b8,90,Math
c9d0e1f2,80,Science
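
The rounding step can be checked without the engine or any files. The sketch below reads the same sample rows from an in-memory string and applies `round_score` directly. It also shows a detail worth knowing: Python's built-in `round` rounds exact halves to the nearest even integer, so a score of 12.5 becomes 10 rather than 15.

```python
import csv
import io


def round_score(value, **kwargs):
    """Round score to nearest 5 (same logic as in education.py above)."""
    if value is None:
        return None
    return round(float(value) / 5) * 5


raw = (
    "student_id,name,score,class\n"
    "S001,Alice,87,Math\n"
    "S002,Bob,92,Math\n"
    "S003,Carol,78,Science\n"
)
rows = list(csv.DictReader(io.StringIO(raw)))
print([round_score(row["score"]) for row in rows])  # [85, 90, 80]

# Exact halves go to the even multiple: 12.5 / 5 = 2.5 -> 2, 17.5 / 5 = 3.5 -> 4
print(round_score("12.5"), round_score("17.5"))     # 10 20
```

If you need halves to always round up, `math.floor(score / 5 + 0.5) * 5` is one alternative.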

Advanced: Strategy Composition

Combine several transformations in a single strategy:

import hashlib

def hash_then_redact(value, salt="", keep_end=4, **kwargs):
    """Hash then show last N characters."""
    hashed = hashlib.sha256(f"{value}{salt}".encode()).hexdigest()
    length = len(hashed)  # 64 for a SHA-256 hex digest
    if keep_end >= length:
        return hashed
    if keep_end <= 0:
        return "*" * length  # hashed[-0:] would expose the whole hash
    redacted_part = "*" * (length - keep_end)
    return redacted_part + hashed[-keep_end:]

Result (hash characters are illustrative; a SHA-256 hex digest is 64 characters long):

"alice@example.com"
→ hashed: "d6d5…d9f2" (64 hex characters)
→ redacted: 60 "*" characters followed by "d9f2"
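
Instead of writing each combination by hand, you can chain existing strategy functions. The sketch below is a hypothetical `compose` helper, not a MaskMe API: it applies the steps left to right, passes the same keyword arguments to every step, and stops as soon as a step returns "__DROP__".

```python
DROP = "__DROP__"


def compose(*steps):
    """Build one strategy that applies `steps` left to right."""
    def apply(value, **kwargs):
        for step in steps:
            value = step(value, **kwargs)
            if value == DROP:
                break  # later steps should never see the sentinel
        return value
    return apply


def strip_text(value, **kwargs):
    return "" if value is None else str(value).strip()


def drop_blank(value, **kwargs):
    return DROP if value == "" else value


def keep_last(value, keep_end=4, **kwargs):
    text = str(value)
    if keep_end <= 0:
        return "*" * len(text)
    return "*" * max(len(text) - keep_end, 0) + text[-keep_end:]


mask_card = compose(strip_text, drop_blank, keep_last)
print(mask_card("  4111111111111111 ", keep_end=4))
print(mask_card("   "))  # __DROP__
```

Because every step accepts `**kwargs`, a parameter such as `keep_end` reaches the step that uses it and is ignored by the others.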

Best Practices

Keep it simple: One transformation per strategy
Document parameters: What does each kwarg do?
Handle None: Always check for None input
Be deterministic: Same input → same output (unless randomness is intentional)
Validate inputs: Raise clear errors for invalid parameters
Test edge cases: Empty strings, None, wrong types
Document privacy: What privacy guarantee does this provide?
Avoid data leakage: Don’t accidentally include sensitive data in error messages
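
Two of these practices, validating inputs and avoiding data leakage, pull in opposite directions: a helpful error message tends to quote the offending value. The sketch below shows one way to reconcile them in a made-up bucketing strategy. Parameter errors quote the parameter, which is configuration, while value errors report only the type of the data.

```python
def apply(value, width=10, **kwargs):
    """Hypothetical strategy: bucket an integer into ranges of `width` (37 -> "30-39")."""
    # Parameters come from the rules, so it is safe to echo them back.
    if isinstance(width, bool) or not isinstance(width, int) or width <= 0:
        raise ValueError(f"width must be a positive integer, got {width!r}")

    if value is None:
        return ""

    try:
        number = int(value)
    except (TypeError, ValueError):
        # Report the type only. `from None` suppresses the original exception,
        # whose message would otherwise quote the raw value.
        raise ValueError(
            f"expected an integer-like value, got {type(value).__name__}"
        ) from None

    low = (number // width) * width
    return f"{low}-{low + width - 1}"


print(apply(37))             # 30-39
print(apply("42", width=5))  # 40-44
```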

Next Steps

Need a more complex strategy? See API Reference for the full engine API
Want to measure privacy of your custom strategy? See analytics modules
Ready to ship? See Getting Started with MaskMe for production patterns