How to Create a Custom Strategy
MaskMe is extensible. If none of the built-in strategies fit your needs, you can create a custom strategy in minutes.
When to Create a Custom Strategy
Domain-specific transformations (medical codes, financial formats)
Industry-specific rules (NACE codes, zip code hierarchies)
Multi-field dependencies (redact based on another field’s value)
Compliance rules (e.g., mask if classification is “confidential”)
Example: You need to anonymize medical diagnoses using a custom grouping hierarchy instead of hashing.
Anatomy of a Strategy
Every MaskMe strategy is a simple Python function:
def apply(value, **kwargs):
    """
    Transform a value.
    Args:
        value: The value to transform
        **kwargs: Optional parameters (strategy-specific)
    Returns:
        Transformed value, or "__DROP__" to remove the field
    """
    # Your transformation logic goes here; this placeholder returns the value unchanged
    transformed_value = value
    return transformed_value
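As a rough illustration of this contract, the sketch below runs one strategy per field on a single record and removes any field whose strategy returns "__DROP__". The `apply_strategies` helper, the `DROP` constant, and the two toy strategies are hypothetical stand-ins written for this example. They are not MaskMe's engine, which may process rules differently.

```python
DROP = "__DROP__"  # sentinel described in the docstring above


def upper_case(value, **kwargs):
    """Toy strategy: upper-case the value."""
    return "" if value is None else str(value).upper()


def drop_field(value, **kwargs):
    """Toy strategy: always ask for the field to be removed."""
    return DROP


def apply_strategies(record, strategies):
    """Hypothetical helper (not part of MaskMe): run one strategy per field."""
    masked = {}
    for field, value in record.items():
        strategy = strategies.get(field)
        if strategy is None:
            masked[field] = value  # assumption: fields without a rule are kept
            continue
        result = strategy(value)
        if result != DROP:
            masked[field] = result  # DROP results are simply left out
    return masked


record = {"name": "alice", "ssn": "123-45-6789", "city": "Oslo"}
print(apply_strategies(record, {"name": upper_case, "ssn": drop_field}))
```

The point of the sentinel is that a strategy can remove a field without knowing anything about the record it belongs to.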
Signature requirements:
Function name: apply
First parameter: value (the data to transform)
Accepts **kwargs (for consistency with other strategies)
Returns: transformed value (any type) or the special string "__DROP__"
Example 1: Simple Redaction with Domain Logic
Let’s say you have medical diagnosis codes (ICD-10) and want to keep only the first character (broader category).
File: custom_strategies/icd_generalize.py
"""
Custom strategy: Generalize ICD-10 codes to their first character.
ICD-10: J10.01 (Influenza A) → J (Diseases of respiratory system)
"""
def apply(value, **kwargs):
    """
    Keep first character of ICD-10 code.
    Args:
        value: ICD-10 code (e.g., "J10.01")
    Returns:
        First character (e.g., "J")
    """
    if value is None:
        return ""
    str_val = str(value).strip()
    return str_val[0] if str_val else ""
Usage in rules.json:
{
"diagnosis": "icd_generalize"
}
Using with Python API:
from maskme.core.engine import MaskMe
# Import your custom strategy
from custom_strategies import icd_generalize
# Register it in the engine's strategy registry
masker = MaskMe(rules={
"diagnosis": "icd_generalize"
})
masker.strategies["icd_generalize"] = icd_generalize.apply
# Now use it
masked_records = list(masker.mask(records))
Example 2: Conditional Redaction
Suppose you want to redact sensitive data only if it’s marked as “confidential”:
File: custom_strategies/conditional_redact.py
"""
Custom strategy: Redact if marked confidential, otherwise hash.
"""
import hashlib
from maskme.strategies import redaction
def apply(value, is_confidential=False, **kwargs):
    """
    Redact if confidential, else hash.
    Args:
        value: Value to transform
        is_confidential: Boolean flag (passed from rules)
    Returns:
        Redacted value or hash
    """
    if value is None:
        return ""
    if is_confidential:
        # Redact completely
        return redaction.apply(value, char="*")
    else:
        # Hash instead (less destructive)
        return hashlib.sha256(str(value).encode()).hexdigest()
Usage in rules.json:
{
"email": {"strategy": "conditional_redact", "is_confidential": true},
"phone": {"strategy": "conditional_redact", "is_confidential": false}
}
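The rules shown so far come in two shapes: a bare string that names a strategy, and an object that names a strategy plus extra parameters. The sketch below shows one way such a rule could be split into a strategy name and keyword arguments before the function is called. `split_rule` is a hypothetical helper for illustration only. This guide does not show MaskMe's actual rule parsing, which may differ.

```python
import json


def split_rule(rule):
    """Hypothetical helper: return (strategy_name, kwargs) for one rule."""
    if isinstance(rule, str):
        return rule, {}  # bare string: strategy name, no parameters
    params = dict(rule)  # copy so the caller's rule is not mutated
    name = params.pop("strategy")
    return name, params


rules = json.loads("""
{
  "diagnosis": "icd_generalize",
  "email": {"strategy": "conditional_redact", "is_confidential": true}
}
""")

for field, rule in rules.items():
    name, params = split_rule(rule)
    print(field, name, params)
```

Under this reading, everything in the rule object except "strategy" arrives in the strategy function as a keyword argument, which is why `is_confidential` appears as a named parameter of `apply`.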
Example 3: Domain-Specific Transformation (PII Masking)
A realistic example: anonymize personal names using a lookup table of fake names.
File: custom_strategies/fake_names.py
"""
Custom strategy: Replace names with fake names (consistent).
Same real name → same fake name (deterministic for record linkage).
"""
import hashlib
# Simple lookup table (in production, use a larger database)
FAKE_NAMES = [
"Alex", "Jordan", "Casey", "Morgan", "Riley",
"Taylor", "Quinn", "Dakota", "Bailey", "Avery"
]
def apply(value, salt="", **kwargs):
    """
    Replace name with consistent fake name.
    Args:
        value: Real name (e.g., "Alice Johnson")
        salt: Salt for consistency
    Returns:
        Fake name (same input → same output)
    """
    if value is None:
        return ""
    # Create deterministic hash to pick from list
    hash_input = f"{value}{salt}".encode()
    hash_value = int(hashlib.sha256(hash_input).hexdigest(), 16)
    fake_index = hash_value % len(FAKE_NAMES)
    return FAKE_NAMES[fake_index]
Usage:
from maskme.core.engine import MaskMe
from custom_strategies import fake_names
masker = MaskMe(rules={
"name": {"strategy": "fake_names", "salt": "my-org-salt"}
})
masker.strategies["fake_names"] = fake_names.apply
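# Illustrative output; the actual fake name depends on the SHA-256 hash:
# "Alice Johnson" → "Alex" (always)
# "Bob Smith" → "Jordan" (always)
# Same real name + same salt → same fake name, which keeps records linkable.
# With only 10 fake names, different real names can map to the same fake name.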
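To make the determinism and the collision risk concrete, here is a small standalone check. It repeats the same hashing logic as `fake_names.apply` so it runs without MaskMe. The `real_names` inputs are made up for the demonstration.

```python
import hashlib

FAKE_NAMES = ["Alex", "Jordan", "Casey", "Morgan", "Riley",
              "Taylor", "Quinn", "Dakota", "Bailey", "Avery"]


def apply(value, salt="", **kwargs):
    """Same logic as fake_names.apply, repeated so this runs standalone."""
    if value is None:
        return ""
    digest = hashlib.sha256(f"{value}{salt}".encode()).hexdigest()
    return FAKE_NAMES[int(digest, 16) % len(FAKE_NAMES)]


# 11 distinct (made-up) inputs, but only 10 possible outputs.
real_names = [f"Person {i}" for i in range(11)]
fakes = [apply(name, salt="my-org-salt") for name in real_names]

# Determinism: a second pass gives identical results.
assert fakes == [apply(name, salt="my-org-salt") for name in real_names]

# Pigeonhole: 11 inputs cannot all receive distinct names from a pool of 10.
assert len(set(fakes)) < len(real_names)
print(f"{len(real_names)} real names -> {len(set(fakes))} distinct fake names")
```

If two different people must never share a fake name, the pool has to be much larger than the number of distinct real names, and even then collisions remain possible with a hash-and-modulo scheme.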
Example 4: Field-Dependent Masking
Mask a field differently based on another field’s value:
File: custom_strategies/field_aware.py
"""
Custom strategy: Mask differently based on record classification.
Example: Salary anonymization
- If classification="public": hash
- If classification="internal": noise
- If classification="confidential": drop
"""
import hashlib
import random
def apply(value, record=None, salt="", **kwargs):
    """
    Classification-aware masking.
    Args:
        value: Salary to mask
        record: Full record (for context), optional
        salt: Salt for hashing
    Returns:
        Masked value or "__DROP__"
    """
    if value is None:
        return ""
    # This is a simplified example. In practice, the engine would need
    # to pass the full record context (not currently built-in).
    # See advanced patterns below.
    classification = record.get("classification", "internal") if record else "internal"
    if classification == "public":
        return hashlib.sha256(f"{value}{salt}".encode()).hexdigest()[:8]
    elif classification == "internal":
        # Add ~5% noise
        noised = float(value) * (1 + random.uniform(-0.05, 0.05))
        return f"{noised:.0f}"
    else:  # confidential
        return "__DROP__"
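Since the comment in the code above notes that the engine does not pass the full record to strategies, one workaround is to do the per-record dispatch yourself. The sketch below uses a hypothetical helper, `mask_with_context`, written for this guide. It is not a MaskMe API. It calls a strategy with `record=` and honors "__DROP__". The `salary_strategy` function is a simplified, deterministic variant of the example above (no noise branch).

```python
import hashlib


def salary_strategy(value, record=None, salt="", **kwargs):
    """Simplified variant of field_aware.apply: hash or drop by classification."""
    classification = (record or {}).get("classification", "internal")
    if classification == "confidential":
        return "__DROP__"
    return hashlib.sha256(f"{value}{salt}".encode()).hexdigest()[:8]


def mask_with_context(record, field, strategy, **params):
    """Hypothetical helper: mask one field, giving the strategy the full record."""
    masked = dict(record)  # work on a copy; the input record is left untouched
    if field not in masked:
        return masked
    result = strategy(masked[field], record=record, **params)
    if result == "__DROP__":
        del masked[field]
    else:
        masked[field] = result
    return masked


public = {"salary": 52000, "classification": "public"}
secret = {"salary": 98000, "classification": "confidential"}
print(mask_with_context(public, "salary", salary_strategy, salt="s1"))
print(mask_with_context(secret, "salary", salary_strategy, salt="s1"))
```

The first record keeps a `salary` field containing an 8-character hash prefix; the second loses the field entirely.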
Registering Your Custom Strategy
Option 1: Register in Python Code
from maskme.core.engine import MaskMe
from custom_strategies import my_strategy
masker = MaskMe(rules=my_rules)
masker.strategies["my_custom_strategy"] = my_strategy.apply
Option 2: Create a Module and Import
If you have multiple custom strategies:
File: my_company/anonymization/strategies.py
"""
My company's custom strategies.
"""
def medical_code_generalize(value, **kwargs):
    return str(value)[0] if value else ""
def pii_mask(value, **kwargs):
    # ... implementation
    pass
Then use:
from maskme.core.engine import MaskMe
from my_company.anonymization import strategies
masker = MaskMe(rules=my_rules)
masker.strategies["medical"] = strategies.medical_code_generalize
masker.strategies["pii"] = strategies.pii_mask
Testing Your Custom Strategy
Always test before using in production:
from custom_strategies import my_strategy
# Test basic cases
assert my_strategy.apply("test") == "expected"
assert my_strategy.apply(None) == ""
assert my_strategy.apply(123) == "123" # Type conversion
# Test edge cases
assert my_strategy.apply("") == "expected"
assert my_strategy.apply(" ") == "expected"
# Test with parameters
result = my_strategy.apply("test", param1="value")
assert result == "expected"
# Test determinism (if required)
value1 = my_strategy.apply("consistent-input", salt="my-salt")
value2 = my_strategy.apply("consistent-input", salt="my-salt")
assert value1 == value2
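The placeholders above (`my_strategy`, "expected") can be turned into a small reusable check. This is a minimal sketch, assuming your strategies return "" for None as the examples in this guide do. `check_strategy` is a hypothetical helper, not part of MaskMe, and the ICD strategy is repeated inline so the snippet runs on its own.

```python
def check_strategy(strategy, samples, **params):
    """Hypothetical helper: basic contract checks for a strategy function."""
    # None should not raise (the examples in this guide return "" for None).
    assert strategy(None, **params) == ""
    for sample in samples:
        first = strategy(sample, **params)
        second = strategy(sample, **params)
        # Determinism: identical input and parameters give identical output.
        # The message deliberately omits the sample to avoid leaking data.
        assert first == second, "strategy is not deterministic"


def icd_generalize(value, **kwargs):
    """Strategy under test: first character of an ICD-10 code."""
    if value is None:
        return ""
    text = str(value).strip()
    return text[0] if text else ""


check_strategy(icd_generalize, ["J10.01", "", "   ", 123])
assert icd_generalize("J10.01") == "J"
assert icd_generalize("   ") == ""
assert icd_generalize(123) == "1"
```

Strategies that add random noise on purpose would fail the determinism check, so skip that part for them.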
Full Example: End-to-End
Scenario: Anonymize student test scores. Keep the score but round to nearest 5, hash student ID.
File: custom_strategies/education.py
"""
Custom strategies for education datasets.
"""
import hashlib
def round_score(value, **kwargs):
    """Round score to nearest 5."""
    if value is None:
        return None
    score = float(value)
    return round(score / 5) * 5
def hash_student_id(value, salt="", **kwargs):
    """Hash student ID for anonymity."""
    if value is None:
        return ""
    input_str = f"{value}{salt}".encode()
    return hashlib.sha256(input_str).hexdigest()[:8]
Usage:
import csv
from maskme.core.engine import MaskMe
from custom_strategies.education import round_score, hash_student_id
# Set up engine
masker = MaskMe(rules={
"student_id": "hash_student_id",
"name": "drop",
"score": "round_score",
"class": "keep"
})
masker.strategies["hash_student_id"] = hash_student_id
masker.strategies["round_score"] = round_score
# Load and process data
with open("scores.csv", newline="") as f:
    records = list(csv.DictReader(f))
masked = list(masker.mask(records))
# Save (the csv module recommends newline="" for files it reads or writes)
with open("scores_masked.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=masked[0].keys())
    writer.writeheader()
    writer.writerows(masked)
Input (scores.csv):
student_id,name,score,class
S001,Alice,87,Math
S002,Bob,92,Math
S003,Carol,78,Science
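Output (scores_masked.csv; the student_id values below are illustrative placeholders, real hashes are 8 hexadecimal characters):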
student_id,score,class
a1b2c3d4,85,Math
e5f6g7h8,90,Math
i9j0k1l2,80,Science
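One detail worth knowing about `round_score`: Python's built-in `round` rounds exact halves to the nearest even integer, so a score of 62.5 becomes 60 while 67.5 becomes 70. If you would rather always round halves up, a variant such as the hypothetical `round_score_half_up` below is one option. It is a sketch for comparison, not part of MaskMe.

```python
import math


def round_score(value, **kwargs):
    """Original behaviour: built-in round(), which rounds halves to even."""
    if value is None:
        return None
    return round(float(value) / 5) * 5


def round_score_half_up(value, **kwargs):
    """Hypothetical variant: exact halves always round up."""
    if value is None:
        return None
    return math.floor(float(value) / 5 + 0.5) * 5


# Columns: input score, round_score result, round_score_half_up result
for score in (87, 62.5, 67.5):
    print(score, round_score(score), round_score_half_up(score))
```

Both versions agree on the sample data in this section (87, 92, 78); they only differ when a score sits exactly halfway between two multiples of 5.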
Advanced: Strategy Composition
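Combine several transformations inside one strategy:
import hashlib
def hash_then_redact(value, salt="", keep_end=4, **kwargs):
    """Hash, then mask all but the last keep_end characters of the digest."""
    hashed = hashlib.sha256(f"{value}{salt}".encode()).hexdigest()
    length = len(hashed)
    if keep_end >= length:
        return hashed
    if keep_end <= 0:
        # hashed[-0:] would return the whole digest, so handle this case explicitly
        return "*" * length
    redacted_part = "*" * (length - keep_end)
    return redacted_part + hashed[-keep_end:]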
Result (illustrative; a real SHA-256 hex digest is 64 characters long):
"alice@example.com"
→ hashed: 64 hexadecimal characters, ending in (for example) "d9f2"
→ redacted: 60 asterisks followed by "d9f2"
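An alternative to writing one combined function is to keep each strategy small and chain them. The sketch below assumes every strategy follows the `apply(value, **kwargs)` contract from this guide. `compose`, `normalize`, and `keep_last` are hypothetical names introduced for this example, not MaskMe functions.

```python
def compose(*strategies):
    """Hypothetical helper: chain strategies left to right into one strategy."""
    def composed(value, **kwargs):
        for strategy in strategies:
            value = strategy(value, **kwargs)
            if value == "__DROP__":
                break  # once a step drops the field, later steps are skipped
        return value
    return composed


def normalize(value, **kwargs):
    """Toy strategy: trim whitespace and lower-case."""
    return "" if value is None else str(value).strip().lower()


def keep_last(value, keep_end=4, **kwargs):
    """Toy strategy: mask everything except the last keep_end characters."""
    text = str(value)
    if keep_end <= 0:
        return "*" * len(text)
    if keep_end >= len(text):
        return text
    return "*" * (len(text) - keep_end) + text[-keep_end:]


normalize_then_mask = compose(normalize, keep_last)
print(normalize_then_mask("  ABCDEF1234 ", keep_end=4))  # prints ******1234
```

Because every step receives the same keyword arguments and ignores the ones it does not use, the composed function can be registered like any other strategy.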
Best Practices
Keep it simple: One transformation per strategy
Document parameters: What does each kwarg do?
Handle None: Always check for None input
Be deterministic: Same input → same output (unless randomness is intentional)
Validate inputs: Raise clear errors for invalid parameters
Test edge cases: Empty strings, None, wrong types
Document privacy: What privacy guarantee does this provide?
Avoid data leakage: Don’t accidentally include sensitive data in error messages
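Several of these practices can be seen together in a short sketch: it validates its parameter, handles None, and raises errors that mention the parameter or the value's type but never the value itself. `keep_prefix` is a hypothetical strategy written for this illustration.

```python
def keep_prefix(value, length=1, **kwargs):
    """Illustrative strategy: keep the first `length` characters."""
    if isinstance(length, bool) or not isinstance(length, int) or length < 1:
        # The parameter comes from configuration, not data, so it is safe to show.
        raise ValueError(f"length must be a positive integer, got {length!r}")
    if value is None:
        return ""
    if not isinstance(value, (str, int)):
        # Name the type only; never echo the value itself into logs.
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    return str(value).strip()[:length]


print(keep_prefix("J10.01"))            # prints J
print(keep_prefix("J10.01", length=3))  # prints J10

try:
    keep_prefix({"ssn": "123-45-6789"})
except TypeError as exc:
    print(exc)  # prints: unsupported value type: dict
```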
Next Steps
Need a more complex strategy? See API Reference for the full engine API
Want to measure privacy of your custom strategy? See analytics modules
Ready to ship? See Getting Started with MaskMe for production patterns