
Intern Showcase: Anonymizing Logs Made Easy with LogLicker

Written by Corey Ahl | Aug 17, 2023 2:52:54 PM
 LogLicker On GitHub: https://github.com/Permiso-io-tools/LogLicker 


Introduction

Logs play a crucial role in monitoring and analyzing system activity, but handling the sensitive information within them can be daunting. Whether you're sharing logs with third-party services like ChatGPT or posting examples of malicious activity on online forums, manually finding and replacing sensitive data is time-consuming and error-prone.

Introducing LogLicker

LogLicker is a tool designed to simplify the process of anonymizing logs by replacing sensitive information with randomized placeholders. While LogLicker was designed for AWS CloudTrail logs, it can also be configured to work with other log types. This enables you to share logs more freely and perform analyses without compromising data privacy. LogLicker is written in Python and requires the boto3, exrex, and regex packages.

Key Features:

  • Information Detection: LogLicker uses regular expressions (regexes) to identify sensitive data within the logs. This can include IP addresses, email addresses, access keys, and more.
  • Information Replacement: LogLicker uses regular expressions to generate random values that correspond closely to the sensitive data, replacing it with these similar values.
  • Manifest File: LogLicker creates a manifest file that maps the original sensitive information to the corresponding replacement values. This manifest file can be stored and used later to deanonymize the data.
  • Customization: By editing the default regex files or adding new ones, this utility can also be used to identify sensitive information from other services.
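
The detection and customization features above can be sketched in a few lines. This is a minimal illustration using only the standard library `re` module; the pattern table, the `find_sensitive` helper, and the regexes themselves are assumptions made for this example and are not the patterns shipped with LogLicker.

```python
import re

# Hypothetical category -> pattern table. These regexes are illustrative
# assumptions, not the patterns shipped with LogLicker.
PATTERNS = {
    "longTermAccessKeyID": re.compile(r"\bAKIA[A-Z0-9]{16}\b"),
    "accountID": re.compile(r"(?<!\d)\d{12}(?!\d)"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_sensitive(text, patterns=PATTERNS):
    """Return {category: sorted unique matches} for every pattern that hits."""
    found = {}
    for category, pattern in patterns.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[category] = matches
    return found


# Customization: supporting another kind of data means adding another entry.
custom = dict(PATTERNS, email=re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"))

line = "accessKeyId: AKIAG4Z3NFJEZ4WLJ28H, accountId: 497402563602, contact: ops@example.com"
print(find_sensitive(line, custom))
```

Keeping the patterns in a plain table is what makes the customization bullet cheap: a new log type only needs new entries, not new scanning logic.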

How LogLicker Works:

  1. Data Source Selection: You can use LogLicker to process logs from either a local text file or directly fetch them from the CloudTrail API.
  2. Sensitive Information Detection: LogLicker uses regexes to scan the logs and identify any instances of sensitive information that need to be anonymized.
  3. Safe Replacement: When LogLicker detects sensitive data, it generates values based on the regexes, preserving the data’s format and context without compromising privacy.
  4. Manifest Creation: LogLicker generates a manifest file of the changes made. The manifest can be used to review the information found within the logs, and it can be fed back into the tool for easy deanonymization.
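
The four steps above can be sketched roughly as follows. This is not LogLicker's implementation: the patterns, `fake_value`, and `anonymize` are hypothetical names for this example, and the random values come from the standard library `random` module, whereas LogLicker itself relies on exrex to generate values from regexes.

```python
import json
import random
import re
import string

# Illustrative patterns only (assumptions, not LogLicker's shipped regexes).
PATTERNS = {
    "longTermAccessKeyID": r"\bAKIA[A-Z0-9]{16}\b",
    "accountID": r"(?<!\d)\d{12}(?!\d)",
}
# One combined regex with a named group per category, so the text is scanned
# once and already-inserted replacements are never matched again.
COMBINED = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS.items()))


def fake_value(category, rng):
    """Generate a random value with the same shape as the category's real data."""
    if category == "longTermAccessKeyID":
        return "AKIA" + "".join(rng.choices(string.ascii_uppercase + string.digits, k=16))
    if category == "accountID":
        return "".join(rng.choices(string.digits, k=12))
    raise ValueError(f"no generator for {category}")


def anonymize(text, rng=None):
    """Return (anonymized_text, manifest); manifest maps original -> replacement."""
    rng = rng or random.Random()
    manifest = {category: {} for category in PATTERNS}

    def swap(match):
        category = match.lastgroup  # name of the group that matched
        original = match.group(0)
        mapping = manifest[category]
        # Reuse an existing replacement so repeated values stay consistent.
        if original not in mapping:
            mapping[original] = fake_value(category, rng)
        return mapping[original]

    return COMBINED.sub(swap, text), manifest


log = '{"accessKeyId": "AKIAG4Z3NFJEZ4WLJ28H", "accountId": "497402563602", "recipientAccountId": "497402563602"}'
anonymized, manifest = anonymize(log)
print(anonymized)
print(json.dumps(manifest, indent=4))
```

Because each original maps to exactly one replacement, the same account ID is replaced identically everywhere it appears, which keeps the anonymized log internally consistent for analysis. A production tool would also need to guard against a generated value colliding with a real one.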

Use Cases:

  • Secure Sharing: Anonymize logs before sharing them with third-party services or online communities, ensuring sensitive data remains protected.
  • Analysis: Anonymized logs can be analyzed without risking exposure of sensitive information.
  • Identification of Sensitive Information: By using LogLicker and the generated manifest file, you can more easily view all discovered sensitive information.

Use Examples

Help -h

The three subparsers each have their own help details.

python3 RunLogLicker.py rawtext -h
python3 RunLogLicker.py cloudtrail -h
python3 RunLogLicker.py rawcloudtrail -h

 

Example 1 - Typical Use Case

This is an example of using LogLicker to anonymize logs, processing the logs through a third-party service, then deanonymizing the logs.

1. Process logs from input/rawcloudtrail.txt and output anonymized logs to output/anonymizedrawtext.txt. Save the manifest file to output/sensitiveinfomanifest.json. (If no output path is provided, a default name is used in the tool’s ‘output’ directory. Every output has a unique hash appended, derived from the reviewed data.)

python3 RunLogLicker.py rawtext -ifp output/rawcloudtrail.txt -ofp output/anonymizedrawtext.txt -omfp output/sensitiveinfomanifest.json

Snippet from output/anonymizedrawtext.txt:

{
    "eventVersion": "1.09",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDAZIA2LHZ7HRF5DKD6",
        "arn": "arn:aws:iam::668284825298:user/random-generated-nameRXnmhQqIgPwq",
        "accountId": "668284825298",
        "accessKeyId": "AKIALZ6DQM4QEAVPM4OK",
        "userName": "random-generated-nameRXnmhQqIgPwq"
    },
    "eventTime": "2023-08-03T14:51:07Z",
    "eventSource": "cloudtrail.amazonaws.com",
    "eventName": "LookupEvents",
    "awsRegion": "eu-southnorth-1",
    "sourceIPAddress": "041.0.97.240",
    "userAgent": "Boto3/1.27.0 md/Botocore#1.30.0 ua/2.0 os/windows#10 md/arch#amd64 lang/python#3.11.4 md/pyimpl#CPython cfg/retry-mode#legacy Botocore/1.30.0",
    "requestParameters": {
        "startTime": "Jan 1, 2023, 12:00:00 AM",
        "endTime": "Feb 2, 2023, 12:00:00 AM",
        "maxResults": 1000
    },
    "responseElements": null,
    "requestID": "474571a3-a575-4468-9674-1ae1fc4752e9",
    "eventID": "a709d759-acbb-49db-8a9f-faf013b923b0",
    "readOnly": true,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "668284825298",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "cloudtrail.eu-southnorth-1.amazonaws.com"
    }
}

Snippet from output/sensitiveinfomanifest.json:

{
    "longTermAccessKeyID": {
        "AKIAG4Z3NFJEZ4WLJ28H": "AKIALZ6DQM4QEAVPM4OK"
    },
    "shortTermAccessKeyID": {
        "ASIA34GD6J1HXJXK64UT": "ASIAIQO1LK7W5OJFOBCT",
        "ASIAXNFTUTO7AHYOQEI3": "ASIAXAMYGIFW10188XWB",
        "ASIA6ITB0B1SO0FF3VW9": "ASIA9T4WLGZ9C0XUDO9G"
    },
    "publicKeyID": {},
    "stsServiceBearerTokenID": {},
    "contextSpecificCredentialID": {},
    "groupID": {},
    "ec2InstanceProfileID": {},
    "iamUserID": {},
    "managedPolicyID": {},
    "roleID": {
        "AROAEPBCOMGTZR0EAOH07": "AROATGYYNHWNI5PRMPUCA",
        "AROAMY5IOQYLXZAVYT3TA": "AROA59WJDL2MDP2U6ZNZQ"
    },
    "certificateID": {},
    "accountID": {
        "497402563602": "668284825298"
    },
    "username": {
        "random-nameqPHgAoU59wq5": "random-generated-nameRXnmhQqIgPwq",
        "random-namemFoXAeql5rNA": "random-generatedZIsNOnywsrZQ",
        "random-generated-namebTrHRCf1xNG0": "random-namexQj9TnMBCcoa"
    },
    "arn": {
        "assumed-role/random-generated-namebTrHRCf1xNG0/EC2_ORCHESTRATOR7978706923425647585": "random-generated-nameMcBMncaOPYLI",
        "assumed-role/random-generated-namebTrHRCf1xNG0/MandoService-1720578260957050536": "random-nameoN9HFyT6NipP"
    },
    "instanceID": {},
    "ipv4": {
        "254.253.251.214": "41.0.97.240"
    },
    "region": {
        "cn-west-1": "eu-south-1"
    },
    "email": {},
    "specifiedStrings": {}
}
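
Before handing the anonymized file to anyone, the manifest can double as a checklist: none of its keys (the original values) should still appear in the output. The sketch below is an assumption about how such a check could look, not a LogLicker feature; `find_leaks` is a hypothetical helper, and the sample values are taken from the manifest snippet above.

```python
def find_leaks(anonymized_text, manifest):
    """Return (category, original) pairs whose original value still appears in the text."""
    leaks = []
    for category, mapping in manifest.items():
        for original in mapping:
            if original in anonymized_text:
                leaks.append((category, original))
    return leaks


manifest = {
    "accountID": {"497402563602": "668284825298"},
    "ipv4": {"254.253.251.214": "41.0.97.240"},
    "email": {},
}
clean = '"accountId": "668284825298", "sourceIPAddress": "41.0.97.240"'
dirty = '"accountId": "668284825298", "sourceIPAddress": "254.253.251.214"'

print(find_leaks(clean, manifest))  # → []
print(find_leaks(dirty, manifest))  # → [('ipv4', '254.253.251.214')]
```

A plain substring test is deliberately conservative: it may flag a false positive when an original happens to be contained in a replacement, which is the safer failure mode for a pre-sharing check.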

 

2. Put output/anonymizedrawtext.txt through any third-party service, with limited risk of leaking sensitive information.

3. Deanonymize the logs using the manifest:


python3 RunLogLicker.py rawtext -ifp output/anonymizedlogsafter3rdpartyservice.txt -ofp output/deanonymizedrawtext.txt -imfp output/sensitiveinfomanifest.json -da true

This will use the manifest to replace all the anonymized information with the original information.
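
Conceptually, deanonymization is the manifest applied in reverse. The following is a minimal sketch of that idea under the assumption that the manifest has the category → {original: replacement} layout shown earlier; `deanonymize` is a hypothetical function for illustration, not LogLicker's own code.

```python
import re


def deanonymize(text, manifest):
    """Swap every replacement value in `text` back to its original value."""
    reverse = {}
    for mapping in manifest.values():
        for original, replacement in mapping.items():
            reverse[replacement] = original
    if not reverse:
        return text
    # Longest first, so a value that is a prefix of another cannot shadow it.
    ordered = sorted(reverse, key=len, reverse=True)
    alternation = "|".join(re.escape(value) for value in ordered)
    # Single pass: restored originals are never scanned again.
    return re.sub(alternation, lambda m: reverse[m.group(0)], text)


manifest = {
    "accountID": {"497402563602": "668284825298"},
    "region": {"cn-west-1": "eu-south-1"},
    "ipv4": {"254.253.251.214": "41.0.97.240"},
}
anonymized = '"accountId": "668284825298", "awsRegion": "eu-south-1", "sourceIPAddress": "41.0.97.240"'
print(deanonymize(anonymized, manifest))
```

Doing the substitution in one pass matters when a restored original could itself look like some other replacement; chained `str.replace` calls would risk rewriting it a second time.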

 

Example 2 - Identifying information

This is an example of using LogLicker to find instances of long-term access keys within CloudTrail logs from 2021-12-01 to present.

python3 RunLogLicker.py cloudtrail -omfp output/sensitiveinfo.json -s 2021-12-01 -l 2000 -rl longTermAccessKeyID

output/sensitiveinfo.json then contains:

{
    "longTermAccessKeyID": {
        "AKIA49GZFM1Y0CATEHQP": "AKIAK0WSTQ16GSANA8YD",
        "AKIARJAZ4NK5QXOE38P9": "AKIAE5L2PREGASDEJQKD",
        "AKIARKQLFANJ3CS27T6H": "AKIA60UW2CA49SFGSAAA",
        "AKIAHOWCXK59EUTNM2ZJ": "AKIA5M0J15EZ7M5GSD5F",
        "AKIA6AXOJSTNPGHCBWR0": "AKIAXMASQXM5Q1GSAKCO",
        "AKIA3SXRZM04YTICAJ25": "AKIAM9A62U40E3GDAS27",
        "AKIACFP149JZO0KAITUL": "AKIA6LPHFSDCKGSAEFIR",
        "AKIAMOFQV05K3RPICHBE": "AKIACMTPCKOA71NVGDS8"
    },
    "shortTermAccessKeyID": {},
    "publicKeyID": {},
    "stsServiceBearerTokenID": {},
    "contextSpecificCredentialID": {},
    "groupID": {},
    "ec2InstanceProfileID": {},
    "iamUserID": {},
    "managedPolicyID": {},
    "roleID": {},
    "certificateID": {},
    "accountID": {},
    "username": {},
    "arn": {},
    "instanceID": {},
    "ipv4": {},
    "region": {},
    "email": {},
    "specifiedStrings": {}
}
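
Since most categories in a manifest like this are empty, a short script can reduce it to just the findings. This is a sketch assuming the manifest layout shown above (keys are the original values); `summarize` is a hypothetical helper, and the inline JSON is a shortened excerpt of the example output.

```python
import json


def summarize(manifest):
    """Return {category: sorted original values} for categories with at least one hit."""
    return {category: sorted(mapping) for category, mapping in manifest.items() if mapping}


manifest = json.loads("""
{
    "longTermAccessKeyID": {
        "AKIA49GZFM1Y0CATEHQP": "AKIAK0WSTQ16GSANA8YD",
        "AKIARJAZ4NK5QXOE38P9": "AKIAE5L2PREGASDEJQKD"
    },
    "shortTermAccessKeyID": {},
    "accountID": {}
}
""")

for category, originals in summarize(manifest).items():
    print(f"{category}: {len(originals)} unique value(s)")
    for value in originals:
        print("  " + value)
```

In practice the manifest would be read from the file written by `-omfp` (for example with `json.load`) rather than from an inline string.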