GCP Virus Scanning Pipeline - Deployment Instructions
This pipeline serves as an example for integrating the ClamAV API server into a virus scanning pipeline and automating the process of scanning new files uploaded to GCS buckets. The Pub/Sub topics and subscriptions for clean and infected files can be used to trigger downstream processing based on scan results.
If this example doesn't fit your specific requirements, please let us know your needs. If possible, we can develop a customized solution tailored to your use case.
Overview
This GCP pipeline automatically scans files uploaded to a Google Cloud Storage (GCS) bucket for viruses using an external ClamAV scanning API. When files are uploaded to the configured bucket, the pipeline:
- Detects the upload via GCS bucket notifications
- Triggers a Cloud Function via Pub/Sub
- Scans the file using the configured virus scanning API
- Publishes scan results to separate Pub/Sub topics for clean and infected files
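The flow above starts with reading the upload notification. As a minimal sketch (not the pipeline's actual function/main.py), the following shows how a GCS notification delivered through Pub/Sub could be decoded. It assumes the usual envelope in which message.data is base64-encoded JSON describing the uploaded object; the function name parse_gcs_notification is hypothetical.

```python
import base64
import json


def parse_gcs_notification(envelope: dict) -> tuple[str, str]:
    """Return (bucket, object_name) from a Pub/Sub message envelope.

    Assumes the shape {"message": {"data": <base64 JSON>}}, where the
    decoded JSON is the GCS object resource with "bucket" and "name" keys.
    """
    message = envelope["message"]
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    return payload["bucket"], payload["name"]


# Build a fake notification locally and decode it.
fake_object = {"bucket": "my-uploads-bucket", "name": "test.txt"}
fake_envelope = {
    "message": {
        "data": base64.b64encode(
            json.dumps(fake_object).encode("utf-8")
        ).decode("ascii")
    }
}
print(parse_gcs_notification(fake_envelope))
```

The real function would then pass the bucket and object name to the scanning API and publish the outcome to the clean or infected topic.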
Architecture
The pipeline consists of the following components:
- GCS Bucket Notifications: Automatically publish events to Pub/Sub when files are uploaded
- Pub/Sub Topic & Subscription: Receives GCS upload notifications
- Cloud Function (2nd Gen): Processes upload events and scans files for viruses
- Pub/Sub Topics for Results: Separate topics for clean and infected file notifications
- Service Accounts: Manage permissions for GCS-to-Pub/Sub notifications
Prerequisites
Before deploying this pipeline, ensure you have:
- GCP Account & Project
  - A GCP project with billing enabled
  - Owner or Editor role on the project
  - APIs enabled (see below)
- Required GCP APIs Enabled
  gcloud services enable cloudfunctions.googleapis.com
  gcloud services enable pubsub.googleapis.com
  gcloud services enable storage.googleapis.com
  gcloud services enable storage-component.googleapis.com
  gcloud services enable run.googleapis.com
  gcloud services enable cloudbuild.googleapis.com
  gcloud services enable artifactregistry.googleapis.com
- Terraform Installed
  - Terraform version 1.10.5 or higher
  - Installation instructions
- gcloud CLI Installed & Configured
  - Installation instructions
  - Authenticated with your GCP account:
    gcloud auth login
    gcloud auth application-default login
- Existing GCS Bucket
  - A GCS bucket where files will be uploaded (must already exist)
  - The bucket name will be used in the deployment configuration
- Virus Scanning API
  - A running ClamAV virus scanning API service
  - The API endpoint URL (IP address or hostname)
  - The API should be accessible from your Cloud Function's VPC network
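The tool prerequisites can be checked with a small script before deploying. The sketch below is illustrative only: the helper names are hypothetical, and it assumes plain numeric version strings such as 1.10.5 (pre-release suffixes are not handled). Versions are compared as numbers because a plain string comparison would wrongly rank 1.9.8 above 1.10.5.

```python
import shutil


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.10.5' (optionally prefixed with 'v') into (1, 10, 5)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def meets_minimum(installed: str, minimum: str = "1.10.5") -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def missing_tools(tools=("terraform", "gcloud")) -> list[str]:
    """Return the names of required CLIs that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print(meets_minimum("1.9.8"))   # False: 9 < 10 when compared as numbers
print(meets_minimum("1.10.5"))  # True
print(missing_tools())          # empty list when both CLIs are installed
```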
Download Files
Download the required Terraform and Python files for the pipeline:
- main.tf - Main Terraform configuration
- variables.tf - Variable definitions
- function/main.py - Cloud Function entry point
- function/requirements.txt - Python dependencies
Quick Setup:
- Create a directory for your pipeline:
  mkdir gcp-pipeline
  cd gcp-pipeline
- Download all files using one of the following methods:
  Method 1: Direct download from documentation site
  # Set the base URL (adjust if your documentation is hosted elsewhere)
  BASE_URL="https://docs.elmcomputing.io"
  # Download Terraform files
  curl -O "${BASE_URL}/gcp/virus-scanning-pipeline/files/main.tf"
  curl -O "${BASE_URL}/gcp/virus-scanning-pipeline/files/variables.tf"
  # Create function directory and download Python files
  mkdir -p function
  curl -o function/main.py "${BASE_URL}/gcp/virus-scanning-pipeline/files/function/main.py"
  curl -o function/requirements.txt "${BASE_URL}/gcp/virus-scanning-pipeline/files/function/requirements.txt"
  Method 2: Manual download
  - Click on each file link above to download
  - Organize them in the following directory structure:
    gcp-pipeline/
    ├── main.tf
    ├── variables.tf
    └── function/
        ├── main.py
        └── requirements.txt
  Method 3: Right-click and "Save link as"
  - Right-click on each file link above
  - Select "Save link as" or "Download linked file"
  - Save to the appropriate location in your gcp-pipeline directory
Deployment Steps
Step 1: Prepare Your Environment
- Download the pipeline files (see Download Files section above) and organize them in a directory structure.
- Navigate to the pipeline directory:
  cd gcp-pipeline
Step 2: Configure Variables
Edit the variables in variables.tf or create a terraform.tfvars file with your specific values:
Required Variables:
project_id = "your-gcp-project-id"
region = "us-central1" # Your preferred GCP region
zone = "us-central1-a" # Your preferred GCP zone
uploads_bucket_name = "your-existing-bucket-name" # Must already exist
virus_scan_api_url = "10.128.0.47" # IP or hostname of your virus scanning API
Optional Variables (with defaults):
- prefix: Prefix to add to all resource names for easy identification (default: "dev")
- function_runtime: Python runtime version (default: "python310")
- function_entry_point: Function entry point (default: "process_gcs_event")
- function_memory: Memory allocation (default: "256M")
- function_timeout: Timeout in seconds (default: 300)
- function_max_instances: Maximum instances (default: 1)
- function_min_instances: Minimum instances (default: 0)
- function_cpu: CPU allocation (default: "0.5")
- function_max_concurrency: Max concurrent requests per instance (default: 1)
- service_account_id: Service account ID (default: "gcs-pubsub-notifier")
- pubsub_ack_deadline_seconds: Message acknowledgement deadline (default: 60)
- pubsub_message_retention_duration: Message retention (default: "2592000s", i.e. 30 days)
Note on Resource Naming:
All resource names will be prefixed with the prefix variable value (default: "dev"). For example, with the default prefix = "dev", resources will be named like:
- dev-<bucket-name>-upload (Pub/Sub topic)
- dev-gcs-virus-scan-<bucket-name> (Cloud Function)
- dev-gcs-pubsub-notifier (Service account)
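Because the same names recur in later gcloud commands, it can help to compute them once. The helper below is a hypothetical sketch that follows the naming patterns shown in this guide (including the -sub suffix used for subscriptions in the testing section). It assumes that an empty prefix drops the leading "<prefix>-" entirely; confirm that against main.tf.

```python
def resource_names(prefix: str, bucket: str) -> dict[str, str]:
    """Derive pipeline resource names from the prefix and bucket name."""
    # Assumption: an empty prefix means no leading "<prefix>-" at all.
    lead = f"{prefix}-" if prefix else ""
    topics = {
        kind: f"{lead}{bucket}-{kind}"
        for kind in ("upload", "clean", "infected")
    }
    names = {f"{kind}_topic": topic for kind, topic in topics.items()}
    names.update(
        {f"{kind}_subscription": f"{topic}-sub" for kind, topic in topics.items()}
    )
    names["function"] = f"{lead}gcs-virus-scan-{bucket}"
    names["service_account"] = f"{lead}gcs-pubsub-notifier"
    return names


for key, value in resource_names("dev", "my-uploads-bucket").items():
    print(f"{key}: {value}")
```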
Example terraform.tfvars file:
project_id = "my-gcp-project"
region = "us-central1"
zone = "us-central1-a"
uploads_bucket_name = "my-uploads-bucket"
virus_scan_api_url = "10.128.0.47"
prefix = "dev" # Optional: prefix for all resource names (default: "dev")
function_memory = "512M"
function_timeout = 600
Step 3: Initialize Terraform
Initialize Terraform to download required providers:
terraform init
This will create a .terraform directory and download the Google provider.
Step 4: Review Deployment Plan
Review what Terraform will create:
terraform plan
This will show you:
- Resources that will be created
- Any potential issues or warnings
- A summary of how many resources will be added, changed, or destroyed (terraform plan does not estimate costs)
Expected Resources:
- Pub/Sub topics (upload notifications, clean files, infected files)
- Pub/Sub subscriptions
- Cloud Function (2nd gen)
- GCS bucket for function source code
- Service accounts and IAM bindings
- GCS bucket notification configuration
Step 5: Deploy the Pipeline
Apply the Terraform configuration:
terraform apply
Terraform will prompt you to confirm. Type yes to proceed.
Deployment typically takes 5-10 minutes as it:
- Creates Pub/Sub topics and subscriptions
- Packages and uploads the Cloud Function source code
- Deploys the Cloud Function
- Configures GCS bucket notifications
- Sets up IAM permissions
- Configures VPC egress settings
Step 6: Verify Deployment
After deployment completes, verify the resources:
- Check Cloud Function:
  gcloud functions describe <prefix>-gcs-virus-scan-<your-bucket-name> --region=<your-region> --gen2
  (Replace <prefix> with your actual prefix value, or omit if prefix is empty)
- Check Pub/Sub Topics:
  gcloud pubsub topics list | grep <your-bucket-name>
  (Topics will be prefixed if the prefix variable is set)
- Check GCS Notifications:
  gcloud storage buckets notifications list gs://<your-bucket-name>
- Check Function Logs:
  gcloud functions logs read <prefix>-gcs-virus-scan-<your-bucket-name> --region=<your-region> --gen2 --limit=50
  (Replace <prefix> with your actual prefix value, or omit if prefix is empty)
Configuration Details
Cloud Function Environment Variables
The Cloud Function is configured with the following environment variables:
- GCP_PROJECT: Automatically set to your project ID
- VIRUS_SCAN_API_URL: URL of your virus scanning API (from virus_scan_api_url variable)
- CLEAN_TOPIC_NAME: Pub/Sub topic for clean files
- INFECTED_TOPIC_NAME: Pub/Sub topic for infected files
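To illustrate how these variables might be consumed, here is a minimal sketch of a configuration loader. It is not the actual function/main.py: the PipelineConfig class and load_config function are hypothetical, while the port and path come from the endpoint listed in the troubleshooting section.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineConfig:
    project: str
    scan_url: str
    clean_topic: str
    infected_topic: str


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Read the function's settings; a missing variable raises KeyError."""
    host = env["VIRUS_SCAN_API_URL"]
    return PipelineConfig(
        project=env["GCP_PROJECT"],
        # Endpoint format taken from the troubleshooting section.
        scan_url=f"http://{host}:8080/api/clamav/scan/gcs/object",
        clean_topic=env["CLEAN_TOPIC_NAME"],
        infected_topic=env["INFECTED_TOPIC_NAME"],
    )


# Example with an explicit mapping instead of the real environment.
example_env = {
    "GCP_PROJECT": "my-gcp-project",
    "VIRUS_SCAN_API_URL": "10.128.0.47",
    "CLEAN_TOPIC_NAME": "dev-my-uploads-bucket-clean",
    "INFECTED_TOPIC_NAME": "dev-my-uploads-bucket-infected",
}
print(load_config(example_env).scan_url)
```

Failing fast on a missing variable makes a misconfigured deployment visible in the first log line rather than at scan time.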
VPC Network Configuration
The Cloud Function is configured with:
- VPC Egress: private-ranges-only - Only traffic to private IP ranges is routed through the VPC; all other traffic goes directly to the internet
- Network: Default VPC network
- Subnet: Default subnet
This lets the function reach your virus scanning API when the API is on a private network.
Pub/Sub Topics Created
- <prefix>-<bucket-name>-upload: Receives GCS upload notifications
- <prefix>-<bucket-name>-clean: Receives notifications for clean files
- <prefix>-<bucket-name>-infected: Receives notifications for infected files
Each topic has a corresponding subscription for message consumption.
Note: All resource names will include the prefix (default: "dev"). For example, with the default prefix, topics will be named dev-<bucket-name>-upload, etc.
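The routing between the clean and infected topics can be sketched as follows. This is an assumption-laden illustration: the function names and the message fields (bucket, name, infected) are hypothetical and may differ from what function/main.py publishes. Only the projects/<project>/topics/<topic> path format is standard Pub/Sub naming.

```python
import json


def topic_path(project: str, topic_name: str) -> str:
    """Build the fully qualified Pub/Sub topic path."""
    return f"projects/{project}/topics/{topic_name}"


def route_result(
    project: str,
    bucket: str,
    name: str,
    infected: bool,
    clean_topic: str,
    infected_topic: str,
) -> tuple[str, bytes]:
    """Pick the destination topic and build a (hypothetical) message body."""
    topic = infected_topic if infected else clean_topic
    body = json.dumps(
        {"bucket": bucket, "name": name, "infected": infected},
        sort_keys=True,
    ).encode("utf-8")
    return topic_path(project, topic), body


path, body = route_result(
    "my-gcp-project",
    "my-uploads-bucket",
    "eicar.txt",
    True,
    "dev-my-uploads-bucket-clean",
    "dev-my-uploads-bucket-infected",
)
print(path)
print(body.decode("utf-8"))
```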
Testing the Pipeline
Test with a Clean File
- Upload a test file to your GCS bucket:
  echo "This is a test file" > test.txt
  gsutil cp test.txt gs://<your-bucket-name>/test.txt
- Check Cloud Function logs:
  gcloud functions logs read <prefix>-gcs-virus-scan-<your-bucket-name> --region=<your-region> --gen2 --limit=20
  (Replace <prefix> with your actual prefix value, or omit if prefix is empty)
- Check the clean topic for messages:
  gcloud pubsub subscriptions pull <prefix>-<your-bucket-name>-clean-sub --limit=1
  (Replace <prefix> with your actual prefix value, or omit if prefix is empty)
Test with an Infected File (EICAR)
- Create EICAR test file (safe test virus signature):
  echo 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' > eicar.txt
- Upload to bucket:
  gsutil cp eicar.txt gs://<your-bucket-name>/eicar.txt
- Check logs and infected topic:
  gcloud functions logs read <prefix>-gcs-virus-scan-<your-bucket-name> --region=<your-region> --gen2 --limit=20
  gcloud pubsub subscriptions pull <prefix>-<your-bucket-name>-infected-sub --limit=1
  (Replace <prefix> with your actual prefix value, or omit if prefix is empty)
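Shell quoting can silently alter the EICAR string (for example, if double quotes are used instead of single quotes). As an alternative, the sketch below writes the 68-byte test string from Python. The write_eicar helper is hypothetical, and the string is split in two only so that the script itself is less likely to be flagged by a local scanner.

```python
from pathlib import Path

# The standard EICAR test string is exactly 68 ASCII characters.
# A raw string keeps the backslash literal.
EICAR = r"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-" + "STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def write_eicar(path: str = "eicar.txt") -> Path:
    """Write the EICAR test string to `path` with no trailing newline."""
    target = Path(path)
    target.write_bytes(EICAR.encode("ascii"))
    return target


if __name__ == "__main__":
    created = write_eicar()
    print(f"wrote {created} ({created.stat().st_size} bytes)")
```

The resulting file can then be uploaded with the same gsutil cp command as above.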
Monitoring
View Function Logs
gcloud functions logs read <prefix>-gcs-virus-scan-<your-bucket-name> --region=<your-region> --gen2 --limit=50
(Replace <prefix> with your actual prefix value, or omit if prefix is empty)
Monitor Pub/Sub Metrics
In the GCP Console:
- Navigate to Pub/Sub > Topics
- Select your topics to view metrics:
- Message count
- Publish rate
- Subscription backlog
Monitor Cloud Function Metrics
In the GCP Console:
- Navigate to Cloud Functions
- Select your function to view:
- Invocation count
- Execution time
- Error rate
- Active instances
Troubleshooting
Function Not Triggering
- Verify GCS notifications are configured:
  gcloud storage buckets notifications list gs://<your-bucket-name>
- Check Pub/Sub topic has messages:
  gcloud pubsub subscriptions pull <prefix>-<your-bucket-name>-upload-sub --limit=5
  (Replace <prefix> with your actual prefix value, or omit if prefix is empty)
- Verify function is active:
  gcloud functions describe <prefix>-gcs-virus-scan-<your-bucket-name> --region=<your-region> --gen2
  (Replace <prefix> with your actual prefix value, or omit if prefix is empty)
Virus Scan API Connection Issues
- Check function logs for connection errors:
  gcloud functions logs read <prefix>-gcs-virus-scan-<your-bucket-name> --region=<your-region> --gen2 --limit=50
  (Replace <prefix> with your actual prefix value, or omit if prefix is empty)
- Verify VPC configuration:
  - Ensure the virus scanning API is accessible from the function's VPC
  - Check that firewall rules allow traffic from Cloud Functions
- Test API connectivity:
  - The function uses http://<virus_scan_api_url>:8080/api/clamav/scan/gcs/object
  - Verify this endpoint is reachable from your VPC
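One way to check reachability is a plain TCP connection test, run from a machine inside the same VPC (for example, a VM on the default network). The sketch below is a hypothetical helper, not part of the pipeline. It only confirms that port 8080 accepts connections, not that the scan endpoint itself responds correctly.

```python
import os
import socket


def can_connect(host: str, port: int = 8080, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # VIRUS_SCAN_API_URL mirrors the function's environment variable;
    # the localhost fallback is only a placeholder for local runs.
    host = os.environ.get("VIRUS_SCAN_API_URL", "127.0.0.1")
    print(f"{host}:8080 reachable: {can_connect(host)}")
```

A result of False from inside the VPC usually points to a firewall rule or a stopped API service; False from outside the VPC is expected for a private address.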
Permission Errors
- Check service account permissions:
  gcloud projects get-iam-policy <your-project-id>
- Verify GCS bucket permissions:
  gsutil iam get gs://<your-bucket-name>
Function Timeout Issues
If files are large and scans take longer:
- Increase function timeout in variables.tf:
  function_timeout = 600 # 10 minutes
- Increase function memory:
  function_memory = "512M"
- Re-apply Terraform:
  terraform apply
Updating the Pipeline
Update Function Code
- Modify code in function/main.py or function/requirements.txt
- Re-apply Terraform:
  terraform apply
  Terraform will detect changes and redeploy the function automatically.
Update Configuration Variables
- Modify variables in variables.tf or terraform.tfvars
- Review changes:
  terraform plan
- Apply changes:
  terraform apply
Cleanup / Uninstall
To remove all resources created by this pipeline:
terraform destroy
Warning: This will delete:
- All Pub/Sub topics and subscriptions
- The Cloud Function
- GCS bucket notifications
- Service accounts (if not used elsewhere)
- Function source code bucket
The original uploads bucket will not be deleted.
Support
For issues or questions:
- Check the troubleshooting section above
- Review Cloud Function logs for error messages
- Verify all prerequisites are met
- Ensure your virus scanning API is running and accessible
Additional Resources
- Terraform Google Provider Documentation
- Cloud Functions (2nd Gen) Documentation
- Pub/Sub Documentation
- GCS Notifications Documentation
Disclaimer
This pipeline and its associated components are provided on an "AS-IS" basis. To the fullest extent permitted by law, Elm Computing disclaims and excludes any implied or statutory warranty, including any warranty of title, non-infringement, merchantability or fitness for a particular purpose. Elm Computing does not warrant that the pipeline will operate uninterrupted or error-free, or that all errors will be corrected.
Users are solely responsible for:
- Ensuring the pipeline is properly configured and deployed according to their specific requirements
- Maintaining and updating the pipeline components as needed
- Verifying that the virus scanning API is properly configured and accessible
- Implementing appropriate security measures and access controls
- Backing up data and implementing disaster recovery procedures
- Complying with all applicable laws, regulations, and security requirements
- Any loss, damage, or liability resulting from the use or inability to use this pipeline
The pipeline relies on ClamAV, an open-source antivirus solution. While ClamAV is widely used and maintained, no antivirus solution can guarantee 100% detection of all malware. Users should implement additional security measures as appropriate for their use case and risk tolerance.