How to Dynamically Manage Azure Data Factory IPs in Snowflake
Automate the capture and whitelisting of Azure Data Factory IPs in Snowflake for seamless data operations.

Introduction
Managing IP whitelisting between Azure Data Factory (ADF) and Snowflake presents a unique challenge because ADF allocates IP addresses dynamically. Manually updating the network policies that make up Snowflake’s IP allowlist is time-consuming and error-prone, and a missed update can cause pipeline failures and data transfer interruptions. This guide presents an automated solution that keeps your Snowflake allowlist synchronized with ADF’s published IP ranges.
Key Challenges
- Dynamic IP Allocation: Azure Data Factory can use different IP addresses based on load and availability
- Regional Variations: IP ranges differ across Azure regions
- Regular Updates: Azure’s IP ranges are updated weekly
- Security Compliance: Maintaining strict access control while ensuring availability
Solution Overview
Our approach combines Python scripts, Snowflake stored procedures, and scheduled tasks to:
- Fetch current ADF IP ranges from Azure
- Update Snowflake’s IP allowlist automatically
- Track IP usage and clean up stale entries
- Maintain an audit trail of IP changes
Step 1: Setting Up the Infrastructure
First, let’s create the necessary Snowflake objects to manage our IP addresses.
-- Create a database and schema for IP management
CREATE DATABASE IF NOT EXISTS IP_MANAGEMENT;
USE DATABASE IP_MANAGEMENT;
CREATE SCHEMA IF NOT EXISTS ADF_IP_CONTROL;
USE SCHEMA ADF_IP_CONTROL;
-- Create table for storing IP information
CREATE TABLE IF NOT EXISTS AZURE_ADF_IPS (
    IP_ADDRESS VARCHAR(50),      -- full CIDR string; sized for IPv6 prefixes
    REGION VARCHAR(50),
    SERVICE VARCHAR(50),
    FIRST_SEEN TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    LAST_SEEN TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    ACTIVE BOOLEAN DEFAULT TRUE,
    NETWORK_PREFIX VARCHAR(45),  -- network address without the mask
    CIDR_RANGE INTEGER,          -- prefix length, e.g. 26
    COMMENT VARCHAR(1000),
    PRIMARY KEY (IP_ADDRESS, REGION)  -- informational only; Snowflake does not enforce it
);
-- Create view for active IPs
CREATE OR REPLACE VIEW V_ACTIVE_ADF_IPS AS
SELECT
    IP_ADDRESS,
    REGION,
    LAST_SEEN,
    NETWORK_PREFIX || '/' || CIDR_RANGE AS CIDR_NOTATION
FROM AZURE_ADF_IPS
WHERE ACTIVE = TRUE;
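With the view in place, the active ranges can feed the network policy that actually gates access. ALLOWED_IP_LIST takes a literal list, so the statement must be rebuilt from V_ACTIVE_ADF_IPS on each refresh (for example, by extending the Python script in Step 2). A minimal sketch with placeholder values (the policy name and the TEST-NET addresses below are ours, not real ADF IPs):
-- Hypothetical policy gating ADF access; replace the placeholder ranges
CREATE NETWORK POLICY ADF_ACCESS_POLICY
ALLOWED_IP_LIST = ('203.0.113.0/24');
-- On each refresh, rebuild the list from V_ACTIVE_ADF_IPS and re-apply it
ALTER NETWORK POLICY ADF_ACCESS_POLICY
SET ALLOWED_IP_LIST = ('203.0.113.0/24', '198.51.100.0/25');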
Step 2: Python Script for IP Management
Here’s a Python script that fetches Azure’s published ranges, filters them to Data Factory in your region, and records them in Snowflake:
import requests
import ipaddress
import snowflake.connector
from typing import Dict, List

class ADFIPManager:
    def __init__(self, region: str, snowflake_conn: Dict[str, str]):
        self.region = region
        self.sf_conn = snowflake_conn
        # NOTE: this is a dated snapshot URL; Microsoft publishes a new
        # ServiceTags_Public_<date>.json weekly. Resolve the current URL from
        # the "Azure IP Ranges and Service Tags - Public Cloud" download page
        # before each run, or refresh this value as part of your deployment.
        self.azure_ip_url = "https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20231211.json"

    def fetch_azure_ips(self) -> List[Dict]:
        """Fetch and parse the Azure IP ranges published for ADF in our region."""
        response = requests.get(self.azure_ip_url, timeout=30)
        response.raise_for_status()  # fail loudly on anything but HTTP 200
        adf_ips = []
        for value in response.json()['values']:
            properties = value.get('properties', {})
            # Keep only DataFactory service tags scoped to the target region
            if ('DataFactory' in properties.get('systemService', '')
                    and self.region in properties.get('region', '')):
                for prefix in properties['addressPrefixes']:
                    network = ipaddress.ip_network(prefix)
                    adf_ips.append({
                        'ip_address': prefix,  # full CIDR string
                        'network_prefix': str(network.network_address),
                        'cidr_range': network.prefixlen
                    })
        return adf_ips

    def update_snowflake_ips(self, ips: List[Dict]) -> None:
        """Upsert fetched ranges; ranges no longer published stay inactive."""
        with snowflake.connector.connect(**self.sf_conn) as conn:
            cursor = conn.cursor()
            # Mark the region's existing IPs inactive; the MERGE below
            # re-activates every range that is still published
            cursor.execute("""
                UPDATE AZURE_ADF_IPS
                SET ACTIVE = FALSE
                WHERE REGION = %s
            """, (self.region,))
            # Insert new ranges, or re-activate known ones and touch LAST_SEEN
            for ip in ips:
                cursor.execute("""
                    MERGE INTO AZURE_ADF_IPS t
                    USING (SELECT %s AS IP_ADDRESS, %s AS REGION) s
                    ON t.IP_ADDRESS = s.IP_ADDRESS AND t.REGION = s.REGION
                    WHEN MATCHED THEN
                        UPDATE SET
                            ACTIVE = TRUE,
                            LAST_SEEN = CURRENT_TIMESTAMP()
                    WHEN NOT MATCHED THEN
                        INSERT (IP_ADDRESS, REGION, SERVICE, NETWORK_PREFIX, CIDR_RANGE)
                        VALUES (%s, %s, 'DataFactory', %s, %s)
                """, (ip['ip_address'], self.region,
                      ip['ip_address'], self.region,
                      ip['network_prefix'], ip['cidr_range']))

# Usage example
if __name__ == "__main__":
    snowflake_conn = {
        "user": "YOUR_USERNAME",
        "password": "YOUR_PASSWORD",
        "account": "YOUR_ACCOUNT",
        "warehouse": "YOUR_WAREHOUSE",
        "database": "IP_MANAGEMENT",
        "schema": "ADF_IP_CONTROL"
    }
    manager = ADFIPManager("australiaeast", snowflake_conn)
    ips = manager.fetch_azure_ips()
    manager.update_snowflake_ips(ips)
Step 3: Automating with Snowflake Tasks
Let’s create a Snowflake task to run our IP updates automatically, directly inside Snowflake.
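Before the stored procedure can fetch the service-tags JSON, it needs outbound network access. A minimal sketch, assuming your role can create network rules and external access integrations (the rule and integration names are our own placeholders):
-- Allow egress to Microsoft's download host
CREATE OR REPLACE NETWORK RULE AZURE_IP_DOWNLOAD_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('download.microsoft.com');
-- Bundle the rule into an integration the procedure can reference
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION AZURE_IP_ACCESS_INTEGRATION
ALLOWED_NETWORK_RULES = (AZURE_IP_DOWNLOAD_RULE)
ENABLED = TRUE;
With outbound access in place, wrap the update logic in a Python stored procedure: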
-- Create a stored procedure to execute the update logic
-- (requests comes from the Snowflake Anaconda channel; ipaddress is part of
-- the standard library and must not be listed in PACKAGES)
CREATE OR REPLACE PROCEDURE UPDATE_ADF_IPS()
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('snowflake-snowpark-python', 'requests')
EXTERNAL_ACCESS_INTEGRATIONS = (AZURE_IP_ACCESS_INTEGRATION)
HANDLER = 'run_ip_update'
AS $$
def run_ip_update(session):
    # The handler receives a Snowpark Session
    import requests
    import ipaddress
    # Port the fetch/merge logic from the Python script above,
    # replacing cursor.execute(...) with session.sql(...).collect()
    return "IP update completed successfully"
$$;
-- Create task to run every 6 hours
CREATE OR REPLACE TASK REFRESH_ADF_IPS_TASK
    WAREHOUSE = COMPUTE_WH
    SCHEDULE = 'USING CRON 0 */6 * * * America/Los_Angeles'
AS
    CALL UPDATE_ADF_IPS();
-- Start the task (the owning role needs the account-level EXECUTE TASK privilege)
ALTER TASK REFRESH_ADF_IPS_TASK RESUME;
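Before relying on the schedule, it’s worth triggering a one-off run to confirm the wiring:
-- Trigger an immediate run of the task
EXECUTE TASK REFRESH_ADF_IPS_TASK;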
Step 4: Monitoring and Maintenance
Creating a Monitoring Dashboard
-- Create a view for monitoring IP changes
CREATE OR REPLACE VIEW V_IP_CHANGES AS
SELECT
    DATE_TRUNC('day', FIRST_SEEN) AS CHANGE_DATE,
    COUNT(CASE WHEN ACTIVE = TRUE THEN 1 END) AS ACTIVE_IPS,
    COUNT(CASE WHEN ACTIVE = FALSE THEN 1 END) AS INACTIVE_IPS,
    COUNT(*) AS TOTAL_IPS
FROM AZURE_ADF_IPS
GROUP BY CHANGE_DATE
ORDER BY CHANGE_DATE DESC;
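The task’s own execution history is just as important as the IP counts; the TASK_HISTORY table function surfaces recent runs and their error messages:
-- Review the last 20 runs of the refresh task
SELECT NAME, STATE, ERROR_MESSAGE, SCHEDULED_TIME, COMPLETED_TIME
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    TASK_NAME => 'REFRESH_ADF_IPS_TASK',
    RESULT_LIMIT => 20))
ORDER BY SCHEDULED_TIME DESC;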
Cleanup Procedures
-- The archive table mirrors the live table's structure
CREATE TABLE IF NOT EXISTS ARCHIVED_ADF_IPS LIKE AZURE_ADF_IPS;
-- Create procedure to clean up old IP entries
CREATE OR REPLACE PROCEDURE CLEANUP_OLD_IPS()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
    -- Archive inactive IPs not seen in 30 days
    INSERT INTO ARCHIVED_ADF_IPS
    SELECT *
    FROM AZURE_ADF_IPS
    WHERE ACTIVE = FALSE
      AND LAST_SEEN < DATEADD(day, -30, CURRENT_TIMESTAMP());
    -- Remove the archived rows from the live table
    DELETE FROM AZURE_ADF_IPS
    WHERE ACTIVE = FALSE
      AND LAST_SEEN < DATEADD(day, -30, CURRENT_TIMESTAMP());
    RETURN 'Cleanup completed successfully';
END;
$$;
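The cleanup can run on its own schedule, just like the refresh; a sketch reusing the same warehouse:
-- Run the cleanup daily at 03:00 UTC
CREATE OR REPLACE TASK CLEANUP_ADF_IPS_TASK
    WAREHOUSE = COMPUTE_WH
    SCHEDULE = 'USING CRON 0 3 * * * UTC'
AS
    CALL CLEANUP_OLD_IPS();
ALTER TASK CLEANUP_ADF_IPS_TASK RESUME;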
Best Practices and Considerations
Security
- Principle of Least Privilege
  - Create dedicated roles for IP management (see the role sketch after this list)
  - Limit access to IP management procedures
  - Regularly audit IP access patterns
- Error Handling
  - Implement retry logic for API calls
  - Set up alerts for failed updates
  - Maintain fallback IP ranges
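A minimal sketch of a dedicated role, assuming the object names from the earlier steps; adapt the grants to your own security model:
-- Dedicated role for day-to-day IP management
CREATE ROLE IF NOT EXISTS ADF_IP_ADMIN;
GRANT USAGE ON DATABASE IP_MANAGEMENT TO ROLE ADF_IP_ADMIN;
GRANT USAGE ON SCHEMA IP_MANAGEMENT.ADF_IP_CONTROL TO ROLE ADF_IP_ADMIN;
GRANT SELECT, INSERT, UPDATE, DELETE
    ON TABLE IP_MANAGEMENT.ADF_IP_CONTROL.AZURE_ADF_IPS TO ROLE ADF_IP_ADMIN;
GRANT USAGE ON PROCEDURE IP_MANAGEMENT.ADF_IP_CONTROL.UPDATE_ADF_IPS() TO ROLE ADF_IP_ADMIN;
GRANT USAGE ON PROCEDURE IP_MANAGEMENT.ADF_IP_CONTROL.CLEANUP_OLD_IPS() TO ROLE ADF_IP_ADMIN;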
Performance
- Resource Management
  - Use appropriate warehouse sizes
  - Schedule updates during off-peak hours
  - Implement efficient IP range lookups
- Monitoring
  - Track task execution history
  - Monitor IP change patterns
  - Set up alerts for unexpected changes (see the alert sketch after this list)
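One way to catch unexpected changes is a Snowflake alert that fires when new ranges appear. The sketch below assumes an email notification integration named IP_ALERTS_EMAIL and a placeholder recipient; both are ours, not part of the original setup:
-- Fire when any new range appeared within the last hour
CREATE OR REPLACE ALERT NEW_ADF_IP_ALERT
    WAREHOUSE = COMPUTE_WH
    SCHEDULE = '60 MINUTE'
    IF (EXISTS (
        SELECT 1 FROM AZURE_ADF_IPS
        WHERE FIRST_SEEN > DATEADD(hour, -1, CURRENT_TIMESTAMP())))
    THEN CALL SYSTEM$SEND_EMAIL(
        'IP_ALERTS_EMAIL',  -- hypothetical notification integration
        'ops@example.com',
        'New ADF IP ranges detected',
        'Review V_IP_CHANGES and V_ACTIVE_ADF_IPS for details.');
ALTER ALERT NEW_ADF_IP_ALERT RESUME;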
Troubleshooting Common Issues
- Task Failures
  - Check warehouse availability
  - Verify Python runtime dependencies
  - Review task history for error messages (the TASK_HISTORY query in Step 4)
- IP Synchronization Issues
  - Validate Azure API responses
  - Check network connectivity
  - Review IP update logs
Conclusion
Automating Azure Data Factory IP management in Snowflake is crucial for maintaining reliable data pipelines. This solution provides a robust framework that can be adapted to various organizational needs and scaled as requirements grow.
Next Steps
- Implement the solution in a test environment
- Customize monitoring and alerting
- Develop disaster recovery procedures
- Plan for regional expansion
References
- Azure IP Ranges and Service Tags - Public Cloud
- Snowflake Network Policies
- Azure Data Factory IP Configuration