Below you will find a set of common issues and available resources for diagnosing errors during the ControlShift to Redshift data synchronization as detailed in our developer documentation.
We’ve created separated sections for general issues, for problems that are not specific to signatures data sync using Glue, nor to other data sync using the controlshift-redshift-loader Lambda function. You will also find two additional sections tackling issues that are specific to these sync mechanisms.
Signatures not syncing (AWS Glue)
Other data not syncing (AWS Lambda function)
Database Schema Divergence
It’s possible that ControlShift's database schema diverges from Redshift's. In order to ensure both schemas are aligned, you'll need to manually run a DDL script to recreate Redshift tables with the current ControlShift schema. To get the DDL script, you can either:
- Download from the developers documentation page. Since this is statically generated, it’s possible that at has fallen out of date. To ensure you get the most up-to-date version of the schema use the option below.
- Generate the DDL script from our JSON API using the script
For more information see our developer documentation.
Please note that this script will delete all the data on these tables, so after running it, you will need to trigger a full data export from the webhooks integration page in your ControlShift > Settings > Integrations > Webhooks to re-sync all of your data.
Lambda functions not running on webhook notifications
In this case, the first step would be to ensure that the webhook notifications are correctly sent to AWS. To do this:
- Go to your ControlShift site and navigate to the admin homepage > Settings > Integrations > Webhooks > Manage
- Click on the Trigger button under Test Nightly Bulk Data Export Webhook section to manually trigger a full export for all tables
- Click on the Configure link next to the webhook endpoint, and then click on Debug. You should see a page like the one below, listing all the webhook notifications sent and updated in real time. Ensure that the values for the 2nd column from the left, listing the HTTP status code for the webhook notification response, is 200 for all rows. If that’s not the case double check the webhook endpoint URL and ensure it matches the output URL from the terraform module.
If you have confirmed that the response for webhook notifications is 200 OK, but the Lambda functions to load data into Redshift are still not running, you can check the logs for the controlshift-webhook-handler Lambda function. These should provide an insight into any errors occurring while they run.
Additionally, you can look at the API gateway dashboard for controlshift-webhook-receiver to confirm that webhook notifications are being correctly processed to trigger the Lambda function.
Other places you can check are the controlshift-received-webhooks and controlshift-received-webhooks-glue SQS queues. The controlshift-webhook-handler Lambda function will queue jobshere for processing each data export CSV, either with the controlshift-redshift-loader Lambda function or with AWS Glue (via controlshift-run-glue-crawler and controlshift-run-glue-job lambdas).
In particular, you may want to check the Number Of Messages Received metric, to check that the lambda queued messages for processing the CSVs.
Unauthorized access to data exports S3 bucket
Access to the data exports S3 bucket must be granted for your AWS account. If you’re seeing errors while attempting to download the data from the S3 bucket (agra-data-exports-production for the North American and agra-data-exports-production-eu-central-1 for the European ControlShift data center) verify that the AWS account ID you entered in the webhook configuration is correct. If that’s the case, contact us at email@example.com to ensure we’ve updated the S3 bucket permissions to grant you access.
Details of data load commands on Redshift
From the Queries and loads page for your Redshift cluster, you can see a list of the COPY commands executed to sync the data. These will list commands executed by both, AWS Glue and the Lambda function.
Signatures not syncing (AWS Glue)
Glue signatures table not generated
The first step for importing signatures using AWS Glue is to run a crawler on the CSV. This process analyzes the CSV file and generates a schema based on the columns and their values.
To verify that the data export CSV was correctly processed, navigate to the AWS Glue homepage, and then select the Tables option from the left sidebar menu. You should see a page similar to the image below, clicking on the signatures link should display a list of the signatures columns.
If you don’t see the signatures table, or its columns are not correct, you should check that the crawler has run without errors. To do that click on the Crawlers option on the left sidebar menu. On the crawlers page, you should see a page similar to the one below. From this page you can find the crawler’s logs for the latest run and the history of all of the crawler’s runs, with their status. If the status is anything other than Completed, check the crawler logs for errors.
In case there are no runs for the crawler you should check that the controlshift-run-glue-crawler Lambda function has run. This function runs automatically when a webhook notification for a new data export for the signatures table is received. Look at the function’s logs to confirm that there haven’t been any errors while running.
Glue job errors
Timeout while connecting to S3
The controlshift-redshift-sync terraform module expects your AWS Redshift infrastructure to be hosted on the US East (us-east-1) region. If you are running your infrastructure, and particularly Glue on another region you may find an error similar to the following:
An error occurred while calling o79.getDynamicFrame. Unable to execute HTTP request: Connect to agra-data-exports-production.s3.amazonaws.com:443 [agra-data-exports-production.s3.amazonaws.com/18.104.22.168] failed: connect timed out
This is caused by the AWS Glue job not being able to connect to the S3 bucket where we store the data exports (agra-data-exports-production), which is hosted on the us-east-1 region.
If you need to run your infrastructure in a region other than us-east-1, you’ll need to setup a VPC peering connection between the VPC in your region and a VPC in the us-east-1 region. See AWS documentation for more details.
Other job errors
AWS Glue jobs write lots of logging during their runs, which is usually useful for troubleshooting errors. From the job’s page you can see an historical list of all its runs, along with error and other debugging logs.
If rows are being imported into Redshift, but all of them have missing values on one or more columns, it’s possible that the Glue job script that references the signatures data schema is out of date. You can easily fix this issue by applying the terraform changes, which should query ControlShift’s API to get the latest schema for the signatures table and regenerate the script file with that information.
Other data not syncing (AWS Lambda function)
Check the controlshift-redshift-loader Lambda function logs for details. We include lots of info and debug entries to help you follow the execution path of the code, but any errors will be prefixed with “error:” which should make it easy to spot.
The controlshift-redshift-loader Lambda function processes CSV files in batches. To keep track of their status, it uses DynamoDB. You can check which batch a file was included on by checking the LambdaRedshiftProcessedFiles table. Look for the CSV file path in S3 on the loadFile column and write down the batchId value for the record. With that value, you can check the LambdaRedshiftBatches table, where you can find the batch status.
It may also be useful to check the LambdaRedshiftBatchLoadConfig table. This table contains the configuration for each type of data export that the loader receives: a combination of either incremental or full + the ControlShift table name.
Please note that DynamoDB record updates usually fail when done via terraform, so if you change a configuration on your ControlShift site (e.g.: enable BZIP2 compression for data exports), you will likely need to manually delete all records on the LambdaRedshiftBatchLoadConfig table, and apply terraform changes after that’s done.
One final place you can check for details is the S3 bucket where the controlshift-redshift-loaderlambda function stores the manifests. The name of this bucket is defined via the manifest_bucket_name variable passed to the controlshift-redshift-sync terraform module. In this bucket you’ll find the with the manifests/ prefix, the manifest files themselves, containing the URL of the data export CSV file, along with some metadata used by the lambda function. With the failed/ prefix, you’ll find the same manifest files but specifically for the cases where the CSV was not correctly processed and imported into Redshift.
Please sign in to leave a comment.