Simple Storage Service (S3) is a web service that allows us to store and access information at any time, from anywhere on the web. It essentially acts as cloud storage. Amazon does not give open, direct access to the data stored in S3 (Simple Storage Service), because otherwise anyone could access this information. To keep the data safe, we load the information from the DMP into S3 and then download it to our local machine.
Step 4: Download Data on Local Machine
Downloading individual files can be tedious and time-consuming because the data is gathered daily. Therefore, we have written scripts that help us download the data. The seven scripts are:
The above files help us download the data. For every hour, we get four files: TD_Impressions, TD_Clicks, TD_Videos and TD_Conversion. So, each day we get 24 files for each data frame, for a total of 96 files (4 × 24). We store this data in four different folders.
All these folders contain daily records ordered by date. The “Run.R” file triggers all the other scripts. The downloaded data is in the form of CSV files. The next step is to load the data into the Sawyer S3 bucket.
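The file counts described above can be checked with a short sketch. It is written in Python purely for illustration (the actual download scripts are R files), and the file-name pattern and the example date are hypothetical, since the real export names are not given here.

```python
from datetime import date
from typing import Dict, List

# The four hourly data frames named in the text (one folder each).
DATA_FRAMES = ["TD_Impressions", "TD_Clicks", "TD_Videos", "TD_Conversion"]
HOURS_PER_DAY = 24


def expected_files(day: date) -> Dict[str, List[str]]:
    """Map each data frame to the 24 hourly file names expected for one day.

    The naming pattern is an assumption made for this sketch only.
    """
    return {
        frame: [
            f"{frame}_{day.isoformat()}_{hour:02d}.csv"
            for hour in range(HOURS_PER_DAY)
        ]
        for frame in DATA_FRAMES
    }


# Arbitrary example date.
files = expected_files(date(2020, 1, 1))
per_frame = {frame: len(names) for frame, names in files.items()}
total = sum(per_frame.values())
print(per_frame)  # 24 files for each of the four data frames
print(total)      # 4 * 24 = 96
```

A listing like this could also be compared against the contents of the four folders to spot hours that failed to download.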
Step 5: Sawyer S3 Bucket
Sawyer Studios has its own account on AWS. Using the login credentials, i.e. an access key ID and a secret access key, we create an Amazon S3 bucket (the Sawyer S3 bucket) and upload the data files from the previous step to it. The Sawyer S3 bucket is private and can be accessed only by the admin. To make the data available to all the other employees, we transfer it to Redshift.
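As a rough illustration of the upload step, the sketch below walks a local folder and sends every CSV file to a bucket. It is written in Python and is not the script actually used here. The function name, the bucket name and the `prefix` argument are hypothetical. The only call made on the client is `upload_file(Filename, Bucket, Key)`, which is the method offered by a boto3 S3 client.

```python
from pathlib import Path
from typing import List


def upload_csv_files(s3_client, bucket: str, local_dir: str, prefix: str = "") -> List[str]:
    """Upload every CSV under local_dir to the bucket and return the object keys.

    s3_client is assumed to behave like boto3.client("s3"); only its
    upload_file(Filename, Bucket, Key) method is used.
    """
    uploaded = []
    root = Path(local_dir)
    for path in sorted(root.rglob("*.csv")):
        # Keep the local folder layout (one folder per data frame) in the key.
        key = prefix + path.relative_to(root).as_posix()
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

With boto3 installed, a client can be created with `boto3.client("s3")`, which reads the access key ID and secret access key from the environment or the AWS credentials file, and then passed in as `upload_csv_files(client, "example-sawyer-bucket", "downloads")`. Passing the client as an argument keeps the credentials out of the script itself.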
Step 6: Redshift Transfer
The process of loading data into Sawyer Studios’ Amazon Redshift database tables is as follows:
(1) Launch an Amazon Redshift cluster and create database tables.
(2) Use COPY commands to load the tables from the data files on Amazon S3.
(3) Troubleshoot load errors and modify your COPY commands to correct the errors.
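Steps (2) and (3) above can be sketched as follows. This Python snippet only builds the SQL text, and any Redshift client can then run it. The table name, bucket path and IAM role ARN are hypothetical placeholders. Authorizing with `IAM_ROLE` is an assumption of this sketch, since COPY can also be authorized with an access key ID and secret access key. `STL_LOAD_ERRORS` is the Redshift system table that records failed loads.

```python
def build_copy_command(table: str, s3_uri: str, iam_role: str, ignore_header: int = 0) -> str:
    """Build a Redshift COPY statement that loads CSV files from an S3 prefix.

    Values are interpolated directly, so only trusted inputs should be passed.
    """
    lines = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role}'",
        "FORMAT AS CSV",
    ]
    if ignore_header > 0:
        # Skip header rows if the CSV files have them (an assumption here).
        lines.append(f"IGNOREHEADER {ignore_header}")
    return "\n".join(lines) + ";"


# Query for step (3): inspect the most recent load errors.
LOAD_ERRORS_QUERY = (
    "SELECT starttime, filename, line_number, colname, err_reason\n"
    "FROM stl_load_errors\n"
    "ORDER BY starttime DESC\n"
    "LIMIT 10;"
)

# Hypothetical table, bucket and role names, for illustration only.
copy_sql = build_copy_command(
    table="td_impressions",
    s3_uri="s3://example-sawyer-bucket/TD_Impressions/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ignore_header=1,
)
print(copy_sql)
```

Pointing `FROM` at a folder prefix rather than a single file lets one COPY command load all the hourly files for a data frame in one pass.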