This is a sample data pipeline implemented using the Transposit platform.
All of the data is fake; it is pulled from this sample data.
The daily sales data is dropped into an S3 bucket by some other process.
We want to process that data and enrich it with additional information. We only want to process new orders. For these orders, we want to add the sales lead and the inventory level. For this example, these enrichments are mocked up, but you could easily add a new data connector that reaches out to internal APIs.
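As a rough illustration of the filter-and-enrich step described above, here is a minimal Python sketch. It is not the Transposit implementation: the column names (`Order ID`, `Region`, `Item Type`), the mock lookup tables, and the function names are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set

Order = Dict[str, object]

# Hypothetical mock lookups standing in for calls to internal APIs.
MOCK_SALES_LEADS = {"Europe": "Alice", "Asia": "Bob"}
MOCK_INVENTORY = {"Cereal": 120, "Clothes": 45}


def filter_new_orders(orders: Iterable[Order], seen_ids: Set[str]) -> List[Order]:
    """Keep only orders whose ID has not been processed before."""
    return [o for o in orders if o["Order ID"] not in seen_ids]


def enrich(order: Order) -> Order:
    """Return a copy of the order with a sales lead and inventory level added."""
    enriched = dict(order)
    enriched["Sales Lead"] = MOCK_SALES_LEADS.get(order["Region"], "unassigned")
    enriched["Inventory Level"] = MOCK_INVENTORY.get(order["Item Type"], 0)
    return enriched


# Example input (made up): order 1001 was already processed, 1002 is new.
orders = [
    {"Order ID": "1001", "Region": "Europe", "Item Type": "Cereal"},
    {"Order ID": "1002", "Region": "Asia", "Item Type": "Clothes"},
]
new_orders = [enrich(o) for o in filter_new_orders(orders, seen_ids={"1001"})]
```

In a real deployment the two mock dictionaries would be replaced by whatever data connector reaches your internal systems.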
We want to push a sum of each region's sales totals to a Google spreadsheet (perhaps for an executive dashboard) and add all new orders to a data warehouse.
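The per-region sum could look something like the following Python sketch. The `Region` and `Total Revenue` column names are assumptions about the sample CSV, and `to_sheet_rows` is a hypothetical helper that only shapes the totals into rows; it does not call the Google Sheets API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def sum_sales_by_region(orders: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Add up the revenue of every order, grouped by region."""
    totals: Dict[str, float] = defaultdict(float)
    for order in orders:
        totals[str(order["Region"])] += float(order["Total Revenue"])
    return dict(totals)


def to_sheet_rows(totals: Mapping[str, float]) -> List[list]:
    """Shape the totals as [region, total] rows, sorted by region name."""
    return [[region, round(total, 2)] for region, total in sorted(totals.items())]


# Example input (made up); CSV values typically arrive as strings.
sample_orders = [
    {"Region": "Europe", "Total Revenue": "100.50"},
    {"Region": "Europe", "Total Revenue": "200.25"},
    {"Region": "Asia", "Total Revenue": "50.00"},
]
rows = to_sheet_rows(sum_sales_by_region(sample_orders))
```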
The pipeline looks like this:
S3 file -> filter out old orders -> add sales leads -> add inventory data -> sum up sales -> update Google sheets -> update BigQuery
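One way to picture the chain above is as a list of stages, each taking a list of records and returning a new one. The Python sketch below is only an analogy for that flow, not how Transposit wires operations together; `run_pipeline` and the two placeholder stages are hypothetical names.

```python
from functools import reduce
from typing import Callable, List

Records = List[dict]
Stage = Callable[[Records], Records]


def run_pipeline(records: Records, stages: List[Stage]) -> Records:
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, records)


# Placeholder stages (hypothetical); each takes and returns a list of records.
def drop_old(records: Records) -> Records:
    return [r for r in records if r["new"]]


def tag_lead(records: Records) -> Records:
    return [{**r, "lead": "mock-lead"} for r in records]


result = run_pipeline(
    [{"id": 1, "new": True}, {"id": 2, "new": False}],
    [drop_old, tag_lead],
)
```

Keeping every stage to the same records-in, records-out shape makes it straightforward to add, remove, or reorder steps.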
You need to set up the following external resources:
processed folder. Download the 100 Sales Records.csv file from the sample data and upload it to the
AmazonS3FullAccess permissions, or at a minimum read/write permissions for the bucket you created above. You'll need the AWS_SECRET_ACCESS_KEY for this user.
orderdata. Create the table from a file upload (of the sample sales data) and select 'Auto detect' for the schema, so that the schema gets picked up from the CSV file.
delete from `default.orderdata` where 1=1
(or just delete some of them: delete from `default.orderdata` where Region='Europe')
Create a free Transposit account.
scheduled_task operation, which contains the S3 manipulations, the pipeline_top operation, which documents the pipeline, and the add_sales_rep operation, which adds in some sales data.
300000 (5 minutes) for the scheduled_task operation by going to the Properties tab.
scheduled_task operation every 10 minutes:
37 /10 * ? * *
You can also run the pipeline by clicking "Run now & show log". You should then see records start to appear in the BigQuery table.