+

Let's say you don't have an AWS Direct Connect or a VPN connection from your AWS account to your on-premise datacenter. But you have some processed data that needs to be sent back to your on-premise Hadoop cluster upon completion. This requires you to somehow initiate a process (in this example a Hadoop Distcp) from your on-premise. How do we know when the processed data is ready? How do we know when to start the data copy process? Just leave a message!

Here is an example of a small AWS SQS consumer application that works with Hadoop and AWS s3 to copy processed data from s3 to your local Hadoop cluster.

In this example scenario, we have Amazon EMR processing data that is outputted to s3. The last step in the EMR workflow is to post a message to an AWS SQS queue. The body of this message contains the

»