From 4f43f407dbcacdac41ec7d3ab3f48df7208910a2 Mon Sep 17 00:00:00 2001
From: Sabby Anandan
Date: Tue, 16 Feb 2016 13:00:38 -0800
Subject: [PATCH] Add new analytics category

Add twitter-analytics sample
Update `admin-local` references with `server-local`
---
 README.adoc                             |  10 +-
 analytics/twitter-analytics/README.adoc | 160 ++++++++++++++++++
 .../images/twitter_analytics.png        | Bin 0 -> 426148 bytes
 streaming/http-to-cassandra/README.adoc |   2 +-
 streaming/http-to-mysql/README.adoc     |   2 +-
 5 files changed, 170 insertions(+), 4 deletions(-)
 create mode 100644 analytics/twitter-analytics/README.adoc
 create mode 100644 analytics/twitter-analytics/images/twitter_analytics.png

diff --git a/README.adoc b/README.adoc
index 6ea8bdd..dd50c84 100644
--- a/README.adoc
+++ b/README.adoc
@@ -6,12 +6,18 @@ This repository provides sample starter applications and code for use with the S
 
 ### link:streaming/http-to-cassandra/README.adoc[http-cassandra]
 
-A data pipeline demonstration that consumes data from an `http` endpoint and writes the payload to Cassandra database using the `cassandra` sink.
+A data pipeline demonstration that consumes data from an `http` endpoint and writes the payload to a Cassandra database using the `cassandra` sink application.
 
 ### link:streaming/http-to-mysql/README.adoc[http-mysql]
 
-A data pipeline demonstration that consumes data from an `http` endpoint and writes the payload to MySQL database using the `jdbc` sink.
+A data pipeline demonstration that consumes data from an `http` endpoint and writes the payload to a MySQL database using the `jdbc` sink application.
 
 ## Task / Batch
 
+## Analytics
+
+### link:analytics/twitter-analytics/README.adoc[twitter-analytics]
+
+A data pipeline demonstration that consumes data from the Twitter firehose using the `twitterstream` source application and computes simple analytics over data-in-transit with the help of the `field-value-counter` sink application.
+
 ## Data Science
diff --git a/analytics/twitter-analytics/README.adoc b/analytics/twitter-analytics/README.adoc
new file mode 100644
index 0000000..2e4279d
--- /dev/null
+++ b/analytics/twitter-analytics/README.adoc
@@ -0,0 +1,160 @@
+:sectnums:
+= Twitter Analytics
+
+In this demonstration, you will learn how to orchestrate a data pipeline using http://cloud.spring.io/spring-cloud-dataflow/[Spring Cloud Data Flow] to consume data from _TwitterStream_ and compute simple analytics over data-in-transit using _Field-Value-Counter_.
+
+We will begin by discussing the steps to prepare, configure, and operationalize Spring Cloud Data Flow's `local-server`, a Spring Boot application.
+
+== Using Local SPI
+
+=== Prerequisites
+
+To get started, make sure that you have the following components:
+
+* Local build of link:https://github.com/spring-cloud/spring-cloud-dataflow[Spring Cloud Data Flow]
+* Running instance of link:http://redis.io/[Redis]
+* Twitter credentials from link:https://apps.twitter.com/[Twitter Developers] site
+
+=== Running the Sample Locally
+
+. Launch the `local-server`
++
+```
+$ cd 
+$ java -jar spring-cloud-dataflow-server-local/target/spring-cloud-dataflow-server-local-1.0.0.BUILD-SNAPSHOT.jar
+
+```
++
+
+. Connect to Spring Cloud Data Flow's `shell`
++
+```
+$ cd 
+$ java -jar spring-cloud-dataflow-shell/target/spring-cloud-dataflow-shell-1.0.0.BUILD-SNAPSHOT.jar
+
+  ____                              ____ _                __
+ / ___| _ __  _ __(_)_ __   __ _   / ___| | ___  _   _  __| |
+ \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
+  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
+ |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
+  ____ |_|    _          __|___/                 __________
+ |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
+ | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
+ | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
+ |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/
+
+1.0.0.BUILD-SNAPSHOT
+
+Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
+dataflow:>version
+1.0.0.BUILD-SNAPSHOT
+```
+
++
+. Create and deploy the following streams
++
+```
+(1) dataflow:>stream create tweets --definition "twitterstream --consumerKey= --consumerSecret= --accessToken= --accessTokenSecret= | log"
+Created new stream 'tweets'
+
+(2) dataflow:>stream create tweetlang --definition ":tweets.twitterstream > field-value-counter --fieldName=lang --name=language --store=redis" --deploy
+Created and deployed new stream 'tweetlang'
+
+(3) dataflow:>stream create tagcount --definition ":tweets.twitterstream > field-value-counter --fieldName=entities.hashtags.text --name=hashtags --store=redis" --deploy
+Created and deployed new stream 'tagcount'
+
+(4) dataflow:>stream deploy tweets
+Deployed stream 'tweets'
+```
+NOTE: To get a `consumerKey` and `consumerSecret`, you need to register a Twitter application. If you don’t already have one set up, you can create an app at the link:https://apps.twitter.com/[Twitter Developers] site to get these credentials. The tokens ``, ``, ``, and `` must be replaced with your account credentials.
+
++
+. Verify that the streams are successfully deployed. Where: (1) is the primary pipeline; (2) and (3) are tapping the primary pipeline with the DSL syntax `.