:sectnums:
= Twitter Analytics

In this demonstration, you will learn how to orchestrate a data pipeline using http://cloud.spring.io/spring-cloud-dataflow/[Spring Cloud Data Flow] to consume data from _TwitterStream_ and compute simple analytics over data-in-transit using _Field-Value-Counter_.

We will begin with the steps to prepare, configure, and run Spring Cloud Data Flow's `Local` server.

== Using Local Server

=== Prerequisites

Make sure that you have the following components:

* Local build of link:https://github.com/spring-cloud/spring-cloud-dataflow[Spring Cloud Data Flow]
* Running instance of link:http://redis.io/[Redis]
* Running instance of link:http://kafka.apache.org/downloads.html[Kafka]
* Twitter credentials from link:https://apps.twitter.com/[Twitter Developers] site
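
Before starting the server, it can help to confirm that Redis and Kafka are reachable. The following is a minimal sketch, not part of the sample itself: it assumes both services run on `localhost` with their default ports (6379 for Redis, 9092 for Kafka), and the names `port_open` and `check_services` are made up for illustration. A successful TCP connection only shows that something is listening on the port, not that the service is healthy.

```python
import socket

# Assumption: both services run locally on their default ports.
SERVICES = {"Redis": ("localhost", 6379), "Kafka": ("localhost", 9092)}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

If either service is reported as not reachable, adjust the host and port in `SERVICES` to match your setup, or start the service before continuing.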

=== Running the Sample Locally

. Launch the `local-server`
+
```
$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW>
$ java -jar spring-cloud-dataflow-server-local/target/spring-cloud-dataflow-server-local-<VERSION>.jar
```

. Connect to Spring Cloud Data Flow's `shell`
+
```
$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW>
$ java -jar spring-cloud-dataflow-shell/target/spring-cloud-dataflow-shell-<VERSION>.jar

  ____                              ____ _                __
 / ___| _ __  _ __(_)_ __   __ _   / ___| | ___  _   _  __| |
 \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
 |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
  ____ |_|    _          __|___/                 __________
 |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
 | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
 | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
 |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/

<VERSION>

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>version
<VERSION>
```

+
. https://github.com/spring-cloud/spring-cloud-dataflow/blob/master/spring-cloud-dataflow-docs/src/main/asciidoc/streams.adoc#register-a-stream-app[Register] the Kafka binder variant of the out-of-the-box applications
+

```
dataflow:>app import --uri http://bit.ly/Bacon-RELEASE-stream-applications-kafka-10-maven
```

+
. Create and deploy the following streams
+
```
(1) dataflow:>stream create tweets --definition "twitterstream --consumerKey=<CONSUMER_KEY> --consumerSecret=<CONSUMER_SECRET> --accessToken=<ACCESS_TOKEN> --accessTokenSecret=<ACCESS_TOKEN_SECRET> | log"
Created new stream 'tweets'

(2) dataflow:>stream create tweetlang  --definition ":tweets.twitterstream > field-value-counter --fieldName=lang --name=language" --deploy
Created and deployed new stream 'tweetlang'

(3) dataflow:>stream create tagcount --definition ":tweets.twitterstream > field-value-counter --fieldName=entities.hashtags.text --name=hashtags" --deploy
Created and deployed new stream 'tagcount'

(4) dataflow:>stream deploy tweets
Deployed stream 'tweets'
```
NOTE: To get a consumerKey and consumerSecret, you need to register a Twitter application. If you don't already have one set up, you can create an app at the link:https://apps.twitter.com/[Twitter Developers] site to get these credentials. The tokens `<CONSUMER_KEY>`, `<CONSUMER_SECRET>`, `<ACCESS_TOKEN>`, and `<ACCESS_TOKEN_SECRET>` must be replaced with your account credentials.

+
. Verify that the streams are deployed successfully. In the commands above, (1) is the primary pipeline; (2) and (3) tap the primary pipeline using the DSL syntax `:<stream-name>.<label/app name>` (e.g. `:tweets.twitterstream`); and (4) is the final deployment of the primary pipeline

+
```
dataflow:>stream list
```
+
. Notice that `tweetlang.field-value-counter`, `tagcount.field-value-counter`, `tweets.log` and `tweets.twitterstream` link:https://github.com/spring-cloud-stream-app-starters/[Spring Cloud Stream] applications are running as Spring Boot applications within the `local-server`.
+

```
2016-02-16 11:43:26.174  INFO 10189 --- [nio-9393-exec-2] o.s.c.d.d.l.OutOfProcessModuleDeployer   : deploying module org.springframework.cloud.stream.module:field-value-counter-sink:jar:exec:1.0.0.BUILD-SNAPSHOT instance 0
   Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flow-6990537012958280418/tweetlang-1455651806160/tweetlang.field-value-counter
2016-02-16 11:43:26.206  INFO 10189 --- [nio-9393-exec-3] o.s.c.d.d.l.OutOfProcessModuleDeployer   : deploying module org.springframework.cloud.stream.module:field-value-counter-sink:jar:exec:1.0.0.BUILD-SNAPSHOT instance 0
   Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flow-6990537012958280418/tagcount-1455651806202/tagcount.field-value-counter
2016-02-16 11:43:26.806  INFO 10189 --- [nio-9393-exec-4] o.s.c.d.d.l.OutOfProcessModuleDeployer   : deploying module org.springframework.cloud.stream.module:log-sink:jar:exec:1.0.0.BUILD-SNAPSHOT instance 0
   Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flow-6990537012958280418/tweets-1455651806800/tweets.log
2016-02-16 11:43:26.813  INFO 10189 --- [nio-9393-exec-4] o.s.c.d.d.l.OutOfProcessModuleDeployer   : deploying module org.springframework.cloud.stream.module:twitterstream-source:jar:exec:1.0.0.BUILD-SNAPSHOT instance 0
   Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flow-6990537012958280418/tweets-1455651806800/tweets.twitterstream
```
+
. Verify that the two `field-value-counter` instances named `hashtags` and `language` are listed
+
```
dataflow:>field-value-counter list
╔════════════════════════╗
║Field Value Counter name║
╠════════════════════════╣
║hashtags                ║
║language                ║
╚════════════════════════╝
```
+
. Verify that you can query the results of each `field-value-counter`
+
```
dataflow:>field-value-counter display hashtags
Displaying values for field value counter 'hashtags'
╔══════════════════════════════════════╤═════╗
║                Value                 │Count║
╠══════════════════════════════════════╪═════╣
║KCA                                   │   40║
║PENNYSTOCKS                           │   17║
║TEAMBILLIONAIRE                       │   17║
║UCL                                   │   11║
║...                                   │   ..║
║...                                   │   ..║
║...                                   │   ..║
╚══════════════════════════════════════╧═════╝

dataflow:>field-value-counter display language
Displaying values for field value counter 'language'
╔═════╤═════╗
║Value│Count║
╠═════╪═════╣
║en   │1,171║
║es   │  337║
║ar   │  296║
║und  │  251║
║pt   │  175║
║ja   │  137║
║..   │  ...║
║..   │  ...║
║..   │  ...║
╚═════╧═════╝

```

+
. Open the `Dashboard` at `http://localhost:9393/dashboard` and select the `Analytics` tab. From the default `Dashboard` menu, select the following combinations to visualize real-time updates for each `field-value-counter`.

- For real-time updates on `language` tags, select:
 .. Metric Type as `Field-Value-Counters`
 .. Stream as `language`
 .. Visualization as `Bubble-Chart` or `Pie-Chart`
- For real-time updates on `hashtags` tags, select:
 .. Metric Type as `Field-Value-Counters`
 .. Stream as `hashtags`
 .. Visualization as `Bubble-Chart` or `Pie-Chart`

image::images/twitter_analytics.png[Twitter Analytics Visualization]
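
To make the analytics step more concrete, the following is a minimal Python sketch of the kind of counting that the `tweetlang` and `tagcount` streams ask `field-value-counter` to perform for `--fieldName=lang` and `--fieldName=entities.hashtags.text`. It is not the application's implementation: the sample tweets and the helper names `field_values` and `count_field` are made up for illustration, and counting each element of a list separately is an assumption about how a dotted path such as `entities.hashtags.text` is resolved.

```python
from collections import Counter

# Hypothetical, heavily trimmed tweet payloads used only for this sketch.
SAMPLE_TWEETS = [
    {"lang": "en", "entities": {"hashtags": [{"text": "KCA"}, {"text": "UCL"}]}},
    {"lang": "es", "entities": {"hashtags": [{"text": "KCA"}]}},
    {"lang": "en", "entities": {"hashtags": []}},
]


def field_values(node, path):
    """Yield every value reachable via the list of keys in `path`."""
    if isinstance(node, list):
        # Fan out over lists so each element is visited on its own.
        for item in node:
            yield from field_values(item, path)
    elif not path:
        yield node
    elif isinstance(node, dict) and path[0] in node:
        yield from field_values(node[path[0]], path[1:])


def count_field(tweets, field_name):
    """Count how often each value of a dotted field name occurs."""
    counts = Counter()
    for tweet in tweets:
        counts.update(field_values(tweet, field_name.split(".")))
    return counts


if __name__ == "__main__":
    print(count_field(SAMPLE_TWEETS, "lang"))
    print(count_field(SAMPLE_TWEETS, "entities.hashtags.text"))
```

In the running sample these counts are maintained continuously as tweets arrive and are stored under the counter names `language` and `hashtags`, which is what the shell and the dashboard display.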

== Summary

In this sample, you have learned:

* How to use Spring Cloud Data Flow's `Local` server
* How to use Spring Cloud Data Flow's `shell` application
* How to create a streaming data pipeline to compute simple analytics using the `Twitter Stream` and `Field Value Counter` applications