diff --git a/README.adoc b/README.adoc
index dd50c84..42314b2 100644
--- a/README.adoc
+++ b/README.adoc
@@ -21,3 +21,7 @@ A data pipeline demonstration that consumes data from an `http` endpoint and wri
 A data pipeline demonstration that consumes data from twitter-firehose using `twitterstream` source application and computes simple analytics over data-in-transit with the help of `field-value-counter` sink application.
 
 ## Data Science
+
+### link:datascience/species-prediction/README.adoc[species-prediction]
+
+A simple demonstration that walks through the steps to compute real-time predictions using a https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language[PMML] data microservice application.
diff --git a/datascience/species-prediction/README.adoc b/datascience/species-prediction/README.adoc
new file mode 100644
index 0000000..d7da113
--- /dev/null
+++ b/datascience/species-prediction/README.adoc
@@ -0,0 +1,158 @@
+:sectnums:
+= Species Prediction
+
+In this demonstration, you will learn how to use a https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language[PMML] model in a streaming data pipeline orchestrated by http://cloud.spring.io/spring-cloud-dataflow/[Spring Cloud Data Flow].
+
+We will begin with the steps to prepare, configure, and operationalize Spring Cloud Data Flow's `local-server`, a Spring Boot application.
+
+== Using Local SPI
+
+=== Prerequisites
+
+To get started, make sure that you have the following components:
+
+* A local build of link:https://github.com/spring-cloud/spring-cloud-dataflow[Spring Cloud Data Flow]
+* A running instance of link:http://redis.io/[Redis]
+
+=== Running the Sample Locally
+
+. Launch the `local-server`:
++
+```
+$ cd
+$ java -jar spring-cloud-dataflow-server-local/target/spring-cloud-dataflow-server-local-1.0.0.BUILD-SNAPSHOT.jar
+```
+
+. Connect to Spring Cloud Data Flow's `shell`:
++
+```
+$ cd
+$ java -jar spring-cloud-dataflow-shell/target/spring-cloud-dataflow-shell-1.0.0.BUILD-SNAPSHOT.jar
+
+1.0.0.BUILD-SNAPSHOT
+
+Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
+dataflow:>version
+1.0.0.BUILD-SNAPSHOT
+```
+
+. Create and deploy the following stream:
++
+```
+dataflow:>stream create --name pmmlTest --definition "http --server.port=9001 | pmml --modelLocation=https://raw.githubusercontent.com/spring-cloud/spring-cloud-stream-modules/master/pmml-processor/src/test/resources/iris-flower-classification-naive-bayes-1.pmml.xml --inputs='Sepal.Length=payload.sepalLength,Sepal.Width=payload.sepalWidth,Petal.Length=payload.petalLength,Petal.Width=payload.petalWidth' --outputs='Predicted_Species=payload.predictedSpecies' --inputType='application/x-spring-tuple' --outputType='application/json'| log" --deploy
+Created and deployed new stream 'pmmlTest'
+```
++
+NOTE: The built-in `pmml` processor loads the PMML model definition from `modelLocation` and builds an internal object representation that can be evaluated quickly. Each message the stream receives is used as input for evaluating the analytical model `iris-flower-classifier-1` contained in the PMML document. `--inputs` maps payload fields such as `payload.sepalLength` to the model's input fields such as `Sepal.Length`, and `--outputs` writes the model's `Predicted_Species` result to a new payload field, `predictedSpecies`. The model is a naive Bayes classifier.
+
+. Verify that the stream is successfully deployed:
++
+```
+dataflow:>stream list
+```
+
+. Notice that the `pmmlTest.http`, `pmmlTest.pmml`, and `pmmlTest.log` link:https://github.com/spring-cloud/spring-cloud-stream-modules/[Spring Cloud Stream] applications are deployed by the `local-server`, which reports where each application writes its logs:
++
+```
+2016-02-18 06:36:45.396 INFO 31194 --- [nio-9393-exec-1] o.s.c.d.d.l.OutOfProcessModuleDeployer : deploying module org.springframework.cloud.stream.module:log-sink:jar:exec:1.0.0.BUILD-SNAPSHOT instance 0
+ Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flow-3038434123335455382/pmmlTest-1455806205386/pmmlTest.log
+2016-02-18 06:36:45.402 INFO 31194 --- [nio-9393-exec-1] o.s.c.d.d.l.OutOfProcessModuleDeployer : deploying module org.springframework.cloud.stream.module:pmml-processor:jar:exec:1.0.0.BUILD-SNAPSHOT instance 0
+ Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flow-3038434123335455382/pmmlTest-1455806205386/pmmlTest.pmml
+2016-02-18 06:36:45.407 INFO 31194 --- [nio-9393-exec-1] o.s.c.d.d.l.OutOfProcessModuleDeployer : deploying module org.springframework.cloud.stream.module:http-source:jar:exec:1.0.0.BUILD-SNAPSHOT instance 0
+ Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-data-flow-3038434123335455382/pmmlTest-1455806205386/pmmlTest.http
+```
+
+. Post sample data to the `http` endpoint at `http://localhost:9001` (`9001` is the `server.port` we specified for the `http` source):
++
+```
+dataflow:>http post --target http://localhost:9001 --contentType application/json --data "{ \"sepalLength\": 6.4, \"sepalWidth\": 3.2, \"petalLength\":4.5, \"petalWidth\":1.5 }"
+> POST (application/json;charset=UTF-8) http://localhost:9001 { "sepalLength": 6.4, "sepalWidth": 3.2, "petalLength":4.5, "petalWidth":1.5 }
+> 202 ACCEPTED
+```
+
+. Verify the predicted outcome by tailing the logs of the `pmmlTest.log` application (their location is printed in the `local-server` output above).
+
+. Post a second sample, this time with a different `petalWidth`:
++
+```
+dataflow:>http post --target http://localhost:9001 --contentType application/json --data "{ \"sepalLength\": 6.4, \"sepalWidth\": 3.2, \"petalLength\":4.5, \"petalWidth\":1.8 }"
+> POST (application/json;charset=UTF-8) http://localhost:9001 { "sepalLength": 6.4, "sepalWidth": 3.2, "petalLength":4.5, "petalWidth":1.8 }
+> 202 ACCEPTED
+```
++
+NOTE: The `petalWidth` value changed from `1.5` to `1.8`.
+
+. The `predictedSpecies` will now be listed as `virginica`:
++
+```
+{
+  "sepalLength": 6.4,
+  "sepalWidth": 3.2,
+  "petalLength": 4.5,
+  "petalWidth": 1.8,
+  "Species": {
+    "result": "virginica",
+    "type": "PROBABILITY",
+    "categoryValues": [
+      "setosa",
+      "versicolor",
+      "virginica"
+    ]
+  },
+  "predictedSpecies": "virginica",
+  "Probability_setosa": 1.0443898084700813E-8,
+  "Probability_versicolor": 0.1750120333571921,
+  "Probability_virginica": 0.8249879561989097
+}
+```
+
+== Summary
+
+In this sample, you have learned:
+
+* How to use Spring Cloud Data Flow in `local-server` mode
+* How to use Spring Cloud Data Flow's `shell`
+* How to use the `pmml` processor to compute real-time predictions
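The species-prediction sample relies on two mappings: the `--inputs` mapping in the `stream create` command, and the `Probability_*` fields in the prediction output. A minimal Python sketch can make both concrete. It is not the `pmml` processor's implementation, and `map_inputs` and `most_probable` are hypothetical helpers. `map_inputs` assumes every expression has the plain form `payload.<field>`, whereas the processor accepts more general expressions. `most_probable` assumes the predicted species is the category with the highest `Probability_*` value, which is consistent with the sample output for `petalWidth` `1.8`.

```python
# Hypothetical helpers illustrating the species-prediction sample; a simplified
# sketch, not the pmml processor's actual implementation.

# The --inputs value used in the `stream create` command.
INPUTS = ("Sepal.Length=payload.sepalLength,Sepal.Width=payload.sepalWidth,"
          "Petal.Length=payload.petalLength,Petal.Width=payload.petalWidth")

# Fields copied from the sample log output for petalWidth = 1.8.
OUTPUT = {
    "predictedSpecies": "virginica",
    "Probability_setosa": 1.0443898084700813e-08,
    "Probability_versicolor": 0.1750120333571921,
    "Probability_virginica": 0.8249879561989097,
}


def map_inputs(spec, payload):
    """Build the model's input fields from 'Field=payload.key' pairs.

    Assumption: every expression has the form payload.<key>; anything
    else is rejected rather than evaluated.
    """
    prefix = "payload."
    fields = {}
    for pair in spec.split(","):
        name, expr = pair.split("=", 1)
        if not expr.startswith(prefix):
            raise ValueError(f"unsupported expression: {expr!r}")
        fields[name] = payload[expr[len(prefix):]]
    return fields


def most_probable(result):
    """Return the category whose Probability_<category> value is highest."""
    prefix = "Probability_"
    probs = {key[len(prefix):]: value
             for key, value in result.items() if key.startswith(prefix)}
    return max(probs, key=probs.get)


if __name__ == "__main__":
    payload = {"sepalLength": 6.4, "sepalWidth": 3.2,
               "petalLength": 4.5, "petalWidth": 1.8}
    print(map_inputs(INPUTS, payload))
    # Should agree with the predictedSpecies field reported by the stream.
    print(most_probable(OUTPUT), OUTPUT["predictedSpecies"])
```

Running the script prints the mapped input fields, then `virginica virginica`. This shows that in this sample the reported `predictedSpecies` is the most probable category.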