IST718 WK10
10.1 Presenting Introduction
So welcome to week 10. Time to wrap this up. When we started this course, our goal was to provide a broad introduction to advanced analytics, develop a portfolio of resources, demonstrations, recipes, and examples of various analytic techniques. From the beginning, with exploratory data analysis to the last few weeks with machine learning and neural networks, we’ve had several primary learning objectives that have provided the structure for each week. This structure we’ve framed around OSEMN– obtain, scrub, explore, model, interpret. obtaining data and explaining data structures and data elements.
Once we have a problem to explore, the next logical step– is sometimes the first logical step– is, hey, look, I have data. Maybe I can find a problem. But normally, the first step is to obtain data once we know what the problem is. What data do we need? Where does the data come from? How do I access the data? How do I store the data? How large is the data?
Scrubbing the data is we build that pipeline to ingest the data. We want to start scrubbing the data so that we know that we’re removing any outliers, or we’re handling processing any outliers, we’re converting unstructured text to structured text. And then we’re putting this data in a format, a repository where we can explore the data, usually done with exploratory data analysis, to give us a sense of the range or the span of the data, the variety of the data.
And we want to start looking at the data in context to the response variable that we’re interested in. Is it a classification problem? Is it a separation problem? Is it a forecasting problem? And how do these dependent– or independent variables impact on the dependent or the response variable?
Once we’ve got this idea of the relationship, we can start modeling the machine learning aspects. We can start modeling sort of right off the bat, and then we can interpret. How do we provide the results back from our modeling effort, how do we provide the results back from our exploratory data analysis in a manner that lets us tell a story and communicate these results in a meaningful way, and hopefully drive the business to make a decision based on what the data is telling us?
So our challenge this week is really to help you as you’re finishing up the course to think through how you wrap up your course project, take some of the lessons learned from the laboratory exercises, and take some of the lessons learned as we’ve gone from pipeline to product, and communicate these results in a meaningful way.
10.2 Cat in the Hat
TODO
Cole Nussbaumer Knaflic achieved “rockstar” status after the release of her book, Storytelling With Data.” After watching her presentation on Cat in the Hat and data visualization, what jumped out at you about your current proposed visualizations for the course project? Identify a graphic from your project or other course work that fell a little short, then rework it based on some of Cole’s principles. Submit both images with a short write-up of what was “wrong” and how you made it “right.”
10.3 SOAR
So taking the idea of hypothesis, data analysis, and findings, we’re going to spice it up a little bit, and we’re going to look at how we can SOAR. How we can specify, observe, analyze, and recommend. And over the next couple of sessions, we’ll walk through each one of these, and give some examples of how we can go from just here’s my hypothesis or my null hypothesis to here’s a problem that I’m trying to solve, and this is what I think this problem could do, and this is what we’ve done with the data to have a better understanding, and here’s the recommendation or the next steps we want the business to take.
10.4 Specify
The first part that we’re going to focus on for this session is specifying. We’ll review how in each week or in almost every week this semester we’ve looked at the different elements of a problem and how we turn that elements of the problem into the specification of the work we’re going to do with the data. So let’s just review quickly some of these examples.
So that’s just one example of how we could do that for the baseball game. We have other opportunities to do this, whether it’s in retail. We’ve got a group of loyal customers that have been with us a long time. But our competitors are making an aggressive effort in order to get them to switch. We’ve done a market research study to look at our customers. And we’ve been able to identify some primary drivers of what keeps them engaged with us and also some distractors to our plans that might encourage them to leave us.
10.5 Observe
And now we want to observe. We want to tell a story with our data that shows how it led to the analysis, and of course, how it leads to the recommendation that we want to make. So it’s not just a story.
For this part, it’s not just a story of, I observed that I had this many counts, I observed that I had this many records, I observed that I had this mean or median or standard deviation. It’s an observe and attempt to tell the story of the data itself. And we really want to drive up the energy with the observe portion because the data is what’s ultimately driving us to a recommendation.
10.6 Analyze
- Correlation
- Autocorrelation
- Partial autocorrelation
- Sentiment Analysis
- Picking Values (with a decision tree)
10.7 Recommend
So let’s review, quickly, some of the examples that we’ve had throughout the course. In the market basket analysis, we’ve got a chart that tells us, in a graph format, that these different grocery items are more connected to other grocery items– for instance chips and salsa, dairy and bread– showing these pictures and showing this graph might not be the story you want to tell. So it might be a case of, hey, here’s what chip sales are by themselves. Here’s what salsa sales are by themselves. When we combine chips and salsa together, that has this much confidence from our analytical approach, we can expect a sales increase of x.
So in this sense, you might have two different analytical approaches working together– a forecasting approach working with the market basket analysis. When we looked at the Bayesian approach to trying to get someone to switch in the electronics, based upon their brand loyalty, we came up with the little tagline that a consumer’s preference leads to a consumer choice. A consumer choice drives sales, and driving sales increase the market share for our brand.