Category: Data Analytics

In the midst of teaching my children about the economics behind a lemonade stand, I realized that the lessons I am trying to teach my 7-year-old daughter are either outdated or just plain wrong. With the changing face of business, even my simplest comments about trying to be in a high-traffic location don’t really make a lot of sense in today’s world.

Me: “You should think about setting up in a place where there’s a lot of people, you are more likely to get customers.”

Passing along the knowledge that high volume leads to higher sales isn’t exactly right – more traffic alone doesn’t guarantee more customers. Yes, it’s always nice to have more opportunity, but my comment lacked a basic understanding of targeting. Also, the way I started the sentence isn’t really true in business anymore: “You should think” doesn’t align with modern reality. Taking a step back, I need to show her that we need to “examine the data” that shows where her customers are most likely to be. Maybe her soccer field is better than an indoor mall.

Me: “Make sure you prepare enough ingredients.”

My comment to her was followed up by a plethora of questions like “how do you know how much XXX ingredient to have?” After she asked this question, I began to see that I was completely unprepared on the knowledge front. I definitely started this conversation without drinking my own lemonade (pun intended). Not only did I not know the lemonade business, but I hadn’t even begun thinking about weather, holidays, or cyclical patterns that affect sales. Each of these factors is an issue businesses inherently think about and take for granted, but never formalize into their plan. Luckily, in the end, the volume of ingredients came down to how much she could carry. *Phew* crisis averted.

Me: “Make a big sign so everyone who goes by can see it.”

Outside of the obvious, there are many more marketing channels than a sign. I cringed even as I said this. I again ran into my own preconceived notions about business. Bigger isn’t necessarily better. I worry that putting this in her head is only going to make her think that’s the only factor in success. Instead, I should have explained that boutique shops can make higher margins based on a quality product and targeted advertising.

Maybe I’m the problem here. It seems that, as a parent with business and computer science degrees, I really should have better answers for my daughter. So I sat down, thought through what I’d said, and began to talk about signals, patterns in data, and making decisions based on the data.

Daughter: “Dad, do you mean a signal like a red light?”

Yes. I started to explain to her that if car velocity were the target variable, then a red light would be the signal – albeit really the cause… obviously, I am going to have to talk about this when she gets older – or, more accurately, when I get a little wiser. Being at the forefront of the software industry, I’m seeing a trend to bring this science to the masses. Eureqa, one of our software partners, has opened doors, makes common sense of data science, and frankly, has blown the socks off retail analytics. I loaded the software onto one of our four-node (32-core) clusters. I’ve started showing my daughter the software and what all the numbers mean. Not surprisingly, it has been easier when she can see the trend lines. At some point, I will need to teach my daughter about the data science behind the software, but for now, she trusts the data and what it tells her.

The perception she is forming of business is already radically different than mine. In my career, I’ve been helping businesses with the collection and analysis of data through software. That’s assumed for her. She already makes decisions based on technology (don’t get me started about a 7-year-old having a phone or her texting me at bedtime). Her trust of data and patterns is, well, a generation ahead.

After one of my sessions of showing her the Eureqa software, she asked why everyone didn’t use this technology. I almost fell down the rat hole of describing the “Simply Orange” case study to her (telling her that analytics are actually everywhere), but wisely, I just smiled, poured her a lemonade, and told her I thought everyone should use it and maybe she could be the one to show the world. If you haven’t read the case study about how Coca-Cola uses information to normalize their orange juice, you should. It’s where we are headed.

I never thought I’d get so tongue-tied explaining the business of a lemonade stand, but it came out fine in the end with a little help.

PS: Julia@chicagoist, you are wrong about Simply Orange. Data science is the status quo today and should be embraced. It creates better products, not flawed ones. I’ll put my money on the data science any day of the week, because in a few years my daughter is coming to eat your lunch.

Artificial Intelligence Meets Business Intelligence

How smart is your lemonade stand? Learn how RoundTower Technologies is helping businesses and organizations like yours to streamline data science workflows and transform raw data into easy-to-interpret analytical models in minutes.

For three years in a row, a machine learning algorithm that learned to predict from past results data provided by BrisNet has crushed the Kentucky Derby, but you’ll be lucky to hear about it outside of this blog, as it’s not something that the mainstream horse media necessarily want to hear – a machine doing better than the human pundits (where have we seen this before!).

Eureqa is a machine learning program that uses evolutionary algorithms (that is, algorithms that learn off one another) to generate predictive models. The program was developed by Michael Schmidt and his team at Nutonian and has been used in a number of applications. We actually used Eureqa here at Performance Genetics to do our initial model building.

Eureqa, or more specifically Nutonian, approached Ed de Rosa, the Marketing Manager at leading horse data supplier BrisNet, back in 2014 with the idea that they would try to predict the most likely winner of the 2014 Kentucky Derby. That year, after analyzing the data provided by BrisNet (which includes some proprietary figures developed by BrisNet for their handicappers), they arrived at the following five horses: California Chrome, Wicked Strong, Danza, Vicars in Trouble, and Samraat.

Their predicted top 5 for the 2014 Kentucky Derby:

  1. California Chrome
  2. Wicked Strong
  3. Danza
  4. Vicars in Trouble
  5. Samraat

You can read their predictions here. As we now know, California Chrome won the Kentucky Derby, but it is worth noting that they also correctly picked Wicked Strong, Danza and Samraat to fill the first five over the line, only missing longshot Commanding Curve.

Interestingly, along with a detailed discussion on how they went about their work, they supplied the final algorithm for their predictive score:

Horse Score = 5.614695362 + 2.634162332*(Racing Style_Early) + 0.5869793526*(Trainer Meet %)*Speed - 0.06186576034*Speed - 57.63578215*(Trainer Meet %) - 1.000054353*exp(1.027235778*(Starting Price Implied Probability Standardized))
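
Since they published the full formula, it is straightforward to turn into code. The following Python sketch is simply a transcription of that 2014 scoring function; the argument names and the sample inputs are illustrative assumptions (in particular, treating “Racing Style_Early” as a 0/1 indicator is a guess), not anything supplied by BrisNet or Nutonian.

```python
import math

def horse_score(style_early, trainer_meet_pct, speed, odds_prob_std):
    """Nutonian's published 2014 Kentucky Derby scoring formula.

    style_early      -- 1 if the horse's racing style is "Early", else 0
                        (treating the term as an indicator is an assumption)
    trainer_meet_pct -- trainer win percentage at the meet, as a fraction
    speed            -- speed figure for the horse
    odds_prob_std    -- standardized implied probability from the starting price
    """
    return (5.614695362
            + 2.634162332 * style_early
            + 0.5869793526 * trainer_meet_pct * speed
            - 0.06186576034 * speed
            - 57.63578215 * trainer_meet_pct
            - 1.000054353 * math.exp(1.027235778 * odds_prob_std))

# Illustrative values only -- not actual 2014 Derby figures.
print(horse_score(style_early=1, trainer_meet_pct=0.15, speed=100.0, odds_prob_std=0.5))
```

Each term’s coefficient and sign matches the formula above; ranking the field is then just a matter of sorting the horses by this score.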

Their predicted top 5 for the 2015 Kentucky Derby:

  1. American Pharoah
  2. Dortmund
  3. Materiality
  4. Danzig Moon
  5. Tencendur

The following year they were back at it again. This time they didn’t reproduce their final algorithm – which would have to be somewhat similar to the one from the year prior, as there was only another year’s worth of data to add – but they again did very well with their predicted first five in the Kentucky Derby.

Again they had the winner in American Pharoah, but they also had the third placegetter, Dortmund, and the fifth placegetter, Danzig Moon, in their top 5 selections.

So we come to this year’s predictions. In their blog post they mention a couple of new factors that their algorithm has learned to weight, but the model rests on just five variables:

  • Standardized live odds probability
  • Speed over the past two races
  • Post position
  • Racing style
  • Track conditions

You will see from the algorithm they posted two years ago that “Racing Style”, “Speed” and “Odds Probability” were variables already in use, while Post Position and Track Conditions have since become of interest. Racing Style and Speed are figures developed just for BrisNet customers, while the odds-implied probability is pretty much standard in any predictive algorithm, as the betting market is somewhat rational in its assessment of each runner.
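
For readers unfamiliar with the odds term: a runner’s implied probability is the inverse of its decimal price, and because a bookmaker’s prices sum to more than 100% (the overround), modelers typically normalize across the field. Here is a minimal sketch of that standard transformation, with made-up odds for illustration:

```python
def implied_probabilities(decimal_odds):
    """Convert decimal starting prices to implied win probabilities,
    normalized so the whole field sums to 1 (removing the overround)."""
    raw = [1.0 / o for o in decimal_odds]  # e.g. decimal odds of 5.0 -> 0.20
    total = sum(raw)                       # > 1.0 because of the bookmaker margin
    return [p / total for p in raw]

# Hypothetical three-horse field at decimal odds of 2.5, 4.0 and 6.0.
probs = implied_probabilities([2.5, 4.0, 6.0])
print(probs)  # the favorite (shortest price) gets the largest share
```

Standardizing these probabilities against the rest of the field (as in Nutonian’s “Starting Price Implied Probability Standardized” term) is then an ordinary scaling step on top of this.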

Their predicted top 5 for the 2016 Kentucky Derby:

  1. Nyquist
  2. Gun Runner
  3. Exaggerator
  4. Creator
  5. Mohaymen

As we now know, the first four across the line were Nyquist, Exaggerator, Gun Runner and Mohaymen, so they got the Superfecta (first 4) across the line from their five selections, which paid a healthy $542 for a $1 bet. While the winners have all been favorites, they have picked the winner of the Kentucky Derby as their first selection each year and had many of the placegetters in their first five. It’s an interesting use of machine learning, and it’s certainly better than a lot of the pundits out there, that is for sure!

REPOST: A machine learning algorithm has crushed the Kentucky Derby by Byron Rogers

In a recent survey by RightScale, internal business units expect no less than near-instant access to cloud services. Among enterprises that offer a self-service cloud portal, 80% can currently provision cloud (public or private) workloads in less than an hour – double the 40% figure from a year ago. However, a majority of enterprises are still missing critical elements of Cloud Governance, such as a defined portfolio of cloud providers; guidelines for which applications should be moved to the cloud; policies for cloud Service Level Agreements (SLAs) and Disaster Recovery (DR); and basic approval policies. Now that Cloud and DevOps are so intertwined, many organizations use cloud infrastructure as the foundation for building a solid DevOps platform. Some of the automated configuration management tools in this space are Chef, Puppet, Docker, Salt and Ansible. These tie in with cloud automation tools to automatically deploy infrastructure across clouds, and they have become extremely effective and central to the function of many organizations.

RoundTower Technologies has built dedicated Service Practices around these areas to help our clients get the self-service, automation, scale, speed, and availability their organizations need to become more efficient and profitable. We’ve taken a three-pronged approach to the complex decisions facing our clients today:

  1. Workload Optimization & Migration – IT leaders need to develop a bi-modal strategy for their workloads. Traditional steady-state workloads need to be placed in environments that meet critical operational parameters while offering the cost optimization of Infrastructure-as-a-Service (IaaS) models. Next-Gen Applications need flexible cloud services that meet the needs of these scalable, agile workloads and the accelerated continuous development platforms required to enable their evolution. RoundTower brings the expertise and tools to help clients develop this strategy and enable the migration.
  2. Cloud & Automation – Whether a client is just embarking on a move to cloud and automation or has already developed a mature approach, RoundTower is focused on delivering capabilities and expertise in the leading technologies to enable clients to design, implement, integrate and optimize these platforms. RoundTower is invested in technologies from VMware, Cisco, OpenStack, Amazon, Microsoft Azure, and others, which support private, hybrid or public cloud architectures.
  3. DevOps – While the transformation to a full DevOps IT approach can be significant, RoundTower is initially focused on helping clients introduce and leverage the leading DevOps-oriented tools and platforms as a broader on-ramp to a lifecycle overhaul for application development. Because these tools have significant cloud awareness and integration, RoundTower’s strategy is to ensure that they align well with the first two prongs of our approach. As a result, we are focused on technologies from Puppet, Chef, Pivotal CloudFoundry, Atlassian, Amazon and Microsoft.

The advent of cloud-based computing is reordering the world of information technology, but customers can count on RoundTower to help them effectively implement the latest technologies and remain competitive in the ever-changing marketplace.

RoundTower Technologies and the Par 4 Technology Group are merging to create one of the leading data center infrastructure solution providers in the industry. The company consolidation will employ over 200 people and is expected to exceed $300 million in revenue in 2016, while serving over 1,000 clients, including many of the world’s foremost corporations.

“As technology evolves and business challenges increase in complexity, our approach of attracting and empowering the best technical talent to solve difficult problems has been very successful,” says Stephen West, RoundTower Managing Partner. “Combining these organizations accelerates that strategy to the benefit of our clients and partners.”

Maintaining over 1,200 certifications from technology giants such as EMC, Cisco, VMware, Hewlett-Packard, Citrix, Microsoft and others, the combined company will be one of the most technically astute consulting and managed services firms in the field. This will better enable them to assist clients with a comprehensive range of long-standing solutions in storage, virtualization, data protection, cloud computing, and security, as well as new options in emerging technologies such as DevOps, Automation and Big Data.

“We are very excited by the opportunity to enhance our clients’ experience and add value that positively impacts their bottom line,” says Stephen Power, RoundTower Managing Partner. Gary Halloran, President of Par 4, also sees sterling advantages for customers. “Aligning with RoundTower broadens our portfolio of services significantly, making our clients the biggest beneficiaries of this merger.”

Going forward, the new entity will operate under the RoundTower name, and Gary Halloran will join the executive leadership team. Company headquarters will be in Cincinnati, OH, with offices in Boston, MA; Columbus, OH; Nashville, TN; Louisville, KY; Indianapolis, IN; Philadelphia, PA; Tampa, FL; Miami, FL; and Baton Rouge, LA.

As the Storage Practice Manager at RoundTower, one technical request that my team receives quite often is to analyze storage array performance data and use it to assess the health of an environment, size out capacity upgrades, or locate performance bottlenecks. We can quickly churn out our analysis of what is going on inside the array, but this is often only a small piece of the overall picture. For a more thorough understanding of storage performance, it is important to have information about the entire environment, including server/application design, SAN layout, and the business’ expectations of what is defined as “good performance”.

Having only array-level throughput and bandwidth data can result in some interesting performance review sessions. I have walked into a customer meeting before and said, “What the heck is LUN 3452? It shoots up to 100 megabytes per second all the time.” The customer said, “Ignore that one; it’s the storage team’s MP3 collection.” In the opposite case, I have occasionally found that the business’ “mission-critical, high-performance databases” are on some of the least busy LUNs in an array. Understanding the business requirements is a critical component.

Commonly we will be given a handful of days’ worth of performance data and asked to assess performance. My first question will be, “Does this data capture the most important or highest utilization periods?” For most environments, a week’s worth of data is enough for a quick health check, but is not enough to analyze the long-term utilization of a storage system. An interview with application developers or DBAs who are exposed to the business will usually result in an understanding of which time periods are the busiest from their perspective. This helps to target the best times for data collection.

Knowing how the key applications are designed is also important. Many times we have been asked to assess a performance problem and have been given only storage array data. When searching for a bottleneck, we can simply walk down the stack until the pain point is discovered. On the other hand, when analyzing data for a general performance review, we must consider even the applications which appear to be low-performance in our collected data. Understanding when certain applications perform batch workloads or other high-throughput operations keeps us from glossing over storage devices which may not be busy during our collection period, but can have a major impact on storage at other times.

Once the application and business designs are understood, we will analyze performance with these requirements in mind. It is not simply a review of IOps, bandwidth, and response time. These metrics can have very different meanings depending on the perspective. As an example, consider a storage system that averages 5000 IOps and 40 MBps during the business day, but routinely spikes to 15000 IOps and 500 MBps every night at 2am. If it is determined that this is the hour that nightly backups begin, my opinion on large-block I/O and high response times will be different than if this is determined to be a critical batch workload that causes significant business impact if not completed in less than 30 minutes. On the other hand, I may find during my business interviews that having data backed up within specific timeframes is a critical function, causing me to place very high importance on this 2am workload.

The final consideration when reviewing performance data is the underlying architecture of the storage platform for which we have collected data. Each storage array handles I/O differently, utilizing caching algorithms, load balancing, drive tiering, and other features to deliver the appropriate performance as it is demanded. This is especially important if we are reviewing performance data to size out a replacement solution. The new system may have significantly different performance characteristics, which must be considered when modeling against workloads from an older array. As an example, consider a 2 TB database system that runs on a storage platform that contains 8 GB of cache. After collecting application data, it is determined that only 1% of the database is being accessed at any given moment, and that 50% of all I/O is being serviced from the existing storage array’s cache. If this environment is moved to a storage platform that contains 128 GB of cache, the performance of the database may be significantly changed. It may be determined that a high cache utilization rate allows for lower performance requirements to the backend drives.

These are only a small handful of examples where merely having access to array-level performance will not tell the full story. It is important to tie this performance data together with business and application requirements, giving a full understanding of what is meant by “good performance”. Taking the thorough approach to storage analysis may result in significantly more effort, but will leave a business much more prepared to handle full operational requirements, and to validate the money that has been spent on existing solutions. Many organizations do not have the time or resources to tackle this analysis internally, which is where the team of RoundTower engineers will fill in. We have the experience to not only review performance data, but the desire to learn enough about your business to make some sense of it all.
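
The arithmetic behind that cache example can be sketched directly. The model and latency figures below are simplifying assumptions for illustration (they are not measurements from any particular array): effective response time is just a hit-ratio-weighted blend of cache latency and backend latency.

```python
def effective_response_time(hit_ratio, cache_ms, backend_ms):
    """Blended response time for a given cache hit ratio (simplified model)."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms

# The 2 TB database example: a ~1% active working set is ~20 GB,
# far larger than 8 GB of cache but comfortably inside 128 GB.
working_set_gb = 2048 * 0.01

# Assumed illustrative latencies: 0.5 ms from cache, 8 ms from backend disk.
# The 50% hit ratio comes from the example above; the 95% figure for the
# larger-cache array is purely a hypothetical outcome.
old_array = effective_response_time(hit_ratio=0.50, cache_ms=0.5, backend_ms=8.0)
new_array = effective_response_time(hit_ratio=0.95, cache_ms=0.5, backend_ms=8.0)
print(working_set_gb, old_array, new_array)
```

Under these assumptions the blended response time drops several-fold on the larger cache, which is exactly why modeling an old array’s workload directly onto a new platform without accounting for cache behavior can badly over- or under-size the backend drives.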