GoVertical presents

Vertical ML/AI Startup Creation Weekend

Hosted by Madrona Venture Labs & TiE Seattle

As a free benefit for participants, we would like to extend an invitation to the Amazon SageMaker workshop on Feb 14 from 1 p.m. to 5 p.m.


Retail resources

Welcome to the Retail vertical page! To make the most of your time during the event weekend, please review our key educational materials and data sets.

Be prepared! Start thinking through what types of data could power your business and product ideas. Often, a combination of multiple, disparate data sets yields the most ingenious ideas and solutions!

Panel videos

The following videos were recorded during the April 19 panel event. You may wish to reference them in preparation for the weekend ML event.

ML Panel moderated by Dan Weld. Panelists: Xin Luna Dong, Yejin Choi & Kevin Jamieson


VC Panel moderated by Jay Bartot. Panelists: Tim Porter, Mike Miller, Pradeep Rathinam & Ankur Teredesai


Sector analysis

Vertical description

The retail market includes any sale of products to consumers, from eCommerce, grocery, and gasoline to Ikea and Stitch Fix. How can we utilize technology to improve the shopping experience?

How big an opportunity space is this, how is it growing, and what’s driving that growth?  

The US retail market is worth $3.5 trillion and growing at 3%. eCommerce accounts for only 10% of total sales. The smart retail market is worth $10 billion but is growing at 24%. Grocery is a $663 billion market in the US. Growth is driven by the adoption of smartphones and changing customer demands (customer service, delivery, one-click purchasing).

What are the segments/pockets?

Smart retail uses several types of technology: Bluetooth, ZigBee, RFID, Wi-Fi, VR/AR, and LPWAN. You can also segment it by application: smart labels, store navigation, smart payments, robotics, analytics, and visual marketing. Hardware is the largest category by market size, because limited computing power will constrain adoption of new technologies and require upgrades. Robotics is estimated to grow the fastest among segments.

What has been the VC investing trend in this category?  

What is the technology spend and trend in this category, or the revenue growth rate of companies in the category (whichever is applicable)?

In the US, 64% of retailers sell omnichannel, and 40% of retailers plan to increase their spend on technology. Much of the data available from shoppers remains untapped.

What are the proof points that success may be rewarded?

At a high level, what problems are there to be solved using technology?  

What current trends are driving change in this category?  

How specifically can ML/AI change the game in this category?  

Investment hypothesis / rationale

The retail market is so large and valuable that there are many right-tailed opportunities. We should be able to identify interesting spaces to innovate and capture portions of this expansive market.

What adverse conditions / headwinds are there for a play in this space? What makes it difficult?

Data sets

Your novel business idea should be grounded in real-world data with plausible machine-learning/analytics on top. We've compiled a collection of datasets from which to gain inspiration. Note that you are not restricted to basing your idea on the data sets below. You may discover other open source data sets that inspire your creativity or you may bring your own proprietary data sets if you wish.

Many of the datasets below are from Kaggle, Figure-Eight (Crowdflower), Data.World, etc. The advantage of these datasets is that many have been cleaned and normalized and are ready to be explored with ML and data science tools. Note that these datasets are often intended for research purposes only. Be sure to read any associated license agreements to understand whether there are commercial restrictions if you plan to continue using the data after the workshop is over.

Sample Data Sets

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 to July 2014.

Idea: Can you design a model that learns what a high-quality, specific, helpful review looks like and gives you real-time suggestions as you write reviews?

Idea 2: Can you mine the review data to pull out the three things people most like and the three things people most dislike about various products?
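As a starting point for Idea 2, here is a minimal sketch of one naive approach: count which terms appear in high-rated reviews versus low-rated ones and keep the top three of each. Everything in it is an assumption for illustration, not part of the dataset or its tooling: the (rating, text) input shape, the tiny stop-word list, the 4-star and 2-star thresholds, the function names, and the hand-written sample reviews. A real attempt would likely need phrase extraction and per-product grouping on top of this.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "but", "was", "i", "to", "of"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def top_terms(reviews, n=3, like_min=4, dislike_max=2):
    """Return (liked, disliked): the n terms most typical of high- and low-rated reviews.

    reviews is an iterable of (star_rating, text) pairs.
    Each term is counted at most once per review.
    """
    liked, disliked = Counter(), Counter()
    for rating, text in reviews:
        terms = set(tokenize(text))
        if rating >= like_min:
            liked.update(terms)
        elif rating <= dislike_max:
            disliked.update(terms)

    def rank(own, other):
        # Score = reviews on this side mentioning the term minus reviews on the other side.
        scores = {term: count - other[term] for term, count in own.items()}
        # Sort by score, then alphabetically so ties are resolved deterministically.
        ordered = sorted(scores, key=lambda term: (-scores[term], term))
        return [term for term in ordered if scores[term] > 0][:n]

    return rank(liked, disliked), rank(disliked, liked)


# Made-up sample reviews, for illustration only.
sample_reviews = [
    (5, "Great battery and a bright screen"),
    (5, "The battery lasts all day, great screen"),
    (4, "Bright screen, solid battery"),
    (1, "The charger broke and the strap is flimsy"),
    (2, "Flimsy strap, charger stopped working"),
    (1, "Strap broke, charger is useless"),
]

likes, dislikes = top_terms(sample_reviews)
print("Most liked:", likes)
print("Most disliked:", dislikes)
```

Counting each term once per review keeps a single long, repetitive review from dominating the ranking. Single words are a crude proxy for "things people like"; swapping tokenize for a noun-phrase extractor would be a natural next step.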

This dataset contains Question and Answer data from Amazon, totaling around 1.4 million answered questions.

A large collection of Amazon and Yelp reviews, plus Yahoo Answers data.

Idea: Design a review summarizer that condenses the positive and negative reviews for a product so that users can quickly understand the overall review sentiment.

Labeled tweets about multiple brands and products.

Gengo scoured the internet to gather a list of publicly available ecommerce datasets for machine learning projects. Enjoy!

Access and analyze open grocery data for Canada.