Big Data

Making Data Manageable

Extreme Processing, Storage and Analysis

Big Data Analytics

Since our beginnings as a company building high-traffic applications, Thumbtack has gained extensive experience in managing and analyzing large data sets. With PhD Data Scientists on staff, we can help with problems ranging from recommendation engines to traffic analysis and from ad optimization to analyzing digital subscription patterns.

  • Data Science and Data Mining
  • Hadoop Implementations
  • Semantic Analysis
  • Business Intelligence

What Does It Mean To Do Big Data Analytics?

Big Data is a term that is bandied about to mean virtually anything, but at Thumbtack we use it to mean one of two things:

  • Data that is so large that it is impractical to store or query at sufficient granularity on traditional infrastructure
  • Unstructured or semi-structured data from which we attempt to extract meaning or predictive value, through a combination of manual and machine learning approaches

These roughly fit into the buckets of Data Warehousing and Data Mining. But as data becomes larger and less structured, and as technologies such as Hadoop become widespread, companies often look for assistance in making sense of this ever-expanding universe of usable data.

There is a third type of Big Data challenge that organizations face: handling large quantities of data and responding to it quickly. We discuss this operational workload in our NoSQL section.

Business Domains

Thumbtack deals with a wide variety of business domains, but here are some common scenarios:

Recommendation Engines

Leveraging computational methods to determine relationships between data, and to make personalized recommendations of products, content, or other people. Continual refinement of such systems to adapt to changes in traffic patterns or user behavior.
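As a minimal sketch of one such computational method, the example below ranks items by how often they co-occur with items a user has already chosen. It is an illustration only, not a description of any particular production system; the function names `build_cooccurrence` and `recommend` and the sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, seen, top_n=3):
    """Rank unseen items by total co-occurrence with the items already seen."""
    scores = Counter()
    for item in seen:
        for other, n in counts.get(item, {}).items():
            if other not in seen:
                scores[other] += n
    # Sort by score (descending), then by name so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:top_n]


# Hypothetical purchase history, invented for illustration.
baskets = [
    ["laptop", "mouse", "dock"],
    ["laptop", "mouse"],
    ["laptop", "dock"],
    ["mouse", "mousepad"],
]
counts = build_cooccurrence(baskets)
print(recommend(counts, {"mouse"}))
```

Rebuilding the co-occurrence counts on a schedule, or updating them incrementally, is one simple way such a system could adapt as user behavior shifts.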

Segment Generation

Evaluating large sets of data to group users into various segments based on behavior, and examining how such segments perform when targeted with ads, personalization or customized offers. Detecting “look-alike” segments based on characteristics of known user segments.

Business Intelligence

Taking large quantities of unstructured data and preprocessing it so that it can be easily aggregated and visualized. For example, providing responsive user tooling to group and forecast performance across overlapping user segments.

Semantic Analysis

Using heuristics to analyze digital content and create metadata. This metadata can be used to build taxonomies and categorization maps, to improve the targeting of ads, or to recommend content to the appropriate people.

Trend Detection

Real-time detection of changing behavior or system performance. This enables rapid response to unexpected events, and helps detect and defend against arbitrage opportunities.
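One common way to approach this kind of detection is to compare each new observation against a trailing window of recent values. The sketch below flags points that deviate from the window mean by more than a chosen number of standard deviations. The function name `detect_anomalies`, the window size, the threshold, and the sample latencies are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    history = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        # Only score a point once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(v - mu) / sigma > threshold:
                flagged.append(i)
        history.append(v)
    return flagged


# Hypothetical response times in milliseconds, invented for illustration.
latencies = [100, 102, 98, 101, 99, 100, 250, 101, 99]
print(detect_anomalies(latencies))
```

Because the function keeps only a fixed-size window, the same logic can be applied to a stream one value at a time. Note that an outlier stays in the window for a while and inflates the standard deviation, so a production system would likely need a more robust baseline.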

Machine Learning

Using Bayesian methods, rules-based engines, the Apriori algorithm, and other methods to generate rule sets that perform well out of sample. For example, generating ad bidding strategies based on continuous streams of click-through and conversion data.
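To make the Apriori approach concrete, here is a small sketch of its frequent-itemset step: it counts itemsets level by level and keeps only candidates whose every subset is already frequent. Association rules would then be derived from these itemsets and validated on held-out data. The function name `apriori` and the sample event data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that appears in at
    least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: every individual item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent itemsets into size-k candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical ad events per session, invented for illustration.
sessions = [
    ["ad_a", "click"],
    ["ad_a", "click", "convert"],
    ["ad_b"],
    ["ad_a", "click"],
]
frequent_sets = apriori(sessions, min_support=2)
for itemset, count in sorted(frequent_sets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```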


If your enterprise is handling a large amount of data, chances are you are already using Hadoop. Thumbtack has experience using Cloudera, Hortonworks, MapR Technologies, and other Hadoop distributions to answer questions in a variety of industries. In addition to using standard tools such as Hive and Pig, we have built complex data workflows and ETL processes to get each type of data into a place where it can be visualized and/or interpreted.

Hadoop is a general-purpose tool with a wide range of use cases. Contact us if you’d like to discuss a concrete one.

Contact Us

Big Data is a huge field that can’t easily be summarized. Contact us so we can discuss a tailored solution to your needs.