Since our beginnings as a company building high-traffic applications, Thumbtack has gained extensive experience in managing and analyzing large data sets. With PhD data scientists on staff, we can help with problems ranging from recommendation engines to traffic analysis, and from ad optimization to analyzing digital subscription patterns.
Big Data is a term that is bandied about to mean virtually anything, but at Thumbtack we use it to mean one of two things:
These roughly fit into the buckets of Data Warehousing and Data Mining, but as the amount of data becomes larger and less structured, and as technologies such as Hadoop become widespread, companies are often looking for assistance in making sense of this ever-expanding universe of usable data.
There is a third type of Big Data challenge organizations are facing: handling large quantities of data and responding to it quickly. We discuss this operational workload in our NoSQL section.
Thumbtack deals with a wide variety of business domains, but here are some common scenarios:
Leveraging computational methods to determine relationships between data, and to make personalized recommendations of products, content, or other people. Continual refinement of such systems to adapt to changes in traffic patterns or user behavior.
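As a rough illustration of the idea, the sketch below recommends items by counting how often they appear together in past baskets. It is a deliberately simple co-occurrence approach, not a description of any production system; the function names (`build_cooccurrence`, `recommend`) and the sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, owned, k=3):
    """Rank items the user does not own by co-occurrence with owned items."""
    scores = Counter()
    for item in owned:
        for other, n in counts[item].items():
            if other not in owned:
                scores[other] += n
    # Sort by score (descending), then by name, so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:k]


# Hypothetical purchase history, for illustration only.
baskets = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "lantern"],
    ["stove", "kettle"],
]
counts = build_cooccurrence(baskets)
suggestions = recommend(counts, {"stove"})
```

Rebuilding the counts as new baskets arrive is one simple way to keep such a system in step with changing user behavior.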
Evaluating large sets of data to group users into various segments based on behavior, and examining how such segments perform when targeted with ads, personalization or customized offers. Detecting “look-alike” segments based on characteristics of known user segments.
Taking large quantities of unstructured data and preprocessing it so that it can be easily aggregated and visualized. For example, providing responsive user tooling to group and forecast performance across overlapping user segments.
Using heuristics to analyze digital content and create metadata. These can be used to build taxonomies, categorization maps, and content recommendations. This metadata can then be leveraged to improve targeting of ads or to recommend content to the appropriate people.
Real-time detection of changing behavior or system performance. This enables rapid response to unexpected events, and helps detect and defend against arbitrage opportunities.
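One simple way to picture this kind of detection is a rolling z-score: each new reading is compared with the mean and standard deviation of a recent window, and flagged if it falls too far outside. The sketch below assumes a single numeric metric; the class name, the window size, the warm-up of five readings, and the threshold are all illustrative choices rather than recommended settings.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScoreDetector:
    """Flag readings that deviate sharply from a sliding window of recent values."""

    def __init__(self, window=20, threshold=3.0):
        self.values = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold

    def observe(self, x):
        """Return True if x looks anomalous, then add it to the window."""
        is_anomaly = False
        if len(self.values) >= 5:  # wait for a minimal history before judging
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0 and abs(x - mu) / sigma > self.threshold:
                is_anomaly = True
        self.values.append(x)
        return is_anomaly


# Hypothetical stream of response times with one sudden spike.
detector = RollingZScoreDetector(window=10, threshold=3.0)
readings = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101]
flags = [detector.observe(r) for r in readings]
```

Because the detector keeps only a fixed-size window, it can run continuously over a stream without growing memory use.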
Using Bayesian methods, rule-based engines, Apriori algorithms, and other techniques to generate rule sets that perform well out of sample. For example, generating ad bidding strategies based on continuous streams of click-through and conversion data.
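To make the Apriori idea concrete, here is a minimal sketch of its first stage, frequent itemset mining: itemsets are grown one item at a time, and a candidate is kept only if all of its smaller subsets were already frequent. The event names and the support threshold are made up for illustration, and a real rule set would still need to be validated on held-out data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical ad events, for illustration only.
events = [
    {"ad_a", "click"},
    {"ad_a", "click", "convert"},
    {"ad_b"},
    {"ad_a", "click"},
    {"ad_b", "click"},
]
frequent = apriori(events, min_support=0.5)

# Confidence of the rule "ad_a -> click": support(ad_a and click) / support(ad_a).
pair = frozenset({"ad_a", "click"})
confidence = frequent[pair] / frequent[frozenset({"ad_a"})]
```

Association rules are then read off the frequent itemsets by comparing supports, as the final confidence calculation shows.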
If your enterprise is handling a large amount of data, chances are you are already using Hadoop. Thumbtack has experience using Cloudera, Hortonworks, MapR Technologies, and other Hadoop distributions to answer questions in a variety of industries. In addition to using standard tools such as Hive and Pig, we have built complex data workflows and ETL processes to get each type of data into a place where it can be visualized and/or interpreted.
Hadoop is a general-purpose tool with a very wide range of use cases. Contact us if you’d like to discuss a concrete one.
Big Data is a huge field that can’t easily be summarized. Contact us so we can discuss a tailored solution to your needs.