Data

A casual data science conference by DataHack.




DataConf brings together data science and machine learning experts from the top companies in Israel for a day of knowledge sharing. DataConf will run in parallel with DataHack and will be open to both hackathon participants and guests. The event will take place on October 4; the full agenda and more information will be published soon.

DATAHACK 2018 is here!

Registration is OPEN!

Agenda

This agenda is partial. Updates are coming soon. :)

09:00 - 09:35

Dana Kaner, Data Scientist at PerimeterX

Bootstrap, Random Forest and all sorts of magic

The Bootstrap resampling method is often used for statistical inference. We demonstrate its power and simplicity through the well-known Random Forest algorithm. We present both the theoretical background on the above topics and an implementation in R.
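For readers new to the idea, here is a minimal sketch of bootstrap resampling, written in Python rather than the R used in the talk. It is not the speaker's code: the function name bootstrap_ci, the sample values, and the choice of a percentile confidence interval are all assumptions made for illustration. Random Forest applies the same resampling step to build each tree from a different bootstrap sample of the training data.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data) (illustrative helper)."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n points with replacement from the original sample.
    estimates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled statistics.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Made-up sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
low, high = bootstrap_ci(sample)
print(f"mean={statistics.mean(sample):.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

The same function works for statistics with no simple closed-form interval, for example by passing stat=statistics.median.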

09:40 - 10:15

Pavel Levin, Senior Data Scientist at Booking.com

Where should I travel next? Modeling multi-destination trips with Recurrent Neural Networks.

Many real-world problems naturally give rise to sequential data. Language models are already widely used to tackle computational problems related to natural language. We would like to present a non-NLP example by walking through a solution to the problem of recommending next destinations to customers who are taking a single trip to multiple cities using RNN-based sequence modeling.

10:20 - 10:55

Ari Bornstein, Sr. Cloud Developer Advocate at Microsoft

Beyond Word Embeddings

Since the advent of word2vec, word embeddings have become a go-to method for encapsulating distributional semantics in NLP applications. This presentation will review the strengths and weaknesses of using pre-trained word embeddings, and demonstrate how to incorporate more complex semantic representation schemes such as Semantic Role Labeling, Abstract Meaning Representation and Semantic Dependency Parsing into your applications.

10:55 - 11:10

Coffee Break

11:10 - 11:45

Dr. Michal Shmueli-Scheuer, Researcher at IBM Research

Conversational bots for customer support

In this talk, I'll cover various aspects of conversational bots, focusing on the domain of customer support. Often, human conversations with bots mimic the way humans interact with each other. Moreover, even when customers know that they are interacting with virtual agents (bots), they still expect them to behave like humans. One way to improve interactions with bots is by giving them some human characteristics, such as emotion and personality. I'll show how a model of neural response generation can be used to generate bot responses according to a target personality. I'll then cover a methodology for detecting egregious conversations in a setting using conversational bots by examining behavioral cues from the customer, patterns in the agents’ responses, and customer-agent interactions.

11:50 - 12:25

Nofar Betzalel, Data Scientist at PayPal

Semi-Supervised Learning Tagging Coverage Extension

When PayPal's risk decision-making processes approve a transaction, we soon know whether it was the right decision. However, for declined transactions this is not the case, as our tagging coverage is not complete. This makes it more challenging for analysts and data scientists to understand our false positives when performing research and when measuring our decision-making processes. In this talk I will discuss how we use semi-supervised learning to tag declined transactions as ones that would or would not have been fraudulent had they been approved. This approach enables us to utilize both tagged and non-tagged transactions to train a model for this task.

12:25 - 13:10

Lunch

13:10 - 13:45

Dr. Lev Faivishevsky, Researcher at Intel Advanced Analytics

Using Deep-Learning to Detect Video distortions

Since the acquisition of Mobileye, it has become common knowledge that Intel is interested in building AI-based products and producing hardware for AI applications. A less widely known role of AI at Intel is an internal one: using the huge and diverse data related to Intel's own operations to transform the way the company works and create substantial value. Processor design, manufacturing and sales are leveraging machine-learning methods, including computer-vision, natural language processing and reinforcement learning techniques. The talk will start with a little background about these applications, and then focus on one deep-learning based video analytics solution used in the context of processor validation. We will describe this non-standard use-case and the challenges in resolving it, most of which are also relevant for other use-cases in the domain, including handling scarcity of labeled data and coping with tight requirements in terms of both accuracy and run-time.

13:50 - 14:25

Prof. Danny Pfeffermann, Government Statistician & Director at the Central Bureau of Statistics

Can Big Data Really Replace Traditional Surveys for the Production of Official Statistics?

The major advances in technology, which make it possible to access and analyse 'big data', coupled with increased demand for more accurate, more detailed and more timely official data under tightened budgets, put inevitable pressure on producers of official statistics to replace traditional sample surveys with big data sources. In the first part of my presentation I shall discuss some of the major challenges in the use of big data for official statistics, pointing out their advantages and limitations. In the second part I shall consider a general class of statistical models, which can possibly link the big data under consideration to the corresponding target, finite population data. The use of a model in the class may allow estimating finite population parameters, without the need for reference samples or administrative files.

14:30 - 15:05

Avi Hendler-Bloom, Data Scientist at Mobileye

Overcoming the Electronic Traffic Sign Problem

Electronic traffic signs are commonly made with LEDs. Due to the differences in frequency and phase between each LED light, classifying this type of sign is challenging. This talk will address the issues faced, and introduce a solution.

15:10 - 15:55

Dr. Gil Chamiel, Director of Data Science and Algorithm Engineering at Taboola

Deep And Shallow Learning in Recommendation Systems

Deep learning has been gaining increasing attention in the recommendation systems community, replacing some of the traditional methods. In this talk, we will share some lessons we learned from using deep learning at huge scale in Taboola's recommendation system. Specifically, we will talk about the motivation for using deep learning and the tradeoffs between deep models and simpler models. We will discuss our approach to building neural networks with multiple input types (numerical, categorical, text, and images); capturing non-trivial interactions between features using both deep dense architectures and Factorization Machine models; tradeoffs between memorization and generalization; and other tips regarding network architectures.

15:55 - 16:10

Coffee Break

16:10 - 16:45

Daniel Benzaquen, Data Scientist at Lightricks

A/B Testing at Scale

A/B testing is a central statistical procedure used frequently by data scientists. Unfortunately, the standard A/B testing framework was originally designed to cope with a handful of tests, while these days conducting tens and even hundreds of tests simultaneously is a common scenario.

Directly applying the standard procedure, however, is highly problematic, as many tests imply many false discoveries, which can lead to sub-optimal performance. With the goal of controlling the false-discovery rate, several procedures were designed: probably the most naive one is the Bonferroni correction; more advanced schemes include Fisher's least-significant-difference, Benjamini-Hochberg, etc. Yet utilizing these schemes comes at the price of a high false-negative rate that scales with the number of tests being conducted.
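To make the trade-off concrete, the following is a rough Python sketch of two of the corrections named above. It is not taken from the talk: the function names and the p-values are invented for illustration. Note that the Bonferroni correction bounds the family-wise error rate, a stricter criterion than the false-discovery rate that Benjamini-Hochberg targets, which is why it rejects fewer hypotheses here.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at false-discovery-rate level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject the hypotheses with the max_k smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected

# Invented p-values from eight hypothetical simultaneous A/B tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
n_bonferroni = sum(bonferroni(p_values))
n_bh = sum(benjamini_hochberg(p_values))
print(f"Bonferroni rejects {n_bonferroni}, Benjamini-Hochberg rejects {n_bh}")
```

With these inputs Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two; both thresholds tighten as the number of tests grows, which is the loss of power the abstract refers to.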

In this talk we discuss our attempt to bypass these challenges by utilizing a Bayesian Multi-Armed-Bandit approach, namely, Thompson-Sampling (TS) that operates in an online-learning manner. We share our experience and insights based on simulations and real-life experiments.
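As a point of reference, here is a minimal Python sketch of textbook Thompson Sampling for binary rewards with a Beta posterior per variant. It is not the speakers' implementation: the function name, the uniform Beta(1, 1) prior, and the simulated conversion rates are assumptions chosen for illustration.

```python
import random

def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling; returns how often each arm was played."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (Beta(1, 1) prior, updated with the observed successes and failures).
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        # Play the arm whose sampled rate is highest.
        arm = max(range(k), key=lambda i: draws[i])
        # Simulated user response; in production this would be a real conversion event.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls

# Hypothetical conversion rates for three variants (unknown to the algorithm).
true_rates = [0.02, 0.05, 0.15]
pulls = thompson_sampling(true_rates)
print("plays per variant:", pulls)
```

Because traffic shifts toward the better-performing variants as evidence accumulates, the procedure learns online rather than waiting for a fixed-horizon test to finish.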

Finally, we discuss some generalizations we made to the standard TS scheme that allow us to optimize over (non-trivial) statistical quantities (i.e., not necessarily the conversion rate or click-through rate, which are of obvious interest, but also users' Life-Time-Value (LTV), etc.).

16:50 - 17:25

Oren Shamir, Head of CV algorithm development at Innoviz Technologies

Neural networks for point clouds: Adding the 3rd dimension

Since AlexNet, DNNs have been used with rapidly increasing success to perform a wide variety of tasks on 2D images. This is the result of increased data availability, increased effective processing power, as well as incremental algorithmic improvements. Today, DNNs achieve super-human results on multiple tasks in the 2D data domain.

Processing of 3D data using DNNs has been studied less during that time. 3D sensors are less abundant, and are more variable in their capabilities and properties. In the past few years various methods for processing of 3D data have emerged, driven mainly by the medical imaging industry and, more recently, the autonomous car industry. 3D data may be unstructured, sparse and irregular, yielding unique challenges relative to 2D image data.

In this talk I will discuss the challenges of working with 3D data, and present an overview of approaches towards 3D data processing in DNNs.
