
DataConf

DataConf, our FREE data science conference, is back this year and it's better than ever!

DataConf brings together data science and machine learning experts from top companies in Israel for a day of knowledge sharing.

Talks will be given by data scientists and researchers from various companies, including Orcam, Armis, Lightricks, monday.com, Mobileye, JPMorgan, Taboola, LivePerson and many more!

08:30 - 09:00

Registration, Opening remarks

09:00 - 09:35

Production data science in practice

Elad Tsur from Planck Resolution

Creating your ML models is just the first step toward bringing value to your customers. But the work isn't done once your model is trained: there's still one 'minor' step left, putting those models into production. At Planck, we manage hundreds of models in production. In this talk I'll share how we do that, how we test and deploy them, and how we allow many data scientists, developers and analysts to create and push new models to production without interfering with each other.

09:40 - 10:15

The innards of a modern speech recognition system

Yonatan Wexler from Orcam

Speech recognition quality has taken a great leap in the last two years. This talk will outline the structure of a successful speech recognition system and review some of the key techniques that enabled this progress. We will cover acoustic modeling, language modeling, language embedding for increased accuracy, voice activity detection, deep source separation, various deep learning network structures and useful loss functions.

10:20 - 10:55

Learning with Highly Imbalanced Data: the Extreme Value Regression Approach

Orit Moradov from Lightricks

Binary classification is a fundamental problem with numerous real-life applications. Accordingly, many algorithms and frameworks have been designed to address it, ranging from parametric models such as logistic regression to non-parametric classifiers such as decision trees and k-NN. In the presence of imbalanced data, these models are known to perform poorly and usually converge to a trivial solution, i.e., classifying all observations into the majority class. Generalized Extreme Value (GEV) regression models the input’s underlying distribution with a family of distributions that better captures the 'extreme' nature of the data. In this talk, we will present and discuss the GEV framework and illustrate its use with a real-life case study.
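To make the idea concrete, here is a minimal Python sketch of the response curve used in one common formulation of GEV regression for binary outcomes, assumed here for illustration and not necessarily the exact model presented in the talk. The probability of the positive class is modeled as pi = 1 - exp(-[1 - xi*eta]_+^(-1/xi)), where eta = x . beta is the linear predictor and xi is the GEV shape parameter. Unlike the symmetric logistic curve, this curve is asymmetric, and as xi approaches 0 it reduces to the complementary log-log link. The function names `gev_inverse_link` and `logistic` are hypothetical, and xi = -0.3 in the demo is an arbitrary choice.

```python
import math


def gev_inverse_link(eta: float, xi: float) -> float:
    """Hypothetical helper: P(y = 1) under an assumed GEV-link binary regression.

    eta is the linear predictor x . beta and xi is the GEV shape parameter.
    Computes pi = 1 - exp(-t) with t = [1 - xi * eta]_+ ** (-1 / xi).
    """
    if abs(xi) < 1e-12:
        # Limit xi -> 0: complementary log-log link, pi = 1 - exp(-exp(eta)).
        try:
            return 1.0 - math.exp(-math.exp(eta))
        except OverflowError:
            return 1.0
    base = 1.0 - xi * eta
    if base <= 0.0:
        # Bracket truncated at zero: t -> +inf when xi > 0, t -> 0 when xi < 0.
        return 1.0 if xi > 0 else 0.0
    try:
        t = base ** (-1.0 / xi)
    except OverflowError:
        # t is too large to represent, so pi is 1 to machine precision.
        return 1.0
    return 1.0 - math.exp(-t)


def logistic(eta: float) -> float:
    """Symmetric logistic response, shown for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    # Compare the symmetric logistic curve with the asymmetric GEV curve.
    for eta in (-2.0, 0.0, 2.0):
        print(f"eta={eta:+.1f}  logistic={logistic(eta):.3f}  "
              f"gev(xi=-0.3)={gev_inverse_link(eta, -0.3):.3f}")
```

Note that at eta = 0 the logistic curve gives 0.5 while the GEV curve gives 1 - exp(-1), about 0.632, for any xi; this asymmetry is what lets the link bend differently in the tail where the rare class lives. Fitting beta and xi (for example by maximum likelihood) is outside the scope of this sketch.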

10:55 - 11:10

Coffee Break

11:10 - 11:45

Graph Neural Networks for Recommender Systems

Dr. Yedid Hoshen from Taboola

Recommender systems take a set of unordered features as input, e.g., user location, target category and publisher name. Standard high-performing deep architectures such as CNNs or RNNs assume particular data symmetries (spatial or sequential structure) that are not satisfied in recommender systems. In this talk, we will give an overview of Graph Neural Network methods and describe their connection to recommender systems. We will show how several popular recommendation algorithms can be seen as particular instances of GNNs. Finally, we will describe how the connection with GNNs motivates new recommender system architectures.

11:50 - 12:25

A/B Testing Mistakes That Cost Us $1M

Stav Levi & Oryan Moshe from Monday

At monday.com, we run hundreds of tests every month. We make mistakes along the way and learn as we go. Come learn from every step of our journey as we scale. We built our own internal A/B testing tools for running and analyzing tests, complete with data models and statistical analysis.

12:25 - 13:10

Lunch

13:10 - 13:45

Anomaly detection over network traffic

Ron Shoham from Armis

Applying machine learning techniques in the network security field is a very challenging task. Lack of labelled data, non-stationary behaviour and the growing number of devices and device types are just some of the reasons why. At Armis, our challenges range from detecting, classifying and profiling devices to alerting on anomalous and suspicious behaviour. In this talk we will review a few of the methods we use for anomaly detection over unlabelled data.

14:30 - 15:05

DS research in a fast-moving company: Staffing for messaging case study

Matan Mandelbrod from LivePerson

Research is by nature uncertain and requires iterations and time, whereas a fast-moving company delivers products on a tight, committed schedule. How can we, as researchers, bridge this gap? What approach should we take to conduct efficient research and provide value in the non-academic environment of a fast-moving company? Messaging technology presents new challenges for contact center operations management. We’ll use the problem of staffing messaging-based contact centers as a case study to share our insights and experience in the context of these questions. Attendees will get twofold value: an introduction to a cutting-edge scientific problem, and a set of tools and methods for conducting effective research in a fast-moving company.

15:10 - 15:55

Blinded by our own models: a semi-supervised learning approach

Sharon Datner from PayPal

We often face the task of developing the next version of a model that is already taking action in the world. In some cases, these actions prevent us from observing the true label, and the missing labels make developing the next version of the model more complicated. In this talk we’ll dive into the challenges of working with the incomplete label set that results from previous model-based actions, introduce a practical semi-supervised approach for training second-generation models that overcomes these challenges, and describe how we applied it successfully in the fraud detection domain.

15:55 - 16:10

Coffee Break

16:10 - 16:45

Unsupervised learning of geometry from video

Itay Blunetal from Mobileye

In this work we combine classic computer vision techniques with cutting-edge machine learning to understand the 3D geometry of our surroundings with no labeled data. This is a crucial task for autonomous driving development.

16:50 - 17:25

NLP beyond Kaggle: Unstructured Texts and Real Life

Adam Bali from NetApp

Modeling natural language for industry solutions poses serious challenges. Companies invest enormous effort in making sense of piles of textual human communication. Handling different tasks, domains and contexts requires extra attention and creative methodologies. This talk will outline some of those challenges and demonstrate how approaches such as semi-supervised and transfer learning are used in the pipeline of our product, which is designed to detect sensitive information across an organization's data silos. Applying these approaches enabled us to build robust solutions even when labeled data is scarce.

Datahack | non-profit organization | contact@datahack.org.il