
Yandex releases world’s largest event dataset for advancing recommender systems – World News Network

By worldnewsnetwork | Last updated: June 2, 2025

VMPL
New Delhi [India], June 2: Yandex has published Yambda (Yandex Music Billion-Interactions Dataset), the world’s largest currently available open dataset for recommender systems, containing nearly 5 billion anonymized user interactions with audio tracks from its music streaming platform, Yandex Music.
Yambda is intended as a general-purpose benchmark for testing new approaches and algorithms in any domain that uses recommender systems, including e-commerce, social networks, and short-form video platforms.
The dataset enables researchers to develop new recommender algorithms and compare them against its baseline models, accelerating innovation. Startups with limited data of their own can build and test systems on Yambda before scaling, which speeds up the creation of advanced technologies tailored to business needs worldwide.

Bridging the research-industry gap
The quality and scale of training data are critical to delivering relevant recommendations on platforms like streaming services, social networks, short-form video apps, and e-commerce marketplaces. However, research in recommender systems has lagged behind rapidly advancing fields like large language models, largely due to limited access to large-scale datasets. Effective recommendation models require terabytes of behavioral data, which commercial platforms possess but rarely share publicly.
Researchers are often left with small, outdated datasets that fail to capture the complexity of modern usage:
* Spotify’s Million Playlists dataset is too small for commercial-scale recommender systems.
* The Netflix Prize dataset, with ~17,000 items and date-only timestamps, limits temporal modeling and large-scale research.
* The Criteo 1TB Click Logs dataset lacks proper documentation and identifiers, and focuses narrowly on ad clicks.
“Recommender systems are inherently tied to sensitive data. Companies can only publish recommender system datasets publicly after exhaustive anonymization, a resource-intensive process that’s slowed open innovation,” explains Nikolai Savushkin, Head of Recommender Systems at Yandex.
This data scarcity creates a gap: models that excel in academic settings often underperform in real-world applications. Efforts to integrate recommender systems with advanced architectures are also constrained by the lack of suitable training data.
About the Yambda dataset
Yambda addresses recommender system challenges by providing a massive, anonymized dataset from Yandex Music, a music streaming service with ~28 million monthly users. The dataset provides insights into how users interact with the content offered by Yandex Music, which is known for My Wave, a sophisticated recommendation system that tailors the listening experience to the tastes of each user. To protect privacy, all user and track data is anonymized and represented by numeric identifiers to meet privacy standards.
Key features of the dataset:
* 4.79 billion anonymized user interactions collected over 10 months.
* Data from 1 million users and anonymized descriptors for 9.39 million tracks.
* Includes two feedback types: implicit interactions (listens) and explicit interactions (likes, dislikes, and their removal).
* Offers audio embeddings (vector representations generated via convolutional neural networks) and anonymized information about tracks.
* Features an “is_organic” flag marking whether users discovered tracks independently or through recommendations, enabling deeper behavioral analysis.
* All events are timestamped, which supports the analysis of user behavior over time and allows models to be evaluated under conditions that closely resemble real-world use.
The dataset is released in Apache Parquet format, compatible with distributed processing systems such as Spark or Hadoop and analytical libraries like Pandas and Polars.
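As a rough illustration of the behavioral analysis that the “is_organic” flag and event timestamps make possible, the sketch below works on a handful of in-memory records. The field names other than is_organic, the Unix-second timestamps, and all values are assumptions made up for this example; they are not taken from the published schema.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical event records. Only the is_organic flag is named in the
# article; every other field name and every value here is invented.
events = [
    {"user_id": 1, "track_id": 10, "timestamp": 1_700_000_000, "event": "listen", "is_organic": True},
    {"user_id": 1, "track_id": 11, "timestamp": 1_700_000_300, "event": "listen", "is_organic": False},
    {"user_id": 1, "track_id": 11, "timestamp": 1_700_000_400, "event": "like", "is_organic": False},
    {"user_id": 2, "track_id": 10, "timestamp": 1_700_086_400, "event": "listen", "is_organic": True},
    {"user_id": 2, "track_id": 12, "timestamp": 1_700_086_700, "event": "listen", "is_organic": True},
]


def organic_share(records):
    """Per user, the fraction of listens that did not come from a recommendation."""
    listens = defaultdict(int)
    organic = defaultdict(int)
    for r in records:
        if r["event"] != "listen":
            continue
        listens[r["user_id"]] += 1
        organic[r["user_id"]] += int(r["is_organic"])
    return {user: organic[user] / listens[user] for user in listens}


def listens_per_day(records):
    """Count listen events per UTC calendar day, based on the timestamps."""
    counts = defaultdict(int)
    for r in records:
        if r["event"] == "listen":
            day = datetime.fromtimestamp(r["timestamp"], tz=timezone.utc).date()
            counts[day.isoformat()] += 1
    return dict(counts)


shares = organic_share(events)
daily = listens_per_day(events)
print(shares)
print(daily)
```

With the real files, records like these could be loaded with pandas.read_parquet or polars.read_parquet. The file layout and column names are not described in this article and would need to be checked against the dataset documentation.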
“Yambda empowers researchers to test innovative hypotheses and businesses to build smarter recommender systems. Ultimately, users benefit — finding the perfect song, product, or service effortlessly,” notes Nikolai Savushkin.
Dataset versions and evaluation
Available in three sizes — approximately 5 billion, 500 million, and 50 million events — the Yambda dataset accommodates researchers and developers with different needs and computational resource capacities.

The dataset uses Global Temporal Split (GTS) for evaluation, a method that splits data by timestamps to preserve event sequences. Unlike Leave-One-Out, which removes the last positive interaction from each user’s history for testing, GTS avoids breaking temporal dependencies between training and test sets. This makes model testing more realistic, mimicking real-world conditions where future data is unavailable.
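The difference between the two splitting schemes can be sketched on a toy event log. This is a minimal illustration, not the benchmark's actual evaluation code: the function names, the tuple layout, and the cutoff value are hypothetical, and every event is treated as a positive interaction.

```python
from collections import defaultdict

# (user_id, item_id, timestamp) tuples; all values are made up.
events = [
    ("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 9),
    ("u2", "a", 3), ("u2", "d", 4),
    ("u3", "b", 7), ("u3", "e", 8),
]


def global_temporal_split(events, cutoff):
    """Train on events before the cutoff, test on events at or after it."""
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test


def leave_one_out_split(events):
    """Hold out each user's most recent event and train on the rest."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e[0]].append(e)
    train, test = [], []
    for history in by_user.values():
        history = sorted(history, key=lambda e: e[2])
        train.extend(history[:-1])
        test.append(history[-1])
    return train, test


def has_future_leak(train, test):
    """True if some training event happens after the earliest test event."""
    earliest_test = min(e[2] for e in test)
    return any(e[2] > earliest_test for e in train)


gts_train, gts_test = global_temporal_split(events, cutoff=7)
loo_train, loo_test = leave_one_out_split(events)

print(has_future_leak(gts_train, gts_test))  # False
print(has_future_leak(loo_train, loo_test))  # True
```

In the toy log, Leave-One-Out tests user u2 on an event at time 4 while training on u3's event at time 7, so the model sees data from the "future" of that test case. The global cutoff rules this out by construction.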
Baseline implementations include MostPop, DecayPop, ItemKNN, iALS, BPR, SANSA, and SASRec, providing benchmarks for comparing new recommender system approaches. These baselines are evaluated using standard metrics, including:
* NDCG@k (ranking quality)
* Recall@k (retrieval effectiveness)
* Coverage@k (catalog diversity)
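For readers unfamiliar with the three metrics, the sketch below uses common textbook definitions with binary relevance. The exact variants used in the Yambda benchmark may differ, and the toy recommendations, ground truth, and catalog size are invented for illustration. Each recommendation list is assumed to contain no duplicate items.

```python
import math


def recall_at_k(recommended, relevant, k):
    """Share of a user's relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: discounted gain of the hits over the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def coverage_at_k(all_recommendations, catalog_size, k):
    """Share of the catalog that appears in at least one user's top-k list."""
    shown = set()
    for recommended in all_recommendations:
        shown.update(recommended[:k])
    return len(shown) / catalog_size


# Toy data: ranked recommendations and held-out relevant items per user.
recs = {"u1": ["a", "b", "c"], "u2": ["a", "d", "b"]}
truth = {"u1": {"b", "e"}, "u2": {"a"}}
catalog_size = 10
k = 3

mean_recall = sum(recall_at_k(recs[u], truth[u], k) for u in recs) / len(recs)
mean_ndcg = sum(ndcg_at_k(recs[u], truth[u], k) for u in recs) / len(recs)
coverage = coverage_at_k(recs.values(), catalog_size, k)

print(round(mean_recall, 4), round(mean_ndcg, 4), round(coverage, 4))
```

Recall and NDCG are averaged over users, while coverage is computed once over all recommendation lists. A popularity-style baseline that recommends the same items to everyone can score reasonably on the first two and still have very low coverage.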
“When industry leaders share hard-won tools and data, a rising tide lifts all boats: researchers gain real-world benchmarks, startups access resources once reserved for tech giants, and users everywhere enjoy greater personalization,” added Nikolai Savushkin.
Yambda, the world’s largest open recommender system dataset, is now available on Hugging Face.
About Yandex
Yandex is a global technology company that builds intelligent products and services powered by machine learning. The company’s goal is to help consumers and businesses better navigate the online and offline world. Since 1997, Yandex has been delivering world-class, locally relevant search and information services and has also developed market-leading on-demand transportation services, navigation products, and other mobile applications for millions of consumers across the globe.
About My Wave
My Wave, a personalized recommendation system integrated into the multi-million-user music streaming service, Yandex Music, employs deep neural models and AI algorithms to analyze over a thousand factors — including user interactions, customizable mood/language settings, and real-time music analysis of spectrograms, frequency ranges, rhythm, vocal tone, and genre. By processing listening history and track sequences, it dynamically adapts to user preferences, identifies audio similarities, and predicts musical tastes to deliver tailored suggestions.
(ADVERTORIAL DISCLAIMER: The above press release has been provided by VMPL. ANI will not be responsible in any way for the content of the same)


Disclaimer: This story is auto-generated from a syndicated feed of ANI; only the image & headline may have been reworked by News Services Division of World News Network Inc Ltd and Palghar News and Pune News and World News
