IMDb Movie Reviews

The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.
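The labeling rule described above can be sketched as a small function. This is a minimal illustration, not the dataset authors' code: the name `label_from_score`, the 1 to 10 score range, and the returned strings are assumptions made for the example.

```python
from typing import Optional


def label_from_score(score: int) -> Optional[str]:
    """Map a review score out of 10 to a sentiment label.

    Returns "negative" for scores <= 4, "positive" for scores >= 7,
    and None for the middle scores, which the dataset leaves out.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None  # not polarized enough to be included


labels = [label_from_score(s) for s in (2, 5, 9)]
print(labels)
```

Returning `None` for the middle scores makes it easy to filter out reviews that would not qualify for the labeled set.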

IMDb Non-Commercial Datasets

Subsets of IMDb data are available for access to customers for personal and non-commercial use. You can hold local copies of this data, and it is subject to our terms and conditions. Please refer to the Non-Commercial Licensing and copyright/license and verify compliance.

As of March 18, 2024 the datasets on this page are backed by a new data source. There has been no change in location or schema, but if you encounter issues with the datasets following the March 18th update, please contact [email protected].

Data Location

The dataset files can be accessed and downloaded from https://datasets.imdbws.com/ . The data is refreshed daily.

IMDb Dataset Details

Each dataset is contained in a gzipped, tab-separated-values (TSV) formatted file in the UTF-8 character set. The first line in each file contains headers that describe what is in each column. A '\N' is used to denote that a particular field is missing or null for that title/name. The available datasets are as follows:

title.akas.tsv.gz

  • titleId (string) - a tconst, an alphanumeric unique identifier of the title
  • ordering (integer) – a number to uniquely identify rows for a given titleId
  • title (string) – the localized title
  • region (string) - the region for this version of the title
  • language (string) - the language of the title
  • types (array) - Enumerated set of attributes for this alternative title. One or more of the following: "alternative", "dvd", "festival", "tv", "video", "working", "original", "imdbDisplay". New values may be added in the future without warning
  • attributes (array) - Additional terms to describe this alternative title, not enumerated
  • isOriginalTitle (boolean) – 0: not original title; 1: original title

title.basics.tsv.gz

  • tconst (string) - alphanumeric unique identifier of the title
  • titleType (string) – the type/format of the title (e.g. movie, short, tvseries, tvepisode, video, etc)
  • primaryTitle (string) – the more popular title / the title used by the filmmakers on promotional materials at the point of release
  • originalTitle (string) - original title, in the original language
  • isAdult (boolean) - 0: non-adult title; 1: adult title
  • startYear (YYYY) – represents the release year of a title. In the case of TV Series, it is the series start year
  • endYear (YYYY) – TV Series end year. '\N' for all other title types
  • runtimeMinutes – primary runtime of the title, in minutes
  • genres (string array) – includes up to three genres associated with the title

title.crew.tsv.gz

  • tconst (string) - alphanumeric unique identifier of the title
  • directors (array of nconsts) - director(s) of the given title
  • writers (array of nconsts) – writer(s) of the given title

title.episode.tsv.gz

  • tconst (string) - alphanumeric identifier of episode
  • parentTconst (string) - alphanumeric identifier of the parent TV Series
  • seasonNumber (integer) – season number the episode belongs to
  • episodeNumber (integer) – episode number of the tconst in the TV series

title.principals.tsv.gz

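  • tconst (string) - alphanumeric unique identifier of the title
  • ordering (integer) – a number to uniquely identify rows for a given titleId
  • nconst (string) - alphanumeric unique identifier of the name/person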
  • category (string) - the category of job that person was in
  • job (string) - the specific job title if applicable, else '\N'
  • characters (string) - the name of the character played if applicable, else '\N'

title.ratings.tsv.gz

  • tconst (string) - alphanumeric unique identifier of the title
  • averageRating – weighted average of all the individual user ratings
  • numVotes - number of votes the title has received

name.basics.tsv.gz

  • nconst (string) - alphanumeric unique identifier of the name/person
  • primaryName (string) – name by which the person is most often credited
  • birthYear – in YYYY format
  • deathYear – in YYYY format if applicable, else '\N'
  • primaryProfession (array of strings)– the top-3 professions of the person
  • knownForTitles (array of tconsts) – titles the person is known for
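The file format described above (gzipped, tab-separated, UTF-8, header row, '\N' for missing values) can be read with the Python standard library alone. The sketch below is a minimal illustration under stated assumptions: the sample rows are made up, the helper names `read_imdb_tsv` and `split_array` are hypothetical, and array-typed columns are assumed to be comma-separated strings.

```python
import csv
import gzip
import io

# Tiny in-memory stand-in for a file such as title.basics.tsv.gz (made-up rows).
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\tendYear\tgenres\n"
    "tt0000001\tshort\tExample Short\t1894\t\\N\tDocumentary,Short\n"
    "tt0000002\ttvseries\tExample Series\t2001\t2005\t\\N\n"
)
raw_gz = gzip.compress(SAMPLE.encode("utf-8"))


def read_imdb_tsv(binary_stream):
    """Yield one dict per row, turning the '\\N' marker into None."""
    with gzip.open(binary_stream, mode="rt", encoding="utf-8", newline="") as text:
        # QUOTE_NONE: treat quote characters inside titles as ordinary text.
        reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {key: (None if value == r"\N" else value) for key, value in row.items()}


def split_array(value):
    """Assumption: array-typed columns are comma-separated strings."""
    return [] if value is None else value.split(",")


rows = list(read_imdb_tsv(io.BytesIO(raw_gz)))
print(rows[0]["endYear"], split_array(rows[0]["genres"]))
```

For a downloaded file, pass an open binary file object (for example `open("title.basics.tsv.gz", "rb")`) instead of the in-memory buffer.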

imdb_reviews

  • Description :

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

Additional Documentation : Explore on Papers With Code

Config description : Plain text

Homepage : http://ai.stanford.edu/~amaas/data/sentiment/

Source code : tfds.datasets.imdb_reviews.Builder

Versions :

  • 1.0.0 (default): New split API ( https://tensorflow.org/datasets/splits )

Download size : 80.23 MiB

Dataset size : 129.83 MiB

Auto-cached ( documentation ): Yes

  • Splits :
Split Examples
'test' 25,000
'train' 25,000
'unsupervised' 50,000
  • Feature structure :
  • Feature documentation :
Feature Class Shape Dtype Description
FeaturesDict
label ClassLabel int64
text Text string

Supervised keys (See as_supervised doc ): ('text', 'label')

Figure ( tfds.show_examples ): Not supported.

Examples ( tfds.as_dataframe ):

imdb_reviews/plain_text (default config)

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-09-20 UTC.

IMDB Datasets : Types, Usages, and Application

The IMDb dataset refers to a collection of data compiled and provided by IMDb (Internet Movie Database), one of the most comprehensive online databases of movies, TV shows, actors, and production crew information. IMDb is a widely used platform for accessing information about films and television programs, including details such as cast and crew credits, user ratings and reviews, plot summaries, trivia, and more.

Table of Contents

  • Types of IMDb datasets
  • How to download the IMDb dataset
  • How to load IMDb datasets
  • Applications of IMDb datasets
  • Use cases or project ideas using the IMDb dataset

The official IMDb datasets are distributed as gzipped, tab-separated-values (TSV) files, while third-party copies are often repackaged as CSV (Comma-Separated Values) or JSON (JavaScript Object Notation). They contain information about movies, TV shows, actors, directors, genres, ratings, release dates, and other related attributes. These datasets are often used for research, analysis, and development of applications related to the entertainment industry, such as recommendation systems, market research, and academic studies.

Types of IMDb Datasets

The IMDb datasets provide various types of information about movies, TV shows, actors, crew members, ratings, and more.

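Dataset | Purpose | Key Fields
title.basics.tsv.gz | Basic information about movies, TV shows, and video games, including the genres associated with each title | tconst, titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes, genres
title.akas.tsv.gz | Alternate names for titles | titleId, ordering, title, region, language, types, attributes, isOriginalTitle
title.principals.tsv.gz | Principal cast/crew members for each title | tconst, ordering, nconst, category, job, characters
title.crew.tsv.gz | Director and writer information for each title | tconst, directors, writers
title.episode.tsv.gz | Information about episodes of TV series | tconst, parentTconst, seasonNumber, episodeNumber
title.ratings.tsv.gz | IMDb ratings and the number of votes for each title | tconst, averageRating, numVotes
name.basics.tsv.gz | Information about people (actors, directors, writers, etc.) | nconst, primaryName, birthYear, deathYear, primaryProfession, knownForTitles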

How to Download the IMDb Dataset

Here’s a step-by-step guide for downloading IMDb datasets:

Method 1: Downloading from the IMDb Website

  • Open your web browser and go to the IMDb datasets location at https://datasets.imdbws.com/.
  • Find the file you are interested in, such as title.basics.tsv.gz or title.ratings.tsv.gz.
  • Click on the link for the file you want to download.
  • Review the non-commercial licensing terms and verify that your intended use complies with them.
  • Each dataset is downloaded as a gzipped TSV file (.tsv.gz), which you decompress to access the data.
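Downloading from the data location given earlier (https://datasets.imdbws.com/) can also be scripted. The sketch below is a minimal illustration: the helper names are hypothetical, and it assumes each file is served directly under that base URL with the file names listed in the dataset details.

```python
import urllib.request

BASE_URL = "https://datasets.imdbws.com/"


def dataset_url(file_name: str) -> str:
    """Build the download URL, assuming files sit directly under BASE_URL."""
    return BASE_URL + file_name


def download_dataset(file_name: str, destination: str) -> str:
    """Download one dataset file to `destination` and return the local path."""
    local_path, _headers = urllib.request.urlretrieve(dataset_url(file_name), destination)
    return local_path


print(dataset_url("title.ratings.tsv.gz"))
# Example call (needs network access):
# download_dataset("title.ratings.tsv.gz", "title.ratings.tsv.gz")
```

Because the data is refreshed daily, a script like this can be scheduled to keep a local copy current, subject to the non-commercial terms.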

Method 2: Downloading from Third-Party Sources

  • Use a search engine to find websites or repositories that host IMDb datasets. You can search for terms like “IMDb dataset Kaggle” or “IMDb dataset GitHub”.
  • Visit the websites or repositories that appear in the search results.
  • Look for IMDb datasets or collections of movie-related data.
  • Review the available datasets and choose a source that offers the dataset you’re interested in. Popular platforms like Kaggle, GitHub, and data.world often have IMDb datasets.
  • Once you’ve found a suitable dataset, follow the instructions provided on the website or repository to download it.
  • This typically involves clicking on a download link or cloning the repository if it’s hosted on GitHub.
  • The dataset will be downloaded to your computer as a compressed file, which you can then extract to access the individual files.

Method 3: Accessing Data via IMDb API

  • Go to the IMDb Developer website (https://developer.imdb.com/) and sign up for an API key.
  • Follow the instructions to create an account and obtain your API key.
  • Review the IMDb API documentation to understand how to make requests and retrieve data.
  • The documentation will provide details on endpoints, parameters, and response formats.
  • Use your preferred programming language or tool to make requests to the IMDb API.
  • Include your API key in each request to authenticate your access.
  • Follow the guidelines in the documentation to construct requests for the specific data you need, such as movie details, ratings, or reviews.
  • Process the responses returned by the IMDb API to extract the desired data.
  • Depending on your application, you may choose to store the data locally, analyze it in real-time, or display it to users.

How to Load IMDb Datasets

Load Datasets Using TensorFlow

TensorFlow Datasets (TFDS) provides a collection of ready-to-use datasets for use with TensorFlow. Some IMDb datasets are available through TFDS. Use TFDS to load the IMDb dataset (e.g., IMDb reviews for sentiment analysis).
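A minimal sketch of loading the reviews through TFDS follows. It assumes the `tensorflow_datasets` package is installed and uses the `imdb_reviews` dataset name from the TFDS catalog. The wrapper name `load_imdb_reviews_tfds` is hypothetical.

```python
def load_imdb_reviews_tfds(split="train"):
    """Return the requested split of imdb_reviews as (text, label) pairs."""
    # Imported lazily so this sketch can be read without TensorFlow installed.
    import tensorflow_datasets as tfds

    # as_supervised=True yields (text, label) tuples instead of feature dicts.
    return tfds.load("imdb_reviews", split=split, as_supervised=True)


# Example usage (downloads the dataset on the first call):
# train_ds = load_imdb_reviews_tfds("train")
# for text, label in train_ds.take(1):
#     print(label.numpy(), text.numpy()[:80])
```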

Load Datasets Using the Keras IMDb Dataset

Keras, which is now part of the TensorFlow library, provides built-in support for the IMDb dataset, particularly the IMDb movie reviews dataset, which is commonly used for sentiment analysis.

Keras includes the imdb dataset in its datasets module. You can load it directly without needing to manually download it.
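A minimal sketch of the Keras route follows, assuming TensorFlow is installed. The wrapper name `load_imdb_keras` is hypothetical, and the vocabulary size of 10,000 is an arbitrary choice for illustration.

```python
def load_imdb_keras(num_words=10000):
    """Load the Keras IMDb reviews as integer-encoded sequences."""
    # Imported lazily so this sketch can be read without TensorFlow installed.
    from tensorflow import keras

    # num_words keeps only the most frequent words; 10000 is an arbitrary choice.
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=num_words)
    return (x_train, y_train), (x_test, y_test)


# Example usage (downloads the data on the first call):
# (x_train, y_train), (x_test, y_test) = load_imdb_keras()
# print(len(x_train), len(x_test))
```

Unlike the TFDS version, this loader returns reviews already encoded as lists of word indices rather than raw text.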

Applications of IMDb Datasets

  • Media Platforms : Many media and entertainment companies license IMDb data to enhance content discovery. They use it for in-catalog and out-of-catalog title search, as well as to power relevant content recommendations.
  • Amazon Personalize and Amazon SageMaker : IMDb data can be ingested into Amazon Personalize and Amazon SageMaker to build recommendation engines and machine learning applications.
  • Big Data Analytics : IMDb data can be used for large-scale analytics, trend analysis, and insights.
  • Sentiment Analysis : Researchers and data scientists analyze IMDb movie reviews using natural language processing (NLP) techniques to determine sentiment.
  • In-Memory Databases : The abbreviation IMDB also refers to in-memory databases, a separate technology unrelated to IMDb data. In-memory databases suit workloads that need low-latency data access, such as financial trading systems, online gaming platforms, real-time e-commerce inventory management, and scientific simulations, and they are often compared with other database technologies for specific use cases.

Use Cases or Project Ideas Using the IMDb Dataset

Content-Based Filtering :

  • IMDb data can be used for content-based recommendations. By analyzing movie attributes (such as genres, directors, actors, and release years), systems can suggest similar titles to users based on their preferences.
  • For example, if a user enjoys action movies with Tom Cruise, the system can recommend other action films featuring Tom Cruise.
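The content-based idea can be sketched with a simple attribute-overlap score. This is a toy illustration, not a production recommender: the catalogue, titles, and attribute tags are made up, and Jaccard similarity is one of several measures that could be used.

```python
# Toy catalogue with made-up titles; real attributes would come from IMDb data.
CATALOGUE = {
    "Movie A": {"Action", "Thriller", "actor:X"},
    "Movie B": {"Action", "Adventure", "actor:X"},
    "Movie C": {"Romance", "Drama", "actor:Y"},
}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar(title: str, top_n: int = 2) -> list:
    """Rank the other titles by attribute overlap with `title`."""
    target = CATALOGUE[title]
    scored = [
        (jaccard(target, attrs), other)
        for other, attrs in CATALOGUE.items()
        if other != title
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _score, other in scored[:top_n]]


print(recommend_similar("Movie A"))
```

Here "Movie B" ranks first for "Movie A" because the two share a genre and an actor, which mirrors the example of recommending films with the same lead.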

Collaborative Filtering :

  • IMDb ratings and user reviews provide valuable data for collaborative filtering. This technique recommends items based on the preferences of similar users.
  • By analyzing user-item interactions (ratings, watch history), collaborative filtering can suggest movies that users with similar tastes enjoyed.
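A user-based version of collaborative filtering can be sketched as follows. The ratings are made up, and weighting each neighbour's rating by cosine similarity over co-rated titles is a simplifying assumption chosen for brevity rather than a recommended design.

```python
import math

# Made-up ratings: user -> {title: rating on a 1-10 scale}.
RATINGS = {
    "alice": {"Movie A": 9, "Movie B": 8, "Movie C": 2},
    "bob": {"Movie A": 8, "Movie B": 9, "Movie D": 9},
    "carol": {"Movie A": 2, "Movie C": 9, "Movie E": 8},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity over the titles both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(u[t] ** 2 for t in shared))
    norm_v = math.sqrt(sum(v[t] ** 2 for t in shared))
    return dot / (norm_u * norm_v)


def recommend_for(user: str) -> list:
    """Suggest unseen titles, weighting each rating by user similarity."""
    scores = {}
    for other, their_ratings in RATINGS.items():
        if other == user:
            continue
        weight = cosine(RATINGS[user], their_ratings)
        for title, rating in their_ratings.items():
            if title not in RATINGS[user]:
                scores[title] = scores.get(title, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend_for("alice"))
```

In this toy data, bob's tastes are closer to alice's than carol's are, so the title only bob rated is ranked ahead of the title only carol rated.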

Hybrid Recommendations :

  • Combining content-based and collaborative filtering approaches leads to hybrid recommendations. IMDb data can be used to build hybrid models that offer personalized suggestions.
  • These models consider both item attributes (content-based) and user behavior (collaborative).

Genre Analysis and Trends :

  • Researchers and analysts study IMDb data to identify genre trends over time. Which genres are popular? How have preferences changed?
  • IMDb’s extensive genre information allows for detailed analysis of audience preferences.

Box Office Predictions :

  • IMDb data, including movie budgets, ratings, and release dates, can be used to predict box office performance.
  • Machine learning models trained on historical data can estimate a movie’s potential revenue.

Casting Decisions and Talent Management :

  • IMDb provides information about actors, directors, and crew members. Talent agencies and casting directors use this data for decision-making.
  • For instance, casting directors can explore actors’ filmographies and ratings to make informed choices.

Entertainment News and Blogs :

  • Entertainment journalists and bloggers use IMDb data to write articles, reviews, and profiles.
  • IMDb’s comprehensive database ensures accurate and up-to-date information.

FAQ – IMDb Dataset

Q1: What is the IMDb dataset worth?

The IMDb dataset holds immense value for researchers, with its comprehensive coverage of entertainment content and audience interactions. Its insights can inform business decisions, drive innovation, and advance scholarly research in the field.

Q2: How much does the IMDb dataset cost?

The official IMDb datasets are free to download for personal and non-commercial use, subject to IMDb's terms and conditions. Some third-party providers may offer enhanced versions or value-added services for a fee.

Q3: Where can I download the IMDb dataset?

The IMDb dataset can be downloaded from the IMDb datasets page on the official website, or from reputable data repositories and platforms such as Kaggle or GitHub.

Q4: Is there a specific format for the IMDb dataset?

The official IMDb datasets are distributed as gzipped TSV files. Third-party copies are commonly available in formats such as CSV, JSON, or SQL dumps, making them compatible with a wide range of data analysis tools and programming languages.

Q5: How can I access the IMDb dataset via Hugging Face?

Hugging Face, a popular platform for natural language processing datasets, hosts the IMDb movie reviews dataset on its Hub. Users can find it with the platform’s search functionality and load it with the datasets library.
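A minimal sketch of loading the reviews with the Hugging Face `datasets` library follows. It assumes the library is installed and that the dataset is published on the Hub under the id "imdb". The wrapper name `load_imdb_huggingface` is hypothetical.

```python
def load_imdb_huggingface(split="train"):
    """Return one split of the IMDb reviews dataset from the Hugging Face Hub."""
    # Imported lazily so this sketch can be read without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the dataset is published on the Hub under the id "imdb".
    return load_dataset("imdb", split=split)


# Example usage (downloads the dataset on the first call):
# train = load_imdb_huggingface("train")
# print(train[0])
```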

Open Access

IMDb Users' Ratings Dataset

Abstract 

This dataset contains 4,669,820 ratings of 351,109 movies by 1,499,238 users on the imdb.com website. The data was collected from reviews ( https://www.imdb.com/review/rw0000001/ ). Each row in this dataset has the following format:

userID, movieID, rating, review date

For example : 

ur18238764, tt2177461, 9, 22 January 2019

Use the following code to read the dataset :

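import numpy as np

# Load the ratings array; Dataset.npy must be in the working directory.
dataset = np.load("Dataset.npy")

# Print the first row: userID, movieID, rating, review date.
print(dataset[0])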
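Once a row is loaded, its four fields can be converted to typed values. The sketch below is a minimal illustration using the example row from the abstract. The `Rating` type and `parse_rating` helper are hypothetical, and the date pattern is assumed from that single example.

```python
from datetime import date, datetime
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: str
    movie_id: str
    rating: int
    review_date: date


def parse_rating(row) -> Rating:
    """Convert one (userID, movieID, rating, review date) row to typed fields."""
    user_id, movie_id, rating, review_date = (str(field).strip() for field in row)
    return Rating(
        user_id=user_id,
        movie_id=movie_id,
        rating=int(rating),
        # Assumption: dates follow the "22 January 2019" pattern of the example row.
        review_date=datetime.strptime(review_date, "%d %B %Y").date(),
    )


example = parse_rating(["ur18238764", "tt2177461", "9", "22 January 2019"])
print(example)
```

Typed dates make it straightforward to sort or group ratings by time.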

Permalink: http://ieee-dataport.org/open-access/imdb-users-ratings-dataset

DOI Link: https://dx.doi.org/10.21227/br41-bd49

Short Link: http://ieee-dataport.org/3421

IMDB Movie Reviews Large Dataset - 50k Reviews

laxmimerit/IMDB-Movie-Reviews-Large-Dataset-50k

This dataset is taken from https://ai.stanford.edu/~amaas/data/sentiment/ and then preprocessed to put all positive and negative reviews in the same file for training and testing. It helps you put more effort into the algorithm instead of data collection.

COMMENTS

  1. IMDB Dataset of 50K Movie Reviews

    Large Movie Review Dataset.

  2. IMDB dataset (Sentiment analysis) in CSV format

    IMDB Movie Review Dataset transformed into CSV files.

  3. IMDB_Data_Analysis/imdb_top_1000.csv at main

    This project involves analyzing IMDB's Top 1000 movies dataset based on various variables. The analysis covers IMDB scores, Meta scores, genres, and gross values, with visualizations created using Plotly, Seaborn, and Matplotlib libraries to reveal insightful trends and patterns.

  4. rishimule/Sentiment-Analysis-of-Movie-Reviews

    This project aims to perform sentiment analysis on the IMDB movie review dataset. It utilizes deep learning techniques, particularly LSTM and Conv1D layers, to classify movie reviews into positive and negative sentiments. The model is built using Keras and GloVe embeddings for word representations.

  5. GitHub

    About Dataset: IMDB dataset having 50K movie reviews for natural language processing or text analytics. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing.

  6. IMDb Movie Reviews Dataset

    This dataset contains nearly 1 Million unique movie reviews from 1150 different IMDb movies spread across 17 IMDb genres - Action, Adventure, Animation, Biography, Comedy, Crime, Drama, Fantasy, History, Horror, Music, Mystery, Romance, Sci-Fi, Sport, Thriller and War. The dataset also contains movie metadata such as date of release of the movie, run length, IMDb rating, movie rating (PG-13, R ...

  7. IMDB Movies Dataset

    Top 1000 Movies by IMDB Rating.

  8. IMDb-Review-Analysis/IMDb_Reviews.csv at master

    Using sentiment analysis to classify documents based on their polarity. In particular, this project works with a dataset of 50,000 movie reviews from the Internet Movie Database (IMDb) and builds a predictor that can distinguish between positive and negative reviews.

  9. IMDb Movie Dataset: All Movies by Genre

    A comprehensive collection of all movies listed on IMDb, sorted by genre.

  10. Movie-Review-Sentiment-Analysis/IMDB-Dataset.csv at master

    Sentiment of a movie review is predicted using three different neural network models - MLP, CNN and LSTM. GloVe embedding is used for vector representation of words. - SK7here/Movie-Review-Sentim...

  11. sahildit/IMDB-Movies-Extensive-Dataset-Analysis

    The movie dataset includes 85,855 movies with attributes such as movie description, average rating, votes, Year, Date published, Title, description, genre, etc. • Demonstrate the true understanding of what the data says using Visualisation. The data sets contain the information about IMDB movies ...

  12. IMDB Movies Dataset

    A Comprehensive Database of Movie Information, Ratings, and Reviews from IMDB.

  13. Top 1000 IMDb Movies Dataset

    Discover the Greatest Movies of All Time - IMDb's Top 1000 Movie Rankings.