Film dataset


This dataset covers most films and actors in Wikipedia, together with film records crawled from IMDB.

The film graph and actor graph are extracted from DBpedia 2014 and YAGO 3.0.2 and are represented as RDF triples, each consisting of a source entity, a labeled edge (the predicate) and a target entity. In addition, a JSON document of films is crawled from IMDB through the OMDB API.

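In this dataset, the film and actor triples are provided together in a file named "mix graph". To make them available separately as well, we also provide two individual files, "film graph" and "actor graph", in which every film or actor appears as the subject of its triples. When the film or actor is the target of the original triple, the predicate is prefixed with "-" to mark the reversed direction. For example, "<Tom_Hanks> -<starring> <A_League_of_Their_Own>" stands for the original triple "<A_League_of_Their_Own> <starring> <Tom_Hanks>". Note that the film graph and actor graph may contain some duplicated triples.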
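The "-" convention described above can be undone when loading the files. The following is a minimal sketch, assuming each line holds one triple whose subject, predicate and object are separated by whitespace, as in the samples further down. The names Triple and parse_line are illustrative and not part of the dataset.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    source: str
    predicate: str
    target: str


def parse_line(line: str) -> Triple:
    """Parse one 'subject predicate object' line into a forward-direction triple.

    Assumption: fields are whitespace-separated and only the object
    (for example a literal) may itself contain spaces.
    """
    subject, predicate, obj = line.strip().split(None, 2)
    if predicate.startswith("-"):
        # Reversed edge: the written subject is the target of the original triple.
        return Triple(obj, predicate[1:], subject)
    return Triple(subject, predicate, obj)


if __name__ == "__main__":
    print(parse_line("<Tom_Hanks> -<starring> <A_League_of_Their_Own>"))
    print(parse_line("<Forrest_Gump> <starring> <Tom_Hanks>"))
```

Normalizing every line this way makes it possible to detect triples that appear in both the film graph and the actor graph, since both spellings of an edge map to the same forward triple.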

Complete Data Statistics:

RDF graph of DBpedia
                 Film         Actor        Mix
Source nodes     85,667       86,465       488,140
Labeled edges    173          852          634
Target nodes     330,250      649,130      514,269
Triples          1,718,482    1,909,674    3,183,420

RDF graph of YAGO
                 Film         Actor        Mix
Source nodes     35,819       27,874       83,841
Labeled edges    14           37           36
Target nodes     54,269       88,294       98,490
Triples          520,520      522,800      899,288

JSON document of IMDB
Rows             96,781
Attributes       20
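
To make the four graph statistics concrete, here is a rough sketch of how such counts could be computed from a triple file. It assumes one whitespace-separated triple per line and counts each predicate exactly as written, so a reversed predicate such as -<starring> is treated as a label distinct from <starring>. Whether the published figures were computed this way is not stated, and the function name graph_stats is illustrative.

```python
from typing import Dict, Iterable


def graph_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct sources, edge labels, targets, and total triples."""
    sources, labels, targets = set(), set(), set()
    triples = 0
    for line in lines:
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        subject, predicate, obj = parts
        sources.add(subject)
        labels.add(predicate)
        targets.add(obj)
        triples += 1
    return {
        "source_nodes": len(sources),
        "labeled_edges": len(labels),
        "target_nodes": len(targets),
        "triples": triples,
    }


if __name__ == "__main__":
    sample = [
        "<Forrest_Gump> <starring> <Tom_Hanks>",
        "<Forrest_Gump> <starring> <Sally_Field>",
        "<Forrest_Gump> <director> <Robert_Zemeckis>",
        "",
        "<Tom_Hanks> -<producer> <Cast_Away>",
    ]
    print(graph_stats(sample))
```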

Sample Data:


<Forrest_Gump> -<notableWork> <Eric_Roth>

<Forrest_Gump> -<child> <Forrest_Gump_(character)>

<Forrest_Gump> <budget> "5.5E7"^^<;

<Forrest_Gump> <cinematography> <Don_Burgess_(cinematographer)>

<Forrest_Gump> <country> <United_States>

<Forrest_Gump> <director> <Robert_Zemeckis>

<Forrest_Gump> <distributor> <Paramount_Pictures>

<Forrest_Gump> <editing> <Arthur_Schmidt_(film_editor)>

<Forrest_Gump> <gross> "6.77387716E8"^^<;

<Forrest_Gump> <musicComposer> <Alan_Silvestri>

<Forrest_Gump> <producer> <Charles_Newirth>

<Forrest_Gump> <producer> <Steve_Starkey>

<Forrest_Gump> <runtime> "8520.0"^^<;

<Forrest_Gump> <starring> <Gary_Sinise>

<Forrest_Gump> <starring> <Mykelti_Williamson>

<Forrest_Gump> <starring> <Robin_Wright>

<Forrest_Gump> <starring> <Sally_Field>

<Forrest_Gump> <starring> <Tom_Hanks>

<Forrest_Gump> <writer> <Eric_Roth>

<Forrest_Gump> <Work/runtime> "142.0"^^<;

<Forrest_Gump> <subject> <Category:1990s_comedy-drama_films>

<Forrest_Gump> <subject> <Category:1994_films>

<Forrest_Gump> <subject> <Category:American_comedy-drama_films>


<Tom_Hanks> -<guest> <100_(30_Rock)>

<Tom_Hanks> -<producer> <A_Hologram_for_the_King_(film)>

<Tom_Hanks> -<producer> <Cast_Away>

<Tom_Hanks> -<starring> <A_Hologram_for_the_King_(film)>

<Tom_Hanks> -<starring> <A_League_of_Their_Own>

<Tom_Hanks> -<spouse> <Rita_Wilson>

<Tom_Hanks> -<voice> <Sheriff_Woody>

<Tom_Hanks> <activeYearsStartYear> "1978"^^<;

<Tom_Hanks> <birthDate> "1956-07-09"^^<;

<Tom_Hanks> <birthPlace> <Concord,_California>

<Tom_Hanks> <birthYear> "1956"^^<;

<Tom_Hanks> <child> <Colin_Hanks>

<Tom_Hanks> <education> <California_State_University,_Sacramento>

<Tom_Hanks> <occupation> <Tom_Hanks__1>

<Tom_Hanks> <spouse> <Rita_Wilson>

<Tom_Hanks> <subject> <Category:1956_births>

<Tom_Hanks> <subject> <Category:20th-century_American_male_actors>


<Forrest_Gump> -<wroteMusicFor> <Alan_Silvestri>

<Forrest_Gump> -<created> <Eric_Roth>

<Forrest_Gump> -<created> <Robert_Zemeckis>

<Forrest_Gump> -<actedIn> <Tom_Hanks>

<Forrest_Gump> -<actedIn> <Sally_Field>

<Forrest_Gump> -<actedIn> <Mykelti_Williamson>

<Forrest_Gump> -<actedIn> <Sam_Anderson>

<Forrest_Gump> -<actedIn> <Gary_Sinise>

<Forrest_Gump> -<actedIn> <Robin_Wright>

<Forrest_Gump> <isLocatedIn> <United_States>

<Forrest_Gump> -<edited> <Arthur_Schmidt_(film_editor)>

<Forrest_Gump> -<directed> <Robert_Zemeckis>

<Forrest_Gump> rdf:type <wikicat_1990s_comedy_films>

<Forrest_Gump> rdf:type <wikicat_1994_films>


<Tom_Hanks> <actedIn> <Toy_Story_3>

<Tom_Hanks> <actedIn> <Toy_Story_2>

<Tom_Hanks> <wasBornIn> <Concord,_California>

<Tom_Hanks> <actedIn> <The_Da_Vinci_Code_(film)>

<Tom_Hanks> <actedIn> <Apollo_13_(movie)>

<Tom_Hanks> <actedIn> <Sleepless_in_Seattle>

<Tom_Hanks> <actedIn> <Catch_Me_If_You_Can>

<Tom_Hanks> <actedIn> <Every_Time_We_Say_Goodbye_(film)>

<Tom_Hanks> <hasWonPrize> <Saturn_Award>

<Tom_Hanks> <hasWonPrize> <Golden_Globe_Award>

<Tom_Hanks> rdf:type <wikicat_Writers_from_Los_Angeles,_California>

<Tom_Hanks> rdf:type <wikicat_People_from_California>

<Tom_Hanks> rdf:type <wikicat_Living_people>

<Tom_Hanks> rdf:type <wikicat_Actors>


"Title":"Forrest Gump",



"Released":"06 Jul 1994",

"Runtime":"142 min",

"Genre":"Comedy, Drama",

"Director":"Robert Zemeckis",

"Writer":"Winston Groom (novel), Eric Roth (screenplay)",

"Actors":"Tom Hanks, Rebecca Williams, Sally Field, Michael Conner Humphreys",

"Plot":"Forrest Gump, while not intelligent, has accidentally been present at many historic moments, but his true love, Jenny Curran, eludes him.",



"Awards":"Won 6 Oscars. Another 39 wins & 65 nominations.",









Example Queries:

Query  Description
q1     Find all attributes (slots) of films that have won Academy Awards.
q2     Find all actors who were born in the United States and have acted in comedy films.
q3     Find all films that were released before 2000 and include a Chinese actor.
q4     Find all actors who were born before 1980 and have acted in films with an imdbRating of at least 8.0.
q5     Find all actors who were born and died in different countries and starred in films with an imdbRating of at least 8.0.
q6     Find all directors who have directed films whose actors come from more than two countries.
q7     Find all directors who have directed at least two Academy Award-winning films with an imdbRating of at least 8.0.
q8     Compute the average age of the actors in each film with an imdbRating of at least 8.0.
q9     Compute the average imdbRating of films starring winners of the Academy Award for Best Actor.
q10    Map each film in the graph to the corresponding film in the JSON document using their features, since entity names can be ambiguous.
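
As an illustration of q10, the sketch below matches a graph entity to a JSON row by comparing a normalized title and the runtime. It relies on two assumptions drawn from the samples above: the DBpedia <runtime> value is in seconds (8520.0 seconds is 142 minutes, matching "Runtime":"142 min"), and parenthesized suffixes such as "(film)" are disambiguation markers. The helper names and the tolerance value are hypothetical, and a real mapping would likely need more features such as director or release year.

```python
import re
from typing import Optional


def normalize_title(name: str) -> str:
    """Map '<Apollo_13_(movie)>' or 'Apollo 13' to 'apollo 13'."""
    name = name.strip().strip("<>").replace("_", " ")
    # Drop a trailing parenthesized disambiguation suffix, if any.
    name = re.sub(r"\s*\([^)]*\)$", "", name)
    return name.casefold().strip()


def runtime_minutes(value: str) -> Optional[float]:
    """Extract the first number from a string such as '142 min'."""
    match = re.search(r"\d+(?:\.\d+)?", value)
    return float(match.group()) if match else None


def is_same_film(entity: str, runtime_seconds: float, row: dict,
                 tolerance_min: float = 2.0) -> bool:
    """Heuristic match on title plus runtime (assumed seconds in the graph)."""
    if normalize_title(entity) != normalize_title(row.get("Title", "")):
        return False
    json_minutes = runtime_minutes(row.get("Runtime", ""))
    if json_minutes is None:
        # No usable runtime in the JSON row: fall back to the title match alone.
        return True
    return abs(runtime_seconds / 60.0 - json_minutes) <= tolerance_min


if __name__ == "__main__":
    row = {"Title": "Forrest Gump", "Runtime": "142 min"}
    print(is_same_film("<Forrest_Gump>", 8520.0, row))
```

The runtime check is what separates entities that share a normalized title, which is the kind of ambiguity q10 refers to.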

Download files: