Licence de Mathématiques (Bachelor's degree in Mathematics). Mathematics of the Web. Pascal Azerad © 28 March 2014

Table of Contents

1 The network: a gigantic graph.
  1.1 Google: the legend.
  1.2 The graph of the web
  1.3 Exploring the web
    1.3.1 A MATLAB program.
  1.4 TP1 (lab session 1)
2 Modeling large document collections for information retrieval
  2.1 Indexing the web or a database.
  2.2 Vector space modeling.
  2.3 The singular value decomposition: a powerful tool for latent semantic indexing.
  2.4 A little linear algebra.
  2.5 TD (exercise session) on the singular value decomposition.
  2.6 TP 2
3 The Google search engine and page ranking.
  3.1 The PageRank equation
    3.1.1 Matrix formulation.
  3.2 The PageRank algorithm.
    3.2.1 Modifying the matrix H
    3.2.2 The choice of α
    3.2.3 TD 4: sensitivity of PageRank to the parameter α.
    3.2.4 Spectral viewpoint: matrix powers and eigenvalues.
    3.2.5 Direct solution of the linear system
  3.3 TD. Introduction to Markov chains.
  3.4 TP4
  3.5 TP4 (variant)
    3.5.1 Random walk on the Web.
  3.6 A MATLAB program.
4 MATLAB summary.

Preamble

This deliberately concise booklet has the following aims:
– to be something an undergraduate student can absorb in seven sessions;
– to show a recent and exciting application of mathematics.
This document is a first version; the author would be very grateful for any error, typo, or comment that readers care to send him at

Chapter 1
The network: a gigantic graph.

1.1 Google: the legend.

The following is an excerpt from Wikipedia:

Google began in March 1996 as a research project by Larry Page and Sergey Brin, Ph.D. students at Stanford working on the Stanford Digital Library Project (SDLP). The SDLP’s goal was "to develop the enabling technologies for a single, integrated and universal digital library" and was funded through the National Science Foundation among other federal agencies. In search for a dissertation theme, Page considered, among other things, exploring the mathematical properties of the World Wide Web, understanding its link structure as a huge graph. His supervisor Terry Winograd encouraged him to pick this idea (which Page later recalled as "the best advice I ever got") and Page focused on the problem of finding out which web pages link to a given page, considering the number and nature of such backlinks to be valuable information about that page (with the role of citations in academic publishing in mind). In his research project, nicknamed "BackRub", he was soon joined by Sergey Brin, a fellow Stanford Ph.D. student supported by a National Science Foundation Graduate Fellowship. Brin was already a close friend, whom Page had first met in the summer of 1995 in a group of potential new students which Brin had volunteered to show around the campus.

Page’s web crawler began exploring the web in March 1996, setting out from Page’s own Stanford home page as its only starting point. To convert the backlink data that it gathered into a measure of importance for a given web page, Brin and Page developed the PageRank algorithm. Analyzing BackRub’s output, which, for a given URL, consisted of a list of backlinks ranked by importance, it occurred to them that a search engine based on PageRank would produce better results than existing techniques (existing search engines at the time essentially ranked results according to how many times the search term appeared on a page)...

Convinced that the pages with the most links to them from other highly relevant Web pages must be the most relevant pages associated with the search, Page and Brin tested their thesis as part of their studies, and laid the foundation for their search engine. By early 1997, the BackRub page described the state as follows:

Some Rough Statistics (from August 29th, 1996)
Total indexable HTML urls: 75.2306 Million
Total content downloaded: 207.022 gigabytes

BackRub is written in Java and Python and runs on several Sun Ultras and Intel Pentiums running Linux. The primary database is kept on an Sun Ultra II with 28GB of disk. Scott Hassan and Alan Steremberg have provided a great deal of very talented implementation help. Sergey Brin has also been very involved and deserves many thanks.
Larry Page

Originally the search engine used the Stanford website with the domain The domain was registered on September 15, 1997. They formally incorporated their company, Google Inc., on September 4, 1998 at a friend’s garage in Menlo Park, California. The name "Google" originated from a misspelling of "googol," which refers to the number represented by a 1 followed by one-hundred zeros (although Enid Blyton used the phrase "Google Bun" in The Magic Faraway Tree (published 1941)).
Having found its way increasingly into everyday language, the verb "google" was added to the Merriam Webster Collegiate Dictionary and the Oxford English Dictionary in 2006, meaning "to use the Google search engine to obtain information on the Internet."

[Figure: Google home page, September 1998]

By the end of 1998, Google had an index of about 60 million pages. The home page was still marked "BETA", but an article in already argued that Google’s search results were better than those of competitors like Hotbot or, and praised it for being more technologically innovative than the overloaded portal sites (like Yahoo!,, Lycos, Netscape’s Netcenter,, and which at that time, during the growing dot-com bubble, were seen as "the future of the Web", especially by stock market investors.

In March 1999, the company moved into offices at 165 University Avenue in Palo Alto, home to several other noted Silicon Valley technology startups. After quickly outgrowing two other sites, the company leased a complex of buildings in Mountain View at 1600 Amphitheatre Parkway from Silicon
