Ruby script: server statistics from Firefox bookmarks

This script is an extension of an earlier script and uses the bookmarks from Firefox. They can be exported in JSON format via Firefox > Bookmarks > Show All Bookmarks > Import and Backup > Backup... A request is sent to the server behind each bookmark. In its response headers, the server reports the server type it runs. All types are collected and counted.

The result is an overview of the servers used by the sites in your own bookmarks.

Gems

curb is used to perform the HTTP header requests via cURL.
http_headers parses the header received through Curb.

Installation

$ gem install curb http_headers
$ wget https://raw.github.com/gist/2253541/6d60762a76d17f520b825e4cd875c31cf3767390/count_server.rb
$ chmod +x count_server.rb

Example

$ ./count_server.rb bookmarks-2012-03-31.json

Script

[Embedded gist 2253541 (count_server_firefox_bookmarks.rb)]

Method descriptions

parse_json parses the given JSON file into a JSON object.
get_server_uri_json recursively searches the JSON object for URIs that use the HTTP protocol. Only the domains are kept. Duplicate URIs are also removed, which is necessary when several bookmarks point to the same domain.

Example: http://blog.dsiw-it.de/me/ becomes http://blog.dsiw-it.de
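
The recursive search and the reduction to the domain could look roughly like the following. This is a minimal sketch, not the script's actual code: the helper names collect_server_uris and unique_server_uris are made up for illustration, and it assumes that the bookmark export stores each address under a "uri" key and nests folders in arrays. It also accepts https, which is an assumption on top of the description above.

```ruby
require 'json'
require 'uri'

# Hypothetical helper (not the author's code): walk a parsed bookmark tree
# and collect "scheme://host" for every HTTP(S) address stored under "uri".
def collect_server_uris(node, found = [])
  case node
  when Hash
    uri = node['uri']
    if uri.is_a?(String) && uri.match?(%r{\Ahttps?://}i)
      begin
        parsed = URI.parse(uri)
        found << "#{parsed.scheme}://#{parsed.host}" if parsed.host
      rescue URI::InvalidURIError
        # Skip bookmarks whose address cannot be parsed.
      end
    end
    # Descend into every value; strings and numbers are ignored below.
    node.each_value { |child| collect_server_uris(child, found) }
  when Array
    node.each { |child| collect_server_uris(child, found) }
  end
  found
end

# Parse the JSON text and drop duplicate domains.
def unique_server_uris(json_text)
  collect_server_uris(JSON.parse(json_text)).uniq
end
```

Called with the contents of the exported file, for example unique_server_uris(File.read('bookmarks-2012-03-31.json')), it returns an array with one entry per distinct domain.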

fetch_uris sends the requests to the servers of the URIs and stores the responses in a hash. Redirects are followed.
filter_headers keeps only the header of each response, so the body is discarded. This is not really memory-efficient, but I could not think of a better way.
get_server_types extracts the actual server names from the headers and uses filter_version.
filter_version strips the version numbers from the names. I noticed that they always follow a /, so the names have this scheme: Servername/Version_and_other_information.
add_server adds a new server type with a count of 1 if it is not yet in the hash; otherwise the count for that type is incremented by 1.
reverse_sorted sorts the server counts in descending order, so that the most-used server is at the top of the list.
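
The last four steps (extracting the server name, stripping the version, counting, sorting) can be sketched with the standard library alone. This is an illustration under assumptions, not the script's code: the helper names server_header, strip_version and count_server_types are hypothetical, the input is assumed to be a list of raw header strings, and names are lowercased because the sample output in this post shows them in lowercase.

```ruby
# Hypothetical helpers, not the author's code.

# Return the value of the first "Server:" line in a raw header block, or nil.
def server_header(raw_headers)
  line = raw_headers.each_line.find { |l| l =~ /\Aserver:/i }
  line && line.split(':', 2).last.strip
end

# Keep only the part before the first "/", lowercased.
def strip_version(server)
  server.split('/', 2).first.strip.downcase
end

# Count server types over many header blocks, most frequent first.
def count_server_types(raw_header_blocks)
  counts = Hash.new(0) # unknown types start at 0, so += 1 covers both cases
  raw_header_blocks.each do |raw|
    name = server_header(raw)
    next if name.nil? || name.empty?
    counts[strip_version(name)] += 1
  end
  counts.sort_by { |_, n| -n }
end
```

Hash.new(0) replaces the explicit "add with 1, otherwise increment" branch described for add_server. Responses without a Server line are skipped here; a real run would probably want to count them as "no information".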

Output

Output (get_servers_output.log)

Get headers of 969 server. Please wait...
: apache
: nginx
: microsoft-iis
: nv
: lighttpd
: apache-coyote
: cloudflare-nginx
: gse
: zope
: litespeed
: httpd
: google frontend
: gws
: server.archlinux.de
: server
: ibm_http_server
: nginx
: codesite_static_content
: yts
: cherrypy
: gunicorn
[...]

Number of URIs:    969
Number of servers: 979
Number of no information: 0 (0.0%)
./get_servers.rb bookmarks-2012-03-17.json  6,71s user 2,57s system 0% cpu 23:37,76 total

Note: Because I am very interested in open source, my bookmarks probably skew the numbers. Accordingly, Apache shows up far more often than Microsoft IIS.

Unfortunately, the script needed a long 23 minutes for this number of servers (see the last line).
As you can see in the function fetch_uris, I commented out a few lines there. If Ruby interpreted them again, Curb would run the requests in parallel. That brought the runtime down to roughly one minute. Unfortunately, many empty responses ended up in the list. With this number of servers, 700 requests came back with no information. As the number of parallel requests increased, the number of empty responses grew exponentially.
I followed this example from the gem.
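
One possible direction for the parallel case is to cap how many requests are in flight at once. The sketch below is not what the script or Curb does: it swaps in plain Ruby threads with a work queue, the name fetch_with_limit and the workers parameter are made up, and the actual request is passed in as a block so that any client can be plugged in. Whether a cap like this avoids the empty responses has not been tested against real servers.

```ruby
require 'thread'

# Hypothetical sketch: call the given block for every URI with at most
# `workers` requests running at the same time. The block takes a URI string
# and returns whatever the caller wants to store (e.g. the raw headers).
def fetch_with_limit(uris, workers: 10, &fetch)
  queue = Queue.new
  uris.each { |uri| queue << uri }
  results = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        uri = begin
          queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break # no work left, end this worker
        end
        response = begin
          fetch.call(uri)
        rescue StandardError
          nil # a failed request is stored as "no information"
        end
        lock.synchronize { results[uri] = response }
      end
    end
  end
  threads.each(&:join)
  results
end
```

With a small workers value the runtime should land somewhere between the sequential and the fully parallel run, and the value can be raised step by step while watching the number of nil entries.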

I look forward to suggestions for improvement!

What does the server distribution look like for you?

DSIW

Everything that is interesting... (Linux, programming, privacy, media, and much more)