This script is an extension of this script; it uses the bookmarks from Firefox. They can be exported in JSON format via Firefox >
Bookmarks > Show All Bookmarks > Import and Backup > Backup...
A request is sent to the server behind every bookmark. In its response, the server reports the server type
it runs. All types are recorded and counted.
In the end this gives an overview of the servers used by your own bookmarks.
Gems
curb is used to perform the HTTP header requests via cURL.
http_headers parses the header that was received via Curb.
Installation
$ gem install curb http_headers
$ wget https://raw.github.com/gist/2253541/6d60762a76d17f520b825e4cd875c31cf3767390/count_server.rb
$ chmod +x count_server.rb
Example
$ ./count_server.rb bookmarks-2012-03-31.json
Script
Method description
parse_json
parses the given JSON file into a JSON object.
get_server_uri_json
recursively searches the JSON object for URIs that use the HTTP protocol. Only
the domains are kept. Duplicate URIs are removed as well, which is
necessary when several bookmarks point to the same domain.
Example: http://blog.dsiw-it.de/me/
becomes http://blog.dsiw-it.de
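A minimal sketch of these two steps, not the original code. It assumes that the Firefox backup nests folders under a "children" key and stores each bookmark address under a "uri" key; the helper name collect_server_uris is hypothetical. It also accepts https:// addresses, which is an assumption beyond the description above.

```ruby
require 'json'
require 'uri'

# Walks the parsed bookmark tree and collects "scheme://host" for every
# HTTP(S) bookmark. Ports and paths are dropped; duplicates are removed.
def collect_server_uris(node, found = [])
  case node
  when Hash
    uri = node['uri']
    if uri.is_a?(String) && uri.start_with?('http://', 'https://')
      begin
        parsed = URI.parse(uri)
        found << "#{parsed.scheme}://#{parsed.host}" if parsed.host
      rescue URI::InvalidURIError
        # Skip bookmarks whose address Ruby cannot parse.
      end
    end
    node.each_value { |value| collect_server_uris(value, found) }
  when Array
    node.each { |value| collect_server_uris(value, found) }
  end
  found.uniq
end
```

With a backup file this could be called as collect_server_uris(JSON.parse(File.read('bookmarks-2012-03-31.json'))).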
fetch_uris
sends the requests to the servers behind the URIs and stores the responses in a hash.
Redirects are followed.
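The following sketch illustrates the idea of a header request that follows redirects. It deliberately swaps Curb for Net::HTTP from the standard library so that it runs without extra gems; it is not how the original script does it. The names fetch_headers, HEAD_REQUEST, requester and max_redirects are hypothetical, and the timeouts are arbitrary.

```ruby
require 'net/http'
require 'uri'

# Default requester: one HEAD request, returns [status_code, headers_hash]
# with lowercase header names.
HEAD_REQUEST = lambda do |uri|
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: 5, read_timeout: 5) do |http|
    http.head(uri.request_uri)
  end
  [response.code.to_i, response.to_hash.transform_values(&:first)]
end

# Returns the headers of the final response, or nil if the request fails
# or the redirect limit is exceeded.
def fetch_headers(url, requester: HEAD_REQUEST, max_redirects: 5)
  uri = URI.parse(url)
  max_redirects.downto(0) do
    status, headers = requester.call(uri)
    location = headers['location']
    return headers unless (300..399).cover?(status) && location
    uri = URI.join(uri.to_s, location) # also resolves relative redirects
  end
  nil
rescue StandardError
  nil
end
```

The requester parameter only exists so that the redirect logic can be tried out without network access.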
filter_headers
keeps only the header of each response, so the body is discarded. This is not really
memory-efficient, but I could not think of a better way.
get_server_types
extracts the actual server names from the headers and uses filter_version to do so.
filter_version
strips the version numbers from the names. I noticed that they always follow
a /, so the names have this scheme: Servername/Version_and_other_information.
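A small sketch of how the server name could be pulled out of a raw header string, assuming the header arrives as plain text with one field per line. The helper server_header is hypothetical, and this filter_version is a guess at the behaviour described above, not the original implementation. Lowercasing is an assumption based on the lowercase names in the output.

```ruby
# Returns the value of the last "Server:" line in a raw header string,
# or nil if there is none. The last one is used because followed
# redirects can leave several header blocks in one string.
def server_header(raw_headers)
  line = raw_headers.to_s.lines.reverse_each.find { |l| l =~ /\Aserver:/i }
  line && line.split(':', 2).last.strip
end

# Drops everything from the first "/" on and normalises the name.
def filter_version(server_name)
  server_name.to_s.split('/', 2).first.to_s.strip.downcase
end
```

For example, filter_version('Apache/2.2.22 (Debian)') returns "apache", and a name without a slash such as "nginx" is returned unchanged.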
add_server
adds a new server type with a count of 1 if it is not yet in the hash;
otherwise the count for that type is incremented (+1).
reverse_sorted
sorts the server counts in descending order, so that the most-used server is at the top of
the list.
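The counting and sorting can be sketched in a few lines. The signatures below are assumptions; the original methods may take different arguments.

```ruby
# Adds the type with a count of 1, or increments an existing count.
def add_server(counts, type)
  counts[type] = counts.fetch(type, 0) + 1
  counts
end

# Returns [type, count] pairs, highest count first.
def reverse_sorted(counts)
  counts.sort_by { |_type, count| -count }
end

counts = {}
%w[apache nginx apache].each { |type| add_server(counts, type) }
reverse_sorted(counts).each { |type, count| puts "#{count} : #{type}" }
# prints:
# 2 : apache
# 1 : nginx
```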
Output
Output (get_servers_output.log)
Get headers of 969 server. Please wait...
674 : apache
123 : nginx
52 : microsoft-iis
18 : nv
16 : lighttpd
15 : apache-coyote
6 : cloudflare-nginx
6 : gse
5 : zope
5 : litespeed
4 : httpd
4 : google frontend
4 : gws
3 : server.archlinux.de
3 : server
3 : ibm_http_server
2 : nginx
2 : codesite_static_content
2 : yts
2 : cherrypy
2 : gunicorn
[...]
Number of URIs: 969
Number of servers: 979
Number of no information: 0 (0.0%)
./get_servers.rb bookmarks-2012-03-17.json 6,71s user 2,57s system 0% cpu 23:37,76 total
Note: Since I am very interested in open source, my bookmarks may skew the numbers. Accordingly,
Apache shows up far more often than Microsoft IIS.
Unfortunately, the script took a long 23 minutes for this number of servers (see the last line).
As you can see in the function fetch_uris, I commented out a few lines there. If Ruby interprets
these lines again, Curb runs the requests in parallel. The runtime then dropped to roughly
one minute. Unfortunately, many empty responses ended up in the list: for this number of servers,
700 requests came back with no information. As the number of parallel requests increased, the number of empty
responses grew exponentially.
I followed this example from the gem.
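One way to experiment with the trade-off between runtime and empty responses is to cap the number of simultaneous requests. The sketch below uses plain Ruby threads and a queue instead of Curb's parallel interface, so it is a different technique from the commented-out lines, and there is no claim that it avoids the empty responses. The name fetch_in_pool and the workers parameter are hypothetical.

```ruby
# Runs the given block for every URI using at most `workers` threads
# and returns a hash of uri => block result.
def fetch_in_pool(uris, workers: 10, &fetcher)
  queue = Queue.new
  uris.each { |uri| queue << uri }
  results = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        uri = begin
          queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        value = fetcher.call(uri)
        lock.synchronize { results[uri] = value }
      end
    end
  end

  threads.each(&:join)
  results
end
```

The block would contain the actual header request; varying workers would show how the number of empty responses changes with the degree of parallelism.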
I look forward to suggestions for improvement!
What does the server distribution look like for you?