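Yesterday I looked into subscriber counts and stock counts for Qiita Advent Calendars, and while I was at it I also collected Hatena Bookmark counts. It's a bit early, but I put together a ranking.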
sucrose.hatenablog.com
Qiita
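Somewhat surprisingly (?), corporate Advent Calendars take many of the top spots: six of the top ten belong to companies (freee, pixiv, gumi, Yahoo! JAPAN, and Dwango's two calendars).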
Rank | Calendar | Hatena Bookmarks |
---|---|---|
1 | システムエンジニア (Systems Engineer) | 1639 |
2 | Vim | 1500 |
3 | 第2のドワンゴ (Dwango, second calendar) | 1432 |
4 | freee Engineers | 1384 |
5 | ピクシブ株式会社 (pixiv Inc.) | 1367 |
6 | プログラミング大好きベーシック ("I love programming" basics) | 1284 |
7 | ドワンゴ (Dwango) | 1195 |
8 | gumi | 1129 |
9 | Yahoo! JAPAN Tech | 922 |
10 | Go その2 (Go, part 2) | 867 |
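The ranking is just a list of `(count, name, title)` tuples sorted in descending order and numbered with `enumerate`. Here is a minimal sketch of that step, using three rows of the table as sample data. The helper `format_ranking`, the placeholder slugs, and the Markdown output format are assumptions for illustration; the original script prints Hatena notation instead.

```python
def format_ranking(result, limit=10):
    """Render (count, slug, title) tuples as a ranking table, most bookmarks first.

    format_ranking is a hypothetical helper, not part of the original script.
    """
    # Tuples compare element by element, so reverse=True sorts by count first
    rows = sorted(result, reverse=True)[:limit]
    lines = ['Rank | Calendar | Hatena Bookmarks |', '---|---|---|']
    for rank, (count, _slug, title) in enumerate(rows, 1):
        lines.append('{} | {} | {} |'.format(rank, title, count))
    return '\n'.join(lines)


# Sample rows taken from the table; the slugs are placeholders, not real Qiita slugs
sample = [
    (1500, 'slug-b', 'Vim'),
    (1639, 'slug-a', 'システムエンジニア'),
    (1384, 'slug-c', 'freee Engineers'),
]
print(format_ranking(sample))
```

Sorting the whole tuple keeps the code short. When two calendars have the same count, the slug string breaks the tie.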
Qiita
```python
import time

import pyquery
import requests

BASE_URL = 'http://qiita.com/advent-calendar/{}/'


def getCalendarList(year, page):
    """Return {(slug, title)} for one page of the calendar index."""
    calendar_list = pyquery.PyQuery(url=(BASE_URL + 'calendars?page={}').format(year, page))
    prefix = '/advent-calendar/{}/'.format(year)
    calendars = set()
    for elm in calendar_list.find('.adventCalendarList_calendarTitle > a'):
        a = pyquery.PyQuery(elm)
        href = a.attr('href')  # e.g. '/advent-calendar/2015/<slug>'
        calendars.add((href[len(prefix):], a.text()))
    return calendars


def getArticles(year, name):
    """Return the set of article URLs linked from one calendar."""
    calendar = pyquery.PyQuery(url=BASE_URL.format(year) + name)
    articles = set()
    for elm in calendar.find('.adventCalendarItem_entry > a'):
        articles.add(pyquery.PyQuery(elm).attr('href'))
    return articles


def getHatenaBookmarkCount(urls):
    """Return {url: bookmark count}; at most 50 URLs per request."""
    assert len(urls) <= 50
    # requests encodes the list as repeated ?url=...&url=... parameters
    return requests.get('http://api.b.st-hatena.com/entry.counts',
                        params={'url': urls}).json()


if __name__ == '__main__':
    year = 2015
    calendars = set()
    for page in range(1, 20):  # index pages 1-19
        calendars |= getCalendarList(year, page)
        time.sleep(1)  # wait between requests

    result = []
    for name, title in calendars:
        # the calendar page itself plus all of its articles
        urls = [BASE_URL.format(year) + name] + list(getArticles(year, name))
        hatebu_count = getHatenaBookmarkCount(urls)
        result.append((sum(hatebu_count.values()), name, title))
        time.sleep(1)

    result.sort(reverse=True)  # most bookmarks first
    # Hatena-notation table; '|*' marks a header cell
    print('|*Rank|*Calendar|*Hatena Bookmarks|')
    for i, (count, name, title) in enumerate(result[:100], 1):
        print('|{0}|<a href="{1}{2}">{3}</a>|{4}|'.format(i, BASE_URL.format(year), name, title, count))
```
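`getHatenaBookmarkCount` asserts that each request sends at most 50 URLs. That is enough for one calendar page plus its articles. To query more URLs at once, for example several calendars together, one approach is to split the list into chunks of 50 and merge the returned `{url: count}` dictionaries. The sketch below does that. `chunked` and `get_counts_batched` are hypothetical helpers. The fetch function is passed in so the batching can be tried without network access; in the real script it would be `getHatenaBookmarkCount`.

```python
def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_counts_batched(urls, fetch, batch_size=50):
    """Call fetch() on batches of at most batch_size URLs and merge the url -> count dicts.

    Hypothetical helper; fetch is expected to behave like getHatenaBookmarkCount.
    """
    counts = {}
    for batch in chunked(list(urls), batch_size):
        counts.update(fetch(batch))
    return counts


# Fake fetch standing in for the real API call (every URL gets 1 bookmark)
def fake_fetch(batch):
    assert len(batch) <= 50  # same limit the original script asserts
    return {url: 1 for url in batch}


urls = ['http://example.com/{}'.format(i) for i in range(120)]
counts = get_counts_batched(urls, fake_fetch)
print(len(counts), sum(counts.values()))
```

Against the real API you would still want a `time.sleep(1)` between batches, as the original loop does.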
Adventar
This one is still a bit of a work in progress.
```python
import time

import pyquery
import requests


def getCalendarList(year):
    """Return the absolute URLs of all Adventar calendars for the given year."""
    calendar_list = pyquery.PyQuery(url='http://www.adventar.org/calendars?year={}'.format(year))
    urls = set()
    for elm in calendar_list.find('.mod-calendarList-title > a'):
        a = pyquery.PyQuery(elm)
        urls.add('http://www.adventar.org' + a.attr('href'))
    return urls


def getArticles(url):
    """Return the article URLs registered on one calendar page."""
    calendar = pyquery.PyQuery(url=url)
    articles = set()
    for elm in calendar.find('.mod-entryList-url > a'):
        a = pyquery.PyQuery(elm)
        if 'http' in a.text():  # keep only entries whose link text looks like a URL
            articles.add(a.attr('href'))
    return articles


def getHatenaBookmarkCount(urls):
    """Return {url: bookmark count}; at most 50 URLs per request."""
    assert len(urls) <= 50
    return requests.get('http://api.b.st-hatena.com/entry.counts',
                        params={'url': urls}).json()


if __name__ == '__main__':
    calendars = getCalendarList(2015)
    time.sleep(1)
    result = []
    for url in sorted(calendars):
        urls = [url] + list(getArticles(url))
        total = sum(getHatenaBookmarkCount(urls).values())
        print(url, total)  # progress output
        result.append((total, url))
        time.sleep(1)
    result.sort(reverse=True)  # most bookmarks first
    print(result[:30])
```
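The Adventar script keeps an entry only if its link text contains the substring `'http'`. That test is loose. A slightly stricter alternative is to parse the `href` itself with `urllib.parse.urlparse` and accept only absolute http/https URLs that have a host. The sketch below assumes that only such URLs should count as articles. The helper name `is_article_url` is hypothetical.

```python
from urllib.parse import urlparse


def is_article_url(href):
    """Return True if href is an absolute http(s) URL with a host part.

    Hypothetical stricter replacement for the `'http' in a.text()` check.
    """
    if not href:
        return False
    parsed = urlparse(href)
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)


candidates = [
    'http://example.com/entry/1',
    'https://example.org/post',
    '/users/123',  # relative link, rejected
    'not a url',   # plain text, rejected
    None,          # missing href, rejected
]
print([c for c in candidates if is_article_url(c)])
```

In `getArticles` this would replace the text check with `if is_article_url(a.attr('href')):`.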