CrazyAirhead

A crazy fool, and a fool gone crazy: only a fool can persevere, and only the mad can truly focus!

Basic OLAP Concepts

On-Line Analytical Processing (OLAP)

The concept of on-line analytical processing was first proposed in 1993 by Dr. E. F. (Edgar) Codd, the father of the relational database. It is a technology for organizing large business databases and supporting business intelligence. An OLAP database is divided into one or more cubes; each cube is organized and designed by a cube administrator to fit the way users retrieve and analyze data, making it easier to create and use the pivot tables and pivot charts they need.

Dimension

A particular perspective from which people observe data: a class of attributes considered when analyzing a problem. A set of such attributes forms a dimension (a time dimension, a geography dimension, and so on).

Level

A given perspective on the data (i.e. a dimension) can be described at different levels of detail (for the time dimension: day, month, quarter, year).

Member

A value a dimension can take; it describes a data item's position within that dimension ("a certain day of a certain month of a certain year" describes a position on the time dimension).

Measure

A value in the multidimensional array (January 2000, Shanghai, laptops, $100,000).

Metric (supplementary)

A measurable attribute, usually some kind of value, such as cost or hospital admissions.

Typical Operations

The basic multidimensional analysis operations in OLAP are drilling (drill-up and drill-down), slice and dice, and pivot (rotation).

Drilling

Changes the level of a dimension, i.e. the granularity of the analysis. It includes drill-down and drill-up (roll-up). Drill-up summarizes low-level detail data into higher-level aggregates along a dimension, or reduces the number of dimensions; drill-down is the reverse, moving from summary data down into detail data, or adding new dimensions.

Slice and Dice

After fixing values on some of the dimensions, we look at how the measures are distributed over the remaining dimensions. If exactly two dimensions remain, it is a slice; with three or more, it is a dice.

Pivot

Changes the orientation of the dimensions, i.e. rearranges how they are laid out in the table (for example, swapping rows and columns).
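
To make these operations concrete, here is a minimal sketch (my own illustration, not part of the original definitions) of a drill-down combined with a slice, expressed as an Elasticsearch aggregation via the low-level Java REST client. The index name sales and the field names order_date, city, product, and amount are assumptions for the example; the interval parameter follows the ES 6.x date_histogram syntax.

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class OlapOpsDemo {
    public static void main(String[] args) throws Exception {
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
        Request request = new Request("GET", "/sales/_search");
        // Slice: fix the geography dimension to one member (city = Shanghai).
        // Drill-down: descend the time hierarchy to quarters, then group by product.
        request.setJsonEntity("{"
            + "\"size\":0,"
            + "\"query\":{\"bool\":{\"filter\":[{\"term\":{\"city\":\"Shanghai\"}}]}},"
            + "\"aggs\":{\"by_quarter\":{"
            +   "\"date_histogram\":{\"field\":\"order_date\",\"interval\":\"quarter\"},"
            +   "\"aggs\":{\"by_product\":{"
            +     "\"terms\":{\"field\":\"product\"},"
            +     "\"aggs\":{\"total_amount\":{\"sum\":{\"field\":\"amount\"}}}}}}}"
            + "}");
        Response response = client.performRequest(request);
        System.out.println(EntityUtils.toString(response.getEntity()));
        client.close();
    }
}

A pivot then just swaps which aggregation nests inside which, and a roll-up aggregates at a coarser level (year instead of quarter).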

ElasticSearch

Apache Lucene™ is arguably the most advanced, high-performance, full-featured search engine library in existence, open source or proprietary. But Lucene is complex. Elasticsearch is an open-source search engine written in Java. Internally it uses Lucene for indexing and search, but its goal is to make full-text search simple, hiding Lucene's complexity behind a simple, coherent RESTful API.

However, Elasticsearch is more than Lucene, and more than just a full-text search engine. It can accurately be described as:

  • A distributed real-time document store where every field can be indexed and searched
  • A distributed real-time analytics search engine
  • Capable of scaling to hundreds of server nodes and petabytes of structured or unstructured data

Elasticsearch packages all of this into a single service that your programs can talk to through a simple RESTful API, whether from your favorite programming language or straight from the command line.

Elasticsearch is a real-time distributed search and analytics engine that lets you explore your data at a speed and scale never possible before. It is used for full-text search, structured search, analytics, and combinations of all three:

  • Wikipedia uses Elasticsearch for full-text search with highlighted snippets, plus search-as-you-type and did-you-mean suggestions.
  • The Guardian uses Elasticsearch to combine social-network data with visitor logs, giving its editors real-time feedback on how the public is responding to new articles.
  • Stack Overflow blends geolocation queries into full-text search and uses a more-like-this interface to find related questions and answers.
  • GitHub uses Elasticsearch to query 130 billion lines of code.

ES Documents and Cubes

ES Storage Structure

ES documents are represented as JSON. A document is the basic unit that can be indexed, and indexing a document requires specifying its type. ES supports a range of field data types; in particular its support for array, object, and nested types lets ES store fairly complex documents.

Multidimensional OLAP (MOLAP)

MOLAP physically stores the multidimensional data used in OLAP analysis as multidimensional arrays, forming a "cube" structure. Dimension attribute values are mapped to array subscripts or subscript ranges, and the aggregated data is stored as values in the array cells. Because MOLAP adopts a new storage structure implemented at the physical layer, it is also called physical OLAP; ROLAP, by contrast, is implemented mainly through software tools or middleware while the physical layer keeps a relational database's storage structure, so it is called virtual OLAP.

As you can see, ES documents built on complex datatypes are conceptually very close to MOLAP. With Elasticsearch's aggregation capabilities, we can implement an OLAP framework relatively simply.

Solution

Implementing the whole OLAP framework involves the following stages:

  1. Data modeling
  2. Data ETL
  3. Data analysis
  4. Data presentation

The overall idea is as follows:
(diagram: OLAP framework)

Data Modeling

Data modeling is the process of building the cube. To make effective use of ES indexes, deeply nested JSON is discouraged, so the dimensional model uses a star schema.

Normally data modeling comes first and ETL follows: you need to know the structure of the source data, i.e. its metadata, before you can work out the data model. For ES, modeling amounts to creating the Index/Type.
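
For illustration only (field names assumed, ES 6.x Index/Type mapping syntax), a star-schema fact flattened into a single document type could be mapped with keyword fields for the dimensions and numeric fields for the measures:

PUT /sales
{
  "mappings": {
    "sale": {
      "properties": {
        "order_date": { "type": "date" },
        "city":       { "type": "keyword" },
        "product":    { "type": "keyword" },
        "amount":     { "type": "double" },
        "quantity":   { "type": "integer" }
      }
    }
  }
}

Each document is then one fact row with its dimension attributes denormalized onto it, which keeps the JSON shallow as recommended above.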

Dimension Management

Metadata Management

Metadata management maintains the metadata structure of the source data: simply put, which databases and tables the sources have, and how the tables relate to one another.

Mapping Management

Mapping management specifies the scope of the data to extract, the extraction method, and the shape of the resulting cube.

Data ETL

Data ETL, in essence, extracts data from the various sources, transforms it, and finally loads it into the dimensionally modeled tables of the data warehouse. Only once the cube has been populated is the ETL work done.

Data loading comes in two flavors: the initial load and refresh (incremental) loads. The current product involves two databases, MySQL and HBase.

Initial Load

A Quartz job or a MapReduce job converts the data into Kafka messages, which the data-loading component then processes.

Refresh Load

By extending mysql-binlog-connector-java and the HBase Side-Effect Processor, data can be updated incrementally in real time. In this design the Kafka message middleware uniformly forwards the different kinds of data updates; the Kafka message format needs to be well defined so the loading component can process the data with little effort.
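
The post leaves the message format open. As a purely hypothetical sketch, a unified change envelope might look like the following, so the loading component can treat MySQL binlog events and HBase side effects uniformly (all field names invented for illustration):

{
  "source": "mysql",
  "table": "orders",
  "op": "UPDATE",
  "ts": 1536212345000,
  "key": "order-1001",
  "data": { "city": "Shanghai", "product": "laptop", "amount": 100000 }
}

Here source tells the loader which extractor produced the event, op is one of INSERT/UPDATE/DELETE, ts is the event timestamp in milliseconds, key identifies the row, and data carries the changed columns.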

Data Transformation

The transformation step reshapes the extracted data to fit the target warehouse model. It is also where data-quality work happens, a part also known as data cleaning. Transformation algorithms can be implemented step by step and brought online incrementally.

Statistical Analysis

Statistical analysis means computing metrics from the fact table over arbitrary combinations of dimensions: filtering, grouping, and aggregating. Besides the usual SUM, COUNT, AVG, MAX, and MIN, one important aggregate is COUNT(DISTINCT). Elasticsearch was built for real-time search, so filtering is a given, and it now also supports a rich set of aggregations, including pipeline aggregations (roughly the role of SQL subqueries).
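
COUNT(DISTINCT) maps to the cardinality aggregation, which is approximate (HyperLogLog++ based). A minimal request body, with field names assumed as before and buyer_id invented for the example:

GET /sales/_search
{
  "size": 0,
  "query": { "bool": { "filter": [ { "term": { "product": "laptop" } } ] } },
  "aggs": {
    "by_city": {
      "terms": { "field": "city" },
      "aggs": { "distinct_buyers": { "cardinality": { "field": "buyer_id" } } }
    }
  }
}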

Statistical analysis here is the process of generating ES query DSL from the template configuration and the conditions chosen on the page, executing it, and returning the result data. On the surface this is not hard, but to guarantee the generated DSL is correct, the DSL generation engine is a critical component; extra configuration in template management, such as row dimensions, column dimensions, and dimension levels, assists the generation.

Data Presentation

With dynamically configured templates, ECharts and Vue can deliver dynamic layouts and all kinds of dashboards; Kibana is also worth studying for this part.

Summary

The ES document structure fits the MOLAP concept, its aggregations can implement the various OLAP operations, and its Lucene core and distributed architecture keep performance under control. By defining some conventions and building a set of supporting features, a general-purpose real-time OLAP framework can be put together fairly easily. This post lays out the idea of building an OLAP framework on ES rather than a complete construction plan.

Background

I previously set up a personal blog with GitHub Pages and Hexo (the setup is described here). Now I want a separate account to manage a blog with different content. With the earlier experience the setup went smoothly; the main problem I hit was publishing the code. The previous GitHub account authenticated through the Windows Credential Manager, and in principle that can handle multiple GitHub accounts, as shown here:

(screenshot: Windows Credential Manager)

But both blogs ran into problems when publishing code.

First Attempt

Put the username and password into Hexo's _config.yml deploy setting, like this:

https://enderjo:xyz@github.com/enderjo/enderjo.github.io

This configuration did fix publishing, but it exposed the password. I recalled that GitHub supports SSH, which I had never used. A first try hit the same two-account problem: GitHub does not allow the same public key on different accounts and reports that the key is already in use. A former colleague then pointed me to SSH's config file.

Final Solution

Based on that hint, my first version of the config looked like this:

Host github:enderjo
    Hostname github.com
    User git
    PreferredAuthentications publickey
    IdentityFile ~/.ssh/github-enderjo/id_rsa

On Windows the file lives under the user directory %USERPROFILE%. The ssh test then succeeded, as shown below:

$ ssh -T github.com:enderjo
Hi enderjo! You've successfully authenticated, but GitHub does not provide shell access.

But testing git clone failed:

$ git clone git@github.com:enderjo/enderjo.github.io.git
Cloning into 'enderjo.github.io'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Following Error: Permission denied (publickey), adding the key via ssh-add made git clone work. At this point I could tentatively conclude that git clone was not picking up the config entry.
My colleague then sent over this issue:
https://gitlab.com/gitlab-org/gitlab-ce/issues/45593
It mentions that you can trace what git clone actually executes with:

GIT_TRACE=1 GIT_SSH_COMMAND="ssh -vvv" git clone git@gitlab.example.com:my-group/my-project.git

Part of the traced output:

$ GIT_TRACE=1 GIT_SSH_COMMAND="ssh -vvv" git clone git@github.com:enderjo/enderjo.github.io.git
14:03:51.164492 exec-cmd.c:236 trace: resolved executable dir: C:/Program Files/Git/mingw64/bin
14:03:51.165492 git.c:415 trace: built-in: git clone git@github.com:enderjo/enderjo.github.io.git
Cloning into 'enderjo.github.io'...
14:03:51.225463 run-command.c:637 trace: run_command: unset GIT_DIR; 'ssh -vvv' git@github.com 'git-upload-pack '\''enderjo/enderjo.github.io.git'\'''

Here we can see that the address actually used is not the same as the one I tested with:

$ ssh -T git@github.com:enderjo
$ ssh -vvv git@github.com

The ssh command matches the host name you type against the Host entries in the config file, but in git's scp-style URLs the colon separates the host from the repository path, so a Host alias containing a colon can never match.

I then found this article: the Host in the config file is simply an alias.

Adjusting the config

Host gh-enderjo
    Hostname github.com
    User git
    PreferredAuthentications publickey
    IdentityFile ~/.ssh/github-enderjo/id_rsa

Updating the git clone command

$ git clone gh-enderjo:enderjo/enderjo.github.io.git
Cloning into 'enderjo.github.io'...
remote: Counting objects: 4822, done.
remote: Compressing objects: 100% (386/386), done.
remote: Total 4822 (delta 873), reused 2292 (delta 846), pack-reused 2496
Receiving objects: 100% (4822/4822), 5.20 MiB | 215.00 KiB/s, done.
Resolving deltas: 100% (1686/1686), done.

The test passed, and after updating Hexo's _config.yml the blog published normally again.

Adding the Mac Configuration

Since Windows was already set up, I copied the ssh config from Windows over to the Mac (~/.ssh).
Testing then produced this complaint:

Permissions 0755 for '/Users/airhead/.ssh/github-enderjo/id_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "/Users/airhead/.ssh/github-enderjo/id_rsa": bad permissions
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights

The private key permissions need to be tightened to 400 (600 also works); mind the path:

chmod 400 /Users/airhead/.ssh/github-enderjo/id_rsa

Summary

SSH's config feature handles multiple Git accounts nicely, but note that the Git repository URLs must be changed to use the alias.

I have used Git for a long time, yet there is clearly still a lot about Git and SSH I don't know.

References

Error: Permission denied (publickey)
https://gitlab.com/gitlab-org/gitlab-ce/issues/45593
ssh-config配置
SSH Config 那些你所知道和不知道的事

Translator's note: my ability is limited and my understanding may be off, so this translation may contain errors. If you can, please read the original.

We have recently completed a milestone where we were able to drop jQuery as a dependency of the frontend code for GitHub.com. This marks the end of a gradual, years-long transition of increasingly decoupling from jQuery until we were able to completely remove the library. In this post, we will explain a bit of history of how we started depending on jQuery in the first place, how we realized when it was no longer needed, and point out that—instead of replacing it with another library or framework—we were able to achieve everything that we needed using standard browser APIs.

Why jQuery made sense early on

GitHub.com pulled in jQuery 1.2.1 as a dependency in late 2007. For a bit of context, that was a year before Google released the first version of their Chrome browser. There was no standard way to query DOM elements by a CSS selector, no standard way to animate visual styles of an element, and the [XMLHttpRequest interface](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest) pioneered by Internet Explorer was, like many other APIs, inconsistent between browsers.

jQuery made it simple to manipulate the DOM, define animations, and make “AJAX” requests— basically, it enabled web developers to create more modern, dynamic experiences that stood out from the rest. Most importantly of all, the JavaScript features built in one browser with jQuery would generally work in other browsers, too. In those early days of GitHub when most of its features were still getting fleshed out, this allowed the small development team to prototype rapidly and get new features out the door without having to adjust code specifically for each web browser.

The simple interface of jQuery also served as a blueprint to craft extension libraries that would later serve as building blocks for the rest of GitHub.com frontend: pjax and facebox.

We will always be thankful to John Resig and the jQuery contributors for creating and maintaining such a useful and, for the time, essential library.

Web standards in the later years

Over the years, GitHub grew into a company with hundreds of engineers and a dedicated team gradually formed to take responsibility for the size and quality of JavaScript code that we serve to web browsers. One of the things that we’re constantly on the lookout for is technical debt, and sometimes technical debt grows around dependencies that once provided value, but whose value dropped over time.

When it came to jQuery, we compared it against the rapid evolution of supported web standards in modern browsers and realized:

  • The $(selector) pattern can easily be replaced with querySelectorAll();
  • CSS classname switching can now be achieved using Element.classList;
  • CSS now supports defining visual animations in stylesheets rather than in JavaScript;
  • $.ajax requests can be performed using the Fetch Standard;
  • The addEventListener() interface is stable enough for cross-platform use;
  • We could easily encapsulate the event delegation pattern with a lightweight library;
  • Some syntactic sugar that jQuery provides has become redundant with the evolution of the JavaScript language.

Furthermore, the chaining syntax didn’t satisfy how we wanted to write code going forward. For example:

$('.js-widget')
  .addClass('is-loading')
  .show()
This syntax is simple to write, but to our standards, doesn’t communicate intent really well. Did the author expect one or more js-widget elements on this page? Also, if we update our page markup and accidentally leave out the js-widget classname, will an exception in the browser inform us that something went wrong? By default, jQuery silently skips the whole expression when nothing matched the initial selector; but to us, such behavior was a bug rather than a feature.

Finally, we wanted to start annotating types with Flow to perform static type checking at build time, and we concluded that the chaining syntax doesn’t lend itself well to static analysis, since almost every result of a jQuery method call is of the same type. We chose Flow over alternatives because, at the time, features such as @flow weak mode allowed us to progressively and efficiently start applying types to a codebase which was largely untyped.

All in all, decoupling from jQuery would mean that we could rely on web standards more, have MDN web docs be de-facto default documentation for our frontend developers, maintain more resilient code in the future, and eventually drop a 30 kB dependency from our packaged bundles, speeding up page load times and JavaScript execution times.

Incremental decoupling
Even with an end goal in sight, we knew that it wouldn’t be feasible to just allocate all resources we had to rewriting everything from jQuery to vanilla JS. If anything, such a rushed endeavor would likely lead to many regressions in site functionality that we would later have to weed out. Instead, we:

Set up metrics that tracked ratio of jQuery calls used per overall line of code and monitored that graph over time to make sure that it’s either staying constant or going down, not up.

(graph: jQuery usage going down over time)

We discouraged importing jQuery in any new code. To facilitate that using automation, we created eslint-plugin-jquery which would fail CI checks if anyone tried to use jQuery features, for example $.ajax.

There were now plenty of violations of eslint rules in old code, all of which we’ve annotated with specific eslint-disable rules in code comments. To the reader of that code, those comments would serve as a clear signal that this code doesn’t represent our current coding practices.

We created a pull request bot that would leave a review comment on a pull request pinging our team whenever somebody tried to add a new eslint-disable rule. This way we would get involved in code review early and suggest alternatives.

A lot of old code had explicit coupling to external interfaces of pjax and facebox jQuery plugins, so we’ve kept their interfaces relatively the same while we’ve internally replaced their implementation with vanilla JS. Having static type checking helped us have greater confidence around those refactorings.

Plenty of old code interfaced with rails-behaviors, our adapter for the Ruby on Rails approach to “unobtrusive” JS, in a way that they would attach an AJAX lifecycle handler to certain forms:

// LEGACY APPROACH
$(document).on('ajaxSuccess', 'form.js-widget', function(event, xhr, settings, data) {
  // insert response data somewhere into the DOM
})
Instead of having to rewrite all of those call sites at once to the new approach, we’ve opted to trigger fake ajax* lifecycle events and keep these forms submitting their contents asynchronously as before; only this time fetch() was used internally.

We maintained a custom build of jQuery and whenever we’ve identified that we’re not using a certain module of jQuery anymore, we would remove it from the custom build and ship a slimmer version. For instance, after we have removed the final usage of jQuery-specific CSS pseudo-selectors such as :visible or :checkbox, we were able to remove the Sizzle module; and when the last $.ajax call was replaced with fetch(), we were able to remove the AJAX module. This served a dual purpose: speeding up JavaScript execution times while at the same time ensuring that no new code is created that would try using the removed functionality.

We kept dropping support for old Internet Explorer versions as soon as it would be feasible to, as informed by our site analytics. Whenever use of a certain IE version dropped below a certain threshold, we would stop serving JavaScript to it and focus on testing against and supporting more modern browsers. Dropping support for IE 8–9 early on allowed us to adopt many native browser features that would otherwise be hard to polyfill.

As part of our refined approach to building frontend features on GitHub.com, we focused on getting away with regular HTML foundation as much as we could, and only adding JavaScript behaviors as progressive enhancement. As a result, even those web forms and other UI elements that were enhanced using JS would usually also work with JavaScript disabled in the browser. In some cases, we were able to delete certain legacy behaviors altogether instead of having to rewrite them in vanilla JS.

With these and similar efforts combined over the years, we were able to gradually reduce our dependence on jQuery until there was not a single line of code referencing it anymore.

Custom Elements
One technology that has been making waves in the recent years is Custom Elements: a component library native to the browser, which means that there are no additional bytes of a framework for the user to download, parse and compile.

We had created a few Custom Elements based on the v0 specification since 2014. However, as standards were still in flux back then, we did not invest as much. It was not until 2017 when the Web Components v1 spec was released and implemented in both Chrome and Safari that we began to adopt Custom Elements on a wider scale.

During the jQuery migration, we looked for patterns that would be suitable for extraction as custom elements. For example, we converted our facebox usage for displaying modal dialogs to the <details-dialog> element.

Our general philosophy of striving for progressive enhancement extends to custom elements as well. This means that we keep as much of the content in markup as possible and only add behaviors on top of that. For example, <local-time> shows the raw timestamp by default and gets upgraded to translate the time to the local timezone, while <details-dialog>, when nested in the <details> element, is interactive even without JavaScript, but gets upgraded with accessibility enhancements.

Here is an example of how a custom element could be implemented:

// The local-time element displays time in the user's current timezone
// and locale.
//
// Example:
//   <local-time datetime="2018-09-06T08:30:00Z">Sep 6, 2018</local-time>
//
class LocalTimeElement extends HTMLElement {
  static get observedAttributes() {
    return ['datetime']
  }

  attributeChangedCallback(attrName, oldValue, newValue) {
    if (attrName === 'datetime') {
      const date = new Date(newValue)
      this.textContent = date.toLocaleString()
    }
  }
}

if (!window.customElements.get('local-time')) {
  window.LocalTimeElement = LocalTimeElement
  window.customElements.define('local-time', LocalTimeElement)
}

One aspect of Web Components that we’re looking forward to adopting is Shadow DOM. The powerful nature of Shadow DOM has the potential to unlock a lot of possibilities for the web, but that also makes it harder to polyfill. Because polyfilling it today incurs a performance penalty even for code that manipulates parts of the DOM unrelated to web components, it is unfeasible for us to start using it in production.

Polyfills
These are the polyfills that helped us transition to using standard browser features. We try to serve most of these polyfills only when absolutely necessary, i.e. to outdated browsers as part of a separate “compatibility” JavaScript bundle.

Background

Our company's IM product uses Getui (个推) for push messaging, previously integrated through HBuilder's online packaging. This time we added a video feature (video over WebRTC) as an HBuilder plugin with offline packaging, and its incoming-call message needs Getui's pass-through push (the main consideration being reuse, to cut development effort).
Following the official Getui guide "Android Studio快速集成(推荐)", the Getui SDK integrates quickly.

Implementation and Problems

Business Logic

  1. The caller initiates a video invitation.
  2. On receiving the request, the server pushes a Getui pass-through message to the callee.
  3. The callee gets the incoming-call page and taps accept to start the video session, or declines the invitation.

Implementation

A class extending com.igexin.sdk.GTIntentService receives the CID and pass-through messages. The relevant event callbacks:
public class DemoIntentService extends GTIntentService {
    @Override
    public void onReceiveMessageData(Context context, GTTransmitMessage msg) {
        // other business logic omitted
        Intent intent = new Intent(getBaseContext(), XxxActivity.class);
        getApplication().startActivity(intent);
    }

    @Override
    public void onReceiveClientId(Context context, String clientid) {
        Log.e(TAG, "onReceiveClientId -> " + "clientid = " + clientid);
        // registering the client id against the user info omitted
    }
}

Problem

Normally the specified XxxActivity should pop up at this point, yet it did not. After some searching and testing, it turned out Intent.FLAG_ACTIVITY_NEW_TASK must be added:
@Override
public void onReceiveMessageData(Context context, GTTransmitMessage msg) {
    // other business logic omitted
    Intent intent = new Intent(getBaseContext(), XxxActivity.class);
    intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
    getApplication().startActivity(intent);
}

FLAG_ACTIVITY_NEW_TASK

int FLAG_ACTIVITY_NEW_TASK
If set, this activity will become the start of a new task on this history stack.

With FLAG_ACTIVITY_NEW_TASK set, the page comes to the front. When starting an activity from a non-Activity context (here the service/application context), there is no existing task to attach it to, so Android requires this flag. Judging from the documentation, the root cause was my own gaps in Android fundamentals.

Summary

A small pitfall in Android development, recorded here for future reference. I hope it helps anyone taking a similar approach.

References

个推-点击推送跳转至指定页面(透传)
Intent

Preparation

The official documentation tries to stay compatible with every version, which makes it cluttered and hard to follow, so I organized this post from my actual development work. It is based on the official guide Android平台第三方插件开发指导; where my understanding falls short, consult the official documentation.

Download the latest 5+ SDK (download link) and unzip it to any directory, referred to below as the 5+ SDK directory.

Glossary

JS Plugin Bridge

The JS-side API of the H5+ Plugin Bridge layer. Plugin callers invoke this API to trigger the corresponding methods of the Native-layer extension plugin.

Native Plugin Bridge

The Native-side API of the H5+ Plugin Bridge layer. Plugin developers implement the interface methods to provide the plugin's business logic, and call this API to return the results of native extension methods.

Native-layer extension plugin

A 5+ extension plugin implemented by the developer in the platform's native language, callable from the JS layer.

Plugin class alias (plugin alias)

Read it as "plugin class | alias". A JS-layer string that declares the correspondence between the JS layer and the Native-layer plugin class.

Architecture

The HTML5+ runtime extension mechanism has a three-layer structure: the JS layer, the PluginBridge layer, and the Native layer. Their responsibilities are:

  • JS layer: called from the Webview page; triggers Native-layer code and receives the execution result.
  • PluginBridge layer: processes JS-layer requests and dispatches them to the Native-layer extension plugin code.
  • Native layer: the plugin's platform-native code; executes the business logic and returns the result to the requesting page.

When building an extension plugin, the developer writes a matching JS API for it. The JS API issues the request from the HTML page that triggers the corresponding native extension method, and receives the result.

The plugin caller (who may also be the plugin developer) invokes the JavaScript Plugin Bridge API to call into Native-layer code and obtain the result.

In practice, depending on what each extension method needs, the developer can expose a synchronous or an asynchronous JS API, and the caller invokes it synchronously or asynchronously accordingly.

Plugin Workflow

Synchronous execution

A synchronous extension method blocks the current JS execution until the Native-layer extension method finishes.

Asynchronous execution

An asynchronous extension method does not block the current JS execution; the plugin caller registers a callback to receive the result from the Native layer, and the plugin developer calls the Native Plugin Bridge method to return the result to the requesting page.

For a working project, see the HBuilder-Integrate-AS project shipped inside the SDK, which already integrates examples of plugin development and integration.

Implementing a Plugin

Creating the plugin class

Create a class extending StandardFeature to implement a third-party extension plugin.

Packages the plugin class needs to import:

import io.dcloud.common.DHInterface.IWebview;
import io.dcloud.common.DHInterface.StandardFeature;
import io.dcloud.common.util.JSUtil;

Implementing extension methods

Plugin initialization

Override the onStart method; it is only invoked if a Service node is configured in dcloud_properties.xml.

public void onStart(Context context, Bundle bundle, String[] strings)

Extension method signature

Native-layer extension plugin methods are declared as follows:

public void methodName(IWebview webView, JSONArray array)

Parameters:

Parameter | Type      | Description
webView   | IWebview  | the webview that made the request
array     | JSONArray | arguments passed in from the JS request

Only methods matching this signature can be called from the JS layer.

Determining the execution mode

How the plugin developer returns the result determines whether the JS API runs synchronously or asynchronously. The two modes also place different requirements on the input array (JSONArray).

Synchronous methods

Returning through JSUtil.wrapJsVar makes a method synchronous: the result can be handed back to the JS layer directly with a return statement.

Signature:
String wrapJsVar(String value);

Parameters:
value | String | the value to return to the JS layer

See io.dcloud.common.util.JSUtil for handling of more return types (boolean, Number, String, JSONArray, JSONObject).

Example:
JSUtil.wrapJsVar("Html5 Plus Plugin Hello1!");

Asynchronous methods

Returning through JSUtil.execCallback makes a method asynchronous. The callback is correlated via a CallbackId; where the CallbackId sits in the input array (JSONArray) is up to the plugin developer, conventionally the first element.

Signature:
String execCallback(IWebview pWebView, String pCallbackId, String pMessage, int pStatus, boolean isJson, boolean pKeepCallback);

Parameters:
pWebView      | IWebview | the window the extension method runs in
pCallbackId   | String   | unique identifier of the callback
pMessage      | String   | the callback argument, i.e. the result
pStatus       | int      | JSUtil.OK on success, an error code otherwise
isJson        | boolean  | whether the callback argument is JSON
pKeepCallback | boolean  | whether the callback may fire more than once

Example:
JSUtil.execCallback(pWebview, cbId, (which==AlertDialog.BUTTON_POSITIVE)?"ok":"cancel", JSUtil.OK, false, false);

Complete Example

This example shows only the synchronous and asynchronous methods; for the full version see the official site or the sample project in the SDK.

package com.example.H5PlusPlugin;

import io.dcloud.common.DHInterface.IWebview;
import io.dcloud.common.DHInterface.StandardFeature;
import io.dcloud.common.util.JSUtil;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

public class PGPlugintest extends StandardFeature
{
    public void PluginTestFunction(IWebview pWebview, JSONArray array)
    {
        String CallBackID = array.optString(0);
        JSONArray newArray = new JSONArray();
        newArray.put(array.optString(1));
        newArray.put(array.optString(2));
        newArray.put(array.optString(3));
        newArray.put(array.optString(4));
        JSUtil.execCallback(pWebview, CallBackID, newArray, JSUtil.OK, false);
    }

    public String PluginTestFunctionSync(IWebview pWebview, JSONArray array)
    {
        String inValue1 = array.optString(0);
        String inValue2 = array.optString(1);
        String inValue3 = array.optString(2);
        String inValue4 = array.optString(3);
        String ReturnValue = inValue1 + "-" + inValue2 + "-" + inValue3 + "-" + inValue4;
        return JSUtil.wrapJsVar(ReturnValue);
    }
}

Calling the Plugin

Declaring the plugin class alias

When implementing the JS API, the plugin caller first defines a plugin class alias, then declares the mapping between the alias and the Native-layer plugin class in the Android project's assets\data\dcloud_properties.xml. Plugins declared under the feature node have their objects created on invocation.

<properties>
    <features>
        <feature name="plugintest" value="com.example.H5PlusPlugin.PGPlugintest"></feature>
    </features>
</properties>

If the plugin needs to be initialized at application startup, also configure a service node:

<properties>
    <services>
        <service name="plugintest" value="com.example.H5PlusPlugin.PGPlugintest"></service>
    </services>
</properties>

Implementing the calls

Synchronous calls

The plugin caller invokes the JS Plugin Bridge method window.plus.bridge.execSync(), which synchronously returns the Native plugin's result.

Signature:
void plus.bridge.execSync( String service, String action, Array<String> args );

Parameters:
service | String | plugin class alias
action  | String | name of the Native-layer plugin method to call
args    | Array  | argument list

Asynchronous calls

The plugin caller invokes the JS Plugin Bridge method plus.bridge.exec(), which tells the Native-layer plugin to run the given method; the result comes back to the JS layer through a callback.

Signature:
void plus.bridge.exec( String service, String action, Array<String> args );

Parameters:
service | String | plugin class alias
action  | String | name of the Native-layer plugin method to call
args    | Array  | argument list

To receive and handle the result, the caller uses window.plus.bridge.callbackId() to generate a callbackId and passes it in through args (Array). I don't know JS that deeply; my guess is that this works like a Map (a callback table) in which the callback functions are registered.

Usage and Sample Code

Depending on business needs and convenience, the plugin calls can be wrapped at the JS layer.

Wrapper example

This example shows only the synchronous and asynchronous methods; for the full version see the official site or the sample project in the SDK.

document.addEventListener( "plusready", function()
{
    // the declared JS "plugin alias"
    var _BARCODE = 'plugintest',
        B = window.plus.bridge;
    var plugintest =
    {
        // asynchronous method
        PluginTestFunction : function (Argus1, Argus2, Argus3, Argus4, successCallback, errorCallback )
        {
            var success = typeof successCallback !== 'function' ? null : function(args)
            {
                successCallback(args);
            },
            fail = typeof errorCallback !== 'function' ? null : function(code)
            {
                errorCallback(code);
            };
            var callbackID = B.callbackId(success, fail);
            // ask the Native-layer plugintest plugin to run "PluginTestFunction"
            return B.exec(_BARCODE, "PluginTestFunction", [callbackID, Argus1, Argus2, Argus3, Argus4]);
        },
        // synchronous method
        PluginTestFunctionSync : function (Argus1, Argus2, Argus3, Argus4)
        {
            // ask the Native-layer plugintest plugin to run "PluginTestFunctionSync" and return the result synchronously
            return B.execSync(_BARCODE, "PluginTestFunctionSync", [Argus1, Argus2, Argus3, Argus4]);
        }
    };
    window.plus.plugintest = plugintest;
}, true );

HTML usage example

This example shows only the synchronous and asynchronous methods; for the full version see the official site or the sample project in the SDK.

<!DOCTYPE HTML>
<html>
<head>
    <meta charset="utf-8"/>
    <meta name="viewport" content="initial-scale=1.0, maximum-scale=1.0, user-scalable=no"/>
    <meta name="HandheldFriendly" content="true"/>
    <meta name="MobileOptimized" content="320"/>
    <title>H5Plugin</title>
    <script type="text/javascript" src="./js/common.js"></script>
    <script type="text/javascript" src="./js/test.js"></script>
    <script type="text/javascript">
        function pluginShow() {
            plus.plugintest.PluginTestFunction("Html5", "Plus", "AsyncFunction", "MultiArgument!",
                function(result) { alert(result[0] + "_" + result[1] + "_" + result[2] + "_" + result[3]); },
                function(result) { alert(result); });
        }
        function pluginGetString() {
            alert(plus.plugintest.PluginTestFunctionSync("Html5", "Plus", "SyncFunction", "MultiArgument!"));
        }
    </script>
    <link rel="stylesheet" href="./css/common.css" type="text/css" charset="utf-8"/>
</head>
<body>
<header>
    <div class="nvbt" onclick="back();"><div class="iback"></div></div>
    <div class="nvtt">PluginTest</div>
</header>
<div id="dcontent" class="dcontent">
    <br/>
    <div class="button" onclick="pluginShow()">PluginTestFunction()</div>
    <div class="button" onclick="pluginGetString()">PluginTestFunctionSync()</div>
    <br/>
</div>
</body>
</html>

References

Android平台第三方插件开发指导
Android平台以WebView方式集成HTML5+SDK方法

Source: Private Networks

Private Networks

Allows ipfs to only connect to other peers who have a shared secret key.

State

Experimental

In Version

master, 0.4.7

How to enable

Generate a pre-shared-key using ipfs-swarm-key-gen:

go get github.com/Kubuxu/go-ipfs-swarm-key-gen/ipfs-swarm-key-gen
ipfs-swarm-key-gen > ~/.ipfs/swarm.key

To join a given private network, get the key file from someone in the network and save it to ~/.ipfs/swarm.key (If you are using a custom $IPFS_PATH, put it in there instead).

When using this feature, you will not be able to connect to the default bootstrap nodes (Since we aren’t part of your private network) so you will need to set up your own bootstrap nodes.

First, to prevent your node from even trying to connect to the default bootstrap nodes, run:

ipfs bootstrap rm --all

Translator's note: there is a bug here that may remove the current node itself as well, causing a startup error; it then has to be re-added manually.

Then add your own bootstrap peers with:

ipfs bootstrap add <multiaddr>

For example:

ipfs bootstrap add /ip4/104.236.76.40/tcp/4001/ipfs/QmSoLV4Bbm51jM9C4gDYZQ9Cy3U6aXMJDAbzgu2fzaDs64

Bootstrap nodes are no different from all other nodes in the network apart from the function they serve.

To be extra cautious, You can also set the LIBP2P_FORCE_PNET environment variable to 1 to force the usage of private networks. If no private network is configured, the daemon will fail to start.

References

https://zhuanlan.zhihu.com/p/35141862

Preparation

This post mainly follows 使用新版本5+SDK创建最简Android原生工程(Android studio) and Android离线打包. Where anything is unclear, see the official documentation.

Download the latest 5+ SDK (download link) and unzip it to any directory, referred to below as the 5+ SDK directory.

Creating the Android Project

From the welcome screen or File > New Project, choose an Android project and use the defaults, or adjust as needed. Creating the Android app itself is not the point of this post; if anything goes wrong, delete it and start over, and after one or two tries it becomes routine.
(screenshot: creating the Android project)

Configuring HBuilder

Switch the project tree to the Project view so the following paths are unambiguous.

Removing the generated sources

Delete the source code the wizard generated under the native project's java directory.

Copying the aar file

Copy SDK->libs->lib.5plus.base-release.aar from the 5+ SDK directory into the native project's app->libs directory.

Configuring build.gradle

Open the build.gradle under app.

Referencing the aar in dependencies

Add the aar reference to dependencies as below. If the compiler warns about it, change compile to implementation as prompted.

compile(name: 'lib.5plus.base-release', ext: 'aar')

Adding the aar search path

Add the aar search path at the same level as dependencies:

repositories {
    flatDir {
        dirs 'libs'
    }
}

Setting targetSdkVersion

Change targetSdkVersion to 21.

Setting multiDexEnabled

Set multiDexEnabled to false.

Configuring AndroidManifest.xml

Open the project's AndroidManifest.xml and replace the original application node with the following:

<application
    android:name="io.dcloud.application.DCloudApplication"
    android:allowClearUserData="true"
    android:icon="@drawable/icon"
    android:label="@string/app_name"
    android:largeHeap="true">
    <activity
        android:name="io.dcloud.PandoraEntry"
        android:configChanges="orientation|keyboardHidden|keyboard|navigation"
        android:label="@string/app_name"
        android:launchMode="singleTask"
        android:hardwareAccelerated="true"
        android:theme="@style/TranslucentTheme"
        android:screenOrientation="user"
        android:windowSoftInputMode="adjustResize" >
        <intent-filter>
            <action android:name="android.intent.action.MAIN" />
            <category android:name="android.intent.category.LAUNCHER" />
        </intent-filter>
    </activity>
</application>

Adding icon.png

Put the application icon, named icon.png, under the app->src->res->drawable directory; if you don't have one, take it from the HBuilder-Hello project in the 5+ SDK directory.

Configuring the assets directory

Create the assets directory (app->src->main->assets) and copy the SDK->assets->data directory from the 5+ SDK directory into it.

Configuring the apps directory

Create the apps directory (app->src->main->assets->apps) and copy the contents of HBuilder-Hello's apps directory (HBuilder-Hello->app->src->main->assets->apps) into it. Note: the app resource path is [appid]->www, where appid is the value of the id field in the app resource's manifest.json.

Configuring dcloud_control.xml

In the app's assets->data->dcloud_control.xml, change the appid attribute of the apps->app node to the id value from manifest.json. The resource directory (apps->[appid]), the manifest.json id, and the dcloud_control appid must all be identical, or the app will not start.

Build and run.

Packaging

Configuring permissions

Consult the "Feature列表.xls" document in the 5+ SDK directory to determine which extension APIs the app uses, and adjust the API permissions in AndroidManifest.xml accordingly.

Configuring the package name and version

Open AndroidManifest.xml and, in the code view, edit the attributes of the root node, as follows:
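
For example (package name and version values illustrative):

<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.myapp"
    android:versionCode="1"
    android:versionName="1.0.0">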

Here package is the app's package name, in reverse-domain format, and identifies the app. versionCode is the integer version number used by app stores for upgrade checks; it should match version -> code in manifest.json. versionName is the human-readable version string shown by the system's application manager; it should match version -> name in manifest.json.

Configuring the app name

Open res -> values -> strings.xml and change the app_name value; this is the name displayed on the home screen after installation.

Configuring the icon and splash screens

Copy the app icon (named icon.png) and the splash images, at the corresponding sizes, into the project's res -> drawable-XXX directories.

App icons

Application icons (png) for each device density:

Density | Icon size | Description
mdpi    | 48*48     | normal-density screens
ldpi    | 48*48     | large screens
hdpi    | 72*72     | high-density screens
xhdpi   | 96*96     | 720P high-density screens
xxhdpi  | 144*144   | 1080P high-density screens

Splash images

Splash images (png) for each device density:

Density | Image size | Description
mdpi    | 240*282    | normal-density screens
ldpi    | 320*442    | large screens
hdpi    | 480*762    | high-density screens
xhdpi   | 720*1242   | 720P high-density screens
xxhdpi  | 1080*1882  | 1080P high-density screens

Updating the app resources

Open the assets -> apps directory, rename the HelloH5 directory underneath to the id from your app's manifest.json (this step is critical; otherwise the app will not start), and copy all the app resources into its www directory.

Configuring app info

Open the dcloud_control.xml file under assets -> data:

The appid value is the [appid] directory under apps and decides which app runs. appver is the app version number used for upgrading app resources and must match version -> name in manifest.json exactly. version is the runtime base version (the value returned by plus.runtime.innerVersion); do not change it casually.
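
Putting the three values together, the file has roughly this shape (reconstructed from the description above; treat the exact layout as an assumption and check the SDK sample for the authoritative format):

<hbuilder version="value returned by plus.runtime.innerVersion">
    <apps>
        <app appid="YourAppId" appver="1.0.0"/>
    </apps>
</hbuilder>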

Generating the APK

For testing, Build -> Build APK(s) is enough; for production, use Build -> Generate Signed APK and fill in the requested information.

Android Studio is Google's customized build of IntelliJ IDEA Community. Downloading it requires getting around the firewall, and after installing with mostly default options the Android development environment is essentially ready.

IntelliJ IDEA contains all of Android Studio's functionality, and since our company already uses IntelliJ IDEA as its development tool, using IDEA is even more convenient. Because it does not bundle the Android SDK, configuration is somewhat more involved.

Prerequisites

  • Install the Java 8 JDK
  • Install IntelliJ IDEA 2018.2

Make sure IDEA already runs normally and can build Java projects.

Configuring IntelliJ IDEA

It is recommended to configure the Android SDK through the settings page of the new version (2018.2).

Close all projects to reach the welcome screen, open the settings via Configure > Settings (or File > Settings, Ctrl+Alt+S), then go to Appearance & Behavior > System Settings > Android SDK.

Click Edit next to Android SDK Location, select the SDK components you need, and wait for the download to finish.
(screenshot: installing the Android SDK)

Creating an Android Project

• On the welcome screen choose Create New Project, or use File > New Project.
• Pick Android as the project type; you can accept the defaults all the way, or set the application name, minimum SDK, template, and so on according to your needs.
(screenshot: creating an Android project)

Notes

Because of the emulator, Windows Hyper-V has to be turned off; the SDK installer will prompt about this.

Buying a genuine JetBrains license is recommended.

Background

IntelliJ IDEA released a new version (2018.2), and the LicenseServer address I had found online stopped working after the upgrade. While reading around I found the activation method below.

It came from comments, which become hard to find as they pile up, so I am copying it here; I also got stuck in a few places while configuring, which makes it worth writing down.

I had heard javaagent was powerful; it clearly is.

Method

Rover updated his crack for Jetbrains 2018.2 releases. Download it here: http://bit.ly/jetbrainscrack210

Usage:

  1. Remove any license you had before.
  2. Click “Configure” -> “Edit Custom VM Options …”
  3. Append “-javaagent:{JetbrainsCrackPath}” to end line.
    ie: -javaagent:~/JetbrainsCrack-2.10-release-enc.jar
  4. Restart IDE
  5. Click Register
  6. Select “Activation Code”
  7. Enter any character.

Registered to Rover12421/Rover12421

If you prefer not to see Rover12421 you can paste the below code to the Activation Code, and it will be licensed to “Lanyus/ Not Me”, change the “Lanyus” and “Not Me” to your own liking.

{"licenseId":"ThisCrackLicenseId",
"licenseeName":"Lanyus",
"assigneeName":"Not Me",
"assigneeEmail":"rover12421@163.com",
"licenseRestriction":"By Rover12421 Crack, Only Test! Please support genuine!!!",
"checkConcurrentUse":false,
"products":[
{"code":"II","paidUpTo":"2099-12-31"},
{"code":"DM","paidUpTo":"2099-12-31"},
{"code":"AC","paidUpTo":"2099-12-31"},
{"code":"RS0","paidUpTo":"2099-12-31"},
{"code":"WS","paidUpTo":"2099-12-31"},
{"code":"DPN","paidUpTo":"2099-12-31"},
{"code":"RC","paidUpTo":"2099-12-31"},
{"code":"PS","paidUpTo":"2099-12-31"},
{"code":"DC","paidUpTo":"2099-12-31"},
{"code":"RM","paidUpTo":"2099-12-31"},
{"code":"CL","paidUpTo":"2099-12-31"},
{"code":"PC","paidUpTo":"2099-12-31"},
{"code":"DB","paidUpTo":"2099-12-31"},
{"code":"GO","paidUpTo":"2099-12-31"},
{"code":"RD","paidUpTo":"2099-12-31"}
],
"hash":"2911276/0",
"gracePeriodDays":7,
"autoProlongated":false}

Additional Notes

If this is a fresh install, or the previous version was completely cleaned out, step 1 can be skipped.

Step 2 needs special attention: the menu item is "Help" -> "Edit Custom VM Options …", but since the IDE is not licensed you cannot get that far, so the files have to be edited by hand.

See the official documentation:

Changes:

  1. Find the VM options file for your version in the install directory (IDE_HOME\bin\<product>[bits][.exe].vmoptions).
  2. Copy the file into the config directory under your user profile (\Users\<USER ACCOUNT NAME>\.<product>\config). If the directory has not been generated yet, run the IDE once for initial configuration to create it.
  3. Add the -javaagent line.
  4. Edit the license JSON; the "fill in anything" fields can hold whatever content you like.
    {"licenseId":"ThisCrackLicenseId",
    "licenseeName":"随便填",
    "assigneeName":"随便填",
    "assigneeEmail":"邮箱,随便填",
    "licenseRestriction":"描述信息,随便填",
    "checkConcurrentUse":false,
    "products":[
    {"code":"II","paidUpTo":"2099-12-31"},
    {"code":"DM","paidUpTo":"2099-12-31"},
    {"code":"AC","paidUpTo":"2099-12-31"},
    {"code":"RS0","paidUpTo":"2099-12-31"},
    {"code":"WS","paidUpTo":"2099-12-31"},
    {"code":"DPN","paidUpTo":"2099-12-31"},
    {"code":"RC","paidUpTo":"2099-12-31"},
    {"code":"PS","paidUpTo":"2099-12-31"},
    {"code":"DC","paidUpTo":"2099-12-31"},
    {"code":"RM","paidUpTo":"2099-12-31"},
    {"code":"CL","paidUpTo":"2099-12-31"},
    {"code":"PC","paidUpTo":"2099-12-31"},
    {"code":"DB","paidUpTo":"2099-12-31"},
    {"code":"GO","paidUpTo":"2099-12-31"},
    {"code":"RD","paidUpTo":"2099-12-31"}
    ],
    "hash":"2911276/0",
    "gracePeriodDays":7,
    "autoProlongated":false}

References

https://www.52pojie.cn/thread-832601-1-1.html

This is a translation of "IPFS - Content Addressed, Versioned, P2P File System (DRAFT 3)". It mainly follows the IPFS white paper, with adjustments based on my own understanding.

Author: Juan Benet (juan@benet.ai)

ABSTRACT

The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the Web, but IPFS could be seen as a single BitTorrent swarm, exchanging objects within one Git repository. In other words, IPFS provides a high throughput content-addressed block storage model, with contentaddressed hyper links. This forms a generalized Merkle DAG, a data structure upon which one can build versioned file systems, blockchains, and even a Permanent Web. IPFS combines a distributed hashtable, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other.

1 INTRODUCTION

There have been many attempts at constructing a global
distributed file system. Some systems have seen significant success, and others failed completely. Among the academic attempts, AFS [6] has succeeded widely and is still in use today. Others [7, ?] have not attained the same success. Outside of academia, the most successful systems have been peer-to-peer file-sharing applications primarily geared toward large media (audio and video). Most notably, Napster, KaZaA, and BitTorrent [2] deployed large file distribution systems supporting over 100 million simultaneous users. Even today, BitTorrent maintains a massive deployment where tens of millions of nodes churn daily [16]. These applications saw greater numbers of users and files distributed than their academic file system counterparts. However, the applications were not designed as infrastructure to be built upon. While there have been successful repurposings[^1], no general file-system has emerged that offers global, low-latency, and decentralized distribution.

Perhaps this is because a “good enough” system for most use cases already exists: HTTP. By far, HTTP is the most successful “distributed system of files” ever deployed. Coupled with the browser, HTTP has had enormous technical and social impact. It has become the de facto way to transmit files across the internet. Yet, it fails to take advantage of dozens of brilliant file distribution techniques invented in the last fifteen years. From one prespective, evolving Web infrastructure is near-impossible, given the number of backwards compatibility constraints and the number of strongparties invested in the current model. But from another perspective, new protocols have emerged and gained wide use since the emergence of HTTP. What is lacking is upgrading design: enhancing the current HTTP web, and introducing new functionality without degrading user experience.

Industry has gotten away with using HTTP this long because moving small files around is relatively cheap, even for small organizations with lots of traffic. But we are entering a new era of data distribution with new challenges: (a)hosting and distributing petabyte datasets, (b) computing on large data across organizations, (c) high-volume highdefinition on-demand or real-time media streams, (d) versioning and linking of massive datasets, (e) preventing accidental disappearance of important files, and more. Many of these can be boiled down to “lots of data, accessible everywhere.” Pressed by critical features and bandwidth concerns, we have already given up HTTP for different data distribution protocols. The next step is making them part of the Web itself.

Orthogonal to efficient data distribution, version control systems have managed to develop important data collaboration workflows. Git, the distributed source code version control system, developed many useful ways to model and implement distributed data operations. The Git toolchain offers versatile versioning functionality that large file distribution systems severely lack. New solutions inspired by Git are emerging, such as Camlistore [?], a personal file storage system, and Dat [?] a data collaboration toolchain and dataset package manager. Git has already influenced distributed filesystem design [9], as its content addressed Merkle DAG data model enables powerful file distribution strategies. What remains to be explored is how this data structure can influence the design of high-throughput oriented file systems, and how it might upgrade the Web itself.

This paper introduces IPFS, a novel peer-to-peer versioncontrolled filesystem seeking to reconcile these issues. IPFS synthesizes learnings from many past successful systems.Careful interface-focused integration yields a system greater than the sum of its parts. The central IPFS principle is modeling all data as part of the same Merkle DAG.

2 BACKGROUND

This section reviews important properties of successful peer-to-peer systems, which IPFS combines.

2.1 Distributed Hash Tables

Distributed Hash Tables (DHTs) are widely used to coordinate and maintain metadata about peer-to-peer systems.For example, the BitTorrent MainlineDHT tracks sets of peers part of a torrent swarm.

2.1.1 Kademlia DHT

Kademlia [10] is a popular DHT that provides:

  1. Efficient lookup through massive networks: queries on average contact「log2(n)」nodes. (e.g. 20 hops for a network of 10; 000; 000 nodes).
  2. Low coordination overhead: it optimizes the number of control messages it sends to other nodes.
  3. Resistance to various attacks by preferring long-lived nodes.
  4. Wide usage in peer-to-peer applications, including Gnutella and BitTorrent, forming networks of over 20 million nodes [16].

2.1.2 Coral DSHT

While some peer-to-peer filesystems store data blocks directly in DHTs, this “wastes storage and bandwidth, as data must be stored at nodes where it is not needed” [5]. The Coral DSHT extends Kademlia in three particularly important ways:

  1. Kademlia stores values in nodes whose ids are “nearest” (using XOR-distance) to the key. This does not take into account application data locality, ignores “far” nodes that may already have the data, and forces “nearest” nodes to store it, whether they need it or not.
    This wastes significant storage and bandwith. Instead, Coral stores addresses to peers who can provide the data blocks.
  2. Coral relaxes the DHT API from get_value(key) to get_any_values(key) (the “sloppy” in DSHT). This still works since Coral users only need a single (working) peer, not the complete list. In return, Coral can distribute only subsets of the values to the “nearest” nodes, avoiding hot-spots (overloading all the nearest nodes when a key becomes popular).
  3. Additionally, Coral organizes a hierarchy of separate DSHTs called clusters depending on region and size. This enables nodes to query peers in their region first, “finding nearby data without querying distant nodes”[5] and greatly reducing the latency of lookups.

2.1.3 S/Kademlia DHT

S/Kademlia [1] extends Kademlia to protect against malicious attacks in two particularly important ways:

  1. S/Kademlia provides schemes to secure NodeId generation, and prevent Sybill attacks. It requires nodes to create a PKI key pair, derive their identity from it, and sign their messages to each other. One scheme includes a proof-of-work crypto puzzle to make generating Sybills expensive.
  2. S/Kademlia nodes lookup values over disjoint paths, in order to ensure honest nodes can connect to each other in the presence of a large fraction of adversaries in the network. S/Kademlia achieves a success rate of 0.85 even with an adversarial fraction as large as half of the nodes.

2.2 Block Exchanges - BitTorrent

BitTorrent [3] is a widely successful peer-to-peer filesharing system, which succeeds in coordinating networks of untrusting peers (swarms) to cooperate in distributing pieces of files to each other. Key features from BitTorrent and its ecosystem that inform IPFS design include:

  1. BitTorrent’s data exchange protocol uses a quasi tit-for-tat strategy that rewards nodes who contribute to each other, and punishes nodes who only leech others’ resources.
  2. BitTorrent peers track the availability of file pieces, prioritizing sending rarest pieces first. This takes load off seeds, making non-seed peers capable of trading with each other.
  3. BitTorrent’s standard tit-for-tat is vulnerable to some exploitative bandwidth sharing strategies. PropShare [8] is a different peer bandwidth allocation strategy that better resists exploitative strategies, and improves the performance of swarms.

2.3 Version Control Systems - Git

Version Control Systems provide facilities to model files changing over time and distribute different versions efficiently. The popular version control system Git provides a powerful Merkle DAG[^2] object model that captures changes to a filesystem tree in a distributed-friendly way.

  1. Immutable objects represent Files (blob), Directories (tree), and Changes (commit).
  2. Objects are content-addressed, by the cryptographic hash of their contents.
  3. Links to other objects are embedded, forming a Merkle DAG. This provides many useful integrity and workflow properties.
  4. Most versioning metadata (branches, tags, etc.) are simply pointer references, and thus inexpensive to create and update.
  5. Version changes only update references or add objects.
  6. Distributing version changes to other users is simply transferring objects and updating remote references.

2.4 Self-Certified Filesystems - SFS

SFS [12, 11] proposed compelling implementations of both (a) distributed trust chains, and (b) egalitarian shared global namespaces. SFS introduced a technique for building SelfCertified Filesystems: addressing remote filesystems using the following scheme.

/sfs/<Location>:<HostID>

where Location is the server network address, and:

HostID = hash(public_key || Location) 

Thus the name of an SFS file system certifies its server. The user can verify the public key offered by the server, negotiate a shared secret, and secure all traffic. All SFS instances share a global namespace where name allocation is cryptographic, not gated by any centralized body.

3 IPFS DESIGN

IPFS is a distributed file system which synthesizes successful ideas from previous peer-to-peer sytems, including DHTs, BitTorrent, Git, and SFS. The contribution of IPFS is simplifying, evolving, and connecting proven techniques into a single cohesive system, greater than the sum of its parts. IPFS presents a new platform for writing and deploying applications, and a new system for distributing and versioning large data. IPFS could even evolve the web itself.

IPFS is peer-to-peer; no nodes are privileged. IPFS nodes store IPFS objects in local storage. Nodes connect to each other and transfer objects. These objects represent files and other data structures. The IPFS Protocol is divided into a stack of sub-protocols esponsible for different functionality:

  1. Identities - manage node identity generation and verification. Described in Section 3.1.
  2. Network - manages connections to other peers, uses various underlying network protocols. Configurable. Described in Section 3.2.
  3. Routing - maintains information to locate specific peers and objects. Responds to both local and remote queries. Defaults to a DHT, but is swappable. Described in Section 3.3.
  4. Exchange - a novel block exchange protocol (BitSwap) that governs efficient block distribution. Modelled as a market, weakly incentivizes data replication. Trade Strategies swappable. Described in Section 3.4.
  5. Objects - a Merkle DAG of content-addressed immutable objects with links. Used to represent arbitrary datastructures, e.g. file hierarchies and communication systems. Described in Section 3.5.
  6. Files - versioned file system hierarchy inspired by Git. Described in Section 3.6.
  7. Naming - A self-certifying mutable name system. Described in Section 3.7.

These subsystems are not independent; they are integrated and leverage blended properties. However, it is useful to describe them separately, building the protocol stack from the bottom up.

Notation: data structures and functions below are specified in Go syntax.

3.1 Identities

Nodes are identified by a NodeId, the cryptographic hash[^3] of a public-key, created with S/Kademlia’s static crypto puzzle [1]. Nodes store their public and private keys (encrypted with a passphrase). Users are free to instatiate a “new” node identity on every launch, though that loses accrued network benefits. Nodes are incentivized to remain the same.

type NodeId Multihash
type Multihash []byte
// self-describing cryptographic hash digest
type PublicKey []byte
type PrivateKey []byte
// self-describing keys
type Node struct {
    NodeId NodeId
    PubKey PublicKey
    PriKey PrivateKey
}

S/Kademlia based IPFS identity generation:

difficulty = <integer parameter>
n = Node{}
do {
    n.PubKey, n.PrivKey = PKI.genKeyPair()
    n.NodeId = hash(n.PubKey)
    p = count_preceding_zero_bits(hash(n.NodeId))
} while (p < difficulty)

Upon first connecting, peers exchange public keys, and check: hash(other.PublicKey) equals other.NodeId. If not, the connection is terminated.

Note on Cryptographic Functions.

Rather than locking the system to a particular set of function choices, IPFS favors self-describing values. Hash digest values are stored in multihash format, which includes a short header specifying the hash function used, and the digest length in bytes. Example:

<function code><digest length><digest bytes>

This allows the system to (a) choose the best function for the use case (e.g. stronger security vs faster performance), and (b) evolve as function choices change. Self-describing values allow using different parameter choices compatibly.

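As a small illustration of this format (my own example, not part of the paper; the function code 0x12 for sha2-256 comes from the multihash code table), here is how such a value could be built and decoded in Java:

import java.security.MessageDigest;
import java.util.Arrays;

public class MultihashDemo {
    public static void main(String[] args) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest("hello".getBytes("UTF-8"));
        // Encode: one byte of function code (0x12 = sha2-256), one byte of
        // digest length (0x20 = 32), then the digest bytes themselves.
        byte[] mh = new byte[2 + digest.length];
        mh[0] = 0x12;
        mh[1] = (byte) digest.length;
        System.arraycopy(digest, 0, mh, 2, digest.length);
        // Decode: the two-byte header alone says how to interpret the rest.
        int fn = mh[0] & 0xff;
        int len = mh[1] & 0xff;
        byte[] out = Arrays.copyOfRange(mh, 2, 2 + len);
        System.out.printf("function=0x%x length=%d match=%b%n", fn, len, Arrays.equals(digest, out));
    }
}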

3.2 Network

IPFS nodes communicate regualarly with hundreds of other nodes in the network, potentially across the wide internet. The IPFS network stack features:

  • Transport: IPFS can use any transport protocol, and is best suited for WebRTC DataChannels [?] (for browser connectivity) or uTP(LEDBAT [14]).
  • Reliability: IPFS can provide reliability if underlying networks do not provide it, using uTP (LEDBAT [14]) or SCTP [15].
  • Connectivity: IPFS also uses the ICE NAT traversal techniques [13].
  • Integrity: optionally checks integrity of messages using a hash checksum.
  • Authenticity: optionally checks authenticity of messages using HMAC with sender’s public key.

3.2.1 Note on Peer Addressing

IPFS can use any network; it does not rely on or assume access to IP. This allows IPFS to be used in overlay networks.IPFS stores addresses as multiaddr formatted byte strings for the underlying network to use. multiaddr provides a way to express addresses and their protocols, including support for encapsulation. For example:

# an SCTP/IPv4 connection
/ip4/10.20.30.40/sctp/1234/
# an SCTP/IPv4 connection proxied over TCP/IPv4
/ip4/5.6.7.8/tcp/5678/ip4/1.2.3.4/sctp/1234/

3.3 路由(Routing)

IPFS nodes require a routing system that can find (a) other peers’ network addresses and (b) peers who can serve particular objects. IPFS achieves this using a DSHT based on S/Kademlia and Coral, using the properties discussed in 2.1. The size of objects and use patterns of IPFS are similar to Coral [5] and Mainline [16], so the IPFS DHT makes a distinction for values stored based on their size. Small values (equal to or less than 1KB) are stored directly on the DHT. For values larger, the DHT stores references, which are the NodeIds of peers who can serve the block.

IPFS节点需要一个路由系统,用来找到:(a)其他对等节点的网络地址,(b)能提供特定对象的对等节点。IPFS使用基于S/Kademlia和Coral的DSHT来实现这一点,利用了2.1节讨论过的特性。IPFS的对象大小和使用模式与Coral[5]和Mainline[16]类似,因此IPFS的DHT会按值的大小区分存储方式:小的值(等于或小于1KB)直接存储在DHT上;更大的值,DHT只存储引用,即能提供该数据块的节点的NodeId。

The interface of this DSHT is the following:

DSHT的接口如下:

type IPFSRouting interface {
    FindPeer(node NodeId)
    // gets a particular peer's network address

    SetValue(key []byte, value []byte)
    // stores a small metadata value in DHT

    GetValue(key []byte)
    // retrieves small metadata value from DHT

    ProvideValue(key Multihash)
    // announces this node can serve a large value

    FindValuePeers(key Multihash, min int)
    // gets a number of peers serving a large value
}

Note: different use cases will call for substantially different routing systems (e.g. DHT in wide network, static HT in local network). Thus the IPFS routing system can be swapped for one that fits users’ needs. As long as the interface above is met, the rest of the system will continue to function.

注意:不同的使用场景需要本质上不同的路由系统(例如广域网中用DHT,局域网中用静态HT)。因此,IPFS的路由系统可以换成符合用户需求的实现;只要满足上面的接口,系统的其余部分就能继续正常运行。
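
为了说明这种可替换性,下面用Go给出一个满足上述接口的最小内存实现(对应局域网静态HT的思路)。这只是示意性的草图:为了能编译运行,接口方法补上了论文未规定的返回值,staticHT等名称也是本文假设的:

package main

import "fmt"

type Multihash string // 示意:用string便于做map的键
type NodeId Multihash

// 为了可运行,这里给论文接口补上返回值(属于本文的假设,论文未规定)
type IPFSRouting interface {
    FindPeer(node NodeId) (addr string, ok bool)
    SetValue(key, value []byte)
    GetValue(key []byte) ([]byte, bool)
    ProvideValue(key Multihash)
    FindValuePeers(key Multihash, min int) []NodeId
}

// staticHT:局域网场景下的静态哈希表路由,满足上述接口
type staticHT struct {
    self      NodeId
    peers     map[NodeId]string
    values    map[string][]byte
    providers map[Multihash][]NodeId
}

func newStaticHT(self NodeId) *staticHT {
    return &staticHT{
        self:      self,
        peers:     map[NodeId]string{},
        values:    map[string][]byte{},
        providers: map[Multihash][]NodeId{},
    }
}

func (s *staticHT) FindPeer(n NodeId) (string, bool) { a, ok := s.peers[n]; return a, ok }
func (s *staticHT) SetValue(k, v []byte)             { s.values[string(k)] = v }
func (s *staticHT) GetValue(k []byte) ([]byte, bool) { v, ok := s.values[string(k)]; return v, ok }
func (s *staticHT) ProvideValue(k Multihash)         { s.providers[k] = append(s.providers[k], s.self) }
func (s *staticHT) FindValuePeers(k Multihash, min int) []NodeId {
    ps := s.providers[k]
    if min > 0 && len(ps) > min {
        return ps[:min] // 简单截断,返回不少于min个即可
    }
    return ps
}

func main() {
    var r IPFSRouting = newStaticHT("node-A")
    r.SetValue([]byte("key"), []byte("small metadata value"))
    v, _ := r.GetValue([]byte("key"))
    r.ProvideValue("big-block-hash")
    fmt.Println(string(v), r.FindValuePeers("big-block-hash", 1))
}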

3.4 块交换——BitSwap协议(Block Exchange - BitSwap Protocol)

In IPFS, data distribution happens by exchanging blocks with peers using a BitTorrent inspired protocol: BitSwap. Like BitTorrent, BitSwap peers are looking to acquire a set of blocks (want_list), and have another set of blocks to offer in exchange (have_list). Unlike BitTorrent, BitSwap is not limited to the blocks in one torrent. BitSwap operates as a persistent marketplace where nodes can acquire the blocks they need, regardless of what files those blocks are part of. The blocks could come from completely unrelated files in the filesystem. Nodes come together to barter in the marketplace.

在IPFS中,数据的分发通过与对等节点交换数据块完成,所用的BitSwap协议受BitTorrent启发。和BitTorrent一样,BitSwap的对等节点想要获取一组数据块(want_list),同时也有另一组可供交换的数据块(have_list)。和BitTorrent不同的是,BitSwap不局限于某一个torrent中的数据块。BitSwap像一个永久的市场,节点可以在这里获取自己需要的块,而不管这些块属于哪些文件;这些块可能来自文件系统中完全不相关的文件。节点聚在这个市场中以物易物。

While the notion of a barter system implies a virtual currency could be created, this would require a global ledger to track ownership and transfer of the currency. This can be implemented as a BitSwap Strategy, and will be explored in a future paper.

以物易物的体系意味着可以创建一种虚拟货币,但这需要一个全球账本来跟踪货币的所有权和流转。这可以作为一种BitSwap策略来实现,将在未来的论文中探讨。

In the base case, BitSwap nodes have to provide direct value to each other in the form of blocks. This works fine when the distribution of blocks across nodes is complementary, meaning they have what the other wants. Often, this will not be the case. In some cases, nodes must work for their blocks. In the case that a node has nothing that its peers want (or nothing at all), it seeks the pieces its peers want, with lower priority than what the node wants itself. This incentivizes nodes to cache and disseminate rare pieces, even if they are not interested in them directly.

在基本情况下,BitSwap节点必须以块的形式直接向对方提供价值。当块在节点间的分布是互补的,即双方恰好各有对方想要的块时,这运转良好。但情况往往并非如此。某些情况下,节点必须为获得自己想要的块而工作:当一个节点没有对等节点需要的块(或者什么都没有)时,它会去寻找对等节点想要的块,其优先级低于节点自己想要的块。这激励节点缓存和传播稀有块,即使它们对这些块并不直接感兴趣。

3.4.1 BitSwap信用(BitSwap Credit)

The protocol must also incentivize nodes to seed when they do not need anything in particular, as they might have the blocks others want. Thus, BitSwap nodes send blocks to their peers optimistically, expecting the debt to be repaid. But leeches (free-loading nodes that never share) must be protected against. A simple credit-like system solves the problem:

  1. Peers track their balance (in bytes verified) with other nodes.
  2. Peers send blocks to debtor peers probabilistically, according to a function that falls as debt increases.

协议还必须激励节点在自己不需要任何东西时也去做种,因为它们可能有其他节点想要的块。因此,BitSwap节点会乐观地向对等节点发送块,期望债务得到偿还。但必须防范吸血节点(只下载、从不分享的节点)。一个简单的类信用体系解决了这个问题:

  1. 对等节点记录自己与其他节点之间的收支差额(以经过验证的字节数计)。
  2. 对等节点按概率向负债节点发送数据块,该概率是一个随债务增加而下降的函数。

Note that if a node decides not to send to a peer, the node subsequently ignores the peer for an ignore_cooldown timeout. This prevents senders from trying to game the probability by just causing more dice-rolls. (Default BitSwap is 10 seconds).

注意,如果节点决定不向某个对等节点发送数据,它会在随后的ignore_cooldown超时时间内忽略该节点。这防止发送方靠制造更多“掷骰子”的机会来博弈概率(BitSwap默认是10秒)。

3.4.2 BitSwap策略(BitSwap Strategy)

The differing strategies that BitSwap peers might employ have wildly different effects on the performance of the exchange as a whole. In BitTorrent, while a standard strategy is specified (tit-for-tat), a variety of others have been implemented, ranging from BitTyrant [8] (sharing the least possible), to BitThief [8] (exploiting a vulnerability and never share), to PropShare [8] (sharing proportionally). A range of strategies (good and malicious) could similarly be implemented by BitSwap peers. The choice of function, then, should aim to:

  1. maximize the trade performance for the node, and the whole exchange
  2. prevent freeloaders from exploiting and degrading the exchange
  3. be effective with and resistant to other, unknown strategies
  4. be lenient to trusted peers

BitSwap对等节点可能采用的不同策略,对整个交换网络的性能有着截然不同的影响。在BitTorrent中,虽然规定了标准策略(tit-for-tat,一报还一报),也有各种其他策略被实现出来,从BitTyrant[8](尽可能少分享),到BitThief[8](利用漏洞、从不分享),再到PropShare[8](按比例分享)。BitSwap对等节点同样可以实现一系列策略(善意的和恶意的)。因此,函数的选择应当做到:

  1. 最大化节点以及整个交换网络的交易性能。
  2. 防止搭便车节点利用和损害交换网络。
  3. 对其他未知策略既能有效协作,又能有所抵御。
  4. 对可信任的对等节点更宽容。

The exploration of the space of such strategies is future work. One choice of function that works in practice is a sigmoid, scaled by a debt ratio:

对这类策略空间的探索是未来的工作。实践中行之有效的一个函数选择,是按负债比例缩放的sigmoid函数:

Let the debt ratio r between a node and its peer be:
设节点与其对等节点之间的负债比例r为:

r = bytes_sent / (bytes_recv + 1)

Given r, let the probability of sending to a debtor be:

给定r,向负债节点发送数据的概率为:

P(send | r) = 1 − 1 / (1 + exp(6 − 3r))

As you can see in Figure 1, this function drops off quickly as the nodes’ debt ratio surpasses twice the established credit.
正如图1所示,当节点的负债比例超过既定信用的两倍时,这个函数会迅速下降。

图1:发送概率随r增大的变化(Figure 1: Probability of Sending as r increases)
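
下面用几行Go把这两个公式直接实现出来,便于直观感受概率随r的变化。这只是示意代码,debtRatio、shouldSend等名称为本文所取:

package main

import (
    "fmt"
    "math"
    "math/rand"
)

// 负债比例 r = bytes_sent / (bytes_recv + 1)
func debtRatio(bytesSent, bytesRecv int) float64 {
    return float64(bytesSent) / (float64(bytesRecv) + 1)
}

// P(send | r) = 1 - 1/(1 + exp(6 - 3r))
func sendProbability(r float64) float64 {
    return 1 - 1/(1+math.Exp(6-3*r))
}

// 按上述概率决定是否向负债节点发送数据块
func shouldSend(bytesSent, bytesRecv int) bool {
    return rand.Float64() < sendProbability(debtRatio(bytesSent, bytesRecv))
}

func main() {
    for _, r := range []float64{0, 1, 2, 3, 4} {
        fmt.Printf("r=%.0f  P(send)=%.3f\n", r, sendProbability(r))
    }
    fmt.Println("send now?", shouldSend(2048, 1023)) // r约为2,约五成概率
}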

The debt ratio is a measure of trust: lenient to debts between nodes that have previously exchanged lots of data successfully, and merciless to unknown, untrusted nodes. This (a) provides resistance to attackers who would create lots of new nodes (sybill attacks), (b) protects previously successful trade relationships, even if one of the nodes is temporarily unable to provide value, and (c) eventually chokes relationships that have deteriorated until they improve.

负债比例是信任的衡量标准:对之前成功交换过大量数据的节点之间的债务比较宽容,对未知、不受信任的节点则毫不留情。这(a)对想创建大量新节点的攻击者(女巫攻击)形成抵御,(b)保护了之前成功建立的交易关系,即使其中一个节点暂时无法提供价值,(c)最终会掐断已经恶化的关系,直到它们得到改善。

3.4.3 BitSwap帐本(BitSwap Ledger)

BitSwap nodes keep ledgers accounting the transfers with other nodes. This allows nodes to keep track of history and avoid tampering. When activating a connection, BitSwap nodes exchange their ledger information. If it does not match exactly, the ledger is reinitialized from scratch, losing the accrued credit or debt. It is possible for malicious nodes to purposefully “lose” the Ledger, hoping to erase debts. It is unlikely that nodes will have accrued enough debt to warrant also losing the accrued trust; however the partner node is free to count it as misconduct, and refuse to trade.

BitSwap节点保存着记录与其他节点之间传输情况的账本。这让节点能够追踪历史并避免账本被篡改。激活连接时,BitSwap节点会交换各自的账本信息。如果两者不完全一致,账本会从零重新初始化,累积的信用或债务随之丢失。恶意节点可能故意“弄丢”账本,希望以此抹掉债务;但节点不太可能在累积了足够多债务的同时,还愿意搭上累积起来的信任。无论如何,伙伴节点可以自行将其视为不当行为,拒绝交易。

type Ledger struct {
    owner      NodeId
    partner    NodeId
    bytes_sent int
    bytes_recv int
    timestamp  Timestamp
}

Nodes are free to keep the ledger history, though it is not necessary for correct operation. Only the current ledger entries are useful. Nodes are also free to garbage collect ledgers as necessary, starting with the less useful ledgers: the old (peers may not exist anymore) and small.

节点可以自由地保留账本历史,尽管这对正确运行并不是必需的:只有当前的账本条目才有用。节点也可以在需要时对账本做垃圾回收,从不太有用的账本开始:旧的(对等节点可能已不存在)和小的。

3.4.4 BitSwap规范(BitSwap Specification)

BitSwap nodes follow a simple protocol.
BitSwap节点遵循一个简单协议。

// Additional state kept
type BitSwap struct {
    ledgers map[NodeId]Ledger
    // Ledgers known to this node, inc inactive
    active map[NodeId]Peer
    // currently open connections to other nodes
    need_list []Multihash
    // checksums of blocks this node needs
    have_list []Multihash
    // checksums of blocks this node has
}

type Peer struct {
    nodeid NodeId
    ledger Ledger
    // Ledger between the node and this peer
    last_seen Timestamp
    // timestamp of last received message
    want_list []Multihash
    // checksums of all blocks wanted by peer
    // includes blocks wanted by peer’s peers
}

// Protocol interface:
interface Peer {
    open (nodeid :NodeId, ledger :Ledger);
    send_want_list (want_list :WantList);
    send_block (block :Block) -> (complete :Bool);
    close (final :Bool);
}

Sketch of the lifetime of a peer connection:

  1. Open: peers send ledgers until they agree.
  2. Sending: peers exchange want_lists and blocks.
  3. Close: peers deactivate a connection.
  4. Ignored: (special) a peer is ignored (for the duration of a timeout) if a node’s strategy avoids sending

对等节点连接过程概述:

  1. 打开:对等节点互发账本,直到双方账本一致。
  2. 发送:对等节点交换需求列表(want_list)和数据块。
  3. 关闭:对等节点停用连接。
  4. 忽略:(特殊情况)如果节点的策略决定不发送,对等节点会在一段超时时间内被忽略。

Peer.open(NodeId, Ledger).

When connecting, a node initializes a connection with a Ledger, either stored from a connection in the past or a new one zeroed out. Then, sends an Open message with the Ledger to the peer.

连接时,节点用一个账本初始化连接,这个账本可能是以前连接时保存下来的,也可能是一个新的全零账本。然后,节点把带有该账本的Open消息发送给对等节点。

Upon receiving an Open message, a peer chooses whether to activate the connection. If - according to the receiver’s Ledger - the sender is not a trusted agent (transmission below zero, or large outstanding debt) the receiver may opt to ignore the request. This should be done probabilistically with an ignore_cooldown timeout, as to allow errors to be corrected and attackers to be thwarted.

收到Open消息后,对等节点决定是否激活这个连接。如果根据接收方的账本,发送方不是可信的(传输量为负,或有大量未偿债务),接收方可以选择忽略该请求。忽略应当基于概率并配合ignore_cooldown超时,以便错误能被纠正、攻击者被挫败。

If activating the connection, the receiver initializes a Peer object with the local version of the Ledger and sets the last_seen timestamp. Then, it compares the received Ledger with its own. If they match exactly, the connections have opened. If they do not match, the peer creates a new zeroed out Ledger and sends it.

如果激活连接,接收方用本地版本的账本初始化一个Peer对象,并设置last_seen时间戳。然后,它把收到的账本与自己的账本比较:如果完全一致,连接就打开了;如果不一致,节点会创建一个新的全零账本并发送给对方。
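
下面是这段握手判断的一个极简Go示意(概率性忽略与ignore_cooldown从略;onOpen等名称为本文所取):

package main

import "fmt"

type NodeId string

type Ledger struct {
    owner     NodeId
    partner   NodeId
    bytesSent int
    bytesRecv int
}

// onOpen:账本完全一致则打开连接;否则回发一个全新的全零账本
func onOpen(local, received Ledger) (opened bool, reply Ledger) {
    if local == received { // 简化:真实实现按对方视角比对各字段
        return true, local
    }
    return false, Ledger{owner: local.owner, partner: local.partner}
}

func main() {
    a := Ledger{owner: "A", partner: "B", bytesSent: 10, bytesRecv: 4}
    b := a // 对方发来的账本与本地一致
    ok, _ := onOpen(a, b)
    fmt.Println("connection opened:", ok)
}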

Peer.send_want_list(WantList).

While the connection is open, nodes advertise their want_list to all connected peers. This is done (a) upon opening the connection, (b) after a randomized periodic timeout, (c) after a change in the want_list and (d) after receiving a new block.

连接打开后,节点会向所有已连接的对等节点广播自己的want_list。广播发生在:(a)打开连接后,(b)随机的周期性超时后,(c)want_list发生变化后,(d)收到新块后。

Upon receiving a want_list, a node stores it. Then, it checks whether it has any of the wanted blocks. If so, it sends them according to the BitSwap Strategy above.

收到want_list后,节点会把它保存下来,然后检查自己是否有其中想要的块。如果有,就按上面的BitSwap策略发送。

Peer.send_block(Block).

Sending a block is straightforward. The node simply transmits the block of data. Upon receiving all the data, the receiver computes the Multihash checksum to verify it matches the expected one, and returns confirmation.

发送块很直接:节点只需传输数据块。接收完全部数据后,接收方计算多重哈希校验和,验证它是否与预期一致,然后返回确认。

Upon finalizing the correct transmission of a block, the receiver moves the block from need_list to have_list, and both the receiver and sender update their ledgers to reflect the additional bytes transmitted.

在确认一个块正确传输完毕后,接收方把该块从need_list移到have_list,接收方和发送方都更新各自的账本,以反映新增传输的字节数。

If a transmission verification fails, the sender is either malfunctioning or attacking the receiver. The receiver is free to refuse further trades. Note that BitSwap expects to operate on a reliable transmission channel, so transmission errors - which could lead to incorrect penalization of an honest sender - are expected to be caught before the data is given to BitSwap.

如果传输验证失败,说明发送方要么出了故障,要么在攻击接收方,接收方可以拒绝后续交易。注意,BitSwap期望运行在可靠的传输通道上,因此传输错误(可能导致诚实的发送方被错误惩罚)应当在数据交给BitSwap之前就被捕获。
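
下面的Go草图把send_block接收侧的几步串起来:校验多重哈希、把块从need_list移入have_list、更新账本字节数。checksum用SHA-256示意,receiver等名称为本文假设:

package main

import (
    "bytes"
    "crypto/sha256"
    "fmt"
)

type Multihash []byte

// checksum:示意用SHA-256,实际为自描述的multihash
func checksum(block []byte) Multihash {
    h := sha256.Sum256(block)
    return h[:]
}

type receiver struct {
    needList  []Multihash
    haveList  []Multihash
    bytesRecv int
}

// onBlock:校验通过则移动块并更新账本,返回是否确认
func (r *receiver) onBlock(block []byte) bool {
    sum := checksum(block)
    for i, want := range r.needList {
        if bytes.Equal(want, sum) {
            r.needList = append(r.needList[:i], r.needList[i+1:]...)
            r.haveList = append(r.haveList, sum)
            r.bytesRecv += len(block) // 发送方相应更新bytes_sent
            return true
        }
    }
    return false // 校验失败:发送方故障或攻击,可拒绝后续交易
}

func main() {
    data := []byte("block data")
    r := &receiver{needList: []Multihash{checksum(data)}}
    fmt.Println("accepted:", r.onBlock(data), "bytes_recv:", r.bytesRecv)
}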

Peer.close(Bool).

The final parameter to close signals whether the intention to tear down the connection is the sender’s or not. If false, the receiver may opt to re-open the connection immediately. This avoids premature closes.

close的final参数表明关闭连接的意图是否出自发送方。如果为false,接收方可以选择立即重新打开连接,这避免了过早关闭。

A peer connection should be closed under two conditions:

  • a silence_wait timeout has expired without receiving any messages from the peer (default BitSwap uses 30 seconds). The node issues Peer.close(false).
  • the node is exiting and BitSwap is being shut down. In this case, the node issues Peer.close(true).

一个对等节点会在下面两种情况关闭连接:

  • 静默等待(silence_wait)超时,且未从对等节点收到任何消息时(BitSwap默认为30秒),节点发出Peer.close(false)。
  • 节点退出、BitSwap被关闭时,节点发出Peer.close(true)。

After a close message, both receiver and sender tear down the connection, clearing any state stored. The Ledger may be stored for the future, if it is useful to do so.

收到close消息后,接收方和发送方都会拆除连接,清除所有存储的状态。如果有用,账本可以保留下来以备将来使用。

注意事项(Notes.)

  • Non-open messages on an inactive connection should be ignored. In case of a send_block message, the receiver may check the block to see if it is needed and correct, and if so, use it. Regardless, all such out-of-order messages trigger a close(false) message from the receiver to force re-initialization of the connection.

  • 在未激活的连接上,非Open消息应当被忽略。对于send_block消息,接收方可以检查这个块是否是自己需要的且正确,如果是,就使用它。无论如何,所有这类乱序消息都会触发接收方发送close(false)消息,强制重新初始化连接。

3.5 Merkle DAG对象(Object Merkle DAG)

The DHT and BitSwap allow IPFS to form a massive peer-to-peer system for storing and distributing blocks quickly and robustly. On top of these, IPFS builds a Merkle DAG, a directed acyclic graph where links between objects are cryptographic hashes of the targets embedded in the sources. This is a generalization of the Git data structure. Merkle DAGs provide IPFS many useful properties, including:

  1. Content Addressing: all content is uniquely identified by its multihash checksum, including links.
  2. Tamper resistance: all content is verified with its checksum. If data is tampered with or corrupted, IPFS detects it.
  3. Deduplication: all objects that hold the exact same content are equal, and only stored once. This is particularly useful with index objects, such as git trees and commits, or common portions of data.

DHT和BitSwap让IPFS构成一个庞大的点对点系统,用于快速、稳健地存储和分发数据块。在它们之上,IPFS构建了Merkle DAG:一种有向无环图,对象之间的链接是目标对象的加密哈希,嵌入在源对象之中。这是对Git数据结构的泛化。Merkle DAG为IPFS提供了许多有用的特性,包括:

  1. 内容寻址:所有内容都由其多重哈希校验和唯一标识,包括链接(links)。
  2. 防篡改:所有内容都用其校验和验证。如果数据被篡改或损坏,IPFS会检测到。
  3. 去重:持有完全相同内容的对象是相等的,只需存储一次。这对索引对象尤其有用,比如Git的树(tree)和提交(commit),或数据的公共部分。

The IPFS Object format is:

以下是IPFS对象的格式:

type IPFSLink struct {
    Name string
    // name or alias of this link
    Hash Multihash
    // cryptographic hash of target
    Size int
    // total size of target
}

type IPFSObject struct {
    links []IPFSLink
    // array of links
    data []byte
    // opaque content data
}

The IPFS Merkle DAG is an extremely flexible way to store data. The only requirements are that object references be (a) content addressed, and (b) encoded in the format above. IPFS grants applications complete control over the data field; applications can use any custom data format they choose, which IPFS may not understand. The separate in-object link table allows IPFS to:

IPFS Merkle DAG是一种极其灵活的数据存储方式。唯一的要求是对象引用(a)按内容寻址,(b)按上面的格式编码。IPFS让应用完全掌控数据字段;应用可以使用任何自定义数据格式,即使IPFS无法理解这些数据。独立的对象内链接表使IPFS可以:

  • List all object references in an object. For example:

  • 列出一个对象中的所有对象引用,例如:

    > ipfs ls /XLZ1625Jjn7SubMDgEyeaynFuR84ginqvzb
    XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x 189458 less
    XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5 19441 script
    XLF4hwVHsVuZ78FZK6fozf8Jj9WEURMbCX4 5286 template

    <object multihash> <object size> <link name>
  • Resolve string path lookups, such as foo/bar/baz. Given an object, IPFS resolves the first path component to a hash in the object’s link table, fetches that second object, and repeats with the next component. Thus, string paths can walk the Merkle DAG no matter what the data formats are.

  • 解析字符串路径查找,例如foo/bar/baz。给定一个对象,IPFS把第一个路径分段解析为该对象链接表中的哈希,取出下一个对象,再对下一个分段重复此过程。因此,不管数据格式是什么,字符串路径都能在Merkle DAG中游走(见本列表后的示意代码)。

  • Resolve all objects referenced recursively:

  • 递归地解析所有被引用的对象:

    > ipfs refs --recursive \
    /XLZ1625Jjn7SubMDgEyeaynFuR84ginqvzb
    XLLxhdgJcXzLbtsLRL1twCHA2NrURp4H38s
    XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x
    XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5
    XLWVQDqxo9Km9zLyquoC9gAP8CL1gWnHZ7z
    ...
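
上面第二条的逐段解析,可以用下面的Go草图表达。store在这里是一个以哈希为键的内存对象库,真实系统中这一步要通过路由系统和BitSwap取对象;resolve等名称为本文所取:

package main

import "fmt"

type IPFSLink struct {
    Name string
    Hash string // 示意:目标对象的multihash
    Size int
}

type IPFSObject struct {
    Links []IPFSLink
    Data  []byte
}

// 示意的对象库:按哈希取对象
var store = map[string]*IPFSObject{}

// resolve:逐段在链接表中查名字,取下一个对象,再处理下一段
func resolve(root string, path []string) (*IPFSObject, error) {
    obj, ok := store[root]
    if !ok {
        return nil, fmt.Errorf("object %q not found", root)
    }
    for _, name := range path {
        next := ""
        for _, l := range obj.Links {
            if l.Name == name {
                next = l.Hash
                break
            }
        }
        if next == "" {
            return nil, fmt.Errorf("no link named %q", name)
        }
        if obj, ok = store[next]; !ok {
            return nil, fmt.Errorf("object %q not found", next)
        }
    }
    return obj, nil
}

func main() {
    store["baz-hash"] = &IPFSObject{Data: []byte("baz data")}
    store["bar-hash"] = &IPFSObject{Links: []IPFSLink{{Name: "baz", Hash: "baz-hash"}}}
    store["foo-hash"] = &IPFSObject{Links: []IPFSLink{{Name: "bar", Hash: "bar-hash"}}}
    obj, _ := resolve("foo-hash", []string{"bar", "baz"})
    fmt.Println(string(obj.Data)) // baz data
}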

A raw data field and a common link structure are the necessary components for constructing arbitrary data structures on top of IPFS. While it is easy to see how the Git object model fits on top of this DAG, consider these other potential data structures: (a) key-value stores (b) traditional relational databases (c) Linked Data triple stores (d) linked document publishing systems (e) linked communications platforms (f) cryptocurrency blockchains. These can all be modeled on top of the IPFS Merkle DAG, which allows any of these systems to use IPFS as a transport protocol for more complex applications.

原始数据字段和通用的链接结构,是在IPFS之上构建任意数据结构的必要组件。不难看出Git对象模型是如何适配到这个DAG之上的;再考虑其他一些潜在的数据结构:(a)键值存储,(b)传统关系型数据库,(c)关联数据三元组存储,(d)链接的文档发布系统,(e)链接的通信平台,(f)加密货币区块链。这些都可以建模在IPFS Merkle DAG之上,从而让任何这类系统把IPFS用作传输协议,构建更复杂的应用。

3.5.1 路径(Paths)

IPFS objects can be traversed with a string path API. Paths work as they do in traditional UNIX filesystems and the Web. The Merkle DAG links make traversing it easy. Note that full paths in IPFS are of the form:

IPFS对象可以用字符串路径API来遍历。路径的工作方式与传统UNIX文件系统和Web中一致。Merkle DAG的链接使遍历很容易。注意,IPFS中的完整路径形如:

# format
/ipfs/<hash-of-object>/<name-path-to-object>
# example
/ipfs/XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x/foo.txt

The /ipfs prefix allows mounting into existing systems at a standard mount point without conflict (mount point names are of course configurable). The second path component (first within IPFS) is the hash of an object. This is always the case, as there is no global root. A root object would have the impossible task of handling consistency of millions of objects in a distributed (and possibly disconnected) environment. Instead, we simulate the root with content addressing. All objects are always accessible via their hash. Note this means that given three objects in path <foo>/bar/baz, the last object is accessible by all:

/ipfs前缀使得(IPFS)可以在不冲突的情况下挂载到现有系统的标准挂载点上(挂载点名称当然可以配置)。路径的第二个组成部分(IPFS内部的第一个)是某个对象的哈希。永远是这样,因为不存在全局的根:要在分布式(并且可能断连)的环境中维护上百万对象的一致性,是根对象不可能完成的任务。取而代之,我们用内容寻址来模拟根。所有对象总是可以通过其哈希来访问。注意,这意味着对于路径<foo>/bar/baz中的三个对象,最后一个对象可以通过以下所有路径访问:

/ipfs/<hash-of-foo>/bar/baz
/ipfs/<hash-of-bar>/baz
/ipfs/<hash-of-baz>

3.5.2 本地对象(Local Objects)

IPFS clients require some local storage, an external system on which to store and retrieve local raw data for the objects IPFS manages. The type of storage depends on the node’s use case. In most cases, this is simply a portion of disk space(either managed by the native filesystem, by a key-value store such as leveldb [4], or directly by the IPFS client). In others, for example non-persistent caches, this storage is just a portion of RAM.

IPFS客户端需要一些本地存储:一个外部系统,用于存储和检索IPFS所管理对象的本地原始数据。存储类型取决于节点的使用场景。大多数情况下,它只是磁盘空间的一部分(由本地文件系统管理、由leveldb[4]这样的键值存储管理,或由IPFS客户端直接管理);在其他情况下,例如非持久缓存,它只是内存的一部分。

Ultimately, all blocks available in IPFS are in some node’s local storage. When users request objects, they are found, downloaded, and stored locally, at least temporarily. This provides fast lookup for some configurable amount of time thereafter.

最终,IPFS中所有可用的块都存放在某些节点的本地存储中。当用户请求对象时,对象会被找到、下载并存储到本地,至少是临时存储。这样,在之后一段可配置的时间内都能快速查找。

3.5.3 对象锁定(Object Pinning)

Nodes who wish to ensure the survival of particular objects can do so by pinning the objects. This ensures the objects are kept in the node’s local storage. Pinning can be done recursively, to pin down all linked descendent objects as well. All objects pointed to are then stored locally. This is particularly useful to persist files, including references. This also makes IPFS a Web where links are permanent, and Objects can ensure the survival of others they point to.

希望确保特定对象长期存续的节点,可以锁定(pin)这些对象。这保证对象被保留在该节点的本地存储中。锁定可以递归进行,把所有被链接的后代对象也一并锁定,此时对象指向的所有对象都会保存在本地。这对持久化文件(包括其引用)特别有用。这也使IPFS成为一个链接永久有效的Web,对象可以确保它所指向的其他对象长期存在。

3.5.4 对象发布(Publishing Objects)

IPFS is globally distributed. It is designed to allow the files of millions of users to coexist together. The DHT, with content-hash addressing, allows publishing objects in a fair, secure, and entirely distributed way. Anyone can publish an object by simply adding its key to the DHT, adding themselves as a peer, and giving other users the object’s path. Note that Objects are essentially immutable, just like in Git. New versions hash differently, and thus are new objects. Tracking versions is the job of additional versioning objects.

IPFS是全球分布式的,它的设计目标是让数百万用户的文件共存。带有内容哈希寻址的DHT让发布对象公平、安全且完全分布式:任何人只需把对象的key加入DHT、把自己加为对等节点、再把对象的路径告诉其他用户,就可以发布对象。注意,对象本质上是不可变的,就像Git中一样:新版本的哈希不同,因此是新对象。跟踪版本是额外的版本控制对象的工作。

3.5.5 对象级加密(Object-level Cryptography)

IPFS is equipped to handle object-level cryptographic operations. An encrypted or signed object is wrapped in a special frame that allows encryption or verification of the raw bytes.

IPFS支持对象级别加密操作。加密或签名的对象被封装在一个特殊的框架中,允许对原始字节进行加密或验证。

type EncryptedObject struct {
    Object []byte
    // raw object data encrypted
    Tag []byte
    // optional tag for encryption groups
}

type SignedObject struct {
    Object []byte
    // raw object data signed
    Signature []byte
    // hmac signature
    PublicKey Multihash
    // multihash identifying key
}

Cryptographic operations change the object’s hash, defining a different object. IPFS automatically verifies signatures, and can decrypt data with user-specified keychains. Links of encrypted objects are protected as well, making traversal impossible without a decryption key. It is possible to have a parent object encrypted under one key, and a child under another or not at all. This secures links to shared objects.

加密操作会改变对象的哈希,从而定义出一个不同的对象。IPFS自动验证签名,并可以用用户指定的钥匙串解密数据。加密对象的链接同样受到保护,没有解密密钥就无法遍历。父对象可以用一个密钥加密,而子对象用另一个密钥加密或完全不加密,这保护了指向共享对象的链接。
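
以SignedObject为例,下面的Go草图用标准库的HMAC-SHA256演示签名对象的封装与验证(密钥管理与PublicKey字段从略;sign、verify为本文所取的名称):

package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "fmt"
)

type SignedObject struct {
    Object    []byte // raw object data signed
    Signature []byte // hmac signature
}

// sign:把原始对象字节封装为带HMAC签名的对象
func sign(obj, key []byte) SignedObject {
    m := hmac.New(sha256.New, key)
    m.Write(obj)
    return SignedObject{Object: obj, Signature: m.Sum(nil)}
}

// verify:重新计算HMAC并做常数时间比较
func verify(s SignedObject, key []byte) bool {
    m := hmac.New(sha256.New, key)
    m.Write(s.Object)
    return hmac.Equal(s.Signature, m.Sum(nil))
}

func main() {
    key := []byte("key from user keychain")
    s := sign([]byte("raw object data"), key)
    fmt.Println("verified:", verify(s, key))
    // 包装后的字节不同于原始对象,因此其哈希也不同,即一个新的对象
}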

3.6 文件(Files)

IPFS also defines a set of objects for modeling a versioned filesystem on top of the Merkle DAG. This object model is similar to Git’s:

  1. block: a variable-size block of data.
  2. list: a collection of blocks or other lists.
  3. tree: a collection of blocks, lists, or other trees.
  4. commit: a snapshot in the version history of a tree.

IPFS还定义了一组对象,用于在Merkle DAG之上为带版本的文件系统建模。这个对象模型与Git的类似:

  1. 块(block):一个大小可变的数据块。
  2. 列表(list):块或其他列表的集合。
  3. 树(tree):块、列表或其他树的集合。
  4. 提交(commit):树在版本历史中的一个快照。

I hoped to use the Git object formats exactly, but had to depart to introduce certain features useful in a distributed filesystem, namely (a) fast size lookups (aggregate byte sizes have been added to objects), (b) large file deduplication (adding a list object), and (c) embedding of commits into trees. However, IPFS File objects are close enough to Git that conversion between the two is possible. Also, a set of Git objects can be introduced to convert without losing any information (unix file permissions, etc).

我本希望完全沿用Git的对象格式,但为了引入某些在分布式文件系统中有用的特性,不得不有所偏离,即:(a)快速的大小查找(对象中加入了累计字节大小),(b)大文件去重(加入list对象),(c)把commit嵌入tree中。不过,IPFS的文件对象与Git足够接近,两者之间可以相互转换。而且,可以引入一组Git对象来做转换,而不丢失任何信息(unix文件权限等)。

Notation: File object formats below use JSON. Note that this structure is actually binary encoded using protobufs, though ipfs includes import/export to JSON.

说明:下面的文件对象格式用JSON表示。注意,这个结构实际上是用protobuf做二进制编码的,不过ipfs支持与JSON的相互导入导出。

3.6.1 文件对象:blob(File Object: blob)

The blob object contains an addressable unit of data, and represents a file. IPFS Blocks are like Git blobs or filesystem data blocks. They store the users’ data. Note that IPFS files can be represented by both lists and blobs. Blobs have no links.

blob对象包含一个可寻址的数据单元,表示一个文件。IPFS的blob类似Git的blob或文件系统的数据块,存储用户的数据。注意,IPFS文件既可以用list表示,也可以用blob表示。blob没有links。

{
    "data": "some data here",
    // blobs have no links
}

3.6.2 文件对象:list(File Object: list)

The list object represents a large or deduplicated file made up of several IPFS blobs concatenated together. lists contain an ordered sequence of blob or list objects. In a sense, the IPFS list functions like a filesystem file with indirect blocks. Since lists can contain other lists, topologies including linked lists and balanced trees are possible. Directed graphs where the same node appears in multiple places allow in-file deduplication. Of course, cycles are not possible, as enforced by hash addressing.

list对象表示由多个IPFS blob串接而成的大文件或去重文件。list包含blob或list对象的有序序列。某种程度上,IPFS的list就像文件系统中带间接块的文件。由于list可以包含其他list,链表和平衡树等拓扑都是可能的。同一节点出现在多个位置的有向图,支持了文件内去重。当然,环是不可能出现的,这由哈希寻址强制保证。

{
    "data": ["blob", "list", "blob"],
    // lists have an array of object types as data
    "links": [
        {
            "hash": "XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x",
            "size": 189458
        },
        {
            "hash": "XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5",
            "size": 19441
        },
        {
            "hash": "XLWVQDqxo9Km9zLyquoC9gAP8CL1gWnHZ7z",
            "size": 5286
        }
        // lists have no names in links
    ]
}

3.6.3 文件对象:tree(File Object: tree)

The tree object in IPFS is similar to Git’s: it represents a directory, a map of names to hashes. The hashes reference blobs, lists, other trees, or commits. Note that traditional path naming is already implemented by the Merkle DAG.
IPFS中的tree对象与Git中的类似:它表示一个目录,一张名字到哈希的映射表。哈希引用blob、list、其他tree或commit。注意,传统的路径命名早已由Merkle DAG实现。

{
    "data": ["blob", "list", "blob"],
    // trees have an array of object types as data
    "links": [
        {
            "hash": "XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x",
            "name": "less",
            "size": 189458
        },
        {
            "hash": "XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5",
            "name": "script",
            "size": 19441
        },
        {
            "hash": "XLWVQDqxo9Km9zLyquoC9gAP8CL1gWnHZ7z",
            "name": "template",
            "size": 5286
        }
        // trees do have names
    ]
}

3.6.4 文件对象:commit(File Object: commit)

The commit object in IPFS represents a snapshot in the version history of any object. It is similar to Git’s, but can reference any type of object. It also links to author objects.
IPFS中的commit对象表示任意对象版本历史中的一个快照。它与Git的类似,但可以引用任何类型的对象,还可以链接到author对象。

{
    "data": {
        "type": "tree",
        "date": "2014-09-20 12:44:06Z",
        "message": "This is a commit message."
    },
    "links": [
        {
            "hash": "XLa1qMBKiSEEDhojb9FFZ4tEvLf7FEQdhdU",
            "name": "parent",
            "size": 25309
        },
        {
            "hash": "XLGw74KAy9junbh28x7ccWov9inu1Vo7pnX",
            "name": "object",
            "size": 5198
        },
        {
            "hash": "XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm",
            "name": "author",
            "size": 109
        }
    ]
}

Figure 2: Sample Object Graph
Figure 3: Sample Objects

> ipfs file-cat <ccc111-hash> --json
{
    "data": {
        "type": "tree",
        "date": "2014-09-20 12:44:06Z",
        "message": "This is a commit message."
    },
    "links": [
        {
            "hash": "<ccc000-hash>",
            "name": "parent",
            "size": 25309
        },
        {
            "hash": "<ttt111-hash>",
            "name": "object",
            "size": 5198
        },
        {
            "hash": "<aaa111-hash>",
            "name": "author",
            "size": 109
        }
    ]
}

> ipfs file-cat <ttt111-hash> --json
{
    "data": ["tree", "tree", "blob"],
    "links": [
        {
            "hash": "<ttt222-hash>",
            "name": "ttt222-name",
            "size": 1234
        },
        {
            "hash": "<ttt333-hash>",
            "name": "ttt333-name",
            "size": 3456
        },
        {
            "hash": "<bbb222-hash>",
            "name": "bbb222-name",
            "size": 22
        }
    ]
}

> ipfs file-cat <bbb222-hash> --json
{
    "data": "blob222 data",
    "links": []
}

3.6.5 版本控制(Version control)

The commit object represents a particular snapshot in the version history of an object. Comparing the objects (and children) of two different commits reveals the differences between two versions of the filesystem. As long as a single commit and all the children objects it references are accessible, all preceding versions are retrievable and the full history of the filesystem changes can be accessed. This falls out of the Merkle DAG object model.

commit对象表示某个对象版本历史中的一个特定快照。比较两个不同commit的对象(及其子对象),就能看出文件系统两个版本之间的差异。只要某个commit及其引用的所有子对象是可访问的,就能取回所有先前的版本,文件系统变更的完整历史也都可以访问。这是Merkle DAG对象模型自然带来的结果。

The full power of the Git version control tools is available to IPFS users. The object model is compatible, though not the same. It is possible to (a) build a version of the Git tools modified to use the IPFS object graph, (b) build a mounted FUSE filesystem that mounts an IPFS tree as a Git repo, translating Git filesystem read/writes to the IPFS formats.

Git版本控制工具的全部能力对IPFS用户都是可用的。两者的对象模型虽不相同,但是兼容的。可以:(a)构建一个改用IPFS对象图的修改版Git工具;(b)构建一个挂载的FUSE文件系统,把IPFS的tree挂载为Git仓库,把对Git文件系统的读/写转换为IPFS格式。

3.6.6 文件系统路径(Filesystem Paths)

As we saw in the Merkle DAG section, IPFS objects can be traversed with a string path API. The IPFS File Objects are designed to make mounting IPFS onto a UNIX filesystem simpler. They restrict trees to have no data, in order to represent them as directories. And commits can either be represented as directories or hidden from the filesystem entirely.

正如我们在Merkle DAG一节所见,IPFS对象可以通过字符串路径API遍历。IPFS文件对象的设计,让把IPFS挂载到UNIX文件系统上更加简单:它们限制tree不携带数据,以便把tree表示为目录;commit则既可以表示为目录,也可以对文件系统完全隐藏。

3.6.7 将文件分割成Lists和Blobs(Splitting Files into Lists and Blobs)

One of the main challenges with versioning and distributing large files is finding the right way to split them into independent blocks. Rather than assume it can make the right decision for every type of file, IPFS offers the following alternatives:

(a) Use Rabin Fingerprints [?] as in LBFS [?] to pick suitable block boundaries.

(b) Use the rsync [?] rolling-checksum algorithm, to detect blocks that have changed between versions.

(c) Allow users to specify block-splitting functions highly tuned for specific files.

版本控制和分发大文件的主要挑战之一,是找到正确的方式把它们分割成独立的块。IPFS并不假设自己能为每种文件类型都做出正确决定,而是提供了以下几种选择(见列表后的示意代码):

(a) 就像在LBFS[?]中一样使用Rabin Fingerprints[?]来选择一个比较合适的块边界。

(b) 使用rsync[?]滚动校验和(rolling-checksum)算法,来检测块在版本之间的变化。

(c) 允许用户为特定文件指定高度优化的块分割(block-splitting)函数。
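
下面是内容定义分块的一个极简Go草图:滚动哈希低位为零处即为块边界,作为(a)(b)两类思路的简化替身。真实实现会用Rabin指纹并加上最小/最大块长约束;window、mask、prime等参数均为本文假设:

package main

import "fmt"

const (
    window = 16   // 滚动窗口字节数
    mask   = 0xFF // 低8位为零即切块,平均块长约256字节
    prime  = 31
)

// split:返回按内容切出的块;相同内容在不同版本中产生相同的块边界
func split(data []byte) [][]byte {
    var blocks [][]byte
    var h uint64
    pow := uint64(1)
    for i := 1; i < window; i++ {
        pow *= prime // prime^(window-1),用于移出窗口最左字节
    }
    start := 0
    for i, b := range data {
        if i >= window {
            h -= uint64(data[i-window]) * pow
        }
        h = h*prime + uint64(b)
        if i-start+1 >= window && h&mask == 0 {
            blocks = append(blocks, data[start:i+1])
            start = i + 1
        }
    }
    if start < len(data) {
        blocks = append(blocks, data[start:])
    }
    return blocks
}

func main() {
    data := make([]byte, 4096)
    for i := range data {
        data[i] = byte(i * 7 % 251)
    }
    fmt.Println("blocks:", len(split(data)))
}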

3.6.8 路径查找性能(Path Lookup Performance)

Path-based access traverses the object graph. Retrieving each object requires looking up its key in the DHT, connecting to peers, and retrieving its blocks. This is considerable overhead, particularly when looking up paths with many components. This is mitigated by:

  • tree caching: since all objects are hash-addressed, they can be cached indefinitely. Additionally, trees tend to be small in size so IPFS prioritizes caching them over blobs.
  • flattened trees: for any given tree, a special flattened tree can be constructed to list all objects reachable from the tree. Names in the flattened tree would really be paths parting from the original tree, with slashes.

基于路径的访问需要遍历对象图。获取每个对象,都要求在DHT中查找它的key、连接到对等节点、再获取它的块。这是相当大的开销,尤其当路径由很多分段组成时。以下方法可以缓解:

  • 树缓存(tree caching):由于所有对象都是哈希寻址的,它们可以被无限期缓存。而且tree一般都比较小,所以IPFS会优先缓存tree而不是blob。
  • 扁平树(flattened trees):对任意给定的tree,都可以构建一个特殊的扁平树,列出从该tree可达的所有对象。扁平树中的名字实际上就是从原始tree出发的路径,以斜杠分隔。

For example, flattened tree for ttt111 above:

例如,对于上面的ttt111的flattened tree如下:

{
    "data": ["tree", "blob", "tree", "list", "blob", "blob"],
    "links": [
        {
            "hash": "<ttt222-hash>",
            "size": 1234,
            "name": "ttt222-name"
        },
        {
            "hash": "<bbb111-hash>",
            "size": 123,
            "name": "ttt222-name/bbb111-name"
        },
        {
            "hash": "<ttt333-hash>",
            "size": 3456,
            "name": "ttt333-name"
        },
        {
            "hash": "<lll111-hash>",
            "size": 587,
            "name": "ttt333-name/lll111-name"
        },
        {
            "hash": "<bbb222-hash>",
            "size": 22,
            "name": "ttt333-name/lll111-name/bbb222-name"
        },
        {
            "hash": "<bbb222-hash>",
            "size": 22,
            "name": "bbb222-name"
        }
    ]
}
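
构建扁平树就是对tree做一次深度优先遍历,把名字逐级用斜杠拼接。下面的Go草图演示这一过程(trees、flatten等名称为本文所取;blob/list在这里一律视为叶子):

package main

import "fmt"

type Link struct {
    Hash string
    Name string
    Size int
}

type Tree struct {
    Links []Link
}

// 示意的对象库:按哈希取tree;查不到的哈希视为叶子(blob/list)
var trees = map[string]*Tree{}

// flatten:深度优先遍历,把可达对象的名字展开成带斜杠的路径
func flatten(hash, prefix string, out *[]Link) {
    t, ok := trees[hash]
    if !ok {
        return
    }
    for _, l := range t.Links {
        name := l.Name
        if prefix != "" {
            name = prefix + "/" + l.Name
        }
        *out = append(*out, Link{Hash: l.Hash, Name: name, Size: l.Size})
        flatten(l.Hash, name, out)
    }
}

func main() {
    trees["ttt111"] = &Tree{Links: []Link{
        {Hash: "ttt222", Name: "ttt222-name", Size: 1234},
        {Hash: "bbb222", Name: "bbb222-name", Size: 22},
    }}
    trees["ttt222"] = &Tree{Links: []Link{
        {Hash: "bbb111", Name: "bbb111-name", Size: 123},
    }}
    var flat []Link
    flatten("ttt111", "", &flat)
    for _, l := range flat {
        fmt.Println(l.Name, l.Hash, l.Size)
    }
}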

3.7 IPNS:命名和可变状态(IPNS: Naming and Mutable State)

So far, the IPFS stack forms a peer-to-peer block exchange constructing a content-addressed DAG of objects. It serves to publish and retrieve immutable objects. It can even track the version history of these objects. However, there is a critical component missing: mutable naming. Without it, all communication of new content must happen off-band, sending IPFS links. What is required is some way to retrieve mutable state at the same path.

到目前为止,IPFS栈构成了一个点对点的块交换系统,构建出一个内容寻址的对象DAG。它可以发布和获取不可变对象,甚至能跟踪这些对象的版本历史。但还缺一个关键组件:可变命名。没有它,新内容的所有传达都必须在带外进行,靠发送IPFS链接来完成。我们需要的,是某种能在同一路径上获取可变状态的方法。

It is worth stating why - if mutable data is necessary in the end - we worked hard to build up an immutable Merkle DAG. Consider the properties of IPFS that fall out of the Merkle DAG: objects can be (a) retrieved via their hash, (b) integrity checked, (c) linked to others, and (d) cached indefinitely. In a sense:

值得说明的是:既然最终还是需要可变数据,我们为什么要花大力气构建一个不可变的Merkle DAG。想想Merkle DAG自然带给IPFS的那些特性:对象可以(a)通过哈希获取,(b)校验完整性,(c)链接到其他对象,(d)无限期缓存。从某种意义上说:

对象是永恒的(Objects are permanent)

These are the critical properties of a high-performance distributed system, where data is expensive to move across network links. Object content addressing constructs a web with (a) significant bandwidth optimizations, (b) untrusted content serving, (c) permanent links, and (d) the ability to make full permanent backups of any object and its references.

这些是高性能分布式系统的关键特性,在这类系统中,跨网络链路移动数据的代价很高。内容寻址的对象构建出这样一个Web:(a)显著的带宽优化,(b)不受信任的内容服务,(c)永久的链接,(d)能对任何对象及其引用做完整的永久备份。

The Merkle DAG, immutable content-addressed objects, and Naming, mutable pointers to the Merkle DAG, instantiate a dichotomy present in many successful distributed systems. These include the Git Version Control System, with its immutable objects and mutable references; and Plan9 [?], the distributed successor to UNIX, with its mutable Fossil [?] and immutable Venti [?] filesystems. LBFS [?] also uses mutable indices and immutable chunks.

Merkle DAG(不可变的内容寻址对象)与命名(指向Merkle DAG的可变指针),实例化了许多成功分布式系统中都存在的一种二分法。这些系统包括:Git版本控制系统,有不可变的对象和可变的引用;UNIX的分布式继承者Plan9[?],有可变的Fossil[?]和不可变的Venti[?]文件系统;LBFS[?]同样使用可变的索引和不可变的块。

3.7.1 自认证命名(Self-Certified Names)

Using the naming scheme from SFS [12, 11] gives us a way to construct self-certified names, in a cryptographically assigned global namespace, that are mutable. The IPFS scheme is as follows.

  1. Recall that in IPFS:
     NodeId = hash(node.PubKey)
  2. We assign every user a mutable namespace at:
     /ipns/<NodeId>
  3. A user can publish an Object to this path Signed by her private key, say at:
     /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/
  4. When other users retrieve the object, they can check the signature matches the public key and NodeId. This verifies the authenticity of the Object published by the user, achieving mutable state retrieval.

使用SFS[12,11]的命名方案,我们得到一种在加密指派的全局命名空间中构建自认证名称的方法,而且这些名称是可变的。IPFS的方案如下:

  1. 回想一下,在IPFS中:NodeId = hash(node.PubKey)
  2. 我们给每个用户分配一个可变的命名空间,位于:/ipns/<NodeId>
  3. 用户可以在此路径下发布一个用自己私钥签名的对象,比如:/ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/
  4. 当其他用户获取该对象时,可以检查签名是否与公钥和NodeId匹配。这验证了用户所发布对象的真实性,实现了可变状态的获取。
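
下面的Go草图模拟“把对象哈希发到路由系统、再按NodeId解析”的流程(签名与验签从略;routing在这里用一个map顶替论文中的routing.setValue;publish、resolve为本文所取的名称):

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

// 示意的路由系统:NodeId -> 最新对象哈希(对应routing.setValue)
var routing = map[string]string{}

func hash(b []byte) string {
    h := sha256.Sum256(b)
    return hex.EncodeToString(h[:8]) // 截断只是为了输出简洁
}

// publish:先把对象按常规不可变对象存好(此处从略),
// 再把它的哈希作为元数据发布到路由系统;真实流程还要用私钥签名
func publish(nodeId string, object []byte) {
    routing[nodeId] = hash(object)
}

// resolve:取/ipns/<NodeId>当前指向的对象哈希
func resolve(nodeId string) (string, bool) {
    h, ok := routing[nodeId]
    return h, ok
}

func main() {
    nodeId := "XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm"
    publish(nodeId, []byte("homepage v1"))
    publish(nodeId, []byte("homepage v2")) // 可变:同一名字指向新内容
    if h, ok := resolve(nodeId); ok {
        fmt.Printf("/ipns/%s -> /ipfs/%s\n", nodeId, h)
    }
}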

注意以下细节(Note the following details:)

  • The ipns (InterPlanetary Name Space) separate prefix is to establish an easily recognizable distinction between mutable and immutable paths, for both programs and human readers.
  • 单独的ipns(InterPlanetary Name Space)前缀,是为了让程序和人类读者都能轻易区分可变路径与不可变路径。
  • Because this is not a content-addressed object, publishing it relies on the only mutable state distribution system in IPFS, the Routing system. The process is (1) publish the object as a regular immutable IPFS object, (2) publish its hash on the Routing system as a metadata value:
  • 因为这不是内容寻址的对象,发布它就要依靠IPFS中唯一的可变状态分发系统:路由系统。过程是:(1)先把对象作为常规的不可变IPFS对象发布,(2)再把它的哈希作为元数据值发布到路由系统上:
    routing.setValue(NodeId, <ns-object-hash>)
  • Any links in the Object published act as sub-names in the namespace:
  • 发布对象中的任何links,都在命名空间中充当子名称:
    /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/
    /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/docs
    /ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/docs/ipfs
  • it is advised to publish a commit object, or some other object with a version history, so that clients may be able to find old names. This is left as a user option, as it is not always desired.
  • 建议发布commit对象,或其他带版本历史的对象,这样客户端就能找到旧的名称。是否这样做留给用户选择,因为这并非总是需要的。

Note that when users publish this Object, it cannot be published in the same way
注意用户不能使用这样的方式来发布对象。

3.7.2 人类友好命名( Human Friendly Names)

While IPNS is indeed a way of assigning and reassigning names, it is not very user friendly, as it exposes long hash values as names, which are notoriously hard to remember. These work for URLs, but not for many kinds of offline transmission. Thus, IPFS increases the user-friendliness of IPNS with the following techniques.

IPNS确实是一种分配和重新分配名称的方式,但对用户不太友好:它把又长又难记的哈希值暴露为名称。这样的名称用于URL还行,但在许多离线传播的场合就不行了。因此,IPFS用下面这些技术提升IPNS的用户友好度。

Peer Links.

As encouraged by SFS, users can link other users’ Objects directly into their own Objects (namespace, home, etc).This has the benefit of also creating a web of trust (and supports the old Certificate Authority model):

正如SFS所倡导的,用户可以把其他用户的对象直接链接到自己的对象里(命名空间、home目录等)。这还有一个好处:建立起一个信任网络(也支持旧的证书颁发机构模型):

# Alice links to Bob
ipfs link /<alice-pk-hash>/friends/bob /<bob-pk-hash>
# Eve links to Alice
ipfs link /<eve-pk-hash>/friends/alice /<alice-pk-hash>
# Eve also has access to Bob
/<eve-pk-hash>/friends/alice/friends/bob
# access Verisign certified domains
/<verisign-pk-hash>/foo.com

DNS TXT IPNS Records.

If /ipns/<domain> is a valid domain name, IPFS looks up key ipns in its DNS TXT records. IPFS interprets the value as either an object hash or another IPNS path:

如果/ipns/<domain>是一个有效的域名,IPFS会在它的DNS TXT记录中查找键ipns,并把查到的值解释为一个对象哈希或另一个IPNS路径:

# this DNS TXT record
ipfs.benet.ai. TXT "ipfs=XLF2ipQ4jD3U ..."
# behaves as symlink
ln -s /ipns/XLF2ipQ4jD3U /ipns/fs.benet.ai

Proquint Pronounceable Identifiers.
There have always been schemes to encode binary into pronounceable words. IPNS supports Proquint [?]. Thus:
一直都有把二进制编码成可发音单词的方案。IPNS支持Proquint[?]。如下:

# this proquint phrase
/ipns/dahih-dolij-sozuk-vosah-luvar-fuluh
# will resolve to corresponding
/ipns/KhAwNprxYVxKqpDZ

Name Shortening Services.

Services are bound to spring up that will provide name shortening as a service, offering up their namespaces to users.This is similar to what we see today with DNS and Web URLs:

必然会涌现出提供名称缩短服务的站点,把它们的命名空间提供给用户。这类似于我们今天看到的DNS和Web URL:

# User can get a link from
/ipns/shorten.er/foobar
# To her own namespace
/ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm

3.8 IPFS用途(Using IPFS)

IPFS is designed to be used in a number of different ways. Here are just some of the usecases I will be pursuing:

  1. As a mounted global filesystem, under /ipfs and /ipns.
  2. As a mounted personal sync folder that automatically versions, publishes, and backs up any writes.
  3. As an encrypted file or data sharing system.
  4. As a versioned package manager for all software.
  5. As the root filesystem of a Virtual Machine.
  6. As the boot filesystem of a VM (under a hypervisor).
  7. As a database: applications can write directly to the Merkle DAG data model and get all the versioning, caching, and distribution IPFS provides.
  8. As a linked (and encrypted) communications platform.
  9. As an integrity checked CDN for large files (without SSL).
  10. As an encrypted CDN.
  11. On webpages, as a web CDN.
  12. As a new Permanent Web where links do not die.

IPFS设计为可以使用多种不同的方式来使用的,下面就是一些我一直在思考的使用场景:

  1. 作为一个挂载的全局文件系统,挂载在/ipfs和/ipns下
  2. 作为一个挂载的个人同步文件夹,自动的进行版本管理,发布,以及备份任何的写入
  3. 作为一个加密的文件或者数据共享系统
  4. 作为所有软件的版本包管理者
  5. 作为虚拟机器的根文件系统
  6. 作为VM的启动文件系统 (在管理程序下)
  7. 作为一个数据库:应用可以直接写入Merkle DAG数据模型,获得IPFS提供的全部版本管理、缓存和分发能力
  8. 作为一个linked(和加密的)通信平台
  9. 作为一个为大文件的完整性检查CDN(不使用SSL的情况下)
  10. 作为一个加密的CDN
  11. 在网页上,作为一个web CDN
  12. 作为一个链接永不失效的、新的永久Web(Permanent Web)

The IPFS implementations target:

  • (a) an IPFS library to import in your own applications.
  • (b) commandline tools to manipulate objects directly.
  • (c) mounted file systems, using FUSE [?] or as kernel modules.

IPFS的实现目标:

  • (a) 一个可以导入到你自己应用中的IPFS库
  • (b) 可以直接操作对象的命令行工具
  • (c) 挂载的文件系统,使用FUSE[?]或作为内核模块

4. 未来(THE FUTURE)

The ideas behind IPFS are the product of decades of successful distributed systems research in academia and open source. IPFS synthesizes many of the best ideas from the most successful systems to date. Aside from BitSwap, which is a novel protocol, the main contribution of IPFS is this coupling of systems and synthesis of designs.

IPFS的思想是学术界和开源界数十年成功的分布式系统研究的产物。IPFS综合了迄今为止最成功的那些系统中的许多优秀思想。除了BitSwap这个新颖的协议之外,IPFS的主要贡献就在于这种系统的耦合与设计的综合。

IPFS is an ambitious vision of new decentralized Internet infrastructure, upon which many different kinds of applications can be built. At the bare minimum, it can be used as a global, mounted, versioned filesystem and namespace, or as the next generation file sharing system. At its best, it could push the web to new horizons, where publishing valuable information does not impose hosting it on the publisher but upon those interested, where users can trust the content they receive without trusting the peers they receive it from, and where old but important files do not go missing. IPFS looks forward to bringing us toward the Permanent Web.

IPFS是对新一代去中心化互联网基础设施的一个雄心勃勃的愿景,在它之上可以构建许多不同类型的应用。最低限度,它可以用作一个全局的、可挂载的、带版本的文件系统和命名空间,或者下一代的文件共享系统。在最好的情况下,它能把Web推向新的高度:发布有价值的信息不再强制发布者托管它,而是由感兴趣的人托管;用户可以信任收到的内容,而无需信任提供内容的节点;旧的但重要的文件也不会丢失。IPFS期待着带我们走向永久的Web。

5. 致谢(ACKNOWLEDGMENTS)

IPFS is the synthesis of many great ideas and systems. It would be impossible to dare such ambitious goals without standing on the shoulders of such giants. Personal thanks to David Dalrymple, Joe Zimmerman, and Ali Yahya for long discussions on many of these ideas, in particular: exposing the general Merkle DAG (David, Joe), rolling hash blocking (David), and s/kademlia sybill protection (David, Ali). And special thanks to David Mazieres, for his ever brilliant ideas.

IPFS是许多伟大思想和系统的综合。如果不是站在这些巨人的肩膀上,根本不敢奢望这样雄心勃勃的目标。我个人感谢David Dalrymple、Joe Zimmerman和Ali Yahya,与他们就其中许多想法进行了长时间的讨论,尤其是:提出通用的Merkle DAG(David、Joe)、滚动哈希分块(David)和s/kademlia的女巫攻击防护(David、Ali)。特别感谢David Mazieres,感谢他一如既往的绝妙想法。

6. 引用(REFERENCES)

[1] I. Baumgart and S. Mies. S/kademlia: A practicable approach towards secure key-based routing. In Parallel and Distributed Systems, 2007 International Conference on, volume 2, pages 1-8. IEEE, 2007.

[2] I. BitTorrent. BitTorrent and µTorrent software surpass 150 million user milestone, Jan. 2012.

[3] B. Cohen. Incentives build robustness in bittorrent. In Workshop on Economics of Peer-to-Peer systems,volume 6, pages 68-72, 2003.

[4] J. Dean and S. Ghemawat. leveldb: a fast and lightweight key/value database library by Google, 2011.

[5] M. J. Freedman, E. Freudenthal, and D. Mazieres. Democratizing content publication with coral. In NSDI, volume 4, pages 18-18, 2004.

[6] J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols, M. Satyanarayanan, R. N. Sidebotham, and M. J. West. Scale and performance in a distributed file system. ACM Transactions on Computer Systems (TOCS), 6(1):51-81, 1988.

[7] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, et al. Oceanstore: An architecture for global-scale persistent storage. ACM Sigplan Notices, 35(11):190-201, 2000.

[8] D. Levin, K. LaCurts, N. Spring, and B. Bhattacharjee. Bittorrent is an auction: analyzing and improving bittorrent’s incentives. In ACM SIGCOMM Computer Communication Review, volume 38, pages 243-254. ACM, 2008.

[9] A. J. Mashtizadeh, A. Bittau, Y. F. Huang, and D. Mazieres. Replication, history, and grafting in the ori file system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 151-166. ACM, 2013.

[10] P. Maymounkov and D. Mazieres. Kademlia: A peer-to-peer information system based on the xor metric. In Peer-to-Peer Systems, pages 53-65. Springer, 2002.

[11] D. Mazieres and F. Kaashoek. Self-certifying file system. 2000.

[12] D. Mazieres and M. F. Kaashoek. Escaping the evils of centralized control with self-certifying pathnames. In Proceedings of the 8th ACM SIGOPS European workshop on Support for composing distributed applications, pages 118-125. ACM, 1998.

[13] J. Rosenberg and A. Keranen. Interactive connectivity establishment (ice): A protocol for network address translator (nat) traversal for offer/answer protocols. 2013.

[14] S. Shalunov, G. Hazel, J. Iyengar, and M. Kuehlewind. Low extra delay background transport (ledbat). draft-ietf-ledbat-congestion-04.txt, 2010.

[15] R. R. Stewart and Q. Xie. Stream control transmission protocol (SCTP): a reference guide. Addison-Wesley Longman Publishing Co., Inc., 2001.

[16] L. Wang and J. Kangasharju. Measuring large-scale distributed systems: case of bittorrent mainline dht. In Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International Conference on, pages 1-10. IEEE, 2013.