
SDF (Signed Distance Field): Fundamentals and Computation - 知乎

by gzhao01, first published in the column "computer graphics"

1. SDF fundamentals

1.1 A brief introduction to SDF

Generally speaking, both 2D and 3D assets can be stored either implicitly or explicitly. A 3D model, for example, can be stored directly as a mesh, or represented with an SDF, a point cloud, or a neural representation (neural rendering); the same goes for 2D assets (here, textures). A texture is usually stored directly as RGB or HSV values, but such an image becomes jagged when magnified, so a sharp result requires a lot of storage. That calls for a vector-style representation, and for 2D textures the SDF was created to meet exactly this need.

SDF (Signed Distance Field) has applications in both 3D and 2D. In 3D, ray tracing costs too much performance, so an SDF is often used as an implicit representation of an object and combined with ray marching to get close to ray-traced results; there are also applications such as DeepSDF, which represents models implicitly. In 2D, SDFs are commonly used for fonts, and the facial shadow texture in Genshin Impact's face rendering is also generated from an SDF.

The essence of an SDF is to store, for every point, the shortest distance to the shape: the shape defines a surface, points outside the surface have values greater than 0, and points inside have values less than 0 (figure omitted). The counterpart of the SDF is the unsigned distance field, in which every point inside the object has distance 0 and every point outside stores a positive distance to the nearest object. (Figure omitted: an unsigned distance field representing a straight line.) An SDF is simply an unsigned distance field with a sign added to indicate whether each point lies inside or outside the object.

1.2 Properties of the SDF

The important property of a 2D SDF mapped onto [0,1] is that 0.5 marks the object boundary, and the GPU can perform bilinear interpolation in hardware. As a result, no matter how much the SDF image is magnified, thresholding it to show only the region above 0.5 keeps the picture sharp.

2. Generating a 2D SDF from a grayscale image

2.1 Brute force

1) A simple approach: the distance field to the foreground is each point's distance to the nearest foreground point (whether the point itself counts is unclear), and the distance field to the background is each point's distance to the nearest background point. Then: signed distance field = distance field to the background - distance field to the foreground. The values computed this way are both positive and negative; to store them in a texture they still have to be remapped. (sdfDis - minDis) / (maxDis - minDis) maps them into [0,1], and multiplying by 255 maps them into [0,255]. Here is a rather lame O(n^4) implementation:

import math

import cv2 as cv
import numpy as np

width = 100

# use a float type; uint8 would overflow when the two distance fields are subtracted below
min_dis = np.ones((width, width), dtype=np.float32) * (width + 1)   # distance to nearest white pixel
min_dis2 = np.ones((width, width), dtype=np.float32) * (width + 1)  # distance to nearest black pixel

ori_img = cv.imread('a.png', 0)
show_img = cv.resize(ori_img, (width, width), interpolation=cv.INTER_AREA)

def getDis(x, y, x1, y1):
    return math.sqrt((x - x1) * (x - x1) + (y - y1) * (y - y1))

# white pixels lie outside the surface; resizing can leave values slightly
# below 255, so treat anything bright as white
def isWhite(x, y):
    return show_img[x][y] >= 128

# distance from each point to the nearest white pixel
for x in range(width):
    for y in range(width):
        tmp_min_dis = float('inf')   # width + 1, as originally written, can clip long diagonals
        for x1 in range(width):
            for y1 in range(width):
                if isWhite(x1, y1):
                    tmp_min_dis = min(tmp_min_dis, getDis(x, y, x1, y1))
        min_dis[x][y] = tmp_min_dis

# distance from each point to the nearest black pixel
for x in range(width):
    for y in range(width):
        tmp_min_dis = float('inf')
        for x1 in range(width):
            for y1 in range(width):
                if not isWhite(x1, y1):
                    tmp_min_dis = min(tmp_min_dis, getDis(x, y, x1, y1))
        min_dis2[x][y] = tmp_min_dis

# signed distance: positive on the white (outside) side, negative inside
final_dis = min_dis2 - min_dis

# remap to [0, 255] for display (subtract the minimum rather than adding it)
min_val = np.min(final_dis)
max_val = np.max(final_dis)
final_dis = (final_dis - min_val) * 255 / (max_val - min_val)

cv.imshow("img", final_dis.astype(np.uint8))
cv.waitKey(0)
cv.destroyAllWindows()
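For comparison, the same field can be computed far more quickly with an off-the-shelf Euclidean distance transform. A minimal sketch of my own (not the article's code; it assumes SciPy is installed and reuses the same 'a.png' input), which replaces the two O(n^4) loop nests above with two distance-transform calls:

import cv2 as cv
import numpy as np
from scipy.ndimage import distance_transform_edt

img = cv.imread('a.png', 0)
white = img >= 128                               # white pixels lie outside the surface

dist_to_black = distance_transform_edt(white)    # > 0 on white pixels, 0 on black ones
dist_to_white = distance_transform_edt(~white)   # > 0 on black pixels, 0 on white ones
sdf = dist_to_black - dist_to_white              # positive outside, negative inside

# remap to [0, 255] for storage or display, as described above
sdf_img = ((sdf - sdf.min()) / (sdf.max() - sdf.min()) * 255).astype(np.uint8)
cv.imwrite('a_sdf.png', sdf_img)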

The brute-force result is shown in the original post. The complexity is so high that only an image this small can be processed, and the result may even be wrong, but the core idea is as described. 2) There is another brute-force variant: suppose the source texture is 2000x2000 and is shrunk onto a 500x500 target texture. Scan the source texture once, find the corresponding point in the target texture, classify every point as inside or outside, and for each point compute the distance to the nearest point of the opposite type (i.e. for an inside point, the distance to the nearest outside point, and vice versa).

2.2 Saito's algorithm

The Euclidean distance computation can be decomposed, and Saito's algorithm is based on this principle: 欧几里得距离转换(EDT)算法_tianwaifeimao的博客-CSDN博客_meijster算法.

2.3 The 8ssedt algorithm

Signed Distance Fields
欧克欧克: Signed Distance Field

The links above describe the 8ssedt algorithm and provide a C++ implementation that can generate an SDF image in linear time. Below is an annotated version of that C++ code:

#include "SDL/sdl.h"

#include <math.h> /* for sqrt(); the original header name was lost in extraction */

#define WIDTH 256

#define HEIGHT 256

struct Point

{

// dx, dy: offset from this pixel to its (currently known) nearest object pixel

int dx, dy;

int DistSq() const { return dx*dx + dy*dy; }

};

struct Grid

{

Point grid[HEIGHT][WIDTH];

};

Point inside = { 0, 0 };

Point empty = { 9999, 9999 };

Grid grid1, grid2;

Point Get( Grid &g, int x, int y )

{

// OPTIMIZATION: you can skip the edge check code if you make your grid

// have a 1-pixel gutter.

if ( x >= 0 && y >= 0 && x < WIDTH && y < HEIGHT )

return g.grid[y][x];

else

return empty;

}

void Put( Grid &g, int x, int y, const Point &p )

{

g.grid[y][x] = p;

}

void Compare( Grid &g, Point &p, int x, int y, int offsetx, int offsety )

{

// fetch the neighbouring point at the given offset

Point other = Get( g, x+offsetx, y+offsety );

// shift its dx/dy so the stored offset is measured from the current point

other.dx += offsetx;

other.dy += offsety;

if (other.DistSq() < p.DistSq())

p = other;

}

// Now all we have to do is run the propagation algorithm. See the paper for exactly what's happening here,

// but basically the idea is to see what the neighboring pixel has for its dx/dy,

// then try adding it onto ours to see if it's better than what we already have.

void GenerateSDF( Grid &g )

{

// Pass 0

// Pass 0: for each point, compare against the already-visited neighbours (left, right, and the three cells in the previous row) and keep whichever offset gives the shortest distance

for (int y=0;y<HEIGHT;y++)

{

for (int x=0;x<WIDTH;x++)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, -1, 0 );

Compare( g, p, x, y, 0, -1 );

Compare( g, p, x, y, -1, -1 );

Compare( g, p, x, y, 1, -1 );

Put( g, x, y, p );

}

for (int x=WIDTH-1;x>=0;x--)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, 1, 0 );

Put( g, x, y, p );

}

}

// Pass 1

for (int y=HEIGHT-1;y>=0;y--)

{

for (int x=WIDTH-1;x>=0;x--)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, 1, 0 );

Compare( g, p, x, y, 0, 1 );

Compare( g, p, x, y, -1, 1 );

Compare( g, p, x, y, 1, 1 );

Put( g, x, y, p );

}

for (int x=0;x<WIDTH;x++)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, -1, 0 );

Put( g, x, y, p );

}

}

}

int main( int argc, char* args[] )

{

if ( SDL_Init( SDL_INIT_VIDEO ) == -1 )

return 1;

SDL_Surface *screen = SDL_SetVideoMode( WIDTH, HEIGHT, 32, SDL_SWSURFACE );

if ( !screen )

return 1;

// Initialize the grid from the BMP file.

SDL_Surface *temp = SDL_LoadBMP( "test.bmp" );

temp = SDL_ConvertSurface( temp, screen->format, SDL_SWSURFACE );

SDL_LockSurface( temp );

for( int y=0;y<HEIGHT;y++ )

{

for ( int x=0;x<WIDTH;x++ )

{

Uint8 r,g,b;

Uint32 *src = ( (Uint32 *)( (Uint8 *)temp->pixels + y*temp->pitch ) ) + x;

SDL_GetRGB( *src, temp->format, &r, &g, &b );

// Points inside get marked with a dx/dy of zero.

// Points outside get marked with an infinitely large distance.

// Two grids: in one, inside points are set to 0 and outside points to "infinity"; the other grid is the opposite.

if ( g < 128 )

{

Put( grid1, x, y, inside );

Put( grid2, x, y, empty );

} else {

Put( grid2, x, y, inside );

Put( grid1, x, y, empty );

}

}

}

SDL_UnlockSurface( temp );

// Generate the SDF.

GenerateSDF( grid1 );

GenerateSDF( grid2 );

// Render out the results.

SDL_LockSurface( screen );

for( int y=0;y<HEIGHT;y++ )

{

for ( int x=0;x<WIDTH;x++ )

{

// Calculate the actual distance from the dx/dy

// convert the stored offset into an actual distance to the current point

int dist1 = (int)( sqrt( (double)Get( grid1, x, y ).DistSq() ) );

int dist2 = (int)( sqrt( (double)Get( grid2, x, y ).DistSq() ) );

int dist = dist1 - dist2;

// Clamp and scale it, just for display purposes.

int c = dist*3 + 128;

if ( c < 0 ) c = 0;

if ( c > 255 ) c = 255;

Uint32 *dest = ( (Uint32 *)( (Uint8 *)screen->pixels + y*screen->pitch ) ) + x;

*dest = SDL_MapRGB( screen->format, c, c, c );

}

}

SDL_UnlockSurface( screen );

SDL_Flip( screen );

// Wait for a keypress

SDL_Event event;

while( true )

{

if ( SDL_PollEvent( &event ) )

switch( event.type )

{

case SDL_QUIT:

case SDL_KEYDOWN:

return true;

}

}

return 0;

}

The code's input and output images are shown in the original post. The blog post 记一次代码优化(C++) further describes a compilation-level optimization of the 8ssedt method.

3. References

https://en.wikipedia.org/wiki/Signed_distance_function
Signed Distance Fields Part 1: Unsigned Distance Fields
Signed Distance Fields Part 2: Solid geometry
小鱼干: SDF(signed distance field) 算法(转)
欧几里得距离转换(EDT)算法_tianwaifeimao的博客-CSDN博客_meijster算法
Signed Distance Fields
欧克欧克: Signed Distance Field

Signed Distance Field - 知乎

First published in the column "Tiny TA的碎碎念", by 欧克欧克.

UE 4.26 shipped recently, bringing us one step closer to UE5. Many of UE's rendering features rely heavily on a technique called the Signed Distance Field, and the lead programmer of Black Myth: Wukong, which went viral a while ago, mentioned in a talk that the project uses SDFs for many of its effects. While building an SDF demo I read a lot of material and found plenty of articles on SDF fonts and their applications, but very few that explain what kind of technique an SDF actually is, and why it can solve so many problems that are awkward to handle with traditional approaches. This article therefore approaches SDF from a self-learner's perspective; corrections are welcome.

A signed distance field (有向距离场) comes in 2D and 3D variants, and its definition is very simple: every pixel (or voxel) records the distance between itself and the nearest object; the distance is negative if the point is inside an object and exactly 0 on the object boundary. Take a simple image as an example: define white as "object" and black as "empty". For an image with a circle in the middle, the corresponding SDF image looks like the one in the original post: for display, the signed distances are mapped into the [0,1] range, with 0.5 marking the boundary between inside and outside. The point at the circle's center is the darkest because it is deepest inside the object, while the four corners of the image are the whitest because they are farthest from the circle. With that rough picture in mind, let's start studying SDFs properly.

The SDF generation algorithm: 8ssedt

The definition of an SDF is simple, and generating an SDF image is not hard either: brute force works wonders. But the brute-force algorithm is quadratic in the number of pixels; at 1K resolution, a single run probably takes long enough to go downstairs and buy a cup of milk tea. Although SDFs are mostly generated offline, real-time needs can still arise, and as programmers we should strive for elegant code. 8ssedt is an algorithm that computes the SDF in linear time.

Before introducing 8ssedt, let's analyze the structure of the problem. Let a pixel value of 0 mean "empty" and 1 mean "object". For any pixel, finding the nearest object pixel falls into one of two cases: if the pixel's value is 1, it is itself the target, so the distance is 0; if the value is 0, the target lies somewhere around it, in any direction. The first case is trivial; the trouble is the second, so let's break it down. If one of the pixel's nearest neighbours (the four pixels at distance 1: up, down, left, right) happens to be 1, then this pixel's SDF value should be 1, since nothing can be closer. Next come the four diagonal neighbours; if one of them is 1, the pixel's SDF value should be sqrt(2). By the same reasoning, if we know the SDF values of all the pixels around the current pixel, then the current pixel's SDF value must be:

MinimumSDF(near.sdf + distance(now, near))

where near is a nearby pixel, now is the current pixel, near.sdf is near's SDF value, and distance is the distance between the two points. Looks familiar, doesn't it? This is exactly a dynamic-programming recurrence. Written out fully in pseudocode:

now.sdf = 999999;

if (now in object) {
    now.sdf = 0;
} else {
    foreach (near in nearPixel(now)) {
        now.sdf = min(now.sdf, near.sdf + distance(now, near));
    }
}
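Before the full listing, the recurrence can be prototyped in a few lines of Python (a sketch of my own, not the article's code). It relaxes every pixel against its eight neighbours until nothing changes, which yields a chamfer-style approximation of the Euclidean distance; 8ssedt itself does better by propagating (dx, dy) offset vectors instead of scalar distances, as the C++ below shows.

import math

def distance_field(mask):
    # mask[y][x] == 1 means "object", 0 means "empty"
    h, w = len(mask), len(mask[0])
    INF = float('inf')
    sdf = [[0.0 if mask[y][x] else INF for x in range(w)] for y in range(h)]
    steps = [(-1, -1, math.sqrt(2)), (0, -1, 1.0), (1, -1, math.sqrt(2)),
             (-1, 0, 1.0), (1, 0, 1.0),
             (-1, 1, math.sqrt(2)), (0, 1, 1.0), (1, 1, math.sqrt(2))]
    changed = True
    while changed:                     # keep relaxing until no pixel improves
        changed = False
        for y in range(h):
            for x in range(w):
                for dx, dy, d in steps:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h and sdf[ny][nx] + d < sdf[y][x]:
                        sdf[y][x] = sdf[ny][nx] + d
                        changed = True
    return sdf

# signed field: distance to the object (positive outside) minus
# distance to empty space (positive inside)
mask = [[1 if 2 <= x <= 5 and 2 <= y <= 5 else 0 for x in range(8)] for y in range(8)]
dist_to_object = distance_field(mask)
dist_to_empty = distance_field([[1 - v for v in row] for row in mask])
signed = [[dist_to_object[y][x] - dist_to_empty[y][x] for x in range(8)] for y in range(8)]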

With the core recurrence clear, we can go straight to the 8ssedt code:

#define WIDTH 256

#define HEIGHT 256

struct Point

{

int dx, dy;

int DistSq() const { return dx*dx + dy*dy; }

};

struct Grid

{

Point grid[HEIGHT][WIDTH];

};

Point inside = { 0, 0 };

Point empty = { 9999, 9999 };

Grid grid1, grid2;

Point Get( Grid &g, int x, int y )

{

// OPTIMIZATION: you can skip the edge check code if you make your grid

// have a 1-pixel gutter.

if ( x >= 0 && y >= 0 && x < WIDTH && y < HEIGHT )

return g.grid[y][x];

else

return empty;

}

void Put( Grid &g, int x, int y, const Point &p )

{

g.grid[y][x] = p;

}

void Compare( Grid &g, Point &p, int x, int y, int offsetx, int offsety )

{

Point other = Get( g, x+offsetx, y+offsety );

other.dx += offsetx;

other.dy += offsety;

if (other.DistSq() < p.DistSq())

p = other;

}

void GenerateSDF( Grid &g )

{

// Pass 0

for (int y=0;y<HEIGHT;y++)

{

for (int x=0;x<WIDTH;x++)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, -1, 0 );

Compare( g, p, x, y, 0, -1 );

Compare( g, p, x, y, -1, -1 );

Compare( g, p, x, y, 1, -1 );

Put( g, x, y, p );

}

for (int x=WIDTH-1;x>=0;x--)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, 1, 0 );

Put( g, x, y, p );

}

}

// Pass 1

for (int y=HEIGHT-1;y>=0;y--)

{

for (int x=WIDTH-1;x>=0;x--)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, 1, 0 );

Compare( g, p, x, y, 0, 1 );

Compare( g, p, x, y, -1, 1 );

Compare( g, p, x, y, 1, 1 );

Put( g, x, y, p );

}

for (int x=0;x<WIDTH;x++)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, -1, 0 );

Put( g, x, y, p );

}

}

}

int main( int argc, char* args[] )

{

for( int y=0;y<HEIGHT;y++ )

{

for ( int x=0;x<WIDTH;x++ )

{

Uint8 r,g,b;

Uint32 *src = ( (Uint32 *)( (Uint8 *)temp->pixels + y*temp->pitch ) ) + x;

SDL_GetRGB( *src, temp->format, &r, &g, &b );

// Points inside get marked with a dx/dy of zero.

// Points outside get marked with an infinitely large distance.

if ( g < 128 )

{

Put( grid1, x, y, inside );

Put( grid2, x, y, empty );

} else {

Put( grid2, x, y, inside );

Put( grid1, x, y, empty );

}

}

}

......

// Generate the SDF.

GenerateSDF( grid1 );

GenerateSDF( grid2 );

......

}

8ssedt first sweeps the image once and marks which points are inside the object and which are outside. There are two grids because this is a signed distance field: one computes the distance from points outside the object to the object, the other the distance from points inside the object to the outside. As mentioned earlier, the former is positive, while the latter is treated as negative once the final SDF is assembled. The core function is GenerateSDF; let's walk through it step by step:

for (int x=0;x<WIDTH;x++)

{

Point p = Get( g, x, y );

Compare( g, p, x, y, -1, 0 );

Compare( g, p, x, y, 0, -1 );

Compare( g, p, x, y, -1, -1 );

Compare( g, p, x, y, 1, -1 );

Put( g, x, y, p );

}

void Compare( Grid &g, Point &p, int x, int y, int offsetx, int offsety )

{

Point other = Get( g, x+offsetx, y+offsety );

other.dx += offsetx;

other.dy += offsety;

if (other.DistSq() < p.DistSq())

p = other;

}

This is, in essence, finding the smallest SDF value among the target point and the four points above and to its left. Pass 0 traverses the whole image top to bottom, left to right; once it completes, every point outside the object whose nearest object pixel lies above or to its left already has its final SDF value. Similarly, Pass 1 traverses bottom to top, right to left, comparing against the four points below and to the right; once it completes, every outside point whose nearest object pixel lies below or to its right is also settled. Combining the two passes yields the SDF for the entire image. (Strictly speaking, calling this an SDF is not accurate yet, because we have only computed the distance from outside points to the object boundary, which is positive, so nothing is "signed". Only after the next step, computing the distance from inside points to the outside and subtracting the two, do we get a true SDF.)

The GenerateSDF call on the second grid is then easy to understand: it computes the distance from the inside of the object to the outside. Since a point is either inside or outside, the two values are either both zero (on the boundary) or one is 0 and the other is a distance. Using grid1(pixel).sdf - grid2(pixel).sdf gives the complete SDF.

The author of https://www.jianshu.com/p/58271568781d optimized the C++ code from a compilation standpoint and managed to cut the running time in half; readers who are interested can dig into the SIMD-related material. Unity has also recently added many libraries and APIs to support SIMD. From a data-structure point of view, object-oriented programming loses out to data-oriented programming here, and Unity's DOTS and ECS should gradually become mainstream; I plan to write an article on the latest Unity APIs later, but things have been a bit busy lately.

SDF vs Bitmap

The most common application of SDFs is fonts; the Zhihu article https://zhuanlan.zhihu.com/p/26217154 covers SDF text rendering in detail. Let's explore a more general use case. Say we have a snowflake texture and generate its corresponding SDF texture. By the SDF construction rule, 0.5 marks the boundary and values below 0.5 are inside the object. After importing it into Unity, we discard everything above 0.5 in the shader and set everything below 0.5 to a colour of 1, producing an image to compare against the original texture (images in the original post). At first glance the two images look identical, but if we zoom in several times the difference becomes obvious: the SDF-generated image keeps smooth, sharp edges, while the magnified snowflake bitmap is full of jagged edges due to insufficient precision. This is essentially the difference between vector graphics and bitmaps in Photoshop.

alias and anti-aliasing

In games, and in computer graphics generally, anti-aliasing is a never-ending search for better solutions. For what aliasing is and where it fundamentally comes from, I strongly recommend 闫大大's computer graphics course: https://www.bilibili.com/video/BV1X7411F744?p=6. After those lectures you will be able to connect images, jaggies, aliasing, and digital signals.

To understand why an SDF stays sharp when magnified, we first need to sort out the difference between an SDF and a bitmap. Our original goal is to transmit a piece of information (a signal), such as the geometric shape of a snowflake. To transmit that signal we use two different techniques: an SDF and a bitmap. The SDF uses an image of some resolution to store each pixel's distance to the snowflake boundary; after receiving the signal (the texture), we must reconstruct the original signal from it, so we have to process the image with the SDF's own rule (values below 0.5 are inside the snowflake) to recover the snowflake's shape. A bitmap is blunt and simple: each pixel stores a plain RGB colour, which can be shown on screen directly.

Suppose both textures are stored at 256 resolution; everything looks fine. But if we now need to display the image at 1K, we have to magnify both textures. As mentioned when introducing the render pipeline, the GPU determines in-between values by interpolation. Scaling a texture from 256 to 1024 means, on average, inserting three new pixels between every two adjacent pixels of the 256 texture:

A B1 B2 B3 C

B1 = A * 0.75 + C * 0.25

B2 = A * 0.5 + C * 0.5

B3 = A * 0.25 + C * 0.75

For a bitmap, each pixel stores an RGB colour, a scalar, and the result of interpolating it has no real meaning. For an SDF, each point stores a signed distance to the boundary, a vector, and interpolating vectors yields, in a meaningful sense, the value the newly inserted point ought to have:

Bitmap: stores each point's colour

A B1 B2 B3 C

A = (0,0,0),C = (1,1,1)

B1 = A * 0.75 + C * 0.25 = (0.25,0.25,0.25)

B2 = A * 0.5 + C * 0.5 = (0.5,0.5,0.5)

B3 = A * 0.25 + C * 0.75 = (0.75,0.75,0.75) // colours with no real meaning

SDF: stores each point's offset from the reference point

A B1 B2 B3 C

A = (0,0),C = (4,0)

B1 = A * 0.75 + C * 0.25 = (1,0)

B2 = A * 0.5 + C * 0.5 = (2,0)

B3 = A * 0.25 + C * 0.75 = (3,0) // B1, B2, B3 get the "correct" offsets from the reference point

A bitmap stores each pixel's colour, and a colour is a scalar: adding two colours and averaging them is meaningless, a purely mathematical blend, and for reconstructing the image it is simply wrong. Each pixel of an SDF stores a directed vector, and the image we finally see has to be reconstructed through a computation; operating directly on SDF pixels therefore means operating on vectors, and magnification is essentially a vector-reconstruction process that loses no precision. In other words, the SDF exploits the interpolator's behaviour to achieve smooth magnification.

smooth lerp between textures

Besides fonts, SDFs can also produce smooth transitions between images: just interpolate between the two images' SDFs. In the shader, a simple lerp is all it takes:

half4 col = half4(1,1,1,1);

half a1 = lerp(color1.r , color2.r, _SDFLerp);

col.a = smoothstep(0.5, 0.5 - _SmoothDelta , a1);
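For readers without Unreal at hand, the same blend can be sketched in NumPy (my own sketch, not the article's code; it assumes the two SDF textures have already been remapped to [0,1] as described above, with values below 0.5 meaning "inside"):

import numpy as np

def sdf_morph(sdf_a, sdf_b, t, smooth_delta=0.02):
    # lerp between the two distance fields, exactly like the shader's lerp()
    blended = (1.0 - t) * sdf_a + t * sdf_b
    # smoothstep(0.5, 0.5 - smooth_delta, blended): 1 inside the blended shape
    # (values <= 0.5 - smooth_delta), 0 outside (values >= 0.5), with a thin
    # anti-aliased band in between
    x = np.clip((blended - 0.5) / ((0.5 - smooth_delta) - 0.5), 0.0, 1.0)
    return x * x * (3.0 - 2.0 * x)

# sweeping t from 0 to 1 morphs shape A into shape B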

This produces a fairly pleasing transition: for each pixel it is fundamentally a smooth transition in distance, which, once converted to RGB colours, becomes a smooth transition between images.

In non-photorealistic rendering, such as toon rendering, many effects like shadows and highlights are authored by artists and are mostly physically incorrect; many games even apply all sorts of special treatment to make 3D characters look more like hand-drawn 2D characters. SDF techniques play a large role in creating assets for toon rendering. A common way to build facial shadows for toon rendering is, roughly, for the artist to paint the facial shadow at a few specific light angles, then use SDF interpolation to compute the in-between states, stack those intermediate results onto a single map, and smooth it with a simple blur or smooth pass. The original post shows the smoothed image generated from the two SDFs above, and the transition can be previewed in Photoshop by adjusting a threshold.

In the video, the character's facial shadow uses multiple textures that are finally merged into a single map to achieve a smooth transition. With plain SDFs, every source texture needs its own SDF image, so a smooth gradient over five source textures would need five SDF images to interpolate. Why does the merged-map approach save texture memory? NO FREE LUNCH! First, the generated smooth map degrades from an SDF back to a bitmap: each pixel stores an RGB colour, and different pixels are revealed purely by sweeping a threshold. Second, the method also constrains the source textures: because the interpolated images are simply stacked together and then revealed by sweeping a threshold, the corresponding pixels of the source images must vary monotonically. For example, stack three textures A, B, C. Suppose pixel x has value 0 in A, 1 in B, and 1 in C; the stacked value is x = (0 + 1 + 1) / 3 = 0.66667. If we display pixels below the threshold, then sweeping the threshold from 1 down to 0 plays back x's A-B-C sequence correctly. Conversely, if the values in A, B, C are (1, 1, 0), sweeping the threshold from 0 up to 1 works. But if both kinds of pixels exist in A, B, C at the same time, no threshold schedule can display the images correctly. Likewise, a pixel y with values (0, 1, 0) in A, B, C stacks to 0.333, and no threshold sweep can describe y's 0-1-0 sequence. So the smooth map generated this way limits precision on one hand (a 4-channel 32-bit texture gives 256 levels per channel) and limits the usable scenarios on the other; still, in a few situations the approach offers excellent value.

碎碎念 (closing notes)

This article is only a basic introduction to SDFs and some study notes on related material. I have not yet looked at 3D SDFs or experimented with their use in GI; that hole will be filled later. This article also does not cover SDF anti-aliasing; interested readers can see https://drewcassidy.me/2020/06/26/sdf-antialiasing/. That is about it. The surreal 2020 is almost over; I wish everyone peace and good fortune, and hope next year is a better one. You are also welcome to follow my public account: void_TATa (Tiny TA的碎碎念), GitHub: Alunice/TaTa.

How should the SSDFS file system, designed specifically for SSDs, be evaluated? - 知乎

Question: Linux is introducing the new SSDFS file system, optimized for ZNS SSDs. How should it be evaluated?

Answer (by 星外之神, Peking University College of Chemistry and Molecular Engineering; PKU Linux Club; Vala community):

Bcachefs has not even been merged yet, so this one will probably take even longer... Btrfs has been quite stable since 5.18 and is very capable, so I am not really inclined to switch. (In my view, SSDFS's biggest highlight is that it makes reducing write amplification a core goal. Earlier flash-friendly file systems were designed for wear levelling or for flash-device I/O optimization; they reduced write amplification as a side effect but did not seem to focus on it. The SSDFS developer claims that, compared with flash-friendly file systems such as F2FS, SSDFS still improves write amplification substantially; how that holds up in practice is unknown. At the moment, journaling file systems built on B+Trees, such as XFS, show fairly noticeable write amplification, while CoW B-Tree file systems like Btrfs and traditional file systems like ext4 are actually acceptable; f2fs and nilfs do seem to have some room for improvement. f2fs is, after all, a file system designed for phones: its design scope was a bit narrow, its extension support lags behind, and it is not robust enough, with a weak fsck, a high risk of data loss on power failure, and poor suitability for heavy-I/O environments. In daily use, f2fs with extended attributes enabled cannot be booted by GRUB and needs a separate /boot partition, which is inconvenient. Moreover, although f2fs supports transparent compression, the compression exists purely to reduce flash wear and does not even free the space it saves, which makes it fairly pointless. If SSDFS can solve these pain points, it may occupy a decent niche?) That said, Btrfs performance in 2023 is nothing like it used to be, so I will just keep using it.

UE4 Signed Distance Fields (Part 1) - 知乎

By 一只飞奔的猪 (Peking University, Master of Industrial Design Engineering)

A raymarched SDF demo (video): https://www.zhihu.com/video/1490997386600652800

What is a signed distance field?

Raymarching is a 3D rendering technique praised by programming enthusiasts for its simplicity and speed. It has been used widely in the demoscene, producing tiny executables and stunning visuals. A signed distance function, or SDF / raymarching SDF for short, returns, when given the coordinates of a point in space, the shortest distance between that point and some surface; the sign of the return value indicates whether the point is inside or outside that surface (hence "signed" distance function). The idea is this: suppose you have some surface in space. You have no explicit formula for it and no set of triangles describing it, but from any point you can find out how far away it is. How would you render this surface?

The ray marching algorithm

Once we have modelled something as an SDF, how do we render it? This is where the ray marching algorithm comes in. Just as in ray tracing, we choose a camera position, place a grid in front of it, and send a ray from the camera through each grid point, where each grid point corresponds to a pixel of the output image (figure: "Ray tracing", from Wikipedia). The difference lies in how the scene is defined, which in turn changes our options for finding the intersection between a view ray and the scene. In ray tracing, the scene is usually defined by explicit geometry: triangles, spheres, and so on, and we find the intersection by running a series of geometric intersection tests: where does this ray intersect this triangle, if at all? What about this one? What about this sphere? In raymarching, the whole scene is defined by signed distance functions. To find the intersection between a view ray and the scene, we start at the camera and move a point along the view ray bit by bit. At every step we ask, "Is this point inside a scene surface?", or in other words, "Does the SDF evaluate to a negative number at this point?". If so, we are done: we hit something. If not, we keep marching along the ray up to some maximum number of steps. We could advance only a very small increment each time, but with "sphere tracing" we can do better, in both speed and accuracy: instead of taking tiny steps, we take the largest step we know is safe without passing through the surface, and that step size is exactly what the distance field gives us.

What is a 2D distance field?

So what exactly is a signed distance field? SDF stands for signed distance field, but let me ignore "signed" for now and talk about what a distance field is. A 2D distance field is a visual representation of the distance from an object to the edges of another object (or its own edge) within the same frame. Take a circle as an example (figure: circle icon and its distance field). The field reads as follows: wherever it is pure white (1), the point is inside the object and closest to it; where it is pure black (0), the point is farthest from the object. Representing the field by values from 0 to 1 is a convenient convention. Another way to think about it is to picture the distance field as a stack of opaque circles. Because you can choose in your shader how much of the distance-field range to render, I can choose to render from a maximum of 1 down to a minimum of 0.5, which looks like a filled disc; or I can choose a very narrow range such as 0.5 to 0.49 (figure: the new circle size after setting the maximum to 0.5 and the minimum to 0.49). When using a 2D distance-field texture you therefore set it up in the material accordingly. You can see that, although this lets you draw a circle larger than the original texture, it is not very useful for something perfectly circular: the texture is sampled under all kinds of compression, and a perfect circle may not look great. This distance-field technique is therefore usually used for icons that are not plain circles, and for putting a "glow" around uncommon shapes.

Adding a distance field in Photoshop

Use the Stroke layer style, and inside the stroke set the fill type to "Gradient", the style to "Shape Burst", and the position to "Outside" (for now); then change Size to whatever fits the frame. You now have a distance field; it is that simple (figure: Photoshop layer style, Stroke). When we bring this new texture into a UE material and set Min/Max to 0.99 and 1, you can see that we get our icon in the engine. If you keep Max and Min close together you get a sharp image; if you keep them far apart you get a blurry one. This is handy if you want a "glow" around any icon: you can bake the distance field into the icon, or make an R/G two-channel texture with the icon in the red channel and the distance field in the green channel. It is a cheap way to glow, but it does require setting the icon texture up this way, so one texture carries both the glow and the icon information; it can even do shadows and strokes, given a bit of logic and setup. You can animate it through Sequencer in UMG, bake some kind of glow into the material, or even sharpen the glow into a stroke; all of this comes from one texture with a distance field on it.

What is the "Signed" in Signed Distance Field?

A "signed" distance field is the same thing, except the distances inside the object are computed as well. Going back to the original circle example (figures: circle icon, its distance field, and its signed distance field), we now have both the inside and the outside information of the field (note: the plot is not exactly linear; it only shows roughly where those values might sit on the texture). Back in the Photoshop file, I can change the stroke position to "Center", which gives a "signed distance field", but the range is now different: you have to find the right Max/Min to recover a crisp icon, and depending on the icon and stroke distance in Photoshop, some re-tuning is needed to get the sharpness right.

So what can you do with an SDF?

It makes strokes, morphing, and many other things easy to do in a material. The simplest is to "onion" an icon, i.e. outline it, following Inigo Quilez's absolute-value rule and then subtracting the stroke width, which yields the "onion" operation found on his site. You can also do crazy sine tricks and make the shape repeat inside itself. All of these effects come from a single texture; that is the power of SDFs: you can compute all sorts of distance-based things. The SDF does not have to be generated mathematically either; you can use a texture, as seen in the Photoshop file. You can even channel-pack two different icons into the red and green channels and morph between them: pack the icons using blending options, ticking the R or G box for whichever channel each icon should occupy, then apply your SDF stroke. Import into Unreal as TGA (24-bit) and you get a neat morphing material. Hopefully this helps illustrate what SDFs are and what they can do in 2D texture space. Eventually we will discuss mathematically generated SDFs and the crazy animations you can make with them, such as a dripping effect generated entirely with SDFs: no textures, just some math.

Update

An update from Twitter: technically, in an SDF the distance inside the object is negative, i.e. the range runs from -1 to 1, which is the more common convention for mathematically generated SDFs (figure: the correct SDF 0-1 distance mapping without the Photoshop texture). You can then remap the range to -1 and 1 (strictly speaking, the black region lies between 0 and -1). It is a bit hard to see what your SDF is doing in the negative values, but they are there. To change the texture you can run it through a ConstantBiasScale node in the material so it includes negative values, remapping 0-1 to -1 to 1: to change any value from 0-1 to the new range of -1 to 1, add -0.5 and multiply by 2 (figure: a handy constant-bias-scale graph).

To be continued...

Reference links: see the original post.
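To make that last remap concrete, here it is in a couple of lines of Python (my own sketch, not part of the article; v01 stands for a distance-field sample in the [0, 1] range):

def remap_to_signed(v01):
    # ConstantBiasScale as described above: add -0.5, then multiply by 2,
    # so 0.5 (the surface) maps to 0 and the two halves land in (-1, 0) and (0, 1)
    return (v01 - 0.5) * 2.0

# remap_to_signed(0.0) == -1.0, remap_to_signed(0.5) == 0.0, remap_to_signed(1.0) == 1.0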

sdfs: a compact distributed storage system - Gitee (猴子军团 / sdfs)



SDFS

Introduction

A compact distributed storage system (短小精干的分布式存储系统).

Advantages of SDFS

- High fault tolerance
  - Data is automatically saved as multiple replicas
  - Lost replicas are recovered automatically
- Suited to batch processing
  - Moves computation rather than data
  - Data locations are exposed to the compute framework
- Suited to small-data workloads
  - Storage of KB- and MB-scale data
  - Very large numbers of files
  - Very large numbers of nodes
- Streaming file access
  - Write once, read many times
  - Data consistency is guaranteed
- Can be built on inexpensive machines
  - Reliability is improved through multiple replicas
  - Fault-tolerance and recovery mechanisms are provided

Drawbacks of SDFS: it is not suitable for the following access patterns:

- Low-latency data access
  - e.g. millisecond-level latency, or low latency combined with high throughput
- Storing and accessing many small files
  - consumes a large amount of NameNode memory, and a single node's memory is limited
  - seek time exceeds read time
- Concurrent writes and random file modification
  - a file can have only one writer

Software architecture

Software architecture description

Installation

1. xxxx
2. xxxx
3. xxxx

Usage

1. xxxx
2. xxxx
3. xxxx

Contributing

1. Fork this repository
2. Create a Feat_xxx branch
3. Commit your code
4. Create a Pull Request

Language: Java. License: Apache-2.0. Repository: https://gitee.com/yun-lark/sdfs.git




GitHub - opendedup/sdfs: Deduplication Based Filesystem



sdfs

What is this?

A deduplicated file system that can store data in object storage or block storage.

License

GPLv2

Requirements

System Requirements

1. x64 Linux Distribution. The application was tested and developed on ubuntu 18.04

2. At least 8 GB of RAM

3. Minimum of 2 cores

4. Minimum of 16 GB of storage

Optional Packages

* Docker

Installation

Ubuntu/Debian (Ubuntu 14.04+)

Step 1: Download the latest sdfs version

wget http://opendedup.org/downloads/sdfs-latest.deb

Step 2: Install sdfs and dependencies

sudo apt-get install fuse libfuse2 ssh openssh-server jsvc libxml2-utils

sudo dpkg -i sdfs-latest.deb

Step 3: Change the maximum number of open files allowed

echo "* hard nofile 65535" >> /etc/security/limits.conf

echo "* soft nofile 65535" >> /etc/security/limits.conf

exit

Step 4: Log Out and Proceed to Initialization Instructions

CentOS/RedHat (Centos 7.0+)

Step 1: Download the latest sdfs version

wget http://opendedup.org/downloads/sdfs-latest.rpm

Step 2: Install sdfs and dependencies

yum install jsvc libxml2 java-1.8.0-openjdk

rpm -iv --force sdfs-latest.rpm

Step 3: Change the maximum number of open files allowed

echo "* hard nofile 65535" >> /etc/security/limits.conf

echo "* soft nofile 65535" >> /etc/security/limits.conf

exit

Step 4: Log Out and Proceed to Initialization Instructions

Step 5: Disable the IPTables firewall

service iptables save

service iptables stop

chkconfig iptables off

Step 6: Log Out and Proceed to Initialization Instructions

Docker Usage

Setup

Step 1:

docker pull gcr.io/hybrics/hybrics:master

Step 2:

docker run --name=sdfs1 -p 0.0.0.0:6442:6442 -d gcr.io/hybrics/hybrics:master

Step 3:

wget https://storage.cloud.google.com/hybricsbinaries/hybrics-fs/mount.sdfs-master

sudo mv mount.sdfs-master /usr/sbin/mount.sdfs

sudo chmod 777 /usr/sbin/mount.sdfs

sudo mkdir /media/sdfs

Step 4:

sudo ./mount.sdfs -d sdfs://localhost:6442 /mnt

Docker Parameters (environment variables):

CAPACITY: The maximum physical capacity of the volume, specified in GB or TB. Default: 100GB
TYPE: The type of backend storage. Can be AZURE, GOOGLE, AWS, or BACKBLAZE; if none is specified, local storage is used. Default: local storage
URL: The URL of the object storage used. Default: none
BACKUP_VOLUME: If set to true, the sdfs volume will be set up to dedupe archive data better and faster. If not set, it defaults to better read/write access for random IO. Default: false
GCS_CREDS_FILE: The location of a GCS credentials file for authenticating to Google Cloud Storage and GCP Pub/Sub. Required for Google Cloud Storage and GCP Pub/Sub access. Default: none
ACCESS_KEY: S3 or Azure access key. Default: none
SECRET_KEY: The S3 or Azure secret key used to access object storage. Default: none
AWS_AIM: If set to true, AWS IAM will be used for access. Default: false
PUBSUB_PROJECT: The project where the Pub/Sub notification should be set up for file changes and replication. Default: none
PUBSUB_CREDS_FILE: The credentials file used for Pub/Sub creation and access with GCP. If not set, GCS_CREDS_FILE will be used. Default: none
DISABLE_TLS: Disables TLS for API access if set to true. Default: false
REQUIRE_AUTH: Whether to require authentication for access to the sdfs APIs. Default: false
PASSWORD: The password to use when creating the volume. Default: admin
EXTENDED_CMD: Any additional command parameters to run during creation. Default: none

Docker run examples

Optimized setup using local storage:

sudo mkdir /opt/sdfs1

sudo docker run --name=sdfs1 --env CAPACITY=1TB --volume /home/A_USER/sdfs1:/opt/sdfs -p 0.0.0.0:6442:6442 -d gcr.io/hybrics/hybrics:master

Optimized setup using Google Cloud Storage:

sudo mkdir /opt/sdfs1

sudo docker run --name=sdfs1 --env BUCKET_NAME=ABUCKETNAME --env TYPE=GOOGLE --env=GCS_CREDS_FILE=/keys/service_account_key.json --env=PUBSUB_PROJECT=A_GCP_PROJECT --env CAPACITY=1TB --volume=/home/A_USER/keys:/keys --volume /home/A_USER/sdfs1:/opt/sdfs -p 0.0.0.0:6442:6442 -d gcr.io/hybrics/hybrics:master

Build Instructions

The Linux version must be built on a Linux system and the Windows version must be built on a Windows system.

Linux build Requirements:

1. Docker

2. git

Docker Build Steps

```bash

git clone https://github.com/opendedup/sdfs.git

cd sdfs

git fetch

git checkout -b master origin/master

#Build image with packages

docker build -t sdfs-package:latest --target build -f Dockerbuild.localbuild .

mkdir pkgs

#Extract Package

docker run --rm sdfs-package:latest | tar --extract --verbose -C pkgs/

#Build docker sdfs container

docker build -t sdfs:latest -f Dockerbuild.localbuild .

```

Initialization Instructions for Standalone Volumes

Step 1: Log into the linux system as root or use sudo

Step 2: Create the SDFS Volume. This will create a volume with 256 GB of capacity using a Variable block size.

**Local Storage**

sudo mkfs.sdfs --volume-name=pool0 --volume-capacity=256GB

**AWS Storage**

sudo mkfs.sdfs --volume-name=pool0 --volume-capacity=1TB --aws-enabled true --cloud-access-key --cloud-secret-key --cloud-bucket-name

**Azure Storage**

sudo mkfs.sdfs --volume-name=pool0 --volume-capacity=1TB --azure-enabled true --cloud-access-key --cloud-secret-key --cloud-bucket-name

**Google Storage**

sudo mkfs.sdfs --volume-name=pool0 --volume-capacity=1TB --google-enabled true --cloud-access-key --cloud-secret-key --cloud-bucket-name

Step 3: Create a mount point on the filesystem for the volume

sudo mkdir /media/pool0

Step 4: Mount the Volume

sudo mount -t sdfs pool0 /media/pool0/

Step 5: Add the filesystem to fstab

pool0 /media/pool0 sdfs defaults 0 0

Troubleshooting and other Notes

Running a multi-node cluster on KVM guests

By default KVM networking does not seem to allow guests to communicate over multicast. It also doesn't seem to work when bridging from a NIC. From my research it looks like you have to set up a routed network from a KVM host and have all the guests on that shared network. In addition, you will want to enable multicast on the virtual NIC that is shared by those guests. Here is the udev code to make this happen. A reference to this issue is found here.

# cat /etc/udev/rules.d/61-virbr-querier.rules

ACTION=="add", SUBSYSTEM=="net", RUN+="/etc/sysconfig/network-scripts/vnet_querier_enable"

# cat /etc/sysconfig/network-scripts/vnet_querier_enable

#!/bin/sh

if [[ $INTERFACE == virbr* ]]; then

/bin/echo 1 > /sys/devices/virtual/net/$INTERFACE/bridge/multicast_querier

fi

Testing multicast support on nodes in the cluster

The jgroups protocol includes a nice tool to verify that multicast is working on all nodes. It's an echo tool that sends messages from a sender to a receiver.

On the receiver run

java -cp /usr/share/sdfs/lib/jgroups-3.4.1.Final.jar org.jgroups.tests.McastReceiverTest -mcast_addr 231.12.21.132 -port 45566

On the sender run

java -cp /usr/share/sdfs/lib/jgroups-3.4.1.Final.jar org.jgroups.tests.McastSenderTest -mcast_addr 231.12.21.132 -port 45566

Once you have both sides running, type a message on the sender and you should see it on the receiver after you press enter. You may also want to switch roles to make sure multicast works in both directions.

take a look at http://docs.jboss.org/jbossas/docs/Clustering_Guide/4/html/ch07s07s11.html for more detail.

Further reading:

Take a look at the administration guide for more detail. http://www.opendedup.org/sdfs-20-administration-guide

Ask for Help

If you still need help check out the message board here https://groups.google.com/forum/#!forum/dedupfilesystem-sdfs-user-discuss


Filesystem — ESP8266 Arduino Core 3.1.2-21-ga348833 documentation


Filesystem¶

Flash layout¶

Even though the file system is stored on the same flash chip as the program, programming a new sketch will not modify the file system contents. This allows the file system to be used to store sketch data, configuration files, or content for a web server.

The following diagram illustrates the flash layout used in the Arduino environment:

|--------------|-------|---------------|--|--|--|--|--|

^ ^ ^ ^ ^

Sketch OTA update File system EEPROM WiFi config (SDK)

File system size depends on the flash chip size. Depending on the board

which is selected in IDE, the following table shows options for flash size.

Another option called Mapping defined by Hardware and Sketch is available.

It allows a sketch, not the user, to select FS configuration at boot

according to flash chip size.

This option is also enabled with this compilation define: -DFLASH_MAP_SUPPORT=1.

There are three possible configurations:

FLASH_MAP_OTA_FS: largest available space for onboard FS, allowing OTA (noted ‘OTA’ in the table)

FLASH_MAP_MAX_FS: largest available space for onboard FS (noted ‘MAX’ in the table)

FLASH_MAP_NO_FS: no onboard FS

Sketch can invoke a particular configuration by adding this line:

FLASH_MAP_SETUP_CONFIG(FLASH_MAP_OTA_FS)

void setup () { ... }

void loop () { ... }

Board | Flash chip size (bytes) | File system size (bytes)
Any   | 512KBytes               | 32KB(OTA), 64KB, 128KB(MAX)
Any   | 1MBytes                 | 64KB(OTA), 128KB, 144KB, 160KB, 192KB, 256KB, 512KB(MAX)
Any   | 2MBytes                 | 64KB, 128KB, 256KB(OTA), 512KB, 1MB(MAX)
Any   | 4MBytes                 | 1MB, 2MB(OTA), 3MB(MAX)
Any   | 8MBytes                 | 6MB(OTA), 7MB(MAX)
Any   | 16MBytes                | 14MB(OTA), 15MB(MAX)

Note: to use any of the file system functions in the sketch, add the

following include to the sketch:

//#include "FS.h" // SPIFFS is declared

#include "LittleFS.h" // LittleFS is declared

//#include "SDFS.h" // SDFS is declared

SPIFFS Deprecation Warning¶

SPIFFS is currently deprecated and may be removed in future releases of

the core. Please consider moving your code to LittleFS. SPIFFS is not

actively supported anymore by the upstream developer, while LittleFS is

under active development, supports real directories, and is many times

faster for most operations.

SPIFFS and LittleFS¶

There are two filesystems for utilizing the onboard flash on the ESP8266:

SPIFFS and LittleFS.

SPIFFS is the original filesystem and is ideal for space and RAM

constrained applications that utilize many small files and care

about static and dynamic wear levelling and don’t need true directory

support. Filesystem overhead on the flash is minimal as well.

LittleFS is recently added and focuses on higher performance and

directory support, but has higher filesystem and per-file overhead

(4K minimum vs. SPIFFS’ 256 byte minimum file allocation unit).

They share a compatible API but have incompatible on-flash

implementations, so it is important to choose one or the other per project

as attempting to mount a SPIFFS volume under LittleFS may result

in a format operation and definitely will not preserve any files,

and vice-versa.

The actual File and Dir objects returned from either

filesystem behave in the same manner and documentation is applicable

to both. To convert most applications from SPIFFS to LittleFS

simply requires changing the SPIFFS.begin() to LittleFS.begin()

and SPIFFS.open() to LittleFS.open() with the rest of the

code remaining untouched.

SDFS and SD¶

FAT filesystems are supported on the ESP8266 using the old Arduino wrapper

“SD.h” which wraps the “SDFS.h” filesystem transparently.

Any commands discussed below pertaining to SPIFFS or LittleFS are

applicable to SD/SDFS.

For legacy applications, the classic SD filesystem may continue to be used,

but for new applications, directly accessing the SDFS filesystem is

recommended as it may expose additional functionality that the old Arduino

SD filesystem didn’t have.

Note that in earlier releases of the core, using SD and SPIFFS in the same

sketch was complicated and required the use of NO_FS_GLOBALS. The

current design makes SD, SDFS, SPIFFS, and LittleFS fully source compatible

and so please remove any NO_FS_GLOBALS definitions in your projects

when upgrading core versions.

SPIFFS file system limitations¶

The SPIFFS implementation for ESP8266 had to accommodate the

constraints of the chip, among which its limited RAM.

SPIFFS was selected because it

is designed for small systems, but that comes at the cost of some

simplifications and limitations.

First, behind the scenes, SPIFFS does not support directories, it just

stores a “flat” list of files. But contrary to traditional filesystems,

the slash character '/' is allowed in filenames, so the functions

that deal with directory listing (e.g. openDir("/website"))

basically just filter the filenames and keep the ones that start with

the requested prefix (/website/). Practically speaking, that makes

little difference though.

Second, there is a limit of 32 chars in total for filenames. One

'\0' char is reserved for C string termination, so that leaves us

with 31 usable characters.

Combined, that means it is advised to keep filenames short and not use

deeply nested directories, as the full path of each file (including

directories, '/' characters, base name, dot and extension) has to be

31 chars at a maximum. For example, the filename

/website/images/bird_thumbnail.jpg is 34 chars and will cause some

problems if used, for example in exists() or in case another file

starts with the same first 31 characters.

Warning: That limit is easily reached and if ignored, problems might

go unnoticed because no error message will appear at compilation nor

runtime.

For more details on the internals of SPIFFS implementation, see the

SPIFFS readme

file.

LittleFS file system limitations¶

The LittleFS implementation for the ESP8266 supports filenames of up

to 31 characters + terminating zero (i.e. char filename[32]), and

as many subdirectories as space permits.

Filenames are assumed to be in the root directory if no initial “/” is

present.

Opening files in subdirectories requires specifying the complete path to

the file (i.e. open("/sub/dir/file.txt");). Subdirectories are

automatically created when you attempt to create a file in a subdirectory,

and when the last file in a subdirectory is removed the subdirectory

itself is automatically deleted. This is because there was no mkdir()

method in the existing SPIFFS filesystem.

Unlike SPIFFS, the actual file descriptors are allocated as requested

by the application, so in low memory conditions you may not be able to

open new files. Conversely, this also means that only file descriptors

used will actually take space on the heap.

Because there are directories, the openDir method behaves differently

than SPIFFS. Whereas SPIFFS will return files in “subdirectories” when

you traverse a Dir::next() (because they really aren’t subdirs but

simply files with “/”s in their names), LittleFS will only return files

in the specific subdirectory. This mimics the POSIX behavior for

directory traversal most C programmers are used to.

Uploading files to file system¶

ESP8266FS is a tool which integrates into the Arduino IDE. It adds a

menu item to Tools menu for uploading the contents of sketch data

directory into ESP8266 flash file system.

Warning: Due to the move from the obsolete esptool-ck.exe to the

supported esptool.py upload tool, upgraders from pre 2.5.1 will need to

update the ESP8266FS tool referenced below to 0.5.0 or later. Prior versions

will fail with a “esptool not found” error because they don’t know how to

use esptool.py.

Download the tool: https://github.com/esp8266/arduino-esp8266fs-plugin/releases/download/0.5.0/ESP8266FS-0.5.0.zip

In your Arduino sketchbook directory, create tools directory if

it doesn’t exist yet.

Unpack the tool into tools directory (the path will look like

/Arduino/tools/ESP8266FS/tool/esp8266fs.jar)

If upgrading, overwrite the existing JAR file with the newer version.

Restart Arduino IDE.

Open a sketch (or create a new one and save it).

Go to sketch directory (choose Sketch > Show Sketch Folder).

Create a directory named data and any files you want in the file

system there.

Make sure you have selected a board, port, and closed Serial Monitor.

If your board requires you to press a button (or other action) to enter

bootload mode for flashing a sketch, do that now.

Select Tools > ESP8266 Sketch Data Upload. This should start

uploading the files into ESP8266 flash file system. When done, IDE

status bar will display SPIFFS Image Uploaded message.

ESP8266LittleFS is the equivalent tool for LittleFS.

Download the 2.6.0 or later version of the tool: https://github.com/earlephilhower/arduino-esp8266littlefs-plugin/releases

Install as above

To upload a LittleFS filesystem use Tools > ESP8266 LittleFS Data Upload

File system object (SPIFFS/LittleFS/SD/SDFS)¶

setConfig¶

SPIFFSConfig cfg;

cfg.setAutoFormat(false);

SPIFFS.setConfig(cfg);

This method allows you to configure the parameters of a filesystem

before mounting. All filesystems have their own *Config (i.e.

SDFSConfig or SPIFFSConfig with their custom set of options.

All filesystems allow explicitly enabling/disabling formatting when

mounts fail. If you do not call this setConfig method before

performing begin(), you will get the filesystem's default

behavior and configuration. By default, SPIFFS will autoformat the

filesystem if it cannot mount it, while SDFS will not.

begin¶

SPIFFS.begin()

or LittleFS.begin()

This method mounts file system. It must be called before any

other FS APIs are used. Returns true if file system was mounted

successfully, false otherwise. With no options it will format SPIFFS

if it is unable to mount it on the first try.

Note that both methods will automatically format the filesystem

if one is not detected. This means that if you attempt a

SPIFFS.begin() on a LittleFS filesystem you will lose all data

on that filesystem, and vice-versa.

end¶

SPIFFS.end()

or LittleFS.end()

This method unmounts the file system. Use this method before updating

the file system using OTA.

format¶

SPIFFS.format()

or LittleFS.format()

Formats the file system. May be called either before or after calling

begin. Returns true if formatting was successful.

open¶

SPIFFS.open(path, mode)

or LittleFS.open(path, mode)

Opens a file. path should be an absolute path starting with a slash

(e.g. /dir/filename.txt). mode is a string specifying access

mode. It can be one of “r”, “w”, “a”, “r+”, “w+”, “a+”. Meaning of these

modes is the same as for fopen C function.

r Open text file for reading. The stream is positioned at the

beginning of the file.

r+ Open for reading and writing. The stream is positioned at the

beginning of the file.

w Truncate file to zero length or create text file for writing.

The stream is positioned at the beginning of the file.

w+ Open for reading and writing. The file is created if it does

not exist, otherwise it is truncated. The stream is

positioned at the beginning of the file.

a Open for appending (writing at end of file). The file is

created if it does not exist. The stream is positioned at the

end of the file.

a+ Open for reading and appending (writing at end of file). The

file is created if it does not exist. The initial file

position for reading is at the beginning of the file, but

output is always appended to the end of the file.

Returns File object. To check whether the file was opened

successfully, use the boolean operator.

File f = SPIFFS.open("/f.txt", "w");

if (!f) {

Serial.println("file open failed");

}

exists¶

SPIFFS.exists(path)

or LittleFS.exists(path)

Returns true if a file with given path exists, false otherwise.

mkdir¶

LittleFS.mkdir(path)

Returns true if the directory creation succeeded, false otherwise.

rmdir¶

LittleFS.rmdir(path)

Returns true if the directory was successfully removed, false otherwise.

openDir¶

SPIFFS.openDir(path)

or LittleFS.openDir(path)

Opens a directory given its absolute path. Returns a Dir object.

Please note the previous discussion on the difference in behavior between

LittleFS and SPIFFS for this call.

remove¶

SPIFFS.remove(path)

or LittleFS.remove(path)

Deletes the file given its absolute path. Returns true if file was

deleted successfully.

rename¶

SPIFFS.rename(pathFrom, pathTo)

or LittleFS.rename(pathFrom, pathTo)

Renames file from pathFrom to pathTo. Paths must be absolute.

Returns true if file was renamed successfully.
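The path-oriented calls above (exists, mkdir, rmdir, remove, rename) combine naturally. The following is only a sketch, assuming LittleFS and hypothetical paths of our own, that exercises them in sequence:

#include <LittleFS.h>

void setup() {
  Serial.begin(115200);
  LittleFS.begin();
  LittleFS.mkdir("/logs");                        // create a directory
  File f = LittleFS.open("/logs/boot.txt", "w");  // create a file inside it
  f.println("booted");
  f.close();
  if (LittleFS.exists("/logs/boot.txt")) {
    LittleFS.rename("/logs/boot.txt", "/logs/boot.old");  // rename/move
  }
  LittleFS.remove("/logs/boot.old");              // delete the file
  LittleFS.rmdir("/logs");                        // remove the now-empty directory
}

void loop() {}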

gc¶

SPIFFS.gc()

Only implemented in SPIFFS. Performs a quick garbage collection operation on SPIFFS,

possibly making writes perform faster/better in the future. On very full or very fragmented

filesystems, using this call can avoid or reduce issues where SPIFFS reports free space

but is unable to write additional data to a file. See this discussion

for more info.

check¶

SPIFFS.begin();

SPIFFS.check();

Only implemented in SPIFFS. Performs an in-depth check of the filesystem metadata and

correct what is repairable. Not normally needed, and not guaranteed to actually fix

anything should there be corruption.

info¶

FSInfo fs_info;

SPIFFS.info(fs_info);

or LittleFS.info(fs_info);

Fills FSInfo structure with

information about the file system. Returns true if successful,

false otherwise.

Filesystem information structure¶

struct FSInfo {

size_t totalBytes;

size_t usedBytes;

size_t blockSize;

size_t pageSize;

size_t maxOpenFiles;

size_t maxPathLength;

};

This is the structure which may be filled using FS::info method.

- totalBytes — total size of useful data on the file system
- usedBytes — number of bytes used by files
- blockSize — filesystem block size
- pageSize — filesystem logical page size
- maxOpenFiles — max number of files which may be open simultaneously
- maxPathLength — max file name length (including one byte for zero termination)
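For illustration, a short sketch (SPIFFS assumed, using the field names listed above) fills the structure and prints it:

#include <FS.h>

void setup() {
  Serial.begin(115200);
  if (SPIFFS.begin()) {
    FSInfo fs_info;
    SPIFFS.info(fs_info);
    Serial.printf("total: %u bytes, used: %u bytes\n",
                  (unsigned)fs_info.totalBytes, (unsigned)fs_info.usedBytes);
    Serial.printf("block: %u, page: %u, maxOpenFiles: %u, maxPathLength: %u\n",
                  (unsigned)fs_info.blockSize, (unsigned)fs_info.pageSize,
                  (unsigned)fs_info.maxOpenFiles, (unsigned)fs_info.maxPathLength);
  }
}

void loop() {}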

info64¶

FSInfo64 fsinfo;

SD.info64(fsinfo);

or LittleFS.info64(fsinfo);

Performs the same operation as info but allows for reporting greater than

4GB for filesystem size/used/etc. Should be used with the SD and SDFS

filesystems since most SD cards today are greater than 4GB in size.

setTimeCallback(time_t (*cb)(void))¶

time_t myTimeCallback() {

return 1455451200; // UNIX timestamp

}

void setup () {

LittleFS.setTimeCallback(myTimeCallback);

...

// Any files will now be made with Pris' incept date

}

The SD, SDFS, and LittleFS filesystems support a file timestamp, updated when the file is

opened for writing. By default, the ESP8266 will use the internal time returned from

time(NULL) (i.e. local time, not UTC, to conform to the existing FAT filesystem), but this

can be overridden to GMT or any other standard you’d like by using setTimeCallback().

If your app sets the system time using NTP before file operations, then

you should not need to use this function. However, if you need to set a specific time

for a file, or the system clock isn’t correct and you need to read the time from an external

RTC or use a fixed time, this call allows you to do so.

In general use, with a functioning time() call, user applications should not need

to use this function.

Directory object (Dir)¶

The purpose of Dir object is to iterate over files inside a directory.

It provides multiple access methods.

The following example shows how it should be used:

Dir dir = SPIFFS.openDir("/data");

// or Dir dir = LittleFS.openDir("/data");

while (dir.next()) {

Serial.print(dir.fileName());

if(dir.fileSize()) {

File f = dir.openFile("r");

Serial.println(f.size());

}

}

next¶

Returns true while there are files in the directory to

iterate over. It must be called before calling fileName(), fileSize(),

and openFile() functions.

fileName¶

Returns the name of the current file pointed to

by the internal iterator.

fileSize¶

Returns the size of the current file pointed to

by the internal iterator.

fileTime¶

Returns the time_t write time of the current file pointed

to by the internal iterator.

fileCreationTime¶

Returns the time_t creation time of the current file

pointed to by the internal iterator.
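A brief sketch, assuming LittleFS and that the system clock (or a time callback) has been set so the timestamps are meaningful, prints both times while iterating:

#include <LittleFS.h>
#include <time.h>

void setup() {
  Serial.begin(115200);
  LittleFS.begin();
  Dir dir = LittleFS.openDir("/");
  while (dir.next()) {
    time_t written = dir.fileTime();
    time_t created = dir.fileCreationTime();
    Serial.print(dir.fileName());
    Serial.print("  written: ");
    Serial.print(ctime(&written));   // ctime() appends a newline
    Serial.print("  created: ");
    Serial.print(ctime(&created));
  }
}

void loop() {}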

isFile¶

Returns true if the current file pointed to by

the internal iterator is a File.

isDirectory¶

Returns true if the current file pointed to by

the internal iterator is a Directory.

openFile¶

This method takes a mode argument which has the same meaning as for the SPIFFS/LittleFS.open() function.

rewind¶

Resets the internal pointer to the start of the directory.

setTimeCallback(time_t (*cb)(void))¶

Sets the time callback for any files accessed from this Dir object via openNextFile.

Note that the SD and SDFS filesystems only support a filesystem-wide callback and

calls to Dir::setTimeCallback may produce unexpected behavior.

File object¶

SPIFFS/LittleFS.open() and dir.openFile() functions return a File object.

This object supports all the functions of Stream, so you can use

readBytes, findUntil, parseInt, println, and all other

Stream methods.
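For example, a sketch (LittleFS and a hypothetical /notes.txt of our own assumed) can write a line with print and read it back with readStringUntil, both inherited from Stream:

#include <LittleFS.h>

void setup() {
  Serial.begin(115200);
  LittleFS.begin();
  File f = LittleFS.open("/notes.txt", "w+");   // read/write, truncated
  f.print("temperature=21.5\n");                // Stream/Print write
  f.seek(0, SeekSet);                           // rewind before reading back
  String line = f.readStringUntil('\n');        // Stream read
  Serial.println(line);
  f.close();
}

void loop() {}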

There are also some functions which are specific to File object.

seek¶

file.seek(offset, mode)

This function behaves like fseek C function. Depending on the value

of mode, it moves current position in a file as follows:

if mode is SeekSet, position is set to offset bytes from

the beginning.

if mode is SeekCur, current position is moved by offset

bytes.

if mode is SeekEnd, position is set to offset bytes from

the end of the file.

Returns true if position was set successfully.

position¶

file.position()

Returns the current position inside the file, in bytes.

size¶

file.size()

Returns file size, in bytes.

name¶

String name = file.name();

Returns short (no-path) file name, as const char*. Convert it to String for

storage.

fullName¶

// Filesystem:

// testdir/

// file1

Dir d = LittleFS.openDir("testdir/");

File f = d.openFile("r");

// f.name() == "file1", f.fullName() == "testdir/file1"

Returns the full path file name as a const char*.

getLastWrite¶

Returns the file last write time, and only valid for files opened in read-only

mode. If a file is opened for writing, the returned time may be indeterminate.

getCreationTime¶

Returns the file creation time, if available.

isFile¶

bool amIAFile = file.isFile();

Returns true if this File points to a real file.

isDirectory¶

bool amIADir = file.isDir();

Returns true if this File points to a directory (used for emulation

of the SD.* interfaces with the openNextFile method).

close¶

file.close()

Close the file. No other operations should be performed on File object

after close function was called.

openNextFile (compatibility method, not recommended for new code)¶

File root = LittleFS.open("/");

File file1 = root.openNextFile();

File file2 = root.openNextFile();

Opens the next file in the directory pointed to by the File. Only valid

when File.isDirectory() == true.
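As a sketch of how this compatibility API can be used (LittleFS assumed, with a recursive helper of our own), a directory tree can be listed like this:

#include <LittleFS.h>

// Recursively print the contents of a directory opened as a File.
void listDir(File dir, int depth) {
  for (File entry = dir.openNextFile(); entry; entry = dir.openNextFile()) {
    for (int i = 0; i < depth; i++) Serial.print("  ");
    Serial.print(entry.isDirectory() ? "[dir]  " : "[file] ");
    Serial.println(entry.name());
    if (entry.isDirectory()) {
      listDir(entry, depth + 1);    // descend into subdirectories
    }
  }
}

void setup() {
  Serial.begin(115200);
  LittleFS.begin();
  File root = LittleFS.open("/", "r");
  listDir(root, 0);
}

void loop() {}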

rewindDirectory (compatibility method, not recommended for new code)¶

File root = LittleFS.open("/");

File file1 = root.openNextFile();

file1.close();

root.rewindDirectory();

file1 = root.openNextFile(); // Opens first file in dir again

Resets the openNextFile pointer to the top of the directory. Only

valid when File.isDirectory() == true.

setTimeCallback(time_t (*cb)(void))¶

Sets the time callback for this specific file. Note that the SD and SDFS filesystems only support a filesystem-wide callback and calls to File::setTimeCallback may produce unexpected behavior.


One Diagram to Understand the Concept of SDF - jack船长大哥 - 博客园


jack船长大哥

One Diagram to Understand the Concept of SDF

This article only reflects my personal understanding; please point out any mistakes.

SDF:

Signed Distance Field. The three words "signed", "distance", and "field" describe very precisely what an SDF actually is. GPU Gems 3 describes SDF like this: "An SDF is a sampled grid of the nearest distances to the surface of a (polygonal) object. By convention, negative values are used to denote the inside of the object and positive values the outside. The SDF idea is very attractive for graphics and related fields; it is frequently used for cloth-animation collision detection, multi-body dynamics, deformable objects, mesh generation, motion planning, and sculpting."

For more about SDF, see the following references: GPU Gems 3, Byte Wrangler's blog, SIGGRAPH2007_AlphaTestedMagnification.

In Naiad the SDF is a 3D object, but below I will use a 2D image I made myself to explain the 2D SDF; this actually makes the 3D SDF easier to understand.

Figure 1

In Figure 1:

1. The blue line is the zero-boundary contour of the SDF; in Naiad it is the iso surface displayed by the iso-scope.

2. The meaning of "signed", "distance", and "field": the positive/negative numbers in the voxels together with the black arrows represent "signed"; the difference between those values and the zero boundary is the "distance"; the cyan region represents the "field".

3. The cyan region is where the SDF exists; in Naiad it is the fine tile region.

 

 

Iso-surface:

Also called an isosurface (equal-value surface). In Naiad it can be understood as a way of visualizing surfaces of equal SDF value; by default the iso-scope displays the zero-value surface (the zero boundary) of the SDF.

 

 

iso-scope:

The iso-scope is used to view the iso-surface of an SDF at different boundary values; it is used very frequently in Naiad.

Figure 2

Figure 3

 

The iso-scope can display not only a model's native SDF but also the native SDF after its zero boundary has been offset, as in Figure 4:

Figure 4

In Figure 4, the Iso Value of 0.1 can be understood as displaying the isosurface where the SDF value is 0.1.

 

The object viewed by the iso-scope must be a body, and that body must carry a Field Channel; this Field Channel must be a distance channel, i.e. an SDF.

 

Under the Quality menu:

Slice Count: the iso-scope works by sampling the SDF per voxel and displaying it as slices. The higher the Slice Count, the more detailed the SDF display and the slower the video memory/refresh performance; Slice Count grows linearly with video memory usage.

SuperSampling: the supersampling precision per voxel. The higher the SuperSampling, the more accurate the SDF display and the slower the video memory/refresh performance; SuperSampling grows exponentially with video memory usage, and setting it too high can overflow video memory and make the GPU stop responding.

 

 

tile-scope:

The tile-scope is used to view a body's tile-layout; the tile-layout is the region in which the body's field channels exist in Naiad. Inspecting the tile-layout is an important way to understand resource usage and diagnose all kinds of performance problems in Naiad. A normal body with a field usually has a tile-layout that completely encloses its particles and model, as in Figure 5:

Figure 5

The tile-layout contains fine tiles and coarse tiles; the SDF only exists in the fine tiles, so it is usually enough to look at the fine tiles.


posted on 2019-02-01 12:16 by jack船长大哥


Administration Guide – OpenDedup


Administration Guide

Index:

Introduction

Features

Architecture

Planning your deployment

Fixed and Variable Block Deduplication

Creating and Mounting SDFS File Systems

Mounting SDFS Volumes as NFS Shares

Managing SDFS Volumes for Virtual Machines

Managing SDFS Volumes through Extended Attributes

File-system Snapshots for SDFS Volumes

SDFS Volume Storage Utilization and Compacting Volumes

Dedup Storage Engine

Cloud Based Deduplication

Dedup Storage Engine Memory

Data Chunks

File and Folder Placement

Other options and Extended Attributes

SDFS Replication

Data Chunk Removal

Scaling and Sizing

Troubleshooting

References

Introduction:

This is intended to be a detailed guide for the SDFS file-system. For most purposes, the Quickstart Guide will get you going but if you are interested in advanced topics, this is the place to look.

SDFS is a distributed and expandable filesystem designed to provide inline deduplication and flexibility for applications. Services such as backup, archiving, NAS storage, and Virtual Machine primary and secondary storage can benefit greatly from SDFS.

SDFS can be deployed as a standalone filesystem and provide inline deduplication. The deduplication can store data on a number of back ends including:

Object Storage

AWS S3

Glacier

S3 compliant back ends

Google Cloud Storage

Azure Blob Storage

Swift

Local Filesystem

EXT4

NTFS (Windows)

XFS

Features:

SDFS is designed for high-performance read and write activity, in addition to the features below.

High Availability : All block data is fully recoverable from object storage, including :

MetaData

Hashtable

Unique Data

Global Deduplication from any application that writes to an opendedupe volume

Expand backend storage without having to offline the volume

Unlimited Snapshot capability without IO impact

Efficient, deduplication aware replication

Architecture:

SDFS’s unique design allows for many advantages over a traditional filesystem. The complete decoupling of block data from file metadata is the main characteristic of this design. Any number of logical files can reference the same unique data block. The unique data block has no knowledge of which files reference it or where those files are located. Metadata holds a reference to a hash associated with a logical location within a file. Since all data is deduplicated and shared between volumes, data and metadata IO is reduced significantly, both across the network and on the backing storage.

SDFS is comprised of 3 basic components:

SDFS file-system service (Volume)

Deduplication Storage Engine (DSE)

Data Chunks

 

END TO END PROCESS

SDFS FILE META-DATA

Each logical SDFS file is represented by two different pieces of metadata held in two different files. The first piece of metadata is called the “MetaDataDedupFile”. This file is stored in a filesystem structure that directly mimics the filesystem namespace presented when the filesystem is mounted. As an example, each of these files is named the same as it appears when mounted and looks as if it is in the same directory structure under “/opt/sdfs/volumes//files/”. This file contains all of the filesystem attributes associated with the file, including size, atime, ctime, acls, and a link to the associated map file.

The second metadata file is the mapping file. This file contains the list of records corresponding to the locations of the blocks of data within the file. Each record contains a hash entry, whether the data was a duplicate, and, when the data is stored on remote nodes, on what nodes that data can be found.

This data is located in “/opt/sdfs/volumes//ddb/”

Each record has the following data structure:

| dup (1 byte) | hash (hash algorithm length) | reserved (1 byte) | hash location (8 bytes) |

The locations field is an 8 byte array that represents a long integer. The long integer represents the Archive File where the data associated with the specific chunk in question can be found.

SDFS Stores all data in archive files that combine multiple data chunks of deduplicated data into larger files. This allows for more efficient storage management and less uploads to the cloud.

WRITE BUFFERS

SDFS Write buffers store data before it is deduplicated. The size of the buffers should be set appropriate to the IO pattern and type of data that is written.

The write buffer size is determined at mkfs.sdfs initialization with the parameter --io-chunk-size. By default it is set to 256KB. When the parameter --backup-volume is selected the chunk size is set to 40MB.

A larger buffer size will allow for better deduplication rates and faster streamed IO. A smaller chunk size will allow for faster random IO.

This parameter cannot be changed after the volume is mounted.

WRITING DATA

When data is written to SDFS it is sent to the File-System Process from the kernel via the fuse library or dokan library. SDFS grabs the data from the fuse layer API and breaks the data into fixed chunks. These chunks are associated with fixed positions within the file as they were written. These chunks are immediately cached on a per-file basis for active IO reads and writes in a fifo buffer. The size of this fifo buffer is set to 1MB-80MB by default but can be changed via the “max-file-write-buffers” attribute within the SDFS configuration file.

When data expires from the fifo buffer it is moved to a flushing buffer. This flushing buffer is emptied by a pool of threads configured by  “write-threads” attribute. These threads perform the process of computing the hash for the block of data, searching the system to see if the hash/data has already been stored, on what nodes the data is stored(not applicable for standalone), confirming the data has been persisted, and finally writing the record associated with the block to the mapping file.

READING DATA

When data is read from SDFS the requested file position and requested data length is sent through the fuse layer to the SDFS application. The record(s) associated with the file position and length are then looked up in the mapping file and the block data is recalled by either looking up the long integer associated with the archive file where the data is located, or by the hash if that first case fails.

READING DATA FROM AMAZON GLACIER

SDFS can utilize Storage Lifecycle policies with Amazon to support moving data to Glacier. When data is read from Amazon Glacier, the DSE first tries to retrieve the data normally, as if it were any S3 blob; if this fails because the blob has been moved to Glacier, the DSE then informs the OpenDedupe volume service that the data has been archived. The OpenDedupe Volume Service initiates a Glacier archive-retrieval process for all blocks associated with chunks used by the file being read. The read operation will be blocked until all blocks have been successfully restored from Glacier.

SDFS FILE-SYSTEM SERVICE

The SDFS File-System Service (FSS) is synonymous with the operating system concept of a volume and filesystem, as it performs the function of both. It is a logical container that is responsible for all file system level activity. This includes filesystem IO activity, chunking data into fixed blocks for deduplication, file system statistics, as well as all enhanced functions such as snapshots. Each File System Service, or volume, contains a single SDFS namespace, or filesystem instance, and is responsible for presenting and storing individual files and folders. The volume is mounted through “mount.sdfs”. This is the primary way users, applications, and services interact with SDFS.

The SDFS file-system service provides a typical POSIX compliant view of deduplicated files and folders to volumes. The SDFS filesystem services store metadata regarding files and folders. This metadata includes information such as file size, file path, and most other aspects of files and folders other than the actual file data. In addition to metadata the SDFS file-system service also manages file maps that identify data-location to dedup/undeduped chunk mappings. The chunks themselves live either within the local Deduplication Storage Engine or in the cluster depending on the configuration.

SDFS Volumes can be exported through ISCSI or NFS.

File-System services (FSS) will add the available storage on the DSE to their capacity and begin actively writing unique blocks of data to that node, referencing that node id for retrieval. DSE nodes are written to based on a weighted random distribution.

If the responses have met the cluster redundancy requirements then the FSS will store the cluster node numbers and the hash in the map file associated with the specific write request. If the block has not yet been stored, or if the block does not meet the cluster redundancy requirements, then the block will be written to nodes that have not already stored that block. The determination of where that block will be stored is based on the distribution algorithm described above. These writes are done in unicast to the specific storage nodes.

Deduplication Storage Engine

The Deduplication Storage Engine (DSE) stores, retrieves, and removes all deduped chunks. The deduplication storage engine is run as part of an SDFS Volume. Chunks of data are stored on disk, or at a cloud provider, and indexed for retrieval with a custom written hash table. The DSE database is stored in /opt/sdfs/volumes//chunkstore/hdb-/ .

Data Blocks

Unique data chunks are stored together in Archive Files by the Dedupe Storage Engine (DSE), either on disk or in the cloud. The dedupe storage engine stores collections of data chunks, in sequence, within data blocks in the chunkstore directory. By default the data block size is no more than 40MB but can be set to anything up to 256MB. New blocks are closed and are no longer writable when either their size limit is reached or the block times out waiting for new data. The timeout is 6 seconds by default. Writable blocks are stored in chunkstore/chunks/outgoing/.

The DSE creates new blocks as unique data is written into the DSE. Each new block is designated by a unique long integer. When unique data is written in, the DSE compresses/encrypts the chunk and then writes the chunk into a new block in sequence. It then stores a reference to the unique chunk hash and the block’s unique id. The block itself keeps track of where unique chunks are located in a map file associated with each block. As blocks reach their size limit or timeout they are closed for writing and then either uploaded to the cloud and cached locally or moved to a permanent location on disk /chunkstore/chunks/[1st three numbers of unique id]/. The map file and the blocks are stored together.

Cloud Storage of Data Blocks

When data is uploaded to the cloud, the DSE creates a new thread to upload the closed block. The number of blocks that can be simultaneously uploaded is 16 by default but can be changed within the xml config (io-threads). For slower connections it may make sense to lower this number, or to raise it for faster connections, up to 64 threads. Data Blocks are stored in the bucket under the /blocks/ sub directory and the associated maps are stored in the /keys/ directory.

Data chunks are always read locally when requested by the volume. If cloud storage is leveraged, the data block where a requested chunk is located is retrieved from the cloud storage provider and cached locally. The unique chunk is then read from the local cache and restored.

If data is stored in an Amazon Glacier repository, the DSE informs the volume that the data is archived. The volume will then initiate an archive retrieval process.

Data Chunk

Data Chunks are the unit by which raw data is processed and stored within SDFS. Chunks of data are stored either with the Deduplication Storage Engine or the SDFS file-system service depending on the deduplication process. Chunks are, by default, hashed using SIP Hash 128. SDFS also includes other hashing algorithms, but SIP is fast, collision resistant, and requires half the footprint of sha256.

Here are a few other facts regarding unique data redundancy if data is written to the cloud:

Unique data is written to multiple DSE nodes asynchronously.

If two volumes specify different redundancy requirements and share unique data, redundancy will be met based on the volume with the highest redundancy requirement.

If the number of DSEs drops below the replica requirement writes will still occur to the remaining DSE nodes.

 

Planning Your SDFS Architecture:

Standalone vs. Cloud Storage

When deciding on a standalone architecture versus cloud storage, consider the following advantages of each:

STANDALONE ADVANTAGES :

IO Speeds : Reads and writes will all be significantly faster using local storage vs cloud storage.

CLOUD ADVANTAGES :

High Availability : The entire volume is replicated to the cloud. This means that if you lose your local volume, you can recover all of the data from the cloud.

Scalability : SDFS scales better with cloud storage because only a small subset of the unique data is stored locally.

Global Deduplication : Multiple SDFS volumes can share the same cloud storage bucket and share each other’s data.

To create a volume you will want to consider the following:

Fixed and Variable Block Deduplication

SDFS can perform both fixed and variable block deduplication. Fixed block deduplication takes fixed blocks of data and hashes those blocks. Variable block deduplication attempts to find natural breaks within a stream of data and creates variable blocks at those break points.

Fixed block deduplication is performed at volume-defined fixed byte buffers within SDFS. These fixed blocks are defined when the volume is created and are set at 4k by default but can be set to a maximum value of 128k. Fixed block deduplication is very useful for active structured data such as running VMDKs or Databases. Fixed block deduplication is simple to perform and can therefore be very fast for most applications.

Variable block deduplication is performed using Rabin Window Borders (http://en.wikipedia.org/wiki/Rabin_fingerprint). SDFS uses fixed buffers of 256K and then runs a rolling hash across that buffer to find natural breaks. The minimum size of a variable block is 4k and the maximum size is 32k. Variable block deduplication is very good at finding duplicated blocks in unstructured data such as uncompressed tar files and documents. Variable block deduplication typically will create blocks of 10k-16k. This makes variable block deduplication more scalable than fixed block deduplication when the latter is performed at 4k block sizes. The downside of variable block deduplication is that it can be computationally intensive and sometimes slower for write processing.

 

Creating and Mounting SDFS File Systems:

Both stand alone and clustered SDFS volumes are created through the sdfscli command line. There are many options available within the command line but most of the options are set to their optimal setting. Multiple SDFS Volumes can be hosted on a single host. All volume configurations are stored, by default in /etc/sdfs .

CREATING A STANDALONE SDFS VOLUME

A simple standalone volume named “dedup” with a dedup capacity of 1TB and using variable block deduplication run the following command:

mkfs.sdfs --volume-name=dedup --volume-capacity=1TB

The following will create a volume that has a dedup capacity of 1TB and a unique block size of 32K

mkfs.sdfs --volume-name=dedup --volume-capacity=1TB --io-chunk-size=32

By default volumes store all data in the folder structure /opt/sdfs/. This may not be optimal and can be changed before a volume is mounted for the first time. In addition, volume configurations are held in the /etc/sdfs folder. Each volume configuration is created when the mkfs.sdfs command is run and stored as an XML file and its naming convention is -volume-cfg.xml.

SDFS Volumes are mounted with the mount.sdfs command. Mounting a volume typically is executed by running “mount.sdfs -v -m . As an example “mount.sdfs -v sdfs -m /media/dedup will mount the volume as configured by /etc/sdfs/sdfs-volume-cfg.xml to the path /media/dedup. Volume mounting options are as follows:

-c              sdfs volume will be compacted and then exit

-d              debug output

-forcecompact   sdfs volume will be compacted even if it is missing blocks. This option is used in conjunction with -c

-h              displays available options

-m        mount point for SDFS file system e.g. /media/dedup

-nossl          If set ssl will not be used sdfscli traffic.

-o        fuse mount options.Will default to direct_io,big_writes,allow_other,fsname=SDFS

-rv       comma separated list of remote volumes that should also be accounted for when doing garbage collection. If not entered the volume will attempt to identify other volumes in the cluster.

-s              Run single threaded

-v        sdfs volume to mount e.g. dedup

-vc       sdfs volume configuration file to mount e.g. /etc/sdfs/dedup-volume-cfg.xml

Volumes are unmounted automatically when the mount.sdfs is killed or the volume is unmounted using  the umount command.

To mount a volume run

Exporting SDFS Volumes:

SDFS can be shared through NFS or ISCSI exports on Linux kernel 2.6.31 and above.

NFS Exports

SDFS is supported and has been tested with NFSv3. NFS opens and closes files with every read or write. File opens and closes are expensive for SDFS and as such can degrade performance when running over NFS. SDFS volumes can be optimized for NFS with the option “--io-safe-close=false” when creating the volume. This will leave files open for NFS reads and writes. File data will still be sync’d with every write command, so data integrity will still be maintained. Files will be closed after an inactivity period has been reached. By default this inactivity period is 15 minutes (900 seconds) but can be changed at any time, along with the io-safe-close option, within the xml configuration file located in /etc/sdfs/-volume-cfg.xml.

To export an SDFS Volume or FSS via NFS use the fsid= option as part of the syntax in your /etc/exports. As an example, if an SDFS volume is mounted at /media/pool0 and you wanted to export it to the world, you would use the following syntax in your /etc/exports

/media/pool0 *(rw,async,no_subtree_check,fsid=12)

ISCSI Exports

SDFS is supported and has been tested with LIO using fileio. This means that LIO serves up a file on an SDFS volume as a virtual volume itself. On the SDFS volume the exported volumes are represented as large files within the filesystem. This is a common setup for ISCSI exports. The following command sequence will serve up a 100GB ISCSI volume from an SDFS filesystem mounted at /media/pool0 without any authentication.

tcm_node --fileio fileio_0/test /media/pool0.test.iscsi 107374182400

lio_node --addlun iqn.2013.org.opendedup.iscsi 1 0 test fileio_0/test

lio_node --addnp iqn.2013.org.opendedup.iscsi 1 0.0.0.0:3260

lio_node --permissive iqn.2013.org.opendedup.iscsi 1

lio_node --disableauth=iqn.2013.org.opendedup.iscsi 1

echo 0 > /sys/kernel/config/target/iscsi/iqn.2013.org.opendedup.iscsi/tpgt_1/attrib/demo_mode_write_protect

Managing SDFS Volumes for Virtual Machines:

It was the original goal of SDFS to be a file system for virtual machines. Again, to get proper deduplication rates for VMDK files set io-chunk-size to “4” when creating the volume. This will usually match the chunk size of the guest OS file system. NTFS allows 32k chunk sizes, but not on root volumes. It may be advantageous, for Windows guest environments, to have the root volume on one mounted SDFS path at 4k chunk size and data volumes in another SDFS path at 32k chunk size. Then format the data NTFS volumes, within the guest, for 32k chunk sizes. This will provide optimal performance.

Managing SDFS Volumes through SDFS command line

Online SDFS management is done through sdfscli. This is a command line executable that allows access to management and information about a particular SDFS volume. The volume in question must be mounted when the command line is executed. Below are the command line parameters that can be run. Help is also available for the command line when run as sdfscli --help. The volume itself listens as an https service on a port starting at 6442. By default the volume will only listen on the loopback adapter. This can be changed during volume creation by adding the “enable-replication-master” option. In addition, after creation, this can be changed by modifying the “listen-address” attribute in the sdfscli tag within the xml config. If multiple volumes are mounted the volume will automatically choose the next highest available port. The tcp port can be determined by running a “df -h”; the port will be designated after the “:” within the device name.

usage: sdfs.cmd

--archive-out                   Creates an archive tar for a particular file or folder and outputs the location.
--change-password               Change the administrative password.
--cleanstore                    Clean the dedup storage engine of data that is older than defined minutes and is unclaimed by current files. This command only works if the dedup storage engine is local and not in network mode
--cluster-dse-info              Returns Dedup Storage Engine statistics for all Storage Nodes in the cluster.
--cluster-make-gc-master        Makes this host the current Garbage Collection Coordinator.
--cluster-redundancy-check      Makes sure that the storage cluster maintains the required number of copies for each block of data
--cluster-volume-add            Adds an unassociated volume in the cluster.
--cluster-volume-remove         Removes an unassociated volume in the cluster.
--cluster-volumes               Returns a list of SDFS Volumes in the cluster.
--debug                         Makes output more verbose
--debug-info                    Returns Debug Information.
--dedup-file                    Deduplicates all file blocks if set to true, otherwise it will only dedup blocks that are already stored in the DSE.
--file-path=
--dse-info                      Returns Dedup Storage Engine statistics.
--expandvolume                  Expand the local volume, online, to a size in MB, GB, or TB
--file-info                     Returns io file attributes such as dedup rate and file io statistics. e.g. --file-info --file-path=
--file-path                     The relative path to the file or folder to take action on.
--flush-all-buffers             Flushes all buffers within an SDFS file system.
--flush-file-buffers            Flushes the buffer of a particular file.
--help                          Display these options.
--import-archive                Imports an archive created using archive out.
--replication-master=
--replication-master-password=
--nossl                         If set, tries to connect to volume without ssl
--password                      Password to authenticate to the SDFS CLI Interface for the volume.
--perfmon-on                    Turn on or off the volume performance monitor.
--port                          SDFS CLI Interface tcp listening port for the volume.
--replication-batch-size        The size, in MB, of the batch that the replication client will request from the replication master. If ignored or set to <1 it will default to whatever is set on the replication client volume as the default. This is currently 30 MB. This will default to “-1”
--replication-batch-size=
--replication-master-port       The server port associated with the archive imported. This will default to “6442”
--server                        SDFS host location.
--snapshot                      Creates a snapshot for a particular file or folder
--snapshot-path                 The relative path to the destination of the snapshot.
--volume-info                   Returns SDFS Volume Statistics.

File-system Snapshots:

SDFS provides snapshot functions for files and folders. The snapshot command is “sdfscli --snapshot --snapshot-path= --file-path=”. The destination path is relative to the mount point of the sdfs filesystem.

As an example, to snap a file “/mounted-sdfs-volume/source.bin” to /mounted-sdfs-volume/folder/target.bin you would run the following command:

sdfscli --snapshot --snapshot-path=folder/target.bin --file-path=source.bin

The snapshot command makes a copy of the MetaDataDedupFile and the map file and associates this copy with the snapshot path. This means that no actual data is copied and unlimited snapshots can be created, without performance impact to the target or source, since they are not associated or linked in any way.

SDFS File System Service Volume Storage Reporting, Utilization, and Compacting A Volume

Reporting

An FSS reports its size and usage to the operating system, by default, based on the capacity and current usage of the data stored within the DSE. This means that a volume could hold a much larger amount of logical data (if all the file sizes in the filesystem were added up) than is reported to the OS. This method of reporting is the most accurate, as it reports actual physical capacity; with deduplication you should see the current size as much smaller than the logical data would suggest.

It is also possible for the FSS to report the logical capacity and utilization of the volume. This means that SDFS will report the logical capacity, as specified during volume creation with “--volume-capacity”, and current usage based on the logical size of the files as reported during an “ls” command. To change the reporting, the following parameters will need to be changed in the sdfs xml configuration when the volume is not mounted.

To set the FSS to report capacity based on the –volume-capacity

use-dse-capacity=”false”

To set the FSS to report current utilization based on logical capacity

use-dse-size=”false”

Volume size can be reported in many ways. Both operating system tools and the sdfscli can be used to view capacity and usage of an FSS. A quick way to view the way the filesystem sees an SDFS Volume is running “df -h”. The OS will report the volume name as a concatenation of the config and the port the sdfscli service is listening on. The port is important because it can be used to connect to multiple volumes from the sdfscli using the “--port” option. The size and used columns report either the capacity and usage of the DSE or the logical capacity and usage of the volume, depending on configuration.

Volume usage statistics are also reported by the sdfs command line. For both standalone and volumes in a cluster configuration the command line “sdfscli.sh –volume-info” can be executed. This will output statistics about the local volume.

 

Utilization

SDFS Volumes grow over time as more unique data is stored. As unique data is de-referenced from volumes it is deleted from the DSE, if not claimed, during the garbage collection process. Deleted blocks are overwritten over time. Typically this allows for efficient use of storage but if data is aggressively added or deleted a volume can have a lot of empty space and fragmentation where unique blocks that have been deleted used to reside. This option is only available for standalone volumes.

Dedup Storage Engine (DSE):

The Dedup Storage Engine (DSE) provides services to store, retrieve, and remove deduplicated chunks of data. Each SDFS Volume contains and manages its own DSE.

 

 

Dedup Storage Engine – Cloud Based Deduplication:

The DSE can be configured to store data to the Amazon S3 cloud storage service or Azure. When enabled, all unique blocks will be stored in a bucket of your choosing. Each block is stored as an individual blob in the bucket. Data can be encrypted before transit, and at rest, with S3 cloud storage using AES-256 bit encryption. In addition, all data is compressed by default before being sent to the cloud.

The purpose of deduplicating data before sending it to cloud storage is to minimize storage and maximize write performance. The concept behind deduplication is to only store unique blocks of data. If only unique data is sent to cloud storage, bandwidth can be optimized and cloud storage can be reduced. Opendedup approaches cloud storage differently than a traditional cloud based file system. The volume data such as the namespace and file metadata are stored locally on the system where the SDFS volume is mounted. Only the unique chunks of data are stored at the cloud storage provider. This ensures maximum performance by allowing all file system functions to be performed locally except for data reads and writes. In addition, local read and write caching should make writing smaller files transparent to the user or service writing to the volume.

Cloud based storage has been enabled for S3 Amazon Web service and Azure. To create a volume using cloud storage take a look at the cloud storage guide here.

Dedup Storage Engine Memory:

The SDFS filesystem itself uses about 3GB of RAM for internal processing and caching. For hash table caching and chunk storage, kernel memory is used. It is advisable to have enough memory to store the entire hashtable so that SDFS does not have to scan swap space or the file system to look up hashes.

To calculate memory requirements keep in mind that the lookup structures take up approximately 256 MB of RAM per 1 TB of unique storage.

Dedup Storage Engine Crash Recovery:

If a Dedup Storage Engine crashes, a recovery process will be initiated on the next start. In a clustered node setup, a DSE determines that it was not shut down gracefully if /chunkstore/hdb/.lock exists on startup. In a standalone setup the SDFS volume detects a crash if closed-gracefully=”false” within the configuration xml. If a crash is detected, the Dedup Storage Engine will go through the recovery process.

The DSE recovery process re-hashes all of the blocks stored on disk, or gets the hashes from the cloud. It then verifies the hashes are in the hash database and adds an entry if none exists. This process will claim all hashes, even previously dereferenced hashes that were removed during garbage collection.

In a standalone setup, the volume will then perform a garbage collection to de-reference any orphaned block. In a clustered configuration the garbage collection will need to be performed manually.

Data Chunks:

The chunk size must match for both the SDFS Volume and the Deduplication Storage Engine. The default for SDFS is to store chunks at 4K size. The chunk size must be set at volume and Deduplication Storage Engine creation. When Volumes are created with their own local Deduplication Storage Engine chunk sizes are matched up automatically, but, when the Deduplication Storage Engine is run as a network service this must be set before the data is stored within the engine.

Within an SDFS volume the chunk size is set upon creation with the option --io-chunk-size. The option --io-chunk-size sets the size of chunks that are hashed and can only be changed before the file system is mounted for the first time. The default setting is 4K but it can be set as high as 128K. The size of chunks determines the efficiency at which files will be deduplicated, at the cost of RAM. As an example, a 4K chunk size provides perfect deduplication for Virtual Machines (VMDKs) because it matches the cluster size of most guest OS file systems, but can cost as much as 6GB of RAM per 1TB to store. In contrast, setting the chunk size to 128K is perfect for archived, unstructured data, such as rsync backups, and will allow you to store as much as 32TB of data with the same 6GB of memory.

To create a volume that will store VMs (VMDK files) create a volume using a 32K chunk size as follows:

sudo ./mkfs.sdfs --volume-name=sdfs_vol1 --volume-capacity=150GB --io-chunk-size=32

As stated, when running SDFS Volumes with a local DSE chunk sizes are matched automatically, but if running the DSE as a network service, then a parameter within the DSE configuration XML file will need to be set before any data is stored. The parameter is:

page-size=""

As an example, to set a 4k chunk size the option would need to be set to:

page-size="4096"

File and Folder Placement:

Deduplication is IO intensive. SDFS, by default, writes data to /opt/sdfs. SDFS does a lot of writes when persisting data and a lot of random IO when reading data. For high IO intensive applications it is suggested that you split at least the chunk-store-data-location and chunk-store-hashdb-location onto fast and separate physical disks. From experience these are the most IO intensive stores and could take advantage of faster IO.

Other options and extended attributes:

SDFS uses extended attributes to manipulate the SDFS file system and files contained within. It is also used to report on IO performance. To get a list of commands and readable IO statistics run “getfattr -d *” within the mount point of the sdfs file system.

sdfscli --file-info --file-path=

SDFS Volume Replication:

SDFS now provides asynchronous master/slave volume and subvolume replication through the sdfsreplicate service and script. SDFS volume replication takes a snapshot of the designated master volume or subfolder and then replicates metadata and unique blocks to the secondary, or slave, SDFS volume. Only unique blocks that are not already stored on the slave volume are replicated, so data transfer should be minimal. The benefits of SDFS Replication are:

* Fast replication – SDFS can replicate large volume sets quickly.

* Reduced bandwidth – Only unique data is replicated between volumes

* Built-in scheduling – The sdfsreplicate service has a built in scheduling engine based on cron style syntax.

* Sub-volume replication – The sdfsreplicate service can replicate volumes or subfolders to slave volumes. In addition, replication can be targeted to sub-volumes on the slave.

* Sub-volume targets on the slave allow for wildcard naming such as an appended timestamp or the hostname of the master.

The steps SDFS uses to perform  asynchronous replication are the following:

1. The sdfsreplicate service, on the slave volume host, requests a snapshot of the master volume or subfolder over the master’s tcp management channel (typically port 6442).

2. The master volume creates a snapshot of all SDFS metadata and data maps.

3. The master volume tars and zips the snapshot metadata and data maps.

4. The sdfsreplicate service, on the slave volume host, downloads the snapshot tar over the master’s tcp management channel (typically port 6442).

5. The slave volume unzips and imports the tar to its volume structure

6. The slave volume imports data associated with the master snapshot to its dedup storage engine from the master volume over the master’s management cli channel. This defaults to TCP port 6442.

The steps required to setup master/slave replication are the following:

1. Configure your SDFS master volume to allow replication. This is done by creating an SDFS volume with the command line parameter “--enable-replication-master”. e.g. mkfs.sdfs --volume-name=vol0 --volume-capacity=1TB --io-chunk-size=4 --chunk-store-size=200GB --enable-replication-master

2. Configure the replication.props configuration file on the slave. An example of this script is included in the etc/sdfs directory    and includes the following parameters:

#Replication master settings

#IP address of the server where the master volume is located

replication.master=master-ip

#Number of copies of the replicated folder to keep. This will use First In First Out.

#It must be used in combination with the replication.slave.folder option %d. If set to -1 it is ignored

replication.copies=-1

#the password of the master. This defaults to “admin”

replication.master.password=admin

#The sdfscli port on the master server. This defaults to 6442

replication.master.port=6442

#The folder within the volume that should be replicated. If you would like to replicate the entire volume use “/”

replication.master.folder=/

#Replication slave settings#The local ip address that the sdfscli is listening on for the slavevolume.

replication.slave=localhost

#the password used on the sdfscli for the slave volume. This defaults to admin

replication.slave.password=admin

#The tcp port the sdfscli is listening on for the slave

replication.slave.port=6442

#The folder where you would like to replicate to wild cards are %d (date as yyMMddHHmmss) %h (remote host)

#the slave folder to replicate to e.g. backup-%h-%d will output “backup--

replication.slave.folder=backup-%h-%d

#The batch size the replication slave requests data from the server in MB. This defaults to 30MB but can be anything up to 128 MB.

replication.batchsize=-1

#Replication service settings#The folder where the SDFS master snapshot will be downloaded to on the slave. The snapshot tar archive is deleted after import.

archive.staging=/tmp

#The log file that will output replication status

logfile=/var/log/sdfs/replication.log

#Schedule cron = as a cron job, single = run one time

schedule.type=cron

#Every 30 minutes take a look at http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/tutorial-lesson-06 for scheduling tutorial

schedule.cron=0 0/30 * * * ?

#The folder where job history will be persisted. This defaults to a folder call “replhistory” under the same directory where this file is located.

#job.history.folder=/etc/sdfs/replhistory

3. Run the sdfsreplicate script on the slave. This will either run once and exit if schedule.type=single or will run continuously with    schedule.type=cron

e.g. sdfsreplicate /etc/sdfs/replication.props

Data Chunk Removal:

SDFS uses two methods to remove unused data from a Dedup Storage Engine (DSE), assuming the SDFS volume has its own dedup storage engine, which it does by default. Unused, or orphaned, chunks are removed as the size of the DSE increases at 10% increments and at a specified schedule (defaults to midnight). The specified schedule can be configured at creation with the io-claim-chunks-schedule option. Otherwise it can be configured afterwards with the sdfscli command option --set-gc-schedule. Take a look at the cron format documentation to review the accepted cron syntax. Below is the process for garbage collection.

SDFS tracks reference counts for all unique data stored in the DSE.

The DSE checks for data that is no longer referenced.

The chunks that are no longer referenced are:

Stand Alone – Compacted from the Data Archive where they are contained

Cloud Storage – De-referenced from the cloud data archive. Data in the cloud is only removed once all the containing data chunks are removed.

The Dedup Storage Engine can be cleaned manually by running :

sdfscli --cleanstore

Sizing and Scaling

When running OpenDedupe at scale, disk, cpu, and memory requirements need to be considered to size appropriately.

Data Stored on Disk

Cloud Volumes – SDFS Stores file metadata, a local hashtable, and a cache of unique blocks on local disk.

Local Volumes –  SDFS Stores file metadata, a local hashtable, and all unique blocks on local disk.

Data Types:

File MetaData – Information about files and folders stored on opendedupe volumes. This data is also stored in the cloud for DR purposes when using cloud storage. File MetaData represents .21% of the non deduplicated size of the file stored.

HashTable – The hashtable is the lookup table that is used to identify whether incoming data is unique. The hashtable is stored on local disk and in the cloud for object storage backed instances. For local instances the hashtable is stored on local disk only. The hashtable is .4% of the unique storage size.

Local Cache – For Object storage backed volumes, active data is cached locally. The local cache stores compressed, deduplicated blocks only. This local cache size is set to 10GB by default but can be set to any capacity required with a minimum of 1GB. The local cache helps with restore performance and accelerates backup performance.

Local Unique Data Store – OpenDedupe stores all unique blocks locally for volumes not backed by object storage. For Object storage backed volumes this is not used. Local storage size will depend on the data being backed up and retention but typically represents 100% of the front end data for a 60 Day retention. OpenDedupe uses a similar variable block deduplication method to a DataDomain so it will be inline with its sizing requirements.

Storage Performance:

Minimum local disk storage performance:

2000 random read IOPS

2400 random write IOPS

180 MB/s of streaming reads

120 MB/s of streaming writes

Supported Filesystems:

VXFS

XFS

EXT4

NTFS (Windows ONLY)

Storage Requirements:

The following percentages should be used to calculate local storage requirements for Object Backed dedupe Volumes:

MetaData: .21% of Non-Deduped Data Stored

Local Cache: 10GB by default

HashTable: .2% of Deduped Data

An example for 100TB of deduped data with an 8:1 dedupe rate would be as follows:

Logical Data Stored on Disk = 8x100TB = 800TB

Local Cache = 10GB

Unique Data Stored in the Object Store 100TB

MetaData

.21%Logical Data Stored on Disk=MetaData Size

.0021x800TB=1.68TB

HashTable

.2% * Unique Storage

.002* 100TB = 400GB

Total Volume Storage Requirements

Local Cache + MetaData + Hashtable

10GB + 1.68TB + 400GB = 2.09TB

 

The following percentages should be used to calculate local storage requirements for local dedupe Volumes:

MetaData: .21% of Non-Deduped Data Stored

Local Cache: 10GB by default

HashTable: .2% of Deduped Data

Unique Data

An example for 100TB of deduped data with an 8:1 dedupe rate would be as follows:

Logical Data Stored on Disk = 8x100TB = 800TB

Unique Data Stored on disk 100TB

MetaData

.21%Logical Data Stored on Disk=MetaData Size

.0021x800TB=1.68TB

HashTable

.2% * Unique Storage

.002* 100TB = 400GB

Total Volume Storage Requirements

Unique Data + MetaData + Hashtable

100TB + 1.68TB + 400GB = 102.08TB

Memory Sizing :

Memory for OpenDedupe is primarily used for internal simplified lookup tables (bloom filter) that indicate, with some likelihood that a hash is already stored or not. These data structures take about 256MB per TB of data stored. 3GB of additional base memory is required for other uses.

In addition to memory used by opendedupe you will want to have memory available for filesystem cache to cache the most active parts of the lookup hashtable into RAM. For a volume less than 1TB you will need an additional 3GB of RAM. For a volume less than 100TB you will need an additional 8GB of RAM. For a volume over 100TB you will need an additional 16GB of RAM.

An example for 100TB of deduped data:

Hash Table Memory

256MB per 1TB of Storage

256MB x 100TB = 25.6 GB

3GB of base memory

8GB of Free RAM for Disk Cache

Total = 25.6+3+8=36.6GB of RAM

CPU Sizing:

As long as the disk meets minimum IO and IOPs requirements the primary limiter for OpenDedupe performance will be CPU at higher dedupe rates. At lower dedupe rates volumes will be limited by the speed of the underlying disk.

For a single 16 Core CPU, SDFS will perform at :

2GB/s for 2% Unique Data

Speed of local disk for 100% unique data. Using minimum requirements this would equal 120MB/s.

TroubleShooting:

There are a few common errors with simple fixes.

1. OutOfMemoryError – This is caused by the size of the DedupStorageEngine memory requirements being larger than the heap size allocated for the JVM. To fix this edit the mount.sdfs script and increase the -Xmx2g to something larger (e.g. -Xmx3g).

2. java.io.IOException : Too Many Open Files – This is caused by there not being enough available file handles for underlying filesystem processes. To fix this add the following lines to /etc/security/limits.conf and the relogin/restart your system.

* soft nofile 65535

* hard nofile 65535
