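Earlier we successfully ran the yenchenlin/nerf-pytorch project (yen for short) on our own dataset.
However, the results were fairly poor. We see two likely causes:
1. The resolution of the training images is too low.
2. The hyperparameters are not well chosen.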

[Figure: results reported in the paper Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering]
[Figure: results from our own run]

Table of contents

Goals
args.config
- Basic parameters
- training options
- rendering options
- training options
- dataset options
- Parameters for loading LLFF-type datasets
- logging/saving options
Debugging: inspecting the data
- load_llff.py `_load_data()`
- load_llff.py `_minify()`
- load_llff.py `load_llff_data()`
- load_llff.py `render_path_spiral()`
- run_nerf.py `train()`
    - Create log dir and copy the config file
    - Create nerf model
    - Move testing data to GPU
    - Prepare raybatch tensor if batching random rays
    - Move training data to GPU
    - Start the training iterations
        - Sample random ray batch
    - render
    - Save checkpoints
    - Write the mp4 video
    - Save the test set
    - render_only
- run_nerf.py `create_nerf()`
    - Create optimizer
    - Load checkpoints
- run_nerf_helpers.py `class NeRF()`
    - __init__()
    - forward()
- run_nerf_helpers.py `get_rays_np()`
- run_nerf.py `render()`
- run_nerf.py `batchify_rays()`
- run_nerf.py `render_rays()`
- run_nerf.py `raw2outputs()`
- run_nerf.py `render_path()`
- Summary

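Goals

By reading the yen source code, we want to answer the following questions and reach the following goals:

- The meaning of each parameter in config.txt.
- The meaning of the important variables in the code and how they are computed.
- Are the poses and bds computed by COLMAP the same before and after changing the resolution?
- Where are the quantitative metrics reported in the paper computed, and where are they written?
- How is render_poses related to poses?
- What does the recenter parameter of load_llff_data() do?

Method: configure the interpreter in PyCharm and set breakpoints to inspect the data.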

args.config

Only after stepping through the whole of train() did I realize something important: I should have looked at args first!

Basic parameters

    parser.add_argument('--config', is_config_file=True,
                        help='config file path')  # path of the config file (config.txt) to read
    parser.add_argument("--expname", type=str,
                        help='experiment name')  # name of the experiment
    parser.add_argument("--basedir", type=str, default='./logs/',
                        help='where to store ckpts and logs')  # output directory
    parser.add_argument("--datadir", type=str, default='./data/llff/fern',
                        help='input data directory')  # data directory

training options

    parser.add_argument("--netdepth", type=int, default=8,
                        help='layers in network')   # depth of the network (number of layers)
    parser.add_argument("--netwidth", type=int, default=256,
                        help='channels per layer')  # width of the network, i.e. neurons per layer
    parser.add_argument("--netdepth_fine", type=int, default=8,
                        help='layers in fine network')
    parser.add_argument("--netwidth_fine", type=int, default=256,
                        help='channels per layer in fine network')
    parser.add_argument("--N_rand", type=int, default=32*32*4,  # batch size: number of random rays per gradient step
                        help='batch size (number of random rays per gradient step)')
    parser.add_argument("--lrate", type=float, default=5e-4,  # learning rate
                        help='learning rate')
    parser.add_argument("--lrate_decay", type=int, default=250,  # exponential learning-rate decay, in units of 1000 steps
                        help='exponential learning rate decay (in 1000 steps)')
    parser.add_argument("--chunk", type=int, default=1024*32,  # rays processed in parallel; lower it if memory runs out
                        help='number of rays processed in parallel, decrease if running out of memory')
    parser.add_argument("--netchunk", type=int, default=1024*64,  # points sent through the network in parallel; lower it if memory runs out
                        help='number of pts sent through network in parallel, decrease if running out of memory')
    parser.add_argument("--no_batching", action='store_true',  # take random rays from only one image at a time
                        help='only take random rays from 1 image at a time')
    parser.add_argument("--no_reload", action='store_true',  # do not reload weights from a saved ckpt
                        help='do not reload weights from saved ckpt')
    parser.add_argument("--ft_path", type=str, default=None,  # specific weights npy file to reload for the coarse network
                        help='specific weights npy file to reload for coarse network')
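The unit of `--lrate_decay` is easy to misread: the value is in thousands of steps. As far as we can tell from `train()` in run_nerf.py, the learning rate is multiplied by 0.1 every `lrate_decay * 1000` steps. The sketch below reproduces that schedule as a standalone function; the function name `decayed_lrate` is our own and does not exist in the repo.

```python
def decayed_lrate(global_step, lrate=5e-4, lrate_decay=250, decay_rate=0.1):
    """Exponential decay: the rate shrinks by `decay_rate` every lrate_decay*1000 steps."""
    decay_steps = lrate_decay * 1000
    return lrate * (decay_rate ** (global_step / decay_steps))


if __name__ == "__main__":
    # Learning rate at the start, after 50k steps and after 250k steps.
    for step in (0, 50_000, 250_000):
        print(step, decayed_lrate(step))
```

With the defaults, the rate falls from 5e-4 to one tenth of that after 250,000 steps.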

rendering options

    parser.add_argument("--N_samples", type=int, default=64,  # number of coarse samples per ray
                        help='number of coarse samples per ray')
    parser.add_argument("--N_importance", type=int, default=0,  # number of additional fine samples per ray
                        help='number of additional fine samples per ray')
    parser.add_argument("--perturb", type=float, default=1.,  # 0. for no jitter, 1. for jitter
                        help='set to 0. for no jitter, 1. for jitter')
    parser.add_argument("--use_viewdirs", action='store_true',  # feed position plus viewing direction (5D) instead of position only (3D)
                        help='use full 5D input instead of 3D')
    parser.add_argument("--i_embed", type=int, default=0,  # 0 for the default positional encoding, -1 for none
                        help='set 0 for default positional encoding, -1 for none')
    parser.add_argument("--multires", type=int, default=10,  # log2 of the max frequency of the positional encoding (3D location)
                        help='log2 of max freq for positional encoding (3D location)')
    parser.add_argument("--multires_views", type=int, default=4,  # log2 of the max frequency of the positional encoding (2D direction)
                        help='log2 of max freq for positional encoding (2D direction)')
    parser.add_argument("--raw_noise_std", type=float, default=0.,  # standard deviation (not variance) of the noise added to sigma_a
                        help='std dev of noise added to regularize sigma_a output, 1e0 recommended')

    parser.add_argument("--render_only", action='store_true',  # no optimization; reload weights and render the render_poses path
                        help='do not optimize, reload weights and render out render_poses path')
    parser.add_argument("--render_test", action='store_true',  # render the test set instead of the render_poses path
                        help='render the test set instead of render_poses path')
    parser.add_argument("--render_factor", type=int, default=0,  # downsampling factor to speed up rendering; 4 or 8 for a fast preview
                        help='downsampling factor to speed up rendering, set 4 or 8 for fast preview')
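To make `--multires` and `--multires_views` concrete, here is a minimal NumPy sketch of a sin/cos positional encoding with frequencies 2^0 ... 2^(multires-1). It is our own re-implementation for illustration (the repo uses a PyTorch embedder class), and it assumes that the raw input is kept alongside the encoded features.

```python
import numpy as np


def positional_encoding(x, multires, include_input=True):
    """Encode each coordinate as [x, sin(2^k x), cos(2^k x)] for k = 0..multires-1."""
    freq_bands = 2.0 ** np.arange(multires)  # 1, 2, 4, ..., 2^(multires-1)
    feats = [x] if include_input else []
    for freq in freq_bands:
        feats.append(np.sin(x * freq))
        feats.append(np.cos(x * freq))
    return np.concatenate(feats, axis=-1)


pts = np.random.rand(5, 3)   # five 3D sample positions
dirs = np.random.rand(5, 3)  # five viewing directions, stored as 3D vectors
print(positional_encoding(pts, multires=10).shape)   # (5, 63)
print(positional_encoding(dirs, multires=4).shape)   # (5, 27)
```

The output width is 3 + 3 * 2 * multires, so the defaults give 63 channels for positions and 27 for directions.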

training options

    parser.add_argument("--precrop_iters", type=int, default=0,  # number of steps to train on central crops of the image
                        help='number of steps to train on central crops')
    parser.add_argument("--precrop_frac", type=float,  # fraction of the image used for the central crop
                        default=.5, help='fraction of img taken for central crops')
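A short sketch of what the central crop means for our image size. The arithmetic follows what we believe `train()` does during the first `precrop_iters` steps; the helper name `central_crop_bounds` is hypothetical.

```python
def central_crop_bounds(H, W, precrop_frac=0.5):
    """Return inclusive (first, last) row and column indices of the central crop."""
    dH = int(H // 2 * precrop_frac)
    dW = int(W // 2 * precrop_frac)
    rows = (H // 2 - dH, H // 2 + dH - 1)
    cols = (W // 2 - dW, W // 2 + dW - 1)
    return rows, cols


# desk2 at factor 8: images are 543 x 724
print(central_crop_bounds(543, 724))  # ((136, 405), (181, 542))
```

With precrop_frac = 0.5, rays are sampled only from a 270 x 362 window in the middle of each image.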

dataset options

    parser.add_argument("--dataset_type", type=str, default='llff',
                        help='options: llff / blender / deepvoxels')
    parser.add_argument("--testskip", type=int, default=8,  # load 1/N of the images from the test/val sets; useful for large datasets such as deepvoxels
                        help='will load 1/N images from test/val sets, useful for large datasets like deepvoxels')

Parameters for loading LLFF-type datasets

    parser.add_argument("--factor", type=int, default=8,  # downsampling factor for LLFF images
                        help='downsample factor for LLFF images')
    parser.add_argument("--no_ndc", action='store_true',  # store_true: defaults to False, becomes True when the flag is passed (store_false is the opposite)
                        help='do not use normalized device coordinates (set for non-forward facing scenes)')
    parser.add_argument("--lindisp", action='store_true',  # sample linearly in disparity (1/depth) rather than in depth
                        help='sampling linearly in disparity rather than depth')
    parser.add_argument("--spherify", action='store_true',  # set for spherical 360 scenes
                        help='set for spherical 360 scenes')
    parser.add_argument("--llffhold", type=int, default=8,  # hold out every N-th image as the LLFF test set; the paper uses 8
                        help='will take every 1/N images as LLFF test set, paper uses 8')
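The difference between sampling in depth and sampling in disparity (`--lindisp`) is easiest to see with numbers. The sketch below places samples between near and far in both ways; it mirrors the formulas we believe `render_rays()` uses, but the function name `sample_depths` is our own. It assumes near > 0, since disparity is 1/depth.

```python
import numpy as np


def sample_depths(near, far, n_samples, lindisp=False):
    """Place n_samples depths between near and far, evenly in depth or in disparity."""
    t_vals = np.linspace(0.0, 1.0, n_samples)
    if not lindisp:
        return near * (1.0 - t_vals) + far * t_vals
    return 1.0 / (1.0 / near * (1.0 - t_vals) + 1.0 / far * t_vals)


z_depth = sample_depths(2.0, 6.0, 5)
z_disp = sample_depths(2.0, 6.0, 5, lindisp=True)
print(z_depth)  # evenly spaced: 2, 3, 4, 5, 6
print(z_disp)   # denser near the camera: 2, 2.4, 3, 4, 6
```

Sampling in disparity spends more samples close to the camera, which helps when nearby geometry matters more than the distant background.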

logging/saving options

	parser.add_argument("--i_print",   type=int, default=100, 
                        help='frequency of console printout and metric loggin')
    parser.add_argument("--i_img",     type=int, default=500, 
                        help='frequency of tensorboard image logging')
    parser.add_argument("--i_weights", type=int, default=10000, 
                        help='frequency of weight ckpt saving')
    parser.add_argument("--i_testset", type=int, default=50000, 
                        help='frequency of testset saving')
    parser.add_argument("--i_video",   type=int, default=50000, 
                        help='frequency of render_poses video saving')
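Several of the options above are boolean flags declared with action='store_true'. The sketch below shows how such a flag behaves, using the standard-library argparse as a stand-in for the config-file-aware parser in the repo (an assumption made to keep the example dependency-free).

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--no_ndc", action="store_true")  # False unless the flag is given
parser.add_argument("--factor", type=int, default=8)

defaults = parser.parse_args([])
flagged = parser.parse_args(["--no_ndc", "--factor", "4"])

print(defaults.no_ndc, defaults.factor)  # False 8
print(flagged.no_ndc, flagged.factor)    # True 4
```

Any option that is not given, on the command line or in the config file, keeps the default from its declaration.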

Debugging: inspecting the data

We test with the desk2 dataset,
which contains 151 images.

load_llff.py `_load_data()`

poses_arr is the raw data loaded from poses_bounds.npy; its shape is 151 x 17.
poses = poses_arr[:, :-2].reshape([-1, 3, 5]).transpose([1,2,0]) has shape (3, 5, 151).
bds = poses_arr[:, -2:].transpose([1,0]) has shape (2, 151).
img0 = [os.path.join(basedir, 'images', f) for f in sorted(os.listdir(os.path.join(basedir, 'images'))) if f.endswith('JPG') or f.endswith('jpg') or f.endswith('png')][0] is the path of the first image, used to inspect a single image; here img0 = '/data/img_desk2/images/0000.jpg'.
sh = imageio.imread(img0).shape is the shape of that image, (4344, 5792, 3).
_minify() is then called to create the dataset at the target resolution; it returns nothing.
imgfiles is a list holding the paths of the target-resolution images.
The image shape is read again: sh = (543, 724, 3).
poses[:2, 4, :] = np.array(sh[:2]).reshape([2, 1]) writes the new height and width into the last column; poses keeps shape (3, 5, 151).
poses[2, 4, :] = poses[2, 4, :] * 1./factor divides the focal length by factor; the shape is still (3, 5, 151).
imgs = [imread(f)[...,:3]/255. for f in imgfiles] reads all images and scales the values to the range 0-1.
imgs = np.stack(imgs, -1) converts the list to an array of shape (543, 724, 3, 151).
return poses, bds, imgs
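The reshaping above can be replayed on synthetic data without the dataset. In this sketch the 151 x 17 array is random; only the three entries that hold height, width and focal length are filled in. The focal value 4298.1504 is back-computed from 537.2688 x 8 and is an assumption, not a number read from poses_bounds.npy.

```python
import numpy as np

n_images, factor = 151, 8
rng = np.random.default_rng(0)

# Stand-in for np.load('poses_bounds.npy'): 15 pose values + 2 bounds per image.
poses_arr = rng.normal(size=(n_images, 17))
poses_arr[:, 4] = 4344.0      # height  -> row 0, column 4 after the reshape
poses_arr[:, 9] = 5792.0      # width   -> row 1, column 4
poses_arr[:, 14] = 4298.1504  # focal   -> row 2, column 4 (assumed value)

poses = poses_arr[:, :-2].reshape([-1, 3, 5]).transpose([1, 2, 0])  # (3, 5, N)
bds = poses_arr[:, -2:].transpose([1, 0])                           # (2, N)

# After downsampling by `factor`, height/width/focal must be updated to match.
sh = (4344 // factor, 5792 // factor, 3)
poses[:2, 4, :] = np.array(sh[:2]).reshape([2, 1])
poses[2, 4, :] = poses[2, 4, :] * 1.0 / factor

print(poses.shape, bds.shape)  # (3, 5, 151) (2, 151)
print(poses[:, 4, 0])          # height, width, focal of the first image
```

Only the last column depends on the image resolution; the 3 x 4 camera matrix in the first four columns is left as loaded.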

load_llff.py `_minify()`

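This function is responsible for creating the dataset at the target resolution.

It first checks whether the target directory already exists; if it does, the function returns immediately.

    args = ' '.join(['mogrify', '-resize', resizearg, '-format', 'png', '*.{}'.format(ext)])
    print(args)
    os.chdir(imgdir)  # switch the working directory to the image folder
    check_output(args, shell=True)  # run the mogrify command there
    os.chdir(wd)  # restore the original working directory

These steps create the target dataset.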
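To see what command is actually executed, the sketch below only builds the command string and does not run it. It assumes that for an integer downsampling factor the resize argument is a percentage, 100 / factor; the helper name `build_mogrify_command` is hypothetical.

```python
def build_mogrify_command(factor, ext):
    """Build the ImageMagick command that resizes every *.ext image to 1/factor size."""
    resizearg = "{}%".format(100.0 / factor)
    return " ".join(["mogrify", "-resize", resizearg, "-format", "png", "*.{}".format(ext)])


print(build_mogrify_command(8, "jpg"))  # mogrify -resize 12.5% -format png *.jpg
```

Running the command requires ImageMagick to be installed. One way to avoid the two os.chdir calls is to pass the folder directly, for example subprocess.check_output(cmd, shell=True, cwd=imgdir).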

load_llff.py `load_llff_data()`

poses, bds, imgs = _load_data(basedir, factor=factor)

    poses = np.concatenate([poses[:, 1:2, :], -poses[:, 0:1, :], poses[:, 2:, :]], 1)
    poses = np.moveaxis(poses, -1, 0).astype(np.float32)
    imgs = np.moveaxis(imgs, -1, 0).astype(np.float32)
    images = imgs
    bds = np.moveaxis(bds, -1, 0).astype(np.float32)

Next the data is processed as above, which gives:
- bds has shape (151, 2).
- images has shape (151, 543, 724, 3), i.e. (number of images, height, width, channels).
- poses has shape (151, 3, 5); in other words, the pose of each image is a 3 x 5 matrix.
sc = 1. if bd_factor is None else 1./(bds.min() * bd_factor) is the scale factor applied to the scene bounds; here sc = 0.859302.
poses[:,:3,3] *= sc rescales the translation column of every pose.
bds *= sc rescales the bounds; since sc < 1 here, every value gets smaller, i.e. the bounds shrink.
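The effect of bd_factor can be checked on a toy example: after rescaling, the nearest bound always equals 1 / bd_factor. The input numbers below are made up (the near bound 1.5516 is back-computed from sc = 0.859302 and is an assumption), and `rescale_scene` is our own helper name.

```python
import numpy as np


def rescale_scene(poses, bds, bd_factor=0.75):
    """Scale camera positions and depth bounds so that the nearest bound is 1/bd_factor."""
    sc = 1.0 if bd_factor is None else 1.0 / (bds.min() * bd_factor)
    poses = poses.copy()
    poses[:, :3, 3] *= sc  # only the translation column is scaled
    return poses, bds * sc, sc


poses = np.zeros((2, 3, 5), dtype=np.float32)
poses[:, :3, 3] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
bds = np.array([[1.5516, 40.0], [2.0, 90.0]], dtype=np.float32)

poses_s, bds_s, sc = rescale_scene(poses, bds)
print(sc, bds_s.min())
```

Rotations and the (height, width, focal) column are untouched; only distances in the scene change.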

    if recenter:
        poses = recenter_poses(poses)

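After poses = recenter_poses(poses), poses still has shape (151, 3, 5). The operation changes the first four columns and leaves the last column untouched. (It is worth being clear about what each column means: the first four columns are the 3 x 4 camera-to-world matrix, and the last column is the image (height, width, focal length).)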
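To answer what recenter does, here is a NumPy sketch of the recentering step as we understand it: every pose is multiplied by the inverse of the average pose, so that the average camera ends up at the origin with identity rotation. The functions follow our reading of load_llff.py and are run on synthetic cameras, not on desk2.

```python
import numpy as np


def normalize(x):
    return x / np.linalg.norm(x)


def viewmatrix(z, up, pos):
    """Build a 3x4 camera-to-world matrix from a view direction, an up hint and a position."""
    vec2 = normalize(z)
    vec0 = normalize(np.cross(up, vec2))
    vec1 = normalize(np.cross(vec2, vec0))
    return np.stack([vec0, vec1, vec2, pos], 1)


def poses_avg(poses):
    """Average pose (3x5): mean position, summed z and y axes, hwf of the first pose."""
    hwf = poses[0, :3, -1:]
    center = poses[:, :3, 3].mean(0)
    vec2 = normalize(poses[:, :3, 2].sum(0))
    up = poses[:, :3, 1].sum(0)
    return np.concatenate([viewmatrix(vec2, up, center), hwf], 1)


def recenter_poses(poses):
    """Express every pose relative to the average pose; the hwf column is kept."""
    out = poses.copy()
    bottom = np.array([[0.0, 0.0, 0.0, 1.0]])
    c2w = np.concatenate([poses_avg(poses)[:3, :4], bottom], 0)  # (4, 4)
    tiled = np.tile(bottom[None], [poses.shape[0], 1, 1])        # (N, 1, 4)
    homo = np.concatenate([poses[:, :3, :4], tiled], 1)          # (N, 4, 4)
    out[:, :3, :4] = (np.linalg.inv(c2w) @ homo)[:, :3, :4]
    return out


# Ten synthetic cameras that roughly look along +z (made-up data).
rng = np.random.default_rng(0)
hwf = np.array([[543.0], [724.0], [537.2688]])
poses = np.stack([
    np.concatenate([viewmatrix(np.array([0.0, 0.0, 1.0]) + 0.1 * rng.normal(size=3),
                               np.array([0.0, 1.0, 0.0]) + 0.1 * rng.normal(size=3),
                               rng.normal(size=3) + 5.0), hwf], 1)
    for _ in range(10)
])

recentered = recenter_poses(poses)
print(np.round(poses_avg(recentered)[:, :4], 6))  # identity rotation, zero translation
```

This also explains why the average pose has the same values when it is recomputed after recentering: it already is the identity pose.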

    c2w = poses_avg(poses)  # 3x5
    print('recentered', c2w.shape)
    print(c2w[:3,:4])

    ## Get spiral
    # Get average pose
    up = normalize(poses[:, :3, 1].sum(0))   # 3x1

    # Find a reasonable "focus depth" for this dataset
    close_depth, inf_depth = bds.min()*.9, bds.max()*5.  # 1.19999, 1116.4336
    dt = .75
    mean_dz = 1./(((1.-dt)/close_depth + dt/inf_depth))  # 4.656
    focal = mean_dz  # focus depth of the spiral path

    # Get radii for spiral path
    shrink_factor = .8
    zdelta = close_depth * .2
    tt = poses[:,:3,3]  # ptstocam(poses[:3,3,:].T, c2w).T
    rads = np.percentile(np.abs(tt), 90, 0)  # 90th percentile along each axis
    c2w_path = c2w
    N_views = 120
    N_rots = 2
    if path_zflat:  # False in our run
        # zloc = np.percentile(tt, 10, 0)[2]
        zloc = -close_depth * .1
        c2w_path[:3,3] = c2w_path[:3,3] + zloc * c2w_path[:3,2]
        rads[2] = 0.
        N_rots = 1
        N_views/=2

    # Generate poses for spiral path
    render_poses = render_path_spiral(c2w_path, up, rads, focal, zdelta, zrate=.5, rots=N_rots, N=N_views)

The code above produces render_poses. Along the way:
- c2w = poses_avg(poses) has shape (3, 5); it is the average pose over all images.
- tt = poses[:,:3,3] takes the translation column (index 3) of every pose; shape (151, 3).
- rads = np.percentile(np.abs(tt), 90, 0) is the 90th percentile along each axis.
- render_poses = render_path_spiral(c2w_path, up, rads, focal, zdelta, zrate=.5, rots=N_rots, N=N_views) is a list of length 120 (set by N_views); each element has shape (3, 5), the same as a single entry of poses.
render_poses = np.array(render_poses).astype(np.float32) converts the list to an array of shape (120, 3, 5).
c2w = poses_avg(poses) is then computed again; compared with the earlier result, the values are identical.
dists = np.sum(np.square(c2w[:3,3] - poses[:,:3,3]), -1) has shape (151,).
i_test = np.argmin(dists) is the index of the smallest distance; here it is 83, so the log prints HOLDOUT view is 83.
return images, poses, bds, render_poses, i_test. At this point images is (151, 543, 724, 3), poses is (151, 3, 5), bds is (151, 2), render_poses is (120, 3, 5), and i_test = 83.

load_llff.py `render_path_spiral()`

In render_path_spiral(), hwf = c2w[:,4:5].
Each pose on the path is appended with render_poses.append(np.concatenate([viewmatrix(z, up, c), hwf], 1)).
return render_poses  # a list
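This is where the relation between render_poses and poses becomes visible: render_poses are not training poses but new cameras placed on a spiral around the average pose, all looking at a point `focal` units in front of it. The sketch below follows our reading of the function and runs on made-up inputs (identity average pose, arbitrary radii).

```python
import numpy as np


def normalize(x):
    return x / np.linalg.norm(x)


def viewmatrix(z, up, pos):
    vec2 = normalize(z)
    vec0 = normalize(np.cross(up, vec2))
    vec1 = normalize(np.cross(vec2, vec0))
    return np.stack([vec0, vec1, vec2, pos], 1)


def render_path_spiral(c2w, up, rads, focal, zdelta, zrate, rots, N):
    """Return N poses (3x5 each) on a spiral around the pose c2w; zdelta is not used here."""
    render_poses = []
    rads = np.array(list(rads) + [1.0])
    hwf = c2w[:, 4:5]
    for theta in np.linspace(0.0, 2.0 * np.pi * rots, N + 1)[:-1]:
        # Camera centre on the spiral, mapped from the average camera's frame to world space.
        offset = np.array([np.cos(theta), -np.sin(theta), -np.sin(theta * zrate), 1.0]) * rads
        c = np.dot(c2w[:3, :4], offset)
        # The z axis points from the focus point (focal units in front) back to the camera.
        z = normalize(c - np.dot(c2w[:3, :4], np.array([0.0, 0.0, -focal, 1.0])))
        render_poses.append(np.concatenate([viewmatrix(z, up, c), hwf], 1))
    return render_poses


hwf = np.array([[543.0], [724.0], [537.2688]])
c2w = np.concatenate([np.eye(3), np.zeros((3, 1)), hwf], 1)  # identity average pose
render_poses = render_path_spiral(c2w, up=np.array([0.0, 1.0, 0.0]),
                                  rads=np.array([0.5, 0.5, 0.2]),  # hypothetical radii
                                  focal=4.656, zdelta=0.24, zrate=0.5, rots=2, N=120)
print(len(render_poses), render_poses[0].shape)  # 120 (3, 5)
```

Every generated pose carries the same hwf column as the average pose, which is why render_poses and poses share the 3 x 5 layout.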

run_nerf.py `train()`

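images, poses, bds, render_poses, i_test = load_llff_data(args.datadir, args.factor, recenter=True, bd_factor=.75, spherify=args.spherify) returns images (151, 543, 724, 3), poses (151, 3, 5), bds (151, 2), render_poses (120, 3, 5) and i_test = 83.
hwf = poses[0,:3,-1] takes height, width and focal length from the first pose.
poses = poses[:,:3,:4] keeps only the 3 x 4 camera-to-world part of each pose.
The log prints: Loaded llff (151, 543, 724, 3) (120, 3, 5) [543. 724. 537.2688] ./data/img_desk2
Auto LLFF holdout: because args.llffhold > 0, i_test = np.arange(images.shape[0])[::args.llffhold] replaces the single index 83 with several test samples. With 151 images and the default llffhold = 8 this is every 8th image (0, 8, ..., 144), i.e. 19 test views.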

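    i_val = i_test  # the validation set is the same as the test set
    i_train = np.array([i for i in np.arange(int(images.shape[0])) if
                    (i not in i_test and i not in i_val)])  # everything else is the training set

The code above builds the validation and training sets.
The bounds are then set to near = 0. and far = 1., which is the branch taken when NDC is used (no_ndc not set).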
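The split sizes can be checked without loading any images. This sketch only assumes 151 images and llffhold = 8, as in our run.

```python
import numpy as np

n_images, llffhold = 151, 8

i_test = np.arange(n_images)[::llffhold]  # every 8th image: 0, 8, ..., 144
i_val = i_test                            # validation set is the same as the test set
i_train = np.array([i for i in np.arange(n_images)
                    if (i not in i_test and i not in i_val)])

print(len(i_test), len(i_train))  # 19 132
```

The 132 training images are exactly the count that shows up later in the shape of rays_rgb.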

    H, W, focal = hwf
    H, W = int(H), int(W)
    hwf = [H, W, focal]

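hwf is rebuilt as a list: [543, 724, 537.2688].

    if K is None:  # K was initialized to None earlier in train()
        K = np.array([
            [focal, 0, 0.5*W],
            [0, focal, 0.5*H],
            [0, 0, 1]
        ])

This defines K, the camera intrinsics matrix, with shape (3, 3). With the hwf above it is [[537.2688, 0, 362], [0, 537.2688, 271.5], [0, 0, 1]].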
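A small sketch of how K is used: it converts a pixel coordinate into a ray direction in camera space. The sign convention (y up, camera looking along -z) follows our reading of get_rays_np(); the helper name `pixel_to_camera_dir` is our own.

```python
import numpy as np

H, W, focal = 543, 724, 537.2688
K = np.array([
    [focal, 0, 0.5 * W],
    [0, focal, 0.5 * H],
    [0, 0, 1],
])


def pixel_to_camera_dir(i, j, K):
    """Direction of the ray through pixel (column i, row j) in camera coordinates."""
    return np.array([(i - K[0][2]) / K[0][0], -(j - K[1][2]) / K[1][1], -1.0])


center_dir = pixel_to_camera_dir(0.5 * W, 0.5 * H, K)  # principal point: straight ahead
corner_dir = pixel_to_camera_dir(0.0, 0.0, K)          # top-left pixel: left and up
print(center_dir, corner_dir)
```

The principal point is assumed to be the image centre, so the intrinsics depend only on H, W and the focal length stored in poses.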

Create log dir and copy the config file

os.makedirs(os.path.join(basedir, expname), exist_ok=True) creates the log directory.
f = os.path.join(basedir, expname, 'args.txt') is the path of the parameter file args.txt.

    with open(f, 'w') as file:
        for arg in sorted(vars(args)):
            attr = getattr(args, arg)
            file.write('{} = {}\n'.format(arg, attr))

This writes every parameter to the file, one per line.

Create nerf model

render_kwargs_train, render_kwargs_test, start, grad_vars, optimizer = create_nerf(args) creates the model.
- start = 0
- optimizer
- render_kwargs_test is a dict with 9 entries.
- render_kwargs_train is also a dict with 9 entries.
- grad_vars is a list of length 48.
global_step = start
bds_dict = { 'near' : near, 'far' : far, } stores the bounds as a dict.
render_kwargs_train.update(bds_dict) is the standard dict update; afterwards render_kwargs_train has 11 entries, with 'near' and 'far' added at the end.
render_kwargs_test.update(bds_dict) does the same for the test kwargs.

Move testing data to GPU

render_poses = torch.Tensor(render_poses).to(device)

Prepare raybatch tensor if batching random rays

When use_batching = True:

rays = np.stack([get_rays_np(H, W, K, p) for p in poses[:,:3,:4]], 0) builds the rays, which depend on poses. The shape is (151, 2, 543, 724, 3), i.e. [N, ro+rd, H, W, 3].
rays_rgb = np.concatenate([rays, images[:,None]], 1) has shape (151, 3, 543, 724, 3), i.e. [N, ro+rd+rgb, H, W, 3].
rays_rgb = np.transpose(rays_rgb, [0,2,3,1,4]) reorders the axes to [N, H, W, ro+rd+rgb, 3], shape (151, 543, 724, 3, 3).
rays_rgb = np.stack([rays_rgb[i] for i in i_train], 0) keeps only the training images; the shape becomes (132, 543, 724, 3, 3), so the count drops from 151 to 132.
rays_rgb = np.reshape(rays_rgb, [-1,3,3]) flattens to [(N-1)*H*W, ro+rd+rgb, 3], shape (51893424, 3, 3), which amounts to 51893424 rays. (The source comment says N-1, but here it is really the number of training images, because there is more than one test sample.)
np.random.shuffle(rays_rgb) shuffles the rays; the shape is unchanged.
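The same pipeline can be replayed at toy size to follow the shapes. The ray generation below follows our reading of get_rays_np(); the image count, resolution, poses and training indices are all made up.

```python
import numpy as np


def get_rays_np(H, W, K, c2w):
    """Ray origins and directions (each H x W x 3) for one camera-to-world matrix."""
    i, j = np.meshgrid(np.arange(W, dtype=np.float32),
                       np.arange(H, dtype=np.float32), indexing="xy")
    dirs = np.stack([(i - K[0][2]) / K[0][0], -(j - K[1][2]) / K[1][1], -np.ones_like(i)], -1)
    rays_d = np.sum(dirs[..., np.newaxis, :] * c2w[:3, :3], -1)  # rotate into world space
    rays_o = np.broadcast_to(c2w[:3, -1], np.shape(rays_d))       # camera centre for every pixel
    return rays_o, rays_d


N, H, W, focal = 6, 4, 5, 3.0  # toy sizes instead of 151 x 543 x 724
K = np.array([[focal, 0, 0.5 * W], [0, focal, 0.5 * H], [0, 0, 1]])
poses = np.tile(np.eye(3, 4, dtype=np.float32), (N, 1, 1))
images = np.random.rand(N, H, W, 3).astype(np.float32)
i_train = np.array([1, 2, 4, 5])

rays = np.stack([get_rays_np(H, W, K, p) for p in poses[:, :3, :4]], 0)  # [N, ro+rd, H, W, 3]
rays_rgb = np.concatenate([rays, images[:, None]], 1)                    # [N, ro+rd+rgb, H, W, 3]
rays_rgb = np.transpose(rays_rgb, [0, 2, 3, 1, 4])                       # [N, H, W, ro+rd+rgb, 3]
rays_rgb = np.stack([rays_rgb[i] for i in i_train], 0)                   # training images only
rays_rgb = np.reshape(rays_rgb, [-1, 3, 3])                              # one row per ray
np.random.shuffle(rays_rgb)

print(rays.shape, rays_rgb.shape)  # (6, 2, 4, 5, 3) (80, 3, 3)
```

Each row of rays_rgb holds the origin, direction and target colour of one ray, so a training batch is simply a slice of N_rand rows.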

Move training data to GPU

    if use_batching:
        images = torch.Tensor(images).to(devi

[NeRF] An in-depth reading of the yenchenlin/nerf-pytorch project
