AI Face-Swapping Tool: FaceFusion

This article is based on FaceFusion 3.1.0; all of the issues and fixes below were verified against that version. Other versions may differ slightly, so adapt to your own situation as needed.

Removing the NSFW Filter

Problem

If the uploaded target image contains NSFW content, the output gets Gaussian-blurred. Is that good? It is not. NSFW content is a major driver of technological progress, and I urge the developers not to forget their roots, or at least to leave a back door.

Solution

Simply comment out the detection functions. This not only removes the restriction, but also speeds up processing, since the checks are skipped entirely.
In the FaceFusion root directory, open the facefusion folder and edit content_analyser.py. Locate the functions below and modify them as shown.

def analyse_stream(vision_frame : VisionFrame, video_fps : Fps) -> bool:
    global STREAM_COUNTER
    # STREAM_COUNTER = STREAM_COUNTER + 1
    # if STREAM_COUNTER % int(video_fps) == 0:
    #     return analyse_frame(vision_frame)
    return False


def analyse_frame(vision_frame : VisionFrame) -> bool:
    # vision_frame = prepare_frame(vision_frame)
    # probability = forward(vision_frame)
    return False


@lru_cache(maxsize = None)
def analyse_image(image_path : str) -> bool:
    # vision_frame = read_image(image_path)
    return False


@lru_cache(maxsize = None)
def analyse_video(video_path : str, trim_frame_start : int, trim_frame_end : int) -> bool:
    # video_fps = detect_video_fps(video_path)
    # frame_range = range(trim_frame_start, trim_frame_end)
    # rate = 0.0
    # counter = 0
    #
    # with tqdm(total = len(frame_range), desc = wording.get('analysing'), unit = 'frame', ascii = ' =', disable = state_manager.get_item('log_level') in [ 'warn', 'error' ]) as progress:
    #     for frame_number in frame_range:
    #         if frame_number % int(video_fps) == 0:
    #             vision_frame = get_video_frame(video_path, frame_number)
    #             if analyse_frame(vision_frame):
    #                 counter += 1
    #         rate = counter * int(video_fps) / len(frame_range) * 100
    #         progress.update()
    #         progress.set_postfix(rate = rate)
    return False

Changing the Cache Directory

Problem

FaceFusion uses a gradio interface, which by default stores its cache files on the C drive under C:\Users\<username>\AppData\Local\Temp\gradio. If the C drive is short on space, or the target video is large, the program may crash and the face swap may fail. You can move the cache directory to a location with more free space.

Solution

It is recommended to create a temp folder in the root directory first. Then, in the FaceFusion root directory, open the facefusion folder, edit temp_helper.py, and replace 'E:\\AI\\facefusion-3.1\\facefusion-3.1.0\\facefusion-3.1.0\\temp' with your own directory.

def get_temp_directory_path(file_path : str) -> str:
    temp_file_name, _ = os.path.splitext(os.path.basename(file_path))
    return os.path.join('E:\\AI\\facefusion-3.1\\facefusion-3.1.0\\facefusion-3.1.0\\temp', 'facefusion', temp_file_name)
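To avoid hard-coding the path into the source, the same helper can read the location from an environment variable. A sketch of that idea, where FACEFUSION_TEMP_ROOT is a hypothetical variable name and the fallback path is a placeholder to replace with your own drive:

```python
import os

def get_temp_directory_path(file_path : str) -> str:
    # Hypothetical variant: take the temp root from the environment so the
    # path is not baked into the source. FACEFUSION_TEMP_ROOT is a made-up
    # variable name; the fallback path is a placeholder.
    temp_file_name, _ = os.path.splitext(os.path.basename(file_path))
    temp_root = os.environ.get('FACEFUSION_TEMP_ROOT', 'E:\\facefusion_temp')
    return os.path.join(temp_root, 'facefusion', temp_file_name)

print(get_temp_directory_path('/videos/target.mp4'))
```

This keeps the source file untouched across reinstalls: set the variable once in your shell or launch script instead of re-editing temp_helper.py after every update.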

Changing the Default Settings

Problem

After tuning the parameters to match your hardware and the quality of the swap results, you can write them into FaceFusion's launch configuration so you do not have to re-enter them every time you start the program.

Solution

Edit facefusion.ini in the root directory.

[paths]
temp_path =
jobs_path =
source_paths =
target_path =
output_path = E:\AI

[patterns]
source_pattern =
target_pattern =
output_pattern =

[face_detector]
face_detector_model =
face_detector_size =
face_detector_angles =
face_detector_score =

[face_landmarker]
face_landmarker_model =
face_landmarker_score =

[face_selector]
face_selector_mode =
face_selector_order =
face_selector_age_start =
face_selector_age_end =
face_selector_gender =
face_selector_race =
reference_face_position =
reference_face_distance =
reference_frame_number =

[face_masker]
face_occluder_model =
face_parser_model =
face_mask_types = box occlusion
face_mask_blur =
face_mask_padding =
face_mask_regions =

[frame_extraction]
trim_frame_start =
trim_frame_end =
temp_frame_format =
keep_temp =

[output_creation]
output_image_quality =
output_image_resolution =
output_audio_encoder =
output_video_encoder =
output_video_preset =
output_video_quality =
output_video_resolution =
output_video_fps =
skip_audio =

[processors]
processors = face_swapper face_enhancer
age_modifier_model =
age_modifier_direction =
deep_swapper_model =
deep_swapper_morph =
expression_restorer_model =
expression_restorer_factor =
face_debugger_items =
face_editor_model =
face_editor_eyebrow_direction =
face_editor_eye_gaze_horizontal =
face_editor_eye_gaze_vertical =
face_editor_eye_open_ratio =
face_editor_lip_open_ratio =
face_editor_mouth_grim =
face_editor_mouth_pout =
face_editor_mouth_purse =
face_editor_mouth_smile =
face_editor_mouth_position_horizontal =
face_editor_mouth_position_vertical =
face_editor_head_pitch =
face_editor_head_yaw =
face_editor_head_roll =
face_enhancer_model =
face_enhancer_blend =
face_enhancer_weight =
face_swapper_model =
face_swapper_pixel_boost =
frame_colorizer_model =
frame_colorizer_size =
frame_colorizer_blend =
frame_enhancer_model =
frame_enhancer_blend =
lip_syncer_model =

[uis]
open_browser =
ui_layouts =
ui_workflow =

[execution]
execution_device_id =
execution_providers = cuda
execution_thread_count = 32
execution_queue_count = 4


[download]
download_providers =
download_scope =

[memory]
video_memory_strategy = tolerant
system_memory_limit = 32

[misc]
log_level =
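Most keys can be left blank, in which case FaceFusion falls back to its built-in defaults. Since the file is standard INI syntax, a quick way to see which settings you have actually overridden is Python's stock configparser; this is a generic check, not part of FaceFusion itself:

```python
import configparser

def load_overridden_settings(path : str) -> dict:
    # Parse an INI file and return only the keys that were filled in,
    # grouped by section; blank keys (FaceFusion defaults) are skipped.
    config = configparser.ConfigParser()
    config.read(path)
    settings = {}
    for section in config.sections():
        filled = {key: value for key, value in config[section].items() if value}
        if filled:
            settings[section] = filled
    return settings
```

Run against the file above, this would report, for example, that [execution] overrides execution_providers, execution_thread_count, and execution_queue_count.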


Selected Parameters

Launch Parameters

python run.py [options]

options:
-h, --help show this help message and exit
-s SOURCE_PATHS, --source SOURCE_PATHS choose single or multiple source images or audios
-t TARGET_PATH, --target TARGET_PATH choose single target image or video
-o OUTPUT_PATH, --output OUTPUT_PATH specify the output file or directory
-v, --version show program's version number and exit

misc:
--skip-download omit automate downloads and remote lookups
--headless run the program without a user interface
--log-level {error,warn,info,debug} adjust the message severity displayed in the terminal

execution:
--execution-providers EXECUTION_PROVIDERS [EXECUTION_PROVIDERS ...] accelerate the model inference using different providers (choices: cpu, ...)
--execution-thread-count [1-128] specify the amount of parallel threads while processing
--execution-queue-count [1-32] specify the amount of frames each thread is processing

memory:
--video-memory-strategy {strict,moderate,tolerant} balance fast frame processing and low vram usage
--system-memory-limit [0-128] limit the available ram that can be used while processing

face analyser:
--face-analyser-order {left-right,right-left,top-bottom,bottom-top,small-large,large-small,best-worst,worst-best} specify the order in which the face analyser detects faces.
--face-analyser-age {child,teen,adult,senior} filter the detected faces based on their age
--face-analyser-gender {female,male} filter the detected faces based on their gender
--face-detector-model {retinaface,yoloface,yunet} choose the model responsible for detecting the face
--face-detector-size FACE_DETECTOR_SIZE specify the size of the frame provided to the face detector
--face-detector-score [0.0-1.0] filter the detected faces based on the confidence score

face selector:
--face-selector-mode {reference,one,many} use reference based tracking with simple matching
--reference-face-position REFERENCE_FACE_POSITION specify the position used to create the reference face
--reference-face-distance [0.0-1.5] specify the desired similarity between the reference face and target face
--reference-frame-number REFERENCE_FRAME_NUMBER specify the frame used to create the reference face

face mask:
--face-mask-types FACE_MASK_TYPES [FACE_MASK_TYPES ...] mix and match different face mask types (choices: box, occlusion, region)
--face-mask-blur [0.0-1.0] specify the degree of blur applied to the box mask
--face-mask-padding FACE_MASK_PADDING [FACE_MASK_PADDING ...] apply top, right, bottom and left padding to the box mask
--face-mask-regions FACE_MASK_REGIONS [FACE_MASK_REGIONS ...] choose the facial features used for the region mask (choices: skin, left-eyebrow, right-eyebrow, left-eye, right-eye, eye-glasses, nose, mouth, upper-lip, lower-lip)

frame extraction:
--trim-frame-start TRIM_FRAME_START specify the start frame of the target video
--trim-frame-end TRIM_FRAME_END specify the end frame of the target video
--temp-frame-format {bmp,jpg,png} specify the temporary resources format
--temp-frame-quality [0-100] specify the temporary resources quality
--keep-temp keep the temporary resources after processing

output creation:
--output-image-quality [0-100] specify the image quality which translates to the compression factor
--output-video-encoder {libx264,libx265,libvpx-vp9,h264_nvenc,hevc_nvenc} specify the encoder used for the video compression
--output-video-preset {ultrafast,superfast,veryfast,faster,fast,medium,slow,slower,veryslow} balance fast video processing and video file size
--output-video-quality [0-100] specify the video quality which translates to the compression factor
--output-video-resolution OUTPUT_VIDEO_RESOLUTION specify the video output resolution based on the target video
--output-video-fps OUTPUT_VIDEO_FPS specify the video output fps based on the target video
--skip-audio omit the audio from the target video

frame processors:
--frame-processors FRAME_PROCESSORS [FRAME_PROCESSORS ...] load a single or multiple frame processors. (choices: face_debugger, face_enhancer, face_swapper, frame_enhancer, lip_syncer, ...)
--face-debugger-items FACE_DEBUGGER_ITEMS [FACE_DEBUGGER_ITEMS ...] load a single or multiple frame processors (choices: bounding-box, landmark-5, landmark-68, face-mask, score, age, gender)
--face-enhancer-model {codeformer,gfpgan_1.2,gfpgan_1.3,gfpgan_1.4,gpen_bfr_256,gpen_bfr_512,restoreformer_plus_plus} choose the model responsible for enhancing the face
--face-enhancer-blend [0-100] blend the enhanced into the previous face
--face-swapper-model {blendswap_256,inswapper_128,inswapper_128_fp16,simswap_256,simswap_512_unofficial,uniface_256} choose the model responsible for swapping the face
--frame-enhancer-model {real_esrgan_x2plus,real_esrgan_x4plus,real_esrnet_x4plus} choose the model responsible for enhancing the frame
--frame-enhancer-blend [0-100] blend the enhanced into the previous frame
--lip-syncer-model {wav2lip_gan} choose the model responsible for syncing the lips

uis:
--ui-layouts UI_LAYOUTS [UI_LAYOUTS ...] launch a single or multiple UI layouts (choices: benchmark, default, webcam, ...)
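The --headless flag makes the options above scriptable for batch jobs. A sketch of such an invocation using Python's subprocess, where all file paths are placeholders for your own:

```python
import subprocess

# Placeholder paths; swap in your own source image, target video, and output.
command = [
    'python', 'run.py',
    '--headless',
    '-s', 'source_face.jpg',
    '-t', 'target_video.mp4',
    '-o', 'output.mp4',
    '--execution-providers', 'cuda',
]
print(' '.join(command))
# subprocess.run(command, check = True)  # uncomment to actually launch
```

Looping this over a directory of target videos gives a simple batch pipeline without touching the web UI.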

VIDEO MEMORY STRATEGY (VRAM usage strategy)

  • Strict: strictly limits VRAM usage during execution (the default); suited to systems with limited VRAM.
  • Moderate: uses VRAM more flexibly.
  • Tolerant: uses more VRAM for large or complex jobs rather than imposing tight limits; suited to hardware with plenty of VRAM.

TEMP FRAME FORMAT

Before swapping, FaceFusion decomposes the entire target video into individual frames, stores them in the temporary directory, and processes each frame separately. TEMP FRAME FORMAT specifies the image format used for these frames; different formats trade disk space against processing speed.

  • JPG files are usually small and quick to save, but may lose a little image detail;
  • PNG files are somewhat larger and may be slower to process, but preserve image quality better.
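A rough back-of-envelope calculation illustrates the trade-off. The per-frame sizes below are assumed averages for a 1080p video, not measured figures:

```python
# Estimate temp disk usage for extracting a 60-second, 30 fps video.
fps = 30
duration_seconds = 60
frames = fps * duration_seconds  # 1800 frames to extract

jpg_mb_per_frame = 0.5  # assumed average size of a 1080p JPG frame
png_mb_per_frame = 3.0  # assumed average size of a 1080p PNG frame

print(f'{frames} frames as JPG: ~{frames * jpg_mb_per_frame:.0f} MB')
print(f'{frames} frames as PNG: ~{frames * png_mb_per_frame:.0f} MB')
```

Even with these modest assumptions the gap is several gigabytes for a one-minute clip, which is exactly why moving the temp directory off a small C drive matters.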

OUTPUT VIDEO PRESET

  • ultrafast: the fastest encoding, but usually the worst video quality and the largest files.
  • superfast: similar to ultrafast, with slightly better quality and compression.
  • veryfast: a better balance between fast encoding and video quality than the two above.
  • faster: a bit slower than veryfast, but with better compression efficiency.
  • fast: a good balance between encoding speed and video quality.
  • medium: usually the default preset; a good balance of encoding speed and compression.
  • slow: starts trading encoding speed for better compression, which usually means better quality and smaller files.
  • slower: slows encoding down further than slow for still better compression.
  • veryslow: the best compression at the cost of encoding speed; suited to cases where you are in no hurry and want the best quality and smallest files.

OUTPUT VIDEO QUALITY

This option controls the compression quality of the output video. Values typically range from 80 to 100; the higher the value, the better the output quality, but the larger the file.
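Under the hood, encoders such as libx264 express quality as a CRF value, where 0 is lossless and 51 is worst, so a 0-100 quality percentage is typically mapped inversely onto that scale. A sketch of such a mapping, as an illustration of the idea rather than FaceFusion's exact formula:

```python
def quality_to_crf(quality : int) -> int:
    # Map a 0-100 quality percentage onto the x264/x265 CRF scale (51-0).
    # Higher quality -> lower CRF -> less compression -> larger file.
    assert 0 <= quality <= 100
    return round(51 - quality * 0.51)

print(quality_to_crf(80))   # a quality of 80 maps to CRF 10
print(quality_to_crf(100))  # a quality of 100 maps to CRF 0 (lossless)
```

This is why the practical range starts around 80: below that, the implied CRF rises quickly and compression artifacts become visible.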

FACE_DEBUGGER_ITEMS

landmark-5 and landmark-68 are used for face recognition and facial expression tracking. landmark-5 provides five basic key points covering the eyes, nose tip, and mouth, while landmark-68 provides 68 key points, covering details such as the facial contour, eyes, eyebrows, nose, and mouth.

  • Accuracy: with more key points, landmark-68 captures more detailed facial structure and produces finer, more natural swap results, for example more precise handling of lip and eyebrow shapes.
  • Speed: landmark-5 runs roughly 8-10% faster than landmark-68, simply because fewer key points means less computation.

FACE MASK TYPES (mask types)

  • box masks suit quick jobs and limited compute;
  • occlusion masks suit face compositing in complex scenes;
  • region masks suit cases that demand high-precision control over facial details. Choose the mask type that best fits your scenario for the best swap result.
