AI Image Generation + the Apple Ecosystem = ?

Introduction#

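Apple recently released the Core ML Stable Diffusion project, which lets users and developers run Stable Diffusion, a state-of-the-art AI image generation model, natively on Apple Silicon. Based on my own hands-on experience, this post covers the technical background, the workflow, and a few problems I ran into.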

Background#

The biggest breakthrough in image generation in recent years has been the arrival of the diffusion family of models. For the technical details, see this article.

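Before that, the dominant models in this field were variants of GANs, which always suffered from mode collapse and exploding gradients, and therefore from high training costs. Various techniques, such as Lipschitz constraints, were applied to mitigate these problems, but the results still fall short of diffusion models.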

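Since Midjourney appeared, diffusion-based generative models have grown explosively, and many companies have started commercializing them. For small developers, however, deploying a generative model in a client-side product is still a hard problem, and certainly much harder than calling the ChatGPT API.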

Process#

The project consists of two main parts. The first converts the original PyTorch-based generative models hosted on Hugging Face into Core ML format. Three pre-converted versions of Stable Diffusion are officially provided.
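As a rough illustration of the conversion step, the sketch below assembles the command line for the repository's `python_coreml_stable_diffusion.torch2coreml` module. The flag names follow my reading of the project README and may have changed since; `build_convert_command` and the output directory name are hypothetical and are not part of the project.

```python
import shlex
import sys


def build_convert_command(model_version, output_dir):
    """Return the argv list for converting one Hugging Face checkpoint to Core ML.

    Assumption: the flags below match the torch2coreml CLI described in the
    project README; check `--help` in your checkout before relying on them.
    """
    return [
        sys.executable, "-m", "python_coreml_stable_diffusion.torch2coreml",
        "--convert-unet",          # denoising network
        "--convert-text-encoder",  # CLIP text encoder
        "--convert-vae-decoder",   # latent-to-image decoder
        "--model-version", model_version,
        "-o", output_dir,
    ]


if __name__ == "__main__":
    # Hypothetical output directory, chosen only for this example.
    cmd = build_convert_command(
        "stabilityai/stable-diffusion-2-base", "models/my-converted-packages"
    )
    # Print the command instead of running it; conversion takes a long time.
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would start the conversion; printing it first makes it easy to review the flags.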

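The second part generates images from the converted model and its parameters. Unlike earlier work, Apple seems to want this project to help developers integrate image generation models into their apps: it ships a Swift version and reports test results on several consumer devices.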

Pitfalls#

Because the project is newly released, its documentation does not cover every pitfall. Here are some of the problems you are likely to run into:

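Insufficient resources: on consumer devices (for example, my 16GB M1 MacBook Pro), you need to run `pip install accelerate` to keep the process from being stopped for lack of resources.
Access restrictions: to use a local model directly, you need to run `huggingface-cli login` in the terminal with a Hugging Face token so that the configuration files on the server can be accessed (I replied about this problem on the relevant issue page).
Environment setup: create a fresh environment.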
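The first two pitfalls can be checked before a long run. The sketch below is a hypothetical preflight helper, not part of the project. It assumes that `huggingface-cli login` stores the token in a file named `token` under `HF_HOME` (default `~/.cache/huggingface`) and that the `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN` environment variables are also honored; the exact location depends on the installed `huggingface_hub` version.

```python
import importlib.util
import os
from pathlib import Path


def accelerate_installed():
    """True if the `accelerate` package can be imported in this environment."""
    return importlib.util.find_spec("accelerate") is not None


def hf_token_available(env=None, token_file=None):
    """True if a Hugging Face token seems to be configured.

    Assumption: the token lives in an environment variable or in
    $HF_HOME/token; older huggingface_hub versions used another path.
    """
    env = os.environ if env is None else env
    if env.get("HF_TOKEN") or env.get("HUGGING_FACE_HUB_TOKEN"):
        return True
    if token_file is None:
        hf_home = env.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
        token_file = Path(hf_home) / "token"
    path = Path(token_file)
    return path.is_file() and path.read_text().strip() != ""


def preflight():
    """Return a list of human-readable problems; empty means nothing was found."""
    problems = []
    if not accelerate_installed():
        problems.append("accelerate is missing: run `pip install accelerate`")
    if not hf_token_available():
        problems.append("no Hugging Face token found: run `huggingface-cli login`")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

An empty result only means these two checks passed; it does not guarantee that the token has access to a particular model.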

Results#

The image below was generated with the following command:

`python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base`

randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base
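The file name above encodes the seed, the compute unit, and the model version. The sketch below reconstructs the saved path from those parameters. The naming pattern is inferred only from the "Saving generated image to" line in the log at the end of this post, not from the pipeline's source code, and `expected_output_path` is a hypothetical helper.

```python
import posixpath


def expected_output_path(output_dir, prompt, seed, compute_unit, model_version):
    """Rebuild the image path as it appears in the log at the end of this post.

    Assumption: spaces in the prompt become underscores, and the slash in the
    model version becomes an underscore. Other characters are left untouched.
    """
    prompt_dir = prompt.replace(" ", "_")
    file_name = "randomSeed_{}_computeUnit_{}_modelVersion_{}.png".format(
        seed, compute_unit, model_version.replace("/", "_")
    )
    return posixpath.join(output_dir, prompt_dir, file_name)


if __name__ == "__main__":
    print(
        expected_output_path(
            "output.png",  # the value passed to -o in the logged run
            "a photo of an astronaut riding a horse on mars",
            93,
            "ALL",
            "stabilityai/stable-diffusion-2-base",
        )
    )
```

Note that the value passed to `-o` is treated as a directory, which is why the logged run ended up with a folder named `output.png`.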

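The result is decent, but the runtime performance is still far from the figures given in the official documentation. It seems the road to making AI widely accessible is still a long one.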
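To see where the time goes, the sketch below pulls the per-model load times out of the log reproduced at the end of this post. The excerpt is copied from that log; `load_times` and the regular expressions are my own and assume the log format stays as shown.

```python
import re

# Lines copied from the log at the end of this post.
LOG_EXCERPT = """\
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 19.5 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 139.2 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 6.2 seconds.
"""

LOADING = re.compile(r"Loading (\w+) mlpackage$")
TOOK = re.compile(r"Done\. Took ([0-9.]+) seconds\.$")


def load_times(log_text):
    """Map each model name to the load time reported on its 'Done. Took' line."""
    times = {}
    current = None
    for line in log_text.splitlines():
        match = LOADING.search(line)
        if match:
            current = match.group(1)
            continue
        match = TOOK.search(line)
        if match and current is not None:
            times[current] = float(match.group(1))
            current = None
    return times


if __name__ == "__main__":
    times = load_times(LOG_EXCERPT)
    for name, seconds in times.items():
        print(f"{name}: {seconds:.1f} s")
    print(f"total load time: {sum(times.values()):.1f} s")
```

In this run the three models took about 164.9 seconds to load, while the progress bar shows the 51 denoising steps finishing in roughly 22 seconds. The log itself notes that loading through coremltools triggers compilation every time, so at least part of the gap to the official figures likely comes from compile-on-load rather than from inference.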

~~When will we be able to ask Siri to generate a doctor's note to hand to the boss when we call in sick?~~

Further Information#

https://github.com/apple/ml-stable-diffusion
https://huggingface.co/blog/diffusers-coreml
https://github.com/huggingface/diffusers
https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html

Full Terminal Output#

(base) henry@HenrydeMacBook-Pro ml-stable-diffusion % python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output.png --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base
INFO:__main__:Setting random seed to 93
INFO:__main__:Initializing PyTorch pipe for reference configuration
Fetching 13 files: 100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 63402.27it/s]
/Users/henry/opt/anaconda3/lib/python3.9/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
WARNING:__main__:Original diffusers pipeline for stabilityai/stable-diffusion-2-base does not have a safety_checker, Core ML pipeline will mirror this behavior.
INFO:__main__:Removed PyTorch pipe to reduce peak memory consumption
INFO:__main__:Loading Core ML models in memory from models/coreml-stable-diffusion-2-base_original_packages
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_text_encoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 19.5 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_unet.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 139.2 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_vae_decoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 6.2 seconds.
INFO:__main__:Done.
INFO:__main__:Initializing Core ML pipe for image generation
WARNING:__main__:You have disabled the safety checker for <class '__main__.CoreMLStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
INFO:__main__:Stable Diffusion configured to generate 512x512 images
INFO:__main__:Done.
INFO:__main__:Beginning image generation.
100%|████████████████████████████████████████████████████████████████| 51/51 [00:22<00:00,  2.25it/s]
INFO:__main__:Saving generated image to output.png/a_photo_of_an_astronaut_riding_a_horse_on_mars/randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base.png