Source

havenoammo/Qwen3.6-35B-A3B-MTP-GGUF

Download the llama.cpp source

(.venv) bluesanta@ubuntu:~/llm$ git clone https://github.com/ggml-org/llama.cpp.git
(.venv) bluesanta@ubuntu:~/llm$ cd llama.cpp

Fetch the latest remote changes

(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git fetch origin
From https://github.com/ggml-org/llama.cpp
 * [new tag]             b9151      -> b9151

Fetch PR #22673 into a local branch

PR #22673 ("llama + spec: MTP Support") adds MTP support for speculative decoding.

(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git fetch origin pull/22673/head:pr-22673
From https://github.com/ggml-org/llama.cpp
 * [new tag]             b9156      -> b9156
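
The fetch above relies on GitHub's `pull/<number>/head` ref, which points at the head commit of a pull request. A minimal sketch of a helper that builds the matching refspec for any PR number follows; the function name `pr_refspec` and the default `pr-<number>` branch name are illustrative assumptions, not part of git or llama.cpp.

```shell
# Hypothetical helper: build the refspec that maps a GitHub PR head to a local branch.
# Usage: pr_refspec <pr-number> [local-branch]
pr_refspec() {
    local pr="$1"
    local branch="${2:-pr-$pr}"   # default local branch name: pr-<number>
    printf 'pull/%s/head:%s\n' "$pr" "$branch"
}

# Example (needs network access, so it is left commented out):
# git fetch origin "$(pr_refspec 22673)"
```

Passing a second argument, such as `pr_refspec 22673 pr-22673-mtp`, selects a different local branch name.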

Check out master and reset it to the latest remote commit (856c3adac at the time of writing)

(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git checkout master
Already on 'master'
Your branch is up to date with 'origin/master'.
(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git reset --hard 856c3adac
HEAD is now at 856c3adac hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993)

Merge the PR on top (non-fast-forward)

(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git merge --no-ff pr-22673 -m "Merge [PR #22673](https://github.com/ggml-org/llama.cpp/pull/22673): llama + spec: MTP Support"
Merge made by the 'ort' strategy.
 .devops/intel.Dockerfile                                                     |    20 +-
 .editorconfig                                                                |     8 -
 .gitattributes                                                               |     4 -
 
 delete mode 100644 tools/server/public/index.html
 delete mode 100644 tools/server/public/loading.html

Build llama-server

(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ cmake -B build -DGGML_CUDA=ON
bluesanta@ubuntu:~/llm/llama.cpp$ cmake --build build --config Release -j$(nproc)
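
The configure step above assumes the CUDA toolkit is installed. As a rough sketch, the `GGML_CUDA` value could be chosen by checking whether `nvcc` is on `PATH`. The helper name `ggml_cuda_flag` is hypothetical, and looking for `nvcc` is only one heuristic for detecting the toolkit.

```shell
# Hypothetical helper: print the CMake option for CUDA based on nvcc availability.
ggml_cuda_flag() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "-DGGML_CUDA=ON"
    else
        echo "-DGGML_CUDA=OFF"
    fi
}

# Example (left commented out because it starts a real configure run):
# cmake -B build "$(ggml_cuda_flag)"
```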

Install llama-server

bluesanta@ubuntu:~/llm/llama.cpp$ sudo cmake --install build

Check the llama-server version

bluesanta@ubuntu:~/llm/llama.cpp$ ./build/bin/llama-server --version
version: 9173 (0672285b2)
built with GNU 11.4.0 for Linux aarch64
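
The `--version` output above carries a build number and a short commit hash. For scripting, the two fields can be pulled out with `sed`, assuming the output keeps the `version: <number> (<hash>)` shape shown above. `parse_llama_version` is a hypothetical helper name.

```shell
# Hypothetical helper: read --version text on stdin, print "<build-number> <commit-hash>".
parse_llama_version() {
    sed -n 's/^version: \([0-9][0-9]*\) (\([0-9a-f][0-9a-f]*\)).*$/\1 \2/p'
}

# Example (2>&1 captures the text whether it goes to stdout or stderr):
# ./build/bin/llama-server --version 2>&1 | parse_llama_version
```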

unsloth/Qwen3.6-35B-A3B-MTP-GGUF

Download the source

bluesanta@ubuntu:~/llm$ git clone https://github.com/ggml-org/llama.cpp.git llama.cpp-22673-mtp
bluesanta@ubuntu:~/llm$ cd llama.cpp-22673-mtp
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ git fetch origin pull/22673/head:pr-22673-mtp
remote: Enumerating objects: 158, done.
remote: Counting objects: 100% (128/128), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 158 (delta 119), reused 119 (delta 119), pack-reused 30 (from 2)
Receiving objects: 100% (158/158), 158.55 KiB | 12.20 MiB/s, done.
Resolving deltas: 100% (124/124), completed with 32 local objects.
From https://github.com/ggml-org/llama.cpp
 * [new ref]             refs/pull/22673/head -> pr-22673-mtp
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ git checkout pr-22673-mtp
Switched to branch 'pr-22673-mtp'
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ git status
On branch pr-22673-mtp
nothing to commit, working tree clean
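bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ git merge pr-22673-mtp -m "Merge MTP support from PR #22673"

Note: pr-22673-mtp is already the checked-out branch, so this merge is a no-op. The build below uses the PR branch exactly as fetched.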
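
Before merging, it can help to check whether the target branch is already contained in the current one, in which case `git merge` has nothing to do. A minimal sketch using `git merge-base --is-ancestor` follows; the helper name `needs_merge` and its exit-code convention are assumptions made for illustration.

```shell
# Hypothetical helper. Exit status:
#   0 = <branch> has commits that HEAD lacks (a merge would do something)
#   1 = <branch> is already contained in HEAD (a merge would be a no-op)
#   2 = git reported an error (e.g. unknown branch or not a repository)
needs_merge() {
    git merge-base --is-ancestor "$1" HEAD 2>/dev/null
    case $? in
        0) return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}

# Example:
# if needs_merge pr-22673-mtp; then git merge pr-22673-mtp; fi
```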

Build

bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ cmake -B build -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ cmake --build build -j$(nproc)

Install llama-server

bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ sudo cmake --install build

Check the llama-server version

bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ ./build/bin/llama-server --version
version: 9173 (0672285b2)
built with GNU 11.4.0 for Linux aarch64