Sources
- havenoammo/Qwen3.6-35B-A3B-MTP-GGUF · Hugging Face
- unsloth/Qwen3.6-35B-A3B-MTP-GGUF · Hugging Face
- Introducing Unsloth Studio | Unsloth Documentation
havenoammo/Qwen3.6-35B-A3B-MTP-GGUF
Download the llama.cpp source
(.venv) bluesanta@ubuntu:~/llm$ git clone https://github.com/ggml-org/llama.cpp.git
(.venv) bluesanta@ubuntu:~/llm$ cd llama.cpp
Fetch the latest remote changes
(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git fetch origin
From https://github.com/ggml-org/llama.cpp
* [new tag] b9151 -> b9151
Fetch PR #22673 into a local branch
PR #22673 ("llama + spec: MTP Support") adds MTP support for speculative decoding.
(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git fetch origin pull/22673/head:pr-22673
From https://github.com/ggml-org/llama.cpp
* [new tag] b9156 -> b9156
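The `pull/<number>/head` ref fetched above is the read-only ref GitHub exposes for a pull request's head commit. As a minimal sketch, the fetch can be wrapped in small helpers. `pr_refspec` and `fetch_pr` are hypothetical names (not part of git or llama.cpp), and the remote is assumed to be called `origin`.

```shell
# Hypothetical helper: build the refspec that maps a GitHub pull request
# head onto a local branch named pr-<number>.
pr_refspec() {
    printf 'pull/%s/head:pr-%s\n' "$1" "$1"
}

# Hypothetical helper: fetch the pull request into that local branch.
# Assumes the GitHub remote is named "origin".
fetch_pr() {
    git fetch origin "$(pr_refspec "$1")"
}

pr_refspec 22673   # prints pull/22673/head:pr-22673
```

Calling `fetch_pr 22673` inside the clone would then run the same `git fetch` as in the transcript.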
Check out master and reset to the latest remote commit (856c3adac here)
(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git checkout master
Already on 'master'
Your branch is up to date with 'origin/master'.
(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git reset --hard 856c3adac
HEAD is now at 856c3adac hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993)
Merge the PR on top (non-fast-forward)
(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ git merge --no-ff pr-22673 -m "Merge [PR #22673](https://github.com/ggml-org/llama.cpp/pull/22673): llama + spec: MTP Support"
Merge made by the 'ort' strategy.
.devops/intel.Dockerfile | 20 +-
.editorconfig | 8 -
.gitattributes | 4 -
delete mode 100644 tools/server/public/index.html
delete mode 100644 tools/server/public/loading.html
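Before building, it can be worth confirming that the PR branch really is part of the current HEAD. The following is a sketch only. `branch_is_merged` is a hypothetical helper name, and it relies on `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second.

```shell
# Hypothetical helper: succeeds (exit status 0) when every commit of the
# given branch is already reachable from HEAD, i.e. the branch is merged.
branch_is_merged() {
    git merge-base --is-ancestor "$1" HEAD
}

# Example use inside the llama.cpp checkout (not run here):
#   branch_is_merged pr-22673 && echo "PR #22673 is merged into HEAD"
```

Because `--no-ff` always records a merge commit, `git log --oneline -1` should also show the merge message from the command above.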
Build llama-server
(.venv) bluesanta@ubuntu:~/llm/llama.cpp$ cmake -B build -DGGML_CUDA=ON
bluesanta@ubuntu:~/llm/llama.cpp$ cmake --build build --config Release -j$(nproc)
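A note on the two commands above: `--config Release` only takes effect with multi-config CMake generators, while single-config generators such as Unix Makefiles pick the build type at configure time through `CMAKE_BUILD_TYPE`. The sketch below also shows one way to choose the parallel job count when `nproc` is not available. `build_jobs` is a hypothetical helper name, not something llama.cpp provides.

```shell
# Hypothetical helper: pick a parallel job count. Tries nproc (GNU
# coreutils) first, then getconf, and falls back to a single job.
build_jobs() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Example use (not run here), setting the build type at configure time:
#   cmake -B build -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
#   cmake --build build -j"$(build_jobs)"
echo "would build with $(build_jobs) parallel jobs"
```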
Install llama-server
bluesanta@ubuntu:~/llm/llama.cpp$ sudo cmake --install build
Check the llama-server version
bluesanta@ubuntu:~/llm/llama.cpp$ ./build/bin/llama-server --version
version: 9173 (0672285b2)
built with GNU 11.4.0 for Linux aarch64
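If the build number or commit hash is needed in a script, for example to record which build produced a result, the version line can be parsed. This is a minimal sketch that assumes the `version: <number> (<hash>)` format shown above. `version_build` and `version_commit` are hypothetical helper names.

```shell
# Hypothetical helper: extract the build number from a line such as
# "version: 9173 (0672285b2)". Prints nothing if the line does not match.
version_build() {
    printf '%s\n' "$1" | sed -n 's/^version: \([0-9][0-9]*\) (\([0-9a-f]*\)).*/\1/p'
}

# Hypothetical helper: extract the short commit hash from the same line.
version_commit() {
    printf '%s\n' "$1" | sed -n 's/^version: \([0-9][0-9]*\) (\([0-9a-f]*\)).*/\2/p'
}

version_build  'version: 9173 (0672285b2)'   # prints 9173
version_commit 'version: 9173 (0672285b2)'   # prints 0672285b2
```

To feed it real output, something like `./build/bin/llama-server --version 2>&1 | head -n 1` could be captured first. The `2>&1` is there in case the version line is written to stderr.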
unsloth/Qwen3.6-35B-A3B-MTP-GGUF
Download the source
bluesanta@ubuntu:~/llm$ git clone https://github.com/ggml-org/llama.cpp.git llama.cpp-22673-mtp
bluesanta@ubuntu:~/llm$ cd llama.cpp-22673-mtp
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ git fetch origin pull/22673/head:pr-22673-mtp
remote: Enumerating objects: 158, done.
remote: Counting objects: 100% (128/128), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 158 (delta 119), reused 119 (delta 119), pack-reused 30 (from 2)
Receiving objects: 100% (158/158), 158.55 KiB | 12.20 MiB/s, done.
Resolving deltas: 100% (124/124), completed with 32 local objects.
From https://github.com/ggml-org/llama.cpp
* [new ref] refs/pull/22673/head -> pr-22673-mtp
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ git checkout pr-22673-mtp
Switched to branch 'pr-22673-mtp'
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ git status
On branch pr-22673-mtp
nothing to commit, working tree clean
Merge the PR branch (a no-op here, because pr-22673-mtp is already the checked-out branch)
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ git merge pr-22673-mtp -m "Merge MTP support from PR #22673"
Build
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ cmake -B build -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ cmake --build build -j$(nproc)
Install llama-server
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ sudo cmake --install build
Check the llama-server version
bluesanta@ubuntu:~/llm/llama.cpp-22673-mtp$ ./build/bin/llama-server --version
version: 9173 (0672285b2)
built with GNU 11.4.0 for Linux aarch64