Create the working directory
(.venv) bluesanta@bluesanta-desktop:~/llm$ sudo mkdir -p /opt/llama.cpp
(.venv) bluesanta@bluesanta-desktop:~/llm$ sudo chown bluesanta:bluesanta -Rf /opt/llama.cpp/
Create the service file
(.venv) bluesanta@bluesanta-desktop:~/llm$ sudo vi /etc/systemd/system/llama.service
[Unit]
Description=Llama.cpp Server Service
After=network.target
[Service]
# User account the service runs as
User=bluesanta
Group=bluesanta
WorkingDirectory=/opt/llama.cpp
# Optimized launch command
ExecStart=/usr/local/bin/llama-server \
-m /home/bluesanta/llm/models/gemma-4-26b-Q4_K_M.gguf \
--host 0.0.0.0 \
--port 8080 \
--ctx-size 4096 \
--n-gpu-layers 99 \
--threads 12 \
--no-mmap \
--mlock \
--cont-batching \
--metrics
# Automatic restart when the process exits (uncomment the two lines below to enable)
# Restart=always
# RestartSec=5
[Install]
WantedBy=multi-user.target
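The unit above passes `--mlock`, which asks the kernel to keep the model's memory from being swapped out. systemd applies its own locked-memory limit to services, so the lock may not succeed under the default settings. The fragment below is a sketch of a drop-in that raises the limit, assuming the journal actually shows an mlock-related warning; if it does not, nothing here is needed. The drop-in path `/etc/systemd/system/llama.service.d/override.conf` is the conventional location that `sudo systemctl edit llama` creates, and `LimitMEMLOCK=infinity` is an assumption about what is acceptable on this machine, not a setting taken from the original setup.

```ini
# /etc/systemd/system/llama.service.d/override.conf
# Assumption: only needed if llama-server logs a warning about failing to lock memory.
[Service]
LimitMEMLOCK=infinity
```

After saving the drop-in, run `sudo systemctl daemon-reload` and `sudo systemctl restart llama` so the new limit takes effect.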
Register the service
(.venv) bluesanta@bluesanta-desktop:~/llm$ sudo systemctl enable llama.service
Created symlink /etc/systemd/system/multi-user.target.wants/llama.service → /etc/systemd/system/llama.service.
Start the service
(.venv) bluesanta@bluesanta-desktop:~/llm$ sudo systemctl start llama
Check the service status
(.venv) bluesanta@bluesanta-desktop:~/llm$ sudo systemctl status llama
● llama.service - Llama.cpp Server Service
Loaded: loaded (/etc/systemd/system/llama.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2026-04-28 10:58:28 KST; 3min 47s ago
Main PID: 8449 (llama-server)
Tasks: 16 (limit: 74767)
Memory: 10.7G
CPU: 8.245s
CGroup: /system.slice/llama.service
└─8449 /usr/local/bin/llama-server -m /home/bluesanta/llm/models/gemma-4-26b-Q4_K_M.gguf --host 0.0.0.0 --port 8080 --ctx-size 4096 --n-gpu-layers 99 --threads 12 --no-mmap --mlock --cont-batching --metrics
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: Hi there<turn|>
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: <|turn>user
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: How are you?<turn|>
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: <|turn>model
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: '
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: srv init: init: chat template, thinking = 1
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: main: model loaded
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: main: server is listening on http://0.0.0.0:8080
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: main: starting the main loop...
4월 28 10:58:37 bluesanta-desktop llama-server[8449]: srv update_slots: all slots are idle
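`systemctl status` reports the process as running as soon as it starts, while the model may still be loading. The sketch below polls the server's `/health` endpoint until it answers successfully, assuming `curl` is installed and the server listens on the host and port from the unit file. The helper name `wait_for_llama` and its default timeout are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: poll llama-server's /health endpoint until it answers OK.
# $1: base URL (default http://localhost:8080)
# $2: timeout in seconds (default 60)
wait_for_llama() {
  wfl_url="${1:-http://localhost:8080}"
  wfl_timeout="${2:-60}"
  wfl_elapsed=0
  while [ "$wfl_elapsed" -lt "$wfl_timeout" ]; do
    # -f makes curl exit non-zero on HTTP errors, e.g. 503 while the model loads
    if curl -sf --max-time 2 "$wfl_url/health" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    wfl_elapsed=$((wfl_elapsed + 1))
  done
  return 1
}
```

For example, `wait_for_llama http://localhost:8080 120 && echo "server ready"` blocks for up to about two minutes and prints the message once the server accepts requests.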
Verify
bluesanta@bluesanta-desktop:~$ curl http://localhost:8080/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "Jetson AGX Orin의 장점 3가지는?",
"n_predict": 256
}'
{"index":0,"content":"\n\n \n\n most major or im3 \n most major or im3 \n most \n most thought\n<channel|>NVIDIA **Jetson AGX Orin**은 임베디드 시스템(Edge AI) 분야에서 현존하는 가장 강력한 성능을 제공하는 모듈 중 하나입니다. 질문하신 내용에 대해 3가지 핵심 장점을 정리해 드립니다.\n\n### 1. 압도적인 AI 연산 성능 (High Through-put)\nJetson AGX Orin의 가장 큰 장점은 동급 임베디드 모듈 중 최고 수준의 **275 TOPS**(INT8 기준)라는 강력한 AI 연산 능력을 갖추고 있다는 점입니다.\n* **장점:** 복잡한 딥러닝 모델(CNN, Transformer 등)을 지연 시간(Latency) 없이 실시간으로 실행할 수 있습니다. 이는 자율주행 로봇이나 정밀 의료 기기처럼 초 단위의 빠른 판단이 필요한 시스템에서 결정적인 역할을 합니다.\n\n### 2. 탁월한 전력 효율성 (Performance per Watt)\nAGX Orin은 고성능을 유지하면서도 전력 소모를 정밀하게 제어할 수","tokens":[],"id_slot":3,"stop":true,"model":"gemma-4-26b-Q4_K_M.gguf","tokens_predicted":256,"tokens_evaluated":15,"generation_settings":{"seed":4294967295,"temperature":1.0,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":64,"top_p":0.949999988079071,"min_p":0.05000000074505806,"top_n_sigma":-1.0,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":256,"n_predict":256,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","grammar_lazy":false,"grammar_triggers":[],"preserved_tokens":[],"chat_format":"Content-only","reasoning_format":"deepseek","reasoning_in_content":false,"generation_prompt":"","samplers":["penalties","dry","top_n_sigma","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"speculative.type":"none","speculative.ngram_size_n":1024,"speculative.ngram_size_m":1024,"speculative.ngram_m_hits":1024,"timings_per_token":false,"post_sampling_probs":false,"backend_sampling":false,"lora":[]},"prompt":"Jetson AGX Orin의 장점 3가지는?","has_new_line":true,"truncated":false,"stop_type":"limit","stopping_word":"","tokens_cached":270,"timings":{"cache_n":0,"prompt_n":15,"prompt_ms":9557.021,"prompt_per_token_ms":637.1347333333334,"prompt_per_second":1.5695267385098348,"predicted_n":256,"predicted_ms":7686.338,"predicted_per_token_ms":30.0247578125,"predicted_per_second":33.305847335883485}}
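The `timings` object at the end of the reply carries the throughput figures, such as `predicted_per_second` (about 33.3 tokens per second in the reply above). A minimal sketch for pulling one numeric field out of the reply without extra tools follows. The helper name `json_number` is hypothetical, and the grep-based matching is an assumption that only holds for compact, single-line JSON like the output shown here; a real JSON parser such as `jq` would be more robust if it is installed.

```shell
# Hypothetical helper: print the first numeric value stored under key $1
# in the JSON read from stdin. Assumes compact JSON ("key":123.4, no spaces).
json_number() {
  grep -oE "\"$1\":-?[0-9.]+" | head -n 1 | cut -d: -f2
}
```

For example, appending `| json_number predicted_per_second` to the `curl -s` form of the request above prints just the generation speed, which makes it easy to compare runs after changing flags such as `--ctx-size` or `--threads`.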