
Deploying Qwen and an Embedding Model with Xinference

2024-03-06

Xinference is deployed via Docker and exposes an OpenAI-compatible API for external software to consume.

Hardware: RTX 2080 Ti 22G

Models: qwen1.5-chat-13b-qint4 + bge-base-zh-v1.5

Desktop client: Chatbox

The end result: Qwen is deployed, and Chatbox connects to the private model through the standard OpenAI API.
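As a sketch of the client side (the endpoint and model name here are assumptions; in practice, use the model UID shown in the Xinference UI after launching the model), a chat request against the OpenAI-compatible API can be built with only the standard library:

```python
import json
import urllib.request

# Sketch: build a request against the OpenAI-compatible chat endpoint that
# Xinference exposes. Host/port follow the compose file; the model name is
# an assumption -- substitute the UID of the model launched in Xinference.
BASE_URL = "http://localhost:9997/v1"

def chat_request(prompt: str, model: str = "qwen1.5-chat") -> urllib.request.Request:
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Send with urllib.request.urlopen(req) once the server is up.
req = chat_request("你好")
```

Chatbox itself only needs the base URL (`http://<host>:9997/v1`) and the model name configured in its custom-provider settings.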

Xinference docker-compose.yml configuration:

version: '3.5'

services:
  
  xinference:
    container_name: xinference
    hostname: xinference
    image: xprobe/xinference:v0.9.1
    privileged: true
    runtime: nvidia
    restart: "no"
    environment:
      - XINFERENCE_MODEL_SRC=modelscope
    volumes:
      - ./volumes/xinference/.xinference:/root/.xinference:rw
      - ./volumes/xinference/.cache/huggingface:/root/.cache/huggingface:rw
      - ./volumes/xinference/.cache/modelscope:/root/.cache/modelscope:rw
    command:
      - xinference-local
      - -H
      - 0.0.0.0
      - -p
      - '9997'
      - --log-level
      - debug
    ports:
      - "9997:9997"
      - "9931-9940:9931-9940"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
        limits:
          memory: 60480M

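Once `docker compose up -d` brings the container up, the server can be probed on the published port. A minimal stdlib check (the URL assumes the port mapping above):

```python
import json
import urllib.request

# Sketch: query the OpenAI-compatible /v1/models route to confirm the
# Xinference server is reachable (port 9997 as published in the compose file).
def list_models(base: str = "http://localhost:9997/v1") -> dict:
    with urllib.request.urlopen(f"{base}/models", timeout=10) as resp:
        return json.load(resp)

# Once the container is running: print(list_models())
```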

Final result
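The bge-base-zh-v1.5 model launched alongside Qwen is served through the same OpenAI-compatible surface; a hedged sketch of an embeddings request (route and model name assumed to follow the OpenAI convention):

```python
import json
import urllib.request

# Sketch: request body for the OpenAI-compatible /v1/embeddings route,
# targeting the bge-base-zh-v1.5 model launched in Xinference.
def embeddings_request(texts, model="bge-base-zh-v1.5",
                       base="http://localhost:9997/v1") -> urllib.request.Request:
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        f"{base}/embeddings",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = embeddings_request(["hello world"])
```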

Tags: none
Last updated: 2024-03-06

代号山岳

To know what you know, and to admit what you do not: that is knowledge.


COPYRIGHT © 2099 登峰造极境. ALL RIGHTS RESERVED.
