Commit graph

  • 7d5f18468c
    examples : add easy python script to create quantized (k-bit support) GGML models from local HF Transformer models (#2311) Richard Roberson 2023-07-21 13:01:10 -0600
  • d924522a46
    Custom RoPE + better memory management for CUDA (#2295) Kawrakow 2023-07-21 17:27:51 +0300 (see RoPE note after this list)
  • 4d76a5f49b
    Faster Q3_K implementation on Metal (#2307) Kawrakow 2023-07-21 17:05:30 +0300
  • 0db14fef06
    ggml : fix the rope fix (513f861953) Georgi Gerganov 2023-07-21 15:16:55 +0300
  • 03e566977b
    examples : fix typo in minigpt4.py (#2298) Ikko Eltociear Ashimine 2023-07-21 20:53:07 +0900
  • 513f861953
    ggml : fix rope args order + assert (#2054) Georgi Gerganov 2023-07-21 14:51:34 +0300
  • 3973b25a64
    gitignore : fix final newline Georgi Gerganov 2023-07-21 14:42:41 +0300
  • ab0e26bdfb
    llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) Guillaume "Vermeille" Sanchez 2023-07-21 12:58:36 +0200 (see CFG note after this list)
  • 73643f5fb1
    gitignore : changes for Poetry users + chat examples (#2284) Jose Maldonado 2023-07-21 06:53:27 -0400
  • a814d04f81
    make : fix indentation Georgi Gerganov 2023-07-21 13:50:55 +0300
  • 4c013bb738
    ci : fix MNT realpath usage (#2250) Georgi Gerganov 2023-07-21 13:48:18 +0300
  • 42c7c2e2e9
    make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN (#2275) Sky Yan 2023-07-21 18:38:57 +0800
  • 78a3d13424
    flake : remove intel mkl from flake.nix due to missing files (#2277) wzy 2023-07-21 18:26:34 +0800
  • ae178ab46b
    llama : make tensor_split ptr instead of array (#2272) Georgi Gerganov 2023-07-21 13:10:51 +0300
  • 54e3bc76fe
    make : add new target for test binaries (#2244) Jiří Podivín 2023-07-21 12:09:16 +0200
  • 019fe257bb
    MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) Hatsune Miku 2023-07-21 08:13:18 +0000
  • e68c96f7fe
    Faster Q2_K on Metal (#2297) Kawrakow 2023-07-21 10:44:40 +0300
  • 9cf022a188
    make : fix embdinput library and server examples building on MSYS2 (#2235) Przemysław Pawełczyk 2023-07-21 09:42:21 +0200
  • e782c9e735
    Faster Q5_K and Q6_K on Metal (#2294) Kawrakow 2023-07-20 18:19:45 +0300
  • 785829dfe8
    Faster Q4_K on Metal (#2290) Kawrakow 2023-07-20 15:18:43 +0300
  • fff0e0eafe
    llama : fix regression from #2000 - could not load no-mmap models Georgi Gerganov 2023-07-20 13:47:26 +0300
  • 417a85a001
    metal: minor q4 optimization and reduce code size (#2248) Shouzheng Liu 2023-07-20 06:32:22 -0400
  • 294f424554
    llama : extend API to get max devices at runtime (#2253) Rinne 2023-07-19 15:06:40 +0800
  • 45a1b07e9b
    flake : update flake.nix (#2270) wzy 2023-07-19 15:01:55 +0800
  • b1f4290953
    cmake : install targets (#2256) wzy 2023-07-19 15:01:11 +0800
  • d01bccde9f
    ci : integrate with ggml-org/ci (#2250) Georgi Gerganov 2023-07-18 14:24:43 +0300
  • 6cbf9dfb32
    llama : shorten quantization descriptions Georgi Gerganov 2023-07-18 11:50:49 +0300
  • 7568d1a2b2
    Support dup & cont ops on CUDA (#2242) Jiahao Li 2023-07-18 01:39:29 +0800
  • b7647436cc
    llama : fix t_start_sample_us initialization warning (#2238) Alex Klinkhamer 2023-07-16 14:01:45 -0700
  • 672dda10e4
    ggml : fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG (#2219) Qingyou Meng 2023-07-17 03:57:28 +0800
  • 27ab66e437
    py : turn verify-checksum-models.py into executable (#2245) Jiří Podivín 2023-07-16 21:54:47 +0200
  • 6e7cca4047
    llama : add custom RoPE (#2054) Xiao-Yong Jin 2023-07-15 06:34:16 -0400
  • a6803cab94
    flake : add runHook preInstall/postInstall to installPhase so hooks function (#2224) Dave Della Costa 2023-07-14 15:13:38 -0400
  • 7dabc66f3c
    make : use pkg-config for OpenBLAS (#2222) wzy 2023-07-15 03:05:08 +0800
  • 7cdd30bf1f
    cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer (#2220) Bach Le 2023-07-15 03:00:58 +0800
  • e8035f141e
    ggml : fix static_assert with older compilers #2024 (#2218) Evan Miller 2023-07-14 14:55:56 -0400
  • 7513b7b0a1
    llama : add functions that work directly on model (#2197) Bach Le 2023-07-15 02:55:24 +0800
  • de8342423d
    build.zig : install config header (#2216) Ali Chraghi 2023-07-14 11:50:58 -0700
  • c48c525f87
    examples : fixed path typos in embd-input (#2214) Shangning Xu 2023-07-15 02:40:05 +0800
  • 206e01de11
    cuda : support broadcast add & mul (#2192) Jiahao Li 2023-07-15 02:38:24 +0800
  • 4304bd3cde
    CUDA: mul_mat_vec_q kernels for k-quants (#2203) Johannes Gäßler 2023-07-14 19:44:08 +0200
  • 229aab351c
    make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208) James Reynolds 2023-07-14 11:34:40 -0600
  • 697966680b
    ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) Georgi Gerganov 2023-07-14 16:36:41 +0300
  • 27ad57a69b
    Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212) Kawrakow 2023-07-14 12:46:21 +0300
  • 32c5411631
    Revert "Support using mmap when applying LoRA (#2095)" (#2206) Howard Su 2023-07-13 21:58:25 +0800
  • ff5d58faec
    Fix compile error on Windows CUDA (#2207) Howard Su 2023-07-13 21:58:09 +0800
  • b782422a3e
    devops : add missing quotes to bash script (#2193) Bodo Graumann 2023-07-13 15:49:14 +0200
  • 1cbf561466
    metal : new q4_0 matrix-vector kernel (#2188) Shouzheng Liu 2023-07-12 16:10:55 -0400
  • 975221e954
    ggml : broadcast mul_mat + conv batch support (#2199) Georgi Gerganov 2023-07-12 20:51:29 +0300
  • 4523d10d0c
    ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +0300
  • 680e6f9177
    cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +0300
  • 2516af4cd6
    Merge branch 'ggerganov:master' into master SIGSEGV 2023-07-12 19:18:43 +0530
  • 4e7464ef88
    FP16 is supported in CM=6.0 (#2177) Howard Su 2023-07-12 20:18:40 +0800
  • 2b5eb72e10
    Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) Johannes Gäßler 2023-07-12 10:38:52 +0200
  • f7d278faf3
    ggml : revert CUDA broadcast changes from #2183 (#2191) Georgi Gerganov 2023-07-12 10:54:19 +0300
  • 20d7740a9b
    ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) Georgi Gerganov 2023-07-11 22:53:34 +0300
  • 5bf2a27718
    ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) Spencer Sutton 2023-07-11 12:31:10 -0400
  • c9c74b4e3f
    llama : add classifier-free guidance (#2135) Bach Le 2023-07-12 00:18:43 +0800
  • ff34a7d385
    add asan flag; convert.py -> llama-convert.py aditya 2023-07-11 21:48:21 +0530
  • 3a72049dad
    Merge branch 'ggerganov:master' into master SIGSEGV 2023-07-11 21:46:23 +0530
  • 3ec7e596b2
    docker : add '--server' option (#2174) Jinwoo Jeong 2023-07-12 01:12:35 +0900
  • 7f75f68795
    Merge branch 'ggerganov:master' into master SIGSEGV 2023-07-11 21:39:48 +0530
  • 917831c63a
    readme : fix zig build instructions (#2171) Chad Brewbaker 2023-07-11 11:03:06 -0500
  • 2347463201
    Support using mmap when applying LoRA (#2095) Howard Su 2023-07-11 22:37:01 +0800
  • bbef28218f
    Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) LostRuins 2023-07-11 22:01:08 +0800 (see k-quants note after this list)
  • c1f29d1bb1
    Merge branch 'ggerganov:master' into master SIGSEGV 2023-07-11 00:36:02 +0530
  • 5656d10599
    mpi : add support for distributed inference via MPI (#2099) Evan Miller 2023-07-10 11:49:56 -0400
  • 26a3a99526
    update flake.lock aditya 2023-07-10 17:24:33 +0530
  • 82412f9d67
    add pip aditya 2023-07-10 17:24:18 +0530
  • 1d16309969
    llama : remove "first token must be BOS" restriction (#2153) oobabooga 2023-07-09 05:59:53 -0300
  • db4047ad5c
    main : escape prompt prefix/suffix (#2151) Nigel Bosch 2023-07-09 03:56:18 -0500
  • 18780e0a5e
    readme : update Termux instructions (#2147) JackJollimore 2023-07-09 05:20:43 -0300
  • 3bbc1a11f0
    ggml : fix building with Intel MKL asking for "cblas.h" (#2104) (#2115) clyang 2023-07-09 16:12:20 +0800
  • 2492a53fd0
    readme : add more docs indexes (#2127) rankaiyx 2023-07-09 15:38:42 +0800
  • 64639555ff
    Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) Johannes Gäßler 2023-07-08 20:01:44 +0200
  • 061f5f8d21
    CUDA: add __restrict__ to mul mat vec kernels (#2140) Johannes Gäßler 2023-07-08 00:25:15 +0200
  • 84525e7962
    docker : add support for CUDA in docker (#1461) dylan 2023-07-07 11:25:25 -0700
  • a7e20edf22
    ci : switch threads to 1 (#2138) Georgi Gerganov 2023-07-07 21:23:57 +0300
  • 1d656d6360
    ggml : change ggml_graph_compute() API to not require context (#1999) Qingyou Meng 2023-07-08 00:24:01 +0800
  • 7242140283
    ggml : remove sched_yield() call in ggml_graph_compute_thread() (#2134) Georgi Gerganov 2023-07-07 18:36:37 +0300
  • 3e08ae99ce
    convert.py: add mapping for safetensors bf16 (#1598) Aarni Koskela 2023-07-07 16:12:49 +0300
  • 481f793acc
    Fix opencl by wrap #if-else-endif with \n (#2086) Howard Su 2023-07-07 11:34:18 +0800
  • dfd9fce6d6
    ggml : fix restrict usage Georgi Gerganov 2023-07-06 19:41:31 +0300
  • 36680f6e40
    convert : update for baichuan (#2081) Judd 2023-07-07 00:23:49 +0800
  • a17a2683d8
    alpaca.sh : update model file name (#2074) tslmy 2023-07-06 09:17:50 -0700
  • 31cfbb1013
    Expose generation timings from server & update completions.js (#2116) Tobias Lütke 2023-07-05 16:51:13 -0400
  • 983b555e9d
    Update Server Instructions (#2113) Jesse Jojo Johnson 2023-07-05 18:03:19 +0000
  • ec326d350c
    ggml : fix bug introduced in #1237 Georgi Gerganov 2023-07-05 20:44:11 +0300
  • 1b6efeab82
    tests : fix test-grad0 Georgi Gerganov 2023-07-05 20:20:05 +0300
  • 1b107b8550
    ggml : generalize quantize_fns for simpler FP16 handling (#1237) Stephan Walter 2023-07-05 16:13:06 +0000
  • 8567c76b53
    Update server instructions for web front end (#2103) Jesse Jojo Johnson 2023-07-05 15:13:35 +0000
  • 924dd22fd3
    Quantized dot products for CUDA mul mat vec (#2067) Johannes Gäßler 2023-07-05 14:19:42 +0200
  • 051c70dcd5
    llama: Don't double count the sampling time (#2107) Howard Su 2023-07-05 18:31:23 +0800
  • 9e4475f5cf
    Fixed OpenCL offloading prints (#2082) Johannes Gäßler 2023-07-05 08:58:05 +0200
  • 7f0e9a775e
    embd-input: Fix input embedding example unsigned int seed (#2105) Nigel Bosch 2023-07-04 18:33:33 -0500
  • b472f3fca5
    readme : add link web chat PR Georgi Gerganov 2023-07-04 22:25:22 +0300
  • ed9a54e512
    ggml : sync latest (new ops, macros, refactoring) (#2106) Georgi Gerganov 2023-07-04 21:54:11 +0300
  • f257fd2550
    Add an API example using server.cpp similar to OAI. (#2009) jwj7140 2023-07-05 03:06:12 +0900
  • 7ee76e45af
    Simple webchat for server (#1998) Tobias Lütke 2023-07-04 10:05:27 -0400
  • acc111caf9
    Allow old Make to build server. (#2098) Henri Vasserman 2023-07-04 15:38:04 +0300
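
Notes on selected commits

Custom RoPE (#2054, commits 6e7cca4047 and d924522a46) exposes the rotary frequency parameters. A minimal sketch of the scaled rotary angle, assuming the usual llama.cpp parameter names rope_freq_base (f_base) and rope_freq_scale (f_scale):

    % Rotary angle for position p and dimension pair i, with d rotated dims.
    % Stock LLaMA corresponds to f_base = 10000 and f_scale = 1;
    % f_scale < 1 gives linear position interpolation for longer contexts.
    \[
      \theta_{p,i} = \bigl(p \cdot f_{\mathrm{scale}}\bigr)\,
                     f_{\mathrm{base}}^{-2i/d}
    \]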
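
Commit ab0e26bdfb (#2280) removes the CFG smooth factor on the grounds that it is only a reparameterization of the guidance scale. A short derivation of that claim, assuming the pre-removal code mixed the guided logits linearly back into the conditional logits (the exact original mixing formula is an assumption here):

    % CFG with guidance scale g on conditional / unconditional logits:
    %   l_g = l_u + g (l_c - l_u)
    % Assumed smoothing with factor s: l_final = s l_g + (1 - s) l_c.
    % Expanding collapses (g, s) into a single effective scale g'.
    \[
      s\bigl(\ell_u + g(\ell_c - \ell_u)\bigr) + (1 - s)\,\ell_c
        = \ell_u + \underbrace{\bigl(1 + s(g - 1)\bigr)}_{g'}\,(\ell_c - \ell_u)
    \]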
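
Commit bbef28218f (#2148) works around the k-quant super-block size: k-quants pack weights in 256-element super-blocks, so every quantized row length must be a multiple of 256. The arithmetic below shows why n_vocab = 32000 is fine while a vocabulary with one added token (n_vocab = 32001, an illustrative value) is not:

    \[
      32000 = 125 \times 256, \qquad
      32001 = 125 \times 256 + 1 \;\Rightarrow\; 256 \nmid 32001
    \]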