Commit graph

  • 2d099e5193
    ggml: add names to tensors (#1268) slaren 2023-05-02 16:03:00 +0200
  • f4cef87edf
    Add git-based build information for better issue tracking (#1232) DannyDaemonic 2023-05-01 09:23:47 -0700
  • 58b367c2d7
    cuBLAS: refactor and optimize f16 mat mul performance (#1259) slaren 2023-05-01 18:11:07 +0200
  • ea3a0ad6b6
    llama : update stubs for systems without mmap and mlock (#1266) xloem 2023-05-01 08:58:51 -0400
  • 2bdc09646d
    ggml : fix ggml_used_mem() (#1264) Kerfuffle 2023-05-01 05:56:07 -0600
  • 70269cae37
    llama : fix session load / save (#1263) Georgi Gerganov 2023-05-01 14:54:59 +0300
  • b925f1f1b0
    cuBLAS: fall back to pageable memory if pinned alloc fails (#1233) slaren 2023-05-01 13:32:22 +0200
  • 90b19bd6ee
    llama : let context be const when accessing const data (#1261) Alex Klinkhamer 2023-05-01 00:24:20 -0700
  • 7ff0dcd320
    ggml : fix UB (int << 31) Georgi Gerganov 2023-04-30 22:28:51 +0300
  • 6f79699286
    build: add armv{6,7,8} support to cmake (#1251) Pavol Rusnak 2023-04-30 20:48:38 +0200
  • a5d30b1f53
    common : better default number of threads (#934) jon-chuang 2023-04-30 14:41:35 -0400
  • 76a884920a
    ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225) 0cc4m 2023-04-30 20:34:52 +0200
  • 6bc4400e67
    ggml : add Q5 WASM SIMD + GGML_FTYPE Georgi Gerganov 2023-04-30 19:07:00 +0300
  • f0d70f147d
    Various fixes to mat_mul benchmark (#1253) Stephan Walter 2023-04-30 12:32:37 +0000
  • 3e5aa8a1c4
    ggml : fix labels for GGML_OP_ALIBI Georgi Gerganov 2023-04-30 10:25:46 +0300
  • c3ca7a5f05
    ggml : fix 32-bit ARM NEON Georgi Gerganov 2023-04-29 21:34:23 +0300
  • e8c051611a
    ggml : use vzip instead of vuzp for consistency Georgi Gerganov 2023-04-29 21:12:56 +0300
  • 0b5a935099
    ggml : fix visibility and unused warnings Georgi Gerganov 2023-04-29 19:28:36 +0300
  • ec728e44d7
    ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) Georgi Gerganov 2023-04-29 18:43:42 +0300
  • 214b6a3570
    ggml : adjust mul_mat_f16 work memory (#1226) Georgi Gerganov 2023-04-29 18:43:28 +0300
  • 305eb5afd5
    build : fix reference to old llama_util.h Georgi Gerganov 2023-04-29 13:53:12 +0300
  • 84ca9c2ecf
    examples : fix save-load-state + rename llama-util.h Georgi Gerganov 2023-04-29 13:48:11 +0300
  • 334637e43e
    common : change default parameters to pre-#1126 (#1223) Georgi Gerganov 2023-04-29 09:51:06 +0300
  • dd7eff57d8
    llama : new sampling algorithms (#1126) Ivan Stepanov 2023-04-29 08:34:41 +0300
  • 7fc50c051a
    cuBLAS: use host pinned memory and dequantize while copying (#1207) slaren 2023-04-29 02:04:18 +0200
  • b1ee8f59b4
    cuBLAS: non-contiguous tensor support (#1215) Henri Vasserman 2023-04-29 02:31:56 +0300
  • 36d19a603b
    Remove Q4_3 which is no better than Q5 (#1218) Stephan Walter 2023-04-28 23:10:43 +0000
  • 7f15c5c477
    readme : update hot topics Georgi Gerganov 2023-04-28 21:32:52 +0300
  • 55390bcaf2
    ggml : sync ggml (ggml_alibi) Georgi Gerganov 2023-04-28 20:37:43 +0300
  • 5fba3c016b
    examples : add Jeopardy example (#1168) CRD716 2023-04-28 11:13:33 -0500
  • 1481a9cf25
    llama : add session file format and saved sessions in main (#1169) Evan Jones 2023-04-28 11:59:37 -0400
  • 11d902364b
    ggml : add helper debug printf in soft_max Georgi Gerganov 2023-04-28 17:58:44 +0300
  • 7296c961d9
    ggml : add CLBlast support (#1164) 0cc4m 2023-04-28 16:57:16 +0200
  • 78ec543733
    Correcting link to w64devkit (#1214) Folko-Ven 2023-04-28 19:22:48 +0500
  • 92a6e13a31
    Add Manjaro CUDA include and lib dirs to Makefile (#1212) Johannes Gäßler 2023-04-28 15:40:32 +0200
  • 04aaae1d79
    add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) Yann Follet 2023-04-28 19:59:48 +0800
  • 0b2da20538
    ggml : slightly faster AVX2 implementation for Q5 (#1197) Stephan Walter 2023-04-26 20:26:42 +0000
  • f9be42add0
    readme : add quantization info Georgi Gerganov 2023-04-26 23:24:42 +0300
  • 574406dc7e
    ggml : add Q5_0 and Q5_1 quantization (#1187) Georgi Gerganov 2023-04-26 23:14:13 +0300
  • 87a6f846d3
    Allow setting the rng seed after initialization. (#1184) Ásgeir Bjarni Ingvarsson 2023-04-26 20:08:43 +0000
  • ea3ad7eb60
    Updating build instructions to include BLAS support (#1183) DaniAndTheWeb 2023-04-26 22:03:03 +0200
  • 859fee6dfb
    quantize : use map to assign quantization type from string (#1191) Pavol Rusnak 2023-04-26 18:43:27 +0200
  • 4afcc37869
    Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +0000
  • 667c501334
    py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +0200
  • bb98e77be7
    nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +0200
  • 7a32fcb3b2
    ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) Georgi Gerganov 2023-04-25 23:40:51 +0300
  • dd0eabc049
    ggml : use full range for Q4_0 and Q4_2 quantization (#729) unbounded 2023-04-25 19:20:46 +0200
  • 54bb60e268
    ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) xaedes 2023-04-24 23:02:02 +0200
  • 8a0f8673ba
    ggml : export symbols (#1155) Georgi Gerganov 2023-04-24 22:18:25 +0300
  • 0c5692345d
    examples : add save_load_state example (#1150) xaedes 2023-04-24 18:23:31 +0200
  • 957c8ae21d
    llama : increase scratch buffer size for 65B (ref #1152) Georgi Gerganov 2023-04-24 18:47:03 +0300
  • 9b0a4d4214
    examples/main README improvements and some light refactoring (#1131) mgroeber9110 2023-04-24 17:45:32 +0200
  • 2ec83428de
    Fix build for gcc 8 and test in CI (#1154) Stephan Walter 2023-04-24 15:38:26 +0000
  • e4cf982e0d
    Fix cuda compilation (#1128) slaren 2023-04-24 17:29:58 +0200
  • c4fe84fb0d
    llama : refactor get / set state + remove redundant kv cache API (#1143) Georgi Gerganov 2023-04-24 07:40:02 +0300
  • 1d78fecdab
    Fix LoRA acronym (#1145) slaren 2023-04-23 23:03:44 +0200
  • 284685f169
    scripts : add helper scripts to synch ggml repo Georgi Gerganov 2023-04-23 19:57:09 +0300
  • edce63baa9
    Added README.md for main with examples and explanations (#1139) DannyDaemonic 2023-04-23 08:37:02 -0700
  • ec9cdb6752
    ggml : do not print perf ops that have not been used at all Georgi Gerganov 2023-04-23 18:32:52 +0300
  • e4422e299c
    ggml : better PERF prints + support "LLAMA_PERF=1 make" Georgi Gerganov 2023-04-23 18:15:39 +0300
  • 53c8434398
    Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) Stephan Walter 2023-04-23 11:01:03 +0000
  • c6524f46eb
    readme : update gpt4all instructions (#980) Pavol Rusnak 2023-04-23 10:21:26 +0200
  • c9e2c26f41
    A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119) Yishuo Wang 2023-04-23 15:57:05 +0800
  • 0e018fe008
    ggml : fix Q4_3 cuBLAS Georgi Gerganov 2023-04-22 16:31:56 +0300
  • 857308d1e8
    ci : trigger CI for drafts, but not most PR actions (#1125) Stephan Walter 2023-04-22 13:12:29 +0000
  • c50b628810
    Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) Stephan Walter 2023-04-22 10:54:13 +0000
  • 5f939498d5
    ggml : unit test for quantization functions (#953) unbounded 2023-04-22 11:10:39 +0200
  • 36b4f7e064
    llama : print timings on ctrl+c exit (#1021) wbpxre150 2023-04-22 16:56:35 +0800
  • 10f19c1121
    llama : have n_batch default to 512 (#1091) eiery 2023-04-22 04:27:05 -0400
  • 7e312f165c
    cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) Howard Su 2023-04-22 16:18:20 +0800
  • 872c365a91
    ggml : fix AVX build + update to new Q8_0 format Georgi Gerganov 2023-04-22 11:08:12 +0300
  • 955ef9a5d5
    ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) Georgi Gerganov 2023-04-22 10:55:35 +0300
  • c5aa5e5777
    ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) Stephan Walter 2023-04-22 07:37:05 +0000
  • e9a9cb0c54
    examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) Clint Herron 2023-04-22 02:54:33 -0400
  • b6e7f9b09e
    llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105) xaedes 2023-04-22 08:21:32 +0200
  • 50cb666b8a
    Improve cuBLAS performance by using a memory pool (#1094) slaren 2023-04-21 21:59:17 +0200
  • 25d7abbd1f
    llama : fixed rlimit error message (#888) apaz 2023-04-21 13:48:06 -0500
  • 018f2279f5
    cmake : link threads publicly to ggml (#1042) 源文雨 2023-04-22 02:27:06 +0800
  • 9411288271
    main : evaluate tokens in batches after swapping context (#1014) Alex Klinkhamer 2023-04-21 11:18:09 -0700
  • 8687c1f258
    llama : remember and restore kv cache data pointers (#1104) xaedes 2023-04-21 17:25:21 +0200
  • 1bfc153e2f
    ggml : a faster version for Q4_1 x Q8_0 dot products (#1083) Kawrakow 2023-04-21 17:18:26 +0200
  • 3d59769c3b
    Show perplexity ETA in hours and minutes (#1096) slaren 2023-04-21 14:57:57 +0200
  • d40fded93e
    llama : fix comment for "output.weight" tensor Georgi Gerganov 2023-04-21 10:23:36 +0300
  • 2510c1831f
    Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) Stephan Walter 2023-04-20 21:56:44 +0000
  • 12b5900dbc
    ggml : sync ggml (add GPT-NeoX RoPE implementation) Georgi Gerganov 2023-04-20 23:32:59 +0300
  • 9ff334f3c9
    ggml : fix bug in ggml_compute_forward_dup_f32() Georgi Gerganov 2023-04-20 21:58:05 +0300
  • 2005469ea1
    Add Q4_3 support to cuBLAS (#1086) slaren 2023-04-20 20:49:53 +0200
  • 8a1756abdf
    ggml : do not break cuBLAS build (Q4_3 is not yet implemented) Georgi Gerganov 2023-04-20 21:43:50 +0300
  • 66aab46079
    ggml : fix Q4_3 quantization Georgi Gerganov 2023-04-20 20:44:05 +0300
  • 38de86a711
    llama : multi-threaded quantization (#1075) Kawrakow 2023-04-20 19:42:27 +0200
  • e0305ead3a
    ggml : add Q4_3 quantization (#1082) Georgi Gerganov 2023-04-20 20:35:53 +0300
  • 6a9661ea5a
    ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074) Ivan Komarov 2023-04-20 17:15:18 +0200
  • 5addcb120c
    fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) 源文雨 2023-04-20 21:28:43 +0800
  • c8c2c52482
    AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) Stephan Walter 2023-04-20 06:45:41 +0000
  • 02d6988121
    Improve cuBLAS performance by dequantizing on the GPU (#1065) slaren 2023-04-20 03:14:14 +0200
  • 834695fe3a
    Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -0500
  • f7d05095b4
    Q4_2 quantization with rmse-optimized scale and quants (#1062) Kawrakow 2023-04-19 20:20:14 +0200
  • 884e7d7a2b
    ggml : use 8-bit precision for Q4_1 intermediate results (#1047) Georgi Gerganov 2023-04-19 20:10:08 +0300
  • 7cd5c4a3e9
    readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +0300
  • f3d4edf504
    ggml : Q4 cleanup - remove 4-bit dot product code (#1061) Stephan Walter 2023-04-19 16:06:37 +0000