Commit graph

  • 2d099e5193
    ggml: add names to tensors (#1268) slaren 2023-05-02 16:03:00 +0200
  • f4cef87edf
    Add git-based build information for better issue tracking (#1232) DannyDaemonic 2023-05-01 09:23:47 -0700
  • 58b367c2d7
    cuBLAS: refactor and optimize f16 mat mul performance (#1259) slaren 2023-05-01 18:11:07 +0200
  • ea3a0ad6b6
    llama : update stubs for systems without mmap and mlock (#1266) xloem 2023-05-01 08:58:51 -0400
  • 2bdc09646d
    ggml : fix ggml_used_mem() (#1264) Kerfuffle 2023-05-01 05:56:07 -0600
  • 70269cae37
    llama : fix session load / save (#1263) Georgi Gerganov 2023-05-01 14:54:59 +0300
  • b925f1f1b0
    cuBLAS: fall back to pageable memory if pinned alloc fails (#1233) slaren 2023-05-01 13:32:22 +0200
  • 90b19bd6ee
    llama : let context be const when accessing const data (#1261) Alex Klinkhamer 2023-05-01 00:24:20 -0700
  • 7ff0dcd320
    ggml : fix UB (int << 31) Georgi Gerganov 2023-04-30 22:28:51 +0300
  • 6f79699286
    build: add armv{6,7,8} support to cmake (#1251) Pavol Rusnak 2023-04-30 20:48:38 +0200
  • a5d30b1f53
    common : better default number of threads (#934) jon-chuang 2023-04-30 14:41:35 -0400
  • 76a884920a
    ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225) 0cc4m 2023-04-30 20:34:52 +0200
  • 6bc4400e67
    ggml : add Q5 WASM SIMD + GGML_FTYPE Georgi Gerganov 2023-04-30 19:07:00 +0300
  • f0d70f147d
    Various fixes to mat_mul benchmark (#1253) Stephan Walter 2023-04-30 12:32:37 +0000
  • 3e5aa8a1c4
    ggml : fix labels for GGML_OP_ALIBI Georgi Gerganov 2023-04-30 10:25:46 +0300
  • c3ca7a5f05
    ggml : fix 32-bit ARM NEON Georgi Gerganov 2023-04-29 21:34:23 +0300
  • e8c051611a
    ggml : use vzip instead of vuzp for consistency Georgi Gerganov 2023-04-29 21:12:56 +0300
  • 0b5a935099
    ggml : fix visibility and unused warnings Georgi Gerganov 2023-04-29 19:28:36 +0300
  • ec728e44d7
    ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) Georgi Gerganov 2023-04-29 18:43:42 +0300
  • 214b6a3570
    ggml : adjust mul_mat_f16 work memory (#1226) Georgi Gerganov 2023-04-29 18:43:28 +0300
  • 305eb5afd5
    build : fix reference to old llama_util.h Georgi Gerganov 2023-04-29 13:53:12 +0300
  • 84ca9c2ecf
    examples : fix save-load-state + rename llama-util.h Georgi Gerganov 2023-04-29 13:48:11 +0300
  • 334637e43e
    common : change default parameters to pre-#1126 (#1223) Georgi Gerganov 2023-04-29 09:51:06 +0300
  • dd7eff57d8
    llama : new sampling algorithms (#1126) Ivan Stepanov 2023-04-29 08:34:41 +0300
  • 7fc50c051a
    cuBLAS: use host pinned memory and dequantize while copying (#1207) slaren 2023-04-29 02:04:18 +0200
  • b1ee8f59b4
    cuBLAS: non-contiguous tensor support (#1215) Henri Vasserman 2023-04-29 02:31:56 +0300
  • 36d19a603b
    Remove Q4_3 which is no better than Q5 (#1218) Stephan Walter 2023-04-28 23:10:43 +0000
  • 7f15c5c477
    readme : update hot topics Georgi Gerganov 2023-04-28 21:32:52 +0300
  • 55390bcaf2
    ggml : sync ggml (ggml_alibi) Georgi Gerganov 2023-04-28 20:37:43 +0300
  • 5fba3c016b
    examples : add Jeopardy example (#1168) CRD716 2023-04-28 11:13:33 -0500
  • 1481a9cf25
    llama : add session file format and saved sessions in main (#1169) Evan Jones 2023-04-28 11:59:37 -0400
  • 11d902364b
    ggml : add helper debug printf in soft_max Georgi Gerganov 2023-04-28 17:58:44 +0300
  • 7296c961d9
    ggml : add CLBlast support (#1164) 0cc4m 2023-04-28 16:57:16 +0200
  • 78ec543733
    Correcting link to w64devkit (#1214) Folko-Ven 2023-04-28 19:22:48 +0500
  • 92a6e13a31
    Add Manjaro CUDA include and lib dirs to Makefile (#1212) Johannes Gäßler 2023-04-28 15:40:32 +0200
  • 04aaae1d79
    add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) Yann Follet 2023-04-28 19:59:48 +0800
  • 0b2da20538
    ggml : slightly faster AVX2 implementation for Q5 (#1197) Stephan Walter 2023-04-26 20:26:42 +0000
  • f9be42add0
    readme : add quantization info Georgi Gerganov 2023-04-26 23:24:42 +0300
  • 574406dc7e
    ggml : add Q5_0 and Q5_1 quantization (#1187) Georgi Gerganov 2023-04-26 23:14:13 +0300
  • 87a6f846d3
    Allow setting the rng seed after initialization. (#1184) Ásgeir Bjarni Ingvarsson 2023-04-26 20:08:43 +0000
  • ea3ad7eb60
    Updating build instructions to include BLAS support (#1183) DaniAndTheWeb 2023-04-26 22:03:03 +0200
  • 859fee6dfb
    quantize : use map to assign quantization type from string (#1191) Pavol Rusnak 2023-04-26 18:43:27 +0200
  • 4afcc37869
    Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +0000
  • 667c501334
    py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +0200
  • bb98e77be7
    nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +0200
  • 7a32fcb3b2
    ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) Georgi Gerganov 2023-04-25 23:40:51 +0300
  • dd0eabc049
    ggml : use full range for Q4_0 and Q4_2 quantization (#729) unbounded 2023-04-25 19:20:46 +0200
  • 54bb60e268
    ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) xaedes 2023-04-24 23:02:02 +0200
  • 8a0f8673ba
    ggml : export symbols (#1155) Georgi Gerganov 2023-04-24 22:18:25 +0300
  • 0c5692345d
    examples : add save_load_state example (#1150) xaedes 2023-04-24 18:23:31 +0200
  • 957c8ae21d
    llama : increase scratch buffer size for 65B (ref #1152) Georgi Gerganov 2023-04-24 18:47:03 +0300
  • 9b0a4d4214
    examples/main README improvements and some light refactoring (#1131) mgroeber9110 2023-04-24 17:45:32 +0200
  • 2ec83428de
    Fix build for gcc 8 and test in CI (#1154) Stephan Walter 2023-04-24 15:38:26 +0000
  • e4cf982e0d
    Fix cuda compilation (#1128) slaren 2023-04-24 17:29:58 +0200
  • c4fe84fb0d
    llama : refactor get / set state + remove redundant kv cache API (#1143) Georgi Gerganov 2023-04-24 07:40:02 +0300
  • 1d78fecdab
    Fix LoRA acronym (#1145) slaren 2023-04-23 23:03:44 +0200
  • 284685f169
    scripts : add helper scripts to synch ggml repo Georgi Gerganov 2023-04-23 19:57:09 +0300
  • edce63baa9
    Added README.md for main with examples and explanations (#1139) DannyDaemonic 2023-04-23 08:37:02 -0700
  • ec9cdb6752
    ggml : do not print perf ops that have not been used at all Georgi Gerganov 2023-04-23 18:32:52 +0300
  • e4422e299c
    ggml : better PERF prints + support "LLAMA_PERF=1 make" Georgi Gerganov 2023-04-23 18:15:39 +0300
  • 53c8434398
    Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) Stephan Walter 2023-04-23 11:01:03 +0000
  • c6524f46eb
    readme : update gpt4all instructions (#980) Pavol Rusnak 2023-04-23 10:21:26 +0200
  • c9e2c26f41
    A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119) Yishuo Wang 2023-04-23 15:57:05 +0800
  • 0e018fe008
    ggml : fix Q4_3 cuBLAS Georgi Gerganov 2023-04-22 16:31:56 +0300
  • 857308d1e8
    ci : trigger CI for drafts, but not most PR actions (#1125) Stephan Walter 2023-04-22 13:12:29 +0000
  • c50b628810
    Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) Stephan Walter 2023-04-22 10:54:13 +0000
  • 5f939498d5
    ggml : unit test for quantization functions (#953) unbounded 2023-04-22 11:10:39 +0200
  • 36b4f7e064
    llama : print timings on ctrl+c exit (#1021) wbpxre150 2023-04-22 16:56:35 +0800
  • 10f19c1121
    llama : have n_batch default to 512 (#1091) eiery 2023-04-22 04:27:05 -0400
  • 7e312f165c
    cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) Howard Su 2023-04-22 16:18:20 +0800
  • 872c365a91
    ggml : fix AVX build + update to new Q8_0 format Georgi Gerganov 2023-04-22 11:08:12 +0300
  • 955ef9a5d5
    ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) Georgi Gerganov 2023-04-22 10:55:35 +0300
  • c5aa5e5777
    ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) Stephan Walter 2023-04-22 07:37:05 +0000
  • e9a9cb0c54
    examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) Clint Herron 2023-04-22 02:54:33 -0400
  • b6e7f9b09e
    llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105) xaedes 2023-04-22 08:21:32 +0200
  • 50cb666b8a
    Improve cuBLAS performance by using a memory pool (#1094) slaren 2023-04-21 21:59:17 +0200
  • 25d7abbd1f
    llama : fixed rlimit error message (#888) apaz 2023-04-21 13:48:06 -0500
  • 018f2279f5
    cmake : link threads publicly to ggml (#1042) 源文雨 2023-04-22 02:27:06 +0800
  • 9411288271
    main : evaluate tokens in batches after swapping context (#1014) Alex Klinkhamer 2023-04-21 11:18:09 -0700
  • 8687c1f258
    llama : remember and restore kv cache data pointers (#1104) xaedes 2023-04-21 17:25:21 +0200
  • 1bfc153e2f
    ggml : a faster version for Q4_1 x Q8_0 dot products (#1083) Kawrakow 2023-04-21 17:18:26 +0200
  • 3d59769c3b
    Show perplexity ETA in hours and minutes (#1096) slaren 2023-04-21 14:57:57 +0200
  • d40fded93e
    llama : fix comment for "output.weight" tensor Georgi Gerganov 2023-04-21 10:23:36 +0300
  • 2510c1831f
    Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) Stephan Walter 2023-04-20 21:56:44 +0000
  • 12b5900dbc
    ggml : sync ggml (add GPT-NeoX RoPE implementation) Georgi Gerganov 2023-04-20 23:32:59 +0300
  • 9ff334f3c9
    ggml : fix bug in ggml_compute_forward_dup_f32() Georgi Gerganov 2023-04-20 21:58:05 +0300
  • 2005469ea1
    Add Q4_3 support to cuBLAS (#1086) slaren 2023-04-20 20:49:53 +0200
  • 8a1756abdf
    ggml : do not break cuBLAS build (Q4_3 is not yet implemented) Georgi Gerganov 2023-04-20 21:43:50 +0300
  • 66aab46079
    ggml : fix Q4_3 quantization Georgi Gerganov 2023-04-20 20:44:05 +0300
  • 38de86a711
    llama : multi-threaded quantization (#1075) Kawrakow 2023-04-20 19:42:27 +0200
  • e0305ead3a
    ggml : add Q4_3 quantization (#1082) Georgi Gerganov 2023-04-20 20:35:53 +0300
  • 6a9661ea5a
    ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074) Ivan Komarov 2023-04-20 17:15:18 +0200
  • 5addcb120c
    fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) 源文雨 2023-04-20 21:28:43 +0800
  • c8c2c52482
    AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) Stephan Walter 2023-04-20 06:45:41 +0000
  • 02d6988121
    Improve cuBLAS performance by dequantizing on the GPU (#1065) slaren 2023-04-20 03:14:14 +0200
  • 834695fe3a
    Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -0500
  • f7d05095b4
    Q4_2 quantization with rmse-optimized scale and quants (#1062) Kawrakow 2023-04-19 20:20:14 +0200
  • 884e7d7a2b
    ggml : use 8-bit precision for Q4_1 intermediate results (#1047) Georgi Gerganov 2023-04-19 20:10:08 +0300
  • 7cd5c4a3e9
    readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +0300
  • f3d4edf504
    ggml : Q4 cleanup - remove 4-bit dot product code (#1061) Stephan Walter 2023-04-19 16:06:37 +0000