Commit graph

  • 23c7c6fc91
    Update Makefile: clean simple (#2097) ZhouYuChen 2023-07-04 20:15:16 +0800
  • 698efad5fb
    CI: make the brew update temporarily optional. (#2092) Erik Scholz 2023-07-04 01:50:12 +0200
  • 14a2cc71f6
    [ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) Govlzkoy 2023-07-04 07:50:00 +0800
  • 1cf14ccef1
    fix server crashes (#2076) Henri Vasserman 2023-07-04 00:05:23 +0300
  • cc45a7feb8
    Fix crash of test-tokenizer-0 under Debug build (#2064) Howard Su 2023-07-04 02:43:55 +0800
  • 55dbb915cc
    [llama] No need to check file version when loading vocab score (#2079) Howard Su 2023-07-03 19:58:58 +0800
  • d7d2e6a0f0
    server: add option to output probabilities for completion (#1962) WangHaoranRobin 2023-07-03 05:38:44 +0800
  • 46088f7231
    ggml : fix build with OpenBLAS (close #2066) Georgi Gerganov 2023-07-02 09:46:46 +0300
  • 0bc2cdfc87
    Better CUDA synchronization logic (#2057) Johannes Gäßler 2023-07-01 21:49:44 +0200
  • befb3a3562
    Test-based VRAM scratch size + context adjustment (#2056) Johannes Gäßler 2023-07-01 21:47:26 +0200
  • b213227067
    cmake : don't force -mcpu=native on aarch64 (#2063) Daniel Drake 2023-07-01 20:31:44 +0200
  • 2f8cd979ec
    metal : release buffers when freeing metal context (#2062) Aaron Miller 2023-07-01 11:14:59 -0700
  • 471aab6e4c
    convert : add support of baichuan-7b (#2055) Judd 2023-07-02 01:00:25 +0800
  • 463f2f4c4f
    llama : fix return value of llama_load_session_file_internal (#2022) Georgi Gerganov 2023-07-01 19:05:09 +0300
  • cb44dbc7de
    llama : catch llama_load_session_file_internal exceptions (#2022) Rand Xie 2023-07-02 00:02:58 +0800
  • 79f634a19d
    embd-input : fix returning ptr to temporary Georgi Gerganov 2023-07-01 18:46:00 +0300
  • 04606a1599
    train : fix compile warning Georgi Gerganov 2023-07-01 18:45:44 +0300
  • b1ca8f36a9
    ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) Qingyou Meng 2023-07-01 23:42:43 +0800
  • b8c8dda75f
    Use unsigned for random seed (#2006) Howard Su 2023-06-29 21:15:15 +0800
  • 96a712ca1b
    Porting the improved K-Quant CUDA kernels to OpenCL (#1966) LostRuins 2023-06-29 11:56:43 +0800
  • d3494bb86b
    llama : replacing auto &kv with const auto &kv (#2041) m3ndax 2023-06-28 20:39:08 +0200
  • 5b351e94d0
    cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) Salvador E. Tropea 2023-06-28 14:27:31 -0300
  • 6432aabb6d
    cuda : fix missing const qualifier in casts (#2027) Salvador E. Tropea 2023-06-28 14:26:26 -0300
  • b922bc351b
    llama : remove shards weight file support (#2000) Howard Su 2023-06-28 10:13:02 -0700
  • 7f9753fa12
    CUDA GPU acceleration for LoRAs + f16 models (#1970) Johannes Gäßler 2023-06-28 18:35:54 +0200
  • cfa0750bc9
    llama : support input embeddings directly (#1910) ningshanwutuobang 2023-06-28 23:53:37 +0800
  • 9d23589d63
    fix pthreads setaffinity usage on android (#2020) Erik Scholz 2023-06-27 19:06:33 +0200
  • 0be54f75a6
    baby-llama : fix build after ggml_rope change (#2016) Howard Su 2023-06-27 13:07:13 +0800
  • 181e8d9755
    llama : fix rope usage after ChatGLM change Georgi Gerganov 2023-06-27 00:37:13 +0300
  • d9779021bd
    ggml : add support for ChatGLM RoPE Georgi Gerganov 2023-06-27 00:06:51 +0300
  • d38e451578
    readme : add Scala 3 bindings repo (#2010) Roman Parykin 2023-06-26 22:47:59 +0300
  • eaa6ca5a61
    ggml : increase max tensor name + clean up compiler warnings in train-text (#1988) David Yang 2023-06-27 03:45:32 +0800
  • aa777abbb7
    readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007) Gustavo Rocha Dias 2023-06-26 16:34:45 -0300
  • c824d2e368
    ggml : avoid conv 2d kernel round up Georgi Gerganov 2023-06-26 21:03:59 +0300
  • b853d45601
    ggml : add NUMA support (#1556) zrm 2023-06-26 13:57:59 -0400
  • 9225baef71
    k-quants : fix indentation Georgi Gerganov 2023-06-26 20:10:52 +0300
  • a84ab1da8d
    tests : fix quantize perf (#1990) katsu560 2023-06-27 01:47:02 +0900
  • 5743ca8092
    k-quants : add AVX support to dot functions (#1916) katsu560 2023-06-27 01:46:07 +0900
  • 412c60e473
    readme : add link to new k-quants for visibility Georgi Gerganov 2023-06-26 19:45:09 +0300
  • 6769e944c7
    k-quants : support for super-block size of 64 (#2001) Kawrakow 2023-06-26 19:43:07 +0300
  • cbebf61ca7
    Fix assert when free invalid cuda pointer (#2005) Howard Su 2023-06-26 23:15:47 +0800
  • 447ccbe8c3
    readme : add new roadmap + manifesto Georgi Gerganov 2023-06-25 16:08:12 +0300
  • bd34cdde38
    ggml : sync latest ggml (custom operators) Georgi Gerganov 2023-06-25 14:25:08 +0300
  • c2a08f87b8
    fix server sampling: top k sampler first (#1977) anon998 2023-06-25 08:48:36 +0000
  • 66a2555ba6
    readme : add Azure CI discussion link Georgi Gerganov 2023-06-25 09:07:03 +0300
  • e65ca7e14a
    zig : upgrade build system support (#1981) sjinzh 2023-06-25 13:45:44 +0800
  • 5ec8dd5a3c
    #1869 Fix null reference errors when training from scratch with CUDA (#1907) Robyn 2023-06-25 04:10:29 +1000
  • 65bdd52a86
    tests : sync test-grad0 from ggml Georgi Gerganov 2023-06-24 19:40:18 +0300
  • fdd1860911
    flake : fix ggml-metal.metal path and run nixfmt (#1974) Rowan Hart 2023-06-24 04:07:08 -0700
  • c943d823c1
    convert : fix invalid params in write_vocab_only (#1975) AN Long 2023-06-24 19:02:06 +0800
  • f2c754e1c3
    ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978) slaren 2023-06-24 12:57:18 +0200
  • 11da1a85cd
    readme : fix whitespaces Georgi Gerganov 2023-06-24 13:38:18 +0300
  • 235b610d65
    readme : fixed termux instructions (#1973) Alberto 2023-06-24 12:32:13 +0200
  • b061ba9e2a
    llama : fix top-p sampling to match the canonical definition (#1953) Alex Renda 2023-06-24 03:15:01 -0700
  • 527b6fba1d
    llama : make model stateless and context stateful (llama_state) (#1797) Didzis Gosko 2023-06-24 11:47:58 +0300
  • d7b7484f74
    Add OpenLLaMA instructions to the README (#1954) eiery 2023-06-23 04:38:01 -0400
  • 7487137227
    rework convert.py to read hyper-parameters from config.json (#1958) Erik Scholz 2023-06-22 14:20:47 +0200
  • bbca06e269
    cmake: revert CUDA arch default to 52, 61 if f16 (#1959) Johannes Gäßler 2023-06-21 23:49:25 +0200
  • fb98254f99
    Fix typo in README.md (#1961) Rahul Vivek Nair 2023-06-22 03:18:43 +0530
  • 049aa16b8c
    readme : add link to p1 Georgi Gerganov 2023-06-20 19:05:54 +0300
  • 2322ec223a
    Fix typo (#1949) Xiake Sun 2023-06-20 05:42:40 -0700
  • aacdbd4056
    llama : fix params struct alignment (#1936) Ettore Di Giacinto 2023-06-20 03:24:39 +0200
  • 20568fe60f
    [Fix] Reenable server embedding endpoint (#1937) Henri Vasserman 2023-06-20 01:12:39 +0300
  • 18b35625c3
    ggml : fix bug in LBFGS optimizer (found by ggml tests) Georgi Gerganov 2023-06-19 20:43:30 +0300
  • ba4e85a833
    llama : use aligned memory during ggml_init call from loading saved sessions (#1934) l3utterfly 2023-06-19 23:20:06 +0800
  • 23fc5c219a
    cmake : fix trailing whitespaces Georgi Gerganov 2023-06-19 18:18:34 +0300
  • cb40dfca69
    llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932) Kawrakow 2023-06-19 18:17:03 +0300
  • ca7c3f4da5
    cuda : faster k-quants on older GPUs (#1930) Kawrakow 2023-06-19 18:14:09 +0300
  • b97ca431db
    ggml : sync latest ggml repo (#1924) Georgi Gerganov 2023-06-19 18:12:33 +0300
  • 1e3abfcef0
    cmake : fix build shared ggml when CUDA is enabled (#1929) Howard Su 2023-06-19 23:10:37 +0800
  • 16b9cd1939
    Convert vector to f16 for dequantize mul mat vec (#1913) Johannes Gäßler 2023-06-19 10:23:56 +0200
  • b24c3049d9
    Added tokens per second to info prints (#1928) Johannes Gäßler 2023-06-18 17:41:26 +0200
  • 0ede372a51
    Fixed incorrectly applying RMS norm twice (#1925) Johannes Gäßler 2023-06-18 16:07:09 +0200
  • 8596af4277
    ggml : fix bug in ggml_compute_forward_add_q_f32 (#1918) l3utterfly 2023-06-18 19:19:16 +0800
  • e1886cf4fe
    readme : update Android build instructions (#1922) Mike 2023-06-18 16:28:26 +0800
  • 8ab8ba62eb
    llama : prevent usage of k-quants when tensor size is not a multiple of 256 (#1921) Kawrakow 2023-06-18 11:13:43 +0300
  • 90cc59d6ab
    examples : fix examples/metal (#1920) Kawrakow 2023-06-18 10:52:10 +0300
  • ce2c7d72e2
    metal : handle buffers larger than device's maxBufferLength (#1826) Georgi Gerganov 2023-06-18 09:09:47 +0300
  • 57cd69460f
    cmake : add CUDA_ARCHITECTURES to new target ggml_static (#1917) Howard Su 2023-06-18 12:29:47 +0800
  • b2416493ab
    make : do not print help for simple example Georgi Gerganov 2023-06-17 20:55:03 +0300
  • 4f9c43e3bd
    minor : warning fixes Georgi Gerganov 2023-06-17 20:24:11 +0300
  • 2c9380dd2f
    Only one CUDA stream per device for async compute (#1898) Johannes Gäßler 2023-06-17 19:15:02 +0200
  • 051e1b0e6a
    llama : fix kv_cache n init (close #1903) Georgi Gerganov 2023-06-17 19:30:22 +0300
  • 86c7571864
    make : update for latest Arch (#1701) DaniAndTheWeb 2023-06-17 18:17:22 +0200
  • 3d59ec5935
    ggml : fix warnings under MSVC (#1908) Howard Su 2023-06-17 23:46:15 +0800
  • 0711a5f6dc
    metal : add norm, cpy f16->f16, alibi kernels (#1823) Aaron Miller 2023-06-17 07:37:49 -0700
  • fc45a81bc6
    exposed modules so that they can be invoked by nix run github:ggerganov/llama.cpp#server etc (#1863) Faez Shakil 2023-06-17 17:13:05 +0500
  • 794db3e7b9
    Server Example Refactor and Improvements (#1570) Randall Fitzgerald 2023-06-17 07:53:04 -0400
  • 5ddf7ea1fb
    hooks : setting up flake8 and pre-commit hooks (#1681) Jiří Podivín 2023-06-17 12:32:48 +0200
  • bac19927c3
    readme : alternative way to build for Android with CLBlast. (#1828) Gustavo Rocha Dias 2023-06-17 06:01:06 -0300
  • b4c6f46f17
    Allow cmake to build ggml as a library (#1896) Kerfuffle 2023-06-17 01:49:42 -0600
  • 92f20d9942
    train : get raw text instead of page with html (#1905) David Yang 2023-06-17 14:51:54 +0800
  • d411968e99
    opencl : support k-quants (#1836) 0cc4m 2023-06-16 20:59:49 +0200
  • b41b4cad6f
    examples : add "simple" (#1840) SuperUserNameMan 2023-06-16 20:58:09 +0200
  • 13fe9d2d84
    cmake : add auto detection of BLAS_INCLUDE_DIRS (#1886) Zenix 2023-06-17 03:53:04 +0900
  • ac3b886953
    llama : fix embd when offloading non-repeating layers (#1891) Johannes Gäßler 2023-06-16 20:25:51 +0200
  • 5b9ccaf104
    Fixed possible macro redefinition (#1892) FrankHB 2023-06-17 02:25:01 +0800
  • 9cbf50c041
    build : fix and ignore MSVC warnings (#1889) Borislav Stanimirov 2023-06-16 21:23:53 +0300
  • 3d01122610
    CUDA : faster k-quant dot kernels (#1862) Kawrakow 2023-06-16 20:08:44 +0300
  • 602c748863
    gitignore : add several entries specific to Visual Studio (#1888) Borislav Stanimirov 2023-06-16 09:58:11 +0300