Commit graph

  • d1d6fd9be5
    fix runtime crash aditya 2023-08-10 12:59:55 +0530
  • a9ff78b3f4
    resolve merge conflict aditya 2023-08-10 12:32:35 +0530
  • 916a9acdd0
    ggml-alloc: Don't try to re-use buffers of external tensors (#2562) Sam Spilsbury 2023-08-09 23:47:42 +0300
  • ea04a4ca19
    add log_callback to llama_context_params for custom logging. (#2234) grahameth 2023-08-09 22:46:40 +0200
  • 25d43e0eb5
    CUDA: tuned mul_mat_q kernels (#2546) Johannes Gäßler 2023-08-09 09:42:34 +0200
  • f5bfea0580
    Allow passing grammar to completion endpoint (#2532) Martin Krasser 2023-08-08 15:29:19 +0200
  • acfc5478ff
    CUDA: tighter VRAM scratch size for 65b/70b (#2551) Johannes Gäßler 2023-08-08 14:38:16 +0200
  • 7ed8d1fe7f
    llm.vim : multiline autocompletion, get rid of "^@" (#2543) chaihahaha 2023-08-08 20:07:02 +0800
  • e7f94d6fdc
    vim : bring back simple llm.vim example Georgi Gerganov 2023-08-08 15:05:30 +0300
  • 2d7baaf50f
    vim : streaming and more (#2495) AustinMroz 2023-08-08 06:44:48 -0500
  • f3c3b4b167
    Add --rope-scale parameter (#2544) klosax 2023-08-07 19:07:19 +0200
  • 93356bdb7a
    ggml : mul mat tweaks (#2372) Georgi Gerganov 2023-08-07 14:25:58 +0300
  • 60baff7c85
    ggml : pad result of ggml_nbytes() Georgi Gerganov 2023-08-07 14:24:42 +0300
  • 9082b5dfbf
    ggml : change params pointer (style change) (#2539) Georgi Gerganov 2023-08-07 13:55:18 +0300
  • 99d29c0094
    ggml : sync (custom ops) (#2537) Georgi Gerganov 2023-08-07 13:20:09 +0300
  • 3d9a551816
    Fixed mmap prefetch for GPU offloading (#2529) Johannes Gäßler 2023-08-07 10:09:40 +0200
  • f6f9896ac3
    metal : fix out-of-bounds access + inc concurrency nodes (#2416) Georgi Gerganov 2023-08-07 10:52:57 +0300
  • 34a14b28ff
    [Makefile] Move ARM CFLAGS before compilation (#2536) GiviMAD 2023-08-06 23:21:46 -0700
  • 7297128db8
    [Zig] Rewrite build for Zig 0.11 (#2514) Henri Vasserman 2023-08-07 08:35:53 +0300
  • 86c3219895
    console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) DannyDaemonic 2023-08-05 23:49:34 -0700
  • 2e8265ae17
    convert.py : add missing abstract methods for quantized data (#2491) Keiichi Tabata 2023-08-06 15:34:05 +0900
  • f514d1b306
    CUDA: faster k-quant mul_mat_q kernels (#2525) Johannes Gäßler 2023-08-05 18:20:44 +0200
  • 332311234a
    fix firefox autoscroll (#2519) Jonas Wunderlich 2023-08-04 20:16:11 +0000
  • 182af739c4
    server: regenerate completion.js.hpp (#2515) Cebtenzzre 2023-08-04 15:00:57 -0400
  • 4329d1acb0
    CUDA: use min compute capability of GPUs actually used (#2506) Cebtenzzre 2023-08-04 11:35:22 -0400
  • 02f9d96a86
    CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) Cebtenzzre 2023-08-04 11:34:32 -0400
  • 3498588e0f
    Add --simple-io option for subprocesses and break out console.h and cpp (#1558) DannyDaemonic 2023-08-04 08:20:12 -0700
  • 5f631c2679
    Fixing race condition in server and partial stream handling in frontend. (#2391) Stephen Nichols 2023-08-04 06:37:24 -0500
  • 415e99fec2
    Stream save llama context data to file instead of allocating entire buffer upfront (#2488) l3utterfly 2023-08-04 19:29:52 +0800
  • ff966e7ca6
    build : fix several cast and printf warnings (#2499) Borislav Stanimirov 2023-08-04 13:07:21 +0300
  • 8183159cf3
    examples : generate JSON according to schema (#1887) Evan Jones 2023-08-02 22:05:44 -0400
  • 468ea24fb4
    CUDA: faster non k-quant mul_mat_q kernels (#2483) Johannes Gäßler 2023-08-02 18:04:04 +0200
  • 4f6b60c776
    CUDA: Fix models with output size != 32000 (#2480) Johannes Gäßler 2023-08-02 16:48:10 +0200
  • 220d931864
    readme : add Aquila-7B model series to supported models (#2487) ldwang 2023-08-02 16:21:11 +0800
  • 81844fbcfd
    tests : Fix compilation warnings (Linux/GCC) (#2451) Eve 2023-08-02 04:06:19 -0400
  • a312193e18
    readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475) Yiming Cui 2023-08-02 14:18:31 +0800
  • c574bddb36
    fix a typo in examples/server/README.md (#2478) Bono Lv 2023-08-01 20:54:28 +0800
  • 86aeb27734
    server : Support dark mode (#2414) ebraminio 2023-08-01 01:56:23 -0700
  • 1873ff586b
    metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) Matteo Boschini 2023-08-01 09:43:12 +0200
  • 49e7cb5bb1
    CUDA: fixed LLAMA_FAST compilation option (#2473) Johannes Gäßler 2023-07-31 21:02:19 +0200
  • b772bba42e
    CUDA: fixed cmake F16 option (#2471) Johannes Gäßler 2023-07-31 19:52:22 +0200
  • 0728c5a8b9
    CUDA: mmq CLI option, fixed mmq build issues (#2453) Johannes Gäßler 2023-07-31 15:44:35 +0200
  • 1215ed7d5c
    CUDA: Implemented row flattening for non-glm RoPE (#2468) Johannes Gäßler 2023-07-31 14:32:30 +0200
  • 2dbf518911
    CUDA: fewer memory bank conflicts for mul_mat_q (#2458) Johannes Gäßler 2023-07-31 13:18:51 +0200
  • 9d2382b3e4
    Fix Metal backend broken from the allocator changes (#2455) slaren 2023-07-31 11:02:53 +0200
  • a113689571
    ggml : add graph tensor allocator (#2411) slaren 2023-07-30 15:58:01 +0200
  • 11f3ca06b8
    CUDA: Quantized matrix matrix multiplication (#2160) Johannes Gäßler 2023-07-29 23:04:44 +0200
  • 9baf9ef304
    CUDA: faster multi GPU synchronization (#2448) Johannes Gäßler 2023-07-29 23:04:10 +0200
  • 8a88e5855c
    perplexity : add Hellaswag calculation (#2389) klosax 2023-07-28 20:25:36 +0200
  • a9559bf77b
    ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405) Lee 2023-07-29 02:17:45 +0800
  • ee1b497c98
    llama : support more diverse tokenizers? (#2420) eric8607242 2023-07-29 02:10:05 +0800
  • d73b8d48b4
    examples : fix whitespace Georgi Gerganov 2023-07-28 21:05:08 +0300
  • 34ae1caf7f
    examples : server chat mode with llama2 (#2400) nhamanasu 2023-07-29 03:02:10 +0900
  • d91f3f0c55
    readme : fix the description of the Tail free sampling (TFS) method (#2431) Weird Constructor 2023-07-28 10:44:43 +0200
  • 65cdf34bdc
    llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) Rand Xie 2023-07-28 01:42:53 -0700
  • edcc7ae7d2
    Obtaining LLaMA 2 instructions (#2308) niansa/tuxifan 2023-07-28 03:14:11 +0200
  • 7c529cede6
    convert.py : Update to support 70B HF format model files (#2427) mj-shifu 2023-07-27 22:39:17 +0200
  • 1a941869cb
    metal : disable graph concurrency optimization due to bug (#2413) Georgi Gerganov 2023-07-27 11:00:54 +0300
  • b5472ea0ad
    ggml : fix assert in ggml_set_unary_op (#2410) slaren 2023-07-26 23:57:23 +0200
  • 6df1f5940f
    make : build with -Wmissing-prototypes (#2394) Cebtenzzre 2023-07-26 14:00:04 -0400
  • 5488fb789e
    ggml : allocate graphs in a context (#2392) slaren 2023-07-26 15:56:53 +0200
  • eb542d3932
    Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) Kawrakow 2023-07-25 18:35:53 +0300
  • 07aaa0f63f
    ggml : fix ggml_flash_attn to use op_params (#2387) slaren 2023-07-25 16:20:12 +0200
  • fce48caf9a
    convert.py : support bpe tokenizer (#2228) ldwang 2023-07-25 21:22:09 +0800
  • 875086bdb9
    ggml : relax contiguous constraints in activation function (#2371) Jiahao Li 2023-07-25 20:58:32 +0800
  • da1889834a
    ggml : improve graph build time via hash table lookup (#2329) slaren 2023-07-25 14:32:20 +0200
  • 82552b7f54
    build : fix line breaking error in build-info.sh (#2349) Hesen Peng 2023-07-25 05:24:09 -0700
  • 0c06204fb3
    main : add --in-prefix-bos to prefix BOS to user inputs; keep EOS (#2304) Xiao-Yong Jin 2023-07-25 07:19:11 -0500
  • 1fed755b1f
    ci : add non-AVX scalar build/test (#2356) Eve 2023-07-25 08:16:13 -0400
  • be2301bcda
    k_quants : add AVX support to dot functions with QK_K as 64 (#2339) katsu560 2023-07-25 21:13:41 +0900
  • 1aa18ef994
    metal : concurrently dispatch commands (#2358) Shouzheng Liu 2023-07-25 08:00:19 -0400
  • 9a08eaf3c4
    Another speed gain for Q4_0 and Q4_1 on Metal (#2375) Kawrakow 2023-07-25 13:48:29 +0300
  • 129d844c87
    Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359) Kawrakow 2023-07-25 13:48:04 +0300
  • d5512b782b
    server: add rms_norm_eps parameter (#2380) slaren 2023-07-25 11:36:17 +0200
  • c798308e3a
    [Server] Escape HTML in webchat (#2368) Henri Vasserman 2023-07-25 10:27:34 +0300
  • 41c674161f
    make rms_norm_eps a parameter (#2374) slaren 2023-07-24 17:57:12 +0200
  • b3f138d058
    Chat UI extras (#2366) Aarni Koskela 2023-07-24 17:54:22 +0300
  • 5b2b2dc6ae
    ggml : sync (unary ops refactor, static-correctness) (#2370) Georgi Gerganov 2023-07-24 14:46:21 +0300
  • 42f70cb2f6
    Fix scalar version of Q5_K when QK_K = 64 (#2362) Kawrakow 2023-07-24 12:55:02 +0300
  • 84e09a7d8b
    llama : add grammar-based sampling (#1773) Evan Jones 2023-07-23 23:58:10 -0400
  • 2f9cf974a0
    Some more Q4_K and Q5_K speedup on CUDA (#2346) Kawrakow 2023-07-24 00:19:47 +0300
  • 4f06592cc6
    Add gqa parameter support to the server (#2351) IgnacioFDM 2023-07-23 17:31:17 -0300
  • 70d26ac388
    Fix __dp4a documentation (#2348) Johannes Gäßler 2023-07-23 17:49:06 +0200
  • 57921ca6db
    common : n_threads == -1 uses std::thread::hardware_concurrency() (#2347) wzy 2023-07-23 21:33:02 +0800
  • 3602ac4255
    fix n_tasks (#2342) slaren 2023-07-23 15:19:39 +0200
  • 95a6c595e7
    ggml: move op parameters from tensors to ggml_tensor::op_params (#2333) slaren 2023-07-23 14:36:02 +0200
  • e76d630df1
    llama : grouped-query attention + LLaMAv2 70B support (#2276) Georgi Gerganov 2023-07-23 15:09:47 +0300
  • 1d0824b247
    llama : print help to stdout (#2338) maddes8cht 2023-07-23 13:59:48 +0200
  • bc3ec2cdc9
    flake : support nix build '.#opencl' (#2337) wzy 2023-07-23 19:57:02 +0800
  • a940458e48
    llama : print max tensor size to stderr (#2336) Christian Demsar 2023-07-23 07:56:34 -0400
  • 91171b8072
    make : fix CLBLAST compile support in FreeBSD (#2331) Jose Maldonado 2023-07-23 07:52:08 -0400
  • 355c80f49e
    examples : simplify vim plugin (#2327) AustinMroz 2023-07-23 06:16:48 -0500
  • 83a00ce69b
    metal : support bcast add & dup & cont op (#2323) Jiahao Li 2023-07-23 19:00:37 +0800
  • d2a43664f9
    Speed up Q4_K (#2322) Kawrakow 2023-07-23 08:49:20 +0300
  • b9b7d94fc1
    CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313) Johannes Gäßler 2023-07-22 21:27:34 +0200
  • b47b8a9cfe
    llama : optimize memory buffers (#2325) Georgi Gerganov 2023-07-22 21:17:57 +0300
  • b5fe67f8c6
    Perplexity: Compute scores correlated to HellaSwag (#2312) klosax 2023-07-22 14:21:24 +0200
  • 24baa54ac1
    examples : basic VIM plugin whoreson 2023-07-22 12:34:51 +0200
  • dd6c67d3cb
    ci : fix args Georgi Gerganov 2023-07-22 12:00:56 +0300
  • 5d500e8ccf
    ci : add 7B CUDA tests (#2319) Georgi Gerganov 2023-07-22 11:48:22 +0300