Commit graph

  • 23c7c6fc91
    Update Makefile: clean simple (#2097) ZhouYuChen 2023-07-04 20:15:16 +0800
  • 698efad5fb
    CI: make the brew update temporarily optional. (#2092) Erik Scholz 2023-07-04 01:50:12 +0200
  • 14a2cc71f6
    [ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) Govlzkoy 2023-07-04 07:50:00 +0800
  • 1cf14ccef1
    fix server crashes (#2076) Henri Vasserman 2023-07-04 00:05:23 +0300
  • cc45a7feb8
    Fix crash of test-tokenizer-0 under Debug build (#2064) Howard Su 2023-07-04 02:43:55 +0800
  • 55dbb915cc
    [llama] No need to check file version when loading vocab score (#2079) Howard Su 2023-07-03 19:58:58 +0800
  • d7d2e6a0f0
    server: add option to output probabilities for completion (#1962) WangHaoranRobin 2023-07-03 05:38:44 +0800
  • 46088f7231
    ggml : fix build with OpenBLAS (close #2066) Georgi Gerganov 2023-07-02 09:46:46 +0300
  • 0bc2cdfc87
    Better CUDA synchronization logic (#2057) Johannes Gäßler 2023-07-01 21:49:44 +0200
  • befb3a3562
    Test-based VRAM scratch size + context adjustment (#2056) Johannes Gäßler 2023-07-01 21:47:26 +0200
  • b213227067
    cmake : don't force -mcpu=native on aarch64 (#2063) Daniel Drake 2023-07-01 20:31:44 +0200
  • 2f8cd979ec
    metal : release buffers when freeing metal context (#2062) Aaron Miller 2023-07-01 11:14:59 -0700
  • 471aab6e4c
    convert : add support of baichuan-7b (#2055) Judd 2023-07-02 01:00:25 +0800
  • 463f2f4c4f
    llama : fix return value of llama_load_session_file_internal (#2022) Georgi Gerganov 2023-07-01 19:05:09 +0300
  • cb44dbc7de
    llama : catch llama_load_session_file_internal exceptions (#2022) Rand Xie 2023-07-02 00:02:58 +0800
  • 79f634a19d
    embd-input : fix returning ptr to temporary Georgi Gerganov 2023-07-01 18:46:00 +0300
  • 04606a1599
    train : fix compile warning Georgi Gerganov 2023-07-01 18:45:44 +0300
  • b1ca8f36a9
    ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) Qingyou Meng 2023-07-01 23:42:43 +0800
  • b8c8dda75f
    Use unsigned for random seed (#2006) Howard Su 2023-06-29 21:15:15 +0800
  • 96a712ca1b
    Porting the improved K-Quant CUDA kernels to OpenCL (#1966) LostRuins 2023-06-29 11:56:43 +0800
  • d3494bb86b
    llama : replacing auto &kv with const auto &kv (#2041) m3ndax 2023-06-28 20:39:08 +0200
  • 5b351e94d0
    cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) Salvador E. Tropea 2023-06-28 14:27:31 -0300
  • 6432aabb6d
    cuda : fix missing const qualifier in casts (#2027) Salvador E. Tropea 2023-06-28 14:26:26 -0300
  • b922bc351b
    llama : remove shards weight file support (#2000) Howard Su 2023-06-28 10:13:02 -0700
  • 7f9753fa12
    CUDA GPU acceleration for LoRAs + f16 models (#1970) Johannes Gäßler 2023-06-28 18:35:54 +0200
  • cfa0750bc9
    llama : support input embeddings directly (#1910) ningshanwutuobang 2023-06-28 23:53:37 +0800
  • 9d23589d63
    fix pthreads setaffinity usage on android (#2020) Erik Scholz 2023-06-27 19:06:33 +0200
  • 0be54f75a6
    baby-llama : fix build after ggml_rope change (#2016) Howard Su 2023-06-27 13:07:13 +0800
  • 181e8d9755
    llama : fix rope usage after ChatGLM change Georgi Gerganov 2023-06-27 00:37:13 +0300
  • d9779021bd
    ggml : add support for ChatGLM RoPE Georgi Gerganov 2023-06-27 00:06:51 +0300
  • d38e451578
    readme : add Scala 3 bindings repo (#2010) Roman Parykin 2023-06-26 22:47:59 +0300
  • eaa6ca5a61
    ggml : increase max tensor name + clean up compiler warnings in train-text (#1988) David Yang 2023-06-27 03:45:32 +0800
  • aa777abbb7
    readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007) Gustavo Rocha Dias 2023-06-26 16:34:45 -0300
  • c824d2e368
    ggml : avoid conv 2d kernel round up Georgi Gerganov 2023-06-26 21:03:59 +0300
  • b853d45601
    ggml : add NUMA support (#1556) zrm 2023-06-26 13:57:59 -0400
  • 9225baef71
    k-quants : fix indentation Georgi Gerganov 2023-06-26 20:10:52 +0300
  • a84ab1da8d
    tests : fix quantize perf (#1990) katsu560 2023-06-27 01:47:02 +0900
  • 5743ca8092
    k-quants : add AVX support to dot functions (#1916) katsu560 2023-06-27 01:46:07 +0900
  • 412c60e473
    readme : add link to new k-quants for visibility Georgi Gerganov 2023-06-26 19:45:09 +0300
  • 6769e944c7
    k-quants : support for super-block size of 64 (#2001) Kawrakow 2023-06-26 19:43:07 +0300
  • cbebf61ca7
    Fix assert when free invalid cuda pointer (#2005) Howard Su 2023-06-26 23:15:47 +0800
  • 447ccbe8c3
    readme : add new roadmap + manifesto Georgi Gerganov 2023-06-25 16:08:12 +0300
  • bd34cdde38
    ggml : sync latest ggml (custom operators) Georgi Gerganov 2023-06-25 14:25:08 +0300
  • c2a08f87b8
    fix server sampling: top k sampler first (#1977) anon998 2023-06-25 08:48:36 +0000
  • 66a2555ba6
    readme : add Azure CI discussion link Georgi Gerganov 2023-06-25 09:07:03 +0300
  • e65ca7e14a
    zig : upgrade build system support (#1981) sjinzh 2023-06-25 13:45:44 +0800
  • 5ec8dd5a3c
    #1869 Fix null reference errors when training from scratch with CUDA (#1907) Robyn 2023-06-25 04:10:29 +1000
  • 65bdd52a86
    tests : sync test-grad0 from ggml Georgi Gerganov 2023-06-24 19:40:18 +0300
  • fdd1860911
    flake : fix ggml-metal.metal path and run nixfmt (#1974) Rowan Hart 2023-06-24 04:07:08 -0700
  • c943d823c1
    convert : fix invalid params in write_vocab_only (#1975) AN Long 2023-06-24 19:02:06 +0800
  • f2c754e1c3
    ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978) slaren 2023-06-24 12:57:18 +0200
  • 11da1a85cd
    readme : fix whitespaces Georgi Gerganov 2023-06-24 13:38:18 +0300
  • 235b610d65
    readme : fixed termux instructions (#1973) Alberto 2023-06-24 12:32:13 +0200
  • b061ba9e2a
    llama : fix top-p sampling to match the canonical definition (#1953) Alex Renda 2023-06-24 03:15:01 -0700
  • 527b6fba1d
    llama : make model stateless and context stateful (llama_state) (#1797) Didzis Gosko 2023-06-24 11:47:58 +0300
  • d7b7484f74
    Add OpenLLaMA instructions to the README (#1954) eiery 2023-06-23 04:38:01 -0400
  • 7487137227
    rework convert.py to read hyper-parameters from config.json (#1958) Erik Scholz 2023-06-22 14:20:47 +0200
  • bbca06e269
    cmake: revert CUDA arch default to 52, 61 if f16 (#1959) Johannes Gäßler 2023-06-21 23:49:25 +0200
  • fb98254f99
    Fix typo in README.md (#1961) Rahul Vivek Nair 2023-06-22 03:18:43 +0530
  • 049aa16b8c
    readme : add link to p1 Georgi Gerganov 2023-06-20 19:05:54 +0300
  • 2322ec223a
    Fix typo (#1949) Xiake Sun 2023-06-20 05:42:40 -0700
  • aacdbd4056
    llama : fix params struct alignment (#1936) Ettore Di Giacinto 2023-06-20 03:24:39 +0200
  • 20568fe60f
    [Fix] Reenable server embedding endpoint (#1937) Henri Vasserman 2023-06-20 01:12:39 +0300
  • 18b35625c3
    ggml : fix bug in LBFGS optimizer (found by ggml tests) Georgi Gerganov 2023-06-19 20:43:30 +0300
  • ba4e85a833
    llama : use aligned memory during ggml_init call from loading saved sessions (#1934) l3utterfly 2023-06-19 23:20:06 +0800
  • 23fc5c219a
    cmake : fix trailing whitespaces Georgi Gerganov 2023-06-19 18:18:34 +0300
  • cb40dfca69
    llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932) Kawrakow 2023-06-19 18:17:03 +0300
  • ca7c3f4da5
    cuda : faster k-quants on older GPUs (#1930) Kawrakow 2023-06-19 18:14:09 +0300
  • b97ca431db
    ggml : sync latest ggml repo (#1924) Georgi Gerganov 2023-06-19 18:12:33 +0300
  • 1e3abfcef0
    cmake : fix build shared ggml when CUDA is enabled (#1929) Howard Su 2023-06-19 23:10:37 +0800
  • 16b9cd1939
    Convert vector to f16 for dequantize mul mat vec (#1913) Johannes Gäßler 2023-06-19 10:23:56 +0200
  • b24c3049d9
    Added tokens per second to info prints (#1928) Johannes Gäßler 2023-06-18 17:41:26 +0200
  • 0ede372a51
    Fixed incorrectly applying RMS norm twice (#1925) Johannes Gäßler 2023-06-18 16:07:09 +0200
  • 8596af4277
    ggml : fix bug in ggml_compute_forward_add_q_f32 (#1918) l3utterfly 2023-06-18 19:19:16 +0800
  • e1886cf4fe
    readme : update Android build instructions (#1922) Mike 2023-06-18 16:28:26 +0800
  • 8ab8ba62eb
    llama : prevent usage of k-quants when tensor size is not a multiple of 256 (#1921) Kawrakow 2023-06-18 11:13:43 +0300
  • 90cc59d6ab
    examples : fix examples/metal (#1920) Kawrakow 2023-06-18 10:52:10 +0300
  • ce2c7d72e2
    metal : handle buffers larger than device's maxBufferLength (#1826) Georgi Gerganov 2023-06-18 09:09:47 +0300
  • 57cd69460f
    cmake : add CUDA_ARCHITECTURES to new target ggml_static (#1917) Howard Su 2023-06-18 12:29:47 +0800
  • b2416493ab
    make : do not print help for simple example Georgi Gerganov 2023-06-17 20:55:03 +0300
  • 4f9c43e3bd
    minor : warning fixes Georgi Gerganov 2023-06-17 20:24:11 +0300
  • 2c9380dd2f
    Only one CUDA stream per device for async compute (#1898) Johannes Gäßler 2023-06-17 19:15:02 +0200
  • 051e1b0e6a
    llama : fix kv_cache n init (close #1903) Georgi Gerganov 2023-06-17 19:30:22 +0300
  • 86c7571864
    make : update for latest Arch (#1701) DaniAndTheWeb 2023-06-17 18:17:22 +0200
  • 3d59ec5935
    ggml : fix warnings under MSVC (#1908) Howard Su 2023-06-17 23:46:15 +0800
  • 0711a5f6dc
    metal : add norm, cpy f16->f16, alibi kernels (#1823) Aaron Miller 2023-06-17 07:37:49 -0700
  • fc45a81bc6
    exposed modules so that they can be invoked by nix run github:ggerganov/llama.cpp#server etc (#1863) Faez Shakil 2023-06-17 17:13:05 +0500
  • 794db3e7b9
    Server Example Refactor and Improvements (#1570) Randall Fitzgerald 2023-06-17 07:53:04 -0400
  • 5ddf7ea1fb
    hooks : setting up flake8 and pre-commit hooks (#1681) Jiří Podivín 2023-06-17 12:32:48 +0200
  • bac19927c3
    readme : alternative way to build for Android with CLBlast. (#1828) Gustavo Rocha Dias 2023-06-17 06:01:06 -0300
  • b4c6f46f17
    Allow cmake to build ggml as a library (#1896) Kerfuffle 2023-06-17 01:49:42 -0600
  • 92f20d9942
    train : get raw text instead of page with html (#1905) David Yang 2023-06-17 14:51:54 +0800
  • d411968e99
    opencl : support k-quants (#1836) 0cc4m 2023-06-16 20:59:49 +0200
  • b41b4cad6f
    examples : add "simple" (#1840) SuperUserNameMan 2023-06-16 20:58:09 +0200
  • 13fe9d2d84
    cmake : add auto detection of BLAS_INCLUDE_DIRS (#1886) Zenix 2023-06-17 03:53:04 +0900
  • ac3b886953
    llama : fix embd when offloading non-repeating layers (#1891) Johannes Gäßler 2023-06-16 20:25:51 +0200
  • 5b9ccaf104
    Fixed possible macro redefinition (#1892) FrankHB 2023-06-17 02:25:01 +0800
  • 9cbf50c041
    build : fix and ignore MSVC warnings (#1889) Borislav Stanimirov 2023-06-16 21:23:53 +0300
  • 3d01122610
    CUDA : faster k-quant dot kernels (#1862) Kawrakow 2023-06-16 20:08:44 +0300
  • 602c748863
    gitignore : add several entries specific to Visual Studio (#1888) Borislav Stanimirov 2023-06-16 09:58:11 +0300