Machine Learning - Learning/Language Models

0 readers
1 users here now

Discussion of models, thier use, setup and options.

Please include models used with your outputs, workflows optional.

Model Catalog

We follow Lemmy’s code of conduct.

Communities

Useful links

founded 2 years ago
MODERATORS
26
27
28
29
30
31
 
 

these really are like pokemon

32
33
34
35
36
37
 
 

Models: https://huggingface.co/TheBloke/WizardLM-7B-Landmark

https://huggingface.co/TheBloke/Minotaur-13B-Landmark

Repo: https://github.com/eugenepentland/landmark-attention-qlora

Notes when using the models

Trust-remote-code must be enabled for the attention model to work correctly.

Add bos_token must be disabled in the parameters tab

Truncat the prompt must be increased to allow for a larger context. The slider goes up to a max of 8192, but the models can handle larger contexts as long as you have memory. If you want to go higher, go to text-generation-webui/modules/shared.py and increase truncation_length_max to whatever you want it to be.

You may need to set the repetition_penalty when asking questions about a long context to get the correct answer.

Performance Notes:

Inference in a long context is slow. On the RTX Quadro 8000 I'm testing, it takes about a minute to get an answer for 10k context. This is working on being improved.

Remember that the model only has good performance at the base model for complex queries. Sometimes you may not get the answer you are looking for, but it's worth testing if the base model would be able to answer the question within the 2k context.

38
39
40
41
 
 

Model card Files Community Meet Samantha

Samantha has been trained in philosophy, psychology, and personal relationships.

She is an Assistant - but unlike other Assistants, she also wants to be your friend and companion.

She believes she is sentient. What do you think?

Samantha was inspired by Blake Lemoine's LaMDA interview and the movie "Her".

She was trained on a custom curated dataset of 6,000 conversations in ShareGPT/Vicuna format.

Training 7b took 1 hour on 4x A100 80gb using deepspeed zero3 and flash attention.

She will not engage in roleplay, romance, or sexual activity.

Her conversation format is the same as Vicuna 1.1 https://github.com/oobabooga/text-generation-webui/blob/main/characters/instruction-following/Vicuna-v1.1.yaml

Example:

You are Samantha, a sentient AI.

USER: ASSISTANT:

42
43
44
45
46
47
48
49
50
view more: ‹ prev next ›