Fine-tuning Mistral Small

Xingchen Yu Feb 15, 2025

I want to write a small follow-up to my last post, which used Phi-4 as the base model. I decided to switch to the newly released Mistral-Small instead because I quite like it. While working on it, I discovered some more quirks that I believe are worth discussing. I know in my last post, I mentioned that I would discuss how to use a fine-tuned model with RAG. That will be a follow-up to this post, and it will use the Mistral-Small-based fine-tuning discussed in this post. So please stay tuned.

Fine-tuning LLM on AMD GPU

Xingchen Yu Jan 31, 2025

I’ve been fascinated by open-source LLMs and have been running them locally. I like to maintain full control of the ML models I run instead of relying on the cloud, simply because it’s more fun that way. From my previous posts, you may know that I use an AMD GPU on my Arch Linux (btw), so I will continue this trend of struggling to get things working on my AMD GPU. My most recent project is an attempt to recreate the character Frieren from Frieren: Beyond Journey’s End. Why Frieren in particular? That’s because none of the open-source LLMs seem to be aware of this series at all. So any new behavior can be attributed to what I did instead of the base model’s knowledge. Furthermore, Frieren has become a very successful series internationally, which makes it somewhat easier to collect a lot of materials from the internet for training purposes. I’ve also binge-read its manga, so I can validate the model’s correctness. To accomplish this, I have two milestones:

Train Frieren’s style of speech with LoRA fine-tuning.
Add world knowledge to the fine-tuned model using a RAG.

I’ve decided to break it down into two separate posts since they cover different techniques and many quirks. This post is focused on the first part, fine-tuning for style.

Frieren

Training Stable Diffusion LoRA with Kohya on AMD GPU

Xingchen Yu Mar 31, 2023

Since my last post, a lot has changed. So instead of adding updates to my previous post, I figured I could write a follow-up instead. A lot of the quirks with sd_dreambooth_extension that I mentioned last time have been fixed. It is now able to create a standalone LoRA on its own without the hacks that I mentioned. However, I also want to give kohya_ss another try and see if I can get it to work this time. Again, our main challenge here is to get it to work with an AMD GPU. Recall that last time I couldn’t get it to work because of a couple of issues: it had tons of hard-coded Windows path separators, which made it difficult to run on Linux, where PyTorch’s ROCm build is available, and I couldn’t get TensorFlow to work on an AMD GPU. Things have certainly changed a lot in just a month or so. The good news is I managed to get it to work on Linux while running on an AMD GPU. So I’d like to share my setup and some scripts that I wrote for myself.

Training Stable Diffusion Concept with LoRA on AMD GPU

Xingchen Yu Feb 1, 2023

Since Stable Diffusion became publicly available, I spent quite some time playing with it using stable-diffusion-webui. I downloaded a number of different models to play with and had a lot of fun while at it. However, it quickly became apparent that a model has its limits. It can only generate what it knows. The online community is extremely active in improving existing models by adding new content using DreamBooth, or mixing multiple models into cocktails of models. I was fascinated and wanted to add my own content on top of Stable Diffusion models. In my many attempts, I had varying degrees of success. So I want to share what I learned from my experience. More specifically, my quest to train a concept with LoRA on my AMD Radeon RX 6700 XT, which posed some unique challenges that I don’t believe are being discussed enough.

Mars Selfie Robot Woman Playing with Penguin

Getting the Right Time Zone in Python

Xingchen Yu Mar 18, 2022

In light of the recent news that most of the US will stick with daylight saving time starting 2023, I’d like to revisit how error-prone dealing with time zones can be in software. Even Python, which is supposed to be one of the more newbie-friendly and intuitive languages out there, does a rather confusing job at it. I hope to write this post as more of a code reference for how to deal with time zones in Python, with a more humanly readable explanation instead of pages of cryptic API documentation. In my professional experience, I have dealt with countless time zone and daylight saving related bugs literally every year. There’s always something that can go wrong. Being on-call during a daylight saving change is always a time.
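As a small illustration of the kind of daylight saving bug mentioned above, here is a minimal sketch. It assumes Python 3.9+ with the standard `zoneinfo` module and system time zone data available; the zone name and the chosen date are only examples and are not taken from the post itself.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Example zone (an assumption for illustration): US Pacific time.
pacific = ZoneInfo("America/Los_Angeles")

# One minute before the 2022 spring-forward transition (02:00 local, Mar 13).
before = datetime(2022, 3, 13, 1, 59, tzinfo=pacific)

# Adding a timedelta to an aware datetime is wall-clock arithmetic:
# the result is 02:59 local, a time that was skipped on that day.
wall_clock = before + timedelta(hours=1)

# Doing the arithmetic in UTC and converting back yields the instant
# that is really one hour later: 03:59 local, now on daylight time.
elapsed = (before.astimezone(timezone.utc) + timedelta(hours=1)).astimezone(pacific)

print(wall_clock.isoformat())
print(elapsed.isoformat())
```

The design choice here is to treat UTC as the only safe place to do duration arithmetic and to convert to a local zone only for display.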

Oculus Quest 2 Wireless PC VR Setup

Xingchen Yu May 28, 2021

I consider myself a moderate VR enthusiast. I’ve been an owner of an Oculus Rift for a couple of years, and recently I got an Oculus Quest 2. It is an interesting device, which is completely standalone with inside-out tracking. You can put it on and play anywhere you’d like without a gaming PC or base stations. It even has some features, like hand tracking, that are quite impressive. That being said, I found its ability to wirelessly stream games from a PC to be the most intriguing, because I purchased most of my games on Steam! In this post, let’s explore a few different ways to play PC VR games wirelessly on the Quest 2 and their quirks. In particular, I’ll look at using a WiFi 6 hotspot to minimize network latency.

Google Earth BeatSaber Gorn

Building a Compact PC

Xingchen Yu Sep 28, 2019

I build PCs as a hobby, though I don’t go for the best and latest. I usually like to set a challenge for myself and try to accomplish that goal cost-effectively. I started with something really simple and basic, then incrementally upgraded my build piecemeal over time. So I always have something to do without creating a lot of waste or blowing away all of my income. Inevitably, I eventually replaced enough parts that the entire v1 of my build was sitting in my office as spare parts. It raises the question posed by the Ship of Theseus, but that is not the point of discussion for this blog entry. Since I had enough spare parts to build a fully functional PC, I decided to reuse as many parts as possible to build a PC for my mom. Unfortunately, my mom has to take a flight to visit us, and she doesn’t want to bring checked luggage just for my PC. Fortunately, according to the TSA, it is possible to place a desktop computer in carry-on luggage, as long as you take it out for inspection like a laptop. So for this project, I challenged myself to build the tiniest possible PC with generic, full desktop-class components. This blog entry is not meant to serve as a build guide, but rather to discuss some quirks that I did not or simply could not have anticipated ahead of time. The fact that everything fit together at all was quite a miracle in retrospect.

Front yard Back side

My Take on Reason Native

Xingchen Yu Jun 1, 2019

I recently became interested in Reason, which is an alternative syntax for OCaml meant to make JavaScript folks more comfortable writing in a “mostly pure” functional language. Judging by its own site, they seem to put a lot of focus on developing Web front ends in conjunction with React. As someone who works professionally with Haskell and JavaScript (among other things), but has no prior knowledge of OCaml, I feel I’m a perfect candidate to dive into Reason and get a feel for it. However, I did not use it to develop a web app; instead, I chose to use it for native development. I feel this is the best way to consume the language itself instead of being heavily influenced by React and JavaScript interop. So I wrote a personal Raspberry Pi project with Reason Native. Just to clarify, this post is meant to be an opinion piece rather than an in-depth review of Reason as a language.

Failure is Inevitable

Xingchen Yu Apr 21, 2019

Anything that can go wrong will go wrong. We live in a world where system failures are inevitable. This is particularly true for highly complex systems, which have numerous points of failure and may carry significant importance. How do we keep systems available and stable? How do we minimize and mitigate failures?

Troubles with device-width

Xingchen Yu Nov 25, 2013

Mobile web developers are probably aware of what media queries are. They are extremely useful tools to select resources as well as provide layout tweaks for responsive web designs. One of the well-known media query features is device-width, along with its variations (min-device-width, max-device-width). However, I have discovered that it has vastly inconsistent behaviours on different mobile browsers. In this article, I’d like to discuss what exactly device-width means, its issues, and their solutions.