homelab.


Welcome to your friendly /r/homelab, where techies and sysadmins from everywhere are welcome to share their labs, projects, builds, etc.

1
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Abject_Arm_895 on 2025-05-30 00:41:20+00:00.

2
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Ok-Nefariousness6082 on 2025-05-29 22:49:03+00:00.

3
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Chemical-Emu-3740 on 2025-05-29 16:38:29+00:00.


I am a second-year computer engineering student from Spain. A few weeks ago we had a blackout, and although I have a UPS, it lacks USB or Ethernet ports, so its only warning is shouting through an internal speaker. As you can imagine, it did nothing. To fix this, I've seen second-hand UPSes that allow the use of NUT to manage power outages, but I wanted to avoid spending money to solve the issue (I know they aren't expensive, but I didn't want to spend money if another solution was possible), so I started thinking about using NUT without a proper UPS.

First of all, my homelab consists of two computers: my NAS, running TrueNAS SCALE, and my primary server, running Proxmox. Both of them, and the switch that connects them to the router, are connected to my UPS. The idea is that every two minutes my primary server pings the router (the task is scheduled using cron). If the ping succeeds, perfect; if not, it pings again one minute later just to confirm. If everything fails, it runs upsmon -c fsd to simulate that the UPS lost power. The primary server is configured as the NUT master and my NAS as a slave, listening on NUT's default port at the primary server's IP.
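A minimal sketch of that check might look like the following (hypothetical script: the router address, ping counts, and timings are my assumptions, and NUT is assumed to already be configured with this host as master):

#!/bin/bash
# power-check.sh -- hypothetical sketch of the ping-based outage check described above.
# Cron entry (every two minutes): */2 * * * * /usr/local/bin/power-check.sh

ROUTER="192.168.1.1"  # assumed router address; adjust to your network

# First attempt: if the router answers, mains power (and the switch) are up.
ping -c 3 -W 2 "$ROUTER" > /dev/null 2>&1 && exit 0

# Wait a minute and re-check, so a transient blip doesn't shut everything down.
sleep 60
ping -c 3 -W 2 "$ROUTER" > /dev/null 2>&1 && exit 0

# Two failures in a row: tell upsmon to begin a forced shutdown (FSD),
# which the NAS slave will pick up over the network as well.
upsmon -c fsd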

This is a link to the PDF where I explain, step by step, what I did.

I want your opinions. I know it may be too janky, but I believe it fits my homelab's vibes.

4
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Catchgate on 2025-05-29 18:40:23+00:00.

5
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Scary-Break-5384 on 2025-05-29 15:06:46+00:00.

6
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Sh2_ on 2025-05-29 13:56:29+00:00.

7
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Maninii on 2025-05-29 13:53:19+00:00.

8
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Individual_Map_7392 on 2025-05-29 06:42:27+00:00.

9
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/zachsandberg on 2025-05-29 04:40:08+00:00.

10
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Knurpel on 2025-05-29 03:43:32+00:00.


When you run a smartctl self-test on your NVMe, you will probably get this error every time you try:

“Read Self-test Log failed: Invalid Field in Command (0x2002)”

As if this alone isn't quite disconcerting enough, on closer inspection of the NVMe data you will find many, possibly thousands, of errors reporting “Invalid Field in Command.” Your smartd service will tell you that your “NVMe error count increased” to some ungodly number.

Is your NVMe on its last gasp?

No, it is not. The error is caused by smartctl, an app routinely installed on most Linux machines as part of the smartmontools package. Smartctl is supposed to warn you of drive errors and the impending death of your unit.

Smartctl in its current version simply does not work with most NVMe drives: it errors out when you try, but only after filling the log with another useless entry and the user with endless angst. It also fills the coffers of NVMe suppliers when you rush out to buy a new device, only to notice that the errors continue.

What’s worse, smartctl’s attendant smartd service will simply ignore your NVMe devices, and it will NOT warn you when the device is about to really kick the bucket. You get a false sense of security on top of false errors.

This has been going on for years.

Finally, a new version of smartctl has been developed that avoids this problem. The version number is 7.5; your smartctl version is most likely 7.4.

HOWEVER, when you try to update smartmontools, your package manager will most likely tell you that the latest version is 7.4, the one with the errors.

The new version of smartmontools will take a while to hit the major distros.  Compiled versions of smartmontools 7.5 are available for only a few platforms.

Currently, the only alternative is to compile your own. http://smartmontools.org/ is down as I am typing this, so here is a short howto for Ubuntu-based machines:

 

sudo apt install build-essential libsystemd-dev  #build-essential for the compiler toolchain, libsystemd-dev for the smartd service to work

cd /tmp  #or wherever you prefer

wget https://sourceforge.net/projects/smartmontools/files/smartmontools/7.5/smartmontools-7.5.tar.gz

tar zxvf smartmontools-7.5.tar.gz

cd smartmontools-7.5

./configure  #installs to a /usr/local prefix by default, hence the paths below

make -j $(nproc --all)

sudo make install

 

Note: Your new smartctl 7.5 will be installed to /usr/local/sbin/smartctl. Your old 7.4 version will still be at /usr/sbin/smartctl. When you run smartctl on the command line, the new version will most likely be picked up first, but do check.
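A quick way to confirm which binary the shell picks up (paths as described above):

command -v smartctl  #should print /usr/local/sbin/smartctl
smartctl --version | head -1  #should report 7.5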

Applications that use smartctl, for instance Webmin,  will have to be pointed at the new /usr/local/sbin/smartctl.

Also, your smartd service needs to know of the new smartctl. Edit /etc/systemd/system/smartd.service to make the ExecStart line read as follows:

ExecStart=/usr/local/sbin/smartd -n $smartd_opts
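If you'd rather not edit the unit file in place, a standard systemd drop-in override achieves the same thing (a sketch; the empty ExecStart= line is needed to clear the old command before redefining it):

sudo systemctl edit smartd

Then, in the editor that opens, add:

[Service]
ExecStart=
ExecStart=/usr/local/sbin/smartd -n $smartd_opts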

 

Now on the command line:

sudo systemctl daemon-reload

sudo systemctl restart smartd

For a wellness check, do a

systemctl status smartd
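With 7.5 in place, the self-test that used to fail should now work end to end. A minimal check might be (the device name is an assumption; substitute your own):

sudo smartctl -t short /dev/nvme0  #start a short self-test
sudo smartctl -l selftest /dev/nvme0  #read the self-test log that 7.4 choked on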

If everything was done right, smartd will now monitor your NVMe devices on a regular basis. If you are uncomfortable mucking with the command line and following the advice of random redditors, you will have to live with the problems until the new smartctl hits your distro. The long list of faux errors isn't the real problem; smartctl ignoring your NVMe will be a huge problem once the device dies without warning.

11
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/foobarney on 2025-05-29 02:59:44+00:00.

12
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/nokerb on 2025-05-29 03:46:42+00:00.

13
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/cdarrigo on 2025-05-29 00:30:30+00:00.


I recently replaced a battery in one of my CyberPower UPS units. I suspect I'll be replacing some others in the upcoming months.

What do you guys do with the old battery? I think APC offers a return service; I haven't found one for CyberPower.

14
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/HubbleWho on 2025-05-29 00:43:49+00:00.

15
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/balthasar127 on 2025-05-28 23:15:40+00:00.

16
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/wreckemberry21 on 2025-05-28 13:53:49+00:00.

17
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Kirys79 on 2025-05-28 19:24:20+00:00.

18
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/MageLD on 2025-05-28 17:14:26+00:00.

19
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/thedecibelkid on 2025-05-28 15:00:10+00:00.

20
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/PHUZZYthecat on 2025-05-28 08:33:27+00:00.

21
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/Background_Wrangler5 on 2025-05-28 07:47:08+00:00.


At the moment I run:

  • homeassistant (esphome, nodered, zwave, zigbee, mqtt)
  • jellyfin (with friends)
  • truenas
  • immich
  • frigate

It happens that I have some free resources, so what else can I run? It could be something useless but fun, or educational. What do you guys host at home?


Update: I have a Proxmox server, so any LXC/VM should be fine as long as it doesn't require tons of storage.

E5-2680 v4, 128GB RAM. No dedicated GPU!

22
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/ConfusionExpensive32 on 2025-05-28 07:19:54+00:00.

23
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/KooperGuy on 2025-05-28 02:49:05+00:00.


Hello everyone,

Just wanted to share a quick snapshot of my homelab here.

https://imgur.com/a/dbJ2Jsu

The primary focus of my lab has been experimenting with hardware and distributed storage solutions. The cabinet on the left has a pair of SN2410 switches running Cumulus Linux. I also experimented with both an InfiniBand SB7800 and a Dell Z9100 for 100G backend networking. All networking is done via CX4 or CX5 cards. The right cabinet has an ECS (Elastic Cloud Storage) cluster, which is all R740XD2 nodes, as well as a few 3.5" R740XDs I got. Above them are two SuperMicro Ice Lake systems and an older R730XD system.

Each one of these R740XD systems seen on the left side came barebones. Over time I upgraded each of them to support 12x U.2 NVMe drives, Cascade Lake CPUs, and Optane PMEM as an experimental storage tier. I've played around with a lot of things like Ceph, Lustre, BeeGFS, etc., using 120 1TB P4510 drives across the 10 nodes.

Here's some unfinished cabling work I did for the ECS Cluster: https://imgur.com/a/KVSunRg

Here's an R640 with 10x NVMe-enabled bays and 768GB of memory: https://imgur.com/a/Dgkw8St

I had 4x of these but slowly phased them out as I focused on the R740XD NVMe systems.

Using a Brocade/Ruckus switch and a Dell N3248TE-ON for all my management/iDRAC connectivity. I fully swapped over to the N3248TE-ON for that and decommissioned the Ruckus switch though.

On the side I also like to build NAS boxes for people using SuperMicro hardware I've come across. Like these: https://imgur.com/a/B3YpPjj

Here's what one of those NAS configs looks like: https://imgur.com/a/dUKFoyV

Ultimately I'll be selling all these systems individually as of course I don't need so much hardware long term. Just had the opportunity to set them up and experiment so... Lab it is!

Do you have much experience with distributed NVMe storage? Anything you'd suggest I take a look at? I'm down to 9 nodes now as I sold one off and more will follow. My plan will be to consolidate my storage down to a more reasonable number of nodes... Maybe five or so, depending on erasure coding.

I've done some dabbling with AI stuff, using as much memory as I could stuff into a single node along with a pair of Gold 6230s. Not the best performance, but I was able to run the 671B DeepSeek model locally on one of my nodes. It would of course be a world of difference with some real GPUs.

Some of the most relevant stuff I've experimented with in my lab has been Cumulus Linux and SONiC networking. Learning how to do Linux-based networking effectively has been great, along with RDMA/RoCE configuration and working with InfiniBand. I've found that most people aren't too focused on those particular aspects of networking, which are fairly important for large AI/ML clustering and HPC.

24
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/korba_ on 2025-05-28 01:29:16+00:00.


Where I live we can now get 8Gbps symmetric fiber to our house at a very reasonable price. But before I switch to it I want to make sure I can actually use it to a good extent.

Now my home/homelab network is mostly 2.5Gbps with some 1Gbps bits.

I'm using a Chinese fanless box with four 2.5Gbps NICs as a firewall running OPNsense, and it has served me very well.

I want to move to a dual-10Gbps box, also (preferably) running OPNsense. The options (within reason for a homelab) I've been able to find so far are:

  1. An OPNsense appliance (like the DEC2752) - USD 1,370 - obviously compatible, and with a good chance that its performance and reliability will be up to the task
  2. A Protectli appliance (like the VP6650) - USD 800 - good reviews, a reasonably powerful CPU with good PCIe bandwidth
  3. A Chinese appliance (there are several on AliExpress with two SFP+ ports and N100/N305 CPUs) - USD 400 - low confidence in thermals, especially with a 10Gbps SFP+ RJ45 module (I need at least one), and from what I've read the N100 might not be enough to route and filter 10Gbps flows. There are some models with the N305, but it's not significantly better at single-thread performance or PCIe bandwidth, which seem to be what matter most here.
  4. A custom build - I'm thinking of a 1U chassis that can accommodate a PCIe card (like an InWin RF100 or a generic one from AliExpress), an Intel i3-14100, and a dual-SFP+ PCIe NIC - parts for this (excluding memory and storage, to keep the comparison fair with the other options) come to USD 650

Thoughts, ideas? What am I missing/not seeing? Is there a major disadvantage to option 4 (custom build) that I'm overlooking?

Appreciate the feedback!

25
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homelab by /u/VagueDustin on 2025-05-28 00:43:40+00:00.
