RagingHungryPanda

joined 2 years ago
[–] RagingHungryPanda@lemm.ee 1 points 16 hours ago

Thanks, I'll check that out

[–] RagingHungryPanda@lemm.ee 4 points 1 day ago

I enjoyed this read. It's short, but it's a look into someone totally different from just about anyone else in his position.

[–] RagingHungryPanda@lemm.ee 8 points 1 day ago

I saw a joke where someone in Germany said they arrived too late for the 7:30am train, but were just in time for the 6:30am train. It's like a meme how late they are.

[–] RagingHungryPanda@lemm.ee 1 points 1 day ago* (last edited 1 day ago) (2 children)

For the life of me, I'm not seeing where to add a tag or a label. I checked in 3 different UIs, including the main one.

[–] RagingHungryPanda@lemm.ee 3 points 1 day ago

I've wondered whether there should be Lemmy, Pixelfed, and maybe Mastodon instances for local cities.

[–] RagingHungryPanda@lemm.ee 2 points 1 day ago

I've been saving all of these today. Thanks a bunch!

[–] RagingHungryPanda@lemm.ee 20 points 1 day ago (4 children)

I wish we had 5 minute headways haha.

[–] RagingHungryPanda@lemm.ee 1 points 3 days ago

Thanks for giving it a good read-through! If you move to NVMe SSDs, you may find some of your problems just go away. The difference can be insane.

I was reading something recently about database and disk layouts meant for transactional business applications versus ones meant for reporting, and one difference was how they're laid out on disk: by row versus by column.
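Loosely, in C# terms (a toy illustration of my own, not from what I read):

```csharp
// Row-oriented: each record's fields live together on disk; good for
// transactional apps that read/write whole records at a time (OLTP).
sealed record Order(long Id, decimal Amount, DateTime PlacedAt);

var rows = new[]
{
    new Order(1, 19.99m, DateTime.UtcNow),
    new Order(2, 5.49m, DateTime.UtcNow),
};

// Column-oriented: each field's values live together; good for reporting
// queries that scan one column across millions of rows (OLAP).
long[] ids = { 1, 2 };
decimal[] amounts = { 19.99m, 5.49m };

// Aggregating one column only has to touch that column's storage:
var total = amounts.Sum(); // System.Linq
```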

[–] RagingHungryPanda@lemm.ee 1 points 3 days ago (2 children)

That was a bit of a hasty write, so there are probably some issues with it, but that's the gist.

[–] RagingHungryPanda@lemm.ee 1 points 3 days ago (5 children)

Yes? Maybe, depending on what you mean.

Let's say you're doing a job that involves reading 1M records or something. Pagination means you grab N records at a time, say 1,000, across multiple queries rather than all at once.

Reading your post again to try and get context, it looks like you're identifying duplicates as part of a job.

I don't know what you're using to determine a duplicate, whether it's structural or not, but since you're running on HDDs, it might be faster to get that information into RAM, then do the job in batches and update in batches. This also lets you write to the DB while doing CPU processing.

BTW, your hard disks are going to be your bottleneck unless you're reaching out over the internet, so your best bet is to move that data onto an NVMe SSD. That'll blow any other suggestion I have out of the water.

BUT! There are ways to help things out. I don't know what language you're working in; I'm a dotnet dev, so I can answer some things from that perspective.

A couple of things you may want to do, especially if there's other traffic on this server (see the sketch after this list):

  • use WITH (NOLOCK) so that you're not blocking other reads and writes on the tables you're looking at
  • use pagination, either with windowing or LIMIT/OFFSET (OFFSET ... FETCH on SQL Server, or Skip/Take in LINQ), to grab only a certain number of records at a time
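
A minimal sketch of what that batch read might look like from C# against SQL Server; `dbo.Records`, its columns, `YourType`, and `connectionString` are all placeholders I'm making up:

```csharp
using Microsoft.Data.SqlClient;

// Hypothetical batch reader: pages through dbo.Records without holding shared locks.
async Task<List<YourType>> ReadBatchFromDb(int offset, int limit)
{
    const string sql = @"
        SELECT Id, Payload
        FROM dbo.Records WITH (NOLOCK)  -- dirty reads possible, but no blocking
        ORDER BY Id
        OFFSET @offset ROWS FETCH NEXT @limit ROWS ONLY;";

    var batch = new List<YourType>();
    await using var conn = new SqlConnection(connectionString); // placeholder connection string
    await conn.OpenAsync();
    await using var cmd = new SqlCommand(sql, conn);
    cmd.Parameters.AddWithValue("@offset", offset);
    cmd.Parameters.AddWithValue("@limit", limit);
    await using var reader = await cmd.ExecuteReaderAsync();
    while (await reader.ReadAsync())
        batch.Add(new YourType(reader.GetInt64(0), reader.GetString(1)));
    return batch;
}
```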

Use a HashSet (record types work well here) or some other method of equality that's property-based. Most Dictionary/HashSet types can take some kind of equality comparer.
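
For example, a minimal sketch (`YourType` is a made-up stand-in for whatever your records look like):

```csharp
// Records get property-based value equality for free; for plain classes,
// an IEqualityComparer<T> like this does the same job for HashSet/Dictionary.
sealed record YourType(long Id, string Payload);

sealed class YourTypeEqualityComparer : IEqualityComparer<YourType>
{
    public bool Equals(YourType? a, YourType? b) =>
        a is not null && b is not null && a.Id == b.Id && a.Payload == b.Payload;

    public int GetHashCode(YourType obj) => HashCode.Combine(obj.Id, obj.Payload);
}
```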

So, what you can do is asynchronously read from the disk into memory and start some kind of processing job. If that job doesn't also require the disk, you can kick off another read while you're processing. Just don't do a write and a read at the same time, since you're on HDDs.

This might look something like:

```csharp
var offset = 0;
const int limit = 1000;

// kick off the first read
var readTask = ReadBatchFromDb(offset, limit);
var batch = await readTask;

// everything we've seen so far, compared by value rather than by reference
// (if you only care about equality and not the data afterwards, you can store just the hash codes)
var seen = new HashSet<YourType>(new YourTypeEqualityComparer());

while (batch.Count > 0)
{
    offset += limit;
    readTask = ReadBatchFromDb(offset, limit); // start a new read batch right away

    var toWork = batch.Where(r => !seen.Contains(r)).ToList(); // don't rework any objects
    seen.UnionWith(batch);

    var toWrite = DoYourThing(toWork); // CPU work happens while the next read is in flight

    batch = await readTask;   // finish the read first: don't write while reading
    await WriteToDb(toWrite); // to not read and write at once (there's a lost optimization here: no CPU work during the write)
}
```



Or, let's say you set up a read/write job queue to keep things busy:

```csharp
// a job is either a pending read or a pending write
abstract class IoJob
{
    public sealed class ReadJob(int offset, int limit) : IoJob
    {
        public int Offset { get; } = offset;
        public int Limit { get; } = limit;
        public Task<List<YourType>>? Task { get; set; }
    }

    public sealed class WriteJob(List<YourType> data) : IoJob
    {
        public List<YourType> Data { get; } = data;
        public Task? Task { get; set; }
    }
}

// starts the job's IO if it isn't already running (??= makes a second call a no-op)
void ExecuteJob(IoJob job)
{
    switch (job)
    {
        case IoJob.ReadJob rj:
            rj.Task ??= ReadBatchFromDb(rj.Offset, rj.Limit); // assigns the read task to the job
            break;
        case IoJob.WriteJob wj:
            wj.Task ??= WriteToDb(wj.Data);
            break;
    }
}

var seen = new HashSet<YourType>(new YourTypeEqualityComparer());
var jobs = new Queue<IoJob>();

jobs.Enqueue(new IoJob.ReadJob(0, limit));
jobs.Enqueue(new IoJob.ReadJob(limit, limit)); // get the second job ready to start

while (jobs.TryDequeue(out var job))
{
    ExecuteJob(job); // make sure this job's IO is running

    // kick off the next job too, so its IO overlaps the CPU work below
    // (note: this can overlap a read with a write, so it's only the gist)
    if (jobs.TryPeek(out var next)) ExecuteJob(next);

    if (job is IoJob.ReadJob rj)
    {
        var data = await rj.Task!;
        if (data.Count == 0) continue;

        jobs.Enqueue(new IoJob.ReadJob(rj.Offset + 2 * limit, limit)); // keep two reads in flight

        var toWork = data.Where(r => !seen.Contains(r)).ToList(); // don't rework any objects
        seen.UnionWith(data);

        jobs.Enqueue(new IoJob.WriteJob(DoYourThing(toWork)));
    }
    else if (job is IoJob.WriteJob wj)
    {
        await wj.Task!; // the write was already started; just wait for it
    }
}
```

 

It's based on Indigenous legends, by studio Ninakami. Instagram is @ninakami.studio

[–] RagingHungryPanda@lemm.ee 2 points 5 days ago

I've got IDrive backups at 5TB for like $5 a month or something.

 

Lessons from event-driven architecture

 

Hey all, I've been trying to get Friendica up and running in a Docker container behind Nginx Proxy Manager, connected to a separate MariaDB container.

The only combination I've gotten to work is the Alpine Docker image with the complete web server, as putting configs into separate nginx instances wasn't getting me anywhere. With the full image, I can get to the UI and run the install, re-entering settings I had already set in environment variables.

But when it goes to save, it fails to write to the DB, and MariaDB says it rejected the connection for an unauthenticated user from an unknown host (implying to me that it rejected the connection before it even pulled that info). The thing is, I've been able to shell into the Friendica container and connect to MariaDB, and I can use Adminer to log in as the friendica user to the friendica database that I created. Has anyone run into this?

I'm starting to wonder if I need to use actual MySQL, maybe? This is a very strange issue, as I've been able to create the database, create the user with privileges, and log in as that user. The host name for the user is '%', so 'friendica'@'%'.

I'd appreciate any help that I could get there.

 

If you have any experience in this field, please say so in your reply. I've seen a lot of criticism over time of the peer review process and of how journals hyper-exploit academics, simply because the journals are able to monetize scarcity/exclusivity. I saw another post on it today and thought, "what if this was federated?"

Looking around, I see there are tools for the writing portion of the process, such as PubPub or Manubot, which essentially use git and markdown. But that's not the main point, since that's the front end of the process. What about the review process?

Let's say there's federated software that can be run by anyone, from individuals to universities and consortiums. When a user or team is ready to publish, they can "submit their work" for publishing, which would federate out as works pending publication.

How to handle reputation for who can review is a separate issue; I think there are ways to do it, but it's beyond the scope of this post, as I imagine it could get pretty complicated and would require feedback from people actually in the industry.

The reviewers can submit comments and reviews back to the author via federation, but this time the process can be open instead of behind closed doors. The authors revise, comment, etc. At some point a determination is made that this work is "published."

This seems like a feasible premise. Just brainstorming: you'd get history, open reviews, and no one asking $1,000 to submit a publication that they then make bank on while you get scraps or nothing.

I could see a reputation system within a given field and/or overall, with certain users being "review board" members or "reviewers" on their instance. There could also be additional reputation if, say, a group of universities created consortiums for different fields and that consortium "published" a work. There'd have to be an additional process to block people from spamming works that aren't ready, but that's not really the point for now.

Am I barking up the wrong tree here? At first thought, it seems like there are ways to allow federation of research papers and peer review and to put a dent in the grip of technical journals.

 

ActivityPods is supposed to let you have one account across the fediverse; it's still in early development. I see they have some Docker images, but there are no descriptions of what they're for, and their instructions involve running make scripts to get going.

I can run those inside a Docker container, and since I'm on TrueNAS I'm limited to containers anyway, which is fine. The other thing that seems a bit confusing is that it looks like they want you to define "shapes" for different services to communicate with.

It might just look more complicated than it is. Has anyone successfully gotten up and running with it?

 

And I'm making everyone go to my GoToSocial post, because the server is running, so I'm going to use it!

 

I have a gl-inet router on which I have an nginx config to send traffic to Nginx Proxy Manager and DDNS with cloudflare.

I'm trying to get some kind of local DNS set up so that if I'm on the local network, traffic stays within the network. The problem I'm running into is SSL certificates. NPM (on the server) handles those, and I thought I could go into the AdGuard Home config (on the gl-inet router), add a DNS rewrite pointing to the router, and traffic would flow as it normally does.

This DOES work, technically: traceroute shows only one hop for any of my subdomains, e.g. files.mydomain.com.

But I can't actually get access in a browser, because the SSL certificates aren't set up.

It seems like the options are: manually copy certificates from the server to the router (not ideal), or don't do it at all. I notice that if I go to a service by IP address, it changes the address to the domain name, e.g. 192.168.8.111:30027 -> files.mydomain.com.

This isn't a HUGE deal, but it's not preferable. How have you all solved this?

Edit: I solved the issue in probably the most hilarious way. I was trying to get the forwarding and everything set up, then borked my router's firewall so badly I couldn't reach the outside at all, so I did a hard reset.

I then moved my admin UI ports up by one each (81/444), re-set up GoodCloud, DDNS, and the WireGuard server on the router, then set up port forwarding for 80/443 on the router to 80/443 on the TrueNAS server. I switched NPM to listen on those ports (since I'd moved the web UI to different ports), then added the AdGuard Home DNS rewrites. It's now all working as expected.

Local traffic has only one hop and is accessible without SSL warnings, and the same goes for WAN traffic. Thank you all for the help!

 

I've been getting into self-hosting, the fediverse, and federated blogging. I contacted Freaking Nomads and they suggested I write about my experiences, so here it is! I hope you enjoy it.

Comments aren't fully federated from the blog site, so I'm using Mastodon as well.
