Hey everyone. I am a PhD student in Computational Linguistics. My main research interest is computational linguistics for under-resourced languages. I am particularly interested in computational linguistics for Bantu languages because of the extremely complex morphology that they bring to the table.

I am also doing research in sentiment analysis, focusing on stance detection and irony/sarcasm detection. Most of my research is on machine learning approaches.

Techbro grumbles 

Imagine how good free software could have been if RMS hadn't been actively excluding people from it for decades

Fresh from the oven:

Deckmaster v0.2.0 - a powerful Linux service that controls your Elgato Stream Deck.

Customize your own decks with widgets and execute commands, trigger keyboard events or D-Bus calls at the push of a button!

Packages & binaries for various distros and architectures.

Get it here:

transphobia, fash, "superstraight" 

Just a reminder that "superstraight" is just a transphobic dogwhistle started by actual friggin' fascists. It's just the next step of the "I identify as an attack-helicopter"-"joke", and it should NOT be taken seriously.

I just listened to a podcast interview in which a PhD & entrepreneur explained that "you don't take any risks when you are a researcher or a professor, unlike a company creator".
Sorry to break it to startuppers, but here are some thoughts:

fsf, rms 

Looks like an open letter calling for the ENTIRE FSF BOARD to resign because they reinstated rms is gaining some traction.

It's got some pretty big signatories already.

Joe Biden has been president for 58 days and there are still concentration camps at the border.

My new project automatically grills a steak in the evening if the weather is nice

It is a fine night steak machine

mosh opinions 

Mosh is really awesome. I love how responsive it feels tethering off a cell connection. Also I can suspend my laptop, wake it and mosh is running like nothing even happened.

I just wish tunneling was easier to set up. I guess I could set up wireguard on my home network to solve that problem.
However, even if you mosh into a jump server and then ssh from there into something else, it still feels considerably lower in latency and is more resilient to network disruptions.

Multilabel morph model 

tag attribute pairs.
This means that unlikely combinations of tags and even completely unseen combinations of tags are no problem.
Looking at the results, they are much more accurate. I'll probably make a blog post with actual numbers but it definitely represents an improvement.
I needed better morph tags so that I can leverage them in my cg3 rules and trust that they're probably not leading me astray, and now I think I can.
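
To illustrate why unseen tag combinations stop being a problem: if each attribute=value pair gets its own score, a full feature string can be assembled at decode time even if that exact string never appeared in training. A rough sketch follows; the `decode` function, the 0.5 threshold, and the scores are all made up for illustration and are not the actual model.

```python
def decode(scores: dict[str, float], threshold: float = 0.5) -> str:
    """Build a UD-style feature string from per-label scores.

    Hypothetical decoding rule: for each attribute, keep the single
    best-scoring value, provided its score reaches the threshold.
    """
    best: dict[str, tuple[str, float]] = {}
    for label, score in scores.items():
        attr, value = label.split("=", 1)
        if score >= threshold and (attr not in best or score > best[attr][1]):
            best[attr] = (value, score)
    if not best:
        return "_"  # "_" marks an empty feature set in CoNLL-U
    # Plain alphabetical order of attribute names, good enough for a sketch.
    return "|".join(f"{attr}={best[attr][0]}" for attr in sorted(best))


# Invented scores for one word; not output from a real model.
scores = {
    "BantuNounClass=8": 0.91,
    "BantuNounClass=7": 0.30,
    "Number=Sing": 0.12,
    "Number=Plur": 0.88,
}
print(decode(scores))  # BantuNounClass=8|Number=Plur
```

The combined string never has to exist as a class of its own; only the individual pairs do.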

Multilabel morph model 

(Both the forward and backward context in addition to characters are used).
This means the model does not work with individual features represented in the feature string. If the model is confronted with a word it hasn't seen, it may get all the tags wrong.

With the multilabel model, the individual pieces of the morph tag are broken apart. The first impact of this is that the tag distribution isn't so skewed. Where there were 3,000+ complex tag combos, there are 82 (2/3)
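
A toy illustration of why the label space shrinks so much: whole-string labels grow with the product of the value counts, while attribute=value pairs grow with their sum. The inventory below is invented purely for the arithmetic (it is not the real Swahili tag set, and not every combination would be valid in a real language).

```python
from itertools import product

# Hypothetical inventory, for illustration only.
inventory = {
    "BantuNounClass": ["1", "2", "7", "8"],
    "Number": ["Sing", "Plur"],
    "Person": ["1", "2", "3"],
}

attrs = sorted(inventory)

# Every whole feature string the naive single-label model could face.
combos = {
    "|".join(f"{attr}={value}" for attr, value in zip(attrs, values))
    for values in product(*(inventory[attr] for attr in attrs))
}

# Every individual attribute=value label the multilabel model needs.
pairs = {f"{attr}={value}" for attr in attrs for value in inventory[attr]}

print(len(combos))  # 4 * 2 * 3 = 24
print(len(pairs))   # 4 + 2 + 3 = 9
```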

Multilabel morph model 

Successfully doing multilabel morphological tagging for UD-style tags in Swahili!

In essence, a word like 'vyekundu' could have the features BantuNounClass=8|Number=Sing, where the features are a series of attributes and values separated by a pipe symbol.
Previously, I used a naive model where the whole feature string was used as a single label and the model was just predicting the most likely label given the input sequence of words. (1/3)
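
For anyone unfamiliar with the format, here is a minimal sketch of the difference between the two labelling schemes, using the feature string from the example above. The `split_feats` helper is a hypothetical name, not part of any library.

```python
def split_feats(feats: str) -> list[str]:
    """Split a UD-style feature string into attribute=value labels."""
    if feats in ("", "_"):  # "_" marks an empty feature set in CoNLL-U
        return []
    return feats.split("|")


feats = "BantuNounClass=8|Number=Sing"

naive_label = feats                # naive model: the whole string is one class
multi_labels = split_feats(feats)  # multilabel model: one label per pair

print(naive_label)   # BantuNounClass=8|Number=Sing
print(multi_labels)  # ['BantuNounClass=8', 'Number=Sing']
```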

"Police are warning students and universities not to access Sci-Hub, an "illegal website" that allows users to download scientific research papers normally locked behind expensive subscriptions."

Very bad indeed. That name again: Sci-Hub. Remember it so you can avoid it. Ahem.

data_science_in_julia_for_hackers - Federico Carrone, Herman Obst Demaestri & Mariano Nicolini:

A hefty carbon tax on cryptocurrency-related capital gains when

swahili research happenstance 

I tracked the error down: it is due to Stanza not having models for Swahili. Then I remembered that Stanza uses UD corpora. I'm annotating the first UD corpus of Swahili for the thesis I was procrastinating on. Thus, in the end, my procrastination was thwarted by itself.

swahili research happenstance 

So a week ago, while procrastinating on thesis stuff, I trained a neural translation model for English to Swahili using the scripts provided by Argos Translate and the Global Voices corpus I'm expanding on for my thesis work.
It works with Argos nicely and the translations are kinda okayish. Might be better not using a transformer model. Anyways, I wanted to test going the other direction (swh -> eng) but I was getting errors due to a missing tokenizer for swh.

Hello everyone, my name is Tamara (she/her), I am a PhD student in cultural anthropology and my research focusses on the development of artificial intelligence in the context of humanoid robotics. To really get a grasp of what is going on, I also study computer science and program parts of the robots’ software myself.

I am new here and would love to connect with people working on similar or related topics!

hi! I’m ezra. I’m a high school student interested in human organization and collaboration, and a lot of other things including (maybe) academia. I made a more complete list in my bio just for fun.

one of the main reasons I’m on this instance is to learn more about the paths you all have taken and continue to take to get where you are now!

Mabiki éditions is a Congolese publisher that releases works in #Lingala, #Kikongo, #Kiswahili, & #Tshiluba, as well as French & English:

