Data analysis tools

Having gone through the process of learning pandas (I'm a long-time Pythonista) enough to merge, clean, and recode some survey results, I would feel more comfortable loading the data into PostgreSQL and munging the hell out of it there with CTEs, GROUP BYs, and aggregate functions, etc.

But... having a perfectly replicable, automated process from start to finish is nothing to sneeze at.

Basic data analysis (!)

For the last two weeks our research team has been trying to agree on one number: how many people completed our survey.

I think I've finally convinced them that the number I've given, implemented in Python pandas, is the most correct.

This does not bode well for the real data analysis to follow :/

Data analysis

So just descriptive stats for now. I haven't even started plotting data yet, but I'm looking forward to that.

So much better than fumbling around in a spreadsheet!

Data analysis

I am really liking Python's pandas for cleaning up and analyzing these survey results. I was facing two sets of results for the same survey with responses that were coded differently in each set (due to translation reordering).

So I'm working with the text value of the responses and using pandas.pydata.org/pandas-docs/ to get everything into a single language, then doing a first pass of analysis with pandas.pydata.org/pandas-docs/ . So far so good! And REPLICABLE!
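A minimal sketch of that approach. The column names, response labels, and lookup table are hypothetical; the survey's real ones aren't shown here, and the truncated links above don't say which pandas functions were actually used. The idea is to recode on the response text with `Series.map`, stack the two sets with `pandas.concat`, and take a first pass with `value_counts`:

```python
import pandas as pd

# Hypothetical data: the same question exported from the two survey versions.
english = pd.DataFrame({"respondent": [1, 2, 3],
                        "q1": ["Agree", "Disagree", "Agree"]})
french = pd.DataFrame({"respondent": [4, 5],
                       "q1": ["D'accord", "Pas d'accord"]})

# Hypothetical lookup from the French response text to the English label.
fr_to_en = {"D'accord": "Agree", "Pas d'accord": "Disagree"}

# Recode on the text value rather than the numeric code,
# because the codes differ between the two sets.
french["q1"] = french["q1"].map(fr_to_en)

# Tag the source language, then stack both sets into one frame.
english["language"] = "en"
french["language"] = "fr"
responses = pd.concat([english, french], ignore_index=True)

# First pass of descriptive analysis: frequency of each response.
counts = responses["q1"].value_counts()
print(counts)
```

One reason `map` may suit this job: any response text missing from the lookup comes back as NaN, so a missed translation shows up in `responses["q1"].isna()` instead of silently staying in the wrong language.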

For context, I was discussing this with a research team and was surprised they wanted to keep the partial data. Given the preamble, keeping the data seems unethical.

They have since changed their mind. \o/

Consent preamble on a survey that you're taking said "You can choose to withdraw at any time by closing your browser or navigating away."

After you complete about half of the survey, you decide that you don't want to finish it and close the browser.

What do you expect the researchers to do with the partial data you submitted?

Bilingual survey; data cleanup

I helped design a bilingual survey, but didn't implement it. The PI entered it all into Qualtrics, twice: once for each language.

Now that we have the results, the codes don't match up between languages (of course), so I have to do a ton of cleanup before I can start analysing the data.

But first I have to stop staring at qualtrics.com/support/survey-p wondering why the PI didn't just use that?

See p.8 of doi.org/10.5860/lrts.63n2.119 for my current example. (It's paywalled--stupid ALA--but Google Scholar will find a perfect copy for you at academia.edu)

If I zoom in on-screen, I can barely make out the blurry label text.

Worse, even when printed at 600dpi--theoretically the whole reason for the weird layout--fig. 3 is unreadable.

I hate when I have to struggle to read graph labels in PDF articles on even a 27" QHD screen.

There has to be a better way. Like HTML?

Or at least not arbitrarily scaling the graphic down to 5/8 of the page width. Use the full 8.5" page width; it's not like an extra page or two in article length is costing you anything.

Today from when you join the review of draft #1 of a collaboratively-edited 30 page whitepaper on how to implement X in an information retrieval system, where X is a cool thing big corporations have been doing for a few years, and ask the question "So has any research been done on whether X actually benefits users? Is there a lit review?", and you get (presumably embarrassed) silence as a response.

Guess who gets to pull together that lit review?

Dan Scott boosted

Hi all :) I'm working on a PhD on , specifically studying the for science movement. As an activist I'm interested in feminist approaches to tech which I try to implement in meet ups, workshops, etc. A big part of my time goes to working as open as STS lets me, another big part goes to WikiData because I love it. Beautiful communities are what keep me existing through late capitalism, so here I am in Mastodon <3.

Well, good news is that the data was saved in the database correctly; bad news is that the incorrect data that I was shown & downloaded resulted in my negative evaluation of the tool's reliability & data accuracy. (I finished the survey before the researcher responded to me).

Completed a survey + exercise sent out by a PhD candidate working in the same research space. I found and reported a bug in the exercise that might skew their data significantly.

They're doing really interesting work, and including a hands-on exercise in the middle of the survey was a methodological approach that was new to me. I liked it!

But I hope that bug doesn't screw things up too much--my heart sank when I ran into it.

PDF

Every time I try and copy a bit of text from a PDF and the text comesoutallsquishedtogether or fully justified via spaces, I get annoyed.

I get angry when it's a paper behind a paywall, because apparently it takes a lot of $$ for those publishers to mess up basic reader requirements.

O-ho, and the editor's reply: "The biggest issue on our end is that we rely on the royalty payments we receive from MUSE and JSTOR to run the journal and our contracts require exclusivity."

$$$, and a possibly overly conservative reading of those contracts? I can't imagine every journal on MUSE/JSTOR is blocked from offering OA options.

My response: "This is indeed unfortunate, as it means that our faculty and students cannot comply with open access mandates from funding sources such as the Social Sciences Humanities Research Council (SSHRC). Non-compliance would render them ineligible for future grants. I will therefore have to recommend that they choose a different venue for publishing their work."

Journal's reply to a request to allow deposit of a post-print copy of an article to an institutional repository:

"Our experience has been that most university libraries have access to [Project MUSE or JSTOR] so our target audience usually has access"

This, despite pointing out SSHRC (amongst many other funders) "open access within one year" publication requirements.

Super disappointing.

Well hey there journal website don't you think that maybe you should post your ISSN(s) somewhere?

Algebra refresher workshop

Side note: figured out how to use stem: blocks in AsciiDoc so that I can continue taking notes on my laptop rather than scrawling in my notepad.

I think it helps to be able to express things in the required notation rather than just using symbols, e.g.

stem:[\forall x, y \in I, x > y => f(x) \lt f(y)]

Algebra refresher workshop

I survived day one. The agenda was... ambitious, and predictably the workshop leader began skipping through content. That leads to shaky foundations, because of course almost all math builds on previous math.

So the next three days are going to be very challenging. But I can do it!


Scholar Social is a microblogging platform for researchers, grad students, librarians, archivists, undergrads, academically inclined high schoolers, educators of all levels, journal editors, research assistants, professors, administrators—anyone involved in academia who is willing to engage with others respectfully.