Explore data, Assignment 1: Visualise data

LucyTallents · 6 September 2024 13:22

Submit your assignment for the Visualise data in R module in a reply to this topic

Remember to include these five elements:

A plot that you’ve created during this module
The code you used to create it, surrounded by ```
How do you interpret the plot; what insight can you gain?
A brief reflection: your next step, a challenge you solved, or a problem you still have
Give specific and useful feedback to a coursemate

Remember that each written step need only be a sentence or two

Alice_Laguardia · 9 September 2024 09:15

Hi Lucy,
Here is my assignment.

Plot

[image]
Code
names(penguins)
ggplot(penguins,
       aes(x = body_mass_g,
           y = flipper_length_mm)) +
  geom_point()
Interpretation - as body weight increases, so does flipper length
Reflection - forgot the headers, so used ‘names()’ command. Very useful.

Nicorll · 9 September 2024 13:10

Hi All,

My assignment

penguins_summary <- penguins %>%
  group_by(sex, year) %>%
  summarise(body_mass_mean = mean(body_mass_g))

ggplot(penguins_summary,
       aes(x = year, y = body_mass_mean,
           colour = sex)) +
  geom_line()

ggplot(penguins,
       aes(x = year, y = body_mass_g,
           colour = species)) +
  geom_point()

The Gentoo species has had the highest average body mass over the three years. Chinstrap increased in 2007 and dropped in 2009 while Adelie seems to have peaked in 2009

I've been trying to practise with my own data in RStudio, but I am failing to access the tidyverse packages so that I can run ggplot.

LucyTallents · 9 September 2024 15:36

Yes, names() is particularly useful when working with unfamiliar datasets

LucyTallents · 9 September 2024 15:37

Is your tidyverse problem caused by the amount of time it takes to download and install the tidyverse? It has so many packages! Have you tried installing only those packages you need, e.g. ggplot2, readr, dplyr?

Or is it another error, that we can maybe help with?
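
A minimal sketch of the "install only what you need" idea above. The helper name `missing_packages` is made up for illustration; `requireNamespace()` and `install.packages()` are standard R functions. The install line is left commented out because it needs an internet connection.

```r
# Hypothetical helper: report which of the requested packages are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The three packages mentioned above, instead of the whole tidyverse
needed <- c("ggplot2", "readr", "dplyr")
to_install <- missing_packages(needed)
print(to_install)  # names of any packages still to be installed

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Once the packages are installed, `library(ggplot2)` and `library(dplyr)` load them individually, without `library(tidyverse)`.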

Nicorll · 9 September 2024 17:10

Thank you for this, I will try to download only the packages I need.

Kundai · 10 September 2024 08:09

Hello. Here is my assignment

[image]
ggplot(penguins,
       aes(x = year,
           fill = island)) +
  geom_bar(position = "dodge")

3. The largest sample size is on Biscoe island, followed by Dream, then Torgersen with the least, across all years.

I have been having trouble running this code on my own data in RStudio. I keep getting an error or “NULL” when I edit and run it. I downloaded functions, so will try to find out where exactly I am missing it. Kinda frustrating to be getting all these errors; however, it’s part of the learning, I suppose. A few hiccups before getting it right!

LucyTallents · 10 September 2024 08:50

Well done, Kundai!

Coding can be really frustrating, but on the flip side, when it goes smoothly it is very satisfying! We just need to get you to that point…

This week’s learning activities include importing data, assigning the appropriate data type, and correcting errors, so perhaps you’ll find some answers to your problems there, later today

Do share the error messages you’re getting, either as a screenshot or copy/paste your code and the error, with a brief description of your data and/or what you want the code to do

It’s best to ask for help in a new post in our course discussion area:

Hit the C on your keyboard
Select our course area where it says category…
Add a descriptive title mentioning the error/problem
Add the help-needed tag where it says Optional tags
Describe your problem, with code/screenshots

If you need to come back to a draft post at a later point, you can find it in the Drafts area of your profile

LucyTallents · 10 September 2024 08:52

Now that others have posted their assignments, you can have a go at the final Feedback step and complete your assignments @Alice_Laguardia, @Nicorll, @Kundai and all

Alice_Laguardia · 10 September 2024 11:40

Hi! Since the boxplot exercise is now online, I thought of testing it out with your example.

Nicorll · 10 September 2024 14:47

Great work Alice,

I am not good at interpreting box plots

selenaflores · 11 September 2024 16:02

Hi all,

Here’s my assignment:

Plot: Line plot of mean flipper length (mm) over sampled years, by species:

[image: flipperlength_species_year]

(I am still troubleshooting another line plot of overall mean flipper lengths just by year, sans species.)
Code:

penguins_summary <- penguins %>%
  group_by(year, species) %>%
  summarise(flipper_length_mean = mean(flipper_length_mm))

ggplot(penguins_summary,
       aes(x = year, y = flipper_length_mean,
           colour = species)) +
  geom_line()

Interpretation: Mean flipper length slightly increased over time, for all three species, and thus overall. This could indicate that penguin sizes are potentially becoming larger.
Brief reflection: My next step will be to try out these plotting methods on my own data. I’ll hopefully be able to post some results here!
Feedback: @Nicorll Great line scatterplot! I find them harder to interpret than other plotting techniques. What type of plot do you think would be best suited to easily visualise this data?

And a suggestion/idea for all of us as we advance our plot-making skills, to ensure they are as reader-friendly as possible — to explore which colour schemes are best to address colourblindness!
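
One possible starting point for the colour scheme idea, as a sketch rather than a recommendation: ggplot2 ships the viridis palettes, which were designed to stay distinguishable under common forms of colour-vision deficiency. The data frame `demo_summary` and its values are made up here to stand in for `penguins_summary`.

```r
library(ggplot2)

# Made-up stand-in for penguins_summary (values are illustrative only)
demo_summary <- data.frame(
  year = rep(2007:2009, times = 3),
  species = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 3),
  flipper_length_mean = c(187, 191, 193, 192, 198, 198, 215, 217, 218)
)

p <- ggplot(demo_summary,
            aes(x = year, y = flipper_length_mean,
                colour = species)) +
  geom_line() +
  scale_colour_viridis_d()  # discrete viridis palette instead of the default colours

print(p)
```

`scale_colour_brewer(palette = "Dark2")` is another built-in option worth comparing.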

Rachel_Crouthers · 13 September 2024 11:52

2. Code
library(tidyverse) # loaded to remove missing data

penguins_weight <- penguins %>% # create new dataset to exclude all NA for certain variables
  drop_na(body_mass_g) %>% # remove all NA
  drop_na(sex) %>% # remove all NA
  drop_na(species) # remove all NA

head(penguins_weight) # check data + opened data frame

ggplot(penguins_weight, # used modified dataset
       aes(x = interaction(sex, species),
           y = body_mass_g,
           colour = species)) +
  geom_boxplot()

Interpretation: Males are heavier than females in each of the three species. Gentoo penguins (both males and females) have higher median weights than the other two species, whereas the weights for each sex are similar between Adelie and Chinstrap.
Reflection: Good summary of the different plot types and their uses, and of how to quickly cross-check a dataset using names(). The plot is easy to interpret. At this stage it would be nice to modify the x labels as well as both axis titles.
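
A minimal sketch of one way to change the x labels and both axis titles, assuming the standard ggplot2 functions `labs()` and `scale_x_discrete()`. The data frame `demo_weight`, its values, the helper `split_label` and the title wording are all made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for penguins_weight (values are illustrative only)
demo_weight <- data.frame(
  sex = rep(c("female", "male"), times = 6),
  species = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 4),
  body_mass_g = c(3350, 4050, 3400, 4000, 3500, 3950,
                  3550, 3900, 4700, 5500, 4650, 5450)
)

# Hypothetical helper: turn "male.Gentoo" into "male" over "Gentoo" on two lines
split_label <- function(x) gsub(".", "\n", x, fixed = TRUE)

p <- ggplot(demo_weight,
            aes(x = interaction(sex, species),
                y = body_mass_g,
                colour = species)) +
  geom_boxplot() +
  scale_x_discrete(labels = split_label) +  # tidy the tick labels on the x axis
  labs(x = "Sex and species",               # x axis title
       y = "Body mass (g)",                 # y axis title
       colour = "Species")                  # legend title

print(p)
```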

Pavel · 13 September 2024 15:34

Hi all,

here is a bar chart showing number of penguins counted every year.

ggplot(penguins,
       aes(x = year, fill = species)) +
  geom_bar(position = "dodge")

(btw. if I surround the code with ```, the code is not complete once published)

Adelie penguins were the most represented in the measurement each year. Followed by Gentoo and then Chinstrap penguins.
I wanted to put year and island on the x axis to see specifically how many individuals were measured each year on every island. I tried some combinations, but nothing worked out.
I don’t have feedback so much as a question. In this code:
penguins_summary <- penguins %>%
  group_by(sex, year) %>%
what does %>% do? Or what does it stand for?

Thank you and cheers,
Pavel
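
A short sketch touching both points in this post. `%>%` is the pipe operator (from the magrittr package, made available by dplyr): `x %>% f(y)` means `f(x, y)`, and it is usually read as "and then". For showing year and island together, `facet_wrap()` is one option among several. The data frame `demo` and its values are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for penguins (values are illustrative only)
demo <- data.frame(
  year = c(2007, 2007, 2008, 2008, 2009, 2009),
  island = c("Biscoe", "Dream", "Biscoe", "Dream", "Biscoe", "Torgersen"),
  body_mass_g = c(4500, 3700, 4700, 3800, 4800, 3600)
)

# 1. The pipe: take demo, and then group it, and then summarise it
piped <- demo %>%
  group_by(year) %>%
  summarise(body_mass_mean = mean(body_mass_g))

# The same steps without the pipe, written inside-out
nested <- summarise(group_by(demo, year),
                    body_mass_mean = mean(body_mass_g))

# 2. Year and island together: counts per year, one panel per island
p <- ggplot(demo, aes(x = factor(year))) +
  geom_bar() +
  facet_wrap(~ island) +
  labs(x = "year")

print(p)
```

Both `piped` and `nested` hold the same summary; the piped version simply lists the steps in the order they happen.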

Nicorll · 16 September 2024 09:51

Hi @selenaflores , perhaps a line graph would have been the best way?

Chiara · 16 September 2024 10:31

Hello,
I tried to use my own dataset, but I will have to wait until I complete the next module to know how to prepare the data for import into R.
Being extremely stubborn, I insisted and created a couple of plots with the leaf data we collected together.

first plot I created with the following code

ggplot(leaf_dataset, 
       aes(x= Country, fill=Angio_Gymno))+
  geom_bar(position = "dodge")

I wanted to see the Angiosperm / Gymnosperm relation by country.
I see that a misspelled word in my dataset caused R to create three categories.
Being unable to fix that error without touching the original dataset and then re-uploading it, I decided to try another variable and compare the collectors to the trunk size,

using this code

ggplot(leaf_dataset, 
       aes(x = Collector, fill = TrunkSize )) + 
  geom_bar(position = "fill")

Despite the satisfaction of being able to create this very colourful plot, the data are categorised wrongly because of misspellings.
Again, being unable to fix it using R, I tried again and again with more variables within the same dataset until I found one variable that seems all correct (mostly because I don’t know the original size of your leaves!)

[image]

ggplot(leaf_dataset, 
       aes(x=Length_mm, y= Width_mm))+
  geom_point()

most of the leaves we collected are less than 200 mm long and 100 mm wide

Reflection: Data are often recorded directly in the field, often in difficult conditions and with many possible distractions. This exercise made me think about the importance of having a solid data collection methodology designed to reduce human error, and showed me that I need to improve my knowledge of data preparation (I constantly misspell words and make typos!)
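
A sketch of one possible way to correct a misspelled category inside R while leaving the original file untouched. The thread does not show the actual values in `leaf_dataset`, so the data frame `leaf_demo`, its countries and the misspelling "Angiosprem" are all invented for illustration.

```r
# Invented stand-in for leaf_dataset; the misspelling "Angiosprem" is hypothetical
leaf_demo <- data.frame(
  Country = c("UK", "UK", "Italy", "Zimbabwe"),
  Angio_Gymno = c("Angiosperm", "Gymnosperm", "Angiosprem", "Angiosperm"),
  stringsAsFactors = FALSE
)

# Three categories appear because of the typo
print(table(leaf_demo$Angio_Gymno))

# Work on a copy, so the imported data and the file on disk stay unchanged
leaf_clean <- leaf_demo
leaf_clean$Angio_Gymno[leaf_clean$Angio_Gymno == "Angiosprem"] <- "Angiosperm"

# Only the two intended categories remain
print(table(leaf_clean$Angio_Gymno))
```

`table()` (or `unique()`) on a column is a quick way to spot such stray categories before plotting.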

Chiara · 16 September 2024 10:37

Hi Selena , very good point about colorful plot for colorblind people! (my colleague is colorblind and since they told me I try to do all the graphs in the presentations with colors he can see too)
But many papers are also published in black and white only. How can we render a line plot like yours as a black and white image?

Chiara

selenaflores · 16 September 2024 18:48

Great question! I think putting points of differing shapes on the lines would be one way to address the lack of differing colour options on such plots.
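
A minimal sketch of the differing-shapes idea, assuming the standard ggplot2 aesthetics `shape` and `linetype`. The data frame `demo_summary` and its values are made up to stand in for `penguins_summary`.

```r
library(ggplot2)

# Made-up stand-in for penguins_summary (values are illustrative only)
demo_summary <- data.frame(
  year = rep(2007:2009, times = 3),
  species = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 3),
  flipper_length_mean = c(187, 191, 193, 192, 198, 198, 215, 217, 218)
)

# No colour aesthetic: species are told apart by point shape and line type
p <- ggplot(demo_summary,
            aes(x = year, y = flipper_length_mean,
                shape = species, linetype = species)) +
  geom_line() +
  geom_point(size = 3) +
  theme_bw()  # white background, which tends to print cleanly

print(p)
```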

selenaflores · 16 September 2024 18:51

For the factors being compared, I agree!