set.seed(12345)
rpois(5, 10)
[1] 11 12 9 8 11
set.seed(12345)
rpois(5, 10)
[1] 11 12 9 8 11
set.seed(23)
rpois(5, 10)
[1] 10 8 12 15 13
Consider Stack Overflow! Always search for existing answers to your question first, then post. AI tools (LLMs) can be a good help for simple cases, but for more complex problems they will often give you code that breaks easily. That code can also be esoteric and difficult to debug.
There is no shame in using LLMs as long as one is aware of these issues. In fact, LLMs can be a great way of turning your headache into a well-formed question.
When presenting your problem to others, it can be very helpful to move away from the full dataset and ALL the code, and create a miniature version of your problem: a minimal reproducible example (MRE).
Minimal: The example should use as little code and data as possible to produce the problem.
Complete: Your question should contain ALL the information needed to reproduce the problem.
Reproducible: Make sure the code and data provided ACTUALLY reproduce the same problem (and not a different one).
Create the code example by building it up step-by-step until the problem appears. Alternatively, build up the whole code and remove bits at a time, until the problem disappears - then reinsert the last part that was removed.
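As a sketch of what such a miniature version can look like, here is a hypothetical problem (an unexpected NA from mean()) reduced to a few lines. The vector x and its values are made up for illustration.

```r
# Hypothetical problem: mean() returns NA on my data.
# Minimal: a three-element vector is enough to show it.
x <- c(1, NA, 3)

# Complete: no extra packages or files are needed to run this.
result <- mean(x)

# Reproducible: running these lines shows the NA the question is about.
print(result)
```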
Here are some good functions to know when creating a mock dataset.
Before doing any random operations, setting the seed to a fixed value will ensure that the code produces exactly the same output every time:
[1] 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 0 1 0
[1] 2 2 3 2 0 1 2 1 2 3 2 0 3 2 2 0 1 3 0 1
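The calls that produced the output above are not shown here. As a sketch, assuming binomial draws from rbinom() with illustrative arguments, resetting the seed before each draw gives identical vectors:

```r
# Illustrative arguments only; not necessarily the calls behind the output above.
set.seed(12345)
first_draw <- rbinom(n = 20, size = 1, prob = 0.5)

set.seed(12345)  # same seed again
second_draw <- rbinom(n = 20, size = 1, prob = 0.5)

# TRUE: the same seed reproduces the same random numbers
identical(first_draw, second_draw)
```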
Read about other distributions using e.g., ?rnorm, and notice the other ways of extracting info about a distribution (dnorm, pnorm, qnorm for the normal density function, cumulative distribution function and quantile function, respectively).
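For the normal distribution, the d/p/q/r functions relate roughly as sketched below. The standard normal is used, so the defaults mean = 0 and sd = 1 apply.

```r
dens <- dnorm(0)      # density at 0, about 0.399
prob <- pnorm(0)      # P(X <= 0), which is 0.5
quant <- qnorm(0.975) # 97.5% quantile, about 1.96

# qnorm() inverts pnorm(): this round trip returns 1.5
round_trip <- qnorm(pnorm(1.5))

set.seed(1)           # arbitrary seed
draws <- rnorm(3)     # three random draws
```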
The simple way:
| trt | age | tte |
|---|---|---|
| 0 | 70.1 | 16 |
| 1 | 62.0 | 19 |
| 1 | 64.1 | 20 |
| 0 | 72.2 | 18 |
| 0 | 65.0 | 14 |
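The code that generated this table is not shown here. As a sketch, a data frame with the same columns can be typed in directly with data.frame(). The name df_simple is an assumption, and the values are copied from the table above rather than drawn at random.

```r
# Values copied from the table above; df_simple is a hypothetical name.
df_simple <- data.frame(
  trt = c(0, 1, 1, 0, 0),
  age = c(70.1, 62.0, 64.1, 72.2, 65.0),
  tte = c(16, 19, 20, 18, 14)
)
df_simple
```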
Tibbles allow you to build up columns sequentially; i.e. use info from one column in building the next:
df_tibble <- tibble::tibble(
  trt = rbinom(n = 5, size = 1, prob = 0.5),
  age = rnorm(n = 5, mean = 65, sd = 7) |> round(1),
  tte_death = rpois(n = 5, lambda = 15 + trt),
  tte_censor = rpois(n = 5, lambda = 13),
  tte = pmin(tte_death, tte_censor),
  event = ifelse(tte_death <= tte_censor, 1, 0)
)
df_tibble |> gt::gt()
| trt | age | tte_death | tte_censor | tte | event |
|---|---|---|---|---|---|
| 0 | 69.8 | 13 | 14 | 13 | 1 |
| 0 | 74.4 | 13 | 16 | 13 | 1 |
| 1 | 76.8 | 12 | 6 | 6 | 0 |
| 0 | 69.4 | 17 | 14 | 14 | 0 |
| 0 | 57.9 | 17 | 11 | 11 | 0 |
Sample from a vector of values using sample().
[1] 5 5 8 5 8 8 5 5 8 8
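The call behind this output is not shown. A sketch that yields output of this shape, assuming the values 5 and 8 are drawn with replacement:

```r
set.seed(2024)  # arbitrary seed, so the draw is repeatable
values <- c(5, 8)

# replace = TRUE is required to draw more values than the vector holds
draws <- sample(values, size = 10, replace = TRUE)
draws
```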
This can also be used to sample row-indices to extract entire rows from a dataset.
[1] 2 4 5
| trt | age | tte_death | tte_censor | tte | event |
|---|---|---|---|---|---|
| 0 | 74.4 | 13 | 16 | 13 | 1 |
| 0 | 69.4 | 17 | 14 | 14 | 0 |
| 0 | 57.9 | 17 | 11 | 11 | 0 |
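A sketch of the row-index approach, assuming a small stand-in data frame in place of the tibble above (df_demo and its values are hypothetical):

```r
# Hypothetical stand-in data
df_demo <- data.frame(id = 1:5, age = c(70, 62, 64, 72, 65))

set.seed(42)  # arbitrary seed
# Draw 3 distinct row numbers and sort them to keep the original row order
row_idx <- sort(sample(nrow(df_demo), size = 3))

# Index rows with the sampled numbers; the empty column slot keeps all columns
df_subset <- df_demo[row_idx, ]
df_subset
```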
Or use the {dplyr} function sample_n() to sample rows from a table directly:
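A sketch using a hypothetical stand-in data frame, df_demo. Note that the {dplyr} documentation marks sample_n() as superseded by slice_sample(), so both are shown:

```r
library(dplyr)

# Hypothetical stand-in data
df_demo <- data.frame(id = 1:5, age = c(70, 62, 64, 72, 65))

set.seed(42)  # arbitrary seed
sampled_old <- sample_n(df_demo, size = 3)   # function named in the text
sampled_new <- slice_sample(df_demo, n = 3)  # newer equivalent
sampled_old
```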